`normalize` from Text.Pandoc.Shared is more general. In tests, though,
it more than doubles the run time. `strNormalize` does less, but it does
what we need. This comment is added for future maintainability.
Use a function `stripSpaces`, instead of recursion. Makes it a bit
easier to read and mantain, and simplify normalizing DefinitionList,
which was left out the first time.
This brings pandoc's rendering of haddock markup in line
with the new haddock.
Note that we preserve line breaks in `@` code blocks, unlike
the earlier version.
Modified tests pass. More tests would be good.
Closes#1236.
Note, this is a bit of a kludge, to work around the fact that xml-light
doesn't parse `<?asciidoc-br?>` correctly. We preprocess the input,
replacing that instruction with `<br/>`, and then parse that as a line
break. Other XML instructions are simply removed from the input stream.
Closes#1345. Also relabeled 'code' and 'verbatim' parsers
to accord with the org-mode manual.
I'm not sure what the distinction between code and verbatim
is supposed to be, but I'm pretty sure both should be represented
as Code inlines in pandoc. The previous behavior resulted in the
text not appearing in any output format.
`\emph{ hi }` gets parsed as `[Space, Emph [Str "hi"], Space]`
so that we don't get things like `* hi *` in markdown output.
Also applies to textbf and some other constructions.
Closes#1146. (`--normalize` isn't touched by this, but
normalization should not generally be necessary with the
changes to the readers.)
This change rewrites `inlineLaTeXCommand` so that parsec will
know when input is being consumed. Previously a run-time
error would be produced with some input involving raw latex.
(I believe this does not affect the last release, as the inline
latex reading was added recently.)
This should have fixed#1305, allowing the reference.docx to define
section numbering, but it doesn't. Now the headings appear with proper
indentation, but the numbers don't appear. Unclear why. styles.xml and
numbering.xml basically match the docx which has the expected result.
Now the minimum id used by pandoc is 990. All ids start with "99".
This gives some room for a reference.docx to define numbering styles.
Note: this is not yet possible, since pandoc generates numbering.xml
entirely on its own.
Instead of sequential numbering, we assign numbers based on the
list marker styles. This simplifies some of the code and should
make it easier to modify numbering in the future.
* All media from reference.docx are copied into result.
* Added defaults for common image types to [Content Types].
* Avoided redundant XML parse + write for entries taken over from
reference.docx, for better performance.
With the move from parsec to attoparsec, we lost good error
reporting. In fact, since we weren't testing for end of input,
malformed templates would fail silently. Here we revert back to
Parsec for better error messages.
Inline LaTeX is now accepted and parsed by the org-mode reader. Both,
math symbols (like \tau) and LaTeX commands (like \cite{Coffee}), can be
used without any further escaping.
In 1.12.4 and 1.12.4.2, the cover image would not appear properly,
because the metadata id was not correct.
This was introduced by the fix to #1254.
Now we derive the id from the actual cover image filename,
which we preserve rather than using "cover-image."
Citations are defined via the "normal citation" syntax used in markdown,
with the sole difference that newlines are not allowed between "[...]".
This is for consistency, as org-mode generally disallows newlines
between square brackets.
The extension is turned on by default and can be turned off via the
default syntax-extension mechanism, i.e. by specifying "org-citation" as
the input format.
Move `citeKey` from Readers.Markdown into Parsing
The function can be used by other readers, so it is made accessible for
all parsers.
Both `ParserState` and `OrgParserState` keep track of the parser position at
which the last string ended. This patch introduces a new class
`HasLastStrPosition` and makes the above types instances of that class. This
enables the generalization of functions updating the state or checking if one
is right after a string.
The reader produced wrong results for block containing non-letter chars
in their parameter arguments. This patch relaxes constraints in that it
allows block header arguments to contain any non-space character (except
for ']' for inline blocks).
Thanks to Xiao Hanyu for noticing this.
The general form of source block headers
(`#+BEGIN_SRC <language> <switches> <header arguments>`) was not
recognized by the reader. This patch adds support for the above form,
adds header arguments to the block's key-value pairs and marks the block
as a rundoc block if header arguments are present.
This closes#1286.
(It seems clearer to put the whitespace parsing in the grouped
parser. This also uses stateLastStrPos to determine when the
border is adjacent to an alphanumeric.)
Org's inline code blocks take forms like `src_haskell(print "hi")` and
are frequently used to include results from computations called from
within the document. The blocks are read as inline code and marked with
the special class `rundoc-block`. Proper handling and execution of
these blocks is the subject of a separate library, rundoc, which is
work in progress.
This closes#1278.
* Undid changes to parseXml in last commit.
* Instead of a string fallback, we have parseXml fall back
on the reference.docx that comes with pandoc if the user's
reference.docx does not contain a needed file.
* Closes#1185.