Closes #1345. Also relabeled 'code' and 'verbatim' parsers
to accord with the org-mode manual.
I'm not sure what the distinction between code and verbatim
is supposed to be, but I'm pretty sure both should be represented
as Code inlines in pandoc. The previous behavior resulted in the
text not appearing in any output format.
`\emph{ hi }` gets parsed as `[Space, Emph [Str "hi"], Space]`
so that we don't get things like `* hi *` in markdown output.
Also applies to textbf and some other constructions.
Closes #1146. (`--normalize` isn't touched by this, but
normalization should not generally be necessary with the
changes to the readers.)
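A minimal sketch of the whitespace handling described above, written in Python rather than pandoc's Haskell. The tuple-based node shapes and the name `parse_emph` are illustrative assumptions, not pandoc's API:

```python
def words_to_inlines(core):
    """Turn a whitespace-free-at-the-edges string into Str/Space nodes."""
    inlines = []
    for i, word in enumerate(core.split()):
        if i > 0:
            inlines.append(("Space",))
        inlines.append(("Str", word))
    return inlines


def parse_emph(body):
    """Model the body of an emphasis command, moving edge whitespace outside.

    Hypothetical helper: leading and trailing whitespace become Space nodes
    around the Emph node instead of staying inside it.
    """
    core = body.strip()
    nodes = []
    if body[:1].isspace():
        nodes.append(("Space",))
    if core:
        nodes.append(("Emph", words_to_inlines(core)))
        if body[-1:].isspace():
            nodes.append(("Space",))
    return nodes
```

With this shape a writer never sees whitespace directly inside the emphasis, which is what avoids output like `* hi *`.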
This change rewrites `inlineLaTeXCommand` so that parsec will
know when input is being consumed. Previously a run-time
error would be produced with some input involving raw latex.
(I believe this does not affect the last release, as the inline
latex reading was added recently.)
This should have fixed #1305, allowing the reference.docx to define
section numbering, but it doesn't. Now the headings appear with proper
indentation, but the numbers don't appear. Unclear why. styles.xml and
numbering.xml basically match the docx which has the expected result.
Now the minimum id used by pandoc is 990. All ids start with "99".
This gives some room for a reference.docx to define numbering styles.
Note: this is not yet possible, since pandoc generates numbering.xml
entirely on its own.
Instead of sequential numbering, we assign numbers based on the
list marker styles. This simplifies some of the code and should
make it easier to modify numbering in the future.
* All media from reference.docx are copied into result.
* Added defaults for common image types to [Content Types].
* Avoided redundant XML parse + write for entries taken over from
reference.docx, for better performance.
With the move from parsec to attoparsec, we lost good error
reporting. In fact, since we weren't testing for end of input,
malformed templates would fail silently. Here we revert to
Parsec for better error messages.
Inline LaTeX is now accepted and parsed by the org-mode reader. Both
math symbols (like \tau) and LaTeX commands (like \cite{Coffee}) can be
used without any further escaping.
In 1.12.4 and 1.12.4.2, the cover image would not appear properly,
because the metadata id was not correct.
This was introduced by the fix to #1254.
Now we derive the id from the actual cover image filename,
which we preserve rather than using "cover-image".
Citations are defined via the "normal citation" syntax used in markdown,
with the sole difference that newlines are not allowed between "[...]".
This is for consistency, as org-mode generally disallows newlines
between square brackets.
The extension is turned on by default and can be turned off via the
default syntax-extension mechanism, i.e. by specifying "org-citations" as
the input format.
Move `citeKey` from Readers.Markdown into Parsing
The function can be used by other readers, so it is made accessible for
all parsers.
Both `ParserState` and `OrgParserState` keep track of the parser position at
which the last string ended. This patch introduces a new class
`HasLastStrPosition` and makes the above types instances of that class. This
enables the generalization of functions updating the state or checking if one
is right after a string.
The reader produced wrong results for blocks containing non-letter chars
in their parameter arguments. This patch relaxes constraints in that it
allows block header arguments to contain any non-space character (except
for ']' for inline blocks).
Thanks to Xiao Hanyu for noticing this.
The general form of source block headers
(`#+BEGIN_SRC <language> <switches> <header arguments>`) was not
recognized by the reader. This patch adds support for the above form,
adds header arguments to the block's key-value pairs and marks the block
as a rundoc block if header arguments are present.
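The header form above can be sketched as follows. This is a simplified Python model, not the reader's actual code. The function name `parse_src_header` and the returned dictionary layout are assumptions made for illustration, and header arguments are assumed to take the `:key value` shape:

```python
def parse_src_header(line):
    """Split '#+BEGIN_SRC <language> <switches> <header arguments>' into parts."""
    tokens = line.split()
    if not tokens or tokens[0].lower() != "#+begin_src":
        raise ValueError("not a source block header: %r" % line)
    language = tokens[1] if len(tokens) > 1 else None
    switches = []
    header_args = {}
    key = None
    for tok in tokens[2:]:
        if tok.startswith(":"):
            # A new header argument begins; its value may follow.
            key = tok[1:]
            header_args[key] = ""
        elif key is not None:
            # Non-space characters after a key belong to that key's value.
            header_args[key] = (header_args[key] + " " + tok).strip()
        else:
            # Anything before the first header argument is a switch.
            switches.append(tok)
    return {
        "language": language,
        "switches": switches,
        "header_args": header_args,
        # The block is flagged as a rundoc block only if header arguments exist.
        "rundoc": bool(header_args),
    }
```

For example, `parse_src_header("#+BEGIN_SRC haskell -n :results output")` yields the language `haskell`, the switch `-n`, one header argument, and `rundoc` set to `True`.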
This closes #1286.
(It seems clearer to put the whitespace parsing in the grouped
parser. This also uses stateLastStrPos to determine when the
border is adjacent to an alphanumeric.)
Org's inline code blocks take forms like `src_haskell{print "hi"}` and
are frequently used to include results from computations called from
within the document. The blocks are read as inline code and marked with
the special class `rundoc-block`. Proper handling and execution of
these blocks is the subject of a separate library, rundoc, which is
work in progress.
This closes #1278.
* Undid changes to parseXml in last commit.
* Instead of a string fallback, we have parseXml fall back
on the reference.docx that comes with pandoc if the user's
reference.docx does not contain a needed file.
* Closes #1185.
Closes #1274.
Rewrote handleIncludes.
We now report the actual source file and position where the error
occurs, even if it is included. We do this by inserting special
commands, `\PandocStartInclude` and `\PandocEndInclude`, that encode
this information in the preprocessing phase.
Also generalized the types of a couple functions from
`Text.Pandoc.Parsing`.
Org allows users to define their own custom link types. E.g., in a
document with a lot of links to Wikipedia articles, one can define a
custom wikipedia link-type via
    #+LINK: wp https://en.wikipedia.org/wiki/
This allows one to write [[wp:Org_mode][Org-mode]] instead of the
equivalent [[https://en.wikipedia.org/wiki/Org_mode][Org-mode]].
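The expansion can be illustrated with a short Python sketch. The helper names `collect_link_types` and `expand_link` are hypothetical, and only the simple case of appending the tag to the replacement text is modelled:

```python
import re

# Matches lines such as "#+LINK: wp https://en.wikipedia.org/wiki/"
LINK_DEF = re.compile(r"^#\+LINK:\s+(\S+)\s+(\S+)\s*$", re.IGNORECASE)


def collect_link_types(lines):
    """Gather custom link types from #+LINK definitions."""
    link_types = {}
    for line in lines:
        m = LINK_DEF.match(line.strip())
        if m:
            link_types[m.group(1)] = m.group(2)
    return link_types


def expand_link(target, link_types):
    """Expand 'prefix:rest' when prefix is a known custom link type."""
    prefix, sep, rest = target.partition(":")
    if sep and prefix in link_types:
        return link_types[prefix] + rest
    # Unknown prefixes (e.g. "https") are left untouched.
    return target
```

Given the definition above, `expand_link("wp:Org_mode", ...)` returns the full Wikipedia URL, while ordinary URLs pass through unchanged.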
* We now correctly handle field lists that are indented more than
3 spaces.
* We treat an "aafig" directive as a code block with attributes,
so it can be processed in a filter. (Closes #1212.)
We now check the writerName for a lua script in pandoc.hs, so that
lowercasing and format parsing aren't done. Note this behavior
change: getWriter in Text.Pandoc no longer returns a custom writer on
input "foo.lua".
This adds nocite citations to a metadata field, `nocite`.
These will appear in the bibliography but not in the text
(unless you use a `$nocite$` variable in your template, of
course).
Internal links in Org are possible by using an anchor-name as the target
of a link:
    [[some-anchor][This]] is an internal link.
    It links <<some-anchor>> here.
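As a rough illustration, the resolution of internal links can be modelled in Python as below. The regular expressions, the function names, and the choice of a `#`-prefixed fragment as the resolved target are assumptions of this sketch, not a description of the reader's internals:

```python
import re

ANCHOR = re.compile(r"<<([^<>\n]+)>>")
LINK = re.compile(r"\[\[([^\]\n]+)\]\[([^\]\n]+)\]\]")


def find_anchors(text):
    """Collect the names of all <<anchor>> targets in the text."""
    return set(ANCHOR.findall(text))


def resolve_links(text):
    """Return (description, target) pairs, pointing known anchors at '#name'."""
    anchors = find_anchors(text)
    resolved = []
    for target, description in LINK.findall(text):
        if target in anchors:
            resolved.append((description, "#" + target))
        else:
            resolved.append((description, target))
    return resolved
```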
The function `compactify'DL`, used to change the final definition item of a
definition list into a `Plain` iff all other items are `Plain`s as well, is
useful in many parsers and hence moved into Text.Pandoc.Shared.
Footnotes can consist of multiple blocks and end only at a header or at
the beginning of another footnote. This fixes the previous behavior,
which restricted notes to a single paragraph.
Support for standard org-blocks is improved. The parser now handles
"HTML", "LATEX", "ASCII", "EXAMPLE", "QUOTE" and "VERSE" blocks in a
sensible fashion.
* Use a <literallayout> for the entire paragraph, not just for the
newline character
* Don't let LineBreaks inside footnotes influence the enclosing
paragraph
This fixes the org-reader's handling of sub- and superscript
expressions. Simple expressions (like `2^+10`), expressions in
parentheses (`a_(n+1)`) and nested sexp (like `a_(nested()parens)`) are
now read correctly.
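The three cases can be sketched in Python as follows. `script_expr` is a hypothetical helper that receives the text right after `_` or `^`. Keeping the parentheses in the returned expression is an assumption of this sketch:

```python
def script_expr(s):
    """Return (expression, rest) for a sub-/superscript, or None if none fits."""
    if s.startswith("("):
        # Parenthesised form: scan to the matching close, tracking nesting.
        depth = 0
        for i, ch in enumerate(s):
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth == 0:
                    return s[: i + 1], s[i + 1 :]
        return None  # unbalanced parentheses
    # Simple form: optional sign followed by alphanumeric characters.
    start = 1 if s[:1] in ("+", "-") else 0
    end = start
    while end < len(s) and s[end].isalnum():
        end += 1
    if end == start:
        return None
    return s[:end], s[end:]
```

Applied to the examples above, the text after the marker in `2^+10` gives `+10`, `a_(n+1)` gives `(n+1)`, and `a_(nested()parens)` gives `(nested()parens)`.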
Support all of the following variants as valid ways to write inline or
display math:
- `\[..\]` (display)
- `$$..$$` (display)
- `\(..\)` (inline)
- `$..$` (inline)
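A small Python sketch of how the four delimiter pairs might be told apart. The table and the function `classify_math` are illustrative only. Note that `$$` has to be tried before `$`:

```python
# (opening delimiter, closing delimiter, kind); order matters for "$$" vs "$".
MATH_DELIMITERS = [
    ("\\[", "\\]", "display"),
    ("$$", "$$", "display"),
    ("\\(", "\\)", "inline"),
    ("$", "$", "inline"),
]


def classify_math(s):
    """Return (kind, body) if s is wrapped in a known math delimiter pair."""
    for opening, closing, kind in MATH_DELIMITERS:
        long_enough = len(s) > len(opening) + len(closing)
        if long_enough and s.startswith(opening) and s.endswith(closing):
            return kind, s[len(opening) : len(s) - len(closing)]
    return None
```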
This closes #1223. Again.
In particular we now pick up on attributes. Since pandoc links
can't have attributes, we enclose the whole link in a span
if there are attributes.
Closes #1008.
These previously produced invalid LaTeX: `\paragraph` or
`\subparagraph` in a `quote` environment. This adds an
`\mbox{}` in these contexts to work around the problem.
See http://tex.stackexchange.com/a/169833/22451.
Closes #1221.
Instead of being ignored, attributes are now parsed and
included in Span inlines.
The output will be a bit different from stock textile:
e.g. for `*(foo)hi*`, we'll get `<em><span class="foo">hi</span></em>`
instead of `<em class="foo">hi</em>`. But at least the data is
not lost.
Org-mode and Pandoc use different language identifiers, marking source
code as being written in a certain programming language. This adds more
translations from identifiers as used in Org to identifiers used in
Pandoc.
The full list of identifiers used in Org and Pandoc is available through
http://orgmode.org/manual/Languages.html and `pandoc -v`, respectively.
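The translation amounts to a lookup with a pass-through default, as in the Python sketch below. The pairs shown are illustrative examples of the kind of mapping involved and should not be read as the exact table used by the reader. The name `translate_lang` is likewise an assumption:

```python
# Illustrative Org -> Pandoc identifier pairs (not an authoritative list).
ORG_TO_PANDOC_LANG = {
    "C": "c",
    "C++": "cpp",
    "emacs-lisp": "commonlisp",
    "js": "javascript",
    "R": "r",
    "sh": "bash",
    "sqlite": "sql",
}


def translate_lang(org_lang):
    """Map an Org language identifier to a Pandoc one, defaulting to itself."""
    return ORG_TO_PANDOC_LANG.get(org_lang, org_lang)
```

Identifiers that already agree between the two systems, such as `haskell`, fall through unchanged.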
Text such as /*this*/ was not correctly parsed as a strong, emphasised
word. This was due to the end-of-word recognition being too strict, as it
did not accept markup chars as part of a word. The fix involves an
additional parser state field, listing the markup chars which might be
parsed as part of a word.
The default pandoc ParserState is replaced with `OrgParserState`. This
is done to simplify the introduction of new state fields required for
efficient Org parsing.