Previously we only used the first anchor span to affect header ids. This
allows us to use all the anchor spans in a header, whether they're
nested or not.
Along with 62882f97, this closes#3088.
Previously we always generated an id for headers (since they wouldn't
bring one from Docx). Now we let it use an existing one if
possible. This should allow us to recurs through anchor spans.
Previously, we would only be able to figure out internal links to a
header in a docx if the anchor span was empty. We change that to read
the inlines out of the first anchor span in a header.
This still leaves another problem: what to do if there are multiple
anchor spans in a header. That will be addressed in a future commit.
The functions `isElem` and `elemName` (defined in Docx/Util.hs) make the
code a lot cleaner than the original XML.Light functions, but they had
been used inconsistently. This puts them in wherever applicable.
Image sources as those in plain images, image links, or figures, must be
proper URIs or relative file paths to be recognized as images. This
restriction is now enforced for all image sources.
This also fixes the reader's usage of uncleaned image sources, leading
to `file:` prefixes not being deleted from figure
images (e.g. `[[file:image.jpg]]` leading to a broken image `<img
src="file:image.jpg"/>)
Thanks to @bsag for noticing this bug.
Previously these yielded strings of alternating Code and Space
elements; we now incorporate the spaces into the Code. Emphasis
etc. is still possible inside these.
Closes#3055.
Previously an unquoted attribute value in a table row
could cause parsing problems.
Fixes#3053 (well, proper rowspans and colspans aren't
created, but that's a bigger limitation with the current
Pandoc document model for tables).
Previously blockquotes were used. Now a Div is used
with class `admonition` and (if relevant) one of the
following: `attention`, `caution`, `danger`, `error`,
`hint`, `important`, `note`, `tip`, `warning`.
`sidebar` is also put into a Div.
Note: This will change rendering of RST documents!
It should provide much more flexibility.
Closes#3031.
We now handle cell and row attributes, mostly by skipping
them. However, alignments are now handled properly.
Since in pandoc alignment is per-column, not per-cell, we
try to devine column alignments from cell alignments.
Table captions are also now parsed, and textile indicators
for thead and tfoot no longer cause parse failure. (However,
a row designated as tfoot will just be a regular row in pandoc.)
Org rules for allowed characters before or after markup chars were not
checked for verbatim text. This resultet in wrong parsing outcomes of
if the verbatim text contained e.g. space enclosed markup characters as
part of the text (`=is_substr = True=`). Forcing the parser to update
the positions of allowed/forbidden markup border characters fixes this.
This fixes#3016.
Some less-than-smart code required a pragma switching of overlapping
pattern warnings in order to compile seamlessly. Using view patterns
makes the code easier to read and also doesn't require overlapping
pattern checks to be disabled.
Handling of archived trees can be modified using the `arch` option.
Archived trees are either dropped, exported completely, or collapsed to
include just the header when the `arch` option is nil, non-nil, or
`headline`, respectively.
Comment trees were handled after parsing, as pattern matching on lists
is easier than matching on sequences. The new method of reading
documents as trees allows for more elegant subtree removal.
Emacs org-mode is based on outline-mode, which treats documents as trees
with headlines are nodes. The reader is refactored to parse into a
similar tree structure. This simplifies transformations acting on
document (sub-)trees.
Figure labels given as `#+LABEL: thelabel` are used as the ID of the
respective image. This allows e.g. the LaTeX to add proper `\label`
markup.
This fixes half of #2496 and #2999.
The link
`<foo>`_
should have `foo` as both its link text and its URL.
See RST spec at
<http://docutils.sourceforge.net/docs/ref/rst/restructuredtext.html#embedded-uris-and-aliases>
"The reference text may also be omitted, in which case the URI will be
duplicated for use as the reference text. This is useful for relative
URIs where the address or file name is also the desired reference text:
See `<a_named_relative_link>`_ or `<an_anonymous_relative_link>`__
for details."
Closes Debian #828167 -- reported by Christian Heller.
We can't guarantee we'll convert every comment correctly, though we'll
do the best we can. This warns if the comment includes something other
than Para or Plain.
This adds simple track-changes comment parsing to the docx reader. It is
turned on with `--track-changes=all`. All comments are converted to
inlines, which can list some information. In the future a warning will
be added for comments with formatting that seems like it will be
excessively denatured.
Note that comments can extend across blocks. For that reason there are
two spans: `comment-start` and `comment-end`. `comment-start` will
contain the comment. `comment-end` will always be empty. The two will be
associated by a numeric id.
Org mode allows arbitrary raw inlines ("export snippets" in Emacs
parlance) to be included as `@@format:raw foreign format text@@`.
Support for this features is added to the Org reader.
A specification for an official Org-mode citation syntax was drafted by
Richard Lawrence and enhanced with the help of others on the orgmode
mailing list. Basic support for this citation style is added to the
reader.
This closes#1978.
Semicolons are used as special characters in citations syntax. This
ensures the correct parsing of Pandoc-style citations:
[prefix; @key; suffix]
Previously, parsing would have failed unless there was a space or other
special character as the last <prefix> character.