With this change, autolinks are parsed as Links with
the `uri` class. (The same is true for bare links, if
the `autolink_bare_uris` extension is enabled.) Email
autolinks are parsed as Links with the `email` class.
This allows the distinction to be represented in the
URI.
Formerly the `uri` class was added to autolinks by
the HTML writer, but it had to guess what was an autolink
and could not distinguish `[http://example.com](http://example.com)`
from `<http://example.com>`. It also incorrectly recognized
`[pandoc](pandoc)` as an autolink. Now the HTML writer
simply passes through the `uri` attribute if it is present,
but does not add anything.
The Textile writer has been modified so that the `uri`
class is not explicitly added for autolinks, even if it
is present.
Closes#4913.
Inclusion of planning info (*DEADLINE*, *SCHEDULED*, and *CLOSED*) can
be controlled via the `p` export option: setting the option to `t` will
add all planning information in a *Plain* block below the respective
headline.
* Use a Span with class "title-reference" for the default
title-reference role.
* Use B.text to split up contents into Spaces, SoftBreaks, and Strs
for title-reference.
* Use Code with class "interpreted-text" instead of Span and Str for
unknown roles. (The RST writer has also been modified to round-trip
this properly.)
* Disallow blank lines in interpreted text.
* Backslash-escape now works in interpreted text.
* Backticks followed by alphanumerics no longer end interpreted text.
Closes#4811.
Closes#4803
After this commit use `$titleblock$` in order to get what was contained
in `$title$` before, that is a title and subtitle rendered according to
the official rST method:
http://docutils.sourceforge.net/docs/user/rst/quickstart.html#document-title-subtitle. from
With this commit, the `$title$` and `$subtitle$` metadata are available and they
simply carry the metadata values. This opens up more possibilities in templates.
Exposes a function converting which flattenes a list of blocks into a
list of inlines. An example use case would be the conversion of Note
elements into other inlines.
RST does not allow nested emphasis, links, or other inline
constructs.
Closes#4581, double parsing of links with URLs as
link text. This supersedes the earlier fix for #4581
in 6419819b46.
Fixes#4561, a bug parsing with URLs inside emphasis.
Closes#4792.
Emphasis was not parsed when it followed directly after some block types
(e.g., lists).
The org reader uses a wrapper for the `parseFromString` function to
handle org-specific state. The last position of a character allowed
before emphasis was reset incorrectly in this wrapper. Emphasized text
was not recognized when placed directly behind a block which the reader
parses using `parseFromString`.
Fixes: #4784
Text.Pandoc.Emoji now exports `emojiToInline`, which returns a Span inline containing the emoji character and some attributes with metadata (class `emoji`, attribute `data-emoji` with emoji name). Previously, emojis (as supported in Markdown and CommonMark readers, e.g "😄")
were simply translated into the corresponding unicode code point. By wrapping them in Span
nodes, we make it possible to do special handling such as giving them a special font
in HTML output. We also open up the possibility of treating them differently when the
`--ascii` option is selected (though that is not part of this commit).
Closes#4743.
Removed `--latexmathml`, `--gladtex`, `--mimetex`, `--jsmath`, `-m`,
`--asciimathml` options.
Removed `JsMath`, `LaTeXMathML`, and `GladTeX` constructors from
`Text.Pandoc.Options.HTMLMathMethod` [API change].
Removed unneeded data file LaTeXMathML.js and updated tests.
Bumped version to 2.2.
Previously we used an odd mix of 3- and 4-space indentation.
Now we use 3-space indentation, except for ordered lists,
where indentation must depend on the width of the list marker.
Closes#4563.
HorizontalRule corresponds to <hr> element in the default output
format, HTML. Current HTML standard defines <hr> element as
"paragraph-level thematic break". In typography it is often
represented by extra space or centered asterism ("⁂"), but since
FB2 does not support text centering, empty line (similar to extra space)
is the only solution.
Line breaks, on the other hand, don't generate <empty-line />
anymore. Previously line breaks generated <empty-line /> element
inside paragraph, which is not allowed. So, this commit addresses
issue #2424 ("FB2 produced by pandoc doesn't validate").
FB2 does not have a way to represent line breaks inside paragraphs.
They are replaced with LF character, which is not rendered by
FB2 readers, but at least preserves some information.
Tests speaker notes appearing after (and inside of) separating blocks.
Output checked on Windows10 (archlinux virtualbox), PowerPoint
2013. Not corrupted, and output as expected.
This patch fixes some cases where the JATS writer was introducing
semantically significant whitespace by indenting and wrapping tags.
Note that the JATS spec has a content model for `<p>` tags of `(#PCDATA | ...`.
Any tag where `#PCDATA` children are possible should not have any
indentation. The same is true for `<th>`, `<td>`, `<term>`, `<label>`.
Amusewiki disables <literal> tags for security reasons.
If user wants similar behavior in pandoc, RawBlocks and RawInlines
can be removed or replaced with filters.
These were never added when the tests were first created.
Output files checked in MS PowerPoint 2013 (Windows 10, VBox). No
corruption, and output as expected.
Make sure there are no empty slides in the pptx output. Because of the
way that slides were split, these could be accidentally produced by
comments after images.
When animations are added, there will be a way to add an empty slide
with either incremental lists or pauses.
Test outputs checked with MS PowerPoint (Office 2013, Windows 10,
VBox). Both files have expected output and are not corrupted.
The name of the Lua script which is executed is made available in the
global Lua variable `PANDOC_SCRIPT_FILE`, both for Lua filters and
custom writers.
Closes: #4393
The characters allowed before and after emphasis can be configured via
`#+pandoc-emphasis-pre` and `#+pandoc-emphasis-post`, respectively. This
allows to change which strings are recognized as emphasized text on a
per-document or even per-paragraph basis. The allowed characters must be
given as (Haskell) string.
#+pandoc-emphasis-pre: "-\t ('\"{"
#+pandoc-emphasis-post: "-\t\n .,:!?;'\")}["
If the argument cannot be read as a string, the default value is
restored.
Closes: #4378
Modify the PowerPoint tests to run all the tests with
template (--reference-doc) as well. Because there are so many
interlocking pieces, bugs can pop up in weird places when using
templates, since it changes how the writer builds its output
file.
For example, I recently discovered a bug in which speaker notes worked
fine and templating worked fine elsewhere, but templating with speaker
notes produced a file that would crash MS PowerPoint. That particular
bug was fixed, but this will forces us to check for that with each new
change.
Lists are parsed in linear instead of exponential time now.
Contents of block tags, such as <quote>, is parsed directly,
without storing it in a string and parsing with parseFromString.
Fixed a bug: headers did not terminate lists.
Muse allows indentation to indicate quotation or alignment,
but only on the top level, not within a <quote> or list.
This patch also simplifies the code by removing museInQuote
and museInList fields from the state structure.
Headers and indented paragraphs are attempted to be parsed
only at the topmost level, instead of aborting parsing with guards.
Text::Amuse already explicitly requires it anyway.
Supporting block tags on the same line as contents makes
it hard to combine closing tag parsers with indentation parsers.
Being able to combine parsers is required for no-reparsing refactoring
of Muse reader.
These are based off the reader tests, with some removed (where the
reader output was identical, based on different docx inputs). There
are still more to be added. In particular, tests for custom-styles
need to be added.
All golden docx files have been checked in MS Word
2013 (windows). There is no corruption.
There is questionable output in the `tables` test: the three tables
seemed to be joined. This will be addressed in a future commit, and
the golden docx file will be changed.
There is very little pptx-specific in these tests, so we abstract out
the basic testing function so it can be used for docx as well. This
should allow us to catch some errors in the docx writer that slipped
by the roundtrip testing.
Fixes#2609.
This PR introduces the new-style section headings: `\section[my-header]{My Header}` -> `\section[title={My Header},reference={my-header}]`.
On top of this, the ConTeXt writer now supports the `--section-divs` option to write sections in the fenced style, with `\startsection` and `\stopsection`.
We had previously re-read the native file and converted it to
Powerpoint. But we have already done that in constructing the test
archive. So now we just convert the archive back to a bytestring and
write it to disk.
Previously we had tested certain properties of the output PowerPoint
slides. Corruption, though, comes as the result of a numebr of
interrelated issues in the output pptx archive. This is a new
approach, which compares the output of the Powerpoint writer with
files that we know to (a) not be corrupt, and (b) to show the desired
output behavior (details below). This commit introduces three tests
using the new framework. More will follow.
The test procedure: given a native file and a pptx file, we generate a
pptx archive from the native file, and then test:
1. Whether the same files are in the two archives
2. Whether each of the contained xml files is the same. (We skip time
entries in `docProps/core.xml`, since these are derived from IO. We
just check to make sure that they're there in the same way in both
files.)
3. Whether each of the media files is the same.
Note that steps 2 and 3, though they compare multiple files, are one
test each, since the number of files depends on the input file (if
there is a failure, it will only report the first failed file
comparison in the test failure).
Now list item contents is parsed as blocks,
without resorting to parseFromString.
Only the first line of paragraph has to
be indented now, just like in Emacs Muse
and Text::Amuse.
Definition lists are not refactored yet.
See also: issue #3865.
This is difficult to recreate with a modern version of Word, so I'm
using the file submitted with the bug report. It would be preferable
to find a smaller example with Latin characters, though, so as not to
confuse the issue being tested.
The change both improves performance and fixes a
regression whereby normal citations inside inline notes
were not parsed correctly.
Closesjgm/pandoc-citeproc#315.
Elements with attributes got an additional `attr` accessor. Attributes
were accessible only via the `identifier`, `classes`, and `attributes`,
which was in conflict with the documentation, which indirectly states
that such elements have the an `attr` property.
Every constructor which accepts a list of blocks now also accepts a
single block element for convenience. Furthermore, strings are accepted as
shorthand for `{pandoc.Str "text"}` in constructors.
We had previously defaulted to slideLevel 2. Now we use the correct
behavior of defaulting to the highest level header followed by
content. We change an expected test result to match this behavior.
This gives a pure way to insert an ersatz file into a FileTree.
In addition, we normalize paths both on insertion and on
lookup, so that "foo" and "./foo" will be judged equivalent.
This is the beginning of a test suite for the powerpoint
writer. Initial tests are for the number of slides.
Note that at the moment it does not test against corruption in
Microsoft PowerPoint; it just tests that certain outcomes work as
expected. More tests will be added.
This test framework uses the PandocPure monad introduced with Pandoc 2.0.
The level of headers in included files can be shifted to a higher level
by specifying a minimum header level via the `:minlevel` parameter. E.g.
`#+include: "tour.org" :minlevel 1` will shift the headers in tour.org
such that the topmost headers become level 1 headers.
Fixes: #4154
The org reader test file had grown large, to the point that editor
performance was negatively affected in some cases. The tests are spread
over multiple submodules, and re-combined into a tasty TestTree in the
main org reader test file.
The same init file (`data/init`) that is used to setup the Lua
interpreter for Lua filters is also used to setup the interpreter of
custom writers.lua.
Support writing <fig> and <table-wrap> elements with <title> and
<caption> inside them by using Divs with class set to on of
fig, table-wrap or cation. The title is included as a Heading
so the constraint on where Heading can occur is also relaxed.
Also leaves out empty alt attributes on links.
The integration with Lua's package/module system is improved: A
pandoc-specific package searcher is prepended to the searchers in
`package.searchers`. The modules `pandoc` and `pandoc.mediabag` can now
be loaded via `require`.
* Deprecate `--strip-empty-paragraphs` option. Instead we now
use an `empty_paragraphs` extension that can be enabled on
the reader or writer. By default, disabled.
* Add `Ext_empty_paragraphs` constructor to `Extension`.
* Revert "Docx reader: don't strip out empty paragraphs."
This reverts commit d6c58eb836.
* Implement `empty_paragraphs` extension in docx reader and writer,
opendocument writer, html reader and writer.
* Add tests for `empty_paragraphs` extension.
We now have the `--strip-empty-paragraphs` option for that,
if you want it. Closes#2252.
Updated docx reader tests.
We use stripEmptyParagraphs to avoid changing too
many tests. We should add new tests for empty paragraphs.