Inclusion of planning info (*DEADLINE*, *SCHEDULED*, and *CLOSED*) can
be controlled via the `p` export option: setting the option to `t` will
add all planning information in a *Plain* block below the respective
headline.
* Use a Span with class "title-reference" for the default
title-reference role.
* Use B.text to split up contents into Spaces, SoftBreaks, and Strs
for title-reference.
* Use Code with class "interpreted-text" instead of Span and Str for
unknown roles. (The RST writer has also been modified to round-trip
this properly.)
* Disallow blank lines in interpreted text.
* Backslash-escape now works in interpreted text.
* Backticks followed by alphanumerics no longer end interpreted text.
Closes #4811.
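The closing rules listed above can be illustrated with a small stand-alone scanner. This is a sketch, not the RST reader's actual Parsec code. `interpretedText` is a hypothetical name, and it models only three of the rules: backslash escapes, backticks followed by alphanumerics, and blank lines.

```haskell
import Data.Char (isAlphaNum)

-- | Hypothetical helper (not pandoc's actual parser): given the input that
-- follows an opening backtick, return the interpreted text and the rest of
-- the input, or Nothing if no valid closing backtick is found.
interpretedText :: String -> Maybe (String, String)
interpretedText = go []
  where
    go :: String -> String -> Maybe (String, String)
    -- A backslash escapes the next character, so "\`" is a literal backtick.
    go acc ('\\' : c : rest) = go (c : acc) rest
    go acc ('`' : rest)
      -- A backtick followed by an alphanumeric does not close the span.
      | startsAlnum rest = go ('`' : acc) rest
      | otherwise        = Just (reverse acc, rest)
    go acc ('\n' : rest)
      -- A blank line inside interpreted text makes the parse fail.
      | isBlank (takeWhile (/= '\n') rest) = Nothing
      | otherwise                          = go ('\n' : acc) rest
    go acc (c : rest) = go (c : acc) rest
    go _ [] = Nothing

    startsAlnum :: String -> Bool
    startsAlnum (c : _) = isAlphaNum c
    startsAlnum []      = False

    isBlank :: String -> Bool
    isBlank s = all (`elem` " \t") s
```

For example, `interpretedText "a`b c` d"` yields `Just ("a`b c", " d")`, because the first backtick is followed by `b` and therefore does not end the span.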
RST does not allow nested emphasis, links, or other inline
constructs.
Closes #4581, double parsing of links with URLs as
link text. This supersedes the earlier fix for #4581
in 6419819b46.
Fixes #4561, a bug parsing URLs inside emphasis.
Closes #4792.
Emphasis was not parsed when it followed directly after some block types
(e.g., lists).
The org reader uses a wrapper for the `parseFromString` function to
handle org-specific state. The last position of a character allowed
before emphasis was reset incorrectly in this wrapper. Emphasized text
was not recognized when placed directly after a block that the reader
parses using `parseFromString`.
Fixes: #4784
Text.Pandoc.Emoji now exports `emojiToInline`, which returns a Span inline containing the emoji character and some attributes with metadata (class `emoji`, attribute `data-emoji` with the emoji name). Previously, emojis (as supported in the Markdown and CommonMark readers, e.g. "😄")
were simply translated into the corresponding Unicode code point. By wrapping them in Span
nodes, we make it possible to do special handling such as giving them a special font
in HTML output. We also open up the possibility of treating them differently when the
`--ascii` option is selected (though that is not part of this commit).
Closes #4743.
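A sketch of the shape this produces, using minimal stand-in types rather than the real pandoc-types definitions (which use `Text`). The two-entry `emojis` table is hypothetical and far smaller than pandoc's.

```haskell
-- Stand-ins for pandoc's Attr and Inline (simplified, String-based).
type Attr = (String, [String], [(String, String)])

data Inline = Str String | Span Attr [Inline]
  deriving (Eq, Show)

-- | Hypothetical emoji table: name -> character.
emojis :: [(String, String)]
emojis = [("smile", "\x1F604"), ("heart", "\x2764")]

-- | Look up an emoji by name and wrap its character in a Span with class
-- "emoji" and a "data-emoji" attribute holding the name.
emojiToInline :: String -> Maybe Inline
emojiToInline name = mkSpan <$> lookup name emojis
  where
    mkSpan c = Span ("", ["emoji"], [("data-emoji", name)]) [Str c]
```

Unknown names yield `Nothing`, so a reader can fall back to treating `:name:` as ordinary text.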
Amusewiki disables <literal> tags for security reasons.
If users want similar behavior in pandoc, RawBlocks and RawInlines
can be removed or replaced with filters.
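For example, a filter could simply drop raw content. The sketch below uses stand-in `Block` constructors; a real pandoc filter would operate on the pandoc-types AST (for instance via `toJSONFilter` from `Text.Pandoc.JSON`), and RawInlines would be handled the same way.

```haskell
-- Stand-in block type: a raw block carries a format name and raw text.
data Block = Para String | RawBlock String String
  deriving (Eq, Show)

-- | Remove all raw blocks, similar in spirit to Amusewiki refusing
-- to pass <literal> content through.
stripRawBlocks :: [Block] -> [Block]
stripRawBlocks = filter (not . isRaw)
  where
    isRaw (RawBlock _ _) = True
    isRaw _              = False
```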
The characters allowed before and after emphasis can be configured via
`#+pandoc-emphasis-pre` and `#+pandoc-emphasis-post`, respectively. This
makes it possible to change which strings are recognized as emphasized text on a
per-document or even per-paragraph basis. The allowed characters must be
given as a (Haskell) string.
#+pandoc-emphasis-pre: "-\t ('\"{"
#+pandoc-emphasis-post: "-\t\n .,:!?;'\")}["
If the argument cannot be read as a string, the default value is
restored.
Closes: #4378
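A minimal sketch of the "read as a string, otherwise fall back" behaviour, using `readMaybe` from base. `readEmphasisChars` is a hypothetical name, and passing the default in as an argument simplifies how the reader restores it.

```haskell
import Data.Maybe (fromMaybe)
import Text.Read (readMaybe)

-- | Parse the argument of a line such as
--   #+pandoc-emphasis-pre: "-\t ('\"{"
-- as a Haskell string literal. If it cannot be read, return the
-- supplied default instead.
readEmphasisChars :: String -> String -> String
readEmphasisChars defaultChars arg = fromMaybe defaultChars (readMaybe arg)
```

Because the argument is read as a Haskell literal, escapes such as `\t` and `\"` are interpreted, which is why the examples above quote the whole set.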
Lists are parsed in linear instead of exponential time now.
Contents of block tags, such as <quote>, are parsed directly,
without storing it in a string and parsing with parseFromString.
Fixed a bug: headers did not terminate lists.
Muse allows indentation to indicate quotation or alignment,
but only on the top level, not within a <quote> or list.
This patch also simplifies the code by removing museInQuote
and museInList fields from the state structure.
Parsing of headers and indented paragraphs is attempted
only at the topmost level, instead of being aborted with guards.
Text::Amuse already explicitly requires it anyway.
Supporting block tags on the same line as contents makes
it hard to combine closing tag parsers with indentation parsers.
Being able to combine parsers is required for no-reparsing refactoring
of Muse reader.
List item contents are now parsed as blocks,
without resorting to parseFromString.
Only the first line of a paragraph has to
be indented now, just like in Emacs Muse
and Text::Amuse.
Definition lists are not refactored yet.
See also: issue #3865.
This is difficult to recreate with a modern version of Word, so I'm
using the file submitted with the bug report. It would be preferable
to find a smaller example with Latin characters, though, so as not to
confuse the issue being tested.
The change both improves performance and fixes a
regression whereby normal citations inside inline notes
were not parsed correctly.
Closes jgm/pandoc-citeproc#315.
This gives a pure way to insert an ersatz file into a FileTree.
In addition, we normalize paths both on insertion and on
lookup, so that "foo" and "./foo" will be judged equivalent.
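A simplified model of path-normalising insertion and lookup, using `normalise` from the filepath package. `FileTree` here is a plain map from path to contents and the function names are illustrative; pandoc's own `FileTree` carries more information per file.

```haskell
import qualified Data.Map.Strict as M
import System.FilePath (normalise)

-- | Simplified file tree: normalised path -> file contents.
newtype FileTree = FileTree (M.Map FilePath String)

emptyFileTree :: FileTree
emptyFileTree = FileTree M.empty

-- | Insert an ersatz file under its normalised path.
insertInFileTree :: FilePath -> String -> FileTree -> FileTree
insertInFileTree path contents (FileTree m) =
  FileTree (M.insert (normalise path) contents m)

-- | Look up a file, normalising the query path the same way, so that
-- "foo" and "./foo" refer to the same entry.
lookupInFileTree :: FilePath -> FileTree -> Maybe String
lookupInFileTree path (FileTree m) = M.lookup (normalise path) m
```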
The level of headers in included files can be shifted to a higher level
by specifying a minimum header level via the `:minlevel` parameter. E.g.
`#+include: "tour.org" :minlevel 1` will shift the headers in tour.org
such that the topmost headers become level 1 headers.
Fixes: #4154
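The shift can be sketched as follows: compute the offset between the requested minimum level and the topmost header level in the included file, then apply it to every header. The `Block` type is a stand-in and `shiftToMinLevel` is a hypothetical name.

```haskell
-- Stand-in block type; only headers matter here.
data Block = Header Int String | Para String
  deriving (Eq, Show)

-- | Shift headers so that the topmost (smallest-numbered) level becomes
-- minLevel, preserving relative depth. Other blocks are left unchanged.
shiftToMinLevel :: Int -> [Block] -> [Block]
shiftToMinLevel minLevel blocks =
  case [lvl | Header lvl _ <- blocks] of
    []     -> blocks
    levels -> map (shift (minLevel - minimum levels)) blocks
  where
    shift d (Header lvl txt) = Header (lvl + d) txt
    shift _ b                = b
```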
The org reader test file had grown large, to the point that editor
performance was negatively affected in some cases. The tests are spread
over multiple submodules, and re-combined into a tasty TestTree in the
main org reader test file.
* Deprecate `--strip-empty-paragraphs` option. Instead we now
use an `empty_paragraphs` extension that can be enabled on
the reader or writer. It is disabled by default.
* Add `Ext_empty_paragraphs` constructor to `Extension`.
* Revert "Docx reader: don't strip out empty paragraphs."
This reverts commit d6c58eb836.
* Implement `empty_paragraphs` extension in docx reader and writer,
opendocument writer, html reader and writer.
* Add tests for `empty_paragraphs` extension.
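A sketch of the default behaviour, with a hypothetical `Options` record standing in for the extension check: when `empty_paragraphs` is disabled, empty paragraphs are dropped; when it is enabled, they are kept.

```haskell
-- Stand-in blocks: a paragraph holds a list of inline strings.
data Block = Para [String] | CodeBlock String
  deriving (Eq, Show)

-- | Hypothetical flag mirroring the `empty_paragraphs` extension.
newtype Options = Options { emptyParagraphsEnabled :: Bool }

-- | Keep empty paragraphs only when the extension is enabled.
handleEmptyParagraphs :: Options -> [Block] -> [Block]
handleEmptyParagraphs opts
  | emptyParagraphsEnabled opts = id
  | otherwise                   = filter (not . isEmptyPara)
  where
    isEmptyPara (Para inlines) = null inlines
    isEmptyPara _              = False
```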
We now have the `--strip-empty-paragraphs` option for that,
if you want it. Closes #2252.
Updated docx reader tests.
We use stripEmptyParagraphs to avoid changing too
many tests. We should add new tests for empty paragraphs.
* Added underlineSpan builder function. This can be easily updated if needed. The purpose is for Readers to transform underlines consistently.
* Docx Reader: Use underlineSpan and update test
* Org Reader: Use underlineSpan and add test
* Textile Reader: Use underlineSpan and add test case
* Txt2Tags Reader: Use underlineSpan and update test
* HTML Reader: Use underlineSpan and add test case
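A sketch of the idea with stand-in types. The class name `underline` is an assumption here, and pandoc's real `underlineSpan` works on the `Inlines` builder type rather than on a plain list; `underlineWords` is a hypothetical reader-side helper.

```haskell
import Data.List (intersperse)

type Attr = (String, [String], [(String, String)])

data Inline = Str String | Space | Span Attr [Inline]
  deriving (Eq, Show)

-- | Wrap inlines in a Span with an underline class, so that every reader
-- encodes underlining the same way.
underlineSpan :: [Inline] -> Inline
underlineSpan = Span ("", ["underline"], [])

-- | Example reader step: underline a run of plain words.
underlineWords :: String -> Inline
underlineWords = underlineSpan . intersperse Space . map Str . words
```

Centralising the representation in one builder means that changing the encoding later touches a single definition rather than every reader.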
Removed `writerSourceURL` from `WriterOptions` (API change).
Added `stSourceURL` to `CommonState`.
It is set automatically by `setInputFiles`.
Text.Pandoc.Class now exports `setInputFiles`, `setOutputFile`.
The type of `getInputFiles` has changed; it now returns `[FilePath]`
instead of `Maybe [FilePath]`.
Functions in Class that formerly took the source URL as a parameter
now have one fewer parameter (`fetchItem`, `downloadOrRead`,
`setMediaResource`, `fillMediaBag`).
Removed `WriterOptions` parameter from `makeSelfContained` in
`SelfContained`.
The org reader was updated to match current org-mode behavior: the set
of characters which are acceptable to occur as the first or last
character in an org emphasis has been changed and now allows all
non-whitespace chars at the inner border of emphasized text (see
`org-emphasis-regexp-components`).
Fixes: #3933
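One way to express the new inner-border rule is as a predicate over the emphasized contents. `validEmphasisContents` is a hypothetical helper, and it ignores the separate pre/post characters that may surround the emphasis markers.

```haskell
import Data.Char (isSpace)

-- | The first and last characters of the emphasized contents must not be
-- whitespace; any other character, including punctuation, is accepted.
validEmphasisContents :: String -> Bool
validEmphasisContents [] = False
validEmphasisContents s@(c : _) = not (isSpace c) && not (isSpace (last s))
```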
Previously pandoc would sometimes combine two line blocks separated by blank lines, and ignore trailing blank lines within the line block.
The test output was checked for consistency with http://rst.ninjs.org/.
This rewrite is primarily motivated by the need to
get macros working properly. A side benefit is that the
reader is significantly faster (27s -> 19s in one
benchmark, and there is a lot of room for further
optimization).
We now tokenize the input text, then parse the token stream.
Macros modify the token stream, so they should now be effective
in any context, including math. Thus, we no longer need the clunky
macro processing capabilities of texmath.
A custom state LaTeXState is used instead of ParserState.
This, plus the tokenization, will require some rewriting
of the exported functions rawLaTeXInline, inlineCommand,
rawLaTeXBlock.
* Added Text.Pandoc.Readers.LaTeX.Types (new exported module).
Exports Macro, Tok, TokType, Line, Column. [API change]
* Text.Pandoc.Parsing: adjusted type of `insertIncludedFile`
so it can be used with token parser.
* Removed old texmath macro stuff from Parsing.
Use Macro from Text.Pandoc.Readers.LaTeX.Types instead.
* Removed texmath macro material from Markdown reader.
* Changed types for Text.Pandoc.Readers.LaTeX's
rawLaTeXInline and rawLaTeXBlock. (Both now return a String,
and they are polymorphic in state.)
* Added orgMacros field to OrgState. [API change]
* Removed readerApplyMacros from ReaderOptions.
Now we just check the `latex_macros` reader extension.
* Allow `\newcommand\foo{blah}` without braces.
Fixes #1390.
Fixes #2118.
Fixes #3236.
Fixes #3779.
Fixes #934.
Fixes #982.
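To illustrate why working on tokens lets macros take effect everywhere, here is a toy tokenizer and expander for argument-free macros. The `Tok` type is much simpler than the exported one (no positions, few token kinds), and `tokenize` and `expandMacros` are hypothetical names.

```haskell
import Data.Char (isAlpha)

-- | Toy token type: control sequences, words, single symbols, spaces.
data Tok = CtrlSeq String | Word String | Symbol Char | Spaces
  deriving (Eq, Show)

tokenize :: String -> [Tok]
tokenize [] = []
tokenize ('\\' : rest) =
  case span isAlpha rest of
    ("", c : more) -> CtrlSeq [c] : tokenize more   -- control symbol, e.g. \%
    ("", [])       -> [Symbol '\\']                 -- lone trailing backslash
    (name, rest')  -> CtrlSeq name : tokenize rest'
tokenize (c : rest)
  | c == ' '  = Spaces : tokenize (dropWhile (== ' ') rest)
  | isAlpha c = let (w, rest') = span isAlpha (c : rest) in Word w : tokenize rest'
  | otherwise = Symbol c : tokenize rest

-- | Splice macro definitions into the token stream. Since this runs on
-- tokens, it applies in any context, including math. (A self-referential
-- macro would loop forever; a real implementation must guard against that.)
expandMacros :: [(String, [Tok])] -> [Tok] -> [Tok]
expandMacros macros = concatMap expand
  where
    expand t@(CtrlSeq name) = maybe [t] (expandMacros macros) (lookup name macros)
    expand t                = [t]
```

For instance, with `\foo` defined as `bar`, the math span `$\foo$` becomes the tokens `$`, `bar`, `$` before any parsing happens.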
* XML.toEntities: changed type to Text -> Text.
* Shared.tabFilter -- fixed so it strips out CRs as before.
* Modified writers to take Text.
* Updated tests, benchmarks, trypandoc.
[API change]
Closes #3731.
The Emacs default is to include tags in the headline when exporting.
Instead of just empty spans, which contain the tag name as attribute,
tags are rendered as small caps and wrapped in those spans.
Non-breaking spaces serve as separators for multiple tags.
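A sketch of the rendering with stand-in types. The `tag` class and `tag-name` attribute key are assumptions for illustration; only the small caps, the wrapping span, and the non-breaking-space separator come from the description above.

```haskell
import Data.List (intersperse)

type Attr = (String, [String], [(String, String)])

data Inline = Str String | SmallCaps [Inline] | Span Attr [Inline]
  deriving (Eq, Show)

-- | Render each tag as small caps inside a span that records the tag name,
-- separating consecutive tags with a non-breaking space (U+00A0).
tagsToInlines :: [String] -> [Inline]
tagsToInlines = intersperse (Str "\x00A0") . map tagSpan
  where
    tagSpan t = Span ("", ["tag"], [("tag-name", t)]) [SmallCaps [Str t]]
```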
Until now, org-ref cite keys could include special characters at the
end. This caused problems when citations occurred right before colons or
at the end of a sentence.
With this change, all non-alphanumeric characters at the end of a cite
key are ignored.
This also adds `,` to the list of special characters that are legal
in cite keys to better mirror the behaviour of org-export.
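A sketch of the trimming step: read a candidate key from the allowed characters, then hand any trailing non-alphanumeric characters back to the surrounding text. `splitCiteKey` is a hypothetical name, and the exact set in `isCiteKeySpecial` is an assumption apart from the newly added `,`.

```haskell
import Data.Char (isAlphaNum)
import Data.List (dropWhileEnd)

-- | Non-alphanumeric characters allowed inside a key (illustrative set).
isCiteKeySpecial :: Char -> Bool
isCiteKeySpecial c = c `elem` "-_:\\./,"

-- | Split input into (cite key, remaining text); trailing special
-- characters such as a final colon or period stay in the text.
splitCiteKey :: String -> (String, String)
splitCiteKey s = (key, drop (length key) s)
  where
    candidate = takeWhile (\c -> isAlphaNum c || isCiteKeySpecial c) s
    key       = dropWhileEnd (not . isAlphaNum) candidate
```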
Emacs parses org documents into a tree structure, which is then
post-processed during exporting. The reader is changed to do the same,
turning the document into a single tree of headlines starting at
level 0.
Fixes: #3695
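A sketch of turning a flat sequence of headlines into one tree under an artificial level-0 root. `Headline` and `buildTree` are illustrative names, not the reader's actual types.

```haskell
-- | A headline with its nested children; the root sits at level 0.
data Headline = Headline
  { hLevel    :: Int
  , hTitle    :: String
  , hChildren :: [Headline]
  } deriving (Eq, Show)

-- | Build a single tree from (level, title) pairs in document order.
buildTree :: [(Int, String)] -> Headline
buildTree hs = Headline 0 "" (fst (children 0 hs))
  where
    -- Collect consecutive headlines deeper than `parent` as its children,
    -- returning them together with the unconsumed input.
    children :: Int -> [(Int, String)] -> ([Headline], [(Int, String)])
    children parent ((lvl, title) : rest)
      | lvl > parent =
          let (kids, rest')      = children lvl rest
              (siblings, rest'') = children parent rest'
          in (Headline lvl title kids : siblings, rest'')
    children _ rest = ([], rest)
```

Having one root makes post-processing (for example, applying export settings per subtree) a matter of walking a tree rather than re-scanning a flat list.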
Parsing of smart quotes and special characters can either be enabled via
the `smart` language extension or the `'` and `-` export options. Smart
parsing is active if either the extension or export option is enabled.
Only smart parsing of special characters (like ellipses and en and em
dashes) is enabled by default, while smart quotes are disabled.
This means that all smart parsing features will be enabled by adding the
`smart` language extension. Fine-grained control is possible by leaving
the language extension disabled. In that case, smart parsing is
controlled via the aforementioned export OPTIONS only.
Previously, all smart parsing was disabled unless the language extension
was enabled.
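The rule "either the extension or the export option" can be written as two small predicates. The record and its field names are hypothetical; the defaults follow the description above.

```haskell
-- | Hypothetical settings: the `smart` extension plus the org export
-- options ' (smart quotes) and - (special strings).
data SmartSettings = SmartSettings
  { smartExtension       :: Bool
  , exportSmartQuotes    :: Bool
  , exportSpecialStrings :: Bool
  }

-- | Defaults as described: extension off, quotes off, special strings on.
defaultSmartSettings :: SmartSettings
defaultSmartSettings = SmartSettings False False True

-- | Each feature is active if the extension or its own option is enabled.
smartQuotesActive, specialStringsActive :: SmartSettings -> Bool
smartQuotesActive s    = smartExtension s || exportSmartQuotes s
specialStringsActive s = smartExtension s || exportSpecialStrings s
```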
Source block parameter names are no longer prefixed with *rundoc*. This
was intended to simplify working with the rundoc project, a babel
runner. However, the rundoc project is unmaintained, and adding those
markers is not the reader's job anyway.
The original language that is specified for a source element is now
retained as the `data-org-language` attribute and only added if it
differs from the translated language.
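A sketch of the attribute logic. `translateLang` covers only two example mappings chosen for illustration; the real table is larger, and `sourceLangAttr` is a hypothetical name.

```haskell
-- | Illustrative subset of the org-to-pandoc language name translation.
translateLang :: String -> String
translateLang "sh"         = "bash"
translateLang "emacs-lisp" = "commonlisp"
translateLang other        = other

-- | The translated language becomes a class; the original name is kept
-- as data-org-language only when translation changed it.
sourceLangAttr :: String -> ([String], [(String, String)])
sourceLangAttr lang =
  let translated = translateLang lang
  in ([translated], [("data-org-language", lang) | translated /= lang])
```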
The line-numbering switch that can be given to source blocks (`-n` with
a start number as an optional parameter) is parsed and translated to a
class/key-value combination used by highlighting and other readers and
writers.
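A sketch of the translation, assuming the `numberLines` class and `startFrom` key that pandoc uses for numbered code blocks elsewhere. `lineNumberAttr` is a hypothetical helper that looks only at `-n`.

```haskell
import Data.Char (isDigit)

type Attr = (String, [String], [(String, String)])

-- | Translate source-block switches such as ["-n", "10"] into the
-- numberLines class and, when a numeric start is given, a startFrom key.
lineNumberAttr :: [String] -> Attr
lineNumberAttr switches = ("", classes, kvs)
  where
    (classes, kvs) = case dropWhile (/= "-n") switches of
      ("-n" : n : _) | not (null n) && all isDigit n ->
        (["numberLines"], [("startFrom", n)])
      ("-n" : _) -> (["numberLines"], [])
      _          -> ([], [])
```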