This fixes a bug where pandoc would stop parsing a URI with an
empty attribute: for example, `&a=&b=` wolud stop at `a`.
(The uri parser tries to guess which punctuation characters
are part of the URI and which might be punctuation after it.)
Closes#4068.
Previously we got a crash, because we were trying to print
a native cmark STRIKETHROUGH node, and the commonmark writer
in cmark-github doesn't support this. Work around this by
using a raw node to add the strikethrough delimiters.
Closes#4038.
* Move as much as possible to the CSS in the template.
* Ensure that all the HTML-based templates (including epub)
contain the CSS for columns.
* Columns default to 50% width unless they are given a width
attribute.
Closes#4028.
* Remove "width" attribute which is not allowed on div.
* Remove space between `<div class="column">` elements,
since this prevents columns whose widths sum to 100%
(the space takes up space).
Closes#4028.
and other non-HTML formats (`Text.Pandoc.Readers.HTML.htmlTag`).
The parser stopped at the first `>` character, even if it wasn't
the end of the comment.
Closes#4019.
Previously `\include` wouldn't work if the included file
contained, e.g., a begin without a matching end.
We've changed the Tok type so that it stores a full SourcePos,
rather than just a line and column. So tokens keeep track
of the file they came from. This allows us to use a simpler
method for includes, which doesn't require parsing the included
document as a whole.
Closes#3971.
* Options: Added readerStripComments to ReaderOptions.
* Added `--strip-comments` command-line option.
* Made `htmlTag` from the HTML reader sensitive to this feature.
This affects Markdown and Textile input.
Closes#2552.
Div's are difficult to translate into org syntax, as there are multiple
div-like structures (drawers, special blocks, greater blocks) which all
have their advantages and disadvantages. Previously pandoc would
use raw HTML to preserve the full div information; this was rarely
useful and resulted in visual clutter. Div-rendering was changed to
discard the div's classes and key-value pairs if there is no natural way
to translate the div into an org structure.
Closes: #3771
Closes#3511.
Previously pandoc used the four-space rule: continuation paragraphs,
sublists, and other block level content had to be indented 4
spaces. Now the indentation required is determined by the
first line of the list item: to be included in the list item,
blocks must be indented to the level of the first non-space
content after the list marker. Exception: if are 5 or more spaces
after the list marker, then the content is interpreted as an
indented code block, and continuation paragraphs must be indented
two spaces beyond the end of the list marker. See the CommonMark
spec for more details and examples.
Documents that adhere to the four-space rule should, in most cases,
be parsed the same way by the new rules. Here are some examples
of texts that will be parsed differently:
- a
- b
will be parsed as a list item with a sublist; under the four-space
rule, it would be a list with two items.
- a
code
Here we have an indented code block under the list item, even though it
is only indented six spaces from the margin, because it is four spaces
past the point where a continuation paragraph could begin. With the
four-space rule, this would be a regular paragraph rather than a code
block.
- a
code
Here the code block will start with two spaces, whereas under
the four-space rule, it would start with `code`. With the four-space
rule, indented code under a list item always must be indented eight
spaces from the margin, while the new rules require only that it
be indented four spaces from the beginning of the first non-space
text after the list marker (here, `a`).
This change was motivated by a slew of bug reports from people
who expected lists to work differently (#3125, #2367, #2575, #2210,
#1990, #1137, #744, #172, #137, #128) and by the growing prevalance
of CommonMark (now used by GitHub, for example).
Users who want to use the old rules can select the `four_space_rule`
extension.
* Added `four_space_rule` extension.
* Added `Ext_four_space_rule` to `Extensions`.
* `Parsing` now exports `gobbleAtMostSpaces`, and the type
of `gobbleSpaces` has been changed so that a `ReaderOptions`
parameter is not needed.
Acronyms are not resolved by the reader, but acronym and glossary information is put into attributes on Spans so that they can be processed in filters.
The structure expected is:
<div class="columns">
<div class="column" width="40%">
contents...
</div>
<div class="column" width="60%">
contents...
</div>
</div>
Support has been added for beamer and all HTML slide formats.
Closes#1710.
Note: later we could add a more elegant way to create
this structure in Markdown than to use raw HTML div elements.
This would come for free with a "native div syntax" (#168).
Or we could devise something specific to slides
We assume that comments are defined as parsed by the
docx reader:
I want <span class="comment-start" id="0" author="Jesse Rosenthal"
date="2016-05-09T16:13:00Z">I left a comment.</span>some text to
have a comment <span class="comment-end" id="0"></span>on it.
We assume also that the id attributes are unique and properly
matched between comment-start and comment-end.
Closes#2994.
Previously they would be transmitted to the template without
any escaping.
Note that `--M title='*foo*'` yields a different result from
---
title: *foo*
---
In the latter case, we have emphasis; in the former case, just
a string with literal asterisks (which will be escaped
in formats, like Markdown, that require it).
Closes#3792.
Also, fix regular macros so they're expanded at the
point of use, and NOT also the point of definition.
`\let` macros, by contrast, are expanded at the
point of definition. Added an `ExpansionPoint`
field to `Macro` to track this difference.
We previously did this only with raw blocks, on the assumption
that math environments would always be raw blocks. This has changed
since we now parse them as inline environments.
Closes#3816.
Thus, a span with attribute 'foo' gets written to HTML5
with 'data-foo', so it is valid HTML5.
HTML4 is not affected.
This will allow us to use custom attributes in pandoc without
producing invalid HTML.
Fixed applyMacros so that it operates on the whole
string, not just the first token!
Don't remove macro definitions from the output,
even if Ext_latex_macros is set, so that macros will
be applied. Since they're only applied to math in
Markdown, removing the macros can have bad effects.
Even for math macros, keeping them should be harmless.
This rewrite is primarily motivated by the need to
get macros working properly. A side benefit is that the
reader is significantly faster (27s -> 19s in one
benchmark, and there is a lot of room for further
optimization).
We now tokenize the input text, then parse the token stream.
Macros modify the token stream, so they should now be effective
in any context, including math. Thus, we no longer need the clunky
macro processing capacities of texmath.
A custom state LaTeXState is used instead of ParserState.
This, plus the tokenization, will require some rewriting
of the exported functions rawLaTeXInline, inlineCommand,
rawLaTeXBlock.
* Added Text.Pandoc.Readers.LaTeX.Types (new exported module).
Exports Macro, Tok, TokType, Line, Column. [API change]
* Text.Pandoc.Parsing: adjusted type of `insertIncludedFile`
so it can be used with token parser.
* Removed old texmath macro stuff from Parsing.
Use Macro from Text.Pandoc.Readers.LaTeX.Types instead.
* Removed texmath macro material from Markdown reader.
* Changed types for Text.Pandoc.Readers.LaTeX's
rawLaTeXInline and rawLaTeXBlock. (Both now return a String,
and they are polymorphic in state.)
* Added orgMacros field to OrgState. [API change]
* Removed readerApplyMacros from ReaderOptions.
Now we just check the `latex_macros` reader extension.
* Allow `\newcommand\foo{blah}` without braces.
Fixes#1390.
Fixes#2118.
Fixes#3236.
Fixes#3779.
Fixes#934.
Fixes#982.
If the metadata field is all on one line, we try to interpret
it as Inlines, and only try parsing as Blocks if that fails.
If it extends over one line (including possibly the `|` or
`>` character signaling an indented block), then we parse as
Blocks.
This was motivated by some German users finding that
date: '22. Juin 2017'
got parsed as an ordered list.
Closes#3755.
Note that if the table has a first page header and a
continuation page header, the notes will appear only
on the first occurrence of the header.
Closes#2378.
Note that as a result of this change, the following,
which formerly produced a header with two lines separated
by a line break, will now produce a header followed by a
paragraph:
# Hi\
there
This may affect some existing documents that relied on
this undocumented and unintended behavior.
This change makes pandoc more consistent with other
Markdown implementations, and with itself (since the two-space
version of a line break doesn't work inside ATX headers, and
neither version works inside Setext headers).
Closes#3730.
Currently we only handle the form `0.9\linewidth`.
Anything else would have to be converted to a percentage,
using some kind arbitrary assumptions about line widths.
See #3709.
Babel result blocks can have block attributes like captions and names.
Result blocks with attributes were not recognized and were parsed as
normal blocks without attributes.
Fixes: #3706
With `--reference-location` of `section` or `block`, pandoc
will now repeat references that have been used in earlier
sections.
The Markdown reader has also been modified, so that *exactly*
repeated references do not generate a warning, only
references with the same label but different targets.
The idea is that, with references after every block,
one might want to repeat references sometimes.
Closes#3701.
- Export `inEm` from ImageSize [API change].
- Change `showFl` and `show` instance for `Dimension` so
extra decimal places are omitted.
- Added `Em` as a constructor of `Dimension` [API change].
- Allow `em`, `cm`, `in` to pass through without conversion
in HTML, LaTeX.
Closes#3450.
This is now the default for pandoc's Markdown.
It allows whitespace between the two parts of a
reference link: e.g.
[a] [b]
[b]: url
This is now forbidden by default.
Closes#2602.
This is a verison of parseFromString specialied to
ParserState, which resets stateLastStrPos at the end.
This is almost always what we want.
This fixes a bug where `_hi_` wasn't treated as emphasis in
the following, because pandoc got confused about the
position of the last word:
- [o] _hi_
Closes#3690.
Support for the `#+INCLUDE:` file inclusion mechanism was added.
Recognized include types are *example*, *export*, *src*, and normal org
file inclusion. Advanced features like line numbers and level selection
are not implemented yet.
Closes: #3510
The grid table parsers for markdown and rst was combined into one single
parser, slightly changing parsing behavior of both parsers:
- The markdown parser now compactifies block content cell-wise: pure
text blocks in cells are now treated as paragraphs only if the cell
contains multiple paragraphs, and as plain blocks otherwise. Before,
this was true only for single-column tables.
- The rst parser now accepts newlines and multiple blocks in header
cells.
Closes: #3638
Previously we inadvertently interpreted indented HTML as
code blocks. This was a regression.
We now seek to determine the indentation level of the contents
of an HTML block, and (optionally) skip that much indentation.
As a side effect, indentation may be stripped off of raw
HTML blocks, if `markdown_in_html_blocks` is used. This
is better than having things interpreted as indented code
blocks.
Closes#1841.
* Fix keyval funtion: pandoc did not parse options in braces correctly. Additionally, dot, dash, and colon were no valid characters
* Add | as possible option value
* Improved code
Previously the Markdown writer would sometimes create links where there
were none in the source. This is now avoided by selectively escaping bracket
characters when they occur in a place where a link might be created.
Closes#3619.
Ensure that we do not generate reference links
whose labels differ only by case.
Also allow implicit reference links when the link
text and label are identical up to case.
Closes#3615.
Previously the LaTeX writer created invalid LaTeX
when `--listings` was specified and a code span occured
inside emphasis or another construction.
This is because the characters `%{}\` must be escaped
in lstinline when the listinline occurs in another
command, otherwise they must not be escaped.
To deal with this, adoping Michael Kofler's suggestion,
we always wrap lstinline in a dummy command `\passthrough`,
now defined in the default template if `--listings` is
specified. This way we can consistently escape the
special characters.
Closes#1629.
A figure with two subfigures turns into two pandoc
figures; the subcaptions are used and the main caption
ignored, unless there are no subcaptions.
Closes#3577.
Previously we always added an empty div before the list
item, but this created problems with spacing in tight
lists. Now we do this:
If the list item contents begin with a Plain block,
we modify the Plain block by adding a Span around
its contents.
Otherwise, we add a Div around the contents of the
list item (instead of adding an empty Div to the
beginning, as before).
Closes#3596.
* Fix lstinline handling: lstinline with braces can be used (verb cannot be used with braces)
* Use codeWith and determine the language from lstinline
* Improve code
* Add another test: convert lstinline without language option
This is enabled by default in pandoc and GitHub markdown but not the
other flavors.
This requirse a space between the opening #'s and the header
text in ATX headers (as CommonMark does but many other implementations
do not). This is desirable to avoid falsely capturing things ilke
#hashtag
or
#5Closes#3512.
This was failing because of a small discrepancy in markdown
table header line lengths on appveyor.
It's a minor issue, I can't see what is causing it, and
it's irrelevant to the issue this is testing, so we'll
just write native for this test.
Closes#1905.
Removed stateChapters from ParserState.
Now we parse chapters as level 0 headers, and parts as level -1 headers.
After parsing, we check for the lowest header level, and if it's
less than 1 we bump everything up so that 1 is the lowest header level.
So `\part` will always produce a header; no command-line options
are needed.
Previously only autogenerated ids were added to the list
of header identifiers in state, so explicit ids weren't taken
into account when generating unique identifiers. Duplicated
identifiers could result.
This simple fix ensures that explicitly given identifiers are
also taken into account.
Fixes#1745.
Note some limitations, however. An autogenerated identifier
may still coincide with an explicit identifier that is given
for a header later in the document, or with an identifier on
a div, span, link, or image. Fixing this would be much more
difficult, because we need to run `registerHeader` before
we have the complete parse tree (so we can't get a complete
list of identifiers from the document by walking the tree).
However, it might be worth issuing warnings for duplicate
header identifiers; I think we can do that. It is not
common for headers to have the same text, and the issue
can always be worked around by adding explicit identifiers,
if the user is aware of it.
* Previously we got overlong lists with `--wrap=none`. This is fixed.
* Previously a multiline list could become a simple list (and would
always become one with `--wrap=none`).
Closes#3384.
when they occur without space surrounding them.
E.g. equation, math.
This avoids incorrect vertical space around equations.
Closes#3309.
Closes#2171.
See also rstudio/bookdown#358.
E.g. an HTML table with two cells in the first row and one
in the second (but no row/colspan).
We now calculate the number of columns based on the longest
row (or the length of aligns or widths).
Closes#3337.
When multiple YAML metadata blocks are used, and two define
the same field, the value defined first takes precedence,
according to the manual. This was changed briefly in
ba3ee62323. This commit
reverts to the original behavior and adds a test case.
Added test cases.
Fixed HTML reader to parse a span with class "smallcaps" as
SmallCaps.
Fixed Markdown writer to render SmallCaps as a native span
when native spans are enabled.
The citations appear at the end of the document as a definition
list in a special div with id `citations`.
Citations link to the definitions.
Added stateCitations to ParserState.
Closes#853.
Previously we would refuse to parse anything as raw inline if
it was in the blockCommands list. Now we allow exceptions
if they're listed under ignoreInlines in inlineCommands.
This should make it easier e.g. to include an \hspace
between two side-by-side raw LaTeX tables.
Otherwise things like `\noindent foo` break and turn into
`\noindentfoo`.
Affects `-f latex+raw_tex` and `-f markdown` (and other formats
that allow `raw_tex`).
Closes#1773.
For example, in
\begin{tabular}{>{$}l<{$}>{$}l<{$} >{$}l<{$}}
each cell will be interpreted as if it has a `$`
before its content and a `$` after (math mode).
These were confusing.
Now we rely on the +raw_tex or +raw_html extension with latex
or html input.
Thus, instead of
--parse-raw -f latex
we use
-f latex+raw_tex
and instead of
--parse-raw -f html
we use
-f html+raw_html