Starting in 2.2.2, everything after an `\input` (or `\include`)
in a markdown file would be parsed as raw LaTeX.
This commit fixes the issue and adds a regression test.
Closes#4781.
Text.Pandoc.Emoji now exports `emojiToInline`, which returns a Span inline containing the emoji character and some attributes with metadata (class `emoji`, attribute `data-emoji` with emoji name). Previously, emojis (as supported in Markdown and CommonMark readers, e.g "😄")
were simply translated into the corresponding unicode code point. By wrapping them in Span
nodes, we make it possible to do special handling such as giving them a special font
in HTML output. We also open up the possibility of treating them differently when the
`--ascii` option is selected (though that is not part of this commit).
Closes#4743.
Non-ascii characters were not stripped from identifiers even if the
`ascii_identifiers` extension was enabled (which is is by default for
gfm).
Closes#4742
* Markdown writer now includes a blank line at the end
of the row in a single-row multiline table, to prevent it from being
interpreted as a simple table. Closes#4578.
* Markdown reader does a better job computing the relative width of
the last column in a multiline table, so we can round-trip tables
without constantly shrinking the last column.
Previously we used an odd mix of 3- and 4-space indentation.
Now we use 3-space indentation, except for ordered lists,
where indentation must depend on the width of the list marker.
Closes#4563.
* Annotate gridTable code with comments and abstract small functions
* Don't wrap lines in tables when `--wrap=none`. Instead, expand cells, even if
it results in cells that don't respect relative widths or surpass page column width.
* This change affects RST, Markdown, and Haddock writers.
and not in the title. If it's in the title, then we get
a titlebar on slides with the `plain` attribute, when
the id is non-null. This fixes a regression from 1.9.x.
Closes#4307.
Fixes#2609.
This PR introduces the new-style section headings: `\section[my-header]{My Header}` -> `\section[title={My Header},reference={my-header}]`.
On top of this, the ConTeXt writer now supports the `--section-divs` option to write sections in the fenced style, with `\startsection` and `\stopsection`.
Closes#4281.
Previously we allowed "nonindent spaces" before the
opening and closing `:::`, but this interfered with
list parsing, so now we require the fences to be
flush with the margin of the containing block.
rst2latex.py uses an align* environment for math in
`.. math::` blocks, so this math may contain line breaks.
If it does, we put the math in an `aligned` environment
to simulate rst2latex.py's behavior.
Closes#4254.
The change both improves performance and fixes a
regression whereby normal citations inside inline notes
were not parsed correctly.
Closesjgm/pandoc-citeproc#315.
even if the `latex_macros` extension is set.
This reverts to earlier behavior and is probably safer
on the whole, since some macros only modify things in
included packages, which pandoc's macro expansion can't
modify.
Closes#4246.
Don't pass through macro definitions themselves when `latex_macros`
is set. The macros have already been applied.
If `latex_macros` is enabled, then `rawLaTeXBlock` in
Text.Pandoc.Readers.LaTeX will succeed in parsing a macro definition,
and will update pandoc's internal macro map accordingly, but the
empty string will be returned.
Together with earlier changes, this closes#4179.
Previously we erroneously included the enclosing
backticks in a reference ID (closes#4156).
This change also disables interpretation of
syntax inside references, as in docutils.
So, there is no emphasis in
`my *link*`_
* Deprecate `--strip-empty-paragraphs` option. Instead we now
use an `empty_paragraphs` extension that can be enabled on
the reader or writer. By default, disabled.
* Add `Ext_empty_paragraphs` constructor to `Extension`.
* Revert "Docx reader: don't strip out empty paragraphs."
This reverts commit d6c58eb836.
* Implement `empty_paragraphs` extension in docx reader and writer,
opendocument writer, html reader and writer.
* Add tests for `empty_paragraphs` extension.
Previously both needed to be specified (unless the image was
being resized to be smaller than its original size).
If height but not width is specified, we now set width to
textwidth (and similarly if width but not height is specified).
Since we have keepaspectratio, this yields the desired result.
This fixes a bug where pandoc would stop parsing a URI with an
empty attribute: for example, `&a=&b=` wolud stop at `a`.
(The uri parser tries to guess which punctuation characters
are part of the URI and which might be punctuation after it.)
Closes#4068.
Previously we got a crash, because we were trying to print
a native cmark STRIKETHROUGH node, and the commonmark writer
in cmark-github doesn't support this. Work around this by
using a raw node to add the strikethrough delimiters.
Closes#4038.
* Move as much as possible to the CSS in the template.
* Ensure that all the HTML-based templates (including epub)
contain the CSS for columns.
* Columns default to 50% width unless they are given a width
attribute.
Closes#4028.
* Remove "width" attribute which is not allowed on div.
* Remove space between `<div class="column">` elements,
since this prevents columns whose widths sum to 100%
(the space takes up space).
Closes#4028.
and other non-HTML formats (`Text.Pandoc.Readers.HTML.htmlTag`).
The parser stopped at the first `>` character, even if it wasn't
the end of the comment.
Closes#4019.
Previously `\include` wouldn't work if the included file
contained, e.g., a begin without a matching end.
We've changed the Tok type so that it stores a full SourcePos,
rather than just a line and column. So tokens keeep track
of the file they came from. This allows us to use a simpler
method for includes, which doesn't require parsing the included
document as a whole.
Closes#3971.
* Options: Added readerStripComments to ReaderOptions.
* Added `--strip-comments` command-line option.
* Made `htmlTag` from the HTML reader sensitive to this feature.
This affects Markdown and Textile input.
Closes#2552.
Div's are difficult to translate into org syntax, as there are multiple
div-like structures (drawers, special blocks, greater blocks) which all
have their advantages and disadvantages. Previously pandoc would
use raw HTML to preserve the full div information; this was rarely
useful and resulted in visual clutter. Div-rendering was changed to
discard the div's classes and key-value pairs if there is no natural way
to translate the div into an org structure.
Closes: #3771
Closes#3511.
Previously pandoc used the four-space rule: continuation paragraphs,
sublists, and other block level content had to be indented 4
spaces. Now the indentation required is determined by the
first line of the list item: to be included in the list item,
blocks must be indented to the level of the first non-space
content after the list marker. Exception: if are 5 or more spaces
after the list marker, then the content is interpreted as an
indented code block, and continuation paragraphs must be indented
two spaces beyond the end of the list marker. See the CommonMark
spec for more details and examples.
Documents that adhere to the four-space rule should, in most cases,
be parsed the same way by the new rules. Here are some examples
of texts that will be parsed differently:
- a
- b
will be parsed as a list item with a sublist; under the four-space
rule, it would be a list with two items.
- a
code
Here we have an indented code block under the list item, even though it
is only indented six spaces from the margin, because it is four spaces
past the point where a continuation paragraph could begin. With the
four-space rule, this would be a regular paragraph rather than a code
block.
- a
code
Here the code block will start with two spaces, whereas under
the four-space rule, it would start with `code`. With the four-space
rule, indented code under a list item always must be indented eight
spaces from the margin, while the new rules require only that it
be indented four spaces from the beginning of the first non-space
text after the list marker (here, `a`).
This change was motivated by a slew of bug reports from people
who expected lists to work differently (#3125, #2367, #2575, #2210,
#1990, #1137, #744, #172, #137, #128) and by the growing prevalance
of CommonMark (now used by GitHub, for example).
Users who want to use the old rules can select the `four_space_rule`
extension.
* Added `four_space_rule` extension.
* Added `Ext_four_space_rule` to `Extensions`.
* `Parsing` now exports `gobbleAtMostSpaces`, and the type
of `gobbleSpaces` has been changed so that a `ReaderOptions`
parameter is not needed.
Acronyms are not resolved by the reader, but acronym and glossary information is put into attributes on Spans so that they can be processed in filters.
The structure expected is:
<div class="columns">
<div class="column" width="40%">
contents...
</div>
<div class="column" width="60%">
contents...
</div>
</div>
Support has been added for beamer and all HTML slide formats.
Closes#1710.
Note: later we could add a more elegant way to create
this structure in Markdown than to use raw HTML div elements.
This would come for free with a "native div syntax" (#168).
Or we could devise something specific to slides
We assume that comments are defined as parsed by the
docx reader:
I want <span class="comment-start" id="0" author="Jesse Rosenthal"
date="2016-05-09T16:13:00Z">I left a comment.</span>some text to
have a comment <span class="comment-end" id="0"></span>on it.
We assume also that the id attributes are unique and properly
matched between comment-start and comment-end.
Closes#2994.
Previously they would be transmitted to the template without
any escaping.
Note that `--M title='*foo*'` yields a different result from
---
title: *foo*
---
In the latter case, we have emphasis; in the former case, just
a string with literal asterisks (which will be escaped
in formats, like Markdown, that require it).
Closes#3792.
Also, fix regular macros so they're expanded at the
point of use, and NOT also the point of definition.
`\let` macros, by contrast, are expanded at the
point of definition. Added an `ExpansionPoint`
field to `Macro` to track this difference.
We previously did this only with raw blocks, on the assumption
that math environments would always be raw blocks. This has changed
since we now parse them as inline environments.
Closes#3816.
Thus, a span with attribute 'foo' gets written to HTML5
with 'data-foo', so it is valid HTML5.
HTML4 is not affected.
This will allow us to use custom attributes in pandoc without
producing invalid HTML.
Fixed applyMacros so that it operates on the whole
string, not just the first token!
Don't remove macro definitions from the output,
even if Ext_latex_macros is set, so that macros will
be applied. Since they're only applied to math in
Markdown, removing the macros can have bad effects.
Even for math macros, keeping them should be harmless.
This rewrite is primarily motivated by the need to
get macros working properly. A side benefit is that the
reader is significantly faster (27s -> 19s in one
benchmark, and there is a lot of room for further
optimization).
We now tokenize the input text, then parse the token stream.
Macros modify the token stream, so they should now be effective
in any context, including math. Thus, we no longer need the clunky
macro processing capacities of texmath.
A custom state LaTeXState is used instead of ParserState.
This, plus the tokenization, will require some rewriting
of the exported functions rawLaTeXInline, inlineCommand,
rawLaTeXBlock.
* Added Text.Pandoc.Readers.LaTeX.Types (new exported module).
Exports Macro, Tok, TokType, Line, Column. [API change]
* Text.Pandoc.Parsing: adjusted type of `insertIncludedFile`
so it can be used with token parser.
* Removed old texmath macro stuff from Parsing.
Use Macro from Text.Pandoc.Readers.LaTeX.Types instead.
* Removed texmath macro material from Markdown reader.
* Changed types for Text.Pandoc.Readers.LaTeX's
rawLaTeXInline and rawLaTeXBlock. (Both now return a String,
and they are polymorphic in state.)
* Added orgMacros field to OrgState. [API change]
* Removed readerApplyMacros from ReaderOptions.
Now we just check the `latex_macros` reader extension.
* Allow `\newcommand\foo{blah}` without braces.
Fixes#1390.
Fixes#2118.
Fixes#3236.
Fixes#3779.
Fixes#934.
Fixes#982.
If the metadata field is all on one line, we try to interpret
it as Inlines, and only try parsing as Blocks if that fails.
If it extends over one line (including possibly the `|` or
`>` character signaling an indented block), then we parse as
Blocks.
This was motivated by some German users finding that
date: '22. Juin 2017'
got parsed as an ordered list.
Closes#3755.
Note that if the table has a first page header and a
continuation page header, the notes will appear only
on the first occurrence of the header.
Closes#2378.
Note that as a result of this change, the following,
which formerly produced a header with two lines separated
by a line break, will now produce a header followed by a
paragraph:
# Hi\
there
This may affect some existing documents that relied on
this undocumented and unintended behavior.
This change makes pandoc more consistent with other
Markdown implementations, and with itself (since the two-space
version of a line break doesn't work inside ATX headers, and
neither version works inside Setext headers).
Closes#3730.
Currently we only handle the form `0.9\linewidth`.
Anything else would have to be converted to a percentage,
using some kind arbitrary assumptions about line widths.
See #3709.
Babel result blocks can have block attributes like captions and names.
Result blocks with attributes were not recognized and were parsed as
normal blocks without attributes.
Fixes: #3706
With `--reference-location` of `section` or `block`, pandoc
will now repeat references that have been used in earlier
sections.
The Markdown reader has also been modified, so that *exactly*
repeated references do not generate a warning, only
references with the same label but different targets.
The idea is that, with references after every block,
one might want to repeat references sometimes.
Closes#3701.
- Export `inEm` from ImageSize [API change].
- Change `showFl` and `show` instance for `Dimension` so
extra decimal places are omitted.
- Added `Em` as a constructor of `Dimension` [API change].
- Allow `em`, `cm`, `in` to pass through without conversion
in HTML, LaTeX.
Closes#3450.
This is now the default for pandoc's Markdown.
It allows whitespace between the two parts of a
reference link: e.g.
[a] [b]
[b]: url
This is now forbidden by default.
Closes#2602.
This is a verison of parseFromString specialied to
ParserState, which resets stateLastStrPos at the end.
This is almost always what we want.
This fixes a bug where `_hi_` wasn't treated as emphasis in
the following, because pandoc got confused about the
position of the last word:
- [o] _hi_
Closes#3690.
Support for the `#+INCLUDE:` file inclusion mechanism was added.
Recognized include types are *example*, *export*, *src*, and normal org
file inclusion. Advanced features like line numbers and level selection
are not implemented yet.
Closes: #3510
The grid table parsers for markdown and rst was combined into one single
parser, slightly changing parsing behavior of both parsers:
- The markdown parser now compactifies block content cell-wise: pure
text blocks in cells are now treated as paragraphs only if the cell
contains multiple paragraphs, and as plain blocks otherwise. Before,
this was true only for single-column tables.
- The rst parser now accepts newlines and multiple blocks in header
cells.
Closes: #3638
Previously we inadvertently interpreted indented HTML as
code blocks. This was a regression.
We now seek to determine the indentation level of the contents
of an HTML block, and (optionally) skip that much indentation.
As a side effect, indentation may be stripped off of raw
HTML blocks, if `markdown_in_html_blocks` is used. This
is better than having things interpreted as indented code
blocks.
Closes#1841.
* Fix keyval funtion: pandoc did not parse options in braces correctly. Additionally, dot, dash, and colon were no valid characters
* Add | as possible option value
* Improved code