With `--reference-location` of `section` or `block`, pandoc
will now repeat references that have been used in earlier
sections.
The Markdown reader has also been modified, so that *exactly*
repeated references do not generate a warning, only
references with the same label but different targets.
The idea is that, with references after every block,
one might want to repeat references sometimes.
Closes #3701.
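The warning rule described above can be sketched as follows. This is a minimal illustration in Python, not pandoc's Haskell implementation; the function name `collect_references`, the case-insensitive label matching, and the choice to keep the latest definition are all assumptions made for the example.

```python
def collect_references(definitions):
    """Collect (label, target) pairs, warning only on conflicting targets."""
    references = {}
    warnings = []
    for label, target in definitions:
        key = label.lower()  # assumption: labels are matched case-insensitively
        if key in references and references[key] != target:
            warnings.append(f"duplicate reference {label!r} with a different target")
        # Assumption: the latest definition is the one that is kept.
        references[key] = target
    return references, warnings


definitions = [
    ("a", "http://example.com/a"),
    ("a", "http://example.com/a"),      # exact repeat: no warning
    ("b", "http://example.com/b"),
    ("b", "http://example.com/other"),  # same label, different target: warning
]
references, warnings = collect_references(definitions)
print(warnings)
```

Only the second definition of `b` produces a warning, which is what makes it safe to repeat a reference after every block.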
Emacs parses org documents into a tree structure, which is then
post-processed during exporting. The reader is changed to do the same,
turning the document into a single tree of headlines starting at
level 0.
Fixes: #3695
- Export `inEm` from ImageSize [API change].
- Change `showFl` and `show` instance for `Dimension` so
extra decimal places are omitted.
- Added `Em` as a constructor of `Dimension` [API change].
- Allow `em`, `cm`, `in` to pass through without conversion
in HTML, LaTeX.
Closes #3450.
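The effect of omitting extra decimal places can be illustrated with a small Python sketch. It is not the Haskell `showFl` itself; the name `show_fl` and the precision of five decimal places are assumptions for the example.

```python
def show_fl(value, places=5):
    """Format a number with at most `places` decimals, dropping trailing zeros."""
    text = f"{value:.{places}f}"
    # Strip trailing zeros, then a dangling decimal point if one is left.
    return text.rstrip("0").rstrip(".")


for dimension in (2.0, 1.5, 0.33333333):
    print(show_fl(dimension) + "em")
```

With this rule a dimension such as `2.0` is rendered as `2`, so a width comes out as `2em` rather than `2.00000em`.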
This is now the default for pandoc's Markdown.
It allows whitespace between the two parts of a
reference link: e.g.
    [a] [b]
    [b]: url
This is now forbidden by default.
Closes #2602.
This is a version of parseFromString specialized to
ParserState, which resets stateLastStrPos at the end.
This is almost always what we want.
This fixes a bug where `_hi_` wasn't treated as emphasis in
the following, because pandoc got confused about the
position of the last word:
    - [o] _hi_
Closes #3690.
Parsing of smart quotes and special characters can either be enabled via
the `smart` language extension or the `'` and `-` export options. Smart
parsing is active if either the extension or export option is enabled.
Only smart parsing of special characters (like ellipses and en and em
dashes) is enabled by default, while smart quotes are disabled.
This means that all smart parsing features will be enabled by adding the
`smart` language extension. Fine-grained control is possible by leaving
the language extension disabled. In that case, smart parsing is
controlled via the aforementioned export OPTIONS only.
Previously, all smart parsing was disabled unless the language extension
was enabled.
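The interaction between the `smart` extension and the two export options can be summarized in a short Python sketch. The function name `smart_features` and the dictionary representation of the export options are hypothetical; the defaults follow the description above.

```python
def smart_features(smart_extension, export_options=None):
    """Return which smart-parsing features are active.

    `export_options` maps the org export option characters to booleans:
    "'" controls smart quotes, "-" controls special strings
    (ellipses, en and em dashes).
    """
    options = {"'": False, "-": True}  # defaults described in the text
    options.update(export_options or {})
    return {
        "quotes": smart_extension or options["'"],
        "special_strings": smart_extension or options["-"],
    }


print(smart_features(False))
print(smart_features(True))
print(smart_features(False, {"'": True, "-": False}))
```

Each feature is active if either the extension or the matching export option is on, so fine-grained control only works while the extension is disabled.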
Support for the `#+INCLUDE:` file inclusion mechanism was added.
Recognized include types are *example*, *export*, *src*, and normal org
file inclusion. Advanced features like line numbers and level selection
are not implemented yet.
Closes: #3510
The grid table parsers for markdown and rst were combined into a single
parser, slightly changing the parsing behavior of both:
- The markdown parser now compactifies block content cell-wise: pure
text blocks in cells are now treated as paragraphs only if the cell
contains multiple paragraphs, and as plain blocks otherwise. Before,
this was true only for single-column tables.
- The rst parser now accepts newlines and multiple blocks in header
cells.
Closes: #3638
* LaTeX: Load `parskip` before `hyperref`.
According to `hyperref` package's `README.pdf`, page 22, `hyperref` package
should be loaded after `parskip` package.
* Adjust tests for previous change.
Previously we inadvertently interpreted indented HTML as
code blocks. This was a regression.
We now seek to determine the indentation level of the contents
of an HTML block, and (optionally) skip that much indentation.
As a side effect, indentation may be stripped off of raw
HTML blocks, if `markdown_in_html_blocks` is used. This
is better than having things interpreted as indented code
blocks.
Closes #1841.
* Fix keyval function: pandoc did not parse options in braces correctly. Additionally, dot, dash, and colon were not accepted as valid characters.
* Add | as possible option value
* Improved code
Previously the Markdown writer would sometimes create links where there
were none in the source. This is now avoided by selectively escaping bracket
characters when they occur in a place where a link might be created.
Closes #3619.
Ensure that we do not generate reference links
whose labels differ only by case.
Also allow implicit reference links when the link
text and label are identical up to case.
Closes #3615.
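One way to keep labels from differing only by case is sketched below in Python. This is an illustration under stated assumptions, not the Markdown writer's actual code: the helper names `assign_label` and `can_be_implicit` and the numeric fallback labels are hypothetical.

```python
def assign_label(text, target, used):
    """Pick a reference label for `target`, unique up to case.

    `used` maps lowercased labels to the targets they already point to.
    """
    key = text.lower()
    if used.get(key, target) == target:
        used[key] = target
        return text
    # Label clashes (ignoring case) with another target: fall back to a
    # numeric label (a hypothetical scheme chosen for this sketch).
    n = 1
    while str(n) in used:
        n += 1
    used[str(n)] = target
    return str(n)


def can_be_implicit(link_text, label):
    """An implicit reference link is allowed if text and label match up to case."""
    return link_text.lower() == label.lower()


used = {}
print(assign_label("Foo", "http://example.com/1", used))
print(assign_label("foo", "http://example.com/2", used))
print(can_be_implicit("Foo", "foo"))
```

Here the second link cannot reuse `foo`, because `Foo` already points elsewhere, so it receives a distinct label.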
The implicitly defined global filter (i.e. all element filtering
functions defined in the global lua environment) is used if no filter is
returned from a lua script. This makes it possible to define a lua filter
simply by writing top-level functions. E.g.
    function Emph(elem) return pandoc.Strong(elem.content) end
Previously the LaTeX writer created invalid LaTeX
when `--listings` was specified and a code span occurred
inside emphasis or another construction.
This is because the characters `%{}\` must be escaped
in lstinline when the lstinline occurs in another
command; otherwise they must not be escaped.
To deal with this, adopting Michael Kofler's suggestion,
we always wrap lstinline in a dummy command `\passthrough`,
now defined in the default template if `--listings` is
specified. This way we can consistently escape the
special characters.
Closes #1629.
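The escaping strategy can be sketched in Python as follows. This is only an illustration: the function name `lstinline`, the candidate delimiter list, and the exact shape of the output are assumptions, and the LaTeX writer may differ in detail.

```python
SPECIAL = "%{}\\"    # the characters named above: % { } and backslash
DELIMITERS = "!|@+"  # hypothetical candidate delimiters for \lstinline


def lstinline(code):
    """Render inline code as \\lstinline wrapped in \\passthrough."""
    # Because the result is always nested inside another command,
    # the special characters are always backslash-escaped.
    escaped = "".join("\\" + ch if ch in SPECIAL else ch for ch in code)
    # Use the first delimiter that does not occur in the code itself.
    delimiter = next((d for d in DELIMITERS if d not in code), DELIMITERS[0])
    return "\\passthrough{\\lstinline" + delimiter + escaped + delimiter + "}"


print(lstinline("a{b}"))
print(lstinline("50%!"))
```

Always wrapping removes the need to track whether the code span sits inside emphasis or some other command.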
A single `read` function parsing pandoc-supported formats is added to
the module. This is simpler and more convenient than the previous method
of exposing all reader functions individually.
We now issue `<div class="line-block">` and include a
default definition for `line-block` in the default
templates, instead of hard-coding a `style` on the
div.
Closes #1623.
A figure with two subfigures turns into two pandoc
figures; the subcaptions are used and the main caption
ignored, unless there are no subcaptions.
Closes #3577.
Source block parameter names are no longer prefixed with *rundoc*. This
was intended to simplify working with the rundoc project, a babel
runner. However, the rundoc project is unmaintained, and adding those
markers is not the reader's job anyway.
The original language that is specified for a source element is now
retained as the `data-org-language` attribute and only added if it
differs from the translated language.
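The rule for retaining the original language can be sketched in Python. The alias table below is hypothetical and only serves the example; the function name `source_attributes` is likewise an assumption.

```python
# Hypothetical translation table from org language names to highlighter names.
LANGUAGE_ALIASES = {"emacs-lisp": "commonlisp", "sh": "bash"}


def source_attributes(org_language):
    """Return (classes, key-value attributes) for a source element."""
    translated = LANGUAGE_ALIASES.get(org_language, org_language)
    attributes = {}
    # Keep the original name only when translation actually changed it.
    if translated != org_language:
        attributes["data-org-language"] = org_language
    return [translated], attributes


print(source_attributes("sh"))
print(source_attributes("python"))
```

A language that needs no translation therefore carries no `data-org-language` attribute at all.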
The line-numbering switch that can be given to source blocks (`-n` with
a start number as an optional parameter) is parsed and translated to a
class/key-value combination used by highlighting and other readers and
writers.
Previously we always added an empty div before the list
item, but this created problems with spacing in tight
lists. Now we do this:
If the list item contents begin with a Plain block,
we modify the Plain block by adding a Span around
its contents.
Otherwise, we add a Div around the contents of the
list item (instead of adding an empty Div to the
beginning, as before).
Closes #3596.
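The two cases can be sketched in Python over a simplified, dictionary-based stand-in for the pandoc AST. The function name `mark_list_item` and the `attr` value are hypothetical; only the Plain/Span versus Div decision is taken from the description above.

```python
def mark_list_item(blocks, attr):
    """Attach `attr` to a list item without inserting an empty Div."""
    if blocks and blocks[0]["t"] == "Plain":
        # Tight-list case: wrap the inlines of the leading Plain in a Span.
        span = {"t": "Span", "attr": attr, "c": blocks[0]["c"]}
        return [{"t": "Plain", "c": [span]}] + blocks[1:]
    # Otherwise wrap the whole item in a Div.
    return [{"t": "Div", "attr": attr, "c": blocks}]


tight = [{"t": "Plain", "c": ["item text"]}]
loose = [{"t": "Para", "c": ["item text"]}]
print(mark_list_item(tight, "marker"))
print(mark_list_item(loose, "marker"))
```

Because the tight case stays a single Plain block, no extra block-level element is introduced and list spacing is unaffected.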
Filtering functions take element components as arguments instead of the
whole block elements. This resembles the way elements are handled in
custom writers.
Instead of taking the whole inline element, forcing users to destructure it
themselves, the components of the elements are passed to the filtering
functions.
Plain text readers are exposed to lua scripts via the `pandoc.reader`
submodule, which is further subdivided by format. Converting e.g. a
markdown string into a pandoc document is possible from within lua:
    doc = pandoc.reader.markdown.read_doc("Hello, World!")
A `read_block` convenience function is provided for all formats,
although it will still parse the whole string but return only the first
block as the result.
Custom reader options are not supported yet; default options are used
for all parsing operations.
Closes #3547.
Macro definitions are inserted in the template when there is highlighted
code.
Limitations: background colors and underline currently not
supported.
This avoids overly narrow labels for ordered lists with
() delimiters.
However, arguably it creates overly wide labels for bullets.
Also, lists now start flush with the margin, rather than
indented.
Fixes #2421.
Modified template to include a `<back>` and `<body>` section.
This should give authors more flexibility, e.g. to put
acknowledgements metadata in `<back>`. References are
automatically extracted and put into `<back>`.
* Fix lstinline handling: lstinline with braces can be used (verb cannot be used with braces)
* Use codeWith and determine the language from lstinline
* Improve code
* Add another test: convert lstinline without language option
* New module: Text.Pandoc.Writers.Ms.
* New template: default.ms.
* The writer uses texmath's new eqn writer to convert math
to eqn format, so a ms file produced with this writer
should be processed with `groff -ms -e` if it contains
math.
This is enabled by default in pandoc and GitHub markdown but not the
other flavors.
This requires a space between the opening #'s and the header
text in ATX headers (as CommonMark does but many other implementations
do not). This is desirable to avoid falsely capturing things like
    #hashtag
or
    #5
Closes #3512.
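The space requirement can be illustrated with a regular expression in Python. This is a simplified sketch, not the Markdown reader's parser: the name `parse_atx` is hypothetical, and details such as closing #'s and empty headers are ignored.

```python
import re

# One to six #'s, then at least one space or tab, then the header text.
ATX_HEADER = re.compile(r"^(#{1,6})[ \t]+(\S.*)$")


def parse_atx(line):
    """Return (level, text) for an ATX header line, or None."""
    match = ATX_HEADER.match(line)
    if match is None:
        return None
    return len(match.group(1)), match.group(2).strip()


for line in ("# Title", "#hashtag", "#5"):
    print(repr(line), parse_atx(line))
```

Lines such as `#hashtag` and `#5` no longer match, so they remain ordinary paragraph text.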
* Add `--lua-filter` option. This works like `--filter` but takes pathnames of special lua filters and uses the lua interpreter baked into pandoc, so that no external interpreter is needed. Note that lua filters are all applied after regular filters, regardless of their position on the command line.
* Add Text.Pandoc.Lua, exporting `runLuaFilter`. Add `pandoc.lua` to data files.
* Add private module Text.Pandoc.Lua.PandocModule to supply the default lua module.
* Add Tests.Lua to tests.
* Add data/pandoc.lua, the lua module pandoc imports when processing its lua filters.
* Document in MANUAL.txt.
This was failing because of a small discrepancy in markdown
table header line lengths on appveyor.
It's a minor issue, I can't see what is causing it, and
it's irrelevant to the issue this is testing, so we'll
just write native for this test.
Closes #1905.
Removed stateChapters from ParserState.
Now we parse chapters as level 0 headers, and parts as level -1 headers.
After parsing, we check for the lowest header level, and if it's
less than 1 we bump everything up so that 1 is the lowest header level.
So `\part` will always produce a header; no command-line options
are needed.
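The level adjustment can be sketched in Python on a bare list of header levels. The function name `normalize_levels` is hypothetical, and the sketch deliberately ignores everything about a header except its level.

```python
def normalize_levels(levels):
    """Shift header levels up so that the lowest level is at least 1."""
    lowest = min(levels, default=1)
    # Only bump when a part (-1) or chapter (0) pushed the minimum below 1.
    shift = 1 - lowest if lowest < 1 else 0
    return [level + shift for level in levels]


# \part = -1, \chapter = 0, \section = 1, \subsection = 2
print(normalize_levels([-1, 0, 1, 2]))
print(normalize_levels([1, 2, 2]))
```

A document using `\part` ends up with parts at level 1 and sections at level 3, while a document that starts at `\section` is left unchanged.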
Previously only autogenerated ids were added to the list
of header identifiers in state, so explicit ids weren't taken
into account when generating unique identifiers. Duplicated
identifiers could result.
This simple fix ensures that explicitly given identifiers are
also taken into account.
Fixes #1745.
Note some limitations, however. An autogenerated identifier
may still coincide with an explicit identifier that is given
for a header later in the document, or with an identifier on
a div, span, link, or image. Fixing this would be much more
difficult, because we need to run `registerHeader` before
we have the complete parse tree (so we can't get a complete
list of identifiers from the document by walking the tree).
However, it might be worth issuing warnings for duplicate
header identifiers; I think we can do that. It is not
common for headers to have the same text, and the issue
can always be worked around by adding explicit identifiers,
if the user is aware of it.
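Both the fix and its remaining limitation can be seen in a small Python sketch. It is an illustration only: the name `register_header`, the simplified slug rule, and the `-1`, `-2` suffix scheme are assumptions rather than a transcription of pandoc's `registerHeader`.

```python
def register_header(explicit_id, text, seen):
    """Return the identifier for a header and record it in `seen`."""
    if explicit_id:
        # The fix: explicit identifiers are recorded too.
        seen.add(explicit_id)
        return explicit_id
    base = "-".join(text.lower().split())  # simplified slug (assumption)
    candidate, n = base, 0
    while candidate in seen:
        n += 1
        candidate = f"{base}-{n}"
    seen.add(candidate)
    return candidate


seen = set()
print(register_header("intro", "Overview", seen))  # explicit id comes first
print(register_header(None, "Intro", seen))        # generated id avoids it

# Limitation: an explicit id given later can still collide.
later = set()
print(register_header(None, "Intro", later))
print(register_header("intro", "Overview", later))
```

The second pair of calls yields the same identifier twice, which is the case the note above describes as still unhandled.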
* Previously we got overlong lists with `--wrap=none`. This is fixed.
* Previously a multiline list could become a simple list (and would
always become one with `--wrap=none`).
Closes #3384.
when they occur without space surrounding them.
E.g. equation, math.
This avoids incorrect vertical space around equations.
Closes #3309.
Closes #2171.
See also rstudio/bookdown#358.
E.g. an HTML table with two cells in the first row and one
in the second (but no row/colspan).
We now calculate the number of columns based on the longest
row (or the length of aligns or widths).
Closes#3337.
When multiple YAML metadata blocks are used, and two define
the same field, the value defined first takes precedence,
according to the manual. This was changed briefly in
ba3ee62323. This commit
reverts to the original behavior and adds a test case.
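The precedence rule can be sketched in Python with plain dictionaries standing in for parsed YAML blocks. The function name `merge_metadata` is hypothetical.

```python
def merge_metadata(blocks):
    """Merge metadata blocks; the first definition of a field wins."""
    merged = {}
    for block in blocks:
        for field, value in block.items():
            merged.setdefault(field, value)  # later duplicates are ignored
    return merged


blocks = [
    {"title": "First title", "author": "A. Author"},
    {"title": "Second title", "date": "2017"},
]
print(merge_metadata(blocks))
```

The merged result keeps `First title`, while fields that appear only in later blocks are still added.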
All templates now include `code{white-space: pre-wrap}`
and CSS for `q` if `--html-q-tags` is used.
Previously some templates had `pre` and others `pre-wrap`;
the `q` styles were only sometimes included.
See #3485.
Added test cases.
Fixed HTML reader to parse a span with class "smallcaps" as
SmallCaps.
Fixed Markdown writer to render SmallCaps as a native span
when native spans are enabled.
Polyglot markup is HTML5 that is also valid XHTML. See
<https://www.w3.org/TR/html-polyglot>. With this change, pandoc's
html5 writer creates HTML that is both valid HTML5 and valid XHTML.
See jgm/pandoc-templates#237 for prior discussion.
* Add xml namespace to `<html>` element.
* Make all `<meta>` elements self closing.
See <https://www.w3.org/TR/html-polyglot/#empty-elements>.
* Add `xml:lang` attribute on `<html>` element, defaulting to blank, and
always include `lang` attribute, even when blank. See
<https://www.w3.org/TR/html-polyglot/#language-attributes>.
* Update test files for template changes.
The key justification for having language values default to blank: it
turns out the HTML5 spec requires it (as I read it). Under
[the HTML5 spec, section "3.2.5.3. The lang and xml:lang
attributes"](https://www.w3.org/TR/html/dom.html#the-lang-and-xmllang-attributes),
providing attributes with blank contents both:
* Has meaning, "unknown", and
* Is a MUST (written as "must") if a language value is not provided ...
> The lang attribute (in no namespace) specifies the primary language
> for the element's contents and for any of the element's attributes that
> contain text. Its value must be a valid BCP 47 language tag, or the
> empty string. Setting the attribute to the empty string indicates that
> the primary language is unknown.
In short, it seems that where a language value is not provided then a
blank value MUST be provided for Polyglot Markup conformance, because
the HTML5 spec stipulates a "must". So although the Polyglot Markup spec
is unclear on this issue it would seem that if it was correctly written,
it would therefore require blank attributes.
Further justifications are found at
https://github.com/jgm/pandoc-templates/issues/237#issuecomment-275584181
(but the HTML5 spec justification given above would seem to be the
clincher).
In addition to having lang-values-default-to-blank I recommend that, when an
author does not provide a lang value, a warning message like the following
be issued when pandoc is run:
> Polyglot markup stipulates that 'The root element SHOULD always specify
> the language'. It is therefore recommended you specify a language value in
> your source document. See
> <https://www.w3.org/International/articles/language-tags/> for valid
> language values.
The citations appear at the end of the document as a definition
list in a special div with id `citations`.
Citations link to the definitions.
Added stateCitations to ParserState.
Closes #853.
Previously we would refuse to parse anything as raw inline if
it was in the blockCommands list. Now we allow exceptions
if they're listed under ignoreInlines in inlineCommands.
This should make it easier e.g. to include an \hspace
between two side-by-side raw LaTeX tables.
Otherwise things like `\noindent foo` break and turn into
`\noindentfoo`.
Affects `-f latex+raw_tex` and `-f markdown` (and other formats
that allow `raw_tex`).
Closes #1773.
For example, in
    \begin{tabular}{>{$}l<{$}>{$}l<{$} >{$}l<{$}}
each cell will be interpreted as if it has a `$`
before its content and a `$` after (math mode).
Currently the support for the `.. table` directive is a bit
limited; we don't yet support the `widths` field. But at least
you can have a proper captioned table.
These were confusing.
Now we rely on the +raw_tex or +raw_html extension with latex
or html input.
Thus, instead of
    --parse-raw -f latex
we use
    -f latex+raw_tex
and instead of
    --parse-raw -f html
we use
    -f html+raw_html
The org writer was inserting two spaces after list bullets. Emacs
Org-mode defaults to a single space, so behavior is changed to reflect
this.
Closes: #3417