* Rewrote `withRaw` so it doesn't rely on fragile assumptions
about token positions (which break when macros are expanded).
This requires the addition of `sEnableWithRaw` and `sRawTokens`
in `LaTeXState`, and a new combinator `disablingWithRaw` to
disable collecting of raw tokens in certain contexts.
* Add `parseFromToks` to T.P.Readers.LaTeX.Parsing.
* Fix parsing of single character tokens so it doesn't mess
up the new raw token collecting.
* These changes slightly increase allocations and have a small
performance impact, but it's minor.
Closes#7092.
Setting SOURCE_DATE_EPOCH will allow reproducible builds.
Partially addresses #7093. This does not suffice to fully enable
reproducible in EPUB, since a unique id is being generated for each
build.
This attempts to read the SOURCE_DATE_EPOCH environment variable
and parse a UTC time from it (treating it as a unix date stamp,
see https://reproducible-builds.org/specs/source-date-epoch/).
If the variable is not set or can't be parsed as a unix date
stamp, then the function returns the current date.
This exports functions that uses xml-conduit's parser to
produce an xml-light Element or [Content]. This allows
existing pandoc code to use a better parser without
much modification.
The new parser is used in all places where xml-light's
parser was previously used. Benchmarks show a significant
performance improvement in parsing XML-based formats
(especially ODT and FB2).
Note that the xml-light types use String, so the
conversion from xml-conduit types involves a lot
of extra allocation. It would be desirable to
avoid that in the future by gradually switching
to using xml-conduit directly. This can be done
module by module.
The new parser also reports errors, which we report
when possible.
A new constructor PandocXMLError has been added to
PandocError in T.P.Error [API change].
Closes#7091, which was the main stimulus.
These changes revealed the need for some changes
in the tests. The docbook-reader.docbook test
lacked definitions for the entities it used; these
have been added. And the docx golden tests have been
updated, because the new parser does not preserve
the order of attributes.
Add entity defs to docbook-reader.docbook.
Update golden tests for docx.
This change allows bibtex/biblatex output to wrap as other
formats do, depending on the settings of `--wrap` and `--columns`.
It also introduces default templates for bibtex and biblatex,
which allow for using the variables `header-include`, `include-before`
or `include-after` (or alternatively the command line options
`--include-in-header`, `--include-before-body`, `--include-after-body`)
to insert content into the generated bibtex/biblatex.
This change requires a change in the return type of the unexported
`T.P.Citeproc.writeBibTeXString` from `Text` to `Doc Text`.
Closes#7068.
Previously there was a messy code path that gave strange
results in some cases, not passing through raw tex but
trying to extract a string content. This was an artefact
of trying to handle some special bibtex-specific commands
in the BibTeX reader. Now we just handle these in the
LaTeX reader and simplify parsing in the BibTeX reader.
This does mean that more raw tex will be passed through
(and currently this is not sensitive to the `raw_tex`
extension; this should be fixed).
Closes#7049.
The Lua modules `pandoc` and `pandoc.List` are now always loaded from the
system's default data directory. Loading from a different directory by
overriding the default path, e.g. via `--data-dir`, is no longer supported to
avoid unexpected behavior and to address security concerns.
* `biblatex` and `bibtex` are now supported as output
as well as input formats.
* New module Text.Pandoc.Writers.BibTeX, exporting
writeBibTeX and writeBibLaTeX. [API change]
* New unexported function `writeBibtexString` in
Text.Pandoc.Citeproc.BibTeX.
We were losing content from inside spans with a class,
due to logic that is meant to avoid nested inline
structures that can't be represented in RST.
The logic was a bit stricter than necessary. This
commit fixes the issue.
This reverts commit 6efd3460a7.
Since this extension is designed to be used with
GitHub markdown (gfm), we need to implement the parser
as a commonmark extension (commonmark-extensions),
rather than in pandoc's markdown reader. When that is
done, we can add it here.
Instead of hard-coding the border and header cell vertical alignment,
we now let this be determined by the Table style, making use of
Word's "conditional formatting" for the table's first row.
For headerless tables, we use the tblLook element to tell Word
not to apply conditional first-row formatting.
Closes#7008.
Due to a bug in code added to avoid overwriting the cover image
if it had the form `fileX.YYY`, pandoc made an endless sequence
of HTTP requests when writing epub with input from a URL.
Closes#7013.
Additional pipe chars, used to separate "action" state from "no further
action" states, are ignored. E.g., for the following sequence, both
`DONE` and `FINISHED` are states with no further action required.
#+TODO: UNFINISHED | DONE | FINISHED
Previously, parsing of the todo sequence failed if multiple pipe chars
were included.
Closes: #7014
If the table lacks a header, the header row should be an empty
list. Previously we got a list of empty cells, which caused
an empty header to be emitted instead of no header. In LaTeX/PDF
output that meant we got a double top line with space between.
@tarleb @despres - please let me know if this is problematic
for some reason I'm not grasping.
Allow defaults files to inherit options from other defaults files by
specifying them with the following syntax:
`defaults: [list of defaults files or single defaults file]`.
* Replace org-mode’s verbatim from code to codeWith.
This adds the `"verbatim"` class so that exporters can apply a specific
style on it. For instance, it will be possible for HTML to add a CSS
rule for code + verbatim class.
* Alter test for org-mode’s verbatim change.
See previous commit for further detail on the new implementation.
when `raw_tex` is not enabled. (When `raw_tex` is enabled,
the whole environment is parsed as a raw block.)
The class name is the name of the environment.
Previously, we just included the contents without the
surrounding Div, but having a record of the environment's
boundaries and name can be useful.
Closes#6997.
In 2.11.3 we started adding `\addlinespace`, which produced less
dense tables. This wasn't an intentional change; I misunderstood
a comment in the discussion leading up to the change. This commit
restores the earlier default table appearance.
Note that if you want a less dense table, you can use something like
`\def\arraystretch{1.5}` in your header.
Closes#6996.
The Div wrapper of code blocks with captions now has the class
"captioned-content". The caption itself is added as a Plain block
inside a Div of class "caption". This makes it easier to write filters
which match on captioned code blocks. Existing filters will need to be
updated.
Closes: #6977
The `renderTags'` function was duplicated when the reader used `Text` as
its string type. The duplication is no longer necessary.
A side effect of this change is that empty `<col>` elements are written
as self-closing tags in raw HTML blocks.
Added field to WriterState that denotes the current nesting level for traversing tables.
Depending on the value of that field nested tables are recognized and written.
Asciidoc supports one level of nesting. If deeper tables are to be written, they are
omitted and a warning is issued.
The raw text is now included verbatim in the output. Previously is was parsed
into XML elements, which prevented the inclusion of partial XML snippets.
The `linkifyVariables` function was changing these to links
which then got treated as non-empty by citeproc, leading
to wrong results (e.g. ignoring nonempty URL when empty DOI is present).
Addresses part 2 of jgm/citeproc#41.
Note that the multirow package is needed for rowspans.
It is included in the latex template under a variable,
so that it won't be used unless needed for a table.
- Use dev version of citeproc, which handles duplicate
ids better, preferring the last one in the list
and discarding the rest.
- Ensure that inline citations take priority over external
ones.
See jgm/citeproc#36.
This restores the behavior of pandoc-citeproc.
This means that:
- a URL may be provided, and pandoc will fetch the resource.
- Pandoc will search the resource path for the bibliography
if it is not found relative to the working directory.
Closes#6940.
After the move to JuicyPixels, we were getting incorrect
width and heigh information for some images (see #6936, test-3.jpg).
The correct information was encoded in Exif tags that
JuicyPixels seemed to ignore. So we check these first
before looking at the Width and Height identified by
JuicyPixels.
Closes#6936.
- An image alone in its paragraph (but not a figure) is now
rendered as an independent image, with an `alt` attribute
if a description is supplied.
- An inline image that is not alone in its paragraph will
be rendered, as before, using a substitution.
Such an image cannot have a "center", "left", or
"right" alignment, so the classes `align-center`,
`align-left`, or `align-right` are ignored.
However, `align-top`, `align-middle`, `align-bottom`
will generate a corresponding `align` attribute.
Closes#6948.
Previously we stripped attribute prefixes, reading
`xml:lang` as `lang` for example. This resulted in
two duplicate `lang` attributes when `xml:lang` and
`lang` were both used. This commit causes the prefixes
to be retained, and also avoids invald duplicate
attributes.
Closes#6938.
* Add `Ext_sourcepos` constructor for `Extension`.
* Add `sourcepos` extension (only for commonmark).
* Bump to 2.11.3
With the `sourcepos` extension set set, `data-pos` attributes are added
to the AST by the commonmark reader. No other readers are affected. The
`data-pos` attributes are put on elements that accept attributes; for
other elements, an enlosing Div or Span is added to hold the attributes.
Closes#4565.
DokuWiki lets the user define his own Interwiki links.
Previously pandoc reacted to these by emitting a
google search link, which is not helpful. Instead,
we now just emit the full URL including the
wikilink prefix, e.g. `faquk>FAQ-mathml`.
This at least gives users the ability to
modify the links using filters.
Closes#6932.
Docbook reader produces a `Div` with `title` class for `<title>` element
within an “admonition” element. Markdown writer then turns this
into a fenced div with `title` class attribute. Since fenced divs
are block elements, their content is recognized as a paragraph
by the Markdown reader. This is an issue for Docbook writer because
it would produce an invalid DocBook document from such AST –
the `<title>` element can only contain “inline” elements.
Let’s handle this invalid special case separately by unwrapping
the paragraph before creating the `<title>` element.
Links with (internal) targets that the reader doesn't know about are
converted into emphasized text. Information on the link target is now
preserved by wrapping the text in a Span of class `spurious-link`, with
an attribute `target` set to the link's original target. This allows to
recover and fix broken or unknown links with filters.
See: #6916
This commit adds two extensions to the OpenDocument writer,
`xrefs_name` and `xrefs_number`.
Links to headings, figures and tables inside the document are
substituted with cross-references that will use the name or caption
of the referenced item for `xrefs_name` or the number for `xrefs_number`.
For the `xrefs_number` to be useful heading numbers must be enabled
in the generated document and table and figure captions must be enabled using for example the `native_numbering` extension.
In order for numbers and reference text to be updated the generated
document must be refreshed.
Co-authored-by: Nils Carlson <nils.carlson@ludd.ltu.se>
Previously, we only added xmlns attributes to chapter elements,
even when running with --top-level-division=section.
Let’s add the namespaces to part and section elements too,
when they are the selected top-level divisions.
We do not need to add namespaces to documents produced with
--standalone flag, since those will already have xmlns attribute
on the root element in the template.
Previously bold and italics didn't work properly in LTR
text. This commit causes the w:bCs and w:iCs attributes
to be used, in addition to w:b and w:i, for bold and
italics respectively.
Closes#6911.
Fix appearance of bullets/numbered lists (the first level is slightly
indented to the right instead of right on the margin).
New golden files have been tested using Word 2010 on Windows 10.
- `<tfoot>` elements are no longer added to the table body but used as
table footer.
- Separate `<tbody>` elements are no longer combined into one.
- Attributes on `<thead>`, `<tbody>`, `<th>`/`<td>`, and `<tfoot>`
elements are preserved.
This affected author-in-text citations in footnotes.
It didn't cause problems for the printed output, but for
filters that expected the citation id and other information.
Closes#6890.
This shouldn't happen, in general, but it can happen with
JPEGs that don't conform to the spec. Having a DPI of 0
will blow up size calculations (division by 0).
Closes#6880.
+ Remove the `\strut` that was added at the end of minipage
environments in cells.
+ Replace `\tabularnewline` with `\\ \addlinespace`.
Closes#6842, closes#6860.
which was made (for reasons forgotten) when transferring
this code from pandoc-citeproc. The change led to `--` in
URLs being interpreted as en-dashes, which is unwanted.
Closes#6874.
Table width in relation to text width is not natively supported
by docbook but is by the docbook fo stylesheets through an XML
processing instruction, <?dbfo table-width="50%"?> .
Implement support for this instruction in the DocBook reader.
in cases where we run into trouble parsing inlines til the
closing `]`, e.g. quotes, we return a plain string with the
option contents. Previously we mistakenly included the brackets
in this string.
Closes#6869.
`commonmark_x` never actually supported `auto_identifiers` (it
didn't do anything), because the underlying library implements
gfm-style identifiers only.
Attempts to add the `autolink_identifiers` extension to
`commonmark` will now fail with an error.
Closes#6863.
Previously we only self-contained attributes for
certain tag names (`img`, `embed`, `video`, `input`, `audio`,
`source`, `track`, `section`). Now we self-contain any
occurrence of `src`, `data-src`, `poster`, or `data-background-image`,
on any tag; and also `href` on `link` tags.
Closes#6854 (which specifically asked about
`asciinema-player` tags).
We now better handle `.IP` when it is used with non-bullet,
non-numbered lists, creating a definition list.
We also skip blank lines like groff itself.
Closes#6858.
Previously a path beginning with a drive, like
`C:\foo\bar`, was translated to `C:\/foo/bar`, which
caused problems.
With this fix, the backslashes are removed.
Closes#6173.
Previously we used Setext (underlined) headings by default.
The default is now ATX (`##` style).
* Add the `--markdown-headings=atx|setext` option.
* Deprecate `--atx-headers`.
* Add constructor 'ATXHeadingInLHS` constructor to `LogMessage` [API change].
* Support `markdown-headings` in defaults files.
* Document new options in MANUAL.
Closes#6662.
Background: syntactically, references to example list items
can't be distinguished from citations; we only know which they
are after we've parsed the whole document (and this is resolved
in the `runF` stage).
This means that pandoc's calculation of `citationNoteNum`
can sometimes be wrong when there are example list references.
This commit partially addresses #6836, but only for the case
where the example list references refer to list items defined
previously in the document.
Previously in-text note citations inside a footnote
would sometimes have the final period stripped, even
if it was needed (e.g. on the end of 'ibid').
See #6813.
If `\cL` is defined as `\mathcal{L}`, and `\til` as `\tilde{#1}`,
then `\til\cL` should expand to `\tilde{\mathcal{L}}`, but pandoc
was expanding it to `\tilde\mathcal{L}`. This is fixed by
parsing the arguments in "verbatim mode" when the macro expands
arguments at the point of use.
Closes#6796.
The map-based YAML representation of filters expects `type` and `path`
fields. The path field had to be present for all filter types, but is
not used for citeproc filters. The field can now be omitted when type
is "citeproc", as described in the MANUAL.
When an author-in-text citation like `@foo` occurs in a footnote,
we now render it with: `AUTHOR NAME + COMMA + SPACE + REST`.
Previously we rendered: `AUTHOR NAME + SPACE + "(" + REST + ")"`.
This gives better results. Note that normal citations are still
rendered in parentheses.
We now have LaTeX do the calculation, using `\tabcolsep`.
So we should now have accurate relative column widths no
matter what the text width.
The default template has been modified to load the calc
package if tables are used.
This ensures that bibliography parsing errors generate messages
that include the bibliography file name -- otherwise it can be
quite mysterious where it is coming from.
[API change] New PandocBibliographyError constructor on
PandocError type.
Starting with 2.10.1, fenced divs no longer render with
HTML div tags in commonmark output. This is a regression
due to our transition from cmark-gfm. This commit fixes it.
Closes#6768.
This change will avoid mixed paths like this one when
`--extract-media` is used with a Word file:
`![](C:\Git\TIJ4\Markdown/media/image30.wmf)`
Instead we'll get
`![](C:\Git\TIJ4\Markdown`media`image30.wmf)`.
Closes#6761.
For polyglossia we now use
`\setmainlanguage[variant=brazilian]{portuguese}`
and for babel
`\usepackage[shorthands=off,main=brazilian]{babel}`.
Closes#2953.