- `<tfoot>` elements are no longer added to the table body but are used as
the table footer.
- Separate `<tbody>` elements are no longer combined into one.
- Attributes on `<thead>`, `<tbody>`, `<th>`/`<td>`, and `<tfoot>`
elements are preserved.
This affected author-in-text citations in footnotes.
It didn't cause problems for the printed output, but it did for
filters that expected the citation id and other information.
Closes #6890.
This shouldn't happen, in general, but it can happen with
JPEGs that don't conform to the spec. Having a DPI of 0
will blow up size calculations (division by 0).
Closes #6880.
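
The failure mode is easy to reproduce in isolation. The sketch below is illustrative Python, not pandoc's Haskell code; the function name `size_in_inches` and the fallback value of 72 DPI are assumptions made for this example.

```python
DEFAULT_DPI = 72  # assumed fallback, chosen for this example only


def size_in_inches(width_px, height_px, dpi_x, dpi_y):
    """Convert pixel dimensions to inches, guarding against a DPI of 0."""
    # A nonconforming JPEG may report 0 here; dividing by it would raise
    # ZeroDivisionError, so substitute the fallback first.
    safe_x = dpi_x if dpi_x > 0 else DEFAULT_DPI
    safe_y = dpi_y if dpi_y > 0 else DEFAULT_DPI
    return (width_px / safe_x, height_px / safe_y)
```

With this guard, `size_in_inches(144, 72, 0, 0)` yields a usable size instead of an exception.
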
+ Remove the `\strut` that was added at the end of minipage
environments in cells.
+ Replace `\tabularnewline` with `\\ \addlinespace`.
Closes #6842, closes #6860.
which was made (for reasons forgotten) when transferring
this code from pandoc-citeproc. The change led to `--` in
URLs being interpreted as en-dashes, which is unwanted.
Closes #6874.
Table width in relation to text width is not natively supported
by DocBook but is by the DocBook FO stylesheets through an XML
processing instruction, `<?dbfo table-width="50%"?>`.
Implement support for this instruction in the DocBook reader.
in cases where we run into trouble parsing inlines until the
closing `]`, e.g. quotes, we return a plain string with the
option contents. Previously we mistakenly included the brackets
in this string.
Closes #6869.
`commonmark_x` never actually supported `auto_identifiers` (it
didn't do anything), because the underlying library implements
gfm-style identifiers only.
Attempts to add the `auto_identifiers` extension to
`commonmark` will now fail with an error.
Closes #6863.
Previously we only self-contained attributes for
certain tag names (`img`, `embed`, `video`, `input`, `audio`,
`source`, `track`, `section`). Now we self-contain any
occurrence of `src`, `data-src`, `poster`, or `data-background-image`,
on any tag; and also `href` on `link` tags.
Closes #6854 (which specifically asked about
`asciinema-player` tags).
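
The new rule can be stated as a small predicate. This is a sketch of the rule as described in this entry, written in Python for illustration; `should_embed` is a hypothetical name and not part of pandoc.

```python
# Attributes that are embedded on any tag, per the entry above.
ANY_TAG_ATTRS = {"src", "data-src", "poster", "data-background-image"}


def should_embed(tag, attr):
    """Return True if the resource named by this attribute gets embedded."""
    if attr in ANY_TAG_ATTRS:
        return True
    # `href` is only embedded on `link` tags (for example, stylesheets).
    return tag == "link" and attr == "href"
```

Under this rule a custom element such as `asciinema-player` with a `src` attribute qualifies, while `href` on an ordinary `a` tag does not.
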
We now better handle `.IP` when it is used with non-bullet,
non-numbered lists, creating a definition list.
We also skip blank lines like groff itself.
Closes #6858.
Previously a path beginning with a drive, like
`C:\foo\bar`, was translated to `C:\/foo/bar`, which
caused problems.
With this fix, the backslashes are removed.
Closes #6173.
Previously we used Setext (underlined) headings by default.
The default is now ATX (`##` style).
* Add the `--markdown-headings=atx|setext` option.
* Deprecate `--atx-headers`.
* Add `ATXHeadingInLHS` constructor to `LogMessage` [API change].
* Support `markdown-headings` in defaults files.
* Document new options in MANUAL.
Closes #6662.
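
For readers unfamiliar with the two heading styles, the following Python sketch renders the same heading both ways. `render_heading` is a hypothetical helper written for this note, not pandoc code. Setext syntax only exists for levels 1 and 2, so the sketch falls back to ATX for deeper levels.

```python
def render_heading(text, level, style="atx"):
    """Render a Markdown heading in ATX or Setext style."""
    if style == "setext" and level in (1, 2):
        # Setext headings are underlined with `=` (level 1) or `-` (level 2).
        underline = "=" if level == 1 else "-"
        return f"{text}\n{underline * len(text)}"
    # ATX headings use one `#` per level.
    return f"{'#' * level} {text}"
```
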
Background: syntactically, references to example list items
can't be distinguished from citations; we only know which they
are after we've parsed the whole document (and this is resolved
in the `runF` stage).
This means that pandoc's calculation of `citationNoteNum`
can sometimes be wrong when there are example list references.
This commit partially addresses #6836, but only for the case
where the example list references refer to list items defined
previously in the document.
Previously in-text note citations inside a footnote
would sometimes have the final period stripped, even
if it was needed (e.g. on the end of 'ibid').
See #6813.
If `\cL` is defined as `\mathcal{L}`, and `\til` as `\tilde{#1}`,
then `\til\cL` should expand to `\tilde{\mathcal{L}}`, but pandoc
was expanding it to `\tilde\mathcal{L}`. This is fixed by
parsing the arguments in "verbatim mode" when the macro expands
arguments at the point of use.
Closes #6796.
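
The idea of reading an argument without expanding it, substituting it into the macro body, and only then expanding the result can be sketched with a toy expander. This is illustrative Python under simplifying assumptions (no whitespace skipping, well-formed braces, a hard-coded macro table); it is not how pandoc's LaTeX reader is implemented.

```python
import re

# Hypothetical macro table: name -> (number of arguments, body).
MACROS = {
    r"\cL": (0, r"\mathcal{L}"),
    r"\til": (1, r"\tilde{#1}"),
}

# A token is a control word, a control symbol, or any single character.
TOKEN = re.compile(r"\\[A-Za-z]+|\\.|.", re.DOTALL)


def read_argument(tokens, pos):
    """Read one argument verbatim: a braced group or a single token."""
    if tokens[pos] != "{":
        return tokens[pos], pos + 1
    depth, end = 0, pos
    while True:
        if tokens[end] == "{":
            depth += 1
        elif tokens[end] == "}":
            depth -= 1
            if depth == 0:
                break
        end += 1
    # Drop the outer braces; the macro body supplies its own.
    return "".join(tokens[pos + 1:end]), end + 1


def expand(source):
    """Expand all known macros in `source`."""
    tokens = TOKEN.findall(source)
    out, pos = [], 0
    while pos < len(tokens):
        tok = tokens[pos]
        pos += 1
        if tok not in MACROS:
            out.append(tok)
            continue
        nargs, body = MACROS[tok]
        for n in range(1, nargs + 1):
            # The argument is taken as written, without expanding it.
            arg, pos = read_argument(tokens, pos)
            body = body.replace(f"#{n}", arg)
        # Expanding after substitution resolves macros inside arguments.
        out.append(expand(body))
    return "".join(out)
```

Because `\cL` is substituted into `\tilde{#1}` before it is expanded, the braces from the macro body end up around the expanded argument.
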
The map-based YAML representation of filters expects `type` and `path`
fields. The path field had to be present for all filter types, but is
not used for citeproc filters. The field can now be omitted when type
is "citeproc", as described in the MANUAL.
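
The relaxed requirement amounts to the following check, sketched in Python over a plain dictionary standing in for the parsed YAML map. `validate_filter` is a hypothetical name, and the error messages are invented for the example.

```python
def validate_filter(spec):
    """Check a map-based filter entry; return it unchanged if valid."""
    if "type" not in spec:
        raise ValueError("filter entry needs a `type` field")
    # Only citeproc filters may omit `path`, since they do not use it.
    if spec["type"] != "citeproc" and "path" not in spec:
        raise ValueError(f"filter of type {spec['type']!r} needs a `path` field")
    return spec
```
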
When an author-in-text citation like `@foo` occurs in a footnote,
we now render it with: `AUTHOR NAME + COMMA + SPACE + REST`.
Previously we rendered: `AUTHOR NAME + SPACE + "(" + REST + ")"`.
This gives better results. Note that normal citations are still
rendered in parentheses.
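
The two renderings differ only in how the pieces are joined. In the Python sketch below, `note_citation` and its `legacy` flag are hypothetical, and what counts as the "rest" depends on the citation style in use.

```python
def note_citation(author, rest, legacy=False):
    """Join the parts of an author-in-text citation inside a footnote."""
    if legacy:
        # Previous behavior: AUTHOR NAME + SPACE + "(" + REST + ")"
        return f"{author} ({rest})"
    # Current behavior: AUTHOR NAME + COMMA + SPACE + REST
    return f"{author}, {rest}"
```
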
We now have LaTeX do the calculation, using `\tabcolsep`.
So we should now have accurate relative column widths no
matter what the text width.
The default template has been modified to load the calc
package if tables are used.
This ensures that bibliography parsing errors generate messages
that include the bibliography file name -- otherwise it can be
quite mysterious where it is coming from.
[API change] New PandocBibliographyError constructor on
PandocError type.
Starting with 2.10.1, fenced divs no longer render with
HTML div tags in commonmark output. This is a regression
due to our transition from cmark-gfm. This commit fixes it.
Closes #6768.
This change will avoid mixed paths like this one when
`--extract-media` is used with a Word file:
`![](C:\Git\TIJ4\Markdown/media/image30.wmf)`
Instead we'll get
`![](C:\Git\TIJ4\Markdown\media\image30.wmf)`.
Closes #6761.
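
The difference between naive concatenation and platform-aware joining can be shown with Python's `ntpath` module, which applies Windows path rules on any platform. This only illustrates the idea; `media_link` is a hypothetical helper, not pandoc's code.

```python
import ntpath


def media_link(extract_dir, *parts):
    """Join path components using Windows separators throughout."""
    return ntpath.join(extract_dir, *parts)


# Naive concatenation with "/" produces the mixed form.
mixed = r"C:\Git\TIJ4\Markdown" + "/" + "media/image30.wmf"
consistent = media_link(r"C:\Git\TIJ4\Markdown", "media", "image30.wmf")
```
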
For polyglossia we now use
`\setmainlanguage[variant=brazilian]{portuguese}`
and for babel
`\usepackage[shorthands=off,main=brazilian]{babel}`.
Closes #2953.
Closes #6599
cf. https://docs.mathjax.org/en/latest/web/components/combined.html
Note that while this uses the full variant of the JavaScript, it drops
MathML support.
That should be okay, because pandoc renders math in HTML as TeX when using
MathJax.
This change reduces latency.
Closes #6730.
Previously the command would succeed, returning empty metadata,
with no errors or warnings.
API changes:
- Remove now unused CouldNotParseYamlMetadata constructor for
LogMessage (T.P.Logging).
- Add 'Maybe FilePath' parameter to yamlToMeta in T.P.Readers.Markdown.
The renderCslJson function escapes `<`, `>`, and `&` as entities.
This is appropriate when generating HTML, but in CSL JSON
these are supposed to appear unescaped.
Closes jgm/citeproc#17.
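
The distinction between the two output contexts can be seen with Python's standard library, used here purely as an analogy: HTML output needs entity escaping, whereas a JSON string only needs JSON escaping. The function names are hypothetical.

```python
import html
import json


def render_for_html(text):
    """Escape `<`, `>`, and `&` as entities, as HTML requires."""
    return html.escape(text, quote=False)


def render_for_csl_json(text):
    """Serialize as a JSON string; `<`, `>`, and `&` stay as they are."""
    return json.dumps(text)
```
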
For security reasons, some legal firms delete the date from comments and
tracked changes.
* Make date optional (Maybe) in tracked changes and comments datatypes
* Add tests
...into linked DOI, and similarly for other URLs linked in the
bibliography. We want to avoid having a URL in which only the latter
part is linked. Closes #6723.
T.P.Readers.Markdown now exports yamlToRefs. [API change]
T.P.Readers.Metadata exports yamlBsToRefs. [API change]
These allow specifying an id filter so we parse only references
that are used in the document. Improves timing with a 3M
yaml references file from 36s to 17s.
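
The speedup comes from skipping the expensive per-entry parsing for references that are never cited. A minimal Python sketch of that idea follows; `load_references` and its parameters are hypothetical and do not mirror the Haskell signatures of `yamlToRefs` or `yamlBsToRefs`.

```python
def load_references(raw_entries, id_filter, parse_entry):
    """Parse only the entries whose id passes the filter."""
    # The cheap id check runs first, so `parse_entry` is never called
    # for references the document does not use.
    return [parse_entry(raw) for raw in raw_entries if id_filter(raw["id"])]
```

A caller would typically pass a membership test over the set of cited ids as `id_filter`.
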
...and CSL abbreviation files. Use resource path to search
in both USERDATADIR/csl and USERDATADIR/csl/dependent.
Also, add .csl or .json extension as needed, so you can just
do --csl zoology.
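
The lookup can be pictured as building a list of candidate paths. The Python sketch below is an assumption-laden illustration: `csl_candidates` is a hypothetical name, the search order (literal name first, then the two user data subdirectories) is assumed, and POSIX path joining is used only to keep the example deterministic.

```python
import posixpath


def csl_candidates(name, user_data_dir, default_ext=".csl"):
    """List the paths that would be tried for a given style name."""
    # Add the extension only when the name has none.
    if not posixpath.splitext(name)[1]:
        name += default_ext
    return [
        name,
        posixpath.join(user_data_dir, "csl", name),
        posixpath.join(user_data_dir, "csl", "dependent", name),
    ]
```
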
Languages appear to be sorted by their long name, which leads to
unexpected results: e.g., the long name of *m4* is *GNU m4*, so it is
listed between *gnuassembler* and *go*.
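
The effect is reproducible with a tiny table of names. In this Python sketch, only the long name of *m4* comes from the text above; the other two long names and the case-insensitive comparison are assumptions made for the example.

```python
# Short name -> long name. Only the m4 entry is taken from the note above.
LONG_NAMES = {
    "gnuassembler": "GNU Assembler",
    "go": "Go",
    "m4": "GNU m4",
}


def sort_by_long_name(names):
    """Order short names by their (case-folded) long names."""
    return sorted(names, key=lambda n: LONG_NAMES[n].lower())


def sort_by_short_name(names):
    """Order short names alphabetically, as a reader might expect."""
    return sorted(names)
```
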
If the first element of a bulleted or ordered list is another list,
then that first item will disappear if the target format is docx. This
changes the docx writer so that it prepends an empty string for those
cases. With this, no items will disappear.
Closes #5948.
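
The workaround can be sketched as a small transformation on a list item. The Python below models blocks as `(type, content)` tuples, which is an assumption for illustration only, and represents the prepended empty string as an empty paragraph; `pad_list_item` is a hypothetical name.

```python
LIST_TYPES = ("BulletList", "OrderedList")


def pad_list_item(item):
    """Prepend an empty paragraph when an item starts with a nested list."""
    if item and item[0][0] in LIST_TYPES:
        return [("Para", "")] + list(item)
    return list(item)
```
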
To implement syntax highlighting for OpenDocument, `inlineToOpenDocument` in the
OpenDocument writer is updated, based on the Docx writer.
This commit covers only inline `Code`, because updating `CodeBlock` requires a
structural change to the output document.
Currently, styles are not generated automatically in styles.xml; implementing
that requires an additional commit for the ODT writer.
Although the styles are not included in styles.xml, the output file opens in
LibreOffice (7.0.0.3), with the code shown like normal characters.
if they come before csl-right-inline. This ensures that
the citation number or label will be separated from the
rest by a space, even in formats (like plain) that don't yet have
special handling for the display spans.
This deprecates the use of the external pandoc-citeproc
filter; citation processing is now built in to pandoc.
* Add dependency on citeproc library.
* Add Text.Pandoc.Citeproc module (and some associated unexported
modules under Text.Pandoc.Citeproc). Exports `processCitations`.
[API change]
* Add data files needed for Text.Pandoc.Citeproc: default.csl
in the data directory, and a citeproc directory that is just
used at compile-time. Note that we've added file-embed as a mandatory
rather than a conditional dependency, because of the biblatex
localization files. We might eventually want to use readDataFile
for this, but it would take some code reorganization.
* Text.Pandoc.Logging: Add `CiteprocWarning` to `LogMessage` and use it
in `processCitations`. [API change]
* Add tests from the pandoc-citeproc package as command tests (including
some tests pandoc-citeproc did not pass).
* Remove instructions for building pandoc-citeproc from CI and
release binary build instructions. We will no longer distribute
pandoc-citeproc.
* Markdown reader: tweak abbreviation support. Don't insert a
nonbreaking space after a potential abbreviation if it comes right before
a note or citation. This messes up several things, including citeproc's
moving of note citations.
* Add `csljson` as an input and output format. This allows pandoc
to convert between `csljson` and other bibliography formats,
and to generate formatted versions of CSL JSON bibliographies.
* Add module Text.Pandoc.Writers.CslJson, exporting `writeCslJson`. [API
change]
* Add module Text.Pandoc.Readers.CslJson, exporting `readCslJson`. [API
change]
* Added `bibtex`, `biblatex` as input formats. This allows pandoc
to convert between BibLaTeX and BibTeX and other bibliography formats,
and to generate formatted versions of BibTeX/BibLaTeX bibliographies.
* Add module Text.Pandoc.Readers.BibTeX, exporting `readBibTeX` and
`readBibLaTeX`. [API change]
* Make "standalone" implicit if output format is a bibliography format.
This is needed because pandoc readers for bibliography formats put
the bibliographic information in the `references` field of metadata;
and unless standalone is specified, metadata gets ignored.
(TODO: This needs improvement. We should trigger standalone for the
reader when the input format is bibliographic, and for the writer
when the output format is markdown.)
* Carry over `citationNoteNum` to `citationNoteNumber`. This was just
ignored in pandoc-citeproc.
* Text.Pandoc.Filter: Add `CiteprocFilter` constructor to Filter.
[API change] This runs the processCitations transformation.
We need to treat it like a filter so it can be placed
in the sequence of filter runs (after some, before others).
In FromYAML, this is parsed from `citeproc` or `{type: citeproc}`,
so this special filter may be specified either way in a defaults file
(or by `citeproc: true`, though this gives no control of positioning
relative to other filters). TODO: we need to add something to the
manual section on defaults files for this.
* Add deprecation warning if `pandoc-citeproc` filter is used.
* Add `--citeproc/-C` option to trigger citation processing.
This behaves like a filter and will be positioned
relative to filters as they appear on the command line.
* Rewrote the manual on citations, adding a dedicated Citations
section which also includes some information formerly found in
the pandoc-citeproc man page.
* Look for CSL styles in the `csl` subdirectory of the pandoc user data
directory. This changes the old pandoc-citeproc behavior, which looked
in `~/.csl`. Users can simply symlink `~/.csl` to the `csl`
subdirectory of their pandoc user data directory if they want
the old behavior.
* Add support for CSL bibliography entry formatting to LaTeX, HTML,
Ms writers. Added CSL-related CSS to styles.html.
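
Why citation processing needs a position in the filter sequence can be shown with a toy pipeline. Everything in this Python sketch is hypothetical: the document is a plain dictionary, `add_citation` stands in for a user filter, and `citeproc` stands in for citation processing.

```python
def add_citation(doc):
    """Hypothetical filter that inserts a citation key into the document."""
    return {**doc, "citations": doc["citations"] + ["doe2020"]}


def citeproc(doc):
    """Stand-in for citation processing: resolve whatever is cited so far."""
    return {**doc, "resolved": list(doc["citations"])}


def run_filters(doc, steps):
    """Apply each step in the order given, as on the command line."""
    for step in steps:
        doc = step(doc)
    return doc
```

A citation added by a filter is resolved only if that filter runs before the citation-processing step, which is why the step's position matters.
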
A new type `SimpleTable` is made available to Lua filters. It is
similar to the `Table` type in pandoc versions before 2.10;
conversion functions from and to the new Table type are provided.
Old filters using tables now require minimal changes and can use,
e.g.,
    if PANDOC_VERSION > {2,10,1} then
      pandoc.Table = pandoc.SimpleTable
    end
and
    function Table (tbl)
      tbl = pandoc.utils.to_simple_table(tbl)
      …
      return pandoc.utils.from_simple_table(tbl)
    end
to work with the current pandoc version.
When a list occurs at the beginning of a definition list definition,
it can start on the same line as the label, which looks bad.
Fix that by starting such lists with an `\item[]`.
* Fix hlint suggestions, update hlint.yaml
Most suggestions were redundant brackets. Some required
LambdaCase.
The .hlint.yaml file had a small typo, and didn't ignore camelCase
suggestions in certain modules.
Writers.Tables is now Writers.AnnotatedTable. All of the types and
functions in it have had the "Ann" removed from them. Now it is
expected that the module be imported qualified.
Add Writers.Tables helper functions and types, add tests for those
The Writers.Tables module contains an AnnTable type that is a pandoc
Table with added inferred information that should be enough for
writers (in particular the HTML writer) to operate on without having
to lay out the table themselves.
The toAnnTable and fromAnnTable functions in that module convert
between AnnTable and Table. In addition to producing an AnnTable with
coherent and well-formed annotations, the toAnnTable function also
normalizes its input Table like the table builder does.
Various tests ensure that toAnnTable normalizes tables exactly like
the table builder, and that its annotations are coherent.
commonmark/gfm extensions. These shouldn't really be counted
as extensions, because they can't be disabled in commonmark.
Adjust markdown writer to check for commonmark variant in addition
to extensions.
`fenced_code_blocks`, `backtick_code_blocks`, `fenced_code_attributes`.
These can't really be disabled in the reader, but they need to
be enabled in the writer or we just get indented code.
This changes the Lua API. It is highly unlikely for this change to
affect existing filters, since the documentation for the new Table
constructor (and type) was incomplete and partly wrong before.
The Lua API is now more consistent, as all constructors for elements
with attributes now take attributes as the last parameter.
Specifying `-f ipynb+raw_markdown` will cause Markdown cells
to be represented as raw Markdown blocks, instead of being
parsed. This is not what you want when going from `ipynb`
to other formats, but it may be useful when going from `ipynb`
to Markdown or to `ipynb`, to avoid semantically insignificant
changes in the contents of the Markdown cells that might
otherwise be introduced.
Closes #5408.
Instead rely on the markdown writer with appropriate extensions.
Export writeCommonMark variant from Markdown writer.
This changes a few small things in rendering markdown,
e.g. with respect to requiring backslashes before spaces inside
super/subscripts.
Previously it included all of the following, which make
sense for the legacy markdown_github but not for gfm,
since they are part of base commonmark and thus
can't be turned off in gfm:
- `Ext_all_symbols_escapable`
- `Ext_backtick_code_blocks`
- `Ext_fenced_code_blocks`
- `Ext_space_in_atx_header`
- `Ext_intraword_underscores`
- `Ext_lists_without_preceding_blankline`
- `Ext_shortcut_reference_links`
These have been removed from `githubMarkdownExtensions`, though
they're still turned on for legacy `markdown_github`.
This allows attributes to be added to any block or inline
element, in principle. (Though in many cases this will be
done by adding a Div or Span container, since pandoc's
AST doesn't have a slot for attributes for most elements.)
Currently this is only possible with the commonmark and gfm
readers.
Add `Ext_attributes` constructor for `Extension` [API change].
...instead of cmark-gfm (a wrapper around a C library).
We can now support many more pandoc extensions for
commonmark and gfm.
Add fenced_code_attributes to gfm/commonmark extensions.
Screen readers read an image's `alt` attribute and the figure caption,
both of which come from the same source in pandoc. The figure caption is
hidden from screen readers with the `aria-hidden` attribute. This
improves accessibility.
For HTML4, where `aria-hidden` is not allowed, pandoc still uses an
empty `alt` attribute to avoid duplicate contents.
Closes: #6491
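
The two strategies can be compared side by side. The Python sketch below produces simplified markup for illustration; the exact elements and classes pandoc emits may differ, and `render_figure` is a hypothetical helper.

```python
import html


def render_figure(src, caption, html5=True):
    """Render an image with a caption so the caption is announced only once."""
    src_attr = html.escape(src, quote=True)
    text = html.escape(caption, quote=True)
    if html5:
        # The alt text carries the description; the visible caption is
        # hidden from assistive technology to avoid reading it twice.
        return (
            f'<figure><img src="{src_attr}" alt="{text}" />'
            f'<figcaption aria-hidden="true">{text}</figcaption></figure>'
        )
    # HTML4 has no aria-hidden, so the alt text is left empty instead.
    return (
        f'<div class="figure"><img src="{src_attr}" alt="" />'
        f'<p class="caption">{text}</p></div>'
    )
```
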
The reader now parses the contents of the markdown cell to a Pandoc
structure, but *also* stores the raw markdown in a `source`
attribute on the cell Div. When we convert back to markdown,
this attribute is stripped off and the original source is used.
When we convert to other formats, the attribute is usually
ignored (though it will come through in HTML as a `data-source`
attribute, not unhelpfully).
I'll note some potential drawbacks of this approach:
- It makes it impossible to use pandoc to clean up or
change the contents of markdown cells, e.g.
going from `+smart` to `-smart`.
- There may be formats where the addition of the `source`
attribute is problematic. I can't think of any, though.
Closes #5408.
The lines of unknown keywords, like `#+SOMEWORD: value` are no longer
read as metadata, but kept as raw `org` blocks. This ensures that more
information is retained when round-tripping org-mode files;
additionally, this change makes it possible to support non-standard org
extensions via filters.
The behavior of the `#+AUTHOR` and `#+KEYWORDS` export settings has
changed: Org now allows multiple such lines and adds a space between the
contents of each line. Pandoc now always parses these settings as meta
inlines; setting values are no longer treated as comma-separated lists.
Note that a Lua filter can be used to restore the previous behavior.
`#+DESCRIPTION` lines are treated as text with markup. If multiple such
lines are given, then all lines are read and separated by soft
linebreaks.
Closes: #6485
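
The handling described above can be approximated with a small line-based reader. This Python sketch covers only `#+AUTHOR`, `#+DESCRIPTION`, and unknown keywords; it models a soft linebreak as a newline and ignores every other known keyword, all of which are simplifying assumptions. `read_keywords` is a hypothetical name.

```python
# How repeated lines of the same keyword are joined (assumed model).
JOINERS = {"AUTHOR": " ", "DESCRIPTION": "\n"}


def read_keywords(lines):
    """Split keyword lines into merged metadata and raw unknown lines."""
    meta, raw = {}, []
    for line in lines:
        # Strip the leading "#+" and split at the first colon.
        key, _, value = line[2:].partition(":")
        key, value = key.upper(), value.strip()
        if key not in JOINERS:
            # Unknown keywords are kept verbatim instead of becoming metadata.
            raw.append(line)
        elif key in meta:
            meta[key] = meta[key] + JOINERS[key] + value
        else:
            meta[key] = value
    return meta, raw
```
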
The `tex` export option can be set with `#+OPTIONS: tex:nil` and allows
three settings:
- `t` causes LaTeX fragments to be parsed as TeX or added as raw TeX,
- `nil` removes all LaTeX fragments from the document, and
- `verbatim` treats LaTeX as text.
The default is `t`.
Closes: #4070
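
The three settings map to three outcomes for each LaTeX fragment. The Python sketch below simplifies the `t` case to "keep as raw TeX" and uses invented result tags; `handle_latex` is a hypothetical name.

```python
def handle_latex(fragment, setting="t"):
    """Decide what happens to a LaTeX fragment under a given `tex` setting."""
    if setting == "t":
        # Simplification: the real behavior may also parse the fragment.
        return ("RawTeX", fragment)
    if setting == "nil":
        # The fragment is dropped from the document.
        return None
    if setting == "verbatim":
        # The fragment is kept as ordinary text.
        return ("Text", fragment)
    raise ValueError(f"unknown tex setting: {setting!r}")
```
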
Exceptions: name (which becomes the id), class (which becomes the
classes), and number-lines (which is treated specially to fit
with pandoc highlighting).
Closes #6465.