* Replace org-mode’s verbatim from code to codeWith.
This adds the `"verbatim"` class so that exporters can apply a specific
style on it. For instance, it will be possible for HTML to add a CSS
rule for code + verbatim class.
* Alter test for org-mode’s verbatim change.
See previous commit for further detail on the new implementation.
when `raw_tex` is not enabled. (When `raw_tex` is enabled,
the whole environment is parsed as a raw block.)
The class name is the name of the environment.
Previously, we just included the contents without the
surrounding Div, but having a record of the environment's
boundaries and name can be useful.
Closes#6997.
In 2.11.3 we started adding `\addlinespace`, which produced less
dense tables. This wasn't an intentional change; I misunderstood
a comment in the discussion leading up to the change. This commit
restores the earlier default table appearance.
Note that if you want a less dense table, you can use something like
`\def\arraystretch{1.5}` in your header.
Closes#6996.
The Div wrapper of code blocks with captions now has the class
"captioned-content". The caption itself is added as a Plain block
inside a Div of class "caption". This makes it easier to write filters
which match on captioned code blocks. Existing filters will need to be
updated.
Closes: #6977
The `renderTags'` function was duplicated when the reader used `Text` as
its string type. The duplication is no longer necessary.
A side effect of this change is that empty `<col>` elements are written
as self-closing tags in raw HTML blocks.
Added field to WriterState that denotes the current nesting level for traversing tables.
Depending on the value of that field nested tables are recognized and written.
Asciidoc supports one level of nesting. If deeper tables are to be written, they are
omitted and a warning is issued.
The raw text is now included verbatim in the output. Previously is was parsed
into XML elements, which prevented the inclusion of partial XML snippets.
The `linkifyVariables` function was changing these to links
which then got treated as non-empty by citeproc, leading
to wrong results (e.g. ignoring nonempty URL when empty DOI is present).
Addresses part 2 of jgm/citeproc#41.
Note that the multirow package is needed for rowspans.
It is included in the latex template under a variable,
so that it won't be used unless needed for a table.
- Use dev version of citeproc, which handles duplicate
ids better, preferring the last one in the list
and discarding the rest.
- Ensure that inline citations take priority over external
ones.
See jgm/citeproc#36.
This restores the behavior of pandoc-citeproc.
This means that:
- a URL may be provided, and pandoc will fetch the resource.
- Pandoc will search the resource path for the bibliography
if it is not found relative to the working directory.
Closes#6940.
After the move to JuicyPixels, we were getting incorrect
width and heigh information for some images (see #6936, test-3.jpg).
The correct information was encoded in Exif tags that
JuicyPixels seemed to ignore. So we check these first
before looking at the Width and Height identified by
JuicyPixels.
Closes#6936.
- An image alone in its paragraph (but not a figure) is now
rendered as an independent image, with an `alt` attribute
if a description is supplied.
- An inline image that is not alone in its paragraph will
be rendered, as before, using a substitution.
Such an image cannot have a "center", "left", or
"right" alignment, so the classes `align-center`,
`align-left`, or `align-right` are ignored.
However, `align-top`, `align-middle`, `align-bottom`
will generate a corresponding `align` attribute.
Closes#6948.
Previously we stripped attribute prefixes, reading
`xml:lang` as `lang` for example. This resulted in
two duplicate `lang` attributes when `xml:lang` and
`lang` were both used. This commit causes the prefixes
to be retained, and also avoids invald duplicate
attributes.
Closes#6938.
* Add `Ext_sourcepos` constructor for `Extension`.
* Add `sourcepos` extension (only for commonmark).
* Bump to 2.11.3
With the `sourcepos` extension set set, `data-pos` attributes are added
to the AST by the commonmark reader. No other readers are affected. The
`data-pos` attributes are put on elements that accept attributes; for
other elements, an enlosing Div or Span is added to hold the attributes.
Closes#4565.
DokuWiki lets the user define his own Interwiki links.
Previously pandoc reacted to these by emitting a
google search link, which is not helpful. Instead,
we now just emit the full URL including the
wikilink prefix, e.g. `faquk>FAQ-mathml`.
This at least gives users the ability to
modify the links using filters.
Closes#6932.
Docbook reader produces a `Div` with `title` class for `<title>` element
within an “admonition” element. Markdown writer then turns this
into a fenced div with `title` class attribute. Since fenced divs
are block elements, their content is recognized as a paragraph
by the Markdown reader. This is an issue for Docbook writer because
it would produce an invalid DocBook document from such AST –
the `<title>` element can only contain “inline” elements.
Let’s handle this invalid special case separately by unwrapping
the paragraph before creating the `<title>` element.
Links with (internal) targets that the reader doesn't know about are
converted into emphasized text. Information on the link target is now
preserved by wrapping the text in a Span of class `spurious-link`, with
an attribute `target` set to the link's original target. This allows to
recover and fix broken or unknown links with filters.
See: #6916
This commit adds two extensions to the OpenDocument writer,
`xrefs_name` and `xrefs_number`.
Links to headings, figures and tables inside the document are
substituted with cross-references that will use the name or caption
of the referenced item for `xrefs_name` or the number for `xrefs_number`.
For the `xrefs_number` to be useful heading numbers must be enabled
in the generated document and table and figure captions must be enabled using for example the `native_numbering` extension.
In order for numbers and reference text to be updated the generated
document must be refreshed.
Co-authored-by: Nils Carlson <nils.carlson@ludd.ltu.se>
Previously, we only added xmlns attributes to chapter elements,
even when running with --top-level-division=section.
Let’s add the namespaces to part and section elements too,
when they are the selected top-level divisions.
We do not need to add namespaces to documents produced with
--standalone flag, since those will already have xmlns attribute
on the root element in the template.
Previously bold and italics didn't work properly in LTR
text. This commit causes the w:bCs and w:iCs attributes
to be used, in addition to w:b and w:i, for bold and
italics respectively.
Closes#6911.
Fix appearance of bullets/numbered lists (the first level is slightly
indented to the right instead of right on the margin).
New golden files have been tested using Word 2010 on Windows 10.
- `<tfoot>` elements are no longer added to the table body but used as
table footer.
- Separate `<tbody>` elements are no longer combined into one.
- Attributes on `<thead>`, `<tbody>`, `<th>`/`<td>`, and `<tfoot>`
elements are preserved.
This affected author-in-text citations in footnotes.
It didn't cause problems for the printed output, but for
filters that expected the citation id and other information.
Closes#6890.
This shouldn't happen, in general, but it can happen with
JPEGs that don't conform to the spec. Having a DPI of 0
will blow up size calculations (division by 0).
Closes#6880.
+ Remove the `\strut` that was added at the end of minipage
environments in cells.
+ Replace `\tabularnewline` with `\\ \addlinespace`.
Closes#6842, closes#6860.
which was made (for reasons forgotten) when transferring
this code from pandoc-citeproc. The change led to `--` in
URLs being interpreted as en-dashes, which is unwanted.
Closes#6874.
Table width in relation to text width is not natively supported
by docbook but is by the docbook fo stylesheets through an XML
processing instruction, <?dbfo table-width="50%"?> .
Implement support for this instruction in the DocBook reader.
in cases where we run into trouble parsing inlines til the
closing `]`, e.g. quotes, we return a plain string with the
option contents. Previously we mistakenly included the brackets
in this string.
Closes#6869.
`commonmark_x` never actually supported `auto_identifiers` (it
didn't do anything), because the underlying library implements
gfm-style identifiers only.
Attempts to add the `autolink_identifiers` extension to
`commonmark` will now fail with an error.
Closes#6863.
Previously we only self-contained attributes for
certain tag names (`img`, `embed`, `video`, `input`, `audio`,
`source`, `track`, `section`). Now we self-contain any
occurrence of `src`, `data-src`, `poster`, or `data-background-image`,
on any tag; and also `href` on `link` tags.
Closes#6854 (which specifically asked about
`asciinema-player` tags).
We now better handle `.IP` when it is used with non-bullet,
non-numbered lists, creating a definition list.
We also skip blank lines like groff itself.
Closes#6858.
Previously a path beginning with a drive, like
`C:\foo\bar`, was translated to `C:\/foo/bar`, which
caused problems.
With this fix, the backslashes are removed.
Closes#6173.
Previously we used Setext (underlined) headings by default.
The default is now ATX (`##` style).
* Add the `--markdown-headings=atx|setext` option.
* Deprecate `--atx-headers`.
* Add constructor 'ATXHeadingInLHS` constructor to `LogMessage` [API change].
* Support `markdown-headings` in defaults files.
* Document new options in MANUAL.
Closes#6662.
Background: syntactically, references to example list items
can't be distinguished from citations; we only know which they
are after we've parsed the whole document (and this is resolved
in the `runF` stage).
This means that pandoc's calculation of `citationNoteNum`
can sometimes be wrong when there are example list references.
This commit partially addresses #6836, but only for the case
where the example list references refer to list items defined
previously in the document.
Previously in-text note citations inside a footnote
would sometimes have the final period stripped, even
if it was needed (e.g. on the end of 'ibid').
See #6813.
If `\cL` is defined as `\mathcal{L}`, and `\til` as `\tilde{#1}`,
then `\til\cL` should expand to `\tilde{\mathcal{L}}`, but pandoc
was expanding it to `\tilde\mathcal{L}`. This is fixed by
parsing the arguments in "verbatim mode" when the macro expands
arguments at the point of use.
Closes#6796.
The map-based YAML representation of filters expects `type` and `path`
fields. The path field had to be present for all filter types, but is
not used for citeproc filters. The field can now be omitted when type
is "citeproc", as described in the MANUAL.
When an author-in-text citation like `@foo` occurs in a footnote,
we now render it with: `AUTHOR NAME + COMMA + SPACE + REST`.
Previously we rendered: `AUTHOR NAME + SPACE + "(" + REST + ")"`.
This gives better results. Note that normal citations are still
rendered in parentheses.
We now have LaTeX do the calculation, using `\tabcolsep`.
So we should now have accurate relative column widths no
matter what the text width.
The default template has been modified to load the calc
package if tables are used.
This ensures that bibliography parsing errors generate messages
that include the bibliography file name -- otherwise it can be
quite mysterious where it is coming from.
[API change] New PandocBibliographyError constructor on
PandocError type.
Starting with 2.10.1, fenced divs no longer render with
HTML div tags in commonmark output. This is a regression
due to our transition from cmark-gfm. This commit fixes it.
Closes#6768.
This change will avoid mixed paths like this one when
`--extract-media` is used with a Word file:
`![](C:\Git\TIJ4\Markdown/media/image30.wmf)`
Instead we'll get
`![](C:\Git\TIJ4\Markdown`media`image30.wmf)`.
Closes#6761.
For polyglossia we now use
`\setmainlanguage[variant=brazilian]{portuguese}`
and for babel
`\usepackage[shorthands=off,main=brazilian]{babel}`.
Closes#2953.
Closes#6599
c.f. https://docs.mathjax.org/en/latest/web/components/combined.html
Note that while this use the full variant of the js, this drops the mathml support.
That should be okay, because pandoc renders math in HTML as TeX when using
mathjax.
This change reduces latency.
Closes#6730.
Previously the command would succeed, returning empty metadata,
with no errors or warnings.
API changes:
- Remove now unused CouldNotParseYamlMetadata constructor for
LogMessage (T.P.Logging).
- Add 'Maybe FilePath' parameter to yamlToMeta in T.P.Readers.Markdown.
The renderCslJson function escapes `<`, `>`, and `&` as entities.
This is appropriate when generating HTML, but in CSL JSON
these are supposed to appear unescaped.
Closesjgm/citeproc#17.
For security reasons, some legal firms delete the date from comments and
tracked changes.
* Make date optional (Maybe) in tracked changes and comments datatypes
* Add tests
...into linked DOI, and similarly for other URLs linked in the
bibliography. We want to avoid having a URL in which only the latter
part is linked. Closes#6723.
T.P.Readers.Markdown now exports yamlToRefs. [API change]
T.P.Readers.Metadata exports yamlBsToRefs. [API change]
These allow specifying an id filter so we parse only references
that are used in the document. Improves timing with a 3M
yaml references file from 36s to 17s.
...and CSL abbreviation files. Use resource path to search
in both USERDATADIR/csl and USERDATADIR/csl/dependent.
Also, add .csl or .json extension as needed, so you can just
do --csl zoology.
Languages appear to be sorted by their long name, which leads to
unexpected results: e.g., the long name of *m4* is *GNU m4*, so it is
listed between *gnuassembler* and *go*.
If the first element of a bulleted or ordered list is another list,
then that first item will disappear if the target format is docx. This
changes the docx writer so that it prepends an empty string for those
cases. With this, no items will disappear.
Closes#5948.
To implement Syntax highlighting for OpenDocument, inlineToOpenDocument in OpenDocument Writer is updated based on Docx Writer.
This commit is only for inline Code because update of CodeBlock needs structual change of output document.
Currently, styles are not generated automatically in styles.xml. To implement it, additional commit for ODT Writer is needed.
Although styles are not included in styles.xml, output file can be shown in LibreOffice(7.0.0.3) like normal characters.
if they come before csl-right-inline. This ensures that
the citation number or label will be separated from the
rest by a space, even in formats (like plain) that don't yet have
special handling for the display spans.
This deprecates the use of the external pandoc-citeproc
filter; citation processing is now built in to pandoc.
* Add dependency on citeproc library.
* Add Text.Pandoc.Citeproc module (and some associated unexported
modules under Text.Pandoc.Citeproc). Exports `processCitations`.
[API change]
* Add data files needed for Text.Pandoc.Citeproc: default.csl
in the data directory, and a citeproc directory that is just
used at compile-time. Note that we've added file-embed as a mandatory
rather than a conditional depedency, because of the biblatex
localization files. We might eventually want to use readDataFile
for this, but it would take some code reorganization.
* Text.Pandoc.Loging: Add `CiteprocWarning` to `LogMessage` and use it
in `processCitations`. [API change]
* Add tests from the pandoc-citeproc package as command tests (including
some tests pandoc-citeproc did not pass).
* Remove instructions for building pandoc-citeproc from CI and
release binary build instructions. We will no longer distribute
pandoc-citeproc.
* Markdown reader: tweak abbreviation support. Don't insert a
nonbreaking space after a potential abbreviation if it comes right before
a note or citation. This messes up several things, including citeproc's
moving of note citations.
* Add `csljson` as and input and output format. This allows pandoc
to convert between `csljson` and other bibliography formats,
and to generate formatted versions of CSL JSON bibliographies.
* Add module Text.Pandoc.Writers.CslJson, exporting `writeCslJson`. [API
change]
* Add module Text.Pandoc.Readers.CslJson, exporting `readCslJson`. [API
change]
* Added `bibtex`, `biblatex` as input formats. This allows pandoc
to convert between BibLaTeX and BibTeX and other bibliography formats,
and to generated formatted versions of BibTeX/BibLaTeX bibliographies.
* Add module Text.Pandoc.Readers.BibTeX, exporting `readBibTeX` and
`readBibLaTeX`. [API change]
* Make "standalone" implicit if output format is a bibliography format.
This is needed because pandoc readers for bibliography formats put
the bibliographic information in the `references` field of metadata;
and unless standalone is specified, metadata gets ignored.
(TODO: This needs improvement. We should trigger standalone for the
reader when the input format is bibliographic, and for the writer
when the output format is markdown.)
* Carry over `citationNoteNum` to `citationNoteNumber`. This was just
ignored in pandoc-citeproc.
* Text.Pandoc.Filter: Add `CiteprocFilter` constructor to Filter.
[API change] This runs the processCitations transformation.
We need to treat it like a filter so it can be placed
in the sequence of filter runs (after some, before others).
In FromYAML, this is parsed from `citeproc` or `{type: citeproc}`,
so this special filter may be specified either way in a defaults file
(or by `citeproc: true`, though this gives no control of positioning
relative to other filters). TODO: we need to add something to the
manual section on defaults files for this.
* Add deprecation warning if `upandoc-citeproc` filter is used.
* Add `--citeproc/-C` option to trigger citation processing.
This behaves like a filter and will be positioned
relative to filters as they appear on the command line.
* Rewrote the manual on citatations, adding a dedicated Citations
section which also includes some information formerly found in
the pandoc-citeproc man page.
* Look for CSL styles in the `csl` subdirectory of the pandoc user data
directory. This changes the old pandoc-citeproc behavior, which looked
in `~/.csl`. Users can simply symlink `~/.csl` to the `csl`
subdirectory of their pandoc user data directory if they want
the old behavior.
* Add support for CSL bibliography entry formatting to LaTeX, HTML,
Ms writers. Added CSL-related CSS to styles.html.
A new type `SimpleTable` is made available to Lua filters. It is
similar to the `Table` type in pandoc versions before 2.10;
conversion functions from and to the new Table type are provided.
Old filters using tables now require minimal changes and can use,
e.g.,
if PANDOC_VERSION > {2,10,1} then
pandoc.Table = pandoc.SimpleTable
end
and
function Table (tbl)
tbl = pandoc.utils.to_simple_table(tbl)
…
return pandoc.utils.from_simple_table(tbl)
end
to work with the current pandoc version.
When a list occurs at the beginning of a definition list definition,
it can start on the same line as the label, which looks bad.
Fix that by starting such lists with an `\item[]`.
* Fix hlint suggestions, update hlint.yaml
Most suggestions were redundant brackets. Some required
LambdaCase.
The .hlint.yaml file had a small typo, and didn't ignore camelCase
suggestions in certain modules.
Writers.Tables is now Writers.AnnotatedTable. All of the types and
functions in it have had the "Ann" removed from them. Now it is
expected that the module be imported qualified.
Add Writers.Tables helper functions and types, add tests for those
The Writers.Tables module contains an AnnTable type that is a pandoc
Table with added inferred information that should be enough for
writers (in particular the HTML writer) to operate on without having
to lay out the table themselves.
The toAnnTable and fromAnnTable functions in that module convert
between AnnTable and Table. In addition to producing an AnnTable with
coherent and well-formed annotations, the toAnnTable function also
normalizes its input Table like the table builder does.
Various tests ensure that toAnnTable normalizes tables exactly like
the table builder, and that its annotations are coherent.
commonmark/gfm extensions. These shouldn't really be counted
as extensions, because they can't be disabled in commonmark.
Adjust markdown writer to check for commonmark variant in addition
to extensions.
`fenced_code_blocks`, `backtick_code_blocks`, `fenced_code_attributes`.
These can't really be disabled in the reader, but they need to
be enabled in the writer or we just get indented code.
This changes the Lua API. It is highly unlikely for this change to
affect existing filters, since the documentation for the new Table
constructor (and type) was incomplete and partly wrong before.
The Lua API is now more consistent, as all constructors for elements
with attributes now take attributes as the last parameter.
Specifying `-f ipynb+raw_markdown` will cause Markdown cells
to be represented as raw Markdown blocks, instead of being
parsed. This is not what you want when going from `ipynb`
to other formats, but it may be useful when going from `ipynb`
to Markdown or to `ipynb`, to avoid semantically insignificant
changes in the contents of the Markdown cells that might
otherwise be introduced.
Closes#5408.