- If src is empty, we simply skip the iframe.
- If src is invalid or cannot be fetched, we issue a warning
and skip instead of failing with an error.
- Closes#7099.
This exports functions that uses xml-conduit's parser to
produce an xml-light Element or [Content]. This allows
existing pandoc code to use a better parser without
much modification.
The new parser is used in all places where xml-light's
parser was previously used. Benchmarks show a significant
performance improvement in parsing XML-based formats
(especially ODT and FB2).
Note that the xml-light types use String, so the
conversion from xml-conduit types involves a lot
of extra allocation. It would be desirable to
avoid that in the future by gradually switching
to using xml-conduit directly. This can be done
module by module.
The new parser also reports errors, which we report
when possible.
A new constructor PandocXMLError has been added to
PandocError in T.P.Error [API change].
Closes#7091, which was the main stimulus.
These changes revealed the need for some changes
in the tests. The docbook-reader.docbook test
lacked definitions for the entities it used; these
have been added. And the docx golden tests have been
updated, because the new parser does not preserve
the order of attributes.
Add entity defs to docbook-reader.docbook.
Update golden tests for docx.
This change allows bibtex/biblatex output to wrap as other
formats do, depending on the settings of `--wrap` and `--columns`.
It also introduces default templates for bibtex and biblatex,
which allow for using the variables `header-include`, `include-before`
or `include-after` (or alternatively the command line options
`--include-in-header`, `--include-before-body`, `--include-after-body`)
to insert content into the generated bibtex/biblatex.
This change requires a change in the return type of the unexported
`T.P.Citeproc.writeBibTeXString` from `Text` to `Doc Text`.
Closes#7068.
Previously there was a messy code path that gave strange
results in some cases, not passing through raw tex but
trying to extract a string content. This was an artefact
of trying to handle some special bibtex-specific commands
in the BibTeX reader. Now we just handle these in the
LaTeX reader and simplify parsing in the BibTeX reader.
This does mean that more raw tex will be passed through
(and currently this is not sensitive to the `raw_tex`
extension; this should be fixed).
Closes#7049.
If the table lacks a header, the header row should be an empty
list. Previously we got a list of empty cells, which caused
an empty header to be emitted instead of no header. In LaTeX/PDF
output that meant we got a double top line with space between.
@tarleb @despres - please let me know if this is problematic
for some reason I'm not grasping.
Allow defaults files to inherit options from other defaults files by
specifying them with the following syntax:
`defaults: [list of defaults files or single defaults file]`.
In 2.11.3 we started adding `\addlinespace`, which produced less
dense tables. This wasn't an intentional change; I misunderstood
a comment in the discussion leading up to the change. This commit
restores the earlier default table appearance.
Note that if you want a less dense table, you can use something like
`\def\arraystretch{1.5}` in your header.
Closes#6996.
Added field to WriterState that denotes the current nesting level for traversing tables.
Depending on the value of that field nested tables are recognized and written.
Asciidoc supports one level of nesting. If deeper tables are to be written, they are
omitted and a warning is issued.
- An image alone in its paragraph (but not a figure) is now
rendered as an independent image, with an `alt` attribute
if a description is supplied.
- An inline image that is not alone in its paragraph will
be rendered, as before, using a substitution.
Such an image cannot have a "center", "left", or
"right" alignment, so the classes `align-center`,
`align-left`, or `align-right` are ignored.
However, `align-top`, `align-middle`, `align-bottom`
will generate a corresponding `align` attribute.
Closes#6948.
Previously we stripped attribute prefixes, reading
`xml:lang` as `lang` for example. This resulted in
two duplicate `lang` attributes when `xml:lang` and
`lang` were both used. This commit causes the prefixes
to be retained, and also avoids invald duplicate
attributes.
Closes#6938.
This commit adds two extensions to the OpenDocument writer,
`xrefs_name` and `xrefs_number`.
Links to headings, figures and tables inside the document are
substituted with cross-references that will use the name or caption
of the referenced item for `xrefs_name` or the number for `xrefs_number`.
For the `xrefs_number` to be useful heading numbers must be enabled
in the generated document and table and figure captions must be enabled using for example the `native_numbering` extension.
In order for numbers and reference text to be updated the generated
document must be refreshed.
Co-authored-by: Nils Carlson <nils.carlson@ludd.ltu.se>
This affected author-in-text citations in footnotes.
It didn't cause problems for the printed output, but for
filters that expected the citation id and other information.
Closes#6890.
+ Remove the `\strut` that was added at the end of minipage
environments in cells.
+ Replace `\tabularnewline` with `\\ \addlinespace`.
Closes#6842, closes#6860.
Table width in relation to text width is not natively supported
by docbook but is by the docbook fo stylesheets through an XML
processing instruction, <?dbfo table-width="50%"?> .
Implement support for this instruction in the DocBook reader.
in cases where we run into trouble parsing inlines til the
closing `]`, e.g. quotes, we return a plain string with the
option contents. Previously we mistakenly included the brackets
in this string.
Closes#6869.
`commonmark_x` never actually supported `auto_identifiers` (it
didn't do anything), because the underlying library implements
gfm-style identifiers only.
Attempts to add the `autolink_identifiers` extension to
`commonmark` will now fail with an error.
Closes#6863.
We now better handle `.IP` when it is used with non-bullet,
non-numbered lists, creating a definition list.
We also skip blank lines like groff itself.
Closes#6858.
Previously we used Setext (underlined) headings by default.
The default is now ATX (`##` style).
* Add the `--markdown-headings=atx|setext` option.
* Deprecate `--atx-headers`.
* Add constructor 'ATXHeadingInLHS` constructor to `LogMessage` [API change].
* Support `markdown-headings` in defaults files.
* Document new options in MANUAL.
Closes#6662.
Background: syntactically, references to example list items
can't be distinguished from citations; we only know which they
are after we've parsed the whole document (and this is resolved
in the `runF` stage).
This means that pandoc's calculation of `citationNoteNum`
can sometimes be wrong when there are example list references.
This commit partially addresses #6836, but only for the case
where the example list references refer to list items defined
previously in the document.
If `\cL` is defined as `\mathcal{L}`, and `\til` as `\tilde{#1}`,
then `\til\cL` should expand to `\tilde{\mathcal{L}}`, but pandoc
was expanding it to `\tilde\mathcal{L}`. This is fixed by
parsing the arguments in "verbatim mode" when the macro expands
arguments at the point of use.
Closes#6796.
When an author-in-text citation like `@foo` occurs in a footnote,
we now render it with: `AUTHOR NAME + COMMA + SPACE + REST`.
Previously we rendered: `AUTHOR NAME + SPACE + "(" + REST + ")"`.
This gives better results. Note that normal citations are still
rendered in parentheses.
We now have LaTeX do the calculation, using `\tabcolsep`.
So we should now have accurate relative column widths no
matter what the text width.
The default template has been modified to load the calc
package if tables are used.
Starting with 2.10.1, fenced divs no longer render with
HTML div tags in commonmark output. This is a regression
due to our transition from cmark-gfm. This commit fixes it.
Closes#6768.
This fixes a problem with author-in-text citations for references
including both an author and an editor. Previously, both were
included in the text, but only the author should be.
Closes#6765. Added a test.
This fixes the citation number issue with ieee.csl and other
styles that do not explicitly sort bibliographies. (Pandoc
was numbering them by their order in the bibliography file,
rather than the order cited, as required by the CSL spec.)
Closes#6741.
The renderCslJson function escapes `<`, `>`, and `&` as entities.
This is appropriate when generating HTML, but in CSL JSON
these are supposed to appear unescaped.
Closesjgm/citeproc#17.
if they come before csl-right-inline. This ensures that
the citation number or label will be separated from the
rest by a space, even in formats (like plain) that don't yet have
special handling for the display spans.
This deprecates the use of the external pandoc-citeproc
filter; citation processing is now built in to pandoc.
* Add dependency on citeproc library.
* Add Text.Pandoc.Citeproc module (and some associated unexported
modules under Text.Pandoc.Citeproc). Exports `processCitations`.
[API change]
* Add data files needed for Text.Pandoc.Citeproc: default.csl
in the data directory, and a citeproc directory that is just
used at compile-time. Note that we've added file-embed as a mandatory
rather than a conditional depedency, because of the biblatex
localization files. We might eventually want to use readDataFile
for this, but it would take some code reorganization.
* Text.Pandoc.Loging: Add `CiteprocWarning` to `LogMessage` and use it
in `processCitations`. [API change]
* Add tests from the pandoc-citeproc package as command tests (including
some tests pandoc-citeproc did not pass).
* Remove instructions for building pandoc-citeproc from CI and
release binary build instructions. We will no longer distribute
pandoc-citeproc.
* Markdown reader: tweak abbreviation support. Don't insert a
nonbreaking space after a potential abbreviation if it comes right before
a note or citation. This messes up several things, including citeproc's
moving of note citations.
* Add `csljson` as and input and output format. This allows pandoc
to convert between `csljson` and other bibliography formats,
and to generate formatted versions of CSL JSON bibliographies.
* Add module Text.Pandoc.Writers.CslJson, exporting `writeCslJson`. [API
change]
* Add module Text.Pandoc.Readers.CslJson, exporting `readCslJson`. [API
change]
* Added `bibtex`, `biblatex` as input formats. This allows pandoc
to convert between BibLaTeX and BibTeX and other bibliography formats,
and to generated formatted versions of BibTeX/BibLaTeX bibliographies.
* Add module Text.Pandoc.Readers.BibTeX, exporting `readBibTeX` and
`readBibLaTeX`. [API change]
* Make "standalone" implicit if output format is a bibliography format.
This is needed because pandoc readers for bibliography formats put
the bibliographic information in the `references` field of metadata;
and unless standalone is specified, metadata gets ignored.
(TODO: This needs improvement. We should trigger standalone for the
reader when the input format is bibliographic, and for the writer
when the output format is markdown.)
* Carry over `citationNoteNum` to `citationNoteNumber`. This was just
ignored in pandoc-citeproc.
* Text.Pandoc.Filter: Add `CiteprocFilter` constructor to Filter.
[API change] This runs the processCitations transformation.
We need to treat it like a filter so it can be placed
in the sequence of filter runs (after some, before others).
In FromYAML, this is parsed from `citeproc` or `{type: citeproc}`,
so this special filter may be specified either way in a defaults file
(or by `citeproc: true`, though this gives no control of positioning
relative to other filters). TODO: we need to add something to the
manual section on defaults files for this.
* Add deprecation warning if `upandoc-citeproc` filter is used.
* Add `--citeproc/-C` option to trigger citation processing.
This behaves like a filter and will be positioned
relative to filters as they appear on the command line.
* Rewrote the manual on citatations, adding a dedicated Citations
section which also includes some information formerly found in
the pandoc-citeproc man page.
* Look for CSL styles in the `csl` subdirectory of the pandoc user data
directory. This changes the old pandoc-citeproc behavior, which looked
in `~/.csl`. Users can simply symlink `~/.csl` to the `csl`
subdirectory of their pandoc user data directory if they want
the old behavior.
* Add support for CSL bibliography entry formatting to LaTeX, HTML,
Ms writers. Added CSL-related CSS to styles.html.
When a list occurs at the beginning of a definition list definition,
it can start on the same line as the label, which looks bad.
Fix that by starting such lists with an `\item[]`.
Instead rely on the markdown writer with appropriate extensions.
Export writeCommonMark variant from Markdown writer.
This changes a few small things in rendering markdown,
e.g. w/r/t requiring backslashes before spaces inside
super/subscripts.
Screen readers read an image's `alt` attribute and the figure caption,
both of which come from the same source in pandoc. The figure caption is
hidden from screen readers with the `aria-hidden` attribute. This
improves accessibility.
For HTML4, where `aria-hidden` is not allowed, pandoc still uses an
empty `alt` attribute to avoid duplicate contents.
Closes: #6491
Closes#6385. (The summary element needs to be the first
child of details and should not be enclosed by p tags.)
NOTE: you need to include a blank line before the closing
`</details>`, if you want the last part of the content to
be parsed as a paragraph.
Some CSS to ensure that display math is
displayed centered and on a new line is now included
in the default HTML-based templates; this may be
overridden if the user wants a different behavior.
This reverts a change in the last release; the Div is
no longer needed, because we can now put the id right in
the Table's attributes. However, writers may still need
to be modified to do something with the id in a Table
(e.g. create an anchor), so in the short term we may lose
the ability to link to tables in some writers.
The Builder.simpleTable now only adds a row to the TableHead when the
given header row is not null. This uncovered an inconsistency in the
readers: some would unconditionally emit a header filled with empty
cells, even if the header was not present. Now every reader has the
conditional behaviour. Only the XWiki writer depended on the header
row being always present; it now pads its head as necessary.
- Writers.Native is now adapted to the new Table type.
- Inline captions should now be conditionally wrapped in a Plain, not
a Para block.
- The toLegacyTable function now lives in Writers.Shared.
- SmallCaps instead of Span for the part after the initial capital.
- Ensure that both arguments are parsed, so that in Markdown both
are treated as raw LateX. (Closes #6258.)
This reverts commit 3493d6afaa.
This might be worth considering in the future, but let's not do
it yet...the additional complexity needs a better justification.
This is experimental. Normally metadata values are interpreted
as markdown, but if the !!literal tag is used they will be interpreted
as plain strings.
We need to consider whether this can still be implemented if
we switch back from HsYAML to yaml for performance reasons.
Closes#5849. Previously biblatex citations were only grouped if
there was no prefix. This patch allows them to be grouped in
subgroups split by prefixes and suffixes, which allows better citation
sorting.
If parsing fails in a raw environment (e.g. due to special
characters like unescaped `_`), try again as a verbatim
environment, which is less sensitive to special characters.
This allows us to capture special environments that change
catcodes as raw tex when `-f latex+raw_tex` is used.
Closes#6034.
The fix to #6030 actually changed behavior, so that the
2D nesting occurred at slide level N-1 and N, instead of
at the top-level section. This commit restores the 2.7.3 behavior.
If there are more than 2 levels, the top level is horizontal
and the rest are collapsed to vertical.
Closes#6032.
With positive heading shifts, starting in 2.8 this option caused
metadata titles to be removed and changed to regular headings.
This behavior is incompatible with the old behavior of
`--base-header-level` and breaks old workflows, so with this
commit we are rolling back this change.
Now, there is an asymmetry in positive and negative heading
level shifts:
+ With positive shifts, the metadata title stays the same and
does not get changed to a heading in the body.
+ With negative shifts, a heading can be converted into the
metadata title.
I think this is a desirable combination of features, despite
the asymmetry. One might, e.g., want to have a document
with level-1 section headigs, but render it to HTML with
level-2 headings, retaining the metadata title (which pandoc
will render as a level-1 heading with the default template).
Closes#5957.
Revises #5615.
If a simple table would be too wide, we use a grid table.
The code for generating grid tables has been adjusted to
give more intelligent column widths when widths aren't
given. (This also affects the markdown writer.)
Closes#5899.
`Nothing` means: nothing specified.
`Just []` means: an empty list specified (e.g. in defaults).
Potentially these could lead to different behavior: see #5888.
Superscripts and subscripts cannot contain spaces,
but newlines were previously allowed (unintentionally).
This led to bad interactions in some cases with footnotes.
E.g.
```
foo^[note]
bar^[note]
```
With this change newlines are also not allowed inside
super/subscripts.
Closes#5878.
* Add HTML Reader support for `<dfn>`, parsing this as a Span with class `dfn`.
* Change `htmlSpanLikeElements` implementation to retain classes,
attributes and inline content.
Previously, if a document contained two YAML metadata blocks
that set the same field, the conflict would be resolved in favor
of the first. Now it is resolved in favor of the second (due to
a change in pandoc-types).
This makes the behavior more uniform with other things in pandoc
(such as reference links and `--metadata-file`).
When a div surrounds multiple sections at the same level,
or a section of highre level followed by one of lower level,
then we just leave it as a div and create a new div for the
section.
Closes#5846, closes#5761.
Parse <mark> elements from HTML as HTML span like elements, with a
single class matching the tag name `mark`. Mark elements are rendered to
HTML using the native <mark> element.
Fixes https://github.com/jgm/pandoc/issues/5797.
* Text.Pandoc.Shared: export `htmlSpanLikeElements` [API change]
This commit also introduces a mapping of HTML span like elements that
are internally represented as a Span with a single class, but that are
converted back to the original element by the html writer. As of now,
only the kbd element is handled this way. Ideally these elements should
be handled as plain AST values, but since that would be a breaking
change with a large impact, we revert to this stop-gap solution.
Fixes https://github.com/jgm/pandoc/issues/5796.
...even when we must lose width information.
All in all this seems to be people's preferred behavior, even though it
is slightly lossier.
Closes#2608.
Closes#4497.
on conflicting fields. This changes earlier behavior (but not in
a release), where first took precedence.
Note that this may seem inconsistent with the behavior of
multiple YAML blocks within a document, where the first takes
precedence. Still, it is convenient to be able to override
defaults with options later on the command line.
If this is present on a heading with the 'unnumbered' class,
the heading won't appear in the TOC. This class has no
effect if 'unnumbered' is not also specified.
This affects HTML-based writers (including slide shows
and epub), LateX (including beamer), RTF, and PowerPoint.
Other writers do not yet support `unlisted`.
Closes#1762.
We were requiring consistent indentation, but this
isn't required by RST, as long as each nonblank
line of the block has *some* indentation.
Closes#5753.
Previously we used the following Project Gutenberg conventions
for plain output:
- extra space before and after level 1 and 2 headings
- all-caps for strong emphasis `LIKE THIS`
- underscores surrounding regular emphasis `_like this_`
This commit makes `plain` output plainer. Strong and Emph
inlines are rendered without special formatting. Headings
are also rendered without special formatting, and with only
one blank line following.
To restore the former behavior, use `-t plain+gutenberg`.
API change: Add `Ext_gutenberg` constructor to `Extension`.
See #5741.
Deprecate --base-heading-level.
The new option does everything the old one does, but also
allows negative shifts. It also promotes the document
metadata (if not null) to a level-1 heading with a +1 shift,
and demotes an initial level-1 heading to document metadata
with a -1 shift. This supports converting documents that
use an initial level-1 heading for the document title.
Closes#5615.
* Org reader: allow the `-i` switch to ignore leading spaces.
* Org reader: handle awkwardly-aligned code blocks within lists.
Code blocks in Org lists must have their #+BEGIN_ aligned in a
reasonable way, but their other components can be positioned otherwise.
Text.Pandoc.Shared:
+ Remove `Element` type [API change]
+ Remove `makeHierarchicalize` [API change]
+ Add `makeSections` [API change]
+ Export `deLink` [API change]
Now that we have Divs, we can use them to represent the structure
of sections, and we don't need a special Element type.
`makeSections` reorganizes a block list, adding Divs with
class `section` around sections, and adding numbering
if needed.
This change also fixes some longstanding issues recognizing
section structure when the document contains Divs.
Closes#3057, see also #997.
All writers have been changed to use `makeSections`.
Note that in the process we have reverted the change
c1d058aeb1
made in response to #5168, which I'm not completely
sure was a good idea.
Lua modules have also been adjusted accordingly.
Existing lua filters that use `hierarchicalize` will
need to be rewritten to use `make_sections`.
Revert "hierarchicalize: ensure that sections get ids..."
This reverts commit 212406a61d.
Revert "Improve detection of headings in Divs by hierarchicalize."
This reverts commit 6e2cfd6c97.
Revert "Shared.hierarchicalize: improve handling of div and section structure."
This reverts commit 345b33762e.
The structure
```
<h1>one</h1>
<div>
<h1>two</h1>
</div>
```
should create two coordinate sections, not a section with
a subsection. Now it does.
Extends #3057.
Previously Divs were opaque to hierarchicalize, so headings
inside divs didn't get into the table of contents, for
example (#3057).
Now hierarchicalize treats Divs as sections when appropriate.
For example, these structures both yield a section and a
subsection:
``` html
<div>
<h1>one</h1>
<div>
<h2>two</h2>
</div>
</div>
```
``` html
<div>
<h1>one</h1>
<div>
<h1>two</h1>
</div>
</div>
```
Note that
``` html
<h1>one</h1>
<div>
<h2>two</h2>
</div>
<h1>three</h1>
```
gets parsed as the structure
one
two
three
which may not always be desirable.
Closes#3057.
Previously we used named character references with html5 output.
But these aren't valid XML, and we aim to produce html5 that is
also valid XHTML (polyglot markup). (This is also needed for
epub3.)
Closes#5718.
+ Remove Text.Pandoc.Pretty; use doclayout instead. [API change]
+ Text.Pandoc.Writers.Shared: remove metaToJSON, metaToJSON'
[API change].
+ Text.Pandoc.Writers.Shared: modify `addVariablesToContext`,
`defField`, `setField`, `getField`, `resetField` to work with
Context rather than JSON values. [API change]
+ Text.Pandoc.Writers.Shared: export new function `endsWithPlain` [API
change].
+ Use new templates and doclayout in writers.
+ Use Doc-based templates in all writers.
+ Adjust three tests for minor template rendering differences.
+ Added indentation to body in docbook4, docbook5 templates.
The main impact of this change is better reflowing of content
interpolated into templates. Previously, interpolated variables
were rendered independently and intepolated as strings, which could lead
to overly long lines. Now the templates interpolated as Doc values
which may include breaking spaces, and reflowing occurs
after template interpolation rather than before.
Changed optMetadataFile from `Maybe FilePath` to `[FilePath]`. This allows
for multiple YAML metadata files to be added. The new default value has
been changed from `Nothing` to `[]`.
To account for this change in `Text.Pandoc.App`, `metaDataFromFile` now
operates on two `mapM` calls (for `readFileLazy` and `yamlToMeta`) and a fold.
Added a test (command/5700.md) which tests this functionality and
updated MANUAL.txt, as per the contributing guidelines.
With the current behavior, using `foldr1 (<>)`, values within files
specified first will be used over those in later files. (If the reverse
of this behavior would be preferred, it should be fixed by changing
foldr1 to foldl1.)
the token string is modified by a parser (e.g. accent when
it only takes part of a Word token).
Closes#5686. Still not ideal, because we get the whole
`\t0BAR` and not just `\t0` as a raw latex inline command.
But I'm willing to let this be an edge case, since you
can easily work around this by inserting a space, braces,
or raw attribute. The important thing is that we no longer
drop the rest of the document after a raw latex inline
command that gobbles only part of a Word token!
The web service passed in to `--webtex` may render formulas using inline
or display style by default. Prefixing formulas with the appropriate
command ensures they are rendered correctly.
This is a followup to the discussion in #5656.
Previously we just omitted these. Now we render them
using `\hfill\break` instead of `\\`. This is a revision
of a PR by @sabine (#5591) who should be credited with the
idea.
Closes#3324.
The `raw_attribute` will be used to mark raw bits, even HTML
and LaTeX, and even when `raw_html` and `raw_tex` are enabled,
as they are by default.
To get the old behavior, disable `raw_attribute` in the writer.
Closes#4311.
Now we boldface code but not other things. This matches the
most common style in man pages (particularly option lists).
Also, remove a regression in the last commit in which 'nowrap'
was removed.
Previously if the language was not in the list of listings-
supported languages, it would not be added as a class, so
custom syntax highlighting could not be used.
Closes#5540.
Change is in rawLaTeXInline in LaTeX reader, but
it affects the markdown reader and other readers
that allow raw LaTeX.
Previously, trailing `{}` would be included for
unknown commands, but not for known commands.
However, they are sometimes used to avoid a trailing
space after the command. The chances that a `{}`
after a LaTeX command is not part of the command
are very small.
Closes#5439.
Move up the pattern match to be reachable, closes#5517.
Previously `file:/` URLs were handled wrongly and pandoc attempted
to make HTTP requests, which failed.
Previously if labels had integer names, it could produce a conflict
with auto-labeled reference links. Now we test for a conflict and find
the next available integer.
Note that this involves adding a new state variable `stPrevRefs` to
keep track of refs used in other document parts when using
`--reference-location=block|section`
Closes#5495