Starting with 2.10.1, fenced divs no longer render with
HTML div tags in commonmark output. This is a regression
due to our transition from cmark-gfm. This commit fixes it.
Closes#6768.
This fixes a problem with author-in-text citations for references
including both an author and an editor. Previously, both were
included in the text, but only the author should be.
Closes#6765. Added a test.
This fixes the citation number issue with ieee.csl and other
styles that do not explicitly sort bibliographies. (Pandoc
was numbering them by their order in the bibliography file,
rather than the order cited, as required by the CSL spec.)
Closes#6741.
The renderCslJson function escapes `<`, `>`, and `&` as entities.
This is appropriate when generating HTML, but in CSL JSON
these are supposed to appear unescaped.
Closesjgm/citeproc#17.
For security reasons, some legal firms delete the date from comments and
tracked changes.
* Make date optional (Maybe) in tracked changes and comments datatypes
* Add tests
If the first element of a bulleted or ordered list is another list,
then that first item will disappear if the target format is docx. This
changes the docx writer so that it prepends an empty string for those
cases. With this, no items will disappear.
Closes#5948.
if they come before csl-right-inline. This ensures that
the citation number or label will be separated from the
rest by a space, even in formats (like plain) that don't yet have
special handling for the display spans.
This deprecates the use of the external pandoc-citeproc
filter; citation processing is now built in to pandoc.
* Add dependency on citeproc library.
* Add Text.Pandoc.Citeproc module (and some associated unexported
modules under Text.Pandoc.Citeproc). Exports `processCitations`.
[API change]
* Add data files needed for Text.Pandoc.Citeproc: default.csl
in the data directory, and a citeproc directory that is just
used at compile-time. Note that we've added file-embed as a mandatory
rather than a conditional depedency, because of the biblatex
localization files. We might eventually want to use readDataFile
for this, but it would take some code reorganization.
* Text.Pandoc.Loging: Add `CiteprocWarning` to `LogMessage` and use it
in `processCitations`. [API change]
* Add tests from the pandoc-citeproc package as command tests (including
some tests pandoc-citeproc did not pass).
* Remove instructions for building pandoc-citeproc from CI and
release binary build instructions. We will no longer distribute
pandoc-citeproc.
* Markdown reader: tweak abbreviation support. Don't insert a
nonbreaking space after a potential abbreviation if it comes right before
a note or citation. This messes up several things, including citeproc's
moving of note citations.
* Add `csljson` as and input and output format. This allows pandoc
to convert between `csljson` and other bibliography formats,
and to generate formatted versions of CSL JSON bibliographies.
* Add module Text.Pandoc.Writers.CslJson, exporting `writeCslJson`. [API
change]
* Add module Text.Pandoc.Readers.CslJson, exporting `readCslJson`. [API
change]
* Added `bibtex`, `biblatex` as input formats. This allows pandoc
to convert between BibLaTeX and BibTeX and other bibliography formats,
and to generated formatted versions of BibTeX/BibLaTeX bibliographies.
* Add module Text.Pandoc.Readers.BibTeX, exporting `readBibTeX` and
`readBibLaTeX`. [API change]
* Make "standalone" implicit if output format is a bibliography format.
This is needed because pandoc readers for bibliography formats put
the bibliographic information in the `references` field of metadata;
and unless standalone is specified, metadata gets ignored.
(TODO: This needs improvement. We should trigger standalone for the
reader when the input format is bibliographic, and for the writer
when the output format is markdown.)
* Carry over `citationNoteNum` to `citationNoteNumber`. This was just
ignored in pandoc-citeproc.
* Text.Pandoc.Filter: Add `CiteprocFilter` constructor to Filter.
[API change] This runs the processCitations transformation.
We need to treat it like a filter so it can be placed
in the sequence of filter runs (after some, before others).
In FromYAML, this is parsed from `citeproc` or `{type: citeproc}`,
so this special filter may be specified either way in a defaults file
(or by `citeproc: true`, though this gives no control of positioning
relative to other filters). TODO: we need to add something to the
manual section on defaults files for this.
* Add deprecation warning if `upandoc-citeproc` filter is used.
* Add `--citeproc/-C` option to trigger citation processing.
This behaves like a filter and will be positioned
relative to filters as they appear on the command line.
* Rewrote the manual on citatations, adding a dedicated Citations
section which also includes some information formerly found in
the pandoc-citeproc man page.
* Look for CSL styles in the `csl` subdirectory of the pandoc user data
directory. This changes the old pandoc-citeproc behavior, which looked
in `~/.csl`. Users can simply symlink `~/.csl` to the `csl`
subdirectory of their pandoc user data directory if they want
the old behavior.
* Add support for CSL bibliography entry formatting to LaTeX, HTML,
Ms writers. Added CSL-related CSS to styles.html.
A new type `SimpleTable` is made available to Lua filters. It is
similar to the `Table` type in pandoc versions before 2.10;
conversion functions from and to the new Table type are provided.
Old filters using tables now require minimal changes and can use,
e.g.,
if PANDOC_VERSION > {2,10,1} then
pandoc.Table = pandoc.SimpleTable
end
and
function Table (tbl)
tbl = pandoc.utils.to_simple_table(tbl)
…
return pandoc.utils.from_simple_table(tbl)
end
to work with the current pandoc version.
When a list occurs at the beginning of a definition list definition,
it can start on the same line as the label, which looks bad.
Fix that by starting such lists with an `\item[]`.
* Fix hlint suggestions, update hlint.yaml
Most suggestions were redundant brackets. Some required
LambdaCase.
The .hlint.yaml file had a small typo, and didn't ignore camelCase
suggestions in certain modules.
Writers.Tables is now Writers.AnnotatedTable. All of the types and
functions in it have had the "Ann" removed from them. Now it is
expected that the module be imported qualified.
Add Writers.Tables helper functions and types, add tests for those
The Writers.Tables module contains an AnnTable type that is a pandoc
Table with added inferred information that should be enough for
writers (in particular the HTML writer) to operate on without having
to lay out the table themselves.
The toAnnTable and fromAnnTable functions in that module convert
between AnnTable and Table. In addition to producing an AnnTable with
coherent and well-formed annotations, the toAnnTable function also
normalizes its input Table like the table builder does.
Various tests ensure that toAnnTable normalizes tables exactly like
the table builder, and that its annotations are coherent.
Instead rely on the markdown writer with appropriate extensions.
Export writeCommonMark variant from Markdown writer.
This changes a few small things in rendering markdown,
e.g. w/r/t requiring backslashes before spaces inside
super/subscripts.
Screen readers read an image's `alt` attribute and the figure caption,
both of which come from the same source in pandoc. The figure caption is
hidden from screen readers with the `aria-hidden` attribute. This
improves accessibility.
For HTML4, where `aria-hidden` is not allowed, pandoc still uses an
empty `alt` attribute to avoid duplicate contents.
Closes: #6491
The reader now parses the contents of the markdown cell to a Pandoc
structure, but *also* stores the raw markdown in a `source`
attribute on the cell Div. When we convert back to markdown,
this attribute is stripped off and the original source is used.
When we convert to other formats, the attribute is usually
ignored (though it will come through in HTML as a `data-source`
attribute, not unhelpfully).
I'll note some potential drawbacks of this approach:
- It makes it impossible to use pandoc to clean up or
change the contents of markdown cells, e.g.
going from `+smart` to `-smart`.
- There may be formats where the addition of the `source`
attribute is problematic. I can't think of any, though.
Closes#5408.
The lines of unknown keywords, like `#+SOMEWORD: value` are no longer
read as metadata, but kept as raw `org` blocks. This ensures that more
information is retained when round-tripping org-mode files;
additionally, this change makes it possible to support non-standard org
extensions via filters.
The behavior of the `#+AUTHOR` and `#+KEYWORD` export settings has
changed: Org now allows multiple such lines and adds a space between the
contents of each line. Pandoc now always parses these settings as meta
inlines; setting values are no longer treated as comma-separated lists.
Note that a Lua filter can be used to restore the previous behavior.
`#+DESCRIPTION` lines are treated as text with markup. If multiple such
lines are given, then all lines are read and separated by soft
linebreaks.
Closes: #6485
The `tex` export option can be set with `#+OPTION: tex:nil` and allows
three settings:
- `t` causes LaTeX fragments to be parsed as TeX or added as raw TeX,
- `nil` removes all LaTeX fragments from the document, and
- `verbatim` treats LaTeX as text.
The default is `t`.
Closes: #4070
Braces are now always escaped, even within words or when surrounded by
whitespace. Jira and Confluence treat braces specially.
Package jira-wiki-markup must be version 1.3.2 or later.
Fixes: #6478
Closes#6385. (The summary element needs to be the first
child of details and should not be enclosed by p tags.)
NOTE: you need to include a blank line before the closing
`</details>`, if you want the last part of the content to
be parsed as a paragraph.
Some CSS to ensure that display math is
displayed centered and on a new line is now included
in the default HTML-based templates; this may be
overridden if the user wants a different behavior.
This change will not have any effect with the default style.
However, it enables users to use a style (via a reference.docx)
that turns on row and/or column bands.
Closes#6371.
This reverts a change in the last release; the Div is
no longer needed, because we can now put the id right in
the Table's attributes. However, writers may still need
to be modified to do something with the id in a Table
(e.g. create an anchor), so in the short term we may lose
the ability to link to tables in some writers.
The PandocError type is used throughout the Lua subsystem, all Lua
functions throw an exception of this type if an error occurs. The
`LuaException` type is removed and no longer exported from
`Text.Pandoc.Lua`. In its place, a new constructor `PandocLuaError` is
added to PandocError.
Now a cell with dimension (h, w) will be cut up into h*w cells of
dimension (1,1), all in the same grid position, with the upper-left
holding the original cell contents and the rest being empty.
The Builder.simpleTable now only adds a row to the TableHead when the
given header row is not null. This uncovered an inconsistency in the
readers: some would unconditionally emit a header filled with empty
cells, even if the header was not present. Now every reader has the
conditional behaviour. Only the XWiki writer depended on the header
row being always present; it now pads its head as necessary.
- Writers.Native is now adapted to the new Table type.
- Inline captions should now be conditionally wrapped in a Plain, not
a Para block.
- The toLegacyTable function now lives in Writers.Shared.
Icons are now converted as follows: `(/)` to ✔, `(x)` to ❌, `(!)` to
❗, `(+)` to ➕, `(-)` to ➖, `(off)` to 🌙, and `(*)` to ☆. The new
icons render well in most fonts. Furthermore, the UTF-8 characters all
fit into 4-bytes.
Closes: #6264
- SmallCaps instead of Span for the part after the initial capital.
- Ensure that both arguments are parsed, so that in Markdown both
are treated as raw LateX. (Closes #6258.)
Nested syntaxes are specified like this:
{{{sql
SELECT * FROM table
}}}
The preformatted code block parser has been extended to check if the
first attribute of the block is not a `key=value` pair, and in that case
it will be considered as a class.
Closes#6256.
Jira text which is marked as `+inserted+` is converted into pandoc's
default representation for underlined text: a span with class
`underline`. Previously, the span was marked with the non-standard class
`inserted`.
Closes: #6237
Jira images attributes as in `!image.jpg|align=right!` are retained as
key-value pairs. Thumbnail images, such as `!example.gif|thumbnail!`,
are marked by a `thumbnail` class in their attributes.
Related to #6234.
* Simplify by collapsing a do block into a single <$>
* Remove an unnecessary variable: `all` takes any Foldable, so only blocksToInlines needs toList.
A bug was fixed which caused faulty parsing if a table was not preceded
by a newline and the first table cell had no space after the initial `|`
characters.
Fixes: #6198
A bug was fixed which caused non-emphasized text containing digits and/or
non-special symbols (like dots) to sometimes be parsed incorrectly.
Fixes: #6196
* Support for colored inlines has been added.
* Lists are now allowed to be indented; i.e., lists are still recognized
if list markers are preceded by spaces.
Closes: #6183, #6184
This reverts commit 3493d6afaa.
This might be worth considering in the future, but let's not do
it yet...the additional complexity needs a better justification.
This is experimental. Normally metadata values are interpreted
as markdown, but if the !!literal tag is used they will be interpreted
as plain strings.
We need to consider whether this can still be implemented if
we switch back from HsYAML to yaml for performance reasons.
Closes#5849. Previously biblatex citations were only grouped if
there was no prefix. This patch allows them to be grouped in
subgroups split by prefixes and suffixes, which allows better citation
sorting.
New formats:
- `jats_archiving` for the "Archiving and Interchange Tag Set",
- `jats_publishing` for the "Journal Publishing Tag Set", and
- `jats_articleauthoring` for the "Article Authoring Tag Set."
The "jats" output format is now an alias for "jats_archiving".
Closes: #6014
* Use concatMap instead of reimplementing it
* Replace an unnecessary multi-way if with a regular if
* Use sortOn instead of sortBy and comparing
* Use guards instead of lots of indents for if and else
* Remove redundant do blocks
* Extract common functions from both branches of maybe
Whenever both the Nothing and the Just branch of maybe do the same
function, do that function on the result of maybe instead.
* Use fmap instead of reimplementing it from maybe
* Use negative forms instead of negating the positive forms
* Use mapMaybe instead of mapping and then using catMaybes
* Use zipWith instead of mapping over the result of zip
* Use unwords instead of reimplementing it
* Use <$ instead of <$> and const
* Replace case of Bool with if and else
* Use find instead of listToMaybe and filter
* Use zipWithM instead of mapM and zip
* Inline lambda wrappers into the real functions
* We get zipWithM from Text.Pandoc.Writers.Shared
* Use maybe instead of fromMaybe and fmap
I'm not sure how this one slipped past me.
* Increase a bit of indentation
Lists of Inline and Block elements can now be filtered via `Inlines` and
`Blocks` functions, respectively. This is helpful if a filter conversion
depends on the order of elements rather than a single element.
For example, the following filter can be used to remove all spaces
before a citation:
function isSpaceBeforeCite (spc, cite)
return spc and spc.t == 'Space'
and cite and cite.t == 'Cite'
end
function Inlines (inlines)
for i = #inlines-1,1,-1 do
if isSpaceBeforeCite(inlines[i], inlines[i+1]) then
inlines:remove(i)
end
end
return inlines
end
Closes: #6038
The functions `table.insert`, `table.remove`, and `table.sort` are added
to pandoc.List elements. They can be used as methods, e.g.
local numbers = pandoc.List {2, 3, 1}
numbers:sort() -- numbers is now {1, 2, 3}
If parsing fails in a raw environment (e.g. due to special
characters like unescaped `_`), try again as a verbatim
environment, which is less sensitive to special characters.
This allows us to capture special environments that change
catcodes as raw tex when `-f latex+raw_tex` is used.
Closes#6034.
The fix to #6030 actually changed behavior, so that the
2D nesting occurred at slide level N-1 and N, instead of
at the top-level section. This commit restores the 2.7.3 behavior.
If there are more than 2 levels, the top level is horizontal
and the rest are collapsed to vertical.
Closes#6032.
The current DTD for the JATS writer template is for Journal Publishing (JATS-journalpublishing1.dtd), which does not permit ext-link as a valid child (https://jats.nlm.nih.gov/publishing/tag-library/1.1/element/publisher-name.html).
This update modifies the default output template to be the less restrictive JATS archiving and interchange DTD which systems like PubMed use internally to represent their articles.
Pandoc's AST is translated into the Jira AST, which is then rendered by
the dedicated Jira printer.
The following improvements are included in this change:
- non-jira raw blocks are fully discarded instead of showing as blank
lines;
- table cells can contain multiple blocks;
- unnecessary blank lines are removed from the output;
- markup chars within words are properly surrounded by braces;
- preserving soft linebreaks via `--wrap=preserve` is supported.
Note that backslashes are rendered as HTML entities, as there appears no
alternative to produce a plain backslash if it is followed by markup.
This may cause problems when used with confluence, where rendering seems
to fail in this case.
Closes: #5926
Added `\setupinterlinespace` to `title`, `subtitle`, `date` and `author` elements.
Otherwise longer titles that run over multiple lines will look squashed as
`\tfd` etc. won't adapt the line spacing to the font size.
With positive heading shifts, starting in 2.8 this option caused
metadata titles to be removed and changed to regular headings.
This behavior is incompatible with the old behavior of
`--base-header-level` and breaks old workflows, so with this
commit we are rolling back this change.
Now, there is an asymmetry in positive and negative heading
level shifts:
+ With positive shifts, the metadata title stays the same and
does not get changed to a heading in the body.
+ With negative shifts, a heading can be converted into the
metadata title.
I think this is a desirable combination of features, despite
the asymmetry. One might, e.g., want to have a document
with level-1 section headigs, but render it to HTML with
level-2 headings, retaining the metadata title (which pandoc
will render as a level-1 heading with the default template).
Closes#5957.
Revises #5615.
Backslash-escaping is used instead of HTML entities, as escaped
characters are easier to read this way. Furthermore, Confluence, which
seems to use a subset of Jira markup, seems to get confused by HTML
entities.
Previously they appeared centered at the top of the page;
now we put them centered at the bottom, unless the `pagenumbering`
variable is set (this gives users full control over page
number format and position,
https://wiki.contextgarden.net/Command/setuppagenumbering)
All headings now have a uniform color.
Level-1 headings no longer set `w:themeShade="B5"`.
Level-2 headings are now 14 point rather than 16 point.
Level-3 headings are now 12 point rather than 14 point.
Level-4 headings are italic rather than bold.
Closes#5820.
If a simple table would be too wide, we use a grid table.
The code for generating grid tables has been adjusted to
give more intelligent column widths when widths aren't
given. (This also affects the markdown writer.)
Closes#5899.
`Nothing` means: nothing specified.
`Just []` means: an empty list specified (e.g. in defaults).
Potentially these could lead to different behavior: see #5888.
The check whether a complex inline element following a string must be
escaped, now depends on the last character of the string instead of the
first.
Fixes: #5906
PR #5884.
+ Use pandoc-types 1.20 and texmath 0.12.
+ Text is now used instead of String, with a few exceptions.
+ In the MediaBag module, some of the types using Strings
were switched to use FilePath instead (not Text).
+ In the Parsing module, new parsers `manyChar`, `many1Char`,
`manyTillChar`, `many1TillChar`, `many1Till`, `manyUntil`,
`mantyUntilChar` have been added: these are like their
unsuffixed counterparts but pack some or all of their output.
+ `glob` in Text.Pandoc.Class still takes String since it seems
to be intended as an interface to Glob, which uses strings.
It seems to be used only once in the package, in the EPUB writer,
so that is not hard to change.
Superscripts and subscripts cannot contain spaces,
but newlines were previously allowed (unintentionally).
This led to bad interactions in some cases with footnotes.
E.g.
```
foo^[note]
bar^[note]
```
With this change newlines are also not allowed inside
super/subscripts.
Closes#5878.
* Add HTML Reader support for `<dfn>`, parsing this as a Span with class `dfn`.
* Change `htmlSpanLikeElements` implementation to retain classes,
attributes and inline content.
Previously, if a document contained two YAML metadata blocks
that set the same field, the conflict would be resolved in favor
of the first. Now it is resolved in favor of the second (due to
a change in pandoc-types).
This makes the behavior more uniform with other things in pandoc
(such as reference links and `--metadata-file`).
The first list item of a sublist should not resume numbering
from the number of the last sublist item of the same level,
if that sublist was a sublist of a different list item.
That is, we should not get:
```
1. one
1. sub one
2. sub two
2. two
3. sub one
```
* Set dbBook to true when traversing a chapter too.
Currently, a `<title/>` in a chapter and in a `<section/>` below that
chapter have the same level if they're not inside a `<book/>`.
This can happen in a multi-file book project. Also see the example at
https://tdg.docbook.org/tdg/4.5/chapter.html
Co-authored-by: Félix Baylac-Jacqué <felix@alternativebit.fr>
* Add docbook-chapter test
This tests nested `<section/>` and makes sure `<title/>` in the first
`<section/>` below `<chapter/>` is one level deeper than the `<chapter/>`'s
`<title/>`, also when not inside a `<book/>`.
Co-authored-by: Félix Baylac-Jacqué <felix@alternativebit.fr>
The new version of doctemplates adds many features to pandoc's
templating system, while remaining backwards-compatible.
New features include partials and filters. Using template filters,
one can lay out data in enumerated lists and tables.
Templates are now layout-sensitive: so, for example, if a
text with soft line breaks is interpolated near the end of
a line, the text will break and wrap naturally. This makes
the templating system much more suitable for programatically
generating markdown or other plain-text files from metadata.
This package is needed for proper handling of image filenames
containing periods (in addition to the period before the
extension).
Unfortunately, grffile breaks in the latest texlive update.
Until a fix is released (see ho-tex/oberdiek#73) it seems best
to remove this from the default template.
This may cause problems if you have filenames with periods.
The workaround is to put `\usepackage{grffile}` in header-includes,
and be sure you're using an older version of texlive packages.
See #5848. We will leave that issue open to remind us to
check upstream, and restore grffile when it's possible to
do so.
When a div surrounds multiple sections at the same level,
or a section of highre level followed by one of lower level,
then we just leave it as a div and create a new div for the
section.
Closes#5846, closes#5761.
Comment lines in Org-mode can be completely empty; both of these line
should produce no output:
# a comment
#
The reader used to produce a wrong result for the latter, but ignores
that line as well now.
Fixes: #5856
* HTML reader: Handle cite attribute for quotes. If a `<q>` tag has a `cite` attribute, we interpret it as a Quoted element with an inner Span. Closes#5798
* Refactor url canonicalization into a helper function
* Modify HTML writer to handle quote with cite.
[0]: https://developer.mozilla.org/en-US/docs/Web/HTML/Element/q
Parse <mark> elements from HTML as HTML span like elements, with a
single class matching the tag name `mark`. Mark elements are rendered to
HTML using the native <mark> element.
Fixes https://github.com/jgm/pandoc/issues/5797.
* Text.Pandoc.Shared: export `htmlSpanLikeElements` [API change]
This commit also introduces a mapping of HTML span like elements that
are internally represented as a Span with a single class, but that are
converted back to the original element by the html writer. As of now,
only the kbd element is handled this way. Ideally these elements should
be handled as plain AST values, but since that would be a breaking
change with a large impact, we revert to this stop-gap solution.
Fixes https://github.com/jgm/pandoc/issues/5796.
We change to use 0.5pt rather than `\linethickness`, which
apparently only ever worked "by accident" and no longer works
with recent updates to texlive.
Closes#5801.
...even when we must lose width information.
All in all this seems to be people's preferred behavior, even though it
is slightly lossier.
Closes#2608.
Closes#4497.
on conflicting fields. This changes earlier behavior (but not in
a release), where first took precedence.
Note that this may seem inconsistent with the behavior of
multiple YAML blocks within a document, where the first takes
precedence. Still, it is convenient to be able to override
defaults with options later on the command line.