This exports functions that uses xml-conduit's parser to
produce an xml-light Element or [Content]. This allows
existing pandoc code to use a better parser without
much modification.
The new parser is used in all places where xml-light's
parser was previously used. Benchmarks show a significant
performance improvement in parsing XML-based formats
(especially ODT and FB2).
Note that the xml-light types use String, so the
conversion from xml-conduit types involves a lot
of extra allocation. It would be desirable to
avoid that in the future by gradually switching
to using xml-conduit directly. This can be done
module by module.
The new parser also reports errors, which we report
when possible.
A new constructor PandocXMLError has been added to
PandocError in T.P.Error [API change].
Closes#7091, which was the main stimulus.
These changes revealed the need for some changes
in the tests. The docbook-reader.docbook test
lacked definitions for the entities it used; these
have been added. And the docx golden tests have been
updated, because the new parser does not preserve
the order of attributes.
Add entity defs to docbook-reader.docbook.
Update golden tests for docx.
Load the iftex package directly rather than via the ifxetex and ifluatex compatibility
wrappers, which have been merged into a single package that is part of the LaTeX core.
The capitalization of the commands has been changed for compatibility with older
versions of TeX Live that have the version of iftex by the Persian TeX Group. This had
been removed in
<2845794c0c>
for compatibility with BasicTeX, but that is no longer an issue.
* `biblatex` and `bibtex` are now supported as output
as well as input formats.
* New module Text.Pandoc.Writers.BibTeX, exporting
writeBibTeX and writeBibLaTeX. [API change]
* New unexported function `writeBibtexString` in
Text.Pandoc.Citeproc.BibTeX.
Allow defaults files to inherit options from other defaults files by
specifying them with the following syntax:
`defaults: [list of defaults files or single defaults file]`.
* Add `Ext_sourcepos` constructor for `Extension`.
* Add `sourcepos` extension (only for commonmark).
* Bump to 2.11.3
With the `sourcepos` extension set set, `data-pos` attributes are added
to the AST by the commonmark reader. No other readers are affected. The
`data-pos` attributes are put on elements that accept attributes; for
other elements, an enlosing Div or Span is added to hold the attributes.
Closes#4565.
This commit adds two extensions to the OpenDocument writer,
`xrefs_name` and `xrefs_number`.
Links to headings, figures and tables inside the document are
substituted with cross-references that will use the name or caption
of the referenced item for `xrefs_name` or the number for `xrefs_number`.
For the `xrefs_number` to be useful heading numbers must be enabled
in the generated document and table and figure captions must be enabled using for example the `native_numbering` extension.
In order for numbers and reference text to be updated the generated
document must be refreshed.
Co-authored-by: Nils Carlson <nils.carlson@ludd.ltu.se>
Previously we used Setext (underlined) headings by default.
The default is now ATX (`##` style).
* Add the `--markdown-headings=atx|setext` option.
* Deprecate `--atx-headers`.
* Add constructor 'ATXHeadingInLHS` constructor to `LogMessage` [API change].
* Support `markdown-headings` in defaults files.
* Document new options in MANUAL.
Closes#6662.
- Fix margin before codeblock
- Add `monobackgroundcolor` variable, making the background color
and padding of code optional.
- Ensure that backgrounds from highlighting styles take precedence over
monobackgroundcolor
- Remove list markers from TOC
- Add margin-bottom where needed
- Remove italics from blockquote styling
- Change borders and spacing in tables to be more consistent with other
output formats
- Style h5, h6
- Decrease root font-size to 18px
- Update tests for styles.html changes
- Add CSS example to MANUAL
This deprecates the use of the external pandoc-citeproc
filter; citation processing is now built in to pandoc.
* Add dependency on citeproc library.
* Add Text.Pandoc.Citeproc module (and some associated unexported
modules under Text.Pandoc.Citeproc). Exports `processCitations`.
[API change]
* Add data files needed for Text.Pandoc.Citeproc: default.csl
in the data directory, and a citeproc directory that is just
used at compile-time. Note that we've added file-embed as a mandatory
rather than a conditional depedency, because of the biblatex
localization files. We might eventually want to use readDataFile
for this, but it would take some code reorganization.
* Text.Pandoc.Loging: Add `CiteprocWarning` to `LogMessage` and use it
in `processCitations`. [API change]
* Add tests from the pandoc-citeproc package as command tests (including
some tests pandoc-citeproc did not pass).
* Remove instructions for building pandoc-citeproc from CI and
release binary build instructions. We will no longer distribute
pandoc-citeproc.
* Markdown reader: tweak abbreviation support. Don't insert a
nonbreaking space after a potential abbreviation if it comes right before
a note or citation. This messes up several things, including citeproc's
moving of note citations.
* Add `csljson` as and input and output format. This allows pandoc
to convert between `csljson` and other bibliography formats,
and to generate formatted versions of CSL JSON bibliographies.
* Add module Text.Pandoc.Writers.CslJson, exporting `writeCslJson`. [API
change]
* Add module Text.Pandoc.Readers.CslJson, exporting `readCslJson`. [API
change]
* Added `bibtex`, `biblatex` as input formats. This allows pandoc
to convert between BibLaTeX and BibTeX and other bibliography formats,
and to generated formatted versions of BibTeX/BibLaTeX bibliographies.
* Add module Text.Pandoc.Readers.BibTeX, exporting `readBibTeX` and
`readBibLaTeX`. [API change]
* Make "standalone" implicit if output format is a bibliography format.
This is needed because pandoc readers for bibliography formats put
the bibliographic information in the `references` field of metadata;
and unless standalone is specified, metadata gets ignored.
(TODO: This needs improvement. We should trigger standalone for the
reader when the input format is bibliographic, and for the writer
when the output format is markdown.)
* Carry over `citationNoteNum` to `citationNoteNumber`. This was just
ignored in pandoc-citeproc.
* Text.Pandoc.Filter: Add `CiteprocFilter` constructor to Filter.
[API change] This runs the processCitations transformation.
We need to treat it like a filter so it can be placed
in the sequence of filter runs (after some, before others).
In FromYAML, this is parsed from `citeproc` or `{type: citeproc}`,
so this special filter may be specified either way in a defaults file
(or by `citeproc: true`, though this gives no control of positioning
relative to other filters). TODO: we need to add something to the
manual section on defaults files for this.
* Add deprecation warning if `upandoc-citeproc` filter is used.
* Add `--citeproc/-C` option to trigger citation processing.
This behaves like a filter and will be positioned
relative to filters as they appear on the command line.
* Rewrote the manual on citatations, adding a dedicated Citations
section which also includes some information formerly found in
the pandoc-citeproc man page.
* Look for CSL styles in the `csl` subdirectory of the pandoc user data
directory. This changes the old pandoc-citeproc behavior, which looked
in `~/.csl`. Users can simply symlink `~/.csl` to the `csl`
subdirectory of their pandoc user data directory if they want
the old behavior.
* Add support for CSL bibliography entry formatting to LaTeX, HTML,
Ms writers. Added CSL-related CSS to styles.html.
This gives a rule that has been been superseded by
commit 47537d26db.
The section is concerned to explain a discrepancy with
original Markdown.pl and its test suite. In the case
under consideration, Markdown.pl gave strange results
which pandoc corrected. I think it's no longer worth
wasting space on this, as its behavior seems clearly wrong.
If we are going to comment on every edge case with
Markdown.pl, the manual will get too long.
Babelmark 2 shows that some of the older implementations
follow Markdown.pl -- PHP Markdown, Python Markdown,
redcarpet, discount.
https://johnmacfarlane.net/babelmark2/?normalize=1&text=%2B+++First%0A%2B+++Second%0A++++-+a%0A++++-+b%0A%0A%2B+Third%0ACloses#6684.
Specifying `-f ipynb+raw_markdown` will cause Markdown cells
to be represented as raw Markdown blocks, instead of being
parsed. This is not what you want when going from `ipynb`
to other formats, but it may be useful when going from `ipynb`
to Markdown or to `ipynb`, to avoid semantically insignificant
changes in the contents of the Markdown cells that might
otherwise be introduced.
Closes#5408.
Instead rely on the markdown writer with appropriate extensions.
Export writeCommonMark variant from Markdown writer.
This changes a few small things in rendering markdown,
e.g. w/r/t requiring backslashes before spaces inside
super/subscripts.
This allows attributes to be added to any block or inline
element, in principle. (Though in many cases this will be
done by adding a Div or Span container, since pandoc's
AST doesn't have a slot for attributes for most elements.)
Currently this is only possible with the commonmark and gfm
readers.
Add `Ext_attributes` constructor for `Extension` [API change].
This commit adds the option `--no-check-certificate`, which disables certificate
checking when resources are fetched by HTTP.
Co-authored-by: Cécile Chemin <cecile.chemin@insee.fr>
Co-authored-by: Juliette Fourcot <juliette.fourcot@insee.fr>
This reverts commit 3493d6afaa.
This might be worth considering in the future, but let's not do
it yet...the additional complexity needs a better justification.
This is experimental. Normally metadata values are interpreted
as markdown, but if the !!literal tag is used they will be interpreted
as plain strings.
We need to consider whether this can still be implemented if
we switch back from HsYAML to yaml for performance reasons.
New formats:
- `jats_archiving` for the "Archiving and Interchange Tag Set",
- `jats_publishing` for the "Journal Publishing Tag Set", and
- `jats_articleauthoring` for the "Article Authoring Tag Set."
The "jats" output format is now an alias for "jats_archiving".
Closes: #6014
This follows the advise on the Lua
website (https://www.lua.org/about.html#name):
> […] "Lua" is a name, the name of the Earth's moon and the name of the
> language. Like most names, it should be written in lower case with an
> initial capital, that is, "Lua".
With positive heading shifts, starting in 2.8 this option caused
metadata titles to be removed and changed to regular headings.
This behavior is incompatible with the old behavior of
`--base-header-level` and breaks old workflows, so with this
commit we are rolling back this change.
Now, there is an asymmetry in positive and negative heading
level shifts:
+ With positive shifts, the metadata title stays the same and
does not get changed to a heading in the body.
+ With negative shifts, a heading can be converted into the
metadata title.
I think this is a desirable combination of features, despite
the asymmetry. One might, e.g., want to have a document
with level-1 section headigs, but render it to HTML with
level-2 headings, retaining the metadata title (which pandoc
will render as a level-1 heading with the default template).
Closes#5957.
Revises #5615.
Certain options (`--self-contained`, `--include-in-header`, etc.)
imply `--standalone`. We now handle this after option parsing
so that it affects options specified in defaults files too.
Behavior change: `--title-prefix` no longer implies `--standalone`.
Certain command-line arguments can be repeated:
`--metadata-file`, `--css`, `--include-in-header`,
`--include-before-body`, `--include-after-body`, `--variable`,
`--metadata`, `--syntax-definition`. In these cases, values
specified in default files should be added to the list rather
than replacing values specified earlier on the command line
(perhaps in other default files).
So, for example, if one does
pandoc --variable foo=3 --defaults d1 --defaults d2
and `d1` sets the variable `bar` and `d2` sets `baz`,
all three variables will be set.
Closes#5894.
Superscripts and subscripts cannot contain spaces,
but newlines were previously allowed (unintentionally).
This led to bad interactions in some cases with footnotes.
E.g.
```
foo^[note]
bar^[note]
```
With this change newlines are also not allowed inside
super/subscripts.
Closes#5878.
Previously, if a document contained two YAML metadata blocks
that set the same field, the conflict would be resolved in favor
of the first. Now it is resolved in favor of the second (due to
a change in pandoc-types).
This makes the behavior more uniform with other things in pandoc
(such as reference links and `--metadata-file`).
PDF output will not be output to the terminal, but can be
sent to stdout using either `-o -` or a pipe.
The intermediate format will be determined based on
the setting of `--pdf-engine`.
Closes#5751.
when `latex_macros` is disabled. (When `latex_macros` is enabled,
we omit them, since pandoc is applying the macros itself.)
Previously, it was documented that the macro definitions got
passed through as raw latex regardless of whether `latex_macros`
was set -- but in fact they never got passed through.
- ToYAML instance is now for `Opt -> Opt`, rather than `Opt`.
- This allows us to handle `--defaults` without clobbering all the
options that occur prior to `--defaults` on the command line.
(Note, however, that options in `--defaults` can replace these
options if the `--defaults` option is used after them,
which may be a bit confusing given the name.)
- `--defaults` may now be used multiple times on the command line,
allowing users to break defaults into different chunks.
- Add FromYAML instances to Opt and to all subsidiary types.
- Remove the use of HsYAML-aeson, which doesn't give good
position information on errors.
- Rename some fields in Opt to better match cli options or
reflect what the ycontain [API change]:
+ optMetadataFile -> optMetadataFiles
+ optPDFEngineArgs -> optPDFEngineOpts
+ optWrapText -> optWrap
- Add IpynbOutput enumerated type to Text.Pandoc.App.Opts.
Use this instead fo a string for optIpynbOutput.
- Add FromYAML instance for Filter in Text.Pandoc.Filters.
With these changes parsing of defaults files should be
complete and should give decent error messages.
Now (unlike before) we get an error if an unknown field
is used.
on conflicting fields. This changes earlier behavior (but not in
a release), where first took precedence.
Note that this may seem inconsistent with the behavior of
multiple YAML blocks within a document, where the first takes
precedence. Still, it is convenient to be able to override
defaults with options later on the command line.
If this is present on a heading with the 'unnumbered' class,
the heading won't appear in the TOC. This class has no
effect if 'unnumbered' is not also specified.
This affects HTML-based writers (including slide shows
and epub), LateX (including beamer), RTF, and PowerPoint.
Other writers do not yet support `unlisted`.
Closes#1762.
+ An error is now raised if you try to specify (enable or
disable) an extension that does not affect the given
format, e.g. `docx+pipe_tables`.
+ The `--list-extensions[=FORMAT]` option now lists only
extensions that affect the given FORMAT.
+ Text.Pandoc.Error: Add constructors `PandocUnknownReaderError`,
`PandocUnknownWriterError`, `PandocUnsupportedExtensionError`.
[API change]
+ Text.Pandoc.Extensions now exports `getAllExtensions`,
which returns the extensions that affect a given format
(whether enabled by default or not). [API change]
+ Text.Pandoc.Extensions: change type of `parseFormatSpec`
from `Either ParseError (String, Extensions -> Extensions)`
to `Either ParseError (String, [Extension], [Extension])`
[API change].
+ Text.Pandoc.Readers: change type of `getReader` so it returns
a value in the PandocMonad instance rather than an Either
[API change]. Exceptions for unknown formats and unsupported
extensions are now raised by this function and need not be handled by
the calling function.
+ Text.Pandoc.Writers: change type of `getWriter` so it returns
a value in the PandocMonad instance rather than an Either
[API change]. Exceptions for unknown formats and unsupported
extensions are now raised by this function and need not be handled by
the calling function.