This deprecates the use of the external pandoc-citeproc
filter; citation processing is now built in to pandoc.
* Add dependency on citeproc library.
* Add Text.Pandoc.Citeproc module (and some associated unexported
modules under Text.Pandoc.Citeproc). Exports `processCitations`.
[API change]
* Add data files needed for Text.Pandoc.Citeproc: default.csl
in the data directory, and a citeproc directory that is just
used at compile-time. Note that we've added file-embed as a mandatory
rather than a conditional depedency, because of the biblatex
localization files. We might eventually want to use readDataFile
for this, but it would take some code reorganization.
* Text.Pandoc.Loging: Add `CiteprocWarning` to `LogMessage` and use it
in `processCitations`. [API change]
* Add tests from the pandoc-citeproc package as command tests (including
some tests pandoc-citeproc did not pass).
* Remove instructions for building pandoc-citeproc from CI and
release binary build instructions. We will no longer distribute
pandoc-citeproc.
* Markdown reader: tweak abbreviation support. Don't insert a
nonbreaking space after a potential abbreviation if it comes right before
a note or citation. This messes up several things, including citeproc's
moving of note citations.
* Add `csljson` as and input and output format. This allows pandoc
to convert between `csljson` and other bibliography formats,
and to generate formatted versions of CSL JSON bibliographies.
* Add module Text.Pandoc.Writers.CslJson, exporting `writeCslJson`. [API
change]
* Add module Text.Pandoc.Readers.CslJson, exporting `readCslJson`. [API
change]
* Added `bibtex`, `biblatex` as input formats. This allows pandoc
to convert between BibLaTeX and BibTeX and other bibliography formats,
and to generated formatted versions of BibTeX/BibLaTeX bibliographies.
* Add module Text.Pandoc.Readers.BibTeX, exporting `readBibTeX` and
`readBibLaTeX`. [API change]
* Make "standalone" implicit if output format is a bibliography format.
This is needed because pandoc readers for bibliography formats put
the bibliographic information in the `references` field of metadata;
and unless standalone is specified, metadata gets ignored.
(TODO: This needs improvement. We should trigger standalone for the
reader when the input format is bibliographic, and for the writer
when the output format is markdown.)
* Carry over `citationNoteNum` to `citationNoteNumber`. This was just
ignored in pandoc-citeproc.
* Text.Pandoc.Filter: Add `CiteprocFilter` constructor to Filter.
[API change] This runs the processCitations transformation.
We need to treat it like a filter so it can be placed
in the sequence of filter runs (after some, before others).
In FromYAML, this is parsed from `citeproc` or `{type: citeproc}`,
so this special filter may be specified either way in a defaults file
(or by `citeproc: true`, though this gives no control of positioning
relative to other filters). TODO: we need to add something to the
manual section on defaults files for this.
* Add deprecation warning if `upandoc-citeproc` filter is used.
* Add `--citeproc/-C` option to trigger citation processing.
This behaves like a filter and will be positioned
relative to filters as they appear on the command line.
* Rewrote the manual on citatations, adding a dedicated Citations
section which also includes some information formerly found in
the pandoc-citeproc man page.
* Look for CSL styles in the `csl` subdirectory of the pandoc user data
directory. This changes the old pandoc-citeproc behavior, which looked
in `~/.csl`. Users can simply symlink `~/.csl` to the `csl`
subdirectory of their pandoc user data directory if they want
the old behavior.
* Add support for CSL bibliography entry formatting to LaTeX, HTML,
Ms writers. Added CSL-related CSS to styles.html.
A new type `SimpleTable` is made available to Lua filters. It is
similar to the `Table` type in pandoc versions before 2.10;
conversion functions from and to the new Table type are provided.
Old filters using tables now require minimal changes and can use,
e.g.,
if PANDOC_VERSION > {2,10,1} then
pandoc.Table = pandoc.SimpleTable
end
and
function Table (tbl)
tbl = pandoc.utils.to_simple_table(tbl)
…
return pandoc.utils.from_simple_table(tbl)
end
to work with the current pandoc version.
Writers.Tables is now Writers.AnnotatedTable. All of the types and
functions in it have had the "Ann" removed from them. Now it is
expected that the module be imported qualified.
Add Writers.Tables helper functions and types, add tests for those
The Writers.Tables module contains an AnnTable type that is a pandoc
Table with added inferred information that should be enough for
writers (in particular the HTML writer) to operate on without having
to lay out the table themselves.
The toAnnTable and fromAnnTable functions in that module convert
between AnnTable and Table. In addition to producing an AnnTable with
coherent and well-formed annotations, the toAnnTable function also
normalizes its input Table like the table builder does.
Various tests ensure that toAnnTable normalizes tables exactly like
the table builder, and that its annotations are coherent.
Instead rely on the markdown writer with appropriate extensions.
Export writeCommonMark variant from Markdown writer.
This changes a few small things in rendering markdown,
e.g. w/r/t requiring backslashes before spaces inside
super/subscripts.
...instead of cmark-gfm (a wrapper around a C library).
We can now support many more pandoc extensions for
commonmark and gfm.
Add fenced_code_attributes to gfm/commonmark extensions.
Braces are now always escaped, even within words or when surrounded by
whitespace. Jira and Confluence treat braces specially.
Package jira-wiki-markup must be version 1.3.2 or later.
Fixes: #6478
This solves the following problems of the Jira reader:
* Two consecutive markup chars are now parsed verbatim; styled text
must not be empty.
* Styled text may not contain newlines.
* Links to anchors are now parsed as links.
Fixes: #6343Fixes: #6325Fixes: #6407
In b3cfdc2c7 the license was changed to GPL-2.0-or-later which is an
SPDX expression, however cabal only interprets the license field as an
SPDX expression if cabal-version is 2.2 or later.
Starting with 2.2 cabal-version also has to be the first statement in
the .cabal file.
The PandocError type is used throughout the Lua subsystem, all Lua
functions throw an exception of this type if an error occurs. The
`LuaException` type is removed and no longer exported from
`Text.Pandoc.Lua`. In its place, a new constructor `PandocLuaError` is
added to PandocError.
This commit adds the option `--no-check-certificate`, which disables certificate
checking when resources are fetched by HTTP.
Co-authored-by: Cécile Chemin <cecile.chemin@insee.fr>
Co-authored-by: Juliette Fourcot <juliette.fourcot@insee.fr>
Multiple parsing problems are resolved, including issues with empty
table cells, faulty recognition of closing emphasis characters, and
parsing of image attributes.
Fixes: #6212Fixes: #6219Fixes: #6220
A bug was fixed which caused faulty parsing if a table was not preceded
by a newline and the first table cell had no space after the initial `|`
characters.
Fixes: #6198
A bug was fixed which caused non-emphasized text containing digits and/or
non-special symbols (like dots) to sometimes be parsed incorrectly.
Fixes: #6196
* Use implicit Prelude
The previous behavior was introduced as a fix for #4464. It seems that
this change alone did not fix the issue, and `stack ghci` and `cabal
repl` only work with GHC 8.4.1 or newer, as no custom Prelude is loaded
for these versions. Given this, it seems cleaner to revert to the
implicit Prelude.
* PandocMonad: remove outdated check for base version
Only base versions 4.9 and later are supported, the check for
`MIN_VERSION_base(4,8,0)` is therefore unnecessary.
* Always use custom prelude
Previously, the custom prelude was used only with older GHC versions, as
a workaround for problems with ghci. The ghci problems are resolved by
replacing package `base` with `base-noprelude`, allowing for consistent
use of the custom prelude across all GHC versions.
* Support for colored inlines has been added.
* Lists are now allowed to be indented; i.e., lists are still recognized
if list markers are preceded by spaces.
Closes: #6183, #6184
New formats:
- `jats_archiving` for the "Archiving and Interchange Tag Set",
- `jats_publishing` for the "Journal Publishing Tag Set", and
- `jats_articleauthoring` for the "Article Authoring Tag Set."
The "jats" output format is now an alias for "jats_archiving".
Closes: #6014
Lists of Inline and Block elements can now be filtered via `Inlines` and
`Blocks` functions, respectively. This is helpful if a filter conversion
depends on the order of elements rather than a single element.
For example, the following filter can be used to remove all spaces
before a citation:
function isSpaceBeforeCite (spc, cite)
return spc and spc.t == 'Space'
and cite and cite.t == 'Cite'
end
function Inlines (inlines)
for i = #inlines-1,1,-1 do
if isSpaceBeforeCite(inlines[i], inlines[i+1]) then
inlines:remove(i)
end
end
return inlines
end
Closes: #6038
Closes#6031. The new version of doclayout fixes a
memory leak that affected `--include-in-header` with
large files (and possibly other cases involving extremely
long lines).
- Add Text.Pandoc.Emoji.TH.
- Replace long literal list in Text.Pandoc.Emoji with one-liner
generating it from data/emoji.json using TH.
- Add Makefile target to download data/emoji.json.
- Remove tools/emoji.hs.
PR #5884.
+ Use pandoc-types 1.20 and texmath 0.12.
+ Text is now used instead of String, with a few exceptions.
+ In the MediaBag module, some of the types using Strings
were switched to use FilePath instead (not Text).
+ In the Parsing module, new parsers `manyChar`, `many1Char`,
`manyTillChar`, `many1TillChar`, `many1Till`, `manyUntil`,
`mantyUntilChar` have been added: these are like their
unsuffixed counterparts but pack some or all of their output.
+ `glob` in Text.Pandoc.Class still takes String since it seems
to be intended as an interface to Glob, which uses strings.
It seems to be used only once in the package, in the EPUB writer,
so that is not hard to change.
* Set dbBook to true when traversing a chapter too.
Currently, a `<title/>` in a chapter and in a `<section/>` below that
chapter have the same level if they're not inside a `<book/>`.
This can happen in a multi-file book project. Also see the example at
https://tdg.docbook.org/tdg/4.5/chapter.html
Co-authored-by: Félix Baylac-Jacqué <felix@alternativebit.fr>
* Add docbook-chapter test
This tests nested `<section/>` and makes sure `<title/>` in the first
`<section/>` below `<chapter/>` is one level deeper than the `<chapter/>`'s
`<title/>`, also when not inside a `<book/>`.
Co-authored-by: Félix Baylac-Jacqué <felix@alternativebit.fr>
The new version of doctemplates adds many features to pandoc's
templating system, while remaining backwards-compatible.
New features include partials and filters. Using template filters,
one can lay out data in enumerated lists and tables.
Templates are now layout-sensitive: so, for example, if a
text with soft line breaks is interpolated near the end of
a line, the text will break and wrap naturally. This makes
the templating system much more suitable for programatically
generating markdown or other plain-text files from metadata.
- Add FromYAML instances to Opt and to all subsidiary types.
- Remove the use of HsYAML-aeson, which doesn't give good
position information on errors.
- Rename some fields in Opt to better match cli options or
reflect what the ycontain [API change]:
+ optMetadataFile -> optMetadataFiles
+ optPDFEngineArgs -> optPDFEngineOpts
+ optWrapText -> optWrap
- Add IpynbOutput enumerated type to Text.Pandoc.App.Opts.
Use this instead fo a string for optIpynbOutput.
- Add FromYAML instance for Filter in Text.Pandoc.Filters.
With these changes parsing of defaults files should be
complete and should give decent error messages.
Now (unlike before) we get an error if an unknown field
is used.
* [Docx Parser] Move style-parsing-specific code to a new module
* [Docx Writer] Re-use Readers.Docx.Parse.Styles for StyleMap
* [Docx Writer] Move Readers.Docx.StyleMap to Writers.Docx.StyleMap
It's never used outside of writer code, so it makes more sense to scope it under writers really.
Most of this is due to @vijayphoenix (#5704), but it
needed some revisions to integrate with current
master, and to use the released HsYAML.
Closes#5704.
* pandoc.cabal: remove conditionals for ghc < 8.0. Support for GHC 7.10 has been dropped.
* pandoc.cabal: compile with `-Wcpp-undef` when possible
* pandoc.cabal: compile with `-fhide-source-paths` if possible
+ Remove Text.Pandoc.Pretty; use doclayout instead. [API change]
+ Text.Pandoc.Writers.Shared: remove metaToJSON, metaToJSON'
[API change].
+ Text.Pandoc.Writers.Shared: modify `addVariablesToContext`,
`defField`, `setField`, `getField`, `resetField` to work with
Context rather than JSON values. [API change]
+ Text.Pandoc.Writers.Shared: export new function `endsWithPlain` [API
change].
+ Use new templates and doclayout in writers.
+ Use Doc-based templates in all writers.
+ Adjust three tests for minor template rendering differences.
+ Added indentation to body in docbook4, docbook5 templates.
The main impact of this change is better reflowing of content
interpolated into templates. Previously, interpolated variables
were rendered independently and intepolated as strings, which could lead
to overly long lines. Now the templates interpolated as Doc values
which may include breaking spaces, and reflowing occurs
after template interpolation rather than before.
Changed optMetadataFile from `Maybe FilePath` to `[FilePath]`. This allows
for multiple YAML metadata files to be added. The new default value has
been changed from `Nothing` to `[]`.
To account for this change in `Text.Pandoc.App`, `metaDataFromFile` now
operates on two `mapM` calls (for `readFileLazy` and `yamlToMeta`) and a fold.
Added a test (command/5700.md) which tests this functionality and
updated MANUAL.txt, as per the contributing guidelines.
With the current behavior, using `foldr1 (<>)`, values within files
specified first will be used over those in later files. (If the reverse
of this behavior would be preferred, it should be fixed by changing
foldr1 to foldl1.)
Lua filters must be able to traverse sequences of AST elements and to
replace elements by splicing sequences back in their place. Special
`Walkable` instances can be used for this; those are provided in a new
module `Text.Pandoc.Lua.Walk`.
* Require recent doctemplates. It is more flexible and
supports partials.
* Changed type of writerTemplate to Maybe Template instead
of Maybe String.
* Remove code from the LaTeX, Docbook, and JATS writers that looked in
the template for strings to determine whether it is a book or an
article, or whether csquotes is used. This was always kludgy and
unreliable. To use csquotes for LaTeX, set `csquotes` in your
variables or metadata. It is no longer sufficient to put
`\usepackage{csquotes}` in your template or header includes.
To specify a book style, use the `documentclass` variable or
`--top-level-division`.
* Change template code to use new API for doctemplates.
Return value is now Text rather than being polymorphic.
This makes room for upcoming removal of the TemplateTarget
class from doctemplates.
Other code modified accordingly, and should compile with
both current and upcoming version of doctemplates.
A new function `pandoc.mediabag.items` was added to Lua module
pandoc.mediabag. This allows users to lazily iterate over all media bag
items, loading items into Lua one-by-one. Example:
for filename, mime_type, content in pandoc.mediabag.items() do
-- use media bag item.
end
This is a convenient alternative to using `mediabag.list` in combination
with `mediabag.lookup`.
Version specifiers like `PANDOC_VERSION` and `PANDOC_API_VERSION` are
turned into `Version` objects. The objects simplify version-appropriate
comparisons while maintaining backward-compatibility.
A function `pandoc.types.Version` is added as part of the newly
introduced module `pandoc.types`, allowing users to create version
objects in scripts.
This makes use of tasty-lua, a package to write tests in Lua
and integrate the results into Tasty output. Test output becomes
more informative: individual tests and test groups become visible
in test output. Failures are reported with helpful error messages.
The `system` Lua module provides utility functions to interact with the
operating- and file system. E.g.
print(pandoc.system.get_current_directory())
or
pandoc.system.with_temporary_directory('tikz', function (dir)
-- write and compile a TikZ file with pdflatex
end)
Instead of `$HOME/.pandoc`, the default user data directory is
now `$XDG_DATA_HOME/pandoc`, where `XDG_DATA_HOME` defaults to
`$HOME/.local/share` but can be overridden by setting the environment
variable.
If this directory is missing, then `$HOME/.pandoc` is searched
instead, for backwards compatibility. However, we recommend
moving local pandoc data files from `$HOME/.pandoc` to
`$HOME/.local/share/pandoc`.
On Windows the default user data directory remains the same.
Closes#3582.
[API change]
* Depend on ipynb library.
* Add `ipynb` as input and output format.
* Added Text.Pandoc.Readers.Ipynb (supports both nbformat v3 and v4).
* Added Text.Pandoc.Writers.Ipynb (supports nbformat v4).
* Added ipynb readers and writers to T.P.Readers,
T.P.Writers, and T.P.Extensions. Register the
file extension .ipynb for this format.
* Add `PandocIpynbDecodingError` constructor to Text.Pandoc.Error.Error.
* Note: there is no template for ipynb.
The only thing we gained from the custom build was
automatic installation of the man page when using
'cabal install'. But custom builds cause problems,
e.g., with cross-compilation.
Installation of the man page is better handled by packagers.
Note to packagers (e.g. Debian): it may be necessary
to add a step installing the man page with the next
release.
The new Opt module has only a few dependencies. This is important for
compile-times during development, as Template Haskell containing modules
are be recompiled whenever a (transitive) dependency changes.
Disabling the flag will cause derivation of ToJSON and FromJSON
instances via GHC Generics instead of Template Haskell. The flag is
enabled by default, as deriving via Generics can be slow (see #4083).
* Lua: allow access to pandoc state
Lua filters and custom writers now have read-only access to most fields
of pandoc's internal state via the global variable `PANDOC_STATE`.
* Lua: allow iterating through fields of PANDOC_STATE
* Lua filters doc: describe CommonState
* Lua filters doc: mention global variable PANDOC_STATE
* Lua: add access to logs
Log messages can currently only be printed, but not decomposed.
(unexported module). These are used in both the man and ms
writers.
Moved groffEscape out of Text.Pandoc.Writers.Shared [cancels earlier
API change from adding it, which was after last release].
This fixes strong/code combination on man (should be `\f[CB]` not
`\f[BC]`), mentioned in #4973.
Updated tests.
Closes#4975.
A proper Lua traceback is added if either loading of a file or execution
of a filter function fails. This should be of help to authors of Lua
filters who need to debug their code.
Now the `write*` functions for Docbook, HTML, ICML, JATS,
Man, Ms, OPML are sensitive to `writerPreferAscii`. Previously
the to-ascii translation was done in Text.Pandoc.App, and
thus not available to those using the writer functions
directly.
In addition, the LaTeX writer is now sensitive to
`writerPreferAscii` and to `--ascii`. 100% ASCII
output can't be guaranteed, but the writer will use
commands like `\"{a}` and `\l` whenever possible,
to avoid emiting a non-ASCII character.
A new unexported module, Text.Pandoc.Groff, has been
added to store functions used in the different groff-based
writers.
This collects some of the general-purpose code from the LaTeX
reader, with the aim of making the module smaller. (We've been
having out-of-memory issues compiling this module on CI.)