- Add manual entry for (non-default) extension
`rebase_relative_paths`.
- Add constructor `Ext_rebase_relative_paths` to `Extensions`
in Text.Pandoc.Extensions [API change]. When enabled, this
extension rewrites relative image and link paths by prepending
the (relative) directory of the containing file.
- Make Markdown reader sensitive to the new extension.
- Add tests for #3752.
Closes#3752.
NB. currently the extension applies to markdown and associated
readers but not commonmark/gfm.
* Allow spaces and most unicode characters in attachment links.
* No longer require a newline character after `{noformat}`.
* Only allow URI path segment characters in bare links.
* The `file:` schema is no longer allowed in bare links; these
rarely make sense.
Closes: #7218
Previously, when multiple file arguments were provided, pandoc
simply concatenated them and passed the contents to the readers,
which took a Text argument.
As a result, the readers had no way of knowing which file
was the source of any particular bit of text. This meant that
we couldn't report accurate source positions on errors or
include accurate source positions as attributes in the AST.
More seriously, it meant that we couldn't resolve resource
paths relative to the files containing them
(see e.g. #5501, #6632, #6384, #3752).
Add Text.Pandoc.Sources (exported module), with a `Sources` type
and a `ToSources` class. A `Sources` wraps a list of `(SourcePos,
Text)` pairs. [API change] A parsec `Stream` instance is provided for
`Sources`. The module also exports versions of parsec's `satisfy` and
other Char parsers that track source positions accurately from a
`Sources` stream (or any instance of the new `UpdateSourcePos` class).
Text.Pandoc.Parsing now exports these modified Char parsers instead of
the ones parsec provides. Modified parsers to use a `Sources` as stream
[API change].
The readers that previously took a `Text` argument have been
modified to take any instance of `ToSources`. So, they may still
be used with a `Text`, but they can also be used with a `Sources`
object.
In Text.Pandoc.Error, modified the constructor PandocParsecError
to take a `Sources` rather than a `Text` as first argument,
so parse error locations can be accurately reported.
T.P.Error: showPos, do not print "-" as source name.
* Build `+RTS -A256m -RTS` into default ghc-options for benchmark,
so we don't have to specify this separately on the command line.
This is necessary to get accurate benchmark results; otherwise
we are largely measuring garbage collecting, some not related
to the current benchmark.
* Switch back from gauge to tasty-bench.
* Allow specifying BASELINE file in 'make bench' for comparison
(otherwise the latest is chosen by default).
* Remove obsolete reference to weigh-pandoc from CONTRIBUTING.md.
* Remove `-Rghc-timing` from 'make bench'.
Jira reader:
* Fixed parsing of autolinks (i.e., of bare URLs in the text).
Previously an autolink would take up the rest of a line, as spaces
were allowed characters in these items.
* Emoji character sequences no longer cause parsing failures. This was
due to missing backtracking when emoji parsing fails.
Jira writer:
* Block quotes are only rendered as `bq.` if they do not contain a
linebreak.
This reverts commit 7a1d0f01e9.
This option gives confusing output when a build is interrupted,
suggesting that packages aren't required when we just didn't
get to the model that requires them.
This version of skylighting uses xml-conduit rather than hxt.
This speeds up parsing of XML syntax definitions fourfold, and
removes four packages from pandoc's dependency graph:
hxt-charproperties
hxt-unicode
hxt-regex-xmlschema
hxt
..and add new definitions isomorphic to xml-light's, but with
Text instead of String. This allows us to keep most of the code in
existing readers that use xml-light, but avoid lots of unnecessary
allocation.
We also add versions of the functions from xml-light's
Text.XML.Light.Output and Text.XML.Light.Proc that operate
on our modified XML types, and functions that convert
xml-light types to our types (since some of our dependencies,
like texmath, use xml-light).
Update golden tests for docx and pptx.
OOXML test: Use `showContent` instead of `ppContent` in `displayDiff`.
Docx: Do a manual traversal to unwrap sdt and smartTag.
This is faster, and needed to pass the tests.
Benchmarks:
A = prior to 8ca191604d (Feb 8)
B = as of 8ca191604d (Feb 8)
C = this commit
| Reader | A | B | C |
| ------- | ----- | ------ | ----- |
| docbook | 18 ms | 12 ms | 10 ms |
| opml | 65 ms | 62 ms | 35 ms |
| jats | 15 ms | 11 ms | 9 ms |
| docx | 72 ms | 69 ms | 44 ms |
| odt | 78 ms | 41 ms | 28 ms |
| epub | 64 ms | 61 ms | 56 ms |
| fb2 | 14 ms | 5 ms | 4 ms |
* Modified the Doc parser to skip leading blank lines. This fixes
parsing of documents which start with multiple blank lines.
(#7095)
* Prevent URLs within link aliases to be treated as autolinks.
(#6944)
Fixes: #7095Fixes: #6944
This exports functions that uses xml-conduit's parser to
produce an xml-light Element or [Content]. This allows
existing pandoc code to use a better parser without
much modification.
The new parser is used in all places where xml-light's
parser was previously used. Benchmarks show a significant
performance improvement in parsing XML-based formats
(especially ODT and FB2).
Note that the xml-light types use String, so the
conversion from xml-conduit types involves a lot
of extra allocation. It would be desirable to
avoid that in the future by gradually switching
to using xml-conduit directly. This can be done
module by module.
The new parser also reports errors, which we report
when possible.
A new constructor PandocXMLError has been added to
PandocError in T.P.Error [API change].
Closes#7091, which was the main stimulus.
These changes revealed the need for some changes
in the tests. The docbook-reader.docbook test
lacked definitions for the entities it used; these
have been added. And the docx golden tests have been
updated, because the new parser does not preserve
the order of attributes.
Add entity defs to docbook-reader.docbook.
Update golden tests for docx.
This change allows bibtex/biblatex output to wrap as other
formats do, depending on the settings of `--wrap` and `--columns`.
It also introduces default templates for bibtex and biblatex,
which allow for using the variables `header-include`, `include-before`
or `include-after` (or alternatively the command line options
`--include-in-header`, `--include-before-body`, `--include-after-body`)
to insert content into the generated bibtex/biblatex.
This change requires a change in the return type of the unexported
`T.P.Citeproc.writeBibTeXString` from `Text` to `Doc Text`.
Closes#7068.
* `biblatex` and `bibtex` are now supported as output
as well as input formats.
* New module Text.Pandoc.Writers.BibTeX, exporting
writeBibTeX and writeBibLaTeX. [API change]
* New unexported function `writeBibtexString` in
Text.Pandoc.Citeproc.BibTeX.
Allow defaults files to inherit options from other defaults files by
specifying them with the following syntax:
`defaults: [list of defaults files or single defaults file]`.
Note that the multirow package is needed for rowspans.
It is included in the latex template under a variable,
so that it won't be used unless needed for a table.
- Use dev version of citeproc, which handles duplicate
ids better, preferring the last one in the list
and discarding the rest.
- Ensure that inline citations take priority over external
ones.
See jgm/citeproc#36.
This restores the behavior of pandoc-citeproc.
* Add `Ext_sourcepos` constructor for `Extension`.
* Add `sourcepos` extension (only for commonmark).
* Bump to 2.11.3
With the `sourcepos` extension set set, `data-pos` attributes are added
to the AST by the commonmark reader. No other readers are affected. The
`data-pos` attributes are put on elements that accept attributes; for
other elements, an enlosing Div or Span is added to hold the attributes.
Closes#4565.
This isn't really necessary and can be misleading
(e.g. on macOS, where a fully static build isn't
possible). cabal's new option
`--enable-executable-static` does the same. On stack
you can add something like this to the options for your
executable in package.yaml:
ld-options: -static -pthread
In the previous release we pointed to this with cabal.project
and stack.yaml, but jumped the gun because citeproc 0.1.0.3
had not yet been officially released.
This deprecates the use of the external pandoc-citeproc
filter; citation processing is now built in to pandoc.
* Add dependency on citeproc library.
* Add Text.Pandoc.Citeproc module (and some associated unexported
modules under Text.Pandoc.Citeproc). Exports `processCitations`.
[API change]
* Add data files needed for Text.Pandoc.Citeproc: default.csl
in the data directory, and a citeproc directory that is just
used at compile-time. Note that we've added file-embed as a mandatory
rather than a conditional depedency, because of the biblatex
localization files. We might eventually want to use readDataFile
for this, but it would take some code reorganization.
* Text.Pandoc.Loging: Add `CiteprocWarning` to `LogMessage` and use it
in `processCitations`. [API change]
* Add tests from the pandoc-citeproc package as command tests (including
some tests pandoc-citeproc did not pass).
* Remove instructions for building pandoc-citeproc from CI and
release binary build instructions. We will no longer distribute
pandoc-citeproc.
* Markdown reader: tweak abbreviation support. Don't insert a
nonbreaking space after a potential abbreviation if it comes right before
a note or citation. This messes up several things, including citeproc's
moving of note citations.
* Add `csljson` as and input and output format. This allows pandoc
to convert between `csljson` and other bibliography formats,
and to generate formatted versions of CSL JSON bibliographies.
* Add module Text.Pandoc.Writers.CslJson, exporting `writeCslJson`. [API
change]
* Add module Text.Pandoc.Readers.CslJson, exporting `readCslJson`. [API
change]
* Added `bibtex`, `biblatex` as input formats. This allows pandoc
to convert between BibLaTeX and BibTeX and other bibliography formats,
and to generated formatted versions of BibTeX/BibLaTeX bibliographies.
* Add module Text.Pandoc.Readers.BibTeX, exporting `readBibTeX` and
`readBibLaTeX`. [API change]
* Make "standalone" implicit if output format is a bibliography format.
This is needed because pandoc readers for bibliography formats put
the bibliographic information in the `references` field of metadata;
and unless standalone is specified, metadata gets ignored.
(TODO: This needs improvement. We should trigger standalone for the
reader when the input format is bibliographic, and for the writer
when the output format is markdown.)
* Carry over `citationNoteNum` to `citationNoteNumber`. This was just
ignored in pandoc-citeproc.
* Text.Pandoc.Filter: Add `CiteprocFilter` constructor to Filter.
[API change] This runs the processCitations transformation.
We need to treat it like a filter so it can be placed
in the sequence of filter runs (after some, before others).
In FromYAML, this is parsed from `citeproc` or `{type: citeproc}`,
so this special filter may be specified either way in a defaults file
(or by `citeproc: true`, though this gives no control of positioning
relative to other filters). TODO: we need to add something to the
manual section on defaults files for this.
* Add deprecation warning if `upandoc-citeproc` filter is used.
* Add `--citeproc/-C` option to trigger citation processing.
This behaves like a filter and will be positioned
relative to filters as they appear on the command line.
* Rewrote the manual on citatations, adding a dedicated Citations
section which also includes some information formerly found in
the pandoc-citeproc man page.
* Look for CSL styles in the `csl` subdirectory of the pandoc user data
directory. This changes the old pandoc-citeproc behavior, which looked
in `~/.csl`. Users can simply symlink `~/.csl` to the `csl`
subdirectory of their pandoc user data directory if they want
the old behavior.
* Add support for CSL bibliography entry formatting to LaTeX, HTML,
Ms writers. Added CSL-related CSS to styles.html.
A new type `SimpleTable` is made available to Lua filters. It is
similar to the `Table` type in pandoc versions before 2.10;
conversion functions from and to the new Table type are provided.
Old filters using tables now require minimal changes and can use,
e.g.,
if PANDOC_VERSION > {2,10,1} then
pandoc.Table = pandoc.SimpleTable
end
and
function Table (tbl)
tbl = pandoc.utils.to_simple_table(tbl)
…
return pandoc.utils.from_simple_table(tbl)
end
to work with the current pandoc version.
Writers.Tables is now Writers.AnnotatedTable. All of the types and
functions in it have had the "Ann" removed from them. Now it is
expected that the module be imported qualified.
Add Writers.Tables helper functions and types, add tests for those
The Writers.Tables module contains an AnnTable type that is a pandoc
Table with added inferred information that should be enough for
writers (in particular the HTML writer) to operate on without having
to lay out the table themselves.
The toAnnTable and fromAnnTable functions in that module convert
between AnnTable and Table. In addition to producing an AnnTable with
coherent and well-formed annotations, the toAnnTable function also
normalizes its input Table like the table builder does.
Various tests ensure that toAnnTable normalizes tables exactly like
the table builder, and that its annotations are coherent.
Instead rely on the markdown writer with appropriate extensions.
Export writeCommonMark variant from Markdown writer.
This changes a few small things in rendering markdown,
e.g. w/r/t requiring backslashes before spaces inside
super/subscripts.
...instead of cmark-gfm (a wrapper around a C library).
We can now support many more pandoc extensions for
commonmark and gfm.
Add fenced_code_attributes to gfm/commonmark extensions.
Braces are now always escaped, even within words or when surrounded by
whitespace. Jira and Confluence treat braces specially.
Package jira-wiki-markup must be version 1.3.2 or later.
Fixes: #6478
This solves the following problems of the Jira reader:
* Two consecutive markup chars are now parsed verbatim; styled text
must not be empty.
* Styled text may not contain newlines.
* Links to anchors are now parsed as links.
Fixes: #6343Fixes: #6325Fixes: #6407
In b3cfdc2c7 the license was changed to GPL-2.0-or-later which is an
SPDX expression, however cabal only interprets the license field as an
SPDX expression if cabal-version is 2.2 or later.
Starting with 2.2 cabal-version also has to be the first statement in
the .cabal file.
The PandocError type is used throughout the Lua subsystem, all Lua
functions throw an exception of this type if an error occurs. The
`LuaException` type is removed and no longer exported from
`Text.Pandoc.Lua`. In its place, a new constructor `PandocLuaError` is
added to PandocError.
This commit adds the option `--no-check-certificate`, which disables certificate
checking when resources are fetched by HTTP.
Co-authored-by: Cécile Chemin <cecile.chemin@insee.fr>
Co-authored-by: Juliette Fourcot <juliette.fourcot@insee.fr>
Multiple parsing problems are resolved, including issues with empty
table cells, faulty recognition of closing emphasis characters, and
parsing of image attributes.
Fixes: #6212Fixes: #6219Fixes: #6220
A bug was fixed which caused faulty parsing if a table was not preceded
by a newline and the first table cell had no space after the initial `|`
characters.
Fixes: #6198
A bug was fixed which caused non-emphasized text containing digits and/or
non-special symbols (like dots) to sometimes be parsed incorrectly.
Fixes: #6196
* Use implicit Prelude
The previous behavior was introduced as a fix for #4464. It seems that
this change alone did not fix the issue, and `stack ghci` and `cabal
repl` only work with GHC 8.4.1 or newer, as no custom Prelude is loaded
for these versions. Given this, it seems cleaner to revert to the
implicit Prelude.
* PandocMonad: remove outdated check for base version
Only base versions 4.9 and later are supported, the check for
`MIN_VERSION_base(4,8,0)` is therefore unnecessary.
* Always use custom prelude
Previously, the custom prelude was used only with older GHC versions, as
a workaround for problems with ghci. The ghci problems are resolved by
replacing package `base` with `base-noprelude`, allowing for consistent
use of the custom prelude across all GHC versions.
* Support for colored inlines has been added.
* Lists are now allowed to be indented; i.e., lists are still recognized
if list markers are preceded by spaces.
Closes: #6183, #6184
New formats:
- `jats_archiving` for the "Archiving and Interchange Tag Set",
- `jats_publishing` for the "Journal Publishing Tag Set", and
- `jats_articleauthoring` for the "Article Authoring Tag Set."
The "jats" output format is now an alias for "jats_archiving".
Closes: #6014
Lists of Inline and Block elements can now be filtered via `Inlines` and
`Blocks` functions, respectively. This is helpful if a filter conversion
depends on the order of elements rather than a single element.
For example, the following filter can be used to remove all spaces
before a citation:
function isSpaceBeforeCite (spc, cite)
return spc and spc.t == 'Space'
and cite and cite.t == 'Cite'
end
function Inlines (inlines)
for i = #inlines-1,1,-1 do
if isSpaceBeforeCite(inlines[i], inlines[i+1]) then
inlines:remove(i)
end
end
return inlines
end
Closes: #6038
Closes#6031. The new version of doclayout fixes a
memory leak that affected `--include-in-header` with
large files (and possibly other cases involving extremely
long lines).