..and add new definitions isomorphic to xml-light's, but with
Text instead of String. This allows us to keep most of the code in
existing readers that use xml-light, but avoid lots of unnecessary
allocation.
We also add versions of the functions from xml-light's
Text.XML.Light.Output and Text.XML.Light.Proc that operate
on our modified XML types, and functions that convert
xml-light types to our types (since some of our dependencies,
like texmath, use xml-light).
Update golden tests for docx and pptx.
OOXML test: Use `showContent` instead of `ppContent` in `displayDiff`.
Docx: Do a manual traversal to unwrap sdt and smartTag.
This is faster, and needed to pass the tests.
Benchmarks:
A = prior to 8ca191604d (Feb 8)
B = as of 8ca191604d (Feb 8)
C = this commit
| Reader | A | B | C |
| ------- | ----- | ------ | ----- |
| docbook | 18 ms | 12 ms | 10 ms |
| opml | 65 ms | 62 ms | 35 ms |
| jats | 15 ms | 11 ms | 9 ms |
| docx | 72 ms | 69 ms | 44 ms |
| odt | 78 ms | 41 ms | 28 ms |
| epub | 64 ms | 61 ms | 56 ms |
| fb2 | 14 ms | 5 ms | 4 ms |
* Modified the Doc parser to skip leading blank lines. This fixes
parsing of documents which start with multiple blank lines.
(#7095)
* Prevent URLs within link aliases to be treated as autolinks.
(#6944)
Fixes: #7095Fixes: #6944
This exports functions that uses xml-conduit's parser to
produce an xml-light Element or [Content]. This allows
existing pandoc code to use a better parser without
much modification.
The new parser is used in all places where xml-light's
parser was previously used. Benchmarks show a significant
performance improvement in parsing XML-based formats
(especially ODT and FB2).
Note that the xml-light types use String, so the
conversion from xml-conduit types involves a lot
of extra allocation. It would be desirable to
avoid that in the future by gradually switching
to using xml-conduit directly. This can be done
module by module.
The new parser also reports errors, which we report
when possible.
A new constructor PandocXMLError has been added to
PandocError in T.P.Error [API change].
Closes#7091, which was the main stimulus.
These changes revealed the need for some changes
in the tests. The docbook-reader.docbook test
lacked definitions for the entities it used; these
have been added. And the docx golden tests have been
updated, because the new parser does not preserve
the order of attributes.
Add entity defs to docbook-reader.docbook.
Update golden tests for docx.
This change allows bibtex/biblatex output to wrap as other
formats do, depending on the settings of `--wrap` and `--columns`.
It also introduces default templates for bibtex and biblatex,
which allow for using the variables `header-include`, `include-before`
or `include-after` (or alternatively the command line options
`--include-in-header`, `--include-before-body`, `--include-after-body`)
to insert content into the generated bibtex/biblatex.
This change requires a change in the return type of the unexported
`T.P.Citeproc.writeBibTeXString` from `Text` to `Doc Text`.
Closes#7068.
* `biblatex` and `bibtex` are now supported as output
as well as input formats.
* New module Text.Pandoc.Writers.BibTeX, exporting
writeBibTeX and writeBibLaTeX. [API change]
* New unexported function `writeBibtexString` in
Text.Pandoc.Citeproc.BibTeX.
Allow defaults files to inherit options from other defaults files by
specifying them with the following syntax:
`defaults: [list of defaults files or single defaults file]`.
Note that the multirow package is needed for rowspans.
It is included in the latex template under a variable,
so that it won't be used unless needed for a table.
- Use dev version of citeproc, which handles duplicate
ids better, preferring the last one in the list
and discarding the rest.
- Ensure that inline citations take priority over external
ones.
See jgm/citeproc#36.
This restores the behavior of pandoc-citeproc.
* Add `Ext_sourcepos` constructor for `Extension`.
* Add `sourcepos` extension (only for commonmark).
* Bump to 2.11.3
With the `sourcepos` extension set set, `data-pos` attributes are added
to the AST by the commonmark reader. No other readers are affected. The
`data-pos` attributes are put on elements that accept attributes; for
other elements, an enlosing Div or Span is added to hold the attributes.
Closes#4565.
This isn't really necessary and can be misleading
(e.g. on macOS, where a fully static build isn't
possible). cabal's new option
`--enable-executable-static` does the same. On stack
you can add something like this to the options for your
executable in package.yaml:
ld-options: -static -pthread
In the previous release we pointed to this with cabal.project
and stack.yaml, but jumped the gun because citeproc 0.1.0.3
had not yet been officially released.
This deprecates the use of the external pandoc-citeproc
filter; citation processing is now built in to pandoc.
* Add dependency on citeproc library.
* Add Text.Pandoc.Citeproc module (and some associated unexported
modules under Text.Pandoc.Citeproc). Exports `processCitations`.
[API change]
* Add data files needed for Text.Pandoc.Citeproc: default.csl
in the data directory, and a citeproc directory that is just
used at compile-time. Note that we've added file-embed as a mandatory
rather than a conditional depedency, because of the biblatex
localization files. We might eventually want to use readDataFile
for this, but it would take some code reorganization.
* Text.Pandoc.Loging: Add `CiteprocWarning` to `LogMessage` and use it
in `processCitations`. [API change]
* Add tests from the pandoc-citeproc package as command tests (including
some tests pandoc-citeproc did not pass).
* Remove instructions for building pandoc-citeproc from CI and
release binary build instructions. We will no longer distribute
pandoc-citeproc.
* Markdown reader: tweak abbreviation support. Don't insert a
nonbreaking space after a potential abbreviation if it comes right before
a note or citation. This messes up several things, including citeproc's
moving of note citations.
* Add `csljson` as and input and output format. This allows pandoc
to convert between `csljson` and other bibliography formats,
and to generate formatted versions of CSL JSON bibliographies.
* Add module Text.Pandoc.Writers.CslJson, exporting `writeCslJson`. [API
change]
* Add module Text.Pandoc.Readers.CslJson, exporting `readCslJson`. [API
change]
* Added `bibtex`, `biblatex` as input formats. This allows pandoc
to convert between BibLaTeX and BibTeX and other bibliography formats,
and to generated formatted versions of BibTeX/BibLaTeX bibliographies.
* Add module Text.Pandoc.Readers.BibTeX, exporting `readBibTeX` and
`readBibLaTeX`. [API change]
* Make "standalone" implicit if output format is a bibliography format.
This is needed because pandoc readers for bibliography formats put
the bibliographic information in the `references` field of metadata;
and unless standalone is specified, metadata gets ignored.
(TODO: This needs improvement. We should trigger standalone for the
reader when the input format is bibliographic, and for the writer
when the output format is markdown.)
* Carry over `citationNoteNum` to `citationNoteNumber`. This was just
ignored in pandoc-citeproc.
* Text.Pandoc.Filter: Add `CiteprocFilter` constructor to Filter.
[API change] This runs the processCitations transformation.
We need to treat it like a filter so it can be placed
in the sequence of filter runs (after some, before others).
In FromYAML, this is parsed from `citeproc` or `{type: citeproc}`,
so this special filter may be specified either way in a defaults file
(or by `citeproc: true`, though this gives no control of positioning
relative to other filters). TODO: we need to add something to the
manual section on defaults files for this.
* Add deprecation warning if `upandoc-citeproc` filter is used.
* Add `--citeproc/-C` option to trigger citation processing.
This behaves like a filter and will be positioned
relative to filters as they appear on the command line.
* Rewrote the manual on citatations, adding a dedicated Citations
section which also includes some information formerly found in
the pandoc-citeproc man page.
* Look for CSL styles in the `csl` subdirectory of the pandoc user data
directory. This changes the old pandoc-citeproc behavior, which looked
in `~/.csl`. Users can simply symlink `~/.csl` to the `csl`
subdirectory of their pandoc user data directory if they want
the old behavior.
* Add support for CSL bibliography entry formatting to LaTeX, HTML,
Ms writers. Added CSL-related CSS to styles.html.
A new type `SimpleTable` is made available to Lua filters. It is
similar to the `Table` type in pandoc versions before 2.10;
conversion functions from and to the new Table type are provided.
Old filters using tables now require minimal changes and can use,
e.g.,
if PANDOC_VERSION > {2,10,1} then
pandoc.Table = pandoc.SimpleTable
end
and
function Table (tbl)
tbl = pandoc.utils.to_simple_table(tbl)
…
return pandoc.utils.from_simple_table(tbl)
end
to work with the current pandoc version.
Writers.Tables is now Writers.AnnotatedTable. All of the types and
functions in it have had the "Ann" removed from them. Now it is
expected that the module be imported qualified.
Add Writers.Tables helper functions and types, add tests for those
The Writers.Tables module contains an AnnTable type that is a pandoc
Table with added inferred information that should be enough for
writers (in particular the HTML writer) to operate on without having
to lay out the table themselves.
The toAnnTable and fromAnnTable functions in that module convert
between AnnTable and Table. In addition to producing an AnnTable with
coherent and well-formed annotations, the toAnnTable function also
normalizes its input Table like the table builder does.
Various tests ensure that toAnnTable normalizes tables exactly like
the table builder, and that its annotations are coherent.
Instead rely on the markdown writer with appropriate extensions.
Export writeCommonMark variant from Markdown writer.
This changes a few small things in rendering markdown,
e.g. w/r/t requiring backslashes before spaces inside
super/subscripts.
...instead of cmark-gfm (a wrapper around a C library).
We can now support many more pandoc extensions for
commonmark and gfm.
Add fenced_code_attributes to gfm/commonmark extensions.
Braces are now always escaped, even within words or when surrounded by
whitespace. Jira and Confluence treat braces specially.
Package jira-wiki-markup must be version 1.3.2 or later.
Fixes: #6478
This solves the following problems of the Jira reader:
* Two consecutive markup chars are now parsed verbatim; styled text
must not be empty.
* Styled text may not contain newlines.
* Links to anchors are now parsed as links.
Fixes: #6343Fixes: #6325Fixes: #6407