Commit graph

7705 commits

Author SHA1 Message Date
John MacFarlane
073895c340 Fix some lint issues. 2021-08-11 17:53:39 -07:00
John MacFarlane
dd1a956a8a LaTeX reader: Support \global before \def, \let, etc.
See #7494.
2021-08-11 16:28:53 -07:00
John MacFarlane
e3a263df46 Fix scope for LaTeX macros.
They should by default scope over the group in which they
are defined (except `\gdef` and `\xdef`, which are global).
In addition, environments must be treated as groups.

We handle this by making sMacros in the LaTeX parser state
a STACK of macro tables. Opening a group adds a table to
the stack, closing one removes one.  Only the top of the stack
is queried.

This commit adds a parameter for scope to the Macro constructor
(not exported).

Closes #7494.
2021-08-11 16:14:34 -07:00
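
The stack-of-tables idea in e3a263df46 can be sketched roughly as below. This is a Python illustration, not pandoc's Haskell code; `MacroStack` and its method names are hypothetical. One assumption is made explicit: since only the top table is queried, the sketch copies the enclosing table when a group opens so that outer definitions remain visible inside the group.

```python
class MacroStack:
    """Hypothetical model of group-scoped macro tables (not pandoc's API)."""

    def __init__(self):
        # The bottom table holds definitions made outside any group.
        self.tables = [{}]

    def open_group(self):
        # Assumption: copy the current table so outer macros stay visible
        # even though only the top table is ever queried.
        self.tables.append(dict(self.tables[-1]))

    def close_group(self):
        # Leaving a group discards its local definitions.
        if len(self.tables) > 1:
            self.tables.pop()

    def define(self, name, body, global_scope=False):
        if global_scope:
            # \gdef / \xdef style: record the macro in every table.
            for table in self.tables:
                table[name] = body
        else:
            # \def style: record the macro in the current group only.
            self.tables[-1][name] = body

    def lookup(self, name):
        # Only the top of the stack is consulted.
        return self.tables[-1].get(name)


macros = MacroStack()
macros.define("foo", "outer")
macros.open_group()
macros.define("foo", "inner")                    # local to the group
macros.define("bar", "BAR", global_scope=True)   # survives the group
inside = macros.lookup("foo")
macros.close_group()
outside = macros.lookup("foo")
```

After the group closes, `foo` reverts to its outer definition while `bar` stays defined.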
John MacFarlane
a0e44b1ff6 LaTeX reader: improve handling of plain TeX macro primitives.
- Fixed semantics for `\let`.
- Implement `\edef`, `\gdef`, and `\xdef`.
- Add comment noting that currently `\def` and `\edef` set global
  macros (so are equivalent to `\gdef` and `\xdef`).  This should be
  fixed by scoping macro definitions to groups, in a future commit.

Closes #7474.
2021-08-11 10:32:52 -07:00
John MacFarlane
3a924d8f96 HTML reader: treat comments as blank when parsing.
This modifies pBlank.  Previously comments could sometimes
flummox the parser.

Closes #7482.
2021-08-10 12:50:23 -07:00
John MacFarlane
3d7120083a Fix RTF table parsing bug that created undesired nested tables.
Closes #7488.
2021-08-10 11:09:12 -07:00
John MacFarlane
6543b05116 Add RTF reader.
- `rtf` is now supported as an input format as well as output.
- New module Text.Pandoc.Readers.RTF (exporting `readRTF`). [API change]

Closes #3982.
2021-08-10 10:48:55 -07:00
John MacFarlane
c0b68b2030 Allow --slide-level=0.
When the slide level is set to 0, headings won't be used at all
in splitting the document into slides. Horizontal rules must be
used to separate slides.

Closes #7476.
2021-08-08 11:20:26 -07:00
John MacFarlane
dea1f0f080 RTF writer: emit \outlinelevel for section headings. 2021-08-04 16:37:20 -06:00
mt_caret
407de98b5e
Stop using the HTTP package. (#7456)
We only depend on the urlEncode function in the package, which is also
provided by http-types. The HTTP package also depends on the network
package, which has difficulty building on ghcjs.

Add internal module Text.Pandoc.Network.HTTP, exporting `urlEncode`.
2021-08-03 15:53:05 -06:00
Peter Fabinski
8667ba2bcc
LaTeX table writer: Increase column width precision (#7466)
In some cases, the rounding performed by the LaTeX table
writer would introduce visible overrun outside the text
area.
This adds two more decimal places to the width values.
2021-08-03 15:34:39 -06:00
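
A small worked example of why rounding can cause overrun, as described in 8667ba2bcc. The decimal counts (two before, four after) and the helper name `total_width` are assumptions for illustration; the commit only says that two more decimal places were added.

```python
def total_width(n_columns, decimals):
    """Sum of n equal column widths after rounding each to `decimals` places."""
    width = round(1 / n_columns, decimals)
    return n_columns * width


# Six equal columns: 1/6 rounds up to 0.17 at two decimals.
coarse = total_width(6, 2)   # about 1.02 of the text width
fine = total_width(6, 4)     # about 1.0002 of the text width
```

With the coarser rounding the table is roughly 2% wider than the text area; with the finer rounding the excess is too small to be visible.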
John MacFarlane
f938378d00 RTF writer: omit \bin in \pict.
According to the spec, this is not needed or wanted when
the data is in hexadecimal format, as it is here.
2021-08-01 22:45:41 -06:00
John MacFarlane
f145aea0f9 parseFromString: preserve at least the source directory.
Previously we just set the source name to "chunk" when parsing
from strings, to avoid misleading source positions.

This had the side effect that `rebase_relative_paths` would break
inside sections that were parsed as strings.

So, now we use "ORIGINAL_SOURCE_PATH_chunk" instead of just "chunk".

Closes #7464.
2021-07-29 14:54:25 -06:00
John MacFarlane
1f1a30bbf6 LaTeX writer: Use ulem for underline.
ulem is conditionally included already when the `strikeout`
variable is set, so we set this when there is underlined text,
and use `\uline` instead of `\underline`.

This fixes wrapping for underlined text.
Closes #7351.
2021-07-22 23:05:43 -07:00
John MacFarlane
832196fb17 MIME: use image/x-xcf instead of application/x-xcf.
Closes #7454.
2021-07-22 13:08:30 -07:00
John MacFarlane
31a5bccd57 LaTeX reader: avoid trailing hyphen in translating languages.
Previously `\foreignlanguage{english}` turned into `<span lang="en-">`.
The same issue affected Arabic.

Closes #7447.
2021-07-17 23:07:53 -07:00
John MacFarlane
46099e79de DocBook reader: handle images with imageobjectco elements.
Closes #7440.
2021-07-16 13:10:45 -07:00
John MacFarlane
493522c562 LaTeX reader: Support \cline in LaTeX tables.
Closes #7442.
2021-07-16 12:04:43 -07:00
John MacFarlane
18270c7a39 PDF: Fix svgIn path error.
We were duplicating the temp directory; this didn't show up
on macOS or linux because there we use absolute paths for
the temp directory.

Closes #7431.
2021-07-16 11:39:02 -07:00
Jan Tojnar
06408d08e5
DocBook reader: add support for citerefentry (#7437)
Originally intended for referring to UNIX manual pages, either part of the same DocBook document as refentry element, or external – hence the manvolnum element.
These days, refentry is more general, for example the element documentation pages linked below are each a refentry.

As per the *Processing expectations* section of citerefentry, the element is supposed to be a hyperlink to a refentry (when in the same document) but pandoc does not support refentry tag at the moment so that is moot.

https://tdg.docbook.org/tdg/5.1/citerefentry.html
https://tdg.docbook.org/tdg/5.1/manvolnum.html
https://tdg.docbook.org/tdg/5.1/refentry.html

This roughly corresponds to a `manpage` role in rST syntax, which produces a `Code` AST node with attributes `.interpreted-text role=manpage` but that does not fit DocBook parser.

https://www.sphinx-doc.org/en/master/usage/restructuredtext/roles.html#role-manpage
2021-07-11 15:28:52 -07:00
John MacFarlane
ac0a9da6d8 Improved parsing of raw LaTeX from Text streams (rawLaTeXParser).
We now use source positions from the token stream to tell us
how much of the text stream to consume.  Getting this to
work required a few other changes to make token source positions
accurate.

Closes #7434.
2021-07-11 13:50:28 -07:00
John MacFarlane
477a67061f Always use / when adding directory to image path with extractMedia.
Even on Windows.

May help with #7431.
2021-07-09 14:14:19 -07:00
John MacFarlane
ae22b1e977 RST reader: fix regression with code includes.
With the recent changes to include infrastructure,
included code blocks were getting an extra newline.

Closes #7436.  Added regression test.
2021-07-09 12:27:41 -07:00
Michael Hoffmann
565330033a
Don't incorporate externally linked images in EPUB documents (#7430)
Just like it is possible to avoid incorporating an image in EPUB by
passing `data-external="1"` to a raw HTML snippet, this makes the same
possible for native Images, by looking for an associated `external`
attribute.
2021-07-07 09:26:37 -07:00
Michael Hoffmann
e56e2b0e0b
Recognize data-external when reading HTML img tags (#7429)
Preserve all attributes in img tags.  If attributes have a `data-`
prefix, it will be stripped.  In particular, this preserves a
`data-external` attribute as an `external` attribute in the pandoc AST.
2021-07-06 16:06:29 -07:00
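
The attribute handling described in e56e2b0e0b can be pictured with this minimal Python sketch. The function name `strip_data_prefix` and the list-of-pairs representation are assumptions, not pandoc's actual types.

```python
def strip_data_prefix(attrs):
    """Keep every attribute; drop a leading 'data-' from attribute names."""
    prefix = "data-"
    return [
        (name[len(prefix):] if name.startswith(prefix) else name, value)
        for name, value in attrs
    ]


img_attrs = [("src", "photo.png"), ("data-external", "1")]
ast_attrs = strip_data_prefix(img_attrs)
```

Here `data-external` ends up as `external`, which is the attribute the EPUB change in 565330033a looks for.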
John MacFarlane
e7f8cc5786 T.P.PDF, convertImage: normalize paths.
This will avoid paths on Windows with mixed path separators,
which may cause problems with SVG conversion.

See #7431.
2021-07-06 10:39:47 -07:00
John MacFarlane
f88ebf3ebf Markdown reader: don't try to read contents in self-closing HTML tag.
Previously we had problems parsing raw HTML with self-closing
tags like `<col/>`. The problem was that pandoc would look
for a closing tag to close the markdown contents, but the
closing tag had, in effect, already been parsed by `htmlTag`.

This fixes the issue described in
<https://groups.google.com/d/msgid/pandoc-discuss/297bc662-7841-4423-bcbb-534e99bbba09n%40googlegroups.com>.
2021-07-06 10:22:07 -07:00
John MacFarlane
3ed37f0077 HTML reader: add col, colgroup to 'closes' definitions 2021-07-06 10:21:59 -07:00
John MacFarlane
3a31fe68ef Add command test for #7394.
And fix a small bug in handling of citations in notes, which
led to commas at the end of sentences in some cases.
2021-07-05 15:10:14 -07:00
John MacFarlane
77537b1765 Citeproc: cleanup and efficiency improvement in deNote. 2021-07-05 13:41:01 -07:00
John MacFarlane
ff26af59ac Revamp note citation handling.
Use latest citeproc, which uses a Span with a class rather
than a Note for notes.  This helps us distinguish between
user notes and citation notes.

Don't put citations at the beginning of a note in parentheses.
(Closes #7394.)
2021-07-05 13:19:33 -07:00
Aner Lucero
cb038bb312 HTML5 writer, remove aria-hidden when explicit alt text is provided. 2021-07-02 13:02:52 -07:00
John MacFarlane
0948af9cc5 Docx writer: Add table numbering for captioned tables.
The numbers are added using fields, so that Word can
create a list of tables that will update automatically.
2021-06-29 11:15:40 -07:00
John MacFarlane
a01ba4463f Docx writer: Fixed a couple bugs in Figure numbering. 2021-06-29 11:15:13 -07:00
John MacFarlane
a3d745e485 Docx writer: support figure numbers.
These are set up in such a way that they will work with Word's
automatic table of figures.

Closes #7392.
2021-06-29 09:56:21 -07:00
Aner Lucero
f4ef652a41 Remove duplicated alt text in HTML output. 2021-06-29 09:02:13 -07:00
John MacFarlane
851d037b3e Improve punctuation moving with --citeproc.
Previously, using `--citeproc` could cause punctuation to move in
quotes even when there are no citations. This has been changed;
now, punctuation moving is limited to citations.

In addition, we only move footnotes around punctuation if the
style is a note style, even if `notes-after-punctuation` is `true`.
2021-06-28 22:41:14 -07:00
John MacFarlane
97b0aa667c Allow $ characters in bibtex keys.
Closes #7409.
2021-06-28 13:34:12 -07:00
John MacFarlane
f045e59248 Text.Pandoc.Error: fix line calculations in reporting parsec errors.
Also remove a spurious initial newline in the error report.
2021-06-28 13:28:49 -07:00
John MacFarlane
4262898fe9 Set proper initial source name in parsing BibTeX.
(For better error messages.)
2021-06-28 13:28:02 -07:00
John MacFarlane
dd098d4e15 Markdown writer: put space between Plain and following fenced Div.
Closes #4465.
2021-06-28 11:33:22 -07:00
John MacFarlane
4a7a0cff29 ImageSize: Add Tiff constructor for ImageType.
[Minor API change]

This allows pandoc to get size information from tiff images.
Closes #7405.
2021-06-23 11:39:50 -07:00
John MacFarlane
235cdea629 reveal.js writer: Go back to setting boolean values for variables.
In a previous commit we used strings because boolean False
wouldn't render as `false`. This is changed in the dev
version of doctemplates, so we can go back to the more
straightforward approach.
2021-06-23 09:54:14 -07:00
John MacFarlane
1b07997f4a Fix regression with comment-only YAML metadata blocks.
Closes #7400.
2021-06-22 09:55:50 -07:00
John MacFarlane
086790d986 Fix unneeded import 2021-06-22 09:49:24 -07:00
John MacFarlane
8eed5b90d0 LaTeX writer: add strut at end of minipage if it contains...
line breaks.  Without them, the last line is shorter
than it should be, at least in some cases.
2021-06-21 23:33:00 -07:00
John MacFarlane
9867231779 Revert "LaTeX writer: put a strut after a line break (\\)."
This reverts commit e2a7ecb5f7.
2021-06-21 23:19:40 -07:00
John MacFarlane
e2a7ecb5f7 LaTeX writer: put a strut after a line break (\\).
This ensures that we have proper spacing before the next
line (which might e.g. be a table bottom border).
This gives better results in cases like test/command/7272.md.
2021-06-21 23:17:43 -07:00
John MacFarlane
0352f7845b Improve emailAddress in Text.Pandoc.Parsing.
Previously the parser would accept characters that are
illegal in domains, and this sometimes caused it
to gobble bits of the following text.

Closes #7398.

Note that this change, by itself, caused some txt2tags reader
tests to fail. txt2tags allows bare email addresses with
a following form query.  So, in addition to the change
to emailAddress, we modify the txt2tags parser so it can
still handle these cases.
2021-06-21 22:35:07 -07:00
John MacFarlane
ed3974a254 LaTeX writer: always use a minipage for cells with line breaks...
if width information is available.  Otherwise the way we treat them can
lead to content that overflows a cell.

Closes #7393.
2021-06-21 18:25:36 -07:00
John MacFarlane
eee648447a LaTeX writer: Use \strut instead of ~ before \\ in empty line. 2021-06-21 18:25:07 -07:00
John MacFarlane
14b2eb2aeb reveal.js writer: better handling of options.
Previously it was impossible to specify false values for
options that default to true; setting the option to false
just caused the portion of the template setting the option
to be omitted.

Now we prepopulate all the variables with their default
values, including them unconditionally and allowing them
to be overridden.
2021-06-21 16:40:52 -07:00
John MacFarlane
82ad855f38 Markdown writer: Fix regression in code blocks with attributes.
Code blocks with a single class but nonempty attributes
were having attributes dropped as a result of #7242.

Closes #7397.
2021-06-21 08:49:00 -07:00
John MacFarlane
3fb5499dd6 insertMediaBag: ensure we get sane mediaPath for URLs.
Long URLs cannot be treated as mediaPaths, but System.FilePath's
`isRelative` often returns True for them.  So we add a check
for an absolute URL.  We also ensure that extensions are derived
only from the path portion of URLs (previously a following query
was being included).

Closes #7391.
2021-06-18 13:19:24 -07:00
John MacFarlane
cfa26e3ca0 Docx reader: handle absolute URIs in Relationship Target.
Closes #7374.
2021-06-12 13:56:09 -07:00
John MacFarlane
ea53a1dc5c Markdown writer: allow pipe_tables to be disabled for commonmark...
(commonmark_x, gfm).  Closes #7375.
2021-06-12 10:20:19 -07:00
John MacFarlane
b0cd6c6224 Fix regression in citeproc processing.
If inline references are used (in the metadata `references` field),
we should still only include in the bibliography items that are
actually cited -- unless `nocite` is used.

Closes #7376.
2021-06-12 10:16:44 -07:00
John MacFarlane
3776e828a8 Fix MediaBag regressions.
With the 2.14 release `--extract-media` stopped working as before;
there could be mismatches between the paths in the rendered document and
the extracted media.

This patch makes several changes (while keeping the same API).

The `mediaPath` in 2.14 was always constructed from the SHA1 hash of
the media contents.  Now, we preserve the original path unless it's
an absolute path or contains `..` segments (in that case we use a path
based on the SHA1 hash of the contents).

When constructing a path from the SHA1 hash, we always use the
original extension, if there is one. Otherwise we look up an
appropriate extension for the mime type.

`mediaDirectory` and `mediaItems` now use the `mediaPath`, rather
than the mediabag key, for the first component of the tuple.
This makes more sense, I think, and fits with the documentation
of these functions; eventually, though, we should rework the API so that
`mediaItems` returns both the keys and the MediaItems.

Rewriting of source paths in `extractMedia` has been fixed.

`fillMediaBag` has been modified so that it doesn't modify
image paths (that was part of the problem in #7345).

We now do path normalization (e.g. `\` separators on Windows) only
in writing the media; the paths are left unchanged in the image
links (sensibly, since they might be URLs and not file paths).

These changes should restore the original behavior from before 2.14.

Closes #7345.
2021-06-10 16:47:02 -07:00
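
The path rule described in 3776e828a8 might look roughly like this in Python. It is a simplified sketch, not pandoc's implementation: `media_path` and `fallback_ext` are hypothetical names, `fallback_ext` stands in for the MIME-type lookup, and only POSIX-style absolute paths are detected.

```python
import hashlib
import posixpath


def media_path(original, contents, fallback_ext=""):
    """Keep a safe relative path; otherwise derive a name from the SHA1 hash."""
    segments = original.replace("\\", "/").split("/")
    unsafe = original.startswith("/") or ".." in segments
    if not unsafe:
        return original
    # Prefer the original extension; fall back to one looked up elsewhere.
    ext = posixpath.splitext(original)[1] or fallback_ext
    return hashlib.sha1(contents).hexdigest() + ext


kept = media_path("images/a.png", b"data")
hashed = media_path("../a.png", b"data")
no_ext = media_path("/abs/file", b"data", fallback_ext=".bin")
```

A relative path without `..` is returned unchanged, so the rendered document and the extracted media keep agreeing on the name.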
John MacFarlane
aa79b3035c T.P.MIME, extensionFromMimeType: add a few special cases.
When we do a reverse lookup in the MIME table, we just get the
last match, so when the same mime type is associated with several
different extensions, we sometimes got weird results, e.g. `.vs`
for `text/plain`.  These special cases help us get the most standard
extensions for mime types like `text/plain`.
2021-06-10 16:36:54 -07:00
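
The reverse-lookup problem in aa79b3035c can be reproduced with a toy table. In this Python sketch only the `.vs` / `text/plain` pairing comes from the commit message; the other entries, `PREFERRED`, and the function names are made up for illustration.

```python
# Illustrative (extension, MIME type) pairs.
MIME_TABLE = [
    ("txt", "text/plain"),
    ("vs", "text/plain"),
    ("html", "text/html"),
]

# Hypothetical special cases that pin the most standard extension.
PREFERRED = {"text/plain": "txt"}


def naive_extension(mime):
    """Reverse lookup in which the last matching entry wins."""
    reverse = {m: ext for ext, m in MIME_TABLE}
    return reverse.get(mime)


def extension_from_mime(mime):
    """Consult the special cases first, then fall back to the reverse lookup."""
    return PREFERRED.get(mime) or naive_extension(mime)


before = naive_extension("text/plain")
after = extension_from_mime("text/plain")
```

The naive lookup yields `vs` because later table entries overwrite earlier ones; the special case restores `txt`.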
Albert Krewinkel
c7dd33d5aa
Docx writer: fix handling of empty table headers
A table header which does not contain any cells is now treated as an
empty header.

Fixes: #7369
2021-06-10 18:36:49 +02:00
Albert Krewinkel
55bcd4b4fb
Lua utils: fix handling of table headers in from_simple_table
Passing an empty list of header cells now results in an empty table
header.

Fixes: #7369
2021-06-10 18:36:49 +02:00
John MacFarlane
76e5f047b0 Citeproc: avoid duplicate classes and attributes on refs div. 2021-06-08 17:51:53 -07:00
John MacFarlane
21cc52abe3 LaTeX writer: Fix regression in table header position.
In recent versions the table headers were no longer bottom-aligned
(if more than one line).  This patch fixes that by using minipages
for table headers in non-simple tables.

Closes #7347.
2021-06-05 14:13:58 -06:00
Jan Tojnar
c550bf8482 CommonMark writer: do not use simple class for fenced-divs
In https://github.com/jgm/pandoc/pull/7242, we introduced a simple attribute style for code blocks and fenced divs with a single class, but it turns out the CommonMark extension does not support it for fenced divs.

https://github.com/jgm/commonmark-hs/blob/master/commonmark-extensions/test/fenced_divs.md
2021-06-05 13:51:18 -06:00
Jan Tojnar
7a3ee9d3d8 CommonMark writer: do not throw away attributes when Ext_attributes is enabled
Ext_attributes covers at least the following:

- Ext_fenced_code_attributes
- Ext_header_attributes
- Ext_inline_code_attributes
- Ext_link_attributes
2021-06-05 13:51:18 -06:00
Jan Tojnar
c6f8c38c49 Markdown writer: re-use functions from Inline
Instead of duplicating linkAttributes and attrsToMarkdown, let’s just use those from the Inline module.
2021-06-05 13:51:18 -06:00
Jan Tojnar
c8ab8bccf2 DocBook reader: Add support for danger element
Added in DocBook 5.2:

- https://github.com/docbook/docbook/pull/64
- https://tdg.docbook.org/tdg/5.2/danger.html
2021-06-05 08:02:21 -06:00
Jan Tojnar
af9de925de DocBook writer: Remove non-existent admonitions
attention, error and hint are actually just reStructuredText specific.
danger was too until introduced in DocBook 5.2: https://github.com/docbook/docbook/issues/55
2021-06-05 08:02:21 -06:00
John MacFarlane
b6c04383e4 T.P.Class.IO: normalise path in writeMedia.
This ensures that we get `\` separators on Windows.
2021-06-03 18:34:38 -06:00
John MacFarlane
311736fb0a Text.Pandoc.PDF: only print relevant part of environment on --verbose. 2021-06-02 15:21:13 -06:00
John MacFarlane
2b5dad9912 Fix regression in 2.14 for generation of PDFs with SVGs.
Closes #7344.
2021-06-02 10:42:22 -06:00
John MacFarlane
3b628f7664 HTML writer: Don't omit width attribute on div.
Closes #7342.
2021-06-01 21:57:49 -06:00
John MacFarlane
2e4ef14d91 Markdown reader: fix pipe table regression in 2.11.4.
Previously pipe tables with empty headers (that is, a header
line with all empty cells) would be rendered as headerless
tables.  This broke in 2.11.4.

The fix here is to produce an AST with an empty table head
when a pipe table has all empty header cells.

Closes #7343.
2021-06-01 21:44:55 -06:00
John MacFarlane
abb59bd582 LaTeX reader: don't allow optional * on symbol control sequences.
Generally we allow optional starred variants of LaTeX commands
(since many allow them, and if we don't accept these explicitly,
ignoring the star usually gives acceptable results).  But we
don't want to do this for `\(*\)` and similar cases.

Closes #7340.
2021-06-01 13:54:51 -06:00
John MacFarlane
62f46b3995 Fix regression with commonmark/gfm yaml metadata block parsing.
A regression in 2.14 led to the document body being omitted
after YAML metadata in some cases.  This is now fixed.

Closes #7339.
2021-05-31 21:34:51 -06:00
John MacFarlane
fc70f44ee2 HTML reader: fix column width regression.
Column widths specified with a style attribute were
off by a factor of 100 in 2.14.

Closes #7334.
2021-05-30 17:15:14 -07:00
John MacFarlane
cc206af392 Have LoadedResource use relative paths.
The immediate reason for this is to allow the test output of #3752
to work on both windows and linux.
2021-05-30 10:23:00 -07:00
John MacFarlane
c2f46e6df4 Docx writer: fix regression on captions.
The "Table Caption" style was no longer getting applied.
(It was overwritten by "Compact.")

Closes #7328.
2021-05-30 10:07:28 -07:00
John MacFarlane
cc6dcf0392 Markdown reader: in rebasePaths, check for both Windows and Posix
absolute paths.  Previously Windows pandoc was treating
`/foo/bar.jpg` as non-absolute.
2021-05-29 17:36:30 -07:00
John MacFarlane
0d7103de7e In rebasePath, check for absolute paths two ways.
isAbsolute from FilePath doesn't return True on Windows
for paths beginning with `/`, so we check that separately.
2021-05-29 14:41:28 -07:00
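
The "check two ways" idea in 0d7103de7e, transposed to Python as a hedged sketch. It uses Python's `ntpath` rather than Haskell's System.FilePath, and `is_absolute_either_way` is a hypothetical name.

```python
import ntpath


def is_absolute_either_way(path):
    """Treat a leading '/' as absolute regardless of platform conventions."""
    # The leading-slash test is done separately so the result does not
    # depend on how the Windows path rules classify such paths.
    return path.startswith("/") or ntpath.isabs(path)


posix_style = is_absolute_either_way("/foo/bar.jpg")
windows_style = is_absolute_either_way("C:\\foo\\bar.jpg")
relative = is_absolute_either_way("foo/bar.jpg")
```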
John MacFarlane
b6b2331fdc Support rebase_relative_paths for commonmark based formats.
(Including `gfm`.)
2021-05-28 13:58:44 -07:00
Emily Bourke
56b211120c
Docx reader: Support new table features.
* Column spans
* Row spans
  - The spec says that if the `val` attribute is omitted, its value
    should be assumed to be `continue`, and that its values are
    restricted to {`restart`, `continue`}. If the attribute has any other
    value, I think it seems reasonable to default it to `continue`. It
    might cause problems if the spec is extended in the future by adding
    a third possible value, in which case this would probably give
    incorrect behaviour, and wouldn't error.
* Allow multiple header rows
* Include table description in simple caption
  - The table description element is like alt text for a table (along
    with the table caption element). It seems like we should include
    this somewhere, but I’m not 100% sure how – I’m pairing it with the
    simple caption for the moment. (Should it maybe go in the block
    caption instead?)
* Detect table captions
  - Check for caption paragraph style /and/ either the simple or
    complex table field. This means the caption detection fails for
    captions which don’t contain a field, as in an example doc I added
    as a test. However, I think it’s better to be too conservative: a
    missed table caption will still show up as a paragraph next to the
    table, whereas if I incorrectly classify something else as a table
    caption it could cause havoc by pairing it up with a table it’s
    not at all related to, or dropping it entirely.
* Update tests and add new ones

Partially fixes: #6316
2021-05-28 20:15:23 +02:00
Emily Bourke
44484d0dee
Docx reader: Read table column widths. 2021-05-28 20:15:23 +02:00
John MacFarlane
4842c5fb82 Two citeproc locator/suffix improvements:
- Recognize locators spelled with a capital letter.
  Closes #7323.
- Add a comma and a space in front of the suffix if it doesn't start
  with space or punctuation.  Closes #7324.
2021-05-27 18:28:52 -07:00
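
The suffix rule from 4842c5fb82 (#7324) can be sketched in a few lines of Python. `normalize_suffix` is a hypothetical name, and treating "punctuation" as ASCII punctuation only is an assumption of this sketch.

```python
import string


def normalize_suffix(suffix):
    """Prefix ', ' unless the suffix already starts with space or punctuation."""
    if suffix and not (suffix[0].isspace() or suffix[0] in string.punctuation):
        return ", " + suffix
    return suffix


bare = normalize_suffix("emphasis added")
punctuated = normalize_suffix("; see also ch. 2")
spaced = normalize_suffix(" and passim")
```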
John MacFarlane
4b16d181e7 rebase_relative_paths: leave empty paths unchanged. 2021-05-27 14:16:37 -07:00
John MacFarlane
0661ce699f rebase_relative_paths extension: don't change fragment paths.
We don't want a pure fragment path to be rewritten, since
these are used for cross-referencing.
2021-05-27 13:53:26 -07:00
John MacFarlane
6972a7dc91 Modify rebase_relative_paths treatment of reference links/images.
The directory is based on the file containing the link
reference, not the file containing the link, if these differ.
2021-05-27 11:26:38 -07:00
John MacFarlane
cbe16b2866 Citeproc: Don't detect math elements as locators.
Closes #7321.
2021-05-27 10:49:45 -07:00
John MacFarlane
834da53058 Add rebase_relative_paths extension.
- Add manual entry for (non-default) extension
  `rebase_relative_paths`.
- Add constructor `Ext_rebase_relative_paths` to `Extensions`
  in Text.Pandoc.Extensions [API change]. When enabled, this
  extension rewrites relative image and link paths by prepending
  the (relative) directory of the containing file.
- Make Markdown reader sensitive to the new extension.
- Add tests for #3752.

Closes #3752.

NB. currently the extension applies to markdown and associated
readers but not commonmark/gfm.
2021-05-27 10:38:25 -07:00
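
A rough Python sketch of the rewriting rule behind `rebase_relative_paths`, folding in the refinements from the related entries in this log (absolute paths, pure fragments, and empty paths are left alone). The function name `rebase_path` is hypothetical, and leaving anything containing `://` unchanged is an assumption of the sketch rather than something the commit messages state.

```python
import posixpath


def rebase_path(target, containing_file):
    """Prepend the containing file's directory to a relative link target."""
    # Empty targets and pure fragments (cross-references) stay as they are.
    if target == "" or target.startswith("#"):
        return target
    # Absolute paths stay as they are; so do URLs (assumption).
    if target.startswith("/") or "://" in target:
        return target
    directory = posixpath.dirname(containing_file)
    return posixpath.join(directory, target) if directory else target


rebased = rebase_path("img/fig.png", "chap1/text.md")
fragment = rebase_path("#results", "chap1/text.md")
top_level = rebase_path("fig.png", "text.md")
```

For reference links, 6972a7dc91 notes that `containing_file` should be the file holding the link reference definition, not the file that uses it.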
John MacFarlane
81eadfd99a LaTeX reader: improve \def and implement \newif.
- Improve parsing of `\def` macros.  We previously set "verbatim mode"
  even for parsing the initial `\def`; this caused problems for things
  like
  ```
  \def\foo{\def\bar{BAR}}
  \foo
  \bar
  ```
- Implement `\newif`.
- Add tests.
2021-05-27 09:15:04 -07:00
John MacFarlane
8d5014fdfc Logging: remove single quotes around paths in messages.
We weren't doing it consistently and it seems unnecessary.
2021-05-25 11:53:49 -07:00
Albert Krewinkel
105a50569b Allow compilation with base 4.15 2021-05-25 11:52:49 -07:00
Albert Krewinkel
bb2530caa4 Use haddock-library-1.10.0 2021-05-25 11:52:49 -07:00
John MacFarlane
f2c1b57469 PandocMonad: add info message in downloadOrRead...
indicating what path local resources have been loaded from.
2021-05-25 10:08:30 -07:00
John MacFarlane
fb40c8109d Logging: add LoadedResource constructor to LogMessage.
[API change]

This is for INFO-level messages telling where image data has been
loaded from.  (This can vary because of the resource path.)
2021-05-25 10:07:24 -07:00
Albert Krewinkel
d46ea7d7da
Jira: add support for "smart" links
Support has been added for the new
`[alias|https://example.com|smart-card]` syntax.
2021-05-25 16:54:42 +02:00
John MacFarlane
8511f6fdf6 MediaBag improvements.
In the current dev version, we will sometimes add
a version of an image with a hashed name, keeping
the original version with the original name, which
would lead to undesirable duplication.

This change separates the media's filename from the
media's canonical name (which is the path of the link
in the document itself).  Filenames are based on SHA1
hashes and assigned automatically.

In Text.Pandoc.MediaBag:

- Export MediaItem type [API change].
- Change MediaBag type to a map from Text to MediaItem [API change].
- `lookupMedia` now returns a `MediaItem` [API change].
- Change `insertMedia` so it sets the `mediaPath` to
  a filename based on the SHA1 hash of the contents.
  This will be used when contents are extracted.

In Text.Pandoc.Class.PandocMonad:

- Remove `fetchMediaResource` [API change].

Lua MediaBag module has been changed minimally. In the future
it would be better, probably, to give Lua access to the full
MediaItem type.
2021-05-24 09:20:44 -07:00
Albert Krewinkel
58fbf56548
Jira writer: use {color} when span has a color attribute
Closes: tarleb/jira-wiki-markup#10
2021-05-24 09:56:02 +02:00
John MacFarlane
1af2cfb287 Handle relative lengths (e.g. 2*) in HTML column widths.
See <https://www.w3.org/TR/html4/types.html#h-6.6>.

"A relative length has the form "i*", where "i" is an integer. When
allotting space among elements competing for that space, user agents
allot pixel and percentage lengths first, then divide up remaining
available space among relative lengths. Each relative length receives a
portion of the available space that is proportional to the integer
preceding the "*". The value "*" is equivalent to "1*". Thus, if 60
pixels of space are available after the user agent allots pixel and
percentage space, and the competing relative lengths are 1*, 2*, and 3*,
the 1* will be alloted 10 pixels, the 2* will be alloted 20 pixels, and
the 3* will be alloted 30 pixels."

Closes #4063.
2021-05-22 22:03:54 -07:00
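
The arithmetic in the quoted passage can be checked with a short Python sketch. The helper names `parse_relative` and `allot_relative` are hypothetical; this illustrates the proportional split only, not how pandoc converts the result into column widths.

```python
def parse_relative(length):
    """Return the integer weight of a relative length such as '2*' or '*'."""
    if not length.endswith("*"):
        raise ValueError("not a relative length: " + length)
    digits = length[:-1]
    # A bare "*" is equivalent to "1*".
    return int(digits) if digits else 1


def allot_relative(available, lengths):
    """Split the remaining space in proportion to each relative length."""
    weights = [parse_relative(length) for length in lengths]
    total = sum(weights)
    return [available * weight / total for weight in weights]


shares = allot_relative(60, ["1*", "2*", "3*"])
```

With 60 pixels remaining, the three columns receive 10, 20, and 30 pixels, matching the example in the specification.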
John MacFarlane
80b4b3fe82 Revert "HTML reader: simplify col width parsing"
This reverts commit f76fe2ab56.
2021-05-22 22:03:51 -07:00
Albert Krewinkel
f76fe2ab56
HTML reader: simplify col width parsing 2021-05-22 13:37:42 +02:00
John MacFarlane
07d299d353 DocBook reader: ensure that first and last names are separated.
Closes #6541.
2021-05-20 18:45:39 -07:00
John MacFarlane
d7b5def287 Ms writer: handle tables with multiple paragraphs.
Previously they overflowed the table cell width.
We now set line lengths per-cell and restore them
after the table has been written.

Closes #7288.
2021-05-20 17:12:38 -07:00
John MacFarlane
bb11f5fb86 LaTeX reader: More siunitx improvements. Closes #6658.
There's still one slight divergence from the siunitx behavior:
we get 'kg m/A/s' instead of 'kg m/(A s)'. At the moment I'm
not going to worry about that.
2021-05-20 15:30:31 -07:00
John MacFarlane
4e990a8cf9 LaTeX/siunitx: fix parsing of \cubic etc. See #6658. 2021-05-20 10:13:20 -07:00
John MacFarlane
bc5058234f LaTeX reader siunitx: fix + sign on ang. 2021-05-20 10:13:20 -07:00
John MacFarlane
5dc917da3e LaTeX reader siunitx: add leading 0 to numbers starting with . 2021-05-20 10:13:20 -07:00
Denis Maier
183ce58477
ConTeXt writer: improve ordered lists (#7304)
Closes #5016 

- change ordered list from itemize to enumerate
- adds new itemgroup for ordered lists
- add fontfeature for table figures
- remove width from itemize in context writer
2021-05-20 09:59:53 -07:00
John MacFarlane
a366bd6abc LaTeX reader: Fix parsing of +- in siunitx numbers.
See #6658.
2021-05-20 09:03:29 -07:00
John MacFarlane
8437a4a002 LaTeX reader: support \pm in \SI{..}.
Closes #6620.
2021-05-20 08:16:46 -07:00
Albert Krewinkel
b6239f4150
ZimWiki writer: allow links and emphasis in headers
The latest version of ZimWiki supports this.

Closes: #6605
2021-05-20 12:48:05 +02:00
John MacFarlane
5736b331d8 LaTeX reader: better support for \xspace.
Previously we only supported it in inline contexts; now
we support it in all contexts, including math.

Partially addresses #7299.
2021-05-19 16:14:49 -07:00
John MacFarlane
640dbf8b8f Remove unused pragma. 2021-05-19 09:51:50 -07:00
John MacFarlane
9b5798bd9a Use fetchItem instead of downloadOrRead in fetchMediaResource. 2021-05-18 22:35:18 -07:00
John MacFarlane
ddbd984a0d Text.Pandoc.MediaBag: change type to use a Text key...
instead of `[FilePath]`.
We normalize the path and use `/` separators for consistency.
2021-05-18 22:34:23 -07:00
Albert Krewinkel
eb3dff148e
LaTeX writer: separate successive quote chars with thin space
Successive quote characters are separated with a thin space to improve
readability and to prevent unwanted ligatures. Detection of these quotes
sometimes had failed if the second quote was nested in a span element.

Closes: #6958
2021-05-18 22:55:47 +02:00
John MacFarlane
56fb4dae1b Citeproc: ensure that CSL-related attributes are passed on...
...to a Div with id 'refs'.  Previously we just left the
attributes of such a Div alone, which meant that style
options like entry-spacing had no effect there.
2021-05-17 20:42:43 -07:00
Albert Krewinkel
1843a8793a
HTML reader: keep attributes from code nested below pre tag.
If a code block is defined with `<pre><code
class="language-x">…</code></pre>`, where the `<pre>` element has no
attributes, then the attributes from the `<code>` element are used
instead. Any leading `language-` prefix in the code's *class*
attribute is dropped to improve syntax highlighting.

Closes: #7221
2021-05-17 18:08:02 +02:00
Albert Krewinkel
25f5b92777
HTML writer: ensure headings only have valid attribs in HTML4
Fixes: #5944
2021-05-17 15:42:15 +02:00
Albert Krewinkel
4417dacc44
ConTeXt writer: use span identifiers as reference anchors.
Closes: #7246
2021-05-17 13:14:32 +02:00
Albert Krewinkel
d92622ba3c
LaTeX template: define commands for zero width non-joiner character
Closes: #6639

The zero-width non-joiner character is used to avoid ligatures (e.g. in
German).
2021-05-16 12:33:32 -07:00
John MacFarlane
5a6399d9f6 Markdown writer: fewer unneeded escapes for #.
See #6259.
2021-05-16 12:23:34 -07:00
John MacFarlane
39a69c4f93 Markdown writer: improve escaping of @.
We need to escape literal `@` before `{` because of
the new citation syntax.
2021-05-16 11:53:19 -07:00
John MacFarlane
0a4c6925b6 Docx writer: copy over more settings from reference.docx.
From settings.xml in the reference-doc, we now include:
`zoom`, `embedSystemFonts`, `doNotTrackMoves`, `defaultTabStop`,
`drawingGridHorizontalSpacing`, `drawingGridVerticalSpacing`,
`displayHorizontalDrawingGridEvery`, `displayVerticalDrawingGridEvery`,
`characterSpacingControl`, `savePreviewPicture`, `mathPr`, `themeFontLang`,
`decimalSymbol`, `listSeparator`, `autoHyphenation`, `compat`.

Closes #7240.
2021-05-15 15:40:49 -07:00
Albert Krewinkel
0794862aac
HTML reader: parse <header> as a Div
HTML5 `<header>` elements are treated like `<div>` elements.
2021-05-15 16:46:02 +02:00
Albert Krewinkel
013e4a3164
HTML reader: keep h1 tags as normal headers (#7274)
The tags `<title>` and `<h1 class="title">` often contain the same
information, so the latter was dropped from the document. However, as
this can lead to loss of information, the heading is now always
retained.

Use `--shift-heading-level-by=-1` to turn the `<h1>` into the document
title, or a filter to restore the previous behavior.

Closes: #2293
2021-05-14 12:31:24 -07:00
John MacFarlane
76a4e7127b Beamer writer: support exampleblock and alertblock.
A block will be rendered as an exampleblock if the heading
has class `example` and alertblock if it has class `alert`.

Closes #7278.
2021-05-14 10:09:46 -07:00
Albert Krewinkel
3ec5726c9b
Docx writer: fix alignment for cells.
This fixes a regression introduced with the colspan/rowspan
changes that caused column alignments to be ignored. The column
alignment is used only if a default alignment is specified at the cell
level; otherwise the cell-level alignment takes precedence.
2021-05-14 16:49:19 +02:00
Albert Krewinkel
17d96404f5
Docx writer: allow multirow table headers 2021-05-14 16:19:20 +02:00
Albert Krewinkel
875f8f3654
HTML reader: don't fail on unmatched closing "script" tag.
Prevent the reader from crashing if the HTML input contains an unmatched
closing `</script>` tag.

Fixes: #7282
2021-05-14 12:13:40 +02:00
John MacFarlane
3f09f53459 Implement curly-brace syntax for Markdown citation keys.
The change provides a way to use citation keys that contain
special characters not usable with the standard citation
key syntax.  Example: `@{foo_bar{x}'}` for the key `foo_bar{x}'`.
Closes #6026.

The change requires adding a new parameter to the `citeKey`
parser from Text.Pandoc.Parsing [API change].

Markdown reader: recognize @{..} syntax for citations.

Markdown writer:  use @{..} syntax for citations when needed.

Update manual with curly-brace syntax for citations.

Closes #6026.
2021-05-13 21:59:32 -07:00
John MacFarlane
edca1d1656 Plain writer: handle superscript unicode minus.
Closes #7276.  Note:  currently we still get unwanted
white space around the minus; this needs to be addressed
with a change in texmath.
2021-05-12 11:12:27 -07:00
John MacFarlane
0217ae2a4f Handle 'annote' field in bibtex/biblatex writer.
Closes #7266.
2021-05-12 11:05:55 -07:00
John MacFarlane
46309319ef Fix source position reporting for YAML bibliographies.
Closes #7273.
2021-05-12 06:01:13 -06:00
John MacFarlane
5eb7ad7d1e Improve integration of settings from reference.docx.
The settings we can carry over from a reference.docx are
autoHyphenation, consecutiveHyphenLimit, hyphenationZone,
doNotHyphenateCap, evenAndOddHeaders, and proofState.

Previously this was implemented in a buggy way, so that the
reference doc's values AND the new values were included.

This change allows users to create a reference.docx that
sets w:proofState for spelling or grammar to "dirty,"
so that spell/grammar checking will be triggered on the
generated docx.

Closes #1209.
2021-05-11 22:31:38 -06:00
John MacFarlane
a66e50840b T.P.XML.Light - add Eq, Ord instances...
for Content, Element, Attr, CDataKind.
[API change]
2021-05-11 09:01:36 -06:00
John MacFarlane
2bd5d0cafb LaTeX writer: better handling of line breaks in simple tables.
Now we also handle the case where they're embedded in other
elements, e.g. spans. Closes #7272.
2021-05-11 07:52:05 -06:00
nuew
ff7176de80
epub Writer: Fix belongs-to-collection XML id choice (#7267)
The epub writer previously used the same XML id for both the book
identifier and the epub collection. This causes an error on epubcheck.
2021-05-10 09:26:32 -06:00
John MacFarlane
2a2e08d823 RST reader: seek include files in the directory...
...of the file containing the include directive, as
RST requires.

Closes #6632.
2021-05-09 19:11:35 -06:00
John MacFarlane
b2398cd747 Org reader: Resolve org includes relative to ...
...the directory containing the file containing the
INCLUDE directive.  Closes #5501.
2021-05-09 19:11:35 -06:00
John MacFarlane
41a3ac9da9 RST reader: use insertIncludedFile from T.P.Parsing...
instead of reproducing much of its code.
2021-05-09 19:11:34 -06:00
John MacFarlane
05ea507bd7 T.P.Parsing: improve include file functions.
Remove old `insertIncludedFileF`. [API change]
Give `insertIncludedFile` a more general type, allowing it
to be used where `insertIncludedFileF` was.
2021-05-09 19:11:34 -06:00
John MacFarlane
6e45607f99 Change reader types, allowing better tracking of source positions.
Previously, when multiple file arguments were provided, pandoc
simply concatenated them and passed the contents to the readers,
which took a Text argument.

As a result, the readers had no way of knowing which file
was the source of any particular bit of text.  This meant that
we couldn't report accurate source positions on errors or
include accurate source positions as attributes in the AST.
More seriously, it meant that we couldn't resolve resource
paths relative to the files containing them
(see e.g. #5501, #6632, #6384, #3752).

Add Text.Pandoc.Sources (exported module), with a `Sources` type
and a `ToSources` class.  A `Sources` wraps a list of `(SourcePos,
Text)` pairs. [API change] A parsec `Stream` instance is provided for
`Sources`.  The module also exports versions of parsec's `satisfy` and
other Char parsers that track source positions accurately from a
`Sources` stream (or any instance of the new `UpdateSourcePos` class).

Text.Pandoc.Parsing now exports these modified Char parsers instead of
the ones parsec provides.  Modified parsers to use a `Sources` as stream
[API change].

The readers that previously took a `Text` argument have been
modified to take any instance of `ToSources`. So, they may still
be used with a `Text`, but they can also be used with a `Sources`
object.

In Text.Pandoc.Error, modified the constructor PandocParsecError
to take a `Sources` rather than a `Text` as first argument,
so parse error locations can be accurately reported.

T.P.Error: showPos, do not print "-" as source name.
2021-05-09 19:11:34 -06:00
Albert Krewinkel
295d93e96b
ConTeXt writer: support blank lines in line blocks.
Fixes: #6564

Thanks to @denismaier.
2021-05-07 17:17:47 +02:00
Albert Krewinkel
8357b835d9
App: allow tabs expansion even if file-scope is used
Tabs in plain-text inputs are now handled correctly, even if the
`--file-scope` flag is used.

Closes: #6709
2021-05-05 19:09:21 +02:00
Albert Krewinkel
ddbf83f62c
Docx writer: support colspans and rowspans in tables
See: #6315
2021-05-01 18:52:24 +02:00
Albert Krewinkel
3da919e35d
Add new internal module Text.Pandoc.Writers.GridTable 2021-05-01 18:52:24 +02:00
tecosaur
6b16f3bb0d
Org writer: inline latex envs need newlines (#7259)
Closes #7252

As specified in https://orgmode.org/manual/LaTeX-fragments.html, an
inline \begin{}...\end{} LaTeX block must start on a new line.
2021-04-30 10:23:28 +02:00
mbrackeantidot
b6a65445e1
Docx reader: add handling of vml image objects (jgm#4735) (#7257)
They represent images, the same way as other images in vml format.
2021-04-29 09:11:44 -07:00
John MacFarlane
d14c5f94df Further improvements in smart quotes.
Improves heuristic for detection of an "open double quote."
Closes #2103.
2021-04-29 08:48:49 -07:00
John MacFarlane
80e2e88287 Smarter smart quotes.
Treat a leading " with no closing " as a left curly quote.
This supports the practice, in fiction, of continuing
paragraphs quoting the same speaker without an end quote.
It also helps with quotes that break over lines in line
blocks.

Closes #7216.
2021-04-28 23:32:37 -07:00
Albert Krewinkel
85f379e474
JATS writer: use either styled-content or named-content for spans.
If the element has a content-type attribute, or at least one class, then
that value is used as `content-type` and the span is put inside a
`<named-content>` element. Otherwise a `<styled-content>` element is
used instead.

Closes: #7211
2021-04-28 22:21:34 +02:00
Albert Krewinkel
0921b82d98
Docx writer: autoset table width if no column has an explicit width. 2021-04-27 13:27:20 +02:00
John MacFarlane
3a98f7a0c7 Minor code reformatting.
Also taking this opportunity to note, for the record, that
the commit for #7241 should be marked [API change].
It changes the type of `languagesByExtension` in Highlighting,
adding a parameter for a `SyntaxMap`.
2021-04-25 12:22:04 -07:00
Jan Tojnar
c56d080a25
Writers: Recognize custom syntax definitions (#7241)
Languages defined using `--syntax-definition` were not recognized by `languagesByExtension`.
This patch corrects that, allowing the writers to see all custom definitions.

The LaTeX writer still uses the default syntax map, but that's okay in that context, since
`--syntax-definition` won't create new listings styles.
2021-04-25 12:19:07 -07:00
Jan Tojnar
e9c0f9f97b
Markdown writer: Cleaner (code)blocks with single class (#7242)
When a block only has a single class and no other attributes,
it is not necessary to wrap the class attribute in curly braces –
the class name can be placed after the opening mark as is.

This will result in slightly cleaner output when pandoc is used
as a markdown pretty-printer.
2021-04-25 10:36:06 -07:00
John MacFarlane
547bc2cdf8 Add quotes properly in markdown YAML metadata fields.
This fixes a bug, which caused the writer to look at the LAST
rather than the FIRST character in determining whether quotes
were needed.  So we got spurious quotes in some cases and
didn't get necessary quotes in others.

Closes #7245.  Updated a number of test cases accordingly.
2021-04-25 10:31:33 -07:00
Albert Krewinkel
dc0ba7294d
Docx writer: add missing file 2021-04-20 13:38:16 +02:00
Albert Krewinkel
0b74bbbdaa
Docx writer: extract Table handling into separate module 2021-04-20 10:57:54 +02:00
John MacFarlane
16d372abcb Issue error message when reader or writer format is malformed.
Previously we exited with an error status but (due to a bug)
no message.

Closes #7231.
2021-04-19 08:38:31 -07:00
John MacFarlane
73d394ca2a Use MetaInlines not MetaBlocks for multimarkdown metadata fields.
This gives better results in converting to e.g. pandoc markdown.

Ref: <https://groups.google.com/d/msgid/pandoc-discuss/9728d1f4-040e-4392-aa04-148f648a8dfdn%40googlegroups.com>
2021-04-18 22:01:12 -07:00
John MacFarlane
a478a5c4c8 Update to released unicode-collation, latest citeproc dev version.
Update citeproc test.
2021-04-17 16:15:14 -07:00
John MacFarlane
7a7fefce5e Use document's lang for the lang parameter of citeproc...
even if it differs from localeLanguage.  (It is designed
to be possible to override the locale language, and this
is especially useful when one wants to use the unicode
extension syntax, e.g. fr-u-kb.)
2021-04-17 16:15:14 -07:00
John MacFarlane
aecbf8156e Remove Text.Pandoc.BCP47 module.
[API change]

Use Lang from UnicodeCollation.Lang instead.
This is a richer implementation of BCP 47.
2021-04-17 16:15:14 -07:00
John MacFarlane
7ba8c0d2a5 Move getLang from BCP47 -> T.P.Writers.Shared.
[API change]
2021-04-17 16:15:13 -07:00
Albert Krewinkel
5f79a66ed6
JATS writer: reduce unnecessary use of <p> elements for wrapping
The `<p>` element is used for wrapping in cases where the contents would
otherwise not be allowed in a certain context. Unnecessary wrapping is
avoided, especially around quotes (`<disp-quote>` elements).

Closes: #7227
2021-04-16 22:47:37 +02:00
Albert Krewinkel
2d60524de4
JATS writer: convert spans to <named-content> elements
Spans with attributes are converted to `<named-content>` elements
instead of being wrapped with `<milestone-start/>` and `<milestone-end>`
elements. Milestone elements are not allowed in documents using the
articleauthoring tag set, so this change ensures the creation of valid
documents.

Closes: #7211
2021-04-10 11:49:18 +02:00
Albert Krewinkel
051b7ffeaf
JATS writer: add footnote number as label in backmatter
Footnotes in the backmatter are given the footnote's number as a label.
The articleauthoring output is unaffected by this change, as footnotes
are placed inline there.

Closes: #7210
2021-04-10 10:57:06 +02:00
John MacFarlane
20cd33e5a4 Fix regression in grid tables for wide characters.
In the translation from String to Text, a char-width-sensitive
splitAt' was dropped.  This commit reinstates it.
Closes #7214.
2021-04-08 14:48:29 -07:00
Albert Krewinkel
e227496d3a
Lua filter: respect Inlines/Blocks filter functions in pandoc.walk_* 2021-04-08 22:14:47 +02:00
John MacFarlane
60974538b2 Commonmark writer: Use backslash escapes for < and |...
instead of entities.  Closes #7208.
2021-04-05 23:29:22 -07:00
John MacFarlane
21fed4a9c2 SelfContained: remove unneeded imports. 2021-04-05 23:26:54 -07:00
Albert Krewinkel
038261ea52
JATS writer: escape disallowed chars in identifiers
XML identifiers must start with an underscore or letter, and can contain
only a limited set of punctuation characters. Any IDs not adhering to
these rules are rewritten by writing the offending characters as Uxxxx,
where `xxxx` is the character's hex code.
2021-04-05 21:55:54 +02:00
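The rewriting rule in 038261ea52 can be sketched as follows. This is a Python illustration under stated assumptions, not the JATS writer's code: the name `escape_xml_id` is hypothetical, the set of allowed punctuation (`_`, `-`, `.`) and the use of `str.isalpha`/`str.isalnum` only approximate the XML name rules, and the hex code is assumed to be upper-case and zero-padded to four digits.

```python
def escape_xml_id(ident: str) -> str:
    """Rewrite characters that are not allowed in an XML identifier as Uxxxx."""
    out = []
    for i, ch in enumerate(ident):
        if i == 0:
            # First character: underscore or letter only.
            ok = ch == "_" or ch.isalpha()
        else:
            # Later characters: letters, digits, and a small set of punctuation (assumed).
            ok = ch.isalnum() or ch in "_-."
        out.append(ch if ok else f"U{ord(ch):04X}")
    return "".join(out)


print(escape_xml_id("1st item"))
```

Identifiers that already satisfy the rules pass through unchanged, so the rewriting only affects IDs that would otherwise make the document invalid.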
John MacFarlane
65a9d3a878 SelfContained: use application/octet-stream for unknown mime types...
instead of halting with an error.
Closes #7202.
2021-04-05 08:49:03 -07:00
John MacFarlane
935d10769d Fix "phrase" in DocBook: take classes from "role" not "class".
Closes #7195.  Revises #6438.
2021-04-02 17:07:18 -07:00
tecosaur
4371223d13
Org writer: Use LaTeX-style math delimiters (#7196)
Org works better with LaTeX-style delimiters.
2021-04-01 23:36:02 +02:00
niszet
40da6c402b
Treat tabs as spaces in ODT Reader. (#7185) 2021-03-31 16:44:34 -07:00
John MacFarlane
e22d1fbb14 Powerpoint writer: allow monofont to be specified in metadata...
...not just using `--variable` on the command line (as in
other writers).  Closes #7187.
2021-03-29 14:56:44 -07:00
John MacFarlane
56ce1fc126 Fix DocBook reader mathml regression...
...caused by the switch in XML libraries.
Also fixed a similar issue in JATS.
Closes #7173.
2021-03-24 12:04:33 -07:00
John MacFarlane
052056289f Simplify T.P.Asciify and export toAsciiText [API change].
Instead of encoding a giant (and incomplete) map, we now
just use unicode-transforms to normalize the text to
a canonical decomposition, and manipulate the result.

The new `toAsciiText` is equivalent to the old
`T.pack . mapMaybe toAsciiChar . T.unpack` but should be faster.
2021-03-21 23:40:19 -07:00
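The approach in 052056289f (normalize to a canonical decomposition, then work on the result) can be illustrated with Python's standard library. This is only a sketch: the name `to_ascii_text` mirrors the Haskell function but the body is an assumption, namely that after NFD normalization every non-ASCII code point (combining marks included) is simply dropped.

```python
import unicodedata


def to_ascii_text(text: str) -> str:
    """Decompose accented characters (NFD), then keep only the ASCII code points."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if ord(ch) < 128)


print(to_ascii_text("caf\u00e9 na\u00efve"))
```

Characters with no ASCII base after decomposition are dropped entirely, which matches the `mapMaybe` behaviour mentioned in the commit message.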
John MacFarlane
c389211e2f Support yaml_metadata_block extension for commonmark, gfm.
This is a bit more limited than with markdown, as documented
in the manual:

- The YAML block must be the first thing in the input.
- The leaf nodes are parsed in isolation from the rest of
  the document.  So, for example, you can't use reference
  links if the references are defined later in the document.

Closes #6537.
2021-03-20 15:58:33 -07:00
John MacFarlane
2274eb88a4 Move yamlMetaBlock from Markdown reader to T.P.Readers.Metadata. 2021-03-20 15:58:33 -07:00
John MacFarlane
bea86f394e Markdown reader: export yamlMetaBlock.
[API change]

This will allow us to parse YAML metadata blocks in other
readers, potentially.
2021-03-20 15:58:33 -07:00
John MacFarlane
ce418667ae Text.Pandoc.Parsing: remove F type synonym.
Muse and Org were defining their own F anyway, with their
own state.  We therefore move this definition to the Markdown
reader.
2021-03-20 15:58:32 -07:00
John MacFarlane
4d041953f5 T.P.Readers.Metadata: made yamlBsToMeta, yamlBsToRefs polymorphic...
on the parser state, instead of requiring ParserState.
[API change]
2021-03-20 15:58:32 -07:00
John MacFarlane
84d8f3efd8 RST writer: use NonEmpty for init, last. 2021-03-20 15:58:32 -07:00
Erik Rask
82e8c29cb0 Include Header.Attr.attributes as XML attributes on section
Add key-value pairs found in the attributes list of Header.Attr as
XML attributes on the corresponding section element.

Any key name not allowed as an XML attribute name is dropped, as
are keys with invalid values where they are defined as enums in
DocBook, and xml:id (for DocBook 5)/id (for DocBook 4), so as not to
interfere with computed identifiers.
2021-03-20 21:29:17 +01:00
John MacFarlane
a1a57bce4e T.P.Shared: remove backslashEscapes, escapeStringUsing.
[API change]

These are inefficient association list lookups.
Replace with more efficient functions in the writers that
used them (with 10-25% performance improvements in
haddock, org, rtf, texinfo writers).
2021-03-20 00:24:49 -07:00
John MacFarlane
eacead3eb3 Fix fallback to default partials on templates.
If the directory containing a template does not contain
the partial, it should be sought in the default data files.
Closes #7164.
2021-03-19 22:57:48 -07:00
John MacFarlane
7678c48122 Hlint suggestion. 2021-03-19 14:43:42 -07:00
John MacFarlane
005f0fbcd5 T.P.Shared: Remove ToString, ToText typeclasses [API change].
T.P.Parsing: revise type of readWithM so that it takes a Text
rather than a polymorphic ToText value.

These typeclasses were there to ease the transition from String
to Text. They are no longer needed, and they may clash with
more useful versions under the same name.

This will require a bump to 2.13.
2021-03-19 12:36:04 -07:00
John MacFarlane
4002c35a91 Protect partial uses of maximum with NonEmpty. 2021-03-19 11:55:59 -07:00
John MacFarlane
8d5116381b Use NonEmpty instead of minimumDef. 2021-03-19 10:30:32 -07:00
John MacFarlane
a31731b8e2 Docx reader: Don't reimplement NonEmpty. 2021-03-19 10:11:08 -07:00
John MacFarlane
3428248deb Use minimumDef instead of minimum (partial function). 2021-03-18 23:01:12 -07:00
John MacFarlane
f0e4b9cc3c Require safe >= 0.3.18 and remove cpp. 2021-03-18 21:37:56 -07:00
John MacFarlane
1da6208315 Rewrite a foldl1 as a foldl'. 2021-03-18 21:30:59 -07:00
John MacFarlane
67e173bda1 Remove another foldr1 partial function use. 2021-03-18 21:10:22 -07:00
John MacFarlane
fd76e605cd T.P.Readers.Odt.StyleReader: rewrite foldr1 use as foldr.
This avoids a partial function.
2021-03-18 21:02:05 -07:00
John MacFarlane
c3f9e8c122 Docx writer: make nsid in abstractNum deterministic.
Previously we assigned a random number (though in a deterministic
way).  But changes in the random package mean we get different
results now on different architectures, even with the same random
seed. We don't need random values; so now we just assign a value
based on the list number id, which is guaranteed to be unique
to the list marker.
2021-03-17 22:31:20 -07:00
John MacFarlane
7bf4be04b0 Fix regression with tex_math_backslash in Markdown reader.
Added regression test.  Closes #7155.
2021-03-17 09:10:44 -07:00
John MacFarlane
87538966a0 Removed unused LANGUAGE pragmas. 2021-03-16 13:05:29 -07:00
John MacFarlane
805d12ac9c Remove an unneeded import 2021-03-15 14:21:52 -07:00
John MacFarlane
24191a2a27 Use foldl' instead of foldl everywhere. 2021-03-15 10:37:35 -07:00
John MacFarlane
3622097da3 Handle 'nocite' better with --biblatex and --natbib.
Previously the nocite metadata field was ignored with
these formats.  Now it populates a `nocite-ids` template
variable and causes a `\nocite` command to be issued.

Closes #4585.
2021-03-14 00:10:37 -08:00
Albert Krewinkel
35688c4262
T.P.App.FormatHeuristics: shorten code, improve docs. 2021-03-13 22:06:43 +01:00
John MacFarlane
35b66a7671 MediaWiki reader: Allow block-level content in notes (ref).
Closes #7145.
2021-03-13 12:50:44 -08:00
John MacFarlane
eed18d231c Use integral values for w:tblW in docx.
Closes #7141.
2021-03-13 12:05:52 -08:00
Albert Krewinkel
00e8d0678e
Jira reader: mark divs created from panels with class "panel".
Closes: tarleb/jira-wiki-markup#2
2021-03-13 14:29:47 +01:00
Albert Krewinkel
a8aa301428
Jira writer: improve div/panel handling
Include div attributes in panels, always render divs with class `panel`
as panels, and avoid nesting of panels.
2021-03-13 12:10:02 +01:00
John MacFarlane
894ed8ebb0 Citeproc: apply fixLinks correctly.
This is code that incorporates a prefix like `https://doi.org/`
into a following link when appropriate.  But it didn't work because
we were walking with a `[Inline] -> [Inline]` function on an `Inlines`.
Changed the point of application of `fixLinks` to resolve the issue.
Closes #7130.
2021-03-12 11:58:52 -08:00
John MacFarlane
92ffd37475 Simplify compactDL. 2021-03-12 11:58:52 -08:00
John MacFarlane
5608dc01e5 HTML writer: Add warnings on duplicate attribute values.
This prevents emitting invalid HTML.

Ultimately it would be good to prevent this in the types
themselves, but this is better for now.

T.P.Logging: Add DuplicateAttribute constructor to LogMessage.
[API change]
2021-03-10 10:19:40 -08:00
John MacFarlane
1c23e3a824 RST reader: fix logic for ending comments.
Previously comments sometimes got extended too far.  Closes #7134.
2021-03-09 13:03:27 -08:00
Albert Krewinkel
d7f8fbf04b
Org writer: fix operator precedence mistake in previous commit 2021-03-09 21:16:11 +01:00
Albert Krewinkel
b9b2586ed3
Org writer: prevent unintended creation of ordered list items
Adjust line wrapping if default wrapping would cause a line to be read
as an ordered list item.

Fixes #7132
2021-03-09 18:14:54 +01:00
Albert Krewinkel
eb184d9148
Jira writer: use noformat instead of code for unknown languages.
Code blocks that are not marked as a language supported by Jira are
rendered as preformatted text with `{noformat}` blocks.

Fixes: tarleb/jira-wiki-markup#4
2021-03-08 12:50:35 +01:00
John MacFarlane
5aa73bd0a2 LaTeX reader: handle table cells containing & in \verb.
Closes #7129.
2021-03-07 15:49:02 -08:00
John MacFarlane
c652dcc16b LaTeX reader: support hyperref command.
Closes #7127.
2021-03-07 13:22:00 -08:00
John MacFarlane
735a69de6b Allow --resource-path to accumulate.
Previously, if `--resource-path` were used multiple times, the last
resource path would replace the others.

With this change, each time `--resource-path` is used, it prepends
the specified path components to the existing resource path.

Similarly, when `resource-path` is specified in a defaults file,
the paths provided will be prepended to the existing resource
path.

This change also allows one to avoid using the OS-specific path
separator; instead, one can simply use `--resource-path`
a number of times with single paths. This form of command
will not have an OS-dependent behavior.

This change facilitates the use of multiple, small defaults
files: each can specify a directory containing its own
resources without clobbering the resource paths set by
the others.

Closes #6152.
2021-03-06 10:32:51 -08:00
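The accumulation behaviour described in 735a69de6b amounts to prepending on every use. A small Python sketch, with hypothetical names (`add_resource_path`) and two assumptions: the initial resource path is `["."]`, and `:` stands in for the OS-specific separator.

```python
def add_resource_path(existing, arg, sep=":"):
    """Prepend the components of one --resource-path argument to the existing path."""
    return arg.split(sep) + existing


paths = ["."]  # assumed starting value: the working directory
for arg in ["images", "shared/css:shared/fonts"]:
    paths = add_resource_path(paths, arg)

print(paths)
```

Because each use prepends, the most recently specified directories are searched first, and nothing set by an earlier option or defaults file is lost.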
John MacFarlane
df00cf05cb Allow ${.} in defaults files paths...
to refer to the directory where the defaults file is.
This will make it possible to create moveable
"packages" of resources in a directory.

Closes #5871.
2021-03-05 11:56:41 -08:00
John MacFarlane
6dd7520cc4 Implement environment variable interpolation in defaults files.
This allows the syntax `${HOME}` to be used, in fields that expect
file paths only.  Any environment variable may be interpolated
in this way. A warning will be raised for undefined variables.
The special variable `USERDATA` is automatically set to the
user data directory in force when the defaults file is parsed.
(Note: it may be different from the eventual user data directory,
if the defaults file or further command line options change that.)

Closes #5982.
Closes #5977.
Closes #6108 (path not taken).
2021-03-05 10:46:01 -08:00
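The `${VAR}` interpolation from 6dd7520cc4 can be sketched in Python as below. The function name `interpolate` and the warning text are hypothetical, and replacing an undefined variable with an empty string is an assumption; the commit message only says that a warning is raised.

```python
import re

_VAR = re.compile(r"\$\{([^}]+)\}")


def interpolate(path, env):
    """Expand ${NAME} references in a path; collect a warning for each undefined name."""
    warnings = []

    def replace(match):
        name = match.group(1)
        if name in env:
            return env[name]
        warnings.append(f"undefined environment variable: {name}")
        return ""  # assumption: undefined variables expand to nothing

    return _VAR.sub(replace, path), warnings


print(interpolate("${HOME}/templates/report.latex", {"HOME": "/home/u"}))
```

Passing the environment as a dictionary keeps the sketch testable; a caller could pass `dict(os.environ)` plus a `USERDATA` entry to mimic the special variable.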
John MacFarlane
a832469006 Add fields for CSL options to Opt.
* Add `optCSL`, `optBibliography`, `optCitationAbbreviations` to
  `Opt` [API change].
* Move `addMeta` from T.P.App.Opt to T.P.App.CommandLineOptions.
2021-03-05 10:42:33 -08:00
John MacFarlane
ccc530c588 Logging: Add EnvironmentVariableUndefined constructor to LogMessage.
[API change]
2021-03-05 10:28:46 -08:00
John MacFarlane
5f9327cfc8 Shared: Change defaultUserDataDirs -> defaultUserDataDir.
Rationale: the manual says that the XDG data directory will
be used if it exists, otherwise the legacy data directory.
So we should just determine this and use this directory,
rather than having a search path which could cause some
things to be taken from one data directory and others from
others.

[API change]
2021-03-05 10:25:18 -08:00
John MacFarlane
030209fc29 Revert "Revert "Relax --abbreviations rules so that a period isn't required.""
This reverts commit 916ce4d511.

I was confused in thinking it wouldn't work.
2021-03-04 16:25:13 -08:00
John MacFarlane
916ce4d511 Revert "Relax --abbreviations rules so that a period isn't required."
This reverts commit e461b7dd45.

Ill-advised change.  This doesn't work because we parse
strings in chunks.
2021-03-04 16:22:08 -08:00
John MacFarlane
e461b7dd45 Relax --abbreviations rules so that a period isn't required.
Partially addresses #7124.
2021-03-04 16:02:46 -08:00
John MacFarlane
92ea8a0cb6 Revert "Add T.P.Readers.LaTeX.Include."
This reverts commit b569b0226d.

Memory usage improvement in compilation wasn't very significant.
2021-03-03 19:07:16 -08:00
John MacFarlane
b569b0226d Add T.P.Readers.LaTeX.Include. 2021-03-03 18:47:17 -08:00
John MacFarlane
33e4c8dd6c Remove T.P.Readers.LaTeX.Accent.
Incorporate accentCommands into T.P.Readers.LaTeX.Inline.
2021-03-03 18:21:32 -08:00
John MacFarlane
da5e9e5956 Move enquote commands to T.P.LaTeX.Lang. 2021-03-03 11:22:42 -08:00
John MacFarlane
044bc44fc6 Moved more into T.P.Readers.LaTeX.Lang. 2021-03-03 11:08:02 -08:00
John MacFarlane
bbcc1501a5 Split out T.P.Readers.LaTeX.Inline. 2021-03-03 10:34:10 -08:00
John MacFarlane
e8e5ffe1f4 Split out T.P.Writers.LaTeX.Util. 2021-03-02 22:40:45 -08:00
John MacFarlane
fe483c653b Split out T.P.Writers.LaTeX.Citation. 2021-03-02 21:57:37 -08:00
John MacFarlane
827ecdd2de Split out T.P.Writers.LaTeX.Lang. 2021-03-02 21:33:58 -08:00
John MacFarlane
2097411e4f Split up T.P.Writers.Markdown...
with T.P.Writers.Markdown.Types and T.P.Writers.Markdown.Inline.
The module was difficult to compile on low-memory systems.
2021-03-02 21:08:13 -08:00
John MacFarlane
7f1b933aaa Make T.P.Readers.LaTeX.Types an unexported module.
[API change]

This is really an implementation detail that shouldn't be
exposed in the public API.
2021-03-01 09:46:43 -08:00
John MacFarlane
382f0e23d2 Factor out T.P.Readers.LaTeX.Macro. 2021-03-01 09:46:43 -08:00
Albert Krewinkel
e1454fe0d0
Jira writer: use Span identifiers as anchors
Closes: tarleb/jira-wiki-markup#3.
2021-03-01 14:36:11 +01:00
John MacFarlane
3793ed8beb Removed unnecessary pragmas. 2021-02-28 23:43:55 -08:00
John MacFarlane
6a6291d9e3 Change T.P.Readers.LaTeX.SIunitx to export a command map...
instead of individual commands.
2021-02-28 23:05:35 -08:00
John MacFarlane
7e38b8e55a T.P.Readers.LaTeX: Don't export tokenize, untokenize.
[API change]

These were only exported for testing, which seems the
wrong thing to do.  They don't belong in the public
API and are not really usable as they are, without access
to the Tok type which is not exported.

Removed the tokenize/untokenize roundtrip test.

We put a quickcheck property in the comments which
may be used when this code is touched (if it is).
2021-02-28 22:53:42 -08:00
John MacFarlane
2463fbf61d LaTeX writer: use function instead of map for accent lookup. 2021-02-28 21:43:11 -08:00
John MacFarlane
d2bb0c7c8d Factor out T.P.Readers.LaTeX.Math. 2021-02-28 21:05:25 -08:00
John MacFarlane
36456070c4 Fix bug in last commit. 2021-02-28 15:36:46 -08:00
John MacFarlane
7229d068c9 Markdown reader efficiency improvements.
Benchmarks show that these make the reader 13-17% faster,
depending on extensions.
2021-02-28 15:18:31 -08:00
John MacFarlane
cc543cf5b6 LaTeX reader: another small efficiency improvement. 2021-02-28 14:34:04 -08:00
John MacFarlane
f6cf03857b LaTeX reader efficiency improvements.
In conjunction with other changes this makes the reader
almost twice as fast on our benchmark as it was on Feb. 10.
2021-02-28 12:52:41 -08:00
John MacFarlane
564c39beef Move setDefaultLanguage to T.P.Readers.LaTeX.Lang. 2021-02-28 09:49:34 -08:00
John MacFarlane
5e571d9635 LaTeX reader: remove two unnecessary parsers in inline.
These are handled anyway by regularSymbol.
2021-02-28 09:39:01 -08:00
John MacFarlane
2faa57e8e9 Factor out T.P.Readers.LaTeX.Citation. 2021-02-28 09:12:09 -08:00
John MacFarlane
08231f5cdd Factor out T.P.Readers.LaTeX.Table. 2021-02-27 21:40:56 -08:00
John MacFarlane
925815bb33 Split off T.P.Readers.LaTeX.Accent.
To help reduce memory demands compiling the main LaTeX reader.
2021-02-27 17:02:44 -08:00
Albert Krewinkel
3327b225a1
Lua: use strict evaluation when retrieving AST value from the stack
Fixes: #6674
2021-02-27 21:57:12 +01:00
Salim B
fae6a204f1
Fix/update URLs and use HTTP**S** where possible (#7122) 2021-02-26 17:56:04 -08:00
John MacFarlane
f0a991a22b T.P.CSV: fix parsing of unquoted values.
Previously we didn't allow unescaped quotes in unquoted values,
but they are allowed. Closes #7112.
2021-02-22 21:18:04 -08:00
John MacFarlane
d30791a381 Fall back to latin1 if UTF-8 decoding fails...
...when handling URL argument served with no charset in the mime type.
The assumption is that most pages that don't specify a charset
in the mime type are either UTF-8 or latin1.  I think that's a good
assumption, though I'm not sure.
2021-02-22 14:17:22 -08:00
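The fallback in d30791a381 is a try-UTF-8-first strategy. A minimal Python sketch, with the hypothetical name `decode_body`; it applies only to the case the commit describes, where the response carries no charset in its mime type.

```python
def decode_body(raw: bytes) -> str:
    """Decode as UTF-8; if the bytes are not valid UTF-8, fall back to latin1."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # latin1 maps every byte to a code point, so this branch cannot fail.
        return raw.decode("latin-1")


print(decode_body(b"caf\xc3\xa9"), decode_body(b"caf\xe9"))
```

Trying UTF-8 first is the safe order: valid UTF-8 is rarely meaningful latin1, whereas any byte sequence decodes as latin1 without error.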
John MacFarlane
5a73c5d3f8 When downloading content from URL arguments, be sensitive to...
the character encoding.  We can properly handle UTF-8 and
latin1 (ISO-8859-1); for others we raise an error.
See #5600.
2021-02-22 14:01:10 -08:00
John MacFarlane
bafccd5aa2 T.P.Error: Add PandocUnsupportedCharsetError constructor...
...for PandocError.  [API change]
2021-02-22 14:01:04 -08:00
John MacFarlane
4617f229ea Text.Pandoc.MIME: add exported function getCharset.
[API change]
2021-02-22 13:28:47 -08:00
John MacFarlane
80fde18fb1 Text.Pandoc.UTF8: change IO functions to return Text, not String.
[API change] This affects `readFile`, `getContents`, `writeFileWith`,
`writeFile`, `putStrWith`, `putStr`, `putStrLnWith`, `putStrLn`,
`hPutStrWith`, `hPutStr`, `hPutStrLnWith`, `hPutStrLn`, `hGetContents`.

This avoids the need to uselessly create a linked list of characters
when emitting output.
2021-02-22 11:30:07 -08:00
John MacFarlane
2b37ed9f21 LaTeX reader: further optimizations in satisfyTok.
Benchmarks show 2/3 of the run time and 2/3 of the allocation
of the Feb. 10 benchmarks.
2021-02-21 11:30:17 -08:00
John MacFarlane
db4f882315 LaTeX reader: removed sExpanded in state.
This isn't actually needed and checking it doesn't change
anything.

Also remove an unnecessary `doMacros` before `satisfyTok`,
which does it anyway.
2021-02-21 11:24:04 -08:00
John MacFarlane
f43cb5ddcf LaTeX reader: further performance optimization.
Avoid unnecessary 'doMacros'.
2021-02-21 10:58:42 -08:00
John MacFarlane
c0c8865eaa HTML reader: small performance tweak. 2021-02-20 23:40:02 -08:00
John MacFarlane
d8ef383692 T.P.Shared: remove some obsolete functions [API change].
Removed:

- `splitByIndices`
- `splitStringByIndices`
- `substitute`
- `underlineSpan`

None of these are used elsewhere in the code base.
2021-02-20 23:02:10 -08:00
John MacFarlane
321343b2cf HTML reader: small efficiency improvements.
Also, remove exported class NamedTag(..) [API change].
This was just intended to smooth over the transition from String to Text
and is no longer needed.

The functions isInlineTag and isBlockTag are no longer
polymorphic.
2021-02-20 22:49:20 -08:00
John MacFarlane
cec541e54c LaTeX reader: Another small improvement to macro handling. 2021-02-20 22:14:31 -08:00
John MacFarlane
31b8f60ea8 LaTeX reader: avoid macro resolution code if no macros defined. 2021-02-20 22:03:29 -08:00
John MacFarlane
0f955b10b4 T.P.Readers.LaTeX.Parsing: improve braced'.
Remove the parameter, have it parse the opening brace,
and make it more efficient.
2021-02-20 18:57:46 -08:00
John MacFarlane
13847267e9 HTML reader: efficiency improvements.
Do a lookahead to find the right parser to use.

Benchmark time drops from 34ms to 23ms, with less allocation.
Also speeds up the epub reader.
2021-02-20 00:07:38 -08:00
John MacFarlane
98d26c2345 DocBook, JATS, OPML readers: performance optimization.
With the new XML parser, we can avoid the expensive tree
normalization step we used to do.

This gives a significant speed boost in docbook and JATS
parsing (e.g. 9.7 to 6 ms).
2021-02-18 21:24:31 -08:00
John MacFarlane
ef642e2bbc T.P.XML Improve fromEntities. 2021-02-18 18:11:27 -08:00
John MacFarlane
0f5c56dfb1 T.P.PDF: disable smart when building PDF via LaTeX.
This is to prevent accidental creation of ligatures like
`` ?` `` and `` !` `` (especially in languages with quotations
like German), and similar ligature issues.

See jgm/citeproc#54.
2021-02-18 17:11:53 -08:00
John MacFarlane
53cf8295a4 LaTeX writer: adjust hypertargets to beginnings of paragraphs.
Use `\vadjust pre` so that the hypertarget takes you to the
beginning of the paragraph rather than one line down.

Closes #7078.

This makes a particular difference for links to citations
using `--citeproc` and `link-citations: true`.
2021-02-18 14:34:38 -08:00
John MacFarlane
9e728b40f3 T.P.Shared: cleanup.
Cleaned up some functions and added deprecation pragmas
to functions no longer used in the code base.
2021-02-18 13:12:15 -08:00
Albert Krewinkel
743f7216de Org reader: fix bug in org-ref citation parsing.
The org-ref syntax allows listing multiple citations separated by commas.
This fixes a bug that accepted commas as part of the citation id, so all
citation lists were parsed as one single citation.

Fixes: #7101
2021-02-18 21:59:18 +01:00
John MacFarlane
73add05789 Docx reader: use Map instead of list for Namespaces.
This gives a speedup of about 5-10%.

The reader is now approximately twice as fast as in the last
release.
2021-02-17 09:54:39 -08:00
John MacFarlane
80a1d5c9b6 Revert "Add T.P.XML.Light.Cursor."
This reverts commit d8fc497186.
2021-02-16 19:18:01 -08:00
John MacFarlane
d8fc497186 Add T.P.XML.Light.Cursor. 2021-02-16 18:51:41 -08:00
John MacFarlane
4af378702a Add orig copyright/license info for code derived from xml-light. 2021-02-16 18:44:38 -08:00
John MacFarlane
d7a4996b1e Split up T.P.XML.Light into submodules. 2021-02-16 18:40:06 -08:00
John MacFarlane
967e7f5fb9 Rename Text.Pandoc.XMLParser -> Text.Pandoc.XML.Light...
..and add new definitions isomorphic to xml-light's, but with
Text instead of String.  This allows us to keep most of the code in
existing readers that use xml-light, but avoid lots of unnecessary
allocation.

We also add versions of the functions from xml-light's
Text.XML.Light.Output and Text.XML.Light.Proc that operate
on our modified XML types, and functions that convert
xml-light types to our types (since some of our dependencies,
like texmath, use xml-light).

Update golden tests for docx and pptx.

OOXML test: Use `showContent` instead of `ppContent` in `displayDiff`.

Docx: Do a manual traversal to unwrap sdt and smartTag.
This is faster, and needed to pass the tests.

Benchmarks:

A = prior to 8ca191604d (Feb 8)
B = as of 8ca191604d (Feb 8)
C = this commit

| Reader  | A     | B     | C     |
| ------- | ----- | ----- | ----- |
| docbook | 18 ms | 12 ms | 10 ms |
| opml    | 65 ms | 62 ms | 35 ms |
| jats    | 15 ms | 11 ms |  9 ms |
| docx    | 72 ms | 69 ms | 44 ms |
| odt     | 78 ms | 41 ms | 28 ms |
| epub    | 64 ms | 61 ms | 56 ms |
| fb2     | 14 ms |  5 ms |  4 ms |
2021-02-16 16:55:20 -08:00
Albert Krewinkel
8621ed600a T.P.Error: remove unused variables 2021-02-14 15:49:12 +01:00
John MacFarlane
d84a6041e1 HTML reader: fix bad handling of empty src attribute in iframe.
- If src is empty, we simply skip the iframe.
- If src is invalid or cannot be fetched, we issue a warning
  and skip instead of failing with an error.
- Closes #7099.
2021-02-13 13:08:34 -08:00
John MacFarlane
6e73273916 T.P.Error: export renderError.
Refactor `handleError` to use `renderError`. This allows us
to render error messages without exiting.
2021-02-13 13:08:34 -08:00
Albert Krewinkel
a3beed9db8 Org: support task_lists extension
The task lists extension is now supported by the org reader and writer;
the extension is turned on by default.

Closes: #6336
2021-02-13 13:00:37 -08:00
Albert Krewinkel
2d60a5127c T.P.Shared: export handleTaskListItem. [API change] 2021-02-13 13:00:37 -08:00
John MacFarlane
6323250bad LaTeX reader: remove unnecessary line 2021-02-13 00:22:22 -08:00
John MacFarlane
25b7df7c2a Remove Ext_fenced_code_attributes from allowed commonmark extensions.
This extension was listed as allowed, but it didn't actually
do anything. Use `attributes` for code attributes and more.

Closes #7097.
2021-02-13 00:18:40 -08:00
John MacFarlane
eb0c63b002 Avoid an unnecessary withRaw. 2021-02-12 19:29:48 -08:00
John MacFarlane
d9322629a3 LaTeX reader improvements.
* Rewrote `withRaw` so it doesn't rely on fragile assumptions
  about token positions (which break when macros are expanded).
  This requires the addition of `sEnableWithRaw` and `sRawTokens`
  in `LaTeXState`, and a new combinator `disablingWithRaw` to
  disable collecting of raw tokens in certain contexts.
* Add `parseFromToks` to T.P.Readers.LaTeX.Parsing.
* Fix parsing of single character tokens so it doesn't mess
  up the new raw token collecting.
* These changes slightly increase allocations and have a small
  performance impact, but it's minor.

Closes #7092.
2021-02-12 19:04:14 -08:00
John MacFarlane
390d5e65b2 Use getTimestamp instead of getCurrentTime in writers.
Setting SOURCE_DATE_EPOCH will allow reproducible builds.

Partially addresses #7093.  This does not suffice to fully enable
reproducible builds in EPUB, since a unique id is being generated for each
build.
2021-02-11 14:55:03 -08:00
John MacFarlane
3c4a58bad0 T.P.Class: Add getTimestamp [API change].
This attempts to read the SOURCE_DATE_EPOCH environment variable
and parse a UTC time from it (treating it as a unix date stamp,
see https://reproducible-builds.org/specs/source-date-epoch/).
If the variable is not set or can't be parsed as a unix date
stamp, then the function returns the current date.
2021-02-11 14:54:28 -08:00
John MacFarlane
acc9afaf6f Correctly parse "raw" date value in markdown references metadata.
See jgm/citeproc#53.
2021-02-11 09:16:25 -08:00
John MacFarlane
8ca191604d Add new unexported module T.P.XMLParser.
This exports functions that use xml-conduit's parser to
produce an xml-light Element or [Content].  This allows
existing pandoc code to use a better parser without
much modification.

The new parser is used in all places where xml-light's
parser was previously used.  Benchmarks show a significant
performance improvement in parsing XML-based formats
(especially ODT and FB2).

Note that the xml-light types use String, so the
conversion from xml-conduit types involves a lot
of extra allocation.  It would be desirable to
avoid that in the future by gradually switching
to using xml-conduit directly. This can be done
module by module.

The new parser also reports errors, which we report
when possible.

A new constructor PandocXMLError has been added to
PandocError in T.P.Error [API change].

Closes #7091, which was the main stimulus.

These changes revealed the need for some changes
in the tests.  The docbook-reader.docbook test
lacked definitions for the entities it used; these
have been added. And the docx golden tests have been
updated, because the new parser does not preserve
the order of attributes.

Add entity defs to docbook-reader.docbook.

Update golden tests for docx.
2021-02-10 22:04:11 -08:00
John MacFarlane
f70795dc5e ODT reader: finer-grained errors on parse failure.
See #7091.
2021-02-08 09:39:59 -08:00
John MacFarlane
5cd1c1001f ODT reader: give more information if zip can't be unpacked. 2021-02-08 09:39:59 -08:00