Commit graph

1781 commits

Author SHA1 Message Date
John MacFarlane
a0e44b1ff6 LaTeX reader: improve handling of plain TeX macro primitives.
- Fixed semantics for `\let`.
- Implement `\edef`, `\gdef`, and `\xdef`.
- Add comment noting that currently `\def` and `\edef` set global
  macros (so are equivalent to `\gdef` and `\xdef`).  This should be
  fixed by scoping macro definitions to groups, in a future commit.

Closes #7474.
2021-08-11 10:32:52 -07:00
John MacFarlane
06d97131e5 Tests.Helpers: export testGolden and use it in RTF reader.
This gives a diff output on failure.
2021-08-10 22:07:48 -07:00
John MacFarlane
3a924d8f96 HTML reader: treat commments as blank when parsing.
This modifies pBlank.  Previously comments could sometimes
flummox the parser.

Cloes #7482.
2021-08-10 12:50:23 -07:00
John MacFarlane
7ca4233793 Add test for #7488. 2021-08-10 11:11:33 -07:00
John MacFarlane
6543b05116 Add RTF reader.
- `rtf` is now supported as an input format as well as output.
- New module Text.Pandoc.Readers.RTF (exporting `readRTF`). [API change]

Closes #3982.
2021-08-10 10:48:55 -07:00
John MacFarlane
dea1f0f080 RTF writer: emit \outlinelevel for section headings. 2021-08-04 16:37:20 -06:00
Peter Fabinski
8667ba2bcc
LaTeX table writer: Increase column width precision (#7466)
In some cases, the rounding performed by the LaTeX table
writer would introduce visible overrun outside the text
area.
This adds two more decimal places to the width values.
2021-08-03 15:34:39 -06:00
John MacFarlane
f938378d00 RTF writer: omit \bin in \pict.
According to the spec, this is not needed or wanted when
the data is in hexadecimal format, as it is here.
2021-08-01 22:45:41 -06:00
John MacFarlane
ca12e198ba RTF template: specify font family for fixed-width font f1.
According to the spec, this is mandatory.
2021-08-01 09:45:09 -06:00
Jan Tojnar
06408d08e5
DocBook reader: add support for citerefentry (#7437)
Originally intended for referring to UNIX manual pages, either part of the same DocBook document as refentry element, or external – hence the manvolnum element.
These days, refentry is more general, for example the element documentation pages linked below are each a refentry.

As per the *Processing expectations* section of citerefentry, the element is supposed to be a hyperlink to a refentry (when in the same document) but pandoc does not support refentry tag at the moment so that is moot.

https://tdg.docbook.org/tdg/5.1/citerefentry.html
https://tdg.docbook.org/tdg/5.1/manvolnum.html
https://tdg.docbook.org/tdg/5.1/refentry.html

This roughly corresponds to a `manpage` role in rST syntax, which produces a `Code` AST node with attributes `.interpreted-text role=manpage` but that does not fit DocBook parser.

https://www.sphinx-doc.org/en/master/usage/restructuredtext/roles.html#role-manpage
2021-07-11 15:28:52 -07:00
John MacFarlane
ac0a9da6d8 Improved parsing of raw LaTeX from Text streams (rawLaTeXParser).
We now use source positions from the token stream to tell us
how much of the text stream to consume.  Getting this to
work required a few other changes to make token source positions
accurate.

Closes #7434.
2021-07-11 13:50:28 -07:00
John MacFarlane
ae22b1e977 RST reader: fix regression with code includes.
With the recent changes to include infrastructure,
included code blocks were getting an extra newline.

Closes #7436.  Added regression test.
2021-07-09 12:27:41 -07:00
Michael Hoffmann
e56e2b0e0b
Recognize data-external when reading HTML img tags (#7429)
Preserve all attributes in img tags.  If attributes have a `data-`
prefix, it will be stripped.  In particular, this preserves a
`data-external` attribute as an `external` attribute in the pandoc AST.
2021-07-06 16:06:29 -07:00
John MacFarlane
3a31fe68ef Add command test for #7394.
And fix a small bug in handling of citations in notes, which
led to commas at the end of sentences in some cases.
2021-07-05 15:10:14 -07:00
Mauro Bieg
de4da56079 document-css: reset overflow-wrap on code blocks
fixes #7423
2021-07-05 08:57:23 -07:00
John MacFarlane
972db3cdca Revert "LaTeX template: move title, author, date up to top of preamble."
This reverts commit cc088687b4
and PR #7295.

This fixes issues people had when using LaTeX commands defined later
in the preamble (or in some cases UTF-8 text) in the title or author
fields.  Closes #7422.
2021-07-03 15:34:42 -07:00
Aner Lucero
cb038bb312 HTML5 writer, remove aria-hidden when explicit atl text is provided. 2021-07-02 13:02:52 -07:00
John MacFarlane
0948af9cc5 Docx writer: Add table numbering for captioned tables.
The numbers are added using fields, so that Word can
create a list of tables that will update automatically.
2021-06-29 11:15:40 -07:00
John MacFarlane
a3d745e485 Docx writer: support figure numbers.
These are set up in such a way that they will work with Word's
automatic table of figures.

Closes #7392.
2021-06-29 09:56:21 -07:00
John MacFarlane
b7572db224 Use dev version of citeproc.
This eliminates double hyperlinks in author-in-text citations.
Author-only citations are no longer hyperlinked.
See jgm/citeproc#77.
2021-06-29 09:18:49 -07:00
Aner Lucero
f4ef652a41 Remove duplicated alt text in HTML output. 2021-06-29 09:02:13 -07:00
John MacFarlane
851d037b3e Improve punctuation moving with --citeproc.
Previously, using `--citeproc` could cause punctuation to move in
quotes even when there aer no citations. This has been changed;
now, punctuation moving is limited to citations.

In addition, we only move footnotes around punctuation if the
style is a note style, even if `notes-after-punctuation` is `true`.
2021-06-28 22:41:14 -07:00
John MacFarlane
dd098d4e15 Markdown writer: put space between Plain and following fenced Div.
Closes #4465.
2021-06-28 11:33:22 -07:00
John MacFarlane
1b07997f4a Fix regression with comment-only YAML metadata blocks.
Closes #7400.
2021-06-22 09:55:50 -07:00
John MacFarlane
8eed5b90d0 LaTeX writer: add strut at end of minipage if it contains...
line breaks.  Without them, the last line is shorter
than it should be, at least in some cases.
2021-06-21 23:33:00 -07:00
John MacFarlane
2ef2049b4e Update command test for change to LaTeX LineBreak handling. 2021-06-21 22:34:38 -07:00
John MacFarlane
ed3974a254 LaTeX writer: always use a minipage for cells with line breaks...
if width information is available.  Otherwise the way we treat them can
lead to content that overflows a cell.

Closes #7393.
2021-06-21 18:25:36 -07:00
John MacFarlane
a39313eddb Fix test for #7397 2021-06-21 09:30:23 -07:00
John MacFarlane
82ad855f38 Markdown writer: Fix regression in code blocks with attributes.
Code blocks with a single class but nonempty attributes
were having attributes drop as a result of #7242.

Closes #7397.
2021-06-21 08:49:00 -07:00
John MacFarlane
b0cd6c6224 Fix regression in citeproc processing.
If inline references are used (in the metadata `references` field),
we should still only include in the bibliography items that are
actually cited -- unless `nocite` is used.

Closes #7376.
2021-06-12 10:16:44 -07:00
John MacFarlane
21cc52abe3 LaTeX writer: Fix regression in table header position.
In recent versions the table headers were no longer bottom-aligned
(if more than one line).  This patch fixes that by using minipages
for table headers in non-simple tables.

Closes #7347.
2021-06-05 14:13:58 -06:00
Jan Tojnar
af9de925de DocBook writer: Remove non-existent admonitions
attention, error and hint are actually just reStructuredText specific.
danger was too until introduced in DocBook 5.2: https://github.com/docbook/docbook/issues/55
2021-06-05 08:02:21 -06:00
John MacFarlane
2e4ef14d91 Markdown reader: fix pipe table regression in 2.11.4.
Previously pipe tables with empty headers (that is, a header
line with all empty cells) would be rendered as headerless
tables.  This broke in 2.11.4.

The fix here is to produce an AST with an empty table head
when a pipe table has all empty header cells.

Closes #7343.
2021-06-01 21:44:55 -06:00
John MacFarlane
abb59bd582 LaTeX reader: don't allow optional * on symbol control sequences.
Generally we allow optional starred variants of LaTeX commands
(since many allow them, and if we don't accept these explicitly,
ignoring the star usually gives acceptable results).  But we
don't want to do this for `\(*\)` and similar cases.

Closes #7340.
2021-06-01 13:54:51 -06:00
John MacFarlane
62f46b3995 Fix regression with commonmark/gfm yaml metdata block parsing.
A regression in 2.14 led to the document body being omitted
after YAML metadata in some cases.  This is now fixed.

Closes #7339.
2021-05-31 21:34:51 -06:00
John MacFarlane
cc206af392 Have LoadedResource use relative paths.
The immediate reason for this is to allow the test output of #3752
to work on both windows and linux.
2021-05-30 10:23:00 -07:00
John MacFarlane
c210b98366 Fix test #3752 (1) for Windows. 2021-05-29 14:36:49 -07:00
John MacFarlane
5772f7f943 Further test image size reductions. 2021-05-29 12:27:59 -07:00
John MacFarlane
7aade73dce Replace biblatex-exmaples.bib with shorter averroes.bib in tests. 2021-05-29 12:14:37 -07:00
John MacFarlane
e86f6abc45 Further test image size reductions. 2021-05-29 12:09:21 -07:00
John MacFarlane
3ba9ef01eb Reduce size of image in fb2 image test. 2021-05-29 11:54:03 -07:00
John MacFarlane
5cf887db20 Reduce size of cover image in test epub. 2021-05-29 11:48:52 -07:00
John MacFarlane
8660f42f09 Modify pptx tests to take a whole lot less space.
- Replace a 300K image in the reference pptx with a 2K one.
- Updated all the *_templated.pptx files based on the new
  reference pptx.
- These changes should reduce the size of the tarball by
  roughly 7 MB!

See haskell/hackage-server#935
2021-05-29 10:59:14 -07:00
John MacFarlane
b6b2331fdc Support rebase_relative_paths for commonmark based formats.
(Including `gfm`.)
2021-05-28 13:58:44 -07:00
Emily Bourke
56b211120c
Docx reader: Support new table features.
* Column spans
* Row spans
  - The spec says that if the `val` attribute is ommitted, its value
    should be assumed to be `continue`, and that its values are
    restricted to {`restart`, `continue`}. If the value has any other
    value, I think it seems reasonable to default it to `continue`. It
    might cause problems if the spec is extended in the future by adding
    a third possible value, in which case this would probably give
    incorrect behaviour, and wouldn't error.
* Allow multiple header rows
* Include table description in simple caption
  - The table description element is like alt text for a table (along
    with the table caption element). It seems like we should include
    this somewhere, but I’m not 100% sure how – I’m pairing it with the
    simple caption for the moment. (Should it maybe go in the block
    caption instead?)
* Detect table captions
  - Check for caption paragraph style /and/ either the simple or
    complex table field. This means the caption detection fails for
    captions which don’t contain a field, as in an example doc I added
    as a test. However, I think it’s better to be too conservative: a
    missed table caption will still show up as a paragraph next to the
    table, whereas if I incorrectly classify something else as a table
    caption it could cause havoc by pairing it up with a table it’s
    not at all related to, or dropping it entirely.
* Update tests and add new ones

Partially fixes: #6316
2021-05-28 20:15:23 +02:00
Emily Bourke
44484d0dee
Docx reader: Read table column widths. 2021-05-28 20:15:23 +02:00
John MacFarlane
4842c5fb82 Two citeproc locator/suffix improvements:
- Recognize locators spelled with a capital letter.
  Closes #7323.
- Add a comma and a space in front of the suffix if it doesn't start
  with space or punctuation.  Closes #7324.
2021-05-27 18:28:52 -07:00
John MacFarlane
4b16d181e7 rebase_relative_paths: leave empty paths unchanged. 2021-05-27 14:16:37 -07:00
John MacFarlane
0661ce699f rebase_relative_paths extension: don't change fragment paths.
We don't want a pure fragment path to be rewritten, since
these are used for cross-referencing.
2021-05-27 13:53:26 -07:00
John MacFarlane
6972a7dc91 Modify rebase_reference_links treatment of reference links/images.
The directory is based on the file containing the link
reference, not the file containing the link, if these differ.
2021-05-27 11:26:38 -07:00
John MacFarlane
cbe16b2866 Citeproc: Don't detect math elements as locators.
Closes #7321.
2021-05-27 10:49:45 -07:00
John MacFarlane
834da53058 Add rebase_relative_paths extension.
- Add manual entry for (non-default) extension
  `rebase_relative_paths`.
- Add constructor `Ext_rebase_relative_paths` to `Extensions`
  in Text.Pandoc.Extensions [API change]. When enabled, this
  extension rewrites relative image and link paths by prepending
  the (relative) directory of the containing file.
- Make Markdown reader sensitive to the new extension.
- Add tests for #3752.

Closes #3752.

NB. currently the extension applies to markdown and associated
readers but not commonmark/gfm.
2021-05-27 10:38:25 -07:00
John MacFarlane
81eadfd99a LaTeX reader: improve \def and implement \newif.
- Improve parsing of `\def` macros.  We previously set "verbatim mode"
  even for parsing the initial `\def`; this caused problems for things
  like
  ```
  \def\foo{\def\bar{BAR}}
  \foo
  \bar
  ```
- Implement `\newif`.
- Add tests.
2021-05-27 09:15:04 -07:00
John MacFarlane
e0a1f7d2cf Command tests: fail if a file contains no tests.
And fix a test that failed in that way!
2021-05-26 09:52:23 -07:00
John MacFarlane
6804f47383 Fix a command test so it writes to stdout not stderr.
The error message to stderr was appearing in test output
and confusing some users, who thought it indicated a failing
test rather than expected output.
2021-05-25 21:41:40 -07:00
John MacFarlane
8d5014fdfc Logging: remove single quotes around paths in messages.
We weren't doing it consistently and it seems unnecessary.
2021-05-25 11:53:49 -07:00
Albert Krewinkel
d46ea7d7da
Jira: add support for "smart" links
Support has been added for the new
`[alias|https://example.com|smart-card]` syntax.
2021-05-25 16:54:42 +02:00
John MacFarlane
8511f6fdf6 MediaBag improvements.
In the current dev version, we will sometimes add
a version of an image with a hashed name, keeping
the original version with the original name, which
would leave to undesirable duplication.

This change separates the media's filename from the
media's canonical name (which is the path of the link
in the document itself).  Filenames are based on SHA1
hashes and assigned automatically.

In Text.Pandoc.MediaBag:

- Export MediaItem type [API change].
- Change MediaBag type to a map from Text to MediaItem [API change].
- `lookupMedia` now returns a `MediaItem` [API change].
- Change `insertMedia` so it sets the `mediaPath` to
  a filename based on the SHA1 hash of the contents.
  This will be used when contents are extracted.

In Text.Pandoc.Class.PandocMonad:

- Remove `fetchMediaResource` [API change].

Lua MediaBag module has been changed minimally. In the future
it would be better, probably, to give Lua access to the full
MediaItem type.
2021-05-24 09:20:44 -07:00
Albert Krewinkel
58fbf56548
Jira writer: use {color} when span has a color attribute
Closes: tarleb/jira-wiki-markup#10
2021-05-24 09:56:02 +02:00
John MacFarlane
1af2cfb287 Handle relative lengths (e.g. 2*) in HTML column widths.
See <https://www.w3.org/TR/html4/types.html#h-6.6>.

"A relative length has the form "i*", where "i" is an integer. When
allotting space among elements competing for that space, user agents
allot pixel and percentage lengths first, then divide up remaining
available space among relative lengths. Each relative length receives a
portion of the available space that is proportional to the integer
preceding the "*". The value "*" is equivalent to "1*". Thus, if 60
pixels of space are available after the user agent allots pixel and
percentage space, and the competing relative lengths are 1*, 2*, and 3*,
the 1* will be alloted 10 pixels, the 2* will be alloted 20 pixels, and
the 3* will be alloted 30 pixels."

Closes #4063.
2021-05-22 22:03:54 -07:00
John MacFarlane
07d299d353 DocBook reader: ensure that first and last names are separated.
Closes #6541.
2021-05-20 18:45:39 -07:00
John MacFarlane
d7b5def287 Ms writer: handle tables with multiple paragraphs.
Previously they overflowed the table cell width.
We now set line lengths per-cell and restore them
after the table has been written.

Closes #7288.
2021-05-20 17:12:38 -07:00
John MacFarlane
bb11f5fb86 LaTeX reader: More siunitx improvements. Closes #6658.
There's still one slight divergence from the siunitx behavior:
we get 'kg m/A/s' instead of 'kg m/(A s)'. At the moment I'm
not going to worry about that.
2021-05-20 15:30:31 -07:00
John MacFarlane
4e990a8cf9 LaTeX/siunitx: fix parsing of \cubic etc. See #6658. 2021-05-20 10:13:20 -07:00
John MacFarlane
bc5058234f LaTeX reader sinuitx: fix + sign on ang. 2021-05-20 10:13:20 -07:00
John MacFarlane
5dc917da3e LaTeX reader siunitx: add leading 0 to numbers starting with . 2021-05-20 10:13:20 -07:00
Denis Maier
183ce58477
ConTeXt reader: improve ordered lists (#7304)
Closes #5016 

- change ordered list from itemize to enumerate
- adds new itemgroup for ordered lists
- add fontfeature for table figures
- remove width from itemize in context writer
2021-05-20 09:59:53 -07:00
John MacFarlane
a366bd6abc LaTeX reader: Fix parsing of +- in siunitx numbers.
See #6658.
2021-05-20 09:03:29 -07:00
John MacFarlane
8437a4a002 LaTeX reader: support \pm in SI{..}.
Closes #6620.
2021-05-20 08:16:46 -07:00
Albert Krewinkel
b6239f4150
ZimWiki writer: allow links and emphasis in headers
The latest version of ZimWiki supports this.

Closes: #6605
2021-05-20 12:48:05 +02:00
John MacFarlane
5736b331d8 LaTeX reader: better support for \xspace.
Previously we only supported it in inline contexts; now
we support it in all contexts, including math.

Partially addresses #7299.
2021-05-19 16:14:49 -07:00
Albert Krewinkel
eb3dff148e
LaTeX writer: separate successive quote chars with thin space
Successive quote characters are separated with a thin space to improve
readability and to prevent unwanted ligatures. Detection of these quotes
sometimes had failed if the second quote was nested in a span element.

Closes: #6958
2021-05-18 22:55:47 +02:00
Albert Krewinkel
1843a8793a
HTML writer: keep attributes from code nested below pre tag.
If a code block is defined with `<pre><code
class="language-x">…</code></pre>`, where the `<pre>` element has no
attributes, then the attributes from the `<code>` element are used
instead. Any leading `language-` prefix is dropped in the code's *class*
attribute are dropped to improve syntax highlighting.

Closes: #7221
2021-05-17 18:08:02 +02:00
Albert Krewinkel
25f5b92777
HTML writer: ensure headings only have valid attribs in HTML4
Fixes: #5944
2021-05-17 15:42:15 +02:00
Albert Krewinkel
4417dacc44
ConTeXt writer: use span identifiers as reference anchors.
Closes: #7246
2021-05-17 13:14:32 +02:00
Albert Krewinkel
d3ca48656f
ConTeXt writer tests: keep code lines below 80 chars. 2021-05-17 13:11:33 +02:00
John MacFarlane
cc088687b4 LaTeX template: move title, author, date up to top of preamble.
This allows header-includes to use them, and puts them
in a position where you can see them immediately.
Closes #7295.
2021-05-16 14:35:13 -07:00
John MacFarlane
5a6399d9f6 Markdown writer: fewer unneeded escapes for #.
See #6259.
2021-05-16 12:23:34 -07:00
John MacFarlane
0a4c6925b6 Docx writer: copy over more settings from referenc.odcx.
From settings.xml in the reference-doc, we now include:
`zoom`, `embedSystemFonts`, `doNotTrackMoves`, `defaultTabStop`,
`drawingGridHorizontalSpacing`, `drawingGridVerticalSpacing`,
`displayHorizontalDrawingGridEvery`, `displayVerticalDrawingGridEvery`,
`characterSpacingControl`, `savePreviewPicture`, `mathPr`, `themeFontLang`,
`decimalSymbol`, `listSeparator`, `autoHyphenation`, `compat`.

Closes #7240.
2021-05-15 15:40:49 -07:00
John MacFarlane
2cf971cf56 docx writer: Remove rsids from settings.docx.
Word will add these when revisions are made.  But it's
pointless to start out with a set of them.
2021-05-15 10:54:05 -07:00
Albert Krewinkel
0794862aac
HTML writer: parse <header> as a Div
HTML5 `<header>` elements are treated like `<div>` elements.
2021-05-15 16:46:02 +02:00
Albert Krewinkel
013e4a3164
HTML reader: keep h1 tags as normal headers (#7274)
The tags `<title>` and `<h1 class="title">` often contain the same
information, so the latter was dropped from the document. However, as
this can lead to loss of information, the heading is now always
retained.

Use `--shift-heading-level-by=-1` to turn the `<h1>` into the document
title, or a filter to restore the previous behavior.

Closes: #2293
2021-05-14 12:31:24 -07:00
John MacFarlane
76a4e7127b Beamer writer: support exampleblock and alertblock.
A block will be rendered as an exampleblock if the heading
has class `example` and alertblock if it has class `alert`.

Closes #7278.
2021-05-14 10:09:46 -07:00
Albert Krewinkel
17d96404f5
Docx writer: allow multirow table headers 2021-05-14 16:19:20 +02:00
Albert Krewinkel
875f8f3654
HTML reader: don't fail on unmatched closing "script" tag.
Prevent the reader from crashing if the HTML input contains an unmatched
closing `</script>` tag.

Fixes: #7282
2021-05-14 12:13:40 +02:00
John MacFarlane
3f09f53459 Implement curly-brace syntax for Markdown citation keys.
The change provides a way to use citation keys that contain
special characters not usable with the standard citation
key syntax.  Example: `@{foo_bar{x}'}` for the key `foo_bar{x}`.
Closes #6026.

The change requires adding a new parameter to the `citeKey`
parser from Text.Pandoc.Parsing [API change].

Markdown reader: recognize @{..} syntax for citatinos.

Markdown writer:  use @{..} syntax for citations when needed.

Update manual with curly-brace syntax for citations.

Closes #6026.
2021-05-13 21:59:32 -07:00
John MacFarlane
0217ae2a4f Hande 'annote' field in bibtex/biblatex writer.
Closes #7266.
2021-05-12 11:05:55 -07:00
John MacFarlane
5eb7ad7d1e Improve integration of settings from reference.docx.
The settings we can carry over from a reference.docx are
autoHyphenation, consecutiveHyphenLimit, hyphenationZone,
doNotHyphenateCap, evenAndOddHeaders, and proofState.

Previously this was implemented in a buggy way, so that the
reference doc's values AND the new values were included.

This change allows users to create a reference.docx that
sets w:proofState for spelling or grammar to "dirty,"
so that spell/grammar checking will be triggered on the
generated docx.

Closes #1209.
2021-05-11 22:31:38 -06:00
John MacFarlane
2bd5d0cafb LaTeX writer: better handling of line breaks in simple tables.
Now we also handle the case where they're embedded in other
elements, e.g. spans. Closes #7272.
2021-05-11 07:52:05 -06:00
John MacFarlane
6e45607f99 Change reader types, allowing better tracking of source positions.
Previously, when multiple file arguments were provided, pandoc
simply concatenated them and passed the contents to the readers,
which took a Text argument.

As a result, the readers had no way of knowing which file
was the source of any particular bit of text.  This meant that
we couldn't report accurate source positions on errors or
include accurate source positions as attributes in the AST.
More seriously, it meant that we couldn't resolve resource
paths relative to the files containing them
(see e.g. #5501, #6632, #6384, #3752).

Add Text.Pandoc.Sources (exported module), with a `Sources` type
and a `ToSources` class.  A `Sources` wraps a list of `(SourcePos,
Text)` pairs. [API change] A parsec `Stream` instance is provided for
`Sources`.  The module also exports versions of parsec's `satisfy` and
other Char parsers that track source positions accurately from a
`Sources` stream (or any instance of the new `UpdateSourcePos` class).

Text.Pandoc.Parsing now exports these modified Char parsers instead of
the ones parsec provides.  Modified parsers to use a `Sources` as stream
[API change].

The readers that previously took a `Text` argument have been
modified to take any instance of `ToSources`. So, they may still
be used with a `Text`, but they can also be used with a `Sources`
object.

In Text.Pandoc.Error, modified the constructor PandocParsecError
to take a `Sources` rather than a `Text` as first argument,
so parse error locations can be accurately reported.

T.P.Error: showPos, do not print "-" as source name.
2021-05-09 19:11:34 -06:00
Albert Krewinkel
8357b835d9
App: allow tabs expansion even if file-scope is used
Tabs in plain-text inputs are now handled correctly, even if the
`--file-scope` flag is used.

Closes: #6709
2021-05-05 19:09:21 +02:00
Albert Krewinkel
ddbf83f62c
Docx writer: support colspans and rowspans in tables
See: #6315
2021-05-01 18:52:24 +02:00
mbrackeantidot
b6a65445e1
Docx reader: add handling of vml image objects (jgm#4735) (#7257)
They represent images, the same way as other images in vml format.
2021-04-29 09:11:44 -07:00
John MacFarlane
d14c5f94df Further improvements in smart quotes.
Improves heuristic for detection of an "open double quote."
Closes #2103.
2021-04-29 08:48:49 -07:00
John MacFarlane
80e2e88287 Smarter smart quotes.
Treat a leading " with no closing " as a left curly quote.
This supports the practice, in fiction, of continuing
paragraphs quoting the same speaker without an end quote.
It also helps with quotes that break over lines in line
blocks.

Closes #7216.
2021-04-28 23:32:37 -07:00
Albert Krewinkel
85f379e474
JATS writer: use either styled-content or named-content for spans.
If the element has a content-type attribute, or at least one class, then
that value is used as `content-type` and the span is put inside a
`<named-content>` element. Otherwise a `<styled-content>` element is
used instead.

Closes: #7211
2021-04-28 22:21:34 +02:00
Albert Krewinkel
0921b82d98
Docx writer: autoset table width if no column has an explicit width. 2021-04-27 13:27:20 +02:00
Jan Tojnar
e9c0f9f97b
Markdown writer: Cleaner (code)blocks with single class (#7242)
When a block only has a single class and no other attributes,
it is not necessary to wrap the class attribute in curly braces –
the class name can be placed after the opening mark as is.

This will result in bit cleaner output when pandoc is used
as a markdown pretty-printer.
2021-04-25 10:36:06 -07:00
John MacFarlane
547bc2cdf8 Add quotes properly in markdown YAML metadata fields.
This fixes a bug, which caused the writer to look at the LAST
rather than the FIRST character in determining whether quotes
were needed.  So we got spurious quotes in some cases and
didn't get necessary quotes in others.

Closes #7245.  Updated a number of test cases accordingly.
2021-04-25 10:31:33 -07:00
John MacFarlane
7f4850c9de Remove biblatex-nussbaum.md test.
It is basically the same as biblaetx-quotes.md.
2021-04-25 10:29:03 -07:00
John MacFarlane
73d394ca2a Use MetaInlines not MetaBlocks for multimarkdown metadata fields.
This gives better results in converting to e.g. pandoc markdown.

Ref: <https://groups.google.com/d/msgid/pandoc-discuss/9728d1f4-040e-4392-aa04-148f648a8dfdn%40googlegroups.com>
2021-04-18 22:01:12 -07:00
John MacFarlane
a478a5c4c8 Update to released unicode-collation, latest citeproc dev version.
Update citeproc test.
2021-04-17 16:15:14 -07:00
John MacFarlane
099ac9985b Use BCP47 language codes in citeproc tests. 2021-04-17 16:15:14 -07:00
John MacFarlane
ff5a504809 Use new citeproc + unicode-collation.
Add command test for unicode-collation.
2021-04-17 16:15:13 -07:00
Albert Krewinkel
5f79a66ed6
JATS writer: reduce unnecessary use of <p> elements for wrapping
The `<p>` element is used for wrapping in cases were the contents would
otherwise not be allowed in a certain context. Unnecessary wrapping is
avoided, especially around quotes (`<disp-quote>` elements).

Closes: #7227
2021-04-16 22:47:37 +02:00
Albert Krewinkel
2d60524de4
JATS writer: convert spans to <named-content> elements
Spans with attributes are converted to `<named-content>` elements
instead of being wrapped with `<milestone-start/>` and `<milestone-end>`
elements. Milestone elements are not allowed in documents using the
articleauthoring tag set, so this change ensures the creation of valid
documents.

Closes: #7211
2021-04-10 11:49:18 +02:00
Albert Krewinkel
051b7ffeaf
JATS writer: add footnote number as label in backmatter
Footnotes in the backmatter are given the footnote's number as a label.
The articleauthoring output is unaffected from this change, as footnotes
are placed inline there.

Closes: #7210
2021-04-10 10:57:06 +02:00
John MacFarlane
20cd33e5a4 Fix regression in grid tables for wide characters.
In the translation from String to Text, a char-width-sensitive
splitAt' was dropped.  This commit reinstates it.
Closes #7214.
2021-04-08 14:48:29 -07:00
John MacFarlane
60974538b2 Commonmark writer: Use backslash escapes for < and |...
instead of entities.  Closes #7208.
2021-04-05 23:29:22 -07:00
Albert Krewinkel
038261ea52
JATS writer: escape disallows chars in identifiers
XML identifiers must start with an underscore or letter, and can contain
only a limited set of punctuation characters. Any IDs not adhering to
these rules are rewritten by writing the offending characters as Uxxxx,
where `xxxx` is the character's hex code.
2021-04-05 21:55:54 +02:00
tecosaur
4371223d13
Org writer: Use LaTeX style maths deliminators (#7196)
Org works better with LaTeX-style delimiters.
2021-04-01 23:36:02 +02:00
niszet
40da6c402b
Treat tabs as spaces in ODT Reader. (#7185) 2021-03-31 16:44:34 -07:00
John MacFarlane
56ce1fc126 Fix DocBook reader mathml regression...
...caused by the switch in XML libraries.
Also fixed a similar issue in JATS.
Closes #7173.
2021-03-24 12:04:33 -07:00
Erik Rask
82e8c29cb0 Include Header.Attr.attributes as XML attributes on section
Add key-value pairs found in the attributes list of Header.Attr as
XML attributes on the corresponding section element.

Any key name not allowed as an XML attribute name is dropped, as
are keys with invalid values where they are defined as enums in
DocBook, and xml:id (for DocBook 5)/id (for DocBook 4) to not
intervene with computed identifiers.
2021-03-20 21:29:17 +01:00
John MacFarlane
ceadf33246 Tests: Use getExecutablePath from base...
avoiding the need to depend on the executable-path package.
2021-03-19 23:35:47 -07:00
John MacFarlane
dc94601eb5 Tests: factor out setupEnvironment in Test.Helpers.
This avoids code duplication between Command and Old.
2021-03-19 21:17:13 -07:00
John MacFarlane
2ca1b20a85 Fix finding of data files from test programs.
Apparently Cabal sets a `pandoc_datadir` environment variable
so that the data files will be sought in the source directory
rather than in the final destination (where they aren't yet
installed).

So we no longer need to set `--data-dir` in the tests. We just
need to make sure `pandoc_datadir` is set in the environment
when we call the program in the test suite.

This will fix the issue with loading of pandoc.lua when
pandoc is built with `-embed_data_files`, reported in #7163.

Closes #7163.
2021-03-19 18:57:13 -07:00
John MacFarlane
c3f9e8c122 Docx writer: make nsid in abstractNum deterministic.
Previously we assigned a random number (though in a deterministic
way).  But changes in the random package mean we get different
results now on different architectures, even with the same random
seed. We don't need random values; so now we just assign a value
based on the list number id, which is guaranteed to be unique
to the list marker.
2021-03-17 22:31:20 -07:00
John MacFarlane
e66bf891ec Add test for #7155. 2021-03-17 09:10:37 -07:00
John MacFarlane
63a6059790 Update tests for new texmath. 2021-03-15 18:22:38 -07:00
John MacFarlane
35b66a7671 MediaWiki reader: Allow block-level content in notes (ref).
Closes #7145.
2021-03-13 12:50:44 -08:00
John MacFarlane
eed18d231c Use integral values for w:tblW in docx.
Cloess #7141.
2021-03-13 12:05:52 -08:00
Albert Krewinkel
f8b49e77f8
Use jira-wiki-markup 1.3.4
Jira reader:

* Fixed parsing of autolinks (i.e., of bare URLs in the text).
  Previously an autolink would take up the rest of a line, as spaces
  were allowed characters in these items.

* Emoji character sequences no longer cause parsing failures. This was
  due to missing backtracking when emoji parsing fails.

Jira writer:

* Block quotes are only rendered as `bq.` if they do not contain a
  linebreak.
2021-03-13 14:53:58 +01:00
Albert Krewinkel
00e8d0678e
Jira reader: mark divs created from panels with class "panel".
Closes: tarleb/jira-wiki-markup#2
2021-03-13 14:29:47 +01:00
Albert Krewinkel
a8aa301428
Jira writer: improve div/panel handling
Include div attributes in panels, always render divs with class `panel`
as panels, and avoid nesting of panels.
2021-03-13 12:10:02 +01:00
John MacFarlane
5608dc01e5 HTML writer: Add warnings on duplicate attribute values.
This prevents emitting invalid HTML.

Ultimately it would be good to prevent this in the types
themselves, but this is better for now.

T.P.Logging: Add DuplicateAttribute constructor to LogMessage.
[API change]
2021-03-10 10:19:40 -08:00
John MacFarlane
1c23e3a824 RST reader: fix logic for ending comments.
Previously comments sometimes got extended too far.  Closes #7134.
2021-03-09 13:03:27 -08:00
Albert Krewinkel
b9b2586ed3
Org writer: prevent unintended creation of ordered list items
Adjust line wrapping if default wrapping would cause a line to be read
as an ordered list item.

Fixes #7132
2021-03-09 18:14:54 +01:00
Albert Krewinkel
eb184d9148
Jira writer: use noformat instead of code for unknown languages.
Code blocks that are not marked as a language supported by Jira are
rendered as preformatted text with `{noformat}` blocks.

Fixes: tarleb/jira-wiki-markup#4
2021-03-08 12:50:35 +01:00
John MacFarlane
5aa73bd0a2 LaTeX reader: handle table cells containing & in \verb.
Closes #7129.
2021-03-07 15:49:02 -08:00
Albert Krewinkel
e1454fe0d0
Jira writer: use Span identifiers as anchors
Closes: tarleb/jira-wiki-markup#3.
2021-03-01 14:36:11 +01:00
John MacFarlane
12b47656d4 Remove superfluous imports. 2021-02-28 22:57:36 -08:00
John MacFarlane
7e38b8e55a T.P.Readers.LaTeX: Don't export tokenize, untokenize.
[API change]

These were only exported for testing, which seems the
wrong thing to do.  They don't belong in the public
API and are not really usable as they are, without access
to the Tok type which is not exported.

Removed the tokenize/untokenize roundtrip test.

We put a quickcheck property in the comments which
may be used when this code is touched (if it is).
2021-02-28 22:53:42 -08:00
John MacFarlane
a9cc5d2616 Update tests for changes to https URLs. 2021-02-26 18:00:45 -08:00
Salim B
fae6a204f1
Fix/update URLs and use HTTP**S** where possible (#7122) 2021-02-26 17:56:04 -08:00
John MacFarlane
f0a991a22b T.P.CSV: fix parsing of unquoted values.
Previously we didn't allow unescaped quotes in unquoted values,
but they are allowed. Closes #7112.
2021-02-22 21:18:04 -08:00
Albert Krewinkel
00e4bb51e4
tests: print accurate location if a test fails
Ensures that tasty-hunit reports the location of the failing test
instead of the location of the helper `test` function.
2021-02-22 23:56:04 +01:00
John MacFarlane
80fde18fb1 Text.Pandoc.UTF8: change IO functions to return Text, not String.
[API change] This affects `readFile`, `getContents`, `writeFileWith`,
`writeFile`, `putStrWith`, `putStr`, `putStrLnWith`, `putStrLn`.
`hPutStrWith`, `hPutStr`, `hPutStrLnWith`, `hPutStrLn`, `hGetContents`.

This avoids the need to uselessly create a linked list of characters
when emiting output.
2021-02-22 11:30:07 -08:00
John MacFarlane
005344fb18 Revert "LaTeX template: disable ` ? ` and ! `` ligatures."
This reverts commit 24d7cd539b.
2021-02-18 17:03:11 -08:00
John MacFarlane
24d7cd539b LaTeX template: disable ` ? ` and ! `` ligatures.
These are often triggered by accident in languagegs that
use ` `` ` for end quote (e.g. German).

See jgm/citeproc#54.
2021-02-18 15:48:40 -08:00
Albert Krewinkel
743f7216de
Org reader: fix bug in org-ref citation parsing.
The org-ref syntax allows to list multiple citations separated by comma.
This fixes a bug that accepted commas as part of the citation id, so all
citation lists were parsed as one single citation.

Fixes: #7101
2021-02-18 21:59:18 +01:00
John MacFarlane
967e7f5fb9 Rename Text.Pandoc.XMLParser -> Text.Pandoc.XML.Light...
..and add new definitions isomorphic to xml-light's, but with
Text instead of String.  This allows us to keep most of the code in
existing readers that use xml-light, but avoid lots of unnecessary
allocation.

We also add versions of the functions from xml-light's
Text.XML.Light.Output and Text.XML.Light.Proc that operate
on our modified XML types, and functions that convert
xml-light types to our types (since some of our dependencies,
like texmath, use xml-light).

Update golden tests for docx and pptx.

OOXML test: Use `showContent` instead of `ppContent` in `displayDiff`.

Docx: Do a manual traversal to unwrap sdt and smartTag.
This is faster, and needed to pass the tests.

Benchmarks:

A = prior to 8ca191604d (Feb 8)
B = as of 8ca191604d (Feb 8)
C = this commit

| Reader  |  A    | B      | C     |
| ------- | ----- | ------ | ----- |
| docbook | 18 ms | 12 ms  | 10 ms |
| opml    | 65 ms | 62 ms  | 35 ms |
| jats    | 15 ms | 11 ms  |  9 ms |
| docx    | 72 ms | 69 ms  | 44 ms |
| odt     | 78 ms | 41 ms  | 28 ms |
| epub    | 64 ms | 61 ms  | 56 ms |
| fb2     | 14 ms | 5  ms  | 4 ms  |
2021-02-16 16:55:20 -08:00
Albert Krewinkel
b5b576184c
JATS writer: add date-type to pub-date elements 2021-02-15 13:15:14 +01:00
Albert Krewinkel
2c99e0e358
JATS writer: replace attribute "pub-type" with "publication-format".
The former attribute is deprecated.
2021-02-15 13:15:14 +01:00
John MacFarlane
d84a6041e1 HTML reader: fix bad handling of empty src attribute in iframe.
- If src is empty, we simply skip the iframe.
- If src is invalid or cannot be fetched, we issue a warning
  and skip instead of failing with an error.
- Closes #7099.
2021-02-13 13:08:34 -08:00
John MacFarlane
6e73273916 T.P.Error: export renderError.
Refactor `handleError` to use `renderError`. This allows us
render error messages without exiting.
2021-02-13 13:08:34 -08:00
Albert Krewinkel
a3beed9db8 Org: support task_lists extension
The tasks lists extension is now supported by the org reader and writer;
the extension is turned on by default.

Closes: #6336
2021-02-13 13:00:37 -08:00
John MacFarlane
3be066b7d3 Fix command test 5686 2021-02-12 19:04:14 -08:00
John MacFarlane
59875185b3 Add command test for #7092 2021-02-12 19:04:14 -08:00
Albert Krewinkel
8ffd4159d6
Jira: require jira-wiki-markup 1.3.3
* Modified the Doc parser to skip leading blank lines. This fixes
  parsing of documents which start with multiple blank lines.
  (#7095)

* Prevent URLs within link aliases to be treated as autolinks.
  (#6944)

Fixes: #7095
Fixes: #6944
2021-02-12 17:15:12 +01:00