Commit graph

193 commits

Author SHA1 Message Date
Emily Bourke
8474d488a5 Provide more detailed XML diff in tests
I had some failing tests and couldn’t tell what was different in the
XML. Updating the comparison to return what’s different made it easier
to figure out what was wrong, and I think will be helpful for others in
future.
2021-08-17 09:35:25 -07:00
John MacFarlane
4340bd52c4 Make docx writer sensitive to native_numbering extension.
Figure and table numbers are now only included if `native_numbering`
is enabled.  (By default it is disabled.)  This is a behavior change
with respect to 2.14.1, but the behavior is that of previous versions.

The change was necessary to avoid incompatibilities between pandoc's
native numbering and third-party cross reference filters like
pandoc-crossref.

Closes #7499.
2021-08-15 15:05:54 -07:00
John MacFarlane
0948af9cc5 Docx writer: Add table numbering for captioned tables.
The numbers are added using fields, so that Word can
create a list of tables that will update automatically.
2021-06-29 11:15:40 -07:00
Jan Tojnar
af9de925de DocBook writer: Remove non-existent admonitions
attention, error and hint are actually just reStructuredText specific.
danger was too until introduced in DocBook 5.2: https://github.com/docbook/docbook/issues/55
2021-06-05 08:02:21 -06:00
Emily Bourke
44484d0dee
Docx reader: Read table column widths. 2021-05-28 20:15:23 +02:00
Albert Krewinkel
d46ea7d7da
Jira: add support for "smart" links
Support has been added for the new
`[alias|https://example.com|smart-card]` syntax.
2021-05-25 16:54:42 +02:00
Albert Krewinkel
58fbf56548
Jira writer: use {color} when span has a color attribute
Closes: tarleb/jira-wiki-markup#10
2021-05-24 09:56:02 +02:00
Albert Krewinkel
25f5b92777
HTML writer: ensure headings only have valid attribs in HTML4
Fixes: #5944
2021-05-17 15:42:15 +02:00
Albert Krewinkel
4417dacc44
ConTeXt writer: use span identifiers as reference anchors.
Closes: #7246
2021-05-17 13:14:32 +02:00
Albert Krewinkel
d3ca48656f
ConTeXt writer tests: keep code lines below 80 chars. 2021-05-17 13:11:33 +02:00
Albert Krewinkel
85f379e474
JATS writer: use either styled-content or named-content for spans.
If the element has a content-type attribute, or at least one class, then
that value is used as `content-type` and the span is put inside a
`<named-content>` element. Otherwise a `<styled-content>` element is
used instead.

Closes: #7211
2021-04-28 22:21:34 +02:00
Albert Krewinkel
2d60524de4
JATS writer: convert spans to <named-content> elements
Spans with attributes are converted to `<named-content>` elements
instead of being wrapped with `<milestone-start/>` and `<milestone-end>`
elements. Milestone elements are not allowed in documents using the
articleauthoring tag set, so this change ensures the creation of valid
documents.

Closes: #7211
2021-04-10 11:49:18 +02:00
Albert Krewinkel
038261ea52
JATS writer: escape disallows chars in identifiers
XML identifiers must start with an underscore or letter, and can contain
only a limited set of punctuation characters. Any IDs not adhering to
these rules are rewritten by writing the offending characters as Uxxxx,
where `xxxx` is the character's hex code.
2021-04-05 21:55:54 +02:00
Erik Rask
82e8c29cb0 Include Header.Attr.attributes as XML attributes on section
Add key-value pairs found in the attributes list of Header.Attr as
XML attributes on the corresponding section element.

Any key name not allowed as an XML attribute name is dropped, as
are keys with invalid values where they are defined as enums in
DocBook, and xml:id (for DocBook 5)/id (for DocBook 4) to not
intervene with computed identifiers.
2021-03-20 21:29:17 +01:00
Albert Krewinkel
a8aa301428
Jira writer: improve div/panel handling
Include div attributes in panels, always render divs with class `panel`
as panels, and avoid nesting of panels.
2021-03-13 12:10:02 +01:00
Albert Krewinkel
eb184d9148
Jira writer: use noformat instead of code for unknown languages.
Code blocks that are not marked as a language supported by Jira are
rendered as preformatted text with `{noformat}` blocks.

Fixes: tarleb/jira-wiki-markup#4
2021-03-08 12:50:35 +01:00
Albert Krewinkel
e1454fe0d0
Jira writer: use Span identifiers as anchors
Closes: tarleb/jira-wiki-markup#3.
2021-03-01 14:36:11 +01:00
John MacFarlane
967e7f5fb9 Rename Text.Pandoc.XMLParser -> Text.Pandoc.XML.Light...
..and add new definitions isomorphic to xml-light's, but with
Text instead of String.  This allows us to keep most of the code in
existing readers that use xml-light, but avoid lots of unnecessary
allocation.

We also add versions of the functions from xml-light's
Text.XML.Light.Output and Text.XML.Light.Proc that operate
on our modified XML types, and functions that convert
xml-light types to our types (since some of our dependencies,
like texmath, use xml-light).

Update golden tests for docx and pptx.

OOXML test: Use `showContent` instead of `ppContent` in `displayDiff`.

Docx: Do a manual traversal to unwrap sdt and smartTag.
This is faster, and needed to pass the tests.

Benchmarks:

A = prior to 8ca191604d (Feb 8)
B = as of 8ca191604d (Feb 8)
C = this commit

| Reader  |  A    | B      | C     |
| ------- | ----- | ------ | ----- |
| docbook | 18 ms | 12 ms  | 10 ms |
| opml    | 65 ms | 62 ms  | 35 ms |
| jats    | 15 ms | 11 ms  |  9 ms |
| docx    | 72 ms | 69 ms  | 44 ms |
| odt     | 78 ms | 41 ms  | 28 ms |
| epub    | 64 ms | 61 ms  | 56 ms |
| fb2     | 14 ms | 5  ms  | 4 ms  |
2021-02-16 16:55:20 -08:00
Albert Krewinkel
a3beed9db8 Org: support task_lists extension
The tasks lists extension is now supported by the org reader and writer;
the extension is turned on by default.

Closes: #6336
2021-02-13 13:00:37 -08:00
Albert Krewinkel
d202f7eb77
Avoid unnecessary use of NoImplicitPrelude pragma (#7089) 2021-02-07 10:02:35 -08:00
Albert Krewinkel
00031fc809
Docx writer: keep raw openxml strings verbatim.
Closes: #6933
2020-12-13 14:09:59 +01:00
Jan Tojnar
70c7c5703a
Docbook writer: Handle admonition titles from Markdown reader
Docbook reader produces a `Div` with `title` class for `<title>` element
within an “admonition” element. Markdown writer then turns this
into a fenced div with `title` class attribute. Since fenced divs
are block elements, their content is recognized as a paragraph
by the Markdown reader. This is an issue for Docbook writer because
it would produce an invalid DocBook document from such AST –
the `<title>` element can only contain “inline” elements.

Let’s handle this invalid special case separately by unwrapping
the paragraph before creating the `<title>` element.
2020-12-07 07:28:39 +01:00
Jan Tojnar
dc6856530c
Docbook writer: handle admonitions
Similarly to d6fdfe6f2b,
we should handle admonitions.
2020-12-07 06:23:25 +01:00
Aner Lucero
f63b76e169 Markdown writer: default to using ATX headings.
Previously we used Setext (underlined) headings by default.
The default is now ATX (`##` style).

* Add the `--markdown-headings=atx|setext` option.
* Deprecate `--atx-headers`.
* Add constructor 'ATXHeadingInLHS` constructor to `LogMessage` [API change].
* Support `markdown-headings` in defaults files.
* Document new options in MANUAL.

Closes #6662.
2020-11-14 21:33:32 -08:00
Diego Balseiro
eda5540719
DOCX reader: Allow empty dates in comments and tracked changes (#6726)
For security reasons, some legal firms delete the date from comments and
tracked changes.

* Make date optional (Maybe) in tracked changes and comments datatypes
* Add tests
2020-10-06 21:03:00 -07:00
Michael Hoffmann
74bd5a4f47
Docx writer: better handle list items whose contents are lists (#6522)
If the first element of a bulleted or ordered list is another list,
then that first item will disappear if the target format is docx. This
changes the docx writer so that it prepends an empty string for those
cases. With this, no items will disappear.

Closes #5948.
2020-10-02 09:30:05 -07:00
Christian Despres
cae155b095
Fix hlint suggestions, update hlint.yaml (#6680)
* Fix hlint suggestions, update hlint.yaml

Most suggestions were redundant brackets. Some required
LambdaCase.

The .hlint.yaml file had a small typo, and didn't ignore camelCase
suggestions in certain modules.
2020-09-13 07:48:14 -07:00
Christian Despres
22babd5382
[API change] Rename Writers.Tables and its contents (#6679)
Writers.Tables is now Writers.AnnotatedTable. All of the types and
functions in it have had the "Ann" removed from them. Now it is
expected that the module be imported qualified.
2020-09-12 08:50:36 -07:00
Christian Despres
10c6c411f9
Add Writers.Tables helper functions and types, add tests for those (#6655)
Add Writers.Tables helper functions and types, add tests for those

The Writers.Tables module contains an AnnTable type that is a pandoc
Table with added inferred information that should be enough for
writers (in particular the HTML writer) to operate on without having
to lay out the table themselves.

The toAnnTable and fromAnnTable functions in that module convert
between AnnTable and Table. In addition to producing an AnnTable with
coherent and well-formed annotations, the toAnnTable function also
normalizes its input Table like the table builder does.

Various tests ensure that toAnnTable normalizes tables exactly like
the table builder, and that its annotations are coherent.
2020-09-05 14:36:51 -07:00
Michael Hoffmann
09ea10e2b1 Escape starting periods in ms writer code blocks
If a line of ms code block output starts with a period (.), it should
be prepended by '\&' so that it is not interpreted as a roff command.

Fixes #6505
2020-07-08 23:52:28 +02:00
John MacFarlane
b1561d8e47 Use native Underline instead of Span in Jira 2020-06-22 17:55:57 -07:00
despresc
c7814f31e1 Use the new builders, modify readers to preserve empty headers
The Builder.simpleTable now only adds a row to the TableHead when the
given header row is not null. This uncovered an inconsistency in the
readers: some would unconditionally emit a header filled with empty
cells, even if the header was not present. Now every reader has the
conditional behaviour. Only the XWiki writer depended on the header
row being always present; it now pads its head as necessary.
2020-04-15 23:03:22 -04:00
despresc
4e34d366df Adapt to the newest Table type, fix some previous adaptation issues
- Writers.Native is now adapted to the new Table type.

- Inline captions should now be conditionally wrapped in a Plain, not
  a Para block.

- The toLegacyTable function now lives in Writers.Shared.
2020-04-15 23:03:22 -04:00
despresc
7254a2ae0b Implement the new Table type 2020-04-15 23:03:22 -04:00
Albert Krewinkel
c3f539364a
Jira: support citations, attachment links, and user links
Closes: #6231
Closes: #6238
Closes: #6239
2020-04-04 14:27:27 +02:00
Albert Krewinkel
d867cac8ca
Jira reader: resolve parsing issues of blockquote, color
Parsing problems occurring with block quotes and colored text have been
resolved.

Fixes: #6233
Fixes: #6235
2020-04-03 13:25:52 +02:00
Albert Krewinkel
ff9be6b384
Jira writer: convert spans with class underline to inserted text
Spans with class `underline` as converted into Jira text marked as
`+inserted+`, i.e. surrounded by plus-signs.
2020-03-31 09:57:59 +02:00
Albert Krewinkel
f5ea5f0aad
Introduce new format variants for JATS (#6067)
New formats:

- `jats_archiving` for the "Archiving and Interchange Tag Set",
- `jats_publishing` for the "Journal Publishing Tag Set", and
- `jats_articleauthoring` for the "Article Authoring Tag Set."

The "jats" output format is now an alias for "jats_archiving".

Closes: #6014
2020-02-12 20:36:02 -08:00
John MacFarlane
4c3db9273f Apply linter suggestions. Add fix_spacing to lint target in Makefile. 2020-02-07 09:08:22 -08:00
John MacFarlane
2c1a911bc1 LaTeX writer: properly handle unnumbered headings level 4+.
Closes #6018.

Previously the `\paragraph` command was used instead of
`\paragraph*` for unnumbered level 4 headings.
2020-01-01 22:32:27 -07:00
Albert Krewinkel
e43c2e75a1 RST writer: fix backslash escaping after strings
The check whether a complex inline element following a string must be
escaped, now depends on the last character of the string instead of the
first.

Fixes: #5906
2019-11-14 14:46:32 +01:00
despresc
90e436d496 Switch to new pandoc-types and use Text instead of String [API change].
PR #5884.

+ Use pandoc-types 1.20 and texmath 0.12.
+ Text is now used instead of String, with a few exceptions.
+ In the MediaBag module, some of the types using Strings
  were switched to use FilePath instead (not Text).
+ In the Parsing module, new parsers `manyChar`, `many1Char`,
  `manyTillChar`, `many1TillChar`, `many1Till`, `manyUntil`,
  `mantyUntilChar` have been added: these are like their
  unsuffixed counterparts but pack some or all of their output.
+ `glob` in Text.Pandoc.Class still takes String since it seems
  to be intended as an interface to Glob, which uses strings.
  It seems to be used only once in the package, in the EPUB writer,
  so that is not hard to change.
2019-11-12 16:03:45 -08:00
Amogh Rathore
bd2bd9b19d HTML Reader/Writer - Add support for <var> and <samp> (#5861)
Closes #5799
2019-11-04 08:42:30 -08:00
John MacFarlane
1fe9742263 Changes to build with new doctemplates/doclayout.
The new version of doctemplates adds many features to pandoc's
templating system, while remaining backwards-compatible.
New features include partials and filters.  Using template filters,
one can lay out data in enumerated lists and tables.

Templates are now layout-sensitive: so, for example, if a
text with soft line breaks is interpolated near the end of
a line, the text will break and wrap naturally.  This makes
the templating system much more suitable for programatically
generating markdown or other plain-text files from metadata.
2019-10-29 22:21:35 -07:00
Ole Martin Ruud
45479114e8 HTML reader/writer: Better handling of <q> with cite attribute (#5837)
* HTML reader: Handle cite attribute for quotes.  If a `<q>` tag has a `cite` attribute, we interpret it as a Quoted element with an inner Span.  Closes #5798

* Refactor url canonicalization into a helper function

* Modify HTML writer to handle quote with cite.

[0]: https://developer.mozilla.org/en-US/docs/Web/HTML/Element/q
2019-10-24 22:27:49 -07:00
John MacFarlane
aceee9ca48 Options.WriterOptions: Change type of writerVariables to Context Text.
This will allow structured values.

[API change]
2019-10-09 11:01:33 -07:00
John MacFarlane
7caaa3d5d6 Minor ghc 8.8 fixups. 2019-10-03 22:41:24 -07:00
John MacFarlane
e99050283e ConTeXt unit tests - tweak code property.
Inline code will never have two consecutive newlines.
We get a counterexample in this case

https://pipelines.actions.githubusercontent.com/bMXCpShstkkHbFPgw9hBRMWw2w9plyzdVM8r7CRPFBHFvidaAG/5cf52d2d-3804-412d-ae65-4f8c059b0fb7/_apis/pipelines/1/runs/116/signedlogcontent/39?urlExpires=2019-09-23T17%3A38%3A05.8358735Z&urlSigningMethod=HMACV1&urlSignature=Qtd6vnzqgSwXpAkIyp9DJY4Kn7GJzYMR8UDkLR%2FsMQY%3D

so for simplicity we just weed out code with newlines.
2019-09-23 15:03:26 -07:00
John MacFarlane
d247e9f72e Make plain output plainer.
Previously we used the following Project Gutenberg conventions
for plain output:

- extra space before and after level 1 and 2 headings
- all-caps for strong emphasis `LIKE THIS`
- underscores surrounding regular emphasis `_like this_`

This commit makes `plain` output plainer. Strong and Emph
inlines are rendered without special formatting.  Headings
are also rendered without special formatting, and with only
one blank line following.

To restore the former behavior, use `-t plain+gutenberg`.

API change: Add `Ext_gutenberg` constructor to `Extension`.

See #5741.
2019-09-22 11:33:09 -07:00
Ben Steinberg
7389919bb4 Preserve built-in styles in DOCX with custom style (#5670)
This commit prevents custom styles on divs and spans from overriding
styles on certain elements inside them, like headings, blockquotes,
and links. On those elements, the "native" style is required for the
element to display correctly. This change also allows nesting of
custom styles; in order to do so, it removes the default "Compact"
style applied to Plain blocks, except when inside a table.
2019-09-20 22:13:29 -07:00