Commit graph

669 commits

Author SHA1 Message Date
Albert Krewinkel
8523bb01b2 Lua: marshal Attr values as userdata
- Adds a new `pandoc.AttributeList()` constructor, which creates the
  associative attribute list that is used as the third component of
  `Attr` values. Values of this type can often be passed to constructors
  instead of `Attr` values.

- `AttributeList` values can no longer be indexed numerically.
2021-10-22 11:16:51 -07:00
Albert Krewinkel
e4287e6c95 Lua: marshal Pandoc values as userdata 2021-10-22 11:16:51 -07:00
Albert Krewinkel
9e74826ba9 Switch to hslua-2.0
The new HsLua version takes a somewhat different approach to marshalling
and unmarshalling, relying less on typeclasses and more on specialized
types. This allows for better performance and improved error messages.

Furthermore, new abstractions allow to document the code and exposed
functions.
2021-10-22 11:16:51 -07:00
Milan Bracke
465c28d28e Docx reader: fix handling of empty fields
Some fields only have an instrText and no content, Pandoc didn't
understand these, causing other fields to be misunderstood because it
seemed like a field was still open when it wasn't.
2021-10-18 19:15:40 -07:00
Milan Bracke
6acc82c5d2 Docx parser: implement PAGEREF fields
These fields, often used in tables of contents, can be a hyperlink.
2021-10-18 19:15:40 -07:00
Milan Bracke
193f6bfeba Docx reader: fix handling of nested fields
Fields delimited by fldChar elements can contain other fields. Before,
the nested fields would be ignored, except for the end, which would be
considered the end of the parent field.

To fix this issue, fields needed to be considered containing ParParts
instead of Runs, since a Run can't represent complex enough structures.
This also impacted Hyperlinks since they can originate from a field.
2021-10-18 19:15:40 -07:00
Emily Bourke
8af15ab345 pptx: Fix list level numbering
In PowerPoint, the content of a top-level list is at the same level as
the content of a top-level paragraph – the only difference is that a
list style has been applied.

At the moment, the pptx writer increments the paragraph level on each
list, turning what should be top-level lists into second-level lists.

This commit changes that logic, only incrementing the paragraph level on
continuation paragraphs of lists.

- Fixes https://github.com/jgm/pandoc/issues/4828
- Fixes https://github.com/jgm/pandoc/issues/4663
2021-10-17 17:24:30 -07:00
Samuel Tardieu
a41c1fe0bb asciidoc writer: translate numberLines attribute to linesnum switch
AsciiDoctor allows to request line numbering on code blocks by
using a switch on the `source` block, such as in:

```
[source%linesnum,haskell]
----
some Haskell code here
----
```
2021-10-14 13:41:12 -07:00
Milan Bracke
0f98cbff4b Avoid blockquote when parent style has more indent
When a paragraph has an indentation different from the parent (named)
style, it used to be considered a blockquote. But this only makes sense
when the paragraph has more indentation. So this commit adds a check
for the indentation of the parent style.
2021-10-10 16:27:32 -07:00
Emily Bourke
aa78765bf9 pptx: Remove excessive layout tests
When I added the tests for moved layouts and deleted layouts, I added
them to all tests. However, this doesn’t really give a lot more info
than having single tests, and the extra tests take up time and disk
space.

This commit removes the moved-layouts and deleted-layouts tests, in
favour of a single test for each of those scenarios.
2021-10-07 08:45:43 -07:00
John MacFarlane
11baeb8850 OOXML tests: use pretty-printed form to display diffs.
Otherwise everything is on one line and the diff is uninformative.
2021-10-04 12:12:16 -07:00
John MacFarlane
6ff04ac52d Fix compareXML helper in Tests.Writers.OOXML.
Given how it is used, we were getting "mine" and "good"
flipped in the test results.
2021-10-02 06:52:40 -07:00
John MacFarlane
c266734448 Use pretty-simple to format native output.
Previously we used our own homespun formatting.  But this
produces over-long lines that aren't ideal for diffs in tests.
Easier to use something off-the-shelf and standard.

Closes #7580.

Performance is slower by about a factor of 10, but this isn't
really a problem because native isn't suitable as a serialization
format. (For serialization you should use json, because the reader
is so much faster than native.)
2021-09-21 12:37:42 -07:00
John MacFarlane
a1ca51c979 Command tests: raise error if command doesn't begin with %. 2021-09-21 10:42:14 -07:00
Emily Bourke
50adea220d pptx: Support footers in the reference doc
In PowerPoint, it’s possible to specify footers across all slides,
containing a date (optionally automatically updated to today’s date),
the slide number (optionally starting from a higher number than 1), and
static text. There’s also an option to hide the footer on the title
slide.

Before this commit, none of that footer content was pulled through from
the reference doc: this commit supports all the functionality listed
above.

There is one behaviour which may not be immediately obvious: if the
reference doc specifies a fixed date (i.e. not automatically updating),
and there’s a date specified in the metadata for the document, the
footer date is replaced by the metadata date.

- Include date, slide number, and static footer content from reference
  doc
- Respect “slide number starts from” option
- Respect “Don’t show on title slide” option
- Add tests
2021-09-18 09:55:45 -07:00
Emily Bourke
7c22c0202e pptx: Support specifying slide background images
In the reveal-js output, it’s possible to use reveal’s
`data-background-image` class on a slide’s title to specify a background
image for the slide.

With this commit, it’s possible to use `background-image` in the same
way for pptx output. Only the “stretch” mode is supported, and the
background image is centred around the slide in the image’s larger axis,
matching the observed default behaviour of PowerPoint.

- Support `background-image` per slide.
- Add tests.
- Update manual.
2021-09-16 19:45:53 -07:00
Emily Bourke
0fb6474a55 pptx: Add support for incremental lists
- Support -i option
- Support incremental/noincremental divs
- Support older block quote syntax
- Add tests

One thing not clear from the manual is what should happen when the input
uses a combination of these things. For example, what should the
following produce?

```md
::: {.incremental .nonincremental}
- are
- these
- incremental?
:::

::: incremental
::::: nonincremental
- or
- these?
:::::
:::

::: nonincremental
> - how
> - about
> - these?
:::
```

In this commit I’ve taken the following approach, matching the observed
behaviour for beamer and reveal.js output:

- if a div with both classes, incremental wins
- the innermost incremental/nonincremental div is the one which takes
  effect
- a block quote containing a list as its first element inverts whether
  the list is incremental, whether or not the quote is inside an
  incremental/non-incremental div

I’ve added some tests to verify this behaviour.

This commit closes issue #5689
(https://github.com/jgm/pandoc/issues/5689).
2021-09-15 09:13:05 -07:00
Emily Bourke
0ebe65e651 pptx: Fix logic for choosing Comparison layout
There was a mistake in the logic used to choose between the Comparison
and Two Content layouts: if one column contained only non-text (an image
or a table) and the other contained only text, the Comparison layout was
chosen instead of the desired Two Content layout.

This commit fixes that logic:

> If either column contains text followed by non-text, use Comparison.
  Otherwise, use Two Content.

It also adds a test asserting this behaviour.
2021-09-13 08:30:36 -07:00
Francesco Mazzoli
99a4d1d0b0
Support --reference-location for HTML output (#7461)
The HTML writer now supports `EndOfBlock`, `EndOfSection`, and
`EndOfDocument` for reference locations.  EPUB and HTML slide
show formats are also affected by this change.

This works similarly to the markdown writer, but with special care
taken to skipping section divs with what regards to the block level.

The change also takes care to not modify the output if `EndOfDocument`
is used.
2021-09-10 09:30:05 -07:00
Emily Bourke
b82a01b688 pptx: Add support for more layouts
Until now, the pptx writer only supported four slide layouts: “Title
Slide” (used for the automatically generated metadata slide), “Section
Header” (used for headings above the slide level), “Two Column” (used
when there’s a columns div containing at least two column divs), and
“Title and Content” (used for all other slides).

This commit adds support for three more layouts: Comparison, Content
with Caption, and Blank.

- Support “Comparison” slide layout

  This layout is used when a slide contains at least two columns, at
  least one of which contains some text followed by some non-text (e.g.
  an image or table). The text in each column is inserted into the
  “body” placeholder for that column, and the non-text is inserted into
  the ObjType placeholder. Any extra content after the non-text is
  overlaid on top of the preceding content, rather than dropping it
  completely (as currently happens for the two-column layout).

  + Accept straightforward test changes

    Adding the new layout means the “-deleted-layouts” tests have an
    additional layout added to the master and master rels.

  + Add new tests for the comparison layout
  + Add new tests to pandoc.cabal

- Support “Content with Caption” slide layout

  This layout is used when a slide’s body contains some text, followed by
  non-text (e.g. and image or a table). Before now, in this case the image
  or table would break onto a new slide: to get that output again, users
  can add a horizontal rule before the image or table.

  + Accept straightforward tests

    The “-deleted-layouts” tests all have an extra layout and relationship
    in the master for the Content with Caption layout.

  + Accept remove-empty-slides test

    Empty slides are still removed, but the Content with Caption layout is
    now used.

  + Change slide-level-0/h1-h2-with-text description

    This test now triggers the content with caption layout, giving a
    different (but still correct) result.

  + Add new tests for the new layout
  + Add new tests to the cabal file

- Support “Blank” slide layout

  This layout is used when a slide contains only blank content (e.g.
  non-breaking spaces). No content is inserted into any placeholders in
  the layout.

  Fixes #5097.

  + Accept straightforward test changes

    Blank layout now copied over from reference doc as well, when
    layouts have been deleted.

  + Add some new tests

    A slide should use the blank layout if:

    - It contains only speaker notes
    - It contains only an empty heading with a body of nbsps
    - It contains only a heading containing only nbsps

- Change ContentType -> Placeholder

  This type was starting to have a constructor for each placeholder on
  each slide (e.g. `ComparisonUpperLeftContent`). I’ve changed it
  instead to identify a placeholder by type and index, as I think that’s
  clearer and less redundant.

- Describe layout-choosing logic in manual
2021-09-01 07:16:17 -07:00
Emily Bourke
8dbea49092 pptx: Restructure tests
- Use dashes consistently rather than underscores
- Make a folder for each set of tests
- List test files explicitly (Cabal doesn’t support ** until version
  2.4)
2021-09-01 07:16:17 -07:00
Emily Bourke
8e5a79f264 pptx: Make first heading title if slide level is 0
Before this commit, the pptx writer adds a slide break before any table,
“columns” div, or paragraph starting with an image, unless the only
thing before it on the same slide is a heading at the slide level. In
that case, the item and heading are kept on the same slide, and the
heading is used as the slide title (inserted into the layout’s “title”
placeholder).

However, if the slide level is set to 0 (as was recently enabled) this
makes it impossible to have a slide with a title which contains any of
those items in its body.

This commit changes this behaviour: now if the slide level is 0, then
items will be kept with a heading of any level, if the heading’s the
only thing before the item on the same slide.
2021-08-27 09:47:03 -07:00
John MacFarlane
2e9a8935fb OOXML tests: silence warnings.
These can make the test output confusing, making people think
tests are failing when they're passing.
2021-08-17 15:33:10 -07:00
Emily Bourke
72823ad947 pptx: Select layouts from reference doc by name
Until now, users had to make sure that their reference doc contains
layouts in a specific order: the first four layouts in the file had to
have a specific structure, or else pandoc would error (or sometimes
successfully produce a pptx file, which PowerPoint would then fail to
open).

This commit changes the layout selection to use the layout names rather
than order: users must make sure their reference doc contains four
layouts with specific names, and if a layout with the right name isn’t
found pandoc will output a warning and use the corresponding layout from
the default reference doc as a fallback.

I believe the use of names rather than order will be clearer to users,
and the clearer errors will help them troubleshoot when things go wrong.

- Add tests for moved layouts
- Add tests for deleted layouts
- Add newly included layouts to slideMaster1.xml to fix tests
2021-08-17 09:35:25 -07:00
Emily Bourke
9204e5c9b1 Don’t compare cdLine in OOXML golden tests
The `cdLine` field gives the line of the file some CData was found on. I
don’t think this is a difference that should fail these golden tests, as
the XML should still be parsable if nothing else has changed.
2021-08-17 09:35:25 -07:00
Emily Bourke
8474d488a5 Provide more detailed XML diff in tests
I had some failing tests and couldn’t tell what was different in the
XML. Updating the comparison to return what’s different made it easier
to figure out what was wrong, and I think will be helpful for others in
future.
2021-08-17 09:35:25 -07:00
OCzarnecki
e37cf4484d
Multimarkdown sub- and superscripts (#5512) (#7188)
Added an extension `short_subsuperscripts` which modifies the behavior
of `subscript` and `superscript`, allowing subscripts or superscripts containing only
alphanumerics to end with a space character (eg. `x^2 = 4` or `H~2 is
combustible`).  This improves support for multimarkdown. Closes #5512.

Add `Ext_short_subsuperscripts` constructor to `Extension` [API change].
This is enabled by default for `markdown_mmd`.
2021-08-15 21:57:57 -07:00
John MacFarlane
4340bd52c4 Make docx writer sensitive to native_numbering extension.
Figure and table numbers are now only included if `native_numbering`
is enabled.  (By default it is disabled.)  This is a behavior change
with respect to 2.14.1, but the behavior is that of previous versions.

The change was necessary to avoid incompatibilities between pandoc's
native numbering and third-party cross reference filters like
pandoc-crossref.

Closes #7499.
2021-08-15 15:05:54 -07:00
John MacFarlane
06d97131e5 Tests.Helpers: export testGolden and use it in RTF reader.
This gives a diff output on failure.
2021-08-10 22:07:48 -07:00
John MacFarlane
7ca4233793 Add test for #7488. 2021-08-10 11:11:33 -07:00
John MacFarlane
6543b05116 Add RTF reader.
- `rtf` is now supported as an input format as well as output.
- New module Text.Pandoc.Readers.RTF (exporting `readRTF`). [API change]

Closes #3982.
2021-08-10 10:48:55 -07:00
Michael Hoffmann
e56e2b0e0b
Recognize data-external when reading HTML img tags (#7429)
Preserve all attributes in img tags.  If attributes have a `data-`
prefix, it will be stripped.  In particular, this preserves a
`data-external` attribute as an `external` attribute in the pandoc AST.
2021-07-06 16:06:29 -07:00
John MacFarlane
0948af9cc5 Docx writer: Add table numbering for captioned tables.
The numbers are added using fields, so that Word can
create a list of tables that will update automatically.
2021-06-29 11:15:40 -07:00
Jan Tojnar
af9de925de DocBook writer: Remove non-existent admonitions
attention, error and hint are actually just reStructuredText specific.
danger was too until introduced in DocBook 5.2: https://github.com/docbook/docbook/issues/55
2021-06-05 08:02:21 -06:00
John MacFarlane
5cf887db20 Reduce size of cover image in test epub. 2021-05-29 11:48:52 -07:00
Emily Bourke
56b211120c
Docx reader: Support new table features.
* Column spans
* Row spans
  - The spec says that if the `val` attribute is ommitted, its value
    should be assumed to be `continue`, and that its values are
    restricted to {`restart`, `continue`}. If the value has any other
    value, I think it seems reasonable to default it to `continue`. It
    might cause problems if the spec is extended in the future by adding
    a third possible value, in which case this would probably give
    incorrect behaviour, and wouldn't error.
* Allow multiple header rows
* Include table description in simple caption
  - The table description element is like alt text for a table (along
    with the table caption element). It seems like we should include
    this somewhere, but I’m not 100% sure how – I’m pairing it with the
    simple caption for the moment. (Should it maybe go in the block
    caption instead?)
* Detect table captions
  - Check for caption paragraph style /and/ either the simple or
    complex table field. This means the caption detection fails for
    captions which don’t contain a field, as in an example doc I added
    as a test. However, I think it’s better to be too conservative: a
    missed table caption will still show up as a paragraph next to the
    table, whereas if I incorrectly classify something else as a table
    caption it could cause havoc by pairing it up with a table it’s
    not at all related to, or dropping it entirely.
* Update tests and add new ones

Partially fixes: #6316
2021-05-28 20:15:23 +02:00
Emily Bourke
44484d0dee
Docx reader: Read table column widths. 2021-05-28 20:15:23 +02:00
John MacFarlane
e0a1f7d2cf Command tests: fail if a file contains no tests.
And fix a test that failed in that way!
2021-05-26 09:52:23 -07:00
Albert Krewinkel
d46ea7d7da
Jira: add support for "smart" links
Support has been added for the new
`[alias|https://example.com|smart-card]` syntax.
2021-05-25 16:54:42 +02:00
John MacFarlane
8511f6fdf6 MediaBag improvements.
In the current dev version, we will sometimes add
a version of an image with a hashed name, keeping
the original version with the original name, which
would leave to undesirable duplication.

This change separates the media's filename from the
media's canonical name (which is the path of the link
in the document itself).  Filenames are based on SHA1
hashes and assigned automatically.

In Text.Pandoc.MediaBag:

- Export MediaItem type [API change].
- Change MediaBag type to a map from Text to MediaItem [API change].
- `lookupMedia` now returns a `MediaItem` [API change].
- Change `insertMedia` so it sets the `mediaPath` to
  a filename based on the SHA1 hash of the contents.
  This will be used when contents are extracted.

In Text.Pandoc.Class.PandocMonad:

- Remove `fetchMediaResource` [API change].

Lua MediaBag module has been changed minimally. In the future
it would be better, probably, to give Lua access to the full
MediaItem type.
2021-05-24 09:20:44 -07:00
Albert Krewinkel
58fbf56548
Jira writer: use {color} when span has a color attribute
Closes: tarleb/jira-wiki-markup#10
2021-05-24 09:56:02 +02:00
Albert Krewinkel
1843a8793a
HTML writer: keep attributes from code nested below pre tag.
If a code block is defined with `<pre><code
class="language-x">…</code></pre>`, where the `<pre>` element has no
attributes, then the attributes from the `<code>` element are used
instead. Any leading `language-` prefix is dropped in the code's *class*
attribute are dropped to improve syntax highlighting.

Closes: #7221
2021-05-17 18:08:02 +02:00
Albert Krewinkel
25f5b92777
HTML writer: ensure headings only have valid attribs in HTML4
Fixes: #5944
2021-05-17 15:42:15 +02:00
Albert Krewinkel
4417dacc44
ConTeXt writer: use span identifiers as reference anchors.
Closes: #7246
2021-05-17 13:14:32 +02:00
Albert Krewinkel
d3ca48656f
ConTeXt writer tests: keep code lines below 80 chars. 2021-05-17 13:11:33 +02:00
Albert Krewinkel
0794862aac
HTML writer: parse <header> as a Div
HTML5 `<header>` elements are treated like `<div>` elements.
2021-05-15 16:46:02 +02:00
John MacFarlane
6e45607f99 Change reader types, allowing better tracking of source positions.
Previously, when multiple file arguments were provided, pandoc
simply concatenated them and passed the contents to the readers,
which took a Text argument.

As a result, the readers had no way of knowing which file
was the source of any particular bit of text.  This meant that
we couldn't report accurate source positions on errors or
include accurate source positions as attributes in the AST.
More seriously, it meant that we couldn't resolve resource
paths relative to the files containing them
(see e.g. #5501, #6632, #6384, #3752).

Add Text.Pandoc.Sources (exported module), with a `Sources` type
and a `ToSources` class.  A `Sources` wraps a list of `(SourcePos,
Text)` pairs. [API change] A parsec `Stream` instance is provided for
`Sources`.  The module also exports versions of parsec's `satisfy` and
other Char parsers that track source positions accurately from a
`Sources` stream (or any instance of the new `UpdateSourcePos` class).

Text.Pandoc.Parsing now exports these modified Char parsers instead of
the ones parsec provides.  Modified parsers to use a `Sources` as stream
[API change].

The readers that previously took a `Text` argument have been
modified to take any instance of `ToSources`. So, they may still
be used with a `Text`, but they can also be used with a `Sources`
object.

In Text.Pandoc.Error, modified the constructor PandocParsecError
to take a `Sources` rather than a `Text` as first argument,
so parse error locations can be accurately reported.

T.P.Error: showPos, do not print "-" as source name.
2021-05-09 19:11:34 -06:00
mbrackeantidot
b6a65445e1
Docx reader: add handling of vml image objects (jgm#4735) (#7257)
They represent images, the same way as other images in vml format.
2021-04-29 09:11:44 -07:00
John MacFarlane
80e2e88287 Smarter smart quotes.
Treat a leading " with no closing " as a left curly quote.
This supports the practice, in fiction, of continuing
paragraphs quoting the same speaker without an end quote.
It also helps with quotes that break over lines in line
blocks.

Closes #7216.
2021-04-28 23:32:37 -07:00
Albert Krewinkel
85f379e474
JATS writer: use either styled-content or named-content for spans.
If the element has a content-type attribute, or at least one class, then
that value is used as `content-type` and the span is put inside a
`<named-content>` element. Otherwise a `<styled-content>` element is
used instead.

Closes: #7211
2021-04-28 22:21:34 +02:00