Commit graph

7705 commits

Author SHA1 Message Date
John MacFarlane
d80aaee42b Translations: don't depend on the fact that Aeson Object is...
implemented internally as a HashMap.  This is no longer
public as of aeson 2.0.0.0.
2021-10-10 09:36:33 -07:00
John MacFarlane
5a1bd52677 Don't prepend file:// to --syntax-definition on Windows.
This was a fix for a problem in skylighting, but this
problem doesn't exist now that we've moved from HXT to
xml-conduit.

Cf. #6374.
2021-10-06 12:33:22 -07:00
John MacFarlane
6c507a66cf Avoid bad wraps in markdown writer at the Doc Text level.
Previously we tried to do this at the Inline list level,
but it makes more sense to intervene on breaking spaces
at the Doc Text level.
2021-10-05 08:44:51 -07:00
John MacFarlane
b8d460eeab Powerpoint writer: consolidate text runs when possible.
This slims down the output files by avoiding unnecessary
text run elements.

Updated golden tests.
2021-10-04 12:24:12 -07:00
John MacFarlane
82d587493d Revert "Powerpoint writer: consolidate text run nodes."
This reverts commit 62f83aa486.

This was already being done, it seems.
I misidentified the problem; it is really with `Str ""` nodes.
2021-10-04 11:50:32 -07:00
John MacFarlane
62f83aa486 Powerpoint writer: consolidate text run nodes.
This should reduce the size of the generated files.
2021-10-04 11:45:01 -07:00
John MacFarlane
fb0d6c7cb6 Depend on pandoc-types 1.23, remove Null constructor on Block. 2021-10-01 15:42:00 -07:00
nuew
021cdb543b epub: Add EPUB3 subject metadata (authority/term)
This adds the ability to specify EPUB 3 `authority` and `term` specific
refinements to the `subject` tag. Specifying a plain `subject` tag in
metadata will function as before.
2021-09-30 20:53:07 -07:00
John MacFarlane
75d551f65b Add footnotes to default gfm extensions.
Now that `gfm` supports footnotes.
https://github.blog/changelog/2021-09-30-footnotes-now-supported-in-markdown-fields/
2021-09-30 13:08:13 -07:00
Ezwal
472b33095e Docx reader: Add placeholder for word diagram 2021-09-30 12:44:44 -07:00
John MacFarlane
45db998b39 EPUB writer: treat epub:type "frontispiece" as front matter.
This allows you to include a frontispiece using

```

![](yourimage.jpg)

etc.
```

Closes #7600.
2021-09-29 08:29:14 -07:00
John MacFarlane
0bdcf415e4 Switch from pretty-simple to pretty-show for native output.
Update tests.

Reason:  it turns out that the native output generated by
pretty-simple isn't always readable by the native reader.
According to https://github.com/cdepillabout/pretty-simple/issues/99
it is not a design goal of the library that the rendered values
be readable using 'read'.  This makes it unsuitable for our
purposes.

pretty-show is a bit slower and it uses 4-space indents
(non-configurable), but it doesn't have this serious drawback.
2021-09-28 21:17:53 -07:00
John MacFarlane
8018179b3d Better implementation of splitStrWhen 2021-09-27 16:43:13 -07:00
John MacFarlane
df57d0930b RST writer: properly handle anchors to ids...
with spaces or leading underscore.

In these cases we need the quoted form, e.g.
```
.. _`foo bar`:

.. _`_foo`:
```

Side note: rST will "normalize" these identifiers anyway,
ignoring the underscore:
https://docutils.sourceforge.io/docs/ref/rst/directives.html#identifier-normalization

Closes #7593.
2021-09-26 21:56:21 -07:00
John MacFarlane
665e6d3d94 BibTeX parser: fix expansion of special strings in series...
e.g. `newseries` or `library`.  Expansion should not happen
when these strings are protected in braces, or when they're
capitalized.

Closes #7591.
2021-09-23 22:21:05 -07:00
John MacFarlane
aa89f6be18 HTML reader: handle empty tbody element in table.
Closes #7589.
2021-09-23 09:25:37 -07:00
John MacFarlane
f0a6eb913d HTML writer: render \ref and \eqref as inline math...
not display.  See #7589.
2021-09-23 08:49:52 -07:00
John MacFarlane
0afb48cd38 HTML writer: pass through \ref and \eqref...
if MathJax is used.

Closes #7587.
2021-09-22 22:33:41 -07:00
John MacFarlane
7ab2f4a61d HTML writer: pass through inline math environments with KaTeX. 2021-09-22 22:18:38 -07:00
John MacFarlane
c266734448 Use pretty-simple to format native output.
Previously we used our own homespun formatting.  But this
produces over-long lines that aren't ideal for diffs in tests.
Easier to use something off-the-shelf and standard.

Closes #7580.

Performance is slower by about a factor of 10, but this isn't
really a problem because native isn't suitable as a serialization
format. (For serialization you should use json, because the reader
is so much faster than native.)
2021-09-21 12:37:42 -07:00
John MacFarlane
c9ce6da1bb LaTeX reader: Recognize that \vadjust sometimes takes "pre".
Closes #7531.
2021-09-19 10:12:07 -07:00
John MacFarlane
132a6df51e Ignore (and gobble parameters of) CSLReferences environment.
Otherwise we get the parameters as numbers in the output.
Closes #7531.
2021-09-19 10:10:45 -07:00
John MacFarlane
dd7b83ac91 Use babel, not polyglossia, with xelatex.
Previously polyglossia worked better with xelatex, but
that is no longer the case, so we simplify the code so that
babel is used with all latex engines.

This involves a change to the default LaTeX template.
2021-09-19 09:40:59 -07:00
John MacFarlane
92e0e424c0 Markdown writer: use underline class rather than ul for underline.
This only affects output with bracketed_spans enabled.
The markdown reader parses spans with either `.ul` or `.underline` as
Underline elements, but we're moving towards preferring the latter.
2021-09-19 09:12:55 -07:00
John MacFarlane
5684e6a76c Alphabetize Extension constructors. 2021-09-18 21:13:38 -07:00
Emily Bourke
4a5ed7e04a pptx-footers: Replace fixed dates with yaml date 2021-09-18 09:55:45 -07:00
Emily Bourke
50adea220d pptx: Support footers in the reference doc
In PowerPoint, it’s possible to specify footers across all slides,
containing a date (optionally automatically updated to today’s date),
the slide number (optionally starting from a higher number than 1), and
static text. There’s also an option to hide the footer on the title
slide.

Before this commit, none of that footer content was pulled through from
the reference doc: this commit supports all the functionality listed
above.

There is one behaviour which may not be immediately obvious: if the
reference doc specifies a fixed date (i.e. not automatically updating),
and there’s a date specified in the metadata for the document, the
footer date is replaced by the metadata date.

- Include date, slide number, and static footer content from reference
  doc
- Respect “slide number starts from” option
- Respect “Don’t show on title slide” option
- Add tests
2021-09-18 09:55:45 -07:00
John MacFarlane
cf7f80b11f Fix linter warning. 2021-09-17 10:05:55 -07:00
John MacFarlane
57d93cca56 Org writer: don't indent contents of code blocks.
We previously indented them by two spaces, following a
common convention.  Since the convention is fading, and
the indentation is inconvenient for copy/paste, we are
discontinuing this practice.

Closes #5440.
2021-09-17 09:41:34 -07:00
John MacFarlane
1487ee01fd Update list of supported source languages in org writer.
See #5440.
2021-09-17 09:22:36 -07:00
John MacFarlane
a07d955d6f Fix code blocks using --preserve-tabs.
Previously they did not behave as the equivalent input
with spaces would.  Closes #7573.
2021-09-16 20:46:05 -07:00
Emily Bourke
7c22c0202e pptx: Support specifying slide background images
In the reveal-js output, it’s possible to use reveal’s
`data-background-image` class on a slide’s title to specify a background
image for the slide.

With this commit, it’s possible to use `background-image` in the same
way for pptx output. Only the “stretch” mode is supported, and the
background image is centred around the slide in the image’s larger axis,
matching the observed default behaviour of PowerPoint.

- Support `background-image` per slide.
- Add tests.
- Update manual.
2021-09-16 19:45:53 -07:00
John MacFarlane
d2449ad926 HTML writer: set "hash" to True by default (for reveal.js).
Closes #7574.
See #6968 where the motivation for setting "hash" to True is
explained.
2021-09-16 19:27:34 -07:00
Emily Bourke
0fb6474a55 pptx: Add support for incremental lists
- Support -i option
- Support incremental/noincremental divs
- Support older block quote syntax
- Add tests

One thing not clear from the manual is what should happen when the input
uses a combination of these things. For example, what should the
following produce?

```md
::: {.incremental .nonincremental}
- are
- these
- incremental?
:::

::: incremental
::::: nonincremental
- or
- these?
:::::
:::

::: nonincremental
> - how
> - about
> - these?
:::
```

In this commit I’ve taken the following approach, matching the observed
behaviour for beamer and reveal.js output:

- if a div has both classes, incremental wins
- the innermost incremental/nonincremental div is the one which takes
  effect
- a block quote containing a list as its first element inverts whether
  the list is incremental, whether or not the quote is inside an
  incremental/non-incremental div

I’ve added some tests to verify this behaviour.

This commit closes issue #5689
(https://github.com/jgm/pandoc/issues/5689).
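
Illustration (not part of pandoc's code): a minimal Python sketch of
the three rules above, under one reading of them. The function name,
its parameters, and the way divs are modelled as sets of class names
are all hypothetical.

```python
def is_incremental(default, div_classes, in_block_quote=False):
    """Decide whether a list is shown incrementally.

    default        -- True if the -i option was given (assumed model)
    div_classes    -- class sets of the enclosing divs, outermost first
    in_block_quote -- True if the list is the first element of a block quote
    """
    incremental = default
    # Inner divs come later in the list, so they override outer ones.
    for classes in div_classes:
        if "incremental" in classes:
            # Checked first, so it wins when both classes are present.
            incremental = True
        elif "nonincremental" in classes:
            incremental = False
    # A block quote flips whatever the divs (or the default) decided.
    return (not incremental) if in_block_quote else incremental
```

Applied to the three examples in this message (without -i), the sketch
treats the first list as incremental, the second as non-incremental,
and the third as incremental.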
2021-09-15 09:13:05 -07:00
John MacFarlane
a3162d341b RST reader: handle escaped colons in reference definitions.
Closes #7568.
2021-09-13 22:57:08 -07:00
Emily Bourke
0ebe65e651 pptx: Fix logic for choosing Comparison layout
There was a mistake in the logic used to choose between the Comparison
and Two Content layouts: if one column contained only non-text (an image
or a table) and the other contained only text, the Comparison layout was
chosen instead of the desired Two Content layout.

This commit fixes that logic:

> If either column contains text followed by non-text, use Comparison.
  Otherwise, use Two Content.

It also adds a test asserting this behaviour.
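
Illustration (not pandoc's implementation): a minimal Python sketch of
the quoted rule. Columns are modelled, as an assumption, as lists of
block kinds such as "text", "image" or "table"; the function names are
hypothetical, and "followed by" is read as "followed at any later
position".

```python
def has_text_then_non_text(column):
    """True if some text block is later followed by a non-text block."""
    seen_text = False
    for kind in column:
        if kind == "text":
            seen_text = True
        elif seen_text:
            # A non-text block (image, table, ...) after earlier text.
            return True
    return False


def choose_two_column_layout(left, right):
    """Pick the layout name for a slide with two columns."""
    if has_text_then_non_text(left) or has_text_then_non_text(right):
        return "Comparison"
    return "Two Content"
```

With this sketch, an image-only column next to a text-only column
selects Two Content, which is the case the old logic got wrong.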
2021-09-13 08:30:36 -07:00
John MacFarlane
6271b09c50 Docx writer: make id used in native_numbering predictable.
If the image has the id IMAGEID, then we use the id ref_IMAGEID
for the figure number.  Closes #7551.

This allows one to create a filter that adds a figure number
with figure name, e.g.

     <w:fldSimple w:instr=" REF ref_superfig "><w:r><w:t>Figure X</w:t></w:r></w:fldSimple>

For this to be possible it must be possible to predict the
figure number id from the image id.

If images lack an id, an id of the form `ref_fig1` is used.
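
Illustration (not pandoc's code): a minimal Python sketch of how a
filter might predict the id and build the field shown above. Both
function names are hypothetical, and the assumption that the fallback
number is the figure's position in the document is not confirmed by
this message.

```python
from xml.sax.saxutils import escape


def figure_ref_id(image_id, figure_index=1):
    """Predict the figure-number id from the image id."""
    if image_id:
        return "ref_" + image_id
    # Assumed fallback: number figures without an id by position.
    return "ref_fig" + str(figure_index)


def figure_ref_field(image_id, label):
    """Build a raw OpenXML REF field pointing at the figure number."""
    return ('<w:fldSimple w:instr=" REF %s ">'
            '<w:r><w:t>%s</w:t></w:r></w:fldSimple>'
            % (figure_ref_id(image_id), escape(label)))
```

Calling figure_ref_field("superfig", "Figure X") reproduces the XML
fragment quoted in this message.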
2021-09-12 15:30:29 -07:00
Kolen Cheung
c66cedaa71 fix!(ipynb writer): improve round trip identity
for raw cell output

BREAKING CHANGE:
The Jupyter ecosystem, including nbconvert, lab and notebook,
deviated from their own spec in nbformat,
where they used the key `raw_mimetype` instead of `format`.

Moreover, the mime-type of rst used in Jupyter
deviated from that suggested by
https://docutils.sourceforge.io/FAQ.html
and is defined as `text/restructuredtext`
when chosen from "Raw NBConvert Format" in Jupyter.

So while this is not backward-compatible,
it should match real-world usage better,
hence improving the round-trip "identity" in raw-cell.

See #229, jupyter/nbformat#229.
2021-09-10 21:11:28 -07:00
Kolen Cheung
17092454c5 feat(ipynb writer): add more Jupyter's "Raw NBConvert Format"
Adds more formats that Jupyter's "Raw NBConvert Format" uses
natively (asciidoc),
and maps more formats to text/html whenever it makes sense.
2021-09-10 21:11:28 -07:00
Kolen Cheung
3483a54c72 feat(ipynb reader): get cell output mime from raw_mimetype too
While the spec defined format, in practice raw_mimetype is used.
See jupyter/nbformat#229
2021-09-10 21:11:28 -07:00
Kolen Cheung
e6bf1626d2 feat(ipynb reader): add more Jupyter's "Raw NBConvert Format"
This adds most of the available formats selectable from
Jupyter's interface "Raw NBConvert Format".
2021-09-10 21:11:28 -07:00
Kolen Cheung
6aa1087b97 fix!: rst mime type
BREAKING CHANGE:
fix rst mime type according to
https://docutils.sourceforge.io/FAQ.html
2021-09-10 21:11:28 -07:00
Emily Bourke
1dba1e6dc8 pptx: Copy embedded fonts from reference doc
We already copy the relationships and elements in presentation.xml for
embedded fonts, so at the moment using a reference doc with embedded
fonts is broken, producing a pptx that PowerPoint says needs repairing.

This commit copies the fonts over, which I believe is all that’s needed
to work correctly with reference docs with embedded fonts.
2021-09-10 17:06:45 -07:00
Emily Bourke
ec7cea294d pptx: Fix presentation rel numbering
Before now, the numbering of rIds was inconsistent when making the
presentation XML and when making the presentation relationships XML.

For the relationships, the slides were inserted into the rId order after
the first master, and everything else was moved up out of the way.
However, this change was then missed in the presentation XML, I think
because `envSlideOffset` was never set. The result was that any slide
masters after the first would have the wrong rIds in the presentation
XML, clashing with the slides, which would lead PowerPoint to view
produced files as corrupt. As well, other relationships (like embedded
fonts) would have their rId changed in the relationships XML but not in
the presentation XML.

This commit:

- Removes `envSlideOffset` in favour of directly passed function
  arguments
- Inserts the slides into the rId order after all masters rather than
  after the first
- Updates any other rIds in presentation.xml that need to be changed
2021-09-10 17:06:45 -07:00
Emily Bourke
2b98991551 pptx: Include all themes in output archive
- Accept test changes: they’re adding the second theme (for all tests
  not containing speaker notes), or changing its position in the
  XML (for the ones containing speaker notes).
2021-09-10 17:06:45 -07:00
Emily Bourke
b60c6157fe pptx: Don’t add relationships unnecessarily
Before now, for any layouts added to the output from the default
reference doc, the relationships were unconditionally added to the
output. However, if there was already a layout in slideMaster1 at the
same index then that results in duplicate relationships.

This commit checks first, and only adds the relationship if it doesn’t
already exist.
2021-09-10 17:06:45 -07:00
Emily Bourke
8ec9b884f1 pptx: Fix capitalisation of notesMasterId
I don’t think this has caused any problems, but before now it’s been
"NotesMasterId", which is incorrect according to [ECMA-376].

[ECMA-376]: https://www.ecma-international.org/publications-and-standards/standards/ecma-376/
2021-09-10 17:06:45 -07:00
John MacFarlane
78b2d74756 Remove redundant import. 2021-09-10 11:02:22 -07:00
John MacFarlane
0216a2f504 Org reader: don't parse a list as first item in a list item.
Closes #7557.
2021-09-10 09:50:05 -07:00
Francesco Mazzoli
99a4d1d0b0
Support --reference-location for HTML output (#7461)
The HTML writer now supports `EndOfBlock`, `EndOfSection`, and
`EndOfDocument` for reference locations.  EPUB and HTML slide
show formats are also affected by this change.

This works similarly to the markdown writer, but with special care
taken to skip section divs with regard to the block level.

The change also takes care to not modify the output if `EndOfDocument`
is used.
2021-09-10 09:30:05 -07:00
Kolen Cheung
1481dae629
Ipynb reader handleData: support text/markdown (#7561)
`text/markdown` is now a supported mime type for raw output.
2021-09-10 09:26:55 -07:00
John MacFarlane
0b1c5a87da RTF reader: support \binN for binary image data. 2021-09-08 09:30:58 -07:00
John MacFarlane
ddfa7b2a63 App: Issue NotUTF8Encoded warning when falling back to latin1. 2021-09-08 09:30:16 -07:00
John MacFarlane
dee30e2a1b Logging: add NotUTF8Encoded constructor to LogMessage.
[API change]
2021-09-08 09:29:46 -07:00
John MacFarlane
b185560a8e RTF reader: better handling of \* and bookmarks.
We now ensure that groups starting with `\*` never cause
text to be added to the document.

In addition, bookmarks now create a span between the start
and end of the bookmark, rather than an empty span.
2021-09-04 11:06:01 -07:00
John MacFarlane
aaef51707c Minor renaming to avoid shadowing. 2021-09-04 11:06:01 -07:00
John MacFarlane
481ff8ac44 Extensions: put Ext_short_subsuperscripts in alphabetical order. 2021-09-04 11:06:01 -07:00
John MacFarlane
10c4719076 RTF reader: if doc begins with {\rtf1 ... } only parse its contents.
Some documents seem to have non-RTF (e.g. XML) material after the
`{\rtf1 ... }` group.
2021-09-03 21:50:30 -07:00
John MacFarlane
e5d0b702c7 RTF reader: Ignore \pgdsc group.
Otherwise we get style names treated as text.
2021-09-03 19:52:52 -07:00
Emily Bourke
b82a01b688 pptx: Add support for more layouts
Until now, the pptx writer only supported four slide layouts: “Title
Slide” (used for the automatically generated metadata slide), “Section
Header” (used for headings above the slide level), “Two Column” (used
when there’s a columns div containing at least two column divs), and
“Title and Content” (used for all other slides).

This commit adds support for three more layouts: Comparison, Content
with Caption, and Blank.

- Support “Comparison” slide layout

  This layout is used when a slide contains at least two columns, at
  least one of which contains some text followed by some non-text (e.g.
  an image or table). The text in each column is inserted into the
  “body” placeholder for that column, and the non-text is inserted into
  the ObjType placeholder. Any extra content after the non-text is
  overlaid on top of the preceding content, rather than dropping it
  completely (as currently happens for the two-column layout).

  + Accept straightforward test changes

    Adding the new layout means the “-deleted-layouts” tests have an
    additional layout added to the master and master rels.

  + Add new tests for the comparison layout
  + Add new tests to pandoc.cabal

- Support “Content with Caption” slide layout

  This layout is used when a slide’s body contains some text, followed by
  non-text (e.g. an image or a table). Before now, in this case the image
  or table would break onto a new slide: to get that output again, users
  can add a horizontal rule before the image or table.

  + Accept straightforward tests

    The “-deleted-layouts” tests all have an extra layout and relationship
    in the master for the Content with Caption layout.

  + Accept remove-empty-slides test

    Empty slides are still removed, but the Content with Caption layout is
    now used.

  + Change slide-level-0/h1-h2-with-text description

    This test now triggers the content with caption layout, giving a
    different (but still correct) result.

  + Add new tests for the new layout
  + Add new tests to the cabal file

- Support “Blank” slide layout

  This layout is used when a slide contains only blank content (e.g.
  non-breaking spaces). No content is inserted into any placeholders in
  the layout.

  Fixes #5097.

  + Accept straightforward test changes

    Blank layout now copied over from reference doc as well, when
    layouts have been deleted.

  + Add some new tests

    A slide should use the blank layout if:

    - It contains only speaker notes
    - It contains only an empty heading with a body of nbsps
    - It contains only a heading containing only nbsps

- Change ContentType -> Placeholder

  This type was starting to have a constructor for each placeholder on
  each slide (e.g. `ComparisonUpperLeftContent`). I’ve changed it
  instead to identify a placeholder by type and index, as I think that’s
  clearer and less redundant.

- Describe layout-choosing logic in manual
2021-09-01 07:16:17 -07:00
John MacFarlane
5dcd4610e2 Improve asciidoc escaping for -- in URLs. Closes #7529. 2021-08-29 10:12:20 -07:00
John MacFarlane
d6d7c9620a Add --sandbox option.
+ Add sandbox feature for readers.  When this option is used,
  readers and writers only have access to input files (and
  other files specified directly on command line).  This restriction
  is enforced in the type system.
+ Filters, PDF production, custom writers are unaffected.  This
  feature only insulates the actual readers and writers, not
  the pipeline around them in Text.Pandoc.App.
+ Note that when `--sandbox` is specified, readers won't have
  access to the resource path, nor will anything have access to
  the user data directory.
+ Add module Text.Pandoc.Class.Sandbox, defining
  `sandbox`.  Exported via Text.Pandoc.Class. [API change]

Closes #5045.
2021-08-28 22:31:42 -07:00
John MacFarlane
b76796eae8 Remove unneeded import. 2021-08-28 12:44:03 -07:00
John MacFarlane
51caa8b78d Docx writer: handle SVG images.
This change has several parts:

- In Text.Pandoc.App, if the writer is docx, we fill the media
  bag and attempt to convert any SVG images to PNG, adding these
  to the media bag.  The PNG backups have the same filenames as
  the SVG images, but with an added .png extension.  If the conversion
  cannot be done (e.g. because rsvg-convert is not present),
  a warning is emitted.
- In Text.Pandoc.Writers.Docx, we now use Word 2016's syntax for
  including SVG images. If a PNG fallback is present in the media bag,
  we include a link to that too.

It would be helpful if someone with an old Word version could test
to see that the documents we produce can be opened and viewed with
the PNG fallbacks.  If not, then perhaps we can eliminate the
slightly complex code for producing these fallbacks.

Closes #4058.
2021-08-28 12:16:14 -07:00
John MacFarlane
4d7cdc4671 Image: Generalize svgToPng to MonadIO. 2021-08-27 22:27:01 -07:00
John MacFarlane
eb7ed27f3f Add haddock for dpi parameter. 2021-08-27 16:35:58 -07:00
John MacFarlane
7d0db79003 T.P.Image: svgToPng, change first parameter from WriterOptions to Int.
The information we need is just a DPI, so why require more?
2021-08-27 16:25:50 -07:00
Emily Bourke
8e5a79f264 pptx: Make first heading title if slide level is 0
Before this commit, the pptx writer adds a slide break before any table,
“columns” div, or paragraph starting with an image, unless the only
thing before it on the same slide is a heading at the slide level. In
that case, the item and heading are kept on the same slide, and the
heading is used as the slide title (inserted into the layout’s “title”
placeholder).

However, if the slide level is set to 0 (as was recently enabled) this
makes it impossible to have a slide with a title which contains any of
those items in its body.

This commit changes this behaviour: now if the slide level is 0, then
items will be kept with a heading of any level, if the heading’s the
only thing before the item on the same slide.
2021-08-27 09:47:03 -07:00
John MacFarlane
e4d7a6177f Ensure we have unique ids for wp:docPr and pic:cNvPr elements.
This will, I hope, fix #7527 and #7503.
2021-08-27 09:42:59 -07:00
John MacFarlane
c29a72ffe7 Comment out unused module. 2021-08-24 23:27:59 -07:00
John MacFarlane
fd7c140cde Reorganize App to make it easier to limit IO in main loop.
Previously we used liftIO fairly liberally.  The code has
been restructured to avoid this.

A small behavior change is that pandoc will now fall back
to latin1 encoding for inputs that can't be read as UTF-8.
This is what it did previously for content fetched from
the web and not marked as to content type. It makes sense
to do the same for local files.
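
Illustration (not pandoc's code): a minimal Python sketch of the
fallback described above. The function name and the returned
used_fallback flag are hypothetical additions for the example.

```python
def decode_input(raw):
    """Decode input bytes, returning (text, used_fallback)."""
    try:
        return raw.decode("utf-8"), False
    except UnicodeDecodeError:
        # Every byte maps to a character in latin1, so this cannot fail.
        return raw.decode("latin-1"), True
```

The flag lets a caller tell the two cases apart, for example to report
that the input was not valid UTF-8.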
2021-08-24 22:19:20 -07:00
John MacFarlane
c39ddeb8f8 Text.Pandoc.Class: add readStdinStrict method to PandocMonad.
[API change]
2021-08-24 22:19:15 -07:00
John MacFarlane
8ad22002cb Class: Generalize type of extractMedia.
It was uselessly  restricted to PandocIO, instead of any
instance of PandocMonad and MonadIO.

[API change]
2021-08-24 22:18:37 -07:00
John MacFarlane
bf860df938 T.P.App.OutputSettings: Generalize some types...
so we can run this with any instance of PandocMonad and MonadIO,
not just PandocIO.
2021-08-24 22:18:25 -07:00
John MacFarlane
0efbfb33ad Text.Pandoc.Filter: Generalize type of applyFilters...
from PandocIO to any instance of MonadIO and PandocMonad.
[API change]
2021-08-24 22:18:14 -07:00
John MacFarlane
65e78dac74 PDF: generalize type of makePDF...
instead of PandocIO, it can be used in any instance of
PandocMonad, MonadIO, and MonadMask.

[API change]
2021-08-24 22:18:06 -07:00
John MacFarlane
0df003b099 Lua subsystem and custom writers: generalize types from PandocIO...
to any instance of PandocMonad and MonadIO.

This involves an API change, since the type of
runLua is now

    (PandocMonad m, MonadIO m) => Lua a -> m (Either PandocError a)
2021-08-24 22:17:52 -07:00
John MacFarlane
3f9b7a10ad Markdown reader: fix interaction of --strip-comments and list
parsing.  Use of `--strip-comments` was causing tight lists
to be rendered as loose (as if the comment were a blank line).
Closes #7521.
2021-08-23 22:06:39 -07:00
John MacFarlane
5a23f8ff3e Clean up PDF module.
Previously we had to run runIOorExplode inside withTempDir.
Now that PandocIO is an instance of MonadMask, this is no
longer necessary.
2021-08-22 19:00:43 -07:00
John MacFarlane
d37dea9eeb PandocIO: derive MonadCatch, MonadThrow, MonadMask.
This will allow us to use withTempDir.
2021-08-22 18:35:28 -07:00
John MacFarlane
10a71c484f App: Move output-file writing out of PandocMonad action. 2021-08-22 08:44:50 -07:00
Simon Schuster
591cdca38b LaTeX-parser: restrict \endinput to current file 2021-08-21 18:08:27 -07:00
John MacFarlane
07d847a910 RST reader: Fix :literal: includes.
These should create code blocks, not insert raw RST.
Closes #7513.
2021-08-20 09:54:42 -07:00
John MacFarlane
ef4efa5373 Improve docx reader's robustness in extracting images.
The docx reader made a couple assumptions about how docx
containers were laid out that were not always true, with
the result that some images in documents did not get
found/extracted.

Closes #7511.
2021-08-19 10:50:34 -07:00
Emily Bourke
5616d00d09 pptx: Include image title in description
The image title (i.e. `![alt text](link "title")`) was previously
ignored when writing to pptx. This commit includes it in PowerPoint's
description of the image, along with the link (which was already
included).

Fixes #7352.
2021-08-18 10:10:55 -07:00
John MacFarlane
fd99fe4d7e Revise citeproc code to fit new citeproc 0.5 API.
Linkification of URLs in the bibliography is now done in
the citeproc library, depending on the setting of an option.
We set that option depending on the value of the metadata
field `link-bibliography` (defaulting to true, for consistency
with earlier behavior, though the new behavior includes the
CSL draft recommendation of hyperlinking the title or the whole
entry if a DOI, PMID, PMCID, or URL field is present but not
explicitly rendered).

These changes implement the following recommendations from the
draft CSL v1.0.2 spec (Appendix VI):

> The CSL syntax does not have support for configuration of links.
> However, processors should include links on bibliographic references,
> using the following rules:

> If the bibliography entry for an item renders any of the following
> identifiers, the identifier should be anchored as a link, with the
> target of the link as follows:

> - url: output as is
> - doi: prepend with "`https://doi.org/`"
> - pmid: prepend with "`https://www.ncbi.nlm.nih.gov/pubmed/`"
> - pmcid: prepend with "`https://www.ncbi.nlm.nih.gov/pmc/articles/`"

> If the identifier is rendered as a URI, include rendered URI components
> (e.g. "`https://doi.org/`") in the link anchor. Do not include any other
> affix text in the link anchor (e.g. "Available from: ", "doi: ", "PMID: ").
> If the bibliography entry for an item does not render any of
> the above identifiers, then set the anchor of the link as the item
> title. If title is not rendered, then set the anchor of the link as the
> full bibliography entry for the item. Set the target of the link as one
> of the following, in order of priority:
>
> - doi: prepend with "`https://doi.org/`"
> - pmcid: prepend with "`https://www.ncbi.nlm.nih.gov/pmc/articles/`"
> - pmid: prepend with "`https://www.ncbi.nlm.nih.gov/pubmed/`"
> - url: output as is
>
> If the item data does not include any of the above identifiers, do not
> include a link.
>
> Citation processors should include an option flag for calling
> applications to disable bibliography linking behavior.

Thanks to Benjamin Bray for getting this all working.
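
Illustration (not the citeproc library's code): a minimal Python
sketch of the fallback priority order quoted above, used when no
identifier is rendered in the entry. The function name and the dict
representation of an item are hypothetical.

```python
# Priority order and URL prefixes, as listed in the quoted draft spec.
LINK_PREFIXES = [
    ("doi", "https://doi.org/"),
    ("pmcid", "https://www.ncbi.nlm.nih.gov/pmc/articles/"),
    ("pmid", "https://www.ncbi.nlm.nih.gov/pubmed/"),
    ("url", ""),  # a URL is output as is
]


def fallback_link_target(item):
    """Return the link target for an item, or None if it has no identifier."""
    for field, prefix in LINK_PREFIXES:
        value = item.get(field)
        if value:
            return prefix + value
    return None
```

An item carrying both a DOI and a URL therefore links to the DOI, and
an item with none of the four fields gets no link.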
2021-08-17 15:34:23 -07:00
John MacFarlane
8451bce6de Rename TemplateWarning -> PowerpointTemplateWarning.
@undergroundquizscene - I think TemplateWarning
is apt to be confusing, since this actually doesn't have
anything to do with what we call 'templates' in pandoc.
Hence the change to a powerpoint-specific name.
2021-08-17 15:31:52 -07:00
Emily Bourke
72823ad947 pptx: Select layouts from reference doc by name
Until now, users had to make sure that their reference doc contains
layouts in a specific order: the first four layouts in the file had to
have a specific structure, or else pandoc would error (or sometimes
successfully produce a pptx file, which PowerPoint would then fail to
open).

This commit changes the layout selection to use the layout names rather
than order: users must make sure their reference doc contains four
layouts with specific names, and if a layout with the right name isn’t
found pandoc will output a warning and use the corresponding layout from
the default reference doc as a fallback.

I believe the use of names rather than order will be clearer to users,
and the clearer errors will help them troubleshoot when things go wrong.

- Add tests for moved layouts
- Add tests for deleted layouts
- Add newly included layouts to slideMaster1.xml to fix tests
2021-08-17 09:35:25 -07:00
Emily Bourke
88d82203a1 Add TemplateWarning log message type [API change]
This is a general warning to use for messages about templates.
2021-08-17 09:35:25 -07:00
Emily Bourke
415f445fc1
Escape backslashes in haddock comments (#7505)
Any literal backslash needs to be escaped: these are currently showing
up as “‘r’” instead of “‘\r’”.

Co-authored-by: Emily Bourke <undergroundquizscene@protonmail.com>
2021-08-17 08:20:33 -07:00
John MacFarlane
abb35d8b0f Fix bug in last commit due to removal of take1WhileP. 2021-08-16 08:09:20 -07:00
OCzarnecki
e37cf4484d
Multimarkdown sub- and superscripts (#5512) (#7188)
Added an extension `short_subsuperscripts` which modifies the behavior
of `subscript` and `superscript`, allowing subscripts or superscripts containing only
alphanumerics to end with a space character (e.g. `x^2 = 4` or `H~2 is
combustible`).  This improves support for multimarkdown. Closes #5512.

Add `Ext_short_subsuperscripts` constructor to `Extension` [API change].
This is enabled by default for `markdown_mmd`.
2021-08-15 21:57:57 -07:00
John MacFarlane
4340bd52c4 Make docx writer sensitive to native_numbering extension.
Figure and table numbers are now only included if `native_numbering`
is enabled.  (By default it is disabled.)  This is a behavior change
with respect to 2.14.1, but the behavior is that of previous versions.

The change was necessary to avoid incompatibilities between pandoc's
native numbering and third-party cross reference filters like
pandoc-crossref.

Closes #7499.
2021-08-15 15:05:54 -07:00
John MacFarlane
82638ad53b Convert Quoted in bib entries to special Spans...
before passing them off to citeproc.
This ensures that we get proper localization and flipflopping
if, e.g., quotes are used in titles.

Closes jgm/citeproc#87.
2021-08-13 19:25:29 -07:00
John MacFarlane
15683bb607 Citeproc: avoid odd handling of quotes.
citeproc changes allow us to ignore Quoted elements;
citeproc now uses its own method for representing quoted
things, and only localizes and flipflops quotes it adds itself.

See #87.

The one thing left to do is to convert Quoted elements in
bibliography databases (esp. titles) to `Span ("",["csl-quoted"],[])`
before passing them to citeproc, IF the localized quotes
for the quote type match the standard inverted commas.
2021-08-13 18:13:06 -07:00
John MacFarlane
05640f9a21 Removed quote localization from citeproc processing.
This is now done in citeproc itself.
2021-08-13 17:30:54 -07:00
John MacFarlane
418155aa95 Fix raw LaTeX injection issue (LaTeX writer).
Using a code block containing `\end{verbatim}`, one could
inject raw TeX into a LaTeX document even when `raw_tex`
is disabled.  Thanks to Augustin Laville for noticing the
bug.

Closes #7497.
2021-08-13 11:27:04 -07:00
John MacFarlane
e8d7d157fd LaTeX reader: proper implicit grouping around environment macros. 2021-08-13 10:41:36 -07:00
John MacFarlane
3cfcfacd72 Use Prelude from base-compat for ghc 8.4 too.
We were having trouble building on ghc 8.4 because of
the lack of a Foldable instance for (Alt Maybe) in
base < 4.12.

Mystery: for some reason our builds were failing for gitit
but not in the pandoc CI.
2021-08-12 09:24:27 -07:00
John MacFarlane
ec34497bc1 Try fixing compile error on older ghcs.
See https://github.com/jgm/gitit/runs/3308381697
2021-08-11 23:14:43 -07:00
John MacFarlane
073895c340 Fix some lint issues. 2021-08-11 17:53:39 -07:00
John MacFarlane
dd1a956a8a LaTeX reader: Support \global before \def, \let, etc.
See #7494.
2021-08-11 16:28:53 -07:00
John MacFarlane
e3a263df46 Fix scope for LaTeX macros.
They should by default scope over the group in which they
are defined (except `\gdef` and `\xdef`, which are global).
In addition, environments must be treated as groups.

We handle this by making sMacros in the LaTeX parser state
a STACK of macro tables. Opening a group adds a table to
the stack, closing one removes one.  Only the top of the stack
is queried.

This commit adds a parameter for scope to the Macro constructor
(not exported).

Closes #7494.
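
The stack-of-tables scheme described above can be sketched as follows. This is only an illustration in Python, not the Haskell code in the LaTeX reader: the class and method names are hypothetical, and it assumes that a newly opened group starts from a copy of the enclosing table, which is one way to make "only the top of the stack is queried" work.

```python
class MacroTables:
    """Hypothetical stand-in for a parser state holding a stack of macro tables."""

    def __init__(self):
        # The bottom table holds definitions made outside any group.
        self.stack = [{}]

    def open_group(self):
        # Assumption: the new table starts as a copy of the current top,
        # so a lookup on the top alone still sees outer definitions.
        self.stack.append(dict(self.stack[-1]))

    def close_group(self):
        # Leaving a group discards its local definitions.
        if len(self.stack) > 1:
            self.stack.pop()

    def define(self, name, body, is_global=False):
        # Global definitions (as with \gdef and \xdef) go into every table;
        # local ones go only into the top table.
        targets = self.stack if is_global else self.stack[-1:]
        for table in targets:
            table[name] = body

    def lookup(self, name):
        # Only the top of the stack is queried.
        return self.stack[-1].get(name)
```

With this sketch, a macro defined locally inside a group disappears when the group closes, while one defined with `is_global=True` survives.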
2021-08-11 16:14:34 -07:00
John MacFarlane
a0e44b1ff6 LaTeX reader: improve handling of plain TeX macro primitives.
- Fixed semantics for `\let`.
- Implement `\edef`, `\gdef`, and `\xdef`.
- Add comment noting that currently `\def` and `\edef` set global
  macros (so are equivalent to `\gdef` and `\xdef`).  This should be
  fixed by scoping macro definitions to groups, in a future commit.

Closes #7474.
2021-08-11 10:32:52 -07:00
John MacFarlane
3a924d8f96 HTML reader: treat comments as blank when parsing.
This modifies pBlank.  Previously comments could sometimes
flummox the parser.

Closes #7482.
2021-08-10 12:50:23 -07:00
John MacFarlane
3d7120083a Fix RTF table parsing bug that created undesired nested tables.
Closes #7488.
2021-08-10 11:09:12 -07:00
John MacFarlane
6543b05116 Add RTF reader.
- `rtf` is now supported as an input format as well as output.
- New module Text.Pandoc.Readers.RTF (exporting `readRTF`). [API change]

Closes #3982.
2021-08-10 10:48:55 -07:00
John MacFarlane
c0b68b2030 Allow --slide-level=0.
When the slide level is set to 0, headings won't be used at all
in splitting the document into slides. Horizontal rules must be
used to separate slides.

Closes #7476.
2021-08-08 11:20:26 -07:00
John MacFarlane
dea1f0f080 RTF writer: emit \outlinelevel for section headings. 2021-08-04 16:37:20 -06:00
mt_caret
407de98b5e
Stop using the HTTP package. (#7456)
We only depend on the urlEncode function in the package, which is also
provided by http-types. The HTTP package also depends on the network
package, which has difficulty building on ghcjs.

Add internal module Text.Pandoc.Network.HTTP, exporting `urlEncode`.
2021-08-03 15:53:05 -06:00
Peter Fabinski
8667ba2bcc
LaTeX table writer: Increase column width precision (#7466)
In some cases, the rounding performed by the LaTeX table
writer would introduce visible overrun outside the text
area.
This adds two more decimal places to the width values.
2021-08-03 15:34:39 -06:00
John MacFarlane
f938378d00 RTF writer: omit \bin in \pict.
According to the spec, this is not needed or wanted when
the data is in hexadecimal format, as it is here.
2021-08-01 22:45:41 -06:00
John MacFarlane
f145aea0f9 parseFromString: preserve at least the source directory.
Previously we just set the source name to "chunk" when parsing
from strings, to avoid misleading source positions.

This had the side effect that `rebase_relative_paths` would break
inside sections that were parsed as strings.

So, now we use "ORIGINAL_SOURCE_PATH_chunk" instead of just "chunk".

Closes #7464.
2021-07-29 14:54:25 -06:00
John MacFarlane
1f1a30bbf6 LaTeX writer: Use ulem for underline.
ulem is conditionally included already when the `strikeout`
variable is set, so we set this when there is underlined text,
and use `\uline` instead of `\underline`.

This fixes wrapping for underlined text.
Closes #7351.
2021-07-22 23:05:43 -07:00
John MacFarlane
832196fb17 MIME: use image/x-xcf instead of application/x-xcf.
Closes #7454.
2021-07-22 13:08:30 -07:00
John MacFarlane
31a5bccd57 LaTeX reader: avoid trailing hyphen in translating languages.
Previously `\foreignlanguage{english}` turned into `<span lang="en-">`.
The same issue affected Arabic.

Closes #7447.
2021-07-17 23:07:53 -07:00
John MacFarlane
46099e79de DocBook reader: handle images with imageobjectco elements.
Closes #7440.
2021-07-16 13:10:45 -07:00
John MacFarlane
493522c562 LaTeX reader: Support \cline in LaTeX tables.
Closes #7442.
2021-07-16 12:04:43 -07:00
John MacFarlane
18270c7a39 PDF: Fix svgIn path error.
We were duplicating the temp directory; this didn't show up
on macOS or linux because there we use absolute paths for
the temp directory.

Closes #7431.
2021-07-16 11:39:02 -07:00
Jan Tojnar
06408d08e5
DocBook reader: add support for citerefentry (#7437)
Originally intended for referring to UNIX manual pages, either part of the same DocBook document as refentry element, or external – hence the manvolnum element.
These days, refentry is more general, for example the element documentation pages linked below are each a refentry.

As per the *Processing expectations* section of citerefentry, the element is supposed to be a hyperlink to a refentry (when in the same document) but pandoc does not support refentry tag at the moment so that is moot.

https://tdg.docbook.org/tdg/5.1/citerefentry.html
https://tdg.docbook.org/tdg/5.1/manvolnum.html
https://tdg.docbook.org/tdg/5.1/refentry.html

This roughly corresponds to a `manpage` role in rST syntax, which produces a `Code` AST node with attributes `.interpreted-text role=manpage` but that does not fit DocBook parser.

https://www.sphinx-doc.org/en/master/usage/restructuredtext/roles.html#role-manpage
2021-07-11 15:28:52 -07:00
John MacFarlane
ac0a9da6d8 Improved parsing of raw LaTeX from Text streams (rawLaTeXParser).
We now use source positions from the token stream to tell us
how much of the text stream to consume.  Getting this to
work required a few other changes to make token source positions
accurate.

Closes #7434.
2021-07-11 13:50:28 -07:00
John MacFarlane
477a67061f Always use / when adding directory to image path with extractMedia.
Even on Windows.

May help with #7431.
2021-07-09 14:14:19 -07:00
John MacFarlane
ae22b1e977 RST reader: fix regression with code includes.
With the recent changes to include infrastructure,
included code blocks were getting an extra newline.

Closes #7436.  Added regression test.
2021-07-09 12:27:41 -07:00
Michael Hoffmann
565330033a
Don't incorporate externally linked images in EPUB documents (#7430)
Just like it is possible to avoid incorporating an image in EPUB by
passing `data-external="1"` to a raw HTML snippet, this makes the same
possible for native Images, by looking for an associated `external`
attribute.
2021-07-07 09:26:37 -07:00
Michael Hoffmann
e56e2b0e0b
Recognize data-external when reading HTML img tags (#7429)
Preserve all attributes in img tags.  If attributes have a `data-`
prefix, it will be stripped.  In particular, this preserves a
`data-external` attribute as an `external` attribute in the pandoc AST.
2021-07-06 16:06:29 -07:00
John MacFarlane
e7f8cc5786 T.P.PDF, convertImage: normalize paths.
This will avoid paths on Windows with mixed path separators,
which may cause problems with SVG conversion.

See #7431.
2021-07-06 10:39:47 -07:00
John MacFarlane
f88ebf3ebf Markdown reader: don't try to read contents in self-closing HTML tag.
Previously we had problems parsing raw HTML with self-closing
tags like `<col/>`. The problem was that pandoc would look
for a closing tag to close the markdown contents, but the
closing tag had, in effect, already been parsed by `htmlTag`.

This fixes the issue described in
<https://groups.google.com/d/msgid/pandoc-discuss/297bc662-7841-4423-bcbb-534e99bbba09n%40googlegroups.com>.
2021-07-06 10:22:07 -07:00
John MacFarlane
3ed37f0077 HTML reader: add col, colgroup to 'closes' definitions 2021-07-06 10:21:59 -07:00
John MacFarlane
3a31fe68ef Add command test for #7394.
And fix a small bug in handling of citations in notes, which
led to commas at the end of sentences in some cases.
2021-07-05 15:10:14 -07:00
John MacFarlane
77537b1765 Citeproc: cleanup and efficiency improvement in deNote. 2021-07-05 13:41:01 -07:00
John MacFarlane
ff26af59ac Revamp note citation handling.
Use latest citeproc, which uses a Span with a class rather
than a Note for notes.  This helps us distinguish between
user notes and citation notes.

Don't put citations at the beginning of a note in parentheses.
(Closes #7394.)
2021-07-05 13:19:33 -07:00
Aner Lucero
cb038bb312 HTML5 writer, remove aria-hidden when explicit alt text is provided. 2021-07-02 13:02:52 -07:00
John MacFarlane
0948af9cc5 Docx writer: Add table numbering for captioned tables.
The numbers are added using fields, so that Word can
create a list of tables that will update automatically.
2021-06-29 11:15:40 -07:00
John MacFarlane
a01ba4463f Docx writer: Fixed a couple bugs in Figure numbering. 2021-06-29 11:15:13 -07:00
John MacFarlane
a3d745e485 Docx writer: support figure numbers.
These are set up in such a way that they will work with Word's
automatic table of figures.

Closes #7392.
2021-06-29 09:56:21 -07:00
Aner Lucero
f4ef652a41 Remove duplicated alt text in HTML output. 2021-06-29 09:02:13 -07:00
John MacFarlane
851d037b3e Improve punctuation moving with --citeproc.
Previously, using `--citeproc` could cause punctuation to move in
quotes even when there are no citations. This has been changed;
now, punctuation moving is limited to citations.

In addition, we only move footnotes around punctuation if the
style is a note style, even if `notes-after-punctuation` is `true`.
2021-06-28 22:41:14 -07:00
John MacFarlane
97b0aa667c Allow $ characters in bibtex keys.
Closes #7409.
2021-06-28 13:34:12 -07:00
John MacFarlane
f045e59248 Text.Pandoc.Error: fix line calculations in reporting parsec errors.
Also remove a spurious initial newline in the error report.
2021-06-28 13:28:49 -07:00
John MacFarlane
4262898fe9 Set proper initial source name in parsing BibTeX.
(For better error messages.)
2021-06-28 13:28:02 -07:00
John MacFarlane
dd098d4e15 Markdown writer: put space between Plain and following fenced Div.
Closes #4465.
2021-06-28 11:33:22 -07:00
John MacFarlane
4a7a0cff29 ImageSize: Add Tiff constructor for ImageType.
[Minor API change]

This allows pandoc to get size information from tiff images.
Closes #7405.
2021-06-23 11:39:50 -07:00
John MacFarlane
235cdea629 reveal.js writer: Go back to setting boolean values for variables.
In a previous commit we used strings because boolean False
wouldn't render as `false`. This is changed in the dev
version of doctemplates, so we can go back to the more
straightforward approach.
2021-06-23 09:54:14 -07:00
John MacFarlane
1b07997f4a Fix regression with comment-only YAML metadata blocks.
Closes #7400.
2021-06-22 09:55:50 -07:00
John MacFarlane
086790d986 Fix unneeded import 2021-06-22 09:49:24 -07:00
John MacFarlane
8eed5b90d0 LaTeX writer: add strut at end of minipage if it contains...
line breaks.  Without them, the last line is shorter
than it should be, at least in some cases.
2021-06-21 23:33:00 -07:00
John MacFarlane
9867231779 Revert "LaTeX writer: put a strut after a line break (\\)."
This reverts commit e2a7ecb5f7.
2021-06-21 23:19:40 -07:00
John MacFarlane
e2a7ecb5f7 LaTeX writer: put a strut after a line break (\\).
This ensures that we have proper spacing before the next
line (which might e.g. be a table bottom border).
This gives better results in cases like test/command/7272.md.
2021-06-21 23:17:43 -07:00
John MacFarlane
0352f7845b Improve emailAddress in Text.Pandoc.Parsing.
Previously the parser would accept characters that are
illegal in domains, and this sometimes caused it
to gobble bits of the following text.

Closes #7398.

Note that this change, by itself, caused some txt2tags reader
tests to fail. txt2tags allows bare email addresses with
a following form query.  So, in addition to the change
to emailAddress, we modify the txt2tags parser so it can
still handle these cases.
2021-06-21 22:35:07 -07:00
John MacFarlane
ed3974a254 LaTeX writer: always use a minipage for cells with line breaks...
if width information is available.  Otherwise the way we treat them can
lead to content that overflows a cell.

Closes #7393.
2021-06-21 18:25:36 -07:00
John MacFarlane
eee648447a LaTeX writer: Use \strut instead of ~ before \\ in empty line. 2021-06-21 18:25:07 -07:00
John MacFarlane
14b2eb2aeb reveal.js writer: better handling of options.
Previously it was impossible to specify false values for
options that default to true; setting the option to false
just caused the portion of the template setting the option
to be omitted.

Now we prepopulate all the variables with their default
values, including them unconditionally and allowing them
to be overridden.
2021-06-21 16:40:52 -07:00
John MacFarlane
82ad855f38 Markdown writer: Fix regression in code blocks with attributes.
Code blocks with a single class but nonempty attributes
were having their attributes dropped as a result of #7242.

Closes #7397.
2021-06-21 08:49:00 -07:00
John MacFarlane
3fb5499dd6 insertMediaBag: ensure we get sane mediaPath for URLs.
Long URLs cannot be treated as mediaPaths, but System.FilePath's
`isRelative` often returns True for them.  So we add a check
for an absolute URL.  We also ensure that extensions are derived
only from the path portion of URLs (previously a following query
was being included).

Closes #7391.
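
As a rough illustration of the two checks described above (detecting an absolute URL, and taking the extension from the path portion only), here is a small Python sketch. The helper names are hypothetical and the code is not pandoc's Haskell implementation.

```python
import posixpath
from urllib.parse import urlsplit


def is_absolute_url(src):
    # Treat a string as an absolute URL only if it has both a scheme and a host.
    parts = urlsplit(src)
    return bool(parts.scheme) and bool(parts.netloc)


def extension_of(src):
    # Use only the path component, so a trailing ?query or #fragment
    # does not end up in the extension.
    return posixpath.splitext(urlsplit(src).path)[1]
```

For `https://example.com/img/photo.png?size=large`, `extension_of` yields `.png` rather than `.png?size=large`.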
2021-06-18 13:19:24 -07:00
John MacFarlane
cfa26e3ca0 Docx reader: handle absolute URIs in Relationship Target.
Closes #7374.
2021-06-12 13:56:09 -07:00
John MacFarlane
ea53a1dc5c Markdown writer: allow pipe_tables to be disabled for commonmark...
(commonmark_x, gfm).  Closes #7375.
2021-06-12 10:20:19 -07:00
John MacFarlane
b0cd6c6224 Fix regression in citeproc processing.
If inline references are used (in the metadata `references` field),
we should still only include in the bibliography items that are
actually cited -- unless `nocite` is used.

Closes #7376.
2021-06-12 10:16:44 -07:00
John MacFarlane
3776e828a8 Fix MediaBag regressions.
With the 2.14 release `--extract-media` stopped working as before;
there could be mismatches between the paths in the rendered document and
the extracted media.

This patch makes several changes (while keeping the same API).

The `mediaPath` in 2.14 was always constructed from the SHA1 hash of
the media contents.  Now, we preserve the original path unless it's
an absolute path or contains `..` segments (in that case we use a path
based on the SHA1 hash of the contents).

When constructing a path from the SHA1 hash, we always use the
original extension, if there is one. Otherwise we look up an
appropriate extension for the mime type.

`mediaDirectory` and `mediaItems` now use the `mediaPath`, rather
than the mediabag key, for the first component of the tuple.
This makes more sense, I think, and fits with the documentation
of these functions; eventually, though, we should rework the API so that
`mediaItems` returns both the keys and the MediaItems.

Rewriting of source paths in `extractMedia` has been fixed.

`fillMediaBag` has been modified so that it doesn't modify
image paths (that was part of the problem in #7345).

We now do path normalization (e.g. `\` separators on Windows) only
in writing the media; the paths are left unchanged in the image
links (sensibly, since they might be URLs and not file paths).

These changes should restore the original behavior from before 2.14.

Closes #7345.
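
The path-selection rule described above (keep the original path unless it is absolute or contains `..`, otherwise fall back to a SHA1-based name with the original or a MIME-derived extension) might look roughly like this. It is a simplified Python sketch with a hypothetical function name, not the code in Text.Pandoc.MediaBag; for brevity it ignores Windows drive-letter paths and uses Python's `mimetypes` table in place of pandoc's own.

```python
import hashlib
import mimetypes
import posixpath


def media_path(original, contents, mime_type=None):
    # Work on a copy with forward slashes so the checks are uniform.
    normalized = original.replace("\\", "/")
    unsafe = normalized.startswith("/") or ".." in normalized.split("/")
    if not unsafe:
        # Relative paths without ".." segments are preserved as given.
        return original
    # Otherwise derive the name from the SHA1 hash of the contents,
    # keeping the original extension if there is one.
    ext = posixpath.splitext(normalized)[1]
    if not ext and mime_type:
        # No extension: look one up from the MIME type.
        ext = mimetypes.guess_extension(mime_type) or ""
    return hashlib.sha1(contents).hexdigest() + ext
```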
2021-06-10 16:47:02 -07:00
John MacFarlane
aa79b3035c T.P.MIME, extensionFromMimeType: add a few special cases.
When we do a reverse lookup in the MIME table, we just get the
last match, so when the same mime type is associated with several
different extensions, we sometimes got weird results, e.g. `.vs`
for `text/plain`.  These special cases help us get the most standard
extensions for mime types like `text/plain`.
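
A toy version of the problem and the fix, in Python: a naive reverse lookup keeps whichever extension comes last for a given MIME type, so a short list of special cases is consulted first. The table entries and the function name here are illustrative assumptions, not the contents of pandoc's MIME table.

```python
# Illustrative (extension, MIME type) pairs; several extensions share a type.
MIME_TABLE = [
    ("txt", "text/plain"),
    ("text", "text/plain"),
    ("vs", "text/plain"),
    ("jpeg", "image/jpeg"),
    ("jpg", "image/jpeg"),
]

# Preferred extensions for types where the reverse lookup is ambiguous.
SPECIAL_CASES = {"text/plain": "txt", "image/jpeg": "jpg"}


def extension_from_mime_type(mime):
    if mime in SPECIAL_CASES:
        return SPECIAL_CASES[mime]
    # Naive reverse lookup: later entries overwrite earlier ones.
    reverse = {m: ext for ext, m in MIME_TABLE}
    return reverse.get(mime)
```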
2021-06-10 16:36:54 -07:00
Albert Krewinkel
c7dd33d5aa
Docx writer: fix handling of empty table headers
A table header which does not contain any cells is now treated as an
empty header.

Fixes: #7369
2021-06-10 18:36:49 +02:00
Albert Krewinkel
55bcd4b4fb
Lua utils: fix handling of table headers in from_simple_table
Passing an empty list of header cells now results in an empty table
header.

Fixes: #7369
2021-06-10 18:36:49 +02:00
John MacFarlane
76e5f047b0 Citeproc: avoid duplicate classes and attributes on refs div. 2021-06-08 17:51:53 -07:00
John MacFarlane
21cc52abe3 LaTeX writer: Fix regression in table header position.
In recent versions the table headers were no longer bottom-aligned
(if more than one line).  This patch fixes that by using minipages
for table headers in non-simple tables.

Closes #7347.
2021-06-05 14:13:58 -06:00
Jan Tojnar
c550bf8482 CommonMark writer: do not use simple class for fenced-divs
In https://github.com/jgm/pandoc/pull/7242, we introduced a simple attribute style for code blocks and fenced divs with a single class, but it turns out the CommonMark extension does not support it for fenced divs.

https://github.com/jgm/commonmark-hs/blob/master/commonmark-extensions/test/fenced_divs.md
2021-06-05 13:51:18 -06:00
Jan Tojnar
7a3ee9d3d8 CommonMark writer: do not throw away attributes when Ext_attributes is enabled
Ext_attributes covers at least the following:

- Ext_fenced_code_attributes
- Ext_header_attributes
- Ext_inline_code_attributes
- Ext_link_attributes
2021-06-05 13:51:18 -06:00
Jan Tojnar
c6f8c38c49 Markdown writer: re-use functions from Inline
Instead of duplicating linkAttributes and attrsToMarkdown, let’s just use those from the Inline module.
2021-06-05 13:51:18 -06:00
Jan Tojnar
c8ab8bccf2 DocBook reader: Add support for danger element
Added in DocBook 5.2:

- https://github.com/docbook/docbook/pull/64
- https://tdg.docbook.org/tdg/5.2/danger.html
2021-06-05 08:02:21 -06:00
Jan Tojnar
af9de925de DocBook writer: Remove non-existent admonitions
attention, error and hint are actually just reStructuredText specific.
danger was too until introduced in DocBook 5.2: https://github.com/docbook/docbook/issues/55
2021-06-05 08:02:21 -06:00
John MacFarlane
b6c04383e4 T.P.Class.IO: normalise path in writeMedia.
This ensures that we get `\` separators on Windows.
2021-06-03 18:34:38 -06:00
John MacFarlane
311736fb0a Text.Pandoc.PDF: only print relevant part of environment on --verbose. 2021-06-02 15:21:13 -06:00
John MacFarlane
2b5dad9912 Fix regression in 2.14 for generation of PDFs with SVGs.
Closes #7344.
2021-06-02 10:42:22 -06:00
John MacFarlane
3b628f7664 HTML writer: Don't omit width attribute on div.
Closes #7342.
2021-06-01 21:57:49 -06:00
John MacFarlane
2e4ef14d91 Markdown reader: fix pipe table regression in 2.11.4.
Previously pipe tables with empty headers (that is, a header
line with all empty cells) would be rendered as headerless
tables.  This broke in 2.11.4.

The fix here is to produce an AST with an empty table head
when a pipe table has all empty header cells.

Closes #7343.
2021-06-01 21:44:55 -06:00
John MacFarlane
abb59bd582 LaTeX reader: don't allow optional * on symbol control sequences.
Generally we allow optional starred variants of LaTeX commands
(since many allow them, and if we don't accept these explicitly,
ignoring the star usually gives acceptable results).  But we
don't want to do this for `\(*\)` and similar cases.

Closes #7340.
2021-06-01 13:54:51 -06:00
John MacFarlane
62f46b3995 Fix regression with commonmark/gfm yaml metadata block parsing.
A regression in 2.14 led to the document body being omitted
after YAML metadata in some cases.  This is now fixed.

Closes #7339.
2021-05-31 21:34:51 -06:00
John MacFarlane
fc70f44ee2 HTML reader: fix column width regression.
Column widths specified with a style attribute were
off by a factor of 100 in 2.14.

Closes #7334.
2021-05-30 17:15:14 -07:00
John MacFarlane
cc206af392 Have LoadedResource use relative paths.
The immediate reason for this is to allow the test output of #3752
to work on both windows and linux.
2021-05-30 10:23:00 -07:00
John MacFarlane
c2f46e6df4 Docx writer: fix regression on captions.
The "Table Caption" style was no longer getting applied.
(It was overwritten by "Compact.")

Closes #7328.
2021-05-30 10:07:28 -07:00
John MacFarlane
cc6dcf0392 Markdown reader: in rebasePaths, check for both Windows and Posix
absolute paths.  Previously Windows pandoc was treating
`/foo/bar.jpg` as non-absolute.
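
The idea of checking a path against both conventions, regardless of the platform pandoc is running on, can be illustrated in Python, whose standard library exposes both flavors side by side. The function name is hypothetical and this is not the Haskell code used in `rebasePaths`.

```python
import ntpath
import posixpath


def is_absolute_either(path):
    # A path counts as absolute if either the Posix or the Windows
    # rules say so, whatever the host operating system is.
    return posixpath.isabs(path) or ntpath.isabs(path)
```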
2021-05-29 17:36:30 -07:00
John MacFarlane
0d7103de7e In rebasePath, check for absolute paths two ways.
isAbsolute from FilePath doesn't return True on Windows
for paths beginning with `/`, so we check that separately.
2021-05-29 14:41:28 -07:00
John MacFarlane
b6b2331fdc Support rebase_relative_paths for commonmark based formats.
(Including `gfm`.)
2021-05-28 13:58:44 -07:00
Emily Bourke
56b211120c
Docx reader: Support new table features.
* Column spans
* Row spans
  - The spec says that if the `val` attribute is omitted, its value
    should be assumed to be `continue`, and that its values are
    restricted to {`restart`, `continue`}. If the value has any other
    value, I think it seems reasonable to default it to `continue`. It
    might cause problems if the spec is extended in the future by adding
    a third possible value, in which case this would probably give
    incorrect behaviour, and wouldn't error.
* Allow multiple header rows
* Include table description in simple caption
  - The table description element is like alt text for a table (along
    with the table caption element). It seems like we should include
    this somewhere, but I’m not 100% sure how – I’m pairing it with the
    simple caption for the moment. (Should it maybe go in the block
    caption instead?)
* Detect table captions
  - Check for caption paragraph style /and/ either the simple or
    complex table field. This means the caption detection fails for
    captions which don’t contain a field, as in an example doc I added
    as a test. However, I think it’s better to be too conservative: a
    missed table caption will still show up as a paragraph next to the
    table, whereas if I incorrectly classify something else as a table
    caption it could cause havoc by pairing it up with a table it’s
    not at all related to, or dropping it entirely.
* Update tests and add new ones

Partially fixes: #6316
2021-05-28 20:15:23 +02:00
Emily Bourke
44484d0dee
Docx reader: Read table column widths. 2021-05-28 20:15:23 +02:00
John MacFarlane
4842c5fb82 Two citeproc locator/suffix improvements:
- Recognize locators spelled with a capital letter.
  Closes #7323.
- Add a comma and a space in front of the suffix if it doesn't start
  with space or punctuation.  Closes #7324.
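
The suffix rule in the second item can be sketched in a few lines of Python. The function name is hypothetical, and "punctuation" is approximated here by ASCII punctuation, which may be narrower than what pandoc actually checks.

```python
import string


def prepare_suffix(suffix):
    # Prepend ", " unless the suffix is empty or already starts
    # with whitespace or (ASCII) punctuation.
    if suffix and not (suffix[0].isspace() or suffix[0] in string.punctuation):
        return ", " + suffix
    return suffix
```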
2021-05-27 18:28:52 -07:00
John MacFarlane
4b16d181e7 rebase_relative_paths: leave empty paths unchanged. 2021-05-27 14:16:37 -07:00
John MacFarlane
0661ce699f rebase_relative_paths extension: don't change fragment paths.
We don't want a pure fragment path to be rewritten, since
these are used for cross-referencing.
2021-05-27 13:53:26 -07:00
John MacFarlane
6972a7dc91 Modify rebase_relative_paths treatment of reference links/images.
The directory is based on the file containing the link
reference, not the file containing the link, if these differ.
2021-05-27 11:26:38 -07:00
John MacFarlane
cbe16b2866 Citeproc: Don't detect math elements as locators.
Closes #7321.
2021-05-27 10:49:45 -07:00
John MacFarlane
834da53058 Add rebase_relative_paths extension.
- Add manual entry for (non-default) extension
  `rebase_relative_paths`.
- Add constructor `Ext_rebase_relative_paths` to `Extension`
  in Text.Pandoc.Extensions [API change]. When enabled, this
  extension rewrites relative image and link paths by prepending
  the (relative) directory of the containing file.
- Make Markdown reader sensitive to the new extension.
- Add tests for #3752.

Closes #3752.

NB. currently the extension applies to markdown and associated
readers but not commonmark/gfm.
2021-05-27 10:38:25 -07:00
John MacFarlane
81eadfd99a LaTeX reader: improve \def and implement \newif.
- Improve parsing of `\def` macros.  We previously set "verbatim mode"
  even for parsing the initial `\def`; this caused problems for things
  like
  ```
  \def\foo{\def\bar{BAR}}
  \foo
  \bar
  ```
- Implement `\newif`.
- Add tests.
2021-05-27 09:15:04 -07:00
John MacFarlane
8d5014fdfc Logging: remove single quotes around paths in messages.
We weren't doing it consistently and it seems unnecessary.
2021-05-25 11:53:49 -07:00
Albert Krewinkel
105a50569b Allow compilation with base 4.15 2021-05-25 11:52:49 -07:00
Albert Krewinkel
bb2530caa4 Use haddock-library-1.10.0 2021-05-25 11:52:49 -07:00
John MacFarlane
f2c1b57469 PandocMonad: add info message in downloadOrRead...
indicating what path local resources have been loaded from.
2021-05-25 10:08:30 -07:00
John MacFarlane
fb40c8109d Logging: add LoadedResource constructor to LogMessage.
[API change]

This is for INFO-level messages telling where image data has been
loaded from.  (This can vary because of the resource path.)
2021-05-25 10:07:24 -07:00
Albert Krewinkel
d46ea7d7da
Jira: add support for "smart" links
Support has been added for the new
`[alias|https://example.com|smart-card]` syntax.
2021-05-25 16:54:42 +02:00
John MacFarlane
8511f6fdf6 MediaBag improvements.
In the current dev version, we will sometimes add
a version of an image with a hashed name, keeping
the original version with the original name, which
would lead to undesirable duplication.

This change separates the media's filename from the
media's canonical name (which is the path of the link
in the document itself).  Filenames are based on SHA1
hashes and assigned automatically.

In Text.Pandoc.MediaBag:

- Export MediaItem type [API change].
- Change MediaBag type to a map from Text to MediaItem [API change].
- `lookupMedia` now returns a `MediaItem` [API change].
- Change `insertMedia` so it sets the `mediaPath` to
  a filename based on the SHA1 hash of the contents.
  This will be used when contents are extracted.

In Text.Pandoc.Class.PandocMonad:

- Remove `fetchMediaResource` [API change].

Lua MediaBag module has been changed minimally. In the future
it would be better, probably, to give Lua access to the full
MediaItem type.
2021-05-24 09:20:44 -07:00
Albert Krewinkel
58fbf56548
Jira writer: use {color} when span has a color attribute
Closes: tarleb/jira-wiki-markup#10
2021-05-24 09:56:02 +02:00
John MacFarlane
1af2cfb287 Handle relative lengths (e.g. 2*) in HTML column widths.
See <https://www.w3.org/TR/html4/types.html#h-6.6>.

"A relative length has the form "i*", where "i" is an integer. When
allotting space among elements competing for that space, user agents
allot pixel and percentage lengths first, then divide up remaining
available space among relative lengths. Each relative length receives a
portion of the available space that is proportional to the integer
preceding the "*". The value "*" is equivalent to "1*". Thus, if 60
pixels of space are available after the user agent allots pixel and
percentage space, and the competing relative lengths are 1*, 2*, and 3*,
the 1* will be alloted 10 pixels, the 2* will be alloted 20 pixels, and
the 3* will be alloted 30 pixels."

Closes #4063.
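
The arithmetic in the quoted passage can be checked with a short Python sketch. The function names are hypothetical and this is not how the HTML reader is implemented; it only reproduces the proportional split the spec describes.

```python
def parse_relative(length):
    # "i*" carries weight i; a bare "*" is equivalent to "1*".
    if not length.endswith("*"):
        raise ValueError("not a relative length: " + length)
    digits = length[:-1]
    return int(digits) if digits else 1


def allot_relative(available, lengths):
    # Divide the remaining space in proportion to the weights.
    weights = [parse_relative(s) for s in lengths]
    total = sum(weights)
    return [available * w / total for w in weights]
```

With 60 pixels available and lengths `1*`, `2*`, and `3*`, this gives 10, 20, and 30 pixels, matching the example in the spec.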
2021-05-22 22:03:54 -07:00
John MacFarlane
80b4b3fe82 Revert "HTML reader: simplify col width parsing"
This reverts commit f76fe2ab56.
2021-05-22 22:03:51 -07:00