Commit graph

7499 commits

Author SHA1 Message Date
John MacFarlane
481ff8ac44 Extensions: put Ext_short_subsuperscripts in alphabetical order. 2021-09-04 11:06:01 -07:00
John MacFarlane
10c4719076 RTF reader: if doc begins with {\rtf1 ... } only parse its contents.
Some documents seem to have non-RTF (e.g. XML) material after the
`{\rtf1 ... }` group.
2021-09-03 21:50:30 -07:00
John MacFarlane
e5d0b702c7 RTF reader: Ignore \pgdsc group.
Otherwise we get style names treated as text.
2021-09-03 19:52:52 -07:00
Emily Bourke
b82a01b688 pptx: Add support for more layouts
Until now, the pptx writer only supported four slide layouts: “Title
Slide” (used for the automatically generated metadata slide), “Section
Header” (used for headings above the slide level), “Two Column” (used
when there’s a columns div containing at least two column divs), and
“Title and Content” (used for all other slides).

This commit adds support for three more layouts: Comparison, Content
with Caption, and Blank.

- Support “Comparison” slide layout

  This layout is used when a slide contains at least two columns, at
  least one of which contains some text followed by some non-text (e.g.
  an image or table). The text in each column is inserted into the
  “body” placeholder for that column, and the non-text is inserted into
  the ObjType placeholder. Any extra content after the non-text is
  overlaid on top of the preceding content, rather than dropping it
  completely (as currently happens for the two-column layout).

  + Accept straightforward test changes

    Adding the new layout means the “-deleted-layouts” tests have an
    additional layout added to the master and master rels.

  + Add new tests for the comparison layout
  + Add new tests to pandoc.cabal

- Support “Content with Caption” slide layout

  This layout is used when a slide’s body contains some text, followed by
  non-text (e.g. an image or a table). Before now, in this case the image
  or table would break onto a new slide: to get that output again, users
  can add a horizontal rule before the image or table.

  + Accept straightforward tests

    The “-deleted-layouts” tests all have an extra layout and relationship
    in the master for the Content with Caption layout.

  + Accept remove-empty-slides test

    Empty slides are still removed, but the Content with Caption layout is
    now used.

  + Change slide-level-0/h1-h2-with-text description

    This test now triggers the content with caption layout, giving a
    different (but still correct) result.

  + Add new tests for the new layout
  + Add new tests to the cabal file

- Support “Blank” slide layout

  This layout is used when a slide contains only blank content (e.g.
  non-breaking spaces). No content is inserted into any placeholders in
  the layout.

  Fixes #5097.

  + Accept straightforward test changes

    Blank layout now copied over from reference doc as well, when
    layouts have been deleted.

  + Add some new tests

    A slide should use the blank layout if:

    - It contains only speaker notes
    - It contains only an empty heading with a body of nbsps
    - It contains only a heading containing only nbsps

- Change ContentType -> Placeholder

  This type was starting to have a constructor for each placeholder on
  each slide (e.g. `ComparisonUpperLeftContent`). I’ve changed it
  instead to identify a placeholder by type and index, as I think that’s
  clearer and less redundant.

- Describe layout-choosing logic in manual
2021-09-01 07:16:17 -07:00
John MacFarlane
5dcd4610e2 Improve asciidoc escaping for -- in URLs. Closes #7529. 2021-08-29 10:12:20 -07:00
John MacFarlane
d6d7c9620a Add --sandbox option.
+ Add sandbox feature for readers.  When this option is used,
  readers and writers only have access to input files (and
  other files specified directly on command line).  This restriction
  is enforced in the type system.
+ Filters, PDF production, custom writers are unaffected.  This
  feature only insulates the actual readers and writers, not
  the pipeline around them in Text.Pandoc.App.
+ Note that when `--sandbox` is specified, readers won't have
  access to the resource path, nor will anything have access to
  the user data directory.
+ Add module Text.Pandoc.Class.Sandbox, defining
  `sandbox`.  Exported via Text.Pandoc.Class. [API change]

Closes #5045.
2021-08-28 22:31:42 -07:00
John MacFarlane
b76796eae8 Remove unneeded import. 2021-08-28 12:44:03 -07:00
John MacFarlane
51caa8b78d Docx writer: handle SVG images.
This change has several parts:

- In Text.Pandoc.App, if the writer is docx, we fill the media
  bag and attempt to convert any SVG images to PNG, adding these
  to the media bag.  The PNG backups have the same filenames as
  the SVG images, but with an added .png extension.  If the conversion
  cannot be done (e.g. because rsvg-convert is not present),
  a warning is emitted.
- In Text.Pandoc.Writers.Docx, we now use Word 2016's syntax for
  including SVG images. If a PNG fallback is present in the media bag,
  we include a link to that too.

It would be helpful if someone with an old Word version could test
to see that the documents we produce can be opened and viewed with
the PNG fallbacks.  If not, then perhaps we can eliminate the
slightly complex code for producing these fallbacks.

Closes #4058.
2021-08-28 12:16:14 -07:00
John MacFarlane
4d7cdc4671 Image: Generalize svgToPng to MonadIO. 2021-08-27 22:27:01 -07:00
John MacFarlane
eb7ed27f3f Add haddock for dpi parameter. 2021-08-27 16:35:58 -07:00
John MacFarlane
7d0db79003 T.P.Image: svgToPng, change first parameter from WriterOptions to Int.
The information we need is just a DPI, so why require more?
2021-08-27 16:25:50 -07:00
Emily Bourke
8e5a79f264 pptx: Make first heading title if slide level is 0
Before this commit, the pptx writer adds a slide break before any table,
“columns” div, or paragraph starting with an image, unless the only
thing before it on the same slide is a heading at the slide level. In
that case, the item and heading are kept on the same slide, and the
heading is used as the slide title (inserted into the layout’s “title”
placeholder).

However, if the slide level is set to 0 (as was recently enabled) this
makes it impossible to have a slide with a title which contains any of
those items in its body.

This commit changes this behaviour: now if the slide level is 0, then
items will be kept with a heading of any level, if the heading’s the
only thing before the item on the same slide.
2021-08-27 09:47:03 -07:00
John MacFarlane
e4d7a6177f Ensure we have unique ids for wp:docPr and pic:cNvPr elements.
This will, I hope, fix #7527 and #7503.
2021-08-27 09:42:59 -07:00
John MacFarlane
c29a72ffe7 Comment out unused module. 2021-08-24 23:27:59 -07:00
John MacFarlane
fd7c140cde Reorganize App to make it easier to limit IO in main loop.
Previously we used liftIO fairly liberally.  The code has
been restructured to avoid this.

A small behavior change is that pandoc will now fall back
to latin1 encoding for inputs that can't be read as UTF-8.
This is what it did previously for content fetched from
the web and not marked as to content type. It makes sense
to do the same for local files.
2021-08-24 22:19:20 -07:00
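The encoding fallback described in fd7c140cde can be illustrated with a minimal sketch. It is written in Python rather than pandoc's Haskell, and the function name `decode_input` is hypothetical; it only shows the "try UTF-8, then fall back to latin1" idea, not pandoc's actual code.

```python
def decode_input(raw: bytes) -> str:
    """Decode input bytes as UTF-8, falling back to Latin-1 on failure."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a code point, so this decode cannot fail.
        return raw.decode("latin-1")
```

With this sketch, valid UTF-8 input is returned unchanged, while a byte string such as `b"caf\xe9"` (not valid UTF-8) is read as Latin-1.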
John MacFarlane
c39ddeb8f8 Text.Pandoc.Class: add readStdinStrict method to PandocMonad.
[API change]
2021-08-24 22:19:15 -07:00
John MacFarlane
8ad22002cb Class: Generalize type of extractMedia.
It was uselessly restricted to PandocIO, instead of any
instance of PandocMonad and MonadIO.

[API change]
2021-08-24 22:18:37 -07:00
John MacFarlane
bf860df938 T.P.App.OutputSettings: Generalize some types...
so we can run this with any instance of PandocMonad and MonadIO,
not just PandocIO.
2021-08-24 22:18:25 -07:00
John MacFarlane
0efbfb33ad Text.Pandoc.Filter: Generalize type of applyFilters...
from PandocIO to any instance of MonadIO and PandocMonad.
[API change]
2021-08-24 22:18:14 -07:00
John MacFarlane
65e78dac74 PDF: generalize type of makePDF...
instead of PandocIO, it can be used in any instance of
PandocMonad, MonadIO, and MonadMask.

[API change]
2021-08-24 22:18:06 -07:00
John MacFarlane
0df003b099 Lua subsystem and custom writers: generalize types from PandocIO...
to any instance of PandocMonad and MonadIO.

This involves an API change, since the type of
runLua is now

    (PandocMonad m, MonadIO m) => Lua a -> m (Either PandocError a)
2021-08-24 22:17:52 -07:00
John MacFarlane
3f9b7a10ad Markdown reader: fix interaction of --strip-comments and list
parsing.  Use of `--strip-comments` was causing tight lists
to be rendered as loose (as if the comment were a blank line).
Closes #7521.
2021-08-23 22:06:39 -07:00
John MacFarlane
5a23f8ff3e Clean up PDF module.
Previously we had to run runIOorExplode inside withTempDir.
Now that PandocIO is an instance of MonadMask, this is no
longer necessary.
2021-08-22 19:00:43 -07:00
John MacFarlane
d37dea9eeb PandocIO: derive MonadCatch, MonadThrow, MonadMask.
This will allow us to use withTempDir.
2021-08-22 18:35:28 -07:00
John MacFarlane
10a71c484f App: Move output-file writing out of PandocMonad action. 2021-08-22 08:44:50 -07:00
Simon Schuster
591cdca38b LaTeX-parser: restrict \endinput to current file 2021-08-21 18:08:27 -07:00
John MacFarlane
07d847a910 RST reader: Fix :literal: includes.
These should create code blocks, not insert raw RST.
Closes #7513.
2021-08-20 09:54:42 -07:00
John MacFarlane
ef4efa5373 Improve docx reader's robustness in extracting images.
The docx reader made a couple assumptions about how docx
containers were laid out that were not always true, with
the result that some images in documents did not get
found/extracted.

Closes #7511.
2021-08-19 10:50:34 -07:00
Emily Bourke
5616d00d09 pptx: Include image title in description
The image title (i.e. `![alt text](link "title")`) was previously
ignored when writing to pptx. This commit includes it in PowerPoint's
description of the image, along with the link (which was already
included).

Fixes #7352.
2021-08-18 10:10:55 -07:00
John MacFarlane
fd99fe4d7e Revise citeproc code to fit new citeproc 0.5 API.
Linkification of URLs in the bibliography is now done in
the citeproc library, depending on the setting of an option.
We set that option depending on the value of the metadata
field `link-bibliography` (defaulting to true, for consistency
with earlier behavior, though the new behavior includes the
CSL draft recommendation of hyperlinking the title or the whole
entry if a DOI, PMID, PMCID, or URL field is present but not
explicitly rendered).

These changes implement the following recommendations from the
draft CSL v1.0.2 spec (Appendix VI):

> The CSL syntax does not have support for configuration of links.
> However, processors should include links on bibliographic references,
> using the following rules:

> If the bibliography entry for an item renders any of the following
> identifiers, the identifier should be anchored as a link, with the
> target of the link as follows:

> - url: output as is
> - doi: prepend with "`https://doi.org/`"
> - pmid: prepend with "`https://www.ncbi.nlm.nih.gov/pubmed/`"
> - pmcid: prepend with "`https://www.ncbi.nlm.nih.gov/pmc/articles/`"

> If the identifier is rendered as a URI, include rendered URI components
> (e.g. "`https://doi.org/`") in the link anchor. Do not include any other
> affix text in the link anchor (e.g. "Available from: ", "doi: ", "PMID: ").
> If the bibliography entry for an item does not render any of
> the above identifiers, then set the anchor of the link as the item
> title. If title is not rendered, then set the anchor of the link as the
> full bibliography entry for the item. Set the target of the link as one
> of the following, in order of priority:
>
> - doi: prepend with "`https://doi.org/`"
> - pmcid: prepend with "`https://www.ncbi.nlm.nih.gov/pmc/articles/`"
> - pmid: prepend with "`https://www.ncbi.nlm.nih.gov/pubmed/`"
> - url: output as is
>
> If the item data does not include any of the above identifiers, do not
> include a link.
>
> Citation processors should include an option flag for calling
> applications to disable bibliography linking behavior.

Thanks to Benjamin Bray for getting this all working.
2021-08-17 15:34:23 -07:00
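The fallback rule quoted in fd99fe4d7e (choosing a link target when no identifier is rendered in the entry) can be sketched as follows. This is an illustrative Python sketch, not the citeproc library's implementation; the function name `fallback_link_target` and the dictionary-shaped item are assumptions made for the example. The prefixes and the priority order are taken from the quoted draft text.

```python
from typing import Optional

# URL prefixes from the quoted CSL draft recommendation.
PREFIXES = {
    "doi": "https://doi.org/",
    "pmcid": "https://www.ncbi.nlm.nih.gov/pmc/articles/",
    "pmid": "https://www.ncbi.nlm.nih.gov/pubmed/",
    "url": "",  # URLs are output as is
}

# Priority order used when the entry renders none of the identifiers.
PRIORITY = ["doi", "pmcid", "pmid", "url"]


def fallback_link_target(item: dict) -> Optional[str]:
    """Return the link target for an item, or None if it has no identifier."""
    for key in PRIORITY:
        value = item.get(key)
        if value:
            return PREFIXES[key] + value
    return None
```

For an item carrying both a DOI and a URL, the sketch picks the DOI; for an item with none of the four fields, it returns `None`, matching "do not include a link".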
John MacFarlane
8451bce6de Rename TemplateWarning -> PowerpointTemplateWarning.
@undergroundquizscene - I think TemplateWarning
is apt to be confusing, since this actually doesn't have
anything to do with what we call 'templates' in pandoc.
Hence the change to a powerpoint-specific name.
2021-08-17 15:31:52 -07:00
Emily Bourke
72823ad947 pptx: Select layouts from reference doc by name
Until now, users had to make sure that their reference doc contains
layouts in a specific order: the first four layouts in the file had to
have a specific structure, or else pandoc would error (or sometimes
successfully produce a pptx file, which PowerPoint would then fail to
open).

This commit changes the layout selection to use the layout names rather
than order: users must make sure their reference doc contains four
layouts with specific names, and if a layout with the right name isn’t
found pandoc will output a warning and use the corresponding layout from
the default reference doc as a fallback.

I believe the use of names rather than order will be clearer to users,
and the clearer errors will help them troubleshoot when things go wrong.

- Add tests for moved layouts
- Add tests for deleted layouts
- Add newly included layouts to slideMaster1.xml to fix tests
2021-08-17 09:35:25 -07:00
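The name-based selection described in 72823ad947 can be sketched roughly as below. This is a Python illustration under stated assumptions, not the pptx writer's code: layouts are modelled as plain dictionaries keyed by name, and `select_layout` and the warning text are hypothetical.

```python
def select_layout(name, reference_layouts, default_layouts, warnings):
    """Pick a layout by name, falling back to the default reference doc.

    reference_layouts and default_layouts map layout names to layout data;
    warnings is a list that collects messages for the user.
    """
    if name in reference_layouts:
        return reference_layouts[name]
    # Layout missing from the user's reference doc: warn and use the default.
    warnings.append(f"layout {name!r} not found in reference doc; using default")
    return default_layouts[name]
```

A possible use: with a reference doc that contains "Title Slide" but not "Section Header", the first lookup returns the user's layout silently, and the second returns the default layout and records one warning.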
Emily Bourke
88d82203a1 Add TemplateWarning log message type [API change]
This is a general warning to use for messages about templates.
2021-08-17 09:35:25 -07:00
Emily Bourke
415f445fc1 Escape backslashes in haddock comments (#7505)
Any literal backslash needs to be escaped: these are currently showing
up as “‘r’” instead of “‘\r’”.

Co-authored-by: Emily Bourke <undergroundquizscene@protonmail.com>
2021-08-17 08:20:33 -07:00
John MacFarlane
abb35d8b0f Fix bug in last commit due to removal of take1WhileP. 2021-08-16 08:09:20 -07:00
OCzarnecki
e37cf4484d Multimarkdown sub- and superscripts (#5512) (#7188)
Added an extension `short_subsuperscripts` which modifies the behavior
of `subscript` and `superscript`, allowing subscripts or superscripts containing only
alphanumerics to end with a space character (e.g. `x^2 = 4` or `H~2 is
combustible`).  This improves support for multimarkdown. Closes #5512.

Add `Ext_short_subsuperscripts` constructor to `Extension` [API change].
This is enabled by default for `markdown_mmd`.
2021-08-15 21:57:57 -07:00
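The short form introduced in e37cf4484d can be illustrated with a small Python sketch. It is not the Markdown reader's parser: the regular expressions, the ASCII-only notion of "alphanumeric", the HTML output, and the name `render_short_scripts` are all assumptions made for the example, and the ordinary delimited forms (`x^2^`, `H~2~`) are deliberately left untouched.

```python
import re

# A marker, then one or more ASCII alphanumerics, ended by whitespace.
SHORT_SUP = re.compile(r"\^([A-Za-z0-9]+)(?=\s)")
SHORT_SUB = re.compile(r"~([A-Za-z0-9]+)(?=\s)")


def render_short_scripts(text: str) -> str:
    """Render short super- and subscripts as HTML <sup>/<sub> tags."""
    text = SHORT_SUP.sub(r"<sup>\1</sup>", text)
    text = SHORT_SUB.sub(r"<sub>\1</sub>", text)
    return text
```

Applied to the two examples from the commit message, the sketch turns `x^2 = 4` into `x<sup>2</sup> = 4` and `H~2 is combustible` into `H<sub>2</sub> is combustible`.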
John MacFarlane
4340bd52c4 Make docx writer sensitive to native_numbering extension.
Figure and table numbers are now only included if `native_numbering`
is enabled.  (By default it is disabled.)  This is a behavior change
with respect to 2.14.1, but the behavior is that of previous versions.

The change was necessary to avoid incompatibilities between pandoc's
native numbering and third-party cross reference filters like
pandoc-crossref.

Closes #7499.
2021-08-15 15:05:54 -07:00
John MacFarlane
82638ad53b Convert Quoted in bib entries to special Spans...
before passing them off to citeproc.
This ensures that we get proper localization and flipflopping
if, e.g., quotes are used in titles.

Closes jgm/citeproc#87.
2021-08-13 19:25:29 -07:00
John MacFarlane
15683bb607 Citeproc: avoid odd handling of quotes.
citeproc changes allow us to ignore Quoted elements;
citeproc now uses its own method for representing quoted
things, and only localizes and flipflops quotes it adds itself.

See #87.

The one thing left to do is to convert Quoted elements in
bibliography databases (esp. titles) to `Span ("",["csl-quoted"],[])`
before passing them to citeproc, IF the localized quotes
for the quote type match the standard inverted commas.
2021-08-13 18:13:06 -07:00
John MacFarlane
05640f9a21 Removed quote localization from citeproc processing.
This is now done in citeproc itself.
2021-08-13 17:30:54 -07:00
John MacFarlane
418155aa95 Fix raw LaTeX injection issue (LaTeX writer).
Using a code block containing `\end{verbatim}`, one could
inject raw TeX into a LaTeX document even when `raw_tex`
is disabled.  Thanks to Augustin Laville for noticing the
bug.

Closes #7497.
2021-08-13 11:27:04 -07:00
John MacFarlane
e8d7d157fd LaTeX reader: proper implicit grouping around environment macros. 2021-08-13 10:41:36 -07:00
John MacFarlane
3cfcfacd72 Use Prelude from base-compat for ghc 8.4 too.
We were having trouble building on ghc 8.4 because of
the lack of a Foldable instance for (Alt Maybe) in
base < 4.12.

Mystery: for some reason our builds were failing for gitit
but not in the pandoc CI.
2021-08-12 09:24:27 -07:00
John MacFarlane
ec34497bc1 Try fixing compile error on older ghcs.
See https://github.com/jgm/gitit/runs/3308381697
2021-08-11 23:14:43 -07:00
John MacFarlane
073895c340 Fix some lint issues. 2021-08-11 17:53:39 -07:00
John MacFarlane
dd1a956a8a LaTeX reader: Support \global before \def, \let, etc.
See #7494.
2021-08-11 16:28:53 -07:00
John MacFarlane
e3a263df46 Fix scope for LaTeX macros.
They should by default scope over the group in which they
are defined (except `\gdef` and `\xdef`, which are global).
In addition, environments must be treated as groups.

We handle this by making sMacros in the LaTeX parser state
a STACK of macro tables. Opening a group adds a table to
the stack, closing one removes one.  Only the top of the stack
is queried.

This commit adds a parameter for scope to the Macro constructor
(not exported).

Closes #7494.
2021-08-11 16:14:34 -07:00
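The stack-of-macro-tables idea from e3a263df46 can be sketched as follows. This is a Python illustration, not the LaTeX reader's code: the class `MacroScopes` and its method names are hypothetical, and copying the enclosing table when a group opens is an assumption made here so that querying only the top of the stack still sees outer definitions.

```python
class MacroScopes:
    """A stack of macro tables; only the top table is queried."""

    def __init__(self):
        self.stack = [{}]  # the bottom table holds top-level definitions

    def open_group(self):
        # Assumption: the new table starts as a copy of the enclosing one.
        self.stack.append(dict(self.stack[-1]))

    def close_group(self):
        # Never pop the bottom table.
        if len(self.stack) > 1:
            self.stack.pop()

    def define(self, name, body, global_scope=False):
        if global_scope:
            # Global definitions (like \gdef, \xdef) reach every open table.
            for table in self.stack:
                table[name] = body
        else:
            # Local definitions live only in the current group.
            self.stack[-1][name] = body

    def lookup(self, name):
        return self.stack[-1].get(name)
```

In this sketch, a local redefinition inside a group disappears when the group closes, while a definition made with `global_scope=True` survives it.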
John MacFarlane
a0e44b1ff6 LaTeX reader: improve handling of plain TeX macro primitives.
- Fixed semantics for `\let`.
- Implement `\edef`, `\gdef`, and `\xdef`.
- Add comment noting that currently `\def` and `\edef` set global
  macros (so are equivalent to `\gdef` and `\xdef`).  This should be
  fixed by scoping macro definitions to groups, in a future commit.

Closes #7474.
2021-08-11 10:32:52 -07:00
John MacFarlane
3a924d8f96 HTML reader: treat comments as blank when parsing.
This modifies pBlank.  Previously comments could sometimes
flummox the parser.

Closes #7482.
2021-08-10 12:50:23 -07:00
John MacFarlane
3d7120083a Fix RTF table parsing bug that created undesired nested tables.
Closes #7488.
2021-08-10 11:09:12 -07:00