Commit graph

14709 commits

Author SHA1 Message Date
Emily Bourke
0ebe65e651 pptx: Fix logic for choosing Comparison layout
There was a mistake in the logic used to choose between the Comparison
and Two Content layouts: if one column contained only non-text (an image
or a table) and the other contained only text, the Comparison layout was
chosen instead of the desired Two Content layout.

This commit fixes that logic:

> If either column contains text followed by non-text, use Comparison.
  Otherwise, use Two Content.

It also adds a test asserting this behaviour.
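A minimal sketch of the rule above. It assumes each column can be reduced to an ordered list of text and non-text items. The `Item` and `Layout` types and the function names are hypothetical, not the pptx writer's real code.

```haskell
-- Hypothetical model: each item in a column is either text or
-- non-text (an image or a table).
data Item = TextItem | NonTextItem
  deriving (Eq, Show)

data Layout = Comparison | TwoContent
  deriving (Eq, Show)

-- True if some text item is followed, anywhere later, by a non-text item.
hasTextThenNonText :: [Item] -> Bool
hasTextThenNonText items =
  NonTextItem `elem` dropWhile (/= TextItem) items

-- If either column contains text followed by non-text, use Comparison;
-- otherwise use Two Content.
chooseTwoColumnLayout :: [Item] -> [Item] -> Layout
chooseTwoColumnLayout left right
  | hasTextThenNonText left || hasTextThenNonText right = Comparison
  | otherwise = TwoContent
```

Under this rule an image-only column next to a text-only column gives `TwoContent`, which is the case the commit fixes.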
2021-09-13 08:30:36 -07:00
John MacFarlane
6271b09c50 Docx writer: make id used in native_numbering predictable.
If the image has the id IMAGEID, then we use the id ref_IMAGEID
for the figure number.  Closes #7551.

This allows one to create a filter that adds a figure number
with figure name, e.g.

     <w:fldSimple w:instr=" REF ref_superfig "><w:r><w:t>Figure X</w:t></w:r></w:fldSimple>

For this to work, the figure number id must be predictable
from the image id.

If images lack an id, an id of the form `ref_fig1` is used.
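The id rule can be sketched as a small function. `figureRefId` is a hypothetical name, and numbering id-less figures with a running counter is an assumption based on the `ref_fig1` example.

```haskell
-- Hypothetical helper: derive the bookmark id used for a figure number.
-- An image with id IMAGEID gets ref_IMAGEID; an image without an id
-- falls back to ref_figN, where N is assumed to be a running counter.
figureRefId :: String -> Int -> String
figureRefId imageId figNum
  | null imageId = "ref_fig" ++ show figNum
  | otherwise    = "ref_" ++ imageId
```

A filter can then emit a `REF ref_superfig` field for an image whose id is `superfig`, as in the example above.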
2021-09-12 15:30:29 -07:00
John MacFarlane
d43f9cf414 Add note to Security section that commonmark is better...
than markdown as far as pathological performance goes.
2021-09-12 11:10:05 -07:00
John MacFarlane
84b5c55448 Use latest dev citeproc. 2021-09-10 21:29:44 -07:00
Kolen Cheung
c66cedaa71 fix!(ipynb writer): improve round trip identity
for raw cell output

BREAKING CHANGE:
The Jupyter ecosystem, including nbconvert, lab and notebook,
deviated from their own spec in nbformat,
where they used the key `raw_mimetype` instead of `format`.

Moreover, the mime-type of rst used in Jupyter
deviated from that suggested by
https://docutils.sourceforge.io/FAQ.html
and is defined as `text/restructuredtext`
when chosen from "Raw NBConvert Format" in Jupyter.

So while this is backward-incompatible,
it should match real-world usage better,
hence improving the round-trip "identity" of raw cells.

See #229, jupyter/nbformat#229.
2021-09-10 21:11:28 -07:00
Kolen Cheung
17092454c5 feat(ipynb writer): add more Jupyter's "Raw NBConvert Format"
Adds more formats that Jupyter's "Raw NBConvert Format" uses
natively (asciidoc),
and maps more formats to text/html whenever it makes sense.
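A hypothetical sketch of such a mapping. Only `text/restructuredtext` for rst comes from the commits here; the other MIME strings and the list of HTML-like formats are assumptions for illustration, not pandoc's actual table.

```haskell
-- Hypothetical table from a pandoc raw format name to the MIME type
-- written into a raw cell. Entries other than text/restructuredtext
-- are assumptions.
rawFormatToMime :: String -> Maybe String
rawFormatToMime fmt
  | fmt `elem` ["html", "html4", "html5", "revealjs"] = Just "text/html"
  | otherwise = lookup fmt table
  where
    table =
      [ ("latex",    "text/latex")
      , ("rst",      "text/restructuredtext")
      , ("markdown", "text/markdown")
      , ("asciidoc", "text/asciidoc")
      ]
```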
2021-09-10 21:11:28 -07:00
Kolen Cheung
3483a54c72 feat(ipynb reader): get cell output mime from raw_mimetype too
While the spec defines `format`, in practice `raw_mimetype` is used.
See jupyter/nbformat#229
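A minimal sketch of the lookup, assuming cell metadata is a flat key/value list. Giving `raw_mimetype` priority over `format` is an assumption; `Metadata` and `rawCellMime` are hypothetical names.

```haskell
-- Hypothetical: raw cell metadata as key/value strings.
type Metadata = [(String, String)]

-- Jupyter tools write raw_mimetype, while the nbformat spec defines
-- format, so check raw_mimetype first and fall back to format.
rawCellMime :: Metadata -> Maybe String
rawCellMime meta = case lookup "raw_mimetype" meta of
  Just mime -> Just mime
  Nothing   -> lookup "format" meta
```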
2021-09-10 21:11:28 -07:00
Kolen Cheung
e6bf1626d2 feat(ipynb reader): add more Jupyter's "Raw NBConvert Format"
This adds most of the available formats selectable from
Jupyter's interface "Raw NBConvert Format".
2021-09-10 21:11:28 -07:00
Kolen Cheung
6aa1087b97 fix!: rst mime type
BREAKING CHANGE:
fix rst mime type according to
https://docutils.sourceforge.io/FAQ.html
2021-09-10 21:11:28 -07:00
Emily Bourke
1dba1e6dc8 pptx: Copy embedded fonts from reference doc
We already copy the relationships and elements in presentation.xml for
embedded fonts, but not the font files themselves, so at the moment using
a reference doc with embedded fonts is broken, producing a pptx that
PowerPoint says needs repairing.

This commit copies the fonts over, which I believe is all that’s needed
to work correctly with reference docs with embedded fonts.
2021-09-10 17:06:45 -07:00
Emily Bourke
ec7cea294d pptx: Fix presentation rel numbering
Before now, the numbering of rIds was inconsistent when making the
presentation XML and when making the presentation relationships XML.

For the relationships, the slides were inserted into the rId order after
the first master, and everything else was moved up out of the way.
However, this change was then missed in the presentation XML, I think
because `envSlideOffset` was never set. The result was that any slide
masters after the first would have the wrong rIds in the presentation
XML, clashing with the slides, which would lead PowerPoint to view
produced files as corrupt. In addition, other relationships (like embedded
fonts) would have their rId changed in the relationships XML but not in
the presentation XML.

This commit:

- Removes `envSlideOffset` in favour of directly passed function
  arguments
- Inserts the slides into the rId order after all masters rather than
  after the first
- Updates any other rIds in presentation.xml that need to be changed
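A sketch of the new ordering: masters first, then slides, then everything else. The `Target` type and `assignRIds` are hypothetical. Computing the numbering once and reading both XML files from that single result is one way to keep them from drifting apart.

```haskell
-- Hypothetical model of relationship targets in presentation.xml.rels.
data Target = Master Int | Slide Int | Other String
  deriving (Eq, Show)

-- Number relationships with all masters first, then all slides, then
-- any other relationships (themes, embedded fonts, ...).
assignRIds :: [Target] -> [Target] -> [Target] -> [(String, Target)]
assignRIds masters slides others =
  zip ["rId" ++ show i | i <- [1 :: Int ..]] (masters ++ slides ++ others)
```

With two masters, the first slide gets `rId3` rather than clashing with the second master.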
2021-09-10 17:06:45 -07:00
Emily Bourke
2b98991551 pptx: Include all themes in output archive
- Accept test changes: they’re adding the second theme (for all tests
  not containing speaker notes), or changing its position in the
  XML (for the ones containing speaker notes).
2021-09-10 17:06:45 -07:00
Emily Bourke
b60c6157fe pptx: Don’t add relationships unnecessarily
Before now, for any layouts added to the output from the default
reference doc, the relationships were unconditionally added to the
output. However, if there was already a layout in slideMaster1 at the
same index, this resulted in duplicate relationships.

This commit checks first, and only adds the relationship if it doesn’t
already exist.
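A minimal sketch of the check, assuming a relationship is identified by its target path. `Rel` and `addRelIfMissing` are hypothetical names, not the writer's actual API.

```haskell
-- Hypothetical relationship record: an id and a target path.
data Rel = Rel { relId :: String, relTarget :: String }
  deriving (Eq, Show)

-- Add a relationship only if none with the same target exists yet,
-- so the master's rels never contain duplicates.
addRelIfMissing :: Rel -> [Rel] -> [Rel]
addRelIfMissing new rels
  | any ((== relTarget new) . relTarget) rels = rels
  | otherwise = rels ++ [new]
```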
2021-09-10 17:06:45 -07:00
Emily Bourke
8ec9b884f1 pptx: Fix capitalisation of notesMasterId
I don’t think this has caused any problems, but before now it’s been
"NotesMasterId", which is incorrect according to [ECMA-376].

[ECMA-376]: https://www.ecma-international.org/publications-and-standards/standards/ecma-376/
2021-09-10 17:06:45 -07:00
John MacFarlane
8beca46611 Fix command test for #7557. 2021-09-10 12:07:11 -07:00
John MacFarlane
78b2d74756 Remove redundant import. 2021-09-10 11:02:22 -07:00
John MacFarlane
0216a2f504 Org reader: don't parse a list as first item in a list item.
Closes #7557.
2021-09-10 09:50:05 -07:00
John MacFarlane
12b3ee3787 MANUAL: Document formats affected by --reference-location. 2021-09-10 09:32:51 -07:00
Francesco Mazzoli
99a4d1d0b0 Support --reference-location for HTML output (#7461)
The HTML writer now supports `EndOfBlock`, `EndOfSection`, and
`EndOfDocument` for reference locations.  EPUB and HTML slide
show formats are also affected by this change.

This works similarly to the markdown writer, but takes special care
to skip section divs when determining the block level.

The change also takes care to not modify the output if `EndOfDocument`
is used.
2021-09-10 09:30:05 -07:00
Kolen Cheung
1481dae629 Ipynb reader handleData: support text/markdown (#7561)
`text/markdown` is now a supported mime type for raw output.
2021-09-10 09:26:55 -07:00
John MacFarlane
37e30560ad Use dev version of citeproc. 2021-09-09 23:27:58 -07:00
John MacFarlane
0b1c5a87da RTF reader: support \binN for binary image data. 2021-09-08 09:30:58 -07:00
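A sketch of what handling `\binN` involves: read the decimal byte count, skip the single space that delimits the control word, then take exactly N characters as raw data instead of tokenizing them. `parseBin` is hypothetical and works on `String` rather than the reader's real parser types.

```haskell
import Data.Char (isDigit)
import Data.List (stripPrefix)

-- Hypothetical parser step for "\binN": returns the N-byte payload and
-- the remaining input, or Nothing if the input is malformed or too short.
parseBin :: String -> Maybe (String, String)
parseBin input = do
  rest <- stripPrefix "\\bin" input
  let (digits, afterNum) = span isDigit rest
  n <- if null digits then Nothing else Just (read digits :: Int)
  -- One space after the control word is a delimiter, not payload.
  let body = case afterNum of
        (' ' : xs) -> xs
        xs         -> xs
      (payload, remaining) = splitAt n body
  if length payload == n then Just (payload, remaining) else Nothing
```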
John MacFarlane
ddfa7b2a63 App: Issue NotUTF8Encoded warning when falling back to latin1. 2021-09-08 09:30:16 -07:00
John MacFarlane
dee30e2a1b Logging: add NotUTF8Encoded constructor to LogMessage.
[API change]
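The fallback can be sketched as follows: try strict UTF-8 decoding and, on failure, decode as Latin-1 and return a warning. `decodeInput` and the `Warning` type are hypothetical stand-ins; only the `NotUTF8Encoded` name comes from these commits.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Encoding (decodeLatin1, decodeUtf8')

-- Hypothetical warning type standing in for the NotUTF8Encoded log message.
newtype Warning = NotUTF8Encoded FilePath
  deriving (Eq, Show)

-- Decode input as UTF-8; if that fails, fall back to Latin-1 and
-- return a warning for the caller to report.
decodeInput :: FilePath -> B.ByteString -> (T.Text, [Warning])
decodeInput path bytes = case decodeUtf8' bytes of
  Right txt -> (txt, [])
  Left _    -> (decodeLatin1 bytes, [NotUTF8Encoded path])
```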
2021-09-08 09:29:46 -07:00
John MacFarlane
395d65fdbe CI: disable ansi-tricks in tasty.
This will prevent the test output from being overwhelmed
with headings from passing tests.
2021-09-08 09:02:28 -07:00
John MacFarlane
d87c44ed3a Makefile: disable ansi tricks for tasty; use v2- instead of new-. 2021-09-08 09:02:28 -07:00
Quinn
2b427331d9 Rephrase pandoc.path docs (#7548) 2021-09-04 22:47:01 -07:00
John MacFarlane
b185560a8e RTF reader: better handling of \* and bookmarks.
We now ensure that groups starting with `\*` never cause
text to be added to the document.

In addition, bookmarks now create a span between the start
and end of the bookmark, rather than an empty span.
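A string-level sketch of the first rule, assuming plain RTF text as input: any group opened with `{\*` is skipped up to its matching brace, so none of its contents reach the output. The function names are hypothetical, and a pre-pass like this is only an illustration, not the reader's actual structure.

```haskell
-- Hypothetical pre-pass: remove every "{\*...}" group, including nested
-- braces, so its contents never become document text.
dropIgnorableGroups :: String -> String
dropIgnorableGroups ('{' : '\\' : '*' : rest) =
  dropIgnorableGroups (skipGroup 1 rest)
dropIgnorableGroups ('\\' : c : rest) = '\\' : c : dropIgnorableGroups rest  -- escaped char
dropIgnorableGroups (c : rest) = c : dropIgnorableGroups rest
dropIgnorableGroups [] = []

-- Skip characters until the brace depth returns to zero.
skipGroup :: Int -> String -> String
skipGroup 0 s = s
skipGroup _ [] = []
skipGroup d ('\\' : _ : rest) = skipGroup d rest  -- escaped \{ or \} does not count
skipGroup d ('{' : rest) = skipGroup (d + 1) rest
skipGroup d ('}' : rest) = skipGroup (d - 1) rest
skipGroup d (_ : rest) = skipGroup d rest
```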
2021-09-04 11:06:01 -07:00
John MacFarlane
aaef51707c Minor renaming to avoid shadowing. 2021-09-04 11:06:01 -07:00
John MacFarlane
481ff8ac44 Extensions: put Ext_short_subsuperscripts in alphabetical order. 2021-09-04 11:06:01 -07:00
Quinn
db03e75e27 Improve order of Image fields
Ensure consistency throughout docs
2021-09-04 09:52:43 -07:00
Quinn
531eb2a92a Add missing type for Image title 2021-09-04 09:51:58 -07:00
John MacFarlane
10c4719076 RTF reader: if doc begins with {\rtf1 ... } only parse its contents.
Some documents seem to have non-RTF (e.g. XML) material after the
`{\rtf1 ... }` group.
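A sketch of the truncation, tracking brace depth and stopping at the brace that closes the initial `{\rtf1` group. `rtfGroupOnly` is a hypothetical name; escaped braces (`\{`, `\}`) are skipped so they do not affect the depth.

```haskell
import Data.List (isPrefixOf)

-- Hypothetical: if the input begins with "{\rtf1", keep only the text up
-- to and including the brace that closes that group, dropping trailing
-- non-RTF material. Otherwise return the input unchanged.
rtfGroupOnly :: String -> String
rtfGroupOnly s
  | "{\\rtf1" `isPrefixOf` s = go (0 :: Int) s
  | otherwise = s
  where
    go _ [] = []
    go d ('\\' : c : rest) = '\\' : c : go d rest  -- escaped char, not a brace
    go d ('{' : rest) = '{' : go (d + 1) rest
    go d ('}' : rest)
      | d == 1    = "}"                          -- closes the outer group
      | otherwise = '}' : go (d - 1) rest
    go d (c : rest) = c : go d rest
```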
2021-09-03 21:50:30 -07:00
John MacFarlane
e5d0b702c7 RTF reader: Ignore \pgdsc group.
Otherwise style names are treated as text.
2021-09-03 19:52:52 -07:00
Emily Bourke
b82a01b688 pptx: Add support for more layouts
Until now, the pptx writer only supported four slide layouts: “Title
Slide” (used for the automatically generated metadata slide), “Section
Header” (used for headings above the slide level), “Two Content” (used
when there’s a columns div containing at least two column divs), and
“Title and Content” (used for all other slides).

This commit adds support for three more layouts: Comparison, Content
with Caption, and Blank.

- Support “Comparison” slide layout

  This layout is used when a slide contains at least two columns, at
  least one of which contains some text followed by some non-text (e.g.
  an image or table). The text in each column is inserted into the
  “body” placeholder for that column, and the non-text is inserted into
  the ObjType placeholder. Any extra content after the non-text is
  overlaid on top of the preceding content, rather than dropping it
  completely (as currently happens for the two-column layout).

  + Accept straightforward test changes

    Adding the new layout means the “-deleted-layouts” tests have an
    additional layout added to the master and master rels.

  + Add new tests for the comparison layout
  + Add new tests to pandoc.cabal

- Support “Content with Caption” slide layout

  This layout is used when a slide’s body contains some text, followed by
  non-text (e.g. an image or a table). Before now, in this case the image
  or table would break onto a new slide: to get that output again, users
  can add a horizontal rule before the image or table.

  + Accept straightforward tests

    The “-deleted-layouts” tests all have an extra layout and relationship
    in the master for the Content with Caption layout.

  + Accept remove-empty-slides test

    Empty slides are still removed, but the Content with Caption layout is
    now used.

  + Change slide-level-0/h1-h2-with-text description

    This test now triggers the content with caption layout, giving a
    different (but still correct) result.

  + Add new tests for the new layout
  + Add new tests to the cabal file

- Support “Blank” slide layout

  This layout is used when a slide contains only blank content (e.g.
  non-breaking spaces). No content is inserted into any placeholders in
  the layout.

  Fixes #5097.

  + Accept straightforward test changes

    Blank layout now copied over from reference doc as well, when
    layouts have been deleted.

  + Add some new tests

    A slide should use the blank layout if:

    - It contains only speaker notes
    - It contains only an empty heading with a body of nbsps
    - It contains only a heading containing only nbsps

- Change ContentType -> Placeholder

  This type was starting to have a constructor for each placeholder on
  each slide (e.g. `ComparisonUpperLeftContent`). I’ve changed it
  instead to identify a placeholder by type and index, as I think that’s
  clearer and less redundant.

- Describe layout-choosing logic in manual
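A simplified sketch of the Blank test, assuming a slide reduces to heading text, body text and speaker notes. Treating ordinary whitespace as blank alongside U+00A0 is an assumption; the `Slide` type and function names are hypothetical.

```haskell
-- Hypothetical simplified slide.
data Slide = Slide
  { slideHeading :: String
  , slideBody    :: [String]
  , slideNotes   :: [String]
  }

-- Text is blank if it holds only spaces, tabs, newlines or
-- non-breaking spaces (U+00A0).
isBlankText :: String -> Bool
isBlankText = all (`elem` " \t\n\x00A0")

-- Speaker notes are ignored, so a slide with only notes is still blank.
usesBlankLayout :: Slide -> Bool
usesBlankLayout s =
  isBlankText (slideHeading s) && all isBlankText (slideBody s)
```

This covers the three test cases listed above: notes only, an empty heading with a body of nbsps, and a heading of only nbsps.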
2021-09-01 07:16:17 -07:00
Emily Bourke
8dbea49092 pptx: Restructure tests
- Use dashes consistently rather than underscores
- Make a folder for each set of tests
- List test files explicitly (Cabal doesn’t support ** until version
  2.4)
2021-09-01 07:16:17 -07:00
John MacFarlane
0617d3a88b Hlint: ignore "Use void." 2021-08-30 09:38:13 -07:00
Jeroen de Haas
7d91ff28ac Do not leak working directory in TikZ filter 2021-08-30 08:35:35 -07:00
John MacFarlane
5dcd4610e2 Improve asciidoc escaping for -- in URLs. Closes #7529. 2021-08-29 10:12:20 -07:00
John MacFarlane
6180d42434 Add more potential threats to security section of manual. 2021-08-28 22:31:42 -07:00
John MacFarlane
d6d7c9620a Add --sandbox option.
+ Add sandbox feature for readers.  When this option is used,
  readers and writers only have access to input files (and
  other files specified directly on command line).  This restriction
  is enforced in the type system.
+ Filters, PDF production, custom writers are unaffected.  This
  feature only insulates the actual readers and writers, not
  the pipeline around them in Text.Pandoc.App.
+ Note that when `--sandbox` is specified, readers won't have
  access to the resource path, nor will anything have access to
  the user data directory.
+ Add module Text.Pandoc.Class.Sandbox, defining
  `sandbox`.  Exported via Text.Pandoc.Class. [API change]

Closes #5045.
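A toy model of what "enforced in the type system" can mean. Code running in the `Sandbox` monad below can only look up files supplied up front, because the type provides no way to run arbitrary IO. `Sandbox`, `readFileSandboxed` and `loadOrPlaceholder` are hypothetical and far simpler than `Text.Pandoc.Class.Sandbox`.

```haskell
import Data.Maybe (fromMaybe)

-- Hypothetical sandbox: a computation that can only read from a fixed
-- list of preloaded (path, contents) pairs.
newtype Sandbox a = Sandbox { runSandbox :: [(FilePath, String)] -> a }

instance Functor Sandbox where
  fmap f (Sandbox g) = Sandbox (f . g)

instance Applicative Sandbox where
  pure x = Sandbox (const x)
  Sandbox f <*> Sandbox g = Sandbox (\env -> f env (g env))

instance Monad Sandbox where
  Sandbox g >>= k = Sandbox (\env -> runSandbox (k (g env)) env)

-- The only file access available: look up a preloaded file.
readFileSandboxed :: FilePath -> Sandbox (Maybe String)
readFileSandboxed path = Sandbox (lookup path)

-- Example "reader" step: files not supplied up front are unavailable.
loadOrPlaceholder :: FilePath -> Sandbox String
loadOrPlaceholder path = do
  contents <- readFileSandboxed path
  pure (fromMaybe "<not available in sandbox>" contents)
```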
2021-08-28 22:31:42 -07:00
John MacFarlane
b76796eae8 Remove unneeded import. 2021-08-28 12:44:03 -07:00
John MacFarlane
51caa8b78d Docx writer: handle SVG images.
This change has several parts:

- In Text.Pandoc.App, if the writer is docx, we fill the media
  bag and attempt to convert any SVG images to PNG, adding these
  to the media bag.  The PNG backups have the same filenames as
  the SVG images, but with an added .png extension.  If the conversion
  cannot be done (e.g. because rsvg-convert is not present),
  a warning is emitted.
- In Text.Pandoc.Writers.Docx, we now use Word 2016's syntax for
  including SVG images. If a PNG fallback is present in the media bag,
  we include a link to that too.
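A sketch of the fallback naming rule described above. The helper names are hypothetical, and matching the `.svg` extension case-insensitively is an assumption.

```haskell
import Data.Char (toLower)
import Data.List (isSuffixOf)

-- Hypothetical: does this media path look like an SVG?
isSvgPath :: FilePath -> Bool
isSvgPath path = ".svg" `isSuffixOf` map toLower path

-- The PNG backup keeps the SVG's filename and appends ".png".
pngFallbackPath :: FilePath -> Maybe FilePath
pngFallbackPath path
  | isSvgPath path = Just (path ++ ".png")
  | otherwise      = Nothing

-- Given media bag paths, list the (svg, png) pairs to convert.
fallbacksToCreate :: [FilePath] -> [(FilePath, FilePath)]
fallbacksToCreate paths =
  [ (p, png) | p <- paths, Just png <- [pngFallbackPath p] ]
```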

It would be helpful if someone with an old Word version could test
to see that the documents we produce can be opened and viewed with
the PNG fallbacks.  If not, then perhaps we can eliminate the
slightly complex code for producing these fallbacks.

Closes #4058.
2021-08-28 12:16:14 -07:00
John MacFarlane
4d7cdc4671 Image: Generalize svgToPng to MonadIO. 2021-08-27 22:27:01 -07:00
John MacFarlane
eb7ed27f3f Add haddock for dpi parameter. 2021-08-27 16:35:58 -07:00
John MacFarlane
7d0db79003 T.P.Image: svgToPng, change first parameter from WriterOptions to Int.
The information we need is just a DPI, so why require more?
2021-08-27 16:25:50 -07:00
Emily Bourke
8e5a79f264 pptx: Make first heading title if slide level is 0
Before this commit, the pptx writer added a slide break before any table,
“columns” div, or paragraph starting with an image, unless the only
thing before it on the same slide was a heading at the slide level. In
that case, the item and heading were kept on the same slide, and the
heading was used as the slide title (inserted into the layout’s “title”
placeholder).

However, if the slide level is set to 0 (as was recently enabled) this
makes it impossible to have a slide with a title which contains any of
those items in its body.

This commit changes this behaviour: now if the slide level is 0, then
items will be kept with a heading of any level, provided the heading is the
only thing before the item on the same slide.
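The new rule reduces to a small predicate, sketched here under the assumption that the heading is the only thing before the item on the slide. `keepWithHeading` is a hypothetical name.

```haskell
-- Hypothetical: should a table, columns div or image paragraph stay on
-- the same slide as the heading immediately before it?
keepWithHeading :: Int -> Int -> Bool
keepWithHeading slideLevel headingLevel
  | slideLevel == 0 = True                        -- any heading level
  | otherwise       = headingLevel == slideLevel  -- only slide-level headings
```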
2021-08-27 09:47:03 -07:00
John MacFarlane
e4d7a6177f Ensure we have unique ids for wp:docPr and pic:cNvPr elements.
This will, I hope, fix #7527 and #7503.
2021-08-27 09:42:59 -07:00
William Lupton
af9d464cee Clarify 'attributes' extension support 2021-08-27 09:14:27 -07:00
John MacFarlane
c29a72ffe7 Comment out unused module. 2021-08-24 23:27:59 -07:00