Commit graph

1963 commits

Author SHA1 Message Date
John MacFarlane
ccb9db34f8 Add test for #7738. 2021-12-07 23:59:59 -08:00
John MacFarlane
51142c6803 Ipynb reader & writer: properly handle cell "id".
This is passed through if it exists (in Nb4); otherwise
the writer will add a random one so that cells all have
an "id".

Closes #7728.
2021-12-06 23:40:51 -08:00
John MacFarlane
51f6f0e3a1 Improve Markdown writer escaping.
This fixes escaping for '#' in particular.
Closes #7726.
2021-12-03 17:52:47 -08:00
John MacFarlane
619dfa2a2a Markdown reader: don't allow ^ at beginning of link or image label.
This is reserved for footnotes.
Fixes a regression introduced by 0a93acf.

Closes #7723.
2021-11-30 12:53:54 -08:00
Albert Krewinkel
fa838deefc
Lua: remove pandoc.utils.text (#7720)
The new `pandoc.Inlines` function behaves identical on string input, but
allows other Inlines-like arguments as well.

The `pandoc.utils.text` function could be written as

    function pandoc.utils.text (x)
      assert(type(x) == 'string')
      return pandoc.Inlines(x)
    end
2021-11-29 09:12:30 -08:00
Albert Krewinkel
b9222e5cb1
Lua: add constructors pandoc.Blocks and pandoc.Inlines
The functions convert their argument into a list of Block and Inline
values, respectively.
2021-11-28 16:02:42 +01:00
Albert Krewinkel
3692a1d1e8
Lua: use package pandoc-lua-marshal (#7719)
The marshaling functions for pandoc's AST are extracted into a separate
package. The package comes with a number of changes:

  - Pandoc's List module was rewritten in C, thereby improving error
    messages.

  - Lists of `Block` and `Inline` elements are marshaled using the new
    list types `Blocks` and `Inlines`, respectively. These types
    currently behave identical to the generic List type, but give better
    error messages. This also opens up the possibility of adding
    element-specific methods to these lists in the future.

  - Elements of type `MetaValue` are no longer pushed as values which
    have `.t` and `.tag` properties. This was already true for
    `MetaString` and `MetaBool` values, which are still marshaled as Lua
    strings and booleans, respectively. Affected values:

      + `MetaBlocks` values are marshaled as a `Blocks` list;

      + `MetaInlines` values are marshaled as a `Inlines` list;

      + `MetaList` values are marshaled as a generic pandoc `List`s.

      + `MetaMap` values are marshaled as plain tables and no longer
        given any metatable.

  - The test suite for marshaled objects and their constructors has
    been extended and improved.

  - A bug in Citation objects, where setting a citation's suffix
    modified it's prefix, has been fixed.
2021-11-27 17:08:01 -08:00
John MacFarlane
0d25232bbf LaTeX reader: Fix semantics of \ref.
We were including the ams environment type in addition
to the number. This is proper behavior for `\cref` but
not for `\ref`.  To support `\cref` we need to store
the environment label separately.
2021-11-24 19:48:56 -08:00
John MacFarlane
7726b69cd3 LaTeX reader: omit visible content for \label{...}.
Previously we included the text of the label in square brackets,
but this is undesirable in many cases.

See discussion in
<https://github.com/jgm/pandoc/issues/813#issuecomment-978232426>.
2021-11-24 14:47:00 -08:00
John MacFarlane
6072bdcec9 HTML reader: parse attributes on links and images.
Closes #6970.
2021-11-24 11:01:55 -08:00
Albert Krewinkel
a8638894ab
Lua: allow single elements as singleton MetaBlocks/MetaInlines
Single elements should always be treated as singleton lists in the Lua
subsystem.
2021-11-24 16:54:12 +01:00
John MacFarlane
79e6f8db13 Improve detection of pipe table line widths.
Fixed calculation of maximum column widths in pipe tables.
It is now based on the length of the markdown line, rather
than a "stringified" version of the parsed line.  This should
be more predictable for users. In addition, we take into account
double-wide characters such as emojis.

Closes #7713.
2021-11-23 13:29:25 -08:00
Albert Krewinkel
bffd74323c
Lua: add function pandoc.utils.text (#7710)
The function converts a string to `Inlines`, treating interword spaces
as `Space`s or `SoftBreak`s. If you want a `Str` with literal spaces,
use `pandoc.Str`.

Closes: #7709
2021-11-23 09:32:53 -08:00
Albert Krewinkel
0c0945b93c
Lua: split strings into words when treating them as Inline list (#7712)
Using a Lua string where a list of inlines is expected will cause the
string to be split into words, replacing spaces and tabs into
`pandoc.Space()` elements and newlines into `pandoc.SoftBreak()`.

The previous behavior was to treat the string `s` as `{pandoc.Str(s)}`.
The old behavior can be recovered by wrapping the string into a table
`{s}`.
2021-11-23 09:30:48 -08:00
Albert Krewinkel
96a4bbe264
Capture alt-text in JATS figures (#7703)
Co-authored-by: Aner Lucero <4rgento@gmail.com>
2021-11-20 09:48:01 -08:00
John MacFarlane
db9a73c842 Lua tests: reset path and cpath when testing 'require' fallback. 2021-11-19 21:55:14 -08:00
John MacFarlane
4f2eac88aa MediaWiki writer: fix code for generating spans for header IDs.
We need to generate a span when the header's ID doesn't match
the one MediaWiki would generate automatically.  But MediaWiki's
generation scheme is different from ours (it uses uppercase letters,
and `_` instead of `-`, for example).

This means that in going from markdown -> mediawiki, we'll now get
spans before almost every heading, unless explicit identifiers are
used that correspond to the ones MediaWiki auto-generates.
This is uglier output but it's necessary for internal links to
work properly.

See #7697.
2021-11-19 09:05:19 -08:00
John MacFarlane
df5ae1c186 HTML writer: Don't create invalid data- attribute...
for empty attribute key. (It would be better to make these
unrepresentable in the type system, but for now this is
an improvement.)

Closes #7546.
2021-11-19 08:50:18 -08:00
John MacFarlane
25bba0cc62 MediaWiki writer: use HTML spans for anchors when header has id.
Closes #7697.
2021-11-18 21:15:51 -08:00
willj-dev
005dc7ce56
RST reader: handle class attribute for for custom roles (#7700)
Previously the class attribute was ignored, and the name of the role used as the class.
Closes #7699.
2021-11-18 17:33:57 -08:00
Albert Krewinkel
cd91f72843
Lua: set lpeg, re as globals; allow shared lib access via require
The `lpeg` and `re` modules are loaded into globals of the respective
name, but they are not necessarily registered as loaded packages. This
ensures that

- the built-in library versions are preferred when setting the globals,
- a shared library is used if pandoc has been compiled without `lpeg`,
  and
- the `require` mechanism can be used to load the shared library if
  available, falling back to the internal version if possible and
  necessary.
2021-11-17 10:03:04 +01:00
John MacFarlane
c19f063420 Markdown writer: don't create autolinks when this loses information.
Previously we sometimes lost attributes when rendering links as autolinks.

Closes #7692.
2021-11-15 11:06:50 -08:00
Albert Krewinkel
ea268fd8a7
LaTeX reader: add rudimentary support for \autoref (#7693) 2021-11-15 09:40:50 -08:00
Albert Krewinkel
96a01451ef
JATS writer: ensure figures are wrapped with <p> in list items.
This prevents the generation of invalid output.
2021-11-12 13:29:08 +01:00
Christian Despres
abdfefebdf
Writers.Shared: Improve toLegacyTable.
Closes #7683.
(PR #7684)
2021-11-11 20:55:37 -08:00
Albert Krewinkel
d4c73d5e65
Lua: fix argument order in constructor pandoc.Cite.
This restores the old behavior; argument order had been switched
accidentally in pandoc 2.15.
2021-11-09 14:45:36 +01:00
John MacFarlane
0a45f2600a Properly handle commented lines in BibTeX/BibLaTeX.
Closes #7668.
2021-11-08 10:15:53 -08:00
Rowan Rodrik van der Molen
40aa74badc Add <titleabbr> support to DocBook reader 2021-11-08 07:30:20 -08:00
Albert Krewinkel
ab0fe676a8
Lua: ensure that 're' module is always available.
The module is shipped with LPeg.
2021-11-08 12:22:33 +01:00
John MacFarlane
cc46667953 LaTeX reader: add 'uri' class when parsing \url.
Closes #7672.
2021-11-07 21:08:20 -08:00
Albert Krewinkel
6b462e5933 Lua: allow to pass custom reader options to pandoc.read
Reader options can now be passed as an optional third argument to
`pandoc.read`. The object can either be a table or a ReaderOptions value
like `PANDOC_READER_OPTIONS`. Creating new ReaderOptions objects is
possible through the new constructor `pandoc.ReaderOptions`.

Closes: #7656
2021-11-06 09:04:29 -07:00
Rowan Rodrik van der Molen
7a70a46c03
Support for <indexterm>s when reading DocBook (#7607)
* Support for <indexterm>s when reading DocBook
* Update implementation status of `<n-ary>` tags
* Remove non-idiomatic parentheses
* More complete `<indexterm>` support, with tests

Co-authored-by: Rowan Rodrik van der Molen <rowan@ytec.nl>
2021-11-05 10:22:38 -07:00
John MacFarlane
938d557844 Docx reader: don't let first line indents trigger block quotes.
This fixes a regression introduced in pandoc 2.15 by PR #7606.
Closes #7655.
2021-11-02 14:04:38 -07:00
Albert Krewinkel
45bcd7d3f1
Lua: fix typo in SoftBreak constructor 2021-11-02 21:53:08 +01:00
Albert Krewinkel
dbc654e4a7
Lua tests: ensure Inline elements have all expected properties 2021-11-02 21:40:37 +01:00
Albert Krewinkel
421fd736d4
Lua: re-add content property to Strikeout elements
Fixes a regression introduced in 2.15.
2021-11-02 21:22:59 +01:00
Albert Krewinkel
cce49c5d4b
Lua: be more forgiving when retrieving the Image caption property
Fixes a regression introduced in 2.15.
2021-11-02 17:40:07 +01:00
Albert Krewinkel
b26f950cca
Lua: display Attr values using their native Haskell representation 2021-11-02 17:25:47 +01:00
Albert Krewinkel
c467f0fed1
Lua: allow omitting the 2nd parameter in pandoc.Code constructor
Fixes a regression introduced in 2.15 which required users to always
specify an Attr value when constructing a Code element.
2021-11-02 17:23:24 +01:00
Albert Krewinkel
210e4c98b0
Lua: allow to compare, show Citation values
Comparisons of Citation values are performed in Haskell; values are
equal if they represent the same Haskell value. Converting a Citation
value to a string now yields its native Haskell string representation.
2021-11-02 16:49:50 +01:00
Albert Krewinkel
3a0ac52f7b
Lua tests: ensure Block elements have expected properties 2021-11-02 10:23:14 +01:00
Albert Krewinkel
759aa50951
Lua: restore content property on Header elements 2021-11-01 15:43:51 +01:00
Albert Krewinkel
96e76d4cd4
Lua: restore List behavior of MetaList
Fixes a regression introduced in 2.16 which had MetaList elements loose
the `pandoc.List` properties.

Fixes #7650
2021-11-01 08:27:14 +01:00
Albert Krewinkel
3de8f4fdc5
Lua: re-add content property to Link elements
This was a regression introduced in version 2.15.

Fixes: #7647
2021-10-31 11:15:50 +01:00
Tristan Stenner
2444cbc668 Docx writer: add IDs to native_numbering test 2021-10-29 08:40:20 -07:00
Tristan Stenner
31d6a494de Update test golden master for docx native numbering 2021-10-29 08:40:20 -07:00
Albert Krewinkel
f4d9b443d8
Lua: use hslua module abstraction where possible
This will make it easier to generate module documentation in the future.
2021-10-29 17:08:30 +02:00
Albert Krewinkel
e1cf0ad1be
Lua: fix placement of tests for Block elements in pandoc module tests 2021-10-28 14:10:54 +02:00
Albert Krewinkel
7fcf1d6184
Lua: re-add t and tag property to Attr values
Removal of these properties from Attr values was a regression.
2021-10-27 22:32:19 +02:00
John MacFarlane
d226a35c0a Switch back from HsYAML to yaml.
Reasons:

- Performance: HsYAML is around 20 times slower in parsing
  large YAML bibliographies (#6084).
- An issue was submitted to HsYAML, but it hasn't gotten
  any attention.  HsYAML seems borderline unmaintained; it hasn't
  had a commit in over a year.
- Unfortunately this goes back on our attempts to free ourselves
  from C dependencies (#4535).  But I don't see a better alternative
  until a better pure Haskell parser is available.

Closes #6084.

Notes:

- We've removed the FromYAML instances for all types that had
  them, since this is a HsYAML-specific typeclass [API change].
  (The yaml package just uses From/ToJSON.)
- Unlike HsYAML (in the configuration we were using), yaml
  parses 'Y', 'N', 'Yes', 'No', 'On', 'Off' as boolean values.
  Users may need to quote these when they are meant to be
  interpreted as strings.  Similarly, 'null' is parsed as
  a YAML null value (and will be treated as an empty string
  by pandoc rather than the string 'null').  Quoting it will
  force it to be interpreted as a string.
- Some tests had to be adjusted accordingly.
- Pandoc now behaves better when the YAML metadata contains
  escaping errors: instead of just falling back on treating
  the section as a table, it raises a YAML parsing error.
2021-10-27 12:50:51 -07:00
Albert Krewinkel
b95e864ecf
Lua: marshal SimpleTable values as userdata objects 2021-10-26 21:45:16 +02:00
Albert Krewinkel
a493c7029c
Lua: marshal Block values as userdata objects
Properties of Block values are marshalled lazily, which generally
improves performance considerably. Script users may also notice the
following differences:

  - Block element properties can no longer be accessed by numerical
    indexing of the `.c` field. The `.c` property now serves as an alias
    for `.content`, so some filter that used this undocumented method
    for property access may continue to work, while others will need to
    be updated and use proper property names.

  - The marshalled Block elements now have a `show` method, and a
    `__tostring` metamethod. Both return the Haskell string
    representation of the element.

  - Block values now have the Lua type `userdata` instead of `table`.
2021-10-26 14:40:10 +02:00
Albert Krewinkel
230b133db5
Lua: marshal Citation values as userdata objects 2021-10-25 09:08:58 +02:00
John MacFarlane
113a66bd08 Fix more epub files in epub reader tests.
Closes #7586.
2021-10-24 15:24:59 -07:00
John MacFarlane
73a105f59c Clean up wasteland.epub and formatting.epub from reader tests.
Make them valid according to epubcheck.
2021-10-24 15:11:02 -07:00
John MacFarlane
c282451fb2 Fixed test/epub/img.epub and img_no_cover.epub...
so they're valid epubs.
2021-10-24 14:13:04 -07:00
John MacFarlane
6df9dc1262 Fix conformance errors in test/epub/features.epub and
test/epub/formatting.epub.  See #7586.
2021-10-23 23:23:22 -07:00
John MacFarlane
c712d13b67 Org reader: allow an initial :PROPERTIES: drawer to add to metadata.
Closes #7520.
2021-10-22 22:10:25 -07:00
Albert Krewinkel
8523bb01b2 Lua: marshal Attr values as userdata
- Adds a new `pandoc.AttributeList()` constructor, which creates the
  associative attribute list that is used as the third component of
  `Attr` values. Values of this type can often be passed to constructors
  instead of `Attr` values.

- `AttributeList` values can no longer be indexed numerically.
2021-10-22 11:16:51 -07:00
Albert Krewinkel
e4287e6c95 Lua: marshal Pandoc values as userdata 2021-10-22 11:16:51 -07:00
Albert Krewinkel
9e74826ba9 Switch to hslua-2.0
The new HsLua version takes a somewhat different approach to marshalling
and unmarshalling, relying less on typeclasses and more on specialized
types. This allows for better performance and improved error messages.

Furthermore, new abstractions allow to document the code and exposed
functions.
2021-10-22 11:16:51 -07:00
John MacFarlane
0a93acf91a Markdown reader: don't parse links or bracketed spans as citations.
Previously pandoc would parse

    [link to (@a)](url)

as a citation; similarly

    [(@a)]{#ident}

This is undesirable.  One should be able to use example references
in citations, and even if `@a` is not defined as an example
reference, `[@a](url)` should be a link containing an author-in-text
citation rather than a normal citation followed by literal `(url)`.

Closes #7632.
2021-10-20 10:34:47 -07:00
Milan Bracke
465c28d28e Docx reader: fix handling of empty fields
Some fields only have an instrText and no content, Pandoc didn't
understand these, causing other fields to be misunderstood because it
seemed like a field was still open when it wasn't.
2021-10-18 19:15:40 -07:00
Milan Bracke
6acc82c5d2 Docx parser: implement PAGEREF fields
These fields, often used in tables of contents, can be a hyperlink.
2021-10-18 19:15:40 -07:00
Milan Bracke
193f6bfeba Docx reader: fix handling of nested fields
Fields delimited by fldChar elements can contain other fields. Before,
the nested fields would be ignored, except for the end, which would be
considered the end of the parent field.

To fix this issue, fields needed to be considered containing ParParts
instead of Runs, since a Run can't represent complex enough structures.
This also impacted Hyperlinks since they can originate from a field.
2021-10-18 19:15:40 -07:00
Emily Bourke
8de261ba4e pptx: Line up continuation paragraphs
This commit changes the `marL` and `indent` values used for plain
paragraphs and numbered lists, and changes the spacing defined in the
reference doc master for bulleted lists.

For paragraphs, there is now a left-indent taken from the `otherStyle`
in the master. For numbered lists, the number is positioned where the
text would be if this were a plain paragraph, and the text is indented
to the next level. This means that continuation paragraphs line up
nicely with numbered lists.

It also /mostly/ matches the observed PowerPoint behaviour when
inserting paragraphs and numbered lists: the only difference is that
PowerPoint was using a different margin value for the first level
numbered lists – I’ve changed this to match the other levels, as I don’t
think it makes the spacing unappealing and it allows continuation
paragraphs at any level to line up.

With bulleted lists, I’m keeping the observed PowerPoint behaviour of
specifying only a level, letting `marL` and `indent` be automatically
taken from `bodyStyle`. To that end, this commit changes the `bodyStyle`
spacing in the master of the default reference doc, to:

- line up the text of the first paragraph in each bullet with any
  continuation paragraphs
- line up nested bullet markers in any continuation paragraphs with the
  first paragraph, matching lists and plain paragraphs

This does mean the continuation paragraphs still won’t line up for
anyone using their own reference doc where they haven’t matched the
`otherStyle` and `bodyStyle` indent levels, but I think people in that
situation will be able to troubleshoot.
2021-10-17 17:24:30 -07:00
Emily Bourke
8af15ab345 pptx: Fix list level numbering
In PowerPoint, the content of a top-level list is at the same level as
the content of a top-level paragraph – the only difference is that a
list style has been applied.

At the moment, the pptx writer increments the paragraph level on each
list, turning what should be top-level lists into second-level lists.

This commit changes that logic, only incrementing the paragraph level on
continuation paragraphs of lists.

- Fixes https://github.com/jgm/pandoc/issues/4828
- Fixes https://github.com/jgm/pandoc/issues/4663
2021-10-17 17:24:30 -07:00
John MacFarlane
3f489bcb58 Ensure that babel is loaded also with pdflatex.
This fixes a regression in #7604, which modernized
babel usage but omitted to load babel for pdflatex,
with the result that even simple documents could no
longer be produced.

Closes #7627.
2021-10-16 23:34:53 -07:00
Samuel Tardieu
a41c1fe0bb asciidoc writer: translate numberLines attribute to linesnum switch
AsciiDoctor allows to request line numbering on code blocks by
using a switch on the `source` block, such as in:

```
[source%linesnum,haskell]
----
some Haskell code here
----
```
2021-10-14 13:41:12 -07:00
Samuel Tardieu
628cde48cf DocBook reader: honor linenumbering attribute
The attribute DocBook linenumbering="numbered" attribute on code blocks
maps to "numberLines" internally.
2021-10-14 09:04:56 -07:00
John MacFarlane
49c4e1d014 Fix markdown parsing bug for math in bracketed spans and links.
This affects math with unbalanced brackets (e.g. `$(0,1]$`)
inside links, images, bracketed spans.

Closes #7623.
2021-10-13 08:59:37 -07:00
John MacFarlane
4ba4533d70 Update wasteland tests.
When we trimmed it down we left out some notes.
2021-10-11 09:09:51 -07:00
Milan Bracke
0f98cbff4b Avoid blockquote when parent style has more indent
When a paragraph has an indentation different from the parent (named)
style, it used to be considered a blockquote. But this only makes sense
when the paragraph has more indentation. So this commit adds a check
for the indentation of the parent style.
2021-10-10 16:27:32 -07:00
John MacFarlane
c72277e986 LaTeX reader: Properly handle \^ followed by group closing.
Closes #7615.
2021-10-10 11:24:28 -07:00
Emily Bourke
aa78765bf9 pptx: Remove excessive layout tests
When I added the tests for moved layouts and deleted layouts, I added
them to all tests. However, this doesn’t really give a lot more info
than having single tests, and the extra tests take up time and disk
space.

This commit removes the moved-layouts and deleted-layouts tests, in
favour of a single test for each of those scenarios.
2021-10-07 08:45:43 -07:00
John MacFarlane
b8d460eeab Powerpoint writer: consolidate text runs when possible.
This slims down the output files by avoiding unnecessary
text run elements.

Updated golden tests.
2021-10-04 12:24:12 -07:00
John MacFarlane
11baeb8850 OOXML tests: use pretty-printed form to display diffs.
Otherwise everything is on one line and the diff is uninformative.
2021-10-04 12:12:16 -07:00
John MacFarlane
82d587493d Revert "Powerpoint writer: consolidate text run nodes."
This reverts commit 62f83aa486.

This was already being done, it seems.
I misidentified the problem; it is really with `Str ""` nodes.
2021-10-04 11:50:32 -07:00
John MacFarlane
62f83aa486 Powerpoint writer: consolidate text run nodes.
This should reduce the size of the generated files.
2021-10-04 11:45:01 -07:00
John MacFarlane
0088e79cf5 Update tests for babel-related changes in latex template. 2021-10-03 19:27:37 -07:00
John MacFarlane
6ff04ac52d Fix compareXML helper in Tests.Writers.OOXML.
Given how it is used, we were getting "mine" and "good"
flipped in the test results.
2021-10-02 06:52:40 -07:00
Ezwal
472b33095e Docx reader: Add placeholder for word diagram 2021-09-30 12:44:44 -07:00
John MacFarlane
92abe45863 Further test updates for switch to pretty-show. 2021-09-29 08:28:54 -07:00
John MacFarlane
0bdcf415e4 Switch from pretty-simple to pretty-show for native output.
Update tests.

Reason:  it turns out that the native output generated by
pretty-simple isn't always readable by the native reader.
According to https://github.com/cdepillabout/pretty-simple/issues/99
it is not a design goal of the library that the rendered values
be readable using 'read'.  This makes it unsuitable for our
purposes.

pretty-show is a bit slower and it uses 4-space indents
(non-configurable), but it doesn't have this serious drawback.
2021-09-28 21:17:53 -07:00
John MacFarlane
665e6d3d94 BibTeX parser: fix expansion of special strings in series...
e.g. `newseries` or `library`.  Expansion should not happen
when these strings are protected in braces, or when they're
capitalized.

Closes #7591.
2021-09-23 22:21:05 -07:00
John MacFarlane
aa89f6be18 HTML reader: handle empty tbody element in table.
Closes #7589.
2021-09-23 09:25:37 -07:00
John MacFarlane
3de0d5a977 test/epub: an excerpt from The Wasteland is enough!
Saves over 100K.
2021-09-21 18:54:42 -07:00
John MacFarlane
c6fa0bc271 Revert "Remove unused epub test file features.epub."
This reverts commit 83ebb85b64.
2021-09-21 17:03:25 -07:00
John MacFarlane
a88b0098d6 Make test/epub/wasteland.epub valid. 2021-09-21 16:49:58 -07:00
John MacFarlane
83ebb85b64 Remove unused epub test file features.epub. 2021-09-21 16:42:33 -07:00
John MacFarlane
c266734448 Use pretty-simple to format native output.
Previously we used our own homespun formatting.  But this
produces over-long lines that aren't ideal for diffs in tests.
Easier to use something off-the-shelf and standard.

Closes #7580.

Performance is slower by about a factor of 10, but this isn't
really a problem because native isn't suitable as a serialization
format. (For serialization you should use json, because the reader
is so much faster than native.)
2021-09-21 12:37:42 -07:00
John MacFarlane
5f7e7f539a Add missing % on command tests.
This prevented `--accept` from working properly.
2021-09-21 10:42:24 -07:00
John MacFarlane
a1ca51c979 Command tests: raise error if command doesn't begin with %. 2021-09-21 10:42:14 -07:00
John MacFarlane
dd7b83ac91 Use babel, not polyglossia, with xelatex.
Previously polyglossia worked better with xelatex, but
that is no longer the case, so we simplify the code so that
babel is used with all latex engines.

This involves a change to the default LaTeX template.
2021-09-19 09:40:59 -07:00
Emily Bourke
50adea220d pptx: Support footers in the reference doc
In PowerPoint, it’s possible to specify footers across all slides,
containing a date (optionally automatically updated to today’s date),
the slide number (optionally starting from a higher number than 1), and
static text. There’s also an option to hide the footer on the title
slide.

Before this commit, none of that footer content was pulled through from
the reference doc: this commit supports all the functionality listed
above.

There is one behaviour which may not be immediately obvious: if the
reference doc specifies a fixed date (i.e. not automatically updating),
and there’s a date specified in the metadata for the document, the
footer date is replaced by the metadata date.

- Include date, slide number, and static footer content from reference
  doc
- Respect “slide number starts from” option
- Respect “Don’t show on title slide” option
- Add tests
2021-09-18 09:55:45 -07:00
John MacFarlane
57d93cca56 Org writer: don't indent contents of code blocks.
We previously indented them by two spaces, following a
common convention.  Since the convention is fading, and
the indentation is inconvenient for copy/paste, we are
discontinuing this practice.

Closes #5440.
2021-09-17 09:41:34 -07:00
John MacFarlane
a07d955d6f Fix code blocks using --preserve-tabs.
Previously they did not behave as the equivalent input
with spaces would.  Closes #7573.
2021-09-16 20:46:05 -07:00
Emily Bourke
7c22c0202e pptx: Support specifying slide background images
In the reveal-js output, it’s possible to use reveal’s
`data-background-image` class on a slide’s title to specify a background
image for the slide.

With this commit, it’s possible to use `background-image` in the same
way for pptx output. Only the “stretch” mode is supported, and the
background image is centred around the slide in the image’s larger axis,
matching the observed default behaviour of PowerPoint.

- Support `background-image` per slide.
- Add tests.
- Update manual.
2021-09-16 19:45:53 -07:00
Emily Bourke
0fb6474a55 pptx: Add support for incremental lists
- Support -i option
- Support incremental/noincremental divs
- Support older block quote syntax
- Add tests

One thing not clear from the manual is what should happen when the input
uses a combination of these things. For example, what should the
following produce?

```md
::: {.incremental .nonincremental}
- are
- these
- incremental?
:::

::: incremental
::::: nonincremental
- or
- these?
:::::
:::

::: nonincremental
> - how
> - about
> - these?
:::
```

In this commit I’ve taken the following approach, matching the observed
behaviour for beamer and reveal.js output:

- if a div with both classes, incremental wins
- the innermost incremental/nonincremental div is the one which takes
  effect
- a block quote containing a list as its first element inverts whether
  the list is incremental, whether or not the quote is inside an
  incremental/non-incremental div

I’ve added some tests to verify this behaviour.

This commit closes issue #5689
(https://github.com/jgm/pandoc/issues/5689).
2021-09-15 09:13:05 -07:00
John MacFarlane
a3162d341b RST reader: handle escaped colons in reference definitions.
Cloess #7568.
2021-09-13 22:57:08 -07:00
Emily Bourke
0ebe65e651 pptx: Fix logic for choosing Comparison layout
There was a mistake in the logic used to choose between the Comparison
and Two Content layouts: if one column contained only non-text (an image
or a table) and the other contained only text, the Comparison layout was
chosen instead of the desired Two Content layout.

This commit fixes that logic:

> If either column contains text followed by non-text, use Comparison.
  Otherwise, use Two Content.

It also adds a test asserting this behaviour.
2021-09-13 08:30:36 -07:00
John MacFarlane
6271b09c50 Docx writer: make id used in native_numbering predictable.
If the image has the id IMAGEID, then we use the id ref_IMAGEID
for the figure number.  Closes #7551.

This allows one to create a filter that adds a figure number
with figure name, e.g.

     <w:fldSimple w:instr=" REF ref_superfig "><w:r><w:t>Figure X</w:t></w:r></w:fldSimple>

For this to be possible it must be possible to predict the
figure number id from the image id.

If images lack an id, an id of the form `ref_fig1` is used.
2021-09-12 15:30:29 -07:00
Emily Bourke
2b98991551 pptx: Include all themes in output archive
- Accept test changes: they’re adding the second theme (for all tests
  not containing speaker notes), or changing its position in the
  XML (for the ones containing speaker notes).
2021-09-10 17:06:45 -07:00
Emily Bourke
8ec9b884f1 pptx: Fix capitalisation of notesMasterId
I don’t think this has caused any problems, but before now it’s been
"NotesMasterId", which is incorrect according to [ECMA-376].

[ECMA-376]: https://www.ecma-international.org/publications-and-standards/standards/ecma-376/
2021-09-10 17:06:45 -07:00
John MacFarlane
8beca46611 Fix command test for #7557. 2021-09-10 12:07:11 -07:00
John MacFarlane
0216a2f504 Org reader: don't parse a list as first item in a list item.
Closes #7557.
2021-09-10 09:50:05 -07:00
Francesco Mazzoli
99a4d1d0b0
Support --reference-location for HTML output (#7461)
The HTML writer now supports `EndOfBlock`, `EndOfSection`, and
`EndOfDocument` for reference locations.  EPUB and HTML slide
show formats are also affected by this change.

This works similarly to the markdown writer, but with special care
taken to skipping section divs with what regards to the block level.

The change also takes care to not modify the output if `EndOfDocument`
is used.
2021-09-10 09:30:05 -07:00
John MacFarlane
b185560a8e RTF reader: better handling of \* and bookmarks.
We now ensure that groups starting with `\*` never cause
text to be added to the document.

In addition, bookmarks now create a span between the start
and end of the bookmark, rather than an empty span.
2021-09-04 11:06:01 -07:00
Emily Bourke
b82a01b688 pptx: Add support for more layouts
Until now, the pptx writer only supported four slide layouts: “Title
Slide” (used for the automatically generated metadata slide), “Section
Header” (used for headings above the slide level), “Two Column” (used
when there’s a columns div containing at least two column divs), and
“Title and Content” (used for all other slides).

This commit adds support for three more layouts: Comparison, Content
with Caption, and Blank.

- Support “Comparison” slide layout

  This layout is used when a slide contains at least two columns, at
  least one of which contains some text followed by some non-text (e.g.
  an image or table). The text in each column is inserted into the
  “body” placeholder for that column, and the non-text is inserted into
  the ObjType placeholder. Any extra content after the non-text is
  overlaid on top of the preceding content, rather than dropping it
  completely (as currently happens for the two-column layout).

  + Accept straightforward test changes

    Adding the new layout means the “-deleted-layouts” tests have an
    additional layout added to the master and master rels.

  + Add new tests for the comparison layout
  + Add new tests to pandoc.cabal

- Support “Content with Caption” slide layout

  This layout is used when a slide’s body contains some text, followed by
  non-text (e.g. and image or a table). Before now, in this case the image
  or table would break onto a new slide: to get that output again, users
  can add a horizontal rule before the image or table.

  + Accept straightforward tests

    The “-deleted-layouts” tests all have an extra layout and relationship
    in the master for the Content with Caption layout.

  + Accept remove-empty-slides test

    Empty slides are still removed, but the Content with Caption layout is
    now used.

  + Change slide-level-0/h1-h2-with-text description

    This test now triggers the content with caption layout, giving a
    different (but still correct) result.

  + Add new tests for the new layout
  + Add new tests to the cabal file

- Support “Blank” slide layout

  This layout is used when a slide contains only blank content (e.g.
  non-breaking spaces). No content is inserted into any placeholders in
  the layout.

  Fixes #5097.

  + Accept straightforward test changes

    Blank layout now copied over from reference doc as well, when
    layouts have been deleted.

  + Add some new tests

    A slide should use the blank layout if:

    - It contains only speaker notes
    - It contains only an empty heading with a body of nbsps
    - It contains only a heading containing only nbsps

- Change ContentType -> Placeholder

  This type was starting to have a constructor for each placeholder on
  each slide (e.g. `ComparisonUpperLeftContent`). I’ve changed it
  instead to identify a placeholder by type and index, as I think that’s
  clearer and less redundant.

- Describe layout-choosing logic in manual
2021-09-01 07:16:17 -07:00
Emily Bourke
8dbea49092 pptx: Restructure tests
- Use dashes consistently rather than underscores
- Make a folder for each set of tests
- List test files explicitly (Cabal doesn’t support ** until version
  2.4)
2021-09-01 07:16:17 -07:00
John MacFarlane
5dcd4610e2 Improve asciidoc escaping for -- in URLs. Closes #7529. 2021-08-29 10:12:20 -07:00
Emily Bourke
8e5a79f264 pptx: Make first heading title if slide level is 0
Before this commit, the pptx writer adds a slide break before any table,
“columns” div, or paragraph starting with an image, unless the only
thing before it on the same slide is a heading at the slide level. In
that case, the item and heading are kept on the same slide, and the
heading is used as the slide title (inserted into the layout’s “title”
placeholder).

However, if the slide level is set to 0 (as was recently enabled) this
makes it impossible to have a slide with a title which contains any of
those items in its body.

This commit changes this behaviour: now if the slide level is 0, then
items will be kept with a heading of any level, if the heading’s the
only thing before the item on the same slide.
2021-08-27 09:47:03 -07:00
John MacFarlane
e4d7a6177f Ensure we have unique ids for wp:docPr and pic:cNvPr elements.
This will, I hope, fix #7527 and #7503.
2021-08-27 09:42:59 -07:00
John MacFarlane
7ff06a8c43 Fix test for #7521. 2021-08-24 12:56:31 -07:00
John MacFarlane
3f9b7a10ad Markdown reader: fix interaction of --strip-comments and list
parsing.  Use of `--strip-comments` was causing tight lists
to be rendered as loose (as if the comment were a blank line).
Closes #7521.
2021-08-23 22:06:39 -07:00
Simon Schuster
591cdca38b LaTeX-parser: restrict \endinput to current file 2021-08-21 18:08:27 -07:00
John MacFarlane
07d847a910 RST reader: Fix :literal: includes.
These should create code blocks, not insert raw RST.
Closes #7513.
2021-08-20 09:54:42 -07:00
Emily Bourke
5616d00d09 pptx: Include image title in description
The image title (i.e. `![alt text](link "title")`) was previously
ignored when writing to pptx. This commit includes it in PowerPoint's
description of the image, along with the link (which was already
included).

Fixes 7352.
2021-08-18 10:10:55 -07:00
John MacFarlane
fd99fe4d7e Revise citeproc code to fit new citeproc 0.5 API.
Linkification of URLs in the bibliography is now done in
the citeproc library, depending on the setting of an option.
We set that option depending on the value of the metadata
field `link-bibliography` (defaulting to true, for consistency
with earlier behavior, though the new behavior includes the
CSL draft recommendation of hyperlinking the title or the whole
entry if a DOI, PMID, PMCID, or URL field is present but not
explicitly rendered).

These changes implement the following recommendations from the
draft CSL v1.0.2 spec (Appendix VI):

> The CSL syntax does not have support for configuration of links.
> However, processors should include links on bibliographic references,
> using the following rules:

> If the bibliography entry for an item renders any of the following
> identifiers, the identifier should be anchored as a link, with the
> target of the link as follows:

> - url: output as is
> - doi: prepend with "`https://doi.org/`"
> - pmid: prepend with "`https://www.ncbi.nlm.nih.gov/pubmed/`"
> - pmcid: prepend with "`https://www.ncbi.nlm.nih.gov/pmc/articles/`"

> If the identifier is rendered as a URI, include rendered URI components
> (e.g. "`https://doi.org/`") in the link anchor. Do not include any other
> affix text in the link anchor (e.g. "Available from: ", "doi: ", "PMID: ").
> If the bibliography entry for an item does not render any of
> the above identifiers, then set the anchor of the link as the item
> title. If title is not rendered, then set the anchor of the link as the
> full bibliography entry for the item. Set the target of the link as one
> of the following, in order of priority:
>
> - doi: prepend with "`https://doi.org/`"
> - pmcid: prepend with "`https://www.ncbi.nlm.nih.gov/pmc/articles/`"
> - pmid: prepend with "`https://www.ncbi.nlm.nih.gov/pubmed/`"
> - url: output as is
>
> If the item data does not include any of the above identifiers, do not
> include a link.
>
> Citation processors should include an option flag for calling
> applications to disable bibliography linking behavior.

Thanks to Benjamin Bray for getting this all working.
2021-08-17 15:34:23 -07:00
John MacFarlane
2e9a8935fb OOXML tests: silence warnings.
These can make the test output confusing, making people think
tests are failing when they're passing.
2021-08-17 15:33:10 -07:00
Emily Bourke
72823ad947 pptx: Select layouts from reference doc by name
Until now, users had to make sure that their reference doc contains
layouts in a specific order: the first four layouts in the file had to
have a specific structure, or else pandoc would error (or sometimes
successfully produce a pptx file, which PowerPoint would then fail to
open).

This commit changes the layout selection to use the layout names rather
than order: users must make sure their reference doc contains four
layouts with specific names, and if a layout with the right name isn’t
found pandoc will output a warning and use the corresponding layout from
the default reference doc as a fallback.

I believe the use of names rather than order will be clearer to users,
and the clearer errors will help them troubleshoot when things go wrong.

- Add tests for moved layouts
- Add tests for deleted layouts
- Add newly included layouts to slideMaster1.xml to fix tests
2021-08-17 09:35:25 -07:00
Emily Bourke
9204e5c9b1 Don’t compare cdLine in OOXML golden tests
The `cdLine` field gives the line of the file some CData was found on. I
don’t think this is a difference that should fail these golden tests, as
the XML should still be parsable if nothing else has changed.
2021-08-17 09:35:25 -07:00
Emily Bourke
8474d488a5 Provide more detailed XML diff in tests
I had some failing tests and couldn’t tell what was different in the
XML. Updating the comparison to return what’s different made it easier
to figure out what was wrong, and I think will be helpful for others in
future.
2021-08-17 09:35:25 -07:00
OCzarnecki
e37cf4484d
Multimarkdown sub- and superscripts (#5512) (#7188)
Added an extension `short_subsuperscripts` which modifies the behavior
of `subscript` and `superscript`, allowing subscripts or superscripts containing only
alphanumerics to end with a space character (eg. `x^2 = 4` or `H~2 is
combustible`).  This improves support for multimarkdown. Closes #5512.

Add `Ext_short_subsuperscripts` constructor to `Extension` [API change].
This is enabled by default for `markdown_mmd`.
2021-08-15 21:57:57 -07:00
John MacFarlane
4340bd52c4 Make docx writer sensitive to native_numbering extension.
Figure and table numbers are now only included if `native_numbering`
is enabled.  (By default it is disabled.)  This is a behavior change
with respect to 2.14.1, but the behavior is that of previous versions.

The change was necessary to avoid incompatibilities between pandoc's
native numbering and third-party cross reference filters like
pandoc-crossref.

Closes #7499.
2021-08-15 15:05:54 -07:00
John MacFarlane
2c466a15af Remove misleading description from command/citeproc-87 test. 2021-08-15 09:26:30 -07:00
John MacFarlane
82638ad53b Convert Quoted in bib entries to special Spans...
before passing them off to citeproc.
This ensures that we get proper localization and flipflopping
if, e.g., quotes are used in titles.

Closes jgm/citeproc#87.
2021-08-13 19:25:29 -07:00
John MacFarlane
15683bb607 Citeproc: avoid odd handling of quotes.
citeproc changes allow us to ignore Quoted elements;
citeproc now uses its own method for represented quoted
things, and only localizes and flipflops quotes it adds itself.

See #87.

The one thing left to do is to convert Quoted elements in
bibliography databases (esp. titles) to `Span ("",["csl-quoted"],[])`
before passing them to citeproc, IF the localized quotes
for the quote type match the standard inverted commas.
2021-08-13 18:13:06 -07:00
John MacFarlane
418155aa95 Fix raw LaTeX injection issue (LaTeX writer).
Using a code block containing `\end{verbatim}`, one could
inject raw TeX into a LaTeX document even when `raw_tex`
is disabled.  Thanks to Augustin Laville for noticing the
bug.

Closes #7497.
2021-08-13 11:27:04 -07:00
William Lupton
fc20672bb9
Various sample.lua editorial fixes. (#7493)
These address most of the items mentioned in #7487.
There's also a table caption fix (the caption wasn't escaped).
2021-08-12 10:16:34 -07:00
John MacFarlane
dd1a956a8a LaTeX reader: Support \global before \def, \let, etc.
See #7494.
2021-08-11 16:28:53 -07:00
John MacFarlane
e3a263df46 Fix scope for LaTeX macros.
They should by default scope over the group in which they
are defined (except `\gdef` and `\xdef`, which are global).
In addition, environments must be treated as groups.

We handle this by making sMacros in the LaTeX parser state
a STACK of macro tables. Opening a group adds a table to
the stack, closing one removes one.  Only the top of the stack
is queried.

This commit adds a parameter for scope to the Macro constructor
(not exported).

Closes #7494.
2021-08-11 16:14:34 -07:00
John MacFarlane
a0e44b1ff6 LaTeX reader: improve handling of plain TeX macro primitives.
- Fixed semantics for `\let`.
- Implement `\edef`, `\gdef`, and `\xdef`.
- Add comment noting that currently `\def` and `\edef` set global
  macros (so are equivalent to `\gdef` and `\xdef`).  This should be
  fixed by scoping macro definitions to groups, in a future commit.

Closes #7474.
2021-08-11 10:32:52 -07:00
John MacFarlane
06d97131e5 Tests.Helpers: export testGolden and use it in RTF reader.
This gives a diff output on failure.
2021-08-10 22:07:48 -07:00
John MacFarlane
3a924d8f96 HTML reader: treat commments as blank when parsing.
This modifies pBlank.  Previously comments could sometimes
flummox the parser.

Cloes #7482.
2021-08-10 12:50:23 -07:00
John MacFarlane
7ca4233793 Add test for #7488. 2021-08-10 11:11:33 -07:00
John MacFarlane
6543b05116 Add RTF reader.
- `rtf` is now supported as an input format as well as output.
- New module Text.Pandoc.Readers.RTF (exporting `readRTF`). [API change]

Closes #3982.
2021-08-10 10:48:55 -07:00
John MacFarlane
dea1f0f080 RTF writer: emit \outlinelevel for section headings. 2021-08-04 16:37:20 -06:00
Peter Fabinski
8667ba2bcc
LaTeX table writer: Increase column width precision (#7466)
In some cases, the rounding performed by the LaTeX table
writer would introduce visible overrun outside the text
area.
This adds two more decimal places to the width values.
2021-08-03 15:34:39 -06:00
John MacFarlane
f938378d00 RTF writer: omit \bin in \pict.
According to the spec, this is not needed or wanted when
the data is in hexadecimal format, as it is here.
2021-08-01 22:45:41 -06:00
John MacFarlane
ca12e198ba RTF template: specify font family for fixed-width font f1.
According to the spec, this is mandatory.
2021-08-01 09:45:09 -06:00
Jan Tojnar
06408d08e5
DocBook reader: add support for citerefentry (#7437)
Originally intended for referring to UNIX manual pages, either part of the same DocBook document as refentry element, or external – hence the manvolnum element.
These days, refentry is more general, for example the element documentation pages linked below are each a refentry.

As per the *Processing expectations* section of citerefentry, the element is supposed to be a hyperlink to a refentry (when in the same document) but pandoc does not support refentry tag at the moment so that is moot.

https://tdg.docbook.org/tdg/5.1/citerefentry.html
https://tdg.docbook.org/tdg/5.1/manvolnum.html
https://tdg.docbook.org/tdg/5.1/refentry.html

This roughly corresponds to a `manpage` role in rST syntax, which produces a `Code` AST node with attributes `.interpreted-text role=manpage` but that does not fit DocBook parser.

https://www.sphinx-doc.org/en/master/usage/restructuredtext/roles.html#role-manpage
2021-07-11 15:28:52 -07:00
John MacFarlane
ac0a9da6d8 Improved parsing of raw LaTeX from Text streams (rawLaTeXParser).
We now use source positions from the token stream to tell us
how much of the text stream to consume.  Getting this to
work required a few other changes to make token source positions
accurate.

Closes #7434.
2021-07-11 13:50:28 -07:00
John MacFarlane
ae22b1e977 RST reader: fix regression with code includes.
With the recent changes to include infrastructure,
included code blocks were getting an extra newline.

Closes #7436.  Added regression test.
2021-07-09 12:27:41 -07:00
Michael Hoffmann
e56e2b0e0b
Recognize data-external when reading HTML img tags (#7429)
Preserve all attributes in img tags.  If attributes have a `data-`
prefix, it will be stripped.  In particular, this preserves a
`data-external` attribute as an `external` attribute in the pandoc AST.
2021-07-06 16:06:29 -07:00
John MacFarlane
3a31fe68ef Add command test for #7394.
And fix a small bug in handling of citations in notes, which
led to commas at the end of sentences in some cases.
2021-07-05 15:10:14 -07:00
Mauro Bieg
de4da56079 document-css: reset overflow-wrap on code blocks
fixes #7423
2021-07-05 08:57:23 -07:00
John MacFarlane
972db3cdca Revert "LaTeX template: move title, author, date up to top of preamble."
This reverts commit cc088687b4
and PR #7295.

This fixes issues people had when using LaTeX commands defined later
in the preamble (or in some cases UTF-8 text) in the title or author
fields.  Closes #7422.
2021-07-03 15:34:42 -07:00
Aner Lucero
cb038bb312 HTML5 writer, remove aria-hidden when explicit atl text is provided. 2021-07-02 13:02:52 -07:00
John MacFarlane
0948af9cc5 Docx writer: Add table numbering for captioned tables.
The numbers are added using fields, so that Word can
create a list of tables that will update automatically.
2021-06-29 11:15:40 -07:00
John MacFarlane
a3d745e485 Docx writer: support figure numbers.
These are set up in such a way that they will work with Word's
automatic table of figures.

Closes #7392.
2021-06-29 09:56:21 -07:00
John MacFarlane
b7572db224 Use dev version of citeproc.
This eliminates double hyperlinks in author-in-text citations.
Author-only citations are no longer hyperlinked.
See jgm/citeproc#77.
2021-06-29 09:18:49 -07:00
Aner Lucero
f4ef652a41 Remove duplicated alt text in HTML output. 2021-06-29 09:02:13 -07:00
John MacFarlane
851d037b3e Improve punctuation moving with --citeproc.
Previously, using `--citeproc` could cause punctuation to move in
quotes even when there aer no citations. This has been changed;
now, punctuation moving is limited to citations.

In addition, we only move footnotes around punctuation if the
style is a note style, even if `notes-after-punctuation` is `true`.
2021-06-28 22:41:14 -07:00
John MacFarlane
dd098d4e15 Markdown writer: put space between Plain and following fenced Div.
Closes #4465.
2021-06-28 11:33:22 -07:00
John MacFarlane
1b07997f4a Fix regression with comment-only YAML metadata blocks.
Closes #7400.
2021-06-22 09:55:50 -07:00
John MacFarlane
8eed5b90d0 LaTeX writer: add strut at end of minipage if it contains...
line breaks.  Without them, the last line is shorter
than it should be, at least in some cases.
2021-06-21 23:33:00 -07:00
John MacFarlane
2ef2049b4e Update command test for change to LaTeX LineBreak handling. 2021-06-21 22:34:38 -07:00
John MacFarlane
ed3974a254 LaTeX writer: always use a minipage for cells with line breaks...
if width information is available.  Otherwise the way we treat them can
lead to content that overflows a cell.

Closes #7393.
2021-06-21 18:25:36 -07:00
John MacFarlane
a39313eddb Fix test for #7397 2021-06-21 09:30:23 -07:00
John MacFarlane
82ad855f38 Markdown writer: Fix regression in code blocks with attributes.
Code blocks with a single class but nonempty attributes
were having attributes drop as a result of #7242.

Closes #7397.
2021-06-21 08:49:00 -07:00
John MacFarlane
b0cd6c6224 Fix regression in citeproc processing.
If inline references are used (in the metadata `references` field),
we should still only include in the bibliography items that are
actually cited -- unless `nocite` is used.

Closes #7376.
2021-06-12 10:16:44 -07:00
John MacFarlane
21cc52abe3 LaTeX writer: Fix regression in table header position.
In recent versions the table headers were no longer bottom-aligned
(if more than one line).  This patch fixes that by using minipages
for table headers in non-simple tables.

Closes #7347.
2021-06-05 14:13:58 -06:00
Jan Tojnar
af9de925de DocBook writer: Remove non-existent admonitions
attention, error and hint are actually just reStructuredText specific.
danger was too until introduced in DocBook 5.2: https://github.com/docbook/docbook/issues/55
2021-06-05 08:02:21 -06:00
John MacFarlane
2e4ef14d91 Markdown reader: fix pipe table regression in 2.11.4.
Previously pipe tables with empty headers (that is, a header
line with all empty cells) would be rendered as headerless
tables.  This broke in 2.11.4.

The fix here is to produce an AST with an empty table head
when a pipe table has all empty header cells.

Closes #7343.
2021-06-01 21:44:55 -06:00
John MacFarlane
abb59bd582 LaTeX reader: don't allow optional * on symbol control sequences.
Generally we allow optional starred variants of LaTeX commands
(since many allow them, and if we don't accept these explicitly,
ignoring the star usually gives acceptable results).  But we
don't want to do this for `\(*\)` and similar cases.

Closes #7340.
2021-06-01 13:54:51 -06:00
John MacFarlane
62f46b3995 Fix regression with commonmark/gfm yaml metdata block parsing.
A regression in 2.14 led to the document body being omitted
after YAML metadata in some cases.  This is now fixed.

Closes #7339.
2021-05-31 21:34:51 -06:00
John MacFarlane
cc206af392 Have LoadedResource use relative paths.
The immediate reason for this is to allow the test output of #3752
to work on both windows and linux.
2021-05-30 10:23:00 -07:00
John MacFarlane
c210b98366 Fix test #3752 (1) for Windows. 2021-05-29 14:36:49 -07:00
John MacFarlane
5772f7f943 Further test image size reductions. 2021-05-29 12:27:59 -07:00
John MacFarlane
7aade73dce Replace biblatex-exmaples.bib with shorter averroes.bib in tests. 2021-05-29 12:14:37 -07:00
John MacFarlane
e86f6abc45 Further test image size reductions. 2021-05-29 12:09:21 -07:00
John MacFarlane
3ba9ef01eb Reduce size of image in fb2 image test. 2021-05-29 11:54:03 -07:00
John MacFarlane
5cf887db20 Reduce size of cover image in test epub. 2021-05-29 11:48:52 -07:00
John MacFarlane
8660f42f09 Modify pptx tests to take a whole lot less space.
- Replace a 300K image in the reference pptx with a 2K one.
- Updated all the *_templated.pptx files based on the new
  reference pptx.
- These changes should reduce the size of the tarball by
  roughly 7 MB!

See haskell/hackage-server#935
2021-05-29 10:59:14 -07:00
John MacFarlane
b6b2331fdc Support rebase_relative_paths for commonmark based formats.
(Including `gfm`.)
2021-05-28 13:58:44 -07:00
Emily Bourke
56b211120c
Docx reader: Support new table features.
* Column spans
* Row spans
  - The spec says that if the `val` attribute is ommitted, its value
    should be assumed to be `continue`, and that its values are
    restricted to {`restart`, `continue`}. If the value has any other
    value, I think it seems reasonable to default it to `continue`. It
    might cause problems if the spec is extended in the future by adding
    a third possible value, in which case this would probably give
    incorrect behaviour, and wouldn't error.
* Allow multiple header rows
* Include table description in simple caption
  - The table description element is like alt text for a table (along
    with the table caption element). It seems like we should include
    this somewhere, but I’m not 100% sure how – I’m pairing it with the
    simple caption for the moment. (Should it maybe go in the block
    caption instead?)
* Detect table captions
  - Check for caption paragraph style /and/ either the simple or
    complex table field. This means the caption detection fails for
    captions which don’t contain a field, as in an example doc I added
    as a test. However, I think it’s better to be too conservative: a
    missed table caption will still show up as a paragraph next to the
    table, whereas if I incorrectly classify something else as a table
    caption it could cause havoc by pairing it up with a table it’s
    not at all related to, or dropping it entirely.
* Update tests and add new ones

Partially fixes: #6316
2021-05-28 20:15:23 +02:00
Emily Bourke
44484d0dee
Docx reader: Read table column widths. 2021-05-28 20:15:23 +02:00
John MacFarlane
4842c5fb82 Two citeproc locator/suffix improvements:
- Recognize locators spelled with a capital letter.
  Closes #7323.
- Add a comma and a space in front of the suffix if it doesn't start
  with space or punctuation.  Closes #7324.
2021-05-27 18:28:52 -07:00
John MacFarlane
4b16d181e7 rebase_relative_paths: leave empty paths unchanged. 2021-05-27 14:16:37 -07:00
John MacFarlane
0661ce699f rebase_relative_paths extension: don't change fragment paths.
We don't want a pure fragment path to be rewritten, since
these are used for cross-referencing.
2021-05-27 13:53:26 -07:00
John MacFarlane
6972a7dc91 Modify rebase_reference_links treatment of reference links/images.
The directory is based on the file containing the link
reference, not the file containing the link, if these differ.
2021-05-27 11:26:38 -07:00
John MacFarlane
cbe16b2866 Citeproc: Don't detect math elements as locators.
Closes #7321.
2021-05-27 10:49:45 -07:00
John MacFarlane
834da53058 Add rebase_relative_paths extension.
- Add manual entry for (non-default) extension
  `rebase_relative_paths`.
- Add constructor `Ext_rebase_relative_paths` to `Extensions`
  in Text.Pandoc.Extensions [API change]. When enabled, this
  extension rewrites relative image and link paths by prepending
  the (relative) directory of the containing file.
- Make Markdown reader sensitive to the new extension.
- Add tests for #3752.

Closes #3752.

NB. currently the extension applies to markdown and associated
readers but not commonmark/gfm.
2021-05-27 10:38:25 -07:00
John MacFarlane
81eadfd99a LaTeX reader: improve \def and implement \newif.
- Improve parsing of `\def` macros.  We previously set "verbatim mode"
  even for parsing the initial `\def`; this caused problems for things
  like
  ```
  \def\foo{\def\bar{BAR}}
  \foo
  \bar
  ```
- Implement `\newif`.
- Add tests.
2021-05-27 09:15:04 -07:00
John MacFarlane
e0a1f7d2cf Command tests: fail if a file contains no tests.
And fix a test that failed in that way!
2021-05-26 09:52:23 -07:00
John MacFarlane
6804f47383 Fix a command test so it writes to stdout not stderr.
The error message to stderr was appearing in test output
and confusing some users, who thought it indicated a failing
test rather than expected output.
2021-05-25 21:41:40 -07:00
John MacFarlane
8d5014fdfc Logging: remove single quotes around paths in messages.
We weren't doing it consistently and it seems unnecessary.
2021-05-25 11:53:49 -07:00
Albert Krewinkel
d46ea7d7da
Jira: add support for "smart" links
Support has been added for the new
`[alias|https://example.com|smart-card]` syntax.
2021-05-25 16:54:42 +02:00
John MacFarlane
8511f6fdf6 MediaBag improvements.
In the current dev version, we will sometimes add
a version of an image with a hashed name, keeping
the original version with the original name, which
would leave to undesirable duplication.

This change separates the media's filename from the
media's canonical name (which is the path of the link
in the document itself).  Filenames are based on SHA1
hashes and assigned automatically.

In Text.Pandoc.MediaBag:

- Export MediaItem type [API change].
- Change MediaBag type to a map from Text to MediaItem [API change].
- `lookupMedia` now returns a `MediaItem` [API change].
- Change `insertMedia` so it sets the `mediaPath` to
  a filename based on the SHA1 hash of the contents.
  This will be used when contents are extracted.

In Text.Pandoc.Class.PandocMonad:

- Remove `fetchMediaResource` [API change].

Lua MediaBag module has been changed minimally. In the future
it would be better, probably, to give Lua access to the full
MediaItem type.
2021-05-24 09:20:44 -07:00
Albert Krewinkel
58fbf56548
Jira writer: use {color} when span has a color attribute
Closes: tarleb/jira-wiki-markup#10
2021-05-24 09:56:02 +02:00
John MacFarlane
1af2cfb287 Handle relative lengths (e.g. 2*) in HTML column widths.
See <https://www.w3.org/TR/html4/types.html#h-6.6>.

"A relative length has the form "i*", where "i" is an integer. When
allotting space among elements competing for that space, user agents
allot pixel and percentage lengths first, then divide up remaining
available space among relative lengths. Each relative length receives a
portion of the available space that is proportional to the integer
preceding the "*". The value "*" is equivalent to "1*". Thus, if 60
pixels of space are available after the user agent allots pixel and
percentage space, and the competing relative lengths are 1*, 2*, and 3*,
the 1* will be alloted 10 pixels, the 2* will be alloted 20 pixels, and
the 3* will be alloted 30 pixels."

Closes #4063.
2021-05-22 22:03:54 -07:00
John MacFarlane
07d299d353 DocBook reader: ensure that first and last names are separated.
Closes #6541.
2021-05-20 18:45:39 -07:00
John MacFarlane
d7b5def287 Ms writer: handle tables with multiple paragraphs.
Previously they overflowed the table cell width.
We now set line lengths per-cell and restore them
after the table has been written.

Closes #7288.
2021-05-20 17:12:38 -07:00
John MacFarlane
bb11f5fb86 LaTeX reader: More siunitx improvements. Closes #6658.
There's still one slight divergence from the siunitx behavior:
we get 'kg m/A/s' instead of 'kg m/(A s)'. At the moment I'm
not going to worry about that.
2021-05-20 15:30:31 -07:00
John MacFarlane
4e990a8cf9 LaTeX/siunitx: fix parsing of \cubic etc. See #6658. 2021-05-20 10:13:20 -07:00
John MacFarlane
bc5058234f LaTeX reader sinuitx: fix + sign on ang. 2021-05-20 10:13:20 -07:00
John MacFarlane
5dc917da3e LaTeX reader siunitx: add leading 0 to numbers starting with . 2021-05-20 10:13:20 -07:00
Denis Maier
183ce58477
ConTeXt reader: improve ordered lists (#7304)
Closes #5016 

- change ordered list from itemize to enumerate
- adds new itemgroup for ordered lists
- add fontfeature for table figures
- remove width from itemize in context writer
2021-05-20 09:59:53 -07:00
John MacFarlane
a366bd6abc LaTeX reader: Fix parsing of +- in siunitx numbers.
See #6658.
2021-05-20 09:03:29 -07:00