Commit graph

7441 commits

Author SHA1 Message Date
John MacFarlane
832196fb17 MIME: use image/x-xcf instead of application/x-xcf.
Closes .
2021-07-22 13:08:30 -07:00
John MacFarlane
31a5bccd57 LaTeX reader: avoid trailing hyphen in translating languages.
Previously `\foreignlanguage{english}` turned into `<span lang="en-">`.
The same issue affected Arabic.

Closes .
2021-07-17 23:07:53 -07:00
John MacFarlane
46099e79de DocBook reader: handle images with imageobjectco elements.
Closes .
2021-07-16 13:10:45 -07:00
John MacFarlane
493522c562 LaTeX reader: Support \cline in LaTeX tables.
Closes .
2021-07-16 12:04:43 -07:00
John MacFarlane
18270c7a39 PDF: Fix svgIn path error.
We were duplicating the temp directory; this didn't show up
on macOS or Linux because there we use absolute paths for
the temp directory.

Closes .
2021-07-16 11:39:02 -07:00
Jan Tojnar
06408d08e5 DocBook reader: add support for citerefentry ()
Originally intended for referring to UNIX manual pages, either part of the same DocBook document as a refentry element, or external – hence the manvolnum element.
These days, refentry is more general; for example, the element documentation pages linked below are each a refentry.

As per the *Processing expectations* section of citerefentry, the element is supposed to be a hyperlink to a refentry (when in the same document), but pandoc does not support the refentry tag at the moment, so that is moot.

https://tdg.docbook.org/tdg/5.1/citerefentry.html
https://tdg.docbook.org/tdg/5.1/manvolnum.html
https://tdg.docbook.org/tdg/5.1/refentry.html

This roughly corresponds to a `manpage` role in rST syntax, which produces a `Code` AST node with attributes `.interpreted-text role=manpage`, but that does not fit the DocBook parser.

https://www.sphinx-doc.org/en/master/usage/restructuredtext/roles.html#role-manpage
2021-07-11 15:28:52 -07:00
John MacFarlane
ac0a9da6d8 Improved parsing of raw LaTeX from Text streams (rawLaTeXParser).
We now use source positions from the token stream to tell us
how much of the text stream to consume.  Getting this to
work required a few other changes to make token source positions
accurate.

Closes .
2021-07-11 13:50:28 -07:00
John MacFarlane
477a67061f Always use / when adding directory to image path with extractMedia.
Even on Windows.

May help with .
2021-07-09 14:14:19 -07:00
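The idea in the commit above, joining a directory and an image path with `/` regardless of platform, can be sketched in Python. This is an illustration only: `add_media_dir` is a hypothetical name, and pandoc's own code is Haskell. Python's `posixpath` module always uses `/`, whereas `ntpath` shows what a Windows-style join would produce.

```python
import ntpath
import posixpath


def add_media_dir(directory, image_path):
    """Prefix image_path with directory, using "/" on every platform."""
    return posixpath.join(directory, image_path)


print(add_media_dir("media", "image1.png"))  # media/image1.png
# For comparison, a Windows-style join uses a backslash separator:
print(ntpath.join("media", "image1.png"))
```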
John MacFarlane
ae22b1e977 RST reader: fix regression with code includes.
With the recent changes to include infrastructure,
included code blocks were getting an extra newline.

Closes .  Added regression test.
2021-07-09 12:27:41 -07:00
Michael Hoffmann
565330033a Don't incorporate externally linked images in EPUB documents ()
Just like it is possible to avoid incorporating an image in EPUB by
passing `data-external="1"` to a raw HTML snippet, this makes the same
possible for native Images, by looking for an associated `external`
attribute.
2021-07-07 09:26:37 -07:00
Michael Hoffmann
e56e2b0e0b Recognize data-external when reading HTML img tags ()
Preserve all attributes in img tags.  If attributes have a `data-`
prefix, it will be stripped.  In particular, this preserves a
`data-external` attribute as an `external` attribute in the pandoc AST.
2021-07-06 16:06:29 -07:00
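The attribute handling described in the commit above (keep every attribute, strip a leading `data-`) can be sketched as follows. This is a Python illustration under assumptions: `strip_data_prefix` is a hypothetical name, and attributes are modelled as a plain dictionary rather than pandoc's AST types.

```python
def strip_data_prefix(attributes):
    """Return a copy of attributes with any "data-" prefix removed from names."""
    prefix = "data-"
    return {
        (name[len(prefix):] if name.startswith(prefix) else name): value
        for name, value in attributes.items()
    }


img_attributes = {"src": "a.png", "data-external": "1"}
print(strip_data_prefix(img_attributes))
# {'src': 'a.png', 'external': '1'}
```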
John MacFarlane
e7f8cc5786 T.P.PDF, convertImage: normalize paths.
This will avoid paths on Windows with mixed path separators,
which may cause problems with SVG conversion.

See .
2021-07-06 10:39:47 -07:00
John MacFarlane
f88ebf3ebf Markdown reader: don't try to read contents in self-closing HTML tag.
Previously we had problems parsing raw HTML with self-closing
tags like `<col/>`. The problem was that pandoc would look
for a closing tag to close the markdown contents, but the
closing tag had, in effect, already been parsed by `htmlTag`.

This fixes the issue described in
<https://groups.google.com/d/msgid/pandoc-discuss/297bc662-7841-4423-bcbb-534e99bbba09n%40googlegroups.com>.
2021-07-06 10:22:07 -07:00
John MacFarlane
3ed37f0077 HTML reader: add col, colgroup to 'closes' definitions
2021-07-06 10:21:59 -07:00
John MacFarlane
3a31fe68ef Add command test for .
And fix a small bug in handling of citations in notes, which
led to commas at the end of sentences in some cases.
2021-07-05 15:10:14 -07:00
John MacFarlane
77537b1765 Citeproc: cleanup and efficiency improvement in deNote.
2021-07-05 13:41:01 -07:00
John MacFarlane
ff26af59ac Revamp note citation handling.
Use latest citeproc, which uses a Span with a class rather
than a Note for notes.  This helps us distinguish between
user notes and citation notes.

Don't put citations at the beginning of a note in parentheses.
(Closes #7394.)
2021-07-05 13:19:33 -07:00
Aner Lucero
cb038bb312 HTML5 writer: remove aria-hidden when explicit alt text is provided.
2021-07-02 13:02:52 -07:00
John MacFarlane
0948af9cc5 Docx writer: Add table numbering for captioned tables.
The numbers are added using fields, so that Word can
create a list of tables that will update automatically.
2021-06-29 11:15:40 -07:00
John MacFarlane
a01ba4463f Docx writer: Fixed a couple bugs in Figure numbering.
2021-06-29 11:15:13 -07:00
John MacFarlane
a3d745e485 Docx writer: support figure numbers.
These are set up in such a way that they will work with Word's
automatic table of figures.

Closes .
2021-06-29 09:56:21 -07:00
Aner Lucero
f4ef652a41 Remove duplicated alt text in HTML output.
2021-06-29 09:02:13 -07:00
John MacFarlane
851d037b3e Improve punctuation moving with --citeproc.
Previously, using `--citeproc` could cause punctuation to move in
quotes even when there are no citations. This has been changed;
now, punctuation moving is limited to citations.

In addition, we only move footnotes around punctuation if the
style is a note style, even if `notes-after-punctuation` is `true`.
2021-06-28 22:41:14 -07:00
John MacFarlane
97b0aa667c Allow $ characters in bibtex keys.
Closes .
2021-06-28 13:34:12 -07:00
John MacFarlane
f045e59248 Text.Pandoc.Error: fix line calculations in reporting parsec errors.
Also remove a spurious initial newline in the error report.
2021-06-28 13:28:49 -07:00
John MacFarlane
4262898fe9 Set proper initial source name in parsing BibTeX.
(For better error messages.)
2021-06-28 13:28:02 -07:00
John MacFarlane
dd098d4e15 Markdown writer: put space between Plain and following fenced Div.
Closes .
2021-06-28 11:33:22 -07:00
John MacFarlane
4a7a0cff29 ImageSize: Add Tiff constructor for ImageType.
[Minor API change]

This allows pandoc to get size information from tiff images.
Closes .
2021-06-23 11:39:50 -07:00
John MacFarlane
235cdea629 reveal.js writer: Go back to setting boolean values for variables.
In a previous commit we used strings because boolean False
wouldn't render as `false`. This is changed in the dev
version of doctemplates, so we can go back to the more
straightforward approach.
2021-06-23 09:54:14 -07:00
John MacFarlane
1b07997f4a Fix regression with comment-only YAML metadata blocks.
Closes .
2021-06-22 09:55:50 -07:00
John MacFarlane
086790d986 Fix unneeded import
2021-06-22 09:49:24 -07:00
John MacFarlane
8eed5b90d0 LaTeX writer: add strut at end of minipage if it contains...
line breaks.  Without them, the last line is shorter
than it should be, at least in some cases.
2021-06-21 23:33:00 -07:00
John MacFarlane
9867231779 Revert "LaTeX writer: put a strut after a line break (\\)."
This reverts commit e2a7ecb5f7.
2021-06-21 23:19:40 -07:00
John MacFarlane
e2a7ecb5f7 LaTeX writer: put a strut after a line break (\\).
This ensures that we have proper spacing before the next
line (which might e.g. be a table bottom border).
This gives better results in cases like test/command/7272.md.
2021-06-21 23:17:43 -07:00
John MacFarlane
0352f7845b Improve emailAddress in Text.Pandoc.Parsing.
Previously the parser would accept characters in domains
that are not legal there, and this sometimes caused it
to gobble bits of the following text.

Closes .

Note that this change, by itself, caused some txt2tags reader
tests to fail. txt2tags allows bare email addresses with
a following form query.  So, in addition to the change
to emailAddress, we modify the txt2tags parser so it can
still handle these cases.
2021-06-21 22:35:07 -07:00
John MacFarlane
ed3974a254 LaTeX writer: always use a minipage for cells with line breaks...
if width information is available.  Otherwise the way we treat them can
lead to content that overflows a cell.

Closes .
2021-06-21 18:25:36 -07:00
John MacFarlane
eee648447a LaTeX writer: Use \strut instead of ~ before \\ in empty line.
2021-06-21 18:25:07 -07:00
John MacFarlane
14b2eb2aeb reveal.js writer: better handling of options.
Previously it was impossible to specify false values for
options that default to true; setting the option to false
just caused the portion of the template setting the option
to be omitted.

Now we prepopulate all the variables with their default
values, including them unconditionally and allowing them
to be overridden.
2021-06-21 16:40:52 -07:00
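The approach in the commit above, prepopulating every option with its default and letting user settings override it, can be sketched in Python. The names `DEFAULTS` and `resolve_options` are hypothetical, and the two options listed are only illustrative entries, not the writer's full option set.

```python
# Illustrative defaults only; the real writer covers many more options.
DEFAULTS = {"controls": True, "progress": True}


def resolve_options(user_options):
    """Start from the defaults and let user-supplied values override them."""
    return {**DEFAULTS, **user_options}


# An explicit False now survives instead of being silently omitted.
print(resolve_options({"controls": False}))
# {'controls': False, 'progress': True}
```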
John MacFarlane
82ad855f38 Markdown writer: Fix regression in code blocks with attributes.
Code blocks with a single class but nonempty attributes
were having attributes dropped as a result of .

Closes .
2021-06-21 08:49:00 -07:00
John MacFarlane
3fb5499dd6 insertMediaBag: ensure we get sane mediaPath for URLs.
Long URLs cannot be treated as mediaPaths, but System.FilePath's
`isRelative` often returns True for them.  So we add a check
for an absolute URL.  We also ensure that extensions are derived
only from the path portion of URLs (previously a following query
was being included).

Closes .
2021-06-18 13:19:24 -07:00
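The two checks described in the commit above (detect an absolute URL, and take the extension from the path portion only) can be sketched with Python's standard `urllib.parse`. The function names are hypothetical and the sketch does not reproduce pandoc's Haskell code.

```python
import posixpath
from urllib.parse import urlsplit


def is_absolute_url(source):
    """True when source has both a scheme and a host, e.g. https://host/..."""
    parts = urlsplit(source)
    return bool(parts.scheme and parts.netloc)


def extension_of(source):
    """Derive the extension from the path portion, ignoring any query string."""
    path = urlsplit(source).path
    return posixpath.splitext(path)[1]


url = "https://example.com/img/logo.png?size=large"
print(is_absolute_url(url))  # True
print(extension_of(url))     # .png
```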
John MacFarlane
cfa26e3ca0 Docx reader: handle absolute URIs in Relationship Target.
Closes .
2021-06-12 13:56:09 -07:00
John MacFarlane
ea53a1dc5c Markdown writer: allow pipe_tables to be disabled for commonmark...
(commonmark_x, gfm).  Closes .
2021-06-12 10:20:19 -07:00
John MacFarlane
b0cd6c6224 Fix regression in citeproc processing.
If inline references are used (in the metadata `references` field),
we should still only include in the bibliography items that are
actually cited -- unless `nocite` is used.

Closes .
2021-06-12 10:16:44 -07:00
John MacFarlane
3776e828a8 Fix MediaBag regressions.
With the 2.14 release `--extract-media` stopped working as before;
there could be mismatches between the paths in the rendered document and
the extracted media.

This patch makes several changes (while keeping the same API).

The `mediaPath` in 2.14 was always constructed from the SHA1 hash of
the media contents.  Now, we preserve the original path unless it's
an absolute path or contains `..` segments (in that case we use a path
based on the SHA1 hash of the contents).

When constructing a path from the SHA1 hash, we always use the
original extension, if there is one. Otherwise we look up an
appropriate extension for the mime type.

`mediaDirectory` and `mediaItems` now use the `mediaPath`, rather
than the mediabag key, for the first component of the tuple.
This makes more sense, I think, and fits with the documentation
of these functions; eventually, though, we should rework the API so that
`mediaItems` returns both the keys and the MediaItems.

Rewriting of source paths in `extractMedia` has been fixed.

`fillMediaBag` has been modified so that it doesn't modify
image paths (that was part of the problem in ).

We now do path normalization (e.g. `\` separators on Windows) only
in writing the media; the paths are left unchanged in the image
links (sensibly, since they might be URLs and not file paths).

These changes should restore the original behavior from before 2.14.

Closes .
2021-06-10 16:47:02 -07:00
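The path rule described in the commit above can be sketched in Python: keep the original path unless it is absolute or contains `..` segments, otherwise fall back to the SHA1 hash of the contents plus an extension. This is a simplified illustration with assumptions: `media_path` is a hypothetical name, only `/`-separated paths are considered, and Python's `mimetypes` module stands in for pandoc's own MIME table.

```python
import hashlib
import mimetypes
import posixpath


def media_path(original_path, contents, mime_type=None):
    """Choose a media path following the rule sketched in the commit message."""
    is_absolute = original_path.startswith("/")
    has_dotdot = ".." in original_path.split("/")
    if not is_absolute and not has_dotdot:
        # Safe relative path: preserve it unchanged.
        return original_path
    # Otherwise build a name from the SHA1 hash of the contents.
    digest = hashlib.sha1(contents).hexdigest()
    # Prefer the original extension; fall back to one looked up by MIME type.
    extension = posixpath.splitext(original_path)[1]
    if not extension and mime_type:
        extension = mimetypes.guess_extension(mime_type) or ""
    return digest + extension


print(media_path("images/a.png", b"data"))  # images/a.png
print(media_path("../a.png", b"data"))      # 40 hex characters followed by .png
```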
John MacFarlane
aa79b3035c T.P.MIME, extensionFromMimeType: add a few special cases.
When we do a reverse lookup in the MIME table, we just get the
last match, so when the same mime type is associated with several
different extensions, we sometimes got weird results, e.g. `.vs`
for `text/plain`.  These special cases help us get the most standard
extensions for mime types like `text/plain`.
2021-06-10 16:36:54 -07:00
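The problem described in the commit above can be reproduced with a small Python sketch: a reverse lookup over a table of (extension, MIME type) pairs returns the last match, so a short list of special cases is consulted first. The table contents and all names here are illustrative assumptions, not pandoc's actual MIME table.

```python
# Illustrative (extension, MIME type) pairs; not pandoc's real table.
MIME_TABLE = [
    ("txt", "text/plain"),
    ("vs", "text/plain"),
    ("html", "text/html"),
]

# Preferred extensions for MIME types that map to several extensions.
SPECIAL_CASES = {"text/plain": "txt"}


def naive_extension(mime_type):
    """Reverse lookup; later table entries overwrite earlier ones."""
    reverse = {mime: ext for ext, mime in MIME_TABLE}
    return reverse.get(mime_type)


def extension_from_mime_type(mime_type):
    """Check the special cases before falling back to the reverse lookup."""
    if mime_type in SPECIAL_CASES:
        return SPECIAL_CASES[mime_type]
    return naive_extension(mime_type)


print(naive_extension("text/plain"))           # vs
print(extension_from_mime_type("text/plain"))  # txt
```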
Albert Krewinkel
c7dd33d5aa Docx writer: fix handling of empty table headers
A table header which does not contain any cells is now treated as an
empty header.

Fixes: 
2021-06-10 18:36:49 +02:00
Albert Krewinkel
55bcd4b4fb Lua utils: fix handling of table headers in from_simple_table
Passing an empty list of header cells now results in an empty table
header.

Fixes: 
2021-06-10 18:36:49 +02:00
John MacFarlane
76e5f047b0 Citeproc: avoid duplicate classes and attributes on refs div.
2021-06-08 17:51:53 -07:00
John MacFarlane
21cc52abe3 LaTeX writer: Fix regression in table header position.
In recent versions the table headers were no longer bottom-aligned
(if more than one line).  This patch fixes that by using minipages
for table headers in non-simple tables.

Closes .
2021-06-05 14:13:58 -06:00
Jan Tojnar
c550bf8482 CommonMark writer: do not use simple class for fenced-divs
In https://github.com/jgm/pandoc/pull/7242, we introduced a simple attribute style for code blocks and fenced divs with a single class, but it turns out the CommonMark extension does not support it for fenced divs.

https://github.com/jgm/commonmark-hs/blob/master/commonmark-extensions/test/fenced_divs.md
2021-06-05 13:51:18 -06:00