Commit graph

7339 commits

Author SHA1 Message Date
John MacFarlane
56fb4dae1b Citeproc: ensure that CSL-related attributes are passed on...
...to a Div with id 'refs'.  Previously we just left the
attributes of such a Div alone, which meant that style
options like entry-spacing had no effect there.
2021-05-17 20:42:43 -07:00
Albert Krewinkel
1843a8793a
HTML reader: keep attributes from code nested below pre tag.
If a code block is defined with `<pre><code
class="language-x">…</code></pre>`, where the `<pre>` element has no
attributes, then the attributes from the `<code>` element are used
instead. Any leading `language-` prefix in the code's *class* attribute
is dropped to improve syntax highlighting.

Closes: #7221
2021-05-17 18:08:02 +02:00
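The `<pre><code>` rule described in 1843a8793a can be sketched roughly as follows. This is an illustration in Python, not pandoc's Haskell implementation; the function name `code_block_attrs` and the dict representation of HTML attributes are assumptions made for the example.

```python
def code_block_attrs(pre_attrs, code_attrs):
    """Pick the attributes for a code block written as <pre><code>...</code></pre>.

    Both arguments are plain dicts of HTML attributes (a hypothetical
    representation chosen for this sketch).
    """
    # If <pre> carries attributes of its own, they are kept as they are.
    if pre_attrs:
        return dict(pre_attrs)

    # Otherwise fall back to the attributes of the nested <code> element.
    attrs = dict(code_attrs)
    prefix = "language-"
    classes = [
        c[len(prefix):] if c.startswith(prefix) else c
        for c in attrs.get("class", "").split()
    ]
    if classes:
        # "language-haskell" becomes "haskell", which highlighters recognize.
        attrs["class"] = " ".join(classes)
    return attrs
```

For instance, `code_block_attrs({}, {"class": "language-haskell"})` yields a `class` of `haskell`, while a `<pre id="x">` keeps its own attributes.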
Albert Krewinkel
25f5b92777
HTML writer: ensure headings only have valid attribs in HTML4
Fixes: #5944
2021-05-17 15:42:15 +02:00
Albert Krewinkel
4417dacc44
ConTeXt writer: use span identifiers as reference anchors.
Closes: #7246
2021-05-17 13:14:32 +02:00
Albert Krewinkel
d92622ba3c
LaTeX template: define commands for zero width non-joiner character
Closes: #6639

The zero-width non-joiner character is used to avoid ligatures (e.g. in
German).
2021-05-16 12:33:32 -07:00
John MacFarlane
5a6399d9f6 Markdown writer: fewer unneeded escapes for #.
See #6259.
2021-05-16 12:23:34 -07:00
John MacFarlane
39a69c4f93 Markdown writer: improve escaping of @.
We need to escape literal `@` before `{` because of
the new citation syntax.
2021-05-16 11:53:19 -07:00
John MacFarlane
0a4c6925b6 Docx writer: copy over more settings from reference.docx.
From settings.xml in the reference-doc, we now include:
`zoom`, `embedSystemFonts`, `doNotTrackMoves`, `defaultTabStop`,
`drawingGridHorizontalSpacing`, `drawingGridVerticalSpacing`,
`displayHorizontalDrawingGridEvery`, `displayVerticalDrawingGridEvery`,
`characterSpacingControl`, `savePreviewPicture`, `mathPr`, `themeFontLang`,
`decimalSymbol`, `listSeparator`, `autoHyphenation`, `compat`.

Closes #7240.
2021-05-15 15:40:49 -07:00
Albert Krewinkel
0794862aac
HTML reader: parse <header> as a Div
HTML5 `<header>` elements are treated like `<div>` elements.
2021-05-15 16:46:02 +02:00
Albert Krewinkel
013e4a3164
HTML reader: keep h1 tags as normal headers (#7274)
The tags `<title>` and `<h1 class="title">` often contain the same
information, so the latter was dropped from the document. However, as
this can lead to loss of information, the heading is now always
retained.

Use `--shift-heading-level-by=-1` to turn the `<h1>` into the document
title, or a filter to restore the previous behavior.

Closes: #2293
2021-05-14 12:31:24 -07:00
John MacFarlane
76a4e7127b Beamer writer: support exampleblock and alertblock.
A block will be rendered as an exampleblock if the heading
has class `example` and alertblock if it has class `alert`.

Closes #7278.
2021-05-14 10:09:46 -07:00
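The class-to-environment mapping from 76a4e7127b might look like this in outline. It is a Python sketch under assumptions: the helper names are hypothetical, and giving `example` precedence when a heading carries both classes is a guess, not something the commit message states.

```python
def beamer_block_env(heading_classes):
    """Choose the Beamer environment for a block from its heading's classes."""
    if "example" in heading_classes:
        return "exampleblock"
    if "alert" in heading_classes:
        return "alertblock"
    return "block"


def render_block(title, body, heading_classes=()):
    """Wrap already-rendered LaTeX `body` in the chosen block environment."""
    env = beamer_block_env(heading_classes)
    return "\\begin{%s}{%s}\n%s\n\\end{%s}" % (env, title, body, env)
```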
Albert Krewinkel
3ec5726c9b
Docx writer: fix alignment for cells.
This fixes a regression introduced with the colspan/rowspan
changes that caused column alignments to be ignored. The column
alignment is used only if a default alignment is specified at the cell
level; otherwise the cell-level alignment takes precedence.
2021-05-14 16:49:19 +02:00
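The precedence rule in 3ec5726c9b, sketched in Python. The string values (`"default"`, `"left"`, and so on) and the function names are assumptions for this example and do not mirror pandoc's types.

```python
def cell_alignment(cell_align, column_align):
    """Return the alignment to emit for one table cell."""
    # A cell with the default alignment inherits from its column;
    # any explicit cell-level alignment takes precedence.
    if cell_align == "default":
        return column_align
    return cell_align


def row_alignments(cell_aligns, column_aligns):
    """Resolve the alignment of every cell in a row."""
    return [cell_alignment(c, col) for c, col in zip(cell_aligns, column_aligns)]
```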
Albert Krewinkel
17d96404f5
Docx writer: allow multirow table headers
2021-05-14 16:19:20 +02:00
Albert Krewinkel
875f8f3654
HTML reader: don't fail on unmatched closing "script" tag.
Prevent the reader from crashing if the HTML input contains an unmatched
closing `</script>` tag.

Fixes: #7282
2021-05-14 12:13:40 +02:00
John MacFarlane
3f09f53459 Implement curly-brace syntax for Markdown citation keys.
The change provides a way to use citation keys that contain
special characters not usable with the standard citation
key syntax.  Example: `@{foo_bar{x}'}` for the key `foo_bar{x}'`.
Closes #6026.

The change requires adding a new parameter to the `citeKey`
parser from Text.Pandoc.Parsing [API change].

Markdown reader: recognize @{..} syntax for citations.

Markdown writer:  use @{..} syntax for citations when needed.

Update manual with curly-brace syntax for citations.

Closes #6026.
2021-05-13 21:59:32 -07:00
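One way to read an `@{..}` key as introduced in 3f09f53459 is shown below. It is a Python sketch and not the `citeKey` parser itself: the function name is hypothetical, and requiring nested braces to balance is an assumption inferred from the example key in the commit message.

```python
def read_braced_cite_key(text, start=0):
    """Return (key, end_index) for an `@{...}` citation at text[start:], else None."""
    if not text.startswith("@{", start):
        return None
    depth = 1
    i = start + 2
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                # Everything between the outer braces is the key.
                return text[start + 2:i], i + 1
        i += 1
    return None  # braces never balanced, so this is not a citation
```

Applied to the text `@{foo_bar{x}'} says`, the sketch returns the key between the outer braces together with the index just past the closing brace.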
John MacFarlane
edca1d1656 Plain writer: handle superscript unicode minus.
Closes #7276.  Note:  currently we still get unwanted
white space around the minus; this needs to be addressed
with a change in texmath.
2021-05-12 11:12:27 -07:00
John MacFarlane
0217ae2a4f Handle 'annote' field in bibtex/biblatex writer.
Closes #7266.
2021-05-12 11:05:55 -07:00
John MacFarlane
46309319ef Fix source position reporting for YAML bibliographies.
Closes #7273.
2021-05-12 06:01:13 -06:00
John MacFarlane
5eb7ad7d1e Improve integration of settings from reference.docx.
The settings we can carry over from a reference.docx are
autoHyphenation, consecutiveHyphenLimit, hyphenationZone,
doNotHyphenateCap, evenAndOddHeaders, and proofState.

Previously this was implemented in a buggy way, so that the
reference doc's values AND the new values were included.

This change allows users to create a reference.docx that
sets w:proofState for spelling or grammar to "dirty,"
so that spell/grammar checking will be triggered on the
generated docx.

Closes #1209.
2021-05-11 22:31:38 -06:00
John MacFarlane
a66e50840b T.P.XML.Light - add Eq, Ord instances...
for Content, Element, Attr, CDataKind.
[API change]
2021-05-11 09:01:36 -06:00
John MacFarlane
2bd5d0cafb LaTeX writer: better handling of line breaks in simple tables.
Now we also handle the case where they're embedded in other
elements, e.g. spans. Closes #7272.
2021-05-11 07:52:05 -06:00
nuew
ff7176de80
epub Writer: Fix belongs-to-collection XML id choice (#7267)
The epub writer previously used the same XML id for both the book
identifier and the epub collection. This causes an error on epubcheck.
2021-05-10 09:26:32 -06:00
John MacFarlane
2a2e08d823 RST reader: seek include files in the directory...
...of the file containing the include directive, as
RST requires.

Closes #6632.
2021-05-09 19:11:35 -06:00
John MacFarlane
b2398cd747 Org reader: Resolve org includes relative to ...
...the directory containing the file containing the
INCLUDE directive.  Closes #5501.
2021-05-09 19:11:35 -06:00
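The lookup rule shared by 2a2e08d823 and b2398cd747 can be sketched in Python as follows. The function name is hypothetical, and POSIX-style paths are assumed to keep the example deterministic.

```python
import posixpath


def resolve_include(including_file, target):
    """Resolve an include target relative to the file that contains the directive."""
    if posixpath.isabs(target):
        return target
    base_dir = posixpath.dirname(including_file)
    return posixpath.normpath(posixpath.join(base_dir, target))
```

With this rule, an include of `../shared/note.rst` inside `docs/chapters/intro.rst` resolves to `docs/shared/note.rst`, regardless of the directory from which the converter is started.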
John MacFarlane
41a3ac9da9 RST reader: use insertIncludedFile from T.P.Parsing...
instead of reproducing much of its code.
2021-05-09 19:11:34 -06:00
John MacFarlane
05ea507bd7 T.P.Parsing: improve include file functions.
Remove old `insertIncludedFileF`. [API change]
Give `insertIncludedFile` a more general type, allowing it
to be used where `insertIncludedFileF` was.
2021-05-09 19:11:34 -06:00
John MacFarlane
6e45607f99 Change reader types, allowing better tracking of source positions.
Previously, when multiple file arguments were provided, pandoc
simply concatenated them and passed the contents to the readers,
which took a Text argument.

As a result, the readers had no way of knowing which file
was the source of any particular bit of text.  This meant that
we couldn't report accurate source positions on errors or
include accurate source positions as attributes in the AST.
More seriously, it meant that we couldn't resolve resource
paths relative to the files containing them
(see e.g. #5501, #6632, #6384, #3752).

Add Text.Pandoc.Sources (exported module), with a `Sources` type
and a `ToSources` class.  A `Sources` wraps a list of `(SourcePos,
Text)` pairs. [API change] A parsec `Stream` instance is provided for
`Sources`.  The module also exports versions of parsec's `satisfy` and
other Char parsers that track source positions accurately from a
`Sources` stream (or any instance of the new `UpdateSourcePos` class).

Text.Pandoc.Parsing now exports these modified Char parsers instead of
the ones parsec provides.  Modified parsers to use a `Sources` as stream
[API change].

The readers that previously took a `Text` argument have been
modified to take any instance of `ToSources`. So, they may still
be used with a `Text`, but they can also be used with a `Sources`
object.

In Text.Pandoc.Error, modified the constructor PandocParsecError
to take a `Sources` rather than a `Text` as first argument,
so parse error locations can be accurately reported.

T.P.Error: showPos, do not print "-" as source name.
2021-05-09 19:11:34 -06:00
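The idea behind `Sources` in 6e45607f99, a list of chunks that each remember where they start, can be illustrated with a small Python model. All names here are hypothetical stand-ins for the Haskell types, and tab handling is deliberately ignored.

```python
from typing import Iterator, List, Optional, Tuple

SourcePos = Tuple[str, int, int]        # (source name, line, column), 1-based
Sources = List[Tuple[SourcePos, str]]   # each chunk knows its starting position


def to_sources(files: List[Tuple[str, str]]) -> Sources:
    """Build a Sources value from (name, text) pairs, one chunk per file."""
    return [((name, 1, 1), text) for name, text in files]


def chars_with_pos(sources: Sources) -> Iterator[Tuple[SourcePos, str]]:
    """Yield every character together with the position it came from."""
    for (name, line, col), text in sources:
        for ch in text:
            yield (name, line, col), ch
            if ch == "\n":
                line, col = line + 1, 1
            else:
                col += 1


def first_position(sources: Sources, wanted: str) -> Optional[SourcePos]:
    """Report where `wanted` first occurs, e.g. to label a parse error."""
    for pos, ch in chars_with_pos(sources):
        if ch == wanted:
            return pos
    return None
```

Because positions restart with each chunk, an error in the second file is reported against that file's own name and line number rather than against an offset into the concatenated text.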
Albert Krewinkel
295d93e96b
ConTeXt writer: support blank lines in line blocks.
Fixes: #6564

Thanks to @denismaier.
2021-05-07 17:17:47 +02:00
Albert Krewinkel
8357b835d9
App: allow tabs expansion even if file-scope is used
Tabs in plain-text inputs are now handled correctly, even if the
`--file-scope` flag is used.

Closes: #6709
2021-05-05 19:09:21 +02:00
Albert Krewinkel
ddbf83f62c
Docx writer: support colspans and rowspans in tables
See: #6315
2021-05-01 18:52:24 +02:00
Albert Krewinkel
3da919e35d
Add new internal module Text.Pandoc.Writers.GridTable
2021-05-01 18:52:24 +02:00
tecosaur
6b16f3bb0d
Org writer: inline latex envs need newlines (#7259)
Closes #7252

As specified in https://orgmode.org/manual/LaTeX-fragments.html, an
inline \begin{}...\end{} LaTeX block must start on a new line.
2021-04-30 10:23:28 +02:00
mbrackeantidot
b6a65445e1
Docx reader: add handling of vml image objects (jgm#4735) (#7257)
They represent images, the same way as other images in vml format.
2021-04-29 09:11:44 -07:00
John MacFarlane
d14c5f94df Further improvements in smart quotes.
Improves heuristic for detection of an "open double quote."
Closes #2103.
2021-04-29 08:48:49 -07:00
John MacFarlane
80e2e88287 Smarter smart quotes.
Treat a leading " with no closing " as a left curly quote.
This supports the practice, in fiction, of continuing
paragraphs quoting the same speaker without an end quote.
It also helps with quotes that break over lines in line
blocks.

Closes #7216.
2021-04-28 23:32:37 -07:00
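A heavily simplified Python sketch of the rule in 80e2e88287 follows. It operates on one paragraph of plain text and ignores everything else the real smart-quote parser considers; the function name is hypothetical.

```python
LEFT_DOUBLE_QUOTE = "\u201c"


def smarten_leading_quote(paragraph):
    """Turn a leading straight quote with no closing partner into a left curly quote."""
    if paragraph.startswith('"') and '"' not in paragraph[1:]:
        return LEFT_DOUBLE_QUOTE + paragraph[1:]
    return paragraph
```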
Albert Krewinkel
85f379e474
JATS writer: use either styled-content or named-content for spans.
If the element has a content-type attribute, or at least one class, then
that value is used as `content-type` and the span is put inside a
`<named-content>` element. Otherwise a `<styled-content>` element is
used instead.

Closes: #7211
2021-04-28 22:21:34 +02:00
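The element choice described in 85f379e474, sketched in Python. The function name is hypothetical, and taking the first class when a span has several is an assumption; the commit message only says that a class is used.

```python
from xml.sax.saxutils import escape, quoteattr


def jats_span(text, classes=(), content_type=None):
    """Render a span as <named-content> or <styled-content>."""
    # An explicit content-type wins; otherwise fall back to a class if present.
    ctype = content_type or (classes[0] if classes else None)
    if ctype is not None:
        return "<named-content content-type=%s>%s</named-content>" % (
            quoteattr(ctype),
            escape(text),
        )
    return "<styled-content>%s</styled-content>" % escape(text)
```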
Albert Krewinkel
0921b82d98
Docx writer: autoset table width if no column has an explicit width.
2021-04-27 13:27:20 +02:00
John MacFarlane
3a98f7a0c7 Minor code reformatting.
Also taking this opportunity to note, for the record, that
the commit for #7241 should be marked [API change].
It changes the type of `languagesByExtension` in Highlighting,
adding a parameter for a `SyntaxMap`.
2021-04-25 12:22:04 -07:00
Jan Tojnar
c56d080a25
Writers: Recognize custom syntax definitions (#7241)
Languages defined using `--syntax-definition` were not recognized by `languagesByExtension`.
This patch corrects that, allowing the writers to see all custom definitions.

The LaTeX writer still uses the default syntax map, but that's okay in that context, since
`--syntax-definition` won't create new listings styles.
2021-04-25 12:19:07 -07:00
Jan Tojnar
e9c0f9f97b
Markdown writer: Cleaner (code)blocks with single class (#7242)
When a block only has a single class and no other attributes,
it is not necessary to wrap the class attribute in curly braces –
the class name can be placed after the opening mark as is.

This will result in slightly cleaner output when pandoc is used
as a markdown pretty-printer.
2021-04-25 10:36:06 -07:00
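The shortcut from e9c0f9f97b can be sketched in Python like this. The function name and the exact spacing of the output are assumptions; only the rule that a lone class needs no curly braces comes from the commit message.

```python
FENCE = "`" * 3  # built this way to avoid a literal fence inside the example


def opening_line(mark, classes=(), identifier="", keyvals=()):
    """Return the opening line of a fenced block.

    `mark` is the fence itself, e.g. FENCE for code blocks or ":::" for divs.
    """
    classes = list(classes)
    keyvals = list(keyvals)
    if not (classes or identifier or keyvals):
        return mark
    # A single class and nothing else: the bare class name is enough.
    if len(classes) == 1 and not identifier and not keyvals:
        return mark + " " + classes[0]
    # Otherwise fall back to the full attribute syntax in curly braces.
    parts = []
    if identifier:
        parts.append("#" + identifier)
    parts.extend("." + c for c in classes)
    parts.extend('%s="%s"' % (k, v) for k, v in keyvals)
    return mark + " {" + " ".join(parts) + "}"
```

So a div with the single class `warning` opens with `::: warning`, while one that also has an identifier still needs the braced form such as `::: {#n1 .note}`.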
John MacFarlane
547bc2cdf8 Add quotes properly in markdown YAML metadata fields.
This fixes a bug, which caused the writer to look at the LAST
rather than the FIRST character in determining whether quotes
were needed.  So we got spurious quotes in some cases and
didn't get necessary quotes in others.

Closes #7245.  Updated a number of test cases accordingly.
2021-04-25 10:31:33 -07:00
Albert Krewinkel
dc0ba7294d
Docx writer: add missing file
2021-04-20 13:38:16 +02:00
Albert Krewinkel
0b74bbbdaa
Docx writer: extract Table handling into separate module
2021-04-20 10:57:54 +02:00
John MacFarlane
16d372abcb Issue error message when reader or writer format is malformed.
Previously we exited with an error status but (due to a bug)
no message.

Closes #7231.
2021-04-19 08:38:31 -07:00
John MacFarlane
73d394ca2a Use MetaInlines not MetaBlocks for multimarkdown metadata fields.
This gives better results in converting to e.g. pandoc markdown.

Ref: <https://groups.google.com/d/msgid/pandoc-discuss/9728d1f4-040e-4392-aa04-148f648a8dfdn%40googlegroups.com>
2021-04-18 22:01:12 -07:00
John MacFarlane
a478a5c4c8 Update to released unicode-collation, latest citeproc dev version.
Update citeproc test.
2021-04-17 16:15:14 -07:00
John MacFarlane
7a7fefce5e Use document's lang for the lang parameter of citeproc...
even if it differs from localeLanguage.  (It is designed
to be possible to override the locale language, and this
is especially useful when one wants to use the unicode
extension syntax, e.g. fr-u-kb.)
2021-04-17 16:15:14 -07:00
John MacFarlane
aecbf8156e Remove Text.Pandoc.BCP47 module.
[API change]

Use Lang from UnicodeCollation.Lang instead.
This is a richer implementation of BCP 47.
2021-04-17 16:15:14 -07:00
John MacFarlane
7ba8c0d2a5 Move getLang from BCP47 -> T.P.Writers.Shared.
[API change]
2021-04-17 16:15:13 -07:00
Albert Krewinkel
5f79a66ed6
JATS writer: reduce unnecessary use of <p> elements for wrapping
The `<p>` element is used for wrapping in cases where the contents would
otherwise not be allowed in a certain context. Unnecessary wrapping is
avoided, especially around quotes (`<disp-quote>` elements).

Closes: #7227
2021-04-16 22:47:37 +02:00