Commit graph

8003 commits

Author SHA1 Message Date
John MacFarlane
7fb74b74df JATS reader: strip ref- prefix from ref id in xref.
This completes commit 807a574e9d.
Closes #8007.
2022-04-06 23:14:10 -07:00
John MacFarlane
f406f93dab LaTeX reader: avoid a thunk in sRawTokens. 2022-04-02 11:00:44 -07:00
John MacFarlane
1b97846be2 Fix regression with ascii_identifiers and Turkish undotted i.
Closes #8003.
2022-04-01 10:41:33 -07:00
John MacFarlane
98ff548c5e Revert "Parsing.General: make manyChar1, etc. more strict."
This reverts commit c1ab48874c.

Mistake in measurement.
2022-03-31 23:45:28 -07:00
John MacFarlane
c1ab48874c Parsing.General: make manyChar1, etc. more strict.
Profiling the muse reader revealed that these were creating huge thunks.
2022-03-31 23:09:14 -07:00
John MacFarlane
ffa13769e6 RTF reader: increased strictness.
This leads to some performance improvements.
2022-03-31 10:11:48 -07:00
John MacFarlane
8b21ec7d0c Markdown reader: add some strictness.
This improves some benchmarks significantly.
2022-03-31 10:11:48 -07:00
Albert Krewinkel
ad726953b9
Lua: allow to pass Sources to pandoc.read (#8002)
Sources, the data type passed to the `Reader` function in custom
readers, are now accepted as input to `pandoc.read`.
2022-03-30 14:10:30 -07:00
John MacFarlane
b8e0d574b1 Strictness improvement in RTF reader. 2022-03-30 13:19:09 -07:00
John MacFarlane
5f0bfd41a8 LaTeX writer: add () after booktabs rules.
These commands take optional arguments with () and [],
which can lead to problems if the content of the table
cell begins with these characters.

Closes #8001.
2022-03-30 10:07:09 -07:00
John MacFarlane
bb5f0f7b76 HTML writer: Further performance improvements. 2022-03-30 09:48:56 -07:00
John MacFarlane
4a54ca5b0b Add mime type for mkv extension (#7181). 2022-03-30 09:44:22 -07:00
John MacFarlane
d71d01f41a HTML writer: add a performance shortcut to strToHtml. 2022-03-30 09:34:16 -07:00
John MacFarlane
5fbea20e03 Fixed two thunk leaks in RTF reader.
This further reduces memory usage.
See #7943.
2022-03-29 22:42:20 -07:00
John MacFarlane
76748ee0fe JATS reader: handle pub-date.
Closes #8000.
2022-03-29 19:41:14 -07:00
John MacFarlane
a9498a1568 LaTeX writer: support page,trim,clip attributes on images.
These are actually supported by `\includegraphics`, though
this is not well documented. See
https://tex.stackexchange.com/questions/7938/pdflatex-includegraphics-and-multi-page-pdf-files

Partially addresses #7181.
2022-03-29 09:03:28 -07:00
Albert Krewinkel
7a7e1b2b70
RST reader: wrap math in Span to preserve attributes (#7998)
Math elements with a name, classes, or other fields are wrapped in a
`Span` with these attributes.
2022-03-29 08:50:55 -07:00
Jonathan Dönszelmann
cd931e55b6
Refactor Text.Pandoc.Writers.EPUB (#7991)
Refactor for readability.

Co-authored-by: Ola Wolska <A.k.wolska@student.tudelft.nl@gmail.com>
Co-authored-by: Ivar de Bruin <ivardb@gmail.com>
Co-authored-by: Jaap de Jong <jaapdejong15@gmail.com>
2022-03-29 08:40:20 -07:00
Albert Krewinkel
40dd8fd129
Include Lua version in --version output. (#7997) 2022-03-29 08:38:00 -07:00
Albert Krewinkel
e4f4be6c80
Remove redundant dependency on hslua-marshalling.
The package is a dependency of hslua; all important modules are
re-exported.
2022-03-29 08:04:49 +02:00
John MacFarlane
807a574e9d JATS reader: strip 'ref-' from ref id in constructing CSL id.
This allows better round-tripping, because the JATS
writer adds the `ref-` prefix to the citation id to get
the ref element's id.
2022-03-28 18:50:03 -07:00
John MacFarlane
51c8b059e1 JATS reader: improve refs parsing.
Handle issn and isbn; use simpler form for issued date.
2022-03-28 18:37:33 -07:00
John MacFarlane
6217fd0976 JATS writer: Fix handling of CSL variable 'page'.
Not 'pages' as we had before.  It should go to 'lpage' and 'rpage',
not 'page-range'.  See
https://jats.nlm.nih.gov/archiving/tag-library/1.1/element/page-range.html

Fixed some mistakes in test #7042.
2022-03-28 17:04:10 -07:00
John MacFarlane
5c7dc4c7f3 JATS reader: support PMID, DOI, issue in citations.
Closes #7995.
2022-03-28 17:04:10 -07:00
Albert Krewinkel
c5cd03a022
JATS writer: keep edition info in element citations.
Closes: #7993
2022-03-28 21:45:56 +02:00
John MacFarlane
35350fac85 JATS writer: avoid doubled ref-list element.
Previously when generating JATS with the `element_citations`
extension enabled, the references were put in a doubly-nested
ref-list element (`<ref-list><ref-list>...`).  This is now fixed.

Closes #7990.
2022-03-27 09:32:55 -07:00
Nikolai Korobeinikov
e2923747a4
Docx writer: add bookmark with table id to table (#7989)
This allows tables with ids to be linked to.

Closes #7285.
2022-03-26 10:00:05 -07:00
John MacFarlane
51f18d52c7 Rename T.P.Parsing.Combinators -> T.P.Parsing.General.
Because many of the exported things aren't combinators...

Also remove redundant export of indentWith from T.P.Parsing.Lists.
2022-03-25 11:14:54 -07:00
John MacFarlane
f520ac9b17 T.P.Parsing: use explicit imports. 2022-03-25 11:03:35 -07:00
John MacFarlane
1572c27241 More optimization of RTF reader. 2022-03-25 09:14:47 -07:00
John MacFarlane
672822cf98 RTF reader: optimize parsing of unformatted text. 2022-03-25 08:38:50 -07:00
John MacFarlane
dafdd16e10 Sources: small strictness optimization 2022-03-25 08:38:10 -07:00
John MacFarlane
36786e86fb RTF reader: more memory usage optimizations.
See #7943.
2022-03-24 23:39:14 -07:00
John MacFarlane
0de829090c Small optimizations in RTF reader. 2022-03-24 22:39:24 -07:00
Albert Krewinkel
b9eeb77df5
[API change] Unify grid table parsing (#7971)
Grid table parsing in Markdown and rst are updated to use the same
functions. Functions are generalized to meet requirements for both
formats.

This change also lays the ground for further generalizations in table
parsers, including support for advanced table features.

API changes in Text.Pandoc.Parsing:

- Parse results of functions `tableWith'` and `gridTableWith'` are now a
  `mf TableComponents` instead of a quadruple of alignments, column
  widths, header rows and body rows.

Additional exports from Text.Pandoc.Parsing:

- `tableWith'`
- `TableComponents`
- `TableNormalization`
- `toTableComponents`
- `toTableComponents'`
2022-03-24 11:59:20 -07:00
John MacFarlane
9fa2aeb489 RTF reader: more efficient parsing of command parameters. 2022-03-24 11:38:55 -07:00
Albert Krewinkel
4394fdf59c
JATS writer: encode author "others" as <etal/>
Citeproc adopted the BibTeX convention to use the author name "others"
when there are additional authors that are not named. JATS uses the
`<etal>` element for this.
2022-03-22 15:09:14 +01:00
Albert Krewinkel
69177861a4
Parsing.GridTable: simplify column handling code. 2022-03-18 14:20:49 +01:00
Albert Krewinkel
eaba313fb3
Writers.GridTable: improve module documentation. 2022-03-18 14:16:03 +01:00
Albert Krewinkel
43e549b2fb
Markdown writer: move table-related code into submodule. 2022-03-18 14:15:56 +01:00
John MacFarlane
75ddff2422 Allow formatted bibliography to be placed in metadata fields.
This modifies `processCitations` so that pandoc will look not just
in the document body but in metadata for a Div with id `refs` in
which to place the formatted bibliography.

Thus, one can include a metadata field, say `refs`, whose content
is an empty div with id `refs`, and the formatted bibliography
will be put into this metadata field.  It may then be interpolated
into a template using the variable `refs`.

Closes #7969.

Closes #526 by providing a way to interpolate references into
a template.
2022-03-16 14:37:51 -07:00
John MacFarlane
54f6e1be9b Remove native_divs from allowed gfm extensions.
This allows `<div>` to be suppressed using `-raw_html`.
Previously `native_divs` was enabled but could
not be suppressed, because it was not in the list of
available extensions for commonmark-based formats.

Closes #7965.
2022-03-14 12:45:45 -07:00
Albert Krewinkel
1aeeba9ecb
Shared: define ordNub as alias for nubOrd from containers package (#7963)
This requires at least containers 0.6.0.1, which ships with the oldest
GHC version currently supported by pandoc (GHC 8.6).
2022-03-13 08:42:30 -07:00
Albert Krewinkel
edfe34c86c
Document more functions in T.P.Parsing and T.P.Shared. 2022-03-12 23:16:31 +01:00
John MacFarlane
699336cf5b LaTeX reader: better handling of \usepackage.
If the package is local but causes parse errors, parse
everything up to the error and skip the rest.  Issue a
CouldNotParseIncludeFile warning indicating that parsing
failed at that point.

T.P.Logging: add CouldNotParseIncludeFile constructor.
2022-03-12 12:18:51 -08:00
John MacFarlane
f9a4e049c5 T.P.Readers.LaTeX.Parsing: Monoid and Semigroup instances for TokStream. 2022-03-12 10:23:25 -08:00
John MacFarlane
6abcde0bf7 LaTeX reader: further optimizations for inline parsing. 2022-03-11 21:59:26 -08:00
John MacFarlane
b423c17100 LaTeX reader: use custom TokStream...
that keeps track of whether macros are expanded. This allows
us to improve performance a bit by avoiding unnecessary
runs of the macro expansion code (e.g. from 24 ms to 20 ms on
our standard benchmark).
2022-03-11 19:51:59 -08:00
Albert Krewinkel
517bceeba8
Parsing: partition module into (internal) submodules (#7962) 2022-03-11 09:21:59 -08:00
Albert Krewinkel
168529f0a4
Org writer: stop indenting property drawers, quote blocks
This follows the current default org-mode behavior.

Closes: #3245
2022-03-11 12:12:04 +01:00
John MacFarlane
a7d94dba43 Org reader: allow multiple #+bibliography:. 2022-03-10 13:31:02 -08:00
John MacFarlane
18c432024b Org reader: parse #+print_bibliography: as Div with id refs. 2022-03-10 13:15:52 -08:00
John MacFarlane
581c94913f LaTeX reader: allow inline groups starting with \bgroup.
Closes #7953.
2022-03-09 17:53:00 -08:00
John MacFarlane
9b5ec100e5 Markdown writer: update escaping rules for \.
We now escape `\` only if `raw_tex` is enabled or
it is followed by a non-alphanumeric.
2022-03-07 10:46:07 -08:00
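
The escaping rule described in this commit can be illustrated with a minimal Python sketch. This is not pandoc's Haskell code: the function name `escape_backslashes` and the treatment of a backslash at the very end of the text (escaped here) are assumptions made for illustration.

```python
def escape_backslashes(text: str, raw_tex: bool) -> str:
    """Escape a backslash if raw_tex is enabled, or if the character
    after it is not alphanumeric. Hypothetical helper, not pandoc's API."""
    out = []
    for i, ch in enumerate(text):
        if ch == "\\":
            # Assumption: a backslash at the end of the text counts as
            # "followed by a non-alphanumeric" and is therefore escaped.
            nxt = text[i + 1] if i + 1 < len(text) else ""
            if raw_tex or not nxt.isalnum():
                out.append("\\\\")
                continue
        out.append(ch)
    return "".join(out)
```

Under this rule a backslash before a letter or digit is left alone when `raw_tex` is disabled, since it cannot be mistaken for a Markdown escape.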
John MacFarlane
abffe63274 Remove raw_tex extension from list of commonmark...
extensions, and from the `commonmark_x` defaults.
commonmark doesn't parse raw TeX, and it doesn't
make sense to write it if we don't parse it.
2022-03-07 10:45:03 -08:00
John MacFarlane
0124e8b095 Org reader: handle #+bibliography: as metadata...
so that it can work with citeproc.
2022-03-04 22:50:17 -08:00
John MacFarlane
326a00ab1a DocBook reader: handle address and copyright in metadata.
See #7747.
2022-02-28 10:22:09 -08:00
John MacFarlane
b94ad5b2ed DocBook reader: improve info parsing.
Simplify metadata parsing code.
Handle abstract as block-level content.
Report skipped info elements with `--verbose`.

See #7747.
2022-02-28 10:19:04 -08:00
John MacFarlane
dc0fdb2709 DocBook reader: handle abstract in info section.
See #7747.
2022-02-28 08:38:03 -08:00
John MacFarlane
97c4f3f237 LaTeX reader: rudimentary support for vbox.
Closes #7939.
2022-02-27 23:24:30 -08:00
John MacFarlane
7f6021d7b2 Markdown writer: don't produce redundant header identifier...
when the `gfm_auto_identifiers` extension is set.

Closes #7941.
2022-02-26 11:37:46 -08:00
John MacFarlane
5375bd1446 DocBook reader: handle complete set of entities...
as specified at <https://www.w3.org/2003/entities/2007doc/byalpha.html>.

Closes #7938.
2022-02-24 15:50:53 -08:00
John MacFarlane
7dea81f992 Text.Pandoc.XML.Light: add versions of the parsers...
that allow specifying a custom entity map.

Exports new functions `parseXMLElementWithEntities`,
`parseXMLContentsWithEntities` [API change].
2022-02-24 14:47:35 -08:00
John MacFarlane
2b05ce6a81 Ensure that valid XML identifiers are used in...
Docbook, EPUB, FB2, HTML4, S5, Slidy, Slideous,
ICML, ODT, TEI writers.

Thus, if you convert `[anchor]{#1} and [link to](#1)`,
`id_1` will be used instead of `1` for the identifier.
2022-02-23 16:54:37 -08:00
John MacFarlane
9dc5e31416 T.P.Writers.Shared: export ensureValidXmlIdentifiers.
This function changes identifiers that don't start
with letters, and internal links to these identifiers,
making them compatible with XML standards.  The change
is simple: we add `id_` to the front.  There is potential
for duplication if there are already `id_...` identifiers
defined, but this seems rare enough not to worry too much
about.
2022-02-23 16:53:01 -08:00
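
As a rough illustration of the rule described above (prefix `id_` to identifiers that do not start with a letter, and rewrite internal links to match), here is a small Python sketch. The names `ensure_valid_xml_identifier` and `fix_internal_link` are hypothetical; the real function is `ensureValidXmlIdentifiers` in Haskell, which operates on a whole document and may differ in detail.

```python
def ensure_valid_xml_identifier(ident: str) -> str:
    """Prefix 'id_' when the identifier does not start with a letter.
    Empty identifiers are left untouched (an assumption of this sketch)."""
    if ident and not ident[0].isalpha():
        return "id_" + ident
    return ident


def fix_internal_link(target: str) -> str:
    """Apply the same change to internal links of the form '#ident'."""
    if target.startswith("#"):
        return "#" + ensure_valid_xml_identifier(target[1:])
    return target
```

With `[anchor]{#1} and [link to](#1)`, the identifier `1` becomes `id_1` and the link target `#1` becomes `#id_1`, so the two stay in sync. As the commit message notes, this can collide with an existing `id_1`.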
John MacFarlane
8ceea05c75 Markdown reader: remove restriction on identifiers...
so they no longer need to begin with a letter. Closes #7920.
2022-02-23 10:04:15 -08:00
Albert Krewinkel
cfa473e9cd
Remove trailing whitespace in Writers.Markdown. 2022-02-23 09:21:25 +01:00
John MacFarlane
3d7eb129bd --version: print hslua version.
This will help us determine which version of Lua pandoc
is compiled against. See #7929.
2022-02-22 14:00:15 -08:00
ivardb
91b391e5a6
Add scrreport to the latex writer chaptersClasses. Fixed #6168 (#7935)
If scrreport is chosen as the LaTeX documentclass, chapters will now be used instead of sections. This behaviour is intended, as scrreport is an alias for scrreprt, which already created chapters.
2022-02-22 13:20:16 -08:00
John MacFarlane
aa90302284 LaTeX writer: avoid extra space before \CSLRightInline.
Closes #7932.
2022-02-22 13:19:16 -08:00
Dimitris Apostolou
2f521081ad
Fix typos (#7934) 2022-02-22 09:05:39 -08:00
Lucas V. R
3bc3e96837 Org reader: More flexible LaTeX environments
Looking at the definition of `org-element-latex-environment-parser`, one
sees that Org allows arbitrary arguments to LaTeX environments. In fact,
it parses every char just after `\begin{xxx}` until `\end{xxx}` as
content for the environment, so all the following examples are valid
environments:

```org
\begin{equation} e = mc^2 \end{equation}
```
```org
\begin{tikzcd}[ampersand replacement=\&]
	A \& B \\
	C \& D
	\arrow[from=1-1, to=1-2]
	\arrow["f", from=2-1, to=2-2]
\end{tikzcd}
```
2022-02-21 16:04:14 +01:00
John MacFarlane
6fe8014a2c LaTeX reader: Handle \label and \ref for footnotes.
Closes #7930.
2022-02-19 11:55:46 -08:00
Albert Krewinkel
a3117bc142
Relax upper bound for hslua, allow hslua-2.2. (#7929)
Lua 5.4 is used by default after this is merged. Packagers may still include Lua 5.3
instead by building pandoc with `--constraint='hslua <2.2'`.

Differences between 5.3 and 5.4 should not generally affect pandoc Lua filters.
See the list of incompatible changes at <https://www.lua.org/manual/5.4/manual.html#8.1>.
2022-02-19 11:26:18 -08:00
John MacFarlane
fb465070eb Ipynb writer: handle metadata better.
Previously we used the markdown writer to render metadata.
This had some undesirable consequences (e.g. en dash expanded
to `--` when `smart` enabled), so now we use the plain writer.

This addresses #7928, but I think a more elegant fix is possible.
2022-02-18 17:46:36 -08:00
John MacFarlane
5b84c0f09d Change --metadata-file parsing...
...so that, when the input format is not markdown or a markdown
variant, pandoc's markdown is used.  When the input format is
a markdown variant, the same format is used.  Reason for the change:
it doesn't make sense to run the markdown parser with a set of
extensions designed for a non-markdown format, and this dramatically
limits what people can do in metadata files.

Refines #6832.  Closes #7926.

Perhaps this can be reconsidered if we come up with a way
of specifying an arbitrary format for the metadata file (#5914).
2022-02-18 09:35:38 -08:00
John MacFarlane
85136b064f Markdown reader: allow one-column pipe tables with pipe on right.
See #7919.

We still need to implement this for gfm (commonmark).
This must be done via changes in commonmark-hs.
2022-02-13 13:06:49 -08:00
John MacFarlane
f3b0f19d7a Ensure that you don't get PDF output to terminal.
`-t pdf` should behave like `-t docx` and give an error
unless the output is redirected.
2022-02-13 00:26:54 -08:00
Albert Krewinkel
e1b7f3a63d
JATS reader: improve handling of fn-group elements (#7914)
Footnotes in `<fn-group>` elements are collected and re-inserted into
the document as proper footnotes in the place where they are referenced.

Fixes: #6348
2022-02-12 17:39:02 -08:00
damon-sava-stanley
01ec1ac43a
Put id attributes on TOC entries #7907 (#7913)
Naming scheme of id is "toc-" + id of linked to header/section.
In Shared, will affect HTML, Markdown, Powerpoint, and RTF.
2022-02-11 21:37:00 -08:00
John MacFarlane
899feec4d3 RST reader: fix treatment of headerless simple tables.
We were producing a header with blank cells rather than no
header.  Closes #7902.
2022-02-11 09:42:24 -08:00
John MacFarlane
e8c1c6adb1 Clean up import list. 2022-02-11 09:29:55 -08:00
damon-sava-stanley
d3716eaeb6
Add DokuWiki table alignment for #5202 (#7908)
Closes #5202.

Within each cell, determine the cell alignment as per
https://www.dokuwiki.org/wiki:syntax#tables. The current approach, as
per the issue, treats the first row's alignment as determining
that of the entire column. Given this, it wastes some work in
determining an alignment for every cell.
2022-02-11 08:58:29 -08:00
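
The approach described in this commit can be sketched in Python, assuming the DokuWiki convention that at least two spaces on the side opposite the text signal alignment (two on the left for right-aligned, two on the right for left-aligned, both for centered). The function names are hypothetical and this is not the reader's actual Haskell code.

```python
from typing import List


def cell_alignment(raw: str) -> str:
    """Classify one raw cell by its leading and trailing padding."""
    left_pad = raw.startswith("  ")
    right_pad = raw.endswith("  ")
    if left_pad and right_pad:
        return "center"
    if left_pad:
        return "right"
    if right_pad:
        return "left"
    return "default"


def column_alignments(rows: List[List[str]]) -> List[str]:
    """Use the first row's cells to decide each column's alignment."""
    return [cell_alignment(cell) for cell in rows[0]] if rows else []
```

Since only the first row decides the column alignment here, computing an alignment for the cells of later rows is the wasted work the commit message mentions.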
John MacFarlane
61996682ff --self-contained: issue warning rather than failing...
with an error, if a resource can't be found.
Closes #7904.
2022-02-10 21:39:15 -08:00
John MacFarlane
97d83e383a LaTeX reader: support \today.
Closes #7905.
2022-02-10 19:17:20 -08:00
John MacFarlane
7a888e8603 Fix parsing of epub footnotes.
Closes #7884.
2022-02-09 11:47:34 -08:00
Albert Krewinkel
7dc59aa26a
PDF: allow custom writer as format if engine is explicitly specified (#7901)
Closes #7898.
Note that it may be necessary to explicitly specify a template on the command line.
2022-02-09 10:16:41 -08:00
mjfs
63deba49d4
Docx: single numbering ID for examples - fixes #7895 (#7900)
This change ensures that example list items all belong to a single
number sequence, so that if items are added or deleted in a word
processor, the other items will renumber automatically.
2022-02-09 10:15:01 -08:00
John MacFarlane
93b1dbfdac HTML reader: give warnings and emit empty note...
when parsing `<a epub:type="noteref">` and the identifier
doesn't correspond to anything in the note table.

Previously we just silently skipped these cases.

See #7884.
2022-02-07 09:22:05 -08:00
Albert Krewinkel
4864761ad8
Custom writer: produce stacktrace if Writer function fails 2022-02-07 09:45:32 +01:00
Albert Krewinkel
0f0b042139 Custom writer: support new-style Writer function. 2022-02-06 16:37:39 -08:00
Albert Krewinkel
f738c451d7 Lua: move custom writer code into Lua hierarchy. 2022-02-06 16:37:39 -08:00
Albert Krewinkel
49f1e7608e Lua: add module pandoc.layout to format and layout text 2022-02-06 16:01:24 -08:00
Lucas V. R
ae846381c3 Org reader: allow comments above property drawer
The Org Manual page at https://orgmode.org/manual/Property-Syntax.html
says (as of 2022-02-03):

"Property blocks defined before first headline needs to be located at
the top of the buffer, allowing only comments above."

This commit allows comments above.
2022-02-06 23:08:27 +01:00
Lucas V. R
61f4771c55 Org reader: allow ":" in property drawer keys
Any non-space character is allowed as property drawer key, including ":"
itself (so it is not really a delimiter). The real delimiter is a space
character, so in a drawer like

:PROPERTIES:
::k:ey:: value
:END:

":k:ey:" is a key with value "value".

This usage can be seen in the Org Manual at
https://orgmode.org/manual/Using-Header-Arguments.html,
where the Org snippet

* Heading
  :PROPERTIES:
  :header-args:clojure:    :session *clojure-1*
  :header-args:R:          :session *R*
  :END:

is listed as an example.
2022-02-06 23:08:27 +01:00
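
To make the delimiter rule concrete, here is a hedged Python sketch of parsing one property drawer line. The key is every non-space character between the leading `:` and the last `:` that is followed by whitespace. The regular expression and the name `parse_property_line` are illustrative assumptions, not the Org reader's implementation.

```python
import re
from typing import Optional, Tuple

# Leading ':', then a run of non-space characters (the key), then a ':'
# that must be followed by whitespace, then the value.
PROPERTY_LINE = re.compile(r"^\s*:(\S+):\s+(.*)$")


def parse_property_line(line: str) -> Optional[Tuple[str, str]]:
    """Return (key, value), or None for lines such as ':END:'."""
    match = PROPERTY_LINE.match(line)
    if match is None:
        return None
    return match.group(1), match.group(2).strip()
```

For the line `::k:ey:: value` this gives the key `:k:ey:` and the value `value`. Marker lines such as `:PROPERTIES:` and `:END:` have no whitespace-delimited value and do not match.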
Jan Tojnar
876859f9e9
Docbook writer: Interpret links without contents as cross-references (#7360)
Links without text contents are converted to `<xref>` elements. DocBook
processors will generate appropriate cross-reference text when presented
with an xref element.
2022-02-06 23:05:20 +01:00
John MacFarlane
677f2ca26e Allow use of a RIS bibliography with citeproc. 2022-02-05 23:48:55 -08:00
John MacFarlane
6cc253aab6 RIS reader: support ID and DO fields. 2022-02-05 23:34:44 -08:00
John MacFarlane
3da5440858 Add RIS bibliography format reader.
New module, Text.Pandoc.Readers.RIS, exporting readRIS.

New input format `ris`.

Closes #7894.
2022-02-05 23:25:03 -08:00
Albert Krewinkel
c962cef309
Lua: set module name before pushing
Using the correct module name is relevant when auto-generating
documentation.
2022-02-05 13:32:02 +01:00
John MacFarlane
d40d94ebd9 EndNote reader: add nocite as the other bib format readers do. 2022-02-04 23:51:12 -08:00
John MacFarlane
b7f1c97b6a Docx zotero/mendeley/endnote: add comma before locator in suffix. 2022-02-04 23:28:46 -08:00
John MacFarlane
f48890eff0 Support Prefix, Suffix, Pages in endnote ADDINs. 2022-02-04 22:20:51 -08:00
John MacFarlane
19cfe6a907 Got endnote citations working in docx...
Still to do:  prefix, suffix, locator.
2022-02-04 21:54:50 -08:00
John MacFarlane
28349447cb Docx reader: skeleton for endnote citation ADDINs. 2022-02-04 17:02:43 -08:00
John MacFarlane
15316a0058 EndNote: export readEndNoteXMLCitation...
instead of readEndNoteXMLReferences.  This is the function
we'll need in the docx reader.

We still need to implement locator, prefix, and suffix.
2022-02-04 14:02:58 -08:00
John MacFarlane
d164e5bb1d Docx reader: parse EN.CITE and EN.REFLIST fields. 2022-02-04 10:04:16 -08:00
John MacFarlane
34897031f4 Add endnote XML reader.
New input format: endnotexml

New reader module: Text.Pandoc.Readers.EndNote, exporting
`readEndNoteXML` and `readEndNoteXMLReferences`. [API change]

This reader is still a bit rudimentary, but it should
be good enough to be helpful.
2022-02-04 10:03:52 -08:00
John MacFarlane
e07c0e74ce Support embedded Mendeley citations in docx.
These are supported in the same way as Zotero citations,
using the same code.  As with Zotero, enable the `citations`
extension on `docx` to parse these as native citations.

Closes #7840.
2022-02-04 10:00:23 -08:00
John MacFarlane
d40236805d MediaBag: improve detection of absolute paths.
Previously we used System.FilePath's isRelative to
determine when paths are relative (since absolute
paths need to get a new name based on the sha1 hash).
But this has an OS-specific behavior and actually
returns True on Windows for paths like `/media/file.png`.
This ought to fix #7881.
2022-02-04 09:47:07 -08:00
John MacFarlane
40b174c770 Revert "T.P.Class.IO.adjustImagePath: avoid double slash."
This reverts commit 3dcb526b9b.
2022-02-04 09:29:49 -08:00
John MacFarlane
3dcb526b9b T.P.Class.IO.adjustImagePath: avoid double slash.
Previously if the directory argument ended in slash,
we'd get a doubled slash in the path.  This may help
with #7881.
2022-02-04 07:54:52 -08:00
John MacFarlane
60caa0a1e1 Docx reader: add bibliographic entries for zotero ADDIN.
Bibliographic data embedded in citation items is added
to the `references` metadata field.

Closes #7840.
2022-02-03 22:08:46 -08:00
John MacFarlane
4086873281 Improve locators for docx Zotero citations. 2022-02-03 20:23:11 -08:00
John MacFarlane
1c2f0fe1d2 Enable citations extension for docx reader.
When enabled, Zotero citations are parsed as native pandoc
citations.  (When disabled, the Zotero-generated citation
text is passed through as regular text.)  In addition, the
Zotero-generated bibliography is suppressed.

Locators still need some work.
2022-02-03 19:34:05 -08:00
John MacFarlane
9ef8650612 Docx reader: Parse CSL JSON in Zotero addin.
This gives us what we need for #7840, except adding
to the references in metadata.
2022-02-03 16:04:15 -08:00
John MacFarlane
b9ac243986 Trim whitespace from math in --webtex.
This fixes problems with --webtex and markdown output,
when display math starts or ends with a newline.

Closes #7892.
2022-02-03 13:21:27 -08:00
John MacFarlane
9618b66fe8 Whitespace fixes. 2022-02-03 13:13:03 -08:00
John MacFarlane
0011c9520d Docx reader: add more framework for Zotero citations.
- Add docxReferences to state, so we can accumulate
  references for metadata.
- Add a clause for ZoteroItem to parPartToInlines'.
  So far it doesn't do anything except add a surrounding Cite element.

See #7840.
2022-02-03 07:39:07 -08:00
John MacFarlane
54279149ab Use unreleased citeproc. 2022-02-03 07:37:13 -08:00
John MacFarlane
6ed8999f75 LaTeX reader: handle subequations as inline math environment.
Closes #7883.
2022-02-02 10:41:46 -08:00
Albert Krewinkel
2fa8308afa Restore wkhtmltopdf as default pdf engine for HTML 2022-02-01 14:43:38 -08:00
John MacFarlane
4c8c7f6dff Revert "T.P.App.Opt: fix logic bug in fullDefaultsPath."
This reverts commit 545c0911aa.

Fixes regression in 2.17.1.

The original commit was completely misguided, and caused
problems finding defaults files in the default user data
directory.
2022-01-31 09:36:49 -08:00
John MacFarlane
a246107347 LaTeX reader: ensure that \raggedright doesn't gobble an argument.
See #7757.
2022-01-29 22:15:04 -08:00
John MacFarlane
c348c4d4fe Use [x] not [X] for asciidoctor checklists.
See #7798.
2022-01-29 17:47:35 -08:00
Albert Krewinkel
fbb9fbf9bb
Custom writer: preserve order of element attributes
Attribute key-value pairs are marshaled as AttributeList, i.e., as a
userdata type that behaves both like a list and a map. This allows to
preserve the order of key-value pairs.

Closes: #7489
2022-01-29 22:36:22 +01:00
Albert Krewinkel
412596c30b Switch to hslua-2.1
This allows for some code simplification and improves stability.
2022-01-29 08:43:14 -08:00
Albert Krewinkel
a6fa3df114
HTML writer: avoid duplicate "style" attributes on table cells
Fixes: #7871
2022-01-28 18:20:14 +01:00
Even Brenden
d36a16a4df Don't read files outside of user data directory
If a file path does not exist relative to the working directory, but
it does exist relative to the user data directory, and it exists outside
of the user data directory, do not read it. This applies to readDataFile
and readMetadataFile in PandocMonad and, by extension, any module that
uses these by passing them relative paths.
2022-01-28 08:51:27 -08:00
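
The containment check at the heart of this change can be sketched in Python. This is only an illustration under stated assumptions: `data_dir` is an absolute POSIX-style path, symbolic links are not resolved, and the name `is_inside_data_dir` is hypothetical. The real logic lives in `readDataFile` and `readMetadataFile` in Haskell.

```python
import posixpath


def is_inside_data_dir(data_dir: str, relative: str) -> bool:
    """True if `relative`, resolved against `data_dir`, stays inside it.
    Assumes data_dir is absolute; symlinks are not followed."""
    base = posixpath.normpath(data_dir)
    target = posixpath.normpath(posixpath.join(base, relative))
    # The common ancestor of both paths must be the data directory itself.
    return posixpath.commonpath([base, target]) == base
```

A request such as `../../../../etc/passwd` normalizes to a path outside the data directory and is rejected, while `templates/default.latex` is accepted.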
Even Brenden
e1f8c4b396 Handle consecutive ".."s in makeCanonical
As an example, prior to this commit, "../../file" would evaluate to
"file", when it should be unchanged.
2022-01-28 08:51:27 -08:00
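
The behavior described here can be illustrated with a small Python sketch of path canonicalization for relative paths with `/` separators. The name `make_canonical` mirrors the Haskell function, but the body is an assumption written for illustration: a `..` cancels the preceding component only when that component is not itself `..`.

```python
def make_canonical(path: str) -> str:
    """Collapse '.' and 'dir/..' segments in a relative path while
    keeping leading '..' segments that have nothing to cancel."""
    out = []
    for part in path.split("/"):
        if part in ("", "."):
            continue  # skip empty and current-directory segments
        if part == ".." and out and out[-1] != "..":
            out.pop()  # 'dir/..' cancels out
        else:
            out.append(part)  # ordinary segment, or an uncancellable '..'
    return "/".join(out)
```

With this rule `../../file` is returned unchanged, while `a/../b` still reduces to `b`.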
John MacFarlane
7fbce82f2f LaTeX writer: allow arbitrary frameoptions to be passed...
to a beamer frame, using the frameoptions attribute.
Updated manual.

See #7869.
2022-01-27 14:07:51 -08:00
John MacFarlane
4fa042f847 LaTeX writer: add s and squeeze to recognized beamer frameoptions.
Closes #7869.
2022-01-27 14:07:51 -08:00
John MacFarlane
183fb3e327 LaTeX reader: improve descItem.
For some reason we were skipping arbitrary blocks before `\item`.
This is now changed to "skip whitespace and comments."
2022-01-25 08:43:12 -08:00
John MacFarlane
a9f901cf6b CommonMark reader: fix source position after YAML metadata.
Closes #7863.
2022-01-23 22:13:58 -08:00
John MacFarlane
67f2b25c05 LaTeX reader: improve handling of newif.
Adding a pair of braces around the second argument of `\def`
prevents LaTeX from making an emergency stop on the following
input (closes #6096):

```
pandoc -f markdown -o test.pdf
\newif\ifepub

\epubtrue

\ifepub

hi

\fi
^D
```
2022-01-22 21:48:14 -08:00
Even Brenden
7df29e495f
Search for metadata files in $DATADIR/metadata (#7851)
If files specified with `--metadata-file` are not found in the working
directory, look in `$DATADIR/metadata`.

Expose new `readMetadataFile` function from Text.Pandoc.Class
[API change].

Expose new `PandocCouldNotFindMetadataFileError` constructor for
`PandocError` from Text.Pandoc.Error [API change].

Closes #5876.
2022-01-21 12:00:45 -08:00
John MacFarlane
672b6dc7e6 Remove retokenizing in rawLaTeXParser.
This was causing serious problems with `newif` commands.
See #6096.  And it didn't seem to make any difference for
the tests; I assume that, unless there's some untested
behavior, this is something that has now become unnecessary.
2022-01-21 10:17:58 -08:00
John MacFarlane
52b78b10c8 Avoid putting a frame around speaker notes in beamer.
If speaker notes (a Div with class 'notes') occur right
after a section heading, but above slide level, the
resulting `\note{..}` command should not be wrapped in
a frame, as that will cause a spurious blank slide.

Closes #7857.
2022-01-20 19:09:44 -08:00
John MacFarlane
ef8135b4a7 HTML writer: don't break lines inside code elements.
With the new (default) line wrapping of HTML, in
conjunction with the default CSS which includes
`code { white-space: pre-wrap; }`, spurious line
breaks could be introduced into inline code.

Closes #7858.
2022-01-20 09:17:34 -08:00
John MacFarlane
d9ec95e7ab Modify stringify so it ignores [Citation] inside Cite.
Otherwise we'll sometimes get two copies of things, one
from the `citationPrefix` or `citationSuffix` and another
from the embedded fallback text.

When there is no fallback text, we'll get no content.
However, simply relying on the result of running `query` on the
embedded `Citation`s isn't a real alternative; it would produce
a jumble of text rather than anything structured.

Closes #7855.
2022-01-19 22:06:06 -08:00
John MacFarlane
d5818413ff Docx reader: parse both zotero citation and bibliography...
as FieldInfo.
2022-01-19 10:31:00 -08:00
John MacFarlane
73fe7c129e Docx reader: add skeleton for parsing zotero ADDINs.
So far this just adds a constructor for FieldInfo;
we'll need to adjust the rest of the reader code to
parse the JSON and do something with it.

See #7840.
2022-01-19 10:20:15 -08:00
John MacFarlane
6723891c72 Markdown writer: handle explicit column widths with pipe tables.
If a table has explicit column width information *and* the
content extends beyond the `--columns` width, we need to
adjust the widths of the pipe separators to encode this width
information.

Closes #7847.
2022-01-19 09:36:48 -08:00
Michael Hoffmann
e146b1ff3b
Docx writer: Separate tables even with RawBlocks between (#7844)
Adjacent docx tables need to be separated by an empty paragraph. If
there's a RawBlock between tables which renders to nothing, be sure to
still insert the empty paragraph so that they will not collapse
together.

Fixes #7724
2022-01-18 14:28:28 -08:00
John MacFarlane
c1717378b0 Fix some haddock errors. 2022-01-17 21:03:25 -08:00
John MacFarlane
545c0911aa T.P.App.Opt: fix logic bug in fullDefaultsPath.
Previously we would (also) search the default user data directory
for a defaults file, even if a different user data directory
was specified using `--data-dir`.  This was a mistake; if
`--data-dir` is used, the default user data directory should
not be searched.
2022-01-17 21:03:25 -08:00
John MacFarlane
a92ae0a58a T.P.Shared.defaultUserDataDir: behavior change.
If the XDG data directory is not defined (e.g. because
it's not supported in the OS or HOME isn't defined), we
return the empty string instead of raising an exception.

Closes #7842.
2022-01-17 21:03:25 -08:00
Albert Krewinkel
7f50324ff9
PDF: support pagedjs-cli as pdf engine (#7838)
PagedJS is a polyfill and supports the Paged Media standards by the W3C.
<https://www.pagedjs.org/>
2022-01-17 09:19:03 -08:00
Nikolai Korobeinikov
b683b8d48a
Support checklists in asciidoctor writer (#7832)
The checklist syntax (similar to `task_list` in markdown) seems to be
an asciidoctor-only addition.

Co-authored-by: ricnorr <ricnorr@yandex-tream.ru>
2022-01-16 11:05:19 -08:00
John MacFarlane
c40727bfbb Man writer: use custom font V for inline code.
The V font is defined conditionally, so that it renders
like CB in output formats that support that, and like B
in those that don't (e.g. the terminal).

We could just redefine C, but this would affect code
blocks, too, and putting them all in boldface looks ugly,
I think.

Possible drawback: fragments created by pandoc's man
writer will presuppose a nonstandard V font.

Closes #7506.
Supersedes 253467a549.
2022-01-15 12:39:19 -08:00