Commit graph

7899 commits

Author SHA1 Message Date
John MacFarlane
c1ab48874c Parsing.General: make manyChar1, etc. more strict.
Profiling the muse reader revealed that these were creating huge thunks.
2022-03-31 23:09:14 -07:00
John MacFarlane
ffa13769e6 RTF reader: increased strictness.
This leads to some performance improvements.
2022-03-31 10:11:48 -07:00
John MacFarlane
8b21ec7d0c Markdown reader: add some strictness.
This improves some benchmarks significantly.
2022-03-31 10:11:48 -07:00
Albert Krewinkel
ad726953b9
Lua: allow to pass Sources to pandoc.read (#8002)
Sources, the data type passed to the `Reader` function in custom
readers, are now accepted as input to `pandoc.read`.
2022-03-30 14:10:30 -07:00
John MacFarlane
b8e0d574b1 Strictness improvement in RTF reader. 2022-03-30 13:19:09 -07:00
John MacFarlane
5f0bfd41a8 LaTeX writer: add () after booktabs rules.
These commands take optional arguments with () and [],
which can lead to problems if the content of the table
cell begins with these characters.

Closes #8001.
2022-03-30 10:07:09 -07:00
John MacFarlane
bb5f0f7b76 HTML writer: Further performance improvements. 2022-03-30 09:48:56 -07:00
John MacFarlane
4a54ca5b0b Add mime type for mkv extension (#7181). 2022-03-30 09:44:22 -07:00
John MacFarlane
d71d01f41a HTML writer: add a performance shortcut to strToHtml. 2022-03-30 09:34:16 -07:00
John MacFarlane
5fbea20e03 Fixed two thunk leaks in RTF reader.
This further reduces memory usage.
See #7943.
2022-03-29 22:42:20 -07:00
John MacFarlane
76748ee0fe JATS reader: handle pub-date.
Closes #8000.
2022-03-29 19:41:14 -07:00
John MacFarlane
a9498a1568 LaTeX writer: support page,trim,clip attributes on images.
These are actually supported by `\includegraphics`, though
this is not well documented. See
https://tex.stackexchange.com/questions/7938/pdflatex-includegraphics-and-multi-page-pdf-files

Partially addresses #7181.
2022-03-29 09:03:28 -07:00
Albert Krewinkel
7a7e1b2b70
RST reader: wrap math in Span to preserve attributes (#7998)
Math elements with a name, classes, or other fields are wrapped in a
`Span` with these attributes.
2022-03-29 08:50:55 -07:00
Jonathan Dönszelmann
cd931e55b6
Refactor Text.Pandoc.Writers.EPUB (#7991)
Refactor for readability.

Co-authored-by: Ola Wolska <A.k.wolska@student.tudelft.nl@gmail.com>
Co-authored-by: Ivar de Bruin <ivardb@gmail.com>
Co-authored-by: Jaap de Jong <jaapdejong15@gmail.com>
2022-03-29 08:40:20 -07:00
Albert Krewinkel
40dd8fd129
Include Lua version in --version output. (#7997) 2022-03-29 08:38:00 -07:00
Albert Krewinkel
e4f4be6c80
Remove redundant dependency on hslua-marshalling.
The package is a dependency of hslua; all important modules are
re-exported.
2022-03-29 08:04:49 +02:00
John MacFarlane
807a574e9d JATS reader: strip 'ref-' from ref id in constructing CSL id.
This allows better round-tripping, because the JATS
writer adds the `ref-` prefix to the citation id to get
the ref element's id.
2022-03-28 18:50:03 -07:00
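The round trip described in the commit above can be illustrated with a minimal sketch. This is not pandoc's Haskell code; the function names and the sample id are hypothetical, and only the `ref-` prefix itself comes from the commit message.

```python
# Illustrative sketch only: the function names are hypothetical and are
# not part of pandoc. Only the "ref-" prefix is taken from the commit.
REF_PREFIX = "ref-"


def ref_id_from_citation_id(citation_id: str) -> str:
    """Writer side: derive the ref element's id from the citation id."""
    return REF_PREFIX + citation_id


def citation_id_from_ref_id(ref_id: str) -> str:
    """Reader side: strip the prefix, if present, to recover the CSL id."""
    if ref_id.startswith(REF_PREFIX):
        return ref_id[len(REF_PREFIX):]
    # Ids without the prefix are passed through unchanged.
    return ref_id


if __name__ == "__main__":
    original = "doe2020"  # hypothetical citation id
    print(citation_id_from_ref_id(ref_id_from_citation_id(original)))
```

Stripping the prefix on the reader side is what lets an id survive a write-then-read cycle unchanged.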
John MacFarlane
51c8b059e1 JATS reader: improve refs parsing.
Handle issn and isbn; use simpler form for issued date.
2022-03-28 18:37:33 -07:00
John MacFarlane
6217fd0976 JATS writer: Fix handling of CSL variable 'page'.
Not 'pages' as we had before.  It should go to 'lpage' and 'rpage',
not 'page-range'.  See
https://jats.nlm.nih.gov/archiving/tag-library/1.1/element/page-range.html

Fixed some mistakes in test #7042.
2022-03-28 17:04:10 -07:00
John MacFarlane
5c7dc4c7f3 JATS reader: support PMID, DOI, issue in citations.
Closes #7995.
2022-03-28 17:04:10 -07:00
Albert Krewinkel
c5cd03a022
JATS writer: keep edition info in element citations.
Closes: #7993
2022-03-28 21:45:56 +02:00
John MacFarlane
35350fac85 JATS writer: avoid doubled ref-list element.
Previously when generating JATS with the `element_citations`
extension enabled, the references were put in a doubly-nested
ref-list element (`<ref-list><ref-list>...`).  This is now fixed.

Closes #7990.
2022-03-27 09:32:55 -07:00
Nikolai Korobeinikov
e2923747a4
Docx writer: add bookmark with table id to table (#7989)
This allows tables with ids to be linked to.

Closes #7285.
2022-03-26 10:00:05 -07:00
John MacFarlane
51f18d52c7 Rename T.P.Parsing.Combinators -> T.P.Parsing.General.
Because many of the exported things aren't combinators...

Also remove redundant export of indentWith from T.P.Parsing.Lists.
2022-03-25 11:14:54 -07:00
John MacFarlane
f520ac9b17 T.P.Parsing: use explicit imports. 2022-03-25 11:03:35 -07:00
John MacFarlane
1572c27241 More optimization of RTF reader. 2022-03-25 09:14:47 -07:00
John MacFarlane
672822cf98 RTF reader: optimize parsing of unformatted text. 2022-03-25 08:38:50 -07:00
John MacFarlane
dafdd16e10 Sources: small strictness optimization 2022-03-25 08:38:10 -07:00
John MacFarlane
36786e86fb RTF reader: more memory usage optimizations.
See #7943.
2022-03-24 23:39:14 -07:00
John MacFarlane
0de829090c Small optimizations in RTF reader. 2022-03-24 22:39:24 -07:00
Albert Krewinkel
b9eeb77df5
[API change] Unify grid table parsing (#7971)
Grid table parsing in Markdown and rst is updated to use the same
functions. Functions are generalized to meet requirements for both
formats.

This change also lays the ground for further generalizations in table
parsers, including support for advanced table features.

API changes in Text.Pandoc.Parsing:

- Parse results of functions `tableWith'` and `gridTableWith'` are now a
  `mf TableComponents` instead of a quadruple of alignments, column
  widths, header rows and body rows.

Additional exports from Text.Pandoc.Parsing:

- `tableWith'`
- `TableComponents`
- `TableNormalization`
- `toTableComponents`
- `toTableComponents'`
2022-03-24 11:59:20 -07:00
John MacFarlane
9fa2aeb489 RTF reader: more efficient parsing of command parameters. 2022-03-24 11:38:55 -07:00
Albert Krewinkel
4394fdf59c
JATS writer: encode author "others" as <etal/>
Citeproc adopted the BibTeX convention to use the author name "others"
when there are additional authors that are not named. JATS uses the
`<etal>` element for this.
2022-03-22 15:09:14 +01:00
Albert Krewinkel
69177861a4
Parsing.GridTable: simplify column handling code. 2022-03-18 14:20:49 +01:00
Albert Krewinkel
eaba313fb3
Writers.GridTable: improve module documentation. 2022-03-18 14:16:03 +01:00
Albert Krewinkel
43e549b2fb
Markdown writer: move table-related code into submodule. 2022-03-18 14:15:56 +01:00
John MacFarlane
75ddff2422 Allow formatted bibliography to be placed in metadata fields.
This modifies `processCitations` so that pandoc will look not just
in the document body but in metadata for a Div with id `refs` in
which to place the formatted bibliography.

Thus, one can include a metadata field, say `refs`, whose content
is an empty div with id `refs`, and the formatted bibliography
will be put into this metadata field.  It may then be interpolated
into a template using the variable `refs`.

Closes #7969.

Closes #526 by providing a way to interpolate references into
a template.
2022-03-16 14:37:51 -07:00
John MacFarlane
54f6e1be9b Remove native_divs from allowed gfm extensions.
This allows `<div>` to be suppressed using `-raw_html`.
Previously `native_divs` was enabled but could
not be suppressed, because it was not in the list of
available extensions for commonmark-based formats.

Closes #7965.
2022-03-14 12:45:45 -07:00
Albert Krewinkel
1aeeba9ecb
Shared: define ordNub as alias for nubOrd from containers package (#7963)
This requires at least containers 0.6.0.1, which ships with the oldest
GHC version currently supported by pandoc (GHC 8.6).
2022-03-13 08:42:30 -07:00
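For readers unfamiliar with `nubOrd`, the behavior the commit above relies on is order-preserving deduplication that keeps the first occurrence of each element. A rough Python analogue follows; the name `ord_nub` is a hypothetical transliteration, and this version uses a hash set where the Haskell function uses an `Ord`-based set.

```python
# Illustrative analogue only: Haskell's nubOrd requires an Ord instance
# and uses an ordered set; this sketch assumes hashable items instead.
from typing import Hashable, Iterable, List, TypeVar

T = TypeVar("T", bound=Hashable)


def ord_nub(items: Iterable[T]) -> List[T]:
    """Drop duplicates while keeping the first occurrence of each item."""
    seen = set()
    result: List[T] = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


if __name__ == "__main__":
    print(ord_nub([3, 1, 3, 2, 1]))
```

Tracking seen elements in a set avoids the quadratic cost of comparing each element against everything kept so far.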
Albert Krewinkel
edfe34c86c
Document more functions in T.P.Parsing and T.P.Shared. 2022-03-12 23:16:31 +01:00
John MacFarlane
699336cf5b LaTeX reader: better handling of \usepackage.
If the package is local but causes parse errors, parse
everything up to the error and skip the rest.  Issue a
CouldNotParseIncludeFile warning indicating that parsing
failed at that point.

T.P.Logging: add CouldNotParseIncludeFile constructor.
2022-03-12 12:18:51 -08:00
John MacFarlane
f9a4e049c5 T.P.Readers.LaTeX.Parsing: Monoid and Semigroup instances for TokStream. 2022-03-12 10:23:25 -08:00
John MacFarlane
6abcde0bf7 LaTeX reader: further optimizations for inline parsing. 2022-03-11 21:59:26 -08:00
John MacFarlane
b423c17100 LaTeX reader: use custom TokStream...
that keeps track of whether macros are expanded. This allows
us to improve performance a bit by avoiding unnecessary
runs of the macro expansion code (e.g. from 24 ms to 20 ms on
our standard benchmark).
2022-03-11 19:51:59 -08:00
Albert Krewinkel
517bceeba8
Parsing: partition module into (internal) submodules (#7962) 2022-03-11 09:21:59 -08:00
Albert Krewinkel
168529f0a4
Org writer: stop indenting property drawers, quote blocks
This follows the current default org-mode behavior.

Closes: #3245
2022-03-11 12:12:04 +01:00
John MacFarlane
a7d94dba43 Org reader: allow multiple #+bibliography:. 2022-03-10 13:31:02 -08:00
John MacFarlane
18c432024b Org reader: parse #+print_bibliography: as Div with id refs. 2022-03-10 13:15:52 -08:00
John MacFarlane
581c94913f LaTeX reader: allow inline groups starting with \bgroup.
Closes #7953.
2022-03-09 17:53:00 -08:00
John MacFarlane
9b5ec100e5 Markdown writer: update escaping rules for \.
We now escape `\` only if `raw_tex` is enabled or
it is followed by a non-alphanumeric.
2022-03-07 10:46:07 -08:00
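The escaping rule stated in the commit above can be sketched as follows. This is an illustration, not the Markdown writer's actual code: the function name is hypothetical, and the treatment of a backslash at the very end of the input (escaped here) is an assumption the commit message does not settle.

```python
# Illustrative sketch only: escape_backslashes is a hypothetical name.
# Assumption: a trailing backslash with nothing after it is escaped.
def escape_backslashes(text: str, raw_tex: bool) -> str:
    """Escape a backslash if raw_tex is on or the next char is not alphanumeric."""
    out = []
    for i, ch in enumerate(text):
        if ch == "\\":
            nxt = text[i + 1] if i + 1 < len(text) else ""
            # "".isalnum() is False, so a trailing backslash is escaped.
            if raw_tex or not nxt.isalnum():
                out.append("\\\\")
                continue
        out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(escape_backslashes(r"a\b and a\*b", raw_tex=False))
    print(escape_backslashes(r"a\b and a\*b", raw_tex=True))
```

With `raw_tex` disabled, a backslash followed by a letter or digit is left alone, which keeps the output free of unnecessary escapes.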