Commit graph

14074 commits

Author SHA1 Message Date
John MacFarlane
24d7cd539b LaTeX template: disable `` ?` `` and `` !` `` ligatures.
These are often triggered by accident in languages that
use ` `` ` for end quote (e.g. German).

See jgm/citeproc#54.
2021-02-18 15:48:40 -08:00
John MacFarlane
53cf8295a4 LaTeX writer: adjust hypertargets to beginnings of paragraphs.
Use `\vadjust pre` so that the hypertarget takes you to the
beginning of the paragraph rather than one line down.

Closes #7078.

This makes a particular difference for links to citations
using `--citeproc` and `link-citations: true`.
2021-02-18 14:34:38 -08:00
John MacFarlane
9e728b40f3 T.P.Shared: cleanup.
Cleaned up some functions and added deprecation pragmas
to functions no longer used in the code base.
2021-02-18 13:12:15 -08:00
Albert Krewinkel
743f7216de
Org reader: fix bug in org-ref citation parsing.
The org-ref syntax allows listing multiple citations separated by commas.
This fixes a bug that accepted commas as part of the citation id, so all
citation lists were parsed as one single citation.

Fixes: #7101
2021-02-18 21:59:18 +01:00
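
The org-ref fix above comes down to treating the comma as a separator rather than as part of a citation id. A minimal Python sketch of that idea follows. It is not the Org reader's parser (which is written in Haskell); the function name `parse_org_ref_citation` and the simplified `cite:key1,key2` input shape are assumptions made for illustration.

```python
def parse_org_ref_citation(text):
    """Split a simplified org-ref style citation into (command, [ids]).

    Hypothetical helper: assumes input of the form "cite:key1,key2".
    """
    command, sep, keys = text.partition(":")
    if not sep:
        raise ValueError(f"not a citation link: {text!r}")
    # Commas separate citation ids; they are never part of an id.
    ids = [key for key in keys.split(",") if key]
    return command, ids


if __name__ == "__main__":
    print(parse_org_ref_citation("cite:doe2020,roe2021"))
```

With the bug described in the commit, the whole string `doe2020,roe2021` would have been taken as a single id.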
Dmitrii Kovanikov
ef741f3842 Allow base64-bytestring-1.2.* 2021-02-18 18:07:23 +01:00
John MacFarlane
73add05789 Docx reader: use Map instead of list for Namespaces.
This gives a speedup of about 5-10%.

The reader is now approximately twice as fast as in the last
release.
2021-02-17 09:54:39 -08:00
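
The namespace change above replaces a linear scan over prefix/URI pairs with a keyed lookup. The Python sketch below only illustrates that difference: `dict` stands in loosely for Haskell's `Map` (a hash table rather than a balanced tree), and the names `lookup_in_list`, `namespaces_list` and `namespaces_map` are hypothetical, not taken from the Docx reader.

```python
def lookup_in_list(pairs, prefix):
    """Linear scan: cost grows with the number of declared namespaces."""
    for candidate, uri in pairs:
        if candidate == prefix:
            return uri
    return None


# Two namespaces commonly declared in docx XML parts.
namespaces_list = [
    ("w", "http://schemas.openxmlformats.org/wordprocessingml/2006/main"),
    ("r", "http://schemas.openxmlformats.org/officeDocument/2006/relationships"),
]

# Keyed structure: the lookup no longer walks every pair.
namespaces_map = dict(namespaces_list)

if __name__ == "__main__":
    print(lookup_in_list(namespaces_list, "r"))
    print(namespaces_map.get("r"))
```

Because namespace resolution happens for a large share of the elements in a document, even a small per-lookup saving can add up, which is consistent with the 5-10% figure reported in the commit.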
John MacFarlane
80a1d5c9b6 Revert "Add T.P.XML.Light.Cursor."
This reverts commit d8fc497186.
2021-02-16 19:18:01 -08:00
John MacFarlane
d8fc497186 Add T.P.XML.Light.Cursor. 2021-02-16 18:51:41 -08:00
John MacFarlane
4af378702a Add orig copyright/license info for code derived from xml-light. 2021-02-16 18:44:38 -08:00
John MacFarlane
d7a4996b1e Split up T.P.XML.Light into submodules. 2021-02-16 18:40:06 -08:00
John MacFarlane
967e7f5fb9 Rename Text.Pandoc.XMLParser -> Text.Pandoc.XML.Light...
..and add new definitions isomorphic to xml-light's, but with
Text instead of String.  This allows us to keep most of the code in
existing readers that use xml-light, but avoid lots of unnecessary
allocation.

We also add versions of the functions from xml-light's
Text.XML.Light.Output and Text.XML.Light.Proc that operate
on our modified XML types, and functions that convert
xml-light types to our types (since some of our dependencies,
like texmath, use xml-light).

Update golden tests for docx and pptx.

OOXML test: Use `showContent` instead of `ppContent` in `displayDiff`.

Docx: Do a manual traversal to unwrap sdt and smartTag.
This is faster, and needed to pass the tests.

Benchmarks:

A = prior to 8ca191604d (Feb 8)
B = as of 8ca191604d (Feb 8)
C = this commit

| Reader  | A     | B     | C     |
| ------- | ----- | ----- | ----- |
| docbook | 18 ms | 12 ms | 10 ms |
| opml    | 65 ms | 62 ms | 35 ms |
| jats    | 15 ms | 11 ms |  9 ms |
| docx    | 72 ms | 69 ms | 44 ms |
| odt     | 78 ms | 41 ms | 28 ms |
| epub    | 64 ms | 61 ms | 56 ms |
| fb2     | 14 ms |  5 ms |  4 ms |
2021-02-16 16:55:20 -08:00
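
To read the benchmark table above as speedup factors, one can divide the earlier timing by the later one. The short Python sketch below does this for the figures quoted in the commit message; the names `timings_ms` and `speedup` are introduced here for illustration only.

```python
# Timings in milliseconds, copied from the commit message: (A, B, C).
timings_ms = {
    "docbook": (18, 12, 10),
    "opml": (65, 62, 35),
    "jats": (15, 11, 9),
    "docx": (72, 69, 44),
    "odt": (78, 41, 28),
    "epub": (64, 61, 56),
    "fb2": (14, 5, 4),
}


def speedup(before_ms, after_ms):
    """Return how many times faster the later measurement is."""
    return before_ms / after_ms


if __name__ == "__main__":
    for reader, (a, b, c) in timings_ms.items():
        print(f"{reader:8s} A/C = {speedup(a, c):.2f}x  B/C = {speedup(b, c):.2f}x")
```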
Albert Krewinkel
b5b576184c
JATS writer: add date-type to pub-date elements 2021-02-15 13:15:14 +01:00
Albert Krewinkel
2c99e0e358
JATS writer: replace attribute "pub-type" with "publication-format".
The former attribute is deprecated.
2021-02-15 13:15:14 +01:00
Albert Krewinkel
8621ed600a
T.P.Error: remove unused variables 2021-02-14 15:49:12 +01:00
Albert Krewinkel
1942dc5611
Allow tasty 1.4.* 2021-02-14 14:43:32 +01:00
John MacFarlane
d84a6041e1 HTML reader: fix bad handling of empty src attribute in iframe.
- If src is empty, we simply skip the iframe.
- If src is invalid or cannot be fetched, we issue a warning
  and skip instead of failing with an error.
- Closes #7099.
2021-02-13 13:08:34 -08:00
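
The iframe handling described above (skip on empty `src`, warn and skip on a failed fetch) can be sketched as follows. This is an illustration in Python, not the HTML reader's code; `iframe_content` and the injected `fetch` callable are hypothetical names.

```python
import warnings


def iframe_content(src, fetch):
    """Return the fetched iframe body, or None if the iframe should be skipped.

    `fetch` is any callable taking a URL and returning its content;
    it is injected so the sketch stays free of network access.
    """
    if not src.strip():
        # Empty src: skip the iframe silently.
        return None
    try:
        return fetch(src)
    except Exception as err:
        # Invalid or unfetchable src: warn and skip instead of failing.
        warnings.warn(f"Could not fetch iframe source {src!r}: {err}")
        return None
```

Returning `None` in both failure cases lets the caller drop the element and continue with the rest of the document.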
John MacFarlane
6e73273916 T.P.Error: export renderError.
Refactor `handleError` to use `renderError`. This allows us
to render error messages without exiting.
2021-02-13 13:08:34 -08:00
Albert Krewinkel
a3beed9db8 Org: support task_lists extension
The task lists extension is now supported by the Org reader and writer;
the extension is turned on by default.

Closes: #6336
2021-02-13 13:00:37 -08:00
Albert Krewinkel
2d60a5127c T.P.Shared: export handleTaskListItem. [API change] 2021-02-13 13:00:37 -08:00
John MacFarlane
6323250bad LaTeX reader: remove unnecessary line 2021-02-13 00:22:22 -08:00
John MacFarlane
25b7df7c2a Remove Ext_fenced_code_attributes from allowed commonmark attributes.
This attribute was listed as allowed, but it didn't actually
do anything. Use `attributes` for code attributes and more.

Closes #7097.
2021-02-13 00:18:40 -08:00
John MacFarlane
1954e894b4 Clean up benchmark code.
Now we can select benchmarks by pattern using `-p blah`.
2021-02-13 00:14:49 -08:00
John MacFarlane
eb0c63b002 Avoid an unnecessary withRaw. 2021-02-12 19:29:48 -08:00
John MacFarlane
d9322629a3 LaTeX reader improvements.
* Rewrote `withRaw` so it doesn't rely on fragile assumptions
  about token positions (which break when macros are expanded).
  This requires the addition of `sEnableWithRaw` and `sRawTokens`
  in `LaTeXState`, and a new combinator `disablingWithRaw` to
  disable collecting of raw tokens in certain contexts.
* Add `parseFromToks` to T.P.Readers.LaTeX.Parsing.
* Fix parsing of single character tokens so it doesn't mess
  up the new raw token collecting.
* These changes slightly increase allocations and have a small
  performance impact, but it's minor.

Closes #7092.
2021-02-12 19:04:14 -08:00
John MacFarlane
3be066b7d3 Fix command test 5686 2021-02-12 19:04:14 -08:00
John MacFarlane
59875185b3 Add command test for #7092 2021-02-12 19:04:14 -08:00
Albert Krewinkel
8ffd4159d6
Jira: require jira-wiki-markup 1.3.3
* Modified the Doc parser to skip leading blank lines. This fixes
  parsing of documents which start with multiple blank lines.
  (#7095)

* Prevent URLs within link aliases from being treated as autolinks.
  (#6944)

Fixes: #7095
Fixes: #6944
2021-02-12 17:15:12 +01:00
John MacFarlane
8c2618ed81 Add MANUAL section on reproducible builds. 2021-02-11 15:17:56 -08:00
John MacFarlane
390d5e65b2 Use getTimestamp instead of getCurrentTime in writers.
Setting SOURCE_DATE_EPOCH will allow reproducible builds.

Partially addresses #7093.  This does not suffice to fully enable
reproducible builds for EPUB, since a unique id is being generated for each
build.
2021-02-11 14:55:03 -08:00
John MacFarlane
3c4a58bad0 T.P.Class: Add getTimestamp [API change].
This attempts to read the SOURCE_DATE_EPOCH environment variable
and parse a UTC time from it (treating it as a unix date stamp,
see https://reproducible-builds.org/specs/source-date-epoch/).
If the variable is not set or can't be parsed as a unix date
stamp, then the function returns the current date.
2021-02-11 14:54:28 -08:00
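
The behaviour described for `getTimestamp` above can be sketched in Python as follows. This is an illustration of the described logic, not pandoc's Haskell implementation; the function name `get_timestamp` and its optional `environ` parameter (added so the sketch can be exercised without touching the real environment) are assumptions.

```python
import os
from datetime import datetime, timezone


def get_timestamp(environ=None):
    """Return a UTC time from SOURCE_DATE_EPOCH, else the current UTC time."""
    env = os.environ if environ is None else environ
    raw = env.get("SOURCE_DATE_EPOCH")
    if raw is not None:
        try:
            # Treat the value as seconds since the Unix epoch.
            return datetime.fromtimestamp(int(raw.strip()), tz=timezone.utc)
        except (ValueError, OverflowError, OSError):
            # Unparseable or out-of-range value: fall through to "now".
            pass
    return datetime.now(timezone.utc)


if __name__ == "__main__":
    print(get_timestamp({"SOURCE_DATE_EPOCH": "1613084068"}))
    print(get_timestamp({}))
```

Falling back to the current time instead of raising keeps ordinary builds working when the variable is unset, while a fixed value makes every timestamp written into the output repeatable.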
John MacFarlane
acc9afaf6f Correctly parse "raw" date value in markdown references metadata.
See jgm/citeproc#53.
2021-02-11 09:16:25 -08:00
John MacFarlane
8ca191604d Add new unexported module T.P.XMLParser.
This exports functions that use xml-conduit's parser to
produce an xml-light Element or [Content].  This allows
existing pandoc code to use a better parser without
much modification.

The new parser is used in all places where xml-light's
parser was previously used.  Benchmarks show a significant
performance improvement in parsing XML-based formats
(especially ODT and FB2).

Note that the xml-light types use String, so the
conversion from xml-conduit types involves a lot
of extra allocation.  It would be desirable to
avoid that in the future by gradually switching
to using xml-conduit directly. This can be done
module by module.

The new parser also reports errors, which we report
when possible.

A new constructor PandocXMLError has been added to
PandocError in T.P.Error [API change].

Closes #7091, which was the main stimulus.

These changes revealed the need for some changes
in the tests.  The docbook-reader.docbook test
lacked definitions for the entities it used; these
have been added. And the docx golden tests have been
updated, because the new parser does not preserve
the order of attributes.

Add entity defs to docbook-reader.docbook.

Update golden tests for docx.
2021-02-10 22:04:11 -08:00
John MacFarlane
9994ad977d Use lts-17.2 resolver (with ghc 8.10.3). 2021-02-08 10:11:06 -08:00
John MacFarlane
f70795dc5e ODT reader: finer-grained errors on parse failure.
See #7091.
2021-02-08 09:39:59 -08:00
John MacFarlane
5cd1c1001f ODT reader: give more information if zip can't be unpacked. 2021-02-08 09:39:59 -08:00
Nils Carlson
69b7401e31
DocBook reader: Support informalfigure (#7079)
Add support for informalfigure.
2021-02-08 09:36:58 -08:00
Albert Krewinkel
d202f7eb77
Avoid unnecessary use of NoImplicitPrelude pragma (#7089) 2021-02-07 10:02:35 -08:00
Albert Krewinkel
f7be8d0964
pandoc.cabal: use common stanza to reduce duplication (#7086) 2021-02-07 08:33:43 -08:00
Nixon Enraght-Moony
bab5d10ea7
Document no template fallback for absolute path (#7088)
See jgm/pandoc#7077
2021-02-07 08:30:34 -08:00
John MacFarlane
8e9131db4e Markdown reader: improved handling of mmd link attributes in references.
Previously they only worked for links that had titles.  Closes #7080.
2021-02-06 21:52:12 -08:00
John MacFarlane
0ab3e4048c stack.yaml - use commonmark-0.1.1.4 for GHC 9 2021-02-06 19:00:22 -08:00
John MacFarlane
b63d579ba2 CI: use haskell/actions/setup.
actions/haskell-setup is no longer maintained.
2021-02-06 19:00:00 -08:00
Albert Krewinkel
51c3c93f0f
CI: use cabal 2.2 when building with GHC 8.0.2. (#7085) 2021-02-06 18:09:05 -08:00
Albert Krewinkel
a5169f68b2
Lua filters: use same function names in Haskell and Lua 2021-02-04 19:07:59 +01:00
Albert Krewinkel
57e56ed55c
doc/lua-filters.md: improve docs for pandoc.mediabag.insert 2021-02-04 19:07:59 +01:00
Albert Krewinkel
364fe4a03b
doc/lua-filters.md: fix, improve docs for pandoc.mediabag.fetch 2021-02-04 15:32:35 +01:00
Nick Berendsen
b79aba6ea1
ePub writer: belongs-to-collection metadata (#7063) 2021-02-03 09:00:18 -08:00
Andrew Dunning
4de9edb8e8
LaTeX template: Update to iftex package (#7073)
Load the iftex package directly rather than via the ifxetex and ifluatex compatibility
wrappers, which have been merged into a single package that is part of the LaTeX core.
The capitalization of the commands has been changed for compatibility with older
versions of TeX Live that have the version of iftex by the Persian TeX Group. This had
been removed in 2845794c0c for compatibility with BasicTeX, but that is no longer an issue.
2021-02-03 08:54:11 -08:00
John MacFarlane
e6c7fcc598 Fixed some compiler warnings in tests. 2021-02-02 21:09:10 -08:00
Albert Krewinkel
6f79042502 Add tests for search_path_separator 2021-02-02 21:04:30 -08:00