1388 commits

Author SHA1 Message Date
John MacFarlane
c6e5cf2e74 Benchmark improvements.
* Build `+RTS -A256m -RTS` into default ghc-options for benchmark,
  so we don't have to specify this separately on the command line.
  This is necessary to get accurate benchmark results; otherwise
  we are largely measuring garbage collection, some of it unrelated
  to the current benchmark.
* Switch back from gauge to tasty-bench.
* Allow specifying BASELINE file in 'make bench' for comparison
  (otherwise the latest is chosen by default).
* Remove obsolete reference to weigh-pandoc from CONTRIBUTING.md.
* Remove `-Rghc-timing` from 'make bench'.
2021-03-17 13:34:17 -07:00
John MacFarlane
1ef3534328 Increase heap space in runtime for benchmarks.
Otherwise we're essentially benchmarking garbage collecting,
which can give very inconsistent results.
2021-03-16 15:59:50 -07:00
John MacFarlane
ff0fcedcb3 Switch to gauge for now for benchmarks.
tasty-bench is displaying odd behavior, with different
timings depending on the `--pattern` specified.
2021-03-15 22:50:18 -07:00
John MacFarlane
5f94dd74f1 Require texmath 0.12.2 2021-03-15 15:36:57 -07:00
John MacFarlane
39934c8851 Require latest doclayout and skylighting. 2021-03-14 15:48:01 -07:00
John MacFarlane
3519d6f3b4 Use citeproc >= 0.3.0.9 2021-03-13 11:27:05 -08:00
Albert Krewinkel
f8b49e77f8 Use jira-wiki-markup 1.3.4
Jira reader:

* Fixed parsing of autolinks (i.e., of bare URLs in the text).
  Previously an autolink would take up the rest of a line, as spaces
  were allowed characters in these items.

* Emoji character sequences no longer cause parsing failures. This was
  due to missing backtracking when emoji parsing fails.

Jira writer:

* Block quotes are only rendered as `bq.` if they do not contain a
  linebreak.
2021-03-13 14:53:58 +01:00
John MacFarlane
e17127dc28 Re-add a needed dependency for benchmark. 2021-03-09 13:40:24 -08:00
John MacFarlane
a8b2031bb4 Revert "Use -Wunused-packages on ghc >= 8.10."
This reverts commit 7a1d0f01e9.

This option gives confusing output when a build is interrupted,
suggesting that packages aren't required when we just didn't
get to the module that requires them.
2021-03-09 12:49:15 -08:00
John MacFarlane
a9a05110d0 Remove some unused packages from pandoc.cabal. 2021-03-09 12:34:36 -08:00
John MacFarlane
7a1d0f01e9 Use -Wunused-packages on ghc >= 8.10. 2021-03-09 12:34:36 -08:00
John MacFarlane
e649b69564 Bump version to 2.12 2021-03-04 08:57:27 -08:00
John MacFarlane
92ea8a0cb6 Revert "Add T.P.Readers.LaTeX.Include."
This reverts commit b569b0226d.

Memory usage improvement in compilation wasn't very significant.
2021-03-03 19:07:16 -08:00
John MacFarlane
b569b0226d Add T.P.Readers.LaTeX.Include. 2021-03-03 18:47:17 -08:00
John MacFarlane
33e4c8dd6c Remove T.P.Readers.LaTeX.Accent.
Incorporate accentCommands into T.P.Readers.LaTeX.Inline.
2021-03-03 18:21:32 -08:00
John MacFarlane
bbcc1501a5 Split out T.P.Readers.LaTeX.Inline. 2021-03-03 10:34:10 -08:00
John MacFarlane
e8e5ffe1f4 Split out T.P.Writers.LaTeX.Util. 2021-03-02 22:40:45 -08:00
John MacFarlane
fe483c653b Split out T.P.Writers.LaTeX.Citation. 2021-03-02 21:57:37 -08:00
John MacFarlane
827ecdd2de Split out T.P.Writers.LaTeX.Lang. 2021-03-02 21:33:58 -08:00
John MacFarlane
2097411e4f Split up T.P.Writers.Markdown...
with T.P.Writers.Markdown.Types and T.P.Writers.Markdown.Inline.
The module was difficult to compile on low-memory systems.
2021-03-02 21:08:13 -08:00
John MacFarlane
7f1b933aaa Make T.P.Readers.LaTeX.Types an unexported module.
[API change]

This is really an implementation detail that shouldn't be
exposed in the public API.
2021-03-01 09:46:43 -08:00
John MacFarlane
382f0e23d2 Factor out T.P.Readers.LaTeX.Macro. 2021-03-01 09:46:43 -08:00
John MacFarlane
d2bb0c7c8d Factor out T.P.Readers.LaTeX.Math. 2021-02-28 21:05:25 -08:00
John MacFarlane
7e83686d31 trypandoc: add 2 second timeout. 2021-02-28 09:24:37 -08:00
John MacFarlane
2faa57e8e9 Factor out T.P.Readers.LaTeX.Citation. 2021-02-28 09:12:09 -08:00
John MacFarlane
08231f5cdd Factor out T.P.Readers.LaTeX.Table. 2021-02-27 21:40:56 -08:00
John MacFarlane
925815bb33 Split off T.P.Readers.LaTeX.Accent.
To help reduce memory demands compiling the main LaTeX reader.
2021-02-27 17:02:44 -08:00
John MacFarlane
cbc3f034ad Use skylighting 0.10.4.
This version of skylighting uses xml-conduit rather than hxt.
This speeds up parsing of XML syntax definitions fourfold, and
removes four packages from pandoc's dependency graph:

hxt-charproperties
hxt-unicode
hxt-regex-xmlschema
hxt
2021-02-27 14:26:10 -08:00
John MacFarlane
9767386676 Use latest skylighting. 2021-02-22 22:25:10 -08:00
John MacFarlane
d7cfa0ef4c Remove weigh-pandoc.
It's not really useful any more, now that our regular
benchmarks include data on allocation.
2021-02-22 22:10:20 -08:00
John MacFarlane
b2b32d9bb2 'make bench': Create csv files for comparison. 2021-02-18 23:22:18 -08:00
Dmitrii Kovanikov
ef741f3842 Allow base64-bytestring-1.2.* 2021-02-18 18:07:23 +01:00
John MacFarlane
80a1d5c9b6 Revert "Add T.P.XML.Light.Cursor."
This reverts commit d8fc497186.
2021-02-16 19:18:01 -08:00
John MacFarlane
d8fc497186 Add T.P.XML.Light.Cursor. 2021-02-16 18:51:41 -08:00
John MacFarlane
d7a4996b1e Split up T.P.XML.Light into submodules. 2021-02-16 18:40:06 -08:00
John MacFarlane
967e7f5fb9 Rename Text.Pandoc.XMLParser -> Text.Pandoc.XML.Light...
...and add new definitions isomorphic to xml-light's, but with
Text instead of String.  This allows us to keep most of the code in
existing readers that use xml-light, but avoid lots of unnecessary
allocation.

We also add versions of the functions from xml-light's
Text.XML.Light.Output and Text.XML.Light.Proc that operate
on our modified XML types, and functions that convert
xml-light types to our types (since some of our dependencies,
like texmath, use xml-light).

Update golden tests for docx and pptx.

OOXML test: Use `showContent` instead of `ppContent` in `displayDiff`.

Docx: Do a manual traversal to unwrap sdt and smartTag.
This is faster, and needed to pass the tests.

Benchmarks:

A = prior to 8ca191604d (Feb 8)
B = as of 8ca191604d (Feb 8)
C = this commit

| Reader  | A     | B     | C     |
| ------- | ----- | ----- | ----- |
| docbook | 18 ms | 12 ms | 10 ms |
| opml    | 65 ms | 62 ms | 35 ms |
| jats    | 15 ms | 11 ms |  9 ms |
| docx    | 72 ms | 69 ms | 44 ms |
| odt     | 78 ms | 41 ms | 28 ms |
| epub    | 64 ms | 61 ms | 56 ms |
| fb2     | 14 ms |  5 ms |  4 ms |
2021-02-16 16:55:20 -08:00
Albert Krewinkel
1942dc5611 Allow tasty 1.4.* 2021-02-14 14:43:32 +01:00
Albert Krewinkel
8ffd4159d6 Jira: require jira-wiki-markup 1.3.3
* Modified the Doc parser to skip leading blank lines. This fixes
  parsing of documents which start with multiple blank lines.
  (#7095)

* Prevent URLs within link aliases from being treated as autolinks.
  (#6944)

Fixes: #7095
Fixes: #6944
2021-02-12 17:15:12 +01:00
John MacFarlane
8ca191604d Add new unexported module T.P.XMLParser.
This exports functions that use xml-conduit's parser to
produce an xml-light Element or [Content].  This allows
existing pandoc code to use a better parser without
much modification.

The new parser is used in all places where xml-light's
parser was previously used.  Benchmarks show a significant
performance improvement in parsing XML-based formats
(especially ODT and FB2).

Note that the xml-light types use String, so the
conversion from xml-conduit types involves a lot
of extra allocation.  It would be desirable to
avoid that in the future by gradually switching
to using xml-conduit directly. This can be done
module by module.

The new parser also reports errors, which we report
when possible.

A new constructor PandocXMLError has been added to
PandocError in T.P.Error [API change].

Closes #7091, which was the main stimulus.

These changes revealed the need for some changes
in the tests.  The docbook-reader.docbook test
lacked definitions for the entities it used; these
have been added. And the docx golden tests have been
updated, because the new parser does not preserve
the order of attributes.

Add entity defs to docbook-reader.docbook.

Update golden tests for docx.
2021-02-10 22:04:11 -08:00
Albert Krewinkel
d202f7eb77 Avoid unnecessary use of NoImplicitPrelude pragma (#7089) 2021-02-07 10:02:35 -08:00
Albert Krewinkel
f7be8d0964 pandoc.cabal: use common stanza to reduce duplication (#7086) 2021-02-07 08:33:43 -08:00
Albert Krewinkel
ebb8f23b66 Use hslua-module-path 0.1.0 2021-02-02 21:04:30 -08:00
Albert Krewinkel
61b108d527 Lua: add module "pandoc.path"
The module allows working with file paths in a convenient and
platform-independent manner.

Closes: #6001
Closes: #6565
2021-02-02 21:04:30 -08:00
John MacFarlane
02d3c71e72 BibTeX writer: use doclayout and doctemplate.
This change allows bibtex/biblatex output to wrap as other
formats do, depending on the settings of `--wrap` and `--columns`.

It also introduces default templates for bibtex and biblatex,
which allow for using the variables `header-include`, `include-before`
or `include-after` (or alternatively the command line options
`--include-in-header`, `--include-before-body`, `--include-after-body`)
to insert content into the generated bibtex/biblatex.

This change requires a change in the return type of the unexported
`T.P.Citeproc.writeBibTeXString` from `Text` to `Doc Text`.

Closes #7068.
2021-02-01 18:05:20 -08:00
John MacFarlane
b239c89a82 BibTeX writer fixes. Closes #7067.
+ Require citeproc 0.3.0.7, which correctly titlecases when titles
  contain non-ASCII characters.
+ Correctly handle 'pages' (= 'page' in CSL).
+ Correctly handle BibLaTeX 'langid' (= 'language' in CSL).
+ In BibTeX output, protect foreign titles since there's no language
  field.
2021-02-01 11:23:07 -08:00
John MacFarlane
a9adb29648 Require citeproc 0.3.0.6. 2021-01-30 19:09:08 -08:00
John MacFarlane
fe06437ba4 Use tasty-bench instead of criterion for benchmarks.
It is much lighter-weight.
2021-01-30 18:01:14 -08:00
John MacFarlane
f5e3c1dad6 Use citeproc 0.3.0.5. 2021-01-22 11:06:35 -08:00
John MacFarlane
83d7804b8f Merge pull request #7042 from tarleb/jats-element-citations
JATS writer: use element citations
2021-01-22 10:39:58 -08:00
Albert Krewinkel
b4b3560191 JATS writer: allow use of element-citation 2021-01-22 19:35:08 +01:00