
1426 commits

Author SHA1 Message Date
John MacFarlane
92ea8a0cb6 Revert "Add T.P.Readers.LaTeX.Include."
This reverts commit b569b0226d.

Memory usage improvement in compilation wasn't very significant.
2021-03-03 19:07:16 -08:00
John MacFarlane
b569b0226d Add T.P.Readers.LaTeX.Include. 2021-03-03 18:47:17 -08:00
John MacFarlane
33e4c8dd6c Remove T.P.Readers.LaTeX.Accent.
Incorporate accentCommands into T.P.Readers.LaTeX.Inline.
2021-03-03 18:21:32 -08:00
John MacFarlane
bbcc1501a5 Split out T.P.Readers.LaTeX.Inline. 2021-03-03 10:34:10 -08:00
John MacFarlane
e8e5ffe1f4 Split out T.P.Writers.LaTeX.Util. 2021-03-02 22:40:45 -08:00
John MacFarlane
fe483c653b Split out T.P.Writers.LaTeX.Citation. 2021-03-02 21:57:37 -08:00
John MacFarlane
827ecdd2de Split out T.P.Writers.LaTeX.Lang. 2021-03-02 21:33:58 -08:00
John MacFarlane
2097411e4f Split up T.P.Writers.Markdown...
with T.P.Writers.Markdown.Types and T.P.Writers.Markdown.Inline.
The module was difficult to compile on low-memory systems.
2021-03-02 21:08:13 -08:00
John MacFarlane
7f1b933aaa Make T.P.Readers.LaTeX.Types an unexported module.
[API change]

This is really an implementation detail that shouldn't be
exposed in the public API.
2021-03-01 09:46:43 -08:00
John MacFarlane
382f0e23d2 Factor out T.P.Readers.LaTeX.Macro. 2021-03-01 09:46:43 -08:00
John MacFarlane
d2bb0c7c8d Factor out T.P.Readers.LaTeX.Math. 2021-02-28 21:05:25 -08:00
John MacFarlane
7e83686d31 trypandoc: add 2 second timeout. 2021-02-28 09:24:37 -08:00
John MacFarlane
2faa57e8e9 Factor out T.P.Readers.LaTeX.Citation. 2021-02-28 09:12:09 -08:00
John MacFarlane
08231f5cdd Factor out T.P.Readers.LaTeX.Table. 2021-02-27 21:40:56 -08:00
John MacFarlane
925815bb33 Split off T.P.Readers.LaTeX.Accent.
To help reduce memory demands when compiling the main LaTeX reader.
2021-02-27 17:02:44 -08:00
John MacFarlane
cbc3f034ad Use skylighting 0.10.4.
This version of skylighting uses xml-conduit rather than hxt.
This speeds up parsing of XML syntax definitions fourfold, and
removes four packages from pandoc's dependency graph:

hxt-charproperties
hxt-unicode
hxt-regex-xmlschema
hxt
2021-02-27 14:26:10 -08:00
John MacFarlane
9767386676 Use latest skylighting. 2021-02-22 22:25:10 -08:00
John MacFarlane
d7cfa0ef4c Remove weigh-pandoc.
It's not really useful any more, now that our regular
benchmarks include data on allocation.
2021-02-22 22:10:20 -08:00
John MacFarlane
b2b32d9bb2 'make bench': Create csv files for comparison. 2021-02-18 23:22:18 -08:00
Dmitrii Kovanikov
ef741f3842 Allow base64-bytestring-1.2.* 2021-02-18 18:07:23 +01:00
John MacFarlane
80a1d5c9b6 Revert "Add T.P.XML.Light.Cursor."
This reverts commit d8fc497186.
2021-02-16 19:18:01 -08:00
John MacFarlane
d8fc497186 Add T.P.XML.Light.Cursor. 2021-02-16 18:51:41 -08:00
John MacFarlane
d7a4996b1e Split up T.P.XML.Light into submodules. 2021-02-16 18:40:06 -08:00
John MacFarlane
967e7f5fb9 Rename Text.Pandoc.XMLParser -> Text.Pandoc.XML.Light...
...and add new definitions isomorphic to xml-light's, but with
Text instead of String.  This allows us to keep most of the code in
existing readers that use xml-light, but avoid lots of unnecessary
allocation.

We also add versions of the functions from xml-light's
Text.XML.Light.Output and Text.XML.Light.Proc that operate
on our modified XML types, and functions that convert
xml-light types to our types (since some of our dependencies,
like texmath, use xml-light).

Update golden tests for docx and pptx.

OOXML test: Use `showContent` instead of `ppContent` in `displayDiff`.

Docx: Do a manual traversal to unwrap sdt and smartTag.
This is faster, and needed to pass the tests.

Benchmarks:

A = prior to 8ca191604d (Feb 8)
B = as of 8ca191604d (Feb 8)
C = this commit

| Reader  |  A    | B      | C     |
| ------- | ----- | ------ | ----- |
| docbook | 18 ms | 12 ms  | 10 ms |
| opml    | 65 ms | 62 ms  | 35 ms |
| jats    | 15 ms | 11 ms  |  9 ms |
| docx    | 72 ms | 69 ms  | 44 ms |
| odt     | 78 ms | 41 ms  | 28 ms |
| epub    | 64 ms | 61 ms  | 56 ms |
| fb2     | 14 ms |  5 ms  |  4 ms |
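To read the table as relative savings, the short Lua snippet below simply redoes the arithmetic on columns A and C. The figures are copied from the table; nothing new is measured. The largest reductions are for fb2 (about 71%) and odt (about 64%), and epub improves least (12.5%).

```lua
-- Timings in milliseconds, copied from the table above:
-- a = before 8ca191604d, c = this commit.
local timings = {
  { reader = "docbook", a = 18, c = 10 },
  { reader = "opml",    a = 65, c = 35 },
  { reader = "jats",    a = 15, c = 9 },
  { reader = "docx",    a = 72, c = 44 },
  { reader = "odt",     a = 78, c = 28 },
  { reader = "epub",    a = 64, c = 56 },
  { reader = "fb2",     a = 14, c = 4 },
}

-- Percentage of the original parse time that was saved.
local function reduction(a, c)
  return (a - c) / a * 100
end

for _, t in ipairs(timings) do
  print(string.format("%-8s %5.1f%% less time", t.reader, reduction(t.a, t.c)))
end
```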
2021-02-16 16:55:20 -08:00
Albert Krewinkel
1942dc5611
Allow tasty 1.4.* 2021-02-14 14:43:32 +01:00
Albert Krewinkel
8ffd4159d6
Jira: require jira-wiki-markup 1.3.3
* Modified the Doc parser to skip leading blank lines. This fixes
  parsing of documents which start with multiple blank lines.
  (#7095)

* Prevent URLs within link aliases from being treated as autolinks.
  (#6944)

Fixes: #7095
Fixes: #6944
2021-02-12 17:15:12 +01:00
John MacFarlane
8ca191604d Add new unexported module T.P.XMLParser.
This exports functions that use xml-conduit's parser to
produce an xml-light Element or [Content].  This allows
existing pandoc code to use a better parser without
much modification.

The new parser is used in all places where xml-light's
parser was previously used.  Benchmarks show a significant
performance improvement in parsing XML-based formats
(especially ODT and FB2).

Note that the xml-light types use String, so the
conversion from xml-conduit types involves a lot
of extra allocation.  It would be desirable to
avoid that in the future by gradually switching
to using xml-conduit directly. This can be done
module by module.

The new parser also reports errors, which we report
when possible.

A new constructor PandocXMLError has been added to
PandocError in T.P.Error [API change].

Closes #7091, which was the main stimulus.

These changes revealed the need for some changes
in the tests.  The docbook-reader.docbook test
lacked definitions for the entities it used; these
have been added. And the docx golden tests have been
updated, because the new parser does not preserve
the order of attributes.

Add entity defs to docbook-reader.docbook.

Update golden tests for docx.
2021-02-10 22:04:11 -08:00
Albert Krewinkel
d202f7eb77
Avoid unnecessary use of NoImplicitPrelude pragma (#7089) 2021-02-07 10:02:35 -08:00
Albert Krewinkel
f7be8d0964
pandoc.cabal: use common stanza to reduce duplication (#7086) 2021-02-07 08:33:43 -08:00
Albert Krewinkel
ebb8f23b66 Use hslua-module-path 0.1.0 2021-02-02 21:04:30 -08:00
Albert Krewinkel
61b108d527 Lua: add module "pandoc.path"
The module allows working with file paths in a convenient and
platform-independent manner.

Closes: #6001
Closes: #6565
2021-02-02 21:04:30 -08:00
John MacFarlane
02d3c71e72 BibTeX writer: use doclayout and doctemplate.
This change allows bibtex/biblatex output to wrap as other
formats do, depending on the settings of `--wrap` and `--columns`.

It also introduces default templates for bibtex and biblatex,
which allow for using the variables `header-includes`, `include-before`
or `include-after` (or alternatively the command line options
`--include-in-header`, `--include-before-body`, `--include-after-body`)
to insert content into the generated bibtex/biblatex.

This change requires a change in the return type of the unexported
`T.P.Citeproc.writeBibTeXString` from `Text` to `Doc Text`.

Closes #7068.
2021-02-01 18:05:20 -08:00
John MacFarlane
b239c89a82 BibTeX writer fixes. Closes #7067.
+ Require citeproc 0.3.0.7, which correctly titlecases when titles
  contain non-ASCII characters.
+ Correctly handle 'pages' (= 'page' in CSL).
+ Correctly handle BibLaTeX 'langid' (= 'language' in CSL).
+ In BibTeX output, protect foreign titles since there's no language
  field.
2021-02-01 11:23:07 -08:00
John MacFarlane
a9adb29648 Require citeproc 0.3.0.6. 2021-01-30 19:09:08 -08:00
John MacFarlane
fe06437ba4 Use tasty-bench instead of criterion for benchmarks.
It is much lighter-weight.
2021-01-30 18:01:14 -08:00
John MacFarlane
f5e3c1dad6 Use citeproc 0.3.0.5. 2021-01-22 11:06:35 -08:00
John MacFarlane
83d7804b8f
Merge pull request #7042 from tarleb/jats-element-citations
JATS writer: use element citations
2021-01-22 10:39:58 -08:00
Albert Krewinkel
b4b3560191
JATS writer: allow use of element-citation 2021-01-22 19:35:08 +01:00
John MacFarlane
fa952c8dbe Add biblatex, bibtex as output formats (closes #7040).
* `biblatex` and `bibtex` are now supported as output
  as well as input formats.

* New module Text.Pandoc.Writers.BibTeX, exporting
  writeBibTeX and writeBibLaTeX. [API change]

* New unexported function `writeBibtexString` in
  Text.Pandoc.Citeproc.BibTeX.
2021-01-22 10:08:43 -08:00
John MacFarlane
07c98eae50 Use citeproc >= 0.3.0.4. 2021-01-15 14:17:17 -08:00
John MacFarlane
4a223e68f4 Use commonmark 0.1.1.3. 2021-01-11 12:23:55 -08:00
Albert Krewinkel
68fa437999
JATS writer: fix citations (#7018)
* JATS writer: keep code lines at 80 chars or below

* JATS writer: fix citations
2021-01-10 15:35:48 -08:00
John MacFarlane
c83811773e Bump to 2.11.4.
API change: export getReferences from T.P.Citeproc.
2021-01-10 10:16:15 -08:00
Albert Krewinkel
4f34345867
Update copyright notices for 2021 (#7012) 2021-01-08 09:38:20 -08:00
David Martschenko
385b6a3b21
Implement defaults file inheritance (#6924)
Allow defaults files to inherit options from other defaults files by
specifying them with the following syntax:
`defaults: [list of defaults files or single defaults file]`.
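The inheritance can be pictured as a recursive merge. The Lua sketch below is hypothetical: it assumes each defaults file has already been parsed into a table, that inherited files are applied first in list order, and that the inheriting file's own settings override them. The function name `resolve` and the example file contents are illustrative only; this is not pandoc's Haskell implementation.

```lua
-- Hypothetical sketch: `files` maps a defaults file name to its parsed
-- contents. Assumes inherited files are applied first (in list order) and
-- the inheriting file's own keys win.
local function resolve(name, files, visiting)
  visiting = visiting or {}
  if visiting[name] then
    error("circular defaults file reference: " .. name)
  end
  visiting[name] = true

  local own = assert(files[name], "unknown defaults file: " .. name)
  local parents = own.defaults
  if type(parents) == "string" then
    parents = { parents }          -- a single file is allowed as well as a list
  end

  local result = {}
  for _, parent in ipairs(parents or {}) do
    for k, v in pairs(resolve(parent, files, visiting)) do
      result[k] = v                -- later parents override earlier ones
    end
  end
  for k, v in pairs(own) do
    if k ~= "defaults" then
      result[k] = v                -- the file's own settings override parents
    end
  end

  visiting[name] = nil             -- allow the same file via another branch
  return result
end

local files = {
  ["base.yaml"]  = { ["pdf-engine"] = "xelatex", toc = true },
  ["child.yaml"] = { defaults = "base.yaml", toc = false },
}
local merged = resolve("child.yaml", files)
print(merged["pdf-engine"], merged.toc)
```

Clearing `visiting[name]` on the way out means only true cycles are rejected; a file reached through two different parents is simply merged twice.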
2021-01-05 10:15:59 -08:00
John MacFarlane
886faa3cbc Bump to 2.11.3.2, update changelog and man page 2020-12-29 12:48:55 -08:00
John MacFarlane
5d8b57444e Use citeproc 0.3.0.3.
Fixes an issue in author-only citations when both an
author and translator are present.
2020-12-29 10:43:50 -08:00
John MacFarlane
a7a162ea55 Update test for new citeproc and require it in cabal. 2020-12-28 14:40:23 -08:00
John MacFarlane
19d4e43605 Require texmath 0.12.1. 2020-12-27 22:57:14 -08:00
Albert Krewinkel
8f402beab9
LaTeX writer: support colspans and rowspans in tables. (#6950)
Note that the multirow package is needed for rowspans.
It is included in the latex template under a variable,
so that it won't be used unless needed for a table.
2020-12-20 18:04:54 -08:00
John MacFarlane
37ba5d5dfe Bump to 2.11.3.1 and update changelog and man page. 2020-12-18 15:29:57 -08:00
John MacFarlane
aa37970969 Use citeproc 0.3.0.1. 2020-12-18 15:08:23 -08:00
John MacFarlane
591bb2bace Add test/writer.asciidoctor, tables.asciidoctor to extra-source-files. 2020-12-18 11:27:41 -08:00
John MacFarlane
29e7fef729 Include missing jats test files in pandoc.cabal.
See #6961.
2020-12-18 08:02:36 -08:00
John MacFarlane
9ec3d6ee97 Use skylighting 0.10.2.
Closes #6625.
2020-12-17 09:32:13 -08:00
John MacFarlane
914cf0b602 Fix citeproc regression with duplicate references.
- Use dev version of citeproc, which handles duplicate
  ids better, preferring the last one in the list
  and discarding the rest.
- Ensure that inline citations take priority over external
  ones.

See jgm/citeproc#36.

This restores the behavior of pandoc-citeproc.
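The two rules can be combined with one ordering trick. If external references are placed before inline ones and duplicates are resolved by keeping the last entry, the inline references win. The Lua sketch below illustrates that idea only; the function names and the `{ id = ..., title = ... }` record shape are hypothetical, and it does not claim to show how pandoc or citeproc actually implement it.

```lua
-- Hypothetical sketch: drop duplicate ids, keeping the last occurrence.
local function dedupe_keep_last(refs)
  local last = {}
  for i, ref in ipairs(refs) do
    last[ref.id] = i               -- remember the final index of each id
  end
  local result = {}
  for i, ref in ipairs(refs) do
    if last[ref.id] == i then
      result[#result + 1] = ref
    end
  end
  return result
end

-- Put external references first so that inline ones, coming later, win.
local function merge_references(external, inline)
  local combined = {}
  for _, r in ipairs(external) do combined[#combined + 1] = r end
  for _, r in ipairs(inline) do combined[#combined + 1] = r end
  return dedupe_keep_last(combined)
end

local merged = merge_references(
  { { id = "doe2020", title = "From bibliography file" },
    { id = "roe2019", title = "Only external" } },
  { { id = "doe2020", title = "From inline metadata" } })

for _, r in ipairs(merged) do print(r.id, r.title) end
```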
2020-12-16 15:37:40 -08:00
John MacFarlane
b4b4e32307 Properly handle boolean values in writing YAML metadata.
(Markdown writer.)
This requires doctemplates >= 0.9.
Closes #6388.
2020-12-15 23:45:34 -08:00
John MacFarlane
2ce14997ad Require binary >= 0.7.
Needed for runGetOrFail.
2020-12-13 10:33:46 -08:00
Albert Krewinkel
ccd235e31f
LaTeX writer: extract table handling into separate module. 2020-12-12 16:48:28 +01:00
John MacFarlane
248a2a1db5 cabal: remove -Werror=missing-home-modules.
It causes problems using cabal repl.
2020-12-10 10:27:31 -08:00
John MacFarlane
1fd642dd30 Move executable to app directory.
Otherwise we have problems with cabal repl.
2020-12-10 10:08:24 -08:00
John MacFarlane
a3eb87b2ea Add sourcepos extension for commonmark
* Add `Ext_sourcepos` constructor for `Extension`.
* Add `sourcepos` extension (only for commonmark).
* Bump to 2.11.3

With the `sourcepos` extension set, `data-pos` attributes are added
to the AST by the commonmark reader. No other readers are affected.  The
`data-pos` attributes are put on elements that accept attributes; for
other elements, an enclosing Div or Span is added to hold the attributes.

Closes #4565.
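The wrapping rule above can be shown on a toy AST. In this hypothetical Lua sketch, nodes are plain tables with a `t` tag; the set of attribute-bearing tags is an assumption based on which pandoc AST constructors carry an Attr, the block/inline split covers only the tags used here, and the position strings are illustrative rather than pandoc's exact format.

```lua
-- Hypothetical toy AST: each node is a table with a tag `t`.
-- Tags whose pandoc AST constructors carry an Attr.
local TAKES_ATTR = {
  Header = true, CodeBlock = true, Div = true, Table = true,
  Code = true, Link = true, Image = true, Span = true,
}

-- Block-level tags used in this example (everything else is inline).
local IS_BLOCK = {
  Para = true, Header = true, CodeBlock = true, Div = true,
  Table = true, BulletList = true, BlockQuote = true,
}

-- Attach a data-pos attribute, wrapping the node if it cannot hold one.
local function add_sourcepos(node, pos)
  if TAKES_ATTR[node.t] then
    node.attr = node.attr or {}
    node.attr["data-pos"] = pos
    return node
  end
  local wrapper = IS_BLOCK[node.t] and "Div" or "Span"
  return { t = wrapper, attr = { ["data-pos"] = pos }, content = { node } }
end

local header = add_sourcepos({ t = "Header", level = 1 }, "1:1-1:10")
local para   = add_sourcepos({ t = "Para" }, "3:1-3:20")
local str    = add_sourcepos({ t = "Str", text = "hello" }, "3:1-3:6")
print(header.t, para.t, str.t)
```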
2020-12-10 08:59:55 -08:00
John MacFarlane
0dd228593f Use latest citeproc release. 2020-12-09 09:34:15 -08:00
John MacFarlane
1489bb8414 Use skylighting 0.10.1. 2020-11-24 21:26:25 -08:00
Albert Krewinkel
41237fcc0e
HTML reader: extract table parsing into separate module 2020-11-24 14:17:35 +01:00
Albert Krewinkel
f9258371dd HTML reader: extract submodules
Reducing module size should reduce memory use during compilation.

This is preparatory work to tackle support for more table features.
2020-11-23 10:12:20 +01:00
John MacFarlane
3f278f580e Remove 'static' flag.
This isn't really necessary and can be misleading
(e.g. on macOS, where a fully static build isn't
possible). cabal's new option
`--enable-executable-static` does the same. On stack
you can add something like this to the options for your
executable in package.yaml:

    ld-options: -static -pthread
2020-11-18 21:08:24 -08:00
John MacFarlane
e17f970ed0 Use citeproc 0.2 2020-11-18 17:49:30 -08:00
John MacFarlane
46bbdad838 Don't allow macos builds with 'static' flag.
Closes #6771.
2020-11-18 15:41:48 -08:00
Albert Krewinkel
94c9028819 JATS writer: move Table handling to separate module
This makes it easier to split the module into smaller parts.
2020-11-17 09:46:30 +01:00
John MacFarlane
79907e5f17 Bump to 2.11.2 for next release (minor API change in Logging). 2020-11-15 08:34:45 -08:00
John MacFarlane
cfb017c76b Bump to 2.11.1.1 and update changelog. 2020-11-07 11:12:19 -08:00
John MacFarlane
e6abf3b8ed Use citeproc 0.1.1.1.
Closes #6813.
2020-11-05 21:23:57 -08:00
John MacFarlane
9de386352a Require latest commonmark, commonmark-extensions.
Fixes a bug with `autolink_bare_uris` and commonmark.
2020-11-05 16:58:36 -08:00
John MacFarlane
391f6e5f80 Use latest commonmark, commonmark-extensions. 2020-11-05 15:05:11 -08:00
John MacFarlane
b5e9c2a7a6 Use citeproc 0.1.1. 2020-11-04 11:15:48 -08:00
John MacFarlane
f502c8d944 Bump version to 2.11.1 and update changelog. 2020-11-02 22:20:44 -08:00
John MacFarlane
992657efaa Use latest commonmark, commonmark-extensions.
This fixes a bug with nested blocks in footnotes with the
`footnote` extension to `commonmark`.  See jgm/commonmark-hs#63.
2020-11-01 10:48:47 -08:00
John MacFarlane
f20ec6b329 Bump to 2.11.0.4. 2020-10-22 22:06:38 -07:00
John MacFarlane
2059c05f0e Require citeproc >= 0.1.0.3.
In the previous release we pointed to this with cabal.project
and stack.yaml, but jumped the gun because citeproc 0.1.0.3
had not yet been officially released.
2020-10-22 21:45:38 -07:00
John MacFarlane
4731fa1d3f Bump to 2.11.0.3 and update changelog. 2020-10-22 17:35:43 -07:00
John MacFarlane
d199abb380 Bump version to 2.11.0.2 2020-10-19 16:32:39 -07:00
Albert Krewinkel
ae4e9d3b38
Relax upper bound on hslua, allow hslua-1.3.* 2020-10-16 21:41:05 +02:00
John MacFarlane
3c8b3eba17 Require citeproc 0.1.0.2. 2020-10-15 13:00:37 -07:00
John MacFarlane
0646873964 Version to 2.11.0.1 2020-10-13 22:47:56 -07:00
John MacFarlane
1122d22a2c Use citeproc 0.1.0.1. 2020-10-13 22:44:05 -07:00
John MacFarlane
b438f0c7a7 pandoc.cabal - recognize new formats in description. 2020-10-10 22:20:56 -07:00
John MacFarlane
eff6b8f27d Use latest citeproc. 2020-09-27 16:03:31 -07:00
John MacFarlane
09d39e0e98 Allow bytestring 0.11.x. 2020-09-23 22:27:20 -07:00
John MacFarlane
e0984a43a9 Add built-in citation support using new citeproc library.
This deprecates the use of the external pandoc-citeproc
filter; citation processing is now built into pandoc.

* Add dependency on citeproc library.
* Add Text.Pandoc.Citeproc module (and some associated unexported
  modules under Text.Pandoc.Citeproc).  Exports `processCitations`.
  [API change]
* Add data files needed for Text.Pandoc.Citeproc:  default.csl
  in the data directory, and a citeproc directory that is just
  used at compile-time.  Note that we've added file-embed as a mandatory
  rather than a conditional dependency, because of the biblatex
  localization files. We might eventually want to use readDataFile
  for this, but it would take some code reorganization.
* Text.Pandoc.Logging: Add `CiteprocWarning` to `LogMessage` and use it
  in `processCitations`. [API change]
* Add tests from the pandoc-citeproc package as command tests (including
  some tests pandoc-citeproc did not pass).
* Remove instructions for building pandoc-citeproc from CI and
  release binary build instructions.  We will no longer distribute
  pandoc-citeproc.
* Markdown reader: tweak abbreviation support.  Don't insert a
  nonbreaking space after a potential abbreviation if it comes right before
  a note or citation.  This messes up several things, including citeproc's
  moving of note citations.
* Add `csljson` as an input and output format. This allows pandoc
  to convert between `csljson` and other bibliography formats,
  and to generate formatted versions of CSL JSON bibliographies.
* Add module Text.Pandoc.Writers.CslJson, exporting `writeCslJson`. [API
  change]
* Add module Text.Pandoc.Readers.CslJson, exporting `readCslJson`. [API
  change]
* Added `bibtex`, `biblatex` as input formats.  This allows pandoc
  to convert between BibLaTeX and BibTeX and other bibliography formats,
  and to generate formatted versions of BibTeX/BibLaTeX bibliographies.
* Add module Text.Pandoc.Readers.BibTeX, exporting `readBibTeX` and
  `readBibLaTeX`. [API change]
* Make "standalone" implicit if output format is a bibliography format.
  This is needed because pandoc readers for bibliography formats put
  the bibliographic information in the `references` field of metadata;
  and unless standalone is specified, metadata gets ignored.
  (TODO: This needs improvement. We should trigger standalone for the
  reader when the input format is bibliographic, and for the writer
  when the output format is markdown.)
* Carry over `citationNoteNum` to `citationNoteNumber`.  This was just
  ignored in pandoc-citeproc.
* Text.Pandoc.Filter: Add `CiteprocFilter` constructor to Filter.
  [API change] This runs the processCitations transformation.
  We need to treat it like a filter so it can be placed
  in the sequence of filter runs (after some, before others).
  In FromYAML, this is parsed from `citeproc` or `{type: citeproc}`,
  so this special filter may be specified either way in a defaults file
  (or by `citeproc: true`, though this gives no control of positioning
  relative to other filters).  TODO: we need to add something to the
  manual section on defaults files for this.
* Add deprecation warning if `pandoc-citeproc` filter is used.
* Add `--citeproc/-C` option to trigger citation processing.
  This behaves like a filter and will be positioned
  relative to filters as they appear on the command line.
* Rewrote the manual on citations, adding a dedicated Citations
  section which also includes some information formerly found in
  the pandoc-citeproc man page.
* Look for CSL styles in the `csl` subdirectory of the pandoc user data
  directory.  This changes the old pandoc-citeproc behavior, which looked
  in `~/.csl`.  Users can simply symlink `~/.csl` to the `csl`
  subdirectory of their pandoc user data directory if they want
  the old behavior.
* Add support for CSL bibliography entry formatting to LaTeX, HTML,
  Ms writers.  Added CSL-related CSS to styles.html.
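Because `CiteprocFilter` is just another entry in the filter sequence, its position follows the command line. The Lua sketch below is a simplified, hypothetical model of that ordering: the function name `filter_sequence` and the `kind` labels are made up, it only recognizes the space-separated forms of `--filter`/`-F`, `--lua-filter`/`-L` and `--citeproc`/`-C`, and it ignores every other option.

```lua
-- Hypothetical, simplified model of how filter options become an ordered
-- list of filters; `--opt=value` forms and all other options are ignored.
local function filter_sequence(args)
  local seq = {}
  local i = 1
  while i <= #args do
    local arg = args[i]
    if arg == "--citeproc" or arg == "-C" then
      seq[#seq + 1] = { kind = "citeproc" }
    elseif (arg == "--filter" or arg == "-F") and args[i + 1] then
      seq[#seq + 1] = { kind = "json", path = args[i + 1] }
      i = i + 1                      -- skip the filter path
    elseif (arg == "--lua-filter" or arg == "-L") and args[i + 1] then
      seq[#seq + 1] = { kind = "lua", path = args[i + 1] }
      i = i + 1                      -- skip the script path
    end
    i = i + 1
  end
  return seq
end

-- A Lua filter runs before citation processing, a JSON filter after it.
local seq = filter_sequence({
  "-L", "pre.lua", "--citeproc", "-F", "post-filter", "-o", "out.html", "in.md",
})
for n, f in ipairs(seq) do
  print(n, f.kind, f.path or "")
end
```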
2020-09-21 10:15:50 -07:00
John MacFarlane
89c577befb Bump to 2.11. 2020-09-21 10:15:49 -07:00
Albert Krewinkel
ba591ba365
pandoc.cabal: sort build depends alphabetically (#6691) 2020-09-21 09:28:25 -07:00
Albert Krewinkel
acbea6b8c6
Lua filters: add SimpleTable for backwards compatibility (#6575)
A new type `SimpleTable` is made available to Lua filters. It is
similar to the `Table` type in pandoc versions before 2.10;
conversion functions from and to the new Table type are provided.

Old filters using tables now require minimal changes and can use,
e.g.,

    if PANDOC_VERSION > {2,10,1} then
      pandoc.Table = pandoc.SimpleTable
    end

and

    function Table (tbl)
      tbl = pandoc.utils.to_simple_table(tbl)
      …
      return pandoc.utils.from_simple_table(tbl)
    end

to work with the current pandoc version.
2020-09-20 15:48:31 -07:00
Albert Krewinkel
b2decdfd13
CI: bump tested GHC versions to 8.8.4 and 8.10.2
Besides being newer, GHC version 8.10.2 comes preinstalled on GitHub
Actions environments; using it slightly speeds up CI tests.
2020-09-20 22:57:51 +02:00
Albert Krewinkel
c9e7ccba6f
pandoc.cabal: allow hslua 1.2 again
Sporadic test failures also happen with hslua-1.1.*, so there is no need
to exclude the newer version.

This reverts commit 315b5a4836.
2020-09-20 09:20:59 +02:00
John MacFarlane
1dd9f8b654 Use released pandoc-types 1.22. 2020-09-19 09:58:16 -07:00
Albert Krewinkel
a400d0dc62
HTML writer: render table footers if present
Part of: #6314
2020-09-12 21:49:01 +02:00
Christian Despres
22babd5382
[API change] Rename Writers.Tables and its contents (#6679)
Writers.Tables is now Writers.AnnotatedTable. All of the types and
functions in it have had the "Ann" removed from them. Now it is
expected that the module be imported qualified.
2020-09-12 08:50:36 -07:00
Albert Krewinkel
315b5a4836
pandoc.cabal: disallow hslua 1.2
See #6674
2020-09-11 09:50:33 +02:00
Christian Despres
10c6c411f9
Add Writers.Tables helper functions and types, add tests for those (#6655)
The Writers.Tables module contains an AnnTable type that is a pandoc
Table with added inferred information that should be enough for
writers (in particular the HTML writer) to operate on without having
to lay out the table themselves.

The toAnnTable and fromAnnTable functions in that module convert
between AnnTable and Table. In addition to producing an AnnTable with
coherent and well-formed annotations, the toAnnTable function also
normalizes its input Table like the table builder does.

Various tests ensure that toAnnTable normalizes tables exactly like
the table builder, and that its annotations are coherent.
2020-09-05 14:36:51 -07:00