7159 commits

Author SHA1 Message Date
John MacFarlane
acc9afaf6f Correctly parse "raw" date value in markdown references metadata.
See jgm/citeproc#53.
2021-02-11 09:16:25 -08:00
John MacFarlane
8ca191604d Add new unexported module T.P.XMLParser.
This exports functions that use xml-conduit's parser to
produce an xml-light Element or [Content].  This allows
existing pandoc code to use a better parser without
much modification.

The new parser is used in all places where xml-light's
parser was previously used.  Benchmarks show a significant
performance improvement in parsing XML-based formats
(especially ODT and FB2).

Note that the xml-light types use String, so the
conversion from xml-conduit types involves a lot
of extra allocation.  It would be desirable to
avoid that in the future by gradually switching
to using xml-conduit directly. This can be done
module by module.

The new parser also reports errors, which we report
when possible.

A new constructor PandocXMLError has been added to
PandocError in T.P.Error [API change].

Closes #7091, which was the main stimulus.

These changes revealed the need for some changes
in the tests.  The docbook-reader.docbook test
lacked definitions for the entities it used; these
have been added. And the docx golden tests have been
updated, because the new parser does not preserve
the order of attributes.

Add entity defs to docbook-reader.docbook.

Update golden tests for docx.
2021-02-10 22:04:11 -08:00
John MacFarlane
f70795dc5e ODT reader: finer-grained errors on parse failure.
See #7091.
2021-02-08 09:39:59 -08:00
John MacFarlane
5cd1c1001f ODT reader: give more information if zip can't be unpacked. 2021-02-08 09:39:59 -08:00
Nils Carlson
69b7401e31
DocBook reader: Support informalfigure (#7079)
Add support for informalfigure.
2021-02-08 09:36:58 -08:00
Albert Krewinkel
d202f7eb77
Avoid unnecessary use of NoImplicitPrelude pragma (#7089) 2021-02-07 10:02:35 -08:00
John MacFarlane
8e9131db4e Markdown reader: improved handling of mmd link attributes in references.
Previously they only worked for links that had titles.  Closes #7080.
2021-02-06 21:52:12 -08:00
Albert Krewinkel
a5169f68b2
Lua filters: use same function names in Haskell and Lua 2021-02-04 19:07:59 +01:00
Nick Berendsen
b79aba6ea1
ePub writer: belongs-to-collection metadata (#7063) 2021-02-03 09:00:18 -08:00
Albert Krewinkel
61b108d527 Lua: add module "pandoc.path"
The module allows working with file paths in a convenient and
platform-independent manner.

Closes: #6001
Closes: #6565
2021-02-02 21:04:30 -08:00
John MacFarlane
ec8509295a Add parseOptionsFromArgs [API change, addition].
Exported by Text.Pandoc.App.
2021-02-02 17:00:03 -08:00
John MacFarlane
02d3c71e72 BibTeX writer: use doclayout and doctemplate.
This change allows bibtex/biblatex output to wrap as other
formats do, depending on the settings of `--wrap` and `--columns`.

It also introduces default templates for bibtex and biblatex,
which allow for using the variables `header-include`, `include-before`
or `include-after` (or alternatively the command line options
`--include-in-header`, `--include-before-body`, `--include-after-body`)
to insert content into the generated bibtex/biblatex.

This change requires a change in the return type of the unexported
`T.P.Citeproc.writeBibTeXString` from `Text` to `Doc Text`.

Closes #7068.
2021-02-01 18:05:20 -08:00
John MacFarlane
b239c89a82 BibTeX writer fixes. Closes #7067.
+ Require citeproc 0.3.0.7, which correctly titlecases when titles
  contain non-ASCII characters.
+ Correctly handle 'pages' (= 'page' in CSL).
+ Correctly handle BibLaTeX 'langid' (= 'language' in CSL).
+ In BibTeX output, protect foreign titles since there's no language
  field.
2021-02-01 11:23:07 -08:00
John MacFarlane
d1875b69ec RST reader: fix handling of header in CSV tables.
The interpretation of this line is not affected
by the delim option. Closes #7064.
2021-01-31 12:05:46 -08:00
Albert Krewinkel
9c8ff53b54
CslJson writer: fix compiler warning 2021-01-31 14:37:47 +01:00
John MacFarlane
6695917258 CslJson writer: output [] if no references in input,
instead of raising a PandocAppError as before.
2021-01-30 18:10:22 -08:00
John MacFarlane
9223788a05 Markdown writer: handle math right before digit.
We insert an HTML comment to avoid a `$` right before
a digit, which pandoc will not recognize as a math delimiter.
2021-01-29 18:29:17 -08:00
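A minimal sketch of the idea in commit 9223788a05, not pandoc's actual Haskell implementation. The inline representation (a list of `(kind, text)` pairs), the function name `render_inlines`, and the exact comment text `<!-- -->` are assumptions made for illustration.

```python
def render_inlines(inlines):
    """Render ("math", tex) and ("str", text) pairs as Markdown.

    An empty HTML comment is inserted between inline math and a
    following digit, so the closing `$` is never directly followed
    by a digit.
    """
    out = []
    prev_was_math = False
    for kind, text in inlines:
        if kind == "math":
            out.append("$" + text + "$")
        else:
            if prev_was_math and text[:1].isdigit():
                out.append("<!-- -->")  # hypothetical separator text
            out.append(text)
        prev_was_math = kind == "math"
    return "".join(out)


print(render_inlines([("math", "x"), ("str", "2 apples")]))
```

Without the separator the output would be `$x$2 apples`, where the closing `$` sits right before a digit.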
Albert Krewinkel
300b9b0ea3
JATS writer: escape special chars in reference elements.
Prevents the generation of invalid markup if a citation element contains
an ampersand or another character with a special meaning in XML.
2021-01-29 09:51:20 +01:00
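The escaping described in commit 300b9b0ea3 can be illustrated with a short sketch. This is not the JATS writer's code; the helper name `element` and the `source` element used in the example are assumptions chosen for illustration.

```python
from xml.sax.saxutils import escape


def element(name, text):
    """Wrap text in an XML element, escaping &, < and > in the content."""
    return "<{0}>{1}</{0}>".format(name, escape(text))


print(element("source", "Nature & Science"))
```

An unescaped ampersand in the element content would make the generated XML ill-formed, which is the failure the commit prevents.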
John MacFarlane
98c2a52b4e Clean up BibTeX parsing.
Previously there was a messy code path that gave strange
results in some cases, not passing through raw tex but
trying to extract a string content.  This was an artefact
of trying to handle some special bibtex-specific commands
in the BibTeX reader. Now we just handle these in the
LaTeX reader and simplify parsing in the BibTeX reader.
This does mean that more raw tex will be passed through
(and currently this is not sensitive to the `raw_tex`
extension; this should be fixed).

Closes #7049.
2021-01-26 22:45:57 -08:00
Mauro Bieg
12bc662535 LaTeX writer: change BCP47 lang tag from jp to ja
fixes #7047
2021-01-26 15:29:33 -08:00
Albert Krewinkel
490065f3ed Lua: always load built-in Lua scripts from default data-dir
The Lua modules `pandoc` and `pandoc.List` are now always loaded from the
system's default data directory. Loading from a different directory by
overriding the default path, e.g. via `--data-dir`, is no longer supported to
avoid unexpected behavior and to address security concerns.
2021-01-26 09:43:56 -08:00
John MacFarlane
198ce0cde9 ImageSize: use viewBox for svg if no height, width.
This change allows pandoc to extract size information
from more SVGs.  Closes #7045.
2021-01-22 20:49:41 -08:00
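A rough sketch of the fallback in commit 198ce0cde9, assuming the SVG root element carries either explicit size attributes or a `viewBox`. The function name `svg_size` and the choice to return the raw attribute strings are assumptions; pandoc's ImageSize module is written in Haskell and handles units and more cases.

```python
import xml.etree.ElementTree as ET


def svg_size(svg_text):
    """Return (width, height) as strings, or None if no size is found."""
    root = ET.fromstring(svg_text)
    width, height = root.get("width"), root.get("height")
    if width is not None and height is not None:
        return width, height
    # Fall back to the last two numbers of viewBox: "min-x min-y width height".
    view_box = root.get("viewBox")
    if view_box is None:
        return None
    parts = view_box.replace(",", " ").split()
    if len(parts) != 4:
        return None
    return parts[2], parts[3]


print(svg_size('<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 300 150"/>'))
```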
John MacFarlane
83d7804b8f
Merge pull request #7042 from tarleb/jats-element-citations
JATS writer: use element citations
2021-01-22 10:39:58 -08:00
Albert Krewinkel
b4b3560191
JATS writer: allow use of element-citation 2021-01-22 19:35:08 +01:00
John MacFarlane
fa952c8dbe Add biblatex, bibtex as output formats (closes #7040).
* `biblatex` and `bibtex` are now supported as output
  as well as input formats.

* New module Text.Pandoc.Writers.BibTeX, exporting
  writeBibTeX and writeBibLaTeX. [API change]

* New unexported function `writeBibtexString` in
  Text.Pandoc.Citeproc.BibTeX.
2021-01-22 10:08:43 -08:00
Albert Krewinkel
87083bd1d6
Text.Pandoc.Citeproc: use finer grained imports
This allows importing the module in writers without causing a circular
dependency.
2021-01-21 23:22:08 +01:00
John MacFarlane
5f98ac62e3 JATS writer: Ensure that disp-quote is always wrapped in p.
Closes #7041.
2021-01-19 20:39:58 -08:00
John MacFarlane
1c4d14cdcc RST writer: fix #7039.
We were losing content from inside spans with a class,
due to logic that is meant to avoid nested inline
structures that can't be represented in RST.

The logic was a bit stricter than necessary.  This
commit fixes the issue.
2021-01-18 11:32:02 -08:00
John MacFarlane
c841bcf3b0 Revert "Markdown reader: support GitHub wiki's internal links (#2923) (#6458)"
This reverts commit 6efd3460a7.

Since this extension is designed to be used with
GitHub markdown (gfm), we need to implement the parser
as a commonmark extension (commonmark-extensions),
rather than in pandoc's markdown reader.  When that is
done, we can add it here.
2021-01-16 16:22:04 -08:00
Gautier DI FOLCO
6efd3460a7
Markdown reader: support GitHub wiki's internal links (#2923) (#6458)
Changes overview:

 * Add a `Ext_markdown_github_wikilink` constructor to `Extension` [API change].
 * Add the parser `githubWikiLink` in `Text.Pandoc.Readers.Markdown`
 * Add tests.
2021-01-16 16:15:33 -08:00
John MacFarlane
83336a45a7 Recognize more extensions as markdown by default.
`mkdn`, `mkd`, `mdwn`, `mdown`, `Rmd`.
Closes #7034.
2021-01-16 11:15:35 -08:00
John MacFarlane
387d3e76ee Markdown writer: cleaned up raw formats.
We now react appropriately to gfm, commonmark, and commonmark_x
as raw formats.
2021-01-12 10:20:32 -08:00
John MacFarlane
c451207b08 Docx writer: handle table header using styles.
Instead of hard-coding the border and header cell vertical alignment,
we now let this be determined by the Table style, making use of
Word's "conditional formatting" for the table's first row.
For headerless tables, we use the tblLook element to tell Word
not to apply conditional first-row formatting.

Closes #7008.
2021-01-12 09:49:10 -08:00
Albert Krewinkel
68fa437999
JATS writer: fix citations (#7018)
* JATS writer: keep code lines at 80 chars or below

* JATS writer: fix citations
2021-01-10 15:35:48 -08:00
John MacFarlane
e741c7f553 Fix infinite HTTP requests when writing epubs from URL source.
Due to a bug in code added to avoid overwriting the cover image
if it had the form `fileX.YYY`, pandoc made an endless sequence
of HTTP requests when writing epub with input from a URL.

Closes #7013.
2021-01-10 12:49:53 -08:00
John MacFarlane
d98ec4feb8 T.P.Citeproc: factor out and export getStyle. 2021-01-10 11:48:53 -08:00
John MacFarlane
402d984bc5 T.P.Citeproc: factor out getLang. 2021-01-10 10:28:53 -08:00
John MacFarlane
15e33b33b4 T.P.Citeproc: refactor and export getReferences.
See #7016.
2021-01-10 10:15:30 -08:00
Albert Krewinkel
fe1378227b
Org reader: allow multiple pipe chars in todo sequences
Additional pipe chars, used to separate "action" state from "no further
action" states, are ignored. E.g., for the following sequence, both
`DONE` and `FINISHED` are states with no further action required.

    #+TODO: UNFINISHED | DONE | FINISHED

Previously, parsing of the todo sequence failed if multiple pipe chars
were included.

Closes: #7014
2021-01-09 13:40:31 +01:00
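The behavior described in commit fe1378227b can be sketched as follows. This is an illustration only, not the Org reader's Haskell parser; the function name `parse_todo_sequence` is hypothetical, and the no-pipe branch assumes the Org convention that the last keyword is the only "done" state.

```python
def parse_todo_sequence(value):
    """Split a `#+TODO:` value into (active, done) keyword lists.

    Keywords before the first pipe need further action; keywords after
    it do not. Additional pipes are ignored as separators.
    """
    parts = [part.split() for part in value.split("|")]
    if len(parts) == 1:
        # No pipe at all: assume the last keyword is the done state.
        words = parts[0]
        return words[:-1], words[-1:]
    active = parts[0]
    done = [word for part in parts[1:] for word in part]
    return active, done


print(parse_todo_sequence("UNFINISHED | DONE | FINISHED"))
```

For the sequence from the commit message, `UNFINISHED` is the only active state and both `DONE` and `FINISHED` are done states.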
Albert Krewinkel
4f34345867
Update copyright notices for 2021 (#7012) 2021-01-08 09:38:20 -08:00
John MacFarlane
327e1428c5 gfm/commonmark writer: implement start number on ordered lists.
Previously they always started at 1, but according to the spec
the start number is respected. Closes #7009.
2021-01-07 16:42:05 -08:00
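As a small illustration of commit 327e1428c5, the sketch below renders an ordered list whose first marker carries the start number. The function name `render_ordered_list` and the flat list-of-strings input are assumptions; the real writer works on pandoc's list attributes.

```python
def render_ordered_list(items, start=1):
    """Render items as a CommonMark ordered list beginning at `start`."""
    return "\n".join(
        "{}. {}".format(start + offset, item)
        for offset, item in enumerate(items)
    )


print(render_ordered_list(["first", "second"], start=3))
```

In CommonMark only the number on the first item determines where the list starts, so incrementing the later markers is a readability choice rather than a requirement.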
John MacFarlane
c0d8b186d1 T.P.Parsing: modify gridTableWith' for headerless tables.
If the table lacks a header, the header row should be an empty
list. Previously we got a list of empty cells, which caused
an empty header to be emitted instead of no header.  In LaTeX/PDF
output that meant we got a double top line with space between.

@tarleb @despres - please let me know if this is problematic
for some reason I'm not grasping.
2021-01-07 11:07:03 -08:00
John MacFarlane
15ba184e6e HTML writer: fix implicit_figure at end of footnotes.
Closes #7006.
2021-01-05 12:07:02 -08:00
David Martschenko
385b6a3b21
Implement defaults file inheritance (#6924)
Allow defaults files to inherit options from other defaults files by
specifying them with the following syntax:
`defaults: [list of defaults files or single defaults file]`.
2021-01-05 10:15:59 -08:00
John MacFarlane
ea479bf28a LaTeX reader: handle filecontents environment.
Closes #7003.
2021-01-04 14:05:03 -08:00
John MacFarlane
1ce7db1fa6 EPUB writer: adjust internal links to identifiers...
defined in raw HTML sections after splitting into
chapters.

Closes #7000.
2021-01-04 11:38:18 -08:00
John MacFarlane
f04e02d8d5 EPUB writer: recognize Format "html4", Format "html5" as raw HTML. 2021-01-03 11:35:36 -08:00
John MacFarlane
21ee2d80c2 EPUB writer: adjust internal links to images, links, and tables...
after splitting into chapters. Previously we only did this for
Div and Span and Header elements.  See #7000.
2021-01-03 11:27:01 -08:00
Dimitri Sabadie
57b1094152
Org reader: mark verbatim code with class "verbatim". (#6998)
* Replace org-mode’s verbatim from code to codeWith.

This adds the `"verbatim"` class so that exporters can apply a specific
style on it. For instance, it will be possible for HTML to add a CSS
rule for code + verbatim class.

* Alter test for org-mode’s verbatim change.

See previous commit for further detail on the new implementation.
2021-01-03 08:57:47 +01:00
John MacFarlane
260aaaacc6 LaTeX reader: put contents of unknown environments in a Div...
when `raw_tex` is not enabled. (When `raw_tex` is enabled,
the whole environment is parsed as a raw block.)
The class name is the name of the environment.
Previously, we just included the contents without the
surrounding Div, but having a record of the environment's
boundaries and name can be useful.

Closes #6997.
2021-01-02 08:19:00 -08:00