Commit graph

7312 commits

Author SHA1 Message Date
John MacFarlane
d9322629a3 LaTeX reader improvements.
* Rewrote `withRaw` so it doesn't rely on fragile assumptions
  about token positions (which break when macros are expanded).
  This requires the addition of `sEnableWithRaw` and `sRawTokens`
  in `LaTeXState`, and a new combinator `disablingWithRaw` to
  disable collecting of raw tokens in certain contexts.
* Add `parseFromToks` to T.P.Readers.LaTeX.Parsing.
* Fix parsing of single character tokens so it doesn't mess
  up the new raw token collecting.
* These changes slightly increase allocations and have a small
  performance impact, but it's minor.

Closes #7092.
2021-02-12 19:04:14 -08:00
John MacFarlane
390d5e65b2 Use getTimestamp instead of getCurrentTime in writers.
Setting SOURCE_DATE_EPOCH will allow reproducible builds.

Partially addresses #7093.  This does not suffice to fully enable
reproducible builds in EPUB, since a unique id is being generated for each
build.
2021-02-11 14:55:03 -08:00
John MacFarlane
3c4a58bad0 T.P.Class: Add getTimestamp [API change].
This attempts to read the SOURCE_DATE_EPOCH environment variable
and parse a UTC time from it (treating it as a unix date stamp,
see https://reproducible-builds.org/specs/source-date-epoch/).
If the variable is not set or can't be parsed as a unix date
stamp, then the function returns the current date.
2021-02-11 14:54:28 -08:00
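
The fallback described in the commit above can be sketched as follows. This is an illustrative Python sketch, not pandoc's Haskell implementation; the name `get_timestamp` and its `environ` parameter are assumptions made for the example.

```python
import os
from datetime import datetime, timezone


def get_timestamp(environ=None):
    """Return a UTC time from SOURCE_DATE_EPOCH, or the current time.

    `environ` is a hypothetical parameter so the lookup can be tested
    without touching the real process environment.
    """
    env = os.environ if environ is None else environ
    raw = env.get("SOURCE_DATE_EPOCH")
    if raw is not None:
        try:
            # The variable holds seconds since the Unix epoch.
            return datetime.fromtimestamp(int(raw), tz=timezone.utc)
        except (ValueError, OverflowError, OSError):
            # Unparseable or out-of-range value: fall through to "now".
            pass
    return datetime.now(timezone.utc)
```

Calling `get_timestamp({"SOURCE_DATE_EPOCH": "0"})` yields the Unix epoch, while an unset or malformed variable falls back to the current time.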
John MacFarlane
acc9afaf6f Correctly parse "raw" date value in markdown references metadata.
See jgm/citeproc#53.
2021-02-11 09:16:25 -08:00
John MacFarlane
8ca191604d Add new unexported module T.P.XMLParser.
This exports functions that use xml-conduit's parser to
produce an xml-light Element or [Content].  This allows
existing pandoc code to use a better parser without
much modification.

The new parser is used in all places where xml-light's
parser was previously used.  Benchmarks show a significant
performance improvement in parsing XML-based formats
(especially ODT and FB2).

Note that the xml-light types use String, so the
conversion from xml-conduit types involves a lot
of extra allocation.  It would be desirable to
avoid that in the future by gradually switching
to using xml-conduit directly. This can be done
module by module.

The new parser also reports errors, which we report
when possible.

A new constructor PandocXMLError has been added to
PandocError in T.P.Error [API change].

Closes #7091, which was the main stimulus.

These changes revealed the need for some changes
in the tests.  The docbook-reader.docbook test
lacked definitions for the entities it used; these
have been added. And the docx golden tests have been
updated, because the new parser does not preserve
the order of attributes.

Add entity defs to docbook-reader.docbook.

Update golden tests for docx.
2021-02-10 22:04:11 -08:00
John MacFarlane
f70795dc5e ODT reader: finer-grained errors on parse failure.
See #7091.
2021-02-08 09:39:59 -08:00
John MacFarlane
5cd1c1001f ODT reader: give more information if zip can't be unpacked. 2021-02-08 09:39:59 -08:00
Nils Carlson
69b7401e31
DocBook reader: Support informalfigure (#7079)
Add support for informalfigure.
2021-02-08 09:36:58 -08:00
Albert Krewinkel
d202f7eb77
Avoid unnecessary use of NoImplicitPrelude pragma (#7089) 2021-02-07 10:02:35 -08:00
John MacFarlane
8e9131db4e Markdown reader: improved handling of mmd link attributes in references.
Previously they only worked for links that had titles.  Closes #7080.
2021-02-06 21:52:12 -08:00
Albert Krewinkel
a5169f68b2
Lua filters: use same function names in Haskell and Lua 2021-02-04 19:07:59 +01:00
Nick Berendsen
b79aba6ea1
ePub writer: belongs-to-collection metadata (#7063) 2021-02-03 09:00:18 -08:00
Albert Krewinkel
61b108d527 Lua: add module "pandoc.path"
The module allows to work with file paths in a convenient and
platform-independent manner.

Closes: #6001
Closes: #6565
2021-02-02 21:04:30 -08:00
John MacFarlane
ec8509295a Add parseOptionsFromArgs [API change, addition].
Exported by Text.Pandoc.App.
2021-02-02 17:00:03 -08:00
John MacFarlane
02d3c71e72 BibTeX writer: use doclayout and doctemplate.
This change allows bibtex/biblatex output to wrap as other
formats do, depending on the settings of `--wrap` and `--columns`.

It also introduces default templates for bibtex and biblatex,
which allow for using the variables `header-include`, `include-before`
or `include-after` (or alternatively the command line options
`--include-in-header`, `--include-before-body`, `--include-after-body`)
to insert content into the generated bibtex/biblatex.

This change requires a change in the return type of the unexported
`T.P.Citeproc.writeBibTeXString` from `Text` to `Doc Text`.

Closes #7068.
2021-02-01 18:05:20 -08:00
John MacFarlane
b239c89a82 BibTeX writer fixes. Closes #7067.
+ Require citeproc 0.3.0.7, which correctly titlecases when titles
  contain non-ASCII characters.
+ Correctly handle 'pages' (= 'page' in CSL).
+ Correctly handle BibLaTeX 'langid' (= 'language' in CSL).
+ In BibTeX output, protect foreign titles since there's no language
  field.
2021-02-01 11:23:07 -08:00
John MacFarlane
d1875b69ec RST reader: fix handling of header in CSV tables.
The interpretation of this line is not affected
by the delim option. Closes #7064.
2021-01-31 12:05:46 -08:00
Albert Krewinkel
9c8ff53b54
CslJson writer: fix compiler warning 2021-01-31 14:37:47 +01:00
John MacFarlane
6695917258 CslJson writer: output [] if no references in input,
instead of raising a PandocAppError as before.
2021-01-30 18:10:22 -08:00
John MacFarlane
9223788a05 Markdown writer: handle math right before digit.
We insert an HTML comment to avoid a `$` right before
a digit, which pandoc will not recognize as a math delimiter.
2021-01-29 18:29:17 -08:00
Albert Krewinkel
300b9b0ea3
JATS writer: escape special chars in reference elements.
Prevents the generation of invalid markup if a citation element contains
an ampersand or another character with a special meaning in XML.
2021-01-29 09:51:20 +01:00
John MacFarlane
98c2a52b4e Clean up BibTeX parsing.
Previously there was a messy code path that gave strange
results in some cases, not passing through raw tex but
trying to extract a string content.  This was an artefact
of trying to handle some special bibtex-specific commands
in the BibTeX reader. Now we just handle these in the
LaTeX reader and simplify parsing in the BibTeX reader.
This does mean that more raw tex will be passed through
(and currently this is not sensitive to the `raw_tex`
extension; this should be fixed).

Closes #7049.
2021-01-26 22:45:57 -08:00
Mauro Bieg
12bc662535 LaTeX writer: change BCP47 lang tag from jp to ja
fixes #7047
2021-01-26 15:29:33 -08:00
Albert Krewinkel
490065f3ed Lua: always load built-in Lua scripts from default data-dir
The Lua modules `pandoc` and `pandoc.List` are now always loaded from the
system's default data directory. Loading from a different directory by
overriding the default path, e.g. via `--data-dir`, is no longer supported to
avoid unexpected behavior and to address security concerns.
2021-01-26 09:43:56 -08:00
John MacFarlane
198ce0cde9 ImageSize: use viewBox for svg if no length, width.
This change allows pandoc to extract size information
from more SVGs.  Closes #7045.
2021-01-22 20:49:41 -08:00
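
As a rough illustration of the viewBox fallback in the commit above, here is a Python sketch. It is not pandoc's code: the function name `svg_size` is hypothetical, and it returns the raw attribute strings without any unit conversion.

```python
import xml.etree.ElementTree as ET


def svg_size(svg_text):
    """Return (width, height) strings for an SVG, or None if unknown."""
    root = ET.fromstring(svg_text)
    width, height = root.get("width"), root.get("height")
    if width is not None and height is not None:
        return width, height
    # Fall back to the last two numbers of viewBox="min-x min-y width height".
    view_box = root.get("viewBox")
    if view_box:
        parts = view_box.replace(",", " ").split()
        if len(parts) == 4:
            return parts[2], parts[3]
    return None
```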
John MacFarlane
83d7804b8f
Merge pull request #7042 from tarleb/jats-element-citations
JATS writer: use element citations
2021-01-22 10:39:58 -08:00
Albert Krewinkel
b4b3560191
JATS writer: allow to use element-citation 2021-01-22 19:35:08 +01:00
John MacFarlane
fa952c8dbe Add biblatex, bibtex as output formats (closes #7040).
* `biblatex` and `bibtex` are now supported as output
  as well as input formats.

* New module Text.Pandoc.Writers.BibTeX, exporting
  writeBibTeX and writeBibLaTeX. [API change]

* New unexported function `writeBibtexString` in
  Text.Pandoc.Citeproc.BibTeX.
2021-01-22 10:08:43 -08:00
Albert Krewinkel
87083bd1d6
Text.Pandoc.Citeproc: use finer grained imports
This allows to import the module in writers without causing a circular
dependency.
2021-01-21 23:22:08 +01:00
John MacFarlane
5f98ac62e3 JATS writer: Ensure that disp-quote is always wrapped in p.
Closes #7041.
2021-01-19 20:39:58 -08:00
John MacFarlane
1c4d14cdcc RST writer: fix #7039.
We were losing content from inside spans with a class,
due to logic that is meant to avoid nested inline
structures that can't be represented in RST.

The logic was a bit stricter than necessary.  This
commit fixes the issue.
2021-01-18 11:32:02 -08:00
John MacFarlane
c841bcf3b0 Revert "Markdown reader: support GitHub wiki's internal links (#2923) (#6458)"
This reverts commit 6efd3460a7.

Since this extension is designed to be used with
GitHub markdown (gfm), we need to implement the parser
as a commonmark extension (commonmark-extensions),
rather than in pandoc's markdown reader.  When that is
done, we can add it here.
2021-01-16 16:22:04 -08:00
Gautier DI FOLCO
6efd3460a7
Markdown reader: support GitHub wiki's internal links (#2923) (#6458)
Changes overview:

 * Add a `Ext_markdown_github_wikilink` constructor to `Extension` [API change].
 * Add the parser `githubWikiLink` in `Text.Pandoc.Readers.Markdown`
 * Add tests.
2021-01-16 16:15:33 -08:00
John MacFarlane
83336a45a7 Recognize more extensions as markdown by default.
`mkdn`, `mkd`, `mdwn`, `mdown`, `Rmd`.
Closes #7034.
2021-01-16 11:15:35 -08:00
John MacFarlane
387d3e76ee Markdown writer: cleaned up raw formats.
We now react appropriately to gfm, commonmark, and commonmark_x
as raw formats.
2021-01-12 10:20:32 -08:00
John MacFarlane
c451207b08 Docx writer: handle table header using styles.
Instead of hard-coding the border and header cell vertical alignment,
we now let this be determined by the Table style, making use of
Word's "conditional formatting" for the table's first row.
For headerless tables, we use the tblLook element to tell Word
not to apply conditional first-row formatting.

Closes #7008.
2021-01-12 09:49:10 -08:00
Albert Krewinkel
68fa437999
JATS writer: fix citations (#7018)
* JATS writer: keep code lines at 80 chars or below

* JATS writer: fix citations
2021-01-10 15:35:48 -08:00
John MacFarlane
e741c7f553 Fix infinite HTTP requests when writing epubs from URL source.
Due to a bug in code added to avoid overwriting the cover image
if it had the form `fileX.YYY`, pandoc made an endless sequence
of HTTP requests when writing epub with input from a URL.

Closes #7013.
2021-01-10 12:49:53 -08:00
John MacFarlane
d98ec4feb8 T.P.Citeproc: factor out and export getStyle. 2021-01-10 11:48:53 -08:00
John MacFarlane
402d984bc5 T.P.Citeproc: factor out getLang. 2021-01-10 10:28:53 -08:00
John MacFarlane
15e33b33b4 T.P.Citeproc: refactor and export getReferences.
See #7016.
2021-01-10 10:15:30 -08:00
Albert Krewinkel
fe1378227b
Org reader: allow multiple pipe chars in todo sequences
Additional pipe chars, used to separate "action" state from "no further
action" states, are ignored. E.g., for the following sequence, both
`DONE` and `FINISHED` are states with no further action required.

    #+TODO: UNFINISHED | DONE | FINISHED

Previously, parsing of the todo sequence failed if multiple pipe chars
were included.

Closes: #7014
2021-01-09 13:40:31 +01:00
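
The behavior described in the commit above can be sketched like this. It is a Python illustration under stated assumptions, not the Org reader's Haskell parser: `parse_todo_sequence` is a hypothetical name, and the case of a sequence with no pipe character is not modelled.

```python
def parse_todo_sequence(line):
    """Split an Org `#+TODO:` line into (action_states, done_states).

    Everything before the first pipe is an "action" state; all states
    after it are "no further action" states, and extra pipes are ignored.
    """
    prefix = "#+TODO:"
    stripped = line.strip()
    if not stripped.startswith(prefix):
        raise ValueError("not a #+TODO line")
    segments = [seg.split() for seg in stripped[len(prefix):].split("|")]
    action = segments[0]
    done = [state for seg in segments[1:] for state in seg]
    return action, done
```

For the sequence quoted in the commit message, this returns `UNFINISHED` as the only action state and both `DONE` and `FINISHED` as done states.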
Albert Krewinkel
4f34345867
Update copyright notices for 2021 (#7012) 2021-01-08 09:38:20 -08:00
John MacFarlane
327e1428c5 gfm/commonmark writer: implement start number on ordered lists.
Previously they always started at 1, but according to the spec
the start number is respected. Closes #7009.
2021-01-07 16:42:05 -08:00
John MacFarlane
c0d8b186d1 T.P.Parsing: modify gridTableWith' for headerless tables.
If the table lacks a header, the header row should be an empty
list. Previously we got a list of empty cells, which caused
an empty header to be emitted instead of no header.  In LaTeX/PDF
output that meant we got a double top line with space between.

@tarleb @despres - please let me know if this is problematic
for some reason I'm not grasping.
2021-01-07 11:07:03 -08:00
John MacFarlane
15ba184e6e HTML writer: fix implicit_figure at end of footnotes.
Closes #7006.
2021-01-05 12:07:02 -08:00
David Martschenko
385b6a3b21
Implement defaults file inheritance (#6924)
Allow defaults files to inherit options from other defaults files by
specifying them with the following syntax:
`defaults: [list of defaults files or single defaults file]`.
2021-01-05 10:15:59 -08:00
John MacFarlane
ea479bf28a LaTeX reader: handle filecontents environment.
Closes #7003.
2021-01-04 14:05:03 -08:00
John MacFarlane
1ce7db1fa6 EPUB writer: adjust internal links to identifiers...
defined in raw HTML sections after splitting into
chapters.

Closes #7000.
2021-01-04 11:38:18 -08:00
John MacFarlane
f04e02d8d5 EPUB writer: recognize Format "html4", Format "html5" as raw HTML. 2021-01-03 11:35:36 -08:00
John MacFarlane
21ee2d80c2 EPUB writer: adjust internal links to images, links, and tables...
after splitting into chapters. Previously we only did this for
Div and Span and Header elements.  See #7000.
2021-01-03 11:27:01 -08:00
Dimitri Sabadie
57b1094152
Org reader: mark verbatim code with class "verbatim". (#6998)
* Replace org-mode’s verbatim from code to codeWith.

This adds the `"verbatim"` class so that exporters can apply a specific
style on it. For instance, it will be possible for HTML to add a CSS
rule for code + verbatim class.

* Alter test for org-mode’s verbatim change.

See previous commit for further detail on the new implementation.
2021-01-03 08:57:47 +01:00
John MacFarlane
260aaaacc6 LaTeX reader: put contents of unknown environments in a Div...
when `raw_tex` is not enabled. (When `raw_tex` is enabled,
the whole environment is parsed as a raw block.)
The class name is the name of the environment.
Previously, we just included the contents without the
surrounding Div, but having a record of the environment's
boundaries and name can be useful.

Closes #6997.
2021-01-02 08:19:00 -08:00
John MacFarlane
9a18cf4b59 LaTeX writer: revert table line height increase in 2.11.3.
In 2.11.3 we started adding `\addlinespace`, which produced less
dense tables.  This wasn't an intentional change; I misunderstood
a comment in the discussion leading up to the change. This commit
restores the earlier default table appearance.

Note that if you want a less dense table, you can use something like
`\def\arraystretch{1.5}` in your header.

Closes #6996.
2021-01-02 07:56:07 -08:00
Albert Krewinkel
17e3efc785
Org reader: restructure output of captioned code blocks
The Div wrapper of code blocks with captions now has the class
"captioned-content". The caption itself is added as a Plain block
inside a Div of class "caption". This makes it easier to write filters
which match on captioned code blocks. Existing filters will need to be
updated.

Closes: #6977
2021-01-01 11:18:36 +01:00
John MacFarlane
23f964b907 Mediawiki reader: allow space around strong/emph delimiters.
Closes #6993.
2020-12-30 21:31:28 -08:00
John MacFarlane
0782d5882c Undo the "Use fromRight" hlint hint. 2020-12-30 16:04:09 -08:00
John MacFarlane
419190213a Hlint fixes 2020-12-30 15:38:48 -08:00
John MacFarlane
49286a25df Ms writer: don't justify inside table cells. 2020-12-30 13:36:18 -08:00
John MacFarlane
3cd21c5f6e Improve fix to #6983.
If we have a paragraph then a bookmarkEnd, we don't need to
insert the empty paragraph (and in fact it alters the spacing).

Closes #6983.
2020-12-29 08:44:43 -08:00
John MacFarlane
55f9b59af1 Docx writer: fix nested tables with captions.
Previously we got unreadable content, because docx seems
to want a `<w:p>` element (even an empty one) at the end of
every table cell.  Closes #6983.
2020-12-28 14:41:28 -08:00
Albert Krewinkel
e837ed772e
HTML reader: use renderTags' from Text.Pandoc.Shared.
The `renderTags'` function was duplicated when the reader used `Text` as
its string type. The duplication is no longer necessary.

A side effect of this change is that empty `<col>` elements are written
as self-closing tags in raw HTML blocks.
2020-12-28 14:48:55 +01:00
John MacFarlane
99e1b67b74 Use meta-description instead of description in templates.
Since this is an attribute value, we need to prepare it
in the writer.
2020-12-27 23:19:14 -08:00
timo-a
668596cc89
Add support for writing nested tables to asciidoc (#6972)
Added field to WriterState that denotes the current nesting level for traversing tables.
Depending on the value of that field nested tables are recognized and written.
Asciidoc supports one level of nesting. If deeper tables are to be written, they are
omitted and a warning is issued.
2020-12-27 18:42:28 -08:00
Albert Krewinkel
dcd89413f3 Powerpoint writer: allow arbitrary OOXML in raw inline elements
The raw text is now included verbatim in the output. Previously it was parsed
into XML elements, which prevented the inclusion of partial XML snippets.
2020-12-27 23:18:54 +01:00
John MacFarlane
47f435276a Citeproc: fix handling of empty URL variables (DOI, etc.).
The `linkifyVariables` function was changing these to links
which then got treated as non-empty by citeproc, leading
to wrong results (e.g. ignoring nonempty URL when empty DOI is present).

Addresses part 2 of jgm/citeproc#41.
2020-12-24 09:56:20 -08:00
John MacFarlane
9cbbf18fe1 HTML writer: don't include p tags in CSL bibliography entries.
Fixes a regression in 2.11.3.
Closes #6966
2020-12-20 22:34:31 -08:00
Albert Krewinkel
8f402beab9
LaTeX writer: support colspans and rowspans in tables. (#6950)
Note that the multirow package is needed for rowspans.
It is included in the latex template under a variable,
so that it won't be used unless needed for a table.
2020-12-20 18:04:54 -08:00
John MacFarlane
914cf0b602 Fix citeproc regression with duplicate references.
- Use dev version of citeproc, which handles duplicate
  ids better, preferring the last one in the list
  and discarding the rest.
- Ensure that inline citations take priority over external
  ones.

See jgm/citeproc#36.

This restores the behavior of pandoc-citeproc.
2020-12-16 15:37:40 -08:00
John MacFarlane
57241e201a Support Lua marshalling of doctemplates BoolVal.
This updates T.P.Lua.Marshaling.Context for doctemplates >= 0.9.
2020-12-16 07:56:07 -08:00
John MacFarlane
b4b4e32307 Properly handle boolean values in writing YAML metadata.
(Markdown writer.)
This requires doctemplates >= 0.9.
Closes #6388.
2020-12-15 23:45:34 -08:00
John MacFarlane
87033b2856 Use fetchItem to get external bibliography.
This means that:

- a URL may be provided, and pandoc will fetch the resource.
- Pandoc will search the resource path for the bibliography
  if it is not found relative to the working directory.

Closes #6940.
2020-12-15 09:09:51 -08:00
John MacFarlane
7d799bfcda Allow both inline and external references to be used
with `--citeproc`.  This fixes a regression, since pandoc-citeproc
allowed these to be combined.

Closes #6951.
2020-12-15 08:51:43 -08:00
John MacFarlane
39153ea6e2 ImageSize: use exif width and height when available.
After the move to JuicyPixels, we were getting incorrect
width and height information for some images (see #6936, test-3.jpg).

The correct information was encoded in Exif tags that
JuicyPixels seemed to ignore. So we check these first
before looking at the Width and Height identified by
JuicyPixels.

Closes #6936.
2020-12-14 09:39:07 -08:00
John MacFarlane
c43e2dc0f4 RST writer: better image handling.
- An image alone in its paragraph (but not a figure) is now
  rendered as an independent image, with an `alt` attribute
  if a description is supplied.
- An inline image that is not alone in its paragraph will
  be rendered, as before, using a substitution.
  Such an image cannot have a "center", "left", or
  "right" alignment, so the classes `align-center`,
  `align-left`, or `align-right` are ignored.
  However, `align-top`, `align-middle`, `align-bottom`
  will generate a corresponding `align` attribute.

Closes #6948.
2020-12-13 15:25:46 -08:00
John MacFarlane
32902d0fad
Merge pull request #6941 from tarleb/docx-raw
Docx writer: keep raw openxml strings verbatim
2020-12-13 11:08:41 -08:00
John MacFarlane
c3aa90b57a ImageSize: use JuicyPixels to extract size...
...for png, jpeg, gif, instead of doing our own binary parsing.
See #6936.
2020-12-13 10:33:46 -08:00
John MacFarlane
ef62b70646 ImageSize: use JuicyPixels to determine png size. 2020-12-13 10:33:46 -08:00
Albert Krewinkel
00031fc809
Docx writer: keep raw openxml strings verbatim.
Closes: #6933
2020-12-13 14:09:59 +01:00
Albert Krewinkel
8cf58d96e0
Docx writer: use Content instead of Element. 2020-12-13 14:09:53 +01:00
John MacFarlane
3a7d97f02f
Merge pull request #6946 from mb21/icml-image-fit
ICML writer: fix image bounding box for custom widths/heights
2020-12-12 08:28:14 -08:00
Albert Krewinkel
ccd235e31f
LaTeX writer: extract table handling into separate module. 2020-12-12 16:48:28 +01:00
mb21
208cb96196 ICML writer: fix image bounding box for custom widths/heights
fixes #6936
2020-12-12 14:49:11 +01:00
John MacFarlane
fcd0658189 HTML reader: pay attention to lang attributes on body.
These (as well as lang attributes on html) should update
lang in metadata. See #6938.
2020-12-10 15:51:20 -08:00
John MacFarlane
0a502e5ff5 HTML reader: retain attribute prefixes and avoid duplicates.
Previously we stripped attribute prefixes, reading
`xml:lang` as `lang` for example. This resulted in
two duplicate `lang` attributes when `xml:lang` and
`lang` were both used.  This commit causes the prefixes
to be retained, and also avoids invalid duplicate
attributes.

Closes #6938.
2020-12-10 15:44:10 -08:00
John MacFarlane
a3eb87b2ea Add sourcepos extension for commonmark.
* Add `Ext_sourcepos` constructor for `Extension`.
* Add `sourcepos` extension (only for commonmark).
* Bump to 2.11.3

With the `sourcepos` extension set, `data-pos` attributes are added
to the AST by the commonmark reader. No other readers are affected.  The
`data-pos` attributes are put on elements that accept attributes; for
other elements, an enclosing Div or Span is added to hold the attributes.

Closes #4565.
2020-12-10 08:59:55 -08:00
John MacFarlane
8c9010864c Commonmark reader: refactor specFor, set input name to "". 2020-12-10 08:59:55 -08:00
John MacFarlane
5990cbb150 Parsing: Small code improvements. 2020-12-07 21:34:23 -08:00
John MacFarlane
0fa1023b9e Parsing: More minor performance improvements. 2020-12-07 18:57:09 -08:00
John MacFarlane
ce1791913d Small efficiency improvement in uri parser 2020-12-07 13:24:19 -08:00
John MacFarlane
2f9b684b3a Bibtex parser: avoid noneOf. 2020-12-07 13:01:30 -08:00
John MacFarlane
f2749ba6cd Parsing: in nonspaceChar use satisfy instead of oneOf.
For efficiency.
2020-12-07 12:56:03 -08:00
John MacFarlane
501ea7f0c4 Dokuwiki reader: handle unknown interwiki links better.
DokuWiki lets the user define his own Interwiki links.
Previously pandoc reacted to these by emitting a
google search link, which is not helpful. Instead,
we now just emit the full URL including the
wikilink prefix, e.g. `faquk>FAQ-mathml`.
This at least gives users the ability to
modify the links using filters.

Closes #6932.
2020-12-07 12:15:14 -08:00
John MacFarlane
810df00cf5
Merge pull request #6922 from jtojnar/db-writer-admonitions
Docbook writer: handle admonitions
2020-12-07 08:48:02 -08:00
Jan Tojnar
70c7c5703a
Docbook writer: Handle admonition titles from Markdown reader
Docbook reader produces a `Div` with `title` class for `<title>` element
within an “admonition” element. Markdown writer then turns this
into a fenced div with `title` class attribute. Since fenced divs
are block elements, their content is recognized as a paragraph
by the Markdown reader. This is an issue for Docbook writer because
it would produce an invalid DocBook document from such AST –
the `<title>` element can only contain “inline” elements.

Let’s handle this invalid special case separately by unwrapping
the paragraph before creating the `<title>` element.
2020-12-07 07:28:39 +01:00
Jan Tojnar
16ef877457
Docbook writer: Use correct id attribute consistently
DocBook5 should always use xml:id instead of id so let’s use it everywhere.
2020-12-07 06:23:25 +01:00
Jan Tojnar
dc6856530c
Docbook writer: handle admonitions
Similarly to d6fdfe6f2b,
we should handle admonitions.
2020-12-07 06:23:25 +01:00
Albert Krewinkel
acf932825b
Org reader: preserve targets of spurious links
Links with (internal) targets that the reader doesn't know about are
converted into emphasized text. Information on the link target is now
preserved by wrapping the text in a Span of class `spurious-link`, with
an attribute `target` set to the link's original target. This allows to
recover and fix broken or unknown links with filters.

See: #6916
2020-12-05 22:37:48 +01:00
Nils Carlson
c161893f44
OpenDocument writer: Allow references for internal links (#6774)
This commit adds two extensions to the OpenDocument writer,
`xrefs_name` and `xrefs_number`.

Links to headings, figures and tables inside the document are
substituted with cross-references that will use the name or caption
of the referenced item for `xrefs_name` or the number for `xrefs_number`.

For the `xrefs_number` to be useful heading numbers must be enabled
in the generated document and table and figure captions must be enabled using for example the `native_numbering` extension.

In order for numbers and reference text to be updated the generated
document must be refreshed.

Co-authored-by: Nils Carlson <nils.carlson@ludd.ltu.se>
2020-12-05 10:00:04 -08:00
John MacFarlane
ddb76cb356 LaTeX reader: don't apply theorem default styling to a figure inside.
If we put an image in italics, then when rendering to Markdown
we no longer get an implicit figure.

Closes #6925.
2020-12-05 09:53:39 -08:00
Jan Tojnar
6f35600204
Docbook writer: add XML namespaces to top-level elements (#6923)
Previously, we only added xmlns attributes to chapter elements,
even when running with --top-level-division=section.
Let’s add the namespaces to part and section elements too,
when they are the selected top-level divisions.

We do not need to add namespaces to documents produced with
--standalone flag, since those will already have xmlns attribute
on the root element in the template.
2020-12-04 21:00:21 -08:00
John MacFarlane
dc3ef5201f Markdown writer: ensure that a new csl-block begins on a new line.
This just looks better and doesn't affect the semantics.
See #6921.
2020-12-04 10:55:48 -08:00
John MacFarlane
68bcddeb21 LaTeX writer: Fix bug with nested csl- display Spans.
See #6921.
2020-12-04 10:14:19 -08:00
John MacFarlane
171d3db384 HTML writer: Fix handling of nested csl- display spans.
Previously inner Spans used to represent
CSL display attributes were not rendered as div tags.

See #6921.
2020-12-04 09:47:56 -08:00
John MacFarlane
7199d68ba0 EPUB writer: include title page in landmarks.
Closes #6919.

Note that the toc is also included if `--toc` is specified.
2020-12-03 21:39:44 -08:00
John MacFarlane
9c6cc79c11 EPUB writer: add frontmatter type on body element for nav.xhtml.
Closes #6918.
2020-12-03 21:24:27 -08:00
John MacFarlane
5bbd5a9e80 Docx writer: Support bold and italic in "complex script."
Previously bold and italics didn't work properly in RTL
text.  This commit causes the w:bCs and w:iCs attributes
to be used, in addition to w:b and w:i, for bold and
italics respectively.

Closes #6911.
2020-12-03 09:51:23 -08:00
John MacFarlane
7b11cdee49 Citeproc: ensure that BCP47 lang codes can be used.
We ignore the variants and just use the base lang code
and country code when passing off to citeproc.
2020-12-02 10:46:23 -08:00
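
One way to read "base lang code and country code" in the commit above is sketched below. This is a simplified Python illustration, not the logic pandoc or citeproc uses: the function name is hypothetical and only the common BCP47 subtag shapes are handled.

```python
def base_lang_and_region(tag):
    """Reduce a BCP47 tag to (language, region), dropping other subtags."""
    subtags = tag.split("-")
    lang = subtags[0].lower()
    region = None
    for sub in subtags[1:]:
        if len(sub) == 1:
            # Extension or private-use singleton: nothing useful follows.
            break
        # A region is two letters (e.g. CH) or three digits (e.g. 419).
        if (len(sub) == 2 and sub.isalpha()) or (len(sub) == 3 and sub.isdigit()):
            region = sub.upper()
            break
    return lang, region
```

Script subtags such as `Hant` and variants such as `1996` are skipped, so `de-CH-1996` reduces to `de` plus `CH`.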
John MacFarlane
bff9c129c3 LaTeX reader: don't parse \rule with width 0 as horizontal rule. 2020-11-29 10:35:20 -08:00
Tassos Manganaris
83d63b72e1 Fix a tiny Typo in the CSV reader module
Header comment in the CSV reader module says "RST" instead of "CSV".
2020-11-28 09:40:15 +01:00
Albert Krewinkel
8c38390038
HTML reader tests: improve test coverage of new features 2020-11-27 21:21:25 +01:00
Albert Krewinkel
a9c766291f
HTML reader: support body headers, row head columns
Closes: #6312
2020-11-27 10:36:13 +01:00
John MacFarlane
db2db54f80 Added some explicit imports. 2020-11-26 12:44:01 -08:00
cholonam
5f4deb5455 Docx writer: Fix bullets/lists indentation
Fix appearance of bullets/numbered lists (the first level is slightly
indented to the right instead of right on the margin).

New golden files have been tested using Word 2010 on Windows 10.
2020-11-26 12:11:26 -08:00
Igor Pashev
630b1bff2b
LaTeX reader: preserve center environment (#6852)
The contents of the `center` environment are put in a `Div`
with class `center`.
2020-11-26 12:04:31 -08:00
Albert Krewinkel
07919e1b22
HTML reader: improve support for table headers, footer, attributes
- `<tfoot>` elements are no longer added to the table body but used as
  table footer.
- Separate `<tbody>` elements are no longer combined into one.
- Attributes on `<thead>`, `<tbody>`, `<th>`/`<td>`, and `<tfoot>`
  elements are preserved.
2020-11-26 07:22:01 +01:00
Albert Krewinkel
3e01ae405f
HTML reader: allow finer grained options for tag omission 2020-11-26 07:22:01 +01:00
John MacFarlane
7c4d7db9c7 LaTeX writer: improve longtable output.
- Don't create minipages for regular paragraphs.
- Put width and alignment information in the longtable column
  descriptors.
- Closes #6883.
2020-11-25 15:42:44 -08:00
John MacFarlane
b50ac3a95b LaTeX tables: Fix calculation of column spacing.
See #6883.
2020-11-25 14:41:28 -08:00
John MacFarlane
815976d537 Fix truncation of [Citation] list in Cite inside footnotes...
This affected author-in-text citations in footnotes.
It didn't cause problems for the printed output, but for
filters that expected the citation id and other information.

Closes #6890.
2020-11-25 09:10:10 -08:00
Albert Krewinkel
c6f2663a23
HTML reader: simplify list attribute handling
This removes the `foldOrElse` function from the internal Text.Pandoc.CSS
module.
2020-11-25 17:55:42 +01:00
Albert Krewinkel
c9f98e2bf5
HTML reader: support row or column-spanning table cells 2020-11-24 14:17:35 +01:00
Albert Krewinkel
446ef27a3f
HTML reader: support blocks in caption 2020-11-24 14:17:35 +01:00
Albert Krewinkel
41237fcc0e
HTML reader: extract table parsing into separate module 2020-11-24 14:17:35 +01:00
John MacFarlane
2f110265ff ImageSize: default to DPI 72 if the format specifies DPI of 0.
This shouldn't happen, in general, but it can happen with
JPEGs that don't conform to the spec.  Having a DPI of 0
will blow up size calculations (division by 0).

Closes #6880.
2020-11-23 09:39:48 -08:00
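
The guard described in the commit above amounts to substituting a default before dividing. A minimal Python sketch, with the hypothetical names `DEFAULT_DPI` and `size_in_inches` standing in for whatever pandoc's ImageSize module actually uses:

```python
DEFAULT_DPI = 72  # fallback when a file reports a DPI of 0


def size_in_inches(px_x, px_y, dpi_x, dpi_y):
    """Convert pixel dimensions to inches, treating a DPI of 0 as 72."""
    if dpi_x == 0:
        dpi_x = DEFAULT_DPI
    if dpi_y == 0:
        dpi_y = DEFAULT_DPI
    return px_x / dpi_x, px_y / dpi_y
```

Without the substitution, a nonconforming JPEG that reports a DPI of 0 would trigger a division by zero.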
Albert Krewinkel
f9258371dd HTML reader: extract submodules
Reducing module size should reduce memory use during compilation.

This is preparatory work to tackle support for more table features.
2020-11-23 10:12:20 +01:00
Nils Carlson
75c881e2d9
OpenDocument Writer: Implement Div and Span ident support (#6755)
Spans and Divs containing an ident in the Attr will become bookmarks
or sections with idents in OpenDocument format.
2020-11-22 22:23:30 -08:00
John MacFarlane
b5b5ef92cb LaTeX writer: Improve table spacing.
+ Remove the `\strut` that was added at the end of minipage
  environments in cells.

+ Replace `\tabularnewline` with `\\ \addlinespace`.

Closes #6842, closes #6860.
2020-11-22 10:54:42 -08:00
Albert Krewinkel
5344dab8eb
Org reader: parse #+LANGUAGE into lang metadata field
Fixes: #6845
2020-11-22 12:53:05 +01:00
Nils Carlson
ae52918faa
OpenDocument writer: Table text width support (#6792)
Support for table width as a percentage of text width by summing
width of columns and verifying that the sum is > 0 and <= 1.
2020-11-21 12:42:43 -08:00
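
The check described in the commit above can be sketched as follows. This is a Python illustration only; `table_width_fraction` is a hypothetical name, and column widths are assumed to be fractions of the text width.

```python
def table_width_fraction(col_widths):
    """Return the summed relative width if it lies in (0, 1], else None."""
    total = sum(col_widths)
    if 0 < total <= 1:
        return total
    # All-zero widths (no explicit width) or an overfull sum: no table width.
    return None
```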
John MacFarlane
7db2cf5d2f LaTeX reader: more robust parsing of bracketed options.
Improves on 9a40976.  Closes #6873.
2020-11-21 12:24:37 -08:00
John MacFarlane
fec8223d3a Citeproc BibTeX parser: revert change in getRawField...
which was made (for reasons forgotten) when transferring
this code from pandoc-citeproc.  The change led to `--` in
URLs being interpreted as en-dashes, which is unwanted.

Closes #6874.
2020-11-21 12:07:28 -08:00
Nils Carlson
56ceaf49dc
DocBook reader: Table text width support (#6791)
Table width in relation to text width is not natively supported
by docbook but is by the docbook fo stylesheets through an XML
processing instruction, <?dbfo table-width="50%"?> .
Implement support for this instruction in the DocBook reader.
2020-11-20 16:05:56 -08:00
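As a rough illustration of 56ceaf49dc, the body of a `<?dbfo table-width="50%"?>` processing instruction can be turned into a fraction of the text width. This Python sketch assumes the instruction body is already available as a string; `dbfo_table_width` is a hypothetical helper, not the reader's real function.

```python
import re
from typing import Optional

# Matches table-width="NN%" inside the processing instruction body.
_TABLE_WIDTH = re.compile(r'table-width\s*=\s*"(\d+(?:\.\d+)?)%"')


def dbfo_table_width(pi_content: str) -> Optional[float]:
    """Extract the table width as a fraction of text width, if present."""
    match = _TABLE_WIDTH.search(pi_content)
    if match is None:
        return None
    return float(match.group(1)) / 100
```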
John MacFarlane
9a4097640f Improve LaTeX option parsing...
in cases where we run into trouble parsing inlines til the
closing `]`, e.g. quotes, we return a plain string with the
option contents. Previously we mistakenly included the brackets
in this string.

Closes #6869.
2020-11-20 13:40:26 -08:00
John MacFarlane
c647948ff1 commonmark_x: replace auto_identifiers with gfm_auto_identifiers.
`commonmark_x` never actually supported `auto_identifiers` (it
didn't do anything), because the underlying library implements
gfm-style identifiers only.

Attempts to add the `auto_identifiers` extension to
`commonmark` will now fail with an error.

Closes #6863.
2020-11-20 09:17:14 -08:00
Albert Krewinkel
d286242131 JATS writer: support advanced table features 2020-11-19 22:09:52 +01:00
John MacFarlane
c1fbe7b91a --self-contained: increase coverage.
Previously we only self-contained attributes for
certain tag names (`img`, `embed`, `video`, `input`, `audio`,
`source`, `track`, `section`).  Now we self-contain any
occurrence of `src`, `data-src`, `poster`, or `data-background-image`,
on any tag; and also `href` on `link` tags.

Closes #6854 (which specifically asked about
`asciinema-player` tags).
2020-11-19 10:08:43 -08:00
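The new selection rule in c1fbe7b91a can be summarised as a predicate on tag and attribute names. The Python sketch below is illustrative only; `should_embed` and the constant name are hypothetical.

```python
# Attributes that are now embedded regardless of the tag they appear on.
URL_ATTRS_ANY_TAG = {"src", "data-src", "poster", "data-background-image"}


def should_embed(tag: str, attr: str) -> bool:
    """Decide whether an attribute's target is inlined by --self-contained."""
    if attr in URL_ATTRS_ANY_TAG:
        return True
    # href is only embedded on link tags (e.g. stylesheets).
    return tag == "link" and attr == "href"
```

Under this rule a custom element such as `asciinema-player` with a `src` attribute qualifies, while `href` on an ordinary `a` tag does not.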
John MacFarlane
e16df8d271 DocBook reader: drop period in formalpara title...
...and put it in a div with class `formalpara-title`, so that
people can reformat with filters.

Closes #6562.

Thanks to rdmuller.
2020-11-19 09:33:29 -08:00
John MacFarlane
0962b30d84 Man reader: improve handling of .IP.
We now better handle `.IP` when it is used with non-bullet,
non-numbered lists, creating a definition list.

We also skip blank lines like groff itself.

Closes #6858.
2020-11-18 22:44:32 -08:00
Albert Krewinkel
023468ea2d
JATS writer: wrap all tables
All `<table>` elements are put inside `<table-wrap>` elements, as the
former are not valid as immediate child elements of `<body>`.
2020-11-18 18:10:17 +01:00
TEC
0306eec5fa Replace org #+KEYWORDS with #+keywords
As of ~2 years ago, lower case keywords became the standard (though they
are handled case insensitive, as always):
13424336a6

Upper case keywords are exclusive to the manual:
- https://orgmode.org/list/871s50zn6p.fsf@nicolasgoaziou.fr/
- https://orgmode.org/list/87tuuw3n15.fsf@nicolasgoaziou.fr/
2020-11-18 14:48:56 +01:00
TEC
224a501b29 Update org supported languages and identifiers
according to the current list contained in
https://orgmode.org/worg/org-contrib/babel/languages/index.html
2020-11-18 14:48:56 +01:00
John MacFarlane
efa34a8de6 Bibtex reader: fall back on en-US if locale for LANG not found.
This reproduces earlier pandoc-citeproc behavior.

Closes jgm/citeproc#26.
2020-11-17 23:12:32 -08:00
John MacFarlane
bf3fea0a8c Markdown reader: fix regression with example list references.
This affects example list references followed by dashes.
Introduced by commit b8d17f7.
Closes #6855.
2020-11-17 20:36:59 -08:00
Albert Krewinkel
94c9028819 JATS writer: move Table handling to separate module
This makes it easier to split the module into smaller parts.
2020-11-17 09:46:30 +01:00
John MacFarlane
c9ada73cac Move getNextNumber from Readers.LaTeX to Readers.LaTeX.Parsing. 2020-11-16 22:36:10 -08:00
John MacFarlane
ee34c4fef8 Only use filterIpynbOutput if input format is ipynb.
Closes #6841.
2020-11-16 18:21:30 -08:00
John MacFarlane
98bedd7631 When checking reader/writer name, check base name...
now that we permit extensions on formats other
than markdown.
2020-11-16 17:49:23 -08:00
John MacFarlane
5271c6b3fb Improve fix to siunitx numbers with minus.
- use real minus sign
- use tests contributed by Igor Pashev.
2020-11-16 16:36:16 -08:00
John MacFarlane
734b4c26a9 LaTeX reader: Fix negative numbers in siunitx commands.
The commit a157e1a broke negative numbers, e.g.
`\SI{-33}{\celsius}` or `\num{-3}`. This fixes the regression.
2020-11-16 14:08:29 -08:00
John MacFarlane
d7f905fb63 Markdown reader: fix detection of locators following in-text citations.
Previously, if we had `@foo [p. 33; @bar]`, the `p. 33` would be
incorrectly parsed as a prefix of `@bar` rather than a suffix
of `@foo`.
2020-11-15 17:51:03 -08:00
John MacFarlane
f8225140a5 Text.Pandoc.PDF: Fix changePathSeparators for Windows.
Previously a path beginning with a drive, like
`C:\foo\bar`, was translated to `C:\/foo/bar`, which
caused problems.

With this fix, the backslashes are removed.

Closes #6173.
2020-11-15 10:43:43 -08:00
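The behaviour fixed in f8225140a5 can be reproduced with a small Python sketch. It uses `pathlib.PureWindowsPath` as a stand-in for Haskell's directory splitting, which is an assumption of this example; `change_path_separators` here is a re-creation for illustration, not pandoc's code.

```python
from pathlib import PureWindowsPath


def change_path_separators(path: str) -> str:
    """Join Windows path components with '/' and drop stray backslashes.

    Splitting r"C:\foo\bar" yields ("C:\\", "foo", "bar"): the drive
    component keeps its backslash, so a plain join gives "C:\\/foo/bar".
    Removing backslashes from each component avoids that.
    """
    parts = PureWindowsPath(path).parts
    return "/".join(part.replace("\\", "") for part in parts)
```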
Albert Krewinkel
26f946af20
Remove redundant bracket in App.Opt 2020-11-15 12:08:15 +01:00
John MacFarlane
b5d066f167 Revise deprecation warning for --atx-headers. 2020-11-14 21:41:50 -08:00
Aner Lucero
f63b76e169 Markdown writer: default to using ATX headings.
Previously we used Setext (underlined) headings by default.
The default is now ATX (`##` style).

* Add the `--markdown-headings=atx|setext` option.
* Deprecate `--atx-headers`.
* Add `ATXHeadingInLHS` constructor to `LogMessage` [API change].
* Support `markdown-headings` in defaults files.
* Document new options in MANUAL.

Closes #6662.
2020-11-14 21:33:32 -08:00
John MacFarlane
b8d17f7ae8 Markdown reader: don't increment stateNoteNumber for example refs.
Background:  syntactically, references to example list items
can't be distinguished from citations; we only know which they
are after we've parsed the whole document (and this is resolved
in the `runF` stage).

This means that pandoc's calculation of `citationNoteNum`
can sometimes be wrong when there are example list references.

This commit partially addresses #6836, but only for the case
where the example list references refer to list items defined
previously in the document.
2020-11-14 15:00:17 -08:00
John MacFarlane
68b298ed9a Improve period suppression algorithm for citations in notes...
in note citation styles.  See #6835.
2020-11-13 10:52:21 -08:00
gison93
fec695c77a
Fix error when extension output is doc (#6834) 2020-11-13 09:07:31 -08:00
John MacFarlane
7d298d13d9 Remove redundant bracket. 2020-11-10 10:34:46 -08:00
John MacFarlane
7d01887dda Fix corner case in YAML metadata parsing.
Previously YAML metadata would sometimes not get recognized if a
field ended with a newline followed by spaces.  Closes #6823.
2020-11-10 09:47:24 -08:00
John MacFarlane
08ce3addde Hlint suggestions. 2020-11-07 10:53:07 -08:00
Albert Krewinkel
527346cc7e
Lint code in PRs and when committing to master (#6790)
* Remove unused LANGUAGE pragmata

* Apply HLint suggestions

* Configure HLint to ignore some warnings

* Lint code when committing to master
2020-11-07 10:38:03 -08:00
Albert Krewinkel
0ed3436588
doc/filters.md: describe technical details of filter invocations (#6815) 2020-11-06 15:37:24 -08:00
John MacFarlane
535bd607de Support nocase spans for csljson output 2020-11-06 09:16:24 -08:00
John MacFarlane
06d3071090 LaTeX reader: better handling of \\ inside math in table cells.
Previously this confused the table parser.  Closes #6811.
2020-11-05 16:13:35 -08:00
John MacFarlane
090b0877bc Citeproc: improve punctuation in in-text note citations.
Previously in-text note citations inside a footnote
would sometimes have the final period stripped, even
if it was needed (e.g. on the end of 'ibid').

See #6813.
2020-11-05 11:15:23 -08:00
John MacFarlane
efe74746d8 DokuWiki writer: translate language names for code elements...
...and improve whitespace.  Closes #6807.
2020-11-04 22:38:53 -08:00
John MacFarlane
08134388ad MediaWiki writer: use syntaxhighlight tag...
instead of deprecated source, for highlighted code.

Also support `startFrom` attribute and `numberLines`.

Closes #6810.
2020-11-04 21:20:41 -08:00
John MacFarlane
0bd6fb4745 Simplified idpred in citeproc. 2020-11-04 11:10:49 -08:00
John MacFarlane
8f75a53542 Properly support optional cite argument for \blockquote.
(LaTeX reader)

Closes #6802.
2020-11-03 10:25:56 -08:00
John MacFarlane
6cbe5efd56 LaTeX reader: fix bug parsing macro arguments.
If `\cL` is defined as `\mathcal{L}`, and `\til` as `\tilde{#1}`,
then `\til\cL` should expand to `\tilde{\mathcal{L}}`, but pandoc
was expanding it to `\tilde\mathcal{L}`.  This is fixed by
parsing the arguments in "verbatim mode" when the macro expands
arguments at the point of use.

Closes #6796.
2020-11-02 15:04:16 -08:00
Albert Krewinkel
1175b0a008
T.P.Filter: allow shorter YAML representation of Citeproc
The map-based YAML representation of filters expects `type` and `path`
fields. The path field had to be present for all filter types, but is
not used for citeproc filters. The field can now be omitted when type
is "citeproc", as described in the MANUAL.
2020-11-02 15:14:19 +01:00
John MacFarlane
6051c751ce Citeproc: use comma for in-text citations inside footnotes.
When an author-in-text citation like `@foo` occurs in a footnote,
we now render it with:  `AUTHOR NAME + COMMA + SPACE + REST`.

Previously we rendered: `AUTHOR NAME + SPACE + "(" + REST + ")"`.

This gives better results.  Note that normal citations are still
rendered in parentheses.
2020-11-01 10:48:47 -08:00
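The two renderings contrasted in 6051c751ce can be written out as a tiny Python sketch. The function name, its parameters, and the sample strings are hypothetical; real citation rendering is driven by the CSL style.

```python
def render_footnote_citation(author: str, rest: str, legacy: bool = False) -> str:
    """Render an author-in-text citation that occurs inside a footnote.

    legacy=True reproduces the earlier form: AUTHOR NAME + SPACE + "(" + REST + ")".
    The default is the new form: AUTHOR NAME + COMMA + SPACE + REST.
    """
    if legacy:
        return f"{author} ({rest})"
    return f"{author}, {rest}"
```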
John MacFarlane
01f2d81168 Improve deNote. 2020-11-01 10:48:47 -08:00
Andy Morris
f1f2728259 Fix duplicate "class" attribute in HTML writer 2020-10-30 16:38:59 +01:00
John MacFarlane
3e6d009c6b Use new citeproc; do note capitalization here, not in citeproc. 2020-10-29 21:53:02 -07:00
John MacFarlane
bc3f16b0c1 Allow citation-abbreviations in defaults file. 2020-10-29 15:54:50 -07:00
John MacFarlane
bd7c9eb32b LaTeX writer: Improved calculation of table column widths.
We now have LaTeX do the calculation, using `\tabcolsep`.
So we should now have accurate relative column widths no
matter what the text width.

The default template has been modified to load the calc
package if tables are used.
2020-10-29 12:10:05 -07:00
John MacFarlane
95c9f3da63 Remove obsolete comment 2020-10-27 21:05:59 -07:00
John MacFarlane
3190ce95c2 Citeproc: properly handle csl field with data: URI.
This is used with the JATS writer, so this fixes a regression
in pandoc 2.11 with JATS output and citeproc.

Closes #6783.
2020-10-27 21:04:24 -07:00
John MacFarlane
3d93414e5d Add PandocBibliographyError and use it in parsing bibliographies.
This ensures that bibliography parsing errors generate messages
that include the bibliography file name -- otherwise it can be
quite mysterious where it is coming from.

[API change] New PandocBibliographyError constructor on
PandocError type.
2020-10-26 14:46:53 -07:00
Nils Carlson
dd3d920ba0
DocBook Reader: fix duplicate bibliography bug (#6773)
Also add unit test to ensure the behavior stays consistent.
2020-10-26 12:49:03 -07:00
John MacFarlane
9ab04a92f8 HTML reader: Parse contents of iframes.
See #6770.
2020-10-23 23:31:36 -07:00
John MacFarlane
4bf171e11d HTML reader: parse inline svg as image...
...unless `raw_html` is set in the reader (in which case
the svg is passed through as raw HTML).

Closes #6770.
2020-10-23 22:09:39 -07:00
John MacFarlane
efc6994c8a Commonmark writer: fix regression with fenced divs.
Starting with 2.10.1, fenced divs no longer render with
HTML div tags in commonmark output.  This is a regression
due to our transition from cmark-gfm.  This commit fixes it.

Closes #6768.
2020-10-23 09:25:07 -07:00
John MacFarlane
f9c6167ad1 citeproc - improved removal of final period...
...in citations inside notes in note-based styles.
These citations are put in parentheses, but the final
period must be removed.

See jgm/citeproc#20
2020-10-21 22:23:21 -07:00
John MacFarlane
76315d99ca More refinements to --version output.
Add ipynb version.  Put user data directory on same line as
heading "User data directory" (dropping "default").
2020-10-19 17:12:36 -07:00
John MacFarlane
1a2f8733b6 Normalize rewritten image paths with --extract-media.
This change will avoid mixed paths like this one when
`--extract-media` is used with a Word file:
`![](C:\Git\TIJ4\Markdown/media/image30.wmf)`

Instead we'll get
`![](C:\Git\TIJ4\Markdown\media\image30.wmf)`.

Closes #6761.
2020-10-19 16:32:39 -07:00
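The normalization in 1a2f8733b6 is comparable to what Python's `ntpath.normpath` does for Windows paths. The sketch below uses it as an analogy only; pandoc itself is written in Haskell, and `normalize_media_path` is a hypothetical name.

```python
import ntpath


def normalize_media_path(path: str) -> str:
    """Normalize a Windows path so that all separators are backslashes."""
    return ntpath.normpath(path)
```

Applied to the mixed path from the commit message, this yields a path that uses backslashes throughout.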
John MacFarlane
9ecea0bc62 Modify --version output.
Use space more efficiently and report the citeproc version along
with skylighting, texmath, and pandoc-types.
2020-10-19 16:32:39 -07:00
Nils Carlson
2332a08f1e
DocBook reader: bibliomisc and anchor support (#6754)
Also do some minor refactoring - bibliodiv without
a title no longer results in an empty Header.
2020-10-16 23:52:19 -07:00
John MacFarlane
eb3307da4e Fix handling of xdata in bibtex/biblatex bibliographies.
Closes #6752.
2020-10-15 17:41:45 -07:00
Michael Hoffmann
988d381aad
Fix some small typos in the API documentation (#6751)
While reading the docs I found a couple of small typos.
2020-10-15 17:09:29 -07:00
Albert Krewinkel
90af138443
Fix typos in comments, doc strings, error messages, and tests
Typos reported by
https://fossies.org/linux/test/pandoc-master.tar.gz/codespell.html

See: #6738
2020-10-14 22:26:51 +02:00
John MacFarlane
0b3b77415f Modify fix to #6742 to use stringToLaTeX. 2020-10-14 10:22:15 -07:00
John MacFarlane
e0da02623e LaTeX reader: support more acronym commands.
`\acl`, `\aclp`, and capitalized versions of already
supported commands.

Closes #6746.
2020-10-13 21:00:02 -07:00
John MacFarlane
a55fb5f29d LaTeX writer: escape option values in lstlistings environment.
Closes #6742.
2020-10-13 20:53:39 -07:00
John MacFarlane
ef6627f645 LaTeX writer: fix handling of pt-BR.
For polyglossia we now use
`\setmainlanguage[variant=brazilian]{portuguese}`
and for babel
`\usepackage[shorthands=off,main=brazilian]{babel}`.

Closes #2953.
2020-10-12 21:35:36 -07:00
John MacFarlane
12ff835a8a Commonmark reader: add pipe_table extension after defaults.
Otherwise we get bad results for non-table, non-paragraph
lines containing pipe characters.

Closes #6739.

See also jgm/commonmark-hs#52.
2020-10-12 21:24:26 -07:00
John MacFarlane
2007cff203 Markdown writer: Fix autolinks rendering for gfm.
Previously, autolinks rendered as raw HTML, due to the
`class="uri"` added by pandoc's markdown reader.

Closes #6740.
2020-10-12 18:57:04 -07:00
John MacFarlane
0b5e2601f5 LaTeX reader: allow blank lines inside \author. 2020-10-10 16:28:52 -07:00