pandoc

Author	SHA1	Message	Date
John MacFarlane	d9322629a3	LaTeX reader improvements. * Rewrote `withRaw` so it doesn't rely on fragile assumptions about token positions (which break when macros are expanded). This requires the addition of `sEnableWithRaw` and `sRawTokens` in `LaTeXState`, and a new combinator `disablingWithRaw` to disable collecting of raw tokens in certain contexts. * Add `parseFromToks` to T.P.Readers.LaTeX.Parsing. * Fix parsing of single character tokens so it doesn't mess up the new raw token collecting. * These changes slightly increase allocations and have a small performance impact, but it's minor. Closes #7092.	2021-02-12 19:04:14 -08:00
John MacFarlane	390d5e65b2	Use getTimestamp instead of getCurrentTime in writers. Setting SOURCE_DATE_EPOCH will allow reproducible builds. Partially addresses #7093. This does not suffice to fully enable reproducible builds in EPUB, since a unique id is being generated for each build.	2021-02-11 14:55:03 -08:00
John MacFarlane	3c4a58bad0	T.P.Class: Add getTimestamp [API change]. This attempts to read the SOURCE_DATE_EPOCH environment variable and parse a UTC time from it (treating it as a unix date stamp, see https://reproducible-builds.org/specs/source-date-epoch/). If the variable is not set or can't be parsed as a unix date stamp, then the function returns the current date.	2021-02-11 14:54:28 -08:00
John MacFarlane	acc9afaf6f	Correctly parse "raw" date value in markdown references metadata. See jgm/citeproc#53.	2021-02-11 09:16:25 -08:00
John MacFarlane	8ca191604d	Add new unexported module T.P.XMLParser. This exports functions that use xml-conduit's parser to produce an xml-light Element or [Content]. This allows existing pandoc code to use a better parser without much modification. The new parser is used in all places where xml-light's parser was previously used. Benchmarks show a significant performance improvement in parsing XML-based formats (especially ODT and FB2). Note that the xml-light types use String, so the conversion from xml-conduit types involves a lot of extra allocation. It would be desirable to avoid that in the future by gradually switching to using xml-conduit directly. This can be done module by module. The new parser also reports errors, which we report when possible. A new constructor PandocXMLError has been added to PandocError in T.P.Error [API change]. Closes #7091, which was the main stimulus. These changes revealed the need for some changes in the tests. The docbook-reader.docbook test lacked definitions for the entities it used; these have been added. And the docx golden tests have been updated, because the new parser does not preserve the order of attributes. Add entity defs to docbook-reader.docbook. Update golden tests for docx.	2021-02-10 22:04:11 -08:00
John MacFarlane	f70795dc5e	ODT reader: finer-grained errors on parse failure. See #7091.	2021-02-08 09:39:59 -08:00
John MacFarlane	5cd1c1001f	ODT reader: give more information if zip can't be unpacked.	2021-02-08 09:39:59 -08:00
Nils Carlson	69b7401e31	DocBook reader: Support informalfigure (#7079) Add support for informalfigure.	2021-02-08 09:36:58 -08:00
Albert Krewinkel	d202f7eb77	Avoid unnecessary use of NoImplicitPrelude pragma (#7089)	2021-02-07 10:02:35 -08:00
John MacFarlane	8e9131db4e	Markdown reader: improved handling of mmd link attributes in references. Previously they only worked for links that had titles. Closes #7080.	2021-02-06 21:52:12 -08:00
Albert Krewinkel	a5169f68b2	Lua filters: use same function names in Haskell and Lua	2021-02-04 19:07:59 +01:00
Nick Berendsen	b79aba6ea1	ePub writer: `belongs-to-collection` metadata (#7063)	2021-02-03 09:00:18 -08:00
Albert Krewinkel	61b108d527	Lua: add module "pandoc.path" The module allows working with file paths in a convenient and platform-independent manner. Closes: #6001 Closes: #6565	2021-02-02 21:04:30 -08:00
John MacFarlane	ec8509295a	Add parseOptionsFromArgs [API change, addition]. Exported by Text.Pandoc.App.	2021-02-02 17:00:03 -08:00
John MacFarlane	02d3c71e72	BibTeX writer: use doclayout and doctemplate. This change allows bibtex/biblatex output to wrap as other formats do, depending on the settings of `--wrap` and `--columns`. It also introduces default templates for bibtex and biblatex, which allow for using the variables `header-include`, `include-before` or `include-after` (or alternatively the command line options `--include-in-header`, `--include-before-body`, `--include-after-body`) to insert content into the generated bibtex/biblatex. This change requires a change in the return type of the unexported `T.P.Citeproc.writeBibTeXString` from `Text` to `Doc Text`. Closes #7068.	2021-02-01 18:05:20 -08:00
John MacFarlane	b239c89a82	BibTeX writer fixes. Closes #7067. + Require citeproc 0.3.0.7, which correctly titlecases when titles contain non-ASCII characters. + Correctly handle 'pages' (= 'page' in CSL). + Correctly handle BibLaTeX 'langid' (= 'language' in CSL). + In BibTeX output, protect foreign titles since there's no language field.	2021-02-01 11:23:07 -08:00
John MacFarlane	d1875b69ec	RST reader: fix handling of header in CSV tables. The interpretation of this line is not affected by the delim option. Closes #7064.	2021-01-31 12:05:46 -08:00
Albert Krewinkel	9c8ff53b54	CslJson writer: fix compiler warning	2021-01-31 14:37:47 +01:00
John MacFarlane	6695917258	CslJson writer: output `[]` if no references in input, instead of raising a PandocAppError as before.	2021-01-30 18:10:22 -08:00
John MacFarlane	9223788a05	Markdown writer: handle math right before digit. We insert an HTML comment to avoid a `$` right before a digit, which pandoc will not recognize as a math delimiter.	2021-01-29 18:29:17 -08:00
Albert Krewinkel	300b9b0ea3	JATS writer: escape special chars in reference elements. Prevents the generation of invalid markup if a citation element contains an ampersand or another character with a special meaning in XML.	2021-01-29 09:51:20 +01:00
John MacFarlane	98c2a52b4e	Clean up BibTeX parsing. Previously there was a messy code path that gave strange results in some cases, not passing through raw tex but trying to extract a string content. This was an artefact of trying to handle some special bibtex-specific commands in the BibTeX reader. Now we just handle these in the LaTeX reader and simplify parsing in the BibTeX reader. This does mean that more raw tex will be passed through (and currently this is not sensitive to the `raw_tex` extension; this should be fixed). Closes #7049.	2021-01-26 22:45:57 -08:00
Mauro Bieg	12bc662535	LaTeX writer: change BCP47 lang tag from jp to ja fixes #7047	2021-01-26 15:29:33 -08:00
Albert Krewinkel	490065f3ed	Lua: always load built-in Lua scripts from default data-dir The Lua modules `pandoc` and `pandoc.List` are now always loaded from the system's default data directory. Loading from a different directory by overriding the default path, e.g. via `--data-dir`, is no longer supported to avoid unexpected behavior and to address security concerns.	2021-01-26 09:43:56 -08:00
John MacFarlane	198ce0cde9	ImageSize: use viewBox for svg if no length, width. This change allows pandoc to extract size information from more SVGs. Closes #7045.	2021-01-22 20:49:41 -08:00
John MacFarlane	83d7804b8f	Merge pull request #7042 from tarleb/jats-element-citations JATS writer: use element citations	2021-01-22 10:39:58 -08:00
Albert Krewinkel	b4b3560191	JATS writer: allow use of element-citation	2021-01-22 19:35:08 +01:00
John MacFarlane	fa952c8dbe	Add biblatex, bibtex as output formats (closes #7040). * `biblatex` and `bibtex` are now supported as output as well as input formats. * New module Text.Pandoc.Writers.BibTeX, exporting writeBibTeX and writeBibLaTeX. [API change] * New unexported function `writeBibtexString` in Text.Pandoc.Citeproc.BibTeX.	2021-01-22 10:08:43 -08:00
Albert Krewinkel	87083bd1d6	Text.Pandoc.Citeproc: use finer grained imports This allows importing the module in writers without causing a circular dependency.	2021-01-21 23:22:08 +01:00
John MacFarlane	5f98ac62e3	JATS writer: Ensure that disp-quote is always wrapped in p. Closes #7041.	2021-01-19 20:39:58 -08:00
John MacFarlane	1c4d14cdcc	RST writer: fix #7039. We were losing content from inside spans with a class, due to logic that is meant to avoid nested inline structures that can't be represented in RST. The logic was a bit stricter than necessary. This commit fixes the issue.	2021-01-18 11:32:02 -08:00
John MacFarlane	c841bcf3b0	Revert "Markdown reader: support GitHub wiki's internal links (#2923) (#6458)" This reverts commit `6efd3460a7`. Since this extension is designed to be used with GitHub markdown (gfm), we need to implement the parser as a commonmark extension (commonmark-extensions), rather than in pandoc's markdown reader. When that is done, we can add it here.	2021-01-16 16:22:04 -08:00
Gautier DI FOLCO	6efd3460a7	Markdown reader: support GitHub wiki's internal links (#2923) (#6458) Changes overview: * Add a `Ext_markdown_github_wikilink` constructor to `Extension` [API change]. * Add the parser `githubWikiLink` in `Text.Pandoc.Readers.Markdown` * Add tests.	2021-01-16 16:15:33 -08:00
John MacFarlane	83336a45a7	Recognize more extensions as markdown by default. `mkdn`, `mkd`, `mdwn`, `mdown`, `Rmd`. Closes #7034.	2021-01-16 11:15:35 -08:00
John MacFarlane	387d3e76ee	Markdown writer: cleaned up raw formats. We now react appropriately to gfm, commonmark, and commonmark_x as raw formats.	2021-01-12 10:20:32 -08:00
John MacFarlane	c451207b08	Docx writer: handle table header using styles. Instead of hard-coding the border and header cell vertical alignment, we now let this be determined by the Table style, making use of Word's "conditional formatting" for the table's first row. For headerless tables, we use the tblLook element to tell Word not to apply conditional first-row formatting. Closes #7008.	2021-01-12 09:49:10 -08:00
Albert Krewinkel	68fa437999	JATS writer: fix citations (#7018) * JATS writer: keep code lines at 80 chars or below * JATS writer: fix citations	2021-01-10 15:35:48 -08:00
John MacFarlane	e741c7f553	Fix infinite HTTP requests when writing epubs from URL source. Due to a bug in code added to avoid overwriting the cover image if it had the form `fileX.YYY`, pandoc made an endless sequence of HTTP requests when writing epub with input from a URL. Closes #7013.	2021-01-10 12:49:53 -08:00
John MacFarlane	d98ec4feb8	T.P.Citeproc: factor out and export `getStyle`.	2021-01-10 11:48:53 -08:00
John MacFarlane	402d984bc5	T.P.Citeproc: factor out getLang.	2021-01-10 10:28:53 -08:00
John MacFarlane	15e33b33b4	T.P.Citeproc: refactor and export `getReferences`. See #7016.	2021-01-10 10:15:30 -08:00
Albert Krewinkel	fe1378227b	Org reader: allow multiple pipe chars in todo sequences Additional pipe chars, used to separate "action" state from "no further action" states, are ignored. E.g., for the following sequence, both `DONE` and `FINISHED` are states with no further action required. #+TODO: UNFINISHED | DONE | FINISHED Previously, parsing of the todo sequence failed if multiple pipe chars were included. Closes: #7014	2021-01-09 13:40:31 +01:00
Albert Krewinkel	4f34345867	Update copyright notices for 2021 (#7012)	2021-01-08 09:38:20 -08:00
John MacFarlane	327e1428c5	gfm/commonmark writer: implement start number on ordered lists. Previously they always started at 1, but according to the spec the start number is respected. Closes #7009.	2021-01-07 16:42:05 -08:00
John MacFarlane	c0d8b186d1	T.P.Parsing: modify gridTableWith' for headerless tables. If the table lacks a header, the header row should be an empty list. Previously we got a list of empty cells, which caused an empty header to be emitted instead of no header. In LaTeX/PDF output that meant we got a double top line with space between. @tarleb @despres - please let me know if this is problematic for some reason I'm not grasping.	2021-01-07 11:07:03 -08:00
John MacFarlane	15ba184e6e	HTML writer: fix implicit_figure at end of footnotes. Closes #7006.	2021-01-05 12:07:02 -08:00
David Martschenko	385b6a3b21	Implement defaults file inheritance (#6924) Allow defaults files to inherit options from other defaults files by specifying them with the following syntax: `defaults: [list of defaults files or single defaults file]`.	2021-01-05 10:15:59 -08:00
John MacFarlane	ea479bf28a	LaTeX reader: handle filecontents environment. Closes #7003.	2021-01-04 14:05:03 -08:00
John MacFarlane	1ce7db1fa6	EPUB writer: adjust internal links to identifiers... defined in raw HTML sections after splitting into chapters. Closes #7000.	2021-01-04 11:38:18 -08:00
John MacFarlane	f04e02d8d5	EPUB writer: recognize `Format "html4"`, `Format "html5"` as raw HTML.	2021-01-03 11:35:36 -08:00
John MacFarlane	21ee2d80c2	EPUB writer: adjust internal links to images, links, and tables... after splitting into chapters. Previously we only did this for Div and Span and Header elements. See #7000.	2021-01-03 11:27:01 -08:00
Dimitri Sabadie	57b1094152	Org reader: mark verbatim code with class "verbatim". (#6998) * Replace org-mode’s verbatim from code to codeWith. This adds the `"verbatim"` class so that exporters can apply a specific style on it. For instance, it will be possible for HTML to add a CSS rule for code + verbatim class. * Alter test for org-mode’s verbatim change. See previous commit for further detail on the new implementation.	2021-01-03 08:57:47 +01:00
John MacFarlane	260aaaacc6	LaTeX reader: put contents of unknown environments in a Div... when `raw_tex` is not enabled. (When `raw_tex` is enabled, the whole environment is parsed as a raw block.) The class name is the name of the environment. Previously, we just included the contents without the surrounding Div, but having a record of the environment's boundaries and name can be useful. Closes #6997.	2021-01-02 08:19:00 -08:00
John MacFarlane	9a18cf4b59	LaTeX writer: revert table line height increase in 2.11.3. In 2.11.3 we started adding `\addlinespace`, which produced less dense tables. This wasn't an intentional change; I misunderstood a comment in the discussion leading up to the change. This commit restores the earlier default table appearance. Note that if you want a less dense table, you can use something like `\def\arraystretch{1.5}` in your header. Closes #6996.	2021-01-02 07:56:07 -08:00
Albert Krewinkel	17e3efc785	Org reader: restructure output of captioned code blocks The Div wrapper of code blocks with captions now has the class "captioned-content". The caption itself is added as a Plain block inside a Div of class "caption". This makes it easier to write filters which match on captioned code blocks. Existing filters will need to be updated. Closes: #6977	2021-01-01 11:18:36 +01:00
John MacFarlane	23f964b907	Mediawiki reader: allow space around strong/emph delimiters. Closes #6993.	2020-12-30 21:31:28 -08:00
John MacFarlane	0782d5882c	Undo the "Use fromRight" hlint hint.	2020-12-30 16:04:09 -08:00
John MacFarlane	419190213a	Hlint fixes	2020-12-30 15:38:48 -08:00
John MacFarlane	49286a25df	Ms writer: don't justify inside table cells.	2020-12-30 13:36:18 -08:00
John MacFarlane	3cd21c5f6e	Improve fix to #6983. If we have a paragraph then a bookmarkEnd, we don't need to insert the empty paragraph (and in fact it alters the spacing). Closes #6983.	2020-12-29 08:44:43 -08:00
John MacFarlane	55f9b59af1	Docx writer: fix nested tables with captions. Previously we got unreadable content, because docx seems to want a `<w:p>` element (even an empty one) at the end of every table cell. Closes #6983.	2020-12-28 14:41:28 -08:00
Albert Krewinkel	e837ed772e	HTML reader: use renderTags' from Text.Pandoc.Shared. The `renderTags'` function was duplicated when the reader used `Text` as its string type. The duplication is no longer necessary. A side effect of this change is that empty `<col>` elements are written as self-closing tags in raw HTML blocks.	2020-12-28 14:48:55 +01:00
John MacFarlane	99e1b67b74	Use meta-description instead of description in templates. Since this is an attribute value, we need to prepare it in the writer.	2020-12-27 23:19:14 -08:00
timo-a	668596cc89	Add support for writing nested tables to asciidoc (#6972) Added field to WriterState that denotes the current nesting level for traversing tables. Depending on the value of that field nested tables are recognized and written. Asciidoc supports one level of nesting. If deeper tables are to be written, they are omitted and a warning is issued.	2020-12-27 18:42:28 -08:00
Albert Krewinkel	dcd89413f3	Powerpoint writer: allow arbitrary OOXML in raw inline elements The raw text is now included verbatim in the output. Previously it was parsed into XML elements, which prevented the inclusion of partial XML snippets.	2020-12-27 23:18:54 +01:00
John MacFarlane	47f435276a	Citeproc: fix handling of empty URL variables (`DOI`, etc.). The `linkifyVariables` function was changing these to links which then got treated as non-empty by citeproc, leading to wrong results (e.g. ignoring nonempty URL when empty DOI is present). Addresses part 2 of jgm/citeproc#41.	2020-12-24 09:56:20 -08:00
John MacFarlane	9cbbf18fe1	HTML writer: don't include p tags in CSL bibliography entries. Fixes a regression in 2.11.3. Closes #6966	2020-12-20 22:34:31 -08:00
Albert Krewinkel	8f402beab9	LaTeX writer: support colspans and rowspans in tables. (#6950) Note that the multirow package is needed for rowspans. It is included in the latex template under a variable, so that it won't be used unless needed for a table.	2020-12-20 18:04:54 -08:00
John MacFarlane	914cf0b602	Fix citeproc regression with duplicate references. - Use dev version of citeproc, which handles duplicate ids better, preferring the last one in the list and discarding the rest. - Ensure that inline citations take priority over external ones. See jgm/citeproc#36. This restores the behavior of pandoc-citeproc.	2020-12-16 15:37:40 -08:00
John MacFarlane	57241e201a	Support Lua marshalling of doctemplates BoolVal. This updates T.P.Lua.Marshaling.Context for doctemplates >= 0.9.	2020-12-16 07:56:07 -08:00
John MacFarlane	b4b4e32307	Properly handle boolean values in writing YAML metadata. (Markdown writer.) This requires doctemplates >= 0.9. Closes #6388.	2020-12-15 23:45:34 -08:00
John MacFarlane	87033b2856	Use fetchItem to get external bibliography. This means that: - a URL may be provided, and pandoc will fetch the resource. - Pandoc will search the resource path for the bibliography if it is not found relative to the working directory. Closes #6940.	2020-12-15 09:09:51 -08:00
John MacFarlane	7d799bfcda	Allow both inline and external references to be used with `--citeproc`. This fixes a regression, since pandoc-citeproc allowed these to be combined. Closes #6951.	2020-12-15 08:51:43 -08:00
John MacFarlane	39153ea6e2	ImageSize: use exif width and height when available. After the move to JuicyPixels, we were getting incorrect width and height information for some images (see #6936, test-3.jpg). The correct information was encoded in Exif tags that JuicyPixels seemed to ignore. So we check these first before looking at the Width and Height identified by JuicyPixels. Closes #6936.	2020-12-14 09:39:07 -08:00
John MacFarlane	c43e2dc0f4	RST writer: better image handling. - An image alone in its paragraph (but not a figure) is now rendered as an independent image, with an `alt` attribute if a description is supplied. - An inline image that is not alone in its paragraph will be rendered, as before, using a substitution. Such an image cannot have a "center", "left", or "right" alignment, so the classes `align-center`, `align-left`, or `align-right` are ignored. However, `align-top`, `align-middle`, `align-bottom` will generate a corresponding `align` attribute. Closes #6948.	2020-12-13 15:25:46 -08:00
John MacFarlane	32902d0fad	Merge pull request #6941 from tarleb/docx-raw Docx writer: keep raw openxml strings verbatim	2020-12-13 11:08:41 -08:00
John MacFarlane	c3aa90b57a	ImageSize: use JuicyPixels to extract size... ...for png, jpeg, gif, instead of doing our own binary parsing. See #6936.	2020-12-13 10:33:46 -08:00
John MacFarlane	ef62b70646	ImageSize: use JuicyPixels to determine png size.	2020-12-13 10:33:46 -08:00
Albert Krewinkel	00031fc809	Docx writer: keep raw openxml strings verbatim. Closes: #6933	2020-12-13 14:09:59 +01:00
Albert Krewinkel	8cf58d96e0	Docx writer: use Content instead of Element.	2020-12-13 14:09:53 +01:00
John MacFarlane	3a7d97f02f	Merge pull request #6946 from mb21/icml-image-fit ICML writer: fix image bounding box for custom widths/heights	2020-12-12 08:28:14 -08:00
Albert Krewinkel	ccd235e31f	LaTeX writer: extract table handling into separate module.	2020-12-12 16:48:28 +01:00
mb21	208cb96196	ICML writer: fix image bounding box for custom widths/heights fixes #6936	2020-12-12 14:49:11 +01:00
John MacFarlane	fcd0658189	HTML reader: pay attention to lang attributes on body. These (as well as lang attributes on html) should update lang in metadata. See #6938.	2020-12-10 15:51:20 -08:00
John MacFarlane	0a502e5ff5	HTML reader: retain attribute prefixes and avoid duplicates. Previously we stripped attribute prefixes, reading `xml:lang` as `lang` for example. This resulted in two duplicate `lang` attributes when `xml:lang` and `lang` were both used. This commit causes the prefixes to be retained, and also avoids invalid duplicate attributes. Closes #6938.	2020-12-10 15:44:10 -08:00
John MacFarlane	a3eb87b2ea	Add sourcepos extension for commonmark * Add `Ext_sourcepos` constructor for `Extension`. * Add `sourcepos` extension (only for commonmark). * Bump to 2.11.3 With the `sourcepos` extension set, `data-pos` attributes are added to the AST by the commonmark reader. No other readers are affected. The `data-pos` attributes are put on elements that accept attributes; for other elements, an enclosing Div or Span is added to hold the attributes. Closes #4565.	2020-12-10 08:59:55 -08:00
John MacFarlane	8c9010864c	Commonmark reader: refactor specFor, set input name to "".	2020-12-10 08:59:55 -08:00
John MacFarlane	5990cbb150	Parsing: Small code improvements.	2020-12-07 21:34:23 -08:00
John MacFarlane	0fa1023b9e	Parsing: More minor performance improvements.	2020-12-07 18:57:09 -08:00
John MacFarlane	ce1791913d	Small efficiency improvement in uri parser	2020-12-07 13:24:19 -08:00
John MacFarlane	2f9b684b3a	Bibtex parser: avoid noneOf.	2020-12-07 13:01:30 -08:00
John MacFarlane	f2749ba6cd	Parsing: in nonspaceChar use satisfy instead of oneOf. For efficiency.	2020-12-07 12:56:03 -08:00
John MacFarlane	501ea7f0c4	Dokuwiki reader: handle unknown interwiki links better. DokuWiki lets the user define his own Interwiki links. Previously pandoc reacted to these by emitting a google search link, which is not helpful. Instead, we now just emit the full URL including the wikilink prefix, e.g. `faquk>FAQ-mathml`. This at least gives users the ability to modify the links using filters. Closes #6932.	2020-12-07 12:15:14 -08:00
John MacFarlane	810df00cf5	Merge pull request #6922 from jtojnar/db-writer-admonitions Docbook writer: handle admonitions	2020-12-07 08:48:02 -08:00
Jan Tojnar	70c7c5703a	Docbook writer: Handle admonition titles from Markdown reader Docbook reader produces a `Div` with `title` class for `<title>` element within an “admonition” element. Markdown writer then turns this into a fenced div with `title` class attribute. Since fenced divs are block elements, their content is recognized as a paragraph by the Markdown reader. This is an issue for Docbook writer because it would produce an invalid DocBook document from such AST – the `<title>` element can only contain “inline” elements. Let’s handle this invalid special case separately by unwrapping the paragraph before creating the `<title>` element.	2020-12-07 07:28:39 +01:00
Jan Tojnar	16ef877457	Docbook writer: Use correct id attribute consistently DocBook5 should always use xml:id instead of id so let’s use it everywhere.	2020-12-07 06:23:25 +01:00
Jan Tojnar	dc6856530c	Docbook writer: handle admonitions Similarly to `d6fdfe6f2b`, we should handle admonitions.	2020-12-07 06:23:25 +01:00
Albert Krewinkel	acf932825b	Org reader: preserve targets of spurious links Links with (internal) targets that the reader doesn't know about are converted into emphasized text. Information on the link target is now preserved by wrapping the text in a Span of class `spurious-link`, with an attribute `target` set to the link's original target. This allows recovering and fixing broken or unknown links with filters. See: #6916	2020-12-05 22:37:48 +01:00
Nils Carlson	c161893f44	OpenDocument writer: Allow references for internal links (#6774) This commit adds two extensions to the OpenDocument writer, `xrefs_name` and `xrefs_number`. Links to headings, figures and tables inside the document are substituted with cross-references that will use the name or caption of the referenced item for `xrefs_name` or the number for `xrefs_number`. For the `xrefs_number` to be useful heading numbers must be enabled in the generated document and table and figure captions must be enabled using for example the `native_numbering` extension. In order for numbers and reference text to be updated the generated document must be refreshed. Co-authored-by: Nils Carlson <nils.carlson@ludd.ltu.se>	2020-12-05 10:00:04 -08:00
John MacFarlane	ddb76cb356	LaTeX reader: don't apply theorem default styling to a figure inside. If we put an image in italics, then when rendering to Markdown we no longer get an implicit figure. Closes #6925.	2020-12-05 09:53:39 -08:00
Jan Tojnar	6f35600204	Docbook writer: add XML namespaces to top-level elements (#6923) Previously, we only added xmlns attributes to chapter elements, even when running with --top-level-division=section. Let’s add the namespaces to part and section elements too, when they are the selected top-level divisions. We do not need to add namespaces to documents produced with --standalone flag, since those will already have xmlns attribute on the root element in the template.	2020-12-04 21:00:21 -08:00
John MacFarlane	dc3ef5201f	Markdown writer: ensure that a new csl-block begins on a new line. This just looks better and doesn't affect the semantics. See #6921.	2020-12-04 10:55:48 -08:00
John MacFarlane	68bcddeb21	LaTeX writer: Fix bug with nested csl- display Spans. See #6921.	2020-12-04 10:14:19 -08:00
John MacFarlane	171d3db384	HTML writer: Fix handling of nested csl- display spans. Previously inner Spans used to represent CSL display attributes were not rendered as div tags. See #6921.	2020-12-04 09:47:56 -08:00
John MacFarlane	7199d68ba0	EPUB writer: include title page in landmarks. Closes #6919. Note that the toc is also included if `--toc` is specified.	2020-12-03 21:39:44 -08:00
John MacFarlane	9c6cc79c11	EPUB writer: add frontmatter type on body element for nav.xhtml. Closes #6918.	2020-12-03 21:24:27 -08:00
John MacFarlane	5bbd5a9e80	Docx writer: Support bold and italic in "complex script." Previously bold and italics didn't work properly in LTR text. This commit causes the w:bCs and w:iCs attributes to be used, in addition to w:b and w:i, for bold and italics respectively. Closes #6911.	2020-12-03 09:51:23 -08:00
John MacFarlane	7b11cdee49	Citeproc: ensure that BCP47 lang codes can be used. We ignore the variants and just use the base lang code and country code when passing off to citeproc.	2020-12-02 10:46:23 -08:00
John MacFarlane	bff9c129c3	LaTeX reader: don't parse `\rule` with width 0 as horizontal rule.	2020-11-29 10:35:20 -08:00
Tassos Manganaris	83d63b72e1	Fix a tiny Typo in the CSV reader module Header comment in the CSV reader module says "RST" instead of "CSV".	2020-11-28 09:40:15 +01:00
Albert Krewinkel	8c38390038	HTML reader tests: improve test coverage of new features	2020-11-27 21:21:25 +01:00
Albert Krewinkel	a9c766291f	HTML reader: support body headers, row head columns Closes: #6312	2020-11-27 10:36:13 +01:00
John MacFarlane	db2db54f80	Added some explicit imports.	2020-11-26 12:44:01 -08:00
cholonam	5f4deb5455	Docx writer: Fix bullets/lists indentation Fix appearance of bullets/numbered lists (the first level is slightly indented to the right instead of right on the margin). New golden files have been tested using Word 2010 on Windows 10.	2020-11-26 12:11:26 -08:00
Igor Pashev	630b1bff2b	LaTeX reader: preserve center environment (#6852) The contents of the `center` environment are put in a `Div` with class `center`.	2020-11-26 12:04:31 -08:00
Albert Krewinkel	07919e1b22	HTML reader: improve support for table headers, footer, attributes - `<tfoot>` elements are no longer added to the table body but used as table footer. - Separate `<tbody>` elements are no longer combined into one. - Attributes on `<thead>`, `<tbody>`, `<th>`/`<td>`, and `<tfoot>` elements are preserved.	2020-11-26 07:22:01 +01:00
Albert Krewinkel	3e01ae405f	HTML reader: allow finer grained options for tag omission	2020-11-26 07:22:01 +01:00
John MacFarlane	7c4d7db9c7	LaTeX writer: improve longtable output. - Don't create minipages for regular paragraphs. - Put width and alignment information in the longtable column descriptors. - Closes #6883.	2020-11-25 15:42:44 -08:00
John MacFarlane	b50ac3a95b	LaTeX tables: Fix calculation of column spacing. See #6883.	2020-11-25 14:41:28 -08:00
John MacFarlane	815976d537	Fix truncation of `[Citation]` list in `Cite` inside footnotes... This affected author-in-text citations in footnotes. It didn't cause problems for the printed output, but for filters that expected the citation id and other information. Closes #6890.	2020-11-25 09:10:10 -08:00
Albert Krewinkel	c6f2663a23	HTML reader: simplify list attribute handling This removes the `foldOrElse` function from the internal Text.Pandoc.CSS module.	2020-11-25 17:55:42 +01:00
Albert Krewinkel	c9f98e2bf5	HTML reader: support row or column-spanning table cells	2020-11-24 14:17:35 +01:00
Albert Krewinkel	446ef27a3f	HTML reader: support blocks in caption	2020-11-24 14:17:35 +01:00
Albert Krewinkel	41237fcc0e	HTML reader: extract table parsing into separate module	2020-11-24 14:17:35 +01:00
John MacFarlane	2f110265ff	ImageSize: default to DPI 72 if the format specifies DPI of 0. This shouldn't happen, in general, but it can happen with JPEGs that don't conform to the spec. Having a DPI of 0 will blow up size calculations (division by 0). Closes #6880.	2020-11-23 09:39:48 -08:00
Albert Krewinkel	f9258371dd	HTML reader: extract submodules Reducing module size should reduce memory use during compilation. This is preparatory work to tackle support for more table features.	2020-11-23 10:12:20 +01:00
Nils Carlson	75c881e2d9	OpenDocument Writer: Implement Div and Span ident support (#6755) Spans and Divs containing an ident in the Attr will become bookmarks or sections with idents in OpenDocument format.	2020-11-22 22:23:30 -08:00
John MacFarlane	b5b5ef92cb	LaTeX writer: Improve table spacing. + Remove the `\strut` that was added at the end of minipage environments in cells. + Replace `\tabularnewline` with `\\ \addlinespace`. Closes #6842, closes #6860.	2020-11-22 10:54:42 -08:00
Albert Krewinkel	5344dab8eb	Org reader: parse `#+LANGUAGE` into `lang` metadata field Fixes: #6845	2020-11-22 12:53:05 +01:00
Nils Carlson	ae52918faa	OpenDocument writer: Table text width support (#6792) Support for table width as a percentage of text width by summing width of columns and verifying that the sum is > 0 and <= 1.	2020-11-21 12:42:43 -08:00
John MacFarlane	7db2cf5d2f	LaTeX reader: more robust parsing of bracketed options. Improves on `9a40976`. Closes #6873.	2020-11-21 12:24:37 -08:00
John MacFarlane	fec8223d3a	Citeproc BibTeX parser: revert change in getRawField... which was made (for reasons forgotten) when transferring this code from pandoc-citeproc. The change led to `--` in URLs being interpreted as en-dashes, which is unwanted. Closes #6874.	2020-11-21 12:07:28 -08:00
Nils Carlson	56ceaf49dc	DocBook reader: Table text width support (#6791). Table width in relation to text width is not natively supported by docbook but is by the docbook fo stylesheets through an XML processing instruction, `<?dbfo table-width="50%"?>`. Implement support for this instruction in the DocBook reader.	2020-11-20 16:05:56 -08:00
John MacFarlane	9a4097640f	Improve LaTeX option parsing... in cases where we run into trouble parsing inlines til the closing `]`, e.g. quotes, we return a plain string with the option contents. Previously we mistakenly included the brackets in this string. Closes #6869.	2020-11-20 13:40:26 -08:00
John MacFarlane	c647948ff1	`commonmark_x`: replace `auto_identifiers` with `gfm_auto_identifiers`. `commonmark_x` never actually supported `auto_identifiers` (it didn't do anything), because the underlying library implements gfm-style identifiers only. Attempts to add the `auto_identifiers` extension to `commonmark` will now fail with an error. Closes #6863.	2020-11-20 09:17:14 -08:00
Albert Krewinkel	d286242131	JATS writer: support advanced table features	2020-11-19 22:09:52 +01:00
John MacFarlane	c1fbe7b91a	--self-contained: increase coverage. Previously we only self-contained attributes for certain tag names (`img`, `embed`, `video`, `input`, `audio`, `source`, `track`, `section`). Now we self-contain any occurrence of `src`, `data-src`, `poster`, or `data-background-image`, on any tag; and also `href` on `link` tags. Closes #6854 (which specifically asked about `asciinema-player` tags).	2020-11-19 10:08:43 -08:00
John MacFarlane	e16df8d271	DocBook reader: drop period in formalpara title... ...and put it in a div with class `formalpara-title`, so that people can reformat with filters. Closes #6562. Thanks to rdmuller.	2020-11-19 09:33:29 -08:00
John MacFarlane	0962b30d84	Man reader: improve handling of .IP. We now better handle `.IP` when it is used with non-bullet, non-numbered lists, creating a definition list. We also skip blank lines like groff itself. Closes #6858.	2020-11-18 22:44:32 -08:00
Albert Krewinkel	023468ea2d	JATS writer: wrap all tables. All `<table>` elements are put inside `<table-wrap>` elements, as the former are not valid as immediate child elements of `<body>`.	2020-11-18 18:10:17 +01:00
TEC	0306eec5fa	Replace org #+KEYWORDS with #+keywords. As of ~2 years ago, lower case keywords became the standard (though they are handled case-insensitively, as always): `13424336a6`. Upper case keywords are exclusive to the manual: https://orgmode.org/list/871s50zn6p.fsf@nicolasgoaziou.fr/ and https://orgmode.org/list/87tuuw3n15.fsf@nicolasgoaziou.fr/	2020-11-18 14:48:56 +01:00
TEC	224a501b29	Update org supported languages and identifiers according to the current list contained in https://orgmode.org/worg/org-contrib/babel/languages/index.html	2020-11-18 14:48:56 +01:00
John MacFarlane	efa34a8de6	Bibtex reader: fall back on en-US if locale for LANG not found. This reproduces earlier pandoc-citeproc behavior. Closes jgm/citeproc#26.	2020-11-17 23:12:32 -08:00
John MacFarlane	bf3fea0a8c	Markdown reader: fix regression with example list references. This affects example list references followed by dashes. Introduced by commit `b8d17f7`. Closes #6855.	2020-11-17 20:36:59 -08:00
Albert Krewinkel	94c9028819	JATS writer: move Table handling to separate module. This makes it easier to split the module into smaller parts.	2020-11-17 09:46:30 +01:00
John MacFarlane	c9ada73cac	Move getNextNumber from Readers.LaTeX to Readers.LaTeX.Parsing.	2020-11-16 22:36:10 -08:00
John MacFarlane	ee34c4fef8	Only use filterIpynbOutput if input format is ipynb. Closes #6841.	2020-11-16 18:21:30 -08:00
John MacFarlane	98bedd7631	When checking reader/writer name, check base name... now that we permit extensions on formats other than markdown.	2020-11-16 17:49:23 -08:00
John MacFarlane	5271c6b3fb	Improve fix to siunitx numbers with minus. - use real minus sign - use tests contributed by Igor Pashev.	2020-11-16 16:36:16 -08:00
John MacFarlane	734b4c26a9	LaTeX reader: Fix negative numbers in siunitx commands. The commit `a157e1a` broke negative numbers, e.g. `\SI{-33}{\celsius}` or `\num{-3}`. This fixes the regression.	2020-11-16 14:08:29 -08:00
John MacFarlane	d7f905fb63	Markdown reader: fix detection of locators following in-text citations. Previously, if we had `@foo [p. 33; @bar]`, the `p. 33` would be incorrectly parsed as a prefix of `@bar` rather than a suffix of `@foo`.	2020-11-15 17:51:03 -08:00
John MacFarlane	f8225140a5	Text.Pandoc.PDF: Fix `changePathSeparators` for Windows. Previously a path beginning with a drive, like `C:\foo\bar`, was translated to `C:\/foo/bar`, which caused problems. With this fix, the backslashes are removed. Closes #6173.	2020-11-15 10:43:43 -08:00
Albert Krewinkel	26f946af20	Remove redundant bracket in App.Opt	2020-11-15 12:08:15 +01:00
John MacFarlane	b5d066f167	Revise deprecation warning for --atx-headers.	2020-11-14 21:41:50 -08:00
Aner Lucero	f63b76e169	Markdown writer: default to using ATX headings. Previously we used Setext (underlined) headings by default. The default is now ATX (`##` style). * Add the `--markdown-headings=atx|setext` option. * Deprecate `--atx-headers`. * Add `ATXHeadingInLHS` constructor to `LogMessage` [API change]. * Support `markdown-headings` in defaults files. * Document new options in MANUAL. Closes #6662.	2020-11-14 21:33:32 -08:00
John MacFarlane	b8d17f7ae8	Markdown reader: don't increment stateNoteNumber for example refs. Background: syntactically, references to example list items can't be distinguished from citations; we only know which they are after we've parsed the whole document (and this is resolved in the `runF` stage). This means that pandoc's calculation of `citationNoteNum` can sometimes be wrong when there are example list references. This commit partially addresses #6836, but only for the case where the example list references refer to list items defined previously in the document.	2020-11-14 15:00:17 -08:00
John MacFarlane	68b298ed9a	Improve period suppression algorithm for citations in notes... in note citation styles. See #6835.	2020-11-13 10:52:21 -08:00
gison93	fec695c77a	Fix error when extension output is doc (#6834)	2020-11-13 09:07:31 -08:00
John MacFarlane	7d298d13d9	Remove redundant bracket.	2020-11-10 10:34:46 -08:00
John MacFarlane	7d01887dda	Fix corner case in YAML metadata parsing. Previously YAML metadata would sometimes not get recognized if a field ended with a newline followed by spaces. Closes #6823.	2020-11-10 09:47:24 -08:00
John MacFarlane	08ce3addde	Hlint suggestions.	2020-11-07 10:53:07 -08:00
Albert Krewinkel	527346cc7e	Lint code in PRs and when committing to master (#6790) * Remove unused LANGUAGE pragmata * Apply HLint suggestions * Configure HLint to ignore some warnings * Lint code when committing to master	2020-11-07 10:38:03 -08:00
Albert Krewinkel	0ed3436588	doc/filters.md: describe technical details of filter invocations (#6815)	2020-11-06 15:37:24 -08:00
John MacFarlane	535bd607de	Support nocase spans for csljson output	2020-11-06 09:16:24 -08:00
John MacFarlane	06d3071090	LaTeX reader: better handling of `\\` inside math in table cells. Previously this confused the table parser. Closes #6811.	2020-11-05 16:13:35 -08:00
John MacFarlane	090b0877bc	Citeproc: improve punctuation in in-text note citations. Previously in-text note citations inside a footnote would sometimes have the final period stripped, even if it was needed (e.g. on the end of 'ibid'). See #6813.	2020-11-05 11:15:23 -08:00
John MacFarlane	efe74746d8	DokuWiki writer: translate language names for code elements... ...and improve whitespace. Closes #6807.	2020-11-04 22:38:53 -08:00
John MacFarlane	08134388ad	MediaWiki writer: use syntaxhighlight tag... instead of deprecated source, for highlighted code. Also support `startFrom` attribute and `numberLines`. Closes #6810.	2020-11-04 21:20:41 -08:00
John MacFarlane	0bd6fb4745	Simplified idpred in citeproc.	2020-11-04 11:10:49 -08:00
John MacFarlane	8f75a53542	Properly support optional cite argument for `\blockquote`. (LaTeX reader) Closes #6802.	2020-11-03 10:25:56 -08:00
John MacFarlane	6cbe5efd56	LaTeX reader: fix bug parsing macro arguments. If `\cL` is defined as `\mathcal{L}`, and `\til` as `\tilde{#1}`, then `\til\cL` should expand to `\tilde{\mathcal{L}}`, but pandoc was expanding it to `\tilde\mathcal{L}`. This is fixed by parsing the arguments in "verbatim mode" when the macro expands arguments at the point of use. Closes #6796.	2020-11-02 15:04:16 -08:00
Albert Krewinkel	1175b0a008	T.P.Filter: allow shorter YAML representation of Citeproc. The map-based YAML representation of filters expects `type` and `path` fields. The path field had to be present for all filter types, but is not used for citeproc filters. The field can now be omitted when type is "citeproc", as described in the MANUAL.	2020-11-02 15:14:19 +01:00
John MacFarlane	6051c751ce	Citeproc: use comma for in-text citations inside footnotes. When an author-in-text citation like `@foo` occurs in a footnote, we now render it with: `AUTHOR NAME + COMMA + SPACE + REST`. Previously we rendered: `AUTHOR NAME + SPACE + "(" + REST + ")"`. This gives better results. Note that normal citations are still rendered in parentheses.	2020-11-01 10:48:47 -08:00
John MacFarlane	01f2d81168	Improve deNote.	2020-11-01 10:48:47 -08:00
Andy Morris	f1f2728259	Fix duplicate "class" attribute in HTML writer	2020-10-30 16:38:59 +01:00
John MacFarlane	3e6d009c6b	Use new citeproc; do note capitalization here, not in citeproc.	2020-10-29 21:53:02 -07:00
John MacFarlane	bc3f16b0c1	Allow citation-abbreviations in defaults file.	2020-10-29 15:54:50 -07:00
John MacFarlane	bd7c9eb32b	LaTeX writer: Improved calculation of table column widths. We now have LaTeX do the calculation, using `\tabcolsep`. So we should now have accurate relative column widths no matter what the text width. The default template has been modified to load the calc package if tables are used.	2020-10-29 12:10:05 -07:00
John MacFarlane	95c9f3da63	Remove obsolete comment	2020-10-27 21:05:59 -07:00
John MacFarlane	3190ce95c2	Citeproc: properly handle `csl` field with `data:` URI. This is used with the JATS writer, so this fixes a regression in pandoc 2.11 with JATS output and citeproc. Closes #6783.	2020-10-27 21:04:24 -07:00
John MacFarlane	3d93414e5d	Add PandocBibliographyError and use it in parsing bibliographies. This ensures that bibliography parsing errors generate messages that include the bibliography file name -- otherwise it can be quite mysterious where it is coming from. [API change] New PandocBibliographyError constructor on PandocError type.	2020-10-26 14:46:53 -07:00
Nils Carlson	dd3d920ba0	DocBook Reader: fix duplicate bibliography bug (#6773). Also add unit test to ensure the behavior stays consistent.	2020-10-26 12:49:03 -07:00
John MacFarlane	9ab04a92f8	HTML reader: Parse contents of iframes. See #6770.	2020-10-23 23:31:36 -07:00
John MacFarlane	4bf171e11d	HTML reader: parse inline svg as image... ...unless `raw_html` is set in the reader (in which case the svg is passed through as raw HTML). Closes #6770.	2020-10-23 22:09:39 -07:00
John MacFarlane	efc6994c8a	Commonmark writer: fix regression with fenced divs. Starting with 2.10.1, fenced divs no longer render with HTML div tags in commonmark output. This is a regression due to our transition from cmark-gfm. This commit fixes it. Closes #6768.	2020-10-23 09:25:07 -07:00
John MacFarlane	f9c6167ad1	citeproc - improved removal of final period... ...in citations inside notes in note-based styles. These citations are put in parentheses, but the final period must be removed. See jgm/citeproc#20	2020-10-21 22:23:21 -07:00
John MacFarlane	76315d99ca	More refinements to --version output. Add ipynb version. Put user data directory on same line as heading "User data directory" (dropping "default").	2020-10-19 17:12:36 -07:00
John MacFarlane	1a2f8733b6	Normalize rewritten image paths with --extract-media. This change will avoid mixed paths like this one when `--extract-media` is used with a Word file: `![](C:\Git\TIJ4\Markdown/media/image30.wmf)` Instead we'll get `![](C:\Git\TIJ4\Markdown\media\image30.wmf)`. Closes #6761.	2020-10-19 16:32:39 -07:00
John MacFarlane	9ecea0bc62	Modify --version output. Use space more efficiently and report the citeproc version along with skylighting, texmath, and pandoc-types.	2020-10-19 16:32:39 -07:00
Nils Carlson	2332a08f1e	DocBook reader: bibliomisc and anchor support (#6754). Also do some minor refactoring - bibliodiv without a title no longer results in an empty Header.	2020-10-16 23:52:19 -07:00
John MacFarlane	eb3307da4e	Fix handling of xdata in bibtex/biblatex bibliographies. Closes #6752.	2020-10-15 17:41:45 -07:00
Michael Hoffmann	988d381aad	Fix some small typos in the API documentation (#6751). While reading the docs I found a couple of small typos.	2020-10-15 17:09:29 -07:00
Albert Krewinkel	90af138443	Fix typos in comments, doc strings, error messages, and tests. Typos reported by https://fossies.org/linux/test/pandoc-master.tar.gz/codespell.html See: #6738	2020-10-14 22:26:51 +02:00
John MacFarlane	0b3b77415f	Modify fix to #6742 to use stringToLaTeX.	2020-10-14 10:22:15 -07:00
John MacFarlane	e0da02623e	LaTeX reader: support more acronym commands. `\acl`, `\aclp`, and capitalized versions of already supported commands. Closes #6746.	2020-10-13 21:00:02 -07:00
John MacFarlane	a55fb5f29d	LaTeX writer: escape option values in lstlistings environment. Closes #6742.	2020-10-13 20:53:39 -07:00
John MacFarlane	ef6627f645	LaTeX writer: fix handling of pt-BR. For polyglossia we now use `\setmainlanguage[variant=brazilian]{portuguese}` and for babel `\usepackage[shorthands=off,main=brazilian]{babel}`. Closes #2953.	2020-10-12 21:35:36 -07:00
John MacFarlane	12ff835a8a	Commonmark reader: add pipe_table extension after defaults. Otherwise we get bad results for non-table, non-paragraph lines containing pipe characters. Closes #6739. See also jgm/commonmark-hs#52.	2020-10-12 21:24:26 -07:00
John MacFarlane	2007cff203	Markdown writer: Fix autolinks rendering for gfm. Previously, autolinks rendered as raw HTML, due to the `class="uri"` added by pandoc's markdown reader. Closes #6740.	2020-10-12 18:57:04 -07:00
John MacFarlane	0b5e2601f5	LaTeX reader: allow blank lines inside `\author`.	2020-10-10 16:28:52 -07:00
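
The ImageSize entry above (`2f110265ff`) describes falling back to a DPI of 72 when an image reports a DPI of 0, so that size calculations do not divide by zero. A minimal Python sketch of that guard follows. The names `DEFAULT_DPI`, `effective_dpi`, and `size_in_inches` are hypothetical and chosen for illustration only; pandoc's own implementation is written in Haskell and is not reproduced here.

```python
# Illustrative sketch only: hypothetical names, not pandoc's actual code.

DEFAULT_DPI = 72  # fallback value named in the commit message


def effective_dpi(dpi):
    """Return the DPI to use, substituting the default when the file reports 0."""
    return DEFAULT_DPI if dpi == 0 else dpi


def size_in_inches(pixels, dpi):
    """Convert a pixel dimension to inches without dividing by zero."""
    return pixels / effective_dpi(dpi)
```

With this guard, a non-conforming JPEG that reports a DPI of 0 is sized as if it were 72 DPI, while images with a valid DPI are unaffected.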

7312 commits