pandoc

Author	SHA1	Message	Date
John MacFarlane	073895c340	Fix some lint issues.	2021-08-11 17:53:39 -07:00
John MacFarlane	dd1a956a8a	LaTeX reader: Support `\global` before `\def`, `\let`, etc. See #7494.	2021-08-11 16:28:53 -07:00
John MacFarlane	e3a263df46	Fix scope for LaTeX macros. They should by default scope over the group in which they are defined (except `\gdef` and `\xdef`, which are global). In addition, environments must be treated as groups. We handle this by making sMacros in the LaTeX parser state a STACK of macro tables. Opening a group adds a table to the stack, closing one removes one. Only the top of the stack is queried. This commit adds a parameter for scope to the Macro constructor (not exported). Closes #7494.	2021-08-11 16:14:34 -07:00
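
The stack-of-macro-tables idea in the commit above can be sketched as follows. This is an illustrative Python model, not pandoc's Haskell implementation: the class `MacroScopes` and its method names are hypothetical. It also assumes that opening a group pushes a copy of the current top table (so that querying only the top still sees outer definitions) and that global definitions are written into every table on the stack.

```python
class MacroScopes:
    """Hypothetical model of a stack of macro tables (not pandoc's API)."""

    def __init__(self):
        # The bottom table plays the role of the outermost (document) scope.
        self.tables = [{}]

    def open_group(self):
        # Assumption: the new table starts as a copy of the current top,
        # so lookups only ever need to consult the top of the stack.
        self.tables.append(dict(self.tables[-1]))

    def close_group(self):
        # Leaving a group discards every definition local to it.
        if len(self.tables) > 1:
            self.tables.pop()

    def define(self, name, body, is_global=False):
        # A global definition (like \gdef) is written to every table;
        # a local one (like \def) only to the top table.
        targets = self.tables if is_global else self.tables[-1:]
        for table in targets:
            table[name] = body

    def lookup(self, name):
        # Only the top of the stack is queried.
        return self.tables[-1].get(name)


scopes = MacroScopes()
scopes.open_group()
scopes.define("\\foo", "local")                     # like \def\foo{local}
scopes.define("\\bar", "global", is_global=True)   # like \gdef\bar{global}
scopes.close_group()
```

After the group closes, `scopes.lookup("\\foo")` returns `None` while `scopes.lookup("\\bar")` still returns `"global"`, which mirrors the difference between `\def` and `\gdef` described in the commit message.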
John MacFarlane	a0e44b1ff6	LaTeX reader: improve handling of plain TeX macro primitives. - Fixed semantics for `\let`. - Implement `\edef`, `\gdef`, and `\xdef`. - Add comment noting that currently `\def` and `\edef` set global macros (so are equivalent to `\gdef` and `\xdef`). This should be fixed by scoping macro definitions to groups, in a future commit. Closes #7474.	2021-08-11 10:32:52 -07:00
John MacFarlane	3a924d8f96	HTML reader: treat comments as blank when parsing. This modifies pBlank. Previously comments could sometimes flummox the parser. Closes #7482.	2021-08-10 12:50:23 -07:00
John MacFarlane	3d7120083a	Fix RTF table parsing bug that created undesired nested tables. Closes #7488.	2021-08-10 11:09:12 -07:00
John MacFarlane	6543b05116	Add RTF reader. - `rtf` is now supported as an input format as well as output. - New module Text.Pandoc.Readers.RTF (exporting `readRTF`). [API change] Closes #3982.	2021-08-10 10:48:55 -07:00
John MacFarlane	c0b68b2030	Allow `--slide-level=0`. When the slide level is set to 0, headings won't be used at all in splitting the document into slides. Horizontal rules must be used to separate slides. Closes #7476.	2021-08-08 11:20:26 -07:00
John MacFarlane	dea1f0f080	RTF writer: emit \outlinelevel for section headings.	2021-08-04 16:37:20 -06:00
mt_caret	407de98b5e	Stop using the HTTP package (#7456). We only depend on the urlEncode function in the package, which is also provided by http-types. The HTTP package also depends on the network package, which has difficulty building on ghcjs. Add internal module Text.Pandoc.Network.HTTP, exporting `urlEncode`.	2021-08-03 15:53:05 -06:00
Peter Fabinski	8667ba2bcc	LaTeX table writer: Increase column width precision (#7466). In some cases, the rounding performed by the LaTeX table writer would introduce visible overrun outside the text area. This adds two more decimal places to the width values.	2021-08-03 15:34:39 -06:00
John MacFarlane	f938378d00	RTF writer: omit `\bin` in `\pict`. According to the spec, this is not needed or wanted when the data is in hexadecimal format, as it is here.	2021-08-01 22:45:41 -06:00
John MacFarlane	f145aea0f9	parseFromString: preserve at least the source directory. Previously we just set the source name to "chunk" when parsing from strings, to avoid misleading source positions. This had the side effect that `rebase_relative_paths` would break inside sections that were parsed as strings. So, now we use "ORIGINAL_SOURCE_PATH_chunk" instead of just "chunk". Closes #7464.	2021-07-29 14:54:25 -06:00
John MacFarlane	1f1a30bbf6	LaTeX writer: Use ulem for underline. ulem is conditionally included already when the `strikeout` variable is set, so we set this when there is underlined text, and use `\uline` instead of `\underline`. This fixes wrapping for underlined text. Closes #7351.	2021-07-22 23:05:43 -07:00
John MacFarlane	832196fb17	MIME: use image/x-xcf instead of application/x-xcf. Closes #7454.	2021-07-22 13:08:30 -07:00
John MacFarlane	31a5bccd57	LaTeX reader: avoid trailing hyphen in translating languages. Previously `\foreignlanguage{english}` turned into `<span lang="en-">`. The same issue affected Arabic. Closes #7447.	2021-07-17 23:07:53 -07:00
John MacFarlane	46099e79de	DocBook reader: handle images with imageobjectco elements. Closes #7440.	2021-07-16 13:10:45 -07:00
John MacFarlane	493522c562	LaTeX reader: Support `\cline` in LaTeX tables. Closes #7442.	2021-07-16 12:04:43 -07:00
John MacFarlane	18270c7a39	PDF: Fix svgIn path error. We were duplicating the temp directory; this didn't show up on macOS or linux because there we use absolute paths for the temp directory. Closes #7431.	2021-07-16 11:39:02 -07:00
Jan Tojnar	06408d08e5	DocBook reader: add support for citerefentry (#7437). Originally intended for referring to UNIX manual pages, either part of the same DocBook document as refentry element, or external – hence the manvolnum element. These days, refentry is more general, for example the element documentation pages linked below are each a refentry. As per the Processing expectations section of citerefentry, the element is supposed to be a hyperlink to a refentry (when in the same document) but pandoc does not support the refentry tag at the moment so that is moot. https://tdg.docbook.org/tdg/5.1/citerefentry.html https://tdg.docbook.org/tdg/5.1/manvolnum.html https://tdg.docbook.org/tdg/5.1/refentry.html This roughly corresponds to a `manpage` role in rST syntax, which produces a `Code` AST node with attributes `.interpreted-text role=manpage` but that does not fit the DocBook parser. https://www.sphinx-doc.org/en/master/usage/restructuredtext/roles.html#role-manpage	2021-07-11 15:28:52 -07:00
John MacFarlane	ac0a9da6d8	Improved parsing of raw LaTeX from Text streams (rawLaTeXParser). We now use source positions from the token stream to tell us how much of the text stream to consume. Getting this to work required a few other changes to make token source positions accurate. Closes #7434.	2021-07-11 13:50:28 -07:00
John MacFarlane	477a67061f	Always use / when adding directory to image path with extractMedia. Even on Windows. May help with #7431.	2021-07-09 14:14:19 -07:00
John MacFarlane	ae22b1e977	RST reader: fix regression with code includes. With the recent changes to include infrastructure, included code blocks were getting an extra newline. Closes #7436. Added regression test.	2021-07-09 12:27:41 -07:00
Michael Hoffmann	565330033a	Don't incorporate externally linked images in EPUB documents (#7430). Just like it is possible to avoid incorporating an image in EPUB by passing `data-external="1"` to a raw HTML snippet, this makes the same possible for native Images, by looking for an associated `external` attribute.	2021-07-07 09:26:37 -07:00
Michael Hoffmann	e56e2b0e0b	Recognize data-external when reading HTML img tags (#7429). Preserve all attributes in img tags. If attributes have a `data-` prefix, it will be stripped. In particular, this preserves a `data-external` attribute as an `external` attribute in the pandoc AST.	2021-07-06 16:06:29 -07:00
John MacFarlane	e7f8cc5786	T.P.PDF, convertImage: normalize paths. This will avoid paths on Windows with mixed path separators, which may cause problems with SVG conversion. See #7431.	2021-07-06 10:39:47 -07:00
John MacFarlane	f88ebf3ebf	Markdown reader: don't try to read contents in self-closing HTML tag. Previously we had problems parsing raw HTML with self-closing tags like `<col/>`. The problem was that pandoc would look for a closing tag to close the markdown contents, but the closing tag had, in effect, already been parsed by `htmlTag`. This fixes the issue described in <https://groups.google.com/d/msgid/pandoc-discuss/297bc662-7841-4423-bcbb-534e99bbba09n%40googlegroups.com>.	2021-07-06 10:22:07 -07:00
John MacFarlane	3ed37f0077	HTML reader: add col, colgroup to 'closes' definitions	2021-07-06 10:21:59 -07:00
John MacFarlane	3a31fe68ef	Add command test for #7394. And fix a small bug in handling of citations in notes, which led to commas at the end of sentences in some cases.	2021-07-05 15:10:14 -07:00
John MacFarlane	77537b1765	Citeproc: cleanup and efficiency improvement in deNote.	2021-07-05 13:41:01 -07:00
John MacFarlane	ff26af59ac	Revamp note citation handling. Use latest citeproc, which uses a Span with a class rather than a Note for notes. This helps us distinguish between user notes and citation notes. Don't put citations at the beginning of a note in parentheses. (Closes #7394.)	2021-07-05 13:19:33 -07:00
Aner Lucero	cb038bb312	HTML5 writer: remove aria-hidden when explicit alt text is provided.	2021-07-02 13:02:52 -07:00
John MacFarlane	0948af9cc5	Docx writer: Add table numbering for captioned tables. The numbers are added using fields, so that Word can create a list of tables that will update automatically.	2021-06-29 11:15:40 -07:00
John MacFarlane	a01ba4463f	Docx writer: Fixed a couple bugs in Figure numbering.	2021-06-29 11:15:13 -07:00
John MacFarlane	a3d745e485	Docx writer: support figure numbers. These are set up in such a way that they will work with Word's automatic table of figures. Closes #7392.	2021-06-29 09:56:21 -07:00
Aner Lucero	f4ef652a41	Remove duplicated alt text in HTML output.	2021-06-29 09:02:13 -07:00
John MacFarlane	851d037b3e	Improve punctuation moving with `--citeproc`. Previously, using `--citeproc` could cause punctuation to move in quotes even when there are no citations. This has been changed; now, punctuation moving is limited to citations. In addition, we only move footnotes around punctuation if the style is a note style, even if `notes-after-punctuation` is `true`.	2021-06-28 22:41:14 -07:00
John MacFarlane	97b0aa667c	Allow `$` characters in bibtex keys. Closes #7409.	2021-06-28 13:34:12 -07:00
John MacFarlane	f045e59248	Text.Pandoc.Error: fix line calculations in reporting parsec errors. Also remove a spurious initial newline in the error report.	2021-06-28 13:28:49 -07:00
John MacFarlane	4262898fe9	Set proper initial source name in parsing BibTeX. (For better error messages.)	2021-06-28 13:28:02 -07:00
John MacFarlane	dd098d4e15	Markdown writer: put space between Plain and following fenced Div. Closes #4465.	2021-06-28 11:33:22 -07:00
John MacFarlane	4a7a0cff29	ImageSize: Add Tiff constructor for ImageType. [Minor API change] This allows pandoc to get size information from tiff images. Closes #7405.	2021-06-23 11:39:50 -07:00
John MacFarlane	235cdea629	reveal.js writer: Go back to setting boolean values for variables. In a previous commit we used strings because boolean False wouldn't render as `false`. This is changed in the dev version of doctemplates, so we can go back to the more straightforward approach.	2021-06-23 09:54:14 -07:00
John MacFarlane	1b07997f4a	Fix regression with comment-only YAML metadata blocks. Closes #7400.	2021-06-22 09:55:50 -07:00
John MacFarlane	086790d986	Fix unneeded import	2021-06-22 09:49:24 -07:00
John MacFarlane	8eed5b90d0	LaTeX writer: add strut at end of minipage if it contains... line breaks. Without them, the last line is shorter than it should be, at least in some cases.	2021-06-21 23:33:00 -07:00
John MacFarlane	9867231779	Revert "LaTeX writer: put a strut after a line break (`\\`)." This reverts commit `e2a7ecb5f7`.	2021-06-21 23:19:40 -07:00
John MacFarlane	e2a7ecb5f7	LaTeX writer: put a strut after a line break (`\\`). This ensures that we have proper spacing before the next line (which might e.g. be a table bottom border). This gives better results in cases like test/command/7272.md.	2021-06-21 23:17:43 -07:00
John MacFarlane	0352f7845b	Improve emailAddress in Text.Pandoc.Parsing. Previously the parser would accept characters in domains that are illegal in domains, and this sometimes caused it to gobble bits of the following text. Closes #7398. Note that this change, by itself, caused some txt2tags reader tests to fail. txt2tags allows bare email addresses with a following form query. So, in addition to the change to emailAddress, we modify the txt2tags parser so it can still handle these cases.	2021-06-21 22:35:07 -07:00
John MacFarlane	ed3974a254	LaTeX writer: always use a minipage for cells with line breaks... if width information is available. Otherwise the way we treat them can lead to content that overflows a cell. Closes #7393.	2021-06-21 18:25:36 -07:00
John MacFarlane	eee648447a	LaTeX writer: Use `\strut` instead of `~` before `\\` in empty line.	2021-06-21 18:25:07 -07:00
John MacFarlane	14b2eb2aeb	reveal.js writer: better handling of options. Previously it was impossible to specify false values for options that default to true; setting the option to false just caused the portion of the template setting the option to be omitted. Now we prepopulate all the variables with their default values, including them unconditionally and allowing them to be overridden.	2021-06-21 16:40:52 -07:00
John MacFarlane	82ad855f38	Markdown writer: Fix regression in code blocks with attributes. Code blocks with a single class but nonempty attributes were having attributes dropped as a result of #7242. Closes #7397.	2021-06-21 08:49:00 -07:00
John MacFarlane	3fb5499dd6	insertMediaBag: ensure we get sane mediaPath for URLs. Long URLs cannot be treated as mediaPaths, but System.FilePath's `isRelative` often returns True for them. So we add a check for an absolute URL. We also ensure that extensions are derived only from the path portion of URLs (previously a following query was being included). Closes #7391.	2021-06-18 13:19:24 -07:00
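
The two checks described in the commit above (detecting an absolute URL, and taking the extension only from the path portion of a URL) can be illustrated with a short Python sketch. The helper names `is_absolute_url` and `extension_from_url` are hypothetical, and the sketch uses Python's standard library rather than anything from pandoc.

```python
import posixpath
from urllib.parse import urlsplit


def is_absolute_url(target):
    """True if the target has both a scheme and a host, e.g. https://host/..."""
    parts = urlsplit(target)
    return bool(parts.scheme and parts.netloc)


def extension_from_url(target):
    """Take the extension from the path only, ignoring query and fragment."""
    path = urlsplit(target).path
    return posixpath.splitext(path)[1]


url = "https://example.com/img/logo.png?size=large&v=2"
```

Here `extension_from_url(url)` yields `.png` rather than a string that includes the query, and `is_absolute_url("images/logo.png")` is false, so only the relative path would be a candidate for use as a media path.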
John MacFarlane	cfa26e3ca0	Docx reader: handle absolute URIs in Relationship Target. Closes #7374.	2021-06-12 13:56:09 -07:00
John MacFarlane	ea53a1dc5c	Markdown writer: allow `pipe_tables` to be disabled for commonmark... (commonmark_x, gfm). Closes #7375.	2021-06-12 10:20:19 -07:00
John MacFarlane	b0cd6c6224	Fix regression in citeproc processing. If inline references are used (in the metadata `references` field), we should still only include in the bibliography items that are actually cited -- unless `nocite` is used. Closes #7376.	2021-06-12 10:16:44 -07:00
John MacFarlane	3776e828a8	Fix MediaBag regressions. With the 2.14 release `--extract-media` stopped working as before; there could be mismatches between the paths in the rendered document and the extracted media. This patch makes several changes (while keeping the same API). The `mediaPath` in 2.14 was always constructed from the SHA1 hash of the media contents. Now, we preserve the original path unless it's an absolute path or contains `..` segments (in that case we use a path based on the SHA1 hash of the contents). When constructing a path from the SHA1 hash, we always use the original extension, if there is one. Otherwise we look up an appropriate extension for the mime type. `mediaDirectory` and `mediaItems` now use the `mediaPath`, rather than the mediabag key, for the first component of the tuple. This makes more sense, I think, and fits with the documentation of these functions; eventually, though, we should rework the API so that `mediaItems` returns both the keys and the MediaItems. Rewriting of source paths in `extractMedia` has been fixed. `fillMediaBag` has been modified so that it doesn't modify image paths (that was part of the problem in #7345). We now do path normalization (e.g. `\` separators on Windows) only in writing the media; the paths are left unchanged in the image links (sensibly, since they might be URLs and not file paths). These changes should restore the original behavior from before 2.14. Closes #7345.	2021-06-10 16:47:02 -07:00
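
The path-selection rule described in the commit above can be sketched roughly as follows. This is a simplified Python illustration, not pandoc's code: the function `media_path` is hypothetical, the absolute-path test is a crude approximation (leading `/` or a drive letter), and Python's `mimetypes` table stands in for pandoc's own MIME lookup.

```python
import hashlib
import mimetypes
import posixpath


def media_path(original_path, contents, mime_type=None):
    """Pick the path under which a media item would be extracted (sketch)."""
    normalized = original_path.replace("\\", "/")
    segments = normalized.split("/")
    # Assumption: treat a leading "/" or a drive letter ("C:") as absolute.
    is_absolute = normalized.startswith("/") or normalized[1:2] == ":"
    if not is_absolute and ".." not in segments:
        # Safe relative path: preserve it unchanged.
        return original_path
    # Otherwise fall back to a name based on the SHA1 hash of the contents,
    # keeping the original extension if there is one.
    ext = posixpath.splitext(normalized)[1]
    if not ext and mime_type:
        # No extension: look one up from the MIME type instead.
        ext = mimetypes.guess_extension(mime_type) or ""
    return hashlib.sha1(contents).hexdigest() + ext


kept = media_path("images/figure.png", b"data")
hashed = media_path("../figure.png", b"data")
```

With these inputs, `kept` is the original `images/figure.png`, while `hashed` is the SHA1 hex digest of the contents followed by `.png`, because the original path contains a `..` segment.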
John MacFarlane	aa79b3035c	T.P.MIME, extensionFromMimeType: add a few special cases. When we do a reverse lookup in the MIME table, we just get the last match, so when the same mime type is associated with several different extensions, we sometimes got weird results, e.g. `.vs` for `text/plain`. These special cases help us get the most standard extensions for mime types like `text/plain`.	2021-06-10 16:36:54 -07:00
Albert Krewinkel	c7dd33d5aa	Docx writer: fix handling of empty table headers. A table header which does not contain any cells is now treated as an empty header. Fixes: #7369	2021-06-10 18:36:49 +02:00
Albert Krewinkel	55bcd4b4fb	Lua utils: fix handling of table headers in `from_simple_table`. Passing an empty list of header cells now results in an empty table header. Fixes: #7369	2021-06-10 18:36:49 +02:00
John MacFarlane	76e5f047b0	Citeproc: avoid duplicate classes and attributes on refs div.	2021-06-08 17:51:53 -07:00
John MacFarlane	21cc52abe3	LaTeX writer: Fix regression in table header position. In recent versions the table headers were no longer bottom-aligned (if more than one line). This patch fixes that by using minipages for table headers in non-simple tables. Closes #7347.	2021-06-05 14:13:58 -06:00
Jan Tojnar	c550bf8482	CommonMark writer: do not use simple class for fenced-divs. In https://github.com/jgm/pandoc/pull/7242, we introduced a simple attribute style for code blocks and fenced divs with a single class, but it turns out the CommonMark extension does not support it for fenced divs. https://github.com/jgm/commonmark-hs/blob/master/commonmark-extensions/test/fenced_divs.md	2021-06-05 13:51:18 -06:00
Jan Tojnar	7a3ee9d3d8	CommonMark writer: do not throw away attributes when Ext_attributes is enabled. Ext_attributes covers at least the following: - Ext_fenced_code_attributes - Ext_header_attributes - Ext_inline_code_attributes - Ext_link_attributes	2021-06-05 13:51:18 -06:00
Jan Tojnar	c6f8c38c49	Markdown writer: re-use functions from Inline. Instead of duplicating linkAttributes and attrsToMarkdown, let’s just use those from the Inline module.	2021-06-05 13:51:18 -06:00
Jan Tojnar	c8ab8bccf2	DocBook reader: Add support for danger element. Added in DocBook 5.2: - https://github.com/docbook/docbook/pull/64 - https://tdg.docbook.org/tdg/5.2/danger.html	2021-06-05 08:02:21 -06:00
Jan Tojnar	af9de925de	DocBook writer: Remove non-existent admonitions. attention, error and hint are actually just reStructuredText specific. danger was too until introduced in DocBook 5.2: https://github.com/docbook/docbook/issues/55	2021-06-05 08:02:21 -06:00
John MacFarlane	b6c04383e4	T.P.Class.IO: normalise path in writeMedia. This ensures that we get `\` separators on Windows.	2021-06-03 18:34:38 -06:00
John MacFarlane	311736fb0a	Text.Pandoc.PDF: only print relevant part of environment on `--verbose`.	2021-06-02 15:21:13 -06:00
John MacFarlane	2b5dad9912	Fix regression in 2.14 for generation of PDFs with SVGs. Closes #7344.	2021-06-02 10:42:22 -06:00
John MacFarlane	3b628f7664	HTML writer: Don't omit width attribute on div. Closes #7342.	2021-06-01 21:57:49 -06:00
John MacFarlane	2e4ef14d91	Markdown reader: fix pipe table regression in 2.11.4. Previously pipe tables with empty headers (that is, a header line with all empty cells) would be rendered as headerless tables. This broke in 2.11.4. The fix here is to produce an AST with an empty table head when a pipe table has all empty header cells. Closes #7343.	2021-06-01 21:44:55 -06:00
John MacFarlane	abb59bd582	LaTeX reader: don't allow optional * on symbol control sequences. Generally we allow optional starred variants of LaTeX commands (since many allow them, and if we don't accept these explicitly, ignoring the star usually gives acceptable results). But we don't want to do this for `$*$` and similar cases. Closes #7340.	2021-06-01 13:54:51 -06:00
John MacFarlane	62f46b3995	Fix regression with commonmark/gfm yaml metadata block parsing. A regression in 2.14 led to the document body being omitted after YAML metadata in some cases. This is now fixed. Closes #7339.	2021-05-31 21:34:51 -06:00
John MacFarlane	fc70f44ee2	HTML reader: fix column width regression. Column widths specified with a style attribute were off by a factor of 100 in 2.14. Closes #7334.	2021-05-30 17:15:14 -07:00
John MacFarlane	cc206af392	Have LoadedResource use relative paths. The immediate reason for this is to allow the test output of #3752 to work on both windows and linux.	2021-05-30 10:23:00 -07:00
John MacFarlane	c2f46e6df4	Docx writer: fix regression on captions. The "Table Caption" style was no longer getting applied. (It was overwritten by "Compact.") Closes #7328.	2021-05-30 10:07:28 -07:00
John MacFarlane	cc6dcf0392	Markdown reader: in rebasePaths, check for both Windows and Posix absolute paths. Previously Windows pandoc was treating `/foo/bar.jpg` as non-absolute.	2021-05-29 17:36:30 -07:00
John MacFarlane	0d7103de7e	In rebasePath, check for absolute paths two ways. isAbsolute from FilePath doesn't return True on Windows for paths beginning with `/`, so we check that separately.	2021-05-29 14:41:28 -07:00
John MacFarlane	b6b2331fdc	Support `rebase_relative_paths` for commonmark based formats. (Including `gfm`.)	2021-05-28 13:58:44 -07:00
Emily Bourke	56b211120c	Docx reader: Support new table features. * Column spans * Row spans - The spec says that if the `val` attribute is omitted, its value should be assumed to be `continue`, and that its values are restricted to {`restart`, `continue`}. If the attribute has any other value, I think it seems reasonable to default it to `continue`. It might cause problems if the spec is extended in the future by adding a third possible value, in which case this would probably give incorrect behaviour, and wouldn't error. * Allow multiple header rows * Include table description in simple caption - The table description element is like alt text for a table (along with the table caption element). It seems like we should include this somewhere, but I’m not 100% sure how – I’m pairing it with the simple caption for the moment. (Should it maybe go in the block caption instead?) * Detect table captions - Check for caption paragraph style /and/ either the simple or complex table field. This means the caption detection fails for captions which don’t contain a field, as in an example doc I added as a test. However, I think it’s better to be too conservative: a missed table caption will still show up as a paragraph next to the table, whereas if I incorrectly classify something else as a table caption it could cause havoc by pairing it up with a table it’s not at all related to, or dropping it entirely. * Update tests and add new ones Partially fixes: #6316	2021-05-28 20:15:23 +02:00
Emily Bourke	44484d0dee	Docx reader: Read table column widths.	2021-05-28 20:15:23 +02:00
John MacFarlane	4842c5fb82	Two citeproc locator/suffix improvements: - Recognize locators spelled with a capital letter. Closes #7323. - Add a comma and a space in front of the suffix if it doesn't start with space or punctuation. Closes #7324.	2021-05-27 18:28:52 -07:00
John MacFarlane	4b16d181e7	rebase_relative_paths: leave empty paths unchanged.	2021-05-27 14:16:37 -07:00
John MacFarlane	0661ce699f	rebase_relative_paths extension: don't change fragment paths. We don't want a pure fragment path to be rewritten, since these are used for cross-referencing.	2021-05-27 13:53:26 -07:00
John MacFarlane	6972a7dc91	Modify rebase_relative_paths treatment of reference links/images. The directory is based on the file containing the link reference, not the file containing the link, if these differ.	2021-05-27 11:26:38 -07:00
John MacFarlane	cbe16b2866	Citeproc: Don't detect math elements as locators. Closes #7321.	2021-05-27 10:49:45 -07:00
John MacFarlane	834da53058	Add `rebase_relative_paths` extension. - Add manual entry for (non-default) extension `rebase_relative_paths`. - Add constructor `Ext_rebase_relative_paths` to `Extensions` in Text.Pandoc.Extensions [API change]. When enabled, this extension rewrites relative image and link paths by prepending the (relative) directory of the containing file. - Make Markdown reader sensitive to the new extension. - Add tests for #3752. Closes #3752. NB. currently the extension applies to markdown and associated readers but not commonmark/gfm.	2021-05-27 10:38:25 -07:00
John MacFarlane	81eadfd99a	LaTeX reader: improve `\def` and implement `\newif`. - Improve parsing of `\def` macros. We previously set "verbatim mode" even for parsing the initial `\def`; this caused problems for things like ``` \def\foo{\def\bar{BAR}} \foo \bar ``` - Implement `\newif`. - Add tests.	2021-05-27 09:15:04 -07:00
John MacFarlane	8d5014fdfc	Logging: remove single quotes around paths in messages. We weren't doing it consistently and it seems unnecessary.	2021-05-25 11:53:49 -07:00
Albert Krewinkel	105a50569b	Allow compilation with base 4.15	2021-05-25 11:52:49 -07:00
Albert Krewinkel	bb2530caa4	Use haddock-library-1.10.0	2021-05-25 11:52:49 -07:00
John MacFarlane	f2c1b57469	PandocMonad: add info message in `downloadOrRead`... indicating what path local resources have been loaded from.	2021-05-25 10:08:30 -07:00
John MacFarlane	fb40c8109d	Logging: add LoadedResource constructor to LogMessage. [API change] This is for INFO-level messages telling where image data has been loaded from. (This can vary because of the resource path.)	2021-05-25 10:07:24 -07:00
Albert Krewinkel	d46ea7d7da	Jira: add support for "smart" links. Support has been added for the new `[alias|https://example.com|smart-card]` syntax.	2021-05-25 16:54:42 +02:00
John MacFarlane	8511f6fdf6	MediaBag improvements. In the current dev version, we will sometimes add a version of an image with a hashed name, keeping the original version with the original name, which would lead to undesirable duplication. This change separates the media's filename from the media's canonical name (which is the path of the link in the document itself). Filenames are based on SHA1 hashes and assigned automatically. In Text.Pandoc.MediaBag: - Export MediaItem type [API change]. - Change MediaBag type to a map from Text to MediaItem [API change]. - `lookupMedia` now returns a `MediaItem` [API change]. - Change `insertMedia` so it sets the `mediaPath` to a filename based on the SHA1 hash of the contents. This will be used when contents are extracted. In Text.Pandoc.Class.PandocMonad: - Remove `fetchMediaResource` [API change]. Lua MediaBag module has been changed minimally. In the future it would be better, probably, to give Lua access to the full MediaItem type.	2021-05-24 09:20:44 -07:00
Albert Krewinkel	58fbf56548	Jira writer: use `{color}` when span has a color attribute. Closes: tarleb/jira-wiki-markup#10	2021-05-24 09:56:02 +02:00
John MacFarlane	1af2cfb287	Handle relative lengths (e.g. `2*`) in HTML column widths. See <https://www.w3.org/TR/html4/types.html#h-6.6>. "A relative length has the form "i*", where "i" is an integer. When allotting space among elements competing for that space, user agents allot pixel and percentage lengths first, then divide up remaining available space among relative lengths. Each relative length receives a portion of the available space that is proportional to the integer preceding the "*". The value "*" is equivalent to "1*". Thus, if 60 pixels of space are available after the user agent allots pixel and percentage space, and the competing relative lengths are 1*, 2*, and 3*, the 1* will be alloted 10 pixels, the 2* will be alloted 20 pixels, and the 3* will be alloted 30 pixels." Closes #4063.	2021-05-22 22:03:54 -07:00
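
The allotment rule quoted from the HTML 4 specification in the commit above can be checked with a small worked example. This Python sketch is illustrative only: the helper names `parse_relative_length` and `allot` are hypothetical and do not correspond to functions in pandoc's HTML reader.

```python
def parse_relative_length(value):
    """Return the integer weight of an HTML 4 relative length such as "2*"."""
    value = value.strip()
    if not value.endswith("*"):
        raise ValueError("not a relative length: %r" % value)
    digits = value[:-1]
    # A bare "*" is equivalent to "1*".
    return int(digits) if digits else 1


def allot(remaining, lengths):
    """Divide the remaining space among relative lengths, by weight."""
    weights = [parse_relative_length(v) for v in lengths]
    total = sum(weights)
    if total == 0:
        return [0.0 for _ in weights]
    return [remaining * w / total for w in weights]


widths = allot(60, ["1*", "2*", "3*"])
```

For the specification's own example (60 pixels left over, relative lengths `1*`, `2*` and `3*`), `widths` comes out as 10, 20 and 30 pixels.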
John MacFarlane	80b4b3fe82	Revert "HTML reader: simplify col width parsing" This reverts commit `f76fe2ab56`.	2021-05-22 22:03:51 -07:00
Albert Krewinkel	f76fe2ab56	HTML reader: simplify col width parsing	2021-05-22 13:37:42 +02:00
John MacFarlane	07d299d353	DocBook reader: ensure that first and last names are separated. Closes #6541.	2021-05-20 18:45:39 -07:00
John MacFarlane	d7b5def287	Ms writer: handle tables with multiple paragraphs. Previously they overflowed the table cell width. We now set line lengths per-cell and restore them after the table has been written. Closes #7288.	2021-05-20 17:12:38 -07:00
John MacFarlane	bb11f5fb86	LaTeX reader: More siunitx improvements. Closes #6658. There's still one slight divergence from the siunitx behavior: we get 'kg m/A/s' instead of 'kg m/(A s)'. At the moment I'm not going to worry about that.	2021-05-20 15:30:31 -07:00
John MacFarlane	4e990a8cf9	LaTeX/siunitx: fix parsing of `\cubic` etc. See #6658.	2021-05-20 10:13:20 -07:00
John MacFarlane	bc5058234f	LaTeX reader siunitx: fix + sign on ang.	2021-05-20 10:13:20 -07:00
John MacFarlane	5dc917da3e	LaTeX reader siunitx: add leading 0 to numbers starting with .	2021-05-20 10:13:20 -07:00
Denis Maier	183ce58477	ConTeXt reader: improve ordered lists (#7304). Closes #5016 - change ordered list from itemize to enumerate - adds new itemgroup for ordered lists - add fontfeature for table figures - remove width from itemize in context writer	2021-05-20 09:59:53 -07:00
John MacFarlane	a366bd6abc	LaTeX reader: Fix parsing of `+-` in siunitx numbers. See #6658.	2021-05-20 09:03:29 -07:00
John MacFarlane	8437a4a002	LaTeX reader: support `\pm` in `SI{..}`. Closes #6620.	2021-05-20 08:16:46 -07:00
Albert Krewinkel	b6239f4150	ZimWiki writer: allow links and emphasis in headers. The latest version of ZimWiki supports this. Closes: #6605	2021-05-20 12:48:05 +02:00
John MacFarlane	5736b331d8	LaTeX reader: better support for `\xspace`. Previously we only supported it in inline contexts; now we support it in all contexts, including math. Partially addresses #7299.	2021-05-19 16:14:49 -07:00
John MacFarlane	640dbf8b8f	Remove unused pragma.	2021-05-19 09:51:50 -07:00
John MacFarlane	9b5798bd9a	Use fetchItem instead of downloadOrRead in fetchMediaResource.	2021-05-18 22:35:18 -07:00
John MacFarlane	ddbd984a0d	Text.Pandoc.MediaBag: change type to use a Text key... instead of `[FilePath]`. We normalize the path and use `/` separators for consistency.	2021-05-18 22:34:23 -07:00
Albert Krewinkel	eb3dff148e	LaTeX writer: separate successive quote chars with thin space. Successive quote characters are separated with a thin space to improve readability and to prevent unwanted ligatures. Detection of these quotes had sometimes failed if the second quote was nested in a span element. Closes: #6958	2021-05-18 22:55:47 +02:00
John MacFarlane	56fb4dae1b	Citeproc: ensure that CSL-related attributes are passed on... ...to a Div with id 'refs'. Previously we just left the attributes of such a Div alone, which meant that style options like entry-spacing had no effect there.	2021-05-17 20:42:43 -07:00
Albert Krewinkel	1843a8793a	HTML writer: keep attributes from code nested below pre tag. If a code block is defined with `<pre><code class="language-x">…</code></pre>`, where the `<pre>` element has no attributes, then the attributes from the `<code>` element are used instead. Any leading `language-` prefix in the code's class attribute is dropped to improve syntax highlighting. Closes: #7221	2021-05-17 18:08:02 +02:00
Albert Krewinkel	25f5b92777	HTML writer: ensure headings only have valid attribs in HTML4. Fixes: #5944	2021-05-17 15:42:15 +02:00
Albert Krewinkel	4417dacc44	ConTeXt writer: use span identifiers as reference anchors. Closes: #7246	2021-05-17 13:14:32 +02:00
Albert Krewinkel	d92622ba3c	LaTeX template: define commands for zero width non-joiner character. Closes: #6639. The zero-width non-joiner character is used to avoid ligatures (e.g. in German).	2021-05-16 12:33:32 -07:00
John MacFarlane	5a6399d9f6	Markdown writer: fewer unneeded escapes for `#`. See #6259.	2021-05-16 12:23:34 -07:00
John MacFarlane	39a69c4f93	Markdown writer: improve escaping of `@`. We need to escape literal `@` before `{` because of the new citation syntax.	2021-05-16 11:53:19 -07:00
John MacFarlane	0a4c6925b6	Docx writer: copy over more settings from reference.docx. From settings.xml in the reference-doc, we now include: `zoom`, `embedSystemFonts`, `doNotTrackMoves`, `defaultTabStop`, `drawingGridHorizontalSpacing`, `drawingGridVerticalSpacing`, `displayHorizontalDrawingGridEvery`, `displayVerticalDrawingGridEvery`, `characterSpacingControl`, `savePreviewPicture`, `mathPr`, `themeFontLang`, `decimalSymbol`, `listSeparator`, `autoHyphenation`, `compat`. Closes #7240.	2021-05-15 15:40:49 -07:00
Albert Krewinkel	0794862aac	HTML writer: parse `<header>` as a Div. HTML5 `<header>` elements are treated like `<div>` elements.	2021-05-15 16:46:02 +02:00
Albert Krewinkel	013e4a3164	HTML reader: keep h1 tags as normal headers (#7274). The tags `<title>` and `<h1 class="title">` often contain the same information, so the latter was dropped from the document. However, as this can lead to loss of information, the heading is now always retained. Use `--shift-heading-level-by=-1` to turn the `<h1>` into the document title, or a filter to restore the previous behavior. Closes: #2293	2021-05-14 12:31:24 -07:00
John MacFarlane	76a4e7127b	Beamer writer: support exampleblock and alertblock. A block will be rendered as an exampleblock if the heading has class `example` and alertblock if it has class `alert`. Closes #7278.	2021-05-14 10:09:46 -07:00
Albert Krewinkel	3ec5726c9b	Docx writer: fix alignment for cells. This fixes a regression introduced in the colspan/rowspan changes that caused column alignments to be ignored. The column alignment is used only if a default alignment is specified at the cell level; otherwise the cell-level alignment takes precedence.	2021-05-14 16:49:19 +02:00
Albert Krewinkel	17d96404f5	Docx writer: allow multirow table headers	2021-05-14 16:19:20 +02:00
Albert Krewinkel	875f8f3654	HTML reader: don't fail on unmatched closing "script" tag. Prevent the reader from crashing if the HTML input contains an unmatched closing `</script>` tag. Fixes: #7282	2021-05-14 12:13:40 +02:00
John MacFarlane	3f09f53459	Implement curly-brace syntax for Markdown citation keys. The change provides a way to use citation keys that contain special characters not usable with the standard citation key syntax. Example: `@{foo_bar{x}'}` for the key `foo_bar{x}`. Closes #6026. The change requires adding a new parameter to the `citeKey` parser from Text.Pandoc.Parsing [API change]. Markdown reader: recognize @{..} syntax for citations. Markdown writer: use @{..} syntax for citations when needed. Update manual with curly-brace syntax for citations. Closes #6026.	2021-05-13 21:59:32 -07:00
John MacFarlane	edca1d1656	Plain writer: handle superscript unicode minus. Closes #7276. Note: currently we still get unwanted white space around the minus; this needs to be addressed with a change in texmath.	2021-05-12 11:12:27 -07:00
John MacFarlane	0217ae2a4f	Handle 'annote' field in bibtex/biblatex writer. Closes #7266.	2021-05-12 11:05:55 -07:00
John MacFarlane	46309319ef	Fix source position reporting for YAML bibliographies. Closes #7273.	2021-05-12 06:01:13 -06:00
John MacFarlane	5eb7ad7d1e	Improve integration of settings from reference.docx. The settings we can carry over from a reference.docx are autoHyphenation, consecutiveHyphenLimit, hyphenationZone, doNotHyphenateCap, evenAndOddHeaders, and proofState. Previously this was implemented in a buggy way, so that the reference doc's values AND the new values were included. This change allows users to create a reference.docx that sets w:proofState for spelling or grammar to "dirty," so that spell/grammar checking will be triggered on the generated docx. Closes #1209.	2021-05-11 22:31:38 -06:00
John MacFarlane	a66e50840b	T.P.XML.Light - add Eq, Ord instances... for Content, Element, Attr, CDataKind. [API change]	2021-05-11 09:01:36 -06:00
John MacFarlane	2bd5d0cafb	LaTeX writer: better handling of line breaks in simple tables. Now we also handle the case where they're embedded in other elements, e.g. spans. Closes #7272.	2021-05-11 07:52:05 -06:00
nuew	ff7176de80	epub Writer: Fix belongs-to-collection XML id choice (#7267) The epub writer previously used the same XML id for both the book identifier and the epub collection. This causes an error on epubcheck.	2021-05-10 09:26:32 -06:00
John MacFarlane	2a2e08d823	RST reader: seek include files in the directory... ...of the file containing the include directive, as RST requires. Closes #6632.	2021-05-09 19:11:35 -06:00
John MacFarlane	b2398cd747	Org reader: Resolve org includes relative to ... ...the directory containing the file containing the INCLUDE directive. Closes #5501.	2021-05-09 19:11:35 -06:00
John MacFarlane	41a3ac9da9	RST reader: use `insertIncludedFile` from T.P.Parsing... instead of reproducing much of its code.	2021-05-09 19:11:34 -06:00
John MacFarlane	05ea507bd7	T.P.Parsing: improve include file functions. Remove old `insertIncludedFileF`. [API change] Give `insertIncludedFile` a more general type, allowing it to be used where `insertIncludedFileF` was.	2021-05-09 19:11:34 -06:00
John MacFarlane	6e45607f99	Change reader types, allowing better tracking of source positions. Previously, when multiple file arguments were provided, pandoc simply concatenated them and passed the contents to the readers, which took a Text argument. As a result, the readers had no way of knowing which file was the source of any particular bit of text. This meant that we couldn't report accurate source positions on errors or include accurate source positions as attributes in the AST. More seriously, it meant that we couldn't resolve resource paths relative to the files containing them (see e.g. #5501, #6632, #6384, #3752). Add Text.Pandoc.Sources (exported module), with a `Sources` type and a `ToSources` class. A `Sources` wraps a list of `(SourcePos, Text)` pairs. [API change] A parsec `Stream` instance is provided for `Sources`. The module also exports versions of parsec's `satisfy` and other Char parsers that track source positions accurately from a `Sources` stream (or any instance of the new `UpdateSourcePos` class). Text.Pandoc.Parsing now exports these modified Char parsers instead of the ones parsec provides. Modified parsers to use a `Sources` as stream [API change]. The readers that previously took a `Text` argument have been modified to take any instance of `ToSources`. So, they may still be used with a `Text`, but they can also be used with a `Sources` object. In Text.Pandoc.Error, modified the constructor PandocParsecError to take a `Sources` rather than a `Text` as first argument, so parse error locations can be accurately reported. T.P.Error: showPos, do not print "-" as source name.	2021-05-09 19:11:34 -06:00
Albert Krewinkel	295d93e96b	ConTeXt writer: support blank lines in line blocks. Fixes: #6564 Thanks to @denismaier.	2021-05-07 17:17:47 +02:00
Albert Krewinkel	8357b835d9	App: allow tabs expansion even if file-scope is used Tabs in plain-text inputs are now handled correctly, even if the `--file-scope` flag is used. Closes: #6709	2021-05-05 19:09:21 +02:00
Albert Krewinkel	ddbf83f62c	Docx writer: support colspans and rowspans in tables See: #6315	2021-05-01 18:52:24 +02:00
Albert Krewinkel	3da919e35d	Add new internal module Text.Pandoc.Writers.GridTable	2021-05-01 18:52:24 +02:00
tecosaur	6b16f3bb0d	Org writer: inline latex envs need newlines (#7259) Closes #7252 As specified in https://orgmode.org/manual/LaTeX-fragments.html, an inline \begin{}...\end{} LaTeX block must start on a new line.	2021-04-30 10:23:28 +02:00
mbrackeantidot	b6a65445e1	Docx reader: add handling of vml image objects (jgm#4735) (#7257) They represent images, the same way as other images in vml format.	2021-04-29 09:11:44 -07:00
John MacFarlane	d14c5f94df	Further improvements in smart quotes. Improves heuristic for detection of an "open double quote." Closes #2103.	2021-04-29 08:48:49 -07:00
John MacFarlane	80e2e88287	Smarter smart quotes. Treat a leading " with no closing " as a left curly quote. This supports the practice, in fiction, of continuing paragraphs quoting the same speaker without an end quote. It also helps with quotes that break over lines in line blocks. Closes #7216.	2021-04-28 23:32:37 -07:00
Albert Krewinkel	85f379e474	JATS writer: use either styled-content or named-content for spans. If the element has a content-type attribute, or at least one class, then that value is used as `content-type` and the span is put inside a `<named-content>` element. Otherwise a `<styled-content>` element is used instead. Closes: #7211	2021-04-28 22:21:34 +02:00
Albert Krewinkel	0921b82d98	Docx writer: autoset table width if no column has an explicit width.	2021-04-27 13:27:20 +02:00
John MacFarlane	3a98f7a0c7	Minor code reformatting. Also taking this opportunity to note, for the record, that the commit for #7241 should be marked [API change]. It changes the type of `languagesByExtension` in Highlighting, adding a parameter for a `SyntaxMap`.	2021-04-25 12:22:04 -07:00
Jan Tojnar	c56d080a25	Writers: Recognize custom syntax definitions (#7241) Languages defined using `--syntax-definition` were not recognized by `languagesByExtension`. This patch corrects that, allowing the writers to see all custom definitions. The LaTeX writer still uses the default syntax map, but that's okay in that context, since `--syntax-definition` won't create new listings styles.	2021-04-25 12:19:07 -07:00
Jan Tojnar	e9c0f9f97b	Markdown writer: Cleaner (code)blocks with single class (#7242) When a block only has a single class and no other attributes, it is not necessary to wrap the class attribute in curly braces – the class name can be placed after the opening mark as is. This will result in slightly cleaner output when pandoc is used as a markdown pretty-printer.	2021-04-25 10:36:06 -07:00
John MacFarlane	547bc2cdf8	Add quotes properly in markdown YAML metadata fields. This fixes a bug, which caused the writer to look at the LAST rather than the FIRST character in determining whether quotes were needed. So we got spurious quotes in some cases and didn't get necessary quotes in others. Closes #7245. Updated a number of test cases accordingly.	2021-04-25 10:31:33 -07:00
Albert Krewinkel	dc0ba7294d	Docx writer: add missing file	2021-04-20 13:38:16 +02:00
Albert Krewinkel	0b74bbbdaa	Docx writer: extract Table handling into separate module	2021-04-20 10:57:54 +02:00
John MacFarlane	16d372abcb	Issue error message when reader or writer format is malformed. Previously we exited with an error status but (due to a bug) no message. Closes #7231.	2021-04-19 08:38:31 -07:00
John MacFarlane	73d394ca2a	Use MetaInlines not MetaBlocks for multimarkdown metadata fields. This gives better results in converting to e.g. pandoc markdown. Ref: <https://groups.google.com/d/msgid/pandoc-discuss/9728d1f4-040e-4392-aa04-148f648a8dfdn%40googlegroups.com>	2021-04-18 22:01:12 -07:00
John MacFarlane	a478a5c4c8	Update to released unicode-collation, latest citeproc dev version. Update citeproc test.	2021-04-17 16:15:14 -07:00
John MacFarlane	7a7fefce5e	Use document's lang for the lang parameter of citeproc... even if it differs from localeLanguage. (It is designed to be possible to override the locale language, and this is especially useful when one wants to use the unicode extension syntax, e.g. fr-u-kb.)	2021-04-17 16:15:14 -07:00
John MacFarlane	aecbf8156e	Remove Text.Pandoc.BCP47 module. [API change] Use Lang from UnicodeCollation.Lang instead. This is a richer implementation of BCP 47.	2021-04-17 16:15:14 -07:00
John MacFarlane	7ba8c0d2a5	Move getLang from BCP47 -> T.P.Writers.Shared. [API change]	2021-04-17 16:15:13 -07:00
Albert Krewinkel	5f79a66ed6	JATS writer: reduce unnecessary use of <p> elements for wrapping The `<p>` element is used for wrapping in cases where the contents would otherwise not be allowed in a certain context. Unnecessary wrapping is avoided, especially around quotes (`<disp-quote>` elements). Closes: #7227	2021-04-16 22:47:37 +02:00
Albert Krewinkel	2d60524de4	JATS writer: convert spans to <named-content> elements Spans with attributes are converted to `<named-content>` elements instead of being wrapped with `<milestone-start/>` and `<milestone-end>` elements. Milestone elements are not allowed in documents using the articleauthoring tag set, so this change ensures the creation of valid documents. Closes: #7211	2021-04-10 11:49:18 +02:00
Albert Krewinkel	051b7ffeaf	JATS writer: add footnote number as label in backmatter Footnotes in the backmatter are given the footnote's number as a label. The articleauthoring output is unaffected by this change, as footnotes are placed inline there. Closes: #7210	2021-04-10 10:57:06 +02:00
John MacFarlane	20cd33e5a4	Fix regression in grid tables for wide characters. In the translation from String to Text, a char-width-sensitive splitAt' was dropped. This commit reinstates it. Closes #7214.	2021-04-08 14:48:29 -07:00
Albert Krewinkel	e227496d3a	Lua filter: respect Inlines/Blocks filter functions in pandoc.walk_*	2021-04-08 22:14:47 +02:00
John MacFarlane	60974538b2	Commonmark writer: Use backslash escapes for `<` and `\|`... instead of entities. Closes #7208.	2021-04-05 23:29:22 -07:00
John MacFarlane	21fed4a9c2	SelfContained: remove unneeded imports.	2021-04-05 23:26:54 -07:00
Albert Krewinkel	038261ea52	JATS writer: escape disallowed chars in identifiers XML identifiers must start with an underscore or letter, and can contain only a limited set of punctuation characters. Any IDs not adhering to these rules are rewritten by writing the offending characters as Uxxxx, where `xxxx` is the character's hex code.	2021-04-05 21:55:54 +02:00
John MacFarlane	65a9d3a878	SelfContained: use application/octet-stream for unknown mime types... instead of halting with an error. Closes #7202.	2021-04-05 08:49:03 -07:00
John MacFarlane	935d10769d	Fix "phrase" in DocBook: take classes from "role" not "class". Closes #7195. Revises #6438.	2021-04-02 17:07:18 -07:00
tecosaur	4371223d13	Org writer: Use LaTeX style maths delimiters (#7196) Org works better with LaTeX-style delimiters.	2021-04-01 23:36:02 +02:00
niszet	40da6c402b	Treat tabs as spaces in ODT Reader. (#7185)	2021-03-31 16:44:34 -07:00
John MacFarlane	e22d1fbb14	Powerpoint writer: allow monofont to be specified in metadata... ...not just using `--variable` on the command line (as in other writers). Closes #7187.	2021-03-29 14:56:44 -07:00
John MacFarlane	56ce1fc126	Fix DocBook reader mathml regression... ...caused by the switch in XML libraries. Also fixed a similar issue in JATS. Closes #7173.	2021-03-24 12:04:33 -07:00
John MacFarlane	052056289f	Simplify T.P.Asciify and export toAsciiText [API change]. Instead of encoding a giant (and incomplete) map, we now just use unicode-transforms to normalize the text to a canonical decomposition, and manipulate the result. The new `toAsciiText` is equivalent to the old `T.pack . mapMaybe toAsciiChar . T.unpack` but should be faster.	2021-03-21 23:40:19 -07:00
John MacFarlane	c389211e2f	Support `yaml_metadata_block` extension for commonmark, gfm. This is a bit more limited than with markdown, as documented in the manual: - The YAML block must be the first thing in the input. - The leaf nodes are parsed in isolation from the rest of the document. So, for example, you can't use reference links if the references are defined later in the document. Closes #6537.	2021-03-20 15:58:33 -07:00
John MacFarlane	2274eb88a4	Move yamlMetaBlock from Markdown reader to T.P.Readers.Metadata.	2021-03-20 15:58:33 -07:00
John MacFarlane	bea86f394e	Markdown reader: export `yamlMetaBlock`. [API change] This will allow us to parse YAML metadata blocks in other readers, potentially.	2021-03-20 15:58:33 -07:00
John MacFarlane	ce418667ae	Text.Pandoc.Parsing: remove F type synonym. Muse and Org were defining their own F anyway, with their own state. We therefore move this definition to the Markdown reader.	2021-03-20 15:58:32 -07:00
John MacFarlane	4d041953f5	T.P.Readers.Metadata: made `yamlBsToMeta`, `yamlBsToRefs` polymorphic... on the parser state, instead of requiring ParserState. [API change]	2021-03-20 15:58:32 -07:00
John MacFarlane	84d8f3efd8	RST writer: use NonEmpty for init, last.	2021-03-20 15:58:32 -07:00
Erik Rask	82e8c29cb0	Include Header.Attr.attributes as XML attributes on section Add key-value pairs found in the attributes list of Header.Attr as XML attributes on the corresponding section element. Any key name not allowed as an XML attribute name is dropped, as are keys with invalid values where they are defined as enums in DocBook, and xml:id (for DocBook 5)/id (for DocBook 4), so as not to interfere with computed identifiers.	2021-03-20 21:29:17 +01:00
John MacFarlane	a1a57bce4e	T.P.Shared: remove `backslashEscapes`, `escapeStringUsing`. [API change] These are inefficient association list lookups. Replace with more efficient functions in the writers that used them (with 10-25% performance improvements in haddock, org, rtf, texinfo writers).	2021-03-20 00:24:49 -07:00
John MacFarlane	eacead3eb3	Fix fallback to default partials on templates. If the directory containing a template does not contain the partial, it should be sought in the default data files. Closes #7164.	2021-03-19 22:57:48 -07:00
John MacFarlane	7678c48122	Hlint suggestion.	2021-03-19 14:43:42 -07:00
John MacFarlane	005f0fbcd5	T.P.Shared: Remove ToString, ToText typeclasses [API change]. T.P.Parsing: revise type of readWithM so that it takes a Text rather than a polymorphic ToText value. These typeclasses were there to ease the transition from String to Text. They are no longer needed, and they may clash with more useful versions under the same name. This will require a bump to 2.13.	2021-03-19 12:36:04 -07:00
John MacFarlane	4002c35a91	Protect partial uses of maximum with NonEmpty.	2021-03-19 11:55:59 -07:00
John MacFarlane	8d5116381b	Use NonEmpty instead of minimumDef.	2021-03-19 10:30:32 -07:00
John MacFarlane	a31731b8e2	Docx reader: Don't reimplement NonEmpty.	2021-03-19 10:11:08 -07:00
John MacFarlane	3428248deb	Use minimumDef instead of minimum (partial function).	2021-03-18 23:01:12 -07:00
John MacFarlane	f0e4b9cc3c	Require safe >= 0.3.18 and remove cpp.	2021-03-18 21:37:56 -07:00
John MacFarlane	1da6208315	Rewrite a foldl1 as a foldl'.	2021-03-18 21:30:59 -07:00
John MacFarlane	67e173bda1	Remove another foldr1 partial function use.	2021-03-18 21:10:22 -07:00
John MacFarlane	fd76e605cd	T.P.Readers.Odt.StyleReader: rewrite foldr1 use as foldr. This avoids a partial function.	2021-03-18 21:02:05 -07:00
John MacFarlane	c3f9e8c122	Docx writer: make nsid in abstractNum deterministic. Previously we assigned a random number (though in a deterministic way). But changes in the random package mean we get different results now on different architectures, even with the same random seed. We don't need random values; so now we just assign a value based on the list number id, which is guaranteed to be unique to the list marker.	2021-03-17 22:31:20 -07:00
John MacFarlane	7bf4be04b0	Fix regression with `tex_math_backslash` in Markdown reader. Added regression test. Closes #7155.	2021-03-17 09:10:44 -07:00
John MacFarlane	87538966a0	Removed unused LANGUAGE pragmas.	2021-03-16 13:05:29 -07:00
John MacFarlane	805d12ac9c	Remove an unneeded import	2021-03-15 14:21:52 -07:00
John MacFarlane	24191a2a27	Use foldl' instead of foldl everywhere.	2021-03-15 10:37:35 -07:00
John MacFarlane	3622097da3	Handle 'nocite' better with --biblatex and --natbib. Previously the nocite metadata field was ignored with these formats. Now it populates a `nocite-ids` template variable and causes a `\nocite` command to be issued. Closes #4585.	2021-03-14 00:10:37 -08:00
Albert Krewinkel	35688c4262	T.P.App.FormatHeuristics: shorten code, improve docs.	2021-03-13 22:06:43 +01:00
John MacFarlane	35b66a7671	MediaWiki reader: Allow block-level content in notes (ref). Closes #7145.	2021-03-13 12:50:44 -08:00
John MacFarlane	eed18d231c	Use integral values for w:tblW in docx. Closes #7141.	2021-03-13 12:05:52 -08:00
Albert Krewinkel	00e8d0678e	Jira reader: mark divs created from panels with class "panel". Closes: tarleb/jira-wiki-markup#2	2021-03-13 14:29:47 +01:00
Albert Krewinkel	a8aa301428	Jira writer: improve div/panel handling Include div attributes in panels, always render divs with class `panel` as panels, and avoid nesting of panels.	2021-03-13 12:10:02 +01:00
John MacFarlane	894ed8ebb0	Citeproc: apply fixLinks correctly. This is code that incorporates a prefix like `https://doi.org/` into a following link when appropriate. But it didn't work because we were walking with a `[Inline] -> [Inline]` function on an `Inlines`. Changed the point of application of `fixLink` to resolve the issue. Closes #7130.	2021-03-12 11:58:52 -08:00
John MacFarlane	92ffd37475	Simplify compactDL.	2021-03-12 11:58:52 -08:00
John MacFarlane	5608dc01e5	HTML writer: Add warnings on duplicate attribute values. This prevents emitting invalid HTML. Ultimately it would be good to prevent this in the types themselves, but this is better for now. T.P.Logging: Add DuplicateAttribute constructor to LogMessage. [API change]	2021-03-10 10:19:40 -08:00
John MacFarlane	1c23e3a824	RST reader: fix logic for ending comments. Previously comments sometimes got extended too far. Closes #7134.	2021-03-09 13:03:27 -08:00
Albert Krewinkel	d7f8fbf04b	Org writer: fix operator precedence mistake in previous commit	2021-03-09 21:16:11 +01:00
Albert Krewinkel	b9b2586ed3	Org writer: prevent unintended creation of ordered list items Adjust line wrapping if default wrapping would cause a line to be read as an ordered list item. Fixes #7132	2021-03-09 18:14:54 +01:00
Albert Krewinkel	eb184d9148	Jira writer: use noformat instead of code for unknown languages. Code blocks that are not marked as a language supported by Jira are rendered as preformatted text with `{noformat}` blocks. Fixes: tarleb/jira-wiki-markup#4	2021-03-08 12:50:35 +01:00
John MacFarlane	5aa73bd0a2	LaTeX reader: handle table cells containing `&` in `\verb`. Closes #7129.	2021-03-07 15:49:02 -08:00
John MacFarlane	c652dcc16b	LaTeX reader: support hyperref command. Closes #7127.	2021-03-07 13:22:00 -08:00
John MacFarlane	735a69de6b	Allow `--resource-path` to accumulate. Previously, if `--resource-path` were used multiple times, the last resource path would replace the others. With this change, each time `--resource-path` is used, it prepends the specified path components to the existing resource path. Similarly, when `resource-path` is specified in a defaults file, the paths provided will be prepended to the existing resource path. This change also allows one to avoid using the OS-specific path separator; instead, one can simply use `--resource-path` a number of times with single paths. This form of command will not have an OS-dependent behavior. This change facilitates the use of multiple, small defaults files: each can specify a directory containing its own resources without clobbering the resource paths set by the others. Closes #6152.	2021-03-06 10:32:51 -08:00
John MacFarlane	df00cf05cb	Allow `${.}` in defaults files paths... to refer to the directory where the default file is. This will make it possible to create moveable "packages" of resources in a directory. Closes #5871.	2021-03-05 11:56:41 -08:00
John MacFarlane	6dd7520cc4	Implement environment variable interpolation in defaults files. This allows the syntax `${HOME}` to be used, in fields that expect file paths only. Any environment variable may be interpolated in this way. A warning will be raised for undefined variables. The special variable `USERDATA` is automatically set to the user data directory in force when the defaults file is parsed. (Note: it may be different from the eventual user data directory, if the defaults file or further command line options change that.) Closes #5982. Closes #5977. Closes #6108 (path not taken).	2021-03-05 10:46:01 -08:00
John MacFarlane	a832469006	Add fields for CSL options to Opt. * Add `optCSL`, `optBibliography`, `optCitationAbbreviations` to `Opt` [API change]. * Move `addMeta` from T.P.App.Opt to T.P.App.CommandLineOptions.	2021-03-05 10:42:33 -08:00
John MacFarlane	ccc530c588	Logging: Add EnvironmentVariableUndefined constructor to LogMessage. [API change]	2021-03-05 10:28:46 -08:00
John MacFarlane	5f9327cfc8	Shared: Change defaultUserDataDirs -> defaultUserDataDir. Rationale: the manual says that the XDG data directory will be used if it exists, otherwise the legacy data directory. So we should just determine this and use this directory, rather than having a search path which could cause some things to be taken from one data directory and others from others. [API change]	2021-03-05 10:25:18 -08:00
John MacFarlane	030209fc29	Revert "Revert "Relax `--abbreviations` rules so that a period isn't required."" This reverts commit `916ce4d511`. I was confused in thinking it wouldn't work.	2021-03-04 16:25:13 -08:00
John MacFarlane	916ce4d511	Revert "Relax `--abbreviations` rules so that a period isn't required." This reverts commit `e461b7dd45`. Ill-advised change. This doesn't work because we parse strings in chunks.	2021-03-04 16:22:08 -08:00
John MacFarlane	e461b7dd45	Relax `--abbreviations` rules so that a period isn't required. Partially addresses #7124.	2021-03-04 16:02:46 -08:00
John MacFarlane	92ea8a0cb6	Revert "Add T.P.Readers.LaTeX.Include." This reverts commit `b569b0226d`. Memory usage improvement in compilation wasn't very significant.	2021-03-03 19:07:16 -08:00
John MacFarlane	b569b0226d	Add T.P.Readers.LaTeX.Include.	2021-03-03 18:47:17 -08:00
John MacFarlane	33e4c8dd6c	Remove T.P.Readers.LaTeX.Accent. Incorporate accentCommands into T.P.Readers.LaTeX.Inline.	2021-03-03 18:21:32 -08:00
John MacFarlane	da5e9e5956	Move enquote commands to T.P.LaTeX.Lang.	2021-03-03 11:22:42 -08:00
John MacFarlane	044bc44fc6	Moved more into T.P.Readers.LaTeX.Lang.	2021-03-03 11:08:02 -08:00
John MacFarlane	bbcc1501a5	Split out T.P.Readers.LaTeX.Inline.	2021-03-03 10:34:10 -08:00
John MacFarlane	e8e5ffe1f4	Split out T.P.Writers.LaTeX.Util.	2021-03-02 22:40:45 -08:00
John MacFarlane	fe483c653b	Split out T.P.Writers.LaTeX.Citation.	2021-03-02 21:57:37 -08:00
John MacFarlane	827ecdd2de	Split out T.P.Writers.LaTeX.Lang.	2021-03-02 21:33:58 -08:00
John MacFarlane	2097411e4f	Split up T.P.Writers.Markdown... with T.P.Writers.Markdown.Types and T.P.Writers.Markdown.Inline. The module was difficult to compile on low-memory systems.	2021-03-02 21:08:13 -08:00
John MacFarlane	7f1b933aaa	Make T.P.Readers.LaTeX.Types an unexported module. [API change] This is really an implementation detail that shouldn't be exposed in the public API.	2021-03-01 09:46:43 -08:00
John MacFarlane	382f0e23d2	Factor out T.P.Readers.LaTeX.Macro.	2021-03-01 09:46:43 -08:00
Albert Krewinkel	e1454fe0d0	Jira writer: use Span identifiers as anchors Closes: tarleb/jira-wiki-markup#3.	2021-03-01 14:36:11 +01:00
John MacFarlane	3793ed8beb	Removed unnecessary pragmas.	2021-02-28 23:43:55 -08:00
John MacFarlane	6a6291d9e3	Change T.P.Readers.LaTeX.SIunitx to export a command map... instead of individual commands.	2021-02-28 23:05:35 -08:00
John MacFarlane	7e38b8e55a	T.P.Readers.LaTeX: Don't export tokenize, untokenize. [API change] These were only exported for testing, which seems the wrong thing to do. They don't belong in the public API and are not really usable as they are, without access to the Tok type which is not exported. Removed the tokenize/untokenize roundtrip test. We put a quickcheck property in the comments which may be used when this code is touched (if it is).	2021-02-28 22:53:42 -08:00
John MacFarlane	2463fbf61d	LaTeX writer: use function instead of map for accent lookup.	2021-02-28 21:43:11 -08:00
John MacFarlane	d2bb0c7c8d	Factor out T.P.Readers.LaTeX.Math.	2021-02-28 21:05:25 -08:00
John MacFarlane	36456070c4	Fix bug in last commit.	2021-02-28 15:36:46 -08:00
John MacFarlane	7229d068c9	Markdown reader efficiency improvements. Benchmarks show that these make the reader 13-17% faster, depending on extensions.	2021-02-28 15:18:31 -08:00
John MacFarlane	cc543cf5b6	LaTeX reader: another small efficiency improvement.	2021-02-28 14:34:04 -08:00
John MacFarlane	f6cf03857b	LaTeX reader efficiency improvements. In conjunction with other changes this makes the reader almost twice as fast on our benchmark as it was on Feb. 10.	2021-02-28 12:52:41 -08:00
John MacFarlane	564c39beef	Move setDefaultLanguage to T.P.Readers.LaTeX.Lang.	2021-02-28 09:49:34 -08:00
John MacFarlane	5e571d9635	LaTeX reader: remove two unnecessary parsers in inline. These are handled anyway by regularSymbol.	2021-02-28 09:39:01 -08:00
John MacFarlane	2faa57e8e9	Factor out T.P.Readers.LaTeX.Citation.	2021-02-28 09:12:09 -08:00
John MacFarlane	08231f5cdd	Factor out T.P.Readers.LaTeX.Table.	2021-02-27 21:40:56 -08:00
John MacFarlane	925815bb33	Split off T.P.Readers.LaTeX.Accent. To help reduce memory demands compiling the main LaTeX reader.	2021-02-27 17:02:44 -08:00
Albert Krewinkel	3327b225a1	Lua: use strict evaluation when retrieving AST value from the stack Fixes: #6674	2021-02-27 21:57:12 +01:00
Salim B	fae6a204f1	Fix/update URLs and use HTTPS where possible (#7122)	2021-02-26 17:56:04 -08:00
John MacFarlane	f0a991a22b	T.P.CSV: fix parsing of unquoted values. Previously we didn't allow unescaped quotes in unquoted values, but they are allowed. Closes #7112.	2021-02-22 21:18:04 -08:00
John MacFarlane	d30791a381	Fall back to latin1 if UTF-8 decoding fails... ...when handling URL argument served with no charset in the mime type. The assumption is that most pages that don't specify a charset in the mime type are either UTF-8 or latin1. I think that's a good assumption, though I'm not sure.	2021-02-22 14:17:22 -08:00
John MacFarlane	5a73c5d3f8	When downloading content from URL arguments, be sensitive to... the character encoding. We can properly handle UTF-8 and latin1 (ISO-8859-1); for others we raise an error. See #5600.	2021-02-22 14:01:10 -08:00
John MacFarlane	bafccd5aa2	T.P.Error: Add PandocUnsupportedCharsetError constructor... ...for PandocError. [API change]	2021-02-22 14:01:04 -08:00
John MacFarlane	4617f229ea	Text.Pandoc.MIME: add exported function getCharset. [API change]	2021-02-22 13:28:47 -08:00
John MacFarlane	80fde18fb1	Text.Pandoc.UTF8: change IO functions to return Text, not String. [API change] This affects `readFile`, `getContents`, `writeFileWith`, `writeFile`, `putStrWith`, `putStr`, `putStrLnWith`, `putStrLn`, `hPutStrWith`, `hPutStr`, `hPutStrLnWith`, `hPutStrLn`, `hGetContents`. This avoids the need to uselessly create a linked list of characters when emitting output.	2021-02-22 11:30:07 -08:00
John MacFarlane	2b37ed9f21	LaTeX reader: further optimizations in satisfyTok. Benchmarks show 2/3 of the run time and 2/3 of the allocation of the Feb. 10 benchmarks.	2021-02-21 11:30:17 -08:00
John MacFarlane	db4f882315	LaTeX reader: removed sExpanded in state. This isn't actually needed and checking it doesn't change anything. Also remove an unnecessary `doMacros` before `satisfyTok`, which does it anyway.	2021-02-21 11:24:04 -08:00
John MacFarlane	f43cb5ddcf	LaTeX reader: further performance optimization. Avoid unnecessary 'doMacros'.	2021-02-21 10:58:42 -08:00
John MacFarlane	c0c8865eaa	HTML reader: small performance tweak.	2021-02-20 23:40:02 -08:00
John MacFarlane	d8ef383692	T.P.Shared: remove some obsolete functions [API change]. Removed: - `splitByIndices` - `splitStringByIndicies` - `substitute` - `underlineSpan` None of these are used elsewhere in the code base.	2021-02-20 23:02:10 -08:00
John MacFarlane	321343b2cf	HTML reader: small efficiency improvements. Also, remove exported class NamedTag(..) [API change]. This was just intended to smooth over the transition from String to Text and is no longer needed. The functions isInlineTag and isBlockTag are no longer polymorphic.	2021-02-20 22:49:20 -08:00
John MacFarlane	cec541e54c	LaTeX reader: Another small improvement to macro handling.	2021-02-20 22:14:31 -08:00
John MacFarlane	31b8f60ea8	LaTeX reader: avoid macro resolution code if no macros defined.	2021-02-20 22:03:29 -08:00
John MacFarlane	0f955b10b4	T.P.Readers.LaTeX.Parsing: improve braced'. Remove the parameter, have it parse the opening brace, and make it more efficient.	2021-02-20 18:57:46 -08:00
John MacFarlane	13847267e9	HTML reader: efficiency improvements. Do a lookahead to find the right parser to use. Benchmarks from 34ms to 23ms, with less allocation. Also speeds up the epub reader.	2021-02-20 00:07:38 -08:00
John MacFarlane	98d26c2345	DocBook, JATS, OPML readers: performance optimization. With the new XML parser, we can avoid the expensive tree normalization step we used to do. This gives a significant speed boost in docbook and JATS parsing (e.g. 9.7 to 6 ms).	2021-02-18 21:24:31 -08:00
John MacFarlane	ef642e2bbc	T.P.XML Improve fromEntities.	2021-02-18 18:11:27 -08:00
John MacFarlane	0f5c56dfb1	T.P.PDF: disable `smart` when building PDF via LaTeX. This is to prevent accidental creation of ligatures like `` ?` `` and `` !` `` (especially in languages with quotations like German), and similar ligature issues. See jgm/citeproc#54.	2021-02-18 17:11:53 -08:00
John MacFarlane	53cf8295a4	LaTeX writer: adjust hypertargets to beginnings of paragraphs. Use `\vadjust pre` so that the hypertarget takes you to the beginning of the paragraph rather than one line down. Closes #7078. This makes a particular difference for links to citations using `--citeproc` and `link-citations: true`.	2021-02-18 14:34:38 -08:00
John MacFarlane	9e728b40f3	T.P.Shared: cleanup. Cleaned up some functions and added deprecation pragmas to functions no longer used in the code base.	2021-02-18 13:12:15 -08:00
Albert Krewinkel	743f7216de	Org reader: fix bug in org-ref citation parsing. The org-ref syntax allows listing multiple citations separated by commas. This fixes a bug that accepted commas as part of the citation id, so all citation lists were parsed as one single citation. Fixes: #7101	2021-02-18 21:59:18 +01:00
John MacFarlane	73add05789	Docx reader: use Map instead of list for Namespaces. This gives a speedup of about 5-10%. The reader is now approximately twice as fast as in the last release.	2021-02-17 09:54:39 -08:00
John MacFarlane	80a1d5c9b6	Revert "Add T.P.XML.Light.Cursor." This reverts commit `d8fc497186`.	2021-02-16 19:18:01 -08:00
John MacFarlane	d8fc497186	Add T.P.XML.Light.Cursor.	2021-02-16 18:51:41 -08:00
John MacFarlane	4af378702a	Add orig copyright/license info for code derived from xml-light.	2021-02-16 18:44:38 -08:00
John MacFarlane	d7a4996b1e	Split up T.P.XML.Light into submodules.	2021-02-16 18:40:06 -08:00
John MacFarlane	967e7f5fb9	Rename Text.Pandoc.XMLParser -> Text.Pandoc.XML.Light... ..and add new definitions isomorphic to xml-light's, but with Text instead of String. This allows us to keep most of the code in existing readers that use xml-light, but avoid lots of unnecessary allocation. We also add versions of the functions from xml-light's Text.XML.Light.Output and Text.XML.Light.Proc that operate on our modified XML types, and functions that convert xml-light types to our types (since some of our dependencies, like texmath, use xml-light). Update golden tests for docx and pptx. OOXML test: Use `showContent` instead of `ppContent` in `displayDiff`. Docx: Do a manual traversal to unwrap sdt and smartTag. This is faster, and needed to pass the tests. Benchmarks: A = prior to `8ca191604d` (Feb 8) B = as of `8ca191604d` (Feb 8) C = this commit \| Reader \| A \| B \| C \| \| ------- \| ----- \| ------ \| ----- \| \| docbook \| 18 ms \| 12 ms \| 10 ms \| \| opml \| 65 ms \| 62 ms \| 35 ms \| \| jats \| 15 ms \| 11 ms \| 9 ms \| \| docx \| 72 ms \| 69 ms \| 44 ms \| \| odt \| 78 ms \| 41 ms \| 28 ms \| \| epub \| 64 ms \| 61 ms \| 56 ms \| \| fb2 \| 14 ms \| 5 ms \| 4 ms \|	2021-02-16 16:55:20 -08:00
Albert Krewinkel	8621ed600a	T.P.Error: remove unused variables	2021-02-14 15:49:12 +01:00
John MacFarlane	d84a6041e1	HTML reader: fix bad handling of empty src attribute in iframe. - If src is empty, we simply skip the iframe. - If src is invalid or cannot be fetched, we issue a warning and skip instead of failing with an error. - Closes #7099.	2021-02-13 13:08:34 -08:00
John MacFarlane	6e73273916	T.P.Error: export `renderError`. Refactor `handleError` to use `renderError`. This allows us to render error messages without exiting.	2021-02-13 13:08:34 -08:00
Albert Krewinkel	a3beed9db8	Org: support task_lists extension. The task lists extension is now supported by the org reader and writer; the extension is turned on by default. Closes: #6336	2021-02-13 13:00:37 -08:00
Albert Krewinkel	2d60a5127c	T.P.Shared: export `handleTaskListItem`. [API change]	2021-02-13 13:00:37 -08:00
John MacFarlane	6323250bad	LaTeX reader: remove unnecessary line	2021-02-13 00:22:22 -08:00
John MacFarlane	25b7df7c2a	Remove Ext_fenced_code_attributes from allowed commonmark attributes. This attribute was listed as allowed, but it didn't actually do anything. Use `attributes` for code attributes and more. Closes #7097.	2021-02-13 00:18:40 -08:00
John MacFarlane	eb0c63b002	Avoid an unnecessary withRaw.	2021-02-12 19:29:48 -08:00
John MacFarlane	d9322629a3	LaTeX reader improvements. * Rewrote `withRaw` so it doesn't rely on fragile assumptions about token positions (which break when macros are expanded). This requires the addition of `sEnableWithRaw` and `sRawTokens` in `LaTeXState`, and a new combinator `disablingWithRaw` to disable collecting of raw tokens in certain contexts. * Add `parseFromToks` to T.P.Readers.LaTeX.Parsing. * Fix parsing of single character tokens so it doesn't mess up the new raw token collecting. * These changes slightly increase allocations and have a small performance impact, but it's minor. Closes #7092.	2021-02-12 19:04:14 -08:00
John MacFarlane	390d5e65b2	Use getTimestamp instead of getCurrentTime in writers. Setting SOURCE_DATE_EPOCH will allow reproducible builds. Partially addresses #7093. This does not suffice to fully enable reproducible builds in EPUB, since a unique id is being generated for each build.	2021-02-11 14:55:03 -08:00
John MacFarlane	3c4a58bad0	T.P.Class: Add getTimestamp [API change]. This attempts to read the SOURCE_DATE_EPOCH environment variable and parse a UTC time from it (treating it as a unix date stamp, see https://reproducible-builds.org/specs/source-date-epoch/). If the variable is not set or can't be parsed as a unix date stamp, then the function returns the current date.	2021-02-11 14:54:28 -08:00
John MacFarlane	acc9afaf6f	Correctly parse "raw" date value in markdown references metadata. See jgm/citeproc#53.	2021-02-11 09:16:25 -08:00
John MacFarlane	8ca191604d	Add new unexported module T.P.XMLParser. This exports functions that use xml-conduit's parser to produce an xml-light Element or [Content]. This allows existing pandoc code to use a better parser without much modification. The new parser is used in all places where xml-light's parser was previously used. Benchmarks show a significant performance improvement in parsing XML-based formats (especially ODT and FB2). Note that the xml-light types use String, so the conversion from xml-conduit types involves a lot of extra allocation. It would be desirable to avoid that in the future by gradually switching to using xml-conduit directly. This can be done module by module. The new parser also reports errors, which we report when possible. A new constructor PandocXMLError has been added to PandocError in T.P.Error [API change]. Closes #7091, which was the main stimulus. These changes revealed the need for some changes in the tests. The docbook-reader.docbook test lacked definitions for the entities it used; these have been added. And the docx golden tests have been updated, because the new parser does not preserve the order of attributes. Add entity defs to docbook-reader.docbook. Update golden tests for docx.	2021-02-10 22:04:11 -08:00
John MacFarlane	f70795dc5e	ODT reader: finer-grained errors on parse failure. See #7091.	2021-02-08 09:39:59 -08:00
John MacFarlane	5cd1c1001f	ODT reader: give more information if zip can't be unpacked.	2021-02-08 09:39:59 -08:00

7705 commits