pandoc

Author	SHA1	Message	Date
Albert Krewinkel	1843a8793a	HTML reader: keep attributes from code nested below pre tag. If a code block is defined with `<pre><code class="language-x">…</code></pre>`, where the `<pre>` element has no attributes, then the attributes from the `<code>` element are used instead. Any leading `language-` prefixes in the code's class attribute are dropped to improve syntax highlighting. Closes: #7221	2021-05-17 18:08:02 +02:00
Albert Krewinkel	0794862aac	HTML reader: parse `<header>` as a Div. HTML5 `<header>` elements are treated like `<div>` elements.	2021-05-15 16:46:02 +02:00
John MacFarlane	6e45607f99	Change reader types, allowing better tracking of source positions. Previously, when multiple file arguments were provided, pandoc simply concatenated them and passed the contents to the readers, which took a Text argument. As a result, the readers had no way of knowing which file was the source of any particular bit of text. This meant that we couldn't report accurate source positions on errors or include accurate source positions as attributes in the AST. More seriously, it meant that we couldn't resolve resource paths relative to the files containing them (see e.g. #5501, #6632, #6384, #3752). Add Text.Pandoc.Sources (exported module), with a `Sources` type and a `ToSources` class. A `Sources` wraps a list of `(SourcePos, Text)` pairs. [API change] A parsec `Stream` instance is provided for `Sources`. The module also exports versions of parsec's `satisfy` and other Char parsers that track source positions accurately from a `Sources` stream (or any instance of the new `UpdateSourcePos` class). Text.Pandoc.Parsing now exports these modified Char parsers instead of the ones parsec provides. Modified parsers to use a `Sources` as stream [API change]. The readers that previously took a `Text` argument have been modified to take any instance of `ToSources`. So, they may still be used with a `Text`, but they can also be used with a `Sources` object. In Text.Pandoc.Error, modified the constructor PandocParsecError to take a `Sources` rather than a `Text` as first argument, so parse error locations can be accurately reported. T.P.Error: showPos, do not print "-" as source name.	2021-05-09 19:11:34 -06:00
mbrackeantidot	b6a65445e1	Docx reader: add handling of vml image objects (jgm#4735) (#7257). They represent images, the same way as other images in vml format.	2021-04-29 09:11:44 -07:00
John MacFarlane	80e2e88287	Smarter smart quotes. Treat a leading " with no closing " as a left curly quote. This supports the practice, in fiction, of continuing paragraphs quoting the same speaker without an end quote. It also helps with quotes that break over lines in line blocks. Closes #7216.	2021-04-28 23:32:37 -07:00
niszet	40da6c402b	Treat tabs as spaces in ODT Reader. (#7185)	2021-03-31 16:44:34 -07:00
Albert Krewinkel	00e8d0678e	Jira reader: mark divs created from panels with class "panel". Closes: tarleb/jira-wiki-markup#2	2021-03-13 14:29:47 +01:00
John MacFarlane	12b47656d4	Remove superfluous imports.	2021-02-28 22:57:36 -08:00
John MacFarlane	7e38b8e55a	T.P.Readers.LaTeX: Don't export tokenize, untokenize. [API change] These were only exported for testing, which seems the wrong thing to do. They don't belong in the public API and are not really usable as they are, without access to the Tok type which is not exported. Removed the tokenize/untokenize roundtrip test. We put a quickcheck property in the comments which may be used when this code is touched (if it is).	2021-02-28 22:53:42 -08:00
John MacFarlane	80fde18fb1	Text.Pandoc.UTF8: change IO functions to return Text, not String. [API change] This affects `readFile`, `getContents`, `writeFileWith`, `writeFile`, `putStrWith`, `putStr`, `putStrLnWith`, `putStrLn`, `hPutStrWith`, `hPutStr`, `hPutStrLnWith`, `hPutStrLn`, `hGetContents`. This avoids the need to uselessly create a linked list of characters when emitting output.	2021-02-22 11:30:07 -08:00
Albert Krewinkel	743f7216de	Org reader: fix bug in org-ref citation parsing. The org-ref syntax allows listing multiple citations separated by commas. This fixes a bug that accepted commas as part of the citation id, so all citation lists were parsed as one single citation. Fixes: #7101	2021-02-18 21:59:18 +01:00
Albert Krewinkel	a3beed9db8	Org: support task_lists extension. The task lists extension is now supported by the org reader and writer; the extension is turned on by default. Closes: #6336	2021-02-13 13:00:37 -08:00
Albert Krewinkel	8ffd4159d6	Jira: require jira-wiki-markup 1.3.3 * Modified the Doc parser to skip leading blank lines. This fixes parsing of documents which start with multiple blank lines. (#7095) * Prevent URLs within link aliases from being treated as autolinks. (#6944) Fixes: #7095 Fixes: #6944	2021-02-12 17:15:12 +01:00
John MacFarlane	8ca191604d	Add new unexported module T.P.XMLParser. This exports functions that use xml-conduit's parser to produce an xml-light Element or [Content]. This allows existing pandoc code to use a better parser without much modification. The new parser is used in all places where xml-light's parser was previously used. Benchmarks show a significant performance improvement in parsing XML-based formats (especially ODT and FB2). Note that the xml-light types use String, so the conversion from xml-conduit types involves a lot of extra allocation. It would be desirable to avoid that in the future by gradually switching to using xml-conduit directly. This can be done module by module. The new parser also reports errors, which we report when possible. A new constructor PandocXMLError has been added to PandocError in T.P.Error [API change]. Closes #7091, which was the main stimulus. These changes revealed the need for some changes in the tests. The docbook-reader.docbook test lacked definitions for the entities it used; these have been added. And the docx golden tests have been updated, because the new parser does not preserve the order of attributes. Add entity defs to docbook-reader.docbook. Update golden tests for docx.	2021-02-10 22:04:11 -08:00
Albert Krewinkel	d202f7eb77	Avoid unnecessary use of NoImplicitPrelude pragma (#7089)	2021-02-07 10:02:35 -08:00
John MacFarlane	c841bcf3b0	Revert "Markdown reader: support GitHub wiki's internal links (#2923) (#6458)" This reverts commit `6efd3460a7`. Since this extension is designed to be used with GitHub markdown (gfm), we need to implement the parser as a commonmark extension (commonmark-extensions), rather than in pandoc's markdown reader. When that is done, we can add it here.	2021-01-16 16:22:04 -08:00
Gautier DI FOLCO	6efd3460a7	Markdown reader: support GitHub wiki's internal links (#2923) (#6458) Changes overview: * Add a `Ext_markdown_github_wikilink` constructor to `Extension` [API change]. * Add the parser `githubWikiLink` in `Text.Pandoc.Readers.Markdown` * Add tests.	2021-01-16 16:15:33 -08:00
Albert Krewinkel	fe1378227b	Org reader: allow multiple pipe chars in todo sequences. Additional pipe chars, used to separate "action" states from "no further action" states, are ignored. E.g., for the following sequence, both `DONE` and `FINISHED` are states with no further action required. #+TODO: UNFINISHED | DONE | FINISHED Previously, parsing of the todo sequence failed if multiple pipe chars were included. Closes: #7014	2021-01-09 13:40:31 +01:00
Albert Krewinkel	4f34345867	Update copyright notices for 2021 (#7012)	2021-01-08 09:38:20 -08:00
Dimitri Sabadie	57b1094152	Org reader: mark verbatim code with class "verbatim". (#6998) * Replace org-mode’s verbatim from code to codeWith. This adds the `"verbatim"` class so that exporters can apply a specific style on it. For instance, it will be possible for HTML to add a CSS rule for code + verbatim class. * Alter test for org-mode’s verbatim change. See previous commit for further detail on the new implementation.	2021-01-03 08:57:47 +01:00
Albert Krewinkel	17e3efc785	Org reader: restructure output of captioned code blocks The Div wrapper of code blocks with captions now has the class "captioned-content". The caption itself is added as a Plain block inside a Div of class "caption". This makes it easier to write filters which match on captioned code blocks. Existing filters will need to be updated. Closes: #6977	2021-01-01 11:18:36 +01:00
Albert Krewinkel	acf932825b	Org reader: preserve targets of spurious links. Links with (internal) targets that the reader doesn't know about are converted into emphasized text. Information on the link target is now preserved by wrapping the text in a Span of class `spurious-link`, with an attribute `target` set to the link's original target. This makes it possible to recover and fix broken or unknown links with filters. See: #6916	2020-12-05 22:37:48 +01:00
Albert Krewinkel	0eedbd0a3d	HTML reader tests: disable round-trip testing for tables. Information for cell alignment in a column is not preserved during round-trips.	2020-11-24 15:46:11 +01:00
Albert Krewinkel	5344dab8eb	Org reader: parse `#+LANGUAGE` into `lang` metadata field Fixes: #6845	2020-11-22 12:53:05 +01:00
TEC	0306eec5fa	Replace org #+KEYWORDS with #+keywords. About two years ago, lowercase keywords became the standard (though, as always, they are handled case-insensitively): `13424336a6` Uppercase keywords are used only in the manual: - https://orgmode.org/list/871s50zn6p.fsf@nicolasgoaziou.fr/ - https://orgmode.org/list/87tuuw3n15.fsf@nicolasgoaziou.fr/	2020-11-18 14:48:56 +01:00
Albert Krewinkel	7f57546345	Fix remaining typos in tests See: #6738	2020-10-14 22:39:29 +02:00
Diego Balseiro	eda5540719	DOCX reader: Allow empty dates in comments and tracked changes (#6726) For security reasons, some legal firms delete the date from comments and tracked changes. * Make date optional (Maybe) in tracked changes and comments datatypes * Add tests	2020-10-06 21:03:00 -07:00
John MacFarlane	a59ae96062	Markdown reader: Set citationNoteNum accurately in citations. This also changes stateLastNoteNumber -> stateNoteNumber.	2020-09-21 10:10:37 -07:00
Christian Despres	a2d343420f	LaTeX reader: fix improper empty cell filtering (#6689)	2020-09-15 13:36:11 -07:00
Christian Despres	cae155b095	Fix hlint suggestions, update hlint.yaml (#6680) * Fix hlint suggestions, update hlint.yaml Most suggestions were redundant brackets. Some required LambdaCase. The .hlint.yaml file had a small typo, and didn't ignore camelCase suggestions in certain modules.	2020-09-13 07:48:14 -07:00
Laurent P. René de Cotret	482a2e5079	[Latex Reader] Fixing issues with \multirow and \multicolumn table cells (#6608) * Added test to replicate (#6596) * Table cell reader not consuming spaces correctly (#6596) * Prevented wrong nesting of \multicolumn and \multirow table cells (#6603) * Parse empty table cells (#6603) * Support full prototype for multirow macro (#6603) Closes #6603	2020-08-15 11:40:10 -07:00
Laurent P. René de Cotret	499fc11fca	[Latex Reader] Table cell parser not consuming spaces correctly (#6597) * Added test to replicate (#6596) * Table cell reader not consuming spaces correctly (#6596)	2020-08-07 22:45:47 -07:00
Laurent P. René de Cotret	8c3b5dd3ae	Col-span and row-span in LaTeX reader (#6470) Add multirow and multicolumn support in LaTeX reader. Partially addresses #6311.	2020-07-23 11:23:21 -07:00
Albert Krewinkel	ccf9889c2c	Org reader: respect tables-excluding export setting. Tables can be removed from the final document with the `#+OPTIONS: |:nil` export setting.	2020-07-01 09:28:24 +02:00
Albert Krewinkel	d6711bd7d9	Org reader: respect export setting disabling footnotes. Footnotes can be removed from the final document with the `#+OPTIONS: f:nil` export setting.	2020-06-30 22:30:15 +02:00
Albert Krewinkel	7c207c3051	Org reader: respect export setting which disables entities. MathML-like entities, e.g., `\alpha`, can be disabled with the `#+OPTIONS: e:nil` export setting.	2020-06-30 11:39:32 +02:00
Albert Krewinkel	5ef315cc6d	Org reader: keep unknown keyword lines as raw org. The lines of unknown keywords, like `#+SOMEWORD: value`, are no longer read as metadata, but kept as raw `org` blocks. This ensures that more information is retained when round-tripping org-mode files; additionally, this change makes it possible to support non-standard org extensions via filters.	2020-06-29 21:19:34 +02:00
Albert Krewinkel	90ac70c79c	Org reader: unify keyword handling. Handling of export settings and other keywords (like `#+LINK`) has been combined and unified.	2020-06-29 20:53:25 +02:00
Albert Krewinkel	1480606174	Org reader: support LATEX_HEADER_EXTRA and HTML_HEAD_EXTRA settings. These export settings are treated like their non-extra counterparts, i.e., the values are added to the `header-includes` metadata list.	2020-06-29 17:04:29 +02:00
Albert Krewinkel	d17b257c89	Org reader: allow multiple #+SUBTITLE export settings. The values of all lines are read as inlines and collected in the `subtitle` metadata field.	2020-06-29 17:03:33 +02:00
Albert Krewinkel	19175af811	JATS reader: parse abstract element into metadata field of same name (#6482) Closes: #6480	2020-06-28 10:35:50 -07:00
Albert Krewinkel	d2d5eb8a99	Org reader: read `#+INSTITUTE` values as text with markup. The value is stored in the `institute` metadata field and used in the default beamer presentation template.	2020-06-28 19:25:57 +02:00
Albert Krewinkel	b7a8620b43	Org tests: group export settings test for Org reader	2020-06-28 19:25:57 +02:00
Albert Krewinkel	e3a6d651e1	Org reader: update behavior of author, keywords export settings. The behavior of the `#+AUTHOR` and `#+KEYWORDS` export settings has changed: Org now allows multiple such lines and adds a space between the contents of each line. Pandoc now always parses these settings as meta inlines; setting values are no longer treated as comma-separated lists. Note that a Lua filter can be used to restore the previous behavior.	2020-06-28 18:01:30 +02:00
Albert Krewinkel	8dce28d949	Org reader: read description lines as inlines. `#+DESCRIPTION` lines are treated as text with markup. If multiple such lines are given, then all lines are read and separated by soft linebreaks. Closes: #6485	2020-06-27 09:11:00 +02:00
Albert Krewinkel	9e6e9a7221	Org reader: honor tex export option. The `tex` export option can be set with `#+OPTIONS: tex:nil` and allows three settings: - `t` causes LaTeX fragments to be parsed as TeX or added as raw TeX, - `nil` removes all LaTeX fragments from the document, and - `verbatim` treats LaTeX as text. The default is `t`. Closes: #4070	2020-06-25 20:31:33 +02:00
John MacFarlane	b1561d8e47	Use native Underline instead of Span in Jira	2020-06-22 17:55:57 -07:00
Albert Krewinkel	f5d7d41cbd	Recognize images with uppercase extensions Fixes: #6472	2020-06-20 18:14:18 +02:00
Vaibhav Sagar	9c2b659eeb	Support new Underline element in readers and writers (#6277). Deprecate `underlineSpan` in Shared in favor of `Text.Pandoc.Builder.underline`.	2020-04-28 07:53:06 -07:00
John MacFarlane	aff2500d46	More fixes for round-trip tests of HTML reader. We exclude tables that have default widths but non-simple content, as these can't really round-trip.	2020-04-19 17:21:19 -07:00


370 commits