pandoc

Author	SHA1	Message	Date
Erik Rask	82e8c29cb0	Include Header.Attr.attributes as XML attributes on section. Add key-value pairs found in the attributes list of Header.Attr as XML attributes on the corresponding section element. Any key name not allowed as an XML attribute name is dropped, as are keys with invalid values where they are defined as enums in DocBook, and xml:id (for DocBook 5)/id (for DocBook 4), so as not to interfere with computed identifiers.	2021-03-20 21:29:17 +01:00
John MacFarlane	ceadf33246	Tests: Use getExecutablePath from base... avoiding the need to depend on the executable-path package.	2021-03-19 23:35:47 -07:00
John MacFarlane	dc94601eb5	Tests: factor out setupEnvironment in Test.Helpers. This avoids code duplication between Command and Old.	2021-03-19 21:17:13 -07:00
John MacFarlane	2ca1b20a85	Fix finding of data files from test programs. Apparently Cabal sets a `pandoc_datadir` environment variable so that the data files will be sought in the source directory rather than in the final destination (where they aren't yet installed). So we no longer need to set `--data-dir` in the tests. We just need to make sure `pandoc_datadir` is set in the environment when we call the program in the test suite. This will fix the issue with loading of pandoc.lua when pandoc is built with `-embed_data_files`, reported in #7163. Closes #7163.	2021-03-19 18:57:13 -07:00
Albert Krewinkel	00e8d0678e	Jira reader: mark divs created from panels with class "panel". Closes: tarleb/jira-wiki-markup#2	2021-03-13 14:29:47 +01:00
Albert Krewinkel	a8aa301428	Jira writer: improve div/panel handling Include div attributes in panels, always render divs with class `panel` as panels, and avoid nesting of panels.	2021-03-13 12:10:02 +01:00
Albert Krewinkel	eb184d9148	Jira writer: use noformat instead of code for unknown languages. Code blocks that are not marked as a language supported by Jira are rendered as preformatted text with `{noformat}` blocks. Fixes: tarleb/jira-wiki-markup#4	2021-03-08 12:50:35 +01:00
Albert Krewinkel	e1454fe0d0	Jira writer: use Span identifiers as anchors Closes: tarleb/jira-wiki-markup#3.	2021-03-01 14:36:11 +01:00
John MacFarlane	12b47656d4	Remove superfluous imports.	2021-02-28 22:57:36 -08:00
John MacFarlane	7e38b8e55a	T.P.Readers.LaTeX: Don't export tokenize, untokenize. [API change] These were only exported for testing, which seems the wrong thing to do. They don't belong in the public API and are not really usable as they are, without access to the Tok type which is not exported. Removed the tokenize/untokenize roundtrip test. We put a quickcheck property in the comments which may be used when this code is touched (if it is).	2021-02-28 22:53:42 -08:00
Albert Krewinkel	00e4bb51e4	tests: print accurate location if a test fails Ensures that tasty-hunit reports the location of the failing test instead of the location of the helper `test` function.	2021-02-22 23:56:04 +01:00
John MacFarlane	80fde18fb1	Text.Pandoc.UTF8: change IO functions to return Text, not String. [API change] This affects `readFile`, `getContents`, `writeFileWith`, `writeFile`, `putStrWith`, `putStr`, `putStrLnWith`, `putStrLn`, `hPutStrWith`, `hPutStr`, `hPutStrLnWith`, `hPutStrLn`, `hGetContents`. This avoids the need to uselessly create a linked list of characters when emitting output.	2021-02-22 11:30:07 -08:00
Albert Krewinkel	743f7216de	Org reader: fix bug in org-ref citation parsing. The org-ref syntax allows listing multiple citations separated by commas. This fixes a bug that accepted commas as part of the citation id, so all citation lists were parsed as one single citation. Fixes: #7101	2021-02-18 21:59:18 +01:00
John MacFarlane	967e7f5fb9	Rename Text.Pandoc.XMLParser -> Text.Pandoc.XML.Light... ..and add new definitions isomorphic to xml-light's, but with Text instead of String. This allows us to keep most of the code in existing readers that use xml-light, but avoid lots of unnecessary allocation. We also add versions of the functions from xml-light's Text.XML.Light.Output and Text.XML.Light.Proc that operate on our modified XML types, and functions that convert xml-light types to our types (since some of our dependencies, like texmath, use xml-light). Update golden tests for docx and pptx. OOXML test: Use `showContent` instead of `ppContent` in `displayDiff`. Docx: Do a manual traversal to unwrap sdt and smartTag. This is faster, and needed to pass the tests. Benchmarks: A = prior to `8ca191604d` (Feb 8) B = as of `8ca191604d` (Feb 8) C = this commit \| Reader \| A \| B \| C \| \| ------- \| ----- \| ------ \| ----- \| \| docbook \| 18 ms \| 12 ms \| 10 ms \| \| opml \| 65 ms \| 62 ms \| 35 ms \| \| jats \| 15 ms \| 11 ms \| 9 ms \| \| docx \| 72 ms \| 69 ms \| 44 ms \| \| odt \| 78 ms \| 41 ms \| 28 ms \| \| epub \| 64 ms \| 61 ms \| 56 ms \| \| fb2 \| 14 ms \| 5 ms \| 4 ms \|	2021-02-16 16:55:20 -08:00
Albert Krewinkel	a3beed9db8	Org: support task_lists extension The tasks lists extension is now supported by the org reader and writer; the extension is turned on by default. Closes: #6336	2021-02-13 13:00:37 -08:00
Albert Krewinkel	8ffd4159d6	Jira: require jira-wiki-markup 1.3.3 * Modified the Doc parser to skip leading blank lines. This fixes parsing of documents which start with multiple blank lines. (#7095) * Prevent URLs within link aliases from being treated as autolinks. (#6944) Fixes: #7095 Fixes: #6944	2021-02-12 17:15:12 +01:00
John MacFarlane	8ca191604d	Add new unexported module T.P.XMLParser. This exports functions that use xml-conduit's parser to produce an xml-light Element or [Content]. This allows existing pandoc code to use a better parser without much modification. The new parser is used in all places where xml-light's parser was previously used. Benchmarks show a significant performance improvement in parsing XML-based formats (especially ODT and FB2). Note that the xml-light types use String, so the conversion from xml-conduit types involves a lot of extra allocation. It would be desirable to avoid that in the future by gradually switching to using xml-conduit directly. This can be done module by module. The new parser also reports errors, which we report when possible. A new constructor PandocXMLError has been added to PandocError in T.P.Error [API change]. Closes #7091, which was the main stimulus. These changes revealed the need for some changes in the tests. The docbook-reader.docbook test lacked definitions for the entities it used; these have been added. And the docx golden tests have been updated, because the new parser does not preserve the order of attributes. Add entity defs to docbook-reader.docbook. Update golden tests for docx.	2021-02-10 22:04:11 -08:00
Albert Krewinkel	d202f7eb77	Avoid unnecessary use of NoImplicitPrelude pragma (#7089 )	2021-02-07 10:02:35 -08:00
John MacFarlane	e6c7fcc598	Fixed some compiler warnings in tests.	2021-02-02 21:09:10 -08:00
Albert Krewinkel	61b108d527	Lua: add module "pandoc.path". The module allows working with file paths in a convenient and platform-independent manner. Closes: #6001 Closes: #6565	2021-02-02 21:04:30 -08:00
John MacFarlane	2415b2680a	Test suite: a more robust way of testing the executable. Many of our tests require running the pandoc executable. This is problematic for a few different reasons. First, cabal-install will sometimes run the test suite after building the library but before building the executable, which means the executable isn't in place for the tests. One can work around that by first building, then building and running the tests, but that's fragile. Second, we have to find the executable. So far, we've done that using a function findPandoc that attempts to locate it relative to the test executable (which can be located using findExecutablePath). But the logic here is delicate and does not work with every combination of options. To solve both problems, we add an `--emulate` option to the `test-pandoc` executable. When `--emulate` occurs as the first argument passed to `test-pandoc`, the program simply emulates the regular pandoc executable, using the rest of the arguments (after `--emulate`). Thus, `test-pandoc --emulate -f markdown -t latex` is just like `pandoc -f markdown -t latex`. Since all the work is done by library functions, implementing this emulation just takes a couple lines of code and should be entirely reliable. With this change, we can test the pandoc executable by running the test program itself (locatable using findExecutablePath) with the `--emulate` option. This removes the need for the fragile `findPandoc` step, and it means we can run our integration tests even when we're just building the library, not the executable. Part of this change involved simplifying some complex handling to set environment variables for dynamic library paths. I have tested a build with `--enable-dynamic-executable`, and it works, but further testing may be needed.	2021-02-02 20:36:51 -08:00
John MacFarlane	c841bcf3b0	Revert "Markdown reader: support GitHub wiki's internal links (#2923 ) (#6458 )" This reverts commit `6efd3460a7`. Since this extension is designed to be used with GitHub markdown (gfm), we need to implement the parser as a commonmark extension (commonmark-extensions), rather than in pandoc's markdown reader. When that is done, we can add it here.	2021-01-16 16:22:04 -08:00
Gautier DI FOLCO	6efd3460a7	Markdown reader: support GitHub wiki's internal links (#2923) (#6458) Changes overview: * Add a `Ext_markdown_github_wikilink` constructor to `Extension` [API change]. * Add the parser `githubWikiLink` in `Text.Pandoc.Readers.Markdown` * Add tests.	2021-01-16 16:15:33 -08:00
Albert Krewinkel	fe1378227b	Org reader: allow multiple pipe chars in todo sequences Additional pipe chars, used to separate "action" state from "no further action" states, are ignored. E.g., for the following sequence, both `DONE` and `FINISHED` are states with no further action required. #+TODO: UNFINISHED \| DONE \| FINISHED Previously, parsing of the todo sequence failed if multiple pipe chars were included. Closes: #7014	2021-01-09 13:40:31 +01:00
Albert Krewinkel	4f34345867	Update copyright notices for 2021 (#7012 )	2021-01-08 09:38:20 -08:00
Dimitri Sabadie	57b1094152	Org reader: mark verbatim code with class "verbatim". (#6998 ) * Replace org-mode’s verbatim from code to codeWith. This adds the `"verbatim"` class so that exporters can apply a specific style on it. For instance, it will be possible for HTML to add a CSS rule for code + verbatim class. * Alter test for org-mode’s verbatim change. See previous commit for further detail on the new implementation.	2021-01-03 08:57:47 +01:00
Albert Krewinkel	17e3efc785	Org reader: restructure output of captioned code blocks The Div wrapper of code blocks with captions now has the class "captioned-content". The caption itself is added as a Plain block inside a Div of class "caption". This makes it easier to write filters which match on captioned code blocks. Existing filters will need to be updated. Closes: #6977	2021-01-01 11:18:36 +01:00
Albert Krewinkel	8f402beab9	LaTeX writer: support colspans and rowspans in tables. (#6950 ) Note that the multirow package is needed for rowspans. It is included in the latex template under a variable, so that it won't be used unless needed for a table.	2020-12-20 18:04:54 -08:00
Albert Krewinkel	00031fc809	Docx writer: keep raw openxml strings verbatim. Closes: #6933	2020-12-13 14:09:59 +01:00
John MacFarlane	810df00cf5	Merge pull request #6922 from jtojnar/db-writer-admonitions Docbook writer: handle admonitions	2020-12-07 08:48:02 -08:00
Jan Tojnar	70c7c5703a	Docbook writer: Handle admonition titles from Markdown reader Docbook reader produces a `Div` with `title` class for `<title>` element within an “admonition” element. Markdown writer then turns this into a fenced div with `title` class attribute. Since fenced divs are block elements, their content is recognized as a paragraph by the Markdown reader. This is an issue for Docbook writer because it would produce an invalid DocBook document from such AST – the `<title>` element can only contain “inline” elements. Let’s handle this invalid special case separately by unwrapping the paragraph before creating the `<title>` element.	2020-12-07 07:28:39 +01:00
Jan Tojnar	dc6856530c	Docbook writer: handle admonitions Similarly to `d6fdfe6f2b`, we should handle admonitions.	2020-12-07 06:23:25 +01:00
Albert Krewinkel	acf932825b	Org reader: preserve targets of spurious links. Links with (internal) targets that the reader doesn't know about are converted into emphasized text. Information on the link target is now preserved by wrapping the text in a Span of class `spurious-link`, with an attribute `target` set to the link's original target. This allows filters to recover and fix broken or unknown links. See: #6916	2020-12-05 22:37:48 +01:00
Albert Krewinkel	0eedbd0a3d	HTML reader tests: disable round-trip testing for tables Information for cell alignment in a column is not preserved during round-trips.	2020-11-24 15:46:11 +01:00
Albert Krewinkel	5344dab8eb	Org reader: parse `#+LANGUAGE` into `lang` metadata field Fixes: #6845	2020-11-22 12:53:05 +01:00
Albert Krewinkel	d286242131	JATS writer: support advanced table features	2020-11-19 22:09:52 +01:00
TEC	0306eec5fa	Replace org #+KEYWORDS with #+keywords As of ~2 years ago, lower case keywords became the standard (though they are handled case insensitive, as always): `13424336a6` Upper case keywords are exclusive to the manual: - https://orgmode.org/list/871s50zn6p.fsf@nicolasgoaziou.fr/ - https://orgmode.org/list/87tuuw3n15.fsf@nicolasgoaziou.fr/	2020-11-18 14:48:56 +01:00
Aner Lucero	f63b76e169	Markdown writer: default to using ATX headings. Previously we used Setext (underlined) headings by default. The default is now ATX (`##` style). * Add the `--markdown-headings=atx\|setext` option. * Deprecate `--atx-headers`. * Add constructor `ATXHeadingInLHS` to `LogMessage` [API change]. * Support `markdown-headings` in defaults files. * Document new options in MANUAL. Closes #6662.	2020-11-14 21:33:32 -08:00
Albert Krewinkel	7f57546345	Fix remaining typos in tests See: #6738	2020-10-14 22:39:29 +02:00
John MacFarlane	a520181cdb	Use golden test framework for command tests. This means that `--accept` can be used to update expected output.	2020-10-07 22:33:44 -07:00
Diego Balseiro	eda5540719	DOCX reader: Allow empty dates in comments and tracked changes (#6726 ) For security reasons, some legal firms delete the date from comments and tracked changes. * Make date optional (Maybe) in tracked changes and comments datatypes * Add tests	2020-10-06 21:03:00 -07:00
Michael Hoffmann	74bd5a4f47	Docx writer: better handle list items whose contents are lists (#6522 ) If the first element of a bulleted or ordered list is another list, then that first item will disappear if the target format is docx. This changes the docx writer so that it prepends an empty string for those cases. With this, no items will disappear. Closes #5948.	2020-10-02 09:30:05 -07:00
John MacFarlane	a59ae96062	Markdown reader: Set citationNoteNum accurately in citations. This also changes stateLastNoteNumber -> stateNoteNumber.	2020-09-21 10:10:37 -07:00
Christian Despres	a2d343420f	LaTeX reader: fix improper empty cell filtering (#6689 )	2020-09-15 13:36:11 -07:00
Albert Krewinkel	34151e8da8	HTML writer: support intermediate table headers Closes: #6314	2020-09-13 23:23:11 +02:00
Christian Despres	cae155b095	Fix hlint suggestions, update hlint.yaml (#6680) Most suggestions were redundant brackets. Some required LambdaCase. The .hlint.yaml file had a small typo, and didn't ignore camelCase suggestions in certain modules.	2020-09-13 07:48:14 -07:00
Albert Krewinkel	a400d0dc62	HTML writer: render table footers if present Part of: #6314	2020-09-12 21:49:01 +02:00
Christian Despres	22babd5382	[API change] Rename Writers.Tables and its contents (#6679 ) Writers.Tables is now Writers.AnnotatedTable. All of the types and functions in it have had the "Ann" removed from them. Now it is expected that the module be imported qualified.	2020-09-12 08:50:36 -07:00
Albert Krewinkel	9423b4b7d9	Support colspans and rowspans in HTML tables (#6644 ) * HTML writer: add support for row headers, colspans, rowspans * Add planet table tests See #6312	2020-09-10 09:47:40 -07:00
Christian Despres	10c6c411f9	Add Writers.Tables helper functions and types, add tests for those (#6655) The Writers.Tables module contains an AnnTable type that is a pandoc Table with added inferred information that should be enough for writers (in particular the HTML writer) to operate on without having to lay out the table themselves. The toAnnTable and fromAnnTable functions in that module convert between AnnTable and Table. In addition to producing an AnnTable with coherent and well-formed annotations, the toAnnTable function also normalizes its input Table like the table builder does. Various tests ensure that toAnnTable normalizes tables exactly like the table builder, and that its annotations are coherent.	2020-09-05 14:36:51 -07:00

616 commits