pandoc

Author	SHA1	Message	Date
John MacFarlane	0bdcf415e4	Switch from pretty-simple to pretty-show for native output. Update tests. Reason: it turns out that the native output generated by pretty-simple isn't always readable by the native reader. According to https://github.com/cdepillabout/pretty-simple/issues/99 it is not a design goal of the library that the rendered values be readable using 'read'. This makes it unsuitable for our purposes. pretty-show is a bit slower and it uses 4-space indents (non-configurable), but it doesn't have this serious drawback.	2021-09-28 21:17:53 -07:00
John MacFarlane	c266734448	Use pretty-simple to format native output. Previously we used our own homespun formatting. But this produces over-long lines that aren't ideal for diffs in tests. Easier to use something off-the-shelf and standard. Closes #7580. Performance is slower by about a factor of 10, but this isn't really a problem because native isn't suitable as a serialization format. (For serialization you should use json, because the reader is so much faster than native.)	2021-09-21 12:37:42 -07:00
John MacFarlane	8ca191604d	Add new unexported module T.P.XMLParser. This exports functions that uses xml-conduit's parser to produce an xml-light Element or [Content]. This allows existing pandoc code to use a better parser without much modification. The new parser is used in all places where xml-light's parser was previously used. Benchmarks show a significant performance improvement in parsing XML-based formats (especially ODT and FB2). Note that the xml-light types use String, so the conversion from xml-conduit types involves a lot of extra allocation. It would be desirable to avoid that in the future by gradually switching to using xml-conduit directly. This can be done module by module. The new parser also reports errors, which we report when possible. A new constructor PandocXMLError has been added to PandocError in T.P.Error [API change]. Closes #7091, which was the main stimulus. These changes revealed the need for some changes in the tests. The docbook-reader.docbook test lacked definitions for the entities it used; these have been added. And the docx golden tests have been updated, because the new parser does not preserve the order of attributes. Add entity defs to docbook-reader.docbook. Update golden tests for docx.	2021-02-10 22:04:11 -08:00
John MacFarlane	38c028bd50	JATS reader: fix parsing of figures. This ensures that a figure containing a single image is parsed as a pandoc "implicit figure" (i.e., a Para with a single Image whose title attribute begins with `fig:`). More complex figures will still be parsed as divs. Closes #5321.	2019-02-23 15:40:06 -07:00

4 commits