pandoc

Author	SHA1	Message	Date
John MacFarlane	8ca191604d	Add new unexported module T.P.XMLParser. This exports functions that uses xml-conduit's parser to produce an xml-light Element or [Content]. This allows existing pandoc code to use a better parser without much modification. The new parser is used in all places where xml-light's parser was previously used. Benchmarks show a significant performance improvement in parsing XML-based formats (especially ODT and FB2). Note that the xml-light types use String, so the conversion from xml-conduit types involves a lot of extra allocation. It would be desirable to avoid that in the future by gradually switching to using xml-conduit directly. This can be done module by module. The new parser also reports errors, which we report when possible. A new constructor PandocXMLError has been added to PandocError in T.P.Error [API change]. Closes #7091, which was the main stimulus. These changes revealed the need for some changes in the tests. The docbook-reader.docbook test lacked definitions for the entities it used; these have been added. And the docx golden tests have been updated, because the new parser does not preserve the order of attributes. Add entity defs to docbook-reader.docbook. Update golden tests for docx.	2021-02-10 22:04:11 -08:00
Arfon Smith	5847624124	Update JATS dtd (#6020 ) The current DTD for the JATS writer template is for Journal Publishing (JATS-journalpublishing1.dtd), which does not permit ext-link as a valid child (https://jats.nlm.nih.gov/publishing/tag-library/1.1/element/publisher-name.html). This update modifies the default output template to be the less restrictive JATS archiving and interchange DTD which systems like PubMed use internally to represent their articles.	2019-12-30 22:18:55 -08:00
Nokome Bentley	7d193b2aad	Remove extraneous, significant whitespace in JATS writer output (#4335 ) This patch fixes some cases where the JATS writer was introducing semantically significant whitespace by indenting and wrapping tags. Note that the JATS spec has a content model for `<p>` tags of `(#PCDATA \| ...`. Any tag where `#PCDATA` children are possible should not have any indentation. The same is true for `<th>`, `<td>`, `<term>`, `<label>`.	2018-03-05 09:44:34 -08:00
John MacFarlane	6b63b63f30	JATS reader: process author metadata.	2017-12-23 10:03:13 -08:00
Hamish Mackenzie	5d3c9e5646	Add Basic JATS reader based on DocBook reader	2017-12-20 13:54:02 +13:00

5 commits