8ca191604d
This exports functions that use xml-conduit's parser to produce an xml-light Element or [Content]. This allows existing pandoc code to use a better parser without much modification. The new parser is used in all places where xml-light's parser was previously used.

Benchmarks show a significant performance improvement in parsing XML-based formats (especially ODT and FB2). Note that the xml-light types use String, so the conversion from xml-conduit types involves a lot of extra allocation. It would be desirable to avoid that in the future by gradually switching to using xml-conduit directly. This can be done module by module.

The new parser also reports errors, which we pass on to the user when possible. A new constructor PandocXMLError has been added to PandocError in T.P.Error [API change]. Closes #7091, which was the main stimulus.

These changes revealed the need for some changes in the tests. The docbook-reader.docbook test lacked definitions for the entities it used; these have been added. The docx golden tests have also been updated, because the new parser does not preserve the order of attributes.

Add entity defs to docbook-reader.docbook. Update golden tests for docx.
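The conversion the commit message describes can be sketched roughly as follows. This is a minimal illustration, not pandoc's actual module: it assumes the `xml-conduit` and `xml` (xml-light) packages are available, and the names `toQName`, `toLightElement`, `toLightContent`, and `parseLight` are hypothetical. It shows where the `Text` to `String` allocation comes from. Because xml-conduit stores attributes in a `Map`, it also suggests one plausible reason attribute order is not preserved.

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Sketch only: assumes the xml-conduit and xml (xml-light) packages.
-- All function names below are hypothetical, chosen for illustration.
module Main where

import Control.Exception (displayException)
import qualified Data.Map as M
import Data.Maybe (mapMaybe)
import qualified Data.Text as T
import qualified Data.Text.Lazy as TL
import qualified Text.XML as C              -- xml-conduit
import qualified Text.XML.Light.Types as L  -- xml-light

-- Convert an xml-conduit name (Text fields) to an xml-light QName (String fields).
toQName :: C.Name -> L.QName
toQName (C.Name local ns pref) =
  L.QName (T.unpack local) (T.unpack <$> ns) (T.unpack <$> pref)

-- Convert an element recursively. Attributes come out of a Map,
-- so they appear in key order rather than source order.
toLightElement :: C.Element -> L.Element
toLightElement (C.Element name attrs nodes) =
  L.Element (toQName name)
            [ L.Attr (toQName k) (T.unpack v) | (k, v) <- M.toList attrs ]
            (mapMaybe toLightContent nodes)
            Nothing

-- Keep elements and text; drop comments and processing instructions.
toLightContent :: C.Node -> Maybe L.Content
toLightContent (C.NodeElement e) = Just (L.Elem (toLightElement e))
toLightContent (C.NodeContent t) =
  Just (L.Text (L.CData L.CDataText (T.unpack t) Nothing))
toLightContent _ = Nothing

-- Parse with xml-conduit and report failures as a message
-- instead of silently returning nothing.
parseLight :: TL.Text -> Either String L.Element
parseLight t =
  case C.parseText C.def t of
    Left err  -> Left (displayException err)
    Right doc -> Right (toLightElement (C.documentRoot doc))

main :: IO ()
main =
  case parseLight "<fig id=\"fig-1\"><caption><p>bar</p></caption></fig>" of
    Left err -> putStrLn ("XML error: " ++ err)
    Right el -> putStrLn (L.qName (L.elName el))
```

Every `T.unpack` call above builds a fresh `String`, which is the extra allocation the message mentions. A module that consumed `C.Element` directly would avoid it.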
24 lines
484 B
Markdown
```
% pandoc -f jats -t native
<fig id="fig-1">
  <caption>
    <p>bar</p>
  </caption>
  <graphic xlink:href="foo.png" xlink:alt-text="baz" />
</fig>
^D
[Para [Image ("fig-1",[],[]) [Str "bar"] ("foo.png","fig:")]]
```

```
% pandoc -f jats -t native
<fig id="fig-1">
  <caption>
    <title>foo</title>
    <p>bar</p>
  </caption>
  <graphic xlink:href="foo.png" xlink:alt-text="baz" />
</fig>
^D
[Para [Image ("fig-1",[],[]) [Str "foo",LineBreak,Str "bar"] ("foo.png","fig:")]]
```