Commit graph

4 commits

Author SHA1 Message Date
John MacFarlane
8ca191604d Add new unexported module T.P.XMLParser.
This exports functions that uses xml-conduit's parser to
produce an xml-light Element or [Content].  This allows
existing pandoc code to use a better parser without
much modification.

The new parser is used in all places where xml-light's
parser was previously used.  Benchmarks show a significant
performance improvement in parsing XML-based formats
(especially ODT and FB2).

Note that the xml-light types use String, so the
conversion from xml-conduit types involves a lot
of extra allocation.  It would be desirable to
avoid that in the future by gradually switching
to using xml-conduit directly. This can be done
module by module.

The new parser also reports errors, which we report
when possible.

A new constructor PandocXMLError has been added to
PandocError in T.P.Error [API change].

Closes #7091, which was the main stimulus.

These changes revealed the need for some changes
in the tests.  The docbook-reader.docbook test
lacked definitions for the entities it used; these
have been added. And the docx golden tests have been
updated, because the new parser does not preserve
the order of attributes.

Add entity defs to docbook-reader.docbook.

Update golden tests for docx.
2021-02-10 22:04:11 -08:00
Mathieu Boespflug
bbf04df900
Docbook reader: implement <procedure> (#6442)
A `<procedure>` contains a sequence of `<step>`'s, or `<substeps>`
that themselves contain `<step>`'s.
2020-06-14 10:45:52 -07:00
Mathieu Boespflug
12a35dd0d0
Docbook: map <simplesect> to unnumbered section (#6436)
A <simplesect> is a section like any other, except that it never
contains an subsection, and is typically rendered unnumbered.
2020-06-14 10:40:00 -07:00
John MacFarlane
18ab864269 Moved tests/ -> test/. 2017-02-04 12:56:30 +01:00
Renamed from tests/docbook-reader.docbook (Browse further)