Lists are parsed in linear instead of exponential time now.
Contents of block tags, such as <quote>, is parsed directly,
without storing it in a string and parsing with parseFromString.
Fixed a bug: headers did not terminate lists.
Muse allows indentation to indicate quotation or alignment,
but only on the top level, not within a <quote> or list.
This patch also simplifies the code by removing museInQuote
and museInList fields from the state structure.
Headers and indented paragraphs are attempted to be parsed
only at the topmost level, instead of aborting parsing with guards.
Text::Amuse already explicitly requires it anyway.
Supporting block tags on the same line as contents makes
it hard to combine closing tag parsers with indentation parsers.
Being able to combine parsers is required for no-reparsing refactoring
of Muse reader.
Blocks following paragraphs are parsed only once at the top level.
Lists still take exponential time to parse, but this time is not
doubled anymore when this list terminates paragraph.
These are based off the reader tests, with some removed (where the
reader output was identical, based on different docx inputs). There
are still more to be added. In particular, tests for custom-styles
need to be added.
All golden docx files have been checked in MS Word
2013 (windows). There is no corruption.
There is questionable output in the `tables` test: the three tables
seemed to be joined. This will be addressed in a future commit, and
the golden docx file will be changed.
This will allow us to compare files directly in a golden test. Times
are still based on IO, but we will be able to safely skip those.
Changes:
- `getUniqueId` now calls to the state to get an incremented digit,
instead of calling to P.uniqueHash.
- we always start the PRNG in mkNumbering/mkAbstractNum with the same
seed (1848), so our randoms should be the same each time.
Comments from `--track-changes=all` were producing corrupt docx,
because the writer was trying to get id from the `(ID,_,_)` field of
the attributes, and ignoring the "id" entry in the key-value pairs. We
now check both.
There is a larger conversation to be had about the right way to treat
"id" and "class" entries in kvs, but this fix will correctly interpret
the output of the docx reader work.