5 commits

Author SHA1 Message Date
John MacFarlane
df0eecfc0e More accurate benchmark for normalize. 2010-12-30 15:32:34 -08:00
John MacFarlane
904050fa36 New HTML reader using tagsoup as a lexer. 2010-12-30 13:55:40 -08:00
* The new reader is faster and more accurate.

* API changes for Text.Pandoc.Readers.HTML:
   - removed rawHtmlBlock, anyHtmlBlockTag, anyHtmlInlineTag,
     anyHtmlTag, anyHtmlEndTag, htmlEndTag, extractTagType,
     htmlBlockElement, htmlComment
   - added htmlTag, htmlInBalanced, isInlineTag, isBlockTag, isTextTag

* tagsoup is a new dependency.

* Text.Pandoc.Parsing: Generalized type on readWith.

* Benchmark.hs: Added length calculation to force full evaluation.

* Updated HTML reader tests.

* Updated markdown and textile readers to use the functions from
  the HTML reader.

* Note: The markdown reader now correctly handles some cases it did not
  before. For example:

    <hr/>

  is reproduced without adding a space.

    <script>
      a = '<b>';
    </script>

  is parsed correctly.
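The tagsoup commit splits HTML handling into a lexer (tagsoup) and a parser layer that classifies tokens. A minimal sketch of that split, assuming the tagsoup package's real `parseTags` function and a hypothetical `isBlockTagSketch` classifier. It is only in the spirit of the reader's new `isBlockTag`; the function name and the abbreviated `blockNames` list are illustrative assumptions, not pandoc's code.

```haskell
import Data.Char (toLower)
import Text.HTML.TagSoup (Tag (..), parseTags)

-- Hypothetical, abbreviated set of block-level element names.
-- This is NOT pandoc's actual list; it only illustrates the idea.
blockNames :: [String]
blockNames =
  ["address", "blockquote", "div", "h1", "h2", "h3", "hr", "li", "ol", "p", "pre", "table", "ul"]

-- Hypothetical classifier over tagsoup tokens: an opening or closing
-- tag counts as block-level if its lowercased name is in blockNames.
-- Text, comments and other tokens are never block-level here.
isBlockTagSketch :: Tag String -> Bool
isBlockTagSketch (TagOpen name _) = map toLower name `elem` blockNames
isBlockTagSketch (TagClose name)  = map toLower name `elem` blockNames
isBlockTagSketch _                = False

main :: IO ()
main = do
  -- tagsoup does the lexing: raw HTML becomes a flat list of tags and
  -- text, with no attempt to build a tree.
  let tags = parseTags "<p>Hi <b>there</b></p>"
  print tags
  -- The parser layer then decides what each token means.
  print (map isBlockTagSketch tags)
```

For this input, only the `<p>` opening and closing tags are classified as block-level; the `<b>` tags and the two text tokens are not.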
John MacFarlane
87429ef2f2 Added normalize benchmark to Benchmark.hs. 2010-12-25 14:07:26 -08:00
John MacFarlane
77cb199d45 Benchmark: use nf for writers. 2010-12-12 23:24:02 -08:00
whnf gives inaccurate results.
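criterion's `whnf` evaluates a benchmark's result only to weak head normal form, while `nf` evaluates it to normal form through `NFData`. For a writer whose result is a `String`, WHNF stops at the first cons cell, so most of the rendering work can go unmeasured. The sketch below shows the difference without criterion. It uses `Control.Exception.evaluate` (WHNF) and `Control.DeepSeq.force` (normal form) as stand-ins for what the two benchmark functions do to a result; `rendered` is a hypothetical value playing the role of writer output.

```haskell
import Control.DeepSeq (force)
import Control.Exception (ErrorCall, evaluate, try)
import Data.Either (isRight)

-- Hypothetical stand-in for a writer's output: the first character is
-- available immediately, but the rest is a thunk that fails if forced.
rendered :: String
rendered = 'a' : error "rest of the output was never rendered"

-- WHNF (what criterion's whnf does to a result): only the outermost
-- (:) constructor is evaluated, so the failing tail is never touched.
whnfSucceeds :: IO Bool
whnfSucceeds = do
  r <- try (evaluate rendered) :: IO (Either ErrorCall String)
  pure (isRight r)

-- Normal form (what criterion's nf does, via NFData): the whole string
-- is traversed, so the failing tail is forced and the error is raised.
nfSucceeds :: IO Bool
nfSucceeds = do
  r <- try (evaluate (force rendered)) :: IO (Either ErrorCall String)
  pure (isRight r)

main :: IO ()
main = do
  whnfSucceeds >>= print -- True: evaluation stopped at the first cell
  nfSucceeds >>= print   -- False: the unrendered tail was reached
```

A benchmark built on `whnf` would, in the same way, time only the work needed to produce the first character of a writer's output.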
John MacFarlane
4c7f7853a7 Added Benchmark.hs, testing all readers + writers using criterion. 2010-12-10 23:35:31 -08:00
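Benchmark.hs itself is not shown in this log. A minimal sketch of the layout such a file can take with criterion, assuming `defaultMain`, `bgroup`, `bench` and `nf` from `Criterion.Main` (real criterion functions) and hypothetical `readSketch`/`writeSketch` functions standing in for pandoc's readers and writers, whose actual names and types differ:

```haskell
import Criterion.Main (bench, bgroup, defaultMain, nf)

-- Hypothetical document type and reader/writer stand-ins. They only
-- mimic the shape "String -> document" and "document -> String".
type Doc = [String]

readSketch :: String -> Doc
readSketch = lines

writeSketch :: Doc -> String
writeSketch = unlines

main :: IO ()
main = do
  -- Hypothetical sample input, repeated so each run does measurable work.
  let input = concat (replicate 1000 "Some text on one line.\n")
      doc = readSketch input
  defaultMain
    [ -- One group per direction; nf forces each result completely.
      bgroup "readers" [bench "sketch" (nf readSketch input)]
    , bgroup "writers" [bench "sketch" (nf writeSketch doc)]
    ]
```

When compiled and run, `defaultMain` times each benchmark and reports the results under the "readers" and "writers" groups; adding a format means adding one `bench` entry to each group.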