Commit graph

18 commits

Author SHA1 Message Date
John MacFarlane
fcbe1e95eb Moved 'macro' and 'applyMacros'' from markdown reader to Parsing. 2011-01-04 19:12:33 -08:00
John MacFarlane
904050fa36 New HTML reader using tagsoup as a lexer.
* The new reader is faster and more accurate.

* API changes for Text.Pandoc.Readers.HTML:
   - removed rawHtmlBlock, anyHtmlBlockTag, anyHtmlInlineTag,
     anyHtmlTag, anyHtmlEndTag, htmlEndTag, extractTagType,
     htmlBlockElement, htmlComment
   - added htmlTag, htmlInBalanced, isInlineTag, isBlockTag, isTextTag

* tagsoup is a new dependency.

* Text.Pandoc.Parsing: Generalized type on readWith.

* Benchmark.hs: Added length calculation to force full evaluation.

* Updated HTML reader tests.

* Updated markdown and textile readers to use the functions from
  the HTML reader.

* Note: The markdown reader now correctly handles some cases it did not
  before. For example:

    <hr/>

  is reproduced without adding a space.

    <script>
      a = '<b>';
    </script>

  is parsed correctly.
2010-12-30 13:55:40 -08:00
John MacFarlane
10d85f8b0b Use functions from Text.Pandoc.Generic instead of processWith(M). 2010-12-24 13:39:27 -08:00
John MacFarlane
543aa28c38 Added new prettyprinting module.
* Added Text.Pandoc.Pretty.
  This is better suited for pandoc than the 'pretty' package.
  One advantage is that we now get proper wrapping; Emph [Inline]
  is no longer treated as a big unwrappable unit. Previously
  we only got breaks for spaces at the "outer level." We can also
  more easily avoid doubled blank lines.  Performance is
  significantly better as well.

* Removed Text.Pandoc.Blocks.
  Text.Pandoc.Pretty allows you to define blocks and concatenate
  them.

* Modified markdown, RST, org writers to use Text.Pandoc.Pretty
  instead of Text.PrettyPrint.HughesPJ.

* Text.Pandoc.Shared:  Added writerColumns to WriterOptions.

* Markdown, RST, Org writers now break text at writerColumns.

* Added --columns command-line option, which sets stColumns
  and writerColumns.

* Table parsing:  If the size of the header > stColumns,
  use the header size as 100% for purposes of calculating
  relative widths of columns.
2010-12-17 13:39:17 -08:00
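
The table-width rule in the commit above can be sketched as follows. This is a minimal illustration, not pandoc's code: the helper name relativeWidths and the idea of measuring the header as the sum of the column widths in characters are assumptions made for the example.

```haskell
-- Hypothetical helper (not part of pandoc): relative column widths
-- as fractions of the line width.
--
-- columns   : the configured line width (the role played by stColumns)
-- colWidths : width of each header column, in characters (assumed measure)
relativeWidths :: Int -> [Int] -> [Double]
relativeWidths columns colWidths =
  map (\w -> fromIntegral w / fromIntegral total) colWidths
  where
    headerSize = sum colWidths
    -- If the header is wider than the configured width, the header
    -- size itself counts as 100%. 'max 1' only guards against
    -- division by zero on degenerate input.
    total = max 1 (max columns headerSize)
```

With a line width of 80, a header of widths [20, 20] fits and each column gets 0.25 of the width. A header of widths [60, 40] is wider than 80, so 100 is used as the denominator and the fractions sum to 1 instead of exceeding it.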
John MacFarlane
5770ceca36 Removed HTML sanitization.
This is better done on the resulting HTML; use the xss-sanitize library
for this.  xss-sanitize is based on pandoc's sanitization, but improves
it.

- Removed stateSanitize from ParserState.
- Removed --sanitize-html option.
2010-12-10 12:26:03 -08:00
John MacFarlane
33ba35da9f Smart punctuation: recognize entities.
Now &ldquo;Hi&rdquo; gets parsed as a Quoted DoubleQuote inline.
2010-12-07 20:44:43 -08:00
John MacFarlane
ace3b80f1e Smart punctuation: don't allow ellipses containing spaces.
Previously we allowed '. . .', ' . . . ', etc.  This caused
too many complications, and reduced authors' flexibility in
combining ellipses with spaces and periods.
2010-12-07 20:08:14 -08:00
John MacFarlane
50ca61ef49 Moved smartPunctuation from Markdown to Parsing.
+ Parameterized smartPunctuation on an inline parser.
+ Handle smartPunctuation in Textile reader.
2010-12-07 19:03:08 -08:00
John MacFarlane
5a4609584c Fix regression: markdown references should be case-insensitive.
This broke when we added the Key type.  We had assumed that
the custom case-insensitive Ord instance would ensure case-insensitive
matching, but that is not how Data.Map works.

* Added a test case for case-insensitivity in markdown-reader-more
* Removed old refsMatch from Text.Pandoc.Parsing module;
* hid the 'Key' constructor;
* dropped the custom Ord and Eq instances, deriving instead;
* added fromKey and toKey to convert between Keys and Inline lists;
* toKey ensures that keys are case-insensitive, since this is the
  only way the API provides to construct a Key.

Resolves Issue #272.
2010-12-05 19:27:00 -08:00
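
The approach described in the commit above (hide the constructor, derive Eq and Ord, normalize in toKey) can be sketched like this. It is a simplified illustration under stated assumptions: the real Key wraps a list of Inline elements, while this sketch uses String, and KeyTable and lookupRef are hypothetical names introduced for the example.

```haskell
import Data.Char (toLower)
import qualified Data.Map as M

-- In a real module the 'Key' constructor would not be exported, so
-- 'toKey' would be the only way to build a Key.
newtype Key = Key String
  deriving (Eq, Ord, Show)

-- Normalize on construction: every Key is stored lowercased, so the
-- derived Eq and Ord instances agree with each other.
toKey :: String -> Key
toKey = Key . map toLower

fromKey :: Key -> String
fromKey (Key s) = s

-- Hypothetical reference table: label -> URL.
type KeyTable = M.Map Key String

-- Lookup normalizes the label the same way, so matching is
-- case-insensitive without any custom instance.
lookupRef :: String -> KeyTable -> Maybe String
lookupRef label = M.lookup (toKey label)
```

A table built with toKey "Foo Bar" is found again by lookupRef "foo bar" or lookupRef "FOO BAR". Because normalization happens once at construction, no lookup depends on how a custom Ord instance interacts with Data.Map.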
John MacFarlane
23c6f56bc5 Removed CITEPROC CPP conditionals from library code.
By Cabal policy, the API should not change depending on flags.
2010-11-06 14:58:54 -07:00
John MacFarlane
6b722d1b45 Process LaTeX macros in markdown, and apply to TeX math.
Example:
\newcommand{\plus}[2]{#1 + #2}

$\plus{3}{4}$

yields:

3+4
2010-10-26 09:03:03 -07:00
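
The argument substitution behind the \plus example above can be sketched as follows. This covers only the step of filling #1, #2, ... in a macro body; parsing \newcommand and locating macro calls are left out. The function name substArgs is hypothetical, and this is not the implementation pandoc uses.

```haskell
import Data.Char (digitToInt, isDigit)

-- Replace each #n (n = 1..9) in a macro body with the n-th argument.
-- A '#' not followed by a valid argument number is left as it is.
substArgs :: [String] -> String -> String
substArgs args = go
  where
    go ('#' : d : rest)
      | isDigit d
      , let i = digitToInt d
      , i >= 1
      , i <= length args
      = args !! (i - 1) ++ go rest
    go (c : rest) = c : go rest
    go []         = []
```

For the macro in the commit message, substArgs ["3", "4"] "#1 + #2" gives the string "3 + 4". Spaces are not significant in TeX math mode, so this is the same math as the 3+4 shown above.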
John MacFarlane
0b23956d48 Parse \chapter{} in latex.
+ Added stateHasChapters to ParserState.
+ If a \chapter command is encountered, this is set to True
  and subsequent \section commands (etc.) will be bumped up
  one level.
2010-07-13 19:18:58 -07:00
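
The effect of the stateHasChapters flag can be sketched as a fold over the sectioning commands. The assumptions here are that \chapter maps to header level 1 and that "bumped up one level" means the header level of later commands increases by one. The Cmd type and the function names are hypothetical, not pandoc's.

```haskell
-- Hypothetical model of LaTeX sectioning commands.
data Cmd = Chapter | Section | Subsection | Subsubsection
  deriving (Eq, Show)

-- Header level of each command when no \chapter has been seen.
baseLevel :: Cmd -> Int
baseLevel Chapter       = 1
baseLevel Section       = 1
baseLevel Subsection    = 2
baseLevel Subsubsection = 3

-- Thread a "has chapters" flag through the document: once a
-- \chapter is seen, later commands are shifted down one level.
headerLevels :: [Cmd] -> [Int]
headerLevels = go False
  where
    go _ []                  = []
    go _ (Chapter : cs)      = 1 : go True cs
    go hasChapters (c : cs)  =
      (baseLevel c + fromEnum hasChapters) : go hasChapters cs
```

For the sequence [Section, Chapter, Section, Subsection] this gives levels [1, 1, 2, 3]: the first \section precedes any \chapter and keeps level 1, while those after the \chapter are shifted.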
John MacFarlane
0181e66250 Merge branch 'atlists'. Added auto-numbered example lists. 2010-07-11 22:47:52 -07:00
John MacFarlane
7d687684aa Allow language-neutral table captions.
+ Captions may now begin simply with ':', instead of 'Table:'
+ Captions may now appear either above or below the table.
+ Resolves Issue #227.
2010-07-06 21:02:26 -07:00
John MacFarlane
6a8fa53f6c More refactoring of grid table code. 2010-07-05 23:43:07 -07:00
John MacFarlane
ba19dff8af Minor reformatting. 2010-07-05 20:41:42 -07:00
John MacFarlane
869946114e Moved generic grid table functions from RST reader -> Parsing.
Here they can be used by the Markdown reader as well.
2010-07-05 14:34:48 -07:00
John MacFarlane
998fd098d0 Moved parsing functions from Text.Pandoc.Shared to new module.
+ Text.Pandoc.Parsing
2010-07-05 00:06:27 -07:00