125 commits

Author SHA1 Message Date
John MacFarlane
58a096c058 Text.Pandoc.Parsing: Handle trailing slash in 'uri'. 2012-09-12 08:45:03 -07:00
John MacFarlane
7fc804ed22 Parsing: Generalized type of withQuoteContext. 2012-09-09 18:12:18 -07:00
John MacFarlane
dfa4b76630 Changes to literate haskell options.
- Removed writerLiterateHaskell from WriterOptions.
- Removed readerLiterateHaskell from ReaderOptions.
- Added Ext_literate_haskell to Extensions.  Test for this
  instead of the above.
- Removed failUnlessLHS from Shared.

Note:  At this point, +lhs and the .lhs extension no longer have any effect.
Need to fix.
2012-08-08 23:18:19 -07:00
John MacFarlane
33fd791ea1 Made F a newtype, moved definitions to Parser.
Parser now exports F(..), askF, asksF, runF.
2012-08-02 17:12:20 -07:00
John MacFarlane
a1677b612b Parsing: removed duplication of Key and Key'.
Now we just use the former Key' (string contents),
renamed Key.  lookupKeySrc and fromKey are no longer
exported.  Key', toKey' and KeyTable' have become Key,
toKey, and KeyTable.
2012-08-01 22:40:07 -07:00
John MacFarlane
fadc7b0d87 Major rewrite of markdown reader.
* Use Builder's Inlines/Blocks instead of lists.

* Return values in the reader monad, which are then
  run (at the end of parsing) against the final
  parser state.  This allows links, notes, and
  example numbers to be resolved without a second
  parser pass.

* An effect of using Builder is that everything is
  normalized automatically.

* New exports from Text.Pandoc.Parsing:
  widthsFromIndices, NoteTable', KeyTable', Key', toKey',
  withQuoteContext, singleQuoteStart, singleQuoteEnd, doubleQuoteStart,
  doubleQuoteEnd, ellipses, apostrophe, dash

* Updated opendocument tests.

* Don't derive Show for ParserState.

* Benchmarks:  markdown reader takes 82% of the time it took before.
  Markdown writer takes 92% of the time (here the speedup is probably
  due to the fact that everything is normalized by default).
2012-08-01 21:45:40 -07:00
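
The deferred-resolution idea described in fadc7b0d87 (return values that are run against the final parser state) can be sketched roughly as follows. This is a minimal illustration using only base, not the reader's actual code: `FinalState`, `Deferred`, `runDeferred` and `refLink` are hypothetical names, and the real state holds far more than a key table.

```haskell
module Main where

-- Hypothetical stand-in for the final parser state: reference keys and URLs.
newtype FinalState = FinalState { stateKeys :: [(String, String)] }

-- A value that still needs the final state before it can be produced.
newtype Deferred a = Deferred { runDeferred :: FinalState -> a }

instance Functor Deferred where
  fmap f (Deferred g) = Deferred (f . g)

instance Applicative Deferred where
  pure x = Deferred (const x)
  Deferred f <*> Deferred g = Deferred (\st -> f st (g st))

-- A reference link whose target is looked up only when the value is run.
refLink :: String -> String -> Deferred String
refLink label key = Deferred $ \st ->
  case lookup key (stateKeys st) of
    Just url -> "<a href=\"" ++ url ++ "\">" ++ label ++ "</a>"
    Nothing  -> "[" ++ label ++ "][" ++ key ++ "]"

main :: IO ()
main = do
  -- Both links are built before any definitions are known.
  let joinTwo a b = a ++ " and " ++ b
      doc   = joinTwo <$> refLink "home" "h" <*> refLink "lost" "x"
      final = FinalState [("h", "http://example.com")]
  -- Resolution happens once, against the final state.
  putStrLn (runDeferred doc final)
```

Because a `Deferred` value is just a function of the final state, a reference defined later in the document resolves without re-parsing the text.
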
John MacFarlane
973c7ecacf Removed commented-out pandoc2 code.
This will be developed in a branch, noreparsing.
2012-07-27 21:04:38 -07:00
John MacFarlane
c76ef95308 Parser: Changed types to use type alias Parser, not Parsec. 2012-07-27 20:50:03 -07:00
John MacFarlane
6d7f0a1b81 Fixed whitespace errors. 2012-07-26 22:32:53 -07:00
John MacFarlane
14c911ba06 Parsing: Removed failIfStrict. 2012-07-26 22:20:44 -07:00
John MacFarlane
5186da929d Parsing: Added guardEnabled, guardDisabled. 2012-07-26 19:10:56 -07:00
John MacFarlane
2654da3823 Moved stateApplyMacros, stateIndentedCodeClasses to ReaderOptions. 2012-07-25 22:05:06 -07:00
John MacFarlane
070b968ae0 stateCitations -> readerCitations. 2012-07-25 22:05:06 -07:00
John MacFarlane
856aa8c244 Moved stateLiterateHaskell to readerLiterateHaskell in Options. 2012-07-25 22:05:06 -07:00
John MacFarlane
1dba82f25e Got rid of stateStandalone, which was hardly used anyway.
The only possible effect will be with rst fragments that
begin with an rst title block, which will now cause the
header transform.
2012-07-25 20:08:42 -07:00
John MacFarlane
95570ba34c Moved stateOldDashes to readerOldDashes in ReaderOptions. 2012-07-25 12:37:04 -07:00
John MacFarlane
335cd5de4d Moved stateTabStop to readerTabStop in ReaderOptions. 2012-07-25 12:31:16 -07:00
John MacFarlane
0d4424c21c Moved stateColumns to readerColumns in ReaderOptions. 2012-07-25 11:51:33 -07:00
John MacFarlane
ef0619cc6d Moved ParseRaw from ParserState to ReaderOptions. 2012-07-25 11:43:56 -07:00
John MacFarlane
8b380a464e Text.Pandoc.Parsing: Added getOption. 2012-07-25 11:27:25 -07:00
John MacFarlane
dfa19061ab Options -> ReaderOptions.
Better to keep reader and writer options separate.
2012-07-25 11:08:06 -07:00
John MacFarlane
da3702357d Put smart, strict in separate options field in state.
This is the beginning of a larger transition that will make
Options, not ParserState, the parameter of the read functions.
(Options will also be used in writers, in place of WriterOptions.)

Next step is to remove strict, replacing it with granular
tests for different extensions.
2012-07-25 10:45:45 -07:00
John MacFarlane
fbd3d2b450 Better algorithm for oneOfStrings.
This goes character by character, not backtracking.
2012-07-24 22:45:22 -07:00
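
The character-by-character approach mentioned in fbd3d2b450 can be sketched on plain strings. This is an assumption about the general technique, not the committed Parsec code: `matchOneOf` is a hypothetical helper that narrows the candidate list one input character at a time and prefers the longest match.

```haskell
module Main where

-- | Match one of the candidate strings at the start of the input.
-- Each step consumes one character and keeps only the candidates
-- that agree with it, so no candidate is retried from the start.
matchOneOf :: [String] -> String -> Maybe (String, String)
matchOneOf _ [] = Nothing
matchOneOf cands (c:rest) =
  case [xs | (x:xs) <- cands, x == c] of
    []     -> Nothing
    cands' ->
      case matchOneOf cands' rest of
        -- A longer candidate matched further along the input.
        Just (m, rest') -> Just (c : m, rest')
        Nothing
          -- Some candidate ends exactly here, so stop with this match.
          | "" `elem` cands' -> Just ([c], rest)
          | otherwise        -> Nothing

main :: IO ()
main = do
  print (matchOneOf ["--", "---", "..."] "---x")  -- Just ("---","x")
  print (matchOneOf ["--", "---", "..."] "-x")    -- Nothing
```
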
John MacFarlane
bab816cefe Refactored table parsers, captions now not part of core tableWith. 2012-07-24 09:06:13 -07:00
John MacFarlane
d2cc56a46a Revised code for pipe tables.
* All tables now require at least one body row.
* Renamed from 'extra' to 'pipe' tables.
* Moved functions from Parsing to Readers.Markdown.
* Cleaned up code; revised to parse in one pass rather than
  parsing a raw string, splitting it, and parsing the components.
* Allow pipe tables without pipes on the ends (as PHP Markdown Extra
  does).
2012-07-22 22:09:15 -07:00
John MacFarlane
511f5e891d Merge pull request #510 from mytskine/markdown-extra
Markdown extra tables [part of the multi-markdown syntax for tables]
2012-07-22 18:40:18 -07:00
John MacFarlane
2c30c48757 Use Parser as type synonym for Parsec. 2012-07-20 15:54:57 -07:00
John MacFarlane
5085962c28 Text.Pandoc.Parsing: Export all Parsec functions used in pandoc code.
No other module directly imports Parsec.  This will make it easier
to change the parsing backend in the future, if we want to.
2012-07-20 14:41:44 -07:00
John MacFarlane
a4c28ead79 Use Text.Parsec instead of Text.ParserCombinators.Parsec. 2012-07-20 14:19:06 -07:00
John MacFarlane
2351f7a112 Provide Data.Default instances for ParserState and WriterOptions.
Now you can use def (which is re-exported by Text.Pandoc) instead of
defaultParserState or defaultWriterOptions.  For now, these
are still defined too, so existing code need not change.

Closes #546.
2012-07-19 12:38:54 -07:00
John MacFarlane
9d5230c0f6 Changed macro parser so it returns raw macro if stateApplyMacros false.
Closes #554.
2012-06-29 18:30:53 -07:00
paul.rivier
7b111542c0 Textile reader improvements: better conformance to RedCloth Textile inlines 2012-04-24 15:56:59 +02:00
Greg Maslov
618dc294f9 Add parsing support for the rST default-role directive. 2012-03-24 21:48:54 -04:00
François Gannaz
a922bd6d8e Added support for markdown-extra tables in the markdown parser
Only tables whose lines begin with a "|" are supported.
There are 2 warnings about unused variables when compiling.
2012-02-21 22:00:10 +01:00
John MacFarlane
7a602d222f Limit nesting of strong/emph.
This avoids exponential lookahead in parasitic cases, like
a**a*a**a*a**a*a**a*a**a*a**a*a**a*a**.

Added stateMaxNestingLevel to ParserState.

We set this to 6, so you can still have Emph inside Emph, just not
indefinitely.
2012-02-07 22:46:41 -08:00
John MacFarlane
521e90e839 Parsing: Make characterReference fail if entity not found. 2012-02-05 23:01:35 -08:00
John MacFarlane
e2c157f86f Removed module Text.Pandoc.CharacterReferences.
Moved characterReference parser to Text.Pandoc.Parsing.
decodeCharacterReferences is now replaced by fromEntities
in Text.Pandoc.XML.
2012-02-05 22:52:00 -08:00
John MacFarlane
75485c2f11 Complete rewrite of LaTeX reader.
* The new reader is more robust, accurate, and extensible.
  It is still quite incomplete, but it should be easier
  now to add features.

* Text.Pandoc.Parsing: Added withRaw combinator.

* Markdown reader: do escapedChar before raw latex inline.
  Otherwise we capture commands like \{.

* Fixed latex citation tests for new citeproc.

* Handle \include{} commands in latex.
  This is done in pandoc.hs, not the (pure) latex reader.
  But the reader exports the needed function, handleIncludes.

* Moved err and warn from pandoc.hs to Shared.

* Fixed tests - raw tex should sometimes have trailing space.

* Updated lhs-test for highlighting-kate changes.
2012-02-04 09:56:43 -08:00
John MacFarlane
ff93a8e789 Fixed table parsing with wide or combining characters.
Closes #348.  Closes #108.
2012-01-27 00:39:00 -08:00
John MacFarlane
da8425598a New treatment of dashes in --smart mode.
* `---` is always em-dash, `--` is always en-dash.
* pandoc no longer tries to guess when `-` should be en-dash.
* A new option, `--old-dashes`, is provided for legacy documents.

Rationale: The rules for en-dash are too complex and
language-dependent for a guesser to work reliably.  This
change gives users greater control.  The alternative of
using unicode isn't very good, since unicode em- and en-
dashes are barely distinguishable in a monospace font.
2012-01-01 13:48:28 -08:00
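
The fixed dash rule from da8425598a can be illustrated with a small string rewrite. This is only a sketch of the rule as stated in the message; `smartDashes` is a hypothetical name, and the `--old-dashes` behaviour is not modelled.

```haskell
module Main where

-- | Apply the fixed rule: three hyphens become an em-dash (U+2014),
-- two hyphens become an en-dash (U+2013), and a single hyphen is kept.
smartDashes :: String -> String
smartDashes ('-':'-':'-':rest) = '\x2014' : smartDashes rest
smartDashes ('-':'-':rest)     = '\x2013' : smartDashes rest
smartDashes (c:rest)           = c : smartDashes rest
smartDashes []                 = []

main :: IO ()
main = do
  print (smartDashes "a--b"  == ['a', '\x2013', 'b'])  -- True
  print (smartDashes "a---b" == ['a', '\x2014', 'b'])  -- True
  print (smartDashes "a-b"   == "a-b")                 -- True
```
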
John MacFarlane
925a4c5164 Better smart quote parsing.
* Added stateLastStrPos to ParserState. This lets us keep track
  of whether we're parsing the position immediately after a 'str'.
  If we encounter a ' in such a location, it must be an apostrophe,
  and can't be a single quote start.

* Set this in the markdown, textile, html, and rst str parsers.

* Closes #360.
2011-12-29 23:44:12 -08:00
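
The rule in 925a4c5164 (a ' immediately after a str cannot start a single quote) can be sketched as below. This is a simplification under stated assumptions: the commit records a source position in stateLastStrPos, while this sketch carries a Boolean flag, and `QuoteRole` and `classify` are hypothetical names.

```haskell
module Main where

import Data.Char (isAlphaNum)

-- Whether a given ' may open a single-quoted span.
data QuoteRole = CannotOpen | MayOpen
  deriving (Show, Eq)

-- | Walk the input, remembering whether the previous character was part
-- of a word.  A ' right after a word is classified as CannotOpen.
classify :: String -> [QuoteRole]
classify = go False
  where
    go _ [] = []
    go afterStr (c:cs)
      | c == '\'' = (if afterStr then CannotOpen else MayOpen) : go False cs
      | otherwise = go (isAlphaNum c) cs

main :: IO ()
main = print (classify "it's 'quoted'")
-- [CannotOpen,MayOpen,CannotOpen]
```
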
John MacFarlane
a579e2c892 Replaced Apostrophe, Ellipses, EmDash, EnDash w/ unicode strings. 2011-12-27 15:45:34 -08:00
John MacFarlane
8f1da35917 Pretty: return Str with unicode instead of Apostrophe. 2011-12-27 11:01:10 -08:00
John MacFarlane
fa255f68ba Parsing: Removed charsInBalanced', added param to charsInBalanced.
The extra parameter is a character parser.  This is needed for
proper handling of escapes, etc.
2011-12-05 20:54:46 -08:00
John MacFarlane
7b971517b0 Parsing: Changed type of escaped to return Char 2011-12-05 20:22:27 -08:00
John MacFarlane
2d14c9b436 Added nonspaceChar to Text.Pandoc.Parsing. 2011-07-30 18:08:02 -07:00
John MacFarlane
0f0c1579f8 Smart quotes: handle '...hi' properly.
Also added test case.
2011-07-25 23:49:45 -07:00
John MacFarlane
6424e7d02c Properly handle characters in the 128..159 range.
These aren't valid in HTML, but many HTML files produced by
Windows tools contain them.  We substitute correct unicode
characters.
2011-07-23 12:43:01 -07:00
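
The substitution described in 6424e7d02c can be sketched as a table lookup. The message does not list which characters are substituted, so the Windows-1252 mapping below is an assumption (and only a partial one); `cp1252` and `fixControlChar` are hypothetical names.

```haskell
module Main where

import Data.Char (ord)
import Data.Maybe (fromMaybe)

-- Partial Windows-1252 table for code points in the 128..159 range
-- (assumed mapping; the commit message does not spell it out).
cp1252 :: [(Int, Char)]
cp1252 =
  [ (0x80, '\x20AC')  -- euro sign
  , (0x85, '\x2026')  -- horizontal ellipsis
  , (0x91, '\x2018')  -- left single quotation mark
  , (0x92, '\x2019')  -- right single quotation mark
  , (0x93, '\x201C')  -- left double quotation mark
  , (0x94, '\x201D')  -- right double quotation mark
  , (0x96, '\x2013')  -- en dash
  , (0x97, '\x2014')  -- em dash
  , (0x99, '\x2122')  -- trade mark sign
  ]

-- | Replace a character found in the table; leave everything else alone.
fixControlChar :: Char -> Char
fixControlChar c = fromMaybe c (lookup (ord c) cp1252)

main :: IO ()
main = do
  let input = ['\x93', 'h', 'i', '\x94']
  print (map ord input)                     -- [147,104,105,148]
  print (map (ord . fixControlChar) input)  -- [8220,104,105,8221]
```
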
John MacFarlane
9ff589359f Revert "Parsing: Use new type aliases, PandocParser, GeneralParser."
This reverts commit ec5410bc4e.
2011-04-29 11:34:36 -07:00
John MacFarlane
ec5410bc4e Parsing: Use new type aliases, PandocParser, GeneralParser.
This should make it easier to change the types later.
2011-04-29 11:32:24 -07:00