
123 commits

Author SHA1 Message Date
Henry de Valence
0c5e7cf8cb HLint: use elem and notElem
Replaces long conditional chains with calls to `elem` and `notElem`.
2013-12-19 20:19:24 -05:00
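To illustrate the kind of rewrite HLint suggests here, the following sketch shows an equality chain replaced by `elem` and its negation replaced by `notElem`. The predicate names are hypothetical and are not taken from pandoc's source.

```haskell
-- Before: a chain of equality tests joined with (||).
isOpenerChain :: Char -> Bool
isOpenerChain c = c == '(' || c == '[' || c == '{'

-- After: the same test written with elem.
isOpener :: Char -> Bool
isOpener c = c `elem` "([{"

-- A negated chain (c /= .. && c /= ..) becomes notElem.
isPlain :: Char -> Bool
isPlain c = c `notElem` "([{"

main :: IO ()
main = print (map isOpener "a(b", all (\c -> isOpener c == isOpenerChain c) "a(b[{")
```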
John MacFarlane
def05d3504 HTML reader: Parse LaTeX math if appropriate options are set.
* Moved inlineMath, displayMath from Markdown reader to Parsing.
* Export them from Parsing.  (API change.)
* Generalize their types.
2013-12-06 17:15:13 -08:00
John MacFarlane
d5660275a3 Parsing: Generalized type of registerHeader, using new typeclasses.
New type classes HasReaderOptions, HasIdentifierList, HasHeaderMap.
These allow certain common functions to be reused even in parsers
that use custom state (instead of ParserState), such as the MediaWiki
reader.

Minor API bump.
2013-11-17 08:45:21 -08:00
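The following is a minimal sketch of the general pattern, assuming a simplified setting. A type class abstracts over "a state that tracks used identifiers", so one helper works with the default state and with a reader's custom state. The names `HasIds`, `getIds`, `modifyIds`, `registerId`, `DefaultState` and `WikiState` are all hypothetical. Pandoc's actual classes and `registerHeader` have different names and signatures.

```haskell
-- Hypothetical class: a parser state that can report and update
-- the identifiers already in use.
class HasIds s where
  getIds    :: s -> [String]
  modifyIds :: ([String] -> [String]) -> s -> s

-- Stand-in for the default state most readers share (hypothetical).
newtype DefaultState = DefaultState { dsIds :: [String] }

-- Stand-in for a reader's custom state with extra fields (hypothetical).
data WikiState = WikiState { wsIds :: [String], wsNesting :: Int }

instance HasIds DefaultState where
  getIds = dsIds
  modifyIds f s = s { dsIds = f (dsIds s) }

instance HasIds WikiState where
  getIds = wsIds
  modifyIds f s = s { wsIds = f (wsIds s) }

-- Written once against the class, usable with either state: pick the
-- first unused identifier (base, base-1, base-2, ...) and record it.
registerId :: HasIds s => String -> s -> (String, s)
registerId base st = (ident, modifyIds (ident :) st)
  where
    used = getIds st
    candidates = base : [base ++ "-" ++ show n | n <- [1 :: Int ..]]
    ident = case filter (`notElem` used) candidates of
              (c:_) -> c
              []    -> base  -- unreachable: the candidate list is infinite

main :: IO ()
main = do
  let (a, s1) = registerId "intro" (DefaultState [])
      (b, _)  = registerId "intro" s1
      (c, _)  = registerId "intro" (WikiState ["intro"] 2)
  print (a, b, c)
```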
John MacFarlane
6ed41fdfcc Factored out registerHeader from markdown reader, added to Parsing.
Text.Pandoc.Parsing now exports registerHeader, which can be
used in other readers.
2013-09-01 08:54:10 -07:00
John MacFarlane
af786829a0 Parsing: Added stateMeta' to ParserState.
2013-08-18 16:22:56 -07:00
John MacFarlane
12e7ec4070 Added Text.Pandoc.Compat.TagSoupEntity.
This allows pandoc to compile with tagsoup 0.13.x.
Thanks to Dirk Ullrich for the patch.
2013-08-08 10:42:52 -07:00
John MacFarlane
5050cff37c Removed comment that chokes recent cpp.
Closes #933.
2013-08-03 23:16:54 -07:00
John MacFarlane
e973bbbbc8 Markdown reader: Better error messages for yaml headers.
2013-07-02 09:23:43 -07:00
John MacFarlane
f869f7e08d Use new flexible metadata type.
* Depend on pandoc 1.12.
* Added yaml dependency.
* `Text.Pandoc.XML`: Removed `stripTags`.  (API change.)
* `Text.Pandoc.Shared`:  Added `metaToJSON`.
  This will be used in writers to create a JSON object for use
  in the templates from the pandoc metadata.
* Revised readers and writers to use the new Meta type.
* `Text.Pandoc.Options`: Added `Ext_yaml_title_block`.
* Markdown reader:  Added support for YAML metadata block.
  Note that it must come at the beginning of the document.
* `Text.Pandoc.Parsing.ParserState`:  Replace `stateTitle`,
  `stateAuthors`, `stateDate` with `stateMeta`.
* RST reader:  Improved metadata.
  Treat initial field list as metadata when standalone specified.
  Previously ALL fields "title", "author", "date" in field lists
  were treated as metadata, even if not at the beginning.
  Use `subtitle` metadata field for subtitle.
* `Text.Pandoc.Templates`:  Export `renderTemplate'` that takes a string
  instead of a compiled template.
* OPML template:  Use 'for' loop for authors.
* Org template: '#+TITLE:' is inserted before the title.
  Previously the writer did this.
2013-06-24 20:29:41 -07:00
John MacFarlane
a578a490ee Parsing: Generalized state type on readWith.
2013-06-24 20:27:36 -07:00
John MacFarlane
7cb8b60910 Parsing: Better error reporting in readWith.
- Specialize readWith to String input.
- On error have it print the line in which the error occurred,
  with a caret pointing to the column.
- This should help diagnose parsing problems in LaTeX especially.
2013-03-28 22:20:05 -07:00
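The error display described above (the offending line, then a caret under the column) can be sketched as a pure function of the input and a 1-based line/column position, as Parsec's `SourcePos` reports it. `errorContext` is a hypothetical helper. This sketch assumes the line contains no tabs, and it does not show how `readWith` itself obtains the position.

```haskell
-- Render the source line at a 1-based (line, column) position with a
-- caret under the column. Returns "" if the line does not exist.
errorContext :: String -> Int -> Int -> String
errorContext src line col
  | line < 1 = ""
  | otherwise = case drop (line - 1) (lines src) of
      (l:_) -> l ++ "\n" ++ replicate (col - 1) ' ' ++ "^"
      []    -> ""

main :: IO ()
main = putStrLn (errorContext "\\begin{itemize}\n\\item one\n\\end{enumerate}" 3 6)
```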
John MacFarlane
ee0fc19bc5 Parsing: Further improvements to uri parser.
Don't treat punctuation before percent-encoding as final punctuation.
Don't treat '+' as final punctuation.
2013-03-28 11:33:01 -07:00
John MacFarlane
099b4b7769 Mediawiki: Fixed regression for <ref>URL</ref>.
`<` is no longer allowed in URLs, according to the uri parser
in Text.Pandoc.Parsing.

Added a test case.
2013-03-28 09:54:02 -07:00
John MacFarlane
07e8cedf2b Make implicit_header_references work with explicit header ids.
(Markdown reader.)
2013-02-21 19:53:35 -08:00
John MacFarlane
cc410a71b5 Allow & in emails (for entities).
Added tests for entities in titles and links.
Closes #723.
2013-02-15 23:02:17 -08:00
John MacFarlane
59764fa388 Parsing: uri, email: resolve entities.
A markdown link `<http://g&ouml;ogle.com>` should
be a link to http://göogle.com.
2013-02-15 22:39:49 -08:00
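This sketch shows what "resolving entities" in a URI means, using the example from the commit message. The entity table is a deliberately tiny, hypothetical stand-in for the full HTML entity list. `resolveEntities` is an illustrative name, not pandoc's function.

```haskell
import Data.Char (chr, isDigit)

-- Tiny hypothetical entity table; '\246' is 'ö'.
namedEntities :: [(String, Char)]
namedEntities =
  [("amp", '&'), ("lt", '<'), ("gt", '>'), ("quot", '"'), ("ouml", '\246')]

-- Replace &name; and decimal &#NNN; references. Anything that is not
-- a recognised reference is left untouched.
resolveEntities :: String -> String
resolveEntities ('&':rest) =
  case break (== ';') rest of
    (name, ';':after)
      | Just c <- lookup name namedEntities -> c : resolveEntities after
      | '#':ds@(_:_) <- name, all isDigit ds, length ds <= 7
      , let n = read ds, n <= 0x10FFFF -> chr n : resolveEntities after
    _ -> '&' : resolveEntities rest
resolveEntities (c:cs) = c : resolveEntities cs
resolveEntities [] = []

main :: IO ()
main = print (resolveEntities "http://g&ouml;ogle.com")
```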
John MacFarlane
a6c167125f Optimized oneOfStringsCI.
The call to toLower in ciMatch was very expensive (and very often
used), because toLower from Data.Char calls a fully unicode
aware function.  This optimization avoids the call to toLower
for the most common, ASCII cases.  This dramatically reduces the
speed penalty that comes from enabling the `autolink_bare_uris`
extension.  The penalty is still substantial (in one test, from 0.33s
to 0.44s), but nowhere near what it used to be.
2013-02-02 18:46:10 -08:00
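The optimization idea described above can be sketched as a fast path: ASCII letters are lowercased with arithmetic, and the Unicode-aware `Data.Char.toLower` is called only for non-ASCII characters. `fastToLower` and `ciPrefix` are hypothetical names. This illustrates the technique and does not reproduce pandoc's `ciMatch`.

```haskell
import Data.Char (toLower)

-- ASCII fast path; fall back to the Unicode-aware toLower otherwise.
fastToLower :: Char -> Char
fastToLower c
  | c >= 'A' && c <= 'Z' = toEnum (fromEnum c + 32)
  | c < '\128'           = c
  | otherwise            = toLower c

-- Case-insensitive prefix test built on the fast path.
ciPrefix :: String -> String -> Bool
ciPrefix pat s = map fastToLower pat == map fastToLower (take (length pat) s)

main :: IO ()
main = print (ciPrefix "HTTP" "http://example.com", fastToLower '\201')
```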
John MacFarlane
8c55023d18 Fixed latex macro parsing.
Now latex macro definitions are preserved when output is latex,
and applied when it is another format, as originally intended.

Partially addresses #730.
\providecommand is still not supported.  For this we need changes
to texmath.
2013-01-28 10:50:58 -08:00
John MacFarlane
f989ff2d5d Parsing: More improvements to the anyLine parser.
2013-01-25 18:32:06 -08:00
John MacFarlane
d27dc6a420 More anyLine tweaks: Use incSourceLine.
2013-01-25 17:59:57 -08:00
John MacFarlane
0801b120b9 anyLine: Set position properly.
2013-01-25 17:53:50 -08:00
John MacFarlane
4c74b7aaab Parsing: Much faster new version of anyLine.
Not only faster but uses less memory.
2013-01-25 15:32:10 -08:00
John MacFarlane
c4b93bc3e7 Fixed bug in uri parser.
The bug prevented an autolink at the end of a string (e.g.
at the end of a line block line) from counting as a link.

Closes #711.
2013-01-20 20:23:50 -08:00
John MacFarlane
bf3a911a1c Changed Ext_autolink_urls -> Ext_autolink_bare_uris.
Added tests.
2013-01-15 12:44:50 -08:00
John MacFarlane
5971721ec1 Case-insensitive parsing of URI schemes.
2013-01-15 11:48:21 -08:00
John MacFarlane
95c02f6b57 Parsing: Improve oneOfStrings, export oneOfStringsCI.
oneOfStrings will now take the longest match it can in a
list of strings, so if 'foo' and 'foobar' are both included,
'foobar' will match even if 'foo' is first in the list.
2013-01-15 11:47:35 -08:00
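The longest-match behaviour described above can be shown on plain `String`s. Pandoc's `oneOfStrings` is a parser combinator, so this is a simplified sketch rather than its implementation. `longestMatch` is a hypothetical name. It returns the longest candidate that prefixes the input, independent of list order.

```haskell
import Data.List (isPrefixOf, maximumBy)
import Data.Ord (comparing)

-- Longest candidate that is a prefix of the input, plus the rest of
-- the input; the order of the candidate list does not matter.
longestMatch :: [String] -> String -> Maybe (String, String)
longestMatch cands input =
  case filter (`isPrefixOf` input) cands of
    []      -> Nothing
    matches -> let m = maximumBy (comparing length) matches
               in Just (m, drop (length m) input)

main :: IO ()
main = print (longestMatch ["foo", "foobar"] "foobarbaz")
```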
John MacFarlane
e0e36ce543 Revised URI parser.
* It no longer uses Network.URI's URI parser, which is too restrictive
  (not allowing unicode URIs unless encoded).
* It allows many more schemes.
* It better handles punctuation so as to avoid capturing trailing
  punctuation in bare URLs.
2013-01-15 10:52:02 -08:00
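One way to avoid capturing trailing punctuation in a bare URL is to peel it off the end of the candidate match. The sketch below makes several assumptions. The punctuation set is assumed, and `'+'` is left out as the 2013-03-28 commit specifies. The rule that a closing paren stays only when it balances an opening paren in the URL is an assumption that the commit does not state. `splitTrailing` is a hypothetical name.

```haskell
-- Assumed set of characters treated as sentence punctuation when
-- they end a bare URL ('+' deliberately excluded).
sentencePunct :: String
sentencePunct = ".,;:!?'\""

-- Split a candidate URL into (url, trailing punctuation). A final ')'
-- is kept only if it closes a '(' that appears earlier in the URL.
splitTrailing :: String -> (String, String)
splitTrailing s = go (reverse s) ""
  where
    count ch = length . filter (== ch)
    go (c:rs) acc
      | c `elem` sentencePunct = go rs (c : acc)
      | c == ')'
      , let prefix = reverse rs
      , count '(' prefix <= count ')' prefix = go rs (c : acc)
    go rs acc = (reverse rs, acc)

main :: IO ()
main = print (splitTrailing "http://en.wikipedia.org/wiki/Foo_(bar)).")
```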
John MacFarlane
51e0bd277a Parsing: Fixed uri -- escape unicode URLs.
Otherwise Network.URI.parseURI fails on e.g. Chinese
URLs.  Changed an incorrect test in markdown-reader-more.
2013-01-14 17:38:34 -08:00
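Escaping a Unicode URL here means percent-encoding the UTF-8 bytes of each non-ASCII character while leaving ASCII, including reserved characters, alone. The sketch below is hand-rolled to avoid a dependency. It shows the encoding itself and makes no claim about which function pandoc called. `utf8Bytes` and `escapeNonAscii` are hypothetical names.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (intToDigit, ord, toUpper)

-- UTF-8 encoding of a single code point, as byte values.
utf8Bytes :: Char -> [Int]
utf8Bytes ch
  | n < 0x80    = [n]
  | n < 0x800   = [0xC0 .|. (n `shiftR` 6), cont n]
  | n < 0x10000 = [0xE0 .|. (n `shiftR` 12), cont (n `shiftR` 6), cont n]
  | otherwise   = [ 0xF0 .|. (n `shiftR` 18), cont (n `shiftR` 12)
                  , cont (n `shiftR` 6), cont n ]
  where
    n = ord ch
    cont x = 0x80 .|. (x .&. 0x3F)  -- continuation byte 10xxxxxx

-- Percent-encode only non-ASCII characters.
escapeNonAscii :: String -> String
escapeNonAscii = concatMap esc
  where
    esc c | ord c < 0x80 = [c]
          | otherwise    = concatMap hex (utf8Bytes c)
    hex b = ['%', toUpper (intToDigit (b `shiftR` 4)), toUpper (intToDigit (b .&. 0xF))]

main :: IO ()
main = putStrLn (escapeNonAscii "http://example.com/\20013")
```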
John MacFarlane
127851ea61 Parsing: Simplified and improved singleQuoteStart.
This makes 's', 'l', etc. parse properly.
Formerly we had some English-centric heuristics, but they
are no longer needed now that we keep track of the last
'Str' position in state.

Closes #698.
2013-01-14 16:06:45 -08:00
John MacFarlane
0598cf0fee Moved lineBlockLines to Parsing.
This will be used by both RST and markdown readers.
2013-01-13 11:39:32 -08:00
John MacFarlane
cf4cd2ccb0 More improvements in emailAddress parser.
2013-01-09 21:32:42 -08:00
John MacFarlane
a71641a2a0 Made email parser more correct.
Now it's based on RFC 822, though it still doesn't implement
quoted strings in email addresses.
2013-01-09 17:19:32 -08:00
John MacFarlane
d599c4cdab Added Attr field to Header.
Previously header ids were autogenerated by the writers.
Now they are generated (unless supplied explicitly) in the
markdown parser, if the `header_identifiers` extension is
selected.

In addition, the textile reader now supports id attributes on
headers.
2013-01-09 09:30:05 -08:00
John MacFarlane
ef806f6a99 Markdown reader: Warn about duplicate link references.
2013-01-04 12:01:09 -08:00
John MacFarlane
7ef07ea3fc Added stateWarnings.
It is not connected to anything yet.
2013-01-03 20:52:51 -08:00
John MacFarlane
c435e9cda7 Implemented Ext_header_identifiers, Ext_implicit_header_references.
Now by default pandoc will act as if link references have been defined
for all headers.  So, you can do this:

    # My header

    Link to [My header].
    Another link to [it][My header].

Closes #691.
2013-01-03 20:35:01 -08:00
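To make the example above concrete, the following sketch treats every header as if it had defined a link reference pointing at its identifier. `headerId` is a simplified approximation of the identifier rules pandoc's manual documents: keep alphanumerics, `_`, `-` and `.`; turn whitespace into hyphens; lowercase; drop everything before the first letter; fall back to `section`. It works on plain strings rather than inlines and ignores duplicate headers. `headerId`, `implicitRefs` and `resolveRef` are hypothetical names.

```haskell
import Data.Char (isAlpha, isAlphaNum, isSpace, toLower)
import Data.List (intercalate)
import qualified Data.Map as M

-- Simplified header identifier.
headerId :: String -> String
headerId txt =
  let kept  = [ toLower c | c <- txt, isAlphaNum c || isSpace c || c `elem` "_-." ]
      ident = dropWhile (not . isAlpha) (intercalate "-" (words kept))
  in if null ident then "section" else ident

-- Implicit references: header text (case-folded) -> "#identifier".
implicitRefs :: [String] -> M.Map String String
implicitRefs hs = M.fromList [ (map toLower h, '#' : headerId h) | h <- hs ]

-- Look up a link label case-insensitively, as reference labels are.
resolveRef :: M.Map String String -> String -> Maybe String
resolveRef refs label = M.lookup (map toLower label) refs

main :: IO ()
main = print (resolveRef (implicitRefs ["My header"]) "My header")
```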
John MacFarlane
2695434113 Fixed bug in withRaw.
Didn't correctly handle case where nothing is parsed.
2012-12-13 19:04:01 -08:00
John MacFarlane
1b68dc3405 Revert "Added stateWarnings to ParserState, added warning function."
This reverts commit 5419b504ce.
2012-10-05 19:38:43 -07:00
John MacFarlane
5419b504ce Added stateWarnings to ParserState, added warning function.
This will be used to provide warnings for things like duplicate
footnote refs and link refs.
2012-10-05 19:25:26 -07:00
John MacFarlane
93e92a4716 Renamed removeLeadingTrailingSpace to trim.
Also removeLeadingSpace to triml,
removeTrailingSpace to trimr.
2012-09-29 17:09:34 -04:00
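The renamed functions have the semantics their names suggest. The sketch below uses `Data.Char.isSpace` as the whitespace predicate, which is an assumption: pandoc's own definitions may treat a different set of characters as space.

```haskell
import Data.Char (isSpace)

-- Drop leading whitespace (formerly removeLeadingSpace).
triml :: String -> String
triml = dropWhile isSpace

-- Drop trailing whitespace (formerly removeTrailingSpace).
trimr :: String -> String
trimr = reverse . triml . reverse

-- Drop both (formerly removeLeadingTrailingSpace).
trim :: String -> String
trim = triml . trimr

main :: IO ()
main = print (trim "  a b \n", triml "  x ", trimr " x  ")
```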
John MacFarlane
7633d51971 Parsing: Changed type of stateSubstitutions to use Inlines.
2012-09-27 16:44:49 -07:00
John MacFarlane
35662e14a9 Removed nullBlock.
Don't use nullBlock in Textile reader.  Better to know about parsing
problems than to skip stuff when we get stuck.
2012-09-27 16:06:29 -07:00
John MacFarlane
1be27ffb3a Added stateSubstitutions to ParserState, use for RST substitutions.
2012-09-27 15:20:29 -07:00
John MacFarlane
12045d84b6 Revert "More intelligent handling of text encodings."
This reverts commit 7272735b3d.
2012-09-23 22:53:34 -07:00
John MacFarlane
7272735b3d More intelligent handling of text encodings.
Previously, UTF-8 was enforced for both input and output.

The new system:

* For input, UTF-8 is tried first; if an error is raised, the
  locale encoding is tried.
* For output, the locale encoding is always used.
2012-09-23 22:12:21 -07:00
John MacFarlane
f67333696b Revert "Use local encoding for input/output rather than forcing UTF8."
This reverts commit c69837adb6.
2012-09-23 12:33:17 -07:00
John MacFarlane
c69837adb6 Use local encoding for input/output rather than forcing UTF8.
Note that system templates are stored as UTF8
and will still be read as such, even if the local encoding
is different.  Text downloaded from URLs will also be treated
as UTF-8.
2012-09-23 11:01:33 -07:00
John MacFarlane
167012daf7 Export 'nested' in Parsing.
2012-09-12 08:45:03 -07:00
John MacFarlane
58a096c058 Text.Pandoc.Parsing: Handle trailing slash in 'uri'.
2012-09-12 08:45:03 -07:00
John MacFarlane
7fc804ed22 Parsing: Generalized type of withQuoteContext.
2012-09-09 18:12:18 -07:00