Commit graph

969 commits

Author SHA1 Message Date
John MacFarlane
060a76a38e Textile reader: Improved treatment of HTML spans (%).
Closes #1115.
2014-04-05 20:41:38 -07:00
John MacFarlane
75dbe87a99 Removed whitespace at ends of lines. 2014-04-05 20:31:27 -07:00
John MacFarlane
f2deb9d86d Org reader: Added type signature. 2014-04-05 15:50:46 -07:00
John MacFarlane
971dca588e Merge pull request #1219 from tarleb/org-images
Org-reader: support inline images, clean-up code, fix bugs
2014-04-05 15:12:40 -07:00
Albert Krewinkel
652c781e37 Org reader: Support inline images 2014-04-05 16:15:53 +02:00
Albert Krewinkel
d76d2b707b Org reader: Provide more language identifier translations
Org-mode and Pandoc use different language identifiers, marking source
code as being written in a certain programming language.  This adds more
translations from identifiers as used in Org to identifiers used in
Pandoc.

The full list of identifiers used in Org and Pandoc is available through
http://orgmode.org/manual/Languages.html and `pandoc -v`, respectively.
2014-04-05 16:15:50 +02:00
Albert Krewinkel
fd98532784 Org reader: Fix parsing of nested inlines
Text such as /*this*/ was not correctly parsed as a strong, emphasised
word.  This was due to the end-of-word recognition being to strict as it
did not accept markup chars as part of a word.  The fix involves an
additional parser state field, listing the markup chars which might be
parsed as part of a word.
2014-04-05 16:14:40 +02:00
Albert Krewinkel
d43c3e8101 Org reader: Use specialized org parser state
The default pandoc ParserState is replaced with `OrgParserState`.  This
is done to simplify the introduction of new state fields required for
efficient Org parsing.
2014-04-05 16:14:09 +02:00
Albert Krewinkel
7cf7e45e4c Org reader: Slight cleaning of table parsing code 2014-04-05 16:13:18 +02:00
John MacFarlane
ee2e769cd7 DocBook reader: Better treatment of formalpara.
We now emit the title (if present) as a separate paragraph
with boldface text.

Closes #1215.
2014-04-04 22:06:25 -07:00
John MacFarlane
e06aa97bb9 DocBook reader: set metadata "author" not "authors" 2014-04-04 21:39:08 -07:00
John MacFarlane
c5a0c7a819 Removed trailing whitespace. 2014-04-04 21:34:12 -07:00
John MacFarlane
c99ecf6cc5 DocBook reader: set "author" not "authors". 2014-04-04 21:33:16 -07:00
Matthew Pickering
1b930a9670 Added recognition of authorgroup element and releaseinfo element to DocBook reader.
Closes #1214
2014-04-04 21:00:49 -07:00
Matthew Pickering
b1c865ccc6 Converted current meta information parsing in DocBook to a more extensible version which is aware of the more recent meta representation. 2014-04-04 21:00:29 -07:00
John MacFarlane
4ee92dce0c MediaWiki reader: Fixed bug in certain nested lists.
The bug: If a level 2 list was followed by a level 1 list, the first
item of the level 1 list would be lost.

Closes #1213.
2014-04-01 10:36:23 -07:00
John MacFarlane
1cadba16eb HTML reader: idiomatic rewriting for clarity. 2014-04-01 10:10:46 -07:00
Matthew Pickering
5a51a67abd Changed the smart punctuation parser to return Inlines rather than an Inline element and updated files accordingly 2014-04-01 13:53:34 +01:00
Matthew Pickering
9b5d474e79 Converted HTML reader to use builder. Fixes #1162. 2014-04-01 13:44:19 +01:00
Matthew Pickering
0ccca94b4c Bugfix for #1175 and convert textile reader to use builder.
The reader did not correctly parse inline markup. The behavoir is now as follows.

(a) The markup must start at the start of a line, be inside previous
inline markup or be preceeded by whitespace.
(b) The markup can not span across paragraphs (delimited by \n\n)
(c) The markup can not be followed by a alphanumeric character.
(d) Square brackets can be placed around the markup to avoid having
to have white space before it.

In order to make these changes it was either necessary to convert the parser to return a list of inlines or to convert the whole reader to use the builder. The latter approach whilst more work makes a bit more sense as it becomes easy to arbitarily append and prepend elements without changing the type.

Tests are accordingly updated in a later commit to reflect the different normalisation behavoir specified by the builder monoid.
2014-04-01 13:44:06 +01:00
John MacFarlane
69a7c9f634 LaTeX reader: Better handling of figure and table with caption.
We now look for a \caption inside the environment; if one is
found, it is attached to the graphic or tabular found there.

Closes #1204.
2014-03-25 23:10:43 -07:00
John MacFarlane
994597f071 Revert "LaTeX reader: Added LPState."
This reverts commit 82ddec698e.
2014-03-25 22:40:18 -07:00
John MacFarlane
82ddec698e LaTeX reader: Added LPState.
Plan is to use this instead of ParserState in LP.
2014-03-25 15:38:30 -07:00
John MacFarlane
6992050161 Parsing: Added HasMacros, simplified other typeclasses.
Removed updateHeaderMap, setHeaderMap, getHeaderMap,
updateIdentifierList, setIdentifierList, getIdentifierList.
2014-03-25 14:55:18 -07:00
John MacFarlane
08d1404b31 API changes to HasReaderOptions, HasHeaderMap, HasIdentifierList.
Previously these were typeclasses of monads.  They've been changed
to be typeclasses of states.  This ismplifies the instance definitions
and provides more flexibility.

This is an API change!  However, it should be backwards compatible
unless you're defining instances of HasReaderOptions, HasHeaderMap,
or HasIdentifierList.  The old getOption function should work as
before (albeit with a more general type).

The function askReaderOption has been removed.
extractReaderOptions has been added.
getOption has been given a default definition.

In HasHeaderMap, extractHeaderMap and updateHeaderMap have been added.
Default definitions have been given for getHeaderMap, putHeaderMap,
and modifyHeaderMap.

In HasIdentifierList, extractIdentifierList and updateIdentifierList
have been added.  Default definitions have been given for
getIdentifierList, putIdentifierList, and modifyIdentifierList.

The ultimate goal here is to allow different parsers to use their
own, tailored parser states (instead of ParserState) while still
using shared functions.
2014-03-25 13:43:34 -07:00
John MacFarlane
5e69f845d5 LaTeX reader: Better handling of "table" environment.
Positioning options no longer rendered verbatim.
Partially addresses #1204.
2014-03-25 12:04:25 -07:00
John MacFarlane
b9cc29e15a Merge pull request #1068 from jaimeMF/mw-images-langs
MediaWiki reader: Accept image links in more languages
2014-03-24 10:39:49 -07:00
John MacFarlane
dd058b38b0 Markdown reader: Fixed regression on line breaks in strict mode.
Closes #1203.
2014-03-24 09:56:16 -07:00
Albert Krewinkel
24b2ac43b0 Add a simple Emacs Org-mode reader
The basic structure of org-mode documents is recognized; however,
org-mode features like todo markers, tags etc. are not supported yet.
2014-03-04 10:40:40 +01:00
John MacFarlane
4d0bf3c5d6 Markdown reader: Improved parsing of nested divs.
Formerly a closing div tag would be missed if it came right
after other block-level tags.
2014-02-26 22:53:12 -08:00
John MacFarlane
a208a972c3 Markdown parser: avoid backtracking when closing </div> not found. 2014-02-26 22:46:38 -08:00
John MacFarlane
581075a0ca Markdown reader: small efficiency improvement.
Switched `notFollewdBy' rawHtmlBlocks` ->
`notFollowedBy' (htmlTag isBlockTag)`, which is more
efficient.
2014-02-26 22:24:50 -08:00
John MacFarlane
69f7b1dbf3 Added readerTrace to ReaderOptions, --trace command line opt.
This is to debug backtracking-related parsing bugs.
So far it is only implemented for markdown, but it would
be good to extend it to latex and html readers.
2014-02-25 22:43:58 -08:00
John MacFarlane
a826d3936d Fixed bug in reference link parsing in markdown_mmd.
The bug was triggered by:

Link to [Google][]. Link to [twitter][].

[Google]: http://google.com
[twitter]: http://twitter.com
2014-02-21 17:32:02 -08:00
John MacFarlane
f3a062d5f9 Make rst figures true figures. Closes #1168.
Thanks to CasperVector.
2014-02-19 09:11:15 -08:00
Merijn Verstraaten
fe246ce01c Enhanced Pandoc's support for rST roles.
rST parser now supports:
    - All built-in rST roles
    - New role definition
    - Role inheritance

Issues/TODO:
    - Silently ignores illegal fields on roles
    - Silently drops class annotations for roles
    - Only supports :format: fields with a single format for :raw: roles,
      requires a change to Text.Pandoc.Definition.Format to support multiple
      formats.
    - Allows direct use of :raw: role, rST only allows indirect (i.e.,
      inherited use of :raw:).
2014-02-15 17:51:33 +01:00
John MacFarlane
3127ab2b5e Slight code reorganization in endline. 2014-02-04 10:05:52 -08:00
John MacFarlane
9f3b2f6f5d Fixed mediawiki ordered list parsing.
Closes #1122.
2014-01-22 22:07:13 -08:00
John MacFarlane
6c59f060a7 HTML reader: Fixed bug reading inline math with $$.
See #225.
2014-01-20 11:09:44 -08:00
John MacFarlane
1bc9cb4105 Merge pull request #974 from merijn/master
Added support for LaTeX style literate Haskell code blocks in rST.
2014-01-16 12:32:44 -08:00
John MacFarlane
b9b1546ed2 Markdown parser: be more permissive about citation keys.
Keys may now start with an underscore as well as a letter.
Underscores do not count as internal punctuation, but are
treated like alphanumerics, so "key:_2008" will work, as
it did not before.  (This change was necessary to use keys
generated by zotero.)

Closes #1111, closes #1011.
2014-01-09 11:25:24 -08:00
John MacFarlane
d9eff99f27 Markdown reader: Allow hard line breaks in table cells.
The \-newline form must be used; the two-space+newline form
won't work, since in a table cell nearly every line ends with
two spaces.
2014-01-07 23:39:49 -08:00
John MacFarlane
f3ee82373b HTML reader: Parse name/content pairs from meta tags as metadata.
Closes #1106.
2014-01-01 09:22:37 -08:00
Henry de Valence
3d70059a48 HLint: use fromMaybe
Replace uses of `maybe x id` with `fromMaybe x`.
2013-12-19 21:07:09 -05:00
Henry de Valence
f6d151889c HLint: redundant parens
Remove parens enclosing a single element.
2013-12-19 20:43:25 -05:00
Henry de Valence
0c5e7cf8cb HLint: use elem and notElem
Replaces long conditional chains with calls to `elem` and `notElem`.
2013-12-19 20:19:24 -05:00
John MacFarlane
0132f6fcb7 LaTeX reader: Support babel-style quoting: ` "..."' ``. 2013-12-17 16:03:43 -08:00
John MacFarlane
826443926f Docbook reader: Avoid failure if tbody contains no tr or row elements. 2013-12-16 13:58:54 -08:00
John MacFarlane
2f00f5c7c2 Properly handle script blocks in strict mode.
(That is, markdown-markdown_in_html_blocks.)
Previously a spurious `<p>` tag was being added.

Closes #1093.
2013-12-15 12:27:29 -08:00
Jeff Arnold
5adbe7b365 LaTeX reader: add support for Verb macro 2013-12-13 19:16:04 -05:00