Commit graph

229 commits

Author SHA1 Message Date
John MacFarlane
328655e863 Tracing: give less misleading line information with parseWithString.
Previously positions would be reported past the end of the chunk.
We now reset the source position within the chunk and report
positions "in chunk."
2017-06-19 22:41:09 +02:00
Herwig Stuetz
5a71632d11 Parsing: many1Till: Check for the end condition before parsing
By not checking for the end condition before the first parse, the
parser was applied too often, consuming too much of the input.

This fixes the behaviour of

  `testStringWith (many1Till (oneOf "ab") (string "aa")) "aaa"`

which before incorrectly returned `Right "a"`. With this change, it
instead correctly fails with `Left (PandocParsecError ...)` because it
is not able to parse at least one occurence of `oneOf "ab"` that is
not `"aa"`.

Note that this only affects `many1Till p end` where `p` matches on a
prefix of `end`.
2017-05-28 18:08:11 +02:00
John MacFarlane
8f2c803f97 Markdown reader: warn for notes defined but not used.
Closes #1718.

Parsing.ParserState: Make stateNotes' a Map, add stateNoteRefs.
2017-05-25 11:34:51 +02:00
John MacFarlane
bc6aac7b47 Parsing: Provide parseFromString'.
This is a verison of parseFromString specialied to
ParserState, which resets stateLastStrPos at the end.
This is almost always what we want.

This fixes a bug where `_hi_` wasn't treated as emphasis in
the following, because pandoc got confused about the
position of the last word:

    - [o] _hi_

Closes #3690.
2017-05-24 22:41:47 +02:00
Albert Krewinkel
5debb0da0f Shared: Provide custom isURI that rejects unknown schemes [isURI]
We also export the set of known `schemes`.

The new function replaces the function of the same name
from `Network.URI`, as the latter did not check whether a scheme is
well-known.  E.g. MediaWiki wikis frequently feature pages with names
like `User:John`. These links were interpreted as URIs, thus turning
internal links into global links. This is prevented by also checking
whether the scheme of a URI is frequently used (i.e. is IANA registered
or an otherwise well-known scheme).

Fixes: #2713

Update set of well-known URIs from IANA list
All official IANA schemes (as of 2017-05-22) are included in the set of
known schemes.  The four non-official schemes doi, isbn, javascript, and
pmid are kept.
2017-05-23 09:48:11 +02:00
Alexander Krotov
30a3deadcc Move indentWith to Text.Pandoc.Parsing (#3687) 2017-05-22 10:10:15 +02:00
John MacFarlane
377733e08f Merge pull request #3677 from labdsf/anylinenewline
Move anyLineNewline to Parsing.hs
2017-05-17 12:47:03 +02:00
Alexander Krotov
55ce47d050 Move anyLineNewline to Parsing.hs 2017-05-17 11:02:38 +03:00
Albert Krewinkel
9d295f4527
Parsing: add insertIncludedFilesF which returns F blocks
The `insertIncludeFiles` function was generalized and renamed to
`insertIncludedFiles'`; the specialized versions are based on that.
2017-05-14 12:40:16 +02:00
Albert Krewinkel
5ff6108b4c
Parsing: introduce HasIncludeFiles type class
The `insertIncludeFile` function is generalized to work with all parser
states which are instances of that class.
2017-05-14 10:00:58 +02:00
Albert Krewinkel
7a17c3eb9f
Parsing: replace partial with total function
Calling `tail` on an empty list raises an exception, while calling the
otherwise equivalent `drop 1` will return the empty list again.
2017-05-14 09:28:08 +02:00
Albert Krewinkel
965f1ddd4a
Update dates in copyright notices
This follows the suggestions given by the FSF for GPL licensed software.
<https://www.gnu.org/prep/maintain/html_node/Copyright-Notices.html>
2017-05-13 23:30:13 +02:00
Albert Krewinkel
4b9fb7a128 Combine grid table parsers
The grid table parsers for markdown and rst was combined into one single
parser, slightly changing parsing behavior of both parsers:

- The markdown parser now compactifies block content cell-wise: pure
  text blocks in cells are now treated as paragraphs only if the cell
  contains multiple paragraphs, and as plain blocks otherwise. Before,
  this was true only for single-column tables.

- The rst parser now accepts newlines and multiple blocks in header
  cells.

Closes: #3638
2017-05-11 00:17:56 +02:00
Albert Krewinkel
df23d96c89
Generalize tableWith, gridTableWith
The parsing functions `tableWith` and `gridTableWith` are generalized to
work with more parsers. The parser state only has to be an instance of
the `HasOptions` class instead of requiring a concrete type. Block
parsers are required to return blocks wrapped into a monad, as this
makes it possible to use parsers returning results wrapped in `Future`s.
2017-05-02 23:41:45 +02:00
Albert Krewinkel
31caa616a9 Provide shared F monad functions for Markdown and Org readers
The `F` monads used for delayed evaluation of certain values in the
Markdown and Org readers are based on a shared data type capturing the
common pattern of both `F` types.
2017-04-30 10:59:20 +02:00
John MacFarlane
bcc848d773 Avoid parsing "Notes:**" as a bare URI.
This avoids parsing bare URIs that start with a scheme
+ colon + `*`, `_`, or `]`.

Closes #3570.
2017-04-15 13:32:28 +02:00
John MacFarlane
6bf3f89d69 Better handling of \part in LaTeX.
Closes #1905.

Removed stateChapters from ParserState.

Now we parse chapters as level 0 headers, and parts as level -1 headers.
After parsing, we check for the lowest header level, and if it's
less than 1 we bump everything up so that 1 is the lowest header level.
So `\part` will always produce a header; no command-line options
are needed.
2017-03-13 22:11:10 +01:00
John MacFarlane
2f8f8f0da6 Issue warning for duplicate header identifiers.
As noted in the previous commit, an autogenerated identifier
may still coincide with an explicit identifier that is given
for a header later in the document, or with an identifier on
a div, span, link, or image. This commit adds a warning
in this case, so users can supply an explicit identifier.

* Added `DuplicateIdentifier` to LogMessage.
* Modified HTML, Org, MediaWiki readers so their custom
  state type is an instance of HasLogMessages.  This is necessary
  for `registerHeader` to issue warnings.

See #1745.
2017-03-12 22:07:28 +01:00
John MacFarlane
c8b906256d Improved behavior of auto_identifiers when there are explicit ids.
Previously only autogenerated ids were added to the list
of header identifiers in state, so explicit ids weren't taken
into account when generating unique identifiers.  Duplicated
identifiers could result.

This simple fix ensures that explicitly given identifiers are
also taken into account.

Fixes #1745.

Note some limitations, however.  An autogenerated identifier
may still coincide with an explicit identifier that is given
for a header later in the document, or with an identifier on
a div, span, link, or image.  Fixing this would be much more
difficult, because we need to run `registerHeader` before
we have the complete parse tree (so we can't get a complete
list of identifiers from the document by walking the tree).

However, it might be worth issuing warnings for duplicate
header identifiers; I think we can do that.  It is not
common for headers to have the same text, and the issue
can always be worked around by adding explicit identifiers,
if the user is aware of it.
2017-03-12 21:30:04 +01:00
John MacFarlane
21ae5db20c Use pMacroDefinition in macro (for more direct parsing).
This is newly exported in texmath 0.9.3.

Note that this means that `macro` will now parse one
macro at a time, rather than parsing a whole group together.
2017-03-10 10:12:51 +01:00
John MacFarlane
fb47d1d909 RST reader: support RST-style citations.
The citations appear at the end of the document as a definition
list in a special div with id `citations`.

Citations link to the definitions.

Added stateCitations to ParserState.

Closes #853.
2017-03-03 22:23:01 +01:00
John MacFarlane
b246e8e9af Revert "Refined constraint for HasQuoteContext instance."
This reverts commit 3c427fc17d.
2017-02-20 15:44:33 +01:00
John MacFarlane
3c427fc17d Refined constraint for HasQuoteContext instance.
in hopes that this will help the ghc 7.8.4 build...
2017-02-20 15:25:47 +01:00
John MacFarlane
3ccf3fad2c Removed redundant constraint. 2017-02-20 15:12:20 +01:00
John MacFarlane
38daf9de68 Parsing: Added HasLogMessages, logMessage, reportLogMessages.
We need to do logging by updating parser state, or we'll get
inappropriate and repeated log messages when there is parser
backtracking.

See #3447.
2017-02-17 12:56:07 +01:00
John MacFarlane
76c55466d3 Use new warnings throughout the code base. 2017-02-11 00:14:44 +01:00
John MacFarlane
857d35fce4 Refactored some files formerly in LaTeX reader.
* Export readFileFromDirs from Class.
* Export insertIncludedFile from Parsing.

Simplified code in LaTeX/RST readers.
2017-02-07 22:33:05 +01:00
John MacFarlane
1ecc48e9f9 Moved readFileFromDirs to Text.Pandoc.Class.
This can be used in several different modules, not just
LaTeX reader.
2017-02-07 21:42:35 +01:00
John MacFarlane
5156a4fe3c Shared: rename compactify', compactify'DL -> compactify, compactifyDL. 2017-01-27 21:36:45 +01:00
John MacFarlane
56f74cb0ab Removed Shared.compactify.
Changed signatures on Parsing.tableWith and Parsing.gridTableWith.
2017-01-27 21:30:35 +01:00
John MacFarlane
5bf9125770 Removed readerOldDashes and --old-dashes option, added old_dashes extension.
API change.  CLI option change.
2017-01-25 17:07:42 +01:00
John MacFarlane
6f8b967d98 Removed readerSmart and the --smart option; added Ext_smart extension.
Now you will need to do

    -f markdown+smart

instead of

    -f markdown --smart

This change opens the way for writers, in addition to readers,
to be sensitive to +smart, but this change hasn't yet been made.

API change. Command-line option change.

Updated manual.
2017-01-25 17:07:42 +01:00
John MacFarlane
3876b91448 Make Extensions a custom type instead of a Set Extension.
The type is implemented in terms of an underlying bitset
which should be more efficient.

API change: from Text.Pandoc.Extensions export Extensions,
emptyExtensions, extensionsFromList, enableExtension, disableExtension,
extensionEnabled.
2017-01-25 17:07:42 +01:00
John MacFarlane
1a0d93a1d3 LaTeX reader: Proper include file processing.
* Removed handleIncludes from LaTeX reader [API change].
* Now the ordinary LaTeX reader handles includes in a way
  that is appropriate to the monad it is run in.
2017-01-25 17:07:40 +01:00
John MacFarlane
38064498d9 Parsing: Removed obsolete warnings stuff.
Removed stateWarnings, addWarning, and readWithWarnings.
2017-01-25 17:07:40 +01:00
Jesse Rosenthal
912eee362b Remove OverlappingInstances pragma.
It doesn't help to solve the problem in 7.8.
2017-01-25 17:07:40 +01:00
Jesse Rosenthal
e35c6c9e4d Try adding OverlappingInstances pragma to parsing.
It's having trouble figuring out HasQuoteContext.
2017-01-25 17:07:40 +01:00
Jesse Rosenthal
3574b98f81 Unify Errors. 2017-01-25 17:07:40 +01:00
Jesse Rosenthal
840439ab2a Add IncoherentInstances pragma for HasQuotedContext.
We can remove this if we can figure out a better way to do this.
2017-01-25 17:07:40 +01:00
John MacFarlane
bf72a482eb Tighten up parsing of raw email addresses.
Technically `**@user` is a valid email address, but if we
allow things like this, we get bad results in markdown flavors
that autolink raw email addresses. (See #2940.)
So we exclude a few valid email addresses in order to
avoid these more common bad cases.

Closes #2940.
2016-10-23 23:12:36 +02:00
Albert Krewinkel
3e60ed9c03
Allow empty lines when parsing line blocks
Line blocks are allowed to contain empty lines and should be parsed as a
single block in that case.  Previously an empty (line block) line would
have terminated parsing of the line block element.
2016-10-13 08:46:44 +02:00
Jesse Rosenthal
3f8d3d844f Remove TagSoup compat
We already lower-bound tagsoup at 0.13.7, which means we were always
running the compatibility layer (it was conditional on min value
0.13). Better to just use `lookupEntity` from the library directly, and
convert a string to a char if need be.
2016-09-02 12:28:53 -04:00
Jesse Rosenthal
45c7108b4f Remove Compat.Monoid
This was only necessary for GHC versions with base below 4.5
(i.e., ghc < 7.4).
2016-09-02 09:18:08 -04:00
John MacFarlane
17defd5004 Use liftM since otherwise Functor type constraint needen in ghc 7.8. 2016-07-15 12:02:37 -07:00
John MacFarlane
2f54de7cc4 Fixed compiler warnings. 2016-07-14 23:38:44 -07:00
John MacFarlane
499985c1a3 Updated copyright dates to include 2016. 2016-03-22 17:20:39 -07:00
John MacFarlane
20170c328f Changed type of Shared.uniqueIdent argument from [String] to Set String.
This avoids performance problems in documents with many identically
named headers.

Closes #2671.
2016-01-22 10:16:47 -08:00
John MacFarlane
5884ff6994 Work around tagsoup bug - not allowing uppercase x in hex entities.
Issue submitted at tagsoup.
2016-01-08 17:33:32 -08:00
John MacFarlane
12a5bd3c8d Entity handling fixes:
- Text.Pandoc.XML.fromEntities:  handle entities without a
  semicolon. Always lookup character references with the
  trailing ';', even if it wasn't present.  And never add
  it when looking up numerical entities.  (This is what
  tagsoup seems to require.)
- Text.Pandoc.Parsing.characterReference:  Always lookup
  character references with the trailing ';', and leave off
  the ';' when looking up numerical entities.

This fixes a regression for e.g. `&lang;`.
2016-01-08 17:08:01 -08:00
John MacFarlane
28a2f4c2a4 Fixed cite key parsing regression.
We were capturing final colons as in [@foo: bar];
the citation id was being parsed as "@foo:".

Closes jgm/pandoc-citeproc#201.
2015-12-12 00:27:08 -08:00