358 commits

Author SHA1 Message Date
John MacFarlane
9849ba7fd7 Use Control.Monad.State.Strict throughout.
This gives 20-30% speedup and reduction of memory
usage in most of the writers.
2017-06-17 07:45:28 +02:00
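
As a rough illustration of the change described in this commit, the sketch below uses the strict State monad from mtl with `modify'`, so the accumulated value is forced at each step instead of building up a chain of thunks. The function `countItems` is a hypothetical example and is not taken from the pandoc writers.

```haskell
import Control.Monad.State.Strict (State, execState, modify')

-- Hypothetical example: count the elements of a list in the strict State monad.
-- modify' forces the new state on every step, which keeps memory usage flat.
countStep :: a -> State Int ()
countStep _ = modify' (+ 1)

countItems :: [a] -> Int
countItems xs = execState (mapM_ countStep xs) 0
```

Switching an existing module is usually just a matter of changing the import from `Control.Monad.State` to `Control.Monad.State.Strict`; whether that yields a speedup depends on how the state is used.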
John MacFarlane
fa719d0264 Switched Writer types to use Text.
* XML.toEntities: changed type to Text -> Text.
* Shared.tabFilter -- fixed so it strips out CRs as before.
* Modified writers to take Text.
* Updated tests, benchmarks, trypandoc.

[API change]

Closes #3731.
2017-06-11 00:46:31 +02:00
John MacFarlane
72b45f05ed Rewrote convertTabs to use Text not String. 2017-06-10 15:22:25 +02:00
John MacFarlane
774075c3e2 Added eastAsianLineBreakFilter to Shared.
This used to live in the Markdown reader.
2017-05-30 10:22:48 +02:00
John MacFarlane
66fa38ed1c Shared.isURI: allow uppercase versions of known schemes. 2017-05-23 09:49:56 +02:00
Albert Krewinkel
5debb0da0f Shared: Provide custom isURI that rejects unknown schemes [isURI]
We also export the set of known `schemes`.

The new function replaces the function of the same name
from `Network.URI`, as the latter did not check whether a scheme is
well-known.  E.g. MediaWiki wikis frequently feature pages with names
like `User:John`. These links were interpreted as URIs, thus turning
internal links into global links. This is prevented by also checking
whether the scheme of a URI is frequently used (i.e. is IANA registered
or an otherwise well-known scheme).

Fixes: #2713

Update set of well-known URIs from IANA list
All official IANA schemes (as of 2017-05-22) are included in the set of
known schemes.  The four non-official schemes doi, isbn, javascript, and
pmid are kept.
2017-05-23 09:48:11 +02:00
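
The idea behind the custom `isURI` in the two commits above can be sketched as follows. This is a minimal, self-contained approximation that uses only base and containers: it does not call `Network.URI`, the scheme set is a tiny illustrative subset, and the names `knownSchemes`, `schemeOf`, and `isKnownURI` are hypothetical.

```haskell
import Data.Char (isAscii, isAlpha, isAlphaNum, toLower)
import qualified Data.Set as Set

-- Illustrative subset only; the real set covers the IANA registry
-- plus doi, isbn, javascript, and pmid.
knownSchemes :: Set.Set String
knownSchemes = Set.fromList
  ["http", "https", "ftp", "mailto", "doi", "isbn", "javascript", "pmid"]

-- Split off a syntactically plausible scheme (letters, digits, '+', '-', '.').
schemeOf :: String -> Maybe String
schemeOf s =
  case break (== ':') s of
    (scheme@(c:cs), ':':_)
      | isAscii c && isAlpha c && all isSchemeChar cs -> Just scheme
    _ -> Nothing
  where
    isSchemeChar ch = isAscii ch && (isAlphaNum ch || ch `elem` "+-.")

-- Accept a string only if its scheme is in the known set.
-- Lowercasing first also accepts uppercase versions of known schemes.
isKnownURI :: String -> Bool
isKnownURI s =
  case schemeOf s of
    Just scheme -> map toLower scheme `Set.member` knownSchemes
    Nothing     -> False
```

With this check, a MediaWiki page name such as `User:John` is rejected because `user` is not a known scheme, while `HTTPS://example.com` is accepted.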
Albert Krewinkel
965f1ddd4a Update dates in copyright notices
This follows the suggestions given by the FSF for GPL licensed software.
<https://www.gnu.org/prep/maintain/html_node/Copyright-Notices.html>
2017-05-13 23:30:13 +02:00
John MacFarlane
99be906101 Added PandocHttpException, trap exceptions in fetching from URLs.
Closes #3646.
2017-05-07 13:11:04 +02:00
John MacFarlane
d414b2543a Remove https flag.
Supporting two completely different libraries for fetching
from URLs makes it difficult to trap errors, because of
different error types expected from the libraries.

There's no clear reason not to build with these https-capable
libraries.
2017-05-07 12:49:25 +02:00
John MacFarlane
1fe1c162ac Error: Added PandocCouldNotFindDataFileError.
Use this instead of PandocAppError when appropriate.
Removed exit code from PandocAppError, use 1 for all.
2017-04-15 12:05:58 +02:00
John MacFarlane
913db947a9 Text.Pandoc.App: Throw errors rather than exiting.
These are caught (and lead to exit) in pandoc.hs, but
other uses of Text.Pandoc.App may want to recover in another
way.

Added PandocAppError to PandocError (API change).
This is a stopgap:  later we should have a separate constructor
for each type of error.

Also fixed uses of 'exit' in Shared.readDataFile, and
removed 'err' from Shared (API change).

Finally, removed the dependency on extensible-exceptions.

See #3548.
2017-04-02 23:04:48 +02:00
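
The pattern this commit describes (library code throws, the executable decides whether to exit) might look roughly like the sketch below. The type `AppError` and the functions `runApp` and `runAndReport` are hypothetical stand-ins, not pandoc's actual `PandocError` API.

```haskell
import Control.Exception (Exception, throwIO, try)

-- Hypothetical error type standing in for an application-level error.
newtype AppError = AppError String
  deriving (Show, Eq)

instance Exception AppError

-- Library code: throws instead of calling an exit function.
runApp :: Bool -> IO String
runApp ok
  | ok        = pure "done"
  | otherwise = throwIO (AppError "could not convert")

-- Caller: traps the exception and chooses how to recover.
runAndReport :: Bool -> IO (Either AppError String)
runAndReport = try . runApp
```

A command-line wrapper could turn a `Left` into an exit code, while another consumer of the same library could log the error and continue.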
John MacFarlane
3765f08304 Revert "Shared: export extractIds."
This reverts commit 0ef1e51211.
2017-03-12 21:18:19 +01:00
John MacFarlane
0ef1e51211 Shared: export extractIds.
This will be used to help with #1745.
2017-03-12 12:42:03 +01:00
John MacFarlane
ba78b75146 Removed normalizeSpaces from Text.Pandoc.Shared.
Rewrote functions in RST reader and writer to avoid the need
for it.

Closes #1530.
2017-03-10 20:45:21 +01:00
John MacFarlane
9862d7c359 Shared.normalizeSpaces: strip off leading/trailing line breaks...
...not just spaces.
2017-03-10 20:33:14 +01:00
John MacFarlane
72af7b4ee5 Shared: remove 'warn'.
PDF writer: Use 'report' instead of 'warn', make it sensitive
to verbosity settings.
2017-02-24 14:29:56 +01:00
John MacFarlane
4a9069130f Shared.openURL: Changed type from an Either.
Now it will just raise an exception to be trapped later.
2017-02-23 16:21:03 +01:00
Alexander Krotov
a58112f6bc Simplify toRomanNumeral using guards (#3445) 2017-02-14 23:00:23 +01:00
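
For readers unfamiliar with the style, a guard-based Roman numeral conversion could be written as in this sketch. The name `toRomanSketch` and the choice to return `"?"` for out-of-range input are assumptions made for illustration; the code in the commit may differ.

```haskell
-- Convert an integer to Roman numerals using guards.
-- Values outside 0..3999 are reported as "?" (an assumption of this sketch).
toRomanSketch :: Int -> String
toRomanSketch x
  | x >= 4000 || x < 0 = "?"
  | x >= 1000 = "M"  ++ toRomanSketch (x - 1000)
  | x >= 900  = "CM" ++ toRomanSketch (x - 900)
  | x >= 500  = "D"  ++ toRomanSketch (x - 500)
  | x >= 400  = "CD" ++ toRomanSketch (x - 400)
  | x >= 100  = "C"  ++ toRomanSketch (x - 100)
  | x >= 90   = "XC" ++ toRomanSketch (x - 90)
  | x >= 50   = "L"  ++ toRomanSketch (x - 50)
  | x >= 40   = "XL" ++ toRomanSketch (x - 40)
  | x >= 10   = "X"  ++ toRomanSketch (x - 10)
  | x == 9    = "IX"
  | x >= 5    = "V"  ++ toRomanSketch (x - 5)
  | x == 4    = "IV"
  | x >= 1    = "I"  ++ toRomanSketch (x - 1)
  | otherwise = ""
```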
Thenaesh Elango
942189056d Allow user to specify User-Agent (#3421)
This commit enables users to specify the User-Agent
header used when pandoc requests a document from
a URL. This is done by setting an environment variable.
For instance, one can do:
USER_AGENT="..." ./pandoc -f html -t markdown http://example.com

Signed-off-by: Thenaesh Elango <thenaeshelango@gmail.com>
2017-02-05 11:28:39 +01:00
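
Reading the `USER_AGENT` environment variable and turning it into a request header can be sketched as below. Only `lookupEnv` from base is used; the helper names and the representation of headers as string pairs are assumptions, since the commit message does not show how the header is attached to the request.

```haskell
import System.Environment (lookupEnv)

-- Pure part: build the extra header list from an optional user agent.
userAgentHeaders :: Maybe String -> [(String, String)]
userAgentHeaders (Just ua) = [("User-Agent", ua)]
userAgentHeaders Nothing   = []

-- IO part: consult the USER_AGENT environment variable.
headersFromEnv :: IO [(String, String)]
headersFromEnv = userAgentHeaders <$> lookupEnv "USER_AGENT"
```

Keeping the pure function separate from the environment lookup makes the header logic easy to test without touching the process environment.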
John MacFarlane
5156a4fe3c Shared: rename compactify', compactify'DL -> compactify, compactifyDL. 2017-01-27 21:36:45 +01:00
John MacFarlane
56f74cb0ab Removed Shared.compactify.
Changed signatures on Parsing.tableWith and Parsing.gridTableWith.
2017-01-27 21:30:35 +01:00
John MacFarlane
4007d6a897 Removed writerIgnoreNotes.
Instead, just temporarily remove notes when generating
TOC lists in HTML and Markdown (as we already did in LaTeX).

Also export deNote from Text.Pandoc.Shared.

API change in Shared and Options.WriterOptions.
2017-01-25 17:07:42 +01:00
John MacFarlane
2d04922cd0 Factored out deNote in Shared. 2017-01-25 17:07:42 +01:00
John MacFarlane
6aff97e4e1 Text.Pandoc.Shared: Removed fetchItem, fetchItem'.
Made changes where these are used, so that the version
of fetchItem from PandocMonad can be used instead.
2017-01-25 17:07:42 +01:00
John MacFarlane
00240ca7ed Removed hush from Text.Pandoc.Shared.
Not used anywhere.
2017-01-25 17:07:41 +01:00
John MacFarlane
8165014df6 Removed --normalize option and normalization functions from Shared.
* Removed normalize, normalizeInlines, normalizeBlocks
  from Text.Pandoc.Shared.  These shouldn't now be necessary,
  since normalization is handled automatically by the Builder
  monoid instance.

* Remove `--normalize` command-line option.

* Don't use normalize in tests.

* A few revisions to readers so they work well without normalize.
2017-01-25 17:07:41 +01:00
John MacFarlane
2b24c6ff3a Shared: put err into MonadIO. 2017-01-25 17:07:41 +01:00
John MacFarlane
7bf0813814 Shared: changed err and warn output.
Don't print program name in either case.
Print [warning] for warnings.
2017-01-25 17:07:40 +01:00
John MacFarlane
e2a452ba4a Shared.fetchItem: Better handling of protocol-relative URL.
If URL starts with `//` and there is no "base URL" (as there
would be if a URL were used on the command line), then default
to http:.

Closes #2635.
2016-11-27 21:19:26 +01:00
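
The fallback described in this commit can be sketched as a small pure function. The name `resolveProtocolRelative` is hypothetical, and passing the base URL's scheme as a `Maybe String` (for example `Just "https:"`) is an assumption about how a base would be supplied.

```haskell
import Data.List (isPrefixOf)
import Data.Maybe (fromMaybe)

-- Prefix a protocol-relative URL ("//host/path") with a scheme.
-- With no base scheme available, default to "http:".
resolveProtocolRelative :: Maybe String -> String -> String
resolveProtocolRelative baseScheme url
  | "//" `isPrefixOf` url = fromMaybe "http:" baseScheme ++ url
  | otherwise             = url
```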
John MacFarlane
77912ddc56 Put 'warn' in MonadIO. Add warnings for math conversions in docx. 2016-11-22 10:56:59 +01:00
John MacFarlane
0cd11b3e54 Merge pull request #3165 from hubertp-lshift/feature/odt-image
[odt] images parser
2016-10-18 22:00:58 +02:00
Hubert Plociniczak
4417e33ea9 Use bind function instead of pattern matching 2016-10-17 16:58:53 +02:00
John MacFarlane
6d13567ac5 Allow http-client 0.4.30, which is the version in stackage lts.
Previously we required 0.5.
Remove CPP conditionals for earlier versions.
2016-10-13 13:01:49 +02:00
John MacFarlane
4a1ef0b51d Revert "Remove http-client CPP conditionals."
This reverts commit 3f82471355.

We might want to revert the requirement of http-client 0.5,
as this is not yet in Stackage and that is starting to
cause problems.  I can't recall why it is there.
2016-10-13 12:35:58 +02:00
Albert Krewinkel
64b77cc2c5 Shared: add function combining lines using LineBreak
The `linesToBlock` function takes a list of lines and combines them by appending
a hard `LineBreak` to each line and concatenating the result, putting the result
into a `Para`. This is most useful when converting `LineBlock`
elements.
2016-10-13 08:46:38 +02:00
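
A minimal sketch of combining lines with hard breaks follows. To stay self-contained it defines tiny local `Inline` and `Block` types rather than importing pandoc-types, and it places the `LineBreak` between lines (no trailing break), which is an assumption of this sketch. The name `linesToParaSketch` is hypothetical.

```haskell
import Data.List (intercalate)

-- Simplified stand-ins for the pandoc AST types.
data Inline = Str String | Space | LineBreak
  deriving (Eq, Show)

newtype Block = Para [Inline]
  deriving (Eq, Show)

-- Join the lines with a hard LineBreak and wrap the result in a Para.
linesToParaSketch :: [[Inline]] -> Block
linesToParaSketch = Para . intercalate [LineBreak]
```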
Hubert Plociniczak
c924611de5 Basic support for images in ODT documents
Highly influenced by the docx support, refactored
some code to avoid repetition (DRY).
2016-10-12 17:50:35 +02:00
Jesse Rosenthal
3f82471355 Remove http-client CPP conditionals.
Our lower bound on http-client is 0.5, and both of these min_version
tests are less than 0.5, so they will always pass.
2016-09-03 08:41:00 -04:00
Jesse Rosenthal
45c7108b4f Remove Compat.Monoid
This was only necessary for GHC versions with base below 4.5
(i.e., ghc < 7.4).
2016-09-02 09:18:08 -04:00
Albert Krewinkel
a396003a31 Rename README to MANUAL.txt 2016-07-20 21:16:45 +02:00
Jesse Rosenthal
e8e02f1220 Shared: improve year sanity check in normalizeDate
Previously we parsed a list of dates, took the first one, and then
tested its year range. That meant that if the first one failed, we
returned nothing, regardless of what the others did. Now we test for
sanity before running `msum` over the list of Maybe values. Anything
failing the test will be Nothing, so will not be a candidate.
2016-07-14 17:02:30 -04:00
Jesse Rosenthal
bbfcd50fb1 Shared: normalizeDate should reject illegal years.
We only allow years between 1601 and 9999, inclusive. ISO 8601
actually says that years are supposed to start with 1583, but MS Word
only allows 1601-9999. This should prevent corrupted Word files when the
date is out of that range or is parsed incorrectly.
2016-07-14 17:02:30 -04:00
Jesse Rosenthal
4816facee4 Shared: Add further formats for normalizeDate
We want to avoid illegal dates -- in particular years with greater than
four digits. We attempt to parse series of digits first as `%Y%m%d`, then
`%Y%m`, and finally `%Y`.
2016-07-14 17:02:30 -04:00
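
The approach described across the three `normalizeDate` commits above (try several formats, reject implausible years, then take the first surviving parse) can be sketched with the `time` library. The format list here is illustrative rather than the full list used in pandoc, and `normalizeDateSketch` and `saneYear` are hypothetical names.

```haskell
import Control.Monad (mfilter, msum)
import Data.Time (Day, defaultTimeLocale, formatTime, parseTimeM, toGregorian)

-- Illustrative formats, tried in order.
formats :: [String]
formats = ["%F", "%Y%m%d", "%Y%m", "%Y"]

-- Only years 1601-9999 (inclusive) are considered sane.
saneYear :: Day -> Bool
saneYear day = year >= 1601 && year <= 9999
  where (year, _, _) = toGregorian day

-- Filter each candidate parse for sanity *before* msum picks the first Just,
-- so one bad parse does not hide a later good one.
normalizeDateSketch :: String -> Maybe String
normalizeDateSketch s =
  formatTime defaultTimeLocale "%F" <$> msum candidates
  where
    candidates :: [Maybe Day]
    candidates =
      [ mfilter saneYear (parseTimeM True defaultTimeLocale fmt s)
      | fmt <- formats
      ]
```

Exactly how `%Y` consumes digits can vary between versions of the `time` library, so behaviour on undelimited digit strings should be checked against the version in use.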
John MacFarlane
b203a31ba7 Fix warning for parseUrl import. 2016-07-03 22:26:08 -07:00
John MacFarlane
261c3af053 CPP workaround for deprecation of parseUrl in http-client. 2016-07-03 21:29:47 -07:00
Jesse Rosenthal
cbc2c15f0f Shared: Add BlockQuote to blocksToInlines 2016-06-23 10:50:46 -04:00
Jesse Rosenthal
2b701f9389 Shared: introduce blocksToInlines function
This is a lossy function for converting `[Block] -> [Inline]`. Its main
use, at the moment, is for docx comments, which can contain arbitrary
blocks (except for footnotes), but which will be converted to spans.

This is, at the moment, pretty useless for everything but the basic
`Para` and `Plain` comments. It can be improved, but the docx reader
should probably emit a warning if the comment contains more than this.
2016-06-23 10:50:46 -04:00
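
A lossy `[Block] -> [Inline]` conversion of the kind described in the two commits above can be sketched as follows. The types are simplified local stand-ins for the pandoc AST, blocks are concatenated without any separator, and unsupported blocks are silently dropped; all three are assumptions of this sketch rather than the behaviour of the real function.

```haskell
-- Simplified stand-ins for the pandoc AST types.
data Inline = Str String | Space
  deriving (Eq, Show)

data Block
  = Plain [Inline]
  | Para [Inline]
  | BlockQuote [Block]
  | HorizontalRule
  deriving (Eq, Show)

-- Keep the inline content of Para and Plain, recurse into BlockQuote,
-- and drop everything else.
blocksToInlinesSketch :: [Block] -> [Inline]
blocksToInlinesSketch = concatMap go
  where
    go (Plain ils)     = ils
    go (Para ils)      = ils
    go (BlockQuote bs) = blocksToInlinesSketch bs
    go HorizontalRule  = []
```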
John MacFarlane
499985c1a3 Updated copyright dates to include 2016. 2016-03-22 17:20:39 -07:00
John MacFarlane
f2bd6fd37c Make protocol-relative URIs work again.
Closes #2737.
2016-02-23 21:58:10 -08:00
John MacFarlane
20170c328f Changed type of Shared.uniqueIdent argument from [String] to Set String.
This avoids performance problems in documents with many identically
named headers.

Closes #2671.
2016-01-22 10:16:47 -08:00
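
The benefit of the `Set String` argument is that each membership test is logarithmic rather than linear in the number of identifiers already used. A minimal sketch, assuming a simple `-N` suffix scheme and the hypothetical name `uniqueIdentSketch`:

```haskell
import qualified Data.Set as Set

-- Return the base identifier if it is unused; otherwise append the first
-- numeric suffix that is not yet taken. Each lookup is O(log n) in the Set.
uniqueIdentSketch :: String -> Set.Set String -> String
uniqueIdentSketch base used
  | base `Set.notMember` used = base
  | otherwise                 = go 1
  where
    go :: Int -> String
    go n
      | candidate `Set.member` used = go (n + 1)
      | otherwise                   = candidate
      where
        candidate = base ++ "-" ++ show n
```

With a list, a document containing many identically named headers would pay a linear scan for every candidate, which is the performance problem the commit mentions.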
John MacFarlane
b27783e2ec Use cmark 0.5.
Closes #2605.
2015-12-29 19:52:06 -08:00