366 commits

Author SHA1 Message Date
John MacFarlane
8481298357 Don't rely on syb when we don't need to. 2017-10-27 21:44:22 -07:00
John MacFarlane
ff16db1aa3 Automatic reformatting by stylish-haskell. 2017-10-27 20:28:29 -07:00
hftf
7f8a3c6cb7 Consistent underline for Readers (#2270)
* Added underlineSpan builder function.  This can be easily updated if needed. The purpose is for Readers to transform underlines consistently.

* Docx Reader: Use underlineSpan and update test

* Org Reader: Use underlineSpan and add test

* Textile Reader: Use underlineSpan and add test case

* Txt2Tags Reader: Use underlineSpan and update test

* HTML Reader: Use underlineSpan and add test case
2017-10-27 18:45:00 -04:00
John MacFarlane
2f66d57616 Remove openURL from Shared (API change).
Now all the guts of openURL have been put into openURL from
Class.  openURL is now sensitive to stRequestHeaders in CommonState
and will add these custom headers when making a request.
It no longer looks at the USER_AGENT environment variable,
since you can now set the `User-Agent` header directly.
2017-10-15 22:11:38 -07:00
John MacFarlane
7d2ff7ed6d Shared.stringify, removeFormatting: handle Quoted better.
Previously we were losing the quotation marks in Quoted
elements.  See #3958.
2017-10-08 21:55:57 -07:00
John MacFarlane
74212eb1b0 Added support for translations (localization) (see #3559).
* readDataFile, readDefaultDataFile, getReferenceDocx,
  getReferenceODT have been removed from Shared and
  moved into Class.  They are now defined in terms of
  PandocMonad primitives, rather than being primitive
  methods of the class.

* toLang has been moved from BCP47 to Class.

* NoTranslation and CouldNotLoadTranslations have
  been added to LogMessage.

* New module, Text.Pandoc.Translations, exporting
  Term, Translations, readTranslations.

* New functions in Class: translateTerm, setTranslations.
  Note that nothing is loaded from data files until
  translateTerm is used; setTranslations just sets the
  language to be used.

* Added two translation data files in data/translations.

* LaTeX reader: Support `\setmainlanguage` or `\setdefaultlanguage`
  (polyglossia) and `\figurename`.
2017-08-11 22:22:31 -07:00
John MacFarlane
6aaf8f4770 Expose getDefaultDataFile in both Shared and Class. 2017-08-10 23:04:14 -07:00
John MacFarlane
2363e6a15b Move CR filtering from tabFilter to the readers.
The readers previously assumed that CRs had been filtered
from the input.  Now we strip the CRs in the readers themselves,
before parsing.  (The point of this is just to simplify the
parsers.)

Shared now exports a new function `crFilter`. [API change]
And `tabFilter` no longer filters CRs.
2017-06-20 21:52:13 +02:00
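The CR-stripping step described in the commit above can be sketched in a few lines. This is a minimal illustration on String using only base, not the library's own definition; the name stripCRs is hypothetical.

```haskell
-- Hypothetical stand-in for a CR filter: drop every carriage return
-- so that later parsing only ever sees '\n' line endings.
stripCRs :: String -> String
stripCRs = filter (/= '\r')
```

A reader would apply such a filter to its input once, before parsing, so the parsers themselves need no CR handling.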
John MacFarlane
9849ba7fd7 Use Control.Monad.State.Strict throughout.
This gives 20-30% speedup and reduction of memory
usage in most of the writers.
2017-06-17 07:45:28 +02:00
John MacFarlane
fa719d0264 Switched Writer types to use Text.
* XML.toEntities: changed type to Text -> Text.
* Shared.tabFilter -- fixed so it strips out CRs as before.
* Modified writers to take Text.
* Updated tests, benchmarks, trypandoc.

[API change]

Closes #3731.
2017-06-11 00:46:31 +02:00
John MacFarlane
72b45f05ed Rewrote convertTabs to use Text not String. 2017-06-10 15:22:25 +02:00
John MacFarlane
774075c3e2 Added eastAsianLineBreakFilter to Shared.
This used to live in the Markdown reader.
2017-05-30 10:22:48 +02:00
John MacFarlane
66fa38ed1c Shared.isURI: allow uppercase versions of known schemes. 2017-05-23 09:49:56 +02:00
Albert Krewinkel
5debb0da0f Shared: Provide custom isURI that rejects unknown schemes [isURI]
We also export the set of known `schemes`.

The new function replaces the function of the same name
from `Network.URI`, as the latter did not check whether a scheme is
well-known.  E.g. MediaWiki wikis frequently feature pages with names
like `User:John`. These links were interpreted as URIs, thus turning
internal links into global links. This is prevented by also checking
whether the scheme of a URI is frequently used (i.e. is IANA registered
or an otherwise well-known scheme).

Fixes: #2713

Update set of well-known URIs from IANA list
All official IANA schemes (as of 2017-05-22) are included in the set of
known schemes.  The four non-official schemes doi, isbn, javascript, and
pmid are kept.
2017-05-23 09:48:11 +02:00
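The idea behind the scheme check can be sketched as follows. This is a simplified illustration under stated assumptions: it only looks at the text before the first colon instead of fully parsing the URI, the list knownSchemes is a tiny hypothetical sample rather than the IANA-derived set, and the name looksLikeKnownURI is invented for this sketch. The scheme is lowercased before lookup, so uppercase variants of known schemes are accepted as well.

```haskell
import Data.Char (toLower)

-- Hypothetical sample of schemes; the real set is far larger.
knownSchemes :: [String]
knownSchemes =
  ["http", "https", "ftp", "mailto", "doi", "isbn", "javascript", "pmid"]

-- Accept a string as a URI only if the text before the first ':'
-- is, case-insensitively, one of the known schemes.
looksLikeKnownURI :: String -> Bool
looksLikeKnownURI s =
  case break (== ':') s of
    (scheme, ':' : _) -> map toLower scheme `elem` knownSchemes
    _                 -> False
```

With this check, a MediaWiki page name such as User:John is rejected because "user" is not a known scheme, so it stays an internal link.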
Albert Krewinkel
965f1ddd4a Update dates in copyright notices
This follows the suggestions given by the FSF for GPL licensed software.
<https://www.gnu.org/prep/maintain/html_node/Copyright-Notices.html>
2017-05-13 23:30:13 +02:00
John MacFarlane
99be906101 Added PandocHttpException, trap exceptions in fetching from URLs.
Closes #3646.
2017-05-07 13:11:04 +02:00
John MacFarlane
d414b2543a Remove https flag.
Supporting two completely different libraries for fetching
from URLs makes it difficult to trap errors, because of
different error types expected from the libraries.

There's no clear reason not to build with these https-capable
libraries.
2017-05-07 12:49:25 +02:00
John MacFarlane
1fe1c162ac Error: Added PandocCouldNotFindDataFileError.
Use this instead of PandocAppError when appropriate.
Removed exit code from PandocAppError, use 1 for all.
2017-04-15 12:05:58 +02:00
John MacFarlane
913db947a9 Text.Pandoc.App: Throw errors rather than exiting.
These are caught (and lead to exit) in pandoc.hs, but
other uses of Text.Pandoc.App may want to recover in another
way.

Added PandocAppError to PandocError (API change).
This is a stopgap:  later we should have a separate constructor
for each type of error.

Also fixed uses of 'exit' in Shared.readDataFile, and
removed 'err' from Shared (API change).

Finally, removed the dependency on extensible-exceptions.

See #3548.
2017-04-02 23:04:48 +02:00
John MacFarlane
3765f08304 Revert "Shared: export extractIds."
This reverts commit 0ef1e51211.
2017-03-12 21:18:19 +01:00
John MacFarlane
0ef1e51211 Shared: export extractIds.
This will be used to help with #1745.
2017-03-12 12:42:03 +01:00
John MacFarlane
ba78b75146 Removed normalizeSpaces from Text.Pandoc.Shared.
Rewrote functions in RST reader and writer to avoid the need
for it.

Closes #1530.
2017-03-10 20:45:21 +01:00
John MacFarlane
9862d7c359 Shared.normalizeSpaces: strip off leading/trailing line breaks...
...not just spaces.
2017-03-10 20:33:14 +01:00
John MacFarlane
72af7b4ee5 Shared: remove 'warn'.
PDF writer: Use 'report' instead of 'warn', make it sensitive
to verbosity settings.
2017-02-24 14:29:56 +01:00
John MacFarlane
4a9069130f Shared.openURL: Changed type from an Either.
Now it will just raise an exception to be trapped later.
2017-02-23 16:21:03 +01:00
Alexander Krotov
a58112f6bc Simplify toRomanNumeral using guards (#3445) 2017-02-14 23:00:23 +01:00
Thenaesh Elango
942189056d Allow user to specify User-Agent (#3421)
This commit enables users to specify the User-Agent
header used when pandoc requests a document from
a URL. This is done by setting an environment variable.
For instance, one can do:
USER_AGENT="..." ./pandoc -f html -t markdown http://example.com

Signed-off-by: Thenaesh Elango <thenaeshelango@gmail.com>
2017-02-05 11:28:39 +01:00
John MacFarlane
5156a4fe3c Shared: rename compactify', compactify'DL -> compactify, compactifyDL. 2017-01-27 21:36:45 +01:00
John MacFarlane
56f74cb0ab Removed Shared.compactify.
Changed signatures on Parsing.tableWith and Parsing.gridTableWith.
2017-01-27 21:30:35 +01:00
John MacFarlane
4007d6a897 Removed writerIgnoreNotes.
Instead, just temporarily remove notes when generating
TOC lists in HTML and Markdown (as we already did in LaTeX).

Also export deNote from Text.Pandoc.Shared.

API change in Shared and Options.WriterOptions.
2017-01-25 17:07:42 +01:00
John MacFarlane
2d04922cd0 Factored out deNote in Shared. 2017-01-25 17:07:42 +01:00
John MacFarlane
6aff97e4e1 Text.Pandoc.Shared: Removed fetchItem, fetchItem'.
Made changes where these are used, so that the version
of fetchItem from PandocMonad can be used instead.
2017-01-25 17:07:42 +01:00
John MacFarlane
00240ca7ed Removed hush from Text.Pandoc.Shared.
Not used anywhere.
2017-01-25 17:07:41 +01:00
John MacFarlane
8165014df6 Removed --normalize option and normalization functions from Shared.
* Removed normalize, normalizeInlines, normalizeBlocks
  from Text.Pandoc.Shared.  These shouldn't now be necessary,
  since normalization is handled automatically by the Builder
  monoid instance.

* Remove `--normalize` command-line option.

* Don't use normalize in tests.

* A few revisions to readers so they work well without normalize.
2017-01-25 17:07:41 +01:00
John MacFarlane
2b24c6ff3a Shared: put err into MonadIO. 2017-01-25 17:07:41 +01:00
John MacFarlane
7bf0813814 Shared: changed err and warn output.
Don't print program name in either case.
Print [warning] for warnings.
2017-01-25 17:07:40 +01:00
John MacFarlane
e2a452ba4a Shared.fetchItem: Better handling of protocol-relative URL.
If URL starts with `//` and there is no "base URL" (as there
would be if a URL were used on the command line), then default
to http:.

Closes #2635.
2016-11-27 21:19:26 +01:00
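The fallback for protocol-relative URLs described above might look roughly like this. It is a sketch only: the helper name withDefaultScheme and its Maybe String argument for the base URL are assumptions, and resolving a URL against an existing base is deliberately left out.

```haskell
import Data.List (isPrefixOf)

-- Hypothetical helper: the first argument is an optional base URL.
-- With no base URL, a protocol-relative URL is given the http: scheme.
-- With a base URL, the input is returned unchanged here; resolving it
-- against the base is outside the scope of this sketch.
withDefaultScheme :: Maybe String -> String -> String
withDefaultScheme Nothing url
  | "//" `isPrefixOf` url = "http:" ++ url
withDefaultScheme _ url = url
```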
John MacFarlane
77912ddc56 Put 'warn' in MonadIO. Add warnings for math conversions in docx. 2016-11-22 10:56:59 +01:00
John MacFarlane
0cd11b3e54 Merge pull request #3165 from hubertp-lshift/feature/odt-image
[odt] images parser
2016-10-18 22:00:58 +02:00
Hubert Plociniczak
4417e33ea9 Use bind function instead of pattern matching 2016-10-17 16:58:53 +02:00
John MacFarlane
6d13567ac5 Allow http-client 0.4.30, which is the version in stackage lts.
Previously we required 0.5.
Remove CPP conditionals for earlier versions.
2016-10-13 13:01:49 +02:00
John MacFarlane
4a1ef0b51d Revert "Remove http-client CPP conditionals."
This reverts commit 3f82471355.

We might want to revert the requirement of http-client 0.5,
as this is not yet in Stackage and that is starting to
cause problems.  I can't recall why it is there.
2016-10-13 12:35:58 +02:00
Albert Krewinkel
64b77cc2c5 Shared: add function combining lines using LineBreak
The `linesToBlock` function takes a list of lines and combines them by appending
a hard `LineBreak` to each line, concatenating the results, and putting the
combined list into a `Para`. This is most useful when converting `LineBlock`
elements.
2016-10-13 08:46:38 +02:00
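The line-combining step can be illustrated with a miniature document model. The Inline and Block types below are hypothetical stand-ins defined only for this sketch, the name joinLines is invented, and the sketch assumes that the hard break goes between lines with no trailing break after the last one.

```haskell
import Data.List (intercalate)

-- Miniature stand-ins for document element types (assumptions, not the
-- real library definitions).
data Inline = Str String | Space | LineBreak
  deriving (Eq, Show)

newtype Block = Para [Inline]
  deriving (Eq, Show)

-- Join a list of lines into one paragraph, separating consecutive
-- lines with a hard line break.
joinLines :: [[Inline]] -> Block
joinLines = Para . intercalate [LineBreak]
```

For two lines holding "roses" and "violets", joinLines yields a single Para whose inlines are the first word, a LineBreak, and the second word.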
Hubert Plociniczak
c924611de5 Basic support for images in ODT documents
Highly influenced by the docx support, refactored
some code to avoid repetition (DRY).
2016-10-12 17:50:35 +02:00
Jesse Rosenthal
3f82471355 Remove http-client CPP conditionals.
Our lower bound on http-client is 0.5, and both of these min_version
tests are less than 0.5, so they will always pass.
2016-09-03 08:41:00 -04:00
Jesse Rosenthal
45c7108b4f Remove Compat.Monoid
This was only necessary for GHC versions with base below 4.5
(i.e., ghc < 7.4).
2016-09-02 09:18:08 -04:00
Albert Krewinkel
a396003a31 Rename README to MANUAL.txt 2016-07-20 21:16:45 +02:00
Jesse Rosenthal
e8e02f1220 Shared: improve year sanity check in normalizeDate
Previously we parsed a list of dates, took the first one, and then
tested its year range. That meant that if the first one failed, we
returned nothing, regardless of what the others did. Now we test for
sanity before running `msum` over the list of Maybe values. Anything
failing the test will be Nothing, so will not be a candidate.
2016-07-14 17:02:30 -04:00
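The difference between the two orderings can be shown without any date parsing. In this sketch each candidate parse result is reduced to a Maybe year; the type alias Year and the function names are hypothetical, and the 1601-9999 bounds are the ones named in the related normalizeDate commit.

```haskell
import Control.Monad (mfilter, msum)

type Year = Integer

-- Years accepted as sane, using the 1601-9999 range.
saneYear :: Year -> Bool
saneYear y = y >= 1601 && y <= 9999

-- Earlier behaviour: take the first successful parse, then test it.
-- One bad first candidate hides every later good one.
firstThenCheck :: [Maybe Year] -> Maybe Year
firstThenCheck = mfilter saneYear . msum

-- Later behaviour: test each candidate, then take the first survivor.
checkThenFirst :: [Maybe Year] -> Maybe Year
checkThenFirst = msum . map (mfilter saneYear)
```

Given the candidates [Just 12016, Just 2016], firstThenCheck returns Nothing, while checkThenFirst returns Just 2016.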
Jesse Rosenthal
bbfcd50fb1 Shared: normalizeDate should reject illegal years.
We only allow years between 1601 and 9999, inclusive. ISO 8601
actually says that years are supposed to start with 1583, but MS Word
only allows 1601-9999. This should prevent corrupted Word files when the
date is out of that range or is parsed incorrectly.
2016-07-14 17:02:30 -04:00
Jesse Rosenthal
4816facee4 Shared: Add further formats for normalizeDate
We want to avoid illegal dates -- in particular years with more than
four digits. We attempt to parse series of digits first as `%Y%m%d`, then
`%Y%m`, and finally `%Y`.
2016-07-14 17:02:30 -04:00