150 commits

Author SHA1 Message Date
John MacFarlane
0feb7504b1 Rewrote LaTeX reader with proper tokenization.
This rewrite is primarily motivated by the need to
get macros working properly.  A side benefit is that the
reader is significantly faster (27s -> 19s in one
benchmark, and there is a lot of room for further
optimization).

We now tokenize the input text, then parse the token stream.

Macros modify the token stream, so they should now be effective
in any context, including math. Thus, we no longer need the clunky
macro processing capacities of texmath.
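
The tokenize-then-expand idea described above can be illustrated with a toy sketch. This is not the reader's actual code: the `Tok` type here is a simplified, hypothetical stand-in for the one exported from Text.Pandoc.Readers.LaTeX.Types, and only argument-free macros are handled.

```haskell
import Data.Char (isAlpha, isSpace)
import qualified Data.Map as M

-- Toy token type (assumption: much simpler than pandoc's real Tok/TokType).
data Tok = CtrlSeq String | Symbol Char | Word String | Spaces String
  deriving (Show, Eq)

-- Step 1: turn the input text into a token stream.
tokenize :: String -> [Tok]
tokenize [] = []
tokenize ('\\':cs)
  | (name@(_:_), rest) <- span isAlpha cs = CtrlSeq name : tokenize rest
tokenize s@(c:cs)
  | isSpace c = let (ws, rest) = span isSpace s in Spaces ws : tokenize rest
  | isAlpha c = let (w, rest) = span isAlpha s in Word w : tokenize rest
  | otherwise = Symbol c : tokenize cs

-- Argument-free macros only: a name mapped to its replacement tokens.
type Macros = M.Map String [Tok]

-- Step 2: macros rewrite the token stream. The replacement is pushed back
-- onto the stream, so nested macros expand too. A self-referential macro
-- would loop forever in this toy version.
expand :: Macros -> [Tok] -> [Tok]
expand macros = go
  where
    go [] = []
    go (CtrlSeq name : rest)
      | Just body <- M.lookup name macros = go (body ++ rest)
    go (t : rest) = t : go rest

-- Render tokens back to text, for inspection.
untokenize :: [Tok] -> String
untokenize = concatMap render
  where
    render (CtrlSeq n) = '\\' : n
    render (Symbol c)  = [c]
    render (Word w)    = w
    render (Spaces s)  = s

main :: IO ()
main = do
  let macros = M.fromList [("R", tokenize "\\mathbb{R}")]
  -- The macro is expanded inside math, because math is just more tokens.
  putStrLn (untokenize (expand macros (tokenize "$x \\in \\R$")))
```

Because expansion operates on tokens rather than on parsed structures, the same code path applies whether `\R` appears in text or in math.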

A custom state LaTeXState is used instead of ParserState.
This, plus the tokenization, will require some rewriting
of the exported functions rawLaTeXInline, inlineCommand,
rawLaTeXBlock.

* Added Text.Pandoc.Readers.LaTeX.Types (new exported module).
  Exports Macro, Tok, TokType, Line, Column.  [API change]
* Text.Pandoc.Parsing: adjusted type of `insertIncludedFile`
  so it can be used with token parser.
* Removed old texmath macro stuff from Parsing.
  Use Macro from Text.Pandoc.Readers.LaTeX.Types instead.
* Removed texmath macro material from Markdown reader.
* Changed types for Text.Pandoc.Readers.LaTeX's
  rawLaTeXInline and rawLaTeXBlock.  (Both now return a String,
  and they are polymorphic in state.)
* Added orgMacros field to OrgState.  [API change]
* Removed readerApplyMacros from ReaderOptions.
  Now we just check the `latex_macros` reader extension.
* Allow `\newcommand\foo{blah}` without braces.

Fixes #1390.
Fixes #2118.
Fixes #3236.
Fixes #3779.
Fixes #934.
Fixes #982.
2017-07-07 12:36:00 +02:00
John MacFarlane
379b99f63a Added writerEpubSubdirectory to WriterOptions.
[API change]

The EPUB writer now takes its EPUB subdirectory from this option.

Also added `PandocEpubSubdirectoryError` to `PandocError`.
This is raised if the EPUB subdirectory is not all ASCII
alphanumerics.
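
A minimal sketch of the check described above, assuming "all ASCII alphanumerics" means every character satisfies both `isAscii` and `isAlphaNum`. The `EpubSubdirectoryError` type and `validateEpubSubdirectory` function are hypothetical stand-ins, not pandoc's actual definitions.

```haskell
import Data.Char (isAlphaNum, isAscii)

-- Hypothetical stand-in for the PandocEpubSubdirectoryError constructor.
newtype EpubSubdirectoryError = EpubSubdirectoryError String
  deriving (Show, Eq)

-- Accept the name only if every character is an ASCII letter or digit.
-- Note: the empty string passes this check; whether that should be
-- allowed is an assumption of this sketch.
validateEpubSubdirectory :: String -> Either EpubSubdirectoryError String
validateEpubSubdirectory dir
  | all (\c -> isAscii c && isAlphaNum c) dir = Right dir
  | otherwise = Left (EpubSubdirectoryError dir)

main :: IO ()
main = mapM_ (print . validateEpubSubdirectory) ["EPUB", "OEBPS", "my dir", "a/b"]
```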

See #3720.
2017-06-22 11:43:50 +02:00
John MacFarlane
aa1e39858d Text.Pandoc.App: ToJSON and FromJSON instances for Opts.
This can be used e.g. to pass options via web interface,
such as trypandoc.
2017-05-21 11:42:50 +02:00
Albert Krewinkel
965f1ddd4a Update dates in copyright notices
This follows the suggestions given by the FSF for GPL licensed software.
<https://www.gnu.org/prep/maintain/html_node/Copyright-Notices.html>
2017-05-13 23:30:13 +02:00
John MacFarlane
80d093843b Allow dynamic loading of syntax definitions.
See #3334.

* Add writerSyntaxMap to WriterOptions.
* Highlighting: added parameter for SyntaxMap to highlight.
* Implemented --syntax-definition option.

TODO:

[ ] Figure out whether we want to have the xml parsing
    depend on the dtd (it currently does, and fails unless
    the language.dtd is found in the same directory).
[ ] Add an option to read a KDE syntax highlighting theme
    as a custom style.
[ ] Add tests.
2017-03-30 22:36:36 +02:00
John MacFarlane
95f2726ee7 Added readerAbbreviations to ParserState.
Markdown reader now consults this to determine what is an
abbreviation.

Eventually it will be possible to specify a custom list
(see #256).
2017-03-05 10:24:39 +01:00
John MacFarlane
e256c8ce17 Stylish-haskell automatic formatting changes. 2017-03-04 13:03:41 +01:00
John MacFarlane
c7e2c718eb Removed --epub-stylesheet; use --css instead.
* Removed writerEpubStylesheet in WriterOptions.
* Removed `--epub-stylesheet` option.
* Allow `--css` to be used with epub.
* Allow multiple stylesheets to be used.
* Stylesheets will be taken both from `--css` and from
  the `stylesheet` metadata field (which can contain either
  a file path or a list of them).

Closes #3472, #847.
2017-02-27 21:29:16 +01:00
John MacFarlane
5e1249481b Added Text.Pandoc.Logging (exported module).
This now contains the Verbosity definition previously
in Options, as well as a new LogMessage datatype that
will eventually be used instead of raw strings for
warnings.

This will enable us, among other things, to provide
machine-readable warnings if desired.

See #3392.
2017-02-10 20:59:54 +01:00
John MacFarlane
47a16065c4 Removed --parse-raw and readerParseRaw.
These were confusing.

Now we rely on the +raw_tex or +raw_html extension with latex
or html input.

Thus, instead of

    --parse-raw -f latex

we use

    -f latex+raw_tex

and instead of

    --parse-raw -f html

we use

    -f html+raw_html
2017-02-06 23:33:23 +01:00
John MacFarlane
63b568f445 Changed writerEpubMetadata to a Maybe String.
API change.
2017-02-04 22:51:51 +01:00
John MacFarlane
7018003811 --mathml and MathML in HTMLMathMethod no longer take an argument.
The argument was for a bridge javascript that used to be necessary
in 2004.  We have removed the script already.
2017-01-30 11:31:50 +01:00
John MacFarlane
d2e0592e01 LaTeX writer: export writeBeamer.
Removed writerBeamer from WriterOptions.
2017-01-28 09:52:45 +01:00
John MacFarlane
91cdcc796d HTML: export separate functions for slide formats.
writeS5, writeSlideous, writeRevealJs, writeDZSlides, writeSlidy.

Removed writerSlideVariant from WriterOptions.
2017-01-27 22:39:36 +01:00
John MacFarlane
f5dd123819 HTML writer: export writeHtmlStringForEPUB.
Options: Remove writerEPUBVersion.
2017-01-27 10:27:34 +01:00
John MacFarlane
b6c1d491f5 Split writeDocbook into writeDocbook4, writeDocbook5.
Removed writerDocbookVersion in WriterOptions.
Renamed default.docbook template to default.docbook4.
Allow docbook4 as an output format.
But alias docbook = docbook4.
2017-01-26 22:40:57 +01:00
John MacFarlane
fce0a60f0a Provide explicit separate functions for HTML 4 and 5.
* Text.Pandoc.Writers.HTML: removed writeHtml, writeHtmlString,
  added writeHtml4, writeHtml4String, writeHtml5, writeHtml5String.
* Removed writerHtml5 from WriterOptions.
* Renamed default.html template to default.html4.
* "html" now aliases to "html5"; to get the old HTML4 behavior,
  you must now specify "-t html4".
2017-01-25 21:51:26 +01:00
John MacFarlane
70b86f48e1 Removed readerVerbosity and writerVerbosity.
API change.

Also added a verbosity parameter to makePDF.
2017-01-25 17:07:43 +01:00
John MacFarlane
8280d6a489 Changes to verbosity in writer and reader options.
API changes: Text.Pandoc.Options:

* Added Verbosity.
* Added writerVerbosity.
* Added readerVerbosity.
* Removed writerVerbose.
* Removed readerTrace.

pandoc CLI:  The `--trace` option sets verbosity to DEBUG;
the `--quiet` option sets it to ERROR, and the `--verbose`
option sets it to INFO.  The default is WARNING.
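
The flag-to-verbosity mapping above might look roughly like this. The level names come from the commit message; the `Flag` type, the ordering of the constructors, and the "last flag wins" rule are assumptions of this sketch.

```haskell
-- Levels ordered from least to most chatty (ordering is an assumption).
data Verbosity = ERROR | WARNING | INFO | DEBUG
  deriving (Show, Eq, Ord, Enum, Bounded)

-- Hypothetical representation of the three CLI flags.
data Flag = Trace | Quiet | Verbose
  deriving (Show, Eq)

flagVerbosity :: Flag -> Verbosity
flagVerbosity Trace   = DEBUG
flagVerbosity Quiet   = ERROR
flagVerbosity Verbose = INFO

-- Start from the default (WARNING); the last flag given wins.
verbosityFromFlags :: [Flag] -> Verbosity
verbosityFromFlags = foldl (\_ f -> flagVerbosity f) WARNING

-- A message is shown if its level is no chattier than the current setting.
shouldLog :: Verbosity -> Verbosity -> Bool
shouldLog current level = level <= current

main :: IO ()
main = do
  print (verbosityFromFlags [])
  print (verbosityFromFlags [Verbose])
  print (shouldLog (verbosityFromFlags [Quiet]) WARNING)
```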
2017-01-25 17:07:43 +01:00
John MacFarlane
d1efc839f1 Removed writerHighlight; made writerHighlightStyle a Maybe.
API change.

For no highlighting, set writerHighlightStyle to Nothing.
2017-01-25 17:07:43 +01:00
John MacFarlane
6f9df9b4f1 Removed vestigial writerMediaBag from WriterOptions.
API change.
2017-01-25 17:07:43 +01:00
John MacFarlane
4007d6a897 Removed writerIgnoreNotes.
Instead, just temporarily remove notes when generating
TOC lists in HTML and Markdown (as we already did in LaTeX).

Also export deNote from Text.Pandoc.Shared.

API change in Shared and Options.WriterOptions.
2017-01-25 17:07:42 +01:00
John MacFarlane
00c6c371f2 Removed unused readerFileScope.
API change.
2017-01-25 17:07:42 +01:00
John MacFarlane
0bcc81c0b1 Removed writerTeXLigatures.
Make `smart` extension work in LaTeX/ConTeXt writers instead.

Instead of `-t latex --no-tex-ligatures`, do `-t latex-smart`.
2017-01-25 17:07:42 +01:00
John MacFarlane
a58369a7e6 Options: changed default reader/writerExtensions to emptyExtensions.
Previously they were pandocExtensions.
This didn't make sense for many formats.
2017-01-25 17:07:42 +01:00
John MacFarlane
5bf9125770 Removed readerOldDashes and --old-dashes option, added old_dashes extension.
API change.  CLI option change.
2017-01-25 17:07:42 +01:00
John MacFarlane
6f8b967d98 Removed readerSmart and the --smart option; added Ext_smart extension.
Now you will need to do

    -f markdown+smart

instead of

    -f markdown --smart

This change opens the way for writers, in addition to readers,
to be sensitive to +smart, but this change hasn't yet been made.

API change. Command-line option change.

Updated manual.
2017-01-25 17:07:42 +01:00
John MacFarlane
3876b91448 Make Extensions a custom type instead of a Set Extension.
The type is implemented in terms of an underlying bitset
which should be more efficient.

API change: from Text.Pandoc.Extensions export Extensions,
emptyExtensions, extensionsFromList, enableExtension, disableExtension,
extensionEnabled.
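
A minimal sketch of a bitset-backed `Extensions` type with the exported function names listed above. The representation (an `Integer` indexed by `fromEnum`), the argument order, and the small `Extension` enumeration are assumptions made for illustration.

```haskell
import Data.Bits (clearBit, setBit, testBit)

-- A few illustrative constructors; the real Extension type has many more.
data Extension = Ext_smart | Ext_raw_tex | Ext_raw_html | Ext_old_dashes
  deriving (Show, Eq, Ord, Enum, Bounded)

-- One bit per extension, indexed by the constructor's position.
newtype Extensions = Extensions Integer
  deriving (Show, Eq)

emptyExtensions :: Extensions
emptyExtensions = Extensions 0

enableExtension :: Extension -> Extensions -> Extensions
enableExtension x (Extensions bits) = Extensions (setBit bits (fromEnum x))

disableExtension :: Extension -> Extensions -> Extensions
disableExtension x (Extensions bits) = Extensions (clearBit bits (fromEnum x))

extensionEnabled :: Extension -> Extensions -> Bool
extensionEnabled x (Extensions bits) = testBit bits (fromEnum x)

extensionsFromList :: [Extension] -> Extensions
extensionsFromList = foldr enableExtension emptyExtensions

main :: IO ()
main = do
  let exts = extensionsFromList [Ext_smart, Ext_raw_tex]
  print (extensionEnabled Ext_smart exts)
  print (extensionEnabled Ext_raw_html exts)
  print (extensionEnabled Ext_smart (disableExtension Ext_smart exts))
```

Membership tests and updates become single bit operations instead of tree lookups in a `Set`, which is where the expected efficiency gain would come from.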
2017-01-25 17:07:42 +01:00
John MacFarlane
1427252160 Split extensions code from Options into separate Text.Pandoc.Extensions.
API change.
However, Options re-exports Extensions, so this shouldn't have
much impact.
2017-01-25 17:07:42 +01:00
John MacFarlane
ce1664cf2b Simplified reference-docx/reference-odt to reference-doc.
* Text.Pandoc.Options.WriterOptions: removed writerReferenceDocx
  and writerReferenceODT, replaced them with writerReferenceDoc.
  This can hold either an ODT or a Docx. In this way, writerReferenceDoc
  is like writerTemplate, which can hold templates of different
  formats. [API change]

* Removed `--reference-docx` and `--reference-odt` options.

* Added `--reference-doc` option.
2017-01-25 17:07:41 +01:00
John MacFarlane
fb8a2540bd Options: Removed writerStandalone, made writerTemplate a Maybe.
Previously setting writerStandalone = True did nothing unless
a template was provided in writerTemplate.  Now a fragment
will be generated if writerTemplate is Nothing; otherwise,
the specified template will be used and standalone output
generated.  [API change]
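
The new behavior can be sketched with a toy writer. The one-field `WriterOptions` record, the `render` function, and the `$body$` placeholder substitution are simplifications assumed for this example, not pandoc's template engine.

```haskell
import Data.List (isPrefixOf)

-- Toy options record with only the field discussed above.
newtype WriterOptions = WriterOptions { writerTemplate :: Maybe String }

-- Replace every occurrence of a non-empty needle (an empty needle would loop).
substitute :: String -> String -> String -> String
substitute _ _ [] = []
substitute needle new s@(c:cs)
  | needle `isPrefixOf` s = new ++ substitute needle new (drop (length needle) s)
  | otherwise = c : substitute needle new cs

-- Nothing: emit the bare fragment. Just tmpl: emit a standalone document.
render :: WriterOptions -> String -> String
render opts body =
  case writerTemplate opts of
    Nothing   -> body
    Just tmpl -> substitute "$body$" body tmpl

main :: IO ()
main = do
  putStrLn (render (WriterOptions Nothing) "<p>hi</p>")
  putStrLn (render (WriterOptions (Just "<html>$body$</html>")) "<p>hi</p>")
```

With a single `Maybe` field there is no longer a state in which standalone output is requested but no template is available.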
2016-11-30 15:34:58 +01:00
Albert Krewinkel
1fc07ff4da Refactor top-level division selection (#3261)
The "default" option is no longer represented as `Nothing` but via a new
type constructor, making the `Maybe` wrapper superfluous.

The default behavior of using heuristics can now be enabled explicitly
by setting `--top-level-division=default`.

API change (`Text.Pandoc.Options`): The `Division` type was renamed to
`TopLevelDivision`. The `Section`, `Chapter`, and `Part` constructors
were renamed to `TopLevelSection`, `TopLevelChapter`, and
`TopLevelPart`, respectively. An additional `TopLevelDefault`
constructor was added, which is now also the new default value of the
`writerTopLevelDivision` field in `WriterOptions`.
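
A sketch of the renamed type and how the option value could map onto it. The constructor names come from the message above; `parseTopLevelDivision` and `resolveDivision` are hypothetical helpers, and the heuristic itself is passed in rather than implemented.

```haskell
data TopLevelDivision
  = TopLevelPart
  | TopLevelChapter
  | TopLevelSection
  | TopLevelDefault   -- let the writer's heuristics decide
  deriving (Show, Eq)

-- Hypothetical parser for the value given to --top-level-division.
parseTopLevelDivision :: String -> Either String TopLevelDivision
parseTopLevelDivision s = case s of
  "part"    -> Right TopLevelPart
  "chapter" -> Right TopLevelChapter
  "section" -> Right TopLevelSection
  "default" -> Right TopLevelDefault
  _         -> Left ("unknown top-level division: " ++ s)

-- An explicit choice wins; TopLevelDefault defers to the heuristic guess.
resolveDivision :: TopLevelDivision -> TopLevelDivision -> TopLevelDivision
resolveDivision heuristic TopLevelDefault = heuristic
resolveDivision _ explicit = explicit

main :: IO ()
main = do
  print (parseTopLevelDivision "chapter")
  print (fmap (resolveDivision TopLevelSection) (parseTopLevelDivision "default"))
```

Compared with `Maybe Division`, the explicit `TopLevelDefault` constructor makes "use the heuristics" a value users can name on the command line.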
2016-11-27 20:31:04 +01:00
Albert Krewinkel
baa25362a4 Allow to overwrite top-level division type heuristics (#3258)
Pandoc uses heuristics to determine the most reasonable top-level
division type when emitting LaTeX or Docbook markup.  It is now possible
to overwrite this implicitly set top-level division via the
`top-level-division` command line parameter.

API change (`Text.Pandoc.Options`): the type of the
`writerTopLevelDivision` field of the `WriterOptions` data type is
altered from `Division` to `Maybe Division`. The field's default value
is changed from `Section` to `Nothing`.

Closes: #3197
2016-11-26 21:43:46 +01:00
John MacFarlane
696dfbc993 Added angle_brackets_escapable extension.
This is needed because github flavored Markdown has a slightly
different set of escapable symbols than original Markdown;
it includes angle brackets.

Closes #2846.
2016-10-22 23:41:55 +02:00
Albert Krewinkel
595a171407 Add option for top-level division type
The `--chapters` option is replaced with `--top-level-division`, which lets
users specify the division type used for top-level headers. Possible
values are `section` (the default), `chapter`, or `part`.

The formats LaTeX, ConTeXt, and Docbook allow `part` as top-level division; TEI
only allows setting the `type` attribute on `div` containers.  The writers are
altered to respect this option in a sensible way.
2016-10-19 13:12:57 +02:00
Oliver Matthews
23fb52ef7d Add --parts command line option to LaTeX writer.
Add --parts command line argument.
This only affects the LaTeX writer, and only for non-beamer output formats.
It changes the output levels so the top level is 'part', the next
'chapter' and then into sections.
2016-09-06 21:43:45 +02:00
Jesse Rosenthal
27113bda1f Options: Add references location.
This will be used by the markdown writer for deciding where to put links
and footnotes.
2016-10-11 13:30:01 -04:00
KolenCheung
46be319ca9 removed mmd raw_tex in src/Text/Pandoc/Options.hs 2016-10-09 21:30:03 +02:00
John MacFarlane
e95047ed85 Markdown reader: added bracket syntax for native spans.
See #168.

Text.Pandoc.Options.Extension has a new constructor `Ext_bracketed_spans`,
which is enabled by default in pandoc's Markdown.
2016-09-28 12:33:05 +02:00
John MacFarlane
58d60b1c85 Changed email-obfuscation default to no obfuscation.
- `writerEmailObfuscation` in `defaultWriterOptions` is now
  `NoObfuscation`
- the default for the command-line `--email-obfuscation` option is
  now `none`.

Closes #2988.
2016-06-20 10:37:23 -07:00
Ivo Clarysse
987ec3a752 Write out Docbook 5 namespace 2016-04-29 15:43:15 -07:00
Ivo Clarysse
271cb4d845 Add docbook5 writer support 2016-04-29 14:00:46 -07:00
John MacFarlane
499985c1a3 Updated copyright dates to include 2016. 2016-03-22 17:20:39 -07:00
Jesse Rosenthal
5c055b4cf3 Introduce file-scope parsing (parse-before-combine)
Traditionally pandoc operates on multiple files by first concatenating
them (separated by extra line breaks) and then processing the joined file. So
it only parses a multi-file document at the document scope. This has the
benefit that footnotes and links can be in different files, but it also
introduces a couple of difficulties:

  - it is difficult to join files with footnotes without some sort of
    preprocessing, which makes it difficult to write academic documents
    in small pieces.

  - it makes it impossible to process multiple binary input files, which
    can't be catted.

  - it makes it impossible to process files from different input
    formats.

This commit introduces an alternative method. Instead of catting the files
first, it parses the files first, and then combines the parsed
output. This makes it impossible to have links across multiple files,
and auto-identified headers won't work correctly if headers in multiple
files have the same name. On the other hand, footnotes across multiple
files will work correctly and will allow more freedom for input formats.
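
The difference between the two strategies can be shown with a toy model in which a "document" is just its table of footnote definitions. Everything here (`parseNotes`, `documentScope`, `fileScope`, and the `[^label]: text` line format) is a simplified assumption for illustration, not pandoc's implementation.

```haskell
import Data.List (isPrefixOf)
import qualified Data.Map as M

type Notes = M.Map String String

-- Collect lines of the form "[^label]: text" into a note table.
-- With duplicate labels, the later definition silently wins.
parseNotes :: String -> Notes
parseNotes = M.fromList . concatMap note . lines
  where
    note l
      | "[^" `isPrefixOf` l
      , (label, ']' : ':' : rest) <- break (== ']') (drop 2 l)
      = [(label, dropWhile (== ' ') rest)]
      | otherwise = []

-- Concatenate first, then parse: one shared namespace for all files.
documentScope :: [String] -> Notes
documentScope = parseNotes . unlines

-- Parse each file on its own, then combine: one namespace per file.
fileScope :: [String] -> [Notes]
fileScope = map parseNotes

main :: IO ()
main = do
  let files = ["[^1]: first", "[^1]: second"]
  print (M.toList (documentScope files))    -- the two [^1] notes collide
  print (map M.toList (fileScope files))    -- each file keeps its own [^1]
```

The same per-file namespace is also why links across files stop resolving under this method: a definition in one file is invisible to the others.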

Since ByteStringReaders can currently only read one binary file, and
will ignore subsequent files, we also change the behavior to
automatically parse before combining if using the ByteStringReader. If
we use one file, it will work as normal. If there is more than one file
it will combine them after parsing (assuming that the format is the
same).

Note that this is intended to be an optional method, defaulting to
off. Turn it on with `--file-scope`.
2016-03-15 12:52:51 -04:00
John MacFarlane
bbc67dee36 Removed tex_math_single_backslash from markdown_github options.
Closes #2707.
2016-02-09 22:30:52 -08:00
John MacFarlane
44120ea716 Implemented east_asian_line_breaks extension.
Text.Pandoc.Options: Added `Ext_east_asian_line_breaks` constructor to
`Extension` (API change).

This extension is like `ignore_line_breaks`, but smarter -- it
only ignores line breaks between two East Asian wide characters.
This makes it better suited for writing with a mix of East Asian
and non-East Asian scripts.
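
The rule can be sketched as a line-joining function. The `isWide` test below is a rough assumption covering only a few CJK ranges, not a full East Asian Width lookup, and `joinLines` is a hypothetical helper rather than the reader's actual code.

```haskell
-- Rough stand-in for "East Asian wide": kana, CJK ideographs,
-- Hangul syllables, and fullwidth forms only (assumption).
isWide :: Char -> Bool
isWide c = any inside [(0x3040, 0x30FF), (0x4E00, 0x9FFF), (0xAC00, 0xD7A3), (0xFF01, 0xFF60)]
  where
    inside (lo, hi) = let n = fromEnum c in n >= lo && n <= hi

-- Join lines: drop the break between two wide characters,
-- otherwise render it as a space (as a soft line break would be).
joinLines :: [String] -> String
joinLines [] = ""
joinLines [l] = l
joinLines (l : next : rest)
  | wideEnd l && wideStart next = l ++ joinLines (next : rest)
  | otherwise = l ++ " " ++ joinLines (next : rest)
  where
    wideEnd s = case reverse s of
      (c : _) -> isWide c
      []      -> False
    wideStart (c : _) = isWide c
    wideStart []      = False

main :: IO ()
main = do
  -- print escapes non-ASCII characters, so this works in any locale.
  print (joinLines ["日本語", "テキスト"])
  print (joinLines ["日本語", "text"])
  print (joinLines ["hello", "world"])
```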

Closes #2586.
2015-12-12 17:28:52 -08:00
John MacFarlane
536b6bf538 Implemented SoftBreak and new --wrap option.
Added threefold wrapping option.

* Command line option: deprecated `--no-wrap`, added
  `--wrap=[auto|none|preserve]`
* Added WrapOption, exported from Text.Pandoc.Options
* Changed type of writerWrapText in WriterOptions from
  Bool to WrapOption.
* Modified Text.Pandoc.Shared functions for SoftBreak.
* Supported SoftBreak in writers.
* Updated tests.
* Updated README.

Closes #1701.
2015-12-11 23:55:08 -08:00
John MacFarlane
73e2d7976c Renamed link attribute extensions.
* Old `link_attributes` -> `mmd_link_attributes`
* Recently added `common_link_attributes` -> `link_attributes`

Note: this change could break some existing workflows.
2015-11-19 23:17:50 -08:00
John MacFarlane
244cd5644b Merge branch 'new-image-attributes' of https://github.com/mb21/pandoc into mb21-new-image-attributes
* Bumped version to 1.16.
* Added Attr field to Link and Image.
* Added `common_link_attributes` extension.
* Updated readers for link attributes.
* Updated writers for link attributes.
* Updated tests
* Updated stack.yaml to build against unreleased versions of
  pandoc-types and texmath.
* Fixed various compiler warnings.

Closes #261.

TODO:

* Relative (percentage) image widths in docx writer.
* ODT/OpenDocument writer (untested, same issue about percentage widths).
* Update pandoc-citeproc.
2015-11-19 23:14:23 -08:00
John MacFarlane
8f5ff7075c Derive Generic instances for types in Text.Pandoc.Options. 2015-11-14 17:46:55 -08:00