Previously we had tested certain properties of the output PowerPoint
slides. Corruption, though, comes as the result of a numebr of
interrelated issues in the output pptx archive. This is a new
approach, which compares the output of the Powerpoint writer with
files that we know to (a) not be corrupt, and (b) to show the desired
output behavior (details below). This commit introduces three tests
using the new framework. More will follow.
The test procedure: given a native file and a pptx file, we generate a
pptx archive from the native file, and then test:
1. Whether the same files are in the two archives
2. Whether each of the contained xml files is the same. (We skip time
entries in `docProps/core.xml`, since these are derived from IO. We
just check to make sure that they're there in the same way in both
files.)
3. Whether each of the media files is the same.
Note that steps 2 and 3, though they compare multiple files, are one
test each, since the number of files depends on the input file (if
there is a failure, it will only report the first failed file
comparison in the test failure).
We introduce a new module, Text.Pandoc.Readers.Docx.Fields which
contains a simple parsec parser. At the moment, only simple hyperlink
fields are accepted, but that can be extended in the future.
There are two steps in the conversion: a conversion from pandoc to a
Presentation datatype modeling pptx, and a conversion from
Presentation to a pptx archive. The two steps were sharing the same
state and environment, and the code was getting a bit
spaghetti-ish. This separates the conversion into separate
modules (T.P.W.Powerpoint.Presentation, which defineds the
Presentation datatype and goes Pandoc->Presentation)
and (T.P.W.Pandoc.Output, which goes Presentation->Archive).
Text.Pandoc.Writers.Powerpoint a thin wrapper around the two modules.
If you use a custom syntax definition that refers to a syntax
you haven't loaded, pandoc will now complain when it is highlighting
the text, rather than at the start.
This saves a huge performance hit from the `missingIncludes` check.
Closes#4226.
This version fixes a bug that made it difficult to handle failures while
getting lists or a Map from Lua. A bug in pandoc, which made it
necessary to always pass a tag when using MetaList or MetaBlock, is
fixed as a result. Using the pandoc module's constructor functions for
these values is now optional (if still recommended).
* Previously we ran all lua filters before JSON filters.
* Now we run filters in the order they are presented on the
command line, whether lua or JSON.
* The type of `applyFilters` has changed (incompatible API change).
* `applyLuaFilters` has been removed (incompatible API change).
* Bump version to 2.1.
See #4196.
This is the beginning of a test suite for the powerpoint
writer. Initial tests are for the number of slides.
Note that at the moment it does not test against corruption in
Microsoft PowerPoint; it just tests that certain outcomes work as
expected. More tests will be added.
This test framework uses the PandocPure monad introduced with Pandoc 2.0.
The level of headers in included files can be shifted to a higher level
by specifying a minimum header level via the `:minlevel` parameter. E.g.
`#+include: "tour.org" :minlevel 1` will shift the headers in tour.org
such that the topmost headers become level 1 headers.
Fixes: #4154
The org reader test file had grown large, to the point that editor
performance was negatively affected in some cases. The tests are spread
over multiple submodules, and re-combined into a tasty TestTree in the
main org reader test file.
The same init file (`data/init`) that is used to setup the Lua
interpreter for Lua filters is also used to setup the interpreter of
custom writers.lua.
Support writing <fig> and <table-wrap> elements with <title> and
<caption> inside them by using Divs with class set to on of
fig, table-wrap or cation. The title is included as a Heading
so the constraint on where Heading can occur is also relaxed.
Also leaves out empty alt attributes on links.
This fixes a bug in 2.0.4, whereby pandoc could not
read the theme files generated with `--print-highlight-style`.
It also fixes some CSS issues involving line numbers.
Highlighted code blocks are now enclosed in a div with class
sourceCode.
Highlighting CSS no longer sets a generic color for pre
and code; we only set these for class `sourceCode`.
This will close#4133 and #4128.
The file `init.lua` is used to initialize the Lua interpreter which is
used in Lua filters. This gives users the option to require libraries
which they want to use in all of their filters, and to extend default
modules.
The integration with Lua's package/module system is improved: A
pandoc-specific package searcher is prepended to the searchers in
`package.searchers`. The modules `pandoc` and `pandoc.mediabag` can now
be loaded via `require`.
The List module is automatically loaded, but not assigned to a global
variable. It can be included in filters by calling `List = require
'List'`.
Lists of blocks, lists of inlines, and lists of classes are now given
`List` as a metatable, making working with them more convenient. E.g.,
it is now possible to concatenate lists of inlines using Lua's
concatenation operator `..` (requires at least one of the operants to
have `List` as a metatable):
function Emph (emph)
local s = {pandoc.Space(), pandoc.Str 'emphasized'}
return pandoc.Span(emph.content .. s)
end
Closes: #4081
The `text` module is preloaded in lua. The module contains some UTF-8
aware string functions, implemented in Haskell. The module is loaded on
request only, e.g.:
text = require 'text'
function Str (s)
s.text = text.upper(s.text)
return s
end
Refactored some code from Text.Pandoc.Lua.PandocModule
into new internal module Text.Pandoc.Lua.Filter.
Add `walk_inline` and `walk_block` in pandoc lua module.
The line identifiers are built using the code block's identifier
as a prefix. If the code block has null identifier, we use
"cb1", "cb2", etc.
Closes#4031.
This prevents the problem with extra space around highlighted
code blocks (closes#3996).
Note that we no longer put an enclosing div around highlighted
code blocks. The pre is the outer element, just as for unhighlighted
blocks.
Previously `\include` wouldn't work if the included file
contained, e.g., a begin without a matching end.
We've changed the Tok type so that it stores a full SourcePos,
rather than just a line and column. So tokens keeep track
of the file they came from. This allows us to use a simpler
method for includes, which doesn't require parsing the included
document as a whole.
Closes#3971.
Now all the guts of openURL have been put into openURL from
Class. openURL is now sensitive to stRequestHeaders in CommonState
and will add these custom headers when making a request.
It no longer looks at the USER_AGENT environment variable,
since you can now set the `User-Agent` header directly.
We now use the default.latex template for both latex and beamer.
It contains conditionals for the beamer-specific things.
`pandoc -D beamer` will return this template.
Stack instances for common data types are now provides by hslua. The
instance for Either was useful only for a very specific case; the
function that was using the `ToLuaStack Either` instance was rewritten
to work without it.
Closes: #3805
We assume that comments are defined as parsed by the
docx reader:
I want <span class="comment-start" id="0" author="Jesse Rosenthal"
date="2016-05-09T16:13:00Z">I left a comment.</span>some text to
have a comment <span class="comment-end" id="0"></span>on it.
We assume also that the id attributes are unique and properly
matched between comment-start and comment-end.
Closes#2994.
* readDataFile, readDefaultDataFile, getReferenceDocx,
getReferenceODT have been removed from Shared and
moved into Class. They are now defined in terms of
PandocMonad primitives, rather than being primitve
methods of the class.
* toLang has been moved from BCP47 to Class.
* NoTranslation and CouldNotLoudTranslations have
been added to LogMessage.
* New module, Text.Pandoc.Translations, exporting
Term, Translations, readTranslations.
* New functions in Class: translateTerm, setTranslations.
Note that nothing is loaded from data files until
translateTerm is used; setTranslation just sets the
language to be used.
* Added two translation data files in data/translations.
* LaTeX reader: Support `\setmainlanguage` or `\setdefaultlanguage`
(polyglossia) and `\figurename`.
In addition to `-Wall`:
`-Wincomplete-uni-patterns -Wincomplete-record-updates -Wredundant-constraints -Wcompat -Wnoncanonical-monad-instances -Wnoncanonical-monadfail-instances`
We no longer have a separate readGFM and writeGFM;
instead, we'll use readCommonMark and writeCommonMark
with githubExtensions.
It remains to implement these extensions conditionally.
Closes#3841.
This uses bindings to GitHub's fork of cmark, so it should parse
gfm exactly as GitHub does (excepting certain postprocessing
steps, involving notifications, emojis, etc.).
* Added Text.Pandoc.Readers.GFM (exporting readGFM)
* Added Text.Pandoc.Writers.GFM (exporting writeGFM)
* Added `gfm` as input and output forma
Note that tables are currently always rendered as HTML
in the writer; this can be improved when CMarkGFM supports
tables in output.
Added TikiWiki reader, including tests and documentation.
It's probably not *complete*, but it works pretty well, handles all
the basics (and some not-so-basics).
This rewrite is primarily motivated by the need to
get macros working properly. A side benefit is that the
reader is significantly faster (27s -> 19s in one
benchmark, and there is a lot of room for further
optimization).
We now tokenize the input text, then parse the token stream.
Macros modify the token stream, so they should now be effective
in any context, including math. Thus, we no longer need the clunky
macro processing capacities of texmath.
A custom state LaTeXState is used instead of ParserState.
This, plus the tokenization, will require some rewriting
of the exported functions rawLaTeXInline, inlineCommand,
rawLaTeXBlock.
* Added Text.Pandoc.Readers.LaTeX.Types (new exported module).
Exports Macro, Tok, TokType, Line, Column. [API change]
* Text.Pandoc.Parsing: adjusted type of `insertIncludedFile`
so it can be used with token parser.
* Removed old texmath macro stuff from Parsing.
Use Macro from Text.Pandoc.Readers.LaTeX.Types instead.
* Removed texmath macro material from Markdown reader.
* Changed types for Text.Pandoc.Readers.LaTeX's
rawLaTeXInline and rawLaTeXBlock. (Both now return a String,
and they are polymorphic in state.)
* Added orgMacros field to OrgState. [API change]
* Removed readerApplyMacros from ReaderOptions.
Now we just check the `latex_macros` reader extension.
* Allow `\newcommand\foo{blah}` without braces.
Fixes#1390.
Fixes#2118.
Fixes#3236.
Fixes#3779.
Fixes#934.
Fixes#982.
* New module Text.Pandoc.Readers.Vimwiki, exporting readVimwiki [API change].
* New input format `vimwiki`.
* New data file, `data/vimwiki.css`, for displaying the HTML produced by this reader and pandoc's HTML writer in the style of vimwiki's own HTML export.
Support for the `#+INCLUDE:` file inclusion mechanism was added.
Recognized include types are *example*, *export*, *src*, and normal org
file inclusion. Advanced features like line numbers and level selection
are not implemented yet.
Closes: #3510
Supporting two completely different libraries for fetching
from URLs makes it difficult to trap errors, because of
different error types expected from the libraries.
There's no clear reason not to build with these https-capable
libraires.
Writer helper functions were defined in the top-level Text.Pandoc
module. These functions are moved to the Writer submodule as to enable
reuse in other submodules.
Reader helper functions were defined in the top-level Text.Pandoc
module. These functions are moved to the Readers submodule as to enable
reuse in other submodules.
The lua filters and custom lua writer system defined very similar
StackValue instances for strings and tuples. These instance definitions
are extracted to a separate module to enable sharing.
These are caught (and lead to exit) in pandoc.hs, but
other uses of Text.Pandoc.App may want to recover in another
way.
Added PandocAppError to PandocError (API change).
This is a stopgap: later we should have a separate constructor
for each type of error.
Also fixed uses of 'exit' in Shared.readDataFile, and
removed 'err' from Shared (API change).
Finally, removed the dependency on extensible-exceptions.
See #3548.
Plain text readers are exposed to lua scripts via the `pandoc.reader`
submodule, which is further subdivided by format. Converting e.g. a
markdown string into a pandoc document is possible from within lua:
doc = pandoc.reader.markdown.read_doc("Hello, World!")
A `read_block` convenience function is provided for all formats,
although it will still parse the whole string but return only the first
block as the result.
Custom reader options are not supported yet, default options are used
for all parsing operations.
I think template haskell is robust enough now across platforms
that this will work.
Motivation: file-embed gives us better dependency tracking: if a data
file changes, ghc/stack/cabal know to recompile the Data module.
This also removes hsb2hs as a build dependency.
The 0.5.0 release of hslua fixes problems with lua C modules on linux.
The signature of the `loadstring` function changed, so a compatibility
wrapper is introduced to allow both 0.4.* and 0.5.* versions to be used.
* New module: Text.Pandoc.Writers.Ms.
* New template: default.ms.
* The writer uses texmath's new eqn writer to convert math
to eqn format, so a ms file produced with this writer
should be processed with `groff -ms -e` if it contains
math.
* Add `--lua-filter` option. This works like `--filter` but takes pathnames of special lua filters and uses the lua interpreter baked into pandoc, so that no external interpreter is needed. Note that lua filters are all applied after regular filters, regardless of their position on the command line.
* Add Text.Pandoc.Lua, exporting `runLuaFilter`. Add `pandoc.lua` to data files.
* Add private module Text.Pandoc.Lua.PandocModule to supply the default lua module.
* Add Tests.Lua to tests.
* Add data/pandoc.lua, the lua module pandoc imports when processing its lua filters.
* Document in MANUAL.txt.
This contains a list of strings that will be recognized by pandoc's
Markdown parser as abbreviations. (A nonbreaking space will
be inserted after the period, preventing a sentence space in
formats like LaTeX.)
Users can override the default by putting a file abbreviations
in their user data directory (`~/.pandoc` on *nix).
Closes#1905.
Removed stateChapters from ParserState.
Now we parse chapters as level 0 headers, and parts as level -1 headers.
After parsing, we check for the lowest header level, and if it's
less than 1 we bump everything up so that 1 is the lowest header level.
So `\part` will always produce a header; no command-line options
are needed.
+ Text.Pandoc.Writers.Shared
+ Text.Pandoc.Parsing
+ Text.Pandoc.Asciify
+ Text.Pandoc.Emoji
+ Text.Pandoc.ImageSize
[API change]
These are often helpful to people writing their own
reader or writer modules. Closes#3260.
This now contains the Verbosity definition previously
in Options, as well as a new LogMessage datatype that
will eventually be used instead of raw strings for
warnings.
This will enable us, among other things, to provide
machine-readable warnings if desired.
See #3392.
The App module provides a function that does a pandoc conversion,
based on option settings. The program (pandoc.hs) now does nothing
more than parse options and pass them to this function, which can
easily be used by other applications (e.g. a GUI wrapper).
The Opt structure has been further simplified.
API changes:
* New exposed module Text.Pandoc.App
* Text.Pandoc.Highlighting has been exposed.
* highlightingStyles has been moved to Text.Pandoc.Highlighting.
Removed writerDocbookVersion in WriterOptions.
Renamed default.docbook template to default.docbook4.
Allow docbook4 as an output format.
But alias docbook = docbook4.
* Text.Pandoc.Writers.HTML: removed writeHtml, writeHtmlString,
added writeHtml4, writeHtml4String, writeHtml5, writeHtml5String.
* Removed writerHtml5 from WriterOptions.
* Renamed default.html template to default.html4.
* "html" now aliases to "html5"; to get the old HTML4 behavior,
you must now specify "-t html4".
The type is implemented in terms of an underlying bitset
which should be more efficient.
API change: from Text.Pandoc.Extensions export Extensions,
emptyExtensions, extensionsFromList, enableExtension, disableExtension,
extensionEnabled.
* Remove exported module `Text.Pandoc.Readers.TeXMath`
* Add exported module `Text.Pandoc.Writers.Math`
* The function `texMathToInlines` now lives in `Text.Pandoc.Writers.Math`
* Export helper function `convertMath` from `Text.Pandoc.Writers.Math`
* Use these functions in all writers that do math conversion.
This ensures that warnings will always be issued for failed
math conversions.
Introduce a new module, Text.Pandoc.Free, with pure versions, based on
the free monad, of numerous IO functions used in writers and
readers. These functions are in a pure
Monad (PandocAction). PandocAction takes as a parameter the type of
IORefs in it. It can be aliased in individual writers and readers to
avoid this parameter.
Note that this means that at the moment a reader can only use one type
of IORef. If possible, it would be nice to remove this limitation.
So far this just reproduces capacity.
Later we'll be able to add features like warning
messages, dynamic loading of xml syntax definitions,
and dynamic loading of themes.
+ Removed Text.Pandoc.Readers.Docx.Fonts
+ Moved its code to texmath; we now use (from texmath 0.9)
Text.TeXMath.Unicode.Fonts
+ Use texmath 0.9 (currently from git).
+ Updated epub tests because texmath now handles more mathml.
We already lower-bound tagsoup at 0.13.7, which means we were always
running the compatibility layer (it was conditional on min value
0.13). Better to just use `lookupEntity` from the library directly, and
convert a string to a char if need be.