List item contents are now parsed as blocks,
without resorting to parseFromString.
Only the first line of a paragraph has to
be indented now, just like in Emacs Muse
and Text::Amuse.
Definition lists are not refactored yet.
See also: issue #3865.
This is difficult to recreate with a modern version of Word, so I'm
using the file submitted with the bug report. It would be preferable
to find a smaller example with Latin characters, though, so as not to
confuse the issue being tested.
The change both improves performance and fixes a
regression whereby normal citations inside inline notes
were not parsed correctly.
Closes jgm/pandoc-citeproc#315.
Elements with attributes got an additional `attr` accessor. Attributes
were previously accessible only via the `identifier`, `classes`, and
`attributes` fields, which conflicted with the documentation, which
indirectly states that such elements have an `attr` property.
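For example, in a Lua filter (a minimal sketch; the `Div` handler is
just for illustration, any element with attributes works the same way):

function Div (el)
  -- el.attr bundles identifier, classes, and attributes
  print(el.attr.identifier)                  -- same as el.identifier
  print(table.concat(el.attr.classes, ' '))  -- same as el.classes
  return el
end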
Every constructor which accepts a list of blocks now also accepts a
single block element for convenience. Furthermore, strings are accepted as
shorthand for `{pandoc.Str "text"}` in constructors.
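For example (a sketch; the short and the explicit forms below should be
equivalent):

-- a single block instead of a list of blocks:
div = pandoc.Div(pandoc.Para 'text')
-- the explicit long form:
div = pandoc.Div{pandoc.Para{pandoc.Str 'text'}}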
We had previously defaulted to slideLevel 2. Now we use the correct
behavior of defaulting to the highest level header followed by
content. We change an expected test result to match this behavior.
This gives a pure way to insert an ersatz file into a FileTree.
In addition, we normalize paths both on insertion and on
lookup, so that "foo" and "./foo" will be judged equivalent.
This is the beginning of a test suite for the powerpoint
writer. Initial tests are for the number of slides.
Note that at the moment it does not test against corruption in
Microsoft PowerPoint; it just tests that certain outcomes work as
expected. More tests will be added.
This test framework uses the PandocPure monad introduced with Pandoc 2.0.
The level of headers in included files can be shifted to a higher level
by specifying a minimum header level via the `:minlevel` parameter. E.g.
`#+include: "tour.org" :minlevel 1` will shift the headers in tour.org
such that the topmost headers become level 1 headers.
Fixes: #4154
The org reader test file had grown large, to the point that editor
performance was negatively affected in some cases. The tests are spread
over multiple submodules, and re-combined into a tasty TestTree in the
main org reader test file.
The same init file (`data/init`) that is used to set up the Lua
interpreter for Lua filters is also used to set up the interpreter for
custom Lua writers.
Support writing <fig> and <table-wrap> elements with <title> and
<caption> inside them by using Divs with class set to one of
fig, table-wrap or caption. The title is included as a Heading,
so the constraint on where Headings can occur is also relaxed.
Also leaves out empty alt attributes on links.
The integration with Lua's package/module system is improved: A
pandoc-specific package searcher is prepended to the searchers in
`package.searchers`. The modules `pandoc` and `pandoc.mediabag` can now
be loaded via `require`.
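E.g., from a filter or custom writer:

local pandoc = require 'pandoc'
local mediabag = require 'pandoc.mediabag'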
* Deprecate `--strip-empty-paragraphs` option. Instead we now
use an `empty_paragraphs` extension that can be enabled on
the reader or writer. It is disabled by default. (A usage
example follows this list.)
* Add `Ext_empty_paragraphs` constructor to `Extension`.
* Revert "Docx reader: don't strip out empty paragraphs."
This reverts commit d6c58eb836.
* Implement `empty_paragraphs` extension in docx reader and writer,
opendocument writer, html reader and writer.
* Add tests for `empty_paragraphs` extension.
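For instance, to preserve empty paragraphs in a docx to HTML
conversion, the extension would be enabled on both reader and writer (a
sketch of the intended usage):

pandoc -f docx+empty_paragraphs -t html+empty_paragraphs input.docx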
We now have the `--strip-empty-paragraphs` option for that,
if you want it. Closes #2252.
Updated docx reader tests.
We use stripEmptyParagraphs to avoid changing too
many tests. We should add new tests for empty paragraphs.
Attribute lists are represented as associative lists in Lua. Pure
associative lists are awkward to work with, so a metatable is attached
to attribute lists, allowing one to access and use the associative list
as if the attributes were stored as normal key-value pairs in a table.
Note that this changes the way `pairs` works on attribute lists. Instead
of producing integer keys and two-element tables, the resulting iterator
function now returns the key and value of those pairs. Use `ipairs` to
get the old behavior.
Warning: the new iteration mechanism only works if pandoc has been
compiled with Lua 5.2 or later (current default: 5.3).
The `pandoc.Attr` function is altered to allow passing attributes as
key-value pairs in a normal table. This is more convenient than having
to construct the associative list which is used internally.
Closes #4071.
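A short sketch of both conveniences (names and values are
illustrative):

-- attributes given as a normal key-value table:
local attr = pandoc.Attr('my-id', {'my-class'}, {foo = 'bar'})
-- the metatable lets us index the associative list directly:
print(attr.attributes.foo)                 -- bar
-- pairs now yields key/value; use ipairs for the old behavior:
for k, v in pairs(attr.attributes) do print(k, v) end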
The `text` module is preloaded in lua. The module contains some UTF-8
aware string functions, implemented in Haskell. The module is loaded on
request only, e.g.:
text = require 'text'

function Str (s)
  s.text = text.upper(s.text)
  return s
end
<annotation> is not allowed inside <body> according to the FictionBook2 XML schema. Besides that, the same information is already placed inside <description>.
Related bug: #2424
* Basic skeleton for creole reader.
No real functionality besides preliminary bold and italics yet.
* Creole: add support for bold/italic with implicit end at paragraph end.
* Creole: add support for headings.
* Creole: add support for tilde escaped chars.
* Add a test suite for the creole parser
So far this covers only things the parser already supports.
* Added simple parsing of flat unordered lists.
* Added tests for unordered lists in creole.
* First, wrong(!) implementation of sublists.
Fails test, as sublists should not be embedded in a list item!
* Implementation of unordered sublists.
* Added support for ordered lists to creole reader.
* Added utility function to append parsers to Creole reader.
* Creole reader: Fixed list item end detection in sub lists.
* Tests for creole reader: added more tests for lists.
Covering ordered and unordered tests, even mixed. Tests for
formatting in list items still missing...
* Added "nowiki" blocks. One exception rule is missing...
* Creole reader: nowiki: implemented exception for curly brackets.
* Creole reader: added inline nowiki.
* Creole reader: added horizontalRule.
* Creole reader: added auto linking of URIs.
* Creole reader: detect horizontalRule as para end.
Used the opportunity for a little refactoring.
* Creole reader: added forced line breaks.
Including test.
* Creole reader: implement wiki links.
* Creole reader: added image support.
* Creole reader: support images as links.
* Creole reader: implemented placeholders -- by simply dropping them.
* Creole reader: added tests for links.
After observing a regression, it was really time... ;-)
* Creole reader: fixed links with names.
* Creole reader: allow space after the first of enclosing tags.
Space after the start of formatting tags is allowed in creole,
e.g. "there is // italic text // in here" is legal.
This problem was discovered using the creole1.0test.txt document from
http://www.wikicreole.org/wiki/Creole1.0TestCases
See l.57:
# // italic item 3 //
* Creole reader: fixed links without names.
* Creole reader: Tests, sorted into groups.
* Creole reader: implemented tables.
* Removed redundant import.
* Creole reader: add correct escaping of links.
* Creole reader: allow handling of e.g. links in parenthesis and quotes.
* Creole reader: Modified disclaimer as most of the code is actually by me.
* Creole reader: Tests: added escaped links.
* Creole reader: preserve leading and trailing space in bold/italic.
* Creole reader: detect tables without a leading blank line.
* Creole Reader: added official creole1.0test.txt as "old" test.
The base document was downloaded from
http://www.wikicreole.org/wiki/Creole1.0TestCases.
The Wiki, and therefore the test document, is
Copyright (C) by the contributors.
Some rights reserved, license CC BY-SA.
http://creativecommons.org/licenses/by-sa/1.0/
* Added underlineSpan builder function. This can be easily updated if needed. The purpose is for Readers to transform underlines consistently.
* Docx Reader: Use underlineSpan and update test
* Org Reader: Use underlineSpan and add test
* Textile Reader: Use underlineSpan and add test case
* Txt2Tags Reader: Use underlineSpan and update test
* HTML Reader: Use underlineSpan and add test case
Removed `writerSourceURL` from `WriterOptions` (API change).
Added `stSourceURL` to `CommonState`.
It is set automatically by `setInputFiles`.
Text.Pandoc.Class now exports `setInputFiles`, `setOutputFile`.
The type of `getInputFiles` has changed; it now returns `[FilePath]`
instead of `Maybe [FilePath]`.
Functions in Class that formerly took the source URL as a parameter
now have one fewer parameter (`fetchItem`, `downloadOrRead`,
`setMediaResource`, `fillMediaBag`).
Removed `WriterOptions` parameter from `makeSelfContained` in
`SelfContained`.
The org reader was updated to match current org-mode behavior: the set
of characters which are acceptable as the first or last character in
org emphasis has been changed and now allows all non-whitespace chars
at the inner border of emphasized text (see
`org-emphasis-regexp-components`).
Fixes: #3933
Previously pandoc would sometimes combine two line blocks separated by blanks, and ignore trailing blank lines within the line block.
The test is checked to be consistent with http://rst.ninjs.org/.
This change makes it possible to define a catch-all function using lua's
metatable lookup functionality.
function catch_all(el)
  …
end

return {
  setmetatable({}, {__index = function(_) return catch_all end})
}
A further effect of this change is that the map with filter functions
now only contains functions corresponding to AST element constructors.
Previously we just tacked on a directory to the command
line, but that didn't work when we e.g. used a pipe for round tripping,
with two invocations of pandoc.
Added TikiWiki reader, including tests and documentation.
It's probably not *complete*, but it works pretty well and handles all
the basics (and some not-so-basics).
This rewrite is primarily motivated by the need to
get macros working properly. A side benefit is that the
reader is significantly faster (27s -> 19s in one
benchmark, and there is a lot of room for further
optimization).
We now tokenize the input text, then parse the token stream.
Macros modify the token stream, so they should now be effective
in any context, including math. Thus, we no longer need the clunky
macro processing capabilities of texmath.
A custom state LaTeXState is used instead of ParserState.
This, plus the tokenization, will require some rewriting
of the exported functions rawLaTeXInline, inlineCommand,
rawLaTeXBlock.
* Added Text.Pandoc.Readers.LaTeX.Types (new exported module).
Exports Macro, Tok, TokType, Line, Column. [API change]
* Text.Pandoc.Parsing: adjusted type of `insertIncludedFile`
so it can be used with token parser.
* Removed old texmath macro stuff from Parsing.
Use Macro from Text.Pandoc.Readers.LaTeX.Types instead.
* Removed texmath macro material from Markdown reader.
* Changed types for Text.Pandoc.Readers.LaTeX's
rawLaTeXInline and rawLaTeXBlock. (Both now return a String,
and they are polymorphic in state.)
* Added orgMacros field to OrgState. [API change]
* Removed readerApplyMacros from ReaderOptions.
Now we just check the `latex_macros` reader extension.
* Allow `\newcommand\foo{blah}` without braces.
Fixes #1390.
Fixes #2118.
Fixes #3236.
Fixes #3779.
Fixes #934.
Fixes #982.
in Text.Pandoc.Lua. Also to pushPandocModule.
This change allows users to override pandoc.lua with a file
in their local data directory, adding custom functions, etc.
@tarleb, if you think this is a bad idea, you can revert this.
But in general our data files are all overridable.
* New module Text.Pandoc.Readers.Vimwiki, exporting readVimwiki [API change].
* New input format `vimwiki`.
* New data file, `data/vimwiki.css`, for displaying the HTML produced by this reader and pandoc's HTML writer in the style of vimwiki's own HTML export.
* XML.toEntities: changed type to Text -> Text.
* Shared.tabFilter -- fixed so it strips out CRs as before.
* Modified writers to take Text.
* Updated tests, benchmarks, trypandoc.
[API change]
Closes#3731.
The Emacs default is to include tags in the headline when exporting.
Instead of being rendered as empty spans that carry the tag name only
as an attribute, tags are now rendered as small caps and wrapped in
those spans. Non-breaking spaces serve as separators for multiple tags.
Until now, org-ref cite keys included special characters also at the
end. This caused problems when citations occur right before colons or
at the end of a sentence.
With this change, all non-alphanumeric characters at the end of a cite
key are ignored.
This also adds `,` to the list of special characters that are legal
in cite keys to better mirror the behaviour of org-export.
Emacs parses org documents into a tree structure, which is then
post-processed during exporting. The reader is changed to do the same,
turning the document into a single tree of headlines starting at
level 0.
Fixes: #3695
Parsing of smart quotes and special characters can either be enabled via
the `smart` language extension or the `'` and `-` export options. Smart
parsing is active if either the extension or export option is enabled.
Only smart parsing of special characters (like ellipses and en and em
dashes) is enabled by default, while smart quotes are disabled.
This means that all smart parsing features will be enabled by adding the
`smart` language extension. Fine-grained control is possible by leaving
the language extension disabled. In that case, smart parsing is
controlled via the aforementioned export OPTIONS only.
Previously, all smart parsing was disabled unless the language extension
was enabled.
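For example, with the `smart` extension left disabled, a document can
still enable smart quotes via an export option (assuming the usual org
OPTIONS syntax):

#+OPTIONS: ':t

Conversely, `#+OPTIONS: -:nil` would turn off the default smart
parsing of special characters.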
The implicitly defined global filter (i.e. all element filtering
functions defined in the global lua environment) is used if no filter is
returned from a lua script. This makes it possible to just write
top-level functions in order to define a lua filter. E.g.
function Emph(elem) return pandoc.Strong(elem.content) end
Source block parameter names are no longer prefixed with *rundoc*. This
was intended to simplify working with the rundoc project, a babel
runner. However, the rundoc project is unmaintained, and adding those
markers is not the reader's job anyway.
The original language that is specified for a source element is now
retained as the `data-org-language` attribute and only added if it
differs from the translated language.
The line-numbering switch that can be given to source blocks (`-n` with
a start number as an optional parameter) is parsed and translated to a
class/key-value combination used by highlighting and other readers and
writers.
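E.g., a source block like the following (a sketch; the exact class and
key-value attributes follow pandoc's highlighting conventions):

#+BEGIN_SRC emacs-lisp -n 20
  (message "numbering starts at line 20")
#+END_SRC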
Plain text readers are exposed to lua scripts via the `pandoc.reader`
submodule, which is further subdivided by format. Converting e.g. a
markdown string into a pandoc document is possible from within lua:
doc = pandoc.reader.markdown.read_doc("Hello, World!")
A `read_block` convenience function is provided for all formats,
although it will still parse the whole string but return only the first
block as the result.
Custom reader options are not supported yet; default options are used
for all parsing operations.
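E.g., to parse just the first block (a sketch based on the functions
described above):

first = pandoc.reader.markdown.read_block("Hello, *World*!")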
* New module: Text.Pandoc.Writers.Ms.
* New template: default.ms.
* The writer uses texmath's new eqn writer to convert math
to eqn format, so a ms file produced with this writer
should be processed with `groff -ms -e` if it contains
math.
* Add `--lua-filter` option. This works like `--filter` but takes pathnames of special lua filters and uses the lua interpreter baked into pandoc, so that no external interpreter is needed. Note that lua filters are all applied after regular filters, regardless of their position on the command line. (See the example after this list.)
* Add Text.Pandoc.Lua, exporting `runLuaFilter`. Add `pandoc.lua` to data files.
* Add private module Text.Pandoc.Lua.PandocModule to supply the default lua module.
* Add Tests.Lua to tests.
* Add data/pandoc.lua, the lua module pandoc imports when processing its lua filters.
* Document in MANUAL.txt.
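A minimal example (a sketch; the file name is arbitrary):

-- file: emphasize.lua
function Emph (elem)
  return pandoc.Strong(elem.content)
end

pandoc --lua-filter emphasize.lua input.md -o output.html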
The --parse-raw option was confusing.
Now we rely on the +raw_tex or +raw_html extension with latex
or html input.
Thus, instead of
--parse-raw -f latex
we use
-f latex+raw_tex
and instead of
--parse-raw -f html
we use
-f html+raw_html