Previously both needed to be specified (unless the image was
being resized to be smaller than its original size).
If height but not width is specified, we now set width to
textwidth (and similarly if width but not height is specified).
Since we have keepaspectratio, this yields the desired result.
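As a rough illustration of the defaulting rule above, the following Python sketch builds the option list for `\includegraphics`. The function name `includegraphics_options` is hypothetical, and using `\textheight` for a missing height is an assumption (the text only says "similarly"); this is not pandoc's actual writer code.

```python
def includegraphics_options(width=None, height=None):
    """Return option strings for \\includegraphics given optional dimensions."""
    if width is None and height is None:
        return []  # nothing specified: leave the image at its natural size
    if width is None:
        width = r"\textwidth"    # height only: bound the width by the text width
    if height is None:
        height = r"\textheight"  # assumption: the symmetric default for height
    # keepaspectratio makes LaTeX fit the image inside the box without distortion
    return [f"width={width}", f"height={height}", "keepaspectratio"]


print(",".join(includegraphics_options(height="3cm")))
```

Because both dimensions are always present once either one is given, `keepaspectratio` can pick whichever bound is tighter.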
Attribute lists are represented as associative lists in Lua. Pure
associative lists are awkward to work with. A metatable is attached to
attribute lists, so that the associative list can be accessed and used
as if the attributes were stored as normal key-value pairs in a table.
Note that this changes the way `pairs` works on attribute lists. Instead
of producing integer keys and two-element tables, the resulting iterator
function now returns the key and value of those pairs. Use `ipairs` to
get the old behavior.
Warning: the new iteration mechanism only works if pandoc has been
compiled with Lua 5.2 or later (current default: 5.3).
The `pandoc.Attr` function is altered to allow passing attributes as
key-values in a normal table. This is more convenient than having to
construct the associative list which is used internally.
Closes #4071
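The idea of an ordered pair list that can also be used like a key-value table can be sketched in Python. This is only an analogue of the Lua metatable approach described above: the class name `AttributeList` and its methods `items` and `indexed` are hypothetical and do not correspond to pandoc's Lua API.

```python
class AttributeList:
    """Ordered key-value pairs that can also be read and written by key."""

    def __init__(self, pairs=None):
        # Accept either a list of (key, value) pairs or a plain dict,
        # mirroring the two ways of passing attributes described above.
        if isinstance(pairs, dict):
            pairs = list(pairs.items())
        self._pairs = [(k, v) for k, v in (pairs or [])]

    def __getitem__(self, key):
        for k, v in self._pairs:
            if k == key:
                return v
        raise KeyError(key)

    def __setitem__(self, key, value):
        for i, (k, _) in enumerate(self._pairs):
            if k == key:
                self._pairs[i] = (key, value)
                return
        self._pairs.append((key, value))

    def items(self):
        """Iterate over (key, value), like the new `pairs` behavior."""
        return iter(self._pairs)

    def indexed(self):
        """Iterate over (index, pair), like `ipairs` (1-based as in Lua)."""
        return enumerate(self._pairs, start=1)


attrs = AttributeList({"width": "50%"})
attrs["height"] = "3cm"
```

The pair list stays the underlying representation, so order is preserved; key access is just a convenience layer on top of it.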
The `text` module is preloaded in Lua. The module contains some UTF-8
aware string functions, implemented in Haskell. The module is loaded on
request only, e.g.:
    text = require 'text'
    function Str (s)
      s.text = text.upper(s.text)
      return s
    end
This fixes a bug where pandoc would stop parsing a URI with an
empty attribute: for example, `&a=&b=` would stop at `a`.
(The uri parser tries to guess which punctuation characters
are part of the URI and which might be punctuation after it.)
Closes #4068.
Previously we got a crash, because we were trying to print
a native cmark STRIKETHROUGH node, and the commonmark writer
in cmark-github doesn't support this. Work around this by
using a raw node to add the strikethrough delimiters.
Closes #4038.
* Move as much as possible to the CSS in the template.
* Ensure that all the HTML-based templates (including epub)
contain the CSS for columns.
* Columns default to 50% width unless they are given a width
attribute.
Closes #4028.
The line identifiers are built using the code block's identifier
as a prefix. If the code block has null identifier, we use
"cb1", "cb2", etc.
Closes #4031.
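A minimal Python sketch of the identifier scheme described above. The function name `line_identifiers`, the hyphen between prefix and line number, and the choice to advance the counter only for blocks without an identifier are all assumptions made for illustration, not a description of pandoc's code.

```python
import itertools


def line_identifiers(block_id, n_lines, counter):
    """Return (prefix, ids) for the lines of one code block.

    `counter` is an iterator shared across the document; in this sketch
    it is only advanced for blocks that have no identifier of their own.
    """
    prefix = block_id if block_id else f"cb{next(counter)}"
    # Assumption: prefix and line number are joined with a hyphen.
    return prefix, [f"{prefix}-{n}" for n in range(1, n_lines + 1)]


counter = itertools.count(1)
first = line_identifiers("", 2, counter)        # unnamed block
second = line_identifiers("setup", 1, counter)  # block with its own identifier
third = line_identifiers("", 1, counter)        # next unnamed block
```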
* Remove "width" attribute which is not allowed on div.
* Remove space between `<div class="column">` elements,
since this prevents columns whose widths sum to 100%
(the space takes up space).
Closes #4028.
<annotation> is not allowed inside <body> according to FictionBook2 XML schema. Besides that, the same information is already placed inside <description>.
Related bug: #2424
and other non-HTML formats (`Text.Pandoc.Readers.HTML.htmlTag`).
The parser stopped at the first `>` character, even if it wasn't
the end of the comment.
Closes #4019.
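The difference between stopping at the first `>` and stopping at the real end of a comment can be shown with a small Python sketch. The helper `html_comment_end` is hypothetical and simplified (it ignores edge cases such as `<!-->`); it is not the logic of `htmlTag` itself.

```python
def html_comment_end(text, start=0):
    """Return the index just past the comment starting at `start`, or None."""
    if not text.startswith("<!--", start):
        return None
    # Search for the closing delimiter, not merely the next '>' character.
    close = text.find("-->", start + 4)
    return None if close == -1 else close + 3


sample = "<!-- a > b --> rest"
end = html_comment_end(sample)
```

Here `sample[:end]` covers the whole comment, including the `>` inside it.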
Previously bullet lists interacted in odd ways with ordered lists.
For example, bullet lists nested in an ordered list had incorrect
indentation. Besides that, indentation with spaces is not rendered
by FBReader and fbless. To avoid this problem, bullet lists are now
indented by appending bullets to the marker, in the same way as is
done for ordered lists.
According to the FB2 XML schema, <empty-line /> cannot be placed inside
<p>. Linux FBReader can't display such paragraphs, e.g. any "loose"
lists produced by pandoc prior to this commit. Besides that, the
FB2 writer placed <p> inside <p> when writing nested lists;
this commit fixes that bug.
Also this commit removes leading non-breaking space from ordered
lists for consistency with bullet lists.
Definition lists are not affected at all.
* Basic skeleton for creole reader.
No real functionality besides preliminary bold and italics yet.
* Creole: add support for bold/italic with implicit end at paragraph end.
* Creole: add support for headings.
* Creole: add support for tilde escaped chars.
* Add a test suite for the creole parser
So far this covers only things the parser already supports.
* Added simple parsing of flat unordered lists.
* Added tests for unordered lists in creole.
* First, wrong(!) implementation of sublists.
Fails test, as sublists should not be embedded in a list item!
* Implementation of unordered sublists.
* Added support for ordered lists to creole reader.
* Added utility function to append parsers to Creole reader.
* Creole reader: Fixed list item end detection in sub lists.
* Tests for creole reader: added more tests for lists.
Covering ordered and unordered tests, even mixed. Tests for
formatting in list items still missing...
* Added "nowiki" blocks. One exception rule is missing...
* Creole reader: nowiki: implemented exception for curly brackets.
* Creole reader: added inline nowiki.
* Creole reader: added horizontalRule.
* Creole reader: added auto linking of URIs.
* Creole reader: detect horizontalRule as para end.
Used the opportunity for a little refactoring.
* Creole reader: added forced line breaks.
Including test.
* Creole reader: implement wiki links.
* Creole reader: added image support.
* Creole reader: support images as links.
* Creole reader: implemented placeholder -- by simply dropping them.
* Creole reader: added tests for links.
After observing a regression, it was really time... ;-)
* Creole reader: fixed links with names.
* Creole reader: allow space after first of enclosing tags.
Space after the start of formatting tags is allowed with creole,
e.g. "there is // italic text // in here" is legal.
This problem was discovered using the creole1.0test.txt document from
http://www.wikicreole.org/wiki/Creole1.0TestCases
See l.57:
# // italic item 3 //
* Creole reader: fixed links without names.
* Creole reader: Tests, sorted into groups.
* Creole reader: implemented tables.
* Removed redundant import.
* Creole reader: add correct escaping of links.
* Creole reader: allow handling of e.g. links in parenthesis and quotes.
* Creole reader: Modified disclaimer as most of the code is actually by me.
* Creole reader: Tests: added escaped links.
* Creole reader: preserve leading and trailing space in bold/italic.
* Creole reader: detect tables without a leading blank line.
* Creole Reader: added official creole1.0test.txt as "old" test.
The base document was downloaded from
http://www.wikicreole.org/wiki/Creole1.0TestCases.
The Wiki, and therefore the test document is
Copyright (C) by the contributors.
Some rights reserved, license CC BY-SA.
http://creativecommons.org/licenses/by-sa/1.0/
* Added underlineSpan builder function. This can be easily updated if needed. The purpose is for Readers to transform underlines consistently.
* Docx Reader: Use underlineSpan and update test
* Org Reader: Use underlineSpan and add test
* Textile Reader: Use underlineSpan and add test case
* Txt2Tags Reader: Use underlineSpan and update test
* HTML Reader: Use underlineSpan and add test case
This prevents the problem with extra space around highlighted
code blocks (closes #3996).
Note that we no longer put an enclosing div around highlighted
code blocks. The pre is the outer element, just as for unhighlighted
blocks.
Previously `\include` wouldn't work if the included file
contained, e.g., a begin without a matching end.
We've changed the Tok type so that it stores a full SourcePos,
rather than just a line and column. So tokens keep track
of the file they came from. This allows us to use a simpler
method for includes, which doesn't require parsing the included
document as a whole.
Closes #3971.
Removed `writerSourceURL` from `WriterOptions` (API change).
Added `stSourceURL` to `CommonState`.
It is set automatically by `setInputFiles`.
Text.Pandoc.Class now exports `setInputFiles`, `setOutputFile`.
The type of `getInputFiles` has changed; it now returns `[FilePath]`
instead of `Maybe [FilePath]`.
Functions in Class that formerly took the source URL as a parameter
now have one fewer parameter (`fetchItem`, `downloadOrRead`,
`setMediaResource`, `fillMediaBag`).
Removed `WriterOptions` parameter from `makeSelfContained` in
`SelfContained`.
The org reader was updated to match current org-mode behavior: the set
of characters that may occur as the first or last
character in an org emphasis has been changed and now allows all
non-whitespace chars at the inner border of emphasized text (see
`org-emphasis-regexp-components`).
Fixes: #3933
* Options: Added readerStripComments to ReaderOptions.
* Added `--strip-comments` command-line option.
* Made `htmlTag` from the HTML reader sensitive to this feature.
This affects Markdown and Textile input.
Closes #2552.
Divs are difficult to translate into org syntax, as there are multiple
div-like structures (drawers, special blocks, greater blocks) which all
have their advantages and disadvantages. Previously pandoc would
use raw HTML to preserve the full div information; this was rarely
useful and resulted in visual clutter. Div-rendering was changed to
discard the div's classes and key-value pairs if there is no natural way
to translate the div into an org structure.
Closes: #3771
Previously pandoc would sometimes combine two line blocks separated by blanks, and ignore trailing blank lines within the line block.
Test is checked to be consistent with http://rst.ninjs.org/
This change makes it possible to define a catch-all function using Lua's
metatable lookup functionality.
    function catch_all(el)
      …
    end

    return {
      setmetatable({}, {__index = function(_) return catch_all end})
    }
A further effect of this change is that the map with filter functions
now only contains functions corresponding to AST element constructors.
Closes #3511.
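For readers less familiar with Lua metatables, the fallback lookup can be approximated in Python with `dict.__missing__`. This is only an analogy: `FilterTable`, `catch_all` and `upper_str` are hypothetical names, and pandoc itself consults the Lua table, not anything like this class.

```python
class FilterTable(dict):
    """Map element names to handlers, falling back to a catch-all."""

    def __init__(self, catch_all, **handlers):
        super().__init__(handlers)
        self._catch_all = catch_all

    def __missing__(self, name):
        # Called only when `name` has no explicit handler,
        # much like Lua's __index metamethod.
        return self._catch_all


def catch_all(el):
    return ("default", el)


def upper_str(el):
    return ("Str", el.upper())


filters = FilterTable(catch_all, Str=upper_str)
```

Looking up `filters["Emph"]` returns `catch_all` without storing it in the table, so explicit handlers and the fallback stay distinguishable.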
Previously pandoc used the four-space rule: continuation paragraphs,
sublists, and other block level content had to be indented 4
spaces. Now the indentation required is determined by the
first line of the list item: to be included in the list item,
blocks must be indented to the level of the first non-space
content after the list marker. Exception: if there are 5 or more spaces
after the list marker, then the content is interpreted as an
indented code block, and continuation paragraphs must be indented
two spaces beyond the end of the list marker. See the CommonMark
spec for more details and examples.
Documents that adhere to the four-space rule should, in most cases,
be parsed the same way by the new rules. Here are some examples
of texts that will be parsed differently:
    - a
      - b
will be parsed as a list item with a sublist; under the four-space
rule, it would be a list with two items.
    - a

          code
Here we have an indented code block under the list item, even though it
is only indented six spaces from the margin, because it is four spaces
past the point where a continuation paragraph could begin. With the
four-space rule, this would be a regular paragraph rather than a code
block.
    - a

            code
Here the code block will start with two spaces, whereas under
the four-space rule, it would start with `code`. With the four-space
rule, indented code under a list item always must be indented eight
spaces from the margin, while the new rules require only that it
be indented four spaces from the beginning of the first non-space
text after the list marker (here, `a`).
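The two rules can be compared with a small Python sketch that computes how far a following block must be indented to belong to a list item. The names `continuation_indent` and `joins_item` are hypothetical, the four-space case is modeled simply as four columns past the item's own indentation, and the exception for 5 or more spaces after the marker is deliberately left out; this is an illustration of the rule as stated above, not pandoc's parser.

```python
import re

# Leading spaces, a bullet or ordered-list marker, then the gap before content.
LIST_ITEM = re.compile(r"^( *)([-+*]|\d+[.)])( +)(?=\S)")


def continuation_indent(first_line, four_space_rule=False):
    """Columns of indentation needed for a block to join the list item."""
    m = LIST_ITEM.match(first_line)
    if m is None:
        raise ValueError("not the first line of a list item")
    lead, marker, gap = m.groups()
    if four_space_rule:
        return len(lead) + 4  # assumption: old rule, relative to the item start
    if len(gap) >= 5:
        # The indented-code exception is deliberately not modeled here.
        raise NotImplementedError("5 or more spaces after the list marker")
    # New rule: align with the first non-space content after the marker.
    return len(lead) + len(marker) + len(gap)


def joins_item(first_line, next_line, four_space_rule=False):
    """True if `next_line` is indented far enough to belong to the item."""
    indent = len(next_line) - len(next_line.lstrip(" "))
    return indent >= continuation_indent(first_line, four_space_rule)
```

For the first example above, a `- b` line indented two spaces joins the item `- a` under the new rule but not under the four-space rule.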
This change was motivated by a slew of bug reports from people
who expected lists to work differently (#3125, #2367, #2575, #2210,
#1990, #1137, #744, #172, #137, #128) and by the growing prevalence
of CommonMark (now used by GitHub, for example).
Users who want to use the old rules can select the `four_space_rule`
extension.
* Added `four_space_rule` extension.
* Added `Ext_four_space_rule` to `Extensions`.
* `Parsing` now exports `gobbleAtMostSpaces`, and the type
of `gobbleSpaces` has been changed so that a `ReaderOptions`
parameter is not needed.
Acronyms are not resolved by the reader, but acronym and glossary information is put into attributes on Spans so that they can be processed in filters.
The structure expected is:
    <div class="columns">
      <div class="column" width="40%">
        contents...
      </div>
      <div class="column" width="60%">
        contents...
      </div>
    </div>
Support has been added for beamer and all HTML slide formats.
Closes #1710.
Note: later we could add a more elegant way to create
this structure in Markdown than to use raw HTML div elements.
This would come for free with a "native div syntax" (#168).
Or we could devise something specific to slides.
Previously we just tacked on a directory to the command
line, but that didn't work when we e.g. used a pipe for round tripping,
with two invocations of pandoc.
We assume that comments are defined as parsed by the
docx reader:
    I want <span class="comment-start" id="0" author="Jesse Rosenthal"
    date="2016-05-09T16:13:00Z">I left a comment.</span>some text to
    have a comment <span class="comment-end" id="0"></span>on it.
We assume also that the id attributes are unique and properly
matched between comment-start and comment-end.
Closes #2994.
Previously they would be transmitted to the template without
any escaping.
Note that `-M title='*foo*'` yields a different result from
---
title: *foo*
---
In the latter case, we have emphasis; in the former case, just
a string with literal asterisks (which will be escaped
in formats, like Markdown, that require it).
Closes #3792.
Also, fix regular macros so they're expanded at the
point of use, and NOT also the point of definition.
`\let` macros, by contrast, are expanded at the
point of definition. Added an `ExpansionPoint`
field to `Macro` to track this difference.
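The distinction between the two expansion points can be illustrated with a toy macro table in Python. The token lists, the `expand` helper and the macro names `\a`, `\b`, `\c` are made up for this sketch; it does not reflect the actual `Macro` type or its `ExpansionPoint` field.

```python
def expand(tokens, macros):
    """Recursively replace macro names by their stored bodies."""
    out = []
    for tok in tokens:
        if tok in macros:
            out.extend(expand(macros[tok], macros))
        else:
            out.append(tok)
    return out


macros = {r"\a": ["x"]}

# Like \newcommand{\b}{\a}: store the body as written; expand it when used.
macros[r"\b"] = [r"\a"]
# Like \let\c\a: resolve \a now, at the point of definition, and store that.
macros[r"\c"] = expand([r"\a"], macros)

# Redefine \a afterwards.
macros[r"\a"] = ["y"]

at_use = expand([r"\b"], macros)         # sees the new meaning of \a
at_definition = expand([r"\c"], macros)  # keeps the old meaning of \a
```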
We previously did this only with raw blocks, on the assumption
that math environments would always be raw blocks. This has changed
since we now parse them as inline environments.
Closes #3816.
Thus, a span with attribute 'foo' gets written to HTML5
with 'data-foo', so it is valid HTML5.
HTML4 is not affected.
This will allow us to use custom attributes in pandoc without
producing invalid HTML.
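A minimal Python sketch of the renaming described above. The set `KNOWN_HTML5_ATTRIBUTES` is a small illustrative subset chosen for this example, the function names are hypothetical, and attribute values are not escaped; the real writer's list and logic may differ.

```python
# Illustrative subset only; the real list of HTML5 attributes is much longer.
KNOWN_HTML5_ATTRIBUTES = {
    "id", "class", "style", "title", "lang", "dir", "href", "src", "alt",
}


def html5_attribute_name(name):
    """Prefix custom attribute names with `data-` so the output stays valid."""
    if name in KNOWN_HTML5_ATTRIBUTES or name.startswith("data-"):
        return name
    return "data-" + name


def render_span(attributes, content, html5=True):
    """Render a span; with html5=False, names are written unchanged."""
    rendered = "".join(
        f' {html5_attribute_name(k) if html5 else k}="{v}"'
        for k, v in attributes.items()
    )
    return f"<span{rendered}>{content}</span>"
```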
Fixed applyMacros so that it operates on the whole
string, not just the first token!
Don't remove macro definitions from the output,
even if Ext_latex_macros is set, so that macros will
be applied. Since they're only applied to math in
Markdown, removing the macros can have bad effects.
Even for math macros, keeping them should be harmless.
Added TikiWiki reader, including tests and documentation.
It's probably not *complete*, but it works pretty well, handles all
the basics (and some not-so-basics).
This rewrite is primarily motivated by the need to
get macros working properly. A side benefit is that the
reader is significantly faster (27s -> 19s in one
benchmark, and there is a lot of room for further
optimization).
We now tokenize the input text, then parse the token stream.
Macros modify the token stream, so they should now be effective
in any context, including math. Thus, we no longer need the clunky
macro processing capacities of texmath.
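The tokenize-then-parse approach can be sketched in a few lines of Python. This toy version handles only zero-argument macros and a very simplified token grammar; `tokenize`, `expand_macros` and the `\R` macro are assumptions made for the example and are far simpler than the reader's real `Tok` stream.

```python
import re

# A control sequence (\name or \ plus one symbol), or any single character.
TOKEN = re.compile(r"\\[A-Za-z]+|\\.|.", re.DOTALL)


def tokenize(source):
    """Split LaTeX source into a flat list of token strings."""
    return TOKEN.findall(source)


def expand_macros(tokens, macros):
    """Replace zero-argument macros in a token stream, rescanning results."""
    pending = list(reversed(tokens))  # stack with the next token on top
    out = []
    while pending:
        tok = pending.pop()
        if tok in macros:
            # Push the body back so nested macros are expanded too.
            pending.extend(reversed(macros[tok]))
        else:
            out.append(tok)
    return out


macros = {r"\R": tokenize(r"\mathbb{R}")}
tokens = expand_macros(tokenize(r"$x \in \R$"), macros)
result = "".join(tokens)
```

Because expansion happens on the token stream before any parsing, the macro is replaced inside math just as it would be anywhere else. A self-referential macro would loop forever in this sketch.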
A custom state LaTeXState is used instead of ParserState.
This, plus the tokenization, will require some rewriting
of the exported functions rawLaTeXInline, inlineCommand,
rawLaTeXBlock.
* Added Text.Pandoc.Readers.LaTeX.Types (new exported module).
Exports Macro, Tok, TokType, Line, Column. [API change]
* Text.Pandoc.Parsing: adjusted type of `insertIncludedFile`
so it can be used with token parser.
* Removed old texmath macro stuff from Parsing.
Use Macro from Text.Pandoc.Readers.LaTeX.Types instead.
* Removed texmath macro material from Markdown reader.
* Changed types for Text.Pandoc.Readers.LaTeX's
rawLaTeXInline and rawLaTeXBlock. (Both now return a String,
and they are polymorphic in state.)
* Added orgMacros field to OrgState. [API change]
* Removed readerApplyMacros from ReaderOptions.
Now we just check the `latex_macros` reader extension.
* Allow `\newcommand\foo{blah}` without braces.
Fixes #1390.
Fixes #2118.
Fixes #3236.
Fixes #3779.
Fixes #934.
Fixes #982.
in Text.Pandoc.Lua. Also to pushPandocModule.
This change allows users to override pandoc.lua with a file
in their local data directory, adding custom functions, etc.
@tarleb, if you think this is a bad idea, you can revert this.
But in general our data files are all overridable.
No more SingleQuoted, DoubleQuoted, InlineMath, DisplayMath.
This makes everything uniform and predictable, though it does
open up a difference between Lua filters and custom writers.
If the metadata field is all on one line, we try to interpret
it as Inlines, and only try parsing as Blocks if that fails.
If it extends over more than one line (including possibly the `|` or
`>` character signaling an indented block), then we parse as
Blocks.
This was motivated by some German users finding that
date: '22. Juin 2017'
got parsed as an ordered list.
Closes #3755.
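The decision described above can be written as a short Python sketch. The function `parse_metadata_field` and the two parser arguments are hypothetical stand-ins; the only assumption about them is that the inline parser returns `None` when the text is not valid inline content.

```python
def parse_metadata_field(raw, parse_inlines, parse_blocks):
    """Choose a parser for a metadata field based on its line count."""
    if "\n" in raw.strip("\n"):
        return parse_blocks(raw)   # multi-line: always parse as blocks
    result = parse_inlines(raw)    # single line: try inlines first
    return result if result is not None else parse_blocks(raw)


# Toy stand-ins for the real parsers.
def as_inlines(text):
    return ("Inlines", text)


def as_blocks(text):
    return ("Blocks", text)
```

With these stand-ins, a one-line value such as `22. Juin 2017` goes through the inline parser and is never treated as a list.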
Note that if the table has a first page header and a
continuation page header, the notes will appear only
on the first occurrence of the header.
Closes #2378.
* New module Text.Pandoc.Readers.Vimwiki, exporting readVimwiki [API change].
* New input format `vimwiki`.
* New data file, `data/vimwiki.css`, for displaying the HTML produced by this reader and pandoc's HTML writer in the style of vimwiki's own HTML export.
Note that as a result of this change, the following,
which formerly produced a header with two lines separated
by a line break, will now produce a header followed by a
paragraph:
# Hi\
there
This may affect some existing documents that relied on
this undocumented and unintended behavior.
This change makes pandoc more consistent with other
Markdown implementations, and with itself (since the two-space
version of a line break doesn't work inside ATX headers, and
neither version works inside Setext headers).
Closes #3730.
* XML.toEntities: changed type to Text -> Text.
* Shared.tabFilter -- fixed so it strips out CRs as before.
* Modified writers to take Text.
* Updated tests, benchmarks, trypandoc.
[API change]
Closes #3731.
Currently we only handle the form `0.9\linewidth`.
Anything else would have to be converted to a percentage,
using some kind of arbitrary assumptions about line widths.
See #3709.
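A sketch of the one form that is handled, written in Python. The function name `linewidth_to_percent` and the exact pattern are assumptions for illustration; any other width specification is simply rejected, matching the limitation stated above.

```python
import re

# A decimal factor followed by \linewidth, e.g. 0.9\linewidth or .5\linewidth.
LINEWIDTH = re.compile(r"^\s*(\d+(?:\.\d+)?|\.\d+)\s*\\linewidth\s*$")


def linewidth_to_percent(spec):
    """Convert e.g. '0.9\\linewidth' to '90%'; return None for other forms."""
    m = LINEWIDTH.match(spec)
    if m is None:
        return None
    # Round to avoid floating-point noise such as 90.00000000000001.
    percent = round(float(m.group(1)) * 100, 2)
    return f"{percent:g}%"
```

A width like `3cm` returns `None` here because converting it would require guessing the line width.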
The Emacs default is to include tags in the headline when exporting.
Instead of just empty spans, which contain the tag name as attribute,
tags are rendered as small caps and wrapped in those spans.
Non-breaking spaces serve as separators for multiple tags.
Babel result blocks can have block attributes like captions and names.
Result blocks with attributes were not recognized and were parsed as
normal blocks without attributes.
Fixes: #3706
Until now, org-ref cite keys included special characters also at the
end. This caused problems when citations occur right before colons or
at the end of a sentence.
With this change, all non-alphanumeric characters at the end of a cite
key are ignored.
This also adds `,` to the list of special characters that are legal
in cite keys to better mirror the behaviour of org-export.
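The trimming rule can be sketched in Python as follows. The helper `split_cite_key` is hypothetical; it only shows how trailing non-alphanumeric characters are separated from the key while special characters inside the key are kept.

```python
def split_cite_key(raw):
    """Split `raw` into (key, trailing) at the last alphanumeric character."""
    end = len(raw)
    while end > 0 and not raw[end - 1].isalnum():
        end -= 1
    return raw[:end], raw[end:]
```

For example, a key followed by a colon or a sentence-ending period yields the bare key plus the punctuation, which can then be treated as ordinary text.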
With `--reference-location` of `section` or `block`, pandoc
will now repeat references that have been used in earlier
sections.
The Markdown reader has also been modified, so that *exactly*
repeated references do not generate a warning, only
references with the same label but different targets.
The idea is that, with references after every block,
one might want to repeat references sometimes.
Closes #3701.
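The warning rule for repeated references can be illustrated with a short Python sketch. The function `collect_references`, the case-insensitive label comparison, and the choice to keep the first definition on conflict are assumptions for this example rather than documented reader behavior.

```python
def collect_references(definitions):
    """Build a label -> target map, warning only on conflicting repeats."""
    references, warnings = {}, []
    for label, target in definitions:
        key = label.lower()  # assumption: labels compare case-insensitively
        if key in references and references[key] != target:
            warnings.append(f"Duplicate reference '{label}' with a different target")
            continue  # assumption: the first definition wins
        references[key] = target  # new label, or an exact repeat (no warning)
    return references, warnings


refs, warns = collect_references([
    ("a", "http://x"),
    ("a", "http://x"),  # exact repeat: silently accepted
    ("b", "http://y"),
    ("b", "http://z"),  # same label, different target: warning
])
```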
Emacs parses org documents into a tree structure, which is then
post-processed during exporting. The reader is changed to do the same,
turning the document into a single tree of headlines starting at
level 0.
Fixes: #3695
- Export `inEm` from ImageSize [API change].
- Change `showFl` and `show` instance for `Dimension` so
extra decimal places are omitted.
- Added `Em` as a constructor of `Dimension` [API change].
- Allow `em`, `cm`, `in` to pass through without conversion
in HTML, LaTeX.
Closes #3450.
This is now the default for pandoc's Markdown.
It allows whitespace between the two parts of a
reference link: e.g.
[a] [b]
[b]: url
This is now forbidden by default.
Closes #2602.