A consequence of this change is that the backtick form will be
preferred in general if both are enabled. I think that is good,
as it is much more widespread than the tilde form.
Closes#1084.
This is reported to be necessary to avoid an error from recent
versions of Libre Office when files contain more than one image.
Closes#1069.
Thanks to wmanley for reporting and diagnosing the problem.
Going forward we'll use pandoc-citeproc, as an external filter.
The `--bibliography`, `--csl`, and `--citation-abbreviation` fields
have been removed. Instead one must include `bibliography`, `csl`,
or `csl-abbrevs` fields in the document's YAML metadata. The filter
can then be used as follows:
pandoc --filter pandoc-citeproc
The `Text.Pandoc.Biblio` module has been removed. Henceforth,
`Text.CSL.Pandoc` from pandoc-citations can be used by library users.
The Markdown and LaTeX readers now longer format bibliographies and
citations. That must be done using `processCites` or `processCites'`
from Text.CSL.Pandoc.
All bibliography-related fields have been removed from `ReaderOptions`
and `WriterOptions`: `writerBiblioFiles`, `readerReferences`,
`readerCitationStyle`.
API change.
The code:
~~~{#test}
asdf
~~~
gets compiled to html:
<pre id="test">
asdf
</pre>
So it is possible to link to the identifier `test`
But this doesn't happen on latex
When using the listings package (`--listings`) it is possible to set the
identifier using the `label=test` property:
\begin{lstlisting}[label=id]
hi
\end{lstlisting}
And this is exactly what this patch is doing.
Modified LaTeX Reader/Writer and added tests for this.
* Add ??? as fallback text for non-resolved citations.
* Biblio: Put references (including a header at the end of
the document, if one exists) inside a Div with class "references".
This gives some control over styling of references, and allows
scripts to manipulate them.
* Markdown writer: Print markdown citation codes, and disable
printing of references, if `citations` extension is enabled.
NOTE: It would be good to improve what citeproc-hs does for
a nonexistent key.
When the original document had text containing //, this was previously
included, unchanged, in the dokuwiki output, and this interacted badly
with later, intended, formating text.
Done because I noticed that in the Autolinks section of writer.dokuwiki, the URL in inlined code was getting auto-linked, when it wasn't supposed to.
This also meant that any inline code examples that had text that looked like dokuwiki syntax could break the formatting of later text.
I've found some incorrect behaviours with the dokuwiki output, for which
extra test cases will be needed - that aren't covered by the standard
pandoc test input files.
* Closed#927 (a bug in which `<pre>` in certain contexts was
not recognized as a code block).
* Remove internal HTML tags in code blocks, rather than printing
them verbatim.
* Parse attributes on `<pre>` tag for code blocks.
Now the `title`, `section`, `header`, and `footer` can all be set
individually in metadata. The `description` variable has been
removed.
Quotes have been added so that spaces are allowed in the title.
If you have a title that begins
COMMAND(1) footer here | header here
pandoc will parse it as before into a title, section, header, and
footer. But you can also specify these elements explicitly.
Closes#885.
* Depend on pandoc 1.12.
* Added yaml dependency.
* `Text.Pandoc.XML`: Removed `stripTags`. (API change.)
* `Text.Pandoc.Shared`: Added `metaToJSON`.
This will be used in writers to create a JSON object for use
in the templates from the pandoc metadata.
* Revised readers and writers to use the new Meta type.
* `Text.Pandoc.Options`: Added `Ext_yaml_title_block`.
* Markdown reader: Added support for YAML metadata block.
Note that it must come at the beginning of the document.
* `Text.Pandoc.Parsing.ParserState`: Replace `stateTitle`,
`stateAuthors`, `stateDate` with `stateMeta`.
* RST reader: Improved metadata.
Treat initial field list as metadata when standalone specified.
Previously ALL fields "title", "author", "date" in field lists
were treated as metadata, even if not at the beginning.
Use `subtitle` metadata field for subtitle.
* `Text.Pandoc.Templates`: Export `renderTemplate'` that takes a string
instead of a compiled template..
* OPML template: Use 'for' loop for authors.
* Org template: '#+TITLE:' is inserted before the title.
Previously the writer did this.
rst2html doesn't add `<p>` tags to list items (even when they are
separated by blank lines) unless there are multiple paragraphs in the
list. This commit changes the RST reader to conform more closely to
what docutils does.
Closes#880.
The _note attribute is supported. This is unofficial, but
used e.g. in OmniOutliner and supported by multimarkdown.
We treat the contents as markdown blocks under a section
header.
Added to documentation and tests.
* Reverts 1.11 change that caused citations to be rendered as
markdown citations, even if `--biblio` was specified, unless
`citation` extension is disabled. Now, formatted citations
are always printed if `--biblio` was specified. If you want to
reformat markdown keeping pandoc markdown citations intact,
just don't specify `--biblio`.
* Reverted now unnecessary changes to Text.Pandoc.Biblio adding the raw
block to mark the bibliography, and to Text.Pandoc.Writers.Markdown
to remove the bibliography if `citations` not specified.
* If the content of a `Cite` inline is a `RawInline "latex"`, which
means that a LaTeX citation command was parsed and `--biblio` wasn't
specified, then render it as a pandoc markdown citation. This means
that `pandoc -f latex -t markdown`, without `--biblio`, will convert
LaTeX citation commands to pandoc markdown citations.
Previously, a LaTeX citation would always be parsed as a Citation
element, with the raw LaTeX in the [Inline] part.
Now, the LaTeX citation is parsed as a Citation element only if
`--biblio` was specified (i.e. only if there is a nonempty set
of references in readerReferences). Otherwise it is parsed as
raw LaTeX.
This will make it possible to simplify some things in the markdown
writer. It also makes the LaTeX reader behave more like the Markdown
reader.
Previously citations were rendered as citeproc-formatted citations
by default. Now we render them as pandoc citations, e.g. `[@item1]`,
unless the `citations` extension is disabled.
If you still want formatted citations in your markdown output,
use `pandoc -t markdown-citations`.
* Moved code for translating listings language names to
highlighting-kate names and back from LaTeX reader to Highlighting.
* Text.Pandoc.Highlighting no longer exposed (API change)
* Text.Pandoc.Highlighting exports toListingsLang, fromListingsLang
Note: The attributes go on the enclosing section or div
if `--section-divs` is specified.
Also fixed a regression (only now noticed) in html+lhs output.
Previously the bird tracks were being omitted.
The 1.10 code assumed that each table header cell contains
exactly one block. That failed for headerless tables (0) and also
for tables with multiple blocks in a header cell.
The code is fixed and tests provided. Thanks to Andrew Lee for
pointing out the bug.
* RTF writer: Export writeRTFWithEmbeddedImages instead of
rtfEmbedImage.
* Text.Pandoc: Use writeRTFWithEmbeddedImages for RTF.
* Moved code for embedding images in RTF out of pandoc.hs.
* It no longer uses Network.URIs URI parser, which is too restrictive
(not allowing unicode URIs unless encoded).
* It allows many more schemes.
* It better handles punctuation so as to avoid capturing trailing
punctuation in bare URLs.
* In markdown reader, add a '\1' character to the beginning
of the title of an image that is alone in its paragraph,
if implicit_figures extension is selected.
* In writers, check for Para [Image alt (src,'\1':tit)] and treat
it as a figure if possible.
* Updated tests.
This is a bit of a hack, but it allows us to make implicit_figures
an extension of the markdown reader, rather than the writers.
We now (a) use anonymous links for links with inline URLs, and
(b) use an inline link instead of a reference link if the
reference link would require a label that has already been
used for a different link.
Closes#511.
Previously header ids were autogenerated by the writers.
Now they are generated (unless supplied explicitly) in the
markdown parser, if the `header_identifiers` extension is
selected.
In addition, the textile reader now supports id attributes on
headers.
Taking into account new context/latex output, and fixing
some bugs in the test suite Tests.Helpers and Tests.Writers.ConTeXt.
(We had the wrong order of expected/actual in the diff output.)
Previously the textile reader and writer incorrectly implented
RST-style autolinks for URLs and email addresses.
This has been fixed. Now an autolink is done this way:
"$":http://myurl.com
Now pandoc correctly handles hard line breaks inside list items.
Previously they broke list parsing. Thanks to Pablo
Rodríguez for pointing out the problem.
* Depend on text.
* Expose Text.Pandoc.UTF8.
* Text.Pandoc.UTF8 now exports toString, fromString,
toStringLazy, fromStringLazy.
* These are used instead of the old utf8-string functions.
Now we insert anchors after each header, and use @ref
instead of @uref for links.
Commas are now escaped as @comma{} only when needed; previously
all commas were escaped. (This change is needed, in part, because @ref
commands must be followed by a real comma or period.)
Also insert a blank line in from of @verbatim environments.
Previously the parser would hang on input like this:
[[[[[[[[[[[[[[[[[[hi
We fixed this by making the link parser parser characters
between balanced brackets (skipping brackets in inline code spans),
then parsing the result as an inline list.
One change is that
[hi *there]* bud](/url)
is now no longer parsed as a link. But in this respect pandoc behaved
differently from most other implementations anyway, so that seems okay.
All current tests pass. Added test for this case.
Closes#620.
This allows the markdown reader to treat '\begin' (not followed
by an argument) as a raw string rather than erroring out when
it doesn't find a '{'.
Closes#622.
Unescaped -'s become hyphens, while \-'s are left as ascii
minus signs. That is preferable for use with command-line
options.
See http://lintian.debian.org/tags/hyphen-used-as-minus-sign.html.
Thanks to Andrea Bolognani for bringing the issue to our
attention.
- Removed writerLiterateHaskell from WriterOptions.
- Removed readerLiterateHaskell from ReaderOptions.
- Added Ext_literate_haskell to Extensions. Test for this
instead of the above.
- Removed failUnlessLHS from Shared.
Note: At this point, +lhs and .lhs extension no longer has any effect.
Need to fix.
* Use Builder's Inlines/Blocks instead of lists.
* Return values in the reader monad, which are then
run (at the end of parsing) against the final
parser state. This allows links, notes, and
example numbers to be resolved without a second
parser pass.
* An effect of using Builder is that everything is
normalized automatically.
* New exports from Text.Pandoc.Parsing:
widthsFromIndices, NoteTable', KeyTable', Key', toKey',
withQuoteContext, singleQuoteStart, singleQuoteEnd, doubleQuoteStart,
doubleQuoteEnd, ellipses, apostrophe, dash
* Updated opendocument tests.
* Don't derive Show for ParserState.
* Benchmarks: markdown reader takes 82% of the time it took before.
Markdown writer takes 92% of the time (here the speedup is probably
due to the fact that everything is normalized by default).
To run tests, configure with --enable-tests, then 'cabal test'.
You can specify particular tests using --test-options='-t markdown'.
No output is shown unless tests fail. In the future, we can move
to the detailed-1.0 interface.
* All tables now require at least one body row.
* Renamed from 'extra' to 'pipe' tables.
* Moved functions from Parsing to Readers.Markdown.
* Cleaned up code; revised to parse in one pass rather than
parsing a raw string, splitting it, and parsing the components.
* Allow pipe tables without pipes on the ends (as PHP Markdown Extra
does).
* Use `:` form instead of `~`, for better compatibility with other
markdown implementations.
* Don't wrap the term, because it breaks definition lists.
This way you can still get the raw latex back, even if you don't
process with citeproc. Previously, cites were not visible at all
unless you specified --biblio on the command line and converted
them using citeproc, or used --natbib or --biblatex.
* The new reader is more robust, accurate, and extensible.
It is still quite incomplete, but it should be easier
now to add features.
* Text.Pandoc.Parsing: Added withRaw combinator.
* Markdown reader: do escapedChar before raw latex inline.
Otherwise we capture commands like \{.
* Fixed latex citation tests for new citeproc.
* Handle \include{} commands in latex.
This is done in pandoc.hs, not the (pure) latex reader.
But the reader exports the needed function, handleIncludes.
* Moved err and warn from pandoc.hs to Shared.
* Fixed tests - raw tex should sometimes have trailing space.
* Updated lhs-test for highlighting-kate changes.
* New module `Text.Pandoc.Docx`.
* New output format `docx`.
* Added reference.docx.
* New option `--reference-docx`.
The writer includes support for highlighted code blocks
and math (which is converted from TeX to OMML using
texmath's new OMML module).
Pandoc previously behaved like Markdown.pl for consecutive
lists of different styles. Thus, the following would be parsed
as a single ordered list, rather than an ordered list followed
by an unordered list:
1. one
2. two
- one
- two
This patch makes pandoc behave more sensibly, parsing this as
two lists. Any change in list type (ordered/unordered) or in
list number style will trigger a new list. Thus, the following
will also be parsed as two lists:
1. one
2. two
a. one
b. two
Since we regard this as a bug in Markdown.pl, and not something
anyone would ever rely on, we do not preserve the old behavior
even when `--strict` is selected.
* `---` is always em-dash, `--` is always en-dash.
* pandoc no longer tries to guess when `-` should be en-dash.
* A new option, `--old-dashes`, is provided for legacy documents.
Rationale: The rules for en-dash are too complex and
language-dependent for a guesser to work reliably. This
change gives users greater control. The alternative of
using unicode isn't very good, since unicode em- and en-
dashes are barely distinguishable in a monospace font.
Inline math uses the :math:`...` construct.
Display math uses
.. math:: ...
or if multilin
.. math::
...
These seem to be supported now by rst2latex.py.
Beamer output uses the default LaTeX template, with some
customizations via variables.
Added `writerBeamer` to `WriterOptions`.
Added `--beamer` option to `markdown2pdf`.
The container element will have the classes, id, and
key-value attributes you specified in the delimited code
block.
Previously these were stripped off.
escapeURI now only escapes space characters, leaving unicode characters
as they are, instead of converting them to octets and URL-encoding them,
as before. This gives more readable URIs. User agents now do the
percent-encoding themselves.
URIs are no longer unescaped at all on conversion to markdown, asciidoc,
rst, org.
Closes#349.
Still TODO:
- documentation in README
- add default.asciidoc to templates/
- lists
- tables
- proper escaping
- footnotes with blank lines - print separately at end?
currently they are just ignored.
- fix header (date gives weird result on pandoc README)
Previously `[@item1 and nowhere else]` yielded the locator ", and nowhere
else", or, with the new citeproc-hs, "and nowhere else".
Now it yields " and nowhere else".
A horizontal rule now gets transformed into an empty H1 header
before 'hierarchicalize' is called.
If the document that does not begin with an H1 header, an
empty one is provided.
This avoids the need for kludgy raw HTML.
Also, the 'titleslide' class is added to any section containing
just a title:
----
----
For example, in
Just a few glitches remaining.
<ul><li> In this situation, one loses the list.
</ul>
And in this, the preformatting.
<pre>Preformatted text not starting with its own blank line.
</pre>
Thansk to Dirk Laurie for noticing the issue.
So, in RST, 'http://google.com.' should be parsed as a link
to 'http://google.com' followed by a period.
The parser is smart enough to recognize balanced parentheses,
as often occur in wikipedia links: 'http://foo.bar/baz_(bam)'.
Also added ()s to RST specialChars, so '(http://google.com)'
will be parsed as a link in parens.
Added test cases.
Resolves Issue #291.
"First paragraph" as opposed to "Text body." This allows
users to specify e.g. that only paragraphs after the first
paragraph of a section are to be indented.
Thanks to Andrea Rossato for the patch.
Closes github Issue #20.
Additional related changes:
* URLs in Code in autolinks now use class "url".
* Require highlighting-kate 0.2.8.2, which omits the final <br/> tag,
essential for inline code.
Field lists now work properly with block content.
(Thanks to Lachlan Musicman for pointing out the bug.)
In addition, definition list items are now always Para instead
of Plain -- which matches behavior of rst2xml.py.
Finally, in image blocks, the alt attribute is parsed properly
and used for the alt, not also the title.
The old TeX, HtmlInline and RawHtml elements have been removed
and replaced by generic RawInline and RawBlock elements.
All modules updated to use the new raw elements.
It needs to be inside the if(strikeout) condition, after
the ulem package is imported; otherwise we try to renewcommand{\sout} when
\sout isn't yet defined.
Moved old tests into Old.hs and added new simple test-pandoc.hs for
loading and grouping together tests from different files. Later
commits will add more testfiles to the suite with more modular tests.
+ You can now specify glob patterns after 'cabal test';
e.g. 'cabal test latex' will only run the latex tests.
+ Instead of detecting highlighting support in Setup.hs,
we now detect it in test-pandoc, by looking to see if
'languages' is null.
+ We now verify the lhs readers against the lhs-test.native,
normalizing with 'normalize'. This makes more sense than
verifying against HTML, which also brings in the HTML writer.
+ Added lhsn-test.nohl.{html,html+lhs}, so we can do the lhs
tests whether or not highlighting has been installed.
+ <nav> for TOC, <figure> for figures, type attribute in <ol>.
+ Don't add math javascript in html5.
+ Use style attributes instead of deprecated width, align.
+ html template: move <title> after <meta>.
Note: charset needs to be declared before title.
+ slidy and s5 templates: move <title> after <meta>.
+ html template: Added link to html5 shim for IE.
+ Make --html5 have an effect only for 'html' writer (not s5, slidy, epub).
'(_hi_)' was being parsed with literal underscores (no emphasis).
The fix: the 'str' parser now only parses alphanumerics and
embedded underscores. All other symbols are handled by the
'symbol' parser. This has a slight effect on the AST, since
you'll get [Str "hi",Str ":"] insntead of [Str "hi:"]. But there
should not be a visible effect in any of the writers.
Thanks to gwern for pointing out the regression.
* The new reader is faster and more accurate.
* API changes for Text.Pandoc.Readers.HTML:
- removed rawHtmlBlock, anyHtmlBlockTag, anyHtmlInlineTag,
anyHtmlTag, anyHtmlEndTag, htmlEndTag, extractTagType,
htmlBlockElement, htmlComment
- added htmlTag, htmlInBalanced, isInlineTag, isBlockTag, isTextTag
* tagsoup is a new dependency.
* Text.Pandoc.Parsing: Generalized type on readWith.
* Benchmark.hs: Added length calculation to force full evaluation.
* Updated HTML reader tests.
* Updated markdown and textile readers to use the functions from
the HTML reader.
* Note: The markdown reader now correctly handles some cases it did not
before. For example:
<hr/>
is reproduced without adding a space.
<script>
a = '<b>';
</script>
is parsed correctly.
* Added Text.Pandoc.Pretty.
This is better suited for pandoc than the 'pretty' package.
One advantage is that we now get proper wrapping; Emph [Inline]
is no longer treated as a big unwrappable unit. Previously
we only got breaks for spaces at the "outer level." We can also
more easily avoid doubled blank lines. Performance is
significantly better as well.
* Removed Text.Pandoc.Blocks.
Text.Pandoc.Pretty allows you to define blocks and concatenate
them.
* Modified markdown, RST, org readers to use Text.Pandoc.Pretty
instead of Text.PrettyPrint.HughesPJ.
* Text.Pandoc.Shared: Added writerColumns to WriterOptions.
* Markdown, RST, Org writers now break text at writerColumns.
* Added --columns command-line option, which sets stColumns
and writerColumns.
* Table parsing: If the size of the header > stColumns,
use the header size as 100% for purposes of calculating
relative widths of columns.
This is necessary as the latex citation commands include there own
punctuation, which resulted in doubled commas for markdown documents
where citeproc output works correctly.
This is necessary because converting from markdown to latex correctly
changes hyphens to en-dashes and some spaces to non-breaking spaces.
Converting back to markdown does not undo this changes, and so the
tests have to undo them.
* The recent change allowing spaces and newlines in the URL
caused problems when reference keys are stacked up without
blank lines between. This is now fixed.
* Added test.
Resolves issue #258.
Note that there are some differences in how docutils and
pandoc treat footnotes. Currently pandoc ignores the numeral
or symbol used in the note; footnotes are put in an auto-numbered
ordered list.
Previously we allowed '. . .', ' . . . ', etc. This caused
too many complications, and removed author's flexibility in
combining ellipses with spaces and periods.
Previously, curly quotes were just parsed literally, leading
to problems in some output formats. Now they are parsed as
Quoted inlines, if --smart is specified.
Resolves Issue #270.
This broke when we added the Key type. We had assumed that
the custom case-insensitive Ord instance would ensure case-insensitive
matching, but that is not how Data.Map works.
* Added a test case for case-insensitivity in markdown-reader-more
* Removed old refsMatch from Text.Pandoc.Parsing module;
* hid the 'Key' constructor;
* dropped the custom Ord and Eq instances, deriving instead;
* added fromKey and toKey to convert between Keys and Inline lists;
* toKey ensures that keys are case-insensitive, since this is the
only way the API provides to construct a Key.
Resolves Issue #272.