When the original document had text containing //, this was previously
included, unchanged, in the dokuwiki output, and this interacted badly
with later, intended formatting text.
I did this because I noticed that in the Autolinks section of writer.dokuwiki, the URL in inline code was getting auto-linked when it wasn't supposed to be.
This also meant that any inline code examples that had text that looked like dokuwiki syntax could break the formatting of later text.
I've found some incorrect behaviours in the dokuwiki output that
aren't covered by the standard pandoc test input files, so extra
test cases will be needed.
* Closed #927 (a bug in which `<pre>` in certain contexts was
not recognized as a code block).
* Remove internal HTML tags in code blocks, rather than printing
them verbatim.
* Parse attributes on `<pre>` tag for code blocks.
Now the `title`, `section`, `header`, and `footer` can all be set
individually in metadata. The `description` variable has been
removed.
Quotes have been added so that spaces are allowed in the title.
If you have a title that begins

    COMMAND(1) footer here | header here

pandoc will parse it as before into a title, section, header, and
footer. But you can also specify these elements explicitly.
Closes #885.
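The title convention above can be illustrated with a minimal Python sketch. This is not pandoc's Haskell code. It assumes the first word of the title is `NAME(section)` with a single-digit section, and that the rest is `footer | header`, either part optional. `ManTitle` and `parse_man_title` are hypothetical names.

```python
import re
from typing import NamedTuple


class ManTitle(NamedTuple):
    title: str
    section: str
    footer: str
    header: str


def parse_man_title(text: str) -> ManTitle:
    """Hypothetical parser for 'COMMAND(1) footer | header' style titles."""
    # Split off the first word, e.g. "COMMAND(1)".
    cmd, _, rest = text.partition(" ")
    m = re.fullmatch(r"(.*)\((\d)\)", cmd)
    if m:
        title, section = m.group(1), m.group(2)
    else:
        title, section = cmd, ""
    # The remainder is "footer | header"; either part may be missing.
    footer, _, header = rest.partition("|")
    return ManTitle(title, section, footer.strip(), header.strip())
```

For example, `parse_man_title("COMMAND(1) footer here | header here")` gives title `COMMAND`, section `1`, footer `footer here` and header `header here`. Explicit metadata fields would simply override these values.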
* Depend on pandoc 1.12.
* Added yaml dependency.
* `Text.Pandoc.XML`: Removed `stripTags`. (API change.)
* `Text.Pandoc.Shared`: Added `metaToJSON`.
This will be used in writers to create a JSON object for use
in the templates from the pandoc metadata.
* Revised readers and writers to use the new Meta type.
* `Text.Pandoc.Options`: Added `Ext_yaml_title_block`.
* Markdown reader: Added support for YAML metadata block.
Note that it must come at the beginning of the document.
* `Text.Pandoc.Parsing.ParserState`: Replace `stateTitle`,
`stateAuthors`, `stateDate` with `stateMeta`.
* RST reader: Improved metadata.
Treat initial field list as metadata when standalone specified.
Previously ALL fields "title", "author", "date" in field lists
were treated as metadata, even if not at the beginning.
Use `subtitle` metadata field for subtitle.
* `Text.Pandoc.Templates`: Export `renderTemplate'`, which takes a string
instead of a compiled template.
* OPML template: Use 'for' loop for authors.
* Org template: '#+TITLE:' is inserted before the title.
Previously the writer did this.
rst2html doesn't add `<p>` tags to list items (even when they are
separated by blank lines) unless there are multiple paragraphs in the
list. This commit changes the RST reader to conform more closely to
what docutils does.
Closes #880.
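The docutils-style rule can be sketched in Python over a simplified AST. Each block here is a `(kind, text)` tuple, an assumption made only for illustration. The sketch also assumes a list counts as "compact" when no item contains more than one paragraph. In that case every `Para` is turned into a `Plain`, so no `<p>` tags are emitted. `compactify` is a hypothetical name.

```python
from typing import List, Tuple

Block = Tuple[str, str]  # (kind, text); kind is "Para" or "Plain" in this toy AST


def compactify(items: List[List[Block]]) -> List[List[Block]]:
    """Render list items without <p> unless some item has several paragraphs."""
    has_multi_para_item = any(
        sum(1 for kind, _ in item if kind == "Para") > 1 for item in items
    )
    if has_multi_para_item:
        return items  # loose list: keep paragraphs as they are
    return [
        [("Plain", text) if kind == "Para" else (kind, text) for kind, text in item]
        for item in items
    ]
```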
The _note attribute is supported. This is unofficial, but
used e.g. in OmniOutliner and supported by multimarkdown.
We treat the contents as markdown blocks under a section
header.
Added to documentation and tests.
* Reverts 1.11 change that caused citations to be rendered as
markdown citations, even if `--biblio` was specified, unless
`citation` extension is disabled. Now, formatted citations
are always printed if `--biblio` was specified. If you want to
reformat markdown keeping pandoc markdown citations intact,
just don't specify `--biblio`.
* Reverted now unnecessary changes to Text.Pandoc.Biblio adding the raw
block to mark the bibliography, and to Text.Pandoc.Writers.Markdown
to remove the bibliography if `citations` was not specified.
* If the content of a `Cite` inline is a `RawInline "latex"`, which
means that a LaTeX citation command was parsed and `--biblio` wasn't
specified, then render it as a pandoc markdown citation. This means
that `pandoc -f latex -t markdown`, without `--biblio`, will convert
LaTeX citation commands to pandoc markdown citations.
Previously, a LaTeX citation would always be parsed as a Citation
element, with the raw LaTeX in the [Inline] part.
Now, the LaTeX citation is parsed as a Citation element only if
`--biblio` was specified (i.e. only if there is a nonempty set
of references in readerReferences). Otherwise it is parsed as
raw LaTeX.
This will make it possible to simplify some things in the markdown
writer. It also makes the LaTeX reader behave more like the Markdown
reader.
Previously citations were rendered as citeproc-formatted citations
by default. Now we render them as pandoc citations, e.g. `[@item1]`,
unless the `citations` extension is disabled.
If you still want formatted citations in your markdown output,
use `pandoc -t markdown-citations`.
* Moved code for translating listings language names to
highlighting-kate names and back from LaTeX reader to Highlighting.
* Text.Pandoc.Highlighting is no longer exposed (API change).
* Text.Pandoc.Highlighting exports toListingsLang, fromListingsLang
Note: The attributes go on the enclosing section or div
if `--section-divs` is specified.
Also fixed a regression (only now noticed) in html+lhs output.
Previously the bird tracks were being omitted.
The 1.10 code assumed that each table header cell contains
exactly one block. That failed for headerless tables (whose header
cells contain zero blocks) and also for tables with multiple blocks
in a header cell.
The code is fixed and tests provided. Thanks to Andrew Lee for
pointing out the bug.
* RTF writer: Export writeRTFWithEmbeddedImages instead of
rtfEmbedImage.
* Text.Pandoc: Use writeRTFWithEmbeddedImages for RTF.
* Moved code for embedding images in RTF out of pandoc.hs.
* It no longer uses Network.URI's URI parser, which is too restrictive
(it doesn't allow unicode URIs unless they are encoded).
* It allows many more schemes.
* It better handles punctuation so as to avoid capturing trailing
punctuation in bare URLs.
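The idea of not capturing trailing punctuation can be sketched in Python. The scheme list and the punctuation set are hypothetical simplifications; pandoc's own list of schemes is much longer. Non-ASCII characters are accepted as-is, without requiring encoding. A closing `)` is kept only when it balances an opening `(` inside the URL.

```python
import re
from typing import List

# Hypothetical, simplified scheme list (not pandoc's).
URL_RE = re.compile(r"(?:https?|ftp|mailto|file):\S+")
TRAILING = ".,;:!?\"')]"


def find_bare_urls(text: str) -> List[str]:
    urls = []
    for m in URL_RE.finditer(text):
        url = m.group(0)
        # Strip trailing punctuation, but keep a ')' that closes a '(' in the URL.
        while url and url[-1] in TRAILING:
            if url[-1] == ")" and url.count("(") >= url.count(")"):
                break
            url = url[:-1]
        urls.append(url)
    return urls
```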
* In markdown reader, add a '\1' character to the beginning
of the title of an image that is alone in its paragraph,
if implicit_figures extension is selected.
* In writers, check for Para [Image alt (src,'\1':tit)] and treat
it as a figure if possible.
* Updated tests.
This is a bit of a hack, but it allows us to make implicit_figures
an extension of the markdown reader, rather than the writers.
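The reader/writer handshake described above can be sketched in Python over a toy AST. An image is modeled as `("Image", alt, src, title)`, an assumption for illustration only. The reader prefixes the title of a lone image with `'\x01'` (the `'\1'` character). The writer checks for that marker, strips it, and treats the paragraph as a figure. Function names are hypothetical.

```python
FIG_MARK = "\x01"


def mark_implicit_figure(para, implicit_figures=True):
    """Reader side: tag a paragraph consisting of a single image."""
    if implicit_figures and len(para) == 1 and para[0][0] == "Image":
        _, alt, src, title = para[0]
        return [("Image", alt, src, FIG_MARK + title)]
    return para


def as_figure(para):
    """Writer side: return (alt, src, title) for a marked figure, else None."""
    if len(para) == 1 and para[0][0] == "Image" and para[0][3].startswith(FIG_MARK):
        _, alt, src, title = para[0]
        return alt, src, title[len(FIG_MARK):]
    return None
```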
We now (a) use anonymous links for links with inline URLs, and
(b) use an inline link instead of a reference link if the
reference link would require a label that has already been
used for a different link.
Closes #511.
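The link-rendering choice above can be sketched in Python. `refs` is a hypothetical dict of labels already bound to a URL. A label is used as a reference link only if it is unused or already points at the same URL. Otherwise the writer falls back to an anonymous inline link (`__`), so no conflicting named target is created. This sketch ignores label escaping.

```python
from typing import Dict


def render_link(text: str, url: str, refs: Dict[str, str]) -> str:
    if text in refs and refs[text] != url:
        # Label already used for a different URL: anonymous inline link.
        return f"`{text} <{url}>`__"
    refs[text] = url
    return f"`{text}`_"


def reference_targets(refs: Dict[str, str]) -> str:
    """Target lines emitted after the body for reference links."""
    return "\n".join(f".. _{label}: {url}" for label, url in refs.items())
```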
Previously header ids were autogenerated by the writers.
Now they are generated (unless supplied explicitly) in the
markdown parser, if the `header_identifiers` extension is
selected.
In addition, the textile reader now supports id attributes on
headers.
Updated tests to take into account the new context/latex output, and fixed
some bugs in the test suite (Tests.Helpers and Tests.Writers.ConTeXt).
(We had the wrong order of expected/actual in the diff output.)
Previously the textile reader and writer incorrectly implemented
RST-style autolinks for URLs and email addresses.
This has been fixed. Now an autolink is done this way:
"$":http://myurl.com
Now pandoc correctly handles hard line breaks inside list items.
Previously they broke list parsing. Thanks to Pablo
Rodríguez for pointing out the problem.
* Depend on text.
* Expose Text.Pandoc.UTF8.
* Text.Pandoc.UTF8 now exports toString, fromString,
toStringLazy, fromStringLazy.
* These are used instead of the old utf8-string functions.
Now we insert anchors after each header, and use @ref
instead of @uref for links.
Commas are now escaped as @comma{} only when needed; previously
all commas were escaped. (This change is needed, in part, because @ref
commands must be followed by a real comma or period.)
Also insert a blank line in front of @verbatim environments.
Previously the parser would hang on input like this:

    [[[[[[[[[[[[[[[[[[hi

We fixed this by making the link parser parse characters
between balanced brackets (skipping brackets in inline code spans),
then parsing the result as an inline list.
One change is that

    [hi *there]* bud](/url)

is no longer parsed as a link. But in this respect pandoc behaved
differently from most other implementations anyway, so that seems okay.
All current tests pass. Added test for this case.
Closes #620.
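The balanced-bracket scan can be sketched in Python. This is not pandoc's parser. It is a single forward pass, so unclosed input such as `[[[[hi` just fails instead of backtracking. Backtick code spans are skipped by searching for the same run of backticks, which is a simplification of real code-span rules. `bracketed` is a hypothetical name.

```python
from typing import Optional, Tuple


def bracketed(s: str, i: int) -> Optional[Tuple[str, int]]:
    """If s[i] == '[', return (contents, index after the matching ']'), else None."""
    if i >= len(s) or s[i] != "[":
        return None
    depth = 0
    j = i
    while j < len(s):
        c = s[j]
        if c == "`":
            # Skip an inline code span so brackets inside it don't count.
            k = j
            while k < len(s) and s[k] == "`":
                k += 1
            end = s.find(s[j:k], k)
            j = k if end == -1 else end + (k - j)
            continue
        if c == "[":
            depth += 1
        elif c == "]":
            depth -= 1
            if depth == 0:
                return s[i + 1:j], j + 1
        j += 1
    return None  # unbalanced: not a link
```

With `[hi *there]* bud](/url)`, the brackets close after `hi *there`. The next character is `*`, not `(`, which is why that input is no longer a link.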
This allows the markdown reader to treat '\begin' (not followed
by an argument) as a raw string rather than erroring out when
it doesn't find a '{'.
Closes #622.
Unescaped -'s become hyphens, while \-'s are left as ASCII
minus signs. That is preferable for use with command-line
options.
See http://lintian.debian.org/tags/hyphen-used-as-minus-sign.html.
Thanks to Andrea Bolognani for bringing the issue to our
attention.
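On the writer side, this amounts to emitting every `-` as `\-`. A minimal Python sketch follows. It assumes no other roff escaping is needed (a real writer also escapes backslashes and more), and `escape_man_hyphens` is a hypothetical name.

```python
def escape_man_hyphens(text: str) -> str:
    # In roff an unescaped '-' may render as a hyphen; '\-' stays an ASCII
    # minus sign, so options like --output copy and paste correctly.
    return text.replace("-", "\\-")
```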
- Removed writerLiterateHaskell from WriterOptions.
- Removed readerLiterateHaskell from ReaderOptions.
- Added Ext_literate_haskell to Extensions. Test for this
instead of the above.
- Removed failUnlessLHS from Shared.
Note: At this point, the +lhs and .lhs extensions no longer have any effect.
This still needs to be fixed.
* Use Builder's Inlines/Blocks instead of lists.
* Return values in the reader monad, which are then
run (at the end of parsing) against the final
parser state. This allows links, notes, and
example numbers to be resolved without a second
parser pass.
* An effect of using Builder is that everything is
normalized automatically.
* New exports from Text.Pandoc.Parsing:
widthsFromIndices, NoteTable', KeyTable', Key', toKey',
withQuoteContext, singleQuoteStart, singleQuoteEnd, doubleQuoteStart,
doubleQuoteEnd, ellipses, apostrophe, dash
* Updated opendocument tests.
* Don't derive Show for ParserState.
* Benchmarks: markdown reader takes 82% of the time it took before.
Markdown writer takes 92% of the time (here the speedup is probably
due to the fact that everything is normalized by default).
To run tests, configure with --enable-tests, then 'cabal test'.
You can specify particular tests using --test-options='-t markdown'.
No output is shown unless tests fail. In the future, we can move
to the detailed-1.0 interface.
* All tables now require at least one body row.
* Renamed from 'extra' to 'pipe' tables.
* Moved functions from Parsing to Readers.Markdown.
* Cleaned up code; revised to parse in one pass rather than
parsing a raw string, splitting it, and parsing the components.
* Allow pipe tables without pipes on the ends (as PHP Markdown Extra
does).
* Use `:` form instead of `~`, for better compatibility with other
markdown implementations.
* Don't wrap the term, because it breaks definition lists.
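The pipe-table rules listed above can be sketched in Python. Two of them are modeled: pipes on the ends are optional, and a table needs at least one body row. Escaped pipes and cell alignment are ignored, and the function names are hypothetical.

```python
from typing import List, Optional, Tuple


def split_pipe_row(line: str) -> List[str]:
    """Split a row into cells; leading/trailing pipes are optional."""
    s = line.strip()
    if s.startswith("|"):
        s = s[1:]
    if s.endswith("|"):
        s = s[:-1]
    return [cell.strip() for cell in s.split("|")]


def parse_pipe_table(lines: List[str]) -> Optional[Tuple[List[str], List[List[str]]]]:
    """Return (header, body_rows), or None if this is not a pipe table."""
    if len(lines) < 3:  # header + separator + at least one body row
        return None
    header = split_pipe_row(lines[0])
    separator = split_pipe_row(lines[1])
    if not all(c and set(c) <= set(":-") for c in separator):
        return None
    return header, [split_pipe_row(line) for line in lines[2:]]
```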
This way you can still get the raw latex back, even if you don't
process with citeproc. Previously, cites were not visible at all
unless you specified --biblio on the command line and converted
them using citeproc, or used --natbib or --biblatex.
* The new reader is more robust, accurate, and extensible.
It is still quite incomplete, but it should be easier
now to add features.
* Text.Pandoc.Parsing: Added withRaw combinator.
* Markdown reader: do escapedChar before raw latex inline.
Otherwise we capture commands like \{.
* Fixed latex citation tests for new citeproc.
* Handle \include{} commands in latex.
This is done in pandoc.hs, not the (pure) latex reader.
But the reader exports the needed function, handleIncludes.
* Moved err and warn from pandoc.hs to Shared.
* Fixed tests - raw tex should sometimes have trailing space.
* Updated lhs-test for highlighting-kate changes.
* New module `Text.Pandoc.Docx`.
* New output format `docx`.
* Added reference.docx.
* New option `--reference-docx`.
The writer includes support for highlighted code blocks
and math (which is converted from TeX to OMML using
texmath's new OMML module).
Pandoc previously behaved like Markdown.pl for consecutive
lists of different styles. Thus, the following would be parsed
as a single ordered list, rather than an ordered list followed
by an unordered list:

    1. one
    2. two
    - one
    - two

This patch makes pandoc behave more sensibly, parsing this as
two lists. Any change in list type (ordered/unordered) or in
list number style will trigger a new list. Thus, the following
will also be parsed as two lists:

    1. one
    2. two
    a. one
    b. two

Since we regard this as a bug in Markdown.pl, and not something
anyone would ever rely on, we do not preserve the old behavior
even when `--strict` is selected.
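The "any change in list type or number style starts a new list" rule can be sketched in Python. Items are `(marker, text)` pairs. The style classification is a hypothetical simplification: roman numerals are ignored, and all bullet characters count as one style, since the rule above only mentions ordered/unordered and number style.

```python
import re
from typing import List, Tuple


def marker_style(marker: str) -> str:
    if marker in ("-", "*", "+"):
        return "bullet"
    if re.fullmatch(r"\d+\.", marker):
        return "decimal"
    if re.fullmatch(r"[a-z]\.", marker):
        return "lower-alpha"
    if re.fullmatch(r"[A-Z]\.", marker):
        return "upper-alpha"
    return "other"


def group_lists(items: List[Tuple[str, str]]) -> List[List[str]]:
    """Start a new list whenever the marker style changes."""
    lists: List[List[str]] = []
    prev = None
    for marker, text in items:
        style = marker_style(marker)
        if style != prev:
            lists.append([])
            prev = style
        lists[-1].append(text)
    return lists
```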
* `---` is always em-dash, `--` is always en-dash.
* pandoc no longer tries to guess when `-` should be en-dash.
* A new option, `--old-dashes`, is provided for legacy documents.
Rationale: The rules for en-dash are too complex and
language-dependent for a guesser to work reliably. This
change gives users greater control. The alternative of
using unicode isn't very good, since unicode em- and en-
dashes are barely distinguishable in a monospace font.
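Because the new rules no longer guess, they reduce to two plain substitutions. A minimal Python sketch (not pandoc's parser, and ignoring code spans and other contexts where dashes must be left alone) is shown below. `---` must be replaced before `--`, or it would be read as an en-dash followed by a hyphen.

```python
def smart_dashes(text: str) -> str:
    # '---' first, so it isn't consumed as '--' followed by '-'.
    return text.replace("---", "\u2014").replace("--", "\u2013")
```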
Inline math uses the :math:`...` construct.
Display math uses

    .. math:: ...

or, if multiline,

    .. math::

       ...

These seem to be supported now by rst2latex.py.
Beamer output uses the default LaTeX template, with some
customizations via variables.
Added `writerBeamer` to `WriterOptions`.
Added `--beamer` option to `markdown2pdf`.
The container element will have the classes, id, and
key-value attributes you specified in the delimited code
block.
Previously these were stripped off.
escapeURI now only escapes space characters, leaving unicode characters
as they are instead of converting them to octets and URL-encoding them,
as before. This gives more readable URIs; user agents do the
percent-encoding themselves.
URIs are no longer unescaped at all on conversion to markdown, asciidoc,
rst, org.
Closes #349.
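The new escapeURI behaviour can be sketched in Python (pandoc's version is Haskell). Whitespace is percent-encoded from its UTF-8 bytes, and every other character, including non-ASCII ones, passes through unchanged. `escape_uri` is a hypothetical name.

```python
def escape_uri(uri: str) -> str:
    out = []
    for c in uri:
        if c.isspace():
            # Percent-encode the UTF-8 bytes of whitespace characters only.
            out.extend("%{:02X}".format(b) for b in c.encode("utf-8"))
        else:
            out.append(c)
    return "".join(out)
```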
Still TODO:
- documentation in README
- add default.asciidoc to templates/
- lists
- tables
- proper escaping
- footnotes with blank lines - print separately at end?
currently they are just ignored.
- fix header (date gives weird result on pandoc README)
Previously `[@item1 and nowhere else]` yielded the locator ", and nowhere
else", or, with the new citeproc-hs, "and nowhere else".
Now it yields " and nowhere else".
A horizontal rule now gets transformed into an empty H1 header
before 'hierarchicalize' is called.
If the document does not begin with an H1 header, an
empty one is provided.
This avoids the need for kludgy raw HTML.
Also, the 'titleslide' class is added to any section containing
just a title:
----
----