* New module `Text.Pandoc.Docx`.
* New output format `docx`.
* Added reference.docx.
* New option `--reference-docx`.
The writer includes support for highlighted code blocks
and math (which is converted from TeX to OMML using
texmath's new OMML module).
Pandoc previously behaved like Markdown.pl for consecutive
lists of different styles. Thus, the following would be parsed
as a single ordered list, rather than an ordered list followed
by an unordered list:
1. one
2. two
- one
- two
This patch makes pandoc behave more sensibly, parsing this as
two lists. Any change in list type (ordered/unordered) or in
list number style will trigger a new list. Thus, the following
will also be parsed as two lists:
1. one
2. two
a. one
b. two
Since we regard this as a bug in Markdown.pl, and not something
anyone would ever rely on, we do not preserve the old behavior
even when `--strict` is selected.
* `---` is always em-dash, `--` is always en-dash.
* pandoc no longer tries to guess when `-` should be en-dash.
* A new option, `--old-dashes`, is provided for legacy documents.
Rationale: The rules for en-dash are too complex and
language-dependent for a guesser to work reliably. This
change gives users greater control. The alternative of
using unicode isn't very good, since unicode em- and en-
dashes are barely distinguishable in a monospace font.
Inline math uses the :math:`...` construct.
Display math uses
.. math:: ...
or if multilin
.. math::
...
These seem to be supported now by rst2latex.py.
Beamer output uses the default LaTeX template, with some
customizations via variables.
Added `writerBeamer` to `WriterOptions`.
Added `--beamer` option to `markdown2pdf`.
The container element will have the classes, id, and
key-value attributes you specified in the delimited code
block.
Previously these were stripped off.
escapeURI now only escapes space characters, leaving unicode characters
as they are, instead of converting them to octets and URL-encoding them,
as before. This gives more readable URIs. User agents now do the
percent-encoding themselves.
URIs are no longer unescaped at all on conversion to markdown, asciidoc,
rst, org.
Closes#349.
Still TODO:
- documentation in README
- add default.asciidoc to templates/
- lists
- tables
- proper escaping
- footnotes with blank lines - print separately at end?
currently they are just ignored.
- fix header (date gives weird result on pandoc README)
Previously `[@item1 and nowhere else]` yielded the locator ", and nowhere
else", or, with the new citeproc-hs, "and nowhere else".
Now it yields " and nowhere else".
A horizontal rule now gets transformed into an empty H1 header
before 'hierarchicalize' is called.
If the document that does not begin with an H1 header, an
empty one is provided.
This avoids the need for kludgy raw HTML.
Also, the 'titleslide' class is added to any section containing
just a title:
----
----
For example, in
Just a few glitches remaining.
<ul><li> In this situation, one loses the list.
</ul>
And in this, the preformatting.
<pre>Preformatted text not starting with its own blank line.
</pre>
Thansk to Dirk Laurie for noticing the issue.
So, in RST, 'http://google.com.' should be parsed as a link
to 'http://google.com' followed by a period.
The parser is smart enough to recognize balanced parentheses,
as often occur in wikipedia links: 'http://foo.bar/baz_(bam)'.
Also added ()s to RST specialChars, so '(http://google.com)'
will be parsed as a link in parens.
Added test cases.
Resolves Issue #291.
"First paragraph" as opposed to "Text body." This allows
users to specify e.g. that only paragraphs after the first
paragraph of a section are to be indented.
Thanks to Andrea Rossato for the patch.
Closes github Issue #20.
Additional related changes:
* URLs in Code in autolinks now use class "url".
* Require highlighting-kate 0.2.8.2, which omits the final <br/> tag,
essential for inline code.
Field lists now work properly with block content.
(Thanks to Lachlan Musicman for pointing out the bug.)
In addition, definition list items are now always Para instead
of Plain -- which matches behavior of rst2xml.py.
Finally, in image blocks, the alt attribute is parsed properly
and used for the alt, not also the title.
The old TeX, HtmlInline and RawHtml elements have been removed
and replaced by generic RawInline and RawBlock elements.
All modules updated to use the new raw elements.
It needs to be inside the if(strikeout) condition, after
the ulem package is imported; otherwise we try to renewcommand{\sout} when
\sout isn't yet defined.
Moved old tests into Old.hs and added new simple test-pandoc.hs for
loading and grouping together tests from different files. Later
commits will add more testfiles to the suite with more modular tests.
+ You can now specify glob patterns after 'cabal test';
e.g. 'cabal test latex' will only run the latex tests.
+ Instead of detecting highlighting support in Setup.hs,
we now detect it in test-pandoc, by looking to see if
'languages' is null.
+ We now verify the lhs readers against the lhs-test.native,
normalizing with 'normalize'. This makes more sense than
verifying against HTML, which also brings in the HTML writer.
+ Added lhsn-test.nohl.{html,html+lhs}, so we can do the lhs
tests whether or not highlighting has been installed.
+ <nav> for TOC, <figure> for figures, type attribute in <ol>.
+ Don't add math javascript in html5.
+ Use style attributes instead of deprecated width, align.
+ html template: move <title> after <meta>.
Note: charset needs to be declared before title.
+ slidy and s5 templates: move <title> after <meta>.
+ html template: Added link to html5 shim for IE.
+ Make --html5 have an effect only for 'html' writer (not s5, slidy, epub).
'(_hi_)' was being parsed with literal underscores (no emphasis).
The fix: the 'str' parser now only parses alphanumerics and
embedded underscores. All other symbols are handled by the
'symbol' parser. This has a slight effect on the AST, since
you'll get [Str "hi",Str ":"] insntead of [Str "hi:"]. But there
should not be a visible effect in any of the writers.
Thanks to gwern for pointing out the regression.
* The new reader is faster and more accurate.
* API changes for Text.Pandoc.Readers.HTML:
- removed rawHtmlBlock, anyHtmlBlockTag, anyHtmlInlineTag,
anyHtmlTag, anyHtmlEndTag, htmlEndTag, extractTagType,
htmlBlockElement, htmlComment
- added htmlTag, htmlInBalanced, isInlineTag, isBlockTag, isTextTag
* tagsoup is a new dependency.
* Text.Pandoc.Parsing: Generalized type on readWith.
* Benchmark.hs: Added length calculation to force full evaluation.
* Updated HTML reader tests.
* Updated markdown and textile readers to use the functions from
the HTML reader.
* Note: The markdown reader now correctly handles some cases it did not
before. For example:
<hr/>
is reproduced without adding a space.
<script>
a = '<b>';
</script>
is parsed correctly.
* Added Text.Pandoc.Pretty.
This is better suited for pandoc than the 'pretty' package.
One advantage is that we now get proper wrapping; Emph [Inline]
is no longer treated as a big unwrappable unit. Previously
we only got breaks for spaces at the "outer level." We can also
more easily avoid doubled blank lines. Performance is
significantly better as well.
* Removed Text.Pandoc.Blocks.
Text.Pandoc.Pretty allows you to define blocks and concatenate
them.
* Modified markdown, RST, org readers to use Text.Pandoc.Pretty
instead of Text.PrettyPrint.HughesPJ.
* Text.Pandoc.Shared: Added writerColumns to WriterOptions.
* Markdown, RST, Org writers now break text at writerColumns.
* Added --columns command-line option, which sets stColumns
and writerColumns.
* Table parsing: If the size of the header > stColumns,
use the header size as 100% for purposes of calculating
relative widths of columns.
This is necessary as the latex citation commands include there own
punctuation, which resulted in doubled commas for markdown documents
where citeproc output works correctly.
This is necessary because converting from markdown to latex correctly
changes hyphens to en-dashes and some spaces to non-breaking spaces.
Converting back to markdown does not undo this changes, and so the
tests have to undo them.
* The recent change allowing spaces and newlines in the URL
caused problems when reference keys are stacked up without
blank lines between. This is now fixed.
* Added test.
Resolves issue #258.
Note that there are some differences in how docutils and
pandoc treat footnotes. Currently pandoc ignores the numeral
or symbol used in the note; footnotes are put in an auto-numbered
ordered list.
Previously we allowed '. . .', ' . . . ', etc. This caused
too many complications, and removed author's flexibility in
combining ellipses with spaces and periods.
Previously, curly quotes were just parsed literally, leading
to problems in some output formats. Now they are parsed as
Quoted inlines, if --smart is specified.
Resolves Issue #270.
This broke when we added the Key type. We had assumed that
the custom case-insensitive Ord instance would ensure case-insensitive
matching, but that is not how Data.Map works.
* Added a test case for case-insensitivity in markdown-reader-more
* Removed old refsMatch from Text.Pandoc.Parsing module;
* hid the 'Key' constructor;
* dropped the custom Ord and Eq instances, deriving instead;
* added fromKey and toKey to convert between Keys and Inline lists;
* toKey ensures that keys are case-insensitive, since this is the
only way the API provides to construct a Key.
Resolves Issue #272.
Now the tests are produced in HTML format (so we can see all
formatting). Also, we produce them in three different style,
chicago-author-date, ieee, and mhra.
Now we handle a suffix after a bare locator, e.g.
@item1 [p. 30, suffix]
The suffix now includes any punctuation that introduces it.
A few tests fail because of problems with citeproc (extra space
before the suffix, missing space after comma separating multiple
page ranges in the locator).
Suffixes and prefixes are now [Inline]. The locator is separated
from the citation key by a blank space. The locator consists of
one introductory word and any number of words containing at
least one digit. The suffix, if any, is separated from the locator
by a comma, and continues til the end of the citation.
Previously some of the writers added spurious whitespace.
This has been removed, resolving Issue #232.
NOTE: If your application combines pandoc's output with other
text, for example in a template, you may need to add spacing.
For example, a pandoc-generated markdown file will not have
a blank line after the final block element. If you are inserting
it into another markdown file, you will need to make sure there
is a blank line between it and the next block element.
+ Header identifiers now get attached to the headers, unless
--section-divs is specified, in which case they are added to
enclosing divs. By default, the divs are not added.
+ Resolves Issue #230, #239.
+ Added --webtex command-line option, with optional parameter.
(Defaults to using google charts API.)
+ Added WebTeX HTMLMathMethod.
+ Removed MimeTeX HTMLMathMethod. (WebTeX is generic and subsumes it.)
+ Modified --mimetex option to use WebTeX.
+ Thanks to lpeterse for the idea and some of the code.
* This replaces a lot of custom parser code, and expands
the tex -> unicode conversion.
* The behavior has also changed: if the whole formula can't
be converted, the whole formula is left in raw TeX.
Previously, pandoc converted parts of the formula to unicode
and left other parts in raw TeX.
* Added (but not yet exported) readTeXMath', which returns a Maybe.
* Updated tests
* Added data/MathMLinHTML.js, which is included when no URL is provided
for --mathml. This allows MathML to be displayed in better browsers,
as text/html.
* The module was no longer necessary; its functionality (two lines)
was incorporated into pandoc.hs.
* Consolidated the two LaTeXMathML.js files into one.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1909 788f1e2b-df1e-0410-8736-df70ead52e1b
Text.Pandoc.Writers.Markdown now exports a writePlain,
which writes plain text without links, pictures, or
special formatting (not even markdown conventions).
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1907 788f1e2b-df1e-0410-8736-df70ead52e1b
The new rule: If the link target is an absolute URL, an external
link is created. Otherwise, a wikilink is created.
Examples:
1. [label](/foo/bar) => [[foo/bar|label]]
2. [label](foo) => [[foo|label]]
3. [label](http://gitit.net/foo) => [http://gitit.net/foo label]
Note on 1: We strip the leading / here, since otherwise we get a
link to Help:Links/foo/bar. would it be better for 1 to become
[http://{SERVERNAME}}/foo/bar label]? Perhaps, since this would
guarantee the same link destination as you'd get if you used the
HTML writer directly.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1904 788f1e2b-df1e-0410-8736-df70ead52e1b
Also, DON'T put image in figure (as was done previously)
when it's an inline image.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1893 788f1e2b-df1e-0410-8736-df70ead52e1b
* These options now imply -s; previously they worked also
in fragment mode.
* Users can now adjust position of include-before and
include-after text in the templates.
* Default position of include-before moved back (as it
originally was) before table of contents.
* Resolves Issue #217.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1883 788f1e2b-df1e-0410-8736-df70ead52e1b
Otherwise the list labels (numbers) often extend past the left
margin, which looks bad.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1858 788f1e2b-df1e-0410-8736-df70ead52e1b
Based on a patch by Justin Bogner.
Titles may span multiple lines, provided continuation lines
begin with a space character.
Separate authors may be put on multiple lines, provided
each line after the first begins with a space character.
Each author must fit on one line. Multiple authors on
a single line may still be separated by a semicolon.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1854 788f1e2b-df1e-0410-8736-df70ead52e1b
+ Table cells can now contain multiple block elements, such
as lists or paragraphs.
+ Table parser is now forgiving of spaces at ends of lines.
+ Added test cases.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1852 788f1e2b-df1e-0410-8736-df70ead52e1b
This is too English-centric. Writers can provide their own
header at the end of the document.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1850 788f1e2b-df1e-0410-8736-df70ead52e1b
= head = is now level 1, == head == level 2, etc.
This seems to be correct; it's only by convention that
wikipedia articles have level 2 headers at most.
Patch due to Eric Kow.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1849 788f1e2b-df1e-0410-8736-df70ead52e1b
+ Resolves Issue #220.
+ Added escapeURI function to Markdown reader. This escapes
links in a way that makes sense for markdown. If they've
used URI escapes like %20 in their link, these will be preserved.
But if they've used a special character or space without escaping
it, it will be escaped. This should make sense in most cases.
+ Previously pandoc collapsed adjacent spaces and replaced these
sequences of spaces with + characters. That isn't correct for
a URI path (+ is to be used only in the query part). We've also
removed the space-collapsing behavior.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1847 788f1e2b-df1e-0410-8736-df70ead52e1b
It is used not just for links but for toc, section heading bookmarks,
footnotes, etc.
Also added unicode=true on hyperref options.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1795 788f1e2b-df1e-0410-8736-df70ead52e1b