* Added Text.Pandoc.Pretty.
This is better suited for pandoc than the 'pretty' package.
One advantage is that we now get proper wrapping; Emph [Inline]
is no longer treated as a big unwrappable unit. Previously
we only got breaks for spaces at the "outer level." We can also
more easily avoid doubled blank lines. Performance is
significantly better as well.
* Removed Text.Pandoc.Blocks.
Text.Pandoc.Pretty allows you to define blocks and concatenate
them.
* Modified markdown, RST, org writers to use Text.Pandoc.Pretty
instead of Text.PrettyPrint.HughesPJ.
* Text.Pandoc.Shared: Added writerColumns to WriterOptions.
* Markdown, RST, Org writers now break text at writerColumns.
* Added --columns command-line option, which sets stColumns
and writerColumns.
* Table parsing: If the size of the header > stColumns,
use the header size as 100% for purposes of calculating
relative widths of columns.
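
The header-size rule in the last item can be sketched roughly as follows. This is an illustration only, not the reader's actual code: relativeWidths is a hypothetical name, its first argument stands in for stColumns, and the header size is assumed to be the sum of the column widths in characters.

```haskell
-- Hypothetical helper, not pandoc's actual table code.
-- 'columns' stands in for stColumns; 'widths' are the character widths
-- of the table columns as measured from the header line.
relativeWidths :: Int -> [Int] -> [Double]
relativeWidths columns widths = map fraction widths
  where
    headerSize = sum widths
    -- Use the header size as 100% when it exceeds the column limit.
    total = max columns headerSize
    fraction w = fromIntegral w / fromIntegral total
```

With an 80-column limit, a header of two 60-character columns is wider than the limit, so each column gets half of the header size rather than 75% of stColumns.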
This is necessary as the latex citation commands include their own
punctuation, which resulted in doubled commas for markdown documents
where citeproc output works correctly.
There was a bug in parsing '_emph_, ...': when followed by
a comma, underscore emphasis did not register. (Thanks to
gwern for pointing this out.)
This bug was introduced by the change in
c66921f2ac
* The recent change allowing spaces and newlines in the URL
caused problems when reference keys are stacked up without
blank lines between. This is now fixed.
* Added test.
Moved inlineNote parser after superscript parser,
so ^[link](/foo)^ gets recognized as a superscripted
link, not an inline note followed by garbage.
Thanks to Conal Elliott for pointing out the problem.
This is better done on the resulting HTML; use the xss-sanitize library
for this. xss-sanitize is based on pandoc's sanitization, but improves
it.
- Removed stateSanitize from ParserState.
- Removed --sanitize-html option.
Resolves issue #258.
Note that there are some differences in how docutils and
pandoc treat footnotes. Currently pandoc ignores the numeral
or symbol used in the note; footnotes are put in an auto-numbered
ordered list.
Previously we allowed '. . .', ' . . . ', etc. This caused
too many complications, and took away the author's flexibility in
combining ellipses with spaces and periods.
The 'str' parser now reads internal _'s as part of the string.
This prevents pandoc from starting to look for an emphasized
block, which can cause exponential slowdowns in some cases.
Resolves Issue #182.
Previously, curly quotes were just parsed literally, leading
to problems in some output formats. Now they are parsed as
Quoted inlines, if --smart is specified.
Resolves Issue #270.
This broke when we added the Key type. We had assumed that
the custom case-insensitive Ord instance would ensure case-insensitive
matching, but that is not how Data.Map works.
* Added a test case for case-insensitivity in markdown-reader-more
* Removed old refsMatch from Text.Pandoc.Parsing module;
* hid the 'Key' constructor;
* dropped the custom Ord and Eq instances, deriving instead;
* added fromKey and toKey to convert between Keys and Inline lists;
* toKey ensures that keys are case-insensitive, since this is the
only way the API provides to construct a Key.
Resolves Issue #272.
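
A minimal sketch of the approach listed above, under simplifying assumptions: the Key here wraps a String rather than an Inline list, and KeyTable, lookupRef and exampleTable are hypothetical names used only for illustration. In a real module the Key constructor would be left out of the export list.

```haskell
import Data.Char (toLower)
import qualified Data.Map as M

-- Simplified stand-in: pandoc's keys are built from Inline lists.
-- Eq and Ord are derived, not hand-written.
newtype Key = Key String deriving (Eq, Ord, Show)

-- The only intended way to build a Key; case is normalized up front,
-- so derived comparison is effectively case-insensitive.
toKey :: String -> Key
toKey = Key . map toLower

fromKey :: Key -> String
fromKey (Key s) = s

-- Hypothetical reference table mapping keys to link targets.
type KeyTable = M.Map Key String

lookupRef :: String -> KeyTable -> Maybe String
lookupRef label = M.lookup (toKey label)

exampleTable :: KeyTable
exampleTable = M.fromList [(toKey "Foo Bar", "/url")]
```

Because every Key passes through toKey, a lookup for "foo BAR" finds the entry stored under "Foo Bar".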
The smartPunctuation parser from the markdown parser
was being used, but this creates two problems:
* smart punctuation rules are slightly different in textile;
for example, a single dash with spaces around it becomes an
en dash.
* the following gets parsed as a double quoted string followed
by a colon, rather than as a link:
"emphasized text":http://my.url.com
This needs rethinking.
Do a quick lookahead to make sure what follows looks like a setext
header before parsing any Inlines. This gives a 15% performance
boost in one benchmark. Many thanks to knieriem for finding
the problem (in peg-markdown):
https://github.com/jgm/peg-markdown/issues/issue/3
If suffix doesn't begin with punctuation, include opening
comma and space in result.
Previously,
@item [only a suffix]
would result in something like
Doe (2002only a suffix)
because there was no opening delimiter.
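
The rule can be sketched as below. This is only an illustration of the delimiter logic; withSuffix is a hypothetical helper and does not reflect how the citation code is actually structured.

```haskell
import Data.Char (isPunctuation)

-- Hypothetical helper: append a free-form suffix to a rendered year.
withSuffix :: String -> String -> String
withSuffix year "" = year
withSuffix year suffix@(c:_)
  -- The suffix brings its own opening punctuation: use it as is.
  | isPunctuation c = year ++ suffix
  -- Otherwise supply the opening comma and space.
  | otherwise       = year ++ ", " ++ suffix
```

Under this sketch, the year "2002" with the suffix "only a suffix" becomes "2002, only a suffix" rather than "2002only a suffix".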
* Don't look for bibliography in ~/.pandoc. Reason: doing
this requires a read + parse of the bibliography even when
the document doesn't use citations. This is a big performance
drag on regular pandoc invocations.
* Only look for default.csl if the document contains references.
Reason: avoids the need to read and parse csl file when the
document contains no references anyway.
* Removed findFirstFile from Shared.
Now we handle a suffix after a bare locator, e.g.
@item1 [p. 30, suffix]
The suffix now includes any punctuation that introduces it.
A few tests fail because of problems with citeproc (extra space
before the suffix, missing space after comma separating multiple
page ranges in the locator).
Suffixes and prefixes are now [Inline]. The locator is separated
from the citation key by a blank space. The locator consists of
one introductory word and any number of words containing at
least one digit. The suffix, if any, is separated from the locator
by a comma, and continues until the end of the citation.
citationPrefix now [Inline] rather than String;
citationSuffix added.
This change presupposes no changes in citeproc-hs.
It passes a string for these values to citeproc-hs.
Eventually, citeproc-hs should use an [Inline] for
these as well.
We now get Text.Pandoc.Definition from the new pandoc-types package.
This will make it possible for other programs to supply output
in Pandoc format, without depending on the whole pandoc package.
Thanks to Jonathan Daugherty for the patch.
The gladTeX program gives finer control over the LaTeX environment
used to render its input. The latest version (1.1) uses the
"displaymath" environment by default, which is nice for large,
block-level equations, but it isn't so nice for inline math (where
"math" is more appropriate). This patch causes the HTML writer to
differentiate between the two by explicitly setting the LaTeX
environment on the generated EQ tag.
Previously some of the writers added spurious whitespace.
This has been removed, resolving Issue #232.
NOTE: If your application combines pandoc's output with other
text, for example in a template, you may need to add spacing.
For example, a pandoc-generated markdown file will not have
a blank line after the final block element. If you are inserting
it into another markdown file, you will need to make sure there
is a blank line between it and the next block element.
+ Header identifiers now get attached to the headers, unless
--section-divs is specified, in which case they are added to
enclosing divs. By default, the divs are not added.
+ Resolves Issue #230, #239.
+ Added --webtex command-line option, with optional parameter.
(Defaults to using google charts API.)
+ Added WebTeX HTMLMathMethod.
+ Removed MimeTeX HTMLMathMethod. (WebTeX is generic and subsumes it.)
+ Modified --mimetex option to use WebTeX.
+ Thanks to lpeterse for the idea and some of the code.
+ Added stateHasChapters to ParserState.
+ If a \chapter command is encountered, this is set to True
and subsequent \section commands (etc.) will be bumped up
one level.
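
One way to picture the flag is a fold that threads a Bool through the sectioning commands. The sketch below is an assumption about the behavior described, not the LaTeX reader's code: Command and headerLevels are hypothetical, and "bumped up one level" is read as adding one to the header level.

```haskell
import Data.List (mapAccumL)

-- Hypothetical model of the sectioning commands seen by the reader.
data Command = Chapter | Section | Subsection deriving (Eq, Show)

-- Assign header levels, threading a hasChapters flag through the list.
-- Only commands that follow a \chapter are affected.
headerLevels :: [Command] -> [Int]
headerLevels = snd . mapAccumL step False
  where
    step _           Chapter    = (True, 1)
    step hasChapters Section    = (hasChapters, bump hasChapters 1)
    step hasChapters Subsection = (hasChapters, bump hasChapters 2)
    bump hasChapters n = if hasChapters then n + 1 else n
```

A \section before any \chapter stays at level 1; after a \chapter, \section becomes level 2 and \subsection level 3.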
The previous fix resulted in bird tracks being included in
both html and html+lhs renderings of literate haskell sections
when pandoc was compiled without highlighting support. This change makes
pandoc without highlighting behave like pandoc with highlighting: the
bird tracks are used only if html+lhs output is specified.
+ New writer module Text.Pandoc.Writers.EPUB
+ Stylesheet in epub.css
+ --epub-stylesheet command-line option.
+ New utility module Text.Pandoc.UUID to generate
random UUIDs for EPUBs.
+ Transformed the old Text.Pandoc.ODT module into a proper
writer module, Text.Pandoc.Writers.ODT.
+ Instead of saveOpenDocumentAsODT, we now have writeODT, which
takes a Pandoc document and produces a bytestring.
saveOpenDocumentAsODT has been removed.
+ To extract the images and insert them into the ODT, we now use
processPandocM on the Pandoc document rather than a custom XML parser.
+ Handle the case where the image is remote (or not found) by
converting the Image element into an Emph with the label.
+ Plumbing in pandoc.hs changed slightly to accommodate this, and to
allow other writers that live in the IO monad.
Resolves Issue #242. Previously the bird tracks would be
stripped off when pandoc was not compiled with highlighting support,
even if -t html+lhs was specified.
Thanks to Nicholas Wu for pointing out the problem.
* This affects the RST and Markdown readers.
* The type for stateKeys in ParserState has also changed.
* Pandoc, Meta, Inline, and Block have been given Ord instances.
* Reference keys now have a type of their own (Key), with its
own Ord instance for case-insensitive comparison.
Use new rawLaTeXInline' in LaTeX reader, and export rawLaTeXInline
for use in markdown reader.
Fixes bug wherein '\section{foo}' was not recognized as raw TeX
in markdown document.
* This replaces a lot of custom parser code, and expands
the tex -> unicode conversion.
* The behavior has also changed: if the whole formula can't
be converted, the whole formula is left in raw TeX.
Previously, pandoc converted parts of the formula to unicode
and left other parts in raw TeX.
* Added (but not yet exported) readTeXMath', which returns a Maybe.
* Updated tests
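
The all-or-nothing behavior can be illustrated with traverse over Maybe. The token converter below is a toy with a made-up command table; convertToken, convertFormula and renderFormula are hypothetical names and are not the functions used by pandoc.

```haskell
import Data.Maybe (fromMaybe)

-- Toy converter: knows two TeX commands, passes plain tokens through,
-- and fails on any other command.
convertToken :: String -> Maybe String
convertToken "\\alpha" = Just "α"
convertToken "\\leq"   = Just "≤"
convertToken tok@(c:_) | c /= '\\' = Just tok
convertToken _         = Nothing

-- All-or-nothing: traverse yields Nothing if any single token fails.
convertFormula :: [String] -> Maybe String
convertFormula toks = unwords <$> traverse convertToken toks

-- If conversion fails, keep the whole formula as raw TeX.
renderFormula :: [String] -> String
renderFormula toks = fromMaybe (unwords toks) (convertFormula toks)
```

A formula containing one unknown command comes back entirely as raw TeX, instead of a mix of unicode and TeX.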
Previously some characters that are illegal in HTML identifiers,
such as '<', were being allowed in header identifiers. The logic
has now been fixed. Thanks to Xyne for reporting.
Now it escapes all characters that aren't allowed in a URI.
%, ?, /, and other characters that are allowed in a URI are
left alone. Unicode high characters are UTF-8 encoded.
* Added stringToURI to Shared. This is used in the HTML
writer for all URIs. It properly URI-encodes high
characters (> 127), leaving everything else (including
symbols and spaces) the same.
* Modified unsanitaryURI to allow UTF8 characters in a URI.
(First, we convert the URI to URI-encoded octets, then we
pass through parseURIReference.)
This resolves gitit Issue #99. Previously
'[abc](http://gitit.net/测试)' would not be rendered as
a link when --sanitize was selected.
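
A rough, self-contained sketch of the encoding step: characters above 127 are UTF-8 encoded and percent-escaped, everything else is left alone. encodeHighChars is a hypothetical stand-in for stringToURI, and the UTF-8 encoder is written by hand only to avoid dependencies.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord, toUpper)
import Numeric (showHex)

-- UTF-8 encode a single character as a list of byte values.
utf8Bytes :: Char -> [Int]
utf8Bytes c
  | n < 0x80    = [n]
  | n < 0x800   = [0xC0 .|. (n `shiftR` 6), cont n]
  | n < 0x10000 = [0xE0 .|. (n `shiftR` 12), cont (n `shiftR` 6), cont n]
  | otherwise   = [ 0xF0 .|. (n `shiftR` 18), cont (n `shiftR` 12)
                  , cont (n `shiftR` 6), cont n ]
  where
    n = ord c
    cont x = 0x80 .|. (x .&. 0x3F)

-- Render one byte as %XX with upper-case hex digits.
percent :: Int -> String
percent b = '%' : pad (map toUpper (showHex b ""))
  where pad s = replicate (2 - length s) '0' ++ s

-- Hypothetical stand-in for stringToURI: escape only characters > 127.
encodeHighChars :: String -> String
encodeHighChars = concatMap enc
  where
    enc c | ord c > 127 = concatMap percent (utf8Bytes c)
          | otherwise   = [c]
```

Applied to the URL in the example above, this turns the two Chinese characters into %E6%B5%8B%E8%AF%95 and leaves the rest of the string unchanged.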
It's safe to depend on extensible-exceptions, since this is
shipped with GHC 6.10 and 6.12.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1911 788f1e2b-df1e-0410-8736-df70ead52e1b
* Added data/MathMLinHTML.js, which is included when no URL is provided
for --mathml. This allows MathML to be displayed in better browsers,
as text/html.
* The module was no longer necessary; its functionality (two lines)
was incorporated into pandoc.hs.
* Consolidated the two LaTeXMathML.js files into one.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1909 788f1e2b-df1e-0410-8736-df70ead52e1b
If argument is an absolute URL without a recognized extension,
and no reader is explicitly specified, use HTML.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1908 788f1e2b-df1e-0410-8736-df70ead52e1b
Text.Pandoc.Writers.Markdown now exports a writePlain,
which writes plain text without links, pictures, or
special formatting (not even markdown conventions).
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1907 788f1e2b-df1e-0410-8736-df70ead52e1b
Now we have a list of "transforms" (Pandoc -> Pandoc).
They get applied at the end in a fold.
This should make it easier to add new document-transforming
options in the future.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1905 788f1e2b-df1e-0410-8736-df70ead52e1b
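
The transforms-in-a-fold idea from the commit above, as a minimal sketch. Doc is a toy stand-in for the Pandoc type, the two transforms are invented for illustration, and applying them in list order is an assumption.

```haskell
import Data.Char (toUpper)

-- Toy document type; the real Pandoc type is much richer.
newtype Doc = Doc [String] deriving (Eq, Show)

type Transform = Doc -> Doc

-- Two made-up transforms, standing in for option-driven ones.
dropEmptyBlocks :: Transform
dropEmptyBlocks (Doc bs) = Doc (filter (not . null) bs)

upcase :: Transform
upcase (Doc bs) = Doc (map (map toUpper) bs)

-- Apply every transform to the document, in list order, with one fold.
applyTransforms :: [Transform] -> Doc -> Doc
applyTransforms ts doc = foldl (\d t -> t d) doc ts
```

Adding a new document-transforming option then amounts to consing one more function onto the list.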
The new rule: If the link target is an absolute URL, an external
link is created. Otherwise, a wikilink is created.
Examples:
1. [label](/foo/bar) => [[foo/bar|label]]
2. [label](foo) => [[foo|label]]
3. [label](http://gitit.net/foo) => [http://gitit.net/foo label]
Note on 1: We strip the leading / here, since otherwise we get a
link to Help:Links/foo/bar. Would it be better for 1 to become
[http://{SERVERNAME}/foo/bar label]? Perhaps, since this would
guarantee the same link destination as you'd get if you used the
HTML writer directly.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1904 788f1e2b-df1e-0410-8736-df70ead52e1b
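
The link rule from the commit above can be sketched as follows. mediaWikiLink and isAbsoluteURL are hypothetical names, and the prefix-based absolute-URL test is a crude assumption made to keep the example dependency-free.

```haskell
import Data.List (isPrefixOf)

-- Crude absolute-URL check, an assumption for this sketch.
isAbsoluteURL :: String -> Bool
isAbsoluteURL url =
  any (`isPrefixOf` url) ["http://", "https://", "ftp://", "mailto:"]

-- Absolute URL: external link. Anything else: wikilink, with a
-- leading slash stripped as described in the note on example 1.
mediaWikiLink :: String -> String -> String
mediaWikiLink label target
  | isAbsoluteURL target = "[" ++ target ++ " " ++ label ++ "]"
  | otherwise            = "[[" ++ stripSlash target ++ "|" ++ label ++ "]]"
  where
    stripSlash ('/':rest) = rest
    stripSlash other      = other
```

The three examples listed in the commit message come out as written there.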
Also, DON'T put image in figure (as was done previously)
when it's an inline image.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1893 788f1e2b-df1e-0410-8736-df70ead52e1b
Inverse bird tracks (<) are used for haskell example code that is not
part of the literate Haskell program.
Resolves Issue #211.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1888 788f1e2b-df1e-0410-8736-df70ead52e1b
* These options now imply -s; previously they worked also
in fragment mode.
* Users can now adjust position of include-before and
include-after text in the templates.
* Default position of include-before moved back (as it
originally was) before table of contents.
* Resolves Issue #217.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1883 788f1e2b-df1e-0410-8736-df70ead52e1b
Previously the markdown writer printed raw citation codes, e.g.
[geach1970], rather than the expanded citations provided by citeproc,
e.g. (Geach 1970). Now it prints the expanded citations. This means
that the document produced can be processed as a markdown document
without citeproc. Thanks to dsanson for reporting, and arossato
for the patch.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1871 788f1e2b-df1e-0410-8736-df70ead52e1b
In this case, the widths must be in the first table row.
In the process, simplified table generation code.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1864 788f1e2b-df1e-0410-8736-df70ead52e1b
Based on a patch by Justin Bogner.
Titles may span multiple lines, provided continuation lines
begin with a space character.
Separate authors may be put on multiple lines, provided
each line after the first begins with a space character.
Each author must fit on one line. Multiple authors on
a single line may still be separated by a semicolon.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1854 788f1e2b-df1e-0410-8736-df70ead52e1b
+ Table cells can now contain multiple block elements, such
as lists or paragraphs.
+ Table parser is now forgiving of spaces at ends of lines.
+ Added test cases.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1852 788f1e2b-df1e-0410-8736-df70ead52e1b
Markdown.pl doesn't URI-escape anything, so we won't do that either,
except for spaces, which can cause problems if not escaped.
Resolves Issue #220 and partially reverts r1847.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1851 788f1e2b-df1e-0410-8736-df70ead52e1b
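
A minimal sketch of the spaces-only policy from the commit above. escapeSpaces is a hypothetical name, and writing the escaped space as %20 is an assumption.

```haskell
-- Escape spaces and nothing else; existing escapes and other
-- special characters pass through untouched.
escapeSpaces :: String -> String
escapeSpaces = concatMap esc
  where
    esc ' ' = "%20"
    esc c   = [c]
```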
This is too English-centric. Writers can provide their own
header at the end of the document.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1850 788f1e2b-df1e-0410-8736-df70ead52e1b
= head = is now level 1, == head == level 2, etc.
This seems to be correct; it's only by convention that
Wikipedia articles have level 2 headers at most.
Patch due to Eric Kow.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1849 788f1e2b-df1e-0410-8736-df70ead52e1b
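
The level mapping from the commit above, sketched with a hypothetical helper (mediaWikiHeader is not a function from the writer):

```haskell
-- A level-n header is wrapped in n '=' signs on each side.
mediaWikiHeader :: Int -> String -> String
mediaWikiHeader level text = marks ++ " " ++ text ++ " " ++ marks
  where marks = replicate level '='
```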
+ Resolves Issue #220.
+ Added escapeURI function to Markdown reader. This escapes
links in a way that makes sense for markdown. If they've
used URI escapes like %20 in their link, these will be preserved.
But if they've used a special character or space without escaping
it, it will be escaped. This should make sense in most cases.
+ Previously pandoc collapsed adjacent spaces and replaced these
sequences of spaces with + characters. That isn't correct for
a URI path (+ is to be used only in the query part). We've also
removed the space-collapsing behavior.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1847 788f1e2b-df1e-0410-8736-df70ead52e1b
If getAppUserDataDirectory raises an error, just use
the default data files.
Previously pandoc *assumed* HOME was set and would error out
if not.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1842 788f1e2b-df1e-0410-8736-df70ead52e1b
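
The fallback described in the commit above might look roughly like this. userDataDir is a hypothetical wrapper, and returning Nothing is used here to mean "use the default data files"; the actual code may be organized differently.

```haskell
import Control.Exception (IOException, try)
import System.Directory (getAppUserDataDirectory)

-- Look up the user data directory for "pandoc". If the lookup raises
-- an IO error (for example because HOME is unset), return Nothing so
-- the caller can fall back to the default data files.
userDataDir :: IO (Maybe FilePath)
userDataDir = do
  result <- try (getAppUserDataDirectory "pandoc")
              :: IO (Either IOException FilePath)
  return (either (const Nothing) Just result)
```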
The following is not valid xhtml, but the intent is clear:
<ol>
<li>one</li>
<ol><li>sub</li></ol>
<li>two</li>
</ol>
We'll treat the <ol> as if it's in a <li>.
Resolves Issue #215.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1836 788f1e2b-df1e-0410-8736-df70ead52e1b
html2markdown is no longer needed, since you can pass URI arguments
to pandoc and directly convert web pages. (Note, however, that pandoc
assumes the pages are UTF8. html2markdown made an attempt to guess the
encoding and convert them.)
hsmarkdown is pointless -- a large executable that could be replaced
by 'pandoc --strict'.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1834 788f1e2b-df1e-0410-8736-df70ead52e1b
Otherwise "E. coli" starts a list. This might change the semantics
of some existing documents, since previously the two-space requirement
was only enforced when the second word started with a capital letter.
But it is consistent with the existing documentation and follows the
principle of least surprise.
Resolves Issue #212.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1829 788f1e2b-df1e-0410-8736-df70ead52e1b
+ Adds dependency on HTTP.
+ If a parameter is an absolute URI, pandoc will try to
get the content via HTTP.
+ So, you can do: pandoc -r html -w markdown http://www.fsf.org
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1826 788f1e2b-df1e-0410-8736-df70ead52e1b
+ Incorporated idea (from HXT) that an element can be closed
by an open tag for another element.
+ Javascript is partially parsed to make sure that a <script>
section is not closed by a </script> in a comment or string.
+ More lenient non-quoted attribute values.
Now we accept anything but a space character, quote, or <>.
This helps in parsing e.g. www.google.com!
+ Bare & signs are now parsed as a string. This is a common
HTML mistake.
+ Skip a bare < in malformed HTML.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1825 788f1e2b-df1e-0410-8736-df70ead52e1b
This is more uniform, and calling libraries can always disable
searching of user directories for overrides.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1821 788f1e2b-df1e-0410-8736-df70ead52e1b
+ This specifies a user data directory. If not specified, will default
to ~/.pandoc on unix or Application Data\pandoc on Windows.
Files placed in the user data directory will override system default
data files.
+ Added datadir parameter to readDataFile, saveOpenDocumentAsODT,
latexMathMLScript, s5HeaderIncludes, and getTemplate. Removed
getDefaultTemplate.
+ Updated documentation.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1809 788f1e2b-df1e-0410-8736-df70ead52e1b
This allows the caller to select whether to allow user overrides
from the user data directory (~/.pandoc).
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1803 788f1e2b-df1e-0410-8736-df70ead52e1b
This allows the user to customize the styles used in pandoc-generated
ODTs. The user may also put a default reference.odt in the ~/.pandoc
directory.
We have removed the old data/odt directory and replaced it with a
reference.odt.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1760 788f1e2b-df1e-0410-8736-df70ead52e1b
Packages will be included only if they are needed, given what
is in the document. So if you never use strikeout, you don't
need to install the ulem package.
Also moved amsmath to the top of the package list, made
\maketitle conditional on a title being present, and
adjusted spacing.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1738 788f1e2b-df1e-0410-8736-df70ead52e1b
If --xetex is specified, pandoc produces latex suitable for
processing by xelatex, and markdown2pdf uses xelatex to create
the PDF. Resolves Issue #185.
This seems better than using latex packages to detect xetex,
since not all latex installations will have these.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1737 788f1e2b-df1e-0410-8736-df70ead52e1b
Also changed treatment of multiple authors: they now occupy
multiple paragraphs rather than using a line break.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1734 788f1e2b-df1e-0410-8736-df70ead52e1b
Note that now the "--after-body" will come after the "AUTHORS"
section, whereas before it would come before it. This is a
slight break from backwards compatibility.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1733 788f1e2b-df1e-0410-8736-df70ead52e1b
The tags are so long that it's pointless.
Use <> instead of $$ to prevent huge indents.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1730 788f1e2b-df1e-0410-8736-df70ead52e1b
Put variables in right order. We've specified that if they
use -A, -B, -H multiple times, the text appears in the same
order as on the command line.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1722 788f1e2b-df1e-0410-8736-df70ead52e1b
Use stHasMath instead of stIncludes.
This gives the user more control over how the math
directive is defined.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1715 788f1e2b-df1e-0410-8736-df70ead52e1b
Instead, require that these be flush left in multiline
conditionals.
Also, swallow empty space after keywords in multiline conditionals.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1709 788f1e2b-df1e-0410-8736-df70ead52e1b
S5 CSS and JS are included using the header-includes variable.
We don't need a separate s5 template, so it has been
removed.
Use linebreak to separate authors in S5 title page.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@1705 788f1e2b-df1e-0410-8736-df70ead52e1b