% Pandoc
% John MacFarlane
% December 29, 2006

Pandoc is a [Haskell] library for converting from one markup format
to another, and a command-line tool that uses this library. It can read
[markdown] and (subsets of) [reStructuredText], [HTML], and [LaTeX],
and it can write [markdown], [reStructuredText], [HTML], [LaTeX], [RTF],
[DocBook XML], and [S5] HTML slide shows. Pandoc's version of markdown
contains some enhancements, like footnotes and embedded LaTeX.

In contrast to existing tools for converting markdown to HTML, which
use regex substitutions, Pandoc has a modular design: it consists of a
set of readers, which parse text in a given format and produce a native
representation of the document, and a set of writers, which convert
this native representation into a target format. Thus, adding an input
or output format requires only adding a reader or writer.

[markdown]: http://daringfireball.net/projects/markdown/
[reStructuredText]: http://docutils.sourceforge.net/docs/ref/rst/introduction.html
[S5]: http://meyerweb.com/eric/tools/s5/
[HTML]: http://www.w3.org/TR/html40/
[LaTeX]: http://www.latex-project.org/
[RTF]: http://en.wikipedia.org/wiki/Rich_Text_Format
[DocBook XML]: http://www.docbook.org/
[Haskell]: http://www.haskell.org/

(c) 2006 John MacFarlane (jgm at berkeley dot edu). Released under the
[GPL], version 2 or greater. This software carries no warranty of
any kind. (See COPYRIGHT for full copyright and warranty notices.)
Recai Oktaş (roktas at debian dot org) deserves credit for the build
system, the Debian package, and the robust wrapper scripts.

[GPL]: http://www.gnu.org/copyleft/gpl.html "GNU General Public License"

Requirements
============

The `pandoc` program itself does not depend on any external libraries
or programs.

The wrapper script `html2markdown` requires

- `pandoc` (which must be in the PATH)
- a POSIX-compliant shell (installed by default on all Linux and Unix
  systems, including Mac OS X, and in [Cygwin] for Windows)
- [HTML Tidy]
- `iconv` (for character encoding conversion). (If `iconv` is absent,
  `html2markdown` will still work, but it will treat everything as UTF-8.)

The wrapper script `markdown2pdf` requires

- `pandoc` (which must be in the PATH)
- a POSIX-compliant shell
- `pdflatex`, which should be part of any [LaTeX] distribution
- the [unicode] and [fancyvrb] LaTeX packages, which are included
  in many LaTeX distributions.[^1] If your installation of LaTeX
  does not include these packages, you will get an error (complaining
  about missing `ucs.sty` or `fancyvrb.sty`) when you try to compile
  a LaTeX file produced by Pandoc, or when you use the `markdown2pdf`
  script (described below). If this happens, install the [unicode] and
  [fancyvrb] packages from [CTAN]. (Get the zip file from CTAN
  and unpack it into `~/texmf/tex/latex/`. You may also need to run
  `mktexlsr` or `texhash` before the files can be found by TeX.)

The wrapper script `hsmarkdown` requires only `pandoc` (which must be
in the PATH) and a POSIX-compliant shell.

[Cygwin]: http://www.cygwin.com/
[HTML Tidy]: http://tidy.sourceforge.net/
[`iconv`]: http://www.gnu.org/software/libiconv/
[CTAN]: http://www.ctan.org "Comprehensive TeX Archive Network"
[unicode]: http://www.ctan.org/tex-archive/macros/latex/contrib/unicode/
[fancyvrb]: http://www.ctan.org/tex-archive/macros/latex/contrib/fancyvrb/

[^1]: The [unicode] package allows LaTeX to process UTF-8 characters.
    [fancyvrb] allows code blocks and verbatim text to be used within
    footnotes.

Using Pandoc
============

If you run `pandoc` without arguments, it will accept input from
STDIN. If you run it with file names as arguments, it will take input
from those files. By default, `pandoc` writes its output to STDOUT.
If you want to write to a file, use the `-o` option:

    pandoc -o hello.html hello.txt

Note that you can specify multiple input files on the command line.
`pandoc` will concatenate them all (with blank lines between them)
before parsing:

    pandoc -s chapter1.txt chapter2.txt references.txt > book.html

(The `-s` option here tells `pandoc` to produce a standalone HTML file,
with a proper header, rather than a fragment. For more details on this
and many other command-line options, see below.)

The format of the input and output can be specified explicitly using
command-line options. The input format can be specified using the
`-r/--read` or `-f/--from` options, the output format using the
`-w/--write` or `-t/--to` options. Thus, to convert `hello.txt` from
markdown to LaTeX, you could type:

    pandoc -f markdown -t latex hello.txt

To convert `hello.html` from HTML to markdown:

    pandoc -f html -t markdown hello.html

Supported output formats include `markdown`, `latex`, `html`, `rtf`
(rich text format), `rst` (reStructuredText), `docbook` (DocBook
XML), and `s5` (which produces an HTML file that works as a slide
show, like PowerPoint).
Supported input formats include `markdown`, `html`, `latex`, and `rst`.
Note that the `rst` reader only parses a subset of reStructuredText
syntax. For example, it doesn't handle tables, definition lists, option
lists, or footnotes. It handles only the constructs expressible in
unextended markdown. But for simple documents it should be adequate.
The `latex` and `html` readers are also limited in what they can do.
Because the `html` reader is picky about the HTML it parses, it is
recommended that you pipe HTML through [HTML Tidy] before sending it to
`pandoc`, or use the `html2markdown` script described below.

If you don't specify a reader or writer explicitly, `pandoc` will
try to determine the input and output format from the extensions of
the input and output filenames. Thus, for example,

    pandoc -o hello.tex hello.txt

will convert `hello.txt` from markdown to LaTeX. If no output file
is specified (so that output goes to STDOUT), or if the output file's
extension is unknown, the output format will default to HTML.
If no input file is specified (so that input comes from STDIN), or
if the input files' extensions are unknown, the input format will
be assumed to be markdown unless explicitly specified.

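The fallback rules above can be sketched as two small shell functions.
This is only an illustration, not Pandoc's own code: the function names
are hypothetical, and apart from `.tex` and the two defaults, every
extension mapping below is an assumption.

```shell
# Sketch of the documented fallback rules. Only ".tex -> latex" and the two
# defaults come from the text above; every other mapping is an assumption.

# Guess the output format from an output file name ("-" stands for STDOUT).
guess_output_format() {
    case "$1" in
        *.tex)  echo latex ;;
        *.rtf)  echo rtf ;;       # assumed mapping
        *.rst)  echo rst ;;       # assumed mapping
        *)      echo html ;;      # STDOUT or unknown extension
    esac
}

# Guess the input format from an input file name.
guess_input_format() {
    case "$1" in
        *.html|*.htm)  echo html ;;      # assumed mapping
        *.tex)         echo latex ;;     # assumed mapping
        *.rst)         echo rst ;;       # assumed mapping
        *)             echo markdown ;;  # STDIN or unknown extension
    esac
}

guess_output_format hello.tex   # prints "latex"
guess_input_format hello.txt    # prints "markdown"
```

A wrapper could pass the results to `-t` and `-f` to make the choice
explicit instead of relying on detection.
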
Character encodings
-------------------

Unfortunately, due to limitations in GHC, `pandoc` does not automatically
detect the system's local character encoding. Hence, all input and
output are assumed to be in the UTF-8 encoding. If your local character
encoding is not UTF-8 and you use non-ASCII characters,
you should pipe the input and output through [`iconv`]. For example,

    iconv -t utf-8 source.txt | pandoc | iconv -f utf-8 > output.html

will convert `source.txt` from the local encoding to UTF-8, then
convert it to HTML, then convert back to the local encoding,
putting the output in `output.html`.

The shell scripts (described below) automatically convert the input
from the local encoding to UTF-8 before running it through `pandoc`,
then convert the output back to the local encoding.

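A minimal sketch of that round trip, assuming the encoding name is
already known to the caller. The function names are hypothetical and
this is not the wrapper scripts' actual code; the pass-through when
`iconv` is missing mirrors the behavior described for `html2markdown`.

```shell
# Convert STDIN from the encoding named in $1 to UTF-8. Without iconv the
# text passes through unchanged, i.e. it is treated as UTF-8 already.
to_utf8() {
    if command -v iconv >/dev/null 2>&1; then
        iconv -f "$1" -t UTF-8
    else
        cat
    fi
}

# Convert UTF-8 on STDIN back to the encoding named in $1.
from_utf8() {
    if command -v iconv >/dev/null 2>&1; then
        iconv -f UTF-8 -t "$1"
    else
        cat
    fi
}

# Intended use in a wrapper (not run here; needs pandoc in the PATH):
#   to_utf8 "$enc" < source.txt | pandoc | from_utf8 "$enc" > output.html
```
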
`markdown2pdf`, `html2markdown`, and `hsmarkdown`
=================================================

Three shell scripts, `markdown2pdf`, `html2markdown`, and `hsmarkdown`,
are included in the standard Pandoc installation. (They are not included
in the Windows binary package, as they require a POSIX shell, but they
may be used in Windows under Cygwin.)

1.  `markdown2pdf` produces a PDF file from markdown-formatted
    text, using `pandoc` and `pdflatex`. The default
    behavior of `markdown2pdf` is to create a file with the same
    base name as the first argument and the extension `pdf`; thus,
    for example,

        markdown2pdf sample.txt endnotes.txt

    will produce `sample.pdf`. (If `sample.pdf` exists already,
    it will be backed up before being overwritten.) An output file
    name can be specified explicitly using the `-o` option:

        markdown2pdf -o "My Book.pdf" chap1.txt chap2.txt chap3.txt

    If no input file is specified, input will be taken from STDIN.
    All of `pandoc`'s options will work with `markdown2pdf` as well.

2.  `html2markdown` grabs a web page from a file or URL and converts
    it to markdown-formatted text, using `tidy` and `pandoc`.

    All of `pandoc`'s options will work with `html2markdown` as well.
    In addition, the following special options may be used.
    The special options must be separated from the `html2markdown`
    command and any regular Pandoc options by the delimiter `--`:

        html2markdown -o out.txt -- -e latin1 -g curl google.com

    The `-e` or `--encoding` option specifies the character encoding
    of the HTML input. If this option is not specified, and input
    is not from STDIN, `html2markdown` will attempt to determine the
    page's character encoding from the "Content-type" meta tag.
    If this is not present, UTF-8 is assumed.

    The `-g` or `--grabber` option specifies the command to be used to
    fetch the contents of a URL:

        html2markdown -g 'curl --user foo:bar' www.mysite.com

    If this option is not specified, `html2markdown` searches for an
    available program (`wget`, `curl`, or a text-mode browser) to fetch
    the contents of a URL.

3.  `hsmarkdown` is designed to be used as a drop-in replacement for
    `Markdown.pl`. It forces `pandoc` to convert from markdown to
    HTML, and to use the `--strict` flag for maximal compliance with
    official markdown syntax. (All of Pandoc's syntax extensions and
    variants, described below, are disabled.) No other command-line
    options are allowed. (In fact, options will be interpreted as
    filenames.)

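The default output name that `markdown2pdf` derives from its first
argument can be sketched as follows. This is an illustration under
assumptions, not the script's actual code: `default_pdf_name` is a
hypothetical helper, and the handling of directories and of names
without an extension is a guess.

```shell
# Derive the default PDF name from the first input file: same base name,
# with the extension replaced by "pdf" (or appended when there is none).
default_pdf_name() {
    case "${1##*/}" in
        ?*.*)  printf '%s.pdf\n' "${1%.*}" ;;   # file name has an extension
        *)     printf '%s.pdf\n' "$1" ;;        # no extension to replace
    esac
}

default_pdf_name sample.txt   # prints "sample.pdf"
```
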
Command-line options
====================

Various command-line options can be used to customize the output.
For further documentation, see the `pandoc(1)` man page.

`-f`, `--from`, `-r`, or `--read` can be used to specify the input
format -- the format Pandoc will be converting *from*. Available
formats are `native`, `markdown`, `rst`, `html`, and `latex`.

`-t`, `--to`, `-w`, or `--write` can be used to specify the output
format -- the format Pandoc will be converting *to*. Available formats
are `native`, `html`, `s5`, `docbook`, `latex`, `markdown`, `rst`, and
`rtf`.

`-s` or `--standalone` indicates that a standalone document is to be
produced (with appropriate headers and footers), rather than a fragment.

`-o` or `--output` specifies the name of the output file. If this
option is not specified, or if its argument is `-`, output will be sent
to STDOUT.

`-p` or `--preserve-tabs` causes tabs in the source text to be
preserved, rather than converted to spaces (the default).

`--tabstop` allows the user to set the tab stop (which defaults to 4).

`--strict` specifies that strict markdown syntax is to be used, without
Pandoc's usual extensions and variants (described below).

`-R` or `--parse-raw` causes the HTML and LaTeX readers to parse HTML
codes and LaTeX environments that they can't translate as raw HTML or
LaTeX. Raw HTML can be printed in markdown, reStructuredText, HTML,
and S5 output; raw LaTeX can be printed in markdown, reStructuredText,
and LaTeX output. The default is for the readers to omit
untranslatable HTML codes and LaTeX environments. (The LaTeX reader
does pass through untranslatable LaTeX commands, even if `-R` is not
specified.)

`-C` or `--custom-header` can be used to specify a custom document
header. To see the headers used by default, use the `-D` option:
for example, `pandoc -D html` prints the default HTML header.

`-c` or `--css` allows the user to specify a custom stylesheet that
will be linked to in HTML and S5 output.

`-H` or `--include-in-header` specifies a file to be included
(verbatim) at the end of the document header. This can be used, for
example, to include special CSS or JavaScript in HTML documents.

`-B` or `--include-before-body` specifies a file to be included
(verbatim) at the beginning of the document body (after the `<body>`
tag in HTML, or the `\begin{document}` command in LaTeX). This can be
used to include navigation bars or banners in HTML documents.

`-A` or `--include-after-body` specifies a file to be included
(verbatim) at the end of the document body (before the `</body>` tag in
HTML, or the `\end{document}` command in LaTeX).

`-T` or `--title-prefix` specifies a string to be included as a prefix
at the beginning of the title that appears in the HTML header (but not
in the title as it appears at the beginning of the HTML body). (See
below on Titles.)

`-S` or `--smart` causes `pandoc` to produce typographically
correct output, along the lines of John Gruber's [Smartypants].
Straight quotes are converted to curly quotes, `---` to dashes, and
`...` to ellipses. (Note: This option is only significant when
the input format is `markdown`. It is selected automatically
when the output format is `latex`.)

[Smartypants]: http://daringfireball.net/projects/smartypants/

`-m` or `--asciimathml` will cause LaTeX formulas (between `$` signs) in
HTML or S5 to display as formulas rather than as code. The trick will
not work in all browsers, but it works in Firefox. Peter Jipsen's
[ASCIIMathML] script is used to do the magic.

[ASCIIMathML]: http://www1.chapman.edu/~jipsen/mathml/asciimath.html

`-i` or `--incremental` causes all lists in S5 output to be displayed
incrementally by default (one item at a time). The normal default
is for lists to be displayed all at once.

`-N` or `--number-sections` causes sections to be numbered in LaTeX
output. By default, sections are not numbered.

`--dump-args` is intended to make it easier to create wrapper scripts
that use Pandoc. It causes Pandoc to dump information about the arguments
with which it was called to STDOUT, then exit. The first line printed
is the name of the output file specified using the `-o` or `--output`
option, or `-` if output would go to STDOUT. The remaining lines, if any,
list command-line arguments. These will include the names of input
files and any special options passed after ` -- ` on the command line.
So, for example,

    pandoc --dump-args -o foo.html -s foo.txt appendix.txt -- -e latin1

will cause the following to be printed to STDOUT:

    foo.html
    foo.txt
    appendix.txt
    -e
    latin1

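A wrapper script might split that output as sketched below. The helper
names are hypothetical and this is not how the bundled wrappers are
necessarily written; the sample text is the output shown in the example
above, so the sketch runs without `pandoc`.

```shell
# Output file name: the first line of the --dump-args output.
dump_output_file() {
    printf '%s\n' "$1" | sed -n '1p'
}

# Remaining arguments, one per line: everything after the first line.
dump_arguments() {
    printf '%s\n' "$1" | sed '1d'
}

# In a real wrapper the text would come from: dump=$(pandoc --dump-args "$@")
# Here it is the sample output from the example.
dump='foo.html
foo.txt
appendix.txt
-e
latin1'

dump_output_file "$dump"   # prints "foo.html"
dump_arguments "$dump"     # prints the four remaining lines
```

Reading the arguments line by line keeps the sketch simple, at the cost
of mishandling arguments that themselves contain newlines.
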
`--ignore-args` causes Pandoc to ignore all (non-option) command-line
arguments, including any special options that follow `--`.
Regular Pandoc options are not ignored. Thus, for example,

    pandoc --ignore-args -o foo.html -s foo.txt -- -e latin1

is equivalent to

    pandoc -o foo.html -s

`-v` or `--version` prints the version number to STDERR.

`-h` or `--help` prints a usage message to STDERR.

Pandoc's markdown vs. standard markdown
=======================================

In parsing markdown, Pandoc departs from and extends [standard markdown]
in a few respects. (To run Pandoc on the official markdown test suite,
type `make test-markdown`.) Except where noted, these differences can
be suppressed by specifying the `--strict` command-line option or by
using the `hsmarkdown` wrapper.

[standard markdown]: http://daringfireball.net/projects/markdown/syntax
  "Markdown syntax description"

Backslash escapes
-----------------

Except inside a code block or inline code, any punctuation or space
character preceded by a backslash will be treated literally, even if it
would normally indicate formatting. Thus, for example, if one writes

    *\*hello\**

one will get

    <em>*hello*</em>

instead of

    <strong>hello</strong>

This rule is easier to remember than standard markdown's rule,
which allows only the following characters to be backslash-escaped:

    \`*_{}[]()>#+-.!

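To see which backslashes count as escapes under this rule, a rough
`sed` sketch follows. It only removes the backslash from each escaped
punctuation or space character; it does not shield the character from
later formatting the way a real parser must, and it does not skip code
blocks. The function name is hypothetical.

```shell
# Drop the backslash from every backslash-escaped punctuation or space
# character; backslashes before other characters (letters, digits) stay.
strip_backslash_escapes() {
    sed 's/\\\([[:punct:] ]\)/\1/g'
}

printf '%s\n' '\*hello\* and C:\temp' | strip_backslash_escapes
```
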
Lists
-----

Pandoc behaves differently from standard markdown on some "edge
cases" involving lists. Consider this source:

    1. First
    2. Second:
        - Fee
        - Fie
        - Foe

    3. Third

Pandoc transforms this into a "compact list" (with no `<p>` tags
around "First", "Second", or "Third"), while markdown puts `<p>`
tags around "Second" and "Third" (but not "First"), because of
the blank space around "Third". Pandoc follows a simple rule:
if the text is followed by a blank line, it is treated as a
paragraph. Since "Second" is followed by a list, and not a blank
line, it isn't treated as a paragraph. The fact that the list
is followed by a blank line is irrelevant. (Note: Pandoc works
this way even when the `--strict` option is specified. This
behavior is consistent with the official markdown syntax
description, even though it is different from that of `Markdown.pl`.)

Unlike standard markdown, Pandoc allows ordered list items to be
marked with single letters, instead of numbers. So, for example,
this source yields a nested ordered list:

    1. First
    2. Second
        a. Fee
        b. Fie
    3. Third

Pandoc also extends standard markdown in allowing list item markers
to be terminated by ')':

    1) First
    2) Second
        A) Fee
        B) Fie
    3) Third

Note that Pandoc pays no attention to the *type* of ordered list
item marker used. Thus, the following is treated just the same as
the example above:

    A) First
    1. Second
        2. Fee
        B) Fie
    C) Third

Reference links
---------------

Pandoc allows implicit reference links with just a single set of
brackets. So, the following links are equivalent:

    1. Here's my [link]
    2. Here's my [link][]

    [link]: linky.com

(Note: Pandoc works this way even if `--strict` is specified, because
`Markdown.pl` 1.0.2b7 allows single-bracket links.)

Footnotes
---------

Pandoc's markdown allows footnotes, using the following syntax:

    Here is a footnote reference,[^1] and another.[^longnote]

    [^1]: Here is the footnote. It can go anywhere in the document,
    except in embedded contexts like block quotes or lists.

    [^longnote]: Here's the other note. This one contains multiple
    blocks.

        Subsequent paragraphs are indented to show that they belong to
    the previous footnote.

            { some.code }

        The whole paragraph can be indented, or just the first line.
        In this way, multi-paragraph footnotes work just like
        multi-paragraph list items in markdown.

    This paragraph won't be part of the note.

The identifiers in footnote references may not contain spaces, tabs,
or newlines. These identifiers are used only to correlate the
footnote reference with the note itself; in the output, footnotes
will be numbered sequentially.

Inline footnotes are also allowed (though, unlike regular notes,
they cannot contain multiple paragraphs). The syntax is as follows:

    Here is an inline note.^[Inline notes are easier to write, since
    you don't have to pick an identifier and move down to type the
    note.]

Inline and regular footnotes may be mixed freely.

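The relation between identifiers and output numbers can be illustrated
with a small `awk` sketch. It assumes that numbers are assigned in order
of first reference, which the text above does not spell out. It also
ignores inline notes and recognizes only identifiers made of letters,
digits, `_`, and `-`. The function name is hypothetical.

```shell
# Print "number<TAB>identifier" for each footnote identifier, numbered in
# order of first appearance in the text read from STDIN.
number_footnotes() {
    awk '{
        line = $0
        while (match(line, /\[\^[A-Za-z0-9_-]+\]/)) {
            # Strip the leading "[^" and the trailing "]".
            id = substr(line, RSTART + 2, RLENGTH - 3)
            if (!(id in seen)) {
                seen[id] = ++count
                print count "\t" id
            }
            line = substr(line, RSTART + RLENGTH)
        }
    }'
}

printf '%s\n' 'First.[^longnote] Second.[^1]' '[^1]: Note text.' | number_footnotes
```
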
Tables
------

Two kinds of tables may be used. Both kinds presuppose the use of
a fixed-width font, such as Courier. Currently only the HTML,
DocBook, and LaTeX writers support tables.

Simple tables look like this:

      Right     Left     Center     Default
    -------     ------ ----------   -------
         12     12        12            12
        123     123       123          123
          1     1          1             1

    Table: Demonstration of simple table syntax.

The headers and table rows must each fit on one line. Column
alignments are determined by the position of the header text relative
to the dashed line below it:[^2]

- If the dashed line is flush with the header text on the right side
  but extends beyond it on the left, the column is right-aligned.
- If the dashed line is flush with the header text on the left side
  but extends beyond it on the right, the column is left-aligned.
- If the dashed line extends beyond the header text on both sides,
  the column is centered.
- If the dashed line is flush with the header text on both sides,
  the default alignment is used (in most cases, this will be left).

[^2]: This scheme is due to Michel Fortin, who proposed it on the
    Markdown discussion list: <http://six.pairlist.net/pipermail/markdown-discuss/2005-March/001097.html>

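The four alignment rules can be checked mechanically. The sketch below
is an illustration in shell and `awk`, not Pandoc's parser: the function
name and the alignment labels are chosen for this example, and it
assumes that neither line contains backslashes (which `awk -v` would
interpret).

```shell
# Print one alignment per column, given the header line ($1) and the
# dashed line ($2) of a simple table.
column_alignments() {
    awk -v header="$1" -v dashes="$2" 'BEGIN {
        n = length(dashes)
        i = 1
        while (i <= n) {
            # Skip ahead to the start of the next run of dashes.
            if (substr(dashes, i, 1) != "-") { i++; continue }
            start = i
            while (i <= n && substr(dashes, i, 1) == "-") i++
            end = i - 1
            # The dashed line is flush with the header text on a side when
            # the header has a non-space character at that edge.
            left  = (substr(header, start, 1) ~ /[^ ]/)
            right = (substr(header, end, 1) ~ /[^ ]/)
            if (left && right)  print "default"
            else if (right)     print "right"
            else if (left)      print "left"
            else                print "center"
        }
    }'
}

column_alignments '  Right     Left     Center     Default' \
                  '-------     ------ ----------   -------'
```
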
The table must end with a blank line. Optionally, a caption may be
provided (as illustrated in the example above). A caption is a paragraph
beginning with the string `Table:`, which will be stripped off.

The table parser pays attention to the widths of the columns, and
the writers try to reproduce these relative widths in the output.
So, if you find that one of the columns is too narrow in the output,
try widening it in the markdown source.

Multiline tables allow headers and table rows to span multiple lines
of text. Here is an example:

    ---------------------------------------------------------------
     Centered   Left             Right
      Header    Aligned        Aligned  Default aligned
    ----------  ---------  -----------  ---------------------------
       First    row               12.0  Example of a row that spans
                                        multiple lines.

      Second    row                5.0  Here's another one. Note
                                        the blank line between rows.
    ---------------------------------------------------------------

    Table: Optional caption. This, too, may span multiple
    lines.

These work like simple tables, but with the following differences:

- They must begin with a row of dashes, before the header text.
- They must end with a row of dashes, then a blank line.
- The rows must be separated by blank lines.

Embedded HTML
-------------

Pandoc treats embedded HTML in markdown a bit differently than
Markdown 1.0. While Markdown 1.0 leaves HTML blocks exactly as they
are, Pandoc treats text between HTML tags as markdown. Thus, for
example, Pandoc will turn

    <table>
      <tr>
        <td>*one*</td>
        <td>[a link](http://google.com)</td>
      </tr>
    </table>

into

    <table>
      <tr>
        <td><em>one</em></td>
        <td><a href="http://google.com">a link</a></td>
      </tr>
    </table>

whereas `Markdown.pl` will preserve it as is.

There is one exception to this rule: text between `<script>` and
`</script>` tags is not interpreted as markdown.

This departure from standard markdown should make it easier to mix
markdown with HTML block elements. For example, one can surround
a block of markdown text with `<div>` tags without preventing it
from being interpreted as markdown.

Title blocks
------------

If the file begins with a title block

    % title
    % author(s) (separated by commas)
    % date

it will be parsed as bibliographic information, not regular text. (It
will be used, for example, in the title of standalone LaTeX or HTML
output.) The block may contain just a title, a title and an author,
or all three lines. Each must begin with a `%` and fit on one line.
The title may contain standard inline formatting. If you want to
include an author but no title, or a title and a date but no author,
you need a blank line (a `%` with nothing after it):

    % My title
    %
    % June 15, 2006

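As a rough illustration of how such a block maps onto title, authors, and date, here is a small Python sketch. It is not Pandoc's implementation; the function name `parse_title_block` and the dictionary layout are assumptions made for this example, and inline formatting in the title is left untouched.

```python
def parse_title_block(text: str) -> dict:
    """Read up to three leading '%' lines as title, authors, and date."""
    fields = []
    for line in text.splitlines()[:3]:
        if not line.startswith("%"):
            break  # the title block ends at the first non-'%' line
        fields.append(line[1:].strip())
    # Missing trailing lines (e.g. no date) are treated as empty.
    fields += [""] * (3 - len(fields))
    title, authors, date = fields
    return {
        "title": title,
        # Authors are separated by commas; an empty '%' line gives no authors.
        "authors": [a.strip() for a in authors.split(",") if a.strip()],
        "date": date,
    }


print(parse_title_block("% My title\n%\n% June 15, 2006\n\nBody text."))
# {'title': 'My title', 'authors': [], 'date': 'June 15, 2006'}
```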
Titles will be written only when the `--standalone` (`-s`) option is
chosen. In HTML output, titles will appear twice: once in the
document head -- this is the title that will appear at the top of the
window in a browser -- and once at the beginning of the document body.
The title in the document head can have an optional prefix attached
(`--title-prefix` or `-T` option). The title in the body appears as
an H1 element with class "title", so it can be suppressed or
reformatted with CSS.

If a title prefix is specified with `-T` and no title block appears
in the document, the title prefix will be used by itself as the
HTML title.

Box-style blockquotes
---------------------

Pandoc supports emacs-style boxquote block quotes, in addition to
standard markdown (email-style) block quotes:

    ,----
    | They look like this.
    `----

Blank lines before headers and blockquotes
------------------------------------------

Standard markdown syntax does not require a blank line before a header
or blockquote. Pandoc does require this (except, of course, at the
beginning of the document). The reason for the requirement is that
it is all too easy for a `>` or `#` to end up at the beginning of a
line by accident (perhaps through line wrapping). Consider, for
example:

    I like several of their flavors of ice cream: #22, for example, and
    #5.

Here the `#5.` that begins the second line is not preceded by a blank
line, so Pandoc does not treat it as a header.

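The blank-line requirement can be sketched as a simple check in Python. This is a simplified illustration, not Pandoc's parser: it looks only at `#`-style headers, and the function name `header_lines` is hypothetical.

```python
def header_lines(text: str) -> list[int]:
    """Return 0-based indices of lines treated as headers under the rule."""
    lines = text.splitlines()
    found = []
    for i, line in enumerate(lines):
        starts_like_header = line.startswith("#")
        # A header must open the document or follow a blank line.
        preceded_by_blank = i == 0 or lines[i - 1].strip() == ""
        if starts_like_header and preceded_by_blank:
            found.append(i)
    return found


wrapped = "I like several of their flavors of ice cream: #22, for example, and\n#5."
print(header_lines(wrapped))                     # []
print(header_lines("Intro.\n\n# Real header"))   # [2]
```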
Inline LaTeX
------------

Anything between two `$` characters will be parsed as LaTeX math. The
opening `$` must have a character immediately to its right, while the
closing `$` must have a character immediately to its left. Thus,
`$20,000 and $30,000` won't parse as math. The `$` character can be
escaped with a backslash if needed.

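The delimiter rule can be approximated with a regular expression. The Python sketch below is only an approximation of the rule as described (it reads "a character" as "a non-space character" and does not span line breaks); the names `MATH` and `find_math` are invented for this example and do not come from Pandoc.

```python
import re

# Opening $: not escaped, followed by a non-space character.
# Closing $: not escaped, preceded by a non-space character.
MATH = re.compile(r"(?<!\\)\$(?=\S)(.+?)(?<=\S)(?<!\\)\$")


def find_math(text: str) -> list[str]:
    """Return the contents of every inline math span found in text."""
    return MATH.findall(text)


print(find_math("$20,000 and $30,000"))           # []
print(find_math("Energy is $E = mc^2$."))         # ['E = mc^2']
print(find_math(r"Costs \$5, but $x$ is free"))   # ['x']
```

In the first call the second `$` has a space to its left, so it cannot close a math span, and the dollar amounts are left alone.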
If you pass the `-m` (`--asciimathml`) option to `pandoc`, it will
include the [ASCIIMathML] script in the resulting HTML. This will
cause LaTeX math to be displayed as formulas in browsers that
support it.

[ASCIIMathML]: http://www1.chapman.edu/~jipsen/asciimath.html

Inline LaTeX commands will also be preserved and passed unchanged
to the LaTeX writer. Thus, for example, you can use LaTeX to
include BibTeX citations:

    This result was proved in \cite{jones.1967}.

You can also use LaTeX environments. For example,

    \begin{tabular}{|l|l|}\hline
    Age & Frequency \\ \hline
    18--25 & 15 \\
    26--35 & 33 \\
    36--45 & 22 \\ \hline
    \end{tabular}

Note, however, that material between the begin and end tags will
be interpreted as raw LaTeX, not as markdown.

Custom headers
==============

When run with the "standalone" option (`-s`), `pandoc` creates a
standalone file, complete with an appropriate header. To see the
default headers used for HTML and LaTeX, use the following commands:

    pandoc -D html

    pandoc -D latex

If you want to use a different header, just create a file containing
it and specify it on the command line as follows:

    pandoc --header=MyHeaderFile

Producing S5 with Pandoc
========================

Producing an [S5] web-based slide show with Pandoc is easy. A title
page is constructed automatically from the document's title block (see
above). Each section (with a level-one header) produces a single slide.
(Note that if the section is too big, the slide will not fit on the page;
S5 is not smart enough to produce multiple pages.)

Here's the markdown source for a simple slide show, `eating.txt`:

    % Eating Habits
    % John Doe
    % March 22, 2005

    # In the morning

    - Eat eggs
    - Drink coffee

    # In the evening

    - Eat spaghetti
    - Drink wine

To produce the slide show, simply type

    pandoc -w s5 -s eating.txt > eating.html

and open up `eating.html` in a browser. The HTML file embeds
all the required JavaScript and CSS, so no other files are necessary.

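To make the "one level-one section, one slide" rule concrete, here is a toy Python sketch that groups the lines of a document like `eating.txt` by their `#` headers. It is an illustration only: the function name `split_slides` is hypothetical, only `# `-style headers are recognized, and the real S5 writer works on Pandoc's parsed document rather than on raw lines.

```python
def split_slides(markdown: str) -> list[tuple[str, list[str]]]:
    """Group non-blank lines under the level-one header that precedes them."""
    slides: list[tuple[str, list[str]]] = []
    for line in markdown.splitlines():
        if line.startswith("# "):
            # Each level-one header starts a new slide.
            slides.append((line[2:].strip(), []))
        elif slides and line.strip():
            # Lines before the first header (the title block) are skipped.
            slides[-1][1].append(line.strip())
    return slides


source = """% Eating Habits
% John Doe
% March 22, 2005

# In the morning

- Eat eggs
- Drink coffee

# In the evening

- Eat spaghetti
- Drink wine
"""

for title, body in split_slides(source):
    print(title, body)
# In the morning ['- Eat eggs', '- Drink coffee']
# In the evening ['- Eat spaghetti', '- Drink wine']
```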
Note that by default, the S5 writer produces lists that display
"all at once." If you want your lists to display incrementally
(one item at a time), use the `-i` option. If you want a
particular list to depart from the default (that is, to display
incrementally without the `-i` option and all at once with the
`-i` option), put it in a block quote:

    > - Eat spaghetti
    > - Drink wine

In this way incremental and nonincremental lists can be mixed in
a single document.

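The interaction between the `-i` option and the block quote amounts to a simple toggle, sketched below in Python. The function and parameter names are made up for this example; they are not part of Pandoc.

```python
def displays_incrementally(incremental_flag: bool, in_block_quote: bool) -> bool:
    """A block quote flips whatever default the -i option sets."""
    return incremental_flag != in_block_quote


# Without -i: plain lists show all at once, block-quoted lists are incremental.
print(displays_incrementally(False, False))  # False
print(displays_incrementally(False, True))   # True
# With -i: plain lists are incremental, block-quoted lists show all at once.
print(displays_incrementally(True, False))   # True
print(displays_incrementally(True, True))    # False
```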