% Pandoc
% John MacFarlane
% October 30, 2006

Pandoc is a [Haskell] library for converting from one markup format
to another, and a command-line tool that uses this library. It can read
[markdown] and (subsets of) [reStructuredText], [HTML], and [LaTeX],
and it can write [markdown], [reStructuredText], [HTML], [LaTeX], [RTF],
and [S5] HTML slide shows. Pandoc's version of markdown contains some
enhancements, like footnotes and embedded LaTeX.

In contrast to existing tools for converting markdown to HTML, which
use regex substitutions, Pandoc has a modular design: it consists of a
set of readers, which parse text in a given format and produce a native
representation of the document, and a set of writers, which convert
this native representation into a target format. Thus, adding an input
or output format requires only adding a reader or writer.
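
The reader/writer split can be pictured with a small sketch. It is
written in Python for illustration and is not Pandoc's Haskell code:
the `Block` type and the toy `read_markdown`, `write_html`, and
`write_latex` functions are hypothetical stand-ins that handle only
level-one headers and one-line paragraphs.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical 'native' element shared by all readers and writers."""
    kind: str  # "header" or "para"
    text: str


def read_markdown(source):
    """Toy reader: '# ' lines become headers, other non-blank lines paragraphs."""
    blocks = []
    for line in source.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("# "):
            blocks.append(Block("header", line[2:]))
        else:
            blocks.append(Block("para", line))
    return blocks


def write_html(blocks):
    """Toy writer: native blocks to HTML."""
    tags = {"header": "h1", "para": "p"}
    return "\n".join(f"<{tags[b.kind]}>{b.text}</{tags[b.kind]}>" for b in blocks)


def write_latex(blocks):
    """Toy writer: native blocks to LaTeX."""
    return "\n".join(
        f"\\section{{{b.text}}}" if b.kind == "header" else b.text
        for b in blocks
    )


# Adding a format means adding one entry to one of these tables.
READERS = {"markdown": read_markdown}
WRITERS = {"html": write_html, "latex": write_latex}


def convert(source, reader="markdown", writer="html"):
    """Parse with the chosen reader, then render with the chosen writer."""
    return WRITERS[writer](READERS[reader](source))
```

With this layout, supporting a new output format touches only the
`WRITERS` table; no reader needs to change.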

[markdown]: http://daringfireball.net/projects/markdown/
[reStructuredText]: http://docutils.sourceforge.net/docs/ref/rst/introduction.html
[S5]: http://meyerweb.com/eric/tools/s5/
[HTML]: http://www.w3.org/TR/html40/
[LaTeX]: http://www.latex-project.org/
[RTF]: http://en.wikipedia.org/wiki/Rich_Text_Format
[Haskell]: http://www.haskell.org/

(c) 2006 John MacFarlane (jgm At berkeley.edu). Released under the
[GPL], version 2 or greater. This software carries no warranty of
any kind. (See COPYRIGHT for full copyright and warranty notices.)
Recai Oktaş (roktas At debian.org) deserves credit for the build
system, the Debian package, and the robust wrapper scripts.

[GPL]: http://www.gnu.org/copyleft/gpl.html

# Using Pandoc

If you run `pandoc` without arguments, it will accept input from
STDIN. If you run it with file names as arguments, it will take input
from those files. It accepts several command-line options. For a
list, type

    pandoc -h

The most important options specify the format of the source file and
the output. The default reader is markdown; the default writer is
HTML. So if you don't specify a reader or writer, `pandoc` will
convert markdown to HTML. For example,

    pandoc hello.txt

will convert `hello.txt` from markdown to HTML. For other conversions,
you must specify a reader and/or a writer using the `-r` and `-w`
flags. To convert markdown to LaTeX, you would write:

    pandoc -w latex hello.txt

To convert HTML to markdown:

    pandoc -r html -w markdown hello.txt

Supported writers include `markdown`, `latex`, `html`, `rtf` (rich text
format), `rst` (reStructuredText), and `s5` (which produces an HTML
file that acts like a PowerPoint slide show). Supported readers include
`markdown`, `html`, `latex`, and `rst`. Note that the `rst` reader only
parses a subset of reStructuredText syntax. For example, it doesn't handle
tables, definition lists, option lists, or footnotes. It handles only the
constructs expressible in unextended markdown. But for simple documents
it should be adequate. The `latex` and `html` readers are also limited
in what they can do.

`pandoc` writes its output to STDOUT. If you want to write to a file,
use redirection:

    pandoc hello.txt > hello.html

Note that you can specify multiple input files on the command line.
`pandoc` will concatenate them all (with blank lines between them)
before parsing:

    pandoc -s chapter1.txt chapter2.txt chapter3.txt references.txt > book.html

(The `-s` option here tells `pandoc` to produce a standalone HTML file,
with a proper header, rather than a fragment. For more details on this
and many other command-line options, see below.)

# Character encodings

Unfortunately, due to limitations in GHC, `pandoc` does not automatically
detect the system's local character encoding. Hence, all input and
output is assumed to be in the UTF-8 encoding. If your local character
encoding is not UTF-8 and you use accented or other non-ASCII characters,
you should pipe the input and output through [`iconv`]. For example,

    iconv -t utf-8 source.txt | pandoc | iconv -f utf-8 > output.html

will convert `source.txt` from the local encoding to UTF-8, then
convert it to HTML, then convert back to the local encoding,
putting the output in `output.html`.
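
The same three-stage pipeline can be sketched in Python to make the
direction of each conversion explicit. This is only an illustration:
`fake_pandoc` is a hypothetical stand-in for the real converter, and
the local encoding is passed in by hand rather than detected.

```python
def convert_with_local_encoding(raw, local_encoding, convert):
    """Mimic `iconv -t utf-8 | pandoc | iconv -f utf-8` on a byte string."""
    # Stage 1: local encoding -> UTF-8 (what `iconv -t utf-8` does).
    utf8_in = raw.decode(local_encoding).encode("utf-8")
    # Stage 2: the converter sees and produces UTF-8 only.
    utf8_out = convert(utf8_in)
    # Stage 3: UTF-8 -> local encoding (what `iconv -f utf-8` does).
    return utf8_out.decode("utf-8").encode(local_encoding)


def fake_pandoc(utf8_bytes):
    """Hypothetical stand-in for pandoc: wraps the text in a paragraph."""
    text = utf8_bytes.decode("utf-8")
    return ("<p>" + text.strip() + "</p>").encode("utf-8")
```

The converter in the middle never has to know what the local encoding
is, which is why the recoding can live entirely outside `pandoc`.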

[`iconv`]: http://www.gnu.org/software/libiconv/

The shell scripts (described below) automatically convert the input
from the local encoding to UTF-8 before running it through `pandoc`,
then convert the output back to the local encoding.

## LaTeX and UTF-8

LaTeX sources produced by Pandoc use `ucs.sty`, which is included in many
LaTeX distributions. This allows LaTeX to process UTF-8 characters.
If your installation of LaTeX does not include `ucs.sty`, you will get an
error when you try to compile a LaTeX file produced by Pandoc, or when
you use the `markdown2pdf` script (described below). If this happens,
install the [unicode] package from [CTAN]. (Get the `unicode.zip`
file from CTAN, unpack it, and copy the whole `unicode` directory into
`~/texmf/tex/latex/`. You may also need to run `mktexlsr` or `texhash`
before the files can be found by TeX.)

[CTAN]: http://www.ctan.org
[unicode]: http://www.ctan.org/tex-archive/macros/latex/contrib/unicode/

# The shell scripts

Five shell scripts have been included that make it easy to run
`pandoc` without worrying about character encodings, and without
remembering all the command-line options:

- `markdown2html` converts markdown-formatted text to HTML
- `markdown2latex` converts markdown-formatted text to LaTeX
- `markdown2pdf` produces a PDF file from markdown-formatted
  text, using `pdflatex`.
- `html2markdown` converts HTML to markdown-formatted text
- `latex2markdown` converts LaTeX to markdown-formatted text

All of the scripts use `iconv` (if available) to convert to and from
the local character encoding. All of the scripts presuppose that
`pandoc` is in the path, and some have additional requirements. (For
example, `html2markdown` uses `tidy`, and `markdown2pdf` uses
`pdflatex`.)

When no arguments are specified, text will be read from standard
input. Arguments specify input files (limited to one in the case of
`latex2markdown` and `html2markdown`; the other scripts accept any number
of arguments). `html2markdown` may take a URL as argument instead of
a filename; in this case, `curl`, `wget`, or an available text-based
browser will be used to fetch the contents of the URL. (The `-n` option
inhibits this behavior; the `-g` option allows the user to specify a
custom command that will be used to fetch from a URL.)

With the exception of `markdown2pdf`, the scripts write to standard output.
Output can be sent to a file using shell output redirection:

    latex2markdown sample.tex > sample.txt

The default behavior of `markdown2pdf` is to create a file with the same
base name as the first argument and the extension `pdf`; thus, for example,

    markdown2pdf sample.txt endnotes.txt

will produce `sample.pdf`. (If `sample.pdf` exists already, it will be
backed up before being overwritten.) An output file name can be specified
explicitly using the `-o` option:

    markdown2pdf -o "My Book.pdf" chap1.txt chap2.txt chap3.txt

Options specific to the scripts, like `-o`, `-g`, and `-n`, must
be specified *before* any command-line arguments (file names or URLs).
Any options specified *after* the command-line arguments will be
passed directly to `pandoc`. For example,

    markdown2html tusks.txt -S -T Elephants

will convert `tusks.txt` to HTML (written to standard output) using
smart quotes, ellipses, and dashes, with "Elephants" as the page title
prefix. (For a complete list of `pandoc` options, see below.) When there
are no command-line arguments (because input is from STDIN), `pandoc`
options must be preceded by ` -- `:

    cat tusks.txt | markdown2html -- -S -T Elephants

The ` -- ` separator may optionally be used when there are command-line
arguments:

    markdown2html -- tusks.txt -S -T Elephants
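
The ordering rules above amount to splitting the command line into
three groups. The following Python sketch shows one way to read them;
it is not the scripts' actual implementation, and it assumes that `-o`
and `-g` each take one value while `-n` takes none.

```python
# Assumption: these script options consume the following word as a value.
SCRIPT_OPTS_WITH_VALUE = {"-o", "-g"}


def split_args(argv):
    """Split argv into (script options, input files/URLs, pandoc options)."""
    script_opts, inputs = [], []
    i = 0
    # 1. Leading dash-words (before any input or `--`) belong to the script.
    while i < len(argv) and argv[i] != "--" and argv[i].startswith("-"):
        step = 2 if argv[i] in SCRIPT_OPTS_WITH_VALUE else 1
        script_opts.extend(argv[i:i + step])
        i += step
    # 2. An optional `--` ends the script options.
    if i < len(argv) and argv[i] == "--":
        i += 1
    # 3. Plain words are input files or URLs.
    while i < len(argv) and not argv[i].startswith("-"):
        inputs.append(argv[i])
        i += 1
    # 4. Everything from the next dash-word on is handed to pandoc.
    return script_opts, inputs, argv[i:]
```

Without ` -- `, a dash-word in first position is taken as a script
option, which is why `pandoc` options need the separator when there
are no input files.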

# Command-line options

Various command-line options can be used to customize the output.
For a complete list, type

    pandoc --help

`-p` or `--preserve-tabs` causes tabs in the source text to be
preserved, rather than converted to spaces (the default).

`--tabstop` allows the user to set the tab stop (which defaults to 4).

`-R` or `--parse-raw` causes the HTML and LaTeX readers to parse HTML
codes and LaTeX environments that they can't translate as raw HTML or
LaTeX. Raw HTML can be printed in markdown, reStructuredText, HTML,
and S5 output; raw LaTeX can be printed in markdown, reStructuredText,
and LaTeX output. The default is for the readers to omit
untranslatable HTML codes and LaTeX environments. (The LaTeX reader
does pass through untranslatable LaTeX commands, even if `-R` is not
specified.)

`-s` or `--standalone` causes `pandoc` to produce a standalone file,
complete with appropriate document headers. By default, `pandoc`
produces a fragment.

`--custom-header` can be used to specify a custom document header. To
see the headers used by default, use the `-D` option: for example,
`pandoc -D html` prints the default HTML header.

`-c` or `--css` allows the user to specify a custom stylesheet that
will be linked to in HTML and S5 output.

`-H` or `--include-in-header` specifies a file to be included
(verbatim) at the end of the document header. This can be used, for
example, to include special CSS or JavaScript in HTML documents.

`-B` or `--include-before-body` specifies a file to be included
(verbatim) at the beginning of the document body (after the `<body>`
tag in HTML, or the `\begin{document}` command in LaTeX). This can be
used to include navigation bars or banners in HTML documents.

`-A` or `--include-after-body` specifies a file to be included
(verbatim) at the end of the document body (before the `</body>` tag in
HTML, or the `\end{document}` command in LaTeX).

`-T` or `--title-prefix` specifies a string to be included as a prefix
at the beginning of the title that appears in the HTML header (but not
in the title as it appears at the beginning of the HTML body). (See
below on Titles.)

`-S` or `--smart` causes `pandoc` to produce typographically
correct HTML output, along the lines of John Gruber's [Smartypants].
Straight quotes are converted to curly quotes, `---` to dashes, and
`...` to ellipses.

[Smartypants]: http://daringfireball.net/projects/smartypants/
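
As a rough picture of what the `-S` substitutions do, here is a Python
sketch. It is not Pandoc's algorithm: the rule used to tell opening
from closing quotes (a quote at the start of the text or after
whitespace or an opening bracket is an opening quote) and the choice
of an em dash for `---` are assumptions made for the example.

```python
import re


def smarten(text):
    """Apply simplified smart-typography substitutions to plain text."""
    text = text.replace("---", "\u2014")  # em dash (assumed)
    text = text.replace("...", "\u2026")  # horizontal ellipsis
    # Double quotes: opening if at the start or after space/bracket, else closing.
    text = re.sub(r'(^|[\s(\[{])"', "\\1\u201c", text)
    text = text.replace('"', "\u201d")
    # Single quotes and apostrophes, by the same rule.
    text = re.sub(r"(^|[\s(\[{])'", "\\1\u2018", text)
    text = text.replace("'", "\u2019")
    return text
```

A real implementation also has to skip code spans and raw HTML, which
this sketch ignores.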

`-m` or `--asciimathml` will cause LaTeX formulas (between $ signs) in
HTML or S5 to display as formulas rather than as code. The trick will
not work in all browsers, but it works in Firefox. Peter Jipsen's
[ASCIIMathML] script is used to do the magic.

[ASCIIMathML]: http://www1.chapman.edu/~jipsen/mathml/asciimath.html

`-i` or `--incremental` causes all lists in S5 output to be displayed
incrementally by default (one item at a time). The normal default
is for lists to be displayed all at once.

`-N` or `--number-sections` causes sections to be numbered in LaTeX
output. By default, sections are not numbered.

# Pandoc's markdown vs. standard markdown

In parsing markdown, Pandoc departs from and extends [standard markdown]
in a few respects. (To run Pandoc on the official
markdown test suite, type `make test-markdown`.)

[standard markdown]: http://daringfireball.net/projects/markdown/syntax

## Section Headings

Pandoc creates an invisible anchor in front of every HTML section
heading. The ID of this anchor is derived from the section heading
itself: spaces are converted to underscores, and formatting, links,
and other markup are removed. Thus, for example, the source

    ## Aristotle's *De Anima*

gets converted to HTML as follows:

    <a id="Aristotle's_De_Anima"></a>
    <h2>Aristotle's <em>De Anima</em></h2>

This makes it easy to provide internal links that jump to a particular
place in a document. To provide a link to the heading above, for
example, just insert:

    [Back to Aristotle](#Aristotle's_De_Anima)
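
If you need to predict an anchor ID (for instance, to generate links
from another program), the rule can be approximated as below. This
Python sketch is an approximation, not Pandoc's code: it assumes that
the visible text of an inline link is kept, and it strips only `*`,
`_`, and backtick markers.

```python
import re


def heading_id(heading):
    """Approximate the anchor ID derived from a heading's source text."""
    # Inline links: keep the link text, drop the URL (assumed behavior).
    text = re.sub(r"\[([^\]]*)\]\([^)]*\)", r"\1", heading)
    # Drop emphasis and code markers.
    text = re.sub(r"[*_`]", "", text)
    # Runs of whitespace become single underscores.
    return "_".join(text.split())
```

Because the sketch strips every `_`, it will mangle headings that
contain literal underscores; treat its output as a guess to check
against the generated HTML.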

## Lists

Pandoc behaves differently from standard markdown on some "edge
cases" involving lists. Consider this source:

    1.  First
    2.  Second:
        -   Fee
        -   Fie
        -   Foe

    3.  Third

Pandoc transforms this into a "compact list" (with no `<p>` tags
around "First", "Second", or "Third"), while markdown puts `<p>`
tags around "Second" and "Third" (but not "First"), because of
the blank space around "Third". Pandoc follows a simple rule:
if the text is followed by a blank line, it is treated as a
paragraph. Since "Second" is followed by a list, and not a blank
line, it isn't treated as a paragraph. The fact that the list
is followed by a blank line is irrelevant.
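
The rule can be stated as a few lines of Python. This is a
simplification for illustration, not Pandoc's parser: it looks only at
top-level numbered items whose text fits on one line, and checks
whether the very next source line is blank.

```python
import re

# Top-level numbered item: digits and a period at the start of the line.
ITEM = re.compile(r"^\d+\.\s+(.*)$")


def paragraph_items(lines):
    """Map each top-level item's text to True if it would get <p> tags."""
    result = {}
    for i, line in enumerate(lines):
        match = ITEM.match(line)
        if match:
            following = lines[i + 1] if i + 1 < len(lines) else None
            # Paragraph only if the item's text is followed by a blank line.
            result[match.group(1)] = following is not None and following.strip() == ""
    return result
```

Applied to the example above, every item maps to `False`: "Second:" is
followed by the nested list rather than a blank line, and the blank
line after "Foe" belongs to the nested list, not to "Second:".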

Unlike standard markdown, Pandoc allows ordered list items to be
marked with single letters, instead of numbers. So, for example,
this source yields a nested ordered list:

    1.  First
    2.  Second
        a.  Fee
        b.  Fie
    3.  Third

Pandoc also extends standard markdown in allowing list item markers
to be terminated by ')':

    1)  First
    2)  Second
        A)  Fee
        B)  Fie
    3)  Third

Note that Pandoc pays no attention to the *type* of ordered list
item marker used. Thus, the following is treated just the same as
the example above:

    A)  First
    1.  Second
        2.  Fee
        B)  Fie
    C)  Third
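
The marker rules described here (a number or a single letter, ended by
`.` or `)`, with the marker type ignored) can be captured in one
regular expression. The Python sketch below is an illustration rather
than Pandoc's grammar; the name `ordered_item_text` is made up for the
example.

```python
import re

# A number or a single letter, then '.' or ')', then the item text.
MARKER = re.compile(r"^\s*(?:\d+|[A-Za-z])[.)]\s+(\S.*)$")


def ordered_item_text(line):
    """Return the item text if the line starts an ordered list item, else None."""
    match = MARKER.match(line)
    return match.group(1) if match else None
```

Since only the text is returned, `1. Second` and `B) Second` are
indistinguishable afterwards, mirroring the point that the marker type
carries no meaning.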

## Literal quotes in titles

Standard markdown allows unescaped literal quotes in titles, as
in

    [foo]: "bar "embedded" baz"

Pandoc requires all quotes within titles to be escaped:

    [foo]: "bar \"embedded\" baz"

## Reference links

Pandoc allows implicit reference links in either of two styles:

    1. Here's my [link]
    2. Here's my [link][]

    [link]: linky.com

If there's no corresponding reference, the implicit reference link
will appear as regular bracketed text. Note: even `[link][]` will
appear as `[link]` if there's no reference for `link`. If you want
`[link][]`, use a backslash escape: `\[link]\[]`.

## Footnotes

Pandoc's markdown allows footnotes, using the following syntax:

    Here is a footnote reference,[^1] and another.[^longnote]

    [^1]: Here is the footnote. It can go anywhere in the document,
    except in embedded contexts like block quotes or lists.

    [^longnote]: Here's the other note. This one contains multiple
    blocks.

        Subsequent paragraphs are indented to show that they belong to
    the previous footnote.

            { some.code }

        The whole paragraph can be indented, or just the first line.
        In this way, multi-paragraph footnotes work just like
        multi-paragraph list items in markdown.

    This paragraph won't be part of the note.

The identifiers in footnote references may not contain spaces, tabs,
or newlines. These identifiers are used only to correlate the
footnote reference with the note itself; in the output, footnotes
will be numbered sequentially.

Inline footnotes are also allowed (though, unlike regular notes,
they cannot contain multiple paragraphs). The syntax is as follows:

    Here is an inline note.^[Inline notes are easier to write, since
    you don't have to pick an identifier and move down to type the
    note.]

Inline and regular footnotes may be mixed freely.
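
To see why the identifiers themselves never reach the output, consider
this Python sketch of sequential numbering. It is a simplified model,
not Pandoc's code: it handles only `[^id]` references in running text
(not note definitions or inline `^[...]` notes), and the `[n]` output
form is made up for the example.

```python
import re

# A footnote reference: [^ident], where ident has no whitespace or ']'.
REF = re.compile(r"\[\^([^\s\]]+)\]")


def number_footnotes(text):
    """Replace [^ident] references with [n], numbering by first appearance."""
    numbers = {}

    def replace(match):
        ident = match.group(1)
        # First sighting of an identifier gets the next free number.
        numbers.setdefault(ident, len(numbers) + 1)
        return f"[{numbers[ident]}]"

    return REF.sub(replace, text), numbers
```

An identifier such as `longnote` is therefore just a label for
matching a reference to its note; renaming it changes nothing in the
output.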

## Embedded HTML

Pandoc treats embedded HTML in markdown a bit differently than
Markdown 1.0. While Markdown 1.0 leaves HTML blocks exactly as they
are, Pandoc treats text between HTML tags as markdown. Thus, for
example, Pandoc will turn

    <table>
        <tr>
            <td>*one*</td>
            <td>[a link](http://google.com)</td>
        </tr>
    </table>

into

    <table>
        <tr>
            <td><em>one</em></td>
            <td><a href="http://google.com">a link</a></td>
        </tr>
    </table>

whereas Markdown 1.0 will preserve it as is.

There is one exception to this rule: text between `<script>` and
`</script>` tags is not interpreted as markdown.

This departure from standard markdown should make it easier to mix
markdown with HTML block elements. For example, one can surround
a block of markdown text with `<div>` tags without preventing it
from being interpreted as markdown.

## Title blocks

If the file begins with a title block

    % title
    % author(s) (separated by commas)
    % date

it will be parsed as bibliographic information, not regular text. (It
will be used, for example, in the title of standalone LaTeX or HTML
output.) The block may contain just a title, a title and an author,
or all three lines. Each must begin with a % and fit on one line.
The title may contain standard inline formatting. If you want to
include an author but no title, or a title and a date but no author,
you need a blank line (a `%` with nothing after it):

    % My title
    %
    % June 15, 2006

Titles will be written only when the `--standalone` (`-s`) option is
chosen. In HTML output, titles will appear twice: once in the
document head -- this is the title that will appear at the top of the
window in a browser -- and once at the beginning of the document body.
The title in the document head can have an optional prefix attached
(`--title-prefix` or `-T` option). The title in the body appears as
an H1 element with class "title", so it can be suppressed or
reformatted with CSS.

If a title prefix is specified with `-T` and no title block appears
in the document, the title prefix will be used by itself as the
HTML title.
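
The positional nature of the title block (first line title, second
authors, third date) is easy to see in a small parser. The Python
sketch below is only an illustration of the format as described here;
the returned dictionary and its key names are made up for the example,
and inline formatting in the title is left untouched.

```python
def parse_title_block(source):
    """Read up to three leading '%' lines as title, authors, and date."""
    fields = []
    for line in source.splitlines()[:3]:
        if not line.startswith("%"):
            break  # the title block ends at the first non-% line
        fields.append(line[1:].strip())
    # Missing trailing lines count as empty fields.
    fields += [""] * (3 - len(fields))
    title, authors, date = fields
    return {
        "title": title,
        # Authors are separated by commas; an empty line yields no authors.
        "authors": [a.strip() for a in authors.split(",") if a.strip()],
        "date": date,
    }
```

Because fields are identified by position, the empty `%` line is what
keeps a date from being read as an author.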

## Box-style blockquotes

Pandoc supports emacs-style boxquote block quotes, in addition to
standard markdown (email-style) block quotes:

    ,----
    | They look like this.
    `----

## Inline LaTeX

Anything between two $ characters will be parsed as LaTeX math. The
opening $ must have a non-space character immediately to its right,
while the closing $ must have a non-space character immediately to its
left. Thus, `$20,000 and $30,000` won't parse as math. The $ character
can be escaped with a backslash if needed.
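
The delimiter rule can be expressed as a regular expression. The
Python sketch below is one reading of the rule, not Pandoc's parser:
it assumes "a character" means a non-space character, and it treats a
backslash-escaped `$` as ordinary text.

```python
import re

MATH = re.compile(
    r"(?<!\\)\$"           # opening $, not escaped
    r"(?=\S)"              # next character must not be a space
    r"((?:[^$\\]|\\.)+?)"  # math body: no bare $, escapes allowed
    r"(?<=\S)\$"           # closing $, previous character not a space
)


def find_math(text):
    """Return the bodies of all inline math spans in the text."""
    return MATH.findall(text)
```

In `$20,000 and $30,000` the only candidate closing `$` is preceded by
a space, so no span is found and both amounts stay as plain text.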

If you pass the `-m` (`--asciimathml`) option to `pandoc`, it will
include the [ASCIIMathML] script in the resulting HTML. This will
cause LaTeX math to be displayed as formulas in better browsers.

[ASCIIMathML]: http://www1.chapman.edu/~jipsen/mathml/asciimath.html

Inline LaTeX commands will also be preserved and passed unchanged
to the LaTeX writer. Thus, for example, you can use LaTeX to
include BibTeX citations:

    This result was proved in \cite{jones.1967}.

You can also use LaTeX environments. For example,

    \begin{tabular}{|l|l|}\hline
    Age & Frequency \\ \hline
    18--25  & 15 \\
    26--35  & 33 \\
    36--45  & 22 \\ \hline
    \end{tabular}

Note, however, that material between the `\begin` and `\end` commands
will be interpreted as raw LaTeX, not as markdown.

## Custom headers

When run with the "standalone" option (`-s`), `pandoc` creates a
standalone file, complete with an appropriate header. To see the
default headers used for HTML and LaTeX, use the following commands:

    pandoc -D html

    pandoc -D latex

If you want to use a different header, just create a file containing
it and specify it on the command line as follows:

    pandoc --custom-header=MyHeaderFile

# Producing S5 with Pandoc

Producing an [S5] slide show with Pandoc is easy. A title page is
constructed automatically from the document's title block (see above).
Each section (with a level-one header) produces a single slide. (Note
that if the section is too big, the slide will not fit on the page; S5
is not smart enough to produce multiple pages.)

Here's the markdown source for a simple slide show, `eating.txt`:

    % Eating Habits
    % John Doe
    % March 22, 2005

    # In the morning

    - Eat eggs
    - Drink coffee

    # In the evening

    - Eat spaghetti
    - Drink wine

To produce the slide show, simply type

    pandoc -w s5 -s eating.txt > eating.html

and open up `eating.html` in a browser. The HTML file embeds
all the required JavaScript and CSS, so no other files are necessary.

Note that by default, the S5 writer produces lists that display
"all at once." If you want your lists to display incrementally
(one item at a time), use the `-i` option. If you want a
particular list to depart from the default (that is, to display
incrementally without the `-i` option and all at once with the
`-i` option), put it in a block quote:

    > - Eat spaghetti
    > - Drink wine

In this way incremental and nonincremental lists can be mixed in
a single document.
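
Two rules from this section lend themselves to a short sketch: one
slide per level-one header, and the block quote inverting whatever
default `-i` sets. The Python below is an illustrative model, not the
S5 writer; `split_slides` and `displays_incrementally` are
hypothetical names, and the title block is simply skipped.

```python
def split_slides(source):
    """Group source lines into slides, one per level-one header."""
    slides = []
    for line in source.splitlines():
        if line.startswith("# "):
            slides.append({"title": line[2:].strip(), "body": []})
        elif slides and line.strip():
            # Non-blank lines belong to the most recent slide.
            slides[-1]["body"].append(line)
    return slides


def displays_incrementally(incremental_option, in_block_quote):
    """A block quote flips the default chosen by the -i option."""
    return incremental_option != in_block_quote
```

Applied to `eating.txt`, `split_slides` yields two slides with two
list items each, and a block-quoted list displays incrementally
exactly when `-i` is absent.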