Minor corrections and improvements to README.

git-svn-id: https://pandoc.googlecode.com/svn/trunk@10 788f1e2b-df1e-0410-8736-df70ead52e1b
fiddlosopher 2006-10-27 03:16:13 +00:00
parent 86e8b9635a
commit 3a9d4b2d16

README

@ -2,11 +2,19 @@
% John MacFarlane
% August 10, 2006
`pandoc` converts files from one markup format to another. It can
read [markdown] and (with some limitations) [reStructuredText], [HTML], and
[LaTeX], and it can write [markdown], [reStructuredText], [HTML],
[LaTeX], [RTF], and [S5] HTML slide shows. It is written in
[Haskell], using the excellent [Parsec] parser combinator library.
`pandoc` is a [Haskell] library for converting files from one markup
format to another, and a command-line tool that uses this library. It can
read [markdown] and (subsets of) [reStructuredText], [HTML], and [LaTeX],
and it can write [markdown], [reStructuredText], [HTML], [LaTeX], [RTF],
and [S5] HTML slide shows. `pandoc`'s version of markdown contains some
enhancements, like footnotes and embedded LaTeX.
In contrast to existing tools for converting markdown to HTML, which
use regex substitutions, `pandoc` has a modular design: it consists of a
set of readers, which parse text in a given format and produce a native
representation of the document, and a set of writers, which convert
this native representation into a target format. Thus, adding an input
or output format requires only adding a reader or writer.
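
As a rough illustration of why this design scales, the arithmetic below counts the components needed for the formats listed above (four readers, six writers). The variable names are illustrative only and are not part of `pandoc`.

```shell
# Format counts taken from the lists above.
readers=4   # markdown, reStructuredText, HTML, LaTeX
writers=6   # markdown, reStructuredText, HTML, LaTeX, RTF, S5

# Modular design: one component per reader plus one per writer.
components=$((readers + writers))

# Any reader can feed any writer through the native representation.
conversions=$((readers * writers))

echo "$components components cover $conversions conversions"
```

With these counts, 10 components cover 24 conversions. A seventh writer would add one component and four more conversions.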
[markdown]: http://daringfireball.net/projects/markdown/
[reStructuredText]: http://docutils.sourceforge.net/docs/ref/rst/introduction.html
@ -15,7 +23,6 @@ read [markdown] and (with some limitations) [reStructuredText], [HTML], and
[LaTeX]: http://www.latex-project.org/
[RTF]: http://en.wikipedia.org/wiki/Rich_Text_Format
[Haskell]: http://www.haskell.org/
[Parsec]: http://www.cs.uu.nl/~daan/download/parsec/parsec.html
(c) 2006 John MacFarlane (jgm At berkeley.edu). Released under the
[GPL], version 2 or greater. This software carries no warranty of
@ -27,7 +34,7 @@ any kind. (See LICENSE for full copyright and warranty notices.)
## Installing GHC
To compile `pandoc`, you'll need [GHC] version 6.4 or greater.
To compile `pandoc`, you'll need [GHC] version 6.4 or greater.
If you don't have GHC already, you can get it from the
[GHC Download] page.
@ -35,65 +42,58 @@ If you don't have GHC already, you can get it from the
[GHC]: http://www.haskell.org/ghc/
[GHC Download]: http://www.haskell.org/ghc/download.html
Note: As of this writing, there's no MacOS X installer package for
GHC 6.4.2 (the latest version). There is an installer for
GHC 6.4.1 [here](http://www.haskell.org/ghc/download_ghc_641.html#macosx).
It will work just fine on PPC-based Macs. GHC has not yet been ported
to Intel Macs: see <http://hackage.haskell.org/trac/ghc/wiki/X86OSXGhc>.
You'll also need standard build tools: GNU Make, sed, bash, and perl.
You'll also need standard build tools: GNU `make`, `sed`, `bash`, and `perl`.
These are standard on Unix systems (including MacOS X). If you're
using Windows, you can install [Cygwin].
[Cygwin]: http://www.cygwin.com/
Note: I have tested `pandoc` on MacOS X and Linux systems. I have not
tried it on Windows, and I have no idea whether it will work on Windows.
## Installing `pandoc`
1. Change to the directory containing the `pandoc` distribution.
2. Compile:
make
make
3. Optional, but recommended:
3. See if it worked (optional, but recommended):
make test
make test
4. If you want to install the `pandoc` program and the relevant wrappers
and documents (including this file) into `/usr/local` directory, type:
make install
If you only want the `pandoc` program and the shell scripts `latex2markdown`,
`markdown2latex`, `markdown2pdf`, `markdown2html`, `html2markdown` installed
into your `~/bin` directory, type (note the **`-exec`** suffix):
4. Install:
PREFIX=~ make install-exec
make install
5. If you want to install the Pandoc library modules for use in
other Haskell programs, type (as root):
Note: This installs `pandoc`, together with its wrappers and
documentation, into the `/usr/local` directory, which requires root
privileges. If you don't have root privileges or would prefer to
install `pandoc` and the associated shell scripts into your `~/bin`
directory, type this instead:
make install-lib
6. To install the library documentation (into `/usr/local/pandoc-doc`),
type:
PREFIX=~ make install-exec
make install-lib-doc
5. Install Haskell libraries (optional):
make install-lib
6. Install library documentation into `/usr/local/pandoc-doc` (optional):
make install-lib-doc
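
If you install into `~/bin` with `PREFIX=~ make install-exec`, the shell will only find `pandoc` by name when `~/bin` is on your `PATH`. The check below is a minimal sketch; `on_path` is a hypothetical helper written for this example, not something shipped with `pandoc`.

```shell
# Succeed if directory $1 appears in the colon-separated list $2.
on_path() {
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

if on_path "$HOME/bin" "$PATH"; then
    echo "$HOME/bin is on your PATH"
else
    echo "add $HOME/bin to your PATH, e.g. PATH=\$HOME/bin:\$PATH"
fi
```

Wrapping both strings in colons lets the pattern match whole entries only, so `/opt` does not match `/opt/bin`.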
## Removing `pandoc`
Each of the installation steps described above can be reversed:
make uninstall
PREFIX=~ make uninstall-exec
make uninstall-lib
make uninstall-lib-doc
# Using `pandoc`
You can run `pandoc` like this:
./pandoc
If you copy the `pandoc` executable to a directory in your path
(perhaps using `make install`), you can invoke it without the "./":
pandoc
If you run `pandoc` without arguments, it will accept input from
STDIN. If you run it with file names as arguments, it will take input
from those files. It accepts several command-line options. For a
@ -104,29 +104,34 @@ list, type
The most important options specify the format of the source file and
the output. The default reader is markdown; the default writer is
HTML. So if you don't specify a reader or writer, `pandoc` will
convert markdown to HTML. To convert markdown to LaTeX, you could
write:
convert markdown to HTML. For example,
pandoc -w latex input.txt
pandoc hello.txt
will convert `hello.txt` from markdown to HTML. For other conversions,
you must specify a reader and/or a writer using the `-r` and `-w`
flags. To convert markdown to LaTeX, you would write:
pandoc -w latex hello.txt
To convert HTML to markdown:
pandoc -r html -w markdown input.txt
pandoc -r html -w markdown hello.txt
Supported writers include markdown, LaTeX, HTML, RTF,
reStructuredText, and S5 (which produces an HTML file that acts like
PowerPoint). Supported readers include markdown, HTML, LaTeX, and
reStructuredText. Note that the rst (reStructuredText) reader only
parses a subset of rst syntax. For example, it doesn't handle tables,
definition lists, option lists, or footnotes. It handles only the
constructs expressible in unextended markdown. But for simple
documents it should be adequate. The LaTeX and HTML readers are also
limited in what they can do.
Supported writers include `markdown`, `latex`, `html`, `rtf` (rich text
format), `rst` (reStructuredText), and `s5` (which produces an HTML
file that acts like PowerPoint). Supported readers include `markdown`,
`html`, `latex`, and `rst`. Note that the `rst` reader only parses
a subset of reStructuredText syntax. For example, it doesn't handle
tables, definition lists, option lists, or footnotes. It handles only the
constructs expressible in unextended markdown. But for simple documents
it should be adequate. The `latex` and `html` readers are also limited
in what they can do.
`pandoc` writes its output to STDOUT. If you want to write to a file,
use redirection:
pandoc input.txt > output.html
pandoc hello.txt > hello.html
Note that you can specify multiple input files on the command line.
`pandoc` will concatenate them all (with blank lines between them)
@ -134,14 +139,18 @@ before parsing:
pandoc -s chapter1.txt chapter2.txt chapter3.txt references.txt > book.html
(The `-s` option here tells `pandoc` to produce a standalone HTML file,
with a proper header, rather than a fragment. For more details on this
and many other command-line options, see below.)
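
The concatenation step can be approximated in plain shell. This is a sketch under an assumption: it writes exactly one blank line after each file, whereas the text above only says "blank lines". `join_inputs` is a hypothetical helper, not part of `pandoc`.

```shell
# Join the named files, writing one blank line after each,
# roughly as pandoc is described as doing before it parses.
join_inputs() {
    for f in "$@"; do
        cat "$f"
        echo
    done
}

tmp=$(mktemp -d)
printf 'Chapter one.\n' > "$tmp/chapter1.txt"
printf 'Chapter two.\n' > "$tmp/chapter2.txt"

# Command substitution drops the trailing newlines of the joined text.
joined=$(join_inputs "$tmp/chapter1.txt" "$tmp/chapter2.txt")
printf '%s\n' "$joined"
rm -r "$tmp"
```

The separating blank line matters in markdown: without it, the last paragraph of one file would run into the first paragraph of the next.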
## Character encoding
Unfortunately, due to limitations in GHC, `pandoc` does not
automatically detect the system's local character encoding. Hence,
all input and output is assumed to be in the UTF-8 encoding. If you
use accented or foreign characters, you should convert the input file
to UTF-8 before processing it with `pandoc`. This can be done by
piping the input through [`iconv`]: for example,
Unfortunately, due to limitations in GHC, `pandoc` does not automatically
detect the system's local character encoding. Hence, all input and
output is assumed to be in the UTF-8 encoding. If you use accented or
foreign characters, you should convert the input file to UTF-8 before
processing it with `pandoc`. This can be done by piping the input through
[`iconv`]: for example,
iconv -t utf-8 source.txt | pandoc > output.html
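
If the file's encoding differs from your locale's, `iconv` lets you name the source encoding with `-f`. The sketch below assumes an ISO-8859-1 input and uses made-up file names. It skips the `pandoc` step so that it runs on its own.

```shell
tmp=$(mktemp -d)

# Write "café" in ISO-8859-1: the é is the single byte 0xE9 (octal 351).
printf 'caf\351\n' > "$tmp/latin1.txt"

# Name the source encoding explicitly with -f instead of relying on the locale.
iconv -f iso-8859-1 -t utf-8 "$tmp/latin1.txt" > "$tmp/utf8.txt"

# In UTF-8 the same character takes two bytes (0xC3 0xA9).
latin1_bytes=$(wc -c < "$tmp/latin1.txt" | tr -d ' ')
utf8_bytes=$(wc -c < "$tmp/utf8.txt" | tr -d ' ')
echo "$latin1_bytes bytes before, $utf8_bytes bytes after"
rm -r "$tmp"
```

The file grows from 5 bytes to 6 because the accented character is re-encoded. The converted output can then be piped to `pandoc` as in the command above.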
@ -158,18 +167,18 @@ from the local encoding to UTF-8 before running them through `pandoc`.
For convenience, five shell scripts have been included that make it
easy to run `pandoc` without remembering all the command-line options.
All of the scripts presuppose that `pandoc` is in the path, and
`html2markdown` also presupposes that `curl` and `tidy` are in the
path.
some have additional requirements. (For example, `html2markdown`
uses `tidy`, and `markdown2pdf` uses `pdflatex`.)
1. `markdown2html` converts markdown to HTML, running `iconv` first to
convert the file to UTF-8. (This can be used as a replacement for
`Markdown.pl`.)
2. `html2markdown` can take either a filename or a URL as argument. If
it is given a URL, it uses `curl` to fetch the contents of the
specified URL, then filters this through `tidy` to straighten up the
HTML and convert to UTF-8, and finally passes this HTML to `pandoc` to
produce markdown text:
it is given a URL, it uses `curl`, `wget`, or an available text-based
browser to fetch the contents of the specified URL, then filters this
through `tidy` to straighten up the HTML and convert to UTF-8,
and finally passes this HTML to `pandoc` to produce markdown text:
html2markdown http://www.fsf.org
@ -185,24 +194,23 @@ path.
markdown2latex mytextfile.txt
5. `markdown2pdf` converts markdown to PDF, using LaTeX, but removing
all the intermediate files created by LaTeX. Example:
5. `markdown2pdf` converts markdown to PDF using `pdflatex`. Example:
markdown2pdf mytextfile.txt
creates a file `mytextfile.pdf` in the working directory.
creates a file `mytextfile.pdf`.
# Command-line options
Various command-line options can be used to customize the output.
Various command-line options can be used to customize the output.
For a complete list, type
pandoc --help
pandoc --help
`-p` or `--preserve-tabs` causes tabs in the source text to be
preserved, rather than converted to spaces (the default).
`--tabstop` allows the user to set the tab stop (which defaults to 4).
`--tabstop` allows the user to set the tab stop (which defaults to 4).
`-R` or `--parse-raw` causes the HTML and LaTeX readers to parse HTML
codes and LaTeX environments that they can't translate as raw HTML or
@ -258,7 +266,7 @@ not work in all browsers, but it works in Firefox. Peter Jipsen's
`-i` or `--incremental` causes all lists in S5 output to be displayed
incrementally by default (one item at a time). The normal default
is for lists to be displayed all at once.
is for lists to be displayed all at once.
`-N` or `--number-sections` causes sections to be numbered in LaTeX
output. By default, sections are not numbered.
@ -267,7 +275,7 @@ output. By default, sections are not numbered.
In parsing markdown, `pandoc` departs from and extends [standard markdown]
in a few respects. (To run `pandoc` on the official
markdown test suite, type `make markdown_tests`.)
markdown test suite, type `make test-markdown`.)
[standard markdown]: http://daringfireball.net/projects/markdown/syntax
@ -328,7 +336,7 @@ appear as `[link]` if there's no reference for `link`. If you want
except in embedded contexts like block quotes or lists.
^(longnote) Here's the other note. This one contains multiple
blocks.
blocks.
^
^ Caret characters are used to indicate that the blocks all belong
to a single footnote (as with block quotes).
@ -363,7 +371,7 @@ into
</tr>
</table>
whereas Markdown 1.0 will preserve it as is.
whereas Markdown 1.0 will preserve it as is.
There is one exception to this rule: text between `<script>` and
`</script>` tags is not interpreted as markdown.
@ -468,7 +476,7 @@ Producing an [S5] slide show with `pandoc` is easy. A title page is
constructed automatically from the document's title block (see above).
Each section (with a level-one header) produces a single slide. (Note
that if the section is too big, the slide will not fit on the page; S5
is not smart enough to produce multiple pages.)
is not smart enough to produce multiple pages.)
Here's the markdown source for a simple slide show, `eating.txt`: