Revised customizing-pandoc.md and included TODOs.

John MacFarlane 2018-10-16 09:53:37 -07:00
parent 7f814c5339
commit e32220ef4f


@ -1,64 +1,152 @@
---
author:
- Mauro Bieg
- John MacFarlane
title: Customizing Pandoc
---
This document provides a quick overview of the various ways to
customize pandoc's output. Follow the links to learn how to use each
approach.
customize pandoc's output, with links to fuller documentation
and some examples.
[Templates](/MANUAL.html#templates)
## Templates
: Pandoc comes with a template for (almost) every output format. A
template is a plain text file, that contains for example the line
`$body$`. That variable is replaced by the document's body text on
output.
When the `-s`/`--standalone` option is used, pandoc will
generate a standalone document rather than a fragment.
For example, in HTML output this will include the
`<head>` element; in LaTeX output, it will include the
preamble.
There are many other variables, like `title`, `header-includes`,
etc. that are either set automatically, or that you can set using
[YAML metadata blocks](/MANUAL.html#extension-yaml_metadata_block),
[`--metadata`](/MANUAL.html#option--metadata) (which properly escape
things), or `--variable` (which does no escaping). You can also
generate your own template (e.g. `pandoc -D html > myletter.html`)
and customize that file, for example by introducing new variables.
Pandoc comes with a default template for (almost) every output
format. A template is a plain text file containing variables
that are replaced by text generated by pandoc. For example,
the variable `$body$` will be replaced by the document body,
and `$title$` by the title from metadata. Variables will
be automatically populated by the contents of like-named
metadata fields (with proper escaping). (See
[YAML metadata blocks](/MANUAL.html#extension-yaml_metadata_block)
for documentation on setting metadata fields in pandoc markdown
documents; the command line option
[`--metadata`](/MANUAL.html#option--metadata) can also be
used.) Values for variables can also be specified directly
from the command line using `--variable` (which does no escaping).
[reference.docx/pptx/odt](/MANUAL.html#option--reference-doc)
To look at the default template for an output format, you can do
`pandoc -D FORMAT`, where `FORMAT` is replaced by the name of
the format. You can also replace the defaults with your
own custom templates, either by using the `--template` option
or by putting the custom template in your user data directory
(on Linux and macOS, `~/.pandoc/templates/`).
: To output a `docx`, `pptx` or `odt` document, which is a ZIP of
several files, things are a bit more complicated. Instead of a
single template file, you need to provide a customized
`reference.docx/pptx/odt`.
For more information, see [Templates](/MANUAL.html#templates) in
the pandoc manual.
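
The options described above can be combined in a single invocation. The
following Python sketch only assembles the command line; it does not run
pandoc. The helper name `pandoc_command`, the file names, and the
metadata and variable values are illustrative assumptions. The flags
themselves (`--standalone`, `--template`, `--metadata`, `--variable`)
are the ones discussed in this section.

```python
import shlex


def pandoc_command(source, output, template=None, metadata=None, variables=None):
    """Build an argument list for a standalone pandoc conversion.

    Hypothetical helper for illustration; nothing here is part of pandoc.
    """
    args = ["pandoc", "--standalone", source, "-o", output]
    if template is not None:
        # Use a custom template instead of the default one.
        args.append(f"--template={template}")
    for key, value in (metadata or {}).items():
        # Metadata values are escaped for the output format.
        args.append(f"--metadata={key}:{value}")
    for key, value in (variables or {}).items():
        # Variable values are inserted into the template without escaping.
        args.append(f"--variable={key}:{value}")
    return args


cmd = pandoc_command(
    "letter.md",                    # assumed input file
    "letter.html",                  # assumed output file
    template="myletter.html",       # e.g. created with `pandoc -D html`
    metadata={"title": "Hello"},
    variables={"lang": "en"},
)
print(shlex.join(cmd))
```

The resulting list could be passed to `subprocess.run` on a system where
pandoc is installed.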
[Lua filters](lua-filters.html) and [filters](filters.html)
### Example: adding structured author data to HTML
: Templates are very powerful, but they are only a sort of scaffold to
place your document's body text in. You cannot directly change the
body text using the template (beyond e.g. adding CSS for HTML
output, or `\renewcommand` for LaTeX output).
TODO
If you need to affect the output of the actual body text, you
probably need a pandoc filter. A filter is a small program, that
transforms the document, between the parsing and the writing phase,
while it is still in pandoc's native format -- an abstract syntax
tree (AST), not unlike the HTML DOM. As can be seen in the [AST
definition](https://hackage.haskell.org/package/pandoc-types/docs/Text-Pandoc-Definition.html)
`Pandoc Meta [Block]`, a pandoc document is a chunk of metadata and
a list of `Block`s.
### Example: generating documents from YAML metadata
- There's a [list of third party filters on the
wiki](https://github.com/jgm/pandoc/wiki/Pandoc-Filters).
- Unless you have a good reason not to, it's best to write your
own filter in the Lua scripting language. Since pandoc ships
with a Lua interpreter, Lua filters are very portable and
efficient. See [Lua filters](lua-filters.html).
- For a gentle introduction into filters and writing them in any
programming language, see [filters](filters.html).
TODO <!-- Example of generating a structured document,
say, a table, from structured YAML metadata using
just the control structures in pandoc's template
language. -->
Further customizations
## Reference docx/pptx/odt
For `docx`, `pptx` or `odt` documents, things are a bit more
complicated. Instead of a single template file, you need to
provide a customized `reference.docx/pptx/odt`.
See the manual for the
[`--reference-doc`](/MANUAL.html#option--reference-doc) option.
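
Because these formats are ZIP archives of several files, it can help to
look inside a reference document to see where the style definitions
live. The sketch below uses only Python's standard `zipfile` module.
The helper name `list_style_parts` is hypothetical, and the archive it
inspects is a tiny stand-in built in memory, not a real
`reference.docx`. In a real docx the styles are stored in
`word/styles.xml`.

```python
import io
import zipfile


def list_style_parts(archive_bytes):
    """Return the archive members whose names mention 'styles'."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        return sorted(
            name for name in archive.namelist() if "styles" in name.lower()
        )


# Build a minimal stand-in archive (an assumption for demonstration only).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("word/document.xml", "<w:document/>")
    archive.writestr("word/styles.xml", "<w:styles/>")

style_parts = list_style_parts(buffer.getvalue())
print(style_parts)
```

To inspect an actual reference document, read its bytes from disk and
pass them to the same function.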
### Example: changing the font and line spacing in a Word docx
TODO
## Filters
Templates are very powerful, but they are only a sort of scaffold to
place your document's body text in. You cannot directly change the
body text using the template.
If you need to affect the output of the actual body text, you
can use a pandoc filter. A filter is a small program that
transforms the document, between the parsing and the writing phase,
while it is still in pandoc's native format. For example,
a filter might find all the Header elements of a document
and capitalize their text.
Pandoc's native representation of a document is an
abstract syntax tree (AST), not unlike the HTML DOM. It is
documented
[here](https://hackage.haskell.org/package/pandoc-types/docs/Text-Pandoc-Definition.html). A `Pandoc` document is a chunk of
metadata (`Meta`) and a list of `Block`s. The `Block`s, in
turn, are composed of other `Block`s and `Inline` elements.
(`Block` elements are things like paragraphs, lists, headers,
and code blocks. `Inline` elements are individual words,
links, emphasis, and so on.) Filters operate on these
elements.
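
To make the structure concrete, here is a minimal Python sketch that
walks the JSON serialization of a small document and counts its
elements by type. The JSON is a hand-written stand-in for what
`pandoc -t json` emits (each element is an object with a type tag `t`
and, usually, contents `c`). The API version number and the helper name
`count_elements` are assumptions for illustration.

```python
import json
from collections import Counter

# Hand-written example: a level-1 header followed by one paragraph.
DOC_JSON = """
{"pandoc-api-version": [1, 17, 5, 1],
 "meta": {},
 "blocks": [
   {"t": "Header",
    "c": [1, ["greeting", [], []], [{"t": "Str", "c": "Hello"}]]},
   {"t": "Para",
    "c": [{"t": "Str", "c": "Some"},
          {"t": "Space"},
          {"t": "Emph", "c": [{"t": "Str", "c": "text"}]}]}
 ]}
"""


def count_elements(node, counts=None):
    """Recursively count AST elements by their type tag."""
    if counts is None:
        counts = Counter()
    if isinstance(node, list):
        for item in node:
            count_elements(item, counts)
    elif isinstance(node, dict):
        if "t" in node:
            counts[node["t"]] += 1
        # Contents may hold nested Blocks and Inlines (or be absent).
        count_elements(node.get("c"), counts)
    return counts


doc = json.loads(DOC_JSON)
element_counts = count_elements(doc["blocks"])
print(dict(element_counts))
```

Here `Header` and `Para` are `Block` elements, while `Str`, `Space` and
`Emph` are `Inline` elements nested inside them.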
There are two kinds of filters: JSON filters (which transform a
JSON serialization of the pandoc AST, and may be written in any
language that can parse and emit JSON), and Lua filters (which
use an interface built directly into pandoc, and must be written
in the Lua language). If you are writing your own filters, it
is best to use Lua filters, which are more portable (they
require only pandoc itself) and more efficient. See [Lua
filters](lua-filters.html) for documentation and examples. If
you would prefer to write your filter in another language, see
[Filters](filters.html) for a gentle introduction to JSON
filters.
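
As an illustration of the JSON approach, the following Python sketch
implements the header-capitalizing filter mentioned earlier using only
the standard library. It is a minimal sketch, not an official example:
the function names are hypothetical, and the demonstration feeds the
filter a hand-written JSON document through in-memory streams rather
than through pandoc.

```python
import io
import json


def upper_strs(node):
    """Uppercase the text of every Str element below this node."""
    if isinstance(node, list):
        for item in node:
            upper_strs(item)
    elif isinstance(node, dict):
        if node.get("t") == "Str":
            node["c"] = node["c"].upper()
        else:
            upper_strs(node.get("c"))


def capitalize_headers(node):
    """Find Header elements and uppercase their inline text."""
    if isinstance(node, list):
        for item in node:
            capitalize_headers(item)
    elif isinstance(node, dict):
        if node.get("t") == "Header":
            # Header contents are [level, attributes, inlines].
            upper_strs(node["c"][2])
        else:
            capitalize_headers(node.get("c"))


def run_filter(instream, outstream):
    """Read a JSON AST, transform it, and write the JSON back out."""
    doc = json.load(instream)
    capitalize_headers(doc["blocks"])
    json.dump(doc, outstream)


# Demonstration with in-memory streams and a hand-written document.
sample = {
    "pandoc-api-version": [1, 17, 5, 1],  # illustrative version number
    "meta": {},
    "blocks": [
        {"t": "Header", "c": [1, ["intro", [], []], [{"t": "Str", "c": "Hello"}]]},
        {"t": "Para", "c": [{"t": "Str", "c": "world"}]},
    ],
}
out = io.StringIO()
run_filter(io.StringIO(json.dumps(sample)), out)
result = json.loads(out.getvalue())
print(result["blocks"][0]["c"][2])
```

To use this as an actual filter with `pandoc --filter`, the script would
replace the demonstration with `run_filter(sys.stdin, sys.stdout)`
(after `import sys`).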
There's a repository of Lua filters at
[pandoc/lua-filters](https://github.com/pandoc/lua-filters)
on GitHub. A number of pandoc filters, written in
Haskell, are available on
[Hackage](https://hackage.haskell.org/packages/search?terms=pandoc+filter)
and can be installed using the `stack` or `cabal` tools.
The wiki also lists [third party
filters](https://github.com/jgm/pandoc/wiki/Pandoc-Filters).
### Example: capitalizing headers
TODO
### Example: code extractor
TODO
## Generic Divs and Spans
TODO
[Divs and Spans](/MANUAL.html#divs-and-spans): generic blocks
that can be transformed with filters
### Example: colored text
### Example: custom styles in docx
[Custom Styles in Docx](/MANUAL.html#custom-styles-in-docx)
## Raw attributes
TODO
[Generic raw attributes](/MANUAL.html#generic-raw-attribute):
to include raw snippets
## Custom writers
TODO
[Custom writers](/MANUAL.html#custom-writers)
## Custom syntax highlighting
TODO
[Custom syntax highlighting](/MANUAL.html#syntax-highlighting),
provided by the [skylighting
library](https://github.com/jgm/skylighting)
including highlighting styles
: - [Custom Styles in Docx](/MANUAL.html#custom-styles-in-docx)
- If you're converting from Markdown, see
- [Generic raw attributes](/MANUAL.html#generic-raw-attribute):
to include raw snippets
- [Divs and Spans](/MANUAL.html#divs-and-spans): generic blocks
that can be transformed with filters
- [Custom syntax highlighting](/MANUAL.html#syntax-highlighting),
provided by the [skylighting
library](https://github.com/jgm/skylighting)
- [Custom writers](/MANUAL.html#custom-writers)
- [Pandoc Extras wiki page](https://github.com/jgm/pandoc/wiki/Pandoc-Extras)