2018-10-16 10:42:48 -07:00


Customizing Pandoc

Authors: Mauro Bieg, John MacFarlane

This document provides a quick overview of the various ways to customize pandoc's output, with links to fuller documentation and some examples.

Templates

When the -s/--standalone option is used, pandoc will generate a standalone document rather than a fragment. For example, in HTML output this will include the <head> element; in LaTeX output, it will include the preamble.

Pandoc comes with a default template for (almost) every output format. A template is a plain text file containing variables that are replaced by text generated by pandoc. For example, the variable $body$ will be replaced by the document body, and $title$ by the title from metadata. Variables are automatically populated with the contents of like-named metadata fields (with proper escaping). (See YAML metadata blocks for documentation on setting metadata fields in pandoc markdown documents; the command-line option --metadata can also be used.) Values for variables can also be specified directly on the command line using --variable, which does no escaping.
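To make the escaping distinction concrete, here is a toy model of variable substitution in Python. It is a sketch, not pandoc's template engine. It only understands plain $name$ placeholders (no conditionals, loops, partials, or $$ escapes) and only models HTML escaping. The names render and extra are hypothetical.

```python
import html
import re

# Toy model only: plain $name$ placeholders, HTML escaping, nothing else.
PLACEHOLDER = re.compile(r"\$([A-Za-z][A-Za-z0-9_-]*)\$")


def render(template, metadata, variables):
    """Fill $name$ placeholders from metadata (escaped) and variables (raw)."""
    # Metadata values are escaped for the output format (HTML in this toy).
    values = {name: html.escape(text) for name, text in metadata.items()}
    # Variable values are inserted verbatim. On a name clash this toy simply
    # lets the variable win; it makes no claim about pandoc's precedence rules.
    values.update(variables)
    # Placeholders with no value are replaced by the empty string.
    return PLACEHOLDER.sub(lambda m: values.get(m.group(1), ""), template)
```

For example, `render("<title>$title$</title>$extra$", {"title": "Q&A"}, {"extra": "<b>hi</b>"})` yields `<title>Q&amp;A</title><b>hi</b>`. The metadata ampersand is escaped, but the raw HTML variable passes through untouched.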

To look at the default template for an output format, run pandoc -D FORMAT, where FORMAT is the name of the format. You can also replace the defaults with your own custom templates, either by using the --template option or by putting the custom template in your user data directory (on Linux and macOS, ~/.pandoc/templates/).

Note that in many cases you can avoid the need for a custom template by making use of the --include-in-header, --include-before-body, and --include-after-body options.

For more information, see Templates in the pandoc manual.

Example: adding structured author data to HTML

TODO

Example: generating documents from YAML metadata

TODO

Reference docx/pptx/odt

For docx, pptx or odt documents, things are a bit more complicated. Instead of a single template file, you need to provide a customized reference.docx/pptx/odt. See the manual for the --reference-doc option.
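The manual's suggested workflow for docx is to start from pandoc's own default reference document:

    pandoc -o custom-reference.docx --print-default-data-file reference.docx

You then modify its styles in Word and pass the file back with --reference-doc. A .docx file is a ZIP archive whose style definitions live in word/styles.xml. The Python sketch below lists the paragraph style names that file defines, which can help when deciding which styles to edit. The function list_styles is a hypothetical helper. It only reads the styles part and does not modify the document.

```python
import xml.etree.ElementTree as ET
import zipfile

# Namespace of WordprocessingML elements such as <w:style> and <w:name>.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def list_styles(docx_path, style_type="paragraph"):
    """Return the display names of all styles of one type in a .docx file."""
    # A .docx is a ZIP archive; style definitions are in word/styles.xml.
    with zipfile.ZipFile(docx_path) as archive:
        root = ET.fromstring(archive.read("word/styles.xml"))
    names = []
    for style in root.iter(f"{W}style"):
        if style.get(f"{W}type") != style_type:
            continue  # skip character, table, and numbering styles
        name = style.find(f"{W}name")
        if name is not None:
            names.append(name.get(f"{W}val"))
    return names
```

Passing style_type="character" lists character styles instead of paragraph styles.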

Example: changing the font and line spacing in a Word docx

TODO

Filters

Templates are very powerful, but they are only a scaffold into which your document's body text is placed. You cannot directly change the body text using the template.

If you need to change the actual body text, you can use a pandoc filter. A filter is a small program that transforms the document between the parsing and writing phases, while it is still in pandoc's native format. For example, a filter might find all the Header elements of a document and capitalize their text.

Pandoc's native representation of a document is an abstract syntax tree (AST), not unlike the HTML DOM. It is documented in the Text.Pandoc.Definition module of the pandoc-types package. A Pandoc document consists of metadata (Meta) and a list of Blocks. The Blocks, in turn, are composed of other Blocks and Inline elements. (Block elements are things like paragraphs, lists, headers, and code blocks. Inline elements are individual words, links, emphasis, and so on.) Filters operate on these elements. You can use pandoc -t native to inspect the AST that pandoc produces for a given input.
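The following Python sketch shows what the JSON serialization of this tree looks like and how Blocks nest Inlines. It walks a document and counts elements by their type tag. It assumes the shape produced by pandoc -t json in pandoc 2.x: a top-level object with pandoc-api-version, meta, and blocks, where each element is an object with a "t" tag and usually a "c" payload. The API version number in the sample is illustrative, and count_elements is a hypothetical helper.

```python
import json
from collections import Counter

# JSON AST for the markdown "# Hello world\n\n*hi*" (API version illustrative).
SAMPLE = json.loads("""
{"pandoc-api-version": [1, 17, 5, 1], "meta": {},
 "blocks": [
   {"t": "Header", "c": [1, ["hello-world", [], []],
     [{"t": "Str", "c": "Hello"}, {"t": "Space"}, {"t": "Str", "c": "world"}]]},
   {"t": "Para", "c": [{"t": "Emph", "c": [{"t": "Str", "c": "hi"}]}]}
 ]}
""")


def count_elements(node, counts=None):
    """Count AST elements by their "t" tag, recursing through the whole tree."""
    if counts is None:
        counts = Counter()
    if isinstance(node, dict):
        if "t" in node:
            counts[node["t"]] += 1
        for value in node.values():
            count_elements(value, counts)
    elif isinstance(node, list):
        for item in node:
            count_elements(item, counts)
    return counts
```

For a real document, load the output of pandoc -t json input.md with json.load instead of using SAMPLE. Applied to SAMPLE["blocks"], the count shows three Str elements nested inside one Header and one Emph (which sits inside a Para), plus one Space.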

There are two kinds of filters: JSON filters (which transform a JSON serialization of the pandoc AST, and may be written in any language that can parse and emit JSON), and Lua filters (which use an interface built directly into pandoc, and must be written in the Lua language). If you are writing your own filters, it is best to use Lua filters, which are more portable (they require only pandoc itself) and more efficient. See Lua filters for documentation and examples. If you would prefer to write your filter in another language, see Filters for a gentle introduction to JSON filters.
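As a rough sketch of a JSON filter, here is a Python version of the header-capitalizing filter mentioned earlier. It uses only the standard library rather than a filter helper library. It assumes the pandoc 2.x JSON layout, in which a Header's payload is [level, attr, inlines]. The names capitalize_headers, upper_str, and main are hypothetical. Because the walk visits the whole tree, headers nested inside Divs or block quotes are also handled.

```python
#!/usr/bin/env python3
import json
import sys


def upper_str(node):
    """Recursively upper-case the text of every Str element under node."""
    if isinstance(node, dict):
        if node.get("t") == "Str":
            return {"t": "Str", "c": node["c"].upper()}
        return {key: upper_str(value) for key, value in node.items()}
    if isinstance(node, list):
        return [upper_str(item) for item in node]
    return node  # strings, numbers, None: left unchanged


def capitalize_headers(node):
    """Return a copy of a pandoc JSON AST with all Header text upper-cased."""
    if isinstance(node, dict):
        if node.get("t") == "Header":
            level, attr, inlines = node["c"]  # Header payload: [level, attr, inlines]
            return {"t": "Header", "c": [level, attr, upper_str(inlines)]}
        return {key: capitalize_headers(value) for key, value in node.items()}
    if isinstance(node, list):
        return [capitalize_headers(item) for item in node]
    return node


def main():
    """Read the JSON AST from stdin and write the transformed AST to stdout."""
    json.dump(capitalize_headers(json.load(sys.stdin)), sys.stdout)
```

To use it as a filter, save it as, say, caps.py with a call to main() at the bottom. Then run something like pandoc --filter ./caps.py input.md -o output.html. Pandoc sends the document to the filter as JSON on stdin and reads the transformed JSON from stdout. Only Str text is changed, so identifiers, link targets, and inline code stay as they were.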

There's a repository of Lua filters at pandoc/lua-filters on GitHub. A number of pandoc filters written in Haskell are available on Hackage and can be installed using the stack or cabal tools. The wiki also lists third-party filters.

Example: capitalizing headers

TODO

Example: code extractor

TODO

Generic Divs and Spans

TODO Divs and Spans: generic block (Div) and inline (Span) containers that can be transformed with filters

Example: colored text
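One way to get colored text is a filter that rewrites Spans carrying a color attribute. In pandoc markdown these are written as bracketed spans such as [warning]{color="red"}. The Python sketch below operates on the JSON AST. When the output format is latex, it replaces each such Span with raw LaTeX \textcolor commands; for other formats it leaves the document unchanged. The color attribute name and the helpers colorize and _expand are assumptions of this sketch, not anything built into pandoc. The LaTeX output also needs the xcolor package, for example via --include-in-header. The Span's identifier and classes are dropped.

```python
def colorize(node, fmt):
    """Walk a pandoc JSON AST, expanding colored Spans for LaTeX output."""
    if isinstance(node, list):
        result = []
        for item in node:
            # One element may expand into several, so splice into the list.
            result.extend(_expand(item, fmt))
        return result
    if isinstance(node, dict):
        return {key: colorize(value, fmt) for key, value in node.items()}
    return node


def _expand(item, fmt):
    """Return the list of elements that should replace item."""
    if isinstance(item, dict) and item.get("t") == "Span" and fmt == "latex":
        (_ident, _classes, attributes), inlines = item["c"]  # Span: [attr, inlines]
        color = dict(attributes).get("color")
        if color is not None:
            # The color value is inserted into LaTeX verbatim (not escaped).
            return (
                [{"t": "RawInline", "c": ["latex", "\\textcolor{%s}{" % color]}]
                + colorize(inlines, fmt)
                + [{"t": "RawInline", "c": ["latex", "}"]}]
            )
    return [colorize(item, fmt)]
```

A JSON filter receives the target format as its first command-line argument, so a main function built around this sketch would call colorize(doc, sys.argv[1]).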

Example: custom styles in docx

See Custom Styles in Docx in the pandoc manual.

Raw attributes

TODO Generic raw attributes: to include raw snippets

Custom writers

TODO Custom writers

Custom syntax highlighting

TODO Custom syntax highlighting, provided by the skylighting library, including highlighting styles
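For highlighting styles, pandoc can print a built-in style as a JSON theme file, for example with pandoc --print-highlight-style pygments > my.theme. You can edit that file and pass it back with --highlight-style my.theme. The Python sketch below recolors comments in such a theme. It assumes the theme has a top-level text-styles object keyed by token type (such as Comment or Keyword), where each entry has a text-color field, as in the themes pandoc prints. The helpers recolor_comments and recolor_file are hypothetical.

```python
import json


def recolor_comments(theme, color):
    """Return a copy of a highlighting theme with comment text in `color`."""
    styles = dict(theme.get("text-styles", {}))  # shallow copy of the style table
    comment = dict(styles.get("Comment", {}))    # copy so the input is not mutated
    comment["text-color"] = color
    styles["Comment"] = comment
    return {**theme, "text-styles": styles}


def recolor_file(src, dst, color):
    """Read a .theme file, recolor its comments, and write the result to dst."""
    with open(src, encoding="utf-8") as f:
        theme = json.load(f)
    with open(dst, "w", encoding="utf-8") as f:
        json.dump(recolor_comments(theme, color), f, indent=4)
```

Only the Comment entry changes. All other token styles and the theme's global colors are copied through as they were.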