Added introduction and lightly edited rest of lua-filters document.
See #3608.
parent c462fcafef
commit f2dfb3f23b
1 changed file with 151 additions and 72 deletions
|
@@ -1,86 +1,156 @@
|
||||||
Lua Filters
|
% Pandoc Lua Filters
|
||||||
===========
|
% Albert Krewinkel, John MacFarlane
|
||||||
|
% August 21, 2017
|
||||||
|
|
||||||
Pandoc expects lua files to return a list of filters. The filters in that list
|
# Introduction
|
||||||
are called sequentially, each on the result of the previous filter. If there is
|
|
||||||
no value returned by the filter script, then pandoc will try to generate a
|
|
||||||
filter by collecting all top-level functions whose names correspond to those of
|
|
||||||
pandoc elements (e.g., `Str`, `Para`, `Meta`, or `Pandoc`).
|
|
||||||
|
|
||||||
Filters are expected to be put into separate files and are passed via the
|
Pandoc has long supported filters, which allow the pandoc
|
||||||
`--lua-filter` command-line argument. E.g., if a filter is defined in a file
|
abstract syntax tree (AST) to be manipulated between the parsing
|
||||||
`current-date.lua`, then it would be applied like this:
|
and the writing phase. Traditional pandoc filters accept a JSON
|
||||||
|
representation of the pandoc AST and produce an altered JSON
|
||||||
|
representation of the AST. They may be written in any
|
||||||
|
programming language, and invoked from pandoc using the
|
||||||
|
`--filter` option.
|
||||||
|
|
||||||
|
Although traditional filters are very flexible, they have a
|
||||||
|
couple of disadvantages. First, there is some overhead in
|
||||||
|
writing JSON to stdout and reading it from stdin (twice,
|
||||||
|
once on each side of the filter). Second, whether a filter
|
||||||
|
will work will depend on details of the user's environment.
|
||||||
|
A filter may require an interpreter for a certain programming
|
||||||
|
language to be available, as well as a library for manipulating
|
||||||
|
the pandoc AST in JSON form. One cannot simply provide a filter
|
||||||
|
that can be used by anyone who has a certain version of the
|
||||||
|
pandoc executable.
|
||||||
|
|
||||||
|
Starting with pandoc 2.0, we have made it possible to write
|
||||||
|
filters in lua without any external dependencies at all.
|
||||||
|
A lua interpreter and a lua library for creating pandoc filters
|
||||||
|
are built into the pandoc executable. Pandoc data types
|
||||||
|
are marshalled to lua directly, avoiding the overhead of writing
|
||||||
|
JSON to stdout and reading it from stdin.
|
||||||
|
|
||||||
|
Here is an example of a lua filter that converts strong emphasis
|
||||||
|
to small caps:
|
||||||
|
|
||||||
|
``` lua
|
||||||
|
return {
|
||||||
|
{
|
||||||
|
Strong = function (elem)
|
||||||
|
return pandoc.SmallCaps(elem.c)
|
||||||
|
end,
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
or equivalently,
|
||||||
|
|
||||||
|
``` lua
|
||||||
|
function Strong(elem)
|
||||||
|
return pandoc.SmallCaps(elem.c)
|
||||||
|
end
|
||||||
|
```
|
||||||
|
|
||||||
|
This says: walk the AST, and when you find a Strong element,
|
||||||
|
replace it with a SmallCaps element with the same content.
|
||||||
|
|
||||||
|
To run it, save it in a file, say `smallcaps.lua`, and invoke
|
||||||
|
pandoc with `--lua-filter=smallcaps.lua`.
|
||||||
|
|
||||||
|
Here's a quick performance comparison, using a version of the
|
||||||
|
pandoc manual, MANUAL.txt, and versions of the same filter
|
||||||
|
written in compiled Haskell (`smallcaps`) and interpreted Python
|
||||||
|
(`smallcaps.py`):
|
||||||
|
|
||||||
|
| Command | Time |
|
||||||
|
|--------------------------------------------------|------:|
|
||||||
|
| `pandoc MANUAL.txt` | 1.01s |
|
||||||
|
| `pandoc MANUAL.txt --filter ./smallcaps` | 1.36s |
|
||||||
|
| `pandoc MANUAL.txt --filter ./smallcaps.py` | 1.40s |
|
||||||
|
| `pandoc MANUAL.txt --lua-filter ./smallcaps.lua` | 1.03s |
|
||||||
|
|
||||||
|
As you can see, the lua filter avoids the substantial overhead
|
||||||
|
associated with marshalling to and from JSON over a pipe.
|
||||||
|
|
||||||
|
# Lua filter structure
|
||||||
|
|
||||||
|
Lua filters are tables with element names as keys and values
|
||||||
|
consisting of functions acting on those elements.
|
||||||
|
|
||||||
|
Filters are expected to be put into separate files and are
|
||||||
|
passed via the `--lua-filter` command-line argument. For
|
||||||
|
example, if a filter is defined in a file `current-date.lua`,
|
||||||
|
then it would be applied like this:
|
||||||
|
|
||||||
pandoc --lua-filter=current-date.lua -f markdown MANUAL.txt
|
pandoc --lua-filter=current-date.lua -f markdown MANUAL.txt
|
||||||
|
|
||||||
The `--lua-filter` can be supplied multiple times, causing the filters to be
|
The `--lua-filter` can be supplied multiple times, causing the
|
||||||
applied sequentially in the order they were given. If other, non-Lua filters are
|
filters to be applied sequentially in the order they were given.
|
||||||
given as well (via `--filter`), then those are executed *after* all Lua filters
|
If other, non-Lua filters are given as well (via `--filter`),
|
||||||
have been applied.
|
then those are executed *after* all Lua filters have been
|
||||||
|
applied.
|
||||||
|
|
||||||
Lua Filter Structure
|
Pandoc expects each lua file to return a list of filters. The
|
||||||
--------------------
|
filters in that list are called sequentially, each on the result
|
||||||
|
of the previous filter. If there is no value returned by the
|
||||||
|
filter script, then pandoc will try to generate a single filter
|
||||||
|
by collecting all top-level functions whose names correspond to
|
||||||
|
those of pandoc elements (e.g., `Str`, `Para`, `Meta`, or
|
||||||
|
`Pandoc`). (That is why the two examples above are equivalent.)
|
||||||
|
|
||||||
Lua filters are tables with element names as keys and values consisting
|
For each filter, the document is traversed and each element
|
||||||
of functions acting on those elements.
|
subjected to the filter. Elements for which the filter contains
|
||||||
|
an entry (i.e. a function of the same name) are passed to the lua
|
||||||
|
element filtering function. In other words, filter entries will
|
||||||
|
be called for each corresponding element in the document,
|
||||||
|
getting the respective element as input.
|
||||||
|
|
||||||
Filter Application
|
The element function's output must be an element of the same
|
||||||
------------------
|
type as the input. This means a filter function acting on an
|
||||||
|
inline element must return an inline, and a block element must
|
||||||
For each filter, the document is traversed and each element subjected to
|
remain a block element after filter application. Pandoc will
|
||||||
the filter. Elements for which the filter contains an entry (i.e. a
|
throw an error if this condition is violated.
|
||||||
function of the same name) are passed to lua element filtering function.
|
|
||||||
In other words, filter entries will be called for each corresponding
|
|
||||||
element in the document, getting the respective element as input.
|
|
||||||
|
|
||||||
The element function's output must be an element of the same type as the
|
|
||||||
input. This means a filter function acting on an inline element must
|
|
||||||
return an inline, and a block element must remain a block element after
|
|
||||||
filter application. Pandoc will throw an error if this condition is
|
|
||||||
violated.
|
|
||||||
|
|
||||||
Elements without matching functions are left untouched.
|
Elements without matching functions are left untouched.
|
||||||
|
|
||||||
See [module documentation](pandoc-module.html) for a list of pandoc
|
See [module documentation](pandoc-module.html) for a list of pandoc
|
||||||
elements.
|
elements.
|
||||||
|
|
||||||
|
# Pandoc Module
|
||||||
|
|
||||||
Pandoc Module
|
The `pandoc` lua module is loaded into the filter's lua
|
||||||
=============
|
environment and provides a set of functions and constants to
|
||||||
|
make creation and manipulation of elements easier. The global
|
||||||
|
variable `pandoc` is bound to the module and should generally
|
||||||
|
not be overwritten for this reason.
|
||||||
|
|
||||||
The `pandoc` lua module is loaded into the filter's lua environment and
|
Two major functionalities are provided by the module: element
|
||||||
provides a set of functions and constants to make creation and
|
creator functions and access to some of pandoc's main
|
||||||
manipulation of elements easier. The global variable `pandoc` is bound
|
functionalities.
|
||||||
to the module and should generally not be overwritten for this reason.
|
|
||||||
|
|
||||||
Two major functionalities are provided by the module: element creator
|
## Element creation
|
||||||
functions and access to some of pandoc's main functionalities.
|
|
||||||
|
|
||||||
Element creation
|
Element creator functions like `Str`, `Para`, and `Pandoc` are
|
||||||
----------------
|
designed to allow easy creation of new elements that are simple
|
||||||
|
to use and can be read back from the lua environment.
|
||||||
|
Internally, pandoc uses these functions to create the lua
|
||||||
|
objects which are passed to element filter functions. This means
|
||||||
|
that elements created via this module will behave exactly as
|
||||||
|
those elements accessible through the filter function parameter.
|
||||||
|
|
||||||
Element creator functions like `Str`, `Para`, and `Pandoc` are designed to
|
## Exposed pandoc functionality
|
||||||
allow easy creation of new elements that are simple to use and can be
|
|
||||||
read back from the lua environment. Internally, pandoc uses these
|
|
||||||
functions to create the lua objects which are passed to element filter
|
|
||||||
functions. This means that elements created via this module will behave
|
|
||||||
exactly as those elements accessible through the filter function parameter.
|
|
||||||
|
|
||||||
Exposed pandoc functionality
|
Some filters will require access to certain functions provided
|
||||||
----------------------------
|
by pandoc. This is currently limited to the `read` function
|
||||||
|
which allows strings to be parsed into pandoc documents from within
|
||||||
|
the lua filter.
|
||||||
|
|
||||||
Some filters will require access to certain functions provided by
|
# Examples
|
||||||
pandoc. This is currently limited to the `read` function which allows to
|
|
||||||
parse strings into pandoc documents from within the lua filter.
|
|
||||||
|
|
||||||
|
## Macro substitution.
|
||||||
|
|
||||||
Examples
|
The following filter converts the string `{{helloworld}}` into
|
||||||
--------
|
emphasized text "Hello, World".
|
||||||
|
|
||||||
### Macro substitution.
|
|
||||||
|
|
||||||
The following filter converts strings containing `{{helloworld}}` with
|
|
||||||
emphasized text.
|
|
||||||
|
|
||||||
``` lua
|
``` lua
|
||||||
return {
|
return {
|
||||||
|
@@ -96,9 +166,11 @@ return {
|
||||||
}
|
}
|
||||||
```
|
```
|
||||||
|
|
||||||
### Default metadata file
|
## Default metadata file
|
||||||
|
|
||||||
Using the metadata from an external file as default values.
|
This filter causes metadata defined in an external file
|
||||||
|
(`metadata-file.yaml`) to be used as default values in
|
||||||
|
a document's metadata:
|
||||||
|
|
||||||
``` lua
|
``` lua
|
||||||
-- read metadata file into string
|
-- read metadata file into string
|
||||||
|
@@ -122,7 +194,10 @@ return {
|
||||||
}
|
}
|
||||||
```
|
```
|
||||||
|
|
||||||
### Setting the date in the metadata
|
## Setting the date in the metadata
|
||||||
|
|
||||||
|
This filter sets the date in the document's metadata to the
|
||||||
|
current date:
|
||||||
|
|
||||||
```lua
|
```lua
|
||||||
function Meta(m)
|
function Meta(m)
|
||||||
|
@@ -131,7 +206,7 @@ function Meta(m)
|
||||||
end
|
end
|
||||||
```
|
```
|
||||||
|
|
||||||
### Extracting information about links
|
## Extracting information about links
|
||||||
|
|
||||||
This filter prints a table of all the URLs linked to
|
This filter prints a table of all the URLs linked to
|
||||||
in the document, together with the number of links to
|
in the document, together with the number of links to
|
||||||
|
@@ -168,11 +243,15 @@ function Doc (blocks, meta)
|
||||||
end
|
end
|
||||||
```
|
```
|
||||||
|
|
||||||
### Replacing placeholders with their metadata value
|
## Replacing placeholders with their metadata value
|
||||||
|
|
||||||
Lua filter functions are run in the order *Inlines → Blocks → Meta → Pandoc*.
|
Lua filter functions are run in the order
|
||||||
Passing information from a higher level (e.g., metadata) to a lower level (e.g.,
|
|
||||||
inlines) is still possible by using two filters living in the same file:
|
> *Inlines → Blocks → Meta → Pandoc*.
|
||||||
|
|
||||||
|
Passing information from a higher level (e.g., metadata) to a
|
||||||
|
lower level (e.g., inlines) is still possible by using two
|
||||||
|
filters living in the same file:
|
||||||
|
|
||||||
``` lua
|
``` lua
|
||||||
local vars = {}
|
local vars = {}
|
||||||
|
@@ -200,8 +279,8 @@ If the contents of file `occupations.md` is
|
||||||
|
|
||||||
``` markdown
|
``` markdown
|
||||||
---
|
---
|
||||||
name: John MacFarlane
|
name: Samuel Q. Smith
|
||||||
occupation: Professor of Philosophy
|
occupation: Professor of Phrenology
|
||||||
---
|
---
|
||||||
|
|
||||||
Name
|
Name
|
||||||
|
@@ -218,10 +297,10 @@ then running `pandoc --lua-filter=meta-vars.lua occupations.md` will output:
|
||||||
``` html
|
``` html
|
||||||
<dl>
|
<dl>
|
||||||
<dt>Name</dt>
|
<dt>Name</dt>
|
||||||
<dd><p><span>John MacFarlane</span></p>
|
<dd><p><span>Samuel Q. Smith</span></p>
|
||||||
</dd>
|
</dd>
|
||||||
<dt>Occupation</dt>
|
<dt>Occupation</dt>
|
||||||
<dd><p><span>Professor of Philosophy</span></p>
|
<dd><p><span>Professor of Phrenology</span></p>
|
||||||
</dd>
|
</dd>
|
||||||
</dl>
|
</dl>
|
||||||
```
|
```
|
||||||
|
|