Removed plugin support.

Plugins will probably be added in a later version,
but for now we want to avoid the heavy GHC API dependency.
Also, plugins are very slow with the current system.

git-svn-id: https://pandoc.googlecode.com/svn/trunk@1548 788f1e2b-df1e-0410-8736-df70ead52e1b
This commit is contained in:
fiddlosopher 2009-02-24 22:59:44 +00:00
parent bb639d5506
commit 5f3a147dfd
6 changed files with 8 additions and 332 deletions

217
README
View file

@ -309,14 +309,6 @@ For further documentation, see the `pandoc(1)` man page.
repeatedly to include multiple files. They will be included in the
order specified.
`-P` *MODULE[,MODULE...]*, `--plugins`*=MODULE[,MODULE...]*
: specifies plugins to load, by module name or source file pathname.
Plugins should export a function `transform` of type `a -> a`
or `a -> IO a`, where `a` is `Inline`, `Block`, `Pandoc`,
`[Inline]`, or `[Block]`. This function will be used to transform
the pandoc document after it is parsed by the reader and before it
is written out by the writer. (See below on [Plugins](#plugins).)
`-T` or `--title-prefix` *string*
: includes *string* as a prefix at the beginning of the title that
appears in the HTML header (but not in the title as it appears at
@ -1124,212 +1116,3 @@ ordinary HTML (without bird tracks).
writes HTML with the Haskell code in bird tracks, so it can be copied
and pasted as literate Haskell source.
Plugins
=======
Pandoc's plugin system allows users to modify pandoc's behavior by writing
short Haskell programs. To how this works, and why it is useful, we
need to understand that pandoc transforms one format (the source
format) into another (the target format) by first converting from the
source format into a Haskell data structure representing the document,
and then converting this data structure into the target format. For
example:
-------------------------------------------------------------------------------
Document Format Contents
------------ --------- -----------------------------------------------------
source markdown `Hello *world*.`
↓ ↓ ↓
intermediate native `Pandoc (Meta [] [] "")
[Para [Str "Hello",Space,Emph [Str "world"],Str "."]]`
↓ ↓ ↓
target HTML `<p>Hello <em>world</em>.</p>`
-------------------------------------------------------------------------------
We can use standard text-processing tools (`perl`, `sed`, `awk`, etc.)
to modify the source or target documents. But what if we want to modify
the intermediate representation -- the parsed document -- before it is
written to the target format? That's where plugins are needed.
A plugin is a Haskell module that exports a function `transform`, which
will be used to transform the native representation, after it is generated
by the reader, but before it has been transformed by the writer.
An example will help make this clearer. Suppose we want to use pandoc with
the WordPress blog engine. WordPress provides support for LaTeX math, but
instead of `$e = mc^2$`, WordPress wants `$LaTeX e = mc^2$`. Prior to plugins,
there was no good way to make pandoc do this. We could have tried using
regex replacements on the markdown input or HTML output, but this would have
been error-prone: if someone writes `$e = mc^2$` in a code block, for
example, we wouldn't want to insert `LaTeX` there. There's no good way to
identify the math chunks without parsing the whole document. And pandoc
is already doing that, so why not make use of this work? By writing a
plugin, we can. Here's the whole plugin:
~~~ {.haskell}
-- WordPressPlugin.hs
module WordPressPlugin (transform) where
import Text.Pandoc
transform :: Inline -> Inline
transform (Math x y) = Math x $ "LaTeX " ++ y
transform x = x
~~~
This is a Haskell program, but a very short one. The lines
~~~ {.haskell}
module WordPressPlugin (transform) where
import Text.Pandoc
~~~
just define the name of the module (`WordPressPlugin`), the names of any
exported functions (for a plugin, this will always just be `transform`),
and the modules that will be used in the program itself (`Text.Pandoc`).
Every plugin must export a function named `transform`.
The real meat of the program is the three-line definition of `transform`:
~~~ {.haskell}
transform :: Inline -> Inline
transform (Math x y) = Math x $ "LaTeX " ++ y
transform x = x
~~~
The first line defines the type of the function: it is a function that
takes an `Inline` element and returns an `Inline` element. (For the definition
of `Inline`, see the module `Text.Pandoc.Definition`.) The `transform`
function in a plugin need not be `Inline -> Inline`, but it must have
type `a -> a` or `a -> IO a`, where `a` is `Pandoc`, `Block`, `Inline`,
`[Block]`, or `[Inline]`.
The next line says that when the input matches the pattern `Math x y`,
the string `LaTeX ` should be inserted at the beginning of `y`. (The `x`
just specifies whether the math element is inline or display math, so
we leave it alone.) The last line says, in effect, that the `transform`
function has no effect on any other kind of `Inline` element -- it just
passes it through. When the plugin is applied, this transformation will
be used on every `Inline` element in the document, and `LaTeX ` will be
inserted where needed in math elements.
To use this plugin, we just specify the module (or alternatively the filename)
with the `--plugins` option:
% echo "Hello, $e=mc^2$." | pandoc -m --plugins=WordPressPlugin.hs
<p
>Hello, <span class="LaTeX"
>$LaTeX e=mc^2$</span
>.</p
>
Let's look at a more complex example, involving IO. Suppose we want to include
some [graphviz](http://www.graphviz.org/) diagrams in our document.
Of course, we could use a Makefile to generate the diagrams, then use
regular images in our document. But wouldn't it be nicer just to include
the graphviz code in the document itself, perhaps in a specially marked
delimited code block?
~~~ {.dot name="diagram1"}
graph G {
e
subgraph clusterA {
a -- b;
subgraph clusterC {
C -- D;
}
}
subgraph clusterB {
d -- f
}
d -- D
e -- clusterB
clusterC -- clusterB
}
~~~
This can be accomplished by a plugin:
~~~ {.haskell}
-- DotPlugin.hs
module DotPlugin (transform) where
import Text.Pandoc
import Text.Pandoc.Shared
import System.Process (readProcess)
import Data.Char (ord)
-- from the utf8-string package on HackageDB:
import Data.ByteString.Lazy.UTF8 (fromString)
-- from the SHA package on HackageDB:
import Data.Digest.Pure.SHA
transform :: Block -> IO Block
transform (CodeBlock (id, classes, namevals) contents) | "dot" `elem` classes = do
let (name, outfile) = case lookup "name" namevals of
Just fn -> ([Str fn], fn ++ ".png")
Nothing -> ([], uniqueName contents ++ ".png")
result <- readProcess "dot" ["-Tpng"] contents
writeFile outfile result
return $ Para [Image name (outfile, "")]
transform x = return x
-- | Generate a unique filename given the file's contents.
uniqueName :: String -> String
uniqueName = showDigest . sha1 . fromString
~~~
The heart of this plugin is the `transform` function, which converts a `Block`
to a `Block`. Again, there are two clauses, one for code blocks that are marked
with the "dot" class, one for all other blocks. Code blocks with ".dot" are
replaced with links to an image file; this file is generated by running
`dot -Tpng` on the contents of the code block.
Because `transform` performs file reads and writes, it needs to be in the
IO monad, hence the type: `Block -> IO Block`.
One more example. Suppose we want emphasized text to be CAPITALIZED
instead of italicized. We could use a plugin:
~~~ {.haskell}
module CapitalizeEmphasisPlugin (transform) where
import Text.Pandoc
import Data.Char (toUpper)
transform :: [Inline] -> [Inline]
transform (Emph xs : ys) = processWith capStr xs ++ transform ys
transform (x : ys) = x : transform ys
transform [] = []
capStr :: Inline -> Inline
capStr (Str x) = Str (map toUpper x)
capStr x = x
~~~
Here `transform` converts a whole list of `Inline` elements to another
such list. The key clause is
~~~ {.haskell}
transform (Emph xs : ys) = processWith capStr xs ++ transform ys
~~~
This applies the `capStr` function recursively to all inlines in the
list of emphasized inlines and puts the transformed list in place
of the original. `capStr` is a simple `Inline` transformation that
capitalizes `Str` elements and leaves everything else alone. The
function `processWith`, defined in `Text.Pandoc.Definition`, uses some
`Data.Generics` magic to apply its argument (here `capStr`) to every
`Inline` element in a list, including elements that are deeply buried in
other elements. Thus
processWith captStr [Str "one", Strong [Str "two", Space]] ==>
[Str "ONE", Strong [Str "TWO", Space]]
There are other sample plugins in the `plugins` subdirectory of the
pandoc source code.
**Note:** Do not attempt to use plugins when running pandoc in the
directory containing pandoc's source code. The interpreter will try to
load the files directly from the source code, rather than reading the compiled
versions, and pandoc will hang.

View file

@ -16,19 +16,9 @@ pandoc (1.2)
+ Added tests (run only if highlighting support compiled in).
+ Documented lhs features in man page and README.
* Added a plugin system, based on hint. This allows users to
write small interpreted Haskell programs that transform the
AST between the reader and the writer.
+ In Text.Pandoc.Definition, added processWith, processWithM,
and queryWith, and deprecated processPandoc and queryPandoc
for these more general functions, which are useful in writing
plugins.
+ Added module Text.Pandoc.Plugins.
+ Added a --plugins option to pandoc, and code to run the parsed
document through all the plugins.
+ Provided five sample plugin files in the plugins/ directory.
+ Documented --plugin in the pandoc man page and README.
* In Text.Pandoc.Definition, added processWith, processWithM,
and queryWith, and deprecated processPandoc and queryPandoc
for these more general functions.
* Fixed bug in mediawiki writer: improper closing tags in tables.
Thanks to Benct Philip Jonsson for reporting the bug.

View file

@ -181,14 +181,6 @@ to Pandoc. Or use `html2markdown`(1), a wrapper around `pandoc`.
`context`, `docbook`, `man`, `markdown`, `opendocument`,
`rst`, `rtf`).
-P *MODULE[,MODULE...]*, \--plugins=*MODULE[,MODULE...]*
: Specify plugins to load, by module name or source file pathname.
Plugins should export a function `transform` of type `a -> a`
or `a -> IO a`, where `a` is `Inline`, `Block`, `Pandoc`,
`[Inline]`, or `[Block]`. This function will be used to transform
the pandoc document after it is parsed by the reader and before it
is written out by the writer.
-T *STRING*, \--title-prefix=*STRING*
: Specify *STRING* as a prefix to the HTML window title.

View file

@ -38,13 +38,7 @@ Data-Files:
-- documentation
README, INSTALL, COPYRIGHT, BUGS, changelog,
-- wrappers
markdown2pdf, html2markdown, hsmarkdown,
-- sample plugins
plugins/WordPressPlugin.hs,
plugins/CapitalizeEmphasisPlugin.hs,
plugins/IncludeFilePlugin.hs,
plugins/DotPlugin.hs,
plugins/ListLinksPlugin.hs
markdown2pdf, html2markdown, hsmarkdown
Extra-Source-Files:
-- sources for man pages
man/man1/pandoc.1.md, man/man1/markdown2pdf.1.md,
@ -159,7 +153,7 @@ Library
mtl >= 1.1, network >= 2, filepath >= 1.1,
process >= 1, directory >= 1, template-haskell >= 2.2,
bytestring >= 0.9, zip-archive >= 0.1.1,
utf8-string >= 0.3, old-time >= 1, hint >= 0.3.0.0
utf8-string >= 0.3, old-time >= 1
if impl(ghc >= 6.10)
Build-depends: base >= 4, syb
else
@ -178,7 +172,6 @@ Library
Text.Pandoc.CharacterReferences,
Text.Pandoc.Shared,
Text.Pandoc.ODT,
Text.Pandoc.Plugins,
Text.Pandoc.LaTeXMathML,
Text.Pandoc.DefaultHeaders,
Text.Pandoc.Highlighting,

View file

@ -1,69 +0,0 @@
{-
Copyright (C) 2008 John MacFarlane <jgm@berkeley.edu>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
-}
{- |
Module : Text.Pandoc.Pluigns
Copyright : Copyright (C) 2008 John MacFarlane
License : GNU GPL, version 2 or above
Maintainer : John MacFarlane <jgm@berkeley.edu>
Stability : alpha
Portability : portable
Support for plugins.
-}
module Text.Pandoc.Plugins (getPlugin)
where
import Language.Haskell.Interpreter
import Text.Pandoc
import Control.Monad (unless, liftM)
import Control.Monad.Error (throwError)
import Data.List (isInfixOf)
-- | Returns the function named @transform@ in the specified
-- module. The module may be identified either by module name
-- or by path name. The @transform@ function should have type
-- @a -> a@ or @a -> IO a@, where @a@ is an instance of 'Data':
-- for example, @Pandoc -> Pandoc@, @Inline -> IO Inline@,
-- @Block -> Block@, or @[Inline] -> IO [Inline]@.
getPlugin :: String -> IO (Pandoc -> IO Pandoc)
getPlugin modsrc = do
res <- runInterpreter (evaluatePlugin modsrc)
case res of
Right func -> return func
Left (WontCompile xs) -> error $ "WontCompile error for plugin '" ++ modsrc ++ "'\n" ++ unlines (map errMsg xs)
Left (NotAllowed x) -> error $ "NotAllowed error for plugin '" ++ modsrc ++ "'\n" ++ x
Left (UnknownError x) -> error $ "UnknownError for plugin '" ++ modsrc ++ "'\n" ++ x
Left (GhcException x) -> error $ "GhcException for plugin '" ++ modsrc ++ "'\n" ++ x
evaluatePlugin :: String -> Interpreter (Pandoc -> IO Pandoc)
evaluatePlugin modsrc = do
set [installedModulesInScope := False]
loadModules [modsrc]
modnames <- getLoadedModules
setTopLevelModules modnames
setImports ["Prelude", "Text.Pandoc", "Text.Pandoc.Definition"]
exports <- liftM concat $ mapM getModuleExports modnames
unless ((Fun "transform") `elem` exports) $
throwError $ UnknownError $ "The plugin module must define a function 'transform'."
transformType <- typeOf "transform"
if "-> IO" `isInfixOf` transformType
then interpret "processWithM transform" (as :: Pandoc -> IO Pandoc)
else interpret "return . (processWith transform)" (as :: Pandoc -> IO Pandoc)

View file

@ -32,7 +32,7 @@ writers.
module Main where
import Text.Pandoc
import Text.Pandoc.ODT
import Text.Pandoc.Shared ( HTMLMathMethod (..), splitBy, tabFilter, ObfuscationMethod (..) )
import Text.Pandoc.Shared ( HTMLMathMethod (..), tabFilter, ObfuscationMethod (..) )
import Text.Pandoc.Highlighting ( languages )
import System.Environment ( getArgs, getProgName, getEnvironment )
import System.Exit ( exitWith, ExitCode (..) )
@ -48,8 +48,7 @@ import System.IO.UTF8
import Text.CSL
import Text.Pandoc.Biblio
#endif
import Text.Pandoc.Plugins (getPlugin)
import Control.Monad (foldM, when, unless)
import Control.Monad (when, unless)
copyrightMessage :: String
copyrightMessage = "\nCopyright (C) 2006-8 John MacFarlane\n" ++
@ -365,15 +364,6 @@ options =
"FILENAME")
"" -- "File to use for custom header (implies -s)"
, Option "P" ["plugins"]
(ReqArg
(\arg opt -> do
let pluginModules = splitBy ',' arg
plugins <- mapM getPlugin pluginModules
return opt { optPlugins = plugins })
"MODULE[,MODULE...]")
"" -- "Haskell modules"
, Option "T" ["title-prefix"]
(ReqArg
(\arg opt -> return opt { optTitlePrefix = arg,
@ -540,7 +530,6 @@ main = do
, optReferenceLinks = referenceLinks
, optWrapText = wrap
, optSanitizeHTML = sanitize
, optPlugins = plugins
, optEmailObfuscation = obfuscationMethod
#ifdef _CITEPROC
, optBiblioFile = biblioFile
@ -653,9 +642,7 @@ main = do
return doc
#endif
doc'' <- foldM (flip ($)) doc' plugins
let writerOutput = writer writerOptions doc'' ++ "\n"
let writerOutput = writer writerOptions doc' ++ "\n"
case writerName' of
"odt" -> saveOpenDocumentAsODT outputFile sourceDirRelative writerOutput