More progress on using-the-pandoc-api.md.
This commit is contained in:
parent
f82bcc2bf3
commit
07bf48d07b
1 changed files with 241 additions and 35 deletions
|
@ -9,77 +9,283 @@ Detailed API documentation at the level of individual functions
|
|||
and types is available at
|
||||
<https://hackage.haskell.org/package/pandoc>.
|
||||
|
||||
# Pandoc's architecture, and a simple example
|
||||
# Pandoc's architecture
|
||||
|
||||
Pandoc structure, readers, writers.
|
||||
Pandoc is structured as a set of *readers*, which translate
|
||||
various input formats into an abstract syntax tree (the
|
||||
Pandoc AST) representing a structured document, and a set of
|
||||
*writers*, which render this AST into various input formats.
|
||||
Pictorially:
|
||||
|
||||
example of using a reader.
|
||||
```
|
||||
[input format] ==reader==> [Pandoc AST] ==writer==> [output format]
|
||||
```
|
||||
|
||||
example of using a writer.
|
||||
This architecture allows pandoc to perform $M \times n$
|
||||
conversions with $M$ readers and $N$ writers.
|
||||
|
||||
chaining them together.
|
||||
The Pandoc AST is defined in the
|
||||
[pandoc-types](https://hackage.haskell.org/package/pandoc-types)
|
||||
package. You should start by looking at the Haddock
|
||||
documentation for
|
||||
[Text.Pandoc.Definition](https://hackage.haskell.org/package/pandoc-types/docs/Text-Pandoc-Definition.html). As you'll see, a `Pandoc` is
|
||||
composed of some metadata and a list of `Block`s. There are
|
||||
various kinds of `Block`, including `Para` (paragraph),
|
||||
`Header` (section heading), and `BlockQuote`. Some of the
|
||||
`Block`s (like `BlockQuote`) contain lists of `Block`s,
|
||||
while others (like `Para`) contain lists of `Inline`s, and
|
||||
still others (like `CodeBlock`) contain plain text or
|
||||
nothing. `Inline`s are the basic elements of paragraphs.
|
||||
The distinction between `Block` and `Inline` in the type
|
||||
system makes it impossible to represent, for example,
|
||||
a link (`Inline`) whose link text is a block quote (`Block`).
|
||||
This expressive limitation is mostly a help rather than a
|
||||
hindrance, since many of the formats pandoc supports have
|
||||
similar limitations.
|
||||
|
||||
The best way to explore the pandoc AST is to use `pandoc -t
|
||||
native`, which will display the AST correspoding to some
|
||||
Markdown input:
|
||||
|
||||
```
|
||||
% echo -e "1. *foo*\n2. bar" | pandoc -t native
|
||||
[OrderedList (1,Decimal,Period)
|
||||
[[Plain [Emph [Str "foo"]]]
|
||||
,[Plain [Str "bar"]]]]
|
||||
```
|
||||
|
||||
# A simple example
|
||||
|
||||
Here is a simple example of the use of a pandoc reader and
|
||||
writer to perform a conversion inside ghci:
|
||||
|
||||
```
|
||||
import Text.Pandoc
|
||||
import qualified Data.Text as T
|
||||
import qualified Data.Text.IO as TIO
|
||||
|
||||
main :: IO ()
|
||||
main = do
|
||||
result <- runIO $ do
|
||||
doc <- readMarkdown def (T.pack "[testing](url)")
|
||||
writeRST def doc
|
||||
rst <- handleError result
|
||||
TIO.putStrLn rst
|
||||
```
|
||||
|
||||
Some notes:
|
||||
|
||||
1. The first part constructs a conversion pipeline: the input
|
||||
string is passed to `readMarkdown`, and the resulting Pandoc
|
||||
AST (`doc`) is then rendered by `writeRST`. The conversion
|
||||
pipeline is "run" by `runIO`---more on that below.
|
||||
|
||||
2. `result` has the type `Either PandocError Text`. We could
|
||||
pattern-match on this manually, but it's simpler in this
|
||||
context to use the `handleError` function from
|
||||
Text.Pandoc.Error. This exits with an appropriate error
|
||||
code and message if the value is a `Left`, and returns the
|
||||
`Text` if the value is a `Right`.
|
||||
|
||||
# The PandocMonad class
|
||||
|
||||
Pandoc's functions define computations that can be run in
|
||||
any instance of the `PandocMonad` typeclass. Two instances
|
||||
are provided: `PandocIO` and `PandocPure`. The difference is
|
||||
that computations run in `PandocIO` are allowed to do IO
|
||||
(for example, read a file), while computations in `PandocPure`
|
||||
are free of any side effects. `PandocPure` is useful when
|
||||
you want to prevent users from doing anything malicious.
|
||||
Let's look at the types of `readMarkdown` and `writeRST`:
|
||||
|
||||
Here's an example of such a computation, from the module
|
||||
`Text.Pandoc.Class`:
|
||||
```haskell
|
||||
readMarkdown :: PandocMonad m => ReaderOptions -> Text -> m Pandoc
|
||||
|
||||
writeRST :: PandocMonad m => WriterOptions -> Pandoc -> m Text
|
||||
```
|
||||
|
||||
The `PandocMonad m =>` part is a typeclass constraint.
|
||||
It says that `readMarkdown` and `writeRST` define computations
|
||||
that can be used in any instance of the `PandocMonad`
|
||||
type class. `PandocMonad` is defined in the module
|
||||
Text.Pandoc.Class.
|
||||
|
||||
Two instances of `PandocMonad` are provided: `PandocIO` and
|
||||
`PandocPure`. The difference is that computations run in
|
||||
`PandocIO` are allowed to do IO (for example, read a file),
|
||||
while computations in `PandocPure` are free of any side effects.
|
||||
`PandocPure` is useful for sandboxed environments, when you want
|
||||
to prevent users from doing anything malicious. To run the
|
||||
conversion in `PandocIO`, use `runIO` (as above). To run it in
|
||||
`PandocPure`, use `runPure`.
|
||||
|
||||
As you can see from the Haddocks,
|
||||
[Text.Pandoc.Class](https://hackage.haskell.org/package/pandoc/docs/Text-Pandoc-Class.html)
|
||||
exports many auxiliary functions that can be used in any
|
||||
instance of `PandocMonad`. For example:
|
||||
|
||||
```haskell
|
||||
-- | Get the verbosity level.
|
||||
getVerbosity :: PandocMonad m => m Verbosity
|
||||
|
||||
-- | Set the verbosity level.
|
||||
setVerbosity :: PandocMonad m => Verbosity -> m ()
|
||||
|
||||
-- Get the accomulated log messages (in temporal order).
|
||||
getLog :: PandocMonad m => m [LogMessage]
|
||||
getLog = reverse <$> getsCommonState stLog
|
||||
|
||||
-- | Log a message using 'logOutput'. Note that
|
||||
-- 'logOutput' is called only if the verbosity
|
||||
-- level exceeds the level of the message, but
|
||||
-- the message is added to the list of log messages
|
||||
-- that will be retrieved by 'getLog' regardless
|
||||
-- of its verbosity level.
|
||||
report :: PandocMonad m => LogMessage -> m ()
|
||||
|
||||
-- | Fetch an image or other item from the local filesystem or the net.
|
||||
-- Returns raw content and maybe mime type.
|
||||
fetchItem :: PandocMonad m
|
||||
=> String
|
||||
-> m (B.ByteString, Maybe MimeType)
|
||||
|
||||
setResourcePath :: PandocMonad m => [FilePath] -> m ()
|
||||
```
|
||||
|
||||
motivations
|
||||
Class.
|
||||
If we wanted more verbose informational messages
|
||||
during the conversion we defined in the previous
|
||||
section, we could do this:
|
||||
|
||||
# The Pandoc structure
|
||||
|
||||
blocks/inlines
|
||||
|
||||
# Readers and writers
|
||||
|
||||
getReader, getWriter
|
||||
```haskell
|
||||
result <- runIO $ do
|
||||
setVerbosity INFO
|
||||
doc <- readMarkdown def (T.pack "[testing](url)")
|
||||
writeRST def doc
|
||||
```
|
||||
|
||||
# Options
|
||||
|
||||
various reader and writer options you can set.
|
||||
templates
|
||||
extensions
|
||||
The first argument of each reader or writer is for
|
||||
options controlling the behavior of the reader or writer:
|
||||
`ReaderOptions` for readers and `WriterOptions`
|
||||
for writers. These are defined in
|
||||
[Text.Pandoc.Options](https://hackage.haskell.org/package/pandoc/docs/Text-Pandoc-Options.html). It is a good idea to study these
|
||||
options to see what can be adjusted.
|
||||
|
||||
`def` (from Data.Default) denotes a default value for
|
||||
each kind of option. (You can also use `defaultWriterOptions`
|
||||
and `defaultReaderOptions`.) Generally you'll want to use
|
||||
the defaults and modify them only when needed, for example:
|
||||
|
||||
```haskell
|
||||
writeRST def{ writerReferenceLinks = True }
|
||||
```
|
||||
|
||||
Some particularly important options to know about:
|
||||
|
||||
1. `writerTemplate`: By default, this is `Nothing`, which
|
||||
means that a document fragment will be produced. If you
|
||||
want a full document, you need to specify `Just template`,
|
||||
where `template` is a String containing the template's
|
||||
contents (not the path).
|
||||
|
||||
2. `readerExtensions` and `writerExtensions`: These specify
|
||||
the extensions to be used in parsing and rendering.
|
||||
Extensions are defined in [Text.Pandoc.Extensions](https://hackage.haskell.org/package/pandoc/docs/Text-Pandoc-Extensions.html).
|
||||
|
||||
# Builder
|
||||
|
||||
Inlines vs Inline, etc.
|
||||
Monoid
|
||||
|
||||
example: report from CSV data
|
||||
Inlines vs Inline, etc.
|
||||
|
||||
Concatenating lists is slow. So we use special types Inlines and Blocks that wrap Sequences of Inline and Block elements.
|
||||
|
||||
Monoid - makes it easy to build up docs programatically.
|
||||
|
||||
Example.
|
||||
Here’s a JSON data source about CNG fueling stations in the
|
||||
Chicago area: cng_fuel_chicago.json. Boss says: write me a
|
||||
letter in Word listing all the stations that take the Voyager
|
||||
card.
|
||||
|
||||
```
|
||||
[ {
|
||||
"state" : "IL",
|
||||
"city" : "Chicago",
|
||||
"fuel_type_code" : "CNG",
|
||||
"zip" : "60607",
|
||||
"station_name" : "Clean Energy - Yellow Cab",
|
||||
"cards_accepted" : "A D M V Voyager Wright_Exp CleanEnergy",
|
||||
"street_address" : "540 W Grenshaw"
|
||||
}, ...
|
||||
```
|
||||
|
||||
No need to open Word for this job! fuel.hs
|
||||
|
||||
```
|
||||
{-# LANGUAGE OverloadedStrings #-}
|
||||
import Text.Pandoc.Builder
|
||||
import Text.Pandoc
|
||||
import Data.Monoid ((<>), mempty, mconcat)
|
||||
import Data.Aeson
|
||||
import Control.Applicative
|
||||
import Control.Monad (mzero)
|
||||
import qualified Data.ByteString.Lazy as BL
|
||||
import qualified Data.Text as T
|
||||
import Data.List (intersperse)
|
||||
|
||||
data Station = Station{
|
||||
address :: String
|
||||
, name :: String
|
||||
, cardsAccepted :: [String]
|
||||
} deriving Show
|
||||
|
||||
instance FromJSON Station where
|
||||
parseJSON (Object v) = Station <$>
|
||||
v .: "street_address" <*>
|
||||
v .: "station_name" <*>
|
||||
(words <$> (v .:? "cards_accepted" .!= ""))
|
||||
parseJSON _ = mzero
|
||||
|
||||
createLetter :: [Station] -> Pandoc
|
||||
createLetter stations = doc $
|
||||
para "Dear Boss:" <>
|
||||
para "Here are the CNG stations that accept Voyager cards:" <>
|
||||
simpleTable [plain "Station", plain "Address", plain "Cards accepted"]
|
||||
(map stationToRow stations) <>
|
||||
para "Your loyal servant," <>
|
||||
plain (image "JohnHancock.png" "" mempty)
|
||||
where
|
||||
stationToRow station =
|
||||
[ plain (text $ name station)
|
||||
, plain (text $ address station)
|
||||
, plain (mconcat $ intersperse linebreak $ map text $ cardsAccepted station)
|
||||
]
|
||||
|
||||
main :: IO ()
|
||||
main = do
|
||||
json <- BL.readFile "cng_fuel_chicago.json"
|
||||
let letter = case decode json of
|
||||
Just stations -> createLetter [s | s <- stations,
|
||||
"Voyager" `elem` cardsAccepted s]
|
||||
Nothing -> error "Could not decode JSON"
|
||||
BL.writeFile "letter.docx" =<< writeDocx def letter
|
||||
putStrLn "Created letter.docx"
|
||||
```
|
||||
|
||||
|
||||
|
||||
# Templates and other data files
|
||||
|
||||
readDataFile
|
||||
|
||||
# Handling errors and warnings
|
||||
|
||||
# Generic transformations
|
||||
|
||||
Walk and syb for AST transformations
|
||||
|
||||
# Filters
|
||||
Filters: see filters.md
|
||||
|
||||
just the basic idea of toJSONFilter
|
||||
the rest can be left to filters.md
|
||||
|
||||
# Self-contained
|
||||
but, how do you run filters from a program?
|
||||
need to export these functions from Text.Pandoc.App!
|
||||
|
||||
|
||||
# PDF
|
||||
|
||||
# Custom PandocMonad instances
|
||||
|
||||
# Creating a front-end
|
||||
|
||||
|
|
Loading…
Add table
Reference in a new issue