Support ipynb (Jupyter notebook) as input and output format.

[API change]

* Depend on ipynb library.

* Add `ipynb` as input and output format.

* Added Text.Pandoc.Readers.Ipynb (supports both nbformat v3 and v4).

* Added Text.Pandoc.Writers.Ipynb (supports nbformat v4).

* Added ipynb readers and writers to T.P.Readers,
  T.P.Writers, and T.P.Extensions.  Register the
  file extension .ipynb for this format.

* Add `PandocIpynbDecodingError` constructor to Text.Pandoc.Error.Error.

* Note: there is no template for ipynb.
This commit is contained in:
John MacFarlane 2019-01-13 07:56:55 -08:00
parent 5ddd7b121e
commit 395ea03069
13 changed files with 638 additions and 9 deletions

View file

@ -248,6 +248,7 @@ General options {.options}
if you need extensions not supported in [`gfm`](#markdown-variants).
- `haddock` ([Haddock markup])
- `html` ([HTML])
- `ipynb` ([Jupyter notebook])
- `jats` ([JATS] XML)
- `json` (JSON version of native AST)
- `latex` ([LaTeX])
@ -300,6 +301,7 @@ General options {.options}
- `html` or `html5` ([HTML], i.e. [HTML5]/XHTML [polyglot markup])
- `html4` ([XHTML] 1.0 Transitional)
- `icml` ([InDesign ICML])
- `ipynb` ([Jupyter notebook])
- `jats` ([JATS] XML)
- `json` (JSON version of native AST)
- `latex` ([LaTeX])
@ -471,6 +473,7 @@ General options {.options}
[PDF]: https://www.adobe.com/pdf/
[reveal.js]: http://lab.hakim.se/reveal-js/
[FictionBook2]: http://www.fictionbook.org/index.php/Eng:XML_Schema_Fictionbook_2.1
[Jupyter notebook]: https://nbformat.readthedocs.io/en/latest/
[InDesign ICML]: http://wwwimages.adobe.com/www.adobe.com/content/dam/acom/en/devnet/indesign/sdk/cs6/idml/idml-cookbook.pdf
[TEI Simple]: https://github.com/TEIC/TEI-Simple
[Muse]: https://amusewiki.org/library/manual
@ -730,6 +733,8 @@ General writer options {.options}
where there are nonsemantic newlines in the source, there
will be nonsemantic newlines in the output as well).
Automatic wrapping does not currently work in HTML output.
In `ipynb` output, this option affects wrapping of the
contents of markdown cells.
`--columns=`*NUMBER*
@ -902,6 +907,7 @@ Options affecting specific writers {.options}
: Use ATX-style headers in Markdown output. The default is
to use setext-style headers for levels 1-2, and then ATX headers.
(Note: for `gfm` output, ATX headers are always used.)
This option also affects markdown cells in `ipynb` output.
`--top-level-division=[default|section|chapter|part]`
@ -1806,6 +1812,10 @@ section [Pandoc's Markdown] below (See [Markdown variants] for
`commonmark` and `gfm`.) In the following, extensions that also work
for other formats are covered.
Note that markdown extensions added to the `ipynb` format
affect Markdown cells in Jupyter notebooks (as do command-line
options like `--atx-headers`).
Typography
----------
@ -1955,11 +1965,19 @@ This extension can be enabled/disabled for the following formats
input formats
: `latex`, `org`, `textile`, `html` (environments, `\ref`, and
`\eqref` only)
`\eqref` only), `ipynb`
output formats
: `textile`, `commonmark`
Note: as applied to `ipynb`, `raw_html` and `raw_tex` affect not
only raw TeX in markdown cells, but data with mime type
`text/html` in output cells. Since the `ipynb` reader attempts
to preserve the richest possible outputs when several options
are given, you will get best results if you disable `raw_html`
and `raw_tex` when converting to formats like `docx` which don't
allow raw `html` or `tex`.
#### Extension: `native_divs` {#native_divs}
This extension is enabled by default for HTML input. This means that
@ -4747,6 +4765,112 @@ with the `src` attribute. For example:
</source>
</audio>
Creating Jupyter notebooks with pandoc
======================================
When creating a [Jupyter notebook], pandoc will try to infer the
notebook structure. Code blocks with the class `code` will be
taken as code cells, and intervening content will be taken as
Markdown cells. Attachments will automatically be created for
images in Markdown cells. For example:
````
---
title: My notebook
nbformat: 4
nbformat_minor: 5
kernelspec:
display_name: Python 2
language: python
name: python2
language_info:
codemirror_mode:
name: ipython
version: 2
file_extension: ".py"
mimetype: "text/x-python"
name: "python"
nbconvert_exporter: "python"
pygments_lexer: "ipython2"
version: "2.7.15"
---
# Lorem ipsum
**Lorem ipsum** dolor sit amet, consectetur adipiscing elit. Nunc luctus
bibendum felis dictum sodales.
``` code
print("hello")
```
## Pyout
``` code
from IPython.display import HTML
HTML("""
<script>
console.log("hello");
</script>
<b>HTML</b>
""")
```
## Image
This image ![image](myimage.png) will be
included as a cell attachment.
````
If you want to add cell attributes, group cells differently, or
add output to code cells, then you need to include divs to
indicate the structure. You can use either [fenced
divs][Extension: `fenced_divs`] or [native divs][Extension:
`native_divs`] for this. Here is an example:
````
:::::: {.cell .markdown}
# Lorem
**Lorem ipsum** dolor sit amet, consectetur adipiscing elit. Nunc luctus
bibendum felis dictum sodales.
::::::
:::::: {.cell .code execution_count=1}
``` {.python}
print("hello")
```
::: {.output .stream .stdout}
```
hello
```
:::
::::::
:::::: {.cell .code execution_count=2}
``` {.python}
from IPython.display import HTML
HTML("""
<script>
console.log("hello");
</script>
<b>HTML</b>
""")
```
::: {.output .execute_result execution_count=2}
```{=html}
<script>
console.log("hello");
</script>
<b>HTML</b>
hello
```
:::
::::::
````
Syntax highlighting
===================

View file

@ -32,6 +32,7 @@ library. It can convert *from*
- `creole` ([Creole 1.0](http://www.wikicreole.org/wiki/Creole1.0))
- `docbook` ([DocBook](http://docbook.org))
- `docx` ([Word docx](https://en.wikipedia.org/wiki/Office_Open_XML))
- `dokuwiki` ([DokuWiki markup](https://www.dokuwiki.org/dokuwiki))
- `epub` ([EPUB](http://idpf.org/epub))
- `fb2`
([FictionBook2](http://www.fictionbook.org/index.php/Eng:XML_Schema_Fictionbook_2.1)
@ -44,6 +45,8 @@ library. It can convert *from*
- `haddock` ([Haddock
markup](https://www.haskell.org/haddock/doc/html/ch03s08.html))
- `html` ([HTML](http://www.w3.org/html/))
- `ipynb` ([Jupyter
notebook](https://nbformat.readthedocs.io/en/latest/))
- `jats` ([JATS](https://jats.nlm.nih.gov) XML)
- `json` (JSON version of native AST)
- `latex` ([LaTeX](http://latex-project.org))
@ -105,6 +108,8 @@ It can convert *to*
- `html4` ([XHTML](http://www.w3.org/TR/xhtml1/) 1.0 Transitional)
- `icml` ([InDesign
ICML](http://wwwimages.adobe.com/www.adobe.com/content/dam/acom/en/devnet/indesign/sdk/cs6/idml/idml-cookbook.pdf))
- `ipynb` ([Jupyter
notebook](https://nbformat.readthedocs.io/en/latest/))
- `jats` ([JATS](https://jats.nlm.nih.gov) XML)
- `json` (JSON version of native AST)
- `latex` ([LaTeX](http://latex-project.org))

View file

@ -4,3 +4,4 @@ source-repository-package
type: git
location: https://github.com/jgm/pandoc-citeproc
tag: 4b467c62af17ddfc739933891c5ea2291a6b9b76

View file

@ -20,14 +20,14 @@ description: Pandoc is a Haskell library for converting from one markup
(subsets of) HTML, reStructuredText, LaTeX, DocBook, JATS,
MediaWiki markup, DokuWiki markup, TWiki markup,
TikiWiki markup, Creole 1.0, Haddock markup, OPML,
Emacs Org-Mode, Emacs Muse, txt2tags,
Vimwiki, Word Docx, ODT, EPUB, FictionBook2, roff man,
and Textile, and it can write Markdown, reStructuredText,
XHTML, HTML 5, LaTeX, ConTeXt, DocBook, JATS, OPML, TEI,
OpenDocument, ODT, Word docx, PowerPoint pptx,
RTF, MediaWiki, DokuWiki, ZimWiki, Textile,
Emacs Org-Mode, Emacs Muse, txt2tags, ipynb (Jupyter
notebooks), Vimwiki, Word Docx, ODT, EPUB, FictionBook2,
roff man, and Textile, and it can write Markdown,
reStructuredText, XHTML, HTML 5, LaTeX, ConTeXt, DocBook,
JATS, OPML, TEI, OpenDocument, ODT, Word docx,
PowerPoint pptx, RTF, MediaWiki, DokuWiki, ZimWiki, Textile,
roff man, roff ms, plain text, Emacs Org-Mode,
AsciiDoc, Haddock markup, EPUB (v2 and v3),
AsciiDoc, Haddock markup, EPUB (v2 and v3), ipynb,
FictionBook2, InDesign ICML, Muse, LaTeX beamer slides,
and several kinds of HTML/JavaScript slide shows
(S5, Slidy, Slideous, DZSlides, reveal.js).
@ -398,7 +398,8 @@ library
http-types >= 0.8 && < 0.13,
case-insensitive >= 1.2 && < 1.3,
unicode-transforms >= 0.3 && < 0.4,
HsYAML >= 0.1.1.1 && < 0.2
HsYAML >= 0.1.1.1 && < 0.2,
ipynb >= 0.1 && < 0.2
if impl(ghc < 8.0)
build-depends: semigroups == 0.18.*,
-- basement 0.0.8 and foundation 0.0.21, transitive
@ -470,12 +471,14 @@ library
Text.Pandoc.Readers.Man,
Text.Pandoc.Readers.FB2,
Text.Pandoc.Readers.DokuWiki,
Text.Pandoc.Readers.Ipynb,
Text.Pandoc.Writers,
Text.Pandoc.Writers.Native,
Text.Pandoc.Writers.Docbook,
Text.Pandoc.Writers.JATS,
Text.Pandoc.Writers.OPML,
Text.Pandoc.Writers.HTML,
Text.Pandoc.Writers.Ipynb,
Text.Pandoc.Writers.ICML,
Text.Pandoc.Writers.LaTeX,
Text.Pandoc.Writers.ConTeXt,

View file

@ -90,5 +90,6 @@ formatFromFilePath x =
".txt" -> Just "markdown"
".wiki" -> Just "mediawiki"
".xhtml" -> Just "html"
".ipynb" -> Just "ipynb"
['.',y] | y `elem` ['1'..'9'] -> Just "man"
_ -> Nothing

View file

@ -70,6 +70,7 @@ data PandocError = PandocIOError String IOError
| PandocEpubSubdirectoryError String
| PandocMacroLoop String
| PandocUTF8DecodingError String Int Word8
| PandocIpynbDecodingError String
deriving (Show, Typeable, Generic)
instance Exception PandocError
@ -124,6 +125,8 @@ handleError (Left e) =
"UTF-8 decoding error in " ++ f ++ " at byte offset " ++ show offset ++
" (" ++ printf "%2x" w ++ ").\n" ++
"The input must be a UTF-8 encoded text."
PandocIpynbDecodingError w -> err 93 $
"ipynb decoding error: " ++ w
err :: Int -> String -> IO a
err exitCode msg = do

View file

@ -332,6 +332,8 @@ getDefaultExtensions "markdown_phpextra" = phpMarkdownExtraExtensions
getDefaultExtensions "markdown_mmd" = multimarkdownExtensions
getDefaultExtensions "markdown_github" = githubMarkdownExtensions
getDefaultExtensions "markdown" = pandocExtensions
getDefaultExtensions "ipynb" = enableExtension Ext_tex_math_dollars
githubMarkdownExtensions
getDefaultExtensions "muse" = extensionsFromList
[Ext_amuse,
Ext_auto_identifiers]

View file

@ -67,6 +67,7 @@ module Text.Pandoc.Readers
, readEPUB
, readMuse
, readFB2
, readIpynb
-- * Miscellaneous
, getReader
, getDefaultExtensions
@ -90,6 +91,7 @@ import Text.Pandoc.Readers.Docx
import Text.Pandoc.Readers.DokuWiki
import Text.Pandoc.Readers.EPUB
import Text.Pandoc.Readers.FB2
import Text.Pandoc.Readers.Ipynb
import Text.Pandoc.Readers.Haddock
import Text.Pandoc.Readers.HTML (readHtml)
import Text.Pandoc.Readers.JATS (readJATS)
@ -147,6 +149,7 @@ readers = [ ("native" , TextReader readNative)
,("muse" , TextReader readMuse)
,("man" , TextReader readMan)
,("fb2" , TextReader readFB2)
,("ipynb" , TextReader readIpynb)
]
-- | Retrieve reader, extensions based on formatSpec (format+extensions).

View file

@ -0,0 +1,249 @@
{-# LANGUAGE NoImplicitPrelude #-}
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-
Copyright (C) 2019 John MacFarlane <jgm@berkeley.edu>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
-}
{- |
Module : Text.Pandoc.Readers.Ipynb
Copyright : Copyright (C) 2019 John MacFarlane
License : GNU GPL, version 2 or above
Maintainer : John MacFarlane <jgm@berkeley.edu>
Stability : alpha
Portability : portable
Ipynb (Jupyter notebook JSON format) reader for pandoc.
-}
module Text.Pandoc.Readers.Ipynb ( readIpynb )
where
import Prelude
import Data.Maybe (fromMaybe)
import Data.Digest.Pure.SHA (sha1, showDigest)
import Text.Pandoc.Options
import qualified Text.Pandoc.Builder as B
import Text.Pandoc.Logging
import Text.Pandoc.Definition
import Data.Ipynb as Ipynb
import Text.Pandoc.Class
import Text.Pandoc.MIME (extensionFromMimeType)
import Text.Pandoc.UTF8
import Text.Pandoc.Error
import Data.Text (Text)
import qualified Data.Map as M
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import qualified Data.ByteString.Lazy as BL
import Data.Aeson as Aeson
import Control.Monad.Except (throwError)
import Text.Pandoc.Readers.Markdown (readMarkdown)
import Text.Pandoc.Readers.HTML (readHtml)
readIpynb :: PandocMonad m => ReaderOptions -> Text -> m Pandoc
readIpynb opts t = do
let src = BL.fromStrict (TE.encodeUtf8 t)
case eitherDecode src of
Right (notebook4 :: Notebook NbV4) -> notebookToPandoc opts notebook4
Left _ ->
case eitherDecode src of
Right (notebook3 :: Notebook NbV3) -> notebookToPandoc opts notebook3
Left err -> throwError $ PandocIpynbDecodingError err
notebookToPandoc :: (PandocMonad m, FromJSON (Notebook a))
=> ReaderOptions -> Notebook a -> m Pandoc
notebookToPandoc opts notebook = do
let cells = notebookCells notebook
let (fmt,fmtminor) = notebookFormat notebook
let m = M.insert "nbformat" (MetaString $ show fmt) $
M.insert "nbformat_minor" (MetaString $ show fmtminor) $
jsonMetaToMeta (notebookMetadata notebook)
let lang = case M.lookup "kernelspec" m of
Just (MetaMap ks) ->
case M.lookup "language" ks of
Just (MetaString l) -> l
_ -> "python"
_ -> "python"
bs <- mconcat <$> mapM (cellToBlocks opts lang) cells
let Pandoc _ blocks = B.doc bs
return $ Pandoc (Meta m) blocks
cellToBlocks :: PandocMonad m
=> ReaderOptions -> String -> Cell a -> m B.Blocks
cellToBlocks opts lang c = do
let Source ts = cellSource c
let source = mconcat ts
let kvs = jsonMetaToPairs (cellMetadata c)
let attachments = maybe mempty M.toList $ cellAttachments c
mapM_ addAttachment attachments
case cellType c of
Ipynb.Markdown -> do
Pandoc _ bs <- readMarkdown opts source
return $ B.divWith ("",["cell","markdown"],kvs)
$ B.fromList bs
Ipynb.Heading lev -> do
Pandoc _ bs <- readMarkdown opts
(T.replicate lev "#" <> " " <> source)
return $ B.divWith ("",["cell","markdown"],kvs)
$ B.fromList bs
Ipynb.Raw -> do
let format = fromMaybe "" $ lookup "format" kvs
let format' =
case format of
"text/html" -> "html"
"text/latex" -> "latex"
"application/pdf" -> "latex"
"text/markdown" -> "markdown"
"text/x-rsrt" -> "rst"
_ -> format
return $ B.divWith ("",["cell","raw"],kvs) $ B.rawBlock format'
$ T.unpack source
Ipynb.Code{ codeOutputs = outputs, codeExecutionCount = ec } -> do
outputBlocks <- mconcat <$> mapM (outputToBlock opts) outputs
let kvs' = maybe kvs (\x -> ("execution_count", show x):kvs) ec
return $ B.divWith ("",["cell","code"],kvs') $
B.codeBlockWith ("",[lang],[]) (T.unpack source)
<> outputBlocks
addAttachment :: PandocMonad m => (Text, MimeBundle) -> m ()
addAttachment (fname, mimeBundle) = do
let fp = T.unpack fname
case M.toList (unMimeBundle mimeBundle) of
(mimeType, BinaryData bs):_ ->
insertMedia fp (Just $ T.unpack mimeType) (BL.fromStrict bs)
(mimeType, TextualData t):_ ->
insertMedia fp (Just $ T.unpack mimeType)
(BL.fromStrict $ TE.encodeUtf8 t)
(mimeType, JsonData v):_ ->
insertMedia fp (Just $ T.unpack mimeType) (encode v)
[] -> report $ CouldNotFetchResource fp "no attachment"
outputToBlock :: PandocMonad m => ReaderOptions -> Output a -> m B.Blocks
outputToBlock _ Stream{ streamName = sName,
streamText = Source text } = do
return $ B.divWith ("",["output","stream",T.unpack sName],[])
$ B.codeBlock $ T.unpack . mconcat $ text
outputToBlock opts DisplayData{ displayData = data',
displayMetadata = metadata' } =
B.divWith ("",["output", "display_data"],[]) <$>
handleData opts metadata' data'
outputToBlock opts ExecuteResult{ executeCount = ec,
executeData = data',
executeMetadata = metadata' } =
B.divWith ("",["output", "execute_result"],[("execution_count",show ec)])
<$> handleData opts metadata' data'
outputToBlock _ Err{ errName = ename,
errValue = evalue,
errTraceback = traceback } = do
return $ B.divWith ("",["output","error"],
[("ename",T.unpack ename),
("evalue",T.unpack evalue)])
$ B.codeBlock $ T.unpack . T.unlines $ traceback
-- We want to display the richest output possible given
-- the output format.
handleData :: PandocMonad m
=> ReaderOptions -> JSONMeta -> MimeBundle -> m B.Blocks
handleData opts metadata (MimeBundle mb) = do
let mimePairs = M.toList mb
results <- mapM dataBlock mimePairs
-- return the result with highest priority:
let highest = maximum (0 : map fst results)
return $ case [r | (pr, r) <- results, pr == highest] of
x:_ -> x
[] -> mempty
where
exts = readerExtensions opts
dataBlock :: PandocMonad m => (MimeType, MimeData) -> m (Int, B.Blocks)
dataBlock (mt, BinaryData bs)
| "image/" `T.isPrefixOf` mt
= do
-- normally metadata maps from mime types to key-value map;
-- but not always...
let meta = case M.lookup mt metadata of
Just v@(Object{}) ->
case fromJSON v of
Success m' -> m'
Error _ -> mempty
_ -> mempty
let metaPairs = jsonMetaToPairs meta
let bl = BL.fromStrict bs
-- SHA1 hash for filename
let mt' = T.unpack mt
let fname = showDigest (sha1 bl) ++
case extensionFromMimeType mt' of
Nothing -> ""
Just ext -> '.':ext
insertMedia fname (Just mt') bl
return (3, B.para $ B.imageWith ("",[],metaPairs) fname "" mempty)
dataBlock (_, BinaryData _) = return (0, mempty)
dataBlock ("text/html", TextualData t)
| extensionEnabled Ext_raw_html exts
= return (2, B.rawBlock "html" $ T.unpack t)
| otherwise = do -- try parsing the HTML
Pandoc _ bls <- readHtml opts t
return (1, B.fromList bls)
dataBlock ("text/latex", TextualData t) =
return $ if extensionEnabled Ext_raw_tex exts
then (2, B.rawBlock "latex" $ T.unpack t)
else (0, mempty)
dataBlock ("text/plain", TextualData t) =
return (0, B.codeBlock $ T.unpack t)
dataBlock (_, JsonData v) =
return (2, B.codeBlockWith ("",["json"],[]) $ toStringLazy $ encode v)
dataBlock _ = return (0, mempty)
jsonMetaToMeta :: JSONMeta -> M.Map String MetaValue
jsonMetaToMeta = M.mapKeys T.unpack . M.map valueToMetaValue
where
valueToMetaValue :: Value -> MetaValue
valueToMetaValue x@(Object{}) =
case fromJSON x of
Error s -> MetaString s
Success jm' -> MetaMap $ jsonMetaToMeta jm'
valueToMetaValue x@(Array{}) =
case fromJSON x of
Error s -> MetaString s
Success xs -> MetaList $ map valueToMetaValue xs
valueToMetaValue (Bool b) = MetaBool b
valueToMetaValue (String t) = MetaString (T.unpack t)
valueToMetaValue (Number n) = MetaString (show n)
valueToMetaValue Aeson.Null = MetaString ""
jsonMetaToPairs :: JSONMeta -> [(String, String)]
jsonMetaToPairs = M.toList . M.mapMaybe
(\case
MetaString s -> Just s
MetaBool True -> Just "true"
MetaBool False -> Just "false"
-- for now we skip complex cell metadata:
_ -> Nothing) . jsonMetaToMeta

View file

@ -58,6 +58,7 @@ getDefaultTemplate writer = do
"docx" -> return ""
"fb2" -> return ""
"pptx" -> return ""
"ipynb" -> return ""
"odt" -> getDefaultTemplate "opendocument"
"html" -> getDefaultTemplate "html5"
"docbook" -> getDefaultTemplate "docbook5"

View file

@ -49,6 +49,7 @@ module Text.Pandoc.Writers
, writeEPUB2
, writeEPUB3
, writeFB2
, writeIpynb
, writeHaddock
, writeHtml4
, writeHtml4String
@ -101,6 +102,7 @@ import Text.Pandoc.Writers.Docx
import Text.Pandoc.Writers.DokuWiki
import Text.Pandoc.Writers.EPUB
import Text.Pandoc.Writers.FB2
import Text.Pandoc.Writers.Ipynb
import Text.Pandoc.Writers.Haddock
import Text.Pandoc.Writers.HTML
import Text.Pandoc.Writers.ICML
@ -140,6 +142,7 @@ writers = [
,("epub2" , ByteStringWriter writeEPUB2)
,("epub3" , ByteStringWriter writeEPUB3)
,("fb2" , TextWriter writeFB2)
,("ipynb" , TextWriter writeIpynb)
,("html" , TextWriter writeHtml5String)
,("html4" , TextWriter writeHtml4String)
,("html5" , TextWriter writeHtml5String)

View file

@ -0,0 +1,233 @@
{-# LANGUAGE NoImplicitPrelude #-}
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-
Copyright (C) 2019 John MacFarlane <jgm@berkeley.edu>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
-}
{- |
Module : Text.Pandoc.Writers.Ipynb
Copyright : Copyright (C) 2019 John MacFarlane
License : GNU GPL, version 2 or above
Maintainer : John MacFarlane <jgm@berkeley.edu>
Stability : alpha
Portability : portable
Ipynb (Jupyter notebook JSON format) writer for pandoc.
-}
module Text.Pandoc.Writers.Ipynb ( writeIpynb )
where
import Prelude
import Control.Monad (foldM)
import Control.Monad.State
import qualified Data.Map as M
import Data.Maybe (catMaybes, fromMaybe)
import Text.Pandoc.Options
import Text.Pandoc.Definition
import Data.Ipynb as Ipynb
import Text.Pandoc.Walk (walkM)
import qualified Text.Pandoc.Builder as B
import Text.Pandoc.Class
import Text.Pandoc.Logging
import Data.Text (Text)
import qualified Data.Text as T
import Data.Aeson as Aeson
import qualified Text.Pandoc.UTF8 as UTF8
import Text.Pandoc.Shared (safeRead)
import Text.Pandoc.Writers.Shared (metaToJSON')
import Text.Pandoc.Writers.Markdown (writeMarkdown)
import qualified Data.Text.Encoding as TE
import qualified Data.ByteString.Lazy as BL
import Data.Aeson.Encode.Pretty (Config(..), defConfig,
encodePretty', keyOrder, Indent(Spaces))
writeIpynb :: PandocMonad m => WriterOptions -> Pandoc -> m Text
writeIpynb opts d = do
notebook <- pandocToNotebook opts d
return $ TE.decodeUtf8 . BL.toStrict . encodePretty' defConfig{
confIndent = Spaces 1,
confCompare = keyOrder
[ "cells", "nbformat", "nbformat_minor",
"cell_type", "output_type",
"execution_count", "metadata",
"outputs", "source",
"data", "name", "text" ] }
$ notebook
pandocToNotebook :: PandocMonad m
=> WriterOptions -> Pandoc -> m (Notebook NbV4)
pandocToNotebook opts (Pandoc meta blocks) = do
let blockWriter bs = writeMarkdown
opts{ writerTemplate = Nothing } (Pandoc nullMeta bs)
let inlineWriter ils = T.stripEnd <$> writeMarkdown
opts{ writerTemplate = Nothing } (Pandoc nullMeta [Plain ils])
metadata' <- metaToJSON' blockWriter inlineWriter $
B.deleteMeta "nbformat" $
B.deleteMeta "nbformat_minor" $ meta
let metadata = case fromJSON metadata' of
Error _ -> mempty -- TODO warning here? shouldn't happen
Success x -> x
cells <- extractCells opts blocks
return $ Notebook{
notebookMetadata = metadata
, notebookFormat = (4, 5)
, notebookCells = cells }
addAttachment :: PandocMonad m
=> Inline
-> StateT (M.Map Text MimeBundle) m Inline
addAttachment (Image attr lab (src,tit)) = do
(img, mbmt) <- fetchItem src
let mt = maybe "application/octet-stream" (T.pack) mbmt
modify $ M.insert (T.pack src)
(MimeBundle (M.insert mt (BinaryData img) mempty))
return $ Image attr lab ("attachment:" <> src, tit)
addAttachment x = return x
extractCells :: PandocMonad m => WriterOptions -> [Block] -> m [Cell a]
extractCells _ [] = return []
extractCells opts (Div (_id,classes,kvs) xs : bs)
| "cell" `elem` classes
, "markdown" `elem` classes = do
let meta = pairsToJSONMeta kvs
(newdoc, attachments) <-
runStateT (walkM addAttachment (Pandoc nullMeta xs)) mempty
source <- writeMarkdown opts{ writerTemplate = Nothing } newdoc
(Cell{
cellType = Markdown
, cellSource = Source $ breakLines source
, cellMetadata = meta
, cellAttachments = if M.null attachments
then Nothing
else Just attachments } :)
<$> extractCells opts bs
| "cell" `elem` classes
, "code" `elem` classes = do
let (codeContent, rest) =
case xs of
(CodeBlock _ t : ys) -> (T.pack t, ys)
ys -> (mempty, ys)
let meta = pairsToJSONMeta kvs
outputs <- catMaybes <$> mapM blockToOutput rest
let exeCount = lookup "execution_count" kvs >>= safeRead
(Cell{
cellType = Ipynb.Code {
codeExecutionCount = exeCount
, codeOutputs = outputs
}
, cellSource = Source $ breakLines codeContent
, cellMetadata = meta
, cellAttachments = Nothing } :) <$> extractCells opts bs
| "cell" `elem` classes
, "raw" `elem` classes =
case xs of
[RawBlock (Format f) raw] -> do
let format' =
case f of
"html" -> "text/html"
"revealjs" -> "text/html"
"latex" -> "text/latex"
"markdown" -> "text/markdown"
"rst" -> "text/x-rst"
_ -> f
(Cell{
cellType = Raw
, cellSource = Source $ breakLines $ T.pack raw
, cellMetadata = M.insert "format"
(Aeson.String $ T.pack format') mempty
, cellAttachments = Nothing } :) <$> extractCells opts bs
_ -> extractCells opts bs
extractCells opts (CodeBlock (_id,classes,kvs) raw : bs)
| "code" `elem` classes = do
let meta = pairsToJSONMeta kvs
let exeCount = lookup "execution_count" kvs >>= safeRead
(Cell{
cellType = Ipynb.Code {
codeExecutionCount = exeCount
, codeOutputs = []
}
, cellSource = Source $ breakLines $ T.pack raw
, cellMetadata = meta
, cellAttachments = Nothing } :) <$> extractCells opts bs
extractCells opts (b:bs) = do
let isCodeOrDiv (CodeBlock (_,cl,_) _) = "code" `elem` cl
isCodeOrDiv (Div (_,cl,_) _) = "cell" `elem` cl
isCodeOrDiv _ = False
let (mds, rest) = break (isCodeOrDiv) bs
extractCells opts (Div ("",["cell","markdown"],[]) (b:mds) : rest)
blockToOutput :: PandocMonad m => Block -> m (Maybe (Output a))
blockToOutput (Div (_,["output","stream",sname],_) (CodeBlock _ t:_)) =
return $ Just
$ Stream{ streamName = T.pack sname
, streamText = Source (breakLines $ T.pack t) }
blockToOutput (Div (_,["output","error"],kvs) (CodeBlock _ t:_)) =
return $ Just
$ Err{ errName = maybe mempty T.pack (lookup "ename" kvs)
, errValue = maybe mempty T.pack (lookup "evalue" kvs)
, errTraceback = breakLines $ T.pack t }
blockToOutput (Div (_,["output","execute_result"],kvs) bs) = do
(data', metadata') <- extractData bs
return $ Just
$ ExecuteResult{ executeCount = fromMaybe 0 $
lookup "execution_count" kvs >>= safeRead
, executeData = data'
, executeMetadata = pairsToJSONMeta kvs <> metadata'}
blockToOutput (Div (_,["output","display_data"],kvs) bs) = do
(data', metadata') <- extractData bs
return $ Just
$ DisplayData { displayData = data'
, displayMetadata = pairsToJSONMeta kvs <> metadata'}
blockToOutput _ = return Nothing
extractData :: PandocMonad m => [Block] -> m (MimeBundle, JSONMeta)
extractData bs = do
(mmap, meta) <- foldM go mempty bs
return (MimeBundle mmap, meta)
where
go (mmap, meta) b@(Para [Image (_,_,kvs) _ (src,_)]) = do
(img, mbmt) <- fetchItem src
case mbmt of
Just mt -> return
(M.insert (T.pack mt) (BinaryData img) mmap,
meta <> pairsToJSONMeta kvs)
Nothing -> (mmap, meta) <$ report (BlockNotRendered b)
go (mmap, meta) b@(CodeBlock (_,["json"],_) code) =
case decode (UTF8.fromStringLazy code) of
Just v -> return
(M.insert "application/json" (JsonData v) mmap, meta)
Nothing -> (mmap, meta) <$ report (BlockNotRendered b)
go (mmap, meta) (CodeBlock ("",[],[]) code) =
return (M.insert "text/plain" (TextualData (T.pack code)) mmap, meta)
go (mmap, meta) (RawBlock (Format "html") raw) =
return (M.insert "text/html" (TextualData (T.pack raw)) mmap, meta)
go (mmap, meta) (RawBlock (Format "latex") raw) =
return (M.insert "text/latex" (TextualData (T.pack raw)) mmap, meta)
go (mmap, meta) b = (mmap, meta) <$ report (BlockNotRendered b)
pairsToJSONMeta :: [(String, String)] -> JSONMeta
pairsToJSONMeta kvs =
M.fromList [(T.pack k, case v of
"true" -> Bool True
"false" -> Bool False
_ -> case safeRead v of
Just n -> Number n
_ -> String (T.pack v))
| (k,v) <- kvs , k /= "execution_count" ]

View file

@ -21,6 +21,7 @@ packages:
extra-deps:
- github: jgm/pandoc-citeproc
commit: 4b467c62af17ddfc739933891c5ea2291a6b9b76
- ipynb-0.1
- haddock-library-1.7.0
- HsYAML-0.1.1.2
- yaml-0.11.0.0