Support ipynb (Jupyter notebook) as input and output format.
[API change] * Depend on ipynb library. * Add `ipynb` as input and output format. * Added Text.Pandoc.Readers.Ipynb (supports both nbformat v3 and v4). * Added Text.Pandoc.Writers.Ipynb (supports nbformat v4). * Added ipynb readers and writers to T.P.Readers, T.P.Writers, and T.P.Extensions. Register the file extension .ipynb for this format. * Add `PandocIpynbDecodingError` constructor to Text.Pandoc.Error.Error. * Note: there is no template for ipynb.
This commit is contained in:
parent
5ddd7b121e
commit
395ea03069
13 changed files with 638 additions and 9 deletions
126
MANUAL.txt
126
MANUAL.txt
|
@ -248,6 +248,7 @@ General options {.options}
|
|||
if you need extensions not supported in [`gfm`](#markdown-variants).
|
||||
- `haddock` ([Haddock markup])
|
||||
- `html` ([HTML])
|
||||
- `ipynb` ([Jupyter notebook])
|
||||
- `jats` ([JATS] XML)
|
||||
- `json` (JSON version of native AST)
|
||||
- `latex` ([LaTeX])
|
||||
|
@ -300,6 +301,7 @@ General options {.options}
|
|||
- `html` or `html5` ([HTML], i.e. [HTML5]/XHTML [polyglot markup])
|
||||
- `html4` ([XHTML] 1.0 Transitional)
|
||||
- `icml` ([InDesign ICML])
|
||||
- `ipynb` ([Jupyter notebook])
|
||||
- `jats` ([JATS] XML)
|
||||
- `json` (JSON version of native AST)
|
||||
- `latex` ([LaTeX])
|
||||
|
@ -471,6 +473,7 @@ General options {.options}
|
|||
[PDF]: https://www.adobe.com/pdf/
|
||||
[reveal.js]: http://lab.hakim.se/reveal-js/
|
||||
[FictionBook2]: http://www.fictionbook.org/index.php/Eng:XML_Schema_Fictionbook_2.1
|
||||
[Jupyter notebook]: https://nbformat.readthedocs.io/en/latest/
|
||||
[InDesign ICML]: http://wwwimages.adobe.com/www.adobe.com/content/dam/acom/en/devnet/indesign/sdk/cs6/idml/idml-cookbook.pdf
|
||||
[TEI Simple]: https://github.com/TEIC/TEI-Simple
|
||||
[Muse]: https://amusewiki.org/library/manual
|
||||
|
@ -730,6 +733,8 @@ General writer options {.options}
|
|||
where there are nonsemantic newlines in the source, there
|
||||
will be nonsemantic newlines in the output as well).
|
||||
Automatic wrapping does not currently work in HTML output.
|
||||
In `ipynb` output, this option affects wrapping of the
|
||||
contents of markdown cells.
|
||||
|
||||
`--columns=`*NUMBER*
|
||||
|
||||
|
@ -902,6 +907,7 @@ Options affecting specific writers {.options}
|
|||
: Use ATX-style headers in Markdown output. The default is
|
||||
to use setext-style headers for levels 1-2, and then ATX headers.
|
||||
(Note: for `gfm` output, ATX headers are always used.)
|
||||
This option also affects markdown cells in `ipynb` output.
|
||||
|
||||
`--top-level-division=[default|section|chapter|part]`
|
||||
|
||||
|
@ -1806,6 +1812,10 @@ section [Pandoc's Markdown] below (See [Markdown variants] for
|
|||
`commonmark` and `gfm`.) In the following, extensions that also work
|
||||
for other formats are covered.
|
||||
|
||||
Note that markdown extensions added to the `ipynb` format
|
||||
affect Markdown cells in Jupyter notebooks (as do command-line
|
||||
options like `--atx-headers`).
|
||||
|
||||
Typography
|
||||
----------
|
||||
|
||||
|
@ -1955,11 +1965,19 @@ This extension can be enabled/disabled for the following formats
|
|||
|
||||
input formats
|
||||
: `latex`, `org`, `textile`, `html` (environments, `\ref`, and
|
||||
`\eqref` only)
|
||||
`\eqref` only), `ipynb`
|
||||
|
||||
output formats
|
||||
: `textile`, `commonmark`
|
||||
|
||||
Note: as applied to `ipynb`, `raw_html` and `raw_tex` affect not
|
||||
only raw TeX in markdown cells, but data with mime type
|
||||
`text/html` in output cells. Since the `ipynb` reader attempts
|
||||
to preserve the richest possible outputs when several options
|
||||
are given, you will get best results if you disable `raw_html`
|
||||
and `raw_tex` when converting to formats like `docx` which don't
|
||||
allow raw `html` or `tex`.
|
||||
|
||||
#### Extension: `native_divs` {#native_divs}
|
||||
|
||||
This extension is enabled by default for HTML input. This means that
|
||||
|
@ -4747,6 +4765,112 @@ with the `src` attribute. For example:
|
|||
</source>
|
||||
</audio>
|
||||
|
||||
Creating Jupyter notebooks with pandoc
|
||||
======================================
|
||||
|
||||
When creating a [Jupyter notebook], pandoc will try to infer the
|
||||
notebook structure. Code blocks with the class `code` will be
|
||||
taken as code cells, and intervening content will be taken as
|
||||
Markdown cells. Attachments will automatically be created for
|
||||
images in Markdown cells. For example:
|
||||
|
||||
````
|
||||
---
|
||||
title: My notebook
|
||||
nbformat: 4
|
||||
nbformat_minor: 5
|
||||
kernelspec:
|
||||
display_name: Python 2
|
||||
language: python
|
||||
name: python2
|
||||
language_info:
|
||||
codemirror_mode:
|
||||
name: ipython
|
||||
version: 2
|
||||
file_extension: ".py"
|
||||
mimetype: "text/x-python"
|
||||
name: "python"
|
||||
nbconvert_exporter: "python"
|
||||
pygments_lexer: "ipython2"
|
||||
version: "2.7.15"
|
||||
---
|
||||
|
||||
# Lorem ipsum
|
||||
|
||||
**Lorem ipsum** dolor sit amet, consectetur adipiscing elit. Nunc luctus
|
||||
bibendum felis dictum sodales.
|
||||
|
||||
``` code
|
||||
print("hello")
|
||||
```
|
||||
|
||||
## Pyout
|
||||
|
||||
``` code
|
||||
from IPython.display import HTML
|
||||
HTML("""
|
||||
<script>
|
||||
console.log("hello");
|
||||
</script>
|
||||
<b>HTML</b>
|
||||
""")
|
||||
```
|
||||
|
||||
## Image
|
||||
|
||||
This image ![image](myimage.png) will be
|
||||
included as a cell attachment.
|
||||
````
|
||||
|
||||
If you want to add cell attributes, group cells differently, or
|
||||
add output to code cells, then you need to include divs to
|
||||
indicate the structure. You can use either [fenced
|
||||
divs][Extension: `fenced_divs`] or [native divs][Extension:
|
||||
`native_divs`] for this. Here is an example:
|
||||
|
||||
````
|
||||
:::::: {.cell .markdown}
|
||||
# Lorem
|
||||
|
||||
**Lorem ipsum** dolor sit amet, consectetur adipiscing elit. Nunc luctus
|
||||
bibendum felis dictum sodales.
|
||||
::::::
|
||||
|
||||
:::::: {.cell .code execution_count=1}
|
||||
``` {.python}
|
||||
print("hello")
|
||||
```
|
||||
|
||||
::: {.output .stream .stdout}
|
||||
```
|
||||
hello
|
||||
```
|
||||
:::
|
||||
::::::
|
||||
|
||||
:::::: {.cell .code execution_count=2}
|
||||
``` {.python}
|
||||
from IPython.display import HTML
|
||||
HTML("""
|
||||
<script>
|
||||
console.log("hello");
|
||||
</script>
|
||||
<b>HTML</b>
|
||||
""")
|
||||
```
|
||||
|
||||
::: {.output .execute_result execution_count=2}
|
||||
```{=html}
|
||||
<script>
|
||||
console.log("hello");
|
||||
</script>
|
||||
<b>HTML</b>
|
||||
hello
|
||||
```
|
||||
:::
|
||||
::::::
|
||||
````
|
||||
|
||||
Syntax highlighting
|
||||
===================
|
||||
|
||||
|
|
|
@ -32,6 +32,7 @@ library. It can convert *from*
|
|||
- `creole` ([Creole 1.0](http://www.wikicreole.org/wiki/Creole1.0))
|
||||
- `docbook` ([DocBook](http://docbook.org))
|
||||
- `docx` ([Word docx](https://en.wikipedia.org/wiki/Office_Open_XML))
|
||||
- `dokuwiki` ([DokuWiki markup](https://www.dokuwiki.org/dokuwiki))
|
||||
- `epub` ([EPUB](http://idpf.org/epub))
|
||||
- `fb2`
|
||||
([FictionBook2](http://www.fictionbook.org/index.php/Eng:XML_Schema_Fictionbook_2.1)
|
||||
|
@ -44,6 +45,8 @@ library. It can convert *from*
|
|||
- `haddock` ([Haddock
|
||||
markup](https://www.haskell.org/haddock/doc/html/ch03s08.html))
|
||||
- `html` ([HTML](http://www.w3.org/html/))
|
||||
- `ipynb` ([Jupyter
|
||||
notebook](https://nbformat.readthedocs.io/en/latest/))
|
||||
- `jats` ([JATS](https://jats.nlm.nih.gov) XML)
|
||||
- `json` (JSON version of native AST)
|
||||
- `latex` ([LaTeX](http://latex-project.org))
|
||||
|
@ -105,6 +108,8 @@ It can convert *to*
|
|||
- `html4` ([XHTML](http://www.w3.org/TR/xhtml1/) 1.0 Transitional)
|
||||
- `icml` ([InDesign
|
||||
ICML](http://wwwimages.adobe.com/www.adobe.com/content/dam/acom/en/devnet/indesign/sdk/cs6/idml/idml-cookbook.pdf))
|
||||
- `ipynb` ([Jupyter
|
||||
notebook](https://nbformat.readthedocs.io/en/latest/))
|
||||
- `jats` ([JATS](https://jats.nlm.nih.gov) XML)
|
||||
- `json` (JSON version of native AST)
|
||||
- `latex` ([LaTeX](http://latex-project.org))
|
||||
|
|
|
@ -4,3 +4,4 @@ source-repository-package
|
|||
type: git
|
||||
location: https://github.com/jgm/pandoc-citeproc
|
||||
tag: 4b467c62af17ddfc739933891c5ea2291a6b9b76
|
||||
|
||||
|
|
19
pandoc.cabal
19
pandoc.cabal
|
@ -20,14 +20,14 @@ description: Pandoc is a Haskell library for converting from one markup
|
|||
(subsets of) HTML, reStructuredText, LaTeX, DocBook, JATS,
|
||||
MediaWiki markup, DokuWiki markup, TWiki markup,
|
||||
TikiWiki markup, Creole 1.0, Haddock markup, OPML,
|
||||
Emacs Org-Mode, Emacs Muse, txt2tags,
|
||||
Vimwiki, Word Docx, ODT, EPUB, FictionBook2, roff man,
|
||||
and Textile, and it can write Markdown, reStructuredText,
|
||||
XHTML, HTML 5, LaTeX, ConTeXt, DocBook, JATS, OPML, TEI,
|
||||
OpenDocument, ODT, Word docx, PowerPoint pptx,
|
||||
RTF, MediaWiki, DokuWiki, ZimWiki, Textile,
|
||||
Emacs Org-Mode, Emacs Muse, txt2tags, ipynb (Jupyter
|
||||
notebooks), Vimwiki, Word Docx, ODT, EPUB, FictionBook2,
|
||||
roff man, and Textile, and it can write Markdown,
|
||||
reStructuredText, XHTML, HTML 5, LaTeX, ConTeXt, DocBook,
|
||||
JATS, OPML, TEI, OpenDocument, ODT, Word docx,
|
||||
PowerPoint pptx, RTF, MediaWiki, DokuWiki, ZimWiki, Textile,
|
||||
roff man, roff ms, plain text, Emacs Org-Mode,
|
||||
AsciiDoc, Haddock markup, EPUB (v2 and v3),
|
||||
AsciiDoc, Haddock markup, EPUB (v2 and v3), ipynb,
|
||||
FictionBook2, InDesign ICML, Muse, LaTeX beamer slides,
|
||||
and several kinds of HTML/JavaScript slide shows
|
||||
(S5, Slidy, Slideous, DZSlides, reveal.js).
|
||||
|
@ -398,7 +398,8 @@ library
|
|||
http-types >= 0.8 && < 0.13,
|
||||
case-insensitive >= 1.2 && < 1.3,
|
||||
unicode-transforms >= 0.3 && < 0.4,
|
||||
HsYAML >= 0.1.1.1 && < 0.2
|
||||
HsYAML >= 0.1.1.1 && < 0.2,
|
||||
ipynb >= 0.1 && < 0.2
|
||||
if impl(ghc < 8.0)
|
||||
build-depends: semigroups == 0.18.*,
|
||||
-- basement 0.0.8 and foundation 0.0.21, transitive
|
||||
|
@ -470,12 +471,14 @@ library
|
|||
Text.Pandoc.Readers.Man,
|
||||
Text.Pandoc.Readers.FB2,
|
||||
Text.Pandoc.Readers.DokuWiki,
|
||||
Text.Pandoc.Readers.Ipynb,
|
||||
Text.Pandoc.Writers,
|
||||
Text.Pandoc.Writers.Native,
|
||||
Text.Pandoc.Writers.Docbook,
|
||||
Text.Pandoc.Writers.JATS,
|
||||
Text.Pandoc.Writers.OPML,
|
||||
Text.Pandoc.Writers.HTML,
|
||||
Text.Pandoc.Writers.Ipynb,
|
||||
Text.Pandoc.Writers.ICML,
|
||||
Text.Pandoc.Writers.LaTeX,
|
||||
Text.Pandoc.Writers.ConTeXt,
|
||||
|
|
|
@ -90,5 +90,6 @@ formatFromFilePath x =
|
|||
".txt" -> Just "markdown"
|
||||
".wiki" -> Just "mediawiki"
|
||||
".xhtml" -> Just "html"
|
||||
".ipynb" -> Just "ipynb"
|
||||
['.',y] | y `elem` ['1'..'9'] -> Just "man"
|
||||
_ -> Nothing
|
||||
|
|
|
@ -70,6 +70,7 @@ data PandocError = PandocIOError String IOError
|
|||
| PandocEpubSubdirectoryError String
|
||||
| PandocMacroLoop String
|
||||
| PandocUTF8DecodingError String Int Word8
|
||||
| PandocIpynbDecodingError String
|
||||
deriving (Show, Typeable, Generic)
|
||||
|
||||
instance Exception PandocError
|
||||
|
@ -124,6 +125,8 @@ handleError (Left e) =
|
|||
"UTF-8 decoding error in " ++ f ++ " at byte offset " ++ show offset ++
|
||||
" (" ++ printf "%2x" w ++ ").\n" ++
|
||||
"The input must be a UTF-8 encoded text."
|
||||
PandocIpynbDecodingError w -> err 93 $
|
||||
"ipynb decoding error: " ++ w
|
||||
|
||||
err :: Int -> String -> IO a
|
||||
err exitCode msg = do
|
||||
|
|
|
@ -332,6 +332,8 @@ getDefaultExtensions "markdown_phpextra" = phpMarkdownExtraExtensions
|
|||
getDefaultExtensions "markdown_mmd" = multimarkdownExtensions
|
||||
getDefaultExtensions "markdown_github" = githubMarkdownExtensions
|
||||
getDefaultExtensions "markdown" = pandocExtensions
|
||||
getDefaultExtensions "ipynb" = enableExtension Ext_tex_math_dollars
|
||||
githubMarkdownExtensions
|
||||
getDefaultExtensions "muse" = extensionsFromList
|
||||
[Ext_amuse,
|
||||
Ext_auto_identifiers]
|
||||
|
|
|
@ -67,6 +67,7 @@ module Text.Pandoc.Readers
|
|||
, readEPUB
|
||||
, readMuse
|
||||
, readFB2
|
||||
, readIpynb
|
||||
-- * Miscellaneous
|
||||
, getReader
|
||||
, getDefaultExtensions
|
||||
|
@ -90,6 +91,7 @@ import Text.Pandoc.Readers.Docx
|
|||
import Text.Pandoc.Readers.DokuWiki
|
||||
import Text.Pandoc.Readers.EPUB
|
||||
import Text.Pandoc.Readers.FB2
|
||||
import Text.Pandoc.Readers.Ipynb
|
||||
import Text.Pandoc.Readers.Haddock
|
||||
import Text.Pandoc.Readers.HTML (readHtml)
|
||||
import Text.Pandoc.Readers.JATS (readJATS)
|
||||
|
@ -147,6 +149,7 @@ readers = [ ("native" , TextReader readNative)
|
|||
,("muse" , TextReader readMuse)
|
||||
,("man" , TextReader readMan)
|
||||
,("fb2" , TextReader readFB2)
|
||||
,("ipynb" , TextReader readIpynb)
|
||||
]
|
||||
|
||||
-- | Retrieve reader, extensions based on formatSpec (format+extensions).
|
||||
|
|
249
src/Text/Pandoc/Readers/Ipynb.hs
Normal file
249
src/Text/Pandoc/Readers/Ipynb.hs
Normal file
|
@ -0,0 +1,249 @@
|
|||
{-# LANGUAGE NoImplicitPrelude #-}
|
||||
{-# LANGUAGE DeriveGeneric #-}
|
||||
{-# LANGUAGE LambdaCase #-}
|
||||
{-# LANGUAGE OverloadedStrings #-}
|
||||
{-# LANGUAGE FlexibleContexts #-}
|
||||
{-# LANGUAGE ScopedTypeVariables #-}
|
||||
{-
|
||||
Copyright (C) 2019 John MacFarlane <jgm@berkeley.edu>
|
||||
|
||||
This program is free software; you can redistribute it and/or modify
|
||||
it under the terms of the GNU General Public License as published by
|
||||
the Free Software Foundation; either version 2 of the License, or
|
||||
(at your option) any later version.
|
||||
|
||||
This program is distributed in the hope that it will be useful,
|
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||
GNU General Public License for more details.
|
||||
|
||||
You should have received a copy of the GNU General Public License
|
||||
along with this program; if not, write to the Free Software
|
||||
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
|
||||
-}
|
||||
|
||||
{- |
|
||||
Module : Text.Pandoc.Readers.Ipynb
|
||||
Copyright : Copyright (C) 2019 John MacFarlane
|
||||
License : GNU GPL, version 2 or above
|
||||
|
||||
Maintainer : John MacFarlane <jgm@berkeley.edu>
|
||||
Stability : alpha
|
||||
Portability : portable
|
||||
|
||||
Ipynb (Jupyter notebook JSON format) reader for pandoc.
|
||||
-}
|
||||
module Text.Pandoc.Readers.Ipynb ( readIpynb )
|
||||
where
|
||||
import Prelude
|
||||
import Data.Maybe (fromMaybe)
|
||||
import Data.Digest.Pure.SHA (sha1, showDigest)
|
||||
import Text.Pandoc.Options
|
||||
import qualified Text.Pandoc.Builder as B
|
||||
import Text.Pandoc.Logging
|
||||
import Text.Pandoc.Definition
|
||||
import Data.Ipynb as Ipynb
|
||||
import Text.Pandoc.Class
|
||||
import Text.Pandoc.MIME (extensionFromMimeType)
|
||||
import Text.Pandoc.UTF8
|
||||
import Text.Pandoc.Error
|
||||
import Data.Text (Text)
|
||||
import qualified Data.Map as M
|
||||
import qualified Data.Text as T
|
||||
import qualified Data.Text.Encoding as TE
|
||||
import qualified Data.ByteString.Lazy as BL
|
||||
import Data.Aeson as Aeson
|
||||
import Control.Monad.Except (throwError)
|
||||
import Text.Pandoc.Readers.Markdown (readMarkdown)
|
||||
import Text.Pandoc.Readers.HTML (readHtml)
|
||||
|
||||
readIpynb :: PandocMonad m => ReaderOptions -> Text -> m Pandoc
|
||||
readIpynb opts t = do
|
||||
let src = BL.fromStrict (TE.encodeUtf8 t)
|
||||
case eitherDecode src of
|
||||
Right (notebook4 :: Notebook NbV4) -> notebookToPandoc opts notebook4
|
||||
Left _ ->
|
||||
case eitherDecode src of
|
||||
Right (notebook3 :: Notebook NbV3) -> notebookToPandoc opts notebook3
|
||||
Left err -> throwError $ PandocIpynbDecodingError err
|
||||
|
||||
notebookToPandoc :: (PandocMonad m, FromJSON (Notebook a))
|
||||
=> ReaderOptions -> Notebook a -> m Pandoc
|
||||
notebookToPandoc opts notebook = do
|
||||
let cells = notebookCells notebook
|
||||
let (fmt,fmtminor) = notebookFormat notebook
|
||||
let m = M.insert "nbformat" (MetaString $ show fmt) $
|
||||
M.insert "nbformat_minor" (MetaString $ show fmtminor) $
|
||||
jsonMetaToMeta (notebookMetadata notebook)
|
||||
let lang = case M.lookup "kernelspec" m of
|
||||
Just (MetaMap ks) ->
|
||||
case M.lookup "language" ks of
|
||||
Just (MetaString l) -> l
|
||||
_ -> "python"
|
||||
_ -> "python"
|
||||
bs <- mconcat <$> mapM (cellToBlocks opts lang) cells
|
||||
let Pandoc _ blocks = B.doc bs
|
||||
return $ Pandoc (Meta m) blocks
|
||||
|
||||
cellToBlocks :: PandocMonad m
|
||||
=> ReaderOptions -> String -> Cell a -> m B.Blocks
|
||||
cellToBlocks opts lang c = do
|
||||
let Source ts = cellSource c
|
||||
let source = mconcat ts
|
||||
let kvs = jsonMetaToPairs (cellMetadata c)
|
||||
let attachments = maybe mempty M.toList $ cellAttachments c
|
||||
mapM_ addAttachment attachments
|
||||
case cellType c of
|
||||
Ipynb.Markdown -> do
|
||||
Pandoc _ bs <- readMarkdown opts source
|
||||
return $ B.divWith ("",["cell","markdown"],kvs)
|
||||
$ B.fromList bs
|
||||
Ipynb.Heading lev -> do
|
||||
Pandoc _ bs <- readMarkdown opts
|
||||
(T.replicate lev "#" <> " " <> source)
|
||||
return $ B.divWith ("",["cell","markdown"],kvs)
|
||||
$ B.fromList bs
|
||||
Ipynb.Raw -> do
|
||||
let format = fromMaybe "" $ lookup "format" kvs
|
||||
let format' =
|
||||
case format of
|
||||
"text/html" -> "html"
|
||||
"text/latex" -> "latex"
|
||||
"application/pdf" -> "latex"
|
||||
"text/markdown" -> "markdown"
|
||||
"text/x-rsrt" -> "rst"
|
||||
_ -> format
|
||||
return $ B.divWith ("",["cell","raw"],kvs) $ B.rawBlock format'
|
||||
$ T.unpack source
|
||||
Ipynb.Code{ codeOutputs = outputs, codeExecutionCount = ec } -> do
|
||||
outputBlocks <- mconcat <$> mapM (outputToBlock opts) outputs
|
||||
let kvs' = maybe kvs (\x -> ("execution_count", show x):kvs) ec
|
||||
return $ B.divWith ("",["cell","code"],kvs') $
|
||||
B.codeBlockWith ("",[lang],[]) (T.unpack source)
|
||||
<> outputBlocks
|
||||
|
||||
addAttachment :: PandocMonad m => (Text, MimeBundle) -> m ()
|
||||
addAttachment (fname, mimeBundle) = do
|
||||
let fp = T.unpack fname
|
||||
case M.toList (unMimeBundle mimeBundle) of
|
||||
(mimeType, BinaryData bs):_ ->
|
||||
insertMedia fp (Just $ T.unpack mimeType) (BL.fromStrict bs)
|
||||
(mimeType, TextualData t):_ ->
|
||||
insertMedia fp (Just $ T.unpack mimeType)
|
||||
(BL.fromStrict $ TE.encodeUtf8 t)
|
||||
(mimeType, JsonData v):_ ->
|
||||
insertMedia fp (Just $ T.unpack mimeType) (encode v)
|
||||
[] -> report $ CouldNotFetchResource fp "no attachment"
|
||||
|
||||
outputToBlock :: PandocMonad m => ReaderOptions -> Output a -> m B.Blocks
|
||||
outputToBlock _ Stream{ streamName = sName,
|
||||
streamText = Source text } = do
|
||||
return $ B.divWith ("",["output","stream",T.unpack sName],[])
|
||||
$ B.codeBlock $ T.unpack . mconcat $ text
|
||||
outputToBlock opts DisplayData{ displayData = data',
|
||||
displayMetadata = metadata' } =
|
||||
B.divWith ("",["output", "display_data"],[]) <$>
|
||||
handleData opts metadata' data'
|
||||
outputToBlock opts ExecuteResult{ executeCount = ec,
|
||||
executeData = data',
|
||||
executeMetadata = metadata' } =
|
||||
B.divWith ("",["output", "execute_result"],[("execution_count",show ec)])
|
||||
<$> handleData opts metadata' data'
|
||||
outputToBlock _ Err{ errName = ename,
|
||||
errValue = evalue,
|
||||
errTraceback = traceback } = do
|
||||
return $ B.divWith ("",["output","error"],
|
||||
[("ename",T.unpack ename),
|
||||
("evalue",T.unpack evalue)])
|
||||
$ B.codeBlock $ T.unpack . T.unlines $ traceback
|
||||
|
||||
-- We want to display the richest output possible given
|
||||
-- the output format.
|
||||
handleData :: PandocMonad m
|
||||
=> ReaderOptions -> JSONMeta -> MimeBundle -> m B.Blocks
|
||||
handleData opts metadata (MimeBundle mb) = do
|
||||
let mimePairs = M.toList mb
|
||||
|
||||
results <- mapM dataBlock mimePairs
|
||||
|
||||
-- return the result with highest priority:
|
||||
|
||||
let highest = maximum (0 : map fst results)
|
||||
return $ case [r | (pr, r) <- results, pr == highest] of
|
||||
x:_ -> x
|
||||
[] -> mempty
|
||||
|
||||
where
|
||||
|
||||
exts = readerExtensions opts
|
||||
|
||||
dataBlock :: PandocMonad m => (MimeType, MimeData) -> m (Int, B.Blocks)
|
||||
dataBlock (mt, BinaryData bs)
|
||||
| "image/" `T.isPrefixOf` mt
|
||||
= do
|
||||
-- normally metadata maps from mime types to key-value map;
|
||||
-- but not always...
|
||||
let meta = case M.lookup mt metadata of
|
||||
Just v@(Object{}) ->
|
||||
case fromJSON v of
|
||||
Success m' -> m'
|
||||
Error _ -> mempty
|
||||
_ -> mempty
|
||||
let metaPairs = jsonMetaToPairs meta
|
||||
let bl = BL.fromStrict bs
|
||||
-- SHA1 hash for filename
|
||||
let mt' = T.unpack mt
|
||||
let fname = showDigest (sha1 bl) ++
|
||||
case extensionFromMimeType mt' of
|
||||
Nothing -> ""
|
||||
Just ext -> '.':ext
|
||||
insertMedia fname (Just mt') bl
|
||||
return (3, B.para $ B.imageWith ("",[],metaPairs) fname "" mempty)
|
||||
|
||||
dataBlock (_, BinaryData _) = return (0, mempty)
|
||||
|
||||
dataBlock ("text/html", TextualData t)
|
||||
| extensionEnabled Ext_raw_html exts
|
||||
= return (2, B.rawBlock "html" $ T.unpack t)
|
||||
| otherwise = do -- try parsing the HTML
|
||||
Pandoc _ bls <- readHtml opts t
|
||||
return (1, B.fromList bls)
|
||||
|
||||
dataBlock ("text/latex", TextualData t) =
|
||||
return $ if extensionEnabled Ext_raw_tex exts
|
||||
then (2, B.rawBlock "latex" $ T.unpack t)
|
||||
else (0, mempty)
|
||||
|
||||
dataBlock ("text/plain", TextualData t) =
|
||||
return (0, B.codeBlock $ T.unpack t)
|
||||
|
||||
dataBlock (_, JsonData v) =
|
||||
return (2, B.codeBlockWith ("",["json"],[]) $ toStringLazy $ encode v)
|
||||
|
||||
dataBlock _ = return (0, mempty)
|
||||
|
||||
jsonMetaToMeta :: JSONMeta -> M.Map String MetaValue
|
||||
jsonMetaToMeta = M.mapKeys T.unpack . M.map valueToMetaValue
|
||||
where
|
||||
valueToMetaValue :: Value -> MetaValue
|
||||
valueToMetaValue x@(Object{}) =
|
||||
case fromJSON x of
|
||||
Error s -> MetaString s
|
||||
Success jm' -> MetaMap $ jsonMetaToMeta jm'
|
||||
valueToMetaValue x@(Array{}) =
|
||||
case fromJSON x of
|
||||
Error s -> MetaString s
|
||||
Success xs -> MetaList $ map valueToMetaValue xs
|
||||
valueToMetaValue (Bool b) = MetaBool b
|
||||
valueToMetaValue (String t) = MetaString (T.unpack t)
|
||||
valueToMetaValue (Number n) = MetaString (show n)
|
||||
valueToMetaValue Aeson.Null = MetaString ""
|
||||
|
||||
jsonMetaToPairs :: JSONMeta -> [(String, String)]
|
||||
jsonMetaToPairs = M.toList . M.mapMaybe
|
||||
(\case
|
||||
MetaString s -> Just s
|
||||
MetaBool True -> Just "true"
|
||||
MetaBool False -> Just "false"
|
||||
-- for now we skip complex cell metadata:
|
||||
_ -> Nothing) . jsonMetaToMeta
|
|
@ -58,6 +58,7 @@ getDefaultTemplate writer = do
|
|||
"docx" -> return ""
|
||||
"fb2" -> return ""
|
||||
"pptx" -> return ""
|
||||
"ipynb" -> return ""
|
||||
"odt" -> getDefaultTemplate "opendocument"
|
||||
"html" -> getDefaultTemplate "html5"
|
||||
"docbook" -> getDefaultTemplate "docbook5"
|
||||
|
|
|
@ -49,6 +49,7 @@ module Text.Pandoc.Writers
|
|||
, writeEPUB2
|
||||
, writeEPUB3
|
||||
, writeFB2
|
||||
, writeIpynb
|
||||
, writeHaddock
|
||||
, writeHtml4
|
||||
, writeHtml4String
|
||||
|
@ -101,6 +102,7 @@ import Text.Pandoc.Writers.Docx
|
|||
import Text.Pandoc.Writers.DokuWiki
|
||||
import Text.Pandoc.Writers.EPUB
|
||||
import Text.Pandoc.Writers.FB2
|
||||
import Text.Pandoc.Writers.Ipynb
|
||||
import Text.Pandoc.Writers.Haddock
|
||||
import Text.Pandoc.Writers.HTML
|
||||
import Text.Pandoc.Writers.ICML
|
||||
|
@ -140,6 +142,7 @@ writers = [
|
|||
,("epub2" , ByteStringWriter writeEPUB2)
|
||||
,("epub3" , ByteStringWriter writeEPUB3)
|
||||
,("fb2" , TextWriter writeFB2)
|
||||
,("ipynb" , TextWriter writeIpynb)
|
||||
,("html" , TextWriter writeHtml5String)
|
||||
,("html4" , TextWriter writeHtml4String)
|
||||
,("html5" , TextWriter writeHtml5String)
|
||||
|
|
233
src/Text/Pandoc/Writers/Ipynb.hs
Normal file
233
src/Text/Pandoc/Writers/Ipynb.hs
Normal file
|
@ -0,0 +1,233 @@
|
|||
{-# LANGUAGE NoImplicitPrelude #-}
|
||||
{-# LANGUAGE DeriveGeneric #-}
|
||||
{-# LANGUAGE OverloadedStrings #-}
|
||||
{-# LANGUAGE ScopedTypeVariables #-}
|
||||
{-
|
||||
Copyright (C) 2019 John MacFarlane <jgm@berkeley.edu>
|
||||
|
||||
This program is free software; you can redistribute it and/or modify
|
||||
it under the terms of the GNU General Public License as published by
|
||||
the Free Software Foundation; either version 2 of the License, or
|
||||
(at your option) any later version.
|
||||
|
||||
This program is distributed in the hope that it will be useful,
|
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||
GNU General Public License for more details.
|
||||
|
||||
You should have received a copy of the GNU General Public License
|
||||
along with this program; if not, write to the Free Software
|
||||
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
|
||||
-}
|
||||
|
||||
{- |
|
||||
Module : Text.Pandoc.Writers.Ipynb
|
||||
Copyright : Copyright (C) 2019 John MacFarlane
|
||||
License : GNU GPL, version 2 or above
|
||||
|
||||
Maintainer : John MacFarlane <jgm@berkeley.edu>
|
||||
Stability : alpha
|
||||
Portability : portable
|
||||
|
||||
Ipynb (Jupyter notebook JSON format) writer for pandoc.
|
||||
|
||||
-}
|
||||
module Text.Pandoc.Writers.Ipynb ( writeIpynb )
|
||||
where
|
||||
import Prelude
|
||||
import Control.Monad (foldM)
|
||||
import Control.Monad.State
|
||||
import qualified Data.Map as M
|
||||
import Data.Maybe (catMaybes, fromMaybe)
|
||||
import Text.Pandoc.Options
|
||||
import Text.Pandoc.Definition
|
||||
import Data.Ipynb as Ipynb
|
||||
import Text.Pandoc.Walk (walkM)
|
||||
import qualified Text.Pandoc.Builder as B
|
||||
import Text.Pandoc.Class
|
||||
import Text.Pandoc.Logging
|
||||
import Data.Text (Text)
|
||||
import qualified Data.Text as T
|
||||
import Data.Aeson as Aeson
|
||||
import qualified Text.Pandoc.UTF8 as UTF8
|
||||
import Text.Pandoc.Shared (safeRead)
|
||||
import Text.Pandoc.Writers.Shared (metaToJSON')
|
||||
import Text.Pandoc.Writers.Markdown (writeMarkdown)
|
||||
import qualified Data.Text.Encoding as TE
|
||||
import qualified Data.ByteString.Lazy as BL
|
||||
import Data.Aeson.Encode.Pretty (Config(..), defConfig,
|
||||
encodePretty', keyOrder, Indent(Spaces))
|
||||
|
||||
writeIpynb :: PandocMonad m => WriterOptions -> Pandoc -> m Text
|
||||
writeIpynb opts d = do
|
||||
notebook <- pandocToNotebook opts d
|
||||
return $ TE.decodeUtf8 . BL.toStrict . encodePretty' defConfig{
|
||||
confIndent = Spaces 1,
|
||||
confCompare = keyOrder
|
||||
[ "cells", "nbformat", "nbformat_minor",
|
||||
"cell_type", "output_type",
|
||||
"execution_count", "metadata",
|
||||
"outputs", "source",
|
||||
"data", "name", "text" ] }
|
||||
$ notebook
|
||||
|
||||
pandocToNotebook :: PandocMonad m
|
||||
=> WriterOptions -> Pandoc -> m (Notebook NbV4)
|
||||
pandocToNotebook opts (Pandoc meta blocks) = do
|
||||
let blockWriter bs = writeMarkdown
|
||||
opts{ writerTemplate = Nothing } (Pandoc nullMeta bs)
|
||||
let inlineWriter ils = T.stripEnd <$> writeMarkdown
|
||||
opts{ writerTemplate = Nothing } (Pandoc nullMeta [Plain ils])
|
||||
metadata' <- metaToJSON' blockWriter inlineWriter $
|
||||
B.deleteMeta "nbformat" $
|
||||
B.deleteMeta "nbformat_minor" $ meta
|
||||
let metadata = case fromJSON metadata' of
|
||||
Error _ -> mempty -- TODO warning here? shouldn't happen
|
||||
Success x -> x
|
||||
cells <- extractCells opts blocks
|
||||
return $ Notebook{
|
||||
notebookMetadata = metadata
|
||||
, notebookFormat = (4, 5)
|
||||
, notebookCells = cells }
|
||||
|
||||
addAttachment :: PandocMonad m
|
||||
=> Inline
|
||||
-> StateT (M.Map Text MimeBundle) m Inline
|
||||
addAttachment (Image attr lab (src,tit)) = do
|
||||
(img, mbmt) <- fetchItem src
|
||||
let mt = maybe "application/octet-stream" (T.pack) mbmt
|
||||
modify $ M.insert (T.pack src)
|
||||
(MimeBundle (M.insert mt (BinaryData img) mempty))
|
||||
return $ Image attr lab ("attachment:" <> src, tit)
|
||||
addAttachment x = return x
|
||||
|
||||
extractCells :: PandocMonad m => WriterOptions -> [Block] -> m [Cell a]
|
||||
extractCells _ [] = return []
|
||||
extractCells opts (Div (_id,classes,kvs) xs : bs)
|
||||
| "cell" `elem` classes
|
||||
, "markdown" `elem` classes = do
|
||||
let meta = pairsToJSONMeta kvs
|
||||
(newdoc, attachments) <-
|
||||
runStateT (walkM addAttachment (Pandoc nullMeta xs)) mempty
|
||||
source <- writeMarkdown opts{ writerTemplate = Nothing } newdoc
|
||||
(Cell{
|
||||
cellType = Markdown
|
||||
, cellSource = Source $ breakLines source
|
||||
, cellMetadata = meta
|
||||
, cellAttachments = if M.null attachments
|
||||
then Nothing
|
||||
else Just attachments } :)
|
||||
<$> extractCells opts bs
|
||||
| "cell" `elem` classes
|
||||
, "code" `elem` classes = do
|
||||
let (codeContent, rest) =
|
||||
case xs of
|
||||
(CodeBlock _ t : ys) -> (T.pack t, ys)
|
||||
ys -> (mempty, ys)
|
||||
let meta = pairsToJSONMeta kvs
|
||||
outputs <- catMaybes <$> mapM blockToOutput rest
|
||||
let exeCount = lookup "execution_count" kvs >>= safeRead
|
||||
(Cell{
|
||||
cellType = Ipynb.Code {
|
||||
codeExecutionCount = exeCount
|
||||
, codeOutputs = outputs
|
||||
}
|
||||
, cellSource = Source $ breakLines codeContent
|
||||
, cellMetadata = meta
|
||||
, cellAttachments = Nothing } :) <$> extractCells opts bs
|
||||
| "cell" `elem` classes
|
||||
, "raw" `elem` classes =
|
||||
case xs of
|
||||
[RawBlock (Format f) raw] -> do
|
||||
let format' =
|
||||
case f of
|
||||
"html" -> "text/html"
|
||||
"revealjs" -> "text/html"
|
||||
"latex" -> "text/latex"
|
||||
"markdown" -> "text/markdown"
|
||||
"rst" -> "text/x-rst"
|
||||
_ -> f
|
||||
(Cell{
|
||||
cellType = Raw
|
||||
, cellSource = Source $ breakLines $ T.pack raw
|
||||
, cellMetadata = M.insert "format"
|
||||
(Aeson.String $ T.pack format') mempty
|
||||
, cellAttachments = Nothing } :) <$> extractCells opts bs
|
||||
_ -> extractCells opts bs
|
||||
extractCells opts (CodeBlock (_id,classes,kvs) raw : bs)
|
||||
| "code" `elem` classes = do
|
||||
let meta = pairsToJSONMeta kvs
|
||||
let exeCount = lookup "execution_count" kvs >>= safeRead
|
||||
(Cell{
|
||||
cellType = Ipynb.Code {
|
||||
codeExecutionCount = exeCount
|
||||
, codeOutputs = []
|
||||
}
|
||||
, cellSource = Source $ breakLines $ T.pack raw
|
||||
, cellMetadata = meta
|
||||
, cellAttachments = Nothing } :) <$> extractCells opts bs
|
||||
extractCells opts (b:bs) = do
|
||||
let isCodeOrDiv (CodeBlock (_,cl,_) _) = "code" `elem` cl
|
||||
isCodeOrDiv (Div (_,cl,_) _) = "cell" `elem` cl
|
||||
isCodeOrDiv _ = False
|
||||
let (mds, rest) = break (isCodeOrDiv) bs
|
||||
extractCells opts (Div ("",["cell","markdown"],[]) (b:mds) : rest)
|
||||
|
||||
blockToOutput :: PandocMonad m => Block -> m (Maybe (Output a))
|
||||
blockToOutput (Div (_,["output","stream",sname],_) (CodeBlock _ t:_)) =
|
||||
return $ Just
|
||||
$ Stream{ streamName = T.pack sname
|
||||
, streamText = Source (breakLines $ T.pack t) }
|
||||
blockToOutput (Div (_,["output","error"],kvs) (CodeBlock _ t:_)) =
|
||||
return $ Just
|
||||
$ Err{ errName = maybe mempty T.pack (lookup "ename" kvs)
|
||||
, errValue = maybe mempty T.pack (lookup "evalue" kvs)
|
||||
, errTraceback = breakLines $ T.pack t }
|
||||
blockToOutput (Div (_,["output","execute_result"],kvs) bs) = do
|
||||
(data', metadata') <- extractData bs
|
||||
return $ Just
|
||||
$ ExecuteResult{ executeCount = fromMaybe 0 $
|
||||
lookup "execution_count" kvs >>= safeRead
|
||||
, executeData = data'
|
||||
, executeMetadata = pairsToJSONMeta kvs <> metadata'}
|
||||
blockToOutput (Div (_,["output","display_data"],kvs) bs) = do
|
||||
(data', metadata') <- extractData bs
|
||||
return $ Just
|
||||
$ DisplayData { displayData = data'
|
||||
, displayMetadata = pairsToJSONMeta kvs <> metadata'}
|
||||
blockToOutput _ = return Nothing
|
||||
|
||||
extractData :: PandocMonad m => [Block] -> m (MimeBundle, JSONMeta)
|
||||
extractData bs = do
|
||||
(mmap, meta) <- foldM go mempty bs
|
||||
return (MimeBundle mmap, meta)
|
||||
where
|
||||
go (mmap, meta) b@(Para [Image (_,_,kvs) _ (src,_)]) = do
|
||||
(img, mbmt) <- fetchItem src
|
||||
case mbmt of
|
||||
Just mt -> return
|
||||
(M.insert (T.pack mt) (BinaryData img) mmap,
|
||||
meta <> pairsToJSONMeta kvs)
|
||||
Nothing -> (mmap, meta) <$ report (BlockNotRendered b)
|
||||
go (mmap, meta) b@(CodeBlock (_,["json"],_) code) =
|
||||
case decode (UTF8.fromStringLazy code) of
|
||||
Just v -> return
|
||||
(M.insert "application/json" (JsonData v) mmap, meta)
|
||||
Nothing -> (mmap, meta) <$ report (BlockNotRendered b)
|
||||
go (mmap, meta) (CodeBlock ("",[],[]) code) =
|
||||
return (M.insert "text/plain" (TextualData (T.pack code)) mmap, meta)
|
||||
go (mmap, meta) (RawBlock (Format "html") raw) =
|
||||
return (M.insert "text/html" (TextualData (T.pack raw)) mmap, meta)
|
||||
go (mmap, meta) (RawBlock (Format "latex") raw) =
|
||||
return (M.insert "text/latex" (TextualData (T.pack raw)) mmap, meta)
|
||||
go (mmap, meta) b = (mmap, meta) <$ report (BlockNotRendered b)
|
||||
|
||||
pairsToJSONMeta :: [(String, String)] -> JSONMeta
|
||||
pairsToJSONMeta kvs =
|
||||
M.fromList [(T.pack k, case v of
|
||||
"true" -> Bool True
|
||||
"false" -> Bool False
|
||||
_ -> case safeRead v of
|
||||
Just n -> Number n
|
||||
_ -> String (T.pack v))
|
||||
| (k,v) <- kvs , k /= "execution_count" ]
|
|
@ -21,6 +21,7 @@ packages:
|
|||
extra-deps:
|
||||
- github: jgm/pandoc-citeproc
|
||||
commit: 4b467c62af17ddfc739933891c5ea2291a6b9b76
|
||||
- ipynb-0.1
|
||||
- haddock-library-1.7.0
|
||||
- HsYAML-0.1.1.2
|
||||
- yaml-0.11.0.0
|
||||
|
|
Loading…
Reference in a new issue