diff --git a/MANUAL.txt b/MANUAL.txt index f9a751451..ca2c5ea7f 100644 --- a/MANUAL.txt +++ b/MANUAL.txt @@ -248,6 +248,7 @@ General options {.options} if you need extensions not supported in [`gfm`](#markdown-variants). - `haddock` ([Haddock markup]) - `html` ([HTML]) + - `ipynb` ([Jupyter notebook]) - `jats` ([JATS] XML) - `json` (JSON version of native AST) - `latex` ([LaTeX]) @@ -300,6 +301,7 @@ General options {.options} - `html` or `html5` ([HTML], i.e. [HTML5]/XHTML [polyglot markup]) - `html4` ([XHTML] 1.0 Transitional) - `icml` ([InDesign ICML]) + - `ipynb` ([Jupyter notebook]) - `jats` ([JATS] XML) - `json` (JSON version of native AST) - `latex` ([LaTeX]) @@ -471,6 +473,7 @@ General options {.options} [PDF]: https://www.adobe.com/pdf/ [reveal.js]: http://lab.hakim.se/reveal-js/ [FictionBook2]: http://www.fictionbook.org/index.php/Eng:XML_Schema_Fictionbook_2.1 +[Jupyter notebook]: https://nbformat.readthedocs.io/en/latest/ [InDesign ICML]: http://wwwimages.adobe.com/www.adobe.com/content/dam/acom/en/devnet/indesign/sdk/cs6/idml/idml-cookbook.pdf [TEI Simple]: https://github.com/TEIC/TEI-Simple [Muse]: https://amusewiki.org/library/manual @@ -730,6 +733,8 @@ General writer options {.options} where there are nonsemantic newlines in the source, there will be nonsemantic newlines in the output as well). Automatic wrapping does not currently work in HTML output. + In `ipynb` output, this option affects wrapping of the + contents of markdown cells. `--columns=`*NUMBER* @@ -902,6 +907,7 @@ Options affecting specific writers {.options} : Use ATX-style headers in Markdown output. The default is to use setext-style headers for levels 1-2, and then ATX headers. (Note: for `gfm` output, ATX headers are always used.) + This option also affects markdown cells in `ipynb` output. `--top-level-division=[default|section|chapter|part]` @@ -1806,6 +1812,10 @@ section [Pandoc's Markdown] below (See [Markdown variants] for `commonmark` and `gfm`.) In the following, extensions that also work for other formats are covered. +Note that markdown extensions added to the `ipynb` format +affect Markdown cells in Jupyter notebooks (as do command-line +options like `--atx-headers`). + Typography ---------- @@ -1955,11 +1965,19 @@ This extension can be enabled/disabled for the following formats input formats : `latex`, `org`, `textile`, `html` (environments, `\ref`, and - `\eqref` only) + `\eqref` only), `ipynb` output formats : `textile`, `commonmark` +Note: as applied to `ipynb`, `raw_html` and `raw_tex` affect not +only raw TeX in markdown cells, but data with mime type +`text/html` in output cells. Since the `ipynb` reader attempts +to preserve the richest possible outputs when several options +are given, you will get best results if you disable `raw_html` +and `raw_tex` when converting to formats like `docx` which don't +allow raw `html` or `tex`. + #### Extension: `native_divs` {#native_divs} This extension is enabled by default for HTML input. This means that @@ -4747,6 +4765,112 @@ with the `src` attribute. For example: +Creating Jupyter notebooks with pandoc +====================================== + +When creating a [Jupyter notebook], pandoc will try to infer the +notebook structure. Code blocks with the class `code` will be +taken as code cells, and intervening content will be taken as +Markdown cells. Attachments will automatically be created for +images in Markdown cells. For example: + +```` +--- +title: My notebook +nbformat: 4 +nbformat_minor: 5 +kernelspec: + display_name: Python 2 + language: python + name: python2 +language_info: + codemirror_mode: + name: ipython + version: 2 + file_extension: ".py" + mimetype: "text/x-python" + name: "python" + nbconvert_exporter: "python" + pygments_lexer: "ipython2" + version: "2.7.15" +--- + +# Lorem ipsum + +**Lorem ipsum** dolor sit amet, consectetur adipiscing elit. Nunc luctus +bibendum felis dictum sodales. + +``` code +print("hello") +``` + +## Pyout + +``` code +from IPython.display import HTML +HTML(""" + +HTML +""") +``` + +## Image + +This image ![image](myimage.png) will be +included as a cell attachment. +```` + +If you want to add cell attributes, group cells differently, or +add output to code cells, then you need to include divs to +indicate the structure. You can use either [fenced +divs][Extension: `fenced_divs`] or [native divs][Extension: +`native_divs`] for this. Here is an example: + +```` +:::::: {.cell .markdown} +# Lorem + +**Lorem ipsum** dolor sit amet, consectetur adipiscing elit. Nunc luctus +bibendum felis dictum sodales. +:::::: + +:::::: {.cell .code execution_count=1} +``` {.python} +print("hello") +``` + +::: {.output .stream .stdout} +``` +hello +``` +::: +:::::: + +:::::: {.cell .code execution_count=2} +``` {.python} +from IPython.display import HTML +HTML(""" + +HTML +""") +``` + +::: {.output .execute_result execution_count=2} +```{=html} + +HTML +hello +``` +::: +:::::: +```` + Syntax highlighting =================== diff --git a/README.md b/README.md index 76e13bbe2..6094aa8d7 100644 --- a/README.md +++ b/README.md @@ -32,6 +32,7 @@ library. It can convert *from* - `creole` ([Creole 1.0](http://www.wikicreole.org/wiki/Creole1.0)) - `docbook` ([DocBook](http://docbook.org)) - `docx` ([Word docx](https://en.wikipedia.org/wiki/Office_Open_XML)) + - `dokuwiki` ([DokuWiki markup](https://www.dokuwiki.org/dokuwiki)) - `epub` ([EPUB](http://idpf.org/epub)) - `fb2` ([FictionBook2](http://www.fictionbook.org/index.php/Eng:XML_Schema_Fictionbook_2.1) @@ -44,6 +45,8 @@ library. It can convert *from* - `haddock` ([Haddock markup](https://www.haskell.org/haddock/doc/html/ch03s08.html)) - `html` ([HTML](http://www.w3.org/html/)) + - `ipynb` ([Jupyter + notebook](https://nbformat.readthedocs.io/en/latest/)) - `jats` ([JATS](https://jats.nlm.nih.gov) XML) - `json` (JSON version of native AST) - `latex` ([LaTeX](http://latex-project.org)) @@ -105,6 +108,8 @@ It can convert *to* - `html4` ([XHTML](http://www.w3.org/TR/xhtml1/) 1.0 Transitional) - `icml` ([InDesign ICML](http://wwwimages.adobe.com/www.adobe.com/content/dam/acom/en/devnet/indesign/sdk/cs6/idml/idml-cookbook.pdf)) + - `ipynb` ([Jupyter + notebook](https://nbformat.readthedocs.io/en/latest/)) - `jats` ([JATS](https://jats.nlm.nih.gov) XML) - `json` (JSON version of native AST) - `latex` ([LaTeX](http://latex-project.org)) diff --git a/cabal.project b/cabal.project index d291a15ab..ee0a8c237 100644 --- a/cabal.project +++ b/cabal.project @@ -4,3 +4,4 @@ source-repository-package type: git location: https://github.com/jgm/pandoc-citeproc tag: 4b467c62af17ddfc739933891c5ea2291a6b9b76 + diff --git a/pandoc.cabal b/pandoc.cabal index c7d61cde8..f8852c0f5 100644 --- a/pandoc.cabal +++ b/pandoc.cabal @@ -20,14 +20,14 @@ description: Pandoc is a Haskell library for converting from one markup (subsets of) HTML, reStructuredText, LaTeX, DocBook, JATS, MediaWiki markup, DokuWiki markup, TWiki markup, TikiWiki markup, Creole 1.0, Haddock markup, OPML, - Emacs Org-Mode, Emacs Muse, txt2tags, - Vimwiki, Word Docx, ODT, EPUB, FictionBook2, roff man, - and Textile, and it can write Markdown, reStructuredText, - XHTML, HTML 5, LaTeX, ConTeXt, DocBook, JATS, OPML, TEI, - OpenDocument, ODT, Word docx, PowerPoint pptx, - RTF, MediaWiki, DokuWiki, ZimWiki, Textile, + Emacs Org-Mode, Emacs Muse, txt2tags, ipynb (Jupyter + notebooks), Vimwiki, Word Docx, ODT, EPUB, FictionBook2, + roff man, and Textile, and it can write Markdown, + reStructuredText, XHTML, HTML 5, LaTeX, ConTeXt, DocBook, + JATS, OPML, TEI, OpenDocument, ODT, Word docx, + PowerPoint pptx, RTF, MediaWiki, DokuWiki, ZimWiki, Textile, roff man, roff ms, plain text, Emacs Org-Mode, - AsciiDoc, Haddock markup, EPUB (v2 and v3), + AsciiDoc, Haddock markup, EPUB (v2 and v3), ipynb, FictionBook2, InDesign ICML, Muse, LaTeX beamer slides, and several kinds of HTML/JavaScript slide shows (S5, Slidy, Slideous, DZSlides, reveal.js). @@ -398,7 +398,8 @@ library http-types >= 0.8 && < 0.13, case-insensitive >= 1.2 && < 1.3, unicode-transforms >= 0.3 && < 0.4, - HsYAML >= 0.1.1.1 && < 0.2 + HsYAML >= 0.1.1.1 && < 0.2, + ipynb >= 0.1 && < 0.2 if impl(ghc < 8.0) build-depends: semigroups == 0.18.*, -- basement 0.0.8 and foundation 0.0.21, transitive @@ -470,12 +471,14 @@ library Text.Pandoc.Readers.Man, Text.Pandoc.Readers.FB2, Text.Pandoc.Readers.DokuWiki, + Text.Pandoc.Readers.Ipynb, Text.Pandoc.Writers, Text.Pandoc.Writers.Native, Text.Pandoc.Writers.Docbook, Text.Pandoc.Writers.JATS, Text.Pandoc.Writers.OPML, Text.Pandoc.Writers.HTML, + Text.Pandoc.Writers.Ipynb, Text.Pandoc.Writers.ICML, Text.Pandoc.Writers.LaTeX, Text.Pandoc.Writers.ConTeXt, diff --git a/src/Text/Pandoc/App/FormatHeuristics.hs b/src/Text/Pandoc/App/FormatHeuristics.hs index c8dbcd645..723c5d894 100644 --- a/src/Text/Pandoc/App/FormatHeuristics.hs +++ b/src/Text/Pandoc/App/FormatHeuristics.hs @@ -90,5 +90,6 @@ formatFromFilePath x = ".txt" -> Just "markdown" ".wiki" -> Just "mediawiki" ".xhtml" -> Just "html" + ".ipynb" -> Just "ipynb" ['.',y] | y `elem` ['1'..'9'] -> Just "man" _ -> Nothing diff --git a/src/Text/Pandoc/Error.hs b/src/Text/Pandoc/Error.hs index e4a5bb092..62843dbd2 100644 --- a/src/Text/Pandoc/Error.hs +++ b/src/Text/Pandoc/Error.hs @@ -70,6 +70,7 @@ data PandocError = PandocIOError String IOError | PandocEpubSubdirectoryError String | PandocMacroLoop String | PandocUTF8DecodingError String Int Word8 + | PandocIpynbDecodingError String deriving (Show, Typeable, Generic) instance Exception PandocError @@ -124,6 +125,8 @@ handleError (Left e) = "UTF-8 decoding error in " ++ f ++ " at byte offset " ++ show offset ++ " (" ++ printf "%2x" w ++ ").\n" ++ "The input must be a UTF-8 encoded text." + PandocIpynbDecodingError w -> err 93 $ + "ipynb decoding error: " ++ w err :: Int -> String -> IO a err exitCode msg = do diff --git a/src/Text/Pandoc/Extensions.hs b/src/Text/Pandoc/Extensions.hs index f660cf766..b96fafa81 100644 --- a/src/Text/Pandoc/Extensions.hs +++ b/src/Text/Pandoc/Extensions.hs @@ -332,6 +332,8 @@ getDefaultExtensions "markdown_phpextra" = phpMarkdownExtraExtensions getDefaultExtensions "markdown_mmd" = multimarkdownExtensions getDefaultExtensions "markdown_github" = githubMarkdownExtensions getDefaultExtensions "markdown" = pandocExtensions +getDefaultExtensions "ipynb" = enableExtension Ext_tex_math_dollars + githubMarkdownExtensions getDefaultExtensions "muse" = extensionsFromList [Ext_amuse, Ext_auto_identifiers] diff --git a/src/Text/Pandoc/Readers.hs b/src/Text/Pandoc/Readers.hs index 434725887..04ba64f89 100644 --- a/src/Text/Pandoc/Readers.hs +++ b/src/Text/Pandoc/Readers.hs @@ -67,6 +67,7 @@ module Text.Pandoc.Readers , readEPUB , readMuse , readFB2 + , readIpynb -- * Miscellaneous , getReader , getDefaultExtensions @@ -90,6 +91,7 @@ import Text.Pandoc.Readers.Docx import Text.Pandoc.Readers.DokuWiki import Text.Pandoc.Readers.EPUB import Text.Pandoc.Readers.FB2 +import Text.Pandoc.Readers.Ipynb import Text.Pandoc.Readers.Haddock import Text.Pandoc.Readers.HTML (readHtml) import Text.Pandoc.Readers.JATS (readJATS) @@ -147,6 +149,7 @@ readers = [ ("native" , TextReader readNative) ,("muse" , TextReader readMuse) ,("man" , TextReader readMan) ,("fb2" , TextReader readFB2) + ,("ipynb" , TextReader readIpynb) ] -- | Retrieve reader, extensions based on formatSpec (format+extensions). diff --git a/src/Text/Pandoc/Readers/Ipynb.hs b/src/Text/Pandoc/Readers/Ipynb.hs new file mode 100644 index 000000000..2497e6a0a --- /dev/null +++ b/src/Text/Pandoc/Readers/Ipynb.hs @@ -0,0 +1,249 @@ +{-# LANGUAGE NoImplicitPrelude #-} +{-# LANGUAGE DeriveGeneric #-} +{-# LANGUAGE LambdaCase #-} +{-# LANGUAGE OverloadedStrings #-} +{-# LANGUAGE FlexibleContexts #-} +{-# LANGUAGE ScopedTypeVariables #-} +{- +Copyright (C) 2019 John MacFarlane + +This program is free software; you can redistribute it and/or modify +it under the terms of the GNU General Public License as published by +the Free Software Foundation; either version 2 of the License, or +(at your option) any later version. + +This program is distributed in the hope that it will be useful, +but WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +GNU General Public License for more details. + +You should have received a copy of the GNU General Public License +along with this program; if not, write to the Free Software +Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA +-} + +{- | + Module : Text.Pandoc.Readers.Ipynb + Copyright : Copyright (C) 2019 John MacFarlane + License : GNU GPL, version 2 or above + + Maintainer : John MacFarlane + Stability : alpha + Portability : portable + +Ipynb (Jupyter notebook JSON format) reader for pandoc. +-} +module Text.Pandoc.Readers.Ipynb ( readIpynb ) +where +import Prelude +import Data.Maybe (fromMaybe) +import Data.Digest.Pure.SHA (sha1, showDigest) +import Text.Pandoc.Options +import qualified Text.Pandoc.Builder as B +import Text.Pandoc.Logging +import Text.Pandoc.Definition +import Data.Ipynb as Ipynb +import Text.Pandoc.Class +import Text.Pandoc.MIME (extensionFromMimeType) +import Text.Pandoc.UTF8 +import Text.Pandoc.Error +import Data.Text (Text) +import qualified Data.Map as M +import qualified Data.Text as T +import qualified Data.Text.Encoding as TE +import qualified Data.ByteString.Lazy as BL +import Data.Aeson as Aeson +import Control.Monad.Except (throwError) +import Text.Pandoc.Readers.Markdown (readMarkdown) +import Text.Pandoc.Readers.HTML (readHtml) + +readIpynb :: PandocMonad m => ReaderOptions -> Text -> m Pandoc +readIpynb opts t = do + let src = BL.fromStrict (TE.encodeUtf8 t) + case eitherDecode src of + Right (notebook4 :: Notebook NbV4) -> notebookToPandoc opts notebook4 + Left _ -> + case eitherDecode src of + Right (notebook3 :: Notebook NbV3) -> notebookToPandoc opts notebook3 + Left err -> throwError $ PandocIpynbDecodingError err + +notebookToPandoc :: (PandocMonad m, FromJSON (Notebook a)) + => ReaderOptions -> Notebook a -> m Pandoc +notebookToPandoc opts notebook = do + let cells = notebookCells notebook + let (fmt,fmtminor) = notebookFormat notebook + let m = M.insert "nbformat" (MetaString $ show fmt) $ + M.insert "nbformat_minor" (MetaString $ show fmtminor) $ + jsonMetaToMeta (notebookMetadata notebook) + let lang = case M.lookup "kernelspec" m of + Just (MetaMap ks) -> + case M.lookup "language" ks of + Just (MetaString l) -> l + _ -> "python" + _ -> "python" + bs <- mconcat <$> mapM (cellToBlocks opts lang) cells + let Pandoc _ blocks = B.doc bs + return $ Pandoc (Meta m) blocks + +cellToBlocks :: PandocMonad m + => ReaderOptions -> String -> Cell a -> m B.Blocks +cellToBlocks opts lang c = do + let Source ts = cellSource c + let source = mconcat ts + let kvs = jsonMetaToPairs (cellMetadata c) + let attachments = maybe mempty M.toList $ cellAttachments c + mapM_ addAttachment attachments + case cellType c of + Ipynb.Markdown -> do + Pandoc _ bs <- readMarkdown opts source + return $ B.divWith ("",["cell","markdown"],kvs) + $ B.fromList bs + Ipynb.Heading lev -> do + Pandoc _ bs <- readMarkdown opts + (T.replicate lev "#" <> " " <> source) + return $ B.divWith ("",["cell","markdown"],kvs) + $ B.fromList bs + Ipynb.Raw -> do + let format = fromMaybe "" $ lookup "format" kvs + let format' = + case format of + "text/html" -> "html" + "text/latex" -> "latex" + "application/pdf" -> "latex" + "text/markdown" -> "markdown" + "text/x-rsrt" -> "rst" + _ -> format + return $ B.divWith ("",["cell","raw"],kvs) $ B.rawBlock format' + $ T.unpack source + Ipynb.Code{ codeOutputs = outputs, codeExecutionCount = ec } -> do + outputBlocks <- mconcat <$> mapM (outputToBlock opts) outputs + let kvs' = maybe kvs (\x -> ("execution_count", show x):kvs) ec + return $ B.divWith ("",["cell","code"],kvs') $ + B.codeBlockWith ("",[lang],[]) (T.unpack source) + <> outputBlocks + +addAttachment :: PandocMonad m => (Text, MimeBundle) -> m () +addAttachment (fname, mimeBundle) = do + let fp = T.unpack fname + case M.toList (unMimeBundle mimeBundle) of + (mimeType, BinaryData bs):_ -> + insertMedia fp (Just $ T.unpack mimeType) (BL.fromStrict bs) + (mimeType, TextualData t):_ -> + insertMedia fp (Just $ T.unpack mimeType) + (BL.fromStrict $ TE.encodeUtf8 t) + (mimeType, JsonData v):_ -> + insertMedia fp (Just $ T.unpack mimeType) (encode v) + [] -> report $ CouldNotFetchResource fp "no attachment" + +outputToBlock :: PandocMonad m => ReaderOptions -> Output a -> m B.Blocks +outputToBlock _ Stream{ streamName = sName, + streamText = Source text } = do + return $ B.divWith ("",["output","stream",T.unpack sName],[]) + $ B.codeBlock $ T.unpack . mconcat $ text +outputToBlock opts DisplayData{ displayData = data', + displayMetadata = metadata' } = + B.divWith ("",["output", "display_data"],[]) <$> + handleData opts metadata' data' +outputToBlock opts ExecuteResult{ executeCount = ec, + executeData = data', + executeMetadata = metadata' } = + B.divWith ("",["output", "execute_result"],[("execution_count",show ec)]) + <$> handleData opts metadata' data' +outputToBlock _ Err{ errName = ename, + errValue = evalue, + errTraceback = traceback } = do + return $ B.divWith ("",["output","error"], + [("ename",T.unpack ename), + ("evalue",T.unpack evalue)]) + $ B.codeBlock $ T.unpack . T.unlines $ traceback + +-- We want to display the richest output possible given +-- the output format. +handleData :: PandocMonad m + => ReaderOptions -> JSONMeta -> MimeBundle -> m B.Blocks +handleData opts metadata (MimeBundle mb) = do + let mimePairs = M.toList mb + + results <- mapM dataBlock mimePairs + + -- return the result with highest priority: + + let highest = maximum (0 : map fst results) + return $ case [r | (pr, r) <- results, pr == highest] of + x:_ -> x + [] -> mempty + + where + + exts = readerExtensions opts + + dataBlock :: PandocMonad m => (MimeType, MimeData) -> m (Int, B.Blocks) + dataBlock (mt, BinaryData bs) + | "image/" `T.isPrefixOf` mt + = do + -- normally metadata maps from mime types to key-value map; + -- but not always... + let meta = case M.lookup mt metadata of + Just v@(Object{}) -> + case fromJSON v of + Success m' -> m' + Error _ -> mempty + _ -> mempty + let metaPairs = jsonMetaToPairs meta + let bl = BL.fromStrict bs + -- SHA1 hash for filename + let mt' = T.unpack mt + let fname = showDigest (sha1 bl) ++ + case extensionFromMimeType mt' of + Nothing -> "" + Just ext -> '.':ext + insertMedia fname (Just mt') bl + return (3, B.para $ B.imageWith ("",[],metaPairs) fname "" mempty) + + dataBlock (_, BinaryData _) = return (0, mempty) + + dataBlock ("text/html", TextualData t) + | extensionEnabled Ext_raw_html exts + = return (2, B.rawBlock "html" $ T.unpack t) + | otherwise = do -- try parsing the HTML + Pandoc _ bls <- readHtml opts t + return (1, B.fromList bls) + + dataBlock ("text/latex", TextualData t) = + return $ if extensionEnabled Ext_raw_tex exts + then (2, B.rawBlock "latex" $ T.unpack t) + else (0, mempty) + + dataBlock ("text/plain", TextualData t) = + return (0, B.codeBlock $ T.unpack t) + + dataBlock (_, JsonData v) = + return (2, B.codeBlockWith ("",["json"],[]) $ toStringLazy $ encode v) + + dataBlock _ = return (0, mempty) + +jsonMetaToMeta :: JSONMeta -> M.Map String MetaValue +jsonMetaToMeta = M.mapKeys T.unpack . M.map valueToMetaValue + where + valueToMetaValue :: Value -> MetaValue + valueToMetaValue x@(Object{}) = + case fromJSON x of + Error s -> MetaString s + Success jm' -> MetaMap $ jsonMetaToMeta jm' + valueToMetaValue x@(Array{}) = + case fromJSON x of + Error s -> MetaString s + Success xs -> MetaList $ map valueToMetaValue xs + valueToMetaValue (Bool b) = MetaBool b + valueToMetaValue (String t) = MetaString (T.unpack t) + valueToMetaValue (Number n) = MetaString (show n) + valueToMetaValue Aeson.Null = MetaString "" + +jsonMetaToPairs :: JSONMeta -> [(String, String)] +jsonMetaToPairs = M.toList . M.mapMaybe + (\case + MetaString s -> Just s + MetaBool True -> Just "true" + MetaBool False -> Just "false" + -- for now we skip complex cell metadata: + _ -> Nothing) . jsonMetaToMeta diff --git a/src/Text/Pandoc/Templates.hs b/src/Text/Pandoc/Templates.hs index 80e2b1fa4..492e0ef91 100644 --- a/src/Text/Pandoc/Templates.hs +++ b/src/Text/Pandoc/Templates.hs @@ -58,6 +58,7 @@ getDefaultTemplate writer = do "docx" -> return "" "fb2" -> return "" "pptx" -> return "" + "ipynb" -> return "" "odt" -> getDefaultTemplate "opendocument" "html" -> getDefaultTemplate "html5" "docbook" -> getDefaultTemplate "docbook5" diff --git a/src/Text/Pandoc/Writers.hs b/src/Text/Pandoc/Writers.hs index b61404238..2aca56147 100644 --- a/src/Text/Pandoc/Writers.hs +++ b/src/Text/Pandoc/Writers.hs @@ -49,6 +49,7 @@ module Text.Pandoc.Writers , writeEPUB2 , writeEPUB3 , writeFB2 + , writeIpynb , writeHaddock , writeHtml4 , writeHtml4String @@ -101,6 +102,7 @@ import Text.Pandoc.Writers.Docx import Text.Pandoc.Writers.DokuWiki import Text.Pandoc.Writers.EPUB import Text.Pandoc.Writers.FB2 +import Text.Pandoc.Writers.Ipynb import Text.Pandoc.Writers.Haddock import Text.Pandoc.Writers.HTML import Text.Pandoc.Writers.ICML @@ -140,6 +142,7 @@ writers = [ ,("epub2" , ByteStringWriter writeEPUB2) ,("epub3" , ByteStringWriter writeEPUB3) ,("fb2" , TextWriter writeFB2) + ,("ipynb" , TextWriter writeIpynb) ,("html" , TextWriter writeHtml5String) ,("html4" , TextWriter writeHtml4String) ,("html5" , TextWriter writeHtml5String) diff --git a/src/Text/Pandoc/Writers/Ipynb.hs b/src/Text/Pandoc/Writers/Ipynb.hs new file mode 100644 index 000000000..6288b2e96 --- /dev/null +++ b/src/Text/Pandoc/Writers/Ipynb.hs @@ -0,0 +1,233 @@ +{-# LANGUAGE NoImplicitPrelude #-} +{-# LANGUAGE DeriveGeneric #-} +{-# LANGUAGE OverloadedStrings #-} +{-# LANGUAGE ScopedTypeVariables #-} +{- +Copyright (C) 2019 John MacFarlane + +This program is free software; you can redistribute it and/or modify +it under the terms of the GNU General Public License as published by +the Free Software Foundation; either version 2 of the License, or +(at your option) any later version. + +This program is distributed in the hope that it will be useful, +but WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +GNU General Public License for more details. + +You should have received a copy of the GNU General Public License +along with this program; if not, write to the Free Software +Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA +-} + +{- | + Module : Text.Pandoc.Writers.Ipynb + Copyright : Copyright (C) 2019 John MacFarlane + License : GNU GPL, version 2 or above + + Maintainer : John MacFarlane + Stability : alpha + Portability : portable + +Ipynb (Jupyter notebook JSON format) writer for pandoc. + +-} +module Text.Pandoc.Writers.Ipynb ( writeIpynb ) +where +import Prelude +import Control.Monad (foldM) +import Control.Monad.State +import qualified Data.Map as M +import Data.Maybe (catMaybes, fromMaybe) +import Text.Pandoc.Options +import Text.Pandoc.Definition +import Data.Ipynb as Ipynb +import Text.Pandoc.Walk (walkM) +import qualified Text.Pandoc.Builder as B +import Text.Pandoc.Class +import Text.Pandoc.Logging +import Data.Text (Text) +import qualified Data.Text as T +import Data.Aeson as Aeson +import qualified Text.Pandoc.UTF8 as UTF8 +import Text.Pandoc.Shared (safeRead) +import Text.Pandoc.Writers.Shared (metaToJSON') +import Text.Pandoc.Writers.Markdown (writeMarkdown) +import qualified Data.Text.Encoding as TE +import qualified Data.ByteString.Lazy as BL +import Data.Aeson.Encode.Pretty (Config(..), defConfig, + encodePretty', keyOrder, Indent(Spaces)) + +writeIpynb :: PandocMonad m => WriterOptions -> Pandoc -> m Text +writeIpynb opts d = do + notebook <- pandocToNotebook opts d + return $ TE.decodeUtf8 . BL.toStrict . encodePretty' defConfig{ + confIndent = Spaces 1, + confCompare = keyOrder + [ "cells", "nbformat", "nbformat_minor", + "cell_type", "output_type", + "execution_count", "metadata", + "outputs", "source", + "data", "name", "text" ] } + $ notebook + +pandocToNotebook :: PandocMonad m + => WriterOptions -> Pandoc -> m (Notebook NbV4) +pandocToNotebook opts (Pandoc meta blocks) = do + let blockWriter bs = writeMarkdown + opts{ writerTemplate = Nothing } (Pandoc nullMeta bs) + let inlineWriter ils = T.stripEnd <$> writeMarkdown + opts{ writerTemplate = Nothing } (Pandoc nullMeta [Plain ils]) + metadata' <- metaToJSON' blockWriter inlineWriter $ + B.deleteMeta "nbformat" $ + B.deleteMeta "nbformat_minor" $ meta + let metadata = case fromJSON metadata' of + Error _ -> mempty -- TODO warning here? shouldn't happen + Success x -> x + cells <- extractCells opts blocks + return $ Notebook{ + notebookMetadata = metadata + , notebookFormat = (4, 5) + , notebookCells = cells } + +addAttachment :: PandocMonad m + => Inline + -> StateT (M.Map Text MimeBundle) m Inline +addAttachment (Image attr lab (src,tit)) = do + (img, mbmt) <- fetchItem src + let mt = maybe "application/octet-stream" (T.pack) mbmt + modify $ M.insert (T.pack src) + (MimeBundle (M.insert mt (BinaryData img) mempty)) + return $ Image attr lab ("attachment:" <> src, tit) +addAttachment x = return x + +extractCells :: PandocMonad m => WriterOptions -> [Block] -> m [Cell a] +extractCells _ [] = return [] +extractCells opts (Div (_id,classes,kvs) xs : bs) + | "cell" `elem` classes + , "markdown" `elem` classes = do + let meta = pairsToJSONMeta kvs + (newdoc, attachments) <- + runStateT (walkM addAttachment (Pandoc nullMeta xs)) mempty + source <- writeMarkdown opts{ writerTemplate = Nothing } newdoc + (Cell{ + cellType = Markdown + , cellSource = Source $ breakLines source + , cellMetadata = meta + , cellAttachments = if M.null attachments + then Nothing + else Just attachments } :) + <$> extractCells opts bs + | "cell" `elem` classes + , "code" `elem` classes = do + let (codeContent, rest) = + case xs of + (CodeBlock _ t : ys) -> (T.pack t, ys) + ys -> (mempty, ys) + let meta = pairsToJSONMeta kvs + outputs <- catMaybes <$> mapM blockToOutput rest + let exeCount = lookup "execution_count" kvs >>= safeRead + (Cell{ + cellType = Ipynb.Code { + codeExecutionCount = exeCount + , codeOutputs = outputs + } + , cellSource = Source $ breakLines codeContent + , cellMetadata = meta + , cellAttachments = Nothing } :) <$> extractCells opts bs + | "cell" `elem` classes + , "raw" `elem` classes = + case xs of + [RawBlock (Format f) raw] -> do + let format' = + case f of + "html" -> "text/html" + "revealjs" -> "text/html" + "latex" -> "text/latex" + "markdown" -> "text/markdown" + "rst" -> "text/x-rst" + _ -> f + (Cell{ + cellType = Raw + , cellSource = Source $ breakLines $ T.pack raw + , cellMetadata = M.insert "format" + (Aeson.String $ T.pack format') mempty + , cellAttachments = Nothing } :) <$> extractCells opts bs + _ -> extractCells opts bs +extractCells opts (CodeBlock (_id,classes,kvs) raw : bs) + | "code" `elem` classes = do + let meta = pairsToJSONMeta kvs + let exeCount = lookup "execution_count" kvs >>= safeRead + (Cell{ + cellType = Ipynb.Code { + codeExecutionCount = exeCount + , codeOutputs = [] + } + , cellSource = Source $ breakLines $ T.pack raw + , cellMetadata = meta + , cellAttachments = Nothing } :) <$> extractCells opts bs +extractCells opts (b:bs) = do + let isCodeOrDiv (CodeBlock (_,cl,_) _) = "code" `elem` cl + isCodeOrDiv (Div (_,cl,_) _) = "cell" `elem` cl + isCodeOrDiv _ = False + let (mds, rest) = break (isCodeOrDiv) bs + extractCells opts (Div ("",["cell","markdown"],[]) (b:mds) : rest) + +blockToOutput :: PandocMonad m => Block -> m (Maybe (Output a)) +blockToOutput (Div (_,["output","stream",sname],_) (CodeBlock _ t:_)) = + return $ Just + $ Stream{ streamName = T.pack sname + , streamText = Source (breakLines $ T.pack t) } +blockToOutput (Div (_,["output","error"],kvs) (CodeBlock _ t:_)) = + return $ Just + $ Err{ errName = maybe mempty T.pack (lookup "ename" kvs) + , errValue = maybe mempty T.pack (lookup "evalue" kvs) + , errTraceback = breakLines $ T.pack t } +blockToOutput (Div (_,["output","execute_result"],kvs) bs) = do + (data', metadata') <- extractData bs + return $ Just + $ ExecuteResult{ executeCount = fromMaybe 0 $ + lookup "execution_count" kvs >>= safeRead + , executeData = data' + , executeMetadata = pairsToJSONMeta kvs <> metadata'} +blockToOutput (Div (_,["output","display_data"],kvs) bs) = do + (data', metadata') <- extractData bs + return $ Just + $ DisplayData { displayData = data' + , displayMetadata = pairsToJSONMeta kvs <> metadata'} +blockToOutput _ = return Nothing + +extractData :: PandocMonad m => [Block] -> m (MimeBundle, JSONMeta) +extractData bs = do + (mmap, meta) <- foldM go mempty bs + return (MimeBundle mmap, meta) + where + go (mmap, meta) b@(Para [Image (_,_,kvs) _ (src,_)]) = do + (img, mbmt) <- fetchItem src + case mbmt of + Just mt -> return + (M.insert (T.pack mt) (BinaryData img) mmap, + meta <> pairsToJSONMeta kvs) + Nothing -> (mmap, meta) <$ report (BlockNotRendered b) + go (mmap, meta) b@(CodeBlock (_,["json"],_) code) = + case decode (UTF8.fromStringLazy code) of + Just v -> return + (M.insert "application/json" (JsonData v) mmap, meta) + Nothing -> (mmap, meta) <$ report (BlockNotRendered b) + go (mmap, meta) (CodeBlock ("",[],[]) code) = + return (M.insert "text/plain" (TextualData (T.pack code)) mmap, meta) + go (mmap, meta) (RawBlock (Format "html") raw) = + return (M.insert "text/html" (TextualData (T.pack raw)) mmap, meta) + go (mmap, meta) (RawBlock (Format "latex") raw) = + return (M.insert "text/latex" (TextualData (T.pack raw)) mmap, meta) + go (mmap, meta) b = (mmap, meta) <$ report (BlockNotRendered b) + +pairsToJSONMeta :: [(String, String)] -> JSONMeta +pairsToJSONMeta kvs = + M.fromList [(T.pack k, case v of + "true" -> Bool True + "false" -> Bool False + _ -> case safeRead v of + Just n -> Number n + _ -> String (T.pack v)) + | (k,v) <- kvs , k /= "execution_count" ] diff --git a/stack.yaml b/stack.yaml index 5b67c6660..d01e0dc35 100644 --- a/stack.yaml +++ b/stack.yaml @@ -21,6 +21,7 @@ packages: extra-deps: - github: jgm/pandoc-citeproc commit: 4b467c62af17ddfc739933891c5ea2291a6b9b76 +- ipynb-0.1 - haddock-library-1.7.0 - HsYAML-0.1.1.2 - yaml-0.11.0.0