pandoc/src/Text/Pandoc/App/FormatHeuristics.hs
John MacFarlane e0984a43a9 Add built-in citation support using new citeproc library.
This deprecates the use of the external pandoc-citeproc
filter; citation processing is now built in to pandoc.

* Add dependency on citeproc library.
* Add Text.Pandoc.Citeproc module (and some associated unexported
  modules under Text.Pandoc.Citeproc).  Exports `processCitations`.
  [API change]
* Add data files needed for Text.Pandoc.Citeproc:  default.csl
  in the data directory, and a citeproc directory that is just
  used at compile-time.  Note that we've added file-embed as a mandatory
  rather than a conditional dependency, because of the biblatex
  localization files. We might eventually want to use readDataFile
  for this, but it would take some code reorganization.
* Text.Pandoc.Logging: Add `CiteprocWarning` to `LogMessage` and use it
  in `processCitations`. [API change]
* Add tests from the pandoc-citeproc package as command tests (including
  some tests pandoc-citeproc did not pass).
* Remove instructions for building pandoc-citeproc from CI and
  release binary build instructions.  We will no longer distribute
  pandoc-citeproc.
* Markdown reader: tweak abbreviation support.  Don't insert a
  nonbreaking space after a potential abbreviation if it comes right before
  a note or citation.  Inserting one there messes up several things,
  including citeproc's moving of note citations.
* Add `csljson` as an input and output format. This allows pandoc
  to convert between `csljson` and other bibliography formats,
  and to generate formatted versions of CSL JSON bibliographies.
* Add module Text.Pandoc.Writers.CslJson, exporting `writeCslJson`. [API
  change]
* Add module Text.Pandoc.Readers.CslJson, exporting `readCslJson`. [API
  change]
* Added `bibtex`, `biblatex` as input formats.  This allows pandoc
  to convert between BibLaTeX and BibTeX and other bibliography formats,
  and to generate formatted versions of BibTeX/BibLaTeX bibliographies.
* Add module Text.Pandoc.Readers.BibTeX, exporting `readBibTeX` and
  `readBibLaTeX`. [API change]
* Make "standalone" implicit if output format is a bibliography format.
  This is needed because pandoc readers for bibliography formats put
  the bibliographic information in the `references` field of metadata;
  and unless standalone is specified, metadata gets ignored.
  (TODO: This needs improvement. We should trigger standalone for the
  reader when the input format is bibliographic, and for the writer
  when the output format is markdown.)
* Carry over `citationNoteNum` to `citationNoteNumber`.  This was just
  ignored in pandoc-citeproc.
* Text.Pandoc.Filter: Add `CiteprocFilter` constructor to Filter.
  [API change] This runs the processCitations transformation.
  We need to treat it like a filter so it can be placed
  in the sequence of filter runs (after some, before others).
  In FromYAML, this is parsed from `citeproc` or `{type: citeproc}`,
  so this special filter may be specified either way in a defaults file
  (or by `citeproc: true`, though this gives no control of positioning
  relative to other filters).  TODO: we need to add something to the
  manual section on defaults files for this.
* Add deprecation warning if `pandoc-citeproc` filter is used.
* Add `--citeproc/-C` option to trigger citation processing.
  This behaves like a filter and will be positioned
  relative to filters as they appear on the command line.
* Rewrote the manual on citations, adding a dedicated Citations
  section which also includes some information formerly found in
  the pandoc-citeproc man page.
* Look for CSL styles in the `csl` subdirectory of the pandoc user data
  directory.  This changes the old pandoc-citeproc behavior, which looked
  in `~/.csl`.  Users can simply symlink `~/.csl` to the `csl`
  subdirectory of their pandoc user data directory if they want
  the old behavior.
* Add support for CSL bibliography entry formatting to LaTeX, HTML,
  Ms writers.  Added CSL-related CSS to styles.html.
2020-09-21 10:15:50 -07:00

{-# LANGUAGE OverloadedStrings #-}
{- |
Module : Text.Pandoc.App.FormatHeuristics
Copyright : Copyright (C) 2006-2020 John MacFarlane
License : GNU GPL, version 2 or above
Maintainer : John MacFarlane <jgm@berkeley.edu>
Stability : alpha
Portability : portable
Guess the format of a file from its name.
-}
module Text.Pandoc.App.FormatHeuristics
( formatFromFilePaths
) where
import Data.Char (toLower)
import Data.Text (Text)
import System.FilePath (takeExtension)
-- Determine default format based on file extensions.
formatFromFilePaths :: [FilePath] -> Maybe Text
formatFromFilePaths [] = Nothing
formatFromFilePaths (x:xs) =
  case formatFromFilePath x of
    Just f -> Just f
    Nothing -> formatFromFilePaths xs

-- Determine format based on file extension
formatFromFilePath :: FilePath -> Maybe Text
formatFromFilePath x =
  case takeExtension (map toLower x) of
    ".adoc" -> Just "asciidoc"
    ".asciidoc" -> Just "asciidoc"
    ".context" -> Just "context"
    ".ctx" -> Just "context"
    ".db" -> Just "docbook"
    ".doc" -> Just "doc" -- so we get an "unknown reader" error
    ".docx" -> Just "docx"
    ".dokuwiki" -> Just "dokuwiki"
    ".epub" -> Just "epub"
    ".fb2" -> Just "fb2"
    ".htm" -> Just "html"
    ".html" -> Just "html"
    ".icml" -> Just "icml"
    ".json" -> Just "json"
    ".latex" -> Just "latex"
    ".lhs" -> Just "markdown+lhs"
    ".ltx" -> Just "latex"
    ".markdown" -> Just "markdown"
    ".md" -> Just "markdown"
    ".ms" -> Just "ms"
    ".muse" -> Just "muse"
    ".native" -> Just "native"
    ".odt" -> Just "odt"
    ".opml" -> Just "opml"
    ".org" -> Just "org"
    ".pdf" -> Just "pdf" -- so we get an "unknown reader" error
    ".pptx" -> Just "pptx"
    ".roff" -> Just "ms"
    ".rst" -> Just "rst"
    ".rtf" -> Just "rtf"
    ".s5" -> Just "s5"
    ".t2t" -> Just "t2t"
    ".tei" -> Just "tei"
    ".tei.xml" -> Just "tei"
    ".tex" -> Just "latex"
    ".texi" -> Just "texinfo"
    ".texinfo" -> Just "texinfo"
    ".text" -> Just "markdown"
    ".textile" -> Just "textile"
    ".txt" -> Just "markdown"
    ".wiki" -> Just "mediawiki"
    ".xhtml" -> Just "html"
    ".ipynb" -> Just "ipynb"
    ".csv" -> Just "csv"
    ".bib" -> Just "biblatex"
    ['.',y] | y `elem` ['1'..'9'] -> Just "man"
    _ -> Nothing
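
The lookup above can be tried out with a minimal, self-contained sketch. This is not the module itself: `guessFormat` and `guessFormats` are hypothetical stand-ins that use `String` instead of `Text` and carry only a handful of the extensions, so that the example needs nothing beyond `base` and `filepath`. It illustrates three things: the first path with a recognized extension wins, matching is case-insensitive, and `takeExtension` returns only the final extension of a name such as `chapter.tei.xml`, which is worth keeping in mind when reading the `".tei.xml"` entry.

```haskell
import Data.Char (toLower)
import Data.Maybe (listToMaybe, mapMaybe)
import System.FilePath (takeExtension)

-- Hypothetical, trimmed-down stand-in for formatFromFilePath
-- (String result instead of Text, only a few extensions).
guessFormat :: FilePath -> Maybe String
guessFormat path =
  case takeExtension (map toLower path) of
    ".md"  -> Just "markdown"
    ".tex" -> Just "latex"
    ".bib" -> Just "biblatex"
    -- a single digit 1-9 as extension is treated as a man page section
    ['.', y] | y `elem` ['1'..'9'] -> Just "man"
    _      -> Nothing

-- Hypothetical stand-in for formatFromFilePaths:
-- the first path with a recognized extension decides the format.
guessFormats :: [FilePath] -> Maybe String
guessFormats = listToMaybe . mapMaybe guessFormat

main :: IO ()
main = do
  -- "notes" has no extension, so "README.MD" decides the result
  print (guessFormats ["notes", "README.MD", "refs.bib"])
  print (guessFormat "pandoc.1")
  -- takeExtension keeps only the last extension component
  print (takeExtension "chapter.tei.xml")
```

Running `main` prints `Just "markdown"`, `Just "man"`, and `".xml"`, in that order.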