Allow absolute URI as parameter (in this case, content is downloaded).

+ Adds dependency on HTTP.
+ If a parameter is an absolute URI, pandoc will try to
  get the content via HTTP.
+ So, you can do:  pandoc -r html -w markdown http://www.fsf.org

git-svn-id: https://pandoc.googlecode.com/svn/trunk@1826 788f1e2b-df1e-0410-8736-df70ead52e1b
fiddlosopher 2010-02-02 07:37:01 +00:00
parent 19b0c72dd1
commit 9fee73d2a3
4 changed files with 28 additions and 7 deletions

README

@@ -62,12 +62,17 @@ Note that you can specify multiple input files on the command line.
 `pandoc` will concatenate them all (with blank lines between them)
 before parsing:
-    pandoc -s ch1.txt ch2.txt refs.txt > book.html
+    pandoc -s ch1.txt ch2.txt refs.txt > book.html
 (The `-s` option here tells `pandoc` to produce a standalone HTML file,
 with a proper header, rather than a fragment. For more details on this
 and many other command-line options, see below.)
+Instead of a filename, you can specify an absolute URI. In this
+case pandoc will attempt to download the content via HTTP:
+    pandoc -f html -t markdown http://www.fsf.org
 The format of the input and output can be specified explicitly using
 command-line options. The input format can be specified using the
 `-r/--read` or `-f/--from` options, the output format using the
@@ -113,7 +118,9 @@ Character encodings
 -------------------
 All input is assumed to be in the UTF-8 encoding, and all output
-is in UTF-8. If your local character encoding is not UTF-8 and you use
+is in UTF-8 (unless your version of pandoc was compiled using
+GHC 6.12 or higher, in which case the local encoding will be used).
+If your local character encoding is not UTF-8 and you use
 accented or foreign characters, you should pipe the input and output
 through [`iconv`]. For example,


@@ -26,6 +26,11 @@ format). For output to a file, use the `-o` option:
     pandoc -o output.html input.txt
+Instead of a file, an absolute URI may be given. In this case
+pandoc will fetch the content using HTTP:
+    pandoc -f html -t markdown http://www.fsf.org
 The input and output formats may be specified using command-line options
 (see **OPTIONS**, below, for details). If these formats are not
 specified explicitly, Pandoc will attempt to determine them
@@ -48,9 +53,10 @@ markdown: the differences are described in the *README* file in
 the user documentation. If standard markdown syntax is desired, the
 `--strict` option may be used.
-Pandoc uses the UTF-8 character encoding for both input and output.
-If your local character encoding is not UTF-8, you should pipe input
-and output through `iconv`:
+Pandoc uses the UTF-8 character encoding for both input and output
+(unless compiled with GHC 6.12 or higher, in which case it uses
+the local encoding). If your local character encoding is not UTF-8, you
+should pipe input and output through `iconv`:
     iconv -t utf-8 input.txt | pandoc | iconv -f utf-8


@@ -145,7 +145,8 @@ Library
                  mtl >= 1.1, network >= 2, filepath >= 1.1,
                  process >= 1, directory >= 1,
                  bytestring >= 0.9, zip-archive >= 0.1.1.4,
-                 utf8-string >= 0.3, old-time >= 1
+                 utf8-string >= 0.3, old-time >= 1,
+                 HTTP >= 4000.0
   if impl(ghc >= 6.10)
     Build-depends: base >= 4 && < 5, syb
   else


@@ -59,6 +59,9 @@ import Text.CSL
 import Text.Pandoc.Biblio
 #endif
 import Control.Monad (when, unless)
+import Network.HTTP
+import Network.URI (parseURI)
+import Data.ByteString.Lazy.UTF8 (toString)
 copyrightMessage :: String
 copyrightMessage = "\nCopyright (C) 2006-8 John MacFarlane\n" ++
@@ -731,7 +734,11 @@ main = do
   let readSources [] = mapM readSource ["-"]
       readSources srcs = mapM readSource srcs
       readSource "-" = getContents
-      readSource src = readFile src
+      readSource src = case parseURI src of
+                            Just u  -> readURI u
+                            Nothing -> readFile src
+      readURI uri = simpleHTTP (mkRequest GET uri) >>= getResponseBody >>=
+                      return . toString -- treat all as UTF8
   let convertTabs = tabFilter (if preserveTabs then 0 else tabStop)
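
The `readSource` change above sends every argument that `parseURI` accepts down the HTTP path. Below is a minimal sketch of that dispatch as a pure function, assuming `Network.URI` is available (today it ships in the `network-uri` package). The `Source` type and the `classify` function are hypothetical names used only for illustration and are not part of pandoc. Unlike the commit, the sketch also checks that the scheme is `http:`, because `parseURI` accepts any absolute URI, so a path such as `C:/notes/ch1.txt` should parse with scheme `C:` and would otherwise be treated as remote.

```haskell
import Data.Char (toLower)
import Network.URI (parseURI, uriScheme)

-- | Where an input argument should be read from.
-- Hypothetical type for illustration; pandoc itself has no such type.
data Source
  = Stdin               -- the "-" argument
  | Remote String       -- an absolute http: URI, to be fetched
  | LocalFile FilePath  -- anything else is treated as a file path
  deriving (Eq, Show)

-- | Decide how to read one command-line argument.
-- Assumption (not in the commit): only the http: scheme counts as remote,
-- so other absolute URIs fall back to being read as files.
classify :: String -> Source
classify "-" = Stdin
classify src =
  case parseURI src of
    -- uriScheme includes the trailing colon, e.g. "http:"
    Just u | map toLower (uriScheme u) == "http:" -> Remote src
    _ -> LocalFile src

main :: IO ()
main = mapM_ (print . classify)
  ["-", "ch1.txt", "http://www.fsf.org", "C:/notes/ch1.txt"]
```

Keeping the decision separate from the I/O makes it possible to check the dispatch without touching the network; the actual fetch would still go through `simpleHTTP` and `getResponseBody` as in the diff.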