From 997ea5ea1d02e31bb8a9b03e3db26684cc81ac59 Mon Sep 17 00:00:00 2001 From: fiddlosopher Date: Sat, 6 Feb 2010 18:55:28 +0000 Subject: [PATCH] Removed html2markdown and hsmarkdown. html2markdown is no longer needed, since you can pass URI arguments to pandoc and directly convert web pages. (Note, however, that pandoc assumes the pages are UTF8. html2markdown made an attempt to guess the encoding and convert them.) hsmarkdown is pointless -- a large executable that could be replaced by 'pandoc --strict'. git-svn-id: https://pandoc.googlecode.com/svn/trunk@1834 788f1e2b-df1e-0410-8736-df70ead52e1b --- README | 106 +++++------------ Setup.hs | 4 +- html2markdown | 221 ------------------------------------ man/man1/hsmarkdown.1.md | 42 ------- man/man1/html2markdown.1.md | 95 ---------------- pandoc.cabal | 19 +--- src/hsmarkdown.hs | 47 -------- 7 files changed, 36 insertions(+), 498 deletions(-) delete mode 100755 html2markdown delete mode 100644 man/man1/hsmarkdown.1.md delete mode 100644 man/man1/html2markdown.1.md delete mode 100644 src/hsmarkdown.hs diff --git a/README b/README index b8c9db03e..dc7d1f63c 100644 --- a/README +++ b/README @@ -127,92 +127,49 @@ will convert `source.txt` from the local encoding to UTF-8, then convert it to HTML, then convert back to the local encoding, putting the output in `output.html`. -The wrapper scripts (described below) automatically convert the input -from the local encoding to UTF-8 before running them through `pandoc`, -then convert the output back to the local encoding. - Wrappers ======== -Three wrapper scripts, `markdown2pdf`, `html2markdown`, and -`hsmarkdown`, are included in the standard Pandoc installation. (The -Windows binary package does not include `html2markdown`, which is -a POSIX shell script. It does include portable Haskell versions of -`markdown2pdf` and `hsmarkdown`.) +`markdown2pdf` +-------------- -1. `markdown2pdf` produces a PDF file from markdown-formatted - text, using `pandoc` and `pdflatex`. 
The default - behavior of `markdown2pdf` is to create a file with the same - base name as the first argument and the extension `pdf`; thus, - for example, +The standard Pandoc installation includes `markdown2pdf`, a wrapper +around `pandoc` and `pdflatex` that produces PDFs directly from markdown +sources. The default behavior of `markdown2pdf` is to create a file with +the same base name as the first argument and the extension `pdf`; thus, +for example, - markdown2pdf sample.txt endnotes.txt + markdown2pdf sample.txt endnotes.txt - will produce `sample.pdf`. (If `sample.pdf` exists already, - it will be backed up before being overwritten.) An output file - name can be specified explicitly using the `-o` option: +will produce `sample.pdf`. (If `sample.pdf` exists already, +it will be backed up before being overwritten.) An output file +name can be specified explicitly using the `-o` option: - markdown2pdf -o book.pdf chap1 chap2 + markdown2pdf -o book.pdf chap1 chap2 - If no input file is specified, input will be taken from stdin. - All of `pandoc`'s options will work with `markdown2pdf` as well. +If no input file is specified, input will be taken from stdin. +All of `pandoc`'s options will work with `markdown2pdf` as well. - `markdown2pdf` assumes that `pdflatex` is in the path. It also - assumes that the following LaTeX packages are available: - `unicode`, `fancyhdr` (if you have verbatim text in footnotes), - `graphicx` (if you use images), `array` (if you use tables), - and `ulem` (if you use strikeout text). If they are not already - included in your LaTeX distribution, you can get them from - [CTAN]. A full [TeX Live] or [MacTeX] distribution will have all of - these packages. +`markdown2pdf` assumes that `pdflatex` is in the path. 
It also +assumes that the following LaTeX packages are available: +`unicode`, `fancyhdr` (if you have verbatim text in footnotes), +`graphicx` (if you use images), `array` (if you use tables), +and `ulem` (if you use strikeout text). If they are not already +included in your LaTeX distribution, you can get them from +[CTAN]. A full [TeX Live] or [MacTeX] distribution will have all of +these packages. -2. `html2markdown` grabs a web page from a file or URL and converts - it to markdown-formatted text, using `tidy` and `pandoc`. +`hsmarkdown` +------------ - All of `pandoc`'s options will work with `html2markdown` as well. - In addition, the following special options may be used. - The special options must be separated from the `html2markdown` - command and any regular Pandoc options by the delimiter `--`: - - html2markdown -o out.txt -- -e latin1 -g curl google.com - - The `-e` or `--encoding` option specifies the character encoding - of the HTML input. If this option is not specified, and input - is not from stdin, `html2markdown` will attempt to determine the - page's character encoding from the "Content-type" meta tag. - If this is not present, UTF-8 is assumed. - - The `-g` or `--grabber` option specifies the command to be used to - fetch the contents of a URL: - - html2markdown -g 'curl --user foo:bar' www.mysite.com - - If this option is not specified, `html2markdown` searches for an - available program (`wget`, `curl`, or a text-mode browser) to fetch - the contents of a URL. - - `html2markdown` requires [HTML Tidy], which must be in the path. - It uses [`iconv`] for character encoding conversions; if `iconv` - is absent, it will still work, but it will treat everything as UTF-8. - -3. `hsmarkdown` is designed to be used as a drop-in replacement for - `Markdown.pl`. It forces `pandoc` to convert from markdown to - HTML, and to use the `--strict` flag for maximal compliance with - official markdown syntax. 
(All of Pandoc's syntax extensions and - variants, described below, are disabled.) No other command-line - options are allowed. (In fact, options will be interpreted as - filenames.) - - As an alternative to using the `hsmarkdown` script, the - user may create a symbolic link to `pandoc` called `hsmarkdown`. - When invoked under the name `hsmarkdown`, `pandoc` will behave - as if the `--strict` flag had been selected, and no command-line - options will be recognized. However, this approach does not work - under Cygwin, due to problems with its simulation of symbolic - links. +A user who wants a drop-in replacement for `Markdown.pl` may create +a symbolic link to the `pandoc` executable called `hsmarkdown`. When +invoked under the name `hsmarkdown`, `pandoc` will behave as if the +`--strict` flag had been selected, and no command-line options will be +recognized. However, this approach does not work under Cygwin, due to +problems with its simulation of symbolic links. [Cygwin]: http://www.cygwin.com/ -[HTML Tidy]: http://tidy.sourceforge.net/ [`iconv`]: http://www.gnu.org/software/libiconv/ [CTAN]: http://www.ctan.org "Comprehensive TeX Archive Network" [TeX Live]: http://www.tug.org/texlive/ @@ -562,8 +519,7 @@ Pandoc's markdown vs. standard markdown In parsing markdown, Pandoc departs from and extends [standard markdown] in a few respects. Except where noted, these differences can -be suppressed by specifying the `--strict` command-line option or by -using the `hsmarkdown` wrapper. +be suppressed by specifying the `--strict` command-line option. 
[standard markdown]: http://daringfireball.net/projects/markdown/syntax "Markdown syntax description" diff --git a/Setup.hs b/Setup.hs index bd48dbe6e..7284202f2 100644 --- a/Setup.hs +++ b/Setup.hs @@ -51,7 +51,7 @@ makeManPages :: Args -> BuildFlags -> PackageDescription -> LocalBuildInfo -> IO makeManPages _ flags _ _ = mapM_ (makeManPage (fromFlag $ buildVerbosity flags)) manpages manpages :: [FilePath] -manpages = ["pandoc.1", "hsmarkdown.1", "html2markdown.1", "markdown2pdf.1"] +manpages = ["pandoc.1", "markdown2pdf.1"] manDir :: FilePath manDir = "man" "man1" @@ -80,7 +80,7 @@ installScripts pkg lbi verbosity copy = (zip (repeat ".") (wrappers \\ exes)) where exes = map exeName $ filter isBuildable $ executables pkg isBuildable = buildable . buildInfo - wrappers = ["html2markdown", "hsmarkdown", "markdown2pdf"] + wrappers = ["markdown2pdf"] installManpages :: PackageDescription -> LocalBuildInfo -> Verbosity -> CopyDest -> IO () diff --git a/html2markdown b/html2markdown deleted file mode 100755 index 0649e0478..000000000 --- a/html2markdown +++ /dev/null @@ -1,221 +0,0 @@ -#!/bin/sh -e -# converts HTML from a URL, file, or stdin to markdown -# uses an available program to fetch URL and tidy to normalize it first - -REQUIRED="tidy" -SYNOPSIS="converts HTML from a URL, file, or STDIN to markdown-formatted text." - -THIS=${0##*/} - -NEWLINE=' -' - -err () { echo "$*" | fold -s -w ${COLUMNS:-110} >&2; } -errn () { printf "$*" | fold -s -w ${COLUMNS:-110} >&2; } - -usage () { - err "$1 - $2" # short description - err "See the $1(1) man page for usage." -} - -# Portable which(1). -pathfind () { - oldifs="$IFS"; IFS=':' - for _p in $PATH; do - if [ -x "$_p/$*" ] && [ -f "$_p/$*" ]; then - IFS="$oldifs" - return 0 - fi - done - IFS="$oldifs" - return 1 -} - -for p in pandoc $REQUIRED; do - pathfind $p || { - err "You need '$p' to use this program!" - exit 1 - } -done - -CONF=$(pandoc --dump-args "$@" 2>&1) || { - errcode=$? 
-    echo "$CONF" | sed -e '/^pandoc \[OPTIONS\] \[FILES\]/,$d' >&2
-    [ $errcode -eq 2 ] && usage "$THIS" "$SYNOPSIS"
-    exit $errcode
-}
-
-OUTPUT=$(echo "$CONF" | sed -ne '1p')
-ARGS=$(echo "$CONF" | sed -e '1d')
-
-
-grab_url_with () {
-    url="${1:?internal error: grab_url_with: url required}"
-
-    shift
-    cmdline="$@"
-
-    prog=
-    prog_opts=
-    if [ -n "$cmdline" ]; then
-        eval "set -- $cmdline"
-        prog=$1
-        shift
-        prog_opts="$@"
-    fi
-
-    if [ -z "$prog" ]; then
-        # Locate a sensible web grabber (note the order).
-        for p in wget lynx w3m curl links w3c; do
-            if pathfind $p; then
-                prog=$p
-                break
-            fi
-        done
-
-        [ -n "$prog" ] || {
-            errn "$THIS: Couldn't find a program to fetch the file from URL "
-            err "(e.g. wget, w3m, lynx, w3c, or curl)."
-            return 1
-        }
-    else
-        pathfind "$prog" || {
-            err "$THIS: No such web grabber '$prog' found; aborting."
-            return 1
-        }
-    fi
-
-    # Setup proper base options for known grabbers.
-    base_opts=
-    case "$prog" in
-    wget)  base_opts="-O-" ;;
-    lynx)  base_opts="-source" ;;
-    w3m)   base_opts="-dump_source" ;;
-    curl)  base_opts="" ;;
-    links) base_opts="-source" ;;
-    w3c)   base_opts="-n -get" ;;
-    *)     err "$THIS: unhandled web grabber '$prog'; hope it succeeds."
-    esac
-
-    err "$THIS: invoking '$prog $base_opts $prog_opts $url'..."
-    eval "set -- $base_opts $prog_opts"
-    $prog "$@" "$url"
-}
-
-# Parse command-line arguments
-parse_arguments () {
-    while [ $# -gt 0 ]; do
-        case "$1" in
-            --encoding=*)
-                wholeopt="$1"
-                # extract encoding from after =
-                encoding="${wholeopt#*=}" ;;
-            -e|--encoding|-encoding)
-                shift
-                encoding="$1" ;;
-            --grabber=*)
-                wholeopt="$1"
-                # extract encoding from after =
-                grabber="\"${wholeopt#*=}\"" ;;
-            -g|--grabber|-grabber)
-                shift
-                grabber="$1" ;;
-            *)
-                if [ -z "$argument" ]; then
-                    argument="$1"
-                else
-                    err "Warning: extra argument '$1' will be ignored."
- fi ;; - esac - shift - done -} - -argument= -encoding= -grabber= - -oldifs="$IFS" -IFS=$NEWLINE -parse_arguments $ARGS -IFS="$oldifs" - -inurl= -if [ -n "$argument" ] && ! [ -f "$argument" ]; then - # Treat given argument as an URL. - inurl="$argument" -fi - -# As a security measure refuse to proceed if mktemp is not available. -pathfind mktemp || { err "Couldn't find 'mktemp'; aborting."; exit 1; } - -# Avoid issues with /tmp directory on Windows/Cygwin -cygwin= -cygwin=$(uname | sed -ne '/^CYGWIN/p') -if [ -n "$cygwin" ]; then - TMPDIR=. - export TMPDIR -fi - -THIS_TEMPDIR= -THIS_TEMPDIR="$(mktemp -d -t $THIS.XXXXXXXX)" || exit 1 -readonly THIS_TEMPDIR - -trap 'exitcode=$? - [ -z "$THIS_TEMPDIR" ] || rm -rf "$THIS_TEMPDIR" - exit $exitcode' 0 1 2 3 13 15 - -if [ -n "$inurl" ]; then - err "Attempting to fetch file from '$inurl'..." - - grabber_out=$THIS_TEMPDIR/grabber.out - grabber_log=$THIS_TEMPDIR/grabber.log - if ! grab_url_with "$inurl" "$grabber" 1>$grabber_out 2>$grabber_log; then - errn "grab_url_with failed" - if [ -f $grabber_log ]; then - err " with the following error log." - err - cat >&2 $grabber_log - else - err . - fi - exit 1 - fi - - argument="$grabber_out" -fi - -if [ -z "$encoding" ] && [ "x$argument" != "x" ]; then - # Try to determine character encoding if not specified - # and input is not STDIN. - encoding=$( - head "$argument" | - LC_ALL=C tr 'A-Z' 'a-z' | - sed -ne '/ $htmlinput # read from STDIN -elif [ -f "$argument" ]; then - to_utf8 "$argument" > $htmlinput # read from file -else - err "File '$argument' not found." - exit 1 -fi - -if ! cat $htmlinput | pandoc --ignore-args -r html -w markdown "$@" ; then - err "Failed to parse HTML. Trying again with tidy..." 
- tidy -q -asxhtml -utf8 $htmlinput | \ - pandoc --ignore-args -r html -w markdown "$@" -fi diff --git a/man/man1/hsmarkdown.1.md b/man/man1/hsmarkdown.1.md deleted file mode 100644 index a197ef2ca..000000000 --- a/man/man1/hsmarkdown.1.md +++ /dev/null @@ -1,42 +0,0 @@ -% HSMARKDOWN(1) Pandoc User Manuals -% John MacFarlane -% January 8, 2008 - -# NAME - -hsmarkdown - convert markdown-formatted text to HTML - -# SYNOPSIS - -hsmarkdown [*input-file*]... - -# DESCRIPTION - -`hsmarkdown` converts markdown-formatted text to HTML. It is designed -to be usable as a drop-in replacement for John Gruber's `Markdown.pl`. - -If no *input-file* is specified, input is read from *stdin*. -Otherwise, the *input-files* are concatenated (with a blank -line between each) and used as input. Output goes to *stdout* by -default. For output to a file, use shell redirection: - - hsmarkdown input.txt > output.html - -`hsmarkdown` uses the UTF-8 character encoding for both input and output. -If your local character encoding is not UTF-8, you should pipe input -and output through `iconv`: - - iconv -t utf-8 input.txt | hsmarkdown | iconv -f utf-8 - -`hsmarkdown` is implemented as a wrapper around `pandoc`(1). It -calls `pandoc` with the options `--from markdown --to html ---strict` and disables all other options. (Command-line options -will be interpreted as filenames, as they are by `Markdown.pl`.) - -# SEE ALSO - -`pandoc`(1). The *README* -file distributed with Pandoc contains full documentation. - -The Pandoc source code and all documentation may be downloaded from -. 
diff --git a/man/man1/html2markdown.1.md b/man/man1/html2markdown.1.md
deleted file mode 100644
index 73e3420dd..000000000
--- a/man/man1/html2markdown.1.md
+++ /dev/null
@@ -1,95 +0,0 @@
-% HTML2MARKDOWN(1) Pandoc User Manuals
-% John MacFarlane and Recai Oktas
-% January 8, 2008
-
-# NAME
-
-html2markdown - converts HTML to markdown-formatted text
-
-# SYNOPSIS
-
-html2markdown [*pandoc-options*] [\-- *special-options*] [*input-file* or
-*URL*]
-
-# DESCRIPTION
-
-`html2markdown` converts *input-file* or *URL* (or text
-from *stdin*) from HTML to markdown-formatted plain text.
-If a URL is specified, `html2markdown` uses an available program
-(e.g. wget, w3m, lynx or curl) to fetch its contents. Output is sent
-to *stdout* unless an output file is specified using the `-o`
-option.
-
-`html2markdown` uses the character encoding specified in the
-"Content-type" meta tag. If this is not present, or if input comes
-from *stdin*, UTF-8 is assumed. A character encoding may be specified
-explicitly using the `-e` special option.
-
-# OPTIONS
-
-`html2markdown` is a wrapper for `pandoc`, so all of
-`pandoc`'s options may be used. See `pandoc`(1) for
-a complete list. The following options are most relevant:
-
--s, \--standalone
-:   Include title, author, and date information (if present) at the
-    top of markdown output.
-
--o *FILE*, \--output=*FILE*
-:   Write output to *FILE* instead of *stdout*.
-
-\--strict
-:   Use strict markdown syntax, with no extensions or variants.
-
-\--reference-links
-:   Use reference-style links, rather than inline links, in writing markdown
-    or reStructuredText.
-
--R, \--parse-raw
-:   Parse untranslatable HTML codes as raw HTML.
-
-\--no-wrap
-:   Disable text wrapping in output. (Default is to wrap text.)
-
--H *FILE*, \--include-in-header=*FILE*
-:   Include contents of *FILE* at the end of the header. Implies
-    `-s`.
-
--B *FILE*, \--include-before-body=*FILE*
-:   Include contents of *FILE* at the beginning of the document body.
-
--A *FILE*, \--include-after-body=*FILE*
-:   Include contents of *FILE* at the end of the document body.
-
--C *FILE*, \--custom-header=*FILE*
-:   Use contents of *FILE*
-    as the document header (overriding the default header, which can be
-    printed using `pandoc -D markdown`). Implies `-s`.
-
-# SPECIAL OPTIONS
-
-In addition, the following special options may be used. The special
-options must be separated from the `html2markdown` command and any
-regular `pandoc` options by the delimiter \``--`', as in
-
-    html2markdown -o foo.txt -- -g 'curl -u bar:baz' -e latin1 \
-    www.foo.com
-
--e *encoding*, \--encoding=*encoding*
-:   Assume the character encoding *encoding* in reading HTML.
-    (Note: *encoding* will be passed to `iconv`; a list of
-    available encodings may be obtained using `iconv -l`.)
-    If this option is not specified and input is not from
-    *stdin*, `html2markdown` will try to extract the character encoding
-    from the "Content-type" meta tag. If no character encoding is
-    specified in this way, or if input is from *stdin*, UTF-8 will be
-    assumed.
-
--g *command*, \--grabber=*command*
-:   Use *command* to fetch the contents of a URL. (By default,
-    `html2markdown` searches for an available program or text-based
-    browser to fetch the contents of a URL.)
- -# SEE ALSO - -`pandoc`(1), `iconv`(1) diff --git a/pandoc.cabal b/pandoc.cabal index 4a2120079..57ad24b78 100644 --- a/pandoc.cabal +++ b/pandoc.cabal @@ -59,11 +59,10 @@ Data-Files: -- documentation README, INSTALL, COPYRIGHT, BUGS, changelog, -- wrappers - markdown2pdf, html2markdown, hsmarkdown + markdown2pdf Extra-Source-Files: -- sources for man pages man/man1/pandoc.1.md, man/man1/markdown2pdf.1.md, - man/man1/html2markdown.1.md, man/man1/hsmarkdown.1.md, -- tests tests/bodybg.gif, tests/writer.latex, @@ -120,8 +119,7 @@ Extra-Source-Files: tests/lhs-test.html+lhs, tests/lhs-test.fragment.html+lhs, tests/RunTests.hs -Extra-Tmp-Files: man/man1/pandoc.1, man/man1/hsmarkdown.1, - man/man1/html2markdown.1, man/man1/markdown2pdf.1 +Extra-Tmp-Files: man/man1/pandoc.1, man/man1/markdown2pdf.1 Flag highlighting Description: Compile in support for syntax highlighting of code blocks. @@ -130,7 +128,7 @@ Flag executable Description: Build the pandoc executable. Default: True Flag wrappers - Description: Build the wrappers (hsmarkdown, markdown2pdf). + Description: Build the wrappers (markdown2pdf). Default: True Flag library Description: Build the pandoc library. @@ -219,17 +217,6 @@ Executable pandoc else Buildable: False -Executable hsmarkdown - Hs-Source-Dirs: src - Main-Is: hsmarkdown.hs - Ghc-Options: -Wall -threaded - Ghc-Prof-Options: -auto-all - Extensions: CPP - if flag(wrappers) - Buildable: True - else - Buildable: False - Executable markdown2pdf Hs-Source-Dirs: src Main-Is: markdown2pdf.hs diff --git a/src/hsmarkdown.hs b/src/hsmarkdown.hs deleted file mode 100644 index 3f689d4ec..000000000 --- a/src/hsmarkdown.hs +++ /dev/null @@ -1,47 +0,0 @@ -{- -Copyright (C) 2006-8 John MacFarlane - -This program is free software; you can redistribute it and/or modify -it under the terms of the GNU General Public License as published by -the Free Software Foundation; either version 2 of the License, or -(at your option) any later version. 
-
-This program is distributed in the hope that it will be useful,
-but WITHOUT ANY WARRANTY; without even the implied warranty of
-MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
-GNU General Public License for more details.
-
-You should have received a copy of the GNU General Public License
-along with this program; if not, write to the Free Software
-Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
--}
-
-{- |
-   Copyright   : Copyright (C) 2009 John MacFarlane
-   License     : GNU GPL, version 2 or above
-
-   Maintainer  : John MacFarlane
-   Stability   : alpha
-   Portability : portable
-
-Wrapper around pandoc that emulates Markdown.pl as closely as possible.
--}
-module Main where
-import System.Process
-import System.Environment ( getArgs )
--- Note: ghc >= 6.12 (base >=4.2) supports unicode through iconv
--- So we use System.IO.UTF8 only if we have an earlier version
-#if MIN_VERSION_base(4,2,0)
-#else
-import Prelude hiding ( putStr, putStrLn, writeFile, readFile, getContents )
-import System.IO.UTF8
-#endif
-import Control.Monad (forM_)
-
-main :: IO ()
-main = do
-  files <- getArgs
-  let runPandoc inp = readProcess "pandoc" ["--from", "markdown", "--to", "html", "--strict"] inp >>= putStrLn
-  if null files
-     then getContents >>= runPandoc
-     else forM_ files $ \f -> readFile f >>= runPandoc