+ Changed 'web2markdown' to 'html2markdown'.

git-svn-id: https://pandoc.googlecode.com/svn/trunk@309 788f1e2b-df1e-0410-8736-df70ead52e1b
fiddlosopher 2006-12-29 18:50:13 +00:00
parent eea359203a
commit 3491420b53
7 changed files with 26 additions and 26 deletions

@@ -25,7 +25,7 @@ EXECSBASE := $(shell sed -ne 's/^[Ee]xecutable:[[:space:]]*//p' $(CABAL).in)
 #-------------------------------------------------------------------------------
 # Install targets
 #-------------------------------------------------------------------------------
-WRAPPERS := web2markdown markdown2pdf
+WRAPPERS := html2markdown markdown2pdf
 # Add .exe extensions if we're running Windows/Cygwin.
 EXTENSION := $(shell uname | tr '[:upper:]' '[:lower:]' | \
 sed -ne 's/^cygwin.*$$/\.exe/p')

README
@@ -38,14 +38,14 @@ Requirements
 The `pandoc` program itself does not depend on any external libraries
 or programs.
-The wrapper script `web2markdown` requires
+The wrapper script `html2markdown` requires
 - `pandoc` (which must be in the PATH)
 - a POSIX-compliant shell (installed by default on all linux and unix
 systems, including Mac OS X, and in [Cygwin] for Windows),
 - `HTML Tidy`
 - `iconv` (for character encoding conversion). (If `iconv` is absent,
-`web2markdown` will still work, but it will treat everything as UTF-8.)
+`html2markdown` will still work, but it will treat everything as UTF-8.)
 [Cygwin]: http://www.cygwin.com/
 [HTML Tidy]: http://tidy.sourceforge.net/
@@ -117,7 +117,7 @@ But for simple documents it should be adequate. The `latex` and `html`
 readers are also limited in what they can do. Because the `html`
 reader is picky about the HTML it parses, it is recommended that you
 pipe HTML through [HTML Tidy] before sending it to `pandoc`, or use the
-`web2markdown` script described below.
+`html2markdown` script described below.
 If you don't specify a reader or writer explicitly, `pandoc` will
 try to determine the input and output format from the extensions of
@@ -151,10 +151,10 @@ The shell scripts (described below) automatically convert the input
 from the local encoding to UTF-8 before running them through `pandoc`,
 then convert the output back to the local encoding.
-`markdown2pdf` and `web2markdown`
-=================================
-Two shell scripts, `markdown2pdf` and `web2markdown`, are included in
+`markdown2pdf` and `html2markdown`
+==================================
+Two shell scripts, `markdown2pdf` and `html2markdown`, are included in
 the standard Pandoc installation. (They are not included in the Windows
 binary package, as they require a POSIX shell, but they may be used
 in Windows under Cygwin.)
@@ -175,19 +175,19 @@ in Windows under Cygwin.)
 If no input file is specified, input will be taken from STDIN.
-2. `web2markdown` grabs a web page from a file or URL and converts
+2. `html2markdown` grabs a web page from a file or URL and converts
 it to markdown-formatted text, using `tidy` and `pandoc`.
 Unless input is from STDIN, an attempt is made to determine the
 character encoding of the page from the "Content-type" meta tag.
 If this is not present, UTF-8 is assumed. Alternatively, a character
 encoding may be specified explicitly using the `-e` option.
-`web2markdown` searches for an available program (`wget`, `curl`,
+`html2markdown` searches for an available program (`wget`, `curl`,
 or a text-mode browser) to fetch the contents of a URL.
 Optionally, the `-g` command may be used to specify the command
 to be used:
-web2markdown -g 'wget --user=foo --password=bar' mysite.com
+html2markdown -g 'wget --user=foo --password=bar' mysite.com
 Command-line options
 ====================

@@ -1,22 +1,22 @@
-.TH WEB2MARKDOWN 1 "December 15, 2006" Pandoc "User Manuals"
+.TH HTML2MARKDOWN 1 "December 15, 2006" Pandoc "User Manuals"
 .SH NAME
-web2markdown \- converts HTML to markdown-formatted text
+html2markdown \- converts HTML to markdown-formatted text
 .SH SYNOPSIS
-\fBweb2markdown\fR [\fIoptions\fR] [\fIinput\-file\fR or \fIURL\fR]
+\fBhtml2markdown\fR [\fIoptions\fR] [\fIinput\-file\fR or \fIURL\fR]
 .SH DESCRIPTION
-\fBweb2markdown\fR converts \fIinput\-file\fR or \fIURL\fR (or text
+\fBhtml2markdown\fR converts \fIinput\-file\fR or \fIURL\fR (or text
 from STDIN) from HTML to markdown\-formatted plain text.
-If a URL is specified, \fBweb2markdown\fR uses an available program
+If a URL is specified, \fBhtml2markdown\fR uses an available program
 (e.g. wget, w3m, lynx or curl) to fetch its contents. Output is sent
 to STDOUT unless an output file is specified using the \fB\-o\fR
 option.
 .PP
-\fBweb2markdown\fR uses the character encoding specified in the
+\fBhtml2markdown\fR uses the character encoding specified in the
 "Content-type" meta tag. If this is not present, or if input comes
 from STDIN, UTF-8 is assumed. A character encoding may be specified
 explicitly using the \fB\-e\fR option.
 .PP
-\fBweb2markdown\fR is a wrapper for \fBpandoc\fR.
+\fBhtml2markdown\fR is a wrapper for \fBpandoc\fR.
 .SH OPTIONS
 .TP
 .B \-s, \-\-standalone
@@ -62,17 +62,17 @@ Assume the character encoding \fIencoding\fR in reading HTML.
 (Note: \fIencoding\fR will be passed to \fBiconv\fR; a list of
 available encodings may be obtained using `\fBiconv \-l\fR'.)
 If the \fB\-e\fR option is not specified and input is not from
-STDIN, \fBweb2markdown\fR will try to extract the character encoding
+STDIN, \fBhtml2markdown\fR will try to extract the character encoding
 from the "Content-type" meta tag. If no character encoding is
 specified in this way, or if input is from STDIN, UTF-8 will be
 assumed.
 .TP
 .B \-g \fIcommand\fR
 Use \fIcommand\fR to fetch the contents of a URL. (By default,
-\fBweb2markdown\fR searches for an available program or text-based
+\fBhtml2markdown\fR searches for an available program or text-based
 browser to fetch the contents of a URL.) For example:
 .IP
-web2markdown \-g 'wget \-\-user=foo \-\-password=bar' mysite.com
+html2markdown \-g 'wget \-\-user=foo \-\-password=bar' mysite.com
 .SH "SEE ALSO"
 \fBpandoc\fR(1),

@@ -41,7 +41,7 @@ and output through \fBiconv\fR:
 .PP
 \fIPandoc\fR's HTML parser is not very forgiving. If your input is
 HTML, consider running it through \fBtidy\fR(1) before passing it
-to Pandoc. Or use \fBweb2markdown\fR(1), a wrapper around \fBpandoc\fR.
+to Pandoc. Or use \fBhtml2markdown\fR(1), a wrapper around \fBpandoc\fR.
 .SH OPTIONS
 .TP
@@ -151,7 +151,7 @@ Print version.
 Show usage message.
 .SH "SEE ALSO"
-\fBweb2markdown\fR(1),
+\fBhtml2markdown\fR(1),
 \fBmarkdown2pdf\fR(1).
 The
 .I README

@@ -72,7 +72,7 @@ grabber=
 while [ $# -gt 0 ]; do
 case "$1" in
 -h|--help)
-pandoc -h 2>&1 | sed -e 's/pandoc/web2markdown/' \
+pandoc -h 2>&1 | sed -e 's/pandoc/html2markdown/' \
 -e '/^[[:space:]]*\(-f\|-t\|-S\|-N\|-m\|-i\|-c\|-T\|-D\|-d\)/,/./d'\
 1>&2
 err " -e ENCODING, --encoding=ENCODING"
@@ -81,7 +81,7 @@ while [ $# -gt 0 ]; do
 err " Specify command to be used to grab contents of URL"
 exit 0 ;;
 -v|--version)
-pandoc -v 2>&1 | sed -e 's/pandoc/web2markdown/' 1>&2
+pandoc -v 2>&1 | sed -e 's/pandoc/html2markdown/' 1>&2
 exit 0 ;;
 -e)
 shift

@@ -14,7 +14,7 @@ pandoc -s README.tex -o demo0.txt
 pandoc -s -w rst README -o demo0.txt
 pandoc -s README -o demo0.rtf
 pandoc -s -m -i -w s5 S5DEMO -o demo0.html
-web2markdown http://www.gnu.org/software/make/ -o demo0.txt
+html2markdown http://www.gnu.org/software/make/ -o demo0.txt
 markdown2pdf README -o demo0.pdf
 markdown2pdf -C myheader.tex README -o demo0.pdf'

@@ -35,7 +35,7 @@ you should extract from the zip archive and put somewhere in your
 PATH). See the included file `README-WINDOWS.txt` for instructions
 on using the program. Note: If you use [Cygwin], we recommend that
 you compile Pandoc from source. This will give you access to the
-wrapper scripts `markdown2pdf` and `web2markdown`, which are not
+wrapper scripts `markdown2pdf` and `html2markdown`, which are not
 included in the Windows binary package.
 [`@TARBALL_NAME@`]: http://pandoc.googlecode.com/files/@TARBALL_NAME@
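The README and man page text touched by this commit describes how `html2markdown` picks an input encoding: an explicit `-e` value wins; otherwise, unless input comes from STDIN, the charset in the "Content-type" meta tag is used; failing both, UTF-8 is assumed. The diff does not show how the script itself implements this, so the following is only a minimal POSIX shell sketch of that rule. The function name `detect_charset` and its two-argument interface are hypothetical, not part of the real script.

```shell
#!/bin/sh
# Hypothetical sketch (NOT the actual html2markdown code) of the encoding
# rule described in the README and man page.
#   $1 = encoding passed with -e (empty if -e was not given)
#   $2 = input file name, or "-" when reading from STDIN
detect_charset() {
    # 1. An explicit -e encoding always wins.
    if [ -n "$1" ]; then
        echo "$1"
        return
    fi
    # 2. For a real file (not STDIN), look for charset=... on a line that
    #    contains an http-equiv Content-type meta tag. grep matches the tag
    #    case-insensitively; this sketch only matches a lowercase "charset=".
    if [ "$2" != "-" ] && [ -f "$2" ]; then
        cs=$(grep -i 'http-equiv=.content-type' "$2" |
             sed -ne 's/.*charset=\([-_A-Za-z0-9]*\).*/\1/p' | head -n 1)
        if [ -n "$cs" ]; then
            echo "$cs"
            return
        fi
    fi
    # 3. Nothing found (or input is STDIN): assume UTF-8.
    echo "UTF-8"
}
```

A caller could pass the result to `iconv -f "$enc" -t UTF-8` before handing the page to `tidy` and `pandoc`, in line with the README's note that the wrapper scripts convert input to UTF-8; how the real script wires these steps together is not visible in this diff.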