+ Changed 'web2markdown' to 'html2markdown'.

git-svn-id: https://pandoc.googlecode.com/svn/trunk@309 788f1e2b-df1e-0410-8736-df70ead52e1b
fiddlosopher 2006-12-29 18:50:13 +00:00
parent eea359203a
commit 3491420b53
7 changed files with 26 additions and 26 deletions


@@ -25,7 +25,7 @@ EXECSBASE := $(shell sed -ne 's/^[Ee]xecutable:[[:space:]]*//p' $(CABAL).in)
#-------------------------------------------------------------------------------
# Install targets
#-------------------------------------------------------------------------------
-WRAPPERS := web2markdown markdown2pdf
+WRAPPERS := html2markdown markdown2pdf
# Add .exe extensions if we're running Windows/Cygwin.
EXTENSION := $(shell uname | tr '[:upper:]' '[:lower:]' | \
sed -ne 's/^cygwin.*$$/\.exe/p')

README

@@ -38,14 +38,14 @@ Requirements
The `pandoc` program itself does not depend on any external libraries
or programs.
-The wrapper script `web2markdown` requires
+The wrapper script `html2markdown` requires
- `pandoc` (which must be in the PATH)
- a POSIX-compliant shell (installed by default on all linux and unix
systems, including Mac OS X, and in [Cygwin] for Windows),
- `HTML Tidy`
- `iconv` (for character encoding conversion). (If `iconv` is absent,
-`web2markdown` will still work, but it will treat everything as UTF-8.)
+`html2markdown` will still work, but it will treat everything as UTF-8.)
[Cygwin]: http://www.cygwin.com/
[HTML Tidy]: http://tidy.sourceforge.net/
@@ -117,7 +117,7 @@ But for simple documents it should be adequate. The `latex` and `html`
readers are also limited in what they can do. Because the `html`
reader is picky about the HTML it parses, it is recommended that you
pipe HTML through [HTML Tidy] before sending it to `pandoc`, or use the
-`web2markdown` script described below.
+`html2markdown` script described below.
If you don't specify a reader or writer explicitly, `pandoc` will
try to determine the input and output format from the extensions of
@@ -151,10 +151,10 @@ The shell scripts (described below) automatically convert the input
from the local encoding to UTF-8 before running them through `pandoc`,
then convert the output back to the local encoding.
-`markdown2pdf` and `web2markdown`
-=================================
+`markdown2pdf` and `html2markdown`
+==================================
-Two shell scripts, `markdown2pdf` and `web2markdown`, are included in
+Two shell scripts, `markdown2pdf` and `html2markdown`, are included in
the standard Pandoc installation. (They are not included in the Windows
binary package, as they require a POSIX shell, but they may be used
in Windows under Cygwin.)
@@ -175,19 +175,19 @@ in Windows under Cygwin.)
If no input file is specified, input will be taken from STDIN.
-2. `web2markdown` grabs a web page from a file or URL and converts
+2. `html2markdown` grabs a web page from a file or URL and converts
it to markdown-formatted text, using `tidy` and `pandoc`.
Unless input is from STDIN, an attempt is made to determine the
character encoding of the page from the "Content-type" meta tag.
If this is not present, UTF-8 is assumed. Alternatively, a character
encoding may be specified explicitly using the `-e` option.
-`web2markdown` searches for an available program (`wget`, `curl`,
+`html2markdown` searches for an available program (`wget`, `curl`,
or a text-mode browser) to fetch the contents of a URL.
Optionally, the `-g` command may be used to specify the command
to be used:
-web2markdown -g 'wget --user=foo --password=bar' mysite.com
+html2markdown -g 'wget --user=foo --password=bar' mysite.com
Command-line options
====================


@@ -1,22 +1,22 @@
-.TH WEB2MARKDOWN 1 "December 15, 2006" Pandoc "User Manuals"
+.TH HTML2MARKDOWN 1 "December 15, 2006" Pandoc "User Manuals"
.SH NAME
-web2markdown \- converts HTML to markdown-formatted text
+html2markdown \- converts HTML to markdown-formatted text
.SH SYNOPSIS
-\fBweb2markdown\fR [\fIoptions\fR] [\fIinput\-file\fR or \fIURL\fR]
+\fBhtml2markdown\fR [\fIoptions\fR] [\fIinput\-file\fR or \fIURL\fR]
.SH DESCRIPTION
-\fBweb2markdown\fR converts \fIinput\-file\fR or \fIURL\fR (or text
+\fBhtml2markdown\fR converts \fIinput\-file\fR or \fIURL\fR (or text
from STDIN) from HTML to markdown\-formatted plain text.
-If a URL is specified, \fBweb2markdown\fR uses an available program
+If a URL is specified, \fBhtml2markdown\fR uses an available program
(e.g. wget, w3m, lynx or curl) to fetch its contents. Output is sent
to STDOUT unless an output file is specified using the \fB\-o\fR
option.
.PP
-\fBweb2markdown\fR uses the character encoding specified in the
+\fBhtml2markdown\fR uses the character encoding specified in the
"Content-type" meta tag. If this is not present, or if input comes
from STDIN, UTF-8 is assumed. A character encoding may be specified
explicitly using the \fB\-e\fR option.
.PP
-\fBweb2markdown\fR is a wrapper for \fBpandoc\fR.
+\fBhtml2markdown\fR is a wrapper for \fBpandoc\fR.
.SH OPTIONS
.TP
.B \-s, \-\-standalone
@@ -62,17 +62,17 @@ Assume the character encoding \fIencoding\fR in reading HTML.
(Note: \fIencoding\fR will be passed to \fBiconv\fR; a list of
available encodings may be obtained using `\fBiconv \-l\fR'.)
If the \fB\-e\fR option is not specified and input is not from
-STDIN, \fBweb2markdown\fR will try to extract the character encoding
+STDIN, \fBhtml2markdown\fR will try to extract the character encoding
from the "Content-type" meta tag. If no character encoding is
specified in this way, or if input is from STDIN, UTF-8 will be
assumed.
.TP
.B \-g \fIcommand\fR
Use \fIcommand\fR to fetch the contents of a URL. (By default,
-\fBweb2markdown\fR searches for an available program or text-based
+\fBhtml2markdown\fR searches for an available program or text-based
browser to fetch the contents of a URL.) For example:
.IP
-web2markdown \-g 'wget \-\-user=foo \-\-password=bar' mysite.com
+html2markdown \-g 'wget \-\-user=foo \-\-password=bar' mysite.com
.SH "SEE ALSO"
\fBpandoc\fR(1),


@@ -41,7 +41,7 @@ and output through \fBiconv\fR:
.PP
\fIPandoc\fR's HTML parser is not very forgiving. If your input is
HTML, consider running it through \fBtidy\fR(1) before passing it
-to Pandoc. Or use \fBweb2markdown\fR(1), a wrapper around \fBpandoc\fR.
+to Pandoc. Or use \fBhtml2markdown\fR(1), a wrapper around \fBpandoc\fR.
.SH OPTIONS
.TP
@@ -151,7 +151,7 @@ Print version.
Show usage message.
.SH "SEE ALSO"
-\fBweb2markdown\fR(1),
+\fBhtml2markdown\fR(1),
\fBmarkdown2pdf\fR(1).
The
.I README


@@ -72,7 +72,7 @@ grabber=
while [ $# -gt 0 ]; do
case "$1" in
-h|--help)
-pandoc -h 2>&1 | sed -e 's/pandoc/web2markdown/' \
+pandoc -h 2>&1 | sed -e 's/pandoc/html2markdown/' \
-e '/^[[:space:]]*\(-f\|-t\|-S\|-N\|-m\|-i\|-c\|-T\|-D\|-d\)/,/./d'\
1>&2
err " -e ENCODING, --encoding=ENCODING"
@@ -81,7 +81,7 @@ while [ $# -gt 0 ]; do
err " Specify command to be used to grab contents of URL"
exit 0 ;;
-v|--version)
-pandoc -v 2>&1 | sed -e 's/pandoc/web2markdown/' 1>&2
+pandoc -v 2>&1 | sed -e 's/pandoc/html2markdown/' 1>&2
exit 0 ;;
-e)
shift


@@ -14,7 +14,7 @@ pandoc -s README.tex -o demo0.txt
pandoc -s -w rst README -o demo0.txt
pandoc -s README -o demo0.rtf
pandoc -s -m -i -w s5 S5DEMO -o demo0.html
-web2markdown http://www.gnu.org/software/make/ -o demo0.txt
+html2markdown http://www.gnu.org/software/make/ -o demo0.txt
markdown2pdf README -o demo0.pdf
markdown2pdf -C myheader.tex README -o demo0.pdf'


@@ -35,7 +35,7 @@ you should extract from the zip archive and put somewhere in your
PATH). See the included file `README-WINDOWS.txt` for instructions
on using the program. Note: If you use [Cygwin], we recommend that
you compile Pandoc from source. This will give you access to the
-wrapper scripts `markdown2pdf` and `web2markdown`, which are not
+wrapper scripts `markdown2pdf` and `html2markdown`, which are not
included in the Windows binary package.
[`@TARBALL_NAME@`]: http://pandoc.googlecode.com/files/@TARBALL_NAME@
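
The README hunk above says that `html2markdown` searches for an available program (`wget`, `curl`, or a text-mode browser) to fetch a URL. The sketch below shows one way such a search might be written in POSIX shell. It is not taken from the script in this commit: the helper name `first_available` and the order of candidates are assumptions made for illustration.

```shell
#!/bin/sh
# Illustrative sketch only; not the actual html2markdown source.
# first_available is a hypothetical helper: it prints the first of its
# arguments that names a program found in PATH and returns 0, or
# prints nothing and returns 1 if none of them is found.
first_available() {
    for prog in "$@"; do
        if command -v "$prog" >/dev/null 2>&1; then
            printf '%s\n' "$prog"
            return 0
        fi
    done
    return 1
}

# Possible use (candidate order is an assumption):
#   grabber=$(first_available wget curl w3m lynx) ||
#       echo "no program found to fetch URLs; specify one with -g" >&2
```

Passing the candidates as arguments keeps the lookup separate from the list of programs, so a command supplied with `-g` can bypass the search entirely.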