+ Changed 'web2markdown' to 'html2markdown'.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@309 788f1e2b-df1e-0410-8736-df70ead52e1b
parent eea359203a
commit 3491420b53
7 changed files with 26 additions and 26 deletions
Makefile (2 changed lines)
@@ -25,7 +25,7 @@ EXECSBASE := $(shell sed -ne 's/^[Ee]xecutable:[[:space:]]*//p' $(CABAL).in)
 #-------------------------------------------------------------------------------
 # Install targets
 #-------------------------------------------------------------------------------
-WRAPPERS := web2markdown markdown2pdf
+WRAPPERS := html2markdown markdown2pdf
 # Add .exe extensions if we're running Windows/Cygwin.
 EXTENSION := $(shell uname | tr '[:upper:]' '[:lower:]' | \
 sed -ne 's/^cygwin.*$$/\.exe/p')
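The `EXTENSION` line above relies on `sed -n` printing only when the substitution succeeds, so the suffix is `.exe` on Cygwin and empty everywhere else. A minimal POSIX shell sketch of the same logic follows, assuming a hypothetical helper named `exe_suffix` that takes the `uname` string as its argument. (In the Makefile, `$$` is Make's escape for a literal `$`, so the shell itself sees `$`.)

```shell
#!/bin/sh
# Hypothetical helper mirroring the Makefile's EXTENSION computation.
# Prints ".exe" if the given system name starts with "cygwin"
# (case-insensitively) and prints nothing otherwise.
exe_suffix() {
    printf '%s\n' "$1" |
        tr '[:upper:]' '[:lower:]' |
        sed -ne 's/^cygwin.*$/.exe/p'   # -n plus p: output only on a match
}

# Example: list the wrapper file names for the current system.
suffix=$(exe_suffix "$(uname)")
for wrapper in html2markdown markdown2pdf; do
    printf '%s%s\n' "$wrapper" "$suffix"
done
```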
README (18 changed lines)
@@ -38,14 +38,14 @@ Requirements
 The `pandoc` program itself does not depend on any external libraries
 or programs.

-The wrapper script `web2markdown` requires
+The wrapper script `html2markdown` requires

 - `pandoc` (which must be in the PATH)
 - a POSIX-compliant shell (installed by default on all linux and unix
 systems, including Mac OS X, and in [Cygwin] for Windows),
 - `HTML Tidy`
 - `iconv` (for character encoding conversion). (If `iconv` is absent,
-`web2markdown` will still work, but it will treat everything as UTF-8.)
+`html2markdown` will still work, but it will treat everything as UTF-8.)

 [Cygwin]: http://www.cygwin.com/
 [HTML Tidy]: http://tidy.sourceforge.net/
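The requirements above separate hard dependencies (`pandoc`, `tidy`) from the optional `iconv`. A minimal sketch of how a wrapper could check them up front, assuming a hypothetical `check_requirements` function; the real script's checks may differ:

```shell
#!/bin/sh
# Hypothetical pre-flight check for the html2markdown requirements.
# Returns 1 (and names the missing programs on stderr) if pandoc or tidy
# is absent. A missing iconv only produces a warning, matching the README's
# note that html2markdown then treats everything as UTF-8.
check_requirements() {
    missing=
    for prog in pandoc tidy; do
        command -v "$prog" >/dev/null 2>&1 || missing="$missing $prog"
    done
    if [ -n "$missing" ]; then
        echo "missing required:$missing" >&2
        return 1
    fi
    if ! command -v iconv >/dev/null 2>&1; then
        echo "warning: iconv not found; treating input as UTF-8" >&2
    fi
    return 0
}
```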
@@ -117,7 +117,7 @@ But for simple documents it should be adequate. The `latex` and `html`
 readers are also limited in what they can do. Because the `html`
 reader is picky about the HTML it parses, it is recommended that you
 pipe HTML through [HTML Tidy] before sending it to `pandoc`, or use the
-`web2markdown` script described below.
+`html2markdown` script described below.

 If you don't specify a reader or writer explicitly, `pandoc` will
 try to determine the input and output format from the extensions of
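The Tidy-then-pandoc advice above amounts to a two-stage pipeline. A minimal sketch follows, assuming HTML Tidy's `-asxhtml`, `-utf8`, `-quiet` and `--force-output yes` options and pandoc's `-f`/`-t` format options; the function name `tidy_to_markdown` is hypothetical, and this is not the `html2markdown` script itself.

```shell
#!/bin/sh
# Hypothetical helper: clean HTML on stdin with HTML Tidy, then convert it
# to markdown with pandoc. Tidy's diagnostics go to stderr and are discarded;
# --force-output yes makes Tidy emit its cleaned markup even after errors.
tidy_to_markdown() {
    tidy -asxhtml -utf8 -quiet --force-output yes 2>/dev/null |
        pandoc -f html -t markdown
}

# Usage: tidy_to_markdown < page.html > page.txt
```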
@@ -151,10 +151,10 @@ The shell scripts (described below) automatically convert the input
 from the local encoding to UTF-8 before running them through `pandoc`,
 then convert the output back to the local encoding.

-`markdown2pdf` and `web2markdown`
-=================================
+`markdown2pdf` and `html2markdown`
+==================================

-Two shell scripts, `markdown2pdf` and `web2markdown`, are included in
+Two shell scripts, `markdown2pdf` and `html2markdown`, are included in
 the standard Pandoc installation. (They are not included in the Windows
 binary package, as they require a POSIX shell, but they may be used
 in Windows under Cygwin.)
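The local-encoding round trip described above can be sketched as a wrapper around any filter command. This assumes `locale charmap` reports the local encoding (falling back to UTF-8 when it reports nothing) and treats `iconv` as optional, as the README does; `with_local_encoding` is a hypothetical name, not the scripts' actual code.

```shell
#!/bin/sh
# Hypothetical helper: run a filter command (e.g. pandoc) on UTF-8 text.
# Input on stdin is converted from the local encoding to UTF-8, and the
# command's output is converted back. Without iconv, the data passes
# through unchanged (i.e. it is treated as UTF-8).
with_local_encoding() {
    enc=$(locale charmap 2>/dev/null)
    [ -n "$enc" ] || enc=UTF-8
    if command -v iconv >/dev/null 2>&1; then
        iconv -f "$enc" -t UTF-8 | "$@" | iconv -f UTF-8 -t "$enc"
    else
        "$@"
    fi
}

# Usage: with_local_encoding pandoc -t latex < notes.txt
```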
@@ -175,19 +175,19 @@ in Windows under Cygwin.)

 If no input file is specified, input will be taken from STDIN.

-2. `web2markdown` grabs a web page from a file or URL and converts
+2. `html2markdown` grabs a web page from a file or URL and converts
 it to markdown-formatted text, using `tidy` and `pandoc`.
 Unless input is from STDIN, an attempt is made to determine the
 character encoding of the page from the "Content-type" meta tag.
 If this is not present, UTF-8 is assumed. Alternatively, a character
 encoding may be specified explicitly using the `-e` option.

-`web2markdown` searches for an available program (`wget`, `curl`,
+`html2markdown` searches for an available program (`wget`, `curl`,
 or a text-mode browser) to fetch the contents of a URL.
 Optionally, the `-g` command may be used to specify the command
 to be used:

-web2markdown -g 'wget --user=foo --password=bar' mysite.com
+html2markdown -g 'wget --user=foo --password=bar' mysite.com

 Command-line options
 ====================
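The encoding lookup described above can be approximated with `tr` and `sed`. A minimal sketch, assuming the page declares its encoding in the HTML 4 style (`<meta http-equiv="Content-Type" content="text/html; charset=...">`) on a single line; `detect_charset` is a hypothetical name, and the actual script may parse the tag differently.

```shell
#!/bin/sh
# Hypothetical helper: print the charset named in a file's Content-type
# meta tag (lower-cased), or "utf-8" if no such tag is found.
# Only the first matching line is used; a tag split across lines is missed.
detect_charset() {
    enc=$(tr '[:upper:]' '[:lower:]' < "$1" |
        sed -ne 's/.*content-type[^>]*charset=\([a-z0-9_.:-]*\).*/\1/p' |
        head -n 1)
    printf '%s\n' "${enc:-utf-8}"
}

# Usage: iconv -f "$(detect_charset page.html)" -t utf-8 < page.html
```

Lower-casing the whole file first keeps the pattern simple; `iconv` treats encoding names case-insensitively, so the lower-cased result can be passed to it directly.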
@@ -1,22 +1,22 @@
-.TH WEB2MARKDOWN 1 "December 15, 2006" Pandoc "User Manuals"
+.TH HTML2MARKDOWN 1 "December 15, 2006" Pandoc "User Manuals"
 .SH NAME
-web2markdown \- converts HTML to markdown-formatted text
+html2markdown \- converts HTML to markdown-formatted text
 .SH SYNOPSIS
-\fBweb2markdown\fR [\fIoptions\fR] [\fIinput\-file\fR or \fIURL\fR]
+\fBhtml2markdown\fR [\fIoptions\fR] [\fIinput\-file\fR or \fIURL\fR]
 .SH DESCRIPTION
-\fBweb2markdown\fR converts \fIinput\-file\fR or \fIURL\fR (or text
+\fBhtml2markdown\fR converts \fIinput\-file\fR or \fIURL\fR (or text
 from STDIN) from HTML to markdown\-formatted plain text.
-If a URL is specified, \fBweb2markdown\fR uses an available program
+If a URL is specified, \fBhtml2markdown\fR uses an available program
 (e.g. wget, w3m, lynx or curl) to fetch its contents. Output is sent
 to STDOUT unless an output file is specified using the \fB\-o\fR
 option.
 .PP
-\fBweb2markdown\fR uses the character encoding specified in the
+\fBhtml2markdown\fR uses the character encoding specified in the
 "Content-type" meta tag. If this is not present, or if input comes
 from STDIN, UTF-8 is assumed. A character encoding may be specified
 explicitly using the \fB\-e\fR option.
 .PP
-\fBweb2markdown\fR is a wrapper for \fBpandoc\fR.
+\fBhtml2markdown\fR is a wrapper for \fBpandoc\fR.
 .SH OPTIONS
 .TP
 .B \-s, \-\-standalone
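The man page says a URL is fetched with whichever of wget, w3m, lynx or curl is available. A minimal sketch of such a search using `command -v`; the search order and the exact flags (`wget -q -O -`, `curl -s`, `w3m -dump_source`, `lynx -source`, each of which writes the raw page to standard output) are assumptions, and `find_grabber` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: print a command that writes a URL's raw contents
# to stdout, using the first fetcher found on the PATH.
# Returns 1 (printing nothing) if none is installed.
find_grabber() {
    for prog in wget curl w3m lynx; do
        if command -v "$prog" >/dev/null 2>&1; then
            case $prog in
                wget) echo 'wget -q -O -' ;;
                curl) echo 'curl -s' ;;
                w3m)  echo 'w3m -dump_source' ;;
                lynx) echo 'lynx -source' ;;
            esac
            return 0
        fi
    done
    return 1
}

# Usage: grabber=$(find_grabber) || { echo "no fetcher found" >&2; exit 1; }
```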
@@ -62,17 +62,17 @@ Assume the character encoding \fIencoding\fR in reading HTML.
 (Note: \fIencoding\fR will be passed to \fBiconv\fR; a list of
 available encodings may be obtained using `\fBiconv \-l\fR'.)
 If the \fB\-e\fR option is not specified and input is not from
-STDIN, \fBweb2markdown\fR will try to extract the character encoding
+STDIN, \fBhtml2markdown\fR will try to extract the character encoding
 from the "Content-type" meta tag. If no character encoding is
 specified in this way, or if input is from STDIN, UTF-8 will be
 assumed.
 .TP
 .B \-g \fIcommand\fR
 Use \fIcommand\fR to fetch the contents of a URL. (By default,
-\fBweb2markdown\fR searches for an available program or text-based
+\fBhtml2markdown\fR searches for an available program or text-based
 browser to fetch the contents of a URL.) For example:
 .IP
-web2markdown \-g 'wget \-\-user=foo \-\-password=bar' mysite.com
+html2markdown \-g 'wget \-\-user=foo \-\-password=bar' mysite.com

 .SH "SEE ALSO"
 \fBpandoc\fR(1),
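The `-g` example passes the whole fetch command as one quoted string. One plausible way to run such a string, sketched below as an assumption rather than the script's actual code, is to expand it unquoted so the shell splits it into words, then append the URL; `fetch_with` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: run a user-supplied fetch command on a URL.
# $1 is the command string (e.g. 'wget -q -O - --user=foo --password=bar'),
# $2 is the URL. The unquoted $cmd is deliberately split into words;
# set -f in the subshell stops those words from being glob-expanded.
fetch_with() {
    cmd=$1
    url=$2
    (set -f; $cmd "$url")
}

# Usage: fetch_with 'curl -s' http://www.gnu.org/software/make/
```

Because the string is split on whitespace, an argument that itself contains spaces (a password with a space, say) cannot be passed this way.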
@@ -41,7 +41,7 @@ and output through \fBiconv\fR:
 .PP
 \fIPandoc\fR's HTML parser is not very forgiving. If your input is
 HTML, consider running it through \fBtidy\fR(1) before passing it
-to Pandoc. Or use \fBweb2markdown\fR(1), a wrapper around \fBpandoc\fR.
+to Pandoc. Or use \fBhtml2markdown\fR(1), a wrapper around \fBpandoc\fR.

 .SH OPTIONS
 .TP
@@ -151,7 +151,7 @@ Print version.
 Show usage message.

 .SH "SEE ALSO"
-\fBweb2markdown\fR(1),
+\fBhtml2markdown\fR(1),
 \fBmarkdown2pdf\fR(1).
 The
 .I README
@@ -72,7 +72,7 @@ grabber=
 while [ $# -gt 0 ]; do
 case "$1" in
 -h|--help)
-pandoc -h 2>&1 | sed -e 's/pandoc/web2markdown/' \
+pandoc -h 2>&1 | sed -e 's/pandoc/html2markdown/' \
 -e '/^[[:space:]]*\(-f\|-t\|-S\|-N\|-m\|-i\|-c\|-T\|-D\|-d\)/,/./d'\
 1>&2
 err " -e ENCODING, --encoding=ENCODING"
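In the help filter above, the range address `/.../,/./d` deletes each matching option line through the next non-empty line (its description), so the wrapper does not advertise pandoc options it does not pass through. The alternation `\(-f\|-t\|...\)` is a GNU sed extension to basic regular expressions; because every alternative is a dash followed by one letter, the bracket expression `-[ftSNmicTDd]` matches the same lines in any POSIX sed. A sketch, with `filter_help` as a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: rewrite pandoc's help text (on stdin) for a wrapper.
# 1. Replace the first "pandoc" on each line with "html2markdown".
# 2. Delete every line starting with one of the listed single-letter options,
#    through the next non-empty line, which holds that option's description.
filter_help() {
    sed -e 's/pandoc/html2markdown/' \
        -e '/^[[:space:]]*-[ftSNmicTDd]/,/./d'
}

# Usage: pandoc -h 2>&1 | filter_help 1>&2
```

This assumes each option's description occupies the line after it; a one-line entry would take the following entry down with it.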
@@ -81,7 +81,7 @@ while [ $# -gt 0 ]; do
 err " Specify command to be used to grab contents of URL"
 exit 0 ;;
 -v|--version)
-pandoc -v 2>&1 | sed -e 's/pandoc/web2markdown/' 1>&2
+pandoc -v 2>&1 | sed -e 's/pandoc/html2markdown/' 1>&2
 exit 0 ;;
 -e)
 shift
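The option loop continues past the end of this hunk, so how `-e` stores its argument is not shown. A minimal sketch of a loop in the same `while [ $# -gt 0 ]; do case "$1" in ...` style, assuming the hypothetical variable names `encoding`, `grabber` and `input`, and accepting both the `-e ENCODING` and `--encoding=ENCODING` forms listed in the help text:

```shell
#!/bin/sh
# Hypothetical option parser in the style of the wrapper's loop.
# Sets the globals encoding, grabber and input; returns 1 if an option
# that needs an argument comes last on the command line.
parse_args() {
    encoding= grabber= input=
    while [ $# -gt 0 ]; do
        case "$1" in
            -e|-g)
                # Check before shifting: shift past the end is an error.
                if [ $# -lt 2 ]; then
                    echo "option $1 requires an argument" >&2
                    return 1
                fi
                if [ "$1" = -e ]; then encoding=$2; else grabber=$2; fi
                shift ;;
            --encoding=*)
                encoding=${1#--encoding=} ;;
            *)
                input=$1 ;;
        esac
        shift
    done
}
```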
@@ -14,7 +14,7 @@ pandoc -s README.tex -o demo0.txt
 pandoc -s -w rst README -o demo0.txt
 pandoc -s README -o demo0.rtf
 pandoc -s -m -i -w s5 S5DEMO -o demo0.html
-web2markdown http://www.gnu.org/software/make/ -o demo0.txt
+html2markdown http://www.gnu.org/software/make/ -o demo0.txt
 markdown2pdf README -o demo0.pdf
 markdown2pdf -C myheader.tex README -o demo0.pdf'
@@ -35,7 +35,7 @@ you should extract from the zip archive and put somewhere in your
 PATH). See the included file `README-WINDOWS.txt` for instructions
 on using the program. Note: If you use [Cygwin], we recommend that
 you compile Pandoc from source. This will give you access to the
-wrapper scripts `markdown2pdf` and `web2markdown`, which are not
+wrapper scripts `markdown2pdf` and `html2markdown`, which are not
 included in the Windows binary package.

 [`@TARBALL_NAME@`]: http://pandoc.googlecode.com/files/@TARBALL_NAME@