+ Changed 'web2markdown' to 'html2markdown'.
git-svn-id: https://pandoc.googlecode.com/svn/trunk@309 788f1e2b-df1e-0410-8736-df70ead52e1b
parent eea359203a
commit 3491420b53
7 changed files with 26 additions and 26 deletions
2
Makefile
@@ -25,7 +25,7 @@ EXECSBASE := $(shell sed -ne 's/^[Ee]xecutable:[[:space:]]*//p' $(CABAL).in)
 #-------------------------------------------------------------------------------
 # Install targets
 #-------------------------------------------------------------------------------
-WRAPPERS := web2markdown markdown2pdf
+WRAPPERS := html2markdown markdown2pdf
 # Add .exe extensions if we're running Windows/Cygwin.
 EXTENSION := $(shell uname | tr '[:upper:]' '[:lower:]' | \
 sed -ne 's/^cygwin.*$$/\.exe/p')

18
README
@@ -38,14 +38,14 @@ Requirements
 The `pandoc` program itself does not depend on any external libraries
 or programs.
 
-The wrapper script `web2markdown` requires
+The wrapper script `html2markdown` requires
 
 - `pandoc` (which must be in the PATH)
 - a POSIX-compliant shell (installed by default on all linux and unix
 systems, including Mac OS X, and in [Cygwin] for Windows),
 - `HTML Tidy`
 - `iconv` (for character encoding conversion). (If `iconv` is absent,
-`web2markdown` will still work, but it will treat everything as UTF-8.)
+`html2markdown` will still work, but it will treat everything as UTF-8.)
 
 [Cygwin]: http://www.cygwin.com/
 [HTML Tidy]: http://tidy.sourceforge.net/
@@ -117,7 +117,7 @@ But for simple documents it should be adequate. The `latex` and `html`
 readers are also limited in what they can do. Because the `html`
 reader is picky about the HTML it parses, it is recommended that you
 pipe HTML through [HTML Tidy] before sending it to `pandoc`, or use the
-`web2markdown` script described below.
+`html2markdown` script described below.
 
 If you don't specify a reader or writer explicitly, `pandoc` will
 try to determine the input and output format from the extensions of
@@ -151,10 +151,10 @@ The shell scripts (described below) automatically convert the input
 from the local encoding to UTF-8 before running them through `pandoc`,
 then convert the output back to the local encoding.
 
-`markdown2pdf` and `web2markdown`
-=================================
+`markdown2pdf` and `html2markdown`
+==================================
 
-Two shell scripts, `markdown2pdf` and `web2markdown`, are included in
+Two shell scripts, `markdown2pdf` and `html2markdown`, are included in
 the standard Pandoc installation. (They are not included in the Windows
 binary package, as they require a POSIX shell, but they may be used
 in Windows under Cygwin.)
@@ -175,19 +175,19 @@ in Windows under Cygwin.)
 
 If no input file is specified, input will be taken from STDIN.
 
-2. `web2markdown` grabs a web page from a file or URL and converts
+2. `html2markdown` grabs a web page from a file or URL and converts
 it to markdown-formatted text, using `tidy` and `pandoc`.
 Unless input is from STDIN, an attempt is made to determine the
 character encoding of the page from the "Content-type" meta tag.
 If this is not present, UTF-8 is assumed. Alternatively, a character
 encoding may be specified explicitly using the `-e` option.
 
-`web2markdown` searches for an available program (`wget`, `curl`,
+`html2markdown` searches for an available program (`wget`, `curl`,
 or a text-mode browser) to fetch the contents of a URL.
 Optionally, the `-g` command may be used to specify the command
 to be used:
 
-web2markdown -g 'wget --user=foo --password=bar' mysite.com
+html2markdown -g 'wget --user=foo --password=bar' mysite.com
 
 Command-line options
 ====================

@@ -1,22 +1,22 @@
-.TH WEB2MARKDOWN 1 "December 15, 2006" Pandoc "User Manuals"
+.TH HTML2MARKDOWN 1 "December 15, 2006" Pandoc "User Manuals"
 .SH NAME
-web2markdown \- converts HTML to markdown-formatted text
+html2markdown \- converts HTML to markdown-formatted text
 .SH SYNOPSIS
-\fBweb2markdown\fR [\fIoptions\fR] [\fIinput\-file\fR or \fIURL\fR]
+\fBhtml2markdown\fR [\fIoptions\fR] [\fIinput\-file\fR or \fIURL\fR]
 .SH DESCRIPTION
-\fBweb2markdown\fR converts \fIinput\-file\fR or \fIURL\fR (or text
+\fBhtml2markdown\fR converts \fIinput\-file\fR or \fIURL\fR (or text
 from STDIN) from HTML to markdown\-formatted plain text.
-If a URL is specified, \fBweb2markdown\fR uses an available program
+If a URL is specified, \fBhtml2markdown\fR uses an available program
 (e.g. wget, w3m, lynx or curl) to fetch its contents. Output is sent
 to STDOUT unless an output file is specified using the \fB\-o\fR
 option.
 .PP
-\fBweb2markdown\fR uses the character encoding specified in the
+\fBhtml2markdown\fR uses the character encoding specified in the
 "Content-type" meta tag. If this is not present, or if input comes
 from STDIN, UTF-8 is assumed. A character encoding may be specified
 explicitly using the \fB\-e\fR option.
 .PP
-\fBweb2markdown\fR is a wrapper for \fBpandoc\fR.
+\fBhtml2markdown\fR is a wrapper for \fBpandoc\fR.
 .SH OPTIONS
 .TP
 .B \-s, \-\-standalone
@@ -62,17 +62,17 @@ Assume the character encoding \fIencoding\fR in reading HTML.
 (Note: \fIencoding\fR will be passed to \fBiconv\fR; a list of
 available encodings may be obtained using `\fBiconv \-l\fR'.)
 If the \fB\-e\fR option is not specified and input is not from
-STDIN, \fBweb2markdown\fR will try to extract the character encoding
+STDIN, \fBhtml2markdown\fR will try to extract the character encoding
 from the "Content-type" meta tag. If no character encoding is
 specified in this way, or if input is from STDIN, UTF-8 will be
 assumed.
 .TP
 .B \-g \fIcommand\fR
 Use \fIcommand\fR to fetch the contents of a URL. (By default,
-\fBweb2markdown\fR searches for an available program or text-based
+\fBhtml2markdown\fR searches for an available program or text-based
 browser to fetch the contents of a URL.) For example:
 .IP
-web2markdown \-g 'wget \-\-user=foo \-\-password=bar' mysite.com
+html2markdown \-g 'wget \-\-user=foo \-\-password=bar' mysite.com
 
 .SH "SEE ALSO"
 \fBpandoc\fR(1),

@@ -41,7 +41,7 @@ and output through \fBiconv\fR:
 .PP
 \fIPandoc\fR's HTML parser is not very forgiving. If your input is
 HTML, consider running it through \fBtidy\fR(1) before passing it
-to Pandoc. Or use \fBweb2markdown\fR(1), a wrapper around \fBpandoc\fR.
+to Pandoc. Or use \fBhtml2markdown\fR(1), a wrapper around \fBpandoc\fR.
 
 .SH OPTIONS
 .TP
@@ -151,7 +151,7 @@ Print version.
 Show usage message.
 
 .SH "SEE ALSO"
-\fBweb2markdown\fR(1),
+\fBhtml2markdown\fR(1),
 \fBmarkdown2pdf\fR(1).
 The
 .I README

@@ -72,7 +72,7 @@ grabber=
 while [ $# -gt 0 ]; do
 case "$1" in
 -h|--help)
-pandoc -h 2>&1 | sed -e 's/pandoc/web2markdown/' \
+pandoc -h 2>&1 | sed -e 's/pandoc/html2markdown/' \
 -e '/^[[:space:]]*\(-f\|-t\|-S\|-N\|-m\|-i\|-c\|-T\|-D\|-d\)/,/./d'\
 1>&2
 err " -e ENCODING, --encoding=ENCODING"
@@ -81,7 +81,7 @@ while [ $# -gt 0 ]; do
 err " Specify command to be used to grab contents of URL"
 exit 0 ;;
 -v|--version)
-pandoc -v 2>&1 | sed -e 's/pandoc/web2markdown/' 1>&2
+pandoc -v 2>&1 | sed -e 's/pandoc/html2markdown/' 1>&2
 exit 0 ;;
 -e)
 shift

@@ -14,7 +14,7 @@ pandoc -s README.tex -o demo0.txt
 pandoc -s -w rst README -o demo0.txt
 pandoc -s README -o demo0.rtf
 pandoc -s -m -i -w s5 S5DEMO -o demo0.html
-web2markdown http://www.gnu.org/software/make/ -o demo0.txt
+html2markdown http://www.gnu.org/software/make/ -o demo0.txt
 markdown2pdf README -o demo0.pdf
 markdown2pdf -C myheader.tex README -o demo0.pdf'
 

@@ -35,7 +35,7 @@ you should extract from the zip archive and put somewhere in your
 PATH). See the included file `README-WINDOWS.txt` for instructions
 on using the program. Note: If you use [Cygwin], we recommend that
 you compile Pandoc from source. This will give you access to the
-wrapper scripts `markdown2pdf` and `web2markdown`, which are not
+wrapper scripts `markdown2pdf` and `html2markdown`, which are not
 included in the Windows binary package.
 
 [`@TARBALL_NAME@`]: http://pandoc.googlecode.com/files/@TARBALL_NAME@