Documentation changes corresponding to r456.

git-svn-id: https://pandoc.googlecode.com/svn/trunk@457 788f1e2b-df1e-0410-8736-df70ead52e1b
fiddlosopher 2007-01-08 21:16:18 +00:00
parent 58697ebe78
commit 9a37ee459c
4 changed files with 45 additions and 48 deletions

README (30 changed lines)

@@ -176,20 +176,32 @@ may be used in Windows under Cygwin.)
markdown2pdf -o "My Book.pdf" chap1.txt chap2.txt chap3.txt
If no input file is specified, input will be taken from STDIN.
All of `pandoc`'s options will work with `markdown2pdf` as well.
2. `html2markdown` grabs a web page from a file or URL and converts
it to markdown-formatted text, using `tidy` and `pandoc`.
Unless input is from STDIN, an attempt is made to determine the
character encoding of the page from the "Content-type" meta tag.
If this is not present, UTF-8 is assumed. Alternatively, a character
encoding may be specified explicitly using the `-e` option.
`html2markdown` searches for an available program (`wget`, `curl`,
or a text-mode browser) to fetch the contents of a URL.
Optionally, the `-g` command may be used to specify the command
to be used:
All of `pandoc`'s options will work with `html2markdown` as well.
In addition, the following special options may be used.
The special options must be separated from the `html2markdown`
command and any regular Pandoc options by the delimiter `--`:
html2markdown -g 'wget --user=foo --password=bar' mysite.com
html2markdown -o out.txt -- -e latin1 -g curl google.com
The `-e` or `--encoding` option specifies the character encoding
of the HTML input. If this option is not specified, and input
is not from STDIN, `html2markdown` will attempt to determine the
page's character encoding from the "Content-type" meta tag.
If this is not present, UTF-8 is assumed.
The `-g` or `--grabber` option specifies the command to be used to
fetch the contents of a URL:
html2markdown -g 'curl --user foo:bar' www.mysite.com
If this option is not specified, `html2markdown` searches for an
available program (`wget`, `curl`, or a text-mode browser) to fetch
the contents of a URL.
3. `hsmarkdown` is designed to be used as a drop-in replacement for
`Markdown.pl`. It forces `pandoc` to convert from markdown to

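The `--` convention described above can be illustrated with a short POSIX shell sketch. It is not the actual `html2markdown` source; the function `split_and_parse` and its variables are hypothetical. It collects every argument before the first `--` as a pandoc option, then parses the special options `-e` and `-g` with the `getopts` builtin (which the debian/changelog entry below mentions). As simplifications, it assumes `--` is present and that the input file or URL follows the special options, handles only the short forms (`getopts` cannot parse `--encoding=` or `--grabber=`), and joins the pandoc options into one string, so arguments containing spaces would lose their quoting.

```sh
#!/bin/sh
# Hypothetical sketch of html2markdown-style argument handling;
# not the real script. Sets pandoc_opts, encoding, grabber, input.
split_and_parse() {
    pandoc_opts=""
    encoding=""
    grabber=""
    input=""

    # Everything before the first `--` is passed through to pandoc.
    # (Joined into one string for simplicity, so quoting is lost.)
    while [ $# -gt 0 ]; do
        if [ "$1" = "--" ]; then
            shift
            break
        fi
        pandoc_opts="${pandoc_opts:+$pandoc_opts }$1"
        shift
    done

    # Parse the special options that follow `--` (short forms only).
    OPTIND=1
    while getopts "e:g:" opt; do
        case $opt in
            e) encoding=$OPTARG ;;
            g) grabber=$OPTARG ;;
            *) echo "usage: html2markdown [pandoc-options] [-- special-options] [input-file or URL]" >&2
               return 1 ;;
        esac
    done
    shift $((OPTIND - 1))

    # Whatever remains is taken to be the input file or URL.
    input="$*"
}

split_and_parse -o out.txt -- -e latin1 -g 'curl -u bar:baz' www.foo.com
echo "pandoc options: $pandoc_opts"   # pandoc options: -o out.txt
echo "encoding: $encoding"            # encoding: latin1
echo "grabber: $grabber"              # grabber: curl -u bar:baz
echo "input: $input"                  # input: www.foo.com
```

Because the grabber command is passed as a single quoted argument, `getopts` hands it over intact in `OPTARG`, which is why multi-word commands such as `'curl -u bar:baz'` work after `-g`.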
debian/changelog (3 changed lines)

@@ -210,9 +210,6 @@ pandoc (0.3) unstable; urgency=low
+ getopts shell builtin is used for portable option parsing.
+ Improved html2markdown's web grabber code, making it more robust,
configurable and verbose. Added '-e', '-g' options.
Possible use case:
# Use wget by setting timeout to 10 seconds and limit retries to 2.
html2markdown -g 'wget --timeout=10 --tries=2'
-- Recai Oktaş <roktas@debian.org> Fri, 05 Jan 2007 09:41:19 +0200


@@ -2,7 +2,8 @@
.SH NAME
html2markdown \- converts HTML to markdown-formatted text
.SH SYNOPSIS
\fBhtml2markdown\fR [\fIoptions\fR] [\fIinput\-file\fR or \fIURL\fR]
\fBhtml2markdown\fR [\fIpandoc\-options\fR]
[\-\- \fIspecial\-options\fR] [\fIinput\-file\fR or \fIURL\fR]
.SH DESCRIPTION
\fBhtml2markdown\fR converts \fIinput\-file\fR or \fIURL\fR (or text
from STDIN) from HTML to markdown\-formatted plain text.
@@ -14,10 +15,12 @@ option.
\fBhtml2markdown\fR uses the character encoding specified in the
"Content-type" meta tag. If this is not present, or if input comes
from STDIN, UTF-8 is assumed. A character encoding may be specified
explicitly using the \fB\-e\fR option.
.PP
\fBhtml2markdown\fR is a wrapper for \fBpandoc\fR.
explicitly using the \fB\-e\fR special option.
.SH OPTIONS
.PP
\fBhtml2markdown\fR is a wrapper for \fBpandoc\fR, so all of
\fBpandoc\fR's options may be used. See \fBpandoc\fR(1) for
a complete list. The following options are most relevant:
.TP
.B \-s, \-\-standalone
Include title, author, and date information (if present) at the
@@ -26,12 +29,6 @@ top of markdown output.
.B \-o FILE, \-\-output=FILE
Write output to \fIFILE\fR instead of STDOUT.
.TP
.B \-p, \-\-preserve-tabs
Preserve tabs instead of converting them to spaces.
.TP
.B \-\-tab-stop=\fITABSTOP\fB
Specify tab stop (default is 4).
.TP
.B \-\-strict
Use strict markdown syntax, with no extensions or variants.
.TP
@@ -54,29 +51,29 @@ Use contents of \fIFILE\fR
as the document header (overriding the default header, which can be
printed using '\fBpandoc \-D markdown\fR'). Implies
\fB-s\fR.
.SH "SPECIAL OPTIONS"
.PP
In addition, the following special options may be used. The special
options must be separated from the \fBhtml2markdown\fR command and any
regular \fBpandoc\fR options by the delimiter `\-\-', as in
.IP
.B html2markdown \-o foo.txt \-\- \-g 'curl \-u bar:baz' \-e latin1
.B www.foo.com
.TP
.B \-v, \-\-version
Print version.
.TP
.B \-h, \-\-help
Show usage message.
.TP
.B \-e \fIencoding\fR
.B \-e \fIencoding\fR, \-\-encoding=\fIencoding\fR
Assume the character encoding \fIencoding\fR in reading HTML.
(Note: \fIencoding\fR will be passed to \fBiconv\fR; a list of
available encodings may be obtained using `\fBiconv \-l\fR'.)
If the \fB\-e\fR option is not specified and input is not from
If this option is not specified and input is not from
STDIN, \fBhtml2markdown\fR will try to extract the character encoding
from the "Content-type" meta tag. If no character encoding is
specified in this way, or if input is from STDIN, UTF-8 will be
assumed.
.TP
.B \-g \fIcommand\fR
.B \-g \fIcommand\fR, \-\-grabber=\fIcommand\fR
Use \fIcommand\fR to fetch the contents of a URL. (By default,
\fBhtml2markdown\fR searches for an available program or text-based
browser to fetch the contents of a URL.) For example:
.IP
html2markdown \-g 'wget \-\-user=foo \-\-password=bar' mysite.com
browser to fetch the contents of a URL.)
.SH "SEE ALSO"
\fBpandoc\fR(1),


@@ -23,19 +23,16 @@ output through \fBiconv\fR:
\fBmarkdown2pdf\fR assumes that the 'unicode' and 'fancyvrb' packages
are in latex's search path. If these packages are not included in your
latex setup, they can be obtained from <http://ctan.org>.
.PP
\fBmarkdown2pdf\fR is a wrapper around \fBpandoc\fR.
.SH OPTIONS
.PP
\fBmarkdown2pdf\fR is a wrapper around \fBpandoc\fR, so all of
\fBpandoc\fR's options can be used with \fBmarkdown2pdf\fR as well.
See \fBpandoc\fR(1) for a complete list.
The following options are most relevant:
.TP
.B \-o FILE, \-\-output=FILE
Write output to \fIFILE\fR.
.TP
.B \-p, \-\-preserve-tabs
Preserve tabs instead of converting them to spaces.
.TP
.B \-\-tab-stop=\fITABSTOP\fB
Specify tab stop (default is 4).
.TP
.B \-\-strict
Use strict markdown syntax, with no extensions or variants.
.TP
@@ -57,12 +54,6 @@ Include (LaTeX) contents of \fIFILE\fR at the end of the document body.
Use contents of \fIFILE\fR
as the LaTeX document header (overriding the default header, which can be
printed using '\fBpandoc \-D latex\fR'). Implies \fB-s\fR.
.TP
.B \-v, \-\-version
Print version.
.TP
.B \-h, \-\-help
Show usage message.
.SH "SEE ALSO"
\fBpandoc\fR(1),
\fBpdflatex\fR(1)