.TH HTML2MARKDOWN 1 "December 15, 2006" Pandoc "User Manuals"
.SH NAME
html2markdown \- converts HTML to markdown-formatted text
.SH SYNOPSIS
\fBhtml2markdown\fR [\fIpandoc\-options\fR]
[\-\- \fIspecial\-options\fR] [\fIinput\-file\fR or \fIURL\fR]
.SH DESCRIPTION
\fBhtml2markdown\fR converts \fIinput\-file\fR or \fIURL\fR (or text
from STDIN) from HTML to markdown\-formatted plain text.
If a URL is specified, \fBhtml2markdown\fR uses an available program
(e.g. wget, w3m, lynx or curl) to fetch its contents. Output is sent
to STDOUT unless an output file is specified using the \fB\-o\fR
option.
.PP
\fBhtml2markdown\fR uses the character encoding specified in the
"Content-type" meta tag. If this is not present, or if input comes
from STDIN, UTF-8 is assumed. A character encoding may be specified
explicitly using the \fB\-e\fR special option.
.SH OPTIONS
.PP
\fBhtml2markdown\fR is a wrapper for \fBpandoc\fR, so all of
\fBpandoc\fR's options may be used. See \fBpandoc\fR(1) for
a complete list. The following options are most relevant:
.TP
.B \-s, \-\-standalone
Include title, author, and date information (if present) at the
top of markdown output.
.TP
.B \-o \fIFILE\fB, \-\-output=\fIFILE\fB
Write output to \fIFILE\fR instead of STDOUT.
.TP
.B \-\-strict
Use strict markdown syntax, with no extensions or variants.
.TP
.B \-R, \-\-parse\-raw
Parse untranslatable HTML codes as raw HTML.
.TP
.B \-H \fIFILE\fB, \-\-include\-in\-header=\fIFILE\fB
Include contents of \fIFILE\fR at the end of the header. Implies
\fB\-s\fR.
.TP
.B \-B \fIFILE\fB, \-\-include\-before\-body=\fIFILE\fB
Include contents of \fIFILE\fR at the beginning of the document body.
.TP
.B \-A \fIFILE\fB, \-\-include\-after\-body=\fIFILE\fB
Include contents of \fIFILE\fR at the end of the document body.
.TP
.B \-C \fIFILE\fB, \-\-custom\-header=\fIFILE\fB
Use contents of \fIFILE\fR
as the document header (overriding the default header, which can be
printed using '\fBpandoc \-D markdown\fR'). Implies
\fB\-s\fR.
.SH "SPECIAL OPTIONS"
.PP
In addition, the following special options may be used. The special
options must be separated from the \fBhtml2markdown\fR command and any
regular \fBpandoc\fR options by the delimiter `\-\-', as in
.IP
.B html2markdown \-o foo.txt \-\- \-g 'curl \-u bar:baz' \-e latin1
.B www.foo.com
.TP
.B \-e \fIencoding\fB, \-\-encoding=\fIencoding\fB
Assume the character encoding \fIencoding\fR when reading HTML.
(Note: \fIencoding\fR will be passed to \fBiconv\fR; a list of
available encodings may be obtained using `\fBiconv \-l\fR'.)
If this option is not specified and input is not from
STDIN, \fBhtml2markdown\fR will try to extract the character encoding
from the "Content-type" meta tag. If no character encoding is
specified in this way, or if input is from STDIN, UTF-8 will be
assumed.
.TP
.B \-g \fIcommand\fB, \-\-grabber=\fIcommand\fB
Use \fIcommand\fR to fetch the contents of a URL. (By default,
\fBhtml2markdown\fR searches for an available program or text-based
browser to fetch the contents of a URL.)
.SH "SEE ALSO"
\fBpandoc\fR(1),
\fBiconv\fR(1)
.SH AUTHOR
John MacFarlane and Recai Oktas