.TH HTML2MARKDOWN 1 "December 15, 2006" Pandoc "User Manuals"
.SH NAME
html2markdown \- converts HTML to markdown-formatted text
.SH SYNOPSIS
\fBhtml2markdown\fR [\fIoptions\fR] [\fIinput\-file\fR or \fIURL\fR]
.SH DESCRIPTION
\fBhtml2markdown\fR converts \fIinput\-file\fR or \fIURL\fR (or text
from STDIN) from HTML to markdown\-formatted plain text.
If a URL is specified, \fBhtml2markdown\fR uses an available program
(e.g. wget, w3m, lynx or curl) to fetch its contents. Output is sent
to STDOUT unless an output file is specified using the \fB\-o\fR
option.
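.PP
As a brief sketch of the input and output forms described above (the
file names \fIpage.html\fR and \fIpage.txt\fR and the URL are
hypothetical):
.IP
.nf
html2markdown page.html
html2markdown \-o page.txt http://example.com/
html2markdown < page.html > page.txt
.fi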
.PP
\fBhtml2markdown\fR uses the character encoding specified in the
"Content-type" meta tag. If this is not present, or if input comes
from STDIN, UTF-8 is assumed. A character encoding may be specified
explicitly using the \fB\-e\fR option.
.PP
\fBhtml2markdown\fR is a wrapper for \fBpandoc\fR.
.SH OPTIONS
.TP
.B \-s, \-\-standalone
Include title, author, and date information (if present) at the
top of markdown output.
.TP
.B \-o \fIFILE\fB, \-\-output=\fIFILE\fB
Write output to \fIFILE\fR instead of STDOUT.
.TP
.B \-p, \-\-preserve\-tabs
Preserve tabs instead of converting them to spaces.
.TP
.B \-\-tab\-stop=\fITABSTOP\fB
Specify tab stop (default is 4).
.TP
.B \-R, \-\-parse\-raw
Parse untranslatable HTML codes as raw HTML.
.TP
.B \-H \fIFILE\fB, \-\-include\-in\-header=\fIFILE\fB
Include contents of \fIFILE\fR at the end of the header. Implies
\fB\-s\fR.
.TP
.B \-B \fIFILE\fB, \-\-include\-before\-body=\fIFILE\fB
Include contents of \fIFILE\fR at the beginning of the document body.
.TP
.B \-A \fIFILE\fB, \-\-include\-after\-body=\fIFILE\fB
Include contents of \fIFILE\fR at the end of the document body.
.TP
.B \-C \fIFILE\fB, \-\-custom\-header=\fIFILE\fB
Use contents of \fIFILE\fR
as the document header (overriding the default header, which can be
printed using `\fBpandoc \-D markdown\fR'). Implies
\fB\-s\fR.
.TP
.B \-v, \-\-version
Print version.
.TP
.B \-h, \-\-help
Show usage message.
.TP
.B \-e \fIencoding\fR
Assume the character encoding \fIencoding\fR when reading HTML.
(Note: \fIencoding\fR will be passed to \fBiconv\fR; a list of
available encodings may be obtained using `\fBiconv \-l\fR'.)
If the \fB\-e\fR option is not specified and input is not from
STDIN, \fBhtml2markdown\fR will try to extract the character encoding
from the "Content-type" meta tag. If no character encoding is
specified in this way, or if input is from STDIN, UTF-8 will be
assumed.
.TP
.B \-g \fIcommand\fR
Use \fIcommand\fR to fetch the contents of a URL. (By default,
\fBhtml2markdown\fR searches for an available program or text-based
browser to fetch the contents of a URL.) For example:
.IP
html2markdown \-g 'wget \-\-user=foo \-\-password=bar' mysite.com
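.SH EXAMPLES
The following invocations are illustrative sketches that use only the
options documented above; the file names and the encoding are
hypothetical.
.PP
Convert a Latin-1 encoded file, naming the encoding explicitly (the
name must be one that `\fBiconv \-l\fR' lists on the local system):
.IP
html2markdown \-e ISO\-8859\-1 page.html
.PP
Produce standalone output with title, author, and date information
(if present), written to a file:
.IP
html2markdown \-s \-o page.txt page.html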
.SH "SEE ALSO"
\fBpandoc\fR(1),
\fBiconv\fR(1)
.SH AUTHOR
John MacFarlane and Recai Oktas