% HTML2MARKDOWN(1) Pandoc User Manuals
% John MacFarlane and Recai Oktas
% January 8, 2008

# NAME

html2markdown - converts HTML to markdown-formatted text

# SYNOPSIS

html2markdown [*pandoc-options*] [\-- *special-options*] [*input-file* or
*URL*]

# DESCRIPTION

`html2markdown` converts *input-file* or *URL* (or text
from STDIN) from HTML to markdown-formatted plain text.
If a URL is specified, `html2markdown` uses an available program
(e.g. wget, w3m, lynx or curl) to fetch its contents. Output is sent
to STDOUT unless an output file is specified using the `-o`
option.

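The search for "an available program" can be pictured with a short shell sketch. This is an illustration only, not the logic `html2markdown` actually uses: `first_available` is a hypothetical helper, and the candidate order simply follows the list in the text above.

```shell
# Hypothetical helper: print the first program in the argument list
# that is found on PATH; return non-zero if none is found.
first_available() {
    for prog in "$@"; do
        if command -v "$prog" >/dev/null 2>&1; then
            echo "$prog"
            return 0
        fi
    done
    return 1
}

# Candidate order is an assumption taken from the text, not from the script.
grabber=$(first_available wget w3m lynx curl) || grabber=none
echo "would fetch with: $grabber"
```

When the automatic choice is unsuitable, the `-g` special option names the fetch command explicitly.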
`html2markdown` uses the character encoding specified in the
"Content-type" meta tag. If this is not present, or if input comes
from STDIN, UTF-8 is assumed. A character encoding may be specified
explicitly using the `-e` special option.

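The fallback rule above (meta tag first, otherwise UTF-8) can be sketched in shell. `detect_charset` is a hypothetical helper written for illustration, not the code in `html2markdown`, and its pattern matching is deliberately simplified: it assumes the meta tag sits on one line and spells `charset=` in lowercase.

```shell
# Hypothetical helper: print the charset named in a Content-type meta tag
# of the given HTML file, or UTF-8 if no such tag is found.
detect_charset() {
    charset=$(grep -i '<meta[^>]*content-type' "$1" 2>/dev/null |
        sed -n 's/.*charset=\([A-Za-z0-9_.:-]*\).*/\1/p' | head -n 1)
    # Fall back to UTF-8 when nothing was extracted.
    echo "${charset:-UTF-8}"
}
```

The printed name could then be handed to `iconv -f` to convert the input before further processing.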
# OPTIONS

`html2markdown` is a wrapper for `pandoc`, so all of
`pandoc`'s options may be used. See `pandoc`(1) for
a complete list. The following options are most relevant:

-s, \--standalone
:   Include title, author, and date information (if present) at the
    top of markdown output.

-o *FILE*, \--output=*FILE*
:   Write output to *FILE* instead of STDOUT.

\--strict
:   Use strict markdown syntax, with no extensions or variants.

\--reference-links
:   Use reference-style links, rather than inline links, in writing markdown
    or reStructuredText.

-R, \--parse-raw
:   Parse untranslatable HTML codes as raw HTML.

\--no-wrap
:   Disable text wrapping in output. (Default is to wrap text.)

-H *FILE*, \--include-in-header=*FILE*
:   Include contents of *FILE* at the end of the header. Implies
    `-s`.

-B *FILE*, \--include-before-body=*FILE*
:   Include contents of *FILE* at the beginning of the document body.

-A *FILE*, \--include-after-body=*FILE*
:   Include contents of *FILE* at the end of the document body.

-C *FILE*, \--custom-header=*FILE*
:   Use contents of *FILE*
    as the document header (overriding the default header, which can be
    printed using `pandoc -D markdown`). Implies `-s`.

# SPECIAL OPTIONS

In addition, the following special options may be used. The special
options must be separated from the `html2markdown` command and any
regular `pandoc` options by the delimiter \``--`', as in

    html2markdown -o foo.txt -- -g 'curl -u bar:baz' -e latin1 \
        www.foo.com

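The delimiter convention above can be illustrated with a small shell sketch. `split_args` is a hypothetical helper, not part of `html2markdown`: it does no option parsing and only reports whether each argument falls before or after the first `--`.

```shell
# Hypothetical helper: label each argument by the side of the first "--"
# on which it appears. The delimiter itself is not printed.
split_args() {
    side=before
    for arg in "$@"; do
        if [ "$side" = before ] && [ "$arg" = "--" ]; then
            side=after
            continue
        fi
        printf '%s: %s\n' "$side" "$arg"
    done
}

split_args -o foo.txt -- -g 'curl -u bar:baz' -e latin1 www.foo.com
```

Arguments labelled `before` correspond to regular `pandoc` options; those labelled `after` are the special options followed by the input file or URL.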
-e *encoding*, \--encoding=*encoding*
:   Assume the character encoding *encoding* in reading HTML.
    (Note: *encoding* will be passed to `iconv`; a list of
    available encodings may be obtained using `iconv -l`.)
    If this option is not specified and input is not from
    STDIN, `html2markdown` will try to extract the character encoding
    from the "Content-type" meta tag. If no character encoding is
    specified in this way, or if input is from STDIN, UTF-8 will be
    assumed.

-g *command*, \--grabber=*command*
:   Use *command* to fetch the contents of a URL. (By default,
    `html2markdown` searches for an available program or text-based
    browser to fetch the contents of a URL.)

# SEE ALSO

`pandoc`(1), `iconv`(1)