From 9a37ee459cafd70310d732e600a73ccfdfe2cbb8 Mon Sep 17 00:00:00 2001
From: fiddlosopher
Date: Mon, 8 Jan 2007 21:16:18 +0000
Subject: [PATCH] Documentation changes corresponding to r456.

git-svn-id: https://pandoc.googlecode.com/svn/trunk@457 788f1e2b-df1e-0410-8736-df70ead52e1b
---
 README                   | 30 ++++++++++++++++++++---------
 debian/changelog         |  3 ---
 man/man1/html2markdown.1 | 41 +++++++++++++++++++---------------------
 man/man1/markdown2pdf.1  | 19 +++++--------------
 4 files changed, 45 insertions(+), 48 deletions(-)

diff --git a/README b/README
index de1efc1bc..f95a93758 100644
--- a/README
+++ b/README
@@ -176,20 +176,32 @@ may be used in Windows under Cygwin.)
         markdown2pdf -o "My Book.pdf" chap1.txt chap2.txt chap3.txt
 
     If no input file is specified, input will be taken from STDIN.
+    All of `pandoc`'s options will work with `markdown2pdf` as well.
 
 2.  `html2markdown` grabs a web page from a file or URL and converts
     it to markdown-formatted text, using `tidy` and `pandoc`.
 
-    Unless input is from STDIN, an attempt is made to determine the
-    character encoding of the page from the "Content-type" meta tag.
-    If this is not present, UTF-8 is assumed. Alternatively, a character
-    encoding may be specified explicitly using the `-e` option.
-    `html2markdown` searches for an available program (`wget`, `curl`,
-    or a text-mode browser) to fetch the contents of a URL.
-    Optionally, the `-g` command may be used to specify the command
-    to be used:
+    All of `pandoc`'s options will work with `html2markdown` as well.
+    In addition, the following special options may be used.
+    The special options must be separated from the `html2markdown`
+    command and any regular Pandoc options by the delimiter `--`:
 
-        html2markdown -g 'wget --user=foo --password=bar' mysite.com
+        html2markdown -o out.txt -- -e latin1 -g curl google.com
+
+    The `-e` or `--encoding` option specifies the character encoding
+    of the HTML input. If this option is not specified, and input
+    is not from STDIN, `html2markdown` will attempt to determine the
+    page's character encoding from the "Content-type" meta tag.
+    If this is not present, UTF-8 is assumed.
+
+    The `-g` or `--grabber` option specifies the command to be used to
+    fetch the contents of a URL:
+
+        html2markdown -g 'curl --user foo:bar' www.mysite.com
+
+    If this option is not specified, `html2markdown` searches for an
+    available program (`wget`, `curl`, or a text-mode browser) to fetch
+    the contents of a URL.
 
 3.  `hsmarkdown` is designed to be used as a drop-in replacement for
     `Markdown.pl`. It forces `pandoc` to convert from markdown to
diff --git a/debian/changelog b/debian/changelog
index a06c40579..8ce9acc47 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -210,9 +210,6 @@ pandoc (0.3) unstable; urgency=low
     + getopts shell builtin is used for portable option parsing.
     + Improved html2markdown's web grabber code, making it more robust,
       configurable and verbose. Added '-e', '-g' options.
-      Possible use case:
-        # Use wget by setting timeout to 10 seconds and limit retries to 2.
-        html2markdown -g 'wget --timeout=10 --tries=2'
 
  -- Recai Oktaş Fri, 05 Jan 2007 09:41:19 +0200
 
diff --git a/man/man1/html2markdown.1 b/man/man1/html2markdown.1
index 542d26852..78c27808e 100644
--- a/man/man1/html2markdown.1
+++ b/man/man1/html2markdown.1
@@ -2,7 +2,8 @@
 .SH NAME
 html2markdown \- converts HTML to markdown-formatted text
 .SH SYNOPSIS
-\fBhtml2markdown\fR [\fIoptions\fR] [\fIinput\-file\fR or \fIURL\fR]
+\fBhtml2markdown\fR [\fIpandoc\-options\fR]
+[\-\- \fIspecial\-options\fR] [\fIinput\-file\fR or \fIURL\fR]
 .SH DESCRIPTION
 \fBhtml2markdown\fR converts \fIinput\-file\fR or \fIURL\fR (or text
 from STDIN) from HTML to markdown\-formatted plain text.
@@ -14,10 +15,12 @@ option.
 \fBhtml2markdown\fR uses the character encoding specified in the
 "Content-type" meta tag. If this is not present, or if input comes
 from STDIN, UTF-8 is assumed. A character encoding may be specified
-explicitly using the \fB\-e\fR option.
-.PP
-\fBhtml2markdown\fR is a wrapper for \fBpandoc\fR.
+explicitly using the \fB\-e\fR special option.
 .SH OPTIONS
+.PP
+\fBhtml2markdown\fR is a wrapper for \fBpandoc\fR, so all of
+\fBpandoc\fR's options may be used. See \fBpandoc\fR(1) for
+a complete list. The following options are most relevant:
 .TP
 .B \-s, \-\-standalone
 Include title, author, and date information (if present) at the
@@ -26,12 +29,6 @@ top of markdown output.
 .B \-o FILE, \-\-output=FILE
 Write output to \fIFILE\fR instead of STDOUT.
 .TP
-.B \-p, \-\-preserve-tabs
-Preserve tabs instead of converting them to spaces.
-.TP
-.B \-\-tab-stop=\fITABSTOP\fB
-Specify tab stop (default is 4).
-.TP
 .B \-\-strict
 Use strict markdown syntax, with no extensions or variants.
 .TP
@@ -54,29 +51,29 @@ Use contents of \fIFILE\fR as the document header
 (overriding the default header, which can be printed using
 '\fBpandoc \-D markdown\fR').
 Implies \fB-s\fR.
+.SH "SPECIAL OPTIONS"
+.PP
+In addition, the following special options may be used. The special
+options must be separated from the \fBhtml2markdown\fR command and any
+regular \fBpandoc\fR options by the delimiter `\-\-', as in
+.IP
+.B html2markdown \-o foo.txt \-\- \-g 'curl \-u bar:baz' \-e latin1
+.B www.foo.com
 .TP
-.B \-v, \-\-version
-Print version.
-.TP
-.B \-h, \-\-help
-Show usage message.
-.TP
-.B \-e \fIencoding\fR
+.B \-e \fIencoding\fR, \-\-encoding=\fIencoding\fR
 Assume the character encoding \fIencoding\fR in reading HTML.
 (Note: \fIencoding\fR will be passed to \fBiconv\fR; a list of
 available encodings may be obtained using `\fBiconv \-l\fR'.)
-If the \fB\-e\fR option is not specified and input is not from
+If this option is not specified and input is not from
 STDIN, \fBhtml2markdown\fR will try to extract the character encoding
 from the "Content-type" meta tag. If no character encoding is
 specified in this way, or if input is from STDIN, UTF-8 will be assumed.
 .TP
-.B \-g \fIcommand\fR
+.B \-g \fIcommand\fR, \-\-grabber=\fIcommand\fR
 Use \fIcommand\fR to fetch the contents of a URL. (By default,
 \fBhtml2markdown\fR searches for an available program or text-based
-browser to fetch the contents of a URL.) For example:
-.IP
-html2markdown \-g 'wget \-\-user=foo \-\-password=bar' mysite.com
+browser to fetch the contents of a URL.)
 .SH "SEE ALSO"
 \fBpandoc\fR(1),
diff --git a/man/man1/markdown2pdf.1 b/man/man1/markdown2pdf.1
index 4524c0ac2..3162742bb 100644
--- a/man/man1/markdown2pdf.1
+++ b/man/man1/markdown2pdf.1
@@ -23,19 +23,16 @@ output through \fBiconv\fR:
 \fBmarkdown2pdf\fR assumes that the 'unicode' and 'fancyvrb' packages
 are in latex's search path. If these packages are not included in
 your latex setup, they can be obtained from .
-.PP
-\fBmarkdown2pdf\fR is a wrapper around \fBpandoc\fR.
 .SH OPTIONS
+.PP
+\fBmarkdown2pdf\fR is a wrapper around \fBpandoc\fR, so all of
+\fBpandoc\fR's options can be used with \fBmarkdown2pdf\fR as well.
+See \fBpandoc\fR(1) for a complete list.
+The following options are most relevant:
 .TP
 .B \-o FILE, \-\-output=FILE
 Write output to \fIFILE\fR.
 .TP
-.B \-p, \-\-preserve-tabs
-Preserve tabs instead of converting them to spaces.
-.TP
-.B \-\-tab-stop=\fITABSTOP\fB
-Specify tab stop (default is 4).
-.TP
 .B \-\-strict
 Use strict markdown syntax, with no extensions or variants.
 .TP
@@ -57,12 +54,6 @@ Include (LaTeX) contents of \fIFILE\fR at the end of the document body.
 Use contents of \fIFILE\fR as the LaTeX document header
 (overriding the default header, which can be printed using
 '\fBpandoc \-D latex\fR'). Implies \fB-s\fR.
-.TP
-.B \-v, \-\-version
-Print version.
-.TP
-.B \-h, \-\-help
-Show usage message.
 .SH "SEE ALSO"
 \fBpandoc\fR(1),
 \fBpdflatex\fR(1)