The documentation states that the target format name should match the output format, which isn't the case for `docx`/`openxml`.
This PR adds a remark and a frequently requested example (inserting a pagebreak in docx output) to the documentation.
The resulting PDF can be verified using the Apache PDFBox preflight app.
```
$ java -jar preflight-app-2.0.8.jar test.pdf
The file test.pdf is a valid PDF/A-1b file
```
Instructions on how to install the ICC profiles on ConTeXt standalone can be found in the wiki: <http://wiki.contextgarden.net/PDFX#ICC_profiles>.
If the ICC profiles are not available, the log will contain messages like these:
```
backend > profiles > profile specification 'sRGB.icc' loaded from '/usr/local/texlive/2017/texmf-dist/tex/context/colors/icc/context/colorprofiles.xml'
backend > profiles > error, couldn't locate profile 'srgb.icc'
backend > profiles > no default profile 'srgb.icc' for colorspace 'rgb'
backend > profiles > profile specification 'sRGB IEC61966-2.1' loaded from '/usr/local/texlive/2017/texmf-dist/tex/context/colors/icc/context/colorprofiles.xml'
backend > profiles > error, couldn't locate profile 'srgb.icc'
backend > profiles > invalid output intent 'sRGB IEC61966-2.1'
```
and the resulting PDF will not be valid PDF/A:
```
$ java -jar preflight-app-2.0.8.jar test.pdf
The file test.pdf is not a valid PDF/A-1b file, error(s) :
2.4.3 : Invalid Color space, The operator "g" can't be used without Color Profile on page 1
2.4.3 : Invalid Color space, The operator "G" can't be used without Color Profile on page 1
2.4.3 : Invalid Color space, /DeviceGray default for operator "TJ" can't be used without Color Profile on page 1
2.4.3 : Invalid Color space, /DeviceGray default for operator "TJ" can't be used without Color Profile on page 1
2.4.3 : Invalid Color space, The operator "g" can't be used without Color Profile on page 1
2.4.3 : Invalid Color space, The operator "G" can't be used without Color Profile on page 1
2.4.3 : Invalid Color space, /DeviceGray default for operator "TJ" can't be used without Color Profile on page 1
```
However, the PDF will still be generated and the "errors" shown in the log do not break anything.
It makes more sense not to interpret -- otherwise using the original
document as the reference-doc would produce two of everything: the
interpreted version and the uninterpreted style version.
Previously Emph, Strong, etc were outside the custom-style span. This
moves them inside in order to make it easier to write filters that act
on the formatting in these contents.
Tests and MANUAL example are changed to match.
The previous commit had a bug where custom-style spans would be read
with every recursion. This fixes that, and changes the example given
in the manual.
This also necessitated implementing colors and underlining, though
there is currently no way to produce these from markdown. Note that
background colors can't be implemented in PowerPoint, so highlighting
styles that require these will be incomplete.
@jkr - the tabs were inserted by
your 624abeec5c,
presumably through some automatic setting in your editor
that replaced 8 spaces with a tab.
This messed up indented formatting in the manual.
In pandoc v2, using --section-divs with -t html produces <section>s by default, not <div>s as was the case for v1.9.
This change to the Manual emphasizes that you must use -t html4 if you want divs, otherwise you get sections.
Don't pass through macro definitions themselves when `latex_macros`
is set. The macros have already been applied.
If `latex_macros` is enabled, then `rawLaTeXBlock` in
Text.Pandoc.Readers.LaTeX will succeed in parsing a macro definition,
and will update pandoc's internal macro map accordingly, but the
empty string will be returned.
Together with earlier changes, this closes #4179.
It would be awkward to indent example list contents to the
first non-space character after the label, since example
list labels are often long.
Thanks to Bernhard Fisseni for the suggestion.
Previously we computed the column sizes based on the ratio
between the header lines and the text width (as set by `--columns`).
This meant that tables with very short header lines would be
very narrow. With this change, pipe tables with wrapping cells will
always take up the whole text width. The relative column widths
will still be determined by the ratio of header lines, but they
will be normalized to add up to 1.0.
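The normalization step can be sketched as follows. This is a minimal illustration in Python, not pandoc's Haskell code; the function name `normalized_widths` is hypothetical, and it assumes the header line lengths have already been measured.

```python
def normalized_widths(header_line_lengths):
    """Return relative column widths that sum to 1.0.

    Each width keeps the ratio of its header line to the others,
    so the table as a whole fills the text width.
    """
    total = sum(header_line_lengths)
    if total == 0:
        raise ValueError("at least one header line must be non-empty")
    return [n / total for n in header_line_lengths]


# Three columns whose header lines are 3, 3 and 6 characters long.
print(normalized_widths([3, 3, 6]))  # [0.25, 0.25, 0.5]
```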
* Deprecate `--strip-empty-paragraphs` option. Instead we now
use an `empty_paragraphs` extension that can be enabled on
the reader or writer. By default, disabled.
* Add `Ext_empty_paragraphs` constructor to `Extension`.
* Revert "Docx reader: don't strip out empty paragraphs."
This reverts commit d6c58eb836.
* Implement `empty_paragraphs` extension in docx reader and writer,
opendocument writer, html reader and writer.
* Add tests for `empty_paragraphs` extension.
This generates a JSON version of a highlighting style, which can be
saved as a .theme file, modified, and used with `--highlight-style`.
Closes #4106.
Closes #4096.
* In Options.HTMLMathMethod, the KaTeX constructor now takes only
one string (for the KaTeX base URL), rather than two [API change].
* The default URL has been updated to the latest version.
* The autoload script is now loaded by default.
* Options: Added readerStripComments to ReaderOptions.
* Added `--strip-comments` command-line option.
* Made `htmlTag` from the HTML reader sensitive to this feature.
This affects Markdown and Textile input.
Closes #2552.
* Rename --latex-engine to --pdf-engine
* In `Text.Pandoc.Options.WriterOptions`, rename `writerLaTeXEngine` to `writerPdfEngine` and `writerLaTeXArgs` to `writerPdfArgs`.
* Add support for `weasyprint` and `prince`, in addition to `wkhtmltopdf`, for PDF generation via HTML (closes #3906).
* `Text.Pandoc.PDF.html2pdf`: use stdin instead of intermediate HTML file
* Name change OSX -> macOS
fix commit c96b64e
This commit finishes the remaining osx to macOS changes and replaces MacOS with macOS.
The reason for the latter is that macOS is the "correct" casing: Apple styles it to look like iOS, watchOS, tvOS, etc. Unfortunately these all start with a lowercase letter, which makes proper-casing (or even title-casing) odd.
* fix casing of Linux, UNIX, and Windows
Closes #3511.
Previously pandoc used the four-space rule: continuation paragraphs,
sublists, and other block level content had to be indented 4
spaces. Now the indentation required is determined by the
first line of the list item: to be included in the list item,
blocks must be indented to the level of the first non-space
content after the list marker. Exception: if there are 5 or more spaces
after the list marker, then the content is interpreted as an
indented code block, and continuation paragraphs must be indented
two spaces beyond the end of the list marker. See the CommonMark
spec for more details and examples.
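A rough sketch of how the required indentation could be computed from the first line of a list item. This is Python for illustration only, not the reader's actual code; `required_indent` and the simplified marker pattern are hypothetical, and the 5-or-more-spaces case follows the CommonMark rule (marker width plus one space) as an assumption.

```python
import re

# Simplified marker pattern: up to 3 leading spaces, a bullet or an
# ordered marker, then the spaces before the first non-space content.
_MARKER = re.compile(r"^( {0,3})([-+*]|\d+[.)])( +)(?=\S)")


def required_indent(first_line):
    """Column that later blocks must reach to belong to the list item."""
    m = _MARKER.match(first_line)
    if m is None:
        return None  # not a list item
    lead, marker, gap = m.groups()
    if len(gap) >= 5:
        # Assumption (CommonMark): the content is indented code, and only
        # one space after the marker counts towards the item's indent.
        return len(lead) + len(marker) + 1
    return len(lead) + len(marker) + len(gap)


print(required_indent("- a"))       # 2
print(required_indent("1.  item"))  # 4
```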
Documents that adhere to the four-space rule should, in most cases,
be parsed the same way by the new rules. Here are some examples
of texts that will be parsed differently:
- a
  - b
will be parsed as a list item with a sublist; under the four-space
rule, it would be a list with two items.
- a

      code
Here we have an indented code block under the list item, even though it
is only indented six spaces from the margin, because it is four spaces
past the point where a continuation paragraph could begin. With the
four-space rule, this would be a regular paragraph rather than a code
block.
- a

        code
Here the code block will start with two spaces, whereas under
the four-space rule, it would start with `code`. With the four-space
rule, indented code under a list item always must be indented eight
spaces from the margin, while the new rules require only that it
be indented four spaces from the beginning of the first non-space
text after the list marker (here, `a`).
This change was motivated by a slew of bug reports from people
who expected lists to work differently (#3125, #2367, #2575, #2210,
#1990, #1137, #744, #172, #137, #128) and by the growing prevalence
of CommonMark (now used by GitHub, for example).
Users who want to use the old rules can select the `four_space_rule`
extension.
* Added `four_space_rule` extension.
* Added `Ext_four_space_rule` to `Extensions`.
* `Parsing` now exports `gobbleAtMostSpaces`, and the type
of `gobbleSpaces` has been changed so that a `ReaderOptions`
parameter is not needed.
We now allow default output to stdout when it can be
determined that the output is being piped. (On Windows,
as mentioned before, this can't be determined.)
Using '-o -' forces output to stdout regardless.
Previously, for binary formats, output to stdout was disabled
unless we could detect that the output was being piped (and not
sent to the terminal). Unfortunately, such detection is not
possible on Windows, leaving Windows users no way to pipe binary
output. So we have changed the behavior in the following way:
* If the -o option is not used, binary output is never sent
to stdout by default; instead, an error is raised.
* If '-o -' is used, binary output is sent to stdout, regardless
of whether it is being piped. This works on Windows too.
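The two rules in the list above amount to a small decision function. The sketch below is Python for illustration, not pandoc's code; `output_target` is a hypothetical name, and it assumes text formats keep defaulting to stdout.

```python
def output_target(output_option, binary_format):
    """Decide where output goes.

    output_option is the value given to -o, or None if -o was not used.
    """
    if output_option == "-":
        return "stdout"  # forced, even for binary formats
    if output_option is not None:
        return output_option  # a regular file path
    if binary_format:
        # No -o and a binary format: refuse rather than guess.
        raise ValueError("binary output requires -o (use '-o -' for stdout)")
    return "stdout"  # assumption: text formats default to stdout
```

No terminal or pipe detection appears anywhere in this version, which is why it behaves the same on Windows.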
Added TikiWiki reader, including tests and documentation.
It's probably not *complete*, but it works pretty well, handles all
the basics (and some not-so-basics).
* New module Text.Pandoc.Readers.Vimwiki, exporting readVimwiki [API change].
* New input format `vimwiki`.
* New data file, `data/vimwiki.css`, for displaying the HTML produced by this reader and pandoc's HTML writer in the style of vimwiki's own HTML export.
This is now the default for pandoc's Markdown.
It allows whitespace between the two parts of a
reference link: e.g.
[a] [b]
[b]: url
This is now forbidden by default.
Closes #2602.
SEARCHPATH is separated by the usual character,
depending on OS (: on unix, ; on windows).
Note: This does not yet work for PDF output, because the
routine that creates PDFs runs outside PandocMonad.
(This has to do with its use of inTemporaryDirectory and
its interaction with our exceptions.)
The best solution would be to figure out how to move the
PDF creation routines into PandocMonad. Second-best,
just pass an extra parameter in?
See #852.
See #3334.
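For reference, the SEARCHPATH splitting described above can be sketched like this. It is a Python illustration, not the Haskell implementation; `split_search_path` and `find_resource` are hypothetical names, and dropping empty entries is a choice made for this sketch only.

```python
import os


def split_search_path(search_path, sep=os.pathsep):
    """Split SEARCHPATH on the OS separator (':' on unix, ';' on windows)."""
    # Empty entries are dropped here; this is a choice of the sketch.
    return [d for d in search_path.split(sep) if d]


def find_resource(name, search_path, sep=os.pathsep):
    """Return the first existing candidate path, or None if none exists."""
    for directory in split_search_path(search_path, sep):
        candidate = os.path.join(directory, name)
        if os.path.exists(candidate):
            return candidate
    return None
```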
* Add writerSyntaxMap to WriterOptions.
* Highlighting: added parameter for SyntaxMap to highlight.
* Implemented --syntax-definition option.
TODO:
[ ] Figure out whether we want to have the xml parsing
depend on the dtd (it currently does, and fails unless
the language.dtd is found in the same directory).
[ ] Add an option to read a KDE syntax highlighting theme
as a custom style.
[ ] Add tests.
This is enabled by default in pandoc and GitHub markdown but not the
other flavors.
This requires a space between the opening #'s and the header
text in ATX headers (as CommonMark does but many other implementations
do not). This is desirable to avoid falsely capturing things like
#hashtag
or
#5
Closes #3512.
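The effect of requiring the space can be illustrated with a small matcher. This is a Python sketch, not the Markdown reader's parser; `parse_atx_header` is a hypothetical name, and optional closing #'s are deliberately not handled.

```python
import re

# One to six '#' characters, at least one space or tab, then the text.
_ATX = re.compile(r"^(#{1,6})[ \t]+(\S.*)$")


def parse_atx_header(line):
    """Return (level, text) for an ATX header line, or None otherwise."""
    m = _ATX.match(line)
    if m is None:
        return None
    return len(m.group(1)), m.group(2).strip()


print(parse_atx_header("# Title"))   # (1, 'Title')
print(parse_atx_header("#hashtag"))  # None
print(parse_atx_header("#5"))        # None
```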
* Add `--lua-filter` option. This works like `--filter` but takes pathnames of special lua filters and uses the lua interpreter baked into pandoc, so that no external interpreter is needed. Note that lua filters are all applied after regular filters, regardless of their position on the command line.
* Add Text.Pandoc.Lua, exporting `runLuaFilter`. Add `pandoc.lua` to data files.
* Add private module Text.Pandoc.Lua.PandocModule to supply the default lua module.
* Add Tests.Lua to tests.
* Add data/pandoc.lua, the lua module pandoc imports when processing its lua filters.
* Document in MANUAL.txt.
Make clear that the template variable `meta-json` contains neither plain text values nor values in JSON output format, but field values transformed to the selected output format.
* Removed writerEpubStylesheet in WriterOptions.
* Removed `--epub-stylesheet` option.
* Allow `--css` to be used with epub.
* Allow multiple stylesheets to be used.
* Stylesheets will be taken both from `--css` and from
the `stylesheet` metadata field (which can contain either
a file path or a list of them).
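How the two sources might be combined, as a sketch. This is Python for illustration, not the EPUB writer's code; `collect_stylesheets` is a hypothetical name, and placing the `--css` values before the metadata values is an assumption.

```python
def collect_stylesheets(css_options, metadata):
    """Combine --css values with the `stylesheet` metadata field."""
    from_meta = metadata.get("stylesheet", [])
    if isinstance(from_meta, str):
        from_meta = [from_meta]  # the field held a single file path
    # Assumption: command-line stylesheets come first.
    return list(css_options) + list(from_meta)


print(collect_stylesheets(["a.css"], {"stylesheet": "b.css"}))
# ['a.css', 'b.css']
```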
Closes #3472, #847.
You can leave an external link as it is by adding the attribute
data-external="1" to the element. Pandoc will then not try to
incorporate its content when `--self-contained` is used. This is
similar to a feature already supported by the EPUB writer.
Closes #2656.
Now in reveal.js, an image with class `stretch` in a paragraph
by itself will stretch to fill the whole screen, with no
caption or figure environment.
Closes #1291.
These were confusing.
Now we rely on the +raw_tex or +raw_html extension with latex
or html input.
Thus, instead of
--parse-raw -f latex
we use
-f latex+raw_tex
and instead of
--parse-raw -f html
we use
-f html+raw_html
This commit enables users to specify the User-Agent
header used when pandoc requests a document from
a URL. This is done by setting an environment variable.
For instance, one can do:
USER_AGENT="..." ./pandoc -f html -t markdown http://example.com
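The same idea, sketched in Python with the standard library rather than pandoc's Haskell HTTP code: read USER_AGENT from the environment and, if it is set, attach it to the request. `build_request` is a hypothetical helper used only for illustration.

```python
import os
import urllib.request


def build_request(url, environ=os.environ):
    """Create a request, adding a User-Agent header if USER_AGENT is set."""
    headers = {}
    user_agent = environ.get("USER_AGENT")
    if user_agent:
        headers["User-Agent"] = user_agent
    return urllib.request.Request(url, headers=headers)


# No network access happens here; the request is only constructed.
req = build_request("http://example.com", {"USER_AGENT": "demo/1.0"})
print(req.get_header("User-agent"))  # demo/1.0
```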
Signed-off-by: Thenaesh Elango <thenaeshelango@gmail.com>
Removed writerDocbookVersion in WriterOptions.
Renamed default.docbook template to default.docbook4.
Allow docbook4 as an output format.
But alias docbook = docbook4.
Thus, to "unsmartify" something that has been parsed as
smart by pandoc, you can use `-t markdown+smart`, and
straight quotes will be produced instead of curly quotes,
etc.
Example:
% pandoc -f latex -t markdown+smart
``hi''---ok
^D
"hi"---ok
Now you will need to do
-f markdown+smart
instead of
-f markdown --smart
This change opens the way for writers, in addition to readers,
to be sensitive to +smart, but this change hasn't yet been made.
API change. Command-line option change.
Updated manual.
* Removed normalize, normalizeInlines, normalizeBlocks
from Text.Pandoc.Shared. These shouldn't now be necessary,
since normalization is handled automatically by the Builder
monoid instance.
* Remove `--normalize` command-line option.
* Don't use normalize in tests.
* A few revisions to readers so they work well without normalize.
* Text.Pandoc.Options.WriterOptions: removed writerReferenceDocx
and writerReferenceODT, replaced them with writerReferenceDoc.
This can hold either an ODT or a Docx. In this way, writerReferenceDoc
is like writerTemplate, which can hold templates of different
formats. [API change]
* Removed `--reference-docx` and `--reference-odt` options.
* Added `--reference-doc` option.
* Add '@*' usage
Added to nocite section to add the wildcard functionality.
* Restoring accented characters.
Removed trailing whitespace in new text too.
* Removing unneeded single quotes.
* Fix spelling typos:
* hightlight
* respecitively
* codeblock – inconsistent with rest of document using “code block”
* Use consistent case for proper nouns.
For example: “ASCII”, “Unicode”, “Latin”, “JavaScript”, “CSS”.
The "default" option is no longer represented as `Nothing` but via a new
type constructor, making the `Maybe` wrapper superfluous.
The default behavior of using heuristics can now be enabled explicitly
by setting `--top-level-division=default`.
API change (`Text.Pandoc.Options`): The `Division` type was renamed to
`TopLevelDivision`. The `Section`, `Chapter`, and `Part` constructors
were renamed to `TopLevelSection`, `TopLevelChapter`, and
`TopLevelPart`, respectively. An additional `TopLevelDefault`
constructor was added, which is now also the new default value of the
`writerTopLevelDivision` field in `WriterOptions`.
- We now first treat the argument of `--filter` as a full (absolute
or relative) path, looking for a program there. If it's found,
we run it.
- If not, and if it is a simple program name or a relative path,
we try resolving it relative to `$DATADIR/filters`.
- If this fails, then we treat it as a program name and look in
the user's PATH.
Previously if you did `--filter foo` and you had `foo` in your
path and also an executable `foo` in your working directory,
the one in the path would be used. Now the one in the working
directory is used.
In addition, when you do `--filter foo/bar.hs`, pandoc will now
find a filter `$DATADIR/filters/foo/bar.hs` -- assuming there
isn't a `foo/bar.hs` relative to the working directory.
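The lookup order can be sketched as below. This is Python for illustration only, not pandoc's implementation; `resolve_filter` is a hypothetical name, `datadir` stands in for `$DATADIR`, and the check that the file is executable is omitted.

```python
import os
import shutil


def resolve_filter(name, datadir):
    """Return the path of the filter that would be run, or None."""
    # 1. Treat the argument as a full (absolute or relative) path.
    if os.path.isfile(name):
        return name
    # 2. Simple names and relative paths: try $DATADIR/filters.
    if not os.path.isabs(name):
        candidate = os.path.join(datadir, "filters", name)
        if os.path.isfile(candidate):
            return candidate
    # 3. Otherwise treat it as a program name and search the user's PATH.
    return shutil.which(name)
```

With this order, `foo/bar.hs` resolves to `$DATADIR/filters/foo/bar.hs` only when no `foo/bar.hs` exists relative to the working directory.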
@jkr note the slight revision of what we had before.
This was motivated by the idea that one might clone filter
repositories into the filters subdirectory; it is nice to
be able to run them as `reponame/filtername`.
* Markdown reader: modify bracketedSpan to check small caps
* MANUAL.txt: add description of the use of `bracketed_spans` for small caps
* Improve markdown readers: make bracketedSpan behave EXACTLY like spanHtml
Added `--list-input-formats`, `--list-output-formats`,
`--list-extensions`, `--list-highlight-languages`,
`--list-highlight-styles`.
Removed list of highlighting languages from `--version`
output.
Removed list of input and output formats from default
`--help` output.
Closes #3173.
This is needed because github flavored Markdown has a slightly
different set of escapable symbols than original Markdown;
it includes angle brackets.
Closes #2846.
The `--chapters` option is replaced with `--top-level-division`, which lets
users specify the division type used for top-level headers. Possible
values are `section` (the default), `chapter`, or `part`.
The formats LaTeX, ConTeXt, and Docbook allow `part` as top-level division; TEI
only allows setting the `type` attribute on `div` containers. The writers are
altered to respect this option in a sensible way.
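For the LaTeX case, the mapping from header level to sectioning command might look roughly like this. It is a Python sketch, not the writer's Haskell code; `latex_command` and the list of command names are assumptions made for illustration.

```python
# Sectioning commands from largest to smallest (assumed order).
_LATEX_LEVELS = [
    "part", "chapter", "section", "subsection",
    "subsubsection", "paragraph", "subparagraph",
]

# Index into _LATEX_LEVELS used for a level-1 header.
_START = {"part": 0, "chapter": 1, "section": 2}


def latex_command(header_level, top_level="section"):
    """Return the sectioning command name for a header level (1-based)."""
    index = _START[top_level] + header_level - 1
    if index >= len(_LATEX_LEVELS):
        return None  # deeper than LaTeX's sectioning commands go
    return _LATEX_LEVELS[index]


print(latex_command(1, "part"))     # part
print(latex_command(2, "part"))     # chapter
print(latex_command(1, "section"))  # section
```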
Add --parts command line argument.
This only affects the LaTeX writer, and only for non-beamer output formats.
It changes the output levels so the top level is 'part', the next
'chapter' and then into sections.