Commit graph

2688 commits

Author SHA1 Message Date
John MacFarlane
b3b40546cb HTML reader: Fix performance issue with malformed HTML tables.
We let a `</table>` tag close an open `<tr>` or `<td>`.
Closes #1167.
2014-06-20 10:47:29 -07:00
John MacFarlane
cab4b829b3 Support --trace in HTML reader. 2014-06-20 10:39:24 -07:00
John MacFarlane
12efffa85a LaTeX writer: Fixed strikeout + highlighted code. Closes #1294.
Previously strikeout highlighted code caused an error.
2014-06-20 10:24:30 -07:00
Jesse Rosenthal
f6ae644831 Make strNormalize go bottomUp.
This was how it used to be before it was folded into blockNormalize.
2014-06-20 12:31:36 -04:00
Jesse Rosenthal
2aa5f58c5b Docx reader: Add a comment explaining strNormalize
`normalize` from Text.Pandoc.Shared is more general. In tests, though,
it more than doubles the run time. `strNormalize` does less, but it does
what we need. This comment is added for future maintainability.
2014-06-20 10:27:18 -04:00
Jesse Rosenthal
03af19a7e1 Docx Reader: Normalize DefinitionLists
Previously DefinitionList had been left out of `blockNormalize`. Now it
is included.
2014-06-20 10:20:37 -04:00
Jesse Rosenthal
3da515bdb0 Docx reader: simplify blockNormalize
Use a function `stripSpaces` instead of recursion. This makes it a bit
easier to read and maintain, and simplifies normalizing DefinitionList,
which was left out the first time.
2014-06-20 10:12:28 -04:00
Jesse Rosenthal
7fd48b30e0 Docx reader: Fix hdr handling in block norm
`blockNormalize` previously forgot to account for the case in which a
Header's inlines did not start with a space.
2014-06-20 09:30:30 -04:00
John MacFarlane
557b302731 Docx writer: Use Compact style for empty table cells.
Otherwise we get overly tall lines when there are empty
table cells and the other cells are compact.

Closes #1353.
2014-06-19 23:31:17 -07:00
John MacFarlane
3c059dbe60 HTML reader: Allow space between <col> and </col>.
Test case:

```
<table border="1">
  <colgroup>
    <col> </col>
    <col></col>
  </colgroup>
  <tbody>
    <tr>
        <td>X</td>
        <td>Y</td>
    </tr>
    <tr>
        <td>1</td>
        <td>2</td>
    </tr>
  </tbody>
</table>

```
2014-06-19 23:24:28 -07:00
John MacFarlane
0d8e0e5674 Merge pull request #1354 from jkr/literalTab
Parse literal tabs in docx
2014-06-19 22:47:32 -07:00
Jesse Rosenthal
a934db9a32 Introduce blockNormalize
This will help take care of spaces introduced at the beginning of strings.
2014-06-19 19:28:55 -04:00
Jesse Rosenthal
0e7d2dbd43 Have Docx reader properly interpret tabs. 2014-06-19 17:55:02 -04:00
Jesse Rosenthal
86fc44d6b3 Add literal tabs to parser. 2014-06-19 17:53:52 -04:00
John MacFarlane
5cb53a48d5 ImageSize: ignore unknown exif header tag rather than crashing.
Some images seem to have tag type of 256, which was causing
a runtime error.
2014-06-19 14:30:03 -07:00
John MacFarlane
00281559bf Haddock writer: Use _____ for hrule.
Avoids interpretation as list.
2014-06-19 00:28:23 -07:00
John MacFarlane
de7b3a3d08 Haddock writer: Only use Decimal list style. 2014-06-18 18:11:01 -07:00
John MacFarlane
c4182b39ca Small fix to haddock "tables". 2014-06-18 18:08:41 -07:00
John MacFarlane
ff6a2baeb9 More polish on Haddock reader/writer. 2014-06-18 17:49:59 -07:00
John MacFarlane
35e57db5c2 Finished first draft of Haddock writer. 2014-06-18 17:09:36 -07:00
John MacFarlane
9fc5c8d7af Rewrote haddock reader to use haddock-library.
This brings pandoc's rendering of haddock markup in line
with the new haddock.

Note that we preserve line breaks in `@` code blocks, unlike
the earlier version.

Modified tests pass.  More tests would be good.
2014-06-18 14:18:55 -07:00
John MacFarlane
ab390a10ec Removed old haddock reader code. Add dependency on haddock-library.
This also removes the dependency on alex and happy.
2014-06-18 11:33:09 -07:00
John MacFarlane
b371e83d73 Highlighting: Let .numberLines work even if no language given.
Closes #1287, jgm/highlighting-kate#40.
2014-06-17 15:15:56 -07:00
John MacFarlane
59272e4d99 DocBook reader: Support <?asciidoc-br?>.
Closes #1236.

Note, this is a bit of a kludge, to work around the fact that xml-light
doesn't parse `<?asciidoc-br?>` correctly.  We preprocess the input,
replacing that instruction with `<br/>`, and then parse that as a line
break.  Other XML instructions are simply removed from the input stream.
2014-06-17 12:14:02 -07:00
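The preprocessing step described in 59272e4d99 can be sketched roughly as follows. This is a minimal illustration on plain strings, not the reader's actual code: the names `preprocessInstructions` and `skipInstruction` are hypothetical, and it assumes every processing instruction ends with `?>`.

```haskell
import Data.List (stripPrefix)

-- Hypothetical helper (not pandoc's real function name).
-- Replaces <?asciidoc-br?> with <br/> and drops every other
-- processing instruction of the form <? ... ?>.
preprocessInstructions :: String -> String
preprocessInstructions s
  | Just rest <- stripPrefix "<?asciidoc-br?>" s =
      "<br/>" ++ preprocessInstructions rest
  | Just rest <- stripPrefix "<?" s =
      preprocessInstructions (skipInstruction rest)
preprocessInstructions (c:cs) = c : preprocessInstructions cs
preprocessInstructions []     = []

-- Skip input up to and including the closing "?>".
-- An unterminated instruction consumes the rest of the input.
skipInstruction :: String -> String
skipInstruction ('?':'>':rest) = rest
skipInstruction (_:rest)       = skipInstruction rest
skipInstruction []             = []
```

The resulting `<br/>` element can then be handled by the ordinary element parser as a line break.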
John MacFarlane
fc291efad3 LaTeX reader: Correctly handle table rows with too few cells.
LaTeX seems to treat them as if they have empty cells at the
end.  Closes #241.
2014-06-17 00:38:55 -07:00
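The behaviour described in fc291efad3, treating a short row as if it had empty cells at the end, amounts to padding each row to the column count. A minimal sketch, assuming cells are modelled as plain strings and using the hypothetical name `padRow` (the reader itself works on pandoc's block types):

```haskell
-- Hypothetical helper: pad a row with empty cells up to numCols.
-- Rows that already have numCols or more cells are returned unchanged,
-- because replicate yields [] for a non-positive count.
padRow :: Int -> [String] -> [String]
padRow numCols cells = cells ++ replicate (numCols - length cells) ""
```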
John MacFarlane
7d60c798bf Fixed compiler warning. 2014-06-16 23:02:20 -07:00
John MacFarlane
bbe99003f8 Naming: Use Docx instead of DocX.
For consistency with the existing writer.
2014-06-16 22:44:40 -07:00
John MacFarlane
bec9f3c641 Merge branch 'docx' of https://github.com/jkr/pandoc into jkr-docx 2014-06-16 22:16:45 -07:00
John MacFarlane
78ee2416d1 Org reader: make tildes create inline code.
Closes #1345.  Also relabeled 'code' and 'verbatim' parsers
to accord with the org-mode manual.

I'm not sure what the distinction between code and verbatim
is supposed to be, but I'm pretty sure both should be represented
as Code inlines in pandoc.  The previous behavior resulted in the
text not appearing in any output format.
2014-06-16 22:03:26 -07:00
John MacFarlane
f9b97e6bfb Small improvement to fix to #1333.
This allows blank lines at end of multiline headers.
2014-06-16 21:26:50 -07:00
John MacFarlane
9da5d8955e Markdown reader: fixed #1333 (table parsing bug). 2014-06-16 21:18:24 -07:00
John MacFarlane
87c08be58f LaTeX reader: handle leading/trailing spaces in emph better.
`\emph{ hi }` gets parsed as `[Space, Emph [Str "hi"], Space]`
so that we don't get things like `* hi *` in markdown output.

Also applies to textbf and some other constructions.

Closes #1146.  (`--normalize` isn't touched by this, but
normalization should not generally be necessary with the
changes to the readers.)
2014-06-16 19:18:33 -07:00
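The space handling described in 87c08be58f can be illustrated with a small stand-alone model. The `Inline` type below is a simplified stand-in for pandoc's own type, and `hoistSpaces` is a hypothetical name; this is a sketch of the idea, not the reader's implementation.

```haskell
-- Simplified stand-in for pandoc's Inline type (assumption).
data Inline = Str String | Space | Emph [Inline]
  deriving (Eq, Show)

-- Hypothetical helper: wrap the inlines with the given constructor,
-- but move leading and trailing Space elements outside the wrapper.
hoistSpaces :: ([Inline] -> Inline) -> [Inline] -> [Inline]
hoistSpaces wrap ils = leading ++ wrapped ++ trailing
  where
    (leading, rest)      = span (== Space) ils
    (trailing, coreRev)  = span (== Space) (reverse rest)
    core                 = reverse coreRev
    -- Emit no wrapper at all if nothing but spaces was inside.
    wrapped              = if null core then [] else [wrap core]
```

With this model, `hoistSpaces Emph [Space, Str "hi", Space]` yields `[Space, Emph [Str "hi"], Space]`, matching the shape quoted in the commit message.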
John MacFarlane
459805de4c LaTeX reader: don't assume preamble doesn't contain environments.
Closes #1338.
2014-06-16 17:43:56 -07:00
John MacFarlane
31fd843133 HTML reader: Fixed major parsing problem with HTML tables.
Table cells were being combined into one cell.  Closes #1341.
2014-06-16 15:45:20 -07:00
John MacFarlane
2b364b34bb Merge pull request #1344 from mpickering/master
Moved extractSpaces to Shared.hs
2014-06-16 14:43:43 -07:00
John MacFarlane
01ef573ac2 Org reader: fixed #1342.
This change rewrites `inlineLaTeXCommand` so that parsec will
know when input is being consumed.  Previously a run-time
error would be produced with some input involving raw latex.
(I believe this does not affect the last release, as the inline
latex reading was added recently.)
2014-06-16 14:18:06 -07:00
mpickering
7807564d44 Moved extractSpaces to Shared.hs
Generalised and move the extractSpaces function from `HTML.hs` to
`Shared.hs` so that the docx reader can also use it.
2014-06-16 20:45:54 +01:00
mpickering
3bc818d2d3 Integrated the docx reader into the main pandoc program.
Changes also include generalising the types of reader allowed. The
mechanism now mimics the more general output mechanism.
2014-06-16 07:18:40 -04:00
Jesse Rosenthal
293e4cfdc3 Add DocX files to tree.
This introduces Text.Pandoc.DocX, and its exported `readDocX` function.
2014-06-16 07:18:34 -04:00
James Aspnes
abbf33ae7d allow (and discard) optional argument for \caption 2014-06-12 21:19:00 -04:00
John MacFarlane
9681574661 LaTeX reader: Handle comments at the end of tables.
This resolves the issue illustrated in
http://stackoverflow.com/questions/24009489/comments-in-latex-break-pandoc-table.
2014-06-03 23:17:42 -07:00
John MacFarlane
ab5dda7a60 Markdown writer: Prettier pipe tables.
Columns are now aligned.  Closes #1323.
2014-06-03 23:17:03 -07:00
John MacFarlane
45f3851611 Docx writer: Section numbering carries over from reference.docx.
Closes #1305.
2014-06-03 16:46:55 -07:00
John MacFarlane
0ddb4cd2e8 Docx writer: Combine reference.docx numbering with pandoc's.
This should have fixed #1305, allowing the reference.docx to define
section numbering, but it doesn't.  Now the headings appear with proper
indentation, but the numbers don't appear.  Unclear why.  styles.xml and
numbering.xml basically match the docx which has the expected result.
2014-06-03 13:14:32 -07:00
John MacFarlane
ec047aaa8c Docx writer: pandoc uses only numIds >= 1000 for lists.
This opens up the possibility (with further code changes) of
preserving some numbering from the reference.docx (e.g. header
numbering.)  See #1305.
2014-06-03 12:13:31 -07:00
John MacFarlane
2842ad5a97 Docx writer: Changed abstractNumId numbering scheme.
Now the minimum id used by pandoc is 990.  All ids start with "99".
This gives some room for a reference.docx to define numbering styles.
Note:  this is not yet possible, since pandoc generates numbering.xml
entirely on its own.
2014-06-03 11:33:09 -07:00
John MacFarlane
05355ac57b Docx writer: Simplified abstractNumId numbering.
Instead of sequential numbering, we assign numbers based on the
list marker styles.  This simplifies some of the code and should
make it easier to modify numbering in the future.
2014-06-03 11:03:40 -07:00
John MacFarlane
9b4e772718 Templates: use ordNub instead of nub.
Closes #1022.
2014-06-03 11:01:23 -07:00
John MacFarlane
2a627f85fe Shared: Added ordNub.
API change (adds export).
2014-06-03 11:00:54 -07:00
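For readers unfamiliar with the name, an `ordNub` is conventionally an order-preserving duplicate filter that uses an `Ord` constraint and a set instead of the quadratic `Eq`-based `nub`. The following is a common way to write one, shown as a sketch; it is assumed, not checked, to match the definition added to Shared.

```haskell
import qualified Data.Set as Set

-- Remove duplicates, keeping the first occurrence of each element.
-- Runs in O(n log n) by tracking the elements seen so far in a Set.
ordNub :: Ord a => [a] -> [a]
ordNub = go Set.empty
  where
    go _ [] = []
    go seen (x:xs)
      | x `Set.member` seen = go seen xs
      | otherwise           = x : go (Set.insert x seen) xs
```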
John MacFarlane
cbfde5cb50 Docx writer: Create overrides per-image for media/ in ref docx.
This should be somewhat more robust and cover more types
of images.
2014-06-02 20:39:27 -07:00
John MacFarlane
326d7fa8f8 Docx writer: Improved entryFromArchive to avoid parse.
No need to parse the XML if we're just going to render it
right away!
2014-06-02 20:20:16 -07:00
John MacFarlane
bf915da6cd Docx writer: Make images work in reference.docx headers/footers.
* All media from reference.docx are copied into result.
* Added defaults for common image types to [Content Types].
* Avoided redundant XML parse + write for entries taken over from
  reference.docx, for better performance.
2014-06-02 20:07:41 -07:00
John MacFarlane
e1cf47efa0 Templates: Fail informatively on template syntax errors.
With the move from parsec to attoparsec, we lost good error
reporting.  In fact, since we weren't testing for end of input,
malformed templates would fail silently.  Here we revert back to
Parsec for better error messages.
2014-06-01 23:45:05 -07:00
John MacFarlane
7242165bed Docx writer: Improved handling of headers/footers. 2014-06-01 22:29:13 -07:00
John MacFarlane
6848f642e8 Docx writer: Header and footer are now carried over from reference.docx. 2014-06-01 21:17:00 -07:00
John MacFarlane
6327ccf523 Minor code reformat. 2014-06-01 15:29:27 -07:00
John MacFarlane
23a9b800a3 Docx writer: Take over document formatting from reference.docx.
This includes margins, page size, page orientation.
2014-05-31 22:02:33 -07:00
John MacFarlane
9cf5f74e8f PDF writer: Fixed treatment of data uris for images.
Closes #1062.
2014-05-28 10:41:40 -07:00
John MacFarlane
e656658af8 Merge pull request #1302 from tarleb/inline-latex
Org reader: support for inline LaTeX
2014-05-28 09:26:48 -07:00
John MacFarlane
e3ddc371de Markdown reader: Handle c++ and objective-c as language identifiers
in github-style fenced blocks.  Closes #1318.

Note:  This is special-case handling of these two cases.
It would be good to do something more systematic.
2014-05-27 12:44:39 -07:00
John MacFarlane
2e80613451 Markdown reader: inline math must have nonspace before final $.
Closes #1313.
2014-05-27 11:59:28 -07:00
Albert Krewinkel
3238a2f919 Org reader: support for inline LaTeX
Inline LaTeX is now accepted and parsed by the org-mode reader.  Both
math symbols (like \tau) and LaTeX commands (like \cite{Coffee}) can be
used without any further escaping.
2014-05-20 22:29:21 +02:00
John MacFarlane
3c77ab98bf EPUB writer: Handle multiple dates with OPF event attributes.
Note: in EPUB3 we can have only one dc:date, so only the first
one is used.
2014-05-19 13:25:44 -07:00
John MacFarlane
8d04c821aa Avoid import Prelude hiding (catch).
See #1309.
2014-05-19 09:45:00 -07:00
John MacFarlane
ee8c8da8cc Removed dependency on conduit.
* http-conduit flag is now https.
* Instead of http-conduit, we depend on http-client and http-client-tls.
2014-05-18 22:07:00 -07:00
John MacFarlane
c5c9b0d289 EPUB writer: Fixed regression on cover image.
In 1.12.4 and 1.12.4.2, the cover image would not appear properly,
because the metadata id was not correct.

This was introduced by the fix to #1254.

Now we derive the id from the actual cover image filename,
which we preserve rather than using "cover-image."
2014-05-15 10:11:48 -07:00
John MacFarlane
60b8b85040 Merge pull request #1293 from tarleb/typo
Process: Fix minor typo in pipeProcess' docs
2014-05-14 06:41:04 -07:00
John MacFarlane
b5959b2007 Merge pull request #1297 from tarleb/citations
Org reader: support Pandoc's citation extension
2014-05-14 06:37:29 -07:00
Albert Krewinkel
ceeb701c25 Org reader: support Pandoc's citation extension
Citations are defined via the "normal citation" syntax used in markdown,
with the sole difference that newlines are not allowed between "[...]".
This is for consistency, as org-mode generally disallows newlines
between square brackets.

The extension is turned on by default and can be turned off via the
default syntax-extension mechanism, i.e. by specifying "org-citation" as
the input format.
Move `citeKey` from Readers.Markdown into Parsing

The function can be used by other readers, so it is made accessible for
all parsers.
2014-05-14 15:00:26 +02:00
Albert Krewinkel
2423f9e6b1 Move citeKey from Readers.Markdown to Parsing
The function can be used by other readers, so it is made accessible for
all parsers.
2014-05-14 14:58:05 +02:00
Albert Krewinkel
9df589b9c5 Introduce class HasLastStrPosition, generalize functions
Both `ParserState` and `OrgParserState` keep track of the parser position at
which the last string ended.  This patch introduces a new class
`HasLastStrPosition` and makes the above types instances of that class.  This
enables the generalization of functions updating the state or checking if one
is right after a string.
2014-05-14 14:57:00 +02:00
John MacFarlane
aa019448d6 LaTeX reader: Support \addbibresource. 2014-05-12 13:06:06 -07:00
John MacFarlane
2348f07b11 Shared addMetaField: if old and new values both lists, concatenate. 2014-05-12 13:05:42 -07:00
John MacFarlane
a8319d1339 LaTeX reader: set bibliography in metadata from \bibliography cmd. 2014-05-11 22:52:29 -07:00
Albert Krewinkel
113a32daa8 Process: Fix minor typo in pipeProcess' docs
Replace fullstop with comma, adjust capitalisation.
2014-05-11 15:07:01 +02:00
John MacFarlane
0092606476 LaTeX reader: Don't error on "%foo" with no newline. 2014-05-10 23:26:32 -07:00
Albert Krewinkel
c5fd631b55 Org reader: Fix block parameter reader, relax constraints
The reader produced wrong results for blocks containing non-letter chars
in their parameter arguments.  This patch relaxes constraints in that it
allows block header arguments to contain any non-space character (except
for ']' for inline blocks).

Thanks to Xiao Hanyu for noticing this.
2014-05-10 11:35:54 +02:00
John MacFarlane
884693fea8 Merge pull request #1288 from tarleb/update-copyright
Update copyright notices for 2014, add missing notices
2014-05-09 09:53:06 -07:00
Albert Krewinkel
07694b3018 Org reader: Fix parsing of blank lines within blocks
Blank lines were parsed as two newlines instead of just one.
Thanks to Xiao Hanyu (@xiaohanyu) for pointing this out.
2014-05-09 18:23:23 +02:00
Albert Krewinkel
757c4f68f3 Org reader: Support arguments for code blocks
The general form of source block headers
(`#+BEGIN_SRC <language> <switches> <header arguments>`) was not
recognized by the reader.  This patch adds support for the above form,
adds header arguments to the block's key-value pairs and marks the block
as a rundoc block if header arguments are present.

This closes #1286.
2014-05-09 18:08:30 +02:00
Albert Krewinkel
7760504bb2 Org reader: refactor #+BEGIN..#+END block parsing code 2014-05-09 10:53:08 +02:00
Albert Krewinkel
8fdbef841d Update copyright notices for 2014, add missing notices 2014-05-09 00:46:08 +02:00
mpickering
f0f88111e6 Small improvement to textile reader fix. Removed 'try'. 2014-05-07 09:48:48 -07:00
mpickering
0050b50905 Fix textile reader hanging.
Textile reader hung on

    pandoc -f textile http://johnmacfarlane.net/pandoc/demo/example25.textile

The reader no longer hangs.
2014-05-07 09:32:25 -07:00
John MacFarlane
84f2336a7d Textile reader: Rearranged inline parsers for performance.
This is possible because of the rewrite of simpleInline.
Also removed a redundant parser for grouped inlines.
2014-05-06 23:41:56 -07:00
John MacFarlane
442eecc15c Textile reader: Rewrote simpleInline for clarity and efficiency.
This way we only look once for the opening `[`.
2014-05-06 23:27:16 -07:00
John MacFarlane
ea4e947bd0 Textile reader: Disallow blank lines in inline contexts.
@hi

    there@

should not be a single code span.
2014-05-06 23:16:47 -07:00
John MacFarlane
d6a9ba1cdc Make --trace work with textile reader. 2014-05-06 22:28:11 -07:00
John MacFarlane
10644607e3 Textile reader: Rewrote some inline parsing code for clarity.
(It seems clearer to put the whitespace parsing in the grouped
parser.  This also uses stateLastStrPos to determine when the
border is adjacent to an alphanumeric.)
2014-05-06 22:14:35 -07:00
Albert Krewinkel
71bd4fb2b3 Org reader: Read inline code blocks
Org's inline code blocks take forms like `src_haskell(print "hi")` and
are frequently used to include results from computations called from
within the document.  The blocks are read as inline code and marked with
the special class `rundoc-block`.  Proper handling and execution of
these blocks is the subject of a separate library, rundoc, which is
work in progress.

This closes #1278.
2014-05-06 13:21:26 +02:00
John MacFarlane
dbd6c1540f Fixed the fix to #1154.
We need to strip off up to 4 spaces, not up to 3.
2014-05-04 16:21:18 -07:00
John MacFarlane
51aa304834 LaTeX writer: Fixed inconsistencies with reference escaping.
- toLabel is now monadic, and it does the needed string escaping.
- Closes #1130.
2014-05-04 14:43:05 -07:00
John MacFarlane
0c7e084342 Docx writer: Fall back on distribution reference.docx.
* Undid changes to parseXml in last commit.
* Instead of a string fallback, we have parseXml fall back
  on the reference.docx that comes with pandoc if the user's
  reference.docx does not contain a needed file.
* Closes #1185.
2014-05-04 10:54:45 -07:00
John MacFarlane
d728715981 Docx writer: Added ability to give fallback in parseXml. 2014-05-04 10:45:20 -07:00
John MacFarlane
3e42f08e87 Markdown reader: Fixed bug with unwanted code in lists.
Closes #1154.

When reading a raw list item, we now strip off nonindent
spaces.
2014-05-04 08:07:17 -07:00
John MacFarlane
96c0c950ca AsciiDoc writer: Handle multiblock table cells.
Closes #1246.
2014-05-03 21:31:53 -07:00
John MacFarlane
fde52c25a6 AsciiDoc writer: Correctly handle empty table cells.
Closes #1245.
2014-05-03 21:08:45 -07:00
John MacFarlane
abd3a039b9 DocBook writer: Small tweaks to last commit.
* Use isTightList from Shared.
* Adjust writer test, since isTightList is a bit different from what
  was used before.

Closes #1250.
2014-05-03 20:45:38 -07:00
Neil Mayhew
ccbf4fc9c2 Distinguish tight and loose lists in DocBook output
Determined by the first block of the first item being Plain.
2014-05-03 18:37:02 -07:00
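The rule stated in ccbf4fc9c2 (a list is tight when the first block of its first item is `Plain`) can be sketched as below. The `Block` type is a simplified stand-in for pandoc's own, and `firstItemIsPlain` is a hypothetical name; the follow-up commit abd3a039b9 switches to `isTightList` from Shared, which it notes behaves a bit differently.

```haskell
-- Simplified stand-in for pandoc's Block type (assumption).
data Block = Plain String | Para String
  deriving (Eq, Show)

-- Hypothetical helper: a list, given as a list of items where each
-- item is a list of blocks, counts as tight if its first item
-- starts with a Plain block.
firstItemIsPlain :: [[Block]] -> Bool
firstItemIsPlain ((Plain _ : _) : _) = True
firstItemIsPlain _                   = False
```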
John MacFarlane
2ba7873086 LaTeX reader: Fixed regression introduced with last commit.
Tests now pass again.
2014-05-03 18:34:23 -07:00