Commit graph

1094 commits

Author SHA1 Message Date
Jesse Rosenthal
9614ddfedc Docx reader: Remove unnecessary filter in Parse.
mapMaybe does the filtering for us.
2014-06-25 11:00:15 -04:00
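A small illustration of why the filter was redundant: `Data.Maybe.mapMaybe` applies a function and drops the `Nothing` results in the same pass. This is a sketch only. `parseEven` is a hypothetical stand-in for a per-element parser, not code from the Docx reader.

```haskell
import Data.Maybe (isJust, mapMaybe)

-- Hypothetical per-element parser: succeeds only on even numbers.
parseEven :: Int -> Maybe Int
parseEven n
  | even n    = Just (n * 10)
  | otherwise = Nothing

-- Redundant: filters out failures first, then mapMaybe drops them anyway.
withFilter :: [Int] -> [Int]
withFilter = mapMaybe parseEven . filter (isJust . parseEven)

-- Equivalent result: mapMaybe already discards the Nothing values.
withoutFilter :: [Int] -> [Int]
withoutFilter = mapMaybe parseEven
```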
Jesse Rosenthal
ed44e4ca8c Docx reader: Add rudimentary track changes support.
This will only read the insertions, and ignore the deletions.
2014-06-25 10:38:01 -04:00
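A minimal sketch of the policy described above (keep insertions, drop deletions), using simplified stand-in types. The constructor names and the author/date fields are assumptions for illustration; the reader's actual types are not shown in this log, which only says that dates are plain strings for now.

```haskell
-- Simplified stand-in for a text run.
newtype Run = Run String deriving (Show, Eq)

-- Hypothetical paragraph parts; author and date are kept as plain strings.
data ParPart
  = PlainRun Run
  | Insertion String String [Run] -- author, date, inserted runs
  | Deletion String String [Run]  -- author, date, deleted runs
  deriving (Show, Eq)

-- Keep ordinary and inserted text; drop deleted text entirely.
acceptInsertions :: [ParPart] -> [Run]
acceptInsertions = concatMap go
  where
    go (PlainRun r)         = [r]
    go (Insertion _ _ runs) = runs
    go (Deletion _ _ _)     = []

-- Flatten runs to a string for inspection.
runText :: [Run] -> String
runText rs = concat [s | Run s <- rs]
```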
Jesse Rosenthal
38e1d3e95b Docx reader: Parse Insertions and Deletions.
This is just for the Parse module, reading it into the Docx format. It
still has to be translated into pandoc.
2014-06-25 10:32:48 -04:00
Jesse Rosenthal
c343f1a90b Docx Reader: Add change types
Insertion and deletion. Dates are just strings for now.
2014-06-25 08:10:19 -04:00
Jesse Rosenthal
69743cd598 Docx reader: Ignore zero (or negative) indent
If a block has an indentation less than or equal to zero, it should not be
treated as a block quote.
2014-06-24 15:06:25 -04:00
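The rule can be sketched as a guard on the indentation value. `Block`, `applyIndent`, and the `Maybe Int` representation of the indent are hypothetical simplifications, not the reader's code.

```haskell
-- Simplified stand-in for pandoc's Block type.
data Block = Para String | BlockQuote [Block] deriving (Show, Eq)

-- Wrap a block in a block quote only when its indent is strictly positive;
-- zero, negative, or missing indentation leaves the block unchanged.
applyIndent :: Maybe Int -> Block -> Block
applyIndent (Just n) b | n > 0 = BlockQuote [b]
applyIndent _ b = b
```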
Jesse Rosenthal
a8866bc121 Docx reader: remove T.P.Generic import.
This marks the removal of the final tree-walk in the code. (Though there
is still one in the Lists module.)
2014-06-24 12:15:26 -04:00
Jesse Rosenthal
5ae6b8c6f1 Docx reader: pass definition test.
This commit also fixes a problem with the previous code pushes, which
wouldn't allow code blocks to share a div.
2014-06-24 12:12:02 -04:00
Jesse Rosenthal
bebea5e936 Docx reader: pass code tests. 2014-06-24 10:34:07 -04:00
Jesse Rosenthal
08633fad33 Add copyright block to T.P.R.Docx.Reducible. 2014-06-23 20:26:08 -04:00
John MacFarlane
ac6756009f Merge pull request #1366 from jkr/reducible3
Docx rewrite and cleanup (in terms of Reducible typeclass)
2014-06-23 14:33:38 -07:00
Jesse Rosenthal
11b0778744 Use Reducible in docx reader.
This cleans up the implementation, and cuts down on tree-walking.
Anecdotally, I've seen about a 3-fold speedup.
2014-06-23 17:08:17 -04:00
Jesse Rosenthal
94d0fb1538 Move some of the clean-up logic into List module.
This will allow us to get rid of more general functions we no longer need in
the main reader.
2014-06-23 17:08:17 -04:00
Jesse Rosenthal
ef5fad2698 Add new typeclass, Reducible
This defines a typeclass `Reducible` which allows us to "reduce" pandoc
Inlines and Blocks, like so

    Emph [Strong [Str "foo", Space]] <++> Strong [Emph [Str "bar"]], Str "baz"] =
        [Strong [Emph [Str "foo", Space, Str "bar"], Space, Str "baz"]]

So adjacent formattings and strings are appropriately grouped.

Another set of operators for `(Reducible a) => (Many a)` are also
included.
2014-06-23 17:08:05 -04:00
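The following is a much-simplified sketch of the grouping idea, assuming a cut-down `Inline` type: adjacent `Str` values are concatenated and adjacent inlines with the same formatting constructor are merged. It is not the actual `Reducible` class, and unlike the example above it does not hoist shared formatting out of nested containers.

```haskell
-- Cut-down stand-in for pandoc's Inline type.
data Inline = Str String | Space | Emph [Inline] | Strong [Inline]
  deriving (Show, Eq)

-- Combine two adjacent inlines when they share a constructor;
-- otherwise return both unchanged. (Simplified, hypothetical.)
(<++>) :: Inline -> Inline -> [Inline]
Str a     <++> Str b     = [Str (a ++ b)]
Emph xs   <++> Emph ys   = [Emph (reduceAll (xs ++ ys))]
Strong xs <++> Strong ys = [Strong (reduceAll (xs ++ ys))]
x         <++> y         = [x, y]

-- Fold right to left, trying to merge each inline with the head
-- of the already-reduced remainder.
reduceAll :: [Inline] -> [Inline]
reduceAll = foldr step []
  where
    step x [] = [x]
    step x (y : ys) = case x <++> y of
      [z] -> z : ys
      zs  -> zs ++ ys
```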
John MacFarlane
e03ed7377c Markdown reader: Combine consecutive latex environments.
This helps when you have two minipages which can't have
blank lines between them.

See #690, #1196.
2014-06-23 12:42:27 -07:00
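As a rough sketch of the effect, assuming a simplified `Block` type: runs of adjacent raw LaTeX blocks are collapsed into one, joined by a single newline, so no blank line (which LaTeX treats as a paragraph break) ends up between two minipages. The actual change lives in the Markdown reader's parser; this post-processing pass and its names are hypothetical.

```haskell
-- Simplified stand-in: raw LaTeX or an ordinary paragraph.
data Block = RawLaTeX String | Para String deriving (Show, Eq)

-- Merge each run of consecutive RawLaTeX blocks into a single block,
-- separated by one newline rather than a blank line.
combineLaTeX :: [Block] -> [Block]
combineLaTeX (RawLaTeX a : RawLaTeX b : rest) =
  combineLaTeX (RawLaTeX (a ++ "\n" ++ b) : rest)
combineLaTeX (b : rest) = b : combineLaTeX rest
combineLaTeX [] = []
```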
Jesse Rosenthal
8e5bd9d851 Docx reader: Fix spacing in formatting.
The normalizing tests revealed a problem with unformatted spaces, brought about
by `spanTrim`. This fixes it by not trimming the spaces out of spans until they
are in their final form.
2014-06-22 01:53:30 -04:00
Jesse Rosenthal
9c7e0dc84b Implement new normalization.
There were some problems with the old str normalization. This fixes those
problems. Also, since it drills down on its own, it only needs to be
mapped over the blocks, not walked over the tree.
2014-06-22 00:45:18 -04:00
John MacFarlane
5d0103606f Markdown reader: Support smallcaps through span.
`<span style="font-variant:small-caps;">foo</span>` will be
parsed as a `SmallCaps` inline, and will work in all output
formats that support small caps.

Closes #1360.
2014-06-20 15:26:45 -07:00
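A sketch of the mapping, assuming a cut-down `Inline` type whose span attributes are key/value pairs. Only an exact match on the style string is handled here; how the real reader treats extra whitespace or other CSS in the attribute is not stated in this log.

```haskell
-- Cut-down stand-in for pandoc's Inline type.
data Inline
  = Str String
  | SmallCaps [Inline]
  | Span [(String, String)] [Inline]
  deriving (Show, Eq)

-- Turn a span styled exactly "font-variant:small-caps;" into SmallCaps;
-- any other inline (including other spans) is returned unchanged.
spanToSmallCaps :: Inline -> Inline
spanToSmallCaps (Span attrs ils)
  | lookup "style" attrs == Just "font-variant:small-caps;" = SmallCaps ils
spanToSmallCaps x = x
```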
John MacFarlane
d397a66107 MediaWiki reader: Tightened up template parsing.
The opening "{{" must be followed by an alphanumeric or ':'.
This prevents the exponential slowdown in #1033.
Closes #1033.
2014-06-20 12:00:26 -07:00
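The new restriction can be expressed as a cheap predicate checked before any template parsing is attempted, so a stray `{{` is rejected immediately instead of starting a costly parse. `opensTemplate` is a hypothetical name used for illustration.

```haskell
import Data.Char (isAlphaNum)

-- True only when the input begins with "{{" immediately followed by
-- an alphanumeric character or ':'.
opensTemplate :: String -> Bool
opensTemplate ('{' : '{' : c : _) = isAlphaNum c || c == ':'
opensTemplate _ = False
```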
John MacFarlane
8f20ac3da3 MediaWiki reader: Support --trace. 2014-06-20 11:39:24 -07:00
John MacFarlane
56c410ef6a Markdown reader: Prevent spurious line breaks after list items.
When the `hard_line_breaks` option was specified, pandoc would
produce a spurious line break after a tight list item.  This
patch solves the problem.  Closes #1137.
2014-06-20 11:10:35 -07:00
John MacFarlane
b3b40546cb HTML reader: Fix performance issue with malformed HTML tables.
We let a `</table>` tag close an open `<tr>` or `<td>`.
Closes #1167.
2014-06-20 10:47:29 -07:00
John MacFarlane
cab4b829b3 Support --trace in HTML reader. 2014-06-20 10:39:24 -07:00
Jesse Rosenthal
f6ae644831 Make strNormalize go bottomUp.
This was how it used to be before it was folded into blockNormalize.
2014-06-20 12:31:36 -04:00
Jesse Rosenthal
2aa5f58c5b Docx reader: Add a comment explaining strNormalize
`normalize` from Text.Pandoc.Shared is more general. In tests, though,
it more than doubles the run time. `strNormalize` does less, but it does
what we need. This comment is added for future maintainability.
2014-06-20 10:27:18 -04:00
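A hypothetical minimal `strNormalize` over a cut-down `Inline` type, showing the kind of narrower pass the comment describes: it only merges adjacent `Str` values (descending into `Emph`), whereas `normalize` from Text.Pandoc.Shared handles more cases. This is a sketch, not the reader's definition.

```haskell
-- Cut-down stand-in for pandoc's Inline type.
data Inline = Str String | Space | Emph [Inline] deriving (Show, Eq)

-- Merge runs of adjacent Str inlines, recursing into Emph contents.
strNormalize :: [Inline] -> [Inline]
strNormalize (Str a : Str b : rest) = strNormalize (Str (a ++ b) : rest)
strNormalize (Emph xs : rest) = Emph (strNormalize xs) : strNormalize rest
strNormalize (x : rest) = x : strNormalize rest
strNormalize [] = []
```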
Jesse Rosenthal
03af19a7e1 Docx Reader: Normalize DefinitionLists
Previously DefinitionList had been left out of `blockNormalize`. Now it
is included.
2014-06-20 10:20:37 -04:00
Jesse Rosenthal
3da515bdb0 Docx reader: simplify blockNormalize
Use a function `stripSpaces`, instead of recursion. Makes it a bit
easier to read and maintain, and simplifies normalizing DefinitionList,
which was left out the first time.
2014-06-20 10:12:28 -04:00
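A sketch of `blockNormalize` written in terms of a `stripSpaces` helper, under simplified types. Here `stripSpaces` only drops leading `Space` inlines, following the description of spaces introduced at the beginning of strings; whether the real helper also trims trailing spaces is an assumption left open. Because `dropWhile` is a no-op when there is no leading space, a header that does not start with a space needs no special case.

```haskell
-- Cut-down stand-ins for pandoc's Inline and Block types.
data Inline = Str String | Space deriving (Show, Eq)
data Block = Para [Inline] | Header Int [Inline] deriving (Show, Eq)

-- Drop leading Space inlines (hypothetical, simplified helper).
stripSpaces :: [Inline] -> [Inline]
stripSpaces = dropWhile (== Space)

-- Apply the same stripping to every block kind that holds inlines.
blockNormalize :: Block -> Block
blockNormalize (Para ils) = Para (stripSpaces ils)
blockNormalize (Header n ils) = Header n (stripSpaces ils)
```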
Jesse Rosenthal
7fd48b30e0 Docx reader: Fix hdr handling in block norm
`blockNormalize` previously forgot to account for the case in which a
Header's inlines did not start with a space.
2014-06-20 09:30:30 -04:00
John MacFarlane
3c059dbe60 HTML reader: Allow space between <col> and </col>.
Test case:

```
<table border="1">
  <colgroup>
    <col> </col>
    <col></col>
  </colgroup>
  <tbody>
    <tr>
        <td>X</td>
        <td>Y</td>
    </tr>
    <tr>
        <td>1</td>
        <td>2</td>
    </tr>
  </tbody>
</table>

```
2014-06-19 23:24:28 -07:00
Jesse Rosenthal
a934db9a32 Introduce blockNormalize
This will help take care of spaces introduced at the beginning of strings.
2014-06-19 19:28:55 -04:00
Jesse Rosenthal
0e7d2dbd43 Have Docx reader properly interpret tabs. 2014-06-19 17:55:02 -04:00
Jesse Rosenthal
86fc44d6b3 Add literal tabs to parser. 2014-06-19 17:53:52 -04:00
John MacFarlane
ff6a2baeb9 More polish on Haddock reader/writer. 2014-06-18 17:49:59 -07:00
John MacFarlane
35e57db5c2 Finished first draft of Haddock writer. 2014-06-18 17:09:36 -07:00
John MacFarlane
9fc5c8d7af Rewrote haddock reader to use haddock-library.
This brings pandoc's rendering of haddock markup in line
with the new haddock.

Note that we preserve line breaks in `@` code blocks, unlike
the earlier version.

Modified tests pass.  More tests would be good.
2014-06-18 14:18:55 -07:00
John MacFarlane
ab390a10ec Removed old haddock reader code. Add dependency on haddock-library.
This also removes the dependency on alex and happy.
2014-06-18 11:33:09 -07:00
John MacFarlane
59272e4d99 DocBook reader: Support <?asciidoc-br?>.
Closes #1236.

Note, this is a bit of a kludge, to work around the fact that xml-light
doesn't parse `<?asciidoc-br?>` correctly.  We preprocess the input,
replacing that instruction with `<br/>`, and then parse that as a line
break.  Other XML instructions are simply removed from the input stream.
2014-06-17 12:14:02 -07:00
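A sketch of the preprocessing step on the raw input string, before it reaches xml-light: `<?asciidoc-br?>` becomes `<br/>`, and any other `<?...?>` instruction is dropped. `preprocessPIs` and `skipPI` are hypothetical names, and the plain character scan is an assumption about how the replacement is done.

```haskell
import Data.List (isPrefixOf)

-- Replace <?asciidoc-br?> with <br/> and remove every other
-- processing instruction from the input.
preprocessPIs :: String -> String
preprocessPIs [] = []
preprocessPIs s@(c : rest)
  | brTag `isPrefixOf` s = "<br/>" ++ preprocessPIs (drop (length brTag) s)
  | "<?" `isPrefixOf` s  = preprocessPIs (skipPI (drop 2 s))
  | otherwise            = c : preprocessPIs rest
  where
    brTag = "<?asciidoc-br?>"

-- Skip to just past the next "?>"; an unterminated instruction
-- swallows the rest of the input.
skipPI :: String -> String
skipPI ('?' : '>' : rest) = rest
skipPI (_ : rest) = skipPI rest
skipPI [] = []
```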
John MacFarlane
fc291efad3 LaTeX reader: Correctly handle table rows with too few cells.
LaTeX seems to treat them as if they have empty cells at the
end.  Closes #241.
2014-06-17 00:38:55 -07:00
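A sketch of the padding, assuming the column count comes from the table's column specification (e.g. `{lcr}` gives 3) and that a cell is a list of blocks. `padRow` and `Cell` are hypothetical names.

```haskell
-- A cell is a list of blocks; the block type is left abstract here.
type Cell b = [b]

-- Pad a short row with empty cells up to the declared number of columns.
-- Rows that are already long enough are returned unchanged.
padRow :: Int -> [Cell b] -> [Cell b]
padRow numCols cells = cells ++ replicate (numCols - length cells) []
```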
John MacFarlane
7d60c798bf Fixed compiler warning. 2014-06-16 23:02:20 -07:00
John MacFarlane
bbe99003f8 Naming: Use Docx instead of DocX.
For consistency with the existing writer.
2014-06-16 22:44:40 -07:00
John MacFarlane
bec9f3c641 Merge branch 'docx' of https://github.com/jkr/pandoc into jkr-docx 2014-06-16 22:16:45 -07:00
John MacFarlane
78ee2416d1 Org reader: make tildes create inline code.
Closes #1345.  Also relabeled 'code' and 'verbatim' parsers
to accord with the org-mode manual.

I'm not sure what the distinction between code and verbatim
is supposed to be, but I'm pretty sure both should be represented
as Code inlines in pandoc.  The previous behavior resulted in the
text not appearing in any output format.
2014-06-16 22:03:26 -07:00
John MacFarlane
f9b97e6bfb Small improvement to fix to #1333.
This allows blank lines at end of multiline headers.
2014-06-16 21:26:50 -07:00
John MacFarlane
9da5d8955e Markdown reader: fixed #1333 (table parsing bug). 2014-06-16 21:18:24 -07:00
John MacFarlane
87c08be58f LaTeX reader: handle leading/trailing spaces in emph better.
`\emph{ hi }` gets parsed as `[Space, Emph [Str "hi"], Space]`
so that we don't get things like `* hi *` in markdown output.

Also applies to textbf and some other constructions.

Closes #1146.  (`--normalize` isn't touched by this, but
normalization should not generally be necessary with the
changes to the readers.)
2014-06-16 19:18:33 -07:00
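A simplified sketch of the idea behind `extractSpaces`, using plain lists and a cut-down `Inline` type. The function later moved to `Shared.hs` was generalised, and its exact signature is not given in this log, so this version is illustrative only. Multiple leading or trailing spaces collapse to a single `Space` outside the wrapper.

```haskell
import Data.List (dropWhileEnd)

-- Cut-down stand-in for pandoc's Inline type.
data Inline = Str String | Space | Emph [Inline] | Strong [Inline]
  deriving (Show, Eq)

-- Move leading and trailing spaces outside a formatting wrapper,
-- so \emph{ hi } yields [Space, Emph [Str "hi"], Space].
extractSpaces :: ([Inline] -> Inline) -> [Inline] -> [Inline]
extractSpaces wrap ils =
  [Space | leading] ++ [wrap body] ++ [Space | trailing]
  where
    (front, rest) = span (== Space) ils
    body = dropWhileEnd (== Space) rest
    leading = not (null front)
    trailing = length body < length rest
```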
John MacFarlane
459805de4c LaTeX reader: don't assume preamble doesn't contain environments.
Closes #1338.
2014-06-16 17:43:56 -07:00
John MacFarlane
31fd843133 HTML reader: Fixed major parsing problem with HTML tables.
Table cells were being combined into one cell.  Closes #1341.
2014-06-16 15:45:20 -07:00
John MacFarlane
2b364b34bb Merge pull request #1344 from mpickering/master
Moved extractSpaces to Shared.hs
2014-06-16 14:43:43 -07:00
John MacFarlane
01ef573ac2 Org reader: fixed #1342.
This change rewrites `inlineLaTeXCommand` so that parsec will
know when input is being consumed.  Previously a run-time
error would be produced with some input involving raw latex.
(I believe this does not affect the last release, as the inline
latex reading was added recently.)
2014-06-16 14:18:06 -07:00
mpickering
7807564d44 Moved extractSpaces to Shared.hs
Generalised and moved the extractSpaces function from `HTML.hs` to
`Shared.hs` so that the docx reader can also use it.
2014-06-16 20:45:54 +01:00
Jesse Rosenthal
293e4cfdc3 Add DocX files to tree.
This introduces Text.Pandoc.DocX, and its exported `readDocX` function.
2014-06-16 07:18:34 -04:00