Commit graph

1150 commits

Author SHA1 Message Date
Matthew Pickering
089745af61 EPUB Reader: Added EPUB reader 2014-07-31 21:39:49 +01:00
John MacFarlane
6dd2418476 New module, Text.Pandoc.MediaBag.
Moved `MediaBag` definition and functions from Shared:
`lookupMedia`, `mediaDirectory`, `insertMedia`, `extractMediaBag`.
Removed `emptyMediaBag`; use `mempty` instead, since `MediaBag`
is a Monoid.
2014-07-31 12:00:21 -07:00
John MacFarlane
00662faefb Made MediaBag a newtype, and added mime type information to media.
Shared now exports functions for interacting with a MediaBag:

- `emptyMediaBag`
- `lookuMedia`
- `insertMedia`
- `mediaDirectory`
- `extractMediaBag`
2014-07-31 11:05:35 -07:00
John MacFarlane
71e76175be getT2TMeta: Take list of source files instead of single.
Get latest modification time.
2014-07-30 17:25:00 -07:00
Jesse Rosenthal
9ce2295700 Docx reader: Make docx reader put image data in MediaBag.
Image data will not be put in a media bag map, which will be output
along with the pandoc output.
2014-07-30 12:46:03 -04:00
Jesse Rosenthal
840108a9c1 Docx reader: Make metavalues out of styled paragraphs.
This will make paragraphs styled with `Author`, `Title`, `Subtitle`,
`Date`, and `Abstract` into pandoc metavalues, rather than text. The
implementation only takes those elements from the beginning of the
document (ignoring empty paragraphs).

Multiple paragraphs in the `Author` style will be made into a metaList,
one paragraph per item. Hard linebreaks (shift-return) in the paragraph
will be maintained, and can be used for institution, email, etc.
2014-07-29 13:03:01 -04:00
John MacFarlane
d9751f91c4 Merge pull request #1457 from mpickering/generalstate
Generalised more in Parsing.hs to enable the use of custom state
2014-07-26 19:01:26 -07:00
Matthew Pickering
9e4604fa0b Added compatability layer to support directory-1.1 2014-07-27 00:36:23 +01:00
Matthew Pickering
f0a32197c8 Txt2Tags Reader: Added copyright information 2014-07-27 00:12:57 +01:00
Matthew Pickering
43304d6bd6 Txt2Tags Reader: Added recognition of macros 2014-07-27 00:12:56 +01:00
Matthew Pickering
7d04d383a6 Added txt2tags reader
http://txt2tags.org/

There are two points which currently do not match the official
implementation.

1. In the official implementation lists can not be nested like the
following but the reader would interpret this as a bullet list with the
first item being a numbered list.

```
  - + This is not a list
```

2. The specification describes how URIs automatically becomes links.
Unfortunately as is often the case, their definitiong of URI is not
clear. I tried three solutions but was unsure about which to adopt.

* Using isURI from Network.URI, this matches far too many strings and is
therefore unsuitable
* Using uri from Text.Pandoc.Shared, this doesn't match all strings that
the reference implementation matches
* Try to simulate the regex which is used in the native code

I went with the third approach but it is not perfect, for example
trailing punctuation is captured in Urls.
2014-07-27 00:12:56 +01:00
Matthew Pickering
5e2d22a27e Generalised more in Parsing.hs to enable the use of custom state 2014-07-26 21:23:49 +01:00
John MacFarlane
9c3f7688ee DocBook reader: Better handle elements inside code environments.
Of course, we can't include structure in the code block, but
this way we at least preserve the text.  Closes #1449.
2014-07-23 10:06:36 -07:00
Matthew Pickering
83028c5982 Exported runParserT and Stream 2014-07-22 16:37:41 +01:00
John MacFarlane
98c7ada061 HTML reader: parse Div and Span elements even without --parse-raw.
Closes #1434.
2014-07-20 21:43:54 -07:00
John MacFarlane
b6c769084e Fix behavior of markdown_attribute extension.
It now works as in PHP markdown extra.  Setting `markdown="1"` on
an outer tag affects all contained tags until it is reversed with
`markdown="0"`.  Closes #1378.

Added `stateMarkdownAttribute` to `ParserState`.
2014-07-20 17:44:28 -07:00
John MacFarlane
a243afb551 Markdown reader: Fixed small bug in HTML parsing with markdown_attribute.
Test case:

    <aside markdown="1">
    *hi*
    </aside>

Previously gave:

    <article markdown="1">
    <p><em>hi</em> </article></p>
2014-07-20 17:22:29 -07:00
John MacFarlane
4af8eed764 Markdown reader: revised definition list syntax (closes #1429).
* This change brings pandoc's definition list syntax into alignment
  with that used in PHP markdown extra and multimarkdown (with the
  exception that pandoc is more flexible about the definition markers,
  allowing tildes as well as colons).

* Lazily wrapped definitions are now allowed; blank space is required
  between list items; and the space before definition is used to
  determine whether it is a paragraph or a "plain" element.

* For backwards compatibility, a new extension,
  `compact_definition_lists`, has been added that restores the behavior
  of pandoc 1.12.x, allowing tight definition lists with no blank space
  between items, and disallowing lazy wrapping.
2014-07-20 16:33:59 -07:00
John MacFarlane
87096c64f8 Org reader: text adjacent to a list yields a Plain, not Para.
This gives better results for tight lists.  Closes #1437.

An alternative solution would be to use Para everywhere, and
never Plain.  I am not sufficiently familiar with org to know
which is best.  Thoughts, @tarleb?
2014-07-20 12:56:01 -07:00
Matthew Pickering
e7d8039969 Renamed readTeXMath' to avoid name conflict with texmath 0.6.7
Also removed deprecated readTeXMath.
2014-07-19 18:10:59 +01:00
Craig S. Bosma
1bb4f0c497 Org reader: Respect :exports header arguments on code blocks
Adds support to the org reader for conditionally exporting either the code block,
results block immediately following, both, or neither, depending on the value
of the `:exports` header argument. If no such argument is supplied, the default
org behavior (for most languages) of exporting code is used.
2014-07-17 10:23:22 -05:00
John MacFarlane
1bff443ac9 Removed redundant clause in markdown parser.
Thanks @dubiousjim.  Close #1431.
2014-07-16 07:55:39 -07:00
Jesse Rosenthal
4b2d07a642 Docx Reader: Fix hdr auto-id when already auto-id.
If header anchors (bookmarks in a header paragraph) already have an
auto-id, which will happen if they're generated by pandoc, we don't want
to rename it twice, and thus end up with an unnecessary number at the
end. So we add a state value to check if we're in a header. If we are,
we don't rename the bookmark -- wait until we rename it in our header
handling.
2014-07-16 03:50:38 +01:00
Jesse Rosenthal
a4671afd64 Docx Reader: Change state handling.
We don't need `updateDState` -- the built-in `modify` works just
fine. And we redefine `withDState` to use modify.
2014-07-16 03:43:14 +01:00
John MacFarlane
cb62cd08e0 Use renderTags' for all tag rendering.
This properly handles tags that should be self-closing.
Previously `<hr/>` would appear in EPUB output as `<hr></hr>`.
Closes #1420.
2014-07-13 08:28:28 -07:00
John MacFarlane
4676bfdf82 Removed space at ends of lines in source. 2014-07-12 22:57:22 -07:00
John MacFarlane
8bbcff0cfc Merge pull request #1414 from mpickering/general
Improvements to Parsing.hs
2014-07-12 14:11:09 -07:00
John MacFarlane
6c4345aa0b Merge pull request #1415 from jkr/nicertype
Nicer Docx type
2014-07-12 14:06:29 -07:00
Jesse Rosenthal
fe2eda9d54 Docx Reader: Add a compatibility layer for Except.
mtl switched from ErrorT to ExceptT, but we're not sure which mtl we'll
be dealing with. This should make errors work with both.

The main difference (beside the name of the module and the monad
transformer) is that Except doesn't require an instance of an Error
Typeclass. So we define that for compatability. When we switch to a
later mtl, using Control.Monad.Exception, we can just erase the instance
declaration, and all should work fine.
2014-07-12 18:04:06 +01:00
Jesse Rosenthal
d65fd58171 Docx Reader: A nicer Docx type.
This modifies the Docx type in the parser to avoid all the extra files
(Notes, numbering, etc). A reader monad keeps track of these, and applies
them at the end. The reader monad is stacked with ErrorT to enable better
error-handling than the old Maybes. (Note that the better error handling
isn't really there yet, but it is now possible.)

One long-term goal of these changes is to make it easier to write the Docx
type. This should make it easier to develop a standalone docx package in the
future.
2014-07-12 18:03:27 +01:00
Matthew Pickering
2fb8063f78 Removed (>>~) function
This function is equivalent to the more general (<*) which is defined in
Control.Applicative. This change makes pandoc code easier to understand for
those not familar with the codebase.
2014-07-11 12:51:26 +01:00
John MacFarlane
9e0495cb83 Fixed an issue caused by e4263d306e.
This sets `stateInHtmlBlock` to `Just "div"` when we're parsing
an HTML div.

Without this fix, a closing `</div>` tag could be parsed as part
of a list item rather than after the list.
2014-07-10 15:04:18 -07:00
John MacFarlane
83d4c2733c Markdown reader: Fixed regression with intraword underscores.
Closes #1121.
2014-07-10 14:51:08 -07:00
John MacFarlane
ee522be94f Markdown reader: Slight rewrite of enclosure/emphOrStrong code.
Semantics should be the same.
2014-07-10 14:37:10 -07:00
John MacFarlane
e4263d306e Revamped raw HTML block parsing in markdown.
- We no longer include trailing spaces and newlines in the
  raw blocks.
- We look for closing tags for elements (but without backtracking).
- Each block-level tag is its own RawBlock; we no longer try to
  consolidate them (though `--normalize` will do so).

Closes #1330.
2014-07-07 15:53:59 -06:00
John MacFarlane
cbeb931554 HTML reader: adjust blockTags and eitherBlockOrInline.
- Added `audio` and `source` in `eitherBlockOrInline`.
- Moved `video`, `svg`, `progress`, `script`, `noscript`, `svg` from
  `blockTags` to `eitherBlockOrInline`.
- `map` and `object` were mistakenly in both lists; they have been removed
  from `blockTags`.
2014-07-07 15:53:59 -06:00
Jesse Rosenthal
d77ccbba63 Docx Reader: Write LaTeX based on equations in word.
This is a first stab at writing out equations in LaTeX based on
omml equations in Word. There are some glitches: unicode chars not known to
LaTeX are silently skipped, and functions (such as `\oiiint`) not in the
standard LaTeX packages are inserted, which can lead to pdf compilation
errors (depending, of course, on your preamble).

Adding, for example, `\usepackage[charter]{mathdesign}` to the preamble will
allow you to use most of the more esoteric functions.
2014-07-02 16:54:33 -04:00
Jesse Rosenthal
9f4bacf86f Docx Reader: Add new file, TexChar.
This will allow us to deal with unicode characters from word equations. This
part of the process will need to continue to be improved.
2014-07-02 16:53:28 -04:00
Jesse Rosenthal
2bc0c77791 Docx Reader: Parse omml equations. 2014-07-02 16:52:39 -04:00
Jesse Rosenthal
0abfd386a4 Docx reader: clean up parStyle processing.
This gets rid of `divAttrToContainers`: an internal convenience function
which had become pretty inconvenient. Rather than converting classes and
indentations to string lists and back, we deal with the `pPr` attribute
directly.
2014-06-30 11:19:06 -04:00
John MacFarlane
aad618d9db Merge pull request #1386 from jkr/hanging_indent
Fix hanging indent behavior
2014-06-29 21:31:44 -07:00
Jesse Rosenthal
0f59196e0e Docx reader: Make use of new ParIndentation info.
Here, when hanging indents are greater than or equal to left indents, we
don't set it to block quote. Such indents are frequently used in
academic bibliographies. (Thanks to Caleb McDaniel.)
2014-06-29 23:20:10 -04:00
Jesse Rosenthal
c0fcc8a789 Docx reader: Add ParIndentation type to parser.
This lets us keep more information about the indentation, and act
accordingly in the reader.
2014-06-29 23:19:56 -04:00
Jesse Rosenthal
b1eba3c65c Docx Reader: Update state properly
Previously, a fresh state was created for the purpose of updating. In
the future, when there is more than one field in the state, this
obviously won't work.
2014-06-29 08:14:36 -04:00
Jesse Rosenthal
c0a8d5ac72 Docx Reader: All headers get auto id.
Previously, only those with an anchor got an auto id. Now, all do, which
puts it in line with pandoc's markdown extension.
2014-06-28 17:47:00 -04:00
Jesse Rosenthal
dce360e1e6 Docx Reader: Introduce link rewriting. 2014-06-28 04:00:16 -04:00
Jesse Rosenthal
b89a3ba2b1 make makeHeaderAnchors make an auto id
Record relationship between original id and auto id, so we can fix links
after.
2014-06-28 04:00:16 -04:00
Jesse Rosenthal
5969baf5b9 Rewrote header generation.
In preparation for auto ids.
2014-06-28 04:00:16 -04:00
Jesse Rosenthal
1de8d4d087 Docx Reader: Simplify makeHeaderAnchors
Using pattern guard, in preparation for doing some more complicated
stuff with it (recording header anchors, so we can change them to auto
ids.)
2014-06-28 04:00:16 -04:00
Jesse Rosenthal
ab76bbebbe Docx Reader: Clean up guards
Use PatternGuards to get rid of need for `isJust`, `fromJust`
altogether.
2014-06-28 04:00:16 -04:00