|
ba7dd6a690
|
Make cacheFonts slightly more useful by passing layer directly to it and running the ReaderT underneath
|
2020-03-19 10:27:29 +01:00 |
|
|
d21e14f9a4
|
Hey, zlib isn't needed anymore for getText since all decoding is done directly in the Box instance for Streams
|
2020-03-19 10:27:28 +01:00 |
|
|
f31e9eb38b
|
Generalize Ids out of Content to handle Object Ids too
|
2020-03-19 10:27:21 +01:00 |
|
|
f2a99e1fd2
|
Reorder module PDF.Body in alphabetical order
|
2020-03-14 16:25:26 +01:00 |
|
|
5b8d951516
|
WIP: Try about everything that's possible to try, OrderedMap or [(,)], try to decouple Box instance for Content and the one for Indexed Text, breaks getText… will probably require some advanced effect library, there seems to be a weird MonadReader conflict in the error messages
|
2020-03-11 18:55:18 +01:00 |
|
|
3b1a5152e4
|
Try connecting all the Box instances in the getText demo, try to encode pages contents with a simple assoc list
|
2020-03-10 22:57:11 +01:00 |
|
|
dce10ae63a
|
Keep Page as only a reference object keeping the ObjectId explicit so we can modify the actual objects one day, write an OrderedMap data structure to help
|
2020-03-08 22:18:47 +01:00 |
|
|
a9252b129a
|
Start a Box module to describe inclusion relations between different types and get a MonadState action on the top type for any modification down there
|
2020-02-23 22:24:59 +01:00 |
|
|
bcf2e05bfb
|
Move Content out of Object module into a separate one incorporating PDF.Update (which is actually an operation that is defined only on that structure), and rename it Layer to avoid confusion with Content streams as defined in the specs (which have their own PDF.Content module already)
|
2020-02-17 15:29:59 +01:00 |
|
|
23186100a8
|
Reimplement getObj with the newest tools in PDF.Object.Navigation, in particular implement browsing by paths or random objectId access
|
2020-02-15 10:25:09 +01:00 |
|
|
a72d76e229
|
Add unit tests to make sure I'm not breaking things too much
|
2020-02-14 17:58:03 +01:00 |
|
|
aed7af376a
|
WIP: still trying to figure things out, moved to a separate submodule for Navigation, proper naming is hell
|
2020-02-11 08:29:08 +01:00 |
|
|
9f1b1afafe
|
Implement Text rendering from parsed Content
|
2020-02-10 10:54:44 +01:00 |
|
|
20466c4f13
|
WIP: Clean code parsing «pages» (now Content), separated from text rendering (will be reimplemented as an upper layer, also providing modification as stream filters) — Page is also forgotten for now, will need a big improvement in Object navigation
|
2020-02-09 22:42:57 +01:00 |
|
|
325250383a
|
Add support for fonts and implement MacRomanEncoding
|
2020-02-08 08:15:32 +01:00 |
|
|
f9f799c59b
|
Take the dirty code of «getText» and turn it into a relatively clean module exposing pages, that can be retrieved all at once or by page number (numbered human-style, starting from 1)
|
2019-11-29 11:51:35 +01:00 |
|
|
42a02808c1
|
Merge branch 'main' into extract-text
|
2019-11-27 18:05:47 +01:00 |
|
|
380c1e439b
|
Fix a bug preventing Hufflepdf from reading objects with a ' ' after the obj keyword
|
2019-11-27 18:01:19 +01:00 |
|
|
c9f050e64b
|
Remove deprecated debug script and forgotten comments to bypass the selective export of Text module
|
2019-10-14 10:17:15 +02:00 |
|
|
36d7f9b819
|
Still debugging, broke pretty much everything and finally implementing a proper coderange parsing for CMap because apparently that's necessary
|
2019-10-14 10:17:15 +02:00 |
|
|
3b59fd0c61
|
Separate CMap and Text in two distinct modules
|
2019-10-14 10:17:15 +02:00 |
|
|
1dd22c3889
|
Going to try with Text, naturally handling UTF-16 but will still have to parse «int codes» manually from strings
|
2019-10-14 10:17:15 +02:00 |
|
|
c349d9b4c2
|
Don't trust serializers, they have nothing to do with a reasonable binary encoding
|
2019-10-14 10:17:15 +02:00 |
|
|
e7484ef536
|
Completely lost, the same old Char8 / Word8 again, implemented all the text reading, still needing a couple details to parse CMaps
|
2019-10-14 10:17:15 +02:00 |
|
|
6f3c159ea7
|
Adding a module to implement text reading and a demo program to go with it
|
2019-10-14 10:17:15 +02:00 |
|
|
d6994f0813
|
Release 0.2.0.0
|
2019-10-14 10:16:14 +02:00 |
|
|
68f90d20e2
|
Implement PDF's multilayer updates and use it in getObj to display only the current version of the object taken into account instead of the concatenation of all its versions
|
2019-09-22 01:40:39 +02:00 |
|
|
9ab010de61
|
Add two example programs to show how the lib can be used
|
2019-09-20 22:42:17 +02:00 |
|
|
dd79cb3fc7
|
Release bugfix v0.1.1.1
|
2019-05-31 15:16:23 +02:00 |
|
|
11cb6504d7
|
Go strict ByteStrings with attoparsec
|
2019-05-24 10:48:09 +02:00 |
|
|
b60f337cc4
|
First useable version
|
2019-05-18 11:09:03 +02:00 |
|
|
2c165daaa7
|
Finally opt for uppercase Hufflepdf and rename cabal package
|
2019-05-18 09:49:31 +02:00 |
|