Hufflepdf

Author	SHA1	Message	Date
Tissevert	d9f69014a0	Make a couple improvements in performance + add an example script to extract pages from a PDF	2020-05-28 18:54:15 +02:00
Tissevert	09bd706748	Export Content operators, needed to write filters like reveal	2020-03-19 10:27:29 +01:00
Tissevert	ba7dd6a690	Make cacheFonts slightly more useful by passing layer directly to it and run the ReaderT underneath	2020-03-19 10:27:29 +01:00
Tissevert	d21e14f9a4	Hey, zlib isn't needed anymore for getText since all decoding is done directly in the Box instance for Streams	2020-03-19 10:27:28 +01:00
Tissevert	f31e9eb38b	Generalize Ids out of Content to handle Object Ids too	2020-03-19 10:27:21 +01:00
Tissevert	f2a99e1fd2	Reorder module PDF.Body in alphabetical order	2020-03-14 16:25:26 +01:00
Tissevert	5b8d951516	WIP: Try about everything that's possible to try, OrderedMap or [(,)], try to decouple Box instance for Content and the one for Indexed Text, breaks getText… will probably require some advanced effect library, there seems to be a weird MonadReader conflict in the error messages	2020-03-11 18:55:18 +01:00
Tissevert	3b1a5152e4	Try connecting all the Box instances in the getText demo, try to encode pages contents with a simple assoc list	2020-03-10 22:57:11 +01:00
Tissevert	dce10ae63a	Keep Page as only a reference object keeping the ObjectId explicit so we can modify the actual objects one day, write an OrderedMap data structure to help	2020-03-08 22:18:47 +01:00
Tissevert	a9252b129a	Start a Box module to describe inclusion relations between different types and get a MonadState action on the top type for any modification down there	2020-02-23 22:24:59 +01:00
Tissevert	bcf2e05bfb	Move Content out of Object module into a separate one incorporating PDF.Update (which is actually an operation that is defined only on that structure), and rename it Layer to avoid confusion with Content streams as defined in the specs (which have their own PDF.Content module already)	2020-02-17 15:29:59 +01:00
Tissevert	23186100a8	Reimplement getObj with the newest tools in PDF.Object.Navigation, in particular implement browsing by paths or random objectId access	2020-02-15 10:25:09 +01:00
Tissevert	a72d76e229	Add unit tests to make sure I'm not breaking things too much	2020-02-14 17:58:03 +01:00
Tissevert	aed7af376a	WIP: still trying to figure things out, moved to a separate submodule for Navigation, proper naming is hell	2020-02-11 08:29:08 +01:00
Tissevert	9f1b1afafe	Implement Text rendering from parsed Content	2020-02-10 10:54:44 +01:00
Tissevert	20466c4f13	WIP: Clean code parsing «pages» (now Content), separated from text rendering (will be reimplemented as an upper layer, also providing modification as stream filters) — Page is also forgotten for now, will need a big improvement in Object navigation	2020-02-09 22:42:57 +01:00
Tissevert	325250383a	Add support for fonts and implement MacRomanEncoding	2020-02-08 08:15:32 +01:00
Tissevert	f9f799c59b	Take the dirty code of «getText» and turn it into a relatively clean module exposing pages, that can be retrieved all at once or by page number (numbered human-style, starting from 1)	2019-11-29 11:51:35 +01:00
Tissevert	42a02808c1	Merge branch 'main' into extract-text	2019-11-27 18:05:47 +01:00
Tissevert	380c1e439b	Fix a bug preventing Hufflepdf from reading objects with a ' ' after the `obj` keyword	2019-11-27 18:01:19 +01:00
Tissevert	c9f050e64b	Remove deprecated debug script and forgotten comments to bypass the selective export of Text module	2019-10-14 10:17:15 +02:00
Tissevert	36d7f9b819	Still debugging, broke pretty much everything and finally implementing a proper coderange parsing for CMap because apparently that's necessary	2019-10-14 10:17:15 +02:00
Tissevert	3b59fd0c61	Separate CMap and Text in two distinct modules	2019-10-14 10:17:15 +02:00
Tissevert	1dd22c3889	Going to try with Text, naturally handling UTF-16 but will still have to parse «int codes» manually from strings	2019-10-14 10:17:15 +02:00
Tissevert	c349d9b4c2	Don't trust serializers, they have nothing to do with a reasonable binary encoding	2019-10-14 10:17:15 +02:00
Tissevert	e7484ef536	Completely lost, the same old Char8 / Word8 again, implemented all the text reading, still needing a couple details to parse CMaps	2019-10-14 10:17:15 +02:00
Tissevert	6f3c159ea7	Adding a module to implement text reading and a demo program to go with it	2019-10-14 10:17:15 +02:00
Tissevert	d6994f0813	Release 0.2.0.0	2019-10-14 10:16:14 +02:00
Tissevert	68f90d20e2	Implement PDF's multilayer updates and use it in getObj to display only the current version of the object taken into account instead of the concatenation of all its versions	2019-09-22 01:40:39 +02:00
Tissevert	9ab010de61	Add two example programs to show how the lib can be used	2019-09-20 22:42:17 +02:00
Tissevert	dd79cb3fc7	Release bugfix v0.1.1.1	2019-05-31 15:16:23 +02:00
Tissevert	11cb6504d7	Go strict ByteStrings with attoparsec	2019-05-24 10:48:09 +02:00
Tissevert	b60f337cc4	First useable version	2019-05-18 11:09:03 +02:00
Tissevert	2c165daaa7	Finally opt for uppercase Hufflepdf and rename cabal package	2019-05-18 09:49:31 +02:00

34 Commits