Hufflepdf

Author	SHA1	Message	Date
Tissevert	3b1a5152e4	Try connecting all the Box instances in the getText demo, try to encode page contents with a simple assoc list	2020-03-10 22:57:11 +01:00
Tissevert	2b9abc24b6	Add a separate instance for Raw streams that don't try to decode them	2020-03-04 18:31:30 +01:00
Tissevert	309f6ed461	Actually re-implement getText with the simpler Box instance	2020-03-04 18:19:10 +01:00
Tissevert	cb257fc07e	Rename function for clarity: actually it's doing just what w StreamContent does, but without checking the headers to re-zlib-encode the stream content	2020-02-27 17:30:42 +01:00
Tissevert	99014ff30d	Recognize openStream was just an implementation of r for the Box m () Object ByteString, and extend it implementing the w operation while we're at it	2020-02-26 22:13:29 +01:00
Tissevert	bcf2e05bfb	Move Content out of Object module into a separate one incorporating PDF.Update (which is actually an operation that is defined only on that structure), and rename it Layer to avoid confusion with Content streams as defined in the specs (which have their own PDF.Content module already)	2020-02-17 15:29:59 +01:00
Tissevert	6096a1a237	Simplify navigations by centering everything on Objects to avoid needing too many conversion tools between DirectObject / Object / Dictionary	2020-02-15 13:51:24 +01:00
Tissevert	23186100a8	Reimplement getObj with the newest tools in PDF.Object.Navigation, in particular implement browsing by paths or random objectId access	2020-02-15 10:25:09 +01:00
Tissevert	ae938acc02	Merge branch 'main' into extract-text	2020-02-12 17:34:56 +01:00
Tissevert	325250383a	Add support for fonts and implement MacRomanEncoding	2020-02-08 08:15:32 +01:00
Tissevert	8373bd1ea0	Removing +x permission on getText source that shouldn't ever have been set	2019-11-29 19:07:54 +01:00
Tissevert	7eca875900	Improve getObj example to catch non-existent ObjectId and default to listing existing ObjectIds when none is provided	2019-11-29 11:53:08 +01:00
Tissevert	f9f799c59b	Take the dirty code of «getText» and turn it into a relatively clean module exposing pages, that can be retrieved all at once or by page number (numbered human-style, starting from 1)	2019-11-29 11:51:35 +01:00
Tissevert	c9f050e64b	Remove deprecated debug script and forgotten comments to bypass the selective export of Text module	2019-10-14 10:17:15 +02:00
Tissevert	3a3e1533b4	Clean ByteString types to identify when a ByteString contains the representation of an integer in a given base and fix the last remaining PDF string (un)escaping issue	2019-10-14 10:17:15 +02:00
Tissevert	36d7f9b819	Still debugging, broke pretty much everything and finally implementing a proper coderange parsing for CMap because apparently that's necessary	2019-10-14 10:17:15 +02:00
Tissevert	3b59fd0c61	Separate CMap and Text in two distinct modules	2019-10-14 10:17:15 +02:00
Tissevert	0374b72920	Finish implementing reading, still bugs to investigate	2019-10-14 10:17:15 +02:00
Tissevert	e7484ef536	Completely lost, the same old Char8 / Word8 again, implemented all the text reading, still needing a couple details to parse CMaps	2019-10-14 10:17:15 +02:00
Tissevert	f9e5683bf4	WIP: Use previous changes to start implementing font caching and text parsing (still very broken, doesn't compile)	2019-10-14 10:17:15 +02:00
Tissevert	6f3c159ea7	Adding a module to implement text reading and a demo program to go with it	2019-10-14 10:17:15 +02:00
Tissevert	68f90d20e2	Implement PDF's multilayer updates and use it in getObj to display only the current version of the object taken into account instead of the concatenation of all its versions	2019-09-22 01:40:39 +02:00
Tissevert	9ab010de61	Add to example programs to show how the lib can be used	2019-09-20 22:42:17 +02:00

23 commits