Commit Graph

21 Commits

Author SHA1 Message Date
Tissevert fe055150a3 Migrate to Text to represent page contents and get rid of encoding concerns too early 2020-02-07 10:49:16 +01:00
Tissevert 5fa32e35db Implement Font retrieving for simple fonts with an /Encoding and no ToUnicode 2020-02-05 22:15:18 +01:00
Tissevert b859338a57 Start implementing the MacRomanEncoding 2020-02-05 18:03:44 +01:00
Tissevert 22cde37025 Add a Font class type to allow text rendition schemes other than CMaps 2020-02-05 14:42:51 +01:00
Tissevert f9f799c59b Take the dirty code of «getText» and turn it into a relatively clean module exposing pages, that can be retrieved all at once or by page number (numbered human-style, starting from 1) 2019-11-29 11:51:35 +01:00
Tissevert 42a02808c1 Merge branch 'main' into extract-text 2019-11-27 18:05:47 +01:00
Tissevert 380c1e439b Fix a bug preventing Hufflepdf from reading objects with a ' ' after the `obj` keyword 2019-11-27 18:01:19 +01:00
Tissevert c9f050e64b Remove deprecated debug script and forgotten comments to bypass the selective export of Text module 2019-10-14 10:17:15 +02:00
Tissevert 36d7f9b819 Still debugging, broke pretty much everything and finally implementing a proper coderange parsing for CMap because apparently that's necessary 2019-10-14 10:17:15 +02:00
Tissevert 3b59fd0c61 Separate CMap and Text in two distinct modules 2019-10-14 10:17:15 +02:00
Tissevert 1dd22c3889 Going to try with Text, naturally handling UTF-16 but will still have to parse «int codes» manually from strings 2019-10-14 10:17:15 +02:00
Tissevert c349d9b4c2 Don't trust serializers, they have nothing to do with a reasonable binary encoding 2019-10-14 10:17:15 +02:00
Tissevert e7484ef536 Completely lost, the same old Char8 / Word8 again, implemented all the text reading, still needing a couple details to parse CMaps 2019-10-14 10:17:15 +02:00
Tissevert 6f3c159ea7 Adding a module to implement text reading and a demo program to go with it 2019-10-14 10:17:15 +02:00
Tissevert d6994f0813 Release 0.2.0.0 2019-10-14 10:16:14 +02:00
Tissevert 68f90d20e2 Implement PDF's multilayer updates and use it in getObj to display only the current version of the object taken into account instead of the concatenation of all its versions 2019-09-22 01:40:39 +02:00
Tissevert 9ab010de61 Add to example programs to show how the lib can be used 2019-09-20 22:42:17 +02:00
Tissevert dd79cb3fc7 Release bugfix v0.1.1.1 2019-05-31 15:16:23 +02:00
Tissevert 11cb6504d7 Go strict ByteStrings with attoparsec 2019-05-24 10:48:09 +02:00
Tissevert b60f337cc4 First useable version 2019-05-18 11:09:03 +02:00
Tissevert 2c165daaa7 Finally opt for uppercase Hufflepdf and rename cabal package 2019-05-18 09:49:31 +02:00