Commit Graph

22 Commits

Author SHA1 Message Date
Tissevert fe055150a3 Migrate to Text to represent page contents and get rid of encoding concerns to early 2020-02-07 10:49:16 +01:00
Tissevert 57996749c6 Fix loose parser not making sure endOfInput is reached; add two families of operators and simplify the «Show» instance with a dedicated function to allow deleting lines of uninteresting code 2020-02-06 16:54:27 +01:00
Tissevert 6ed57d66e8 Reimplement cMap as a type of Font and make the code ready for other Fonts 2020-02-05 17:42:17 +01:00
Tissevert cefb08ee50 Going a step further in «optimization» (slowing it even more…) by replacing choice by a search in a Map 2019-11-30 21:46:22 +01:00
Tissevert afbbcbffc5 Finish implementing the new stack-based call parser 2019-11-30 12:39:40 +01:00
Tissevert bac08446dd WIP: starting to fix this criminally inefficient parser for PDF's postfix-operator instructions 2019-11-29 17:42:57 +01:00
Tissevert 08a9717b3a Get rid of wrapper PageContents structure returned by PageContent in the PDF.Text module (and return directly [ByteString] instead) 2019-11-29 11:48:28 +01:00
Tissevert c9f050e64b Remove deprecated debug script and forgotten comments to bypass the selective export of Text module 2019-10-14 10:17:15 +02:00
Tissevert 3a3e1533b4 Clean ByteString types to identify when a ByteString contains the representation of an integer in a given base and fix the last remaining PDF string (un)escaping issue 2019-10-14 10:17:15 +02:00
Tissevert 7a15113285 Try and re-implement string decoding — compiles but now fails to decode any string 2019-10-14 10:17:15 +02:00
Tissevert 36d7f9b819 Still debugging, broke pretty much everything and finally implementing a proper coderange parsing for CMap because apparently that's necessary 2019-10-14 10:17:15 +02:00
Tissevert b8ca7281aa Fix parsing errors forgetting to make sure there's a space after special operator arguments like names and stringObjects 2019-10-14 10:17:15 +02:00
Tissevert 32efdcdd6b Try and fix stuff by generalizing a signature to ease debugging and add parenthesis which I think should have been here all along 2019-10-14 10:17:15 +02:00
Tissevert 3b59fd0c61 Separate CMap and Text in two distinct modules 2019-10-14 10:17:15 +02:00
Tissevert 0374b72920 Finish implementing reading, still bugs to investigate 2019-10-14 10:17:15 +02:00
Tissevert 1dd22c3889 Going to try with Text, naturally handling UTF-16 but will still have to parse «int codes» manually from strings 2019-10-14 10:17:15 +02:00
Tissevert 98d029c4d4 In complete debug, more or less implemented CMap parsing but apparently it uses UTF16 ?! 2019-10-14 10:17:15 +02:00
Tissevert c349d9b4c2 Don't trust serializer, they have nothing todo with a reasonable binary encoding 2019-10-14 10:17:15 +02:00
Tissevert e7484ef536 Completely lost, the same old Char8 / Word8 again, implemented all the text reading, still needing a couple details to parse CMaps 2019-10-14 10:17:15 +02:00
Tissevert f9e5683bf4 WIP: Use previous changes to start implementing font caching and text parsing (still very broken, doesn't compile) 2019-10-14 10:17:15 +02:00
Tissevert 51db57ec67 Ugly commit, breaks everything, still trying to figure a grammar for text 2019-10-14 10:17:15 +02:00
Tissevert 6f3c159ea7 Adding a module to implement text reading and a demo program to go with it 2019-10-14 10:17:15 +02:00