5fa32e35dbImplement Font retrieving for simple fonts with an /Encoding and no ToUnicodeTissevert2020-02-05 22:15:18 +0100
b5a15a692bForgot to remove commented-out dead codeTissevert2020-02-05 19:49:03 +0100
b859338a57Start implementing the MacRomanEncodingTissevert2020-02-05 18:03:44 +0100
764e2c6a4fRemoving deprecated hiding for «fail»Tissevert2020-02-05 18:02:52 +0100
6ed57d66e8Reimplement cMap as a type of Font and make the code ready for other FontsTissevert2020-02-05 17:42:17 +0100
22cde37025Add a Font class type to allow text rendition schemes other than CMapsTissevert2020-02-05 14:42:51 +0100
c48ab22808Forgot some useless parentheses when playing with operator precedencesTissevert2020-02-04 17:05:15 +0100
a2b66ac6d6Generalize the getFont function because some /Resources have a direct dictionary as value for their /Font propertyTissevert2020-02-04 17:04:42 +0100
cefb08ee50Going a step further in «optimization» (slowing it even more…) by replacing choice by a search in a MapTissevert2019-11-30 21:46:22 +0100
afbbcbffc5Finish implementing the new stack-based call parserTissevert2019-11-30 12:39:40 +0100
8373bd1ea0Removing +x permission on getText source that shouldn't ever have been setTissevert2019-11-29 19:07:54 +0100
bac08446ddWIP: starting to fix this criminally inefficient parser for PDF's postfix-operator instructionsTissevert2019-11-29 17:42:57 +0100
7eca875900Improve getObj example to catch non-existing ObjectId and default to listing existing ObjectIds when none is provided [main]Tissevert2019-11-29 11:53:08 +0100
f9f799c59bTake the dirty code of «getText» and turn it into a relatively clean module exposing pages, that can be retrieved all at once or by page number (numbered human-style, starting from 1)Tissevert2019-11-29 11:51:35 +0100
08a9717b3aGet rid of wrapper PageContents structure returned by PageContent in the PDF.Text module (and return directly [ByteString] instead)Tissevert2019-11-29 11:48:28 +0100
42a02808c1Merge branch 'main' into extract-textTissevert2019-11-27 18:05:47 +0100
380c1e439bFix a bug preventing Hufflepdf from reading objects with a ' ' after the `obj` keywordTissevert2019-11-27 18:01:03 +0100
c9f050e64bRemove deprecated debug script and forgotten comments to bypass the selective export of Text moduleTissevert2019-10-07 12:30:07 +0200
3a3e1533b4Clean ByteString types to identify when a ByteString contains the representation of an integer in a given base and fix the last remaining PDF string (un)escaping issueTissevert2019-10-04 18:46:07 +0200
a96e36ec5aFix error silently discarding code ranges, make sure ByteString intervals are created with the correct byte length and decode utf16BE encoded values in single-value rangesTissevert2019-10-03 14:59:06 +0200
7a15113285Try and re-implement string decoding — compiles but now fails to decode any stringTissevert2019-10-03 07:59:09 +0200
36d7f9b819Still debugging, broke pretty much everything and finally implementing a proper coderange parsing for CMap because apparently that's necessaryTissevert2019-09-30 14:13:12 +0200
b8ca7281aaFix parsing errors forgetting to make sure there's a space after special operator arguments like names and stringObjectsTissevert2019-09-28 09:25:59 +0200
32efdcdd6bTry and fix stuff by generalizing a signature to ease debugging and add parenthesis which I think should have been here all alongTissevert2019-09-27 18:38:03 +0200
3b59fd0c61Separate CMap and Text in two distinct modulesTissevert2019-09-27 18:16:12 +0200
0374b72920Finish implementing reading, still bugs to investigateTissevert2019-09-27 12:21:06 +0200
1dd22c3889Going to try with Text, naturally handling UTF-16 but will still have to parse «int codes» manually from stringsTissevert2019-09-26 16:56:13 +0200
98d029c4d4In complete debug, more or less implemented CMap parsing but apparently it uses UTF16 ?!Tissevert2019-09-26 15:51:41 +0200
c349d9b4c2Don't trust serializers, they have nothing to do with a reasonable binary encodingTissevert2019-09-25 23:46:24 +0200
e7484ef536Completely lost, the same old Char8 / Word8 again, implemented all the text reading, still needing a couple details to parse CMapsTissevert2019-09-25 18:42:34 +0200
f9e5683bf4WIP: Use previous changes to start implementing font caching and text parsing (still very broken, doesn't compile)Tissevert2019-09-24 18:38:12 +0200
b8eb9e6856Generalize the Parser type into a MonadParser class to use with MonadTrans and remove redundant code already defined in Applicative or AttoparsecTissevert2019-09-24 18:36:17 +0200
66d315b7feReflect the distinction between eval and run from State monad into the Parser moduleTissevert2019-09-24 18:32:23 +0200
51db57ec67Ugly commit, breaks everything, still trying to figure a grammar for textTissevert2019-09-23 23:19:27 +0200
6f3c159ea7Adding a module to implement text reading and a demo program to go with itTissevert2019-09-23 18:00:47 +0200
68f90d20e2Implement PDF's multilayer updates and use it in getObj to display only the current version of the object taken into account instead of the concatenation of all its versionsTissevert2019-09-22 01:40:39 +0200
3a39c75e6aStop requiring an empty line between subsections in a xref sectionTissevert2019-09-22 01:37:28 +0200
29c5823f34Fix precision bug caused by using Floats to represent PDF Number values sometimes used to represent a byte offset within a fileTissevert2019-09-22 01:34:17 +0200
9ab010de61Add to example programs to show how the lib can be usedTissevert2019-09-20 22:42:17 +0200
088637b2c0Compat stuff for Monoid / SemigroupTissevert2019-05-16 21:40:19 +0200
96190a8ca4Forgot to add changes to cabal fileTissevert2019-05-16 17:06:14 +0200
645466024aStarting to implement output with String builderTissevert2019-05-16 17:04:45 +0200
9b2f890227Boyer-Moore is canceled, implement the rest of parsing with naive searchTissevert2019-05-16 11:01:50 +0200
fc41f815a3Broken state : trying to implement Boyer-Moore for fast-forwarding to the end of a sectionTissevert2019-05-15 19:12:38 +0200
379a821550Fix bugs preventing the objects from loadingTissevert2019-05-15 15:03:55 +0200
44508a204cReuse Parser type in PDF.Body (and generalize the type of the comment parser)Tissevert2019-05-15 09:04:17 +0200
91292d6401Implement retrieving objects in the body of the document and use it to populate the structure previously parsedTissevert2019-05-14 18:42:11 +0200
8043f84da8Cut PDF module in two, implement basic parsing up to reading XRef tableTissevert2019-05-13 18:22:05 +0200
6eacb55fc4Fix bug preventing startXref to be found for files with a single byte EOL encodingTissevert2019-05-13 11:34:15 +0200