|
d90eaf6f1c
|
Add Box instances to allow handling some exceptions in monad and converting them to Traversable accessible from the data part of the type
|
2020-02-27 17:22:12 +01:00 |
|
|
99014ff30d
|
Recognize openStream was just an implementation of r for the Box m () Object ByteString, and extend it implementing the w operation while we're at it
|
2020-02-26 22:13:29 +01:00 |
|
|
f4df4aab22
|
Found a nicer formulation that doesn't require transitivity or index agglomeration and swapped the arguments of w for more reusability with at / atAll
|
2020-02-26 17:19:22 +01:00 |
|
|
bdbc5f7351
|
Generalize the Box instances on containers from the particular cases of Document/Layers and Layers/Objects and move them to PDF.Box
|
2020-02-25 17:36:54 +01:00 |
|
|
30fece6537
|
Notice the 'edit' I exported earlier could be reused to simplify the w implementation of the proof that Box is a transitive relation
|
2020-02-25 09:27:56 +01:00 |
|
|
1a70f2972b
|
Expose Box index flags in PDF and PDF.Layer
|
2020-02-24 21:37:09 +01:00 |
|
|
67faa06ea2
|
Lift unused restriction on MonadFail for AllObjects instance of Box Layer
|
2020-02-24 21:36:31 +01:00 |
|
|
83a63d4b02
|
Implement Box instance from Layer to Object, either all at once or indexed by an ObjectId
|
2020-02-24 17:29:22 +01:00 |
|
|
85ee8519c4
|
Implement Box instances from Document to Layers and EOLStyle
|
2020-02-24 17:28:17 +01:00 |
|
|
e607f9cd37
|
Implement transitivity instance, extract a part of modifyAt as a convenient 'edit' function useful elsewhere and present a right-infix version of (,) to allow writing the nested tuple indexes more conveniently
|
2020-02-24 17:27:37 +01:00 |
|
|
a9252b129a
|
Start a Box module to describe inclusion relations between different types and get a MonadState action on the top type for any modification down there
|
2020-02-23 22:24:59 +01:00 |
|
|
71e62ee732
|
Add IDs to Instructions so that they can be selected in a given Content (and modified one day…)
|
2020-02-23 22:21:09 +01:00 |
|
|
160999a7d7
|
A small renaming for more clarity and because I thought «update» could be needed for a function name but after all maybe not but it's still better that way
|
2020-02-23 22:17:09 +01:00 |
|
|
36b1782464
|
Follow previous renaming for a local variable in Navigation for more clarity
|
2020-02-23 22:15:52 +01:00 |
|
|
bcf2e05bfb
|
Move Content out of Object module into a separate one incorporating PDF.Update (which is actually an operation that is defined only on that structure), and rename it Layer to avoid confusion with Content streams as defined in the specs (which have their own PDF.Content module already)
|
2020-02-17 15:29:59 +01:00 |
|
|
6096a1a237
|
Simplify navigation by centering everything on Objects to avoid needing too many conversion tools between DirectObject / Object / Dictionary
|
2020-02-15 13:51:24 +01:00 |
|
|
23186100a8
|
Reimplement getObj with the newest tools in PDF.Object.Navigation, in particular implement browsing by paths or random objectId access
|
2020-02-15 10:25:09 +01:00 |
|
|
b916ab5206
|
Just noticed Streams are a kind of Dictionary too, since they have a header
|
2020-02-15 10:23:32 +01:00 |
|
|
4a6dbda7d3
|
Move Error type from Pages to Navigation as a candidate for MonadFail required by PDFContent defined there
|
2020-02-15 10:22:42 +01:00 |
|
|
923d1800b0
|
Gain a bit of speed by using native Attoparsec for number types instead of reimplementing them with ByteString conversion and call to read
|
2020-02-14 18:02:40 +01:00 |
|
|
1c457d71d8
|
Fix the reading of Hexadecimal string objects detected by running the tests implemented from the spec
|
2020-02-14 18:00:12 +01:00 |
|
|
a72d76e229
|
Add unit tests to make sure I'm not breaking things too much
|
2020-02-14 17:58:03 +01:00 |
|
|
919f640443
|
Merge branch 'extract-text' into navigation
|
2020-02-12 17:35:56 +01:00 |
|
|
32f9866106
|
Use peek to improve directObject parser avoiding a large <|> disjunction
|
2020-02-12 17:34:27 +01:00 |
|
|
eb4d76002c
|
Finish the split of Navigation out of Page, generalize the use of MonadFail with a custom Error monad (~= Either String)
|
2020-02-11 22:41:46 +01:00 |
|
|
af994cb50c
|
WIP: in the process of migrating to Object.Navigation in Pages, still unsure how to manage simple Content parsing and efficient font loading (+ giving a way to edit Contents)
|
2020-02-11 17:59:15 +01:00 |
|
|
704d7a7fcf
|
It turns out Output.concat wasn't necessary, OBuilder already seems to be a Monoid so mconcat works (that fact was used in the very implementation of concat…)
|
2020-02-11 17:36:29 +01:00 |
|
|
11647eb4eb
|
Implement output for Content streams
|
2020-02-11 17:26:47 +01:00 |
|
|
aed7af376a
|
WIP: still trying to figure things out, moved to a separate submodule for Navigation, proper naming is hell
|
2020-02-11 08:29:08 +01:00 |
|
|
e77bbbcda9
|
WIP: start moving some navigation-related routines from Pages into Object directly and generalize them to multi-component to allow easier browsing
|
2020-02-10 17:43:04 +01:00 |
|
|
195446e653
|
Allow resources with no /Font field, they won't cause any problem as long as no call to Tf (to load a font) is made
|
2020-02-10 17:41:44 +01:00 |
|
|
9f1b1afafe
|
Implement Text rendering from parsed Content
|
2020-02-10 10:54:44 +01:00 |
|
|
20466c4f13
|
WIP: Clean code parsing «pages» (now Content), separated from text rendering (will be reimplemented as an upper layer, also providing modification as stream filters) — Page is also forgotten for now, will need a big improvement in Object navigation
|
2020-02-09 22:42:57 +01:00 |
|
|
325250383a
|
Add support for fonts and implement MacRomanEncoding
|
2020-02-08 08:15:32 +01:00 |
|
|
c48ab22808
|
Forgot some useless parentheses when playing with operator precedences
|
2020-02-04 17:05:15 +01:00 |
|
|
a2b66ac6d6
|
Generalize the getFont function because some /Resources have a direct dictionary as value for their /Font property
|
2020-02-04 17:04:42 +01:00 |
|
|
cefb08ee50
|
Going a step further in «optimization» (slowing it even more…) by replacing choice by a search in a Map
|
2019-11-30 21:46:22 +01:00 |
|
|
afbbcbffc5
|
Finish implementing the new stack-based call parser
|
2019-11-30 12:39:40 +01:00 |
|
|
bac08446dd
|
WIP: starting to fix this criminally inefficient parser for PDF's postfix-operator instructions
|
2019-11-29 17:42:57 +01:00 |
|
|
f9f799c59b
|
Take the dirty code of «getText» and turn it into a relatively clean module exposing pages, that can be retrieved all at once or by page number (numbered human-style, starting from 1)
|
2019-11-29 11:51:35 +01:00 |
|
|
08a9717b3a
|
Get rid of wrapper PageContents structure returned by PageContent in the PDF.Text module (and return directly [ByteString] instead)
|
2019-11-29 11:48:28 +01:00 |
|
|
42a02808c1
|
Merge branch 'main' into extract-text
|
2019-11-27 18:05:47 +01:00 |
|
|
380c1e439b
|
Fix a bug preventing Hufflepdf from reading objects with a ' ' after the obj keyword
|
2019-11-27 18:01:19 +01:00 |
|
|
c9f050e64b
|
Remove deprecated debug script and forgotten comments to bypass the selective export of Text module
|
2019-10-14 10:17:15 +02:00 |
|
|
3a3e1533b4
|
Clean ByteString types to identify when a ByteString contains the representation of an integer in a given base and fix the last remaining PDF string (un)escaping issue
|
2019-10-14 10:17:15 +02:00 |
|
|
a96e36ec5a
|
Fix error silently discarding code ranges, make sure ByteString intervals are created with the correct byte length and decode utf16BE encoded values in single-value ranges
|
2019-10-14 10:17:15 +02:00 |
|
|
d07c286f8e
|
Clean exported ByteString custom functions
|
2019-10-14 10:17:15 +02:00 |
|
|
7a15113285
|
Try and re-implement string decoding — compiles but now fails to decode any string
|
2019-10-14 10:17:15 +02:00 |
|
|
36d7f9b819
|
Still debugging, broke pretty much everything and finally implementing a proper coderange parsing for CMap because apparently that's necessary
|
2019-10-14 10:17:15 +02:00 |
|
|
b8ca7281aa
|
Fix parsing errors caused by forgetting to make sure there's a space after special operator arguments like names and stringObjects
|
2019-10-14 10:17:15 +02:00 |
|
|
32efdcdd6b
|
Try and fix stuff by generalizing a signature to ease debugging and add parentheses which I think should have been there all along
|
2019-10-14 10:17:15 +02:00 |
|
|
3b59fd0c61
|
Separate CMap and Text in two distinct modules
|
2019-10-14 10:17:15 +02:00 |
|
|
0374b72920
|
Finish implementing reading, still bugs to investigate
|
2019-10-14 10:17:15 +02:00 |
|
|
1dd22c3889
|
Going to try with Text, naturally handling UTF-16 but will still have to parse «int codes» manually from strings
|
2019-10-14 10:17:15 +02:00 |
|
|
98d029c4d4
|
In complete debug, more or less implemented CMap parsing but apparently it uses UTF-16?!
|
2019-10-14 10:17:15 +02:00 |
|
|
c349d9b4c2
|
Don't trust serializers, they have nothing to do with a reasonable binary encoding
|
2019-10-14 10:17:15 +02:00 |
|
|
e7484ef536
|
Completely lost, the same old Char8 / Word8 again, implemented all the text reading, still needing a couple details to parse CMaps
|
2019-10-14 10:17:15 +02:00 |
|
|
f9e5683bf4
|
WIP: Use previous changes to start implementing font caching and text parsing (still very broken, doesn't compile)
|
2019-10-14 10:17:15 +02:00 |
|
|
b8eb9e6856
|
Generalize the Parser type into a MonadParser class to use with MonadTrans and remove redundant code already defined in Applicative or Attoparsec
|
2019-10-14 10:17:15 +02:00 |
|
|
66d315b7fe
|
Reflect the distinction between eval and run from State monad into the Parser module
|
2019-10-14 10:17:15 +02:00 |
|
|
51db57ec67
|
Ugly commit, breaks everything, still trying to figure out a grammar for text
|
2019-10-14 10:17:15 +02:00 |
|
|
6f3c159ea7
|
Adding a module to implement text reading and a demo program to go with it
|
2019-10-14 10:17:15 +02:00 |
|
|
68f90d20e2
|
Implement PDF's multilayer updates and use it in getObj to display only the current version of the object taken into account instead of the concatenation of all its versions
|
2019-09-22 01:40:39 +02:00 |
|
|
3a39c75e6a
|
Stop requiring an empty line between subsections in an xref section
|
2019-09-22 01:37:28 +02:00 |
|
|
29c5823f34
|
Fix precision bug caused by using Floats to represent PDF Number values sometimes used to represent a byte offset within a file
|
2019-09-22 01:34:17 +02:00 |
|
|
699f830a45
|
Simplify XRef structure, clarify integer types and remove nextLine
|
2019-09-20 22:39:14 +02:00 |
|
|
264b0dc92b
|
Stop requiring «trailer» keywords to live on a separate line as counter-examples have been found
|
2019-05-31 15:08:54 +02:00 |
|
|
9dac275f68
|
Keep comment-opening '%' along with the comment and support empty lines
|
2019-05-31 15:07:41 +02:00 |
|
|
85e4eb9273
|
Fix bypassed error message for lines + add one for occurrences
|
2019-05-31 15:06:20 +02:00 |
|
|
11cb6504d7
|
Go strict ByteStrings with attoparsec
|
2019-05-24 10:48:09 +02:00 |
|
|
0daa03d958
|
Remove commented out dead code
|
2019-05-21 09:07:37 +02:00 |
|
|
b60f337cc4
|
First useable version
|
2019-05-18 11:09:03 +02:00 |
|
|
5614a25048
|
Generate valid PDF
|
2019-05-18 09:01:13 +02:00 |
|
|
0336baa687
|
Fix output implementation with dynamic XRefs
|
2019-05-17 16:14:06 +02:00 |
|
|
e23618da68
|
Implement output
|
2019-05-16 22:41:14 +02:00 |
|
|
088637b2c0
|
Compat stuff for Monoid / Semigroup
|
2019-05-16 21:40:19 +02:00 |
|
|
645466024a
|
Starting to implement output with String builder
|
2019-05-16 17:04:45 +02:00 |
|
|
9b2f890227
|
Boyer-Moore is canceled, implement the rest of parsing with naive search
|
2019-05-16 11:01:50 +02:00 |
|
|
fc41f815a3
|
Broken state: trying to implement Boyer-Moore for fast-forwarding to the end of a section
|
2019-05-15 19:13:35 +02:00 |
|
|
379a821550
|
Fix bugs preventing the objects from loading
|
2019-05-15 15:03:55 +02:00 |
|
|
44508a204c
|
Reuse Parser type in PDF.Body (and generalize the type of the comment parser)
|
2019-05-15 09:04:17 +02:00 |
|
|
91292d6401
|
Implement retrieving objects in the body of the document and use it to populate the structure previously parsed
|
2019-05-14 18:42:11 +02:00 |
|
|
8043f84da8
|
Cut PDF module in two, implement basic parsing up to reading XRef table
|
2019-05-13 18:22:05 +02:00 |
|
|
6eacb55fc4
|
Fix bug preventing startXref from being found for files with a single byte EOL encoding
|
2019-05-13 11:34:15 +02:00 |
|
|
c036334b6f
|
Prototype successfully parsing (only last) startxref
|
2019-05-13 08:05:28 +02:00 |
|