|
bcf2e05bfb
|
Move Content out of Object module into a separate one incorporating PDF.Update (which is actually an operation that is defined only on that structure), and rename it Layer to avoid confusion with Content streams as defined in the specs (which have their own PDF.Content module already)
|
2020-02-17 15:29:59 +01:00 |
|
|
6096a1a237
|
Simplify navigation by centering everything on Objects to avoid needing too many conversion tools between DirectObject / Object / Dictionary
|
2020-02-15 13:51:24 +01:00 |
|
|
23186100a8
|
Reimplement getObj with the newest tools in PDF.Object.Navigation, in particular implement browsing by paths or random objectId access
|
2020-02-15 10:25:09 +01:00 |
|
|
b916ab5206
|
Just noticed Streams are a kind of Dictionary too, since they have a header
|
2020-02-15 10:23:32 +01:00 |
|
|
4a6dbda7d3
|
Move Error type from Pages to Navigation as a candidate for MonadFail required by PDFContent defined there
|
2020-02-15 10:22:42 +01:00 |
|
|
923d1800b0
|
Gain a bit of speed by using native Attoparsec for number types instead of reimplementing them with ByteString conversion and call to read
|
2020-02-14 18:02:40 +01:00 |
|
|
1c457d71d8
|
Fix a bug in the reading of hexadecimal string objects, detected by running the tests implemented from the spec
|
2020-02-14 18:00:12 +01:00 |
|
|
a72d76e229
|
Add unit tests to make sure I'm not breaking things too much
|
2020-02-14 17:58:03 +01:00 |
|
|
919f640443
|
Merge branch 'extract-text' into navigation
|
2020-02-12 17:35:56 +01:00 |
|
|
ae938acc02
|
Merge branch 'main' into extract-text
|
2020-02-12 17:34:56 +01:00 |
|
|
32f9866106
|
Use peek to improve directObject parser avoiding a large <|> disjunction
|
2020-02-12 17:34:27 +01:00 |
|
|
eb4d76002c
|
Finish the split of Navigation out of Page, generalize the use of MonadFail with a custom Error monad (~= Either String)
|
2020-02-11 22:41:46 +01:00 |
|
|
af994cb50c
|
WIP: in the process of migrating to Object.Navigation in Pages, still unsure how to manage simple Content parsing and efficient font loading (+ giving a way to edit Contents)
|
2020-02-11 17:59:15 +01:00 |
|
|
704d7a7fcf
|
It turns out Output.concat wasn't necessary: OBuilder already seems to be a Monoid, so mconcat works (that fact was used in the very implementation of concat…)
|
2020-02-11 17:36:29 +01:00 |
|
|
11647eb4eb
|
Implement output for Content streams
|
2020-02-11 17:26:47 +01:00 |
|
|
aed7af376a
|
WIP: still trying to figure things out, moved to a separate submodule for Navigation, proper naming is hell
|
2020-02-11 08:29:08 +01:00 |
|
|
e77bbbcda9
|
WIP: start moving some navigation-related routines from Pages into Object directly and generalize them to multi-component to allow easier browsing
|
2020-02-10 17:43:04 +01:00 |
|
|
195446e653
|
Allow resources with no /Font field, they won't cause any problem as long as no call to Tf (to load a font) is made
|
2020-02-10 17:41:44 +01:00 |
|
|
9f1b1afafe
|
Implement Text rendering from parsed Content
|
2020-02-10 10:54:44 +01:00 |
|
|
20466c4f13
|
WIP: Clean code parsing «pages» (now Content), separated from text rendering (will be reimplemented as an upper layer, also providing modification as stream filters) — Page is also forgotten for now, will need a big improvement in Object navigation
|
2020-02-09 22:42:57 +01:00 |
|
|
325250383a
|
Add support for fonts and implement MacRomanEncoding
|
2020-02-08 08:15:32 +01:00 |
|
|
c48ab22808
|
Forgot some useless parentheses when playing with operator precedences
|
2020-02-04 17:05:15 +01:00 |
|
|
a2b66ac6d6
|
Generalize the getFont function because some /Resources have a direct dictionary as value for their /Font property
|
2020-02-04 17:04:42 +01:00 |
|
|
cefb08ee50
|
Going a step further in «optimization» (slowing it even more…) by replacing choice by a search in a Map
|
2019-11-30 21:46:22 +01:00 |
|
|
afbbcbffc5
|
Finish implementing the new stack-based call parser
|
2019-11-30 12:39:40 +01:00 |
|
|
8373bd1ea0
|
Removing +x permission on getText source that shouldn't ever have been set
|
2019-11-29 19:07:54 +01:00 |
|
|
bac08446dd
|
WIP: starting to fix this criminally inefficient parser for PDF's postfix-operator instructions
|
2019-11-29 17:42:57 +01:00 |
|
|
7eca875900
|
Improve getObj example to catch non-existent ObjectIds and default to listing existing ObjectIds when none is provided
|
2019-11-29 11:53:08 +01:00 |
|
|
f9f799c59b
|
Take the dirty code of «getText» and turn it into a relatively clean module exposing pages, that can be retrieved all at once or by page number (numbered human-style, starting from 1)
|
2019-11-29 11:51:35 +01:00 |
|
|
08a9717b3a
|
Get rid of wrapper PageContents structure returned by PageContent in the PDF.Text module (and return directly [ByteString] instead)
|
2019-11-29 11:48:28 +01:00 |
|
|
42a02808c1
|
Merge branch 'main' into extract-text
|
2019-11-27 18:05:47 +01:00 |
|
|
380c1e439b
|
Fix a bug preventing Hufflepdf from reading objects with a ' ' after the obj keyword
|
2019-11-27 18:01:19 +01:00 |
|
|
c9f050e64b
|
Remove deprecated debug script and forgotten comments to bypass the selective export of Text module
|
2019-10-14 10:17:15 +02:00 |
|
|
3a3e1533b4
|
Clean ByteString types to identify when a ByteString contains the representation of an integer in a given base and fix the last remaining PDF string (un)escaping issue
|
2019-10-14 10:17:15 +02:00 |
|
|
a96e36ec5a
|
Fix error silently discarding code ranges, make sure ByteString intervals are created with the correct byte length and decode utf16BE encoded values in single-value ranges
|
2019-10-14 10:17:15 +02:00 |
|
|
d07c286f8e
|
Clean exported ByteString custom functions
|
2019-10-14 10:17:15 +02:00 |
|
|
7a15113285
|
Try and re-implement string decoding — compiles but now fails to decode any string
|
2019-10-14 10:17:15 +02:00 |
|
|
36d7f9b819
|
Still debugging, broke pretty much everything and finally implementing a proper coderange parsing for CMap because apparently that's necessary
|
2019-10-14 10:17:15 +02:00 |
|
|
b8ca7281aa
|
Fix parsing errors caused by forgetting to make sure there's a space after special operator arguments like names and stringObjects
|
2019-10-14 10:17:15 +02:00 |
|
|
32efdcdd6b
|
Try and fix stuff by generalizing a signature to ease debugging and add parentheses which I think should have been here all along
|
2019-10-14 10:17:15 +02:00 |
|
|
3b59fd0c61
|
Separate CMap and Text in two distinct modules
|
2019-10-14 10:17:15 +02:00 |
|
|
0374b72920
|
Finish implementing reading, still bugs to investigate
|
2019-10-14 10:17:15 +02:00 |
|
|
1dd22c3889
|
Going to try with Text, naturally handling UTF-16 but will still have to parse «int codes» manually from strings
|
2019-10-14 10:17:15 +02:00 |
|
|
98d029c4d4
|
In complete debug, more or less implemented CMap parsing but apparently it uses UTF16 ?!
|
2019-10-14 10:17:15 +02:00 |
|
|
c349d9b4c2
|
Don't trust serializers, they have nothing to do with a reasonable binary encoding
|
2019-10-14 10:17:15 +02:00 |
|
|
e7484ef536
|
Completely lost, the same old Char8 / Word8 again, implemented all the text reading, still needing a couple details to parse CMaps
|
2019-10-14 10:17:15 +02:00 |
|
|
f9e5683bf4
|
WIP: Use previous changes to start implementing font caching and text parsing (still very broken, doesn't compile)
|
2019-10-14 10:17:15 +02:00 |
|
|
b8eb9e6856
|
Generalize the Parser type into a MonadParser class to use with MonadTrans and remove redundant code already defined in Applicative or Attoparsec
|
2019-10-14 10:17:15 +02:00 |
|
|
66d315b7fe
|
Reflect the distinction between eval and run from State monad into the Parser module
|
2019-10-14 10:17:15 +02:00 |
|
|
51db57ec67
|
Ugly commit, breaks everything, still trying to figure out a grammar for text
|
2019-10-14 10:17:15 +02:00 |
|