|
bcf2e05bfb
|
Move Content out of Object module into a separate one incorporating PDF.Update (which is actually an operation that is defined only on that structure), and rename it Layer to avoid confusion with Content streams as defined in the specs (which have their own PDF.Content module already)
|
2020-02-17 15:29:59 +01:00 |
|
|
923d1800b0
|
Gain a bit of speed by using native Attoparsec parsers for number types instead of reimplementing them with ByteString conversion and a call to read
|
2020-02-14 18:02:40 +01:00 |
|
|
1c457d71d8
|
Fix the reading of hexadecimal string objects, a bug detected by running the tests implemented from the spec
|
2020-02-14 18:00:12 +01:00 |
|
|
a72d76e229
|
Add unit tests to make sure I'm not breaking things too much
|
2020-02-14 17:58:03 +01:00 |
|
|
919f640443
|
Merge branch 'extract-text' into navigation
|
2020-02-12 17:35:56 +01:00 |
|
|
32f9866106
|
Use peek to improve directObject parser avoiding a large <|> disjunction
|
2020-02-12 17:34:27 +01:00 |
|
|
704d7a7fcf
|
It turns out Output.concat wasn't necessary: OBuilder already seems to be a Monoid, so mconcat works (that fact was used in the very implementation of concat…)
|
2020-02-11 17:36:29 +01:00 |
|
|
aed7af376a
|
WIP: still trying to figure things out, moved to a separate submodule for Navigation, proper naming is hell
|
2020-02-11 08:29:08 +01:00 |
|
|
e77bbbcda9
|
WIP: start moving some navigation-related routines from Pages into Object directly and generalize them to multi-component to allow easier browsing
|
2020-02-10 17:43:04 +01:00 |
|
|
42a02808c1
|
Merge branch 'main' into extract-text
|
2019-11-27 18:05:47 +01:00 |
|
|
380c1e439b
|
Fix a bug preventing Hufflepdf from reading objects with a ' ' after the obj keyword
|
2019-11-27 18:01:19 +01:00 |
|
|
3a3e1533b4
|
Clean ByteString types to identify when a ByteString contains the representation of an integer in a given base and fix the last remaining PDF string (un)escaping issue
|
2019-10-14 10:17:15 +02:00 |
|
|
d07c286f8e
|
Clean exported ByteString custom functions
|
2019-10-14 10:17:15 +02:00 |
|
|
36d7f9b819
|
Still debugging, broke pretty much everything and finally implementing a proper coderange parsing for CMap because apparently that's necessary
|
2019-10-14 10:17:15 +02:00 |
|
|
3b59fd0c61
|
Separate CMap and Text into two distinct modules
|
2019-10-14 10:17:15 +02:00 |
|
|
c349d9b4c2
|
Don't trust serializers, they have nothing to do with a reasonable binary encoding
|
2019-10-14 10:17:15 +02:00 |
|
|
e7484ef536
|
Completely lost, the same old Char8 / Word8 again, implemented all the text reading, still need a couple of details to parse CMaps
|
2019-10-14 10:17:15 +02:00 |
|
|
b8eb9e6856
|
Generalize the Parser type into a MonadParser class to use with MonadTrans and remove redundant code already defined in Applicative or Attoparsec
|
2019-10-14 10:17:15 +02:00 |
|
|
51db57ec67
|
Ugly commit, breaks everything, still trying to figure out a grammar for text
|
2019-10-14 10:17:15 +02:00 |
|
|
6f3c159ea7
|
Adding a module to implement text reading and a demo program to go with it
|
2019-10-14 10:17:15 +02:00 |
|
|
3a39c75e6a
|
Stop requiring an empty line between subsections in an xref section
|
2019-09-22 01:37:28 +02:00 |
|
|
29c5823f34
|
Fix precision bug caused by using Floats to represent PDF Number values sometimes used to represent a byte offset within a file
|
2019-09-22 01:34:17 +02:00 |
|
|
699f830a45
|
Simplify XRef structure, clarify integer types and remove nextLine
|
2019-09-20 22:39:14 +02:00 |
|
|
264b0dc92b
|
Stop requiring «trailer» keywords to live on a separate line as counter-examples have been found
|
2019-05-31 15:08:54 +02:00 |
|
|
9dac275f68
|
Keep comment-opening '%' along with the comment and support empty lines
|
2019-05-31 15:07:41 +02:00 |
|
|
85e4eb9273
|
Fix bypassed error message for lines + add one for occurrences
|
2019-05-31 15:06:20 +02:00 |
|
|
11cb6504d7
|
Go strict ByteStrings with attoparsec
|
2019-05-24 10:48:09 +02:00 |
|
|
5614a25048
|
Generate valid PDF
|
2019-05-18 09:01:13 +02:00 |
|
|
0336baa687
|
Fix output implementation with dynamic XRefs
|
2019-05-17 16:14:06 +02:00 |
|
|
e23618da68
|
Implement output
|
2019-05-16 22:41:14 +02:00 |
|
|
645466024a
|
Starting to implement output with String builder
|
2019-05-16 17:04:45 +02:00 |
|
|
9b2f890227
|
Boyer-Moore is canceled, implement the rest of parsing with naive search
|
2019-05-16 11:01:50 +02:00 |
|
|
fc41f815a3
|
Broken state: trying to implement Boyer-Moore for fast-forwarding to the end of a section
|
2019-05-15 19:13:35 +02:00 |
|
|
379a821550
|
Fix bugs preventing the objects from loading
|
2019-05-15 15:03:55 +02:00 |
|
|
44508a204c
|
Reuse Parser type in PDF.Body (and generalize the type of the comment parser)
|
2019-05-15 09:04:17 +02:00 |
|
|
91292d6401
|
Implement retrieving objects in the body of the document and use it to populate the structure previously parsed
|
2019-05-14 18:42:11 +02:00 |
|
|
8043f84da8
|
Cut PDF module in two, implement basic parsing up to reading XRef table
|
2019-05-13 18:22:05 +02:00 |
|