tastytea
63a8ab2683
Pass C strings to fmt.
Boost strings and filesystem paths used to be converted automatically,
but that no longer happens with fmt 9.
2022-08-16 16:26:17 +02:00
tastytea
299063e02c
Add language to books, documents and matches.
Currently only the book's language is actually read and applied down the line.
2021-08-20 16:57:29 +02:00
tastytea
b134bd0301
Add pointer to preferred text version (raw or cleaned) to document.
2021-08-20 15:07:00 +02:00
tastytea
b53e99306c
Re-add support for raw text searching.
2021-08-17 13:55:53 +02:00
tastytea
84ef5d1bf3
Move book processing into own file.
2021-08-17 13:05:14 +02:00
tastytea
f8270369b6
Make whitespace-reduction a bit more efficient.
We now use 2 passes instead of 3.
2021-06-08 17:30:29 +02:00
tastytea
f59c86e20d
Don't search for whitespace beyond the start/end of the text.
2021-06-06 23:48:06 +02:00
tastytea
0470acb00e
Make --raw work again.
2021-06-06 22:37:09 +02:00
tastytea
1e29608c7e
Fix positioning of matches in search::search().
2021-06-06 22:34:52 +02:00
tastytea
9708bb69c8
Don't attempt to access a pointer to nowhere.
2021-06-06 21:34:48 +02:00
tastytea
b8431019b7
Don't inject page numbers and headline-markers into the text.
The metadata is recorded in position → data pairs.
Closes: #13
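A minimal sketch of what position → data pairs could look like, assuming an ordered map keyed by text offset. The names `position_map` and `lookup` are hypothetical and not taken from the code base.

```cpp
#include <cstddef>
#include <iterator>
#include <map>
#include <string>

// Hypothetical container: text offset → metadata that applies from there on
// (for example a page number or a headline).
using position_map = std::map<std::size_t, std::string>;

// Return the entry in effect at `pos`: the last one recorded at or before it.
std::string lookup(const position_map &entries, std::size_t pos)
{
    auto it = entries.upper_bound(pos); // first entry strictly after pos
    if (it == entries.begin())
    {
        return {}; // nothing recorded at or before this position
    }
    return std::prev(it)->second;
}
```

With this layout the text itself stays untouched, and the page or headline for a match can be looked up from the match position alone.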
2021-06-06 21:26:09 +02:00
tastytea
a49c500d0f
Fix <style> and <script> erasure.
I didn't take into account that <script […]/> is possible.
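One way to handle both forms, sketched with std::regex instead of the project's boost::regex so that it is self-contained. The function name `erase_code` and both patterns are assumptions, not the actual fix.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: remove <script> and <style> elements from `text`.
std::string erase_code(std::string text)
{
    // Pass 1: self-closed elements such as <script src="x"/>.
    static const std::regex re_selfclosed{R"(<(script|style)\b[^>]*/>)"};
    // Pass 2: paired elements; \1 makes the closing tag match the opening one.
    static const std::regex re_paired{R"(<(script|style)\b[^>]*>[\s\S]*?</\1>)"};

    text = std::regex_replace(text, re_selfclosed, "");
    return std::regex_replace(text, re_paired, "");
}
```

Removing the self-closed form first keeps the paired pattern from running on from a self-closed tag to the next unrelated closing tag.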
2021-06-06 16:06:14 +02:00
tastytea
262aab6671
Add debug log for replacements.
2021-06-06 15:52:09 +02:00
tastytea
9067b387ef
Fix pagebreak-iterators.
Oopsie! 😄
2021-06-06 15:50:13 +02:00
tastytea
99e1cd8e98
Re-enable address sanitizer.
Found out what was wrong: I passed boost::regex_search() a substring that was
created in place, so match[2] pointed into that temporary substring. Because
match was declared outside of the if-block, match[2] referred to freed memory
once the if-block ended. It had no visible effect because I didn't use match
afterwards.
I rewrote the whole thing with iterators. Slightly less readable, slightly
better performance (probably).
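The iterator approach can be sketched roughly as follows. This uses std::regex rather than boost::regex to stay self-contained, and `find_page` and its pattern are hypothetical stand-ins for the real pagebreak code.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: find the first "page-<digits>" marker at or after
// `start` and return the digits, or an empty string if there is none.
std::string find_page(const std::string &text, std::size_t start)
{
    static const std::regex re_page{"page-([0-9]+)"};
    if (start > text.size())
    {
        return {};
    }

    // Search an iterator range of `text` itself instead of text.substr(start).
    // The sub_match iterators then refer to `text`, not to a temporary that is
    // destroyed while `match` is still alive.
    const auto first = text.begin() + static_cast<std::ptrdiff_t>(start);
    std::smatch match;
    if (std::regex_search(first, text.end(), match, re_page))
    {
        return match[1].str();
    }
    return {};
}
```

Since no substring is copied, the match stays valid for as long as `text` does, which is the property the address sanitizer was checking.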
2021-06-05 17:45:07 +02:00
tastytea
bdf9a86651
Fix pagebreak-regex and range in which pagebreaks are searched.
2021-06-05 17:18:35 +02:00
tastytea
f1a0015f28
Disable address sanitizer.
It complains about boost/regex/v5/sub_match.hpp:57:30 and I can't figure out
what's wrong or how to ignore it.
2021-06-05 14:24:53 +02:00
tastytea
12e1c64fc0
Make text formatting more readable.
2021-06-05 13:34:48 +02:00
tastytea
7b4b9edfe5
Rename file names in search::matches to make them clearer.
2021-06-01 19:15:00 +02:00
tastytea
a7fae314b3
Log some progress info to log file.
2021-06-01 17:17:00 +02:00
tastytea
07915bdf87
Add lots of debug output.
2021-06-01 15:32:10 +02:00
tastytea
76ed0c9dbf
Un-escape named and numbered entities in documents before searching.
2021-05-30 23:32:35 +02:00
tastytea
7ddfe32e30
Move is_whitespace() and urldecode() to helpers.
2021-05-30 21:52:52 +02:00
tastytea
94564fa914
Strip whitespace from headlines.
2021-05-30 21:16:24 +02:00
tastytea
e7633fe134
Rename prefix to before and suffix to after.
2021-05-30 14:47:18 +02:00
tastytea
6255d665af
Replace tabs with a space in search::cleanup().
2021-05-30 14:37:05 +02:00
tastytea
d7ad180721
Use iterators in search::context() and don't return extra whitespace
Should be easier to understand now.
2021-05-30 13:45:56 +02:00
tastytea
790e60a055
Fix end-of-headline detection.
2021-05-29 23:00:16 +02:00
tastytea
37e868b3f2
Remove <style> and <script> snippets.
Closes: #8
2021-05-29 18:52:03 +02:00
tastytea
00e3edb9f2
Only search files in spine, in the right order.
The spine lists all content documents in their linear reading order. So we're
finally getting our results in the right order! 🎉
Since we skip the images and fonts, which usually make up the most bytes in an
EPUB file, the performance increase is immense. I measured 60-70% in a very
short test.
Closes: #1
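A rough sketch of resolving the spine to file names. In an EPUB package document, each spine itemref carries an idref that names a manifest item, and that item's href is the file. The regex-based parsing below is a simplification that assumes id appears before href inside each item tag; `spine_files` is a hypothetical name, and real code would more likely use an XML parser.

```cpp
#include <map>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: return the hrefs of all spine documents in reading order.
std::vector<std::string> spine_files(const std::string &opf)
{
    static const std::regex re_item{
        R"re(<item\s[^>]*\bid="([^"]+)"[^>]*\bhref="([^"]+)")re"};
    static const std::regex re_itemref{
        R"re(<itemref\s[^>]*\bidref="([^"]+)")re"};

    // Manifest: id → href for every item (documents, images, fonts, ...).
    std::map<std::string, std::string> manifest;
    for (std::sregex_iterator it{opf.begin(), opf.end(), re_item}, end; it != end; ++it)
    {
        manifest[(*it)[1].str()] = (*it)[2].str();
    }

    // Spine: keep only referenced items, in the order the itemrefs appear.
    std::vector<std::string> files;
    for (std::sregex_iterator it{opf.begin(), opf.end(), re_itemref}, end; it != end; ++it)
    {
        const auto found = manifest.find((*it)[1].str());
        if (found != manifest.end())
        {
            files.push_back(found->second);
        }
    }
    return files;
}
```

Images and fonts are listed in the manifest but not in the spine, so they are never opened.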
2021-05-29 17:34:43 +02:00
tastytea
4ff796a590
Make regular expressions static variables.
Fewer allocations → faster program. About 17% speed increase with 89 books on up
to 3 cores. Measured using the average of 4 runs.
Before: ~15.5 seconds
After: ~12.8 seconds
Calls to allocation functions went down from 16,652,583 to 5,059,301.
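The pattern looks roughly like this, shown with std::regex and a made-up `strip_tags` function rather than the project's actual expressions:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: remove everything that looks like a tag.
std::string strip_tags(const std::string &text)
{
    // Function-local static: the regex is compiled once, on the first call,
    // and reused afterwards instead of being rebuilt on every call.
    static const std::regex re_tag{"<[^>]+>"};
    return std::regex_replace(text, re_tag, "");
}
```

Initialization of function-local statics is thread-safe since C++11, so this also works when several books are processed in parallel.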
2021-05-28 19:11:32 +02:00
tastytea
e64591f204
Rework option parsing, change --no-filename.
Options are now easier to access, and --no-filename accepts the values
filesystem, in-epub or all.
2021-05-27 17:20:00 +02:00
tastytea
c376ce8466
Print the EPUB file name if more than 1 input file.
Change --no-filename to mean: Don't print the EPUB file name.
2021-05-27 14:46:23 +02:00
tastytea
29ae22cc4a
Make regex const.
2021-05-27 09:46:59 +02:00
tastytea
fe02b155f5
Import std::string into epubgrep::search namespace.
2021-05-26 18:02:27 +02:00
tastytea
e1d29c5893
Don't replace stuff in search::cleanup_text() if nothing matched.
2021-05-24 20:02:27 +02:00
tastytea
09090a1c13
Fix bugs in search::context().
- Don't add context if words == 0
- Handle beginning / end of text correctly.
2021-05-24 19:57:15 +02:00
tastytea
c790c4952c
Extract page numbers.
2021-05-24 18:56:43 +02:00
tastytea
bb4a4c719f
Wrap headlines in <H> and </H> during cleanup.
2021-05-24 18:08:40 +02:00
tastytea
8ab7d0f655
Extract headlines.
2021-05-24 17:27:30 +02:00
tastytea
972ce1d0fe
Don't strip headlines.
2021-05-24 16:37:30 +02:00
tastytea
bb1a43ca92
Move cleanup_text(), document functions.
2021-05-24 16:23:07 +02:00
tastytea
84e2b387e5
Clean up text before searching.
2021-05-24 16:01:41 +02:00
tastytea
1979956f03
Add basic search functionality and context output.
2021-05-24 15:35:49 +02:00
tastytea
1f82d9927a
Add skeleton for search::search().
- Type for matches
- Type for options.
2021-05-24 07:52:36 +02:00