27 Commits

Author SHA1 Message Date
7b4b9edfe5
Rename file names in search::matches to make them clearer. 2021-06-01 19:15:00 +02:00
a7fae314b3
Log some progress info to log file.
2021-06-01 17:17:00 +02:00
07915bdf87
Add lots of debug output. 2021-06-01 15:32:10 +02:00
76ed0c9dbf
Un-escape named and numbered entities in documents before searching.
2021-05-30 23:32:35 +02:00
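The un-escaping step could look roughly like the sketch below. It is not taken from epubgrep: the names to_utf8() and unescape_entities() are hypothetical, only the five XML named entities are covered, and the output is assumed to be UTF-8.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-8.
std::string to_utf8(char32_t cp)
{
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Hypothetical helper: replace &name;, &#NNN; and &#xHHH; with their text.
// Anything that is not a recognised entity is copied unchanged.
std::string unescape_entities(const std::string &text)
{
    // Assumption: only the five XML entities; real documents use many more.
    static const std::map<std::string, std::string> named{
        {"amp", "&"}, {"lt", "<"}, {"gt", ">"}, {"quot", "\""}, {"apos", "'"}};

    std::string out;
    std::size_t pos{0};
    while (pos < text.size()) {
        const std::size_t amp{text.find('&', pos)};
        if (amp == std::string::npos) {
            out.append(text, pos, std::string::npos);
            break;
        }
        out.append(text, pos, amp - pos);
        const std::size_t semi{text.find(';', amp)};
        if (semi == std::string::npos) {
            out.append(text, amp, std::string::npos);
            break;
        }

        const std::string name{text.substr(amp + 1, semi - amp - 1)};
        std::string replacement;
        if (name.size() > 1 && name[0] == '#') {
            const bool hex{name[1] == 'x' || name[1] == 'X'};
            const std::string digits{name.substr(hex ? 2 : 1)};
            char32_t cp{0};
            bool valid{!digits.empty()};
            for (const char c : digits) {
                int value{0};
                if (c >= '0' && c <= '9') {
                    value = c - '0';
                } else if (hex && c >= 'a' && c <= 'f') {
                    value = c - 'a' + 10;
                } else if (hex && c >= 'A' && c <= 'F') {
                    value = c - 'A' + 10;
                } else {
                    valid = false;
                    break;
                }
                cp = cp * (hex ? 16 : 10) + static_cast<char32_t>(value);
                if (cp > 0x10FFFF) { // Outside the Unicode range.
                    valid = false;
                    break;
                }
            }
            if (valid) {
                replacement = to_utf8(cp);
            }
        } else {
            const auto it = named.find(name);
            if (it != named.end()) {
                replacement = it->second;
            }
        }

        if (replacement.empty()) {
            out += '&'; // Not an entity: keep the literal ampersand.
            pos = amp + 1;
        } else {
            out += replacement;
            pos = semi + 1;
        }
    }
    return out;
}
```

Un-escaping before the search means a pattern such as "Tom & Jerry" can match text that the document stores as "Tom &amp; Jerry".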
7ddfe32e30
Move is_whitespace() and urldecode() to helpers. 2021-05-30 21:52:52 +02:00
94564fa914
Strip whitespace from headlines. 2021-05-30 21:16:24 +02:00
e7633fe134
Rename prefix to before and suffix to after.
2021-05-30 14:47:18 +02:00
6255d665af
Replace tabs with a space in search::cleanup(). 2021-05-30 14:37:05 +02:00
d7ad180721
Use iterators in search::context() and don't return extra whitespace.
Should be easier to understand now.
2021-05-30 13:45:56 +02:00
790e60a055
Fix end-of-headline detection. 2021-05-29 23:00:16 +02:00
37e868b3f2
Remove <style> and <script> snippets.
Closes: #8
2021-05-29 18:52:03 +02:00
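One way to drop these snippets is a single regular expression over the document text. The sketch below is an assumption about the approach, not the code from this commit; remove_style_script() is a hypothetical name.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: removes <style>…</style> and <script>…</script>
// elements, including their contents, from an (X)HTML string.
std::string remove_style_script(const std::string &html)
{
    // [\s\S]*? matches across line breaks and stops at the first closing tag;
    // \1 makes sure the closing tag has the same name as the opening one.
    static const std::regex re_snippet{
        R"(<(style|script)\b[^>]*>[\s\S]*?</\1\s*>)", std::regex::icase};
    return std::regex_replace(html, re_snippet, "");
}
```

Removing these elements keeps CSS rules and JavaScript source from showing up as search results.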
00e3edb9f2
Only search files in spine, in the right order.
The spine lists all content documents in their linear reading order. So we're
finally getting our results in the right order! 🎉

Since we skip the images and fonts, which usually make up most of the bytes in an
EPUB file, the performance increase is immense. I measured an improvement of
60-70% in a very short test.

Closes: #1
2021-05-29 17:34:43 +02:00
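In an EPUB package document, each spine entry refers by idref to a manifest item, and the manifest item carries the file's href. A minimal sketch of resolving the reading order, assuming the manifest and spine have already been parsed into the containers shown (files_in_spine() and both parameter names are hypothetical, not taken from epubgrep):

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: returns the content documents in reading order.
//   manifest:     item id -> href, for every file listed in the package
//   spine_idrefs: idref values of the spine, in document order
std::vector<std::string>
files_in_spine(const std::map<std::string, std::string> &manifest,
               const std::vector<std::string> &spine_idrefs)
{
    std::vector<std::string> files;
    files.reserve(spine_idrefs.size());
    for (const auto &idref : spine_idrefs) {
        const auto item = manifest.find(idref);
        if (item != manifest.end()) { // Skip idrefs without a manifest item.
            files.push_back(item->second);
        }
    }
    return files;
}
```

Images and fonts appear in the manifest but not in the spine, so they are never opened.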
4ff796a590
Make regular expressions static variables.
Fewer allocations → faster program. Run time dropped by about 17% with 89 books on up
to 3 cores, measured as the average of 4 runs.
Before: ~15.5 seconds
 After: ~12.8 seconds

Calls to allocation functions went down from 16,652,583 to 5,059,301.
2021-05-28 19:11:32 +02:00
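The effect described in this commit can be illustrated with a function-local static. This is a sketch with a hypothetical strip_tags() helper, not the actual epubgrep code: constructing a std::regex compiles the pattern and allocates memory, so doing it once instead of on every call saves that work.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: removes everything that looks like a tag.
std::string strip_tags(const std::string &text)
{
    // Compiled on the first call only; every later call reuses the object.
    // Initialisation of function-local statics is thread-safe since C++11.
    static const std::regex re_tag{"<[^>]*>"};
    return std::regex_replace(text, re_tag, "");
}
```

Without the static keyword the pattern would be compiled again for each document that is cleaned.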
e64591f204
Rework option parsing, change --no-filename.
Options are now more easily accessible; --no-filename accepts the values
filesystem, in-epub, or all.
2021-05-27 17:20:00 +02:00
c376ce8466
Print the EPUB file name if there is more than one input file.
Change --no-filename to mean: Don't print the EPUB file name.
2021-05-27 14:46:23 +02:00
29ae22cc4a
Make regex const. 2021-05-27 09:46:59 +02:00
fe02b155f5
Import std::string into epubgrep::search namespace.
2021-05-26 18:02:27 +02:00
e1d29c5893
Don't replace stuff in search::cleanup_text() if nothing matched. 2021-05-24 20:02:27 +02:00
09090a1c13
Fix bugs in search::context().
- Don't add context if words == 0.
- Handle beginning / end of text correctly.
2021-05-24 19:57:15 +02:00
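The two cases fixed here can be sketched for the context in front of a match. The function below is hypothetical (context_before() is not an epubgrep name) and assumes the text has already been cleaned up so that words are separated by plain spaces.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns up to `words` words that precede `pos`.
std::string context_before(const std::string &text, std::size_t pos,
                           unsigned words)
{
    if (pos > text.size()) {
        pos = text.size();
    }
    if (words == 0 || pos == 0) { // No context wanted, or nothing before pos.
        return {};
    }

    // Ignore the whitespace between the context and the match.
    std::size_t begin{pos};
    while (begin > 0 && text[begin - 1] == ' ') {
        --begin;
    }
    const std::size_t end{begin};

    // Walk backwards one word at a time; stop at the beginning of the text.
    while (words > 0 && begin > 0) {
        while (begin > 0 && text[begin - 1] != ' ') {
            --begin;
        }
        --words;
        if (words > 0) {
            while (begin > 0 && text[begin - 1] == ' ') {
                --begin;
            }
        }
    }

    // Don't return whitespace from the very beginning of the text.
    while (begin < end && text[begin] == ' ') {
        ++begin;
    }
    return text.substr(begin, end - begin);
}
```

Asking for more words than exist simply returns everything from the start of the text up to the match.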
c790c4952c
Extract page numbers. 2021-05-24 18:56:43 +02:00
bb4a4c719f
Wrap headlines in <H> and </H> during cleanup. 2021-05-24 18:08:40 +02:00
8ab7d0f655
Extract headlines. 2021-05-24 17:27:30 +02:00
972ce1d0fe
Don't strip headlines. 2021-05-24 16:37:30 +02:00
bb1a43ca92
Move cleanup_text(), document functions. 2021-05-24 16:23:07 +02:00
84e2b387e5
Clean up text before searching. 2021-05-24 16:01:41 +02:00
1979956f03
Add basic search functionality and context output. 2021-05-24 15:35:49 +02:00
1f82d9927a
Add skeleton for search::search().
- Type for matches.
- Type for options.
2021-05-24 07:52:36 +02:00