Commit Graph

91 Commits

Author SHA1 Message Date
03138c1dbf
Remove unnecessary include. 2021-05-30 18:07:13 +02:00
e7633fe134
Rename prefix to before and suffix to after.
2021-05-30 14:47:18 +02:00
6255d665af
Replace tabs with a space in search::cleanup(). 2021-05-30 14:37:05 +02:00
d7ad180721
Use iterators in search::context() and don't return extra whitespace
Should be easier to understand now.
2021-05-30 13:45:56 +02:00
790e60a055
Fix end-of-headline detection. 2021-05-29 23:00:16 +02:00
160ff20387
Revert "Fix pugixml target."
pugixml 1.8.1 (the version in Ubuntu bionic) does not have that target.

This reverts commit 2a3e3f87b5.
2021-05-29 22:10:21 +02:00
2a3e3f87b5
Fix pugixml target. 2021-05-29 21:37:40 +02:00
37e868b3f2
Remove <style> and <script> snippets.
Closes: #8
2021-05-29 18:52:03 +02:00
2d65961688
Output XML errors. 2021-05-29 18:42:29 +02:00
ba5716c585
Skip in-EPUB file if it is not found, unless it is the container.
We skipped the whole EPUB before.
2021-05-29 18:20:23 +02:00
5bd1030ad8
Try opf: variants of XML tags if normal variants are not found. 2021-05-29 18:09:44 +02:00
03b367ee98
Don't print same file path twice in error message.
zip::exception always has the filename in the message.
2021-05-29 17:37:41 +02:00
00e3edb9f2
Only search files in spine, in the right order.
The spine lists all content documents in their linear reading order. So we're
finally getting our results in the right order! 🎉

Since we skip the images and fonts, which usually make up the most bytes in an
EPUB file, the performance increase is immense. I measured 60-70% in a very
short test.

Closes: #1
2021-05-29 17:34:43 +02:00
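The spine-based file selection described in this commit can be sketched as follows. This is a minimal illustration, not the project's code: it assumes the OPF manifest has already been parsed into a map from item id to path and the spine into a list of idrefs. The function name `spine_files` and both parameter names are hypothetical.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: resolve spine idrefs to manifest paths, keeping the
// spine's reading order. Manifest items that the spine does not reference
// (images, fonts, stylesheets) are never returned.
std::vector<std::string>
spine_files(const std::map<std::string, std::string> &manifest,
            const std::vector<std::string> &spine_idrefs)
{
    std::vector<std::string> files;
    for (const auto &idref : spine_idrefs)
    {
        const auto item = manifest.find(idref);
        if (item != manifest.end())
        {
            files.push_back(item->second);
        }
        // An idref without a manifest entry is skipped in this sketch.
    }
    return files;
}
```

Because the result follows the spine rather than the archive's directory order, matches come out in reading order, and non-text resources are never opened.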
c94d9de0db
Reformat error messages.
One line per error message.
2021-05-29 12:53:14 +02:00
4ff796a590
Make regular expressions static variables.
Fewer allocations → faster program. About 17% speed increase with 89 books on up
to 3 cores. Measured using the average of 4 runs.
Before: ~15.5 seconds
 After: ~12.8 seconds

Calls to allocation functions went down from 16,652,583 to 5,059,301.
2021-05-28 19:11:32 +02:00
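The idea behind this commit can be illustrated with a function-local static regex. This is a sketch of the general technique only; the function `collapse_whitespace` and its pattern are hypothetical and not taken from the repository.

```cpp
#include <regex>
#include <string>

// Hypothetical example: the regex is compiled once, on the first call.
// Later calls reuse the compiled object instead of rebuilding it, which
// avoids the allocations that constructing a std::regex causes.
std::string collapse_whitespace(const std::string &text)
{
    static const std::regex re_whitespace{"[ \\t\\r\\n]+"};
    return std::regex_replace(text, re_whitespace, " ");
}
```

Initialization of a function-local static is thread-safe since C++11, and the regex is only read afterwards, so this pattern also works when the function is called from several search threads.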
4df7b36dfc
Print matches while still searching.
Previously we printed the matches at the end.
2021-05-28 17:18:34 +02:00
59759b5934
Move output code into its own function in a separate file.
It got a little crowded in main(). 😊
2021-05-28 17:07:11 +02:00
308e2d271f
Skip rest of file if encoding of files in EPUB is broken.
Standard says UTF-8. I don't want to deal with weird Windows-encodings or
whatever this is.

Closes: #7
2021-05-28 13:57:51 +02:00
65b46ca846
Do not allow more threads than max_threads. 2021-05-28 11:48:38 +02:00
c3131e01f0
Add setting to suppress this-is-not-an-EPUB errors. 2021-05-27 21:48:35 +02:00
84f600196c
Add error code to zip::exception. 2021-05-27 21:39:01 +02:00
b96315f8bb
Don't add extra newlines before errors. 2021-05-27 21:03:42 +02:00
2b91a839cc
Add --raw and --context again.
Forgot to re-implement them when I overhauled the option parsing…
2021-05-27 21:01:07 +02:00
8d5565a72c
Don't write to matches_all simultaneously from different threads.
What did I do yesterday?!? 😬

Closes: #6
2021-05-27 20:42:20 +02:00
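One common way to avoid simultaneous writes like the ones fixed here is to guard the shared container with a mutex. The sketch below assumes that approach; the commit message does not say how the fix was actually done. Only the name `matches_all` comes from the log. `collect_matches`, `matches_mutex` and the stand-in search result are hypothetical.

```cpp
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical sketch: each worker gathers its results locally and takes
// the lock only for the short append to the shared vector.
std::vector<std::string>
collect_matches(const std::vector<std::string> &inputs)
{
    std::vector<std::string> matches_all;
    std::mutex matches_mutex;
    std::vector<std::thread> workers;
    workers.reserve(inputs.size());

    for (const auto &input : inputs)
    {
        workers.emplace_back([&matches_all, &matches_mutex, &input] {
            // Stand-in for the real search of one input file.
            const std::vector<std::string> local{"match in " + input};

            const std::lock_guard<std::mutex> lock{matches_mutex};
            matches_all.insert(matches_all.end(), local.begin(),
                               local.end());
        });
    }
    for (auto &worker : workers)
    {
        worker.join();
    }
    return matches_all;
}
```

The order of entries in the result depends on thread scheduling. Starting one thread per input keeps the sketch short; a real program would cap the number of workers.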
38bf9be948
Fix some more memory leaks. 2021-05-27 20:11:59 +02:00
b24ea9b71e
Fix memory leak. 🤦
That's why I don't write C. 😄

This seems to fix issue #6 in single-threaded mode but sometimes throws “double
free or corruption (out)” in multi-threaded mode.

Bug: #6
2021-05-27 20:05:02 +02:00
fbb87cac81
Remove a few unnecessary .data(), remove unnecessary include. 2021-05-27 19:08:53 +02:00
c50659a339
Chunk error string to make it better translatable. 2021-05-27 17:24:19 +02:00
e64591f204
Rework option parsing, change --no-filename.
Options are now better accessible, --no-filename accepts the values filesystem,
in-epub or all.
2021-05-27 17:20:00 +02:00
c376ce8466
Print the EPUB file name if more than 1 input file.
Change --no-filename to mean: Don't print the EPUB file name.
2021-05-27 14:46:23 +02:00
0c45e7ac98
Add --recursive and --dereference-recursive.
Closes: #5
2021-05-27 14:45:52 +02:00
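A recursive search over input paths can be sketched with std::filesystem. This is an assumption about how the two switches might map onto the standard library, not the project's implementation; `expand_inputs` and `follow_symlinks` are hypothetical names.

```cpp
#include <filesystem>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: replace every directory in `inputs` with the regular
// files found below it. With follow_symlinks set (as --dereference-recursive
// might do), symlinked directories are descended into as well.
std::vector<fs::path> expand_inputs(const std::vector<fs::path> &inputs,
                                    bool follow_symlinks)
{
    const auto options =
        follow_symlinks ? fs::directory_options::follow_directory_symlink
                        : fs::directory_options::none;

    std::vector<fs::path> files;
    for (const auto &input : inputs)
    {
        if (fs::is_directory(input))
        {
            for (const auto &entry :
                 fs::recursive_directory_iterator(input, options))
            {
                if (entry.is_regular_file())
                {
                    files.push_back(entry.path());
                }
            }
        }
        else
        {
            // Plain files (even missing ones) are passed through unchanged,
            // so the search itself can report the error.
            files.push_back(input);
        }
    }
    return files;
}
```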
b764f5423c
Put input files into a std::vector<filesystem::path>.
We need that for supporting recursive directory search later.

#
# Previous commits:
#   29ae22c Make regex const.
#   8ed72af Update german translation.
#   a3b0964 Remove old comment.
#   d107ce5 Modify config file example.
2021-05-27 13:46:47 +02:00
29ae22cc4a
Make regex const. 2021-05-27 09:46:59 +02:00
a3b0964873
Remove old comment. 2021-05-26 20:20:21 +02:00
7dcf6d599c
Remove debug statements. 2021-05-26 18:25:53 +02:00
fe02b155f5
Import std::string into epubgrep::search namespace.
2021-05-26 18:02:27 +02:00
fc0aa02bc9
Use threads if more than one input file is searched.
Use 75% of the available threads (rounded up).

Closes: #4
2021-05-26 17:50:52 +02:00
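The thread-count rule from this commit (75% of the available threads, rounded up) can be written in integer arithmetic. The sketch below is hypothetical: the name `worker_count`, the fallback for an unknown thread count, and the cap at the number of input files are assumptions, not taken from the source.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper: number of worker threads for `file_count` inputs.
unsigned worker_count(std::size_t file_count, unsigned hardware_threads)
{
    if (file_count <= 1)
    {
        return 1; // A single input file is searched without extra threads.
    }
    if (hardware_threads == 0)
    {
        // std::thread::hardware_concurrency() may return 0 if unknown.
        hardware_threads = 1;
    }
    // ceil(0.75 * n) without floating point: (3n + 3) / 4.
    const unsigned limit = (hardware_threads * 3 + 3) / 4;
    // Assumption: never start more threads than there are files.
    return static_cast<unsigned>(
        std::min<std::size_t>(limit, file_count));
}
```

A caller would pass `std::thread::hardware_concurrency()` from `<thread>` as the second argument. With 8 hardware threads and 10 files this yields 6 workers; with 6 hardware threads it yields 5.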
694cb3bc44
Add --no-filename switch.
Suppresses file names in the output.
2021-05-26 09:04:16 +02:00
fd8db544bd
Add --nocolor switch.
Closes: #2
2021-05-25 11:52:13 +02:00
b72d3f3420
Color matches bright magenta. 2021-05-25 11:00:05 +02:00
d3c3062cc0
Add Termcolor dependency and bundle it in dist/. 2021-05-25 10:55:44 +02:00
ce015954ea
Only initialize search::options once. 2021-05-25 10:02:34 +02:00
4644c2afd4
Support CMake 3.12.
Ubuntu 20.04 has 3.16, so requiring 3.17 is a bit mean.
2021-05-25 07:38:07 +02:00
be229d25d6
Don't demand required options if --help or --version is requested.
Bump version to 0.1.2.
2021-05-25 07:15:04 +02:00
e1d29c5893
Don't replace stuff in search::cleanup_text() if nothing matched. 2021-05-24 20:02:27 +02:00
09090a1c13
Fix bugs in search::context().
- Don't add context if words == 0
- Handle beginning / end of text correctly.
2021-05-24 19:57:15 +02:00
1f25daed26
Add basic error handling to search. 2021-05-24 19:10:00 +02:00
c790c4952c
Extract page numbers. 2021-05-24 18:56:43 +02:00
bb4a4c719f
Wrap headlines in <H> and </H> during cleanup. 2021-05-24 18:08:40 +02:00
8ab7d0f655
Extract headlines. 2021-05-24 17:27:30 +02:00