epubgrep

Commit Graph

Author	SHA1	Message	Date
tastytea	9067b387ef	Fix pagebreak-iterators. Oopsie! 😄	2021-06-06 15:50:13 +02:00
tastytea	99e1cd8e98	Re-enabled address sanitizer. Found out what was wrong: I fed boost::regex_search() the pointer to a substring that was created in-place. match[2] was a pointer to a substring inside that. The problem was that match was declared outside of the if-block, so after the if-block match[2] would point to a now-freed memory address. It didn't have any effect because I didn't use match afterwards. I rewrote the whole thing with iterators. Slightly less readable, slightly better performance (probably).	2021-06-05 17:45:07 +02:00
tastytea	bdf9a86651	Fix pagebreak-regex and range in which pagebreaks are searched.	2021-06-05 17:18:35 +02:00
tastytea	f1a0015f28	Disable address sanitizer. It complains about boost/regex/v5/sub_match.hpp:57:30 and I can't figure out what's wrong or how to ignore it.	2021-06-05 14:24:53 +02:00
tastytea	12e1c64fc0	Make text formatting more readable.	2021-06-05 13:34:48 +02:00
tastytea	7b4b9edfe5	Rename file names in search::matches to make it more clear.	2021-06-01 19:15:00 +02:00
tastytea	a7fae314b3	Log some progress info to log file.	2021-06-01 17:17:00 +02:00
tastytea	07915bdf87	Add lots of debug output.	2021-06-01 15:32:10 +02:00
tastytea	76ed0c9dbf	Un-escape named and numbered entities in documents before searching.	2021-05-30 23:32:35 +02:00
tastytea	7ddfe32e30	Move is_whitespace() and urldecode() to helpers.	2021-05-30 21:52:52 +02:00
tastytea	94564fa914	Strip whitespace from headlines.	2021-05-30 21:16:24 +02:00
tastytea	e7633fe134	Rename prefix to before and suffix to after.	2021-05-30 14:47:18 +02:00
tastytea	6255d665af	Replace tabs with a space in search::cleanup().	2021-05-30 14:37:05 +02:00
tastytea	d7ad180721	Use iterators in search::context() and don't return extra whitespace. Should be easier to understand now.	2021-05-30 13:45:56 +02:00
tastytea	790e60a055	Fix end-of-headline detection.	2021-05-29 23:00:16 +02:00
tastytea	37e868b3f2	Remove <style> and <script> snippets. Closes: #8	2021-05-29 18:52:03 +02:00
tastytea	00e3edb9f2	Only search files in spine, in the right order. The spine lists all content documents in their linear reading order. So we're finally getting our results in the right order! 🎉 Since we skip the images and fonts, which usually make up the most bytes in an EPUB file, the performance increase is immense. I measured 60-70% in a very short test. Closes: #1	2021-05-29 17:34:43 +02:00
tastytea	4ff796a590	Make regular expressions static variables. Fewer allocations → faster program. About 17% speed increase with 89 books on up to 3 cores. Measured using the average of 4 runs. Before: ~15,5 seconds After: ~12,8 seconds Calls to allocation functions went down from 16.652.583 to 5.059.301.	2021-05-28 19:11:32 +02:00
tastytea	e64591f204	Rework option parsing, change --no-filename. Options are now better accessible, --no-filename accepts the values filesystem, in-epub or all.	2021-05-27 17:20:00 +02:00
tastytea	c376ce8466	Print the EPUB file name if more than 1 input file. Change --no-filename to mean: Don't print the EPUB file name.	2021-05-27 14:46:23 +02:00
tastytea	29ae22cc4a	Make regex const.	2021-05-27 09:46:59 +02:00
tastytea	fe02b155f5	Import std::string into epubgrep::search namespace.	2021-05-26 18:02:27 +02:00
tastytea	e1d29c5893	Don't replace stuff in search::cleanup_text() if nothing matched.	2021-05-24 20:02:27 +02:00
tastytea	09090a1c13	Fix bugs in search::context(). - Don't add context if words == 0 - Handle beginning / end of text correctly.	2021-05-24 19:57:15 +02:00
tastytea	c790c4952c	Extract page numbers.	2021-05-24 18:56:43 +02:00
tastytea	bb4a4c719f	Wrap headlines in <H> and </H> during cleanup.	2021-05-24 18:08:40 +02:00
tastytea	8ab7d0f655	Extract headlines.	2021-05-24 17:27:30 +02:00
tastytea	972ce1d0fe	Don't strip headlines.	2021-05-24 16:37:30 +02:00
tastytea	bb1a43ca92	Move cleanup_text(), document functions.	2021-05-24 16:23:07 +02:00
tastytea	84e2b387e5	Clean up text before searching.	2021-05-24 16:01:41 +02:00
tastytea	1979956f03	Add basic search functionality and context output.	2021-05-24 15:35:49 +02:00
tastytea	1f82d9927a	Add skeleton for search::search(). - Type for matches - Type for options.	2021-05-24 07:52:36 +02:00

32 Commits
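
Note on 99e1cd8e98 and 4ff796a590: the sketch below illustrates the two ideas those messages describe, searching an iterator range of the original string instead of a temporary substring, and keeping the regular expression in a static variable. It is not epubgrep's code. It uses std::regex as a stand-in for boost::regex, and the function name first_id_after and the id="..." pattern are hypothetical, chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper, not taken from epubgrep: returns the value of the
// first id="..." attribute found at or after `offset`, or an empty string.
std::string first_id_after(const std::string &text, std::size_t offset)
{
    // Function-local static: the regex is compiled on the first call and
    // reused afterwards instead of being rebuilt every time.
    static const std::regex re_id{R"re(id="([^"]*)")re"};

    if (offset > text.size())
    {
        return {};
    }

    // Search the range [begin + offset, end) of the original string.
    // No temporary substring is created, so the sub_match iterators stay
    // valid for as long as `text` does.
    const auto first = text.cbegin() + static_cast<std::ptrdiff_t>(offset);
    std::smatch match;
    if (std::regex_search(first, text.cend(), match, re_id))
    {
        // Copy the capture out while the underlying string is still alive.
        return match[1].str();
    }
    return {};
}
```

Because the match only ever refers to `text`, there is no shorter-lived buffer for a sub_match to outlive, which is the class of problem the address sanitizer reported. Returning a std::string copy rather than the sub_match itself keeps the result independent of the caller's buffer.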