epubgrep

Author	SHA1	Message	Date
tastytea	7b4b9edfe5	Rename file names in search::matches to make it more clear.	2021-06-01 19:15:00 +02:00
tastytea	a7fae314b3	Log some progress info to log file.	2021-06-01 17:17:00 +02:00
tastytea	07915bdf87	Add lots of debug output.	2021-06-01 15:32:10 +02:00
tastytea	76ed0c9dbf	Un-escape named and numbered entities in documents before searching.	2021-05-30 23:32:35 +02:00
tastytea	7ddfe32e30	Move is_whitespace() and urldecode() to helpers.	2021-05-30 21:52:52 +02:00
tastytea	94564fa914	Strip whitespace from headlines.	2021-05-30 21:16:24 +02:00
tastytea	e7633fe134	Rename prefix to before and suffix to after.	2021-05-30 14:47:18 +02:00
tastytea	6255d665af	Replace tabs with a space in search::cleanup().	2021-05-30 14:37:05 +02:00
tastytea	d7ad180721	Use iterators in search::context() and don't return extra whitespace. Should be easier to understand now.	2021-05-30 13:45:56 +02:00
tastytea	790e60a055	Fix end-of-headline detection.	2021-05-29 23:00:16 +02:00
tastytea	37e868b3f2	Remove <style> and <script> snippets. Closes: #8	2021-05-29 18:52:03 +02:00
tastytea	00e3edb9f2	Only search files in spine, in the right order. The spine lists all content documents in their linear reading order. So we're finally getting our results in the right order! 🎉 Since we skip the images and fonts, which usually make up the most bytes in an EPUB file, the performance increase is immense. I measured 60-70% in a very short test. Closes: #1	2021-05-29 17:34:43 +02:00
tastytea	4ff796a590	Make regular expressions static variables. Fewer allocations → faster program. About 17% speed increase with 89 books on up to 3 cores. Measured using the average of 4 runs. Before: ~15,5 seconds After: ~12,8 seconds Calls to allocation functions went down from 16.652.583 to 5.059.301.	2021-05-28 19:11:32 +02:00
tastytea	e64591f204	Rework option parsing, change --no-filename. Options are now better accessible, --no-filename accepts the values filesystem, in-epub or all.	2021-05-27 17:20:00 +02:00
tastytea	c376ce8466	Print the EPUB file name if more than 1 input file. Change --no-filename to mean: Don't print the EPUB file name.	2021-05-27 14:46:23 +02:00
tastytea	29ae22cc4a	Make regex const.	2021-05-27 09:46:59 +02:00
tastytea	fe02b155f5	Import std::string into epubgrep::search namespace.	2021-05-26 18:02:27 +02:00
tastytea	e1d29c5893	Don't replace stuff in search::cleanup_text() if nothing matched.	2021-05-24 20:02:27 +02:00
tastytea	09090a1c13	Fix bugs in search::context(). - Don't add context if words == 0 - Handle beginning / end of text correctly.	2021-05-24 19:57:15 +02:00
tastytea	c790c4952c	Extract page numbers.	2021-05-24 18:56:43 +02:00
tastytea	bb4a4c719f	Wrap headlines in <H> and </H> during cleanup.	2021-05-24 18:08:40 +02:00
tastytea	8ab7d0f655	Extract headlines.	2021-05-24 17:27:30 +02:00
tastytea	972ce1d0fe	Don't strip headlines.	2021-05-24 16:37:30 +02:00
tastytea	bb1a43ca92	Move cleanup_text(), document functions.	2021-05-24 16:23:07 +02:00
tastytea	84e2b387e5	Clean up text before searching.	2021-05-24 16:01:41 +02:00
tastytea	1979956f03	Add basic search functionality and context output.	2021-05-24 15:35:49 +02:00
tastytea	1f82d9927a	Add skeleton for search::search(). - Type for matches - Type for options.	2021-05-24 07:52:36 +02:00

27 Commits