epubgrep(1) Manual Page
NAME
epubgrep - Search tool for EPUB e-books.
SYNOPSIS
epubgrep [OPTION]… PATTERN [FILE]…
DESCRIPTION
epubgrep searches EPUB files in a way similar to grep. It uses the same names for command line switches where possible. However, not all grep switches are implemented, and some additional switches have been added.
OPTIONS
- -h, --help
-
Display a short help message and exit.
- -V, --version
-
Show version, copyright and license.
- -G, --basic-regexp
-
PATTERN is a POSIX basic regular expression. This is the default.
- -E, --extended-regexp
-
PATTERN is a POSIX extended regular expression.
- --grep
-
In combination with --basic-regexp or --extended-regexp, PATTERN is treated as a newline-separated list of expressions. A match is found if any expression in the list matches.
- -P, --perl-regexp
-
PATTERN is a Perl regular expression.
- -i, --ignore-case
-
Ignore case distinctions in pattern and data.
- -e PATTERN, --regexp PATTERN
-
Use additional PATTERN for matching. Can be used more than once.
- -a, --raw
-
Do not clean up text before searching. No HTML stripping, no newline removal, all files will be read (not just the text documents listed in the spine).
- -C NUMBER, --context NUMBER
-
Print NUMBER words of context around matches.
- --nocolor
-
Do not color matches.
- --no-filename WHICH
-
Suppress file names in the output. WHICH is ‘filesystem’ for the file names on your file system, ‘in-epub’ for the file names inside the EPUB, or ‘all’ for both. Chapters and page numbers will still be output.
- -r, --recursive
-
Read all files under each directory, recursively, following symbolic links only if they are on the command line. Silently skips directories that are not readable by the user.
- -R, --dereference-recursive
-
Read all files under each directory, recursively. Follow all symbolic links. Silently skips directories that are not readable by the user.
- --ignore-archive-errors
-
Ignore errors about wrong file formats. When you search directories recursively, it is likely that there are files which are not EPUB files. This setting suppresses errors related to them.
USAGE
$ epubgrep -i makhno -C 4 The_Bolshevik_Myth.epub
OPS/piece000038.xhtml, Chapter 33. Dark People, page 141: in the campaign against Makhno, and they were exchanging
The output is <file path in epub>, <last headline>, <page number>: <context before><match><context after>. <last headline> and <page number> may not be available.
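The context part of this output format can be illustrated with a short Python sketch. It is not epubgrep's implementation: the function name `matches_with_context` is hypothetical, words are assumed to be whitespace-separated, and Python's `re` syntax stands in for the POSIX and Perl flavours that epubgrep supports.

```python
import re


def matches_with_context(text, pattern, words, ignore_case=False):
    """Yield (before, match, after) tuples for every match of pattern.

    `words` is the maximum number of whitespace-separated words kept on
    each side of the match, similar in spirit to -C NUMBER.
    """
    flags = re.IGNORECASE if ignore_case else 0
    for m in re.finditer(pattern, text, flags):
        # A slice of [-0:] would return the whole list, so handle 0 explicitly.
        before = text[:m.start()].split()[-words:] if words > 0 else []
        after = text[m.end():].split()[:words]
        yield " ".join(before), m.group(0), " ".join(after)


sample = "the quick brown fox jumps over the lazy dog near the river"
for before, match, after in matches_with_context(sample, "JUMPS", 2, ignore_case=True):
    print(f"{before} [{match}] {after}")
```

How epubgrep itself counts words next to punctuation may differ from this whitespace-based split.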
Differences from grep
epubgrep does not operate on lines, but on whole files. All newlines will be replaced by spaces (multiple newlines will be condensed into one space) and HTML will be stripped. This means you can search for text spanning multiple lines and don’t have to worry about HTML tags in the text. Use --raw if you want to search in the raw files instead.
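The clean-up described above can be sketched in Python as follows. This is only an illustration of the idea, not epubgrep's code: `clean_text` is a hypothetical helper, and the standard library's `html.parser` is assumed to be a good enough stand-in for whatever HTML stripping epubgrep performs.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect only the text nodes of a document and drop all tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def clean_text(markup):
    """Strip HTML tags and replace each run of newlines with one space."""
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any text still buffered by the parser
    text = "".join(parser.parts)
    return re.sub(r"[\r\n]+", " ", text)


raw = "<p>in the campaign\nagainst <em>Makhno</em>,\n\nand they were exchanging</p>"
cleaned = clean_text(raw)
# The phrase spans a line break and a tag in the raw file but matches here.
print(bool(re.search(r"campaign against Makhno", cleaned)))
```

Searching the raw markup for the same phrase would fail, because the newline and the `<em>` tag interrupt it. That is the situation --raw leaves you in.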
Configuration
Every command line switch can be used as an option in the configuration file. If the switch has no value (it is a simple on switch), it has to be written as option = 1. Do not use quotation marks around the values, they will be taken literally.
Command line options override configuration file options. Options that can occur more than once are merged.
Example configuration file
This example makes epubgrep ignore files which are not EPUB, suppress the file names on output, print 2 words of context around matches (unless overridden on the command line) and search for mentions of the words thyme and oregano in every book.
ignore-archive-errors = 1
no-filename = all
context = 2
regexp = [Tt]hyme
regexp = [Oo]regano
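The rules for combining the configuration file with the command line can be sketched in Python. The function names `parse_config` and `merge_options`, the set `REPEATABLE` and the merge order (configuration file values first) are assumptions made for illustration. They are not taken from epubgrep's source.

```python
# Options assumed to be allowed more than once (hypothetical list).
REPEATABLE = {"regexp"}


def parse_config(text):
    """Parse `key = value` lines into a dict mapping keys to value lists."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Values are kept verbatim, so quotation marks stay part of the value.
        options.setdefault(key.strip(), []).append(value.strip())
    return options


def merge_options(config, cli):
    """Command line values replace config values, except repeatable options."""
    merged = {key: list(values) for key, values in config.items()}
    for key, values in cli.items():
        if key in REPEATABLE:
            merged.setdefault(key, []).extend(values)
        else:
            merged[key] = list(values)
    return merged


config = parse_config(
    "ignore-archive-errors = 1\n"
    "context = 2\n"
    "regexp = [Tt]hyme\n"
    "regexp = [Oo]regano\n"
)
# Roughly what `epubgrep -C 4 -e basil ...` would contribute.
merged = merge_options(config, {"context": ["4"], "regexp": ["basil"]})
print(merged)
```

In this sketch the command line context of 4 replaces the configured 2, while the three regexp values end up in one list.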
FILES
- Configuration file
-
If XDG_CONFIG_HOME is defined: ${XDG_CONFIG_HOME}/epubgrep.conf
If HOME is defined: ${HOME}/.config/epubgrep.conf
Otherwise: epubgrep.conf
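The lookup order for the configuration file can be written as a small Python function. The name `config_file_path` is hypothetical, and the sketch assumes that "defined" means the variable is present in the environment, even if it is empty.

```python
import os
from pathlib import PurePosixPath


def config_file_path(env=None):
    """Return the configuration file path following the order listed above."""
    env = os.environ if env is None else env
    if "XDG_CONFIG_HOME" in env:
        return PurePosixPath(env["XDG_CONFIG_HOME"]) / "epubgrep.conf"
    if "HOME" in env:
        return PurePosixPath(env["HOME"]) / ".config" / "epubgrep.conf"
    # Neither variable is defined: fall back to a relative path.
    return PurePosixPath("epubgrep.conf")


print(config_file_path({"HOME": "/home/user"}))
```

Passing the environment as a mapping keeps the function easy to check without touching the real environment.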
KNOWN BUGS
EPUB files with non-ASCII file names only work reliably when the system locale
uses an encoding which has the necessary characters. Technically EPUBs must use
UTF-8 for file names but it is usually recommended to only use ASCII (ASCII is
valid UTF-8). If your system locale is not UTF-8, files may be silently skipped.
You can work around this by calling epubgrep like this:
LC_ALL="C.UTF-8" epubgrep
REPORTING BUGS
Bugtracker: https://schlomp.space/tastytea/epubgrep/issues
E-mail: tastytea@tastytea.de
SEE ALSO
perlre(1)