epubgrep(1) Manual Page

NAME

epubgrep - Search tool for EPUB e-books.

SYNOPSIS

epubgrep [OPTION]… PATTERN FILE

DESCRIPTION

epubgrep searches EPUB files in much the same way that grep searches text files. It uses the same names for command line switches where possible. However, not all grep switches are implemented, and some additional switches have been added.

This manual is also available at https://man.schlomp.space/tastytea/?program=epubgrep.

EXAMPLES

Search for Apple(s) or Orange(s) with 2 words of context around the matches, case insensitively:
epubgrep -PiC2 '(Apple|Orange)s?' file.epub
Extract external hyperlinks:
epubgrep -PC0 --raw --no-filename=all '"http[^"]+"' file.epub | tr -d '"'
Save the search results to an HTML file and output a status message every 20 seconds:
epubgrep -C2 --status --status-interval=20 --html 'Apples' file.epub > result.html

OPTIONS

General options

-h, --help

Display a short help message and exit.

-V, --version

Show version, copyright and license.

--debug

Write debug output to the terminal and log file.

Search options

-G, --basic-regexp

PATTERN is a POSIX basic regular expression. This is the default.

-E, --extended-regexp

PATTERN is a POSIX extended regular expression.

--grep

In combination with --basic-regexp or --extended-regexp, PATTERN is treated as a newline-separated list of expressions. A match is found if any expression in the list matches.
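
A newline-separated list can be awkward to type directly. As a minimal sketch, the helper pattern_list below (a hypothetical name, not part of epubgrep) joins its arguments with newlines so that the result can be passed as PATTERN.

```shell
# Hypothetical helper, not part of epubgrep: print every argument on its
# own line to build a newline-separated PATTERN list for --grep.
pattern_list() {
    printf '%s\n' "$@"
}

# Command substitution strips the trailing newline, leaving exactly one
# expression per line.
patterns=$(pattern_list 'Apples?' 'Oranges?')
```

Going by the description above, epubgrep --grep -E "$patterns" file.epub should then report a match wherever either expression matches.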

-P, --perl-regexp

PATTERN is a Perl regular expression.

-i, --ignore-case

Ignore case distinctions in pattern and data.

-a, --raw

Do not clean up text before searching. No HTML stripping, no newline removal, all files will be read (not just the text documents listed in the spine).

-r, --recursive

Read all files under each directory, recursively, following symbolic links only if they are on the command line. Silently skips directories that are not readable by the user.

-R, --dereference-recursive

Read all files under each directory, recursively. Follow all symbolic links. Silently skips directories that are not readable by the user.

-e PATTERN, --regexp PATTERN

Use additional PATTERN for matching. Can be used more than once.

Output options

-C NUMBER, --context NUMBER

Print NUMBER words of context around matches.

--nocolor

Turn off colors and other decorations.

--no-filename WHICH

Suppress the mentioning of file names on output. WHICH is ‘filesystem’ for the file names on your file systems, ‘in-epub’ for the file names inside the EPUB or ‘all’. Chapters and page numbers will still be output.

--ignore-archive-errors

Ignore errors about wrong file formats. When you search directories recursively, it is likely that there are files which are not EPUB files. This setting suppresses errors related to them.

--json

Output JSON instead of plain text. JSON will only be output at the end of the program. There will be an object named generator with the property epubgrep, whose value is the version of the program as a string. The matches are in an array named matches. I will try not to break the API. 😊
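
The two documented names, generator.epubgrep and matches, are enough to post-process the output. The sketch below assumes the third-party tool jq is installed. Neither jq nor the helper names json_version and json_match_count come from epubgrep. The layout of the individual entries in matches is not described here, so the sketch only counts them.

```shell
# Sketch only: relies on the third-party tool jq, which is an assumption
# and not something epubgrep requires. Both helpers read JSON on stdin.

# Print the epubgrep version recorded in the generator object.
json_version() {
    jq -r '.generator.epubgrep'
}

# Print the number of entries in the matches array.
json_match_count() {
    jq '.matches | length'
}
```

For example, epubgrep --json 'Apples' file.epub | json_match_count would print how many matches were found.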

--html

Output HTML instead of plain text. HTML will only be output at the end of the program.

--status

Output a status message to standard error every --status-interval seconds. The default interval is 30 seconds.

--status-interval NUMBER

Set status message interval to NUMBER seconds.

USAGE

$ epubgrep -i makhno -C 4 The_Bolshevik_Myth.epub
OPS/piece000038.xhtml, Chapter 33. Dark People, page 141: in the campaign against Makhno, and they were exchanging

The output is <file path in epub>, <last headline>, <page number>: <context before><match><context after>. <last headline> and <page number> may not be available.

Differences from grep

epubgrep does not operate on lines, but on whole files. All newlines will be replaced by spaces (multiple newlines will be condensed into one space) and HTML will be stripped. This means you can search for text spanning multiple lines and don’t have to worry about HTML tags in the text. Use --raw if you want to search in the raw files instead.
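
The effect of this clean-up can be approximated with standard tools. The sketch below is a rough illustration and not epubgrep's actual implementation. The function name normalize and the sample text are made up for the example, and the tag stripping is deliberately crude.

```shell
# Rough approximation of the clean-up described above, NOT epubgrep's
# actual implementation: turn newlines into spaces, squeeze runs of
# spaces into one, then drop anything that looks like an HTML tag.
normalize() {
    tr -s '\n' ' ' | sed -e 's/<[^>]*>//g'
}

# Made-up sample: the phrase "campaign against Makhno" is split by a
# line break and interrupted by a tag.
sample=$(printf '<p>in the campaign\nagainst <b>Makhno</b>,</p>\n\n<p>and they</p>')
cleaned=$(printf '%s' "$sample" | normalize)

# The phrase can now be found with a single-line pattern.
printf '%s\n' "$cleaned" | grep -c 'campaign against Makhno'
```

Running grep on the unprocessed sample would not find the phrase, because grep works line by line and the tags get in the way.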

Configuration

Every command line switch can be used as an option in the configuration file. If the switch takes no value (it is a simple on switch), it has to be written as option = 1. Do not use quotation marks around the values; they will be taken literally.

Command line options override configuration file options. Options that can occur more than once are merged.

Example configuration file

This example makes epubgrep always search directories recursively, ignore files which are not EPUB, not print the file names inside the EPUB, print 2 words of context around matches (unless overridden on the command line) and search for mentions of the words thyme and oregano in every book.

recursive = 1
ignore-archive-errors = 1
no-filename = in-epub
context = 2
regexp = [Tt]hyme
regexp = [Oo]regano
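
As a minimal sketch, the example file above can be created from a shell script with a here-document. The helper name write_example_config is hypothetical. It takes the base configuration directory as its only argument and writes to the epubgrep/epubgrep.conf path documented under FILES.

```shell
# Hypothetical helper: write the example configuration below the given
# base directory, e.g. "$XDG_CONFIG_HOME" or "$HOME/.config".
write_example_config() {
    conf_dir="$1/epubgrep"
    mkdir -p "$conf_dir" || return 1
    # The quoted delimiter keeps the shell from expanding anything
    # inside the here-document.
    cat > "$conf_dir/epubgrep.conf" <<'EOF'
recursive = 1
ignore-archive-errors = 1
no-filename = in-epub
context = 2
regexp = [Tt]hyme
regexp = [Oo]regano
EOF
}
```

A call such as write_example_config "${XDG_CONFIG_HOME:-$HOME/.config}" overwrites any existing file at that location. With the file in place, passing -C 5 on the command line should take precedence over context = 2, following the rule that command line options win.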

FILES

Configuration file
  • If XDG_CONFIG_HOME is defined: ${XDG_CONFIG_HOME}/epubgrep/epubgrep.conf

  • If HOME is defined: ${HOME}/.config/epubgrep/epubgrep.conf

  • Otherwise: epubgrep.conf

Log file
  • If XDG_STATE_HOME is defined: ${XDG_STATE_HOME}/epubgrep/epubgrep.log

  • If HOME is defined: ${HOME}/.local/state/epubgrep/epubgrep.log

  • Otherwise: epubgrep.log
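
The lookup order above can be sketched as two small shell functions, for instance to locate the files from a script. The function names are hypothetical. The sketch assumes that an empty variable counts as undefined, which the list above does not state.

```shell
# Hypothetical helpers mirroring the lookup order listed above.
# Assumption: an empty variable is treated the same as an undefined one.

# Print the path of the configuration file.
epubgrep_config_path() {
    if [ -n "${XDG_CONFIG_HOME:-}" ]; then
        printf '%s\n' "${XDG_CONFIG_HOME}/epubgrep/epubgrep.conf"
    elif [ -n "${HOME:-}" ]; then
        printf '%s\n' "${HOME}/.config/epubgrep/epubgrep.conf"
    else
        printf '%s\n' 'epubgrep.conf'
    fi
}

# Print the path of the log file.
epubgrep_log_path() {
    if [ -n "${XDG_STATE_HOME:-}" ]; then
        printf '%s\n' "${XDG_STATE_HOME}/epubgrep/epubgrep.log"
    elif [ -n "${HOME:-}" ]; then
        printf '%s\n' "${HOME}/.local/state/epubgrep/epubgrep.log"
    else
        printf '%s\n' 'epubgrep.log'
    fi
}
```

For example, tail -f "$(epubgrep_log_path)" would follow the log file while epubgrep runs with --debug.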

KNOWN BUGS

EPUB files with non-ASCII file names only work reliably when the system locale uses an encoding that contains the necessary characters. Technically, EPUBs must use UTF-8 for file names, but using only ASCII is usually recommended (ASCII is valid UTF-8). If your system locale is not UTF-8, files may be silently skipped. You can work around this by calling epubgrep like this: LC_ALL="C.UTF-8" epubgrep

SEE ALSO

perlre(1)