= epubgrep(1)
:doctype: manpage
:Author: tastytea
:Email: tastytea@tastytea.de
:Date: 2021-05-30
:Revision: 0.0.0
:man source: epubgrep
:man manual: General Commands Manual

== NAME

epubgrep - Search tool for EPUB e-books.

== SYNOPSIS

*epubgrep* [_OPTION_]… _PATTERN_ [_FILE_]…

== DESCRIPTION

*epubgrep* searches EPUB files in a similar way to *grep*(1). It uses the same
names for command-line switches where possible. However, not all grep switches
are implemented, and some additional switches are added.

== OPTIONS

*-h*, *--help*::
Display a short help message and exit.

*-V*, *--version*::
Show version, copyright and license.

*-G*, *--basic-regexp*::
_PATTERN_ is a POSIX basic regular expression. This is the default.

*-E*, *--extended-regexp*::
_PATTERN_ is a POSIX extended regular expression.

*--grep*::
In combination with *--basic-regexp* or *--extended-regexp*, _PATTERN_ is
treated as a newline-separated list of expressions; a match is found if any of
the expressions in the list match.
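
A Python sketch of this rule, for illustration only: the pattern is split at each newline, and the text matches if any of the resulting expressions is found. Python's `re` syntax stands in for POSIX regular expressions here, so details differ, and `grep_style_match` is a hypothetical name.

```python
import re


# Hypothetical helper illustrating --grep; not epubgrep's code.
def grep_style_match(pattern: str, text: str, ignore_case: bool = False) -> bool:
    """Return True if any newline-separated expression in `pattern` matches."""
    flags = re.IGNORECASE if ignore_case else 0
    expressions = pattern.split("\n")
    return any(re.search(expr, text, flags) for expr in expressions)


if __name__ == "__main__":
    print(grep_style_match("thyme\noregano", "a pinch of oregano"))  # True
    print(grep_style_match("thyme\noregano", "a pinch of basil"))    # False
```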

*-P*, *--perl-regexp*::
_PATTERN_ is a Perl regular expression.

*-i*, *--ignore-case*::
Ignore case distinctions in pattern and data.

*-e* _PATTERN_, *--regexp* _PATTERN_::
Use additional _PATTERN_ for matching. Can be used more than once.

*-a*, *--raw*::
Do not clean up text before searching: no HTML stripping, no newline removal, and
all files will be read (not just the text documents listed in the spine).
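
For illustration, here is a Python sketch of which files each mode would read. By default, only the documents that the spine of the package document refers to are read. The package document is located through `META-INF/container.xml`, as the EPUB container format specifies. With *--raw*, every file in the archive is read. This is not epubgrep's implementation. The function names are hypothetical, and the sketch ignores percent-encoded `href` values, `..` in paths and multiple renditions.

```python
import posixpath
import xml.etree.ElementTree as ET
import zipfile

# Hypothetical helpers illustrating --raw; not epubgrep's code.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def spine_documents(zf: zipfile.ZipFile) -> list:
    """Paths of the documents listed in the spine, in reading order."""
    container = ET.fromstring(zf.read("META-INF/container.xml"))
    opf_path = container.find(".//c:rootfile", CONTAINER_NS).get("full-path")
    package = ET.fromstring(zf.read(opf_path))
    base = posixpath.dirname(opf_path)  # manifest hrefs are relative to the OPF
    hrefs = {item.get("id"): item.get("href")
             for item in package.findall("opf:manifest/opf:item", OPF_NS)}
    return [posixpath.join(base, hrefs[ref.get("idref")])
            for ref in package.findall("opf:spine/opf:itemref", OPF_NS)]


def files_to_search(epub_path: str, raw: bool = False) -> list:
    with zipfile.ZipFile(epub_path) as zf:
        if raw:  # every file in the archive, directories excluded
            return [n for n in zf.namelist() if not n.endswith("/")]
        return spine_documents(zf)
```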

*-C* _NUMBER_, *--context* _NUMBER_::
Print _NUMBER_ words of context around matches.
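
The following Python sketch shows one way to cut out _NUMBER_ words of context. It counts whitespace-separated words before and after the match. Characters attached to the match, such as a trailing comma, are kept but not counted. With _NUMBER_ set to 4, it produces the same context as the example under USAGE. However, the word rule is an assumption rather than epubgrep's documented behavior, and `with_context` is a hypothetical name.

```python
import re


# Hypothetical helper illustrating -C; not epubgrep's code.
def with_context(text: str, start: int, end: int, words: int) -> str:
    """Return text[start:end] plus up to `words` words on either side.

    Assumption: a word is a run of non-whitespace characters; characters
    glued to the match (e.g. punctuation) are kept but not counted.
    """
    # Up to `words` words (each followed by whitespace), then any partial
    # word directly before the match; \Z anchors this to the match start.
    before = re.search(r"(?:\S+\s+){0,%d}\S*\Z" % words, text[:start]).group()
    # Any partial word directly after the match, then up to `words` words.
    after = re.match(r"\S*(?:\s+\S+){0,%d}" % words, text[end:]).group()
    return before + text[start:end] + after


if __name__ == "__main__":
    text = "They fought in the campaign against Makhno, and they were exchanging letters."
    m = re.search("makhno", text, re.IGNORECASE)
    print(with_context(text, m.start(), m.end(), 4))
    # in the campaign against Makhno, and they were exchanging
```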

*--nocolor*::
Turn off colors and other decorations.

*--no-filename* _WHICH_::
Suppress file names in the output. _WHICH_ is ‘filesystem’ for the file names on
your file systems, ‘in-epub’ for the file names inside the EPUB, or ‘all’.
Chapters and page numbers are still printed.

*-r*, *--recursive*::
Read all files under each directory, recursively, following symbolic links only
if they are on the command line. Silently skips directories that are not
readable by the user.

*-R*, *--dereference-recursive*::
Read all files under each directory, recursively. Follow all symbolic
links. Silently skips directories that are not readable by the user.
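
The difference between *-r* and *-R* can be sketched with Python's `os.walk()`. Directories named on the command line are always entered, even if they are symbolic links. With *-R*, `followlinks=True` also descends into linked directories. By default, `os.walk()` ignores directories it cannot read, which mirrors the silent skipping described above. Skipping symbolic links to files under *-r* is an assumption modelled on GNU grep, whose *-r* skips symbolic links it encounters while recursing. `collect_files` is a hypothetical name. Note that `followlinks=True` can recurse endlessly on a link that points to one of its own parent directories.

```python
import os
from typing import Iterable, List


# Hypothetical helper illustrating -r / -R; not epubgrep's code.
def collect_files(paths: Iterable[str], dereference_all: bool = False) -> List[str]:
    """Return the files below `paths`; False models -r, True models -R."""
    found = []
    for path in paths:
        # isdir() follows a symbolic link named on the command line.
        if not os.path.isdir(path):
            found.append(path)
            continue
        # By default os.walk() ignores directories it cannot read.
        for root, _dirs, files in os.walk(path, followlinks=dereference_all):
            for name in files:
                full = os.path.join(root, name)
                # Assumption (as in GNU grep -r): without -R, symbolic links
                # met while recursing are skipped.
                if dereference_all or not os.path.islink(full):
                    found.append(full)
    return sorted(found)
```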

*--ignore-archive-errors*::
Ignore errors about wrong file formats. When you search directories recursively,
there are likely to be files that are not EPUB files. This option
suppresses errors related to them.
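
Here is a minimal Python sketch of the idea. An EPUB file is a ZIP archive, so anything that is not a ZIP archive is treated as a wrong file format. The error is reported unless *--ignore-archive-errors* is in effect. The ZIP test and the message text are assumptions, and epubgrep's own checks and messages may differ. `searchable` is a hypothetical name.

```python
import zipfile
from typing import Callable, Iterable, Iterator


# Hypothetical helper illustrating --ignore-archive-errors; not epubgrep's code.
def searchable(paths: Iterable[str], ignore_archive_errors: bool,
               report: Callable[[str], None] = print) -> Iterator[str]:
    """Yield the paths that look like EPUB files (ZIP archives)."""
    for path in paths:
        # is_zipfile() also returns False for missing or unreadable files.
        if zipfile.is_zipfile(path):
            yield path
        elif not ignore_archive_errors:
            report(f"{path}: not an EPUB file")
```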

== USAGE

[source,shellsession]
--------------------------------------------------------------------------------
$ epubgrep -i makhno -C 4 The_Bolshevik_Myth.epub
OPS/piece000038.xhtml, Chapter 33. Dark People, page 141: in the campaign against Makhno, and they were exchanging
--------------------------------------------------------------------------------

The output is <file path in epub>, <last headline>, <page number>: <context
before><match><context after>. <last headline> and <page number> may not be available.
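
The format can be sketched in Python as follows. Writing the page as `page N` follows the example above. Leaving out a missing headline or page number together with its comma is an assumption. The parameter `no_filename` mirrors the ‘in-epub’ and ‘all’ values of *--no-filename*. Here it only removes the file path inside the EPUB. `format_match` is a hypothetical name.

```python
from typing import Optional


# Hypothetical helper illustrating the output format; not epubgrep's code.
def format_match(path_in_epub: str, headline: Optional[str], page: Optional[str],
                 context: str, no_filename: Optional[str] = None) -> str:
    """Build one output line: <path>, <headline>, <page>: <context>."""
    parts = []
    # --no-filename in-epub / all suppress the file name inside the EPUB.
    if no_filename not in ("in-epub", "all"):
        parts.append(path_in_epub)
    if headline:  # the last headline may not be available
        parts.append(headline)
    if page:  # the page number may not be available
        parts.append(f"page {page}")
    return ", ".join(parts) + ": " + context
```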

=== Differences from grep

epubgrep does not operate on lines but on whole files. All newlines are
replaced by spaces (multiple newlines are condensed into one space) and HTML
tags are stripped. This means you can search for text spanning multiple lines
and don't have to worry about HTML tags in the text. Use *--raw* if you want to
search in the raw files instead.
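
The clean-up described above can be approximated in Python. Tags are removed, every run of newlines becomes a single space, and character references such as `&amp;` are decoded. Removing tags with a regular expression is a simplification chosen for brevity, not how epubgrep processes XHTML. Decoding character references is an assumption. `clean_text` is a hypothetical name.

```python
import re
from html import unescape


# Hypothetical helper approximating the default clean-up; not epubgrep's code.
def clean_text(xhtml: str) -> str:
    """Strip tags and collapse newlines so a search can span line breaks."""
    text = re.sub(r"<[^>]*>", "", xhtml)  # strip tags (naive)
    text = re.sub(r"\n+", " ", text)  # each run of newlines -> one space
    return unescape(text)  # decode &amp; etc. only after tags are gone


if __name__ == "__main__":
    page = "<p>Dark\nPeople</p>\n\n<p>Salt &amp; pepper</p>"
    print(clean_text(page))  # Dark People Salt & pepper
```

After this step, a search for `Dark People` matches even though the two words were on separate lines in the source.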

=== Configuration

Every command-line switch can be used as an option in the configuration file,
written as its long name without the leading dashes. If the switch takes no
value (it is a simple on switch), it has to be written as `option = 1`. Do not
put quotation marks around values; they would be taken literally, as part of the
value.

Command-line options override configuration file options. Options that can
occur more than once are merged.
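
The rules above can be illustrated with a small Python sketch. It reads `key = value` lines and keeps quotation marks as part of the value. Repeated options are collected in a list, and command-line values then override those from the file. Two points are assumptions: which options may repeat (here only `regexp`), and whether patterns from the configuration file come first when merged. `parse_config` and `merge_options` are hypothetical names.

```python
from typing import Dict, List, Union

Value = Union[str, List[str]]
# Assumption: the options that may be given more than once.
MULTI_VALUED = {"regexp"}


# Hypothetical helpers illustrating the configuration rules; not epubgrep's code.
def parse_config(text: str) -> Dict[str, Value]:
    options: Dict[str, Value] = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if not sep:
            continue  # skip blank or malformed lines
        key, value = key.strip(), value.strip()  # quotes stay in the value
        if key in MULTI_VALUED:
            options.setdefault(key, []).append(value)
        else:
            options[key] = value
    return options


def merge_options(config: Dict[str, Value],
                  command_line: Dict[str, Value]) -> Dict[str, Value]:
    merged = dict(config)
    for key, value in command_line.items():
        if key in MULTI_VALUED:
            merged[key] = list(config.get(key, [])) + list(value)  # merged
        else:
            merged[key] = value  # the command line overrides the file
    return merged
```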

==== Example configuration file

This example makes epubgrep ignore files that are not EPUB files, suppress the
file names on output, print 2 words of context around matches (unless overridden
on the command line) and search for mentions of the words thyme and oregano in
every book.

[source,cfg]
--------------------------------------------------------------------------------
ignore-archive-errors = 1
no-filename = 1
context = 2
regexp = [Tt]hyme
regexp = [Oo]regano
--------------------------------------------------------------------------------

// == EXAMPLES

== FILES

*Configuration file*::
* If `XDG_CONFIG_HOME` is defined: `${XDG_CONFIG_HOME}/epubgrep.conf`
* If `HOME` is defined: `${HOME}/.config/epubgrep.conf`
* Otherwise: `epubgrep.conf`
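
The lookup order in the list above can be sketched in Python. The sketch returns the first candidate whose condition holds and does not check whether the file exists. This page does not say whether epubgrep falls back to the next candidate when a file is missing. `config_path` is a hypothetical name.

```python
import os
from typing import Mapping


# Hypothetical helper illustrating the lookup order; not epubgrep's code.
def config_path(env: Mapping[str, str] = os.environ) -> str:
    """Return the configuration file path according to the rules above."""
    if "XDG_CONFIG_HOME" in env:
        return os.path.join(env["XDG_CONFIG_HOME"], "epubgrep.conf")
    if "HOME" in env:
        return os.path.join(env["HOME"], ".config", "epubgrep.conf")
    return "epubgrep.conf"  # relative, so resolved against the working directory
```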

== KNOWN BUGS

EPUB files with non-ASCII file names only work reliably when the system locale
uses an encoding that contains the necessary characters. Technically, EPUBs must
use UTF-8 for file names, but it is usually recommended to use only ASCII (which
is valid UTF-8). If your system locale is not UTF-8, files may be silently
skipped. You can work around this by calling epubgrep like this:
`LC_ALL="C.UTF-8" epubgrep`
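
To check in advance whether a file name can be represented in the current locale, you could use a Python sketch like the following. It only tests the encoding and does not reproduce what epubgrep does with such files. `representable` is a hypothetical name.

```python
import locale
from typing import Optional


# Hypothetical helper; only checks encodability, not epubgrep's behavior.
def representable(name: str, encoding: Optional[str] = None) -> bool:
    """True if `name` can be encoded in `encoding` (default: the locale's)."""
    if encoding is None:
        encoding = locale.getpreferredencoding(False)
    try:
        name.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True
```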

== REPORTING BUGS

Bugtracker: https://schlomp.space/tastytea/epubgrep/issues

E-mail: tastytea@tastytea.de

== SEE ALSO

*perlre*(1)

// LocalWords: epubgrep