= epubgrep(1)
:doctype: manpage
:Author: tastytea
:Email: tastytea@tastytea.de
:Date: 2021-05-30
:Revision: 0.0.0
:man source: epubgrep
:man manual: General Commands Manual

== NAME

epubgrep - Search tool for EPUB e-books.

== SYNOPSIS

*epubgrep* [_OPTION_]… _PATTERN_ [_FILE_]…

== DESCRIPTION

*epubgrep* searches EPUB files in a way similar to grep. It uses the same names
for command line switches where possible. However, not all grep switches are
implemented, and some switches are specific to epubgrep.

== OPTIONS

*-h*, *--help*::
Display a short help message and exit.
*-V*, *--version*::
Show version, copyright and license.
*-G*, *--basic-regexp*::
_PATTERN_ is a POSIX basic regular expression. This is the default.
*-E*, *--extended-regexp*::
_PATTERN_ is a POSIX extended regular expression.
*--grep*::
In combination with *--basic-regexp* or *--extended-regexp*, _PATTERN_ is
treated as a newline-separated list of expressions; a match is found if any
expression in the list matches.
*-P*, *--perl-regexp*::
_PATTERN_ is a Perl regular expression.
*-i*, *--ignore-case*::
Ignore case distinctions in pattern and data.
*-e* _PATTERN_, *--regexp* _PATTERN_::
Use additional _PATTERN_ for matching. Can be used more than once.
*-a*, *--raw*::
Do not clean up text before searching: HTML is not stripped, newlines are not
removed, and all files are read (not just the text documents listed in the spine).
*-C* _NUMBER_, *--context* _NUMBER_::
Print _NUMBER_ words of context around matches.
*--nocolor*::
Turn off colors and other decorations.
*--no-filename* _WHICH_::
Suppress file names in the output. _WHICH_ is one of _filesystem_ (file names on
your file system), _in-epub_ (file names inside the EPUB) or _all_. Chapters and
page numbers are still printed.
*-r*, *--recursive*::
Read all files under each directory, recursively, following symbolic links only
if they are on the command line. Silently skips directories that are not
readable by the user.
*-R*, *--dereference-recursive*::
Read all files under each directory, recursively. Follow all symbolic
links. Silently skips directories that are not readable by the user.
*--ignore-archive-errors*::
Ignore errors about wrong file formats. When you search directories recursively,
it is likely that there are files which are not EPUB files. This setting
suppresses errors related to them.

== USAGE

[source,shellsession]
--------------------------------------------------------------------------------
$ epubgrep -i makhno -C 4 The_Bolshevik_Myth.epub
OPS/piece000038.xhtml, Chapter 33. Dark People, page 141: in the campaign against Makhno, and they were exchanging
--------------------------------------------------------------------------------
The output is <file path in epub>, <last headline>, <page number>: <context
before><match><context after>. <last headline> and <page number> may not be
available.

=== Differences to grep

epubgrep does not operate on lines, but on whole files. All newlines will be
replaced by spaces (multiple newlines will be condensed into one space) and HTML
will be stripped. This means you can search for text spanning multiple lines and
don't have to worry about HTML tags in the text. Use *--raw* if you want to
search in the raw files instead.
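
The effect of the newline handling described above can be approximated in the
shell. This is an illustration only, not how epubgrep is implemented:
`flatten_newlines` is a hypothetical helper, and HTML stripping is left out.

[source,shell]
--------------------------------------------------------------------------------
# Illustration only, not epubgrep's implementation: turn each run of newlines
# into a single space. Note that tr -s also squeezes runs of spaces that were
# already present in the input.
flatten_newlines() {
    tr -s '\n' ' '
}

# The phrase "spanning multiple" is split across lines in the input, so a
# line-based search would miss it. After flattening, grep counts one match.
printf 'text spanning\n\nmultiple lines\n' | flatten_newlines | grep -c 'spanning multiple'
--------------------------------------------------------------------------------
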

=== Configuration

Every command line switch can be used as an option in the configuration file. If
the switch takes no value (it is a simple on/off switch), write it as
`option = 1`. Do not put quotation marks around values; they are taken
literally.

Command line options override configuration file options. Options that can
occur more than once are merged.

==== Example configuration file

This example makes epubgrep ignore files which are not EPUB files, suppress the
file names on output, print 2 words of context around matches (unless overridden
on the command line) and search for mentions of the words thyme and oregano in
every book.

[source,cfg]
--------------------------------------------------------------------------------
ignore-archive-errors = 1
no-filename = all
context = 2
regexp = [Tt]hyme
regexp = [Oo]regano
--------------------------------------------------------------------------------
// == EXAMPLES
== FILES
*Configuration file*::
* If `XDG_CONFIG_HOME` is defined: `${XDG_CONFIG_HOME}/epubgrep.conf`
* If `HOME` is defined: `${HOME}/.config/epubgrep.conf`
* Otherwise: `epubgrep.conf`
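
The lookup order above can be sketched as a small shell function, for example
to find out which file to edit. `epubgrep_config_path` is a hypothetical helper
and not part of epubgrep. The sketch assumes that an empty variable counts as
undefined, which may differ from what epubgrep itself does.

[source,shell]
--------------------------------------------------------------------------------
# Hypothetical helper, not part of epubgrep: print the configuration file path
# following the lookup order listed above.
# Assumption: an empty variable is treated like an undefined one.
epubgrep_config_path() {
    if [ -n "${XDG_CONFIG_HOME:-}" ]; then
        printf '%s\n' "${XDG_CONFIG_HOME}/epubgrep.conf"
    elif [ -n "${HOME:-}" ]; then
        printf '%s\n' "${HOME}/.config/epubgrep.conf"
    else
        # Relative path, resolved against the current directory.
        printf '%s\n' "epubgrep.conf"
    fi
}

# Print the path that applies to the current environment.
epubgrep_config_path
--------------------------------------------------------------------------------
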

== KNOWN BUGS

EPUB files with non-ASCII file names only work reliably when the system locale
uses an encoding which has the necessary characters. Technically, EPUBs must use
UTF-8 for file names, but it is usually recommended to use only ASCII (ASCII is
valid UTF-8). If your system locale is not UTF-8, files may be silently skipped.
You can work around this by calling epubgrep like this:
`LC_ALL="C.UTF-8" epubgrep`
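
If you do not want to type the workaround every time, it could be wrapped in a
shell function along these lines. Both function names are hypothetical and not
part of epubgrep. The sketch assumes that `locale charmap` reports the encoding
of the current locale and that the `C.UTF-8` locale exists on your system.

[source,shell]
--------------------------------------------------------------------------------
# Hypothetical helper, not part of epubgrep: succeed if the given character
# map name denotes UTF-8.
is_utf8_charmap() {
    case "$1" in
        UTF-8|utf-8|UTF8|utf8) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical wrapper: run epubgrep unchanged under a UTF-8 locale, otherwise
# apply the LC_ALL workaround described above.
# Assumption: the C.UTF-8 locale is available on this system.
epubgrep_utf8() {
    if is_utf8_charmap "$(locale charmap 2>/dev/null)"; then
        epubgrep "$@"
    else
        LC_ALL="C.UTF-8" epubgrep "$@"
    fi
}
--------------------------------------------------------------------------------
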

== REPORTING BUGS

Bug tracker: https://schlomp.space/tastytea/epubgrep/issues

E-mail: tastytea@tastytea.de

== SEE ALSO

*perlre*(1)
// LocalWords: epubgrep