= epubgrep(1)
:doctype: manpage
:Author: tastytea
:Email: tastytea@tastytea.de
:Date: 2021-07-02
:Revision: 0.0.0
:man source: epubgrep
:man manual: General Commands Manual

== NAME
epubgrep - Search tool for EPUB e-books.

== SYNOPSIS
*epubgrep* [_OPTION_]… _PATTERN_ _FILE_…

== DESCRIPTION
*epubgrep* searches EPUB files in a similar way to *grep*(1). It uses the same
names for command line switches where possible. However, not all grep switches
are implemented, and some additional switches are added.

This manual is also available at
<https://man.schlomp.space/tastytea/?program=epubgrep>.

== EXAMPLES
.Search for Apple(s) or Orange(s) with 2 words of context around the matches, case insensitively
[source,shell]
--------------------------------------------------------------------------------
epubgrep -PiC2 '(Apple|Orange)s?' file.epub
--------------------------------------------------------------------------------
.Extract external hyperlinks
[source,shell]
--------------------------------------------------------------------------------
epubgrep -PC0 --raw --no-filename=all '"http[^"]+"' file.epub | tr -d '"'
--------------------------------------------------------------------------------
.Save the search results to an HTML file and output a status message every 20 seconds
[source,shell]
--------------------------------------------------------------------------------
epubgrep -C2 --status --status-interval=20 --html 'Apples' file.epub > result.html
--------------------------------------------------------------------------------

== OPTIONS
=== General options
*-h*, *--help*::
Display a short help message and exit.
*-V*, *--version*::
Show version, copyright and license.
*--debug*::
Write debug output to the terminal and to the log file.

=== Search options
*-G*, *--basic-regexp*::
_PATTERN_ is a POSIX basic regular expression. This is the default.
*-E*, *--extended-regexp*::
_PATTERN_ is a POSIX extended regular expression.
*--grep*::
In combination with *--basic-regexp* or *--extended-regexp*, _PATTERN_ is
treated as a newline-separated list of expressions; a match is found if any of
the expressions in the list matches.
*-P*, *--perl-regexp*::
_PATTERN_ is a Perl regular expression.
*-i*, *--ignore-case*::
Ignore case distinctions in pattern and data.
*-a*, *--raw*::
Do not clean up the text before searching: HTML is not stripped, newlines are
not removed, and all files are read (not just the text documents listed in the
spine).
*-r*, *--recursive*::
Read all files under each directory, recursively, following symbolic links only
if they are on the command line. Silently skips directories that are not
readable by the user.
*-R*, *--dereference-recursive*::
Read all files under each directory, recursively. Follow all symbolic
links. Silently skips directories that are not readable by the user.
*-e* _PATTERN_, *--regexp* _PATTERN_::
Use additional _PATTERN_ for matching. Can be used more than once.
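
*--grep* expects its expressions one per line. The following is a minimal
sketch, assuming a POSIX shell, of building such a newline-separated _PATTERN_
from several expressions; `join_patterns` is a hypothetical helper, not part
of epubgrep.

```shell
# Hypothetical helper (not part of epubgrep): print each argument on its
# own line. Command substitution strips the trailing newline, leaving a
# newline-separated list that can be passed as PATTERN together with --grep.
join_patterns() {
    printf '%s\n' "$@"
}

# Usage (assumes epubgrep is installed):
#   epubgrep -E --grep "$(join_patterns 'Apples?' 'Oranges?')" file.epub
# Roughly the same search, using -e to add the second expression:
#   epubgrep -E -e 'Oranges?' 'Apples?' file.epub
```

*--grep* is mostly convenient when the expressions already come one per line;
on the command line, *-e* is usually simpler.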

=== Output options
*-C* _NUMBER_, *--context* _NUMBER_::
Print _NUMBER_ words of context around matches.
*--nocolor*::
Turn off colors and other decorations.
*--no-filename* _WHICH_::
Suppress file names in the output. _WHICH_ is `filesystem` for the names of
the files on your file system, `in-epub` for the file names inside the EPUB, or
`all` for both. Chapters and page numbers are still printed.
*--ignore-archive-errors*::
Ignore errors about wrong file formats. When you search directories recursively,
it is likely that there are files which are not EPUB files. This setting
suppresses errors related to them.
*--json*::
Output JSON instead of plain text. JSON is only output at the end of the
program. There is an object named `generator` with the property `epubgrep`,
whose value is the version of the program as a string. The matches are in an
array named `matches`. I will try not to break the API. 😊
*--html*::
Output HTML instead of plain text. HTML will only be output at the end of the
program.
*--status*::
Output a status message to standard error every *--status-interval* seconds.
The default interval is 30 seconds.
*--status-interval* _NUMBER_::
Set status message interval to _NUMBER_ seconds.
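
The JSON output can be processed further with *jq*(1). The following is a
minimal sketch, assuming jq is installed and that `generator` and `matches` are
members of the top-level object; `json_summary` is a hypothetical helper, and it
does not rely on any fields of the individual matches.

```shell
# Hypothetical helper: summarize epubgrep's --json output read from standard
# input. Assumes "generator" and "matches" are keys of the top-level object.
json_summary() {
    jq -r '"epubgrep \(.generator.epubgrep): \(.matches | length) matches"'
}

# Usage (assumes epubgrep is installed):
#   epubgrep --json 'Apples' file.epub | json_summary
```

Because JSON is only written at the end of the program, the pipe produces no
output until the search has finished.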

== USAGE
[source,shellsession]
--------------------------------------------------------------------------------
$ epubgrep -i makhno -C 4 The_Bolshevik_Myth.epub
OPS/piece000038.xhtml, Chapter 33. Dark People, page 141: in the campaign against Makhno, and they were exchanging
--------------------------------------------------------------------------------
The output is <file path in EPUB>, <last headline>, <page number>: <context
before><match><context after>. The last headline and the page number may not be
available.

=== Differences to grep
*epubgrep* does not operate on lines but on whole files. All newlines are
replaced by spaces (consecutive newlines are condensed into one space) and HTML
is stripped. This means you can search for text spanning multiple lines without
worrying about HTML tags in the text. Use *--raw* to search the raw files
instead.
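
The clean-up described above can be approximated with standard tools. This is a
rough sketch for illustration only, not how *epubgrep* implements it:
`clean_text` is a hypothetical helper that condenses each run of newlines into
one space and then removes anything that looks like an HTML tag with a naive
regular expression. Unlike a real HTML parser, it does not decode entities and
may join words from adjacent elements.

```shell
# Hypothetical helper approximating epubgrep's clean-up (not its real code).
clean_text() {
    # Squeeze runs of newlines into one, turn that newline into a space,
    # then strip anything that looks like an HTML tag. Stripping after
    # joining also removes tags that are split across several lines.
    tr -s '\n' | tr '\n' ' ' | sed -e 's/<[^>]*>//g'
}
```

For example, `clean_text < chapter.xhtml | grep -o 'against Makhno'` would find
the phrase from USAGE even if it is split across two lines in _chapter.xhtml_
(a hypothetical file extracted from an EPUB).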

=== Configuration
Every command line switch can be used as an option in the configuration file. If
the switch takes no value (it is a simple on switch), write it as
`option = 1`. Do not put quotation marks around values; they would be taken
literally.

Command line options override configuration file options. Options that can
occur more than once are merged.

==== Example configuration file
This example makes epubgrep always search directories recursively, ignore files
that are not EPUB files, not print the file names inside the EPUB, print 2 words
of context around matches (unless overridden on the command line) and search
for mentions of the words thyme and oregano in every book.

[source,cfg]
--------------------------------------------------------------------------------
recursive = 1
ignore-archive-errors = 1
no-filename = in-epub
context = 2
regexp = [Tt]hyme
regexp = [Oo]regano
--------------------------------------------------------------------------------
// == EXAMPLES

== FILES
*Configuration file*::
* If `XDG_CONFIG_HOME` is defined: `${XDG_CONFIG_HOME}/epubgrep/epubgrep.conf`
* If `HOME` is defined: `${HOME}/.config/epubgrep/epubgrep.conf`
* Otherwise: `epubgrep.conf`

*Log file*::
* If `XDG_STATE_HOME` is defined: `${XDG_STATE_HOME}/epubgrep/epubgrep.log`
* If `HOME` is defined: `${HOME}/.local/state/epubgrep/epubgrep.log`
* Otherwise: `epubgrep.log`
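
The lookup order above can be written as a small shell function, for example
to check which files a given environment would use. This is a sketch:
`epubgrep_path` is a hypothetical helper, and it treats an empty variable like
an undefined one, which may differ from what epubgrep does.

```shell
# Hypothetical helper: print the configuration or log file path following
# the order listed in FILES. $1 is "config" or "log".
# Assumption: empty variables are treated as undefined.
epubgrep_path() {
    case "$1" in
        config) xdg="${XDG_CONFIG_HOME-}" fallback=".config" file="epubgrep.conf" ;;
        log)    xdg="${XDG_STATE_HOME-}" fallback=".local/state" file="epubgrep.log" ;;
        *)      return 1 ;;
    esac
    if [ -n "$xdg" ]; then
        printf '%s\n' "$xdg/epubgrep/$file"
    elif [ -n "${HOME-}" ]; then
        printf '%s\n' "$HOME/$fallback/epubgrep/$file"
    else
        # Neither variable is set: a bare file name.
        printf '%s\n' "$file"
    fi
}
```

For example, `epubgrep_path config` prints the configuration file path that the
current environment would select.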

== KNOWN BUGS
EPUB files with non-ASCII file names only work reliably when the system locale
uses an encoding that contains the necessary characters. Technically, EPUBs must
use UTF-8 for file names, but it is usually recommended to use only ASCII (which
is valid UTF-8). If your system locale is not UTF-8, files may be silently
skipped. You can work around this by calling epubgrep like this:
`LC_ALL="C.UTF-8" epubgrep`
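
To apply this workaround only when it is needed, *epubgrep* can be wrapped in a
shell function. The following is a minimal sketch, assuming *locale*(1) reports
the character encoding with `locale charmap`; `epubgrep_utf8` is a hypothetical
name.

```shell
# Hypothetical wrapper: run epubgrep with LC_ALL="C.UTF-8" unless the
# current locale already uses UTF-8. If locale fails, the charmap is
# empty and the workaround is applied.
epubgrep_utf8() {
    case "$(locale charmap 2>/dev/null)" in
        UTF-8|utf8) epubgrep "$@" ;;
        *)          LC_ALL="C.UTF-8" epubgrep "$@" ;;
    esac
}
```

`epubgrep_utf8 -i makhno book.epub` then behaves like `epubgrep` with the same
arguments.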

== REPORTING BUGS
Bugtracker: https://schlomp.space/tastytea/epubgrep/issues +
E-mail: tastytea@tastytea.de

== SEE ALSO
*grep*(1), *perlre*(1)
// LocalWords: epubgrep