epubgrep
epubgrep is a search tool for EPUB e-books. It does not operate on lines, but on whole files. All newlines will be replaced by spaces and HTML will be stripped. This means you can search for text spanning multiple lines and don’t have to worry about HTML tags in the text.
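The effect of this normalization can be illustrated with standard tools. The following is only a rough sketch of the idea, not epubgrep's actual implementation: the sample HTML, the variable names and the naive tag-stripping regex are assumptions made for illustration.

```shell
# Sample chapter fragment: the phrase "dark and stormy" is interrupted by
# an HTML tag and a line break (hypothetical input, for illustration only).
html='<p>It was a <em>dark</em>
and stormy night.</p>'

# Replace newlines with spaces, then strip anything that looks like a tag.
# The regex is deliberately naive and is not how epubgrep parses HTML.
flattened=$(printf '%s' "$html" | tr '\n' ' ' | sed -e 's/<[^>]*>//g')

printf '%s\n' "$flattened"

# A line-based search on the raw HTML does not find the phrase, ...
printf '%s\n' "$html" | grep -c 'dark and stormy' || true
# ... while the flattened text contains it.
printf '%s\n' "$flattened" | grep -c 'dark and stormy'
```

The first grep reports no matching line because the phrase is split by a tag and a newline; the second one finds it in the flattened text.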
epubgrep is licensed under the AGPL-3.0-only. The bundled Termcolor is licensed under the BSD-3-Clause license.
Usage
See the man page for more information.
Install
Gentoo
sudo eselect repository enable guru
echo 'app-text/epubgrep' | sudo tee -a /etc/portage/package.accept_keywords/epubgrep
sudo emaint sync -r guru
sudo emerge -a app-text/epubgrep
Debian and Ubuntu
wget -O - https://tastytea.de/tastytea.asc | sudo apt-key add -
sudo add-apt-repository 'deb https://apt.schlomp.space/[code name] [code name] main'
sudo apt install epubgrep
Replace [code name] with the code name of your installation. Packages are available for bullseye (Debian 11), buster (Debian 10), focal (Ubuntu 20.04) and bionic (Ubuntu 18.04).
Tip: If you get the error message that add-apt-repository was not found, install software-properties-common.
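The code name can usually be read from /etc/os-release instead of being typed by hand. A minimal sketch, assuming your system provides the VERSION_CODENAME field in that file; the function name apt_source_line and its optional file argument are hypothetical helpers and not part of epubgrep.

```shell
# Print the APT source line for the running system.
# Assumption: the os-release file defines VERSION_CODENAME.
apt_source_line() {
    os_release="${1:-/etc/os-release}"
    # Source the file in a subshell so its variables do not leak.
    codename=$(. "$os_release" && printf '%s' "${VERSION_CODENAME:-}")
    if [ -z "$codename" ]; then
        echo "could not determine the code name from $os_release" >&2
        return 1
    fi
    printf 'deb https://apt.schlomp.space/%s %s main\n' "$codename" "$codename"
}

# Usage: sudo add-apt-repository "$(apt_source_line)"
```

Check that the printed code name is one of the supported releases listed above before adding the repository.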
From source
Dependencies
- Tested OS: Linux
- C++ compiler with C++17 support (tested: GCC 8/9/10, clang 6/11)
- CMake (at least: 3.12)
- Boost (tested: 1.75.0 / 1.65.0)
- gettext (tested: 0.21 / 0.19)
- libarchive (tested: 3.5 / 3.2)
- fmt (tested: 7.0 / 4.0)
- AsciiDoc (tested: 9.0 / 8.6)
- Termcolor (tested: 2.0) (If not found, the bundled version is used.)
- pugixml (tested: 1.11 / 1.8)
- nlohmann_json (tested: 3.9 / 2.1)
- Optional
  - Tests: Catch (tested: 2.13 / 1.10)
Install dependencies in Debian or Ubuntu
This also applies to distributions derived from Debian or Ubuntu. You will need at least Debian buster (10) or Ubuntu focal (20.04).
sudo apt install build-essential cmake libboost-program-options-dev \
libboost-locale-dev libboost-regex-dev libboost-log-dev \
gettext libarchive-dev libfmt-dev asciidoc libpugixml-dev \
nlohmann-json-dev
Tip: If nlohmann-json-dev cannot be found, try nlohmann-json3-dev.
Install dependencies in openSUSE
Tested on openSUSE Leap 15.3.
sudo zypper install cmake gcc10-c++ rpm-build \
libboost_program_options1_75_0-devel \
libboost_locale1_75_0-devel libboost_log1_75_0-devel \
fmt-devel libarchive-devel pugixml-devel \
nlohmann_json-devel asciidoc
Get source code
Release
Download the current release at schlomp.space.
Development version
git clone https://schlomp.space/tastytea/epubgrep.git
Compile
In a terminal, go to the directory where you unpacked or cloned the source code and run:
cmake -S . -B build
cmake --build build --parallel $(nproc --ignore=1)
To install, run sudo cmake --install build. To run the tests, run ctest --test-dir build.
Tip: If you are using Debian or Ubuntu, or a distribution that is derived from these, you can run cpack -G DEB in the build directory to generate a .deb file. You can then install it with apt install ./epubgrep-*.deb.
If you are using a distribution that uses RPM packages, like openSUSE or Fedora, you can generate a package with cpack -G RPM and install it with zypper install ./epubgrep-*.rpm or dnf install ./epubgrep-*.rpm.
The following options can be passed to the first cmake command (the configure step):
- -DCMAKE_BUILD_TYPE=Debug for a debug build.
- -DWITH_TESTS=YES if you want to compile the tests.
- -DXGETTEXT_CMD=String to set the program to use instead of xgettext.
- -DFALLBACK_BUNDLED=NO if you don’t want to fall back on bundled libraries.
- -DWITH_SANITIZER=YES to use sanitizers in debug builds.
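Several of the options above can be combined in one configure step. As a sketch, a debug build with tests and sanitizers enabled might be configured like this. The helper function debug_configure_cmd is hypothetical and only prints the command, so you can inspect it before running it from the source directory.

```shell
# Print a configure command that combines three of the options above.
# The build directory "build" matches the one used in the compile step.
debug_configure_cmd() {
    printf '%s\n' 'cmake -S . -B build -DCMAKE_BUILD_TYPE=Debug -DWITH_TESTS=YES -DWITH_SANITIZER=YES'
}

debug_configure_cmd
```

After running the printed command, build and test as described above with cmake --build build and ctest --test-dir build.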
Similar projects
- ripgrep-all can search EPUB files and strips HTML, but does not display page numbers or headings.
- zipgrep from unzip can search EPUB files but does not strip HTML and does not display page numbers or headings.
Performance
A test with a directory containing 3333 EPUBs and 6269 files in total showed this difference between epubgrep-0.6.2 and ripgrep-all-0.9.6:
% hyperfine "epubgrep 'floor' ~/Books" "rga 'floor' ~/Books"
Benchmark #1: epubgrep 'floor' ~/Books
Time (mean ± σ): 167.246 s ± 3.848 s [User: 176.251 s, System: 79.107 s]
Range (min … max): 161.533 s … 173.647 s 10 runs
Benchmark #2: rga 'floor' ~/Books
Time (mean ± σ): 9.219 s ± 0.506 s [User: 17.540 s, System: 12.773 s]
Range (min … max): 8.571 s … 9.923 s 10 runs
Summary
'rga 'floor' ~/Books' ran
18.14 ± 1.08 times faster than 'epubgrep 'floor' ~/Books'
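The summary line can be reproduced from the two means and standard deviations reported above. This sketch assumes the uncertainty of the ratio comes from first-order error propagation for independent measurements, which reproduces the reported figures; the awk variable names are illustrative.

```shell
# Recompute the speed ratio and its uncertainty from the benchmark output.
speedup=$(awk 'BEGIN {
    slow = 167.246; slow_sd = 3.848   # epubgrep: mean and sigma in seconds
    fast = 9.219;   fast_sd = 0.506   # rga: mean and sigma in seconds
    ratio = slow / fast
    # Relative errors add in quadrature for a quotient.
    err = ratio * sqrt((slow_sd / slow)^2 + (fast_sd / fast)^2)
    printf "%.2f +/- %.2f", ratio, err
}')

echo "$speedup"   # prints: 18.14 +/- 1.08
```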