
epubgrep

epubgrep is a search tool for EPUB e-books. It operates on whole files rather than on individual lines: all newlines are replaced by spaces and HTML is stripped. This means you can search for text that spans multiple lines and don't have to worry about HTML tags in the text.
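To illustrate why this matters, here is a rough shell sketch of the same idea. It is not how epubgrep is implemented: the function name `flatten_html` and the sample text are made up for this example, and the regex-based tag stripping is a naive approximation that would not cope with every HTML document.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only (not part of epubgrep).
flatten_html() {
    # 1. Replace newlines with spaces, so tags and phrases spanning lines are joined.
    # 2. Remove anything that looks like an HTML tag.
    # 3. Squeeze runs of spaces into a single space.
    tr '\n' ' ' | sed -e 's/<[^>]*>//g' | tr -s ' '
}

# "brown fox" is split by markup and a line break, but is found after flattening.
printf '<p>The quick <em>brown</em>\nfox jumps</p>\n' | flatten_html | grep -o 'brown fox'
```

A plain line-based grep on the same input would not match `brown fox`, because the two words are separated by a closing tag and a newline.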

epubgrep is licensed under the AGPL-3.0-only. The bundled Termcolor is licensed under the BSD-3-Clause license.

Usage

Screenshot of epubgrep, showing the output of 2 book searches.

See the man page for more information.

Install


Gentoo

sudo eselect repository enable guru
echo 'app-text/epubgrep' | sudo tee -a /etc/portage/package.accept_keywords/epubgrep
sudo emaint sync -r guru
sudo emerge -a app-text/epubgrep

Debian and Ubuntu

wget -O - https://tastytea.de/tastytea.asc | sudo apt-key add -
sudo add-apt-repository 'deb https://apt.schlomp.space/[code name] [code name] main'
sudo apt install epubgrep

Replace [code name] with the code name of your installation. Packages are available for bullseye (Debian 11), buster (Debian 10), focal (Ubuntu 20.04) and bionic (Ubuntu 18.04).
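If you are unsure which code name your installation uses, it can usually be read from the `VERSION_CODENAME` field of `/etc/os-release`. The sketch below assumes that field is present, which should be the case on the releases listed above; the function name `codename_from` is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: print the release code name stored in an os-release file.
codename_from() {
    # Source the file in a subshell so its variables do not leak into this shell.
    ( . "$1" && printf '%s\n' "$VERSION_CODENAME" )
}

# Usual location of the file on Debian and Ubuntu.
if [ -r /etc/os-release ]; then
    codename_from /etc/os-release
fi
```

The printed value (for example bullseye or focal) is what goes in place of both occurrences of [code name].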

Tip
If you get an error message saying that add-apt-repository was not found, install software-properties-common.

From source

Dependencies

  • Tested OS: Linux

  • C++ compiler with C++17 support (tested: GCC 8/9/10, clang 6/11)

  • CMake (at least: 3.12)

  • Boost (tested: 1.75.0 / 1.65.0)

  • gettext (tested: 0.21 / 0.19)

  • libarchive (tested: 3.5 / 3.2)

  • fmt (tested: 7.0 / 4.0)

  • AsciiDoc (tested: 9.0 / 8.6)

  • Termcolor (tested: 2.0) (If not found, the bundled version is used.)

  • pugixml (tested: 1.11 / 1.8)

  • nlohmann_json (tested: 3.9 / 2.1)

  • Optional

    • Tests: Catch (tested: 2.13 / 1.10)

Install dependencies in Debian or Ubuntu

This also applies to distributions that are derived from Debian or Ubuntu. You will need at least Debian buster (10) or Ubuntu focal (20.04).

sudo apt install build-essential cmake libboost-program-options-dev \
                 libboost-locale-dev libboost-regex-dev libboost-log-dev \
                 gettext libarchive-dev libfmt-dev asciidoc libpugixml-dev \
                 nlohmann-json-dev

Tip
If nlohmann-json-dev cannot be found, try nlohmann-json3-dev.

Install dependencies in openSUSE

Tested on openSUSE Leap 15.3.

sudo zypper install cmake gcc10-c++ rpm-build \
                    libboost_program_options1_75_0-devel \
                    libboost_locale1_75_0-devel libboost_log1_75_0-devel \
                    fmt-devel libarchive-devel pugixml-devel \
                    nlohmann_json-devel asciidoc

Get the source code

Release

Download the current release at schlomp.space.

Development version

git clone https://schlomp.space/tastytea/epubgrep.git

Compile

In a terminal, go to the directory where you unpacked / cloned the source code and then:

cmake -S . -B build
cmake --build build --parallel $(nproc --ignore=1)

To install, run sudo cmake --install build. To run the tests, run ctest --test-dir build.

Tip
If you are using Debian or Ubuntu, or a distribution that is derived from these, you can run cpack -G DEB in the build directory to generate a .deb file. You can then install it with apt install ./epubgrep-*.deb.
If you are using a distribution that uses RPM packages, like openSUSE or Fedora, you can generate a package with cpack -G RPM and install it with zypper install ./epubgrep-*.rpm or dnf install ./epubgrep-*.rpm.

CMake options:

  • -DCMAKE_BUILD_TYPE=Debug for a debug build.

  • -DWITH_TESTS=YES if you want to compile the tests.

  • -DXGETTEXT_CMD=String to set the program to use instead of xgettext.

  • -DFALLBACK_BUNDLED=NO if you don't want to fall back on bundled libraries.

  • -DWITH_SANITIZER=YES to use sanitizers in debug builds.
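
The options are passed in the configure step and can be combined. As a sketch, a debug build with tests and sanitizers could be configured as shown below. The wrapper function `configure_debug` and the `CMAKE` override variable are made up for this example; only the CMake options themselves come from the list above.

```shell
#!/bin/sh
# Hypothetical wrapper around the configure step.
# Set CMAKE=echo to print the arguments instead of running CMake.
configure_debug() {
    "${CMAKE:-cmake}" -S . -B build \
        -DCMAKE_BUILD_TYPE=Debug \
        -DWITH_TESTS=YES \
        -DWITH_SANITIZER=YES
}
```

Call configure_debug from the source directory, then build and run the tests with the commands from the Compile section.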

Similar projects

  • ripgrep-all can search EPUB files and strips HTML, but does not display page numbers or headings.

  • zipgrep from unzip can search EPUB files but does not strip HTML and does not display page numbers or headings.

Performance

A test with a directory containing 3333 EPUBs and 6269 files in total showed this difference between epubgrep-0.6.2 and ripgrep-all-0.9.6:

% hyperfine "epubgrep 'floor' ~/Books" "rga 'floor' ~/Books"
Benchmark #1: epubgrep 'floor' ~/Books
  Time (mean ± σ):     167.246 s ±  3.848 s    [User: 176.251 s, System: 79.107 s]
  Range (min … max):   161.533 s … 173.647 s    10 runs

Benchmark #2: rga 'floor' ~/Books
  Time (mean ± σ):      9.219 s ±  0.506 s    [User: 17.540 s, System: 12.773 s]
  Range (min … max):    8.571 s …  9.923 s    10 runs

Summary
  'rga 'floor' ~/Books' ran
   18.14 ± 1.08 times faster than 'epubgrep 'floor' ~/Books'
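
The factor in the summary is the ratio of the two mean times. A quick way to check it, assuming awk is available (the function name `speedup` is made up for this example):

```shell
#!/bin/sh
# Mean wall-clock times in seconds, copied from the hyperfine output above.
epubgrep_mean=167.246
rga_mean=9.219

# Hypothetical helper: ratio of the slower to the faster time, two decimals.
speedup() {
    awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.2f\n", slow / fast }'
}

speedup "$epubgrep_mean" "$rga_mean"   # prints 18.14
```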