
= epubgrep
:showtitle:
:toc: preamble
:project: epubgrep
:uri-base: https://schlomp.space/tastytea/{project}
:uri-branch-main: {uri-base}/src/branch/main

:uri-gcc: https://gcc.gnu.org/
:uri-clang: https://clang.llvm.org/
:uri-cmake: https://cmake.org/
:uri-catch: https://github.com/catchorg/Catch2
:uri-boost: https://www.boost.org/
:uri-gettext: https://www.gnu.org/software/gettext/
:uri-libarchive: https://www.libarchive.org/
:uri-fmt: https://github.com/fmtlib/fmt
:uri-asciidoc: http://asciidoc.org/
:uri-termcolor: https://termcolor.readthedocs.io/
:uri-pugixml: https://pugixml.org/
:uri-json: https://nlohmann.github.io/json/

:license: https://schlomp.space/tastytea/{project}/src/branch/main/LICENSE
:license-termcolor: https://schlomp.space/tastytea/{project}/src/branch/main/dist/termcolor/LICENSE

*{project}* is a search tool for EPUB e-books. It does not operate on lines, but
on whole files. All newlines will be replaced by spaces and HTML will be
stripped. This means you can search for text spanning multiple lines and don't
have to worry about HTML tags in the text.
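
The idea can be illustrated with a small shell sketch. This is not how {project}
works internally; `normalize` is a hypothetical helper that only mimics the two
steps described above (joining lines, then dropping tags) with `tr` and `sed`,
and it ignores details such as HTML entities.

[source,shell]
--------------------------------------------------------------------------------
# Hypothetical helper, for illustration only: join all lines with spaces,
# then remove everything that looks like an HTML tag.
normalize() {
    tr '\n' ' ' | sed 's/<[^>]*>//g'
}

# The phrase spans two lines and an <em> tag in the input,
# but can be matched as plain text after normalizing.
printf '<p>It was a <em>dark\nand stormy</em> night.</p>\n' \
    | normalize | grep -o 'dark and stormy'
# prints: dark and stormy
--------------------------------------------------------------------------------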

{project} is licensed under the link:{license}[AGPL-3.0-only]. The bundled
link:{uri-termcolor}[Termcolor] is licensed under the
link:{license-termcolor}[BSD-3-Clause] license.

== Usage

[alt="Screenshot of epubgrep, showing the output of 2 book searches."]
image::{uri-base}/raw/branch/main/screenshot.png[]

See the
https://schlomp.space/tastytea/{project}/src/branch/main/man/{project}.1.adoc[man
page] for more information.

== Install

[alt="Packaging status" link=https://repology.org/project/epubgrep/versions]
image::https://repology.org/badge/vertical-allrepos/epubgrep.svg[]

=== Gentoo

[source,shell]
--------------------------------------------------------------------------------
sudo eselect repository enable guru
echo 'app-text/epubgrep' | sudo tee -a /etc/portage/package.accept_keywords/epubgrep
sudo emaint sync -r guru
sudo emerge -a app-text/epubgrep
--------------------------------------------------------------------------------

=== Debian and Ubuntu

[source,shell]
--------------------------------------------------------------------------------
wget -O - https://tastytea.de/tastytea.asc | sudo apt-key add -
sudo add-apt-repository 'deb https://apt.schlomp.space/[code name] [code name] main'
sudo apt install epubgrep
--------------------------------------------------------------------------------

Replace _[code name]_ with the code name of your installation. Packages are
available for *bullseye* (Debian 11), *buster* (Debian 10), *focal* (Ubuntu
20.04) and *bionic* (Ubuntu 18.04).
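
If you are unsure which code name your installation uses, it can usually be read
from `/etc/os-release`. The following is a minimal sketch; `codename_from` is a
hypothetical helper, and it assumes that the file defines `VERSION_CODENAME`,
which is the case on current Debian and Ubuntu releases.

[source,shell]
--------------------------------------------------------------------------------
# Hypothetical helper: print the code name stored in an os-release file.
codename_from() {
    # Source the file in a subshell so that its variables do not leak.
    ( . "$1" && printf '%s\n' "$VERSION_CODENAME" )
}

# Print the code name of the running system, if the file is readable.
if [ -r /etc/os-release ]; then
    codename_from /etc/os-release
fi
--------------------------------------------------------------------------------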

[TIP]
If you get the error message that `add-apt-repository` was not found, install
`software-properties-common`.

=== From source

==== Dependencies

* Tested OS: Linux
* C\++ compiler with C++17 support (tested: link:{uri-gcc}[GCC] 8/9/10,
  link:{uri-clang}[clang] 6/11)
* link:{uri-cmake}[CMake] (at least: 3.12)
* link:{uri-boost}[Boost] (tested: 1.75.0 / 1.65.0)
* link:{uri-gettext}[gettext] (tested: 0.21 / 0.19)
* link:{uri-libarchive}[libarchive] (tested: 3.5 / 3.2)
* link:{uri-fmt}[fmt] (tested: 7.0 / 4.0)
* link:{uri-asciidoc}[AsciiDoc] (tested: 9.0 / 8.6)
* link:{uri-termcolor}[Termcolor] (tested: 2.0) (If not found, the bundled
  version is used.)
* link:{uri-pugixml}[pugixml] (tested: 1.11 / 1.8)
* link:{uri-json}[nlohmann_json] (tested: 3.9 / 2.1)
* Optional
  ** Tests: link:{uri-catch}[Catch] (tested: 2.13 / 1.10)

===== Install dependencies in Debian or Ubuntu

These instructions also apply to distributions derived from Debian or Ubuntu.
You will need at least Debian buster (10) or Ubuntu focal (20.04).

[source,shell]
--------------------------------------------------------------------------------
sudo apt install build-essential cmake libboost-program-options-dev \
                 libboost-locale-dev libboost-regex-dev libboost-log-dev \
                 gettext libarchive-dev libfmt-dev asciidoc libpugixml-dev \
                 nlohmann-json-dev
--------------------------------------------------------------------------------

[TIP]
If `nlohmann-json-dev` cannot be found, try `nlohmann-json3-dev`.

===== Install dependencies in openSUSE

Tested on openSUSE Leap 15.3.

[source,shell]
--------------------------------------------------------------------------------
sudo zypper install cmake gcc10-c++ rpm-build \
                    libboost_program_options1_75_0-devel \
                    libboost_locale1_75_0-devel libboost_log1_75_0-devel \
                    fmt-devel libarchive-devel pugixml-devel \
                    nlohmann_json-devel asciidoc
--------------------------------------------------------------------------------

==== Get the source code

===== Release

Download the current release at link:{uri-base}/releases[schlomp.space].

===== Development version

[source,shell]
--------------------------------------------------------------------------------
git clone https://schlomp.space/tastytea/epubgrep.git
--------------------------------------------------------------------------------

==== Compile

In a terminal, go to the directory where you unpacked / cloned the source code
and then:

[source,shell]
--------------------------------------------------------------------------------
cmake -S . -B build
cmake --build build --parallel $(nproc --ignore=1)
--------------------------------------------------------------------------------

To install, run `sudo cmake --install build`. To run the tests, configure with
`-DWITH_TESTS=YES` and run `ctest --test-dir build` (requires CMake 3.20 or
newer).

[TIP]
If you are using Debian or Ubuntu, or a distribution that is derived from these,
you can run `cpack -G DEB` in the build directory to generate a .deb file. You
can then install it with `+++apt install ./epubgrep-*.deb+++`.
If you are using a distribution that uses RPM packages, like openSUSE or Fedora,
you can generate a package with `cpack -G RPM` and install it with `+++zypper
install ./epubgrep-*.rpm+++` or `+++dnf install ./epubgrep-*.rpm+++`.

.CMake options:
* `-DCMAKE_BUILD_TYPE=Debug` for a debug build.
* `-DWITH_TESTS=YES` if you want to compile the tests.
* `-DXGETTEXT_CMD=String` to set the program used instead of `xgettext`.
* `-DFALLBACK_BUNDLED=NO` if you don't want to fall back on bundled libraries.
* `-DWITH_SANITIZER=YES` to use sanitizers in debug builds.
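
These options are passed in the configure step. As an illustrative sketch, a
debug build with tests and sanitizers could be configured, built and tested like
this. It uses only the options listed above and assumes that you run it in the
source directory.

[source,shell]
--------------------------------------------------------------------------------
# Configure a debug build with tests and sanitizers enabled.
cmake -S . -B build \
      -DCMAKE_BUILD_TYPE=Debug \
      -DWITH_TESTS=YES \
      -DWITH_SANITIZER=YES
# Build, leaving one CPU core free.
cmake --build build --parallel $(nproc --ignore=1)
# Run the test suite from the build directory.
ctest --test-dir build
--------------------------------------------------------------------------------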

== Similar projects

* link:https://github.com/phiresky/ripgrep-all[ripgrep-all] can search EPUB
  files and strips HTML, but does not display page numbers or headings.
* zipgrep from link:http://infozip.sourceforge.net/[unzip] can search EPUB files
  but does not strip HTML and does not display page numbers or headings.

== Performance

A test with a directory containing 3333 EPUBs and 6269 files in total showed
this difference between epubgrep-0.6.2 and ripgrep-all-0.9.6:

[source,shellsession]
--------------------------------------------------------------------------------
% hyperfine "epubgrep 'floor' ~/Books" "rga 'floor' ~/Books"
Benchmark #1: epubgrep 'floor' ~/Books
  Time (mean ± σ):     167.246 s ±  3.848 s    [User: 176.251 s, System: 79.107 s]
  Range (min … max):   161.533 s … 173.647 s    10 runs

Benchmark #2: rga 'floor' ~/Books
  Time (mean ± σ):      9.219 s ±  0.506 s    [User: 17.540 s, System: 12.773 s]
  Range (min … max):    8.571 s …  9.923 s    10 runs

Summary
  'rga 'floor' ~/Books' ran
   18.14 ± 1.08 times faster than 'epubgrep 'floor' ~/Books'
--------------------------------------------------------------------------------

include::{uri-base}/raw/branch/main/CONTRIBUTING.adoc[]