Compare commits

48 Commits
0.6.0 ... main

Author SHA1 Message Date
tastytea 449e315397
add performance section to readme
continuous-integration/drone/push Build is passing Details
2022-10-01 20:41:23 +02:00
tastytea 7eae29031f
disable cmake-format for now 2022-08-30 23:04:22 +02:00
tastytea 531a409124
clang-tidy: change MinimumVariableNameLength to 2 2022-08-19 01:41:34 +02:00
tastytea 22a50ef661
up the cognitive threshold to 30
continuous-integration/drone/push Build is passing Details
2022-08-16 21:42:13 +02:00
tastytea 94555621d8
fix release upload
continuous-integration/drone/push Build is passing Details
2022-08-16 19:14:19 +02:00
tastytea cfe274f1e1
fix tests (copy paste error)
continuous-integration/drone/push Build was killed Details
2022-08-16 19:03:08 +02:00
tastytea eb4630d738
version bump 0.6.2
continuous-integration/drone/push Build was killed Details
botched the 0.6.1 release 😅
2022-08-16 18:38:04 +02:00
tastytea bbc412db45
add support for testing with catch 3
continuous-integration/drone/push Build was killed Details
2022-08-16 18:35:00 +02:00
tastytea c0a2f7e779
pass c strings to fmt (…)
continuous-integration/drone/push Build was killed Details
2022-08-16 18:15:21 +02:00
tastytea 4b5e6898cd
pass c strings to fmt (and one more)
continuous-integration/drone/push Build was killed Details
2022-08-16 18:10:15 +02:00
tastytea c16265683f
pass c strings to fmt (found some more)
continuous-integration/drone/push Build is failing Details
2022-08-16 17:59:03 +02:00
tastytea d438e2292f
pass c strings to fmt (forgot some)
continuous-integration/drone/push Build is passing Details
2022-08-16 17:42:42 +02:00
tastytea 089eac4cfc
CI: install file on Debian and Ubuntu for .dev generation 2022-08-16 17:36:53 +02:00
tastytea 63a8ab2683
pass c strings to fmt
continuous-integration/drone/push Build is passing Details
boost strings and filesystem paths used to be automatically converted,
but that doesn't happen anymore with fmt 9
2022-08-16 16:26:17 +02:00
tastytea cd03898039
update .clang-tidy
continuous-integration/drone/push Build is passing Details
2022-08-16 05:30:59 +02:00
tastytea 550a1143a5
Don't install useless asciidoc dependencies. 2021-12-22 20:22:16 +01:00
tastytea d1083b7dca
CI: Fix dependencies. 2021-08-21 00:39:03 +02:00
tastytea 1058903def
Add more information about RPMs to readme.
continuous-integration/drone/push Build is passing Details
2021-08-21 00:14:04 +02:00
tastytea 5d28b1f4ef
CI: Modify zypper repos more elegantly. 2021-08-20 21:54:22 +02:00
tastytea bb37e53207
CI: refresh zypper data, resolve build dir conflict.
continuous-integration/drone/push Build is passing Details
2021-08-20 21:41:04 +02:00
tastytea 1bddad7083
CI: Fix openSUSE dependencies.
continuous-integration/drone/push Build is failing Details
2021-08-20 21:26:41 +02:00
tastytea 7daade6425
CI: Fix sed command. 2021-08-20 21:10:41 +02:00
tastytea c41f3a2485
CI: Add package generation for openSUSE Leap 15.
continuous-integration/drone/push Build was killed Details
2021-08-20 21:07:16 +02:00
tastytea 3e23dc2cd9
CI: Build apt and zypper steps in parallel.
continuous-integration/drone/push Build was killed Details
2021-08-20 20:42:05 +02:00
tastytea 9c6dd5ca64
CI: Add rpm package cache. 2021-08-20 20:41:52 +02:00
tastytea c62799e00f
CI Add openSUSE with GCC 9.
continuous-integration/drone/push Build was killed Details
2021-08-20 20:30:17 +02:00
tastytea 636e84408c
Compile with debug flags and sanitizers in CI. 2021-08-20 18:58:23 +02:00
tastytea ef77a9e4fb
Make sanitizers optional. 2021-08-20 18:54:27 +02:00
tastytea 552df1a49e
Don't crash if language detection fails.
continuous-integration/drone/push Build is passing Details
If there is no container.xml or something unexpected happens, we just return an
empty string.
2021-08-20 17:51:44 +02:00
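The fallback described in the commit message above (return an empty string instead of crashing) can be sketched as follows. This is a minimal illustration, not the project's code: `language_reader` and `detect_language` are hypothetical names, and the reader stands in for whatever looks up the language in the OPF file.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the real OPF lookup. It only needs to be callable
// and is allowed to throw.
using language_reader = std::function<std::string()>;

// Returns the detected language, or an empty string if the reader throws.
std::string detect_language(const language_reader &reader)
{
    try
    {
        return reader();
    }
    catch (const std::exception &)
    {
        // Missing container.xml or any other failure: report "no language".
        return "";
    }
}

// Usage example: a working reader and one that fails.
void demo_detect_language()
{
    assert(detect_language([] { return std::string{"de"}; }) == "de");
    assert(detect_language(
               []() -> std::string
               { throw std::runtime_error{"container.xml not found"}; })
               .empty());
}
```

Callers can then treat an empty string as "language unknown" without needing their own error handling.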
tastytea 1e0cde8a4b
Fix test, print exceptions. 2021-08-20 17:38:12 +02:00
tastytea 2bede91fb7
Remove some superfluous “std::”.
continuous-integration/drone/push Build is failing Details
2021-08-20 17:07:25 +02:00
tastytea 165592982a
Update german translation. 2021-08-20 17:07:12 +02:00
tastytea b1dcdea95e
Add language attribute to HTML output.
Bug: #16
2021-08-20 17:05:06 +02:00
tastytea 299063e02c
Add language to books, documents and matches.
Currently only the book's language is actually read and applied down the line.
2021-08-20 16:57:29 +02:00
tastytea fca719634a
Move OPF file path detection into own function. 2021-08-20 15:35:10 +02:00
tastytea d2aff45018
Move spine_filepaths() from zip:: to book::. 2021-08-20 15:29:55 +02:00
tastytea b134bd0301
Add pointer to preferred text version (raw or cleaned) to document. 2021-08-20 15:07:00 +02:00
tastytea d0738891c2
Ensure the correct order of files and the TOC.
continuous-integration/drone/push Build is passing Details
2021-08-17 14:22:28 +02:00
tastytea b53e99306c
Re-add support for raw text searching. 2021-08-17 13:55:53 +02:00
tastytea 84ef5d1bf3
Move book processing into own file.
continuous-integration/drone/push Build is failing Details
2021-08-17 13:05:14 +02:00
tastytea 97fecd37f0
Revert "Remove generator from CMake presets." – it is required.
continuous-integration/drone/push Build is passing Details
This reverts commit 49de44f729.
2021-08-05 20:27:17 +02:00
tastytea e154b62201
Add “Similar projects” to readme.
continuous-integration/drone/push Build is passing Details
2021-07-10 12:12:30 +02:00
tastytea 90eb30fa3e
Add sub-headings for option categories in man page. 2021-07-02 14:26:02 +02:00
tastytea 9cc1823b3b
clang-tidy: Set cognitive complexity threshold to 30.
25 is a bit low with a try-catch-block in a for-loop.
2021-06-29 02:09:40 +02:00
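To illustrate the kind of code the commit message above refers to, here is a small hypothetical function with a try-catch block inside a for-loop. As far as the clang-tidy documentation for `readability-function-cognitive-complexity` describes it, nested control flow contributes more to the score than flat control flow, so this shape accumulates complexity quickly. `sum_valid` is an invented example and does not appear in epubgrep.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical example: sums all entries that parse as non-negative integers
// and counts the ones that are skipped.
int sum_valid(const std::vector<std::string> &entries, int &skipped)
{
    int sum{0};
    skipped = 0;
    for (const auto &entry : entries) // The loop opens one nesting level.
    {
        try
        {
            const int value{std::stoi(entry)};
            if (value < 0) // Branch nested inside the loop.
            {
                ++skipped;
                continue;
            }
            sum += value;
        }
        catch (const std::invalid_argument &) // Handler nested inside the loop.
        {
            ++skipped;
        }
        catch (const std::out_of_range &)
        {
            ++skipped;
        }
    }
    return sum;
}

// Usage example: two valid entries, one unparsable, one negative.
void demo_sum_valid()
{
    int skipped{0};
    const std::vector<std::string> entries{"1", "2", "x", "-3"};
    assert(sum_valid(entries, skipped) == 3);
    assert(skipped == 2);
}
```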
tastytea 2489c444df
Add experimental RPM packe config to CMake config. 2021-06-29 02:00:40 +02:00
tastytea c99c01162d
Silence some clang-tidy warnings.
- Thread-unsafe std::getenv and std::setlocale doesn't matter for us.
- It is unlikely that we can make main() less complex without making it more
  complex elsewhere.
- Thread-unsafe std::strerror stays unsolved for now.
2021-06-29 01:58:53 +02:00
tastytea 49de44f729
Remove generator from CMake presets. 2021-06-29 01:22:46 +02:00
tastytea bdcf153b47
Fix usage quick-help.
continuous-integration/drone/push Build is passing Details
FILE is not optional.
2021-06-26 15:14:57 +02:00
27 changed files with 773 additions and 382 deletions

@ -1,5 +1,4 @@
# -*- mode: conf; fill-column: 100; -*-
# Written for clang-tidy 11.
# Written for clang-tidy 14.
---
Checks: '*,
@ -29,7 +28,9 @@ Checks: '*,
-fuchsia-multiple-inheritance,
-llvmlibc*,
-cppcoreguidelines-avoid-non-const-global-variables,
-cert-*-c'
-cert-*-c,
-abseil-string-find-*,
-altera-*'
FormatStyle: file # Use .clang-format.
CheckOptions: # ↓ Clashes with static private member prefix. (static int _var;) ↓
- { key: readability-identifier-naming.VariableCase, value: lower_case }
@ -39,9 +40,15 @@ CheckOptions: # ↓ Clashes with static private member prefix. (static int _va
- { key: readability-identifier-naming.ProtectedMemberCase, value: lower_case }
- { key: readability-identifier-naming.ProtectedMemberPrefix, value: _ }
- { key: readability-identifier-naming.ClassCase, value: lower_case }
- { key: readability-identifier-naming.ClassCase, value: lower_case }
- { key: readability-identifier-naming.StructCase, value: lower_case }
- { key: readability-identifier-naming.EnumCase, value: lower_case }
- { key: readability-identifier-naming.FunctionCase, value: lower_case }
- { key: readability-identifier-naming.ParameterCase, value: lower_case }
- { key: readability-function-cognitive-complexity.Threshold, value: 30 }
- { key: readability-identifier-length.MinimumVariableNameLength, value: 2 }
...
# -*- mode: yaml; fill-column: 100; -*-
# vim: set fenc=utf-8 tw=100 et ft=yaml:

5
.cmake-format.json Normal file

@ -0,0 +1,5 @@
{
"format": {
"disable": true
}
}

@ -4,9 +4,12 @@ kind: pipeline
type: docker
volumes:
- name: debian-package-cache
- name: deb-package-cache
host:
path: /var/cache/debian-package-cache
path: /var/cache/deb-package-cache
- name: rpm-package-cache
host:
path: /var/cache/rpm-package-cache
trigger:
event:
@ -14,7 +17,7 @@ trigger:
- tag
steps:
- name: GCC 10 / clang 11
- name: GCC 10 / clang 11 (debug)
image: debian:bullseye-slim
pull: always
environment:
@ -28,19 +31,19 @@ steps:
- apt-get update -q
- apt-get install -qq build-essential cmake clang locales
- apt-get install -qq catch libboost-program-options-dev libboost-locale-dev libboost-regex-dev libboost-log-dev gettext libarchive-dev libfmt-dev asciidoc libpugixml-dev nlohmann-json3-dev
- rm -rf build && mkdir -p build && cd build
- cmake -G "Unix Makefiles" -DWITH_TESTS=YES ..
- rm -rf build_deb && mkdir -p build_deb && cd build_deb
- cmake -DCMAKE_BUILD_TYPE=Debug -G "Unix Makefiles" -DWITH_TESTS=YES -DWITH_SANITIZERS=YES ..
- make VERBOSE=1
- make install DESTDIR=install
- ctest -V
- cd ../
- rm -rf build && mkdir -p build && cd build
- CXX="clang++" cmake -G "Unix Makefiles" -DWITH_TESTS=YES ..
- rm -rf build_deb && mkdir -p build_deb && cd build_deb
- CXX="clang++" cmake -DCMAKE_BUILD_TYPE=Debug -G "Unix Makefiles" -DWITH_TESTS=YES -DWITH_SANITIZERS=YES ..
- make VERBOSE=1
- make install DESTDIR=install
- ctest -V
volumes:
- name: debian-package-cache
- name: deb-package-cache
path: /var/cache/apt/archives
- name: Download CMake 3.12 installer
@ -69,20 +72,43 @@ steps:
- apt-get install -qq catch libboost-program-options-dev libboost-locale-dev libboost-regex-dev libboost-log-dev gettext libarchive-dev libfmt-dev asciidoc libpugixml-dev nlohmann-json-dev
- sh cmake_installer.sh --skip-license --exclude-subdir --prefix=/usr/local
- cp /usr/lib/x86_64-linux-gnu/libpugixml* /lib/x86_64-linux-gnu/
- rm -rf build && mkdir -p build && cd build
- rm -rf build_deb && mkdir -p build_deb && cd build_deb
- cmake -G "Unix Makefiles" -DWITH_TESTS=YES ..
- make VERBOSE=1
- make install DESTDIR=install
- ctest -V
- cd ../
- rm -rf build && mkdir -p build && cd build
- rm -rf build_deb && mkdir -p build_deb && cd build_deb
- CXX="clang++" cmake -G "Unix Makefiles" -DWITH_TESTS=YES ..
- make VERBOSE=1
- make install DESTDIR=install
- ctest -V
volumes:
- name: debian-package-cache
- name: deb-package-cache
path: /var/cache/apt/archives
depends_on:
- GCC 10 / clang 11 (debug)
- Download CMake 3.12 installer
- name: GCC 9
image: opensuse/leap:15
pull: always
environment:
CXX: g++-9
CXXFLAGS: -pipe -O2
LANG: C.UTF-8
commands:
- zypper --non-interactive modifyrepo --all --keep-packages
- zypper --non-interactive install cmake gcc9-c++ rpm-build
- zypper --non-interactive install Catch2-devel libboost_program_options1_75_0-devel libboost_locale1_75_0-devel libboost_log1_75_0-devel fmt-devel libarchive-devel pugixml-devel nlohmann_json-devel asciidoc
- rm -rf build_rpm && mkdir -p build_rpm && cd build_rpm
- cmake -G "Unix Makefiles" -DWITH_TESTS=YES ..
- make VERBOSE=1
- make install DESTDIR=install
- ctest -V
volumes:
- name: rpm-package-cache
path: /var/cache/zypp/packages
- name: notify
image: drillster/drone-email
@ -96,6 +122,11 @@ steps:
from_secret: email_password
when:
status: [ changed, failure ]
depends_on:
- GCC 10 / clang 11 (debug)
- Download CMake 3.12 installer
- GCC 9
- GCC 8 / clang 6
---
name: Packages x86_64
@ -103,9 +134,9 @@ kind: pipeline
type: docker
volumes:
- name: debian-package-cache
- name: deb-package-cache
host:
path: /var/cache/debian-package-cache
path: /var/cache/deb-package-cache
trigger:
event:
@ -124,16 +155,17 @@ steps:
- rm /etc/apt/apt.conf.d/docker-clean
- alias apt-get='rm -f /var/cache/apt/archives/lock && apt-get'
- apt-get update -q
- apt-get install -qq build-essential cmake clang locales lsb-release
- apt-get install -qq libboost-program-options-dev libboost-locale-dev libboost-regex-dev libboost-log-dev gettext libarchive-dev libfmt-dev asciidoc libpugixml-dev nlohmann-json3-dev
- rm -rf build && mkdir -p build && cd build
- apt-get install -qq build-essential cmake clang locales lsb-release file
- apt-get install -qq libboost-program-options-dev libboost-locale-dev libboost-regex-dev libboost-log-dev gettext libarchive-dev libfmt-dev libpugixml-dev nlohmann-json3-dev
- apt-get install -qq --no-install-recommends asciidoc xsltproc
- rm -rf build_deb && mkdir -p build_deb && cd build_deb
- cmake -G "Unix Makefiles" -DCMAKE_INSTALL_PREFIX=/usr ..
- make VERBOSE=1
- make install DESTDIR=install
- cpack -G DEB
- cp -v epubgrep_${DRONE_TAG}-0_amd64_bullseye.deb ..
volumes:
- name: debian-package-cache
- name: deb-package-cache
path: /var/cache/apt/archives
- name: Debian buster
@ -149,17 +181,20 @@ steps:
- rm /etc/apt/apt.conf.d/docker-clean
- alias apt-get='rm -f /var/cache/apt/archives/lock && apt-get'
- apt-get update -q
- apt-get install -qq build-essential cmake clang locales lsb-release
- apt-get install -qq libboost-program-options-dev libboost-locale-dev libboost-regex-dev libboost-log-dev gettext libarchive-dev libfmt-dev asciidoc libpugixml-dev nlohmann-json-dev
- rm -rf build && mkdir -p build && cd build
- apt-get install -qq build-essential cmake clang locales lsb-release file
- apt-get install -qq libboost-program-options-dev libboost-locale-dev libboost-regex-dev libboost-log-dev gettext libarchive-dev libfmt-dev libpugixml-dev nlohmann-json-dev
- apt-get install -qq --no-install-recommends asciidoc xsltproc
- rm -rf build_deb && mkdir -p build_deb && cd build_deb
- cmake -G "Unix Makefiles" -DCMAKE_INSTALL_PREFIX=/usr ..
- make VERBOSE=1
- make install DESTDIR=install
- cpack -G DEB
- cp -v epubgrep_${DRONE_TAG}-0_amd64_buster.deb ..
volumes:
- name: debian-package-cache
- name: deb-package-cache
path: /var/cache/apt/archives
depends_on:
- Debian bullseye
- name: Ubuntu focal
image: ubuntu:focal
@ -173,17 +208,21 @@ steps:
- rm /etc/apt/apt.conf.d/docker-clean
- alias apt-get='rm -f /var/cache/apt/archives/lock && apt-get'
- apt-get update -q
- apt-get install -qq build-essential cmake clang locales lsb-release
- apt-get install -qq libboost-program-options-dev libboost-locale-dev libboost-regex-dev libboost-log-dev gettext libarchive-dev libfmt-dev asciidoc libpugixml-dev nlohmann-json3-dev
- rm -rf build && mkdir -p build && cd build
- apt-get install -qq build-essential cmake clang locales lsb-release file
- apt-get install -qq libboost-program-options-dev libboost-locale-dev libboost-regex-dev libboost-log-dev gettext libarchive-dev libfmt-dev libpugixml-dev nlohmann-json3-dev
- apt-get install -qq --no-install-recommends asciidoc xsltproc
- rm -rf build_deb && mkdir -p build_deb && cd build_deb
- cmake -G "Unix Makefiles" -DCMAKE_INSTALL_PREFIX=/usr ..
- make VERBOSE=1
- make install DESTDIR=install
- cpack -G DEB
- cp -v epubgrep_${DRONE_TAG}-0_amd64_focal.deb ..
volumes:
- name: debian-package-cache
- name: deb-package-cache
path: /var/cache/apt/archives
depends_on:
- Debian bullseye
- Debian buster
- name: Download CMake 3.12 installer
image: plugins/download
@ -207,19 +246,46 @@ steps:
- rm /etc/apt/apt.conf.d/docker-clean
- alias apt-get='rm -f /var/cache/apt/archives/lock && apt-get'
- apt-get update -q
- apt-get install -qq g++-8 build-essential clang locales lsb-release
- apt-get install -qq libboost-program-options-dev libboost-locale-dev libboost-regex-dev libboost-log-dev gettext libarchive-dev libfmt-dev asciidoc libpugixml-dev nlohmann-json-dev
- apt-get install -qq g++-8 build-essential clang locales lsb-release file
- apt-get install -qq libboost-program-options-dev libboost-locale-dev libboost-regex-dev libboost-log-dev gettext libarchive-dev libfmt-dev libpugixml-dev nlohmann-json-dev
- apt-get install -qq --no-install-recommends asciidoc xsltproc
- sh cmake_installer.sh --skip-license --exclude-subdir --prefix=/usr/local
- cp /usr/lib/x86_64-linux-gnu/libpugixml* /lib/x86_64-linux-gnu/
- rm -rf build && mkdir -p build && cd build
- rm -rf build_deb && mkdir -p build_deb && cd build_deb
- cmake -G "Unix Makefiles" -DCMAKE_INSTALL_PREFIX=/usr ..
- make VERBOSE=1
- make install DESTDIR=install
- cpack -G DEB
- cp -v epubgrep_${DRONE_TAG}-0_amd64_bionic.deb ..
volumes:
- name: debian-package-cache
- name: deb-package-cache
path: /var/cache/apt/archives
depends_on:
- Debian bullseye
- Debian buster
- Ubuntu focal
- Download CMake 3.12 installer
- name: openSUSE Leap 15
image: opensuse/leap:15
pull: always
environment:
CXX: g++-9
CXXFLAGS: -pipe -O2
LANG: C.UTF-8
commands:
- zypper --non-interactive modifyrepo --all --keep-packages
- zypper --non-interactive install cmake gcc9-c++ rpm-build lsb-release
- zypper --non-interactive install libboost_program_options1_75_0-devel libboost_locale1_75_0-devel libboost_log1_75_0-devel fmt-devel libarchive-devel pugixml-devel nlohmann_json-devel asciidoc
- rm -rf build_rpm && mkdir -p build_rpm && cd build_rpm
- cmake -G "Unix Makefiles" -DCMAKE_INSTALL_PREFIX=/usr ..
- make VERBOSE=1
- make install DESTDIR=install
- cpack -G RPM
- cp -v epubgrep-${DRONE_TAG}-0.x86_64.opensuse-$(lsb_release --release --short).rpm ..
volumes:
- name: rpm-package-cache
path: /var/cache/zypp/packages
- name: gitea_release
image: plugins/gitea-release
@ -235,8 +301,15 @@ steps:
- epubgrep_${DRONE_TAG}-0_amd64_bullseye.deb
- epubgrep_${DRONE_TAG}-0_amd64_focal.deb
- epubgrep_${DRONE_TAG}-0_amd64_bionic.deb
- epubgrep-${DRONE_TAG}-0.x86_64.opensuse-$(lsb_release --release --short).rpm
checksum:
- sha512
depends_on:
- Debian bullseye
- Debian buster
- Ubuntu focal
- Ubuntu bionic
- openSUSE Leap 15
- name: notification
image: drillster/drone-email
@ -250,3 +323,9 @@ steps:
from_secret: email_password
when:
status: [ changed, failure ]
depends_on:
- Debian bullseye
- Debian buster
- Ubuntu focal
- Download CMake 3.12 installer
- Ubuntu bionic

@ -5,7 +5,7 @@ set(CMAKE_BUILD_TYPE "Release" CACHE STRING "The type of build.")
set(XGETTEXT_CMD "xgettext" CACHE STRING "The command for xgettext.")
project(epubgrep
VERSION 0.6.0
VERSION 0.6.2
DESCRIPTION "Search tool for EPUB e-books"
HOMEPAGE_URL "https://schlomp.space/tastytea/epubgrep"
LANGUAGES CXX)
@ -15,6 +15,7 @@ list(APPEND CMAKE_MODULE_PATH "${PROJECT_SOURCE_DIR}/cmake")
# Project build options.
option(WITH_TESTS "Compile tests." NO)
option(FALLBACK_BUNDLED "Fall back to bundled libs." YES)
option(WITH_SANITIZERS "Use sanitizers in debug builds." NO)
set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

@ -22,25 +22,17 @@
"inherits": "common",
"cacheVariables": {
"CMAKE_BUILD_TYPE": "Debug",
"WITH_TESTS": true
"WITH_TESTS": true,
"WITH_SANITIZERS": false
}
},
{
"name": "dev_gcc",
"displayName": "Developer config, GCC",
"description": "Build using GCC with debug symbols and tests enabled",
"name": "dev_san",
"displayName": "Developer config, with sanitizers",
"description": "Build with debug symbols, tests enabled and sanitizers enabled",
"inherits": "dev",
"cacheVariables": {
"CMAKE_CXX_COMPILER": "g++"
}
},
{
"name": "dev_clang",
"displayName": "Developer config, clang",
"description": "Build using clang with debug symbols and tests enabled",
"inherits": "dev",
"cacheVariables": {
"CMAKE_CXX_COMPILER": "clang++"
"WITH_SANITIZERS": true
}
},
{

@ -76,7 +76,7 @@ If you get the error message that `add-apt-repository` was not found, install
==== Dependencies
* Tested OS: Linux
* C\++ compiler with C++17 support (tested: link:{uri-gcc}[GCC] 8/10,
* C\++ compiler with C++17 support (tested: link:{uri-gcc}[GCC] 8/9/10,
link:{uri-clang}[clang] 6/11)
* link:{uri-cmake}[CMake] (at least: 3.12)
* link:{uri-boost}[Boost] (tested: 1.75.0 / 1.65.0)
@ -107,6 +107,19 @@ sudo apt install build-essential cmake libboost-program-options-dev \
[TIP]
If `nlohmann-json-dev` cannot be found, try `nlohmann-json3-dev`.
===== Install dependencies in openSUSE
Tested on openSUSE Leap 15.3.
[source,shell]
--------------------------------------------------------------------------------
sudo zypper install cmake gcc10-c++ rpm-build \
libboost_program_options1_75_0-devel \
libboost_locale1_75_0-devel libboost_log1_75_0-devel \
fmt-devel libarchive-devel pugixml-devel \
nlohmann_json-devel asciidoc
--------------------------------------------------------------------------------
==== Get sourcecode
===== Release
@ -137,12 +150,44 @@ To install, run `sudo cmake --install build`. To run the tests, run `ctest
[TIP]
If you are using Debian or Ubuntu, or a distribution that is derived from these,
you can run `cpack -G DEB` in the build directory to generate a .deb-file. You
can then install it with `apt install ./epubgrep-*.deb`.
can then install it with `+++apt install ./epubgrep-*.deb+++`.
If you are using a distribution that uses RPM packages, like openSUSE or Fedora,
you can generate a package with `cpack -G RPM` and install it with `+++zypper
install ./epubgrep-*.rpm+++` or `+++dnf install ./epubgrep-*.rpm+++`.
.CMake options:
* `-DCMAKE_BUILD_TYPE=Debug` for a debug build.
* `-DWITH_TESTS=YES` if you want to compile the tests.
* `-DXGETTEXT_CMD=String` The program to use instead of `xgettext`.
* `-DFALLBACK_BUNDLED=NO` if you don't want to fall back on bundled libraries.
* `-DWITH_SANITIZERS=YES` to use sanitizers in debug builds.
== Similar projects
* link:https://github.com/phiresky/ripgrep-all[ripgrep-all] can search EPUB
files and strips HTML, but does not display page numbers or headings.
* zipgrep from link:http://infozip.sourceforge.net/[unzip] can search EPUB files
but does not strip HTML and does not display page numbers or headings.
== Performance
A test with a directory containing 3333 EPUBs and 6269 files in total showed
this difference between epubgrep-0.6.2 and ripgrep-all-0.9.6:
[source,shellsession]
--------------------------------------------------------------------------------
% hyperfine "epubgrep 'floor' ~/Books" "rga 'floor' ~/Books"
Benchmark #1: epubgrep 'floor' ~/Books
Time (mean ± σ): 167.246 s ± 3.848 s [User: 176.251 s, System: 79.107 s]
Range (min … max): 161.533 s … 173.647 s 10 runs
Benchmark #2: rga 'floor' ~/Books
Time (mean ± σ): 9.219 s ± 0.506 s [User: 17.540 s, System: 12.773 s]
Range (min … max): 8.571 s … 9.923 s 10 runs
Summary
'rga 'floor' ~/Books' ran
18.14 ± 1.08 times faster than 'epubgrep 'floor' ~/Books'
--------------------------------------------------------------------------------
include::{uri-base}/raw/branch/main/CONTRIBUTING.adoc[]

@ -24,10 +24,13 @@ if(CMAKE_CXX_COMPILER_ID MATCHES "GNU" OR CMAKE_CXX_COMPILER_ID MATCHES "Clang"
"-Wdouble-promotion"
"-Wformat=2"
"-ftrapv"
"-fsanitize=undefined"
"-fsanitize=address"
"-Og"
"-fno-omit-frame-pointer")
if(WITH_SANITIZERS)
list(APPEND tmp_CXXFLAGS
"-fsanitize=undefined"
"-fsanitize=address")
endif()
if(CMAKE_CXX_COMPILER_ID MATCHES "GNU")
list(APPEND tmp_CXXFLAGS
"-Wlogical-op"
@ -45,9 +48,11 @@ if(CMAKE_CXX_COMPILER_ID MATCHES "GNU" OR CMAKE_CXX_COMPILER_ID MATCHES "Clang"
endif()
add_compile_options("$<$<CONFIG:Debug>:${tmp_CXXFLAGS}>")
list(APPEND tmp_LDFLAGS
"-fsanitize=undefined"
"-fsanitize=address")
if(WITH_SANITIZERS)
list(APPEND tmp_LDFLAGS
"-fsanitize=undefined"
"-fsanitize=address")
endif()
# add_link_options was introduced in version 3.13.
if(${CMAKE_VERSION} VERSION_LESS 3.13)
set(CMAKE_SHARED_LINKER_FLAGS_DEBUG "${tmp_LDFLAGS}")

@ -6,7 +6,9 @@ set(CPACK_PACKAGE_CONTACT "tastytea <tastytea@tastytea.de>")
# Should be set automatically, but they are not.
set(CPACK_PACKAGE_NAME "${PROJECT_NAME}")
set(CPACK_PACKAGE_VERSION "${PROJECT_VERSION}")
set(CPACK_PACKAGE_DESCRIPTION_SUMMARY "${CMAKE_PROJECT_DESCRIPTION}")
# DEB
# Figure out dependencies automatically.
set(CPACK_DEBIAN_PACKAGE_SHLIBDEPS ON)
@ -26,4 +28,30 @@ endif()
set(CPACK_DEBIAN_FILE_NAME
"${CPACK_PACKAGE_NAME}_${CPACK_PACKAGE_VERSION}-0_${CPACK_DEBIAN_PACKAGE_ARCHITECTURE}_${DEBIAN_CODENAME}.deb")
# RPM
set(CPACK_RPM_PACKAGE_LICENSE "AGPL-3")
# Figure out dependencies automatically.
set(CPACK_RPM_PACKAGE_AUTOREQ ON)
# Should be set automatically, but it is not.
execute_process(COMMAND uname -m
OUTPUT_VARIABLE CPACK_RPM_PACKAGE_ARCHITECTURE
OUTPUT_STRIP_TRAILING_WHITESPACE)
set(CPACK_PACKAGE_FILE_NAME
"${CPACK_PACKAGE_NAME}-${CPACK_PACKAGE_VERSION}-0.${CPACK_RPM_PACKAGE_ARCHITECTURE}")
execute_process(COMMAND lsb_release --id --short
OUTPUT_VARIABLE OS
OUTPUT_STRIP_TRAILING_WHITESPACE)
if("${OS}" STREQUAL "openSUSE")
execute_process(COMMAND lsb_release --release --short
OUTPUT_VARIABLE OS_RELEASE
OUTPUT_STRIP_TRAILING_WHITESPACE)
set(CPACK_PACKAGE_FILE_NAME
"${CPACK_PACKAGE_NAME}-${CPACK_PACKAGE_VERSION}-0.${CPACK_RPM_PACKAGE_ARCHITECTURE}.opensuse-${OS_RELEASE}")
endif()
include(CPack)

@ -2,7 +2,7 @@
:doctype: manpage
:Author: tastytea
:Email: tastytea@tastytea.de
:Date: 2021-06-24
:Date: 2021-07-02
:Revision: 0.0.0
:man source: epubgrep
:man manual: General Commands Manual
@ -13,7 +13,7 @@ epubgrep - Search tool for EPUB e-books.
== SYNOPSIS
*epubgrep* [_OPTION_]… _PATTERN_ [_FILE_]
*epubgrep* [_OPTION_]… _PATTERN_ _FILE_…
== DESCRIPTION
@ -46,6 +46,8 @@ epubgrep -C2 --status --status-interval=20 --html 'Apples' file.epub > result.ht
== OPTIONS
=== General options
*-h*, *--help*::
Display a short help message and exit.
@ -55,6 +57,8 @@ Show version, copyright and license.
*--debug*::
Write debug output to the terminal and log file.
=== Search options
*-G*, *--basic-regexp*::
_PATTERN_ is a POSIX basic regular expression. This is the default.
@ -88,6 +92,8 @@ links. Silently skips directories that are not readable by the user.
*-e* _PATTERN_, *--regexp* _PATTERN_::
Use additional _PATTERN_ for matching. Can be used more than once.
=== Output options
*-C* _NUMBER_, *--context* _NUMBER_::
Print _NUMBER_ words of context around matches.
@ -95,7 +101,6 @@ Print _NUMBER_ words of context around matches.
Turn off colors and other decorations.
*--no-filename* _WHICH_::
Suppress the mentioning of file names on output. _WHICH_ is filesystem for the
file names on your file systems, in-epub for the file names inside the EPUB or
all. Chapters and page numbers will still be output.

306
src/book.cpp Normal file

@ -0,0 +1,306 @@
/* This file is part of epubgrep.
* Copyright © 2021 tastytea <tastytea@tastytea.de>
*
* This program is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, version 3.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU Affero General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with this program. If not, see <http://www.gnu.org/licenses/>.
*/
#include "book.hpp"
#include "fs-compat.hpp"
#include "helpers.hpp"
#include "log.hpp"
#include "zip.hpp"
#include <boost/locale/message.hpp>
#include <boost/regex.hpp>
#include <fmt/format.h>
#include <fmt/ostream.h> // For compatibility with fmt 4.
#include <pugixml.hpp>
#include <algorithm>
#include <memory>
#include <string>
#include <string_view>
#include <vector>
namespace epubgrep::book
{
using boost::locale::translate;
using fmt::format;
using std::string;
book read(const fs::path filepath, const bool raw)
{
using helpers::unescape_html;
DEBUGLOG << "Processing book " << filepath;
std::vector<string> epub_filepaths{[&filepath, raw]
{
if (!raw)
{
return list_spine(filepath);
}
return zip::list(filepath);
}()};
book current_book;
current_book.language = [&filepath]() -> string
{
try
{
pugi::xml_document xml;
auto opf_file_path{get_opf_file_path(filepath)};
const std::string opf_file{
zip::read_file(filepath, opf_file_path.string())};
const auto result{xml.load_buffer(&opf_file[0], opf_file.size())};
if (result)
{
auto lang{xml.child("package")
.child("metadata")
.child("dc:language")};
if (lang == nullptr)
{
lang = xml.child("opf:package")
.child("opf:metadata")
.child("dc:language");
}
return lang.text().as_string();
}
}
catch (epubgrep::zip::exception &e)
{
if (e.code != 1) // 1 == container.xml not found.
{
LOG(log::sev::error) << e.what();
}
}
return "";
}();
DEBUGLOG << "Book language detected: " << current_book.language;
for (const auto &entry : epub_filepaths)
{
DEBUGLOG << "Processing document " << entry;
document doc;
if (!raw)
{
doc = process_page(unescape_html(zip::read_file(filepath, entry)));
}
else
{
doc.text_raw = zip::read_file(filepath, entry);
doc.text = std::make_unique<std::string>(doc.text_raw);
}
doc.language = current_book.language; // FIXME: Get language of doc.
current_book.files.emplace_back(entry, std::move(doc));
}
return current_book;
}
document process_page(const std::string_view text)
{
string output{text};
static const boost::regex re_header_start{"<[hH][1-6]"};
static const boost::regex re_header_end{"</[hH][1-6]"};
static const boost::regex re_pagebreak{"[^>]+pagebreak[^>]+"
"(title|aria-label)"
"=\"([[:alnum:]]+)\""};
{
size_t pos{0};
while ((pos = output.find_first_of("\n\t\r", pos)) != string::npos)
{
if (output[pos] == '\r')
{
output.erase(pos, 1);
}
else
{
output.replace(pos, 1, " ");
}
}
}
{
size_t pos{0};
while ((pos = output.find("  ", pos)) != string::npos)
{
output.replace(pos, 2, " ");
}
}
size_t pos{0};
document doc;
size_t headline_start{string::npos};
while ((pos = output.find('<', pos)) != string::npos)
{
auto endpos{output.find('>', pos) + 1};
if (boost::regex_match(output.substr(pos, 3), re_header_start))
{
headline_start = pos;
}
else if (boost::regex_match(output.substr(pos, 4), re_header_end))
{
if (headline_start != string::npos)
{
doc.headlines.insert(
{headline_start,
output.substr(headline_start, pos - headline_start)});
headline_start = string::npos;
}
}
else if (output.substr(pos, 6) == "<span ")
{
boost::match_results<string::const_iterator> match;
using it_size_t = string::const_iterator::difference_type;
string::const_iterator begin{output.begin()
+ static_cast<it_size_t>(pos)};
string::const_iterator end{output.begin()
+ static_cast<it_size_t>(endpos)};
if (boost::regex_search(begin, end, match, re_pagebreak))
{
doc.pages.insert({pos, match[2].str()});
}
}
else if (output.substr(pos, 7) == "<style "
|| output.substr(pos, 8) == "<script ")
{
if (output.find("/>", pos) > endpos)
{
endpos = output.find('>', endpos) + 1;
}
}
output.erase(pos, endpos - pos);
}
doc.text_cleaned = output;
doc.text = std::make_unique<string>(doc.text_cleaned);
return doc;
}
std::string headline(const document &doc, const size_t pos)
{
std::string_view last;
for (const auto &pair : doc.headlines)
{
if (pair.first > pos)
{
break;
}
last = pair.second;
}
return string(last);
}
string page(const document &doc, const size_t pos)
{
std::string_view last;
for (const auto &pair : doc.pages)
{
if (pair.first > pos)
{
break;
}
last = pair.second;
}
return string(last);
}
fs::path get_opf_file_path(const fs::path &zipfile)
{
pugi::xml_document xml;
const std::string container{
zip::read_file(zipfile, "META-INF/container.xml")};
const auto result{xml.load_buffer(&container[0], container.size())};
if (result)
{
return fs::path{xml.child("container")
.child("rootfiles")
.first_child()
.attribute("full-path")
.value()};
}
LOG(log::sev::error) << result.description() << '\n';
return fs::path{};
}
std::vector<string> list_spine(const fs::path &filepath)
{
auto opf_file_path{get_opf_file_path(filepath)};
std::vector<std::string> spine_filepaths;
if (!opf_file_path.empty())
{
DEBUGLOG << "Parsing " << opf_file_path;
pugi::xml_document xml;
const std::string opf_file{
zip::read_file(filepath, opf_file_path.string())};
const auto result{xml.load_buffer(&opf_file[0], opf_file.size())};
if (result)
{
auto manifest{xml.child("package").child("manifest")};
if (manifest == nullptr)
{
manifest = xml.child("opf:package").child("opf:manifest");
}
auto spine{xml.child("package").child("spine")};
if (spine == nullptr)
{
spine = xml.child("opf:package").child("opf:spine");
}
for (const auto &itemref : spine)
{
const auto &idref{itemref.attribute("idref").value()};
const auto &item{manifest.find_child_by_attribute("id", idref)};
auto href{helpers::urldecode(item.attribute("href").value())};
if (href[0] != '/')
{
href = (opf_file_path.parent_path() /= href);
}
DEBUGLOG << "Found in spine: " << href;
spine_filepaths.emplace_back(href);
}
}
else
{
LOG(log::sev::error) << "XML: " << result.description() << '\n';
}
}
if (opf_file_path.empty() || spine_filepaths.empty())
{
LOG(log::sev::error)
<< format(translate("{0:s} is damaged. Could not read spine. "
"Skipping file.\n")
.str()
.c_str(),
filepath.c_str());
return {};
}
return spine_filepaths;
}
} // namespace epubgrep::book
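The `headline()` and `page()` functions above both walk an ordered `std::map<size_t, string>` and keep the last entry whose position is not greater than `pos`. A minimal sketch of the same lookup using `std::map::upper_bound` follows. The helper name `last_entry_at` is hypothetical and not part of epubgrep; this is an alternative formulation, not the code in this change.

```cpp
#include <cstddef>
#include <iterator>
#include <map>
#include <string>

// Hypothetical helper (not in epubgrep): return the value whose key is the
// largest one that is <= pos, or an empty string if no such key exists.
std::string last_entry_at(const std::map<std::size_t, std::string> &entries,
                          const std::size_t pos)
{
    // upper_bound() returns the first entry whose key is > pos.
    const auto it{entries.upper_bound(pos)};
    if (it == entries.begin())
    {
        // Every key is greater than pos (or the map is empty).
        return {};
    }
    // The entry right before it is the last one with key <= pos.
    return std::prev(it)->second;
}
```

For large maps this lookup is logarithmic instead of linear. For the handful of headlines in a typical page the difference is probably negligible.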

src/book.hpp Normal file

@ -0,0 +1,73 @@
/* This file is part of epubgrep.
* Copyright © 2021 tastytea <tastytea@tastytea.de>
*
* This program is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, version 3.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU Affero General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with this program. If not, see <http://www.gnu.org/licenses/>.
*/
#ifndef EPUBGREP_BOOK_HPP
#define EPUBGREP_BOOK_HPP
#include "fs-compat.hpp"
#include <map>
#include <memory>
#include <string>
#include <string_view>
#include <utility>
#include <vector>
namespace epubgrep::book
{
using std::string;
//! Document inside EPUB.
struct document
{
string text_raw; //!< HTML page
string text_cleaned; //!< Plain text page
std::unique_ptr<string> text; //!< Pointer to preferred text version
std::map<size_t, string> headlines; //!< pos, title
std::map<size_t, string> pages; //!< pos, page
string language; //!< Page language
} __attribute__((aligned(128)));
//! EPUB file.
struct book
{
std::vector<std::pair<string, document>> files; //!< filename, file
std::vector<std::pair<string, string>> toc; //!< title, href
string language; //!< Book language
} __attribute__((aligned(128)));
//! Read and process book.
[[nodiscard]] book read(fs::path filepath, bool raw);
//! Clean up page and record headlines and page numbers.
[[nodiscard]] document process_page(std::string_view text);
//! Return last headline if possible.
[[nodiscard]] string headline(const document &doc, size_t pos);
//! Return current page if possible.
[[nodiscard]] string page(const document &doc, size_t pos);
//! Returns the file path of the OPF file in the EPUB.
[[nodiscard]] fs::path get_opf_file_path(const fs::path &zipfile);
//! Returns the files in the EPUB “spine” (all pages that are actually text).
[[nodiscard]] std::vector<string> list_spine(const fs::path &filepath);
} // namespace epubgrep::book
#endif // EPUBGREP_BOOK_HPP
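`list_spine()` resolves each manifest `href` relative to the directory of the OPF file unless the `href` starts with `/`. The sketch below isolates that step. It assumes `fs` is `std::filesystem` (the project goes through `fs-compat.hpp`, which may select a different implementation), and `resolve_href` is a hypothetical name used only for illustration.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem; // Assumption: fs-compat.hpp may differ.

// Hypothetical helper (not in epubgrep): turn an href from the OPF manifest
// into a path relative to the root of the EPUB archive.
std::string resolve_href(const fs::path &opf_file_path, const std::string &href)
{
    if (!href.empty() && href[0] == '/')
    {
        // Already anchored at the archive root, keep as-is.
        return href;
    }
    // Relative hrefs are relative to the directory containing the OPF file.
    return (opf_file_path.parent_path() / href).generic_string();
}
```

With an OPF file at `OEBPS/content.opf`, the href `text/ch1.xhtml` becomes `OEBPS/text/ch1.xhtml`. If the OPF file sits at the archive root, the href is returned unchanged.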


@ -184,7 +184,7 @@ std::string unescape_html(const std::string_view html)
std::string_view get_env(const std::string_view name)
{
const char *env = std::getenv(name.data());
const char *env = std::getenv(name.data()); // NOLINT(concurrency-mt-unsafe)
if (env != nullptr)
{
return env;
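A side note on the hunk above: `std::string_view::data()` is not guaranteed to be null-terminated, while `std::getenv` expects a C string. The following sketch shows one defensive variant that copies the name first. `get_env_copy` is a hypothetical name, and unlike `get_env` it returns an owning `std::string`. It is not the approach taken in this change.

```cpp
#include <cstdlib>
#include <string>
#include <string_view>

// Hypothetical variant (not in epubgrep): look up an environment variable
// from a string_view that may not be null-terminated.
std::string get_env_copy(const std::string_view name)
{
    // Copying into std::string guarantees a terminating '\0' for getenv().
    const std::string name_terminated{name};
    // NOLINTNEXTLINE(concurrency-mt-unsafe)
    const char *env{std::getenv(name_terminated.c_str())};
    if (env != nullptr)
    {
        return env;
    }
    return {};
}
```

This matters when the view is a slice of a longer string, for example the result of `substr()`. For views created from string literals the original code behaves the same.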


@ -48,6 +48,7 @@
constexpr int EXIT_FATAL{2}; // NOLINT(readability-identifier-naming)
// NOLINTNEXTLINE(readability-function-cognitive-complexity)
int main(int argc, char *argv[])
{
using namespace epubgrep;
@ -59,7 +60,7 @@ int main(int argc, char *argv[])
// locale_generator("").name.c_str() returns "*" instead of "". That's why
// the global C locale isn't changed. So we have to set it additionally.
std::setlocale(LC_ALL, "");
std::setlocale(LC_ALL, ""); // NOLINT(concurrency-mt-unsafe)
boost::locale::generator locale_generator;
locale_generator.add_messages_path("translations");
locale_generator.add_messages_path("/usr/share/locale");
@ -125,8 +126,10 @@ int main(int argc, char *argv[])
}
LOG(log::sev::error)
<< format(translate("Could not open {0:s}: {1:s}").str(),
e.path1(), e.what());
<< format(translate("Could not open {0:s}: {1:s}")
.str()
.c_str(),
e.path1().c_str(), e.what());
return_code = EXIT_FAILURE;
}
}
@ -174,9 +177,11 @@ int main(int argc, char *argv[])
catch (const std::ifstream::failure &e)
{
LOG(log::sev::error)
<< std::strerror(errno)
<< format(translate(" (while opening {0:s})").str(),
filepath);
<< std::strerror(errno) // FIXME: Not thread safe.
<< format(translate(" (while opening {0:s})")
.str()
.c_str(),
filepath.c_str());
return EXIT_FAILURE;
}
catch (const boost::regex_error &e)
@ -232,10 +237,11 @@ int main(int argc, char *argv[])
while (cancel.wait_for(std::chrono::seconds(opts.status_interval))
!= std::future_status::ready)
{
std::cerr
<< format(translate("{0:d} of {1:d} books searched.").str(),
books_searched, input_files.size())
<< '\n';
std::cerr << format(translate("{0:d} of {1:d} books searched.")
.str()
.c_str(),
books_searched, input_files.size())
<< '\n';
}
std::cerr << translate("All books searched.") << '\n';
}};


@ -159,7 +159,7 @@ options parse_options(int argc, char *argv[])
if (vm.count("help") != 0)
{
cout << translate("Usage: epubgrep [OPTION]… PATTERN [FILE]\n");
cout << translate("Usage: epubgrep [OPTION]… PATTERN FILE…\n");
cout << options_visible;
cout << translate("\nYou can access the full manual "
"with `man epubgrep`.\n");


@ -44,8 +44,8 @@ void print_matches(const std::vector<search::match> &matches,
{
cout << termcolor::yellow;
}
cout << format(translate(" In {0:s}: \n").str(),
fs::relative(matches[0].filepath_epub));
cout << format(translate(" In {0:s}: \n").str().c_str(),
fs::relative(matches[0].filepath_epub).c_str());
if (!opts.nocolor)
{
cout << termcolor::reset;
@ -140,8 +140,10 @@ void html_all(const std::vector<std::vector<search::match>> &matches_all,
{
std::uint64_t count{1};
cout << "<!DOCTYPE html>\n"
<< "<html><head><title>epubgrep output</title>"
cout << "<!DOCTYPE html>\n";
// Translators: Replace “en” with your language code here.
cout << format(R"(<html lang="{0:s}">)", translate("en").str());
cout << "<head><title>epubgrep output</title>"
"<style>article { margin: 1em; }</style>"
"</head><body>\n\n";
@ -167,21 +169,31 @@ void html_all(const std::vector<std::vector<search::match>> &matches_all,
if (!opts.no_fn_epub)
{
cout << format(R"( <th id="file_path_{0:d}">{1:s}</th>)",
count, translate("File path (in EPUB file)"))
count,
translate("File path (in EPUB file)").str().c_str())
<< '\n';
}
cout << format(R"( <th id="headline_{0:d}">{1:s}</th>)", count,
translate("Last headline"))
translate("Last headline").str().c_str())
<< '\n'
<< format(R"( <th id="page_{0:d}">{1:s}</th>)", count,
translate("Page number"))
translate("Page number").str().c_str())
<< '\n'
<< format(R"( <th id="match_{0:d}">{1:s}</th>)", count,
translate("Match"))
translate("Match").str().c_str())
<< "\n </tr>\n";
for (const auto &match : matches)
{
const auto lang{[&match]
{
if (!match.language.empty())
{
return format(R"( lang="{0:s}")",
match.language);
}
return std::string{};
}()};
cout << " <tr>\n";
if (!opts.no_fn_epub)
{
@ -190,15 +202,16 @@ void html_all(const std::vector<std::vector<search::match>> &matches_all,
match.filepath_inside)
<< '\n';
}
cout << format(R"( <td headers="headline_{0:d}">{1:s}</td>)",
count, match.headline)
cout << format(
R"( <td headers="headline_{0:d}"{1:s}>{2:s}</td>)", count,
lang, match.headline)
<< '\n';
cout << format(R"( <td headers="page_{0:d}">{1:s}</td>)",
count, match.page)
<< '\n';
cout << format(R"( <td headers="match_{0:d}">{1:s})"
R"(<strong>{2:s}</strong>{3:s}</td>)",
count, match.context.first, match.text,
cout << format(R"( <td headers="match_{0:d}"{1:s}>{2:s})"
R"(<strong>{3:s}</strong>{4:s}</td>)",
count, lang, match.context.first, match.text,
match.context.second)
<< '\n';
cout << " </tr>\n";
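The change to `html_all()` adds a `lang` attribute to the headline and match cells only when the match carries a language. Below is a minimal sketch of that pattern using plain string concatenation instead of `fmt`. The function names `lang_attribute` and `table_cell` are hypothetical, and the sketch does not escape HTML.

```cpp
#include <string>

// Hypothetical helper (not in epubgrep): build ` lang="xx"` including the
// leading space, or an empty string if the language is unknown.
std::string lang_attribute(const std::string &language)
{
    if (language.empty())
    {
        return {};
    }
    return " lang=\"" + language + "\"";
}

// Hypothetical helper: wrap content in a table cell, with an optional lang
// attribute. No HTML escaping is done here.
std::string table_cell(const std::string &language, const std::string &content)
{
    return "<td" + lang_attribute(language) + ">" + content + "</td>";
}
```

Because the leading space is part of the attribute string, an empty language yields a clean `<td>` without stray whitespace.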


@ -16,6 +16,7 @@
#include "search.hpp"
#include "book.hpp"
#include "fs-compat.hpp"
#include "helpers.hpp"
#include "log.hpp"
@ -43,8 +44,8 @@ std::vector<match> search(const fs::path &filepath,
const std::string_view regex, const settings &opts)
{
LOG(log::sev::info)
<< format(R"(Starting search in {0:s} using regex "{1:s}")", filepath,
regex);
<< format(R"(Starting search in {0:s} using regex "{1:s}")",
filepath.c_str(), regex);
boost::regex::flag_type flags{};
switch (opts.regex)
@ -73,33 +74,12 @@ std::vector<match> search(const fs::path &filepath,
const boost::regex re(regex.data(), flags);
std::vector<match> matches;
std::vector<string> epub_filepaths{[&opts, &filepath]
{
if (!opts.raw)
{
return zip::list_spine(filepath);
}
return zip::list(filepath);
}()};
for (const auto &entry : epub_filepaths)
auto book{book::read(filepath, opts.raw)};
for (const auto &file : book.files)
{
DEBUGLOG << "Processing " << entry;
file_in_epub file;
{
const auto document{zip::read_file(filepath, entry)};
if (!opts.raw)
{
file = cleanup_text(helpers::unescape_html(document));
}
else
{
file.text = document;
}
}
string::const_iterator begin{file.text.begin()};
string::const_iterator end{file.text.end()};
const auto &doc{file.second};
string::const_iterator begin{doc.text->begin()};
string::const_iterator end{doc.text->end()};
auto begin_text{begin};
boost::match_results<string::const_iterator> match_result;
@ -108,13 +88,14 @@ std::vector<match> search(const fs::path &filepath,
{
match match; // FIXME: Rename variable or struct.
match.filepath_epub = filepath;
match.filepath_inside = entry;
match.filepath_inside = file.first;
match.text = match_result[0];
match.context = context(match_result, opts.context);
const auto pos = static_cast<size_t>(
std::distance(begin_text, match_result[0].begin()));
match.headline = headline(file, pos);
match.page = page(file, pos);
match.headline = headline(doc, pos);
match.page = page(doc, pos);
match.language = doc.language; // FIXME: Get language of match.
matches.emplace_back(match);
begin = match_result[0].end();
@ -124,89 +105,6 @@ std::vector<match> search(const fs::path &filepath,
return matches;
}
file_in_epub cleanup_text(const std::string_view text)
{
string output{text};
static const boost::regex re_header_start{"<[hH][1-6]"};
static const boost::regex re_header_end{"</[hH][1-6]"};
static const boost::regex re_pagebreak{"[^>]+pagebreak[^>]+"
"(title|aria-label)"
"=\"([[:alnum:]]+)\""};
{
size_t pos{0};
while ((pos = output.find_first_of("\n\t\r", pos)) != string::npos)
{
if (output[pos] == '\r')
{
output.erase(pos, 1);
}
else
{
output.replace(pos, 1, " ");
}
}
}
{
size_t pos{0};
while ((pos = output.find(" ", pos)) != string::npos)
{
output.replace(pos, 2, " ");
}
}
size_t pos{0};
file_in_epub file;
size_t headline_start{string::npos};
while ((pos = output.find('<', pos)) != string::npos)
{
auto endpos{output.find('>', pos) + 1};
if (boost::regex_match(output.substr(pos, 3), re_header_start))
{
headline_start = pos;
}
else if (boost::regex_match(output.substr(pos, 4), re_header_end))
{
if (headline_start != string::npos)
{
file.headlines.insert(
{headline_start,
output.substr(headline_start, pos - headline_start)});
headline_start = string::npos;
}
}
else if (output.substr(pos, 6) == "<span ")
{
boost::match_results<string::const_iterator> match;
using it_size_t = string::const_iterator::difference_type;
string::const_iterator begin{output.begin()
+ static_cast<it_size_t>(pos)};
string::const_iterator end{output.begin()
+ static_cast<it_size_t>(endpos)};
if (boost::regex_search(begin, end, match, re_pagebreak))
{
file.pages.insert({pos, match[2].str()});
}
}
else if (output.substr(pos, 7) == "<style "
|| output.substr(pos, 8) == "<script ")
{
if (output.find("/>", pos) > endpos)
{
endpos = output.find('>', endpos) + 1;
}
}
output.erase(pos, endpos - pos);
}
file.text = output;
return file;
}
match_context context(const boost::match_results<string::const_iterator> &match,
std::uint64_t words)
{
@ -270,36 +168,4 @@ match_context context(const boost::match_results<string::const_iterator> &match,
return {before, after};
}
std::string headline(const file_in_epub &file, const size_t pos)
{
std::string_view last;
for (const auto &pair : file.headlines)
{
if (pair.first > pos)
{
break;
}
last = pair.second;
}
return string(last);
}
string page(const file_in_epub &file, const size_t pos)
{
std::string_view last;
for (const auto &pair : file.pages)
{
if (pair.first > pos)
{
break;
}
last = pair.second;
}
return string(last);
}
} // namespace epubgrep::search
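The search loop above restarts the regex search after each match and derives `pos` as the distance from the beginning of the text, which is then passed to `headline()` and `page()`. The sketch below shows that iteration pattern on its own. It uses `std::regex` from the standard library instead of `boost::regex` to stay self-contained, and `match_positions` is a hypothetical name.

```cpp
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not in epubgrep): collect the start offset of every
// match of pattern in text. std::regex stands in for boost::regex here.
std::vector<std::size_t> match_positions(const std::string &text,
                                         const std::string &pattern)
{
    const std::regex re{pattern};
    std::vector<std::size_t> positions;

    std::string::const_iterator begin{text.begin()};
    const std::string::const_iterator end{text.end()};
    const auto begin_text{begin}; // Fixed reference point for offsets.

    std::smatch result;
    while (begin != end && std::regex_search(begin, end, result, re))
    {
        positions.push_back(static_cast<std::size_t>(
            std::distance(begin_text, result[0].first)));
        // Continue after the match; step by one on an empty match at the
        // current position so the loop always makes progress.
        begin = (result[0].second == begin) ? std::next(begin)
                                            : result[0].second;
    }
    return positions;
}
```

Keeping `begin_text` separate from the moving `begin` iterator is what makes the offsets absolute rather than relative to the previous match.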


@ -43,7 +43,8 @@ struct match
std::string filepath_inside; //!< The file path of the matched line.
std::string headline; //!< The last headline, if available.
std::string page; //!< The page number, if available.
};
std::string language; //!< Match language.
} __attribute__((aligned(128)));
struct settings
{
@ -52,34 +53,25 @@ struct settings
bool ignore_case{false};
bool raw{false};
std::uint64_t context{0};
};
} __attribute__((aligned(16)));
struct file_in_epub
{
std::string text;
std::map<size_t, std::string> headlines;
std::map<size_t, std::string> pages;
};
} __attribute__((aligned(128)));
//! Search file, return matches.
[[nodiscard]] std::vector<match> search(const fs::path &filepath,
std::string_view regex,
const settings &opts);
//! Strip HTML, remove newlines, condense spaces.
[[nodiscard]] file_in_epub cleanup_text(std::string_view text);
//! Return words before and after the match.
[[nodiscard]] match_context
context(const boost::match_results<std::string::const_iterator> &match,
std::uint64_t words);
//! Return last headline if possible.
[[nodiscard]] std::string headline(const file_in_epub &file, size_t pos);
//! Return current page if possible.
[[nodiscard]] std::string page(const file_in_epub &file, size_t pos);
} // namespace epubgrep::search
#endif // EPUBGREP_SEARCH_HPP


@ -25,7 +25,6 @@
#include <boost/locale/message.hpp>
#include <fmt/format.h>
#include <fmt/ostream.h> // For compatibility with fmt 4.
#include <pugixml.hpp>
#include <cstdlib>
#include <cstring>
@ -56,8 +55,8 @@ std::vector<std::string> list(const fs::path &filepath)
<< format(translate("File in {0:s} is damaged. "
"Skipping in-EPUB file.\n")
.str()
.data(),
filepath);
.c_str(),
filepath.c_str());
continue;
}
toc.emplace_back(in_epub_filepath);
@ -85,7 +84,7 @@ std::string read_file(const fs::path &filepath, std::string_view entry_path)
"Skipping in-EPUB file.\n")
.str()
.data(),
filepath);
filepath.c_str());
continue;
}
if (std::strcmp(path, entry_path.data()) == 0)
@ -100,9 +99,9 @@ std::string read_file(const fs::path &filepath, std::string_view entry_path)
{
close_file(zipfile, filepath);
throw exception{
format(translate("Could not read {0:s} in {1:s}.").str(),
entry_path, filepath.string())};
throw exception{format(
translate("Could not read {0:s} in {1:s}.").str().c_str(),
entry_path, filepath.string())};
}
close_file(zipfile, filepath);
@ -116,7 +115,7 @@ std::string read_file(const fs::path &filepath, std::string_view entry_path)
if (entry_path == "META-INF/container.xml")
{ // File is probably not an EPUB.
exception e{format(translate("{0:s} not found in {1:s}.").str(),
exception e{format(translate("{0:s} not found in {1:s}.").str().c_str(),
entry_path, filepath.string())};
e.code = 1;
throw exception{e};
@ -146,7 +145,7 @@ struct archive *open_file(const fs::path &filepath)
{
close_file(zipfile, filepath);
exception e{format(translate("Could not open {0:s}.").str(),
exception e{format(translate("Could not open {0:s}.").str().c_str(),
filepath.string())};
e.code = 1;
throw exception{e};
@ -160,84 +159,10 @@ void close_file(struct archive *zipfile, const fs::path &filepath)
auto result{archive_read_free(zipfile)};
if (result != ARCHIVE_OK)
{
throw exception{format(translate("Could not close {0:s}.").str(),
filepath.string())};
throw exception{
format(translate("Could not close {0:s}.").str().c_str(),
filepath.string())};
}
}
std::vector<std::string> list_spine(const fs::path &filepath)
{
const auto opf_file_path{
[&filepath]
{
pugi::xml_document xml;
const std::string container{
read_file(filepath, "META-INF/container.xml")};
const auto result{xml.load_buffer(&container[0], container.size())};
if (result)
{
return fs::path{xml.child("container")
.child("rootfiles")
.first_child()
.attribute("full-path")
.value()};
}
LOG(log::sev::error) << result.description() << '\n';
return fs::path{};
}()};
std::vector<std::string> spine_filepaths;
if (!opf_file_path.empty())
{
DEBUGLOG << "Parsing " << opf_file_path;
pugi::xml_document xml;
const std::string opf_file{read_file(filepath, opf_file_path.string())};
const auto result{xml.load_buffer(&opf_file[0], opf_file.size())};
if (result)
{
auto manifest{xml.child("package").child("manifest")};
if (manifest == nullptr)
{
manifest = xml.child("opf:package").child("opf:manifest");
}
auto spine{xml.child("package").child("spine")};
if (spine == nullptr)
{
spine = xml.child("opf:package").child("opf:spine");
}
for (const auto &itemref : spine)
{
const auto &idref{itemref.attribute("idref").value()};
const auto &item{manifest.find_child_by_attribute("id", idref)};
auto href{helpers::urldecode(item.attribute("href").value())};
if (href[0] != '/')
{
href = (opf_file_path.parent_path() /= href);
}
DEBUGLOG << "Found in spine: " << href;
spine_filepaths.emplace_back(href);
}
}
else
{
LOG(log::sev::error) << "XML: " << result.description() << '\n';
}
}
if (opf_file_path.empty() || spine_filepaths.empty())
{
LOG(log::sev::error)
<< format(translate("{0:s} is damaged. Could not read spine. "
"Skipping file.\n")
.str()
.data(),
filepath);
return {};
}
return spine_filepaths;
}
} // namespace epubgrep::zip


@ -43,9 +43,6 @@ namespace epubgrep::zip
//! Close zip file.
void close_file(struct archive *zipfile, const fs::path &filepath);
//! Returns the files in the EPUB “spine” (all pages that are actually text).
[[nodiscard]] std::vector<std::string> list_spine(const fs::path &filepath);
//! It's std::runtime_error, but with another name.
class exception : public std::runtime_error
{


@ -5,11 +5,16 @@ file(COPY "test.epub3" DESTINATION ${CMAKE_CURRENT_BINARY_DIR})
find_package(Catch2 CONFIG)
if(Catch2_FOUND) # Catch 2.x
if(Catch2_FOUND) # Catch 2.x / 3.x
include(Catch)
add_executable(all_tests main.cpp ${sources_tests})
target_link_libraries(all_tests
PRIVATE Catch2::Catch2 ${PROJECT_NAME}_lib)
if(TARGET Catch2::Catch2WithMain) # Catch 3.x
target_link_libraries(all_tests
PRIVATE Catch2::Catch2WithMain ${PROJECT_NAME}_lib)
else() # Catch 2.x
target_link_libraries(all_tests
PRIVATE Catch2::Catch2 ${PROJECT_NAME}_lib)
endif()
target_include_directories(all_tests PRIVATE "/usr/include/catch2")
catch_discover_tests(all_tests EXTRA_ARGS "${EXTRA_TEST_ARGS}")
else() # Catch 1.x


@ -1,3 +1,8 @@
#define CATCH_CONFIG_MAIN
#include <catch.hpp>
// catch 3 does not have catch.hpp anymore
#if __has_include(<catch.hpp>)
# include <catch.hpp>
#else
# include <catch_all.hpp>
#endif


@ -1,7 +1,12 @@
#include "fs-compat.hpp"
#include "helpers.hpp"
#include <catch.hpp>
// catch 3 does not have catch.hpp anymore
#if __has_include(<catch.hpp>)
# include <catch.hpp>
#else
# include <catch_all.hpp>
#endif
#include <array>
#include <exception>


@ -2,7 +2,12 @@
#include "options.hpp"
#include "search.hpp"
#include <catch.hpp>
// catch 3 does not have catch.hpp anymore
#if __has_include(<catch.hpp>)
# include <catch.hpp>
#else
# include <catch_all.hpp>
#endif
#include <clocale>
#include <exception>


@ -1,7 +1,13 @@
#include "book.hpp"
#include "fs-compat.hpp"
#include "search.hpp"
#include <catch.hpp>
// catch 3 does not have catch.hpp anymore
#if __has_include(<catch.hpp>)
# include <catch.hpp>
#else
# include <catch_all.hpp>
#endif
#include <clocale>
#include <exception>
@ -26,7 +32,7 @@ SCENARIO("Searching helpers work as intended")
text = "Moss";
try
{
text = epubgrep::search::cleanup_text(text).text;
text = epubgrep::book::process_page(text).text_cleaned;
}
catch (const std::exception &)
{
@ -46,7 +52,7 @@ SCENARIO("Searching helpers work as intended")
text = "💖\r\r🦝";
try
{
text = epubgrep::search::cleanup_text(text).text;
text = epubgrep::book::process_page(text).text_cleaned;
}
catch (const std::exception &)
{
@ -66,7 +72,7 @@ SCENARIO("Searching helpers work as intended")
text = "Moss\n\n\n\n\n\nis good.";
try
{
text = epubgrep::search::cleanup_text(text).text;
text = epubgrep::book::process_page(text).text_cleaned;
}
catch (const std::exception &)
{
@ -91,8 +97,8 @@ SCENARIO("Searching helpers work as intended")
text = "… <h3>Soup</h3> …";
try
{
auto file{epubgrep::search::cleanup_text(text)};
text = epubgrep::search::headline(file, text.size());
auto file{epubgrep::book::process_page(text)};
text = epubgrep::book::headline(file, text.size());
}
catch (const std::exception &)
{
@ -113,8 +119,8 @@ SCENARIO("Searching helpers work as intended")
"road to nowhere</h2> …";
try
{
auto file{epubgrep::search::cleanup_text(text)};
text = epubgrep::search::headline(file, text.size());
auto file{epubgrep::book::process_page(text)};
text = epubgrep::book::headline(file, text.size());
}
catch (const std::exception &)
{
@ -134,8 +140,8 @@ SCENARIO("Searching helpers work as intended")
text = "<html><hr>The long<section>road to nowhere</section>";
try
{
auto file{epubgrep::search::cleanup_text(text)};
text = epubgrep::search::headline(file, text.size());
auto file{epubgrep::book::process_page(text)};
text = epubgrep::book::headline(file, text.size());
}
catch (const std::exception &)
{
@ -160,8 +166,8 @@ SCENARIO("Searching helpers work as intended")
text = R"(… <span epub:type="pagebreak" … title="69"/> …)";
try
{
auto file{epubgrep::search::cleanup_text(text)};
text = epubgrep::search::page(file, text.size());
auto file{epubgrep::book::process_page(text)};
text = epubgrep::book::page(file, text.size());
}
catch (const std::exception &)
{
@ -181,8 +187,8 @@ SCENARIO("Searching helpers work as intended")
text = R"(… <span role="doc-pagebreak" … aria-label="69"/> …)";
try
{
auto file{epubgrep::search::cleanup_text(text)};
text = epubgrep::search::page(file, text.size());
auto file{epubgrep::book::process_page(text)};
text = epubgrep::book::page(file, text.size());
}
catch (const std::exception &)
{


@ -2,10 +2,16 @@
#include "options.hpp"
#include "search.hpp"
#include <catch.hpp>
// catch 3 does not have catch.hpp anymore
#if __has_include(<catch.hpp>)
# include <catch.hpp>
#else
# include <catch_all.hpp>
#endif
#include <clocale>
#include <exception>
#include <iostream>
#include <string>
#include <vector>
@ -32,8 +38,9 @@ SCENARIO("Searching ZIP files works")
opts.regex = epubgrep::options::regex_kind::extended;
matches = epubgrep::search::search(zipfile, "📙+\\w?", opts);
}
catch (const std::exception &)
catch (const std::exception &e)
{
std::cerr << "EXCEPTION: " << e.what() << '\n';
exception = true;
}
@ -53,8 +60,9 @@ SCENARIO("Searching ZIP files works")
opts.context = 1;
matches = epubgrep::search::search(zipfile, "📗", opts);
}
catch (const std::exception &)
catch (const std::exception &e)
{
std::cerr << "EXCEPTION: " << e.what() << '\n';
exception = true;
}
@ -78,8 +86,9 @@ SCENARIO("Searching ZIP files works")
matches = epubgrep::search::search(zipfile, R"([ \n])",
opts);
}
catch (const std::exception &)
catch (const std::exception &e)
{
std::cerr << "EXCEPTION: " << e.what() << '\n';
exception = true;
}
@ -114,12 +123,13 @@ SCENARIO("Searching ZIP files works")
try
{
opts.context = 1;
opts.regex = epubgrep::options::regex_kind::extended;
opts.regex = epubgrep::options::regex_kind::perl;
matches = epubgrep::search::search(
zipfile, R"(work\s[\w]+\.\W[\w']+\Wstay)", opts);
}
catch (const std::exception &)
catch (const std::exception &e)
{
std::cerr << "EXCEPTION: " << e.what() << '\n';
exception = true;
}


@ -1,7 +1,12 @@
#include "fs-compat.hpp"
#include "zip.hpp"
#include <catch.hpp>
// catch 3 does not have catch.hpp anymore
#if __has_include(<catch.hpp>)
# include <catch.hpp>
#else
# include <catch_all.hpp>
#endif
#include <clocale>
#include <exception>


@ -2,8 +2,8 @@ msgid ""
msgstr ""
"Project-Id-Version: epubgrep 0.6.0\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2021-06-24 18:38+0200\n"
"PO-Revision-Date: 2021-06-24 19:19+0200\n"
"POT-Creation-Date: 2021-08-20 17:06+0200\n"
"PO-Revision-Date: 2021-08-20 17:07+0200\n"
"Last-Translator: tastytea <tastytea@tastytea.de>\n"
"Language-Team: tastytea <https://schlomp.space/tastytea/epubgrep>\n"
"Language: de\n"
@ -17,6 +17,12 @@ msgstr ""
"X-Poedit-KeywordsList: translate\n"
"X-Poedit-SearchPath-0: .\n"
# „Spine“ ist ein Fachbegriff, daher habe ich ihn nicht übersetzt.
#: src/book.cpp:284
msgid "{0:s} is damaged. Could not read spine. Skipping file.\n"
msgstr ""
"{0:s} ist beschädigt. Konnte „Spine“ nicht lesen. Überspringe Datei.\n"
#: src/log.cpp:70
msgid "WARNING"
msgstr "WARNUNG"
@ -29,23 +35,23 @@ msgstr "FEHLER"
msgid "FATAL ERROR"
msgstr "SCHWERER FEHLER"
#: src/main.cpp:82
#: src/main.cpp:83
msgid " (while parsing options)"
msgstr " (während Optionen interpretiert wurden)"
#: src/main.cpp:128
#: src/main.cpp:129
msgid "Could not open {0:s}: {1:s}"
msgstr "Konnte {0:s} nicht öffnen: {1:s}"
#: src/main.cpp:178
#: src/main.cpp:179
msgid " (while opening {0:s})"
msgstr " (während {0:s} durchsucht wurde)"
#: src/main.cpp:236
#: src/main.cpp:237
msgid "{0:d} of {1:d} books searched."
msgstr "{0:d} von {1:d} Büchern durchsucht."
#: src/main.cpp:240
#: src/main.cpp:241
msgid "All books searched."
msgstr "Alle Bücher durchsucht."
@ -160,8 +166,8 @@ msgid "Set status message interval to NUMBER seconds."
msgstr "Setze Intervall für Statusmeldungen auf ANZAHL Sekunden."
#: src/options.cpp:162
msgid "Usage: epubgrep [OPTION]… PATTERN [FILE]…\n"
msgstr "Aufruf: epubgrep [OPTION]… MUSTER [DATEI]…\n"
msgid "Usage: epubgrep [OPTION]… PATTERN FILE…\n"
msgstr "Aufruf: epubgrep [OPTION]… MUSTER DATEI…\n"
#: src/options.cpp:164
msgid ""
@ -187,48 +193,47 @@ msgstr ""
msgid " In {0:s}: \n"
msgstr " In {0:s}:\n"
#: src/output.cpp:155
# Sprache der Benutzeroberfläche.
#: src/output.cpp:145
msgid "en"
msgstr "de"
#: src/output.cpp:157
msgid "File {0:d}"
msgstr "Datei {0:d}"
#: src/output.cpp:170
#: src/output.cpp:172
msgid "File path (in EPUB file)"
msgstr "Dateipfad (innerhalb der EPUB Datei)"
#: src/output.cpp:174
#: src/output.cpp:176
msgid "Last headline"
msgstr "Letzte Überschrift"
#: src/output.cpp:177
#: src/output.cpp:179
msgid "Page number"
msgstr "Seitennummer"
#: src/output.cpp:180
#: src/output.cpp:182
msgid "Match"
msgstr "Treffer"
#: src/zip.cpp:56 src/zip.cpp:84
#: src/zip.cpp:55 src/zip.cpp:83
msgid "File in {0:s} is damaged. Skipping in-EPUB file.\n"
msgstr "Datei in {0:s} ist beschädigt. Überspringe Datei in der EPUB.\n"
#: src/zip.cpp:104
#: src/zip.cpp:103
msgid "Could not read {0:s} in {1:s}."
msgstr "Konnte {0:s} in {1:s} nicht lesen."
#: src/zip.cpp:119 src/zip.cpp:126
#: src/zip.cpp:118 src/zip.cpp:125
msgid "{0:s} not found in {1:s}."
msgstr "{0:s} nicht gefunden in {1:s}."
#: src/zip.cpp:149
#: src/zip.cpp:148
msgid "Could not open {0:s}."
msgstr "Konnte {0:s} nicht öffnen."
#: src/zip.cpp:163
#: src/zip.cpp:162
msgid "Could not close {0:s}."
msgstr "Konnte {0:s} nicht schließen."
# „Spine“ ist ein Fachbegriff, daher habe ich ihn nicht übersetzt.
#: src/zip.cpp:232
msgid "{0:s} is damaged. Could not read spine. Skipping file.\n"
msgstr ""
"{0:s} ist beschädigt. Konnte „Spine“ nicht lesen. Überspringe Datei.\n"