PDF→EPUB: Use types for menus and keyboard keys; small fixes.

tastytea 2021-03-15 10:53:31 +01:00
parent de98ade9c0
commit 92446b6ba1
Signed by: tastytea
GPG Key ID: CFC39497F1B26E07
1 changed files with 31 additions and 23 deletions

@@ -1,7 +1,7 @@
---
title: "How I convert PDFs to EPUB semi-automatically"
slug: how-i-convert-pdfs-to-epub
-description: "A guide to clean EPUBs from PDFs using Calibre, Emacs and time."
+description: "A step by step guide to clean EPUBs from PDFs using Calibre, Emacs and time."
date: 2021-03-15T04:12:00+01:00
type: posts
draft: false
@@ -12,6 +12,7 @@ tags:
---
:source-highlighter: pygments
+:experimental: true
:url-calibre: https://calibre-ebook.com/
:url-calibre-convert: https://manual.calibre-ebook.com/conversion.html#pdfconversion
@@ -31,25 +32,28 @@ have a lot of footnotes.
One option is to use Calibre to convert and then fix the result, but I have
found that I get better results in less time when I create a new
link:{wp-epub}[EPUB], copy the PDF's content into link:{url-emacs}[Emacs], clean
-it up there and then copy it over to Calibre. This process is what I'd like to
+it up there and then copy it over to Calibre. This process is what I want to
share with you here. You will need Calibre, Emacs or another editor with
keyboard macros and some knowledge of link:{wp-xhtml}[XHTML] and
-link:{wp-css}[CSS] to follow this recipe. It will take long and is boring, but
+link:{wp-css}[CSS] to follow this guide. It will take long and is boring, but
the result is a clean and enjoyable book.
== Create a new book in Calibre
-Click on “Add books” → “Add empty book”. Then fill in the metadata and select
+Click on menu:Add books[Add empty book]. Then fill in the metadata and select
“EPUB” as format. You can add more metadata and a cover image by right-clicking
the book and then selecting “Edit metadata”. Open Calibre's editor by right
clicking on the book and selecting “Edit book”. You start with a single XHTML
file, `start.xhtml`. I always use that for the title page, the copyright notice
and so on. You can force a page break to separate the title and the copyright
notice with CSS: Add `style="page-break-after: always;"` to the last element of
-the virtual “page” or use a CSS class. To add a CSS file click “File” → “New
-file” and enter a filename ending with `.css`. Add the CSS file by right
-clicking on `start.xhtml` in the file browser and selecting “Link
-stylesheets…”. Note that the in-built preview does not show page breaks.
+the virtual “page” or use a CSS class. To add a CSS file click menu:File[New
+file] and enter a filename ending with `.css`. Add the CSS file to the document
+by right clicking on `start.xhtml` in the file browser and selecting “Link
+stylesheets…”.
+[NOTE]
+The built-in preview does not show page breaks.
Your files should look similar to this:
@@ -139,15 +143,17 @@ end of the lines.
--------------------------------------------------------------------------------
Make sure that `auto-fill-mode` is disabled. Position the cursor at the start of
-the buffer and press `<f3>` to start recording a macro. Press `<end>`
-`<deletechar>` `SPC` (space bar) and then `<f4>` to stop recording. If there is
-a hyphen at the end of the current line, press `<backspace>` 2 times. Press
-`<f4>` to call the macro and repeat until you are at the end of the
-paragraph. Move the cursor to the first line of the next paragraph and repeat.
+the buffer and press kbd:[<f3>] to start recording a macro. Press kbd:[<end>]
+kbd:[<deletechar>] kbd:[SPC] (space bar) and then kbd:[<f4>] to stop
+recording. If there is a hyphen at the end of the current line, press
+kbd:[<backspace>] 2 times. Press kbd:[<f4>] to call the macro and repeat until
+you are at the end of the paragraph. Move the cursor to the first line of the
+next paragraph and repeat…
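Review note: for readers who want to see the effect of the line-joining macro outside of Emacs, here is a minimal Python sketch. It is not part of the guide or of this commit. The function name `join_wrapped_lines` is made up for illustration, and the sketch assumes that paragraphs are separated by blank lines. Unlike the manual `<backspace>` step, it drops every hyphen at the end of a line, so compounds that really contain a hyphen would still need checking by hand.

```python
def join_wrapped_lines(text: str) -> str:
    """Collapse each blank-line-separated paragraph onto a single line.

    Hypothetical helper, not from the guide: it mimics pressing
    <end> <deletechar> SPC on every line of a paragraph.
    """
    paragraphs = []
    current = ""
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line ends the current paragraph (assumption).
            if current:
                paragraphs.append(current)
                current = ""
            continue
        if not current:
            current = line
        elif current.endswith("-"):
            # Same effect as pressing <backspace> twice: the hyphen
            # goes away and no space is inserted.
            current = current[:-1] + line
        else:
            current = current + " " + line
    if current:
        paragraphs.append(current)
    # One paragraph per line, still separated by blank lines.
    return "\n\n".join(paragraphs)
```

Keeping the blank lines between paragraphs matches the `<p>` macro in the guide, which moves down two lines after each paragraph.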
Now you should have a text file with 1 paragraph per line. We need to wrap all
lines in `<p>` tags, except block quotes and sub-headlines. Either use another
-macro (“<p>” `<end>` “</p>” `<down>` `<down>` `<home>`) or this elisp function:
+macro (`<p> kbd:[<end>] </p> kbd:[<down>] kbd:[<down>] kbd:[<home>]`) or this
+elisp function:
[source,elisp]
--------------------------------------------------------------------------------
@@ -170,14 +176,16 @@ hyperlink-able, so we can't just wrap them in plain `<p>` tags, they need IDs. I
like to use `<span>1</span><p id="fn1">[…]</p>` if there is only one
footnote-section or `<span>1</span><p id="fn1_1">[…]</p>` for
chapter-footnotes. We are going to use a macro with a counter to generate
-consecutively numbered IDs. First, set the counter to 1 with `C-x C-k
-C-c` “1”. Then, record this macro:
+consecutively numbered IDs. First, set the counter to 1 with `kbd:[C-x]
+kbd:[C-k] kbd:[C-c] 1`. Then, record this macro:
-“<span>” `C-x C-k` `<tab>` `C-u` “-1” `C-x C-k C-a` “</span><p id="fn” `C-x C-k`
-`<tab>` “">” `<end>` “</p>” `<down>` `<down>` `<home>`
+`<span> kbd:[C-x] kbd:[C-k] kbd:[<tab>] kbd:[C-u] -1 kbd:[C-x] kbd:[C-k]
+kbd:[C-a] </span><p id="fn kbd:[C-x] kbd:[C-k] kbd:[<tab>] "> kbd:[<end>] </p>
+kbd:[<down>] kbd:[<down>] kbd:[<home>]`
-`C-u` “-1” `C-x C-k C-a` “adds” -1 to the counter, so that we can use the same
-number again.
+[NOTE]
+`kbd:[C-u] -1 kbd:[C-x] kbd:[C-k] kbd:[C-a]` “adds” -1 to the counter, so that
+we can use the same number again.
Call the macro until every footnote is wrapped and copy them to Calibre.
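Review note: the counter macro above produces lines of the form `<span>1</span><p id="fn1">[…]</p>`. A minimal Python sketch of the same transformation follows. It is an illustration only and not part of the guide. The name `wrap_footnotes` and its `prefix` parameter are hypothetical, and the sketch assumes one footnote per line with blank lines in between.

```python
def wrap_footnotes(lines, prefix="fn"):
    """Wrap each non-blank line as a numbered, linkable footnote.

    Hypothetical helper, not from the guide. The same number is used
    twice per footnote, which is what the "-1" step in the macro achieves.
    """
    counter = 1
    wrapped = []
    for line in lines:
        if not line.strip():
            # Keep blank separator lines untouched (assumption).
            wrapped.append(line)
            continue
        wrapped.append(
            f'<span>{counter}</span><p id="{prefix}{counter}">{line}</p>'
        )
        counter += 1
    return wrapped
```

For chapter footnotes, a prefix such as `fn1_` would give the `fn1_1` style of ID mentioned in the guide.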
@@ -195,14 +203,14 @@ Press `<f3>` to search through the text and `C-r` to replace.
== Finishing touches
-Click “Tools” → “Table of Contents” → “Edit table of Contents”, remove the
+Click menu:Tools[Table of Contents > Edit table of Contents], remove the
existing entry and click “Generate ToC from major headings” or “Generate ToC
from all headings”.
-Click “Tools” → “Set semantics” and set the location of the title page,
+Click menu:Tools[Set semantics] and set the location of the title page,
copyright page, beginning of text and so on.
-Select “Tools” → “Check book” and fix the errors.
+Select menu:Tools[Check book] and fix the errors.
You're done! Enjoy your cleanly formatted book. 😊