[dev] reading an epub book with less: adventures in text processing

From: Greg Reagle <list_AT_speedpost.net>
Date: Sat, 09 Mar 2024 09:34:12 -0500

I have an epub ebook. It is a novel, but when I get this process working, I want to repeat it for any epub ebook.

I want to read it, with formatting (such as underline or italics), with less. I am happy to use any software that exists in the process, but I MUST use less in the end to read it. The terminal emulators that I use are usually st, xterm, and termux. All of them are capable of colored text and underlining and so forth, and I want to take advantage of this.

Pandoc does a very good job converting epub to HTML, and the result looks good in w3m. However, when I use w3m in a pipe, the output is truly *plain* text, meaning there are no escape codes for formatting. Same story with elinks. Is it possible to get either of these programs, or some other program, to dump HTML to text *with* escape codes?

Since I could not get HTML to work, I went with man format. Amazing. Pandoc automatically chooses man format for output based on the '.1' extension in the following:
    pandoc --standalone -o City_of_Truth-Morrow.1 City_of_Truth-Morrow.epub
Remember to use the --standalone option or it won't work. Then
    man --local-file --pager 'less -ir' City_of_Truth-Morrow.1
It looks great (for text only on a terminal)! It has bold and underlined text. From there I can use the 's' command of less to save the formatted text to a file.
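
To repeat this for any epub, the two steps could be wrapped in a small shell function. This is only a sketch: the names epub_man_name and epubless are made up for illustration, the intermediate file is placed under $TMPDIR (or /tmp) as an arbitrary choice, and it assumes pandoc and a man that accepts --local-file and --pager (as man-db does) are installed.

```shell
# Hypothetical helper: turn "path/Book.epub" into "Book.1", so that
# pandoc picks man format from the '.1' extension.
epub_man_name() {
    printf '%s.1\n' "$(basename "$1" .epub)"
}

# Hypothetical wrapper: convert an epub to man format, then read it in less.
epubless() {
    if [ ! -f "$1" ]; then
        echo "usage: epubless FILE.epub" >&2
        return 1
    fi
    out="${TMPDIR:-/tmp}/$(epub_man_name "$1")"
    # --standalone is required, as noted above.
    pandoc --standalone -o "$out" "$1" || return 1
    man --local-file --pager 'less -ir' "$out"
}
```

After sourcing this, `epubless City_of_Truth-Morrow.epub` should behave like the two commands above, leaving the generated man file behind in the temporary directory.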

There might be a better or more direct way of achieving this goal, but this is what I figured out for now. And the rationale is this: I already know and love less. There is no good reason for me to learn the user interface of a different program like an epub reader or an HTML reader to read a book that does not have graphics, diagrams, pictures, and/or custom formatting.
Received on Sat Mar 09 2024 - 15:34:12 CET
