Re: [dev] reading an epub book with less: adventures in text processing

From: Hiltjo Posthuma <hiltjo_AT_codemadness.org>
Date: Sat, 9 Mar 2024 17:33:28 +0100

On Sat, Mar 09, 2024 at 09:34:12AM -0500, Greg Reagle wrote:
> I have an epub ebook. It is a novel, but when I get this process working, I want to repeat it for any epub ebook.
>
> I want to read it, with formatting (such as underline or italics), with less. I am happy to use any software that exists in the process, but I MUST use less in the end to read it. The terminal emulators that I use are usually st, xterm, and termux. All of them are capable of colored text and underlining and so forth, and I want to take advantage of this.
>
> Pandoc does a very good job converting epub to html, and it looks good with w3m, however when I use w3m in a pipe, the output is truly *plain* text, meaning there are no escape codes for formatting. Same story with elinks. Is it possible to get either of these programs, or some other program, to dump html to text *with* escape codes?
>
> Since I could not get HTML to work, I went with man format. Amazing. Pandoc automatically chooses man format for output based on the '.1' extension in the following:
> pandoc --standalone -o City_of_Truth-Morrow.1 City_of_Truth-Morrow.epub
> Remember to use standalone option or it won't work. Then
> man --local-file --pager 'less -ir' City_of_Truth-Morrow.1
> It looks great! (for text only on a terminal) It has bold and underlined text. From there I can use less 's' command to save the formatted text to a file.
>
> There might be a better or more direct way of achieving this goal, but this is what I figured out for now. And the rationale is this: I already know and love less. There is no good reason for me to learn the user interface of a different program like an epub reader or an html reader to read a book that does not have graphics, diagrams, pictures, and/or custom formatting.
>

Hi,

Maybe mupdf/mutools or the Ghostscript tools or qpdf?
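
For repeating this with any epub, the two quoted commands (pandoc, then man) could be wrapped in a small script. This is only a sketch, assuming pandoc and a man that accepts --local-file and --pager (as in the quoted message) are installed; the function names man_name and epub_less are made up for illustration.

```shell
#!/bin/sh
# Sketch: wrap the pandoc + man steps from the quoted message.
# man_name and epub_less are hypothetical helper names.

# Map "book.epub" to "book.1" so that pandoc picks man output
# from the '.1' extension.
man_name() {
    printf '%s\n' "${1%.epub}.1"
}

# Convert the epub to a man page, then read it with less through man.
epub_less() {
    out=$(man_name "$1")
    pandoc --standalone -o "$out" "$1" || return 1
    man --local-file --pager 'less -ir' "$out"
}

# Only run when a file argument is given.
if [ "$#" -gt 0 ]; then
    epub_less "$1"
fi
```

Saved as, say, epubless and made executable, it would be run as: ./epubless City_of_Truth-Morrow.epub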

-- 
Kind regards,
Hiltjo
Received on Sat Mar 09 2024 - 17:33:28 CET
