Re: [dev] reading an epub book with less: adventures in text processing

From: Georg Lehner <jorge_AT_at.anteris.net>
Date: Sat, 9 Mar 2024 22:06:34 +0100

Hi Greg,

On 2024-03-09 15:34, Greg Reagle wrote:
> I have an epub ebook. It is a novel, but when I get this process working, I want to repeat it for any epub ebook.
>
> I want to read it, with formatting (such as underline or italics), with less. I am happy to use any software that exists in the process, but I MUST use less in the end to read it. The terminal emulators that I use are usually st, xterm, and termux. All of them are capable of colored text and underlining and so forth, and I want to take advantage of this.
>
> Pandoc does a very good job converting epub to html, and it looks good with w3m, however when I use w3m in a pipe, the output is truly *plain* text, meaning there are no escape codes for formatting. Same story with elinks. Is it possible to get either of these programs, or some other program, to dump html to text *with* escape codes?
>
> Since I could not get HTML to work, I went with man format. Amazing. Pandoc automatically chooses man format for output based on the '.1' extension in the following:
> pandoc --standalone -o City_of_Truth-Morrow.1 City_of_Truth-Morrow.epub
> Remember to use standalone option or it won't work. Then
> man --local-file --pager 'less -ir' City_of_Truth-Morrow.1
> It looks great! (for text only on a terminal) It has bold and underlined text. From there I can use less 's' command to save the formatted text to a file.
>
> There might be a better or more direct way of achieving this goal, but this is what I figured out for now. And the rationale is this: I already know and love less. There is no good reason for me to learn the user interface of a different program like an epub reader or an html reader to read a book that does not have graphics, diagrams, pictures, and/or custom formatting.
>
Just modify your workflow slightly and you are good:

Option 1: use w3m

pandoc -s -t html City_of_Truth-Morrow.epub | w3m -T text/html

Option 2: use man/less

pandoc -t man City_of_Truth-Morrow.epub | man -l -

Option 3, save as html for future use:

pandoc -s  -o City_of_Truth-Morrow.html City_of_Truth-Morrow.epub

This saves your epub as html. Whenever you want to view it, use your
favorite browser, e.g. w3m, with all its features.

Option 4: save as man:

pandoc -s -t man -o City_of_Truth-Morrow.man City_of_Truth-Morrow.epub

Whenever you view it, use: man -l City_of_Truth-Morrow.man
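
Option 4 could be wrapped in a small shell function, roughly as sketched
below. The names epub_man_name and epub_read are made up for
illustration, and the sketch assumes that pandoc and a man(1) that
understands -l (as man-db's does) are installed.

```shell
# Hypothetical helpers; assumes pandoc and man-db's man(1) are installed.

# Derive the output file name: book.epub -> book.man
epub_man_name() {
    printf '%s\n' "${1%.epub}.man"
}

# Convert the epub once (skipped if the .man file already exists),
# then view the result with man, which hands it to your pager.
epub_read() {
    out=$(epub_man_name "$1")
    [ -f "$out" ] || pandoc -s -t man -o "$out" "$1" || return 1
    man -l "$out"
}
```

Usage would be: epub_read City_of_Truth-Morrow.epub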

- - -

Some notes:

The reason you lose formatting when saving from less(1) or w3m is that
these programs deliberately do not save the terminal control characters
that produce the markup. Line breaks and terminal control sequences are
generated on demand, depending on the type and size of the terminal
(window), so a saved copy will display differently (oddly) on any
terminal that differs from the one it was generated for.
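
To illustrate what such control characters look like, here is a minimal
sketch that wraps text in ANSI SGR escape sequences by hand. The helper
names sgr_bold and sgr_underline are made up for this example. It is not
how pandoc or man produce their output, only a demonstration of the
kind of bytes involved.

```shell
# ANSI SGR sequences: ESC [ 1 m = bold, ESC [ 4 m = underline,
# ESC [ 0 m = reset all attributes.
sgr_bold() {
    printf '\033[1m%s\033[0m' "$1"
}

sgr_underline() {
    printf '\033[4m%s\033[0m' "$1"
}

# Example (commented out): write a formatted line to a file.
# printf '%s and %s\n' "$(sgr_bold bold)" "$(sgr_underline underlined)" > sample.txt
```

A file written this way shows its formatting in less only when less is
started with -R (or -r), e.g. less -R sample.txt; without that option
less displays the escape characters literally.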

The -s (--standalone) option for Pandoc is not required for man page
output. For html (and other formats) pandoc outputs only the <body>
content; the -s option wraps this into a complete <html> document.

Best Regards,


   Georg
Received on Sat Mar 09 2024 - 22:06:34 CET
