Re: [dev] sfeed: a simple RSS and Atom parser and reader

From: Hiltjo Posthuma <hiltjo_AT_codemadness.org>
Date: Mon, 6 Aug 2012 12:20:36 +0200

On Mon, Aug 6, 2012 at 12:59 AM, pancake <pancake_AT_youterm.com> wrote:
>
> Did you try parsifal? Anyway.. my parser was simpler than all that xml-strict foo, so it also worked with corrupted and partially downloaded rss files.
>
> http://hg.youterm.com/mksend/file/14984ebd1529/parsifal
>

I'll investigate this, thanks! I agree the downside of the XML parser
I currently use is that it's a validating parser, meaning the XML must
be well-formed. I will replace it with a non-validating parser at some
point.

>> I like to use curl because it handles https and http redirection, and
>> also allows me to pass the date of the latest update so HTTP caching
>> works too. But curl can easily be replaced by wget or fetch.
>
> I ended up using wget and processing the local files with rss2html. Depending on a library for this is imho not suckless.

I agree; it doesn't depend on libcurl, just the curl command-line
tool. As I said, you can easily replace it with wget or fetch, for
example:
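
        # fetchfeed is a hypothetical helper name, not from sfeed itself;
        # swap the body depending on which downloader is available:
        fetchfeed() {
                curl -s -L "$1"
                # or: wget -q -O - "$1"
                # or: fetch -q -o - "$1"
        }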

>>
>>> Actually, the only useful feature was the 'planet' option which sorts/merges all your feeds into a single timeline.
>> You can specify multiple feeds in a config file and run sfeed_update
>> with this config file as a parameter, then pipe the output through
>> sfeed_html.
>
> Config file for what? Specifying a list of feeds should not be in a config file. Maybe in a wrapper script or something.

I agree. sfeed_update is an optional wrapper script; it's the script I
use and added for convenience. You can write your own wrapper scripts
around sfeed (I know some people here prefer rc over sh, for example).
Something as small as the sketch below would do:
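
        # a minimal hypothetical wrapper, not the shipped sfeed_update:
        # read "name url" pairs from a file and convert each feed to tsv.
        mkdir -p feeds
        while read -r name url; do
                curl -s -L "$url" | sfeed > "feeds/$name"
        done < urls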

>
> Iirc a suckless way would be exporting a tsv where the first field of each line is the unix timestamp, so sort -n would be more unix-friendly.
>
> In the end a feed reader should just convert the various crappy atom/rss formats to a unified tsv output. The rest can be done with grep, sort and awk. Even the html output.
>

I somewhat agree, and this is what sfeed does. The optional
sfeed_update wrapper script does a little more than that though: it
makes sure there are no duplicates, groups items by feed name, etc.
A snippet from the sfeed_update script:

        # merge raw files.
        # merge(oldfile, newfile)
        merge() {
                # unique sort by id, link, title.
                # order by feedname (asc), feedurl (asc) and timestamp (desc).
                (cat "$1" "$2" 2> /dev/null) |
                        sort -t ' ' -u -k7,7 -k4,4 -k3,3 |
                        sort -t ' ' -k10,10 -k11,11 -k1r,1
        }
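
Note that the -t argument above is presumably a literal tab character,
since the feed files are tab-separated. A hypothetical use of merge(),
with made-up file names, writing the result back atomically:

        # merge new items into the existing feed file, then replace it:
        merge "feeds/$name" "feeds/$name.new" > "feeds/$name.tmp" &&
                mv "feeds/$name.tmp" "feeds/$name"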

> I would suggest exporting json too. That would make templating work on the client side, with no need for any templating system. Static html is good for lynx... Another option I would suggest is to put the template design in config.h.

You can convert the tsv format to json; it should be trivial. A rough
awk sketch (not code from sfeed; it assumes the timestamp, title and
link are in fields 1, 3 and 4):
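
        # rough sketch: convert the tsv to a json array; it only escapes
        # embedded double quotes, a real converter must also handle
        # backslashes and control characters.
        awk -F '\t' '
        function esc(s) {
                gsub(/"/, "\\\"", s)
                return s
        }
        BEGIN  { printf("[") }
        NR > 1 { printf(",") }
        {
                printf("{\"timestamp\":%d,\"title\":\"%s\",\"link\":\"%s\"}",
                        $1, esc($3), esc($4))
        }
        END    { printf("]\n") }' feeds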

> Can you specify filters for words? Will grep work here?

Definitely; you can grep -v the tsv feeds file or just the stdout of
sfeed, for example:
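
        # drop any item mentioning "foo" anywhere on the line:
        grep -v -i foo feeds
        # or, assuming the title is the third tab-separated field,
        # match on the title only:
        awk -F '\t' '$3 !~ /foo/' feeds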

>>
>>> I also wanted a way to keep my already-read links synced. But that was a boring task.
>>
>> Atm I just mark all items a day old or newer as new in sfeed_html and
>> sfeed_plain. In your browser visited links will of course be coloured
>> differently.
>>
>
> The workflow I would like to have with feeds is:
>
> Fetch list of new stuff
> Mark them as:
> - uninteresting (strike out, possibly add new filtering rules)
> - read later (have a separate list of urls to read when i have time)
> - mark as read/unread.
> - favorite (flag as an important thing)
> - show/hide all news from a single feed
>
> I understand that this workflow shouldn't be handled by sfeed, because that's a frontend issue. But having html output does not allow me to do any of that.

Sounds useful, and all of that should be possible, but you would need
to write additional scripts for it. A "read" list, for instance, could
be just a file of urls that you filter out:
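
        # hypothetical sketch: hide items whose link (assumed to be
        # field 4) appears in a "read" file with one url per line;
        # marking an item as read is then just appending its url there.
        awk -F '\t' 'NR == FNR { seen[$0] = 1; next }
                !($4 in seen)' read feeds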

> With json it would be easy to write a frontend like that in javascript (blame me, but it's fast and it's everywhere). There's also a minimalist json parser named js0n that can do that from the command line too.
>
> But people on this list would probably expect an awk-friendly format instead of json. (tsv can easily be converted to json)
>

I opted for an awk-friendly format, but I personally think json is a
good format for data exchange (much better than XML).

I hope this answers your questions; feel free to contact me on IRC
too for a faster answer :)