[dev] sfeed: a simple RSS and Atom parser and reader

From: Hiltjo Posthuma <hiltjo_AT_codemadness.org>
Date: Sun, 5 Aug 2012 15:18:09 +0200

Greetings fellow people of suckless,


I would like to announce a simple RSS and Atom parser and reader I've
been working on.

Some of the current features are:

- items are stored in a TSV-like format that is easy to interact with.
- separate programs to display this data (sfeed_plain (plain-text),
sfeed_html (HTML)).
- relatively few dependencies (although I use libexpat to parse XML).
- simple to interact with and plays nicely with existing tools (works
great with dmenu, links and your trusty term).
- works on most platforms (Linux, OpenBSD, Cygwin etc).
- easy to sync your feeds TO THE CLOUD ;P
- parallel downloading of feeds (sfeed_update).
- time conversion to your timezone.


Programs included and purpose of each program
---------------------------------------------

Name                Purpose

sfeed             - Read XML RSS or Atom feed data from stdin. Write feed data
                    in tab-separated format to stdout.
sfeed_update      - Shellscript; update feeds and merge with old feeds in the
                    file $HOME/.sfeed/feeds by default.
sfeed_plain       - Format feeds file (TSV) from sfeed_update to plain text.
sfeed_html        - Format feeds file (TSV) from sfeed_update to HTML.
sfeed_opml_import - Generate a sfeedrc config file based on an opml file.
sfeed_opml_export - Generate an opml file based on a sfeedrc config file.


Example output of format programs
---------------------------------

Example output of sfeed_plain:
        http://www.codemadness.nl/downloads/projects/sfeed/EXAMPLE.txt

Example output of sfeed_html:
        http://www.codemadness.nl/downloads/projects/sfeed/EXAMPLE.html

Screenshot of sfeed_plain with dmenu (see usage examples):
        http://www.codemadness.nl/downloads/screenshots/sfeed-screenshot.png


TAB-SEPARATED format
--------------------

The items are saved in a TSV-like format, except that newlines, tabs and
backslashes are escaped with \ (\n, \t and \\). Other whitespace except
spaces is removed.
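
To display a field you may want to reverse this escaping. The following is
only a sketch based on the three escapes listed above; unescape_field is a
hypothetical helper name, not one of the programs shipped with sfeed. It walks
each line character by character so that an escaped backslash followed by "n"
is not mistaken for a newline:

```shell
# Hypothetical helper: undo the \n, \t and \\ escapes on each line of stdin.
unescape_field() {
    awk '{
        out = ""
        n = length($0)
        for (i = 1; i <= n; i++) {
            c = substr($0, i, 1)
            if (c == "\\" && i < n) {
                # Backslash followed by another character: decode the pair.
                i++
                d = substr($0, i, 1)
                if (d == "n") out = out "\n"
                else if (d == "t") out = out "\t"
                else if (d == "\\") out = out "\\"
                else out = out c d   # unknown escape: keep it as-is
            } else {
                out = out c
            }
        }
        print out
    }'
}
```

It reads from stdin, so it can be placed after cut or awk once a single field
has been selected.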

The timestamp field is converted to a unix timestamp. The timestamp is also
stored in formatted form as a separate field. The other fields are left
untouched (including HTML).

The order and format of the fields are:

item unix timestamp      - string unix timestamp (GMT+0)
item formatted timestamp - string timestamp (YYYY-mm-dd HH:MM:SS tz[+-]HHMM)
item title               - string
item link                - string
item description         - string
item contenttype         - string ("html" or "plain")
item id                  - string
item author              - string
feed type                - string ("rss" or "atom")
feed name                - string (extra field added by sfeed_update)
feed url                 - string (extra field added by sfeed_update)
item baseurl site        - string (extra field added by sfeed_update)
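
Because the field order is fixed, standard tools can select fields by
position. A minimal sketch, assuming the order listed above (field 1 is the
unix timestamp, field 3 the title, field 4 the link); the function names
feed_titles_links and feed_since are made up for this example:

```shell
# Print "title<TAB>link" for every item read from stdin.
# Field numbers follow the order listed above: 3 = title, 4 = link.
feed_titles_links() {
    awk -F '\t' -v OFS='\t' '{ print $3, $4 }'
}

# Keep only items whose unix timestamp (field 1) is at or after
# the cutoff given as the first argument.
feed_since() {
    awk -F '\t' -v cutoff="$1" '($1 + 0) >= (cutoff + 0)'
}
```

For example, "feed_since 1343779200 < $HOME/.sfeed/feeds | feed_titles_links"
would list the title and link of items dated on or after 1 August 2012 (GMT).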


Some usage examples
-------------------

Basic usage to get items from a single newsfeed which also explains
the design of sfeed (iconv is optional, it's only used if feeds are
non-UTF8 encoded):

        curl -s 'http://kernel.org/kdist/rss.xml' |
                iconv -cs -f "iso-8859-1" -t "utf-8" | sfeed | sfeed_plain
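
Since the iconv step is only needed for feeds that are not UTF-8, the pipeline
can be wrapped so the conversion is applied on demand. This is just a sketch
and not how sfeed_update is implemented; fetch_plain and its two arguments are
names invented for this example:

```shell
# Hypothetical helper: fetch one feed and print it as plain text.
# $1 = feed URL, $2 = optional source encoding (omit for UTF-8 feeds).
fetch_plain() {
    url="$1"
    encoding="${2:-}"
    if [ -n "$encoding" ]; then
        # Non-UTF-8 feed: convert to UTF-8 before parsing.
        curl -s "$url" | iconv -cs -f "$encoding" -t "utf-8" | sfeed | sfeed_plain
    else
        # UTF-8 feed: no conversion step needed.
        curl -s "$url" | sfeed | sfeed_plain
    fi
}
```

Usage would be "fetch_plain 'http://xkcd.com/atom.xml'" or
"fetch_plain 'http://kernel.org/kdist/rss.xml' iso-8859-1".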
        

Config file syntax (shell script) for sfeed_update:

        feeds() {
                # feed <name> <feedurl> [basesiteurl] [encoding]
                feed "codemadness" "http://www.codemadness.nl/blog/rss.xml" &
                feed "xkcd" "http://xkcd.com/atom.xml" &
                feed "linux kernel" "http://kernel.org/kdist/rss.xml" \
                        "http://kernel.org" "iso-8859-1" &
        }


Update items and merge (default config location is $HOME/.sfeed/sfeedrc):

        sfeed_update


Format feeds to plain text:

        sfeed_plain < "$HOME/.sfeed/feeds" > "$HOME/.sfeed/feeds.txt"


Format feeds to HTML:

        sfeed_html < "$HOME/.sfeed/feeds" > "$HOME/.sfeed/feeds.html"


View feeds with dmenu and open the selected URL in $BROWSER:

        url=$(sfeed_plain < "$HOME/.sfeed/feeds" | dmenu -l 35 -i |
                sed 's@^.* \([a-zA-Z]*://\)\(.*\)$@\1\2@')
        [ -n "$url" ] && $BROWSER "$url"
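
The sed expression above keeps only the URL at the end of the selected line.
As a sketch, the same expression can be put in a small function so it can be
tried on its own; last_url is a hypothetical name, and the sample line in the
usage note only assumes that the URL is the last space-separated item:

```shell
# Hypothetical helper: keep only the last "scheme://..." URL on each line.
# Lines without a URL pass through unchanged, so check the result before use.
last_url() {
    sed 's@^.* \([a-zA-Z]*://\)\(.*\)$@\1\2@'
}
```

For instance, piping the line "xkcd  Some title  http://xkcd.com/1/" through
last_url leaves just the URL.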


Generate a sfeedrc config file from your exported list of feeds in opml
format (newsbeuter, google reader, snownews, thunderbird, etc):

        sfeed_opml_import < opmlfile.xml > $HOME/.sfeed/sfeedrc


Export an opml file of your feeds from a sfeedrc config file:

        sfeed_opml_export configfile > myfeeds.opml


This is the first version I have shared publicly, so I'm sure there are
some bugs. Patches, bug reports and constructive criticism are very
welcome.


You can get the latest code with git at:

        git clone http://www.codemadness.nl/downloads/projects/sfeed/src/sfeed.git

A direct link to the latest README with more information is available here:
        
        http://www.codemadness.nl/downloads/projects/sfeed/README

Link to the blog post with some screenshots and example files:

        http://www.codemadness.nl/blog/2011/04/01/sfeed-simple-feed-parser/


Credits
-------

Thanks to raph_ael on #suckless for the idea of an opml converter and
__20h__ for suggesting I should add a public code repo for easier
patch management.


Kind regards,
        Hiltjo (Evil_Bob on #suckless)