Notes on spark feeds in Fever

26 Sep 2013

Following my writeup on getting Twitter links into Fever, I’ve done some more digging and have a few observations worth sharing.

Fever’s sparks feature works by matching the URLs it finds in feeds. That’s it. It’s not particularly clever about doing this (for sensible performance reasons) so it needs exact matches. If feeds contain URLs which are obscured by URL shortening services, redirection services, click trackers or similar then the URLs will not match, Fever will not count them, and you will never see them.

What this means is that you need to check the actual content of a feed before adding it to your sparks folder, otherwise you’re just wasting your time (and your server’s disk space).

Let’s look at a real-world example: Google News. Google News should be a perfect source: it is searchable (with RSS output for search results) and it pre-aggregates lots of sources for you. Even better than that, the content of each feed item often contains links to multiple different sources for the story and related articles. It should be a spark goldmine.

Instead it’s useless. To find out why, take a look at an example feed. Specifically look for the <link> and <description> fields inside each <item>. You will see something like this:

<link>http://news.google.com/news/url?sa=t&fd=R&usg=AFQjCNEf</link>
<description>
  [...]
  <a href="http://news.google.com/news/url?AFQjCNEf [...]
  url=http://www.theguardian.com/technology/2013/sep/24/">
  [...]
</description>

Crucially, you will notice that the main <link> URL and every URL contained in the <description> is wrapped with Google’s URL redirect service. In this specific example the URL is:

http://news.google.com/news/url?sa=t&fd=R&usg=AFQjCNEfL3kebJcrRdzqqVCoAaD2IyfQYg&url=http://www.theguardian.com/technology/2013/sep/24/blackberry-messenger-bbm-iphone-android

Appended to the redirect is the real Guardian URL:

http://www.theguardian.com/technology/2013/sep/24/blackberry-messenger-bbm-iphone-android

The important thing to note is that Fever sees this as a single URL, it cannot extract the Guardian link from the Google redirect. Thus no matter how many Google News feeds you add to your sparks folder they will only ever corroborate themselves and they will not act in the way you would like or expect.

It gets worse I’m afraid because in addition to URL redirects you will also find click tracking in feeds. Another real-world example, this time from the Sinocism newsletter (which I highly encourage you to subscribe to if you are interested in China).

This is another very promising looking feed: one entry per day containing 40-50 links to interesting articles chosen by a knowledgable curator. However, as excellent a resource as Sinocism is, unfortunately it is useless as a spark feed.

Each link is behind a redirect like this:

http://sinocism.us5.list-manage.com/track/click?u=f18121c5942896d3a87491249&id=3edf501b0d&e=c480cfc6eb

And even if you were to follow the link, it leads to this:

http://news.xinhuanet.com/english/china/2013-09/24/c_132746010.htm?utm_source=The+Sinocism+China+Newsletter&utm_campaign=c4f78bc521-Sinocism09_24_13&utm_medium=email&utm_term=0_171f237867-c4f78bc521-28360685

Notice the Google Analytics click tracking query parameters: utm_source, utm_campaign, utm_medium etc. Having these appended to the end of a URL means that Fever treats it as a different URL to the original:

http://news.xinhuanet.com/english/china/2013-09/24/c_132746010.htm

This is certainly what I’ve observed in my daily usage, and we can confirm it by checking the Fever source code - normalize_url() in libs/util.php is defined as:

<?php
function normalize_url($link) {
  // removes protocol and generic www subdomain
  $link = r('#^(?:https?|feed)://(?:www\.)?([^.]+\.)#i', '\\1', $link);
  // lowercases the domain name
  $link = r('#(^[^/]+/)#e', "low('$1')", $link);
  // removes /index.php and ilk
  $link = r('#/index\.[a-z0-9]+#i', '', $link);
  // removes directory slash before query
  $link = r('#/\?#', '?', $link);
  // removes final directory slash
  $link = r('#/$#', '', $link);
  return $link;
}

Can anything be done about this?

The obvious solution is to seek out feeds that don’t use URL obfuscation and include related links in the content of their feed items. Some examples I’ve found include:

You may find more spark links on my pinboard “spark” tag.

An alternative is to write scripts to modify specific feeds on the fly. This is obviously only possible if you’re comfortable writing a little code, and only worth it if the feed in question is of particular interest to you. Here is an example script I wrote using PHP’s DOM API.

Sometimes I see links in my hot list where the title is obviously the text of a tweet. Sometimes the title is just the name of the company whose website is being linked to. This got me wondering how Fever picks a title, given that it potentially has 10s or even 100s of different candidate titles. Let’s look at a specific example: my top link at the time of writing, the annoucement of Valve’s new Steam hardware.

Screenshot of a hot link

http://store.steampowered.com/livingroom/SteamMachines/

Querying the fever_links table for this URL we see it has the checksum 1937344971. Querying again for the checksum returns every instance of the URL which Fever has found:

mysql> SELECT created_on_time, item_id, is_item, url_checksum, title
    -> FROM fever_links 
    -> WHERE url_checksum=3887076886 
    -> ORDER BY created_on_time;
+-----------------+---------+---------+--------------+------------------------------+
| created_on_time | item_id | is_item | url_checksum | title                        |
+-----------------+---------+---------+--------------+------------------------------+
|      1380128415 |   64469 |       0 |   3887076886 | [link]                       |
|      1380128416 |   64389 |       0 |   3887076886 | [link]                       |
|      1380128507 |   64471 |       0 |   3887076886 | [link]                       |
|      1380128822 |   64441 |       1 |   3887076886 | SteamMachines                |
|      1380129088 |   64481 |       0 |   3887076886 | has announced                |
|      1380129088 |   64481 |       0 |   3887076886 | Valve                        |
|      1380129624 |   64549 |       0 |   3887076886 | has announced Steam Machines |
|      1380177532 |   66243 |       0 |   3887076886 | Steam Machines               |
+-----------------+---------+---------+--------------+------------------------------+
8 rows in set (0.00 sec)

The title of a “hot” link is the first title Fever finds for the link as a feed item (appearing as a link inside the body of another feed item doesn’t count).

This is over-ridden if the URL appears as a primary link in a feed you subscribe to (including other spark feeds, it needn’t be a kindling feed). You will also get a summary/excerpt, assuming the feed supplies one. So presentation is improved if something in one of your spark feeds catches on and gets lots of links from other spark feeds. This is possibly an argument for dumping lots of common news feeds into sparks.

I’ll probably continue to keep an eye on the behaviour of my spark feeds and update these notes when I have more usage data.

Other notes

2016

2014

2013

2012

2011

2010