All about my awful RSS feed generator

2020-08-11 00:00

So, I joined the RSS bandwagon not too long ago. Right now, I’m just using Thunderbird’s built-in RSS feed manager to follow friends. I used to keep a text file called ~/Documents/links/friends-websites.txt, and I still do for friends who don’t have an RSS feed. This file contains line-separated links to friends’ homepages, which I would check most mornings while waking up with a cup of coffee.

Something about following friends’ personal homepages is way more appealing than feed-based social media sometimes. It’s relaxing, doesn’t demand attention, doesn’t have notifications, and I don’t even need an account to follow people. How great is that? I guess it’s like having a newspaper full of my friends’ beautiful discourse.

… Okay, RSS feeds are still feeds hahaha.

So, naturally, I wanted friends to have the same convenient access to my homepage activity as I did to theirs, so I began to research how RSS worked. I didn’t really know at all to be honest, and the XML examples on Wikipedia confused the hell out of me.

After staring at the examples for a while, I kind of got the gist of what they were and their structure.

The example on Wikipedia indicated that I didn’t need much. In its example, it had a title, a description, a link, a build date, a publishing date, and a “ttl”–whatever the hell that is.

These elements seemed like they should exist at the top of my RSS feed, inside a <channel> element, and each of them should only occur once.

Below those, still inside the <channel> element, there were a few <item> elements, and inside each <item> element there was a <title>, <description>, <link>, <guid>, and a <pubDate> element.

At this point I was starting to understand more about how RSS worked.

After reading the Wikipedia page on RSS, I went to check the official RSS specification to see what it had to say. Luckily, it listed “required channel elements”, which said you only need the following elements in the <channel> element:

<title>, which contains the name of the RSS feed
<link>, which contains a link to your website, not the RSS feed file
<description>, which contains a phrase describing your feed

With this information I made a few definitions in Racket that would later populate the <title>, <link>, and <description> elements:

#lang racket/base

(require racket/file
         racket/string)

(define title        "m455's blog")
(define homepage-url "https://m455.casa")
(define description  "A blog about programming, documentation, and anything that interests me.")

Next, I had to figure out which elements were required inside of the <item> element. There was no section that was called “Required …”, but I did manage to find this phrase from the Elements of <item> section:

All elements of an item are optional, however at least one of title or description must be present.

According to this, I only needed a <title> or a <description> element.

This was fine, because I could use the <title> element as the title of any given blog post.

Conveniently, the specification also mentioned that you could have a <link> element inside of the <item> element. This was great, because this would mean I could include a link for each blog post, so users could click a link in an RSS feed, instead of referencing the title text and searching for it manually.

After reading that, I decided to make a test file based on the requirements I gathered:

<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0">

<channel>
  <title>m455's blog</title>
  <link>https://m455.casa</link>
  <description>A blog about programming, documentation, and anything that interests me.</description>

  <item>
    <title>This is test1</title>
    <link>https://m455.casa/posts/this-is-a-test1</link>
  </item>

  <item>
    <title>This is test2</title>
    <link>https://m455.casa/posts/this-is-a-test2</link>
  </item>

</channel>
</rss>

I took this over to the W3C Feed Validation Service and decided to test it.

Unsurprisingly, the validator returned errors:

“item should contain a guid element”
“Missing atom:link with rel="self"”

I clicked the help link beside the guid-related error, and the help documentation said that all I needed to do was add in a <guid> element. That was great, but I had no clue what I was supposed to populate the <guid> element with.

After some research, which was basically a bunch of Wikipedia rabbit holes, I found out that the <guid> just needs to be a “unique identifier” for each <item> in a <channel>, so what better unique identifier for an item than the link to the item itself!

After modifying my already-modified RSS feed, I threw it at the RSS validator again:

<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0">

<channel>
  <title>m455's blog</title>
  <link>https://m455.casa</link>
  <description>A blog about programming, documentation, and anything that interests me.</description>

  <item>
    <title>This is test1</title>
    <link>https://m455.casa/posts/this-is-a-test1</link>
    <guid>https://m455.casa/posts/this-is-a-test1</guid>
  </item>

  <item>
    <title>This is test2</title>
    <link>https://m455.casa/posts/this-is-a-test2</link>
    <guid>https://m455.casa/posts/this-is-a-test2</guid>
  </item>

</channel>
</rss>

… and that seemed to get rid of the guid-related error! Did I do it right? Who knows!

Next up was that mysterious “Missing atom:link with rel="self"” error.

I clicked the help link beside the error, and it gave me the following solution:

If you haven’t already done so, declare the Atom namespace at the top of your feed, thus: <rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">

Then insert a atom:link to your feed in the channel section. Below is an example to get you started. Be sure to replace the value of the href attribute with the URL of your feed. <atom:link href="http://dallas.example.com/rss.xml" rel="self" type="application/rss+xml" />

The first suggestion just required you to add the xmlns:atom=... at the top of the RSS feed, but the second suggestion took a bit of fiddling around to figure out.

It turns out all I needed to do was provide a link to the RSS feed itself.

So, I, yet again, modified my RSS feed to test against the RSS validator:

<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">

<channel>
  <title>m455's blog</title>
  <link>https://m455.casa</link>
  <description>A blog about programming, documentation, and anything that interests me.</description>
  <atom:link href="https://m455.casa/feed.rss" rel="self" type="application/rss+xml" />

  <item>
    <title>This is test1</title>
    <link>https://m455.casa/posts/this-is-a-test1</link>
    <guid>https://m455.casa/posts/this-is-a-test1</guid>
  </item>

  <item>
    <title>This is test2</title>
    <link>https://m455.casa/posts/this-is-a-test2</link>
    <guid>https://m455.casa/posts/this-is-a-test2</guid>
  </item>

</channel>
</rss>

… which turned out to be valid!

The validator did complain that “Self reference doesn’t match document location”, but that was because it was trying to follow the link to the RSS feed when the feed didn’t exist yet. To make sure it worked, I added the test feed to my live website and validated it by URL instead of by direct input, and it passed!

Now I knew what it took to create a bare-minimum, valid RSS feed.

This meant I could finally start the fun part: The programming of the RSS feed generator :D

For my definitions, I came up with:

(define title            "m455's blog")
(define homepage-url     "https://m455.casa")
(define description      "A blog about programming, documentation, and anything that interests me.")
(define reference-file   "pages/posts.md")
(define feed-file        "feed.rss")
(define feed-file-output (string-append "output/" feed-file))
(define feed-file-url    (string-append homepage-url "/" feed-file))

The feed-file-url creates a https://m455.casa/feed.rss link, while the feed-file-output creates the location of where my feed-file is to be generated: output/feed.rss
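To double-check what those two definitions evaluate to, here is a minimal sketch that repeats only the definitions they depend on and prints both values when run:

```racket
#lang racket/base

;; Same definitions as above, trimmed to the ones the two paths depend on.
(define homepage-url     "https://m455.casa")
(define feed-file        "feed.rss")
(define feed-file-output (string-append "output/" feed-file))
(define feed-file-url    (string-append homepage-url "/" feed-file))

feed-file-url    ; => "https://m455.casa/feed.rss"
feed-file-output ; => "output/feed.rss"
```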

I needed a separate output location for my feed-file because the RSS-generator script lives in the same directory as the output/ directory, alongside posts/, pages/, images/, etc.

There is one special Markdown file I need to parse and transform: posts.md, which lives inside of the pages/ directory. This file contains a bulleted list of titles and links to all of my blog posts, which would soon be converted into an RSS feed.

I chose this file because it has the two pieces of information I need for each <item> element in my RSS feed:

The title of the blog post
The link to the blog post
Because the <guid> element will be populated with the same information as the <link> element, I didn’t have to worry about finding data to populate the <guid> element with.

You can see what the whole posts.md file looks like below:

# Posts

* [Thoughts on technical writing and accidentally gatekeeping communities](/posts/thoughts-on-technical-writing-and-accidentally-gatekeeping-communities.html)
* [Having fun with Lisp(s)](/posts/having-fun-with-lisps.html)
* [Public Unix server etiquette](/posts/public-unix-server-etiquette.html)
* [What I like about the Scheme community](/posts/what-i-like-about-the-scheme-community.html)
* [What are social Unix servers?](/posts/what-are-social-unix-servers.html)
* [Redirecting your GitHub Pages website to a Dat url](/posts/redirecting-your-github-pages-website-to-a-dat-url.html)
* [Setting up graphical applications in Windows Subsystem for Linux](/posts/setting-up-graphical-applications-in-windows-subsystem-for-linux.html)
* [Setting up a Chinese input method on GNU/Linux](/posts/setting-up-a-chinese-input-method-on-gnulinux.html)
* [A quick guide to pronouncing Chinese words](/posts/a-quick-guide-to-pronouncing-chinese-words.html)
* [Interpreting second language speakers](/posts/interpreting-second-language-speakers.html)
* [Learn to read and type Chinese: A primer for the people of the internet](/posts/learn-to-read-and-type-chinese.html)

I decided that all I would need to do to convert this file into an RSS feed is:

Remove the # Posts title
Remove the * bullet points
Extract the title, which exists between the [ and ], and store it in a local definition
Remove the /posts/ bit from links
Extract the link, which exists between the ( and ), and store it in a local definition
Remove the brackets and parentheses around the title and links

The removal of items can be emulated by searching for a string and replacing it with "".

The extraction of the link and title information can be done with regex.
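Here is a minimal sketch of the extraction step, assuming every bullet line looks exactly like the ones in posts.md. The names parse-post-line and parse-posts are made up for this example, and the sketch folds the removal steps into one regex instead of separate search-and-replace passes, so it isn't necessarily how the real script does it. It also leaves the /posts/ part of each link untouched:

```racket
#lang racket/base

;; Hypothetical helper (not from the original script): turn one line of
;; posts.md into a (title . link) pair, or #f when the line is not a
;; bulleted Markdown link, such as "# Posts" or a blank line.
(define (parse-post-line line)
  (define parts
    (regexp-match #rx"^\\* \\[(.*)\\]\\((.*)\\)$" line))
  (and parts
       (cons (cadr parts) (caddr parts))))

;; Keep only the lines that parsed successfully.
(define (parse-posts lines)
  (filter values (map parse-post-line lines)))

;; Prints the two (title . link) pairs; the heading and blank line are dropped.
(parse-posts
 (list "# Posts"
       ""
       "* [Having fun with Lisp(s)](/posts/having-fun-with-lisps.html)"
       "* [Public Unix server etiquette](/posts/public-unix-server-etiquette.html)"))
```

Anchoring the regex with ^ and $ means the heading and blank lines simply fail to match, so nothing has to be deleted from them first.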

The rest of my script just needs to create string templates that are formatted, populated, and then stitched together.

One thing I really enjoyed about using Racket for this little project was that I could use string blocks, which allowed me to type in string values in a very free-form manner.

You can see what I mean by “free-form” below. Basically, everything between #<<string-block and string-block is treated as a string. New lines, tabs, etc. are all rendered as well, so it’s almost the same experience you would get if you were to type text into a plain-text file.

(define rss-header
  (format
  #<<string-block
<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">

<channel>
  <title>~a</title>
  <link>~a</link>
  <description>~a</description>
  <atom:link href="~a" rel="self" type="application/rss+xml" />

string-block
title
homepage-url
description
feed-file-url
))

Each ~a is filled in, in order, with the title, homepage-url, description, and feed-file-url definitions, the same way any (format ... title homepage-url description feed-file-url) call would fill them.
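The same trick works for the rest of the feed. Below is a rough sketch of an <item> template and a closing footer, stitched together with string-append. The names rss-item, rss-footer, and rss-items+footer are hypothetical, and the sketch assumes the final URL is just homepage-url plus the path from posts.md, which may not match what the real script does:

```racket
#lang racket/base

;; Assumed to match the definition earlier in the post.
(define homepage-url "https://m455.casa")

;; Hypothetical template for a single <item>. The <guid> reuses the
;; link, since the link already identifies the post uniquely.
(define (rss-item title path)
  (define url (string-append homepage-url path))
  (format
   #<<string-block
  <item>
    <title>~a</title>
    <link>~a</link>
    <guid>~a</guid>
  </item>

string-block
   title
   url
   url))

;; Hypothetical footer that closes the elements rss-header opened.
(define rss-footer "</channel>\n</rss>\n")

;; Stitch every item together, then append the footer.
;; posts is a list of (title . path) pairs.
(define (rss-items+footer posts)
  (string-append
   (apply string-append
          (for/list ([post (in-list posts)])
            (rss-item (car post) (cdr post))))
   rss-footer))

(display
 (rss-items+footer
  '(("Having fun with Lisp(s)" . "/posts/having-fun-with-lisps.html")
    ("Public Unix server etiquette" . "/posts/public-unix-server-etiquette.html"))))
```

Prepending rss-header to that result would give the whole feed as one string. One thing this sketch skips: a title containing & or < would need XML escaping before going into the template.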

Even though my RSS feed’s title, homepage link, and description aren’t directly connected to my website generator’s source files, it’s still fun to have a default RSS template that I can pass around for future websites I create, as long as I follow the same format as posts.md. Yeah, it’s bad design, but that’s why this RSS-feed generator is beautifully awful haha.