add post on lwn rss feeds

Benjamin Hays 2024-12-21 16:28:33 -05:00
parent 84352979e8
commit 5a5d4561c0
Signed by: BenHays42
GPG Key ID: CE14B8B296ABEFB1
2 changed files with 117 additions and 4 deletions


@ -43,13 +43,14 @@ These roles and playbooks are fairly minimal and not too complex but I do enjoy
## Gitea Actions to the Rescue!
The answer, if the title did not make it obvious, is GitHub Actions (or rather Gitea Actions, but we'll get to that). Gitea Actions were the missing piece in transforming my Ansible playbooks from a manual toolset into a fully automated configuration management system.
If you've never encountered it, [Gitea Actions](https://docs.gitea.com/usage/actions/overview) is essentially an open-source, self-hosted alternative to GitHub Actions, designed to provide continuous integration and continuous deployment (CI/CD) capabilities directly within a Git repository. Gitea Actions uses a workflow syntax nearly identical to GitHub's, making it remarkably easy to migrate or adapt existing workflows.
The workflow files are written in YAML and stored in the `.gitea/workflows/` directory of a repository. These files define exactly how automation should occur - specifying triggers, environment requirements, and the precise steps to execute. Based on them, Gitea Actions spins up ephemeral runners - temporary Docker containers that execute the specified workflow steps in order. These runners are self-hosted, allowing me to use my own infrastructure for running jobs. In my homelab, this means I can run complex deployment and testing workflows without relying on external cloud services, paying for hosted runners, or, most importantly, creating a security risk by allowing external traffic to manage my servers.
The following is the Gitea Action that generated the site you're currently reading:
```yml
# .gitea/workflows/build.yaml
name: Build Hugo Site
@ -82,7 +83,7 @@ My workflows, like the one shown above are fairly rudimentary examples, but they
## Gitea Actions for Ansible
Now that we've covered that primer, let's take a closer look at the two workflows that power my infrastructure automation: `ansible-deploy.yml` and `ansible-lint.yml`.
### Deployment Workflow
@ -146,7 +147,6 @@ The dependency and environment setup follows a careful, reproducible pattern:
Caching Python packages might seem like a minor optimization, but it significantly speeds up repeated workflow runs. The explicit installation of Ansible Galaxy requirements ensures that all necessary roles and collections are available.
```yaml
- name: Run playbook
  uses: dawidd6/action-ansible-playbook@v2
@ -160,6 +160,7 @@ Caching Python packages might seem like a minor optimization, but it significant
      --inventory Ansible/inventory/homelab.ini
      --extra-vars "@Ansible/homelab-vault/secrets.yml"
```
As mentioned before, the beauty of Gitea Actions is that I can easily grab an action written for GitHub and include it in my workflow. The step above is what does most of the heavy lifting, Ansible-wise.
Notice that the prior step specifies an Ansible Vault password and a reference to `secrets.yml`. This is an encrypted archive of CI/CD secrets used by the playbook, stored in a private Gitea repository that the workflow pulled in an earlier step. This way I can use privileged information in my Ansible automation whilst keeping it safe and secure.
@ -194,4 +195,4 @@ This workflow runs on every push, automatically checking my Ansible code against
## Conclusion
The beauty of these workflows lies in their simplicity and transparency. Every configuration change is tracked, every deployment is automated, and the entire process is documented directly in the repository. Now, it's very far from the most elegant solution, and I am certainly not a programmer at heart, but I've found this collection of tools to be very valuable in my home-lab journey. It has saved me from hours of work and made my network a good deal safer in the process. As always, thank you for reading!


@ -0,0 +1,112 @@
---
title: "Building Custom RSS Feeds for LWN.net"
date: 2024-12-21
toc: false
tags:
- Software Development
- Python
- Automation
---
## Introduction
If you've spent a bit of time in the Linux community, especially anything kernel-related, you may be familiar with [Linux Weekly News](https://lwn.net). It's an invaluable resource for kernel development news, security updates, and general open source coverage. While they offer RSS for syndication, I found myself wanting more control over the feeds themselves - specifically, the ability to filter out subscriber-only articles.
**Disclaimer**: I completely support the efforts of LWN.net, and you should strongly consider purchasing a membership there if you are able to. As a student, it's unfortunately a bit outside of the realm of possibility for me right now. All of the original RSS feeds used are publicly available for free on their website.
If you want to skip my code/solution altogether, you can find my filtered feeds [here](#conclusion).
## My Solution
LWN.net operates on a subscription model where some articles are only available to subscribers and become freely available after a week. Others, like security patch updates, are available immediately and remain free forever. The official RSS feeds include all articles, which can be a bit frustrating when scrolling through my e-reader only to find things that won't be available until next week - conveniently just enough time for me to forget about them entirely.
```python
import requests
from lxml import etree as ET  # lxml's etree module provides an ElementTree-compatible API

def download_feed(s, url, file, remove_premium=False):
    r = s.get(url)  # s is a requests.Session object shared across feeds
    # parse from bytes so the feed's XML encoding declaration is handled correctly
    tree = ET.ElementTree(ET.fromstring(r.content))
    root = tree.getroot()
    # iterate over a copy of the items so removals don't skip entries
    for post in list(tree.iter('item')):
        # subscriber-only articles are marked with "[$]" in the title
        if remove_premium and "[$]" in post.find('title').text:
            root[0].remove(post)  # root[0] is the <channel> element
    tree.write(file)
```
The code is quite straightforward and minimal - it downloads the RSS feed, parses it using `lxml`, and removes any items marked with the "[$]" symbol that indicates subscriber-only content. I chose the `lxml` package over alternatives like `feedparser` because it gives me more direct control over the XML structure and allows me to easily write the results of my modifications to a file.
```python
s = requests.Session()
download_feed(s, "https://lwn.net/headlines/Features", "lwn-features.xml", remove_premium=True)
download_feed(s, "https://lwn.net/headlines/rss", "lwn-all.xml", remove_premium=True)
```
This code snippet creates the feeds that I eventually upload and use in my news app. Pretty convenient for a dozen or so lines of code, I'd say.
## Automating Updates
Of course, a static, out-of-date feed isn't very useful. I needed a way to regularly update the feeds to catch new articles as they're published. This is where Gitea Actions comes in. I set up a workflow that runs on an hourly schedule (and on every push to `main`):
```yaml
name: Update RSS Feeds
on:
  push:
    branches:
      - 'main'
  schedule:
    - cron: '@hourly'
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Copy SSH Key
        run: |
          mkdir ~/.ssh/
          echo "Host *" > ~/.ssh/config
          echo " StrictHostKeyChecking no" >> ~/.ssh/config
          echo '${{secrets.SSH_PRIVATE_KEY}}' > ~/.ssh/id_rsa
          chmod 600 ~/.ssh/id_rsa
      - name: Install Prereqs
        run: |
          apt update -y
          apt install python3-requests python3-lxml -y
      - uses: actions/checkout@v3
        with:
          submodules: recursive
      - name: Generate Feeds
        run: |
          python3 generate_feeds.py
      - name: Deploy to Server
        run: |
          scp -i ~/.ssh/id_rsa -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -r lwn-*.xml bhays@10.0.0.20:/var/www/html/
```
The workflow is also fairly small - it just installs the required Python packages and runs the feed generator script. The generated feeds are then copied to my web server using SSH, making them available at predictable URLs. It's a bit of a hacky solution, given that CI/CD jobs were never really quite "made" for this, but it seems to be working perfectly fine without any downsides.
## Looking Forward
There's still plenty of room for improvement. I'd like to add full-text parsing for the articles that are freely available - you might notice the TODO comment in the original code repository. This would make the feeds more useful in feed readers that don't automatically fetch the full article content. Personally, I'm content with the other services that perform this task, but it's definitely an idea that could be worked upon.
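To make that idea a bit more concrete, here's a rough sketch of what such a step might look like. The `add_full_text` helper and the XPath selector are mine for illustration, not something from the repository, and they assume the freely available articles expose their body in a predictable container:

```python
import requests
from lxml import etree as ET, html

def add_full_text(s, tree):
    # hypothetical helper: embed each free article's body into its <description>
    for post in tree.iter('item'):
        page = html.fromstring(s.get(post.find('link').text).content)
        # assumes the article body sits in a predictable element; the XPath may need adjusting
        body = page.xpath('//div[@class="ArticleText"]')
        if body:
            desc = post.find('description')
            if desc is None:
                desc = ET.SubElement(post, 'description')
            desc.text = html.tostring(body[0], encoding='unicode')
```

Hooked in just before `tree.write(file)`, something like this would inline the article text for feed readers that never fetch it themselves.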
I'm also considering adding more specialized feeds. For example, a feed that only includes security-related articles, or one that focuses on kernel development discussions. The nice thing about having the basic infrastructure in place is that adding new feeds is just a matter of writing the appropriate filters.
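For instance, a security-only feed could reuse the same structure as `download_feed` and simply keep the items whose titles match a keyword list. Here's a minimal sketch under that assumption - the keyword list, the `download_filtered_feed` name, and the `lwn-security.xml` output file are all hypothetical:

```python
import requests
from lxml import etree as ET

SECURITY_KEYWORDS = ("security", "vulnerability", "cve")  # hypothetical filter terms

def download_filtered_feed(s, url, file, keywords=SECURITY_KEYWORDS):
    # same approach as download_feed, but keeps only items whose titles mention a keyword
    tree = ET.ElementTree(ET.fromstring(s.get(url).content))
    channel = tree.getroot()[0]  # the <channel> element holding the items
    for post in list(tree.iter('item')):
        title = (post.find('title').text or "").lower()
        if not any(keyword in title for keyword in keywords):
            channel.remove(post)
    tree.write(file)

s = requests.Session()
download_filtered_feed(s, "https://lwn.net/headlines/rss", "lwn-security.xml")
```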
## Running Your Own Instance
If you want to set up your own custom LWN feeds, the code is available on my [Gitea instance](https://git.benhays.org/BenHays42/lwn-rss/). You'll need:
- Python 3 with the `requests` and `lxml` packages
- A web server to host the generated feeds (or you could find a way to publish them using GitHub Pages)
- Basic understanding of Gitea Actions (or GitHub Actions if you prefer)
## Conclusion
Just remember that while we're filtering the feeds, all the content still belongs to LWN.net. If you find value in their reporting, consider supporting them with a subscription. I certainly do.
The feeds I generate are available at:
- All free articles: <https://benhays.org/lwn-all.xml>
- Free featured articles: <https://benhays.org/lwn-features.xml>
As always, thanks for reading!