hnsync: A Hacker News Sync Tool
November 2024
hnsync is a tool designed to efficiently sync Hacker News items to a local SQLite database. It aims to provide a reliable and performant solution for those who want to analyze or monitor Hacker News data locally.
Getting Started
Running hnsync is simple. Just execute:
go run github.com/larose/hnsync@latest
By default, the synced data is stored in a file named hn.db, in a table called hn_items. This table contains two key columns:
- id: The unique identifier for each Hacker News item (INTEGER).
- data: The raw JSON response from the Hacker News API (TEXT).
Example Query
You can easily inspect the data using SQLite:
sqlite> SELECT id, data FROM hn_items LIMIT 1;
id data
-- ------------------------------------------------------------
1 {"by":"pg","descendants":15,"id":1,"kids":[15,234509,487171,
82729],"score":57,"time":1160418111,"title":"Y Combinator","
type":"story","url":"http://ycombinator.com"}
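Because the data column holds raw JSON, any program that reads the table has to decode it before using individual fields. Here is a minimal Go sketch of that step, using only the standard library. The Item struct and the decodeItem helper are hypothetical names introduced for illustration, not part of hnsync; the field names are taken from the JSON keys visible in the sample row above, and other item types may carry different keys.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Item is a hypothetical struct covering the JSON keys seen in the sample
// row. It is not a type exported by hnsync.
type Item struct {
	ID          int64   `json:"id"`
	By          string  `json:"by"`
	Type        string  `json:"type"`
	Title       string  `json:"title"`
	URL         string  `json:"url"`
	Score       int     `json:"score"`
	Time        int64   `json:"time"` // Unix timestamp in seconds
	Descendants int     `json:"descendants"`
	Kids        []int64 `json:"kids"`
}

// decodeItem parses the raw JSON text stored in the data column.
// Keys missing from the JSON leave the matching fields at their zero value.
func decodeItem(data string) (Item, error) {
	var it Item
	err := json.Unmarshal([]byte(data), &it)
	return it, err
}

func main() {
	// The data value for id = 1, joined back into a single line.
	raw := `{"by":"pg","descendants":15,"id":1,"kids":[15,234509,487171,82729],"score":57,"time":1160418111,"title":"Y Combinator","type":"story","url":"http://ycombinator.com"}`

	it, err := decodeItem(raw)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d %q by %s (%d points, %d kids)\n",
		it.ID, it.Title, it.By, it.Score, len(it.Kids))
}
```

In a real program the raw string would come from a SELECT against hn_items through whichever SQLite driver you prefer; that part is left out here to keep the sketch free of third-party dependencies.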
How It Works
The design of hnsync is straightforward yet powerful. It uses four types of workers to handle different aspects of syncing:
- Discoverer: Iterates from the first item ID to the current maximum item ID on Hacker News, finding new items and adding them to a processing queue.
- Refresher: Scans the database periodically to update existing items. Recent items are refreshed more frequently while older ones are updated less often.
- Syncer: A group of workers responsible for downloading item data from the Hacker News API and saving it to the SQLite database.
- Backfill Verifier: Runs at startup to detect and process any incomplete or missing items from the previous session. This ensures no data is lost, even if hnsync was interrupted abruptly.
Get In Touch
If you've built something using hnsync, I'd love to hear about it! Feel free to reach out and share your project.