static site generators can have little a database, as a treat
i’ve started working on autost again, with the hope i can tackle some of the problems i wrote about in april. so far, i’ve been cautiously experimenting with adding a database, which again, seems to be unavoidable if we want autost to be anything more than a Cohost Archive You Can Keep Posting To. but this can solve other problems too!
disclaimer
none of this has landed on the main branch yet, for good reason. it’s probably buggy as shit, so don’t run it on your real blog yet.
post ids
to publish a new post, we need to choose a new post id, and our current approach is pretty minimalist (derogatory):
- let
id
= 10000000 - try to create
posts/{id}.md
- if we can’t because it already exists:
- let
id
=id
+ 1 - go to step 2
- let
this is obviously inefficient, because it gets slower and slower the more posts you’ve made, and it doesn’t even work correctly if your last post was a .html
post.
we could cache the last post id in a text file, and that sounds easy at first, but storing data in text files gets complicated fast. now you have to worry about things like
- how to avoid data loss if a power outage happens while writing
- how to version and migrate the file format when you need to extend it
- how to make related changes to multiple files together or not at all
and all of these are solved problems if we use an actual database. here’s a simple sqlite database that solves our post id problems:
CREATE TABLE "post" (
"post_id" INTEGER PRIMARY KEY AUTOINCREMENT -- e.g. 10000250
, "path" TEXT NOT NULL -- e.g. '10000250.md'
, "rendered_path" TEXT NULL UNIQUE -- e.g. '10000250.html'
);
since the "rendered_path"
is UNIQUE
, we can’t accidentally create a "path" = '10000250.md'
and a "path" = '10000250.html'
that both have "rendered_path" = '10000250.html'
. and choosing the next post id so we can publish a new post is easy:
BEGIN;
-- no "post_id", so we generate it with AUTOINCREMENT.
-- empty "path", because we don’t know it yet.
INSERT INTO "post" ("path") VALUES ('');
-- now we check the “last insert id”, which is done outside sql.
-- let’s say it’s 10000250.
UPDATE "post" SET "path" = '10000250.md' WHERE "post_id" = 10000250;
COMMIT;
caching
right now, answering questions like “what posts are tagged #birds?” takes about as long as rendering your whole site, because in both cases we have to read all of your posts, parse the html, and extract the metadata. and when you publish or edit a post, we have to render your whole site from scratch.
this is generally how static site generators work, and Computer Touchers like that because having your whole blog be a bunch of text files is (a) elegant and (b) works well with version control. but the average Computer Toucher also has one (1) blog post, titled “New Year, New Blog, or: How I Learned To Stop Worrying and Switch From Twelvety to Sext.js”, so we should take their opinions with a grain of salt.
static site generators do have other benefits though. they’ve got good data portability, and static sites are easy to host on any web server, even web servers that are free because Big Coding is using them as a loss leader.
so maybe we want something in between, where your post files are still the primary source of truth, but we cache the metadata that we extract from them in the database, or we cache the fact that the post has already been rendered.
that will leave us with one of the two hard problems in computer science: now we need a way to know if a post has changed, or any of its attachments have changed, since it was last rendered or its metadata was last cached, and we also need to know what tag pages will need to be rerendered. hmm. is this just nix?