it’s only nix if it comes from the utrecht region of the netherlands. otherwise it’s just sparkling input hashing

how do we add caching to a static site generator? quoth last episode:

we need a way to know if a post has changed, or any of its attachments have changed, since it was last rendered or its metadata was last cached, and we also need to know what tag pages will need to be rerendered.

after a bunch of experimentation, i think nix is pretty much what we want, but nix doesn’t really work on windows, so it can’t be nix, so let’s build our own nix :3

disclaimer again

none of this has landed on the main branch yet, for good reason. it’s probably buggy as shit, so don’t run it on your real blog yet.

juicy numbers

on the cache branch as of 3e79cc5654c96:

this blog (4803 threads)

this blog is my cohost archive plus a couple hundred newer posts. that’s 4803 posts (9816 if you count posts referenced by replies), and ~4.7 GB of attachments.

| | no cache | cold cache | warm cache |
| --- | --- | --- | --- |
| rendering the site | 828 ms | 1731 ms | 385 ms (−53.5%) |
| rendering after edit | | | 438 ms |
| querying a tag | 236 ms | 1045 ms | 90 ms (−61.8%) |

note: percentages are relative to “no cache”
details: rendering the site
$ export RUST_LOG=autost=warn
$ time autost render --skip-attachments
9.36s user 1.30s system 1287% cpu 0.828 total
$ time autost render --skip-attachments --use-cache
10.26s user 3.97s system 821% cpu 1.731 total
$ time autost render --skip-attachments --use-cache
1.02s user 1.53s system 661% cpu 0.385 total
details: rendering again after editing tags in one of the posts
$ unset RUST_LOG
$ time autost render --skip-attachments --use-cache
INFO build{function=ReadFile id=b0143e6e233ed...}: autost::cache: building
INFO build{function=RenderMarkdown id=e0c542ce051e2...}: autost::cache: building
INFO build{function=FilteredPost id=5ad850559c540...}: autost::cache: building
INFO build{function=Thread id=799f2c033be6a...}: autost::cache: building
INFO build{function=RenderedThread id=844c6195470bd...}: autost::cache: building
INFO build{function=Thread id=9dda7da0c21ae...}: autost::cache: building
INFO build{function=RenderedThread id=eac9c7f17c6f8...}: autost::cache: building
INFO autost::cache: writing cache pack 5ad
INFO autost::cache: writing cache pack 9dd
INFO autost::cache: writing cache pack 799
INFO autost::cache: writing cache pack e0c
INFO autost::cache: writing cache pack eac
INFO autost::cache: writing cache pack 844
INFO autost::cache: writing cache pack b01
1.27s user 1.72s system 681% cpu 0.438 total
details: querying a tag
$ unset RUST_LOG
$ time autost cache test --list-threads-in-tag usb3sun --use-cache
37 threads in tag "usb3sun":
- "2023-01-06T18:08:35.092Z", "posts/787278.html", "SPARCstations have a unique serial-like interface "
- "2023-06-23T17:15:54.361Z", "posts/1742287.html", "usb3sun is an adapter that lets you connect usb ke"
- "2023-06-24T17:50:23.538Z", "posts/1650431.html", "SPARCstations have a unique serial-like interface "
[...]
INFO autost::cache: writing cache pack 372
0.49s user 0.93s system 410% cpu 0.347 total
$ rm -R cache
$ export RUST_LOG=autost=warn
$ time autost cache test --list-threads-in-tag usb3sun
1.77s user 0.27s system 864% cpu 0.236 total
$ time autost cache test --list-threads-in-tag usb3sun --use-cache
2.39s user 2.77s system 493% cpu 1.045 total
$ time autost cache test --list-threads-in-tag usb3sun --use-cache
0.07s user 0.30s system 412% cpu 0.090 total

big archive (146752 threads)

ok but what if i merged the cohost archives of everyone i followed into one very big archive? that’s 146752 posts (309140 if you count posts referenced by replies), and ~64 GB of attachments.

| | no cache | cold cache | warm cache |
| --- | --- | --- | --- |
| rendering the site | 24.90 s | 35.75 s | 11.18 s (−55.1%) |
| querying a tag | 5.551 s | 11.67 s | 1.547 s (−72.1%) |

note: percentages are relative to “no cache”
details: rendering the site
$ export RUST_LOG=autost=warn
$ time autost render --skip-attachments
258.96s user 49.31s system 1237% cpu 24.902 total
$ time autost render --skip-attachments --use-cache
309.85s user 72.00s system 1067% cpu 35.755 total
$ time autost render --skip-attachments --use-cache
33.57s user 44.87s system 701% cpu 11.188 total
details: querying a tag
$ unset RUST_LOG
$ time autost cache test --list-threads-in-tag usb3sun
37 threads in tag "usb3sun":
- "2023-01-06T18:08:35.092Z", "posts/787278.html", "SPARCstations have a unique serial-like interface "
- "2023-06-23T17:15:54.361Z", "posts/1742287.html", "usb3sun is an adapter that lets you connect usb ke"
- "2023-06-24T17:50:23.538Z", "posts/1650431.html", "SPARCstations have a unique serial-like interface "
[...]
2025-08-28T09:39:27.095295Z  INFO autost::cache: writing cache pack 857
14.77s user 18.37s system 381% cpu 8.692 total
$ rm -R cache
$ export RUST_LOG=autost=warn
$ time autost cache test --list-threads-in-tag usb3sun
34.12s user 5.86s system 720% cpu 5.551 total
$ time autost cache test --list-threads-in-tag usb3sun --use-cache
57.26s user 32.47s system 768% cpu 11.673 total
$ time autost cache test --list-threads-in-tag usb3sun --use-cache
1.47s user 5.72s system 464% cpu 1.547 total

anyway! i repeat, how do we add caching to a static site generator?

bad answer: timestamps

make is an incremental build system, where “incremental” means it only rebuilds things that have changed. it does this with file timestamps and a dependency graph: a file needs to be rebuilt if any of its dependencies are newer than it.
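
in rust, that check might look something like this hypothetical needs_rebuild() helper (make itself works differently internally, but the rule is the same):

```rust
use std::{fs, io, path::Path};

// hypothetical helper, not real autost code: a make-style staleness check.
// a target needs rebuilding if it's missing, or any dependency is newer.
fn needs_rebuild(target: &Path, deps: &[&Path]) -> io::Result<bool> {
    let target_mtime = match fs::metadata(target) {
        Ok(meta) => meta.modified()?,
        Err(_) => return Ok(true), // missing target always gets rebuilt
    };
    for dep in deps {
        if fs::metadata(dep)?.modified()? > target_mtime {
            return Ok(true);
        }
    }
    Ok(false)
}
```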

this is inadequate, because you can easily change the timestamp without changing the contents or change the contents without changing the timestamp. and good luck if you want to simultaneously cache more than one version of a build product.

this is also unnecessarily frugal: reading and hashing file contents is cheap enough that we don’t need to rely on timestamps at all, except for attachments, as we’ll see shortly.

better answer: hashes

nowadays reading files is pretty fast, as long as they’re stored on an ssd. let’s write a little microbenchmark to see just how fast it can be. all of these results were taken on my framework 13 with an AMD 7840U and a charger connected.

details: running the microbenchmark
$ autost cache benchmark <posts|posts-recursive|attachments> sum-paths-len <10..=100>
$ autost cache benchmark <posts|posts-recursive|attachments> sum-read-len <10..=100>
$ autost cache benchmark <posts|posts-recursive|attachments> blake3 <10..=100>
$ autost cache benchmark <posts|posts-recursive|attachments> blake3-mmap-rayon <10..=100>
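
the two hashing modes boil down to something like this sketch using the blake3 crate (hedged; the real benchmark code lives in autost, and the second function needs the crate’s `mmap` and `rayon` features):

```rust
use std::{fs, io, path::Path};

// roughly what the `blake3` benchmark mode measures: read the whole file,
// then hash it in one shot
fn hash_read(path: &Path) -> io::Result<blake3::Hash> {
    Ok(blake3::hash(&fs::read(path)?))
}

// roughly what `blake3-mmap-rayon` measures: memory-map the file and hash
// it on multiple threads
fn hash_mmap_rayon(path: &Path) -> io::Result<blake3::Hash> {
    let mut hasher = blake3::Hasher::new();
    hasher.update_mmap_rayon(path)?;
    Ok(hasher.finalize())
}
```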

this blog (4803 threads)

it looks like reading and hashing all of the posts doesn’t even double the time taken to walk the tree, but doing all of the attachments would incur an over 30x time penalty.

| | time to walk | …and read | …and blake3 |
| --- | --- | --- | --- |
| 4803 posts | 5.62 ms | 9.69 ms | 9.94 ms (1.76x) |
| 9816 posts (recursive) | 11.32 ms | 18.24 ms | 19.21 ms (1.69x) |
| 5520 attachments | 14.48 ms | 458.0 ms | 508.9 ms (35.1x) |

note: multipliers are relative to “time to walk”

big archive (146752 threads)

again it’s less than a 2x penalty if we read and hash the posts, but an over 30x penalty for attachments.

| | time to walk | …and read | …and blake3 |
| --- | --- | --- | --- |
| 146752 posts | 197.6 ms | 298.2 ms | 313.5 ms (1.58x) |
| 309140 posts (recursive) | 388.9 ms | 612.6 ms | 630.3 ms (1.62x) |
| 126626 attachments | 326.4 ms | 9417 ms | 11027 ms (33.7x) |

note: multipliers are relative to “time to walk”

another way of looking at it is, if we’re gonna have to read a bunch of files, we might as well hash them so we can easily check if they’ve changed. again this doesn’t make sense for attachments, whose contents are never actually read by autost render. i’m not sure i have a good solution for attachments yet.

best answer: nix?

rendering your site involves many steps:

  • making a list of posts
  • rendering the posts
    • reading the post sources
    • for posts in markdown, rendering the markdown
    • parsing the post html into a dom tree
    • extracting metadata
      • and rendering the referenced posts (if any)
      • and scanning for references to attachments
    • applying transformations (e.g. making all images lazy loaded)
    • serialising the dom tree back to html
    • sanitising the html to “safe” tags and attributes
  • writing the output files
    • copying the static files (e.g. style.css, script.js)
    • hard linking the referenced attachments
    • writing a page for each thread, containing that thread
    • writing a page for each tag, containing all of its threads
    • writing an atom feed for each tag, containing all of its threads
    • writing a page for each built-in collection (e.g. index.html)

we don’t just want to cache the final build outputs (html pages and atom feeds), because the intermediate build steps are also useful for things like querying metadata. so how do we build a cache for all of the different kinds of build steps without descending into cache invalidation and ad-hoc serialisation hell?

many of these steps are pretty much functions (in the mathematical sense) that take some input and transform it to some output. in fact, all of them are, although some of them are easier to describe that way than others.

nix is a build system and package manager that uses this observation to build the largest ever single repository of software with efficient binary caching. how can we apply its ideas here?

imagine the process of loading a thread, just enough to know its metadata like its tags and what attachments it references:

  • for the last post in the thread
    • read the post sources
    • if the post is in markdown, render the markdown
    • parse the html and extract metadata
  • repeat for each of the posts it references

now let’s say we described each of these steps as a function. note that for the caching to work correctly, it’s very important that aside from readFile(), all of the other functions are pure. they can only use their input arguments, which is why makeThread() can’t take one post and load all of the other posts it references from disk.

  • readFile(path, hash) → content: reads path with expected hash hash, and returns the content of that file with that hash (more on the hash later)
  • renderMarkdown(markdown) → html: renders the given markdown, and returns the rendered html
  • loadPost(html) → post(html, metadata): parses the given html into a dom tree, extracts the metadata, and returns a post
  • makeThread(posts…) → thread(posts…): takes the given posts, combines them into a thread, and returns that thread
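
in rust, those might look something like the hypothetical signatures below. the real definitions in src/cache.rs are more involved, but the key point survives: only read_file() touches the filesystem.

```rust
use std::{io, path::{Path, PathBuf}};

// hypothetical stand-ins for autost's real types
struct Metadata { tags: Vec<String>, references: Vec<PathBuf> }
struct Post { html: String, meta: Metadata }
struct Thread { posts: Vec<Post> }

// impure: touches the filesystem, so it carries an expected content hash
fn read_file(path: &Path, hash: blake3::Hash) -> io::Result<String> { todo!() }
// pure: output depends only on the input markdown
fn render_markdown(markdown: &str) -> String { todo!() }
// pure: parse the html into a dom tree and extract the metadata
fn load_post(html: &str) -> Post { todo!() }
// pure: combine already-loaded posts into a thread
fn make_thread(posts: Vec<Post>) -> Thread { todo!() }
```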

each of those functions just Builds The Thing. to make them cacheable and allow us to run them with maximum parallelism, we want to be able to make a “build plan” that completely describes how to Build The Thing without doing all of the work of actually Building The Thing.

let’s describe the build planning as a second set of functions:

  • readFilePlan(path) → “readFile(path, hash)”
  • renderMarkdownPlan(path) → “renderMarkdown(readFile(path, hash))”
  • loadPostPlan(path) →
    • | “loadPost(renderMarkdown(readFile(path, hash)))” if markdown
    • | “loadPost(readFile(path, hash))” if html
  • makeThreadPlan(path) → “makeThread(loadPost(…), loadPost(…), …)”
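
one concrete way to represent these plans is as a tree of “derivations”. here’s a hedged sketch as a rust enum; the real ones live in src/cache/drv.rs and don’t look exactly like this:

```rust
use std::path::PathBuf;

// a build plan as a tree: the leaves pin file contents by hash, and every
// other node names a pure function plus the plans for its inputs
#[derive(Debug)]
enum Drv {
    ReadFile { path: PathBuf, hash: blake3::Hash },
    RenderMarkdown { input: Box<Drv> },
    LoadPost { input: Box<Drv> },
    MakeThread { posts: Vec<Drv> },
}
```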

one problem we run into with makeThreadPlan() is that we need the metadata of one of the posts to know what other loadPost() calls to include in the result. this is unfortunate, because it means we can’t entirely avoid Building The Thing when doing build planning.

now here’s the magic!

let’s say we’re building the thread for 10000216.md, which replies to 10000215.md. makeThreadPlan(10000216.md) returns the build plan below. it was relatively easy to compute, and it completely describes how to load the thread from the files that make up the thread.

“makeThread(loadPost(renderMarkdown(readFile(10000216.md, 44062ae08bb67…))), loadPost(renderMarkdown(readFile(10000215.md, 8adba2770e58c…))))”

remember the “(more on the hash later)” from earlier? if the build plan for a thread includes the hash of each of its posts, then we can hash the build plan and get an ID that changes if and only if any of the posts have changed.

and with that kind of unique ID for each thing we build, caching is easy! since the contents of the build plan for a given ID can’t possibly change (by definition), and the contents of the output really shouldn’t change (if we wrote our build logic correctly), we never need to delete or update anything, except maybe to free up disk space.
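
here’s a hedged sketch of that, building on the hypothetical Drv enum above. using Debug output as the “canonical serialisation” is a shortcut for brevity, and run_builder() is a stand-in for actually Building The Thing:

```rust
use std::collections::HashMap;

// the id of a plan is the hash of a canonical serialisation of it; since
// the plan embeds the content hashes of its inputs, the id changes if and
// only if an input (or the shape of the plan itself) changes
fn drv_id(drv: &Drv) -> blake3::Hash {
    blake3::hash(format!("{drv:?}").as_bytes())
}

// hypothetical stand-in for actually Building The Thing
fn run_builder(drv: &Drv) -> Vec<u8> { todo!() }

// content-addressed caching: if we've built this id before, reuse it.
// outputs for a given id never need invalidating, so the store only gains
// entries (unless we prune it to free up disk space)
fn build(drv: &Drv, store: &mut HashMap<[u8; 32], Vec<u8>>) -> Vec<u8> {
    let id = *drv_id(drv).as_bytes();
    if let Some(output) = store.get(&id) {
        return output.clone();
    }
    let output = run_builder(drv);
    store.insert(id, output.clone());
    output
}
```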

to me, this is the essence of nix, and i had great fun capturing that essence in our funny little static site generator. if you wanna see how this ersatz nix works, check out src/cache.rs and src/cache/drv.rs ^w^

that’s the post. but if you’re hungry for even more detail, read on…


performant store

processing thousands or even hundreds of thousands of posts is hard work. some things that helped with cache storage bottlenecks:

  • staring at hundreds of flamegraphs
  • switching from sqlite to plain old files
  • switching from json to bincode
  • switching from tokio async to rayon sync
    • dedicated rayon thread pools for writing files with 4x the threads, since they will spend most of their time in syscalls waiting for i/o
    • explicitly creating a thread pool for normal work, since rayon otherwise just uses the thread pool of the innermost fork-join scope
  • caching build products in memory, then writing that to disk
    • storing native rust types in the memory cache, so we can take build product serialisation off the critical path
    • offloading writes to dedicated thread pools, so we can take build product file writing off the critical path
  • moving from one file per build product, to 4096 “packs” of cache data (sketched after this list)
    • splitting the memory cache into 4096 “packs” as well
    • splitting the dirty bit into 4096 dirty bits as well
  • reducing overheads of syncing memory caches with the disk store
    • switching from DashMap to HashMap, because it’s cheaper when we have thousands of them, and still performs well enough when sharded
    • lazy deserialisation of cache items, loading as Vec<u8> until needed
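
as promised, the pack sketch: the logs above say things like “writing cache pack 5ad” for a build product whose id starts with 5ad8…, which suggests a pack is just the first twelve bits (three hex digits) of the id. hypothetically (the real sharding may differ):

```rust
// map an id to one of the 4096 packs: the first twelve bits of the hash,
// i.e. the first byte plus the top nibble of the second byte.
// e.g. id 5ad850559c540… → pack 0x5ad
fn pack_index(id: &blake3::Hash) -> usize {
    let bytes = id.as_bytes();
    ((bytes[0] as usize) << 4) | ((bytes[1] as usize) >> 4)
}
```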

tag indexes

autost cache test --list-threads-in-tag builds a “tag index” to more efficiently look up posts having the given tag, but it was hard to do this in a way that wasn’t slower than just computing the metadata from scratch. two things made it possible.

the build plans themselves were expensive, because for each evaluation we had to deserialise a big tree of inputs, from ThreadDrv to FilteredPostDrv to RenderMarkdownDrv and finally down to ReadFileDrv, and load their outputs into the memory cache for no reason. we fixed this by hiding everything other than ReadFileDrv from the build plan, inferring them within the TagIndexDrv builder. idk about this… it feels like cheating or subverting how nix is supposed to work?

the obvious representation of the tag index is a HashMap or BTreeMap, but this has to be deserialised in full on first use without any parallelism, which is a waste unless our query involves the whole dataset. instead we can build a tiny sqlite database and serialise it to Vec<u8>, then serialise that as the build output. since sqlite can obviously query a table without building a whole map in memory, this is much faster.
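
a hedged sketch of that trick, using rusqlite and a temporary file (the real TagIndexDrv builder surely differs, and might serialise the database without touching disk):

```rust
use rusqlite::Connection;

// hypothetical sketch: build the tag index as a tiny sqlite database, then
// ship its raw bytes as the build output. the consumer can write those
// bytes back out and query a single tag without deserialising anything else
fn build_tag_index(threads: &[(String, Vec<String>)]) -> rusqlite::Result<Vec<u8>> {
    let path = std::env::temp_dir().join("tag-index.sqlite");
    let _ = std::fs::remove_file(&path); // start from an empty database
    let conn = Connection::open(&path)?;
    conn.execute_batch(
        "CREATE TABLE tag_thread (tag TEXT NOT NULL, thread TEXT NOT NULL);
         CREATE INDEX tag_idx ON tag_thread (tag);",
    )?;
    for (thread, tags) in threads {
        for tag in tags {
            conn.execute("INSERT INTO tag_thread VALUES (?1, ?2)", (tag, thread))?;
        }
    }
    drop(conn); // flush everything to disk before reading it back
    Ok(std::fs::read(&path).expect("read serialised tag index"))
}
```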