shuppy (@delan) [archived]

waow this one small trick will make your zpool scrub faster!!

in openzfs dsl_scan.c, changing `scn->scn_phys.scn_min_txg` from 0 to 7000000 in dsl_scan_setup_sync
zpool status after scrubbing with stock zfs 2.2.4, taking over seven minutes
zpool status after scrubbing with patched zfs 2.2.4, taking just over two minutes and issuing reads for only 316GiB of the 970GiB pool

(not really. i’m just patching zfs to reproduce a bug that happens near the end of a scrub)

shuppy (@delan) [archived]

we’ve done it. blazingly fast, zero-cost scrubbing

zpool status reporting “scan: scrub repaired 0B in 00:00:00 with 0 errors on Fri May 17 00:56:15 2024”
shuppy (@delan) [archived]

one small problem. i have no idea what range of txgs i want to scrub

in openzfs dsl_scan.c, adding some dataset name checks in `dsl_scan_visitds` that set one new flag `is_target_or_descendant` if the dataset is cuffs/code (or a snapshot or descendant), or another new flag `is_potential_ancestor` if the dataset is cuffs (or a snapshot or descendant). this gets used later to skip scrubbing and/or recursing the dataset (not pictured)
zpool status after scrubbing with this new patch, showing that the scrub finished in 00:01:17 after issuing reads for only 105G of the 971G pool. zfs list in another terminal shows that cuffs/code is only 106G

friendship ended with world’s most scuffed reimplementation of openzfs/zfs#15250. now world’s most scuffed reimplementation of openzfs/zfs#7257 is my best friend

shuppy (@delan) [archived]

tidied up the server closet

(before, after)

  • on the shelf: jane (router), tol (plex server), core switch
  • under the shelf: power meter, ethernet to other rooms, gpon ntd
  • floor: ups, venus (nas)
  • bed: @bark (not pictured)
shuppy (@delan) [archived]

label makers are great btw

ethernet switch, with ports labelled “jane”, “desks”, “ap”, “venus”, “tol”
gpon ntd, with port 2 of 4 labelled “▼ this one”
top of computer case, with two of the four usb ports covered by labels that read “not connected”
wall outlet labelled “HEADS UP! ON SAME CIRCUIT AS LAUNDRY AND LOUNGE”, partially obscured by a power meter

zpool_expansion time

two new WD80EFPX disks, with four labels resting on top, for these two plus the last two disks i added
handwritten labels, partially cut from a page out of a notebook like a flyer for a lost dog, showing that ocean5x0 used to be ocean4x1
those four disks, now with labels printed by a label printer
those four disks, now installed in a define 7 xl
  1. two new disks! unlike the WD80EFZZ, the WD80EFPX doesn’t seem to let you set the idle timer (wdidle3, idle3ctl). will need to keep an eye on those load cycle counts

  2. gonna interleave them with the last pair of disks i added, so we don’t end up with two mirror vdevs having both of their disks bought at the same time (rough sketch of the shuffle after this list)

  3. started writing my usual “lost dog, responds to ocean” labels by hand, but @ariashark reminded me we have a label printer now, so i redid them (and then redid ocean4x1 again, because it needs to be ocean4x2 now)

  4. installed! define 7 xl now at 14 disks and two ssds :3
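
for the curious, the interleave in step 2 is roughly the following shuffle. device ids are made up and this is a sketch of the idea, not the exact commands i ran:

# grow the newest existing mirror into a three-way mirror with one new disk,
# wait for the resilver, then split the spare old disk off into a new mirror
# with the other new disk, so each mirror ends up with one old and one new disk
zpool attach ocean ata-WDC_WD80EFZZ_old0 ata-WDC_WD80EFPX_new0
zpool status ocean        # wait here until the resilver finishes
zpool detach ocean ata-WDC_WD80EFZZ_old1
zpool add ocean mirror ata-WDC_WD80EFZZ_old1 ata-WDC_WD80EFPX_new1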

Plum (@plumpan) [archived]

I'm pretty sure at least two other people have made this chost but I'm going to make it too.

If you live in the US, you can get skylake generation quad core office PCs for the same price as a 4GB raspberry pi. When you include the cost of a quality power adapter for it, you start getting into kaby lake or even first gen ryzen systems. Most of these come with 8GB of ram minimum, sometimes 16. They can be upgraded. You can get 1L PCs if size matters, or you can get SFF or Mini Desktop systems if you need expansion. They are way, way cheaper than raspberry pis when you start wanting to attach non USB IO.

They do not have GPIO. They use a lot more electricity. They're not the ideal choice for every situation, and pricing can change a LOT if you live outside of the US. But they are basically scrap on the way to the landfill that can still do a ton of stuff, and do those things for many years to come.

ruby (@srxl) [archived]

say hi to the Gemstone Labs

a bunch of mini desktops (3 hp elitedesks, 1 dell optiplex) sitting on top of a mikrotik 24 port switch, all under an ikea lack table

they're all haswell boxes, and they've got more than enough power to host all my shit. if you've ever talked to me on matrix, visited my website, or seen one of my chosts with a funky embed (like this one), you've spoken with them before

not pictured: the way too overkill ryzen build NAS providing storage for them

shuppy (@delan) [archived]

hp sff gang

two hp sff computers on a shelf, next to some ethernet switches
one hp mini-sff computer on a bunnings plastic trestle table, next to our 3d printer

from left to right:

  • 0 AUD, jane (successor to daria), our opnsense home router, intel 4th gen
  • 71 AUD, tol, our plex server, intel 6th gen (with hardware video encoding!)
  • 100 AUD, smol, our 3d printer server, intel 6th gen

dozens of these things go for <200 AUD every other month at my local auction house. they’re quiet, they’re fast, and they don’t use a ton of power. would recommend.

not pictured: our big nas with 14 drives on the floor

[…] there are literally hundreds of thousands, if not millions of these things that you can get for under a hundred bucks, and in 15 years they will still be stacked to the rafters in ebay seller warehouses. the scale of waste in enterprise computing is literally inconceivable, it is beyond the ability of the human mind to comprehend just how many phenomenally good computers are thrown out every single day.
gravis

--i-am-a-cron-job-fuck-me-up-and-delete-without-asking

screenshot of github commit titled “zfs-sync-snapshots: rename --delete-yes and update sync scripts”
diff of the script, showing a “--delete” option accepting values “none”, “old”, “all”, or “this”, a mandatory “dry” or “wet” argument, and a “--delete-yes” option being renamed to “--i-am-a-cron-job-fuck-me-up-and-delete-without-asking”
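
(for the record, the interface in that diff is roughly the following shape. the option names come from the screenshot; the rest is a sketch of how a script like that might be wired up, not the actual zfs-sync-snapshots)

# parse a --delete policy, a mandatory dry/wet argument, and one very loud flag
delete=none cron=no mode=
for arg; do
    case $arg in
        --delete=none|--delete=old|--delete=all|--delete=this) delete=${arg#*=} ;;
        --i-am-a-cron-job-fuck-me-up-and-delete-without-asking) cron=yes ;;
        dry|wet) mode=$arg ;;
        *) echo "unknown argument: $arg" >&2; exit 1 ;;
    esac
done
[ -n "$mode" ] || { echo "usage: $0 [--delete=none|old|all|this] <dry|wet>" >&2; exit 1; }

# ($delete decides which snapshots get fed to maybe_destroy; not shown here)
maybe_destroy() {
    [ "$mode" = wet ] || { echo "would destroy $1"; return; }
    if [ "$cron" = yes ]; then
        zfs destroy "$1"                               # cron path: no prompt
    else
        printf 'destroy %s? [y/N] ' "$1"; read -r yn   # interactive path: confirm each one
        [ "$yn" = y ] && zfs destroy "$1"
    fi
}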

[sheldon smith voice] shuppyco had multiple safeguards in place that could have prevented a data loss incident, such as a dry-and-wet-run system and a safer deletion interface that prompts the operator to confirm each snapshot slated for deletion.

the csb found that delan, the operator on shift at the time, systematically disabled each of those safeguards, citing the pedestrian and familiar nature of the task at hand. it said, “priming my incremental backups is simple, i’ve done this countless times!”

unfortunately, this time it was not simple.

the csb concludes that shuppyco should (a) make the consequences of disabling key data loss safeguards impossible for operators to miss, (b) design and implement a process safety management system,

oops i wrote my own zfs snapshot thinning

terminal showing shell script “zfs-thin-snapshots” on the left, and the result of running it against dataset “ocean/dump/jupiter/home” on the right, where daily snapshots are kept for a week, weekly snapshots are kept for a month, and monthly snapshots are kept thereafter
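
the retention policy in the screenshot boils down to something like this. sketch only (it just prints what it would do, and assumes GNU date), not the actual zfs-thin-snapshots:

# keep the oldest snapshot in each bucket: one per day for the last week,
# one per week for the last month, one per month thereafter
dataset=${1:?usage: thin-sketch <dataset>}
now=$(date -u +%s)
last_key=
zfs list -Hp -o name,creation -t snapshot -s creation -d 1 "$dataset" |
while read -r snap creation; do
    age_days=$(( (now - creation) / 86400 ))
    if   [ "$age_days" -le 7 ];  then key=$(date -u -d "@$creation" +day-%F)
    elif [ "$age_days" -le 31 ]; then key=$(date -u -d "@$creation" +week-%G-%V)
    else                              key=$(date -u -d "@$creation" +month-%Y-%m)
    fi
    if [ "$key" = "$last_key" ]; then
        echo "would destroy $snap"
    else
        echo "keeping       $snap ($key)"
        last_key=$key
    fi
done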

daily automated zfs backups

screenshot of discord channel with four messages, each with a log file attached:

“sync jupiter ok”
“sync colo ok”
“sync venus ok”
“sync jupiter failed! @everyone”

just set up daily automated zfs backups for three of my machines, including discord logging and ping on failure :D

(sauce)
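
the logging part is simpler than it sounds: run the sync, capture the log, throw it at a discord webhook. a rough sketch (webhook url, sync script name, and log path are all made up for illustration):

# report one host's sync result to a discord channel, pinging on failure
host=${1:?usage: sync-and-report <host>}
log=/tmp/sync-$host.log

if ./sync-one-host "$host" >"$log" 2>&1; then
    message="sync $host ok"
else
    message="sync $host failed! @everyone"
fi

# discord webhooks take multipart/form-data: a json payload plus attached files
curl -sS -F "payload_json={\"content\": \"$message\"}" \
     -F "files[0]=@$log" \
     "$DISCORD_WEBHOOK_URL"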

one weird trick to speed up your zfs metadata by 10x

venus, my home server, now with an intel 730 240GB in addition to the existing intel 730 480GB, so we can put the special vdev on redundant flash

the trick is adding a “special” vdev 🚄⚡

$ time du -sh /ocean/private
2.5T    /ocean/private
11:16.33 total

(send, add special, recv, remove l2arc, reboot)

$ time du -sh /ocean/private
2.5T    /ocean/private
1:17.16 total

https://forum.level1techs.com/t/zfs-metadata-special-device-z/159954
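
if you want to try it yourself, the command is roughly this shape (device paths made up). careful: the special vdev holds pool-critical metadata, so it needs to be at least as redundant as your data vdevs, and only newly written blocks land on it, hence the send/recv shuffle above:

# add a mirrored special (metadata) vdev to the pool
zpool add ocean special mirror \
    /dev/disk/by-id/ata-INTEL_SSDSC2BP240G4_example \
    /dev/disk/by-id/ata-INTEL_SSDSC2BP480G4_example

# optionally also send small file blocks to the special vdev, per dataset
zfs set special_small_blocks=16K ocean/private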

it’s colo time

screenshot of opacus and stratus side by side, with neofetch outputs on top, and below that, pings from my computer showing that they are both <1 ms away

18 months ago, i started hosting my websites and other public stuff out of my house. this largely went well, except for the residential power outages thing and the paying 420 aud/month for 1000/400 thing.

1RU with gigabit in a boorloo dc is like 110 aud/month. it’s colo time :)

here is what we’re gonna move the libvirt guests to.

1RU rackmount server on my floor, with the nextdc branded flash drive i used to install nixos

through some wild coincidence, the new server uses the same model mobo as my home server, and its previous life was in the same dc it’s destined for.

left: neofetch on colo, nixos 22.11, X10SLM+-LN4F, E3-1270 v3.
right: neofetch on venus, nixos 21.11, X10SLM+-LN4F, E3-1276 v3.

dress rehearsal of the migration.

first we take zfs snapshots on the old server (venus), and send them to the new server (colo). then on venus, we suspend the vm (stratus), take snapshots again, send them incrementally to colo, and kick off a live migration in libvirt. finally on colo, we resume stratus, and stratus makes the jump with under 30 seconds of downtime.

venus$ sudo zfs snapshot -r cuffs@$(date -u +%FT%RZ | tee /dev/stderr)
2023-07-01T07:49Z

venus$ virsh suspend stratus

venus$ sudo zfs snapshot -r cuffs@$(date -u +%FT%RZ | tee /dev/stderr)
2023-07-01T08:51Z

venus$ for i in cuffs/stratus.{vda,vdb}; do
> sudo zfs send -RvI @2023-07-01T07:49Z $i@2023-07-01T08:51Z |
> ssh colo sudo zfs recv -Fuv $i
> done

venus$ virsh migrate --verbose --live --persistent \
> stratus qemu+ssh://colo.daz.cat/system

colo$ virsh resume stratus

can you see where the old host got further away and the new host got closer?

virt-manager shows stratus is shutoff on venus, and running on colo. tty on stratus shows ping to venus jumping from 0.165 ms to 0.281 ms, and ping to colo dropping from 0.426 ms to 0.208 ms.