Tiered storage: use the right tool for the job

Hard drives are still the default choice for many homelab and data hoarding enthusiasts. They still hold the gigabytes-per-dollar advantage over SSDs (for now), and if you buy big external drives and take them out of their enclosures, you can get a pretty good deal.

Spinning rust has one obvious downside: it’s slow, both in maximum transfer speeds and in latency. For most use cases this is fine, but if you run a service that depends on a database or if you have a lot of clients relying on that storage, you’ll find that hard drives just won’t cut it.

At this point you probably don’t want to splurge and build an all-SSD NAS, so you look up caching solutions. Depending on your platform, you’ll have all sorts of options.

If you run ZFS, you’ll likely learn about L2ARC and why you probably don’t want to use it (hint: check your ZFS ARC hit/miss ratio first). You’ll also run into SLOG and the special metadata device. In most cases the top recommendation is to add as much RAM as you can to the system so that you can benefit from a bigger filesystem cache. You’ll see some improvements, but not all workloads will benefit from this and you might end up being disappointed after all that work.
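If you want to check that hit/miss ratio yourself, here is a minimal Python sketch. It assumes Linux with OpenZFS, where the ARC counters are exposed in /proc/spl/kstat/zfs/arcstats as rows of name, type and value. The function names are made up for this example, and other platforms will need a different source for the counters.

```python
from pathlib import Path

# Assumption: Linux with the OpenZFS kernel module loaded.
ARCSTATS_PATH = Path("/proc/spl/kstat/zfs/arcstats")


def parse_arcstats(text):
    """Parse 'name type data' rows into a dict of integer counters."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        # Counter rows have exactly three columns and a numeric last column;
        # this skips the two header lines at the top of the file.
        if len(parts) == 3 and parts[2].isdigit():
            stats[parts[0]] = int(parts[2])
    return stats


def arc_hit_ratio(stats):
    """Return hits / (hits + misses), or 0.0 if there were no lookups."""
    hits = stats.get("hits", 0)
    misses = stats.get("misses", 0)
    total = hits + misses
    return hits / total if total else 0.0


if __name__ == "__main__":
    if ARCSTATS_PATH.exists():
        ratio = arc_hit_ratio(parse_arcstats(ARCSTATS_PATH.read_text()))
        print(f"ARC hit ratio: {ratio:.1%}")
    else:
        print("arcstats not found; is this Linux with OpenZFS loaded?")
```

The counters are cumulative since the module was loaded, so let the system run under its normal workload for a while before reading them. If the ratio is already very high, there are few misses left for an L2ARC to absorb.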

I cannot speak for options on other platforms, but in most cases the idea is similar: buy a separate sacrificial SSD and use it as a cache drive.

Cache is (not) king

Take a moment and think about what type of data you’re storing.

Here’s an overview based on my own setup:

archived YouTube videos (bunch of big files)
cat pictures (bunch of small and big files)
services that utilize a database (bunch of smaller files)
various installation media (bunch of big files)
backups of physical disks (bunch of big files)
a copy of my Steam library (bunch of big and small files)
a web server (bunch of small files)

Most of the data I have is accessed relatively infrequently, and when it is, the performance requirements are not that high. Any hard-drive-based array will be able to handle streaming video or copying big bulky files over the network.

Data that is more latency-sensitive, such as that of Nextcloud, Jellyfin (and its SQLite database) and PostgreSQL, takes up a relatively small part of the overall storage.

In this case (and assuming that the setup allows for it physically), the solution is simple: add a smaller flash-based storage pool to the setup and use it for data that benefits from it.

I experimented with this kind of setup in the past, with two 8 TB hard drives holding all the big files and a pair of 250 GB SSDs handling Nextcloud and other services. That gave me the best of both worlds.

In short, you don’t need to mess around with a fancy caching solution if your storage usage patterns are simple and predictable. If you do a little bit of investigation once, you can bypass all the logic that is usually built into various caching solutions and get a better end result, since no caching solution is perfect.
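As a starting point for that investigation, here is a rough Python sketch that walks a directory tree and reports how many files and bytes fall on each side of a size cutoff. The 1 MiB cutoff and the function name are arbitrary choices for this example, not a rule. Keep in mind that file size is only a hint: how often and how randomly the data is accessed matters just as much.

```python
import os
import sys
from collections import Counter

# Assumption: 1 MiB is an arbitrary line between "small" and "large" files.
SMALL_FILE_LIMIT = 1024 * 1024


def profile_tree(root, small_limit=SMALL_FILE_LIMIT):
    """Walk root and count files and bytes, split into small and large."""
    profile = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat does not follow symlinks, so a link counts as itself.
                size = os.lstat(path).st_size
            except OSError:
                continue  # file vanished or is unreadable; skip it
            bucket = "small" if size < small_limit else "large"
            profile[f"{bucket}_files"] += 1
            profile[f"{bucket}_bytes"] += size
    return profile


if __name__ == "__main__":
    # Pass one or more directories on the command line.
    for root in sys.argv[1:]:
        p = profile_tree(root)
        print(
            f"{root}: {p['small_files']} small files ({p['small_bytes']} B), "
            f"{p['large_files']} large files ({p['large_bytes']} B)"
        )
```

Run it against each dataset (the video archive, the Nextcloud data directory and so on) and compare the results. Directories dominated by many small files and backed by a database are the likely candidates for the flash pool, and the byte totals tell you how big that pool needs to be.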