So I support an application that creates about 25GB of data, across 25,000 files, a day. Don’t ask me why the data isn’t in a database. In any event, we used to be on NT servers with SAN back-ends for file storage, but this was pretty slow for our purposes, so we switched to using a share on a NetApp filer. At first, NetApp was great, because we saw a 45% performance improvement over the NT/SAN solution … not to mention it was worlds more reliable.

Problem is, NetApp really isn’t built for people who do the sort of data flux that we do. Each day we create that much data, and move a comparable amount off to another filer to be archived (literally put into ZIP files on the other filer). This means that each day we’ve got 25GB of data being deleted and 25GB being created. NetApp has this lovely feature called snapshots. It’s really cool in principle, in that you have instant access to previous versions of your files as they existed at some point in the past. Say you toast your home directory with rm -rf. No problem. Just go into .snapshot and look at the snapshot of your home directory from an hour ago; it’s as if nothing ever happened!
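
To make that concrete, here’s a minimal sketch of what that recovery looks like, assuming the filer exposes a read-only .snapshot directory with an hourly.0 snapshot in it (snapshot names depend on how the filer is configured, and the paths and file below are made up for illustration):

```python
import shutil
from pathlib import Path

# Hypothetical mount point; the .snapshot directory on the share is
# read-only and holds point-in-time copies of the tree beneath it.
SHARE = Path("/mnt/filer/home")
SNAPSHOT = SHARE / ".snapshot" / "hourly.0"  # most recent hourly snapshot

def restore(relative_path: str) -> None:
    """Copy one file from the latest hourly snapshot back into the live share."""
    source = SNAPSHOT / relative_path
    target = SHARE / relative_path
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, target)

if __name__ == "__main__":
    restore("projects/notes.txt")  # made-up file, purely for illustration
```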

Problem is, snapshots take space, and if you’re snapshotting a 250GB-per-week data flux, that’s a lot of data to retain in order to keep a month of snapshots on even a 300GB share, even if the filer is being very clever about storing only deltas and not duplicating data. Suddenly our 300GB share needs something like a TB or two to maintain its snapshots. Of course, there are other shares on the volume, which doesn’t help things. In fact, it creates a rather interesting situation. First off, the available space for all of the shares on the volume starts mysteriously draining. While you may only be using 10GB of your 300GB, you might see something strange, like only 20GB available, because the filer is dipping into the allocated share space to cover the overage in snapshot utilization. Ergo, our application is screwing everybody else on the filer … we got nailed the other day because our archive filer’s usage (several TB, but a similar data flux pattern) was cocking up a whole bunch of home directories by filling them up; numerous Windows and Unix users had zero disk space for a while as a result.
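
Where does “a TB or two” come from? Here’s the back-of-the-envelope version, under the simplifying assumption that every gigabyte we delete stays pinned in snapshots until the month-old snapshot referencing it finally expires:

```python
# Rough estimate of why a 300GB share balloons once you keep a month of
# snapshots. Assumes deleted data is retained until its snapshot expires;
# the numbers are the ones from this post, nothing measured.
DAILY_DELETED_GB = 25   # yesterday's batch, zipped off to the archive filer
DAILY_CREATED_GB = 25   # the new day's files
RETENTION_DAYS = 30
SHARE_SIZE_GB = 300

# Every deleted gigabyte lingers in snapshots for the retention window.
pinned_by_deletes_gb = DAILY_DELETED_GB * RETENTION_DAYS                # 750
# If churn from creation and renames is captured too, roughly double that.
worst_case_gb = (DAILY_DELETED_GB + DAILY_CREATED_GB) * RETENTION_DAYS  # 1500

print(f"live share:        {SHARE_SIZE_GB}GB")
print(f"snapshot overhead: ~{pinned_by_deletes_gb}GB to ~{worst_case_gb}GB")
```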

Obviously this is a bit of a problem. Now, I’m not going to say this is a problem with NetApp filers; it may be a configuration issue, and partly just the cost of using a shared server in a large corporation … but most shops can’t justify dedicating a top-of-the-line filer and volume to a single group, unless I’m missing something.

Anyhow, that’s just a small problem. The bigger problem relates to deleting all of that data. Quite frankly, from Windows, it’s impossible to reliably delete large folders of data; the deletes keep failing. How on earth does a product get released that suffers from this sort of problem? I really love the idea of the filer, and a lot of the features are great, but this is like buying a Japanese stereo instead of British hi-fi: too many buttons to push, and pure crap coming through the speakers.
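
If you’re stuck doing the same thing, something like the retry loop below is the obvious brute-force coping mechanism. It’s just a sketch, assuming the failures are transient (open handles, hiccups on the share) rather than anything permanent; the path and retry parameters are made up:

```python
import shutil
import time
from pathlib import Path

def stubborn_rmtree(root: str, attempts: int = 5, delay_s: float = 10.0) -> None:
    """Delete a directory tree on a share, retrying when deletes fail partway.

    Purely illustrative: the retry count and delay are arbitrary, and it
    assumes whatever blocked the delete will eventually clear on its own.
    """
    path = Path(root)
    for remaining in range(attempts - 1, -1, -1):
        # Sweep everything we can on this pass; survivors get another chance.
        shutil.rmtree(path, ignore_errors=True)
        if not path.exists():
            return
        if remaining:
            time.sleep(delay_s)
    raise OSError(f"could not fully delete {path} after {attempts} attempts")

if __name__ == "__main__":
    stubborn_rmtree(r"\\filer\share\old-data")  # hypothetical UNC path
```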