Being able to deliver reports down to domain level – and even the specific URL – is one of our top achievements. However, as more and more customers want this transparency, technical demands on our reporting platforms are increasing…a lot…
In short: Millions of daily impressions running on thousands of domains and URLs meant we needed to get our hands on equipment that didn’t exist. So we did something else…
Imagine a dump truck at warp speed
We needed both huge capacity and blistering performance – very much like a dump truck at warp speed. But where do you acquire something like that? You buy both parts, hook them up, and tune them beyond belief.
Our setup consists of a lot of huge traditional hard drives. These are relatively slow, but good for the vast amount of reporting storage space we need. But – to force them into overdrive – we jacked them together in a MacGyver-esque manner with the R2-D2 of storage technology: solid state disks (all memory chips, no rotating magnetic disks, but small in capacity).
Result: we’ve entered a new era. Really, we’ve made one of those frog leaps that are rarely experienced in technology. In fact, we feel like we’ve traded in our birdman suit for a Concorde.
But – most importantly – we can say: “Mission Impossible: Accomplished”
The Nerdy Version (Be Warned – The Rest is for the Hawkings and Einsteins Out There)
What Are Solid State Drives?
The solid state drives act as a cache layer in front of the traditional rotating disks: whenever we want to store something on, or retrieve something from, the huge traditional disks, the cache tries to act as a fast middle man.
Say we want to store URL-level breakdowns after analyzing an hour’s worth of delivery traffic. The number of small data entries for URL-level reports is extreme, and suddenly writing all those nitty-gritty details would take “forever” with only slow traditional disks at hand – and would degrade the rest of the system for a long time.
Now, a “middle man”, a.k.a. the solid state disk cache, can receive the result magnitudes faster than before and let us continue working, offloading the huge storage disks by temporarily absorbing the surge. Multiple cache layers then bundle all the small write entries into big packages suitable for loading onto the slower storage disks (think trucks) – much more efficient than trying to load thousands of individual bits and pieces all at once.
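The write-side idea above can be sketched in a few lines of Python. This is purely illustrative – the class, names, and batch size are made up for the example, not our actual platform code – but it shows the principle: accept each tiny entry into a fast buffer, and only hand the slow tier big bundled batches.

```python
from collections import deque

class WriteBackCache:
    """Buffer small report entries in a fast tier (the 'middle man'),
    then flush them to slow bulk storage in large batches."""

    def __init__(self, flush_batch_size, bulk_store):
        self.buffer = deque()           # stands in for the SSD cache layer
        self.flush_batch_size = flush_batch_size
        self.bulk_store = bulk_store    # stands in for the slow spinning disks

    def write(self, entry):
        # Accepting the entry is fast: it only touches the cache layer.
        self.buffer.append(entry)
        if len(self.buffer) >= self.flush_batch_size:
            self.flush()

    def flush(self):
        # Bundle many small entries into one big sequential write --
        # the access pattern that slow rotating disks handle well.
        batch = [self.buffer.popleft() for _ in range(len(self.buffer))]
        if batch:
            self.bulk_store.append(batch)

bulk = []  # pretend this list is the spinning-disk tier
cache = WriteBackCache(flush_batch_size=1000, bulk_store=bulk)
for i in range(2500):
    cache.write({"url": f"example.com/page/{i}", "impressions": 1})
cache.flush()  # drain whatever is left in the buffer
```

The slow tier ends up receiving three large batches instead of 2,500 individual writes, which is exactly the truck-loading effect described above.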
When we want to retrieve the data again, to show detailed reports, the cache helps once more: if it already has some or all of the data stored, it delivers results magnitudes faster. A traditional disk might find entries at a speed of a few hundred per second, whereas we have seen our solid state cache peak at 150,000 per second! Simply amazing.
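The read side works the same way in miniature. Here is a hypothetical Python sketch (again, illustrative names only, not the real system): on a miss the cache falls back to the slow tier and remembers the result, so the next request for the same report is served from fast storage.

```python
class ReadCache:
    """Serve repeated reads from a fast tier; fall back to the slow tier
    on a miss and remember the result for next time."""

    def __init__(self, slow_lookup):
        self.slow_lookup = slow_lookup  # function simulating a disk read
        self.store = {}                 # stands in for the SSD cache
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.store:
            self.hits += 1              # fast path: served from cache
            return self.store[key]
        self.misses += 1                # slow path: rotating disk
        value = self.slow_lookup(key)
        self.store[key] = value         # keep it for next time
        return value

disk = {"report:urls:hour-14": ["example.com/a", "example.com/b"]}
cache = ReadCache(slow_lookup=disk.get)
cache.get("report:urls:hour-14")  # miss: goes to "disk"
cache.get("report:urls:hour-14")  # hit: served from the cache
```

The speedup quoted above (a few hundred lookups per second from disk versus 150,000 from the cache) is exactly the difference between the two paths in this sketch.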
The cache policies are not in themselves new at all, but layering them across multiple levels, at cache sizes of hundreds of GB, is a major breakthrough – one that lets us take reporting detail to a whole new level.