I once worked for a large telco, and we wanted to run an Elasticsearch cluster to do some data analysis. Unfortunately, we worked in the wrong part of the organisation so we weren't allowed to buy servers :(
However, we knew someone in the building opposite who was responsible for the company's streaming TV service. They were retiring a whole bunch of CDN machines, similar to these, which had a crazy amount of storage per node.
While we weren't allowed to buy servers, we were allowed to buy components, so I got a bunch of the retired servers and bought enough RAM to max them out.
Made a lovely ES cluster :)
Except one story is about people doing things legitimately to get things done within the constraints of their corporate overlords while the other is about a thief ripping people off.
Well they are "just" HD44780-driven LCD displays (which are sold on eBay pretty cheap these days). Back in the day when every machine had a parallel port we used to just drive them directly, bit-banging out what we wanted. These days you can just slap an Arduino in between one and a USB port and the job's pretty much done.
Haha, yeah, I have fond memories of these displays. Got my hands on one by accident as a kid, but didn't have any software to drive it, so I wrote me some software to drive those on the parallel port for displaying CPU and RAM usage, number of mails in inbox, fan speeds and stuff. That program even gained a certain popularity in the case-modding scene; it was a hideous pile of crappy Visual Basic 4.0 code called "just another LCD software", or "jaLCDs".
That triggered a memory of when I decided to put a mini-tower PC in my trunk to play MP3s, and bought one of these displays off eBay so I could see a very basic menu and some song info. 4x40 characters of greatness propped up in the empty bay of my Camry. Those were the days.
I remember these units too. Back in 2002-2005, I worked for a small system integrator. We built turn-key appliances for software vendors. Firewalls, security appliances, etc. We'd work with chassis manufacturers to add cutouts for these units on the front of 1U and 2U servers to provide for an interface without needing to hook up a console cart. Same as you might see on a Dell or HP.
I remember these displays too, for fan control and stuff right? There used to be a website of basically show and tell of tower machines with cool upgrades. All pre RGB, but there were plenty of lights fans switches, all of it. It had to be popular around 2000/2001. Anyone remember the site?
I've used those for many hacks. To this day there is one on my desk, with the arrow keys adjusting audio volume, with 'tick' and 'x' keys for mute/unmute. The display shows date/time/audio volume and email counters, with a 60s blip to the name of the person calling when the phone rings.
The Pertelian? I still have mine and came across the printed instruction set just the other day. Fun times. I've thought about trying to get it to work in the modern era, but there are small and info-dense screens these days.
It's one of the earliest /. era memes that I remember, along with "year of the Linux desktop", "Netcraft confirms", and "in Russia X does YOU". But I think all those are from the same era, along with "email is for old people" and others I don't remember now.
Beowulfs were huge hacks back in the day, but are similar to most super computers. The main difference from how many think of compute clusters these days is it is managed as a single machine, so historically needed identical hardware. Parallel processing and concurrency used to be more rare :D
I think the main distinction is that modern clusters are shiny, sleek, fancy, and expensive.
Whereas a Beowulf cluster is grungy, full of bailing twine, chewing gum, and possessing high levels of Bodge. And dirt cheap, built with what you can get, often used.
It's a cluster, made out of commodity computers and network equipment, that wasn't designed specifically to be part of a cluster.
"Beowulf" was just the name of an early machine built that way at NASA in the 90s. The approach is not unusual now; many "low-end" supercomputers are just a few racks of commodity servers. But it was very unconventional in the early 90s, which is why the Beowulf term existed, to distinguish it from "real" clusters.
Hardware Ethernet switches -- able to stream from port to port at full speed -- were the revolutionary component. Before switched Ethernet, clusters required specialized hardware implementing a crossbar-like switched fabric so that all the nodes could communicate with each other at high speed, with a minimum of hops. These were solutions for clustering mainframes and minicomputers, horribly expensive and proprietary.
The defining attribute of original Beowulf clusters that made them interesting was the use of channel bonding to get more bandwidth than a single NIC could provide cheaply in those days.
At the same time, it demonstrated you could build a useful and inexpensive cluster using white-box linux desktop machines sitting on wire shelving. I did exactly the same thing at the time, except for the channel bonding, which would have helped my cluster scaling significantly, but I didn't have any budget left over for more NICs and since I was using a hub, there would have been lots of collisions anyway.
It'd be great to go back in time to 2001 and tell Slashdot: I am from the future. Linux runs on a billion devices more powerful than any desktop computer it currently runs on. A quarter of the people in the world use Linux daily. However, Linux still never had its year of the desktop.
Run SMART, check the stats, make sure the grown defect list is 0, make sure it's never been over-temp, run a SMART long test, run bad blocks, run another SMART long test. If it passes all those I'd be fine continuing to use it.
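The vetting sequence above can be sketched as the shell commands one might run; this is my own reading of the steps, and `/dev/sdX` is a placeholder device name, not anything from the original post.

```python
# Sketch of the drive-vetting sequence described above, expressed as the
# shell commands one might run. /dev/sdX is a placeholder device name.
DEVICE = "/dev/sdX"

checks = [
    f"smartctl -a {DEVICE}",           # dump SMART attributes: grown defect list, max temperature
    f"smartctl -t long {DEVICE}",      # kick off a SMART long (extended) self-test
    f"badblocks -sv {DEVICE}",         # read-only surface scan of every sector
    f"smartctl -t long {DEVICE}",      # second long test after the surface scan
    f"smartctl -l selftest {DEVICE}",  # review the self-test log: everything should pass
]

for cmd in checks:
    print(cmd)
```

Note `badblocks` defaults to a non-destructive read-only scan; the destructive write test (`-w`) would wipe the drive.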
The HUH728080ALE600 drives in that server[1] idle at 5.1 watts[2], so it's 184 watts just on the drives. I'm guessing at idle the server runs around 300 watts. Which is not great, but it's not terrible. I pay ~ $1/watt/year in NC, so running that server 24x7 would cost me $300 annually.
If it were me I'd disconnect at least half the drives to keep as spares.
Cold spares can sometimes disappoint. Lots of techs preferred hot spares, not just for immediate availability for the rebuild, but because the spin-up was just as likely to kill the drive as any other failure mode. With hot spares, you could rotate the drives by migrating data around, like the old farmers partitioning their fields to let sections rest.
I use HGST / old Hitachi drives in most of my servers, typically either 2 TB or 3 TB, though I do have some older 1 TBs still kicking (over 10 years, running 24x7.) In all the years, I've only had 1 fail. I just checked and one of them has over 91000 hours!
I would love to have this. Old disks are no problem. They will most likely continue to run fine until they're no longer worth it, due to either insane electricity prices or 50TB hard drives becoming available for a bargain.
My backup server is still running 500GB hard drives that are almost 17-18 years old. They will be retired soon though, as they consume too much power and lack spindown when not in use.
I think there's a difference between the wear and tear in a personal backup server and in a Netflix cache server. We manage several storage servers which see intensive usage, and usually at around 5 years we start to see disk failures.
True - but presumably they've been spinning and doing seeks pretty much 24/7. It's not like backblaze/coldline where the drives are practically powered down most of the time.
And of course, 36x the drives means 36x the drive failures - and even if you avoid losing data, you've still got the chore of swapping each failed drive.
> And of course, 36x the drives means 36x the drive failures
I think you must mean 36x the chance of drive failure. Drive failure probability per year is published at around 1%, though practically it can be as high as 10%. 36 drives means there's somewhere between a 30.35% (1 - 0.99^36) and 97.7% (1 - 0.9^36) chance per year of at least one drive failing.
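The arithmetic above is the standard complement calculation, assuming independent failures:

```python
# Probability that at least one of n drives fails in a year,
# given a per-drive annual failure rate p (independence assumed).
def p_any_failure(n: int, p: float) -> float:
    return 1 - (1 - p) ** n

print(f"{p_any_failure(36, 0.01):.4f}")  # ≈ 0.3036 at a 1% annual failure rate
print(f"{p_any_failure(36, 0.10):.4f}")  # ≈ 0.9775 at a 10% annual failure rate
```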
When you hit hundreds of concurrent operations there is no such thing as sequential. Also there is no need for sequential reads because a stripe of 7-15 disks would give you much more than streaming bitrate needs.
>We manage several storage servers which see intensive usage, and usually at around 5 years we start to see disk failures.
Which raises the question of how old the drives actually are. If intensive usage resulted in lots of disk failures after 5 years, then a 9-year-old server would surely have a bunch of new(er) disks inside it? Or do they just reduce the amount of cache space they have with every failure?
Of course, the timing depends on the usage pattern. In our case we write a lot to the disks, which means they fail faster. Time to failure also depends on manufacturer, batch, server conditions... But in general I wouldn't trust too much drives that have been continuously running under a significant load for so many years (I don't think a Netflix cache server had a lot of idle time).
Yep. In a previous iteration of my file server I killed a WD green drive in about two years because I unwittingly left the head park feature on. It was parking after 8 seconds of inactivity, and in two years it had accumulated like 2.5 million parks.
Not really. With a minimum of 50,000 cycles, you can spin up and down 28 times every day for 5 years. It may not sound like a lot, but in reality it is. I have my NAS set to spin down after 15 minutes of idle time and average around 1000 cycles every year.
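A quick sanity check of that cycle budget:

```python
# A 50,000-cycle rating spread over 5 years allows roughly 27-28
# start/stop cycles per day, matching the figure above.
rated_cycles = 50_000
years = 5
cycles_per_day = rated_cycles / (years * 365)
print(f"{cycles_per_day:.1f}")  # ≈ 27.4 cycles/day
```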
>or 50TB hard drives becomes available for a bargain.
Maybe my old age is showing, but this just frightens me to no end. One 50TB drive failure, and you've lost 50TB. Build an array of smaller drives to get to 50TB, and it's much less of a catastrophe if one drive dies.
With enough redundancy anything can be made reliable. ZFS can do some crazy RAID levels that will pretty much make anything reliable given enough disks and CPU resources. Whether it's a good idea (especially in terms of power usage) is a good question though.
You'd think so, but failing disks in raidz arrays can cause extreme performance problems, and failing disk electronics can cause lockups on SAS/SATA controllers.
At a job I had long ago, embedding an interactive 3d model of a laser scan diffed with a CAD model was one of the products of my work. It was probably the best and easiest way to get that information to other people because otherwise we would have had to find some other way for people to view point clouds with color embedded.
IIRC Altium Designer can embed a complete PCB with its components rendered in 3D. Pretty useful to be able to see the model without having to open the software, or even having it installed at all.
I can totally see it happening where someone hits print for the color printer on the other side of the office expecting 4 or 5 pages, but upon arriving at the printer after a short walk sees a stack of pages yet it still hasn't finished.
HR later sends out an email regarding the printing of PDFs unnecessarily. Think of the trees that could be saved by not printing.
We had one of these colocated at the rural state university I used to work at. Saved us tons of bandwidth not having tens of thousands of undergrads need to get their 1s and 0s from further afield. Probably improved typical download speeds for everyone upstream of us too.
I would finally be able to download all the Assetto Corsa and rFactor mods I want, and seed those things where it's just me and a couple other dudes so bailing feels bad, with still some storage left for peace of mind.
>They were also mad that one of their distribution partners last week leaked the finale of House of the Dragon early
I used to work for one of the companies that supported studios getting their content to streamers. This specific story is related to iTunes, but it's easy to do for any of them. The early days of "ramping up" meant putting butts in seats to do the work before automation was in place. The push to automate shot to highest priority when one of the butts in seats incorrectly copy/pasted from one spreadsheet to the next, which allowed an episode to be downloaded before it aired by anyone that subscribed to that season on iTunes.
The fallout from that was incredible, but credit to the company: they did not fire the employee for making a human mistake.
If you look into the deployment guides (.. I was a little bored during lunch, lol. Also high scale tech is cool.) they preload content based on where the device is going to be deployed before shipping them out.
From there they have a "fill" window each day during the ISP's low traffic period where Netflix pushes new content to it.
> Each Open Connect Appliance (OCA) stores a portion of the Netflix catalog, which in general is less than the complete content library for a given region. Popularity changes, new titles that are added to the service, re-encoded movies, and routine software enhancements are all part of the nightly updates, or fill, that each appliance must download to remain current.
I mean honestly in 2022 their catalogue might fit entirely ( :P ) but yeah looks like it's just a portion, comprised of whatever's popular in the region
Even if the entire catalogue didn't fit on there, they mostly exist as edge caches, so stuffing the most popular/upcoming things on there and watching the hit rate is probably good enough to eliminate most of the needless transit.
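A toy illustration of why that works (this is not Netflix's actual fill logic; the catalog size, cache size, and Zipf-shaped popularity are all invented for the sketch): with heavily skewed popularity, caching a small top slice of the catalog absorbs most requests.

```python
import random

# Toy model: 10,000 titles with Zipf-like popularity (weight ~ 1/rank),
# a cache holding only the top 5%, and 100,000 simulated requests.
random.seed(42)

catalog_size = 10_000
cache_size = 500                     # cache the 500 most popular titles
weights = [1 / (rank + 1) for rank in range(catalog_size)]

requests = random.choices(range(catalog_size), weights=weights, k=100_000)
hits = sum(1 for title in requests if title < cache_size)  # top ranks are cached
print(f"hit rate: {hits / len(requests):.2%}")
```

In this toy setup, caching 5% of the titles serves well over half the requests; real viewing distributions are even more concentrated around new releases.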
That and they had different tiers of hardware I believe.
You could fit 13,000 4K movies in 262TB. But Netflix will have versions for different resolutions/output devices pre-baked, so I reckon (I wouldn't want to gamble too much on this) that a single device wouldn't have everything.
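For what it's worth, the 13,000 figure implies roughly 20 GB per 4K title:

```python
# Implied size per title from the numbers above (decimal TB -> GB).
total_tb = 262
movies = 13_000
gb_per_movie = total_tb * 1000 / movies
print(f"{gb_per_movie:.1f} GB per movie")  # ≈ 20.2 GB
```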
Working in the other direction is that the region locking works by the region you're in, not where you opened the account. So an edge cache would presumably never have titles that are only available in different parts of the world.
For an edge caching node, it doesn't need to. The vast majority of traffic is for a relatively small subset of the catalog that is popular that week/day/hour. Cache misses are directed to more centralized nodes that do have the full catalog (though not necessarily all on one server).
Reminds me of when my dad acquired two decommissioned hard drives that were used to store seafloor scans for the Halifax harbour and surrounding areas.
From my own poor recollection, their dimensions were about 2' x 3' x 1.5' - and obviously quite heavy.
The re-branded Dell servers Google sent out as search indexers (I believe) were pretty common 10-15 years ago. I remember seeing two of them here in Sweden alone, during my career.
But those are not the caseless DIY shelf-servers they've displayed in various PR videos.
For those custom servers, I remember reading that Google did not bother unracking and disposing of them, but let the dead ones sit in the rack indefinitely.
They figured out that it was more efficient to have hardware techs spend 100% of their time provisioning new hardware, and just let their software detect the broken servers, power them down, and route the tasks to working servers.
At some point the hyperscalers end up decommissioning their servers because of perf/watt concerns. I don't think they bother with single failures for the reasons you've stated, but they do decommission full sections of DCs once the wattage calculus no longer makes sense.
I did! I worked at a university at the time and another department was decommissioning one (a GSA). My group was looking for hardware at the time, so we snagged it. Under the yellow paint it was a bog-standard Dell PowerEdge. I think I still have a ridiculous picture somewhere of my team and I hustling the thing out through the parking lot like we were making off with super-secret Google hardware.
I mean, I like to bash on Google too, but this is just not even logical.
They want the spyware to run on your hardware. They don't need it on their hardware. If there is spyware on Goog's hardware, I'd be looking at it coming from certain three letter agencies or other nation states.
No, these pre-date that performance work. The original 100G work was done on a Xeon E5-2697A with NVMe storage and no mechanical hard drives. I still use this class of machine as a benchmark to measure performance improvements.
> Interestingly, the now-defunct dial-up online service Prodigy used a local caching system to distribute data more efficiently using the same basic principle as Open Connect in the 1980s and '90s.