George V. Reilly

The TerraServer

Unloading ESA 10000 Storage Subsystem Cabinet

I bought a 2TB external hard disk today. It’s about the size of a deck of cards, but thinner, and it cost me $95. Disk Utility says it has 2,000,397,884,928 bytes.

In 1998, I got to see more than three terabytes of disks in one system. At the time, a server with a 25GB disk was considered high capacity. The 3TB system occupied considerably more volume than a pack of cards. I don’t know what it cost but clearly it was tens if not hundreds of thousands of dollars.

At that time, I was a developer on the IIS team at Microsoft, and the lead performance engineer for IIS, Microsoft’s web server product. The TerraServer was a project born in Microsoft Research, intended to demonstrate that SQL Server could scale up to terabyte-sized workloads. It stored aerial, satellite, and topographic images of the Earth in a SQL database available via the Internet, and it was the world’s largest online atlas.

They launched the TerraServer on the Internet. And the Internet liked the TerraServer. And the TerraServer fell over. So I got called in.

It was June and I was wearing shorts and Birkenstock sandals with no socks. The TerraServer was housed in one of Microsoft’s data centers and there was cold air blowing up through the floor, and quite soon I was wishing that I had worn socks that day.

The TerraServer was the biggest microcomputer-based system I had ever seen. Quoting the TerraServer Tech Report:

The web site has eight Windows NT servers — 6 web servers and 2 database servers. The USGS aerial imagery is maintained on a Compaq AlphaServer™ 8400 containing 8 440 Mhz Alpha processors and 10 GB of RAM. The processor is attached to 7 StorageWorks™ Enterprise Storage Array 10000 (ESA-10000) cabinets. The disk arrays are based on UltraSCSI technology.

Each ESA-10000 contains 48 9 GB disk drives and 2 HSZ70 dual-redundant RAID-5 controllers. 4 sets of 11 disks each are configured into a single RAID-5 stripe-set and managed as a single logical disk by the HSZ70 controller. 2 drives per cabinet are available as hot spares. Should a disk fail, the HSZ70 controllers automatically swap a spare drive into a RAID set.

Windows NT Server sees each large (85 GB each) disk created by the RAID controllers of each of the seven disk cabinets. It stripes these into 4 large (595 GB) volumes which are then each formatted and managed by the Windows NT file system (NTFS). Each 595 GB volume contains about thirty 20GB files. SQL Server stores its databases in these large files. We chose this 20GB file size since it fits conveniently on one DLT magnetic tape cartridge.

Connected to the AlphaServer 8400 is a StorageTek 9710 automated tape robot. The tape robot contains 10 Quantum DLT7000 tape drives. Legato Networker backup software can backup the entire 1.5 TB TerraServer SQL database to the StorageTek tape robot in 7 hours and 15 minutes — or 17 GB/hour.

Database back-end: 1 8-way 440Mhz Compaq AlphaServer 8400, 10GB ram, 3.2 TB raid5 324 9GB Ultra SCSI disks

So, 7 cabinets (pictured above), each measuring 2’ wide by 3’ deep and 5’ tall, housing 324 9GB disks between them. And another cabinet for the servers. And one for the tape robot. The whole thing must have been about 20 feet wide.
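As a rough check on those numbers, here is a minimal Python sketch of the capacity arithmetic. The figures come from the quoted report; the variable names are mine. The assumption that each NT volume is built from one 85 GB logical disk per cabinet is my reading of the report rather than something it states outright, and I'm not trying to reconcile every figure (the report mentions both 48 disks per cabinet and 324 disks in total).

```python
# Figures quoted from the TerraServer Tech Report; names are illustrative.
CABINETS = 7
DISKS_PER_CABINET = 48
DISK_GB = 9
DISKS_PER_RAID_SET = 11
LOGICAL_DISK_GB = 85   # each RAID-5 logical disk, as seen by Windows NT
VOLUMES = 4
TOTAL_DISKS_QUOTED = 324

# RAID-5 gives up one disk's worth of space per set to parity,
# so an 11-disk set of 9 GB drives holds at most 10 * 9 GB of data.
raid_set_data_gb = (DISKS_PER_RAID_SET - 1) * DISK_GB

# Assumption: each NT volume stripes one logical disk from each cabinet.
volume_gb = CABINETS * LOGICAL_DISK_GB
total_volume_gb = VOLUMES * volume_gb

# Raw capacity, counted two ways.
raw_gb_by_slots = CABINETS * DISKS_PER_CABINET * DISK_GB
raw_gb_by_quoted_disks = TOTAL_DISKS_QUOTED * DISK_GB

print(f"Data space per RAID-5 set: {raid_set_data_gb} GB (report: {LOGICAL_DISK_GB} GB)")
print(f"Per NT volume:             {volume_gb} GB")
print(f"All {VOLUMES} volumes:             {total_volume_gb} GB")
print(f"Raw, 7 x 48 disks:         {raw_gb_by_slots} GB")
print(f"Raw, 324 disks:            {raw_gb_by_quoted_disks} GB")
```

The 7 x 85 GB product lands exactly on the report's 595 GB per volume, which is why I read the striping that way. Either way, the whole array held roughly as much as the little 2TB disk on my desk.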

By comparison, the 15" mid-2012 MacBook Pro laptop that I’m writing this on has 16GB of RAM, a 2.3 GHz Intel Core i7 (4 hyperthreaded cores), a 750GB internal hard disk, and the aforementioned 2TB external disk. I’ve been backing up to that external disk and I’ve been hitting 90MB/s transfer rates. Moore’s Law marches on.
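To put the two backup speeds side by side, here is a small Python sketch. It assumes decimal units (1 TB = 1000 GB, 1 GB = 1000 MB) and that my 90MB/s could be sustained for hours, which is optimistic. The function names are mine. I only use the report's database size and backup duration: dividing 1.5 TB by 7 hours and 15 minutes gives a rate far above the 17 GB/hour in the quote, so that figure may be a typo or may be measuring something else.

```python
# Assumes decimal units: 1 TB = 1000 GB, 1 GB = 1000 MB.

def gb_per_hour(total_gb: float, hours: float) -> float:
    """Average throughput for moving total_gb in the given number of hours."""
    return total_gb / hours

def mb_per_s_to_gb_per_hour(mb_per_s: float) -> float:
    """Convert a sustained MB/s rate to GB/hour."""
    return mb_per_s * 3600 / 1000

DATABASE_GB = 1500           # 1.5 TB TerraServer SQL database
BACKUP_HOURS = 7 + 15 / 60   # 7 hours and 15 minutes

terraserver_rate = gb_per_hour(DATABASE_GB, BACKUP_HOURS)
laptop_rate = mb_per_s_to_gb_per_hour(90)
laptop_hours = DATABASE_GB / laptop_rate

print(f"TerraServer tape backup: {terraserver_rate:.0f} GB/hour")  # 207 GB/hour
print(f"Laptop to external disk: {laptop_rate:.0f} GB/hour")       # 324 GB/hour
print(f"1.5 TB at laptop rate:   {laptop_hours:.1f} hours")        # 4.6 hours
```

So one USB disk on a laptop would, on paper, move the same 1.5 TB faster than ten DLT7000 drives working in parallel did in 1998.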

At this point, I can’t recall what I actually did to help them out, but between me and other people, they got it working satisfactorily within a few days.
