We use Amazon S3 to store our 80 terabytes of photos and videos. We like the service, and it works well. Yesterday, it went down for nearly 8 hours, and during that time we were mostly up. Cloud computing is all the rage, but sometimes the weather is really bad and you can’t see the clouds. We planned for that rainy day. So on a day when Amazon S3 was entirely down, I was at the pool, literally. Here is how we did it.
When users upload photos and videos, we first move them to our own servers. In the background, we then send the data to S3. If Amazon S3 goes down, we can buffer data for up to two days without users noticing. Buffering removes the requirement that Amazon S3 be up in real time for our users to upload data. We can’t buffer indefinitely, but we are betting that an Amazon S3 outage longer than two days is very rare. We always expected short outages to occur. In fact, this is not the first one.
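We don't describe our actual implementation here, but the idea is a write-behind buffer. Below is a minimal sketch, assuming a hypothetical `UploadBuffer` class with a byte-capacity limit and an injected `upload_fn` that stands in for the real S3 call (it is expected to raise when S3 is unreachable). Uploads succeed as soon as they are stored locally; a background pass copies them to S3 and keeps anything that fails for the next attempt.

```python
from collections import OrderedDict
from typing import Callable


class UploadBuffer:
    """Hypothetical write-behind buffer: accept uploads locally, push to S3 later."""

    def __init__(self, upload_fn: Callable[[str, bytes], None], capacity_bytes: int):
        # upload_fn is an assumed stand-in for a real S3 PUT; it should raise on failure.
        self.upload_fn = upload_fn
        self.capacity_bytes = capacity_bytes
        self.pending: "OrderedDict[str, bytes]" = OrderedDict()
        self.used_bytes = 0

    def accept(self, key: str, data: bytes) -> bool:
        """Store an upload locally; the user's request never waits on S3."""
        old_size = len(self.pending[key]) if key in self.pending else 0
        if self.used_bytes - old_size + len(data) > self.capacity_bytes:
            return False  # buffer full: only now would an S3 outage reach users
        self.pending[key] = data
        self.used_bytes += len(data) - old_size
        return True

    def drain(self) -> int:
        """Background pass: copy pending items to S3, oldest first."""
        sent = 0
        for key in list(self.pending):
            try:
                self.upload_fn(key, self.pending[key])
            except Exception:
                break  # S3 is likely down; keep everything and retry on the next pass
            self.used_bytes -= len(self.pending.pop(key))
            sent += 1
        return sent
```

The capacity limit is what turns "S3 is down" into "users notice": as long as the outage ends before the buffer fills, the only visible effect is that `drain()` returns 0 for a while.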
For serving photos and videos, we act as our own content distribution network (CDN) and cache the hot data. That means users can still view most recent photos and videos, including what was just uploaded, even while S3 is down.
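One way to picture the serving side is a read-through cache in front of S3. This is a minimal sketch, not our production code: it assumes a hypothetical `HotCache` with a least-recently-used eviction policy, a fixed item count, and an injected `origin_fetch` standing in for an S3 GET. New uploads are inserted with `put()` so they are hot from the start; a cold item can only be served while the origin is reachable.

```python
from collections import OrderedDict
from typing import Callable, Optional


class HotCache:
    """Hypothetical read-through LRU cache sitting in front of S3."""

    def __init__(self, origin_fetch: Callable[[str], bytes], max_items: int):
        # origin_fetch is an assumed stand-in for a real S3 GET; it raises on failure.
        self.origin_fetch = origin_fetch
        self.max_items = max_items
        self.items: "OrderedDict[str, bytes]" = OrderedDict()

    def put(self, key: str, data: bytes) -> None:
        """Insert or refresh an item, e.g. right after a user uploads it."""
        self.items[key] = data
        self.items.move_to_end(key)  # mark as most recently used
        while len(self.items) > self.max_items:
            self.items.popitem(last=False)  # evict the least recently used item

    def fetch(self, key: str) -> Optional[bytes]:
        """Serve from cache when possible; otherwise fall back to the origin."""
        if key in self.items:
            self.items.move_to_end(key)
            return self.items[key]
        try:
            data = self.origin_fetch(key)
        except Exception:
            return None  # origin down and item is cold: this request fails
        self.put(key, data)
        return data
```

During an outage, hits keep working and only misses for cold, older content fail, which matches being "mostly up" rather than fully up.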
All this caching and buffering is done outside of Amazon; we don’t use Amazon’s compute cloud (EC2) for it. We have considered moving more of our system to Amazon Web Services, but unfortunately EC2 was built to require S3 to be up in order to run: new instances are loaded from S3. So an S3 outage is correlated with an EC2 outage.
Photo and video sharing services that did not plan for S3 outages were completely down yesterday. We estimate that most of our cost savings come from outsourcing storage. While we could save some additional money by using EC2, the savings are not as dramatic as with S3. Hence, we will think carefully before we put all our eggs in that basket.