storageous

All things Storageous in Storage!



Acceptable Downtime in our World? Introducing “Chaos Monkeys”

I first read an article about this on GigaOM, and it really got me thinking about how companies approach downtime. All lines of business are different: some can accept downtime, others cannot. If you think of the services we depend on and use daily, such as Google, Amazon and eBay, we would not accept downtime from them. This is a rapid change of thinking. A few years ago we might have accepted it, but not now, not in our online 24/7/365 world, and as I always allude to, this is due to the user experience and choice that we as end users have.

An example of this is Google Mail, which went down between 8:45 AM PT and 9:13 AM PT while Google was upgrading some of its load balancing software. The new version turned out to be flawed, so they had to revert to the previous one. Now why would Google deploy code at that time? The answer is simple when you think about it: in the online world we now live in there is no acceptable window of downtime, so companies like this are constantly rolling out code upgrades to bring more benefits to end users and the business.

One particular section of the article intrigued me, which was how Netflix works; the full article can be found here. Netflix runs a service it created called “Chaos Monkeys”, which is an open invitation to break systems and cause downtime, because their philosophy is “The best defense against major unexpected failures is to fail often”. They learn from failures, and by doing this their systems become more resilient.

Netflix were quoted as saying: “Systems that contain and absorb many small failures without breaking and get more resilient over time are “anti fragile” as described in [Nassim] Taleb’s latest book,” explains Adrian Cockcroft of Netflix. “We run chaos monkeys and actively try to break our systems regularly so we find the weak spots. Most of the time our end users don’t notice the breakage we induce, and as a result we tend to survive large-scale outages better than more fragile services.”

So the Chaos Monkey seeks out and terminates instances of virtual machines (AWS in this case) on a schedule, typically during business hours when engineers are on hand to respond. This means that Netflix learn where their application is weak and can identify ways to keep the service running regardless of what goes down.
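To make the idea concrete, here is a minimal sketch of a scheduled "chaos" run in Python. It is not Netflix's code: the function names, the weekday 09:00 to 15:00 window, the termination probability and the `terminate` callback are all assumptions made up for illustration, and nothing here talks to AWS.

```python
import random
from datetime import datetime, time


def in_window(now, start=time(9, 0), end=time(15, 0)):
    """Return True if `now` is a weekday inside the (assumed) chaos window."""
    return now.weekday() < 5 and start <= now.time() < end


def pick_victims(groups, rng, probability):
    """For each service group, maybe choose one instance to terminate.

    Groups with a single instance are skipped so the sketch never
    removes the last copy of a service.
    """
    victims = {}
    for group, instances in groups.items():
        if len(instances) > 1 and rng.random() < probability:
            victims[group] = rng.choice(instances)
    return victims


def run_chaos(groups, now, rng, terminate, probability=0.5):
    """Terminate the chosen instances if we are inside the window."""
    if not in_window(now):
        return {}
    victims = pick_victims(groups, rng, probability)
    for group, instance in victims.items():
        terminate(instance)              # hypothetical hook, e.g. a cloud API call
        groups[group].remove(instance)   # keep our local view up to date
    return victims
```

In use you would pass a dictionary of service groups to instance IDs, the current time, a `random.Random` and whatever function really stops an instance. The interesting part is what happens afterwards: if users notice the missing instance, you have found a weak spot.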

The thing is, failures happen and everyone accepts this, but when you find out about your application's stability is critical. In the case of Netflix, the videos must keep on streaming!

I really like the philosophy of “Chaos Monkeys”. It has intrigued me because it is a different perspective from what I have seen and experienced with scheduled DR testing: what Netflix are essentially doing is constantly trying to bring down their own service.

This got me thinking about EMC VPLEX, which is designed to give you an active-active data center but, more importantly, gives you outage avoidance through means such as a stretched HA cluster spanning geographic distances. When I think of Netflix and in particular their infrastructure, if automated VMware High Availability restarted their services on the other cluster, the outage windows would be smaller, as only an application restart would be needed, and they could maintain online services while still hunting for errors.

Everything I seem to read these days is about availability, cloud, users and demand. VPLEX addresses this zero-tolerance approach to availability. I will post an article explaining more about VPLEX soon; in the meantime, have a look here.

So to sign off, enjoy your Xmas and New Year!

 

 



Twitter’s Blob store and Libcrunch – how it works

You may have read one of my previous posts, “Arming the cloud”, where I talked about why and how large cloud providers are using commodity hardware with intelligent APIs to separate the dumb data from the intelligent data and give us a better service. Well, in a world of distributed computing and networking you will probably not find anyone larger than Twitter.

To me and you, when we upload a photo to the cloud it is simply in the “cloud”. We do not care much about what goes on in the background; all we care about is how long it takes to upload or download. This has been Twitter’s challenge: how do they keep all this data synchronized around the world to meet our immediate demands? It is a common problem: how do large-scale web and cloud environments allow users from anywhere in the world to use a photo sharing service while overcoming latency, which ultimately boils down to me and you waiting for the service to work?

So Twitter announced a new photo sharing platform, but what I am going to look at is how the company manages software and infrastructure to enable this service. Here is what Twitter released yesterday:

“When a user tweets a photo, we send the photo off to one of a set of Blobstore front-end servers. The front-end understands where a given photo needs to be written, and forwards it on to the servers responsible for actually storing the data. These storage servers, which we call storage nodes, write the photo to a disk and then inform a Metadata store that the image has been written and instruct it to record the information required to retrieve the photo. This Metadata store, which is a non-relational key-value store cluster with automatic multi-DC synchronization capabilities, spans across all of Twitter’s data centers providing a consistent view of the data that is in Blob store.”
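The write path in that quote can be sketched in a few lines of Python. This is only my illustration of the flow (front-end, storage nodes, metadata store); the class names, the hash-based placement and the replica count are assumptions of mine and not Twitter's implementation.

```python
import hashlib


class MetadataStore:
    """Stand-in for the key-value store that records where each photo lives."""

    def __init__(self):
        self.locations = {}

    def record(self, blob_id, node_name):
        names = self.locations.setdefault(blob_id, [])
        if node_name not in names:
            names.append(node_name)


class StorageNode:
    """Writes the photo to 'disk', then informs the metadata store."""

    def __init__(self, name, metadata):
        self.name = name
        self.metadata = metadata
        self.disk = {}

    def write(self, blob_id, data):
        self.disk[blob_id] = data
        self.metadata.record(blob_id, self.name)


class FrontEnd:
    """Decides where a photo should be written and forwards it on."""

    def __init__(self, nodes, metadata, replicas=2):
        self.nodes = nodes
        self.by_name = {node.name: node for node in nodes}
        self.metadata = metadata
        self.replicas = replicas

    def _place(self, blob_id):
        # Assumed placement rule: start at a hash-derived node, take the next ones.
        start = int(blob_id, 16) % len(self.nodes)
        return [self.nodes[(start + i) % len(self.nodes)] for i in range(self.replicas)]

    def put(self, data):
        blob_id = hashlib.sha256(data).hexdigest()
        for node in self._place(blob_id):
            node.write(blob_id, data)
        return blob_id

    def get(self, blob_id):
        # Ask the metadata store where the photo is, then try each copy in turn.
        for name in self.metadata.locations[blob_id]:
            node = self.by_name[name]
            if blob_id in node.disk:
                return node.disk[blob_id]
        raise KeyError(blob_id)
```

Notice that `get` never scans the storage nodes: it asks the metadata store first, which is exactly why the speed and availability of that store matters so much.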

Sound familiar from what I was discussing in my previous posts? Of course it does. This is a classic example of commoditizing storage\compute\network hardware and having the software API intelligently manage the data.

So what you have to consider with a platform like Twitter is speed and cost. They want users to be able to see the tweet with the picture as soon as possible, but they have to be conscious of the cost of delivering this service. Twitter has many data centers with many resources, but the trade-off is always going to be cost.

The next element is reliability. How does Twitter ensure that your photos exist in multiple locations, but not so many that it costs Twitter too much? It also has to think about how and where it stores the information that indicates where the actual file exists (the metadata). Think about how many photos are uploaded to Twitter each day: that is a lot of metadata to store. What if the metadata lived on a single server and that server failed? You would lose all the metadata and the service would be unavailable. The original way of thinking to remedy this is to replicate the data, but that is costly and time-consuming to keep synchronized and, let's not forget, uses some serious space.

So Twitter introduced a library called “libcrunch”, and here is what they had to say about it:

“Libcrunch understands the various data placement rules such as rack-awareness, understands how to replicate the data in way that minimizes risk of data loss while also maximizing the throughput of data recovery, and attempts to minimize the amount of data that needs to be moved upon any change in the cluster topology (such as when nodes are added or removed).”
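To show what rack-aware placement with minimal data movement can look like, here is a small Python sketch. I do not know how libcrunch works internally, so I have swapped in a well-known technique, rendezvous (highest-random-weight) hashing, plus a "one replica per rack" rule. The function names and the example topology are hypothetical.

```python
import hashlib


def score(blob_id, node):
    """Deterministic pseudo-random weight for a (blob, node) pair."""
    digest = hashlib.sha256(f"{blob_id}:{node}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place(blob_id, topology, replicas=3):
    """Pick `replicas` nodes for a blob, never two in the same rack.

    `topology` maps node name -> rack name. Nodes are ranked by their
    score for this blob and taken in order, skipping racks already used.
    """
    ranked = sorted(topology, key=lambda node: score(blob_id, node), reverse=True)
    chosen, used_racks = [], set()
    for node in ranked:
        rack = topology[node]
        if rack not in used_racks:
            chosen.append(node)
            used_racks.add(rack)
        if len(chosen) == replicas:
            break
    return chosen
```

Because every blob ranks the nodes independently, adding a node only affects the blobs for which the new node scores highly enough to be chosen. Everything else stays where it is, which is the "minimize the amount of data that needs to be moved" property from the quote.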

Does that sound familiar again? This is the Atmos play from EMC, which uses intelligent APIs to manage all aspects of a piece of data. I referred to this last time as an “Object Store”, and the point is that the API itself understands what to do with a particular piece of data in terms of replication, security, encryption and protection. So we are no longer administering pools of storage; the API is self-managing, and in the case of Twitter you have to admit that this would be the only way of doing it.

So what does the infrastructure look like? They use cheap hard drives to store the actual files, and the metadata is served from EFDs (enterprise flash drives) for increased speed. Think of metadata as a search engine: it allows you to find articles related to a query very quickly rather than looking at the entire web.

So to sum this up, as we place more and more information into the cloud, which is a blend of distributed compute and network, locating information across it is becoming more difficult and slow. Thinking like this, with APIs controlling the data according to policies, is the right direction to take when using large cloud services.

If you are interested in looking at a cloud solution platform delivering intelligence like this, go to EMC Atmos.

 

 



How we Could be Changing the Habit of Facebook

Let’s roll back to 2004 when Facebook was first launched. Back then it was designed as a social media tool for colleges in the US, and it was only once Facebook started to gather momentum that people began to realize its potential. Just think of the last website you visited: did it have a “like” button for Facebook? Of course it did, they all do! Let’s not forget Microsoft tried this tactic to track people’s usage of websites and their bid was rejected, yet here we are in 2012 with Facebook being, in my view, enormously advanced in tracking what its members do, where they go and what they like. Integration with Facebook has simply become the norm.

Now it sounds like I am building up to deliver a crushing blow here, but I am a Facebook member and I get the social media revolution. I would even argue that a “Corporate Twitter” should replace email, which is aging and in my view a very old-school way of thinking. Firstly, let’s think of the power that Facebook has: when they do “Big Data analytics” they can gain so much information about you and your behavior. Social media is about our end-user experience, but it is just as much about potential buyers of Facebook's data knowing about your behavior.

One of my favourite examples of this is the “Nike” running app available on most phones. You get the “Nike” app and trackers for your shoes, and then you can compete with thousands of people around the world, posting your times to the boards and also to Facebook. Now I am a huge fan of this; in my eyes this is what technology should allow me to do, which is to reach out and do things I never could before. Okay, so let's consider the value of this data. To us it is fun and competitive, but behind the scenes it is big business: if your phone is tracking your position, distance and speed, then it is easy to see that they will be able to figure out your route and all the details about it.

So let's imagine you are a running shop owner and you want to open a new shop in Manhattan, NY. How valuable would the collated data above be to you? Imagine they could tell you how many people ran past a plot of land you were interested in, how many people were running in NY, where they stopped and so on. The answer is that it would be invaluable, as it gives you market data you could never have obtained before!

Now that to me is pretty cool. To give this its official name, it is Big Data, which in my view is analyzing information to find patterns and behavioral characteristics, and this is what large-scale computing is good at! How do you think Amazon knows what to suggest when they pop up that little ad saying “people who bought this also bought”? The market for this is huge at present and is only getting bigger as companies have to become smarter when tackling consumergeddon! We all have choice, and we all have so many resources at our fingertips. Big Data is what makes sense of our behaviour. It was not clear to me when it was first talked about, but now I see it being used everywhere, especially in social media.

So my very long point is that today we are creating so much data from social networking, locations, applications, online gaming... the list goes on and on. The one common theme, though, is that this is mostly mobile data.

The mobile market has developed so much. I remember when I first got a phone with a camera and, to be honest, I thought “Big deal”, but I did not grasp what was in front of me. The problem wasn’t the camera on the phone but rather the software underneath: the only ways I could show people a photo were to put it on my computer, send it by MMS (too pricey) or show them on my phone. Then Facebook came along and changed the game completely. My Facebook news feed is literally full of people posting photos of nights out, sightseeing, parties, what they are eating and all manner of other things. I certainly cannot recall ever having that much visual content available to me! The same can be said about Twitter.

So finally on to Instagram. I do not know what you have seen in the news, but as Facebook now owns Instagram they have amended their terms of service to include this: “To help us deliver interesting paid or sponsored content or promotions, you agree that a business may pay us to display your username, likeness, photos, in connection with paid or sponsored content or promotions, without any compensation to you”. Now alarm bells should be ringing. We all love and use Instagram; it’s an online photo collaboration tool with API tie-ins to Facebook and Twitter (although not any more). So imagine the content that is on there now! Huge amounts of value to be sold to someone somewhere.

So we have all taken photos, edited them and shared them, and now all our activity can be sold. This one feels different to me. Fair enough, with things like “Nike” above I can understand that businesses will want and pay for that information. But what these new terms do is sell content that you have created. I cannot be alone in thinking that this is not really what social media is all about; these could be personal photos.

So I threw this idea to my friend the other day: “Could the next age of social media be subscription based?” We both agreed that we are already there, just look at LinkedIn. We also bandied about the question “If Facebook was a subscription service, would you pay?” Now for me this is a difficult one. I started this post with Facebook’s inception in 2004; imagine if I had turned around to you in 2004 and said “One seventh of the world’s population will have a website page by 2012”. I don’t have to describe what you would have thought of me, yet here we sit with 1 billion users signed up to Facebook who log in several times a day and update\maintain their pages.

Now imagine if that was taken away from you and moved to a subscription. I would argue that people would pay for it, as Facebook is now a habit and we socialize through this medium. The other way of thinking is: if Facebook remained free but sold all the images that you post, do you think it would retain its members?

I think that we are at critical mass with social media at the moment, and I think we could soon see the horizon change. Companies such as App.net have sprung up offering a paid social media experience with none of the advertising, so social media has already tipped over the edge into a paid service.

What if I were to break the mold here and ask: 8 years from now, will we be saying “Remember that Facebook thing?!”

Just my thoughts….



ARMING THE CLOUD

The Cloud is Closer than you think

So the cloud is here, but are you moving with the times or are you behind in your thinking? It’s something people will never admit to, but the reality is becoming very apparent: SAN and NAS do not scale to large clouds such as Amazon or AT&T. So how do the big guns do cloud?

So let's take a service such as Amazon, which has one huge infrastructure spanning global data centers, with one huge flexible namespace that can grow with no complexity and minimal management cost. Amazon’s new offering, Amazon Glacier, was released back in September and now costs 1 penny per GB per month! How do you get costs down to that price point and still turn a profit? To give you some perspective on how big the Amazon infrastructure is, this year alone 260 billion objects were added! Imagine trying to manage that with traditional thinking such as silos of SAN and NAS storage. Amazon’s pricing has dropped 12 times this year alone, and they just undercut the market every time.

So let's look at their thinking. One thing that all the big cloud names have in common is that they do not use file systems for their cloud; this includes Facebook, Twitter, eBay, Amazon and YouTube. So why not? The answer is simple: cost and scale. These infrastructures are huge, and when these companies set about creating their clouds they wanted massive scale, tens if not hundreds of petabytes, with minimal growth disruption and management overhead.

So let's take a different angle for one moment. It was us, the end users, who created this price point, because technology is so readily available now. For example, if you have a credit card you can get a server and some storage from Amazon in a matter of minutes, and they are by no means the only ones doing this. The consumer market is so diverse now that if the price point is not right we just move on; it’s as simple as that. How many times have you checked the price of something on the internet while shopping in a store?

So back to the point. Let's use Amazon: they use an object-based API which incorporates security, encryption, billing and a policy engine, talking to commodity x86 servers and commodity storage. Hardware fails, and we accept that as everyone does, but the key is the software at the top layer. Object-based storage does not work like traditional file systems. It spans one single namespace, meaning you can geographically disperse your data centers and have one giant object store which replicates and protects your data according to the policies set. The simplest way of explaining this is Dropbox. We all love and use Dropbox, and it may surprise you to know that it too uses Amazon's philosophy as above. Policies are the key in this object-based world. Let's take your free 10GB subscription with Dropbox: as that is a “free” service, it is very unlikely that a copy of your data is made or that it is replicated or encrypted, and they do not guarantee it will always be there. But what if I pay the monthly fee? Well, then you would have a paid policy which would replicate your data to another data center, encrypt it and, more importantly, bill you on usage metrics such as bandwidth and space used.

Now this is the key component here: object storage is subject to policies. An object contains metadata and content, and the intelligent APIs look at the metadata and decide what to do with the data according to the policies set. This is key to understanding the management of the cloud. Let’s take eBay: how many photos do you think get uploaded to eBay every single minute? I imagine the number is huge, but how do you manage how long those photos stay there? Before policies, eBay had to run jobs to delete these photos every night, but there came a point where they were doing this constantly. With policies in the API they simply set one up to delete photos after 3 months have passed. It is as simple as that: all that management has gone and is automated.
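Here is a rough Python sketch of what "an object is metadata plus content, and a policy decides what happens to it" could look like. This is not the Atmos API. The policy names, the fields and the choice of 90 days to stand in for "3 months" are all assumptions of mine for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Policy:
    name: str
    replicas: int = 1
    encrypt: bool = False
    retention: Optional[timedelta] = None  # None means "keep forever"


@dataclass
class StoredObject:
    content: bytes
    metadata: dict
    created: datetime


# Hypothetical policies: a free tier, a paid tier and expiring listing photos.
POLICIES = {
    "free": Policy("free"),
    "paid": Policy("paid", replicas=2, encrypt=True),
    "listing-photo": Policy("listing-photo", replicas=2, retention=timedelta(days=90)),
}


def policy_for(obj):
    """The metadata, not the administrator, selects the policy."""
    return POLICIES[obj.metadata.get("policy", "free")]


def expire(store, now):
    """Delete every object whose policy retention period has passed."""
    removed = []
    for key, obj in list(store.items()):
        retention = policy_for(obj).retention
        if retention is not None and now - obj.created >= retention:
            del store[key]
            removed.append(key)
    return removed
```

The point of the design is that nobody writes a nightly delete job for a particular set of photos. You tag objects with a policy once, and a single generic sweep like `expire` applies it to everything, however many objects there are.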

The technology eBay uses for this is EMC Atmos, while Amazon runs its own object platform, S3, along the same lines. Atmos is an intelligent API with commodity hardware underneath, which defines it as a purpose-built cloud platform giving you up to 1.3PB per floor tile. Atmos allows you to easily scale your cloud over geographic distances, as it acts as one great big storage pool with one namespace. The API abstraction layer takes care of all the storage calls, so developers who are paving the way in WAN-friendly, browser-based applications do not have to care what goes on below the software layer. Atmos takes care of all this. So let's imagine you have 5 data centers globally, all connected, with your objects behaving according to your policies (security, replication etc.), your end users billed automatically on that basis, and nothing you have to back up. Isn’t that the way to do things? Just imagine trying to back up Amazon's cloud... no thanks.

As the intelligence and resilience are built into an object store, you can lose multiple drives or nodes and your service does not go down. People such as Amazon and eBay accept that hardware eventually fails, so they just stockpile spares, and when drives fail they replace them eventually, as it is not critical. Do you remember the last time eBay went down? Probably not, and there is good reason for this: if eBay went down it would cost eBay $3,900 per second!

So EMC Atmos is arming the cloud, and the service providers are monetizing this platform into services that you and I consume every single day. SAN and NAS ways of thinking are fast becoming limiting in how they scale compared with object stores, and this, in my personal view, is why service providers are switching on to the change. Traditional service providers are offering things such as “back up to the cloud”, when what they need to be doing is appealing to the developers who have written so many of their programs for Amazon S3, as those applications could run on Atmos because it understands S3. This would enable them to keep up with the curve in this changing marketplace. And the best part is that the Atmos API is yours: you can edit it, modify it and do whatever you like with it to make it work for your company, give a portal to your end users and bill them accordingly.

So to sign off: is Amazon blazing the trail ahead? No, they just did this before anyone else thought of it, and they are now so large that they can grow dynamically. Everyone thinks that cloud is slow, but look at Amazon: using commodity hardware and servers, the sheer scale means the compute and storage are all there for the taking!

 



Digital Breakdown

Okay, so let's start with this: how many times have you checked your smartphone in the last hour? Better yet, let's develop it a little: how many times have you tweeted, emailed or been on Facebook?

You see what I am getting at: society is being driven by digital media and its all-consuming demands on our time. The term “digital breakdown” is, I think, a tad harsh here, but the concept remains.

As I have mentioned before, I always seem to be interacting with some screen through an interface, and generally I am trying to do this while doing multiple other tasks. So what's the problem with being connected, I hear you cry? Well, I think being connected is great, but being connected constantly is becoming too much for me now. I have so many channels and applications that I literally dread looking at my phone sometimes, as responding is simply eating my time. If we put half the effort we spend maintaining that perfect Facebook wall back into our lives, we would be better off.

I wonder when we will see smartphone blackout days? Can you imagine a no-Facebook day? Would people survive? My point here is that social media and interacting with digital devices is now a habit. We have been opened up to this world of information at our fingertips on a huge array of devices, and we are literally frightened of missing something.

One person who has tried this is Brad Feld, a managing director at the VC firm Foundry Group. He stayed away from his iPhone for 14 days, and his comments are below:

“There’s some magic peace that comes over me when I’m not constantly looking at my iPhone. I really noticed it after two weeks of not doing it. After a few days of withdrawal, the calm appears. My brain is no longer jangly, the dopamine effect of “hey – another email, another tweet” goes away, and I actually am much faster at processing whatever I’ve got on a 27″ screen than on a little tiny thing that my v47 eyes are struggling to read.”

Later on in his write-up I think he stumbles across a point that people are missing: it's not that we don’t want information, it's that we want it to be relevant and filtered. Peter Hinssen said, “There is not too much information, simply a lack of an intelligent filter”. I agree with him. I love information, yet I do wish that someone could filter it effectively. As I highlighted in a recent post, “Windows ”, I believe they have taken the correct step forward with this, but for all its effectiveness it still doesn’t filter for what is most important to me at that time.

One idea I am spinning around is that surely we are at the stage where my iPhone can be intelligent? So why can it not look into my recent posts, diary, schedules and times and give me only what I need at that point? Everything else can wait until I choose to read it. I find this a highly relevant concept, but an incredibly complex one. If you have seen the film “Minority Report” you may know what I mean: Tom Cruise walks down the street and the advertisements scan his eyes and present a relevant advert.

Well, that kind of data is already here, leveraging big data and analytics to intelligently suggest items to purchase based on buying patterns. Retail is blazing the trail here as we become an ever more consumer-led, price-driven, comparison-shopping world. Anyway, what I am saying is: can we not get that intelligence into our smartphones and create this filter?

Maybe someone has done this, but so far I have not discovered it, and I would be surprised if I am the only one with this view.

That being said, I am signing off and going to see how long I can keep the smartphone off.

 



Embrace the cloud, but be afraid of the cloud

We are entering the Cloud Era. “Cloud” is a word which had little significance for me less than 3 years ago, and now it is all that seems to be pitched by technology companies. Giants such as Amazon and Microsoft Azure are the clear trailblazers here, but why?

For a start, Amazon's business model is so adaptable that it is the clear choice for start-ups who do not want to invest in compute\network\storage and prefer a pay-as-you-go scheme. The number of customers I visit whose developers use Amazon Web Services is quite interesting, especially when I ask the question in a meeting and the answer turns out to be “yes”. You then see minds working overtime: “Cloud, is it safe? How much data do we have there? How much is it costing?” This is the point with cloud: it is great, and the technology clearly allows us to be elastic. The old days of having to build more infrastructure are gone, and this means that anyone can develop, run and analyze their business using the variety of platforms on offer as a service.

One discussion point I often raise is “How do you keep costs down?” This may not be an issue for large enterprises, but what if you are offering a free service such as Flipboard? You don’t want to get into the situation where you have large numbers of machines running and consuming resources, and in turn big bills coming through your door. The cloud is great, but like anything it needs to be planned correctly. We seem to have lost our way a little here, in the sense that we have all these options but, because it's automated and next to no fuss, we feel that it requires no planning?!

Well, the simple fact is that it does, and people are starting to wake up to this now. There are many ways to do this, such as using reserved instances instead of on-demand: the cost model is different and, when used effectively, it is much more beneficial.
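A quick way to see when reserved instances pay off is to work out the break-even point. The Python sketch below assumes a simple model: on-demand is a flat hourly rate, while reserved is an upfront fee plus a lower hourly rate. The function names and any prices you plug in are hypothetical, so check the provider's current price list for real numbers.

```python
def on_demand_cost(hours, hourly_rate):
    """Total cost of running on-demand for the given number of hours."""
    return hours * hourly_rate


def reserved_cost(hours, upfront_fee, hourly_rate):
    """Total cost of a reservation: one-off fee plus a discounted hourly rate."""
    return upfront_fee + hours * hourly_rate


def break_even_hours(on_demand_rate, upfront_fee, reserved_rate):
    """Hours of use after which the reserved option becomes cheaper.

    Returns None when the reserved hourly rate is not lower than the
    on-demand rate, because the reservation then never pays for itself.
    """
    if reserved_rate >= on_demand_rate:
        return None
    return upfront_fee / (on_demand_rate - reserved_rate)
```

With made-up figures of 0.10 per hour on-demand against 200 upfront plus 0.05 per hour reserved, the reservation wins for a machine that runs all year and loses for one that runs a few weeks. That is the planning I am talking about: know roughly how many hours you will run before you pick the model.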

The next big sticking point of the cloud for many people is what to send to the cloud. It is interesting that most enterprises have now begun to lock down sync and share applications. Quite simply, companies have been in denial for too long about their users using things like Google Drive and Dropbox, and they are slowly realizing that they need to control this. How much sensitive data could leave their organisation is quite worrying, especially when people find out what the T&Cs of services like Dropbox permit the provider to do with data once it is on their storage. Scary! I personally am a little saddened by this, as I really like sync and share. It is such a simple, rather old-school idea that has become a joy to use and has moved people away from relying on email. In every company, email is a massive file share that can spiral out of control, and simply sending a cloud drive link is so much neater, simpler and cheaper!

On the plus side, these sync and share applications have really been driven by the hand-held market of tablets and smartphones, and this has spawned new companies and opportunities, such as Zenprise, which can put corporate policies on your tablets and phones, giving control back to IT.

But the root of the appeal of AWS and Dropbox is that they are so easy to obtain, with very little red tape, and when used correctly they are extremely cost-effective. Compare this with going to your IT department, requesting a service or technology, waiting for endless approvals and finally getting it weeks if not months later. Is it any wonder people use them? To me it does feel like IT restricts everything I do, and it frustrates me beyond belief. I can always find a way around it and get what I need done, but my mind tells me I should not have to do this!

The one aspect that remains for me is security. There are so many security concerns, but you watch. I liken this to cliff diving: everyone waits until one person jumps first and it's safe, then they all follow! As soon as someone in the financial or government sector decides to fully embrace this, everyone will follow!

To sum up: cloud is here, accept it. We use it every day in our lives and it is only getting bigger and smarter. Personally I am pleased with this, as it opens up technology more into our lives and brings us racing towards the information- and screen-led society we will eventually become. Software is becoming smarter and more usable by the day; it will soon interface with our lives seamlessly, and it will probably run in the cloud.