This post was originally published on AIIM's Expert Blogs by Serge Huber, CTO at Jahia Solutions
Now that the buzz around the cloud has died down a little, and its acceptance has grown enough for corporations to seriously consider it as one of the technical options for deploying software, we ask the question: can you trust the cloud, and can the data you put out there really be safe?
There are nowadays many different kinds of cloud services, ranging from online backup to simple file sharing to personal digital spaces. Some even let you share files either publicly or with smaller groups, making it easy to send a file to a friend, a group of colleagues or family, without the file size limitations of common email systems. Some file-sharing systems even allow users to work collaboratively on documents, as for example with Google Drive, which makes them very attractive to users who want to see in real time who is doing what on which document.
As you can see, there are lots of different cloud services, and they can be attractive to many different types of customers, including corporate users, so the question of trust comes up pretty quickly. Can you trust a cloud service with your data? In which cases is it ok to share a document in the cloud, and in which cases is it not? What are the common pitfalls and errors that people make when "going into the cloud"?
Before going any further, and as a lot of mystery still lies behind the "cloud" label, let me clarify what I describe as the cloud, or more precisely, cloud services. A cloud service is nothing more than a piece of software running on a server directly connected to the public internet, offering the use of that web-based software to anyone. This service may be hosted on one machine or on a cluster of computers, and may be located anywhere in the world (in some cases it is even replicated and distributed across multiple geographic locations). The service provider takes care of all the infrastructure, including trying to provide fail-safe scenarios, but even the most carefully run services may still fail (even Google suffers outages, and Amazon cloud failures can affect the many well-known services it hosts).
So the first element to look at is availability. When using a cloud service, you will be directly dependent on the availability of that service, and most people expect something like 100% availability. The service provider will try its best to provide that, but it may fail for many reasons, ranging from loss of internet connectivity on either side (yours or theirs), to catastrophic failure in its infrastructure, or even worse, being hacked. Availability is a hard problem, and can be very expensive to get right, since failures are usually very bad for business and are not well understood by users who have placed their trust in a remote (sometimes unknown) company.
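To put the "100% availability" expectation in perspective, here is a minimal sketch that converts an availability percentage into the downtime it allows per year. The function name and the sample percentages are illustrative assumptions, not figures from any particular provider.

```python
# Hours in a year, ignoring leap years (illustrative simplification).
HOURS_PER_YEAR = 365 * 24


def downtime_hours_per_year(availability_percent: float) -> float:
    """Return how many hours per year a service may be down
    while still meeting the given availability percentage."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability must be between 0 and 100")
    return HOURS_PER_YEAR * (1.0 - availability_percent / 100.0)


if __name__ == "__main__":
    # Hypothetical availability levels, chosen only for comparison.
    for pct in (99.0, 99.9, 99.99, 100.0):
        hours = downtime_hours_per_year(pct)
        print(f"{pct}% availability allows {hours:.2f} hours of downtime per year")
```

Even an availability figure that sounds very high still leaves room for hours of outage each year, which is why a promise of 100% is not realistic for any provider, or for in-house hosting.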
But then again, if you used an equivalent service hosted inside your own company, all the availability challenges would be your own. While there are many advantages to that, it requires that you have the in-house skills and resources to properly manage the infrastructure. One of the usual problems with in-house hosting is improper planning for system load, which can cause a lot of trouble if the service doesn't respond fast enough as load grows. Cloud services, on the other hand, are usually more aware of these types of problems and generally better at anticipating them, so this might be a good example of when to trust them.
Of course the main problem of trust is in the area of data security. When a third-party company is hosting data that doesn't belong to it, all kinds of security concerns arise. The first, and probably one of the most important ones, is the legislation of the hosting provider's home country. Many developed countries have anti-terrorism laws that require companies to be able to provide access to any data they host in the case of an ongoing anti-terrorism investigation. Because of this requirement, companies must introduce "government backdoors" into their systems, making it possible to access the data if need be, even if it is encrypted. This means that someone you don't know, inside a company you trust with your data, has access to any of the content you have uploaded to their servers, even if it is encrypted by their service (if you encrypt it yourself before uploading, you should be safer). Sure, they have internal policies stating that this type of access is only to be used in mandated cases, but at least one person inside the company has complete access, and you must trust this person implicitly with your data.
If you take the example of the NSA and the revelations made by Edward Snowden, it is becoming more and more clear that although systems administrators were supposed to "abide by the law and the rules", without any clear public oversight this access could easily be abused (it was even abused for personal use). This is why giving full access to large amounts of data can be a real power problem, and when the government is involved, it can be misused to the detriment of the general public.
You might then want to be very selective about the data you share in cloud services, but sometimes you might not even be aware of the data you're entrusting to third parties. Did you know, for example, that in the next version of the Apple operating system (OS X Mavericks), all the password data will be hosted in their iCloud service by default? Of course it will be encrypted, but there is a strong chance that government access backdoors will still be put in place somehow. The same is also true for all passwords on iOS and Android devices, so in effect a lot of personal passwords are already being hosted on servers reachable from the public internet.
I have talked a lot about government access, but there are other risks with such services, and one of the most common ones is the risk of being hacked. Public internet infrastructure is a very frequent target of malicious attacks, and it is only going to get worse as new tools and techniques are discovered to detect software and system vulnerabilities. For example, just recently, a new internet scanner was announced, ZMap, that can scan the entire public internet in under 45 minutes! You can easily imagine that with such powerful tools many new attack vectors will be found and used against high-profile services such as cloud hosting companies. So cloud services are good targets for hackers, but fortunately strong encryption can help here, especially if the service provider has been careful enough to make sure that the encryption keys and algorithms are safe even from major system penetration.
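To get a feel for what scanning "the entire public internet in under 45 minutes" implies, here is a rough back-of-the-envelope sketch. It assumes the scan covers the whole IPv4 address space (2^32 addresses) with one probe per address; a real scanner would skip reserved ranges, so treat the result as an order of magnitude only.

```python
# Assumption: the full IPv4 address space, one probe per address.
IPV4_ADDRESSES = 2 ** 32

# The 45-minute figure quoted above, in seconds.
SCAN_SECONDS = 45 * 60


def probes_per_second(addresses: int, seconds: int) -> float:
    """Average probe rate needed to cover `addresses` in `seconds`."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return addresses / seconds


if __name__ == "__main__":
    rate = probes_per_second(IPV4_ADDRESSES, SCAN_SECONDS)
    print(f"About {rate / 1e6:.2f} million probes per second")
```

Under these assumptions the scanner has to send well over a million probes every second, which illustrates how quickly any newly exposed service on the public internet can be discovered.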
Of course private systems are also vulnerable to hacking, and even more vulnerable to social engineering. A good social engineer will convince a target that they work for the company and have a legitimate need to access a system, and might persuade an unsuspecting worker to hand over credentials they should never have. In this regard the weakest link in security is the human, and while this might also happen within a cloud service provider, they will usually be more cautious about such attacks than an SME hosting all its data in-house.
So now that we've illustrated some of the drawbacks of hosting data in the cloud, should you still do it, or is it just not worth it? As usual, the most important thing in any decision is the quality of the information it is based on, and I think that being fully aware of the realistic security problems is important in that regard. The value of the data must also be taken into account. If the data does get hacked or falls into the wrong hands, is it a real problem or not? Sometimes the answer might surprise you: some of the data might just be a little time-sensitive because it is preparatory work for a new product to be announced, and if it leaks the effect might not be entirely negative. So a good assessment of the value of the data is critical when deciding whether or not to host it in the cloud. The same criterion applies to personal users, since deciding whether or not to publish a photo on Facebook comes down to asking yourself whether it is ok for this picture to be potentially available publicly. If it's not ok, don't share it.
Finally, as presented in this post, no data is ever 100% safe. Even on personal computers that are not always connected to the internet, it is possible to access data in unwanted ways, and all users should learn about such attack vectors when being trained to use computers (in this last case Trojan horses are the most common attack vector). The cloud adds further vectors that could be problematic because they are not necessarily all known and might even change over time. So it is important to design a cloud hosting strategy that takes into account the potential damage that may occur. And for really sensitive data, the fewer people have access to it the better, and it should always stay in-house.
Suggested further reading
Here are some books you might be interested in, covering everything from cloud security to social engineering:
- The Art of Deception, http://en.wikipedia.org/wiki/The_Art_of_Deception
- The Art of Intrusion, http://en.wikipedia.org/wiki/The_Art_of_Intrusion
- Avogadro Corp: The Singularity Is Closer Than It Appears, http://www.amazon.fr/Avogadro-Corp-Singularity-Appears-ebook/dp/B006ACIMQQ