Be careful who you trust with your data!
This post was originally published on AIIM's Expert Blogs by Serge Huber, CTO at Jahia Solutions
Nowadays, especially with an ongoing global economic crisis, companies are under a lot of pressure to reduce costs, and many hope to do so using the services available in the cloud. Social services such as Facebook, Twitter and Pinterest, data hosting services such as Dropbox, iCloud and SkyDrive, and infrastructure services such as Amazon EC2, Google App Engine and Microsoft Windows Azure are all very tempting. But is it really a good idea to go into the cloud blindly, sacrificing control over your own data?
For both enterprises and individuals, there is a tradeoff that too many forget they are making, or simply choose to ignore a little too easily: you are trusting another company with your data, and in effect you are giving it away!
As an individual, it is indeed tempting, and sometimes even fun, to use services such as Facebook to connect with relatives and stay in touch. This is actually one of the strengths of the social business model: existing customers draw in new ones. The model is therefore very attractive for the social service provider, and despite the infrastructure costs it can become a worthwhile business very quickly.
In the case of data hosting services, what is compelling is being able to reach your data from anywhere: simply accessing files from a browser, and possibly sharing them with others, is very useful. The hosting provider, for its part, can attract large numbers of customers with free offers and then sell premium services such as additional storage space, versioning, retention policies or team sharing.
In all cases, unless you have purchased your own data-sharing or social solution and host it in-house, you are trusting a (potentially unknown) third party with your data. This means that whatever you upload can potentially be viewed by others, copied, stolen or otherwise misused. This might seem pessimistic, but despite all the legal clauses, let’s not forget that the data will ultimately be in the hands of people, and people are often the weakest link in security.
Data retention and backup are also an issue. Online services are attractive because they let you outsource the administrative work of keeping a service running, along with backups, high availability and any other requirements you may have. But backups of the data you send will be kept long after you think you have deleted it. So even if you believe you have removed all your data from a service, it will survive in backups for as long as the third party retains them. In some of the worst cases, a third party might encrypt the active data properly but not its backups, which is a security nightmare waiting to happen. As a cautionary tale, even Amazon, known for its large datacenters, has repeatedly suffered major failures (affecting Netflix and Instagram!) and has even suffered partial data loss.
While on the subject of security, with all the recent news stories about highly visible public websites such as LinkedIn or Yahoo Mail getting hacked, you might wonder whether they are really better choices than handling security in-house. Data security is really hard work, and while it does make some sense to trust online suppliers to do a better job at it than an inexperienced in-house team, they are also much more visible targets. You might sometimes assume that data is more secure with a third party when in reality it is not: examples such as the Microsoft Sidekick contacts loss or the Sony PlayStation Network hacks (25 million users affected!) are scary reminders that cloud services can be vulnerable too. I highly recommend reading The Web Application Hacker’s Handbook, a real eye-opener about the many vulnerabilities that plague online applications. (The Yahoo hack, for example, exploited an SQL injection vulnerability, a very well-known class of flaw, and even worse, the passwords were stored in plain text!)
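Whether a service is run in-house or by a third party, the two mistakes in the Yahoo example, building SQL queries from raw user input and keeping passwords in plain text, are well understood and avoidable. Here is a minimal Python sketch of both defenses, assuming a hypothetical SQLite `users` table and hypothetical helper names (`init_db`, `create_user`, `check_login`). It uses parameterized queries so that user input is never interpreted as SQL, and stores only a salted PBKDF2 hash of each password. It illustrates the general technique and is not a complete authentication system.

```python
import hashlib
import hmac
import os
import sqlite3
from typing import Optional, Tuple

# Iteration count for PBKDF2-HMAC-SHA256 (an assumption; tune it to your hardware).
ITERATIONS = 600_000


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest); a fresh random salt is generated when none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def init_db(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: only the salt and the hash are stored, never the password.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users ("
        "username TEXT PRIMARY KEY, salt BLOB NOT NULL, pw_hash BLOB NOT NULL)"
    )


def create_user(conn: sqlite3.Connection, username: str, password: str) -> None:
    salt, digest = hash_password(password)
    # The ? placeholders make sqlite3 bind the values as data, not as SQL text.
    conn.execute(
        "INSERT INTO users (username, salt, pw_hash) VALUES (?, ?, ?)",
        (username, salt, digest),
    )


def check_login(conn: sqlite3.Connection, username: str, password: str) -> bool:
    row = conn.execute(
        "SELECT salt, pw_hash FROM users WHERE username = ?", (username,)
    ).fetchone()
    if row is None:
        return False
    salt, stored = row
    _, candidate = hash_password(password, salt)
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(candidate, stored)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    init_db(conn)
    create_user(conn, "alice", "correct horse battery staple")
    print(check_login(conn, "alice", "correct horse battery staple"))  # True
    # A classic injection payload is treated as a literal (unknown) username.
    print(check_login(conn, "alice' OR '1'='1", "anything"))  # False
```

PBKDF2 is used here only because it ships with Python's standard library; dedicated password hashing algorithms such as bcrypt, scrypt or Argon2 are common alternatives.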
Many companies approach the problem from a content security classification angle: highly sensitive data should always stay in-house, while less critical data may flow freely in and out of the company. Although this sounds reasonable, it is very difficult to enforce in practice, as it requires constant training and monitoring of employees to make sure that nothing moves from one classification to the other.
Also, do not forget that when you send data to public companies such as Facebook, Google or LinkedIn, they store it however they see fit, and they are not really keen to offer a full-blown interface for managing the data yourself (for example, to expire content after a predefined period). One particular problem with such systems is what happens to the data of users who have passed away: a data-recovery policy must be put in place to handle such cases, or, more commonly, the data has to be dealt with manually. So in effect these sites are new content silos, where data goes in but rarely comes out, and even less so at the request of the end user. (Google is actually a little better than the others here, since it does offer a way to request a copy of all your data, but I doubt it is well known or widely used.)
In-house solutions offer more peace of mind in terms of data control, but they may evolve more slowly, since administering them is not necessarily the business’s core activity. This is why the move to cloud services is so appealing, but it is really important that people be made aware of the real tradeoffs they are making. That said, it is also getting much easier to maintain enterprise software and deploy new versions of it, which makes it possible for in-house solutions to stay up to date with the latest technologies and features (such as mobile access).
The real solution, I believe, lies in enterprise open source software. Companies that fully grasp the benefits of this type of solution will understand that they share infrastructure development with other companies, which helps them reduce costs while keeping full control over their data, since it can stay in-house. They also avoid creating new content silos, because the data is manipulated by code that will always be available to anyone. (An interesting anecdote: did you know that OpenOffice is better at opening really old Microsoft Word files than Microsoft’s own product? This happened because the closed format was so poorly documented that even Microsoft did not properly maintain support for it. Open source developers actually did all the documentation work and QA testing when they reverse-engineered the format.)
So please, be careful who you trust: you might be giving away your data, and what seems like a quick short-term solution might not be the best course of action over the long term.
Author: Serge Huber
Serge Huber is co-founder and CTO of Jahia. He is a member of the Apache Software Foundation and Chair of the Apache Unomi project management committee as well as a co-chair of the OASIS Customer Data Platform Specification.