Wednesday, May 18, 2011

Anonymous user data

Increasingly it seems that users are concerned about the safety and privacy of their data stored online in "the cloud." The recent widely-publicised leak of names, addresses and possibly credit card details for around 100m users of Sony's PlayStation Network is just the latest in a series of data breaches. Multi-national companies and governments on both sides of the Atlantic are just as vulnerable as small companies and individuals.

But what to do about it?

As a matter of principle, we have set out to put user data under user control, stored safely in an encrypted database that can be securely backed up. We let the user choose what to share, and with whom; and we know only the absolute number of users, not who they are - the link between their 200-digit UserID number and their real name is made only on their local machine...
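To make the idea concrete, here is a minimal sketch in Python of keeping the name-to-ID link on the local machine only. It is an illustration under assumptions, not a description of our actual code: the names `new_user_id`, `LocalProfile` and `Server` are hypothetical, and generating the ID at random is an assumption made for the example.

```python
import secrets


def new_user_id(digits=200):
    """Return a random decimal ID string of exactly `digits` digits (assumed scheme)."""
    # First digit is 1-9 so the ID never starts with a zero.
    first = str(secrets.randbelow(9) + 1)
    rest = "".join(str(secrets.randbelow(10)) for _ in range(digits - 1))
    return first + rest


class LocalProfile:
    """Hypothetical record that lives only on the user's own machine."""

    def __init__(self, real_name):
        self.real_name = real_name      # never sent anywhere
        self.user_id = new_user_id()    # the only value the server sees


class Server:
    """Hypothetical server side: it stores IDs and can count them, nothing more."""

    def __init__(self):
        self.known_ids = set()

    def register(self, user_id):
        self.known_ids.add(user_id)

    def user_count(self):
        return len(self.known_ids)


profile = LocalProfile("Alice Example")
server = Server()
server.register(profile.user_id)
```

In this sketch the server can report how many users it has, but nothing it holds contains a real name; the only place the two are joined is the `LocalProfile` object on the user's machine.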

Our goal is to make user data anonymous. But this is easier said than done, and it's almost impossible to ensure that all user data is utterly anonymous. An article by Pete Warden asserts: "Precisely because there are now so many different public datasets to cross-reference, any set of records with a non-trivial amount of information on someone's actions has a good chance of matching identifiable public records."

The idea is simply to take the information in one set of data and cross-reference it with other available information to form conclusions: the sort of activity that helped the Allies win WWII when carried out in a military context, but now available to marketers and made easier by powerful computers able to crunch through increasingly large, public datasets.
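The cross-referencing described above can be sketched in a few lines of Python. Everything here is made up for illustration: the records, the field names and the `link` function are hypothetical, and real datasets are messier. The point is only that records with no name attached can still be matched to people if enough other details line up.

```python
# Hypothetical "anonymous" records: no names, but some personal details remain.
anonymised = [
    {"postcode": "AB1 2CD", "birth_year": 1975, "gender": "F", "purchase": "running shoes"},
    {"postcode": "EF3 4GH", "birth_year": 1982, "gender": "M", "purchase": "cookbook"},
]

# Hypothetical public records that do carry names.
public = [
    {"name": "Alice Example", "postcode": "AB1 2CD", "birth_year": 1975, "gender": "F"},
    {"name": "Bob Example", "postcode": "EF3 4GH", "birth_year": 1982, "gender": "M"},
    {"name": "Carol Example", "postcode": "EF3 4GH", "birth_year": 1990, "gender": "F"},
]

# The fields shared by both datasets (an assumption for this example).
SHARED_FIELDS = ("postcode", "birth_year", "gender")


def link(anon_records, public_records, keys=SHARED_FIELDS):
    """Pair each anonymous record with a name when exactly one public record matches."""
    index = {}
    for person in public_records:
        index.setdefault(tuple(person[k] for k in keys), []).append(person["name"])

    matches = []
    for record in anon_records:
        names = index.get(tuple(record[k] for k in keys), [])
        if len(names) == 1:  # a unique match re-identifies the record
            matches.append((names[0], record["purchase"]))
    return matches


matches = link(anonymised, public)
```

Here both "anonymous" purchases end up attached to a name, because each combination of postcode, birth year and gender matches exactly one public record.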

So it may in practice be hard to ensure the anonymity of users. And does it really matter if others can work out that I'm a customer of, say, this company rather than that one? It's hardly life-threatening info!

The frustration is that users care differently in different contexts: on the one hand, I object if I can't get easy WiFi access to the Internet wherever I am; on the other hand, attempts by Apple and Google to crowd-source information about where those open access points lie (by gathering information through roaming users' handsets) lead to a public outcry.

Nothing's new: we want to have our cake and eat it! Perhaps the lesson for now is that ethics is firmly on the consumer agenda, even though the majority of consumers haven't studied or thought about moral philosophy. And the way users react to the products we build will depend in part on how we treat them, inform them and present what we've built to them.