After our post 'In defence of social media" which itself was a response to the disproportionate news coverage of Facebook specifically, there have been many responses generally accepting that it should have been common sense that nothing is 'free' but that there was a clear mis-understanding on how people are tracked online and what exactly is collected and by who. This isn't unreasonable because the whole tracking and collection industry is shady and insidious, and just for clarity I was correct when I said GDPR will make absolutely no difference. So, how about we look at a few specific examples of data capture from some big players in the market...
Let's start with Facebook, purely because it was the subject of recent news stories.
Facebook of course collects everything you feed into it, this includes you name, address, date of birth (if anyone actually uses their real date of birth), phone numbers, email addresses and so on. This data forms the root record (the record to which everything else is attached).
To the root record we then add everything you view, everything you like or dislike, everything you post (Images, Text, Links), every message you send and receive and every ad that is displayed or clicked.
Associations are also added, that's "Friends" and the interactions between you and your "Friends" are also logged and common interests or appearance in common photographs are also recorded.
If you are unfortunately enough to have used your Facebook 'login' to login to third party websites then a record of that site, when you use it and for how long is also included.
As you can see, Facebook stores pretty much everything you do and that's their business model, you get to waste hours of your life that you'll never get back and Facebook sells the data they collect from this activity. There's nothing wrong with this business model, it works and has been around for decades.
Pinterest, Instragram(which is now Facebook), Tumblr and so on
These sites, which are generally 'image' sites record everything you add into the profile, a to that they add everyone you follow, every image you view (and for how long) and further some of these scan the images uploaded, recognise faces and then form internal relationships between the images and users. There's nothing wrong with this business model either of course, except perhaps the fact that the moment you upload your image, its no longer your image but that still doesn't stop people using these services.
Now Twitter has been around for a few years and is basically a 'feed' services where you follow topics and people and you'll receive updates from them. Its a simple model yet an effective one. Twitter records your posts, reads, follows and followers. It also records every link you follow from posts. Twitter inserts 'ads' into your feed which is annoying but not a show stopper and these are of course paid for by the advertisers. The rest of twitters revenue comes from selling your data to third parties which is again a good sustainable business model. In the early days Twitter was wide open to abuse where 'fake' accounts were created in celebrity's names causing unsuspecting followers to be duped and further be directed to 'donation' or 'malware' sites but Twitter put a stop (mostly) to this by 'verifying' some celebrities to remove any confusion. Twitter also allows the embedding of links, audio and now video into the feed which is great but also brings with it a new set of challenges around protecting users but also provides additional tracking metrics.
Google is a huge company with many 'services' most of which are 'free' to use. Let's look at probably the most common service, the "search" engine. There's no denying that Google.com is a great search engine and if your looking for something a little obscure then its your go to engine, but let's look at what's captured.
When you Search on Google, the search term is recorded along with the results, which results you click on, and the time taken for that click. This simply makes associations of interest between your google profile (if you created one, or a unique identifier if you didn't). This in itself isn't really bad and you would expect them capture this information surely? This information (search history) is further used to focus future searches so the more you use it, the more likely you are to get more applicable results but this is the official line and don't ever believe that Google is the only search engine, its not. Because of the way Google adds sites to its index, sites with large budgets and resources always find their way to the top results even if they aren't applicable at all. Moreover, Google adjust results of political, social, personal or controversial searches to add their bias to the results you see, and many would argue that this 'bias' that most don't even realise is wrong on many levels. Some other search engines such as DuckDuckGo, etc often produce more evenly weighted results and without adding their bias which some may prefer.
Getting back to Google the company, we need to talk about google analytics which is yet another 'free' service allowing website owners to get insights into visitors which is actually really useful, but for that to work Google needs to be able to connect YOU as a person to that site which it does easily. This gives Google not only your search queries, results, and clicks but also now most websites you visit, when you visit them for how long and what you do on those sites. Now we're starting to collect some seriously valuable data and this is of course the business model again, you get lots of free services and Google makes money from advertisers and the data. Google allegedly purchased shopper data from MasterCard which again when augmented with your online profile just adds a wealth of additional behaviour data.
Other Services (Gmail, Google Docs, Groups, Google+, Google Drive, and so on)
Google offers a bunch of other 'free' services all of which are quite useful, but each bring yet more data to the profile they are maintaining on your behalf. Every email you send and receive via Gmail is scanned, stored and linked. Every document you add to Google Docs is scanned, stored and added, any file you store on Google Drive is scanned Stored and added, are you seeing a pattern here? Nothing you do on any Google service is private. How about Google Maps? A very useful tool if you want to find somewhere, but yet again everything you look at is recorded and added to your profile. If you have an Android phone then your location data is also added to your profile along with your messages, apps installed, app usage, contacts and so on. Google Home is a voice assistant and speaker for your home, but again anything you ask it is stored and added to your profile data.
YouTube (now owned by Google) again stores the video's you want, channels you watch, comments you make and so on.
Android, the phone operating system developed by Google has its own class of information leakage in that every app you install and use is tracked and unless you specifically disable it (and there's still a debate if you can disable it) then your location is tracked using your phone's GPS data. Mapping this allows Google to track all the places you visit, shops you visit and for how long.
Google Chrome is a web browser developed by Google and is again free to download and use. Within this browser there are options to 'store' your credentials and bookmarks in the Cloud and this does then of course give Google this data to further add to the profile. We also noticed that Chrome (unlike other browsers) created several local files storing your search history, browser history, and so on for reasons unknown. The files are unprotected meaning that we (or any malicious or otherwise software) can easily read them to obtain this information. At the time of writing we also noted weak protection of your stored passwords, but this isn't specific to Chome and several other browsers are also easy to crack.
So Google know what you search, what you view and for how long and how often, what you buy, what you look at but don't buy, how often you buy something, what you read, what you post and what posts you read, what pictures and video's you view, how often and from what websites.
Bing & Yahoo
Bing is a search engine that is pretty useless in fact and is even more unfairly weighted towards sites with $$$ and subsequently doesn't have any significant market share (about 7% at time of writing) but that doesn't mean that they don't store you searches, links clicked etc which they do. There's a 'relationship' between Microsoft and Yahoo which goes back several years and brings Yahoo results into the Bing search engine which is probably a good thing but this also brings Yahoo free services such as Yahoo Messenger, Yahoo Groups and so on into your search footprint. Yahoo itself has been bought and sold several times and the actual ownership is hard to pin down but we do know that the majority is owned by Oath inc (part of Verizon) at time of writing.
Generally speaking the use of Bing and Yahoo is fairly limited these days with about 4% market share (at time of writing) since Bing's search results are limited and Yahoo's reputation has been shredded with past data breaches. The use of Yahoo mail brings with it the same issues that Gmail has, your email's and everything in them are scanned and stored. Microsoft's Hotmail is exactly the same and why shouldn't it be so, its free after all. Yahoo's Geocities which is pretty much dead now and Yahoo Groups, if anyone still uses them, bring yet more profile cross linking with group 'Members' being associated by topic and post and of course you must have a 'yahoo' account to participate.
Internet Service Providers (BT, PlusNet, Virgin and so on)
Some reading this may not be aware that your Internet Service Provider has access to every website you visit. They do this via DNS which is the system that converts a domain name into an ip address. Unless you specifically override it your ISP will route your DNS requests to their servers which then accumulate your website requests against your 'session' which is your current IP Address linked to your account. Using SPI (Stateful Packet Inspection) your ISP can also record what you actually do online such as listening to music, watching video, making phone calls, instant messaging, and so on. All this data is accumulated and stored indefinitely and in this country at least is made available to law enforcement without a warrant.
The Amazon ecosystem is slightly different to the general model as there's no 'free' services, you need an account to be able to buy online, download books, listen to music or watch videos, but that doesn't mean the company won't collect your data because they do. Everything you search for on Amazon is stored and kept, everything you listen to, read or watch is stored and kept and all this profile data is used to target search responses and advertisements to your specific interests. Amazon don't make any guarantees not to sell your data (that I can find) so its safe to assume they probably do. Amazon also has 'Alexa' which further arguments the profile by storing what you ask and do with the devices but this in itself isn't bad and can be used to tailor responses based on your past history.
Local Government & Agencies
You may or may not know that your local council is at liberty to sell your personal data to anyone willing to pay. They call this the electoral roll but in fact its just a dump of all the people registered to vote + council tax payers. When you combine this with data from a company like Cameo you then introduce affluence and net worth, link that with Experien or Equifax and you now have credit worthiness, loans, mortgages, bank accounts and the list goes on, all free to purchase.
The DVLA is now also selling your details to companies so if you own or are the registered 'keeper' of a vehicle that data is now also up for grabs.
And of course the Census data, that you MUST complete legally is made available for sale to anyone who wants it and this is of course why the Government is exempt from GDPR along with the Police, the Military, and anyone else who you may want GDPR to actually apply to.
The payment provider allows easy transactions available on many websites and vendors. Paypal collects the product, price, location, currency, and store and records this at point of sale. Whilst this information can easily be justified, Paypal are at liberty to sell this data to anyone else which further compliments your online profile with validated purchases.
Since tracking to your personal profile is done via Fragments left on your computer, or cookies/sessions left by website's or even by your browser screen size and in a recent discovery by your sound card then allocating your activity to you is fairly good but there are some cases, especially in companies where internet access is proxied and where only a few 'login' to accounts that others activity can be falsely attributed to your or others profiles. I have personally seen this whilst writing this article when I requested all my activity from Google. Digging through it and remember I never use Google I found a bunch of searches performed as recently as earlier in the week that were from other users on the network which somehow wound up in MY profile. I have no idea how common this is in the real world.
There are some claims on social media that Google, Facebook and others are always 'listening' using the Microphone in your equipment, but this has largely been disproved by researchers at the time of writing this article. That doesn't mean it categorically does not happen or that it does, simply that the evidence to date suggests not.
Services such as VPN's and of course the ever popular Tor Browser are ways to obscure your real identity online, but you'll discover fairly quickly that the services above either don't work at all or are crippled deliberately. Google for example returns some made up message about unusual traffic. As VPN's come and go there will always be a short time before the services get blacklisted but this will never be a viable solution long term.
The sale of data and the data market
All of the above can produce fairly detailed and valuable profiles of your online activity but when the separate data collections are combined you start to have very complete profiles linked directly to an individual. This is what worries people more than Facebook and Google. Given that your data is bought and sold on a daily basis some of these companies have a complete record of everything you do online. Let's see what the total footprint of an average teenager today is
I think now you must be starting to understand how the data business works and how your pretty powerless to stop it without some radical changes to your lifestyle and even then its too late for most people. Its important to be aware that these companies have done nothing wrong, nothing illegal or even shady, they are all businesses and their business is your data. I personally like Facebook & Twitter and Google is a good search engine but YOU need to make informed decisions on what services you use online, and what information you surrender to those services.
Whether you believe it or not, Apple has taken a fairly adversarial approach to data protection, committing to protecting your data not only on your devices but also online with anti-tracking features in their browser (Safari), but in the scale of things and despite Apples best intentions it's not going to make very much difference in the end. The only way for Apple to make an effective dent in the data collection market would be to block all social media and search engines from users devices, which they won't do for obvious reasons and in the real world everyone has to make their own decisions on what they do and don't use.
The near future
Imagine that every shop you visit is recorded and shared? Its already a fact for online shopping and some phones already report you actual location every few minutes so its not a huge stretch to match your location data to retain stores, public houses, peoples homes, your home, daily commute and so on. It won't be long and you should know its coming.