One of the questions that comes up time and time again on the Helpdesk is, what is my cache, where is my cache and what am I supposed to do with it?
Well, the question itself often arrives on the back of conversations with content providers and developers often around out of date content so its worth taking a few minutes to explain what the cache is, where it is and why it is.
A cache, pronounced "Cash" is masterfully defined as "A hiding place used especially for storing provisions." or "A place for concealment and safekeeping, as of valuables." and that's not too far from the truth. The cache is indeed a place for storing provisions of the digital kind. You see the internet isn't anywhere near as fast as you experience it from a browser on your PC, and this is because the internet is just a collection of many different networks all connected together to provide a 'route' from your PC to the server at the end of a browser request. Let's look at this in more details now:
When you type a url into your browser, for example http://www.gen.net.uk and press enter or go, the browser uses the operating system of your device to open a connection to www.gen.net.uk on port 80 (port 443 if https://) and request that page. The actual request sent to the remote server looks like this "GET / HTTP 1.1" which means get the page at / the default or index page and use HTTP 1.1 which is just a specification. The response from the server will be a HTML page which the browser then displays to you as the client.
Now where does caching fit in here? Well, your browser when it receives the HTML page stores in locally in a cache (which is just a hidden folder on your pc) and with that it stores a date and time the page was retrieved. Now if you close the browser, open it again and again type in http://www.gen.net.uk then this time something magical happens; The browser realises that its just been to www.gen.net.uk and just received the page at / so rather than bother requesting it again it just returns the one it stored a few moments ago. Simple and fast right?
Well, it get's a little more complex than that because the server when returning the page to the browser can in fact indicate whether or not the browser should cache it, and if it should then it can specify for how long the browser can cache it and indeed the page at www.gen.net.uk/ at the time of writing does not give any special instructions to your browser around caching.
So, hopefully that's a little clearer, when you type in a url or follow a link if your browsers already been there recently then you'll get the cached version rather than the 'live' version unless the site specifically told the browser not to cache. This really becomes visible if you have your own website, and you or your developer has made changes but you just can't see them, its all in the cache. Clearing the cache is simple enough and can be found in your browsers menu's should you require it and issuing repeated refreshes (CTRL+R windows, CMD+R Apple) will also force the browser to reload the live page generally.
Now as I said before the internet is no where near as fast as you experience it, and this is not only due to your browsers magic cache, its also due to internet service providers (mostly residential) using systems called 'transparent proxies'. This is another cache between you and the sites you browse and this cache is not optional and in many cases will not yield to servers requests not to cache. The transparent proxies intercept your requests as you make them, look to see if they have a copy of that page and of so serve it up as if it came from the server itself. Your browser has no idea its not a live page and neither do you. By using transparent proxy caching ISP's (Internet Service Providers) especially residential can significantly reduce the amount of bandwidth they use on their upstream (between them and the server). There are also, in this country at least, significant privacy concerns around transparent proxying because your ISP not only intercepts your requests but can keep a log of them tracked back to your IP Address, and therefore back to you so its a bit of a double whammy. There is a third layer of caching known as web accelerators that are sometimes used at the server side to speed up performacne by keeping a cache but this is under the control of the site owners and as such isn't an issue.
How do you defeat this transparent proxying ?
Well its not easy because the ISP has access to all the traffic you send and receive and can easily intercept not only your web requests, but your email too, although if your email is stored at Microsoft (hotmail, office 365 etc), google (gmail, etc), Yahoo, AOL and so on, then its already compromised many times over and this really isn't going to make any difference. There are however tools that can cut through the proxies by establishing a 'tunnel' between your browser and a server in another country and from there making browser requests and I am of course talking about VPN's, the most common of which is the Tor Project (https://www.torproject.org/) but having said that, the tor project based in the USA is probably not going to be filling you with overwhelming confidence in the privacy of your data but its the best we've got unless you want to spend some real money in which case you can establish real VPN's to real secure proxies and have true anonymity online.
I think its also worth mentioning that browser plugins such as Addblock, Ghostry, Web of Trust to name a few and of course Microsoft's own 'safe browsing' nonsense also hijack every URL you visit and pass that url back to central servers somewhere giving them also a full history of your browser habits but by themselves they can't tie that data back to you personally. That is, they know that a PC on the internet with a unique ID visits these websites but without help from your ISP they can't tie that information specifically back to you as a person unless of course you login to your Facebook, Google+, twitter and so on using the same PC in which case they can now easily tie your browsing habits back to you personally the only difference is that your ISP has your postal address and generally people aren't stupid enough to enter that sort of thing into Facebook, google+ or twitter.
So here concludes this little discussion around caching that has taken a sideways step into privacy and anonymity but its all connected of course.