Blog

This is the blog of the technical experts at GEN and its companies.

How to annoy your visitors with Google ReCaptcha


For many years now there has been a steady proliferation of Google ReCaptcha, a free service provided by Google which is used to verify that a human is actually filling out your form. It was annoying when it first arrived on the internet, but the latest rendition takes annoyance to a whole new level with poor-quality images, multiple pages to select and more. So why do so many websites choose to irritate their visitors with Google ReCaptcha?

Well, firstly it's free and readily integrates with most hosting platforms; secondly it's thought to be effective; and finally, for whatever reason, people think it's a good idea. In reality that's not at all the case. It is free, but there are serious privacy concerns; it's not effective, as it can be bypassed easily with a browser plug-in or broker service; and I don't think there's a complete understanding of just how annoying it is, especially for those on small screens or those with imperfect vision or hearing. But first let's talk about privacy, as that's a hot topic these days.

Privacy Concerns

If you click Privacy or Terms from the Google Re-Captcha box then you're taken to the generic Google Privacy or Terms pages, which make no reference to ReCaptcha or what it will collect. This odd behaviour could only be by design. If you dig deeper into the Privacy Policy for ReCaptcha, which is nearly impossible to find, you discover the following.

  • reCAPTCHA is a free service from Google that helps protect your website and app from spam and abuse by keeping automated software out of your website.
  • It does this by collecting personal information about users to determine whether they’re humans and not spam bots. reCAPTCHA checks to see if the computer or mobile device has a Google cookie placed on it. A reCAPTCHA-specific cookie gets placed on the user’s browser and a complete snapshot of the user’s browser window is captured.
  • Browser and user information collected includes: all cookies placed by Google in the last 6 months, CSS information, the language/date, installed plug-ins and all JavaScript objects.

Blimey, who knew? After reading that do you still believe Google Re-Captcha is a good idea for your website? 

  • The Google reCAPTCHA Terms of Service doesn't explicitly require a Privacy Policy. However, it has the requirement that if you use reCAPTCHA you will “provide any necessary notices or consents for the collection and sharing of this data with Google”.

But this is often, if not always, overlooked by website owners. In fact I cannot think of a single website using ReCaptcha that actually notifies you, prior to its use, that you're going to be sharing a bunch of data with Google just by clicking "I'm not a Robot". Let's review and expand on the Privacy Policy and what is collected...

  • A complete snapshot of the user's browser window, captured pixel by pixel
  • All cookies placed by Google over the last 6 months are captured and stored, and an additional cookie is placed.
  • How many mouse clicks or touches you've made
  • The CSS information for the page, including but not limited to your stylesheets and third-party stylesheets.
  • The date, time, language, the browser you're using and of course your IP address.
  • Any plug-ins you have installed in the browser (for some browsers)
  • ALL JavaScript, including your own custom code and that of third parties.

So at this point you, as a website owner, are obligated to disclose to your users that by clicking on the "I'm not a robot" re-captcha they, as visitors, AGREE to all the above being shared with Google. That is not only an inconvenience; in practice almost no one does it, because in most cases they don't fully understand what data is being shared. This can be a real problem, especially in the EU, where GDPR has already caused many websites to display mandatory and equally annoying cookie confirmations, and even restricts access to a large number of really useful sites from within the EU.

Annoyance

In a recent survey conducted by GEN with our business customers we included a question about Google ReCaptcha, asking users to rate how annoying it was from 1 to 10 with 10 being the most annoying, and 94% rated it 10, the most annoying. It's a small sample set of a few thousand users, but it does indicate a general appreciation of the inconvenience it presents. Personally, when I see the "I'm not a Robot" box, unless it's absolutely critical I'll just close the page and move on to something else, and this is a view shared collectively at this office, as it probably is at most.

For those outside of the USA, a crosswalk is what the Americans call a pedestrian crossing; in the pictures it's the white lines across the road, but of course in most of the rest of the world these are black and white or black and yellow. This is a regular misunderstanding, as are palm trees, which are the trees with the leaves at the top and never seen in many countries.

If you're not a robot, and I am certainly not, then it's easy to wind up with the "Try again later" dialogue to the right after getting a couple of images incorrect, after which you're stuck and cannot continue to submit your form without closing the browser, re-opening it and filling the whole thing out again. That is really, really annoying.

Alternatives

There are a whole myriad of alternatives to Google ReCaptcha, most of which are self-hosted and have none of the privacy issues associated with Google ReCaptcha. The general trend these days is that a visible captcha isn't required anymore, since form submission mechanisms have evolved to use a hidden captcha, which is in fact a generated seed on the form that is passed and validated server-side on submission. A robot (or bot) will want to POST the form without filling it in, which this hidden captcha easily defeats. Further validation of field types can pretty much eliminate bot POSTing and removes the need for anyone to click traffic lights, fire hydrants, store fronts or any other collection of images whilst providing Google with your personal information.
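To make the hidden-captcha idea concrete, here's a minimal sketch in Python. The function names and secret key are hypothetical, and real form libraries implement this in their own ways: the server signs a timestamp when the form is rendered, embeds it in a hidden field, and rejects any POST whose token is missing, forged or stale.

```python
import hashlib
import hmac
import time

# Hypothetical secret; a real site would load this from configuration.
SECRET_KEY = b"change-me-to-a-real-secret"

def issue_form_token() -> str:
    """Generate the 'seed' embedded in a hidden form field at render time."""
    issued = str(int(time.time()))
    sig = hmac.new(SECRET_KEY, issued.encode(), hashlib.sha256).hexdigest()
    return f"{issued}:{sig}"

def validate_form_token(token: str, max_age: int = 3600) -> bool:
    """On POST, reject tokens that are missing, forged or too old."""
    try:
        issued, sig = token.split(":")
        expected = hmac.new(SECRET_KEY, issued.encode(), hashlib.sha256).hexdigest()
        fresh = 0 <= time.time() - int(issued) < max_age
        return hmac.compare_digest(sig, expected) and fresh
    except ValueError:
        return False

# A genuine browser submission carries the token back untouched:
print(validate_form_token(issue_form_token()))   # True
# A bot POSTing the form without rendering it first has no valid token:
print(validate_form_token(""))                   # False
```

Because the token is invisible, the visitor never has to click anything; the bot is defeated silently on the server.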

Summary

  • Google Re-Captcha is not infallible and can be defeated by browser plug-ins or brokers. 
  • Google Re-Captcha has serious privacy issues especially in Europe. 
  • Google Re-Captcha is annoying to visitors and deters customers. 
  • Google Re-Captcha can present images of such poor quality (to the left) that no one can accurately guess them. 

If you are using Google Re-Captcha on your website then look for alternatives; there are many out there, and many of those will not require the customer to enter anything and work silently in the background. If you have a GEN hosted website and would like assistance in replacing your Google Re-Captcha then please raise a ticket at the HelpDesk and we'll do our best to assist you.

In writing this article we rely on sources from Google's website and others. We make every effort to ensure accuracy, but things do change, especially terms and policies, so be sure to check the current status.


Browser Cache, Transparent Proxies and more

One of the questions that comes up time and time again on the Helpdesk is, what is my cache, where is my cache and what am I supposed to do with it? 

Well, the question itself often arrives on the back of conversations with content providers and developers, often around out-of-date content, so it's worth taking a few minutes to explain what the cache is, where it is and why it is.

A cache, pronounced "cash", is masterfully defined as "A hiding place used especially for storing provisions" or "A place for concealment and safekeeping, as of valuables", and that's not too far from the truth. The cache is indeed a place for storing provisions of the digital kind. You see, the internet isn't anywhere near as fast as you experience it from a browser on your PC, and this is because the internet is just a collection of many different networks all connected together to provide a 'route' from your PC to the server at the end of a browser request. Let's look at this in more detail now:

When you type a URL into your browser, for example http://www.gen.net.uk, and press enter or go, the browser uses the operating system of your device to open a connection to www.gen.net.uk on port 80 (port 443 if https://) and request that page. The actual request sent to the remote server looks like this: "GET / HTTP/1.1", which means get the page at / (the default or index page) and use HTTP/1.1, which is just a protocol version. The response from the server will be an HTML page which the browser then displays to you as the client.
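To make that concrete, here's a small Python sketch of the raw request a browser constructs and how the status line of a response is read. The sample response is illustrative only, not the actual output of www.gen.net.uk.

```python
host = "www.gen.net.uk"

# The raw bytes a browser sends when you type http://www.gen.net.uk
# and press enter (sent over port 80 for http://, 443 for https://).
request = (
    "GET / HTTP/1.1\r\n"     # method, path ("/" = default/index page), protocol version
    f"Host: {host}\r\n"      # HTTP/1.1 requires a Host header
    "Connection: close\r\n"
    "\r\n"                   # a blank line terminates the request headers
)

# A typical (illustrative) response begins with a status line, then
# headers, then a blank line and the HTML body the browser displays:
sample_response = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "<html><body>Hello</body></html>"
)
status_line, _ = sample_response.split("\r\n", 1)
print(status_line)  # HTTP/1.1 200 OK
```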

Now, where does caching fit in here? Well, when your browser receives the HTML page it stores it locally in a cache (which is just a hidden folder on your PC), and with it stores the date and time the page was retrieved. Now if you close the browser, open it again and again type in http://www.gen.net.uk, then this time something magical happens: the browser realises that it's just been to www.gen.net.uk and just received the page at /, so rather than bother requesting it again it just returns the one it stored a few moments ago. Simple and fast, right?

Well, it gets a little more complex than that, because the server, when returning the page to the browser, can in fact indicate whether or not the browser should cache it, and if it should, for how long. The page at www.gen.net.uk/ at the time of writing does not give any special instructions to your browser around caching.
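As an illustration, here's a rough Python sketch of how a browser might interpret the server's Cache-Control header. The header values are made up, and this is a simplification; the real rules involve many more directives (Expires, ETag, revalidation and so on).

```python
# Hypothetical headers a server might return alongside a page; the
# browser uses these to decide whether and for how long to cache it.
response_headers = {
    "Date": "Tue, 30 Apr 2019 14:05:55 GMT",
    "Cache-Control": "max-age=3600",   # may serve the cached copy for 1 hour
}

def max_cache_seconds(headers: dict) -> int:
    """Return how long (in seconds) a browser may reuse its cached copy.

    "no-store"/"no-cache" mean don't reuse the stored copy without asking;
    "max-age=N" allows reuse for N seconds after retrieval.
    """
    cc = headers.get("Cache-Control", "")
    directives = [d.strip() for d in cc.split(",") if d.strip()]
    if "no-store" in directives or "no-cache" in directives:
        return 0
    for d in directives:
        if d.startswith("max-age="):
            return int(d.split("=", 1)[1])
    return 0  # no instructions given: the browser falls back to heuristics

print(max_cache_seconds(response_headers))  # 3600
```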

So, hopefully that's a little clearer: when you type in a URL or follow a link, if your browser's already been there recently then you'll get the cached version rather than the 'live' version, unless the site specifically told the browser not to cache. This really becomes visible if you have your own website and you or your developer has made changes but you just can't see them; it's all in the cache. Clearing the cache is simple enough, and the option can be found in your browser's menus should you require it; issuing repeated refreshes (CTRL+R on Windows, CMD+R on Apple) will also generally force the browser to reload the live page.

Now, as I said before, the internet is nowhere near as fast as you experience it, and this is not only due to your browser's magic cache; it's also due to internet service providers (mostly residential) using systems called 'transparent proxies'. This is another cache between you and the sites you browse, and this cache is not optional and in many cases will not yield to a server's requests not to cache. The transparent proxies intercept your requests as you make them, look to see if they have a copy of that page and, if so, serve it up as if it came from the server itself. Your browser has no idea it's not a live page, and neither do you.

By using transparent proxy caching, ISPs (Internet Service Providers), especially residential ones, can significantly reduce the amount of bandwidth they use on their upstream (between them and the server). There are also, in this country at least, significant privacy concerns around transparent proxying, because your ISP not only intercepts your requests but can keep a log of them tracked back to your IP address, and therefore back to you, so it's a bit of a double whammy. There is a third layer of caching, known as web accelerators, that is sometimes used on the server side to speed up performance by keeping a cache, but this is under the control of the site owners and as such isn't an issue.

How do you defeat this transparent proxying?

Well, it's not easy, because the ISP has access to all the traffic you send and receive and can easily intercept not only your web requests but your email too, although if your email is stored at Microsoft (Hotmail, Office 365 etc.), Google (Gmail etc.), Yahoo, AOL and so on, then it's already compromised many times over and this really isn't going to make any difference. There are, however, tools that can cut through the proxies by establishing a 'tunnel' between your browser and a server in another country and making browser requests from there. The most common free option is the Tor Project (https://www.torproject.org/), but having said that, the Tor Project, being based in the USA, is probably not going to fill you with overwhelming confidence in the privacy of your data. Still, it's the best we've got unless you want to spend some real money, in which case you can establish real VPNs to real secure proxies and have true anonymity online.

I think it's also worth mentioning that browser plug-ins such as AdBlock, Ghostery and Web of Trust, to name a few, and of course Microsoft's own 'safe browsing' nonsense, also hijack every URL you visit and pass it back to central servers somewhere, giving them too a full history of your browsing habits. By themselves they can't tie that data back to you personally; that is, they know that a PC on the internet with a unique ID visits these websites, but without help from your ISP they can't tie that information specifically back to you as a person, unless of course you log in to your Facebook, Google+ or Twitter account using the same PC, in which case they can now easily tie your browsing habits back to you personally. The only difference is that your ISP has your postal address, and generally people aren't stupid enough to enter that sort of thing into Facebook, Google+ or Twitter.

So here concludes this little discussion around caching, which has taken a sideways step into privacy and anonymity, but it's all connected of course.
