Blog

This is the Blog of the technical experts at GEN and its companies

Web Harvesting, List Building and How to Avoid It

Today at Technical Support

One of our customers raised a ticket at the HelpDesk complaining of telemarketing calls on his managed VoIP telephony system. Technical analysis later showed that the incoming calls were genuine and that there was no security issue with the platform. The customer then mentioned that his email had been inundated with spam starting around the same time, which pointed us to a completely different cause. Over the next few updates and phone calls, he disclosed that he had recently had his company's website redesigned and had paid for some form of 'marketing'.

A quick look at the website made it clear why they had suddenly become victims of a spam attack. The website, although very pretty, had their phone number (actually three of their phone numbers) in plain text on the contact form and again on the About page. Additionally, their email address was hard-coded into the contact form.

A quick search for their telephone numbers on our favourite search engine showed them appearing, in one form or another, on 192.com, yell.com and various other 'indexes' that no one ever uses any more. This was apparently the 'marketing' they had paid for.

Web Harvesting

It's fairly easy to write a program that loads a web page and saves its contents to disk, and it's just as easy to search those contents for email addresses and telephone numbers. Now imagine that same program starting at Google UK with a search for "engineering" and then spidering (following every link), saving the contents of each page and searching them for email addresses and phone numbers. That's exactly what web harvesting is, and spammers use it all the time to compile lists of phone numbers and email addresses, which they then sell to other spammers.
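To show how little effort this takes, here is a minimal Python sketch of such a harvester. It assumes a pluggable `fetch` function (hypothetical; a real one would make HTTP requests) and deliberately crude, UK-centric regular expressions, and a tiny in-memory 'web' stands in for the internet so the example runs offline. It is an illustration of the idea, not any particular spammer's tool.

```python
import re
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Optional
from urllib.parse import urljoin

# Deliberately simple patterns; real harvesters use many more variants.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Assumption: UK-style numbers such as "01234 567890" or "+44 1234 567890".
PHONE_RE = re.compile(r"(?:\+44\s?|0)\d{2,4}[\s-]?\d{3,4}[\s-]?\d{3,4}")


class LinkCollector(HTMLParser):
    """Records the href of every <a> tag on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def harvest(start_url: str,
            fetch: Callable[[str], Optional[str]],
            max_pages: int = 50) -> tuple[set[str], set[str]]:
    """Breadth-first spider from start_url, regex-scanning each page's raw HTML."""
    seen = {start_url}
    queue = deque([start_url])
    emails: set[str] = set()
    phones: set[str] = set()
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        html = fetch(url)  # hypothetical fetcher: returns HTML or None
        if html is None:
            continue
        fetched += 1
        emails.update(EMAIL_RE.findall(html))
        phones.update(PHONE_RE.findall(html))
        collector = LinkCollector()
        collector.feed(html)
        for link in collector.links:
            absolute = urljoin(url, link)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return emails, phones


# A tiny in-memory "web" stands in for real HTTP requests.
toy_web = {
    "https://example.com/": '<a href="/contact">Contact</a> <a href="/about">About</a>',
    "https://example.com/contact": "<p>Call 01234 567890 or email sales@example.com</p>",
    "https://example.com/about": "<p>Head office: +44 1234 567890</p>",
}
emails, phones = harvest("https://example.com/", toy_web.get)
print(sorted(emails))
print(sorted(phones))
```

Starting from the home page, the loop follows both links and picks up the email address and both forms of the phone number. Pointed at real pages instead of `toy_web`, the same loop simply keeps going until `max_pages` is reached.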

So how do you prevent your contact information being harvested? 

It's actually as simple as you'd expect: do not, under any circumstances, put your email address or telephone number on your website. Ever. In days gone by we could put the telephone number in an image to obscure it, but with modern OCR (optical character recognition) engines such as Tesseract, even that no longer works.

If you absolutely must have your telephone number on your website, we can shield it by formatting it so that simple searches won't see it (for example, by breaking it into several parts and putting each part in a separate DIV or P element), or we can hide it behind a server-side request protected by a CAPTCHA. Both options, however, tend to confuse potential customers, so do they give any real benefit? Perhaps against programmatic web harvesting, but they won't stop list builders from Asia.
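How well the 'split it up' trick works depends entirely on how the harvester reads the page. The minimal Python sketch below assumes the same kind of crude UK phone-number regex a naive harvester might use, and a hypothetical number split across three DIV elements. A regex over the raw HTML misses it, but one extra step that strips the tags first (here with the standard library's `html.parser`) finds it again.

```python
import re
from html.parser import HTMLParser

# Assumed UK-style pattern, the kind a naive harvester might use.
PHONE_RE = re.compile(r"(?:\+44\s?|0)\d{2,4}[\s-]?\d{3,4}[\s-]?\d{3,4}")

# Hypothetical markup: the number is split across three <div> elements.
SPLIT_HTML = "<p>Call us:</p><div>01234</div><div>567</div><div>890</div>"


class TextExtractor(HTMLParser):
    """Collects the text content of a page, discarding the tags."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []

    def handle_data(self, data):
        self.parts.append(data)


def visible_text(html: str) -> str:
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()  # flush any buffered text
    return "".join(extractor.parts)


naive = PHONE_RE.findall(SPLIT_HTML)                   # scans raw HTML
rendered = PHONE_RE.findall(visible_text(SPLIT_HTML))  # scans text only
print(naive)
print(rendered)
```

The raw-HTML scan finds nothing, while the tag-stripping scan recovers `01234567890`. That is why the trick only maybe helps against programmatic harvesting: it defeats the laziest scripts, not a slightly smarter one.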

List Builders from the Far East?

Yep. Qualified lists can be bought for not a great deal of money from certain companies that don't use programmatic harvesting at all; instead, they have a room full of staff who use search engines to find companies and then compile lists. For example, if you wanted a list of dentists in the South East, your custom list could be provided for a few hundred dollars. It's not going to be perfect, but it will be far more accurate than web harvesting, because someone has actually done some research.

Is it legal? Yep, it sure is: all the legislation to date protects only individuals, not businesses, and in any case the current legislation is next to worthless given the global nature of the internet.

Contactless Contact 

Contact forms without contact information? Sounds like trouble, but in fact it isn't: a well-designed, fast contact form will usually do the trick just fine. If you want that instant response, consider an inline chat system such as tawk.to.
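The point of a contact form is that the email address lives only on the server. The minimal Python sketch below assumes a URL-encoded POST and a hypothetical `CONTACT_RECIPIENT` setting; it turns a submission into an `EmailMessage` addressed to that recipient, while the form markup sent to the browser contains no address at all. Actually sending the message (for example with `smtplib.SMTP.send_message`) and adding spam protection to the form are left out.

```python
from email.message import EmailMessage
from urllib.parse import parse_qs

# Hypothetical setting: in practice this would live in server-side config
# or an environment variable, never in the page markup.
CONTACT_RECIPIENT = "enquiries@example.com"

# The form posts back to the server; note there is no email address in it.
CONTACT_FORM_HTML = """<form method="post" action="/contact">
  <input name="name" required>
  <input name="reply_to" type="email" required>
  <textarea name="message" required></textarea>
  <button type="submit">Send</button>
</form>"""


def build_contact_message(form_body: str,
                          recipient: str = CONTACT_RECIPIENT) -> EmailMessage:
    """Turn a URL-encoded form submission into an email for the recipient."""
    fields = parse_qs(form_body)
    name = fields.get("name", [""])[0].strip()
    reply_to = fields.get("reply_to", [""])[0].strip()
    text = fields.get("message", [""])[0].strip()
    if not (name and reply_to and text):
        raise ValueError("name, reply_to and message are all required")

    msg = EmailMessage()
    msg["To"] = recipient
    msg["From"] = recipient      # sent from our own address...
    msg["Reply-To"] = reply_to   # ...but replies go to the visitor
    msg["Subject"] = f"Website enquiry from {name}"
    msg.set_content(text)
    return msg


msg = build_contact_message(
    "name=Jo+Bloggs&reply_to=jo%40example.org&message=Please+call+me")
print(msg["To"], msg["Reply-To"], msg["Subject"])
```

Putting the visitor's address in Reply-To rather than From means a simple 'Reply' goes straight back to them, while the message itself is sent from your own domain.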

But what about Google Places for Business, or Bing's equivalent?

To have your business listed in either, you need a phone number, but it doesn't have to be geographic and it doesn't even have to work. We're listed in both, of course, but with a non-geographic (08700) number that plays a message telling callers to head over to the website. That works just fine, given that we receive almost no calls on that number over the course of a year.

 

 
