GEN - Big Data Processing

GEN have invested significantly in our Data Processing facilities and currently run two core database processing clusters: one using Microsoft SQL Server 2014 and one based on MariaDB AX and ClusterIX.

This scale of processing power allows us to work through vast amounts of data quickly, returning results from data cubes and data lakes in seconds instead of hours. It also allows us to take on subcontracted Data Processing from companies who want one-off or recurring processing. Further to this, we hold locally stored global data sources for key data, allowing us to validate, cross-process and reconcile customer data against industry-standard data sources.

Our Local Data Sources

UK Post Code Data: The UK Post Code Database gives us data on every street in the country including, but not limited to, its map location, number of properties, business or residential status, District, Ward and Locality data, routes and distances, and County/City Council zoning. From this data we have grid references and co-ordinates for every street in the UK in various postcode formats.

 

UK Demographic Data: With the data from Cameo we can, by postcode or locality, determine demographics such as average house price, affluence and property types, as well as classifications of people within those zones such as working, non-working and retired, plus ethnicity and language data.

UK Census Data: The Census gives us reasonably reliable per-household data such as number of members, ethnicity, languages and religion, as well as employment status and zoning data.

UK Companies Data: Every limited company in the UK is available, together with Directors' information and SIC codes.

UK Telephony Data: CLI-to-exchange and provider data, as well as broadband and service availability by postcode or CLI, and in many cases the current broadband and telephony provider. We also have some data on SIP customers, coverage, etc.

UK Financial Data: Affluence, debt, finance and income data by street, locality or area, giving detailed spend and debt statistics street by street. Knowing who your customers are, what they spend and earn, and how likely they are to consider your product or service can make all the difference.

Email and Address Data: We have around 7 million email addresses, most of which are linked to property addresses. Whilst we can't and won't sell these details, we can use them to validate your lists against locality, provider, household and more.

Combining these datasets allows us to progressively focus processing on the specific needs of the customer. A typical example would be a customer with a calling list, which we can process against our Postcode, Demographic and Census data to grade prospects by their likelihood of purchasing, or need of, a particular offering. Another example would be to wash a customer's data, returning only those records where sales are possible or likely. Validation of email lists, customer data, homemovers and so on is all possible.
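As an illustration of this grading approach, the sketch below scores a calling list against a stand-in demographic table. The district codes, affluence figures, scoring rule and threshold are entirely hypothetical; in practice the lookups would run against the database clusters rather than an in-memory dictionary.

```python
import csv
import io

# Hypothetical stand-in for a demographic dataset keyed by postcode
# district -- illustrative values only, not real GEN data.
DEMOGRAPHICS = {
    "NG1": {"affluence": 0.8, "avg_house_price": 310_000},
    "NG2": {"affluence": 0.5, "avg_house_price": 210_000},
    "NG3": {"affluence": 0.3, "avg_house_price": 150_000},
}

def grade_record(postcode_district: str, target_affluence: float) -> float:
    """Score 0..1 for how closely a record's district matches the
    affluence profile of the offering (1.0 = perfect match)."""
    demo = DEMOGRAPHICS.get(postcode_district)
    if demo is None:
        return 0.0  # unknown district: cannot grade
    return 1.0 - abs(demo["affluence"] - target_affluence)

def wash_list(csv_text: str, target_affluence: float,
              threshold: float = 0.7) -> list:
    """Return only the rows graded at or above the threshold."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [r for r in rows
            if grade_record(r["Postcode"].split()[0],
                            target_affluence) >= threshold]

calling_list = """Name,Postcode
Fred Bloggs,NG1 1AA
Joe Soap,NG3 2BB
"""
kept = wash_list(calling_list, target_affluence=0.9)
```

The same shape generalises to any of the datasets above: a lookup keyed on postcode, CLI or company number, a scoring rule agreed with the customer, and a filter over the supplied list.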

Specialist Processing

In the past we've taken on some challenging processes, including facial recognition across 960 hours of video against a database of employees; linking a list of 14 million email addresses back to full addresses and then appending average indebtedness; identifying specific broadband availability by postcode and house number; and leveraging mapping data to match GPS data streams to a specific street, among many more. The harder it is to do, the more we like it!

We take on data processing on a case-by-case basis and have strict controls in place to ensure information is tightly controlled both into and out of the system. Automated processing can be set up using FTP data transfer to receive data, process it and return it via the same method, which allows companies to repeat processing daily, weekly or whenever needed, ensuring up-to-date data.
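The receive-process-return cycle described above can be sketched as follows. The directory layout and the wash rule (dropping rows without a phone number and de-duplicating on it) are illustrative assumptions; the file transfer in and out of these directories would be handled separately by the customer's chosen method.

```python
import csv
from pathlib import Path

def process_file(src: Path, dst_dir: Path) -> Path:
    """Wash one inbound CSV: drop rows without a phone number,
    de-duplicate on the Phone column, write to the outbound dir."""
    seen = set()
    kept = []
    with src.open(newline="") as fh:
        reader = csv.DictReader(fh)
        for row in reader:
            phone = row.get("Phone", "").strip()
            if phone and phone not in seen:
                seen.add(phone)
                kept.append(row)
        fields = reader.fieldnames
    dst = dst_dir / src.name
    with dst.open("w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=fields)
        writer.writeheader()
        writer.writerows(kept)
    return dst

def run_cycle(inbound: Path, outbound: Path) -> list:
    """Process every CSV waiting in the inbound directory; results
    land in the outbound directory for collection."""
    return [process_file(p, outbound) for p in sorted(inbound.glob("*.csv"))]
```

A scheduler (cron, Task Scheduler, etc.) calling `run_cycle` daily or weekly gives the repeat-processing pattern described above.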

If you have a big data project and would like to discuss how we can help then contact us today! 

API

GEN have for decades now had a comprehensive data processing and analysis service, and data supply, both inwards and outwards, can be via:

SFTP (Secure File Transfer Protocol), a mature system for transferring data into, out of, and synchronised with a directory structure. SFTP is widely supported and easily automated, allowing programmed delivery and retrieval of data over an encrypted connection. SFTP can support authentication by username/password and/or client certificate, and further allows restriction by IP.

WebDAV (Web-based Distributed Authoring and Versioning), a less mature but equally flexible protocol that allows file transfer over standard HTTPS channels. WebDAV is again easily automated, but its support is still limited to Windows and Linux (so far).

RSYNC (Remote SYNC), a mature system for transferring data between systems which generally assumes a synchronisation approach. RSYNC is always automated and can operate over secure channels, but its security is weaker, so it is generally only operated over encrypted VPNs.

API (Application Programming Interface), an evolving technology allowing different systems to exchange data, in an automated way only, over an encrypted HTTPS channel. APIs allow disparate systems to exchange data in a simple 'transfer' mode, but also permit more complex functionality such as query-based retrieval and realtime processing.
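As a sketch of how the SFTP route above is typically automated, the snippet below generates a batch file for the standard OpenSSH sftp client and runs it non-interactively. The host name, username and directory names are placeholders, not GEN's actual endpoints, and key-based authentication is assumed to have been set up in advance.

```python
import subprocess
import tempfile
from pathlib import Path

def sftp_batch(upload: str, download_dir: str) -> str:
    """Build an sftp command script: deliver one file for processing,
    collect any results waiting in the outbound directory."""
    return (
        f"put {upload} inbound/\n"      # directory names are assumptions
        f"get outbound/* {download_dir}/\n"
        "bye\n"
    )

def run_transfer(upload: str, download_dir: str,
                 host: str = "sftp.example.com",   # placeholder host
                 user: str = "acme") -> None:
    """Hand the batch script to the OpenSSH sftp client (-b runs it
    non-interactively; key-based auth assumed)."""
    with tempfile.NamedTemporaryFile("w", suffix=".batch",
                                     delete=False) as fh:
        fh.write(sftp_batch(upload, download_dir))
        batch = fh.name
    subprocess.run(["sftp", "-b", batch, f"{user}@{host}"], check=True)
    Path(batch).unlink()
```

Scheduling `run_transfer` from cron or Task Scheduler gives a fully unattended delivery-and-collection cycle over an encrypted channel.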

GEN's Arkane Data Exchange API

We have, since our inception, had a need to exchange information between customers, and as some may know we started the company in the world of EDI over X.400. Our first data exchange API was ASCII-based and ran over X.25, allowing customers who couldn't stretch to an X.400 connection to still exchange data with X.400 EDI companies. X.400 is now long gone, as is X.25, but we've maintained and expanded the API year after year, and today the Arkane API allows customer A to send data securely and confidently to GEN or to customer B. The Arkane API allows virtually any payload to be exchanged, whether that be a CSV, XML, JSON or binary file, or perhaps a command, query or request. Our API also affords an interface into our Big Data Cluster for realtime data appending, validation and cleansing, as well as customised functions such as checking a postcode for broadband connectivity or mobile signal coverage.

An example API request would be:

POST /arkane.php?target=M487395P4C&urn=U123456799&type=CSV HTTP/1.1
Key: AAAAAAA
Content-Type: text/plain
User-Agent: Some System
Accept: */*
Cache-Control: no-cache
Token: dc803a35-492e-40a1-93c2-a5395328523e,3952e19c-bd09-4c39-9977-a5115c3cd5ec
Accept-Encoding: gzip, deflate
Content-Length: 88
Connection: keep-alive

Title,Forename,Surname,Phone
Mr,Fred,Bloggs,01159339001
Mr,Joe,Soap,01159339000

In our example we are sending a CSV payload to target (customer/system) M487395P4C, with a URN (Unique Reference Number / transaction number) of U123456799. The actual payload can be seen at the bottom. Our system would receive the payload, store it, and then either transmit it to the target system or hold it until the target system requested it. This example is of course completely fictitious, but serves to show the simplicity of the API in its basic operation.
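For readers who would like to issue such a request programmatically, here is a minimal Python sketch using only the standard library. The host name is a placeholder (real endpoints are issued during registration), and the key and token values are the fictitious ones from the example above.

```python
import urllib.request

# Placeholder endpoint -- the real host is supplied by GEN at registration.
ARKANE_URL = "https://api.example.com/arkane.php"

def build_arkane_request(target: str, urn: str, payload: str,
                         key: str, token: str) -> urllib.request.Request:
    """Construct the POST shown above: routing in the query string,
    authentication in the headers, the CSV payload in the body."""
    url = f"{ARKANE_URL}?target={target}&urn={urn}&type=CSV"
    return urllib.request.Request(
        url,
        data=payload.encode("utf-8"),
        method="POST",
        headers={
            "Key": key,
            "Token": token,
            "Content-Type": "text/plain",
        },
    )

payload = "Title,Forename,Surname,Phone\nMr,Fred,Bloggs,01159339001\n"
req = build_arkane_request("M487395P4C", "U123456799", payload,
                           key="AAAAAAA",
                           token="dc803a35-492e-40a1-93c2-a5395328523e")
# urllib.request.urlopen(req) would transmit it; omitted here because
# the endpoint is fictitious.
```

The same request could equally be sent from curl, PowerShell or any HTTP-capable system; the API deliberately requires nothing beyond standard HTTPS.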

Our Arkane API is heavily authenticated and requires customers to complete a registration process, submit a data-flow diagram and document their requirements, after which our team will set up authentication tokens and enable the features requested, and/or build the application required.