Wednesday, 19 September 2012

Postcodes to grid references with Code-Point Open

I've recently been playing around with using the NBN Gateway webservice to implement something along the lines of ZipCodeZoo or the Postcode plants database. This project will involve using NBN Gateway data to provide a list of species for a given taxonomic group within a few kilometres of your location (more later).

Smartphones and modern browsers are able to report their current location for use in such services, bit what if you want to check what species are found near your home while you're at work? Being able to use post codes would be an additional  nice to have feature.

As part of the UK Government's Open Data initiatives the Ordnance Survey have released Code-Point Open - which is essentially a series of CSV files with data that allows you to move from a postcode to an OS Grid Reference.

An Aside....
The OS Grid Reference can be converted to OSGB36 latitude and longitude coordinates, which can then be converted to WGS84 coordinates (e.g. to map postcodes on Google Maps). This involves some pretty in depth mathematics, but thankfully people have made a number of libraries available for doing this, such as this JavaScript library and this PHP library.
The download of this data package contains in the Data folder 120 CSV files, named after the first text portion of the postcode (some information of the structure of UK postcodes).

This isn't really ideal (I want to get these into a MySQL database) so first of all I had to merge all of these files into a single CSV file. On a Mac or Linux machine this is as simple as navigating to the Data folder in the Terminal and executing the following command:
cat *.csv > aall.csv
The resulting all.csv file is pretty large (156MB-ish). Calling it aall.csv instead of all.csv just puts it at the top of the list which appeals to my aesthetic sense.

The next step is to create a new table in MySQL:

Then we import our aall.csv into MySQ:

This imports all of the data correctly into MySQL - however some of the fields are enclosed by quotation marks ("). All that remains for us to do is to trim these from the appropriate columns:

Now that the data is all in MySQL we can get the OS Grid Reference (i.e. the northings and eastings columns for a postcode) using the following query: