Thursday, 24 October 2013

Testing the Arduino Intervalometer

So today I finally got around to testing the Arduino Intervalometer that I built last month (we might soon be using it in a project unrelated to timelapse work).

So here, for your viewing delight, are 564 frames of the Natural History Museum's Darwin Centre (plus a few photographs of Lyme Regis that were accidentally left on the memory card).

The images (JPEG) were converted to a video on an Ubuntu virtual machine using the command:

mencoder "mf://*.JPG" -mf fps=25 -o timelapse.mpg -ovc x264

A modified version of this project, the Arduino Intervalometer and Camera Trap, appears in my book Arduino for Biologists.

Wednesday, 16 October 2013

EXIF Custom: photo metadata import for Scratchpads and Drupal

In a recent paper (EXIF Custom: Automatic image metadata extraction for Scratchpads and Drupal) I described a Drupal module I have written that allows metadata embedded within images to be imported into Drupal fields. This, along with bulk image upload tools, allows for rapid publication of images.

The introduction to the paper is reproduced here:
"The use of embedded image metadata is becoming widespread in the biodiversity informatics community (e.g. Stafford et al. 2010 & Tulig et al. 2012), and is frequently used to describe the subject and licensing of images as well as for recording the 'tombstone metadata' (e.g. Introduction to Metadata) - when the image was created, last edited, who created it, and where and how it was created.
The eMonocot project makes use of the Scratchpads (Smith et al. 2011) infrastructure as a tool for collecting, curating, and creating content to be harvested by the eMonocot portal. As part of this project hundreds of images with embedded metadata are being uploaded to a number of different Scratchpads, combined with images directly uploaded by partner communities, and exported en masse to the portal. For this to be technically feasible at scale, images from varied, disparate sources need to have their metadata standardised as part of the bulk upload process.
There are three widespread image metadata formats that can be handled by this module. A subset of the EXIF standard (Camera and Imaging Products Association Standardization Committee 2010) specifies a method for tagging of images with metadata. This is widely used by device manufacturers to record both the make and model of the image capture device and also the device's settings when the image was captured (e.g. focal length, flash duration). The eXtensible Metadata Platform (XMP) was originally developed by Adobe Systems Incorporated and later adopted by the International Standards Organisation as ISO 16684-1:2012. It uses a data model defined in Adobe 2012 which is serialised in XML when embedded into files. The International Press Telecommunications Council defines the IPTC Core and Extension metadata standards (IPTC 2010).
An existing Drupal module, Exif, provides a mechanism for displaying embedded image metadata on Drupal nodes, but does not provide a mechanism for mapping the metadata into fields. The import of embedded metadata into Scratchpads/Drupal fields is a requirement of the eMonocot project and is useful for the wider Scratchpads community, as it allows these data to be easily used by other Drupal modules (e.g. Views) and in other Scratchpads-specific functions, such as our on-going work on implementing the ability to export data via DarwinCore Archives (GBIF DarwinCore Archives). There is a comparison of these two modules (and potentially other similar Drupal modules)."
EXIF Custom on

Read the full paper (Biodiversity Data Journal)

Monday, 16 September 2013

Wednesday, 4 September 2013

WikiData Workshop: 28th September, London

  • Date: Saturday 28 September 2013, time TBC
  • Venue: Development House, 56-64 Leonard Street, London EC2A 4LT
  • Participants: All welcome!
  • Contacts: Any questions? Please contact Katie Chan on 020 7065 0990
 Further details and registration at:

Monday, 2 September 2013

Deleting all nodes of a content type using drush and devel

While developing the open source data logger (started here but expanded to post from Arduino to Drupal) the need to delete all content of a given type arises fairly regularly. This can be done through the user interface with the Devel module, but it is much quicker to use Drush. Asking Devel's generate-content command (genc) to generate 0 new nodes with the --kill flag deletes all existing nodes of the given type without creating anything new:

drush genc 0 --kill --types=article

Sunday, 1 September 2013

Intervalometer for Canon 450D

This is a project to create an intervalometer for a Canon 450D (aka Digital Rebel XSi) camera using Arduino. An intervalometer triggers a camera to take a photograph each time a set time interval elapses, and is therefore a useful tool for creating time-lapse videos using a DSLR.

The Canon 450D uses a 2.5mm stereo connection for off-camera control of autofocus and the shutter (it also supports infra-red, although this project only uses the 2.5mm connection). The three contacts of the 2.5mm jack are as follows:

sleeve (base)    ground
ring             focus
tip              shutter

The focus and shutter connections sit at a positive voltage relative to ground in their normal state, and can be triggered by connecting them to ground. Grounding the focus line is equivalent to half-pressing the shutter button on the camera; grounding the shutter line is equivalent to fully depressing it.

While it would be possible to connect these lines directly to Arduino pins, I prefer to keep the camera and Arduino electrically isolated - this allows experimental Arduino code to be run without any risk of damaging the camera's on-board circuitry. This is achieved by using a relay to make the connections between ground and both focus and shutter.

I have used Arduino pins 11 and 12 for the shutter and focus respectively. The first code I used to test the device was from Scott Kirkwood (CanonSlrIntervalometer), which works fine, but causes some unusual results on starting the device, as pin 13 is also used by the Arduino for the status indicator LED. I have made the code I am using available on GitHub (Canon Intervalometer) and will update it as this project develops.

This first attempt is quite limited, and requires the Arduino to be reprogrammed whenever the interval time needs changing - but the code works and triggers the camera as expected. I have packaged the device in an enclosure, with just the USB connector of the Arduino (for power and/or reprogramming) and a panel-mounted 2.5mm stereo connector exposed.

Parts List
Arduino Uno (Rev 3)
2x Relay
2.5mm stereo connector (panel mount)
Connecting wires

Total Cost
Around £20 if you shop around and already have connecting wires and tools.

Update: A modified version of this project, the Arduino Intervalometer and Camera Trap, appears in my book Arduino for Biologists.

Tuesday, 20 August 2013

Extension to Wikimedian-in-Residence project at the Natural History Museum

At the end of the initial Wikimedian-in-Residence project John's contract with the museum was extended to the middle of January 2014. He will be working part time (50%) on the Wikimedian-in-Residence project, funded by Wikimedia UK, with the rest of his time spent on a project on the abyssal megafauna of the Clarion-Clipperton Fracture Zone with Gordon Paterson.

The official project page on Wikipedia can be found here: Wikipedia:GLAM/Natural History Museum and Science Museum

There is some background to the project and other information on my website: Wikipedian in Residence (Natural History Museum and Science Museum)

Transcribing letters from the NHM archive using Wikisource

As an experiment, John Cummings, Wikimedian-in-Residence at the Natural History Museum, has made a few selected scans from the museum's archive available for transcription on Wikisource.

To familiarise myself with Wikisource I have transcribed the following letter from Charles Harte to Walter Rothschild. Harte worked as an impresario for Mademoiselle Paula (the famous reptile conqueror), and was offering Rothschild the chance to buy a snake from her collection.
You can read the transcription over at wikisource: Mdlle Paula, the famous reptile conqueror
(Click on the page numbers at the bottom of that page to view the transcriptions)

There is some background to this letter in this blog post on the NHM website: Item of the month (October 2011) Paula conquerors a time gone by and an old press cutting from the Otago Daily Times: Reptile Handling for a Livelihood.


There is a list of other letters from the archive that you can have a go at transcribing on our GLAM project page.

Monday, 19 August 2013

Informatics Horizons: A digital Natural History Museum in 10 years' time

This was the final presentation at the Informatics Horizons event, and is a speculative look at what the digital offer of a natural history museum (not necessarily just the Natural History Museum) might look like in the future. Some key themes include not doing by hand what a computer can do for us (referring back to Vince's  presentation [video]), a closer synergy between scientists, public engagement and citizen scientists, and how we need to think - scope, scale, speed. Much of the inspiration for this talk came from Michael Edson's  talk at GLAM-WIKI 2013 (thanks!).

Informatics Horizons: Building Highways in the Informatics Landscape

This is the second talk I gave at the Informatics Horizons event at the Natural History Museum - which gives a brief overview of the DarwinCore Archive as a method for sharing biodiversity data and the advantages of this, primarily in supplying data to aggregators such as the eMonocot portal.

The slides are on SlideShare:

Thursday, 1 August 2013

Informatics Horizons: Scratchpads & Citizen Science

This is the first of my talks from the Informatics Horizons event at the Natural History Museum, London, covering some work we have been doing on linking Scratchpads and Citizen Science. The video of the talk is on YouTube:

... and the slides are on SlideShare:

The talk summarises what we have done with citizen science in Scratchpads so far. The project to create a mobile app for recording observational data in Scratchpads is a collaboration with anymals+plants.

An example of an automated feed from user designed hardware might be along these lines: Open Source Data Logger.

Monday, 22 July 2013

NHM Informatics Horizons

Informatics Horizons is an event on biodiversity informatics at the Natural History Museum, London focussing on the museum's work in this field. I will be giving three talks (one with Vince Smith).  John Cummings (Wikimedian in Residence) at the museum will also be talking. The event will be live streamed here on the day.

Sunday, 7 July 2013

The Informatics Landscape

Something I wrote for somewhere else:

The ‘informatics landscape’ has existed for centuries in card indexes and for decades in computer databases across the globe. These datasets were highly fragmented, geographically distant, hard to access and had no common data models. The 21st century has seen a number of projects try to aggregate information about biodiversity - allowing different datasets to be standardised, integrated and searched as one. The Global Biodiversity Information Facility (GBIF) does this for specimen and observation data, the Biodiversity Heritage Library aims to digitise historic literature, the Encyclopedia of Life is a portal into these data for non-specialists, and Scratchpads provide a tool for doing biodiversity research online.

The combined size of these datasets is impressive, and growing significantly. In order for them to be useful we must understand what species each item relates to. This seemingly trivial task is complicated by the same organism having multiple names (synonymy) or by the same name being applied to multiple species (homonymy). To resolve these issues, nomenclatural databases are being constructed (e.g. ZooBank for animals, the International Plant Names Index for plants) that, when properly integrated into the landscape, will allow people to find all of the information about a species, no matter what name was used in the search term or the original work.

Alongside these core features (specimens, observations, literature, communication) of the landscape there is a whole ecosystem of other projects that people use to contribute data to the big projects (e.g. NBN Gateway) or analyse the data they contain (OBOE). This category also contains bibliographic tools such as Mendeley, dissemination tools such as Wikipedia, and the user-generated documentation, experience, advice and frustrations spread across blogs, Twitter and other social networking sites.

Scratchpads allow you to combine your data with information from EoL, GBIF and other partners. You can also share your data if you choose to with EoL, GBIF and any other project that accepts DarwinCore Archives, or perform analyses of your data using OBOE. We also make use of services from ZooBank, IPNI and others to make our tools more useful for our users.

Saturday, 18 May 2013

Writing for Wikipedia: an introductory workshop

An event by John Cummings - our Wikimedian in Residence at the Natural History Museum and Science Museum.

Originally posted at Physics and Maths info @ Imperial College London Library: Writing for Wikipedia: an introductory workshop

This 90 minute workshop, led by John Cummings (Wikimedian in Residence at the Natural History Museum and Science Museum) and other Wikimedia trainers will involve a short general introduction to the Wikipedia projects and a discussion of how they are created and developed, followed by a more in-depth practical session involving learning the basics of editing and engaging with other contributors.
During the session, Dr Steve Cook (Senior Teaching Fellow, Biology, Imperial College London) will talk about how he uses Wikipedia with undergraduate students and Professor Henry Rzepa (Professor of Computational Chemistry, Imperial College London) will also talk about his work with Wikipedia.
This workshop is aimed at academic staff, researchers, postdocs, teaching fellows, learning technologists and postgraduate research students.
Thursday 6 June 2013
10.00am – 11.30am
Central Library, South Kensington campus, Training Room 1
To book:
If you would like to attend please email Andrew Day to book your place. Joining instructions will be sent on booking.
For further information email Jenny Evans.

Silicon Snake Oil? Hindsight is always 20-20

Clifford Stoll is perhaps best known for tracking down Markus Hess - a German black-hat computer hacker recruited by the KGB to provide US military secrets to the Soviets. This story is told in Stoll's own book The Cuckoo's Egg, in Katie Hafner and John Markoff's Cyberpunk, and in the Nova episode embedded below.

Recently I have started to read Stoll's second book Silicon Snake Oil (1995) - slated, at least by the publisher, as "The first book to question the inflated claims - and hidden costs - of the Internet."

Strong words indeed. But nearly 20 years on - how do Stoll's often charmingly  dystopian predictions hold up? Not very well.

" Well, I don't believe that phone books, newspapers, magazines, or corner video stores will disappear as computer networks spread. Nor do I think that my telephone will merge with my computer, to become some sort of information appliance."
A pretty major prediction on page 11, and one that has clearly not stood the test of time: California stops automatic phone book delivery, newspapers struggle to find sustainable financial models, Blockbuster moves towards a more retail than rental model. I won't provide a link to the telephone/computer hybrid information appliance - you're possibly reading this on it.
"Whether yo-yos, books, records or insurance, there are good reasons why business doesn't work over the Internet."
Of course this was written in a world before Amazon, eBay, comparethemarket and Yoyo Shop.

Stoll is an astronomer, so perhaps his ideas on scientific research turned out to be closer to the mark?
"Researchers naturally save their best work to publish in journals and books, realizing that the review process ensures that better papers make it into print. They're unlikely to post good, original stuff to the network first; somebody might swipe their material."
Well - concerns over data swiping are still around - but we also publish material online before publication using preprint servers such as ArXiv and are moving, albeit slowly and uncertainly, towards online, open, peer review.

Thankfully Stoll sees the funny side of all of this:
"Of my many mistakes, flubs, and howlers, few have been as public as my 1995 howler.

Wrong? Yep.
At the time, I was trying to speak against the tide of futuristic commentary on how The Internet Will Solve Our Problems.

Gives me pause. Most of my screwups have had limited publicity: Forgetting my lines in my 4th grade play. Misidentifying a Gilbert and Sullivan song while suddenly drafted to fill in as announcer on a classical radio station. Wasting a week hunting for planets interior to Mercury's orbit using an infrared system with a noise level so high that it couldn't possibly detect 'em. Heck - trying to dry my sneakers in a microwave oven (a quarter century later, there's still a smudge on the kitchen ceiling)

And, as I've laughed at others' foibles, I think back to some of my own cringeworthy contributions.

Now, whenever I think I know what's happening, I temper my thoughts: Might be wrong, Cliff...

Warm cheers to all,
-Cliff Stoll on a rainy Friday afternoon in Oakland"
Here's some newer, but just as enthusiasm-rich, Stoll. If you want to predict the future, ask an experienced kindergarten teacher:

Wednesday, 8 May 2013

Lyme Regis Fossil Festival: The Preparations

Several people (TetZoo, NHM) have already blogged about the Lyme Regis Fossil Festival and how great it is. Instead of doing the same again, here are some things that happened before and during the Festival, out of the public eye, mainly relating to the work of the official friends of the Fossil Festival: The Buckland Club.

For the second year running the Festival has had WiFi internet available in the marquee and around the Marine Theatre and the Cobb Centre. Last year we were fortunate to have Pao and Victor from the Quick Mesh Project in Barcelona to help. This time it was just me and Sam Bennet. The first full day of our holiday was spent wiring up the nodes to test them in the office of the Lyme Regis Development Trust, the second was spent with the ever helpful Chris (who had also taken a day off work) around the town - installing the nodes on buildings and inside the marquee and running ethernet cables through some troublesome runs. By lunchtime on day 2 everything was good to go.

Satellite Link
Eddie and Tony (media technicians at the NHM) set up a satellite link on the Cobb to beam live video back to the Nature Live events in the Attenborough Studio. This involved setting up some kit very quickly, followed by a far greater amount of time spent hunting for the satellite with the dish.

In this process there was a rather unfortunate exchange between Eddie and a seagull:

The Town Prepares
The event is celebrated by many places throughout the town. When in Lyme Regis we are all the adopted family of Rikey and Paddy. Rikey runs the famous Alice's Bear Shop, Paddy runs the infamous Fossil Workshop in its basement. For the festival weekend Paddy was allowed to have the window display.

Long lost potential relatives
The Festival is also a great time to meet friends old and new. After four years of nagging, Richard Edmunds (Jurassic Coast team / my Jurassic Dad) and I finally posed for a photograph together.

The Buckland Club contingent this year was a relatively small 14, requiring three properties in Lyme Regis organised by the legendary Jackie Skipper (plus the occasional sofa-surfer from Charmouth). The numbers were swollen on the Saturday and Sunday nights as we threw first a party, then a BBQ, for ourselves and a rather large number of people from the NHM and others associated with the Festival. We had our favourite party house, named (at least by us) Cauli and Flowers after the fruit and vegetable shop that used to be below. No two rooms of the house are on the same level - making it resemble an Escher artwork.

The Cyclists
As if turning up to spend your holiday teaching people wasn't enough dedication, Aoife and Sally decided to cycle to Lyme Regis from London (there's still time for a charitable donation here). There's more about their trip on Twitter at #london2lyme.

Sunday, 21 April 2013

An article is born.....

Way back in October 2008 I started a Wikipedia article on the Rosemary Leaf Beetle (Chrysolina americana) with barely a paragraph of text and a couple of references. It looked like this:

I then forgot all about it, until I started reviewing some of my old Wikipedia edits since we got a Wikipedian in Residence at the Natural History Museum. Since my original and minor contribution the article has grown, although it still could be improved upon greatly. Out of curiosity I pulled up what all of the edits have looked like - it's quite interesting to watch the community come together and make continual gradual improvements.

The final result (so far) is this - I'd encourage the coleopterists among you to go and improve it some more....

Creative Commons Licence
An article is born..... by Ed Baker is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

Friday, 5 April 2013

Preserving knowledge: the brain dump

Once every few weeks I will have a conversation with somebody that goes along the lines of "we're not training any new X-ists and the existing people who know about X will soon retire". Quite often the experts in X have retired already and the main concern is that they will take a huge volume of knowledge to the grave that will take any successor many years to pick up. This is usually combined with some tale of the number of people working on X being in a steady decline.

Ignoring the fact that the number of people working on X may be in decline, and assuming that X is a scientific discipline, however parochial, there are some things to consider:

1) Much, or at least some, of the knowledge a person has will be in their publications. This knowledge will (should) not be lost and is available to the scientific community in perpetuity.

2) Skills such as manual or mental dexterity of any kind require practice, and provided some documentation exists of how to perform the task, repeated practice should do the trick. Perhaps it's dissecting something, perhaps it's identifying something using a key.

3) What's left? Unpublished/unpublishable knowledge! 

It could be argued that this knowledge should be, or should have been, published but that's not really a satisfactory answer. There are pieces of knowledge that in isolation are useful to know, but would not be publishable alone. This knowledge is outside of the concept of a publon, it's a metaphorical quark.

Some platforms do exist for getting these ideas down in digital ink - one of the ideas we used to throw around in the early days of Scratchpads was creating a tool for taxonomists to share everything they might know and want to share (a brain dump).

It is impossible to predict what will be useful to whom; some of the most read posts on this blog are quite arcane in their own way (Processing and USB ports /dev/ttyACM0, /dev/ttyACM1, .... doesn't sound riveting, even to me). The fact is they may be useful to someone - and they don't take long to put online (the blog post I just mentioned was as much an aide-mémoire for myself as an intentional sharing of content).

Some may argue that the real skill of an expert is in figuring things out for oneself, and indeed this is a crucial part of becoming an expert in anything. I would argue however that this exploration is potentially more fertile if it is done in the process of creating new knowledge, not just the rediscovery of what people used to know.

It's a pleasant surprise, then, to see that a 104-year-old gardener is sharing his knowledge, learnt over nearly ten decades, with whoever might ask for it: Gardener, 104, takes to Twitter to share horticultural tips. 140 characters is sub-publon in size - but I'm betting the information will still be useful to many.

Thursday, 4 April 2013

Altmetric for Drupal

Recently I have been looking at how we measure contributions to science in a way that is more well-rounded than the h-index and similar initiatives. Most of this relates to how we measure a user's contributions to projects such as Scratchpads, ViBRANT and eMonocot.

The "alternative metrics" movement has been around for a number of years now, and one of the more established outfits is Altmetric who provide badges for research articles showing how much attention that article has received on a number of purely social (Twitter, Facebook) and 'academic social' (Mendeley, Connotea) networks.

As the badges are pretty easy to implement I have made a small Drupal module that displays an Altmetric badge on Biblio node pages, and provides a configuration page to allow the badges to be customised. The module is available here: Drupal biblio altmetric.

Monday, 25 March 2013

John Cummings begins work as Wikimedian in Residence at Natural History Museum and Science Museum

John Cummings radio interview

Reposted from the Wikimedia UK Blog:John Cummings begins work as Wikimedian in Residence

Wikimedia UK is very happy to report that John Cummings, a long-standing and well known Wikimedian, has begun his work as Wikimedian in Residence at the Science Museum and Natural History Museum.

This is a ground-breaking partnership between two of the UK’s most prestigious cultural institutions and the charity that promotes and supports Wikipedia and Wikimedia projects in the UK. His role with the museums will last for four months.

John said: “It’s a real privilege to work with institutions with such important places in the history and public understanding of science. I hope I will be able to help the museums in their goals.”

John is the co-founder and project leader for MonmouthpediA and Gibraltarpedia, the world’s first Wikipedia town and city, and he is a Wikimedia UK accredited trainer for communities and institutions.

He is also technical lead for Leaderwiki, a collaborative education resource for emerging leaders from all over the world who want to make a positive contribution in their communities.
John will be working with myself and the rest of the Biodiversity Informatics team at the NHM, as well as other staff from across the museum. You can see what's happening here.

Saturday, 23 March 2013

Re-inventing the wheel: do we need a common infrastructure for museum digital?

Over the last few years (since around the eBiosphere conference) I have several times put together slides detailing the 'Informatics Landscape' of biological collections (there's an example here) and the ecosystem of projects that it, in some way, supports. Over the years projects have come and gone, and the informatics community has coalesced around a number of projects and initiatives: the Biodiversity Heritage Library for legacy literature, GBIF for specimen and observational records, the Encyclopedia of Life as an aggregator for the public, and Scratchpads as a platform for virtual research and data sharing.

In a recent Guardian piece (Digital pro bono: time for cultural giants to offer their services) and an earlier blog post (Wouldn’t it be cool if … ) Oonagh Murphy suggests that big cultural institutions could give some of their time to help smaller cultural institutions with their web presence. This is a good idea, and would no doubt have a positive impact on the sector as a whole, but should we be looking more towards the model of the biodiversity informatics community? Would it not be better to spend this time developing a shared, open infrastructure of online tools that smaller museums, and perhaps even larger ones, could use?

If this was the case then we could create an environment for shared development. The cost of developing some piece of functionality could be spread amongst the museums who need it, at a reduced cost to each, and then freely shared with the rest of the community. Other institutions might realise they can tweak it for a different purpose, or develop it further to meet their own needs. It would be possible to create a new ecosystem of collaboration.

This could potentially be a similar model to the Scratchpads - take an existing project (in that case Drupal) which deals with much of the basics - and build on top of it a more specific set of tools that are of use to the cultural community. Some of these enhancements, if they are generic enough, can be released back to the Drupal community for other people to use in their many and diverse projects.

The advantage of this model is that things only need to be done once: develop mobile support and everybody using the platform has mobile support. Individual projects (sites) can brand their content as they wish and still make use of pooled resources and development.

Tuesday, 19 March 2013

Visualising an archive: Walter Rothschild's correspondence

Rothschild with his famed zebra (Equus burchelli) carriage, which he drove to Buckingham Palace to demonstrate the tame character of zebras to the public

 As part of exploring possibilities for the Wikipedian in Residence project (more on this very soon) we were given some example data from the NHM archive catalogue relating to the correspondence of Walter Rothschild to see what potential there might be for digitisation and semantic linking of content. Having data with locality and time information means only one thing: time to dig out CartoDB and Torque!

Some background to the correspondence from the Tring Museum:
Tring Museum was a natural history museum owned by Walter, later 2nd Baron Rothschild, which was donated to the Natural History Museum in 1936. It had been open to the public since 1892. A large number of papers, particularly letters from Walter, were destroyed, so this correspondence is largely all that remains of the history of that museum.

This series is mostly letters to Walter and/or his curators Ernst Hartert and Karl Jordan, and is a fascinating collection with a wealth of information: not only scientific and historical material from the ornithologists and entomologists who wrote to Tring, but also historical material from the various institutions around the world, and on the economic history of the business of natural history, from the dealers, publishers and booksellers. There is also important social history to be studied about Tring Museum's relationship with the local people and businesses who visited, and were employed by, the Museum. The largest part relates to collectors, writing from all over the world about the expeditions they're on and the specimens they are collecting for the museum - writing sometimes from war zones, during revolutions and uprisings, and from jungles and deserts.
Daisy Cunynghame - NHM Library and Archives
Example data:

18 Feb - 25 Jun 1903 | The Acetylene Supply Co | "6 letters from The Acetylene Supply Co., 48 Cranbourne Street and 1 Bear Street, Leicester Square, London, England, United Kingdom. 3 of the letters addressed to Karl Jordan, 3 addressed to Ernst Hartert. [Was previously reference number TM/1/69/1]"
1 Apr - 17 Dec 1903 | André & Sleigh Limited | "2 letters to Ernst Hartert, 9 letters to Karl Jordan from André & Sleigh Limited, Photo-Engravers, Bushey, Hertfordshire, England, United Kingdom. [Was previously reference number TM/1/69/5]"

The problem with this data, from a visualisation point of view, is that the addresses are not geolocated. Manually geolocating a large number of addresses would be a substantial task, perhaps best undertaken via a crowd-sourcing approach. Making a quick demonstrator to see what is potentially possible precludes the use of such an approach in this case.

Instead the online GeoCoder tool from was used to process the first 1,000 records of this dataset. This failed for a large number of the locations provided but, as this is only a demonstrator, I just ignored the rows that failed.
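For anyone wanting to reproduce this kind of demonstrator, the two steps (pulling an address out of each catalogue description, then geocoding it) can be sketched in Python. The `extract_address` helper, its regular expression, and the use of geopy/Nominatim are my assumptions for illustration, not the tool actually used:

```python
import re

def extract_address(description):
    """Pull a rough address string out of a catalogue description.

    Assumes the convention in the example records above: the address
    follows 'from <correspondent>, ' and runs to the end of the sentence.
    """
    match = re.search(r"from [^,]+, (.+?)(?:\.|$)", description)
    return match.group(1) if match else None

record = ("6 letters from The Acetylene Supply Co., 48 Cranbourne Street "
          "and 1 Bear Street, Leicester Square, London, England, United Kingdom.")
print(extract_address(record))

# Geocoding the extracted address (geopy/Nominatim is an assumption --
# the post used a different online tool). Failed lookups return None
# and can simply be skipped, as was done for this demonstrator.
# from geopy.geocoders import Nominatim
# geocoder = Nominatim(user_agent="tring-correspondence-demo")
# location = geocoder.geocode(extract_address(record))
```

Any geocoding service would do here; the important design point is tolerating failures, since historical addresses often no longer resolve.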

The following map shows the results after the geocoding.

The geocoding of a few points (many of those shown as being in North America) is clearly wrong, however the vast majority have been correctly placed, as far as is possible.

Of course geolocating just gives us a way of visualising the archive in spatial dimensions, however we also have temporal data available, so this seemed like an obvious use for Torque on top of CartoDB. The video below (best viewed at 720px and fullscreen)  shows both the spatial and temporal extent of communication.

Obviously, to be a truly useful and accurate tool the data would need more rigorous processing, which would take considerably longer than creating this demonstration (which took less than a couple of hours). It does, however, show that visualisation tools can be useful in developing a deeper understanding of archive catalogue data.

On a (slightly) related note...
Daisy (who provided the summary of Tring Museum and Walter Rothschild above) has also written a piece about a namesake of mine who used to work for the museum as a collector: Item of the Month (July 2012) Edward Baker - One of Tring Museum's Daring Explorers.

Measuring the Impact of Wikipedia for organisations (Part 3)

Previous posts in this series:
As mentioned in a previous post in this series, I have downloaded all of the Wikipedia pages that make a direct link to the Natural History Museum website. While this is useful in attempting to measure the impact of the NHM and Wikipedia on each other, this post is a little more for fun at this stage (although the data was collected for an upcoming project).

An obvious thing to do with these downloaded pages is to scan them for links and then build a graph of the interconnections between them. The script I set to this task is taking a while, so I decided to see what I could summarise about a topic (Wikipedia page) based on the articles that page links to. In all of these examples the numbers are the number of links from the 'subject' page to the other page.
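A minimal sketch of this kind of link counting, assuming the downloaded pages are stored as raw HTML (the regular expression and the filtering of namespaced pages are my assumptions, not necessarily what the actual script does):

```python
import re
from collections import Counter

# Internal Wikipedia links appear in the saved HTML as href="/wiki/Page_name".
WIKI_LINK = re.compile(r'href="/wiki/([^"#?]+)"')

def count_links(html):
    """Count links from one downloaded page to other Wikipedia pages,
    skipping namespaced pages such as File: and Category:."""
    return Counter(t for t in WIKI_LINK.findall(html) if ":" not in t)

sample = ('<a href="/wiki/Sauropod">sauropod</a> '
          '<a href="/wiki/Sauropod">sauropods</a> '
          '<a href="/wiki/File:Dippy.jpg">photo</a>')
for target, n in count_links(sample).most_common():
    print(n, "|", target)
```

Printing the counter in `most_common` order produces exactly the "count | page" lists shown below.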

First up is the iconic Dippy (Diplodocus):

4 | Othniel_Charles_Marsh
3 | Carnegie_Museum_of_Natural_History
3 | Sauropod
3 | Walking_with_Dinosaurs
2 | Jurassic
2 | Diplodocidae
2 | Type_species
2 | John_Bell_Hatcher
2 | William_Jacob_Holland
2 | Diplodocid
2 | Fossil

As a set these seem to be a reasonable, high-level summary of Diplodocus. There is a mixture of information that is technical (type species, Diplodocid), cultural (Walking with Dinosaurs) and about the discovery, description and display of the fossil (Marsh, Hatcher, etc.).

Let's go for another species, the Holly Blue
3 | Lycaenidae
2 | Eurasia
2 | North_America
2 | India
2 |
2 | Holly_Blue
2 | Main_Page
2 | Wikipedia:About
1 | Biological_classification
1 | Animal
1 | Arthropod
This time the information is more about the biogeography and higher taxonomy, but nevertheless can be seen as a reasonable, if subjectively limited, summary of the species.

Time for something different: first up a member of NHM staff, Chris Stringer

2 | Archaeology
2 | Biological_anthropology
2 | Social_anthropology
2 | Cultural_anthropology
2 | Feminist_anthropology
2 | Fellow_of_the_Royal_Society
2 |
2 |
2 |
2 |
2 |

In short, a Fellow of the Royal Society who is an anthropologist and has written a number of books. In a purely professional sense: pretty much spot on.

So what does this kind of summary allow us to do? In a limited sense it allows us to make brief summaries of people, species and institutions that have a Wikipedia presence. But the real use comes when a large number of these analyses can be aggregated, queried and visualised. More on this another time; for now, here is a quick visualisation made by hacking the demos that come with arbor.js.
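Turning the aggregated per-page counts into a graph structure for a force-directed library like arbor.js might look something like the following sketch (the counts and the node/edge format are illustrative assumptions, not the real data or the demo's exact input format):

```python
import json

# Aggregated link counts: (subject_page, linked_page) -> number of links.
# These figures are illustrative, not the real data behind the visualisation.
link_counts = {
    ("Diplodocus", "Sauropod"): 3,
    ("Diplodocus", "Carnegie_Museum_of_Natural_History"): 3,
    ("Holly_Blue", "Lycaenidae"): 3,
}

def to_graph(counts):
    """Convert aggregated counts into the kind of node/edge structure a
    force-directed layout library such as arbor.js can consume."""
    nodes = sorted({page for pair in counts for page in pair})
    edges = [{"source": s, "target": t, "weight": w}
             for (s, t), w in counts.items()]
    return {"nodes": nodes, "edges": edges}

print(json.dumps(to_graph(link_counts), indent=2))
```

Weighting edges by link count lets the visualisation emphasise the strongest associations between pages.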

Full Screen Version

  Creative Commons Licence
Measuring the Impact of Wikipedia for organisations (Part 3) by Edward Baker is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
Based on a work at

Sunday, 17 March 2013

Some links from Science Hackday London #shdl

A list of project pitches

EpiCollect (GitHub) provides a web application for the generation of forms and freely hosted project websites (using Google's AppEngine) for many kinds of mobile data collection project. Data can be collected using multiple mobile phones running either Android or iOS (using the EpiCollect mobile app), and all data can be synchronised from the phones and viewed centrally (using Google Maps) via the project website or directly on the phones.

Online assistance in performing tasks that require human cognition, knowledge or intelligence such as image classification, transcription, geocoding and more!
  • Help advance research
  • Everything is open and freely usable
  • Things computers can't do

Yellowhammer Dialects (Czech Site)
What happens to birdsong during the invasion of a new territory? To answer this question a citizen science project is looking for volunteers to record yellowhammers in New Zealand and Great Britain to evaluate the distribution of their dialects.

Konekta (GitHub)
Geolocate community services and make them available through a mobile site.

WAX Science
The WAX project’s goal is to launch an online collaborative platform with two main objectives:
  • To give a space to raise young people’s curiosity in the sciences. With several participatory and fun approaches, we want to support young people in letting their curiosity and natural motivation win back over. There will be contests, small experiments and videos, in the spirit of a science for everyone, revalued, but that points out the stereotypes. Because to fight something, one must be aware of it.
  • To give existing associations/initiatives/collectivities the possibility to get in touch with each other and to know where to turn by drawing a map of what already exists, both in the field of popular science and on the theme of gender balance. By linking those initiatives on our website, we hope to raise the visibility of every one of them and to catalyse the interactions!

Friday, 15 March 2013

Senior Developer at the Extreme Citizen Science group

An interesting position!

Job title: Senior Developer at the Extreme Citizen Science group - Ref: 1320261

UCL Department / Division: Civil, Environmental & Geomatic Engineering
Grade: 8
Hours: Full Time
Salary (inclusive of London allowance): £40,216 - £47,441 per annum

_Duties and Responsibilities_

We are looking for an experienced and talented Senior Programmer with knowledge of systems architecture and management to fill a 2-year vacancy to help our various research projects achieve the aims they set out to accomplish with bespoke and innovative technologies.
The main duties and responsibilities of the ExCiteS Senior Developer will include, but not be limited to, the redevelopment of the Community Maps platform using open source and current technologies, administration of IT systems and server management, and providing assistance to the group in making decisions about technologies that will be used on various projects. The appointee will also be required to manage Linux servers, and advise on and be involved in development projects that aim to include people in the scientific process, from the Inuit in Canada to the Pygmies of the Congo. The job includes guiding the development team, which includes MSc and PhD students and postdoctoral fellows.
The post is available for immediate start and is for 2 years in the first instance.

_Key Requirements_

The candidate will have extensive experience working as a developer, ideally within standards-based projects, using Open Source technologies and with project management. They will need extensive knowledge of up-to-date, open source, spatially and non-spatially enabled technologies, such as Linux, PostgreSQL/PostGIS, and OpenLayers/Leaflet, and must be able to quickly pick up and adapt to new development environments, particularly as we wish to move into further HTML5 and mobile development and to base some of our technologies on open APIs. The ideal candidate should be able to use object-oriented methodologies and tools to analyse, design and implement software tools, and should have experience in designing and implementing API architectures to further extend the current software systems. It is imperative that they are able to communicate technically complex information in an understandable way. They will also need to have a solid foundation in structures and standards, properly utilising code management systems (such as GitHub), designing robust code in an easily extensible way, and ensuring that the viability of solutions extends far beyond the lifetime of the research projects themselves.

Further Details: A job description and person specification can be accessed at
If you have any queries regarding the vacancy or the application process, please contact Prof. Muki Haklay, , +44 (0)20 7679 2745.
We particularly welcome applications from black and minority ethnic candidates as they are under-represented within UCL at this level.
Closing Date: 14 Apr 2013
This appointment is subject to UCL Terms and Conditions of Service for Research and Support Staff.

Wednesday, 6 March 2013

Who owns biodiversity informatics? The Patents

I find it surprising how close some of these come to the core business of many biodiversity informaticians, and I suspect that there might be prior art in some cases. If you know of any I've missed put them in the comments and I'll add them.

Managing Taxonomic Information (US 7,650,327 B2)
Remsen, D.; Norton, C.
In a management of taxonomic information, a name that specifies an organism is identified. Based on the name and a database of organism names or classifications a link between pieces of biological identification information in the database, or a classification for the organism, is determined. Based on the other name or the classification, information associated with the organism is identified.

Information System for Biological and Life Sciences Research (Pending: US 2005/0038776 A1)
Cyrus, R.; Di Tommaso, M.; Kerlavage, A.R.; Lawrence, C.B. 
An online life science research environment and virtual community with a focus on design and analysis of biological experiments includes a life sciences laboratory system employing at least one networked computer system that defines a virtual research environment. Users access the system through a portal associated with the networked computer system(s). The virtual research environment has a data coupling mechanism by which the user designates a set of user-specified data for bioinformatics processing. A processor(s) associated with the networked computer system(s) performs bioinformatics services upon the user-specified data. In one embodiment, the data coupling mechanism enables transfer of user-specified data to a memory space that is mediated or accessed by the processor performing the bioinformatics processing. Users may thus exploit bioinformatics processing resources that are not deployed on users' local computer environments, and store and organize information relating to life sciences research in a secure, online workspace.

Systems and Methods for Resolving Ambiguity Between Names and Entities (US 7,925,444 B2)
Garrity, G.; Lyons, C. 
The present invention provides systems and methods that utilize an information architecture for disambiguating scientific names and other classification labels and the entities to which those names are applied, as well as a means of accessing data on those entities in a networked environment using persistent, unique identifiers.

Systems and methods for automatically identifying and linking names in digital resources (Pending: US 2010/0198841 A1)
Parker, C.; Lyons, C.; Roston, G.; Garrity, G.
The present invention provides systems and methods for automatically identifying name-like strings in digital resources, matching these name-like strings against a set of names held in an expertly curated database, and for those name-like strings found in said database, enhancing the content by associating additional matter with the name, wherein said matter includes information about the names that is held within said database and pointers to other digital resources which include the same name and its synonyms.

Saturday, 2 March 2013

Measuring the Impact of Wikipedia for organisations (Part 2)

This post continues from Measuring the Impact of Wikipedia for organisations (Part 1) which looked at a number of statistics relating to page views and links using linkypedia (well - a slightly customised version of linkypedia).

Part of my reasons for doing this might have become clear based on a subsequent post on this blog: Wikimedian in Residence at NHM.

This post uses a feature I added to linkypedia to save a copy of pages that link to the NHM website into a database. This allows for some quick queries to identify both the type of pages and the content they contain.

13580 pages have links to the domain

This includes (type of page, number of pages):

User pages 44
User talk pages 39
WikiProjects 2
WikiProjects pages 6
WikiProjects talk pages 20
Wikipedia Signpost 3
Village Pump 1
Reference Desk 9
Graphics Lab 1
Copyright Problems 3
Suspected Copyright Violations 2
Possibly unfree files 2
Media copyright questions 1
Articles for creation 2
Featured article candidates 4

Examples of other queries that can be run:

Biota InfoBox 12768 (can be assumed to be a good indicator of pages about a taxon)
Type specimen 52
Lepidoptera 12773
Stub 12412
Lepidoptera stub 12190
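Queries like these could be run as simple substring matches over the saved pages. The sketch below uses an in-memory SQLite table; the single-table schema, sample rows, and patterns are illustrative assumptions, not the schema my modified linkypedia actually uses:

```python
import sqlite3

# In-memory stand-in for the database of saved Wikipedia pages; the
# schema and sample rows are illustrative assumptions.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE pages (title TEXT, html TEXT)")
db.executemany("INSERT INTO pages VALUES (?, ?)", [
    ("Agonopterix_alstroemeriana", "{{Taxobox}} ... {{Lepidoptera-stub}}"),
    ("Diplodocus", "{{Taxobox}} ... the type specimen ..."),
    ("User:Example", "a user page linking to the NHM website"),
])

def count_matching(pattern):
    """Count saved pages whose stored markup contains the given substring."""
    row = db.execute("SELECT COUNT(*) FROM pages WHERE html LIKE ?",
                     ("%" + pattern + "%",)).fetchone()
    return row[0]

print(count_matching("Lepidoptera-stub"))
print(count_matching("type specimen"))
```

Substring matching is crude (it will happily count a phrase inside a comment), but for a first survey of 13,580 pages it is more than adequate.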

It looks like the NHM has quite a sizeable Wikipedia footprint; however, a huge majority of these are stub Lepidoptera pages with very little content besides a link back to a project on the NHM website.

Sample stub lepidoptera page (Accessed 02 March 2013)

Considering the number of type specimens the museum holds (20,000 mosses alone) the figure of 52 is one that is definitely open to some improvement.

 Creative Commons License
Measuring the Impact of Wikipedia for organisations (Part 2) by Ed Baker is licensed under a Creative Commons Attribution 3.0 Unported License.

Thursday, 28 February 2013

A few more jobs in biodiversity informatics projects at Natural History Museum

You can find more about all jobs and apply online here.

Drupal Developer

The Natural History Museum is looking to recruit a Drupal Developer to work on the Scratchpads project, which is based on the Drupal content management system. The role encompasses the development of content, theming and functionality for new and existing PHP and Drupal systems and applications.

Scratchpads are web-based informatics tools written using Drupal. They allow distributed groups of biodiversity scientists to create their own virtual research communities on the web. The successful applicants should be able to work on their own initiative and be proficient in theming, coding, configuring and doing quality assurance on Drupal-based websites. Mentored training and support will be provided. You will work with members of the developer and user community (research scientists, software developers and organisations) to manage and parse biodiversity data, in addition to helping with the design, construction and testing of Drupal modules and sites. The project includes opportunities for international travel to meetings, workshops, conferences and presentations as a member of the Scratchpad development team.

A bachelor’s science degree (or equivalent) and previous experience in Drupal web development are also essential for this post.

Salary:  £37,564 per annum plus benefits

Contract: Fixed term appointment (9 months)

Closing date: Midnight on Thursday 7 March 2013

Role competences:

  • A bachelor’s science degree (or equivalent)
  • Previous experience in Drupal web development (version 6 and 7). We require evidence of Drupal websites that the candidate has developed.
  • Hands on experience configuring Views, CCK, and other contributed Drupal modules
  • Hands on experience configuring the Drupal CMS for use by non-developers to add and edit the content
  • Experience with PHP, MySQL, Drupal, SQL, XML, HTML and CSS.
  • Knowledge of jQuery/Javascript is highly desirable
  • Experience of Linux is also highly desirable
  • Experience with Drupal theming and CSS is highly desirable
  • Familiarity with other programming languages (e.g Java) in addition to PHP would be useful but not essential
  • Familiarity with natural history (biodiversity, taxonomy, systematics) would be advantageous but not essential


Front-End Developer  

The Natural History Museum is looking to recruit a Front End Developer to work on integration of the Museum collections digitisation projects. The role encompasses the design and development of the front end of a web application to interface with the Museum’s collections management system, KE-EMu, which will allow the quick and easy addition of multimedia and associated meta-data. KE-EMu is a collections management system designed specifically for natural history museums and other special collections.

The successful applicants should be able to work on their own initiative and be proficient in producing beautifully designed and intuitive web applications. Mentored training and support will be provided. You will work with members of the developer and user community (research scientists, software developers and organisations) to identify the key software requirements. The project includes opportunities for international travel to meetings, workshops, conferences and presentations as a member of the Natural History Museum’s development team.

This is a fixed term appointment for twelve months, but with the possibility of a contract extension.

A bachelor’s science degree (or equivalent) and evidence of websites and web application components that you have developed are also essential for this post.

Salary: £36,986 per annum plus benefits

Contract: Fixed term appointment (12 months)

Closing date: Midnight on Thursday 7 March 2013

Role competences:
  • A bachelor’s science degree (or equivalent)
  • Previous experience in HTML5 web application development
  • Experience with systems analysis
  • Experience working with a back end developer to produce fully featured intuitive web applications
  • Extensive experience with JavaScript libraries, such as jQuery, MooTools, Dojo, Prototype, Backbone.js
  • Experience working with front-end frameworks, such as Bootstrap and Cappuccino
  • Extensive experience cross-browser testing and producing sites that conform to the WCAG 2.0
  • Experience in prototyping, designing and creating clean and intuitive web application user interfaces
  • Ability to produce designs that are simple, elegant and scale well across screen sizes
  • Experience developing user documentation, training and application testing
  • Fluent in software such as Photoshop, Illustrator, Fireworks

 PHP Developer

The Natural History Museum is looking to recruit a PHP Developer to work on integration of the Museum collections digitisation projects. The role encompasses the development of a web application to interface with the museum’s collections management system, KE-EMu, which will allow the quick and easy addition of multimedia and associated meta-data. KE-EMu is a collections management system designed specifically for natural history museums and other special collections. It has a well-documented API that will enable the successful applicant to build and integrate a web application to support museum digitisation workflows.

The successful applicants should be able to work on their own initiative and be proficient in producing well-documented and tested code. Mentored training and support will be provided. You will work with members of the developer and user community (research scientists, software developers and organisations) to identify the key software requirements. The project includes opportunities for international travel to meetings, workshops, conferences and presentations as a member of the Natural History Museum’s development team.

This is a fixed term appointment for twelve months, but with the possibility of a contract extension.

A bachelor’s science degree (or equivalent) and evidence of websites and web application components that you have developed are also essential for this post.

Salary: £36,986 per annum plus benefits

Contract: Fixed term appointment (12 months)

Closing date: Midnight on Thursday 7 March 2013

Role competences:
  • A bachelor’s science degree (or equivalent)
  • Previous experience in Drupal web development (version 7) or extensive experience with an alternative PHP framework. We require evidence of websites and web application components that the candidate has developed.
  • Experience of systems analysis and systems integration, interfacing with external services via APIs.
  • RDBMS experience, ideally one or more of the following: MySQL, PostgreSQL, MSSQL
  • Experience with NoSQL databases, such as MongoDB, CouchDB
  • Experience working with a front-end developer to produce fully featured intuitive web applications.
  • Knowledge of optimising PHP code, especially when interfacing with external services
  • Experience with unit testing and systems integration testing
  • Experience of command line Linux server administration is highly desirable, including knowledge of command line software like ImageMagick
  • Familiarity with other programming languages (e.g. Java) in addition to PHP would be useful but not essential
  • Familiarity with natural history (biodiversity, taxonomy, systematics) would be advantageous but not essential

Thursday, 21 February 2013

ViBRANT Citizen Science Workshop Report

The report from the workshop is now available to the public through Google Drive:

ViBRANT Citizen Science Workshop Report

It seems that we will be breaking free of the case study mentality and looking to build a generic system for biological citizen science.... follow our progress.

Wednesday, 13 February 2013

Presentations from the ViBRANT Citizen Science Workshop

The presentations from the ViBRANT Citizen Science Workshop:





The same talk a few days earlier at UCL.

Citizen Science at the Natural History Museum, London

Global Canopy Project

Overview of EU Projects

Saturday, 9 February 2013

WiFi Client Bridge (for Arduino project)

(aka how to get Ethernet devices attached to a wireless network without awkward cable runs)

Previously I wrote about an Arduino-based temperature and light sensor that could send its readings to the web via Twitter. This made use of the Ethernet shield that is available for the Arduino. This post describes how to connect this wired Ethernet device to a wireless network using a wireless router.

WiFi Client Bridge
A WiFi client bridge allows a secondary wireless router to share a wireless network it receives via its WiFi interface (from a primary wireless router) with devices attached to it via Ethernet. In this set-up the Ethernet-connected devices behave as if they are directly attached to the primary wireless router.

Most routers designed for home use are capable of acting as a client bridge, although the software inside the router (the firmware) often prevents you from using the device in this fashion. This problem can be overcome by replacing the firmware that ships with the router with one of several alternatives; I used DD-WRT for this project, but OpenWRT and others have the same functionality.

Wireless configuration page on router running DD-WRT firmware

The individual router firmware project websites have lists of compatible routers (I got a second-hand Buffalo WHR-G125 from eBay for £6) and instructions for replacing (flashing) the firmware.

Both DD-WRT and OpenWRT have instructions for setting up a client bridged network:
Assuming that the primary router has a correctly configured DHCP server, connecting Ethernet devices to the internet is as simple as plugging them into the secondary router. This is a useful way of placing Ethernet devices in places where running Ethernet cables would be awkward.
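Once the bridge is up, a quick way to confirm that a device plugged into the secondary router can actually reach beyond the primary router is a simple TCP connection test. This is a hedged sketch run from any machine on the bridged segment; the host and port are just examples:

```python
import socket

def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    # example.com:80 stands in for any host beyond the primary router
    print("bridge OK" if can_connect("example.com", 80) else "no route out")
```

If the test fails, the usual suspects are the bridge's wireless credentials and the primary router's DHCP configuration.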

Arduino with Ethernet shield attached to a wireless network through a router in client bridge mode

Thursday, 24 January 2013

Wikimedian in Residence at NHM (Closing date 10/02/2013)

Fancy coming to work with us?

Vacancy reference: NHM/WIR/SN
Location: South Kensington
Employment type: Fixed Term
Area of business: Life Sciences
Closing date: 10/02/2013

The Natural History Museum and the Science Museum are in partnership to recruit an experienced joint Wikimedian in Residence with a good understanding of GLAM projects. As the official Wikimedian in Residence the successful applicant will be expected to make an impactful contribution to the public’s knowledge of the work of both institutions and their important and unique collections. You should have an understanding of the Wikimedia movement and Wikimedia UK’s mission to help people and organisations build and preserve open knowledge to share and use freely. You will also be expected to help develop strong and on-going links to build a long-term relationship with the broader Wikimedia community and help to develop methods for assessing the impact of Wikipedia and sister projects on both institutions and the communities they serve.

The successful candidate will use their strong communication and organisational skills to promote the use of Wikipedia and sister projects to museum staff, including scientists, curators and educators, by fostering a broader understanding of Wikipedia (and sister projects) and arranging training in use and editing with groups and individuals. In addition you will work with museum staff to improve the quality of Wikipedia pages using items from the museums’ collections, libraries and archives and discussions with curators and researchers, and act as a Wikipedia advocate through outreach to museum staff about Wikipedia’s mission and how they may contribute through workshops, events and one-to-one interactions.

An undergraduate degree in (or strong and demonstrable knowledge of) a scientific or technological discipline, together with experience of working within the Wikimedia community, is also essential for this post.

Wikimedia UK is currently looking for several other Wikimedians in Residence in various cultural institutions within the UK. If you would like to find out more, please contact or visit

Knowledge, skills and experience:
An undergraduate degree in (or strong and demonstrable knowledge of) a scientific or technological discipline

Wikimedia UK
  • An understanding of, and empathy for, Wikimedia’s movement and Wikimedia UK’s mission to help people and organisations build and preserve open knowledge to share and use freely
  • Experience of editing Wikipedia or its sister websites. Supplementary training may be given
  • Experience of working with the Wikimedia community
  • An understanding of and commitment to Wikimedia UK’s Equal Opportunities Policies in both services to members and employment

  • Good understanding of the ethos and activities (curation, research, education) of a national museum
  • An understanding of the GLAM sector, its culture and aims.

  • Ability to teach and support those learning to use Wikipedia and its sister projects (including via organising events/workshops)
  • Ability to work tactfully, sensitively and effectively, as part of the two institutions, the Wikimedia community and with a wide range of individuals and also under your own initiative
  • Ability to communicate in English clearly, both verbally and in writing to a wide range of audiences alongside use of basic numeracy
  • Experience of successfully meeting deadlines
  • Awareness of issues related to intellectual property, confidentiality, commercial benefit and transparent working practices.

Apply here