Saturday 25 November 2017

Improving BioAcoustica performance

Since its inception BioAcoustica has been built on the Scratchpads virtual research environment. Sadly the timing of the launch of BioAcoustica was very close to the Scratchpad Lead Developer leaving the Natural History Museum (the BioAcoustica Database Paper was their last NHM publication).

Since that time the Scratchpads have had little love (apart from some work I have done to keep them alive) and seem to be slowly decaying. This is set to change soon (I am led to believe, although not for the first time) with new developer attention. Decay of this kind is always a risk when building a project on top of infrastructure maintained and developed elsewhere. (The upside is that BioAcoustica development has leveraged existing infrastructure to manage biological taxonomies, DarwinCore compliant specimens, literature, etc.)

Completely separating from the Scratchpads project, at least for now, is still undesirable. Recently the NHM team have started attending to the Scratchpads servers, and replicating the server environment outside the NHM would introduce issues for future maintenance once the Scratchpads receive the care they deserve. (That said, I have tested getting the infrastructure running on an external cloud hosting provider, and it works, so we have all bases covered.)

So, assuming that future development of the Scratchpads will resolve the issues we have been having with occasional downtime, and that fixes and new features/infrastructure are coming, what can we do to improve the current situation?

Aside from downtime, the main issue that people have reported to me is slow file downloads. BioAcoustica is bandwidth heavy: we prefer wave files to MP3 files (for science reasons) for many taxa, and many of the files (particularly soundscapes) are large, often in the gigabyte range.

A quick test of downloading files from the Scratchpad server and the recently launched Digital Ocean Spaces revealed that we could potentially increase file bandwidth by a factor of 10. Shifting high bandwidth reads from the Scratchpads to the cloud clearly offers benefits to BioAcoustica users (faster load time), particularly if they are using the R interface to work with a large number of files.
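For anyone wanting to repeat this kind of comparison, here is a minimal sketch in Python. It is not the test I actually ran: the function names are made up for illustration, and you would need to supply your own pair of URLs pointing at the same file on each host.

```python
import time
import urllib.request
from typing import Tuple


def throughput_mb_per_s(num_bytes: int, seconds: float) -> float:
    """Convert a byte count and an elapsed time into megabytes per second."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return num_bytes / 1_000_000 / seconds


def time_download(url: str, chunk_size: int = 1 << 20) -> Tuple[int, float]:
    """Stream a URL and return (bytes read, seconds elapsed).

    The data is discarded as it arrives, so large recordings do not
    need to fit in memory or on disk.
    """
    start = time.perf_counter()
    total = 0
    with urllib.request.urlopen(url) as response:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total, time.perf_counter() - start


def speedup(baseline: Tuple[int, float], candidate: Tuple[int, float]) -> float:
    """How many times faster the candidate host was than the baseline host."""
    return throughput_mb_per_s(*candidate) / throughput_mb_per_s(*baseline)
```

Calling `speedup(time_download(scratchpad_url), time_download(cloud_url))` with two URLs of your own gives the ratio for a single file. One download is a noisy measurement, so repeating it a few times, at different times of day, gives a fairer picture.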

Another issue this addresses is file backups. While the Scratchpads databases have a regular backup schedule (daily, weekly, monthly, yearly) the file backups are held only for 30 days, which has led to previous issues when nobody noticed until too late that the files had gone from their site. An automated process of copying files to the cloud as they are uploaded has the potential to allow for a more long-term backup mechanism.
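The copying step can be sketched roughly as follows. This is an illustration under assumptions rather than the process running on BioAcoustica: `files_needing_backup`, `back_up` and the `upload` callback are hypothetical names, and the callback stands in for whatever client actually talks to the cloud store.

```python
from pathlib import Path
from typing import Callable, List, Set


def files_needing_backup(root: Path, already_stored: Set[str]) -> List[Path]:
    """List files under root whose relative path is not yet in the cloud store."""
    pending = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.relative_to(root).as_posix() not in already_stored:
            pending.append(path)
    return pending


def back_up(
    root: Path,
    already_stored: Set[str],
    upload: Callable[[Path, str], None],
) -> List[str]:
    """Copy every missing file using the supplied upload function.

    `upload(local_path, key)` is a placeholder for the real transfer.
    Nothing is ever deleted from the store, so a file that later
    disappears from the site remains recoverable.
    """
    copied = []
    for path in files_needing_backup(root, already_stored):
        key = path.relative_to(root).as_posix()
        upload(path, key)
        already_stored.add(key)  # remember it so the next run skips this file
        copied.append(key)
    return copied
```

Running something like this on a schedule, or whenever a file is uploaded, keeps the cloud copy growing alongside the site. Because the copy is append-only, it does not inherit the 30 day window of the existing file backups.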

So where are we now? If you visit a recording page on BioAcoustica (e.g. this Mole Cricket) then there is a good chance that the file downloaded to display the webform is currently being served from Digital Ocean rather than the Scratchpad directly. Similarly the download link will more likely than not use the same source.
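One way to picture the "good chance" part is a simple fallback: use the cloud copy when it exists, otherwise serve the file from the Scratchpad as before. The sketch below shows that idea only. It is not the code running on the site, and the function names and base URLs are hypothetical.

```python
import urllib.request
from typing import Callable


def url_exists(url: str, timeout: float = 5.0) -> bool:
    """Return True if a HEAD request for the URL succeeds."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # Covers HTTP errors (e.g. 404), unreachable hosts and timeouts.
        return False


def choose_source(
    filename: str,
    mirror_base: str,
    origin_base: str,
    exists: Callable[[str], bool] = url_exists,
) -> str:
    """Prefer the cloud mirror, falling back to the original server."""
    mirror_url = f"{mirror_base.rstrip('/')}/{filename}"
    if exists(mirror_url):
        return mirror_url
    return f"{origin_base.rstrip('/')}/{filename}"
```

The `exists` check is passed in as a parameter so the choice can be tested without touching the network, and so a cached list of mirrored files could replace the HEAD request later.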

What's coming next? Over the next day or so the R interface will be updated to use Digital Ocean for file transfers. This change will happen silently and will not affect users (besides saving them time). In the near term the R package will be updated so that the metadata services it relies on are also served from the cloud, allowing the (read-only) R interface to function even during times of Scratchpad downtime.

From scientific sound collection to entomological erotica. Part 1.

The BioAcoustica project goes from strength to strength. Recently Klaus-Gerhard Heller and I published a new species of the bush-cricket Horatosphaga that Klaus-Gerhard first identified from a recording I had made available on the platform. The species was named in honour of David Ragge, who worked on bush-crickets at the Natural History Museum (NHM) in London for many years, as well as being the founder of the NHM's library of recorded wildlife sounds.

Making the NHM sound collection freely available allowed Klaus-Gerhard to identify the potential new species - and after I re-prepared the future holotype to expose the stridulatory file it could be confirmed easily enough. The openness led to a new collaboration.

The first taxonomic group that we made data available for was the Gryllotalpidae - in part because the status of Gryllotalpa gryllotalpa in the UK is of interest, and in part because the NHM also has casts of some of the acoustic burrows made by males.

At the recent Orthoptera Special Interest Group (SIG) of the Royal Entomological Society I was approached by Clive Huggins, who informed me that I was listed in the credits of The Duke of Burgundy - an art film with entomology as an important plot point and, it appears, a good amount of slightly unusual erotica. Indeed The Guardian starts one review with the following paragraph:

"The Duke of Burgundy is the most tender love story you'll see in which a woman forcefully urinates in her lover's mouth."

This obviously needs to be checked out. The only issue was I had completely forgotten about supplying the recordings. I initially assumed they were just taken from BioAcoustica - but the film was made just before BioAcoustica went live.

Digging through old emails I discovered when the film crew got their hands on the recordings. Before BioAcoustica was released to the public we had to get permission from the NHM to release the sound recordings under an open licence (Creative Commons Attribution), which I managed to arrange. At about this time I was passed an email, via George Beccaloni (at the time NHM Curator of Orthopteroid insects), from someone at a film company who was after recordings we had of various Gryllotalpa species. I had no recollection of the storyline because I was never given many details about the film. Indeed the only thing I did know follows here:

"We'd be using as part of a film soundtrack - it's the story of two entomologists ( sort of!) - in one scene they listen to Mole Crickets...."

So that's how George and I ended up being credited in The Duke of Burgundy.