Since its inception BioAcoustica has been built on the Scratchpads virtual research environment. Sadly, the launch of BioAcoustica came very close to the Scratchpads lead developer leaving the Natural History Museum (the BioAcoustica database paper was their last NHM publication).
Since that time the Scratchpads have had little love (apart from some work I have done to keep them alive) and seem to be slowly decaying. This is set to change soon (I am led to believe, although not for the first time) with new developer attention. This is always a risk of building a project on top of infrastructure maintained and developed elsewhere. (The upside is that BioAcoustica development has leveraged existing infrastructure to manage biological taxonomies, DarwinCore compliant specimens, literature, etc).
Completely separating from the Scratchpads project, at least for now, is still undesirable. Recently the NHM team have started attending to the Scratchpads servers, and replicating the server environment outside the NHM introduces issues for future maintenance once the Scratchpads receive the care they deserve. (Although I have tested getting the infrastructure running on an external cloud hosting provider - it works - to ensure we have all bases covered).
So, assuming that future development of the Scratchpads will resolve the occasional downtime we have been experiencing, and that fixes and new features/infrastructure are coming, what can we do to improve the current situation?
Aside from downtime, the main issue people have reported to me is slow file downloads. BioAcoustica is bandwidth-heavy: we prefer WAV files to MP3 files (for science reasons) for many taxa, and many of the files (particularly soundscapes) are large, often in the gigabyte range.
A quick test of downloading files from the Scratchpad server and from the recently launched Digital Ocean Spaces suggested that we could increase file bandwidth by a factor of 10. Shifting high-bandwidth reads from the Scratchpads to the cloud clearly benefits BioAcoustica users (faster load times), particularly those using the R interface to work with a large number of files.
Another issue this addresses is file backups. While the Scratchpads databases have a regular backup schedule (daily, weekly, monthly, yearly) the file backups are held only for 30 days, which has led to previous issues when nobody noticed until too late that the files had gone from their site. An automated process of copying files to the cloud as they are uploaded has the potential to allow for a more long-term backup mechanism.
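An automated copy step of this sort is straightforward to sketch. The snippet below is a minimal, hedged illustration (not BioAcoustica's actual code): it mirrors any new files from an uploads directory to a backup location, which in production would be an S3-compatible bucket such as Digital Ocean Spaces rather than the local directory used here.

```python
import shutil
from pathlib import Path

def sync_new_files(source, backup):
    """Copy any file in `source` not already present in `backup`.

    A stand-in for the upload-to-cloud step: in production the
    destination would be an S3-compatible bucket rather than a
    local directory. Returns the names of newly copied files.
    """
    source, backup = Path(source), Path(backup)
    backup.mkdir(parents=True, exist_ok=True)
    copied = []
    for f in sorted(source.iterdir()):
        if f.is_file() and not (backup / f.name).exists():
            shutil.copy2(f, backup / f.name)  # copy2 preserves timestamps
            copied.append(f.name)
    return copied
```

Run on a schedule (or triggered on upload), this gives an append-only mirror that is not subject to the 30-day expiry of the Scratchpads file backups.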
So where are we now? If you visit a recording page on BioAcoustica (e.g. this Mole Cricket) there is a good chance that the file downloaded to display the waveform is currently being served from Digital Ocean rather than the Scratchpad directly. Similarly, the download link will more likely than not use the same source.
What's coming next? Over the next day or so the R interface will be updated to use Digital Ocean for file transfers. This change will happen silently and will not affect users (besides saving them time). In the near term the R package will be updated so that the metadata services it relies on will also be served from the cloud, allowing the R (read only) interface to function even during times of Scratchpad downtime.
Saturday, 25 November 2017
Sunday, 16 November 2014
Wildlife Sound Database
Over the past few months I have been working with some volunteers at the Natural History Museum to make the museum's collection of recorded insect sounds available online. This work-in-progress can be viewed online at the Wildlife Sounds Database. The collection reflects the research interests of the BMNH Acoustic Laboratory during the 1970s-1990s: the bulk of the collection relates to European Orthoptera (grasshoppers and allied orders).
Platform
The Wildlife Sound Database is a modified instance of the Scratchpads virtual research environment (Smith et al, 2011).
Use of collection
This collection contains the raw data underpinning a number of scientific publications (e.g. Ragge, 1987; Ragge & Reynolds, 1988). The recordings may also be used for future taxonomic work (in some instances voucher specimens exist in the NHM Entomology collection) and as a training set for machine learning algorithms.
Exposing datasets
Recordings with compatible licences are shared automatically with the Encyclopedia of Life through the Wildlife Sound Database Collection. The dataset is also available as a DarwinCore Archive at http://sounds.myspecies.info/dwca.zip (Baker, Rycroft & Smith, 2014).
Current development
Current development of the Wildlife Sound Database is through the NHM-funded project Developing the NHM Wildlife Sound Archive. This project will develop tools to annotate sections of recordings, allowing differentiation between voice introductions and different types of call, as well as between calls of different species on the same recording. In addition we will integrate analysis tools using the seewave package for R.
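As a rough illustration of what such annotations might look like as data (all names here are hypothetical, not the project's actual schema), each annotated section needs little more than a time span, a type, and optionally a species:

```python
from dataclasses import dataclass

@dataclass
class Annotation:
    start: float       # seconds from the start of the recording
    end: float
    kind: str          # e.g. "voice_introduction" or "calling_song"
    species: str = ""  # optional; several species may share one tape

def sections(annotations, kind):
    """Return the (start, end) spans of a given annotation type."""
    return [(a.start, a.end) for a in annotations if a.kind == kind]
```

Filtering by `kind` is what lets a voice introduction be excluded when, say, preparing a training set of calls for machine learning.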
As well as technical developments this project will increase the number of recordings available through the addition of new projects to the Wildlife Sound Database.
The Godfrey Hewitt collection (University of East Anglia) will be digitised and made available (this is likely to contain recordings underpinning important works on hybrid zones and post-glacial recolonisation of Orthoptera).
The Global Cicada Sounds Collection will include cicada recordings from around the world.
[Image: The Wildlife Sounds Database website]
[Image: WildSounDB audio files flowing to Encyclopedia of Life]
[Image: The Godfrey Hewitt Collection: reel-to-reel tapes to be digitised]
Monday, 13 January 2014
Darwin Core Archives
In a presentation I gave not so long ago (Building Highways in the Informatics Landscape) I suggested that Darwin Core Archive (DwC-A) was the lingua franca of biodiversity informatics, a position that I still stand by. However, it is a lingua franca with different dialects, and implementation is not quite as simple as it perhaps could (should?) be. In a recent paper (Linking multiple biodiversity informatics platforms with Darwin Core Archives) in the new Biodiversity Data Journal I, along with Simon Rycroft and Vince Smith, set out some of the challenges in making a 'several dialects' DwC-A that satisfies the needs of all current DwC-A consumers of the Scratchpad project.
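For readers unfamiliar with the format: a DwC-A is a zip archive containing one or more data files plus a meta.xml descriptor that maps their columns to Darwin Core terms. A minimal, hedged sketch of reading the core data file with the Python standard library (assuming tab-delimited data, which the descriptor can in fact vary - one of the 'dialect' issues above):

```python
import csv
import io
import zipfile
import xml.etree.ElementTree as ET

def read_core_rows(archive_bytes):
    """Parse the core data file of a Darwin Core Archive.

    Reads meta.xml to locate the core file inside the zip, then
    returns its rows as lists of fields (tab-delimited assumed).
    """
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as z:
        meta = ET.fromstring(z.read("meta.xml"))
        ns = {"d": "http://rs.tdwg.org/dwc/text/"}
        core_file = meta.find("d:core/d:files/d:location", ns).text
        text = z.read(core_file).decode("utf-8")
        return list(csv.reader(io.StringIO(text), delimiter="\t"))
```

A real consumer would also honour the descriptor's declared delimiters, encodings, and term mappings, which is precisely where the dialects diverge.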
Wednesday, 16 October 2013
EXIF Custom: photo metadata import for Scratchpads and Drupal
In a recent paper I published (EXIF Custom: Automatic image metadata extraction for Scratchpads and Drupal) I described a Drupal module I have written that allows for the import of metadata embedded within images to Drupal fields. This, along with bulk image upload tools, allows for rapid publication of images.
The introduction to the paper is reproduced here:
"The use of embedded image metadata is becoming widespread in the biodiversity informatics community (e.g. Stafford et al. 2010 & Tulig et al. 2012), and is frequently used to describe the subject and licencing of images as well as for recording the 'tombstone metadata' (e.g. Introduction to Metadata) - when the image was created, last edited, who created it, and where and how it was created.
The eMonocot project (http://about.e-monocot.org) makes use of the Scratchpads (Smith et al. 2011) infrastructure as a tool for collecting, curating, and creating content to be harvested by the eMonocot portal (http://e-monocot.org). As part of this project hundreds of images with embedded metadata are being uploaded to a number of different Scratchpads, combined with images directly uploaded by partner communities, and exported en masse to the portal. For this to be technically feasible at scale images from varied, disparate sources need to have their metadata standardised as part of the bulk upload process.
There are three widespread image metadata formats that can be handled by this module. A subset of the EXIF standard (Camera and Imaging Products Association Standardization Committee 2010) specifies a method for tagging of images with metadata. This is widely used by device manufacturers to record both the make and model of the image capture device and also the device's settings when the image was captured (e.g. focal length, flash duration). The eXtensible Metadata Platform (XMP) was originally developed by Adobe Systems Incorporated and later adopted by the International Standards Organisation as ISO 16684-1:2012. It uses a data model defined in Adobe 2012 which is serialised in XML when embedded into files. The International Press Telecommunications Council defines the IPTC Core and Extension metadata standards (IPTC 2010).
An existing Drupal module, Exif (https://drupal.org/project/exif), provides a mechanism for displaying embedded image metadata on Drupal nodes, but does not provide a mechanism for mapping the metadata into fields. The import of embedded metadata into Scratchpads/Drupal fields is a requirement of the eMonocot project and is useful for the wider Scratchpads community as it allows for these data to be easily used by other Drupal modules (e.g. Views - https://drupal.org/project/views) and in other Scratchpads-specific functions such as our on-going work on implementing the ability to export data via DarwinCore Archives (GBIF DarwinCore Archives). There is a comparison of these two modules (and potentially other similar Drupal modules) at https://drupal.org/node/1842686."
Read the full paper (Biodiversity Data Journal)
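The core of the module's job, mapping extracted tags onto fields, can be sketched in a few lines (tag and field names below are hypothetical, and real extraction would use an EXIF/XMP/IPTC library rather than a ready-made dict):

```python
def map_metadata(tags, mapping):
    """Map extracted image metadata tags onto Drupal field names.

    `tags` is the flat dict an EXIF/XMP/IPTC extraction library would
    return; `mapping` is the site maintainer's tag-to-field mapping.
    Tags without a mapping are ignored. All names are hypothetical.
    """
    return {field: tags[tag] for tag, field in mapping.items() if tag in tags}

tags = {"EXIF:Model": "EOS 5D", "IPTC:Caption": "Gryllotalpa vineae, male"}
mapping = {"IPTC:Caption": "field_caption", "XMP:Rights": "field_licence"}
fields = map_metadata(tags, mapping)
# fields == {"field_caption": "Gryllotalpa vineae, male"}
```

The mapping itself is what a site maintainer configures; everything else happens automatically at upload time.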
Monday, 19 August 2013
Informatics Horizons: Building Highways in the Informatics Landscape
This is the second talk I gave at the Informatics Horizons event at the Natural History Museum. It gives a brief overview of the DarwinCore Archive as a method for sharing biodiversity data, and its advantages, primarily in supplying data to aggregators such as the eMonocot portal.
The slides are on SlideShare:
Thursday, 1 August 2013
Informatics Horizons: Scratchpads & Citizen Science
This is the first of my talks from the Informatics Horizons event at the Natural History Museum, London, covering some work we have been doing on linking Scratchpads and citizen science. The video of the talk is on YouTube:
... and the slides are on SlideShare:
The talk summarises what we have done with citizen science in Scratchpads so far, particularly the following projects and events:
The project to create a mobile app for recording observational data in Scratchpads is in collaboration with anymals+plants.
An example of an automated feed from user designed hardware might be along these lines: Open Source Data Logger.
Thursday, 22 November 2012
ViBRANT Citizen Science Workshop (24-25 January 2013)
Organised by Ed Baker (me) & Sarah Faulwetter to set a framework for future development of the Scratchpads BioBlitz profile (demo site) and the HCMR's ViBRANT deliverable of a Citizen Science module for Drupal.
Workshop Day 1: What can we learn from successful citizen science projects?
Morning (workshop participants & invited NHM staff)
Presentations from successful citizen science projects (background to each project and what has made it successful):
- COMBER (Citizens' Network for the Observation of Marine BiodivERsity), Hellenic Centre for Marine Research
- iSPOT, Open University / OPAL
- ExCiteS (Extreme Citizen Science), University College London
- Notes from Nature, Vizzuality / Global Canopy Programme
- Overview of other European Citizen Science projects, Sarah Faulwetter & Ed Baker
Afternoon (workshop participants)
Round table discussion on how ViBRANT and Scratchpads can participate in citizen science, with emphasis on:
- What would be useful for us to do, and how might we engage with existing projects?
- What user groups exist, how are they served by existing projects, and who can Scratchpads/ViBRANT target?
- What are the outcomes of these projects (fun/educational awareness/scientific data)?
- What quality of data can be collected?
- How can the data gathered be reused (Biodiversity Data Journal/GBIF/EoL)?
Workshop Day 2: Creating a citizen science plan for Scratchpads & ViBRANT
ViBRANT attendees
- Development plan for HCMR's citizen science module, and whether we can incorporate it into the BioBlitz profile
- Can we incorporate citizen science tools into Scratchpads in general (e.g. crowdsourced image transcription)?
Sunday, 18 November 2012
Playing with Flickr and CartoDB
Last Friday we had a ViBRANT sponsored workshop about CartoDB, the open source mapping and visualisation product from Madrid/New York based Vizzuality. The context of the workshop was possible integration of CartoDB with the Scratchpads and OBOE projects for visualising biological datasets. The notes for the workshop demonstrations are here; they are the basis of the work described below.
Not having a suitable dataset to hand I have been playing with making maps of the photos I have shared on Flickr. Flickr does provide a map view of a user's photographs (here's mine) although it is very limited in functionality - and unless you only have a handful of photographs you can't get a map view of all of your photographs.
I have previously visualised my Flickr stream by hacking the Drupal flickrsync module to save geolocation data with the Location module. Even with clustering, the map points for 7,000+ images are slow to load: Drupal Flickr map of my photographs. The plus side of this work was that, by simply modifying the output of the view, I could get a CSV file of my Flickr stream which I easily imported into CartoDB.
The basic map produced by CartoDB from this file is below:
Next I wanted to make a map of countries that are represented in my Flickr stream (perhaps I really wanted to play with PostGIS and polygons). This required downloading a shape file of all the countries from thematicmapping and uploading the file to a new table in CartoDB (CartoDB will accept the URL to the zip file so you can do it without downloading the file if you choose). The following SQL was applied to the world countries table:
This results in the following map:
Finally using some PostGIS I was able to make this map a little more accurate by splitting the countries into separate polygons (e.g. separating Hawaii from the continental United States, Northern Ireland from Great Britain).
Here's the new map:
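The SQL itself is not preserved in this copy of the post, but conceptually the country map boils down to a point-in-polygon aggregation - in PostGIS something along the lines of ST_Contains (an assumption, not the original query). A pure-Python sketch of the same operation using the ray-casting test:

```python
def point_in_polygon(x, y, poly):
    """Ray-casting test: is (x, y) inside the polygon given as a
    list of (x, y) vertices?"""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Count edges crossed by a ray heading in the +x direction.
        if (y1 > y) != (y2 > y):
            xcross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < xcross:
                inside = not inside
    return inside

def photos_per_country(points, countries):
    """Count how many (x, y) points fall inside each named polygon."""
    return {name: sum(point_in_polygon(x, y, poly) for x, y in points)
            for name, poly in countries.items()}
```

PostGIS does this with spatial indexes and proper geodetic geometry, which is why the splitting of multipolygons (e.g. with ST_Dump) matters for accuracy at country borders.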
Thursday, 8 November 2012
Drupal Developer | Natural History Museum, London
Become part of an expanding team of developers working at the cutting edge of information science and biodiversity research. The Natural History Museum London is recruiting a Drupal developer (fixed term until end of November 2013, £34,853 per annum plus benefits) to work on the Scratchpad project (http://scratchpads.eu) as part of a major effort to help researchers share and manage biodiversity data on the Web.
Key tasks and responsibilities include:
• Development and support of Drupal Modules and Themes
• Data parsing and content construction
• Supporting users in the development of their sites
• Interfacing with the user support team
Applicants should be able to work on their own initiative and be proficient in module development, theming and quality assurance. Mentored training and support will be provided. Successful applicants will work with members of the developer and user communities to manage and parse biodiversity data, in addition to helping with the design, construction and testing of Drupal modules and sites. The project includes opportunities for international travel as part of the development team.
Applicants should have at least 1-2 years' experience in Drupal development (versions 6 & 7), with hands-on experience configuring Views, CCK and other contributed Drupal modules. This includes working with PHP, MySQL, SQL, XML, HTML and CSS. If you have a profile page on Drupal.org, please make reference to this within your application, along with Drupal websites you have developed.
For job specific enquiries contact s.rycroft@nhm.ac.uk
Absolutely, Positively, Strictly - NO RECRUITMENT AGENCIES.
For a full job description and to apply online please visit the Natural History Museum website. http://www.nhm.ac.uk/jobs
Closing date: 30th November 2012
Tuesday, 14 August 2012
Comparison of IPTC, EoL DwC Media & Audubon Core
As part of the eMonocot and Scratchpads projects I have been doing some research to help us decide what metadata we will allow users to add to media by default. For images the International Press Telecommunications Council (IPTC) standard is generally considered the de facto choice. While it forms a good basis for curating the metadata of biological images and videos, by itself it is inadequate. Two schemes for extending this basis are the EoL Media DarwinCore extension and the more comprehensive Audubon Core (a proposed TDWG standard).
In order to compare and contrast these three standards to aid in our decision making I created this spreadsheet which may be of use to others.
Monday, 9 July 2012
Part Time e-taxonomy Support Specialist, Natural History Museum (closing today)
Become part of an expanding team of developers and informaticians working at the cutting edge of information and biodiversity research. The Natural History Museum London is recruiting an e-Taxonomy support specialist (14 month, part time, £16,403 per annum, pro rata equivalent to £27,339) as part of a major effort to help researchers share and manage biodiversity data on the Web.
Key tasks and responsibilities include:
- Run the Scratchpad helpdesk, respond to user queries
- Develop training courses and assist in their delivery
- Maintain the on-line, context-sensitive help system
- Develop a personal Scratchpad on a taxonomic topic
For job specific enquiries contact vince@vsmith.info. For a full job description and to apply online please visit the Natural History Museum website. http://www.nhm.ac.uk/jobs
Absolutely, Positively, Strictly - NO RECRUITMENT AGENCIES. Closing date: 9th July 2012
Two new Drupal developer posts (closing today)
Become part of an expanding team of developers working at the cutting edge of information science and biodiversity research. The Natural History Museum London is recruiting two junior/mid-level Drupal developers (18 month contracts, £34,508 per annum plus benefits) as part of a major effort to help researchers share and manage biodiversity data on the Web.
Key tasks and responsibilities include:
- Development and support of Drupal Modules and Themes
- Data parsing and content construction
- Supporting users in the development of their sites
- Interfacing with the user support team
Applicants should have at least 1-2 years' experience in Drupal development (versions 6 & 7), with hands-on experience configuring Views, CCK and other contributed Drupal modules. This includes working with PHP, MySQL, SQL, XML, HTML and CSS. If you have a profile page on Drupal.org, please make reference to this within your application, along with Drupal websites you have developed.
For job specific enquiries contact vince@vsmith.info. For a full job description and to apply online please visit the Natural History Museum website. http://www.nhm.ac.uk/jobs
Absolutely, Positively, Strictly - NO RECRUITMENT AGENCIES. Closing date: 9th July 2012
Friday, 1 June 2012
Managing Scratchpads tools in eMonocot
Originally published on the eMonocot blog: Managing Scratchpads tools
As I mentioned before, one of the things that tailoring Scratchpads to a particular community or project allows us to do is to develop functionality that is specific to that community or project. Within the eMonocot project we have developed some functionality that is useful across all of the eMonocot Scratchpads (e.g. the IPNI webservice) while other parts (e.g. the Swiss Orchid Foundation images) are useful only to a few of the Scratchpads.
In order to allow site maintainers to pick and choose which subset of these functions they would like, I have added a number of the optional features to the Scratchpads Tools feature, making it easy for site maintainers to add/remove these functions with a single click.
Saturday, 26 May 2012
Using IPNI to autocomplete publication names
Originally posted on the eMonocot blog: Using IPNI to autocomplete publication names
One of the advantages of making a custom Scratchpads profile for a particular group of users is that this allows us to tailor the functionality of a site to the specific needs of a particular community or project.
Scratchpads have been developed to be neutral to the nomenclatural codes (unlike SpeciesFile for example, which is developed to precisely follow the International Code of Zoological Nomenclature). While we want to keep as much as we can of what we do within eMonocot useful to zoologists, there are a few cases where the restriction to a subset of botany has allowed us to develop some useful botanically focused extensions. The inclusion of images from the World Orchid Iconography database is an example of this as I blogged about recently.
Another example, which we have released today, is a link to the International Plant Names Index (IPNI). IPNI has a database of botanical publications, and we now use their webservice to autocomplete various fields on the 'add bibliography' form on the eMonocot Scratchpads.
[Image: The IPNI publication autocomplete function in use]
Of course this integration with IPNI has applications for many non-eMonocot botanical Scratchpads - a good example of how the eMonocot project can contribute to the wider Scratchpads community.
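Stripped of the Drupal form machinery, autocomplete boils down to matching a typed prefix against publication titles. A hedged sketch (in the real module the lookup goes to the IPNI webservice; here a cached list stands in, and all names are illustrative):

```python
def autocomplete(prefix, titles):
    """Case-insensitive prefix match over cached publication titles,
    standing in for the live IPNI webservice lookup."""
    p = prefix.lower()
    return sorted(t for t in titles if t.lower().startswith(p))
```

For example, `autocomplete("kew", ["Kew Bulletin", "Kew Record", "Taxon"])` returns `["Kew Bulletin", "Kew Record"]`; the webservice version simply defers this matching to IPNI's own index.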
Monday, 19 September 2011
Thursday, 15 September 2011
Parent/child spreadsheet from Drupal taxonomy
So I don't forget:
[EDIT] This doesn't get the root term of a classification (one with no parent) - see comments for a way that does.
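The query itself has not survived in this copy of the post. As a hedged reconstruction (not the original), Drupal 6 stores terms in term_data and their links in term_hierarchy, so a parent/child listing is a self-join, demonstrated here with SQLite; note that the inner join drops root terms (parent = 0), exactly the caveat in the edit note above.

```python
import sqlite3

# Toy Drupal 6-style taxonomy tables (schema simplified).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE term_data (tid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE term_hierarchy (tid INTEGER, parent INTEGER);
INSERT INTO term_data VALUES (1, 'Orthoptera'), (2, 'Gryllidae'), (3, 'Gryllus');
INSERT INTO term_hierarchy VALUES (1, 0), (2, 1), (3, 2);
""")

# Parent/child pairs. The INNER JOIN on h.parent = p.tid silently
# drops root terms (parent = 0); a LEFT JOIN would keep them.
rows = conn.execute("""
    SELECT p.name AS parent, c.name AS child
    FROM term_data c
    JOIN term_hierarchy h ON c.tid = h.tid
    JOIN term_data p ON h.parent = p.tid
    ORDER BY p.name
""").fetchall()
```

Exported as CSV, `rows` gives exactly the two-column parent/child spreadsheet described.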
Sunday, 13 December 2009
Bioinformatics and Norwegian pop bands
In the supplementary file to the recent paper on the Scratchpads project published in BMC Bioinformatics the Drupal module ahah_action apparently has the functionality required to provide 'a Norwegian pop band'.
Copyright Ed Baker