The Digital Catapult wants to hear from small or medium-sized businesses interested in developing innovative approaches to digitising scientific collections for our Pit Stop with Cisco and the Natural History Museum, London, taking place in February.
Natural history collections hold critical information needed to tackle the fundamental scientific and societal challenges of our time – from conserving the biodiversity on which our wellbeing and our planet’s health depend, to finding new ways of combating disease and extracting mineral resources.
At present this information is locked away within hundreds of millions of specimens, labels and archives distributed across the globe, available to just a handful of scientists. The Natural History Museum London (NHM) is embarking on the boldest digitisation project of its type in the world.
Its collection comprises 80 million objects and is widely considered one of the most internationally important natural history resources because of its taxonomic, geographical and historical breadth. For the first time in more than 250 years of natural history collecting, digital technologies offer the opportunity to unlock this treasure trove of information at scale.
The goal over the next five years is to digitise 20 million specimens – a quarter of the collections – and apply the resulting data to scientific research. This effort will focus primarily on the Museum’s entomological, botanical and palaeontological collections, imaging and transcribing data about millions of natural history specimens and making the collections openly available to global scientific, entrepreneurial, academic and public audiences.
So far, approximately three million specimens are in digital form, and the NHM is looking to automate the digitisation process. Digitising the NHM collection is a multi-disciplinary process that demands high throughput and complex workflows. The ambitious targets mean the Museum must look to innovative solutions that maximise the efficiency of data capture and handling.
The Pit Stop will bring together experts from a range of sectors to help address the following key challenges:
1. Specimen metadata transcription: exploring efficient ways of capturing text at scale. In the context of natural history digitisation, this normally involves transcribing semi-structured specimen metadata from analogue (often handwritten) labels into digital records. Potential solutions to improve efficiency could include OCR (optical character recognition) for printed labels and HTR (handwritten text recognition). Manually digitised NHM records will be available as training data if required.
2. Data quality enhancement of legacy data and of output from any transcription process: looking at data cleaning methods and how these can be automated. This could involve natural language processing (NLP), machine learning or other text-processing techniques; over time it might also be addressed through gamification and crowdsourcing.
3. Text, data and literature mining and linking: enhancing the value of specimen data by extracting semantic information and linking it to relevant data, whether in published literature, in archives or online, and providing new metrics to track the impact and use of digital collections.
How can I get involved?
The Cisco Pit Stop with the NHM will investigate how data capture technologies, data analytics and new transcription processes can increase the efficiency and accuracy of data capture during the delivery of the Natural History Museum’s digitisation project. The Digital Catapult would like to hear from small or medium-sized businesses working on data capture, data analytics or transcription solutions applicable to large scientific collections, or operating in a related field.
The Pit Stop will connect selected innovators with key stakeholders from both the NHM and Cisco. Over one and a half days of collaborative workshops, participants will have the chance to work with these stakeholders and with industry experts.
Dates and times
The closing date for expressions of interest in this Pit Stop is Wednesday 3 February 2016. To apply, please fill in the short form below; our team will get back to you within 14 days of the call closing. The Pit Stop takes place on 25 – 26 February 2016.