How the Data Science Accelerator sparked a solution to meet a Covid data challenge
Blog by Joyce Dalgleish, Business Change Manager, Scottish Public Sector Analytical Collaborative.
The Data Science Accelerator is a 12-week development programme that gives public sector analysts the opportunity to build their data science skills through project-based mentoring. Delivered by the Scottish Public Sector Analytical Collaborative (SPACe), the accelerator is helping to develop data science capability in Scotland, speeding up the adoption of new tools and innovative practices. As we are about to go into the accelerator for 2022, it seemed like the perfect time to share a story from one of our alumni.
Martin Reid, a senior analyst working in Public Health Scotland (PHS), applied to the Data Science Accelerator in 2019 with a proposal to explore ways of removing the human intervention needed to minimise errors in the record linkage process that assigns Community Health Index (CHI) numbers.
The CHI number enables PHS to link internal and external datasets for analysis and is central to research, Official Statistics and health and care decision making. Linkage for CHI uses probabilistic matching, which estimates the likelihood that two records belong to the same individual. When this identifier is absent, or where quality issues exist, the challenge of linking datasets increases. Where potential matches exhibit dissimilarity, or a one-to-many relationship exists, the match must be confirmed by a member of the linkage team so that false positive and false negative matches are minimised.
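To make the idea of probabilistic matching more concrete, here is a minimal sketch in Python. It is not the PHS method: it assumes a simple Fellegi-Sunter style score, and the field names, the m and u probabilities and the two thresholds are all hypothetical values chosen for illustration. Pairs that score between the thresholds are the ones that would be passed to a person for confirmation.

```python
import math

# Hypothetical per-field probabilities (illustrative only, not PHS values).
# m = P(field agrees | records are the same person)
# u = P(field agrees | records are different people)
FIELD_PARAMS = {
    "surname": (0.95, 0.01),
    "forename": (0.90, 0.02),
    "date_of_birth": (0.98, 0.003),
    "postcode": (0.80, 0.005),
}


def match_weight(rec_a, rec_b):
    """Sum log2 agreement/disagreement weights over the compared fields."""
    total = 0.0
    for field, (m, u) in FIELD_PARAMS.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            # Simplification: a missing value adds no evidence either way.
            continue
        if a == b:
            total += math.log2(m / u)              # agreement raises the score
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement lowers it
    return total


def classify(weight, lower=0.0, upper=15.0):
    """Assumed thresholds: scores between them need a human decision."""
    if weight >= upper:
        return "match"
    if weight <= lower:
        return "non-match"
    return "clerical review"


record = {"surname": "SMITH", "forename": "ANNE",
          "date_of_birth": "1980-02-01", "postcode": "EH1 1AA"}
# Same surname and date of birth, different forename and postcode.
candidate = {"surname": "SMITH", "forename": "ANN",
             "date_of_birth": "1980-02-01", "postcode": "EH2 2BB"}

print(classify(match_weight(record, candidate)))
```

In this sketch an exact copy of a record scores well above the upper threshold, while a pair that agrees on only some fields lands in the middle band. That middle band is the human confirmation step described above.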
Martin wanted to use the accelerator to establish the most appropriate machine learning model to undertake that human confirmation task. Improving the existing methodology in this way would be of immense benefit to PHS – saving resources, improving reputation and increasing customer confidence.
During the accelerator, Martin worked on a random forest classifier using Python open-source software which delivered good results. However, post-accelerator he started to explore how he might use R to do deterministic matching. PHS were growing their R skills and building this in R would put Martin’s project on a more sustainable footing.
Martin had no experience of programming in R, but as he describes it, the accelerator had “lit a fire” within him that gave him the confidence to rewrite his original project. He tested his work against the human review and found that his program agreed with the human decision 95% of the time, with the added benefit of consistency.
When PHS began to work on Covid, the volume of test data, together with its velocity and veracity, gave the linkage team a perfect opportunity to deploy the project. The automated approach allowed the team to meet the demands placed on it and became the only way to keep up with the Covid test data.
Scott Wilson, Principal Information Analyst, Public Health Scotland, commented:
“This development has been of huge benefit to our team and has been in use for a major section of the Covid work allocated to us. We estimate that this product has made efficiency savings of around 1.5 whole time equivalent staff, which has meant that this staff resource can be used for other projects within the team, allowing business as usual work to continue alongside the additional Covid work.”
Martin added:
“Following on from the Covid work, I’ve gone on to create a unique patient reference number seeding workflow in my new job for the spatial and analytics team. I continue to connect with my former colleagues in the linkage team to keep sharing challenges and successes. I realise now that what’s achievable is limited mainly by imagination.”
This approach has also made it possible to deliver large jobs with a quick turnaround, and the learning has been extended to other areas of the business.
Martin’s story is a great example of the positive and ongoing effects of the Data Science Accelerator. It shows how the experience builds confidence and knowledge, enabling participants and their organisations to deliver real efficiencies.
To find out more about taking part, email: DataScienceAccelerator@gov.scot