Increasing the accuracy of identification of genetic test results

April 16, 2025 | Category: Data Science and Innovation Accelerator, Digital Scotland

Blog post by Laura Naismith, Data Management Officer, Public Health Scotland.

The Data Science and Innovation Accelerator is a programme for public sector organisations in Scotland that supports the innovative use of data to solve business problems or discover new opportunities. In this blog post, Laura Naismith, Data Management Officer at Public Health Scotland, discusses her 2024 Accelerator project to improve the identification of genetic conditions in test results.

‘Each year Public Health Scotland receives data on approximately 5000 antenatal and infant tests carried out in Scotland. This data arrives in 5 extracts from 4 laboratories. My project set out to improve the performance of an algorithm that identifies genetic conditions caused by an extra chromosome:

  • T13 – Patau’s Syndrome (extra chromosome 13)
  • T18 – Edwards’ Syndrome (extra chromosome 18)
  • T21 – Down’s Syndrome (extra chromosome 21)

When the algorithm identifies a record with these potential conditions, it is assessed and confirmed manually by disease registration staff. Increasing the accuracy of the algorithm would help to reduce the time spent on misidentified records.

Although the algorithm identifies the relevant information in many cases, it does have certain flaws. The main one is that it treats percentages, dates, or genetic code containing the numbers 13, 18 or 21 as a trisomy, even if the test results are negative. I needed to find a way to minimise these ‘noise’ records.
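As an illustration only, the false-positive problem can be sketched in a few lines of Python. This is not the project's code (the scripts described in this post were written in R and are not shown here); the sample strings, the pattern names and the helper functions are all assumptions made for the example. It contrasts a rule that fires on any standalone 13, 18 or 21 with a stricter rule that requires a trisomy keyword immediately before the number.

```python
import re

# Naive rule (assumed for illustration): any standalone 13, 18 or 21 is a hit.
NAIVE = re.compile(r"\b(13|18|21)\b")

# Stricter rule: the number must directly follow "Trisomy" or the prefix "T".
KEYWORD = re.compile(r"\b(?:trisomy\s*|T)(13|18|21)\b", re.IGNORECASE)

# Hypothetical free-text entries, not real laboratory data.
samples = [
    "Trisomy 21 detected",
    "Sample received 13/02/2021",
    "Risk estimate 18% following screening",
]


def naive_hits(text):
    """Return every chromosome number the naive rule would flag."""
    return NAIVE.findall(text)


def keyword_hits(text):
    """Return only the numbers that follow a trisomy keyword."""
    return KEYWORD.findall(text)


for text in samples:
    print(text, "| naive:", naive_hits(text), "| keyword:", keyword_hits(text))
```

In this sketch the date and the percentage are flagged by the naive rule but ignored by the keyword rule, while the genuine mention of Trisomy 21 is kept by both.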

In the early weeks of the Accelerator, I used the dedicated time to understand how each laboratory recorded suspected cases, using confirmed case data from 2019 to 2023. This revealed both patterns and variations in how text is recorded, which I realised were affecting results.

Anyone who works with free text will understand the challenges of analysing this data! One of the trickier issues I encountered was the impact of negatives in the free text field, such as ‘No evidence of Trisomy 13’, which the algorithm wrongly identified as a positive result for the condition.
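One common way to handle negated mentions like this is to look for a negation cue earlier in the same sentence. The Python sketch below illustrates that idea under stated assumptions: the list of cues, the sentence-splitting rule and the function name `positive_trisomies` are all hypothetical, and the post does not say that the project handled negation this way. It is deliberately crude and would miss negations that come after the mention.

```python
import re

# A trisomy mention such as "Trisomy 13" (pattern assumed for illustration).
TRISOMY = re.compile(r"\btrisomy\s*(13|18|21)\b", re.IGNORECASE)

# Hypothetical negation cues; a real list would come from the laboratories' wording.
NEGATION_CUE = re.compile(r"\b(?:no evidence of|negative for|no|not)\b", re.IGNORECASE)


def positive_trisomies(text):
    """Return trisomy numbers that are not preceded by a negation cue
    within the same sentence."""
    found = set()
    # Split into rough sentences on full stops and semicolons.
    for sentence in re.split(r"[.;]\s*", text):
        for match in TRISOMY.finditer(sentence):
            preceding = sentence[:match.start()]
            if not NEGATION_CUE.search(preceding):
                found.add(match.group(1))
    return sorted(found)


print(positive_trisomies("No evidence of Trisomy 13"))
print(positive_trisomies("Trisomy 21 confirmed. No evidence of Trisomy 13 or Trisomy 18."))
```

Here the first string yields no positive condition, and the second yields only 21, because both mentions in its second sentence follow the cue ‘No evidence of’.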

For my solution, I decided to create a new algorithm specific to each laboratory. My theory was that this would allow me to better understand how each laboratory highlights its suspected cases. In addition, future changes could be implemented easily, without affecting other laboratories’ data.

This gave me the opportunity to learn more about Regular Expressions (Regex). Regex proved to be a vital tool for filtering for specific conditions and creating a flagging system in R / Posit, which will also make it easy for our teams to access and update scripts in the future.
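The idea of a separate pattern per laboratory feeding a flagging system can be sketched as follows. This is a Python illustration rather than the R scripts described above, and everything specific in it is an assumption: the laboratory keys `lab_a` and `lab_b`, their patterns, and the `flag_record` helper are invented for the example.

```python
import re

# One pattern per laboratory (hypothetical names and wording conventions).
LAB_PATTERNS = {
    # Assumed: this laboratory writes the condition out in full.
    "lab_a": re.compile(r"\btrisomy\s*(13|18|21)\b", re.IGNORECASE),
    # Assumed: this laboratory uses the short codes T13, T18, T21.
    "lab_b": re.compile(r"\bT(13|18|21)\b"),
}


def flag_record(lab, free_text):
    """Return the sorted list of flags (e.g. ["T18"]) raised for one record,
    using only the pattern that belongs to the record's laboratory."""
    pattern = LAB_PATTERNS[lab]
    return sorted({f"T{number}" for number in pattern.findall(free_text)})


print(flag_record("lab_a", "Trisomy 18 suspected"))
print(flag_record("lab_b", "Result: T21"))
```

Keeping the patterns in one lookup per laboratory means a change to one laboratory's wording only touches that laboratory's entry, which is the maintainability benefit described above.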

During the Accelerator, I was mentored by Helen Price, an Operational Researcher in Scottish Government. One of the benefits of being mentored by Helen was that she doesn’t have a health background, which helped develop my skills in translating clinical data for a non-clinical audience. It was also great to hear her different perspective and bounce ideas off her.

By December, I had created five functioning scripts, one for each extract. These are semi-automatic, so records can still be reviewed for registration. By using Regex to locate patterns of suspected cases, I removed around 500 ‘no trisomy’ noise records across all extracts from 2019 to 2023. This equates to a 10% to 30% reduction per laboratory. Excitingly, the chance arose to use one script on a new submission during the programme. This learning will allow us to develop new scripts for other rare diseases and identify them more accurately.

Being part of the Accelerator helped to build my confidence in explaining complex issues to a lay audience, reaching out to new stakeholders, and working with a dataset I was unfamiliar with. It brought a wonderful opportunity to meet people and develop new skills.’

What a fantastic result for Laura! If you’ve been inspired by what Laura achieved, please visit: Data Science and Innovation Accelerator – Scottish Digital Academy and get your applications in by 28 April 2025.

If you are interested in supporting people like Laura to progress with their innovative ideas, we’d love you to consider volunteering as an Accelerator mentor. You’ll find all the information you need at Data Science and Innovation Accelerator – Mentor – Scottish Digital Academy.
