Statistics

Improving the Speed and Accuracy of Agricultural Statistics

February 16, 2022 | Category: Farming and rural, Land use, Uncategorized, Working in statistics

This month, the team are looking at improving the speed and accuracy of our agricultural statistics. We have focussed on the computer code and on the time taken from receiving data to producing statistics, and we are learning from gaming and software development. In simple terms, we are adopting a way of coding that helps to remove errors. It also makes it easier for future developments to take place.

Improving Accuracy and Replacing ‘Copy and Paste’

Up until now, analysts have had to edit the code for the June Agricultural Census and upload the data into our analytical software. Think ‘copy and paste’ on an industrial scale! This can, and does, lead to errors, and analysts then spend a huge amount of time finding and correcting them.

The solution is Reproducible Analytical Pipelines or RAPs, a specific method for writing statistical analysis code. This blog on RAPs and data in government explains the method in greater detail.

We are adopting RAP principles to reduce or eliminate errors, and to identify more easily any that remain. RAP also makes the code easier to understand. Done properly, this will save time on error checking and data processing, and future improvements will be easier to make.
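To give a flavour of the idea, here is a minimal sketch in Python of a pipeline that replaces manual steps with small functions run in a fixed order. It is an illustration only, not our production code: the function names, the column names (`holding_id`, `item`, `area_ha`) and the sample figures are all invented for the example.

```python
import csv
import io
from collections import defaultdict


def load_returns(text):
    """Read census-style returns from CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def validate(rows):
    """Keep rows with a usable, non-negative area; collect the rest as errors."""
    clean, errors = [], []
    for row in rows:
        try:
            area = float(row["area_ha"])
        except (KeyError, TypeError, ValueError):
            errors.append(row)
            continue
        if area < 0:
            errors.append(row)
        else:
            clean.append({**row, "area_ha": area})
    return clean, errors


def summarise(rows):
    """Total area per item, returned in alphabetical order of item."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["item"]] += row["area_ha"]
    return dict(sorted(totals.items()))


def run_pipeline(text):
    """Run every step in a fixed order so each run is done the same way."""
    clean, errors = validate(load_returns(text))
    return summarise(clean), errors


# Hypothetical sample data; the last row has an unusable area value.
SAMPLE = """holding_id,item,area_ha
1,barley,10.5
2,wheat,4.0
3,barley,2.5
4,wheat,not recorded
"""

totals, rejected = run_pipeline(SAMPLE)
print(totals)
print(len(rejected), "row(s) flagged for checking")
```

Because each step is a separate function, a problem row is flagged by the code rather than found by eye, and any one step can be improved later without touching the others.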

Another crucial benefit is that we can test the code with live data. At present, because of the way that missing data is handled, re-running code can yield results that differ each time it is run. Instead, we will feed data into a well-tested and annotated algorithm. An algorithm is just a fancy name for many different computing processes run one after the other or at the same time.
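As a small, hedged example of what "well-tested" can mean, the Python sketch below fills in missing values with a rule that depends only on the input, then checks that two runs give the same answer. The median rule, the function names and the pig counts are assumptions made up for this illustration; they do not describe the method used for the published statistics.

```python
from statistics import median


def impute_missing(values):
    """Replace None with the median of the recorded values.

    The median depends only on the input, so the result is the same on
    every run. This rule is an assumption for the example only.
    """
    recorded = [v for v in values if v is not None]
    if not recorded:
        raise ValueError("cannot impute: no recorded values")
    fill = median(recorded)
    return [fill if v is None else v for v in values]


def total_with_imputation(values):
    """Sum a series after filling in the missing entries."""
    return sum(impute_missing(values))


# Hypothetical pig counts from five holdings; None marks a missing return.
pig_counts = [120, None, 80, 100, None]

first_run = total_with_imputation(pig_counts)
second_run = total_with_imputation(pig_counts)

# A small automated check: identical input must give identical output.
assert first_run == second_run
print(first_run)
```

A check like the final `assert` can be run automatically every time the code changes, so a step that starts giving different answers on the same data is caught straight away.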

In theory, these changes should reduce the time taken to produce analysis. This will give our teams more time to go deeper into the data and learn more about what it is telling us.

More Open Code

And the team have taken it a step further. Again, similar to gaming and software development, the team are using a Git repository. Git is a version control system: team members can code in their own virtual space, then review and make changes without overwriting anyone else's work. The reviewed work is then ‘pulled’ into a final version.

Our aim is to “code in the open”, similar to the Scottish Crop Map. We will be storing our code on GitHub again, meaning any researcher or student will be able to go in and examine the raw code. Anyone can see how the data from the June Agricultural Census is turned into the final statistics. You will be able to examine the algorithms and even suggest improvements to them, so that we have effective and efficient code.

So far the team have practised on a series of ‘mini-projects’. First we used our pig data, then moved on to cereals, improving the speed and accuracy of the statistics. Results look promising, and the speed at which the data can be turned into tables, charts and analysis is already an improvement on our current system.

If this is our first step towards replacing the ageing systems and surveys with modern techniques, I am starting to get really excited!

Get Involved

Are you a researcher in agriculture or land use? Why not get involved? Contact the team and join us in one of our ‘Show and Tell’ sessions.

