Web archiving – what you need to know
To celebrate World Digital Preservation Day (29 November) we’ve published guidance on web archiving.
Working with National Records of Scotland and National Library of Scotland – the organisations that carry out legal duties to archive web content – we’ve created new archiving guidance for content owners across the public sector.
Our aim is to make sure online information that is considered a public record is preserved for future use.
The guidance covers:
- what web archiving is and who does this for the public sector in Scotland
- why it’s important for users
- tips to make websites archive-friendly
The good news is most public authority websites are already being archived.
But the quality can vary, depending on how websites are built and managed. It’s important to contact National Records of Scotland and National Library of Scotland so that they know what to archive and how often.
The Public Records (Scotland) Act 2011 was introduced following concerns that records management was failing to meet legal requirements and the needs of users.
Public authorities must have a formal records management plan, which sets out their processes for archiving information worthy of permanent preservation. The Keeper of the Records of Scotland (National Records of Scotland) is responsible for making sure these plans meet the obligations set out in the Act.
The National Library of Scotland preserves websites under the Legal Deposit Libraries (Non-Print Works) Regulations 2013, creating collections of sites relevant to events or themes. An example of this is the 2014 Commonwealth Games.
An open and transparent government
Of course it’s not just the legal requirements that make archiving an important issue.
Websites change frequently to meet the agenda of the present day. If users need to find information that is no longer on a live website, we need to make sure they can find it.
This helps maintain the online chain of official information and supports a good user experience, public transparency, and openness.
What you can do
Following best practice in content design can help make sure website content can be archived and found by end users.
However, the way your website is designed can prevent a web crawler from archiving your content. Features that can cause problems include:
- Content delivered using HTTP POST
- Search and filtering tools on websites – these cannot generally be captured, which creates issues if a user can only find content by using a search box or filtering
- Database-driven content from an external or ‘back-end’ source
- Streamed and embedded audiovisual content, for example YouTube and Vimeo
- Flash objects and Rich Internet Applications
- Content which is on a different website – web crawlers will generally only look at content on one web domain at a time and will generally not follow external links
- Poor website structure, broken links and ‘orphaned’ content which is not linked to
- Streamed and embedded social media content (for example Twitter and Facebook feeds)
- Interactive maps
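Some of the structural problems above, such as orphaned content, broken links and content hosted on another domain, can be checked before contacting an archive. The following is a minimal Python sketch, not an official tool. It assumes you already have a list of your pages and the absolute URLs each one links to; the function name `audit_site` and the `example.org` addresses are hypothetical and used only for illustration.

```python
from collections import deque
from urllib.parse import urlparse


def audit_site(pages, start_url):
    """Walk a site's links the way a single-domain crawler roughly would.

    pages: dict mapping each known page URL to the list of URLs it links to.
           Assumption: all links are absolute URLs.
    start_url: the page the crawl starts from, usually the home page.
    """
    domain = urlparse(start_url).netloc
    reachable, broken, external = set(), set(), set()
    queue = deque([start_url])

    while queue:
        url = queue.popleft()
        if url in reachable:
            continue
        reachable.add(url)
        for link in pages.get(url, []):
            if urlparse(link).netloc != domain:
                # On another domain: a crawler will generally not follow it.
                external.add(link)
            elif link not in pages:
                # Same domain, but no such page is known: a broken link.
                broken.add(link)
            elif link not in reachable:
                queue.append(link)

    # Known pages that no chain of links from the start page ever reaches.
    orphaned = set(pages) - reachable
    return {"orphaned": orphaned, "broken": broken, "external": external}


# Hypothetical site data for illustration only.
SITE = {
    "https://example.org/": [
        "https://example.org/about",
        "https://example.org/missing",
        "https://video.example.com/embed/1",
    ],
    "https://example.org/about": ["https://example.org/"],
    "https://example.org/old-report": [],
}

report = audit_site(SITE, "https://example.org/")
for issue, urls in report.items():
    print(issue, sorted(urls))
```

In this example the old report is flagged as orphaned because no page links to it, so a crawler starting from the home page would be unlikely to find it. A check like this does not cover search-only content, HTTP POST or interactive features, which still need to be reviewed by hand.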