Digital

Web archiving – what you need to know

November 29, 2018 | Category: Content Design, Digital Public Services

To celebrate World Digital Preservation Day (29 November), we’ve published guidance on web archiving.

Working with National Records of Scotland and National Library of Scotland – the organisations that carry out legal duties to archive web content – we’ve created new archiving guidance for content owners across the public sector.

Our aim is to make sure online information that is considered a public record is preserved for future use.

The guidance covers:

  • what web archiving is and who does this for the public sector in Scotland
  • why it’s important for users
  • tips to make websites archive-friendly

The good news is most public authority websites are already being archived.

But the quality can vary, depending on the way websites are built and managed, so it’s important to contact National Records of Scotland and National Library of Scotland to make sure they know what to archive and how often.

The legislation

The Public Records (Scotland) Act 2011 was introduced following concerns that records management was failing to meet legal requirements and the needs of users.

Public authorities must have a formal records management plan, which includes detailing processes for archiving information worthy of permanent preservation. The Keeper of the Records of Scotland (National Records of Scotland) is responsible for making sure these plans meet the obligations set out in the Act.

The National Library of Scotland preserves websites under the Legal Deposit Libraries (Non-Print Works) Regulations 2013, creating collections of sites relevant to events or themes. An example of this is the 2014 Commonwealth Games.

An open and transparent government

Of course, it’s not just the legal requirements that make archiving an important issue.

Websites change frequently to meet the agenda of the present day. If users need to find information that is no longer on a live website, we need to make sure they can find it.

This helps maintain the online chain of official information and supports a good user experience, public transparency, and openness.

What you can do

Following best practice in content design can help make sure website content can be archived and found by end users.

However, the way your website is designed can prevent a web crawler from archiving your content. Features that can cause problems include:

  1. Content delivered using HTTP POST
  2. Search and filtering tools on websites – these cannot generally be captured, which creates issues if a user can only find content by using a search box or filtering
  3. Displaying content using complex JavaScript and other scripting languages – for example, HTML, URLs, frames and stylesheets created dynamically by JavaScript execution, as well as dropdown menus, checkboxes and ‘submit’ commands
  4. Database-driven content from an external or ‘back-end’ source
  5. Streamed and embedded audiovisual content, for example YouTube and Vimeo
  6. Flash objects and Rich Internet Applications
  7. Content which is on a different website – web crawlers will generally only look at content on one web domain at a time and will generally not follow external links
  8. Poor website structure, broken links and ‘orphaned’ content which is not linked to
  9. Streamed and embedded social media content (for example Twitter and Facebook feeds)
  10. Interactive maps
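
Some of the features listed above can be spotted by scanning a page's HTML before a crawler visits it. The Python sketch below is an illustration only, not a tool used by National Records of Scotland or National Library of Scotland. It uses the standard library to flag forms that submit with HTTP POST, object and embed elements, embedded third-party content, JavaScript-only links and links to other domains. The names ArchiveCheck, check_page and EMBED_HOSTS, and the list of hosts itself, are assumptions made for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical list of third-party hosts, for illustration only.
EMBED_HOSTS = ("youtube.com", "youtu.be", "vimeo.com", "twitter.com", "facebook.com")


class ArchiveCheck(HTMLParser):
    """Collects warnings about markup that web crawlers often cannot capture."""

    def __init__(self, site_host):
        super().__init__()
        self.site_host = site_host  # the domain the page belongs to
        self.warnings = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "form" and (attr.get("method") or "").lower() == "post":
            self.warnings.append("form submits with HTTP POST")
        elif tag in ("object", "embed"):
            self.warnings.append(f"<{tag}> element (possible Flash or plugin content)")
        elif tag == "iframe":
            host = urlparse(attr.get("src") or "").netloc.lower()
            if any(host == h or host.endswith("." + h) for h in EMBED_HOSTS):
                self.warnings.append(f"embedded third-party content from {host}")
        elif tag == "a":
            href = attr.get("href") or ""
            host = urlparse(href).netloc.lower()
            if href.lower().startswith("javascript:"):
                self.warnings.append("link that only works through JavaScript")
            elif host and host != self.site_host:
                self.warnings.append(f"external link to {host} (may not be followed)")


def check_page(html, site_host):
    """Return a list of warnings for one page of HTML."""
    parser = ArchiveCheck(site_host)
    parser.feed(html)
    parser.close()
    return parser.warnings
```

Calling check_page(page_html, "www.example.gov.scot") returns a list of plain-text warnings, where the host name is a placeholder. A check like this only reads static markup, so it cannot detect content that is built entirely by JavaScript, search-only navigation or orphaned pages. Those still need a manual review or advice from the archiving organisations.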

Contact

Contact National Records of Scotland and National Library of Scotland for more information or for advice on your archiving policy.



