{"id":5164,"date":"2024-09-03T12:42:18","date_gmt":"2024-09-03T12:42:18","guid":{"rendered":"https:\/\/blogs.gov.scot\/digital\/?p=5164"},"modified":"2024-09-04T16:41:43","modified_gmt":"2024-09-04T16:41:43","slug":"next-generation-metadata-catalogue-revolutionising-data-discovery-in-the-scottish-public-sector","status":"publish","type":"post","link":"https:\/\/blogs.gov.scot\/digital\/2024\/09\/03\/next-generation-metadata-catalogue-revolutionising-data-discovery-in-the-scottish-public-sector\/","title":{"rendered":"Next-Generation Metadata Catalogue: Revolutionising Data Discovery in the Scottish Public Sector"},"content":{"rendered":"\r\n<p><strong>Blog by Masood Alam, Chief Data Architect, and Catherine Ojo, Lead Data Scientist, from the Scottish Government&#8217;s Data Division.<\/strong><\/p>\r\n<p>In a time when data drives decision-making, the Scottish public sector is making a significant move with the prospective development of a state-of-the-art Federated Data Catalogue and Metadata Repository, using machine learning &amp; graph analytics. This pioneering project aims to revolutionise how public sector organisations manage, discover, and integrate data, setting a new benchmark for data accessibility and use.<\/p>\r\n<h2>The Problem Statement<\/h2>\r\n<p>The data architecture team has found that while there are some metadata catalogues in place across the organisation, there is a need for a more robust solution to better manage data governance and data management across different platforms.<\/p>\r\n<p>Like many governments and large organisations, the Scottish Government faces challenges in managing data effectively. We lack a complete overview of our data\u2014its location, ownership, or potential insights. 
Even when data is accessible, metadata is rarely documented, limiting the use of AI, advanced analytics, and evidence-based decision-making.<\/p>\r\n<p>To tackle this, the data architecture team worked with Gartner to understand trends in metadata management and maturity. According to Gartner&#8217;s &#8220;Metadata Management Technology Maturity&#8221; model, organisations range from Level 0 (Unaware) to Level 5 (Augmented):<\/p>\r\n<ul>\r\n<li><strong>Level 0 (Unaware):<\/strong> No standards, uncoordinated, project-based.<\/li>\r\n<li><strong>Level 1 (Inventory):<\/strong> Minimal data collection, &#8220;as is&#8221; accepted, separate tools.<\/li>\r\n<li><strong>Level 2 (Catalog):<\/strong> Technical descriptors, some data lineage, coordinated business descriptors, scheduled updates.<\/li>\r\n<li><strong>Level 3 (Proactive):<\/strong> Resolves critical assets, multiple definitions, technical taxonomy, trend analysis.<\/li>\r\n<li><strong>Level 4 (Active):<\/strong> Uses machine learning for profiling, content analysis, clustering, resource allocation metrics, and alerts.<\/li>\r\n<li><strong>Level 5 (Augmented):<\/strong> Machine learning by example, orchestrates recommendations and responses, infers new assets from use cases.<\/li>\r\n<\/ul>\r\n<p>Organisations without a metadata catalogue are typically at Level 0 or 1, highlighting the need for major improvements in data management to fully realise the value of their data and make better decisions. With the introduction of this catalogue, we aim to support many organisations in reaching Levels 4 and 5, where advanced metadata management and automation can significantly improve data discovery, governance, and overall effectiveness.<\/p>\r\n<h2>What is the Big Idea?<\/h2>\r\n<p>We aim to create the public sector&#8217;s first self-service, plug-and-play data catalogue. 
Using AI and knowledge graphs, we want to streamline data handling, enhance decision-making, and build a more flexible, data-driven public sector.<\/p>\r\n<p><strong>What\u2019s Under the Hood?<\/strong><\/p>\r\n<p>Our catalogue will feature cutting-edge technology:<\/p>\r\n<ol>\r\n<li><strong>Automatic Metadata Cataloguing &amp; Generation<\/strong>: Using pre-trained large language models to summarise text and generate metadata, machine learning to profile structured data and identify entities that appear under different names, and graph modelling for continuous catalogue improvement.<\/li>\r\n<li><strong>Active Data Discovery<\/strong>: Using knowledge graph models and other tools to detect and suggest relationships between datasets dynamically.<\/li>\r\n<li><strong>Automated Data Tagging<\/strong>: Generating dataset synonyms automatically with pre-trained large language models, for use in data classification and knowledge graphs.<\/li>\r\n<li><strong>DCAT-3 Metadata Standards<\/strong>: Implementing version 3 of the W3C Data Catalog Vocabulary (DCAT) to enable seamless metadata sharing between organisations.<\/li>\r\n<\/ol>\r\n<h2>Key proof of concept features<\/h2>\r\n<p>We plan to develop four key features as a proof of concept (PoC) to create a practical demonstrator with our vendor and trial partners.<\/p>\r\n<p><strong>1.1 Entity Recognition &amp; Mapping<\/strong><\/p>\r\n<p>The system will include an entity recognition and mapping module that uses multimodal large language models (LLMs) to automatically detect and suggest relationships across datasets.<\/p>\r\n<p><strong>1.2 Knowledge Graphs with Semantic Notes<\/strong><\/p>\r\n<p>The system will build knowledge graphs that:<\/p>\r\n<ul>\r\n<li>Visualise data relationships.<\/li>\r\n<li>Provide semantic notes to explain complex connections.<\/li>\r\n<\/ul>\r\n<p><strong>1.3 Dataset Classification and Metadata<\/strong><\/p>\r\n<p>Machine learning will:<\/p>\r\n<ul>\r\n<li>Automatically classify datasets.<\/li>\r\n<li>Generate metadata to enhance data 
discovery.<\/li>\r\n<\/ul>\r\n<p>It will serve as a metadata generation tool, allowing users to provide real-time feedback to improve results.<\/p>\r\n<p><strong>1.4 Real-Time DCAT-3 Conversion<\/strong><\/p>\r\n<p>The system will adopt DCAT-3 standards to improve metadata sharing across organisations, providing richer metadata descriptions.<\/p>\r\n<h2>Federated Approach and UK-Wide Integration<\/h2>\r\n<p>Our proposal for a federated approach to metadata search aligns with the UK Government&#8217;s plans for a National Data Library. By developing APIs to link our Scottish metadata repository with the UK Government, we will contribute to a National Metadata Repository, improving data discovery across the public sector.<\/p>\r\n<p>A federated model brings several benefits:<\/p>\r\n<ul>\r\n<li><strong>Local Control<\/strong>: Devolved governments and agencies can manage data in their own systems.<\/li>\r\n<li><strong>Better Discovery<\/strong>: Enables cross-catalogue searches for more comprehensive data finding.<\/li>\r\n<li><strong>Controlled Data Sharing<\/strong>: Facilitates data sharing while keeping local governance in place.<\/li>\r\n<\/ul>\r\n<h2>Finally<\/h2>\r\n<ol>\r\n<li><strong>Phased Implementation<\/strong><\/li>\r\n<\/ol>\r\n<p>We will begin with a Proof of Concept (PoC) with Scottish public sector organisations. This careful start lets us refine the system and prove its value before larger trials and a full rollout.<\/p>\r\n<ol start=\"2\">\r\n<li><strong>Why It Matters<\/strong><\/li>\r\n<\/ol>\r\n<p>Traditional data catalogues are expensive, inflexible, and require manual input from skilled Data Architects. 
Our Next-Generation Metadata Catalogue addresses these problems:<\/p>\r\n<ul>\r\n<li><strong>Cost-Effective<\/strong>: No high licensing fees.<\/li>\r\n<li><strong>Customisable<\/strong>: Easily tailored to public sector needs.<\/li>\r\n<li><strong>Open Source<\/strong>: Prevents vendor lock-in and encourages collaboration.<\/li>\r\n<li><strong>Standards-Compliant<\/strong>: Built with DCAT-3 standards for compatibility and future use.<\/li>\r\n<\/ul>\r\n<ol start=\"3\">\r\n<li><strong>The Future<\/strong><\/li>\r\n<\/ol>\r\n<p>This project goes beyond creating a data catalogue; it aims to build a more efficient, connected, and data-driven public sector. The Next-Generation Metadata Catalogue will unlock data potential, foster innovation, and improve public services in Scotland and beyond.<\/p>\r\n","protected":false},"excerpt":{"rendered":"<p>Blog by Masood Alam, Chief Data Architect, and Catherine Ojo, Lead Data Scientist, from the Scottish Government&#8217;s Data Division. In a time when data drives decision-making, the Scottish public sector is making a significant move with the prospective development of a state-of-the-art Federated Data Catalogue and Metadata Repository, using machine learning &amp; graph analytics. 
This&#8230;<\/p>\n","protected":false},"author":317,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[18,6],"tags":[107,442,441,95,336],"class_list":["post-5164","post","type-post","status-publish","format-standard","hentry","category-data","category-digital-scotland","tag-data","tag-data-architecture","tag-metadata","tag-scottish-government","tag-scottish-public-sector"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v26.8 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Next-Generation Metadata Catalogue: Revolutionising Data Discovery in the Scottish Public Sector - Digital<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/blogs.gov.scot\/digital\/2024\/09\/03\/next-generation-metadata-catalogue-revolutionising-data-discovery-in-the-scottish-public-sector\/\" \/>\n<meta property=\"og:locale\" content=\"en_GB\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Next-Generation Metadata Catalogue: Revolutionising Data Discovery in the Scottish Public Sector - Digital\" \/>\n<meta property=\"og:description\" content=\"Blog by Masood Alam, Chief Data Architect, and Catherine Ojo, Lead Data Scientist, from the Scottish Government&#8217;s Data Division. In a time when data drives decision-making, the Scottish public sector is making a significant move with the prospective development of a state-of-the-art Federated Data Catalogue and Metadata Repository, using machine learning &amp; graph analytics. 
This...\" \/>\n<meta property=\"og:url\" content=\"https:\/\/blogs.gov.scot\/digital\/2024\/09\/03\/next-generation-metadata-catalogue-revolutionising-data-discovery-in-the-scottish-public-sector\/\" \/>\n<meta property=\"og:site_name\" content=\"Digital\" \/>\n<meta property=\"article:published_time\" content=\"2024-09-03T12:42:18+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2024-09-04T16:41:43+00:00\" \/>\n<meta name=\"author\" content=\"Stewart Hamilton\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Stewart Hamilton\" \/>\n\t<meta name=\"twitter:label2\" content=\"Estimated reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"4 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/blogs.gov.scot\/digital\/2024\/09\/03\/next-generation-metadata-catalogue-revolutionising-data-discovery-in-the-scottish-public-sector\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/blogs.gov.scot\/digital\/2024\/09\/03\/next-generation-metadata-catalogue-revolutionising-data-discovery-in-the-scottish-public-sector\/\"},\"author\":{\"name\":\"Stewart Hamilton\",\"@id\":\"https:\/\/blogs.gov.scot\/digital\/#\/schema\/person\/fd47935c780321ad6c4ecbb2f10da552\"},\"headline\":\"Next-Generation Metadata Catalogue: Revolutionising Data Discovery in the Scottish Public Sector\",\"datePublished\":\"2024-09-03T12:42:18+00:00\",\"dateModified\":\"2024-09-04T16:41:43+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/blogs.gov.scot\/digital\/2024\/09\/03\/next-generation-metadata-catalogue-revolutionising-data-discovery-in-the-scottish-public-sector\/\"},\"wordCount\":833,\"commentCount\":6,\"keywords\":[\"Data\",\"Data architecture\",\"Metadata\",\"scottish government\",\"Scottish Public 
Sector\"],\"articleSection\":[\"Data\",\"Digital Scotland\"],\"inLanguage\":\"en-GB\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/blogs.gov.scot\/digital\/2024\/09\/03\/next-generation-metadata-catalogue-revolutionising-data-discovery-in-the-scottish-public-sector\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/blogs.gov.scot\/digital\/2024\/09\/03\/next-generation-metadata-catalogue-revolutionising-data-discovery-in-the-scottish-public-sector\/\",\"url\":\"https:\/\/blogs.gov.scot\/digital\/2024\/09\/03\/next-generation-metadata-catalogue-revolutionising-data-discovery-in-the-scottish-public-sector\/\",\"name\":\"Next-Generation Metadata Catalogue: Revolutionising Data Discovery in the Scottish Public Sector - Digital\",\"isPartOf\":{\"@id\":\"https:\/\/blogs.gov.scot\/digital\/#website\"},\"datePublished\":\"2024-09-03T12:42:18+00:00\",\"dateModified\":\"2024-09-04T16:41:43+00:00\",\"author\":{\"@id\":\"https:\/\/blogs.gov.scot\/digital\/#\/schema\/person\/fd47935c780321ad6c4ecbb2f10da552\"},\"breadcrumb\":{\"@id\":\"https:\/\/blogs.gov.scot\/digital\/2024\/09\/03\/next-generation-metadata-catalogue-revolutionising-data-discovery-in-the-scottish-public-sector\/#breadcrumb\"},\"inLanguage\":\"en-GB\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/blogs.gov.scot\/digital\/2024\/09\/03\/next-generation-metadata-catalogue-revolutionising-data-discovery-in-the-scottish-public-sector\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/blogs.gov.scot\/digital\/2024\/09\/03\/next-generation-metadata-catalogue-revolutionising-data-discovery-in-the-scottish-public-sector\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/blogs.gov.scot\/digital\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Next-Generation Metadata Catalogue: Revolutionising Data Discovery in the Scottish Public 
Sector\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/blogs.gov.scot\/digital\/#website\",\"url\":\"https:\/\/blogs.gov.scot\/digital\/\",\"name\":\"Digital\",\"description\":\"Updates from the Scottish Government&#039;s Digital Directorate\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/blogs.gov.scot\/digital\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-GB\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/blogs.gov.scot\/digital\/#\/schema\/person\/fd47935c780321ad6c4ecbb2f10da552\",\"name\":\"Stewart Hamilton\",\"description\":\"Communications and Engagement Officer\",\"url\":\"https:\/\/blogs.gov.scot\/digital\/author\/stewarthamilton\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Next-Generation Metadata Catalogue: Revolutionising Data Discovery in the Scottish Public Sector - Digital","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/blogs.gov.scot\/digital\/2024\/09\/03\/next-generation-metadata-catalogue-revolutionising-data-discovery-in-the-scottish-public-sector\/","og_locale":"en_GB","og_type":"article","og_title":"Next-Generation Metadata Catalogue: Revolutionising Data Discovery in the Scottish Public Sector - Digital","og_description":"Blog by Masood Alam, Chief Data Architect, and Catherine Ojo, Lead Data Scientist, from the Scottish Government&#8217;s Data Division. In a time when data drives decision-making, the Scottish public sector is making a significant move with the prospective development of a state-of-the-art Federated Data Catalogue and Metadata Repository, using machine learning &amp; graph analytics. 
This...","og_url":"https:\/\/blogs.gov.scot\/digital\/2024\/09\/03\/next-generation-metadata-catalogue-revolutionising-data-discovery-in-the-scottish-public-sector\/","og_site_name":"Digital","article_published_time":"2024-09-03T12:42:18+00:00","article_modified_time":"2024-09-04T16:41:43+00:00","author":"Stewart Hamilton","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Stewart Hamilton","Estimated reading time":"4 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/blogs.gov.scot\/digital\/2024\/09\/03\/next-generation-metadata-catalogue-revolutionising-data-discovery-in-the-scottish-public-sector\/#article","isPartOf":{"@id":"https:\/\/blogs.gov.scot\/digital\/2024\/09\/03\/next-generation-metadata-catalogue-revolutionising-data-discovery-in-the-scottish-public-sector\/"},"author":{"name":"Stewart Hamilton","@id":"https:\/\/blogs.gov.scot\/digital\/#\/schema\/person\/fd47935c780321ad6c4ecbb2f10da552"},"headline":"Next-Generation Metadata Catalogue: Revolutionising Data Discovery in the Scottish Public Sector","datePublished":"2024-09-03T12:42:18+00:00","dateModified":"2024-09-04T16:41:43+00:00","mainEntityOfPage":{"@id":"https:\/\/blogs.gov.scot\/digital\/2024\/09\/03\/next-generation-metadata-catalogue-revolutionising-data-discovery-in-the-scottish-public-sector\/"},"wordCount":833,"commentCount":6,"keywords":["Data","Data architecture","Metadata","scottish government","Scottish Public Sector"],"articleSection":["Data","Digital 
Scotland"],"inLanguage":"en-GB","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/blogs.gov.scot\/digital\/2024\/09\/03\/next-generation-metadata-catalogue-revolutionising-data-discovery-in-the-scottish-public-sector\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/blogs.gov.scot\/digital\/2024\/09\/03\/next-generation-metadata-catalogue-revolutionising-data-discovery-in-the-scottish-public-sector\/","url":"https:\/\/blogs.gov.scot\/digital\/2024\/09\/03\/next-generation-metadata-catalogue-revolutionising-data-discovery-in-the-scottish-public-sector\/","name":"Next-Generation Metadata Catalogue: Revolutionising Data Discovery in the Scottish Public Sector - Digital","isPartOf":{"@id":"https:\/\/blogs.gov.scot\/digital\/#website"},"datePublished":"2024-09-03T12:42:18+00:00","dateModified":"2024-09-04T16:41:43+00:00","author":{"@id":"https:\/\/blogs.gov.scot\/digital\/#\/schema\/person\/fd47935c780321ad6c4ecbb2f10da552"},"breadcrumb":{"@id":"https:\/\/blogs.gov.scot\/digital\/2024\/09\/03\/next-generation-metadata-catalogue-revolutionising-data-discovery-in-the-scottish-public-sector\/#breadcrumb"},"inLanguage":"en-GB","potentialAction":[{"@type":"ReadAction","target":["https:\/\/blogs.gov.scot\/digital\/2024\/09\/03\/next-generation-metadata-catalogue-revolutionising-data-discovery-in-the-scottish-public-sector\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/blogs.gov.scot\/digital\/2024\/09\/03\/next-generation-metadata-catalogue-revolutionising-data-discovery-in-the-scottish-public-sector\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/blogs.gov.scot\/digital\/"},{"@type":"ListItem","position":2,"name":"Next-Generation Metadata Catalogue: Revolutionising Data Discovery in the Scottish Public Sector"}]},{"@type":"WebSite","@id":"https:\/\/blogs.gov.scot\/digital\/#website","url":"https:\/\/blogs.gov.scot\/digital\/","name":"Digital","description":"Updates from the Scottish 
Government&#039;s Digital Directorate","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/blogs.gov.scot\/digital\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-GB"},{"@type":"Person","@id":"https:\/\/blogs.gov.scot\/digital\/#\/schema\/person\/fd47935c780321ad6c4ecbb2f10da552","name":"Stewart Hamilton","description":"Communications and Engagement Officer","url":"https:\/\/blogs.gov.scot\/digital\/author\/stewarthamilton\/"}]}},"_links":{"self":[{"href":"https:\/\/blogs.gov.scot\/digital\/wp-json\/wp\/v2\/posts\/5164","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blogs.gov.scot\/digital\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.gov.scot\/digital\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.gov.scot\/digital\/wp-json\/wp\/v2\/users\/317"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.gov.scot\/digital\/wp-json\/wp\/v2\/comments?post=5164"}],"version-history":[{"count":0,"href":"https:\/\/blogs.gov.scot\/digital\/wp-json\/wp\/v2\/posts\/5164\/revisions"}],"wp:attachment":[{"href":"https:\/\/blogs.gov.scot\/digital\/wp-json\/wp\/v2\/media?parent=5164"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.gov.scot\/digital\/wp-json\/wp\/v2\/categories?post=5164"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.gov.scot\/digital\/wp-json\/wp\/v2\/tags?post=5164"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}