
Metadata at the data collection phase; metadata-driven architecture

Metadata is data about data. Metadata is used to facilitate the understanding, use and management of data.

 It is generally acknowledged that metadata play a decisive role both in satisfying statistics users’ growing quality requirements and in increasing the efficiency of the internal production processes within an NSO (national statistical organization).

   The following documents relate to this subject:

“IMS (Integrated Metadata System) – An Architecture for an Expandable Metadata Repository to Support the Statistical Life Cycle”

Prepared by Guenther Zettl, Statistics Austria, for the
UNECE Workshop on the Common Metadata Framework (Vienna, Austria, 4-6 July 2007)

 Executive Summary

In recent years relevant technical publications have repeatedly stressed that the implementation of metadata systems must be founded on a comprehensive and general model of statistics production and on a long-term master plan (the term “metadata strategy” is often used in this context). Paying too little attention to these preconditions leads to metadata systems which are linked neither with each other nor with the data they document, and which lack the ability to cooperate with each other. Often, the same information is stored repeatedly, which makes it difficult to keep the metadata consistent and causes unnecessary effort and costs. In the worst case, the resulting applications rely on mutually incompatible concepts and models, making their subsequent integration an extremely demanding if not impossible task.

The problems mentioned above are not unknown at Statistics Austria. As presumably in every other NSI (national statistical institute), numerous metadata-related cross-domain projects have been implemented over time. These projects were often carried out without relation to each other, thus creating more or less isolated solutions.

  In several of its comments the Statistical Council has explicitly drawn attention to the importance of delivering comprehensive metadata and of increasing the statistical system’s coherence, and has demanded the development of a metadata repository. In this context it has also underlined the central role which the IT department should fulfil in “implementing the requirements repeatedly voiced by the Statistical Council for uniform information delivery, increased quality, enhanced timeliness, easier data access and provision of more comprehensive metadata”.

  In no small part due to the Statistical Council’s emphasis on these matters, an IT project was commenced in 2006, the goal of which was to conceive an integrated metadata repository and to prepare an overall plan for implementing such an information system in the form of sub-projects. The system architecture elaborated in this project goes under the working title of IMS (“Integrated Metadata System”) and is presented in abbreviated form in this paper.

“Integrated Metadatabase (IMDB) - A metadata repository to support the survey life cycle”

Prepared by Alice Born, Statistics Canada, for the
UNECE Workshop on the Common Metadata Framework (Vienna, Austria, 4-6 July 2007)

  Executive Summary

Statistical metadata is relevant to all stages of statistical processing, and a growing number of statistical agencies view statistical metadata as part of the whole survey life cycle, an approach commonly known as end-to-end (E2E) metadata.

At Statistics Canada, the Integrated Metadatabase (IMDB), a corporate repository and registry of statistical metadata, is based on a model that supports the metadata required for the complete survey life cycle, from planning and design of a survey to archiving of master data files. The objective of this paper is to describe the metadata model that is being developed at Statistics Canada to support the metadata requirements of the agency’s 590 active surveys and statistical programs. Although the metadata was first developed to meet the requirements for disseminating statistical data, there is growing internal pressure to reuse existing metadata and the administration layer of the model in other phases of the survey life cycle.

 Currently, Canada is developing the administered items for the questionnaire part of the metadata model as well as expanding the model for archiving data. The IMDB model is based on the ISO/IEC 11179 Metadata Registries and the Corporate Metadata Repository (CMR) from the U.S. Census Bureau. The CMR model consists of a data dimension model, business dimension model, administration and document dimension model, and terminology and classification dimension model.

 For purposes of this paper, Statistics Canada’s application in the IMDB of the CMR is described in detail: the data dimension model, business dimension model and questionnaire model, which links the business and data dimensions. Also, registration and classification of the metadata are described.
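
The registry idea behind ISO/IEC 11179 can be illustrated with a minimal Python sketch. It assumes the standard's basic decomposition of a data element into a data element concept (object class plus property) and a value domain. All class names, field names and status values below are hypothetical and are not taken from the IMDB or the CMR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElementConcept:
    """What is measured: an object class combined with a property."""
    object_class: str   # e.g. "Person"
    property_name: str  # e.g. "Age"


@dataclass(frozen=True)
class ValueDomain:
    """How the concept is represented."""
    name: str
    datatype: str
    unit: str = ""


@dataclass
class DataElement:
    """An administered item: a concept paired with a value domain."""
    identifier: str
    concept: DataElementConcept
    domain: ValueDomain
    status: str = "recorded"  # illustrative registration status


class Registry:
    """Toy registry that stores each data element once, under a unique id."""

    def __init__(self):
        self._items = {}

    def register(self, element):
        if element.identifier in self._items:
            raise ValueError(f"{element.identifier} is already registered")
        self._items[element.identifier] = element

    def elements_for(self, concept):
        """Find all registered representations of one concept."""
        return [e for e in self._items.values() if e.concept == concept]
```

Registering "age in years" and "age group" against the same concept, and then calling `elements_for`, shows how a single definition can be reused across surveys instead of being stored repeatedly.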

“e-QUEST: A Metadata-Based System For Electronic Raw Data Collection”

Prepared by Wolfgang Koller, Günther Zettl (Statistics Austria) and Frederick Rennert (CSC Austria), for the
UNECE/Eurostat Work Session on Electronic Raw Data Reporting (Geneva, 13-15 February 2002).

   Executive Summary

  It is very likely that electronic means of data reporting will become a standard in the near future. In the case of organizations with a wide range of surveys to manage, however, the effort involved in developing, disseminating and maintaining electronic questionnaires tailored to specific surveys is prohibitive. What is needed is a complete infrastructure covering all phases of the data collection process from the preparation of electronic questionnaires by subject matter specialists up to the management and initial processing of the transmitted raw data.

In 1998 Statistics Austria initiated a research project with the mandate to develop a generic solution of this kind for electronic raw data collection, one that can be used for any survey by specifying all survey-related meta-information, including questionnaire forms and validation checks, in Extensible Markup Language (XML) format. The core component of this system, the “electronic questionnaire management system” e-Quest, is a product that is especially suited to the requirements of complex statistical surveys.

 The paper presents the architecture of the system (including system components that are used internally at Statistics Austria for the preparation of new electronic questionnaires and for the processing of incoming data) and describes basic design principles and how they were implemented in e-Quest. Also discussed are different types and layers of metadata in the context of electronic data reporting.
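
The general principle of a metadata-driven questionnaire can be sketched in a few lines of Python: the survey is described in XML, and one generic routine validates any response against that description. This is only an illustration under stated assumptions. The element names, attribute names and check types are hypothetical and do not reproduce the actual e-Quest format.

```python
import xml.etree.ElementTree as ET

# Hypothetical survey metadata; names are illustrative, not e-Quest's.
SURVEY_XML = """
<survey id="demo">
  <question name="employees" type="int" min="0" max="100000"/>
  <question name="turnover" type="float" min="0"/>
  <question name="region" type="str" required="false"/>
</survey>
"""

# Conversion functions for the datatypes used in the metadata.
CASTS = {"int": int, "float": float, "str": str}


def load_questions(xml_text):
    """Parse the survey metadata into a list of attribute dictionaries."""
    root = ET.fromstring(xml_text)
    return [dict(q.attrib) for q in root.findall("question")]


def validate(questions, answers):
    """Return a list of error messages for one set of reported answers."""
    errors = []
    for q in questions:
        name = q["name"]
        raw = answers.get(name)
        if raw is None or raw == "":
            # Questions are required unless the metadata says otherwise.
            if q.get("required", "true") == "true":
                errors.append(f"{name}: missing")
            continue
        try:
            value = CASTS[q["type"]](raw)
        except ValueError:
            errors.append(f"{name}: not a valid {q['type']}")
            continue
        if "min" in q and value < float(q["min"]):
            errors.append(f"{name}: below minimum {q['min']}")
        if "max" in q and value > float(q["max"]):
            errors.append(f"{name}: above maximum {q['max']}")
    return errors


QUESTIONS = load_questions(SURVEY_XML)
```

Adding a new survey in this sketch means writing a new XML description only; `validate` itself does not change, which is the point of keeping questionnaire forms and checks in metadata rather than in survey-specific code.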

“The IQML Model of Metadata for Data Capture”

Prepared by Joanne Lamb, University of Edinburgh, United Kingdom for the
UNECE/Eurostat Work Session on Electronic Raw Data Reporting (Geneva, 13-15 February 2002).

  Executive Summary

  IQML (a software suite and XML standard for intelligent questionnaires) is a shared-cost project in the European Union Fifth Framework Programme (IST-1999-29093). The partners consist of three commercial companies (Dimension EDI, Comfact AB and DESAN Marktonderzoek), two National Statistics Offices (CSO, Ireland and Statistics Norway) and two universities: the National Technical University of Athens, and the University of Edinburgh, which co-ordinates the project. The project started in February 2000, and ends in January 2003.

  There are five related software modules in IQML which share a common data model. In addition, certain modules utilise their own model, which maps to the common model, but which also contains features that are necessary for their particular functionality. The paper presents the common model, the related models, and the mappings between them. It describes the activities of the IQML group in disseminating the models to relevant standards groups, in the software development community and in the area of official statistics.
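
The relationship between a common model and a module-specific model that maps to it can be sketched as follows. This is a hypothetical illustration in Python: the class names, the fields and the "designer" module are assumptions made for the example and are not part of the IQML specification.

```python
from dataclasses import dataclass


@dataclass
class CommonQuestion:
    """Hypothetical common-model question shared by all modules."""
    identifier: str
    text: str
    datatype: str


@dataclass
class DesignerQuestion:
    """Hypothetical module-specific question with extra layout features."""
    identifier: str
    text: str
    datatype: str
    page: int = 1
    widget: str = "textbox"


def to_common(question):
    """Map a module-specific question onto the common model.

    Module-only features (page, widget) are dropped in this direction.
    """
    return CommonQuestion(question.identifier, question.text, question.datatype)


def from_common(common, page=1, widget="textbox"):
    """Build a module-specific question from the common model.

    Features the common model does not carry must be supplied or defaulted.
    """
    return DesignerQuestion(
        common.identifier, common.text, common.datatype, page, widget
    )
```

In this sketch the shared fields survive a round trip through the common model, while module-specific features have to be kept by the module that needs them.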


“Metadata-Based Systems and XML-Based Data Formats in the Production of Statistics in Germany”

Prepared by Michael Schäfer, Federal Statistical Office (DESTATIS), Germany, for the
UNECE/Eurostat Work Session on Electronic Raw Data Reporting (Geneva, 13-15 February 2002).

  Executive Summary

  The Federal Statistical Office is currently developing and implementing XML-based document types for statistical tables and data that will make it possible to handle and relate data and associated metadata in a consistent manner throughout the complete process of statistical production. Major long-term objectives include application and data integration, quality assurance and improvement, and archiving data in an application-neutral format.

In Germany, public surveys are conducted by the Federal Statistical Office in cooperation with the 16 Statistical Offices of the German states (“Länder”). The German Statistical Offices have a long tradition of developing common software products and establishing common standards. They have decided to use XML-based document types for the whole of statistical production, from data reporting to publication. Initial steps include the development of a document type for tables (TabML®, Table Markup Language) and of another one for statistical data (DatML®, Data Markup Language). The most important tools in use at the German Statistical Offices are SPLV®, a 4GL programming language for statistical purposes; STATSPEZ®, a metadata-based tool enabling non-programmers to specify and produce tables with automatically generated SPLV programs; and GENESIS®, a statistical information system. In addition, off-the-shelf products like SAS are used. All the metadata of an SPLV program can be extracted automatically. STATSPEZ stores metadata in the form of reusable objects. Shared STATSPEZ metadata databases are being set up by the German Statistical Offices to form a common metadata repository. SPLV and STATSPEZ alone account for about 90% of the regular tabulation across all Statistical Offices. As a consequence, ongoing and upcoming efforts to introduce XML document types are centred upon them.