UNITED NATIONS


STATISTICAL COMMISSION and ECONOMIC COMMISSION FOR EUROPE
CONFERENCE OF EUROPEAN STATISTICIANS
Forty-third plenary session
(Geneva, 12-15 June 1995)
CES/1995/R.5
7 November 1994

October 1994 Work Session on Statistical Data Editing

Note prepared by the secretariat


1. A work session on statistical data editing was held in Cork, Ireland, from 17 to 20 October 1994 and was kindly hosted by the Central Statistics Office of Ireland. It was attended by participants from Austria, Canada, Czech Republic, Croatia, Denmark, France, Greece, Hungary, Ireland, Italy, the Netherlands, Norway, Poland, Russian Federation, Slovenia, Sweden, United Kingdom and the United States. A representative of the Economic and Social Commission for Western Asia (ESCWA) also attended.

2. The provisional agenda was adopted.

3. Mrs. Dania P. Ferguson (USA) was elected Chair, and Mr. John Kovar (Canada) was elected Vice-Chair.

4. The following substantive topics were discussed at the meeting:

Recommendations for future work are given below; the other conclusions which the participants reached at the meeting on each of the above topics are reproduced (in English only) in the Annex of this note.

5. The topics were discussed on the basis of papers prepared by Canada (3), Croatia (4), Denmark, France (2), Netherlands (3), Poland, Slovenia, Sweden (5) and the USA (5).

6. The participants recommended that future work be undertaken by the Conference in the field of data editing. To assist them in formulating their recommendations, the secretariat informed the participants of the goals that the Conference of European Statisticians had identified for work in this field, and distributed the portion of the integrated presentation of international statistical work programmes in the ECE region relating to the project on statistical data editing.

7. The participants recommended that a further work session be convened in 1995/96, which would focus on the preparation of Volume 2 of the publication on data editing. They recommended, therefore, that the following text be included in the programme of work of the Conference of European Statisticians:

2.2 Statistical data collection and processing

Activities of ECE


ANNEX

a) DATA EDITING METHODS, TECHNIQUES AND SOFTWARE

Documentation: Working Papers by Croatia, the Netherlands, Slovenia and Sweden (2).

1. The work session was informed about the experiences of some countries in the application of optical readers in carrying out their 1990/91 censuses. The use of OCR technology, combined with automatic and computer-assisted coding, proved to be very efficient, saving both time and cost. Further, data captured through optical readers with one typist correcting errors had been found to be of higher quality than manually entered data. The work session expressed its desire to pursue work in this field in order to draw on the experience of those countries which have made use of the new technology, with a view to facilitating the next round of censuses in various countries.

2. Some countries also reported on the use of other methods and technologies, such as scanning, aimed at eliminating data entry work. Furthermore, computer-assisted self-interviewing using touch-tone data entry and electronic forms were reported. The work session also noted the use of client/server systems enabling access to common databases. Some participants mentioned that client/server systems are important because they make it possible to refer back to the historical time series that are very often needed for data editing.

3. A representative of Statistics Netherlands informed the meeting of the progress in the development of a new system for survey processing called Blaise III, the successor of Blaise II. While the earlier versions of Blaise focused only on data entry and data editing, Blaise III is now capable of managing all software needed in the individual phases of survey processing. The separate programs are considered as modules in a statistical control centre which plays the role of the switchboard according to the needs of individual statistical applications. Through unified metadata description, Blaise III can facilitate the reformatting of Blaise data to different data file formats (e.g. SPSS, SAS, Xbase).

4. The work session noted with interest the functional flexibility of Blaise III for survey processing in a PC environment. It expressed the view that it would be desirable that Statistics Netherlands present this new system to the next session of the Working Party on Electronic Data Processing (to be held in February 1995) which is charged with the consideration of managerial aspects of the whole process of statistical electronic data processing.

5. The work session was also informed of the graphical macro editing system recently developed by Statistics Sweden. The software was developed for the PC as a special application for the Short Periodic Employment Survey, using the Visual Basic programming language. The system provides for graphical, drill-down, interactive editing of outliers. A positive feature of this system was the relatively short time needed for its development. Although the system was developed for macro editing, it was pointed out that a similar approach could also be useful for micro level editing.

b) IMPACT OF DATA EDITING ON QUALITY OF DATA

Documentation: Working Papers by Canada, the Netherlands, Poland and Sweden.

6. Because of the large amount of time and resources spent on editing in the past, countries have begun to look at various possibilities of cutting down on editing while at the same time maintaining quality. The general feeling was that at present too much editing was being done and it was time to adopt new editing strategies.

7. The work session was informed of a number of country studies which have shown that complete editing did not significantly improve the quality of the data and, therefore, a good approach would be to use selective editing methods. The idea of selective editing is to identify records with high impact on the estimates or those with a high probability of having large errors and concentrate the editing efforts on them. This may be carried out with methods such as the use of Hidiroglou-Berthelot bounds, score functions, top-down approaches and others.
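As an illustration of one of the methods named above, the following is a minimal sketch of Hidiroglou-Berthelot bounds applied to period-to-period ratios. It is not taken from any of the working papers: the function name, the parameter names (u, a, c) and their default values are assumptions chosen for the example, and the data are invented. The sketch assumes strictly positive values in both periods.

```python
from statistics import median, quantiles

def hb_flags(previous, current, u=0.5, a=0.05, c=4.0):
    """Flag records whose period-to-period change lies outside HB bounds.

    previous, current: strictly positive values for the same units.
    u: exponent (0..1) controlling how much unit size matters (assumed default).
    a: guards against quartile distances that are too small (assumed default).
    c: width of the acceptance interval (assumed default).
    """
    ratios = [cur / prev for prev, cur in zip(previous, current)]
    r_med = median(ratios)

    # Symmetrizing transformation around the median ratio, so that
    # unusually small and unusually large ratios are treated alike.
    s = [1 - r_med / r if r < r_med else r / r_med - 1 for r in ratios]

    # Effect: the transformed ratio scaled by the size of the unit,
    # so that large units receive more attention.
    e = [si * max(prev, cur) ** u
         for si, prev, cur in zip(s, previous, current)]

    e_q1, e_med, e_q3 = quantiles(e, n=4, method="inclusive")
    d_q1 = max(e_med - e_q1, abs(a * e_med))
    d_q3 = max(e_q3 - e_med, abs(a * e_med))
    lower, upper = e_med - c * d_q1, e_med + c * d_q3

    # True marks a record to be reviewed; all others pass unedited.
    return [not (lower <= ei <= upper) for ei in e]

# Invented example: the last unit reports a tenfold increase.
previous = [100, 200, 150, 120, 180, 90, 110]
current = [105, 210, 160, 125, 190, 95, 1100]
flags = hb_flags(previous, current)
```

In this sketch only the flagged records would be referred for manual review, which is the sense in which editing effort is concentrated on records likely to affect the estimates.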

8. Nevertheless, it was cautioned that selective editing's applicability depends on the subject matter areas concerned. Although selective editing worked well in some areas of statistics, it has limitations in other areas such as in population censuses and demographic statistics. It was also pointed out that the results of these studies could not be considered as conclusive and generalized across countries since much depends on the quality of the source data and that in that respect, the situation differed from country to country.

9. The work session noted that no country suggested eliminating editing altogether, but stressed that over-editing must be avoided, in particular because extensive editing often introduces new errors into the data.

c) STATISTICAL DATA EDITING METHODS AND TECHNIQUES, VOL. 2

Documentation: Working Papers by Canada (2), Croatia (3), Denmark, France (2), Sweden (2) and the United States (5).

10. The Working Session considered papers dealing with the preparation of Volume 2. It was recommended that contributions originally submitted for the item of the provisional agenda proposed at the 1993 Work session, that is, "new technologies in data editing, data collection, automated coding, and imputation of missing values", also be considered. These papers served not only as a basis for discussion and exchange of experiences but also as candidate contribution papers for Volume 2.

11. The Working Session was informed that the ECE secretariat had finished the final editing of the publication "Statistical Data Editing Methods and Techniques", Volume 1. It was also reported that the Bibliography had been updated based on the references of the individual chapters and would be included in Volume 1. The secretariat offered to take up the responsibility of maintaining the Bibliography in the future. It was expected that Volume 1 would be ready by the end of October 1994, when it would be disseminated to the members of the Working Group.

12. A special meeting of experts was held after each plenary session, the goal being to plan strategies, prediscuss the content and organization of the work on Volume 2 and to prepare recommendations on that work for the plenary session. The experts were from Canada, Croatia, France, Italy, Netherlands, Sweden, the United Kingdom and the United States.

13. The planning group agreed that, for each chapter of Volume 2, one country should be appointed to be responsible for coordinating and pre-editing the chapter content. It was also felt that Volume 2 could not be considered as a purely theoretical work but should be well balanced with practical experiences and examples. Furthermore, the group agreed to cooperate with the secretariat in the coordination of the work on Volume 2 in future. The secretariat informed the expert group of the possibility of publishing Volume 2 as an official United Nations publication.

14. Based on the proposal of the group, the Work Session discussed the preparation of Volume 2. It was recommended that all papers should have a common structure as follows: First page (title, author(s), abstract and key-words list); Body of the paper; Annex A (references); Annex B (definition of key-words if different or not in Glossary (WP33)). It was also agreed that all papers should be submitted in WP 5.2 or ASCII format on 3 1/2 inch diskettes.

15. It was agreed that each chapter would have an introduction describing the objectives and how examples and/or applications would be presented in the chapter. The Work Session considered the contents of individual chapters and made recommendations. All working papers mentioned should be updated in accordance with the main orientation of individual chapters as specified below.

16. Chapter 0: Introductory chapter linking Volume 1 to Volume 2.

17. Chapter 1: How to evaluate data editing process (Sweden) - Concentration is on evaluation methods rather than specific applications. Descriptions of how to measure, audit and manage the process are sought.

Contributed papers: Denmark (WP no. 20); Sweden (part of WP no. 14); USA (WP nos. 30 and 32).

18. Chapter 2: How to design sets of edits (Italy) - How editing and imputation specifications are (or could be) defined in complex applications, regarding for example, strategic surveys such as Censuses, Labour Force Surveys, etc. Contributions are required especially containing:

The introduction to the chapter will contain results of edit logic survey of participating countries.

Contributed papers: USA - NASS (Room Paper No.1); USA - Bureau of Census (Application of Speer/discrete - a new paper); Italy (paper describing system Daisy - a new paper); Denmark (Application of Labour Force Survey - a new paper).

19. Chapter 3: How graphics may be applied to data editing (U.K.) - How to use graphical techniques to assist the editing process.

Contributed papers: Sweden (WP no. 35); Sweden (Experiences gained in New Zealand and Australia - a new paper); USA - Bureau of the Census (Julie BIENIAS - a new paper); UK (Summary of Esposito (ARLIES), Hughes, Cook - a new paper).

20. Chapter 4: How new technology impacts the data editing process (USA) - Focus should be on the impact of using new technology and not the technology itself.

Contributed papers: USA (WP no. 31); Sweden (WP no. 14); Slovenia (WP no. 8); Croatia (WP no. 2); Hungary (CAPI application - a new paper).

21. Chapter 5: What to do when an edit fails (Canada) - Studies should be expanded highlighting selective editing and respondent follow-up, as well as automatic imputation uses and innovations.

Contributed papers: Canada (WP nos. 10 and 16); Netherlands (WP nos. 11 and 17); Sweden (WP no. 12).

22. Chapter 6: How the computer can speed the coding process (France) - Papers should focus on: