Risks and costs

Risks and costs include the expertise required; questions of data quality, potential misuse, and misunderstanding of the data by users; legal and ethical matters; and the need to maintain the trust and support of respondents.

Ethical issues and maintaining respondents’ trust

When collecting data from individuals, facilities, or establishments, statistical agencies and other data producers usually assure respondents that the information they provide will be used only for statistical purposes. This is both an ethical and a legal obligation. To be successful, NSOs must maintain the trust of respondents to ensure their continued cooperation in data collections. Confidentiality protection is key to that trust. If respondents perceive that an NSO will not protect the confidentiality of their data, they are less likely to cooperate or to provide accurate data. A single incident, particularly if it receives media attention, could have a significant impact on respondent cooperation and therefore on the quality of official statistics. This is the dominant issue from the point of view of NSOs, but there are other concerns; see our section on ethical issues.

Legal issues

Traditionally, statistical legislation emphasized the issues of privacy and data protection, often preventing the dissemination of microdata. Some statistical legislation has been modernized to allow for dissemination of selected anonymized microdata files. Much legislation remains to be modernized, to better take into account the increasingly diverse sets of data generated, managed, and disseminated by statistical agencies; see our section on legal issues.

Exposure to criticism and contradiction

“Some NSOs are concerned that the quality of their microdata may not be good enough for further dissemination. Whilst quality may be sufficiently accurate to support aggregate statistics, this may not be the case for very detailed analysis. In some cases, adjustments are made to aggregate statistics at the output editing stage without amendment to the microdata. Consequently, there may be inconsistencies between research results based on microdata and published aggregate data.” (Managing Statistical Confidentiality and Microdata Access: Principles and Guidelines of Good Practice, United Nations Economic Commission for Europe [UNECE 2007]). If some parts of datasets are considered too unreliable, these may be removed before dissemination. Data producers should be open and transparent about quality. Another concern is that providing microdata to researchers opens up the possibility of their publishing results that contradict the data producer's own estimates. When the data producer is an official statistical agency, this may result in conflicting official vs. non-official estimates and lead to questioning of the data, with possible political implications.

There may be various reasons for such differences. First, there may be errors in official estimates, in which case outside scrutiny is a benefit. Second, differences may arise from the use of different versions of the data (e.g., the full master file vs. an anonymized or reduced public version, further editing by researchers, etc.). These differences should be marginal and can easily be explained. Third, differences may result from the use of different methodologies. This is often a more challenging issue for data producers, as the public will not always be able to understand highly technical explanations.
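The second source of differences can be illustrated with a small, purely hypothetical sketch: when an anonymized public-use file applies disclosure control such as top-coding of extreme values, an estimate computed from it will differ slightly from the official figure computed on the master file. All data and the top-code threshold below are invented for illustration.

```python
# Hypothetical illustration: a researcher's estimate from a public-use file
# can differ marginally from the official figure computed on the master file
# when disclosure control (here, top-coding of high incomes) was applied.

def mean(values):
    return sum(values) / len(values)

# Fictitious master-file incomes (units arbitrary).
master_incomes = [20, 22, 25, 30, 31, 35, 40, 55, 60, 82]

# Public-use version: incomes above a threshold are capped (top-coded).
TOP_CODE = 80
public_incomes = [min(v, TOP_CODE) for v in master_incomes]

official_estimate = mean(master_incomes)    # computed by the NSO
researcher_estimate = mean(public_incomes)  # computed from the released file

print(official_estimate, researcher_estimate)  # 40.0 vs. 39.8
```

The gap here is small and fully explainable by the documented top-coding rule, which is exactly why transparent documentation of any adjustments made to the released file matters.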

It is important for data producers to be able to defend their estimates. This means that the collection, processing, and analysis of the data must be fully documented, and that this information be preserved for easy access. In some cases, published results may have been produced by or with the assistance of external experts who are no longer available to answer questions. Data producers can protect themselves against this risk by adopting and enforcing strict practices of documentation and preservation in compliance with the replication standard. Succinctly, the replication standard is defined as follows: “(...) the only way to understand and evaluate an empirical analysis fully is to know the exact process by which the data were generated and the analysis produced. (...) The replication standard holds that sufficient information exists with which to understand, evaluate, and build upon a prior work if a third party could replicate the results without any additional information from the author.” [Gary King, 1995, “Replication, Replication”]

King's abstract continues: “Political science is a community enterprise; the community of empirical political scientists needs access to the body of data necessary to replicate existing studies to understand, evaluate, and especially build on this work. Unfortunately, the norms we have in place now do not encourage, or in some cases even permit, this aim. The paper provides suggestions that would facilitate replication and are easy to implement by teachers, students, dissertation writers, graduate programs, authors, reviewers, funding agencies, and journal and book editors.”
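One simple, concrete practice that supports the replication standard is to record cryptographic checksums of the exact data files and analysis scripts behind a published result, so a third party can later verify they are working from identical inputs. The sketch below is illustrative only; the file names and contents are hypothetical, and in practice the bytes would be read from disk rather than hard-coded.

```python
# Illustrative sketch: build a replication manifest of SHA-256 checksums
# for the data and code artifacts behind a published estimate.
# File names and contents here are hypothetical.

import hashlib

def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()

# In practice these would be read from disk, e.g. open(path, "rb").read().
artifacts = {
    "survey_2020_master.csv": b"id,income\n1,20\n2,35\n",
    "analysis.py": b"print('estimate')\n",
}

manifest = {name: sha256_of(blob) for name, blob in artifacts.items()}
for name, digest in sorted(manifest.items()):
    print(name, digest[:12])  # abbreviated for display
```

Archiving such a manifest alongside the full documentation lets a data producer demonstrate, years later, exactly which inputs produced a contested estimate.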



Costs

NSOs may also be concerned about costs. “These include not only the costs of creating and documenting microdata files, but the costs of creating access tools and safeguards, and of supporting and authorizing inquiries made by the research community; new users of data files need help to navigate complex file structures and variable definitions. Although the costs are borne by the NSOs, they are usually not provided with budget supplementation to do the additional work. And on the whole, researchers do not have the funding to contribute substantially to these costs.” [UNECE 2007] Thus, whenever possible, such costs should be built into the survey budget to ensure that maximum use can be made of the survey results. It is in the public interest that insights from the data be made available to inform decision-makers and the public. Furthermore, if survey data are used more extensively in this way, they provide an extra level of protection against budget reductions to statistical programs: surveys that offer limited knowledge in support of policy-making are more vulnerable to elimination.

Loss of exclusivity

When disseminating microdata, data owners lose their exclusive right to discovery. This is a greater issue for academic researchers than for official producers, although official data producers, or some of their staff members, sometimes take advantage of monopolistic access to data to offer consulting services. Increasingly, survey sponsors define a legitimate and “reasonable” period of exclusive access to the data by the producer, after which the data must be made accessible to other users.

Technical capacity

A certain technical capacity is required to support the dissemination of microdata files. The files need to be well documented, preferably using the DDI metadata standard, and preserved. In addition, files must be reviewed to identify the risk of disclosure of individual information, and that risk must be reduced using statistical disclosure control techniques such as recoding, suppression, or perturbation.
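One widely used disclosure-risk check is k-anonymity over a set of quasi-identifiers: any record whose combination of quasi-identifier values occurs fewer than k times in the file is considered at risk and becomes a candidate for recoding or suppression. The sketch below is a minimal illustration of the idea, not production statistical disclosure control software; the variable names and sample records are invented.

```python
# Minimal sketch of a k-anonymity check over quasi-identifiers.
# Records whose quasi-identifier combination occurs fewer than k times
# are flagged as risky before release. Sample data is fictitious.

from collections import Counter

def risky_records(records, quasi_identifiers, k=3):
    """Return indices of records that violate k-anonymity."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    return [i for i, key in enumerate(keys) if counts[key] < k]

# Fictitious survey extract.
sample = [
    {"region": "N", "age_group": "30-39", "sex": "F"},
    {"region": "N", "age_group": "30-39", "sex": "F"},
    {"region": "N", "age_group": "30-39", "sex": "F"},
    {"region": "S", "age_group": "70-79", "sex": "M"},  # unique combination
]

print(risky_records(sample, ["region", "age_group", "sex"], k=3))  # → [3]
```

In real workflows this kind of review is typically done with dedicated tools (and on weighted sample data, where risk assessment is more subtle), but the underlying logic of counting quasi-identifier combinations is the same.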