Design frame and sample methodology
This sub-process identifies and specifies the population of interest, defines a sampling frame (and, where necessary, the register from which it is derived), and determines the most appropriate sampling criteria and methodology (which could include complete enumeration). Common sources are administrative and statistical registers, censuses and sample surveys. This sub-process describes how these sources can be combined if needed. The frame should be analyzed to check whether it covers the target population. A sampling plan should be made. The actual sample is created in sub-process 4.1 (Select sample), using the methodology specified in this sub-process.
Surveys collect data on a sample of households with the intention of drawing inferences about the total population from the sample. The true value of an indicator of interest (such as average household income, the rate of unemployment or the prevalence of HIV/AIDS) and the estimate of that indicator obtained from the sample cannot be expected to be identical. The difference will be partly due to the fact that we are not observing all households in the country but only some of them, and partly due to other reasons. Reducing both kinds of error (called sampling and non-sampling errors, respectively) is thus an important concern of survey designers, critically linked to the precision of the product.
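The sampling-error part of this gap can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: the lognormal income distribution, the population of 100,000 households and the names `population` and `sample_means` are hypothetical and chosen only for illustration. It uses simple random sampling rather than a real survey design, and non-sampling errors are not modelled.

```python
import random
import statistics

rng = random.Random(1)

# Hypothetical population: 100,000 household incomes in arbitrary units.
population = [rng.lognormvariate(10, 0.8) for _ in range(100_000)]
true_mean = statistics.fmean(population)


def sample_means(sample_size, replicates=200):
    """Mean income from repeated simple random samples (without replacement)."""
    return [
        statistics.fmean(rng.sample(population, sample_size))
        for _ in range(replicates)
    ]


mean_by_n = {}
spread_by_n = {}
for n in (100, 1600):
    means = sample_means(n)
    mean_by_n[n] = statistics.fmean(means)
    # Spread of the estimates across replicates: an empirical sampling error.
    spread_by_n[n] = statistics.stdev(means)
    print(f"n={n}: true mean {true_mean:.0f}, "
          f"average estimate {mean_by_n[n]:.0f}, "
          f"spread of estimates {spread_by_n[n]:.0f}")
```

Each individual sample gives an estimate that differs from the true mean, but the estimates scatter around it, and the scatter shrinks as the sample grows. In this setup, a sample sixteen times larger should give a spread roughly four times smaller.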
Sampling theory has earned a reputation for being difficult to understand and better left to experts, because it requires a substantial background in mathematics and probability theory. We provide here some highlights and links to key resources, some simple, others more advanced.
Highlights
- The sample of households to be visited by the survey is often selected in two stages: first a certain number of area units (sample points) are chosen; then a group of households (a cluster) is chosen in each sample point. Both stages are random selections.
- Random sampling makes it possible to establish sampling errors and confidence intervals around the survey estimates. Only random sampling can do this.
- Sampling errors depend very much on the size of the sample, and very little on the size of the population.
- As the sample size increases, sampling errors are reduced but non-sampling errors tend to get bigger.
- The sample is generally stratified by regions or by other criteria, in order to adequately represent subgroups of the population.
- The first-stage sample frame is developed from the most recent census. If the census is old, some updating may be needed.
- The second-stage sample frame is the list of all households in each selected sample point. Listing the households is a field operation that needs to be done before the survey, but ideally not much before.
- The sampling errors of two-stage samples are affected by clustering, the tendency of neighboring households to provide similar answers to the questions asked. To reduce the effect of clustering, the clusters should be kept small.
- Samples are generally selected with unequal probabilities and thus need to be analyzed with weights.
- Design effect is the combined result of stratification, clustering and weighting on sampling errors. It needs to be estimated with special software.
- Not all selected households will answer all questions. Survey managers should try to prevent non-response, and properly document the non-response they were unable to prevent.
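Several of these points (two-stage selection, unequal selection probabilities, weights and the effect of clustering) can be sketched in a few lines of code. The example below is only an illustration under stated assumptions, not a prescribed procedure: the frame is synthetic, the function names `pps_systematic`, `two_stage_sample` and `kish_deff` are hypothetical, and first-stage selection with probability proportional to size is one common choice among several. The design-effect function uses Kish's approximation for clustering alone, so it ignores stratification and unequal weights, and the intra-cluster correlation passed to it is an assumed value.

```python
import random


def pps_systematic(sizes, n_points, rng):
    """First stage: pick n_points area units with probability
    proportional to size, using systematic selection."""
    total = sum(sizes)
    interval = total / n_points
    if max(sizes) > interval:
        raise ValueError("an area unit is larger than the sampling interval")
    start = rng.random() * interval
    targets = [start + k * interval for k in range(n_points)]
    selected, cumulative, t = [], 0, 0
    for index, size in enumerate(sizes):
        cumulative += size
        while t < n_points and targets[t] < cumulative:
            selected.append(index)
            t += 1
    return selected


def two_stage_sample(frame, n_points, take, seed=0):
    """Return (area index, household id, weight) for each sampled household.

    frame[i] is the household listing of area unit i (second-stage frame).
    Assumes every selected area unit lists at least `take` households.
    """
    rng = random.Random(seed)
    sizes = [len(households) for households in frame]
    total = sum(sizes)
    sample = []
    for i in pps_systematic(sizes, n_points, rng):
        p_first = n_points * sizes[i] / total   # chance the area is selected
        p_second = take / sizes[i]              # chance within the area
        weight = 1 / (p_first * p_second)       # inverse selection probability
        for household in rng.sample(frame[i], take):
            sample.append((i, household, weight))
    return sample


def kish_deff(cluster_take, rho):
    """Kish's approximate design effect from clustering only."""
    return 1 + (cluster_take - 1) * rho


# Synthetic frame: 50 area units with 80 to 200 listed households each.
frame_rng = random.Random(42)
frame = [
    [f"area{i}-hh{j}" for j in range(frame_rng.randint(80, 200))]
    for i in range(50)
]
sample = two_stage_sample(frame, n_points=10, take=12)
deff = kish_deff(cluster_take=12, rho=0.05)
print(len(sample), "households sampled; approximate design effect", round(deff, 2))
```

With a fixed take per cluster and first-stage probabilities proportional to size, every household ends up with the same overall selection probability, so the weights are equal and sum to the number of households in the frame. Under the assumed intra-cluster correlation of 0.05, 12 households per cluster give a design effect of about 1.55, meaning the 120 interviews carry roughly the precision of 77 interviews from a simple random sample.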
Introductory references
- Practical guidelines are available from United Nations Statistics Division (2005), Household Sample Surveys in Developing and Transition Countries.
More advanced
Key references
- Cochran, W.G. (1977) Sampling Techniques, 3rd Edition. New York: Wiley
- Kish, L. (1965) Survey Sampling. New York: Wiley (paperback 1995).
- Weisberg, H. F. (2005) The Total Survey Error Approach: A Guide to the New Science of Survey Research. Chicago: The University of Chicago Press.
Other references
- Azorín, F. (1969) Curso de Muestreo y Aplicaciones. Madrid: Aguilar
- Brossier, G. and Dussaix, A-M. (1999) Enquêtes et sondages: méthodes, modèles, applications, nouvelles technologies. Paris: Dunod.
- Deming, W.E. (1950) Some Theory of Sampling. New York: Wiley (reprinted in 1966).
- Glewwe, P. and Jacoby, H. (2000) Recommendations for collecting panel data, in Grosh, M. and Glewwe, P. (Eds.) (2000) Designing Household Survey Questionnaires for Developing Countries: Lessons from 15 Years of the Living Standards Measurement Study, pp. 275-314. Washington, D.C.: The World Bank.
- Groves, R. M. (1989) Survey Errors and Survey Costs. New York: Wiley.
- Groves, R.M., Fowler, F.J., Couper, M.P., Lepkowski, J.M., Singer, E. and Tourangeau, R. (2004) Survey Methodology. New York: Wiley.
- Gourieroux, C. (1981) Théorie des sondages. ESA. Economica.
- Kalton, G. and Heeringa, S. (2003) Leslie Kish: Selected Papers. New York: Wiley (Wiley Series in Survey Methodology).
- Korinek, A., Mistiaen, J.A. and Ravallion, M. (2005) Survey Nonresponse and the Distribution of Income. World Bank Policy Research Working Paper No. 3543. Available at SSRN: http://ssrn.com/abstract=695442.
- Lessler, J.T. and Kalsbeek, W.D. (1992) Nonsampling Error in Surveys. New York: Wiley.
- Mahalanobis, P.C. (1946) Recent experiments in statistical sampling in the Indian Statistical Institute. Journal of the Royal Statistical Society Ser. A 109, 325-378, reprinted in Sankhya (1958), 1-68.
- Mukhopadhyay, P. (1998) Theory and Methods of Survey Sampling. Prentice-Hall of India.
- Neyman, J. (1934) On the two different aspects of the representative method: The method of stratified sampling and the method of purposive selection. Journal of the Royal Statistical Society 97, 558-606.
- Scott, C. (1990) Master sample: Advantages and drawbacks. Inter-stat, March 1990, No. 2, 33-44. EUROSTAT/ODA/INSEE.
- Sukhatme, P.V. and Sukhatme, B. (1970) Sampling Theory of Surveys with Applications. USA: Iowa State University Press.