Generation of synthetic populations

The International Household Survey Network (IHSN) and the World Bank Development Data Group (DECDG) collaborated on a project to promote the use of existing survey and census microdata for agent-based simulations and micro-modeling. An important component of this project was the development (or improvement) of methods and tools for generating synthetic populations. The rationale for generating such synthetic datasets is that (i) synthetic data largely solve the issue of statistical disclosure risk, and (ii) the method allows the combination (fusion) of data from multiple sources and types, thereby creating datasets of particular relevance for modeling and simulation. Working with a team of experts in Austria, the project team published simPop, an open source R package.

simPop is an open source R package for generating synthetic populations based on survey data and ancillary information. The package includes model-based methods, calibration, and combinatorial optimization tools.
Project status: Closed (but follow-up activities are being implemented)
Related on-going or planned activities: The R package simPop will be used to generate actual synthetic populations. The code and data will be disseminated (to be used mainly as training materials for interested users). Other activities related to microsimulation and geo-referencing of synthetic populations are ongoing or planned.
Sponsor(s): DFID Trust Fund No. TF011722, administered by the World Bank Development Data Group (WB-DECDG), and the World Bank
Implemented by: World Bank Development Data Group and a consulting firm
Type of output: (i) Open source software (R package with manuals); and (ii) technical guidelines.
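
To give a rough idea of what the calibration step mentioned in the package description involves, the sketch below applies iterative proportional fitting (raking) to a toy sample so that its weights reproduce known population totals. It is a generic Python illustration and not simPop's API or its actual algorithm; the function name `rake_weights`, the variables, and the toy margins are all hypothetical.

```python
def rake_weights(weights, categories, margins, n_iter=200, tol=1e-10):
    """Adjust sample weights so weighted totals match known margins.

    weights:    initial weight for each sampled unit
    categories: {variable name: list of category labels, one per unit}
    margins:    {variable name: {category label: known population total}}
    (All names here are hypothetical and chosen for illustration.)
    """
    w = [float(x) for x in weights]
    for _ in range(n_iter):
        max_change = 0.0
        for var, labels in categories.items():
            for label, target in margins[var].items():
                # Units that fall in this category of this variable
                idx = [i for i, lab in enumerate(labels) if lab == label]
                current = sum(w[i] for i in idx)
                if current > 0:
                    # Scale the weights so the category total hits the target
                    factor = target / current
                    for i in idx:
                        w[i] *= factor
                    max_change = max(max_change, abs(factor - 1.0))
        # Stop once a full pass barely changes any weight
        if max_change < tol:
            break
    return w


def weighted_total(weights, labels, label):
    """Sum of the weights of units whose label equals `label`."""
    return sum(wt for wt, lab in zip(weights, labels) if lab == label)


# Toy sample of six units (made-up data, not from any survey)
sex = ["f", "m", "f", "m", "f", "m"]
region = ["north", "north", "south", "south", "south", "north"]
initial = [100.0] * 6

# Assumed known population totals; both margins sum to 1000
margins = {
    "sex": {"f": 520.0, "m": 480.0},
    "region": {"north": 450.0, "south": 550.0},
}

calibrated = rake_weights(initial, {"sex": sex, "region": region}, margins)
```

After the call, `weighted_total(calibrated, sex, "f")` and `weighted_total(calibrated, region, "north")` should agree with the assumed margins up to numerical tolerance. Raking is only one of several calibration approaches; the simPop manuals describe the methods the package actually implements.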