Designing Data Systems for Skilling India

Stefan Bender, Jörg Heining, Kaushik Krishnan

October 6, 2014

India’s unemployment rate currently sits at 9 percent, yet one in three citizens with at least a bachelor’s degree is out of work. The country’s working-age population is projected to rise from over 750 million today to almost a billion by 2020. At the same time, agricultural employment is in decline, accounting for less than 50 percent of total employment for the first time in Indian history. These market pressures are pushing the labor force toward higher-skilled occupations. Yet even young, college-educated Indians often lack the skills these jobs require.

It is perhaps with this transition in mind that Finance Minister Arun Jaitley announced Skill India, a program to give young workers the training they need to find jobs. The goal is to train 500 million workers by 2022. The existing National Skill Development Corporation, along with eighteen ministries, has run skill development programs in the past. Skill India aims to consolidate and replace these fragmented initiatives. However, the government needs to upgrade not only its programs but also its data collection and evaluation systems.

The case for high-quality, accessible data rests on two points. First, regardless of ideology, governments should pursue policies that work. Second, even well-intentioned policies can have perverse effects. Telling effective policies apart from counterproductive ones requires data-driven analysis. For example, recent research shows that India’s child labor ban led to an increase in child laborers and a decrease in their wages.

Presently, the National Sample Survey is the only source of nationally representative labor market data. At best, it can reveal broad trends in employment levels. Given India’s large labor force and its many state-sponsored initiatives, meaningful policy evaluation requires a richer data set that links employees, employers, and the benefits they receive. Other countries have made considerable progress in this regard.

Fifteen years ago in Germany, a push from researchers and the strong will of the administration led the Federal Employment Agency to create a unique database on individual workers. The data originate from notifications to the social security system: basic information on each employee’s employment, including a rich set of the employee’s socio-demographic characteristics and some information on the employing establishment, is compiled and submitted annually. Unique social security numbers are then used to combine this employment information with other data, such as periods of unemployment benefit receipt, registered job searches, and participation in programs and training schemes.

The resulting database allows researchers and policymakers to follow workers from the beginning of their training until they leave the labor force and enter the pension system. As a consequence, almost all active labor market programs in Germany are evaluated using these data. Prominent examples include evaluations of the so-called “Hartz reforms,” major labor market reforms that introduced new payment schemes for unemployment benefits in 2005, and of the recently introduced comprehensive minimum wage.

Most of this evaluation comes at very low cost to the government. Resources like the German employee-employer matched data have become the gold standard for high-quality academic research in labor economics, because the data are accessible to researchers while confidentiality is preserved. Rules are in place to ensure that the privacy of every individual in the database is protected. The Federal Employment Agency has a Research Data Center (RDC), a facility designed specifically to give researchers access to confidential microdata in a secure environment that complies with privacy laws. Several field offices of the German RDC have opened in the U.S., making these data available to academic researchers there.

India does not have to start from scratch in creating a similar data set. In fact, most of the raw data needed are already collected. Employers in many industries must report employee wages to comply with the Employees’ Provident Fund and Employees’ State Insurance laws. Individuals are required to report income in their tax returns. Almost all benefit programs in India collect and maintain their own data. What remains is to link and clean the data from these different sources.

It might be argued that creating this data set is pointless because most of India’s labor force is informal or contract-based. However, the same problem plagues any data collection effort, including that of the National Sample Survey Office (NSSO). Moreover, the data set would bring many big-picture benefits. Hard data on what works will lead to greater stability in India’s labor policies. The knowledge gained can be used to reform India’s labor laws to encourage greater formal participation. Better data lead to more efficient investment by the government as well as by outside agencies. Most importantly, the costliest part of this endeavor, collecting the raw data, is already being done.

The lesson from the German experience is clear: preparing data is neither a bureaucratic burden nor prohibitively expensive. Creating, maintaining, and allowing access to administrative data will encourage high-quality research on the Indian labor market, with clear benefits for policymakers and the Indian public. Unless India invests in systems that help it understand how its labor markets operate, effective interventions will happen only by chance.

Stefan Bender is the Head of the Research Data Center of the German Federal Employment Agency at the Institute for Employment Research.

Dr. Jörg Heining is a Senior Researcher at the Research Data Center of the German Federal Employment Agency at the Institute for Employment Research.

Kaushik Krishnan is a Ph.D. candidate in Economics at UC Berkeley.

India in Transition (IiT) is published by the Center for the Advanced Study of India (CASI) of the University of Pennsylvania and partially funded by the Nand and Jeet Khemka Foundation. All viewpoints, positions, and conclusions expressed in IiT are solely those of the author(s) and not specifically those of CASI and the Khemka Foundation. IiT articles are re-published in the op-ed pages of The Hindu: Business Line.
