(English captions & Hindi subtitles available)
About the Seminar:
Professor Singh (in collaboration with Karthik Muralidharan) presents results from a large-scale experimental evaluation of an ambitious attempt to improve school management in India, which featured several global “best practices” (comprehensive assessments, detailed school ratings, and customized school improvement plans). While assessments were near-universally completed and ratings were informative, the intervention had no impact on school functioning or student outcomes. Yet, it was perceived to be successful and scaled up to cover over 600,000 schools nationally. They investigate reasons for program failure and scale-up despite ineffectiveness. Their results illustrate how ostensibly well-designed programs that appear effective based on administrative measures of compliance may be ineffective in practice.
About the Speaker:
Abhijeet Singh, Associate Professor, Department of Economics, is an applied microeconomist at the Stockholm School of Economics (SSE) and studies topics relating to the economics of education, child nutrition, and public service delivery in developing countries. He is affiliated with the Jameel Poverty Action Lab (JPAL), CESifo Economics of Education Research Network in Munich, the Center for Global Development, and the Young Lives study. Before SSE, he was based at the Department of Economics at UCL. He received his doctorate in Economics from the University of Oxford in 2015.
Welcome to another CASI weekly seminar at the Center for the Advanced Study of India, here at the University of Pennsylvania. My name is Tariq Thachil, I'm the Director of CASI, and I am delighted to welcome you to today's talk. Just a quick reminder for those of you who come to our events regularly: we will not have a weekly seminar talk next week, as it's a university holiday, so we'll resume two Thursdays from now.
For today's talk I am delighted to be able to welcome Abhijeet Singh, who is an associate professor of economics and an applied microeconomist at the Stockholm School of Economics. His work looks at topics relating to the economics of education, child nutrition, and public service delivery in low-income countries. He has affiliations with the Jameel Poverty Action Lab, also known as JPAL, the CESifo Economics of Education Research Network based in Munich, and the Center for Global Development.
Before moving to the Stockholm School he was based at the Department of Economics at University College London, and he received his doctorate in 2015 from the University of Oxford. His work has been published in a number of the leading outlets in economics, including the American Economic Review, the Journal of the European Economic Association, the Journal of Development Economics, and World Development, among many others.
Some of his recent work includes an interesting paper in the American Economic Review with Karthik Muralidharan and Alejandro Ganimian, "Disrupting Education? Experimental Evidence on Technology-Aided Instruction in India," which looks at the effects of a personalized, technology-aided after-school instruction program for middle school children in urban India; that came out in 2019.
He has an interesting working paper entitled the Myths of Official Measurement, which looks at a central but understudied component of state capacity: the quality of administrative data. He presents direct audit evidence showing that levels of student achievement in India, from a census covering over seven million students, are severely inflated, and that the distortion is in fact worse for low-performing students.
His talk today builds on some of these interests in education. The title of the talk is Improving Public Sector Management at Scale? Experimental Evidence on School Governance in India. This is a long-running project, again with Karthik Muralidharan, and we're delighted that he has made the time to share some of their findings with us here.
So, Abhijeet, I will give you the floor. For everyone: he will speak for about 25 to 30 minutes, and after that we will have Q&A. If you have any questions, please put them in the chat box to me, Tariq Thachil, personally and directly, so that I can compile a list of questions; I will then call upon you to ask your question to Abhijeet. Please keep your questions short, and please keep your microphones muted for the duration of the talk, unmuting only to ask your question.
Okay, with that, Abhijeet, I think that's all I have by way of introduction. Welcome again to CASI, thanks for joining us, and take us away.
Thanks a lot, Tariq, and to everyone at CASI for the invitation. I guess, like everything else in the last year, this is from my living room to yours. This is a project that has been in the works for a while. As Tariq said, this is joint work with Karthik Muralidharan, who's at UCSD, and a lot of my work, and the shared interests that Karthik and I both have, has been around trying to find ways of improving student achievement in India. I'll be talking a bit more about that; the genesis of this particular project is really about trying to see if we can improve public sector management, that is, how schools are run.
But let's jump straight in. The background to all of this work is that we've done extraordinarily well at getting kids into school. Nearly everyone of school-going age, for the last 15 years at least, enrolls in school at some point in time. And that's a huge success: if you look at the literature from the 90s, the core of the economics of education literature in India was about students dropping out, the effect of household shocks, and gender inequalities in enrollment, and we've made substantial progress there.
The sad part of it is that we haven't made the same kind of progress in what kids actually know, even though they're increasingly in school, and even though they're staying in school a lot longer. So instead of just looking at enrollment, look at how many years of schooling average cohorts get: that's trended up remarkably over the last 15 years.
At the same time, we've known, at least since the first ASER report in 2005 nationally, and if you go back to the 90s, from things like the PROBE report, which this audience will be familiar with, that a lot of students can't do very basic things. One statistic out of ASER is that about half of kids in Grade 5 can't read things that they should have been able to in Grade 2. Think of that as the background for all of this work; a lot of our work has been trying to figure out exactly why that is. Because over this period education expenditures have gone up, households have gotten richer, and children born today are born to more educated parents than children of 20 years ago, so we should have seen larger increases.
One of the things that comes up quite a lot, particularly in the case of public sector education in India, so government schools, is the issue of school governance and management. Think of this as a broad bucket, and if you had to choose an indicator, think of things like teacher absence. If you show up at a school on a surprise visit, on average about a quarter of public school teachers in rural India are absent. That was true in 2003, and it was more or less true in 2010; it didn't budge over a 7-8 year period. And reasonable calculations suggest that just in terms of lost services that's about 1.5 billion dollars per year in cost to the exchequer, and this is without thinking about purchasing power [inaudible] or any conversions of that kind.
So that's one thing: we know that school governance is an issue, and that's true in lots of different ways of measuring it [crosstalk]
Jere R. Behrman:
Are there numbers on private schools?
There are numbers on private schools; teacher absence is lower, though not as low as you would imagine it to be. If I remember correctly, out of this paper in the Journal of Public Economics, I think it's about eight or nine percentage points lower, so about a third lower in terms of absence. This particular experiment... and it's important, so thanks for raising that.
This particular experiment, and everything that we're talking about, is going to be in the state sector, so this is going to be a very government-led intervention. We've got other work trying to look at similar topics in private settings, things around the Right to Education Act; I have some previously published work on estimating the private school premium in Andhra Pradesh using panel data, and the like. But this is going to be all about government schools.
And one of the things that I think we now have pretty good evidence on, internationally and in India, is that the quality of management matters: the processes that happen in schools, whether there are accountability pressures, but also things like the ability to hire people, the ability to reward people, the ability to hold people accountable, and whether processes are in place to keep track of student achievement. Do you even measure what kids know, and does anything you do change in response to what kids know?
So things of that kind do, at least correlationally, correlate with student learning levels and with student value-added. And one thing that is positive is that over the last 10-15 years governments are increasingly willing to believe, and this is true not just in India but I think more generally in a number of low- and middle-income countries, that student achievement levels are an issue; that having figured out how to solve access, this is the next frontier you really want to push on; and that school management has a role to play here. So a lot of governments really want to fix how schools are run, and reforms trying to change this are completely ubiquitous.
One thing that we had to do for the paper was just try and see what governments actually do. We looked at the World Bank database alone, so these are projects that the World Bank has had some role in, and you find projects in 80-odd countries that fit some definition of school management. And if you narrow things down, to speak a bit more about this particular intervention, the kind of reforms that try to improve school management are to be found in essentially your developing country of choice.
So they're very common, and they are very much sold as best practice, and we don't actually know whether they work when governments do them at scale. We know very good things about being able to put incentives in 100 schools, about putting in more accountability pressures, and the like, but relatively little about composite management reforms, and that's what this paper is going to be about.
So here's really what we're going to do: we are going to look at a large-scale randomized controlled trial of a school management program that is now being scaled up all across India. This is set in the state of Madhya Pradesh, and that's a large state, so you should care about it just by itself. The program looks quite a lot like what programs of this kind do elsewhere, in that it combines a number of global best practices: school inspectors visit, they do comprehensive school quality assessments, and based on those they get together with the teachers and write out school improvement plans. The idea is: make a list of things that are under your control, and try to figure out what concrete things you need to do over the next year to get better.
Cluster Resource Coordinators, who are basically the next administrative layer above the school, are meant to follow up quarterly and say, okay, you said you would do A, B, and C; how are you doing so far? There are no explicit incentives, positive or negative, unlike the really high-stakes versions of school accountability that we know of, for example, in the No Child Left Behind world. Nobody's at risk of losing their job, nobody's really at risk of losing out on promotions or the like, and again, we think this is the realistic setting that we operate in.
So think of this as a thoughtful school reform program around management that is feasible for large Indian states to deliver.
And what we're going to do is experimentally evaluate the program in just under 1,800 schools. These 1,800 schools were randomly selected from a population of about 11,000 schools, and this is going to be important, because in some sense the key question is not, do students learn better if you make schools run better; the key question is, can a government that wants to make schools run better actually do that at scale? Can governments do that with the capacity that they have, with the incentive and agency problems that might be there, at a large enough scale? So again, the fact that it's implemented in around 2,000 schools means that this is beyond the scale at which I could personally make any program work.
If it's 50 schools, we can set up enough salience and reminders and possibly layer on incentives and make things work, and that's important for the eventual policy relevance of work at this scale.
Jere R. Behrman:
So is there a big question about not including any incentives?
Sorry, Jere, I hadn't realized that was you.
Jere R. Behrman:
Very good to see you.
Very good to see you as well. So, we would have liked there to be incentives. But realistically, because this comes in the context of a program that the government wanted to scale up everywhere, there is no version of incentives that we think we could explicitly place on this one part. It is impossible to add high-powered incentives around salary, and it's really impossible to add high-powered incentives around career concerns. You cannot link, in this particular instance, issues around promotions or around salary raises, and you can't really do things around penalties either. One reason that absences can be high is exactly this: in a world where you can't put many sanctions on not showing up for work, the likelihood of being able to put sanctions on not meeting targets in a performance review declines a lot. So this is without incentives, and frankly we would have liked very much the kind of settings that you had in Mexico, for instance, but that wasn't really on the table here.
And I'll speak a bit more about the setting; this is also not uncommon in terms of how these programs actually run. In an appendix to the paper we talk about at least 27 other illustrations of this, without incentives, in a variety of countries.
And at the time that we started evaluating this, Ark, which was the technical partner here, was trying to extend these programs in Kenya and in South Africa. This looks almost identical to a reform called the Whole School Improvement program in South Africa that went on for about a decade and was then wound down. It doesn't seem like it achieved anything, but we don't actually have an evaluation that says anything one way or the other.
And Abhijeet, could you just say a quick note on what the impetus was behind actually doing this in Madhya Pradesh? Was this a particular government act, or how did it suddenly come about that they were interested in this particular intervention?
So I believe it was a combination of things. This came around at a period when the government really wanted to focus on education reform; it was the same government that's in place now. There was a feeling that Madhya Pradesh was doing pretty badly on all sorts of comparable metrics of achievement, and they first started with the large universal assessments to try and pivot the system towards thinking about learning. There was also a feeling, and in principle I agree with the instinct, that you didn't want to tailor the intervention too much. You wanted to take on board the fact that in a really large state, there are about 100,000 schools in Madhya Pradesh in the government system, but you wanted to let the precise actions taken by any school differ. Schools are heterogeneous, they face heterogeneous constraints, and they might need support on a wide variety of things.
So they wanted to look at interventions that took school functioning as a whole, rather than focusing specifically on teacher training alone, or only on getting the school management committees to work better, or on the headmaster going and doing classroom inspections. That's why this is a composite program.
Once they had decided that that's roughly the way they wanted to go, they looked around for technical assistance, and actually this program looks a lot like the UK system of school inspections. If you look at the overarching design motivation, it looks a lot like Ofsted inspections, for those of you who might be familiar with the UK setting. And that's a function of whom they approached for technical assistance: they approached Ark, which is the technical lead here in developing the program. Ark runs charter schools, or the equivalent of charter schools, in the UK quite successfully, and runs various educational services. They spent two years modifying the program to the context, so it's already much more thoughtful in trying to adapt than the typical cut-and-paste job that you see with global best practices. They spent a year working out in smaller groups exactly what the assessment rubrics should look like, then another year trying it out in 100 schools, and at that point they thought, here is an intervention that we can take to scale in the entire state.
And that's the stage at which we came in to evaluate. At the time that we went in, I think we were reasonably sure that if it failed, it wouldn't be failing because of teething problems, or because it was quite obviously a botched job. This is near the [inaudible] envelope of what you might expect to happen in a state government: it had support from the senior leadership, it had support from the senior bureaucracy, and it had international technical assistance, with that technical assistance having taken two years to seriously adapt it to this context. This is an audience that's worked with aid programs for a long time; already, by putting in those filters, you screen out a very large portion of programs that get implemented.
So that's what this paper is going to look at, and I'm just going to give you the results now, because I would rather make sure that I actually finish on time, and then we can spend time discussing what we learn or don't learn out of this.
The first bit is that the intervention got delivered, and as anyone who's worked on programs at [inaudible] scale knows, that's not something to be taken for granted. The assessments are the first stage of the intervention: the inspectors are meant to go and spend two days in the school; they're meant to talk to parents, observe classrooms, talk to students, and inspect administrative records. All of that happened. 93% of the elementary schools assigned to the treatment group were covered by the assessments, and they prepared the School Improvement Plans.
So, if you think of this as sort of a first stage, there is a first stage. The other finding is that there was actually meaningful variation across schools. Most schools are rated inadequate; very, very few schools are seen as being at or even close to standards. But there is variation across schools, and that variation seems to make sense. By "seems to make sense" I mean it's predictive of future value-added, and it's predictive of things like teacher absence, over and above what you can observe out of test scores or our observations.
Whatever these assessors went and saw over those two days, in putting these assessments together, had information. So we know that the government was able to implement this in 2,000 schools, which is not to be taken as given, and that this had information that isn't available otherwise. And that's basically where the good news stops.
We find basically no change in support or oversight in treated schools. We don't see any change in the frequency of visits or the content of inspections. If any of you have worked in schools, you'll know that there's typically a visitors' register where everyone puts down comments; we even transcribed those, and nothing seems to have changed. People are coming at the same frequency and still commenting on the same things. We don't see the school management committees, which have teacher and parent representatives, playing a more active role in treatment schools. We don't really see changes in teacher effort, and we don't see changes in classroom processes.
Teacher absence is high: in our sample we are talking about 33%, and that stays unchanged across treatment and control groups. And we don't see any effect on things like instructional time, the use of textbooks and workbooks, or the likelihood that student homework books have been checked. Think of that as a stock measure of teacher effort. We do a lot of classroom observations, but the one pitfall of regular classroom observation is that, by definition, it is not blinded: there are two strange adults taking lots of notes while you go on and teach your class. So we thought of taking a random sample of homework books from students and seeing, has the teacher ever corrected anything? If they've corrected, have they given comments? Because [inaudible] on the day. We're not seeing any changes there. Student absence rates are high, and then, unsurprisingly, we're not finding any changes in test scores. We don't find any changes in test scores on the independent assessments that we run, but we also don't find any changes in administrative tests.
Again, that's kind of interesting, because in a separate paper I show that administrative test scores here are massively inflated. There's very little oversight there either; nobody even felt pressured enough to raise those up, so there wasn't even any grade inflation in response. So that was the first RCT, the way we had planned it. Had it all gone to plan, this would have been a paper that was done in about 2016 or so.
What changed was that the government had already planned for the expansion of the program in stages. The way the program had started was that the government, from the get-go, had wanted to look only at programs that could eventually target all schools. What they wanted to do was run it in 100 schools as a pretest, then these 2,000 schools as the first proper implementation, phase one, then go to 25,000, and eventually to 110,000 in the entire state system. These results were shared with the government all through, and by the time we had them, they'd already planned and notified the expansion; they were open about these results. The other thing that happened was that a national program, Shaala Siddhi, came in, which said, this seems like a really good idea and we should do this across the country.
This was a different assessment rubric, developed by NUEPA, which is the leading education policy body; it's been changed to an institute from a university, and folks in education in India will know who we're talking about. The government took a lot of this feedback on board, and essentially what they did was make the plans much more detailed. I encourage you to go and look at the paper; in the appendix we show what the school improvement plans look like. I've never filled out as detailed a performance plan in my life, and certainly not in my career.
There weren't any changes in incentives, going back to Jere's question; that's because nobody could work out a version of incentives that would be scalable across all schools in Madhya Pradesh, and that's just a design constraint. Nationally, this program has now been scaled to over 600,000 schools, and the target is to reach 1.6 million schools, that is, all government schools in the country. The one change was that, where earlier the external assessments were the core of the school diagnosis, in the national program there is a greater weightage on school self-evaluations and self-assessments. But it's very, very similar in spirit. At that point we felt that, given this was going up nationally and we knew the first version wasn't working, we had to try and evaluate that.
For the next two years we evaluated the scale-up, and we did this using a matched-pair design that satisfies parallel trends in test scores. Part of the reason for doing that is that, because this was being rolled into a national program, it wasn't really possible to randomize at that point. By all the checks you can do in observational designs, this seems like a valid comparison group, and it's also a comparison group that was formed before we went and collected any outcome data, so it's pre-specified in that sense. And 1.5 years after that, we still find absolutely no evidence of improved learning outcomes.
In that respect, this is a grim paper. Tariq mentioned at the beginning that this is a project that's taken a while; that's because we really wanted to figure out exactly what policy reform at scale means in this sector and in this setting. Quantitatively, we knew, to the best of our ability, that this wasn't changing anything, in either processes or outcomes. So why wasn't it, even though it seemed to have support through much of the state bureaucracy? We supplemented the RCT with essentially unstructured interviews with a random sample of teachers, head teachers, and education officials over a long period of time, trying to understand exactly how they saw the program, how program implementation worked from their perspective, and what they thought the program was trying to achieve.
Here I think what we find is true of public service delivery in India more generally, certainly for things like education, health, and social security: teachers and supervisors, and here you should think of block resource coordinators and cluster resource coordinators, basically perceive the program as a data collection and paperwork-filing effort. The idea is that this is one of the many things where the government at the center asks us to fill in forms and send them back up; they want all this data, and therefore, even if I'm super motivated, what I need to do is make sure the assessments happen. And by that metric of success, this was a very successful program: all the paperwork was submitted on time. And then the moment you'd filled out the school improvement plan, [inaudible] the program effectively starts.
And de facto, the way the program translates on the ground is pretty far from the reflective exercise in self-evaluation and improvement that program designers had in the back of their minds, the theory of change they wanted to push through. These features arguably relate much more broadly to bureaucratic incentives in the public sector, where you reward the appearance of activity rather than focusing entirely on outcomes, because the outcomes, in terms of test scores, aren't really measured or evaluated here. I mean, they're evaluated by us, but not under business as usual.
I am going to skip a bit of the literature review, but we basically think of this as contributing, within economics, to a few different strands. One is the literature on improving management quality, including in the public sector. We know that in the private sector these consulting inputs lead to persistent improvements in productivity; that's true of firms in India and of firms in Mexico. And we know a bit about management quality being correlated with productivity in government sectors; Imran Rasul at UCL has been doing some really good work on that.
What we're trying to see is whether you would be able to run any similar interventions in public sector settings, at roughly the intensity that you would have if you tried to do this at scale. And we find absolutely no evidence that it changes things. One reason, and Jere pointed this out already, and I would say this is the default economist reaction, is that we can conjecture that a really important reason is the lack of incentives. There's work that looks at the complementarities of inputs or knowledge with incentives, in education, in health, and in the private sector, and a lot of that work seems to suggest that essentially you get what you incentivize. Here you incentivized paperwork, and that's what you got. That would be one version.
But there's a broader issue here of trying to understand public sector reform and organizational economics, which is: why would you scale things up when you already think they aren't working? There are just the difficulties of what the management literature calls change management in large organizations. This is a really active body of work, but with relatively little well-identified evidence. And then we think we speak a little to bureaucratic incentives, but I'll let you be the judge of that.
The rest of this is very straightforward. Madhya Pradesh is a really large state (that's basically what you need to pick up from the slide), and it's exactly the kind of setting where you would want interventions that target achievement. Learning levels are low, teacher absence is high, student attendance is low, and a lot of these trends seem to have been worsening into the 2000s, which is partly why education had become so salient at the period when the [inaudible] assessments were starting. Concerned by these weaknesses, the government of Madhya Pradesh prioritized trying to improve school governance and management.
This program, which is called the MP School Quality Assessment intervention (in India it's called Shaala Gunwatta), is modeled after several global best practices. As I've said already in response to the questions that came up, it came with a lot of political and bureaucratic support, substantial technical inputs, and lots of adaptation. As for the theory of change, think of this as being relatively closely related to a lot of work in management: Bob Gibbons and Rebecca Henderson have [inaudible] where they're trying to look at why low-performing organizations, public or private, stay that way. Why does low performance, or a low quality of management, persist?
And you can think of lots of different things that could break down. One is that maybe they just don't know that they're performing poorly. Another is that they know they're performing poorly but don't know what they need to do to get better; that's an information constraint. Or they know they're performing poorly and know what they need to do to get better, but they're just not motivated or held accountable for improvements; that's the agency view of the world that Jere and we as economists would take. Or all of this is lined up and it's just external factors beyond their control. The program, to the credit of the people who developed it, did explicitly try to address each of these constraints.
The idea was that the program would first set out standards: what is a good school meant to be like? The way to think about this is that supervisory capacity exists in the state system, there are cluster resource coordinators and block resource coordinators, but there isn't really a job description. They're told they need to go and inspect and make sure things are all right, but what "all right" means is not specified. So the first bit is a rubric for what you're supposed to look for and how you're supposed to judge, with some element of discretion. That's the standard-setting part. The assessment part is when you go and actually do this; this is about gathering information, and this assessment, and indeed the entire intervention, is framed as a collaborative effort.
It's framed as a collaborative effort across the entire state education system, and part of that is to try to start a process of continuous improvement. At that point the assessors, all of the school teachers, and representative parents are meant to sit down together and put together the school development plans, which have concrete actions: here are things that you can actually do. You might want an entirely new building and 10 more teachers, but that's maybe not happening, so within the constraints that you have, with the student body that you have, what are the things that you can do? The assessors also got quite a lot of training in the kinds of actions that schools can take — what you need to do in order to get better. And then the cluster resource coordinators are supposed to follow up every quarter.
The theory of change there is that even though you can't layer on high-powered incentives, career concerns matter in bureaucracies. Nobody wants their boss to think badly of them, and nobody wants negative reports about them to go up the chain, even if nobody is realistically ever losing their job over it. That's the idea: just someone turning up and saying, you said you would do these things — did you actually do them? That's the most that can be done. So that's what the program was trying to do, and it's really similar to a national policy. Here is just a screenshot from their website.
The national policy is called Shaala Siddhi — it stems from [inaudible] Shaala Siddhi: school evaluation as the means, school improvement as the goal. This is partly in response to, I guess, a feeling amongst many fellow economists about why anyone would do something like this without incentives. Well, this is what governments do. So: the need for effective schools, and the idea that if you evaluate then you will be motivated, and people [inaudible] motivation, and everyone will get better — incrementally, but we'll get better.
And it's not just India — this is not an India-dysfunction paper at all; this is common in lots of places. Just from the World Bank project database, we mapped out where we see interventions with school inspections, with management training, and with school development plans. School development plans typically were followed up with either one or both of the other two. These things are really common, and this is just the ones that the World Bank has been involved in; lots of players are involved in this space.
The experimental design is really straightforward. Between 2014 and '16 the government wanted to do this in 2,000 schools. They wanted to start in one region, in five districts. These five districts had just under 12,000 schools, and they said: you can pick whichever 2,000. Those 2,000 we randomly assigned. An important design feature is that we did not want to randomly assign across schools, so this is assigned at the academic cluster level. The reason is the following: if I am the cluster resource coordinator and you tell me you really care about the outcomes of some schools under my charge and not others, it would be rational for me to just pay more attention to the schools that I know somebody further up the chain is keeping track of.
So we didn't want to crowd out effort from some places toward others. The other reason is that at scale, this is something that at least the government at the time wanted to do in all schools, and we wanted to avoid spillovers. So it was implemented in 153 clusters — they're called JSKs in this setting — 51 assigned to treatment, the rest to control. And all schools in a treatment cluster are covered by the intervention; essentially, from the bureaucrat's point of view, all schools under my charge are covered.
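The cluster-level assignment described above can be sketched in a few lines. This is an illustrative reconstruction, not the authors' actual randomization code; the cluster labels and the seed are hypothetical, while the counts (153 clusters, 51 treated) come from the talk.

```python
import random

# Hypothetical sketch of cluster-level random assignment:
# 153 academic clusters (JSKs), 51 drawn into treatment, the rest
# to control; every school in a treated cluster is covered.
random.seed(0)  # arbitrary seed, for reproducibility of the sketch
clusters = [f"JSK_{i:03d}" for i in range(1, 154)]  # 153 cluster labels
treatment = set(random.sample(clusters, 51))        # 51 treated clusters
control = [c for c in clusters if c not in treatment]

assert len(treatment) == 51 and len(control) == 102
```

Randomizing at the cluster rather than the school level is what prevents a coordinator from reallocating attention toward the monitored schools in their charge — within a cluster, every school has the same status.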
In the first phase it was done in just under 1,800 elementary schools and 116 secondary schools. I will only be showing you results on the elementary schools, because the national policy is only about elementary schools and much of the program design was focused on that. The results on the secondary schools are reported in the paper, so this is not us sweeping stuff under the carpet. And it's an evaluation at scale — these are 12,000-odd schools in these districts. That's the first phase of the experimental evaluation. This is the state of Madhya Pradesh; it literally means "central province", and it is in the center of the country.
In the second phase we wanted to broaden out our evaluation a bit, so we extended to five further districts — this is the Bhopal region, this is the Indore region — and we [inaudible] here partly because we were interested in seeing whether program fidelity could hold up in even more challenging places, which is the kind of thing you want to worry about as things scale up.
We study effects of the program in the first evaluation up to 18 months after the roll-out. As for the data we have: we have all of the implementation data from the online portal, with detailed assessments — all of the assessments and school improvement plans are uploaded. For student learning, we have 302 primary and middle schools split across treatment and control where we go and run our own assessments, and that's for a couple of reasons. One is that we wanted to make sure the tests covered a broad range of ability and would actually pick up effects which may not be picked up in school exams targeted at grade level — the average kid is really far behind, so you can improve their achievement a lot and still not pick it up on school exams. The other is that this is by nature an unblinded treatment: you are not agnostic to the fact that you have been monitored, that you have had to sit for days, that you have made a plan, and then somebody comes and keeps asking how you are doing on it.
We were realistically worried about inflation in administrative test scores and the like. In these schools we also took down individual student-level scores on official assessments that are comparable across schools in the state, and then for the full study population we have school-level aggregate scores. So that's the data on student learning, and student learning was our primary outcome. Then there are school processes: in the 300 schools we spent a lot of time collecting as much data on intermediate variables as we could. The reason is straightforward — you can think of lots of actions that may be prescribed for performance improvement that teachers adopt but that don't increase achievement, and increasing student achievement is a hard thing. So three times in the school year we went and collected teacher absence in surprise visits, we did classroom observations, we surveyed school principals, we surveyed teachers, we surveyed students. This was a lot of work.
That's the summary there. And, now everything is just results... Naveen how much time do I have?
I think you have 10 more minutes. But we need to have [crosstalk]
We have a few questions, so maybe if in the next five minutes you can finish up, we can get through some of the questions that have come in.
Okay. So, in the first phase, we saw that these assessments were implemented in about 93% of schools, and only about 9% of these schools are assessed as meeting standards. Almost no school is above standards — and these are rounded off to the first decimal point — most schools fail. The ratings split schools into "below standards", "close to standards" — which is a euphemism for you've failed but it's close enough — "meets standards", and "above standards". And if you look at things like teaching and learning, about three quarters of schools fail on assessments done by state officials — typically the cluster resource coordinator and somebody like a retired headteacher — on government-approved criteria. So it doesn't seem like there's collusion here, and we wouldn't necessarily have taken that for granted going in.
School improvement plans were developed everywhere, and these contain things like: what are the expected goals, what action do you have to take to get to that goal, who's going to take the action, when will you do this by, and who's going to verify that the action is completed? This was all meant to be concrete; in practice it differs, like all performance plans, in how concrete or implementable these things are. But all the plans that we've looked at contain some actions that schools could take.
So what do I mean? Just for concreteness: things like "the headteacher will conduct monthly observations of classroom teaching" — to me that's a relatively concrete action — or "teachers provide remedial instruction to students who score in the bottom two letter grades", and the like. Insufficiently concrete recommendations are things like "teachers should use different instruction strategies, teaching and learning materials, and activities" — wow, okay, that could not have been more vague — or "a friendly atmosphere will be created". What does a friendly atmosphere mean, and what do you need to get there? So it's a mix that way, but so are most performance plans that I've seen in universities; I don't know if [inaudible] better.
And everything else is basically zeros. I'm going to rush through some of these tables, but the core is that at no point do we see differences in the number of visits, the timing of the visits, how recent they were, or how much time was spent there. We also see no differences in comments, no differences in what teachers report they've been given feedback on, and no differences in teacher attendance — we're finding teacher attendance of about 66–67%, basically the same across the two groups. Student attendance is the same. The use of textbooks, the use of workbooks, things like teachers praising students, time use within the classroom, whether homework notebooks are checked — basically, we are not seeing differences anywhere.
You see the occasional star, but with this many comparisons none of it will survive a multiple-hypothesis test. We see no differences in school management committees, which is maybe not a surprise, but we also see no effect on test scores — not on ours and not on the government ones [inaudible] — so, depending on test score and specification, we can reject effects of about 0.1 of a standard deviation or so.
In case you wonder whether maybe some schools did better: this is the distribution of test scores. If you have theories about the type of school that could have done better, you need to balance that against the type of school that would have done worse. If you squint, you can see that there are two lines — these are just empirical CDFs — and the distributions look essentially identical. We can't reject that they're the same.
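A distribution-wide comparison of the kind described — asking whether the two empirical CDFs of test scores differ anywhere, not just at the mean — can be sketched with a two-sample Kolmogorov–Smirnov statistic. This is a minimal illustration on simulated data, not the authors' actual analysis (which would also need to account for the clustered design); the sample sizes and seed are hypothetical.

```python
import random

def ks_statistic(a, b):
    """Two-sample KS statistic: the largest vertical gap between
    the two empirical CDFs, evaluated over all observed points."""
    a, b = sorted(a), sorted(b)

    def ecdf(xs, t):
        # fraction of xs <= t, via binary search on the sorted list
        lo, hi = 0, len(xs)
        while lo < hi:
            mid = (lo + hi) // 2
            if xs[mid] <= t:
                lo = mid + 1
            else:
                hi = mid
        return lo / len(xs)

    return max(abs(ecdf(a, t) - ecdf(b, t)) for t in a + b)

random.seed(1)
treat = [random.gauss(0.0, 1.0) for _ in range(500)]  # simulated treated scores
ctrl = [random.gauss(0.0, 1.0) for _ in range(500)]   # simulated control scores
gap = ks_statistic(treat, ctrl)  # near-zero gap: distributions look alike
```

Two identical distributions give a statistic near zero, which is the visual "two lines on top of each other" pattern the slide shows.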
Okay, so that was the first intervention. In the scale-up, what we did is: in 100 clusters we had a matched-pair design, each pair with a treated and a comparison school in the same cluster, and these were identified before collecting data on student learning outcomes. I'm willing to explain a bit more of this in the questions if anybody cares. But the core is that, across the board, we find basically no evidence — if you just look at the treatment row, there's no evidence that this improved test scores in either Math or Hindi, which were the two subjects we had said we would assess. And if you look at this regression, that is about as close to zero as you can get. We don't think there was much happening here at all.
For the last part — I will take five minutes from this point — we tried to think about what we can possibly learn from this other than the fact that the program didn't work. The program didn't work and it's been scaled up, so we should care about that result. On paper the program had several things that we thought could make it likely to succeed: it had a coherent program design — people had at least thought through why they thought it might work — and there was a lot of buy-in; it was the flagship education reform in the state. And we see basically no effects.
What we did was, in six districts, we went and randomly sampled schools — regular program schools, control schools, and schools that had been designated champion schools, that is, schools the state had recognized as really good implementers. And then in each district we went to academic cluster offices and Block Education Offices and had open-ended interviews, both about constraints in the education system and about the program and assessments. So fairly broad-ranging, and only semi-structured.
The reading across all of those transcripts is basically the following: the program was basically seen as something that added paperwork. Even when we're talking to champion schools and the like, we hear things like: "there's a lot of documentation work — we make a work plan, then upload it, get it printed; there's so much paperwork that by the time some teachers figured it out they had forgotten what Shaala Siddhi itself was", or "I do all the documentation work at home because I have no time in the school". And one of the things that comes through with a number of respondents is essentially that everyone is thinking of the program in terms of administrative compliance.
So, the program's theory of change is about: you get information, you get incentives, you reflect, you improve your actions. Here, instead, everyone is thinking in terms of: from the state headquarters, this is what I have been asked to provide, this is the format in which I have been asked to provide it, this is where it needs to get uploaded, and I need to get this done because otherwise I shall be sent memos. So the focus of anybody implementing it is administrative compliance: "for one year we did everything that we were asked to; after that nobody asked us to do anything in the program, so we did not follow up on it at all"; "I don't remember what I did in the program two years ago"; "files aren't kept properly". Everything is about: where are the files kept, and are the files available to show if someone comes?
At the same time, the program is seen nationally as being successful. If you look at the national program documents that talk about Shaala Siddhi — the national program — they say there is a consensus that everyone who has tried this kind of program in India has found it to improve things. That is a bold statement, and it's in the official documents. So why did this program fail and yet be perceived as successful? First, why do things fail? For the most part here we're borrowing from the work of other people who've thought about this more carefully — people who've done careful ethnographies, people who've done careful qualitative work — because that, frankly, is not Karthik's strength or mine.
One is that there's a disconnect between the objectives of the program and how it's actually perceived by those implementing it — it's not a self-improvement kind of program in the way it gets implemented. The other is that there's a massive disconnect between the role that the program attaches to education officials — what cluster resource coordinators are meant to do — and the role as it's perceived by other agents in the system.
[inaudible], writing with various colleagues, has this nice phrase, the "post office state", in the context of education: even though these cluster resource coordinators are meant to provide guidance and feedback, they see their role as "I need to collect data from schools and send it up, and I need to take notifications from the top and take them to schools". That's what Yamani and [inaudible] call the post office state. And the last bit — and this comes up unprompted in a few interviews — is that everyone essentially sees programs as short-run: here's a one-shot thing that's come in for education policy, program priorities will change, the government changes its mind every other year, so this is not something that I want to invest [inaudible] in.
So that's what's happening at the bottom. What we think is happening at the top is the reason for the divergence between perception and reality. We're able to say that the impacts are zero because we went and measured them. But if you're looking at the view from the state capital, what do you see? You see whether the assessments were done, whether there was activity. The program got implemented in "mission mode" — a term beloved of many state governments in many contexts — and there, success is defined in terms of: did you actually manage to deliver the intervention? Did you manage to complete things? On those metrics the program is a success — an unqualified success. A 93% compliance rate is something that I don't always get on experiments that we run with far more oversight.
And then if you think of this more broadly, in the context of not just education but the Indian state: Akhil Gupta, who you will see here, has a very nice book, which was a joy to read, called Red Tape, and this is just an excerpt out of it — I would encourage everyone to go look at it. He's looking at land offices in UP and says: what stands out is higher officials in the administrative hierarchy making decisions about programs and targets that bear little relevance to realities on the ground; also present, in turn, are subordinates faithfully executing programs on paper but caring little for how well they're implemented. Targets are indeed met, but the ultimate goals of the program go unfulfilled. To some extent I think that captures what is happening as a process here.
And that finally leaves us with a question, because this is not unknown to agents in the system. People who are in schools know this, people who are cluster resource coordinators et cetera would know this, and if the government wanted to know, it could find this out as well.
So why do governments continue scaling these up? Again, it's not our [inaudible] to reinvent the wheel; we think one compelling explanation — we can think of at least one or two others — is just institutional isomorphism. If you go back to the classic DiMaggio and Powell paper from '83: organizations tend to model themselves after similar organizations in their field that they perceive to be more legitimate or successful. So there's a question there about why the government of Madhya Pradesh feels that it needs to learn from Ofsted in the UK. Well, because the UK is seen as doing well in education and as having an accountability regime that, at least from abroad, is seen as working.
And these institutional isomorphic processes can be expected to proceed even in the absence of evidence that they increase organizational efficiency; such mimicry has a ritual aspect — organizations adopt these innovations to enhance their legitimacy, to demonstrate they are at least trying to improve. This comes back to Tariq's question earlier about why MP was doing any of this — I think that was Tariq's question. So there's a ritual aspect: I need to demonstrate that I care about education, and this is the flagship reform that might take us there.
Right, so it's a straightforward paper and I'm not going to summarize the results again, but I just want to point out that this is incredibly expensive if you think people's labor has value. If you just assume that this takes five teacher-days per school, we're talking about 35,000 teacher-years of time taken by this exercise at scale. If you use reasonable back-of-the-envelope calculations based on teachers' salaries, that's an annual cost of about 235 million dollars. The fact that these things are getting scaled up and seem to be ineffective has a direct fiscal cost. To the extent we care about public finance, we really should care about this.
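The shape of that back-of-the-envelope calculation can be written down explicitly. The inputs below (working days per teacher-year, daily wage) are illustrative placeholders, not the actual salary data behind the 35,000 teacher-year and $235M figures quoted in the talk; only the school count and days-per-school come from the talk.

```python
# Illustrative back-of-the-envelope cost of the assessment exercise at scale.
# The talk's own totals rest on its own salary and workload assumptions;
# working_days_per_year and daily_wage_usd here are assumed for illustration.
schools = 600_000              # national scale-up (from the talk)
teacher_days_per_school = 5    # from the talk
working_days_per_year = 200    # assumed
daily_wage_usd = 20            # assumed

total_teacher_days = schools * teacher_days_per_school   # 3,000,000
teacher_years = total_teacher_days / working_days_per_year
annual_cost_usd = total_teacher_days * daily_wage_usd

print(f"{total_teacher_days:,} teacher-days "
      f"≈ {teacher_years:,.0f} teacher-years, "
      f"≈ ${annual_cost_usd / 1e6:.0f}M per year")
```

The point of the exercise is that even modest per-school time requirements, multiplied across 600,000 schools, produce a fiscal cost that is large enough to demand evidence of effectiveness.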
We think this reform is important partly because it's really large and has already been scaled up, but it's also relevant to understanding more than just this education policy. The returns to improving management may be really high in the public sector, but doing so is going to be a much bigger challenge than it has been with private-sector firms. In the paper we talk about this quite a bit. This is an experiment on one particular policy, so we can't identify what exactly would lead things to succeed; in the paper we talk about things that we think could work — incentives, visibility on outcomes, and simply inadequate staffing. The Indian state is also very good at layering functions without adding capacity. And of course there are issues around autonomy.
The last bit I would leave you with is that in some senses we see this evaluation as symptomatic of how various programs are judged — not just aid-funded ones; this is also true of national programs, and Shaala Siddhi is not an aid-funded program, it's funded by the national government. A lot of programs are judged on whether the program design reflects international best practices, whether things are done in partnership with the government at different levels, and how many beneficiaries it reached. If those are the metrics you use, this is an outstanding success — 600,000 schools; very few things in the world go to that scale. So a broader lesson from our study is that you want to discipline initiatives at this level with credible evaluations based on their impacts on the things you think are the ultimate outcomes of interest.
Sorry, I've definitely gone way over time. [crosstalk]
No, I think you've anticipated some of the questions that we've got. In the interest of time — I know some people will have to leave, and that's fine — those of you who want can stay on for another 5 or 10 minutes just to get through a couple of the questions. I'm going to go ahead and put two or three of the initial questions that we've got. One was from Kimberley [inaudible], who's a grad student here: is part of what's happening that the assessments are being done too soon? You talked about the time horizons of some of this, but perhaps if there were a plan to follow a cohort over time and measure a wider range of metrics — including health, work, and gender attitudes — you would see change over a longer period of time? So thinking of not just the time horizons of the scale-up et cetera, but even of your own evaluation — she had a question about that.
[inaudible] had a question about whether there was any comparison with Navodaya schools, or how Navodaya schools fit into this.
And then Pratchi had a question about public response: with these results — you mentioned presenting them to the state government — were they covered by the local press? Was there a community response? And more generally, was there a community response to the intervention at all?
So those are three initial questions if you want to take a crack at any or all of them?
Yeah. So, on the first one, trying to follow a cohort over time: one thing that we were conscious about — and this is why the project has taken this long — is that we wanted to make sure we were not basing our judgments on the first year of implementation, because it takes a while for these things to happen. That's also why we wanted to measure actions and not just learning outcomes: presumably, if the program was doing anything, we should have seen something change in classrooms, something change in report [inaudible], something change in how teachers are approaching the task — and we're just not seeing any signs there. And that's also why we thought it was incumbent on us to continue following up with the scale-up and trying to evaluate that, because there is an argument that bureaucracies only start taking things seriously once they reach a certain scale. So if you redefine the problem in that sense: you tried it for two years, it didn't work, then you got better at it and it worked when you went to 25,000 schools.
So that's why this took five years to do — five straight years of field work — and that's, I think, the kind of time that we could take. We don't think anything would change if we were doing this longer, because when you go and ask teachers, they're like: yeah, we did this two years ago, I uploaded the assessments and I haven't heard anything on it since. So if you're not picking up actions in the short term, I don't think we'd be picking them up over the longer term.
Then Pratchi had a question about community responses and the like. The program tried to attach specific roles to school management committees around parent involvement in schools and the like; there we don't really see action. As for whether we went and inserted information about program failure into specific communities — frankly, we haven't done that. The partnership here was with the government, and at the end of every fieldwork round we basically kept telling them, in real time, exactly what we had learnt. We did not want to treat this evaluation like an audit study of sorts, where we would only tell them once all the results were in. So things like the teacher absence results, and the fact that we weren't seeing any movement — those came in relatively early, even in the first phase.
On the comparison with Navodaya schools, I have two takes. One is that we deliberately do not make these comparisons, and we think that's a good thing. The reason is that there is a reason the government would always want to talk about Navodaya schools and [inaudible]: the average school does not look like a Navodaya — it does not have the same level of resources, it does not have the same level of staffing, it is not as selective at entry, it does not have the same infrastructure — and these are a tiny proportion of schools. So in some sense the question is not whether you can improve 50 or 100 schools, because that is not the scale [inaudible] that we have. We also do various trials of that mold; this is not one of those, and that's why, for this intervention, we don't make that comparison at all.
Just two questions to wrap us up. One: I was struck by the disjuncture between the stylized fact at the beginning — which I've heard, and I'm not an expert in education, but I've heard now repeatedly: we now have kids in school but they're not learning — and some of the data you present, where kids are not actually in school if their attendance rates are at 58% or whatever. I don't know whether that's specific to MP, or how those two things sit together: we've moved on to this problem of learning when they're in school, but many of them are still not in school.
And the second was: could you talk about the way in which, in the reformulation and the scale-up, it went to self-assessment? It seems to me that decision was purely on feasibility grounds, because it doesn't seem study-related at all — your kind of theory of change was with the external assessment. The switch in the scale-up was backing away from that, so I didn't quite understand that decision; maybe it was happening in some other way.
Yeah, so a couple of things. Student attendance is low in MP even in comparison to national norms. But I want to differentiate between the extensive margin of attending school and the intensive margin. The proportion of students who are not enrolled is quite low; irregular attendance can be quite high, but it is not as if there are 40% of students who never show up. That doesn't seem to be the case. Methodologically, for anyone who actually wants to set up school-level panels, this is an absolute nightmare: if you go and test kids once and then go again, there's a certain portion who weren't present the first time and a certain portion who aren't present the next time.
I think that's something that we need to worry about. Shoot I forgot the...
The other one was just about the switch in the scale-up to simply go [crosstalk] to self-assessments — self-assessments in schools. What was the logic there?
So the logic there is that, I think, across the board, people involved in the design of the program really believed that the only chance this has of succeeding is with some version of collaborative framing. The national program took that framing one step further: the first step is that schools self-assess, and then the CRC is [inaudible] to show up and verify. Some of this has to do with feasibility, presumably. But that's what the national program looks like; the scale-up was [inaudible] into the national program, and the design of the intervention ends up being a bit of a composite — the MP model was a slightly higher-intensity variant of what the national model is. So one way to think about this, if you're thinking of external validity, is that the version being scaled up is even weaker on incentives than what we evaluated.
Mm-hmm (affirmative). Okay, I think we are already over time. So I'll thank everyone for coming, and most of all thank Abhijeet for sharing his really interesting work with us. Just to remind you, our next meeting will be two weeks from now, and we'll have [inaudible], who will be speaking on gender quotas and political inclusion in India's VP institutionalized party system — again, two Thursdays from now. So thanks again for joining; Abhijeet, thanks again for joining us as well, and we hope we can see you in person next time you're at CASI.
Thanks a lot for having me. And just feel free to email me, I'm happy to hang around on the call if there's [crosstalk].
Thanks, thanks everyone.