Wednesday, June 3, 2020

Should we care about the decline in PISA scores?


Charles Ungerleider, Professor Emeritus, The University of British Columbia
[permission to reproduce granted if authorship is acknowledged]


There is little doubt that results from the Programme for International Student Assessment (PISA) get attention. When I entered the search term “PISA results” under the news heading in my browser just now, it returned more than 67,000 references in about a quarter of a second. Every three years the results are celebrated by politicians in jurisdictions that are ‘winners,’ like Canada, and loathed by those presiding over education in jurisdictions that are ‘losers.’

PISA is the name given to the assessments administered to 15-year-olds in reading, mathematics, and science. The assessments are administered in more than 35 countries (which PISA often calls “economies”) and more than a dozen partners, which include countries such as Brazil and economic regions such as Shanghai, Hong Kong, and Macau. It is convenient for PISA to use the term ‘economies’ to speak of all participating entities.

Each administration of PISA assesses all three domains (reading, mathematics, and science), but gives prominence to one of the three domains in each cycle. In 2000, 2009 and 2018, the principal assessment domain was reading. In 2003 and 2012, it was mathematics. In 2006 and 2015, science was the focal domain. In 2021, the focus will be on mathematics again with an additional test in creative thinking. And, in 2024, PISA will measure “Learning in the Digital World,” the ability of students to use self-regulated learning while they employ digital tools.

PISA derives its support from the countries and economies that participate in the assessments. To sustain itself, PISA must maintain the continuing support of previous participants while also encouraging new ones to join. PISA combines the assessment of the three domains with the assessment of abilities in other areas: financial literacy, creativity, and digital learning, for example.

It is doubtful whether PISA would earn repeat business if the assessments of the major domains differed significantly from round to round. “If you want to measure change, do not change the measures” – at least not too much. Thus, while the assessments do change from one round to the next, the people who analyze the results perform a variety of statistical operations to assure participating jurisdictions that the assessments are equivalent. They also place the results on a common scale, centered at 500 with a standard deviation of 100 score points.
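As a rough illustration of what placing results on a common scale means, the sketch below maps a set of invented raw scores onto a scale with a mean of 500 and a standard deviation of 100. The real PISA scaling is far more involved (it relies on item response theory and plausible values); the numbers here are made up for the example.

```python
import numpy as np

# Illustrative only: the real PISA scaling relies on item response theory and
# plausible values. This sketch shows the simpler idea of placing raw scores
# on a common scale centered at 500 with a standard deviation of 100.
rng = np.random.default_rng(seed=0)
raw_scores = rng.normal(loc=30, scale=8, size=1000)   # hypothetical raw test scores

standardized = (raw_scores - raw_scores.mean()) / raw_scores.std()
scaled = 500 + 100 * standardized

print(scaled.mean(), scaled.std())   # approximately 500 and 100 by construction
```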

Large-scale assessments are helpful in determining whether school systems are producing better student outcomes and reducing educational inequalities among groups of students over time. The jurisdictions that have such mechanisms are advantaged. Ontario, for example, has an Education Quality and Accountability Office (EQAO). A body with some independence from the provincial Ministry of Education, EQAO conducts province-wide, census-type, large-scale assessments in reading, writing, and mathematics at the primary and junior divisions; applied and academic mathematics at Grade 9; and the Ontario Secondary School Literacy Test (OSSLT) administered at Grade 10. But education systems that do not have their own census-type large-scale assessments are at a disadvantage and must rely on external benchmarks against which they can measure their progress over time.

Because of my interest in improving student outcomes and reducing educational inequalities, I have been an observer of PISA since it began at the turn of this century. I am primarily interested in how jurisdictions interpret the results and what use, if any, they make of them to improve outcomes.

Earlier in this blog I used the terms ‘winners’ and ‘losers.’ I did that because the leadership in most jurisdictions treats PISA like a horse race: “Who won?” “Who lost?” “How well or badly did we do?” There are some significant challenges to making use of the results.

Those who are responsible for PISA want it to attract attention and earn support. But they caution that PISA results are not a measure of the impact of schooling per se; they are a cumulative measure of the prior experiences the 15-year-olds have had and of the many factors that influence those experiences, such as poverty and parental education. They also caution that PISA is not aligned with the curricula of the many countries and economies that participate.

Notwithstanding these significant limitations, I have spent quite a bit of time reading and consulting with colleagues about the decline in PISA scores over time. The decline has occurred in all three major domains (reading, mathematics, and science), both internationally and within Canada. I have represented that decline in the chart below, which is devoted to mathematics. I chose mathematics because it is an area about which there has been much hand wringing. The chart includes the data for all the Canadian provinces, Canada, and the OECD average (excluding the partner economies because they are not countries).

 

[Chart: PISA mathematics scores over time for the Canadian provinces, Canada, and the OECD average.] This chart was produced from data extracted using the PISA International Data Explorer.


My colleagues and I, many of whom have international reputations in measurement, statistics, and education, are baffled. We are not certain what the decline should be attributed to, or how significant it is. There is no shortage of hypotheses.

With repeated measurements of the same phenomenon, there is a tendency for scores at the extreme ends of the distribution (high or low) to be followed by ones that are closer to the mean. The trend in the PISA data may reflect such regression toward the mean; the sketch after this paragraph illustrates the idea. Another hypothesis is that over the nearly 20 years of PISA assessments, students have come to spend more time on computers and less time reading print material, and the kind of reading they do has changed, if not in kind then in degree. According to this hypothesis, students devote less mental effort to reading, and that loss of mental effort is reflected in all subjects. Still another hypothesis is that the effort to retain students who would otherwise have dropped out of school has paid off, but that the students retained are less able and thus ‘dilute’ performance over time. Yet another hypothesis is that successive generations of students have become desensitized to large-scale assessments, attributing less importance to them and expending less effort on them than in the past. I could go on, but I won’t.
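As a rough illustration of the first hypothesis, the simulation below uses entirely made-up numbers: each hypothetical jurisdiction has a stable ‘true’ score, and each assessment round adds independent measurement noise. The jurisdictions that happen to land near the top in one round tend, on average, to score closer to the mean in the next, even though nothing real has changed.

```python
import numpy as np

# Illustrative simulation of regression toward the mean, with made-up numbers.
# Each hypothetical jurisdiction has a stable 'true' score; each round of
# assessment adds independent measurement noise.
rng = np.random.default_rng(seed=1)
true_scores = rng.normal(500, 30, size=200)            # underlying performance
round_1 = true_scores + rng.normal(0, 20, size=200)    # observed scores, round 1
round_2 = true_scores + rng.normal(0, 20, size=200)    # observed scores, round 2

top_in_round_1 = round_1 > np.quantile(round_1, 0.9)   # the round-1 'winners'
print(round_1[top_in_round_1].mean())                  # well above 500
print(round_2[top_in_round_1].mean())                  # closer to 500 on the retest
```

Nothing in the simulation changes between rounds; the apparent drop among the round-1 ‘winners’ is produced entirely by measurement noise.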


Do these tentative explanations deserve examination? They are worthy topics for a dissertation. But if a jurisdiction has a robust system of large-scale assessments upon which it can rely for examining change over time, it would be more productive to focus on the data produced by that system than to depend upon PISA. This is particularly true if the large-scale assessments are closely linked to the jurisdiction’s educational goals and curricula; allow for assessment at regular intervals throughout students’ educational careers; and are amenable to close analysis of the relationship between the factors over which schools have control and the outcomes measured.


Jurisdictions that do not have robust systems for large-scale assessment and do not have the resources to develop them will be dependent upon assessments such as PISA. For them, understanding why PISA scores are declining is a necessary prelude to understanding the results their students obtain. Jurisdictions that depend on PISA alone for an external measure of system performance would also be wise to invest in some oversampling so that they can track the performance of subpopulations.