Tuesday, February 23, 2021

The fallacies of a sampling approach to student assessment

 

Charles Ungerleider, Professor Emeritus, The University of British Columbia

[permission to reproduce granted if authorship is acknowledged]

My previous blog post examined some of the claims made about large-scale student assessments: that they prompt teachers to teach to the test; waste valuable time and resources; do not assess everything that is important; are stressful for students and teachers; do not take into account differences among students; and allow some individuals and organizations to make invidious comparisons among schools.

Another argument that opponents make about large-scale student assessments is that they should be administered to a percentage of students (a sample) rather than to all the students at a particular grade level (a census approach). That, it is argued, would save resources and prevent those who misuse the results from doing so. It is doubtful that a sample approach to large-scale student assessment would achieve those outcomes. But there is a more serious problem: sampling will not work if one is really concerned about equity of outcomes for all students, because it reduces the ability to identify factors that impede success for each and every student.

A sampling approach (as opposed to a census approach) has other deficiencies. Sampling prevents examining the trajectories of individual students. In other words, with a sample, it is almost impossible to tell whether the students who performed poorly at grade 3 had improved by grade 6, because the students sampled at grade 3 will largely not be the students sampled at grade 6. Sampling also doesn’t allow the system to see what has happened to transient students, a particularly vulnerable population that is easy to overlook. Finally, it reduces precision, because every estimate drawn from a sample carries sampling error that a census does not. If you want to know how students’ prior performance relates to their subsequent performance, you need to assess all students.

I can illustrate the benefits of a census approach with a study from British Columbia. In 2006, the Minister of Education described the high school completion rates of students for whom English was a second language (ESL) as better than those of any other group the Ministry assessed (Victoria, Parliamentary Debates, p. 3530). After a closer examination of the data, Bruce Garnett and I (Garnett and Ungerleider, 2008) confirmed the accuracy of the Minister’s statement, but found that the high achievement of the numerous Chinese speakers masked the fact that smaller subgroups of the ESL population were faring poorly in the school system. The strong performance of Chinese speakers (the largest ESL group in the data set) pulled the aggregate ESL graduation rate upward. The graduation rates of all groups except Chinese speakers were very low, generally below 60%. The worst outcomes were among ESL speakers of Spanish, Vietnamese and Filipino languages. Identifying the differential success of various ESL groups would not have been possible if the data set had been generated on a sample basis. You cannot systematically address problems that you cannot identify.

Another advantage of a census approach is that it permits analyses that can help identify factors within the system’s influence that facilitate or impede the educational progress of groups of learners (for example, First Nations, Métis, and Inuit students, second language learners, and students with special needs).

Most important, if equity among students is a priority, samples simply do not work. Even with carefully drawn samples, it is difficult to capture enough students from small sub-populations to support meaningful analyses. I can illustrate this with reference to British Columbia’s student population which, at the time of the calculations below, numbered 40,848 students at grade 4. With a student population of that size, we could draw a sample of 1,481 students, a number sufficient to meet the requirements for sound statistical analyses of the grade-level population as a whole.
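To see why small groups are hard to study with a sample, the arithmetic below uses the two figures from the paragraph above (40,848 grade 4 students and a sample of 1,481). It is a minimal sketch assuming simple random sampling: the expected number of students from a subgroup is the sampling fraction times the subgroup size, and the margin of error uses the usual normal approximation for a proportion with p = 0.5, ignoring the finite population correction. The subgroup sizes (100, 200, 500) and the function names are hypothetical, chosen only for illustration.

```python
import math

# Figures from the text: grade 4 population and a single province-wide sample.
POPULATION = 40_848
SAMPLE = 1_481


def expected_in_sample(subgroup_size: int, sample: int = SAMPLE,
                       population: int = POPULATION) -> float:
    """Expected number of subgroup members in a simple random sample
    drawn without replacement (the hypergeometric mean, n * K / N)."""
    return sample * subgroup_size / population


def margin_of_error(n: float, z: float = 1.96, p: float = 0.5) -> float:
    """Approximate 95% margin of error for a proportion estimated from n
    students (normal approximation, no finite population correction)."""
    return z * math.sqrt(p * (1 - p) / n)


# Hypothetical subgroup sizes, for illustration only.
for k in (100, 200, 500):
    m = expected_in_sample(k)
    moe_points = 100 * margin_of_error(m)
    print(f"subgroup of {k}: ~{m:.1f} students sampled, "
          f"margin of error about +/-{moe_points:.0f} percentage points")
```

Under these assumptions, a subgroup of 200 grade 4 students would contribute only about 7 students to the provincial sample, and any estimate for that subgroup would carry a margin of error of roughly ±36 percentage points.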

However, if you wanted to break down results by school board or to study the performance of sub-groups of students, a sample of that size would not work. Here is an illustration of why. The illustration assumes that the assessment is intended to produce board-level results with a margin of error of ±2.5 percentage points, 95 times out of 100.

| School board | Grade 4 students available for assessment (2019/20) | Sample required (±2.5 percentage points, 95% confidence) |
|---|---|---|
| Abbotsford | 1,524 | 765 |
| Alberni | 294 | 247 |
| Arrow Lakes | 34 | 33 |
| Boundary | 99 | 93 |
| Bulkley Valley | 144 | 132 |
| Burnaby | 1,740 | 816 |
| Campbell River | 397 | 316 |
| Cariboo-Chilcotin | 305 | 255 |
| Central Coast | 26 | 26 |
| Central Okanagan | 1,690 | 805 |
| Chilliwack | 1,022 | 614 |
| Coast Mountains | 271 | 230 |
| Comox Valley | 631 | 448 |
| Conseil Scolaire francophone | 601 | 432 |
| … (table truncated*) | … | … |
| Richmond | 1,377 | 726 |
| Rocky Mountain | 288 | 243 |
| Saanich | 484 | 368 |
| Sea to Sky | 409 | 323 |
| Sooke | 868 | 555 |
| Southeast Kootenay | 466 | 358 |
| Stikine | 15 | 15 |
| Sunshine Coast | 235 | 204 |
| Surrey | 5,459 | 1,199 |
| Vancouver | 3,493 | 1,067 |
| Vancouver Island North | 94 | 89 |
| Vancouver Island West | 35 | 34 |
| Vernon | 610 | 437 |
| West Vancouver | 501 | 378 |
| Grand total | 40,848 | 22,200 |

*Full table available upon request.

In very large boards such as the Surrey School Board, the number of students sampled (1,199) would be a relatively small proportion (about 22%) of the grade 4 student population (5,459). In a smaller board such as Vancouver Island North, the required sample (89) would encompass almost all of the board’s 94 grade 4 students. In the Conseil Scolaire francophone, the required sample (432) would be more than two-thirds of its 601 grade 4 students.
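The post does not say how the sample sizes in the table were calculated. One standard approach that reproduces the figures shown, including the 1,481 quoted for the province as a whole, is Cochran's formula for estimating a proportion, n0 = z²·p(1 − p)/e² (assuming p = 0.5, the most conservative choice, z = 1.96 for 95% confidence, and e = 0.025), followed by the finite population correction n = n0 / (1 + (n0 − 1)/N), rounded to the nearest student. The sketch below is a reconstruction under those assumptions, not necessarily the author's method; the function name `required_sample` is hypothetical.

```python
def required_sample(population: int, margin: float = 0.025,
                    z: float = 1.96, p: float = 0.5) -> int:
    """Sample size needed to estimate a proportion within +/- margin at the
    confidence level implied by z, using Cochran's formula plus the finite
    population correction (an assumed reconstruction of the table)."""
    n0 = z ** 2 * p * (1 - p) / margin ** 2   # infinite-population size, about 1,537
    n = n0 / (1 + (n0 - 1) / population)      # finite population correction
    # Rounded to the nearest student, which matches the table
    # (rounding up would give 766 for Abbotsford instead of 765).
    return round(n)


# Grade 4 counts quoted in the text.
boards = {
    "Surrey": 5459,
    "Vancouver Island North": 94,
    "Conseil Scolaire francophone": 601,
}
for name, size in boards.items():
    n = required_sample(size)
    print(f"{name}: sample {n} of {size} ({n / size:.0%})")

print("Province treated as one population:", required_sample(40_848))
```

The finite population correction drives the pattern described above: when a board's enrolment N is small relative to n0 (about 1,537), the required sample approaches N itself, so small boards would have to assess nearly every student.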

Overall, if you wanted to break down results by school board or to study the performance of sub-groups of students, you would need to sample more than half (22,200) of the 40,848 students at grade 4.
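The "more than half" figure follows directly from the table's totals. The grand total of board-level samples (22,200) includes boards omitted from the truncated table, so this short sketch takes it as given rather than recomputing it; the helper name `share_assessed` is hypothetical.

```python
# Totals taken from the table; the board-level sum cannot be recomputed
# here because the published table is truncated.
TOTAL_GRADE4 = 40_848
SUM_OF_BOARD_SAMPLES = 22_200
SINGLE_PROVINCIAL_SAMPLE = 1_481  # one sample for the province as a whole (from the text)


def share_assessed(sampled: int, population: int) -> float:
    """Fraction of the population that would have to be assessed."""
    return sampled / population


board_share = share_assessed(SUM_OF_BOARD_SAMPLES, TOTAL_GRADE4)
provincial_share = share_assessed(SINGLE_PROVINCIAL_SAMPLE, TOTAL_GRADE4)
print(f"Board-level sampling: {board_share:.1%} of grade 4 students")
print(f"Single provincial sample: {provincial_share:.1%} of grade 4 students")
print(f"Board-level samples combined are about "
      f"{SUM_OF_BOARD_SAMPLES / SINGLE_PROVINCIAL_SAMPLE:.0f} times the provincial sample")
```

Because each board needs a sample large enough to meet the ±2.5-point standard on its own, the board-level samples together come to roughly fifteen times the single provincial sample.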

It is inconsistent for those concerned about the role education plays in helping to achieve social justice to want to restrict large-scale student assessments by arguing in favour of sampling students. Those determined to achieve social justice ought to want to shine a light on discrepancies, not obscure them. My hunch is that, when they consider the evidence of the limitations posed by sampling, advocates for social justice will see the benefits of a census approach to large-scale student assessment.