How are the SATs expected standards set?

03 November 2016 by Anne Heavey
The key stage 2 test results this year look very different to results in previous years. The curriculum being assessed is different, the tests are different and the reporting of results is different. At ATL we thought you might be interested in understanding how the standard was set for each test.

As schools receive their KS2 test data back, many teachers will be asking the following question: how was the expected standard set for these tests? Back in 2015, several groups of teachers met to agree the test performance descriptors. Each descriptor creates a picture of the attainment that could be expected of a child working just at “the expected standard” as set out in the National Curriculum. These performance descriptors form a core part of the later standard setting process, so they are extremely important. It is these descriptors that have been described as “broadly in line with 4B” and were published in the test frameworks in 2015. The standard setting process involved different teachers; 2500 teachers applied to take part. Two groups were selected to set the standard for each test area.

Each group had approximately 30 members, drawn from primary teaching and leadership backgrounds. The KS2 groups also had at least one teacher who taught year 7. The groups were constructed to ensure that a wide demographic range of pupils was represented. Having two different groups setting the standard on each test allows the groups to validate each other’s judgments.

Prior to attending the standard setting day, each participant completes a set activity to ensure that they are familiar with the performance descriptors for each test, against which judgments will be made. On the day of the standard setting the 'bookmark' method is used. This method of standard setting is used in many countries and is widely accepted as a robust and fair approach. A test item booklet is constructed for the exercise. In the booklet the test items for each test area are placed in order of difficulty. This is determined by the number of children who successfully completed each question in the 2016 tests. The bookmark, and therefore the standard, is placed at the last question in the booklet that the participant feels confident a child just at the expected standard (as laid out in the performance descriptor) would have a two-thirds chance of successfully completing.

In round one, each individual participant makes their own judgment of where they believe the bookmark falls. The bookmark placements may vary significantly at this point. The results of this round are discussed as a whole group. In round two, participants work in small groups to agree the placement of the bookmark. Again the results are discussed as a whole group. Between rounds two and three, impact data is shared with the groups. This impact data outlines the percentage of children nationally who would meet the expected standard at each of the suggested bookmark placements. In round three, the whole group decides upon a final bookmark placement, using the impact data as further evidence to support the placement.

The final bookmarks selected by both groups then go forward to the standard confirmation meeting. In the standard confirmation meeting, staff from the Standards and Testing Agency and the chair of their technical advisory group of international experts discuss the results of the standard setting meetings and make a final recommendation for the setting of the standard. The agreed final score becomes the 100 point on the scaled score. Teacher and head teacher unions observed this meeting.

There are some interesting points about this process:

  • The expected standard mark had not been determined ahead of these standard setting meetings.
  • Real teachers who have taught the curriculum and supported children through the assessment set the standard.
  • The level of difficulty of each question is determined by the actual performance of this year’s year 6 cohort.
  • The bookmark is placed at the last point at which teachers think a child just at the expected standard would have a two-thirds chance of achieving the mark.
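
For readers who like to see a process worked through, here is a minimal Python sketch of two of the steps described above: ordering the booklet by how many children answered each question correctly, and producing impact data for a candidate bookmark. All of the data and the function names (`order_items_by_difficulty`, `percent_meeting_standard`) are hypothetical. The sketch also assumes one mark per question and treats a bookmark's position in the booklet as the raw cut score; this is a simplification for illustration, not a description of the Standards and Testing Agency's actual calculation.

```python
from typing import Dict, List


def order_items_by_difficulty(correct_counts: Dict[str, int]) -> List[str]:
    """Return item ids from easiest to hardest (most to fewest correct answers)."""
    return sorted(correct_counts, key=lambda item: correct_counts[item], reverse=True)


def percent_meeting_standard(pupil_scores: List[int], cut_score: int) -> float:
    """Return the percentage of pupils scoring at or above the cut score."""
    if not pupil_scores:
        raise ValueError("pupil_scores must not be empty")
    meeting = sum(1 for score in pupil_scores if score >= cut_score)
    return 100.0 * meeting / len(pupil_scores)


# Hypothetical data: number of pupils (out of 10) answering each item correctly.
correct_counts = {"Q1": 9, "Q2": 4, "Q3": 7, "Q4": 2, "Q5": 6}
booklet = order_items_by_difficulty(correct_counts)

# Hypothetical total scores for the same 10 pupils (assumes one mark per item).
pupil_scores = [5, 4, 4, 3, 3, 3, 2, 2, 1, 1]

# Impact data for every possible bookmark position (1 = first item in the booklet).
# Assumption: the cut score equals the bookmark's position in the booklet.
for position, item in enumerate(booklet, start=1):
    impact = percent_meeting_standard(pupil_scores, cut_score=position)
    print(f"bookmark at {item} (position {position}): {impact:.0f}% meet the standard")
```

With this made-up data, moving the bookmark one question later in the booklet lowers the share of pupils meeting the standard, which is the kind of trade-off the impact data lets the groups see before round three.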

The “pass mark” was not set before the tests were taken, and the standard was set by teachers who had delivered the curriculum and administered the assessments. However we may feel about the curriculum and the tests, one thing we can say is that the profession had control over setting the standard and the process was robust. A very similar process was used to set the standard for the Key Stage 1 tests. The Science standard will be set later in the year.
