Revisit Agarwal's Theory on Testing Effect for 8th Grade Science Education: Improving Retention and Recall

Name: Jesse Anderson

Class: Psychology 181 – Section 010

Teacher: Dr. Brown-Kramer

TA: Melinda Powell

Date: 10-26-2018

The Effects of Quiz Frequency and Placement

Summary

The testing effect is a well-documented phenomenon among students in the school system. Essentially, it is the improved recollection of information that results from frequent administration of tests or quizzes prior to final, criterial tests. That is, students better remembered information pertaining to these criterial tests if they had seen it before, and seen it more often (Agarwal et al., 2011). Stemming back to the early 1900s, the testing effect has been studied in numerous ways. However, many of these examinations did not reflect real classrooms, as they presented, and tested on, information unrelated to the students’ class material. Only a handful of studies attempted to use actual course work. Unfortunately, in these studies the criterial tests either did not count toward the students’ overall course grade or were given after final grades had been assigned; in both cases students had little reason to care about the outcome (Agarwal et al., 2011). Thus, these studies did not accurately portray authentic classroom settings.

Therefore, building on the prior research, Agarwal et al. (2011) hypothesized that a more frequent and continual quizzing regime, conducted throughout the year and based on the material to be examined, could promote increased levels of learning and more persistent recall, thus improving test scores on core summative curricular content. Their hypothesis rests not only on the faults of previous studies but also on the successful implementation, by particular teachers, of prelecture, postlecture, and review quizzes (Roediger et al., 2010). This regime was found to enhance performance on summative tests for the given material. Naturally, various questions were raised. The first was whether “these positive effects of quizzes could be obtained for middle school science?”; the second was “whether the significant benefits of quizzing rest on the particular three-quiz regimen reported in 2010 by Roediger, Agarwal, McDaniel, et.al.?” (Agarwal et al., 2011); the last was why the “robustness of quizzing effects (if found) for long retention intervals has received little attention?” (Agarwal et al., 2011). Multiple experiments were conducted to address these questions; they evaluated students’ performance on end-of-semester and end-of-year exams while varying the number of repeated exposures and the placement of quizzes (Agarwal et al., 2011). This variation allowed the researchers to isolate and identify the most effective technique for producing the testing effect, which in turn would support or reject their hypothesis.

The ongoing, comprehensive study by Agarwal and colleagues is important for understanding how students can better retain and recall information, which in turn can improve overall grades. Increasing scores, especially in science, has become a significant priority for the acquisition of school funding, as well as for policy and educational research. Not only do higher final grades promote confidence in a student’s abilities, but frequent quizzing could also potentially help alleviate test anxiety and improve the accuracy of cognitive judgments.

Materials and Methods

Experiment 1 was conducted with a subset of 139 participating eighth-grade science students attending a suburban, middle-class middle school in the Midwest. Informed consent was received from the participants’ parents, and the study was approved by the school board, principal, and associated faculty of the school. Six groups, or classrooms, with a mean of 24 students each were randomly assigned the randomly pooled quizzes. Thirteen students in special education or gifted programs were excluded from analysis, as were 28 students not present for all initial, unit, and delayed exams. All remaining students except three agreed to have their data incorporated into the study. Thus, data from 92 students were ultimately recorded. The participating teachers’ course material and lesson plans were unchanged, and the teachers approved all quizzes and questions formed for the study; the shortest quiz had three questions and the longest had eight. The pool of questions, either multiple-choice (quizzed) or inference (nonquizzed), was randomly assigned to each pre- and postlesson quiz. All questions presented on these quizzes were included in the review quiz (Agarwal et al., 2011).

In Experiment 1, the study implemented the three-quiz regime stated earlier. The subjects of interest were genetics, evolution, and anatomy (1, 2, & 3). The students were instructed by their teachers to read their respective chapters prior to attending class. An initial prelecture quiz, based on the assigned reading, was given first, followed by a postlecture quiz based on in-class material and a review quiz 24 hours before the exam. The pre- and postlecture quizzes, all randomized, were given in the form of clicker questions, and the teacher was to leave the room in order to preserve the integrity of the test. Correct answers were shown briefly to the class as feedback, without revealing who answered what. These quizzes did not count toward the students’ grades and served as measures of individual comprehension. The review quiz, on the other hand, counted toward a small percentage of the students’ overall grade (10%), and the teacher remained present during this period. After all quizzes, the students were informed that any number of the questions might appear on the final exam. Half of the quizzed questions were on the unit exams, but the multiple-choice options were changed so that students had to rely on recollection rather than recognition. Overall exam scores were returned to the students a day after administration, but students did not receive “item-by-item” feedback. Students were informed one month in advance of the end-of-semester exam, half of which consisted of previously quizzed questions and half of which did not. The end-of-year exam did not count toward the overall grade and was given unbeknownst to the students. It contained none of the end-of-semester exam questions, but half of its questions came from previous quizzes. Retention intervals ranged from 5 to 8 months between the unit exam and end-of-year exam (Agarwal et al., 2011).

Two additional experiments were performed in order to identify the most effective quiz placement for producing the testing effect. Experiment 2a used the same individuals as Experiment 1, with all consent and approvals remaining in place. The procedure for quiz implementation remained intact, and all students still received the pre-post-review quizzes. However, which quizzes contained the content that would later appear on the unit exam varied, such that eight different conditions were formed. Material on the exam had appeared on none, one, two, or all three quizzes, depending on the condition assigned for the particular subject. The exam material could therefore appear on: (a) none of the quizzes (control); (b) only the prelesson quiz; (c) only the postlesson quiz; (d) only the review quiz; (e) both the pre and post quizzes; (f) both the pre and review quizzes; (g) both the post and review quizzes; (h) all three pre-post-review quizzes. Experiment 2a still included delayed-retention end-of-semester and end-of-year exams (Agarwal et al., 2011).
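
The eight conditions are simply every possible subset of the three quiz types (2 x 2 x 2 = 8). As an illustration only, and not something taken from the study itself, a short Python sketch can enumerate them; the names QUIZ_TYPES and quiz_conditions are my own and purely hypothetical.

```python
from itertools import combinations

# The three quiz placements described in the study (labels are my own shorthand).
QUIZ_TYPES = ("pre", "post", "review")


def quiz_conditions(quiz_types=QUIZ_TYPES):
    """Return every subset of quiz types, smallest subsets first."""
    conditions = []
    for size in range(len(quiz_types) + 1):
        # combinations() keeps the input order, so "pre" always comes before "post".
        for combo in combinations(quiz_types, size):
            conditions.append(combo)
    return conditions


conditions = quiz_conditions()

# Print the subsets next to the essay's (a)-(h) labels.
for label, combo in zip("abcdefgh", conditions):
    print(f"({label})", "-".join(combo) if combo else "none (control)")
```

Listing the subsets by size reproduces the ordering used above: the control first, then the three single-quiz conditions, then the three pairs, and finally all three quizzes together.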

Experiment 2b did not use the same eighth graders as Experiments 1 and 2a; none of the individuals had taken part in the former experiments. Experiment 2b was conducted with a subset of 148 participating eighth-grade science students attending the same suburban, middle-class middle school in the Midwest. Informed consent was again received from the participants’ parents, and the study was approved by the school board, principal, and associated faculty of the school. Six groups, or classrooms, with a mean of 24 students each were randomly assigned the randomly pooled quizzes. Twenty-five students in special education or gifted programs were excluded from analysis, as were 69 students not present for all initial, unit, and delayed exams. All remaining students agreed to have their data incorporated into the study. Thus, data from 54 students were ultimately recorded. The participating teachers’ course material and lesson plans were again unchanged, and the teachers approved all quizzes and questions formed for the study (Agarwal et al., 2011).

Moreover, Experiment 2b sought to replicate 2a’s procedure, but with different course content. The eight previous conditions and all quiz implementations remained, but with the addition of two delayed-retention exams. Also, instead of covering five courses, this experiment covered only two: genetics and chemistry. These two courses were divided into four “chapters,” for which a total of eight prelesson and eight postlesson quizzes were created, varying in length with the chapter. All delayed-retention exams followed the same procedure as in Experiment 1 (Agarwal et al., 2011).

In summation, Experiment 1 could be considered a control, correlational experiment because no variable was manipulated across the population. All students received quizzes (pre-post-review) in which half of the content would be on the unit exam. Although the questions for each section were different and the multiple-choice answers were scrambled on the exam, there was no control variable. However, it could be argued that the unquizzed material served as the control, making prior exposure through quizzing the independent variable. Regardless, Experiment 1 was simply a foundation for seeing whether there was a correlation between repeated quizzing on summative course information and exam scores (the testing effect), attributed to memory retention and cognitive processing. Experiment 2, however, was a true experiment. It had a control condition: none of the quizzes contained unit exam material. The independent variable was the number and placement of quizzes containing exam material (conditions a-h). The dependent variable was therefore the exam scores obtained under each condition. Both Experiments 1 and 2b were longitudinal, as they included delayed-retention exams that could be compared with one another, which is why Experiment 1 can be considered a control overall.

Results

Experiment 1 saw a substantial increase in scores, and thus learning, between the prelesson and postlesson quizzes, and then again between the postlesson and review quizzes. All course subjects displayed this type of statistically significant increase, reflected in elevated exam scores. On the delayed-retention exams, the researchers found a statistically significant increase in retention for material students had previously encountered on multiple quizzes (pre-post-review), over a period of almost nine months. The delayed examinations indicate that retention was highest up to three months after exposure to the material, with scores eventually becoming comparable to those for nonquizzed material after eight months (Agarwal et al., 2011).

Experiment 2a then sought to identify whether these successes depended on a particular placement of the quizzes containing exam material. Among conditions with a single such quiz (pre-only, post-only, or review-only), post-only and review-only were found to significantly increase students’ unit exam scores compared with the nonquizzed condition (quizzes containing no exam material). Additionally, when post-only and review-only were directly compared with each other, review-only was significantly more effective. Pairing the prelesson quiz with either the postlesson or review quiz did not help, as post-only and review-only still yielded greater results. Furthermore, the paired postlesson and review quizzes were also found to be less effective than review-only. Surprisingly, even all three quizzes together (pre-post-review) were still less effective than review-only alone. This indicates that additional quiz placements do not yield greater exam scores (Agarwal et al., 2011).

Experiment 2b’s data were nearly identical to 2a’s. Even a single quiz containing exam material elevated test scores compared with having all material nonquizzed. However, none of the combined conditions produced a greater effect on unit exam scores than the review-only quiz. Even on the delayed-retention exams administered in this experiment, the review-only condition outperformed all others (Agarwal et al., 2011).

Unfortunately, these results do not support the experimenters’ hypothesis. Adding numerous, strategically placed course-content quizzes does not increase overall exam scores beyond what a single review quiz achieves. However, the hypothesis is not entirely rejected. The sub-hypothesis, that implementing quizzes on relevant material will increase cognitive processing and retention and thereby raise exam scores, is supported. Even the addition of a single such quiz yielded far higher unit exam scores and long-term retention than having none.

Critical Review

The results indicated that the review-only quizzes containing exam material most strongly assisted students in overall retention and cognitive processing. However, even the addition of any one of the three quiz types greatly improved students’ scores compared with nonquizzed material. This implies, for studying and learning in a school setting, that the more exposure one has to material, the greater the chance of recalling it. This is important, especially to teachers and students, as their lives revolve around successfully demonstrating that information has been processed. It is relevant not only in a school setting but in any workplace: the more exposure a person has to information relevant to their job, the clearer their thinking and judgment become when dealing with issues that arise. These results highlight the importance of understanding material from all aspects rather than just a particular set of questions.

However, after reading the study, some concerns come to light. The most notable, in my opinion, is the fact that the review-only “cram” exposure yielded the strongest unit exam results, more so than all three quizzes together. A potential confounding factor should be considered here: the review quizzes actually counted toward the students’ grades, whereas all other quizzes (besides the teacher’s unit exam) did not. Therefore, students more than likely studied ahead of these quizzes, possibly as much as a week prior. Furthermore, the review quiz was administered 24 hours before the unit exam, so students had most likely been studying not only for the review quiz but for the exam as well. Yet unit exam scores were lower when students were given a review quiz that did not contain exam material. I believe students considered the review quiz to be just that, a review. Whatever material was on the review is more than likely what the students primarily focused on. This would explain the higher results for students whose review quizzes contained actual exam material.

The researchers could also have elaborated on their methods slightly more. Several suggestions come to mind, such as identifying which courses were given which condition in Experiment 2a; that is, whether pre-only, post-only, review-only, and so on were applied to genetics, evolution, or anatomy (1, 2, & 3). It is unclear which section received what. This information should be factored into the results so that the mean unit exam scores can be accurately depicted for each course. In addition, a better explanation of the statistics reported could alleviate confusion for readers. Most people reading this type of journal already understand the symbols and formulas provided, but those just entering the field of research may not, and would benefit from a fuller description.

Additionally, there is some confusion in a particular portion of Experiment 1 where the researchers talk about “astronomy 3,” but that course was not a part of the study; “anatomy 3” was. Was this a typo, or was the wrong class discussed altogether? The method itself was reasonable, as the researchers not only used different course material and different population samples but also implemented various placements of conditions. However, I think the exact same experiment should be reproduced across increasing education levels (high school freshman to college senior) in order to see whether their overall hypothesis for the testing effect is supported.

Application

The connections between this study and my own life are quite evident. Professors consistently give clicker questions prior to the start of the day’s lecture or at its conclusion. Additionally, many recitations (especially STEM-based ones) implement quizzes at either the beginning or the end of class, based on the prior week’s material or the material covered in recitation, respectively. As a pre-med college student, I’m constantly locked in an intellectual struggle. Which class requires more attention? Which homework should I complete first? Which hours can I sacrifice for one study session over another? This study helped illuminate a better way to retain information and progress with my education. The best way to replicate the results presented would be to create review quizzes (or at least flash cards). However, instead of making just a single cramming tool 24 hours beforehand, I would make one for each week’s material and then combine them for the review 24 hours before the test. Even though “review-only” is technically the most “statistically significant,” “post-review” was almost identical in both short- and long-term retention, indicating that it is not inherently harmful to implement such a technique. Furthermore, the amount of information in college is far more vast than in middle school, so reinforcing the information weekly and then doing a grand 24-hour recall would most likely be optimal. I do not foresee much struggle with this application, as I already do a form of it. Each weekend I review all my notes and then attempt to explain to myself how it all works and pieces together. However, I think adding a quiz or flash cards would yield greater retention.

References

Agarwal, P; Huelser, B; McDaniel, M; McDermott, K; Roediger III. Test-Enhanced Learning in a Middle School Science Classroom: The Effects of Quiz Frequency and Placement. Journal of Educational Psychology. (97, 399-414) 2011. DOI: 10.1037/a0021
