NEPSY-II Test Coverage and Use: Comprehensive Guide on Validity and Reliability

Test Review: NEPSY-II, Assessment of Neuropsychological Development

Test Coverage and Use

The NEPSY-II is an individually administered battery designed to assess the neuropsychological development of children and adolescents ages 3 years to 16 years 11 months (Korkman, Kirk, & Kemp, 2010). It was intended to measure both basic and complex aspects of cognition that underlie a child’s ability to learn and be productive in and outside of school settings. It enables clinicians to assess six functional domains: Attention and Executive Functioning, Language, Memory and Learning, Sensorimotor, Social Perception, and Visuospatial Processing. Subtests may be selected to provide a general overview of neuropsychological functioning, a diagnostic measure, a determination of cognitive abilities, or a comprehensive neuropsychological investigation (Korkman, Kirk, & Kemp, 2010). The results provide information relating to typical childhood disorders, which can support accurate diagnosis and intervention planning for success in school and at home. The NEPSY-II is a well-developed neuropsychological assessment tool that informs practitioners working with a wide range of child and adolescent populations.

Appropriate Samples for Test Validation and Norming

The NEPSY-II normative data were collected between 2005 and 2006. The standardization sample was a stratified random sample of 1,200 preschoolers, children, and adolescents between the ages of 3 and 16 (Brooks, Sherman, & Strauss, 2009). Stratification targets were based on 2003 U.S. Bureau of the Census data. The sample was stratified by age, race/ethnicity, geographic region, and parent education level (Brooks, Sherman, & Strauss, 2009). There were 100 children (50 boys, 50 girls) in each of 12 age groups spanning ages 3 to 16 years. For ages 3 to 12 years, each one-year group contained 50 children in the first six months and 50 children in the second six months of the year. For adolescents between 13 and 16 years, there were 50 participants for each year of age, which corresponds to 100 per two-year band and accounts for the remaining two of the 12 groups.

Each child in the normative sample was classified by race/ethnicity category. Children were also selected in proportion to the number of children living in each of the four regions of the U.S.: Northeast, Midwest, South, and West. The sample was further stratified by parent education level. Children were excluded from the normative sample if they had a neurological, learning, sensory/motor, or psychiatric disorder, a recent history of testing, or medication use that might affect performance (Brooks, Sherman, & Strauss, 2009).

Reliability

The reliability of a test refers to the accuracy, consistency, and stability of test scores across situations (Urbina, 2014). Reliability estimates are provided for all of the NEPSY-II primary and process scaled scores. Internal consistency reliability was calculated using split-half and alpha methods. Additionally, test-retest reliability and decision consistency reliability were used for the subtests for which the split-half or alpha method was not appropriate (Brooks, Sherman, & Strauss, 2009).

The split-half reliability coefficients were calculated using the Spearman-Brown formula. Stability coefficients (i.e., test-retest methodology) were used for subtests with autocorrelated items or subtests that allow specific latency periods for responding. Approximately 80% of the reliability coefficients reported for subtest scores were above .70 (Korkman, Kirk, & Kemp, 2010). Some subtests yielded low values within a specific age range. Additionally, internal reliability coefficients are presented in the manual for the mixed clinical diagnosis sample.
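As a rough illustration of the Spearman-Brown step mentioned above, the sketch below steps up a half-test correlation to an estimate for the full-length test. The function name `spearman_brown`, the `length_factor` parameter, and the example correlations are illustrative assumptions; they are not taken from the NEPSY-II manual, which this review does not quote in detail.

```python
def spearman_brown(r_half, length_factor=2.0):
    """Step up a correlation between two test halves to full-length reliability.

    r_half        : correlation between scores on the two halves (hypothetical input)
    length_factor : how many times longer the full test is than each part
                    (2.0 for an ordinary split-half design)
    """
    return length_factor * r_half / (1.0 + (length_factor - 1.0) * r_half)


# Hypothetical half-test correlations, chosen only to show the correction.
for r_half in (0.50, 0.60, 0.80):
    r_full = spearman_brown(r_half)
    print(f"half-test r = {r_half:.2f} -> estimated full-test reliability = {r_full:.2f}")
```

The correction always raises a positive half-test correlation, because each half is only half as long as the test whose reliability is being estimated.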

Overall, the evidence for internal reliability of the NEPSY-II is strong, and the reliability data indicate high internal consistency. The subtests with the highest reliability coefficients are Comprehension of Instructions, Design Copying, Fingertip Tapping, Imitating Hand Positions, List Memory, Memory for Names, Phonological Processing, Picture Puzzles, and Sentence Repetition (Korkman, Kirk, & Kemp, 2010).

Test-retest reliability refers to the stability of test performance over time. The test-retest sample consisted of a diverse group of 165 children (52% male, 48% female), divided into six age groups: 3 to 4 years, 5 to 6 years, 7 to 8 years, 9 to 10 years, 11 to 12 years, and 13 to 16 years (Davis & Matthews, 2010). The group took the NEPSY-II on two occasions separated by 12 to 51 days, with a mean retest interval of 21 days. Test-retest reliability was evaluated with stability coefficients derived using the Pearson product-moment correlation. The coefficients (r) ranged from .21 to .91 across all age groups (Davis & Matthews, 2010). The lowest reliability coefficients were those calculated with the test-retest method, likely as a result of practice effects and range restriction (Korkman, Kirk, & Kemp, 2010).
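A stability coefficient of the kind described above can be sketched as a Pearson product-moment correlation between first and second administrations. The scores below are invented for illustration and are not NEPSY-II data; the helper names `pearson_r` and `mean_gain` are likewise assumptions.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mean_gain(first, second):
    """Average change from the first to the second administration."""
    return sum(b - a for a, b in zip(first, second)) / len(first)


# Hypothetical scaled scores for six children tested twice.
time1 = [8, 10, 12, 7, 11, 9]
time2 = [9, 11, 12, 9, 12, 9]

print("stability coefficient r:", round(pearson_r(time1, time2), 2))
print("mean gain at retest:", round(mean_gain(time1, time2), 2))
```

Reporting the mean gain next to r is one simple way to see a practice effect: the correlation reflects whether children keep their rank order, while the gain reflects how much scores shift overall.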

A decision consistency methodology was used to demonstrate reliability for the scores for which test-retest coefficients were not appropriate (Davis & Matthews, 2010). Using the same six age groups, a cutoff score was used to create two categories: at or below the 10th percentile, and above the 10th percentile. Decision consistency was moderate to high on each subtest across all ages (Davis & Matthews, 2010). Overall, test-retest reliability correlations for many NEPSY-II subtests are sufficiently high.
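The decision consistency idea can be sketched as the proportion of children who fall on the same side of the 10th-percentile cutoff at both administrations. This is a minimal sketch under that assumption; the review does not spell out the manual's exact computation, and the percentile values and function names below are hypothetical.

```python
def classify(percentile, cutoff=10):
    """Place a percentile rank into one of the two categories used for the decision."""
    return "at_or_below_cutoff" if percentile <= cutoff else "above_cutoff"


def decision_consistency(first, second, cutoff=10):
    """Proportion of examinees classified the same way on both administrations."""
    pairs = list(zip(first, second))
    agreements = sum(classify(a, cutoff) == classify(b, cutoff) for a, b in pairs)
    return agreements / len(pairs)


# Hypothetical percentile ranks for six children at test and retest.
first_admin = [5, 25, 50, 9, 75, 12]
second_admin = [8, 30, 45, 15, 80, 10]

consistency = decision_consistency(first_admin, second_admin)
print(f"decision consistency: {consistency:.2f}")
```

In this made-up data, four of the six children receive the same classification both times; the two who cross the cutoff do so with only small changes in percentile rank, which is why scores near a cutoff are the main source of inconsistent decisions.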

Content Validity

Evidence regarding the validity of the NEPSY-II subtests is thoroughly presented in the manual, which discusses content, construct, and concurrent validity. The content validity of the NEPSY-II was established through a review of research literature, the test authors’ clinical and research experience, customer feedback, and a review of pilot studies (Korkman, Kirk, & Kemp, 2010). Based on this information, the development of the NEPSY-II subtests involved multiple revisions of test content, including modifications to close content gaps and address research concerns (Korkman, Kirk, & Kemp, 2010). Data were collected during the pilot and tryout phases to determine subtest and item composition. Finally, after the standardization phase, subtests were reevaluated for content, bias, and psychometric properties.

Construct Validity

The pattern of correlations between subtests provides information about the internal structure of the NEPSY-II and about the degree of relationship among subtests measuring similar content. It is important to note that subtests in the same domain measure very different aspects of neurological functioning. High correlations between subtests are therefore not expected, even when the subtests are classified within the same domain, because they assess different abilities. Evidence for construct validity was generally strong, but no factor analysis was reported to confirm the structure of the domains (Brooks, Sherman, & Strauss, 2009). The most relevant data for construct validity are the correlational data obtained from concurrent validity studies with other measures of functioning.

Criterion Validity

There is a large amount of criterion validity evidence in the form of correlations with several other tests. Concurrent validity is supported by a series of correlation studies with a variety of instruments designed to measure cognitive ability, academic achievement, behavior, and neuropsychological functioning (Davis & Matthews, 2010). Examining the relationship between test scores and external variables provides information about what the test measures and whether it relates to those variables appropriately.

A series of studies was conducted alongside the scale’s standardization to examine the relationship between NEPSY-II scores and other measures of cognitive, neuropsychological, academic, adaptive, and behavioral functioning. Concurrent validity with measures of intellectual functioning was assessed using the WISC-IV, DAS-II, and Wechsler Nonverbal Scale of Ability (Davis & Matthews, 2010). Correlations with these instruments show that the NEPSY-II is sufficiently predictive of cognitive performance on both verbal and nonverbal measures. The WIAT-II was used to assess validity in academic domains. Results of this analysis show moderate correlations across multiple subtests and a positive relationship between the skills assessed on the NEPSY-II and academic performance (Davis & Matthews, 2010).

The authors also conducted several “special group” studies to test criterion validity and to obtain information relevant to identifying a disability or diagnosis. The data indicate that NEPSY-II scores have good discriminative validity across a variety of disability conditions (Davis & Matthews, 2010). Overall, there is acceptable evidence of convergent and discriminant validity.

Test Administration

The NEPSY-II may be administered in its entirety to obtain a comprehensive overview of functioning across all domains. Additionally, the NEPSY-II has the unique feature of allowing the evaluator to design a custom battery by choosing different combinations of individual subtests (Davis & Matthews, 2010). The manual describes multiple specialized batteries that may be administered to answer specific questions without having to administer the entire test battery. Because of the large-scale nature of the NEPSY-II, test users should be prepared to spend time learning and practicing the instrument.

The NEPSY-II is well organized, and the materials are well designed and user friendly, which helps the administration run efficiently. Administration time depends on how many subtests are actually administered; the entire battery can take up to 3.5 hours for older children (Davis & Matthews, 2010). The NEPSY-II kit includes: Administration Manual, Clinical Manual, Stimulus Book 1 and 2, Record Form for ages 3-4 and 5-16, Response Booklet for ages 3-4 and 5-16, Animal Sorting Cards (8), Memory for Designs Cards (22), Memory for Names Cards (8), Memory Grid, Red Blocks (12), Pencil, and Scoring Template for drawing subtests. There is a Training CD to assist the examiner prior to administration. The examiner will need a readily available stopwatch, a pencil, and a device to play the audio CD when indicated during administration.

Prior to test administration, the examiner determines which subtests to administer and organizes the materials for easy manipulation during testing. The testing should take place in a quiet, well-lit room to prevent distractions. The examiner should sit at a 90° angle to the examinee for clear visibility of the examinee’s answers on subtests that require pointing in the stimulus book. The desk should be at a comfortable height for the examinee, and the materials should be appropriately placed in front of the examinee. Adherence to the standard procedure includes using a natural conversational tone, establishing and maintaining rapport, and continuously reinforcing the child’s efforts throughout administration.

During administration, the Stimulus Book stands on the easel, and the pages are turned toward the examinee. The front page of the Stimulus Book should face the examiner so that the examinee does not see the administration instructions. The Record Form should be placed on a clipboard so that the examinee cannot see the answers provided. The directions for each subtest are designed to elicit optimal performance and must be followed thoroughly. Deviating from standard procedure, including changing the physical environment, repeating examples more than recommended, or not presenting teaching concepts, can reduce the validity of the results (Korkman, Kirk, & Kemp, 2010).

Test Reporting

The NEPSY-II appears to be an excellent instrument that provides relevant and useful information regarding the neuropsychological functioning of children and adolescents. It yields four major types of scores: primary scores, process scores, contrast scores, and behavioral observations (Brooks, Sherman, & Strauss, 2009). The scores are expressed as scaled scores, percentile ranks, percent performance, and cumulative percentages. Depending on the child and the referral questions, specific subtests can be administered to enhance clinical utility and minimize administration time. The variety of test scores allows each subtest to be broken down into separate scores for more specific interpretation. The NEPSY-II provides a subtest profile that allows for the identification of specific skill deficits by evaluating various aspects of performance (Korkman, Kirk, & Kemp, 2010). The inclusion of qualitative analysis provides a valuable perspective for interpreting and applying information gained from the test battery scores (Davis & Matthews, 2010). The Clinical and Interpretation Manual provides detailed information on interpreting the scores. Additionally, Scoring Assistant and Assessment Planner software is available to assist with scoring. The results of the NEPSY-II, when combined with quantified behavioral observations made during the assessment and observations from home and school, can help clarify the nature of a child’s problems and provide a basis for developing appropriate intervention recommendations (Brooks, Sherman, & Strauss, 2009).

Test and Item Bias

The NEPSY-II Affect Recognition subtest requires examinees to identify emotions shown on a child’s face in a photo. The items in the Affect Recognition subtest include faces of children of Caucasian, African-American, South-East Asian, and Romanian descent. When the test was adapted for use in Romania, stimuli featuring faces of children of African and South-East Asian descent showed significant item bias for children ages 3-13 years. The authors noted that this bias is most likely due to a lack of familiarity, because Romanian children have very little contact with children of non-Romanian descent (Yao, Bull, Khng, & Rahim, 2017). As a result, these stimuli had to be replaced with stimuli featuring Caucasian faces showing similar emotions.

Additionally, the Auditory Attention subtest can introduce bias when words have to be translated into a different language. When translating, the examiner needs to substitute words in the target language that are adequate phonetic distractors for the stimulus (Schmitt & Woodrich, 2004). The standardization sample was stratified by age, race/ethnicity, and geographic region to identify and eliminate potential biases (Davis & Matthews, 2010). When the NEPSY-II was released, it showed notable improvement in reducing the test and item bias noted in the first edition.
