Pucel, D.J., &
DEVELOPING COMPUTER
SIMULATION PERFORMANCE TESTS: CHALLENGES AND CRITERIA
David J. Pucel, Ph.D.
Professor
Human Resources Development and Business and Industry Education
Lynn D. Anderson, Ph.D.
Executive Director
Joint Commission on Allied Health Personnel in Ophthalmology (JCAHPO)
The use of computer
simulations to test a person’s technical psychomotor skills in health and other
technician roles is in its infancy. This paper presents insights into how
computer simulations can be developed to test a person’s ability to perform
psychomotor tasks. The insights are the result of a two-year project that produced
a set of international certification tests for ophthalmic technicians within
the
Performance, Testing, Simulations, Certification, Ophthalmic
Testing to verify a person’s ability to perform a skill is fundamental
to many occupations. It provides assurance that a person is capable of
performing on the job. Such testing is often very expensive and time-consuming.
It often requires people to travel to testing sites far from their homes, and it
requires examiners who are qualified in the skills to be tested. The authors faced the
challenge of making the process of such testing more efficient, valid and
reliable for those wishing to be certified as ophthalmic technicians. The
solution they arrived at was to develop computer simulations to test a person’s
skills.
The
authors quickly found that the use of computer simulations to test a person’s technical
psychomotor skills in health and other technician roles is in its infancy.
Although an extensive literature search was undertaken, no literature was found
that focused directly on how to develop such simulations. The question became,
“How can a computer simulation be developed so it can be used as the platform
to test a person’s ability to perform a psychomotor task, such as determining a
patient’s eye correction using a retinoscope?”
This
paper presents major considerations in developing computer simulations for
certifying the ability to perform psychomotor skills, based on a two-year
developmental project that has yielded tests that will be in place
throughout the
2. Models Underlying the Developmental Effort
Two
basic models were adopted as frameworks to begin the design and development of
the computer simulation tests. They were implemented within the context of the Standards
for Educational and Psychological Testing [1]. The first was the performance testing model
presented by Pucel [2]. That model outlines the procedures for the development
of tests to be used to assess skill mastery. That model calls for:
· clearly defining the skill to be evaluated.
· outlining the process of performing the skill.
· defining criteria for judging each step in the process.
· developing criteria for judging the final outcome of performing the skill
(e.g., product, accuracy of decisions).
· establishing scoring procedures that reflect the importance of each step and
an acceptable outcome of the performance.
The second model was that presented by
Alessi and Trollip for the development of instructional simulations [3]. Their
model suggests a set of standards for the development of multi-media materials
aimed at providing instruction. Those standards are presented in three
categories: planning, design, and development.
3. Challenges
At
first, one might think the development of computer simulation tests is only a
matter of generalizing typical instructional simulation development principles.
However, after spending two years developing a series of simulation performance
tests in ophthalmic technology, it is apparent that the challenges are much
more complex. Instruction is aimed at teaching a person knowledge or how to do
something. In teaching a performance skill, the focus is on how to do it
correctly. Testing is aimed at determining if a person has mastered the
content.
In
the context of performance testing, one challenge is to present the skill
through accurate simulation, but an additional challenge is presenting
alternatives to the correct procedure that will allow a person to demonstrate
they do not know how to perform the skill correctly. These alternatives can be
created in a variety of ways. For example, one dimension for doing something
incorrectly is not performing a particular performance step correctly. Another
is performing the steps out of order. Another is arriving at the wrong answer
even if the correct process is used. The process is further complicated by the
fact that when people enter computer simulations they tend to want to
experiment with the simulation to see what it does. In doing so they may or may
not be intending to demonstrate their skill. This raises two questions: when
should scoring begin, and how does one determine whether a person is intentionally
trying to perform the skill or merely exploring the workings of the simulation?
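The three dimensions of incorrect performance described above can be made concrete with a small sketch. The Python below is illustrative only: the function name `classify_attempt`, the step names, and the log format are assumptions introduced here, not part of the project's actual implementation.

```python
def classify_attempt(expected_steps, performed, outcome_correct):
    """Return the set of error dimensions present in one attempt.

    expected_steps:  step names in the correct order.
    performed:       list of (step_name, done_correctly) pairs, in the
                     order the candidate performed them.
    outcome_correct: True if the final recorded answer was acceptable.
    """
    errors = set()

    # Dimension 1: a particular step was not performed correctly.
    if any(not done_correctly for _, done_correctly in performed):
        errors.add("step_incorrect")

    # Dimension 2: the steps were performed out of order.
    positions = [expected_steps.index(name)
                 for name, _ in performed if name in expected_steps]
    if positions != sorted(positions):
        errors.add("out_of_order")

    # Dimension 3: the wrong answer, even if the correct process was used.
    if not outcome_correct:
        errors.add("wrong_outcome")

    return errors
```

An attempt can fall into more than one dimension at once, which is why the sketch returns a set rather than a single label.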
4. Rationale for Testing Performance Skills with Computer Simulations
At first it would appear that it would not
be rational to try to assess psychomotor performance skills using computer
simulations. Psychomotor skills require the ability to actually manipulate real
devices that require the use of tactile skills that can only be learned from
working with and handling the real devices. However, adequate performance of
psychomotor skills not only requires the ability to manipulate actual devices,
but cognitive decision-making regarding the process of manipulating the devices
and the ability to arrive at the desired outcome.
Therefore, if a person has had a
significant amount of experience working with the actual devices, certification
testing can be based on whether the person can manipulate the devices
correctly. In addition, if the outcome desired is a decision or a result that
can be recorded as contrasted with the production of a physical product such as
welding a pipe or building a wall, computer simulations can allow for judging
the adequacy of the outcome of the performance. Therefore, the authors
determined that psychomotor testing with computer simulations is reasonable if
the computer simulation is designed to allow a person to demonstrate the
ability to manipulate devices, and to produce the desired outcome in a
recordable fashion. The authors also suggest that it would not be
appropriate to use computer simulations
if the goal of testing is to assess a person’s ability to physically manipulate
the real devices to build their psychomotor skills, or to produce physical
products.
In the case of this project to develop the
ophthalmic skills tests, verification of the ability to manipulate the actual
devices was obtained by requiring candidates either to have successfully
completed an accredited training program that included the skills, or to have
work experience with the devices verified by a supervising ophthalmologist.
5. Design Issues to be Addressed
Following the performance testing model,
major design issues to be addressed during the development of the computer
simulation tests were:
1. The need to realistically present each skill. What people
saw on the screen needed to be an accurate representation of what they would
see in real life.
2. Navigation through the simulations needed to be simple
enough so testing was not seriously affected by a person’s ability to operate
the computer.
3. Besides allowing a person to complete a skill correctly,
alternative ways of completing the skill incorrectly needed to be built into
the simulations.
4. The simulations required built-in scoring algorithms that
reflected the ways in which people’s performance would be judged on the job.
5. Adjustments were needed to accommodate differences between
the way people approach computer simulations and real-life performance tests.
6. All portions of the simulations needed to be validated as
truly and accurately representing each skill and allowing candidates to
demonstrate their true ability to perform the skill.
7. A tutorial was required that allowed people to be trained
in how to actually use the computer during the simulations to move objects, to
provide directions, and to record responses.
5.1 The simulations needed
to realistically present each skill. What people saw on the screen needed to be
an accurate representation of what they would see in real life.
In order to ensure the simulations were
realistic presentations of the skills, actual movies and pictures were taken of
the skills. They were then incorporated into a computer simulation using Flash.
The simulations were first developed to show the correct method of performing
the skills. They were later modified to allow candidates to demonstrate
alternative incorrect as well as correct processes. Figure 1 presents a
screen capture of a simulation of the ophthalmic skill “refinement”.
[Figure 1: Screen capture of the “refinement” simulation]
Validation of the realism of the
simulations was first accomplished by having subject matter experts, who were
ophthalmologists and technicians, repeatedly review and suggest modifications
to ensure what was presented on the screen represented the real world. It was further validated
through a series of pilot tests that will be described later.
5.2 Navigation through the
simulations needed to be simple enough so testing was not seriously affected by
a person’s ability to operate the computer.
Developing navigation through the
simulations became a major issue. Navigation needed to not only accommodate the
ability to move through the simulation correctly, but to move through the
simulation in a manner similar to the way the skill would be performed on the
job. Given that these were performance tests, the navigation system also needed
to allow people to make mistakes and navigate incorrectly.
The navigation system was developed by
first storyboarding each skill into logical portions of what one would see when
performing each major segment of the skill. Therefore, each skill was broken
down into logical portions based on changes in what a person would need to see
and attend to when in each segment of the skill. The segments of the skill
needed to be discrete so it would be possible to proceed through the segments
in the correct as well as the incorrect order. In other words, if a skill
required a person to focus an eyepiece before moving on to positioning a
device, the eyepiece would be seen in one segment and positioning the device in
another. This allowed candidates to select either focusing the eyepiece or
positioning the device as the next major portion of the skill. Once the segments were
identified, an introductory menu divided into the logical portions or segments
of the skill was developed.
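One way to picture such a menu is as a list of discrete segments that can be entered in any order, with the chosen order recorded for later scoring. The following is a minimal sketch under that assumption; the class name `SegmentMenu`, its methods, and the segment names are hypothetical and do not describe the project's Flash implementation.

```python
class SegmentMenu:
    """Menu of discrete skill segments that may be entered in any order."""

    def __init__(self, segments):
        self.segments = list(segments)  # storyboarded segments, correct order
        self.visited = []               # order actually chosen by the candidate

    def select(self, name):
        """Record that the candidate entered the named segment."""
        if name not in self.segments:
            raise ValueError(f"unknown segment: {name}")
        self.visited.append(name)

    def first_visits(self):
        """Segments in the order in which each was first entered."""
        seen = []
        for name in self.visited:
            if name not in seen:
                seen.append(name)
        return seen

    def followed_correct_order(self):
        """True if the first visits so far match the start of the correct order."""
        firsts = self.first_visits()
        return firsts == self.segments[:len(firsts)]
```

Because the menu accepts any segment at any time, a candidate who positions the device before focusing the eyepiece is not blocked; the incorrect order is simply recorded.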
5.3 Besides allowing a
person to complete a skill correctly, alternative ways of completing the skill
incorrectly needed to be built into the simulations.
After getting into a segment, the person
needed to be able to perform the processes associated with that segment
correctly or incorrectly. This required allowing people to activate correct and
incorrect controls on the devices. This was eventually accomplished by placing
arrows on each device control that would allow a person to move a control in
different directions. In Figure 1 above, these arrows are presented on the face
of the controls. The actual device does not have arrows on it. A candidate was
instructed to place the cursor on the appropriate arrow and to click the
left mouse button to move the device. Although this process now
seems obvious, it took a number of pilot studies to perfect.
5.4 The simulations
required built-in scoring algorithms that reflected the ways in which people’s
performance would be judged on the job.
Since the simulations were going to be
used for certification testing, scoring algorithms were needed that would allow
a person to be judged on the extent to which they could perform the skill to
on-the-job standards. This first required the development of scoring rubrics in
the form of checklists. The checklists needed to clearly indicate the correct
procedure for completing each skill, the criteria for judging each portion of
the procedure, and point systems that would reflect the relative importance of
completing each portion of the procedure correctly. Table 1
presents a portion of a sample checklist for the ophthalmic skill “keratometry”.
Table 1
Scoring Rubric Checklist for Keratometry
Givens: Keratometer, new patient with astigmatic error.
Required Performance: Measure the corneal curvature and record results.
Standard: 80 points on process and within tolerance range.
Process Steps | Criteria | Score
Focus the eyepiece. | Reticule clear | 3
Instruct patient | Patient instructed to keep forehead and chin in position | 3
Position the patient, Etc. | Patient’s forehead and chin in position (Computer version: automatically done) | 0
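A checklist of this kind maps naturally onto a small scoring routine. The sketch below uses only the three rows shown in Table 1; the function names, the parameters, and any numeric readings or tolerances passed to them are assumptions for illustration, since the paper does not give the actual tolerance range or the full checklist.

```python
# Rows from the portion of Table 1 shown above: (step, criterion, points).
KERATOMETRY_RUBRIC = [
    ("Focus the eyepiece", "Reticule clear", 3),
    ("Instruct patient",
     "Patient instructed to keep forehead and chin in position", 3),
    ("Position the patient",
     "Forehead and chin in position (automatic in computer version)", 0),
]


def process_score(rubric, steps_met):
    """Sum the points of every rubric step whose criterion was met."""
    return sum(points for step, _criterion, points in rubric
               if step in steps_met)


def meets_standard(score, passing_score, reading, target, tolerance):
    """The standard has two parts: enough process points AND a recorded
    result within the tolerance range (values here are hypothetical)."""
    return score >= passing_score and abs(reading - target) <= tolerance
```

Keeping the process score and the outcome check separate mirrors the wording of the standard, which requires both 80 points on process and a result within the tolerance range.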
The development of these scoring rubrics
was a long and difficult process requiring many iterations. Although each
ophthalmic expert assigned to work on the project was able to observe people
and determine if they were competent, they had different ways and words for
expressing what they would observe and how they would judge competence. The
additional complicating factor was that the criteria had to be assessable
through the computer simulation. Therefore, it was necessary for them to
repeatedly assemble and arrive at mutually agreed upon criteria to judge skill
mastery.
In some cases if a step was relatively
automatic given a set of instructions, the fact that a person selected the correct
instruction was automatically assumed to lead to the correct action in the
computer. In cases such as the “position the patient” example above, this
simplified the programming of the simulation.
5.5 Adjustments were needed to
accommodate differences between the way people approach computer simulations
and real-life performance tests.
At first it was assumed that when people
entered the simulation tests they would proceed directly to complete the tests
in the linear order as specified in the validated scoring checklists. Pilot
testing soon indicated this was not so. Even though an extensive tutorial in how
to mechanically operate the simulations was provided to candidates before
testing, most candidates wanted to try things out to familiarize themselves
with how to manipulate the devices with the computer as they went through the
tests. (A discussion of the tutorial is presented later.) This meant that they
would touch dials to try things out without intending to actually complete the
test. However, the scoring algorithms were developed in such a way that if they
touched the dials out of order they received score deductions. Also, candidates
at times wanted to go back and check on earlier results during the simulations.
Again the original assumption was that once they completed a step they would
move on without returning. When they did return, they were being scored as
doing things out of order.
These unanticipated candidate behaviors
required a further review of the scoring procedures. When a candidate entered a
specific portion of a simulation and a new picture segment, they were allowed
to touch things as long as they did not proceed along a logical sequence that
indicated that they were intending to actually complete the test. They were
scored after they completed a systematic portion of what was being tested in
that portion of the simulation. Also, if a person went back to an earlier step
to check on a previous reading, it was determined that this was reasonable in
many situations during the real-life performance of the skill. Therefore,
scoring was adjusted to ensure that as a person proceeded to the next step they
did all of the previously required steps, and if they did go back it did not
invalidate the procedure. However, if they moved ahead to steps without
completing the necessary prerequisite steps they did receive score deductions.
Again, in retrospect this seems obvious. However, it took a number of pilot
tests and revisions to arrive at these adjustments to typical performance
test scoring rubrics.
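The adjusted scoring rules can be sketched as a pass over a log of candidate actions. This is a simplified illustration, not the project's algorithm: it assumes the simulation can already label each event as an exploratory "touch" or a "complete" (a systematic portion finished), that each step requires all steps before it, and that each skipped prerequisite costs a fixed deduction. The function name `score_log`, the event format, and the point values are hypothetical.

```python
def score_log(steps, events, deduction=1):
    """Score an event log under the adjusted rules.

    steps:  list of (name, points) in the required order.
    events: list of (kind, name); kind is "touch" or "complete".
    """
    order = [name for name, _ in steps]
    points = dict(steps)
    completed = []     # steps finished so far
    penalized = set()  # skipped prerequisites already deducted
    score = 0

    for kind, name in events:
        if kind != "complete":
            continue  # exploratory touches are not scored
        if name in completed:
            continue  # going back to check earlier work is not penalized

        # Prerequisites that were skipped and not yet deducted.
        skipped = [s for s in order[:order.index(name)]
                   if s not in completed and s not in penalized]
        score -= deduction * len(skipped)
        penalized.update(skipped)

        score += points[name]
        completed.append(name)

    return score
```

Under these assumptions a candidate who touches a dial early, or returns to an earlier step to check a reading, scores the same as one who proceeds linearly, while a candidate who jumps ahead loses points for each skipped prerequisite.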
5.6 All portions of the
simulations needed to be validated as truly and accurately representing each
skill and allowing candidates to demonstrate their true ability to perform the
skill.
Throughout the development of the
simulation tests the process was under continual review by a simulation
development committee composed of ophthalmologists and incumbent technicians
who were already certified to perform the skills to be tested. All aspects of
the tests were reviewed and approved by the committee as the developmental
process was underway. This included: a) the extent to which the simulations
were accurate representations of the skills as they look and are performed
in real life, b) the ease with which a
person can complete the skills without the artificial nature of computerization
compromising the person’s ability to demonstrate their true skill, and c) the
validity of the scoring algorithms in judging the things that are important,
weighted by their correct relative importance.
In addition, three informal pilot tests
were conducted with actual job incumbents as the project continued. Candidates
gave feedback on the extent to which they felt they could demonstrate their
true skill, the realism of the simulations, and their ability to operate the
simulations. A formal pilot test was also conducted through which formal survey
data were gathered and analyzed. The results showed that the simulations did
allow people to demonstrate their true skill, they were easily operated, and
candidates felt the tests were equally valid or more valid than the real-life
tests used in the past.
5.7 A tutorial was
required that allowed people to be trained in how to actually use the computer
during the simulations to move objects, to provide directions, and to record
responses.
It
quickly became apparent that many of the people who would be tested had
relatively few computer skills. How to move things and record responses was not
intuitive to them. Therefore, an extensive tutorial was developed and assembled
on a CD. It provided candidates with an orientation to the purposes and format
of the overall computerized simulation evaluation of the seven ophthalmic
skills. It also provided detailed examples and opportunities for candidates to
make sample menu selections, move ophthalmic objects with the computer, and to
record responses. The tutorial was pilot tested along with all other aspects of
the simulations and data indicated that the final version was easy to use and
provided the needed information to effectively use the simulations.
6. Summary
This developmental project revealed that
developing computer simulation psychomotor performance tests raises many
considerations beyond those typically faced when developing instructional
simulations or real-life performance tests. However, it also has shown that
the development of such tests is feasible and that such tests are capable of
validly testing such skills.
7. Recommendations
Experience with the development of these
computerized simulation tests has provided insights that might be useful to
others.
1. During the design and development of the simulations there
was constant tension between how detailed the simulations needed to be as
evaluation tools and the fidelity of portraying all of the nuances of the
skill. This had many implications for the cost of the project as well as the
effectiveness of the simulations as evaluation tools. The more detail that was
included, the higher the cost. Also, at times detail beyond that needed to
evaluate a person’s skill actually obscured what was being assessed during a
particular segment of a simulation. The implication for designers and
developers is to decide explicitly, early on, how much fidelity is
required to evaluate the skill being addressed. Otherwise, there may be many
costly revisions that could be avoided.
2. Because of the diversity in the ways experts convert their
professional criteria for judging adequate skill performance into words, it is
important to obtain agreement regarding how the process of completing a skill
and the criteria for judging it will be stated. In real life, people may be
looking for the same things and be able to come to the same judgment about a
person’s competence. But when you ask them to write down what they look for and
how they judge if it is done correctly, they tend to express things in
different words. A project can face many false starts if these issues are not
resolved before beginning. If the differences are not addressed early, they
will surface as the project progresses.
3. The use of computer simulations to replace live performance
tests is not meaningful in all situations. It is important to make sure doing
so makes sense in terms of the particular situation. A rationale for using
them, similar to the one presented earlier, should be developed.
8. References
[1] American Educational Research Association, American Psychological Association & National Council on Measurement in Education. Standards for Educational and Psychological Testing.
[2] Pucel, D.J. Developing and Evaluating Performance-Based Instruction (second edition).
[3] Alessi, S.M. & Trollip, S.R. Multimedia for Learning: Methods and Development (third edition).