The Kirkpatrick Model for Summative Evaluation of Training and Instruction
Identifying the effects of training has been an ongoing challenge for designers as well as training managers. How do we know the training was successful? How successful was it? Was there a quantifiable improvement in performance, or not? How is that measured? In 1975, Donald Kirkpatrick presented a four-level model of evaluation that has become a classic in the training industry. These four levels provide a structure for measuring the effectiveness of instruction in terms of the complexity of the training, the transfer of knowledge over time, and the final impact on the trainees' organization.

These levels can be applied to technology-based training as well as to more traditional forms of delivery. Modified labels and descriptions of these steps of summative evaluation follow.

Level One: Students' Reaction
In this first level or step, students are asked to evaluate the training after completing the program. These evaluations are sometimes called smile sheets or happy sheets because, in their simplest form, they measure how well students liked the training. However, this type of evaluation can reveal valuable data if the questions asked are more probing. For example, a survey similar to the one used in the formative evaluation could also be used with the full student population. This questionnaire moves beyond how well the students liked the training to questions about:

With technology-based training, the survey can be delivered and completed online, and the results can then be printed or e-mailed to a training manager. Because this type of evaluation is so easy and inexpensive to administer, it is conducted in most organizations.
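With an online survey, tallying the responses can also be automated. The following is a minimal sketch in Python, assuming the responses are exported to a CSV file with a student_id column plus one column per survey item rated on a 1-to-5 scale; the file name, column names, and scale are illustrative assumptions, not part of the Kirkpatrick model.

# A minimal sketch of summarizing Level One survey data collected online.
# The file layout, column names, and 1-5 rating scale are assumptions for
# illustration only.
import csv
from collections import defaultdict
from statistics import mean

def summarize_reactions(csv_path):
    """Average each survey item's 1-5 rating across all respondents."""
    ratings = defaultdict(list)
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            for item, value in row.items():
                if item != "student_id":   # skip the identifier column
                    ratings[item].append(int(value))
    return {item: round(mean(values), 2) for item, values in ratings.items()}

if __name__ == "__main__":
    # Prints a mean rating per survey item, e.g. {'relevance_to_job': 4.2, ...}
    print(summarize_reactions("level_one_responses.csv"))

The resulting summary can then be dropped into the report that is printed or e-mailed to the training manager.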

Level Two: Learning Results
Level Two in the Kirkpatrick model measures learning results. In other words, did the students actually learn the knowledge, skills, and attitudes the program was supposed to teach? To show achievement, have students complete a pre-test and a post-test, making sure that the test items or questions are written directly to the learning objectives. By summarizing the scores of all students, trainers can accurately gauge the impact of the training intervention. This type of evaluation is not conducted as widely as Level One, but it is still very common.
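As a concrete illustration of that summary step, the sketch below computes the mean pre-test score, mean post-test score, and mean gain for a class. The dictionary layout and the 0-to-100 scale are assumptions made for illustration, not a prescribed format.

# A minimal sketch, assuming each student has a pre-test and post-test score
# on the same 0-100 scale.
from statistics import mean

def learning_gain(scores):
    """Return the average pre-test score, post-test score, and gain."""
    pre = [s["pre"] for s in scores]
    post = [s["post"] for s in scores]
    return {
        "mean_pre": round(mean(pre), 1),
        "mean_post": round(mean(post), 1),
        "mean_gain": round(mean(b - a for a, b in zip(pre, post)), 1),
    }

if __name__ == "__main__":
    class_scores = [
        {"pre": 55, "post": 82},
        {"pre": 63, "post": 78},
        {"pre": 48, "post": 90},
    ]
    print(learning_gain(class_scores))
    # {'mean_pre': 55.3, 'mean_post': 83.3, 'mean_gain': 28.0}

If the test items are tightly mapped to the learning objectives, the mean gain gives a defensible measure of how much learning the program produced.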

Level Three: Behavior in the Workplace
Students typically score well on post-tests, but the real question is whether any of the new knowledge and skills are retained and transferred back to the job. Level Three evaluations attempt to answer whether students' behaviors actually change as a result of new learning. Ideally, this measurement is conducted three to six months after the training program; allowing some time to pass gives students the opportunity to apply their new skills and allows retention to be checked. Observation surveys, sometimes called behavioral scorecards, are used. These surveys can be completed by the student, the student's supervisor, individuals who report directly to the student, and even the student's customers. For example, survey questions evaluating a sales training program might include:


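Whatever the specific questions, the resulting ratings can be rolled up by rater group so that self-ratings can be compared against those of supervisors, direct reports, and customers. The sketch below assumes one 1-to-5 rating per observation; the rater labels and data layout are illustrative assumptions only.

# A minimal sketch of rolling up behavioral scorecard ratings by rater group.
from collections import defaultdict
from statistics import mean

def scorecard_summary(observations):
    """Average the observed-behavior ratings for each rater group."""
    by_group = defaultdict(list)
    for obs in observations:
        by_group[obs["rater"]].append(obs["rating"])
    return {group: round(mean(vals), 2) for group, vals in by_group.items()}

if __name__ == "__main__":
    sample = [
        {"rater": "self", "rating": 5},
        {"rater": "supervisor", "rating": 4},
        {"rater": "direct_report", "rating": 3},
        {"rater": "customer", "rating": 4},
        {"rater": "supervisor", "rating": 5},
    ]
    # Prints the mean rating observed by each rater group.
    print(scorecard_summary(sample))

Large gaps between a student's self-rating and the ratings of others are often the most useful signal that new behaviors have not yet transferred to the job.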
Level Four: Business Results
The fourth level in this model evaluates the business impact of the training program. The only scientific way to isolate training as a variable would be to identify a representative control group within the larger student population, roll out the training program to the rest, complete the evaluation, and compare the results against a business evaluation of the non-trained group. Unfortunately, this is rarely done because of the difficulty of gathering the business data and the complexity of isolating the training intervention as a unique variable. However, even anecdotal data is worth capturing. Below are sample training programs and the type of business impact data that can be measured; a simple sketch of the control-group comparison follows the examples.

Sales training. Measure changes in sales volume, customer retention, length of the sales cycle, and profitability per sale after the training program has been implemented.

Technical training. Measure reduction in calls to the help desk; reduced time to complete reports, forms, or tasks; or improved use of software or systems.

Quality training. Measure a reduction in number of defects.

Safety training. Measure reduction in number or severity of accidents.

Management training. Measure increases in the engagement levels of direct reports.
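To make the control-group comparison described above concrete, here is a minimal sketch that compares the mean of one business metric (monthly sales volume, for example) for a trained group against a non-trained control group. The figures, function name, and percentage-difference summary are illustrative assumptions, not a validated impact analysis.

# A minimal sketch of the Level Four control-group comparison.
from statistics import mean

def business_impact(trained, control):
    """Compare the mean metric of the trained group against the control group."""
    trained_mean, control_mean = mean(trained), mean(control)
    pct_difference = (trained_mean - control_mean) / control_mean * 100
    return {
        "trained_mean": round(trained_mean, 1),
        "control_mean": round(control_mean, 1),
        "pct_difference": round(pct_difference, 1),
    }

if __name__ == "__main__":
    trained_sales = [112, 98, 130, 121]   # monthly sales after the training
    control_sales = [95, 102, 99, 108]    # comparable group, no training
    print(business_impact(trained_sales, control_sales))

A difference between the two groups is only suggestive, of course; it becomes persuasive when the control group is genuinely representative and other variables are held as constant as possible.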