CTSE 540
Session 2
Introduction to Courseware Evaluation Models;
Types/Focus of Instructional Evaluations;
Sources of Instructional Materials; Evaluation Principles; Systemic Change
Research Involving Computers and Instruction
I INTRODUCTION TO COURSEWARE EVALUATION
Purpose: Everyone evaluates everything in education. Teachers evaluate student progress by using 1) normative evaluation, which compares one student with the others and is the common way of ranking students for grading purposes, and 2) criterion evaluation, which compares each student's progress against the course objectives or instructional standards. Principals evaluate staff performance. Supervisory district personnel and parents evaluate the performance of each school and its instructional programs. Instructional developers evaluate instructional materials and methods to determine which ones are best.
The evaluation of both individual materials and complete instructional programs is a key to the success of any instructional activity. The assessment that occurs during the development or formation of the program is called formative evaluation. This assessment is usually done with individuals or small groups of students taken from the intended user population. It ensures that the vocabulary, pacing, examples, reinforcement, and other variables are appropriate. Following development of an entire program, summative evaluation is performed with groups to ensure that the entire program works -- that the objectives are met. As a result of evaluation, several revision cycles are likely to occur. These cycles will continue until the behaviors, criterion levels and degree statements in the objective(s) are attained. These evaluation processes will be discussed in more detail later in this module.
Because most instructional technology assessments are objective based rather than norm based, criterion-referenced measures are most common in training organizations if not in educational institutions. Process and product evaluation are both common in I.T. activities. Not only do we want to know if our programs and materials work [product evaluation], but we also want to be able to generalize from these specific activities [process evaluation] into the future. Process evaluation is thus important to the development of a solid instructional technology. Without process evaluation we would quickly revert to an audio-visual or resource-provider level of problem solving. To learn from specifics and develop theories or models which will allow us to generalize into the future is one of the more complex, challenging and rewarding sides of the I.T. profession.
Evaluation is a key skill that a professional in this field must master. There are several evaluation models that are useful at varying times, depending on the problem and its parameters.
"Every sentence I utter must be understood not as an affirmation, but as a question."
-- Niels Bohr
Evaluation is a form of quality control. It looks at questions like: "Was the right problem resolved?" "Were the objectives taught to the degree specified? Under the designed conditions?" "Do the same results occur repeatedly?" A well-known definition of evaluation originated by Ralph Tyler perceives evaluation as "The process of determining to what extent the educational objectives are actually being realized." Evaluation, then, involves the measurement of how effectively the learners are meeting the objectives as a result of instruction.
When a child fails to learn, it is not necessarily the fault of the teacher. On the other hand, if the learner didn't learn, we take the position that the teacher or instructional system didn't teach. Just as giving implies receiving, teaching implies learning. Evaluation, however, must be performed to determine if learning did occur, and, if it did not occur, why not. The learner, the teacher and the instructional design or methodology must all be looked at and evaluated as a whole.
Formative evaluation looks at the process of learning and teaching during the time the instructional design is being developed and materials produced. The primary purpose of formative evaluation is to improve the process or instructional methods and their resultant products. The concept evolved in the early 1960s but was called "developmental testing," "product tryout," and "learner verification." When A-V materials are being developed, formative evaluation is usually accomplished using storyboards before the final materials are developed. The Sesame Street producers were among the first to develop rough drafts, using storyboards, to try out the visual segment with children. Commercial advertisers use storyboard tryouts with "target populations" to determine the reactions of these potential purchasers. Then they revise the presentations until they are satisfactory. Evaluation specialists Lumsdaine and Scriven clarified these ideas by defining and distinguishing between formative and summative evaluation. Teachers and instructional materials producers can inexpensively and efficiently improve their presentations by the use of formative evaluation procedures.
Summative evaluation is performed near the conclusion of the teaching/learning process to draw inferences or conclusions about the effectiveness of the instruction. Some say formative evaluation is to improve, while summative evaluation is to prove.
Both types of evaluation examine the learner, the teacher and the instructional design, and while both attempt to determine what the student actually learned, their focus is different. The data sources are also unique for each form of evaluation. Formative evaluation might ask about the learners' study habits, class attendance, preparation for learning, and participation in class activities. In contrast, summative evaluations might ask: "How much did they learn concerning each objective?" Formative evaluation of the teacher addresses his or her implementation of educational methods, personality, use of media, and the completeness of the preparation for the class. The intent of this evaluation is providing feedback for improving classroom performance. Summative evaluation will focus on how well the students learned what the teacher taught. The instructional design is formatively examined for the appropriateness of the educational method, level of objectives, grading system, use of feedback to the student, media selection, etc. Summative questions regarding the instructional design address its costs, logistics requirements, ease of maintenance, and level of acceptance by parents, administrators, teachers, and students.
Formative and summative evaluation are really two sides of the same coin. During the early stages of a development project, most evaluation questions revolve around making improvements to the system. As the project progresses, emphasis shifts toward documenting its effects. In some cases, the same data may serve both formative and summative goals (e.g., item analysis of a post-test). In others, unique procedures must be employed, particularly to obtain data for making revisions. It is one thing to know revisions are needed -- it is another to know what specific revisions to make.
The evaluation design should be planned at the same time as the instructional design. The evaluation design, if possible, should be done by a different person or agency than the one responsible for development of the materials. This helps ensure that the evaluation is objective and believable. The evaluation design should include:
The analysis of instructional strategies and materials is an ongoing activity. Analysis must continue from the time of development until it has been determined that the entire instructional system is workable. This may require several evaluation and revision cycles. The analysis or evaluation occurs in the same time frame as the revision activities.
Evaluation involves the collection and use of information to make decisions about an instructional program. Evaluation is performed to:
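The post-test item analysis mentioned above can be illustrated with a minimal sketch. It is not a procedure prescribed by this module: the function name, the sample data, and the choice to split learners into upper and lower halves by total score are all assumptions made for illustration (many measurement texts use other splits). The difficulty index is the proportion of learners answering an item correctly; the discrimination index is the difference between the upper and lower groups.

```python
def item_analysis(responses):
    """Return (difficulty, discrimination) for each test item.

    responses: one list per learner, holding 1 (correct) or 0 (incorrect)
    for every item on the post-test. Hypothetical helper for illustration.
    """
    n_items = len(responses[0])
    # Rank learners by total score, highest first.
    ranked = sorted(responses, key=sum, reverse=True)
    half = len(ranked) // 2  # assumption: compare top half with bottom half
    upper, lower = ranked[:half], ranked[-half:]
    results = []
    for i in range(n_items):
        # Difficulty: share of all learners who answered item i correctly.
        difficulty = sum(r[i] for r in responses) / len(responses)
        # Discrimination: upper-group minus lower-group proportion correct.
        discrimination = (sum(r[i] for r in upper) - sum(r[i] for r in lower)) / half
        results.append((difficulty, discrimination))
    return results


# Made-up post-test results: four learners, five items.
post_test = [
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0],
    [1, 0, 0, 0, 1],
    [0, 0, 0, 0, 1],
]

for number, (p, d) in enumerate(item_analysis(post_test), start=1):
    flag = "  <- review this item" if d <= 0 else ""
    print(f"Item {number}: difficulty={p:.2f} discrimination={d:+.2f}{flag}")
```

Used summatively, the difficulty values document how much of each objective was learned; used formatively, an item that the weaker learners answer correctly more often than the stronger ones points to a test item or a lesson segment that needs revision.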
There are two general yardsticks against which to measure student performance. Norm-referenced measures compare one group's or individual's performance to the results of a previously tested group (for individuals, this may be their own group). Norm-referenced measurements can rank learners according to achievement and are thus often used to assign grades, IQ scores, etc. These measures tend to result in competition among individual students. Criterion-referenced measures, on the other hand, compare a student's performance with objective criteria. These measures test individual mastery to a criterion as stated in an objective, and tend to foster competition within the individual rather than between individuals. Criterion-referenced testing is generally preferred by instructional technologists since individual performance is compared to the behavior contained in the objectives. Under these conditions it is possible for all students to perform at an acceptable level. In contrast, when norm-referenced testing is conducted, learners are often compared to each other and performance plotted onto a curve. This practice guarantees some learners will be evaluated as successful while others are considered unsuccessful. In reality all might have done very poorly compared to what was stated in the objectives, but some will be declared competent. Of course, the reverse is also true: if all learners have learned almost all of the behaviors, minor differences might become major determinants of who "passes" or "fails." Also, since what constitutes a "pass" can float up or down, revision of the instruction to improve it has less meaning.
Evaluation may focus on the instructional process or on the product of the instruction. Process evaluation examines the means or methodology used to reach the objectives. Determining if programmed materials displayed on videotapes are more readable than equivalent materials in text form is a process evaluation. The evaluation is performed while learning is occurring, and examines the process itself. Product evaluation looks at the results of a program after its implementation and is ends oriented. Determining the effectiveness of a specific film is a product evaluation. The evaluation is performed after the instruction has taken place.
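The contrast between the two yardsticks can be made concrete with a short sketch. The names, scores, 80-point mastery level and "top half passes" curve below are invented for illustration and are not taken from any particular grading scheme.

```python
def criterion_referenced(scores, mastery=80):
    """Pass every learner who reaches the fixed criterion (assumed: 80)."""
    return {name: score >= mastery for name, score in scores.items()}


def norm_referenced(scores, pass_fraction=0.5):
    """Pass the top fraction of the group, whatever the raw scores are."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    n_pass = round(len(ranked) * pass_fraction)
    passed = set(ranked[:n_pass])
    return {name: name in passed for name in scores}


# Two hypothetical classes: one that learned very little, one that learned nearly everything.
weak_class = {"Ann": 52, "Ben": 48, "Cal": 45, "Dee": 41}
strong_class = {"Eve": 97, "Fay": 95, "Gus": 94, "Hal": 92}

for label, group in (("weak", weak_class), ("strong", strong_class)):
    print(label, "criterion:", criterion_referenced(group))
    print(label, "norm:     ", norm_referenced(group))
```

Under the criterion measure nobody in the weak class passes and everybody in the strong class does. Under the norm measure exactly half of each class passes, so some learners who missed the objectives are declared competent while some who met them are not.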
Process evaluation tends to focus on formative questions about what does and doesn't seem to need revision. However, summative process questions about time, cost, and acceptance can also be examined. Product evaluation tends to focus on summative questions, but can also examine whether the content is accurate or obsolete, which are formative questions.
II THE EVALUATION PROCESS
It is important when planning an evaluation to be specific about what is to be evaluated. Decide if the evaluation will be formative, summative, or both. Determine how the evaluation information will be used and whether criterion- or norm-referenced measures should be used. The process of performing an evaluation may differ according to the theoretical perception guiding the evaluation. In conducting an evaluation there are a number of steps to follow:
When designing an evaluation plan, decisions should be made involving the desirability of program objectives. Objective studies, it is argued, are free from subjective inputs. They stay with the scientific method of investigation to eliminate the bias of the evaluators. Objective studies supposedly permit one to generalize because of the use of objective, bias-free measures. Objective data are more useful in medicine (where things tend to be seen as black or white), but subjective data are often more useful in the behavioral sciences. In reality, evaluation should not be an "either...or..." situation. Both play important roles in designing and implementing instructional systems.
Subjective studies are often useful precisely because they do not eliminate potentially relevant parameters just because they would complicate a "tight" research design. High reliability and validity in a study are desirable, but too much is often lost in achieving these values. Many research studies are too far removed from the interactions of the real world. The design of evaluation studies should be based on both personal professional knowledge and formal design/models if practical curriculum decisions are going to be made. Firmly grounded objective studies can perhaps then be performed after a solid grasp on the variables has been obtained. Eventually, fine points, or comparisons, can be made with research designs rather than with evaluation designs.
Many commercial organizations use an evaluation hierarchy by Kirkpatrick. There are four levels in this model:
1. Do the students like the instructor, the course, and the instruction? (Sometimes called "warm fuzzies.")
2. Does the training meet the cognitive / psychomotor objectives?
3. Can the students use the instruction on the job?
4. Does the instruction make a difference?
There are different focal points for program (or course) evaluation. Each type of evaluation requires unique instruments and provides different answers to instructional developers and to classroom teachers. These types of evaluation are:
1. Software / Program / Content Evaluation
Quantitative Measures
* Use judges and learning points for learning measures (trained observers)
* Behavioral Evaluation / Observation on the job (this may be done using video recording)
* Time-Motion studies to determine efficiency & effectiveness
* Longitudinal studies (1, 3, 6, 9 month delay)
*
Qualitative Measures
* Self Appraisal
* Description of learning
*
2. Affective Evaluation
* Use of Likert Scales
* Peer Evaluation
*
3. Course / Instructor Evaluation
* "Brag Sheets" Instructor Evaluation
*
4. Technical Quality
* Are the course materials "professionally" done -- are they well developed materials?
* Are the graphics appropriate and well developed?
Example of Commercial Product Evaluation
According to an August 1994 issue of InfoWorld, Microsoft Corp. is seriously evaluating its new software product "Chicago." Fully six months before its expected release, the company has spent 250,000 hours testing Chicago; 150 Microsoft employees are dedicated to testing Chicago full time. These employees have made more than 10,000 calls to beta testers of the product -- and a more fully featured Beta 2 version is due to be released in October before the expected May 1995 release of the product. More than 100,000 copies of the beta versions of Chicago [Windows 95] will have been tested in the field by users before the product is finally released. No dollar estimate for this product evaluation has been provided, but this company has spent a lot of time, energy and money to ensure that the product works the way it wants before it is released. Most instructors spend this kind of time evaluating their course materials too -- right?
Example of K-12 Teachers' Evaluation and Use of Instructional Materials
An evaluation of educational software found that instructional programs which are difficult to master (Where in the World is Carmen Sandiego) are often not sought out by students (in this case, to learn world geography and world history). Oregon Trail (a U.S. history simulation and problem-solving program by the Minnesota Educational Computing Consortium) encourages paired students to talk about the educational issues and to seek out and enjoy this educational activity. Even in this case, it appeared that a lack of teacher training in how to use the materials inhibited their effective use. "Ideally, a teacher should be 'bundled' with the software. At the site where the program is purchased, that teacher could give a short workshop on how to use the program in a classroom. Including such information in the written documentation is, in many cases, not enough; few teachers read all the documentation. It is faster and easier to learn by demonstration..." [Caftori, N. (August 1994). Educational effectiveness of computer software. T.H.E. Journal, 62-65.] "Until software designers are certain of what features do both -- attract children and teach -- they should adopt the pedagogical methods that teachers use in a hands-on environment or for manipulatives: Remind students of what the goal is and point out inconsistencies in students' actions. While integrating those techniques, designers should avoid the most uninteresting method -- lecture." "Designers can also make the program 'behave' as a teacher, rather than counting on teachers being present in the room. If educational software can assume a teacher's role and still be motivating to children, it would accomplish something dearly needed: Specific attention provided to each child, immediate feedback and individual guidance." (p. 65)
III TEXTBOOK ASSIGNMENT 1 STUDY GUIDE.
Hannafin & Peck, The Design, Development, and Evaluation of Instructional Software, 3-24. Be prepared to define / discuss the following topics in class:
CAI [vs. CBT, CMI]
types of CAI [drill & practice-CAI, tutorial or instructional-CBT, simulations or modeling, games-usually CBT]
Instructional Programs CAI vs. Page Turning CAI
Research Conclusions:
CAI Effectiveness:
On the negative side:
Characteristics of Effective CAI:
1. Uses appropriate, clearly developed objectives as the basis for development.
2. Focuses on learner characteristics -- any one program is not intended for all learner populations.
3. Maximizes S/S and S/T interactions -- "high tech results in high touch"
4. Is individualized.
5. Maintains student interest.
6. Approaches the learner with a positive tone as in a one-to-one tutorial
7. Provides a large variety of feedback to the student.
8. Fits the instructional environment.
9. Evaluates student progress appropriately.
10. Uses the computer's capabilities for manipulating data [student progress, types of successes]
11. Uses effective instructional design principles.
12. Uses formative and summative evaluation principles effectively.
IV RESEARCH TRENDS / CONCLUSIONS INVOLVING COMPUTER ASSISTED AND COMPUTER BASED INSTRUCTION
In addition to the conclusions drawn from the textbook (above) on the effectiveness of CAI, there have been thousands of studies on the effectiveness and effects of using computers in training. See if you can fill in the following information about the educational uses of computers.
Logistical Data
· White students appear to have access to more computers at school (32.8%) vs. Hispanic (10.4%) or Black (10.9%) students. - U.S. Dept. of Education, Digest of Education Statistics, October, 1993.
· White students appear to have access to more computers at home (43.3%) vs. Hispanic (15.2%) or Black (16.1%) students. - U.S. Dept. of Education, Digest of Education Statistics, October, 1993.
·
·
Learning Conclusions
·
·
Affective Conclusions
·
·
"Logical" Conclusions
· The computer, especially when hooked to the Internet, permits the inclusion of resources in classrooms that may not have sufficient textbooks or other learning materials. On-line services provide access to encyclopedias, other students (often distant) who have similar interests, national news, etc.
· Adults who have successful computer skills are more likely to get good jobs than adults without these skills (in today's computer-rich environment).
·
·
V RESEARCH METHODOLOGY IN INSTRUCTION
The purpose of research and evaluation assessments in education is to ensure that effective and efficient educational systems are designed, developed, and implemented. The focus of research and evaluation in Instructional Technology should be to determine the relative effectiveness and efficiency of specific instructional designs and materials.
Now we will examine research and evaluation tools, their contribution to education, and their uses by instructional technologists, designers, producers, evaluators, and change agents.
Good ongoing research is necessary, since it provides a foundation for instructional technology. Among the contributions are:
Research and evaluation efforts are being questioned today in many academic communities. Appropriate research designs using "statistics" or projection techniques to determine statistical differences are sometimes thought to be irrelevant because the studies are too controlled, or out of touch with reality. The United States Congress, during the 1970s, curtailed federal research funds in education because of concern regarding the utility of many of these studies in helping the practitioner in the classroom. Independent reviews of educational studies by John Goodlad and others conclude that few of the research studies employed appropriate experimental design. Studies of irrelevant, insignificant factors or parameters also occur far too often. Data based on techniques useful for studying rat mazes or electron movement seem to have little relevance in the more complex domain of human learning. Learning based on trivial questions in "laboratory" situations may also have little relevance to a classroom teacher or an instructional technologist.
Many of the questions useful to instructional technologists cannot be answered by physical science methods of research. The instructional technologist needs to look at the interrelationship among multiple variables and complex problems, such as "What are the most relevant composition objectives for a second grade student?".
The questions that can be answered by traditional research designs are often of a lower-order, less relevant nature. Gage (1978), after analyzing numerous research studies over an extended period of time, was convinced that correlational studies are inevitably weak in identifying causal effects. He stated, "Only an ethically impossible experiment would end the doubt in terms of the logic of scientific method." (p. 235) We cannot know whether direct, structured, formal research-based methods really promote higher achievement than, for example, progressive, open and informal methods. Gage believes our only acceptable alternative is a combination of logic, insight, raw experiences, common sense, and the writings of persuasive prose stylists.
As instructional media evolve they first follow the instructional strategies, contents and formats of the media they are replacing. It appears from recent research that only the unique features of a medium affect learning efficiency. "Thus television researchers replaced the search for average effects of the medium with more refined questions (e.g., 'Does the depiction of movement enhance comprehension?') that focused on specific salient television attributes and qualities. A parallel question about computer-afforded learning would be 'Does the manipulation of a model's variables facilitate the comprehension of gradients and absolute values?'" (Salomon & Gardner, 14) This approach to designing media research seems the most rational. This will permit the investigation of training costs, optimal instructional design, perception and psychological variables.
Barbatsis (1978) suggests that ITV research has not led to theoretical progress in answering the question: How can television be used to teach? One is left to conclude, says Barbatsis, that, "...the major factor contributing to the domination of research which has produced so little progress in the field of instructional television has been the bias of the academic community to inquiry dedicated to scientific validation." (p. 412) The same conclusions can be drawn from the lack of impact of research on other media and processes used in instructional technology.
Researchers are still asking easy, but now irrelevant, questions. For example: Is televised teaching as effective as the traditional lecture method of instruction? The fact that after two decades the same question is still being asked indicates the difficulty of attacking questions beyond this level.
The futility of past research is seen in the summarization of research on instructional technology. Instructional technology studies have not provided enough information to eliminate the uncertainties involved in making choices. Wilkinson (1980) points out another weakness in the media research in his summary of research involving media. He states, "Many of the studies in the field were set up to demonstrate prior convictions rather than to examine carefully drawn hypotheses. The results of several decades of research nevertheless . . . can be summed up as 'no significant difference.'"
V EVALUATION MODELS
General Evaluation Models
Instructional Evaluation Models
Introduction
There are many ways to evaluate instructional software. One of the most common is to examine the instructional objectives it purports to teach and determine how well the software really teaches those objectives. An instructor can then compare these with their own objectives and, knowing the "quality" of a piece of instructional software, have a fairly good idea of how well their students will learn from it.
Software evaluation is not a unique field -- it is a part of the general field of educational evaluation. The procedures and protocols of software evaluation are those common to the broader field of evaluation.
Many commercial software development companies also seriously evaluate their programs:
Learning Company (Fremont, CA) develops educational software programs (Operation Neptune, Reader Rabbit, Treasure Math Storm) for PC and Macintosh computers. They have about 20 programs for children ages 3 through 17 -- designed to burnish reading, math & problem-solving skills -- with prices ranging from $39 to $69. They do this with programs that use animated characters, sound effects & sophisticated multicolored graphics.
From 1987 to 1992 the company's research and development expenditures averaged 20% of sales, and the company has involved parents, educators and children in the evaluation of its software. The average time to get a product to market is thus a long 14 months. CEO William A. Dinsmore III says "At times we elected not to take the profit and to invest that money into the product development engine." It apparently pays off, as some 70% of their current sales are to previous customers. The R&D for Operation Neptune, for example, included consulting with marine scientists and teenage hackers.
These programs are really both games and learning software. The gaming helps interest the students in the activity and keeps them in the learning environment. The 1993 profits at Learning Co. should climb to $2.9 million on sales of $24.5 million -- you can make money with good educational software.
-- Teitelbaum, R.S. (October, 1992). Companies to Watch, Fortune, 95.
1. Objectives Evaluation Model
The educational objectives evaluation model was proposed by Ralph Tyler (1949), who developed it while the Eight Year Study was being designed in the 1930s. The "progressive" education folks felt that a program's effectiveness can be judged by comparing a student's progress with stated course objectives. Earlier assessments were measurement studies that focused strictly on individual students.
A tacit understanding in this model is that all of a program's objectives can be listed as behavioral and measurable objectives. This model works quite well for software evaluation, but as a program becomes increasingly complex it is more and more difficult to assess the product against a list of objectives. It may be very difficult to write objectives for some teaching tasks such as those requiring creative, democratic or similar behaviors; it may be difficult to observe and measure a student's growth in an instructional setting well enough to conclude when an objective has been learned. There may be intended but overlooked (unwritten) objectives which are not measured by this assessment methodology. Popham published a popular book on this procedure in 1978. This is the most common evaluation model in use today.
2. Utilization-focused Approach
In the mid-1970s Patton (1978) argued that courses can be evaluated using the criteria that decision-makers will bring to an evaluation report. He argued that you need to determine the questions these report readers will need to answer in making their application decisions and then provide that data. In other words, involving the decision-makers in the evaluation planning process increases the likelihood of the evaluation findings having relevance and being used. Patton relied on the early-1970s work of Stufflebeam, who thought that evaluation should be a process of delineating, obtaining and providing useful information to decision-makers.
3. Goal Free Model
Michael Scriven has advocated another type of model. In the early 1970s he said that knowing a program's objectives in advance of an evaluation can bias an evaluator, so he urged evaluators to ignore the objectives so that they would broaden their assessment to both intended and unintended effects.
4. Responsive Evaluation Model
Guba and Lincoln (1981) developed an evaluation model using the concerns and issues of the evaluation reader or audience as the evaluation procedure. They believe that the major focus of evaluation should respond to the information requirements, the wants and needs, of the evaluation audience. This model is similar to some of those mentioned above.
5. Analysis, Design, Development, Implementation and Evaluation (ADDIE) Model
Daniel Blair (The role of training product evaluation, Hewlett Packard, 1995) writes that educational evaluation, when used to evaluate or develop instructional materials, is a linear process. There are, he says, five elements that comprise the Instructional Systems Design (ISD) model. These five components are: Analysis, Design, Development, Implementation and Evaluation. See Figure 1.
Analysis has a primary function of collecting, analyzing and summarizing the data necessary for decision making through the design, development and evaluation phases. Some of the tools used during this phase are: goal analysis, target audience analysis, task analysis, performance gap analysis and cost-benefit analysis.
Design activities result in the design and ordering of a set of learning sequences. This process decides the media-methodology strategy. Prototype materials and their associated formative evaluation are a part of this development specification.
Development activities result in the construction of effective learning materials.
Implementation is often the longest and most costly phase of the product life cycle. Learning materials are delivered, instruction is conducted, and knowledge and skills are assessed during this phase.
Summative evaluation is an ongoing activity to ensure the desired learning is occurring and that necessary revisions are made in the instructional materials. Evaluation is a holding bucket used to hold information about the learning materials. The primary function of summative evaluation is to collect, analyze, and summarize data so that decisions about update activities, modifications, learning effectiveness, utilization considerations, efficiencies, and product obsolescence can be made in a timely, useful manner. In industry, this process allows cost-benefit analysis to occur to document the "return on the training investment." (Davidove, 1993)
References:
Davidove, E.A. (1993). Evaluating the return on investment of training. Performance and Instruction, 32(1), 1-8.
Guba, E.G., & Lincoln, Y.S. (1981). Effective evaluation. San Francisco: Jossey-Bass.
Patton, M.Q. (1978). Utilization-focused Evaluation. Beverly Hills, CA: Sage Publications.
Popham, W.J. (1978). Criterion-referenced Measurement. Englewood Cliffs, N.J.: Prentice Hall.
Rossi, P.H., & Freeman, H.E. (1993). Evaluation: A systematic approach (5th ed.). Newbury Park, CA: Sage Publications.
Scriven, M. (1973). Goal-free evaluation. School Evaluation: The Politics and Process. (E.R. House, ed.). Berkeley, CA: McCutchan.
Scriven, M. (1991). Evaluation thesaurus (4th ed.). Newbury Park, CA: Sage Publications.
Stake, R. E. (1967). The countenance of educational evaluation. Teachers College Record, 68, 523-540.
Tyler, R.W. (1949). Basic Principles of Curriculum and Instruction. Chicago, IL: University of Chicago Press.
VI DATA COLLECTION METHODS/TECHNIQUES
Collecting information for an evaluation can be accomplished in many ways.
Achievement tests can be administered to large groups at low cost. These tests measure student knowledge over a wide spectrum of subject areas.
Aptitude or IQ tests are somewhat useful measures of potential.
Paper-and-pencil self-reporting measures ask individuals to express their attitudes, beliefs, perceptions and feelings.
Questionnaires are self-administered surveys consisting of sets of questions. They are inexpensive to construct, but the type of information they provide is often limited, biased and incomplete.
Rating scales can be used for evaluation of individuals, events, or products. Student attitude, for example, can be rated on a five-point scale from one to five. These scales are easy to develop and provide objective data which may, however, be biased due to differences in definitions.
Ranking scales arrange a set of items into a hierarchy according to a value or preference. A teacher may be asked to rank students or textbooks. Because ranking is forced, raters may make some distinctions despite not really noting any differences.
Semantic differentials may be used to measure attitudes and affect according to the indirect meanings of words. The format of a semantic differential question might be:
Instructional Technology is -
Good --- --- --- --- --- Bad
Powerful --- --- --- --- --- Weak
Desirable --- --- --- --- --- Undesirable
Effective --- --- --- --- --- Ineffective
In this illustration, the semantic differential reveals the attitude of an individual toward instructional technology. Instructional Technology was being assessed for its goodness, power, desirability, and effectiveness.
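One way such a form might be scored is sketched below. The scoring rule is an assumption, not something stated above: each of the five blanks is numbered from 5 (nearest the favorable adjective) down to 1 (nearest the unfavorable one), and the marks are averaged. The function name and the sample marks are hypothetical.

```python
# Hypothetical scoring sketch for the semantic differential shown above.
# Assumption: blanks are scored 5 (favorable end) down to 1 (unfavorable end).

SCALES = [
    "Good-Bad",
    "Powerful-Weak",
    "Desirable-Undesirable",
    "Effective-Ineffective",
]


def score_semantic_differential(marks):
    """Return the mean rating across all four scales.

    marks maps each scale name to the position marked (1-5),
    where 5 is the blank nearest the favorable adjective.
    """
    for scale in SCALES:
        position = marks[scale]
        if not 1 <= position <= 5:
            raise ValueError(f"{scale}: position must be 1-5, got {position}")
    return sum(marks[scale] for scale in SCALES) / len(SCALES)


# Made-up responses from a single respondent.
example_marks = {
    "Good-Bad": 4,
    "Powerful-Weak": 5,
    "Desirable-Undesirable": 3,
    "Effective-Ineffective": 4,
}
mean_rating = score_semantic_differential(example_marks)
print(mean_rating)  # 4.0
```

A mean near 5 would suggest a favorable attitude toward instructional technology and a mean near 1 an unfavorable one; the individual scale scores can also be reported separately.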
The Q-Sort permits individuals to rate items or statements by prioritizing them. Teachers could be asked to rate ten textbooks as "very good" to "very bad" in such a manner that at least two texts are assigned to each category. In essence, it is a forced choice classification exercise.
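The forced-choice rule in the textbook example can be checked mechanically. The sketch below assumes five categories (the text names only the two end points, so the middle labels are invented) and uses hypothetical textbook names; it simply verifies that every category received at least two items.

```python
from collections import Counter

# Assumed category labels; only "very good" and "very bad" come from the text.
CATEGORIES = ["very good", "good", "neutral", "bad", "very bad"]
MIN_PER_CATEGORY = 2  # "at least two texts are assigned to each category"


def is_valid_q_sort(assignments):
    """Return True if every category holds at least MIN_PER_CATEGORY items.

    assignments maps each item (e.g. a textbook title) to one category label.
    """
    counts = Counter(assignments.values())
    unknown = set(counts) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    # Counter returns 0 for categories that were never used.
    return all(counts[category] >= MIN_PER_CATEGORY for category in CATEGORIES)


# Ten hypothetical textbooks, two placed in each category.
textbooks = [f"Text {letter}" for letter in "ABCDEFGHIJ"]
balanced_sort = {text: CATEGORIES[i // 2] for i, text in enumerate(textbooks)}
lopsided_sort = {text: "good" for text in textbooks}

print(is_valid_q_sort(balanced_sort))  # True
print(is_valid_q_sort(lopsided_sort))  # False
```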
Diary techniques require individuals to keep hourly, daily, or weekly accounts of specific activities, attitudes, thoughts or events.
The critical incident technique requires the recording only of particularly important, unique or useful information. Emphasis is placed on recording or reporting those incidents or situations that seem to make a big difference in system performance. If an event occurs frequently, but does not have major impact on other events, it is not a critical event. Uncommon events that have major consequences are, however, considered critical. Diary and critical incident techniques are difficult to interpret and score.
Observations can be used to determine what instructional materials are being used. Eyewitness observation, self-completed checklists, rating scales, field notes, and summary reports are all examples of observation instruments. Standard observations require careful planning to ensure accurate information. Many observed individuals, however, do not behave normally while being observed. It is also difficult to train accurate observers. Time sampling observations provide repeated observations of a given situation. While this technique overcomes some of the limitations of the standard observation techniques, it does not permit the program as a whole to be observed.
An interview may be performed by talking to an individual or group. Interviews can be either unstructured and spontaneous, or highly structured. A face-to-face interview permits the probing of sensitive issues like attitudes or values. This process, however, can be time-consuming and expensive and requires extensive training of the interviewer. Telephone interviews, although less sensitive to attitude and value information than face-to-face interviews, are much less time-consuming and expensive.
Performance tests require individuals to complete a task. Evaluation may focus on the performance itself, the end-product of the performance, or both. Rating scales may be used to evaluate the performance. These measures are often time-consuming, expensive and difficult to develop but, if related carefully to objectives, provide excellent data.
Record reviews use score data, such as tests and grades, from which inferences about achievement or attitude can be made. This method can be used to assess individual achievement.
In summary, an evaluation technique closely matching the objective should be selected and used. Usually, the decision is easy to make. To determine the consistency of objectives and a corresponding test, the condition statement should be examined to determine if the evaluation or test can be given under those specified conditions. The degree statement should also be examined to determine the scoring standard. And of course, the behavior statement should be examined to determine specifically what behavior the student is to exhibit.
An examination of the verb in behavioral objectives, as suggested in Chapter 6, will permit you to relate the verbs directly to one of the testing formats discussed earlier in this chapter. If the behavior involves recall or recognition, then the most appropriate test formats are those which require the student to recall or recognize the answer. Matching, multiple choice, and true-or-false questions are examples of recognition tests. A recall objective requires a short answer, fill-in-the-blank, or "list the . . ." type of question.
Measuring a concept, procedure, rule or principle objective usually requires a short answer, fill-in-the-blank, listing, or a performance test item. A procedure objective typically requires a short answer, fill-in-the-blank, narrative description, or a performance test. Problem solving performance is often based on projects, case studies or performance simulations.
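The pairings described in the last two paragraphs can be restated as a simple lookup table. This is only a sketch: the dictionary keys, the function name, and the choice to give "procedure" its own more specific list are assumptions made for illustration.

```python
# Lookup table restating the objective-to-format pairings described above.
# Key names and the helper function are hypothetical.
TEST_FORMATS = {
    "recognition": ["matching", "multiple choice", "true-or-false"],
    "recall": ["short answer", "fill-in-the-blank", "listing"],
    "concept": ["short answer", "fill-in-the-blank", "listing", "performance test"],
    "rule": ["short answer", "fill-in-the-blank", "listing", "performance test"],
    "principle": ["short answer", "fill-in-the-blank", "listing", "performance test"],
    "procedure": [
        "short answer",
        "fill-in-the-blank",
        "narrative description",
        "performance test",
    ],
    "problem solving": ["project", "case study", "performance simulation"],
}


def suggest_formats(objective_type):
    """Return candidate test formats for a type of objective."""
    key = objective_type.strip().lower()
    if key not in TEST_FORMATS:
        raise KeyError(f"no formats listed for objective type: {objective_type!r}")
    return TEST_FORMATS[key]


print(suggest_formats("Recognition"))
print(suggest_formats("problem solving"))
```

The table only narrows the choice; the condition and degree statements of the objective still determine whether a given format can actually be administered and scored.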
VII EVALUATION PROJECTS
Some of the first coherent facts about the quality of education came from the National Assessment of Educational Progress (NAEP) in 1970. This government-sponsored project measured student achievement in art, careers, citizenship, literature, mathematics, science, social studies, music, reading, and writing. As the facts poured out of NAEP, curriculum developers, textbook writers and other educators found they could better assess their students and consequently design better instruction for them. Another federally funded source of statistics about education is the National Center for Education Statistics (NCES). These large evaluation efforts provide data that tend to be valid and reliable. All professional educators, including technologists, should study the results of these projects.
In the spring of each year NAEP personnel select a representative sample of some 22,000 4th-, 8th-, and 12th-grade students from across the U.S. and examine their knowledge in one or more areas. Students are usually presented with both background questions and cognitive tasks. This data is then compiled to permit the U.S. Department of Education (Office of Educational Research) to partially determine the quality of education in the U.S.
NAEP questions are based on generally accepted "themes" for the content areas evaluated. The Spring 1994 evaluation examined U.S. History and some of the questions were:
Sample Question (Grade 4)
Your teacher has asked you to teach your classmates about ONE of these famous places where an important event in American history happened: 1) the Alamo; 2) Pearl Harbor; 3) Gettysburg; 4) Roanoke Island. My famous place in American history is _____________________.
Write down three facts about the place that you have chosen that will help you teach your classmates about the place.
Fact 1 _____________________________________________________
Fact 2 _____________________________________________________
Fact 3 _____________________________________________________
Sample Question (Grade 12)
Given: Average Farm Size and Total Number of Farms:
Year     Farm Size     Number of Farms
1900     150 Acres     6,250,000
1980     425 Acres     2,225,000
Summarize the changes shown in the table above. _______________________________
_______________________________________________________________________
Explain how one invention or development helped cause the changes you have described.
_______________________________________________________________________
This question is an example of [which of the following ________]:
Theme: Economic and technological changes and their relation to society, ideas, and the environment.
Period: The development of modern America (1865 to 1920).
Skill: Historical analysis and interpretation.
As you can see, the questions are broad conceptual ones that may allow a good understanding of instructional quality. As far as I can see, the NAEP data will not allow us to measure instructional software directly, but they do provide a way to evaluate the quality of American education and suggest places where software might be developed.
Coleman, Hoffer and Kilgore (1982) published data from the "High School and Beyond" study by the National Opinion Research Center involving information on 58,000 sophomores and seniors. Many additional researchers have reviewed these data. In their post-hoc analysis, Walberg and Shanahan (1983) found that private schools, even elite private ones, did not produce superior cognitive achievement once fixed characteristics of their students and their backgrounds were statistically controlled. They concluded that quantity of instruction is the only educationally alterable variable that emerges as a strong correlate of performance. Their results are based on year-long courses taken in English, mathematics, French, German, Spanish, history, and science. This time-on-task conclusion is important since it stresses that our instructional designs should not focus primarily on reducing time to the attainment of given objectives, but rather should result in more time spent on learning activities.
The College Entrance Examination Board also regularly evaluates student aptitude. Its Scholastic Aptitude Test (SAT) provides data annually which suggest that, over the last few years, the nation's youth have been performing less well on verbal and mathematics measures than in earlier years.
The Navy Personnel Research and Development Center evaluated the objectives, course content and test items in 100 U.S. Navy classrooms. Taylor, Ellis, & Baldwin (1988) found that a majority (56%) of the 1,945 knowledge objectives examined were inappropriate for the course training goals and future job requirements. It was soon evident to them that 49% of the course objectives were never tested. They then examined the course examinations in more detail and found that 48% of the test items did not match the listed objectives. Further examination of the objectives and their related tests resulted in a finding that over one third (38%) of all test items were considered inappropriate -- that is, they did not appropriately test their relevant objective. In other words, course objectives were frequently unrelated to course testing procedures. When examining the course content in relation to the course objectives, the major problem identified in this study was that the practice requirements specified, or suggested, by the objectives were accomplished only 50% of the time. It appears, in this study, that the objectives in many classes were not the basis for course and test development.
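The kind of cross-check reported in the Navy study can be illustrated with a small audit of objectives against test items. This is a sketch only, not the method used by Taylor, Ellis, & Baldwin; the function name, the identifiers, and the sample data are all hypothetical.

```python
# Hypothetical audit of how well a course's test items line up with its objectives.

def audit_alignment(objectives, test_items):
    """Summarize objective/test alignment as percentages.

    objectives: set of objective identifiers for the course.
    test_items: dict mapping each test item to the objective it claims to
                measure (None if it claims none).
    """
    tested = {obj for obj in test_items.values() if obj in objectives}
    untested = objectives - tested
    unmatched = [item for item, obj in test_items.items() if obj not in objectives]
    return {
        "pct_objectives_untested": 100 * len(untested) / len(objectives),
        "pct_items_unmatched": 100 * len(unmatched) / len(test_items),
    }


# Made-up course: four objectives, five test items.
course_objectives = {"O1", "O2", "O3", "O4"}
course_items = {"T1": "O1", "T2": "O1", "T3": "O2", "T4": None, "T5": "O9"}

report = audit_alignment(course_objectives, course_items)
print(report)
# {'pct_objectives_untested': 50.0, 'pct_items_unmatched': 40.0}
```

In this made-up course, objectives O3 and O4 are never tested, and items T4 and T5 do not correspond to any listed objective.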
The evaluation of individual students with respect to their achievement, or potential for achievement [IQ], has been difficult to do for the last few years. The NAACP, ACLU and other groups have sued to prohibit the testing of blacks (they argue that the tests are culturally biased). In 1992 the California Learning Assessment System (CLAS) tests were developed and administered. The CLAS tests in reading and writing were given to students in grades four, eight and 10. A judge in May 1994 ruled that students may be given the tests without prior parental approval. Nevertheless, testing students to find out what they know, so that they can be given appropriate instruction, is still a touchy topic in the U.S.
VIII COURSEWARE EVALUATION PROJECTS AND MODELS
Introduction
There are several software clearinghouses which can be used by educators to provide initial software evaluation. Final evaluation should be done by the school/teacher to ensure the materials are relevant for their students.
Some of these clearinghouses which do software evaluation are:
* Educational Products Information Exchange (EPIE).
* MicroSIFT
* York University Faculty of Education On-line Service.
Many of these evaluations may be obtained either through your "education librarian" or accessed on-line.
Several publications also provide software evaluations. These include:
* Software Reports
* Digest of Software Reviews
* The (Annual) Educational Software Preview Guide. This is a very good source of courseware evaluations. The Guide is developed by the Educational Software Evaluation Consortium, which represents 17 organizations involved in computer education in North America. The selection of titles is based on a critical evaluation conducted by participating organizations and participants at the California Software Evaluation Forum. The 200+ materials are indexed by curriculum area, topic, grade level, and computer brand. Source and costing information is included. Copies are available from the California TEC Center Software Library & Clearinghouse, SMERC Library, San Mateo County Office of Education, 333 Main Street, Redwood City, CA 94063.
Because so many educators have been developing their own instructional software, several organizations have evolved to assist in the evaluation and dissemination process. These include:
* CONDUIT
* Minnesota Educational Computing Corporation (MECC).
Despite the active involvement of the above organizations, the quality of instructional software tends to be poor (Bialo & Erikson, 1985). York University reported that only five to nine percent of the software titles it has evaluated could be considered acceptable (Owston, 1985).
It is difficult to get evaluation copies of software from producers and distributors without charge; many companies are concerned because they know that illegal copies have been made from evaluation copies in the past and then used by teachers or other school district personnel. Because evaluation copies are hard to obtain, the evaluation services discussed above make software evaluation more practical. Even if preview copies of instructional software were available, the sheer volume of titles makes software evaluation by various agencies almost essential; there are about 10,000 educational software titles currently on the market, and perhaps 2,000 new titles appear each year. In elementary mathematics, for example, perhaps a hundred new titles may appear each year -- it is very difficult for individual teachers to evaluate all of the potentially relevant software.
Some of these software distributors will provide evaluation copies of textbooks, but are often unwilling to provide computer software for evaluation. There is no great cost advantage to copying textbooks (unless the copy machine is left unguarded), but hundreds of dollars' worth of software may be copied to two-dollar diskettes. Commercially available software typically costs from $39 for a simple program to many hundreds of dollars for complex programs.
One way of supporting software evaluation is to provide educators with "preview copies" or "demonstration copies" which are only short demos of the entire product. As will be discussed later, when we discuss copyright law, illegal and immoral copying has significantly reduced the quality of instructional software; many creative software developers feel there is not enough money in instructional software production to make it worthwhile to develop quality software.
References:
* Bialo, E.R., & Erikson, L.B. (1985). Microcomputer courseware: characteristics and design trends. AEDS Journal, 18(4), 227-236.
* Owston, R.D. (1985). Software Evaluation Using YESES. Paper presented at the annual conference of the Ontario Educational Research Council, Toronto, Ontario, December 1985.
IX TESTING LIMITATIONS
There are possible limitations to today's tests: lack of precision in what they measure, or testing bias which limits some groups (such as women or blacks). It is possible that tests are not sufficiently precise to measure the difference in learning between individuals so that instructional designers can determine which treatment works best with specific types of learners. Many researchers feel that Aptitude-Treatment-Interaction research, discussed previously, would show that learners differ and that they can succeed best with individually tailored, or prescribed, instructional methods and materials.
The National Center for Fair & Open Testing is one group that feels that the Scholastic Aptitude Test (SAT) and the American College Testing (ACT) examination are biased against blacks, girls and minority groups. They feel that girls, for example, would not be able to do a math problem involving a basketball game because few girls are involved in basketball. An SAT test in 1974 is supposed to have used the word "pirouette," a term more likely to be familiar in some socioeconomic communities than in others. About this period the words "timpanist" and "melodeon" were also used. Since these tests are used by many universities in their admissions process, it is important to eliminate possible sources of testing error.
Most tests today are carefully combed for possible bias. Most test users are also aware that tests are but one measure which should be used in admissions, promotion, or placement procedures.
For whatever reasons (testing bias, innate differences, environmental effects [fewer girls take computer classes]), Asians score higher than whites, and whites score higher than blacks on many "standardized tests." Males score higher than females. This latter finding led, in the mid-1980s, to a federal judge in New York State declaring that the SAT was unfair to girls when it was used to give out college scholarships; males were getting two-thirds of the scholarship funds. These testing patterns are common not only to the SAT and ACT, but to about 150 other standardized tests. It seems that this might be a good area for sociologists, psychometricians and others to research in greater detail.
X IQI / CES SYSTEM for Course Evaluation (will be presented in the 5th Session)
XI EVALUATION PRINCIPLES
A. This examination of evaluation precepts or principles will focus on how we can use them in software evaluation.
XII SYSTEMIC CHANGE
Changing any part of a system affects all of the other components in the system. The traditional educational system is composed of:
All of these aspects of a typical educational system need to change to easily allow the implementation of an innovative or different instructional system.
In a computer-based instructional system the teachers should be more comfortable with counselor-like diagnosis [personal and academic] and prescription [of learning experiences -- in this system, of courseware]. The grading system should be on a criterion-referenced scale related to student achievement of the instructional objectives and their associated criteria and standards [and thus to pass or no-pass "grades," perhaps accompanied by narrative details]. The students should be allowed some flexibility in their daily schedules so they can follow their interests [possible in an individualized instructional system such as CBT]. School systems need to be aware that CBT is initially expensive and requires "up-front" costs which can be amortized over a number of years and which, in some cases, can result in low annual per-student costs. Computer-based learning areas may be much larger than traditional classrooms, with many more students in them under the supervision of a single teacher [in a workable computer-based instruction system it is expected that the computers would reduce the instructor's load -- presentation of information, testing, record keeping]. If CBT is to be cost effective, these types of changes must be anticipated and the system changed to permit the adequate use of this instructional technology.
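The amortization argument can be made concrete with a little arithmetic. The sketch below assumes simple straight-line amortization with no interest, and every figure in it is invented for illustration; none comes from an actual school system.

```python
# Hypothetical illustration of amortizing CBT "up-front" costs.
# Assumption: straight-line amortization, no interest or inflation.

def annual_cost_per_student(upfront_cost, years, annual_operating_cost, students_per_year):
    """Return the yearly cost per student once up-front costs are spread over `years`."""
    if years <= 0 or students_per_year <= 0:
        raise ValueError("years and students_per_year must be positive")
    amortized_upfront = upfront_cost / years
    return (amortized_upfront + annual_operating_cost) / students_per_year


# Invented figures: $500,000 up front, spread over 5 years,
# $40,000 per year to operate, 1,000 students served each year.
cost = annual_cost_per_student(500_000, 5, 40_000, 1_000)
print(cost)  # 140.0
```

With these invented figures the same system costs $140 per student per year over five years, but would cost $540 per student if the whole up-front amount were charged to a single year.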
XIII SOURCES OF INSTRUCTIONAL MATERIALS
Baker & Taylor is the largest distributor of educational and entertainment CD-ROMs in the U.S. If you include its book and audiotape distribution, it's a $1 billion business. When Baker & Taylor puts a title on its list, the title may go into thousands of retail outlets. Baker & Taylor looks for specific things when selecting a title to distribute (their criteria differ from ours as educators, but many of the criteria are common to both). "We put the value on content -- titles that are compelling or that have something a little extra," says Brad Grob, Director of multimedia. The distributor examines the company, sometimes steering clear of "garage-band" setups that are not likely to be able to support their products. In one respect Baker & Taylor's evaluation is somewhat different from ours: packaging is essential. "The box has got to stand out on the shelf," says Grob. "You've got 10 seconds to sell yourself. The packages that do the best have a clean, not cluttered, look. The box cover should also position the package and quickly tell the story." "...JFK Assassination and Exploring Ancient Architecture came to us brand new and with their products they have now sold more than 100,000 units."
A. Review students' homework assignments
B. Augment the above with collections of brochures, etc.
540-2 Apr-96