# CHAPTER 9 - Measurement And Scaling

I. Measurement Levels or Scales of Measurement
• Measurement: the assignment of numbers to responses based on a set of guidelines.
A. Nominal-Scaled Responses
• Numbers are just labels
• Used solely to identify different categories.
• Do not imply rank ordering
• Intervals, ratios are meaningless
• Only mathematical operation is counting the number within each category.
• Only measure of central tendency is the mode.
B. Ordinal-Scaled Responses
• Numbers possess the property of rank order.
• Two measures of central tendency
• Mode - most frequent category
• Median - category in which the 50th percentile falls
C. Interval-Scaled Responses
• Has all the properties of an ordinal scale
• Differences between scale values are meaningful.
• Measures
• Mean - simple average
• Standard deviation - measure of dispersion. The degree of deviation of the numbers from their mean.
• Mode
• Median
• Starting point or zero point is arbitrary.
• The ratio of two values on an interval scale
• is arbitrary
• has no meaningful interpretation
• depends on the scale's starting point.
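A short sketch of why interval-scale ratios are arbitrary, using temperature as an illustrative example (temperature is not part of these notes; it is simply a familiar pair of interval scales with different zero points). The same two readings give different ratios in Celsius and Fahrenheit, while the ratio of differences is unchanged.

```python
def celsius_to_fahrenheit(c: float) -> float:
    """Convert between two interval scales that differ in zero point and unit."""
    return c * 9 / 5 + 32

# Three illustrative readings on the Celsius scale.
a_c, b_c, c_c = 10.0, 20.0, 40.0
a_f, b_f, c_f = (celsius_to_fahrenheit(t) for t in (a_c, b_c, c_c))

# The ratio of two values changes with the scale's starting point.
ratio_c = b_c / a_c          # 20 / 10
ratio_f = b_f / a_f          # 68 / 50

# Differences are meaningful: the ratio of two differences is the same on both scales.
diff_ratio_c = (c_c - b_c) / (b_c - a_c)
diff_ratio_f = (c_f - b_f) / (b_f - a_f)

print(ratio_c, ratio_f)            # 2.0 1.36
print(diff_ratio_c, diff_ratio_f)  # 2.0 2.0
```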
D. Ratio-Scaled Responses
• Possess all the properties of an interval scale
• Ratios of numbers have meaningful interpretations.
• Responses are the most versatile analytically
• Attitude and opinion scales usually do not satisfy the stringent requirements of a ratio scale.
• Natural, unambiguous starting point: zero.
• Computing and interpreting ratios make sense.
Metric Classification of Data Types
• Nonmetric data: data with only nominal or ordinal properties.
• Metric data: data with interval or ratio properties.
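The summary measures listed for each level can be sketched as a lookup table. This is only an illustration: the names `PERMISSIBLE`, `summarize`, and `is_metric` are hypothetical, and `median_low` is used for ordinal data on the assumption that the median should be an actual category rather than an average of two categories.

```python
from statistics import mean, median, median_low, mode, stdev

# Summary measures that are meaningful at each measurement level.
PERMISSIBLE = {
    "nominal": {"mode": mode},
    "ordinal": {"mode": mode, "median": median_low},
    "interval": {"mode": mode, "median": median, "mean": mean, "stdev": stdev},
}
# A ratio scale has all the properties of an interval scale.
PERMISSIBLE["ratio"] = PERMISSIBLE["interval"]


def summarize(codes, level):
    """Compute only the measures that make sense for the given level."""
    return {name: func(codes) for name, func in PERMISSIBLE[level].items()}


def is_metric(level):
    """Metric data have interval or ratio properties."""
    return level in ("interval", "ratio")


brand_codes = [1, 3, 3, 2, 3]           # nominal: numbers are just labels
preference_ranks = [1, 2, 2, 3, 4, 4, 4]  # ordinal: rank order only

print(summarize(brand_codes, "nominal"))       # {'mode': 3}
print(summarize(preference_ranks, "ordinal"))  # {'mode': 4, 'median': 3}
```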
II. Classes of Variables
• Attributes: personal or demographic characteristics (e.g., education level, age, size of household, number of children).
• Behavioral variables: relate to behaviors (e.g., frequency of visits to a store, extent of magazine readership).
• Beliefs: relate to knowledge and to what respondents consider (correctly or incorrectly) to be true.
• Attitudes: similar to beliefs, but also reflect respondents' evaluative judgments.
• Attitudes are multifaceted, with a cognitive component (knowledge, beliefs), an affective component (liking, preference), and a behavioral component (action tendency).
III. Attitude Scaling
• There are five ways to measure attitudes:
A. Observe Overt Behavior
• A person's behavior concerning an object will be consistent with attitudes toward it.
• This assumption has found empirical support.
• Useful when other attitude measurement methods are inconvenient or infeasible.
• Examples
• Observational study of very young children's behavior toward toys.
• One-way-mirror observation of focus groups.
• Observation of behavior can yield only rough estimates of attitudes at best.
• Other factors influence behavior.
• Validity of attitudes inferred may be questionable.
• Expensive approach because it requires skilled observers.
B. Analyze Reactions to Partially Structured Stimuli
• Ask respondents to react to, or describe, an incomplete, vague stimulus.
• Responses obtained are analyzed by trained professionals.
• Rationale: A person's response to a vague stimulus will be shaped by attitudes.
• Examples
• Various projective techniques
• Word association test
• Sentence completion test
• Thematic Apperception Test (TAT)
• Cartoon test
C. Evaluate Performance on Objective Tasks
• Respondents complete an ostensibly objective, well-defined task.
• Their performance is then analyzed
• Drawbacks
• Difficulty in constructing objective tasks
• Difficulty in meaningfully interpreting performance.
• Unstructured, disguised questionnaires are illustrations
D. Monitoring Physiological Responses
• Based on the premise that emotional reactions to a stimulus will be accompanied by involuntary physiological changes.
• For example, a feeling of fear or anxiety can induce physiological reactions such as shivering, perspiration, and increased heartbeat.
• Measurement instruments and techniques include
• Galvanic skin response (GSR) meter
• Pupillometer
• Eye-tracking equipment
• Response latency measures (e.g., VOPAN).
• Drawbacks
• Measure emotional arousal rather than attitudes.
• Cannot reveal whether the arousal is from a positive or a negative emotion.
• Physiological responses may only measure
• Attention-getting power
• Excitement-generating potential
E. Self-Report Measurements of Attitudes
• More straightforward than the other four categories.
• Ask respondents relatively direct questions concerning attitudes.
• Questions are typically in the form of rating scales
IV. Use of Rating Scales in Self-Report Measurements
A. Graphic Versus Itemized Formats
• A graphic rating scale
• presents a continuum
• in the form of a straight line
• Infinite number of ratings are possible.
• Allows detection of fine shades of differences in attitudes.
• Coding and analysis will require a substantial amount of time
• Respondents may be incapable of
• perceiving fine shades of differences in attitudes
• accurately translating perceptions into physical distances.
• Whether graphic rating scales can be meaningfully used is arguable.
• Pure graphic rating scales are not widely used
• Itemized rating scales
• have a set of distinct response categories
• Any suggestion of an attitude continuum is implicit.
• Essentially look like multiple-category questions.
• Easier to respond to
• More meaningful from the respondent's perspective.
• Coding and analysis are easier.
• Much more widely used than the graphic type.
• Can also use a combination format.
B. Comparative Versus Noncomparative Assessments
• Comparative rating scale
• Provides all respondents with a common frame of reference.
• Allows the researcher to be confident that all respondents are answering the same question.
• Noncomparative rating scale
• Permits respondents to use any frame of reference.
• A standard frame of reference from the researcher may not be meaningful to all respondents.
• Validity of the comparative ratings may be questionable.
• Choose between comparative and noncomparative based on
• the nature of potential respondents
• their realms of experience with the attitude objects about which ratings are desired
C. Forced Versus Nonforced Response Choices
• Forced-choice scale
• Does not give respondents the option to express a neutral, or middle-ground, attitude.
• Even number of response categories.
• Nonforced-choice scale
• Gives respondents the option to express a neutral attitude.
• Odd number of response categories.
• There is no unambiguous answer as to which is better.
• Choose after carefully considering the characteristics unique to the situation.
D. Balanced Versus Unbalanced Response Choices
• Balanced scale has an equal number of
• positive/favorable response choices
• negative/unfavorable response choices.
• A balanced scale reduces response biases.
• Unbalanced scale has a larger number of choices on the side where the attitudes are likely to fall.
• Recommended when true attitudes are likely to be predominantly one-sided.
E. Labeled Versus Unlabeled Response Choices
• Rating scales have anchor labels that define their two extremes.
• May include one or more intermediate labels
• Word labels
• Number labels
• Word & Number labels
• No rules exist for determining the number and types of labels
• No label is better than a bad label
• Interval scale data
• Inappropriate word labels can cast doubt on the assumption of interval data.
• Word labels should be used cautiously and sparingly
• Numerical labels must also be used and interpreted cautiously.
• Consider the likely audience for the final research report.
• Consider consistency across languages and cultures.
• Picture labels are also used
F. Number of Scale Positions
• Another area with no rigid rules.
• Most rating scales have between five and nine categories.
• More precision as the number of scale positions increases.
• Scales must make sense to the target population.
• The choice also depends on whether you are using a
• Single-item scale (just one question): use a larger number of scale positions.
• Multiple-item scale (sum of response values from a few questions): fewer scale positions suffice.
• Constant-sum scale
• Respondents allocate a fixed number of points among a group of objects.
• Has a natural starting point (zero).
• Has ratio-scale properties.
• Paired-comparison scale
• Comparative evaluation of objects, two at a time.
• The number of pairs can get large: n(n-1)/2 for n objects.
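A brief sketch of both scale types. Counting each unordered pair of n objects once gives n(n-1)/2 comparisons, which `itertools.combinations` enumerates directly. The helper names and the 100-point total in the constant-sum check are assumptions made for illustration only.

```python
from itertools import combinations


def n_pairs(n: int) -> int:
    """Number of paired comparisons needed for n objects."""
    return n * (n - 1) // 2


def paired_comparisons(objects):
    """List every unordered pair of objects exactly once."""
    return list(combinations(objects, 2))


def validate_constant_sum(allocation: dict, total: int = 100) -> dict:
    """Check that points are non-negative and add up to the fixed total."""
    if any(points < 0 for points in allocation.values()):
        raise ValueError("points cannot be negative")
    if sum(allocation.values()) != total:
        raise ValueError(f"points must sum to {total}")
    return allocation


brands = ["A", "B", "C", "D"]
print(len(paired_comparisons(brands)), n_pairs(len(brands)))  # 6 6
print(n_pairs(10))                                            # 45

# Constant-sum data have a natural zero, so ratios are interpretable:
# here "price" received twice as many points as "quality".
alloc = validate_constant_sum({"price": 50, "quality": 25, "service": 25})
print(alloc["price"] / alloc["quality"])                      # 2.0
```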
G. Measurement Level of Data Obtained
• Measurement level of the data generated (how powerful the data are) is affected by
• Type of question
• Type of rating scale.
• No such thing as the ideal scale format
• Choose carefully after considering
• Characteristics and requirements of the research setting:
• Nature of the variable to be measured
• Are respondents capable of making refined mental judgments concerning the variable?
• Types of analyses to be performed
V. Commonly Used Multiple-Item Scales
A. Likert Scale
• A series of evaluative statements concerning an attitude object.
• Five-point agree-disagree scale.
• A typical Likert scale has about 20 to 30 statements.
• Verbal labels but no numerical labels.
• Numbers are assigned to the scale categories for consistent indication of attitude direction.
• Summing numerical ratings on all the statements yields overall attitudes.
• Usefulness of a Likert scale depends on the statements.
• Statements should
• capture all relevant aspects
• be unambiguous to minimize erratic fluctuations
• be sensitive enough to discriminate among respondents with differing attitudes.
• This scale development can be quite laborious if it is to be done in an organized fashion.
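A minimal sketch of Likert scoring under stated assumptions: the five verbal labels and the 1-to-5 coding below are hypothetical, and unfavorably worded statements are reverse-coded so that a higher number always indicates a more favorable attitude before the ratings are summed.

```python
# Hypothetical verbal labels and numeric codes for a five-point agree-disagree scale.
AGREEMENT_CODES = {
    "strongly disagree": 1,
    "disagree": 2,
    "neither agree nor disagree": 3,
    "agree": 4,
    "strongly agree": 5,
}


def likert_score(responses, unfavorable_items=frozenset()):
    """Sum the coded responses, reverse-coding unfavorably worded statements.

    responses: list of verbal labels, one per statement.
    unfavorable_items: positions (0-based) of statements where agreement
        signals an unfavorable attitude.
    """
    total = 0
    for position, label in enumerate(responses):
        code = AGREEMENT_CODES[label]
        if position in unfavorable_items:
            code = 6 - code  # 1 <-> 5, 2 <-> 4, 3 stays 3
        total += code
    return total


answers = ["agree", "strongly disagree", "strongly agree"]
# The second statement is worded unfavorably, so strong disagreement is favorable.
print(likert_score(answers, unfavorable_items={1}))  # 14
```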
B. Semantic-Differential Scale
• One of the most widely used attitude-scaling techniques.
• Consists of
• a series of items to be rated by respondents.
• a series of bipolar adjectival words or phrases
• Each pair of opposites is separated by a seven-category scale
• No numerical labels
• No verbal labels
• Reverse the direction of some items to reduce response bias
• Favorable descriptors on the right for half of the items
• Favorable descriptors on the left for the other half
• Overall attitude scores can be computed by summing the ratings across items.
• A pictorial profile of the attitude object(s) is the more common form of analysis.
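A sketch of how reversed items might be handled before scoring. It assumes the seven positions are coded 1 (leftmost) to 7 (rightmost); the adjective pairs and function names are hypothetical. Items with the favorable descriptor on the left are recoded so that higher always means more favorable, after which ratings can be summed per respondent or averaged per item to build a profile.

```python
from statistics import mean

SCALE_POINTS = 7

# (adjective pair, whether the favorable descriptor is on the left)
ITEMS = [
    ("modern / old-fashioned", True),
    ("unfriendly / friendly", False),
]


def align(rating, favorable_on_left):
    """Recode a rating so that a higher number is always more favorable."""
    return SCALE_POINTS + 1 - rating if favorable_on_left else rating


def attitude_score(ratings):
    """Overall score for one respondent: sum of aligned item ratings."""
    return sum(align(r, left) for r, (_, left) in zip(ratings, ITEMS))


def profile(all_ratings):
    """Mean aligned rating per item across respondents (input for a profile chart)."""
    return {
        name: mean(align(resp[i], left) for resp in all_ratings)
        for i, (name, left) in enumerate(ITEMS)
    }


respondents = [[2, 6], [1, 7], [3, 5]]  # illustrative raw ratings
print([attitude_score(r) for r in respondents])  # [12, 14, 10]
print(profile(respondents))
```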
C. Stapel Scale
• The Stapel scale has four distinctive features:
• Each item has only one word or phrase.
• Each item has 10 response categories.
• Each item is a forced-choice scale since it has an even number of response categories (no neutral point).
• Response categories have numerical labels but no verbal labels.
• Eliminates the need to
• Develop complete statements
• Come up with pairs of bipolar words or phrases
• Not as widely used due to perceived complexity of
• Format
• Instructions
• Data are analyzed using the same procedures as for semantic-differential scales.
• Compute overall attitude scores by summing ratings on individual items.
• Pictorial profiles are made from the mean (or median) ratings.
VI. Strengths of Multiple-Item Scales
A. Validity
• The extent to which
• the scale is a true reflection of the underlying variable.
• the scale fully captures all aspects of the construct to be measured.
• A measure is likely to be more valid when it is
• carefully designed
• a multiple-item scale
• made up of numerous items
1. Content Validity

• The extent to which the content of a measurement scale seems to tap all relevant facets of an issue that can influence respondents' attitudes.
• Content validity of an attitude scale is a global criterion.
• It can be assessed only through a subjective judgment.
2. Construct Validity

• Seeks agreement between a theoretical concept and a specific measuring device or procedure.
• For example, a researcher inventing a new IQ test might spend a great deal of time attempting to "define" intelligence in order to reach an acceptable level of construct validity.
• Assesses the nature of the underlying variable or construct measured by the scale.
• High construct validity means
• convergent validity -
• The degree to which concepts that should be related theoretically are interrelated in reality
• The actual general agreement among ratings, gathered independently of one another
• discriminant validity -
• The degree to which concepts that should not be related theoretically are, in fact, not interrelated in reality
• The lack of a relationship among measures which theoretically should not be related.
3. Predictive Validity

• Predictive validity answers the question "How well does the attitude measure provided by the scale predict some other variable or characteristic it is supposed to influence?"
B. Reliability
• The reliability of an attitude scale is a measure of how consistent or stable the ratings generated by the scale are likely to be.
1. Test-Retest Reliability
• Test-retest reliability measures the stability of ratings over time.
• It relies on administering the scale to the same group of respondents at two different times.

• Test-retest reliability is high if ratings from the two measurements
• correlate strongly
• are consistent.
• Test-retest reliability is a measure of
• the stability of the scale items
• the degree to which scores obtained through the scale remain the same from measurement to measurement over time.
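One way to quantify this is the correlation between the two administrations. The sketch below uses a hand-written Pearson correlation (a common choice, though the notes do not name a specific coefficient), and the scores are invented purely for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Illustrative total scale scores for the same five respondents at two times.
time_1 = [12, 18, 25, 30, 22]
time_2 = [14, 17, 24, 31, 21]

r = pearson(time_1, time_2)
print(f"test-retest correlation: {r:.2f}")  # a value near 1 suggests stable ratings
```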
2. Split-Half Reliability
• Split-half reliability measures consistency across items within a scale.
• It can only be assessed for multiple-item scales.

• Requires
• splitting the scale items randomly into two sets
• an equal number of items in each set
• examining the correlation between respondents' total scores derived from the two sets of items.
• Split-half reliability is a measure of
• the equivalency or internal consistency of the scale
• the degree to which scores obtained through randomly split halves of the scale correlate with each other within the same measurement.
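The procedure can be sketched as follows: shuffle the item positions, divide them into two equal sets, total each respondent's ratings within each set, and correlate the two totals. The function names, the fixed `seed` (used only to make the random split repeatable), and the ratings are all assumptions for illustration.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def split_half_correlation(item_ratings, seed=0):
    """item_ratings: one list of item ratings per respondent."""
    n_items = len(item_ratings[0])
    if n_items % 2:
        raise ValueError("an even number of items is needed for equal halves")
    positions = list(range(n_items))
    random.Random(seed).shuffle(positions)        # random split of the items
    half_a, half_b = positions[: n_items // 2], positions[n_items // 2 :]
    totals_a = [sum(resp[i] for i in half_a) for resp in item_ratings]
    totals_b = [sum(resp[i] for i in half_b) for resp in item_ratings]
    return pearson(totals_a, totals_b)


# Illustrative ratings: five respondents, four items.
ratings = [
    [5, 4, 5, 4],
    [2, 1, 2, 2],
    [3, 3, 4, 3],
    [1, 2, 1, 1],
    [4, 5, 4, 5],
]
print(f"split-half correlation: {split_half_correlation(ratings):.2f}")
```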
C. Sensitivity
• The sensitivity of an attitude scale
• is closely tied to reliability.
• focuses specifically on the scale's ability to detect subtle differences in the attitudes being measured.
• A highly sensitive attitude scale should
• be able to discriminate between respondents who differ even slightly in terms of their attitudes toward something.
• be capable of uncovering minute changes in the same respondents' attitudes over time.