Introduction to health measurement scales

https://doi.org/10.1016/j.jpsychores.2010.01.006

Abstract

Both research and clinical decision making rely on measurement scales. These scales vary with regard to their psychometric properties, ease of administration, dimensions covered by the scale, and other properties. This article reviews the main psychometric characteristics of scales and assesses their utility.

Introduction

There is no single variable that can be used to describe health, and health cannot be measured directly. Health measurement requires several steps and involves the evaluation of several health-related indicators.

Rating scales are used in numerous settings to measure various aspects of health, such as particular symptoms or the presence of a particular trait. Health measurement scales can be classified in at least three ways: by function, by description, and by methodology. Functional classification focuses on how measurements are applied, as in Bombardier and Tugwell's [1] classification of health measurements as diagnostic, prognostic, or evaluative; however, others [2] have argued that this classification ignores the way scales are actually used in practice. Descriptive classification is concerned with the range of topics covered by a particular measurement. For example, a measure might focus on a particular organ system, a diagnosis, or a broader concept such as anxiety or quality of life. A further distinction can be drawn between generic health measures and specific instruments. Specific instruments can be concerned not only with a particular disease but also with a particular target population, such as children. Methodological classification distinguishes among rating scales, questionnaires, indices, and subjective vs. objective measures.

Whether rating scales are to be used in a research project or to make clinical decisions, it is essential to evaluate how well they perform. By “how well,” we mean how much random error is present in the measurement (i.e., its reliability) and whether the scores give us meaningful information about the respondent (the validity of the instrument). A third measure of performance addresses whether it is feasible to use the instrument for a particular purpose. In this article, we give an introduction to some of the properties of rating scales and to the concepts of validity and reliability. Those who are interested in the details of constructing measurement scales are referred to more comprehensive texts [2], [3], [4].

Scale development can be approached in two ways: questions may be chosen from an empirical or a theoretical viewpoint [5]. With the empirical approach, a large number of questions are tested and statistical procedures are used to select the ones that best predict the outcome of interest. The disadvantage of this method is that it is difficult to interpret why individuals who answer a certain question in a certain way tend to have different outcomes. For example, questions in the Health Opinion Survey [6] were selected because they distinguished between those who do and do not have psychiatric problems, yet debates over what exactly the scale measures still continue. Scales developed entirely from an empirical stance may have clinical value, but they do not advance our understanding of the underlying phenomena. The alternative strategy is to select questions that are thought to be relevant from the standpoint of a particular theory, as was done for the McGill Pain Questionnaire [7]. In psychology, at least, the trend over the past 50 years has been a move toward theoretically derived instruments [2].
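The empirical approach can be illustrated with a minimal sketch. It assumes yes/no items coded 1/0 and a known grouping of respondents into those with and without the condition, and it ranks items by the difference in endorsement rates between the two groups. The item names, the data, and the functions `discrimination_index` and `select_items` are hypothetical and are not taken from the Health Opinion Survey or any other published scale; real scale development would use larger samples and more formal statistical procedures.

```python
def discrimination_index(responses, has_condition):
    """Endorsement rate among cases minus endorsement rate among non-cases.

    responses: list of 0/1 answers to one item, one per respondent.
    has_condition: list of booleans, True if the respondent is a case.
    """
    cases = [r for r, c in zip(responses, has_condition) if c]
    controls = [r for r, c in zip(responses, has_condition) if not c]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    return sum(cases) / len(cases) - sum(controls) / len(controls)


def select_items(item_responses, has_condition, n_keep):
    """Keep the n_keep items whose endorsement rates differ most between groups."""
    scores = {
        name: discrimination_index(responses, has_condition)
        for name, responses in item_responses.items()
    }
    ranked = sorted(scores, key=lambda name: abs(scores[name]), reverse=True)
    return ranked[:n_keep], scores


# Hypothetical data: the first four respondents are cases, the last four are not.
has_condition = [True, True, True, True, False, False, False, False]
item_responses = {
    "sleep_trouble": [1, 1, 1, 0, 0, 0, 1, 0],
    "likes_tea": [1, 0, 1, 0, 1, 0, 1, 0],
    "feels_nervous": [1, 1, 1, 1, 0, 0, 0, 0],
}

kept, scores = select_items(item_responses, has_condition, n_keep=2)
print(kept)
```

Note that the procedure says only which items separate the groups, not why they do so, which is the interpretive weakness of the empirical approach described above.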

Items on a scale

Items on a scale can come from several different sources: existing scales, reports of individuals' subjective experiences, clinical observations, expert opinion, research findings, and theory. One should be aware of the strengths and weaknesses of each source when considering a scale for a particular use. The advantage of using existing items from older scales is that items have probably already gone through a rigorous process of assessment and are, therefore, more likely to be useful. It may

Criteria to identify useful items

Not every item intended for a scale will perform well; therefore, several aspects of items have to be checked to decide which are likely to be useful.

It is important to use clear, comprehensible language. Very often, technical or jargon terms are used (e.g., stool, shock, or cardiovascular), which is fine if the scale is to be used with health professionals but not if lay people are the intended respondents. Because people differ in their reading ability, items should not require more

Reliability

Before one can start using an instrument, it should be established that it is measuring “something” in a reproducible manner; that is, if the measurement is repeated by different observers, or on different occasions, or by a similar (parallel) test, then the results should be comparable [2]. Assuming that the person has not changed, we would expect to arrive at similar scores at two different times. There are different indices to measure reliability, and not all are applicable to a given scale.
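As a minimal sketch of one such index, the example below estimates test-retest reliability as the Pearson correlation between scores from the same respondents on two occasions. The scores are invented for illustration and the function name `pearson_r` is our own. The choice of index is also an assumption: a Pearson correlation does not detect a systematic shift between occasions, so other indices, such as an intraclass correlation coefficient, are often preferred in practice.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary on both occasions")
    return cov / sqrt(var_x * var_y)


# Hypothetical scale scores for six respondents on two occasions.
time1 = [12, 18, 25, 9, 30, 21]
time2 = [14, 17, 27, 10, 28, 22]

retest_r = pearson_r(time1, time2)
print(round(retest_r, 2))
```

A value close to 1 indicates that respondents keep roughly the same rank order across occasions, provided the underlying attribute has not changed in the interval.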

Validity

Validity is concerned with the meaning and interpretation of the scores. In other words, validity guides us as to what conclusions can be drawn about people with a given score. If, for example, we use a scale to measure the degree of low-back pain, then we would like to be sure that people who score higher actually have more low-back pain. Whether this is the case is a question of validity. It may be that the scale measures something else, such as the degree of pain from other sources or

Utility

A measure that is reliable and valid can still be impractical to use. It may, for example, take a long time to complete, require excessive resources to score, or require training for the interviewers who administer the scale. Longer tests tend to be more reliable and valid than shorter ones, but shortening a test to reduce completion time may be worthwhile for the sake of utility.
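The trade-off between length and reliability can be made concrete with the Spearman-Brown prophecy formula from classical test theory, r_new = k r / (1 + (k - 1) r), where r is the current reliability and k is the factor by which the test length changes. The formula is not named in the text above and is offered here only as one common way to quantify the relationship; it assumes that the items added or removed are parallel to the remaining ones, and the starting reliability of 0.80 is illustrative.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability after changing test length by length_factor.

    reliability: reliability of the current test, between 0 and 1.
    length_factor: new length divided by current length (0.5 halves the test).
    """
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must lie between 0 and 1")
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


current = 0.80  # illustrative reliability of the full-length scale
halved = spearman_brown(current, 0.5)
doubled = spearman_brown(current, 2.0)
print(round(halved, 2), round(doubled, 2))
```

Under these assumptions, halving a scale with a reliability of 0.80 lowers the predicted reliability to about 0.67, while doubling it raises the prediction only to about 0.89, which shows why the gain from extra length has to be weighed against respondent burden.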

Over the past few decades, a new model of test construction has

Summary

Scales and questionnaires are an integral part of clinical practice and research. However, they are not all created equal. To be useful, instruments must demonstrate good psychometric properties, such as reliability and validity, and be in a format that patients find easy to use.
