Research report
A 4-item measure of depression and anxiety: Validation and standardization of the Patient Health Questionnaire-4 (PHQ-4) in the general population

https://doi.org/10.1016/j.jad.2009.06.019

Abstract

Background

The 4-item Patient Health Questionnaire-4 (PHQ-4) is an ultra-brief self-report questionnaire that consists of a 2-item depression scale (PHQ-2) and a 2-item anxiety scale (GAD-2). Given that PHQ-4, PHQ-2, and GAD-2 have not been validated in the general population, this study aimed to investigate their reliability and validity in a large general population sample and to generate normative data.

Methods

A nationally representative face-to-face household survey was conducted in Germany in 2006. The survey questionnaire consisted of the PHQ-4, other self-report instruments, and demographic characteristics.

Results

Of the 5030 participants (response rate = 72.9%), 53.6% were female and mean (SD) age was 48.4 (18.0) years. The sociodemographic characteristics of the study sample closely match those of the total populations in Germany as well as those in the United States. Confirmatory factor analyses showed very good fit indices for a two-factor solution (RMSEA .027; 90% CI .023–.032). All models tested were structurally invariant between different age and gender groups. Construct validity of the PHQ-4, PHQ-2, and GAD-2 was supported by intercorrelations with other self-report scales and with demographic risk factors for depression and anxiety. PHQ-2 and GAD-2 scores of 3 corresponded to percentile ranks of 93.4% and 95.2%, respectively, whereas PHQ-2 and GAD-2 scores of 5 corresponded to percentile ranks of 99.0% and 99.2%, respectively.

Limitation

A criterion standard diagnostic interview for depression and anxiety was not included.

Conclusions

Results from this study support the reliability and validity of the PHQ-4, PHQ-2, and GAD-2 as ultra-brief measures of depression and anxiety in the general population. The normative data provided in this study can be used to compare a subject's scale score with those determined from a general population reference group.
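
As a minimal sketch of how such a comparison might look, the Python snippet below sums item responses into PHQ-2, GAD-2, and PHQ-4 scores and looks up the percentile ranks quoted in the Results above. The 0–3 item coding is the standard PHQ response format, which this abstract does not spell out. The function names and the table layout are hypothetical, and only the four percentile ranks reported above are included; the full age- and gender-specific norms are not reproduced here.

```python
from typing import Dict, Optional, Sequence

# Percentile ranks quoted in the abstract for the general population sample.
# Scores that the abstract does not report are deliberately left out.
PERCENTILE_RANKS: Dict[str, Dict[int, float]] = {
    "PHQ-2": {3: 93.4, 5: 99.0},
    "GAD-2": {3: 95.2, 5: 99.2},
}


def _sum_two_items(items: Sequence[int]) -> int:
    """Sum two item responses, each assumed to be coded 0-3."""
    if len(items) != 2 or any(i not in (0, 1, 2, 3) for i in items):
        raise ValueError("expected two item responses coded 0-3")
    return items[0] + items[1]


def score_phq4(depression_items: Sequence[int],
               anxiety_items: Sequence[int]) -> Dict[str, int]:
    """Return the PHQ-2, GAD-2, and composite PHQ-4 sum scores."""
    phq2 = _sum_two_items(depression_items)
    gad2 = _sum_two_items(anxiety_items)
    return {"PHQ-2": phq2, "GAD-2": gad2, "PHQ-4": phq2 + gad2}


def percentile_rank(scale: str, score: int) -> Optional[float]:
    """Look up a reported percentile rank; None if the abstract gives none."""
    return PERCENTILE_RANKS.get(scale, {}).get(score)
```

For example, `score_phq4((1, 2), (0, 3))` yields a PHQ-2 score of 3, and `percentile_rank("PHQ-2", 3)` returns the 93.4 reported above; a score without a published value in this abstract returns `None` rather than an interpolated guess.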

Introduction

Depression and anxiety are among the most prevalent and disabling conditions in Western societies, and their burden on the individual and society is tremendous (Demyttenaere et al., 2004, Kessler et al., 1994, Leon et al., 1995). Ultra-brief self-report screening instruments for depression and anxiety have been developed and validated to improve physicians' average detection rates, which currently fall below 50% (Ansseau et al., 2004, Löwe et al., 2003, Löwe et al., 2004a), with only minimal additional burden. Several treatment guidelines now provide evidence-based recommendations regarding screening adults for depression in clinical practices that have systems in place to assure accurate diagnosis, effective treatment, and follow-up (National Institute for Health and Clinical Excellence, 2004, U.S. Preventive Services Task Force, 2002). In contrast to the availability of ultra-short depression screeners (Mitchell and Coyne, 2007), to our knowledge, only one ultra-brief screening scale for anxiety has been published (Kroenke et al., 2007). Although not yet included in treatment guidelines, screening for anxiety was recently suggested as a necessary first step in improving outcomes in patients with anxiety disorders (Katon and Roy-Byrne, 2007).

Ultra-short screening tools are typically defined as measures with 1–4 items, requiring less than 2 min to complete (Mitchell and Coyne, 2007). Results from two recent meta-analyses and a comparative study suggest that ultra-short two- or three-question tests perform better than single-item screeners in depression screening, identifying approximately 80% of the cases (Corson et al., 2004, Gilbody et al., 2007, Mitchell and Coyne, 2007). The Patient Health Questionnaire-2 (PHQ-2) (Kroenke et al., 2003, Löwe et al., 2005) is the most validated 2-item screener for depression. It is the short version of the 9-item Patient Health Questionnaire (PHQ-9) (Gräfe et al., 2004, Kroenke et al., 2001). The new diagnostic principle of the PHQ-9 was that each of the nine items evaluates the presence of one of the DSM-IV diagnostic criteria of major depressive disorder (Löwe et al., 2004a, Spitzer et al., 1999). The PHQ-2 focuses solely on depressed mood and loss of interest, thereby representing the DSM-IV diagnostic core criteria. Results from a prospective criterion standard study in a sample of 520 medical outpatients suggest that the PHQ-2 has good criterion and convergent validity and is sensitive to change (Löwe et al., 2005). Other studies indicate good criterion validity of the PHQ-2 as a screening tool for major depression in older adults (Li et al., 2007), pregnant and postpartum women (Bennett et al., 2008), patients with coronary artery disease (Thombs et al., 2008), and patients with HIV/AIDS (Monahan et al., 2009). However, while one of the above-mentioned meta-analyses found the PHQ-9 to be as effective as longer clinician-administered instruments, it called for more research to validate the PHQ-2 and to compare its diagnostic abilities to those of the PHQ-9 (Gilbody et al., 2007).

For anxiety, the 2-item Generalized Anxiety Disorder Scale (GAD-2) (Kroenke et al., 2007) was recently published as the short version of the 7-item Generalized Anxiety Disorder Scale (GAD-7) (Löwe et al., 2008a, Spitzer et al., 2006). With areas under the curve of 0.80 to 0.91 for the four most common anxiety disorders diagnosed with a criterion standard interview, a recent validation study of 965 primary care patients indicated good criterion validity of the GAD-2.

Despite the promising operating characteristics of the PHQ-2 and the GAD-2, as well as their potential usefulness for medical care and research, neither of the ultra-brief scales has been validated in the general population. Normative data from the general population, which would allow the interpretation of individual PHQ-2 and GAD-2 scores, are also not available.

Our study aims to establish the reliability and validity of, and to provide normative data for, the PHQ-2, the GAD-2, and their composite measure, the 4-item Patient Health Questionnaire-4 (PHQ-4) (Kroenke et al., 2009), in a large and representative sample from the general population. First, we investigated the item characteristics, reliability, and factorial structure, including factorial invariance for different age and gender groups. Second, construct validity of the PHQ-2, GAD-2, and PHQ-4 was assessed in the general population by investigating associations between scale scores, other self-report measures, and well-known demographic risk factors for depression and anxiety. Finally, in order to provide comparative data for the application of these three measures, we generated age- and gender-specific normative data for the PHQ-2, GAD-2, and PHQ-4.
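
To illustrate what a reliability analysis of such a short scale can involve, the sketch below computes Cronbach's alpha from item-level responses. The text available here does not state which reliability coefficient was used, so the choice of alpha is an assumption made for illustration, and the function name and input layout are hypothetical.

```python
from statistics import pvariance
from typing import Sequence


def cronbach_alpha(responses: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items table of scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    Illustrative only: not taken from the study's own analysis code.
    """
    k = len(responses[0])
    if k < 2 or any(len(row) != k for row in responses):
        raise ValueError("need at least two items and equal-length rows")
    # Variance of each item across respondents.
    item_variances = [pvariance([row[j] for row in responses]) for j in range(k)]
    # Variance of the summed scale score across respondents.
    total_variance = pvariance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total score has zero variance")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)
```

With only two items per subscale, alpha depends solely on the correlation between the two items, which is one reason reliability coefficients for ultra-brief scales are usually read alongside factor-analytic evidence.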

Study design and participants

The validation and standardization of the PHQ-4 in the general population was part of a nationally representative face-to-face household survey conducted in Germany. This survey was also used to provide normative data for the 7-item Generalized Anxiety Disorder Scale (GAD-7) (Löwe et al., 2008a). Within this survey, the study participants were interviewed using a structured self-report questionnaire. The survey was carried out in two waves between May 5 and June 8, 2006 by a total of 231 (first

Sample characteristics

From 8106 valid addresses, 1199 persons (14.8%) were not at home at the time of the three visits of the interviewers, 1806 persons refused to participate (22.3%), and 65 persons (0.8%) were not able to complete the study questionnaire due to severe illness. A total of 5036 persons agreed to participate, provided verbal informed consent, and completed the study questionnaire. Response rate among all subjects met by the interviewers was 72.9% (5036/6907) while participation rate among all

Discussion

The findings from this study, which included more than 5000 subjects, suggest that an ultra-brief 4-item measure can reliably and validly measure depression and anxiety in the general population. While preliminary data on the validity of the PHQ-4 and its two subscales (the PHQ-2 for depression and the GAD-2 for anxiety) in clinical samples were previously available (Kroenke et al., 2009, Löwe et al., 2005), this is the first study to provide evidence for the reliability and validity of the

Role of funding source

The study was funded by the Friedrich-Ebert-Stiftung, Germany. The funding source had no role in designing the study, in the collection, analysis, and interpretation of data, in the writing of the report, or in the decision to submit the paper for publication.

Conflict of interest

The authors have no conflicts of interest in connection with this paper.

Acknowledgements

We thank Stefanie Müller, MA, who assisted with data analyses, and we thank all subjects for participating in our study.

References (58)

  • W. Rief et al. Base rates for panic and depression according to the Brief Patient Health Questionnaire: a population-based study. J. Affect. Disord. (2004)
  • Diagnostic and Statistical Manual of Mental Disorders
  • J.L. Arbuckle. Amos 16.0 User's Guide
  • I.M. Bennett et al. Efficiency of a two-item pre-screen to reduce the burden of depression screening in pregnancy and postpartum: an IMPLICIT network study. J. Am. Board Fam. Med. (2008)
  • P.M. Bentler et al. Significance tests and goodness of fit in the analysis of covariance structures. Psychol. Bull. (1980)
  • J.M. Bland et al. Cronbach's alpha. BMJ (1997)
  • B. Bracken et al. State of the art procedures for translating, validating and using psychoeducational tests in cross-cultural assessment. Sch. Psychol. Int. (1991)
  • R.M. Carter et al. One-year prevalence of subthreshold and threshold DSM-IV generalized anxiety disorder in a nationally representative sample. Depress. Anxiety (2001)
  • D.A. Clark et al. Common and specific dimensions of self-reported anxiety and depression: implications for the cognitive and tripartite models. J. Abnorm. Psychol. (1994)
  • J. Cohen. Statistical power analysis for the behavioral sciences (1988)
  • K. Corson et al. Screening for depression and suicidality in a VA primary care setting: 2 items are better than 1 item. Am. J. Manag. Care (2004)
  • K. Demyttenaere et al. Prevalence, severity, and unmet need for treatment of mental disorders in the World Health Organization World Mental Health Surveys. JAMA (2004)
  • D. Ferring et al. Measurement of self-esteem: findings regarding reliability, validity, and stability of the Rosenberg Scale. Diagnostica (1996)
  • S. Gilbody et al. Screening for depression in medical settings with the Patient Health Questionnaire (PHQ): a diagnostic meta-analysis. J. Gen. Intern. Med. (2007)
  • K. Gräfe et al. Screening for psychiatric disorders with the Patient Health Questionnaire (PHQ). Results from the German validation study. Diagnostica (2004)
  • G. Henrich et al. Questions on life satisfaction: a short measure for assessing quality of life. Eur. J. Psychol. Assess. (2000)
  • L. Hu et al. Cutoff criteria for fit indexes in covariance structure analysis: conventional criteria versus new alternatives. Struct. Equ. Modeling (1999)
  • R.M. Kaplan et al. Psychological testing
  • W. Katon et al. Anxiety disorders: efficient screening is the first step in improving outcomes. Ann. Intern. Med. (2007)