Disability insurance error rates and gender differences
Disability insurance programmes around the world provide income replacement and medical benefits to workers who face major health shocks – strokes, musculoskeletal disorders, mental health problems, etc. – that affect their ability to work. In the US and elsewhere, these programmes have grown substantially over the last 35 years, both in participation (applicants and beneficiaries) and expenditure.
The rapid growth of these programmes raises a number of important policy questions: Is the growth of the programmes fiscally sustainable? Can alternative programmes be devised to keep people with moderate disabilities at work or to ease the transition of current beneficiaries back into employment? How effective are the screening and re-assessment processes for determining who is (or still is) disabled?
Assessing the effectiveness of the screening process involves two distinct questions: first, to what extent does the programme accept applicants who are not truly work limited (a Type II error, accepting a hypothesis that is actually false); and second, to what extent does the programme reject applicants who are truly work limited (a Type I error, rejecting a hypothesis that is actually true)? These raise the further question of whether errors differ across groups in society.
The effectiveness of the screening process depends on the ability to observe the ‘true’ health of an applicant. While some disabilities are relatively easy to verify through tests and diagnostic exams, the severity of others (back pain, depression, etc.) is harder to assess. This means that some individuals who would not be classified as disabled in a world with perfect information may be awarded benefits in a world in which the signal is noisy. In the media, this is known as the ‘cheaters problem’ or, in less colourful language, as the moral hazard aspect of disability insurance.
There is plenty of research that quantifies the importance of this type of error and the implied incentive effects of disability insurance on the employment rate of low-skilled workers (e.g. French and Song 2014, Maestas et al. 2013).
However, scant attention has been paid to the opposite type of screening error: turning down disability insurance applications from people who are, in fact, truly disabled. And there is no research on whether these screening errors tend to be disproportionately larger among certain groups in the population (e.g. people with hard-to-verify conditions, minorities). Our recent work (Low and Pistaferri 2019) is an attempt to study these issues and, in particular, to identify the extent of this type of error and to show how it differs by gender.
We base our work on data from the US Health and Retirement Study (HRS), which collects extensive information on the health of participants, merged with administrative records from the Social Security Administration (SSA), which contain information on disability insurance applications, including the type of condition, the outcome of the application, and the reason for denial (if any).
To define false rejections (Type I errors), we obtain information on who applied and was denied (from the SSA records), and the ‘true’ disability status of an applicant (from the HRS). Our assumption is that SSA observes the true disability status with noise, which is the source of screening errors. The SSA definition of disability is:
Inability to engage in any substantial gainful activity … by reason of any medically determinable physical or mental impairment, which can be expected to result in death, or which has lasted, or can be expected to last, for a continuous period of at least 12 months.
This definition emphasises that a disability insurance award should be for a disability that is severe, persistent, and affecting the ability to work. The SSA screening process rejects applicants either because they lack a severe, persistent medical condition or because they are deemed to have residual functional capacity for work.
We searched the HRS for survey questions that replicate this definition as closely as possible, and we classify as ‘truly work limited’ an individual who answers:
- ‘Yes’ to the question “Do you have any impairment or health problem that limits the kind or amount of paid work you could do?”
- ‘No’ to the question “Is this a temporary condition that will last for less than three months?”
- ‘Yes’ to the question “Does this limitation keep you from working altogether?”
Our implicit assumption is that a denial of an HRS disability insurance applicant who classifies themselves as disabled (as per the answers to these three questions) is an instance of incorrect rejection (Type I error), and, symmetrically, that an award given to an applicant who classifies themselves as non-disabled is an instance of incorrect acceptance (Type II error).
Using these definitions, we calculate that the baseline false rejection rate (Type I error) is 55% and the baseline false acceptance rate (Type II error) is 29%. The extent of these errors, and the implied inefficiency of the insurance system, confirms earlier structural estimates of these errors in Low and Pistaferri (2015).
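The classification rule and the two error rates described above can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the field names are hypothetical, the records are made up, and the sketch assumes that each rate is computed conditional on self-reported status (denials among the truly work limited, awards among everyone else).

```python
from dataclasses import dataclass


@dataclass
class Applicant:
    # Hypothetical field names; the HRS and SSA files use their own variable codes.
    limits_work: bool    # "Yes" to: health problem limits kind or amount of paid work
    temporary: bool      # "Yes" to: temporary condition lasting less than three months
    prevents_work: bool  # "Yes" to: limitation keeps you from working altogether
    awarded: bool        # SSA decision: True = award, False = denial


def truly_work_limited(a: Applicant) -> bool:
    """Apply the three-question rule from the text: Yes / No / Yes."""
    return a.limits_work and not a.temporary and a.prevents_work


def error_rates(applicants):
    """Return (false_rejection_rate, false_acceptance_rate).

    Assumption: each rate is conditional on self-reported status.
    Both groups must be non-empty, otherwise the division fails.
    """
    disabled = [a for a in applicants if truly_work_limited(a)]
    not_disabled = [a for a in applicants if not truly_work_limited(a)]
    type_1 = sum(not a.awarded for a in disabled) / len(disabled)
    type_2 = sum(a.awarded for a in not_disabled) / len(not_disabled)
    return type_1, type_2


# Made-up records for illustration only; not HRS or SSA data.
sample = [
    Applicant(True, False, True, False),    # truly limited, denied  -> Type I error
    Applicant(True, False, True, True),     # truly limited, awarded -> correct
    Applicant(True, True, True, True),      # temporary, awarded     -> Type II error
    Applicant(False, False, False, False),  # not limited, denied    -> correct
]
false_rejection, false_acceptance = error_rates(sample)
```

The choice of denominator matters: a rate conditional on true status answers "how often is a disabled applicant turned down?", which is a different question from "what share of all denials are mistaken?".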
In Low and Pistaferri (2019), we show that the probability of being rejected when truly disabled varies with a host of observable characteristics. The effect of some variables is as expected: incorrect denials are more frequent among applicants with harder-to-detect disabilities (such as back pain or mental health problems) and among younger applicants.
But the most worrisome and robust finding is that truly disabled women are 20 percentage points more likely to be incorrectly rejected than observationally equivalent men, i.e. after accounting for demographics, labour market history, and health status (Figure 1). None of the robustness checks we run changes this conclusion.
Figure 1 ‘Incorrect rejection’ (Type I error) share by condition and gender
Note: The vertical green lines indicate 95% confidence intervals.
We consider various explanations for this striking gender difference. The first is that women are genuinely in better health than men when applying. The second is that they have a lower threshold for pain than men, and SSA ‘sees through’ the application. The third is that they have lower opportunity costs of applying for disability insurance. These are all channels where demand for insurance differs.
We also consider a supply channel, namely the possibility that SSA sets a higher threshold for women when evaluating who is work-limited and who is not.
To distinguish between these explanations, we use information on who applies and on who self-reports a disability from the HRS. We also make use of a special disability vignettes module in the HRS, in which survey respondents are asked to assess the work-limitation status of hypothetical individuals (or ‘vignettes’). The gender of the vignettes presented to responders is randomised, allowing us to explore whether men and women applicants are evaluated differently.
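The logic of using randomised vignette gender can be illustrated with a simple comparison of ratings. This is a minimal sketch under stated assumptions, not the estimation carried out in the paper: the ratings are made up, and the binary "rated as work limited" coding is a hypothetical simplification of the vignette responses.

```python
from statistics import mean

# Made-up vignette ratings for illustration only; not HRS data.
# Each tuple: (gender assigned to the vignette, 1 if the respondent
# rated the hypothetical person as work limited, else 0).
ratings = [
    ("female", 1), ("female", 0), ("female", 0), ("female", 1),
    ("male", 1), ("male", 1), ("male", 0), ("male", 1),
]


def share_rated_limited(records, gender):
    """Share of vignettes of the given gender rated as work limited."""
    return mean(r for g, r in records if g == gender)


# Because vignette gender is randomised, both groups describe the same
# conditions on average, so a gap can be attributed to the vignette's
# gender rather than to differences in the described health problem.
gap = share_rated_limited(ratings, "male") - share_rated_limited(ratings, "female")
```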
This turns out to be the case: men and women applicants are evaluated differently, whereas we find no evidence that demand for insurance differs by gender. Our conclusion is that the most likely explanation of our findings is the supply channel, i.e. SSA setting higher admission standards for women (or alternatively, receiving noisier signals).
What is the source of the larger false rejection rate for women? We verify that the incorrect rejections do not arise because SSA classifies women as not suffering from a given medical condition; they arise because SSA concludes that disabled women are more likely than men to still have functional work capacity.
An implicit test of whether the rejection is truly an error is to look at the employment records of ‘correctly rejected’ women (i.e. those who classified themselves as non-disabled from the HRS questions) and ‘incorrectly rejected’ women (those who classified themselves as disabled) three to five years after receiving the denial decision. We find that the average employment rate of the correctly rejected women is 19%, but it is only 2% among wrongly rejected women, which is comparable to the employment rate of women who were awarded disability insurance.
We do not find an analogous result when we perform the tests among men; neither do we find any difference when conducting a placebo test and comparing the same women five to ten years before they apply for disability insurance: future ‘incorrectly rejected’ women work at the same rate as ‘correctly rejected’ women (and as much as those awarded disability insurance) in the period five to ten years prior to application.
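The employment check and its placebo counterpart amount to comparing group employment rates in two time windows. The sketch below shows that comparison on made-up records; the field names and window labels are hypothetical, and it does not reproduce the authors' calculations.

```python
from statistics import mean

# Made-up records for illustration only; not HRS or SSA data.
# Each dict: rejection group, plus employment indicators (1 = working)
# in a window before application ("pre") and after the denial ("post").
rejected_women = [
    {"group": "correct", "pre": 1, "post": 1},
    {"group": "correct", "pre": 1, "post": 0},
    {"group": "incorrect", "pre": 1, "post": 0},
    {"group": "incorrect", "pre": 1, "post": 0},
]


def employment_rate(records, group, window):
    """Average employment indicator for one rejection group in one window."""
    return mean(r[window] for r in records if r["group"] == group)


def employment_gap(records, window):
    """Correctly rejected minus incorrectly rejected employment rate."""
    return (employment_rate(records, "correct", window)
            - employment_rate(records, "incorrect", window))


post_gap = employment_gap(rejected_women, "post")  # main comparison after denial
pre_gap = employment_gap(rejected_women, "pre")    # placebo comparison before application
```

A positive gap after the denial combined with a gap near zero before application is the pattern the text describes: the two groups look alike until the health shock, and diverge only afterwards.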
Conclusions and implications
One surprising thing about the disability insurance application forms is the amount of information on those forms that does not appear relevant to assessing work limitations. For example, the form asks not only about gender but also about marital status. Moreover, for those assessed as having residual functional capacity for work, additional special forms ask whether the applicant takes care of relatives (parents, children, spouse) or pets, handles household chores (shopping, cleaning, cooking), and so forth. It is possible that the societal role assigned to women, who in reality spend more time than men on caregiving and household activities, puts them at a disadvantage by suggesting greater levels of residual capacity.
Our results suggest that an important policy change to consider would be to make disability insurance applications gender-blind. Evidence from other settings shows that gender-blind evaluations of candidates can explain a variety of labour-market outcomes. However, there are two caveats to making the evaluation for disability insurance completely gender-blind: first, some illnesses are readily associated with gender; second, the same illness may create different degrees of incapacity by gender, as recent evidence on gender-based medicine suggests.
Another policy to consider may be the introduction of machine learning algorithms to try to take out human biases in at least some stages of the screening process. Regardless, our finding of substantial gender-based errors raises important policy issues.
French, E, and J Song (2014), “The effect of disability insurance receipt on labor supply”, American Economic Journal: Economic Policy 6(2): 291–337.
Low, H, and L Pistaferri (2019), “Disability insurance: error rates and gender differences”, NBER Working Paper 26513.
Low, H, and L Pistaferri (2015), “Disability insurance and the dynamics of the incentive insurance trade-off”, American Economic Review 105(10): 2986–3029.
Maestas, N, K Mullen and A Strand (2013), “Does disability insurance receipt discourage work? Using examiner assignment to estimate causal effects of SSDI receipt”, American Economic Review 103(5): 1797–829.