This is Part 2 in the 4-part series on plausibility-testing and measuring ROI of wellness-sensitive medical events. No vendors or consultants are being “outed” in this posting, so if you read TSW for the shock value, you’ll be disappointed. But of course you don’t do that — you read TSW to gain insight and knowledge. Yeah, right, and you used to subscribe to Playboy for the articles.
In the previous installment, which should be reviewed prior to reading this one, we listed the ICD9s and ICD10s used to identify and measure wellness-sensitive medical events. You want to count the number of ER visits and inpatient stays across these diagnoses, the idea being that this total should be low and/or declining, if indeed wellness (and disease management) are accomplishing anything.
This total is never going to fall to zero, because people will always be slipping through the care and especially self-care cracks, but the best-performing health plans and employers can manage the total down to 10-15 visits and stays per 1,000 covered people per year. To put this in perspective, incurring only 10 ER and IP claims a year per 1,000 covered people for wellness-related events is a great accomplishment, given that you have about 250 ER and IP claims per 1,000 covered people for all causes combined. At 10 to 15 events per 1,000, that would mean only about 5% of your ER and IP claims are wellness-sensitive. If hospital and ER spending is about 40% of your total spending, that would mean your spending on events theoretically avoidable by wellness programs represents about 2% of your total spending. (So much for the CDC’s rant that 86% of your claims are associated with chronic disease. This from the people who are head-scratchingly alarmed by the “arresting fact” that “chronic disease is responsible for 7 out of every 10 deaths.” And yet these guys somehow wiped out polio…)
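For anyone who wants to check the arithmetic, here is a minimal sketch in Python. The inputs are the round numbers from the paragraph above. The function name is made up for illustration, and the calculation quietly assumes that a wellness-sensitive event costs about the same as the average ER or IP claim, which is a simplification.

```python
# Round numbers from the paragraph above.
wellness_events_per_1000 = 10      # wellness-sensitive ER + IP events per 1,000 covered
all_cause_events_per_1000 = 250    # all-cause ER + IP claims per 1,000 covered
hospital_er_share_of_spend = 0.40  # hospital and ER spending as a share of total spending


def wellness_sensitive_share(ws_per_1000, all_per_1000, hosp_share):
    """Return (share of ER/IP claims, share of total spending).

    Assumes (hypothetically) that a wellness-sensitive event costs about
    as much as the average ER/IP claim.
    """
    claim_share = ws_per_1000 / all_per_1000
    return claim_share, claim_share * hosp_share


claim_share, spend_share = wellness_sensitive_share(
    wellness_events_per_1000, all_cause_events_per_1000, hospital_er_share_of_spend
)
print(f"{claim_share:.1%} of ER/IP claims, {spend_share:.1%} of total spending")
```

At 10 events per 1,000 this works out to 4% of claims and 1.6% of spending; at 15 it is 6% and 2.4%. Hence "about 5%" and "about 2%."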
When you count these codes, there are a number of mistakes you could make, but shouldn’t, if you follow this checklist. It’s really very easy, meaning that many mistakes are the result of overthinking the analysis.
Think of it this way: if you were estimating a birth rate, you wouldn’t look at the participants in your prenatal program, or count how many women made appointments with obstetricians. You’d simply tally the number of babies born and divide that figure by the number of people you cover. Each potential mistake on this list is avoidable by keeping that example in mind.
I’ve got a little list
- Count events, not people (two discharges for one person count the same as one discharge each for two people), and do not take into account whether people were in a disease management or wellness program.
- Do not count people for whom you are secondary payer.
- If someone has an event straddling the year-end, count them in the year of discharge
- Don’t be concerned with taking out false positives; they will “wash”
- If someone is transferred and has an applicable primary diagnosis both times, they count twice. (This should happen automatically.)
- If someone has (for example) a heart attack and an angina attack in one hospitalization, only the primary code counts
- Admissions following discharges count separately if they generate two different claims forms
- Interim submissions of claims or claims submissions replaced by other claims submissions should only be counted once (since they represent only one hospital stay)
- Admissions made through the ER, of course, do not count as ER visits
- Claims may include facility and professional. Remember to only count facility and not professional claims – otherwise it is double-counting
- Urgent care is not the same as ER. ER includes just (1) ER PLACE OF SERVICE and (2) OBSERVATION DAYS.
- All ACUTE CARE hospital admissions count, including <24 hours, and EXCLUDING observation days, which we count with ER.
- Use allowed claims, not paid claims
- Fiscal year or Calendar year is fine — most people use fiscal year
- Be careful that your case-finding algorithm recognizes that an IP admission through the ER sometimes starts the day after the ER visit (for example, when the patient arrives at night).
- Go back as many years as is conveniently trackable. The more years you go back, the more insight you will glean from the analysis.
- For ER discharges, include all submissions whether non-emergent or emergent
- Do NOT count members >65 in the “commercial” category even if you are primary-pay. (That would mess up your comparisons.)
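Several items on the checklist translate almost directly into a filter-and-tally routine. What follows is only a sketch under stated assumptions, not anyone's production logic: the `Claim` fields, the `superseded` flag and the setting labels are hypothetical stand-ins for whatever your claims extract actually contains, and `WS_PREFIXES` is a tiny illustrative subset of the ICD10 families from the previous installment.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date, timedelta

# Illustrative subset of the ICD10 families from the previous installment.
WS_PREFIXES = ("I21", "I50", "J44")


@dataclass
class Claim:  # hypothetical layout; map your own extract onto it
    member_id: str
    claim_type: str   # "facility" or "professional"
    setting: str      # "IP", "ER", "OBS" (observation) or "URGENT"
    primary_dx: str   # primary diagnosis only; secondary codes are ignored
    start: date       # admission or visit date
    end: date         # discharge date
    primary_payer: bool = True
    member_age: int = 40
    superseded: bool = False  # interim or replaced submission


def countable(c: Claim) -> bool:
    """Facility claim, primary payer, not >65, final, IP/ER/OBS, wellness-sensitive."""
    return (c.claim_type == "facility" and c.primary_payer
            and c.member_age <= 65 and not c.superseded
            and c.setting in ("IP", "ER", "OBS")
            and c.primary_dx.upper().startswith(WS_PREFIXES))


def count_events(claims):
    """Tally events (not people) by (discharge year, "IP" or "ER")."""
    ip_starts = {}
    for c in claims:
        if c.claim_type == "facility" and c.setting == "IP" and not c.superseded:
            ip_starts.setdefault(c.member_id, []).append(c.start)
    counts = Counter()
    for c in filter(countable, claims):
        if c.setting == "IP":
            counts[(c.end.year, "IP")] += 1
        # An ER or observation claim followed by an admission the same day
        # or the next day is treated as part of that admission.
        elif not any(timedelta(0) <= s - c.start <= timedelta(days=1)
                     for s in ip_starts.get(c.member_id, [])):
            counts[(c.end.year, "ER")] += 1  # observation is counted with ER
    return counts


def per_1000(events, covered_lives):
    return 1000 * events / covered_lives


claims = [  # made-up rows for illustration
    Claim("A", "facility", "ER", "I21.4", date(2015, 12, 29), date(2015, 12, 29)),
    Claim("A", "facility", "IP", "I21.4", date(2015, 12, 30), date(2016, 1, 3)),
    Claim("A", "professional", "IP", "I21.4", date(2015, 12, 30), date(2016, 1, 3)),
    Claim("B", "facility", "ER", "J44.1", date(2016, 3, 1), date(2016, 3, 1)),
    Claim("B", "facility", "ER", "J44.1", date(2016, 6, 1), date(2016, 6, 1)),
    Claim("C", "facility", "URGENT", "J44.1", date(2016, 4, 1), date(2016, 4, 1)),
    Claim("D", "facility", "IP", "I50.9", date(2016, 2, 1), date(2016, 2, 4), member_age=70),
    Claim("E", "facility", "OBS", "I50.9", date(2016, 5, 5), date(2016, 5, 6)),
]
counts = count_events(claims)
print(counts, per_1000(sum(counts.values()), 2000))
```

In the made-up data, member A's ER visit is folded into the admission that starts the next day, and that admission lands in 2016 because that is the year of discharge. Member B's two ER visits count as two events. Urgent care, the professional claim and the member over 65 drop out. Note what the sketch does not do: it never looks at program participation, and it makes no attempt to weed out false positives.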
Coming up next time: how to present and interpret your results.
Suppose your family is enjoying dinner one night and your daughter’s cell phone rings. She excuses herself, goes in the other room for a few minutes, comes back out and announces: ‘‘Mom, Dad, I’m going over to Jason’s house tonight to do homework.’’
No doubt you reply, ‘‘Okay, bye. Have a nice time.’’
Ha, ha, good one, Al. Obviously, you don’t say that. You say: ‘‘Wait a second. Who’s Jason? What subject? Are his parents home?’’ Then you call over to the house to make sure that:
- adults answer the phone; and
- the adults who answer the phone do indeed have a son named Jason.
You are applying a “plausibility test” to your daughter’s statement so instinctively that you don’t even think, let alone say: ‘‘Honey, I think we need to test the plausibility of this story.’’ That’s everyday life. Plausibility-testing would be defined as:
Using screamingly obvious parental techniques to check whether your kids are trying to get away with something.
The general definition of plausibility-testing in wellness
Not so in wellness, where employers never test plausibility. (It’s amazing employer families don’t have a higher teen pregnancy rate.) In wellness, plausibility-testing is defined as:
Using screamingly obvious fifth-grade arithmetic to check whether the vendor is trying to get away with something.
You might say: “Hey, I majored in biostatistics and I don’t remember learning about plausibility-testing or seeing that definition.” Well, that’s because until population health came along, plausibility testing didn’t exist because there was no need for it in real grownup-type biostatistics. In real biostatistics studies, critics could “challenge the data.” They could show how the experiment was designed badly, was contaminated, had confounders, had investigator bias, etc. and therefore the conclusion should be thrown out.
The best example might be The Big Fat Surprise, by Nina Teicholz, in which she systematically eviscerates virtually every major study implicating saturated fat as a major cause of heart attacks, and raises the spectre of sugar as the main culprit. This was two years before it was discovered that the Harvard School of Public Health had indeed been paid off by the sugar lobby to do exactly what she had inferred they were doing.
What makes wellness uniquely suited to plausibility-testing is that, unlike Nina, you aren’t objecting to the data or methods, as in the case of every other debate about research findings. Rather, in wellness plausibility-testing, you typically accept the raw data and methods, but then observe that they prove exactly the opposite of what the wellness promoter intended. You do this even though the raw data and methods are usually suspect as well. For instance, dropouts are not only uncounted and unaccounted for in almost all wellness data; with the exception of Iver Juster poking the HERO bear in its own den, their existence is generally not even acknowledged. As an Argentinian would say, they’ve been disappeared.
Flunking plausibility is part of wellness industry DNA, the hilarity of which has been covered at length on this site, as recently as last week with (you guessed it) Ron Goetzel. I did have to give him some credit this time, though: usually a plausibility test requires 5 minutes to demonstrate he proved the opposite of what he intended to prove. This time it took 10.
And of course the best example was Wellsteps, where all you had to do was add up their own numbers to figure out they harmed Boise’s employees. You didn’t have to “challenge the data,” by saying they omitted non-participants and dropouts, that many people would likely have cheated on the weigh-ins etc. All those would be true, but they wouldn’t face-invalidate the conclusion the way that plausibility test did.
The specific definition of plausibility-testing using wellness-sensitive medical admissions
All of what you are about to read below, plus the story about Jennifer (which ends happily — it turned out Jason was home, they did do homework…and later on they got married and had kids of their own, whose plausibility they routinely check), is covered in Chapter 2 in Why Nobody Believes the Numbers. This adds the part about the ICD10s.
There is also a very specific plausibility test, in which you contrast reductions in wellness-sensitive medical event diagnosis codes with vendor savings claims, to see if they bear any relationship to each other. The idea, as foreign as it may seem to wellness vendors, is that if you are running a program designed to reduce wellness-sensitive hospitalizations and ER visits, you should actually reduce wellness-sensitive hospitalizations and ER visits. Hence that is what you measure. Oh, I know it sounds crazy but it just might work.
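To make the contrast concrete, here is a rough sketch of the comparison. Every number in it is invented for illustration, and `implied_savings` and `cost_per_event` are hypothetical names and assumptions rather than anything prescribed in the book. The idea is simply to ask how much money the observed drop in events could possibly account for, and set that beside what the vendor claims.

```python
def implied_savings(baseline_per_1000, current_per_1000, covered_lives, cost_per_event):
    """Dollars that the drop in wellness-sensitive events could explain."""
    avoided_events = (baseline_per_1000 - current_per_1000) * covered_lives / 1000
    return avoided_events * cost_per_event


# All figures below are made up for illustration.
covered_lives = 10_000
implied = implied_savings(
    baseline_per_1000=12.0,   # events per 1,000 before the program
    current_per_1000=11.0,    # events per 1,000 after
    covered_lives=covered_lives,
    cost_per_event=15_000,    # assumed average allowed cost of an ER/IP event
)
vendor_claimed = 1_000_000    # savings the (hypothetical) vendor reports

print(f"Event reduction could explain ${implied:,.0f}")
print(f"Vendor claims ${vendor_claimed:,.0f}, or {vendor_claimed / implied:.1f}x that")
```

If the vendor's figure is several times what the avoided events could be worth, the two numbers do not bear much relationship to each other, which is the whole point of the test.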
And it’s not just us. The Validation Institute requires this specific analysis for member-facing organizations. It was adopted for a major Health Affairs case study on wellness (that didn’t get any attention because it showed wellness loses money even when a population is head-scratchingly unhealthy to begin with). And even the Health Enhancement Research Organization supported this methodology, before they realized that measuring validly was only a good strategy if you wanted to show losses.
Quizzify plausibility-tests its results in this manner and guarantees improvements, but because Quizzify reduces many more codes than just wellness-sensitive ones, the list of diagnosis codes below would be much-expanded. But the concept is the same.
The remainder of this post and (barring a “news” event in the interim) the next three postings will show how to do a plausibility test. Today we’ll start with which codes to look at. Part 2 will be how to avoid common mistakes. Then we’ll cover how to compare your results to benchmarks. Finally, we’ll show how to estimate the “savings” and ROI.
Codes to be used in a plausibility test
Start by identifying codes that are somewhat closely associated with lifestyle-related conditions and/or can be addressed through disease management. These are the ones where, in theory at least, savings can be found. Here are some sample ICD9s and ICD10s. In order to save space since this source data doesn’t reproduce well in WordPress, I can’t put the codes next to the conditions. Instead, I’ll stack ’em in the following order:
- CHF and other lifestyle cardio-related events
ICD9s are stacked in the same order:
|493.xx (excluding 493.2x*)|
|491.xx, 492.xx, 493.2x, 494.xx, 496.xx, 506.4x|
|410, 411, 413, 414 (all .xx)|
|249, 250, 251.1x, 252.2x, 357.2x, 362, 366.41, 681.1x, 682.6, 682.7, 785.4x, 707, 731.8x|
|398.90, 398.91, 398.99, 402.01, 402.11, 402.91, 404.01, 404.03, 404.11, 404.13, 404.91, 404.93, 422.0, 422.9x, 425.xx, 428.xx, 429.xx|
ICD10s, ditto in order, are:
|J40, J41, J42, J43, J44, J47, J68.4|
|I20, I21, I22, I23, I24, I25.1, I25.5, I25.6, I25.7|
|E08, E10, E11.0-E11.9, E16.1, E16.2, E08.42, E09.42, E10.42, E11.42, E13.42, E08.36, E09.36, E10.36, E11.311, E11.319, E11.329, E11.339, E11.349, E11.359, E11.36, E13.36, L03.119, L03.129, I96, E09.621, E09.622, E11.621, E11.622, E13.621, E13.622, L97|
|I50, I10, I11, I12, I13|
The ICD9s and ICD10s are not a perfect match for each other. If ICD10s matched ICD9s, there would be no need for ICD10s. If you try to construct an events trendline crossing October 1, 2015, when the ICD10s were adopted, you might find a bump. More on that another time.
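If you are wondering how the "xx" wildcards and the exclusion in the first row might be applied to a primary diagnosis code, here is one minimal way to sketch it. The helper names `to_prefix` and `matches` are invented for this example, and treating each listed code as a prefix is an assumption on my part, so check it against your own coding staff's reading of the list. The sketch also assumes you already know whether a given claim is coded in ICD9 or ICD10, and it does not handle ranges like E11.0-E11.9.

```python
def to_prefix(pattern: str) -> str:
    """Turn a listed code like '493.xx' or 'J68.4' into a bare prefix ('493', 'J684')."""
    return pattern.strip().rstrip("x").replace(".", "").upper()


def matches(code: str, include, exclude=()) -> bool:
    """True if the primary diagnosis falls under an included family and no excluded one."""
    bare = code.strip().replace(".", "").upper()
    included = bare.startswith(tuple(to_prefix(p) for p in include))
    excluded = bare.startswith(tuple(to_prefix(p) for p in exclude))
    return included and not excluded


# First ICD9 row above: 493.xx excluding 493.2x
ICD9_ROW_1 = (["493.xx"], ["493.2x"])
# First ICD10 row above
ICD10_ROW_1 = ["J40", "J41", "J42", "J43", "J44", "J47", "J68.4"]

print(matches("493.90", *ICD9_ROW_1))   # falls under 493.xx
print(matches("493.20", *ICD9_ROW_1))   # carved out by the 493.2x exclusion
print(matches("j44.1", ICD10_ROW_1))    # case and dots are normalized first
```

Prefix matching is deliberately loose. It will sweep in the occasional false positive, which, per the checklist in the next installment, should wash out over time.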
Coming up next: So now that you have these ICD9s, what do you do with them?