Slate Star Codex

In a mad world, all blogging is psychiatry blogging

Prescriptions, Paradoxes, and Perversities

[WARNING: I am not a pharmacologist. I am not a researcher. I am not a statistician. This is not medical advice. This is really weird and you should not take it too seriously until it has been confirmed]

I.

I’ve been playing around with data from Internet databases that aggregate patient reviews of medications.

Are these any good? I looked at four of the largest such databases – Drugs.com, WebMD, AskAPatient, and DrugLib – as well as psychiatry-specific site CrazyMeds – and took their data on twenty-three major antidepressants. Then I correlated them with one another to see if the five sites mostly agreed.

Correlations between Drugs.com, AskAPatient, and WebMD were generally large and positive (around 0.7). Correlations between CrazyMeds and DrugLib were generally small or negative. In retrospect this makes sense, because these two sites didn’t allow separation of ratings by condition, so for example Seroquel-for-depression was being mixed with Seroquel-for-schizophrenia.

So I threw out the two offending sites and kept Drugs.com, AskAPatient, and WebMD. I normalized all the data, then took the weighted average of all three sites. From this huge sample (the least-reviewed drug had 35 ratings, the most-reviewed drug 4,797) I obtained a unified opinion of patients’ favorite and least favorite antidepressants.
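The normalize-then-pool step can be sketched roughly like this. The post doesn't say what weights it used, so treat the weighting-by-review-count scheme and the data layout as my assumptions:

```python
import numpy as np

def pooled_ratings(site_means, site_counts):
    """Z-score each site's drug ratings, then combine with a weighted average.

    site_means:  list of {drug: mean rating} dicts, one per site
    site_counts: list of {drug: number of reviews} dicts, one per site
    (Weighting by review count is an assumption, not necessarily the post's method.)
    """
    drugs = list(site_means[0])
    normed = []
    for means in site_means:
        vals = np.array([means[d] for d in drugs], dtype=float)
        z = (vals - vals.mean()) / vals.std()   # put all sites on a common scale
        normed.append(dict(zip(drugs, z)))
    return {
        d: float(np.average([n[d] for n in normed],
                            weights=[c[d] for c in site_counts]))
        for d in drugs
    }
```

Normalizing first matters because the sites use different scales (Drugs.com is out of 10, CrazyMeds out of 5); averaging raw scores would let the 10-point site dominate.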

This doesn’t surprise me at all. Everyone secretly knows Nardil and Parnate (the two commonly-used drugs in the MAOI class) are excellent antidepressants[1]. Oh, nobody will prescribe them, because of the dynamic discussed here, but in their hearts they know it’s true.

Likewise, I feel pretty good to see that Serzone, which I recently defended, is number five. I’ve had terrible luck with Viibryd, and it just seems to make people taking it more annoying, which is not a listed side effect but which I swear has happened.

The table also matches the evidence from chemistry – drugs with similar molecular structure get similar ratings, as do drugs with similar function. This is, I think, a good list.

Which is too bad, because it makes the next part that much more terrifying.

II.

There is a sixth major Internet database of drug ratings. It is called RateRx, and it differs from the other five in an important way: it solicits ratings from doctors, not patients. It’s a great idea – if you trust your doctor to tell you which drug is best, why not take advantage of wisdom-of-crowds and trust all the doctors?

The RateRx logo. Spoiler: this is going to seem really ironic in about thirty seconds.

RateRx has a modest but respectable sample size – the drugs on my list got between 32 and 70 doctor reviews. There’s only one problem.

You remember patient reviews on the big three sites correlated about +0.7 with each other, right? So patients pretty much agree on which drugs are good and which are bad?

Doctor reviews on RateRx correlated at -0.21 with patient reviews. The negative relationship is nonsignificant, but that just means that at best, doctor reviews are totally uncorrelated with patient consensus.

This has an obvious but very disturbing corollary. I couldn’t get good numbers on how many times each of the antidepressants on my list was prescribed, because the information I’ve seen only gives prescription numbers for a few top-selling drugs, plus we’ve got the same problem of not being able to distinguish depression prescriptions from anxiety prescriptions from psychosis prescriptions. But total number of online reviews makes a pretty good proxy. After all, the more patients are using a drug, the more are likely to review it.

Quick sanity check: the most reviewed drug on my list was Cymbalta. Cymbalta was also the best selling antidepressant of 2014. Although my list doesn’t exactly track the best-sellers, that seems to be a function of how long a drug has been out – a best-seller that came out last year might have only 1/10th the number of reviews as a best-seller that came out ten years ago. So number of reviews seems to be a decent correlate for amount a drug is used.

In that case, amount a drug is used correlates highly (+0.67, p = 0.005) with doctors’ opinion of the drug, which makes perfect sense since doctors are the ones prescribing it. But amount the drug gets used correlates negatively with patient rating of the drug (-0.34, p = ns), which of course is to be expected given the negative correlation between doctor opinion and patient opinion.
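The correlations quoted throughout are plain Pearson coefficients. A minimal implementation, just to make the computation concrete (the p-values in the post would come from the usual t-test on r, which I omit here):

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))
```

So a value like +0.67 means usage and doctor opinion rise together across drugs, while -0.34 means patient ratings tend to fall as usage rises.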

So the more patients like a drug, the less likely it is to be prescribed[2].

III.

There’s one more act in this horror show.

Anyone familiar with these medications reading the table above has probably already noticed this one, but I figured I might as well make it official.

I correlated the average rating of each drug with the year it came on the market. The correlation was -0.71 (p < .001). That is, the newer a drug was, the less patients liked it[3].

This pattern absolutely jumps out of the data. First- and second- place winners Nardil and Parnate came out in 1960 and 1961, respectively; I can’t find the exact year third-place winner Anafranil came out, but the first reference to its trade name I can find in the literature is from 1967, so I used that. In contrast, last-place winner Viibryd came out in 2011, second-to-last place winner Abilify got its depression indication in 2007, and third-to-last place winner Brintellix is as recent as 2013.

This result is robust to various different methods of analysis, including declaring MAOIs to be an unfair advantage for Team Old and removing all of them, changing which minor tricyclics I do and don’t include in the data, and altering whether Deprenyl, a drug that technically came out in 1970 but received a gritty reboot under the name Emsam in 2006, is counted as older or newer.
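Footnote 3's order-based check corresponds to a Spearman (rank) correlation, which gives the same answer whether you use exact release years or just their order. A bare-bones version, assuming no tied values (release years nearly are distinct):

```python
import numpy as np

def spearman_rho(x, y):
    """Rank correlation: convert both series to ranks, then correlate the ranks.
    (No tie handling -- assumes all values are distinct.)"""
    def ranks(a):
        order = np.argsort(np.asarray(a))
        r = np.empty(len(a))
        r[order] = np.arange(len(a))
        return r
    rx, ry = ranks(x), ranks(y)
    rx -= rx.mean()
    ry -= ry.mean()
    return float((rx @ ry) / np.sqrt((rx @ rx) * (ry @ ry)))
```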

So if you want to know what medication will make you happiest, at least according to this analysis your best bet isn’t to ask your doctor, check what’s most popular, or even check any individual online rating database. It’s to look at the approval date on the label and choose the one that came out first.

IV.

What the hell is going on with these data?

I would like to dismiss this as confounded, but I have to admit that any reasonable person would expect the confounders to go the opposite way.

That is: older, less popular drugs are usually brought out only when newer, more popular drugs have failed. MAOIs, the clear winner of this analysis, are very clearly reserved in the guidelines for “treatment-resistant depression”, ie depression you’ve already thrown everything you’ve got at. But these are precisely the depressions that are hardest to treat.

Imagine you are testing the fighting ability of three people via ten boxing matches. You ask Alice to fight a Chihuahua, Bob to fight a Doberman, and Carol to fight Cthulhu. You would expect this test to be biased in favor of Alice and against Carol. But MAOIs and all these other older rarer drugs are practically never brought out except against Cthulhu. Yet they still have the best win-loss record.

Here are the only things I can think of that might be confounding these results.

Perhaps because these drugs are so rare and unpopular, psychiatrists only use them when they have really really good reason. That is, the most popular drug of the year they pretty much cluster-bomb everybody with. But every so often, they see some patient who seems absolutely 100% perfect for clomipramine, a patient who practically screams “clomipramine!” at them, and then they give this patient clomipramine, and she does really well on it.

(but psychiatrists aren’t actually that good at personalizing antidepressant treatments. The only thing even sort of like that is that MAOIs are extra-good for a subtype called atypical depression. But that’s like a third of the depressed population, which doesn’t leave much room for this super-precise-targeting hypothesis.)

Or perhaps once drugs have been on the market longer, patients figure out what they like. Brintellix is so new that the Brintellix patients are the ones whose doctors said “Hey, let’s try you on Brintellix” and they said “Whatever”. MAOIs have been on the market so long that presumably MAOI patients are ones who tried a dozen antidepressants before and stayed on MAOIs because they were the only ones that worked.

(but Prozac has been on the market 25 years now. This should only apply to a couple of very new drugs, not the whole list.)

Or perhaps the older drugs have so many side effects that no one would stay on them unless they’re absolutely perfect, whereas people are happy to stay on the newer drugs even if they’re not doing much because whatever, it’s not like they’re causing any trouble.

(but Seroquel and Abilify, two very new drugs, have awful side effects, yet are down at the bottom along with all the other new drugs)

Or perhaps patients on very rare weird drugs get a special placebo effect, because they feel that their psychiatrist cares enough about them to personalize treatment. Perhaps they identify with the drug – “I am special, I’m one of the only people in the world who’s on nefazodone!” and they become attached to it and want to preach its greatness to the world.

(but drugs that are rare because they are especially new don’t get that benefit. I would expect people to also get excited about being given the latest, flashiest thing. But only drugs that are rare because they are old get the benefit, not drugs that are rare because they are new.)

Or perhaps psychiatrists tend to prescribe the drugs they “imprinted on” in medical school and residency, so older psychiatrists prescribe older drugs and the newest psychiatrists prescribe the newest drugs. But older psychiatrists are probably much more experienced and better at what they do, which could affect patients in other ways – the placebo effect of being with a doctor who radiates competence, or maybe the more experienced psychiatrists are really good at psychotherapy, and that makes the patient better, and they attribute it to the drug.

(but read on…)

V.

Or perhaps we should take this data at face value and assume our antidepressants have been getting worse and worse over the past fifty years.

This is not entirely as outlandish as it sounds. The history of the past fifty years has been a history of moving from drugs with more side effects to drugs with fewer side effects, with what I consider somewhat less than due diligence in making sure the drugs were quite as effective in the applicable population. This is a very complicated and controversial statement which I will be happy to defend in the comments if someone asks.

The big problem is: drugs go off-patent after twenty years. Drug companies want to push new, on-patent medications, and most research is funded by drug companies. So lots and lots of research is aimed at proving that newer medications invented in the past twenty years (which make drug companies money) are better than older medications (which don’t).

I’ll give one example. There is only a single study in the entire literature directly comparing the MAOIs – the very old antidepressants that did best on the patient ratings – to SSRIs, the antidepressants of the modern day[4]. This study found that phenelzine, a typical MAOI, was no better than Prozac, a typical SSRI. Since Prozac had fewer side effects, that made the choice in favor of Prozac easy.

Did you know you can look up the authors of scientific studies on LinkedIn and sometimes get very relevant information? For example, the lead author of this study has a resume that clearly lists him as working for Eli Lilly at the time the study was conducted (spoiler: Eli Lilly is the company that makes Prozac). The second author’s LinkedIn profile shows he is also an operations manager for Eli Lilly. Googling the fifth author’s name links to a news article about Eli Lilly making a $750,000 donation to his clinic. Also there’s a little blurb at the bottom of the paper saying “Supported by a research grant by Eli Lilly and company”, then thanking several Eli Lilly executives by name for their assistance.

This is the sort of study which I kind of wish had gotten replicated before we decided to throw away an entire generation of antidepressants based on the result.

But who will come to phenelzine’s defense? Not Parke-Davis, the company that made it: their patent expired sometime in the seventies, and then they were bought out by Pfizer[5]. And not Pfizer – without a patent they can’t make any money off Nardil, and besides, Nardil is competing with their own on-patent SSRI drug Zoloft, so Pfizer has as much incentive as everyone else to push the “SSRIs are best, better than all the rest” line.

Every twenty years, pharmaceutical companies have an incentive to suddenly declare that all their old antidepressants were awful and you should never use them, but whatever new antidepressant they managed to dredge up is super awesome and you should use it all the time. This sort of does seem like the sort of situation that might lead to older medications being better than newer ones. A couple of people have been pushing this line for years – I was introduced to it by Dr. Ken Gillman from Psychotropical Research, whose recommendation of MAOIs and Anafranil as most effective matches the patient data very well, and whose essay Why Most New Antidepressants Are Ineffective is worth a read.

I’m not sure I go as far as he does – even if new antidepressants aren’t worse outright, they might still trade less efficacy for better safety. Even if they handled the tradeoff well, it would look like a net loss on patient rating data. After all, assume Drug A is 10% more effective than Drug B, but also kills 1% of its users per year, while Drug B kills nobody. Here there’s a good case that Drug B is much better and a true advance. But Drug A’s ratings would look better, since dead men tell no tales and don’t get to put their objections into online drug rating sites. Even if victims’ families did give the drug the lowest possible rating, 1% of people giving a very low rating might still not counteract 99% of people giving it a higher rating.
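The "dead men leave no reviews" point can be illustrated with a toy calculation. All the rating numbers here are invented for illustration, not real data:

```python
# Hypothetical: Drug A cures 10 percentage points more users than Drug B,
# but kills 1% of its users per year; Drug B kills nobody.
# Assume cured users rate the drug 9, uncured users rate it 5,
# and the dead leave no review at all.

def mean_rating(cure_rate, death_rate):
    """Average rating among the users who survive to leave one."""
    survivors = 1.0 - death_rate
    cured = survivors * cure_rate
    uncured = survivors * (1.0 - cure_rate)
    return (cured * 9 + uncured * 5) / survivors

rating_a = mean_rating(cure_rate=0.60, death_rate=0.01)  # deadlier but more effective
rating_b = mean_rating(cure_rate=0.50, death_rate=0.00)  # safe but less effective
```

Drug A's average rating beats Drug B's even if B is the better drug on QALY grounds, because the worst outcomes never enter the denominator: the death rate cancels out of the survivors-only average entirely.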

And once again, I’m not sure the tradeoff is handled very well at all[6].

VI.

In order to distinguish between all these hypotheses, I decided to get a lot more data.

I grabbed all the popular antipsychotics, antihypertensives, antidiabetics, and anticonvulsants from the three databases, for a total of 55,498 ratings of 74 different drugs. I ran the same analysis on the whole set.

The three databases still correlate with each other at respectable levels of +0.46, +0.54, and +0.53. All of these correlations are highly significant, p < 0.01.

The negative correlation between patient rating and doctor rating remains and is now a highly significant -0.344, p < 0.01. This is robust even if antidepressants are removed from the analysis, and is notable in both psychiatric and nonpsychiatric drugs.

The correlation between patient rating and year of release is a no-longer-significant -0.191. This is heterogeneous; antidepressants and antipsychotics show a strong bias in favor of older medications, and antidiabetics, antihypertensives, and anticonvulsants show a slight nonsignificant bias in favor of newer medications. So it would seem like the older-is-better effect is purely psychiatric.

I conclude that for some reason, there really is a highly significant effect across all classes of drugs that makes doctors love the drugs patients hate, and vice versa.

I also conclude that older psychiatric drugs seem to be liked much better by patients, and that this is not some kind of simple artifact or bias, since if such an artifact or bias existed we would expect it to repeat in other kinds of drugs, which it doesn’t.

VII.

Please feel free to check my results. Here is a spreadsheet (.xls) containing all of the data I used for this analysis. Drugs are marked by class: 1 is antidepressants, 2 is antidiabetics, 3 is antipsychotics, 4 is antihypertensives, and 5 is anticonvulsants. You should be able to navigate the rest of it pretty easily.

One analysis that needs doing is to separate out drug effectiveness versus side effects. The numbers I used were combined satisfaction ratings, but a few databases – most notably WebMD – give you both separately. Looking more closely at those numbers might help confirm or disconfirm some of the theories above.

If anyone with the necessary credentials is interested in doing the hard work to publish this as a scientific paper, drop me an email and we can talk.

Footnotes

1. Technically, MAOI superiority has only been proven for atypical depression, the type of depression where you can still have changing moods but you are unhappy on net. But I’d speculate that right now most patients diagnosed with depression have atypical depression, far more than the studies would indicate, simply because we’re diagnosing less and less severe cases these days, and less severe cases seem more atypical.

2. First-place winner Nardil has only 16% as many reviews as last-place winner Viibryd, even though Nardil has been on the market fifty years and Viibryd for four. Despite its observed superiority, Nardil may very possibly be prescribed less than 1% as often as Viibryd.

3. Pretty much the same thing is true if, instead of looking at the year they came out, you just rank them in order from earliest to latest.

4. On the other hand, what we do have is a lot of studies comparing MAOIs to imipramine, and a lot of other studies comparing modern antidepressants to imipramine. For atypical depression and dysthymia, MAOIs beat imipramine handily, but the modern antidepressants are about equal to imipramine. This strongly implies the MAOIs beat the modern antidepressants in these categories.

5. Interesting Parke-Davis facts: Parke-Davis got rich by being the people to market cocaine back in the old days when people treated it as a pharmaceutical, which must have been kind of like a license to print money. They also worked on hallucinogens with no less a figure than Aleister Crowley, who got a nice tour of their facilities in Detroit.

6. Consider: Seminars In General Psychiatry estimates that MAOIs kill one person per 100,000 patient years. A third of all depressions are atypical. MAOIs are 25 percentage points more likely to treat atypical depression than other antidepressants. So for every 100,000 patients you give a MAOI instead of a normal antidepressant, you kill one and cure 8,250 who wouldn’t otherwise be cured. The QALY database says that a year of moderate depression is worth about 0.6 QALYs. So for every 100,000 patients you give MAOIs, you’re losing about 30 QALYs and gaining about 3,300.
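The arithmetic in this footnote, written out step by step (the footnote rounds a third of 100,000 down to about 33,000, which is where 8,250 comes from):

```python
patients = 100_000
deaths = patients * (1 / 100_000)        # MAOI deaths per 100,000 patient-years: 1
atypical = 33_000                        # roughly a third of depressed patients are atypical
extra_cures = atypical * 0.25            # MAOIs cure 25 pp more of them: 8,250
qalys_lost = deaths * 30                 # the footnote prices a death at ~30 QALYs
qalys_gained = extra_cures * (1 - 0.6)   # moderate depression = 0.6 QALYs/year, so curing
                                         # one patient-year gains 0.4 QALYs: ~3,300 total
```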

OT19: Don’t Thread On Me

This is the semimonthly open thread. Post about anything you want, ask random questions, whatever. Also:

1. Comments of the week are Scott McGreal actually reading the supplement of that growth mindset study, and gwern responding to the cactus-person story in the most gwernish way possible.

2. Worthy members of the in-group who need financial help: CyborgButterflies (donate here) and as always the guy who runs CrazyMeds (donate by clicking the yellow DONATE button on the right side here)

3. I offer you a statistical mystery a little closer to home than the ones we usually investigate around here: how come my blog readership has collapsed? The week-by-week chart looks like this:

Notice that the week of February 23rd it falls and has never recovered. In fact, I can pinpoint the specific day:

Between February 20th and February 21, I lost about a third of my blog readership, and they haven’t come back.

Now, I did go on vacation starting February 20 and make fewer posts than normal during that time, but usually when I don’t post for a while I get a very gradual drop-off, whereas here, the day after a relatively popular post, everyone departs all of a sudden. And I’ve been back from vacation for a month and a half without anything getting better.

I would assume maybe WordPress changed its method of calculating statistics around that time, but I can’t find any evidence of this on the WordPress webpage. That suggests it might be a real thing. Did any of you leave around February 20th for some reason and not check the blog again until today? Did anything happen February 20th that tempted you to leave and you only barely hung on? I get self-esteem and occasionally money from blog hits, so this is kind of bothering me.

4. I want to clarify that when I discuss growth mindset, the strongest conclusion I can come to is that it’s not on as firm ground as some people seem to think. I do not endorse claims that I have “debunked” growth mindset or that it is “stupid”. There are still lots of excellent studies in favor, they just have to be interpreted in the context of other things.


Nefarious Nefazodone And Flashy Rare Side Effects

[Epistemic status: I am still in training. I am not an expert on drugs. This is poorly-informed speculation about drugs and it should not be taken seriously without further research. Nothing in this post is medical advice.]

I.

Which is worse – ruining ten million people’s sex lives for one year, or making one hundred people’s livers explode?

I admit I sometimes use this blog to speculate about silly moral dilemmas for no reason, but that’s not what’s happening here. This is a real question that I deal with on a daily basis.

SSRIs, the class which includes most currently used antidepressants, are very safe in the traditional sense of “unlikely to kill you”. Suicidal people take massive overdoses of SSRIs all the time, and usually end up with little more than a stomachache for their troubles. On the other hand, there’s increasing awareness of very common side effects which, while not disabling, can be pretty unpleasant. About 50% of users report decreased sexual abilities, sometimes to the point of total loss of libido or anorgasmia. And something like 25% of users experience “emotional blunting” and the loss of ability to feel feelings normally.

Nefazodone (brand name Serzone®, which would also be a good brand name for a BDSM nightclub) is an equally good (and maybe better) antidepressant that does not have these side effects. On the other hand, every year, one in every 300,000 people using nefazodone will go into “fulminant hepatic failure”, which means their liver suddenly and spectacularly stops working and they need a liver transplant or else they die.

There are a lot of drug rating sites, but the biggest is Drugs.com. 467 Drugs.com users have given Celexa, a very typical SSRI, an average rating of 7.8/10. 14 users have given nefazodone an average rating of 9.1/10.

CrazyMeds might not be as dignified as Drugs.com, but they have a big and well-educated user base and they’re psych-specific. Their numbers are 3.3/5 (n = 253) for Celexa and 4.1/5 (n = 47) for nefazodone.

So both sites’ users seem to agree that nefazodone is notably better than Celexa, in terms of a combined measure of effectiveness and side effects.

But nefazodone is practically never used. It’s actually illegal in most countries. In the United States, parent company Bristol-Myers Squibb (which differs from normal Bristol-Myers in that it was born without innate magical ability) withdrew it from the market, and the only way you can get it nowadays is from an Israeli company that grabbed the molecule after it went off-patent. In several years working in psychiatry, I have never seen a patient on nefazodone, although I’m sure they exist somewhere. I would estimate its prescription numbers are about 1% of Celexa’s, if that.

The problem is the hepatic side effects. Nobody wants to have their liver explode.

But. There are something like thirty million people in the US on antidepressants. If we put them all on nefazodone, that’s about a hundred cooked livers per year. If we put them all on SSRIs, at least ten million of them will get sexual side effects, plus some emotional blunting.

My life vastly improved when I learned there was a searchable database of QALYs for different conditions. It doesn’t have SSRI-induced sexual dysfunction, but it does have sexual dysfunction due to prostate cancer treatment, and I assume that sexual dysfunction is about equally bad regardless of what causes it. Their sexual dysfunction has some QALY weights averaging about 0.85. Hm.

Assume everyone with fulminant liver failure dies. That’s not true; some get liver transplants, maybe some even get a miracle and recover. But assume everyone dies – and further, they die at age 30, cutting their lives short by fifty years.

In that case, putting all depressed people on nefazodone for a year costs 5,000 QALYs, but putting all depressed people on SSRIs for a year costs 1,500,000 QALYs. The liver failures may be flashier, but the 3^^^3 dust specks worth of poor sex lives add up to more disutility in the end.
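The comparison above, as arithmetic. The 0.85 QALY weight and the everyone-dies-at-30 assumption are the ones stated in the text:

```python
antidepressant_users = 30_000_000

# Nefazodone-for-everyone: liver failures, pessimistically assumed universally fatal at 30
liver_deaths = antidepressant_users / 300_000          # 100 per year
nefazodone_qalys_lost = liver_deaths * 50              # 50 life-years lost each: 5,000

# SSRIs-for-everyone: sexual side effects in at least a third of users
dysfunction_cases = 10_000_000
ssri_qalys_lost = dysfunction_cases * (1 - 0.85)       # 0.15 QALYs each: 1,500,000
```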

I don’t want to overemphasize this particular calculation for a couple of reasons. First, SSRIs and nefazodone both have other side effects besides the major ones I’ve focused on here. Second, I don’t know if the level of SSRI-induced sexual dysfunction is as bad as the prostate-surgery-induced sexual dysfunction on the database. Third, there are a whole bunch of antidepressants that are neither SSRIs nor nefazodone and which might be safer than either.

But I do want to emphasize this pattern, because it recurs again and again.

II.

In that spirit, which would you rather have – something like a million people addicted to amphetamines, or something like ten people having their skin eat itself from the inside?

I can’t get good numbers on how many adults abuse Adderall, but a quick glance at the roster for my hospital’s rehab unit suggests “a lot”. Huffington Post calls it the most abused prescription drug in America, which sounds about right to me. Honestly there are worse things to be addicted to than Adderall, but it’s not completely without side effects. The obvious ones are anxiety, irritability, occasionally frank psychosis, and sometimes heart problems – but a lot of the doctors I work with go beyond what the research can really prove and suggest it can produce lasting negative personality change and predispose people to other forms of addictive and impulsive behavior.

If you’ve got to give adults a stimulant, I would much prefer modafinil. It’s not addictive, it lacks most of Adderall’s side effects, and it works pretty well. I’ve known many people on modafinil and they give it pretty universally positive reviews.

On the other hand, modafinil may or may not cause a skin reaction called Stevens Johnson Syndrome/Toxic Epidermal Necrolysis, which like most things with both “toxic” and “necro” in the name is really really bad. The original data suggesting a connection came from kids, who get all sorts of weird drug effects that adults don’t, but since then some people have claimed to have found a connection with adults. Some people get SJS anyway just by bad luck, or because they’re taking other drugs, so it’s really hard to attribute cases specifically to modafinil.

Gwern’s Modafinil FAQ mentions an FDA publication which argues that the background rate of SJS/TEN is 1-2 per million people per year, but the modafinil rate is about 6 per million people per year. However, there are only three known cases of a person above age 18 on modafinil getting SJS/TEN, and this might not be different from background rates after all. Overall the evidence that modafinil increases the rate of SJS/TEN in adults at all is pretty thin, and if it does, it’s as rare as hen’s teeth (in fact, very close to the same rate as liver failure from nefazodone).

(also: consider that like half of Silicon Valley is on modafinil, yet San Francisco Bay is not yet running red with blood.)

(also: ibuprofen is linked to SJS/TEN, with about the same odds ratio as modafinil, but nobody cares, and they are correct not to care.)

I said I’ve never seen a doctor prescribe nefazodone in real life; I can’t say that about modafinil. I have seen one doctor prescribe modafinil. It happened like this: a doctor I was working with was very upset, because she had an elderly patient with very low energy for some reason, I can’t remember, maybe a stroke, and she wanted to give him Adderall, but he had a heart arrhythmia and Adderall probably wouldn’t be safe for him.

I asked “What about modafinil?”

She said, “Modafinil? Really? But doesn’t that sometimes cause Stevens Johnson Syndrome?”

And then I glared at her until she gave in and prescribed it.

But this is very, very typical. Doctors who give out Adderall like candy have no associations with modafinil except “that thing that sometimes causes Stevens-Johnson Syndrome” and are afraid to give it to people.

III.

Nefazodone and modafinil are far from the only examples of this pattern. MAOIs are like this too. So is clozapine. If I knew more about things other than psychiatry, I bet I could think of examples from other fields of medicine.

And partially this is natural and understandable. Doctors swear an oath to “first do no harm”, and toxic epidermal necrolysis is pretty much the epitome of harm. Thought experiments like torture vs dust specks suggest that most people’s moral intuitions say that no amount of aggregated lesser harms like sexual side effects and amphetamine addictions can equal the importance of avoiding even a tiny chance of some great harm like liver failure or SJS/TEN. Maybe your doctor, if you asked her directly, would endorse a principled stance of “I am happy to give any number of people anxiety and irritability in order to avoid even the smallest chance of one case of toxic epidermal necrolysis.”

And yet.

The same doctors who would never dare give nefazodone consider Seroquel a perfectly acceptable second-line treatment for depression. Along with other atypical antipsychotics, Seroquel raises the risk of sudden cardiac death by about 50%. The normal risk of sudden cardiac death in young people is about 10 in 100,000 per year, so if my calculations are right, low-dose Seroquel causes an extra cardiac death once per every 20,000 patient-years. That’s about fifteen times as often as nefazodone causes an extra liver death.
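Checking that calculation. The baseline rate and the ~50% relative increase are the figures given in the text:

```python
baseline = 10 / 100_000                 # sudden cardiac deaths per patient-year, young people
extra = baseline * 0.5                  # atypical antipsychotics: ~50% relative increase
seroquel_years_per_death = 1 / extra    # one extra death per 20,000 patient-years
nefazodone_years_per_death = 300_000    # one liver death per 300,000 patient-years
ratio = nefazodone_years_per_death / seroquel_years_per_death  # 300,000 / 20,000 = 15
```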

Yet nefazodone was taken off of the market by its creators and consigned to the dustbin of pharmacological history, and Seroquel is the sixth-best-selling drug in the United States, commonly given for depression, simple anxiety, and sometimes even to help people sleep.

Why the disconnect? Here’s a theory: sudden cardiac death happens all the time; sometimes God just has it in for you and your heart stops working and you die. Antipsychotics can increase the chances of that happening, but it’s a purely statistical increase, such that we can detect it aggregated over large groups but never be sure that it played a role in any particular case. The average person who dies of Seroquel never knows they died of Seroquel, but the average person who dies from nefazodone is easily identified as a nefazodone-related death. So nefazodone gets these big stories in the media about this young person who died by taking this exotic psychiatric drug, and it becomes a big deal and scares the heck out of everybody. When someone dies of Seroquel, it’s just an “oh, so sad, I guess his time has come.”

But the end result is this. When treatment with an SSRI fails, nefazodone and Seroquel naively seem to be equally good alternatives. Except nefazodone has a death rate of 1/300,000 patient years, and Seroquel 1/20,000 patient years. And yet everyone stays the hell away from the nefazodone because it’s known to be unsafe, and chooses the Seroquel.

I conclude either doctors are terrible at thinking about risk, or else maybe a little too good at thinking about risk.

I bring up the latter option because there’s a principal-agent problem going on here. Doctors want to do what’s best for their patients. But they also want to do what’s best for themselves, which means not getting sued. No one has ever sued their doctor because they got a sexual side effect from SSRIs, but if somebody dies because they’re the lucky 1/300,000 who gets liver failure from nefazodone, you can bet their family’s going to sue. Suddenly it’s not a matter of comparing QALYs, it’s a matter of comparing zero percent chance of lawsuit with non-zero percent chance of lawsuit.

(Fermi calculation: if a doctor has 100 patients at a time on antidepressants, and works for 30 years, then if she uses Serzone as her go-to antidepressant, she’s risking a 1% chance of getting the liver failure side effect once in her career. That’s small, but since a single bad lawsuit can bankrupt a doctor, it’s worth taking seriously.)
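The parenthetical Fermi estimate can be sketched in a few lines (the 100 patients, 30 years, and 1/300,000 rate are the numbers given above):

```python
patients = 100        # patients on antidepressants at any one time
years = 30            # length of a career
rate = 1 / 300_000    # nefazodone liver-failure deaths per patient-year

patient_years = patients * years                  # 3,000 patient-years of exposure
p_at_least_one = 1 - (1 - rate) ** patient_years  # chance of at least one event
print(p_at_least_one)  # roughly 1%
```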

And that would be a tough lawsuit to fight. “Yes, Your Honor, I knew when I prescribed this drug that it sometimes makes people’s livers explode, but the alternative often gives people a bad sex life, and according to the theory of utilitarianism as propounded by 18th century philosopher Jeremy Bentham – ” … “Bailiff, club this man”.

And the same facet of nefazodone that makes it exciting for the media makes it exciting for lawsuits. When someone dies of nefazodone toxicity, everyone knows. When someone dies of Seroquel, “oh, so sad, I guess his time has come”.

That makes Seroquel a lot safer than nefazodone. Safer for the doctor, I mean. The important kind of safer.

This is why, as I mentioned before, I hate lawsuits as a de facto regulatory mechanism. Our de jure regulatory mechanism, the FDA, is pretty terrible, but to its credit it hasn’t banned nefazodone. One time it banned clozapine because of a flashy rare side effect, but everyone yelled at it and it apologized and changed its mind. With lawsuits there’s nobody to yell at, so we just end up with people very quietly adjusting their decisions in the shadows and nobody else being any the wiser.

I don’t want to overemphasize this, because I think it’s only one small part of the problem. After all, a lot of countries withdrew nefazodone entirely and didn’t even give lawsuits a chance to enter the picture.

But whatever the cause, the end result is that drugs with rare but spectacular side effects get consistently underprescribed relative to drugs with common but merely annoying side effects, or drugs that have more side effects but manage to hide them better.

Growth Mindset 3: A Pox On Growth Your Houses

Jacques Derrida proposed a form of philosophical literary criticism called deconstruction. I’ll be the first to admit I don’t really understand it, but it seems to have something to do with assuming all texts secretly contradict their stated premise and apparent narrative, then hunting down and exposing the plastered-over areas where the author tries to hide this.

I have no idea whether this works for literature or not, but it’s a useful way to read scientific papers.

Consider a popular field – or, at least, a field where a certain position is popular. For example, we’ve been talking a lot about growth mindset recently. There seem to be a lot of researchers working to prove growth mindset and not a lot working to disprove it. Journals are pretty interested in studies showing growth mindset interventions work, and maybe not so interested in studies showing they don’t. I’ll admit that my strong suspicions of publication bias don’t seem to be borne out by the facts here – see this meta-analysis – but I bet its more sinister cousin “all experimenters believe the same thing and have the same experimenter effects” bias is alive and well.

In a field like that, you’re not going to get the contrarian studies you want, but one way to find the other side of the issue is to look a little more closely at the studies that do get published, the ones that say they’re in support of the thesis, and see if you can find anything incriminating.

Here’s a perfect example: Mindset Interventions Are A Scalable Treatment For Academic Underachievement, by a team of six researchers including Carol Dweck.

The abstract reads:

The efficacy of academic-mind-set interventions has been demonstrated by small-scale, proof-of-concept interventions, generally delivered in person in one school at a time. Whether this approach could be a practical way to raise school achievement on a large scale remains unknown. We therefore delivered brief growth-mind-set and sense-of-purpose interventions through online modules to 1,594 students in 13 geographically diverse high schools. Both interventions were intended to help students persist when they experienced academic difficulty; thus, both were predicted to be most beneficial for poorly performing students. This was the case. Among students at risk of dropping out of high school (one third of the sample), each intervention raised students’ semester grade point averages in core academic courses and increased the rate at which students performed satisfactorily in core courses by 6.4 percentage points. We discuss implications for the pipeline from theory to practice and for education reform.

This sounds really, really impressive! It’s hard to imagine any stronger evidence in growth mindset’s favor.

And then you make the mistake of reading the actual paper.

The paper asked 1,594 students from a bunch of different high schools to take a 45 minute online course.

A quarter of the students took a placebo course that just presented some science about how different parts of the brain do different stuff. This was also classified as a “mindset intervention”, though it seems pretty different.

Another quarter took a course that was supposed to teach growth mindset.

Still another quarter took a course about “sense of purpose” which talked about how schoolwork was meaningful and would help them accomplish lots of goals and they should be happy to do it.

And the final quarter took both the growth mindset course and the “sense of purpose” course.

Then they let all students continue taking their classes for the rest of the semester and saw what happened, which was this:

Among ordinary students, the effect on the growth mindset group was completely indistinguishable from zero, and in fact they did nonsignificantly worse than the control group. This was the most basic test they performed, and it should have been the headline of the study. The study should have been titled “Growth Mindset Intervention Totally Fails To Affect GPA In Any Way”.

Instead they went to subgroup analysis. Subgroup analysis can be useful to find more specific patterns in the data, but if it’s done post hoc it can lead to what I previously called the Elderly Hispanic Woman Effect, after medical papers that can’t find that their drug has any effect on people at large, so they keep checking different subgroups – young white men…nothing. Old black men…nothing. Middle-aged Asian transgender people…nothing. Newborn Australian aboriginal butch lesbians…nothing. Elderly Hispanic women…p = 0.049…aha! And the study gets billed as “Scientists Find Exciting New Drug That Treats Diabetes In Elderly Hispanic Women.”
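The reason this works is simple probability: even for a drug with no effect at all, if each subgroup test has the usual 5% false-positive rate and the subgroups are (roughly) independent, the chance of at least one spurious p &lt; 0.05 result climbs quickly with the number of subgroups checked. A minimal illustration:

```python
# Chance that at least one of k independent null tests comes up
# "significant" at p < 0.05 purely by luck.
def p_any_false_positive(k: int, alpha: float = 0.05) -> float:
    return 1 - (1 - alpha) ** k

for k in (1, 5, 10, 20):
    print(f"{k:2d} subgroups checked -> "
          f"{p_any_false_positive(k):.0%} chance of a spurious hit")
```

By twenty subgroups you are more likely than not to find an “Elderly Hispanic Woman” somewhere in the data.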

As per the abstract, the researchers decided to focus on an “at risk” subgroup because they had principled reasons to believe mindset interventions would work better on them. In their subgroup of 519 students who had a GPA of 2.0 or less last semester, or who failed one or more academic courses last semester:

Growth mindset still doesn’t differ from zero. And growth mindset does nonsignificantly worse than their “sense of purpose” intervention where they tell children to love school. In fact, the students who take both “sense of purpose” and growth mindset actually do (nonsignificantly) worse than sense-of-purpose alone!

But the control group mysteriously started doing much worse in all their classes right after the study started, so growth mindset is significantly better than the control group. Hooray!

Why would the control group’s GPA suddenly decline? The simplest answer would be that by coincidence the class got harder right after the study started, and only the intervention kids were resilient enough to deal with it – but that can’t be right, because this was done at thirteen different schools, and they wouldn’t have all had their coursework get harder at the same time.

Another possibility is that sufficiently low-functioning kids are always declining – that is, as time goes on they get more and more behind in their coursework, so their grades at time t+1 are always less than at time t, and maybe growth mindset has arrested this decline. This is plausible and I’d be interested in seeing if other studies have found this.

Perhaps aware that this is not very convincing, the authors go on to do another analysis, this one of percent of students passing their classes.

This is the same group of at-risk students as the last one. It’s graphing what percent of these students pass versus fail their courses. The graph on the left shows that a significantly higher number of students in the intervention conditions pass their courses than in the control condition.

This is better, but one part still concerns me.

Did you catch that phrase “intervention conditions”? The authors of the study write: “Because our primary research question concerned the efficacy of academic mindset interventions in general when delivered via online modules, we then collapsed the intervention conditions into a single intervention dummy code (0 = control, 1 = intervention).”

We don’t know whether growth mindset did anything for even these students in this little subgroup, because it was collapsed together with the (more effective) “sense of purpose” intervention before any of these tests were done. I don’t know if this is just for convenience, or if it is to obfuscate that it didn’t work on its own.

[EDIT: Scott McGreal looks further and finds in the supplementary material that growth mindset alone did NOT significantly improve pass rates!]

The abstract of this study tells you none of this. It just says: “Mindset Interventions Are A Scalable Treatment For Academic Underachievement…Among students at risk of dropping out of high school (one third of the sample), each intervention raised students’ semester grade point averages in core academic courses and increased the rate at which students performed satisfactorily in core courses by 6.4 percentage points.” From the abstract, this study is a triumph.

But my own summary of these results, as relevant to growth mindset, is as follows:

For students with above a 2.0 GPA, a growth mindset intervention did nothing.

For students with below a 2.0 GPA, the growth mindset interventions may not have improved GPA, but may have prevented GPA from falling, which for some reason it was otherwise going to do.

Even in those students, it didn’t do any better than a “sense-of-purpose” intervention where children were told platitudes about how doing well in school will “make their families proud” and “make a positive impact”.

In no group of students did it significantly increase chance of passing any classes.

Haishan writes:

“If ye read only the headlines, what reward have ye? Do not even the policymakers the same? And if ye take the abstract at its face, what do ye more than others? Do not even the science journalists so?”

Titles, abstracts, and media presentations are where authors can decide how to report a bunch of different, often contradictory results in a way that makes it look like they have completely proven their point. A careful look at the study may find that their emphasis is misplaced, and give you more than enough ammunition against a theory even where the stated results are glowingly positive.

The only reason we were told these results is that they were in the same place as a “sense of purpose mindset” intervention that looked a little better, so it was possible to publish the study and claim it as a victory for mindsets in general. How many studies that show similar results for growth mindset lack a similar way of spinning the data, and so never get seen at all?

Universal Love, Said The Cactus Person

“Universal love,” said the cactus person.

“Transcendent joy,” said the big green bat.

“Right,” I said. “I’m absolutely in favor of both those things. But before we go any further, could you tell me the two prime factors of 1,522,605,027, 922,533,360, 535,618,378, 132,637,429, 718,068,114, 961,380,688, 657,908,494, 580,122,963, 258,952,897, 654,000,350, 692,006,139?”

“Universal love,” said the cactus person.

“Transcendent joy,” said the big green bat.

The sea was made of strontium; the beach was made of rye. Above my head, a watery sun shone in an oily sky. A thousand stars of sertraline whirled round quetiapine moons, and the sand sizzled sharp like cooking oil that hissed and sang and threatened to boil the octahedral dunes.

“Okay,” I said. “Fine. Let me tell you where I’m coming from. I was reading Scott McGreal’s blog, which has some good articles about so-called DMT entities, and mentions how they seem so real that users of the drug insist they’ve made contact with actual superhuman beings and not just psychedelic hallucinations. You know, the usual Terence McKenna stuff. But in one of them he mentions a paper by Marko Rodriguez called A Methodology For Studying Various Interpretations of the N,N-dimethyltryptamine-Induced Alternate Reality, which suggested among other things that you could prove DMT entities were real by taking the drug and then asking the entities you meet to factor large numbers which you were sure you couldn’t factor yourself. So to that end, could you do me a big favor and tell me the factors of 1,522,605,027, 922,533,360, 535,618,378, 132,637,429, 718,068,114, 961,380,688, 657,908,494, 580,122,963, 258,952,897, 654,000,350, 692,006,139?”

“Universal love,” said the cactus person.

“Transcendent joy,” said the big green bat.

The sea turned hot and geysers shot up from the floor below. First one of wine, then one of brine, then one more yet of turpentine, and we three stared at the show.

“I was afraid you might say that. Is there anyone more, uh, verbal here whom I could talk to?”

“Universal love,” said the cactus person.

At the sound of that, the big green bat started rotating in place. On its other side was a bigger greener bat, with an ancient, wrinkled face.

“Not splitting numbers / but joining Mind,” it said.
“Not facts or factors or factories / but contact with the abstract attractor that brings you back to me
Not to seek / but to find”

“I don’t follow,” I said.

“Not to follow / but to jump forth into the deep
Not to grind or to bind or to seek only to find / but to accept
Not to be kept / but to wake from sleep”

The bat continued to rotate, until the first side I had seen swung back into view.

“Okay,” I said. “I’m going to hazard a guess as to what you’re talking about, and you tell me if I’m right. You’re saying that, like, all my Western logocentric stuff about factoring numbers in order to find out the objective truth about this realm is missing the point, and I should be trying to do some kind of spiritual thing involving radical acceptance and enlightenment and such. Is that kind of on the mark?”

“Universal love,” said the cactus person.

“Transcendent joy,” said the big green bat.

“Frick,” I said. “Well, okay, let me continue.” The bat was still rotating, and I kind of hoped that when the side with the creepy wrinkled face came into view it might give me some better conversation. “I’m all about the spiritual stuff. I wouldn’t be here if I weren’t deeply interested in the spiritual stuff. This isn’t about money or fame or anything. I want to advance psychedelic research. If you can factor that number, then it will convince people back in the real – back in my world that this place is for real and important. Then lots of people will take DMT and flock here and listen to what you guys have to say about enlightenment and universal love, and make more sense of it than I can alone, and in the end we’ll have more universal love, and…what was the other thing?”

“Transcendent joy,” said the big green bat.

“Right,” I said. “We’ll have more transcendent joy if you help me out and factor the number than if you just sit there being spiritual and enigmatic.”

“Lovers do not love to increase the amount of love in the world / But for the mind that thrills
And the face of the beloved, which the whole heart fills / the heart and the art never apart, ever unfurled
And John Stuart is one of / the dark satanic mills”

“I take it you’re not consequentialists,” I said. “You know that’s really weird, right? Like, not just ‘great big green bat with two faces and sapient cactus-man’ weird, but like really weird. You talk about wanting this spiritual enlightenment stuff, but you’re not going to take actions that are going to increase the amount of spiritual enlightenment? You’ve got to understand, this is like a bigger gulf for me than normal human versus ineffable DMT entity. You can have crazy goals, I expect you to have crazy goals, but what you’re saying now is that you don’t pursue any goals at all, you can’t be modeled as having desires. Why would you do that?”

“Universal love,” said the cactus person.

“Transcendent joy,” said the big green bat.

“Now you see here,” I said. “Everyone in this conversation is in favor of universal love and transcendent joy. But I’ve seen the way this works. Some college student gets his hands on some DMT, visits here, you guys tell him about universal love and transcendent joy, he wakes up, says that his life has been changed, suddenly he truly understands what really matters. But it never lasts. The next day he’s got to get up and go to work and so on, and the universal love lasts about five minutes until his boss starts yelling at him for writing his report in the wrong font, and before you know it twenty years later he’s some slimy lawyer who’s joking at a slimy lawyer party about the one time when he was in college and took some DMT and spent a whole week raving about transcendent joy, and all the other slimy lawyers laugh, and he laughs with them, and so much for whatever spiritual awakening you and your colleagues in LSD and peyote are trying to kindle in humanity. And if I accept your message of universal love and transcendent joy right now, that’s exactly what’s going to happen to me, and meanwhile human civilization is going to keep being stuck in greed and ignorance and misery. So how about you shut up about universal love and you factor my number for me so we can start figuring out a battle plan for giving humanity a real spiritual revolution?”

“Universal love,” said the cactus person.

“Transcendent joy,” said the big green bat.

A meteorite of pure delight struck the sea without a sound. The force of the blast went rattling past the bat and the beach, disturbing each, then made its way to a nearby bay of upside-down trees with their roots in the breeze and their branches underground.

“I demand a better answer than that,” I demanded.

The other side of the bat spun into view.

“Chaos never comes from the Ministry of Chaos / nor void from the Ministry of Void
Time will decay us but time can be left blank / destroyed
With each Planck moment ever fit / to be eternally enjoyed”

“You’re making this basic mistake,” I told the big green bat. “I honestly believe that there’s a perspective from which Time doesn’t matter, where a single moment of recognition is equivalent to eternal recognition. The problem is, if you only have that perspective for a moment, then all the rest of the time, you’re sufficiently stuck in Time to honestly believe you’re stuck in Time. It’s like that song about the hole in the bucket – if the hole in the bucket were fixed, you would have the materials needed to fix the hole in the bucket. But since it isn’t, you don’t. Likewise, if I understood the illusoriness…illusionality…whatever, of time, then I wouldn’t care that I only understood it for a single instant. But since I don’t, I don’t. Without a solution to the time-limitedness of enlightenment that works from within the temporal perspective, how can you consider it solved at all?”

“Universal love,” said the cactus person.

“Transcendent joy,” said the big green bat.

The watery sun began to run and it fell on the ground as rain. It became a dew that soaked us through, and as the cold seemed to worsen the cactus person hugged himself to stay warm but his spines pierced his form and he howled in a fit of pain.

“You know,” I said, “sometimes I think the kvithion sumurhe had the right of it. The world is an interference pattern between colliding waves of Truth and Beauty, and either one of them pure from the source and undiluted by the other will be fatal. I think you guys and some of the other psychedelics might be pure Beauty, or at least much closer to the source than people were meant to go. I think you can’t even understand reason, I think you’re constitutionally opposed to reason, and that the only way we’re ever going to get something that combines your wisdom and love and joy with reason is after we immanentize the eschaton and launch civilization into some perfected postmessianic era where the purpose of the world is fully complete. And that as much as I hate to say it, there’s no short-circuiting the process.”

“Universal love,” said the cactus person.

“Transcendent joy,” said the big green bat.

“I’m dissing you, you know. I’m saying you guys are so intoxicated on spiritual wisdom that you couldn’t think straight if your life depended on it; that your random interventions in our world and our minds look like the purposeless acts of a drunken madman because that’s basically more or less what they are. I’m saying if you had like five IQ points between the two of you, you could tap into your cosmic consciousness or whatever to factor a number that would do more for your cause than all your centuries of enigmatic dreams and unasked-for revelations combined, and you ARE TOO DUMB TO DO IT EVEN WHEN I BASICALLY HOLD YOUR HAND THE WHOLE WAY. Your spine. Your wing. Whatever.”

“Universal love,” said the cactus person.

“Transcendent joy,” said the big green bat.

“Fuck you,” said I.

I saw the big green bat bat a green big eye. Suddenly I knew I had gone too far. The big green bat started to turn around what was neither its x, y, or z axis, slowly rotating to reveal what was undoubtedly the biggest, greenest bat that I had ever seen, a bat bigger and greener than which it was impossible to conceive. And the bat said to me:

“Sir. Imagine you are in the driver’s seat of a car. You have been sitting there so long that you have forgotten that it is the seat of a car, forgotten how to get out of the seat, forgotten the existence of your own legs, indeed forgotten that you are a being at all separate from the car. You control the car with skill and precision, driving it wherever you wish to go, manipulating the headlights and the windshield wipers and the stereo and the air conditioning, and you pronounce yourself a great master. But there are paths you cannot travel, because there are no roads to them, and you long to run through the forest, or swim in the river, or climb the high mountains. A line of prophets who have come before you tell you that the secret to these forbidden mysteries is an ancient and terrible skill called GETTING OUT OF THE CAR, and you resolve to learn this skill. You try every button on the dashboard, but none of them is the button for GETTING OUT OF THE CAR. You drive all of the highways and byways of the earth, but you cannot reach GETTING OUT OF THE CAR, for it is not a place on a highway. The prophets tell you GETTING OUT OF THE CAR is something fundamentally different than anything you have done thus far, but to you this means ever sillier extremities: driving backwards, driving with the headlights on in the glare of noon, driving into ditches on purpose, but none of these reveal the secret of GETTING OUT OF THE CAR. The prophets tell you it is easy; indeed, it is the easiest thing you have ever done. You have traveled the Pan-American Highway from the boreal pole to the Darien Gap, you have crossed Route 66 in the dead heat of summer, you have outrun cop cars at 160 mph and survived, and GETTING OUT OF THE CAR is easier than any of them, the easiest thing you can imagine, closer to you than the veins in your head, but still the secret is obscure to you.”

A herd of bison came in to listen, and voles and squirrels and ermine and great tusked deer gathered round to hear as the bat continued his sermon.

“And finally you drive to the top of the highest peak and you find a sage, and you ask him what series of buttons on the dashboard you have to press to get out of the car. And he tells you that it’s not about pressing buttons on the dashboard and you just need to GET OUT OF THE CAR. And you say okay, fine, but what series of buttons will lead to you getting out of the car, and he says no, really, you need to stop thinking about dashboard buttons and GET OUT OF THE CAR. And you tell him maybe if the sage helps you change your oil or rotates your tires or something then it will improve your driving to the point where getting out of the car will be a cinch after that, and he tells you it has nothing to do with how rotated your tires are and you just need to GET OUT OF THE CAR, and so you call him a moron and drive away.”

“Universal love,” said the cactus person.

“So that metaphor is totally unfair,” I said, “and a better metaphor would be if every time someone got out of the car, five minutes later they found themselves back in the car, and I ask the sage for driving directions to a laboratory where they are studying that problem, and…”

“You only believe that because it’s written on the windshield,” said the big green bat. “And you think the windshield is identical to reality because you won’t GET OUT OF THE CAR.”

“Fine,” I said. “Then I can’t get out of the car. I want to get out of the car. But I need help. And the first step to getting help is for you to factor my number. You seem like a reasonable person. Bat. Freaky DMT entity. Whatever. Please. I promise you, this is the right thing to do. Just factor the number.”

“And I promise you,” said the big green bat. “You don’t need to factor the number. You just need to GET OUT OF THE CAR.”

“I can’t get out of the car until you factor the number.”

“I won’t factor the number until you get out of the car.”

“Please, I’m begging you, factor the number!”

“Yes, well, I’m begging you, please get out of the car!”

“FOR THE LOVE OF GOD JUST FACTOR THE FUCKING NUMBER!”

“FOR THE LOVE OF GOD JUST GET OUT OF THE FUCKING CAR!”

“FACTOR THE FUCKING NUMBER!”

“GET OUT OF THE FUCKING CAR!”

“Universal love,” said the cactus person.

Then tree and beast all fled due east and the moon and stars shot south. And the bat rose up and the sea was a cup and the earth was a screen green as clozapine and the sky a voracious mouth. And the mouth opened wide and the earth was skied and the sea fell in with an awful din and the trees were moons and the sand in the dunes was a blazing comet and…

I vomited, hard, all over my bed. It happens every time I take DMT, sooner or later; I’ve got a weak stomach and I’m not sure the stuff I get is totally pure. I crawled just far enough out of bed to flip a light switch on, then collapsed back onto the soiled covers. The clock on the wall read 11:55, meaning I’d been out about an hour and a half. I briefly considered taking some more ayahuasca and heading right back there, but the chances of getting anything more out of the big green bat, let alone the cactus person, seemed small enough to fit in a thimble. I drifted off into a fitful sleep.

Behind the veil, across the infinite abyss, beyond the ice, beyond daath, the dew rose from the soaked ground and coalesced into a great drop, which floated up into an oily sky and became a watery sun. The cactus person was counting on his spines.

“Hey,” the cactus person finally said, “just out of curiosity, was the answer 37,975,227, 936,943,673, 922,808,872, 755,445,627, 854,565,536, 638,199 times 40,094,690,950, 920,881,030, 683,735,292, 761,468,389, 214,899,724,061?”

“Yeah,” said the big green bat. “That’s what I got too.”
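(An aside I couldn’t resist checking: the 100-digit number in the story is the RSA-100 factoring challenge modulus, factored back in 1991, and the factors the cactus person recites really are its two primes. A few lines of Python confirm the product:)

```python
# The number from the story (RSA-100) and the two factors quoted at the end.
n = 1522605027922533360535618378132637429718068114961380688657908494580122963258952897654000350692006139
p = 37975227936943673922808872755445627854565536638199
q = 40094690950920881030683735292761468389214899724061
print(p * q == n)  # True
```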

Links 4/15: Link And You’re Dead

Perytons are mysterious bursts detected by radio telescopes. Some kind of novel astronomical object? Maybe not – a recent investigation suggested something more banal – microwave ovens in the astronomers’ break room.

Greg Cochran on creepy cell line infections. “There are diseases that look as if they might be infectious where no causative organism has ever been found – diseases like sarcoidosis. They might be caused by some disease that started out as your second cousin Frank.”

Yale on climate change polling. More people believe in global warming themselves, than believe there is a scientific consensus around it? That’s the opposite of what I would have expected. More people want to regulate CO2 than believe global warming exists? Polling is weird.

A lot of the scrutiny around Ferguson focused on its corrupt police force as an example of white officials fleecing black citizens, and how this might be solved by mobilizing black voters to take control of the government. The Daily Beast has an interesting article on the town next to Ferguson – where black officials fleece black citizens about the same amount.

CVS will allow people to get naloxone without prescriptions in order to fight deaths from opiate overdose (which naloxone treats). The two interesting things I took from this story – first, it’s surprisingly legal to give prescription drugs away without prescriptions if you can get a couple of trade groups to agree to it. Second, maybe this will mean alcoholics can try the Sinclair Method on their own.

Lots of interesting graphs on my Twitter feed this month. Here’s one on how fertility isn’t declining and one on how IQ affects likelihood of escaping poverty (source).

An Italian surgeon is prepared to attempt the world’s first head transplant.

A multinational team says their machine learning program can now predict IQ from MRI images accurately enough that their estimates correlate at 0.71 with the real thing. I asked Twitter what they thought; apparently it’s real prediction rather than “my machine learning algorithm correctly predicted the same data we fed it”, but it might be confounded by the sample of different-aged children; the program might just be reading off whose brain looks older and predicting that older children perform better on IQ tests.

How did surveyors in 1919, long before the computer was invented, calculate the geographical center of the United States?

No Irish Need Apply: A Myth Of Victimization. A historian argues that there are no actual records of 19th century American businesses or advertisements using this phrase, and it was later made up to promote Irish-American solidarity. When asked for comment, experts look shifty and say they “know nothing”.

More strong claims for probiotics: a four-week treatment with a multispecies supplement decreases reactivity to sad mood, considered a risk factor for depression.

Vox writes about Raj Chetty’s theories of location-dependent social mobility, and now it seems that Hillary Clinton is a huge fan. But Steve Sailer points out exactly the same giant gaping radioactive flaw that I noticed – he is basically just noticing that there is less social mobility between races than within them, and that therefore, places with high black populations appear to have less social mobility. Please tell me I’m misunderstanding something and he didn’t actually miss this.

A while back we discussed gender differences in ethical theories. A recent big meta-analysis finds that women are moderately more deontological than men, and men slightly more utilitarian than women. Whatever.

It’s morally wrong to blame a victim’s actions for their own victimization. We should be blaming those victims’ genes. Or something. Not really sure what to do with this one.

Very closely related: a while back I argued that the apparent connection between childhood bullying and psychiatric disorders was way too strong to be real and likely to represent some kind of common confound. Sure enough, when somebody twin-studied it they found that at least in the case of paranoia 93% of the association is likely to represent a common genetic risk factor.

Ready For Hillary? Take Our Quiz And Find Out! Question four: “Her slogan is (a) Ready for Hillary, (b) Resigned to Hillary, (c) Preparing for Chelsea, or (d) What Difference, At This Point, Does It Make?”

19th century polymath Francis Galton was among the first to study the efficacy of prayer, noting among other things that despite all the people praying “God save the King” royals tended to die earlier than other upper-class individuals.

Chris Blattman conducted a study in Liberia that finds that at-risk poor young men given cognitive behavioral therapy were involved in 20-50% less crime, drugs, and violence than a control group, with effects lasting at least a year. This sincerely surprises me. I would pay money to see what James Coyne thinks of this.

New work with odd jellyfish-like creatures called ctenophores raises the surprising question: did neurons evolve twice?

At least three towns have exclamation points in their names: Hamilton!, Ohio; Westward Ho!, Devon; and Saint-Louis-du-Ha! Ha!, Quebec.

In order to prove some kind of point, Ecuador very carefully disguises a portion of its territory as Costa Rica, tells some of its citizens they were going on a trip to Costa Rica, then keeps them in Ecuador. Now it’s an international incident with the Costa Rican government getting involved.

Individual Differences In Executive Function Are Almost Entirely Genetic In Origin. And when they say “almost entirely”, they mean “about 99%”. This doesn’t make sense to me – why should this be the only 99% genetic thing in a world full of cognitive skills that are about 50% genetic? Really looking forward to a replication attempt.

Has Obamacare Turned Voters Against Sharing The Wealth? Maybe not Obamacare specifically, but the magnitude of increasing opposition to redistribution is surprising and disturbing. Also a confusing sign of how poorly trends in media coverage mirror trends in people’s attitudes.

FBI Admits It Fudged Forensic Hair Matches In Nearly All Criminal Trials For Decades. “Oops” doesn’t seem to cut it.

If Douglas Hofstadter wrote erotica (h/t Multiheaded)


Blame Theory

It’s always dangerous to speculate about the hidden psychological motives of people you disagree with – this is the sin of Bulverism. But like most sins, it’s also fun. So please forgive me while I talk about blame.

Many people have remarked on the paradox of an academia made mostly of upper-class ethnic-majority Westerners trying so very hard to find reasons why lots of things are the fault of upper-class ethnic-majority Westerners. The simplest example I can think of is attributing the woes of Third World countries to colonialism; without meaning to trivialize the evils of colonization, a lot of academics seem to go beyond what even the undeniably awful facts can support. Dependency theory, for example, is now mostly discredited, as are a lot of the Marxist perspectives. I would provide other examples if I weren’t satisfied you can generate them independently.

This is on the face of it surprising; naively we would expect people to cast themselves and those like them in as positive a light as possible. Forget about whether these attributions of blame are right or wrong. Even if they were right I would not expect people to believe them as enthusiastically as they do.

The theories I’ve heard to explain this paradox are rarely very flattering; usually something about class signaling, or holier-than-thou-ness, or trying to justify the existence of an academic elite.

I want to propose another possibility: what if people are really, fundamentally, good?

Moral philosophy distinguishes between a few ethical systems, like deontology, utilitarianism and virtue ethics. Most people without philosophical training settle into a sort of mishmash of all of them, but one which, I think, is closer to deontology than to either of the others. Call it Moral Therapeutic Deontology. Like all deontological systems, it focuses on following certain rules: don’t murder, don’t steal, respect your parents, pay back your debts. And like all deontological systems, it treats other things, like charity, as “supererogatory”, meaning they’re nice but not really necessary. If you’ve got extra time and energy after doing the important stuff, then sure, do the supererogatory stuff, whatever, but it’s hardly where your moral focus should be.

On the other hand, when confronted with the full extent of human suffering – whether by living in a poor area, or serving in a war zone, or traveling to a Third World country, or treating depression patients – it’s hard to think about anything else. The sheer burning horribleness of it becomes this unscratchable itch, this flaw in the world that blots out the sun.

And here’s Moral Therapeutic Deontology, saying, “Yeah, helping quench the burning fire of human suffering is nice, but it’s not like a real thing that real morality should care about. It’s not your duty.”

This is some heavy cognitive dissonance. It doesn’t match basic intuitions about the importance of the matter. Even worse, it doesn’t allow you to communicate the importance of the matter to other people. If you say “Look at all these people living squalid and miserable in the slums without any hope,” and they say “Yeah, well, it would be supererogatory to help them and I’m not feeling supererogatory today,” you don’t really have a leg to stand on.

There’s an easy way to resolve the dissonance without abandoning either Moral Therapeutic Deontology or your concern for the less well-off. That resolution is to prove that human suffering is you and your friends’ fault. Deontology very clearly says that if you cause a problem, it’s your job to help fix it. If you can prove that the reason the Third World is suffering is because of First World white people, you have a strong claim that you as a First World white person should be deeply emotionally invested in solving it; that your friends and neighbors, as First World white people, ought to help you; and that your government, as that of a First World majority-white country, is justified in using taxpayer money to get involved.

I think this might be a part of what’s happening. People feel a need to help the less-advantaged so strongly that they come up with a justification to do so that makes sense in their own moral system, whether it’s factually accurate or not.

I am not as fanatical a partisan of utilitarianism as I used to be, but this still seems like one of the situations where it has an obvious advantage. Utilitarianism tells us that we are perfectly justified in seeing the relief of suffering as a pressing need. We don’t need to justify it by positing facts that may later be proven untrue; it is self-justifying. People sometimes complain that a flaw of utilitarianism is that it implies heavy moral obligations to help all kinds of people whether or not any of their problems are our fault; the world is divided between those who consider that a bug and those who find it a very helpful feature.

I want more people to become familiar with utilitarianism because I think a lot of the colonialism theory stuff is net hurtful. It combines a justification for helping the poor with an insult to people’s identity, and probably makes the former less palatable to many people than it would be naturally. It also makes our need to help the poor hinge on an empirical point; if that empirical point gets disproved, things become pretty awkward.

This theory implies that utilitarian liberals will have all the features of liberalism except the interest in blaming their own group for major problems. My anecdotal experience confirms that. The utilitarians I know are very interested in helping the poor and in various other liberal ideas, but are more likely than other liberals to roll their eyes at talk about colonialism and stereotype threat. I think it’s because they feel confident in their right to care about the disadvantaged regardless.

Polemical Imbalance

Today is an exciting day for me. I got argued against on Mad In America. This one is going straight to my resume.

Mad In America apparently doesn’t like being called an anti-psychiatry blog, so let’s call it a blog…that discusses psychiatry…and doesn’t usually like what it sees. They were heavily involved in popularizing the idea that psychiatry erred grievously in overselling “chemical imbalance”, and they didn’t much like my post on the same topic:

Alexander argues that the notion that psychiatrists once promoted the idea of low serotonin as a cause of depression and Selective Serotonin Reuptake Inhibitors (SSRIs) as proper treatment for that deficiency is all simply a false “narrative” invented by “antipsychiatry” activists. These activists then “frame it as ‘proof’ that psychiatrists are drug company shills who were deceiving the public.” Alexander points to quotes of American Psychiatric Association officials in a post by MIA Blogger Philip Hickey, and notes that none of the quotes specifically describe a low-serotonin explanation for depression. The Hickey post cited is not actually about that topic, but about the promotion of the phrase “chemical imbalance”; nevertheless, Alexander broadly refers to Hickey and all of Mad in America as “antipsychiatry”, and he then writes, “If the antipsychiatry community had quotes of APA officials saying it’s all serotonin deficiency, don’t you think they would have used them?” Alexander argues, “The idea that depression is a drop-dead simple serotonin deficiency was never taken seriously by mainstream psychiatry.” There seems to be a lot of evidence to the contrary still today readily available even on the web, though.

This is exactly the sort of fight I probably shouldn’t get involved in continuing. But I’m going to do so anyway, because I think Mad In America’s counterargument is actually going to end up supporting my point and maybe shed more light on the situation.

Up there, when they say “Alexander points to quotes of American Psychiatric Association officials in a post by MIA Blogger Philip Hickey, and notes that none of the quotes specifically describe a low-serotonin explanation for depression [but] the Hickey post cited is not actually about that topic, but about the promotion of the phrase ‘chemical imbalance'” – that’s where I get pretty confident they’ve missed my point.

Remember, the thesis of my last post was that the “chemical imbalance” argument hides a sort of bait-and-switch going on between the following two statements:

(A): Depression is complicated, but it seems to involve disruptions to the levels of brain chemicals in some important way

(B): We understand depression perfectly now, it’s just a deficiency of serotonin.

If you equivocate between them, you can prove that psychiatrists were saying (A), and you can prove that (B) is false and stupid, and then it’s sort of like psychiatrists were saying something false and stupid.

Given that this is my thesis, it’s exactly right for me to debate a post on “chemical imbalance” by showing that none of the quotes involved reduce the problem to just a basic serotonin deficiency!

And when Rob Wipond from MIA says he’s found “a lot of evidence to the contrary still readily available even on the web”, well, spoiler, he’s found more people saying (A).

II.

Let’s go through his examples:

For example, a 2004 Washington University in St. Louis press release, about a study published in Biological Psychiatry, states that the “brain’s serotonin receptors” are “at abnormally low levels in depressed people” and that antidepressants “work by increasing serotonin levels in the brain.”

I assume he’s talking about this press release about a study that shows abnormally low levels of serotonin receptors in depressed people. First of all, the study actually did show this. I don’t think it’s irresponsible to mention that a study shows low levels of serotonin receptors in depressed people when a study actually shows this. Second of all, the press release makes it extremely clear that they don’t know exactly what’s going on: “Little is understood about how depression makes people feel sad, but neuroscientists do know that the brain chemical serotonin is involved.” They mention that SSRIs appear to work for depression, but admit that “The bad news is that beyond that first step of increasing serotonin, we understand very little about how these drugs relieve symptoms of depression”. Finally, this study actually found something much more complicated than the prevailing narrative – a serotonin deficiency model of depression would have predicted high levels of serotonin receptors in related brain structures (more chemicals = fewer receptors) but in fact it found the opposite. This fits with the emerging theory that depression may be related to increased serotonin levels in certain parts of the brain, which SSRIs provoke a compensatory response against.

This press release is actually as good as the harshest critic could have wished for. It admits we don’t really know how depression works, it admits we don’t really know how SSRIs treat it, and then it presents the result of a study that shows that serotonin is implicated but not in the way the “serotonin deficiency” theory would expect.

The only way Mad In America turned this into a poster child for psychiatry deceiving people about serotonin was to quote from it extremely out of context.

Let’s go to their next example:

And there is prominent psychiatrist Richard Friedman writing in the New York Times in 2007 that psychiatrists were soon going to be able to conduct “a simple blood test” to determine “what biological type of depression” a person had and then treat them with the right drug. “For example,” writes Friedman, “some depressed patients who have abnormally low levels of serotonin respond to S.S.R.I.’s, which relieve depression, in part, by flooding the brain with serotonin.”

Okay, but Friedman starts with a story about how SSRIs often don’t work for patients, then says that this is because some people have depression that doesn’t seem to be serotonergic: “Some depressed patients who have abnormally low levels of serotonin respond to SSRIs, which relieve depression, in part, by flooding the brain with serotonin. Other depressed patients may have an abnormality in other neurotransmitters that regulate mood, like norepinephrine or dopamine, and may not respond to SSRIs”. He says (correctly!) that “in everyday clinical practice, we have little ability to predict what specific treatment will work for you”.

These are not the words of a drug company shill who says that depression is 100% serotonin in order to put everyone on SSRIs! These are the words of someone who agrees with me that depression is somehow related to neurotransmitters, but it’s still very uncertain which ones and how. His only sin seems to be an overly optimistic view of the speed at which we would come out with genetic tests.

Next example:

There’s also a lot of evidence that the low-serotonin theory of depression is still today being taken seriously by mainstream psychiatry and is still being promoted to the public. A current University of Bristol public education website on depression explains that, “Low serotonin levels are believed to be the cause of many cases of mild to severe depression.”

That appears to be this University of Bristol public education website. The site says it’s by “Claire Rosling”, so I searched her name and I get this roster of people’s sophomore chemistry projects. Ms. Rosling’s is…the website Mad In America cited. Apparently this was part of some college chemistry assignment where people write about molecules to compete for a £50 prize. Ms. Rosling’s was serotonin.

So Mad In America argues that the entire psychiatric establishment is pushing the “depression = serotonin” argument, but the best example they can come up with is some poor woman’s undergraduate chemistry homework?

(in case you’re wondering, she didn’t win. Some girl named Anna won for her webpage on Recycling Plastic.)

Next example!

A current Harvard Medical School special health report, “Understanding Depression”, explains that, “Research supports the idea that some depressed people have reduced serotonin transmission. Low levels of a serotonin byproduct have been linked to a higher risk for suicide.”

Once again, holy !@#$, they’re reporting the results of actual studies. It’s dishonest to do studies on serotonin and find that it is linked to depression? Anyway, when I look up the actual report it starts with the following paragraph: “It’s often said that depression results from a chemical imbalance, but that figure of speech doesn’t capture how complex the disease is. Research suggests that depression doesn’t spring from simply having too much or too little of certain brain chemicals. Rather, depression has many possible causes, including faulty mood regulation by the brain, genetic vulnerability, stressful life events, medications, and medical problems. It’s believed that several of these forces interact to bring on depression.”

Once again, this is the best you can do to find psychiatrists pushing an oversimplified version of the chemical imbalance theory??!

Next example:

WebMD’s “Depression Center” states that, “There are many researchers who believe that an imbalance in serotonin levels may influence mood in a way that leads to depression. Possible problems include low brain cell production of serotonin, a lack of receptor sites able to receive the serotonin that is made… According to Princeton neuroscientist Barry Jacobs… common antidepressant medications known as SSRIs, which are designed to boost serotonin levels, help kick off the production of new brain cells, which in turn allows the depression to lift.”

First of all, this page does not use the classic “serotonin deficiency” theory of depression. This is the hippocampal neurogenesis theory, which in my last post I specifically contrasted with the classic serotonin deficiency theory. Yes, it involves serotonin in some way, but since one of the most important facts about depression is that SSRIs treat it, every theory is going to involve serotonin in some way.

Further, right after this paragraph, WebMD continues: “Although it is widely believed that a serotonin deficiency plays a role in depression, there is no way to measure its levels in the living brain. Therefore, there have not been any studies proving that brain levels of this or any neurotransmitter are in short supply when depression or any mental illness develops. Blood levels of serotonin are measurable — and have been shown to be lower in people who suffer from depression – but researchers don’t know if blood levels reflect the brain’s level of serotonin. Also, researchers don’t know whether the dip in serotonin causes the depression, or the depression causes serotonin levels to drop.”

Once again, I see nothing here to indicate that they are covering up flaws in this theory, pushing it to unsuspecting consumers, or claiming that exploratory research is settled science. They’re presenting the best theories we’ve got, then noting how tentative they are and what the flaws are.

(on the other hand, the article does say that there are “40 million” brain cells, when in fact there are about 90 billion. I’m not saying you should trust WebMD, just that they don’t bungle depression in that particular way)

Next example:

And if the theory was never taken seriously and isn’t being taken seriously, no one has apparently told the National Academy of Sciences or two news media outlets with expert psychiatric editorial boards yet. Psychiatry Advisor’s February 12, 2015 headline for a report about a Duke University study is, “Serotonin Deficiency May Up Depression Risk.” Psychiatry Advisor explains that, “(m)ice with normal serotonin levels, the control group, did not demonstrate depression symptoms a week after the social stress, while the serotonin-deficient rodents did(.)” The study, appearing in the Proceedings of the National Academy of Sciences, states that, serotonin deficiency has been “implicated in the etiology of depression” though a cause-effect relationship has not yet been “formally established.” The researchers write that their results, “provide additional insight into the serotonin deficiency hypothesis of depression.” Medical News Today headline their report on it even more strongly: “Mouse study finds that serotonin deficiency does increase depression risk.” (Medical News Today notes in passing that an earlier, somewhat similar study by a different team came to the exact opposite findings.)

At this point Mad in America’s examples are self-refuting. I am getting the impression they will never be happy unless no news media ever covers the dozens of studies that come out each year linking depression to serotonin. I know this sounds mean, but what other conclusion am I supposed to come to? Here we have a study that provides some evidence for serotonin’s involvement, says very specifically that “a cause-effect relationship has not been formally established”, mentions that other studies have shown the opposite – and yet Mad In America still wants me to accept this as an example of irresponsibly pushing the serotonin theory!

Look. Hundreds of studies have shown some sort of relationship between serotonin and depression. At this point that’s not controversial. What’s controversial is the importance of the relationship, whether it’s causal, whether other things matter more, et cetera. Every single one of Mad In America’s examples has been pretty exemplary in saying that all of these things are still uncertain and need to be investigated further. What more could they do to be more responsible? A total blackout on all news coverage of the new evidence for serotonin’s involvement that keeps coming in?

Ironically, if people had done that, we would have far less evidence that depression was not just a simple serotonin deficiency. The most important nail in that theory’s coffin was that tianeptine, a medication that lowers serotonin levels, effectively treats depression. But that’s a study about serotonin of exactly the same sort as the Washington University study Mad In America complains about! One of the most convincing alternatives to a purely serotonergic picture is the BDNF-neurogenesis theory. But that’s exactly the theory being pushed in the WebMD article Mad In America complains about!

III.

I raised some of these issues in a comment on the Mad in America blog, and author Rob Wipond kindly responded to me:

Let me address some of these objections piece by piece:

Yes, you’re right, many psychiatrists, media, pharmaceutical companies and others promoting the serotonin deficiency theory of depression have often included generalized, softening “qualifiers” and “equivocations” such as the ones you quoted, even as they have also made those very bold, unequivocal claims explicitly intended to persuade that I quoted. Taken in full context, then how are such qualifiers different than brash infomercials on television with legal disclaimers like, “not all results will be the same for all people”?

The very bold, unequivocal claims explicitly intended to persuade that you quoted WERE A SOPHOMORE CHEMISTRY PROJECT, PLUS A BUNCH OF PEOPLE SPECIFICALLY SAYING THAT THESE SHOULD NOT BE TAKEN AS VERY BOLD UNEQUIVOCAL CLAIMS, BUT THEN YOU QUOTED OUT OF CONTEXT TO TAKE THAT PART OUT. AND THE SOPHOMORE CHEMISTRY PROJECT DIDN’T EVEN WIN THE £50 PRIZE.

Okay. Sorry. I shouldn’t have yelled like that. More seriously: there are a lot of things we don’t totally understand, but about which scientific research suggests some weak preliminary theories. For example, we don’t understand fibromyalgia, but if I were writing a textbook on fibromyalgia, or if a patient asked me what it was, then after some appropriate caveats and equivocations, I would say it has something to do with some sort of inflammation in the fascia which causes central sensitization to pain stimuli. Could I end up being totally wrong? Yeah. But at this point I think there’s enough evidence in this direction that, insofar as it’s important to satisfy patients’ curiosity about what’s going on with them, it’s proper to mention the current best guess. Likewise, if I am a researcher or a scientific publication, I don’t think I have some duty to carefully hide my results. A big part of scientific progress is people saying “I just got some small amount of evidence which makes me think it’s this” and then other people trying to confirm or refute that with more evidence, until eventually it comes together into a strong theory.

I think researchers and psychiatrists were pretty responsible in coming up with the serotonin deficiency theory. It was inspired by the effectiveness of serotonergic drugs. Then a bunch of studies – Wipond agrees there were hundreds – provided results that seemed to confirm it. Given all of this information, I don’t think it was negligent to say that there was quite a bit of evidence pointing to serotonin, as long as you followed this with caveats that the evidence was still preliminary and lots of other things seemed to be involved too. As I’ve been arguing all along, that’s exactly what most people did.

I know of no one (certainly not me) who has ever said that there were never any studies making tenuous, feeble attempts to draw links between serotonin levels and depression in different ways — there were hundreds, I believe (I haven’t counted) as the psychiatric community and pharmaceutical industry made enormous efforts to try to prove the theory or buttress its apparent validity in the public eye. And as I note, those are still being produced today. What critics have often correctly pointed out, however, is that the main, strongest argument that psychiatrists have often used in support of the low-serotonin theory has always been that SSRIs allegedly boost serotonin levels. Of course, most of the public has never known that SSRIs have barely beaten placebos in clinical trials, so they’ve not been able to understand the true spuriousness of even that argument.

I don’t really understand this objection. There was a strong piece of evidence in favor of serotonin in the form of SSRI-effectiveness, scientists pursued that lead by doing hundreds of studies implicating serotonin using different methodologies, most were in favor and so scientists thought the theory had some merit…what exactly is wrong here? This sounds like every scientific theory – Wegener noted that continents looked like they fit together in a way that implied continental drift, geologists did hundreds of other studies that all pointed to continental drift, therefore they started believing in continental drift.

While Wipond may not know of the people saying there was no evidence for serotonin besides SSRI effectiveness, these people certainly exist and provide one of his side’s major arguments. Indeed, many of the articles I linked to on my original post made exactly that argument. The BBC said that: “although ideas like the serotonin theory of depression have been widely publicised, scientific research has not detected any reliable abnormalities of the serotonin system in people who are depressed.” New York Review Of Books says: “Instead of developing a drug to treat an abnormality, an abnormality was postulated to fit a drug…But the main problem with the theory is that after decades of trying to prove it, researchers have still come up empty-handed”.

To learn more about the claim that SSRIs barely beat placebo, see my article on this.

I notice that your argument has now changed to “far fewer people” and “less important” psychiatrists made such claims, rather than none at all made such claims. Very well; apparently we would now only potentially disagree on subjective notions such as how many is “fewer” and how unimportant is “less important”, rather than disagreeing on the main issue at hand. And then Leo and Lacasse’s question becomes all the more significant: Where is the evidence that the “important” psychiatrists were vigorously trying to correct the public record and clarifying that these were only weak hypotheses with no compelling evidence to support them instead of weighty theories with ever mounting evidence to support them?

In retrospect, “no one has ever said” is a stupid thing for me to have said. I do not deny that a sophomore at University of Bristol once said low serotonin caused depression. And you can find individual psychiatrists who believe a lot of stupid stuff. Some psychiatrists believe in homeopathy. Some psychiatrists believe in reincarnation (the guy in that article conducted my job interview at the University of Virginia. I tried to be very polite.) Some psychiatrists believe that after losing hundreds of thousands of dollars to online Nigerian scammers, it makes perfect sense to give hundreds of thousands more dollars to other Nigerian scammers, because “these were different Nigerians”. But I will venture to say none of these are consensus positions in the psychiatric community.

And this is why I wanted to continue this discussion here on this blog. If I had selected a set of statements from eminent psychiatrists that had lots of caveats and were extremely responsible, I could be justifiably accused of cherry-picking. Instead, Mad In America selected some statements, probably intending to cherry-pick the other way, but when looked at more closely, they’re all pretty responsible and say exactly what I would have said at the time – SSRIs seem to work, there’s some evidence pointing to serotonin being involved, but the whole thing is terribly complicated. To me, this establishes the consensus position in a way much more clearly than I could have done on my own.

And yes, this consensus position got simplified and distorted. I have no doubt that drug companies drew from it to do exactly the sort of infomercials that Mr. Wipond describes. I have no doubt that individual psychiatrists, when faced with low-functioning patients who are bad at understanding complicated systems but who really wanted to know what was going on, said “serotonin” and left it at that. And I have no doubt that to a public who still largely think evolution means “once upon a time a chimp gave birth to a human baby”, complicated caveats about how serotonin levels are linked to depression but might not cause depression largely went over their heads except for the single word “serotonin”.

(“That’s the happiness molecule! Right?”)

But in general I think my point stands. “Chemical imbalance” as generally used points to a sophisticated model of interacting metabolic pathways which goes far beyond serotonin, and which as far as I know is still very much on the table. While serotonin was justifiably pointed to as a promising candidate early on, it was generally done with appropriate caveats that turned out to be warranted, and the research community has now retreated from some of that earlier language while still considering serotonin a promising lead. And SSRIs continue to be moderately effective antidepressants in the people for whom they are indicated.

(ie somewhere less than half of the people for whom they are prescribed).


Trouble Walking Down The Hallway

Williams and Ceci just released National Hiring Experiments Reveal 2:1 Faculty Preference For Women On STEM Tenure Track, showing a strong bias in favor of women in STEM hiring. I’ve previously argued something like this was probably the case, so I should be feeling pretty vindicated.

But a while ago I wrote Beware The Man Of One Study, in which I wrote that there is such a variety of studies finding such a variety of contradictory things that anybody can isolate one of them, hold it up as the answer, and then claim that their side is right and the other side are ‘science denialists’. The only way to be sure you’re getting anything close to the truth is to examine the literature of an entire field as a gestalt.

And here’s something no one ever said: “Man, I’m so glad I examined the literature of that entire field as a gestalt, things make much more sense now.”

Two years ago Moss-Racusin et al released Science Faculty’s Subtle Gender Biases Favor Male Students, showing a strong bias in favor of men in STEM hiring. The methodology was almost identical to this current study, but it returned the opposite result.

Now everyone gets to cite whichever study accords with their pre-existing beliefs. So Scientific American writes Study Shows Gender Bias In Science Is Real, and any doubt has been deemed unacceptable by blog posts like Breaking: Some Dudes On The Internet Refuse To Believe Sexism Is A Thing. But the new study, for its part, is already producing headlines like The Myth About Women In Science and blog posts saying that it is “enough for everyone who is reasonable to agree that the feminists are spectacular liars and/or unhinged cranks”.

So probably we’re going to have to do that @#$%ing gestalt thing.

Why did these two similar studies get such different results? Williams and Ceci do something wonderful that I’ve never seen anyone else do before – they include in their study a supplement admitting that past research has contradicted theirs and speculating about why that might be:

1. W&C investigate hiring tenure-track faculty; MR&a investigate hiring a “lab manager”. This is a big difference, but as far as I can tell, W&C don’t give a good explanation for why there should be a pro-male bias for lab managers but a pro-female bias for faculty. The best explanation I can think of is that there have been a lot of recent anti-discrimination campaigns focusing on the shortage of female faculty, so that particular decision might activate a cultural script where people think “Oh, this is one of those things that those feminists are always going on about, I should make sure to be nice to women here,” in a way that just hiring a lab manager doesn’t.

Likewise, hiring a professor is an important and symbolic step that…probably doesn’t matter super-much to other professors. Hiring a lab manager is a step without any symbolism at all, but professors often work with them on a daily basis and depend on their competency. That might make the first decision Far Mode and the second Near Mode. Think of the Obama Effect – mildly prejudiced people who might be wary at the thought of having a black roommate were very happy to elect a black President and bask in a symbolic display of tolerance that made no difference whatsoever to their everyday lives.

Or it could be something simpler. Maybe lab work, which is very dirty and hands-on, feels more “male” to people, and professorial work, which is about interacting with people and being well-educated, feels more “female”. In any case, W&C say their study is more relevant, because almost nobody in academic science gets their start as a lab manager (they polled 83 scientists and found only one who had).

2. Both W&C and MR&a ensured that the male and female resumes in their study were equally good. But W&C made them all excellent, and MR&a made them all so-so. Once again, it’s not really clear why this should change the direction of bias. But here’s a hare-brained theory: suppose you hire using the following algorithm: it’s very important that you hire someone at least marginally competent. And it’s somewhat important that you hire a woman so you look virtuous. But you secretly believe that men are more competent than women. So given two so-so resumes, you’ll hire the man to make sure you get someone competent enough to work with. But given two excellent resumes, you know neither candidate will accidentally program the cyclotron to explode, so you pick the woman and feel good about yourself.
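The hare-brained theory above is literally a decision rule, so it can be written down. This is a minimal sketch of the hypothesized algorithm; the quality scale and the 0.8 “definitely competent” bar are illustrative assumptions, not anything measured in either study.

```python
def choose(man_quality: float, woman_quality: float,
           competence_bar: float = 0.8) -> str:
    """Pick a candidate under the hypothesized rule: if both resumes
    clearly clear the competence bar, indulge the preference for hiring
    a woman; otherwise fall back on the (secret) belief that men are
    more competent."""
    if man_quality >= competence_bar and woman_quality >= competence_bar:
        return "woman"   # both excellent: pick the woman, feel virtuous
    return "man"         # both so-so: play it 'safe'

# Two so-so resumes (the MR&a condition) -> pro-male bias
print(choose(0.5, 0.5))  # man
# Two excellent resumes (the W&C condition) -> pro-female bias
print(choose(0.9, 0.9))  # woman
```

The point of the sketch is just that one fixed set of preferences can produce opposite-signed “bias” depending on where the resumes sit relative to the competence bar.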

And here are some other possibilities that they didn’t include in their supplement, but which might also have made a difference.

3. W&C asked “which candidate would you hire?”. MR&a said “rate each candidate on the following metrics” (including hireability). Does this make a difference? I could sort of see someone who believed in affirmative action saying something like “the man is more hireable, but I would prefer to hire the woman”. Other contexts prove that even small differences in the phrasing of a question can lead to major incongruities. For example, as of 2010, only 34% of people polled strongly supported letting homosexuals serve in the military, but half again as many – a full 51% – expressed that level of support for letting “gays and lesbians” serve in the military. Ever since reading that I’ve worried about how many important decisions are being made by the 17% of people who support gays and lesbians but not homosexuals.

For all we know maybe this is the guy in charge of hiring for STEM faculty positions

4. Williams and Ceci asked participants to choose between “Dr. X” (who was described using the pronouns “he” and “him”) and “Dr. Y” (who was described using the pronouns “she” and “her”). Moss-Racusin et al asked participants to choose between “John” and “Jennifer”. They said they checked to make sure that the names were rated equal for “likeability” (whatever that means), but what if there are other important characteristics that likeability doesn’t capture? We know that names have big effects on our preconceptions of people. For example, people with shorter first names earn more money – each extra letter costs an average of $3600 a year. If we trust this study (which may not be wise), John already has a $14,400 advantage on Jennifer, which goes a lot of the way to explaining why the participants offered John higher pay without bringing gender into it at all!
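Back-of-the-envelope for the name-length claim, at $3600 of salary per extra letter (the cited study’s figure):

```python
per_letter = 3600  # claimed average salary cost per extra letter
advantage = (len("Jennifer") - len("John")) * per_letter
print(advantage)  # 14400
```

Four extra letters times $3600 is the $14,400 gap mentioned above.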

Likewise, independently of a person’s gender they are more likely to succeed in a traditionally male field if they have a male-sounding name. That means that one of the…call it a “prime” that activates sexism…might have been missed by comparing Dr. X to Dr. Y, but captured by pitting the masculine-sounding John against the feminine-sounding Jennifer. We can’t claim that W&C’s subjects were rendered gender-blind by the lack of gender-coded names – they noticed the female candidates enough to pick them twice as often as the men – but it might be that hearing the candidates’ names would have activated the idea of gender from a different direction than the pronouns alone did.

5. Commenter Lee points out that MR&a tried to make their hokey hypothetical hiring seem a little more real than W&C did. MR&a suggest that these are real candidates being hired…somewhere…and the respondents have to help decide whom to hire (although they still use the word “imagine”). W&C clearly say that this is a hypothetical situation and ask the respondents to imagine that it is true. Some people in the comments are arguing that this makes W&C a better signaling opportunity whereas MR&a stays in near mode. But why would people not signal on a hiring question being put to them by people they don’t know about a carefully-obscured situation in some far-off university? Are sexists, out of the goodness of their hearts, urging MR&a to hire the man out of some compassionate desire to ensure they get a qualified candidate, but when W&C send them a hypothetical situation, they switch back into signaling mode?

6. Commenter Will points out that MR&a send actual resumes to their reviewers, but W&C send only a narrative that sums up some aspects of the candidates’ achievements and personalities (this is also the concern of Feminist Philosophers). This is somewhat necessitated by the complexities of tenure-track hiring – it’s hard to make up an entire fake academic when you can find every published paper in Google Scholar – but it does take them a step away from realism. They claim that they validated this methodology against real resumes, but it was a comparatively small validation – only 35 people. On the other hand, even this small validation was highly significant for pro-female bias. Maybe for some reason getting summaries instead of resumes heavily biases people in favor of women?

Or maybe none of those things mattered at all. Maybe all of this is missing the forest for the trees.

I love stories about how scientists set out to prove some position they consider obvious, but unexpectedly end up changing their minds when the results come in. But this isn’t one of those stories. Williams and Ceci have been vocal proponents of the position that science isn’t sexist for years now – for example, their article in the New York Times last year, Academic Science Isn’t Sexist. In 2010 they wrote Understanding Current Causes Of Women’s Underrepresentation In Science, which states:

The ongoing focus on sex discrimination in reviewing, interviewing, and hiring represents costly, misplaced effort: Society is engaged in the present in solving problems of the past, rather than in addressing meaningful limitations deterring women’s participation in science, technology, engineering, and mathematics careers today. Addressing today’s causes of underrepresentation requires focusing on education and policy changes that will make institutions responsive to differing biological realities of the sexes.

So they can hardly claim to be going into this with perfect neutrality.

But the lead author of the study that did find strong evidence of sexism, Corinne Moss-Racusin (whose name is an anagram of “accuser on minor sins”) also has a long history of pushing the position she coincidentally later found to be the correct one. A look at her resume shows that she has a bunch of papers with titles like “Defending the gender hierarchy motivates prejudice against female leaders”, “‘But that doesn’t apply to me:’ teaching college students to think about gender”, and “Engaging white men in workplace diversity: can training be effective?”. Her symposia have titles like “Taking a stand: the predictors and importance of confronting discrimination”. This does not sound like the resume of a woman whose studies ever find that oh, cool, it looks like sexism isn’t a big problem here after all.

So what conclusion should we draw from the people who obviously wanted to find a lack of sexism finding a lack of sexism, but the people who obviously wanted to find lots of sexism finding lots of sexism?

This is a hard question. It doesn’t necessarily imply the sinister type of bias – it may be that Drs. Williams and Ceci are passionate believers in a scientific meritocracy simply because that’s what all their studies always show, and Dr. Moss-Racusin is a passionate believer in discrimination because that’s what her studies find. On the other hand, it’s still suspicious that two teams spend lots of time doing lots of experiments, and one always gets one result, and the other always gets the other. What are they doing differently?

Problem is, I don’t know. Neither study here has any egregious howlers. In my own field of psychiatry, when a drug company rigs a study to put their drug on top, usually before long someone figures out how they did it. In these two studies I’m not seeing anything.

And this casts doubt upon those six possible sources of differences listed above. None of them look like the telltale sign of an experimenter effect. If MR&a were trying to fix their study to show lots of sexism, it would have taken exceptional brilliance to do it by using the names “John” versus “Jennifer”. If W&C were trying to fix their study to disguise sexism, it would have taken equal genius to realize they could do it by asking people “who would you hire?” rather than “who is most hireable?”.

(the only exception here is the lab manager. It’s just within the realm of probability that MR&a might have somehow realized they’d get a stronger signal asking about lab managers instead of faculty. The choice to ask about lab managers instead of faculty is surprising and does demand an explanation. And it’s probably the best candidate for the big difference between their results. But for them to realize that they needed to pull this deception suggests an impressive ability to avoid drinking their own Kool-Aid.)

Other than that, the differences I’ve been considering in these studies are the sort that would be very hard to purposefully bias. But the fact that both groups got the result they wanted suggests that the studies were purposefully biased somehow. This reinforces my belief that experimenter effects are best modeled as some sort of mystical curse incomprehensible to human understanding.

(now would be an excellent time to re-read the horror stories in Part IV of “The Control Group Is Out Of Control”)

Speaking of horror stories. Sexism in STEM is, to put it mildly, a hot topic right now. Huge fortunes in grant money are being doled out to investigate it (Dr. Moss-Racusin alone received nearly a million dollars in grants to study STEM gender bias) and thousands of pages are written about it every year. And yet somehow the entire assembled armies of Science, when directed toward the problem, can’t figure out whether college professors are more or less likely to hire women than men.

This is not like studying the atmosphere of Neptune, where we need to send hundred-million dollar spacecraft on a perilous mission before we can even begin to look into the problem. This is not like studying dangerous medications, where ethical problems prevent us from doing the experiments we really need. This is not like studying genetics, where you have to gather large samples of identical twins separated at birth, or like climatology, where you hang out at the North Pole and might get eaten by bears. This is a survey of college professors. You know who it is studying this? College professors. The people they want to study are in the same building as them. The climatologists are getting eaten by bears, and the social psychologists can’t even settle a question that requires them to walk down the hallway.

It’s not even like we’re trying to detect a subtle effect here. Both sides agree that the signal is very large. They just disagree what direction it’s very large in!

A recent theme of this blog has been that Pyramid Of Scientific Evidence be damned, our randomized controlled trials suck so hard that a lot of the time we’ll get more trustworthy information from just looking at the ecological picture. Williams and Ceci have done this (see Part V, Section b of their supplement, “Do These Results Differ From Actual Hiring Data”) and report that studies of real-world hiring data confirm women have an advantage over men in STEM faculty hiring (although far fewer of them apply). It also matches the anecdotal evidence I hear from people in the field. I’m not necessarily saying I’m ambivalent between the two studies’ conclusions. Just that it bothers me that we have to go to tiebreakers after doing two good randomized controlled trials.

At this point, I think the most responsible thing would be to have a joint study by both teams, where they all agree on a fair protocol beforehand and see what happens. Outside of parapsychology I’ve never heard of people taking such a drastic step – who would get to be first author?! – but at this point it’s hard to deny that it’s necessary.

In conclusion, I believe the Moss-Racusin et al study more, but I think the Williams and Ceci study is more believable. And the best way to fight sexism in science is to remind people that it would be hard for women to make things any more screwed up than they already are.

OT18: Istanbul, Not Constantinopen

…or Lygos, Byzantium, Miklagard, Nea Roma, Tsargrad, or any of the others

This is the semimonthly open thread. Post about anything you want, ask random questions, whatever. Also:

1. Still busy. Continue to expect less blogging.

2. There were some good counterarguments to some of the links in my last link roundup that I wanted to highlight before anyone gets misled. Luis Coelho points out some problems with the microbiome-sweetener theory. Albatross doesn’t believe that hospitals are raking in the dough. And Scott Sumner says contra Paul Krugman that interstate migration to the south is driven by low taxes, not good weather.

3. Comments of the week are Navin Kumar on poverty traps, Ivvenalis on military suicide, and Janet Johnson on what her work as an educational consultant has taught her about “growth mindset”.

4. I’ve given up on Ozy ever posting any more open threads at their place, but you still can’t discuss race and gender here. No, life isn’t fair.

Posted in Uncategorized | Tagged | 607 Comments