Archive for July, 2006
Stats.org breastfeeding article, Part 2 – In which we get bogged down in the murky details of statistics
(This post first appeared on the Good Enough Mum blog, here.)
The story so far: Goldin, Smyth, and Foulkes, of STATS.org, claim to have the truth about What Science Really Says About Breastfeeding – unlike the AAP and the NYT, who are, allegedly, using sloppy science and misleading us all on the issue. They start out their article by listing what would appear to be every possible or potential breastfeeding-related problem they could manage to come up with. Having thus set the scene for their impartial and unbiased approach to the subject, they proceed to discuss the statistical evidence.
Hang onto your hats – we may have to start getting technical at this point. If I’m going too fast, just wave your arms at me and yell loudly, or something.
The article does raise some crucial points about the difficulties with research into breastfeeding. As they point out, it is not possible (for obvious ethical reasons) to conduct the gold standard of research – a trial in which mothers are assigned by the toss of a coin or equivalent procedure into breastfeeding or non-breastfeeding groups. (One point that I must make here, to soothe my pedantic little soul – this type of trial would be a randomised controlled trial, not, as they called it, a ‘case-controlled study’. A case-control study is something completely different. While it doesn’t ultimately make a difference to the point they were making, I did find it bizarre that two statistics professors could make such an elementary mistake.)
Non-randomised studies have a flaw in them from the start – they’re subject to what we call confounding factors. Mothers who breastfed, and their babies, are likely to differ in other crucial ways from mothers and babies who didn’t. Women who choose to breastfeed may well be making other choices about their parenting that differ from those of women who choose to formula-feed; women who are unable to breastfeed or to continue breastfeeding may have been rendered unable by some factor that, in itself, is relevant to the baby’s health. This makes it difficult to know to what extent the differences found between breastfed and non-breastfed babies are due to the breastfeeding itself, and to what extent they’re due to factors that tend, in practice, to be associated more with breastfeeding than with formula feeding or vice versa.
There are statistical ways to take confounding factors into account in a study analysis and hence cancel out their effect on the end results, and any good-quality research will do this as far as possible. The problem, however, is that we can only do that for confounders that we know of and can collect data on. This is a potential source of bias in any non-randomised study. It’s an inevitable flaw in breastfeeding research, and STATS.org are quite right to point it out.
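To make the confounding problem concrete, here’s a toy numerical sketch – every count below is invented purely for illustration. Within each stratum of some hypothetical risk factor, the illness rate is identical whichever way the babies are fed; yet the crude, unstratified comparison makes formula feeding look twice as risky, simply because formula feeding happens to be commoner in the high-risk stratum.

```python
# Invented counts illustrating confounding: within each stratum the
# illness rate is the same for both feeding groups, yet the crude
# comparison suggests formula feeding doubles the risk, purely because
# formula feeding is commoner in the high-risk stratum.
groups = {
    # (feeding, stratum): (babies, illnesses)
    ("breastfed", "low_risk"):  (8_000,  8),   # 0.10%
    ("formula",   "low_risk"):  (2_000,  2),   # 0.10%
    ("breastfed", "high_risk"): (2_000,  8),   # 0.40%
    ("formula",   "high_risk"): (8_000, 32),   # 0.40%
}

def crude_rate(feeding):
    """Illness rate ignoring the confounder entirely."""
    n = sum(b for (f, _), (b, i) in groups.items() if f == feeding)
    ill = sum(i for (f, _), (b, i) in groups.items() if f == feeding)
    return ill / n

print(f"crude breastfed: {crude_rate('breastfed'):.2%}")  # 0.16%
print(f"crude formula:   {crude_rate('formula'):.2%}")    # 0.34%
# Stratified, the 'effect' vanishes: 0.10% vs 0.10%, 0.40% vs 0.40%.
```

Adjusting for a confounder amounts to making the stratified comparison rather than the crude one – which, of course, is only possible for confounders you have actually measured.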
However, using this problem as a reason to be appropriately cautious about interpretation of results is one thing; using it selectively as an excuse to reject only the research whose results you don’t like is another. I’ve previously mentioned one of our most deep-rooted sources of bias; our tendency to reserve our criticisms of study design only for studies whose conclusions we don’t like. This article was, as it happened, the perfect example. Smoking can no more be randomised than breastfeeding can, and hence all our existing research into the harms of smoking in humans is based on non-randomised studies. But STATS.org’s criticism of the research into breastfeeding (which they ultimately dismiss as “voodoo science”) stands in stark contrast to their unquestioning acceptance of the research showing that smoking during pregnancy is harmful.
Please don’t misunderstand this: I am not saying that smoking during pregnancy is harmless. Quite the reverse. I am saying that in spite of the flaws inherent in non-randomised studies, we have no problem saying that the research on smoking and pregnancy is sufficient for us to accept a harmful effect. We don’t dismiss that evidence out of hand simply because the studies aren’t perfect; and, similarly, we are not justified in simply dismissing the huge number of studies that show beneficial effects from breastfeeding.
A far more realistic and constructive approach would be to consider what criteria a good-quality study should fit, pick out the studies that met those criteria, and consider the strengths and weaknesses of the evidence overall. An article aimed at doing that could have been both useful and fascinating. (Writing it is on my list of things to do in that mysterious alternative universe I keep hoping to stumble into where I actually get large amounts of spare time.) Goldin, Smyth and Foulkes, however, simply seem to have picked out a few studies they could pick at and acted as though these were representative of the body of research generally.
For example, the article’s conclusion that the benefits of breastfeeding are limited to ‘certain kinds of low-risk infections’ seems to be based largely on analysis of a single study. Not only was the study in question fairly small, but, from the STATS.org description of it, it seems the two groups being compared could be roughly described, not as “ever breastfed” and “never breastfed”, but as “sometimes breastfed, quite a lot of formula” and “sometimes formula-fed, quite a lot of breastfeeding”. This is a design flaw that is automatically going to cause the study to underestimate any breastfeeding benefits, because the effect is going to be so diluted by the overlap between the groups. In view of these problems, it’s telling that this study came up with any benefits for breastfeeding at all – we really can’t deduce much from the fact that the benefits it found were limited. STATS.org, however, seem to be taking it as the final word on the matter.
Now, the AAP position paper on breastfeeding from which STATS.org takes this reference cites – by my count – sixty-eight references for studies showing possible short-term or long-term benefits for breastfed babies (plus fourteen references to potential benefits for the mother). STATS.org single out a grand total of five of these for specific discussion (if we count the passing mention of the studies on breastfeeding and diabetes as ‘discussion’). So, out of all those dozens of studies, why did STATS.org place so much weight on one that seems so likely to underestimate benefits of breastfeeding?
The only reason we’re given why this particular study is singled out for mention is that it is, supposedly, an example of one of many studies that, according to STATS.org, “simply didn’t find what AAP claimed they did”. In other words, STATS.org claim that AAP are making incorrect claims about study findings. A serious accusation indeed.
Except that it doesn’t seem to be true. Or, at any rate, the authors totally fail to produce any evidence to support it. They claim that the lack of difference in rates of respiratory infection in the study “contradicts the AAP’s claim that there were decreased upper and lower-respiratory illnesses for nursed babies”. But the AAP didn’t claim that this particular study showed a difference in rates of respiratory infection. They say that it showed a difference in rates of diarrhoea – which it does indeed. (They cited nine studies as references for their claim that rates of respiratory tract infection are decreased. Goldin, Smyth and Foulkes discuss none of these.)
Are the authors deliberately lying, or are they just very sloppy about checking details? Either way, it doesn’t say much for their reliability. We are given no details on the other supposed studies that “simply didn’t say what the AAP claimed they did”, so I couldn’t assess whether there was any truth to this claim at all. However, this mistake on the part of STATS.org doesn’t bode well.
What did STATS.org tell us about the other studies it discussed? The most important was the Chen and Rogan study on which the AAP base their claim of reduced mortality in breastfed babies. STATS.org dismiss this on the grounds that the study showed that breastfed infants were less likely to die of injuries. True, but certainly not the whole truth.
There’s another statistical concept that needs explaining briefly here – the idea of statistical significance. Simply put, statistical significance is a measure of how unlikely it is that the findings in a study are down to nothing more than coincidence. It’s normal to get small differences between the outcomes in two groups purely by chance, just as it’s normal to get 501 heads rather than 500 if you flip a coin a thousand times. But if a thousand coin flips come up with 600 heads, there’s probably something about the coin that’s giving you that result; and, similarly, the larger the differences in outcomes between two groups that differ only in the factor you’re studying, the larger the likelihood that the differences in outcomes are genuinely due to differences in that factor rather than to sheer coincidence. By convention, once the chances of getting a particular result by sheer chance are less than one in twenty, that result is held to be ‘statistically significant’.
The difference in size between two outcomes necessary for the result to be statistically significant depends, among other things, on the frequency of the outcomes. With small groups, a tiny difference between the numbers is less statistically significant than it would be with big groups. (If you flip a coin 1000 times and get 600 heads, there’s probably something odd about the coin – if you flip a coin 10 times and get 6 heads, there’s nothing particularly significant about that, even though the proportion of heads is the same in each case.) Hence, when you’re studying an outcome that’s as rare as infant death in the USA fortunately is, a difference between the figures in two groups has to be quite a sizeable percentage of the overall numbers in order to show up as statistically significant. The more you split the groups down into sub-groups, the less likely it is that even a genuine difference will achieve statistical significance, because there just won’t be the numbers for it to do so.
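The coin-flip examples above can be checked directly with an exact binomial calculation – a quick sketch using nothing beyond Python’s standard library:

```python
from math import comb

def binom_tail(n, k, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p); a fair coin by default."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# 6 heads out of 10: a 60% proportion, tiny sample
print(binom_tail(10, 6))      # ≈ 0.377 – entirely unremarkable

# 600 heads out of 1000: the same 60% proportion, large sample
print(binom_tail(1000, 600))  # vanishingly small – far below 1 in 20
```

The same 60% proportion of heads is wholly unremarkable in ten flips, but overwhelming evidence of a biased coin in a thousand – which is exactly why rare outcomes like infant death need large differences, or large samples, before significance can show up.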
This, as far as I can tell, is what seems to have happened in the Chen and Rogan study. The authors looked at death rates across the board (the only causes excluded from their analysis were cancers and congenital defects). Death rates were down overall and in each subgroup studied. However, the subgroups of babies dying from infections, SIDS, or other causes were each too small for the difference to show up as statistically significant. It’s only when you combine all the deaths from all causes that you get a group large enough for the statistical significance to show up.
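Here’s a sketch of that pooling effect with entirely made-up numbers (they are not Chen and Rogan’s data): the same reduction in each of three causes of death fails to reach significance cause by cause, but does once the causes are combined. The two-proportion z-test below is a standard normal approximation, used purely for illustration.

```python
from math import sqrt
from statistics import NormalDist

def z_test(d1, n1, d2, n2):
    """Two-sided p-value for a two-proportion z-test (normal approximation)."""
    p1, p2 = d1 / n1, d2 / n2
    p = (d1 + d2) / (n1 + n2)                      # pooled proportion
    se = sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    z = abs(p1 - p2) / se
    return 2 * (1 - NormalDist().cdf(z))           # two-sided tail

# Invented numbers: each of three causes of death shows the same reduction
# (6 vs 14 deaths per 10,000), but no single cause reaches significance...
print(z_test(6, 10_000, 14, 10_000))   # ≈ 0.07 – not significant

# ...while pooling all three causes together does.
print(z_test(18, 30_000, 42, 30_000))  # ≈ 0.002 – significant
```

The per-cause comparisons and the pooled comparison describe exactly the same underlying reduction; only the group sizes differ.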
Now, this study is certainly not without flaw (something the authors themselves freely acknowledge). And it’s also worth noticing that even if the 21% reduction in death rates is the true figure and not due to some confounding factor for which the authors couldn’t adjust, that equates to an extremely small risk for any individual formula-fed infant – that level of risk would mean that for every fifty thousand children not breastfed, nine would die as a result. But looking at the results realistically is one thing; dismissing them on spurious grounds because they don’t happen to suit you is another thing entirely, especially when other studies have come up with similar evidence. (STATS.org tell us that the reduced rates of SIDS in this study weren’t statistically significant; what they don’t mention are the other studies cited by the AAP that show a possible link.)
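As a back-of-envelope check on those figures – taking the nine-per-fifty-thousand number at face value, and assuming the 21% is a relative reduction applied to the formula-fed group’s mortality rate (an assumption of mine, not a figure from the study):

```python
# The figures above: a 21% relative reduction in mortality, equating to
# roughly 9 extra deaths per 50,000 formula-fed infants.
excess_deaths = 9 / 50_000           # absolute risk difference
print(f"{excess_deaths:.4%}")        # 0.0180% – tiny for any individual

# If a 21% relative reduction produces that absolute difference, the
# implied formula-fed mortality rate (from the causes studied) is roughly:
baseline = excess_deaths / 0.21
print(f"{baseline * 1000:.2f} per 1,000 infants")  # ≈ 0.86 per 1,000
```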
The only other studies about which STATS.org had anything to say were the three pointing towards a possible association between breastfeeding and decreased risk of diabetes. Two of these were apparently dismissed on the grounds of being based on Chilean and Pima Indian children respectively (why this should be grounds for ignoring them was not explained). The third study, the authors claim, “only found results for children exposed to food. Infant formula wasn’t even considered!” Which is most peculiar, because when I checked out the abstract it certainly mentioned finding an association between diabetes and early cow’s milk exposure (in babies who were already at high risk of diabetes), and cow’s milk was a major ingredient of formula last time I checked.
Of course, although STATS.org are incorrect in saying that no benefit has hitherto been shown of breastfeeding as far as diabetes prevention goes, it’s true that the evidence so far is still in the early and tentative stages. But the AAP’s paper doesn’t try to claim otherwise – diabetes was one of the conditions listed in the section that specified “Some studies suggest decreased rates… Additional research in this area is warranted.” So, again – why did the authors single out this particular topic for further discussion, when several important risks for which the AAP do claim strong evidence of benefit from breastfeeding (meningitis, sepsis, necrotising enterocolitis) were ignored?
Because, it seems, this was their chance to get in a swipe at the NYT. “The Times takes the concept that an indictment is as good as a conviction to new heights” trumpet the authors, under the subheading “Baseless reporting”. What they conveniently omitted to mention was that the Times did actually specify that there wasn’t enough evidence to prove a link. I don’t know whether STATS.org are bashing the NYT solely in order to discredit what they have to say about breastfeeding, or whether it’s actually the other way round and they have some grudge against the NYT which is colouring their interpretation of subjects on which the NYT report. What I do know is that by this stage it was clear that, whatever the authors pretended, they weren’t even attempting to look at the NYT article impartially.
They use the same technique of telling only part of the truth in order to pooh-pooh the AAP’s conclusions about the economic benefits of breastfeeding. The AAP, they say, “is not officially in the business of making economic calculations” (side note: is that true? As an employee of the National Health Service, I’m intrigued by the idea of a country in which a major medical body can get away without being in the business of making economic calculations), and their arguments about the economic benefits “are simply bad (social) science, and are fed by conviction or opportunism rather than hard evidence”. But what they fail to mention is that the AAP aren’t simply making it up as they go along; they cite four studies and two economic analyses (which appear, from the government think-tanks mentioned in the article, to have been done by people who are officially in the business of making economic calculations) as evidence for their claims. (One of the studies was a comparison of breastfeeding and formula-feeding among employed mothers, making a nonsense of STATS.org’s claim that economic benefits would be cancelled out by the incompatibility of breastfeeding and full-time employment.)
So, the authors conclude, what should we take away from this? Their “inescapable conclusion” is, apparently, that it is “nothing short of irresponsible” for a public health campaign to have compared not breastfeeding to smoking during pregnancy. (This was, apparently, their biggest concern with the whole NYT article; I was somewhat amused that it was that, rather than the comparison with riding a mechanical bull during pregnancy, that apparently struck them as shockingly inappropriate.)
They also make one rather good point in their conclusion; namely, that we take risks every day, with our children as well as ourselves (crossed a road with your child recently?), and that it’s quite normal to accept a certain amount of risk if you feel the benefits are worthwhile. But to make these sorts of choices, we need accurate information about what the risks and benefits actually are. On the subject of choosing not to breastfeed, STATS.org mislead us sadly, and to an extent that can only be deliberate, about both.