On aggregation/averaging of poll results

Garbage in, less accurate forecast out


  1. Interesting analysis by @ForecasterEnten - I have one main comment below, and am making it in this Storify rather than trying to condense it to 140 characters.
  2. Harry notes the argument that an average of poll results produces a better forecast than any single poll does. After comparing error rates in races with and without "gold-standard" polls, he concludes: "Both of these error rates do suggest that averaging polls leads to lower error rates, but you need better polls in order to get the best predictions."

    My question: If "nontraditional" polls perform worse than "gold-standard" polls, then in races that do have "gold-standard" polls, what good comes from throwing "nontraditional" estimates into the average? Doesn't that just diminish the accuracy of the forecast? (That said, I wholeheartedly agree with Harry's broader point that we face a dangerous shortage of higher-quality polls, so that where there are only "nontraditional" polls, the question becomes whether those are better than no polls at all.)

    I've long believed (and in years of guiding Associated Press reporters in how to cover polls, I preached) that in assessing a race, it's better to consider multiple surveys from KNOWN GOOD pollsters - those using methodologies proven to be valid and reliable - rather than focusing on any one of them. But the move toward poll aggregation in recent years has entailed throwing all kinds of polls into the mix. Some aggregators, like @FiveThirtyEight and @UpshotNYT, weight pollsters based on track record and prima facie risk of bias (campaign polls), but the evidence presented by Enten and others before him seems to suggest these models would produce more accurate estimates if they excluded lesser polls altogether. 

    To put it coarsely: Garbage in, less accurate aggregate forecast out. 

    Before my inbox erupts in epic conflagration, I'd hasten to add: No, "nontraditional" does not automatically = "garbage." In fact, some nontraditional pollsters laudably put a great deal of thought and effort into their work, adhering to many "best practices" while straying from one or more "traditional" approaches. Note, however, that the two preceding sentences combine for far more than 140 characters.

    I also realize there is subjectivity in all of this. One person's garbage may be another's roses. Harry's definition may leave out some surveys that others would consider "gold-standard." Etc. Ideally, poll aggregators give us tools to make our own filtering decisions, as HuffPost Pollster does with its "Create Your Own" functionality.

    Now let me go and secure the asbestos lining in my inbox.
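
A postscript for readers who want to see the arithmetic behind the "garbage in" worry: below is a minimal sketch in Python, not anything taken from Harry's analysis or from any aggregator's actual model. Every number in it is hypothetical. It assumes polls are independent, that each has a fixed bias and a standard deviation (both in percentage points), and that the aggregate is a plain unweighted average. The function name `average_rmse` is my own invention for illustration.

```python
import math

def average_rmse(polls):
    """Root-mean-square error of an unweighted average of independent polls.

    polls: list of (bias, sd) tuples, in percentage points (hypothetical values).
    """
    n = len(polls)
    # Bias of the average is the mean of the individual biases.
    bias = sum(b for b, _ in polls) / n
    # Variance of the average of independent estimates: sum of variances / n^2.
    variance = sum(s * s for _, s in polls) / (n * n)
    return math.sqrt(bias * bias + variance)

# Made-up inputs: three unbiased "gold-standard" polls, plus three noisier
# polls that (by assumption) all lean 2 points in the same direction.
gold = [(0.0, 3.0)] * 3
lesser_biased = [(2.0, 5.0)] * 3
# Same noisier polls, but assumed to have no shared lean at all.
lesser_unbiased = [(0.0, 5.0)] * 3

rmse_gold_only = average_rmse(gold)
rmse_mixed_biased = average_rmse(gold + lesser_biased)
rmse_mixed_unbiased = average_rmse(gold + lesser_unbiased)

print(f"gold only:             {rmse_gold_only:.2f}")
print(f"gold + biased lesser:  {rmse_mixed_biased:.2f}")
print(f"gold + unbiased lesser: {rmse_mixed_unbiased:.2f}")
```

Under these made-up inputs, adding the lesser polls makes the average worse when they share a lean, and slightly better when they are merely noisier. Which case applies to real "nontraditional" polls is an empirical question, and different assumed numbers would move the result.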