N-best evaluation of scientists

A better way to evaluate scientists?


  1. A little while ago I posed a question on Twitter about evaluating scientists on a sample of their best work, rather than all their work. I wanted to know if anyone was familiar with such a system.
  2. This stemmed from a conversation I had with a colleague (over email) about alternative ways of professionally evaluating scientists. In particular, we shared a concern about practices that disadvantage people who do difficult and labor-intensive work, like working with hard-to-reach and marginalized populations, doing longitudinal and field research, using expensive or time-consuming methods, etc. We had both heard this proposal floated, but we didn't know of any examples of it or anything discussing how to implement it.
  3. I got a bunch of useful responses, so I've collated them here.
  4. Something I didn't initially mention, but that is important, is that N-best evaluation would probably cover a defined, recent window.
  5. This kind of evaluation would have a few effects. For the scientist, it would mean that brute quantity would have diminishing marginal returns. Limited time means we often face a quantity-quality tradeoff -- you could spend more time on fewer publications, or less time apiece on more. N-best evaluation requires some foundational amount of productivity, but after a certain point it would shift the incentive toward producing high-quality work instead of just churning out vita-filler.
  6. Importantly, it is not enough just to ask the scientist to nominate N recent papers - you have to have an evaluation framework that uses those nominations appropriately. The evaluation framework should put an onus on evaluating committees to read the actual work and form an expert judgment of its merit.
  7. That judgment then becomes a counterweight to flawed and gameable metrics like number of publications, journal impact factors, citation counts, h-indices, etc. It would be possible to do an evaluation where those other things get some weight too, if the evaluators think they carry useful information. Or an N-best evaluation could be the *only* index of scientific accomplishments.
  8. I was most interested in whether this kind of evaluation has been written about, documented, or formalized anywhere. Several responses pointed me to specific proposals, or to written documents about N-best evaluation. A really good, succinct proposal comes from a computer science society - but it is framed in terms that would readily translate to other fields:
  9. Others have written on related ideas too:
  10. A few people also mentioned N-best being used in their departments or universities or for fellowship programs. There seemed to be varying degrees of specificity or transparency to the scientists about how it's implemented on the evaluation side.
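
To make the diminishing-returns point concrete, here is a toy sketch of how an N-best score over a recent window might be computed. Everything in it is an assumption for illustration: the `Paper` fields, the 0-10 `merit` scale (standing in for a committee's judgment after actually reading the paper), and the choice to average over N are mine, not taken from any of the proposals people sent.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    title: str
    year: int
    merit: float  # hypothetical: committee's judgment after reading, 0-10


def n_best_score(papers, n, current_year, window_years):
    """Average merit of the n best papers published in the recent window."""
    # Keep only papers inside the defined, recent window.
    recent = [
        p for p in papers
        if current_year - window_years < p.year <= current_year
    ]
    # Take the n highest-merit papers; everything else is ignored.
    best = sorted(recent, key=lambda p: p.merit, reverse=True)[:n]
    # Divide by n rather than len(best), so having fewer than n papers
    # lowers the score (the "foundational amount of productivity").
    return sum(p.merit for p in best) / n


# Three carefully done papers vs. ten quick ones, judged on the best 3.
careful = [Paper("a", 2018, 8.0), Paper("b", 2019, 9.0), Paper("c", 2017, 7.0)]
prolific = [Paper(f"p{i}", 2019, 5.0) for i in range(10)]

careful_score = n_best_score(careful, n=3, current_year=2020, window_years=5)
prolific_score = n_best_score(prolific, n=3, current_year=2020, window_years=5)
```

Under these assumptions, adding more middling papers to the `prolific` list never changes its score, because only the top N count. A real committee would of course not reduce its reading to a single number; the sketch only shows the incentive structure.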