1. Trying to evaluate topic modeling results. Anyone have some rigorous suggestions? Looking at MALLET diagnostic results: article.gmane.org/gmane.comp.ai.…
  2. .@mwidner 'Topic' meaning is subjective, so evaluation criteria depend on use. Classification? Compare to other classifications, etc.
  3. @scott_bot @mwidner There are quantitative approaches (look at D. Mimno's online vita for leads), but actually ... what Scott said is right.
  4. @mwidner You're looking at it all wrong! Admitting subjectivity means you can justify your topic model pretty much no matter what.
  5. @scott_bot @Ted_Underwood Besides: too much subjectivity in such work can lead to simply confirming intuitions/biases.
  6. @Ted_Underwood @scott_bot @mwidner This is useful for me. I'm finding that interpreting the results is way harder than the actual modeling.
  7. @Ted_Underwood @scott_bot @mwidner (Although it's entirely possible that modeling would be harder for me if I knew what I was doing.)
  8. @scott_bot @mwidner You can sometimes discern "artefacts" -- e.g. a topic shaped by arbitrary overlap of names (George Eliot / TS Eliot).
  9. @miriamkp @Ted_Underwood @scott_bot Right! The modeling is easy, but it's a black box. To understand the box is to interpret better (I hope)
  10. @mwidner @scott_bot I'd say that topic modeling works if & when you discover something interesting and *new*. Confirming what we know (+)
  11. @miriamkp Have you looked at @scott_bot 's guided tour? http://www.scottbot.net/HIAL/?p=19113
  12. @mwidner @scott_bot I have indeed! So useful. I'm getting to the point where I think I might need to read that Blei article.
  13. @mwidner @scott_bot "works" in an algorithmic, but not in a humanistic sense. (-)
  14. @mwidner @miriamkp @Ted_Underwood We choose parameters based on how output intuitively feels; doesn't mean not useful, just not objective.
  15. @Ted_Underwood @mwidner What Ted said. Use it for discovery, comparison, navigation, etc., just try to avoid justification.
  16. I'm finding Excel's data charts incredibly helpful. If you tell MALLET to output *everything*, there's lots of data. #topicmodeling
  17. @scott_bot @mwidner @miriamkp @Ted_Underwood yet what we may call "intuition" often conceals iterations of hypothesis & testing
  18. @nmhouston Exactly. That's what I'm trying to find a way to guard against.
  19. @nmhouston @mwidner @miriamkp @Ted_Underwood True, and concealed selective iteration can lead to selective conclusions (confirmation bias).
  20. @scott_bot @Ted_Underwood @mwidner See the "Reading Tea Leaves" paper by @boydgraber et al.—it discusses a few approaches to evaluation.
  21. @mwidner I am trying to figure out how to read that! What exactly is mapped? Relations among selected terms?
  22. @jeffreyjcohen Prevalent topics in the book (labeled by most common words) & relationships among topics and authors' sections.
  23. @mwidner as determined by wordcounts then mapped onto chapters ... hmm.

Mike Widner

Academic Technology Specialist at Stanford. ABD in English at UT Austin. Medieval literature, cognitive science, embodiment, genre theory, digital humanities.
