This past Saturday I gave a talk at PyData NYC about some of the ways I've seen data collection and storage go wrong in the last year, and how you can prevent them from happening in your project or company.
One thing I touched on was how we use the command line to do really quick 'audits' of data sources to get a feel for them and gauge how much to trust them. I recommended some of my favorite tools and books.
The audience was awesome and had the best questions I've ever been asked by a conference audience. I hope to write them up to share once I get the conference video.
And look at this awesome drawing by !