- Data visualization is important because data on its own can be difficult to understand: imagine trying to make sense of row after row of numbers. Data visualization presents that information in a way that is engaging and communicates complex ideas more quickly. This is especially true on the web, where a visualization can catch a reader's eye and get information across far more easily than paragraph after paragraph of text.
- Freely available. The world is generating more and more data. If you know where to look, you can find many free and useful datasets. The City of Toronto, for example, provides a large catalog of free data related to the city via its Open Data Toronto site that can be accessed by anyone.
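Catalogs like Open Data Toronto are typically backed by a standard API (Toronto's portal runs on CKAN), so datasets can be searched programmatically. Below is a minimal sketch of querying a CKAN-style catalog; the base URL here is a placeholder, not the real endpoint, and the JSON response is a canned sample standing in for a live call.

```python
from urllib.parse import urlencode
import json

# Placeholder base URL for a CKAN catalog; check the portal's developer
# page for the real endpoint before using this against a live service.
BASE = "https://ckan.example.org/api/3/action"

def search_url(query, rows=5):
    """Build a CKAN package_search request URL for datasets matching `query`."""
    return f"{BASE}/package_search?{urlencode({'q': query, 'rows': rows})}"

# A canned response in CKAN's standard envelope, standing in for a live call.
sample = json.loads(
    '{"success": true, "result": {"count": 1,'
    ' "results": [{"title": "Parking Tickets"}]}}'
)
titles = [d["title"] for d in sample["result"]["results"]]
print(search_url("parking tickets"))
print(titles)
```

In a real script the URL built by `search_url` would be fetched with an HTTP client, and the JSON parsed exactly as shown for the sample payload.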
- The data found on Open Data Toronto has even spawned a stand-alone website that helps drivers locate the worst places for receiving parking tickets in Toronto.
- Crowdsourcing from citizens. Some data can also be collected from the general public. Here is an example of how SocMap and Re:Baltica used crowdsourced data.
- In February, we launched our very first application, HotBills, which we created in partnership with the Baltic Centre for Investigative Journalism (Re:Baltica). The idea behind the app is to determine how much people pay for heating in various parts of Latvia, so that the data can later be used in journalists’ research into heating prices, transparency and validity, as well as to give people an incentive to talk to their landlords about the prices, ask for explanations, and get adequate answers. We asked users to scan their bills and submit them.
- The following clip explains the above example, as well as a number of other ways data journalism is used by journalists across the globe.
- More data journalism stories from Re:Baltica can be found on their site.
- Scrape sites for data. While newsrooms can have developers write their own screen scrapers to automatically populate their databases, there are also tools that do it for you.
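To make "screen scraper" concrete, here is a minimal sketch that pulls cell text out of an HTML table using only Python's standard library. Real scrapers usually fetch live pages and use a library such as BeautifulSoup; here the page is an inline string, and the ward figures in it are made up for illustration.

```python
from html.parser import HTMLParser

# Minimal screen-scraper sketch: collect the text of every <td> cell,
# grouped by table row. The HTML and its numbers are hypothetical.
class TableScraper(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_cell = False   # are we currently inside a <td>?
        self.rows = []         # completed rows of cell text
        self.current = []      # cells of the row being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.in_cell = True
        elif tag == "tr":
            self.current = []

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False
        elif tag == "tr" and self.current:
            self.rows.append(self.current)

    def handle_data(self, data):
        if self.in_cell:
            self.current.append(data.strip())

page = ("<table><tr><td>Ward 1</td><td>4210</td></tr>"
        "<tr><td>Ward 2</td><td>3975</td></tr></table>")
scraper = TableScraper()
scraper.feed(page)
print(scraper.rows)  # [['Ward 1', '4210'], ['Ward 2', '3975']]
```

The same pattern scales up: fetch each page on a schedule, parse out the rows, and append them to the newsroom's database.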
- Freedom of Information requests. For example, the data used by The Toronto Star in its award-nominated investigation, Known to Police, was gathered via freedom of information requests.
- 'Known to police' is about who police stop, question and document in encounters that typically involve no arrest or charge, where they do this, and why. What we’ve shown, using Toronto police data, is that, in every part of the city, black and “brown” people are being stopped at rates disproportionate to the populations of black and brown people living in these areas. This is even more so with young males. The analysis allows for a provocative question: Is it possible that police in certain areas of the city have documented every young male of colour who lives there? And, what does that do to a community?
- Toronto police data on arrests and charges served as the basis for Race & Crime, a 2002 series that found police in certain circumstances treated blacks more harshly than whites. Updated charge and arrest data, and data that shows who police stop and document in mostly non-criminal encounters, was requested in 2003. After a seven-year battle for the data — including court challenges — police released them in 2010, resulting in the series Race Matters that same year.
Build a database of relevant... data.
- Much of the data from the aforementioned sources comes in various formats and document types: Excel spreadsheets, XML, PDFs, and others. Not only do newsrooms need a centralized place to store this data, they also need to clean and prepare it in order to use it for data visualizations.
- Extracting the data.
- If you’ve ever tried to do anything with data provided to you in PDFs, you know how painful this is: you can’t easily copy and paste rows of data out of PDF files. Tabula allows you to extract that data in CSV format through a simple interface. And now you can download Tabula and run it on your own computer, as you would with OpenRefine.
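Even after a tool like Tabula exports a PDF table to CSV, the output often needs cleaning before it is usable: stray whitespace, blank rows, and formatted numbers are common. The sketch below shows that cleanup with Python's standard `csv` module; the CSV text and its figures are a hypothetical example of what such an export might look like.

```python
import csv
import io

# Hypothetical sample of the kind of messy CSV a PDF-extraction tool
# might produce: padded cells, a blank row, and quoted numbers with
# thousands separators.
raw = '''Location,Tickets
 King St W ,"12,345"

 Queen St E ,"9,876"
'''

rows = []
for row in csv.reader(io.StringIO(raw)):
    if not any(cell.strip() for cell in row):
        continue  # drop completely blank rows
    rows.append([cell.strip() for cell in row])

header, data = rows[0], rows[1:]
# Strip thousands separators so the figures can be treated as numbers.
counts = {location: int(n.replace(",", "")) for location, n in data}
print(header)  # ['Location', 'Tickets']
print(counts)  # {'King St W': 12345, 'Queen St E': 9876}
```

Once the values are clean and numeric, the table is ready to load into a database or feed into a visualization.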