Since 2009, one of my key areas of work has been creating practical training for datajournalism.
Developing curricula, though, is quite tough. It is not only a question of what should be learned, but of how people actually absorb new knowledge. Plus: what is taught must be applicable to daily work in the newsroom, not remain an academic overview.
Two years preparation
Getting to the point of running the first one-week training was a long and winding road - it took two years. In early 2010 we won a media funding contest, but the review of our proposal then dragged on month after month, and in the end we simply called it quits. A whole year was lost. Then, in early 2011, we got support from the German journalism training institution ABZV.
How was the training structured?
With this backing we gathered material, tested tools and, step by step, arrived at a very structured approach to teaching the subject. In September 2011, we organized the first one-week training for a group of journalists. Participants came from media companies like Die Zeit, sueddeutsche.de, Rheinische Post, Westfälische Nachrichten and others. The training was conducted in Bonn, Germany. And most important: attendees liked it.
Main blocks:
- What is data-driven journalism?
- How to find data (sources, data quality)
- How to scrape data (using tools like Needlebase*, OutWitHub or API scrapers)
- How to clean data (using Excel, Google Refine)
- How to visualize data (Google Chart API, Google Fusion Tables, Protovis and D3.js)
- How to publish data stories (formats, variations)
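To give a flavour of the scraping step, here is a minimal sketch using nothing but Python's standard library; the HTML fragment, city names and figures are invented for illustration (a real page would be fetched with urllib or a tool like OutWit Hub):

```python
from html.parser import HTMLParser

# Hypothetical page fragment -- in practice this would be downloaded first.
HTML = """
<table>
  <tr><td>Berlin</td><td>3460725</td></tr>
  <tr><td>Hamburg</td><td>1786448</td></tr>
</table>
"""

class TableScraper(HTMLParser):
    """Collects the text of every <td> cell, grouped by table row."""
    def __init__(self):
        super().__init__()
        self.rows = []        # finished rows
        self._row = []        # cells of the row currently being read
        self._in_td = False   # are we inside a <td> right now?

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td":
            self._in_td = True

    def handle_endtag(self, tag):
        if tag == "tr" and self._row:
            self.rows.append(self._row)
        elif tag == "td":
            self._in_td = False

    def handle_data(self, data):
        if self._in_td:
            self._row.append(data.strip())

scraper = TableScraper()
scraper.feed(HTML)
print(scraper.rows)  # [['Berlin', '3460725'], ['Hamburg', '1786448']]
```

Dedicated tools hide this plumbing, but seeing it once helps journalists understand what "scraping" actually does: turning markup meant for browsers into rows and columns meant for analysis.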
Three publishing formats, in order of increasing complexity:
- Data to Story: Creation of simple charts, built by one journalist (with tools like Datawrapper).
- Data Specials: Bigger projects, built by teams.
- Data Apps: Ongoing projects including real-time data, built by even more diverse teams.
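The cleaning step deserves a concrete illustration too, since it is where most hands-on time goes. Below is a minimal sketch of the kind of normalization that Google Refine automates; the sample values and helper names are my own, not from any tool:

```python
# German-style numbers ("3.460.725", "1.234,5") and inconsistently
# spelled names are typical problems in freshly scraped data.

def parse_german_number(raw: str) -> float:
    """Convert '3.460.725' or '1.234,5' into a float."""
    return float(raw.strip().replace(".", "").replace(",", "."))

def normalize_name(raw: str) -> str:
    """Collapse whitespace and casing so spelling variants cluster together."""
    return " ".join(raw.split()).title()

dirty = [("  berlin ", "3.460.725"),
         ("BERLIN", "3.460.725"),     # duplicate in disguise
         ("Hamburg", "1.786.448")]

# Normalizing both fields makes the duplicate collapse automatically.
clean = {(normalize_name(name), parse_german_number(value))
         for name, value in dirty}
print(sorted(clean))  # [('Berlin', 3460725.0), ('Hamburg', 1786448.0)]
```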
What did we learn?
- Datajournalism requires a change of thinking and workflows. This is probably the hardest thing to teach in the beginning.
- Journalists should learn to work from data to story, not the other way round. Today, in most newsrooms, data and visualizations are an afterthought added once the story is already written - as a result, a lot of potential for deep and engaging stories is lost.
- Organization is key, as is defining roles for the members of a data team. Should all journalists learn how to code and become competent visualization wizards? That is simply not likely; the few datajournalists with that range of talent are outliers. All journalists, though, should be competent "directors" of the process of turning data into compelling insights.
- Tools eat up time. There are so many tools that *could* be useful for certain steps of the process that checking them out takes quite an effort. At the same time, tools won't help you at all if you don't have an idea of what the story could be. The big message is this: work on stories and strategies, and the tools will come. Being attached to whatever platform and hoping for a silver bullet gets you nowhere.
- Resistance in newsrooms is a challenge. Freshly trained data journalists face hidden or open opposition. Just read the interview with Aron Pilhofer, where he says "media is driven to data".
- Translate the opportunity from tech speak into media lingo. Start talking about "scraping", "Python" or "programming" and a lot of seasoned reporters will walk the other way. But show them that a two-week effort to build a database can bring traffic and a huge audience, and they will get interested.
(A good example here is the "Sopa Opera" from ProPublica. Deep, relevant, data-driven stories like this one can potentially be done in every country around the world. The news outlet is in the driving seat here, providing the clearest overview - this is what we should be after.)
The next steps
Developing a data-driven journalism curriculum and workflows for newsrooms is a huge project.
So there are other building blocks to take care of: in February 2012 we launched the beta version of Datawrapper, a tool that simplifies the creation of simple, embeddable charts. In its first two months the tool attracted more than 100,000 visits.
Another future challenge is how to create data teams in newsrooms, combining the competencies of a journalist, a coder and a visual artist in a productive way. There is much to learn here from the New York Times and other teams.
On top of that, we will need to think about how to build data hubs: software/database repositories that collect, in an organized way, all the data that has been found, examined, and perhaps cleaned or visualized. Two projects tackling this are the PANDA Project and CKAN.
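As a rough illustration of what such a hub needs to record per dataset, here is a tiny sketch; the field names and example entry are my own assumptions, not how PANDA or CKAN actually model their catalogs:

```python
from dataclasses import dataclass, field

@dataclass
class Dataset:
    """One catalog entry in a hypothetical newsroom data hub."""
    name: str
    source_url: str
    cleaned: bool = False      # has the data been through a cleaning pass?
    visualized: bool = False   # has anything been published from it?
    tags: list = field(default_factory=list)

hub = {}  # in a real hub this would be a database, not a dict

def register(ds: Dataset):
    hub[ds.name] = ds

register(Dataset("city-budgets-2011", "https://example.org/budgets.csv",
                 cleaned=True, tags=["finance"]))

# Which datasets are ready for analysis?
print([name for name, ds in hub.items() if ds.cleaned])  # ['city-budgets-2011']
```

The point is less the code than the discipline: once every dataset carries its source and processing status, a second journalist can pick up where the first left off.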
The whole project was made possible with the help and support of a whole group of people: Cosmin Cabulea (tutorials for data scraping and maps), Linda Rath-Wiggins and Wilfried Runde (all working at Deutsche Welle). Gregor Aisch presented a range of his projects and showed how he got from asking questions to great visualizations - impressive for everyone in the room.
Still a lot of work to do here - transforming journalism is a marathon, not a sprint.
* Needlebase will no longer be available after June 2012, a result of its acquisition by Google. We assume the scraping functionality will pop up again, maybe as an extra in Google Refine.
Photo by Ian Smith, via Flickr (with kind permission).