Training data-driven journalism

We developed a one-week, practical datajournalism training, the first edition of which was held in September 2011. Here are some insights and lessons learned.

Since 2009, one of my key areas of work has been creating a practical training for datajournalism.

Developing curricula, though, is quite tough. It is not only about what should be learned, but about how people actually absorb the new knowledge. Plus: what is taught must be applicable to daily work in the newsroom, not just an academic overview.

Two years of preparation
Getting to the point of doing the first one-week training was a long and winding road. It took two years. In early 2010 we won a media funding contest. But the process of checking our proposal then dragged on month after month - in the end we simply called it quits. A whole year was lost. Then, by early 2011, we got support from the German journalism training institution ABZV.

How was the training structured?
With this backing we gathered material, tested tools and, step by step, came up with a very structured approach to teaching this. In September 2011, we organized the first one-week training for a group of journalists. Participants came from media companies such as Die Zeit, Rheinische Post, Westfälische Nachrichten and others. The training was conducted in Bonn, Germany. And most important: attendees liked it.

Main blocks:

The basic steps above were applied to three distinct types of datajournalism projects:
  • Data to Story: Creation of simple charts, built by one journalist (with tools like Datawrapper). 
  • Data Specials: Bigger projects built by teams.
  • Data Apps: Including real-time data, ongoing projects built by even more diverse teams.
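The "data to story" workflow - take a small dataset, compute something newsworthy, then chart it - can be sketched in a few lines. A minimal illustration using only the Python standard library; the budget figures are invented for the example and are not from the article:

```python
import csv
import io

# Hypothetical sample data: city budgets over two years (invented for illustration)
raw = """city,budget_2010,budget_2011
Bonn,120,135
Koeln,300,290
"""

rows = list(csv.DictReader(io.StringIO(raw)))

# The "story" step: compute the year-over-year change per city.
# A jump or a cut is what turns raw numbers into a news angle.
changes = {r["city"]: int(r["budget_2011"]) - int(r["budget_2010"]) for r in rows}

print(changes)  # Bonn gained, Koeln was cut
```

From here the numbers would go into a charting tool such as Datawrapper; the point is that the analysis step comes before, and drives, the visualization.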
*Update based on comments*
What was missing? 
After updating this article on May 30, 2012 and tweeting it around, I received a comment from Ben Welsh (@palewire), database producer for the LA Times data desk. He said:

"@mirkolorenz "how to analyze data" is a missing, IMHO. There needs to be something between clean and present where you stop and think."

Which is true. Being knowledgeable about the quality of the data you work with is as important as avoiding the pitfalls of basic statistical math. The same goes for chart literacy - if you choose the wrong chart for the right data (or vice versa), you can easily do more harm than good.

But frankly, we did not get that far in the one-week course. The main motivation was to open up the field, providing a first and motivating overview of what can be done. My best guess is that, over time, people working with data will pick up more knowledge and make better charts from iteration to iteration - thus including "how to analyze data" as a technique. (Some optimism here: as we know, people do learn, but organizations often do not.)

The future of datajournalism
The use of data in newsrooms is not essentially new, but the new options change the process entirely. In the future, media companies must look into how to become data hubs and how to create effective data teams.

Finally, here are some takeaways from the process so far.
  • Datajournalism requires a change of thinking and workflows. This is probably the most difficult to teach in the beginning.
  • Journalists should learn to work from data to story, not the other way round. Today, in most newsrooms, data and visualizations are an afterthought added once the story is already written - as a result, a lot of potential for deep and engaging stories is lost.
  • Organization is key, as is defining roles for the members of a data team. Should all journalists learn how to code and become competent visualization wizards? This is simply unlikely; the few datajournalists with such talents are outliers. All journalists should, however, be competent as "directors" of the process of turning data into compelling insights.
  • Tools eat up time. There are so many tools that *could* be useful for certain steps of the process that checking them out takes quite an effort. At the same time, tools won't help you at all if you don't have an idea of what the story could be. The big message is this: work on stories and strategies, and the tools will come. Being attached to whatever platform and hoping for a silver bullet gets you nowhere.
  • Resistance in newsrooms is a challenge. Freshly trained datajournalists face hidden or open opposition. Just read the interview with Aron Pilhofer, where he says "media is driven to data".
  • Translate the opportunity from tech speak to media lingo: Once you talk about "scraping", "Python" or "programming", a lot of seasoned reporters will walk the other way. But if you can show them that a two-week effort to build a database can get you traffic and a huge audience, they will be interested.
    (A good example here is the "Sopa Opera" from ProPublica. Deep, relevant, data-driven stories like this one can potentially be done in every country around the world. The news outlet is in the driver's seat here, providing the clearest overview - this is what we should be after.)
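To make "scraping" less intimidating than it sounds in tech speak: at its core it is just pulling structured values out of a web page. A minimal sketch using only Python's standard library; the HTML snippet and values are invented for illustration, and real scraping work would add fetching, error handling and cleaning:

```python
from html.parser import HTMLParser

class CellExtractor(HTMLParser):
    """Collect the text content of every <td> cell in an HTML table."""

    def __init__(self):
        super().__init__()
        self.in_cell = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell and data.strip():
            self.cells.append(data.strip())

# Invented example page: one table row with a city and a figure
html = "<table><tr><td>Bonn</td><td>135</td></tr></table>"

parser = CellExtractor()
parser.feed(html)
print(parser.cells)  # the extracted cell values, in document order
```

Shown to a reporter as "this turns a messy web page into a spreadsheet", the same idea lands much better than "we wrote a Python scraper".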

The next steps
The process of developing a data-driven journalism curriculum and workflows for newsrooms is a huge project.

So, there are other building blocks to take care of: in February 2012 we launched the beta version of Datawrapper, a tool that simplifies the creation of simple, embeddable charts. In its first two months the tool attracted more than 100,000 visits.

Other future challenges include how to create data teams in newsrooms, where the competencies of a journalist, a coder and a visual artist are combined in a productive way. There is much to learn here from the New York Times and other teams.

On top of that, we will need to think about how to build data hubs, using software/database repositories to collect, in an organized way, all the data that has been found, looked at, maybe cleaned, maybe visualized. Two projects tackling this are the PANDA Project and CKAN.

The whole project was made possible by the help and support of a whole group of people: Cosmin Cabulea (tutorials for data scraping and maps), Linda Rath-Wiggins and Wilfried Runde (all working at Deutsche Welle). Gregor Aisch presented a range of his projects and showed how he got from asking questions to great visualizations - impressive for everyone in the room.

There is still a lot of work to do here - transforming journalism is a marathon, not a sprint.

* Needlebase will no longer be available as of June 2012, a result of its acquisition by Google. We assume that the scraping functionality will pop up again, maybe as an extra in Google Refine.

Photo by Ian Smith, via Flickr (with kind permission).