New study: A simple tool for building data journalism projects in the newsroom

The problem Muck solves

  1. Chief among the needs was demand for a system comprehensible not just to professional programmers but to less technical journalists as well. Data journalists often use different tools to process and analyze data than they do to produce the published story or interactive application. With multiple systems involved, each requiring expert knowledge, the analysis becomes less accessible to nontechnical collaborators.
  2. Another shortcoming of existing methods is that unless the record of modifications to the data is perfect, auditing work from end to end is impossible. Small manual fixes to bad data are almost always necessary; such transformations often take place in spreadsheets or interactive programming sessions and are not recorded anywhere in the version-controlled code. While content management systems for sharing and versioning documents have existed for decades, as Sarah Cohen of The New York Times put it, “No matter what we do, at the end of a project the data is always in 30 versions of an Excel spreadsheet that got emailed back and forth, and the copy desk has to sort it all out . . . It’s what people know.”
  3. Furthermore, when multiple team members are involved, constantly adding to and tweaking a set of data, the effort required to keep the code correct and, consequently, the project's conclusions correct increases dramatically.

How the new system works

A quick case study

  • download the health expenditures dataset
  • download the life expectancies dataset
  • extract the relevant data points into a table
  • render the chart using the rows in the table

Figure: Health costs example, conceptual dependencies.
“Source” dependencies show that a given product is produced by running the pointed-to source code file. Muck determines these relationships automatically via its naming convention. Each “data” dependency exists because the given source file opens and reads the pointed-to data file.
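
The two kinds of dependency can be illustrated with a small sketch. This is not Muck's code or API. The file names, the `source_for` naming rule (product name plus `.py`), and the hand-written `DATA_DEPS` table are all assumptions made for illustration. A real tool would discover the data dependencies itself rather than have them listed. The sketch uses Python's standard `graphlib` module (Python 3.9 or later) to work out an order in which the four steps of the case study could be built.

```python
from graphlib import TopologicalSorter

# Hypothetical file names for the four steps in the case study.
# Data dependencies: each product maps to the data files its script reads.
DATA_DEPS = {
    "expenditures.csv": [],
    "life_expectancies.csv": [],
    "table.csv": ["expenditures.csv", "life_expectancies.csv"],
    "chart.svg": ["table.csv"],
}


def source_for(product: str) -> str:
    """Assumed naming convention: the script that builds 'x.csv' is 'x.csv.py'."""
    return product + ".py"


def build_graph(data_deps):
    """Map every product to its source dependency plus its data dependencies."""
    return {
        product: [source_for(product), *deps]
        for product, deps in data_deps.items()
    }


def build_order(graph):
    """Return the products in an order where dependencies come first."""
    order = TopologicalSorter(graph).static_order()
    # Source scripts appear only as dependencies, so filter them out.
    return [node for node in order if node in graph]


if __name__ == "__main__":
    graph = build_graph(DATA_DEPS)
    for product in build_order(graph):
        print(product, "<-", ", ".join(graph[product]))
```

In this sketch the chart always comes last and the table always follows both downloads, which mirrors the order of the steps listed above. The relative order of the two downloads is not fixed, because neither depends on the other.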
Figure: Muck implementation to build a complete web page. In addition to the source and data dependencies seen in the previous diagram, this implementation also features a code module that is shared by two scripts.




Center for Digital Journalism at Columbia Graduate School of Journalism

Tow Center
