Aims

The Problem

When people talk about skills that are important for data science, the focus tends to be primarily on technical skills, like statistics and computer programming. Often overlooked is the importance of the scientific mindset. Being a critical thinker is essential to interpreting data and to avoiding the traps of analysis on autopilot, which can lead, and has led, to consequential failures.

This is also reflected in education. As more and more undergraduate students select data science as their field of study, a common challenge persists: students often acquire technical literacy in computation and statistics classes but still have trouble thinking critically when solving problems with data.

For instance, students often learn how to derive mathematical proofs but continue to fall into common bias pitfalls when analyzing data. They run methods and tools mechanically, overlooking well-known mistakes in statistical inference and other errors. When they “let the data speak” without spelling out prior expectations and potential explanations, they mistake spurious patterns for real ones. Their findings tend to confirm what was expected rather than reveal what was not. And they treat the latest machine learning methods they have learned as the solution to every data problem.
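
The point about spurious patterns can be made concrete with a small simulation. The sketch below is a hypothetical illustration: the function names, sample size, number of predictors, and threshold are all assumptions chosen for the example. It generates an outcome and many predictors that are pure noise, then counts how many predictors pass a conventional 5 percent significance cutoff for the correlation coefficient (approximately |r| > 0.279 for 50 observations).

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def count_spurious_hits(n_obs=50, n_predictors=200, r_threshold=0.279, seed=0):
    """Count noise predictors that look 'significantly' related to a noise outcome.

    All data are independent standard normal draws, so every hit is spurious.
    The default threshold is an assumed approximation of the two-sided
    5 percent critical value of r for n_obs = 50.
    """
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_obs)]
    hits = 0
    for _ in range(n_predictors):
        predictor = [rng.gauss(0, 1) for _ in range(n_obs)]
        if abs(pearson_r(predictor, outcome)) > r_threshold:
            hits += 1
    return hits


if __name__ == "__main__":
    hits = count_spurious_hits()
    print(f"{hits} of 200 pure-noise predictors passed the 5 percent cutoff")
```

By construction, none of the predictors is related to the outcome, yet on average about 5 percent of them pass the cutoff. An analysis that scans many variables without stating prior expectations would report those hits as discoveries.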

Project Aims

We agree that it is important to establish computational and statistical literacy. This alone is hard to do, since data science already combines two disciplines. Adding scientific reasoning as another layer is even harder. But we expect that such an integration can help address the challenges outlined above.

The learning outcomes we hope to achieve include greater student engagement in the thrill of discovery, more rigorous testing for errors and biases, more careful assessment of the plausibility of explanations against evidence, and, ultimately, more frequent detection of unexpected but relevant findings. The figure below summarizes this argument.

Initial informal evaluations of the 2020 summer lab at the Center for Spatial Data Science and the 2021 and 2022 Data4All Bridge workshops have been promising. But more systematic external evaluations are needed to test how far the teaching materials presented on this site actually reach these goals. We plan to incorporate such evaluations in the near future.