Viewpoint
## Computational Thinking in the Era of Data Science

Recent years have seen the integration of computer science, mathematics,^{a} and statistics, together with real-world domain knowledge, into a new research and applications field: *data science.*^{4} Just as data science integrates knowledge and skills from computer science, statistics, and a real-world application domain, *data thinking*, we propose, integrates computational thinking, statistical thinking, and domain thinking.

Computational thinking was first introduced by Papert^{13} and, a quarter of a century later, was illuminated and elaborated on by Wing.^{15} As it turns out, exploring the novelty of data thinking uncovers new facets of computational thinking.

In this Viewpoint, we first present our interpretation of the concept of data thinking. Then, based on insights gained from that discussion, we argue that a timely need has emerged to introduce data thinking into computer science education alongside computational thinking, in the context of various real-world domains and using real-life data.

*Computational thinking* is commonly defined as a set of cognitive and social skills that are applied in problem-solving processes. The discussion about computational thinking was reopened by Wing in 2006 and the term has since been interpreted through different prisms as a collection of various skills required for problem solving. Some of these common skills are problem formulation, problem decomposition, organization and logical analysis of data, data representation using models and simulations, abstraction, suggestion and assessment of multiple solutions to a given problem, implementation of the chosen solution, and generalization. Computational thinking deals with cognitive skills and can, therefore, be implemented independently of technology. Like other educators (for example, Yadav et al.^{17}), however, we propose that computational thinking must be viewed in the context of up-to-date technologies. Another essential aspect is that acquiring and applying computational thinking skills are not limited to the context of computer science; rather, they can and should be applied in various contexts.^{9,16}

The term *statistical thinking* was coined by Deming^{7} and developed by Moore.^{11} It has since received a great deal of attention and discussion within the statistics education community, and voices have emerged calling for statistical thinking for all.^{14} Statistical thinking is associated with understanding the essence, characteristics, and variability of real-life data, and its importance for solving real-life problems.^{6} According to Ben-Zvi and Garfield,^{3} statistical thinking "involves an understanding of why and how statistical investigations are conducted and the 'big ideas' that underlie statistical investigations." Statistical thinking includes: the understanding that variation exists in any data source and that real-life data contains outliers, errors, biases, and variance; when and how to use specific statistical data analysis methods; the nature of sampling and how to infer from samples to populations; statistical models and their usage; the context of a given problem when performing investigations and drawing conclusions; the entire process of statistical inquiry; and the relevance of critique and evaluation of inquiry results.^{3}

Examining the data science thinking skills in relation to the disciplines that make up data science, we suggest that each discipline contributes its unique thinking skills: computer science brings computational thinking, statistics brings statistical thinking, and each application domain brings the thinking skills rooted in its own body of knowledge. Accordingly, our interpretation of data thinking integrates computational thinking, statistical thinking, and domain thinking (see the accompanying figure). Specifically,^{b} data thinking is the understanding that a solution to a real-life problem should not be based only on data and algorithms, but also on the domain knowledge-driven rules that govern them. Data thinking asks whether the data offers a good representation of the real-life situation. It also addresses how the data was collected and asks, "Can the data collection be improved?" It is the understanding that data is not just numbers to be stored in an adequate data structure, but that these numbers have a meaning that derives from the domain knowledge. Data thinking is the understanding that any process or calculation performed on the data should preserve the meaning the data has in the relevant knowledge domain. It analyzes the data not only logically but also statistically, using visualizations and statistical methods to find patterns as well as irregular phenomena. Data thinking is the understanding that problem abstraction is domain-dependent, and that generalization is subject to biases and variance in the data. It is the understanding that lab testing is not enough and that real-life implementation will always encounter unexpected data and situations, so improving the models and the solution for a given problem is a continuous process that includes, among other activities, constant and iterative monitoring and data collection.

**Figure. Data Science and Data Thinking: a) Data science integrates computer science, mathematics, and statistics, as well as a real-world domain. b) Data thinking integrates computational thinking, statistical thinking, and domain thinking.**

To illustrate our claims regarding the added value of data thinking for problem-solving processes, which stems from its combination of computational thinking, statistical thinking, and domain thinking, we present an example from our own domain: education. Consider the problem of the dropout rate in higher education. From a computational thinking perspective, a model may be suggested that monitors and checks students' progress, from their entrance into higher-education institutes, through all possible study curricula, to their graduation. We can also consider additional aspects, such as the students' attributes before embarking on higher education. By adding such considerations, we can predict their success and accept only those students with the highest probability of graduating. Such research is indeed being carried out using machine learning to predict student success (see Alyahyan and Düştegör^{1}). But this solution may be biased: by adding statistical thinking, we can question whether the available data is indeed the best for this task. By using only current student data, we in fact intensify the biases of the existing admission system,^{12} because applicants who were not accepted are overlooked and their potential for success is therefore never measured. Building on our domain knowledge of education, we suggest that other factors influencing success, such as motivation, should be added to the prediction model. Thus, using domain knowledge, we can further improve our computational and statistical models. In conclusion, we generate the prediction model based on computational thinking, statistical thinking, and domain thinking, that is, based on data thinking.
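
The statistical-thinking question in this example, whether the available data represents the population the model will be applied to, can be made concrete with a small sketch. Everything in it is an assumption made for illustration: the `Applicant` record, its fields, and the seven toy records are made up and do not come from any real admission system. The sketch only shows that outcomes are observed for admitted students alone, so the training data covers a narrower range of entrance scores than the full applicant pool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Applicant:
    # Hypothetical record layout, for illustration only.
    entrance_score: float       # known for every applicant
    admitted: bool
    graduated: Optional[bool]   # None when the outcome was never observed


def observed_outcomes(applicants):
    """Keep only the records a success-prediction model could be trained on."""
    return [a for a in applicants if a.graduated is not None]


def score_range(applicants):
    """Return the (min, max) entrance score of the given records."""
    scores = [a.entrance_score for a in applicants]
    return min(scores), max(scores)


# Made-up toy records: rejected applicants have no graduation outcome.
applicants = [
    Applicant(55, False, None),
    Applicant(62, False, None),
    Applicant(68, False, None),
    Applicant(71, True, True),
    Applicant(78, True, False),
    Applicant(85, True, True),
    Applicant(93, True, True),
]

train = observed_outcomes(applicants)
print("applicants:", len(applicants), "usable for training:", len(train))
print("score range, all applicants:", score_range(applicants))
print("score range, training data: ", score_range(train))
```

Any model fitted to `train` has seen no applicant below the lowest admitted score, so its predictions for such applicants are extrapolations that the data cannot check.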

While computer science and statistics are each disciplines in their own right, data science integrates domain knowledge and thinking with computational thinking and statistical thinking. We argue that such a connection between computational thinking and domain thinking can be beneficial not only in the context of data science but also in the context of computer science and in the context of the domain itself. We emphasize the two sides of this coin: the potential contribution of computational thinking to domain understanding and the potential contribution of domain knowledge to computational thinking. Due to space limitations, we will not elaborate here on the mutual connections between statistical thinking and computational thinking.

**Computational thinking with data can improve the understanding of domain knowledge.** Two aspects borrowed from data thinking, real domain knowledge and real-life data, illustrate the mutual contribution of computational thinking and domain thinking. First, many real-life processes can be interpreted as algorithms, for which computational thinking is required. For example, the attachment of a virus to a cell and its reproduction using the cell's bio-factory can be described as an algorithm, as can the spread of a pandemic. Second, nowadays, we learn new domain knowledge using big data, which is collected at great volume, variability, and velocity. Interpreting big data, which is too vast for the human mind to grasp, requires computational thinking skills. For example, abstraction is required in order to analyze the relationship between a problem and the data collected to solve it, with the objective of extracting valuable information from the vast collection of available data.
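
As one possible illustration of reading a real-life process as an algorithm, the spread of a pandemic can be sketched with the textbook SIR compartment model in discrete time. This is a common abstraction chosen here for illustration, not a model proposed in this Viewpoint, and the function names, population size, and rates `beta` and `gamma` are arbitrary assumptions rather than estimates for any real disease.

```python
def sir_step(s, i, r, beta, gamma):
    """Advance a discrete-time SIR model by one day.

    s, i, r: susceptible, infected, and recovered head counts
    beta:    assumed daily transmission rate
    gamma:   assumed daily recovery rate
    """
    n = s + i + r
    new_infections = beta * s * i / n
    new_recoveries = gamma * i
    return (s - new_infections,
            i + new_infections - new_recoveries,
            r + new_recoveries)


def simulate(s, i, r, beta, gamma, days):
    """Return the list of (s, i, r) states for day 0 through day `days`."""
    history = [(s, i, r)]
    for _ in range(days):
        s, i, r = sir_step(s, i, r, beta, gamma)
        history.append((s, i, r))
    return history


# Arbitrary illustrative parameters, not fitted to any data.
history = simulate(s=990.0, i=10.0, r=0.0, beta=0.3, gamma=0.1, days=100)
peak_day = max(range(len(history)), key=lambda d: history[d][1])
print(f"infections peak on day {peak_day} "
      f"with about {history[peak_day][1]:.0f} people infected")
```

The abstraction is itself a domain decision: treating the population as three homogeneous compartments ignores age, geography, and behavior, which is exactly where domain thinking would question the model.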

**Domain thinking can improve computational thinking.** In order to integrate data thinking and, specifically, domain knowledge into a learning environment that aims to develop computational thinking, real-life data should be used. While some initiatives to teach computer science in a real-world context exist,^{2,5} they are the exception, and "traditional introductory programming courses often take their examples and assignments from the domains of puzzles, games, and abstract mathematics. For instance, students might be shown how to reverse a list or assigned to compute the Fibonacci sequence."^{2} A recent comprehensive literature review on computer science introductory courses revealed that this assertion still holds true.^{10} One may claim that teaching computational thinking with real data using domain knowledge might introduce a high cognitive load. We suggest, however, that adding domain knowledge can increase the problem's relevance and introduce a different kind of complexity that exists in real life.

With the increasing acknowledgment of the importance of understanding data, we suggest all people living in the 21^{st} century should acquire data thinking skills in addition to computational thinking skills. In practice, we argue that just as computational thinking is currently being acknowledged and taught beyond the scope of computer science education, so should data thinking be taught beyond the scope of data science education. Specifically, we suggest the use of data thinking in the context of computer science education is not only relevant but also vital. Among the many implementations that arise from this claim, we highlight the relevance of working with real data in computer science education.

1. Alyahyan, E. and Düştegör, D. Predicting academic success in higher education: Literature review and best practices. *Int. J. Educ. Technol. High Educ. 17*, 3 (2020); https://bit.ly/3QzxMdN

2. Anderson, R.E. et al. A data programming CS1 course. In *Proceedings of the 46^{th} ACM Technical Symposium on Computer Science Education*, 150–155.

3. Ben-Zvi, D. and Garfield, J.B., Eds. *The Challenge of Developing Statistical Literacy, Reasoning and Thinking.* Kluwer Academic Publishers, Dordrecht, The Netherlands, 2004.

4. Berman, F. et al. Realizing the potential of data science. *Commun. ACM 61*, 4 (Apr. 2018), 67–72.

5. Burlinson, D. et al. BRIDGES: A system to enable creation of engaging data structures assignments with real-world data and visualizations. In *Proceedings of the 47^{th} ACM Technical Symposium on Computing Science Education* (Feb. 2016), 18–23.

6. Cobb, G.W. and Moore, D.S. Mathematics, statistics, and teaching. *The American Mathematical Monthly 104*, (Sept. 1997), 801–823.

7. Deming, W.E. *Out of the Crisis.* MIT Press, 1986.

8. De Veaux, R.D. et al. Curriculum guidelines for undergraduate programs in data science. *Annual Review of Statistics and Its Application 4* (2017), 15–30.

9. Günbatar, M. Computational thinking within the context of professional life: Change in CT skill from the viewpoint of teachers. *Education and Information Technologies*, (2019), 1–24; https://bit.ly/3tJx18k

10. Luxton-Reilly, A. et al. Introductory programming: A systematic literature review. In *Proceedings Companion of the 23^{rd} Annual ACM Conference on Innovation and Technology in Computer Science Education* (July 2018), 55–106.

11. Moore, D.S. Uncertainty. In *On the Shoulders of Giants: New Approaches to Numeracy.* L.A. Steen, Ed. (1990), 95–137.

12. Murrell, A. Big data and the problem of bias in higher education. *Forbes*; https://bit.ly/39H3qWn

13. Papert, S. *Mindstorms: Children, Computers, and Powerful Ideas.* Basic Books, Inc., 1980.

14. Wallman, K.K. Enhancing statistical literacy: Enriching our society. *Journal of the American Statistical Association 88*, 421 (1993), 1–8.

15. Wing, J.M. Computational thinking. *Commun. ACM 49*, 3 (Mar. 2006), 33–35.

16. Wing, J.M. *Computational Thinking: What and Why.* In presentation slides from Trippel Helix Conference on Computational Thinking and Digital Competencies in Primary and Secondary Education, Stockholm, Sweden (Oct. 2017); https://bit.ly/39EdI9y

17. Yadav, A. et al. Computational thinking as an emerging competence domain. In *Competence-based Vocational and Professional Education.* Springer, (2017), 1051–1067.

a. As mathematics is considered the queen of all sciences, the discussion about the relationships between mathematical thinking and data thinking is beyond the scope of this Viewpoint and therefore, in what follows, we refer only to statistics.

b. Readers are invited to specify which component of data thinking each of its following characteristics is derived from. Clearly, some characteristics are derived from more than one component of data thinking.

The Digital Library is published by the Association for Computing Machinery. Copyright © 2022 ACM, Inc.
