Abstract
Data science is becoming one of the most important skills in the modern world, as data plays an increasingly meaningful role in our lives. Its accessibility, however, is limited: it requires complicated software or programming knowledge, both of which can be challenging and hard to master, even for simple tasks.
With this in mind, we address this issue with a new data science platform, termed DS4All.Curation, that attempts to reduce the knowledge necessary to perform data science tasks, in particular data cleaning and curation. By combining HCI concepts, this platform: is simple to use, through direct manipulation and transformation previews; allows users to save time by eliminating repetitive tasks and automatically calculating many of the common analyses data scientists must perform; and suggests data transformations based on the contents of the data, allowing for a smarter environment.
This work is financed by National Funds through the Portuguese funding agency, FCT - Fundação para a Ciência e a Tecnologia, within project UIDB/50014/2020.
Notes
- 1. DS4All.Curation can be found at https://github.com/Zamreg/HDC.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Dias, J., Cunha, J., Pereira, R. (2020). Data Curation: Towards a Tool for All. In: Stephanidis, C., Antona, M., Ntoa, S. (eds) HCI International 2020 – Late Breaking Posters. HCII 2020. Communications in Computer and Information Science, vol 1293. Springer, Cham. https://doi.org/10.1007/978-3-030-60700-5_23
DOI: https://doi.org/10.1007/978-3-030-60700-5_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60699-2
Online ISBN: 978-3-030-60700-5
eBook Packages: Computer Science, Computer Science (R0)