Data Curation: Towards a Tool for All

  • Conference paper
HCI International 2020 – Late Breaking Posters (HCII 2020)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1293)

Abstract

Data science has become one of the most important skills in the modern world, as data plays an increasingly meaningful role in our lives. Its accessibility, however, is limited: it requires complicated software or programming knowledge, both of which can be challenging and hard to master, even for simple tasks.

With this in mind, we address this issue with a new data science platform, termed DS4All.Curation, that attempts to reduce the knowledge needed to perform data science tasks, in particular data cleaning and curation. By combining HCI concepts, this platform: is simple to use, through direct manipulation and transformation previews; saves users time by eliminating repetitive tasks and automatically calculating many of the common analyses data scientists must perform; and suggests data transformations based on the contents of the data, allowing for a smarter environment.
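The three ideas named above (automatic summary analyses, content-based suggestions, and transformation previews) can be illustrated with a minimal sketch. This is not the DS4All.Curation implementation: the function names `summarize`, `suggest`, and `preview`, the suggestion wording, and the rules used to trigger each suggestion are all assumptions made for illustration only.

```python
from statistics import mean


def summarize(column):
    """Compute a few common summary statistics for a column of raw string cells."""
    # A cell counts as missing when it is None or contains only whitespace.
    present = [v for v in column if v is not None and v.strip() != ""]
    numeric = []
    for v in present:
        try:
            numeric.append(float(v))
        except ValueError:
            pass  # Non-numeric cell; leave it out of the numeric statistics.
    return {
        "count": len(column),
        "missing": len(column) - len(present),
        "numeric": len(numeric),
        "mean": mean(numeric) if numeric else None,
    }


def suggest(column):
    """Return hypothetical transformation suggestions based on the column contents."""
    stats = summarize(column)
    present = stats["count"] - stats["missing"]
    suggestions = []
    if stats["missing"] > 0:
        suggestions.append("fill or drop missing values")
    if present > 0 and stats["numeric"] == present:
        suggestions.append("convert column to numeric")
    if any(v is not None and v != v.strip() for v in column):
        suggestions.append("trim surrounding whitespace")
    return suggestions


def preview(column, transform, n=3):
    """Pair the first n cells with their transformed values, leaving the column unchanged."""
    return [(v, transform(v)) for v in column[:n]]


ages = ["34", " 27", "", "41"]  # Made-up example data.
print(summarize(ages))
print(suggest(ages))
print(preview(ages, str.strip, n=2))
```

In this sketch the preview is computed on a copy of the first few cells, so a user could inspect the effect of a suggested transformation before committing to it.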

This work is financed by National Funds through the Portuguese funding agency, FCT - Fundação para a Ciência e a Tecnologia within project UIDB/50014/2020.

Notes

  1. DS4All.Curation can be found at https://github.com/Zamreg/HDC.

  2. https://powerbi.microsoft.com.

  3. http://tableau.com.

  4. https://jupyter.org.

  5. https://rapidminer.com.

Author information

Corresponding author

Correspondence to Jácome Cunha.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Dias, J., Cunha, J., Pereira, R. (2020). Data Curation: Towards a Tool for All. In: Stephanidis, C., Antona, M., Ntoa, S. (eds) HCI International 2020 – Late Breaking Posters. HCII 2020. Communications in Computer and Information Science, vol 1293. Springer, Cham. https://doi.org/10.1007/978-3-030-60700-5_23

  • DOI: https://doi.org/10.1007/978-3-030-60700-5_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-60699-2

  • Online ISBN: 978-3-030-60700-5

  • eBook Packages: Computer Science (R0)