Smoky Mountain Data Challenge 2021: An Open Call to Solve Scientific Data Challenges Using Advanced Data Analytics and Edge Computing

  • Conference paper
Driving Scientific and Engineering Discoveries Through the Integration of Experiment, Big Data, and Modeling and Simulation (SMC 2021)

Abstract

The 2021 Smoky Mountains Computational Sciences and Engineering Conference enlists scientists from across Oak Ridge National Laboratory (ORNL) and industry to be data sponsors and help create data analytics and edge computing challenges for eminent datasets in a variety of scientific domains. This work describes the significance of each of the eight datasets and their associated challenge questions. The challenge questions for each dataset were required to cover multiple difficulty levels. An international call for participation was sent to students, asking them to form teams of up to six people and apply novel data analytics and edge computing methods to solve these challenges.

This manuscript has been co-authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). The US government retains and the publisher, by accepting the article for publication, acknowledges that the US government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).

Notes

  1. https://smc-datachallenge.ornl.gov/data-challenges-2021/

  2. https://www.youtube.com/watch?v=e6gqWr0Ly4g

  3. https://pubmed.ncbi.nlm.nih.gov/

  4. https://skr3.nlm.nih.gov/SemMedDB/

  5. https://www.semanticscholar.org/cord19

  6. https://dx.doi.org/10.13139/OLCF/1646608

  7. https://doi.ccs.ornl.gov/ui/doi/334

  8. https://github.com/olcf/TitanGPULife

  9. https://www.arcgis.com/home/item.html?id=3b0b8cf27ffb49e2a2c8370f9806f267

  10. https://zenodo.org/record/4552901.YY6zXL3MIcB

  11. https://evenstar.ornl.gov/autobem/virtual_vegas/

  12. https://doi.ccs.ornl.gov/ui/doi/328

  13. https://koordinates.com/layer/97329-las-vegas-nv-trees/

  14. https://doi.ccs.ornl.gov/ui/doi/326

  15. https://doi.ccs.ornl.gov/ui/doi/330

  16. https://colab.research.google.com/drive/1ioa9kwibwJcwZkFrw3tW_oAI3Wnwg1Ev

Acknowledgment

Dataset generation for Challenge 1 was supported by the Center for Nanophase Materials Sciences, which is a DOE Office of Science User Facility. Through the ASCR Leadership Computing Challenge (ALCC) program, this research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. Dataset generation for Challenge 2 was supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Robinson Pino, program manager, under contract number DE-AC05-00OR22725. Dataset generation for Challenge 3 used resources from General Motors.

Dataset generation for Challenge 4 used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. Dataset generation for Challenge 5 was completed by researchers at Oak Ridge National Laboratory sponsored by the DOE Office of Science as a part of the research in Multi-Sector Dynamics within the Earth and Environmental System Modeling Program as part of the Integrated Multiscale Multisector Modeling (IM3) Scientific Focus Area led by Pacific Northwest National Laboratory. The dataset for Challenge 7 was acquired at the Spallation Neutron Source, which is sponsored by the User Facilities Division of the Department of Energy. The research for generating datasets for Challenges 6 and 8 was conducted at and partially supported by the Center for Nanophase Materials Sciences, a US DOE Office of Science User Facility.

Author information

Corresponding author

Correspondence to Pravallika Devineni.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Devineni, P. et al. (2022). Smoky Mountain Data Challenge 2021: An Open Call to Solve Scientific Data Challenges Using Advanced Data Analytics and Edge Computing. In: Nichols, J., et al. Driving Scientific and Engineering Discoveries Through the Integration of Experiment, Big Data, and Modeling and Simulation. SMC 2021. Communications in Computer and Information Science, vol 1512. Springer, Cham. https://doi.org/10.1007/978-3-030-96498-6_21

  • DOI: https://doi.org/10.1007/978-3-030-96498-6_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-96497-9

  • Online ISBN: 978-3-030-96498-6

  • eBook Packages: Computer Science, Computer Science (R0)
