A Framework of Intelligent Middleware for DNA Sequence Analysis in Cloud Computing Environment

Design of Cloud Computing-Based Intelligent Middleware for DNA Sequence Analysis

  • Oh, Junseok (Communications Policy Research Center, Yonsei University) ;
  • Lee, Yoonjae (Graduate School of Information, Yonsei University) ;
  • Lee, Bong Gyou (Graduate School of Information, Yonsei University)
  • Received : 2013.10.31
  • Accepted : 2013.12.18
  • Published : 2014.02.28

Abstract

Advances in next-generation sequencing (NGS) and in automation technologies such as scientific workflows have reduced the time required to decode DNA sequences. Although these automated technologies have changed the genome sequence analysis environment, limited computing resources still constrain the analysis. Most scientific workflow systems are pre-built platforms, and they are highly complex because many functions are implemented in a single system platform. It is also difficult to reuse components of pre-built systems in a new system in the cloud environment. Cloud computing technologies can be applied to such systems to reduce analysis time and to enable simultaneous analysis of massive DNA sequence data. Web service techniques can likewise improve interoperability between DNA sequence analysis systems. This paper proposes workflow-based middleware that supports Web services, a DBMS, and cloud computing, and that is expected to reduce analysis time and to support lightweight virtual instances. The middleware uses the DBMS to manage pipeline status and to support the creation of lightweight virtual instances in the cloud environment. It also applies RESTful Web services with simple URIs and XML content to improve interoperability. Once the system reaches the stabilization stage, its performance still needs to be tested by comparing its results with those of other DNA analysis services.

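Although advances in next-generation genome sequencing and automation technologies are improving the DNA sequence analysis environment, limited computing resources still hinder reductions in analysis time. Most scientific workflow systems are complex and inflexible because their many functions are implemented for a specific system environment, which makes it difficult to apply components of existing systems to new systems in the cloud environment. In this study, we developed middleware for DNA sequence analysis that supports Web services, a DBMS, and cloud computing, in order to provide virtual instances that can analyze large volumes of DNA data concurrently and to improve interoperability between systems. The intelligent middleware developed in this study uses the DBMS to manage pipeline information, provides lightweight virtual instances in the cloud environment, and offers RESTful Web services based on simple URIs and XML to improve interoperability.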
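As a rough illustration of how DBMS-managed pipeline status could be exposed as XML over a simple URI, the following minimal sketch uses Python's standard `sqlite3` and `xml.etree.ElementTree` modules. The abstract does not specify the schema, the URI layout, or the XML format, so the table `pipeline_status`, the path `/pipelines/<pipeline_id>/status`, the element names, and all function names here are assumptions made for illustration only, not the authors' implementation.

```python
import sqlite3
import xml.etree.ElementTree as ET


def open_status_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical pipeline-status table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE pipeline_status ("
        "pipeline_id TEXT, step TEXT, state TEXT, "
        "PRIMARY KEY (pipeline_id, step))"
    )
    return conn


def set_step_state(conn: sqlite3.Connection, pipeline_id: str,
                   step: str, state: str) -> None:
    """Insert or overwrite the recorded state of one pipeline step."""
    conn.execute(
        "INSERT OR REPLACE INTO pipeline_status VALUES (?, ?, ?)",
        (pipeline_id, step, state),
    )
    conn.commit()


def status_as_xml(conn: sqlite3.Connection, pipeline_id: str) -> str:
    """Build the XML body that a GET on the assumed URI
    /pipelines/<pipeline_id>/status might return."""
    root = ET.Element("pipeline", id=pipeline_id)
    rows = conn.execute(
        "SELECT step, state FROM pipeline_status "
        "WHERE pipeline_id = ? ORDER BY step",
        (pipeline_id,),
    )
    for step, state in rows:
        ET.SubElement(root, "step", name=step, state=state)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    db = open_status_db()
    set_step_state(db, "p1", "align", "done")
    set_step_state(db, "p1", "assemble", "running")
    print(status_as_xml(db, "p1"))
```

Keeping the status in a database rather than in the memory of a worker means that a newly started virtual instance can look up where a pipeline stands without sharing state with other instances, which is one plausible reading of how the DBMS supports lightweight instances.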
