ABSTRACT
Existing FPGA accelerators for short read mapping often discard part of the biological information in sequencing data to keep the hardware design simple, which leads to missed or incorrect alignments. Furthermore, their performance may not be optimized across hardware platforms. This paper proposes a novel alignment pipeline that considers all information in sequencing data for biologically accurate acceleration of short read mapping. To ensure that the performance of the proposed design is optimized across different platforms, we accelerate the memory-bound operations, which have been a bottleneck in short read mapping. Specifically, we partition the FM-index into buckets. The length of each bucket is equal to an optimal multiple of the memory burst size and is determined through data-driven exploration. We also developed a tool that obtains the optimal design parameters for each hardware platform. Experimental results indicate that our design maximizes alignment accuracy compared to the state-of-the-art software Bowtie, while mapping reads 4.48x as fast. Compared to the previous hardware aligner, our design achieves an accuracy of 97.7% and reports 4.48 M more valid alignments at a similar speed.
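The bucket idea in the abstract can be illustrated with a small Python sketch of an FM-index whose BWT is cut into fixed-length buckets, each carrying the occurrence counts of A, C, G and T up to its first symbol. With that layout, every occurrence (occ) lookup in backward search touches exactly one bucket, which is what allows a bucket sized to a multiple of the memory burst to be fetched in a single burst-aligned access. This is not the authors' design: the class `BucketedFMIndex`, the helper `bucket_length`, and the 2-bit symbol packing and 4-byte counters assumed in its sizing model are all hypothetical. The sketch stores symbols as a plain string for readability and only counts exact matches.

```python
from dataclasses import dataclass
from typing import Dict, List

ALPHABET = "ACGT"


def bucket_length(burst_bytes: int, multiple: int, counter_bytes: int = 4) -> int:
    # Hypothetical sizing model (not from the paper): one bucket spans `multiple`
    # memory bursts, stores one occurrence counter per base, and packs BWT
    # symbols at 2 bits each (4 per byte) in the remaining bytes.
    payload_bytes = multiple * burst_bytes - len(ALPHABET) * counter_bytes
    if payload_bytes <= 0:
        raise ValueError("bucket too small to hold its counters")
    return payload_bytes * 4


@dataclass
class Bucket:
    checkpoint: Dict[str, int]  # occurrences of each base before this bucket
    chars: str                  # BWT symbols covered by this bucket


class BucketedFMIndex:
    """Hypothetical bucketed FM-index for exact-match counting (toy sizes only)."""

    def __init__(self, text: str, bucket_len: int):
        if set(text) - set(ALPHABET):
            raise ValueError("reference must contain only A, C, G, T")
        s = text + "$"
        # Naive suffix array by sorting suffixes; fine for a toy reference.
        sa = sorted(range(len(s)), key=lambda i: s[i:])
        self.bwt = "".join(s[i - 1] for i in sa)  # s[-1] is '$' when i == 0
        self.bucket_len = bucket_len
        # C[c] = number of symbols in s that sort before c ('$' sorts first).
        self.C: Dict[str, int] = {}
        total = 1
        for c in ALPHABET:
            self.C[c] = total
            total += s.count(c)
        # Cut the BWT into fixed-length buckets, each prefixed by a checkpoint.
        # Iterating up to len(bwt) inclusive guarantees a bucket for i == n.
        self.buckets: List[Bucket] = []
        running = {c: 0 for c in ALPHABET}
        for start in range(0, len(self.bwt) + 1, bucket_len):
            chunk = self.bwt[start:start + bucket_len]
            self.buckets.append(Bucket(dict(running), chunk))
            for ch in chunk:
                if ch in running:  # '$' is not counted
                    running[ch] += 1

    def occ(self, c: str, i: int) -> int:
        # Occurrences of c in bwt[0:i]: one checkpoint read plus a count inside
        # the same bucket, so each lookup touches exactly one bucket.
        b = self.buckets[i // self.bucket_len]
        return b.checkpoint[c] + b.chars[: i % self.bucket_len].count(c)

    def count(self, pattern: str) -> int:
        # Standard FM-index backward search over the suffix-array interval.
        lo, hi = 0, len(self.bwt)
        for c in reversed(pattern):
            if c not in self.C:
                return 0
            lo = self.C[c] + self.occ(c, lo)
            hi = self.C[c] + self.occ(c, hi)
            if lo >= hi:
                return 0
        return hi - lo


if __name__ == "__main__":
    idx = BucketedFMIndex("ACGTACGTTACG", bucket_len=bucket_length(64, 1))
    print(idx.count("ACG"))  # 3
```

In this toy model a larger multiple spreads the fixed counter overhead over more symbols, but each lookup then transfers more bytes and counts more symbols inside the bucket; the abstract states that the authors settle this trade-off per platform through data-driven exploration rather than a fixed rule.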
References
- J. Arram et al. 2013a. Reconfigurable Acceleration of Short Read Mapping. In 2013 IEEE 21st Annual International Symposium on Field-Programmable Custom Computing Machines. 210-217.
- J. Arram et al. 2013b. Reconfigurable filtered acceleration of short read alignment. In 2013 International Conference on Field-Programmable Technology (FPT). 438-441.
- J. Arram et al. 2015. Ramethy: Reconfigurable Acceleration of Bisulfite Sequence Alignment. In 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. 250-259.
- M. Burrows and D. Wheeler. 1994. A Block-sorting Lossless Data Compression Algorithm. Technical Report. Digital Equipment Corporation.
- A. Chacón et al. 2013. n-step FM-Index for Faster Pattern Matching. Procedia Computer Science, Vol. 18 (2013), 70-79.
- M. C. F. Chang et al. 2016. The SMEM Seeding Acceleration for DNA Sequence Alignment. In 2016 IEEE 24th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM). 32-39.
- P. Draghicescu et al. 2012. Inexact Search Acceleration on FPGAs Using the Burrows-Wheeler Transform. Technical Report.
- B. Ewing et al. 1998. Base-Calling of Automated Sequencer Traces Using Phred. I. Accuracy Assessment. Genome Research, Vol. 8, 3 (1998), 175-185.
- E. Fernandez et al. 2012. Multithreaded FPGA acceleration of DNA Sequence Mapping. In 2012 IEEE Conference on High Performance Extreme Computing. 1-6.
- P. Ferragina and G. Manzini. 2001. An Experimental Study of an Opportunistic Index. In 12th Annual ACM-SIAM Symposium on Discrete Algorithms. 269-278.
- National Center for Biotechnology Information. 2020. Genome Reference Consortium Human Build 38. https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.26/
- P. Grigoras et al. 2017. dfesnippets: An Open-Source Library for Dataflow Acceleration on FPGAs. In 13th International Symposium on Applied Reconfigurable Computing. 299-310.
- E. J. Houtgast et al. 2015. An FPGA-based Systolic Array to Accelerate the BWA-MEM Genomic Mapping Algorithm. In 2015 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS). 221-227.
- E. J. Houtgast et al. 2016. Power-Efficiency Analysis of Accelerated BWA-MEM Implementations on Heterogeneous Computing Platforms. In 2016 International Conference on ReConFigurable Computing and FPGAs (ReConFig). 1-8.
- K. Koliogeorgi et al. 2019. Dataflow Acceleration of Smith-Waterman with Traceback for High Throughput Next Generation Sequencing. In 2019 29th International Conference on Field Programmable Logic and Applications (FPL). 74-80.
- B. Langmead et al. 2009. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology, Vol. 10, R25 (2009).
- B. Langmead et al. 2019. Scaling read aligners to hundreds of threads on general-purpose processors. Bioinformatics, Vol. 35, 3 (2019), 421-432.
- B. Langmead and A. Nellore. 2018. Cloud computing for genomic data analysis and collaboration. Nature Reviews Genetics, Vol. 19 (2018), 208-219.
- H. Li. 2013. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv preprint arXiv:1303.3997v2 (2013).
- H. Li and R. Durbin. 2009. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics, Vol. 25, 14 (2009), 1754-1760.
- G. Lightbody et al. 2019. Review of applications of high-throughput sequencing in personalized medicine: barriers and facilitators of future progress in research and clinical application. Briefings in Bioinformatics, Vol. 20, 5 (2019), 1795-1811.
- N. A. Miller et al. 2015. A 26-hour system of highly sensitive whole genome sequencing for emergency management of genetic diseases. Genome Medicine, Vol. 7, 100 (2015), 16 pages.
- H. Ng et al. 2013. Direct Virtual Memory Access from FPGA for High-productivity Heterogeneous Computing. In 2013 International Conference on Field-Programmable Technology (FPT). 458-461.
- H.-C. Ng et al. 2017. Reconfigurable Acceleration of Genetic Sequence Alignment: A Survey of Two Decades of Efforts. In 27th International Conference on Field Programmable Logic and Applications (FPL). 1-8.
- H.-C. Ng et al. 2018. ADAM: Automated Design Analysis and Merging for Speeding up FPGA Development. In 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. 189-198.
- H.-C. Ng et al. 2020. Acceleration of Short Read Alignment with Runtime Reconfiguration. In 2020 International Conference on Field-Programmable Technology (FPT). 7 pages.
- S. Salamat and T. Rosing. 2020. FPGA Acceleration of Sequence Alignment: A Survey. arXiv preprint arXiv:2002.02394v2 (2020).
- B. Schmidt and A. Hildebrandt. 2017. Next-generation sequencing: big data meets high performance computing. Drug Discovery Today, Vol. 22, 4 (2017), 712-717.
Index Terms
- Reconfigurable Acceleration of Short Read Mapping with Biological Consideration