Parallel Automata Processor

Authors:
Arun Subramaniyan, University of Michigan, Ann Arbor
Reetuparna Das, University of Michigan, Ann Arbor

ISCA '17: Proceedings of the 44th Annual International Symposium on Computer Architecture, June 2017, Pages 600–612. https://doi.org/10.1145/3079856.3080207

Published: 24 June 2017

ABSTRACT

Finite State Machines (FSMs) are a widely used computation model in many application domains. These embarrassingly sequential applications, with their irregular memory access patterns, perform poorly on conventional von Neumann architectures. The Micron Automata Processor (AP) is an in-situ, memory-based computational architecture that accelerates non-deterministic finite automata (NFA) processing in hardware. However, each FSM on the AP is processed sequentially, which limits the potential speedup.
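
To make the sequential bottleneck concrete, here is a minimal software sketch of NFA processing. It assumes a toy automaton that reports whenever the input so far ends in "ab"; the transition table and the `step` and `run` helpers are hypothetical illustrations, not taken from the paper or from any AP toolkit. All active states advance together on each symbol, yet the symbols themselves are still consumed one after another.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical toy NFA over {a, b}: reports when the input so far ends in "ab".
# State 0 loops on every symbol, so it stays active for the whole input.
TRANSITIONS: Dict[Tuple[int, str], Set[int]] = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
}
START: FrozenSet[int] = frozenset({0})
ACCEPTING: Set[int] = {2}


def step(active: FrozenSet[int], symbol: str) -> FrozenSet[int]:
    """Advance every active state on one symbol (the simultaneous transitions)."""
    nxt: Set[int] = set()
    for state in active:
        nxt |= TRANSITIONS.get((state, symbol), set())
    return frozenset(nxt)


def run(text: str, active: FrozenSet[int] = START) -> Tuple[FrozenSet[int], List[int]]:
    """Consume the input strictly in order; return the final active set and match offsets."""
    matches: List[int] = []
    for i, symbol in enumerate(text):
        active = step(active, symbol)  # depends on the result of the previous symbol
        if active & ACCEPTING:
            matches.append(i)
    return active, matches


final_states, match_offsets = run("abab")
print(sorted(final_states), match_offsets)
```

The loop-carried dependence on `active` is what makes the computation sequential: symbol i cannot be processed until the active set after symbol i-1 is known.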

In this paper, we explore the FSM parallelization problem in the context of the AP. Extending classical parallelization techniques to NFAs executing on the AP is non-trivial because of high state-transition tracking overheads and exponential computational complexity. We present the associated challenges and propose solutions that leverage both unique properties of NFAs (connected components, input symbol ranges, convergence, common parent states) and unique features of the AP (support for simultaneous transitions, low-overhead flow switching, state vector cache) to realize parallel NFA execution on the AP.
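
As an illustration of the classical enumerative approach to FSM parallelization, the sketch below splits the input into chunks, runs every chunk from each possible single start state, and then stitches the per-chunk results together. This is a generic software sketch under stated assumptions, not the scheme proposed in the paper: the toy NFA, the helper names `enumerate_chunk` and `parallel_run`, and the use of a thread pool are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical toy NFA (not from the paper): reports when the input ends in "ab".
TRANSITIONS: Dict[Tuple[int, str], Set[int]] = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
}
STATES: List[int] = [0, 1, 2]
START: FrozenSet[int] = frozenset({0})
ACCEPTING: Set[int] = {2}

Result = Tuple[FrozenSet[int], List[int]]


def run(text: str, active: FrozenSet[int]) -> Result:
    """Sequential reference: return the final active set and match offsets."""
    matches: List[int] = []
    for i, symbol in enumerate(text):
        active = frozenset(
            t for s in active for t in TRANSITIONS.get((s, symbol), ())
        )
        if active & ACCEPTING:
            matches.append(i)
    return active, matches


def enumerate_chunk(chunk: str) -> Dict[int, Result]:
    """Run one chunk once per possible start state; this is the enumeration cost."""
    return {q: run(chunk, frozenset({q})) for q in STATES}


def parallel_run(text: str, n_chunks: int) -> Result:
    """Process chunks independently, then compose the results in input order."""
    size = max(1, -(-len(text) // n_chunks))  # ceiling division
    chunks = [text[i:i + size] for i in range(0, len(text), size)]
    # The thread pool only demonstrates that chunks are independent; CPython
    # threads will not speed up this pure-Python loop.
    with ThreadPoolExecutor() as pool:
        tables = list(pool.map(enumerate_chunk, chunks))

    active, matches, offset = START, [], 0
    for chunk, table in zip(chunks, tables):
        nxt: Set[int] = set()
        found: Set[int] = set()
        for q in active:  # keep only the paths whose start state was really active
            end, hits = table[q]
            nxt |= end
            found |= {offset + h for h in hits}
        active = frozenset(nxt)
        matches.extend(sorted(found))
        offset += len(chunk)
    return active, matches


text = "abaabbab"
print(parallel_run(text, 3) == run(text, START))
```

Composition by union works because an NFA's active set after a chunk is the union of the sets reached from each individual start state. The price is that every chunk is executed once per state, which hints at why tracking overheads dominate for large NFAs and why reducing the number of enumerated paths matters.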

We evaluate our techniques on several important benchmarks, including NFAs used for network intrusion detection, malware detection, text processing, protein motif searching, DNA sequencing, and data analytics. Our proposed parallelization scheme achieves a significant speedup (25.5x on average) over sequential execution on the AP. Prior work has already shown that sequential execution on the AP is at least an order of magnitude better than GPUs, multi-core processors, and the Xeon Phi accelerator.

Published in
ISCA '17: Proceedings of the 44th Annual International Symposium on Computer Architecture
June 2017, 736 pages
ISBN: 9781450348928
DOI: 10.1145/3079856

ACM SIGARCH Computer Architecture News, Volume 45, Issue 2 (ISCA '17)
May 2017, 715 pages
ISSN: 0163-5964
DOI: 10.1145/3140659
Editor: Babak Falsafi

Copyright © 2017 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Emerging technologies (memory and computing)
accelerators
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
ISCA '17 Paper Acceptance Rate: 54 of 322 submissions, 17%
Overall Acceptance Rate: 543 of 3,203 submissions, 17%