Abstract
This paper discusses state-of-the-art fast software implementations of block ciphers on Intel’s new Core2 microprocessor, concentrating on “bitslice implementation”. The bitslice parallel encryption technique, originally proposed by Biham to speed up DES, has been successful on RISC processors with many long registers. Bitsliced ciphers, however, are not widely used in real applications on PC platforms, because in many cases they were not very fast on previous PC processors. Moreover, the bitslice mode requires a non-standard data format, so an additional format conversion is needed for compatibility with an existing parallel mode of operation, and this conversion was considered expensive.
This paper demonstrates that some bitsliced ciphers achieve a remarkable performance gain on Intel’s Core2 processor thanks to its enhanced SIMD architecture. We show that KASUMI, a UMTS/GSM mobile standard block cipher, can be four times faster when implemented with a bitslice technique on this processor. Our bitsliced AES code runs at 9.2 cycles/byte, the fastest AES performance reported so far on a PC processor. We then focus, for the first time, on how to optimize the conversion algorithm between the bitslice format and the standard format on a specific processor. As a result, the bitsliced AES code can be faster than a highly optimized “standard AES” code on Core2, even when the overhead of the conversion is taken into account. This means that in CTR mode, bitsliced AES is not only fast but also fully compatible with existing implementations and, moreover, secure against cache timing attacks, since a bitsliced cipher does not use lookup tables with key- or data-dependent addresses.
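To make the two ideas in the abstract concrete (the format conversion and table-free parallel evaluation), the following is a minimal sketch rather than the authors' code. It uses Python integers in place of SIMD registers, and the function names and the 3-bit toy S-box are hypothetical illustrations that do not come from KASUMI, AES, or the paper.

```python
def to_bitslice(blocks, width):
    """Standard -> bitslice format: slices[i] collects bit i of every block,
    with block j stored at bit position j of each slice."""
    slices = [0] * width
    for j, block in enumerate(blocks):
        for i in range(width):
            slices[i] |= ((block >> i) & 1) << j
    return slices


def from_bitslice(slices, count):
    """Bitslice -> standard format: the inverse transposition."""
    blocks = [0] * count
    for i, s in enumerate(slices):
        for j in range(count):
            blocks[j] |= ((s >> j) & 1) << i
    return blocks


def toy_sbox_reference(x):
    """Hypothetical 3-bit nonlinear map (an assumption for illustration),
    evaluated on a single block."""
    x0, x1, x2 = x & 1, (x >> 1) & 1, (x >> 2) & 1
    y0 = x0 ^ (x1 & x2)
    y1 = x1 ^ (x0 & x2)
    y2 = x2 ^ (x0 | x1)
    return y0 | (y1 << 1) | (y2 << 2)


def toy_sbox_bitsliced(slices):
    """The same map on slices: each logical operation acts on all blocks
    at once, and no table is indexed by key or data."""
    x0, x1, x2 = slices
    return [x0 ^ (x1 & x2), x1 ^ (x0 & x2), x2 ^ (x0 | x1)]


if __name__ == "__main__":
    blocks = list(range(8))                      # eight 3-bit blocks
    slices = to_bitslice(blocks, 3)              # conversion overhead (in)
    out = from_bitslice(toy_sbox_bitsliced(slices), len(blocks))
    assert out == [toy_sbox_reference(b) for b in blocks]
```

In this sketch the two transpositions are the conversion overhead the abstract refers to, and the gate-level function is what replaces a lookup table. The bit-by-bit loops are written for clarity only and say nothing about how an optimized conversion would be structured.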
References
3GPP TS 35.202 v6.1.0, 3G Security; Specification of the 3GPP Confidentiality and Integrity Algorithms; Document 2: KASUMI Specification (Release 6), 3rd Generation Partnership Project (2005)
Anderson, R., Biham, E., Knudsen, L.: Serpent: A proposal for the Advanced Encryption Standard, Available at http://www.ftp.cl.cam.ac.uk/ftp/users/rja14/serpent.pdf
Aoki, K., Ichikawa, T., Kanda, M., Matsui, M., Moriai, S., Nakajima, J., Tokita, T.: The 128-Bit Block Cipher Camellia. IEICE Trans. Fundamentals E85-A(1), 11–24 (2002)
Bhaskar, R., Dubey, P., Kumar, V., Rudra, A.: Efficient Galois field arithmetic on SIMD architectures. In: Proceedings of the fifteenth annual ACM symposium on Parallel algorithms and architectures, pp. 256–257. ACM Press, New York (2003)
Biham, E.: A Fast New DES Implementation in Software. In: Biham, E. (ed.) FSE 1997. LNCS, vol. 1267, pp. 260–272. Springer, Heidelberg (1997)
Canright, D.: A Very Compact S-Box for AES. In: Rao, J.R., Sunar, B. (eds.) CHES 2005. LNCS, vol. 3659, pp. 441–455. Springer, Heidelberg (2005)
The distributed.net project, Available at http://www.distributed.net/index.php.en
Federal Information Processing Standards Publication 197, Advanced Encryption Standard (AES), NIST (2001)
Fog, A.: Software optimization resources, Available at http://www.agner.org/optimize/
Gladman, B.: Serpent Performance, Available at http://fp.gladman.plus.com/cryptography_technology/serpent/
Granlund, T.: Instruction latencies and throughput for AMD and Intel x86 Processors, Available at http://swox.com/doc/x86-timing.pdf
ISO/IEC 18033-3, Information technology - Security techniques - Encryption algorithms - Part 3: Block ciphers (2005)
Matsui, M.: New encryption algorithm MISTY. In: Biham, E. (ed.) FSE 1997. LNCS, vol. 1267, pp. 54–68. Springer, Heidelberg (1997)
Matsui, M.: How Far Can We Go on the x64 Processors? In: Robshaw, M. (ed.) FSE 2006. LNCS, vol. 4047, pp. 341–358. Springer, Heidelberg (2006)
Nakajima, J., Matsui, M.: Fast Software Implementations of MISTY1 on Alpha Processors. IEICE Trans. Fundamentals E82-A(1), 107–116 (1999)
Mentens, N., Batina, L., Preneel, B., Verbauwhede, I.: A Systematic Evaluation of Compact Hardware Implementations for the Rijndael S-Box. In: Menezes, A.J. (ed.) CT-RSA 2005. LNCS, vol. 3376, pp. 323–333. Springer, Heidelberg (2005)
Osvik, D.A., Shamir, A., Tromer, E.: Full AES key extraction in 65 milliseconds using cache attacks. In: Crypto 2005 rump session.
Rudra, A., Dubey, P., Jutla, C., Kumar, V., Rao, J., Rohatgi, P.: Efficient Rijndael Encryption Implementation with Composite Field Arithmetic. In: Koç, Ç.K., Naccache, D., Paar, C. (eds.) CHES 2001. LNCS, vol. 2162, pp. 171–184. Springer, Heidelberg (2001)
Satoh, A., Morioka, S., Takano, K., Munetoh, S.: A Compact Rijndael Hardware Architecture with S-Box Optimization. In: Boyd, C. (ed.) ASIACRYPT 2001. LNCS, vol. 2248, pp. 239–254. Springer, Heidelberg (2001)
Shimoyama, T., Amada, S., Moriai, S.: Improved fast software implementation of block ciphers. In: Proceedings of the First International Conference on Information and Communication Security, pp. 269–273. Springer, Heidelberg (1997)
© 2007 Springer-Verlag Berlin Heidelberg
Cite this paper
Matsui, M., Nakajima, J. (2007). On the Power of Bitslice Implementation on Intel Core2 Processor. In: Paillier, P., Verbauwhede, I. (eds) Cryptographic Hardware and Embedded Systems - CHES 2007. CHES 2007. Lecture Notes in Computer Science, vol 4727. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74735-2_9
DOI: https://doi.org/10.1007/978-3-540-74735-2_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74734-5
Online ISBN: 978-3-540-74735-2
eBook Packages: Computer Science, Computer Science (R0)