Abstract
Identifying the source of a forged printed document from digital evidence is a challenging problem. Over the past several years, digital forensics for printed-document source identification has become increasingly important, as it bears on the investigation and prosecution of many types of crime. Unlike invasive forensic approaches, which require a fragment of the printed document as a specimen for verification, noninvasive techniques use an optical mechanism to explore the relationship between scanned images and the source printer. To characterize this relationship, the proposed decision-theoretic approach applies image processing techniques and data exploration methods to compute a large set of statistical features, including: Local Binary Patterns (LBP), the Gray Level Co-occurrence Matrix (GLCM), the Discrete Wavelet Transform (DWT), spatial filters, the Wiener filter, the Gabor filter, Haralick features, and SFTA features. The proposed aggregation method then combines the extracted features with a decision-fusion model of feature selection for classification. In addition, the impact of different paper textures and paper colors on printed-source identification is investigated. For comparison, an up-to-date deep learning system based on Convolutional Neural Networks (CNNs), which learn discriminative features automatically for complex image classification problems, is also developed. Experimental results comparing the two systems indicate that the proposed system achieves the best overall prediction accuracy for both image and text input and is superior to existing approaches. In brief, the proposed decision-theoretic model can be implemented very efficiently for real-world digital forensic applications.
Acknowledgments
This work was partially supported by the National Science Council in Taiwan, Republic of China, under NSC104-2410-H-009-020-MY2 and NSC106-2410-H-009-022-.
The authors would like to thank the anonymous reviewers for their valuable comments, which improved the quality of this manuscript. Special thanks to Jin-Sheng Yin and Goang-Jiun Wang at National Chiao Tung University, who helped with the revision and the software experiments.
Appendix: Formulas of feature filters
A brief description of the formulas for the ten feature filter sets is given below:
Feature Filter | Image Quality Measure | Formula |
GLCM | Region of interest (ROI) GLCM | \( R=\sum_{(i,j)\in ROI} 1 \), \( GLCM(i,j)=\frac{Img(i,j)}{\sum_{(i,j)} Img(i,j)} \), where \( (i,j) \) indicates a spatial location in the image and \( GLCM(i,j) \) is the normalized co-occurrence probability at \( (i,j) \). |
DWT | Three dilation wavelet functions \( \Psi^{(H)}(x,y) \), \( \Psi^{(V)}(x,y) \), and \( \Psi^{(D)}(x,y) \) | When the wavelet function is separable, i.e. \( f(x,y)=f_1(x)\,f_2(y) \), these functions can be written as \( \phi(x,y)=\phi(x)\,\phi(y) \), \( \Psi^{(H)}(x,y)=\Psi(x)\,\phi(y) \), \( \Psi^{(V)}(x,y)=\phi(x)\,\Psi(y) \), \( \Psi^{(D)}(x,y)=\Psi(x)\,\Psi(y) \), where \( \Psi^{(H)} \), \( \Psi^{(V)} \), and \( \Psi^{(D)} \) are the horizontal, vertical, and diagonal wavelets. |
Gaussian | \( G(x,y) \) is the Gaussian matrix element at position \( (x,y) \) | \( G(x,y)=\frac{1}{2\pi\sigma^2}\,e^{-\frac{x^2+y^2}{2\sigma^2}} \), where \( \sigma \) is the standard deviation. |
LoG | \( Log(x,y) \) is the high-frequency Laplacian-of-Gaussian filter | \( Log(x,y)=-\frac{1}{\pi\sigma^4}\left[1-\frac{x^2+y^2}{2\sigma^2}\right]e^{-\frac{x^2+y^2}{2\sigma^2}} \) |
Unsharp | \( f_s(x,y) \) is the sharpened image from the unsharp mask | \( f_s(x,y)=f(x,y)-\overline{f}(x,y) \), where \( \overline{f}(x,y) \) is a blurred version of \( f(x,y) \). |
Wiener | \( H(u,v) \) is the Wiener filter transfer function | \( g(x,y)=f(x,y)+n(x,y) \), \( H(u,v)=\frac{P_f(u,v)}{P_f(u,v)+\sigma^2} \), where \( \sigma^2 \) is the variance of the noise \( n(x,y) \) and \( P_f(u,v) \) is the signal power spectrum. |
Gabor | \( f \) is the frequency of the sinusoidal function, \( \theta \) is the orientation of the Gabor function, and \( \gamma \) and \( \eta \) are the sharpness parameters along the x and y axes | \( G(x,y)=\frac{f^2}{\pi\gamma\eta}\exp\left(-\frac{f^2}{\gamma^2}{x^{\prime}}^2-\frac{f^2}{\eta^2}{y^{\prime}}^2\right)\exp\left(j2\pi f x^{\prime}+\phi\right) \), \( x^{\prime}=x\cos\theta+y\sin\theta \), \( y^{\prime}=-x\sin\theta+y\cos\theta \) |
Haralick | Fourteen textural features computed from the normalized co-occurrence matrix \( p(i,j) \) | Angular second moment: \( \sum_i\sum_j p(i,j)^2 \); Contrast: \( \sum_{n=0}^{N_g-1} n^2\left\{\sum_{i=1}^{N_g}\sum_{j=1}^{N_g} p(i,j)\right\},\ \left|i-j\right|=n \); Correlation: \( \frac{\sum_i\sum_j (ij)\,p(i,j)-\mu_x\mu_y}{\sigma_x\sigma_y} \); Sum of squares (variance): \( \sum_i\sum_j (i-\mu)^2\,p(i,j) \); Inverse difference moment: \( \sum_i\sum_j \frac{p(i,j)}{1+(i-j)^2} \); Sum average: \( \sum_{i=2}^{2N_g} i\,p_{x+y}(i) \); Sum variance: \( \sum_{i=2}^{2N_g} (i-f_s)^2\,p_{x+y}(i) \), where \( f_s \) is the sum entropy; Sum entropy: \( f_s=-\sum_{i=2}^{2N_g} p_{x+y}(i)\log\left\{p_{x+y}(i)\right\} \); Entropy: \( -\sum_i\sum_j p(i,j)\log\left(p(i,j)\right) \); Difference variance: variance of \( p_{x-y}(i) \); Difference entropy: \( -\sum_{i=0}^{N_g-1} p_{x-y}(i)\log\left\{p_{x-y}(i)\right\} \); Information measure of correlation 1: \( \frac{HXY-HXY1}{\max\left\{HX,HY\right\}} \); Information measure of correlation 2: \( \left(1-\exp\left[-2.0\left(HXY2-HXY\right)\right]\right)^{1/2} \); Maximal correlation coefficient: (second largest eigenvalue of \( Q \))\( ^{1/2} \), where \( Q(i,j)=\sum_k \frac{p(i,k)\,p(j,k)}{p_x(i)\,p_y(k)} \). Here \( HX \) and \( HY \) are the entropies of \( p_x \) and \( p_y \), \( HXY=-\sum_i\sum_j p(i,j)\log\left(p(i,j)\right) \), \( HXY1=-\sum_i\sum_j p(i,j)\log\left\{p_x(i)\,p_y(j)\right\} \), and \( HXY2=-\sum_i\sum_j p_x(i)\,p_y(j)\log\left\{p_x(i)\,p_y(j)\right\} \). |
Fractal | \( \Delta(x,y) \): fractal (border) feature vector | \( \Delta(x,y)=\begin{cases}1, & \text{if } \exists\,(x^{\prime},y^{\prime})\in N_4[x,y]:\ I_b(x,y)=1 \wedge I_b(x^{\prime},y^{\prime})=0,\\ 0, & \text{otherwise,}\end{cases} \) where \( N_4[x,y] \) is the set of pixels that are 4-connected to \( (x,y) \) in the image. \( \Delta(x,y) \) takes the value 1 if the pixel at position \( (x,y) \) in the binary image \( I_b \) has the value 1 and has at least one 4-connected neighbor with the value 0; otherwise \( \Delta(x,y) \) takes the value 0. |
LBP | \( LBP_{P,R}(x_c,y_c) \): LBP features with \( P \) sampling points on a circle of radius \( R \) | \( LBP_{P,R}(x_c,y_c)=\sum_{p=0}^{P-1} s(g_p-g_c)\,2^p \), \( s(x)=\begin{cases}1, & \text{if } x\ge 0,\\ 0, & \text{otherwise,}\end{cases} \) where \( g_c \) is the gray value of the center pixel \( (x_c,y_c) \) and \( g_p \) are the gray values of the \( P \) sampling points. |
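As an illustration of the GLCM and Haralick rows above, the following sketch computes a normalized co-occurrence matrix for a single pixel offset and four of the fourteen Haralick features in pure NumPy. The function names (`glcm`, `haralick_subset`) and the single-offset formulation are ours, not the paper's implementation.

```python
import numpy as np

def glcm(img, dx=1, dy=0, levels=8):
    """Normalized GLCM for one non-negative pixel offset (dx, dy).

    `img` must hold integer gray levels in [0, levels).
    """
    h, w = img.shape
    P = np.zeros((levels, levels), dtype=np.float64)
    for y in range(h - dy):
        for x in range(w - dx):
            P[img[y, x], img[y + dy, x + dx]] += 1.0
    return P / P.sum()  # divide by total pair count -> probabilities

def haralick_subset(P):
    """Four of the fourteen Haralick features from a normalized GLCM P."""
    n = P.shape[0]
    i, j = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    asm = np.sum(P ** 2)                      # angular second moment
    contrast = np.sum((i - j) ** 2 * P)       # contrast
    idm = np.sum(P / (1.0 + (i - j) ** 2))    # inverse difference moment
    nz = P[P > 0]
    entropy = -np.sum(nz * np.log(nz))        # entropy (natural log)
    return asm, contrast, idm, entropy
```

In practice a library routine (e.g. scikit-image's co-occurrence utilities) would replace the explicit loops; they are kept here to mirror the formulas term by term.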
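The LBP formula in the last row can likewise be sketched for the common P = 8, R = 1 case, where the sampling circle reduces to the 8-neighborhood. This vectorized helper (`lbp8`, a name introduced here for illustration) thresholds each neighbor against the center pixel with \( s(\cdot) \) and packs the resulting bits into a code:

```python
import numpy as np

def lbp8(img):
    """LBP with P=8, R=1 over the 8-neighborhood (interior pixels only)."""
    # Neighbor offsets (dy, dx), ordered so bit p corresponds to point g_p.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    h, w = img.shape
    out = np.zeros((h - 2, w - 2), dtype=np.uint8)
    gc = img[1:h - 1, 1:w - 1]                      # center pixels g_c
    for p, (dy, dx) in enumerate(offsets):
        gp = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]  # shifted neighbors g_p
        out |= (gp >= gc).astype(np.uint8) << p     # s(g_p - g_c) * 2^p
    return out
```

On a constant image every neighbor equals the center, so \( s(g_p-g_c)=1 \) for all eight bits and every interior pixel receives the code 255.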
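The Gaussian and LoG rows translate directly into discrete filter kernels. A minimal sketch, assuming odd kernel sizes and helper names of our own choosing:

```python
import numpy as np

def gaussian_kernel(size=5, sigma=1.0):
    """Discrete Gaussian kernel G(x, y) = exp(-(x^2+y^2)/(2*sigma^2)) / (2*pi*sigma^2)."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    g = np.exp(-(xx**2 + yy**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)
    return g / g.sum()  # renormalize so the truncated filter preserves mean intensity

def log_kernel(size=7, sigma=1.0):
    """Discrete Laplacian-of-Gaussian kernel from the LoG formula above."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    r2 = xx**2 + yy**2
    return (-(1.0 / (np.pi * sigma**4))
            * (1 - r2 / (2 * sigma**2))
            * np.exp(-r2 / (2 * sigma**2)))
```

Convolving an image with `gaussian_kernel` gives the smoothing used before feature extraction; `log_kernel` is negative at the center and positive in the surround, the usual band-pass shape of the LoG.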
Tsai, MJ., Yuadi, I. & Tao, YH. Decision-theoretic model to identify printed sources. Multimed Tools Appl 77, 27543–27587 (2018). https://doi.org/10.1007/s11042-018-5938-0