
Decision-theoretic model to identify printed sources


Abstract

Examining digital evidence to identify the source of a forged printed document is a challenging task. Over the past several years, digital forensics for printed-document source identification has become increasingly important, as it is relevant to the investigation and prosecution of many types of crime. Unlike invasive forensic approaches, which require a portion of the printed document as the specimen for verification, noninvasive forensic techniques use an optical mechanism to explore the relationship between scanned images and the source printer. To explore this relationship, the proposed decision-theoretic approach utilizes image processing techniques and data exploration methods to compute a rich set of statistical features, including Local Binary Patterns (LBP), the Gray Level Co-occurrence Matrix (GLCM), the Discrete Wavelet Transform (DWT), spatial filters, the Wiener filter, the Gabor filter, Haralick features, and SFTA features. The proposed aggregation method then applies the extracted features within a decision-fusion model of feature selection for classification. In addition, the impact of different paper textures and paper colors on printed-source identification is investigated. For comparison, an up-to-date deep learning system based on Convolutional Neural Networks (CNNs), which learns features automatically to solve complex image classification problems, is also developed. Both systems are compared, and the experimental results indicate that the proposed system achieves the best overall prediction accuracy for both image and text input and is superior to existing approaches. In brief, the proposed decision-theoretic model can be efficiently implemented for real-world digital forensic applications.
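As a rough illustration of the decision-fusion idea (a minimal sketch, not the authors' implementation), one classifier can be trained per extracted feature set and the per-sample predictions fused by majority vote; the dictionary layout, SVM kernel choice, and function name below are illustrative assumptions.

```python
# Minimal sketch of decision fusion: one SVM per feature set, majority vote.
# Assumes integer class labels and scikit-learn being available; all names here
# are illustrative, not the paper's actual code.
import numpy as np
from sklearn.svm import SVC

def fuse_by_majority_vote(train_sets, y_train, test_sets):
    """train_sets/test_sets: dict feature-set name -> (n_samples, n_features) array."""
    votes = []
    for name, X_train in train_sets.items():
        clf = SVC(kernel="rbf", gamma="scale")   # hypothetical kernel choice
        clf.fit(X_train, y_train)
        votes.append(clf.predict(test_sets[name]))
    votes = np.vstack(votes)                     # (n_feature_sets, n_samples)
    # majority vote over the feature-set classifiers, per test sample
    return np.array([np.bincount(col).argmax() for col in votes.T])
```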




Acknowledgments

This work was partially supported by the National Science Council in Taiwan, Republic of China, under NSC104-2410-H-009-020-MY2 and NSC106-2410-H-009-022-.

The authors would like to thank the anonymous reviewers for their valuable comments, which improved the quality of this manuscript. Special thanks to Jin-Sheng Yin and Goang-Jiun Wang at National Chiao Tung University, who helped with the revision and the software experiments.

Author information


Corresponding author

Correspondence to Min-Jen Tsai.

Appendix: Formulas of feature filters

A brief description of the formulas for the ten feature filter sets is given below. Each entry lists the filter, the image quality measure it captures, and its formula.

GLCM

Region of interest R (ROI) and the normalized gray-level co-occurrence matrix:

\( R=\sum_{\left(i,j\right)\in ROI} 1 \)

\( GLCM\left(i,j\right)=\frac{Img\left(i,j\right)}{\sum_{\left(i,j\right)} Img\left(i,j\right)} \)

where (i, j) indicates a spatial location in the image and Img(i, j) is the co-occurrence count at (i, j), so that GLCM(i, j) is the corresponding probability.
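As an illustration, a normalized co-occurrence matrix matching this formula can be computed directly with NumPy (a sketch for a single pixel offset; the function name and parameters are illustrative):

```python
# Sketch: normalized GLCM for one offset (dx, dy), matching the formula above.
import numpy as np

def glcm(image, dx=1, dy=0, levels=256):
    """image: 2-D array of integer gray levels in [0, levels)."""
    counts = np.zeros((levels, levels), dtype=np.float64)
    rows, cols = image.shape
    for y in range(rows - dy):
        for x in range(cols - dx):
            i = image[y, x]
            j = image[y + dy, x + dx]
            counts[i, j] += 1          # Img(i, j): co-occurrence count
    return counts / counts.sum()       # GLCM(i, j): normalized to probabilities
```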

DWT

Scaling (dilation) and the three detail wavelet functions Ψ(H)(x, y), Ψ(V)(x, y), and Ψ(D)(x, y). When the 2-D wavelet is separable, i.e. f(x, y) = f1(x) f2(y), these functions can be written as products:

ϕ(x, y) = ϕ(x) ϕ(y)

Ψ(H)(x, y) = Ψ(x) ϕ(y)

Ψ(V)(x, y) = ϕ(x) Ψ(y)

Ψ(D)(x, y) = Ψ(x) Ψ(y)

where Ψ(H)(x, y), Ψ(V)(x, y), and Ψ(D)(x, y) are the horizontal, vertical, and diagonal wavelets, respectively.
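A one-level separable 2-D DWT of an image block can be obtained, for example, with PyWavelets (assumed available); dwt2 returns the approximation band plus the horizontal, vertical, and diagonal detail bands corresponding to Ψ(H), Ψ(V), and Ψ(D):

```python
# Sketch: one level of a separable 2-D DWT using PyWavelets (assumed installed).
import numpy as np
import pywt

image = np.random.rand(64, 64)                 # stand-in for a scanned block
cA, (cH, cV, cD) = pywt.dwt2(image, "haar")    # approximation + H, V, D detail bands
features = [np.mean(np.abs(b)) for b in (cH, cV, cD)]   # e.g. sub-band energies
```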

Gaussian

G(x, y) is the Gaussian filter coefficient at position (x, y):

\( G\left(x,y\right)=\frac{1}{2\pi {\sigma}^2}\,{e}^{-\frac{x^2+y^2}{2{\sigma}^2}} \)

where σ is the standard deviation.
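A Gaussian smoothing kernel can be built directly from this formula and applied by convolution (a sketch; kernel size and σ are illustrative):

```python
# Sketch: Gaussian kernel from the formula above, applied by convolution.
import numpy as np
from scipy.ndimage import convolve

def gaussian_kernel(size=5, sigma=1.0):
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    g = np.exp(-(x**2 + y**2) / (2.0 * sigma**2)) / (2.0 * np.pi * sigma**2)
    return g / g.sum()                 # normalize so the filter preserves brightness

smoothed = convolve(np.random.rand(64, 64), gaussian_kernel(5, 1.0))
```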

LoG

LoG(x, y) is the Laplacian-of-Gaussian (high-frequency) filter:

\( \mathrm{LoG}\left(x,y\right)=-\frac{1}{\pi {\sigma}^4}\left[1-\frac{x^2+y^2}{2{\sigma}^2}\right]{e}^{-\frac{x^2+y^2}{2{\sigma}^2}} \)
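The corresponding LoG kernel can be generated in the same way (a sketch; the negative-at-center sign convention follows the formula above, and the size and σ values are illustrative):

```python
# Sketch: Laplacian-of-Gaussian kernel built from the formula above.
import numpy as np

def log_kernel(size=9, sigma=1.4):
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    r2 = (x**2 + y**2) / (2.0 * sigma**2)
    return -(1.0 / (np.pi * sigma**4)) * (1.0 - r2) * np.exp(-r2)
```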

Unsharp

f_s(x, y) is the sharpening (unsharp) mask obtained from the image:

\( {f}_s\left(x,y\right)=f\left(x,y\right)-\overline{f}\left(x,y\right) \)

where \( \overline{f}\left(x,y\right) \) is a blurred version of f(x, y).
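A minimal sketch of the unsharp mask, using a Gaussian blur from SciPy as the smoothing step (the choice of blur and σ are assumptions):

```python
# Sketch: unsharp mask = original image minus a blurred copy.
import numpy as np
from scipy.ndimage import gaussian_filter

def unsharp_mask(f, sigma=2.0):
    f_bar = gaussian_filter(f, sigma=sigma)   # blurred version of f(x, y)
    return f - f_bar                          # f_s(x, y) = f(x, y) - f_bar(x, y)
```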

Wiener

H(u, v) is the Wiener filter transfer function for a noisy observation g(x, y):

g(x, y) = f(x, y) + n(x, y)

\( H\left(u,v\right)=\frac{P_f\left(u,v\right)}{P_f\left(u,v\right)+{\sigma}^2} \)

where σ² is the noise variance and P_f(u, v) is the signal power spectrum.
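A frequency-domain sketch of this attenuation, with the unknown signal power spectrum P_f(u, v) crudely approximated from the observed image (an illustrative simplification, not the exact estimator used in the paper):

```python
# Sketch: Wiener attenuation H(u, v) = P_f / (P_f + sigma^2) in the frequency domain.
import numpy as np

def wiener_denoise(g, noise_var):
    G = np.fft.fft2(g)
    # crude estimate of the signal power spectrum from the observed periodogram
    P_f = np.maximum(np.abs(G) ** 2 / g.size - noise_var, 1e-12)
    H = P_f / (P_f + noise_var)
    return np.real(np.fft.ifft2(H * G))
```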

Gabor

γ and η control the spatial spread along the rotated x′ and y′ axes, f is the frequency of the sinusoidal carrier, θ is the orientation of the Gabor function, and ϕ is the phase offset:

\( G\left(x,y\right)=\frac{f^2}{\pi \gamma \eta}\exp \left(-\frac{f^2}{\gamma^2}{x^{\prime}}^2-\frac{f^2}{\eta^2}{y^{\prime}}^2\right)\exp \left(j2\pi f{x}^{\prime}+\phi \right) \)

x′ = x cos(θ) + y sin(θ)

y′ = y cos(θ) − x sin(θ)
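A complex Gabor kernel can be generated from the formula above as follows (a sketch; kernel size and parameter values are illustrative):

```python
# Sketch: complex Gabor kernel from the reconstructed formula above.
import numpy as np

def gabor_kernel(size=21, f=0.2, theta=0.0, gamma=0.5, eta=0.5, phi=0.0):
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xp = x * np.cos(theta) + y * np.sin(theta)       # rotated coordinate x'
    yp = -x * np.sin(theta) + y * np.cos(theta)      # rotated coordinate y'
    envelope = np.exp(-(f**2 / gamma**2) * xp**2 - (f**2 / eta**2) * yp**2)
    carrier = np.exp(1j * (2.0 * np.pi * f * xp + phi))
    return (f**2 / (np.pi * gamma * eta)) * envelope * carrier
```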

Haralick

Fourteen texture descriptors computed from the normalized co-occurrence matrix p(i, j):

Angular second moment: \( \sum_i\sum_j p{\left(i,j\right)}^2 \)

Contrast: \( {\sum}_{n=0}^{{N}_{g}-1}{n}^2\left\{{\sum}_{i=1}^{{N}_{g}}{\sum}_{j=1}^{{N}_{g}}p\left(i,j\right)\right\},\ \left|i-j\right|=n \)

Correlation: \( \frac{\sum_i{\sum}_j(ij)\,p\left(i,j\right)-{\mu}_x{\mu}_y}{\sigma_x{\sigma}_y} \)

Sum of squares (variance): \( \sum_i\sum_j{\left(i-\mu \right)}^2p\left(i,j\right) \)

Inverse difference moment: \( {\sum}_i{\sum}_j\frac{1}{1+{\left(i-j\right)}^2}p\left(i,j\right) \)

Sum average: \( {\sum}_{i=2}^{2{N}_g}i\,{p}_{x+y}(i) \)

Sum variance: \( {\sum}_{i=2}^{2{N}_g}{\left(i-{f}_s\right)}^2{p}_{x+y}(i) \)

Sum entropy: \( {f}_s=-{\sum}_{i=2}^{2{N}_g}{p}_{x+y}(i)\log \left\{{p}_{x+y}(i)\right\} \)

Entropy: \( -\sum_i\sum_j p\left(i,j\right)\log \left(p\left(i,j\right)\right) \)

Difference variance: variance of \( {p}_{x-y} \)

Difference entropy: \( -{\sum}_{i=0}^{N_g-1}{p}_{x-y}(i)\log \left\{{p}_{x-y}(i)\right\} \)

Information measure of correlation 1: \( \frac{HXY-{HXY}_1}{\max \left\{HX,HY\right\}} \)

Information measure of correlation 2: \( {\left(1-\exp \left[-2.0\left({HXY}_2-HXY\right)\right]\right)}^{\frac{1}{2}} \)

Maximal correlation coefficient: (second largest eigenvalue of Q)^{1/2}, where \( Q\left(i,j\right)={\sum}_k\frac{p\left(i,k\right)p\left(j,k\right)}{p_x(i){p}_y(k)} \)

Here HX and HY are the entropies of p_x and p_y, \( HXY=-\sum_i\sum_j p\left(i,j\right)\log \left(p\left(i,j\right)\right) \), \( {HXY}_1=-\sum_i\sum_j p\left(i,j\right)\log \left\{{p}_x(i){p}_y(j)\right\} \), and \( {HXY}_2=-\sum_i\sum_j{p}_x(i){p}_y(j)\log \left\{{p}_x(i){p}_y(j)\right\} \).
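As a sketch, a few of these descriptors can be computed from a normalized co-occurrence matrix p, such as the one built in the GLCM sketch earlier (contrast is computed here in its equivalent form \( \sum_i\sum_j{\left(i-j\right)}^2p\left(i,j\right) \)):

```python
# Sketch: a subset of the Haralick descriptors from a normalized GLCM p.
import numpy as np

def haralick_subset(p, eps=1e-12):
    i, j = np.indices(p.shape)
    asm = np.sum(p ** 2)                                   # angular second moment
    contrast = np.sum(((i - j) ** 2) * p)                  # contrast (equivalent form)
    entropy = -np.sum(p * np.log(p + eps))                 # entropy
    idm = np.sum(p / (1.0 + (i - j) ** 2))                 # inverse difference moment
    return asm, contrast, entropy, idm
```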

Fractal (SFTA)

Δ(x, y) is the fractal (border) feature indicator:

\( \Delta \left(x,y\right)=\left\{\begin{array}{ll}1, & \mathrm{if}\ \exists \left({x}^{\prime },{y}^{\prime}\right)\in {N}_4\left[x,y\right]:\ {I}_b\left({x}^{\prime },{y}^{\prime}\right)=0\ \wedge\ {I}_b\left(x,y\right)=1\\ 0, & \mathrm{otherwise}\end{array}\right. \)

where N4[x, y] is the set of pixels 4-connected to (x, y) in the image. Δ(x, y) takes the value 1 if the pixel at position (x, y) in the binary image Ib has the value 1 and at least one of its 4-connected neighbours has the value 0; otherwise Δ(x, y) takes the value 0.
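The boundary indicator Δ(x, y) can be computed for a binary image with a direct scan of the 4-connected neighbours (a sketch; nested loops are used for clarity rather than speed):

```python
# Sketch: the boundary map Delta(x, y) above for a binary image Ib (values 0/1).
import numpy as np

def boundary_map(Ib):
    delta = np.zeros_like(Ib, dtype=np.uint8)
    rows, cols = Ib.shape
    for y in range(rows):
        for x in range(cols):
            if Ib[y, x] != 1:
                continue
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):   # N4 neighbours
                ny, nx = y + dy, x + dx
                if 0 <= ny < rows and 0 <= nx < cols and Ib[ny, nx] == 0:
                    delta[y, x] = 1
                    break
    return delta
```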

LBP

LBP_{P,R}(x_c, y_c) is the LBP code with P sampling points on a circle of radius R around the center pixel (x_c, y_c):

\( {LBP}_{P,R}\left({x}_c,{y}_c\right)={\sum}_{p=0}^{P-1}s\left({g}_p-{g}_c\right){2}^p,\quad s(x)=\left\{\begin{array}{ll}1, & x\ge 0\\ 0, & \mathrm{otherwise}\end{array}\right. \)

where g_c is the gray value of the center pixel and g_p are the gray values of the P neighbouring sampling points.
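A simplified version of this operator for P = 8, R = 1 (integer neighbours, no interpolation, so a simplification of the general circular LBP) can be written as:

```python
# Sketch: basic 8-neighbour LBP code for interior pixels of an integer image.
import numpy as np

def lbp_8_1(image):
    g_c = image[1:-1, 1:-1]                               # center pixels
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]          # 8 neighbours, fixed order
    code = np.zeros_like(g_c, dtype=np.uint8)
    for p, (dy, dx) in enumerate(offsets):
        g_p = image[1 + dy:image.shape[0] - 1 + dy, 1 + dx:image.shape[1] - 1 + dx]
        code |= ((g_p >= g_c).astype(np.uint8) << p)      # s(g_p - g_c) * 2^p
    return code
```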


About this article


Cite this article

Tsai, MJ., Yuadi, I. & Tao, YH. Decision-theoretic model to identify printed sources. Multimed Tools Appl 77, 27543–27587 (2018). https://doi.org/10.1007/s11042-018-5938-0

