1 Introduction

Nowadays, more and more automatic access systems are based on various biometric techniques. Face recognition systems [8, 21] are characterized by low invasiveness of acquisition and increasingly better reliability. The main problem occurring in such systems is the low resolution of details when photographs are taken from long distances. Identification (recognition) from such images is therefore an important scientific issue [31, 32]. It can be observed that the detection stage and the recognition stage are typically analyzed separately. The latter is described in [19] in terms of video quality, testing methods, and system requirements as an extension of the ITU-T P.912 standard. Our paper examines the effect of resolution reduction on both face detection and face identification.

The authors have prepared a Matlab-based GUI (graphical user interface) application for experimental testing of face detection and recognition effectiveness. Our programs build on standard, well-established algorithms. The developed application enables real-time identification of people using wireless IP cameras as well as batch processing of databases containing facial images of various qualities. Batch processing has been used for testing the effectiveness of face identification with different parameters. The face recognition system is based on the eigenfaces approach.

The paper is organized as follows: Section 2 describes the related work, Section 3 presents the regulations of biometric standards related to face identification, and in Section 4 we briefly describe and compare face databases from various universities. The following sections describe the elements of our face detection (Section 5) and recognition (Section 6) system and study the effectiveness of these processes under variable lighting, face angle, and image resolution. Section 7 summarizes the obtained results.

2 Related work

Issues related to face recognition in real-time monitoring systems are discussed in [8]. Its authors conclude that most identification techniques assume a full-frontal image of the face. Face recognition from a video stream is a more difficult task, because the system has to be resistant to changes in illumination, scale (size) of pictures, and face position. The solution proposed in [8] is the use of the OpenCV Face Detector, which implements the technique proposed by Viola and Jones. The detection process can also be based on skin-color detection.

Problems related to the low resolution of images and the SR (super resolution) technique are presented in [32]. Based on a set of training LR-HR (low-resolution–high-resolution) image pairs and an LR input image, the SR algorithm estimates the HR image. During the experiments the authors manually aligned all images by the eye positions and concentrated on the frontal view.

In [17] face detection from color images is analyzed. Section 2 of that paper presents a set of candidate detection techniques. It is shown that the 12-bit Color Census Transform allows for 80 % detection efficiency even at a facial image resolution of 6 × 6 pixels. Unfortunately, the aspects of face detection at low resolution are analyzed without taking the identification stage into account.

Face identification is typically solved using the following techniques: principal component analysis (PCA) [23], Fisherfaces based on linear discriminant analysis (LDA), independent component analysis (ICA), support vector machines (SVM), and other approaches such as nonnegative matrix factorization (NMF) [10]. A comparative study of eigenfaces vs. Fisherfaces vs. ICA faces can be found in [29].

3 Biometric face recognition standards relevant to CCTV

Biometric standards describe general rules, directives, and features concerning biometric input data, such as face images.

Data interchange formats are one of the four main kinds of biometric standards; they specify the content and presentation formats for the exchange of biometric data [14]. The data presented below are based on two international standards: ISO/IEC 19794-5 (Biometric data interchange formats—Part 5: Face image data, 2005) [15] and ANSI/INCITS 385-2004 (Face Recognition Format for Data Interchange, 2004) [5].

The above-mentioned standards describe an example of the proper face position in an image. They specify distances (in pixels) for a picture taken at a resolution of 320 × 240. The regions of greatest interest (the inner region and the outer region) have been delineated. In these standards, the vertical line Mx approximates the horizontal midpoints of the mouth and of the bridge of the nose, while the horizontal line My passes through the centers of the left and the right eye. The point M, placed at the intersection of these lines, defines the center of the face. The x-coordinate Mx of M should lie between 45 % and 55 % of the image width, and the y-coordinate My of M between 30 % and 50 % of the image height. The width of the head should be in the range between 50 % and 75 % of the image width, and the length of the head between 60 % and 90 % of the image height. Rotation of the head should be less than about 5° from frontal in every direction: roll, pitch, and yaw. The standard also constrains the width-to-height ratio of the image, which should be between 1.25 and 1.34. Details about the compliance of public databases with these biometric norms are shown in Table 1.

Table 1 Comparison of face databases in relation to the biometric standards
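These geometric constraints can be verified programmatically before an image is admitted to a database. Below is a minimal Matlab sketch of such a check, assuming the face center point M and the head bounding box have already been obtained from an earlier detection step; the function and variable names are illustrative, not part of the standards.

    % Check the ISO/IEC 19794-5 geometric constraints quoted above.
    % (Mx, My) is the face center M; headW/headH are the head box
    % dimensions; imgW/imgH are the image dimensions, all in pixels.
    function ok = meets_frontal_geometry(Mx, My, headW, headH, imgW, imgH)
        okCenter = Mx >= 0.45*imgW && Mx <= 0.55*imgW && ...
                   My >= 0.30*imgH && My <= 0.50*imgH;
        okHead   = headW >= 0.50*imgW && headW <= 0.75*imgW && ...
                   headH >= 0.60*imgH && headH <= 0.90*imgH;
        ok = okCenter && okHead;
    end

For a 320 × 240 picture, for example, meets_frontal_geometry(160, 100, 180, 190, 320, 240) returns true.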

The second important standard relevant to CCTV (closed-circuit television) is the European norm EN 50132-7: CCTV and alarm systems [3]. It describes the recommended minimum size of the observed object (in this case a person). For face detection, the object should occupy at least 50 % of the height of the screen (for CCTV systems in which the screen height is 480 lines). For precise identification of a person, the object should occupy at least 120 % of the image height. Figure 1 presents a schema of the minimum size of the observed object according to this norm.

Fig. 1 Schema of the recommended minimum size of the observed object according to the European norm “EN 50132-7: CCTV and alarm systems”

4 Scientific databases of faces in relation to biometric standards

There are several experimental face image databases prepared by academic institutions. Below we briefly describe some of them and compare their compliance with the standards for facial biometrics.

The selection of face databases used for testing was dictated by the lack of face image databases acquired directly from real CCTV systems.

A database of the University of Sheffield contains 564 images of 20 persons [7]. Each person is shown in different positions, from profile to frontal view. The files are in PGM (Portable Graymap) format, in various resolutions, with 256 levels of grayscale. A disadvantage of this database is that the frontal face images are not clearly separated from the others. From all pictures, only the frontal photos were selected for our experiments. The results are shown in Table 1.

One of the best-constructed databases in terms of usability, ease of processing, and feature-based file sorting is the Yale Face Database. It includes 5,760 images of ten people [12]. Each photographed person has 576 images, taken in 9 positions and under different lighting conditions. Files in this database can be easily separated, because the file names clearly distinguish frontal photos from the others. The pictures are of good quality and high resolution (640 × 480 pixels) but still do not fully meet the requirements mentioned in the previous section (Table 1).

The next database tested against the biometric standards was the MUCT Face Database from the University of Cape Town [24]. It consists of 3,755 face images of 276 subjects. Each face was photographed with five different cameras at the same time, which yielded five facial images with different poses. Additionally, each individual was photographed under 4 different lighting sets.

The Color FERET Database [26] from George Mason University was collected between 1993 and 1996 and includes 2,413 facial images representing 856 individuals.

Summarizing, Table 1 shows that none of the above databases entirely respects the required biometric standards. The main problems in conforming to the standards are wrong proportions of image dimensions and too great a distance between the photographed person and the lens (particularly for older databases). It should, of course, be noted that the non-compliance of the tested face databases with the biometric recommendations is in some sense an advantage, because it allows analyzing the identification effectiveness when the tested individuals do not cooperate. Such situations are exactly what occurs when face analysis is based on recordings made with video surveillance systems.

5 Face detection

5.1 Methods overview

Face detection in a picture is the preliminary step of face recognition. By detection we understand locating the face in the image, i.e., determining its position. There are many programs for face detection [4, 13, 18, 21]. An example of a web application is faint (The Face Annotation Interface) [30].

Typically, the detection is realized in three stages. The first stage is the reduction of the impact of interfering factors by means of histogram equalization and noise reduction. The next stage is the determination of areas where a face is likely to be located. In the last stage, the previously selected areas are verified, and finally the face is detected and marked [11].
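A minimal Matlab sketch of the first stage is shown below; the median filter is an illustrative choice of noise reduction, as the cited works do not prescribe a specific filter, and the input file name is hypothetical.

    gray = rgb2gray(imread('frame.jpg'));  % hypothetical input frame
    gray = histeq(gray);                   % compensate uneven lighting
    gray = medfilt2(gray, [3 3]);          % suppress impulsive noise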

There are several common approaches to face detection. The first one locates the face based on the color of human skin. Human skin color differs in lighting intensity (luminance) but occupies a narrow range of chroma, so other elements in the image, which do not correspond to skin, can be effectively removed. Then, using mathematical morphology operations in the selected ROI (Region of Interest), further features indicating the presence of a face in the picture can be isolated [18]. Examples of face detection in different conditions obtained with the Face Detection in Color Images algorithm are shown in Fig. 2. As can be seen in this figure, the algorithm generally detects faces properly but has significant disadvantages. Since the method finds skin, the neck and sometimes even blond hair can be detected together with the face, enlarging the ROI. This method does not cope with images with low lighting or intensive side illumination.
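A minimal Matlab sketch of this approach is given below. The chroma thresholds are typical values reported in the literature for the YCbCr space, not the exact parameters of the evaluated algorithm, and the input file name is hypothetical.

    rgb   = imread('frame.jpg');                 % hypothetical input
    ycbcr = rgb2ycbcr(rgb);
    cb = ycbcr(:,:,2);  cr = ycbcr(:,:,3);
    % keep pixels whose chroma falls in the typical skin range
    mask = cb >= 77 & cb <= 127 & cr >= 133 & cr <= 173;
    % mathematical morphology clean-up of the skin mask
    mask = imopen(mask,  strel('disk', 3));      % remove small speckles
    mask = imclose(mask, strel('disk', 7));      % fill small holes
    mask = bwareaopen(mask, 500);                % drop tiny regions
    stats = regionprops(mask, 'BoundingBox');    % candidate face ROIs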

Fig. 2 Examples of face detection with different methods

The next technique for finding the face location is the use of geometric models. This method compares selected models of a test face with the processed image. An advantage of such detection is the possibility of working with static grayscale images. The method relies on the knowledge of the geometry of a typical human face and on the dependencies between its elements: their positions, distances, etc. It is based on the use of the Hausdorff distance [16, 25]. The FDMver1.0 algorithm [4] (Fig. 2) does not cope with rotations of the face greater than 45° in the vertical and horizontal directions or greater than 25° in the diagonal direction. Also, like the Skin Detection algorithm, the method using geometric models does not work properly in the case of intensive side illumination.
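For illustration, a minimal Matlab sketch of the directed Hausdorff distance between two point sets (e.g., edge points of a face model and of the image) is given below; the full Hausdorff distance is the maximum of the two directed distances. This is an illustrative helper, not the implementation of [16, 25].

    % A and B are n-by-2 and m-by-2 matrices of (x, y) point coordinates.
    function d = directed_hausdorff(A, B)
        d = 0;
        for i = 1:size(A, 1)
            % distance from point A(i,:) to its nearest neighbor in B
            dists = sqrt((B(:,1) - A(i,1)).^2 + (B(:,2) - A(i,2)).^2);
            d = max(d, min(dists));
        end
    end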

The third face detection technique is the use of Haar-like features [13]. This method is based on the object detector proposed by Paul Viola and Michael J. Jones and later improved by Rainer Lienhart and Jochen Maydt [20]. The program's input is the image (of IplImage* type; the algorithm was written using the OpenCV library) and its output is the position of the localized area, i.e., the face. The program uses four classifiers: one for the frontal face, one for both eyes, and separate ones for the left and the right eye. Thanks to histogram equalization, this algorithm copes with variable lighting and remains effective even under side illumination (Fig. 2). As already mentioned, the Haar-like method uses classifiers for eye detection (within the previously determined ROI in which the face was found), so if the eyes are hidden (as in Fig. 2, where the top part of the face is hidden), the face will not be detected. Summarizing, the Haar-like method precisely determines the face location in both color and grayscale images. It is resistant to changes in lighting (thanks to the histogram equalization) and to slight rotations of the head. This method has been used in our research for face detection.
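Since our detector was built on the OpenCV library, the Matlab sketch below is only an equivalent illustration of Viola-Jones detection using the Computer Vision Toolbox; the input file name is hypothetical.

    img  = imread('frame.jpg');                      % hypothetical input
    gray = histeq(rgb2gray(img));                    % equalization, as above
    faceDetector = vision.CascadeObjectDetector('FrontalFaceCART');
    bboxes = step(faceDetector, gray);               % one [x y w h] row per face
    imshow(insertShape(img, 'Rectangle', bboxes));   % mark detected faces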

5.2 Influence of image resolution on detection process

During the study of the influence of image resolution on the detection process, we examined 5,760 facial images from the Yale database at 5 resolutions, starting at 640 × 480 pixels and then reducing the image size 2, 4, 8, and 12 times. As mentioned in Section 4, the Yale database contains images of 10 people; each person's head is shown in 9 positions, and every position is captured under 64 different lighting conditions. Thus, we examined 28,800 photos with the use of the Haar-like method of face detection.
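A hedged sketch of this batch experiment is shown below: each image is downsampled by the given factors and the detection hit rate is recorded per factor. The directory layout and detector configuration are illustrative assumptions, not our exact script.

    factors  = [1 2 4 8 12];                       % 640x480 down to ~54x40
    detector = vision.CascadeObjectDetector('FrontalFaceCART');
    files    = dir(fullfile('yale', '*.pgm'));     % hypothetical layout
    hits     = zeros(size(factors));
    for f = 1:numel(factors)
        for k = 1:numel(files)
            img   = imread(fullfile('yale', files(k).name));
            small = imresize(img, 1 / factors(f));
            if ~isempty(step(detector, small))     % at least one face found
                hits(f) = hits(f) + 1;
            end
        end
    end
    rate = hits ./ numel(files);                   % detection efficiency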

Figure 3 shows the influence of the image resolution on the face detection process for various face positions, i.e., for each individual position from the database. It can be noted that reducing the image resolution even 4 times does not affect the detection process. When the image is downsampled 8 times, the detection efficiency is reduced by about 10 % for position P00 and by about 20 % for position P08, in which the head is directed down (by 45°) and to the left (by 30°). Figure 3 indicates that the Haar-like method gives the best detection results for positions P00–P03. If the face is rotated away from the frontal position, the face detection efficiency is much lower.

Fig. 3 Influence of image resolution and various face positions on the face detection process

It can be observed (Fig. 3) that the face can be detected (with a lower efficiency) even in an image with dimensions of 54 × 40 pixels. As shown in Fig. 4, the face in a photo with a resolution of 54 × 40 pixels has dimensions of about 17 × 21 pixels. Other faces in the images were not exactly the same size, but similar. This example has been included to illustrate that the 54 × 40 pixel detection limit applies to the whole picture, not to the size of the face; the facial area itself is much smaller, approximately 20–30 pixels across.

Fig. 4 Example of face detection in an image with a low resolution of 54 × 40 pixels, taken from the Yale database

Figures 5, 6 and 7 show the influence of the light source angle on the face detection process for different image resolutions. As in the case of the detection of rotated faces, the resolution does not have a significant impact on the detection for images of 640 × 480 (Fig. 5) and 160 × 120 (Fig. 6) pixels: the graphs are similar to each other and the face detection efficiency is between 70 % and 100 %. As shown in Figs. 5–7, the effectiveness of face detection is high (70–100 %) for a face illuminated from the front. When the face is illuminated from the bottom, the effectiveness of detection is 80 % for the 640 × 480 and 160 × 120 image resolutions, and 20 % for the 54 × 40 resolution. The worst detection results occur when the face is illuminated from the bottom right side. In this case, the detection efficiency is 70–90 % for the 640 × 480 and 160 × 120 image resolutions and 0–19 % for the resolution of 54 × 40 pixels.

Fig. 5 Influence of the light emission angle on the face detection process for images with a resolution of 640 × 480 pixels

Fig. 6 Influence of the light emission angle on the face detection process for images with a resolution of 160 × 120 pixels

Fig. 7 Influence of the light emission angle on the face detection process for images with a resolution of 54 × 40 pixels

Figures 8, 9, 10 and 11 show the face detection efficacy depending on the light emission angle (separately in the horizontal and vertical directions) and on the head positions P00, P01, P03, P05 and P07. The abbreviations of the particular face positions from the Yale database are explained in Fig. 3. This study may be useful in intelligent monitoring, where the camera position can be chosen with respect to the lighting conditions prevailing in the interior in order to achieve high detection efficiency.

Fig. 8 Face detection efficacy depending on the light emission angle (in the horizontal direction) and head positions (P00, P03, P07)

Fig. 9 Face detection efficacy depending on the light emission angle (in the vertical direction) and head positions (P00, P03, P07)

Fig. 10 Face detection efficacy depending on the light emission angle (in the horizontal direction) and head positions (P00, P01, P05)

Fig. 11 Face detection efficacy depending on the light emission angle (in the vertical direction) and head positions (P00, P01, P05)

6 Face recognition

6.1 Software description

For the analysis of face recognition we used modified Face Recognition System 2.1 software [27], working in the MATLAB environment. This program uses an algorithm based on PCA (principal component analysis), also called eigenfaces (eigenvectors determined by PCA are called eigenfaces when PCA is used to analyze face images) [2, 6]. A face is recognized by finding the nearest class, according to the numbering assigned at the beginning to the individual photographs (each class corresponds to a person).
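A minimal Matlab sketch of the eigenfaces approach with nearest-class matching is shown below; it illustrates the principle and is not the code of Face Recognition System 2.1. The variables trainImages (one vectorized 100 × 120 face per column), trainLabels, and probe are assumed to be given, and keeping 20 components is an arbitrary illustrative choice.

    X = double(trainImages);                % d-by-n matrix, one face per column
    m = mean(X, 2);
    A = X - repmat(m, 1, size(X, 2));       % center the data
    [V, D] = eig(A' * A);                   % small n-by-n Gram matrix trick
    [~, order] = sort(diag(D), 'descend');
    V = V(:, order(1:20));                  % keep the 20 strongest components
    E = A * V;                              % d-by-20 eigenface basis
    E = E ./ repmat(sqrt(sum(E.^2)), size(E, 1), 1);  % normalize columns
    W = E' * A;                             % projections of training faces
    w = E' * (double(probe(:)) - m);        % projection of the probe face
    [~, best] = min(sqrt(sum((W - repmat(w, 1, size(W, 2))).^2)));
    predictedClass = trainLabels(best);     % nearest class wins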

Our software is equipped with a GUI (graphical user interface) (Fig. 12) and can operate in two modes: continuous and batch processing [28]. A simplified block diagram of the recognition software is shown in Fig. 13.

Fig. 12 Face recognition program interface (Matlab environment)

Fig. 13 Simplified block diagram of face recognition program

In the first mode (continuous processing) we can acquire the image from a wireless IP webcam (e.g. D-Link DSC-930L [9]) or a standard USB camera and recognize the face of the person in front of the camera. Since images may be acquired in the YUV color palette, a procedure of automatic conversion from the YUV to the RGB color space was used. The next operations possible in this mode are noise reduction, face detection in the image (the skin color filter described in Section 5 was used) and background removal, in order to reduce the processing area and the calculation time. Images entering the base must be of the same size, so the detected ROI could not be used directly; the facial images were scaled to a resolution of 100 × 120 pixels.
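A minimal sketch of such a conversion is given below, assuming ITU-R BT.601 coefficients; the exact conversion matrix used by the camera firmware is an assumption.

    % yuv is an H-by-W-by-3 uint8 frame with Y, U (Cb), V (Cr) planes
    function rgb = yuv2rgb_frame(yuv)
        Y = double(yuv(:,:,1));
        U = double(yuv(:,:,2)) - 128;
        V = double(yuv(:,:,3)) - 128;
        R = Y + 1.402 * V;
        G = Y - 0.344 * U - 0.714 * V;
        B = Y + 1.772 * U;
        rgb = uint8(cat(3, R, G, B));   % uint8() also clamps to [0, 255]
    end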

The batch processing mode gives two possibilities: either images are loaded from a database, or a specified number of frames is recorded from a camera; in both cases the results are saved in .xls format.
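For illustration, results can be written out as in the hedged snippet below; the column layout and file name are assumptions, and xlswrite was the standard call in MATLAB versions of that period.

    results = {'File', 'PredictedClass'; 'img001.pgm', 3; 'img002.pgm', 7};
    xlswrite('recognition_results.xls', results);   % hypothetical file name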

6.2 Influence of image resolution on the recognition process

In the presented experiment the Yale Face Database [12], FullFaces [1] and MUCT [24] databases have been used.

The first database (Yale Face Database) includes rotated face pictures (9 positions) of 10 individuals under various light conditions (64 different illumination angles for each position); each photographed person has 576 images. Due to the difficult detection in images in which the light source was at a high angle, we decided to limit the database to 25 % for the recognition process. Person number 5 was excluded from the experiment because of very poor face detection (on average 50 % over all face positions). From the original pictures with a resolution of 640 × 480 pixels, faces were extracted using the Haar-like algorithm and resampled to the resolution closest to a power of 2, in this case 256 × 256 pixels. All pictures were then downsampled 2, 4, 8, and even 12 times, which corresponds to resolutions of 128 × 128, 64 × 64, 32 × 32, and 21 × 21 pixels, respectively.

As reference images for the experiment with the Yale database, 48 random face pictures of every individual were chosen. Another 95 face images (for every person) were used for testing.

Figure 14 shows FAR (false acceptance rate)/FRR (false rejection rate) plots for resolutions from 256 × 256 down to 21 × 21. As can be seen, downsampling 2 and 4 times does not influence the recognition accuracy. Downsampling 8 and 12 times decreases the recognition accuracy by about 8 %. The EER (equal error rate) in the first 3 cases is about 30 %. The presented results show that even downsampling 4 times does not influence the recognition accuracy, irrespective of face rotation and light conditions. The combination of various light conditions and face rotations causes, however, some deterioration of the results in comparison with the previous experiments described in [22]. As previously shown, frontal images, i.e., non-rotated faces (which is the main requirement in the norms), can be distinguished between each other very well.
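The FAR/FRR curves and the EER can be computed from matching distances as in the minimal sketch below, where gen and imp are assumed column vectors of genuine and impostor distances.

    thr = linspace(min([gen; imp]), max([gen; imp]), 200);
    FAR = zeros(size(thr));  FRR = zeros(size(thr));
    for t = 1:numel(thr)
        FAR(t) = mean(imp <= thr(t));   % impostors wrongly accepted
        FRR(t) = mean(gen >  thr(t));   % genuine faces wrongly rejected
    end
    [~, i] = min(abs(FAR - FRR));       % EER: threshold where FAR ~ FRR
    EER = (FAR(i) + FRR(i)) / 2;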

Fig. 14 FAR/FRR plot of face recognition accuracy for original and downsampled Yale database

The second database (FullFaces) includes 10 (horizontally and vertically) rotated face pictures of 30 individuals under constant light conditions, saved with a resolution of 512 × 342 pixels. All pictures were used for the experiment. After the detection stage, the face images were saved with a resolution of 256 × 256 pixels and then downsampled 2, 4, 8, and 12 times.

In the case of the FullFaces database, 4 random pictures of each individual were used to create the models, and the other 6 files were used in the recognition stage. In contrast to the Yale database, the face pictures are only rotated horizontally and vertically. This noticeably influences the recognition accuracy, which can be seen in Fig. 15. The EER is in all cases in the range of 18–23 %. It can be noticed that, in contrast to the Yale database, even downsampling 8 and 12 times does not significantly decrease the recognition accuracy; the main differences can be seen at the ends of the curves. The presented results show that under stable light conditions, face rotation and decimation do not significantly decrease the face recognition accuracy, at least when the PCA algorithm is used.

Fig. 15 FAR/FRR plot of face recognition accuracy for original and downsampled FullFaces database

To make the previous results more conclusive, we also used the MUCT database, with color images and a greater number of persons. Unfortunately, this database contains different numbers of images for particular individuals. In this case, our research was divided into two separate stages.

The first stage of the MUCT database recognition research was the analysis of the effectiveness under changes of the head position. Two hundred seventy-six subjects, each photographed in five different positions, were used to determine the recognition effectiveness. One face image of each person was taken for the training stage and the remaining four were used for the recognition stage. The recognition efficiency for the original and downsampled MUCT database for different face positions is shown in Fig. 16. This figure shows that the upward- and downward-directed head positions have the highest recognition efficiency: 90 % and 75 %, respectively. Generally, our face recognition algorithm is resistant to changes of the face resolution: the recognition efficiency for the resolution of 21 × 21 pixels is reduced by at most 10 % for each tested position.

Fig. 16 Recognition efficiency for original and downsampled MUCT database for different face positions

The second stage of the MUCT database recognition research was the examination of the effectiveness under lighting changes. One hundred ninety-nine individuals, illuminated with 2 lighting sets, were used in the recognition process. The first illumination set (marked ‘q’, ‘r’, ‘s’) consists of 91 individuals, and the second illumination set (marked ‘t’, ‘u’, ‘v’) consists of 108 individuals. The first illumination type of each set (‘q’ and ‘t’) was used for two separate training stages; the remaining illuminations (‘r’, ‘s’ and ‘u’, ‘v’, respectively) were used for the recognition stage. Figure 17 shows the recognition efficiency for the original and downsampled MUCT database for the different sets of lights. The difference in the recognition efficiency between these two illumination sets was only about 5 %. For the 256 × 256 pixel resolution, the recognition efficiency was at the level of 75 %. After decimation, when the face image size was 21 × 21 pixels, the recognition efficiency decreased to the level of 61 %.

Fig. 17 Recognition efficiency for original and downsampled MUCT database for different light emission angles

7 Conclusions

This paper examines the effect of resolution reduction on both the face detection stage and face identification. The obtained results may find widespread use in CCTV image analysis. The plots discussed in Section 6 indicate that face recognition is correct even for images with a resolution of 21 × 21 pixels. This means that persons can be recognized from a large distance (of several meters) using basic monitoring systems. Reliable acquisition of frontal face images can additionally be ensured by placing the camera, e.g., at the top of a straight staircase.

The achieved EER in every case (even for low resolutions) was between 18 % and 23 % (for the FullFaces database). An additional advantage of our approach is that it operates correctly even with a small amount of training data; the proportion of training data to testing data in our experiments is up to ca. 0.3. Another positive feature, in relation to the EN 50132-7 norm, is that face detection and recognition can take place at lower resolutions than those indicated in the above-mentioned standards. The minimum face height in the picture according to the EN 50132-7 standard and according to our research is shown in Table 2.

Table 2 Comparison of EN 50132-7 norm and our study for face detection and recognition