Abstract
Feature selection from imbalanced data plays an important role in building efficient decision support systems, improving machine learning performance, and enhancing classification accuracy. The problem of feature selection becomes even more difficult with imbalanced data, which occurs in real-world domains when the classes in the data set are not equally represented. Traditional classifiers, which seek accurate performance over the full range of instances, are not suitable for imbalanced learning tasks, since they tend to classify all the data into one class. In this paper, the Mahalanobis genetic algorithm (MGA) classifier is proposed to address the problem of feature selection for imbalanced welding data. The MGA classifier was benchmarked against the Mahalanobis-Taguchi system (MTS) classifier in terms of the following metrics: total misclassification error, area under the curve (AUC) of the receiver operating characteristic (ROC) curve, and the signal-to-noise (S/N) ratio. A real-life data set from the spot welding process was used as a pilot study. In terms of total misclassification error and AUC, the MGA showed better classification performance than MTS. Very similar results were obtained when the training data set was balanced using the Synthetic Minority Oversampling Technique (SMOTE), which indicates that the MGA and MTS classifiers are suitable for imbalanced data sets without any preprocessing. Regarding the S/N ratio, the results were inconsistent with the other classification metrics, which raises questions about its credibility.
References
El-Banna M, Filev D, Chinnam RB (2008) Online qualitative nugget classification by using a linear vector quantization neural network for resistance spot welding. Int J Adv Manuf Technol 36(3–4):237–248
Aslanlar S, Ogur A, Ozsarac U, Ilhan E, Demir Z (2007) Effect of welding current on mechanical properties of galvanized chromided steel sheets in electrical resistance spot welding. Mater Des 28(1):2–7
Chao YJ (2003) Ultimate strength and failure mechanism of resistance spot weld subjected to tensile, shear, or combined tensile/shear loads. J Eng Mater-T ASME 125(2):125–132
Zhou M, Hu S, Zhang H (1999) Critical specimen sizes for tensile-shear testing of steel sheets. Weld J 78:305–313
Marya M, Wang K, Hector LG, Gayden X (2006) Tensile-shear forces and fracture modes in single and multiple weld specimens in dual-phase steels. J Manuf Sci Eng 128(1):287–298
Hao M, Osman K, Boomer D, Newton C (1996) Developments in characterization of resistance spot welding of aluminum. Weld J 75(1):1–4
Tumuluru MD (2006) Resistance spot welding of coated high-strength dual-phase steels. Weld J 85(8):31–37
Aslanlar S (2006) The effect of nucleus size on mechanical properties in electrical resistance spot welding of sheets used in automotive industry. Mater Des 27(2):125–131
Dickinson D, Franklin J, Stanya A (1980) Characterization of spot welding behavior by dynamic electrical parameter monitoring. Weld J 59(6):170–176
Gedeon S, Eagar T (1986) Resistance spot welding of galvanized steel: Part II. Mechanisms of spot weld nugget formation. Metall Trans B 17(4):887–901
Xu L, Chao Y-J (1999) Prediction of nugget development during resistance spot welding using coupled thermal–electrical–mechanical model. Sci Technol Weld Join 4(4):201–207
Martín Ó, López M, Martin F (2007) Artificial neural networks for quality control by ultrasonic testing in resistance spot welding. J Mater Process Technol 183(2):226–233
Martín O, Tiedra PD, Lopez M, San-Juan M, García C, Martín F, Blanco Y (2009) Quality prediction of resistance spot welding joints of 304 austenitic stainless steel. Mater Des 30(1):68–77
Cullen J, Athi N, Al-Jader M, Johnson P, Al-Shamma’a A, Shaw A, El-Rasheed A (2008) Multisensor fusion for on line monitoring of the quality of spot welding in automotive industry. Meas 41(4):412–423
Eşme U (2009) Application of Taguchi method for the optimization of resistance spot welding process. Arab J Sci Eng 34(2):519–528
Podržaj P, Polajnar I, Diaci J, Kariž Z (2004) Expulsion detection system for resistance spot welding based on a neural network. Meas Sci Technol 15(3):592–598
Ji C, Zhou Y (2004) Dynamic electrode force and displacement in resistance spot welding of aluminum. J Manuf Sci Eng 126(3):605–610
Tang H, Hou W, Hu S, Zhang H (2000) Force characteristics of resistance spot welding of steels. Weld J 79(7):175–183
El-Banna M, Filev D, Tseng F (2011) Force-based weld quality monitoring algorithm. Int J Intell Syst Technol Appl 10(1):1–14
Ji C, Deng L (2010) Quality control based on electrode displacement and force in resistance spot welding. Front Mech Eng Chin 5(4):412–417
Chen J, Farson D (2004) Electrode displacement measurement dynamics in monitoring of small scale resistance spot welding. Meas Sci Technol 15(12):2419–2425
Zhou K, Cai L (2013) Online nugget diameter control system for resistance spot welding. Int J Adv Manuf Technol 2571–2588. doi:10.1007/s00170-013-4886-0
Cho Y, Rhee S (2000) New technology for measuring dynamic resistance and estimating strength in resistance spot welding. Meas Sci Technol 11(8):1173–1178
Wang S, Wei P (2001) Modeling dynamic electrical resistance during resistance spot welding. J Heat Transf 123(3):576–585
Lee S, Choo Y, Lee T, Kim M, Choi S (2001) A quality assurance technique for resistance spot welding using a neuro-fuzzy algorithm. J Manuf Syst 20(5):320–328
Park YJ, Cho H (2004) Quality evaluation by classification of electrode force patterns in the resistance spot welding process using neural networks. In: Proceedings of the Institution of Mechanical Engineers, Part B: J Eng Manuf. Professional Engineering Publishing Ltd., pp 1513–1524
Cudney EA, Drain D, Paryani K, Sharma N (2009) A comparison of the Mahalanobis-Taguchi system to a standard statistical method for defect detection. J Ind Syst Eng 2(4):250–258
Cudney EA, Paryani K, Ragsdell KM (2006) Applying the Mahalanobis–Taguchi system to vehicle handling. Concurr Eng 14(4):343–354
Cudney EA, Paryani K, Ragsdell KM (2008) Identifying useful variables for vehicle braking using the adjoint matrix approach to the Mahalanobis-Taguchi system. J Ind Syst Eng 1(4):281–292
Riho T, Suzuki A, Oro J, Ohmi K, Tanaka H (2005) The yield enhancement methodology for invisible defects using the MTS+ method. Semicond Manuf IEEE Trans 18(4):561–568
Huang C-L, Hsu T-S, Liu C-M (2009) The Mahalanobis–Taguchi system—neural network algorithm for data-mining in dynamic environments. Expert Syst Appli 36(3):5475–5480
Lee Y-C, Teng H-L (2009) Predicting the financial crisis by Mahalanobis–Taguchi system—examples of Taiwan’s electronic sector. Expert Syst Appl 36(4):7469–7478
Taguchi G, Rajesh J (2000) New trends in multivariate diagnosis. Sankhyā. Indian J Stat, Series B 62:233–248
Huang X, Chen S (2006) SVM-based fuzzy modeling for the arc welding process. Mater Sci Eng A 427(1):181–187
Wan X, Pekny JF, Reklaitis GV (2005) Simulation-based optimization with surrogate models—application to supply chain management. Comput Chem Eng 29(6):1317–1328
Byvatov E, Schneider G (2003) Support vector machine applications in bioinformatics. Appl Bioinforma 2(2):67–77
Li M, Yuan B (2005) 2D-LDA: a statistical linear discriminant analysis for image matrix. Pattern Recogn Lett 26(5):527–532
Alexandre-Cortizo E, Rosa-Zurera M, Lopez-Ferreras F (2005) Application of Fisher linear discriminant analysis to speech/music classification. In: Computer as a Tool, 2005 EUROCON 2005 The International Conference on. IEEE, pp 1666–1669
Liu C, Wechsler H (2002) Gabor feature based classification using the enhanced fisher linear discriminant model for face recognition. Image Process IEEE Trans 11(4):467–476
Mladenic D, Grobelnik M (1999) Feature selection for unbalanced class distribution and naive Bayes. ICML pp 258–267
Li T, Zhang C, Ogihara M (2004) A comparative study of feature selection and multiclass classification methods for tissue classification based on gene expression. Bioinformatics 20(15):2429–2437
Yang J, Honavar V (1998) Feature subset selection using a genetic algorithm. In: Feature extraction, construction and selection. Springer, New York, pp 117–136
Beddoe GR, Petrovic S (2006) Selecting and weighting features using a genetic algorithm in a case-based reasoning approach to personnel rostering. Eur J Oper Res 175(2):649–671
Pal A, Maiti J (2010) Development of a hybrid methodology for dimensionality reduction in Mahalanobis–Taguchi system using Mahalanobis distance and binary particle swarm optimization. Expert Syst Appl 37(2):1286–1293
Babaoglu İ, Findik O, Ülker E (2010) A comparison of feature selection models utilizing binary particle swarm optimization and genetic algorithm in determining coronary artery disease using support vector machine. Expert Syst Appl 37(4):3177–3183
Jin N, Rahmat-Samii Y (2007) Advances in particle swarm optimization for antenna designs: real-number, binary, single-objective and multiobjective implementations. Antennas Propag IEEE Trans 55(3):556–567
Mao J, Jain AK (1995) Artificial neural networks for feature extraction and multivariate data projection. Neural Networks IEEE Trans 6(2):296–317
Joo S, Yang YS, Moon WK, Kim HC (2004) Computer-aided diagnosis of solid breast nodules: use of an artificial neural network based on multiple sonographic features. Med Imaging IEEE Trans 23(10):1292–1300
John GH, Kohavi R, Pfleger K (1994) Irrelevant features and the subset selection problem. ICML pp 121–129
Kohavi R, John GH (1997) Wrappers for feature subset selection. Artif Intell 97(1):273–324
Wold S, Esbensen K, Geladi P (1987) Principal component analysis. Chemom Intell Lab Syst 2(1):37–52
Mirapeix J, García-Allende P, Cobo A, Conde O, López-Higuera J (2007) Real-time arc-welding defect detection and classification with principal component analysis and artificial neural networks. Ndt E Int 40(4):315–323
Petroni A, Braglia M (2000) Vendor selection using principal component analysis. J Supply Chain Manag 36(2):63–69
Bruce Ho C-T, Dash Wu D (2009) Online banking performance evaluation using data envelopment analysis and principal component analysis. Comput Oper Res 36(6):1835–1842
Banda JM, Angryk RA, Martens PC (2012) Quantitative comparison of linear and non-linear dimensionality reduction techniques for solar image archives. In: FLAIRS Conference
Schölkopf B, Smola A, Müller K-R (1997) Kernel principal component analysis. In: Artificial Neural Networks—ICANN’97. Springer, Heidelberg, pp 583–588
Apley DW, Shi J (2001) A factor-analysis method for diagnosing variability in mulitvariate manufacturing processes. Technometrics 43(1):84–95
Liu C-W, Lin K-H, Kuo Y-M (2003) Application of factor analysis in the assessment of groundwater quality in a blackfoot disease area in Taiwan. Sci Total Environ 313(1):77–89
Kahn JH (2006) Factor analysis in counseling psychology research, training, and practice principles, advances, and applications. Couns Psychol 34(5):684–718
Bishop CM, Svensén M, Williams CK (1998) GTM: the generative topographic mapping. Neural comput 10(1):215–234
Yin H (2008) The self-organizing maps: background, theories, extensions and applications. In: Computational intelligence: a compendium. Springer, Heidelberg, pp 715–762
Vidal-Naquet M, Ullman S (2003) Object recognition with informative features and linear classification. ICCV pp 281–288
Caruana R, Freitag D (1994) Greedy attribute selection. ICML pp 28–36
Nettleton DF, Orriols-Puig A, Fornells A (2010) A study of the effect of different types of noise on the precision of supervised learning techniques. Artif Intell Rev 33(4):275–306
Yang W-R, Wang C-S (2011) Current measurement of resistance spot welding using DSP. Tamkang J Sci Eng 14(1):33–38
Guyon I, Elisseeff A (2003) An introduction to variable and feature selection. J Mach Learn Res 3:1157–1182
Addison D, Wermter S, Arevian G A (2003) Comparison of feature extraction and selection techniques. In: Proceedings of International Conference on Artificial Neural Networks (Supplementary Proceedings), pp 212–215
He H, Garcia EA (2009) Learning from imbalanced data. Knowl Data Eng IEEE Trans 21(9):1263–1284
Japkowicz N (2000) Learning from imbalanced data sets: a comparison of various strategies. In: AAAI workshop on learning from imbalanced data sets, Menlo Park
Woodall WH, Koudelik R, Tsui K-L, Kim SB, Stoumbos ZG, Carvounis CP (2003) A review and analysis of the Mahalanobis—Taguchi system. Technometrics 45(1):1–15
Grichnik A, Seskin M (2005) Mahalanobis distance genetic algorithm (MDGA) method and system. U.S. Patent US20060230018 A1, Oct 12, 2006
Fawcett T (2004) ROC graphs: notes and practical considerations for researchers. Mach Learn 31:1–38
Fawcett T (2006) An introduction to ROC analysis. Pattern Recogn Lett 27(8):861–874
Egan JP (1975) Signal detection theory and ROC analysis. Academic, New York
Swets JA, Dawes RM, Monahan J (2000) Better decisions through science. Sci Am 83:360–367
Swets JA (1988) Measuring the accuracy of diagnostic systems. Science 240(4857):1285–1293
Zou KH, O’Malley AJ, Mauri L (2007) Receiver-operating characteristic analysis for evaluating diagnostic tests and predictive models. Circ 115(5):654–657
Provost FJ, Fawcett T (1997) Analysis and visualization of classifier performance: comparison under imprecise class and cost distributions. KDD pp 43–48
Provost F, Fawcett T (1998) Robust classification systems for imprecise environments. AAAI/IAAI pp 706–713
Domingos P (1999) MetaCost: a general method for making classifiers cost-sensitive. In: Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, pp 155–164
Provost F, Domingos P (2000) Well-trained PETs: improving probability estimation trees
Fawcett T, Provost FJ (1996) Combining data mining and machine learning for effective user profiling. KDD pp 8–13
Provost F, Fawcett T (2001) Robust classification for imprecise environments. Mach Learn 42(3):203–231
Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP (2002) SMOTE: synthetic minority over-sampling technique. J Artif Intell Res 16:321–357
Furukawa M, Hasegawa H (2000) U.S. Patent No. 6,130,396. U.S. Patent and Trademark Office, Washington, DC
De Maesschalck R, Jouan-Rimbaud D, Massart DL (2000) The Mahalanobis distance. Chemom Intell Lab Syst 50(1):1–18
Rousseeuw PJ, Van Zomeren BC (1990) Unmasking multivariate outliers and leverage points. J Am Stat Assoc 85(411):633–639
Bersimis S, Psarakis S, Panaretos J (2007) Multivariate statistical process control charts: an overview. Qual Reliab Eng Int 23(5):517–543
Fuchs C, Benjamini Y (1994) Multivariate profile charts for statistical process control. Technometrics 36(2):182–195
Weinberger KQ, Blitzer J, Saul LK (2005) Distance metric learning for large margin nearest neighbor classification. In: Advances in neural information processing systems, pp 1473–1480
Xiang S, Nie F, Zhang C (2008) Learning a Mahalanobis distance metric for data clustering and classification. Pattern Recognit 41(12):3600–3612
Holland JH (1975) Adaptation in natural and artificial systems: An introductory analysis with applications to biology, control, and artificial intelligence. U Michigan Press, London, England
Goldberg DE (1989) Genetic algorithms in search, optimization, and machine learning, vol 412. Addison-Wesley Menlo Park, California
Dianati M, Song I, Treiber M (2002) An introduction to genetic algorithms and evolution strategies. Technical report. University of Waterloo, Ontario
Appendix
1.1 Mahalanobis distance
The Mahalanobis distance (MD) is a distance measure introduced by P. C. Mahalanobis in 1936. It is a useful way of determining the similarity of an unknown sample to a known set. It differs from the Euclidean distance in that it takes into account the correlations between variables. The MD has been used in multivariate outlier detection [86, 87], process control [88, 89], and pattern recognition [90, 91].
To explain the difference between the Euclidean distance and the MD, consider the data shown in Fig. 10. Suppose a new data point "a" is obtained, and a decision must be made as to whether this point belongs to the group. If the Euclidean distance is used as the decision criterion (left figure), point "a" will be classified into the group because it is the same distance from the center of the group as point "b." On the other hand, if the MD is used as the decision criterion (right figure), point "a" will not be classified into the group because it has a larger MD, owing to its weak correlation with the group data.
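The contrast above can be reproduced numerically. The sketch below (synthetic, illustrative data, not from the paper's welding data set) builds a strongly correlated two-variable group and compares both distances for two points that lie equally far from the group center in the Euclidean sense:

```python
import numpy as np

# Hypothetical correlated 2-D group data (illustrative only).
rng = np.random.default_rng(0)
x = rng.normal(0, 1, 500)
group = np.column_stack([x, 0.9 * x + rng.normal(0, 0.3, 500)])

mean = group.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(group, rowvar=False))

def euclidean(p):
    return float(np.linalg.norm(p - mean))

def mahalanobis(p):
    d = p - mean
    return float(np.sqrt(d @ cov_inv @ d))

a = np.array([1.5, -1.5])   # against the correlation trend, like point "a"
b = np.array([1.5, 1.5])    # along the correlation trend, like point "b"

# Nearly identical Euclidean distances from the group center...
print(euclidean(a), euclidean(b))
# ...but "a" has a much larger Mahalanobis distance than "b".
print(mahalanobis(a), mahalanobis(b))
```

Because the group's two variables move together, a point that violates that correlation ("a") is far in the Mahalanobis sense even though it is no farther from the center geometrically.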
Mathematically, the MD is the squared distance (denoted as D2) calculated for the jth observation in a sample of size n with p features (or variables) as follows:

$$ D_j^2 = \frac{1}{p}\, Z_{ij}^T\, R^{-1}\, Z_{ij} $$

where j = 1…n; i = 1…p; \( {Z}_{ij} \) is the standardized vector obtained by standardizing the values of \( {X}_{ij} \) (i.e., \( {Z}_{ij}=\left({X}_{ij}-{\overline{X}}_i\right)/{S}_i \), where \( {\overline{X}}_i \) is the mean of feature i and \( {S}_i \) is the sample standard deviation of feature i); \( {Z}_{ij}^T \) is the transpose of \( {Z}_{ij} \); and \( {R}^{-1} \) is the inverse of the correlation matrix of the features.
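A minimal NumPy sketch of this computation, assuming the standardization described above (the function name `mahalanobis_d2` is illustrative, not from the paper):

```python
import numpy as np

def mahalanobis_d2(X):
    """Scaled squared Mahalanobis distance D^2_j = (1/p) Z_j^T R^{-1} Z_j
    for each of the n observations in X (an n x p array), following the
    standardization described above."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    # Standardize each feature: Z_ij = (X_ij - mean_i) / S_i
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Inverse of the correlation matrix of the features
    R_inv = np.linalg.inv(np.corrcoef(Z, rowvar=False))
    # Row-wise quadratic form Z_j^T R^{-1} Z_j for all rows, scaled by 1/p
    return np.einsum('ij,jk,ik->i', Z, R_inv, Z) / p
```

A useful sanity check: with this scaling, the D² values of the reference (normal) group average to (n − 1)/n, i.e., approximately 1, which is the property MTS exploits when defining the Mahalanobis space.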
1.2 Genetic algorithms
The GA, which was inspired by the theory of evolution, is a global meta-heuristic technique first introduced by Holland in 1975 [92]. According to this theory, living beings compete with each other to survive. Weak beings face extinction within their environment through the process of natural selection. The strong ones have a better opportunity to pass their genes to future generations via reproduction. In the long run, beings carrying the right combination of genes become dominant in their population. Sometimes, random changes occur in genes during the slow process of evolution. If these changes provide additional advantages in the challenge for survival, new beings evolve from the old ones. Unsuccessful changes are eliminated by natural selection.
In GA terminology, a candidate solution is called a chromosome or an individual. These chromosomes consist of discrete units called genes. Each gene controls one or more features of the chromosome. Genes are assumed to be binary digits in the original implementation of GA by Holland. In later implementations, more varied gene types have been introduced. Normally, a chromosome corresponds to a unique solution in the solution space. This requires a mapping mechanism between the solution space and the chromosomes. This mapping is called an encoding. In fact, GA works on the encoding of a problem, not on the problem itself.
GA works on a group of chromosomes, called a population. The population is normally randomly initialized. As the search evolves, the population contains fitter and fitter solutions, and eventually it converges, meaning that it is dominated by a single solution.
There are two operators used to generate new solutions from existing ones: crossover and mutation. The crossover operator is the most important operator of GA. In crossover (Fig. 11), generally two chromosomes, called parents, are combined together to form new chromosomes, called offspring. The parents are selected among existing chromosomes in the population with preference toward fitness so that offspring is expected to inherit good genes which make the parents fitter. By iteratively applying the crossover operator, genes of good chromosomes are expected to appear more frequently in the population, eventually leading to convergence to an overall good solution.
On the other hand, the mutation operator introduces random changes into the characteristics of chromosomes. Mutation is generally applied at the gene level (Fig. 12). In typical GA implementations, the mutation rate (probability of changing the properties of a gene) is very small and depends on the length of the chromosome. Therefore, the new chromosome produced by mutation will not be very different from the original one. Mutation plays a critical role in GA.
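As an illustration (not tied to the paper's implementation), the two operators can be sketched for binary chromosomes as follows:

```python
import random

def one_point_crossover(parent1, parent2):
    """Combine two binary chromosomes at a random cut point
    (in the style of Fig. 11), producing two offspring."""
    cut = random.randint(1, len(parent1) - 1)
    return parent1[:cut] + parent2[cut:], parent2[:cut] + parent1[cut:]

def mutate(chromosome, rate=0.01):
    """Flip each gene independently with a small probability
    (in the style of Fig. 12)."""
    return [1 - g if random.random() < rate else g for g in chromosome]
```

Note that crossover only recombines existing genes (no gene value is created or destroyed across the two offspring), whereas mutation is the operator that can introduce gene values absent from both parents.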
In summary, crossover, by making the chromosomes in the population more alike, drives the population toward convergence, while mutation reintroduces genetic diversity into the population and helps the search escape from local optima.
Reproduction involves selecting chromosomes for the next generation. In the most general case, the fitness of an individual determines the probability of its survival into the next generation. There are different selection procedures in GA, depending on how the fitness values are used; proportional selection, ranking, and tournament selection are the most popular. The general procedure of GA [93] is as follows (Fig. 13):

Step 1: Set t = 1. Randomly generate N solutions to form the first population, P1. Evaluate the fitness of the solutions in P1.

Step 2: Crossover: Generate an offspring population Qt as follows:

2.1 Choose two solutions x and y from Pt based on the fitness values.

2.2 Using a crossover operator, generate offspring and add them to Qt.

Step 3: Mutation: Mutate each solution x ∈ Qt with a predefined mutation rate.

Step 4: Fitness assignment: Evaluate and assign a fitness value to each solution x ∈ Qt based on its objective function value and infeasibility.

Step 5: Selection: Select N solutions from Qt based on their fitness and copy them to Pt+1.

Step 6: If the stopping criterion is satisfied, terminate the search and return the current population. Otherwise, set t = t + 1 and go to Step 2.
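The six-step procedure above can be sketched as a short program. The version below is a generic illustration for binary chromosomes (using binary tournaments for step 2.1 and truncation for step 5), not the specific GA configuration used in the paper:

```python
import random

def genetic_algorithm(fitness, n_genes, pop_size=30, mutation_rate=0.05,
                      generations=100):
    """Minimal sketch of the six-step GA procedure above for binary
    chromosomes; `fitness` maps a chromosome (list of 0/1) to a value
    to be maximized."""
    # Step 1: randomly generate the initial population P1.
    pop = [[random.randint(0, 1) for _ in range(n_genes)]
           for _ in range(pop_size)]
    for _ in range(generations):
        # Step 2: crossover -- build the offspring population Qt.
        offspring = []
        while len(offspring) < 2 * pop_size:
            # 2.1: choose two parents by binary tournament on fitness.
            x = max(random.sample(pop, 2), key=fitness)
            y = max(random.sample(pop, 2), key=fitness)
            # 2.2: one-point crossover generates two offspring.
            cut = random.randint(1, n_genes - 1)
            offspring += [x[:cut] + y[cut:], y[:cut] + x[cut:]]
        # Step 3: mutate each gene with a small probability.
        offspring = [[1 - g if random.random() < mutation_rate else g
                      for g in c] for c in offspring]
        # Steps 4-5: evaluate fitness and keep the best N for Pt+1.
        pop = sorted(offspring, key=fitness, reverse=True)[:pop_size]
    # Step 6: stopping criterion (generation budget) reached.
    return max(pop, key=fitness)

# Usage: maximize the number of ones ("one-max") over 20-bit chromosomes.
best = genetic_algorithm(sum, n_genes=20)
```

For feature selection, as in the MGA, each gene would instead indicate whether a feature is included, and the fitness function would score the classifier built from the selected subset.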
More details about the history, theory and mathematical background, applications, and current directions of genetic algorithms can be found in [94].
El-Banna, M. A novel approach for classifying imbalance welding data: Mahalanobis genetic algorithm (MGA). Int J Adv Manuf Technol 77, 407–425 (2015). https://doi.org/10.1007/s00170-014-6428-9