Abstract

Breast cancer is the most lethal type of cancer among women worldwide. At present, there are no effective techniques for preventing or curing breast cancer, as the cause of the disease is unclear. Early diagnosis is a highly successful means of managing breast cancer, and early identification may result in a greater likelihood of complete recovery. Mammography is the most effective method of detecting breast cancer early. It can also reveal other abnormalities and may indicate the nature of a finding, such as normal, benign, or malignant. This article discusses an evolutionary approach for classifying and detecting breast cancer that is based on machine learning and image processing. The model combines image preprocessing, feature extraction, feature selection, and machine learning techniques to aid in the classification and identification of breast cancer. A geometric mean filter is used to enhance image quality. AlexNet is used for feature extraction. Feature selection is performed using the relief algorithm. For disease categorization and detection, the model uses machine learning techniques, namely, least square support vector machine (LS-SVM), KNN, random forest, and Naïve Bayes. The experimental investigation uses the MIAS dataset. The proposed approach is advantageous for accurately identifying breast cancer using image analysis.

1. Introduction

Cancerous cell growth can affect any region of the body. As the cancerous growth spreads, it crowds out normal cells, making it difficult for the body to function properly [1]. Cancer is not a single disease but rather a collection of diseases, and there are many different types. Malignant growth can occur in any internal organ, as well as in the blood cells, and it is not limited to one area. Different cancers also differ markedly in how they develop and spread.

Tumors and lumps form as a result of cancer growth. Some anomalies, however, may not be harmful. To determine whether a tumor or lump is cancerous, the doctor removes a small sample and examines it under a microscope. If it is not cancerous, it is referred to as a benign tumor. Some cancers, such as leukemia (a cancer of the blood), do not form tumors and instead arise in blood cells or other cells of the body [2].

Breast cancer begins with the abnormal growth of cells. These cells typically form a tumor that can be felt as a lump or seen on an X-ray. A tumor is malignant (cancerous) if its cells can invade surrounding tissue or spread to other parts of the body. Breast cancer can begin in different parts of the breast. Most breast cancers begin in the ducts that carry milk to the nipple. Some originate in the glands that produce milk. Breast cancer can occur in a variety of forms, some more common than others, and a few breast cancers originate in other tissues. Although there are many types of breast cancer, many of them produce a lump in the breast. Mammogram screening can detect a wide range of breast conditions, which helps address problems at an early stage [3].

Mammography captures an X-ray image of the breast. Digital mammography (DM) has reduced the need for repeat mammograms in breast screening. Computer programs that alert radiologists to subtle changes in a mammogram, and that integrate with film-free reading, are potential advantages of DM [4].

Computer-aided detection (CAD) systems can be used to categorize abnormalities such as breast and lung cancers and colon polyps. These systems can also be used when an expert human reader is unavailable. A CAD system analyzes the patient's images and marks for the radiologist the regions that appear abnormal. CAD performance depends on its settings, so these must be adjusted to obtain the most accurate results [5].

Most CAD systems are designed to help radiologists improve their accuracy and efficiency. Radiologists can be distracted by some features and overlook a lesion, particularly when they are trying to distinguish between conditions. Because interpretations of mammograms vary widely, it is not uncommon for two readers to reach different conclusions about the same case. The medical community must be able to rely on the outcomes [6], and, unlike radiologists, CAD systems process images faster without reducing accuracy.

This article describes a machine learning and image processing-based evolutionary approach for detecting and classifying breast cancer. The model uses image preprocessing, feature extraction, feature selection, and machine learning algorithms to categorize and detect breast cancer.

2. Literature Survey

Keeping track of cancer statistics is important because it requires long-term planning, continuous learning, and constant observation of every cancer patient [7]. The goal of that study is to show how data mining can improve the statistical analysis of cancer registry data. The data were gathered from the Greek cancer registry between 1998 and 2004. Data mining techniques and algorithms were used to preprocess the data prior to analysis. The authors developed a method for preparing and examining existing cancer data, which they hope may prove useful to other researchers in the future.

Based on a quantitative analysis of bilateral mammographic image feature differences across a series of negative full-field digital mammography images, Tan et al. [8] developed and tested a novel computational model for predicting near-term breast cancer risk. The retrospective dataset comprised digital mammograms of 335 women acquired at four separate time points. Three support vector machine risk models were built and tested using leave-one-case-out cross-validation. The results show that the risk model based on mammographic feature differences yields an increasing trend in the near-term risk of mammography-detected breast cancer.

The study in [9] applies four evolutionary optimization methods to two well-known machine learning datasets, Iris and Breast Cancer. A neural network is trained with each of the four optimizers (dragonfly, grey wolf, whale, and multi-verse optimization) to solve the classification problem. Several control metrics were considered in order to reach a sound conclusion. According to the comparative study, the grey wolf and multi-verse optimizers give more accurate results than the other two methods in terms of convergence, runtime, and classification rate.

According to Kaur et al. [10], radiologists increasingly use and request deep learning because it helps them reach a precise diagnosis and improves the accuracy of their predictions. The Mini-MIAS dataset, which contains 322 images, is used to demonstrate a new feature extraction technique that applies K-means clustering to select speeded-up robust features (SURF). At the classification stage, a deep neural network and a multiclass support vector machine are used, with 70% of the data allocated to training and 30% to testing. The automated decision-making method based on K-means clustering and a multiclass support vector machine achieves higher accuracy than manual assessment.

A systematic review [11] examines the progress made in computer-aided breast cancer diagnosis since its inception. The review draws on a wide range of technical databases and therefore covers a wide range of papers in the field. Its scope was nevertheless limited to academic and scholarly publications, with no consideration given to commercial systems. The results provide an overview of the current state of computer-aided diagnosis systems in terms of the imaging modalities and the machine learning classifiers being used.

According to Mohanty and Mohanty [12], breast cancer is the second most common form of cancer worldwide, after lung cancer. Approximately 1.9 billion people were overweight in 2016, of whom over 650 million were obese. The authors argue that obesity and breast cancer risk are strongly linked. Regarding oestrogen production in body fat, the review explains that fat tissue is a peripheral site of oestrogen biosynthesis and that oestrogen exposure is affected by the distribution of body fat.

Breast cancer detection accuracy and reduced diagnostic variation are critical, according to Wang et al. [13]. WAUCE (weighted area under the receiver operating characteristic curve ensemble) is a new model introduced by the authors, and its performance is compared with that of previous models utilizing datasets from Wisconsin diagnostic centers. When compared to existing ensemble models, the suggested approach achieves greater accuracy and a significantly lower variance in the detection of breast cancer. As an improved and more dependable alternative, the authors suggest that this methodology be used to diagnose other disorders.

According to Yang et al. [14], deep learning-based classification of breast tissue from histology images has low accuracy because of the lack of training data and a lack of knowledge about structural and textural information that can span many scales. The ensemble of multiscale convolutional neural networks (EMS-Net) approach presented in that paper classifies haematoxylin-eosin-stained breast microscopy images into four categories, namely, normal tissue, benign lesion, in-situ carcinoma, and invasive carcinoma. Each image is first rescaled to multiple scales. Training patches are then cropped and augmented at each scale and used to fine-tune pretrained models such as ResNet-152, DenseNet-161, and ResNet-101. The EMS-Net approach has better accuracy than the other three algorithms tested.

According to Xu et al. [15], manual segmentation of ultrasound images takes a significant amount of time, which makes automatic segmentation essential. The authors use convolutional neural networks to segment breast ultrasound images into four classes: mass, skin, fat, and fibroglandular tissue. It appears from the quantitative measures and the Jaccard similarity index that the suggested strategy outperforms the alternatives by an impressive 80 percent. The proposed technique may provide the segmentations needed to support the clinical analysis of breast cancer and to improve imaging in other types of medical ultrasound.

The work in [16] focused on the ensemble approaches that are widely employed in bioinformatics for prediction tasks. Ensemble classification techniques for breast tumors are examined in terms of nine aspects, including publication venues, medical tasks and research types, the ensemble types proposed, the single techniques used to build the ensembles, the validation framework adopted to evaluate them, the tools used to construct them, and their optimization. A total of 193 papers published after the year 2000 were retrieved from the IEEE Xplore, Scopus, ACM, and PubMed databases. Among the six medical tasks considered, diagnosis appears to be the most frequently explored, and experiment-based empirical studies and evaluation research are the most common research types. The article thoroughly examines the use of ensemble approaches in breast cancer and closes with a summary of findings and suggestions for specialists in breast cancer research.

Kakti et al. [17] demonstrated greater precision in diagnosing breast cancer cases on the datasets they studied. They proposed the MMDBM (mixed mode database miner) algorithm, which is based on supervised learning with Hunt's algorithm and decision trees. Empirical evaluation and comparative analysis show that the output of the proposed technique is more precise. The authors suggest that adding other datasets and attributes could yield even better results.

For breast cancer detection, Ting et al. [18] proposed an algorithm called convolutional neural network improvement for breast cancer classification (CNNI-BCC). It uses a convolutional neural network to improve the classification of breast cancer lesions and so help professionals diagnose the disease. The CNNI-BCC algorithm can classify medical images as benign, malignant, or healthy.

According to Chaudhury et al. [19], early detection and classification of breast cancer can help patients get the treatment they need. Using transfer learning, the authors proposed a new deep learning framework for the diagnosis and categorization of breast cancer from breast cytology images. In contrast to conventional learning paradigms, transfer learning aims to reuse the knowledge gained from one problem to solve a similar problem. The proposed feature extraction framework uses pretrained CNN architectures, namely, GoogLeNet, VGGNet (visual geometry group network), and residual networks, whose outputs are fed into a fully connected layer with average pooling for the classification of malignant and benign cells. Similar research has been carried out in the area of breast cancer diagnosis [20–22]. The authors of those works developed a network model to classify medical data and security protocols to secure medical wireless sensor data.

3. Methodology and Result Analysis

3.1. Methodology

The methodology consists of the following major phases. Figure 1 represents a block diagram of the proposed model.
(i) Image preprocessing using geometric mean filter
(ii) Feature extraction using AlexNet
(iii) Feature selection using relief algorithm
(iv) Classification using LS-SVM and other algorithms

Image preprocessing is important for the correct classification of disease images. Mammogram images contain various types of noise. These noises are removed using image filtering techniques. A geometric mean filter is used to remove noise from the input images [23, 24].
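As an illustration of the filtering step, the following is a minimal sketch of a geometric mean filter written with the Python standard library only. The window size of 3, the small offset `eps`, and the edge-clamping border policy are assumptions made for the example; the paper does not state the settings that were actually used.

```python
import math

def geometric_mean_filter(image, size=3, eps=1e-6):
    """Replace each pixel by the geometric mean of its size x size window.

    `image` is a list of rows of non-negative intensities. Border pixels are
    handled by clamping coordinates to the image edge (an assumed policy).
    """
    rows, cols = len(image), len(image[0])
    half = size // 2
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            log_sum = 0.0
            for dr in range(-half, half + 1):
                for dc in range(-half, half + 1):
                    rr = min(max(r + dr, 0), rows - 1)
                    cc = min(max(c + dc, 0), cols - 1)
                    # eps keeps log() defined for zero-valued pixels
                    log_sum += math.log(image[rr][cc] + eps)
            # exp(mean of logs) equals the (size*size)-th root of the product
            out[r][c] = math.exp(log_sum / (size * size))
    return out

# Hypothetical 3 x 3 patch with one bright noise spike in the centre
patch = [[10, 10, 10], [10, 200, 10], [10, 10, 10]]
smoothed = geometric_mean_filter(patch)
```

Because the geometric mean multiplies rather than adds, a single bright outlier pulls the result up far less than it would with an arithmetic mean, which is why the spike in the example patch is strongly attenuated.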

AlexNet, a deep convolutional neural network (CNN), is used to extract features. The features are taken from a fully connected layer of the AlexNet CNN applied to the preprocessed image. The AlexNet CNN has 22 feature extractor layers, all based on transfer learning, plus a fully connected (FC) layer with 1 × 1 × 64 dimensions.

Inspired by instance-based learning techniques, Kira and Rendell developed the original relief algorithm. Relief is a filter method for feature selection: it assigns each feature a proxy statistic (a weight) that estimates the feature's “quality” or “relevance” to the target concept. The original relief method was designed for binary classification problems only and had no way to deal with missing data [16].
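The weight update behind relief can be sketched as follows. This is an illustrative Relief-style implementation under stated assumptions, not the authors' code: it uses every instance once as the reference (instead of random sampling) so that the result is deterministic, it scales absolute feature differences by the feature range, and it expects binary labels with at least two instances per class.

```python
def relief_weights(X, y):
    """Relief-style feature weights for a binary-labelled dataset.

    X: list of feature vectors (lists of floats); y: list of class labels.
    Higher weights indicate features that separate the classes better.
    """
    n, d = len(X), len(X[0])
    # Feature ranges scale per-feature differences into [0, 1]
    spans = []
    for f in range(d):
        col = [row[f] for row in X]
        spans.append((max(col) - min(col)) or 1.0)

    def diff(f, a, b):
        return abs(a[f] - b[f]) / spans[f]

    def dist(a, b):
        return sum(diff(f, a, b) for f in range(d))

    weights = [0.0] * d
    for i in range(n):
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        near_hit = X[min(hits, key=lambda j: dist(X[i], X[j]))]
        near_miss = X[min(misses, key=lambda j: dist(X[i], X[j]))]
        for f in range(d):
            # Reward features that differ across classes and penalise
            # features that differ within the same class
            weights[f] += (diff(f, X[i], near_miss) - diff(f, X[i], near_hit)) / n
    return weights

# Hypothetical data: feature 0 separates the classes, feature 1 does not
X_demo = [[0.0, 0.3], [0.1, 0.9], [0.9, 0.2], [1.0, 0.8]]
y_demo = [0, 0, 1, 1]
w_demo = relief_weights(X_demo, y_demo)
```

Features would then be ranked by weight and the top-ranked ones passed to the classifier; the cut-off is a design choice that the paper does not specify.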

SVM and LS-SVM are two supervised learning methods for classification and regression problems. In their basic form, both act as nonprobabilistic binary linear classifiers: they construct a hyperplane that separates the two classes. LS-SVM is a reformulation of SVM. Training a standard SVM requires solving a quadratic programming problem, whereas training an LS-SVM only requires solving a set of linear equations. As a result, the LS-SVM classifier is computationally cheaper and simpler to implement than the SVM classifier, and it has only a few parameters that need to be set. There are other multivariate classifiers, such as the NB classifier and the NN classifier, but LS-SVM is better at dealing with linear and nonlinear multivariate classification than these. Kernels used in LS-SVM classifiers include the radial basis function (RBF), linear, polynomial, quadratic, and MLP kernels [25].
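To make the "set of linear equations" concrete, the standard LS-SVM classifier formulation can be written as below. The symbols are not defined in the text above and are introduced here as assumptions: $N$ training pairs $(x_i, y_i)$ with $y_i \in \{-1, +1\}$, a kernel $K$, a regularization parameter $\gamma$, a bias $b$, and dual variables $\alpha_i$.

```latex
% Linear system solved during LS-SVM training
\begin{bmatrix} 0 & y^{T} \\ y & \Omega + \gamma^{-1} I_N \end{bmatrix}
\begin{bmatrix} b \\ \alpha \end{bmatrix}
=
\begin{bmatrix} 0 \\ 1_N \end{bmatrix},
\qquad
\Omega_{ij} = y_i \, y_j \, K(x_i, x_j)

% Decision function for a new sample x
\hat{y}(x) = \operatorname{sign}\!\left( \sum_{i=1}^{N} \alpha_i \, y_i \, K(x, x_i) + b \right)
```

The system has $N + 1$ unknowns and can be solved with any linear solver, which is the source of the lower training cost relative to the quadratic program of a standard SVM. One consequence of this design is that, in general, every training sample receives a nonzero $\alpha_i$, so the solution is not sparse.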

KNN is one of the simplest classifiers. It is called a “lazy” algorithm because it builds no explicit model and defers all computation until classification time. It is also nonparametric, meaning that it makes no assumptions about how the data are distributed. The KNN algorithm assigns an unknown pattern to a class based on its closest neighbors in the training set, and the nearest neighbor classifier is used here to classify the images. A KNN classifier has two parts. The first is to compute the distance between the unknown image and each image used in the training phase. The second is to identify the training images that are closest to the test image and assign the class that is most common among them. The Euclidean distance, the most common way of measuring how far two points are from each other, is used as the distance measure. It is the square root of the sum of the squared differences between the coordinates of the two points [26].
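The two parts of the KNN classifier can be sketched in a few lines of standard-library Python. The value k = 3, the toy feature vectors, and the class names are assumptions for illustration; in the proposed model the inputs would be the selected AlexNet features.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def knn_predict(train_X, train_y, query, k=3):
    # Part 1: distance from the query to every training sample
    ranked = sorted(zip(train_X, train_y),
                    key=lambda pair: euclidean(pair[0], query))
    # Part 2: majority vote among the k closest training samples
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical two-dimensional feature vectors
train_X = [[1, 1], [1, 2], [2, 1], [8, 8], [9, 8], [8, 9]]
train_y = ["benign"] * 3 + ["malignant"] * 3
label_near_origin = knn_predict(train_X, train_y, [2, 2])
label_far = knn_predict(train_X, train_y, [7, 7])
```

An odd k is a common choice for two-class problems because it avoids tied votes.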

NB classification is a common supervised learning method. It is based on Bayes' theorem with the assumption that the features are independent of one another given the class. Unlike KNN, NB is an eager learner: it builds its model during training. It is easy to build and simple to understand, it predicts the class of test data very quickly, and it scales to large datasets. For each case, the method considers how each attribute is related to the class and estimates a conditional probability for the relationship between attribute values and the class. During training, the probability of each class and of each attribute value within a class is estimated by counting how often it occurs in the training dataset.
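The counting procedure described above can be sketched as a categorical Naïve Bayes classifier. The add-one (Laplace) smoothing, the attribute values, and the class names are assumptions made for this example; continuous features such as CNN activations would normally need discretization or a Gaussian likelihood instead.

```python
import math
from collections import Counter, defaultdict

def train_nb(X, y):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(y)
    value_counts = defaultdict(Counter)   # (feature index, class) -> value counts
    values_seen = defaultdict(set)        # feature index -> distinct values
    for row, label in zip(X, y):
        for f, v in enumerate(row):
            value_counts[(f, label)][v] += 1
            values_seen[f].add(v)
    return class_counts, value_counts, values_seen

def predict_nb(model, row):
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        # Log prior plus the sum of log conditional probabilities
        score = math.log(count / total)
        for f, v in enumerate(row):
            num = value_counts[(f, label)][v] + 1      # add-one smoothing
            den = count + len(values_seen[f])
            score += math.log(num / den)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical categorical attributes: tissue density and mass size
X_nb = [["dense", "large"], ["dense", "small"],
        ["fatty", "small"], ["fatty", "small"]]
y_nb = ["malignant", "malignant", "benign", "benign"]
nb_model = train_nb(X_nb, y_nb)
```

Working in log space avoids numerical underflow when many conditional probabilities are multiplied together.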

The random forest tree (RFT) classifier is an ensemble classification method. It is comparable to the nearest neighbor classifier method. RFT builds many trees, and the trees differ from one another because variables are picked at random: each tree learns by splitting its nodes on a random subset of the data features. The RFT classifier is based on the concept of bagging, which means that each tree is built from a bootstrap sample of the data items. The majority vote over all trees is used to classify a data item.
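The bagging, random feature selection, and majority vote described above can be illustrated with a deliberately small sketch. To stay short it uses one-split trees (decision stumps) rather than full decision trees, which is a simplification and not the configuration used in the paper; the function names, the number of trees, and the toy data are likewise assumptions.

```python
import random
from collections import Counter

def fit_stump(X, y, features):
    """Best (feature, threshold, left label, right label) among `features`."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            left_lab = Counter(left).most_common(1)[0][0]
            right_lab = Counter(right).most_common(1)[0][0] if right else left_lab
            errors = (sum(lab != left_lab for lab in left)
                      + sum(lab != right_lab for lab in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, left_lab, right_lab)
    return best[1:]

def fit_forest(X, y, n_trees=25, n_features=1, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]            # bootstrap sample
        feats = rng.sample(range(len(X[0])), n_features)    # random feature subset
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest

def predict_forest(forest, row):
    # Each tree votes; the majority class is returned
    votes = Counter(l if row[f] <= t else r for f, t, l, r in forest)
    return votes.most_common(1)[0][0]

# Hypothetical data in which both features separate the two classes
X_rf = [[0, 0], [1, 1], [2, 2], [8, 8], [9, 9], [10, 10]]
y_rf = [0, 0, 0, 1, 1, 1]
forest = fit_forest(X_rf, y_rf)
```

Individual stumps trained on an unlucky bootstrap sample can be wrong, but the majority vote over many of them is much more stable, which is the point of bagging.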

3.2. Result Analysis

Several mammogram databases are available for breast cancer research, of which MIAS and DDSM are the most widely used. This study uses the MIAS database, which contains 322 mammogram images of the left and right breasts of 161 patients. Of these, 51 images are malignant, 64 are benign, and 207 are normal. MIAS images typically include background information, the pectoral muscle, and various types of noise, and the background in particular is noisy. For a better and more accurate analysis and interpretation of breast images, this noise must be removed [27].

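Three parameters, namely, accuracy, sensitivity, and specificity, are used in this study to compare the performance of the different algorithms:

\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \text{Sensitivity} = \frac{TP}{TP + FN}, \qquad \text{Specificity} = \frac{TN}{TN + FP}, \]

where TP = True Positive, TN = True Negative, FP = False Positive, and FN = False Negative.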
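The three measures follow directly from the confusion-matrix counts, using their standard definitions. The sketch below shows the arithmetic; the counts are hypothetical and are not results from this study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)   # share of all cases classified correctly
    sensitivity = tp / (tp + fn)                 # share of diseased cases detected
    specificity = tn / (tn + fp)                 # share of healthy cases cleared
    return accuracy, sensitivity, specificity

# Hypothetical counts for illustration only
acc, sens, spec = classification_metrics(tp=40, tn=45, fp=5, fn=10)
```

With these counts the accuracy is 85/100, the sensitivity 40/50, and the specificity 45/50. Sensitivity and specificity are reported alongside accuracy because accuracy alone can be misleading on an imbalanced dataset such as MIAS, where normal images outnumber malignant ones.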

The accuracy, sensitivity, and specificity of LS-SVM, KNN, random forest, and Naïve Bayes classifiers for breast cancer disease detection are shown in Figures 2 and 3. Accuracy of LS-SVM is better than the rest of the classifiers. Sensitivity and specificity of the KNN algorithm are better than the rest of the classifiers.

4. Conclusion

Breast cancer is the most lethal type of cancer for women worldwide, affecting one out of every eight women. Because the etiology of breast cancer is still unclear, there are presently no effective methods for preventing or curing the disease. Early detection is an extremely successful way of managing the illness and may result in a higher chance of complete recovery. Mammography, the most effective method for detecting breast cancer, can be used to identify it early. Secondary advantages include the capacity to detect additional abnormalities and to provide information on the nature of a finding, such as whether it is benign, malignant, or noncancerous. This article presented an evolutionary approach for categorizing and identifying breast cancer using machine learning and image processing. The model aids in the classification and identification of breast cancer by combining image preprocessing, feature extraction, feature selection, and machine learning methods. A geometric mean filter is used to improve image quality. AlexNet is employed to extract features. The features to be used are chosen with the relief algorithm. To classify and detect the disease, the model employs machine learning algorithms, namely, the least square support vector machine, KNN, random forest, and Naïve Bayes. The MIAS dataset is used in the experimental investigation. The proposed technique has the advantage of accurately detecting breast cancer using image analysis.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.