A Comprehensive Review Study on: Optimized Data Mining, Machine Learning and Deep Learning Techniques for Breast Cancer Prediction in Big Data Context

Madhu Kirola; Minakshi Memoria; Ankur Dumka; Kapil Joshi

Kirola M, Memoria M, Dumka A, Joshi K. A Comprehensive Review Study on: Optimized Data Mining, Machine Learning and Deep Learning Techniques for Breast Cancer Prediction in Big Data Context. Biomed Pharmacol J 2022;15(1)

Manuscript received on: 10-07-2021
Manuscript accepted on: 10-01-2022
Published online on: 19-01-2022

Plagiarism Check: Yes
Reviewed by: Dr. B Kiran Bala
Second Review by: Dr. Raghavendra Prasad
Final Approval by: Dr. H Fai poon

Madhu Kirola¹*, Minakshi Memoria¹, Ankur Dumka², Amrendra Tripathi³ and Kapil Joshi¹

¹Uttaranchal University, Dehradun, India

²Women’s Institute of Technology, Dehradun, India

³UPES, Dehradun, India

Corresponding Author E-mail: madhukirola@gmail.com

DOI : https://dx.doi.org/10.13005/bpj/2339

Abstract

In recent years, big data in health care has been widely used for disease prediction. Breast cancer is the most common cancer among metropolitan Indian women as well as among women worldwide, with incidence varying widely between nations and regions. According to the WHO, breast cancer accounts for 14% of all cancer tumours in women, and it is also the most widely known cancer among women in India. Few studies have addressed breast cancer prediction on big data. Big data is now triggering a revolution in healthcare, resulting in better and more optimized outcomes. Rapid technological advancements have increased data generation; EHR (Electronic Health Record) systems produce a massive amount of patient-level data. In the healthcare industry, applications of big data will help to improve outcomes. However, traditional prediction models are less effective in terms of accuracy and error rate. This review article presents a comparative assessment of the complex data mining, machine learning, and deep learning models used for identifying breast cancer, because the accuracy of any particular algorithm depends on various factors such as the implementation framework, the size of the dataset (small or large), and the type of dataset used (attribute based or image based). The aim of this review article is to help in choosing appropriate breast cancer prediction techniques, specifically in the big data environment, to produce effective and efficient results, because “early detection is the key to prevention in case of any cancer”.

Keywords

Big Data; Deep Learning Algorithm; Data Mining Algorithm; DCIS; LCIS; Invasive; Non-Invasive; Machine Learning Algorithms

Kirola M, Memoria M, Dumka A, Joshi K. A Comprehensive Review Study on: Optimized Data Mining, Machine Learning and Deep Learning Techniques for Breast Cancer Prediction in Big Data Context. Biomed Pharmacol J 2022;15(1). Available from: https://bit.ly/32d4wWb

Introduction

Nowadays healthcare data includes free-form text such as doctors’ notes and radiologists’ reports, still images such as CAT scans, photos, videos, recorded patient histories, genomic files, and biometric and other scientific data from clinical research and drug production. Data is also collected from wearables, medical equipment, respirators, blood pressure monitors, and other linked devices through the Internet of Things (IoT). All this data, in addition to the data that exists in independent, standalone systems (EMR, PACS, RTHS, EMPI, LIS, and PMS), is part of the new data on health care. Big data technology is needed to capture and handle these large quantities of data and to provide reliable responses from several reputable sources that represent the latest medical research. Big data and advanced analytics provide solutions to some of the key issues facing the healthcare industry today. Digital healthcare demands that medical practitioners can analyse all data in its original formats instantly, explicitly, and in natural language. Cancer is the world’s second leading cause of death. Breast cancer is among the top cancers affecting the Indian population and is the most common type of cancer detected in women in India. Breast cancer accounts for 2.09 million cases and 627,000 deaths globally. It can occur at any age, but the incidence rates in India begin to rise in the early thirties and peak at ages 50-64 years. The figures below are reported by Globocan 2018 (Globocan belongs to the IARC, the International Agency for Research on Cancer) and the NCRP (National Cancer Registry Programme, India); the data are for the year 2018, published on 12 September 2018 ^{13, 14, 15} and 17 December 2020.

‘Incidence’ indicates the number of women newly diagnosed with breast cancer in a given year. In India, 162,468 women were newly diagnosed with breast cancer in 2018 [Figure 1]. Breast cancer accounted for 27.7% of all newly diagnosed cancers in women, which means that in India about one in four newly diagnosed cancers in women was breast cancer³⁶.

Figure 1: Breast Cancer Incidence

‘Mortality’ reflects the number of women who died from breast cancer in a given year. In India, 87,090 women died from breast cancer in 2018 [Figure 2]. Breast cancer accounted for approximately 23.5 per cent of all cancer-related deaths in women in India, which means that nearly one in four cancer deaths among Indian women was due to breast cancer³⁶.

Figure 2: Breast Cancer Mortality

Breast Cancer

Breast cancer is a malignant tumour that develops in the mammary gland. Cancer begins when cells start to grow out of control. Breast cancer is a group of diseases in which breast tissue cells change and grow uncontrollably, usually leading to a lump or mass. The majority of breast cancers originate in the milk glands ³. Breast cancer is diagnosed by mammograms, breast self-examination (BSE), biopsy, and advanced breast tissue testing. Breast cancer care may include surgery, radiation, hormone therapy, chemotherapy and laser therapy.

Breast cancer can spread when cancer cells invade the bloodstream or lymphatic system and travel to other parts of the body. Breast cancer cells usually form a lump or a tumour, which can be seen on an X-ray or felt as a hard mass. Breast cancer risk can be reduced by keeping track of controllable risk factors. Breast cancer occurs predominantly in women, but it also occurs in men.

Types of Breast Cancer

Breast cancer can also be categorized by whether or not the cancer has spread, as Invasive Breast Cancer or Non-Invasive Breast Cancer [Figure 3].

Figure 3: Classification of Breast cancer

Non-invasive breast cancer originates in the milk glands but does not spread to the rest of the breast tissue. There are two categories of non-invasive breast cancer. In Ductal Carcinoma In Situ (DCIS), malignant cells are contained within the breast ducts; it is detected by mammograms. In Lobular Carcinoma In Situ (LCIS), malignant cells are contained within the breast lobules; it is detected on biopsies done for other indications [Figure 3].

Invasive breast cancer, on the other hand, refers to any form of breast cancer that has spread into the surrounding breast tissue. Invasive Ductal Carcinoma (IDC) infiltrates the surrounding breast tissue and is palpable if large or detected on mammograms. The majority of breast cancer cases (81%) are of the invasive type. Invasive Lobular Carcinoma (ILC) infiltrates the surrounding breast tissue, may be bilateral, and is diagnosed microscopically ^{13, 14, 15}. Inflammatory breast cancer (IBC) is an uncommon and severe kind of breast cancer that manifests as a rash or irritated skin region. It obstructs the lymph vessels in the skin of the breast. Because a mammogram or ultrasound may fail to detect inflammatory breast cancer, it is diagnosed microscopically. Paget’s disease of the breast is an extremely rare type of breast cancer that begins on the nipple and progresses to the dark circle of skin surrounding the nipple (the areola). Male breast cancer is typically diagnosed as invasive ductal carcinoma at an advanced age [Figure 3].

Breast Cancer Signs and Symptoms

A very common symptom of breast cancer is a new lump or hard mass in the breast or underarm area. A cancerous mass is more likely to be painless and hard with irregular edges, but it may also be soft and tender. Since breast cancer usually causes no symptoms while the tumour is small and most easily treated, screening is important for early detection. Breast cancer symptoms differ from one person to the next, and some people show no signs or symptoms at all. Breast cancer can manifest itself in a variety of ways, including:

Change in Breast Texture

Detecting a lump, hard knot or area of thickened tissues in the breast or underarm area.

Skin on the breast, nipple or areola becomes red or scaly, or feels warm.

Change in Breast shape or size

Swelling or shrinking of the breast.

Breast pain without swelling or shrinking.

Recent asymmetry in breast size.

Other Changes

Tenderness in the Breast area.

One or both nipples have turned slightly inward or inverted.

Discharge of clear or bloody fluid from the breast.

Breast Cancer Stages and Survival Rate

According to the staging overview of SEER (Surveillance, Epidemiology, and End Results) ^{16, 27}:

Stage 0: Abnormal cells are present in the duct lining or a section of the breast, which increases the risk of cancer developing in one or both breasts. At this stage, the survival rate is 100%.

Stage 1: Cancer is present in the breast tissue. The tumour is less than an inch in diameter. This stage has a 95% to 98% survival rate.

Stage 2: Cancer is present in the breast tissue. The tumour measures less than two inches in diameter. Cancer may have spread to the axillary lymph nodes. At this stage, the survival rate is 88%.

Stage 3: Cancer is present in the breast tissue. The tumour has a diameter of more than two inches. Cancer may have spread to the axillary lymph nodes. Inflammation, dimpling, or a change of skin colour are all possible. At this stage, the survival rate is between 50% and 60%.

Stage 4: Cancer has spread beyond the breast to other parts of the body. At this stage, the survival rate ranges from 15% to 20%.

Breast Cancer Causes

Breast cancer is caused by mutations or other changes in the DNA of cells. Some risk factors, such as benign conditions like hyperplasia, increase the risk of breast cancer. A prior history of cancer also increases the chance of developing the disease.

Classification of different Data Mining, Machine Learning and Deep Learning Techniques for Breast Cancer Prediction

Data mining techniques make use of training datasets. Data mining is the process of detecting patterns in a data set that are accurate, novel, and useful [Figure 4].

Figure 4: Different Supervised and Unsupervised Breast Cancer Prediction Techniques

Machine learning refers to algorithms that learn from data and improve over time. It is the study of algorithms that improve automatically through experience with data. Machine learning makes use of data mining techniques and other learning algorithms to build models of what is going on behind the data in order to predict potential outcomes. Machine learning algorithms [Figure 4] generate models that describe the relationships between items in data sets in order to predict future outcomes.

Deep learning is the approach in which a computer model learns from image, text, or audio datasets and performs classification tasks directly. Deep learning models [Figure 4] can achieve state-of-the-art accuracy, sometimes even outperforming humans. The models are trained using a large quantity of labelled data and a multilayer neural network architecture.

Ensemble algorithms are supervised learning techniques that combine several hypotheses [Figure 4]. There are two types of ensemble approaches: homogeneous and heterogeneous. Homogeneous ensemble techniques apply one base method under two or more configurations, whereas heterogeneous ensemble techniques mix two or more base methods.
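The combination step of an ensemble can be illustrated with a minimal majority-voting sketch. The three threshold rules, their feature names (`radius_mm`, `texture`, `concavity`), and their cut-off values are hypothetical stand-ins for trained base classifiers and are not taken from any of the reviewed studies.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label predicted by most base classifiers."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical base classifiers: each maps a feature dict to a label.
# In practice these would be trained models (e.g. SVM, decision tree, k-NN).
def rule_size(sample):
    return "malignant" if sample["radius_mm"] > 15 else "benign"

def rule_texture(sample):
    return "malignant" if sample["texture"] > 20 else "benign"

def rule_concavity(sample):
    return "malignant" if sample["concavity"] > 0.1 else "benign"

def ensemble_predict(sample, classifiers):
    """Collect one vote per base classifier and combine them."""
    return majority_vote([clf(sample) for clf in classifiers])

classifiers = [rule_size, rule_texture, rule_concavity]
sample = {"radius_mm": 17.0, "texture": 18.5, "concavity": 0.15}
label = ensemble_predict(sample, classifiers)
print(label)
```

With an odd number of voters and two classes, a tie cannot occur; other configurations would need an explicit tie-breaking rule.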

Review of the Literature

In the field of medical data analysis, many studies on breast cancer have been published, and the majority of them claim to have high classification accuracy.

K. Venkateswara Rao et al. ¹ presented a study that combines various feature selection methods with various classification techniques to classify breast cancer. The breast cancer data is taken from the UCI repository and analysed using the WEKA tool, and the proposed methods are applied to classify the records accurately. The study shows that data mining is effective for predicting breast cancer. WEKA is regarded as one of the best and most reliable data classification tools in data mining. Compared with other algorithms on the breast cancer dataset, SVM gives reliable results. A dataset of 286 instances and 10 attributes was analysed, with an accuracy rate of 82.53%.

Madhu Kumari et al. ² proposed a prediction system that can predict the occurrence of breast cancer at an early stage by analysing the smallest set of features selected from the clinical dataset. The Wisconsin Breast Cancer Dataset (WBCD) was used for the experiments. The potential of the proposed method is measured by the classification accuracy, obtained by comparing actual with predicted values. The results show that the study achieves an optimum classification accuracy of 99.28%.

Hiba Asri et al. ³ reported a performance comparison, carried out on the Wisconsin Breast Cancer (original) datasets, between different machine learning classification algorithms: Support Vector Machine (SVM), Decision Tree (C4.5), Naïve Bayes (NB) and K-Nearest Neighbour (k-NN) ³. The main objective is to assess the correctness of each algorithm in classifying data, with respect to its efficiency and effectiveness in terms of accuracy, precision, sensitivity and specificity. The experiments were carried out with the WEKA data mining tool. In conclusion, SVM proved its effectiveness in breast cancer prediction and diagnosis with an accuracy of 97.13%, achieving the best results in terms of precision and low error rate.

Sara Al-Ghunaim et al. [4] considered the problem of breast cancer prediction in the big data context, working with two types of data: Gene Expression (GE) and DNA Methylation (DM). The goal of the work is to scale up the machine learning algorithms used for classification by applying them to each dataset separately and to both combined. For this purpose, they chose Apache Spark as the platform and three classification algorithms, Support Vector Machine (SVM), Decision Tree (DT), and Random Forest (RF), to create nine models that help in predicting breast cancer. A comparative study was conducted using three scenarios, with GE, DM, and GE and DM combined, to show which of the three kinds of data produces the best result in terms of accuracy and error rate. An experimental comparison between two platforms (Spark and Weka) was also performed to show their behaviour when dealing with large amounts of data, i.e. big data. The results showed that the scaled SVM classifier in the Spark environment outperforms the other classifiers, as it achieved the highest accuracy and the lowest error rate with the GE dataset. SVM reaches an accuracy of 99.68% and thus outperforms the other classifiers in both the Spark and Weka environments.

M. Supriya et al. ⁵ proposed a breast cancer prediction system using an Optimized Artificial Neural Network (OANN). First, the unprocessed breast cancer data is taken as input. Big data (BD) storage contains some repeated data, so in the second step the repeated records are eliminated using Hadoop MapReduce. In the next stage, the data is pre-processed by replacing missing attributes (RMA) and by normalization. The features are then selected using the Modified Dragonfly (MDF) algorithm, and the selected features are passed to the OANN for classification. The optimization is done using the Gray Wolf Optimization (GWO) algorithm. Experimental results are compared with the existing IWDT (Improved Weighted-Decision Tree) in terms of accuracy, recall, precision, and ROC. The proposed OANN classifier (with and without feature selection) achieves over 96% accuracy for each dataset, and its ROC performance is better than that of the existing method.
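The pre-processing stages described above (removal of repeated records, replacement of missing attributes, normalization) can be sketched on a single machine as follows. This is only an illustration: the cited work performs de-duplication with Hadoop MapReduce, and the choice of column-mean imputation and min-max scaling here is an assumption, since the review does not state which variants were used. The function names and the toy records are hypothetical.

```python
def drop_duplicates(rows):
    """Keep the first occurrence of each record (single-machine stand-in
    for the MapReduce de-duplication step)."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

def replace_missing(rows):
    """Replace None with the mean of the observed values in that column
    (assumed imputation rule)."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)]
            for r in rows]

def min_max_normalize(rows):
    """Scale every column to the range [0, 1] (assumed normalization)."""
    n_cols = len(rows[0])
    lows = [min(r[j] for r in rows) for j in range(n_cols)]
    highs = [max(r[j] for r in rows) for j in range(n_cols)]
    return [[(r[j] - lows[j]) / (highs[j] - lows[j]) if highs[j] > lows[j] else 0.0
             for j in range(n_cols)]
            for r in rows]

# Toy records with one duplicate and one missing attribute.
raw = [
    [5.0, 1.0],
    [5.0, 1.0],
    [3.0, None],
    [1.0, 3.0],
]
clean = min_max_normalize(replace_missing(drop_duplicates(raw)))
print(clean)
```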

A. M. Hemeida et al. ⁶ (November 2019) implemented evolutionary optimization algorithms for mining two popular machine learning datasets, applying four different optimization techniques. The datasets used for evaluating the optimization algorithms are the Iris dataset and the Breast Cancer dataset. For the classification problem in the paper, a neural network (NN) is used with four optimization techniques: the Whale Optimization Algorithm (WOA), the Dragonfly Algorithm (DA), Multiverse Optimization (MVO), and Gray Wolf Optimization (GWO). Different control parameters were considered for an accurate assessment of the optimization techniques. The comparative study shows that GWO and MVO give more accurate results than WOA and DA in terms of convergence, runtime, classification rate, and MSE. Hybrid algorithms combining two different optimization techniques can be considered in future work on data mining tasks.
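To make the behaviour of such an optimizer concrete, the following is a compact sketch of the commonly described Gray Wolf Optimization update, in which each wolf moves toward the average of positions suggested by the three best wolves (alpha, beta, delta). It minimizes a simple sphere function as a stand-in objective; in the reviewed studies the objective would instead be a classifier's error or a feature-subset score. The population size, iteration count, bounds, and seed are illustrative assumptions.

```python
import random

def sphere(x):
    """Stand-in objective: sum of squares, minimum 0 at the origin."""
    return sum(v * v for v in x)

def grey_wolf_optimize(objective, dim, bounds, n_wolves=15, n_iter=50, seed=0):
    rng = random.Random(seed)
    low, high = bounds
    wolves = [[rng.uniform(low, high) for _ in range(dim)]
              for _ in range(n_wolves)]
    best_pos = min(wolves, key=objective)[:]
    best_fit = objective(best_pos)
    history = [best_fit]  # best fitness found so far, per iteration
    for t in range(n_iter):
        ranked = sorted(wolves, key=objective)
        leaders = [ranked[0][:], ranked[1][:], ranked[2][:]]  # alpha, beta, delta
        a = 2.0 * (1 - t / n_iter)  # shrinks linearly from 2 toward 0
        new_wolves = []
        for wolf in wolves:
            new_pos = []
            for d in range(dim):
                estimate = 0.0
                for leader in leaders:
                    A = 2 * a * rng.random() - a
                    C = 2 * rng.random()
                    distance = abs(C * leader[d] - wolf[d])
                    estimate += leader[d] - A * distance
                value = estimate / 3.0  # average of the three suggestions
                new_pos.append(min(max(value, low), high))  # clamp to bounds
            new_wolves.append(new_pos)
        wolves = new_wolves
        candidate = min(wolves, key=objective)
        if objective(candidate) < best_fit:
            best_pos, best_fit = candidate[:], objective(candidate)
        history.append(best_fit)
    return best_pos, best_fit, history

best_pos, best_fit, history = grey_wolf_optimize(sphere, dim=3, bounds=(-5.0, 5.0))
print(best_fit)
```

The sketch keeps track of the best solution seen so far, so the recorded history never gets worse; this bookkeeping is an addition for illustration and is not part of the basic update rule.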

Md. Milon Islam et al. ⁷ compared five supervised machine learning techniques: Support Vector Machine (SVM), K-Nearest Neighbours, Random Forest, Artificial Neural Networks (ANNs), and Logistic Regression (LR). The Wisconsin Breast Cancer dataset, obtained from the UCI repository, is used. The results are assessed in terms of precision, sensitivity, specificity, accuracy, negative predictive value, false negative rate, false positive rate, F1 score, and Matthews Correlation Coefficient. The findings show that ANNs have the highest Precision, Accuracy, and F1 score of 98.57%, 97.82%, and 0.9890, respectively, while SVM has the second highest Precision, Accuracy, and F1 score of 97.14%, 95.65%, and 0.9777, respectively.
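All of the evaluation measures listed above can be derived from the four cells of a binary confusion matrix. The sketch below uses the standard definitions; the counts (40 true positives, 5 false positives, 50 true negatives, 5 false negatives) are hypothetical and are not results from any cited study.

```python
import math

def classification_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)          # recall, true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "npv": tn / (tn + fn),            # negative predictive value
        "fnr": fn / (fn + tp),            # false negative rate
        "fpr": fp / (fp + tn),            # false positive rate
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
        "mcc": (tp * tn - fp * fn) / mcc_denominator,
    }

# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=40, fp=5, tn=50, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

The function assumes that every denominator is non-zero, which holds whenever both classes appear among the true labels and among the predictions.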

Habib Dhahri et al. ⁸ focused on genetic programming and machine learning algorithms, with the aim of developing a system that can reliably distinguish between benign and malignant breast tumours. The aim of the research was to optimize the learning algorithm. A genetic programming technique is used to select the best features and the best parameter values of the machine learning classifiers. Sensitivity, specificity, precision, accuracy, and ROC curves were used to evaluate the proposed method. The study shows that genetic programming can automatically find the best model by combining feature pre-processing methods and classifier algorithms.

Walid Cherif ⁹ gives a new solution to speed up the K-Nearest Neighbours (KNN) algorithm, based on clustering and attribute filtering, to optimize its performance and accelerate its processing. The paper’s contributions are threefold: first, the clustering of class instances; second, the identification of the most significant attributes; and third, the assessment of similarities by reliability coefficients. Classification results indicate that the proposed algorithm outperforms KNN, NB, and SVM on the considered dataset, with an F-measure slightly exceeding 94%.
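For reference, the plain KNN baseline that such work sets out to accelerate can be sketched in a few lines. This is the unoptimized algorithm only; the clustering, attribute filtering, and reliability coefficients of the cited paper are not reproduced here. The two-dimensional toy points and their labels are made up for illustration.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority label among its k nearest training points
    (Euclidean distance)."""
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    top_labels = [label for _, label in distances[:k]]
    return Counter(top_labels).most_common(1)[0][0]

# Toy training set: two well-separated groups.
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
           (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_y = ["benign", "benign", "benign",
           "malignant", "malignant", "malignant"]

print(knn_predict(train_X, train_y, (1.1, 1.0)))
print(knn_predict(train_X, train_y, (5.0, 5.1)))
```

Every prediction scans the whole training set, which is the cost that clustering-based variants aim to reduce.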

Hui Huang et al. ¹⁰ established an enhanced machine learning framework to diagnose breast cancer. The core of the framework is a fruit fly optimization algorithm (FOA) enhanced by a Levy flight (LF) strategy (LFOA), which is used to optimize two main support vector machine (SVM) parameters and to construct an LFOA-based SVM (LFOA-SVM) for breast cancer diagnosis. The experimental results show that the proposed LFOA-SVM approach beats its counterparts on several performance metrics. The method achieved a classification accuracy of 93.83%, sensitivity of 91.22%, specificity of 96.53% and MCC of 0.8799 for breast cancer diagnosis based on high-level features.

Sapiah Binti Sakri et al. ¹¹ suggested improving the efficiency of most classification algorithms by using feature selection techniques to minimise the number of features. Certain features are more significant than others and affect the results of the classification algorithms. The objective of the research is to compare the accuracy of a few existing data mining algorithms in predicting breast cancer recurrence. It incorporates particle swarm optimization (PSO) as a feature selection step in three well-known classifiers, Naive Bayes, K-nearest neighbour, and a fast decision tree learner, with the goal of boosting the prediction model’s accuracy. Naive Bayes produced the better performance both with and without PSO, whereas the other two methods improved when used with PSO.

Seyed Reza Kamel et al. ¹² use data mining in the form of a combination of the Gray Wolf Optimization (GWO) feature selection process and a support vector machine (SVM) to improve the accuracy of breast cancer diagnosis compared with previous methods. The proposed approach detected breast cancer better than prior approaches. Experimental results were gathered using MATLAB and UCI datasets. The best results are obtained from a fusion of the SVM algorithm with GWO to select a subset of suitable features. The reported accuracy, sensitivity, and specificity were all 100%, higher than those of the other algorithms.

P. Israni ¹⁷ presents an efficient breast cancer diagnosis (BCD) model for detecting breast cancer, using a Support Vector Machine (SVM) with 10-fold cross-validation. When there are multiple input features for cancer detection, the problem becomes more complicated. Principal Component Analysis (PCA) is used to reduce the feature space from a higher to a lower dimension. According to the results of the experiment, PCA improves the model’s accuracy. The suggested BCD model is compared with other supervised learning algorithms such as Decision Trees (DT), Random Forest, k-Nearest Neighbours (k-NN), Stochastic Gradient Descent (SGD), AdaBoost, Neural Network (NN), and Naïve Bayes. The evaluation metrics, which include the F1 measure, ROC curve, accuracy, lift curve, and calibration plot, show that the proposed BCD model outperforms the other examined methods. The accuracy of the proposed BCD model is 98.1% and its AUC is 0.995, the highest among the implemented models.
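The 10-fold cross-validation protocol mentioned above amounts to partitioning the sample indices into ten folds and using each fold once as the test set. The sketch below shows only this splitting logic; fitting PCA and the SVM on each training split is omitted, and the sample count of 25, the seed, and the function name are illustrative assumptions.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of nearly equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [idx for j, fold in enumerate(folds) if j != i
                     for idx in fold]
        yield train_idx, test_idx

splits = list(k_fold_indices(25, k=10))
for fold_number, (train_idx, test_idx) in enumerate(splits, start=1):
    # A model would be fitted on train_idx and scored on test_idx here.
    print(fold_number, len(train_idx), len(test_idx))
```

Any dimensionality reduction such as PCA should be fitted on the training indices of each fold only, so that no information from the held-out fold leaks into the model.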

A. A. Bataineh ¹⁸ compares the performance of five nonlinear machine learning algorithms: Multilayer Perceptron (MLP), K-Nearest Neighbours (KNN), Classification and Regression Trees (CART), Gaussian Naïve Bayes (NB), and Support Vector Machines (SVM). The major goal is to assess each algorithm’s efficiency and efficacy at classifying data in terms of test accuracy, precision, and recall. MLP has a training data accuracy of 96.70%, which is higher than that of the other four algorithms. Following the estimation, the predictive models’ performance is tested on unseen data using the k-fold cross-validation procedure, in terms of accuracy, precision, and recall. According to the findings, the MLP model had the highest accuracy, precision, and recall, at 99.12%, 99.00%, and 99.00%, respectively. The Wisconsin Breast Cancer Diagnostic (WBCD) dataset was used for the study.

A. Reddy, S. Reddy, and B. Soni ²⁴ present a DNNS breast cancer detection method. Unlike other methods, the proposed solution is based on the Support value of a deep neural network. A normalization procedure is used to improve the performance, efficiency, and quality of the images. Experimental results show that the proposed DNNS outperforms the existing methods.

Jing Zheng, Denan Lin, Zhongjun Gao, Shuang Wang, Mingjie He, and Jipeng Fan ⁴⁹ proposed a Deep Learning assisted Efficient Adaboost Algorithm (DLA-EABA) for breast cancer diagnosis, formulated mathematically using modern computing approaches. In addition to typical computer vision methodologies, tumour classification methods that employ transfers are being actively researched through the use of deep convolutional neural networks (CNNs). The work focuses on finding the best approach by integrating various machine learning methodologies with methods for selecting and extracting features, and on evaluating their output using classification and segmentation algorithms. Compared with other current systems, the experimental findings demonstrate a high accuracy of 97.2%, sensitivity of 98.3%, and specificity of 96.5%.

Comparative Analysis between techniques used for Breast Cancer Prediction (Year 2016 to 2020)

The survey above describes in detail the classification of breast cancer using various machine learning, data mining, and deep learning techniques, on the basis of the algorithm or method used for prediction, the tools, the dataset, the data type, and the number of attributes considered in each study, as depicted in Table 1.

Table 1: Comparative Review of Data mining, Machine Learning and Deep learning Techniques for breast Cancer Prediction

Result Analysis

According to the study, traditional data mining and machine learning approaches are of limited use, whereas hybridizing machine learning or deep learning techniques with optimization techniques has a lot of potential for clinical analysis and for boosting the diagnostic capacity of existing computer-based application systems. Examples are SVM with Gray Wolf Optimization [12], with 100% accuracy, and ANN with the Dragonfly Optimization Algorithm [6], with a 98% accuracy rate, as shown in Figure 5. The statistical analysis [Figure 5] and the comparative analysis [Table 1] also show that very few studies of breast cancer prediction are based on mammograms or other images. The availability of datasets is a major barrier to using machine learning and deep learning techniques to predict breast cancer, because each method requires a considerable amount of training data. In this paper, we provide an overview of data mining, machine learning, and deep learning methodologies, with an emphasis on the accuracy rate of breast cancer prediction. We looked for publications on data mining, machine learning, and deep learning techniques in the field of medical data analysis, searching the BMC Bioinformatics, Biomed, Google Scholar, IEEE, Science-Direct, Springer, and Web of Science databases, as well as Research Gate, for studies that used multi-view mammography based or numeric attribute based datasets.

Figure 5: Statistical Comparative Analysis of Data Mining, Machine Learning and Deep Learning Techniques (w.r.t. Accuracy Level) for Breast Cancer Prediction in the Last Five Years (2016-2020)

Conclusion

Analysing large sets of data is difficult when the objective is to extract more useful trends and knowledge that allow improved analysis, decision making, and process automation. Conventional approaches to using machine learning algorithms have been unable to meet the modern challenges of big data, especially scalability. For breast cancer prediction, numerous data mining, machine learning and deep learning algorithms have been evaluated. The primary goal of this review article is to identify existing machine learning and deep learning based research on breast cancer prediction and to determine the most appropriate approach for predicting the incidence rate. It has been observed that a lot of work remains to be done. First, big data is currently causing a revolution in healthcare: since today’s digital healthcare needs intelligent integration and aggregation of accessible patient information and computer data (structured, semi-structured, and unstructured) in their original formats, there is a need to manage this vast amount of data. Second, because few datasets are available, very few research studies focus on breast cancer images. Therefore, a model can be proposed for predicting breast cancer from histopathological image datasets on big data. Initially, a Hadoop architecture can be set up to store the data samples so that the work can be carried out on big data; after that, an optimized convolutional neural network algorithm can be implemented for prediction.

Conflict of Interest

There is no conflict of interest.

Funding sources

There is no funding source.

References

Milon Islam, Md. Rezwanul Haque, Hasib Iqbal, Md. Munirul Hasan, Mahmudul Hasan, Muhammad Nomani Kabir” Breast Cancer Prediction: A Comparative Study Using Machine Learning Techniques”. SN COMPUT. SCI.1, © Springer Nature Singapore Pte Ltd ,2020,Art.no-290.
CrossRef
Madhu Kumari , Vijendra Singh” Breast Cancer Prediction system” International Conference on Computational Intelligence and Data Science (ICCIDS 2018) Procedia Computer Science 132, Elsevier,2018,pp.371–376.
CrossRef
Hiba Asria *,Hajar Mousannifb ,Hassan Al Moatassime c ,Thomas Noeld” Using Machine Learning Algorithms for Breast Cancer Risk Prediction and Diagnosis” The 6th International Symposium on Frontiers in Ambient and Mobile Systems (FAMS 2016) Procedia Computer Science 83 , Elsevier, 2016,pp.1064 – 1069.
CrossRef
Alghunaim and H. H. Al-Baity, “On the Scalability of Machine-Learning Algorithms for Breast Cancer Prediction in Big Data Context,” in IEEE Access, 2019, pp.91535-91546.
CrossRef
M. Supriya1 & A. J. Deepa2”A novel approach for breast cancer prediction using optimized ANN classifier based on big data environment” Health Care Management Science,Springer, Science+Business Media, LLC, part of Springer Nature ,2019,pp.414-426.
CrossRef
A.M. Hemeida a, Salem Alkhalaf b , A. Mady c , E.A. Mahmoud c , M.E. Hussein c , Ayman M. Baha Eldin d “Implementation of nature-inspired optimization algorithms in some data mining tasks” Published by Elsevier B.V. on behalf of Faculty of Engineering, Ain Shams University. Ain Shams Engineering Journal, June 2020, pp. 309-318.
CrossRef
Venkateswara Rao, L. Mary Gladence, V. Raja Lakshmi” Research of Feature Selection Methods to Predict Breast Cancer” International Journal of Recent Technology and Engineering, September 2019,pp.2356-2367.
Habib Dhahri, Eslam Al Maghayreh, Awais Mahmood, Wail Elkilani, Mohammed Faisal Nagi, “Automated Breast Cancer Diagnosis Based on Machine Learning Algorithms”, Journal of Healthcare Engineering, 2019,Article Id- 4253641, 11 pages.
Walid Cherif, “Optimization of K-NN algorithm by clustering and reliability coefficients: application to breast-cancer diagnosis,” Elsevier Procedia Computer Science, 2018, pp. 293–299.
Hui Huang, Xi’an Feng, Suying Zhou, Jionghui Jiang, Huiling Chen, Yuping Li and Chengye Li, “A new fruit fly optimization algorithm enhanced support vector machine for diagnosis of breast cancer based on high-level features,” BMC Bioinformatics 20, 2019, Art. no. 290.
Sapiah Binti Sakri, Nuraini Binti Abdul Rashid, and Zuhaira Muhammad Zain, “Particle Swarm Optimization Feature Selection for Breast Cancer Recurrence Prediction,” Special Section on Big Data Learning and Discovery, IEEE Access, June 2018, pp. 29637–29647.
Kamel, S.R., YaghoubZadeh, R. & Kheirabadi, M., “Improving the performance of support-vector machine by selecting the best features by Gray Wolf algorithm to increase the accuracy of diagnosis of breast cancer,” J Big Data 6, 2019, Art. no. 90.
American Cancer Society. 2018. Global Cancer: Facts & Figures, 4th edition, pp. 12–15.
India against cancer 2019, “Breast Cancer”, National Institute of Cancer Prevention and Research, viewed 12 November 2019.
American Cancer Society. Breast Cancer Facts & Figures 2019-2020. Atlanta: American Cancer Society, Inc. 2019.
P. Mekha and N. Teeyasuksaet, ‘‘Deep learning algorithms for predicting breast cancer based on tumor cells,’’ in Proc. Joint Int. Conf. Digit. Arts, Media Technol. With ECTI Northern Sect. Conf. Electr., Electron., Comput. Telecommun. Eng. (ECTI DAMT-NCON), Jan.2019, pp. 343–346.
P. Israni, ‘‘Breast cancer diagnosis (BCD) model using machine learning,’’ Int. J. Innov. Technol. Exploring Eng., Aug. 2019, pp. 4456–4463.
A. A. Bataineh, ‘‘A comparative analysis of nonlinear machine learning algorithms for breast cancer detection,’’ Int. J. Mach. Learn. Comput, Jun. 2019, pp. 248–254.
M. K. Keles, ‘‘Breast cancer prediction and detection using data mining classification algorithms: A comparative study,’’ Tehnički Vjesnik, 2019, pp. 149–155.
Khourdifi and M. Bahaj, “Applying Best Machine Learning Algorithms for Breast Cancer Prediction and Classification,” 2018 International Conference on Electronics, Control, Optimization and Computer Science (ICECOCS), 2018, Corpus ID: 58013185.
Y. Lu, J.-Y. Li, Y.-T. Su, and A.-A. Liu, ‘‘A review of breast cancer detection in medical images,’’ in Proc. IEEE Vis. Commun. Image Process. (VCIP), Dec. 2018, pp. 1–4.
R. Hou, M. A. Mazurowski, L. J. Grimm, J. R. Marks, L. M. King, C. C. Maley, E.-S.-S. Hwang, and J. Y. Lo, ‘‘Prediction of upstaged ductal carcinoma in situ using forced labeling and domain adaptation,’’ IEEE Trans. Biomed. Eng., Jun. 2020, pp. 1565–1572.
A. Memis, N. Ozdemir, M. Parildar, E. E. Ustun, and Y. Erhan, ‘‘Mucinous (colloid) breast cancer: Mammographic and US features with histologic correlation,’’ Eur. J. Radiol., Jul. 2000, pp. 39–43.
A. Reddy, B. Soni, and S. Reddy, ‘‘Breast cancer detection by leveraging machine learning,’’ ICT Express, 2020, pp. 320–324.
Z. Salod and Y. Singh, ‘‘Comparison of the performance of machine learning algorithms in breast cancer screening and detection: A protocol,’’ J. Public Health Res., Dec. 2019, pp. 1677.
S. Eltalhi and H. Kutrani, ‘‘Breast cancer diagnosis and prediction using machine learning and data mining techniques: A review,’’ IOSR J. Dental Med. Sci., Apr. 2019, pp. 85–94.
M. D. Ganggayah, N. A. Taib, Y. C. Har, P. Lio, and S. K. Dhillon, ‘‘Predicting factors for survival of breast cancer patients using machine learning techniques,’’ BMC Med. Inform. Decis. Making, 2019, Art. no. 48.
A. A. Ibrahim, A. I. Hashad, and N. E. M. Shawky, ‘‘A comparison of open source data mining tools for breast cancer classification,’’ in Handbook of Research on Machine Learning Innovations and Trends. Hershey, PA, USA: IGI Global, 2017, pp. 636–651.
M. Hosni, I. Abnane, A. Idri, J. M. C. de Gea, and J. L. Fernández Alemán, ‘‘Reviewing ensemble classification methods in breast cancer,’’ Comput. Methods Programs Biomed., Aug. 2019, pp. 89–112.
M. Abdar and V. Makarenkov, ‘‘CWV-BANN-SVM ensemble learning classifier for an accurate diagnosis of breast cancer,’’ Measurement, Nov. 2019, pp. 557–570.
S. P. Rajamohana, A. Dharani, P. Anushree, B. Santhiya, and K. Umamaheswari, ‘‘Machine learning techniques for healthcare applications: Early autism detection using ensemble approach and breast cancer prediction using SMO and IBK,’’ in Cognitive Social Mining Applications in Data Analytics and Forensics. Hershey, PA, USA: IGI Global, 2019, pp. 236–251.
M. Togacar and B. Ergen, ‘‘Deep learning approach for classification of breast cancer,’’ in Proc. Int. Conf. Artif. Intell. Data Process. (IDAP), Sep. 2018, pp. 1–5.
M. Tiwari, R. Bharuka, P. Shah, and R. Lokare, ‘‘Breast cancer prediction using deep learning and machine learning techniques,’’ SSRN, New York, NY, USA, 2020, Tech. Rep. 3558786.
D. Selvathi and A. A. Poornila, ‘‘Deep learning techniques for breast cancer detection using medical image analysis,’’ in Biologically Rationalized Computing Techniques for Image Processing Applications. Cham, Switzerland: Springer, 2018, pp. 159–186.
G. Hamed, M. A. E.-R. Marey, S. E.-S. Amin, and M. F. Tolba, ‘‘Deep learning in breast cancer detection and classification,’’ in Proc. Joint Eur.-US Workshop Appl. Invariance Comput. Vis. Cham, Switzerland: Springer, 2020, pp. 322–333.
F. Bray, J. Ferlay, I. Soerjomataram, R. L. Siegel, L. A. Torre, and A. Jemal, ‘‘Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries,’’ CA, Cancer J. Clin., Nov. 2018, pp. 394–424.
S. Khalil, L. Hatch, C. R. Price, S. H. Palakurty, E. Simoneit, A. Radisic, A. Pargas, I. Shetty, M. Lyman, P. Couchot, R. Roetzheim, L. Guerra, and E. Gonzalez, ‘‘Addressing breast cancer screening disparities among uninsured and insured patients: A student-run free clinic initiative,’’ J. Community Health, Oct. 2019, pp. 1–5.
C. Siotos, A. Naska, R. J. Bello, A. Uzosike, P. Orfanos, D. M. Euhus, M. A. Manahan, C. M. Cooney, P. Lagiou, and G. D. Rosson, ‘‘Survival and disease recurrence rates among breast cancer patients following mastectomy with or without breast reconstruction,’’ Plastic Reconstructive Surg., 2019, pp. 169e–177e.
H. Memon, J. P. Li, A. U. Haq, M. H. Memon, and W. Zhou, ‘‘Breast cancer detection in the IOT health environment using modified recursive feature selection,’’ Wireless Commun. Mobile Comput., Nov. 2019, pp. 1–19.
A. A. Said, L. A. Abd-Elmegid, S. Kholeif, and A. Abdelsamie, ‘‘Classification based on clustering model for predicting main outcomes of breast cancer using hyper-parameters optimization,’’ Int. J. Adv. Comput. Sci. Appl., 2018, pp. 268–273.
A. Bharat, N. Pooja, and R. A. Reddy, ‘‘Using machine learning algorithms for breast cancer risk prediction and diagnosis,’’ in Proc. 3rd Int. Conf. Circuits, Control, Commun. Comput. (IC), Oct. 2018, pp. 1–4.
E. A. Bayrak, P. Kirci, and T. Ensari, ‘‘Comparison of machine learning methods for breast cancer diagnosis,’’ in Proc. Sci. Meeting Elect.- Electron. Biomed. Eng. Comput. Sci. (EBBT), Apr. 2019, pp. 1–3.
M. Abdar, M. Zomorodi-Moghadam, X. Zhou, R. Gururajan, X. Tao, P. D. Barua, and R. Gururajan, ‘‘A new nested ensemble technique for automated diagnosis of breast cancer,’’ Pattern Recognit. Lett., Apr. 2020, pp. 123–131.
D. A. Omondiagbe, S. Veeramani, and A. S. Sidhu, ‘‘Machine learning classification techniques for breast cancer diagnosis,’’ IOP Conf. Ser., Mater. Sci. Eng., vol. 495, Jun. 2019, Art. no. 012033.
S. N. Singh and S. Thakral, ‘‘Using data mining tools for breast cancer prediction and analysis,’’ in Proc. 4th Conf. Comput. Commun. Automat. (ICCCA), Dec. 2018, pp. 1–4.
Bharati, M. A. Rahman, and P. Podder, ‘‘Breast cancer prediction applying different classification algorithm with comparative analysis using WEKA,’’ in Proc. 4th Int. Conf. Electr. Eng. Inf. Commun. Technol. (iCEEiCT), Sep. 2018, pp. 581–584.
L. Shen, L. R. Margolies, J. H. Rothstein, E. Fluder, R. McBride, and W. Sieh, ‘‘Deep learning to improve breast cancer detection on screening mammography,’’ Sci. Rep., Dec. 2019, pp. 1–12.
J. Zheng, D. Lin, Z. Gao, S. Wang, M. He, and J. Fan, ‘‘Deep learning assisted efficient AdaBoost algorithm for breast cancer detection and early diagnosis,’’ IEEE Access, 2020, pp. 1–1.
M. S. Yarabarla, L. K. Ravi, and A. Sivasangari, ‘‘Breast cancer prediction via machine learning,’’ in Proc. 3rd Int. Conf. Trends Electron. Informat. (ICOEI), Apr. 2019, pp. 121–124.
U. Ojha and S. Goel, ‘‘A study on prediction of breast cancer recurrence using data mining techniques,’’ in Proc. 7th Int. Conf. Cloud Computer, Data Sci. Eng.-Confluence, Jan. 2017, pp. 527–530.
