
1 Introduction

Cervical cancer is the second most common cancer among women, with more than half a million new cases reported every year [1]. However, systematic screening for cervical cancer using the Papanicolaou test (PAP test) can reduce the mortality rate by 70% or more [6]. The PAP test consists of a cytologist scanning a slide of a vaginal smear, typically at 400x magnification. At this magnification, the cytologist has to examine thousands of fields of view, raising the possibility of fatigue and thereby restricting the number of samples observed to 70 per day [5]. In light of these challenges, automation of cervical cancer screening has the potential to significantly improve healthcare.

In this work we propose a new deep learning algorithm for the classification of cervical cancer cells. The algorithm is intended to be usable in a health centre with limited computing resources; hence it is designed to be extremely fast and lightweight. The algorithm surpasses state-of-the-art performance on the Herlev dataset while being robust to segmentation errors. We also conducted experiments using our AIndra dataset. This new cervical cell dataset contains annotations for nuclear boundaries and labels by multiple expert annotators. The dataset enables novel analysis and interpretation of classification, segmentation and detection algorithms.

The combination of algorithm and dataset enables a unique evaluation strategy. By taking into account the inter-observer variability, we are able to unearth insights into the data that are not evident from standard evaluation. To the best of our knowledge, no prior work has demonstrated the effect of inter-observer variability for PAP smear images.

2 Related Works

2.1 Performance Measures

Common measures of performance like accuracy, precision and recall presuppose the existence of a unique ground truth. This assumes that the disagreement between two observers on a classification label is quite small. In many medical problems, this assumption does not hold true. As an example, [17] found that only 35% of their PAP-test samples had unanimous agreement between pathologists. Hence, it is essential to include inter-observer variability when analysing algorithms.

A common strategy to deal with inter-observer variance is to remove samples where observers disagree, but this will reduce the difficulty of the problem by removing ambiguous samples. Another approach is to take a majority vote with an odd number of observers. This method forces a label on samples which are fundamentally ambiguous. Consequently, training/evaluation with this data penalises the algorithm on samples where pathologists were indecisive. The consensus in the medical community for dealing with this challenge is to use measures like percentage agreement between the observers [10] or Cohen’s kappa coefficient (\(\kappa \)) [4, 10], which discounts observer agreement due to random chance. We refer to the percentage agreement between the observers using the symbol \(\varTheta \) in the following sections.
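As a minimal illustration of these two measures, the sketch below computes \(\varTheta \) and \(\kappa \) for a pair of annotators using scikit-learn; the helper function and the example label lists are hypothetical and only meant to show the calculation.

```python
# Minimal sketch: percentage agreement (Theta) and Cohen's kappa for two annotators.
from sklearn.metrics import cohen_kappa_score

def percentage_agreement(labels_a, labels_b):
    """Theta: fraction of samples on which the two annotators assign the same label."""
    assert len(labels_a) == len(labels_b)
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)

# Hypothetical example labels (0 = normal, 1 = abnormal)
annotator_1 = [0, 1, 1, 0, 1, 0, 0, 1]
annotator_2 = [0, 1, 0, 0, 1, 0, 1, 1]

theta = percentage_agreement(annotator_1, annotator_2)   # raw agreement
kappa = cohen_kappa_score(annotator_1, annotator_2)      # chance-corrected agreement
print(f"Theta = {theta:.2%}, kappa = {kappa:.2f}")
```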

2.2 Datasets

The most popular dataset for evaluating cervical cancer cell classification is the Herlev dataset [12]. It consists of 917 high-quality images of single cells in seven classes. During data collection, cells were labelled by two cyto-technicians, and cells that were labelled differently were discarded. Consequently the dataset has artificially reduced difficulty, as discussed in Sect. 2.1. Though there are other datasets like HEMLBC [20], none of them provides annotations from multiple pathologists, ruling out inter-observer variability analysis.

2.3 Algorithms for Cervical Cancer Cell Classification

During the past decade, extensive research has been devoted to the accurate classification of cells for automating the PAP test. Most methods looked at the classification of single cells into various stages of carcinoma [3, 13]. These methods in turn relied on accurate segmentation of the cell or nucleus. However, state-of-the-art segmentation algorithms [3, 8, 16, 18] do not provide the needed segmentation accuracy. As an illustrative example, the best segmentation algorithm achieves a ZSI score of 0.92 on the Herlev dataset [19]. When this segmentation performance is taken into account, classification accuracy drops [20].

An interesting approach that does not rely on accurate segmentation is the classification of image patches [11, 14]. These patches consist of a cell and its immediate neighbourhood. The recent work [20] used this patch-based method. However, their evaluation does not involve challenges like clumping, staining variation, overlapping cells, etc. The algorithm proposed by [20] is also computationally expensive, taking around 3.5 seconds per input. Given that there are typically up to 300,000 cells in a single slide [12], the algorithm will not be usable on a clinical device.

3 Our Contributions

In this work, we propose a new deep learning algorithm for classification of cervical cancer cells. We also illustrate a unique evaluation strategy.

Salient Aspects of the Proposed Algorithm

  • Not reliant on accurate segmentation of nucleus or cytoplasm

  • Extremely fast and lightweight

  • Surpasses state of the art performance on Herlev dataset

Salient Aspects of Evaluation Method

  • Accounts for inter-observer variability using our AIndra dataset

  • Includes common evaluation strategy as a subset

  • Brings out latent information in the dataset

3.1 AIndra Dataset

Fig. 1. Sample images containing epithelial cells that exemplify challenges in the dataset with respect to image quality, cell distribution and blurring

This dataset consists of 140 images of conventional PAP smears with sizes varying from 640\(\,\times \,\)480 to 1280\(\,\times \,\)720 pixels, containing a total of 1201 cells. Each image contains multiple epithelial cells along with granulocytes such as neutrophils. The images also exhibit clumping, defocus blur, staining variation, etc. A few sample images from the dataset are shown in Fig. 1. Each epithelial cell in the dataset has its nuclear boundary marked and is classified by a cyto-pathologist (annotator-1) and a cyto-technician (annotator-2) according to the Bethesda system [15]. Unlike other datasets, we retain both labels. Among the annotated nuclei, we have a \(\varTheta \) of 76.55% and a \(\kappa \) of 0.61.

This dataset is the first cervical cancer cell dataset with multiple expert annotations, enabling inter-observer variability analysis. Since the dataset contains images with multiple cells, it can also be used for benchmarking detection and segmentation of nuclei. The dataset further enables the use of features that are external to epithelial cells, such as the presence of neutrophils.

3.2 DeepCerv: Network Architecture

Our network design is guided by the twin goals of accuracy and speed. Hence the network takes in raw RGB pixel values without any data preprocessing. The design follows the observation that neural networks for medical image analysis display adequate performance at low depth, an observation validated by popular networks in the literature such as [21]. The network consists of the initial three layers of AlexNet [7], a batch normalization layer to reduce overfitting, followed by a fully connected layer, as depicted in Table 1.

Table 1. Network architecture

The network is designed to process image patches of size (99, 99, 3) because, at 40X magnification, the complete cell information is captured within a 99\(\,\times \,\)99 field of view. It classifies cells into two classes: normal and abnormal. The abnormal class consists of the abnormal classes in the Bethesda system, and the normal class captures the rest.
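For concreteness, the following Keras sketch shows one way such a network could be assembled. The exact AlexNet filter sizes, strides and pooling placement are assumptions on our part (the text only states that the initial three layers of AlexNet are used), so this is an illustrative sketch rather than the released configuration.

```python
# A minimal Keras sketch of a DeepCerv-style network; layer hyper-parameters are assumed.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_deepcerv(input_shape=(99, 99, 3), num_classes=2):
    model = models.Sequential([
        layers.Input(shape=input_shape),
        # First three AlexNet-style convolutional layers (assumed standard settings)
        layers.Conv2D(96, 11, strides=4, activation='relu', padding='same'),
        layers.MaxPooling2D(3, strides=2),
        layers.Conv2D(256, 5, activation='relu', padding='same'),
        layers.MaxPooling2D(3, strides=2),
        layers.Conv2D(384, 3, activation='relu', padding='same'),
        # Batch normalization to reduce overfitting, as described in the text
        layers.BatchNormalization(),
        layers.Flatten(),
        # Final fully connected layer producing normal/abnormal scores
        layers.Dense(num_classes, activation='softmax'),
    ])
    return model

model = build_deepcerv()
model.summary()
```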

Model Size and Inference Time:- Our implementation of the network in TensorFlow is only 6 MB in size without any optimizations such as weight quantization. The network is extremely fast, processing an input in 1.7 ms on an Nvidia GeForce GTX 930M GPU with only 5–10% utilisation. With an estimated count of 300,000 cells per slide [12], our network takes 8.5 min per slide as compared to 12 days for [20]. This comparison does not even account for the low-end GPU we use relative to the TITAN Z used in [20]. The small size of the network, coupled with its extremely fast performance, enables future applications for cervical cancer screening on mobile devices.
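These per-slide figures follow directly from the per-input timings: \(300{,}000 \times 1.7\,\text{ms} = 510\,\text{s} \approx 8.5\) min for our network, versus \(300{,}000 \times 3.5\,\text{s} = 1{,}050{,}000\,\text{s} \approx 12\) days for [20].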

3.3 Nucleus Detection for Generating DeepCerv Input

DeepCerv expects its input to be single-cell images, but all the images in the AIndra dataset contain multiple, even overlapping, cells. Hence we pass these images through a cell detection algorithm to detect cell regions. The algorithm applies contrast-limited adaptive histogram equalisation (CLAHE) followed by thresholding on the AIndra dataset images to obtain a binary image. Connected components from this binary image are analysed, and those with less than 20% overlap with any ground-truth epithelial cell are rejected. The remaining connected components are considered to be potential nuclei. Though the algorithm is simple, it achieves reasonable performance, as given in Table 2. We denote this method by the label SEED.
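A minimal OpenCV sketch of this detection stage is given below; the CLAHE parameters and the choice of Otsu thresholding are illustrative assumptions rather than the exact settings used for SEED, and the ground-truth overlap filter is only noted in a comment since it requires the annotations.

```python
# Minimal sketch of SEED-style nucleus candidate detection: CLAHE, thresholding,
# and connected-component analysis. Parameter values are assumptions.
import cv2

def detect_nucleus_candidates(bgr_image):
    """Return bounding boxes (x, y, w, h) of connected components that may contain nuclei."""
    gray = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)
    # Contrast-limited adaptive histogram equalisation (CLAHE)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    equalised = clahe.apply(gray)
    # Thresholding (Otsu) to obtain a binary image; nuclei are darker than background
    _, binary = cv2.threshold(equalised, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    # Connected-component analysis on the binary image
    num_labels, _, stats, _ = cv2.connectedComponentsWithStats(binary)
    boxes = []
    for label in range(1, num_labels):  # label 0 is the background
        x, y, w, h, area = stats[label]
        # In the full pipeline, components with < 20% overlap with any
        # ground-truth epithelial cell are rejected at this point.
        boxes.append((x, y, w, h))
    return boxes
```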

Table 2. Localization performance of detection algorithm for SEED

We also use the annotated ground truth to generate input patches. These patches are free of segmentation errors and hence serve as a benchmark of DeepCerv performance. This is similar to the strategies employed in other segmentation-free algorithms like [20]. To generate the data we crop a patch of fixed size around the centroid of the ground-truth nucleus. We use the label GND to refer to this strategy in the following sections.
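A minimal sketch of the GND patch generation is shown below, assuming a 99\(\,\times \,\)99 patch centred on the ground-truth nucleus centroid with reflective padding at image borders; the padding choice is an assumption for illustration.

```python
# Minimal sketch: crop a fixed-size patch around a ground-truth nucleus centroid.
import numpy as np

def crop_patch(image, centroid, size=99):
    """Crop a size x size patch around a (row, col) centroid, padding at image borders."""
    half = size // 2
    padded = np.pad(image, ((half, half), (half, half), (0, 0)), mode='reflect')
    r, c = int(centroid[0]) + half, int(centroid[1]) + half
    return padded[r - half:r + half + 1, c - half:c + half + 1]
```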

4 Experiments and Results

DeepCerv is evaluated on the Herlev dataset and on the AIndra dataset. In line with other work in the literature, we report accuracy on the Herlev dataset. The strong performance of features estimated by the first few layers of the network may be explained by their capture of low-level properties of the cell image, such as texture and smoothness of the cell boundary; these properties are decisive for identifying abnormal cells and are also consistent with the Bethesda system. On the proposed dataset we use inter-observer agreement and Cohen’s kappa, as per the discussion in Sect. 2.1.

Experiment Setup:- We used a stochastic gradient descent (SGD) optimiser to train the network described in Sect. 3.2, with the following parameter setup: learning rate = 0.0001, decay = 1\(e^{-6}\), momentum = 0.9. The network performance is further improved by using data augmentation methods such as image rotation, width/height shift and horizontal flip.
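In Keras terms, this setup corresponds roughly to the following sketch; the batch size, number of epochs, and augmentation ranges are not specified above and are therefore assumptions.

```python
# Minimal training-setup sketch: SGD with the stated hyper-parameters plus augmentation.
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.preprocessing.image import ImageDataGenerator

model = build_deepcerv()  # from the architecture sketch in Sect. 3.2
optimizer = SGD(learning_rate=1e-4, momentum=0.9)  # add decay=1e-6 on Keras versions that support it
model.compile(optimizer=optimizer,
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Data augmentation: rotation, width/height shift and horizontal flip (ranges assumed)
augmenter = ImageDataGenerator(rotation_range=20,
                               width_shift_range=0.1,
                               height_shift_range=0.1,
                               horizontal_flip=True)

# Example usage (x_train, y_train, x_val, y_val assumed to be prepared patch arrays):
# model.fit(augmenter.flow(x_train, y_train, batch_size=32),
#           validation_data=(x_val, y_val), epochs=50)
```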

4.1 Experiments on Herlev Dataset

We performed 5-fold cross-validation on this dataset and the results are given in Table 3. It is to be noted that no preprocessing of any sort is involved apart from resizing. The seven classes in the dataset were converted to two classes by combining all abnormal classes in Herlev into one and all the normal classes into another. The table clearly shows that we achieve state-of-the-art performance on the Herlev dataset.
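A minimal sketch of this protocol is given below; the mapping of Herlev class indices to normal/abnormal assumes the three normal classes precede the four abnormal ones, and the training step is left as a placeholder.

```python
# Minimal sketch: 7-to-2 class mapping and 5-fold cross-validation on Herlev-style labels.
import numpy as np
from sklearn.model_selection import StratifiedKFold

def to_binary(herlev_labels):
    # Assumed ordering: indices 0-2 are the normal classes, 3-6 the abnormal ones.
    return np.array([0 if y < 3 else 1 for y in herlev_labels])

def cross_validate(images, herlev_labels, build_model_fn, n_splits=5):
    labels = to_binary(herlev_labels)
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    accuracies = []
    for train_idx, test_idx in skf.split(images, labels):
        model = build_model_fn()
        # ... compile and fit on images[train_idx], labels[train_idx],
        #     then evaluate on images[test_idx], labels[test_idx] ...
    return accuracies
```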

Table 3. Performance on Herlev dataset

4.2 Experiments on the AIndra Dataset

Experiments on Data Where Annotators Agree.

The discussion in Sect. 2.1 brought forward the impact of various strategies for dealing with inter-observer variance. Since the strategy of discarding samples on which annotators do not agree is prevalent in the literature, we explore the impact of such a strategy. Hence in this experiment we use the cells on which the annotators agree, hereafter referred to as the common data. We perform 5-fold cross-validation on the common data using DeepCerv. From the results given in Table 4 we can see that the percentage agreement and \(\kappa \) between the algorithm and the common data (\(80.53\%\), 0.57) are close to those between the two annotators (\(76.55\%\), 0.61). When the same experiment is repeated on SEED, the results do not show a significant change, thereby validating the robustness of DeepCerv to segmentation errors.

Table 4. Performance on data where annotators agree

Performance on Data from Individual Annotators.

The strategy of training and testing on common data illustrates the performance of DeepCerv on comparatively error-free data. However, as this strategy reduces the problem complexity, the results do not reflect the performance on a practical problem. A better estimate is obtained by seeing how DeepCerv performs when the data contains cells that are ambiguous and labels that have randomness associated with them. Consequently, in this experiment we generate data using all cells annotated by each individual annotator. Similar to the earlier section, we perform 5-fold cross-validation on this data using DeepCerv. The results of the experiment are given in Table 5.

Table 5. Performance on data from individual annotators

The results in Table 5 show observations in line with those on the common data. While the algorithm is able to achieve high percentage agreement with both annotators, the performance on annotator-1 exceeds that on annotator-2. Surprisingly, the algorithm achieves better agreement with annotator-1 than with the common data in the previous section. These observations may indicate lower annotation consistency of annotator-2 in comparison to annotator-1. Interestingly, this aligns with the annotator profiles given in Sect. 3.1 and acts as a validation of the algorithm.

5 Conclusions

In this work, we proposed a new deep learning algorithm for the classification of cervical cells. The algorithm surpasses state-of-the-art performance on the Herlev dataset while being extremely fast in comparison to similar work on cervical cancer cell classification in the literature. By virtue of its high accuracy and speed, the algorithm has the potential to enable automated cervical cancer screening on low-power devices, while the AIndra dataset allows novel analysis that is much closer to real-world applications. Through the combination of algorithm and dataset, we provide analysis that brings forward the importance of considering inter-observer reliability in the context of medical problems and the insights it can provide about the data.