Neurocomputing

Volume 437, 21 May 2021, Pages 186-194

Interpreting chest X-rays via CNNs that exploit hierarchical disease dependencies and uncertainty labels

https://doi.org/10.1016/j.neucom.2020.03.127

Abstract

Chest radiography is one of the most common types of diagnostic radiology exams and is critical for the screening and diagnosis of many thoracic diseases. Specialized algorithms have been developed to detect specific pathologies such as lung nodules or lung cancer. However, accurately detecting the presence of multiple diseases from chest X-rays (CXRs) is still a challenging task. This paper presents a supervised multi-label classification framework based on deep convolutional neural networks (CNNs) for predicting the presence of 14 common thoracic diseases and observations. We tackle this problem by training state-of-the-art CNNs that exploit hierarchical dependencies among abnormality labels. We also propose to use the label smoothing technique for better handling of uncertain samples, which occupy a significant portion of almost every CXR dataset. Our model is trained on over 200,000 CXRs of the recently released CheXpert dataset and achieves a mean area under the curve (AUC) of 0.940 in predicting 5 selected pathologies from the validation set, the highest AUC score reported to date. The proposed method is also evaluated on the independent test set of the CheXpert competition, which is composed of 500 CXR studies annotated by a panel of 5 experienced radiologists. With a mean AUC of 0.930, its performance is on average better than that of 2.6 out of 3 other individual radiologists, ranking first on the CheXpert leaderboard at the time of writing this paper.

Introduction

Chest X-ray (CXR) is one of the most common radiological exams for diagnosing diseases of the lungs and heart, with millions of scans performed globally every year [1], [2]. Some of these diseases, like Pneumothorax [3], can be deadly if not diagnosed quickly and accurately. A computer-aided diagnosis (CAD) system that can correctly diagnose the most common observations from CXRs would significantly benefit many clinical practices. In this work, we investigate the problem of multi-label classification for CXRs using deep convolutional neural networks (CNNs).

There has been a recent effort to harness advances in machine learning, especially deep learning, to build a new generation of CAD systems for the classification and localization of common thoracic diseases from CXR images [4]. Several motivations are behind this transformation. First, interpreting CXRs to accurately diagnose pathologies is difficult. Even well-trained radiologists can easily make mistakes due to the challenge of distinguishing different kinds of pathologies, many of which have similar visual features [5]. A high-precision method for classifying and localizing common thoracic diseases can therefore serve as a second reader that supports the decision making of radiologists and helps reduce diagnostic errors. It also addresses the lack of diagnostic expertise in areas where radiologists are scarce or unavailable [6], [7]. Second, such a system can be used as a screening tool that helps reduce patient waiting time in hospitals and allows care providers to respond to emergencies sooner or to speed up the diagnostic imaging workflow [8]. Third, deep neural networks, in particular deep CNNs, have shown remarkable performance in various medical image analysis applications [9], including CXR interpretation [10], [11], [12], [13].

Several deep learning-based approaches have been proposed for classifying lung diseases and have been shown to achieve human-level performance [10], [14]. Almost all of these approaches, however, aim to detect specific diseases such as pneumonia [15], tuberculosis [16], [17], or lung cancer [18]. Building a unified deep learning framework that accurately detects the presence of multiple common thoracic diseases from CXRs remains a difficult task that requires much research effort. In particular, we recognize that standard multi-label classifiers often ignore domain knowledge. For CXR data, how to leverage clinical taxonomies of disease patterns and how to handle uncertainty labels are still open questions that have not received much research attention. This observation motivates us to build and optimize a predictive model based on deep CNNs for CXR interpretation in which dependencies among labels and uncertainty information are taken into account during both the training and inference stages. Specifically, we develop a deep learning-based approach that combines the ideas of conditional training [19] and label smoothing [20] into a novel training procedure for classifying 14 common lung diseases and observations. We trained our system on more than 200,000 CXRs of the CheXpert dataset [21], one of the largest CXR datasets currently available, and evaluated it on the CheXpert validation set of 200 studies, which were manually annotated by 3 board-certified radiologists. The proposed method is also tested against the majority vote of 5 radiologists on the hidden test set of the CheXpert competition, which contains 500 studies.
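
These two ideas can be made concrete with a small sketch. The PyTorch code below is only an illustration under assumed names: the PARENT map, label list, and helper functions are hypothetical, not the exact hierarchy or code used in the paper. A child label contributes to the loss only when its parent is labeled positive, and at inference the conditional child probability is multiplied by the parent probability.

```python
# A rough, hypothetical sketch of conditional training over a label hierarchy;
# the PARENT map and label list are illustrative, not the taxonomy used in the paper.
import torch
import torch.nn as nn

LABELS = ["Lung Opacity", "Edema", "Consolidation", "Atelectasis", "Cardiomegaly"]
PARENT = {"Edema": "Lung Opacity", "Consolidation": "Lung Opacity", "Atelectasis": "Lung Opacity"}

def conditional_bce(logits, targets):
    """Binary cross-entropy in which a child label contributes to the loss
    only on examples where its parent label is positive (conditional training)."""
    bce = nn.functional.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    mask = torch.ones_like(targets)
    for j, name in enumerate(LABELS):
        if name in PARENT:
            p = LABELS.index(PARENT[name])
            mask[:, j] = (targets[:, p] == 1).float()  # drop child terms whose parent is not positive
    return (bce * mask).sum() / mask.sum().clamp(min=1.0)

def unconditional_probs(logits):
    """At inference, turn conditional child probabilities back into
    unconditional ones: p(child) = p(child | parent) * p(parent)."""
    probs = torch.sigmoid(logits)
    out = probs.clone()
    for j, name in enumerate(LABELS):
        if name in PARENT:
            out[:, j] = probs[:, j] * probs[:, LABELS.index(PARENT[name])]
    return out
```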

This study makes several contributions. First, we propose a novel training strategy for multi-label CXR classification that incorporates (1) a conditional training process based on a predefined disease hierarchy and (2) a smoothing regularization technique for uncertainty labels. The benefits of these two key factors are empirically demonstrated through our ablation studies. Second, we train a series of state-of-the-art CNNs on frontal-view CXRs of the CheXpert dataset for classifying 14 common thoracic diseases. Our best model, an ensemble of various CNN architectures, achieves the highest area under the ROC curve (AUC) on both the validation set and the test set of CheXpert at the time of writing. Specifically, on the validation set, it yields a mean AUC of 0.940 in predicting 5 selected pathologies: Atelectasis (0.909), Cardiomegaly (0.910), Edema (0.958), Consolidation (0.957) and Pleural Effusion (0.964). This model improves on the baseline reported in [21] by a large margin of 5%. On the independent test set, we obtain a mean AUC of 0.930. More importantly, the proposed deep learning model is on average more accurate than 2.6 out of 3 individual radiologists in predicting the 5 selected thoracic diseases when presented with the same data.1
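
As a side note on the ensembling mentioned above, a minimal sketch of one common way to combine models is given below: per-disease probabilities are simply averaged across several trained CNNs. The predict_proba interface is a hypothetical placeholder rather than the actual code used for our ensemble.

```python
# A minimal sketch of probability-averaging ensembling; the predict_proba
# interface on each model is a hypothetical placeholder.
import numpy as np

def ensemble_predict(models, image):
    """Average per-disease probabilities over a list of trained models, each
    assumed to expose predict_proba(image) -> array of shape (14,)."""
    probs = np.stack([m.predict_proba(image) for m in models], axis=0)
    return probs.mean(axis=0)  # unweighted average across models
```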

The rest of the paper is organized as follows. Related work on CNNs in medical imaging and on multi-label classification of CXR images is reviewed in Section 2. Section 3 presents the details of the proposed method, with a focus on how we deal with dependencies among diseases and with uncertainty labels. Section 4 provides comprehensive experiments on the CheXpert dataset. Section 5 discusses the experimental results, key findings, and limitations of this research. Finally, Section 6 concludes the paper.

Section snippets

Related works

Thanks to the increased availability of large-scale, high-quality labeled datasets [22], [21], [23] and high-performing deep network architectures [24], [25], [26], [27], deep learning-based approaches have been able to reach, and even outperform, expert-level performance on many medical image interpretation tasks [10], [12], [11], [28], [29], [16]. Most successful applications of deep neural networks in medical imaging rely on CNNs [30], [31], which utilize convolutions to extract local features

Proposed method

In this section, we present the details of the proposed method. We first formulate the multi-label classification problem for CXRs and the evaluation protocol used in this study (Section 3.1). We then describe a new training procedure that exploits the relationships among diseases to improve model performance (Section 3.2). This section also introduces how we use label smoothing regularization (LSR) to deal with uncertain samples in the training data (Section 3.3).
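
Before the detailed description, the sketch below illustrates one plausible form of LSR for uncertain labels, assuming the CheXpert convention of marking uncertainty with −1: each uncertain entry is replaced by a soft target drawn uniformly from a sub-interval of (0, 1). The interval bounds and the function name are placeholders, not the exact values used in our experiments.

```python
# A minimal sketch of label-smoothing regularization (LSR) for uncertain labels;
# the smoothing interval below is a placeholder, not the value used in the paper.
import numpy as np

def smooth_uncertain_labels(y, low=0.55, high=0.85, rng=None):
    """Map uncertain (-1) entries to soft targets drawn uniformly from [low, high];
    positive (1) and negative (0) entries are left unchanged."""
    rng = rng or np.random.default_rng()
    y = y.astype(np.float32)
    uncertain = y == -1
    y[uncertain] = rng.uniform(low, high, size=int(uncertain.sum()))
    return y

# Example: a study labeled positive, uncertain, and negative for three observations.
soft_targets = smooth_uncertain_labels(np.array([1, -1, 0]))
```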

CXR dataset and settings

The CheXpert dataset [21] was used to develop and evaluate the proposed method. It is one of the largest public CXR datasets currently available, containing 224,316 X-ray scans of 65,240 patients. The dataset is labeled for the presence of 14 observations, including 12 common thoracic pathologies. Each observation can be labeled as positive (1), negative (0), or uncertain (−1). The main task on CheXpert is to predict the probability of multiple observations from an input CXR. The
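
As a small illustration of this evaluation protocol, the snippet below computes per-disease AUC and the mean AUC over the 5 CheXpert competition pathologies using scikit-learn; the arrays passed in are assumed to hold binary ground truth and predicted probabilities and are purely illustrative.

```python
# A minimal sketch of the mean-AUC evaluation over the 5 CheXpert competition tasks;
# input arrays are illustrative, with columns ordered as in COMPETITION_TASKS.
import numpy as np
from sklearn.metrics import roc_auc_score

COMPETITION_TASKS = ["Atelectasis", "Cardiomegaly", "Consolidation", "Edema", "Pleural Effusion"]

def mean_auc(y_true, y_score):
    """y_true, y_score: arrays of shape (n_studies, 5) with binary labels and
    predicted probabilities; returns the mean AUC and the per-disease AUCs."""
    aucs = {t: roc_auc_score(y_true[:, i], y_score[:, i]) for i, t in enumerate(COMPETITION_TASKS)}
    return float(np.mean(list(aucs.values()))), aucs
```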

Key findings and meaning

By training a set of strong CNNs on a large-scale dataset, we built a deep learning model that can accurately predict multiple thoracic diseases from CXRs. In particular, we empirically showed a major improvement, in terms of AUC score, from exploiting the dependencies among diseases and from applying the label smoothing technique to uncertain samples. We found that it is especially difficult to obtain a good AUC score for all diseases with a single CNN. It is also observed that the classification

Conclusion

We presented in this paper a comprehensive approach to building a high-precision computer-aided diagnosis system for the classification of common thoracic diseases from CXRs. We investigated almost every aspect of the task, including data cleaning, network design, training, and ensembling. In particular, we introduced a new training procedure in which dependencies among diseases and uncertainty labels are effectively exploited and integrated into the training of advanced CNNs. Extensive experiments

CRediT authorship contribution statement

Hieu H. Pham: Conceptualization, Methodology, Writing - original draft, Writing - review & editing. Tung T. Le: Visualization. Dat Q. Tran: Visualization. Dat T. Ngo: Visualization. Ha Q. Nguyen: Conceptualization, Methodology, Writing - original draft, Writing - review & editing, Supervision, Validation.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This research was supported by the Vingroup Big Data Institute (VinBDI). The authors gratefully acknowledge Jeremy Irvin from the Machine Learning Group, Stanford University for helping us evaluate the proposed method on the hidden test set of CheXpert.

References (51)

  • M. Annarumma et al., Automated triaging of adult chest radiographs with deep artificial neural networks, Radiology (2019).
  • P. Rajpurkar, J. Irvin, K. Zhu, B. Yang, H. Mehta, T. Duan, D. Ding, A. Bagul, C. Langlotz, K. Shpanskaya, et al., ...
  • Q. Guan, Y. Huang, Z. Zhong, Z. Zheng, L. Zheng, Y. Yang, Diagnose like a radiologist: attention guided convolutional ...
  • P. Rajpurkar et al., Deep learning for chest radiograph diagnosis: A retrospective comparison of the CheXNeXt algorithm to practicing radiologists, PLoS Medicine (2018).
  • P. Kumar et al., Boosted cascaded convnets for multilabel classification of thoracic diseases in chest radiographs.
  • E.J. Hwang et al., Development and validation of a deep learning–based automated detection algorithm for major thoracic diseases on chest radiographs, JAMA Network Open (2019).
  • P. Lakhani et al., Deep learning at chest radiography: automated classification of pulmonary tuberculosis by using convolutional neural networks, Radiology (2017).
  • F. Pasa et al., Efficient deep network architectures for fast chest X-ray tuberculosis screening and visualization, Scientific Reports (2019).
  • W. Ausawalaithong et al., Automatic lung cancer prediction from chest X-ray images using the deep learning approach.
  • H. Chen, S. Miao, D. Xu, G.D. Hager, A.P. Harrison, Deep hierarchical multi-label classification of chest X-ray images, ...
  • R. Müller et al., When does label smoothing help?
  • J. Irvin, P. Rajpurkar, M. Ko, Y. Yu, S. Ciurea-Ilcus, C. Chute, H. Marklund, B. Haghgoo, R.L. Ball, K. Shpanskaya, J. ...
  • X. Wang, Y. Peng, L. Lu, Z. Lu, M. Bagheri, R.M. Summers, ChestX-ray8: Hospital-scale chest X-ray database and ...
  • A.E. Johnson, T.J. Pollard, S. Berkowitz, N.R. Greenbaum, M.P. Lungren, C.-Y. Deng, R.G. Mark, S. Horng, MIMIC-CXR: A ...
  • K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: IEEE CVPR, 2016, pp. 770–778. ...

    Hieu H. Pham received the Ph.D. degree in Computer Science from the Toulouse Computer Science Research Institute and the Toulouse Cerema Research Center, France. He is currently working as a staff research scientist in the Medical Imaging Department at the Vingroup Big Data Institute (VinBDI), Hanoi, Vietnam, with a focus on medical image analysis. His research interests include image processing, computer vision and machine learning.

    Tung T. Le is currently pursuing the B.S. degree in Computer Science from the Department of Computer Science, University of Engineering and Technology, Vietnam National University, Hanoi, Vietnam. He is also a research intern at Medical Imaging Department at the Vingroup Big Data Institute (VinBDI), Hanoi, Vietnam. His research interests include medical image analysis and deep learning techniques.

    Dat Q. Tran received his B.S. degree in Biomedical Engineering from the Department of Biomedical Engineering, International University, Ho Chi Minh City, Vietnam, in 2019. He is currently working as a Computer Vision Research Engineer in the Medical Imaging Department at the Vingroup Big Data Institute (VinBDI), Hanoi, Vietnam. His work focuses on applying advanced computer vision algorithms to biomedical imaging.

    Dat T. Ngo received his B.S. degree in Electronics and Communication Engineering from the Faculty of Electronics & Telecommunications, University of Engineering and Technology, Vietnam National University, Hanoi, Vietnam in 2017. He is currently working as a Computer Vision Research Specialist at Medical Imaging Department at the Vingroup Big Data Institute (VinBDI), Hanoi, Vietnam. His research interests include computer vision and deep learning techniques.

    Ha Q. Nguyen was born in Hai Phong, Vietnam, in 1983. He received the B.S. degree in mathematics from the Hanoi National University of Education, Hanoi, Vietnam, the S.M. degree in electrical engineering and computer science from the Massachusetts Institute of Technology, Cambridge, MA, USA, and the Ph.D. degree in electrical and computer engineering from the University of Illinois at Urbana-Champaign, Champaign, IL, USA, in 2005, 2009, and 2014, respectively.

    During 2009–2011, he was a Lecturer in electrical engineering with the International University, Vietnam National University, Ho Chi Minh City, Vietnam, during 2014–2017, he was a Postdoctoral Research Associate with the Biomedical Imaging Group, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland, and during 2017–2018, he was a Signal Processing Engineer with the Viettel Research & Development Institute, Hanoi, Vietnam. He is currently the Head of Medical Imaging Department at the Vingroup Big Data Institute, Hanoi, Vietnam. His research interests include medical image analysis, machine learning, computational imaging, and data compression.

    Dr. Nguyen was a Fellow of Vietnam Education Foundation, cohort 2007. He was the recipient of the Best Student Paper Award (second prize) of the IEEE International Conference on Acoustics, Speech and Signal Processing in 2014 for his paper (with P.A. Chou and Y. Chen) on the compression of human body sequences using graph wavelet filter banks.
