
Federated Learning for Electronic Health Records

Published: 21 June 2022


Abstract

In data-driven medical research, multi-center studies have long been preferred over single-center ones because a single institution often lacks enough data to achieve sufficient statistical power for certain hypothesis tests as well as for predictive and subgroup studies. The wide adoption of electronic health records (EHRs) has made multi-institutional collaboration much more feasible. However, concerns over infrastructure, regulations, privacy, and data standardization present a challenge to data sharing across healthcare institutions. Federated Learning (FL), which allows multiple sites to collaboratively train a global model without directly sharing data, has become a promising paradigm to break this data isolation. In this study, we surveyed existing works on FL applications in EHRs and evaluated the performance of current state-of-the-art FL algorithms on two clinically important EHR machine learning tasks using a real-world multi-center EHR dataset.


1 INTRODUCTION

The broad adoption of electronic health records (EHRs) presents opportunities for collaboration among hospitals. For medical research, multi-center studies have long been considered superior to single-center ones. The larger combined cohort allows for certain hypothesis tests and subgroup analyses that are often not possible in a single-center setting due to inadequate statistical power [46, 70]. In machine learning, multi-center datasets could lead to more robust models. A model trained only on single-center data is prone to poor generalizability, i.e., it may perform well on data from the hospital that provided the training data but poorly when applied to data from others [25]. This is potentially due to differences among hospitals in medical practices, patient demographics, genotypes and phenotypes, as well as variations in the software, hardware, and protocols used for data collection. Environmental, social, political, and cultural variations may also play a part. Multi-center datasets may enable models to capture and adapt to the heterogeneity caused by these factors and thus improve their generalizability [25, 63, 81]. In addition, simply by collecting data from several sources, studies end up with a larger dataset for training, which reduces the expected generalization error of the model [24].

Despite the benefits, in reality, conducting machine learning on EHRs across multiple sources faces several tough challenges. The traditional approach of centralized model training is to gather datasets from different silos and store them in a centralized data warehouse so that machine learning models can be trained on the combined dataset. In practice, such collaborative EHR repositories have been established by healthcare organizations willing to pool their data together to conduct research on a larger scale [49, 69]. However, these efforts faced multiple challenges regarding logistics, infrastructure, regulations, privacy, and data standardization [12, 73, 75]. Central data repositories increase the risk of data security and privacy compromises. Examples include data leakage due to an increase in the number of parties with access to the data as well as subject re-identification through linkage across multiple data sources [21]. Moreover, EHRs are subject to a set of rigorous regulations regarding accessing, analyzing, and sharing personal health information [3, 27, 79]. Additionally, each hospital imposes its own internal policies on the matter. Significant efforts are required to ensure compliance with these regulations and policies. Last but not least, the high cost of setting up and maintaining the infrastructure for centralized data storage presents yet another roadblock to collaborative machine learning based on data centralization.

Given these challenges associated with centralized learning, a method that enables collaborative model training to occur in a decentralized manner, without the need for aggregating all data in one place, would make machine learning on EHRs across multiple centers much more feasible. Federated Learning (FL) is an emerging paradigm that enables building machine learning models collaboratively using decentralized data. It was originally proposed by Google for the use case of Gboard query suggestion [43]. The project involved developing a language model over hundreds of thousands of mobile devices for keyboard autosuggestion, predicting the most likely words that a user would type next. Each participating device trained a separate local model using only their own local dataset. Local models were then sent to a central coordinator where they were aggregated into a global model to be sent back to each participant, either for inference or further training. The main purpose of FL is to enable participants to collaborate and produce a better model than they could on their own without compromising data privacy. It achieves this by requiring participants to share only model parameters, not data.

Given these characteristics, FL has the potential to facilitate collaboration on data-driven EHR research across multiple institutions while preserving data privacy. However, FL, or horizontal FL to be specific, requires data from participating parties to share the same format, which may present a challenge. Recently there have been increasing efforts by healthcare institutions towards data harmonization. More and more organizations are adopting a common data model, such as i2b2 [55], PCORnet [22], and OMOP CDM [80], for their EHRs. These organizations are thus well-positioned to employ FL to facilitate collaborations with others who utilize the same data model [17].

In this study, we give a brief survey of existing applications of FL on EHR data. We then provide an overview of common FL algorithms and evaluate their performance on two EHR machine learning tasks: in-hospital mortality prediction and acute kidney injury (AKI) prediction in the intensive care unit (ICU). These two tasks have significant clinical importance and have been shown to benefit greatly from machine learning. In the literature, there exist various state-of-the-art FL algorithms that achieve good results on general-domain or benchmark datasets. However, there has not been a study that evaluates how well they perform in the context of EHRs. Related to this work, Xu et al. [83] published a survey that summarized the progress and challenges of FL and gave an overview of current applications and potential opportunities of FL in healthcare. In addition, many studies have successfully applied FL to outcome prediction using EHRs [32, 59, 66, 77]. To the best of our knowledge, ours is the first that aims to systematically compare the performance of several state-of-the-art FL algorithms on EHR machine learning tasks.

Among state-of-the-art FL methods, the most well-known algorithm is Federated Averaging, also known as FedAvg [50]. FedAvg builds a shared global model by periodically averaging the weights of the locally evolving models. Despite being the standard federated optimization method, it suffers from slow convergence or even divergence in some situations where data distribution differs among clients. Multiple variants of FedAvg have been shown to improve its convergence behavior [30, 31, 38, 45, 62]. In a previous pilot study, we investigated the performance of FedAvg and FedProx [45], another popular FL method, on predicting in-hospital mortality [14]. In this study, we extended that work by inspecting a more comprehensive set of FL algorithms and evaluating them on an additional task of AKI prediction.


2 EXISTING APPLICATIONS OF FEDERATED LEARNING ON ELECTRONIC HEALTH RECORDS

Wide adoption of EHRs among healthcare institutions has given rise to various studies applying machine learning to biomedical research [52]. However, conducting machine learning on EHRs is not without challenges. Among those, lack of data and poor model generalizability are two major problems. For research projects involving rare diseases and conditions, small hospitals may not have enough data for machine learning models to learn meaningful patterns. In addition, as mentioned above, machine learning models trained on data obtained from a single source may not generalize well and thus perform poorly when applied in a different context. FL is becoming a promising approach to mitigate these shortcomings. It allows training a global model on a larger and more diverse set of EHRs from multiple institutions while keeping the data local, thereby preserving privacy and enhancing the model's external validity. Several studies have looked into the effectiveness of solving healthcare problems in an FL setting using EHRs. They can be grouped into two categories: predictive modeling and representation learning.

Many studies have achieved success in applying FL to predictive modeling on EHRs. Sharma et al. [66] proposed an FL framework to predict in-hospital mortality for patients in the ICU. Results showed performance obtained by models trained in an FL setting to be on par with those trained in a centralized manner. Vaid et al. [77] achieved an improvement in predicting 7-day mortality for hospitalized COVID-19 patients by employing FL to utilize data from 5 different hospitals. To predict preterm birth in the context of distributed EHRs, Boughorbel et al. [8] presented a federated uncertainty-aware learning algorithm (FUALA) based on FedAvg. FUALA is capable of dynamically adjusting the aggregation model weights by taking into account each model's uncertainty, thus reducing the adverse effects of models with high uncertainties. Brisimi et al. [10] proposed an iterative cluster Primal-Dual Splitting (cPDS) algorithm for solving the large-scale soft-margin l1-regularized sparse Support Vector Machine to predict hospitalizations due to cardiac events in FL settings. Huang et al. [33] proposed an FL algorithm called LoAdaBoost to predict the mortality of patients admitted to the ICU based on drugs prescribed during the first 48 h of their ICU stay. Pfohl et al. [59] comprehensively studied the efficacy of FL and differential privacy versus centralized training in predicting prolonged length of stay and in-hospital mortality across thirty-one hospitals. Grama et al. [28] evaluated the performance of different robust FL aggregation methods on two disease prediction tasks, diabetes mellitus onset prediction and heart failure prediction. Results showed that adaptive federated averaging (AFA) [54] not only performs well but also is robust against malicious or faulty clients. Tan et al. [72] proposed a tree-based FL method for treatment effect estimation. The method was used to study the effect of oxygen saturation on hospital mortality among ICU patients with respiratory diseases. Tuladhar et al. [76] presented an ensemble approach to distributed learning of machine learning models for rare disease detection. In this approach, inference is done by ensembling predictions provided by local models instead of aggregating their weights to produce a single global model. Xue et al. [85] introduced a federated reinforcement learning system that employs Double Deep Q-Network (DDQN) to provide support for personalized clinical decisions. The system utilizes data from smart devices at the edge as well as electronic medical records (EMRs).

FL has also been applied to representation learning in the context of EHRs. Liu et al. [47] proposed a two-stage federated natural language processing (NLP) method for phenotyping and patient representation learning. The first stage constructs a representation of patient data using medical notes from multiple hospitals without sharing the notes. The learned representation is not constrained to any specific medical task. The second stage builds a machine learning model for a specific phenotyping task based on relevant features extracted from representations learned in the first stage. Lee et al. [44] and Xu et al. [84] presented two federated patient hashing frameworks for patient similarity learning. The model learns context-specific hash codes to represent patients across multiple hospitals. The learned hash codes are then used to calculate similarities among patients. Ultimately, the model can match patients with high similarity among multiple hospitals. Lu et al. [48] proposed an efficient decentralized FL approach to extract latent features from patient data. Kim et al. [41] proposed a tensor factorization method that generates meaningful clinical concepts (phenotypes) from a large volume of EHRs. Vepakomma et al. [78] introduced three configurations of a distributed deep learning method called Split Learning [29], which differs from conventional FL in that it does not require participants to share the weights of the entire locally trained model. This leads to improved data privacy and security. Huang et al. [32] proposed a method called community-based federated learning (CBFL), which clusters the distributed patient data into clinically meaningful groups that share similar characteristics, such as drug features and diagnoses, while simultaneously training one model for each group. The method achieved good results on predicting mortality and length of stay.


3 EVALUATION OF CURRENT WELL-KNOWN FL ALGORITHMS ON EHRS

This section provides an overview of common FL algorithms that have been shown to work well outside of the healthcare domain. Their performance was then evaluated on two machine learning tasks in the ICU, in-hospital mortality prediction and AKI prediction, using a dataset containing EHRs from multiple ICUs.

3.1 Overview of Common FL Algorithms

In general, FL involves each participant training a local model on their own local dataset and exchanging model parameters, e.g., weights and/or gradients, at some frequency. There is no exchange of data among participants. The local model parameters are then aggregated to generate a global model. Aggregation can be conducted with or without the coordination of a central party. Different FL algorithms vary in how the aggregation steps or the local update steps are performed. Among those, FedAvg [50] is the most well-known. FedAvg aims to optimize the following objective: (1) \( \begin{equation} \min _\mathrm{w}\Big (F(\mathrm{w}) = \sum _{k=1}^K p_kF_k(\mathrm{w})\Big), \end{equation} \) where \( K \) is the number of participants, \( p_k \) is the weight of participant \( k \), and \( \sum _{k=1}^K p_k = 1 \). \( p_k \) is usually proportional to the size of each participant's dataset. \( F_k(\cdot) \) is the local objective function.

At each communication round \( t \), a global model with weights \( w_t \) is sent to all \( K \) participants. Each participant \( k \) performs local training for \( E \) epochs, producing a new local model with weights \( w^k_{t+1} \). Each participant then sends their newly learned local model weights to a central server where they are aggregated to obtain a new global model with updated weights \( w_{t+1} \) equal to the weighted average of all local models: (2) \( \begin{equation} w_{t+1} = \sum _{k=1}^K p_kw^k_{t+1}. \end{equation} \)
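To make the round structure concrete, the following is a minimal Python sketch of one FedAvg communication round. The clients list, the local_train helper, and the weighting by dataset size are illustrative assumptions, not the exact implementation used in our experiments.

```python
import copy

def fedavg_round(global_model, clients, local_train, epochs=10):
    """One round: broadcast w_t, train locally for E epochs, average."""
    total = sum(c["n_samples"] for c in clients)
    local_states = []
    for c in clients:
        local_model = copy.deepcopy(global_model)      # receive w_t
        local_train(local_model, c["loader"], epochs)  # produce w^k_{t+1}
        local_states.append((c["n_samples"] / total, local_model.state_dict()))
    # w_{t+1} = sum_k p_k * w^k_{t+1}, with p_k proportional to dataset size
    new_state = {key: sum(p * sd[key] for p, sd in local_states)
                 for key in global_model.state_dict()}
    global_model.load_state_dict(new_state)
    return global_model
```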

FedAvg performs well in the case of homogeneity, where all local datasets are independently and identically distributed (IID). In the presence of statistical heterogeneity, where data are not independently and identically distributed (non-IID) across participants, the global model may perform poorly or even fail to converge. A number of approaches have been proposed to counter this problem and improve the convergence rate and performance of FL on non-IID datasets.

FedProx [45] and SCAFFOLD [38] aim to improve the convergence rate of FedAvg by correcting client drift, a phenomenon where client heterogeneity causes a drift in the local updates in each round of local training, resulting in slow convergence. FedProx introduces a proximal term that restricts the local updates to remain closer to the latest global update. Instead of optimizing \( F_k(\cdot) \), each participant now optimizes the local objective: (3) \( \begin{equation} h_k(\mathrm{w}, \mathrm{w}^t) = F_k(\mathrm{w})+\frac{\mu }{2}\left\Vert \mathrm{w}-\mathrm{w^t} \right\Vert ^2. \end{equation} \)
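As an illustration, the proximal objective in Equation (3) can be implemented by adding a penalty on the distance to the latest global weights to the usual local loss. This is a minimal sketch; the choice of mu and the surrounding training loop are assumptions.

```python
import torch

def fedprox_loss(local_model, global_params, base_loss, mu=0.01):
    """F_k(w) plus the proximal term (mu / 2) * ||w - w^t||^2."""
    prox = 0.0
    for w, w_t in zip(local_model.parameters(), global_params):
        prox = prox + (w - w_t.detach()).pow(2).sum()
    return base_loss + (mu / 2.0) * prox
```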

SCAFFOLD works by measuring the amount of drift caused by each client in each round and then adjusting their local updates accordingly. A client's drift is measured as the difference between the direction of the global update and the direction of the client's local update.
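A rough sketch of SCAFFOLD's drift-corrected local step is shown below. Here c_server and c_client denote the server and client control variates (lists of tensors shaped like the model parameters); the bookkeeping that updates the control variates between rounds is omitted, and the variable names are our own.

```python
import torch

def scaffold_local_step(model, loss, lr, c_server, c_client):
    """One corrected SGD step: w <- w - lr * (grad + c - c_i)."""
    loss.backward()
    with torch.no_grad():
        for w, c, c_i in zip(model.parameters(), c_server, c_client):
            w -= lr * (w.grad + c - c_i)  # subtract the estimated drift
            w.grad = None                 # clear for the next step
```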

Instead of controlling local training, several FL algorithms tackle the slow convergence problem by experimenting with server optimization. The global model update step specified in Equation (2) can be rewritten as (4) \( \begin{equation} w^{t+1} = w^t - \Delta w, \end{equation} \) where \( \Delta w = \sum _{k=1}^K p_k\Delta w_k \) and \( \Delta w_k \) is the weight update from client \( k \), (5) \( \begin{equation} \Delta w_k = w^t - w^{t+1}_k. \end{equation} \) Equation (4) has the same form as a gradient-based optimization step where \( \Delta w \) acts as a pseudo-gradient. Reddi et al. [62] formalized this as a server optimization step that optimizes the model from a global perspective, in addition to the client optimization step in Equation (5) that optimizes the model from a local perspective. Their proposed FL algorithms FedAdagrad, FedAdam, and FedYogi employ adaptive server optimization by applying the adaptive optimization methods Adagrad, Adam, and Yogi in the server optimization step. FedAvgM [30, 31] is another algorithm that modifies the server optimization step, adding momentum: it computes \( w^{t+1} = w^t - v \) where \( v = \beta v + \Delta w \).
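As a minimal sketch of server-side optimization, the following applies the FedAvgM and FedAdam update rules to the pseudo-gradient delta (a dict of tensors keyed like the model state). All hyperparameter values are illustrative defaults, not the settings used in our experiments.

```python
import torch

def server_step_fedavgm(weights, delta, state, beta=0.9, lr=1.0):
    """FedAvgM: v <- beta * v + delta; w <- w - lr * v."""
    for k in weights:
        v = state.setdefault(k, torch.zeros_like(weights[k]))
        state[k] = beta * v + delta[k]
        weights[k] = weights[k] - lr * state[k]

def server_step_fedadam(weights, delta, state, lr=0.01, b1=0.9, b2=0.99, tau=1e-3):
    """FedAdam: Adam on the server, treating delta as the gradient."""
    m = state.setdefault("m", {k: torch.zeros_like(t) for k, t in weights.items()})
    v = state.setdefault("v", {k: torch.zeros_like(t) for k, t in weights.items()})
    for k in weights:
        m[k] = b1 * m[k] + (1 - b1) * delta[k]
        v[k] = b2 * v[k] + (1 - b2) * delta[k] ** 2
        weights[k] = weights[k] - lr * m[k] / (v[k].sqrt() + tau)
```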

3.2 Experiments

We evaluated the performance of the well-known FL algorithms FedAvg, FedProx, FedAvgM, FedAdagrad, FedAdam, and FedYogi on two common and clinically crucial machine learning tasks in the ICU, in-hospital mortality prediction and AKI prediction. Their results were compared against those obtained from local learning, centralized learning, and two non-FL methods that also enable collaborative model training without data sharing, namely, institutional incremental learning (IIL) and cyclic institutional incremental learning (CIIL) [11, 67, 68]. In IIL, each party trains the model on their local dataset and then passes it to the next party until all parties have trained the model. CIIL repeats the same process over multiple rounds but fixes the number of training epochs carried out by each party in each round. The data for both tasks come from the eICU dataset [60], which collected EHRs from more than 200 hospitals and over 139,000 patients across the United States admitted to the ICU in 2014 and 2015. The dataset contains a wide range of data, including demographics, medications, diagnoses, procedures, timestamped vital signs, and lab test results.
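For contrast with the FL round structure sketched earlier, a minimal sketch of the CIIL training loop follows; institutions, local_train, and the cycle counts are illustrative assumptions.

```python
def ciil_train(model, institutions, local_train, cycles=10, epochs_per_visit=1):
    """Pass the model around all institutions for several cycles; each
    institution trains it for a fixed number of epochs before handing it on."""
    for _ in range(cycles):
        for inst in institutions:
            local_train(model, inst["loader"], epochs_per_visit)
    return model  # IIL is the special case of a single cycle
```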

For each task, several hospitals in the eICU database were selected as participants. The extracted data of each hospital were split into a train, validation, and test set, taking up 80%, 10%, and 10% of that hospital's population, respectively. In the local training setting, a separate model was trained for each hospital using only their own local data. The training was done over a number of epochs, and for each hospital, the model that gave the best performance on the validation set in terms of Area under the ROC Curve (AUC-ROC) became the final model for evaluation.

In the centralized setting, the train, validation, and test sets from all participating hospitals were concatenated to produce a single train, validation, and test set. A single model was then trained on the combined training set. Like in the local setting, training was conducted for several epochs and the best model was picked based on the AUC-ROC score on the combined validation set.

In the IIL and CIIL settings, since there was no global aggregated validation set due to no data sharing among participants, the model produced by the last party that conducted the training was selected as the final model.

In the FL setting, training was done over several communication rounds. Similar to the IIL and CIIL settings, since the central server that coordinated the training and carried out the global model aggregation process did not have access to a global validation set, the final model was the one obtained after all the communication rounds had finished.

Performance among the methods was compared based on global test scores. The metrics used are AUC-ROC and Area under the Precision-Recall Curve (AUC-PR). DeLong's method [15] and the logit method [9] were employed to compute 95% confidence intervals for AUC-ROC and AUC-PR, respectively.
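For illustration, both metrics can be computed with scikit-learn as sketched below. DeLong's method and the logit method do not ship with scikit-learn, so this sketch substitutes a plain percentile-bootstrap interval; it is not the procedure used to produce the tables.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

def metric_with_bootstrap_ci(y_true, y_score, metric, n_boot=1000, seed=0):
    """Point estimate plus a 95% percentile-bootstrap confidence interval."""
    y_true, y_score = np.asarray(y_true), np.asarray(y_score)
    rng = np.random.default_rng(seed)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y_true), len(y_true))
        if len(np.unique(y_true[idx])) < 2:  # resample must contain both classes
            continue
        stats.append(metric(y_true[idx], y_score[idx]))
    lo, hi = np.percentile(stats, [2.5, 97.5])
    return metric(y_true, y_score), (lo, hi)

# auc_roc, ci = metric_with_bootstrap_ci(y, p, roc_auc_score)
# auc_pr, ci = metric_with_bootstrap_ci(y, p, average_precision_score)
```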

3.2.1 In-hospital Mortality Prediction.

In this experiment, we investigated the performance of FL algorithms on predicting a patient's in-hospital mortality based on data collected during the first 24 h of their ICU stay. This is a crucial task in clinical settings. When a patient is admitted to the ICU, predicting their mortality, whether at the end of the ICU stay, the hospital stay, or within a fixed period, e.g., 28 days, one month, or three months, provides a proxy for the severity of their condition and helps healthcare providers plan treatment pathways and allocate resources more effectively. Several works have successfully applied machine learning to predicting in-hospital mortality [6, 61, 82].

Data. The same data extraction process as in References [14, 37] was employed. For each hospital in the entire eICU dataset, we extracted a cohort of patients aged 16 and above in their first ICU stay who had their in-hospital mortality status recorded. Patients without an APACHE IVa score were excluded; this criterion serves as a proxy for identifying patients with insufficient data or those who were in the database only for administrative purposes. The twenty hospitals with the largest cohorts were then selected as participants in the study. The combined cohort contains 87,003 ICU stays.

For each patient, data within 24 h from ICU admission were extracted. The set of features includes

  • demographic information: gender, age, and ethnicity,

  • the first and last results of the following laboratory tests: PaO2, PaCO2, PaO2/FiO2 ratio, pH, base excess, Albumin, the significant band of arterial blood gas, HCO3, Bilirubin, Blood Urea Nitrogen (BUN), Calcium, Creatinine, Glucose, Hematocrit, Hemoglobin, international normalized ratio (INR), Lactate, Platelet, Potassium, Sodium, white blood cell count,

  • the first and last as well as the minimum and maximum measurements of the following vital signs: heart rate, systolic blood pressure, mean blood pressure, respiratory rate, temperature (Celsius), SpO2, Glasgow Coma Scale (GCS),

  • total urine output,

  • whether the hospital admission was for an elective surgery.

A total of 82 covariates were obtained.

Methods. A neural network consisting of two fully connected hidden layers with ReLU activation functions and L2 regularization was used. The first hidden layer contains 100 nodes and the second 50. In the local and centralized settings, the model was trained for 90 epochs. In the FL settings, training took place over 30 communication rounds, with each hospital training the model locally for 10 epochs per round.
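A sketch of the described architecture in PyTorch is shown below; the input size matches the 82 covariates, and L2 regularization is applied via weight decay. The optimizer choice, learning rate, and weight-decay strength are assumptions.

```python
import torch.nn as nn
import torch.optim as optim

model = nn.Sequential(
    nn.Linear(82, 100), nn.ReLU(),   # first hidden layer: 100 nodes
    nn.Linear(100, 50), nn.ReLU(),   # second hidden layer: 50 nodes
    nn.Linear(50, 1),                # logit for in-hospital mortality
)
optimizer = optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
criterion = nn.BCEWithLogitsLoss()
```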

3.2.2 AKI Prediction.

The purpose of this experiment was to evaluate the performance of FL algorithms on predicting the risk of a patient developing AKI within the next hour based on data collected during the previous 7 h. AKI is a sudden episode of kidney damage or failure that develops within a few hours or a few days and occurs in at least 5% of hospitalized patients [16]. AKI can affect other organs such as the lungs, heart, and brain. It significantly increases hospitalization costs as well as mortality risk [13]. Timely detection of AKI could prevent patients from developing chronic kidney disease [39, 71]. Several studies have shown strong performance of machine learning models in predicting AKI [26, 53, 58].

Data. We followed the same data extraction process as in Reference [16]. The RIFLE criteria [7] were used to define AKI. Specifically, a patient at time \( t \) is labeled as suffering from AKI if their urine output has been below 0.5 ml/kg/h for the preceding 6 h. The cohort exclusion criteria were (1) patients who were under 16 years old or stayed in the ICU for less than 12 h and (2) patients whose data for the selected variables were not recorded at least once during their ICU stay. A total of 10,967 patients in 168 hospitals remained after the filtering. The top 75% of hospitals by number of patients were selected to participate in the study. The final cohort contains 28 hospitals with a total of 6,641 patients.
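As a rough sketch of this urine-output labeling rule, assume an hourly table with columns patient_id, hour, urine_ml, and weight_kg; the column names and the hourly resampling are our assumptions, not the schema used in Reference [16].

```python
import pandas as pd

def label_aki(df: pd.DataFrame) -> pd.DataFrame:
    df = df.sort_values(["patient_id", "hour"]).copy()
    df["rate"] = df["urine_ml"] / df["weight_kg"]  # ml/kg/h
    # AKI at hour t if the rate stayed below 0.5 ml/kg/h over the past 6 h,
    # i.e., the 6-h rolling maximum is below the threshold.
    df["aki"] = (
        df.groupby("patient_id")["rate"]
          .transform(lambda s: s.rolling(6, min_periods=6).max())
          .lt(0.5)
    )
    return df
```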

For each patient, we extracted data in 7-h sliding windows. The full set of covariates includes

  • demographic information: age and gender,

  • the minimum and maximum values as well as the range (the difference between the maximum and minimum values) of the following vital signs: heart rate, respiratory rate, mean blood pressure,

  • the minimum and maximum values as well as the range of the following lab measurements: SpO2/SaO2, pH, Potassium, Calcium, Glucose, Sodium, HCO3, Hemoglobin, white blood cell count, Platelet count, Urea Nitrogen, Creatinine, GCS,

  • interventions: use of vasoactive medications, use of sedative medications, and use of mechanical ventilation,

  • total urine output.

A total of 22 covariates were obtained.

Methods. Similar to the previous task, a fully connected neural network consisting of two hidden layers with ReLU activation functions and L2 regularization was used. However, here each of the two hidden layers contains 512 nodes instead of 100 and 50. In the local and centralized settings, the model was trained for 30 epochs. In the FL settings, training took place over four communication rounds. Each hospital trained a local model for 10 local epochs during the first round and 5 local epochs during each subsequent round.

3.3 Results and Discussion

Global test performance in terms of AUC-ROC and AUC-PR obtained with each method is shown in Table 1 for in-hospital mortality prediction and Table 2 for AKI prediction. Comparison of ROC curves obtained with FL methods versus centralized and local training is visualized in Figures 1 and 2. Similarly, Figures 3 and 4 in Appendix A compare ROC curves obtained with FL methods to those obtained with CIIL and IIL. In both tasks, all FL methods outperform local training on both metrics, with the exception of FedProx in AKI prediction. In particular, for mortality prediction, all FL methods perform significantly better than local training. In comparison with IIL and CIIL, for mortality prediction, all FL methods achieve better results. For AKI prediction, the same is true for most FL methods. The only exceptions are FedProx, which obtains worse AUC-ROC and AUC-PR than both IIL and CIIL, and FedAdam, whose AUC-PR is slightly lower than that of CIIL. Overall, for both tasks, FL methods improve over IIL and CIIL. This is unsurprising given that IIL is known to suffer from catastrophic forgetting [23, 42, 68] while it is non-trivial to obtain optimal results with CIIL due to its instability [68]. Results obtained by FL are also comparable to centralized learning, with the best FL method in each task achieving a point-estimate AUC-ROC within 0.01 of that of centralized learning. FedAvg and FedAvgM perform consistently well and are among the top three FL methods with the highest global AUC-ROCs and AUC-PRs in both tasks, behind only FedYogi in AKI prediction. In both cases, FedAvgM obtained slightly better results than FedAvg. However, FedProx achieved the lowest scores in both mortality and AKI prediction.

Fig. 1. Comparison of ROC curves obtained with Local, Centralized, and FL methods for in-hospital mortality prediction.

Fig. 2. Comparison of ROC curves obtained with Local, Centralized, and FL methods for AKI prediction.

Fig. 3. Comparison of ROC curves obtained with IIL, CIIL, and FL methods for in-hospital mortality prediction.

Fig. 4. Comparison of ROC curves obtained with IIL, CIIL, and FL methods for AKI prediction.

Table 1. Global Test Performance on the In-hospital Mortality Prediction Task

Method      | AUC-ROC (95% CI)     | AUC-PR (95% CI)
Local       | 0.833 (0.808–0.859)  | 0.472 (0.373–0.480)
Centralized | 0.918 (0.907–0.929)  | 0.668 (0.633–0.701)
IIL         | 0.877 (0.857–0.897)  | 0.505 (0.451–0.559)
CIIL        | 0.828 (0.803–0.852)  | 0.426 (0.373–0.480)
FedAvg      | 0.901 (0.882–0.921)  | 0.638 (0.584–0.688)
FedProx     | 0.895 (0.877–0.914)  | 0.577 (0.523–0.630)
FedAvgM     | 0.906 (0.888–0.925)  | 0.645 (0.591–0.695)
FedAdam     | 0.890 (0.870–0.911)  | 0.578 (0.524–0.631)
FedAdagrad  | 0.893 (0.873–0.913)  | 0.596 (0.543–0.649)
FedYogi     | 0.895 (0.875–0.915)  | 0.594 (0.539–0.646)

Table 2. Global Test Performance on the AKI Prediction Task

Method      | AUC-ROC (95% CI)     | AUC-PR (95% CI)
Local       | 0.709 (0.697–0.722)  | 0.748 (0.734–0.761)
Centralized | 0.735 (0.724–0.747)  | 0.783 (0.770–0.796)
IIL         | 0.664 (0.652–0.677)  | 0.723 (0.709–0.737)
CIIL        | 0.712 (0.700–0.724)  | 0.764 (0.750–0.777)
FedAvg      | 0.724 (0.712–0.736)  | 0.770 (0.757–0.783)
FedProx     | 0.691 (0.679–0.703)  | 0.740 (0.726–0.754)
FedAvgM     | 0.725 (0.713–0.736)  | 0.775 (0.762–0.788)
FedAdam     | 0.716 (0.704–0.728)  | 0.760 (0.746–0.773)
FedAdagrad  | 0.720 (0.708–0.732)  | 0.767 (0.753–0.780)
FedYogi     | 0.732 (0.720–0.743)  | 0.773 (0.760–0.786)

Results strongly favor FL as a viable strategy for facilitating collaboration among organizations in clinical research. Even though performance does not vary much among the different FL methods in our experiments, the simpler FL algorithms, namely, FedAvg and FedAvgM, perform slightly better than FedProx, FedAdam, and FedAdagrad. FedProx has been shown to work well in the presence of heavy data heterogeneity [45]. In our dataset, all hospitals are located in the United States and are therefore expected to be fairly consistent in clinical practices and patient demographics. Furthermore, they all participated in the Philips eICU program, which guarantees a certain degree of data standardization. Thus, the differences in data distribution among them are not significant enough to benefit from FedProx. Moreover, the total number of participants is relatively small compared to FL in an IoT setting with a large number of participating devices, where FedProx usually shines [45]. Data homogeneity might also contribute to the lack of performance gain of FedAdam and FedAdagrad over FedAvg. In addition, both mortality prediction and AKI prediction, like most machine learning tasks on tabular EHR data, only require feed-forward fully connected neural networks with a small number of layers, which might not see considerable performance gain from the use of Adam and Adagrad.


4 CONCLUSION

This study gave a brief survey of applications of FL on EHR data and then evaluated the performance of multiple common FL algorithms on two typical EHR machine learning tasks, in-hospital mortality prediction and AKI prediction. FL shows notable improvement over local training and performs close to centralized learning. This is promising for organizations that seek to collaborate with others in data-driven clinical research using EHRs. FL could help them build better machine learning models than they could individually using only their own local data, while preserving data privacy without a substantial compromise in model performance.

Our results also suggest that the simple FL algorithms FedAvg and FedAvgM work particularly well for machine learning tasks on tabular EHR data compared to more complex methods such as FedProx, FedAdam, and FedAdagrad. Data homogeneity might contribute to this finding: all local datasets in our experiments come from hospitals located in the U.S. and thus share certain characteristics. This is one limitation of our study that we aim to overcome in future work. We plan to expand the pool of participants in our experiments to include datasets from ICUs in Europe [34, 74], Australia, and New Zealand [70] and investigate how FL performs on more heterogeneous EHR data.

In addition, we plan to validate our results on more recent data. We are currently working with public hospitals in Singapore to establish a federated data network whose participants adopt the OMOP Common Data Model [65, 80] for their EHRs. Once the network is set up, we will replicate our study on this more up-to-date dataset and also expand it to cover tasks beyond mortality and AKI prediction to improve the generalizability of our findings. Another direction would be to evaluate the performance of different FL methods on other data modalities in healthcare, such as time series, digital signals [2], and medical imaging [35, 36, 56], which require more complex model architectures.

It is important to note that even though FL aims to preserve data privacy by sharing model parameters instead of data during training, it does not by itself mitigate all data privacy and security concerns. Studies have shown that it is possible to make inferences about the raw data by examining model parameter updates [5]. There exist methods that add extra security measures on top of FL to counter this [1, 4, 5, 40], namely, Differential Privacy (DP) [18, 19, 20], Homomorphic Encryption (HE) [64], and Secure Multi-Party Computation (SMC) [51, 86]. They enhance FL data security and privacy at the cost of communication efficiency and model performance. In particular, by adding noise to client training data, DP offers improved data privacy but also results in a decrease in model accuracy. HE ensures that only encrypted model parameters are exchanged. This provides data protection but also imposes a penalty on model performance [57]. SMC keeps each party's inputs private but is computationally intensive and requires extensive communication among parties. Further research is needed to understand the privacy-accuracy trade-offs of combining these methods with different FL algorithms in the context of EHRs.
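As one concrete example of such an add-on, a client update can be clipped and perturbed with Gaussian noise before aggregation, in the spirit of differentially private FL. This is a minimal sketch: the clip norm and noise scale are illustrative, and a formal DP guarantee would additionally require proper noise calibration and privacy accounting.

```python
import torch

def privatize_update(delta, clip_norm=1.0, noise_std=0.1):
    """Clip the update to a maximum L2 norm, then add Gaussian noise."""
    flat = torch.cat([d.flatten() for d in delta.values()])
    scale = min(1.0, clip_norm / (flat.norm().item() + 1e-12))
    return {k: d * scale + noise_std * torch.randn_like(d)
            for k, d in delta.items()}
```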

APPENDIX

A Comparison of ROC Curves Obtained by FL Methods Versus IIL and CIIL

Figures 3 and 4 show comparison of ROC curves obtained with IIL, CIIL, and FL methods for in-hospital mortality prediction and AKI prediction, respectively.

REFERENCES

  [1] Acar Abbas, Aksu Hidayet, Uluagac A. Selcuk, and Conti Mauro. 2018. A survey on homomorphic encryption schemes: Theory and implementation. ACM Comput. Surveys 51, 4 (2018), 1–35.
  [2] Alday Erick A. Perez, Gu Annie, Shah Amit J., Robichaux Chad, Wong An-Kwok Ian, Liu Chengyu, Liu Feifei, Rad Ali Bahrami, Elola Andoni, Seyedi Salman, et al. 2020. Classification of 12-lead ECGs: The PhysioNet/Computing in Cardiology Challenge 2020. Physiol. Measure. 41, 12 (2020), 124003.
  [3] Annas George J. 2003. HIPAA regulations—A new era of medical-record privacy? N. Engl. J. Med. 348, 15 (2003), 1486–1490.
  [4] Aono Yoshinori, Hayashi Takuya, Phong Le Trieu, and Wang Lihua. 2016. Scalable and secure logistic regression via homomorphic encryption. In Proceedings of the 6th ACM Conference on Data and Application Security and Privacy. 142–144.
  [5] Aono Yoshinori, Hayashi Takuya, Wang Lihua, Moriai Shiho, et al. 2017. Privacy-preserving deep learning via additively homomorphic encryption. IEEE Trans. Info. Forens. Secur. 13, 5 (2017), 1333–1345.
  [6] Awad Aya, Bader-El-Den Mohamed, McNicholas James, and Briggs Jim. 2017. Early hospital mortality prediction of intensive care unit patients using an ensemble learning approach. Int. J. Med. Inform. 108 (2017), 185–195.
  [7] Bellomo Rinaldo, Ronco Claudio, Kellum John A., Mehta Ravindra L., and Palevsky Paul. 2004. Acute renal failure–definition, outcome measures, animal models, fluid therapy and information technology needs: The second international consensus conference of the Acute Dialysis Quality Initiative (ADQI) group. Crit. Care 8, 4 (2004), 1–9.
  [8] Boughorbel Sabri, Jarray Fethi, Venugopal Neethu, Moosa Shabir, Elhadi Haithum, and Makhlouf Michel. 2019. Federated uncertainty-aware learning for distributed hospital EHR data. arXiv:1910.12191.
  [9] Boyd Kendrick, Eng Kevin H., and Page C. David. 2013. Area under the precision-recall curve: Point estimates and confidence intervals. In Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 451–466.
  [10] Brisimi Theodora S., Chen Ruidi, Mela Theofanie, Olshevsky Alex, Paschalidis Ioannis Ch., and Shi Wei. 2018. Federated learning of predictive models from federated electronic health records. Int. J. Med. Inform. 112 (2018), 59–67.
  [11] Chang Ken, Balachandar Niranjan, Lam Carson, Yi Darvin, Brown James, Beers Andrew, Rosen Bruce, Rubin Daniel L., and Kalpathy-Cramer Jayashree. 2018. Distributed deep learning networks among institutions for medical imaging. J. Amer. Med. Inform. Assoc. 25, 8 (2018), 945–954.
  [12] Chen Min, Qian Yongfeng, Chen Jing, Hwang Kai, Mao Shiwen, and Hu Long. 2020. Privacy protection and intrusion avoidance for cloudlet-based medical data sharing. IEEE Trans. Cloud Comput. 8, 4 (2020), 1274–1283.
  [13] Chertow Glenn M., Burdick Elisabeth, Honour Melissa, Bonventre Joseph V., and Bates David W. 2005. Acute kidney injury, mortality, length of stay, and costs in hospitalized patients. J. Amer. Soc. Nephrol. 16, 11 (2005), 3365–3370.
  [14] Dang Trung Kien, Tan Kwan Chet, Choo Mark, Lim Nicholas, Weng Jianshu, and Feng Mengling. 2020. Building ICU in-hospital mortality prediction model with federated learning. In Federated Learning. Springer, 255–268.
  [15] DeLong Elizabeth R., DeLong David M., and Clarke-Pearson Daniel L. 1988. Comparing the areas under two or more correlated receiver operating characteristic curves: A nonparametric approach. Biometrics 44, 3 (1988), 837–845.
  [16] Du Hao, Pan Ziyuan, Ngiam Kee Yuan, Wang Fei, Shum Ping, and Feng Mengling. 2021. Self-correcting recurrent neural network for acute kidney injury prediction in critical care. Health Data Sci. 2021, Article 9808426 (2021), 10 pages.
  [17] Duan Rui, Boland Mary Regina, Liu Zixuan, Liu Yue, Chang Howard H., Xu Hua, Chu Haitao, Schmid Christopher H., Forrest Christopher B., Holmes John H., et al. 2020. Learning from electronic health records across multiple sites: A communication-efficient and privacy-preserving distributed algorithm. J. Amer. Med. Inform. Assoc. 27, 3 (2020), 376–385.
  [18] Dwork Cynthia. 2011. A firm foundation for private data analysis. Commun. ACM 54, 1 (2011), 86–95.
  [19] Dwork Cynthia, McSherry Frank, Nissim Kobbi, and Smith Adam. 2006. Calibrating noise to sensitivity in private data analysis. In Proceedings of the Theory of Cryptography Conference. Springer, 265–284.
  [20] Dwork Cynthia, Roth Aaron, et al. 2014. The algorithmic foundations of differential privacy. Found. Trends Theor. Comput. Sci. 9, 3–4 (2014), 211–407.
  [21] Emam Khaled El, Jonker Elizabeth, Arbuckle Luk, and Malin Bradley. 2011. A systematic review of re-identification attacks on health data. PLoS One 6, 12 (2011), e28071.
  [22] Fleurence Rachael L., Curtis Lesley H., Califf Robert M., Platt Richard, Selby Joe V., and Brown Jeffrey S. 2014. Launching PCORnet, a national patient-centered clinical research network. J. Amer. Med. Inform. Assoc. 21, 4 (2014), 578–582.
  [23] French Robert M. 1999. Catastrophic forgetting in connectionist networks. Trends Cogn. Sci. 3, 4 (1999), 128–135.
  [24] Friedman Jerome, Hastie Trevor, Tibshirani Robert, et al. 2001. The Elements of Statistical Learning. Springer Series in Statistics, Vol. 1. Springer, New York.
  [25] Futoma Joseph, Simons Morgan, Panch Trishan, Doshi-Velez Finale, and Celi Leo Anthony. 2020. The myth of generalisability in clinical research and machine learning in health care. Lancet Dig. Health 2, 9 (2020), e489–e492.
  [26] Gameiro Joana, Branco Tiago, and Lopes José António. 2020. Artificial intelligence in acute kidney injury risk prediction. J. Clin. Med. 9, 3 (2020), 678.
  [27] Gostin Lawrence O. 2001. National health information privacy: Regulations under the Health Insurance Portability and Accountability Act. JAMA 285, 23 (2001), 3015–3021.
  [28] Grama Matei, Musat Maria, Muñoz-González Luis, Passerat-Palmbach Jonathan, Rueckert Daniel, and Alansary Amir. 2020. Robust aggregation for adaptive privacy preserving federated learning in healthcare. arXiv:2009.08294.
  [29] Gupta Otkrist and Raskar Ramesh. 2018. Distributed learning of deep neural network over multiple agents. J. Netw. Comput. Appl. 116 (2018), 1–8.
  [30] Hsu Tzu-Ming Harry, Qi Hang, and Brown Matthew. 2019. Measuring the effects of non-identical data distribution for federated visual classification. arXiv:1909.06335.
  [31] Hsu Tzu-Ming Harry, Qi Hang, and Brown Matthew. 2020. Federated visual classification with real-world data distribution. arXiv:2003.08082.
  [32] Huang Li, Shea Andrew L., Qian Huining, Masurkar Aditya, Deng Hao, and Liu Dianbo. 2019. Patient clustering improves efficiency of federated machine learning to predict mortality and hospital stay time using distributed electronic medical records. J. Biomed. Inform. 99 (2019), 103291.
  [33] Huang Li, Yin Yifeng, Fu Zeng, Zhang Shifa, Deng Hao, and Liu Dianbo. 2020. LoAdaBoost: Loss-based AdaBoost federated machine learning with reduced computational complexity on IID and non-IID intensive care data. PLoS One 15, 4 (2020), e0230706.
  [34] Hyland Stephanie L., Faltys Martin, Hüser Matthias, Lyu Xinrui, Gumbsch Thomas, Esteban Cristóbal, Bock Christian, Horn Max, Moor Michael, Rieck Bastian, et al. 2020. Early prediction of circulatory failure in the intensive care unit using machine learning. Nature Med. 26, 3 (2020), 364–373.
  [35] Irvin Jeremy, Rajpurkar Pranav, Ko Michael, Yu Yifan, Ciurea-Ilcus Silviana, Chute Chris, Marklund Henrik, Haghgoo Behzad, Ball Robyn, Shpanskaya Katie, et al. 2019. CheXpert: A large chest radiograph dataset with uncertainty labels and expert comparison. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33. 590–597.
  [36] Johnson Alistair E. W., Pollard Tom J., Berkowitz Seth J., Greenbaum Nathaniel R., Lungren Matthew P., Deng Chih-ying, Mark Roger G., and Horng Steven. 2019. MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports. Sci. Data 6, 1 (2019), 1–8.
  [37] Johnson Alistair E. W., Pollard Tom J., and Naumann Tristan. 2018. Generalizability of predictive models for intensive care unit patients. In Proceedings of the Machine Learning for Health (ML4H) Workshop (NeurIPS'18). http://arxiv.org/abs/1812.02275.
  [38] Karimireddy Sai Praneeth, Kale Satyen, Mohri Mehryar, Reddi Sashank, Stich Sebastian, and Suresh Ananda Theertha. 2020. SCAFFOLD: Stochastic controlled averaging for federated learning. In Proceedings of the International Conference on Machine Learning. PMLR, 5132–5143.
  [39] Kate Rohit J., Perez Ruth M., Mazumdar Debesh, Pasupathy Kalyan S., and Nilakantan Vani. 2016. Prediction and detection models for acute kidney injury in hospitalized older adults. BMC Med. Inform. Decis. Mak. 16, 1 (2016), 1–11.
  [40] Kim Miran, Song Yongsoo, Wang Shuang, Xia Yuhou, and Jiang Xiaoqian. 2018. Secure logistic regression based on homomorphic encryption: Design and evaluation. JMIR Med. Inform. 6, 2 (2018), e19.
  [41] Kim Yejin, Sun Jimeng, Yu Hwanjo, and Jiang Xiaoqian. 2017. Federated tensor factorization for computational phenotyping. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 887–895.
  [42] Kirkpatrick James, Pascanu Razvan, Rabinowitz Neil, Veness Joel, Desjardins Guillaume, Rusu Andrei A., Milan Kieran, Quan John, Ramalho Tiago, Grabska-Barwinska Agnieszka, et al. 2017. Overcoming catastrophic forgetting in neural networks. Proc. Natl. Acad. Sci. U.S.A. 114, 13 (2017), 3521–3526.
  [43] Konečnỳ Jakub, McMahan H. Brendan, Ramage Daniel, and Richtárik Peter. 2016. Federated optimization: Distributed machine learning for on-device intelligence. arXiv:1610.02527.
  [44] Lee Junghye, Sun Jimeng, Wang Fei, Wang Shuang, Jun Chi-Hyuck, and Jiang Xiaoqian. 2018. Privacy-preserving patient similarity learning in a federated environment: Development and analysis. JMIR Med. Inform. 6, 2 (2018), e20.
  [45] Li Tian, Sahu Anit Kumar, Zaheer Manzil, Sanjabi Maziar, Talwalkar Ameet, and Smith Virginia. 2018. Federated optimization in heterogeneous networks. arXiv:1812.06127.
  [46] Lilly Craig M., McLaughlin John M., Zhao Huifang, Baker Stephen P., Cody Shawn, Irwin Richard S., and the UMass Memorial Critical Care Operations Group. 2014. A multicenter study of ICU telemedicine reengineering of adult critical care. Chest 145, 3 (2014), 500–507.
  [47] Liu Dianbo, Dligach Dmitriy, and Miller Timothy. 2019. Two-stage federated phenotyping and patient representation learning. arXiv:1908.05596.
  [48] Lu Songtao, Zhang Yawen, and Wang Yunlong. 2020. Decentralized federated learning for electronic health records. In Proceedings of the 54th Annual Conference on Information Sciences and Systems (CISS'20). IEEE, 1–5.
  [49] McDonald Clement J., Overhage J. Marc, Barnes Michael, Schadow Gunther, Blevins Lonnie, Dexter Paul R., Mamlin Burke, and the INPC Management Committee. 2005. The Indiana Network for Patient Care: A working local health information infrastructure. Health Affairs 24, 5 (2005), 1214–1220.
  [50] McMahan Brendan, Moore Eider, Ramage Daniel, Hampson Seth, and Arcas Blaise Aguera y. 2017. Communication-efficient learning of deep networks from decentralized data. In Artificial Intelligence and Statistics. PMLR, 1273–1282.
  [51] Micali Silvio, Goldreich Oded, and Wigderson Avi. 1987. How to play any mental game. In Proceedings of the 19th ACM Symposium on Theory of Computing (STOC'87). ACM, 218–229.
  [52] Miotto Riccardo, Wang Fei, Wang Shuang, Jiang Xiaoqian, and Dudley Joel T. 2018. Deep learning for healthcare: Review, opportunities, and challenges. Brief. Bioinform. 19, 6 (2018), 1236–1246.
  [53] Mohamadlou Hamid, Lynn-Palevsky Anna, Barton Christopher, Chettipally Uli, Shieh Lisa, Calvert Jacob, Saber Nicholas R., and Das Ritankar. 2018. Prediction of acute kidney injury with a machine learning algorithm using electronic health record data. Can. J. Kidney Health Dis. 5 (2018), 2054358118776326.
  [54] Muñoz-González Luis, Co Kenneth T., and Lupu Emil C. 2019. Byzantine-robust federated machine learning through adaptive model averaging. arXiv:1909.05125.
  [55] Murphy Shawn N., Weber Griffin, Mendis Michael, Gainer Vivian, Chueh Henry C., Churchill Susanne, and Kohane Isaac. 2010. Serving the enterprise and beyond with informatics for integrating biology and the bedside (i2b2). J. Amer. Med. Inform. Assoc. 17, 2 (2010), 124–130.
  [56] Ng Dianwen, Lan Xiang, Yao Melissa Min-Szu, Chan Wing P., and Feng Mengling. 2021. Federated learning: A collaborative effort to achieve better medical imaging models for individual sites that have small labelled datasets. Quant. Imag. Med. Surg. 11, 2 (2021), 852.
  [57] Nikolaenko Valeria, Weinsberg Udi, Ioannidis Stratis, Joye Marc, Boneh Dan, and Taft Nina. 2013. Privacy-preserving ridge regression on hundreds of millions of records. In Proceedings of the IEEE Symposium on Security and Privacy. IEEE, 334–348.
  [58] Parreco Joshua, Soe-Lin Hahn, Parks Jonathan J., Byerly Saskya, Chatoor Matthew, Buicko Jessica L., Namias Nicholas, and Rattan Rishi. 2019. Comparing machine learning algorithms for predicting acute kidney injury. Amer. Surg. 85, 7 (2019), 725–729.
  [59] Pfohl Stephen R., Dai Andrew M., and Heller Katherine. 2019. Federated and differentially private learning for electronic health records. arXiv:1911.05861.
  [60] Pollard Tom J., Johnson Alistair E. W., Raffa Jesse D., Celi Leo A., Mark Roger G., and Badawi Omar. 2018. The eICU Collaborative Research Database, a freely available multi-center database for critical care research. Sci. Data 5, 1 (2018), 1–13.
  [61] Purushotham Sanjay, Meng Chuizheng, Che Zhengping, and Liu Yan. 2018. Benchmarking deep learning models on large healthcare datasets. J. Biomed. Inform. 83 (2018), 112–134.
  [62] Reddi Sashank, Charles Zachary, Zaheer Manzil, Garrett Zachary, Rush Keith, Konečnỳ Jakub, Kumar Sanjiv, and McMahan H. Brendan. 2020. Adaptive federated optimization. arXiv:2003.00295.
  [63] Riley Richard D., Ensor Joie, Snell Kym I. E., Debray Thomas P. A., Altman Doug G., Moons Karel G. M., and Collins Gary S. 2016. External validation of clinical prediction models using big datasets from e-health records or IPD meta-analysis: Opportunities and challenges. BMJ 353 (2016).
  [64] Rivest Ronald L., Adleman Len, Dertouzos Michael L., et al. 1978. On data banks and privacy homomorphisms. Found. Secure Comput. 4, 11 (1978), 169–180.
  [65] Sathappan Selva Muthu Kumaran, Jeon Young Seok, Dang Trung Kien, Lim Su Chi, Shao Yi-Ming, Tai E. Shyong, and Feng Mengling. 2021. Transformation of electronic health records and questionnaire data to OMOP CDM: A feasibility study using SG_T2DM dataset. Appl. Clin. Inform. 12, 4 (2021), 757–767.
  [66] Sharma Pulkit, Shamout Farah E., and Clifton David A. 2019. Preserving patient privacy while training a predictive model of in-hospital mortality. arXiv:1912.00354.
  [67] Sheller Micah J., Edwards Brandon, Reina G. Anthony, Martin Jason, Pati Sarthak, Kotrotsou Aikaterini, Milchenko Mikhail, Xu Weilin, Marcus Daniel, Colen Rivka R., et al. 2020. Federated learning in medicine: Facilitating multi-institutional collaborations without sharing patient data. Sci. Rep. 10, 1 (2020), 1–12.
  [68] Sheller Micah J., Reina G. Anthony, Edwards Brandon, Martin Jason, and Bakas Spyridon. 2018. Multi-institutional deep learning modeling without sharing patient data: A feasibility study on brain tumor segmentation. In Proceedings of the International MICCAI Brainlesion Workshop. Springer, 92–104.
  [69] Sonntag Daniel, Tresp Volker, Zillner Sonja, Cavallaro Alexander, Hammon Matthias, Reis André, Fasching Peter A., Sedlmayr Martin, Ganslandt Thomas, Prokosch Hans-Ulrich, et al. 2016. The clinical data intelligence project. Informatik-Spektrum 39, 4 (2016), 290–300.
  [70] Stow Peter J., Hart Graeme K., Higlett Tracey, George Carol, Herkes Robert, McWilliam David, Bellomo Rinaldo, the ANZICS Database Management Committee, et al. 2006. Development and implementation of a high-quality clinical database: The Australian and New Zealand Intensive Care Society Adult Patient Database. J. Crit. Care 21, 2 (2006), 133–141.
  [71] Sutherland Scott M., Chawla Lakhmir S., Kane-Gill Sandra L., Hsu Raymond K., Kramer Andrew A., Goldstein Stuart L., Kellum John A., Ronco Claudio, Bagshaw Sean M., and the 15 ADQI Consensus Group. 2016. Utilizing electronic health records to predict acute kidney injury risk and outcomes: Workgroup statements from the 15th ADQI consensus conference. Can. J. Kidney Health Dis. 3 (2016), 99.
  [72] Tan Xiaoqing, Chang Chung-Chou H., and Tang Lu. 2021. A tree-based federated learning approach for personalized treatment effect estimation from heterogeneous data sources. arXiv:2103.06261.
  [73] Thapa Chandra and Camtepe Seyit. 2021. Precision health data: Requirements, challenges and existing techniques for data security and privacy. Comput. Biol. Med. 129 (2021), 104130.
  [74] Thoral Patrick J., Peppink Jan M., Driessen Ronald H., Sijbrands Eric J. G., Kompanje Erwin J. O., Kaplan Lewis, Bailey Heatherlee, Kesecioglu Jozef, Cecconi Maurizio, Churpek Matthew, et al. 2021. Sharing ICU patient data responsibly under the Society of Critical Care Medicine/European Society of Intensive Care Medicine joint data science collaboration: The Amsterdam University Medical Centers Database (AmsterdamUMCdb) example. Crit. Care Med. 49, 6 (2021), e563.
  [75] Tresp Volker, Overhage J. Marc, Bundschus Markus, Rabizadeh Shahrooz, Fasching Peter A., and Yu Shipeng. 2016. Going digital: A survey on digitalization and large-scale data analytics in healthcare. Proc. IEEE 104, 11 (2016), 2180–2206.
  [76] Tuladhar Anup, Gill Sascha, Ismail Zahinoor, Forkert Nils D., the Alzheimer's Disease Neuroimaging Initiative, et al. 2020. Building machine learning models without sharing patient data: A simulation-based analysis of distributed learning by ensembling. J. Biomed. Inform. 106 (2020), 103424.
  [77] Vaid Akhil, Jaladanki Suraj K., Xu Jie, Teng Shelly, Kumar Arvind, Lee Samuel, Somani Sulaiman, Paranjpe Ishan, Freitas Jessica K. De, Wanyan Tingyi, et al. 2021. Federated learning of electronic health records to improve mortality prediction in hospitalized patients with COVID-19: Machine learning approach. JMIR Med. Inform. 9, 1 (2021), e24207.
  [78] Vepakomma Praneeth, Gupta Otkrist, Swedish Tristan, and Raskar Ramesh. 2018. Split learning for health: Distributed deep learning without sharing raw patient data. arXiv:1812.00564.
  [79] Voigt Paul and Bussche Axel Von dem. 2017. The EU General Data Protection Regulation (GDPR): A Practical Guide (1st ed.). Springer, Cham.
  [80] Voss Erica A., Makadia Rupa, Matcho Amy, Ma Qianli, Knoll Chris, Schuemie Martijn, DeFalco Frank J., Londhe Ajit, Zhu Vivienne, and Ryan Patrick B. 2015. Feasibility and utility of applications of the common data model to multiple, disparate observational health databases. J. Amer. Med. Inform. Assoc. 22, 3 (2015), 553–564.
  [81] Wynants L., Kent D. M., Timmerman D., Lundquist C. M., and Calster B. Van. 2019. Untapped potential of multicenter studies: A review of cardiovascular risk prediction models revealed inappropriate analyses and wide variation in reporting. Diagnost. Prognost. Res. 3, 1 (2019), 1–17.
  [82] Xiao Cao, Choi Edward, and Sun Jimeng. 2018. Opportunities and challenges in developing deep learning models using electronic health records data: A systematic review. J. Amer. Med. Inform. Assoc. 25, 10 (2018), 1419–1428.
  [83] Xu Jie, Glicksberg Benjamin S., Su Chang, Walker Peter, Bian Jiang, and Wang Fei. 2021. Federated learning for healthcare informatics. J. Healthcare Inform. Res. 5, 1 (2021), 1–19.
  [84] Xu Jie, Xu Zhenxing, Walker Peter, and Wang Fei. 2020. Federated patient hashing. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34. 6486–6493.
  [85] Xue Zeyue, Zhou Pan, Xu Zichuan, Wang Xiumin, Xie Yulai, Ding Xiaofeng, and Wen Shiping. 2021. A resource-constrained and privacy-preserving edge-computing-enabled clinical decision system: A federated reinforcement learning approach. IEEE Internet Things J. 8, 11 (2021), 9122–9138.
  [86] Yao Andrew C. 1982. Protocols for secure computations. In Proceedings of the 23rd Annual Symposium on Foundations of Computer Science (SFCS'82). IEEE, 160–164.
