
Data-Distributed Deep Learning using Federated Learning: A Case Study

Sarma, K. V., Harmon, S. A., Sanford, T. H., Roth, H. R., Flores, M. G., Kulkarni, R., Wood, B. J., Choyke, P. L., Raman, S. S., Enzmann, D. R., Turkbey, B., Speier, W., and Arnold, C. W.
In Annual Meeting of the Radiological Society of North America · 2020
Abstract

Background: One challenge in translating deep learning models to the clinic is a lack of generalizability across institutions, due to the poor availability of multi-institutional datasets. The creation of such datasets is often constrained by data governance restrictions as well as by patient privacy and other ethical concerns. It is therefore desirable to train such models across institutions without sharing the underlying data (i.e., "data-distributed learning"). Here, we present a case study on the use of federated learning (FL), a data-distributed learning methodology.

Evaluation: 343 T2-weighted images were retrieved from the ProstateX Challenge dataset and annotated with prostate contours. 43 images were reserved as a held-out test set. Two models were trained on the remaining 300 cases, one using a pooled-data (PD) approach and one using a data-distributed FL approach. For all experiments, the 3D AH-NET was used as the deep learning model, the soft Dice loss was used with the Adam optimizer, and the evaluation metric was the Dice criterion. For the benchmark PD model, the 300 cases were split into 240 training cases and 60 validation cases, and the model was trained for 300 epochs. It obtained a mean Dice score of 0.911 on the held-out test set. For the experimental FL model, the 300 cases were split into three subsets of 100 cases, one subset was distributed to each of our three institutions, and each subset was similarly split into training and validation sets. Each institution then trained the model for 300 epochs; after each epoch, the model weights were collected, averaged, and redistributed to each institution for use in the next epoch. After 300 training epochs, each institution’s model was used to predict segmentations for the held-out test set, resulting in mean Dice scores of 0.909, 0.905, and 0.910.
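
The per-epoch "collect, average, redistribute" step described above can be illustrated with a minimal sketch. This is not the study's code: the function names (`average_weights`, `federated_round`), the dictionary-of-arrays weight format, and the unweighted mean are assumptions made for illustration. An unweighted mean is used because each institution held the same number of cases (100); with unequal subsets, a case-weighted mean would be the usual choice.

```python
import numpy as np


def average_weights(institution_weights):
    """Return the element-wise, unweighted mean of each named parameter.

    `institution_weights` is a list with one dict per institution, mapping
    a parameter name to a NumPy array (a hypothetical weight format).
    """
    names = institution_weights[0].keys()
    return {
        name: np.mean([w[name] for w in institution_weights], axis=0)
        for name in names
    }


def federated_round(global_weights, local_train_fns):
    """Run one epoch of the data-distributed scheme.

    Each institution starts from a copy of the shared weights, trains on
    its own data via its (hypothetical) `local_train_fn`, and returns the
    updated weights. Only weights leave an institution, never images.
    """
    local_results = []
    for train_fn in local_train_fns:
        start = {k: v.copy() for k, v in global_weights.items()}
        local_results.append(train_fn(start))
    # The averaged weights are what every institution uses next epoch.
    return average_weights(local_results)
```

In use, `federated_round` would be called once per epoch (300 times in the setup described above), with each `local_train_fn` wrapping one institution's training loop over its own training cases.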

Discussion: Our results showed comparable performance from the experimental FL models (mean Dice 0.908 across the three institutions) and the benchmark PD model (0.911). This indicates that the FL approach enabled learning from the whole dataset without the need to transfer data between institutions.
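
As a rough illustration of the figures being compared, the sketch below computes a Dice score for a pair of binary masks and reproduces the 0.908 figure as the mean of the three reported per-institution scores. The `dice_score` helper and its `eps` smoothing term are assumptions for illustration, and the abstract does not state that 0.908 was derived this way; the arithmetic simply agrees with it.

```python
import numpy as np


def dice_score(pred, target, eps=1e-7):
    """Dice overlap between two binary masks: 2|A ∩ B| / (|A| + |B|).

    `eps` is a small smoothing term (an assumption here) that avoids
    division by zero when both masks are empty.
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)


# Mean Dice scores reported for the three institutions' FL models.
fl_scores = [0.909, 0.905, 0.910]
fl_mean = round(sum(fl_scores) / len(fl_scores), 3)

# Mean Dice score reported for the pooled-data benchmark.
pd_score = 0.911
gap = round(pd_score - fl_mean, 3)
```

Here `fl_mean` matches the 0.908 quoted for the FL approach, and `gap` is the difference from the pooled-data benchmark.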

Conclusion: Our results validated the data-distributed FL approach. This outcome enables future work using non-pooled datasets to enhance model transportability and clinical utility.

★ Winner, 2020 RSNA Trainee Research Award in Medical Informatics
BibTeX
@inproceedings{Sarma2020a,
  author = {Sarma, K. V. and Harmon, S. A. and Sanford, T. H. and Roth, H. R. and Flores, M. G. and Kulkarni, R. and Wood, B. J. and Choyke, P. L. and Raman, S. S. and Enzmann, D. R. and Turkbey, B. and Speier, W. and Arnold, C. W.},
  booktitle = {Annual Meeting of the Radiological Society of North America},
  title = {{Data-Distributed Deep Learning using Federated Learning: A Case Study}},
  year = {2020},
  award = {Winner, 2020 RSNA Trainee Research Award in Medical Informatics}
}