PRIVACY-PRESERVING MACHINE LEARNING FOR CANCER PREDICTION

Team:

Program: Biomedical Engineering

Machine learning plays an increasingly critical role in health science because it can infer valuable information from high-dimensional data. More training data gives machine learning models greater statistical power, which aids decision-making in healthcare. However, obtaining more data often requires combining patient data across institutions and hospitals, which is not always possible because of privacy considerations, including laws such as HIPAA and institutional review board policies. These policies are in place because many common machine learning methods can inadvertently memorize their training data and can be reverse engineered to expose the individuals who contributed data for training. Here we outline a simple federated learning algorithm that implements differential privacy to protect patient privacy when training a machine learning model on data spread across different institutions. We tested our model by predicting breast cancer status from publicly available gene expression data. With privacy enforced, our model achieves accuracy and precision similar to those of a single-site, non-differentially private (un-noised) neural network model. This result suggests that our algorithm is an effective method of implementing differential privacy with federated learning. Clinical data scientists with no prior experience in differential privacy can use our general technique to produce differentially private models on federated datasets.
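The abstract does not give the details of the algorithm, so the following is only a minimal sketch of how federated learning and differential privacy are commonly combined, not the team's implementation. It assumes a logistic regression model instead of a neural network, per-patient gradient clipping with Gaussian noise added at each site before anything is shared, and a server that averages the noisy site gradients. All names and parameters (`noisy_site_gradient`, `federated_train`, `clip_norm`, `noise_multiplier`) are hypothetical, and the sketch does not compute the resulting privacy budget.

```python
import numpy as np

def noisy_site_gradient(w, X, y, clip_norm, noise_multiplier, rng):
    """Compute one site's privatized gradient (assumed scheme, not the authors').

    Each patient's gradient is clipped to L2 norm <= clip_norm, the clipped
    gradients are summed, Gaussian noise is added, and the result is averaged.
    """
    preds = 1.0 / (1.0 + np.exp(-(X @ w)))           # logistic predictions
    per_example = (preds - y)[:, None] * X            # shape (n_patients, n_features)
    norms = np.linalg.norm(per_example, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example * scale                     # bound each patient's influence
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=w.shape)
    return (clipped.sum(axis=0) + noise) / len(y)

def federated_train(sites, n_features, rounds=50, lr=0.5,
                    clip_norm=1.0, noise_multiplier=1.0, seed=0):
    """Train on data held at several sites; raw patient data never leaves a site.

    `sites` is a list of (X, y) pairs, one per institution. Only the noisy
    gradients are sent to the server, which averages them each round.
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(n_features)
    for _ in range(rounds):
        grads = [noisy_site_gradient(w, X, y, clip_norm, noise_multiplier, rng)
                 for X, y in sites]
        w -= lr * np.mean(grads, axis=0)              # server-side aggregation
    return w

if __name__ == "__main__":
    # Synthetic stand-in data; no real gene expression values are used here.
    data_rng = np.random.default_rng(42)
    true_w = np.array([1.5, -2.0, 0.5])
    sites = []
    for n in (80, 120, 100):
        X = data_rng.normal(size=(n, 3))
        y = (X @ true_w > 0).astype(float)
        sites.append((X, y))
    w = federated_train(sites, n_features=3)
    print("learned weights:", w)
```

Setting `noise_multiplier=0.0` in this sketch corresponds to the un-noised baseline, which makes it straightforward to compare a private and a non-private run on the same data. A larger `noise_multiplier` gives stronger privacy at the cost of noisier updates.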

Gamze Gürsoy (Department of Biomedical Informatics, Columbia University; Core Faculty, New York Genome Center)
Mark Gerstein (Department of Molecular Biophysics and Biochemistry, Department of Computer Science, Department of Statistics and Data Science, Yale University)

Mentors

Constanza Miranda, Ph.D. (BME Johns Hopkins University)

Team Members

Project Links

