AI and Federated Learning
The availability of large-scale datasets has accelerated the development of artificial intelligence. The scarcity of large-scale medical datasets, however, limits the use of machine learning in healthcare. The lack of publicly available multi-centric and diverse datasets is primarily due to concerns about confidentiality and privacy when sharing medical data. A case study was conducted using a differentially private federated learning framework for the analysis of histopathology images, the largest and possibly most complex medical images, to illustrate a viable path forward in medical imaging. The effects of independent and identically distributed (IID) and non-IID data, the number of healthcare providers such as hospitals and clinics, and individual dataset sizes were investigated by simulating a distributed environment with various datasets from a public repository. According to recent studies, differentially private federated learning is a viable and dependable framework for the collaborative development of machine learning models in medical image analysis.
In many domains, deep neural networks have achieved state-of-the-art results. Deep learning models are data-intensive, often requiring millions of training examples to learn effectively. Medical images may contain confidential and sensitive information about patients that cannot always be shared outside the institutions where they were created, especially when complete de-identification is not possible. Some organisations have policies and procedures in place for storing and exchanging personally identifiable information and health data. As a result, large archives of medical data from various consortia remain untapped sources of information. Histopathology images, for example, cannot be collected and shared in large numbers because of these regulations and because their high, gigapixel resolution makes the files very large. Bias or a lack of diversity in images from a single institution calls for a collaborative approach that does not require data centralisation. One solution to this issue is collaborative data sharing (CDS) or federated learning among hospitals.
In contrast to traditional learning algorithms, federated learning (FL) algorithms learn from decentralised data distributed across multiple client devices. Most FL setups include a central server that coordinates the training of a shared model while also addressing critical issues such as data privacy, security, access rights, and heterogeneity. In FL, each client trains a local copy of the shared model, represented by the model weights, and reports its updates to the server, which aggregates them across clients without any client disclosing its local private data. FL is particularly relevant for histopathology departments because it allows institutions to collaborate without sharing private patient data. Domain adaptation is a significant challenge when applying FL to medical images, particularly histopathology. Because hospitals use a variety of imaging methods and devices, images from different hospitals can differ significantly, and machine learning methods run the risk of overfitting to non-semantic differences between them. When applied to images from previously unseen hospitals, FL-trained models can suffer significant performance drops.
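The client and server roles described above can be illustrated with a minimal sketch of one common aggregation rule, federated averaging, in which the server averages client weights in proportion to each client's local dataset size. This is an illustration under stated assumptions, not the method of any particular study: the function names (`local_update`, `federated_average`, `run_round`), the toy linear model, and the learning rate are all hypothetical choices made for brevity.

```python
from typing import List, Tuple

Sample = Tuple[List[float], float]  # (features, target); a stand-in for real data


def local_update(weights: List[float], data: List[Sample],
                 lr: float = 0.05, epochs: int = 5) -> List[float]:
    """Train a local copy of the shared model on one client's private data.

    Toy linear model with mean squared error; real clients would train a
    deep network on their own images instead.
    """
    w = list(weights)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for i, xi in enumerate(x):
                grad[i] += 2.0 * err * xi / len(data)
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


def federated_average(updates: List[Tuple[List[float], int]]) -> List[float]:
    """Server side: average client weights, weighted by local sample count."""
    if not updates:
        raise ValueError("need at least one client update")
    total = sum(n for _, n in updates)
    averaged = [0.0] * len(updates[0][0])
    for weights, n in updates:
        for i, w in enumerate(weights):
            averaged[i] += w * n / total
    return averaged


def run_round(global_weights: List[float],
              client_datasets: List[List[Sample]]) -> List[float]:
    """One communication round: only weights leave each client, never data."""
    updates = [(local_update(global_weights, data), len(data))
               for data in client_datasets]
    return federated_average(updates)


if __name__ == "__main__":
    # Two hypothetical hospitals whose data follow the same rule, y = 2x.
    hospital_a = [([1.0], 2.0), ([2.0], 4.0)]
    hospital_b = [([3.0], 6.0)]
    weights = [0.0]
    for _ in range(10):
        weights = run_round(weights, [hospital_a, hospital_b])
    print(weights)
```

In this sketch the server only ever sees weight vectors and sample counts. Weighting by sample count is one possible design choice; with strongly non-IID clients, other aggregation rules may behave differently.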
One proposed solution is a privacy-preserving approach that learns a generalisable FL model across clients using an effective continuous frequency-space interpolation mechanism. Sharing frequency-domain information allows semantic information to be separated from noise in the original images. Other researchers approach the problem of domain adaptation by using a physics-driven generative approach to disentangle information about model and geometry from imaging sensors. There is a vast reservoir of knowledge in hospital bulk archives of clinical data that remains relatively unexplored because of numerous confidentiality and privacy concerns. Some researchers have proposed differentially private federated learning as a method for learning from decentralised medical data, such as histopathology images. Federated learning enables models to be trained without explicitly sharing patient information, alleviating some of the confidentiality and privacy concerns associated with clinical data. Because differentially private federated learning has been reported to produce results comparable to traditional centralised training, it could be considered for distributed training on medical data.
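To make the idea of a differentially private update more concrete, the following is a minimal sketch of one widely used building block, the Gaussian mechanism: each client's update is clipped to a maximum L2 norm and Gaussian noise scaled to that norm is added before the update leaves the client. This is an assumption about how such a framework could be built, not a description of the case study's implementation. The names `clip_update` and `privatize_update` and the parameters `clip_norm` and `noise_multiplier` are hypothetical, and the privacy accounting that turns these settings into a formal guarantee is omitted.

```python
import math
import random
from typing import List, Optional


def clip_update(update: List[float], clip_norm: float) -> List[float]:
    """Scale the update down so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(u * u for u in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [u * scale for u in update]


def privatize_update(update: List[float], clip_norm: float,
                     noise_multiplier: float,
                     rng: Optional[random.Random] = None) -> List[float]:
    """Clip the update, then add Gaussian noise proportional to clip_norm.

    Clipping bounds how much any single client can influence the result;
    the noise then masks that bounded contribution.
    """
    rng = rng or random.Random()
    sigma = noise_multiplier * clip_norm  # noise standard deviation
    return [u + rng.gauss(0.0, sigma) for u in clip_update(update, clip_norm)]


if __name__ == "__main__":
    raw_update = [3.0, 4.0]  # hypothetical weight delta with L2 norm 5
    noisy = privatize_update(raw_update, clip_norm=1.0,
                             noise_multiplier=1.1, rng=random.Random(0))
    print(noisy)
```

A larger `noise_multiplier` gives stronger privacy at the cost of noisier updates, so in practice the value would be chosen together with the number of clients and training rounds.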