- Omar Maddouri, Xiaoning Qian, Francis J. Alexander, Edward R. Dougherty, Byung-Jun Yoon. Robust importance sampling for error estimation in the context of optimal Bayesian transfer learning. Patterns, 2022; 100428 DOI: 10.1016/j.patter.2021.100428
In data-driven machine learning, models are built to make predictions and estimates from a given data set. One important area of machine learning is classification, in which an algorithm assigns the items in a data set to classes or categories. When the available data sets are very small, it can be challenging not only to build a classification model from the data but also to evaluate the model’s performance and verify its accuracy. This is where transfer learning comes into play.
“In transfer learning, we try to transfer knowledge or bring data from another domain to see whether we can enhance the task that we are doing in the domain of interest, or target domain,” Maddouri explained.
The target domain is where the models are built and their performance is evaluated. The source domain is a separate but related domain from which knowledge is transferred to make the analysis in the target domain easier.
Maddouri’s project uses a joint prior density to model the relatedness between the source and target domains and takes a Bayesian approach, applying transfer learning principles to derive an error estimator for the models. An error estimator provides an estimate of how accurately these machine-learning models classify the data sets at hand.
In practice, before any data is observed, the team builds a model from their initial inferences about the model parameters in the target and source domains, and then updates this model, improving its accuracy, as more evidence about the data sets becomes available.
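The idea of starting from a prior belief and refining it as evidence arrives can be illustrated with a much simpler case than the one in the study. The sketch below is an assumption-laden stand-in, not the team's model: it places a Beta prior on a single unknown misclassification rate (rather than a joint prior over source and target parameters) and updates it with counts of wrong and right predictions. The class name `BetaBelief` and the counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaBelief:
    """Beta(alpha, beta) belief about an unknown error rate in [0, 1].

    Hypothetical illustration only; not the joint prior used in the study.
    """
    alpha: float
    beta: float

    def mean(self) -> float:
        # Expected error rate under the current belief.
        return self.alpha / (self.alpha + self.beta)

    def update(self, errors: int, correct: int) -> "BetaBelief":
        # Conjugate update: observed errors add to alpha,
        # observed correct predictions add to beta.
        return BetaBelief(self.alpha + errors, self.beta + correct)


# Before any data is observed: a uniform prior over the error rate.
prior = BetaBelief(alpha=1.0, beta=1.0)

# A small batch of evidence (made-up counts): 2 errors in 10 predictions.
posterior_small = prior.update(errors=2, correct=8)

# More evidence arrives: 18 further errors in 90 further predictions.
posterior_large = posterior_small.update(errors=18, correct=72)

print(prior.mean(), posterior_small.mean(), posterior_large.mean())
```

With each batch, the expected error rate moves away from the uninformed starting value and toward the rate observed in the data, which is the behavior the paragraph above describes.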
Transfer learning of this kind has been used to build models in previous work; however, it had not previously been used to propose novel error estimators for evaluating the performance of these models. To make the estimator efficient to use, it was implemented with advanced statistical methods that enable fast screening of source data sets, improving the computational efficiency of the transfer learning process by a factor of 10 to 20.
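The journal reference for this work names importance sampling as the statistical method involved. As a rough illustration of that general idea only, and not the robust estimator from the paper, the sketch below uses self-normalized importance sampling to estimate an expectation under one distribution using samples drawn from another. The two normal distributions, the threshold and the function names are all assumptions chosen for the example.

```python
import math
import random


def normal_pdf(x: float, mean: float, std: float) -> float:
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def importance_sampling_estimate(f, target_pdf, proposal_pdf,
                                 proposal_sampler, n, rng):
    """Self-normalized importance sampling estimate of E_target[f(X)].

    Samples come from the proposal; each one is reweighted by the
    ratio target_pdf(x) / proposal_pdf(x).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for _ in range(n):
        x = proposal_sampler(rng)
        w = target_pdf(x) / proposal_pdf(x)
        weighted_sum += w * f(x)
        weight_total += w
    return weighted_sum / weight_total


rng = random.Random(0)  # fixed seed so the run is repeatable

# Toy quantity: probability that a standard normal value exceeds 1,
# estimated from samples of a normal distribution centered at 1.
estimate = importance_sampling_estimate(
    f=lambda x: 1.0 if x > 1.0 else 0.0,
    target_pdf=lambda x: normal_pdf(x, 0.0, 1.0),
    proposal_pdf=lambda x: normal_pdf(x, 1.0, 1.0),
    proposal_sampler=lambda r: r.gauss(1.0, 1.0),
    n=20000,
    rng=rng,
)

# Closed-form value of the same probability, for comparison.
exact = 0.5 * math.erfc(1.0 / math.sqrt(2.0))

print(estimate, exact)
```

The estimate should land close to the closed-form value, showing how samples from one distribution can be reweighted to answer a question about another.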
This technique can serve as a benchmark for future academic research to build upon. It can also help classify medical conditions that would otherwise be very difficult to classify. For example, Maddouri used the technique to classify patients with schizophrenia using transcriptomic data from brain tissue samples originally acquired by invasive brain biopsies. Because of the nature and location of the brain region that can be analyzed for this disorder, the data collected is very limited. However, using a stringent feature selection procedure that comprises differential gene expression analysis and statistical tests of the validity of the model assumptions, the research team identified transcriptomic profiles of three genes from an additional brain region that independent studies have reported to be highly relevant to the brain tissue of interest.
This knowledge allowed them to use transfer learning to leverage samples collected from the second brain region (source domain) and significantly boost the accuracy of diagnosis within the original brain region (target domain). The data gathered from the source domain can be used for exploration when information from the target domain is lacking, allowing the research team to enhance the quality of their conclusions.
This research has been funded by the Department of Energy and the National Science Foundation.
New insight into machine-learning error estimation