Background and Objective: Social media presents a rich opportunity to gather health information with limited intervention. We sought to estimate the health-related quality of life (HRQOL) of Twitter users using automated semantic processing methods.
Methods: We collected tweets from 878 Twitter users recruited through online solicitation and in-person contact with patients. All participants completed the four-item Centers for Disease Control Healthy Days Questionnaire at enrollment and again 30 days later to measure "ground truth" HRQOL. We used features derived from the participants’ tweets to estimate dichotomized HRQOL ("high" vs. "low").
Results: Binary HRQOL status was estimated with moderate accuracy (AUC = 0.64). The highest accuracy was achieved using a bagged generalized linear model with L1 regularization.
Conclusions: In this preliminary analysis, social media posts predicted HRQOL better than chance. These findings may provide direction for future studies with larger sample sizes.
@inproceedings{Sarma2018ml4h,
author = {Sarma, K. V. and Spiegel, B. M. R. and Reid, M. W. and Chen, S. and Merchant, R. M. and Seltzer, E. and Arnold, C. W.},
booktitle = {Machine Learning for Health at NeurIPS},
title = {{Can We Estimate the Health-Related Quality of Life of Twitter Users Using Tweets? A Feasibility Study}},
year = {2018},
}