ABSTRACT
We consider the actor-critic contextual bandit for mobile health (mHealth) interventions. State-of-the-art decision-making algorithms generally ignore outliers in the dataset. In this paper, we propose a novel robust contextual bandit method for mHealth. It achieves two seemingly conflicting goals: reducing the influence of outliers, while finding a solution similar to that of state-of-the-art contextual bandit methods on datasets without outliers. This performance relies on two techniques: (1) the capped-L2 norm; and (2) a reliable method for setting the threshold hyper-parameter, inspired by one of the most fundamental techniques in statistics. Although the model is non-convex and non-differentiable, we propose an effective reweighted algorithm and provide solid theoretical analyses. We prove that the proposed algorithm sufficiently decreases the objective function value at each iteration and converges after a finite number of iterations. Extensive experimental results on two datasets demonstrate that our method achieves almost identical results to state-of-the-art contextual bandit methods on the dataset without outliers, and significantly outperforms them on the heavily noised dataset with outliers across a variety of parameter settings.
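To make the two ingredients named above concrete, the following is a minimal sketch on a toy ridge regression, not the actor-critic bandit studied in the paper. It assumes that the capped-L2 loss takes the form min(r², ε) per sample, that the threshold ε is set by a boxplot-style upper fence on the squared residuals (the abstract does not name the statistical technique, so this choice is an assumption), and that the reweighting step assigns weight 1 to samples under the cap and 0 otherwise. All function names and default values are hypothetical.

```python
import numpy as np


def capped_l2_loss(residuals, eps):
    """Sum of squared residuals, each capped at eps (assumed loss form)."""
    return np.minimum(residuals ** 2, eps).sum()


def boxplot_threshold(sq_residuals, k=1.5):
    """Upper boxplot fence Q3 + k * IQR (an assumed rule for the threshold)."""
    q1, q3 = np.percentile(sq_residuals, [25, 75])
    return q3 + k * (q3 - q1)


def reweighted_capped_ridge(X, y, lam=1e-3, n_iter=20):
    """Illustrative reweighted solver for capped-L2 ridge regression."""
    d = X.shape[1]
    reg = lam * np.eye(d)
    # Start from the ordinary ridge solution.
    w = np.linalg.solve(X.T @ X + reg, X.T @ y)
    # Fix the cap once, from the residuals of the initial fit.
    eps = boxplot_threshold((y - X @ w) ** 2)
    for _ in range(n_iter):
        # Weight 1 for samples under the cap, 0 for suspected outliers.
        u = ((y - X @ w) ** 2 <= eps).astype(float)
        Xu = X * u[:, None]
        # Weighted ridge: (X^T diag(u) X + lam I) w = X^T diag(u) y.
        w_new = np.linalg.solve(Xu.T @ X + reg, Xu.T @ y)
        if np.allclose(w_new, w):
            break
        w = w_new
    return w, eps
```

With weights restricted to 0 or 1, each iteration is a ridge fit on the samples currently treated as inliers, so samples whose squared residual exceeds ε stop influencing the solution.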
Index Terms
- Robust Actor-Critic Contextual Bandit for Mobile Health (mHealth) Interventions