research-article

Public Access

Robust Actor-Critic Contextual Bandit for Mobile Health (mHealth) Interventions

Authors:
Feiyun Zhu, The University of Texas at Arlington, Arlington, TX, USA
Jun Guo, University of Michigan, Ann Arbor, MI, USA
Ruoyu Li, The University of Texas at Arlington, Arlington, TX, USA
Junzhou Huang, The University of Texas at Arlington; Tencent AI Lab, Arlington, TX, USA

BCB '18: Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, August 2018, pages 492–501. https://doi.org/10.1145/3233547.3233554

Published: 15 August 2018


ABSTRACT

We consider the actor-critic contextual bandit for mobile health (mHealth) interventions. State-of-the-art decision-making algorithms generally ignore outliers in the dataset. In this paper, we propose a novel robust contextual bandit method for mHealth. It achieves two conflicting goals: reducing the influence of outliers, while producing solutions similar to those of state-of-the-art contextual bandit methods on datasets without outliers. This performance relies on two techniques: (1) the capped-ℓ2 norm; and (2) a reliable method for setting the threshold hyperparameter, inspired by one of the most fundamental techniques in statistics. Although the resulting model is non-convex and non-differentiable, we propose an effective reweighted algorithm to solve it and provide a theoretical analysis of its convergence. We prove that the algorithm sufficiently decreases the objective function value at each iteration and converges after a finite number of iterations. Extensive experiments on two datasets show that our method achieves nearly identical results to state-of-the-art contextual bandit methods on the dataset without outliers, and significantly outperforms them on the heavily noised dataset with outliers across a variety of parameter settings.
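To make the three ingredients named in the abstract concrete (a capped-ℓ2 loss, a statistics-inspired threshold, and a reweighted solver), here is a minimal Python sketch under loudly stated assumptions. It fits a scalar linear model y ≈ a + b·x, a hypothetical stand-in for the critic, by minimizing Σᵢ min(|rᵢ|, ε) over the residuals rᵢ. The threshold ε is fixed once from the initial least-squares residuals using the boxplot upper fence Q3 + 1.5·IQR; the abstract does not name its statistical technique, so this rule is our assumption. The solver is the generic majorization-minimization reweighting used in the capped-norm literature: residuals at or above ε get weight 0, the others get 1/(2|rᵢ|), and a weighted least-squares fit follows. This is not the authors' algorithm, and every function name below is hypothetical.

```python
import statistics


def capped_l2_objective(residuals, eps):
    """Capped-l2 loss for scalar residuals: sum_i min(|r_i|, eps)."""
    return sum(min(abs(r), eps) for r in residuals)


def boxplot_threshold(values, k=1.5):
    """Upper boxplot fence Q3 + k * IQR (assumed threshold rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 + k * (q3 - q1)


def weighted_line_fit(xs, ys, ws):
    """Closed-form weighted least squares for the model y ~ a + b * x."""
    total = sum(ws)
    if total <= 0:
        raise ValueError("all weights are zero; raise the cap or change the start")
    x_bar = sum(w * x for w, x in zip(ws, xs)) / total
    y_bar = sum(w * y for w, y in zip(ws, ys)) / total
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b


def capped_l2_fit(xs, ys, max_iter=50, tol=1e-10, delta=1e-8):
    """Reweighted minimization of sum_i min(|y_i - a - b * x_i|, eps).

    Returns (a, b, eps, history), where history holds the objective value
    after the initial fit and after every reweighted step.
    """
    # Start from ordinary least squares (every weight equal to 1).
    a, b = weighted_line_fit(xs, ys, [1.0] * len(xs))
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    # Hypothetical rule: fix the cap once from the initial residual sizes.
    eps = boxplot_threshold([abs(r) for r in residuals])
    history = [capped_l2_objective(residuals, eps)]
    for _ in range(max_iter):
        # Capped samples get weight 0; the rest get 1 / (2|r_i|), with
        # delta guarding against division by zero for exact fits.
        ws = [0.0 if abs(r) >= eps else 1.0 / (2.0 * max(abs(r), delta))
              for r in residuals]
        new_a, new_b = weighted_line_fit(xs, ys, ws)
        residuals = [y - (new_a + new_b * x) for x, y in zip(xs, ys)]
        history.append(capped_l2_objective(residuals, eps))
        converged = abs(new_a - a) + abs(new_b - b) < tol
        a, b = new_a, new_b
        if converged:
            break
    return a, b, eps, history


if __name__ == "__main__":
    xs = list(range(20))
    ys = [2.0 * x + 1.0 for x in xs]
    ys[10] += 60.0  # inject one gross outlier into otherwise clean data
    ols_a, ols_b = weighted_line_fit(xs, ys, [1.0] * len(xs))
    a, b, eps, history = capped_l2_fit(xs, ys)
    print(f"OLS:    a={ols_a:.3f}, b={ols_b:.3f}")
    print(f"capped: a={a:.3f}, b={b:.3f}, eps={eps:.3f}")
    print("objective per step:", [round(h, 3) for h in history])
```

On this toy data, ordinary least squares is pulled toward the outlier, while the capped fit assigns the outlier weight 0 and recovers the clean line. The design rests on the bound min(|r|, ε) ≤ |r| ≤ r²/(2|rₜ|) + |rₜ|/2 whenever 0 < |rₜ| < ε: each weighted fit minimizes an upper bound that equals the objective at the current iterate, so the recorded objective values should not increase (up to the small `delta` guard). This mirrors, in a toy setting, the monotone-decrease property the abstract claims.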



Published in
BCB '18: Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics
August 2018
727 pages
ISBN: 9781450357944
DOI: 10.1145/3233547
General Chairs: Amarda Shehu (George Mason University, USA), Cathy Wu (University of Delaware, USA)
Program Chairs: Christina Boucher (University of Florida, USA), Jing Li (Case Western Reserve University, USA), Hongfang Liu (Mayo Clinic, USA), Mihai Pop (University of Maryland, USA)
Copyright © 2018 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
actor-critic contextual bandit
markov decision process (mdp)
mobile health (mhealth) intervention
robust learning

Acceptance Rates
BCB '18 paper acceptance rate: 46 of 148 submissions, 31%
Overall acceptance rate: 254 of 885 submissions, 29%

Article Metrics
- 8 total citations
- 383 total downloads
- Downloads (last 12 months): 85
- Downloads (last 6 weeks): 7