DOI: 10.1145/3233547.3233554
research-article

Robust Actor-Critic Contextual Bandit for Mobile Health (mHealth) Interventions

Published: 15 August 2018

ABSTRACT

We consider the actor-critic contextual bandit for mobile health (mHealth) interventions. State-of-the-art decision-making algorithms generally ignore outliers in the dataset. In this paper, we propose a novel robust contextual bandit method for mHealth. It achieves two seemingly conflicting goals: reducing the influence of outliers, while finding a solution similar to that of state-of-the-art contextual bandit methods on datasets without outliers. This performance relies on two techniques: (1) the capped-L2 norm; (2) a reliable method for setting the threshold hyper-parameter, inspired by one of the most fundamental techniques in statistics. Although the model is non-convex and non-differentiable, we propose an effective reweighted algorithm and provide solid theoretical analyses. We prove that the proposed algorithm sufficiently decreases the objective function value at each iteration and converges after a finite number of iterations. Extensive experimental results on two datasets demonstrate that our method achieves almost identical results to state-of-the-art contextual bandit methods on the dataset without outliers, and significantly outperforms those methods on the heavily noisy dataset with outliers across a variety of parameter settings.
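
To make the two ingredients named in the abstract more concrete, the following is a minimal sketch under stated assumptions, not the paper's algorithm. It assumes one common squared form of the capped loss, min(r^2, eps^2), applied to scalar residuals. It also assumes the threshold eps is taken from the upper boxplot fence Q3 + 1.5 * IQR of the absolute residuals of an ordinary least-squares fit; the abstract does not name the statistical technique, so this choice is a guess. A simple line fit stands in for the critic update, and all function names are hypothetical.

```python
from statistics import quantiles


def capped_l2_loss(residuals, eps):
    """Sum of min(r^2, eps^2): a residual beyond eps contributes a constant."""
    return sum(min(r * r, eps * eps) for r in residuals)


def boxplot_threshold(abs_residuals, k=1.5):
    """Upper boxplot fence Q3 + k * IQR (an assumed way to choose eps)."""
    q1, _, q3 = quantiles(abs_residuals, n=4)
    return q3 + k * (q3 - q1)


def weighted_line_fit(xs, ys, ws):
    """Closed-form weighted least squares for the line y = a * x + b."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def robust_line_fit(xs, ys, max_iter=50):
    """Reweighted minimisation of the capped loss with 0/1 weights."""
    ws = [1.0] * len(xs)
    a, b = weighted_line_fit(xs, ys, ws)  # plain least squares to start
    residuals = [y - (a * x + b) for x, y in zip(xs, ys)]
    # Fix the threshold once, from the initial residuals (assumption).
    eps = boxplot_threshold([abs(r) for r in residuals])
    for _ in range(max_iter):
        # Points whose residual reaches the cap get weight 0.
        new_ws = [1.0 if abs(r) < eps else 0.0 for r in residuals]
        if new_ws == ws or sum(new_ws) < 2:
            break  # weights stopped changing, or too few points remain
        ws = new_ws
        a, b = weighted_line_fit(xs, ys, ws)
        residuals = [y - (a * x + b) for x, y in zip(xs, ys)]
    return a, b, eps
```

With eps held fixed, each refit minimises a surrogate that upper-bounds the capped loss and touches it at the current estimate, so the capped loss cannot increase from one iteration to the next. Because there are only finitely many 0/1 weight patterns, the loop stops after finitely many steps. This mirrors, in a toy setting, the kind of guarantee the abstract claims. On data lying exactly on a line with a single gross outlier, the sketch zeroes out the outlier and recovers the line.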


Published in

BCB '18: Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics
August 2018
727 pages
ISBN: 9781450357944
DOI: 10.1145/3233547

Copyright © 2018 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Acceptance Rates

BCB '18 paper acceptance rate: 46 of 148 submissions, 31%. Overall acceptance rate: 254 of 885 submissions, 29%.
