research-article

Examining User Heterogeneity in Digital Experiments

Authors:
Sriram Somanchi, University of Notre Dame, Notre Dame, IN, USA (ORCID: 0000-0002-3153-1248)
Ahmed Abbasi, University of Notre Dame, Notre Dame, IN, USA (ORCID: 0000-0001-7698-7794)
Ken Kelley, University of Notre Dame, Notre Dame, IN, USA (ORCID: 0000-0002-4756-8360)
David Dobolyi, University of Colorado, Boulder, CO, USA (ORCID: 0000-0002-9493-3447)
Ted Tao Yuan, eBay, San Jose, CA, USA (ORCID: 0000-0002-5876-4826)

ACM Transactions on Information Systems, Volume 41, Issue 4, Article No. 100, pp. 1–34. https://doi.org/10.1145/3578931

Published: 22 March 2023

Abstract

Digital experiments are routinely used to test the value of a treatment relative to a status-quo control setting, for instance, a new search relevance algorithm for a website or a new results layout for a mobile app. As digital experiments have become increasingly pervasive in organizations and across a wide variety of research areas, their growth has prompted a new set of challenges for experimentation platforms. One challenge is that experiments often focus on the average treatment effect (ATE) without explicitly considering differences across major subgroups, that is, heterogeneous treatment effects (HTEs). This is especially problematic because ATEs have decreased in many organizations as the more obvious benefits have already been realized. Yet questions remain about how pervasive user HTEs are and how best to detect them. We propose a framework for detecting and analyzing user HTEs in digital experiments. Our framework combines an array of user characteristics with double machine learning. Analysis of 27 real-world experiments spanning 1.76 billion sessions, along with simulated data, demonstrates the effectiveness of our detection method relative to existing techniques. We also find that transaction, demographic, engagement, satisfaction, and lifecycle characteristics exhibit statistically significant HTEs in 10% to 20% of our real-world experiments. This underscores the importance of considering user heterogeneity when analyzing experiment results; otherwise, personalized features and experiences cannot be delivered, which reduces effectiveness. In terms of the number of experiments and user sessions, we are not aware of any study that has examined user HTEs at this scale. Our findings have important implications for information retrieval, user modeling, platforms, and digital experience contexts, in which online experiments are often used to evaluate the effectiveness of design artifacts.
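To make the idea of combining a user characteristic with double machine learning (DML) concrete, the following is a minimal Python sketch of the general DML approach for a single binary characteristic. It is not the authors' implementation. It assumes a randomized experiment with a known assignment probability of 0.5, and the function names, the group-mean outcome model, the hypothetical "new user" flag, and the simulated data are illustrative choices only. The outcome is residualized on the characteristic using cross-fitted predictions, the treatment is centered by its known propensity, and the residualized outcome is regressed on the centered treatment and its interaction with the characteristic; the interaction coefficient estimates the HTE. In the simulated data the ATE is about 0.5 + 0.3 × 1.0 = 0.8, which hides that users with the flag gain three times as much as users without it.

```python
import math
import random
from collections import defaultdict


def crossfit_group_means(y, x, n_folds=2, seed=0):
    """Out-of-fold estimate of E[Y | X] for a discrete user characteristic X.

    Each user's prediction uses only folds that exclude that user
    (the cross-fitting step of double machine learning).
    """
    rng = random.Random(seed)
    folds = [rng.randrange(n_folds) for _ in y]
    m_hat = [0.0] * len(y)
    for k in range(n_folds):
        sums, counts = defaultdict(float), defaultdict(int)
        for yi, xi, f in zip(y, x, folds):
            if f != k:  # fit on the training folds only
                sums[xi] += yi
                counts[xi] += 1
        fallback = sum(sums.values()) / max(1, sum(counts.values()))
        for i, (xi, f) in enumerate(zip(x, folds)):
            if f == k:  # predict for the held-out fold
                m_hat[i] = sums[xi] / counts[xi] if counts[xi] else fallback
    return m_hat


def dml_binary_hte(y, t, x, propensity=0.5, n_folds=2, seed=0):
    """Fit Y - m(X) = (T - e) * (theta0 + theta1 * x) + noise by least squares.

    theta0: treatment effect for users with x == 0.
    theta1: HTE, i.e. the additional effect for users with x == 1.
    Returns (theta0, theta1, se_theta1, two_sided_p_value_theta1).
    """
    m_hat = crossfit_group_means(y, x, n_folds, seed)
    y_res = [yi - mi for yi, mi in zip(y, m_hat)]
    # Assumption: randomized assignment, so the propensity e is known by design.
    z = [(ti - propensity, (ti - propensity) * xi) for ti, xi in zip(t, x)]

    # Closed-form OLS with two regressors and no intercept.
    a = sum(z1 * z1 for z1, _ in z)
    b = sum(z1 * z2 for z1, z2 in z)
    d = sum(z2 * z2 for _, z2 in z)
    c1 = sum(z1 * r for (z1, _), r in zip(z, y_res))
    c2 = sum(z2 * r for (_, z2), r in zip(z, y_res))
    det = a * d - b * b
    inv = ((d / det, -b / det), (-b / det, a / det))
    theta0 = inv[0][0] * c1 + inv[0][1] * c2
    theta1 = inv[1][0] * c1 + inv[1][1] * c2

    # Heteroskedasticity-robust (HC0) sandwich variance of theta1.
    s11 = s12 = s22 = 0.0
    for (z1, z2), r in zip(z, y_res):
        e2 = (r - theta0 * z1 - theta1 * z2) ** 2
        s11 += e2 * z1 * z1
        s12 += e2 * z1 * z2
        s22 += e2 * z2 * z2
    w0, w1 = inv[1]
    se1 = math.sqrt(w0 * w0 * s11 + 2 * w0 * w1 * s12 + w1 * w1 * s22)
    p_value = math.erfc(abs(theta1 / se1) / math.sqrt(2))  # normal approximation
    return theta0, theta1, se1, p_value


if __name__ == "__main__":
    rng = random.Random(42)
    n = 20_000
    x = [1 if rng.random() < 0.3 else 0 for _ in range(n)]  # hypothetical "new user" flag
    t = [1 if rng.random() < 0.5 else 0 for _ in range(n)]  # 50/50 random assignment
    # Simulated truth: effect 0.5 for x == 0 and 1.5 for x == 1, so the HTE is 1.0.
    y = [1.0 + 2.0 * xi + ti * (0.5 + 1.0 * xi) + rng.gauss(0, 1) for xi, ti in zip(x, t)]
    theta0, theta1, se1, p = dml_binary_hte(y, t, x)
    print(f"effect (x=0): {theta0:.3f}  HTE: {theta1:.3f}  se: {se1:.3f}  p: {p:.2g}")
```

Group means serve as the outcome model here only because they keep the sketch dependency-free for a discrete characteristic; with many continuous characteristics, a flexible learner fit with the same cross-fitting scheme would take their place.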

Index Terms

Computing methodologies > Machine learning

Recommendations

Automated user modeling for personalized digital libraries

Digital libraries (DLs) have become one of the most typical ways of accessing any kind of digitalized information. Due to this key role, users welcome any improvements on the services they receive from DLs. One trend used to improve digital services is ...
Evaluating Intelligent User Interfaces with User Experiments
IUI '16 Companion: Companion Publication of the 21st International Conference on Intelligent User Interfaces

User experiments are an essential tool to evaluate the user experience of intelligent user interfaces. This tutorial teaches the practical aspects of designing and setting up user experiments, as well as state-of-the-art methods to statistically ...
Cross-representation mediation of user models

Personalization is considered a powerful methodology for improving the effectiveness of information search and decision making. It has led to the dissemination of systems capable of suggesting relevant and personalized information (or items) to the users,...

Published in
ACM Transactions on Information Systems, Volume 41, Issue 4
October 2023
958 pages
ISSN: 1046-8188
EISSN: 1558-2868
DOI: 10.1145/3587261
Editor: Min Zhang, Tsinghua University, China
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
Publisher: Association for Computing Machinery, New York, NY, United States
Publication History
- Published: 22 March 2023
- Online AM: 12 January 2023
- Accepted: 17 December 2022
- Revised: 20 October 2022
- Received: 8 April 2022
Author Tags
Heterogeneous treatment effects
digital experiments
user heterogeneity
user modeling
double machine learning
Article Metrics
- Total Citations: 1
- Total Downloads: 451
- Downloads (last 12 months): 308
- Downloads (last 6 weeks): 38