Abstract
We discuss algorithms for learning and revising user profiles that can determine which World Wide Web sites on a given topic would be interesting to a user. We describe the use of a naive Bayesian classifier for this task, and demonstrate that it can incrementally learn profiles from user feedback on the interestingness of Web sites. Furthermore, the Bayesian classifier can easily be extended to revise user-provided profiles. In an experimental evaluation we compare the Bayesian classifier to computationally more intensive alternatives, and show that it performs at least as well as these approaches across a range of different domains. In addition, we empirically analyze the effects of providing the classifier with background knowledge in the form of user-defined profiles and examine the use of lexical knowledge for feature selection. We find that both approaches can substantially increase the prediction accuracy.
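The incremental learning described above can be illustrated with a minimal sketch. It assumes Boolean word-presence features, a fixed vocabulary, two classes, and Laplace smoothing; the class name `IncrementalNaiveBayes`, the labels `"interesting"` and `"not_interesting"`, and the example words are hypothetical and are not taken from the article, which does not specify these details in the abstract.

```python
import math
from collections import defaultdict

LABELS = ("interesting", "not_interesting")  # hypothetical label names


class IncrementalNaiveBayes:
    """Naive Bayes over Boolean word features, updated one rated page at a time."""

    def __init__(self, vocabulary, smoothing=1.0):
        self.vocabulary = set(vocabulary)   # assumed fixed feature set
        self.smoothing = smoothing          # Laplace pseudo-count (assumption)
        self.class_counts = {label: 0 for label in LABELS}
        self.word_counts = {label: defaultdict(int) for label in LABELS}

    def update(self, words, label):
        """Fold one user rating into the counts; no retraining is needed."""
        self.class_counts[label] += 1
        for word in self.vocabulary & set(words):
            self.word_counts[label][word] += 1

    def _log_score(self, present, label):
        n = self.class_counts[label]
        total = sum(self.class_counts.values())
        k = self.smoothing
        # Smoothed log prior of the class.
        score = math.log((n + k) / (total + len(LABELS) * k))
        # Independence assumption: add one log term per vocabulary word.
        for word in self.vocabulary:
            p = (self.word_counts[label][word] + k) / (n + 2 * k)
            score += math.log(p if word in present else 1.0 - p)
        return score

    def prob_interesting(self, words):
        """Posterior probability that a page with these words is interesting."""
        present = self.vocabulary & set(words)
        scores = [self._log_score(present, label) for label in LABELS]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]  # normalize stably
        return weights[0] / sum(weights)


model = IncrementalNaiveBayes(["genome", "protein", "casino"])
model.update(["genome", "protein"], "interesting")
model.update(["casino"], "not_interesting")
print(model.prob_interesting(["genome", "sequence"]))
```

Because the model state is only a set of counts, each new rating costs one pass over the page's words. Under the same assumptions, a user-provided profile could be represented as initial pseudo-counts that later feedback revises, although the abstract does not say how the article implements this.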
Cite this article
Pazzani, M., Billsus, D. Learning and Revising User Profiles: The Identification of Interesting Web Sites. Machine Learning 27, 313–331 (1997). https://doi.org/10.1023/A:1007369909943