Data Mining Methods for Recommender Systems
In this chapter, we give an overview of the main Data Mining techniques used in the context of Recommender Systems. We first describe common preprocessing methods such as sampling or dimensionality reduction. Next, we review the most important classification techniques, including Bayesian Networks and Support Vector Machines. We describe the k-means clustering algorithm and discuss several alternatives. We also present association rules and related algorithms for an efficient training process. In addition to introducing these techniques, we survey their uses in Recommender Systems and present cases where they have been successfully applied.
Notes
Note that a similarity measure is not a preprocessing step in itself but rather a prerequisite for being able to execute other data mining processes.
Author information
Authors and Affiliations
- Xavier Amatriain: Netflix, 100 Winchester Cr., Los Gatos, CA, 95032, USA
- Xavier Amatriain: Quora, 150 Castro St., Mountain View, USA
- Josep M. Pujol: Cliqz, Rosenkavalierplatz 10, 81925, Munich, Germany