Abstract
Constraint Programming is becoming competitive for solving certain data-mining problems, largely due to the development of global constraints. We introduce the CoverSize constraint for itemset mining problems, a global constraint for counting and constraining the number of transactions covered by the itemset decision variables. We show the relation of this constraint to the well-known table constraint, and our filtering algorithm internally uses the reversible sparse bitset data structure recently proposed for filtering table constraints. Furthermore, we expose the size of the cover as a variable, which opens up new modelling perspectives compared to an existing global constraint for (closed) frequent itemset mining. For example, one can constrain the minimum frequency or compare the frequency of an itemset in different datasets, as is done in discriminative itemset mining. We demonstrate experimentally on the frequent, closed and discriminative itemset mining problems that the CoverSize constraint with reversible sparse bitsets allows us to outperform other CP approaches.
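To make the notion of a cover concrete, the following Python sketch computes covers with bitsets. Each item is mapped to a bitset of the transactions that contain it, the cover of an itemset is the bitwise AND of its items' bitsets, and the cover size (the itemset's frequency) is a population count. The `prunable_items` function illustrates one simple minimum-frequency pruning rule: an item is excluded when adding it would leave fewer than `min_freq` covered transactions. The function names, the use of Python integers as bitsets and this particular rule are illustrative assumptions, not the filtering algorithm of the paper.

```python
from typing import Dict, Iterable, List, Set


def item_columns(transactions: List[Set[str]]) -> Dict[str, int]:
    """Vertical layout: bit t of cols[item] is set iff transaction t contains item."""
    cols: Dict[str, int] = {}
    for t, trans in enumerate(transactions):
        for item in trans:
            cols[item] = cols.get(item, 0) | (1 << t)
    return cols


def cover(itemset: Iterable[str], cols: Dict[str, int], n: int) -> int:
    """Bitset of the transactions that contain every item of the itemset."""
    bits = (1 << n) - 1  # the empty itemset covers all n transactions
    for item in itemset:
        bits &= cols.get(item, 0)
    return bits


def cover_size(itemset: Iterable[str], cols: Dict[str, int], n: int) -> int:
    """Frequency of the itemset = number of set bits in its cover."""
    return bin(cover(itemset, cols, n)).count("1")


def prunable_items(itemset: Set[str], cols: Dict[str, int], n: int, min_freq: int) -> List[str]:
    """Items whose addition would leave fewer than min_freq covered transactions."""
    current = cover(itemset, cols, n)
    return sorted(
        item for item, col in cols.items()
        if item not in itemset and bin(current & col).count("1") < min_freq
    )


if __name__ == "__main__":
    data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c", "d"}]
    cols = item_columns(data)
    print(cover_size({"a", "b"}, cols, len(data)))             # 3 (transactions 0, 1, 4)
    print(prunable_items({"a"}, cols, len(data), min_freq=2))  # ['d']
```

Comparing the frequency of the same itemset in two datasets, as in discriminative itemset mining, amounts to calling `cover_size` once with each dataset's item bitsets.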
Notes
- 1.
This is similar to the update of currTable in [14] for filtering table constraints.
- 2.
As for many NP-hard global constraints like bin-packing, cumulative, circuit, etc.
- 3.
A related generic technique in CP is shaving [26].
- 4.
- 5.
- 6.
http://research.nii.ac.jp/~uno/codes.htm (v3 is fastest of all versions in our experiments).
- 7.
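Note 1 and the abstract refer to the reversible sparse bitset used by Compact-table [14]. The sketch below is a simplified Python rendering of that idea under stated assumptions. Bits are stored in 64-bit words, the positions `index[0..limit]` list exactly the non-zero words so that operations skip words that are already empty, and changes to `words` and `limit` are recorded on a hypothetical trail that `pop_level` replays on backtrack. The class and method names are illustrative; real solvers such as OscaR use their own reversible-state machinery and typically avoid trailing the same word more than once per search node.

```python
from typing import List, Tuple

WORD = 64  # bits per word (assumption: mirrors 64-bit machine words)
FULL = (1 << WORD) - 1


def to_words(bits: List[int], n_bits: int) -> List[int]:
    """Pack a list of bit positions into a list of WORD-bit integers."""
    words = [0] * ((n_bits + WORD - 1) // WORD)
    for b in bits:
        words[b // WORD] |= 1 << (b % WORD)
    return words


class ReversibleSparseBitSet:
    """Simplified reversible sparse bitset (illustrative sketch)."""

    def __init__(self, n_bits: int):
        n_words = (n_bits + WORD - 1) // WORD
        self.words: List[int] = [FULL] * n_words
        if n_bits % WORD:
            self.words[-1] = (1 << (n_bits % WORD)) - 1  # clear padding bits
        self.index: List[int] = list(range(n_words))
        self.limit = n_words - 1  # index[0..limit] are the non-zero words
        self._trail: List[Tuple[str, int, int]] = []  # (kind, offset, old value)
        self._levels: List[int] = []  # trail size recorded at each push_level

    def push_level(self) -> None:
        """Mark the start of a new search node."""
        self._levels.append(len(self._trail))

    def pop_level(self) -> None:
        """Undo every change made since the matching push_level."""
        mark = self._levels.pop()
        while len(self._trail) > mark:
            kind, offset, old = self._trail.pop()
            if kind == "word":
                self.words[offset] = old
            else:
                self.limit = old

    def _set_word(self, offset: int, value: int) -> None:
        self._trail.append(("word", offset, self.words[offset]))
        self.words[offset] = value

    def _set_limit(self, value: int) -> None:
        self._trail.append(("limit", -1, self.limit))
        self.limit = value

    def is_empty(self) -> bool:
        return self.limit < 0

    def intersect_with(self, mask: List[int]) -> None:
        """self &= mask, visiting only the non-zero words."""
        # Walk downwards: a word swapped in from position `limit` was already visited.
        for i in range(self.limit, -1, -1):
            offset = self.index[i]
            w = self.words[offset] & mask[offset]
            if w != self.words[offset]:
                self._set_word(offset, w)
                if w == 0:
                    # Move the emptied word just past the end of the active prefix.
                    self.index[i] = self.index[self.limit]
                    self.index[self.limit] = offset
                    self._set_limit(self.limit - 1)

    def count(self) -> int:
        """Number of set bits, i.e. the current cover size."""
        return sum(bin(self.words[self.index[i]]).count("1") for i in range(self.limit + 1))

    def intersect_count(self, mask: List[int]) -> int:
        """|self & mask| without modifying self."""
        return sum(bin(self.words[o] & mask[o]).count("1")
                   for o in (self.index[i] for i in range(self.limit + 1)))


if __name__ == "__main__":
    s = ReversibleSparseBitSet(130)  # 130 transactions stored in 3 words
    s.push_level()
    s.intersect_with(to_words(list(range(70)), 130))
    print(s.count(), s.limit)  # 70 1
    s.pop_level()
    print(s.count(), s.limit)  # 130 2
```

Restoring only `limit` (and the trailed words) on backtrack is enough: swaps made at a deeper node stay inside that node's active prefix, so the permutation of `index` may differ afterwards but the set of active words for the enclosing node is unchanged.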
References
Aggarwal, C.C.: An introduction to frequent pattern mining. In: Aggarwal, C.C., Han, J. (eds.) Frequent Pattern Mining, pp. 1–17. Springer, Cham (2014). doi:10.1007/978-3-319-07821-2_1
Agrawal, R., Imieliński, T., Swami, A.: Mining association rules between sets of items in large databases. Int. Conf. Manag. Data (SIGMOD) 22(2), 207–216 (1993)
Aoga, J.O.R., Guns, T., Schaus, P.: An efficient algorithm for mining frequent sequence with constraint programming. In: Frasconi, P., Landwehr, N., Manco, G., Vreeken, J. (eds.) ECML PKDD 2016. LNCS, vol. 9852, pp. 315–330. Springer, Cham (2016). doi:10.1007/978-3-319-46227-1_20
Aoga, J.O.R., Guns, T., Schaus, P.: Mining time-constrained sequential patterns with constraint programming. Constraints 22, 1–23 (2017)
Bessiere, C., Régin, J.C.: Arc consistency for general constraint networks: preliminary results. In: International Joint Conference on Artificial Intelligence (IJCAI) (1997)
Bonchi, F., Lucchese, C.: On closed constrained frequent pattern mining. In: Fourth IEEE International Conference on Data Mining, ICDM 2004, pp. 35–42, November 2004
Borgelt, C.: Efficient implementations of Apriori and Eclat. In: FIMI: Workshop on Frequent Itemset Mining Implementations (2003)
Borgelt, C.: Frequent item set mining. Wiley Interdisc. Rev.: Data Min. Knowl. Discov. 2(6), 437–456 (2012)
Bringmann, B., Zimmermann, A.: Tree² – decision trees for tree structured data. In: Jorge, A.M., Torgo, L., Brazdil, P., Camacho, R., Gama, J. (eds.) PKDD 2005. LNCS, vol. 3721, pp. 46–58. Springer, Heidelberg (2005). doi:10.1007/11564126_10
Bringmann, B., Zimmermann, A., De Raedt, L., Nijssen, S.: Don’t be afraid of simpler patterns. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds.) PKDD 2006. LNCS, vol. 4213, pp. 55–66. Springer, Heidelberg (2006). doi:10.1007/11871637_10
Cheng, H., Yan, X., Han, J., Yu, P.S.: Direct discriminative pattern mining for effective classification. In: IEEE 24th International Conference on Data Engineering, ICDE 2008, pp. 169–178. IEEE (2008)
Cheng, K.C., Yap, R.H.: An MDD-based generalized arc consistency algorithm for positive and negative table constraints and some global constraints. Constraints 15(2), 265–304 (2010)
De Raedt, L., Guns, T., Nijssen, S.: Constraint programming for itemset mining. In: International Conference on Knowledge Discovery and Data Mining (SIGKDD), pp. 204–212 (2008)
Demeulenaere, J., Hartert, R., Lecoutre, C., Perez, G., Perron, L., Régin, J.-C., Schaus, P.: Compact-table: efficiently filtering table constraints with reversible sparse bit-sets. In: Rueher, M. (ed.) CP 2016. LNCS, vol. 9892, pp. 207–223. Springer, Cham (2016). doi:10.1007/978-3-319-44953-1_14
Deshpande, M., Kuramochi, M., Wale, N., Karypis, G.: Frequent substructure-based approaches for classifying chemical compounds. IEEE Trans. Knowl. Data Eng. 17(8), 1036–1050 (2005)
Fan, W., Zhang, K., Cheng, H., Gao, J., Yan, X., Han, J., Yu, P., Verscheure, O.: Direct mining of discriminative and essential frequent patterns via model-based search tree. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 230–238. ACM (2008)
Gecode Team: Gecode: generic constraint development environment (2006). http://www.gecode.org
Google: Google optimization tools (2015). https://developers.google.com/optimization/
Guns, T., Nijssen, S., De Raedt, L.: Itemset mining: a constraint programming perspective. Artif. Intell. 175(12–13), 1951–1983 (2011)
Jabbour, S., Sais, L., Salhi, Y.: The top-k frequent closed itemset mining using top-k SAT problem. In: Blockeel, H., Kersting, K., Nijssen, S., Železný, F. (eds.) ECML PKDD 2013. LNCS, vol. 8190, pp. 403–418. Springer, Heidelberg (2013). doi:10.1007/978-3-642-40994-3_26
Jabbour, S., Sais, L., Salhi, Y.: Mining top-k motifs with a SAT-based framework. Artif. Intell. 244, 30–47 (2017)
Kemmar, A., Loudni, S., Lebbah, Y., Boizumault, P., Charnois, T.: PREFIX-PROJECTION global constraint for sequential pattern mining. In: Pesant, G. (ed.) CP 2015. LNCS, vol. 9255, pp. 226–243. Springer, Cham (2015). doi:10.1007/978-3-319-23219-5_17
Knuth, D.: The Art of Computer Programming: Combinatorial Algorithms, vol. 4. Addison-Wesley, Upper Saddle River (2015)
Lazaar, N., Lebbah, Y., Loudni, S., Maamar, M., Lemière, V., Bessiere, C., Boizumault, P.: A global constraint for closed frequent pattern mining. In: Rueher, M. (ed.) CP 2016. LNCS, vol. 9892, pp. 333–349. Springer, Cham (2016). doi:10.1007/978-3-319-44953-1_22
Lecoutre, C.: STR2: optimized simple tabular reduction for table constraints. Constraints 16(4), 341–371 (2011)
Lhomme, O.: Quick shaving. In: Proceedings of the 20th National Conference on Artificial Intelligence, vol. 1, pp. 411–415. AAAI Press (2005)
Morishita, S., Sese, J.: Traversing itemset lattice with statistical metric pruning. In: Vianu, V., Gottlob, G. (eds.) Proceedings of the Nineteenth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, 15–17 May 2000, Dallas, Texas, USA, pp. 226–236. ACM (2000)
Negrevergne, B., Dries, A., Guns, T., Nijssen, S.: Dominance programming for itemset mining. In: 2013 IEEE 13th International Conference on Data Mining (ICDM), pp. 557–566. IEEE (2013)
Negrevergne, B., Guns, T.: Constraint-based sequence mining using constraint programming. In: Michel, L. (ed.) CPAIOR 2015. LNCS, vol. 9075, pp. 288–305. Springer, Cham (2015). doi:10.1007/978-3-319-18008-3_20
Nijssen, S., Guns, T.: Integrating constraint programming and itemset mining. In: Balcázar, J.L., Bonchi, F., Gionis, A., Sebag, M. (eds.) ECML PKDD 2010. LNCS, vol. 6322, pp. 467–482. Springer, Heidelberg (2010). doi:10.1007/978-3-642-15883-4_30
Nijssen, S., Guns, T., De Raedt, L.: Correlated itemset mining in ROC space: a constraint programming approach. In: International Conference on Knowledge Discovery and Data Mining (SIGKDD), pp. 647–656. ACM (2009)
Nijssen, S., Zimmermann, A.: Constraint-based pattern mining. In: Aggarwal, C.C., Han, J. (eds.) Frequent Pattern Mining, pp. 147–163. Springer, Cham (2014). doi:10.1007/978-3-319-07821-2_7
OscaR Team: OscaR: Scala in OR (2012). https://bitbucket.org/oscarlib/oscar
Pasquier, N., Bastide, Y., Taouil, R., Lakhal, L.: Discovering frequent closed itemsets for association rules. In: Beeri, C., Buneman, P. (eds.) ICDT 1999. LNCS, vol. 1540, pp. 398–416. Springer, Heidelberg (1999). doi:10.1007/3-540-49257-7_25
Perez, G., Régin, J.-C.: Improving GAC-4 for table and MDD constraints. In: O’Sullivan, B. (ed.) CP 2014. LNCS, vol. 8656, pp. 606–621. Springer, Cham (2014). doi:10.1007/978-3-319-10428-7_44
Rácz, B.: nonordfp: an FP-growth variation without rebuilding the FP-tree. In: FIMI: Workshop on Frequent Itemset Mining Implementations (2004)
de Saint-Marcq, V.l.C., Schaus, P., Solnon, C., Lecoutre, C.: Sparse-sets for domain implementation. In: CP Workshop on Techniques foR Implementing Constraint programming Systems (TRICS), pp. 1–10 (2013)
Stuckey, P.J., Becket, R., Fischer, J.: Philosophy of the MiniZinc challenge. Constraints 15(3), 307–316 (2010)
Uno, T., Kiyomi, M., Arimura, H.: LCM Ver. 3: collaboration of array, bitmap and prefix tree for frequent itemset mining. In: Proceedings of the 1st International Workshop on Open Source Data Mining: Frequent Pattern Mining Implementations (OSDM 2005), pp. 77–86. ACM (2005)
Acknowledgments
The research is supported by the FRIA-FNRS (Fonds pour la Formation à la Recherche dans l’Industrie et dans l’Agriculture, Belgium) and FWO (Research Foundation – Flanders). We also thank Willard Zhan for his help with the reduction proof.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Schaus, P., Aoga, J.O.R., Guns, T. (2017). CoverSize: A Global Constraint for Frequency-Based Itemset Mining. In: Beck, J. (ed.) Principles and Practice of Constraint Programming. CP 2017. Lecture Notes in Computer Science, vol. 10416. Springer, Cham. https://doi.org/10.1007/978-3-319-66158-2_34
DOI: https://doi.org/10.1007/978-3-319-66158-2_34
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-66157-5
Online ISBN: 978-3-319-66158-2
eBook Packages: Computer Science, Computer Science (R0)