
A fast and low idle time method for mining frequent patterns in distributed and many-task computing environments

Distributed and Parallel Databases

Abstract

Association rule mining has attracted much attention among data mining topics because it has been applied successfully in various fields, for example to find associations between purchased items, by identifying frequent patterns (FPs). Databases today are huge, often ranging in size from terabytes to petabytes. Although existing methods can discover FPs effectively to deduce association rules, execution efficiency remains a critical problem, particularly for big data. Progressive size working set (PSWS) and parallel FP-growth (PFP) are state-of-the-art methods that apply parallel and distributed computing to reduce mining time in many-task computing, thereby bridging the gap between high-throughput and high-performance computing. However, these methods cannot begin mining until a computing node has obtained a complete FP-tree or its corresponding subdatabase, which leaves computing nodes idle for long periods. We propose a method that can begin mining as soon as a small part of an FP-tree has been received. This reduces the idle time of computing nodes and, consequently, the time required for mining. An empirical evaluation shows that the proposed method is faster than PSWS and PFP.
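To make the terms "FP-tree" and "FP-growth" concrete, the following is a minimal single-machine sketch of the classic FP-growth algorithm of Han, Pei and Yin. It is not the authors' proposed method, nor PSWS or PFP; the names `build_fp_tree` and `fp_growth` and the `(items, count)` input format are hypothetical choices made for illustration. The sketch shows why mining depends on the tree: the conditional pattern base for each item is read from the prefix paths stored in the FP-tree, so in the baseline methods described above a node waits until that tree (or subdatabase) has arrived before it can start.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Optional


class FPNode:
    """One FP-tree node: an item, its count, its parent and its children."""

    def __init__(self, item: Optional[str], parent: Optional["FPNode"]) -> None:
        self.item = item
        self.count = 0
        self.parent = parent
        self.children: Dict[str, "FPNode"] = {}


def build_fp_tree(weighted_transactions, min_support):
    """Build an FP-tree from (items, count) pairs (hypothetical input format).

    Returns the root, a header table mapping each frequent item to the tree
    nodes that carry it, and the support of every frequent item.
    """
    support = defaultdict(int)
    for items, count in weighted_transactions:
        for item in set(items):
            support[item] += count
    frequent = {i: s for i, s in support.items() if s >= min_support}

    root = FPNode(None, None)
    header: Dict[str, List[FPNode]] = defaultdict(list)
    for items, count in weighted_transactions:
        # Keep frequent items only, ordered by descending support (ties by name)
        path = sorted((i for i in set(items) if i in frequent),
                      key=lambda i: (-frequent[i], i))
        node = root
        for item in path:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += count
            node = child
    return root, header, frequent


def fp_growth(weighted_transactions, min_support, suffix=frozenset()):
    """Return {itemset: support} for every itemset with support >= min_support."""
    _, header, frequent = build_fp_tree(weighted_transactions, min_support)
    patterns: Dict[FrozenSet[str], int] = {}
    for item, nodes in header.items():
        new_suffix = suffix | {item}
        patterns[frozenset(new_suffix)] = frequent[item]
        # Conditional pattern base: the prefix path above every node for `item`
        cond_base = []
        for node in nodes:
            prefix = []
            parent = node.parent
            while parent is not None and parent.item is not None:
                prefix.append(parent.item)
                parent = parent.parent
            if prefix:
                cond_base.append((prefix, node.count))
        # Recursively mine the conditional FP-tree built from that base
        patterns.update(fp_growth(cond_base, min_support, new_suffix))
    return patterns


if __name__ == "__main__":
    transactions = [
        ["bread", "milk"],
        ["bread", "diaper", "beer", "eggs"],
        ["milk", "diaper", "beer", "cola"],
        ["bread", "milk", "diaper", "beer"],
        ["bread", "milk", "diaper", "cola"],
    ]
    found = fp_growth([(t, 1) for t in transactions], min_support=3)
    print(len(found))  # 8 frequent itemsets for min_support=3
```

Because the conditional pattern base of each item is mined independently of the others, the loop over the header table is a natural unit of parallel work. PFP, for instance, partitions items into groups and mines each group's transactions on a separate node; the sketch above keeps everything on one machine for clarity.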




Acknowledgement

This work was supported by the Ministry of Science and Technology of Taiwan, R.O.C., under Grant Nos. MOST 104-2221-E-151-055 and 105-2221-E-151-056.

Author information


Correspondence to Kawuu W. Lin.


Cite this article

Lin, CC., Chung, SH., Chen, JC. et al. A fast and low idle time method for mining frequent patterns in distributed and many-task computing environments. Distrib Parallel Databases 36, 613–641 (2018). https://doi.org/10.1007/s10619-018-7221-9
