Abstract
To understand the current situation in a specific scenario, valuable knowledge should be mined from both historical data and newly emerging data. However, most existing algorithms treat the historical and emerging data as a whole and periodically re-analyze all of it, which incurs heavy computational overhead. It is also challenging to discover new knowledge accurately and in time, because the emerging data are usually small compared with the historical data. To address these challenges, we propose a novel knowledge discovery algorithm based on double evolving frequent pattern trees that traces dynamically evolving data with an incremental sliding window. One tree records frequent patterns from the historical data, and the other records incremental frequent items. The structures of the two frequent pattern trees and their relationships are updated periodically according to the emerging data and the sliding window. New frequent patterns are mined from the incremental data, and new knowledge is obtained from pattern changes. Evaluations show that this algorithm can discover new knowledge from evolving data with good performance and high accuracy.
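The two-tree idea can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract does not specify the tree layout, the update rules, or the mining procedure, so everything below is an assumption made for illustration. The names `PatternTree`, `DoubleTreeMiner`, `window_size`, `min_support`, and `max_len` are hypothetical, the trees are plain prefix trees over lexicographically sorted items rather than true FP-trees, and "new knowledge" is approximated as itemsets that are frequent in the sliding window but not in the historical tree.

```python
from collections import Counter, deque
from itertools import combinations


class PatternTree:
    """Prefix tree over sorted transactions; a simplified stand-in for an FP-tree."""

    def __init__(self):
        self.count = 0       # number of transactions passing through this node
        self.children = {}   # item -> PatternTree

    def update(self, transaction, delta=1):
        """Add (delta=1) or remove (delta=-1) one transaction."""
        node = self
        node.count += delta
        for item in sorted(set(transaction)):
            node = node.children.setdefault(item, PatternTree())
            node.count += delta

    def paths(self, prefix=()):
        """Yield (itemset, n) where n transactions end exactly at that node."""
        ending = self.count - sum(c.count for c in self.children.values())
        if prefix and ending > 0:
            yield prefix, ending
        for item, child in self.children.items():
            yield from child.paths(prefix + (item,))

    def frequent(self, min_support, max_len=2):
        """Return {itemset: relative support} for itemsets of up to max_len items."""
        if self.count == 0:
            return {}
        counts = Counter()
        for itemset, n in self.paths():
            for k in range(1, max_len + 1):
                for combo in combinations(itemset, k):
                    counts[combo] += n
        return {p: c / self.count for p, c in counts.items()
                if c / self.count >= min_support}


class DoubleTreeMiner:
    """Assumed design: one tree for history, one for the sliding window."""

    def __init__(self, window_size, min_support):
        self.history = PatternTree()
        self.recent = PatternTree()
        self.window = deque()
        self.window_size = window_size
        self.min_support = min_support

    def push(self, batch):
        """Add a batch; return itemsets frequent in the window but not in history."""
        for transaction in batch:
            self.recent.update(transaction, +1)
        self.window.append(batch)
        if len(self.window) > self.window_size:
            # The oldest batch leaves the window and is folded into history.
            for transaction in self.window.popleft():
                self.recent.update(transaction, -1)
                self.history.update(transaction, +1)
        old = self.history.frequent(self.min_support)
        new = self.recent.frequent(self.min_support)
        return {p: s for p, s in new.items() if p not in old}


# Usage with made-up transactions.
miner = DoubleTreeMiner(window_size=1, min_support=0.6)
first = miner.push([["bread", "milk"], ["bread", "milk"], ["bread"]])
second = miner.push([["mask", "milk"], ["mask", "sanitizer"], ["mask", "sanitizer"]])
print(sorted(second))
```

Each call to `push` touches only the arriving and expiring batches, so the historical data are never rescanned. That is the overhead the abstract attributes to methods that re-analyze all data periodically. The subset enumeration in `frequent` is exponential in `max_len` and is only practical for small examples.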
Index Terms
- An Evolutive Frequent Pattern Tree-based Incremental Knowledge Discovery Algorithm