Dealing with small sample size problems in process industry using virtual sample generation: a Kriging-based approach

  • Methodologies and Application
  • Soft Computing

Abstract

Operational data from advanced process systems have grown explosively, but because the data fluctuate only slightly, few representative samples can be extracted. This makes it difficult to characterize the process and to build prediction models. In this study, inspired by the way fishermen repair nets, a Kriging-based virtual sample generation (VSG) method, named Kriging-VSG, is proposed to generate feasible virtual samples in data-sparse regions. The generated virtual samples are then used to further improve the accuracy of prediction models. To locate data-sparse regions, a distance-based criterion is applied to each dimension to identify important samples separated by large information gaps. As in net repair, one dimension is first fixed at different quantiles; a dimension-wise Kriging interpolation is then performed at the center between important samples with large information gaps. To validate Kriging-VSG, two numerical simulations and a real-world application to a cascade reaction process for high-density polyethylene are carried out. The results indicate that the proposed Kriging-VSG outperforms the other methods compared.
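The general idea of interpolating at the center of large per-dimension gaps can be sketched as follows. This is a simplified illustration, not the authors' Kriging-VSG algorithm: it omits the quantile-fixing step, uses a plain simple-Kriging predictor with a Gaussian correlation function, and all function names and parameters (`length_scale`, `nugget`, `n_gaps`) are assumptions introduced here.

```python
import numpy as np

def gaussian_kernel(a, b, length_scale=1.0):
    """Gaussian correlation between the rows of a and the rows of b."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / length_scale**2)

def kriging_predict(X, y, X_new, length_scale=1.0, nugget=1e-8):
    """Simple Kriging with a constant mean taken as the sample mean (assumption)."""
    mu = y.mean()
    K = gaussian_kernel(X, X, length_scale) + nugget * np.eye(len(X))
    weights = np.linalg.solve(K, y - mu)
    return mu + gaussian_kernel(X_new, X, length_scale) @ weights

def largest_gap_midpoints(X, dim, n_gaps=1):
    """Sort samples along one dimension and return the midpoints between
    the neighbouring pairs with the largest gaps in that dimension."""
    Xs = X[np.argsort(X[:, dim])]
    gaps = np.diff(Xs[:, dim])
    idx = np.argsort(gaps)[::-1][:n_gaps]
    return 0.5 * (Xs[idx] + Xs[idx + 1])

def generate_virtual_samples(X, y, n_gaps=1, length_scale=1.0):
    """Place virtual inputs at gap centers of every dimension and
    estimate their outputs by Kriging interpolation."""
    centers = np.vstack(
        [largest_gap_midpoints(X, d, n_gaps) for d in range(X.shape[1])]
    )
    return centers, kriging_predict(X, y, centers, length_scale)

# Toy data (hypothetical): four samples in two dimensions.
X = np.array([[0.0, 0.0], [1.0, 0.5], [4.0, 1.0], [5.0, 3.0]])
y = np.array([0.0, 1.0, 2.0, 3.0])
X_virtual, y_virtual = generate_virtual_samples(X, y)
```

In this toy case the largest gap along the first dimension lies between the second and third samples, and along the second dimension between the third and fourth, so one virtual input is placed at the center of each pair. The virtual samples would then be appended to the original data before training a prediction model.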

Notes

  1. http://www2.imm.dtu.dk/pubdb/views/publication_details.php?id=1460.

Acknowledgements

This research was supported by the National Natural Science Foundation of China (Grant Nos. 61973022, 61973024, 61703027, 61533003, 61573051), the Fundamental Research Funds for the Central Universities (Grant No. JD1808), the China Scholarship Council State-Sponsored Scholarship Program (Grant Nos. 201806880024, 201806885004), and the Open Research Fund of the State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University (Grant No. 18I01).

Author information

Corresponding authors

Correspondence to Abbas Rajabifard or Yuan Xu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants performed by any of the authors.

Informed consent

No individual participants are included in the study.

Additional information

Communicated by V. Loia.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zhu, QX., Chen, ZS., Zhang, XH. et al. Dealing with small sample size problems in process industry using virtual sample generation: a Kriging-based approach. Soft Comput 24, 6889–6902 (2020). https://doi.org/10.1007/s00500-019-04326-3

  • DOI: https://doi.org/10.1007/s00500-019-04326-3