Proposing stochastic probability-based math model and algorithms utilizing social networking and academic data for good fit students prediction

  • Original Article
  • Social Network Analysis and Mining

Abstract

The research presented in this paper falls within data science. The authors propose an enhanced supervised machine learning framework for predicting good-fit students, built on a stochastic probability-based mathematical model and an algorithm, Good Fit Student (GFS), together with improved quantification of target variables and algorithmic metrics. Academia today faces dropouts, low retention, poor student performance, lack of motivation, and unnecessary changes of major and re-admissions. The authors treat this challenge as a research problem and attempt to address it by using social networking-based personality traits, relevant data, and features to improve predictive modeling. They recognize that admission choices are often governed by family trends, affordability, basic motivation, market trends, and natural instincts, whereas natural gifts and talents are rarely used to choose an academic direction. Based on a literature review, the authors identify this as a research gap and address it with a unique blend of algorithms and methods, improved modeling of performance metrics built on cross-validation to improve fit, and enhanced feature engineering and tuning to reduce errors. The paper presents the latest progress of this research; the included results show the state of the work and ongoing improvements. The authors use machine learning techniques, Microsoft SQL Server, Excel data mining, R, and Python to develop and test their model. The paper also reviews related work and concludes with final remarks and future work.
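The abstract mentions performance metrics built on cross-validation but does not describe the GFS algorithm itself. As a rough illustration of the general idea only, the following sketch runs k-fold cross-validation around a simple nearest-centroid classifier. The classifier, the synthetic two-feature data, and all function names are assumptions made for this example; none of them come from the paper.

```python
import random
from statistics import mean

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_centroids(X, y):
    """Stand-in model (not GFS): the per-class mean of each feature."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids

def predict(centroids, x):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
    )

def cross_validate(X, y, k=5, seed=0):
    """Train on k-1 folds, test on the held-out fold, and return each fold's accuracy."""
    scores = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = fit_centroids([X[i] for i in train_idx], [y[i] for i in train_idx])
        hits = sum(predict(model, X[i]) == y[i] for i in test_idx)
        scores.append(hits / len(test_idx))
    return scores

# Synthetic, made-up data: two numeric features per student, label 1 = "good fit".
rng = random.Random(42)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(50)] + \
    [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(50)]
y = [0] * 50 + [1] * 50

scores = cross_validate(X, y, k=5)
print("fold accuracies:", scores)
print("mean accuracy:", mean(scores))
```

Averaging the per-fold accuracies gives an estimate of performance on unseen data that depends less on one particular train/test split, which is the usual reason for building metrics on cross-validation.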

Author information

Corresponding author

Correspondence to Muhammad Fahim Uddin.

About this article

Cite this article

Uddin, M.F., Lee, J. Proposing stochastic probability-based math model and algorithms utilizing social networking and academic data for good fit students prediction. Soc. Netw. Anal. Min. 7, 29 (2017). https://doi.org/10.1007/s13278-017-0448-z

  • DOI: https://doi.org/10.1007/s13278-017-0448-z
