Elsevier

Computers & Education

Volume 128, January 2019, Pages 145-158

Modelling and statistical analysis of YouTube's educational videos: A channel Owner's perspective

https://doi.org/10.1016/j.compedu.2018.09.003

Highlights

  • Periodicity and trend analysis of YouTube education video viewership.

  • Effect of Video Upload Activity and age of channel on its Viewership.

  • The Effect of Video Length, average percentage viewed and translation.

  • YouTube educational video rank distribution follows the Zipf distribution.

  • Statistics regarding devices, OS, traffic sources, playback location and demography.

Abstract

YouTube is one of the most popular websites. It is a vast resource for educational content. To better understand the characteristics and impact of YouTube on education, we have analyzed a popular YouTube channel owned by the author of this paper. It has thousands of subscribers, millions of views, and hundreds of video lectures. We have used our private YouTube analytics data to provide an in-depth study of YouTube educational videos. Our analysis provides valuable information that can have major technical and commercial implications in the field of education. We perform in-depth time-series analysis of the channel data to reveal the trend, seasonality and temporal pattern for the educational videos on YouTube. In our study, we find the relationship between video uploading activity, channel's age and its popularity. We use an entropy-based decision tree classifier to find the features that are most important for the popularity of videos. We show that video rank and number of views follow the Zipf distribution for educational videos. We observe a strong correlation between the geographical location of viewers and the location of industry the channel caters to. Besides, we also provide knowledge regarding the popular devices and operating systems used for viewing the educational videos, main traffic sources, playback locations, translation activity, and demography of viewers. Overall, we believe that the results presented in this paper are crucial in understanding YouTube EDU videos characteristics which can be utilized for making well-informed decisions for improving educational content and learning technologies.

Introduction

The Internet has witnessed an explosion of video-sharing sites in recent years. Among them, YouTube is one of the most successful (Gandomi & Haider, 2015; Orús et al., 2016). Its success lies in combining rich media with, more importantly, social networking. The growth of User Generated Content (UGC) on YouTube (Burgess & Green, 2013; Lee & Lehto, 2013) has revolutionized education too. YouTube has changed the way people learn. It has brought classrooms into our pockets. We can study anytime, anywhere and at our own pace on almost any topic we are interested in (Cheng, Dale, & Liu, 2008; Lee & Lehto, 2013). Online educational channels on YouTube have changed the way education is perceived. Our main goal in this paper is to understand the characteristics of popular educational videos and to model video and user behaviour. This can have substantial technical and business impact. A considerable amount of research has shown how videos help students learn better. In (Ljubojevic et al. Vaskovic), the authors study the effectiveness of multimedia in teaching. Similarly, in (Mthembu & Roodt, 2017), the authors demonstrate the effectiveness of YouTube videos shown in classrooms to young students: the attention and interaction of the class improved considerably when these videos were used. In (Jung & Lee, 2015), the authors look for the factors that have led to the success and acceptance of YouTube videos for education among university students, and (Karvounidis, Chimos, Bersimis, & Douligeris, 2014) evaluates the effectiveness of web 2.0 technologies for the development of education. All of the above works were mostly concerned with the efficacy of multimedia, web 2.0 and YouTube in the development of education. However, very little work has been done on the modelling and statistical analysis of the educational content on YouTube.

In (Cha, Kwak, Rodriguez, Ahn, & Moon, 2007), Cha et al. performed a statistical analysis of YouTube videos and reported significant results regarding the popularity distribution, the evolution of videos on YouTube and user behaviour. However, these characteristics would differ significantly with the content of the video. For example, movies, songs and sports would have significantly different characteristics from the educational videos on YouTube.

Hence, there is an urgent need for modelling and statistical analysis of the educational videos on YouTube, which could help content creators and business owners improve their channel experience. For a deeper study of these characteristics, it is more useful to analyze a channel owner's data, which is far more detailed than what is publicly available. Such an analysis could be used to make better educational content by understanding user behaviour and interests. Content creators can also use it for search optimization, and educational advertisers can use it to improve recommendation systems. Hence, to understand the nature and impact of YouTube EDU videos, we analyze a popular YouTube educational channel owned by the author of this paper. The channel has millions of views, thousands of subscribers, and hundreds of videos. As the channel owner, we had access to channel data that cannot be mined by any third party or outsider. Our analysis of this privately owned data reveals some very interesting characteristics of the educational videos on YouTube. The highlights of our work and findings are summarized below:

  • 1.

    Seasonality and Trend of Views per Day: We use Fast Fourier Transform and Moving Average to find the seasonality and trend of views per day of our channel. We discover that views per day of our channel are periodic with a period of six months. This very closely relates to the semester system followed in the technical institutes and universities across the world. The seasonality of data can be used to predict the best time to publish educational videos.

  • 2.

    Effect of Video Upload Activity and age of videos on channel Viewership: We carried out a t-test to find the correlation between upload activity, the age of the channel's videos and its viewership. Viewership increases with the age of the channel, and it also increases when new videos are uploaded. However, it is pertinent to consider the effect of similar content uploaded by competing channels, which can stall the increase in viewership to some extent. The result can be used to plan the optimal upload and update schedule for the channel's content.

  • 3.

    Classification of Videos: We used a decision-tree-based classifier to classify the videos based on the number of comments, subscribers, shares, and likes. We also sorted these features by importance and found that, from highest to lowest, the order is (1) number of subscribers (2) number of likes (3) number of play-lists the video is added to (4) number of shares (5) number of comments and (6) average percentage viewed. These features could be used to predict the popularity of a video from its initial trend, and strategic and business decisions could then be made to give video lecturers an incentive to upload more lectures on a new topic.

  • 4.

    Geographical Location of Viewers: Using the latitude and longitude data for the viewership, we produced a global heat map with the R package rworldmap. We discovered that the location of viewers is closely related to the location of the industry the channel caters to. Our channel is related to programming and software interviews. We found that consumers of our videos were mostly from locations like India, USA, Canada and European countries where software is a major industry. Moreover, the location of viewers is also correlated with the country of origin of the video lecturer, due to native accent and language affinity. This result can be used to customize the video content according to the given parameters.

  • 5.

    Cumulative distribution function: The cumulative distribution functions (Ross, 2014) of the number of comments, the number of likes and the number of subscribers per day were constructed from the channel data. Similarly, the cumulative distribution functions for views, likes and shares per video were also analyzed. This data can be used for modelling these channel and video characteristics and answering important academic, technical and business queries.

  • 6.

    The Effect of Video Length on Views, Average percentage viewed and use of Translations: We created scatter-plots of (i) the number of views against the length of the video in minutes and (ii) the average percentage of the video viewed against the number of views. Most of the popular videos are around 10–15 min in length. Short videos are not always effective in explaining the concept well, and lengthy videos can bore the audience. We observed that most of the videos are watched, on average, for only 30–50% of their length. So, we infer that medium-length videos should be preferred and that the initial part of the video needs to be exciting enough to keep the users engaged till the end.

  • 7.

    Zipf Distribution and the Pareto Law: When we analyzed the rank of videos in terms of the number of views, we found that the number of views of a video was inversely proportional to its rank in the viewership table. This phenomenon is known as Zipf's law (Adamic). The same phenomenon was also observed between the video rank and the number of subscribers, comments, likes and shares of videos. The inherent Zipf distribution could be used to derive many properties of the educational videos on YouTube. This knowledge could be used in constructing better recommendation systems and search optimization techniques. It can also be used in deciding the cost per click of advertisements on YouTube.

  • 8.

    Devices used, operating systems, playback location, and traffic sources: We also studied the most prevalent operating systems and devices used for watching these videos, the playback locations and the traffic sources of the videos. Windows and Android turned out to be the favorite operating systems. Computers and mobile phones are the top two devices for viewing videos. The YouTube search page, suggested videos and playlists were the most important sources of traffic. These data could be used for making videos tailored to these device types and operating systems. The knowledge about the traffic sources, demographics and playback locations can be useful for making videos that cater to the given audience.

  • 9.

    Content Based Analysis: In this section, we have tried to capture user behaviour based on the content of the video. We have tried to derive the relationship between the audience and the topic of the videos. We have shown how the user behaviour changes in terms of views, likes, shares and comments with respect to the change in the relevance of the content. We observed that the individual playlist's average likes, shares and comments differ vastly from that of the channel as well as from each other. This is dependent on the topic of the playlist. This information can be used to suggest that the coherence and correlation between the playlists help in garnering a better viewership.
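
The Zipf relationship described in item 7 can be checked with a straight-line fit on a log-log scale. The following is a minimal sketch rather than the authors' procedure: the function name `zipf_exponent` and the view counts are hypothetical, and it assumes every view count is positive so that the logarithm is defined. A fitted slope near -1 would be consistent with views being inversely proportional to rank.

```python
import math

def zipf_exponent(view_counts):
    """Least-squares slope of log(views) against log(rank).

    Zipf's law (views inversely proportional to rank) predicts a slope near -1.
    Assumes all view counts are positive.
    """
    ranked = sorted(view_counts, reverse=True)          # rank 1 = most viewed
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(v) for v in ranked]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical view counts that are exactly inversely proportional to rank.
views = [100000 / rank for rank in range(1, 201)]
slope = zipf_exponent(views)
print(round(slope, 6))
```

On real channel data the points rarely fall exactly on a line, so the slope is best read alongside a plot of the ranked counts.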

To the best of our knowledge, we are the first ones to use channel owner's data to provide a deep insight into the educational videos available on YouTube. Most of the results provided in this paper cannot be mined from YouTube by a third party or an outsider. We believe that our results and analysis would be pivotal in answering some of the most pressing questions regarding the academic and business strategies and opportunities available in the educational industry.

The results presented in this paper would provide valuable information to educational video creators, universities, massive open online courses, and trainers. They can help content creators find out who their audience is, what its interests are, how to keep it engaged and what the general attention span is. Our analysis can be used to learn the demographics, devices and operating systems of the audience so that the content can be tailor-made for them. The results can also be used to model the random and stochastic processes behind YouTube videos and audiences.

The rest of the paper is organized as follows. In Section 2, we describe the related work. In Section 3, we perform time series analysis of our YouTube data and find the seasonality and trend of our YouTube channel. In Section 4, we find the relationship between various features of a video and the relative importance of those features. In Section 5, we find the effect of the age of the channel and video uploading activity on its popularity. In Section 6, we find the relationship between the rank of a video and its various attributes. In Section 7, we provide the cumulative distribution function for various channel parameters to better understand the characteristics of the educational videos on YouTube. In Section 8, we provide a heat map of viewers to study the geographical location of the audience and their numbers. In Section 9, we describe the operating systems and devices used for watching education channels. We also provide the important traffic sources, playback locations, and information regarding the age and gender of the viewers. Finally, in Section 11, we provide the conclusion of our work.

Section snippets

Related works

A lot of research has been done on analyzing the effect of the Internet on education and the role it has played in the evolution of e-learning. In (Ljubojevic et al. Vaskovic), the authors discuss the importance of multimedia in making teaching more effective. It presents the positive outcomes of the use of videos as a supplementary teaching tool. They demonstrate that inclusion of videos as a supplement to the classical teaching methods proves to be promising in the improvement of teaching.

In (

Time-series analysis of YouTube data

In this Section, we perform the time-series analysis (Cooley, Lewis, & Welch, 1969) of our YouTube channel and discover some interesting seasonal patterns and year-wise trends for attributes such as the number of views per day, the number of subscribers gained per day and the number of comments per day for the channel.
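
As an illustration of how a dominant period and a trend can be extracted from a views-per-day series, here is a minimal sketch under stated assumptions. It is not the authors' code: the data are synthetic, the function names are hypothetical, a least-squares line is removed before the spectrum is computed (an assumption of this sketch), and a plain discrete Fourier transform stands in for the Fast Fourier Transform, which computes the same spectrum faster.

```python
import cmath
import math

def detrend(series):
    """Subtract the least-squares straight line so the trend does not dominate the spectrum."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sxy / sxx
    return [y - (y_mean + slope * (t - t_mean)) for t, y in enumerate(series)]

def dominant_period(series):
    """Period, in samples, of the frequency bin with the largest spectral power."""
    n = len(series)
    residual = detrend(series)
    best_k, best_power = 1, -1.0
    for k in range(1, n // 2 + 1):  # positive frequencies only, skipping the constant term
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(residual))
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    return n / best_k

def moving_average(series, window):
    """Trailing moving average; a window of one full period smooths out the seasonality."""
    return [sum(series[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(series))]

# Synthetic views per day: linear growth plus a 182-day (roughly six-month) cycle.
views = [1000 + 2 * t + 300 * math.sin(2 * math.pi * t / 182) for t in range(728)]
period = dominant_period(views)
trend = moving_average(views, 182)
print(period, len(trend))
```

Choosing the moving-average window equal to the detected period is a common design choice, since each window then covers exactly one seasonal cycle and what remains is the underlying trend.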

The effect of Video Length on views

In this section, we explore whether the length of a video affects its popularity (given by the number of views in our case). From Fig. 7, we can observe that approximately 80% of the videos are between 7 and 20 min in length and have average views below 50,000. Around 55% of the videos with a length of 20–25 min have more than 50,000 views (a high number of views). We observe that it is the average-length videos that are most popular.
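
The per-bucket percentages quoted above can be computed from (length, views) pairs as sketched below. This is only an illustration: the helper `share_above` and the sample data are hypothetical and do not reproduce the channel's figures.

```python
def share_above(videos, min_len, max_len, view_threshold):
    """Fraction of videos with min_len <= length < max_len whose views exceed view_threshold.

    Returns None when no video falls in the length bucket.
    """
    in_bucket = [views for length, views in videos if min_len <= length < max_len]
    if not in_bucket:
        return None
    return sum(1 for v in in_bucket if v > view_threshold) / len(in_bucket)

# Hypothetical (length in minutes, total views) pairs.
videos = [(8, 12000), (12, 30000), (15, 61000),
          (22, 70000), (24, 40000), (21, 90000), (30, 5000)]

medium_share = share_above(videos, 7, 20, 50000)   # videos of 7 to under 20 minutes
long_share = share_above(videos, 20, 25, 50000)    # videos of 20 to under 25 minutes
print(medium_share, long_share)
```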

Zipf distribution

Zipf's law describes a phenomenon where large or extraordinary events are rare and small, ordinary ones are quite common (Adamic). It usually describes the size y of an occurrence of an event relative to its rank r. The distribution is often plotted on a log-log scale, where it is linear and exhibits the long-tail property. It indicates that there are a few extremely popular events and a large number of not so popular events. By observing the rank of our videos in terms of the

Cumulative distribution function

There are four graphs in Fig. 10. Fig. 10a represents the CDF of the number of likes, number of shares and number of comments per day, while Fig. 10b represents the CDF of the number of likes, number of shares and number of comments per video. The dataset considered covers approximately 2000 days and 454 videos. We find that, with 90 percent probability, the number of likes, number of comments and number of shares per day would be less than 45. Ninety-five percent of the videos have around four hundred

Heat Maps

In the heat map given in Fig. 11, we observe that most of our audience (subscribers) comes from India, the United States, and Russia. An understandable reason for the same could be the popularity of IT and software professionals in these countries. Our channel has a high number of subscribers from other countries like the UK, Israel, Germany, Singapore, Pakistan, Bangladesh and Turkey. Heat Maps give us interesting insights into the geographic details of the subscribers. The darker is the shade

Traffic sources, playback location, OS, devices, and demographic characteristics of our channel

In this Section, we study the traffic sources, playback locations, operating systems and devices used for watching our channel and the demographic characteristics of the channel audience. In Fig. 12, we observe that 39% of the viewed videos have YouTube search as the traffic source. It implies that the highest percentage of our content is found through the search feature in YouTube. 18% of the videos have suggested videos as their traffic source. This implies that viewers use the suggested video

Content based analysis and user behaviour pattern

In this Section, we perform in-depth content based analysis of our YouTube channel. YouTube provides the feature of playlists. Playlists are basically used for combining the content of same type into a group which could be played sequentially. This allows the audience to receive videos belonging to the same category at a single place. We have used the playlist data to analyze the different categories of the content of our channel. We find if they follow the same characteristics as the channel

Conclusion

In this paper we have presented an extensive data-driven analysis of the popularity distribution and evolution of educational user-generated video content. We deduced the factors that impact the video growth rate in terms of popularity. We also studied the periodicity of the views and how exam months affect the growth in the number of views. The paper also presents the demographic details of the channel's viewership. To the best of our knowledge, this is the first paper to have


Cited by (48)

  • Modal density and coherence in science dissemination: Orchestrating multimodal ensembles in online TED talks and youtube science videos

    2022, Journal of English for Academic Purposes
    Citation Excerpt :

    As Thelwall et al. (2012) found out, the success of YouTube science videos depends on different variables (e.g., entertainment value, particular topic of research, visual images, etc.) that are not necessarily related to its scientific value. Saurabh and Gautam (2019) also point out that the number of views increases as the channel becomes established and more videos are uploaded. They also conclude that medium-length videos are preferable, and that the initial part of the video needs to be exciting enough to keep the viewers’ interest until the end.

  • Connected audiences in digital media markets: The dynamics of university online video impact

    2022, European Research on Management and Business Economics
    Citation Excerpt :

    The dynamics of audiences in online video have been modelled through different approaches (Borghol et al., 2011; Figueiredo, 2013; Trzciński & Rokita, 2017), with virality being a case of particular interest (Figueiredo, Benevenuto & Almeida, 2011; Jiang, Miao, Yang, Lan & Hauptmann, 2014; Khan & Vong, 2014). For educational videos, Saurabh and Gautham (2019), and Arroyo-Barrigüete et al. (2019) have shown that audiences of academic online videos move according to the academic calendar, with an increasing number of views during exam periods. Regarding university online videos, the dynamics of university audience on YouTube has also been analysed and compared to that of educational channels with results suggesting similar behaviours across both types of channels (Ros-Gálvez et al., 2021).

  • Measuring the impact and reach of informal educational videos on YouTube: The case of Scientific Animations Without Borders

    2021, Heliyon
    Citation Excerpt :

    In general, access data on YouTube indicates what, where, and how frequently video access points occur with respect to time, geographical location (country or region), and ostensibly demographics (age and gender)—only ostensibly because no consistently reliable means exist to confirm or guarantee that the user-reported demographics correspond to the user's actual demographics. Nevertheless, by disclosing how and when informal educational ICT messages were accessed and viewed (and for how long) by recipients, an analysis of user patterns and trends at the channel-level, including the global reach of top videos, can help validate the relationship between user activity metrics and messages transmitted (e.g., Park et al., 2016b; Saurabh and Gautam, 2019). While confirming whether primary success metrics (i.e., view count and subscribers) correlate at the channel- and video-levels is key, it also exposes gaps in the data that further research (or different methods) could fill.
