Content
Introduction
Background
Intended Audience
Scope
Storyline and Questions
Analysis
Key Highlights
Findings
Conclusion
References
Appendix
Dataset Preparation and Cleaning
Join Details
Calculation
1. Introduction
1.1 Background
Data visualization helps people understand the significance of data by placing it in a visual context. Patterns, trends, and correlations that may go unnoticed in text-based data can be exposed and recognized more easily with data visualization software such as Tableau and R Shiny (Rouse, 2012). In this project, we use a dataset collected from the Scratch website, built by the MIT Media Lab. Scratch is a web-based platform that makes it easy for young people to create programmable media such as interactive characters, games, and animated art, and to share them online. In this paper, we explore a framework for analyzing the Scratch longitudinal dataset, which captures a community that engages youth in learning from each other by sharing their creations, called projects.
1.2 Intended Audience
Although our intended audience includes the developers, designers, and management of the Scratch website, our analysis should be easily understandable to everyone, regardless of professional background. The aim of our analysis is to help the audience understand how the Scratch website is performing and how the platform could be improved.
1.3 Scope
One goal of Scratch is to foster young people’s creative knowledge through collaboration. To achieve this goal, Scratch provides many functions for interaction. For example, kids can create and upload new projects, love and download other users’ projects, and make friends with other users. In this paper, we sliced our dataset by loves, with data ranging from March 2007 to April 2012 and drawn from the Projects, Viewers, Downloaders, Friends, Favorites, and Users tables.
1.4 Storyline and Questions
The objective of this paper is to analyze different aspects of Scratch users, such as their needs, their interests, and the parameters that drive the application, by comparing trends between users from the United States and users from other countries in relation to their tendency to love a project. We explored the dataset by seeking answers to the following questions, which helped structure our story.
• What is the trend among users from different countries, and how are users doing?
• What is the impact of the decline in the number of users from 2010?
• How do the different attributes drive the Scratch application?
• How is the variable Love related to project views or project downloads?
• How is the Love data distributed over the available years?
• How does remixing of projects impact loves, and what might the future trend be?
2. Analysis
2.1 Key Highlights
– User “55931”, who is from the United States and has been on Scratch since 2007, accounts for the highest number of projects used in remixes.
– User “690170”, from the United States, showed the most appreciation by adding other users’ projects as favorites.
– Project “162167” was marked as a favorite by others the most times (2362).
– User “215110”, from the US, has used Scratch since 2009 and has the most followers.
– User “224810”, from the UK, has used Scratch since 2009 and follows the most users (4981).
2.2 Findings
Scratch Users – 1
Figure 1
After exploring the data in depth, we found that the largest share of users between 2007 and 2012 is from the United States, which accounts for 40% of total users. Users from other countries make up 59%, and only 0.60% of users deleted their profile from the application, which may indicate that user loyalty to the website is high. In total, 1835 users were deleted from the application. These users created about 3520 projects, which received only 396 loves. This suggests that these users were not very engaged with the application, so their projects attracted few loves. In addition, comparing users from the United States with users from other countries, we see that from 2009 onwards the total number of users from other countries increases, which indicates that the website began to gain popularity around the world. The fourth graph (Figure 1) shows that projects created by American users received many more loves than projects from other countries. This may be because American users are more familiar with how to create interesting projects that attract the attention of others. Together, these observations answer our first question, about the trend among users from different countries and how users are doing.
Scratch Users – 2
Figure 2
Continuing with our findings about users, we found several indications that US users were more involved than users from the rest of the world. Figure 2 shows that users who followed other users, users who were followed, and users who marked projects as favorites were all more numerous in the US. This indicates that users from the US were more involved in making new friends, following users, and appreciating others’ work, and that they created good projects and were therefore followed more. Graph 2 also shows that users follow each other across countries rather than only following users from their own country. Graph 3 (Figure 2) further suggests that the high numbers for US users are not simply a result of a large user base: the numbers grew at a normal pace as the number of users increased from 2009 onwards, as seen in Figure 1 (graph 3). In addition, the third graph (Figure 2) shows that the numbers of users who favorited projects, who were followed, and who followed others are close to one another and change in a similar way. Here too, we can see a decline from 2010 onwards. These three variables appear to be correlated and to affect each other. So, to answer the question about the impact of the decline in the number of users from 2010, we believe that the decline in the total number of users had a large impact on people making new friends and marking new projects as favorites.
Next, we look at how the different attributes drive the Scratch application.
Scratch User Activities
Figure 3
According to the first graph (Figure 3), many users like to create projects with more blocks, images, and scripts. This may be because projects that contain different types of content attract more attention from other users. It can also be observed that users were not much inclined to download or love other users’ projects. One reason might be that, since most users are young people from around the world, they feel no obligation to click love or download. Even if they appreciate a certain project, they may view it, take inspiration, and create something similar without necessarily downloading or loving it. The second graph (Figure 3) shows that a similar trend holds for projects from the United States, which means that most people prefer to create projects with more blocks.
As the previous analysis showed, few users give loves on the website. We therefore want to find out how the variable Love is related to project views and project downloads.
Loves relation with Views & Downloads
Figure 4
In Figure 4, we explore the relationship between a project being loved and the project being viewed or downloaded. We also used data from the Viewers and Downloaders tables to analyze what is logged there and to understand the correlation between them. Since the R-value is large, we can say that there is a strong correlation between projects being loved and projects being viewed or downloaded. In other words, projects with more views and more downloads are more likely to be loved. We also tried splitting projects into those by users from the US and those by users from other countries to check for a pattern, but the results are almost the same.
Having discussed projects being loved, let’s see how the Love data is distributed over the available years.
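As a rough illustration of the R-value discussed above, the following Python sketch computes a Pearson correlation coefficient between views and loves. The per-project counts are made up for illustration and are not taken from the Scratch dataset, and the function name pearson_r is hypothetical. The report does not state which correlation measure Tableau reported, so Pearson's r is an assumption here.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Made-up per-project counts (assumption, for illustration only).
views = [10, 25, 40, 80, 150]
loves = [0, 1, 2, 5, 9]

r = pearson_r(views, loves)
r_squared = r ** 2  # share of variance explained by a linear fit
print(f"r = {r:.3f}, r^2 = {r_squared:.3f}")
```

A value of r close to 1 means that loves rise almost linearly with views. It does not by itself show that views cause loves.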
Love Data Distribution
Figure 5
Figure 5 is interesting because it shows how projects were loved each quarter and how this varies with respect to each year’s average and the overall average.
1. Projects were loved most in 2010 and 2011.
2. From 2007 to 2010, loves increase from the first quarter to the fourth quarter (middle section of the graph). The exception is 2011, where a decline begins. This is consistent with all the trend lines declining that year.
3. The first section of the graph shows that 2007, 2008, and 2009 contribute loves that are below the overall average but always increasing.
Having discussed project loves at length, we now show how remixing of projects affects loves and what the future trend might be. Figure 6 shows that remixed projects (remixed by the author or by others) received fewer loves than original projects. The reason may be that remixed projects are fewer in number, or that people loved the original project but not the new ones. However, even though the numbers of projects may differ, the trend is almost the same. This analysis is important to us because it shows that remixing does not replace the essence of the original project, though the original can be an inspiration for something new. We also created a forecast through 2013 to see how the trend might continue, since we see a large decline from 2011 to 2012. If the forecast is to be believed, the chances of improvement in the following year are good, even though the observed data for the first quarter of 2012 shows a decline.
Year 2012 Forecast
Figure 6
Note: We did not include data from 2012 in the forecast, as the data does not cover the complete year.
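Tableau's forecasting feature is based on exponential smoothing. The Python sketch below shows one member of that family, Holt's linear-trend method, applied to made-up yearly totals. The function name holt_forecast, the data, and the smoothing parameters alpha and beta are all assumptions chosen by hand. Tableau selects its model automatically, so this illustrates the idea and does not reproduce the forecast in Figure 6.

```python
def holt_forecast(series, alpha, beta, horizon):
    """Forecast `horizon` steps ahead with Holt's linear-trend smoothing.

    alpha smooths the level and beta smooths the trend; both lie in (0, 1].
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    # Initialise the level with the first value and the trend with the first step.
    level = series[0]
    trend = series[1] - series[0]
    for value in series[1:]:
        previous_level = level
        level = alpha * value + (1 - alpha) * (level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
    # Project the final level forward along the final trend.
    return [level + step * trend for step in range(1, horizon + 1)]


# Made-up yearly love totals for 2007-2011 (assumption, illustration only).
# 2012 is left out, mirroring the note above about the incomplete year.
yearly_loves = [100, 220, 310, 450, 400]

forecast = holt_forecast(yearly_loves, alpha=0.5, beta=0.5, horizon=2)
print("forecast for 2012 and 2013:", forecast)
```

With only five yearly points, any such forecast is highly uncertain, so the output should be read as a direction, not a prediction.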
3. Conclusion
Young people spend a good deal of time online, not only watching music videos, chatting about dramas, playing video games, and browsing friends on Facebook but also engaging with digital media. They are typically thought of as consumers of media. Scratch offers an opportunity to engage these young minds as creators of programmable media, particularly interactive media. The Scratch website was launched in May 2007 and has since become a vibrant community, sharing more than one million projects. Each day, hundreds of Scratch users worldwide view, download, and love projects. The collection of projects, all built with programming blocks, is extremely diverse and includes interactive newsletters, games, animated cartoons, virtual tours, simulations, and many more. Participation in the Scratch online community spans a spectrum, from socializing, by learning group dynamics or interacting with others, to creating media, using a single process or combining several.
We analyzed the project dataset (specific tables) in Tableau by creating a variety of data visualizations. Although users in the United States outnumbered users in other countries at the beginning, the situation changed in 2009. From 2009 onwards, the number of users from other countries is higher than the number of users in the United States, which may indicate that the website became popular around the world. However, even though users from other countries now outnumber users in the United States, projects created by users in the United States have received many more loves in recent years. This may be because users in the United States are the website’s native users and are more familiar with how the site works. The impact of the declining number of users can also be seen overall, with a decrease in all activities by Scratch website users.
This analysis shows that the overall atmosphere and the interest in creating something new are influenced by social aspects and the social networking environment, which in turn influence the activity of creating projects for the website. In this paper, we conclude that the Scratch online community offers different forms of participation and collaboration that support young minds in growing as creators of interactive media.
4. References
-B. M. Hill, A. M. Hernandez, 2017. A longitudinal dataset of five years of public activity in the Scratch online community. Retrieved from https://www.nature.com/articles/sdata20172
-A. M. Hernandez, 2007. ScratchR: sharing user-generated programmable media. Retrieved from http://onesearch.northeastern.edu/primo-explore/fulldisplay?
-Karen Brennan, Andrés Monroy-Hernández, Mitchel Resnick, 2010. Making projects, making friends: Online community as catalyst for interactive media creation. Retrieved from http://onesearch.northeastern.edu/primo-explore/fulldisplay?
-Knaflic, C. N. (2015). Storytelling with Data: A Data Visualization Guide for Business Professionals. John Wiley & Sons.
-Getting Started with Visual Analytics. (n.d.). Retrieved from https://www.tableau.com/learn/tutorials/on-demand/getting-started-visual-analytics?product=&version=10.3&topic=visual_analytics
-Interpret the key results for Scatterplot – Minitab Express. (n.d.). Retrieved from http://support.minitab.com/en-us/minitab-express/1/help-and-how-to/graphs/scatterplot/interpret-the-results/key-results/
-Rouse, 2012. Data visualization. Retrieved from http://searchbusinessanalytics.techtarget.com/definition/data-visualization
-Scratch (programming language). (n.d.). Retrieved from https://en.wikipedia.org/wiki/Scratch_(programming_language)
5. Appendix
5.1 Dataset Preparation and Cleaning
To gain good insight from any business data, it is important to focus on data preparation before applying visualization concepts. An accurate, clean, and meaningful visualization requires a neat and well-prepared dataset. This not only helps an analyst gain accuracy in exploratory analysis but also enhances the overall business output. As part of this assignment, we were provided with six tables (Favorites, Friends, Users, Viewers, Downloaders, and Projects). Of these, we used four tables (Favorites, Friends, Viewers, and Downloaders) to create new tables.
Each new table was created in Tableau by aggregating the source table and then exporting the result to a separate CSV file.
Favorites was used to create Project_favorites and User_Favorites.
Project_favorites: the number of favorites per project.
User_Favorites: the number of favorites per user.
Friends was used to create No_Of_Followers and User_that_followed.
No_Of_Followers: the number of followers of each user.
User_that_followed: the number of users that a particular user followed.
Viewers was used to create Unique Views, the total number of views of a particular project.
Downloaders was used to create Unique downloads, the total number of downloads of a particular project.
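The aggregations described above were done in Tableau. As a minimal Python sketch of the same idea, the code below counts rows per key with the standard library. The row layout (user id, project id) and all values are illustrative assumptions, not the actual schema or data of the Scratch tables.

```python
from collections import Counter

# Hypothetical rows from the Favorites table: (user_id, project_id).
favorites = [
    (1, 100),
    (2, 100),
    (2, 101),
    (3, 100),
]

# Project_favorites: number of favorites per project.
project_favorites = Counter(project_id for _, project_id in favorites)

# User_Favorites: number of favorites per user.
user_favorites = Counter(user_id for user_id, _ in favorites)

# Hypothetical rows from the Viewers table: (user_id, project_id).
# User 1 viewed project 100 twice.
viewers = [(1, 100), (1, 100), (2, 100), (3, 101)]

# Unique Views, assuming "unique" means each viewer counts once per project:
# duplicates are dropped with set() before counting.
unique_views = Counter(project_id for _, project_id in set(viewers))

print(dict(project_favorites), dict(user_favorites), dict(unique_views))
```

The same pattern applies to the Friends and Downloaders tables by changing which column is used as the key.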
In the Users table, Country was entered freely by users with no restrictions, so there were many incorrect values. We therefore created a new CSV file, usernew, with a new column that classifies each country as United States or Other countries.
We then outer-joined the Projects table with seven other files: Project_favorites, User_Favorites, No_Of_Followers, User_that_followed, Unique Views, Unique downloads, and usernew.
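The report does not list the rules used to build the new country column. The Python sketch below shows one plausible way to do it, assuming a small hand-written set of spellings for the United States. The alias list and the function name country_group are hypothetical, and treating blank or unrecognized values as Other countries is also an assumption.

```python
# Spellings treated as the United States (assumed list, not from the report).
US_ALIASES = {
    "united states",
    "united states of america",
    "usa",
    "us",
    "u.s.",
    "u.s.a.",
}


def country_group(raw):
    """Map a free-text country value to 'United States' or 'Other countries'."""
    cleaned = (raw or "").strip().lower()
    return "United States" if cleaned in US_ALIASES else "Other countries"


# Made-up examples of free-text entries.
for value in [" USA ", "united states", "Brazil", "", None]:
    print(repr(value), "->", country_group(value))
```

In practice the alias list would need to be extended after inspecting the distinct values that actually occur in the Country column.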
5.2 Join Details
Project Table Columns | Other Table Column | Other Table Name
User Id | Followed User Id | No_Of_Followers
Project Id | Project Id | Project_favorites
Project Id | Project Id1 | Unique Downloads
Project Id | Project Id | Unique Views
User Id | User Id1 | User_Favorites
User Id | Follower User Id | User_that_followed
User Id | User Id | usernew
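The joins listed above were configured in Tableau. As a hedged Python sketch, the code below performs a left outer join that keeps every project row and fills missing matches with None. The report only says "outer join", so the left-join direction is an assumption, as are the function name left_outer_join, the count column names (Favorites, Followers), and all values.

```python
def left_outer_join(left_rows, right_rows, left_key, right_key):
    """Keep every left row; add the matching right row's fields, or None."""
    # Index the right table by its key (assumes the key is unique there,
    # which holds for the aggregated tables: one row per project or user).
    index = {row[right_key]: row for row in right_rows}
    extra_fields = sorted({k for row in right_rows for k in row if k != right_key})
    joined = []
    for row in left_rows:
        match = index.get(row[left_key])
        merged = dict(row)
        for field in extra_fields:
            merged[field] = match.get(field) if match else None
        joined.append(merged)
    return joined


# Made-up rows, keyed as in the join table above.
projects = [
    {"Project Id": 100, "User Id": 1},
    {"Project Id": 101, "User Id": 2},
]
project_favorites_rows = [{"Project Id": 100, "Favorites": 3}]
no_of_followers_rows = [{"Followed User Id": 2, "Followers": 7}]

joined = left_outer_join(projects, project_favorites_rows, "Project Id", "Project Id")
joined = left_outer_join(joined, no_of_followers_rows, "User Id", "Followed User Id")
for row in joined:
    print(row)
```

Project 101 has no favorites row and user 1 has no followers row, so those fields come back as None, which is how unmatched rows appear after an outer join.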
5.3 Calculation
For Figure 5, we used the following calculations on the Lovers data.
Love-Diff from All years avg: ZN(SUM([Lovers Website])) - WINDOW_AVG(SUM([Lovers Website]))
Love-Diff from years avg: SUM([Lovers Website]) - WINDOW_AVG(SUM([Lovers Website]), FIRST(), LAST())
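Both calculations subtract a window average from each quarter's total. The Python sketch below mirrors that idea, assuming the first calculation averages over all quarters and the second averages within each year. How the window is partitioned depends on the table calculation settings in Tableau, which the report does not list, so that split is an assumption. The quarterly totals are made up.

```python
from statistics import mean

# Made-up quarterly love totals keyed by (year, quarter); illustration only.
quarterly_loves = {
    (2010, 1): 100, (2010, 2): 120, (2010, 3): 140, (2010, 4): 160,
    (2011, 1): 200, (2011, 2): 180, (2011, 3): 160, (2011, 4): 140,
}

# Average over every quarter in the data (window = all years).
overall_avg = mean(quarterly_loves.values())

# Average over the quarters of each year (window = one year).
year_avg = {
    year: mean(v for (y, _), v in quarterly_loves.items() if y == year)
    for year in {y for y, _ in quarterly_loves}
}

# Love-Diff from All years avg: quarter total minus the overall average.
diff_from_all_years_avg = {k: v - overall_avg for k, v in quarterly_loves.items()}

# Love-Diff from years avg: quarter total minus that year's average.
diff_from_year_avg = {k: v - year_avg[k[0]] for k, v in quarterly_loves.items()}

for key in sorted(quarterly_loves):
    print(key, diff_from_all_years_avg[key], diff_from_year_avg[key])
```

A positive difference marks a quarter that is above the chosen average, and a negative one marks a quarter below it.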