Constructing the Dataset
Like all other major social networks, Instagram has an API that allows anybody to download some of its images and selected associated data. You can download images by location (the method used in this project), by tag, or by username. For each image, you can retrieve its tags, description, date and time stamp, geo coordinates, username, the list of users who liked it, and comments. If a user does not publicly share all or some of her images, they are not available for download. Images deleted by users are also not available.
Instagram does not guarantee that you can download every image that was shared within a particular area and timeframe, but we can assume that we are getting a large part of what was shared and, most importantly, that this part has the same characteristics as the whole (i.e., images are distributed over the area in the same way, they have the same tags, etc.).
Because Instagram imposes limits on how much can be downloaded (up to 500 images per hour), in all our recent projects we used the third-party service Gnip for downloads. Gnip’s limit is 10 million images per 24 hours.
We began downloading all available Instagram images in the central part of Kyiv in late January 2014. We did not know what we would do with them, but we felt that they would be an important resource for a possible project.
Jay Chow (a member of our lab) also created a custom software tool that let us see the 500 latest images shared in a particular area defined by geo coordinates, updated in real time. We used this tool to get a feeling for what people were sharing in different cities, exploring the patterns before committing to a project. This tool functioned as a telescope for seeing a local part of the immense Instagram universe.
As political and social events in Kyiv were unfolding, we were checking the images with this telescope tool. Although we also followed media reports about the Kyiv events, the selection of a particular time period to use in this project was based on what we were observing in real time: changing patterns in image content, colors, and compositions.
We started collecting images and data from Instagram on 2/02/2014 and stopped on 5/15/2014. During this period, we were able to download 463,989 media records (3.6% are videos and the rest are still images).
The collection area was centered on Independence Square. Although we specified a rectangular area, the images returned by Instagram cover a somewhat different shape: 3.93 miles (6.324 km) x 6.201 miles (9.977 km). The coordinates of this area are 50.4055 - 50.4952 (latitude) and 30.4847 - 30.5739 (longitude).
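The reported dimensions can be checked against the reported coordinates. The following is a minimal sketch, not the procedure used in the project: it assumes a spherical Earth with a mean radius of 6371 km and measures the east-west width along the middle latitude of the box. The function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: mean radius of a spherical Earth

# Bounding box coordinates as reported in the text (degrees).
LAT_MIN, LAT_MAX = 50.4055, 50.4952
LON_MIN, LON_MAX = 30.4847, 30.5739


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def bbox_extent_km(lat_min, lat_max, lon_min, lon_max):
    """Return (east-west width, north-south height) of the box in km.

    The width is measured along the middle latitude of the box.
    """
    lat_mid = (lat_min + lat_max) / 2
    width = haversine_km(lat_mid, lon_min, lat_mid, lon_max)
    height = haversine_km(lat_min, lon_min, lat_max, lon_min)
    return width, height


width_km, height_km = bbox_extent_km(LAT_MIN, LAT_MAX, LON_MIN, LON_MAX)
print(f"width:  {width_km:.3f} km")
print(f"height: {height_km:.3f} km")
```

Under these assumptions the computed width and height agree with the reported 6.324 km and 9.977 km to within a few metres; the small residual depends on the Earth model and on the latitude at which the width is measured.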
Here is the key data about our final dataset:
In the project we pay particular attention to images with one or more Maidan tags: #майдан, #maidan, #euromaidan, #євромайдан, #евромайдан, #euromaydan, #Euromaidan. These seven tags account for 1,340 images (10% of the total number of images). (In addition, we examined the 100 top tags and identified a number of additional protest-related tags: #war, #prayforukraine, #беркут, #fire, #freedom. Because the frequency of these additional tags in our dataset was quite low, we decided to focus on Maidan tags in the presentation.)
As we discovered, Maidan tags were sometimes used to tag images that do not appear to have any obvious connection to the protests. Other tags such as #Kyiv and #Ukraine were used both for protest-related images and for unrelated images. So we cannot rely exclusively on tags to understand the subjects of images.
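Selecting images by Maidan tags can be sketched as follows. This is an illustration only, not the project's code: the record layout (`id`, `tags`) and the sample records are hypothetical, and the sketch folds letter case, so #euromaidan and #Euromaidan count as one tag here even though the text lists them separately.

```python
# Maidan tags from the text, stored without "#" and in folded case.
MAIDAN_TAGS = {
    "майдан", "maidan", "euromaidan",
    "євромайдан", "евромайдан", "euromaydan",
}


def normalize_tag(tag):
    """Strip a leading '#' and fold case (an assumption of this sketch)."""
    return tag.lstrip("#").casefold()


def has_maidan_tag(tags):
    """True if at least one tag in the list is a Maidan tag."""
    return any(normalize_tag(t) in MAIDAN_TAGS for t in tags)


# Hypothetical records; field names are assumed for illustration.
records = [
    {"id": "a", "tags": ["#Euromaidan", "#kyiv"]},
    {"id": "b", "tags": ["#coffee", "#Kyiv"]},
    {"id": "c", "tags": ["#майдан"]},
]

tagged_ids = [r["id"] for r in records if has_maidan_tag(r["tags"])]
share = len(tagged_ids) / len(records)
print(tagged_ids, f"{share:.0%}")
```

A filter like this only reports which images carry a tag. It says nothing about whether the image actually shows the protests, which is why tag matching alone is not enough.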
Therefore, we also explored images in other ways. We visualized all shared images vs. images with Maidan tags. In other visualizations, each image is repeated multiple times, once for each of its tags. We also used data clustering to discover visual groupings that may not be obvious by direct examination. Finally, we manually divided 818 images with Maidan tags into our own categories, mapping and interpreting the iconography of the protests.
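Repeating an image once per tag amounts to expanding each record into (tag, image) pairs. A minimal sketch, assuming hypothetical records with `id` and `tags` fields (these names are not from the project):

```python
from collections import defaultdict

# Hypothetical records; field names are assumed for illustration.
records = [
    {"id": "img1", "tags": ["#maidan", "#kyiv"]},
    {"id": "img2", "tags": ["#kyiv"]},
    {"id": "img3", "tags": []},
]


def explode_by_tag(records):
    """Return one (tag, image id) pair per tag, so multi-tag images repeat."""
    return [(tag, rec["id"]) for rec in records for tag in rec["tags"]]


def group_by_tag(records):
    """Map each tag to the ids of the images that carry it."""
    groups = defaultdict(list)
    for tag, image_id in explode_by_tag(records):
        groups[tag].append(image_id)
    return dict(groups)


pairs = explode_by_tag(records)
groups = group_by_tag(records)
print(pairs)
print(groups)
```

Note that an image without tags (img3 here) drops out of such a view entirely, and an image with many tags is counted many times, so totals in a per-tag view do not equal the number of images.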
To analyze and visualize the collected images and data, we used a variety of software: R, Python, Mondrian, as well as our own custom ImagePlot and ImageMontage visualization tools. These tools, with their documentation, are distributed as open source by our lab, the Software Studies Initiative.
Protecting social media users' privacy is crucial in any research project, and it is even more important when we use visual media shared by people during political events.
Beginning in the early 2000s, the companies running all large social media networks started to make selected user content available to anybody through a standard software technology: the API, which stands for application programming interface (read more about the history of social media service APIs).
The main purpose of an API is to enable third-party companies to create additional services and apps that can interact with a social media network. This brings more users and content to the network. For example, all third-party apps that allow you to post to many social networks at once (e.g. Buffer) and see your stats are using APIs from these networks.
But the same data from social networks is also available to anybody. Consequently, researchers from computer science and other disciplines have published hundreds of thousands of articles that use social media data from Twitter, Flickr, YouTube, and other networks obtained through this technology.
Depending on the network service, different types of data are available via the API. Data available from Instagram includes the username for each image, the date and time it was shared, its geo location (latitude and longitude), its tags, a description, and the name of the filter if one was used. It also contains a link to the user's profile on Instagram. No other user information (such as email or profile information) is provided.
We have used two approaches to protect user privacy in our project. First, none of our visualizations, graphs, or their labels shows any usernames or any additional information we could have obtained by following the links to user profiles.
Second, in our visualizations Instagram images always appear as small thumbnails, no larger than 100 x 100 pixels.
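Both measures can be expressed as small preprocessing steps. The sketch below is an illustration under stated assumptions, not the project's code: the field names `username` and `profile_url` are hypothetical, and the thumbnail rule is an assumed one that fits an image inside 100 x 100 pixels while preserving its aspect ratio and never enlarging it.

```python
MAX_THUMB_SIDE = 100  # thumbnail limit stated in the text (pixels)

# Hypothetical names for the identifying fields of a record.
PRIVATE_FIELDS = {"username", "profile_url"}


def strip_identifying_fields(record):
    """Return a copy of the record without the identifying fields."""
    return {k: v for k, v in record.items() if k not in PRIVATE_FIELDS}


def thumbnail_size(width, height, max_side=MAX_THUMB_SIDE):
    """Scale (width, height) to fit within max_side x max_side.

    Preserves the aspect ratio and never enlarges a smaller image.
    """
    scale = min(max_side / width, max_side / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))


# Hypothetical record, for illustration only.
record = {
    "id": "img1",
    "username": "someuser",
    "profile_url": "https://example.com/someuser",
    "tags": ["#kyiv"],
}

public_record = strip_identifying_fields(record)
print(public_record)
print(thumbnail_size(640, 640))
print(thumbnail_size(612, 306))
```

Stripping fields before any plotting step, rather than hiding them at display time, means a label or tooltip cannot expose a username by accident.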
If anybody finds an image in any visualization that they want removed (because they shared the photo, or because they appear in it), they can send us a request via email, and we will promptly remove this image from the visualizations.