Gray and Suzor (2020)

From Copyright EVIDENCE

Source Details

Title: Playing with machines: Using machine learning to understand automated copyright enforcement at scale
Author(s): Gray, J.E., Suzor, N.P.
Year: 2020
Citation: Gray, J. E., & Suzor, N. P. (2020). Playing with machines: Using machine learning to understand automated copyright enforcement at scale. Big Data & Society, 7(1).
Link(s): Definitive, Open Access
Key Related Studies:
Discipline:
Linked by:
About the Data
Data Description: The authors worked with a dataset of 76.7 million YouTube videos. They first sorted the different explanations given for video removals into six groups and then defined five categories into which the videos were classified. To identify the videos in each category, they trained a machine learning classifier based on the pre-trained Bidirectional Encoder Representations from Transformers (BERT) language model. The classifier assigned more than 12 million videos to one of the categories, after which the authors analysed the relationships between video removals, the categories and additional variables. The classification was used mostly to identify video metadata and to conduct qualitative analysis.
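The analysis step described above, relating removals to the classifier's categories, can be illustrated with a minimal Python sketch. It assumes each video already carries a category label (in the study this was assigned by the BERT-based classifier, which is not reproduced here) and a removal reason drawn from the grouped explanations. The `Video` record, the `removal_rates` function, and all category and reason labels are hypothetical and do not reflect the authors' actual schema, code, or data.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Video:
    # Hypothetical record: the category assigned by a classifier and the
    # removal reason observed for the video (None means still available).
    video_id: str
    category: str
    removal_reason: Optional[str]


def removal_rates(videos):
    """Return {category: {reason: share of that category's videos}}.

    The key "available" counts videos that were not removed, so the
    shares for each category sum to 1.
    """
    totals = Counter(v.category for v in videos)
    counts = defaultdict(Counter)
    for v in videos:
        counts[v.category][v.removal_reason or "available"] += 1
    return {
        cat: {reason: n / totals[cat] for reason, n in reasons.items()}
        for cat, reasons in counts.items()
    }


if __name__ == "__main__":
    # Toy, made-up records for illustration only; not data from the study.
    sample = [
        Video("a1", "gameplay", None),
        Video("a2", "gameplay", "content_id_block"),
        Video("b1", "sports", "dmca_takedown"),
        Video("b2", "sports", "content_id_block"),
    ]
    for category, shares in sorted(removal_rates(sample).items()):
        print(category, shares)
```

Reporting shares within each category, rather than raw counts, lets categories of very different sizes be compared directly.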
Data Type: Primary data
Secondary Data Sources:
Data Collection Methods:
Data Analysis Methods:
Industry(ies):
Country(ies):
Cross Country Study?: No
Comparative Study?: No
Literature review?: No
Government or policy study?: No
Time Period(s) of Collection:
  • 2016-2017
Funder(s):
  • Australian Research Council DECRA Fellowship
  • ARC Discovery Projects

Abstract

“This article presents the results of methodological experimentation that utilises machine learning to investigate automated copyright enforcement on YouTube. Using a dataset of 76.7 million YouTube videos, we explore how digital and computational methods can be leveraged to better understand content moderation and copyright enforcement at a large scale. We used the BERT language model to train a machine learning classifier to identify videos in categories that reflect ongoing controversies in copyright takedowns. We use this to explore, in a granular way, how copyright is enforced on YouTube, using both statistical methods and qualitative analysis of our categorised dataset. We provide a large-scale systematic analysis of removals rates from Content ID’s automated detection system and the largely automated, text search based, Digital Millennium Copyright Act notice and takedown system. These are complex systems that are often difficult to analyse, and YouTube only makes available data at high levels of abstraction. Our analysis provides a comparison of different types of automation in content moderation, and we show how these different systems play out across different categories of content. We hope that this work provides a methodological base for continued experimentation with the use of digital and computational methods to enable large-scale analysis of the operation of automated systems”.

Main Results of the Study

This study provides a systematic analysis of content removal rates on YouTube. Most removed videos were taken down by the uploaders themselves, followed by removals due to account termination and Content ID blocks; the fewest removals resulted from DMCA notices. The authors observe that YouTube’s automated content moderation involves noticeable discretion in decision-making and “a potential lack of contextual sensitivity”. Removal rates were very high for videos associated with piracy and for sports content. By contrast, game publishers did not enforce substantial removals of gameplay videos; substantial removals of such videos were driven only by music rightsholders. Videos about hacks also had high removal rates due to breaches of YouTube’s Terms of Service.

Policy Implications as Stated By Author

Coverage of Study

Coverage of Fundamental Issues
Issue | Included within Study
Relationship between protection (subject matter/term/scope) and supply/economic development/growth/welfare | No
Relationship between creative process and protection - what motivates creators (e.g. attribution; control; remuneration; time allocation)? | No
Harmony of interest assumption between authors and publishers (creators and producers/investors) | No
Effects of protection on industry structure (e.g. oligopolies; competition; economics of superstars; business models; technology adoption) | No
Understanding consumption/use (e.g. determinants of unlawful behaviour; user-generated content; social media) | Yes
Coverage of Evidence Based Policies
Issue | Included within Study
Nature and Scope of exclusive rights (hyperlinking/browsing; reproduction right) | No
Exceptions (distinguish innovation and public policy purposes; open-ended/closed list; commercial/non-commercial distinction) | Yes
Mass digitisation/orphan works (non-use; extended collective licensing) | No
Licensing and Business models (collecting societies; meta data; exchanges/hubs; windowing; crossborder availability) | No
Fair remuneration (levies; copyright contracts) | No
Enforcement (quantifying infringement; criminal sanctions; intermediary liability; graduated response; litigation and court data; commercial/non-commercial distinction; education and awareness) | Yes

Datasets
