Author: Eric McLachlan
This work was done as the term project for a master's course titled "Societal Impacts of NLP", offered as part of the Computational Linguistics Master of Science program at the University of Washington.
To install the necessary software, run:
python -m pip install -r requirements.txt
Add your Perspective API key to the environment:
export PERSPECTIVE_API_KEY="Replace this with your Perspective API key"
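As a rough illustration of how the exported key might be consumed, the sketch below reads `PERSPECTIVE_API_KEY` from the environment and builds a TOXICITY scoring request for the Perspective API. The function names (`get_api_key`, `build_request`, `score_toxicity`) are hypothetical and are not taken from this project's code; only the environment variable name comes from the instructions above.

```python
import json
import os
import urllib.request

# Public endpoint of the Perspective API "analyze" method.
ANALYZE_URL = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"


def get_api_key(env=os.environ):
    """Return the key exported as PERSPECTIVE_API_KEY, or raise if it is missing."""
    key = env.get("PERSPECTIVE_API_KEY", "").strip()
    if not key:
        raise RuntimeError("PERSPECTIVE_API_KEY is not set")
    return key


def build_request(text, api_key):
    """Build (but do not send) a TOXICITY scoring request for one comment."""
    body = {
        "comment": {"text": text},
        "requestedAttributes": {"TOXICITY": {}},
    }
    return urllib.request.Request(
        f"{ANALYZE_URL}?key={api_key}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def score_toxicity(text):
    """Send the request and return the summary TOXICITY score (needs network access)."""
    request = build_request(text, get_api_key())
    with urllib.request.urlopen(request) as response:
        result = json.load(response)
    return result["attributeScores"]["TOXICITY"]["summaryScore"]["value"]
```

Calling `score_toxicity("some comment")` requires a valid key and network access. `build_request` is kept separate so the payload can be inspected without sending anything.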
This project relies on the work of Su Lin Blodgett, Lisa Green, and Brendan O'Connor (EMNLP 2016).
Thanks to all of them for making their work available to the community.
Their original classifier can be found here: https://github.com/slanglab/twitteraae
The paper describing the classifier can be cited as:
@inproceedings{blodgett2016demographic,
  author = {Blodgett, Su Lin and Green, Lisa and O'Connor, Brendan},
  title = {{Demographic Dialectal Variation in Social Media: A Case Study of African-American English}},
  booktitle = {Proceedings of EMNLP},
  year = 2016
}
According to the website for the Wikipedia Talk Labels: Personal Attacks data set:
This data set includes over 100k labeled discussion comments from English Wikipedia. Each comment was labeled by multiple annotators via Crowdflower on whether it contains a personal attack.
DOI: https://doi.org/10.6084/m9.figshare.4054689.v6
Wiki: https://meta.wikimedia.org/wiki/Research:Detox/Data_Release
Citation:
Wulczyn, Ellery; Thain, Nithum; Dixon, Lucas (2017): Wikipedia Talk Labels: Personal Attacks. figshare. Dataset. https://doi.org/10.6084/m9.figshare.4054689.v6
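Because each comment carries labels from multiple annotators, some aggregation step is usually needed before a comment can be treated as an attack or not. The sketch below shows one common choice, a majority vote per comment. It is an assumption for illustration and not necessarily how this project aggregates labels. The field names `rev_id` and `attack` are likewise assumed and should be checked against the downloaded files.

```python
from collections import defaultdict


def majority_attack_labels(annotations):
    """Map each comment id to True when more than half of its annotators flagged an attack."""
    votes = defaultdict(list)
    for row in annotations:
        # Assumed fields: "rev_id" identifies the comment, "attack" is 0 or 1.
        votes[row["rev_id"]].append(float(row["attack"]))
    return {rev_id: sum(v) / len(v) > 0.5 for rev_id, v in votes.items()}


# Hypothetical rows, shaped like what csv.DictReader would yield for an annotations file.
sample = [
    {"rev_id": "1", "attack": "1.0"},
    {"rev_id": "1", "attack": "1.0"},
    {"rev_id": "1", "attack": "0.0"},
    {"rev_id": "2", "attack": "0.0"},
    {"rev_id": "2", "attack": "1.0"},
]
labels = majority_attack_labels(sample)
print(labels)
```

A tie (as for comment "2" in the sample) is treated as not an attack here. A different threshold may suit other analyses.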