Project 4 of the Udacity Data Analyst Nanodegree Program.
Data wrangling of tweets gathered from three different sources, each with a different file format. The dataset was assessed for structure and data quality, then cleaned so that it could be analyzed.
- The highest 'rating', calculated as 'rating_numerator' / 'rating_denominator', is 177.6.
- The number of dogs differs widely by stage: 84% of the dogs have the stage 'none'.
- Among the dogs that are classified (stage other than 'none'), 66% are 'pupper', 20% are 'doggo', 7.5% are 'puppo', 2.3% are 'floofer', and the rest are classified with two stages.
- Considering only single stages, 'puppo' has the highest average favorite count, followed by 'doggo', 'floofer' and 'pupper'.
- The p2 algorithm identified the largest number of dogs (1553), followed by p1 (1532) and p3 (1499).
- The algorithms p1, p2 and p3 identified 111, 113 and 116 different dog breeds, respectively. By this metric, the algorithms differ little.
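The rating and stage figures in the findings above could be computed roughly as in the minimal sketch below, written in plain Python with only the standard library. The rows are made-up sample values, not project data. The column names `stage` and `favorite_count`, and the comma-joined encoding of dogs with two stages (e.g. "doggo,pupper"), are assumptions for illustration and may not match the project's own cleaned table. With pandas, the same steps would usually be done with DataFrame operations.

```python
from collections import Counter

# Hypothetical sample rows (made-up values). Assumed columns: 'stage' holds one
# stage, 'none', or two stages joined by a comma; 'favorite_count' is the likes.
tweets = [
    {"rating_numerator": 13, "rating_denominator": 10, "stage": "pupper", "favorite_count": 1200},
    {"rating_numerator": 12, "rating_denominator": 10, "stage": "doggo", "favorite_count": 3400},
    {"rating_numerator": 14, "rating_denominator": 10, "stage": "puppo", "favorite_count": 9800},
    {"rating_numerator": 11, "rating_denominator": 10, "stage": "none", "favorite_count": 800},
    {"rating_numerator": 10, "rating_denominator": 10, "stage": "none", "favorite_count": 500},
    {"rating_numerator": 12, "rating_denominator": 10, "stage": "doggo,pupper", "favorite_count": 2100},
]


def rating(row):
    """rating = rating_numerator / rating_denominator"""
    return row["rating_numerator"] / row["rating_denominator"]


def stage_shares(rows, exclude="none"):
    """Percentage of each stage among rows whose stage is not `exclude`."""
    counts = Counter(r["stage"] for r in rows if r["stage"] != exclude)
    total = sum(counts.values())
    return {stage: 100 * n / total for stage, n in counts.items()}


def mean_favorites_by_stage(rows):
    """Average favorite_count per single stage; 'none' and two-stage rows are skipped."""
    sums, counts = Counter(), Counter()
    for r in rows:
        stage = r["stage"]
        if stage == "none" or "," in stage:
            continue
        sums[stage] += r["favorite_count"]
        counts[stage] += 1
    return {stage: sums[stage] / counts[stage] for stage in counts}


print(max(rating(r) for r in tweets))  # 1.4
print(stage_shares(tweets))
avg = mean_favorites_by_stage(tweets)
print(sorted(avg, key=avg.get, reverse=True))  # ['puppo', 'doggo', 'pupper']
```

Excluding 'none' before computing shares mirrors how the stage percentages are stated relative to classified dogs only.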
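The p1, p2 and p3 comparisons in the findings above (dogs identified and distinct breeds) could be sketched as follows. This assumes each prediction row has a label column (`p1`, `p2`, `p3`) and a boolean companion (`p1_dog`, `p2_dog`, `p3_dog`) saying whether that label is a dog breed. The layout and the sample rows are illustrative assumptions, not taken from this README, and `dog_summary` is a hypothetical helper name.

```python
# Hypothetical prediction rows (made-up labels). Assumed layout: pN is the
# predicted label and pN_dog says whether that label is a dog breed.
predictions = [
    {"p1": "golden_retriever", "p1_dog": True, "p2": "Labrador_retriever", "p2_dog": True, "p3": "kuvasz", "p3_dog": True},
    {"p1": "paper_towel", "p1_dog": False, "p2": "Pembroke", "p2_dog": True, "p3": "Chihuahua", "p3_dog": True},
    {"p1": "golden_retriever", "p1_dog": True, "p2": "cocker_spaniel", "p2_dog": True, "p3": "teddy", "p3_dog": False},
]


def dog_summary(rows, col):
    """Return (dogs identified, distinct breeds) for prediction column `col`."""
    breeds = [r[col] for r in rows if r[f"{col}_dog"]]
    return len(breeds), len(set(breeds))


for col in ("p1", "p2", "p3"):
    n_dogs, n_breeds = dog_summary(predictions, col)
    print(f"{col}: dogs={n_dogs}, breeds={n_breeds}")
```

Counting distinct breeds only over rows flagged as dogs keeps non-dog labels (such as "paper_towel") out of the breed totals.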
UDACITY - Data Analyst Nanodegree Program: https://www.udacity.com/course/data-analyst-nanodegree--nd002
WeRateDogs, Twitter profile (@dog_rates): https://twitter.com/dog_rates/status/749981277374128128