Week: Images
Overview
After several weeks of deep dives into web scraping, APIs and professional development tools, we now bring these skills together.
This week we'll begin to scrape, clean and analyze both natural text and images. We'll learn how to process text and images scraped from the web, requested via REST APIs, or obtained from pre-existing datasets.
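Below is a minimal, standard-library sketch of the "clean and analyze" step for natural text. It lowercases scraped posts, strips URLs, tokenizes, drops a small stopword list, and counts the most frequent words. The stopword set, the regexes, and the function names (`clean_text`, `top_words`) are illustrative assumptions, not a prescribed pipeline. A real project would likely swap in a dedicated NLP library's tokenizer and stopword list.

```python
import re
from collections import Counter

# Hypothetical stopword list for illustration only; real projects usually
# rely on a curated list from an NLP library.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "is", "in", "for", "on"}

URL_RE = re.compile(r"https?://\S+")    # anything that looks like a web link
TOKEN_RE = re.compile(r"[a-z0-9']+")    # runs of lowercase letters, digits, apostrophes


def clean_text(raw: str) -> list[str]:
    """Lowercase, remove URLs, tokenize, and drop stopwords."""
    text = URL_RE.sub(" ", raw.lower())
    tokens = TOKEN_RE.findall(text)
    return [t for t in tokens if t not in STOPWORDS]


def top_words(docs: list[str], n: int = 5) -> list[tuple[str, int]]:
    """Count cleaned tokens across all scraped documents; return the n most common."""
    counts = Counter()
    for doc in docs:
        counts.update(clean_text(doc))
    return counts.most_common(n)


if __name__ == "__main__":
    # Hypothetical scraped posts.
    posts = [
        "Check out the new image at https://i.redd.it/example.jpg",
        "The image pipeline is the best part of the week",
    ]
    print(top_words(posts, 3))
```

The same `clean_text` step applies whether the text came from a scraper, a REST API response, or an existing dataset. Only the loading code changes.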
Applications
[Monday]:
[Wednesday]:
[Friday]:
- How to Scrape Reddit with Python (2018): An exercise in getting old code to work
- 13 Ways To Scrape Any Public Data From Any Website: A great overview that unifies and contextualizes some of the scraping/API methods studied so far
- (Peruse) Surfing the Data Pipeline with Python: Another simplified, unified perspective on our weeks of scraping/APIs
- FreeCodeCamp: Python Scraping Tutorial - tweepy & snscrape: Be judicious; pick the right tool for the right job
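Scraping images from Reddit boils down to two steps. First, request a subreddit listing while identifying yourself with a descriptive User-Agent. Second, pull direct image links out of the returned JSON. The sketch below shows both, using only the standard library. It assumes Reddit's public `.json` listing endpoint (appending `.json` to a subreddit URL) is reachable. It also assumes each post's link sits at `listing["data"]["children"][i]["data"]["url"]`. The helper names and the extension list are hypothetical. PRAW, with app credentials from reddit.com/prefs/apps, is the authenticated alternative.

```python
import json
from urllib.parse import urlparse
from urllib.request import Request, urlopen

# File extensions treated as direct image links (an assumption; extend as needed).
IMAGE_EXTS = (".jpg", ".jpeg", ".png", ".gif")


def build_listing_request(subreddit: str, user_agent: str, limit: int = 25) -> Request:
    """Build (but do not send) a GET request for a subreddit's top posts as JSON."""
    url = f"https://www.reddit.com/r/{subreddit}/top.json?limit={limit}"
    # Reddit asks clients to identify themselves with a descriptive User-Agent
    # instead of a generic library default.
    return Request(url, headers={"User-Agent": user_agent})


def fetch_listing(req: Request) -> dict:
    """Send the request and decode the JSON listing (requires network access)."""
    with urlopen(req, timeout=10) as resp:
        return json.load(resp)


def image_urls(listing: dict) -> list[str]:
    """Return post URLs whose path ends in a known image extension."""
    urls = []
    for child in listing.get("data", {}).get("children", []):
        url = child.get("data", {}).get("url", "")
        # Check only the URL path so query strings such as "?width=640" are ignored.
        if urlparse(url).path.lower().endswith(IMAGE_EXTS):
            urls.append(url)
    return urls
```

For example, `image_urls(fetch_listing(build_listing_request("pics", "script:course-scraper:v0.1 (by /u/your_username)")))` would return the direct image links among the top posts. Gallery posts link to a gallery page rather than an image file, so this filter skips them. Gallery-aware tools such as scraperr target that case.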
Resources
- reddit.com/prefs/apps: Register a Reddit app to obtain the client ID and secret that PRAW needs
- PRAW API: The most popular Python wrapper for the Reddit API
- skimage API: GOFAI/ML processing of images
- scraperr Library: Scrape Reddit Images/Galleries
- RedDownloader: Scrape Reddit Images w/o API
- snscrape Library: (Multiple) Social Network Scraper
- scweet Library: A new API-free Twitter scraper (+images)
- Python Lib: user_agent: Generate random, realistic User-Agent header strings
- MS BI Dashboard: Scrape Twitter for Sentiment Analysis: Twitter scraping + NLP/sentiment analysis + visualization
- Playwright: Web crawler automation engine (alternative to Selenium)
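To make the image-processing side concrete, here is a pure-Python sketch of two basic operations that skimage provides: RGB-to-grayscale conversion and global thresholding. The luma weights are the ones scikit-image documents for `skimage.color.rgb2gray`. The nested-list image format and the fixed threshold of 0.5 are assumptions chosen for illustration. In practice you would call `skimage.color.rgb2gray` on a NumPy array, and you might pick the threshold automatically with `skimage.filters.threshold_otsu`.

```python
# Luma weights documented for skimage.color.rgb2gray:
#   Y = 0.2125 R + 0.7154 G + 0.0721 B
WEIGHTS = (0.2125, 0.7154, 0.0721)


def rgb_to_gray(image: list[list[tuple[int, int, int]]]) -> list[list[float]]:
    """Convert image[row][col] = (r, g, b) with 0-255 channels to gray values in [0, 1]."""
    return [
        [sum(w * c for w, c in zip(WEIGHTS, pixel)) / 255.0 for pixel in row]
        for row in image
    ]


def threshold(gray: list[list[float]], t: float = 0.5) -> list[list[int]]:
    """Binarize: 1 where the gray value exceeds t, else 0 (t = 0.5 is an arbitrary default)."""
    return [[1 if v > t else 0 for v in row] for row in gray]


if __name__ == "__main__":
    # A hypothetical 2x2 image: white, black / pure red, pure green.
    img = [[(255, 255, 255), (0, 0, 0)], [(255, 0, 0), (0, 255, 0)]]
    gray = rgb_to_gray(img)
    print(gray)
    print(threshold(gray))
```

The weights show why pure green appears much brighter than pure red in grayscale: the green channel carries about 72% of the luma.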