Data Den: Introduction to Web Scraping and APIs (hybrid)

Learn how to gather data from websites using Python! In this beginner-friendly workshop, you’ll learn the basics of web scraping with Beautiful Soup. We’ll show you how to dig through HTML to find the info you need, and talk about when it makes more sense to use an API or tools like Selenium. You’ll also get tips on cleaning up your data so it’s ready to use.
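As a rough illustration of what digging through HTML with Beautiful Soup can look like, the sketch below parses a small, made-up HTML snippet and pulls out the text inside tags and the values of attributes. The markup, the "event" class name, and the `extract_links` helper are hypothetical. A real page would first have to be downloaded (for example with the `requests` package), which this sketch skips.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a downloaded webpage.
SAMPLE_HTML = """
<html>
  <body>
    <h1>Library Workshops</h1>
    <ul class="events">
      <li><a href="/events/scraping" class="event">Web Scraping</a></li>
      <li><a href="/events/r-basics" class="event">R Basics</a></li>
    </ul>
  </body>
</html>
"""


def extract_links(html: str) -> list[dict[str, str]]:
    """Return the text and href attribute of every <a class="event"> tag."""
    # "html.parser" is Python's built-in parser, so no extra parser is needed.
    soup = BeautifulSoup(html, "html.parser")
    results = []
    # find_all matches tags by name and, through class_, by CSS class.
    for link in soup.find_all("a", class_="event"):
        results.append({
            "title": link.get_text(strip=True),  # text between <a> and </a>
            "url": link.get("href"),  # value of the href attribute
        })
    return results


if __name__ == "__main__":
    soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
    print(soup.h1.get_text())  # text of the first <h1> tag
    for item in extract_links(SAMPLE_HTML):
        print(item["title"], "->", item["url"])
```

The same pattern (find the tags, then read their text or attributes) carries over to real pages once you have inspected their HTML to see which tags and classes hold the information you need.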

By the end of this lesson, workshop attendees will be able to:

Use the Beautiful Soup Python package to parse HTML (tags and attributes) and extract content from webpages relevant to their research questions.

Compare different approaches (web scraping vs. APIs) and tools (Beautiful Soup vs. Selenium) and select the most appropriate one based on their skill level and needs.

Understand how to clean, structure, and export scraped data to make it ready for analysis.
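A minimal sketch of the clean, structure, and export step, using only Python's standard library. The raw rows, the field names, and the helpers `clean_row`, `clean_rows`, and `write_csv` are hypothetical; real scraped data will need its own cleaning rules, and many workflows use pandas for this step instead.

```python
import csv
import io
import re

# Hypothetical raw values as they might come out of get_text(): stray
# whitespace, a thousands separator, a duplicate row, and an empty row.
RAW_ROWS = [
    {"title": "  Web Scraping\n", "attendees": " 25 "},
    {"title": "R Basics", "attendees": "1,200"},
    {"title": "  Web Scraping\n", "attendees": " 25 "},
    {"title": "", "attendees": "n/a"},
]


def clean_row(row):
    """Collapse whitespace in the title and turn attendees into an int (or None)."""
    title = re.sub(r"\s+", " ", row["title"]).strip()
    digits = row["attendees"].replace(",", "").strip()
    attendees = int(digits) if digits.isdecimal() else None
    return {"title": title, "attendees": attendees}


def clean_rows(rows):
    """Clean every row, dropping rows with no title and exact duplicates."""
    seen = set()
    cleaned = []
    for raw in rows:
        row = clean_row(raw)
        key = (row["title"], row["attendees"])
        if row["title"] and key not in seen:
            seen.add(key)
            cleaned.append(row)
    return cleaned


def write_csv(rows, file_obj):
    """Write cleaned rows to an open text file (or file-like object) as CSV."""
    writer = csv.DictWriter(file_obj, fieldnames=["title", "attendees"])
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    # An in-memory buffer keeps the example self-contained.
    buffer = io.StringIO()
    write_csv(clean_rows(RAW_ROWS), buffer)
    print(buffer.getvalue())
```

To save to disk instead, pass a file opened with `open("events.csv", "w", newline="", encoding="utf-8")`; the `newline=""` argument is what the `csv` module documentation recommends to avoid blank lines on some platforms.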

This workshop is offered both in person and virtually. If you register to attend virtually, you will receive a Zoom link. If you register to attend in person, please join us in the Stone classroom, room 103, located in Mann Library.

Instructor:

Jacob Grippin, Research Facilitator, Cornell Center for Social Sciences

Helpers:

Iliana Burgos, Emerging Data Practices Librarian, Digital Scholarship Services

Lencia McKee, Research Data Librarian, Research Data & Open Scholarship

Gabby Evergreen, Research Data Librarian, Research Data & Open Scholarship

Workshops @ Cornell University Library
