An Introduction to web scraping using Python

Manoj Pandey (~manojpandey)


Votes: 20

Description:

Web scraping is a technique for gathering data from web pages. You could revisit your favorite website by hand every time it updates, or you could write a web scraper that checks for new information for you!
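
As a rough illustration of what a scraper does, the sketch below pulls every link out of a small HTML document. It uses only the standard library's `html.parser` as a stand-in for the parsing libraries listed under Content URLs. The `SAMPLE_HTML` string, the `LinkCollector` class, and the `extract_links` helper are all hypothetical names made up for this example; a real scraper would first download the page.

```python
from html.parser import HTMLParser

# Hypothetical page content; a real scraper would download this first.
SAMPLE_HTML = """
<html><body>
  <h1>Latest posts</h1>
  <ul>
    <li><a href="/posts/1">First post</a></li>
    <li><a href="/posts/2">Second post</a></li>
  </ul>
</body></html>
"""


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the <a> tag currently open, if any
        self._text = []    # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open link.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Return a list of (href, link text) tuples found in the HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


links = extract_links(SAMPLE_HTML)
```

Running the same function against a freshly downloaded copy of a page, on a schedule, is the essence of letting a scraper "revisit" a site for you.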

Want to learn how to scrape the web (and/or organized data sets and APIs) for content? This talk will give you the building blocks (and code) to begin your own scraping adventures. We will review basic data scraping, API usage, and form submission, as well as how to scrape pesky bits like pages that use JavaScript to manipulate the DOM.
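
Form submission, for instance, usually comes down to sending the form's fields in the body of a POST request. The sketch below builds such a request with the standard library but does not send it. The endpoint URL and the field names are assumptions invented for this example, and `build_form_request` is a hypothetical helper, not part of any library.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_form_request(url, fields):
    """Build (but do not send) a POST request carrying URL-encoded form fields."""
    body = urlencode(fields).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


request = build_form_request(
    "https://example.com/search",          # hypothetical endpoint
    {"query": "web scraping", "page": 1},  # hypothetical field names
)
```

Passing `request` to `urllib.request.urlopen` would actually submit the form; the field names to use come from the `name` attributes of the form's inputs in the page's HTML.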

Besides looking at how websites are put together, we will also discuss the ethics of scraping. What is legal? How can you be a friendly scraper, so that the administrator of the website you are scraping won’t try to shut you down?
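
One common courtesy, offered here as an assumption about what being a "friendly scraper" might include, is to honor a site's robots.txt file and any crawl delay it requests. The sketch below checks a sample robots.txt with the standard library's `urllib.robotparser`. The file contents, the `friendly-bot` user agent, and the `make_policy` helper are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; a real scraper would fetch it from the target site.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def make_policy(robots_text):
    """Parse robots.txt text into a RobotFileParser without touching the network."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


policy = make_policy(SAMPLE_ROBOTS)

# Ask before fetching: is this URL allowed for our (hypothetical) user agent?
allowed = policy.can_fetch("friendly-bot", "https://example.com/posts/1")
blocked = policy.can_fetch("friendly-bot", "https://example.com/private/data")

# Seconds the site asks crawlers to wait between requests (None if unspecified).
delay = policy.crawl_delay("friendly-bot")
```

A scraper could skip any URL for which `can_fetch` returns `False` and call `time.sleep(delay)` between requests to keep its load on the server low.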

Prerequisites:

  1. Interest in building something
  2. Basic Python programming knowledge
  3. Basic HTML knowledge

Content URLs:

  1. BeautifulSoup
  2. lxml
  3. re
  4. scrapy

Speaker Info:

Manoj is currently a Computer Science sophomore studying in New Delhi, India. He is passionate about learning new things, mentoring people, and tinkering with the latest technology. He has an ardent interest in Machine Learning and Human-Computer Interaction, and is currently working as a researcher with Stanford's HCI research group.

Recently, he organised his college's first hackathon: [email protected]. He has frequently given open talks at his college since his first semester, on competitive programming, Python programming, general web development, version control systems, and open source tools and libraries.

Besides code, he loves music, and has a beautiful Spotify playlist. Feel free to ask for the link ;)

Speaker Links:

  1. Website: http://manojpandey.me
  2. GitHub: https://github.com/manojpandey
  3. LinkedIn: http://linkedin.com/in/manoj96
  4. Mail: manojpandey1996[at]gmail[dot]com
  5. Talks: https://slides.com/manojp

Section: Data Visualization and Analytics
Type: Talks
Target Audience: Beginner

Finally!! Someone is covering scraping in Python.
Really looking forward to this talk.

Pradhvan Bisht (~pradhvan)

This is Great!

ericasingh

Can we have some links to your slides or a general structure of your talk, something that can be put on the projector so the audience can follow along?

Please upload the slides/structure so they can be reviewed before 12th Feb.

Have you given any talks (including this one) before? Any experience of public speaking? It's not a requirement for doing the talk, but it would definitely help us gauge your experience level. We suggest going through the presentation at least once in front of a small audience to get some experience if you have not already.

Akshay Arora (~akshayaurora)

Hi Akshay, I've given some talks in the past. All the slides are here: https://slides.com/manojp

Manoj Pandey (~manojpandey)
