Intro to web automation using Python (beautifulsoup)

Sumit Srivastava (~sumit)



Description:

What's web scraping/web automation?


The general idea behind web scraping is to retrieve data that exists on a website and convert it into a format that is usable for analysis. Webpages are rendered by the browser from HTML and CSS code, but much of the information in the HTML underlying any website (layout, styling, navigation, scripts) is not interesting to us, so the job of a scraper is to pick out the parts that are.
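As a minimal sketch of that idea, assume a page has already been downloaded (the HTML below is a made-up sample, and the `scrape_prices` helper and the `id="prices"` table are hypothetical). Beautiful Soup can ignore the navigation and footer, pull out just the table, and turn it into a list of dictionaries ready for analysis.

```python
from bs4 import BeautifulSoup

# Made-up sample page standing in for one fetched from the web.
HTML = """
<html><body>
  <nav><a href="/">Home</a> | <a href="/about">About</a></nav>
  <table id="prices">
    <tr><th>Item</th><th>Price</th></tr>
    <tr><td>Notebook</td><td>120</td></tr>
    <tr><td>Pen</td><td>15</td></tr>
  </table>
  <footer>Copyright notice</footer>
</body></html>
"""


def scrape_prices(html):
    """Hypothetical helper: turn the #prices table into a list of dicts."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id="prices")  # skip nav, footer, etc.
    rows = table.find_all("tr")
    headers = [th.get_text(strip=True) for th in rows[0].find_all("th")]
    records = []
    for row in rows[1:]:
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        record = dict(zip(headers, cells))
        record["Price"] = int(record["Price"])  # text -> number for analysis
        records.append(record)
    return records


print(scrape_prices(HTML))
# [{'Item': 'Notebook', 'Price': 120}, {'Item': 'Pen', 'Price': 15}]
```

From here the records could be written out with `csv.DictWriter` or loaded into a pandas DataFrame.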

Why use beautifulsoup?


Beautiful Soup is a Python library for pulling data out of HTML and XML files. It works with your favourite parser to provide idiomatic ways of navigating, searching, and modifying the parse tree. It commonly saves programmers hours or days of work.
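A small sketch of those three operations (navigating, searching, modifying) on a made-up snippet; the markup and class names are illustrative only.

```python
from bs4 import BeautifulSoup

HTML = """
<div id="post">
  <h2 class="title">Hello, soup</h2>
  <a href="/a">First link</a>
  <a href="/b" class="external">Second link</a>
  <span class="ad">Sponsored</span>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Navigating: move from a tag to its parent and to its siblings.
title = soup.h2
print(title.parent["id"])                 # post
print(title.find_next_sibling("a").text)  # First link

# Searching: collect every matching tag, optionally filtering on attributes.
hrefs = [a["href"] for a in soup.find_all("a")]
print(hrefs)                              # ['/a', '/b']
external = soup.find_all("a", class_="external")
print(len(external))                      # 1

# Modifying: replace text, add an attribute, and delete a tag from the tree.
title.string = "Hello, Beautiful Soup"
title["data-edited"] = "yes"
soup.find("span", class_="ad").decompose()
print(soup.find("span") is None)          # True
```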

What cool things can you do with it?


Lots of possibilities come with the superpower of Beautiful Soup, ranging from scraping a website for fun, getting live India vs Pakistan match data and automating tedious form filling (with the help of Mechanize) to analysing intricate data present on a website. Power to you.
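Mechanize would handle the actual browsing and submission; the sketch below covers only the parsing half of form automation, and it swaps in the standard library's `urllib.parse.urlencode` in place of Mechanize. It assumes a made-up registration form, and the `form_payload` helper and field names are hypothetical. Beautiful Soup reads the form's named fields (keeping defaults such as hidden tokens), then our answers are filled in and URL-encoded into the body a POST request would carry.

```python
from urllib.parse import urlencode

from bs4 import BeautifulSoup

# Made-up form markup for illustration.
HTML = """
<form action="/register" method="post">
  <input type="hidden" name="csrf" value="abc123">
  <input type="text" name="username" value="">
  <input type="email" name="email">
  <input type="submit" value="Sign up">
</form>
"""


def form_payload(html, answers):
    """Hypothetical helper: collect named fields, fill answers, URL-encode."""
    form = BeautifulSoup(html, "html.parser").find("form")
    fields = {}
    for inp in form.find_all("input"):
        name = inp.get("name")
        if name is None:  # unnamed inputs (here, the submit button) are not sent
            continue
        fields[name] = inp.get("value", "")  # keep defaults like the hidden token
    fields.update(answers)
    return form.get("action"), urlencode(fields)


action, body = form_payload(HTML, {"username": "alice", "email": "alice@example.com"})
print(action)  # /register
print(body)    # csrf=abc123&username=alice&email=alice%40example.com
```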

Outline:

  • What's web scraping?
  • Scrapy v/s beautifulsoup4
  • The quest between framework and library
  • Romance of Mechanize and beautifulsoup4
  • How does beautifulsoup do what it does?
  • Making a soup
  • Searching the tree, filtering and pretty printing
  • To parse, or not to parse: that is the question.
  • Dammit, encoding!
  • Straining the soup
  • Let's see it in action! Twitter parser, here we come!
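A minimal sketch of what the "Making a soup", "Searching the tree, filtering and pretty printing" and parser-choice items typically involve; the sample HTML and the `is_external_link` filter are hypothetical, not taken from the talk. The second argument to `BeautifulSoup` names the parser: `"html.parser"` ships with Python, while `"lxml"` and `"html5lib"` are separate installs.

```python
import re

from bs4 import BeautifulSoup

HTML = """
<html><head><title>Demo page</title></head>
<body>
  <h1>Heading</h1>
  <h2>Sub-heading</h2>
  <p class="intro">Welcome!</p>
  <p>Plain paragraph</p>
  <a href="https://example.com/docs">Docs</a>
  <a href="/local">Local</a>
</body></html>
"""

# Making a soup with Python's built-in parser.
soup = BeautifulSoup(HTML, "html.parser")
print(soup.title.string)  # Demo page

# Filtering with a regular expression on the tag name: every h1..h6 tag.
headings = [tag.name for tag in soup.find_all(re.compile(r"^h[1-6]$"))]
print(headings)  # ['h1', 'h2']

# Filtering with a list of names, and with a CSS class.
print(len(soup.find_all(["a", "p"])))       # 4
print(soup.find("p", class_="intro").text)  # Welcome!


# Filtering with a function: keep only links whose href is absolute.
def is_external_link(tag):
    return tag.name == "a" and tag.get("href", "").startswith("http")


external = [a["href"] for a in soup.find_all(is_external_link)]
print(external)  # ['https://example.com/docs']

# Pretty printing: one tag or string per line, indented by nesting depth.
print(soup.p.prettify())
```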
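"Dammit, encoding!" and "Straining the soup" presumably refer to Beautiful Soup's `UnicodeDammit` and `SoupStrainer` classes. A short sketch of each, on made-up input: `UnicodeDammit` converts bytes of uncertain encoding to Unicode (here it is given Latin-1 as a candidate encoding to try), and a `SoupStrainer` passed as `parse_only` makes the parser keep only matching tags. Note that `parse_only` is not supported by the html5lib parser.

```python
from bs4 import BeautifulSoup, SoupStrainer, UnicodeDammit

# Bytes in a legacy encoding (Latin-1 here, chosen for illustration).
raw = "<p>Café crème</p>".encode("latin-1")
dammit = UnicodeDammit(raw, ["latin-1"])  # candidate encodings to try first
print(dammit.unicode_markup)  # <p>Café crème</p>

# Parse only the <a> tags; everything else is never added to the tree.
html = "<div><p>Skip me</p><a href='/x'>X</a><a href='/y'>Y</a></div>"
only_links = SoupStrainer("a")
strained = BeautifulSoup(html, "html.parser", parse_only=only_links)
links = [a["href"] for a in strained.find_all("a")]
print(links)                       # ['/x', '/y']
print(strained.find("p") is None)  # True
```

Straining is mainly a memory and speed saving on large pages where only a few tags matter.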

Prerequisites:

To learn about web automation, it is essential that you are able to breathe. Without that ability you will soon die, and be unable to continue. Everything else will be covered in the talk.

Content URLs:

Slides under construction. Expect them on 11th Feb.

Speaker Info:

Sumit uses Python as his primary programming language. He also uses it to automate most of his tedious tasks.

In addition, he has been co-founder and CTO of two startups that used Python's Django framework for their backends, and he is a former intern at IIM Lucknow.

Sumit also contributes to SymPy, the open-source symbolic mathematics library for Python.

Sumit is also a singer-songwriter-guitarist. Stage is his friend.

Section: Web Development
Type: Talks
Target Audience: Beginner
Last Updated:

Hi! Could you please add slides to your talk/workshop?

Shivani Bhardwaj (~shivan1b)

Yes, sure. I'll do that in a day or two! :)

Sumit Srivastava (~sumit)
