Intro to web automation using Python (beautifulsoup)

Sumit Srivastava (~sumit)



Description:

What's web scraping/web automation?


The general idea behind web scraping is to retrieve data that exists on a website and convert it into a format that is usable for analysis. Webpages are rendered by the browser from HTML and CSS code, but much of the information in the HTML underlying any website is not interesting to us.
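As a rough illustration of "retrieve, then convert to a usable format", here is a minimal sketch that turns an HTML table into a list of dictionaries and then into CSV text. The markup, player names, and numbers are invented for the example; a real page would be fetched over HTTP first.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical markup, made up for this sketch (not from any real site).
HTML = """
<table>
  <tr><th>Player</th><th>Runs</th></tr>
  <tr><td>Player A</td><td>82</td></tr>
  <tr><td>Player B</td><td>45</td></tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")
rows = soup.find_all("tr")

# The first row holds the column names; the remaining rows hold the data.
header = [th.get_text(strip=True) for th in rows[0].find_all("th")]
records = [
    dict(zip(header, (td.get_text(strip=True) for td in row.find_all("td"))))
    for row in rows[1:]
]

# Convert the records to CSV text, a format most analysis tools accept.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=header)
writer.writeheader()
writer.writerows(records)
csv_text = buffer.getvalue()
print(csv_text)
```

Everything outside the table (navigation, styling, scripts) is simply ignored, which is the point: only the part of the HTML that matters is kept.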

Why use beautifulsoup?


Beautiful Soup is a Python library for pulling data out of HTML and XML files. It works with your favourite parser to provide idiomatic ways of navigating, searching, and modifying the parse tree. It commonly saves programmers hours or days of work.
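A small sketch of what navigating and searching the parse tree can look like. The HTML snippet is hypothetical, and `html.parser` (the parser bundled with Python) is just one possible parser choice.

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment, made up for illustration.
HTML = """
<html><body>
  <h1 id="title">Scoreboard</h1>
  <ul>
    <li class="team"><a href="/ind">India</a></li>
    <li class="team"><a href="/pak">Pakistan</a></li>
  </ul>
</body></html>
"""

# "Making a soup": parse the markup with Python's built-in parser.
soup = BeautifulSoup(HTML, "html.parser")

# Searching: find one tag by name and attribute.
title = soup.find("h1", id="title").get_text()

# Searching: find all tags with a given CSS class, then navigate to the child <a>.
teams = [li.a.get_text() for li in soup.find_all("li", class_="team")]

# Reading attributes works like dictionary access.
links = [a["href"] for a in soup.find_all("a")]

print(title, teams, links)
```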

What cool things can you do with it?


Lots of possibilities come with the superpower of beautifulsoup: scraping a website for fun, getting live Ind vs Pak match data, automating tedious form filling (with the help of Mechanize), or analysing intricate data present on a website. Power to you.

Outline:

  • What's web scraping?
  • Scrapy vs. beautifulsoup4
  • The quest between framework and library
  • Romance of Mechanize and beautifulsoup4
  • How does beautifulsoup do what it does?
  • Making a soup
  • Searching the tree, filtering and pretty printing
  • To parse, or not to parse: that is the question.
  • Dammit, encoding!
  • Straining the soup
  • Let's see it in action! Twitter parser, here we come!
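
A few of the outline items above (pretty printing, encoding detection, and straining the soup) might look roughly like this in practice. This is only a sketch, not the talk's actual material: the markup and the byte string are invented, and `latin-1` is passed as an assumed candidate encoding.

```python
from bs4 import BeautifulSoup, SoupStrainer, UnicodeDammit

# Hypothetical markup, made up for this sketch.
HTML = '<div><p>Intro</p><a href="/a">A</a><span><a href="/b">B</a></span></div>'

# Pretty printing: prettify() returns the tree as an indented string.
soup = BeautifulSoup(HTML, "html.parser")
pretty = soup.prettify()

# Straining the soup: parse only the <a> tags and skip everything else.
only_links = SoupStrainer("a")
link_soup = BeautifulSoup(HTML, "html.parser", parse_only=only_links)
hrefs = [a["href"] for a in link_soup.find_all("a")]

# Dammit, encoding: UnicodeDammit converts bytes to a Unicode string,
# trying the suggested encodings first.
raw = "café".encode("latin-1")
dammit = UnicodeDammit(raw, ["latin-1"])
decoded = dammit.unicode_markup

print(pretty)
print(hrefs, decoded)
```

Straining is mainly useful on large documents, where building a tree of only the tags you need saves time and memory.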

Prerequisites:

To learn about web automation, it is essential that you are able to breathe. Without that ability you will soon die, and be unable to continue. Everything else will be covered in the talk.

Content URLs:

Slides under construction. Expect them on 11th Feb.

Speaker Info:

Sumit uses Python as his primary programming language. He also uses it to automate most of his tedious tasks.

In addition, he has been co-founder and CTO of two startups that used Python's Django framework for their backends, and he is a former intern at IIM Lucknow.

Sumit also contributes to SymPy, the open source symbolic math library for Python.

Sumit is also a singer-songwriter-guitarist. Stage is his friend.

Section: Web Development
Type: Talks
Target Audience: Beginner
Last Updated:

Hi! Could you please add slides to your talk/workshop?

Shivani Bhardwaj (~shivan1b)

Yes, sure. I'll do that in a day or two! :)

Sumit Srivastava (~sumit)


