Comparing Word Embeddings with Gensim

Parul Sethi (~parulsethi)


Description:

Python has many natural language processing tools. In particular, anyone who wants to implement a recommender or a document classifier faces the problem of choosing among the many open source word embeddings available. In this talk, I will highlight the differences between them. I'll go through some evaluations, primarily of three word embeddings, Word2Vec, FastText and WordRank, all of which are available in the widely used Python library gensim, either as a direct implementation or as a wrapper. The results will show how these embeddings specialize in different downstream NLP tasks.

Visualization is also a crucial part of data analysis, since it helps reveal the structure and underlying patterns in the data, so I'll also cover visualizing word embeddings using TensorBoard and gensim.

Outline:

  • What word embeddings are
  • Why they are useful
  • Examples of some popular word embeddings
  • Why you need to choose carefully between the different embeddings
  • Example of their different results for similarity with a single word
  • Benchmark performance overview:
    1. What Word Similarity data is, and how the different embeddings perform on it
    2. What Word Analogy data is, and how the different embeddings perform on it
  • Visualizations:
    1. PCA, t-SNE
    2. Using TensorBoard, with an example embedding
  • Relation between word frequency and embedding performance
  • How the differences between embeddings discussed above could affect downstream applications
  • Conclusion/Summary
  • Questions
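
To give a feel for the similarity and analogy comparisons listed in the outline, here is a minimal sketch in plain Python. It is not the evaluation code from the talk: the vectors are tiny, hand-made values invented for illustration, and the helper names (`cosine`, `most_similar`, `solve_analogy`) are hypothetical. With a real model, gensim's `KeyedVectors.most_similar` plays the same role on trained vectors.

```python
import math

# Toy 3-dimensional vectors, invented purely for illustration (an assumption).
# Real embeddings would come from a trained model such as Word2Vec or FastText.
vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.3, 0.3],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(word, topn=3):
    """Rank all other words by cosine similarity to `word`."""
    scores = [
        (other, cosine(vectors[word], vec))
        for other, vec in vectors.items()
        if other != word
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]


def solve_analogy(a, b, c):
    """Answer 'a is to b as c is to ?' with the vector offset b - a + c."""
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    # The three query words are excluded from the candidate answers.
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


if __name__ == "__main__":
    print(most_similar("king"))
    print(solve_analogy("man", "king", "woman"))
```

On these toy vectors the analogy "man is to king as woman is to ?" resolves to "queen". Two embeddings trained on the same corpus can rank neighbours differently under exactly this kind of query, which is the sort of difference the benchmarks above are meant to expose.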

Prerequisites:

Just a basic idea of what word embeddings are.

Speaker Info:

I'm a third-year undergraduate student of Maths and IT at Cluster Innovation Centre, University of Delhi. I contributed the WordRank wrapper and the embedding comparison tutorial to gensim as part of my Incubator project with RaRe Technologies.

Section: Data Visualization and Analytics
Type: Talks
Target Audience: Beginner

Hi! Could you please add slides for your workshop/talk?

Shivani Bhardwaj (~shivan1b)

Sure, I'll try to do it ASAP. It will be based entirely on the blog post and its accompanying Jupyter Notebook mentioned above in Content URLs.

Parul Sethi (~parulsethi)
