Comparing Word Embeddings with Gensim

Parul Sethi (~parulsethi)



Description:

Python has many natural language processing tools. In particular, anyone who wants to implement a recommender or a document classifier faces the problem of choosing among the many open-source word embeddings available. In this talk, I will highlight the differences between them. I'll go through some evaluations, primarily of three word embeddings, Word2Vec, FastText and WordRank, all of which are available in the widely used Python library gensim, either as a direct implementation or as a wrapper. The results will show how these embeddings specialize in different downstream NLP tasks.

Visualizations are also a crucial part of data analysis, since they help reveal the structure and underlying patterns in the data, so I'll also cover visualizing word embeddings using TensorBoard and gensim.
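As a rough illustration of the visualization step, the sketch below turns a set of word vectors into the two tab-separated texts that TensorBoard's Embedding Projector can load: one row of numbers per word, plus a metadata file with one label per row. The function name `to_projector_tsv` and the toy vectors are hypothetical and exist only for this example; in practice the vectors would come from a trained gensim model.

```python
# Hypothetical toy vectors, invented for illustration; a real run would
# take them from a trained embedding model.
TOY_VECTORS = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "apple": [0.4, 0.5, 0.4],
}


def to_projector_tsv(embeddings):
    """Return (vectors_tsv, metadata_tsv) with one row per word.

    Row i of the vectors text and row i of the metadata text describe the
    same word. A single-column metadata file carries no header row.
    """
    vector_lines = []
    metadata_lines = []
    for word, vector in embeddings.items():
        vector_lines.append("\t".join(f"{x:.6f}" for x in vector))
        metadata_lines.append(word)
    return "\n".join(vector_lines) + "\n", "\n".join(metadata_lines) + "\n"


vectors_tsv, metadata_tsv = to_projector_tsv(TOY_VECTORS)
```

Saving the two strings to files and loading them in the Embedding Projector should then allow exploring the points with its PCA and t-SNE views.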

Outline:

  • What are word embeddings.
  • Why are they useful.
  • Examples of some popular word embeddings.
  • Why you need to choose carefully between those different embeddings.
  • Example of how their results differ for the words most similar to a single query word.
  • Benchmark performance overview:
    1. What is Word Similarity data, and how different embeddings perform on it.
    2. What is Word Analogy data, and how different embeddings perform on it.
  • Visualizations:
    1. PCA, t-SNE
    2. Using TensorBoard with an example of embedding
  • Relation between word frequency and embedding performance
  • How the differences between embeddings discussed above could affect downstream applications.
  • Conclusion/Summary
  • Questions
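
To make the similarity and analogy items in the outline concrete, here is a minimal sketch in plain Python of the two operations those benchmarks rely on: cosine similarity between two word vectors, and the vector-offset approach to an analogy question ("man is to king as woman is to ?"). The vectors and the helper names (`unit`, `cosine`, `solve_analogy`) are hypothetical and invented for this example; this is not gensim's implementation, and real benchmarks run over trained embeddings and standard test sets.

```python
import math

# Hypothetical 3-dimensional vectors, invented for illustration only.
TOY_VECTORS = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man": [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.4, 0.5, 0.4],
}


def unit(v):
    """Scale a vector to length 1."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(u, v):
    """Cosine similarity: dot product of the two unit vectors."""
    return sum(a * b for a, b in zip(unit(u), unit(v)))


def solve_analogy(a, b, c, vectors):
    """Answer 'a is to b as c is to ?' with the vector-offset method.

    Builds the target b - a + c from unit vectors, then returns the word
    (other than a, b, c) whose vector is closest to it by cosine.
    """
    ua, ub, uc = (unit(vectors[w]) for w in (a, b, c))
    target = [y - x + z for x, y, z in zip(ua, ub, uc)]
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(vectors[w], target))


answer = solve_analogy("man", "king", "woman", TOY_VECTORS)
```

With these toy vectors, `answer` is "queen". A word similarity benchmark would instead compare `cosine` scores for many word pairs against human ratings, and an analogy benchmark would count how often `solve_analogy` returns the expected word.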

Prerequisites:

Just a basic idea of what word embeddings are.

Speaker Info:

I'm a third-year undergraduate student of Maths and IT at Cluster Innovation Centre, University of Delhi. I contributed the WordRank wrapper and the embedding comparison tutorial to gensim as part of my Incubator project with RaRe Technologies.

Section: Data Visualization and Analytics
Type: Talks
Target Audience: Beginner
Last Updated:

Hi! Could you please add slides to your workshop/talk?

Shivani Bhardwaj (~shivan1b)

Sure, I'll try to do it ASAP. Though it would be based entirely on the blog and its accompanying Jupyter Notebook mentioned above in Content URLs.

Parul Sethi (~parulsethi)
