word-embedding
CBOW vs. skip-gram: why invert context and target words?
Here is my oversimplified and rather naive understanding of the difference: as we know, CBOW learns to predict a word from its context, i.e. it maximizes the probability of the target word given the surrounding context words. This happens to be a problem for rare words. For example, given the context yesterday was a …
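The prediction direction described above can be sketched numerically. The following is a minimal NumPy illustration (not the real word2vec implementation; vocabulary size, dimensions, and word ids are made up) of the CBOW forward pass: average the context vectors, then score every candidate target word.

```python
import numpy as np

rng = np.random.default_rng(0)
V, D = 8, 4                      # toy vocabulary size and embedding dimension
W_in = rng.normal(size=(V, D))   # input (context) embeddings
W_out = rng.normal(size=(V, D))  # output (target) embeddings

context_ids = np.array([2, 5, 6])   # hypothetical ids for "yesterday", "was", "a"
h = W_in[context_ids].mean(axis=0)  # CBOW: average the context vectors into one hidden vector
scores = W_out @ h                  # one score per candidate target word
probs = np.exp(scores) / np.exp(scores).sum()  # softmax over the vocabulary
# Skip-gram inverts the direction: the target word's single vector is used
# to predict each context word in turn, so rare words get their own updates.
```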
How is WordPiece tokenization helpful to effectively deal with rare words problem in NLP?
WordPiece and BPE are two similar and commonly used techniques for segmenting words into subword units in NLP tasks. In both cases, the vocabulary is initialized with all the individual characters in the language, and then the most frequent/likely combinations of the symbols in the vocabulary are iteratively added to the vocabulary. Consider the WordPiece algorithm …
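The merge loop described above can be sketched for the BPE variant (WordPiece differs in choosing the pair that maximizes likelihood rather than raw frequency). This is a toy illustration on a made-up word list, not a production tokenizer:

```python
from collections import Counter

def bpe_merges(words, num_merges):
    """Toy BPE: start from characters, repeatedly merge the most frequent adjacent pair."""
    vocab = Counter(tuple(w) for w in words)  # each word as a tuple of symbols
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by word frequency.
        pairs = Counter()
        for word, freq in vocab.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent pair becomes a new symbol
        merges.append(best)
        new_vocab = Counter()
        for word, freq in vocab.items():
            merged, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    merged.append(word[i] + word[i + 1])
                    i += 2
                else:
                    merged.append(word[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges, vocab

# "low" appears twice, so its character pairs dominate and "low" emerges as a unit,
# leaving rare suffixes ("er", "est") to be covered by smaller subwords.
merges, vocab = bpe_merges(["low", "lower", "lowest", "low"], num_merges=2)
```

This is why subword vocabularies help with rare words: an unseen word like lowish still decomposes into known pieces instead of mapping to a single out-of-vocabulary token.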
Embedding in PyTorch
nn.Embedding holds a Tensor of dimension (vocab_size, vector_size), i.e. the size of the vocabulary × the dimension of each vector embedding, and a method that does the lookup. When you create an embedding layer, the Tensor is initialised randomly. It is only when you train it that this similarity between similar words should appear. …
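The lookup described above is just row indexing into that (vocab_size, vector_size) table. A minimal NumPy sketch of what nn.Embedding stores and does (the real layer additionally tracks gradients so the rows are trainable):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, vector_size = 10, 4

# The layer's weight: one randomly initialised row per vocabulary word,
# like nn.Embedding(vocab_size, vector_size) before any training.
weight = rng.normal(size=(vocab_size, vector_size))

ids = np.array([1, 3, 3])   # a batch of word ids; ids may repeat
vectors = weight[ids]       # the lookup: one row per id, shape (3, vector_size)
```

Because the lookup only selects rows, the same id always returns the same vector; training then nudges those rows so that similar words end up with similar vectors.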
What does tf.nn.embedding_lookup function do?
Yes, this function is hard to understand, until you get the point. In its simplest form, it is similar to tf.gather: it returns the elements of params at the indexes specified by ids. For example (assuming you are inside a tf.InteractiveSession()), params = tf.constant([10,20,30,40]); ids = tf.constant([0,1,2,3]); print(tf.nn.embedding_lookup(params, ids).eval()) would return [10 20 30 40], …
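In its simplest form the operation above is plain row selection, which can be shown without TensorFlow. A NumPy sketch of the same lookup (the partitioned-params behaviour of tf.nn.embedding_lookup is not covered here):

```python
import numpy as np

params = np.array([10, 20, 30, 40])
ids = np.array([1, 3, 3])   # indices may repeat, and order is preserved
out = params[ids]           # → [20 40 40], same idea as tf.gather(params, ids)
```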