Training data for sentiment analysis [closed]

http://www.cs.cornell.edu/home/llee/data/ http://mpqa.cs.pitt.edu/corpora/mpqa_corpus You can use twitter, with its smileys, like this: http://web.archive.org/web/20111119181304/http://deepthoughtinc.com/wp-content/uploads/2011/01/Twitter-as-a-Corpus-for-Sentiment-Analysis-and-Opinion-Mining.pdf Hope that gets you started. There’s more in the literature, if you’re interested in specific subtasks like negation, sentiment scope, etc. To get a focus on companies, you might pair a method with topic detection, or cheaply just a lot of mentions of … Read more

Normalize data before or after split of training and testing data?

You first need to split the data into training and test set (validation set could be useful too). Don’t forget that testing data points represent real-world data. Feature normalization (or data standardization) of the explanatory (or predictor) variables is a technique used to center and normalise the data by subtracting the mean and dividing by … Read more

Parameter “stratify” from method “train_test_split” (scikit Learn)

This stratify parameter makes a split so that the proportion of values in the sample produced will be the same as the proportion of values provided to parameter stratify. For example, if variable y is a binary categorical variable with values 0 and 1 and there are 25% of zeros and 75% of ones, stratify=y … Read more