training-data – Row Coding

Publicly Available Spam Filter Training Set [closed]

September 15, 2023 by Tarik

Here is what I was looking for: http://untroubled.org/spam/ This archive holds around a gigabyte of compressed spam messages accumulated from 1998 to 2011. Now I just need non-spam email, so I’ll pull it from my own Gmail account using the getmail program and the tutorial at mattcutts.com

Data sets for neural network training [closed]

September 10, 2023 by Tarik

https://archive.ics.uci.edu/ml is the University of California, Irvine repository of machine learning datasets. It’s a really great resource, and I believe the datasets are all provided as CSV files.

Training data for sentiment analysis [closed]

June 9, 2023 by Tarik

http://www.cs.cornell.edu/home/llee/data/ http://mpqa.cs.pitt.edu/corpora/mpqa_corpus You can also use Twitter, with its smileys, like this: http://web.archive.org/web/20111119181304/http://deepthoughtinc.com/wp-content/uploads/2011/01/Twitter-as-a-Corpus-for-Sentiment-Analysis-and-Opinion-Mining.pdf Hope that gets you started. There’s more in the literature if you’re interested in specific subtasks like negation, sentiment scope, etc. To get a focus on companies, you might pair a method with topic detection, or cheaply just a lot of mentions of …

Normalize data before or after split of training and testing data?

March 3, 2023 by Tarik

You first need to split the data into training and test sets (a validation set could be useful too). Don’t forget that the test data points stand in for real-world data. Feature normalization (or data standardization) of the explanatory (or predictor) variables is a technique used to center and normalize the data by subtracting the mean and dividing by …
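A minimal sketch of the split-then-normalize order described above, using only the Python standard library. The excerpt is cut off before it names the divisor, so this sketch assumes the common z-score form (dividing by the standard deviation). The helper names and the toy data are hypothetical, not taken from the original answer.

```python
import random
import statistics


def split_train_test(values, test_fraction=0.25, seed=0):
    """Shuffle a copy of the data, then split it into train and test lists."""
    shuffled = list(values)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_standardizer(train):
    """Compute the mean and standard deviation from the training data only."""
    mean = statistics.fmean(train)
    std = statistics.pstdev(train)
    if std == 0:
        std = 1.0  # avoid dividing by zero for a constant feature
    return mean, std


def standardize(values, mean, std):
    """Apply already-fitted statistics; nothing is re-estimated here."""
    return [(v - mean) / std for v in values]


# Made-up single feature, for illustration only.
data = [float(x) for x in range(1, 21)]

# 1. Split first.
train, test = split_train_test(data)

# 2. Fit the statistics on the training set only.
mean, std = fit_standardizer(train)

# 3. Reuse those same statistics for the test set.
train_scaled = standardize(train, mean, std)
test_scaled = standardize(test, mean, std)
```

The point of the design is that `test` never influences `mean` or `std`, so the test set keeps playing the role of unseen data.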

How to demonstrate to management that mediocre developers are hurting team [closed]

February 13, 2023 by Tarik

Funny that nobody has told you that maybe you lack management skills. I once ended up working with people who could not code a loop after a year and a half of training. I trained them until they were able to use a full-featured web framework, and it took only one month. Maybe you …

What is validation data used for in a Keras Sequential model?

November 30, 2022 by Tarik

If you want to build a solid model, you have to follow the protocol of splitting your data into three sets: one for training, one for validation, and one for final evaluation, which is the test set. The idea is that you train on your training data and tune your model with the results …
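The three-set protocol can be sketched in plain Python as follows. The 60/20/20 proportions, the function name, and the integer "samples" are assumptions made for illustration; the excerpt does not prescribe any particular ratio.

```python
import random


def three_way_split(samples, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle a copy of the samples and cut it into train, validation and test."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    n_val = round(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


# Hypothetical dataset of 100 sample ids.
train, val, test = three_way_split(range(100))
```

In Keras, a validation set like `val` is what would typically be passed to `fit` through its `validation_data` argument, while `test` is held back until the final evaluation.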

Parameter “stratify” from method “train_test_split” (scikit Learn)

October 26, 2022 by Tarik

This stratify parameter makes a split so that the proportion of values in the sample produced will be the same as the proportion of values provided to parameter stratify. For example, if variable y is a binary categorical variable with values 0 and 1 and there are 25% of zeros and 75% of ones, stratify=y …
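To make the idea concrete, here is a small pure-Python sketch of a stratified split on the 25%/75% example. It illustrates the concept only and is not scikit-learn's implementation; the function name `stratified_split` and the 20% test size are assumptions. With scikit-learn itself, the equivalent request is `train_test_split(X, y, stratify=y)`.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) so that each label keeps its share in both parts."""
    by_label = defaultdict(list)
    for i, label in enumerate(labels):
        by_label[label].append(i)

    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(by_label):
        idx = by_label[label]
        rng.shuffle(idx)
        # Take the same fraction from every class.
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return train_idx, test_idx


# 25% zeros and 75% ones, as in the example above.
y = [0] * 25 + [1] * 75
train_idx, test_idx = stratified_split(y)

train_labels = [y[i] for i in train_idx]
test_labels = [y[i] for i in test_idx]
# Both parts keep the 25/75 ratio: 20 zeros and 60 ones in train, 5 zeros and 15 ones in test.
```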