ResNet: 100% accuracy during training, but 33% prediction accuracy with the same data

It’s because of the batch normalization layers. In the training phase, each batch is normalized with respect to its own mean and variance. In the testing phase, however, the batch is normalized with respect to a moving average of previously observed means and variances. This becomes a problem when the number of observed batches is small (e.g., 5 in your example) …
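A minimal PyTorch sketch of the discrepancy (the framework and the numbers here are assumptions for illustration, not taken from the question): after a single batch, the running statistics are nowhere near the batch statistics, so eval-mode outputs differ sharply from train-mode outputs.

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm1d(3, momentum=0.1)  # running stats updated with factor 0.1
x = torch.randn(8, 3) * 5 + 10        # a batch with mean ~10, std ~5

bn.train()
_ = bn(x)               # training: normalized with THIS batch's mean/variance
print(bn.running_mean)  # moving average has barely moved from its init of 0

bn.eval()
y = bn(x)               # testing: normalized with the unconverged running stats
print(y.mean(dim=0))    # far from 0 -- the test-time outputs are badly shifted
```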

Read more

Understanding max_features parameter in RandomForestRegressor

Straight from the documentation: [max_features] is the size of the random subsets of features to consider when splitting a node. So max_features is what you call m. When max_features="auto", m = p and no feature subset selection is performed in the trees, so the “random forest” is actually a bagged ensemble of ordinary regression trees. …
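As an illustrative scikit-learn sketch (the dataset is synthetic; note that in recent scikit-learn versions the m = p setting is spelled max_features=1.0, since the old "auto" option for regressors was deprecated):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=200, n_features=10, random_state=0)

# m = p: every split considers all 10 features, so this "forest" is just
# a bagged ensemble of ordinary regression trees (the old "auto" behavior).
bagged = RandomForestRegressor(max_features=1.0, random_state=0).fit(X, y)

# m < p: each split draws a random subset of int(0.3 * 10) = 3 candidate
# features -- this is the m that makes it a genuine random forest.
forest = RandomForestRegressor(max_features=0.3, random_state=0).fit(X, y)
```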

Read more

sklearn metrics for multiclass classification

The function call precision_score(y_test, y_pred) is equivalent to precision_score(y_test, y_pred, pos_label=1, average="binary"). The documentation (http://scikit-learn.org/stable/modules/generated/sklearn.metrics.precision_score.html) tells us: ‘binary’: Only report results for the class specified by pos_label. This is applicable only if targets (y_{true,pred}) are binary. So the problem is that your labels are not binary, but probably one-hot encoded. Fortunately, there are other options …
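For illustration (the labels below are invented), the multiclass averaging options look like this:

```python
import numpy as np
from sklearn.metrics import precision_score

y_test = np.array([0, 1, 2, 2, 1, 0])   # integer class labels, not one-hot
y_pred = np.array([0, 2, 2, 2, 1, 0])

# average="binary" (the default) would raise a ValueError for these labels;
# the multiclass options aggregate the per-class precisions instead:
print(precision_score(y_test, y_pred, average="micro"))     # global TP / (TP + FP)
print(precision_score(y_test, y_pred, average="macro"))     # unweighted mean over classes
print(precision_score(y_test, y_pred, average="weighted"))  # mean weighted by class support
print(precision_score(y_test, y_pred, average=None))        # one score per class

# If the labels are one-hot encoded, convert them back first:
# y_test = y_test_onehot.argmax(axis=1)
```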

Read more

Correlated features and classification accuracy

Correlated features do not affect classification accuracy per se. The problem in realistic situations is that we have a finite number of training examples with which to train a classifier. For a fixed number of training examples, increasing the number of features typically increases classification accuracy up to a point, but as the number of features …
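A quick sanity check one can run (the setup below is invented for illustration): duplicating a feature creates a perfectly correlated pair, yet cross-validated accuracy barely moves, which is the “per se” part of the claim.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_dup = np.hstack([X, X[:, :1]])  # append an exact copy of feature 0

clf = LogisticRegression(max_iter=1000)
print(cross_val_score(clf, X, y, cv=5).mean())      # original features
print(cross_val_score(clf, X_dup, y, cv=5).mean())  # + perfectly correlated copy
```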

Read more

Can anyone give a real life example of supervised learning and unsupervised learning? [closed]

Supervised learning: You get a bunch of photos with information about what is on them, and you train a model to recognize what is on new photos. You have a bunch of molecules and information about which ones are drugs, and you train a model to answer whether a new molecule is also a drug. Unsupervised learning: You …

Read more

Batch normalization instead of input normalization

You can do it. But the nice thing about batchnorm, in addition to stabilizing the activation distributions, is that the mean and standard deviation are likely to migrate as the network learns. Effectively, placing batchnorm right after the input layer is a fancy data pre-processing step. It helps, sometimes a lot (e.g. in linear regression). But …
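A minimal sketch of the idea in PyTorch (the architecture and dimensions are assumptions for illustration):

```python
import torch.nn as nn

# BatchNorm placed directly after the input acts as a learned, per-feature
# standardization whose statistics adapt as training proceeds -- a built-in
# pre-processing step instead of standardizing the inputs offline.
model = nn.Sequential(
    nn.BatchNorm1d(20),   # normalizes the raw 20-dimensional input
    nn.Linear(20, 64),
    nn.ReLU(),
    nn.Linear(64, 1),
)
```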

Read more