Linear vs nonlinear neural network?

For starters, a neural network can model any function, not just linear functions. Have a look at this – http://neuralnetworksanddeeplearning.com/chap4.html. A neural network has non-linear activation layers, and these are what give the network its non-linear element. The function relating the input and the output is decided by the neural network … Read more
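As a minimal sketch of the point (not taken from the linked answer), using NumPy: without an activation, stacked layers collapse into a single linear map, while inserting a ReLU breaks linearity.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(2, 4))

def linear_net(x):
    return W2 @ (W1 @ x)                    # composition of linear maps is still linear

def relu_net(x):
    return W2 @ np.maximum(W1 @ x, 0.0)     # ReLU supplies the non-linear element

x, y = rng.normal(size=3), rng.normal(size=3)
# Additivity holds for the activation-free network ...
print(np.allclose(linear_net(x + y), linear_net(x) + linear_net(y)))  # True
# ... but generally fails once the non-linear activation is added.
print(np.allclose(relu_net(x + y), relu_net(x) + relu_net(y)))        # typically False
```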

How to apply dropout in TensorFlow to improve the accuracy of a neural network?

In the graph, I'd suggest moving keep_prob = tf.placeholder(tf.float32) outside of the model function to make it global. with graph.as_default(): … x = tf.placeholder("float", [None, n_input]) y = tf.placeholder("float", [None, n_classes]) keep_prob = tf.placeholder(tf.float32) def model(x, weights_hiden, weights_out, biases_hidden, biases_out, keep_prob): # hidden layer with RELU activation layer_1 = tf.nn.relu(tf.add(tf.matmul(x, weights_hiden), biases_hidden)) # apply … Read more
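For context, a minimal sketch of the suggested structure (TF 1.x-style API, matching the excerpt; the layer sizes and variable names below are placeholders, not the asker's actual model). The point is that a graph-level keep_prob placeholder can be fed different values at training and evaluation time.

```python
import tensorflow as tf  # TF 1.x style API, as in the excerpt

graph = tf.Graph()
with graph.as_default():
    x = tf.placeholder("float", [None, 784])   # sizes here are assumptions
    y = tf.placeholder("float", [None, 10])
    keep_prob = tf.placeholder(tf.float32)     # global, fed at run time

    def model(x, keep_prob):
        # hypothetical weights, just for illustration
        w_h = tf.Variable(tf.random_normal([784, 256]))
        b_h = tf.Variable(tf.zeros([256]))
        layer_1 = tf.nn.relu(tf.add(tf.matmul(x, w_h), b_h))
        layer_1 = tf.nn.dropout(layer_1, keep_prob)   # dropout on the hidden layer
        w_o = tf.Variable(tf.random_normal([256, 10]))
        b_o = tf.Variable(tf.zeros([10]))
        return tf.add(tf.matmul(layer_1, w_o), b_o)

    logits = model(x, keep_prob)

# Because keep_prob is defined at graph level, it can be fed differently, e.g.:
# sess.run(train_op, feed_dict={x: batch_x, y: batch_y, keep_prob: 0.5})  # training
# sess.run(accuracy, feed_dict={x: test_x, y: test_y, keep_prob: 1.0})    # evaluation
```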

Activation function after pooling layer or convolutional layer?

Well, max-pooling and monotonically increasing non-linearities commute. This means that MaxPool(Relu(x)) = Relu(MaxPool(x)) for any input, so the result is the same in that case. It is therefore technically better to first subsample through max-pooling and then apply the non-linearity (if it is costly, such as the sigmoid). In practice it is often done the … Read more
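A quick numerical check of the commutation claim, as a NumPy sketch (not from the answer itself):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 8, 8))          # a single toy feature map

def max_pool_2x2(a):
    # non-overlapping 2x2 max-pooling over the last two axes
    c, h, w = a.shape
    return a.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

relu = lambda a: np.maximum(a, 0.0)

# ReLU is monotonically increasing, so it commutes with max-pooling:
print(np.allclose(max_pool_2x2(relu(x)), relu(max_pool_2x2(x))))   # True
```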

Using torch.nn.DataParallel with a custom CUDA extension

This is somewhat unusual, as "batch" is commonly defined precisely as the dimension along which all operations of the network are invariant. So you could, for example, just introduce another dimension. You then have the "former batch dimension", in which your operation is not invariant; for this, keep your current implementation. Then, parallelize over the … Read more
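A toy sketch of the idea under my reading of the answer (the wrapper module and shapes are hypothetical, and a plain sum stands in for the custom CUDA op): torch.nn.DataParallel scatters its input along dim 0, so the newly introduced outer dimension is split across GPUs while the "former batch dimension" stays intact inside each replica.

```python
import torch
import torch.nn as nn

class MyExtensionWrapper(nn.Module):
    """Hypothetical stand-in for the custom CUDA extension; the op is
    NOT invariant along its own first ("former batch") dimension."""
    def forward(self, x):
        # x: (outer_dim, former_batch_dim, features); sum replaces the custom op
        return x.sum(dim=1)

model = MyExtensionWrapper()
x = torch.randn(4, 16, 32)   # dim 0 is the new dimension to parallelize over

if torch.cuda.is_available():
    model, x = model.cuda(), x.cuda()
    if torch.cuda.device_count() > 1:
        # DataParallel scatters along dim=0 (the new outer dimension) and
        # leaves the "former batch dimension" untouched within each replica.
        model = nn.DataParallel(model, dim=0)

print(model(x).shape)        # torch.Size([4, 32])
```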

What are forward and backward passes in neural networks?

The "forward pass" refers to the process of calculating the values of the output layers from the input data, traversing all neurons from the first to the last layer. A loss function is then calculated from the output values. The "backward pass" refers to the process of computing the changes to the weights (the actual learning), using the gradient descent algorithm (or … Read more
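A minimal PyTorch sketch of the two passes (the network and sizes are arbitrary, chosen only for illustration):

```python
import torch
import torch.nn as nn

# Tiny illustrative network
net = nn.Sequential(nn.Linear(3, 4), nn.ReLU(), nn.Linear(4, 1))
inputs = torch.randn(8, 3)
targets = torch.randn(8, 1)

# Forward pass: traverse the layers from input to output and compute the loss.
outputs = net(inputs)
loss = nn.functional.mse_loss(outputs, targets)

# Backward pass: compute the gradient of the loss w.r.t. every weight ...
loss.backward()

# ... then apply a plain gradient-descent update.
with torch.no_grad():
    for p in net.parameters():
        p -= 0.01 * p.grad
```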

What is the purpose of the add_loss function in Keras?

I'll try to answer the original question of why model.add_loss() is being used instead of specifying a custom loss function to model.compile(loss=…). All loss functions in Keras always take two parameters, y_true and y_pred. Have a look at the definitions of the various standard loss functions available in Keras; they all have these two parameters. … Read more
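To make the distinction concrete, a small sketch (a toy model of my own, not the question's code): the term passed to model.add_loss() depends on an intermediate tensor rather than on (y_true, y_pred), which a loss function given to compile() could not express.

```python
import tensorflow as tf
from tensorflow import keras

# Toy functional model; sizes are arbitrary.
inputs = keras.Input(shape=(16,))
hidden = keras.layers.Dense(8, activation="relu")(inputs)
outputs = keras.layers.Dense(1)(hidden)
model = keras.Model(inputs, outputs)

# Hypothetical extra term: a penalty on the hidden activations, which has no
# y_true/y_pred form, so it goes through add_loss instead of compile(loss=...).
model.add_loss(1e-3 * tf.reduce_mean(tf.square(hidden)))

# The compiled loss still only sees y_true and y_pred; add_loss terms are added on top.
model.compile(optimizer="adam", loss="mse")
```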

Why do we need to explicitly call zero_grad()? [duplicate]

We explicitly need to call zero_grad() because, after loss.backward() (when gradients are computed), we need to use optimizer.step() to proceed with gradient descent. More specifically, the gradients are not automatically zeroed because these two operations, loss.backward() and optimizer.step(), are separated, and optimizer.step() requires the just-computed gradients. In addition, sometimes we need to accumulate gradients among … Read more
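A minimal training-loop sketch showing where zero_grad() fits (the model, data, and hyperparameters below are placeholders):

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

for step in range(100):
    x, y = torch.randn(32, 10), torch.randn(32, 1)

    optimizer.zero_grad()          # clear gradients left over from the previous step
    loss = loss_fn(model(x), y)
    loss.backward()                # .grad now holds freshly computed gradients
    optimizer.step()               # uses the just-computed gradients

# Because backward() accumulates into .grad, skipping zero_grad() for a few
# iterations and calling step() only afterwards effectively trains on a larger batch.
```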