2-D convolution as a matrix-matrix multiplication [closed]

Yes, it is possible, and for this you can use a doubly block circulant matrix (a special case of a Toeplitz matrix). I will give you an example with a small kernel and input, but it is possible to construct a Toeplitz matrix for any kernel. So you have a 2d input x … Read more
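
As a concrete illustration, here is a minimal NumPy sketch (my own, not from the original answer) that builds the doubly block Toeplitz matrix for a full linear 2-D convolution and checks it against scipy.signal.convolve2d; the helper name conv2d_as_matrix is hypothetical:

    import numpy as np
    from scipy.signal import convolve2d

    def conv2d_as_matrix(kernel, input_shape):
        # Build T such that T @ x.ravel() == convolve2d(x, kernel, mode="full").ravel().
        n_rows, n_cols = input_shape
        k_rows, k_cols = kernel.shape
        out_rows, out_cols = n_rows + k_rows - 1, n_cols + k_cols - 1
        T = np.zeros((out_rows * out_cols, n_rows * n_cols))
        for i in range(n_rows):
            for j in range(n_cols):
                for a in range(k_rows):
                    for b in range(k_cols):
                        # Input pixel (i, j) contributes kernel[a, b]
                        # to output pixel (i + a, j + b).
                        T[(i + a) * out_cols + (j + b), i * n_cols + j] = kernel[a, b]
        return T

    x = np.arange(9, dtype=float).reshape(3, 3)
    k = np.array([[1.0, -1.0],
                  [2.0, 0.5]])
    T = conv2d_as_matrix(k, x.shape)
    assert np.allclose(T @ x.ravel(), convolve2d(x, k, mode="full").ravel())

The circulant variant arises when the convolution wraps around (circular boundary conditions); for ordinary zero-padded convolution the matrix is doubly block Toeplitz, as built here.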

How useful is Turing completeness? Are neural nets Turing complete?

The point of stating that a mathematical model is Turing complete is to reveal the capability of the model to perform any computation, given a sufficient amount of resources (i.e., unbounded ones), not to show whether a specific implementation of the model actually has those resources. Non-Turing-complete models would not be able to handle a … Read more

What is the default kernel_initializer in Keras?

Usually, it’s glorot_uniform by default. Different layer types might have different default kernel_initializers. When in doubt, just look in the source code. For example, for the Dense layer:

    class Dense(Layer):
        ...
        def __init__(self, units,
                     activation=None,
                     use_bias=True,
                     kernel_initializer="glorot_uniform",
                     bias_initializer="zeros",
                     kernel_regularizer=None,
                     bias_regularizer=None,
                     activity_regularizer=None,
                     kernel_constraint=None,
                     bias_constraint=None,
                     **kwargs):
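
For illustration, here is a minimal sketch (assuming TensorFlow 2.x-style Keras) showing that omitting kernel_initializer is equivalent to passing "glorot_uniform" explicitly, and how to inspect the default on a layer:

    from tensorflow import keras

    # These two layers are initialized the same way:
    # glorot_uniform is the default kernel_initializer for Dense.
    layer_default = keras.layers.Dense(64)
    layer_explicit = keras.layers.Dense(64, kernel_initializer="glorot_uniform")

    # Inspect the initializer object a layer will actually use.
    print(layer_default.kernel_initializer)  # a GlorotUniform instance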

What’s the difference between convolutional and recurrent neural networks? [closed]

The differences between a CNN and an RNN are as follows. CNN: A CNN takes fixed-size inputs and generates fixed-size outputs. CNNs are a type of feed-forward artificial neural network – variations of multilayer perceptrons designed to use minimal amounts of preprocessing. A CNN uses a connectivity pattern between its neurons that is inspired by the … Read more
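
To make the fixed-size vs. variable-size point concrete, here is a minimal Keras sketch of my own (assuming TensorFlow 2.x; the layer sizes are arbitrary). The CNN fixes its spatial input shape at construction time, while the RNN can leave the time dimension unspecified:

    from tensorflow import keras

    # CNN: the spatial input size is fixed at construction time,
    # so every input image must be 28x28x1.
    cnn = keras.Sequential([
        keras.layers.Conv2D(8, kernel_size=3, activation="relu",
                            input_shape=(28, 28, 1)),
        keras.layers.Flatten(),
        keras.layers.Dense(10),
    ])

    # RNN: the time dimension is left as None, so the same model
    # accepts sequences of any length (16 features per step here).
    rnn = keras.Sequential([
        keras.layers.SimpleRNN(32, input_shape=(None, 16)),
        keras.layers.Dense(10),
    ])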

How to count the total number of trainable parameters in a TensorFlow model?

Loop over the shape of every variable in tf.trainable_variables():

    total_parameters = 0
    for variable in tf.trainable_variables():
        # shape is an array of tf.Dimension
        shape = variable.get_shape()
        print(shape)
        print(len(shape))
        variable_parameters = 1
        for dim in shape:
            print(dim)
            variable_parameters *= dim.value
        print(variable_parameters)
        total_parameters += variable_parameters
    print(total_parameters)

Update: I wrote an article to clarify the dynamic/static shapes in … Read more

What does the parameter retain_graph mean in the Variable’s backward() method?

@cleros is pretty much on point about the use of retain_graph=True. In essence, it retains all the information necessary to calculate a certain variable, so that we can do a backward pass on it. An illustrative example: suppose we have the computation graph shown above. The variables d and e are the outputs, and a … Read more
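
As a minimal self-contained PyTorch sketch (my own illustration, not from the original answer): calling backward() a second time on the same graph only works because the first call passed retain_graph=True; without it, the second call raises a RuntimeError:

    import torch

    x = torch.tensor([1.0, 2.0], requires_grad=True)
    y = (x ** 2).sum()

    # First backward pass: keep the graph so it can be traversed again.
    y.backward(retain_graph=True)
    print(x.grad)  # tensor([2., 4.])

    # Second backward pass over the same graph; gradients accumulate.
    y.backward()
    print(x.grad)  # tensor([4., 8.])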

How to calculate the number of parameters for a convolutional neural network?

Let’s first look at how the number of learnable parameters is calculated for each individual type of layer you have, and then calculate the number of parameters in your example. Input layer: All the input layer does is read the input image, so there are no parameters you could learn here. Convolutional layers: Consider a … Read more
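
As a quick sketch of the convolutional-layer count (my own illustration; the helper name conv_layer_params is hypothetical): each filter learns kernel_h * kernel_w * in_channels weights plus one bias, and there is one filter per output channel:

    def conv_layer_params(kernel_h, kernel_w, in_channels, out_channels, use_bias=True):
        # Weights per filter, plus an optional bias term.
        per_filter = kernel_h * kernel_w * in_channels + (1 if use_bias else 0)
        return per_filter * out_channels

    # E.g. a 5x5 convolution from 3 input channels to 32 filters:
    print(conv_layer_params(5, 5, 3, 32))  # (5*5*3 + 1) * 32 = 2432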

What are some good resources for learning about Artificial Neural Networks? [closed]

First of all, give up any notion that artificial neural networks have anything to do with the brain beyond a passing similarity to networks of biological neurons. Learning biology won’t help you effectively apply neural networks; learning linear algebra, calculus, and probability theory will. You should at the very least make yourself familiar with … Read more

What’s the difference between sparse_softmax_cross_entropy_with_logits and softmax_cross_entropy_with_logits?

Having two different functions is a convenience, as they produce the same result. The difference is simple: For sparse_softmax_cross_entropy_with_logits, labels must have the shape [batch_size] and the dtype int32 or int64. Each label is an int in range [0, num_classes-1]. For softmax_cross_entropy_with_logits, labels must have the shape [batch_size, num_classes] and dtype float32 or float64. Labels … Read more
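
A minimal sketch (assuming TensorFlow 2.x eager mode) showing the two functions agreeing once the sparse integer labels are one-hot encoded:

    import tensorflow as tf

    logits = tf.constant([[2.0, 1.0, 0.1],
                          [0.5, 2.5, 0.3]])            # shape [batch_size, num_classes]
    sparse_labels = tf.constant([0, 1])                # shape [batch_size], class indices
    dense_labels = tf.one_hot(sparse_labels, depth=3)  # shape [batch_size, num_classes]

    sparse_loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=sparse_labels, logits=logits)
    dense_loss = tf.nn.softmax_cross_entropy_with_logits(
        labels=dense_labels, logits=logits)

    # Both produce the same per-example losses.
    print(sparse_loss.numpy())
    print(dense_loss.numpy())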

Should we do learning rate decay for the Adam optimizer?

It depends. Adam updates each parameter with an individual learning rate, meaning every parameter in the network has a specific learning rate associated with it. But each per-parameter learning rate is computed using lambda (the initial learning rate) as an upper limit. This means that every single learning rate can vary from … Read more
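
To see why lambda acts as an upper bound rather than a fixed step, here is a rough NumPy sketch of a single Adam update, following the standard Adam formulas (my own illustration; adam_step is a hypothetical helper):

    import numpy as np

    def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        # Exponential moving averages of the gradient and squared gradient.
        m = beta1 * m + (1 - beta1) * grad
        v = beta2 * v + (1 - beta2) * grad ** 2
        # Bias correction for the zero-initialized moments.
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        # The effective per-parameter step m_hat / sqrt(v_hat) is roughly
        # bounded in magnitude, so lr caps the step rather than fixing it.
        param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
        return param, m, v

    p, m, v = np.zeros(3), np.zeros(3), np.zeros(3)
    p, m, v = adam_step(p, np.array([0.1, -2.0, 0.0]), m, v, t=1)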