Nicer Causal Convs for TensorFlow 2

Causal 1D convolutions are quite useful when working with autoregressive models like WaveNet. To define this layer in TensorFlow 2, we just pass “causal” padding in the Conv1D layer arguments, as in the following one-liner:

import tensorflow as tf

conv_layer = tf.keras.layers.Conv1D(
    filters=64,
    kernel_size=2,
    dilation_rate=4,
    padding='causal')

Calling this layer preserves the temporal dimension of the input by adding left padding, which is not always desirable. As we stack more and more layers with larger dilation rates, the padding becomes a large portion of the input data. Also, the default implementation of the dilated convolution layer implicitly defines some padding based on your first input, restricting your choice of input shape later on.
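
To make the padding behavior concrete, here is a minimal sketch (not the full post’s solution) showing that “causal” padding amounts to left-padding the input with (kernel_size - 1) * dilation_rate zeros and running a “valid” convolution; the shapes and random input are purely illustrative:

import tensorflow as tf

kernel_size, dilation_rate = 2, 4
left_pad = (kernel_size - 1) * dilation_rate  # 4 steps of left padding

x = tf.random.normal([1, 16, 8])  # (batch, time, channels)

causal = tf.keras.layers.Conv1D(64, kernel_size,
                                dilation_rate=dilation_rate, padding='causal')
valid = tf.keras.layers.Conv1D(64, kernel_size,
                               dilation_rate=dilation_rate, padding='valid')

y_causal = causal(x)
# Explicit left padding on the time axis, then a 'valid' convolution.
y_valid = valid(tf.pad(x, [[0, 0], [left_pad, 0], [0, 0]]))

# Both keep the time dimension at 16; with shared weights the values would match too.
print(y_causal.shape, y_valid.shape)  # (1, 16, 64) (1, 16, 64)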

More …

Forward-Backward Algorithm

The forward-backward algorithm (FB) is a particular case of dynamic programming. It is well known in the context of Hidden Markov Models (HMMs), where it is used for training and inference, as well as in Kalman smoothers and Connectionist Temporal Classification (CTC). It consists of two passes: the first pass goes forward in time and the second goes backward, hence the name.
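
As a minimal sketch of the two passes, here is a forward-backward implementation for a discrete HMM; the toy parameters and variable names are mine, not taken from the full post:

import numpy as np

# A: transition matrix, B: emission matrix, pi: initial state distribution.
def forward_backward(obs, A, B, pi):
    T, S = len(obs), A.shape[0]

    alpha = np.zeros((T, S))  # forward pass: P(o_1..o_t, state_t = s)
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]

    beta = np.zeros((T, S))   # backward pass: P(o_{t+1}..o_T | state_t = s)
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])

    gamma = alpha * beta      # posterior state marginals, then normalize per step
    return gamma / gamma.sum(axis=1, keepdims=True)

# Toy two-state example.
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.5, 0.5])
print(forward_backward([0, 1, 0], A, B, pi))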

More …

Playing with image embeddings

Quite a while ago I worked on image retrieval and ran a little experiment with algebraic operations on image embeddings extracted from a convolutional network. I downloaded a set of publicly available photos, extracted feature vectors using a pretrained ResNet-50, and ran cosine-distance KNN searches using linear combinations of some query vectors.
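
A rough sketch of this pipeline in Keras (the exact setup is in the full post; the helper functions here are my own illustration):

import numpy as np
import tensorflow as tf

# Pretrained ResNet-50 as a feature extractor: global-average-pooled activations.
model = tf.keras.applications.ResNet50(include_top=False, pooling='avg')

def embed(images):
    # images: float array of shape (n, 224, 224, 3), RGB
    x = tf.keras.applications.resnet50.preprocess_input(images)
    v = model.predict(x)
    return v / np.linalg.norm(v, axis=1, keepdims=True)  # unit-normalize

def knn(query_vec, index_vecs, k=5):
    # On unit vectors, cosine-distance KNN reduces to a dot-product ranking.
    scores = index_vecs @ query_vec
    return np.argsort(-scores)[:k]

# An algebraic query would combine embeddings, e.g. q = e_a + e_b - e_c,
# then search with knn(q / np.linalg.norm(q), index_vecs).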

More …

Semantic clustering of images

Recent studies in the field of computer vision have shown the ability of deep convolutional neural networks to learn high-level image representations. These representations are rich in semantics and can be very handy in various visual tasks. One such task that can make use of high-level features is semantic clustering. Let’s find some pictures in Google Web Search, feed them to a CNN, apply k-means to the extracted features, and take the biggest clusters. Here is what I got for different queries:
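
Before the results (which are in the full post), here is a minimal sketch of the clustering step; ResNet-50 features and scikit-learn’s k-means are assumptions of mine, not necessarily what the post used:

import numpy as np
import tensorflow as tf
from sklearn.cluster import KMeans

model = tf.keras.applications.ResNet50(include_top=False, pooling='avg')

def cluster_images(images, n_clusters=10):
    # images: float array of shape (n, 224, 224, 3), already loaded and resized
    feats = model.predict(tf.keras.applications.resnet50.preprocess_input(images))
    labels = KMeans(n_clusters=n_clusters).fit_predict(feats)
    sizes = np.bincount(labels, minlength=n_clusters)
    return labels, np.argsort(-sizes)  # cluster ids, biggest clusters first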

More …

Why is top-5 error useful for evaluating ILSVRC models

For most publicly available models trained on the ImageNet dataset, top-5 and top-1 errors are reported. The best-performing model from ILSVRC 2015 has a 6.7% top-5 error and a 23% top-1 error, evaluated using a single center crop. It’s pretty interesting that human error turned out to be not much better: around 5.1% top-5, according to the results of some experiments described in the article.
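
For reference, a small sketch of how top-k error is computed; the probabilities and labels below are made up:

import numpy as np

def top_k_error(probs, labels, k=5):
    # A prediction counts as correct if the true class is among the k
    # highest-scoring classes.
    topk = np.argsort(-probs, axis=1)[:, :k]
    hits = (topk == labels[:, None]).any(axis=1)
    return 1.0 - hits.mean()

probs = np.random.rand(4, 1000)   # 4 images, 1000 ImageNet classes
labels = np.array([3, 10, 999, 0])
print(top_k_error(probs, labels, k=5), top_k_error(probs, labels, k=1))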

More …