Forward-Backward Algorithm

The forward-backward algorithm (FB) is a particular case of dynamic programming. It is well known in the context of Hidden Markov Models (HMMs), where it is used for training and inference, as well as in Kalman smoothers and Connectionist Temporal Classification (CTC). It consists of two passes: the first pass goes forward in time and the second goes backward, hence the name.
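
To make the two passes concrete, here is a minimal NumPy sketch of scaled forward-backward for a discrete HMM; the variable names and the scaling scheme are my own choices, not taken from the post:

```python
import numpy as np

def forward_backward(obs, pi, A, B):
    """Posterior state marginals p(z_t | x_1..T) for a discrete HMM.

    obs: observation indices, shape (T,)
    pi:  initial state distribution, shape (S,)
    A:   transitions, A[i, j] = p(z_t = j | z_{t-1} = i), shape (S, S)
    B:   emissions, B[i, k] = p(x_t = k | z_t = i), shape (S, K)
    """
    T, S = len(obs), len(pi)
    alpha = np.zeros((T, S))  # scaled forward messages
    beta = np.zeros((T, S))   # scaled backward messages
    c = np.zeros(T)           # per-step scaling factors

    # Forward pass: accumulate evidence left to right.
    alpha[0] = pi * B[:, obs[0]]
    c[0] = alpha[0].sum()
    alpha[0] /= c[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        c[t] = alpha[t].sum()
        alpha[t] /= c[t]

    # Backward pass: accumulate evidence right to left.
    beta[T - 1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
        beta[t] /= c[t + 1]

    # Thanks to the shared scaling, each row already sums to one.
    return alpha * beta
```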

More …

Playing with image embeddings

Quite a while ago I worked on image retrieval and ran a little experiment with algebraic operations on image embeddings extracted from a convolutional network. I downloaded a set of publicly available photos, extracted feature vectors using a pretrained ResNet-50, and performed cosine-distance KNN search using linear combinations of some query vectors.
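
The core of such an experiment fits in a few lines. A sketch under stated assumptions: `feats` holds L2-normalized ResNet-50 embeddings for the photo collection (one row per image), and `weights` encodes the linear combination, e.g. `[1.0, -1.0]` for "query A minus query B"; the function name is hypothetical:

```python
import numpy as np

def knn_by_query_combination(feats, queries, weights, k=10):
    """Rank images by cosine similarity to a linear combination of queries.

    feats:   (N, D) L2-normalized image embeddings
    queries: (M, D) embeddings of the query images
    weights: (M,) combination coefficients
    """
    q = np.asarray(weights) @ queries  # combine query vectors
    q /= np.linalg.norm(q)             # renormalize the combined query
    sims = feats @ q                   # cosine similarity via dot product
    return np.argsort(-sims)[:k]       # indices of the top-k neighbors
```

On normalized vectors, cosine distance ranking reduces to a plain dot product, which is why no explicit distance computation is needed.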

More …

Semantic clustering of images

Recent studies in computer vision have shown the ability of deep convolutional neural networks to learn high-level image representations. These representations are semantically rich and can be very handy in various visual tasks. One such task that can make use of high-level features is semantic clustering. Let's find some pictures via Google Web Search, feed them to a CNN, apply k-means to the extracted features, and take the biggest clusters. Here is what I got for different queries:
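
A minimal sketch of the clustering step, assuming `feats` are the CNN features for the downloaded pictures and `paths` the corresponding file names (both hypothetical; the post does not specify the exact pipeline):

```python
import numpy as np
from sklearn.cluster import KMeans

def biggest_clusters(feats, paths, n_clusters=10, top=3):
    """Cluster image features and report the largest clusters."""
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(feats)
    ids, sizes = np.unique(labels, return_counts=True)
    for cid in ids[np.argsort(-sizes)][:top]:  # largest clusters first
        members = [p for p, l in zip(paths, labels) if l == cid]
        print(cid, len(members), members[:5])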

More …

Why is top-5 error useful for evaluating ILSVRC models

For most publicly available models trained on the ImageNet dataset, top-5 and top-1 errors are reported. The best performing model from ILSVRC 2015 has a 6.7% top-5 error and a 23% top-1 error, evaluated using a single center crop. Interestingly, human error turned out to be not much better: top-5 error is around 5.1%, according to the experiments described in the article.
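
For reference, the metric itself is simple: an example counts as correct if the true label appears among the k highest-scoring predictions. A small sketch (function name is mine):

```python
import numpy as np

def top_k_error(logits, labels, k=5):
    """Fraction of examples whose true label is absent from the
    k highest-scoring predictions."""
    topk = np.argsort(-logits, axis=1)[:, :k]
    hits = (topk == labels[:, None]).any(axis=1)
    return 1.0 - hits.mean()

# k=1 gives top-1 error, k=5 gives top-5 error.
```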

More …

Preparing multilabel dataset for training ConvNet with Caffe

Preparing a multilabel training set for the Caffe framework is a bit nontrivial. If you have multiple, possibly varying numbers of ground-truth labels per training example, here is how you can do it using an LMDB store. For an LMDB data source you need to keep your data inputs and your labels separate by creating two LMDBs (one for the data and one for the labels). You also have to define two data layers in your network definition, set the same batch size for both of them, and disable shuffling so the two stores stay aligned. A sketch of the label side is shown below.
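
One way to handle the varying label count is to pad every label list to a fixed length and store it as `float_data` in a Datum. This is a sketch, not the post's exact recipe: `MAX_LABELS` and the `PAD` sentinel are my own assumptions, and the image LMDB is written analogously with pixel bytes in `datum.data`:

```python
import lmdb
import numpy as np
from caffe.proto import caffe_pb2

MAX_LABELS = 8   # hypothetical cap on labels per example
PAD = -1.0       # padding value, to be ignored by the loss

def write_label_lmdb(path, label_lists):
    """Write one Datum per example, holding a padded label vector."""
    env = lmdb.open(path, map_size=1 << 30)
    with env.begin(write=True) as txn:
        for i, labels in enumerate(label_lists):
            vec = np.full(MAX_LABELS, PAD, dtype=np.float32)
            vec[:len(labels)] = labels
            datum = caffe_pb2.Datum()
            # Store the label vector as a 1x1 "image" with MAX_LABELS channels.
            datum.channels, datum.height, datum.width = MAX_LABELS, 1, 1
            datum.float_data.extend(vec.tolist())
            # Keys must sort identically in both LMDBs so that, with
            # shuffling disabled, data and labels stay aligned per batch.
            txn.put(b"%08d" % i, datum.SerializeToString())
```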

More …