Playing with image embeddings

Quite a while ago I worked on image retrieval and ran a small experiment with algebraic operations on image embeddings extracted from a convolutional network. I downloaded a set of publicly available photos, extracted feature vectors with a pretrained ResNet-50, and ran a cosine-distance k-NN search using linear combinations of query vectors (named sea, woman, building, crowd, car, dress, coffee, burger, man and woman_in_dress in the searches below).

All documents and queries can be encoded with the following few lines of code using Caffe:

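import numpy as np

# `net` is an already loaded Caffe ResNet-50; open_image and preprocess are helpers not shown here
def encode(url):
    # load the image and convert it to the network's input format
    img = open_image(url)
    img = preprocess(img)
    data = np.asarray([img])
    # make sure the input blob holds a batch of one 3x224x224 image
    if net.blobs['data'].data.shape[0] != 1:
        net.blobs['data'].reshape(1, 3, 224, 224)
    net.forward(data=data)
    # pool5 is the global average pooling output (2048 values for ResNet-50);
    # copy it, because the blob's memory is overwritten by the next forward pass
    return net.blobs['pool5'].data[0].flatten().copy()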

Encode your documents and queries, then perform a nearest neighbor (NN) search. For fast approximate NN search I put the vectors into an Annoy index.

    
# resize, centeredCrop and show are display helpers not shown here
def search(query_vector):
    # get the ids of the 9 nearest vectors from the index
    ids = index.get_nns_by_vector(query_vector, 9)
    # load the images by id and show them
    images = []
    for i in ids:
        try:
            img = Image.open(id2file[i])
        except OSError:
            # Image.open raises instead of returning None; skip unreadable files
            continue
        img = centeredCrop(np.array(resize(img, 128, 128)), 128, 128)
        images.append(img)
    plt.figure(2, figsize=(10, 10))
    show(np.array(images))

search(sea)

[Image: the 9 nearest neighbors returned for this query]

search(sea+woman)

[Image: the 9 nearest neighbors returned for this query]

search(building + crowd*0.5 + sea*1.5)

[Image: the 9 nearest neighbors returned for this query]

search(woman + car*3 + dress)

[Image: the 9 nearest neighbors returned for this query]

search(coffee*1.3 + burger)

[Image: the 9 nearest neighbors returned for this query]

search(man + dress)

[Image: the 9 nearest neighbors returned for this query]

search(woman_in_dress)

[Image: the 9 nearest neighbors returned for this query]

search(woman_in_dress - dress)

[Image: the 9 nearest neighbors returned for this query]
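The queries above are weighted sums and differences of embeddings, ranked by cosine distance. Here is a small, dependency-free sketch of that idea with an exact (brute-force) search instead of Annoy. The 3-dimensional vectors, the file names and the `combine` and `knn` helpers are invented for illustration; real ResNet-50 embeddings have 2048 dimensions and no such readable axes.

```python
import math

def combine(*weighted):
    """Weighted sum of vectors, e.g. combine((1.3, coffee), (1.0, burger))."""
    dim = len(weighted[0][1])
    return [sum(w * v[k] for w, v in weighted) for k in range(dim)]

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def knn(query, docs, k):
    """Return the names of the k documents closest to query by cosine distance."""
    return sorted(docs, key=lambda name: cosine_distance(query, docs[name]))[:k]

# Toy "embeddings"; the axes loosely stand for (sea, woman, building).
docs = {
    'beach.jpg': [1.0, 0.1, 0.0],
    'portrait.jpg': [0.0, 1.0, 0.1],
    'woman_on_beach.jpg': [0.7, 0.7, 0.0],
    'skyline.jpg': [0.1, 0.0, 1.0],
}
sea = [1.0, 0.0, 0.0]
woman = [0.0, 1.0, 0.0]

# sea + woman points between the two concepts.
top = knn(combine((1.0, sea), (1.0, woman)), docs, 2)
print(top)  # ['woman_on_beach.jpg', 'beach.jpg']

# Cosine distance ignores vector length, so scaling every weight by the
# same factor leaves the ranking unchanged; only the ratios matter.
scaled = knn(combine((5.0, sea), (5.0, woman)), docs, 2)
print(scaled == top)  # True
```

This is why a query like coffee*1.3 + burger only shifts the balance between the two concepts: the factor 1.3 matters relative to the weight of burger, not as an absolute value. A subtraction such as woman_in_dress - dress is the same operation with a weight of -1.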