Playing with image embeddings

Quite a while ago I worked on image retrieval and ran a small experiment with algebraic operations on image embeddings extracted from a convolutional network. I downloaded a set of publicly available photos, extracted feature vectors with a pretrained ResNet-50, and ran cosine-distance k-nearest-neighbor (KNN) search using linear combinations of query vectors. All documents and queries can be encoded with the following few lines of code using Caffe:

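import numpy as np

# net is the pretrained ResNet-50 loaded as a Caffe network;
# open_image and preprocess are helpers defined elsewhere
def encode(url):
    img = open_image(url)
    img = preprocess(img)
    data = np.asarray([img])
    # make sure the input blob holds a single 3x224x224 image
    if net.blobs['data'].data.shape[0] != 1:
        net.blobs['data'].reshape(1, 3, 224, 224)
    net.forward(data=data)
    # copy the pool5 activations so a later forward pass does not overwrite them
    return net.blobs['pool5'].data[0].flatten().copy()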

Encode your documents and queries, then perform nearest neighbor (NN) search. For fast NN search I put the vectors into an Annoy index.
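To make clear what the index is approximating, here is a minimal sketch of exact (brute-force) cosine-distance search. It is not the code used in the experiment: the function names `cosine_distance` and `knn` and the toy 3-dimensional vectors are made up for illustration, and it uses plain Python lists so it runs without any dependencies. Real embeddings would be NumPy arrays with far more dimensions.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cos(angle between a and b)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def knn(query, vectors, k):
    """Exact search: ids of the k vectors with the smallest cosine distance."""
    ranked = sorted(range(len(vectors)),
                    key=lambda i: cosine_distance(query, vectors[i]))
    return ranked[:k]

# Toy "embeddings", invented for this example (the list position is the id).
toy_vectors = [
    [1.0, 0.0, 0.0],  # id 0
    [0.9, 0.1, 0.0],  # id 1
    [0.0, 1.0, 0.0],  # id 2
    [0.0, 0.0, 1.0],  # id 3
]

nearest = knn([1.0, 0.05, 0.0], toy_vectors, 2)
print(nearest)
```

Brute force compares the query against every vector, which is fine for a handful of items but slow for a large photo collection. An approximate index such as Annoy trades a little accuracy for much faster lookups.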

    
def search(query_vector):
    # get the ids of the 9 nearest vectors from the index
    ids = index.get_nns_by_vector(query_vector, 9)
    # load the images by id and show them in a grid
    images = []
    for i in ids:
        try:
            img = Image.open(id2file[i])
        except IOError:
            # skip files that are missing or unreadable
            continue
        img = centeredCrop(np.array(resize(img, 128, 128)), 128, 128)
        images.append(img)
    plt.figure(2, figsize=(10, 10))
    show(np.array(images))

search(sea)

[image: search results]

search(sea+woman)

[image: search results]

search(building + crowd*0.5 + sea*1.5)

[image: search results]

search(woman + car*3 + dress)

[image: search results]

search(coffee*1.3 + burger)

[image: search results]

search(man + dress)

[image: search results]

search(woman_in_dress)

[image: search results]

search(woman_in_dress - dress)

[image: search results]
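A note on the weights in queries like `coffee*1.3 + burger`: because the search uses cosine distance, only the ratio between the weights matters, not their overall scale. The sketch below illustrates this with made-up 3-dimensional stand-ins for the embeddings. The helper names `cosine_similarity` and `combine` are hypothetical and not part of the original code.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def combine(weighted_vectors):
    """Weighted sum of equal-length vectors given as (weight, vector) pairs."""
    dim = len(weighted_vectors[0][1])
    return [sum(w * v[i] for w, v in weighted_vectors) for i in range(dim)]

# Invented toy vectors standing in for real image embeddings.
coffee = [1.0, 0.0, 0.2]
burger = [0.0, 1.0, 0.2]
doc = [0.5, 0.5, 0.1]

q1 = combine([(1.3, coffee), (1.0, burger)])
q2 = combine([(2.6, coffee), (2.0, burger)])  # same ratio, doubled
q3 = combine([(3.0, coffee), (1.0, burger)])  # different ratio

# Doubling every weight leaves the similarity to any document unchanged...
same = math.isclose(cosine_similarity(q1, doc), cosine_similarity(q2, doc))
# ...while changing the ratio between the weights does change it.
differs = not math.isclose(cosine_similarity(q1, doc), cosine_similarity(q3, doc))
print(same, differs)
```

So `search(sea + woman)` and `search(2*sea + 2*woman)` should rank documents identically under cosine distance. What steers the results is how strongly one concept is weighted relative to the others.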
