Final Project 1998.2

Multiresolution Query by Image Content

intro.gif (64744 bytes)

Objective

The purpose of the project was to implement a system, based on wavelets, for querying image databases by content.

Description

The project was based on the article "Fast Multiresolution Image Querying" by Charles E. Jacobs, Adam Finkelstein, and David H. Salesin on SIGGRAPH'97 Proceedings.

Team

The following students participated on the project

Alexandre Ferreira (alexgf@tecgraf.puc-rio.br)
Carlos Vitor Alencar (alencar@tecgraf.puc-rio.br)
Marcos Machado (mmachado@tecgraf.puc-rio.br)
Paulo Mattos (pmattos@tecgraf.puc-rio.br)
Rodrigo Toledo (rtoledo@tecgraf.puc-rio.br)
William Lira (william@tecgraf.puc-rio.br)

Slides of a talk about the project

[ Acrobat (PDF) version ]

Availability

The source code will be available soon.

Implementation

The program was implemented in C with IUP (user interface), CD (graphical package) and IM (image file persistence) libraries. It works across all major-computing platforms: Windows 95/98/NT, Silicon Graphics, Sun, Linux e IBM RISC600.

The libraries are freeware for academical purpose. To acquire it, send a message to TeCGraf: tecgraf@tecgraf.puc-rio.br.

The documentation of the used libraries is available at :

IUP: http://www.tecgraf.puc-rio.br/manuais/iup
CD: http://www.tecgraf.puc-rio.br/manuais/cd
IM: http://www.tecgraf.puc-rio.br/manuais/im

Algorithm

The key to the algorithm is the establishment of an effective and efficient metric capable of computing the distance between a query image Q and a potential target image T. Wavelet decomposition proved to be a good foundation for this metric, for several reasons, such as: few coefficients provides a good approximation of the original image retaining information from existing edges; presents relative invariance to resolution changes; it is fast to compute, running in linear time in size of the image; spatial localization of the frequencies. The chosen metric use the YIQ color space and the Haar wavelet.

For each image we compute its Haar wavelet transform, truncating and quantizing its coeffecients. Those remaining coeffecients represent the image signature.

figure3.gif (19639 bytes)

A painting its truncated and quantized wavelet decomposition with 2000 coefficients
(Y color channel)
the actual decomposition used with 60 coefficients
(Y color channel)

The metric for each color channel can be represented as:

metric1.gif (2061 bytes)

Where Q[0,0] and T[0,0] correspond to the overall average intensity of that color channel; Q'[i,j] and T'[i.j] represent the [i,j]th truncated (the mth greatest), quantized (-1, 0, 1) wavelet coefficients (terms) of Q and T; and w{i,j} the weight of the [i,j]th coefficient.

Some simplifications can be applied:

  • Actually, the expression |Q'[i,j] - T'[i.j]|, can be replaced by (Q'[i,j] = T'[i.j]), where it means 0 if it's really different, and 1 otherwise.
  • The terms can be grouped into "buckets" so that each group has a weigh instead of each term.
  • In order to make the metric even faster, the only terms considered are the non-zero coefficients in the query wavelet (Q'[i,j] is non-zero). (Note that the metric becomes asymmetric after this simplification.)
  • Q[0,0] and T[0,0] are reals in the range [0,1], where 0 represents the lowest average color of all the target images and 1 the greatest.

The resulting metric is:

metric2.gif (2227 bytes)

The function bin(i,j) provides a way of grouping different coefficients into a small number of bins, with each bin weighted by some constant w [b]. For a given set of bins, the best weights w [b] were found experimentally,

The application implements both the pre-processing and query interface of the algorithm. The pre-process phase consists in creating an image database containing the signatures of each image. The program offers two query options: the user inputs a pre-existing image or the user draws a sketch of the intended image on the canvas.

The figure below show the screen interface of the program:

figure1.gif (82382 bytes)

 

Some improvements

In our implementation we made two basic changes from the original paper:

  • To use a low-resolution database, and also use a low-resolution version of the image provided for querying. This speeds up the process substantially and gives essentially the same results since the continuous wavelet transform is invariant under change of scale and almost all of the largest m coefficients are located in the low resolution.
  • We implement a estimated percentual of aproximation between the query image and the images in the data base.
  • We also tried to use another Haar decomposition, which is faster than the one used by Salesin et al.. The results were good, in spite of using the same weigths of the Haar initially used.

Some examples are given below

Data base: Animals (100 images):

shark.gif (360661 bytes)

 

Data base: Van Gogh (500 images):

image1peq.jpg (8482 bytes)image3peq.jpg

image4peq.jpgimage7peq.jpg

image8peq.jpgimage10peq.jpg

image13peq.jpgimage12peq.jpg (7981 bytes)

 

Data base: Clipart (100 images):

clip1peq.jpg (8482 bytes)clip2peq.jpg

clip3peq.jpg (8482 bytes)clip4peq.jpg

clip5peq.jpg (8482 bytes)clip6peq.jpg

clip7peq.jpg (8482 bytes)clip8peq.jpg

 

Extension to Video Query

The image query algorithm described above can be extended to volumetric graphical objects. The extension is straightforward:

  • The volumetric object is given by a matrix representation (voxels);
  • We extend the Haar transform to volumetric objects using tensor product;
  • The computation of the volumetric signature is also a simple extension of the 2D solution.

The video query consists in considering a video sequence as a volumetric object (see figure below). Two possibilities for que input video query are possible: Use a video shot of the desired sequence (a time normalization should be considered). Use a paint system interface where user strokes, represents the motion path of an essential motion in the required video sequence.

vq.gif (10918 bytes)