About 5,000 images per minute are uploaded to the photo-sharing site http://www.flickr.com/; over 7,000,000 a day. Similar numbers are uploaded to other social sites. Often these images are acquired by amateur photographers under non-ideal conditions and with low-end digital cameras such as those available in mobile phones. Such images often look noisy and blurry, with the wrong colors or contrast. Even images acquired by high-end devices, such as MRI or microscopy, suffer from these effects due to the intrinsic physics of the device and the structure of the material being photographed. A key challenge in image science, then, is how to go from the low-quality image to a high-quality one that is sharp, has good contrast, and is clean of artifacts. This is an intrinsically ill-posed inverse problem, according to Hadamard's definition. So, what do we do?
We have to include additional assumptions, a process often called regularization. These assumptions come with different names depending on one's particular area of research or interest, and are often called priors or models. Deriving appropriate regularization terms, priors or models, has occupied the research community since the early days of digital image processing, and we have witnessed fantastic and very inspiring models such as linear and nonlinear diffusion, wavelets, and total variation. Different image models can be appropriate for different types of images; for example, MRI and natural images should have different models. Indeed, some models might be useful for some inverse problems and not for others.
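One common way to write down what regularization means is the variational form below. It is only a sketch of the general idea, and every symbol in it is an assumption introduced here for illustration: $f$ is the observed image, $u$ the unknown high-quality image, $A$ the degradation operator (blur, subsampling, or the identity for pure denoising), $n$ the noise, $R$ the regularization term encoding the prior or model, and $\lambda > 0$ the weight that balances fidelity to the data against the model.

```latex
f = A u + n,
\qquad
\hat{u} = \arg\min_{u} \; \tfrac{1}{2}\,\| A u - f \|_2^2 \; + \; \lambda\, R(u),
\qquad
\text{e.g. total variation: } R(u) = \int |\nabla u| \, dx .
```

Without the term $\lambda R(u)$, many different images $u$ can explain the same observation $f$ equally well, which is what makes the problem ill-posed; the choice of $R$ is where the image model enters.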
In their landmark paper, Buades, Coll, and Morel discuss a number of image models under a unified framework. Let us concentrate on the self-similarity model, which leads to the important non-local means algorithm proposed by the authors for image denoising and its extensions to other image inverse problems. The basic underlying concept is that local image information repeats itself non-locally across the image. Noise, on the other hand, is expected in numerous scenarios to be random. Therefore, by collecting those similar local regions from all across the image, we can eliminate the noise with simple estimators based on having multiple observations of the same underlying signal under different noise conditions. This simple and powerful idea of self-similarity, which brings a unique perspective of simultaneous local and non-local processing, dates at least to Shannon's model for English writings in 1950 ("Prediction and Entropy of Printed English," Bell Sys. Tech. J., 50–64), and was used in image processing for synthesis tasks. But it was not until the elegant 2005 paper by Buades et al. that the community had its Eureka moment and clearly realized it could be exploited for reconstruction challenges as well.
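The self-similarity idea can be sketched in a few lines of NumPy. The code below is a naive illustration under stated assumptions, not the authors' implementation: the function name `nonlocal_means`, the parameters `patch_radius`, `search_radius`, and `h`, the Gaussian weighting of the mean squared patch difference, and the restriction of the search to a window around each pixel are all choices made here for brevity.

```python
import numpy as np

def nonlocal_means(noisy, patch_radius=1, search_radius=5, h=0.1):
    """Naive non-local means sketch for a 2-D grayscale image.

    Each pixel is replaced by a weighted average of pixels whose
    surrounding patches look similar. All parameter names and the
    exponential weighting are illustrative assumptions.
    """
    noisy = np.asarray(noisy, dtype=float)
    p, s = patch_radius, search_radius
    rows, cols = noisy.shape
    # Reflect-pad so that every pixel has a full patch around it.
    padded = np.pad(noisy, p, mode="reflect")
    out = np.zeros_like(noisy)
    for i in range(rows):
        for j in range(cols):
            # Patch centered on pixel (i, j) in original coordinates.
            ref = padded[i:i + 2 * p + 1, j:j + 2 * p + 1]
            weight_sum = 0.0
            value_sum = 0.0
            # Compare against every patch in the search window.
            for k in range(max(i - s, 0), min(i + s, rows - 1) + 1):
                for l in range(max(j - s, 0), min(j + s, cols - 1) + 1):
                    cand = padded[k:k + 2 * p + 1, l:l + 2 * p + 1]
                    dist2 = np.mean((ref - cand) ** 2)
                    # Similar patches get weights near 1, others near 0.
                    w = np.exp(-dist2 / (h * h))
                    weight_sum += w
                    value_sum += w * noisy[k, l]
            out[i, j] = value_sum / weight_sum
    return out
```

The cost grows with the image size times the search-window size, which is why efficiently finding similar regions is one of the practical questions this model raises.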
This idea of self-similarity opened a large number of questions. At the practical level, we could ask how to define the scale of the local regions, how to efficiently find similar regions in an image, how to define the distance between local image regions to determine that they are "similar," and what type of image processing tasks can be addressed with this model. At the theoretical level, standard questions like consistency of the estimator and its optimality are naturally raised. The image processing community is busy addressing these questions.
There is another critical aspect clearly illustrated by the following seminal work, this being the idea of addressing image inverse problems with overlapping local image regions, or overlapping image patches. In many scenarios the patch (sometimes now called a super-pixel) became the working unit, replacing the standard single point or pixel. While some researchers have adopted models that are different from the self-similarity one, it is safe to say that today, six years after their original paper was published, the state-of-the-art techniques for image reconstruction, as well as for image classification, are all based on working with these super-pixels or patches. This has become a fundamental building block of virtually all image models.
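Working with overlapping patches usually involves two steps: cutting the image into patches, and later merging the per-patch results back into one image. The sketch below shows one simple way to do both, assuming square patches taken at every pixel position and plain averaging where patches overlap; the function names `extract_patches` and `aggregate_patches` are hypothetical and chosen here for illustration.

```python
import numpy as np

def extract_patches(image, size):
    """Return all overlapping size x size patches, in row-major order
    of their top-left corners."""
    rows, cols = image.shape
    return np.array([
        image[i:i + size, j:j + size]
        for i in range(rows - size + 1)
        for j in range(cols - size + 1)
    ])

def aggregate_patches(patches, shape, size):
    """Rebuild an image by averaging, at each pixel, the values
    proposed by every patch that covers it."""
    rows, cols = shape
    total = np.zeros(shape, dtype=float)
    count = np.zeros(shape, dtype=float)
    idx = 0
    for i in range(rows - size + 1):
        for j in range(cols - size + 1):
            total[i:i + size, j:j + size] += patches[idx]
            count[i:i + size, j:j + size] += 1.0
            idx += 1
    return total / count
```

In a patch-based method, each extracted patch would be processed (denoised, for example) before aggregation; with unmodified patches, aggregation simply returns the original image.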
The authors' work also starts hinting at the idea that we can learn the model from the data itself, or at least adapt it to the image, instead of relying on predefined mathematical structures. This relates to dictionary learning, where the image is modeled as being represented via a learned dictionary. The self-similarity model assumes the dictionary is the image itself, or actually its local patches. All these models indicate that images, and in particular image patches, do not actually live in the ambient high-dimensional space, but in some much lower dimensional stratification embedded in it.
For over 40 years, the image processing community has been on the lookout for image models. The most fundamental of them have left important footprints in the community. Many of the questions are still open today, from the eternal battle between generative and discriminative models to the need to derive computationally feasible and fundamentally useful models. All this work goes to the root of our desire to know "What is an image?"
©2011 ACM 0001-0782/11/0500 $10.00