Saturday, June 20, 2009

paper critique & summarization : Rapid object detection using a boosted cascade of simple features

Title: Rapid object detection using a boosted cascade of simple features
Author: Paul Viola and Michael Jones

Summarization:
AdaBoost is a useful tool based on the idea that combining many "weak classifiers" yields a robust "strong classifier". In every round, it evaluates all the training data, increases the weights of the misclassified cases, and reduces the weights of the correctly classified cases, with the aim of minimizing the training error. This paper also uses a technique called the "integral image", which makes feature computation efficient because each rectangle sum needs only a few additions and subtractions. AdaBoost can also be used for feature selection. This paper applies it to face detection, where a sequence of classifiers filters out non-face image regions one by one, so the number of remaining candidates drops sharply at each stage. The same approach is also suitable for selecting the valuable features among other kinds of features.
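To make the reweighting step concrete, here is a minimal sketch in Python. It is the textbook discrete AdaBoost with one-dimensional threshold classifiers (decision stumps), not the exact variant used in the paper, and the names `train_adaboost` and `predict` are hypothetical.

```python
import math

def train_adaboost(xs, ys, rounds):
    """xs: 1-D samples, ys: labels in {-1, +1}. Returns a list of weighted stumps."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []  # entries are (alpha, threshold, polarity)
    for _ in range(rounds):
        best = None
        # Weak learner: choose the stump with the lowest weighted error.
        for threshold in xs:
            for polarity in (1, -1):
                preds = [polarity if x >= threshold else -polarity for x in xs]
                err = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
                if best is None or err < best[0]:
                    best = (err, threshold, polarity, preds)
        err, threshold, polarity, preds = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, threshold, polarity))
        # Misclassified samples (y * p == -1) gain weight, correct ones lose weight.
        weights = [w * math.exp(-alpha * y * p) for w, p, y in zip(weights, preds, ys)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    """Strong classifier: sign of the weighted vote of all weak classifiers."""
    score = sum(alpha * (pol if x >= thr else -pol) for alpha, thr, pol in ensemble)
    return 1 if score >= 0 else -1
```

No single threshold can separate a set such as xs = [1, 2, 3, 4, 5, 6] with labels [1, 1, -1, -1, 1, 1], but three rounds of this sketch are enough to classify it correctly, which is the "weak to strong" effect in miniature.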

critique:
The idea of this paper is cool and simple, but it is a supervised algorithm, so the training data is important for the performance. Moreover, as a feature selection algorithm, it also selects the most important features easily, which could be useful and efficient for approximate classification.

paper critique & summarization : Algorithms for Fast Vector Quantization

Title: Algorithms for Fast Vector Quantization
Author: Sunil Arya and David M. Mount

Summarization:
For fast vector quantization, this paper introduces three algorithms:
1. standard k-d tree with incremental distance calculation, which uses the distance between the query and the cell boundaries to decide which search paths can be cut off.
2. priority k-d tree search, which maintains a priority queue of subtrees, recording the sibling of each node passed on the way down, so that it can find the points that are really near the query.
3. neighborhood graphs, which use a neighborhood graph to improve precision. The search moves from the current point to the neighbor nearest to the query and continues until it reaches a point whose neighbors have all been visited. This method gets the best performance of the three.
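The pruning idea behind the first method can be sketched in a few lines of Python. This is a plain k-d tree search that skips a subtree when the splitting plane is already farther away than the best point found; it is a simplification, not the paper's full incremental distance calculation, and the names `build_kdtree` and `nearest` are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_kdtree(points, depth=0):
    """Build a k-d tree as nested dicts, splitting on the median of one axis per level."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }

def nearest(node, query, best=None):
    """Return the stored point closest to query."""
    if node is None:
        return best
    if best is None or sq_dist(node["point"], query) < sq_dist(best, query):
        best = node["point"]
    diff = query[node["axis"]] - node["point"][node["axis"]]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, query, best)
    # Every point on the far side is at least |diff| away along this axis,
    # so the far subtree is searched only if it could still hold a closer point.
    if diff * diff < sq_dist(best, query):
        best = nearest(far, query, best)
    return best
```

For example, with the points (2,3), (5,4), (9,6), (4,7), (8,1), (7,2) and the query (9,2), the search returns (8,1).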

critique:
The k-d tree based quantization algorithms are efficient, but only in low-dimensional vector spaces. How to use these methods in high dimensions and still reach nearly the same performance is a very important question now.

paper critique & summarization : Probabilistic Latent Semantic Indexing

Title: Probabilistic Latent Semantic Indexing
Author: Thomas Hofmann

summarization:
This paper proposes the pLSA model, an automatic indexing method based on a statistical latent class model. As a simple example, given the document-term matrix, the model can extract the concept of a "topic": it estimates the probability of each word under each topic and the relation between topics and documents. An EM algorithm computes these probabilities iteratively, trying to reach the maximum likelihood, and the process is done when it converges.
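A minimal sketch of the EM iteration in Python may help. It assumes the parameterization P(w|d) = sum over z of P(z|d) P(w|z), a fixed number of iterations instead of a convergence test, and plain EM without the tempering used in the paper. The names `plsa` and `normalize` are hypothetical.

```python
import random

def normalize(values):
    """Scale non-negative values to sum to 1 (uniform if they are all zero)."""
    total = sum(values)
    if total == 0:
        return [1.0 / len(values)] * len(values)
    return [v / total for v in values]

def plsa(counts, n_topics, n_iters=50, seed=0):
    """counts[d][w] is the count of word w in document d. Returns (P(z|d), P(w|z))."""
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    p_z_d = [normalize([rng.random() + 1e-3 for _ in range(n_topics)]) for _ in range(n_docs)]
    p_w_z = [normalize([rng.random() + 1e-3 for _ in range(n_words)]) for _ in range(n_topics)]
    for _ in range(n_iters):
        new_zd = [[0.0] * n_topics for _ in range(n_docs)]
        new_wz = [[0.0] * n_words for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_words):
                if counts[d][w] == 0:
                    continue
                # E-step: posterior P(z|d,w) is proportional to P(z|d) * P(w|z).
                post = [p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)]
                norm = sum(post)
                if norm == 0:
                    continue
                for z in range(n_topics):
                    expected = counts[d][w] * post[z] / norm
                    new_zd[d][z] += expected
                    new_wz[z][w] += expected
        # M-step: renormalize the expected counts into probabilities.
        p_z_d = [normalize(row) for row in new_zd]
        p_w_z = [normalize(row) for row in new_wz]
    return p_z_d, p_w_z
```

Each iteration touches every non-zero entry of the document-term matrix once per topic, which is where the cost grows when the vocabulary is large.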

critique:
pLSA is a useful tool for extracting hidden topics, which I think is very interesting. But its problem is efficiency: a large amount of computation is needed during the process, especially when the number of words is large. So how to enhance the efficiency would be an important issue for it.

paper critique & summarization : The structure and function of complex networks

Title : The structure and function of complex networks.
Author: M. E. J. Newman

summarization:
This paper introduces the basic properties and models of networks.
First, it tells us what a network is and describes several different types of networks, which may be useful in real life.

Then it introduces the properties:
small-world effect (which is famous),
transitivity (the paper mentions a good measure of how densely a network is clustered),
degree distributions (which show us the long tail),
network resilience (some vertices are more important than others to the whole network),
mixing patterns (rules for which vertices connect to which),
community structure (the paper mentions a hierarchical algorithm to extract the clusters in a network).
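As a small example of one of these properties, transitivity can be measured as the fraction of connected triples of vertices that are closed into triangles. A minimal Python sketch, assuming an undirected graph stored as a dict of neighbor sets (the name `transitivity` is hypothetical):

```python
from itertools import combinations

def transitivity(adj):
    """adj maps each vertex to the set of its neighbors (undirected, no self-loops)."""
    closed = 0   # triples whose two end points are also connected to each other
    triples = 0  # all paths of length 2, counted once per center vertex
    for center, neighbors in adj.items():
        for a, b in combinations(neighbors, 2):
            triples += 1
            if b in adj[a]:
                closed += 1
    # Each triangle is counted three times, once per center vertex.
    return closed / triples if triples else 0.0
```

For a triangle with one extra vertex hanging off a corner, there are 5 connected triples and 3 of them are closed, so the value is 0.6.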

Moreover, some models for constructing networks are also mentioned:
configuration model (the simplest model, which just connects vertices at random, given their degrees)
Price's model (based on the idea that "the rich get richer", so it generates the long tail)
Barabasi and Albert's model (based on Price's model, but undirected)
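The "rich get richer" rule is easy to sketch in Python. The version below grows an undirected graph in the spirit of the Barabasi and Albert model. The starting graph (a small complete graph) and the name `preferential_attachment` are assumptions made for illustration.

```python
import random

def preferential_attachment(n, m, seed=0):
    """Grow an undirected graph to n vertices; each new vertex links to m existing ones."""
    rng = random.Random(seed)
    # Assumed starting point: a complete graph on m + 1 vertices.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Every vertex appears in `ends` once per incident edge, so a uniform draw
    # from this list picks a vertex with probability proportional to its degree.
    ends = [v for edge in edges for v in edge]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(ends))
        for target in targets:
            edges.append((new, target))
            ends.extend((new, target))
    return edges
```

Counting the degrees of the returned edge list for a few hundred vertices should show a few high-degree vertices and many low-degree ones, which is the long tail mentioned above.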

This paper also discusses some real-world problems such as disease transmission, including how to measure the speed of transmission and how the process unfolds, but I think it is still far from the real case.

critique:
Networks provide a good tool for visualizing many problems in the world, but from what I have seen, most research still cannot fit the real case well, just like the semantic gap problem in image retrieval. Still, it gives us a direction for solving these problems.

Tuesday, May 12, 2009

paper critique & summarization : Rapid Object Detection using a Boosted Cascade of Simple Features

Title : Rapid Object Detection using a Boosted Cascade of Simple Features
Author : P. Viola and M. Jones

summarization:

Three major contributions:

1. it introduces a new image representation called the "integral image".
there are 2 reasons for using features rather than pixels directly:
(1) features can encode ad-hoc domain knowledge that is hard to learn from a limited amount of training data.
(2) a feature-based system is much faster than a pixel-based one.
each feature computes the sum of the pixel values within rectangular regions,
and uses the difference between 2, 3, or 4 regions as the feature value.
The biggest advantage is that this paper uses a dynamic programming approach to compute the
difference between regions efficiently.

2. a learning algorithm based on AdaBoost.
the objective of AdaBoost is to select a small set from the full feature set and train the classifier.
It selects the features that are significant, and in the experiment achieves fewer than 1% false negatives
and 40% false positives.

3. a method for combining increasingly more complex classifiers in a "cascade",
which acts like a filter and quickly discards the noise regions. This method can reject
many of the negative regions while detecting nearly all of the positive regions.
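The integral image and the cascade can both be sketched in a few lines of Python. The code assumes the image is a list of rows of numbers and pads the integral image with a zero row and column. The function names, the left-minus-right feature layout, and the stage functions passed to `cascade_accepts` are hypothetical.

```python
def integral_image(img):
    """ii[y][x] holds the sum of img over rows < y and columns < x."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum  # reuse the row above
    return ii

def rect_sum(ii, top, left, height, width):
    """Sum of any rectangle from four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]

def two_rect_feature(ii, top, left, height, width):
    """Two-rectangle feature: left half minus right half (width should be even)."""
    half = width // 2
    return rect_sum(ii, top, left, height, half) - rect_sum(ii, top, left + half, height, half)

def cascade_accepts(stages, window):
    """A window is kept only if every stage accepts it; all() stops at the first rejection."""
    return all(stage(window) for stage in stages)
```

For the 3x3 image [[1, 2, 3], [4, 5, 6], [7, 8, 9]], the sum of the bottom-right 2x2 block is 28, and it costs the same four lookups as any other rectangle. In the cascade, cheap stages go first, so most windows never reach the expensive ones.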


critique : I think two methods in this paper are attractive. One is the efficient feature computation:
the dynamic programming technique reduces the total computation needed. The second is the cascade
method, which rejects most non-targets at an early stage but still keeps nearly all of the
positive regions.
Also, there is a discussion after each topic, which I think makes the paper more readable.

Tuesday, April 7, 2009

paper critique & summarization : Algorithms for Fast Vector Quantization

Title : Algorithms for Fast Vector Quantization
Authors : Sunil Arya, David M. Mount

summarization:

This paper proposed three algorithms:

The first is the standard k-d tree with incremental distance calculation, which uses an offset technique to examine the surrounding buckets and check whether any point in a different bucket is nearer, so as to find the true nearest neighbor of the query point.

The second algorithm is the priority k-d tree search, which tries to find the nearest neighbor before the algorithm runs all the way to termination. It maintains a priority queue that records the priority of every candidate subtree, and the search terminates when the queue is empty (meaning that the whole tree has been traversed) or when the highest-priority subtree is farther from the query point than the closest point found so far.
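A minimal Python sketch of this priority search is given below. The priority of a subtree is a lower bound on its squared distance to the query, computed here from the splitting planes only, which is looser than the bound in the paper. The optional `max_visits` cut-off and all function names are hypothetical.

```python
import heapq
from itertools import count

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_kdtree(points, depth=0):
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {"point": points[mid], "axis": axis,
            "left": build_kdtree(points[:mid], depth + 1),
            "right": build_kdtree(points[mid + 1:], depth + 1)}

def priority_nearest(root, query, max_visits=None):
    best, best_d = None, float("inf")
    tie = count()  # unique tie-breaker so the heap never compares tree nodes
    heap = [(0.0, next(tie), root)]
    visits = 0
    while heap:
        bound, _, node = heapq.heappop(heap)
        if bound >= best_d:
            break  # the most promising subtree cannot beat the best point so far
        while node is not None:
            if max_visits is not None and visits >= max_visits:
                return best  # early stop with an approximate answer
            visits += 1
            d = sq_dist(node["point"], query)
            if d < best_d:
                best, best_d = node["point"], d
            diff = query[node["axis"]] - node["point"][node["axis"]]
            near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
            if far is not None:
                # The sibling goes into the queue with its lower bound as priority.
                heapq.heappush(heap, (max(bound, diff * diff), next(tie), far))
            node = near
    return best
```

Without `max_visits` the result is the exact nearest neighbor. With a small `max_visits` the search stops early and returns the best point seen, which is the approximate mode that makes the priority order useful.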

And the third algorithm is the neighborhood graph search, which is a simple greedy algorithm.
It builds a graph from the data points by the following rule: two points are joined by an edge if we cannot find any third point that is closer to both of them than they are to each other. The algorithm then repeatedly expands the point that is closest to the query point, until it arrives at a point whose neighbors have all been expanded or until a predefined number of points has been visited. Then it outputs the closest data point visited.
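The edge rule and the greedy walk can be sketched in Python as follows. The graph construction is the brute-force cubic-time version of the rule as stated above, and the search is a simplified reading of the paper's procedure. The names `build_neighborhood_graph` and `greedy_search`, the fixed start vertex, and the `max_visits` parameter are hypothetical.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_neighborhood_graph(points):
    """Join i and j unless some third point is closer to both than they are to each other."""
    n = len(points)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            d_ij = sq_dist(points[i], points[j])
            blocked = any(
                max(sq_dist(points[k], points[i]), sq_dist(points[k], points[j])) < d_ij
                for k in range(n) if k != i and k != j
            )
            if not blocked:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def greedy_search(points, adj, query, start=0, max_visits=None):
    """Walk to the unvisited neighbor closest to the query until none is left."""
    visited = {start}
    current = best = start
    while max_visits is None or len(visited) < max_visits:
        candidates = [v for v in adj[current] if v not in visited]
        if not candidates:
            break  # every neighbor of the current point has been visited
        current = min(candidates, key=lambda v: sq_dist(points[v], query))
        visited.add(current)
        if sq_dist(points[current], query) < sq_dist(points[best], query):
            best = current
    return points[best]
```

This greedy walk is not guaranteed to return the true nearest neighbor in general, so the result should be read as an approximate answer.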

critique:

Actually, what I care about here is how they perform in high-dimensional domains. The dimensions of their test data are just 8~16, while in my research the dimension is always over 100, and we already know that, at least for the standard k-d tree, the running time gets worse there. So maybe we need to find another quantization method for the image retrieval domain.

Tuesday, March 24, 2009

paper critique & summarization : Shape Matching and Object Recognition Using Shape Contexts

Title : Shape Matching and Object Recognition Using Shape Contexts
Author : S. Belongie, J. Malik, and J. Puzicha

Summarization:

This paper proposes a shape descriptor, the shape context, which describes the coarse distribution of the rest of the shape relative to a given point on the shape. It solves the correspondence problem between two shapes, and then uses the sum of matching errors between corresponding points to find the nearest neighbor.

In their definition, a shape is a set of edge pixels found by an edge detector. To compute the distance between two shapes, they use the shape context, which computes a coarse histogram of the other n-1 points in coordinates relative to the reference point, and they compare this histogram with the histograms of the other shape to find the best matching point pairs.
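A minimal Python sketch of such a histogram is given below. It assumes log-polar bins (5 radial, 12 angular), radii normalized by the mean pairwise distance, and a chi-square comparison of normalized histograms. The radial limits `r_min` and `r_max`, the clamping of far points into the outermost ring, and the function names are assumptions made for illustration. It needs Python 3.8 or later for `math.dist` and at least two distinct points.

```python
import math

def shape_context(points, index, n_r=5, n_theta=12, r_min=0.125, r_max=2.0):
    """Histogram of the other n-1 points, in log-polar bins centered on points[index]."""
    n = len(points)
    # Mean distance over all point pairs, used to make the radii scale invariant.
    pair_dists = [math.dist(points[i], points[j]) for i in range(n) for j in range(i + 1, n)]
    mean_dist = sum(pair_dists) / len(pair_dists)
    log_span = math.log(r_max / r_min)
    hist = [0] * (n_r * n_theta)
    px, py = points[index]
    for i, (x, y) in enumerate(points):
        if i == index:
            continue
        r = math.hypot(x - px, y - py) / mean_dist
        if r <= r_min:
            r_bin = 0
        else:
            # Log-spaced rings; anything beyond r_max is clamped into the last ring.
            r_bin = min(int(math.log(r / r_min) / log_span * n_r), n_r - 1)
        theta = math.atan2(y - py, x - px) % (2 * math.pi)
        t_bin = int(theta / (2 * math.pi) * n_theta) % n_theta
        hist[r_bin * n_theta + t_bin] += 1
    return hist

def chi2_cost(h1, h2):
    """Chi-square distance between two histograms after normalizing each to sum to 1."""
    s1, s2 = sum(h1), sum(h2)
    cost = 0.0
    for a, b in zip(h1, h2):
        a, b = a / s1, b / s2
        if a + b > 0:
            cost += (a - b) ** 2 / (a + b)
    return 0.5 * cost
```

The cost between every point of one shape and every point of the other gives a cost matrix, and the best point pairs are the assignment that minimizes the total cost.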