Title: Probabilistic Latent Semantic Indexing
Author: Thomas Hofmann
summarization:
This paper proposes the pLSA model, an automatic indexing method based on a statistical latent class model. As a simple example, given a document-term matrix, the model extracts hidden "topics": for each topic it estimates the probability of each word, and for each document it estimates how strongly the document relates to each topic. An EM algorithm iteratively recomputes these probabilities to maximize the likelihood of the data, and the process ends once the estimates converge.
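To make the EM procedure concrete, here is a minimal sketch in Python with NumPy. It is my own illustration, not the paper's code: it runs plain EM on a small dense count matrix, whereas the paper uses a tempered variant of EM. The function name `plsa`, its parameters, and the toy matrix are all assumptions made for this example.

```python
import numpy as np

def plsa(counts, n_topics, n_iter=50, seed=0):
    """Plain EM for pLSA on a dense document-term count matrix (docs x words).

    Illustrative sketch only: names and defaults are assumptions, and the
    tempering used in the paper is not implemented here.
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    eps = 1e-12  # guards against division by zero and log(0)

    # Random initial P(z|d) and P(w|z); every row is normalised to sum to 1.
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)

    log_liks = []
    for _ in range(n_iter):
        # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z).
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]  # (docs, topics, words)
        p_w_d = joint.sum(axis=1)                      # P(w|d), (docs, words)
        # Log-likelihood of the current parameters (up to a constant term).
        log_liks.append(float((counts * np.log(p_w_d + eps)).sum()))
        post = joint / (p_w_d[:, None, :] + eps)       # P(z|d,w)

        # M-step: re-estimate both distributions from expected counts.
        weighted = counts[:, None, :] * post           # n(d,w) * P(z|d,w)
        p_w_z = weighted.sum(axis=0)                   # (topics, words)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + eps
        p_z_d = weighted.sum(axis=2)                   # (docs, topics)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + eps

    return p_z_d, p_w_z, log_liks

# Toy document-term matrix: 4 documents, 6 words, two obvious word groups.
counts = np.array([
    [4, 3, 5, 0, 0, 1],
    [5, 4, 3, 1, 0, 0],
    [0, 1, 0, 4, 5, 3],
    [0, 0, 1, 3, 4, 5],
], dtype=float)

p_z_d, p_w_z, log_liks = plsa(counts, n_topics=2)
```

Here `p_z_d` holds the topic mixture of each document and `p_w_z` the word distribution of each topic. The E-step builds a docs x topics x words array, which is fine for a toy matrix but shows why the cost grows quickly with the number of words.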
critique:
pLSA is a useful tool for extracting hidden topics, which I find very interesting. Its main problem is efficiency: a large amount of computation is needed during the process, especially when the number of words is large. So how to improve its efficiency is an important issue.