With the development of the Internet, the amount of information accessible to people has grown enormously, far beyond what humans can process manually. This has created an urgent need for technology that can locate information quickly and precisely. Information retrieval (IR) is the field that emerged to meet this need. It is concerned with every aspect of information, from representation and storage to organization and access. This thesis addresses query representation for information retrieval. It has been observed that short queries usually perform better than their corresponding long versions when submitted to the same retrieval engine. This is mainly because most current retrieval models treat all query terms as equally important. As a result, documents that match unimportant terms are ranked higher than they should be, other documents are ranked correspondingly lower, and retrieval performance suffers. This thesis focuses on this drawback of traditional methods: it attempts to distinguish the importance of the different query terms that represent the user's information need, and uses this information to improve retrieval performance. The central framework of the method is the hidden Markov model, which is discussed in detail in later chapters. Through an extensive set of experiments, we show the advantage of integrating this model with the traditional IR model and identify the optimal configuration. Experimental results show that the method assigns most terms to their correct weighting level, and that even when these weighting levels are mapped linearly to the real-valued weights in the retrieval model, statistically significant improvements (t-test) in final retrieval performance are observed consistently.
This shows that our method can effectively capture the weighting information embedded in the sentence structure, and it also indicates the method's further potential.