Addressing the particular needs of vertical search engines, this paper deals with the problems of product hierarchy extraction and product-oriented query expansion. The main contributions of this paper are listed below:
1. This paper proposes an algorithm for Product Hierarchy Extraction based on Page Analysis. The algorithm detects repeating patterns in an encoded pattern string built from each node's DOM path from leaf to root. The product URLs are then classified into several categories, and a name is picked from the page for each category. The algorithm achieves an accuracy of 71% in clustering and 77.3% in naming.
2. This paper presents a novel method based on Concept Lattice for user query expansion. In information retrieval, the document-keyword relation can be regarded as the context in Formal Concept Analysis: a document represents an object and its keywords represent attributes, so a concept lattice can be constructed. Using the distance between concept nodes in the concept lattice together with the product hierarchies, we obtain product-oriented query expansions. The results show that users find information more concisely and quickly.
3. This paper presents a smart information retrieval system. Based on this system, people can conduct their research more easily in a personal lab.
Part 1 is the preprocessing step, part 2 is the kernel module of the paper, and part 3 is the engineering implementation.
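To make the document-keyword context of contribution 2 concrete, the following is a minimal sketch, not the paper's implementation. It enumerates the formal concepts of a tiny, invented context by brute force and then suggests extra keywords for a query. The documents, keywords, and function names are all hypothetical, and the expansion rule (take the keywords shared by every document matching the query) is a simplifying assumption that stands in for the distance-based selection described above.

```python
from itertools import combinations

# Hypothetical toy context: documents (objects) mapped to keywords (attributes).
context = {
    "doc1": {"laptop", "ssd"},
    "doc2": {"laptop", "gaming"},
    "doc3": {"laptop", "ssd", "gaming"},
}


def common_keywords(docs, context):
    """Keywords shared by every document in docs (all keywords if docs is empty)."""
    shared = set().union(*context.values())
    for doc in docs:
        shared &= context[doc]
    return frozenset(shared)


def matching_docs(keywords, context):
    """Documents that contain every keyword in keywords."""
    return frozenset(d for d, kws in context.items() if set(keywords) <= kws)


def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by closing every subset of documents.

    Brute force, exponential in the number of documents: for illustration only.
    """
    concepts = set()
    docs = sorted(context)
    for size in range(len(docs) + 1):
        for subset in combinations(docs, size):
            intent = common_keywords(subset, context)
            extent = matching_docs(intent, context)
            concepts.add((extent, intent))
    return concepts


def expand_query(query, context):
    """Assumed expansion rule: keywords implied by the query's matching documents."""
    docs = matching_docs(query, context)
    if not docs:
        return frozenset()  # nothing matches, so nothing to suggest
    return common_keywords(docs, context) - set(query)


if __name__ == "__main__":
    for extent, intent in sorted(formal_concepts(context), key=lambda c: len(c[0])):
        print(sorted(extent), sorted(intent))
    print(sorted(expand_query({"ssd"}, context)))
```

In this toy context every document that mentions "ssd" also mentions "laptop", so a query for "ssd" is expanded with "laptop". A real system would need a more efficient lattice construction than enumerating all subsets of documents.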