
The Research Of Technology Project Similarity Calculation Based On Hadoop

Posted on: 2016-01-10  Degree: Master  Type: Thesis
Country: China  Candidate: H N Liu  Full Text: PDF
GTID: 2308330479999192  Subject: Computer Science and Technology
Abstract/Summary:
Since the implementation of the "National Medium- and Long-Term Program for Scientific and Technological Development (2006-2020)", the rapid growth of public financial investment in science and technology and the continuous improvement of project and fund management have provided strong support for the development of science and technology in China. At the same time, this growth has brought new challenges to research project management. First, as the number of technology projects increases, problems arise such as duplicate project applications and management that is insufficiently scientific and transparent. Second, as disciplines become increasingly specialized and interdisciplinary integration deepens, extensive exchange and cooperation among research projects has become an important driving force for the development of science and technology, and reasonable integration of projects based on their similarity is a sound direction for future development. The key to solving these problems is to strengthen the similarity analysis of projects: the similarity between project applications should be calculated to find similar projects and so provide a basis for scientific research projects. The main research contents are as follows. First, the key techniques of technology project similarity calculation are analyzed. Because technology project applications contain a large number of technical terms, this paper proposes an improved method for identifying unknown words based on a directed network of word-sequence frequency. The method filters the words of an application by part of speech (POS) and then filters the extracted unknown words with a stop list. These unknown words become part of the feature set, and a representation model of the project is built on the vector space model and the graph model.
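The vector-space part of this representation can be illustrated with a minimal sketch. It assumes raw term-frequency weights and cosine similarity, which are common choices but are not confirmed by the abstract; the names `term_vector` and `cosine_similarity` are hypothetical, and the token lists stand in for the output of the word segmentation and unknown-word identification steps.

```python
import math
from collections import Counter


def term_vector(tokens, stop_words=frozenset()):
    """Build a term-frequency vector, dropping any token on the stop list.

    Assumption: raw counts are used as weights; the thesis may weight
    terms differently (for example with TF-IDF).
    """
    return Counter(t for t in tokens if t not in stop_words)


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term vectors."""
    # Only terms present in both vectors contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty application is treated as similar to nothing
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    stop = frozenset({"the", "of"})
    p1 = term_vector(["similarity", "of", "the", "project", "hadoop"], stop)
    p2 = term_vector(["project", "similarity", "graph"], stop)
    print(cosine_similarity(p1, p2))
```

Two applications with no shared terms score 0.0 and two identical applications score 1.0, so a threshold on this value could flag candidate duplicate applications for manual review.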
Second, the similarity between applications is calculated based on this model. Third, this paper uses the maximum clique to compute similarity under the graph model: the similarity of two graph models can be obtained from their maximum common subgraph, and the maximum common subgraph problem can in turn be converted into a maximum clique problem. Finally, as the number of technology projects grows, similarity calculation, which involves text preprocessing, feature extraction and the text similarity model, becomes very time-consuming. To address this, this paper uses the Hadoop platform and parallelizes the text similarity calculation with MapReduce.
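The reduction from maximum common subgraph to maximum clique can be sketched as follows. This is an illustration under stated assumptions, not the thesis's implementation: graphs are undirected and unlabeled, the common subgraph is an induced one, the clique search is plain Bron-Kerbosch with pivoting, and the score normalizes the common subgraph size by the larger graph's vertex count. All function names are hypothetical.

```python
from itertools import combinations


def modular_product(nodes1, edges1, nodes2, edges2):
    """Build the modular product of two undirected graphs.

    Each vertex is a pair (u, v) with u from graph 1 and v from graph 2.
    Two pairs are adjacent when they are compatible: u1-u2 is an edge in
    graph 1 exactly when v1-v2 is an edge in graph 2.
    """
    e1 = {frozenset(e) for e in edges1}
    e2 = {frozenset(e) for e in edges2}
    pairs = [(u, v) for u in nodes1 for v in nodes2]
    adj = {p: set() for p in pairs}
    for (u1, v1), (u2, v2) in combinations(pairs, 2):
        if u1 == u2 or v1 == v2:
            continue  # a vertex may be matched at most once
        if (frozenset((u1, u2)) in e1) == (frozenset((v1, v2)) in e2):
            adj[(u1, v1)].add((u2, v2))
            adj[(u2, v2)].add((u1, v1))
    return adj


def max_clique(adj):
    """Return one maximum clique (Bron-Kerbosch with pivoting)."""
    best = set()

    def expand(r, p, x):
        nonlocal best
        if not p and not x:
            if len(r) > len(best):
                best = set(r)
            return
        # Pivot on the vertex with the most neighbours among the candidates.
        pivot = max(p | x, key=lambda n: len(adj[n] & p))
        for n in list(p - adj[pivot]):
            expand(r | {n}, p & adj[n], x & adj[n])
            p = p - {n}
            x = x | {n}

    expand(set(), set(adj), set())
    return best


def graph_similarity(nodes1, edges1, nodes2, edges2):
    """Assumed score: |maximum common subgraph| / max(|V1|, |V2|)."""
    if not nodes1 or not nodes2:
        return 0.0
    clique = max_clique(modular_product(nodes1, edges1, nodes2, edges2))
    return len(clique) / max(len(nodes1), len(nodes2))


if __name__ == "__main__":
    triangle = (["a", "b", "c"], [("a", "b"), ("b", "c"), ("c", "a")])
    path = (["x", "y", "z"], [("x", "y"), ("y", "z")])
    print(graph_similarity(*triangle, *path))
```

Each clique in the product graph is a consistent vertex matching, so the largest clique gives the largest common induced subgraph. For the triangle and the three-vertex path above, the largest common subgraph is a single edge, giving a score of 2/3. The search is exponential in the worst case, which is one reason large batches of pairwise comparisons benefit from being distributed.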
Keywords/Search Tags:technology project, similarity calculation, graph model, the maximum clique, Hadoop