Research On Parameters Correlation And Optimization In Text Similarity Measurement

Posted on:2011-03-20

Degree:Master

Type:Thesis

Country:China

Candidate:X Xu

GTID:2178360305994207

Subject:Information and Communication Engineering

Abstract/Summary:

With the development of computer networks and application technologies, the Internet has become the primary channel for storing and exchanging information, but it has also brought an explosive growth in the volume of information. Information processing technologies such as data mining, information retrieval and text classification have emerged in response. As the basis of these technologies, text similarity measurement has considerable research significance and broad application prospects.

The parameters involved in text similarity measurement, such as the similarity threshold, precision, recall rate, size of the moving window, shingle measure coefficient threshold, extraction rate and length of text, are interrelated in complicated ways. Following the implementation process of text similarity measurement, the thesis first analyzes the key technologies: mathematical representation of text, feature generation, feature selection and similarity calculation. On this basis, it implements and compares two of the most typical algorithms. It then studies the correlation among the parameters through experiments with the shingling algorithm. Finally, it puts forward suggestions for parameter optimization, and proposes and analyzes an algorithm for text similarity measurement that adapts parameters such as the similarity threshold.

The algorithm is applied to a text similarity measurement system for fund proposals, which held 7378 proposals in 2009. The results show that the algorithm performs well in practical use, with both precision and recall rate exceeding 95% regardless of whether the texts are long or short.
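
To make the parameters named in the abstract concrete, the following is a minimal sketch of shingling-based similarity, not the thesis's own implementation. It assumes word-level shingles and the Jaccard coefficient as the resemblance measure; the function names `shingles`, `resemblance` and `precision_recall`, the window size `w` and the threshold value in the usage lines are all hypothetical choices made for illustration.

```python
def shingles(text, w=3):
    """Return the set of w-word shingles produced by a moving window of size w."""
    tokens = text.lower().split()
    if len(tokens) < w:
        # Text shorter than the window: treat the whole text as one shingle.
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + w]) for i in range(len(tokens) - w + 1)}


def resemblance(a, b, w=3):
    """Jaccard coefficient of the two shingle sets (assumed similarity measure)."""
    sa, sb = shingles(a, w), shingles(b, w)
    if not sa and not sb:
        return 1.0  # two empty texts are considered identical
    return len(sa & sb) / len(sa | sb)


def precision_recall(predicted, actual):
    """Precision and recall of predicted similar pairs against known similar pairs."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 1.0
    recall = true_positives / len(actual) if actual else 1.0
    return precision, recall


# Usage: flag a pair as similar when its resemblance reaches a chosen threshold.
threshold = 0.5  # hypothetical value, not taken from the thesis
score = resemblance("a b c d", "a b c e", w=2)
is_similar = score >= threshold
```

In this sketch a larger `w` makes shingles more specific, which tends to lower resemblance scores for lightly edited texts, so the window size and the similarity threshold have to be tuned together. That coupling is the kind of parameter correlation the abstract refers to.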

Related items

1	Research On Feature Selection Algorithm And Classification Algorithm In Chinese Text Categoriztion
2	Research In Chinese Text Proofreading Based On OCR
3	Research On Semantic Similarity Measurement For Text
4	The Research And Application On Text Similarity Measurement Based On Semantic Analysis
5	The Correlation Analysis Based On The Fuzzy Similarity Of Property
6	A Method For Text Similarity Measurement With TF-IDF And Word Semantic Information
7	Research On WordNet Based Chinese-english Cross Language Text Similarity Measurement
8	Research On Measurement And Channel Characteristic Parameters Extraction Algorithm Of 5G-R Wireless Channel
9	The Study Of Measures And Applications Of Short Text Semantic Similarity
10	The effects of speech rate and sentence length on the recall of synthetic speech for meaningful and anomalous sentences