| In a world of abundant information, text is an important format for distributing and storing information because of its flexibility, capability, and convenience. How to process massive amounts of text data so that they can be managed and used efficiently is one of the fundamental problems of this age. Within text processing, a central problem is measuring the relationships between texts and clustering unorganized texts by content so that more detailed downstream processing can be applied to them. For a long time, lacking an in-depth discussion of what the concept of "relevant" means, computer science researchers have substituted text similarity calculation for text relevance calculation when measuring relationships between texts. This approximation, whose motivation is not explicit, cannot satisfy the requirements of applications that emphasize "relevance". In this paper, based on an analysis of the concept of "relevant" offered by researchers in both cognitive science and information science, the system-oriented relevance calculation mode is improved at the semantic level. It takes advantage of users' general knowledge and moves the system-oriented relevance calculation mode toward the user-oriented mode in order to simulate human relevance judgments. We study relationship measurement separately for two types of text, sentences and documents, and discuss the corresponding applications. The detailed content of this paper is as follows. An improved system similarity model, based on system similarity theory, is proposed for sentence retrieval in Question-Answering systems. It lets latent answer elements contribute to the text similarity degree by giving them corresponding simulated similarity parameters. In this way, it turns the similarity calculation model into a relevance calculation model and satisfies the requirements of the Question-Answering system.
The system built around this processing step achieved excellent results in an authoritative international test, and further evaluation of the method on the same test data also confirms its effectiveness. Besides the calculation between sentences, a novel relevance calculation method between documents is proposed. Based on lexical cohesion theory and with the help of knowledge resources, we detect semantic relationships between words and propose a document representation method based on lexical chains, a lexical chain weighting method, and a corresponding document matching method. Based on an analysis of the features of human relevance judgments, we propose a method for evaluating document relevance calculation through document classification. The test results show that the lexical cohesion based method works successfully. Furthermore, we present a distance-flexible method for detecting semantic relationships between words. By analyzing the inner structure of lexical cohesion, we present a document relevance calculation method based on lexical cohesion with structure information, and experiments demonstrate the advantage of this method. To support the training of pharmacokinetics models in new drug development, we study the application of text filtering. The filtering system retrieves papers about pharmacokinetics parameters by applying document relevance calculation based on lexical cohesion. Its structure and the special text pre-processing required for this field are also described. In an evaluation on 8 drugs from 3 classes (substrate, inducer, and inhibitor), the filtering system, which takes the lexical cohesion based document relevance calculation method as its central processing step, achieves excellent results. It contributes significantly to improving the efficiency of drug development. |
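
To make the lexical chain idea above more concrete, the following is a minimal sketch, not the method of this paper. It assumes a hypothetical toy word-to-concept table (`TOY_CONCEPTS`) in place of real knowledge resources, an assumed weighting scheme (each chain's share of all chained tokens), and cosine similarity as an assumed matching step. The function names `lexical_chains`, `chain_weights`, and `relevance` are illustrative only.

```python
from math import sqrt

# Hypothetical toy knowledge resource: word -> concept label.
# It stands in for a real thesaurus or ontology and is an assumption of this sketch.
TOY_CONCEPTS = {
    "drug": "medicine", "inhibitor": "medicine", "dose": "medicine",
    "liver": "organ", "kidney": "organ",
    "clearance": "kinetics", "absorption": "kinetics", "half-life": "kinetics",
}


def lexical_chains(tokens):
    """Group tokens into chains: words that share a concept join the same chain."""
    chains = {}
    for tok in tokens:
        concept = TOY_CONCEPTS.get(tok.lower())
        if concept is not None:
            chains.setdefault(concept, []).append(tok.lower())
    return chains


def chain_weights(chains):
    """Weight each chain by its share of all chained tokens (an assumed scheme)."""
    total = sum(len(members) for members in chains.values())
    if total == 0:
        return {}
    return {concept: len(members) / total for concept, members in chains.items()}


def relevance(tokens_a, tokens_b):
    """Cosine similarity between the chain-weight vectors of two documents."""
    wa = chain_weights(lexical_chains(tokens_a))
    wb = chain_weights(lexical_chains(tokens_b))
    dot = sum(wa[c] * wb[c] for c in wa.keys() & wb.keys())
    norm = sqrt(sum(v * v for v in wa.values())) * sqrt(sum(v * v for v in wb.values()))
    return dot / norm if norm else 0.0


doc_a = "the drug clearance in the liver".split()
doc_b = "kidney absorption of the inhibitor dose".split()
doc_c = "the weather is nice".split()

print(relevance(doc_a, doc_b))  # high, although the two texts share no content words
print(relevance(doc_a, doc_c))  # no chained words in doc_c
```

In this toy setting, `doc_a` and `doc_b` have no content words in common, so a purely word-overlap similarity would score them near zero, while the chain-based score treats them as related. This is the kind of gap between similarity and relevance that the abstract describes.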