| Open relation extraction can extract relations from a corpus flexibly, without a preset list of relation words, and can therefore organize knowledge quickly and effectively. However, the corpora targeted by open relation extraction usually contain a large number of complex texts, and existing open relation extraction methods perform poorly on such texts. The main problems are as follows. First, the sentence structure of complex texts is intricate, so syntactic analysis struggles to produce the accurate results that open relation extraction relies on. Second, entities in complex texts are usually noun phrases composed of multiple words, which are difficult to identify. Finally, because relations in complex texts overlap, it is difficult to extract all of the relational data completely. To address these problems, this paper proposes two optimized open relation extraction methods, one based on long sentence simplification and one based on joint relation extraction with multi-task learning, to improve open relation extraction on complex texts. The main research contents are as follows: (1) An open relation extraction method based on long sentence simplification is proposed. The method first simplifies complex long sentences with a sequence-to-sequence model and then extracts relations from the simplified sentences with rule templates. During extraction, entities are identified with heuristic rules based on part-of-speech information, and dedicated extraction rules are designed for the simplified clauses based on the results of syntactic analysis. (2) A joint open relation extraction method based on multi-task learning is proposed. The method applies a sequence-to-sequence model directly to complex text. Multiple relations are first serialized into a single sequence with a special relational sequence representation, and multi-task learning of entity label prediction and relation extraction is then realized through sequence labeling and a special mask mechanism. Finally, the predicted labels guide the model in generating the entities of the relational data. (3) A knowledge base of the development process of temperature-indicating paint was constructed. First, following the advice of domain experts, a corpus of knowledge about the development process of temperature-indicating paint was collected from reference books and domestic periodicals. Then the relational data in this development knowledge was extracted with the open relation extraction methods proposed in this paper. Finally, the domain knowledge base was constructed from the extracted relational data and the curated entry data, and then visualized. |
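To make the idea of POS-based entity heuristics and rule-template extraction in method (1) concrete, here is a minimal sketch. It assumes the simplified clause has already been tagged with Universal POS tag names (`ADJ`, `NOUN`, `VERB`, `ADP`, `DET`); the rule "an entity is a run of adjectives and nouns ending in a noun" and the template "entity, verbs/prepositions, entity" are illustrative assumptions, and the helper names are hypothetical. The thesis designs its rules on top of full syntactic analysis, which this linear pattern deliberately simplifies.

```python
from typing import List, Tuple

# Assumed input: a simplified clause as (word, UPOS tag) pairs.
Tagged = List[Tuple[str, str]]


def chunk_entities(tagged: Tagged) -> List[Tuple[int, int]]:
    """Hypothetical heuristic: maximal runs of ADJ/NOUN, trimmed to end in a NOUN.

    Returns half-open (start, end) token spans.
    """
    spans = []
    i, n = 0, len(tagged)
    while i < n:
        if tagged[i][1] in ("ADJ", "NOUN"):
            j = i
            while j < n and tagged[j][1] in ("ADJ", "NOUN"):
                j += 1
            # Drop trailing adjectives so the phrase ends in a noun head.
            k = j
            while k > i and tagged[k - 1][1] != "NOUN":
                k -= 1
            if k > i:
                spans.append((i, k))
            i = j
        else:
            i += 1
    return spans


def _join(tagged: Tagged, start: int, end: int) -> str:
    return " ".join(word for word, _ in tagged[start:end])


def extract_triples(tagged: Tagged) -> List[Tuple[str, str, str]]:
    """Hypothetical template: ENTITY (VERB|ADP)+ ENTITY between adjacent entities."""
    spans = chunk_entities(tagged)
    triples = []
    for (s1, e1), (s2, e2) in zip(spans, spans[1:]):
        between = tagged[e1:s2]
        if any(tag == "VERB" for _, tag in between):
            # Keep only verbs and prepositions as the relation phrase.
            relation = " ".join(w for w, t in between if t in ("VERB", "ADP"))
            triples.append((_join(tagged, s1, e1), relation, _join(tagged, s2, e2)))
    return triples


sentence = [("organic", "ADJ"), ("silicon", "NOUN"), ("resin", "NOUN"),
            ("is", "VERB"), ("used", "VERB"), ("as", "ADP"),
            ("the", "DET"), ("binder", "NOUN")]
print(extract_triples(sentence))
```

On this example the sketch yields the single triple `("organic silicon resin", "is used as", "binder")`, showing how the multi-word noun phrase survives as one entity instead of being split.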
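Method (2) turns several, possibly overlapping, relations into one target sequence so that a sequence-to-sequence model can emit all of them. The sketch below shows one common way to do this kind of linearization, assuming hypothetical separator tokens `<t>`, `<r>` and `<o>`; the thesis's actual relational sequence representation is not specified here. Overlap is handled simply by repeating a shared entity in every triple that uses it.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical special tokens marking triple start, relation, and tail entity.
SEP_TRIPLE = "<t>"
SEP_REL = "<r>"
SEP_TAIL = "<o>"


def linearize(triples: List[Triple]) -> str:
    """Serialize (head, relation, tail) triples into one decoder target string."""
    return " ".join(
        f"{SEP_TRIPLE} {head} {SEP_REL} {rel} {SEP_TAIL} {tail}"
        for head, rel, tail in triples
    )


def delinearize(seq: str) -> List[Triple]:
    """Parse a generated sequence back into triples, skipping malformed pieces."""
    triples = []
    for chunk in seq.split(SEP_TRIPLE):
        chunk = chunk.strip()
        if not chunk:
            continue
        try:
            head, rest = chunk.split(SEP_REL, 1)
            rel, tail = rest.split(SEP_TAIL, 1)
        except ValueError:
            continue  # a decoder may emit an incomplete triple
        triples.append((head.strip(), rel.strip(), tail.strip()))
    return triples


# Two relations sharing the same head entity (overlapping relations).
triples = [("temperature-indicating paint", "contains", "pigment"),
           ("temperature-indicating paint", "contains", "binder")]
target = linearize(triples)
print(target)
print(delinearize(target))
```

Because parsing tolerates incomplete fragments, a partially correct generation still yields its well-formed triples, which matters when a model is decoding long, relation-dense sentences.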
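The entity-label task in method (2) is a sequence-labeling problem, and the predicted labels then guide entity generation. The sketch below illustrates one possible reading of that coupling, which is an assumption rather than the thesis's specified mask mechanism: gold entity spans become BIO tags for the labeling task, and at decoding time the tags mask a copy distribution over source tokens so that entity tokens can only be copied from positions tagged `B` or `I`. Function names are hypothetical and the scores stand in for a real model's outputs.

```python
import math
from typing import List, Tuple


def bio_tags(n_tokens: int, spans: List[Tuple[int, int]]) -> List[str]:
    """Sequence-labeling targets: B/I for entity tokens, O elsewhere.

    Spans are half-open (start, end) token ranges.
    """
    tags = ["O"] * n_tokens
    for start, end in spans:
        tags[start] = "B"
        for i in range(start + 1, end):
            tags[i] = "I"
    return tags


def masked_copy_distribution(scores: List[float], tags: List[str]) -> List[float]:
    """Softmax over source positions, restricted to positions tagged B or I.

    Positions tagged O receive probability exactly 0.
    """
    allowed = [t != "O" for t in tags]
    if not any(allowed):
        raise ValueError("no entity positions to copy from")
    m = max(s for s, a in zip(scores, allowed) if a)  # for numerical stability
    exps = [math.exp(s - m) if a else 0.0 for s, a in zip(scores, allowed)]
    z = sum(exps)
    return [e / z for e in exps]


tags = bio_tags(5, [(0, 2), (4, 5)])
probs = masked_copy_distribution([1.0, 2.0, 5.0, 0.0, 1.0], tags)
print(tags)
print(probs)
```

Position 2 has the highest raw score but is tagged `O`, so the mask removes it entirely; this is the intended effect of letting label predictions constrain which spans the decoder may emit as entities.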