| With the spread of the Internet, the online novel, an emerging form of fiction published on network platforms, has developed rapidly and has become the main force in Chinese fiction. Compared with traditional fiction, online novels are free in style and unrestricted in subject matter, but this high degree of autonomy makes the content published on these platforms uneven in quality. If low-quality content cannot be filtered out effectively, it is likely to promote harmful values and mislead public opinion. Given the massive volume of text published on these platforms every day, traditional manual auditing clearly cannot meet the demand. Automatic auditing of Chinese novel text based on deep learning is therefore of great significance for improving the audit efficiency of online novel platforms and saving labor costs. Sensitive word recognition and automatic classification are the two core elements of the Chinese novel text content audit task. The traditional sensitive word recognition method, which matches text against a sensitive word list, tends to miss variants of sensitive words. This thesis analyzes how common variant characters arise in novel text and proposes a scheme for generating Chinese variant characters. A similarity network between Chinese characters is constructed by combining their Pinyin information and glyph information, where the glyph information consists of four parts: character structure, component decomposition, four-corner code, and stroke count. The characters with high overall similarity are then selected as the generated variants, so the final variants take into account similarity to the original in both pronunciation and glyph shape. Based on this scheme, variants of the entries in the sensitive word list are generated to improve the accuracy of recognizing sensitive word variants in novel text. On the other hand, traditional classification models do not make full use of the text of the class labels, which loses potential semantic information. To address this problem, this thesis proposes a label embedding model based on an attention mechanism. The model uses the class label text as additional semantic information and captures the semantics of the input text sequence and of the class labels with an improved DRNN model and a fully connected network, respectively. The attention weights between the word-level semantic information of the text and the label semantic information are then computed, which gives the importance of each word to each class label and reduces the interference of uninformative words with the classification network. Several groups of comparative experiments are designed on a Chinese novel text dataset to verify the effectiveness of the two algorithms. Based on these two algorithms, this thesis designs and implements a Chinese novel text content audit system, in which users can complete online audit tasks for novel text through simple operations on the system's web front end. Finally, the functions and performance of the system are tested to verify its effectiveness. |
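
The variant-generation idea described above (combine Pinyin similarity with a glyph similarity built from structure, components, four-corner code, and stroke count, then keep the most similar characters) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's actual scoring: the `CharInfo` record, the equal weighting of the four glyph parts, the `pinyin_weight` and `threshold` parameters, and every entry of `TOY_TABLE` (placeholder characters with made-up feature values) are hypothetical. The similarity network is represented only implicitly, as pairwise scores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CharInfo:
    """Hypothetical feature record for one character (all values are toy data)."""
    pinyin: str             # toneless Pinyin
    structure: str          # layout class, e.g. "left-right"
    components: frozenset   # component identifiers after splitting
    four_corner: str        # four-corner code as a digit string
    strokes: int            # stroke count


def pinyin_similarity(a, b):
    # Simplifying assumption: identical Pinyin scores 1.0, anything else 0.0.
    return 1.0 if a.pinyin == b.pinyin else 0.0


def glyph_similarity(a, b):
    # Assumption: the four glyph parts are averaged with equal weight.
    structure = 1.0 if a.structure == b.structure else 0.0
    union = a.components | b.components
    components = len(a.components & b.components) / len(union) if union else 0.0
    same_digits = sum(x == y for x, y in zip(a.four_corner, b.four_corner))
    length = max(len(a.four_corner), len(b.four_corner))
    four_corner = same_digits / length if length else 0.0
    strokes = 1.0 - abs(a.strokes - b.strokes) / max(a.strokes, b.strokes, 1)
    return (structure + components + four_corner + strokes) / 4.0


def combined_similarity(a, b, pinyin_weight=0.5):
    # Weighted mix of pronunciation and glyph similarity (weight is an assumption).
    return (pinyin_weight * pinyin_similarity(a, b)
            + (1.0 - pinyin_weight) * glyph_similarity(a, b))


def generate_variants(char, table, threshold=0.6, top_k=5):
    """Return up to top_k characters whose combined similarity to `char`
    is at least `threshold`, most similar first."""
    target = table[char]
    scored = [(combined_similarity(target, info), other)
              for other, info in table.items() if other != char]
    scored = [(score, other) for score, other in scored if score >= threshold]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [other for _, other in scored[:top_k]]


# Placeholder characters with invented features, for illustration only.
TOY_TABLE = {
    "char_A": CharInfo("fa", "left-right", frozenset({"c1", "c2"}), "34131", 8),
    "char_B": CharInfo("fa", "left-right", frozenset({"c1", "c3"}), "34132", 9),
    "char_C": CharInfo("xin", "top-bottom", frozenset({"c9"}), "80227", 4),
}

if __name__ == "__main__":
    print(generate_variants("char_A", TOY_TABLE))
```

In a real pipeline the feature table would presumably come from a character dictionary, and each generated variant would be substituted into the sensitive word list entries to extend the matcher.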
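
The label-attention step described above (scoring each word against each class label so that uninformative words receive low weight) might look like the sketch below. It assumes the word vectors and label vectors have already been produced by the encoders (the improved DRNN and the fully connected network are not reproduced here), and it uses cosine similarity followed by a softmax over words as the scoring rule, which is one common choice rather than the formulation confirmed by the thesis. All function names are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    # Cosine similarity; returns 0.0 if either vector is all zeros.
    norm_u = math.sqrt(dot(u, u))
    norm_v = math.sqrt(dot(v, v))
    return dot(u, v) / (norm_u * norm_v) if norm_u and norm_v else 0.0


def softmax(scores):
    # Numerically stable softmax over a list of floats.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def label_word_attention(word_vecs, label_vecs):
    """Return weights[k][n]: the importance of word n to label k.

    Each row is a softmax over the words, so it sums to 1.
    """
    return [softmax([cosine(word, label) for word in word_vecs])
            for label in label_vecs]


def label_aware_representation(word_vecs, weights):
    """Attention-weighted sum of word vectors, one vector per label."""
    dim = len(word_vecs[0])
    return [[sum(a * word[d] for a, word in zip(row, word_vecs))
             for d in range(dim)]
            for row in weights]


if __name__ == "__main__":
    # Toy 2-dimensional vectors, invented for illustration.
    words = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    labels = [[1.0, 0.0], [0.0, 1.0]]
    attention = label_word_attention(words, labels)
    print(attention)
    print(label_aware_representation(words, attention))
```

The per-label vectors returned by `label_aware_representation` would then be passed to a classifier, so words with low attention for a label contribute little to that label's score.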