
Design And Implementation Of Intelligent Feature Engineering Platform For Telecommunication Data

Posted on: 2021-09-02
Degree: Master
Type: Thesis
Country: China
Candidate: L Peng
Full Text: PDF
GTID: 2518306308967859
Subject: Computer Science and Technology
Abstract/Summary:
With the advent of the information age, human activity has produced a rapid increase in data volume. The patterns hidden in big data still have to be mined by hand, and feature mining requires both extensive computing experience and deep business knowledge. The supply of experienced data scientists is growing far more slowly than the data, and data scientists do not necessarily have domain knowledge in a given field. This thesis therefore proposes automatic feature recommendation algorithms for call behavior data (structured data) and call recording data (Chinese text data) in the telecommunication anti-fraud scenario. With the corresponding algorithm, an efficient and interpretable feature set can be obtained quickly.

To provide a supporting platform for these algorithms and to adapt to constantly changing feature mining scenarios, this thesis also proposes an intelligent feature engineering platform for telecommunication data. The platform relieves users of heavy coding work: they can use its traditional feature mining functions to analyze data quickly, and feature engineering becomes a matter of dragging and connecting components, so that effort goes into the feature mining method rather than into coding. By combining the automatic feature recommendation algorithms with the encapsulated traditional feature mining functions, users can quickly solve data analysis problems in different scenarios.

The key algorithmic difficulty of this thesis is automatic feature recommendation. Because automatic feature recommendation algorithms differ considerably across types of data sets, this work is grounded in the telecommunication anti-fraud scenario and addresses the problem for two cases: structured data and Chinese text data. The automatic feature recommendation algorithms are also an important expression of the platform's "intelligence". This thesis proposes two algorithms: "structured feature recommendation based on subset search and feature crossing" and "Chinese text feature recommendation based on complex networks". The former borrows the idea of locally optimal greedy processing for feature selection and feature crossing, and runs much faster than traditional methods while maintaining accuracy and recall. The latter starts from the two key steps of Chinese text classification, Chinese word segmentation and keyword extraction, and proposes "bi-gram based bidirectional maximum matching word segmentation" and "complex network-based keyword extraction" for them respectively. On that basis it produces a strong feature set and, as a by-product, yields the Chinese word segmentation and keyword extraction results. Experimental results show that the recommendation algorithms in both scenarios outperform existing traditional methods.

To implement the platform, this thesis first describes the background and practical significance of the research and reviews the state of the field in academia and industry, both in China and abroad. It then analyzes the research significance of the platform's automatic feature mining algorithms and manual feature mining functions. According to their purpose in the feature engineering process, the analysis methods are divided into four modules: data preprocessing, statistical analysis, original feature evaluation, and hidden feature construction. Next, the key problems that the system must solve are studied and resolved. The architecture and functional modules of the platform are then designed and implemented; the internal classes and interfaces of each module are described in detail, along with how a user request passes between modules. Finally, the deployment and testing of the platform are described, and the correctness and integrity of the platform are verified.
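
The abstract names "bi-gram based bidirectional maximum matching word segmentation" but does not describe how the bi-gram component is used. As a rough illustration of the underlying idea only, the sketch below implements plain dictionary-based bidirectional maximum matching in Python. The function names, the toy vocabulary, and the tie-breaking heuristic (fewer words, then fewer single-character words) are assumptions made for this example and are not taken from the thesis; the bi-gram step is deliberately left out.

```python
def forward_max_match(text, vocab, max_len):
    """Scan left to right, taking the longest dictionary word at each position."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted so the scan cannot stall.
            if size == 1 or piece in vocab:
                words.append(piece)
                i += size
                break
    return words


def backward_max_match(text, vocab, max_len):
    """Scan right to left, taking the longest dictionary word ending at each position."""
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in vocab:
                words.append(piece)
                j -= size
                break
    words.reverse()  # words were collected from the end of the text
    return words


def bidirectional_max_match(text, vocab):
    """Run both scans and keep the segmentation with the lower heuristic cost."""
    max_len = max((len(w) for w in vocab), default=1)
    fwd = forward_max_match(text, vocab, max_len)
    bwd = backward_max_match(text, vocab, max_len)
    if fwd == bwd:
        return fwd

    def cost(words):
        # Assumed heuristic: fewer words first, then fewer one-character words.
        return (len(words), sum(1 for w in words if len(w) == 1))

    # On an exact tie, min() returns its first argument, i.e. the backward result
    # (an arbitrary choice for this sketch).
    return min(bwd, fwd, key=cost)


# Toy vocabulary, hypothetical and chosen only to show a case where the scans disagree.
toy_vocab = {"研究", "研究生", "生命", "的", "起源"}
print(bidirectional_max_match("研究生命的起源", toy_vocab))
# → ['研究', '生命', '的', '起源']
```

In this example the forward scan greedily takes "研究生" and is left with the stray single character "命", while the backward scan recovers "生命"; the heuristic picks the backward result because it contains fewer single-character words. A bi-gram model, as the abstract's algorithm name suggests, would presumably replace or refine this tie-break, but that is an inference rather than something the abstract states.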
Keywords/Search Tags: feature engineering, feature recommendation, structured data, Chinese text data