In the era of big data, Internet users continuously generate large volumes of data on a wide range of terminal devices, so extracting valuable information from massive data quickly and efficiently is particularly important. The business modules involved in big data processing programs are complex and cannot respond quickly when they work together across systems. Traditional portrait label calculation methods also contain logical flaws and cannot accurately describe user characteristics. To address the real-time performance and accuracy of massive data processing, this thesis studies real-time user portrait processing technology based on large-scale e-commerce logs and builds a large-screen data dashboard for business operations. The main research contents are as follows:

(1) To improve the accuracy of user portrait labels, this thesis proposes building an ID mapping dictionary based on a maximal connected subgraph algorithm to link user IDs, using the Naive Bayes algorithm to analyze the sentiment of user comments and calculate preference labels, and combining multiple algorithms to predict user gender labels.

(2) To improve data processing capacity and response speed, this thesis proposes integrating a cache-aside (bypass) cache and enabling asynchronous I/O to speed up data queries and reduce thread blocking time, and implements automatically sensed dynamic stream splitting based on Flink.

(3) Around the research problems and objectives, this thesis builds a real-time portrait processing system based on the optimization scheme. The system is implemented and tested in terms of data access, portrait label calculation, and large-screen visualization.

The experimental results show that after ID linking, users' data records are more complete; the preference labels that incorporate subjective comments and the gender labels calculated by multi-algorithm classification are more accurate; the data processing program is more flexible with dynamic stream splitting; and the response speed of the system is greatly improved after adding the cache and enabling asynchronous I/O.
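
The ID connection step in (1) can be illustrated with a minimal sketch. It assumes that the subgraph algorithm amounts to grouping IDs that appear together in the same log record into connected components, and it uses a union-find structure to do so. The function name `build_id_mapping` and the ID formats (`device:`, `phone:`, `account:`) are hypothetical and are not taken from the thesis.

```python
def build_id_mapping(records):
    """Map every observed ID to one representative ID per connected component.

    `records` is an iterable of ID groups that were seen together in one
    log record. All names and ID formats here are illustrative assumptions.
    """
    parent = {}

    def find(x):
        # Register unseen IDs as their own root.
        parent.setdefault(x, x)
        root = x
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every node on the path at the root.
        while parent[x] != root:
            parent[x], x = root, parent[x]
        return root

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the lexicographically smaller root so the chosen
            # representative does not depend on record order.
            if rb < ra:
                ra, rb = rb, ra
            parent[rb] = ra

    for ids in records:
        ids = list(ids)
        if not ids:
            continue
        find(ids[0])  # make sure single-ID records are registered too
        for other in ids[1:]:
            union(ids[0], other)

    # The dictionary maps each ID to the representative of its component.
    return {x: find(x) for x in parent}


if __name__ == "__main__":
    logs = [
        ("device:d1", "account:u7"),
        ("phone:138", "account:u7"),
        ("device:d9", "account:u2"),
    ]
    mapping = build_id_mapping(logs)
    print(mapping["device:d1"])  # account:u7
    print(mapping["device:d9"])  # account:u2
```

In this sketch, `device:d1` and `phone:138` resolve to the same representative because both co-occur with `account:u7`, so records keyed by either ID can be merged into one user profile. A production system over large logs would more likely compute the components with a distributed graph engine than in a single process.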