| The rapid development of "Internet Plus" has greatly facilitated people's lives and profoundly transformed many industries. Take "Internet + renting" as an example. There are now many online housing rental platforms, such as Anjuke and SouFun, and most prospective tenants look for housing online first. However, online rental information has many shortcomings. Some platforms do not screen listings strictly enough, so many listings exaggerate or conceal details. Others charge agency fees, an avoidable expense for tenants who would rather contact the landlord directly. Renting has become a topic of wide concern, so it is worthwhile to study how to find accurate information quickly among the large amount of irrelevant content on webpages, and how to let lessors and renters contact each other efficiently without going through an agent. This paper builds a house leasing information platform based on information extraction technology, aimed at users who want genuine listing information from the Internet and at budget-conscious tenants. The platform is designed and developed for a target group that values truthful information and affordability. It collects a large number of rental offers and rental requests from webpages, focusing on non-agency listings posted by individuals, and provides a better information experience for both landlords and tenants. The data are collected mainly from community websites, such as the Douban Renter Group and major campus BBSs, from which detailed housing information is extracted. Users can filter listings by geographic location and expected price. The platform obtains its information through crawler technology combined with rule-based and deep-learning-based information extraction techniques. Information extraction on this platform mainly involves identifying and extracting Chinese named entities. Named entity recognition is a sequence labeling problem, for which the mainstream solution is the RNN-CRF model, which combines a neural network with a CRF model. However, a plain RNN suffers from vanishing gradients when processing long texts, so the commonly used LSTM is adopted instead. The information extraction model uses BiLSTM-CRF to identify and extract geographic locations and organization names from the listing details. For the price and the supply-demand relationship (whether a post offers or seeks housing), a rule-based information extraction model is adopted. |
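
The rule-based side of the extraction (price and supply-demand relationship) can be illustrated with a minimal Python sketch. The abstract does not give the platform's actual rules, so the keyword lists, the price pattern, and the function names `extract_price`, `classify_relation`, and `matches_filter` below are all hypothetical, chosen only to show the general idea.

```python
import re
from typing import Optional

# Hypothetical keyword lists; the source does not specify its actual rules.
SUPPLY_KEYWORDS = ("出租", "转租", "招租")  # the poster is offering housing
DEMAND_KEYWORDS = ("求租", "求房", "找房")  # the poster is seeking housing

# Assumed pattern: a 3-5 digit amount followed by a currency unit,
# optionally followed by a per-month marker, e.g. "3500元/月".
PRICE_PATTERN = re.compile(r"(\d{3,5})\s*(?:元|块|rmb|RMB)(?:\s*[/每]\s*月)?")


def extract_price(text: str) -> Optional[int]:
    """Return the first monthly price found in the post, or None."""
    match = PRICE_PATTERN.search(text)
    return int(match.group(1)) if match else None


def classify_relation(text: str) -> Optional[str]:
    """Label a post as 'demand' (seeking) or 'supply' (offering)."""
    # Demand keywords are checked first, since a rental request may
    # also mention offers (for example, "no agents' listings please").
    if any(keyword in text for keyword in DEMAND_KEYWORDS):
        return "demand"
    if any(keyword in text for keyword in SUPPLY_KEYWORDS):
        return "supply"
    return None


def matches_filter(text: str, location: str, max_price: int) -> bool:
    """Keep a post if it mentions the location and fits the budget."""
    price = extract_price(text)
    return location in text and price is not None and price <= max_price
```

For a post such as `"海淀区五道口一居室出租，3500元/月，非中介"`, this sketch yields a price of 3500 and the label `"supply"`. In the platform described above, the location would come from the BiLSTM-CRF entity recognizer rather than from the plain substring test used here.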