
Research And Implementation Of Technology For POI Chinese Address Fuzzy Matching

Posted on: 2021-04-24
Degree: Master
Type: Thesis
Country: China
Candidate: S L Shan
GTID: 2428330605474904
Subject: Computer technology
Abstract/Summary:
In recent years, with the rapid development of e-commerce and crowdsourcing technology, the volume of user-generated address data has exploded. Address matching, which aims to identify addresses in address databases that refer to the same location, is a crucial task in location-based businesses such as take-out services and express delivery. It is challenging because a location's address can be expressed in many ways, especially in Chinese. Traditional address matching approaches, which rely on string similarity and learned matching rules, can hardly handle addresses with redundant, incomplete, or unusual expressions. To address these problems, this thesis studies techniques for data acquisition, text embedding, and address matching, and implements a POI (Point of Interest) Chinese address fuzzy matching system. The work covers the following aspects:

1. To learn geographical semantic representations of address strings, we propose obtaining rich context for each address from the web through search engines, which substantially enriches the semantic information that can be learned about an address.
2. We propose a geographical address representation learning model for address matching. It uses an encoder-decoder architecture to learn a semantic vector representation for each address string, applying an up-sampling and sub-sampling strategy to deal with address redundancy and incompleteness. An attention mechanism is also applied to highlight the important features of an address in its semantic representation.
3. We construct a single large graph from the corpus, with address elements and addresses as nodes and edges built from word co-occurrence information, and learn embedding representations for all nodes on the graph. An empirical study on two real-world address datasets shows that our approach improves both precision (by up to 8%) and recall (by up to 12%) over existing state-of-the-art methods.
4. We implement a system for automatic model training and address matching, which better demonstrates the experimental results and makes it more convenient to run experiments on other datasets.
Keywords/Search Tags: Address Matching, Deep Learning, Data Collection, Graph Neural Network
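
The graph described in the abstract, with addresses and address elements as nodes and edges derived from co-occurrence, can be illustrated with a small sketch. This is not the thesis's implementation: the function name `build_cooccurrence_graph`, the sliding-window size, the use of raw counts as edge weights, and the sample addresses are all assumptions made for illustration, and the addresses are assumed to be already segmented into elements.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence_graph(addresses, window=3):
    """Build weighted edges for a graph of addresses and address elements.

    addresses: dict mapping an address id to its list of elements
               (assumed to be pre-segmented; segmentation is out of scope).
    Returns a Counter mapping (node_a, node_b) pairs to edge weights.
    Nodes are tagged tuples: ("addr", id) or ("elem", text).
    """
    edges = Counter()
    for addr_id, elements in addresses.items():
        # Address-element edges: weight is how often the element
        # occurs in that address.
        for element in elements:
            edges[(("addr", addr_id), ("elem", element))] += 1

        # Element-element edges: weight is the number of sliding windows
        # in which both elements appear together.
        n_windows = max(1, len(elements) - window + 1)
        for start in range(n_windows):
            in_window = sorted(set(elements[start:start + window]))
            for a, b in combinations(in_window, 2):
                edges[(("elem", a), ("elem", b))] += 1
    return edges


# Hypothetical, already-segmented sample addresses (romanized for readability).
sample = {
    "a1": ["Suzhou", "Ganjiang Road", "No. 1"],
    "a2": ["Suzhou", "Ganjiang Road", "Central Library"],
}
graph = build_cooccurrence_graph(sample)
for (u, v), weight in sorted(graph.items()):
    print(u, v, weight)
```

In this sketch, elements shared by several addresses (here "Suzhou" and "Ganjiang Road") receive heavier element-element edges and connect the address nodes indirectly, which is the kind of structure a graph embedding method could then exploit. A real system would likely replace raw counts with a normalized weighting; that choice is left open here.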