
The System Of Merchandise Search Engine

Posted on: 2006-04-04
Degree: Master
Type: Thesis
Country: China
Candidate: F H Di
Full Text: PDF
GTID: 2168360182457163
Subject: Computer technology
Abstract/Summary:
With the rapid growth of electronic commerce, online trade in merchandise has become more frequent and more varied. When shopping online, users usually search for information about the merchandise first and want to compare offers from several shops before buying. This is the ground on which merchandise search engines have developed. In China, a number of sites already offer this kind of merchandise search, such as 8848.com, chinaec.com and dangdang.com. As the internet develops, the build-out of information infrastructure is also progressing quickly, and more and more people recognize the importance of specialized (vertical) search engines, so they are likely to become a major direction of development in China. Abroad there are already many similar websites, such as Shopping.com and Froogle.com, and in-depth research on specialized search engines is ongoing; all kinds of articles on spiders and database indexing can be found on the internet. Moreover, according to customer surveys, up to 80% of people preparing to buy a product use a search engine to find the merchandise they need. Visitors and new customers arriving from search engines make up a very high share of traffic, and they have a strong sense of purpose: they come to a website on their own initiative, and so they place especially high demands on a business website. The development of merchandise search follows that of the internet itself, and this is the original motivation for writing this merchandise search engine system.

The history of search engines can be traced back to their ancestor, Archie, created in 1990 by Alan Emtage, a student at McGill University in Montreal, together with Peter Deutsch and Bill Heelan (Archie FAQ). The World Wide Web had not yet appeared, but file transfer over the internet was already frequent. Because large numbers of files were scattered across individual FTP hosts, finding them was very inconvenient, so Alan Emtage and his colleagues developed a system that could locate files from a file name alone. This was Archie, the first program to automatically index the files on anonymous FTP sites on the internet, although it was not yet a true search engine. Archie is a searchable list of FTP file names: the user must enter the exact name of a file, and Archie then reports which FTP addresses offer that file for download. Later search engine research has built continuously on this foundation up to the present day.

All search engines nevertheless work in the same way: they gather and discover information according to some strategy, interpret, extract, organize and process it, and provide a retrieval service to users, thereby serving as a means of information navigation. In summary, a general search engine is implemented in three steps: first, fetching pages from the internet; second, building an index database; third, searching and ranking within the index database. Four important components are involved: the crawler, the indexer, the retriever and the user interface. A merchandise search engine additionally consists of modules for page crawling, information extraction, link analysis, indexing and retrieval, and this thesis is organized in that order.

The page crawling module is responsible for downloading information from e-commerce websites. Search engines generally use a spider as the tool for collecting information. The spider interacts with web servers over the HTTP protocol, downloads web pages, and uses the link analysis module to parse the HTML source, extracting new links for the spider program to follow.
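The download, parse and extract-links cycle described above can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: the names `LinkExtractor`, `extract_links` and `crawl` are hypothetical, the `fetch` callable stands in for the HTTP download so the sketch runs without a network, and an explicit queue is used in place of the recursion the abstract mentions.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkExtractor(HTMLParser):
    """Collects href targets of <a> tags, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links and drop the #fragment part.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                self.links.append(absolute)


def extract_links(base_url, html):
    """Link analysis step: return the absolute URLs linked from one page."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


def crawl(seed, fetch, max_pages=100):
    """Breadth-first crawl from `seed`.

    `fetch(url)` is an assumed hook that returns the page's HTML text,
    or None when the page cannot be downloaded.
    """
    seen = {seed}
    queue = deque([seed])
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        for link in extract_links(url, html):
            if link not in seen:  # never queue the same URL twice
                seen.add(link)
                queue.append(link)
    return pages
```

In a real spider, `fetch` would issue an HTTP GET and would also need politeness delays, robots.txt handling and a restriction to the target e-commerce hosts, none of which is shown here.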
By following links in this way, the spider reaches documents recursively and the goal of information collection is achieved. Large search engines usually use distributed crawler (spider) technology: the spider program is deployed on many servers, each handling its assigned websites, which overcomes bandwidth limits and makes high-speed collection possible.

The main task of the information extraction module is to remove all HTML tags from a web page and keep only the plain text for use in the next stage, so that useful information can be extracted. A more advanced technique uses statistical matrix analysis to identify navigation and advertisement columns and strip them out, filtering noise. Because a merchandise search engine requires more highly structured information, these techniques are not sufficient for this type of system. Structured information must instead be extracted from the page content, namely the merchandise name, the location of its pictures, its price and so on.

Removing duplicate information is an important part of building the index table. This system applies the MD5 algorithm to the keywords to obtain a data fingerprint of each web page, which greatly reduces the occurrence of duplicates. In addition, to improve retrieval speed, an inverted index has to be built. The traditional method scans the documents and converts them into a two-dimensional array (for Chinese, the position at which each keyword appears usually has to be recorded as well, giving a three-dimensional record <keyword number, document number, position in document>); the records are then re-sorted and merge-sorted in memory. Using full-text search technology, the retrieval engine achieves high-speed search and can search 10 GB of documents in under a second on an ordinary PC.
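The two ideas in this passage, an MD5 fingerprint for duplicate removal and an inverted index that records keyword positions, can be illustrated with the sketch below. It rests on several assumptions that are not from the thesis: the fingerprint is taken over the whole normalized text rather than over extracted keywords, tokens are split with a simple regular expression (Chinese text would need a word segmenter), and the names `fingerprint`, `tokenize` and `PositionalIndex` are hypothetical.

```python
import hashlib
import re
from collections import defaultdict


def fingerprint(text):
    """MD5 digest of lower-cased, whitespace-normalized text.

    MD5 serves only as a content fingerprint here, not for security.
    """
    normalized = " ".join(text.lower().split())
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def tokenize(text):
    """Assumed tokenizer: runs of word characters, lower-cased."""
    return re.findall(r"\w+", text.lower())


class PositionalIndex:
    """Inverted index storing <term, document id, position> postings."""

    def __init__(self):
        self.postings = defaultdict(dict)  # term -> {doc_id: [positions]}
        self.fingerprints = set()
        self.docs = []

    def add(self, text):
        """Index `text`; return its doc id, or None if it is a duplicate."""
        digest = fingerprint(text)
        if digest in self.fingerprints:
            return None
        self.fingerprints.add(digest)
        doc_id = len(self.docs)
        self.docs.append(text)
        for position, term in enumerate(tokenize(text)):
            self.postings[term].setdefault(doc_id, []).append(position)
        return doc_id

    def search(self, term):
        """Return {doc_id: [positions]} for a single term."""
        return dict(self.postings.get(term.lower(), {}))
```

An exact-match fingerprint only catches pages whose normalized text is identical; near-duplicate detection would need a different technique.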
To improve retrieval efficiency, the system also incorporates methods used by several leading search engines. This part discusses the specialized retrieval modes used in the system. Concept retrieval takes the concept as its core: it replaces the keyword-centred search mode, draws on a dictionary of concept phrases, and interacts with the user to learn the actual search need, building the search around that intent. The F&Q-style search engine is a question-and-answer module based on natural-language search. Guided categorized search and clustering search rearrange the way results are presented, for the convenience of the user. The characteristic search module uses feature extraction techniques to capture the needs of different users and return different results to each.

For the sake of security, the system receives the user's search request through a CGI program, which performs preliminary processing, sends the request to the retrieval engine over TCP, receives the engine's result, and on that basis extracts the relevant data for display in the end-user interface. To prevent malicious attacks (for example, the input of very long strings), the CGI program first filters the user's keywords: it limits the length of the string and filters non-standard characters (converting between full-width (DBC) and half-width (SBC) forms). The keywords are then encrypted and sent to the retrieval engine. When the retrieval engine has finished processing, it returns the search result, namely a list of related document numbers. Based on this list, the CGI program looks up each document's position and length in the database file, reads the document, and dynamically extracts the passage in which the keywords appear, which is shown to the end user as a summary of the document.
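The keyword filtering performed before a query is forwarded can be sketched as below. This is only an illustration under stated assumptions: the 64-character limit is invented (the abstract gives no figure), the functions `to_half_width` and `sanitize_query` are hypothetical names, and the encryption and TCP steps mentioned above are left out.

```python
MAX_QUERY_CHARS = 64  # assumed limit; the source does not state one


def to_half_width(text):
    """Map full-width (DBC) characters to their half-width (SBC) forms."""
    out = []
    for ch in text:
        code = ord(ch)
        if code == 0x3000:                # ideographic space -> ASCII space
            code = 0x20
        elif 0xFF01 <= code <= 0xFF5E:    # full-width ASCII variants
            code -= 0xFEE0
        out.append(chr(code))
    return "".join(out)


def sanitize_query(raw, max_chars=MAX_QUERY_CHARS):
    """Normalize width, collapse whitespace, drop control characters,
    and truncate the query to `max_chars` characters."""
    text = to_half_width(raw)
    text = " ".join(text.split())
    text = "".join(ch for ch in text if ch.isprintable())
    return text[:max_chars]
```

Normalizing the width before indexing and before querying means that a product name typed in full-width letters matches the same index entries as its half-width spelling.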
To save hard disk space, all web page documents are stored together in a single data file, and a document index records the start position and length of each document. Read in the ordinary way, each document would take the CGI program a long time to load because disk operations are slow, so the system uses a memory-mapped file, calling mmap to speed up access, which effectively solves this problem.

By improving on and applying the techniques above, better-optimized models were established for both information extraction and duplicate removal, while the retrieval method applies artificial intelligence and data mining techniques to make the search results more accurate and the user interface friendlier. With this, the new search engine has been designed successfully. Its functions work reasonably well and its precision is good, but its recall is not outstanding; this shortcoming will have to be addressed in follow-up work.
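The storage layout described above, a single data file plus an index of start positions and lengths read through mmap, can be sketched as follows. The names `pack_documents` and `DocumentStore` are hypothetical, UTF-8 encoding is an assumption, and the index is kept in memory here rather than in a separate index file.

```python
import mmap


def pack_documents(path, documents):
    """Write all documents into one data file.

    Returns a list of (offset, length) pairs in bytes, one per document.
    """
    index = []
    with open(path, "wb") as f:
        for text in documents:
            data = text.encode("utf-8")
            index.append((f.tell(), len(data)))
            f.write(data)
    return index


class DocumentStore:
    """Reads documents from the packed file through a read-only memory map."""

    def __init__(self, path, index):
        self._file = open(path, "rb")
        # Length 0 maps the whole file; mapping an empty file raises ValueError.
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        self._index = index

    def read(self, doc_id):
        """Return the document text by slicing the mapped bytes."""
        offset, length = self._index[doc_id]
        return self._map[offset:offset + length].decode("utf-8")

    def close(self):
        self._map.close()
        self._file.close()
```

Because the operating system pages the mapped file in on demand and caches it, repeated reads of popular documents avoid an explicit seek and read call for each request.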
Keywords/Search Tags:Merchandise