Font Size: a A A

Content-Based Retrieval Of Similar Configurations Using R-trees And Spatial Data Mining

Posted on:2007-09-26Degree:MasterType:Thesis
Country:ChinaCandidate:P YangFull Text:PDF
GTID:2178360182996317Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
Content-based retrieval and Spatial Data Mining are two of importantfunctions in GIS, In this paper we discussed some algorithms aboutcontent-based retrieval of similar configurations ,and put forward analgorithm frame for spatial data mining.1 Content-based retrieval of similar configurationsThe retrieval of stored objects matching an input configuration is animportant form of content-based retrieval. A special form of content-basedretrieval is configuration similarity, otherwise called spatial, structural, orarrangement similarity. The corresponding queries describe some prototypeconfiguration and the goal is to retrieve all objects containing arrangementsof objects matching the input exactly or approximately. As an exampleconsider that the user is looking for all objects containing arrangementssimilar to that of a given query which could be expressed by one of theexisting pictorial languages that permit configuration similarity retrieval,select v0,v1,v2,v3, from Space Database, where NE(v0,v1),NW(v0,v2),N(v0,v3), … (NE means northeast, NW northwest and so on).Formally, a configuration similarity query can be considered as a CSP,described by: (i)A set of n variables, v0,v1,…,vn-1 that appear in thequery, (ii) For each variable vi, a finite domain Di ={u0,…, uN-1} of Nvalues, (iii) For each pair of variables (vi,vj), a constraint Cij which can bea simple spatio-temporal relation or a disjunction of relations.In this paper we only consider the region objects, and access to regionobjects through Minimum Bounding Rectangles, which convey about theactual objects they enclose. And only deal with configuration similarity ontopological and direction relations. Topological relations, described by RCC(Region Connection Calculus) classify into eight kinds: equal, contain,inside, cover, covered_by, disjoint, meet and overlap. Directionrelations ,using the definitions of direction relations in Dimitris Papadias'articles :strong_north,weak_north,strong_northeast,weak_northeast,strong_east,weak_east,strong_southeast,weak_southeast,strong_south,weak_sourtheast,strong_southwest,weak_southwest,strong_west,weak_west,strong_northwest,weak_northwest,same_width, same_level.In a configuration similarity, the constraint Cij of each pair of variables(vi, vj) describe the topological and direction relations that the pair of MBRsdrawn from database should be satisfied. In our paper, two five-bit stringsuse to represent Cij, which embody the topological and direction relations oftwo MBRs in the same form.The number of spatial data are quite large in GIS, Exhaustive processing(i.e., retrieval of the best solutions) of configuration similarity queries is, ingeneral, exponential and fast search for sub-optimal solutions is the onlyway to deal with the vast (and ever increasing) amounts of multimediainformation in several real-time applications. In this paper we discuss theutilization of hill climbing heuristics that can provide very good resultswithin limited processing time. Hill climbing algorithms operate on such agraph: starting with a random solution (called seed), they improve it byperforming uphill moves, i.e., by visiting neighbors with higher similarity.Each uphill move (otherwise called a step) may involve a number ofunsuccessful attempts (i.e., visits to nodes of lower similarity). A solution iscalled a local maximum if no uphill moves can be performed.In this paper, three hillclimb algorithms have been introduced. Thedifferences between them are the conditions on which they uphill: inNotreeFirst, once find a better instantiation ,then uphill;in NotreeBest,systematically tries all possible values in the domain of the variable to bere-instantiated and assigns the one that results in the solution with thehighest similarity, in NotreeFChang, improved from NotreeFisrt, once find asolution S, which similarity is higher than the current one and theneighbor-list of which are different from the current one's, then uphill. Aftera number of experiments , we conclude that the results of NotreeFChange ismuch better than NotreeFirst obviously;the speed of NotreeFChange isquicker than NotreeBest , but NotreeBest reach an area of higher similaritythan both NotreeFChange and NotreeFirst.Sub-optimal algorithms can speed up the retrieval of configuration similarity,but as the number of spatial objects increase large, the retrieval time isintolerably for users, too. So we improve the algorithms by using Rtreeindex structure. Build an Rtree for spatial data, and then exclude theneighbours, which are no influence on retrieval of the configurationsimilarity, out of the neighbour-list using the Rtree.Retrieval of configuration similarity using Rtree can be considered asa hierarchical CSPs. Compared with the above Sub-optimal algorithms, itadds a process: filter the neighbour-list. The process has two methods: beginwith the root or with the leaf level. In our paper, three algorithms usingRtree have been put forward and after large experiments we make aconclusion: compared with the retrieval from disorderly and unsystematicspatial data, sub-optimal algorithms using Rtree are much faster and almostreach the same high similarity. Yet, the result of the retrieval ofconfiguration similarity are greatly influenced by both the distribution ofspatial data and the given configuration, so it hard to say which sub-optimalalgorithm of the three is best, different case may receive different answer. Ingeneral, at the same spatial data, when the configuration similarity on theloose, algorithm which filters the neighbour-list form the leaf level is muchfaster;on the contrary, algorithm which filters the neighbour-list form theroot is faster.2 multi-layer spatial data miningData mining refers to extracting or mining knowledge from largeamounts of data. With wide applications of remote sensing technology andautomatic data collection tools, tremendous amounts of spatial andnonspatial data have been collected and stored on large spatial data.Traditional data organization and retrieval tools can only handle the storageand retrieval of explicitly stored datas, the extraction and comprehension ofthe knowledge implied by the huge amount of spatial data ,though highlydesirable, pose great challenges to currently available spatial databasetechnologies. This situation demands new technologies for knowledgediscovery in large spatial data, or spatial data mining, that is, extraction ofimplicit knowledge, spatial relations, or other patterns not explicitly storedin spatial databases.In the paper, we put forward an algorithm frame for multi-layer (ormulti-thematic) spatial data mining. In this frame, spatial data miningconsist of two parts: (i) execute AGGREGATE algorithm to acquire aaggregative table TA, which contain the spatial relation information and thenonspatial information of the pair of objects belong to different layer;(ii)use the traditional data mining algorithms to extract knowledge. The mostcontribution of this frame is that without any change the traditional datamining algorithms can be used to spatial data mining. The frame is appliedto a real spatial data classification problem, which runs efficiently andachieves good classification results.
Keywords/Search Tags:Configurations
PDF Full Text Request
Related items