
Research On Data Discretization And Classification Algorithm Based On Factor Space Theory

Posted on: 2022-03-10 | Degree: Master | Type: Thesis
Country: China | Candidate: R J Wan | Full Text: PDF
GTID: 2518306722968359 | Subject: Applied Mathematics
Abstract/Summary:
Massive data sets contain a great deal of information. Researchers aim to extract valuable, hidden, and concise rules from them, which can then guide production practice and support AI decision making. Knowledge mining has therefore become an important research field in artificial intelligence. After decades of development, however, knowledge mining theory and its applications still face major challenges: the techniques are not yet tightly integrated with specific applications, they adapt poorly to particular data types, and efficiency and interpretability are hard to balance. Knowledge mining algorithms built on fundamental theoretical innovation can address many international frontier and hot-topic problems, and they underpin the sustainable development of artificial intelligence. Factor space theory provides a mathematical foundation for the mechanisms of artificial intelligence and serves as mathematical preparation for a profound revolution in the field. Based on factor space theory, this thesis addresses the challenges facing knowledge mining, focusing on the data preprocessing stage and on classification tasks:

(1) A dynamic discretization algorithm based on the expression intensity of factors is proposed to address the problem of data diversity. A new measure of set division, expression intensity, is introduced to describe how well conditional factors express result factors, and a dynamic discretization algorithm is built on it. To reduce the complexity of the discretization process, a simple heuristic is adopted to reduce the number of elements in the candidate breakpoint set. Finally, classification performance before and after discretization is compared experimentally. The results show that applying the proposed discretization algorithm as a preprocessing step significantly improves the learning performance of the classification algorithms, and that it outperforms the built-in discretization strategy of each classification algorithm.

(2) A classification algorithm based on factor integrity is proposed to address problems such as insufficient data completeness. The concept of factor integrity is introduced to measure the ability of factors to describe concepts, and on this basis an integrity division algorithm (IDA) for classification tasks is proposed, with its steps explained through an example. To let the algorithm adapt to complex background relationships and handle noisy data better, it is then adaptively improved. Finally, experiments on multiple classification data sets compare the approach with classic classification methods. The experiments show that sample data are often incomplete and noisy. The adaptively improved algorithm achieves higher learning accuracy and learning efficiency than the original IDA, and the experimental comparison finds that AAIDA outperforms the other classification algorithms in both learning efficiency and learning performance.

The expression intensity proposed in this thesis is a new measure of set division under factor space theory that describes how well conditional factors express result factors. Used for discretization, it significantly improves the learning performance of classification algorithms and addresses the difficulties that data diversity poses for applying those algorithms. Factor integrity measures the ability of factors to describe concepts. Used as the factor measurement criterion of the integrity division algorithm, it yields more concise and efficient knowledge rules and addresses the non-recognition and misrecognition caused by incomplete and noisy data.

The thesis contains 16 figures, 20 tables, and 59 references.
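
The abstract does not say which heuristic is used to shrink the candidate breakpoint set, and it does not define expression intensity formally. Purely as an illustration of what reducing the candidate set can look like, the sketch below uses one common choice from supervised discretization: keep a midpoint between two adjacent distinct values only if the class labels on the two sides are not one and the same pure class. The function name `boundary_cut_points` and the sample data are hypothetical and are not taken from the thesis.

```python
from typing import Dict, Hashable, List, Sequence, Set


def boundary_cut_points(values: Sequence[float],
                        labels: Sequence[Hashable]) -> List[float]:
    """Return a reduced list of candidate breakpoints for one numeric factor.

    Assumption (not from the thesis): a midpoint between two adjacent
    distinct values is dropped when both values carry exactly one label
    and that label is the same on both sides.
    """
    if len(values) != len(labels):
        raise ValueError("values and labels must have the same length")

    # Collect the set of class labels observed at each distinct value.
    labels_at: Dict[float, Set[Hashable]] = {}
    for v, y in zip(values, labels):
        labels_at.setdefault(v, set()).add(y)

    distinct = sorted(labels_at)
    cuts: List[float] = []
    for left, right in zip(distinct, distinct[1:]):
        left_labels = labels_at[left]
        right_labels = labels_at[right]
        # Skip the midpoint when both sides are pure and share one label.
        if len(left_labels) == 1 and left_labels == right_labels:
            continue
        cuts.append((left + right) / 2)
    return cuts


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    ys = ["a", "a", "a", "b", "b", "a"]
    print(boundary_cut_points(xs, ys))
```

With six distinct values there are five possible midpoints; in this toy input only the two where the class changes survive. Any measure used to score the breakpoints would then be evaluated only on the survivors.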
Keywords/Search Tags:expression intensity of factors, dynamic discretization, factor integrity, IDA, AAIDA, factor space