Texture features are important visual cues in images and provide a unified description of human visual and sensory attributes. Texture analysis plays an important role in computer vision applications such as object recognition, pattern recognition, and medical image analysis. Early texture classification algorithms described texture with relatively simple attributes, such as roughness and smoothness, and so neglected the diversity of real textures. Current classification algorithms based on convolutional neural networks (CNNs) extract texture features at multiple levels, which enriches the description of texture. CNNs have strong learning ability: they acquire prior knowledge by learning from datasets and then use it to determine the attributes of object features. However, most current algorithms aggregate CNN features in an orderless manner, which maintains spatial invariance but ignores the inherent connections between features. In addition, texture images exhibit large intra-class differences, small inter-class differences, and no fixed shape information. This thesis therefore improves the CNN model by adding background information and a Transformer structure to the feature encoding, in order to better describe the correlations among the intrinsic attributes of features. The specific improvements are as follows:

(1) Texture images exhibit small inter-class differences and large intra-class differences. The background of a texture is important information in a texture image and generally represents the location and degree of aggregation of features in the image. Adding the corresponding background information to the feature embedding strengthens the dependency on the intrinsic attributes of the texture. A CNN is used to extract texture features at different scales. A cascade structure fuses the features from three different scales and levels and extracts background information at each scale. Shallow background information then guides the formation of the next-stage features, which increases the similarity of images within a class and makes the resulting intra-class features more general. Finally, an attention embedding module is added to strengthen the model’s attention to features at key locations, reduce the influence of irrelevant information, and improve the model’s ability to distinguish features with different attributes.

(2) Texture images lack fixed shape information, and their global and local features are correlated. Exploiting this correlation can reduce the large intra-class semantic gaps caused by the lack of shape features in texture images. CNN and Transformer structures are therefore combined to describe texture features. The algorithm first improves the Transformer structure by replacing the linear mapping with a depth-wise separable convolution mapping. A cross-attention module is then used to explore the relevance of features across different scales and to characterize the spatial dependence of features in different dimensions. The resulting algorithm improves classification accuracy while reducing the amount of computation.
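
As an illustration of the depth-wise separable convolution mapping mentioned in (2), the following is a minimal pure-Python sketch, not the implementation used in this thesis. It assumes a feature map stored as nested lists of shape C x H x W, stride 1, and zero padding. The function names (depthwise_conv, pointwise_conv, separable_projection) are hypothetical and chosen only for this example. Each channel is first filtered with its own k x k kernel, and a 1 x 1 convolution then mixes the channels.

```python
def depthwise_conv(x, kernels):
    """Filter each channel with its own k x k kernel (stride 1, zero padding).

    x: feature map as nested lists, shape C x H x W.
    kernels: one k x k kernel per channel, shape C x k x k (k odd).
    """
    C, H, W = len(x), len(x[0]), len(x[0][0])
    k = len(kernels[0])
    pad = k // 2
    out = [[[0.0] * W for _ in range(H)] for _ in range(C)]
    for c in range(C):
        for i in range(H):
            for j in range(W):
                acc = 0.0
                for di in range(k):
                    for dj in range(k):
                        ii, jj = i + di - pad, j + dj - pad
                        # Positions outside the map count as zeros.
                        if 0 <= ii < H and 0 <= jj < W:
                            acc += x[c][ii][jj] * kernels[c][di][dj]
                out[c][i][j] = acc
    return out


def pointwise_conv(x, weights):
    """1 x 1 convolution: mix channels independently at every position.

    weights: shape C_out x C_in.
    """
    C_in, H, W = len(x), len(x[0]), len(x[0][0])
    return [
        [
            [sum(row[c] * x[c][i][j] for c in range(C_in)) for j in range(W)]
            for i in range(H)
        ]
        for row in weights
    ]


def separable_projection(x, dw_kernels, pw_weights):
    """Depth-wise convolution followed by a point-wise convolution."""
    return pointwise_conv(depthwise_conv(x, dw_kernels), pw_weights)


def parameter_counts(c_in, c_out, k):
    """Weight counts (no bias) for a standard vs. a separable k x k convolution."""
    standard = k * k * c_in * c_out
    separable = k * k * c_in + c_in * c_out
    return standard, separable
```

In a Transformer block, a projection of this kind would be applied to the token map (reshaped to C x H x W) to produce queries, keys, and values. For the same kernel size, it needs k·k·C_in + C_in·C_out weights instead of the k·k·C_in·C_out of a standard convolution.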