| Recently, driven by advances in artificial intelligence, intelligent applications such as assisted driving have been investigated extensively. These applications generate large amounts of image and video data, which are analyzed by machine vision algorithms instead of humans. However, the distortion introduced by image coding degrades the performance of machine vision analysis, which motivates research on image coding for machines. The image understanding performance of machine vision algorithms depends on three factors: 1) The amount of semantic information carried by the image. Algorithms cannot correctly understand an image that lacks sufficient semantic information. 2) The representation of semantic information. The representation determines how difficult the semantics are for machine vision algorithms to parse, and an appropriate representation makes parsing easier. 3) The semantics parsing ability of machine vision algorithms. Only algorithms with strong semantics parsing ability can accurately analyze and understand semantic information. Considering these three factors, this paper carries out work in three aspects. Semantics-oriented Image Coding Distortion Measurement Method: Because the distortion metric determines the optimization direction of an encoder, this work proposes optimizing the encoder with a semantic metric to preserve as much semantic information as possible during image coding. Based on this idea, transform-based and projection-based semantic metrics are proposed and combined into a single semantic metric. To exploit the advantage of end-to-end optimization, the proposed semantic metric is applied to an end-to-end image coding framework. Experimental results show that, under the same machine vision analysis performance, the
encoder optimized with the proposed semantic metric achieves significant bitrate savings in various machine vision tasks, providing a feasible solution to task-generic image coding. In addition, the proposed method outperforms the traditional image coding method in terms of perceptual quality. Semantic Representation-based Image Coding Method for Detection Tasks: This work aims to reduce the difficulty of semantics parsing from the perspective of the machine vision algorithms. It analyzes how intra-class distance and inter-class distance affect the object localization and classification stages of detection tasks. Based on this analysis, it proposes reducing the difficulty of semantics parsing by enlarging the inter-class distance. Taking the Gaussian distribution as an example, a semantic transform is theoretically derived and applied before image coding. Experimental results demonstrate that the proposed semantic representation-based image coding method effectively reduces the difficulty of semantics parsing, achieving a significant improvement in analysis accuracy at the same bitrate in detection tasks. Additionally, an inverse transform is proposed to alleviate the artifacts caused by the forward semantic transform, enhancing perceptual quality. Joint Optimization-based Feature Coding for Identification Tasks: This method seeks to improve the semantics parsing ability of machine vision algorithms by training them under a specific information capacity constraint. The image coding and feature coding strategies are compared, and the latter is selected for integration into identification networks to limit information capacity. To fully explore the advantages of joint optimization-based feature coding over independently optimized feature coding under information capacity constraints, this paper investigates the impact of different encoder structures and joint optimization approaches on the semantics parsing ability of
machine vision algorithms. Experimental results demonstrate that the joint optimization-based feature coding method improves semantics parsing ability in identification tasks, leading to higher identification accuracy. The three works in this paper study image coding for machines from the perspectives of semantic fidelity, semantic representation, and semantics parsing, which correspond respectively to the three factors identified above: the amount of semantic information, the difficulty of semantics parsing, and the semantics parsing ability of machine vision algorithms in intelligent applications. The proposed methods provide three feasible research directions for image coding for machines. |
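
The abstract refers to intra-class and inter-class distance without defining them. The following is a minimal sketch of one common way to measure the two quantities on labeled feature vectors, offered only as an illustration. The centroid-based Euclidean definitions, the function name `class_distances`, and the toy data are assumptions made here and are not taken from the paper.

```python
import math
from collections import defaultdict
from itertools import combinations


def class_distances(features, labels):
    """Return (mean intra-class distance, mean inter-class distance).

    Assumed definitions (illustrative, not the paper's):
      intra = mean Euclidean distance from each sample to its class centroid
      inter = mean Euclidean distance between pairs of class centroids
    Requires at least two distinct labels.
    """
    # Group the feature vectors by class label.
    groups = defaultdict(list)
    for vec, lab in zip(features, labels):
        groups[lab].append(vec)

    # Centroid of each class: the per-dimension mean of its vectors.
    centroids = {
        lab: [sum(col) / len(vecs) for col in zip(*vecs)]
        for lab, vecs in groups.items()
    }

    # Average sample-to-own-centroid distance over all samples.
    intra = sum(
        math.dist(vec, centroids[lab]) for vec, lab in zip(features, labels)
    ) / len(features)

    # Average centroid-to-centroid distance over all class pairs.
    pairs = list(combinations(centroids.values(), 2))
    inter = sum(math.dist(a, b) for a, b in pairs) / len(pairs)
    return intra, inter


# Toy 2-D features for two hypothetical classes.
features = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (12.0, 0.0)]
labels = ["car", "car", "person", "person"]
intra, inter = class_distances(features, labels)
print(intra, inter)
```

Under these assumed definitions, a representation with a larger inter-class distance relative to its intra-class distance keeps classes further apart, which is the sense in which the abstract describes enlarging the inter-class distance to ease semantics parsing.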