Inner Mongolia Autonomous Region has abundant natural grassland and livestock resources, which provide privileged conditions for the development of animal husbandry. Its dairy cattle herd and milk production rank first in China, making it the country's most significant milk-producing region. Given the influence of cow health on the quality and safety of dairy products, the economic benefits of livestock farms, and even the future of the dairy industry, healthy development of dairy cows is an important prerequisite for improving the level of welfare-oriented dairy husbandry and further strengthening the status and comprehensive competitiveness of the dairy industry. The basic behaviors of dairy cows carry rich health-related information, so accurate and rapid behavior detection is important for intelligent perception of dairy cow health conditions. Aiming at the shortcomings of current intelligent behavior perception technology in natural husbandry environments, a video collection platform was built to record the daily behaviors of dairy cows from lactation to the dry period without contact. Integrating basic theories and methods such as machine vision, deep learning, and multi-feature fusion, this thesis carried out studies on behavioral feature extraction, dairy cow skeleton feature extraction, temporal-spatial feature extraction, and multi-modal feature fusion, to realize daily behavior classification of dairy cow groups in natural husbandry environments. The specific research content and results of the thesis are as follows:

(1) The color distribution of the natural husbandry environment is similar to the body color of dairy cows, and the front and rear limbs are difficult to distinguish due to their similarity, so behavior detection algorithms based on RGB images show poor robustness against interference factors such as natural light changes, cluttered backgrounds, and object occlusion. A Gaussian response heatmap was therefore proposed to encode the keypoints of the dairy cow and construct a skeleton representing the complete body structure and behavioral pose. A dairy cow skeleton extraction model based on spatial feature fusion was designed, which used the high-quality skeleton information to accurately predict the standing, walking, and lying behaviors of the dairy cow. The experimental results showed that the accuracy of single-object skeleton prediction was 85% during the day and 80.43% at night. When occlusion covered less than 50% of the cow's body area, the accuracy of double-object skeleton prediction was 74.38% during the day and 70.58% at night, respectively. The overall accuracy across all keypoints was 77.26%, and the average accuracy of standing, walking, and lying behavior prediction was 91.67%, 92.97%, and 99.2%, respectively.

(2) The distance between a dairy cow and the monitoring camera varies in daily behavior monitoring videos, so the cow objects in images appear at different scales. The thesis therefore proposed a dairy cow skeleton estimation model based on a multi-scale residual structure, which captures spatial feature information at different scales in parallel branches, giving the prediction model better spatial feature representation capability. During construction of the data set, part affinity fields between some keypoints were added to improve skeleton estimation precision when the hindquarters of the dairy cow were severely occluded. An adaptive adjustment factor was designed to improve the L2 loss function, automatically compressing the impact of easily detected negative samples on the loss. This effectively alleviated the imbalance between positive and negative samples, as well as the imbalance between easily detected and difficult-to-detect
samples during the keypoint detection process. The experimental results showed that, under the loose threshold of OKS = 0.5, the average confidence of each keypoint extracted by the multi-scale residual skeleton estimation model was higher than 70%.

(3) Because the standing and walking features in a single-frame skeleton are highly similar, these behaviors are easily misclassified. A 4-layer bidirectional LSTM network was designed to analyze the temporal skeleton sequence, capturing the temporal context of the skeleton in preceding and following frames. Because each frame's skeleton features contribute differently to overall behavior detection, a Transformer frame attention module was proposed to weight the temporal information mined by the BiLSTM module and extract strongly discriminative behavior attention features. Dropout regularization was introduced on the network parameters to prevent overfitting. The experimental results showed that, compared with the dairy cow behavior detection model based on single-frame skeleton features, the detection accuracy of standing and walking was improved by 5.72% and 3.07%, respectively. This indicates that exploiting the temporal dimension of video sequences and the varying importance of different frame skeletons effectively improves multi-object dairy cow behavior detection, and is more advantageous than single-frame skeleton features alone.

(4) To address the insufficient feature expression of single-modal data, a multi-modal behavior feature fusion algorithm was proposed, using temporal-spatial skeleton features to compensate for the shortcomings of RGB images and optical flow data in behavior modeling. The spatial stream sparsely sampled the behavioral video clips and captured spatial features of object appearance with twice the sample input of a traditional spatial stream. An improved FlowNet2.0 optical flow estimation network was designed to achieve end-to-end optical flow feature extraction; temporal behavioral information was extracted by stacking 10 consecutive frames of optical flow as the feature input. The temporal-spatial stream network took the temporal-spatial skeleton features as input to capture the dynamic changes of the dairy cow skeleton over time. The experimental results showed that the model based on multi-modal feature fusion significantly improved behavior detection accuracy: compared with the detection algorithms based on each single-modal stream, accuracy was improved by 5.23%, 8.95%, and 1.21%, respectively. This indicates that exploiting the complementarity of different modal data can improve the performance of dairy cow behavior detection algorithms in natural husbandry environments.