Haptic representation in mobile devices has emerged as a pivotal technology for enhancing user interaction, aiming to provide users with authentic tactile sensations when they interact with on-screen content. The essence of haptic rendering is the establishment of a mapping between tactile characteristics and driving electrical signals, which constitutes the core step toward an authentic representation effect. According to how tactile characteristics are represented, haptic rendering can be classified into data-driven rendering and image-feature-based rendering. Data-driven rendering, which represents tactile characteristics with measured data, frequently achieves greater realism than image-feature-based rendering, and the quality of the tactile data is a pivotal factor in the fidelity of data-driven haptic rendering.

Tactile data can be acquired in two ways: direct measurement and cross-modal generation. Direct measurement uses a force sensor to record the friction, normal force, or acceleration of a finger interacting with a material surface. The process is time-consuming and labor-intensive and covers only a limited variety of materials. Moreover, the variability in fingertip biomechanics among individuals produces significant differences in the tactile data obtained from the same material under identical measurement conditions; achieving a uniformly authentic tactile experience across users therefore requires measuring each user's interaction with the material surface individually, which significantly escalates the time cost. Cross-modal generation methods build on measured datasets and employ neural networks to generate tactile data from visual images or audio, which augments the diversity of tactile data samples. However, the generated tactile data may diverge significantly from actual measurements because of the limitations of image measurement conditions and of the network's capacity for feature extraction. Cross-modal approaches typically generate the amplitude spectrum of force tactile data and neglect its temporal characteristics. Furthermore, training neural networks requires a large volume of data samples, and the tactile data collected from a single user may not suffice to train a network.

To improve the quality of data generation and the realism with which tactile features are rendered, this dissertation draws on shrinkage estimation theory and deep learning to propose four data generation algorithms, whose performance is validated on experimentally measured data or publicly available datasets. For the generated tactile data, we introduce four tactile rendering techniques based on electrovibration, ultrasonic vibration, or mechanical vibration. These techniques reproduce the contours and texture features of material surfaces on tactile feedback devices, and subjective experiments are organized to evaluate the fidelity of the haptic rendering driven by the generated data. The main innovations are as follows:

1. Aiming at the difficulty of measuring force tactile data and the significant disparities in rendering realism caused by the biomechanical properties of fingertips, a force tactile data generation and rendering method incorporating fingertip biomechanical properties is introduced. The approach notably diminishes the time cost of data measurement while ensuring a uniformly authentic tactile experience for diverse users. Using principal component analysis, friction is decomposed into fingertip biomechanical factors and principal components related to the mechanical properties of the material surface. The Rao-Blackwellized Ledoit-Wolf estimator is employed to obtain a consistent estimate of the principal components, so that friction reflecting a new user's fingertip biomechanical properties can be generated. Under constant pressure and velocity, the friction experienced by the fingertips of five participants sliding over a 3D protruded surface was measured. The average relative error of the friction generated by this method is 10.41%, and an independent-samples t-test revealed no significant difference between the generated and the measured data. When the generated data are rendered on an electrovibration device, subjective experimental results demonstrate that the rendering method with fingertip biomechanical characteristics provides a uniformly authentic tactile reproduction for different participants.
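As an illustration of the shrinkage step in item 1, the sketch below applies the Rao-Blackwellized Ledoit-Wolf (RBLW) estimator of Chen et al. (2010) to stabilize the covariance on which the principal components are computed. It is a minimal sketch: the function names, data layout, and the random placeholder data are assumptions, and the dissertation's full decomposition into fingertip and material factors involves more than this single step.

```python
import numpy as np

def rblw_shrinkage_covariance(X):
    """Rao-Blackwellized Ledoit-Wolf (RBLW) shrinkage covariance estimate.

    X : (n, p) array of n friction observations with p features
        (e.g., windowed friction samples from several participants).
    Returns the shrunken covariance and the shrinkage weight rho.
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n                      # sample covariance (MLE normalization)
    tr_S = np.trace(S)
    tr_S2 = np.trace(S @ S)
    # RBLW shrinkage intensity from Chen et al. (2010), clipped to [0, 1]
    rho = ((n - 2) / n * tr_S2 + tr_S**2) / ((n + 2) * (tr_S2 - tr_S**2 / p))
    rho = min(rho, 1.0)
    target = (tr_S / p) * np.eye(p)        # scaled-identity shrinkage target
    return (1 - rho) * S + rho * target, rho

def friction_principal_components(F, k):
    """Extract k principal components from friction records using the
    RBLW-shrunken covariance -- a stand-in for the dissertation's PCA
    step separating material-related components from fingertip factors."""
    Sigma, _ = rblw_shrinkage_covariance(F)
    eigvals, eigvecs = np.linalg.eigh(Sigma)
    order = np.argsort(eigvals)[::-1][:k]  # top-k components by variance
    return eigvecs[:, order]

# Hypothetical usage: F holds friction traces (rows) from measured users;
# projecting a new user's short calibration trace onto these components
# would let friction be synthesized with that user's own biomechanics.
F = np.random.randn(50, 128)               # placeholder data
components = friction_principal_components(F, k=5)
```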
2. Aiming at the significant discrepancies between cross-modally generated tactile data and measured data, a Transformer algorithm that generates friction coefficients by integrating images and audio is proposed. The algorithm efficiently extracts the latent correlations among visual, auditory, and tactile data, thereby reducing the error of the generated friction coefficients. It encodes the input audio and images jointly to extract both local and global features, transforms these into tactile features with a Transformer module, and then decodes and reconstructs them to obtain friction coefficients (a structural sketch follows item 3 below). Simulation results on the LMT dataset demonstrate that the algorithm reduces the root mean square error (RMSE) of the generated friction coefficients by 28.53% and 33.25% compared with the Audio-visual-aided Haptic Signal Reconstruction (AVHR) method and the Convolutional Autoencoder (CAE) method, respectively. When a rendering method that modulates the drive-voltage amplitude and frequency of electrovibration is used to render the generated friction coefficients on a device, subjective experimental results indicate that the rendering realism of data generated by the Transformer method is significantly higher than that achieved by the AVHR and CAE methods.

3. Aiming at the large demand for data samples in cross-modal generation of tactile data, a federated Transformer algorithm that generates friction coefficients by combining images and audio is proposed. The algorithm uses data samples from multiple clients to establish a distributed data generation network, which reduces the data demand of each client. Each client trains the Transformer network on its local data, and the server aggregates the network parameters uploaded by the clients through federated learning, updates them, and sends the updated parameters back to the clients. Simulation experiments on the LMT dataset indicate that the federated Transformer algorithm reduces the RMSE of the generated friction coefficients by 28.47% and 36.79% compared with the federated AVHR and federated CAE algorithms, respectively. When a rendering method that modulates electrovibration and ultrasonic vibration is used to render the generated friction coefficients on a device, subjective experimental results show that the rendering realism of the data generated by the federated Transformer method is significantly higher than that of the data generated by the federated AVHR and federated CAE methods.
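Referring back to item 2, the following minimal sketch shows one plausible shape for the joint audio-visual encoding and Transformer fusion described there. The module names, layer sizes, and output length are illustrative assumptions, not the dissertation's configuration.

```python
import torch
import torch.nn as nn

class AudioVisualFrictionTransformer(nn.Module):
    """Sketch of a cross-modal generator: small convolutional encoders turn
    an image and an audio waveform into token sequences (local features),
    a Transformer encoder fuses them (global features), and a linear head
    decodes a friction-coefficient sequence."""
    def __init__(self, d_model=128, out_len=256):
        super().__init__()
        self.img_enc = nn.Sequential(                 # local visual features
            nn.Conv2d(3, d_model, kernel_size=8, stride=8), nn.GELU())
        self.aud_enc = nn.Sequential(                 # local auditory features
            nn.Conv1d(1, d_model, kernel_size=16, stride=8), nn.GELU())
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers=4)  # joint fusion
        self.decode = nn.Linear(d_model, out_len)     # reconstruct coefficients

    def forward(self, image, audio):
        img_tok = self.img_enc(image).flatten(2).transpose(1, 2)  # (B, Ni, d)
        aud_tok = self.aud_enc(audio).transpose(1, 2)             # (B, Na, d)
        tokens = torch.cat([img_tok, aud_tok], dim=1)             # joint encoding
        fused = self.fusion(tokens).mean(dim=1)                   # pooled tactile feature
        return self.decode(fused)                                 # friction coefficients

model = AudioVisualFrictionTransformer()
mu = model(torch.randn(2, 3, 224, 224), torch.randn(2, 1, 4096))  # -> (2, 256)
```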
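For the server-side aggregation in item 3, the sketch below implements standard federated averaging (FedAvg), weighting each client's uploaded parameters by its local sample count. The dissertation's exact update rule may differ; the placeholder models and sample counts here are assumptions.

```python
import copy
import torch
import torch.nn as nn

def fedavg(client_states, client_sizes):
    """Server-side federated averaging: each client's parameters are
    weighted by its local sample count, then summed into a global model.
    A generic sketch of the aggregation step, not necessarily the
    dissertation's exact scheme."""
    total = sum(client_sizes)
    global_state = copy.deepcopy(client_states[0])
    for key in global_state:
        global_state[key] = sum(
            state[key] * (size / total)
            for state, size in zip(client_states, client_sizes))
    return global_state

# Hypothetical round with small placeholder networks standing in for each
# client's local Transformer: clients train locally, upload their weights,
# and the server broadcasts the aggregate back.
clients = [nn.Linear(8, 4) for _ in range(3)]        # placeholder networks
global_state = fedavg([c.state_dict() for c in clients],
                      client_sizes=[120, 80, 200])   # assumed sample counts
for c in clients:
    c.load_state_dict(global_state)                  # broadcast the update
```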
4. Aiming at the lack of temporality in cross-modally generated force tactile data, a joint Visual and Audio to temporal Tactile data (VA2T) algorithm is proposed, which directly generates temporal friction and normal forces and reduces the error of data generation. The algorithm uses a feature extraction network to merge audio and image information and employs dilated causal convolutions in a tactile reconstructor to capture the temporal dependencies of the data, ensuring the temporality of the generated output. Simulation experiments on the LMT dataset demonstrate that the VA2T algorithm reduces the RMSE of the generated friction by 29.44% and 32.37%, and of the generated normal forces by 23.30% and 35.43%, compared with the Transformer and AVHR algorithms, respectively. When a rendering method that modulates electrovibration and mechanical vibration is used to render the generated friction and normal forces on a device, subjective experimental results show that the rendering realism of the data generated by the VA2T method is significantly higher than that of the data generated by the Transformer and AVHR methods.
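As a sketch of the temporal reconstructor in item 4, the code below stacks dilated causal convolutions so that each generated force sample depends only on past inputs, with the dilation doubling per layer to widen the receptive field. Channel counts, depth, and the fused-feature input format are assumptions rather than the dissertation's configuration.

```python
import torch
import torch.nn as nn

class CausalConv1d(nn.Module):
    """1-D convolution with left-only padding so the output at time t
    depends only on inputs up to t (causal)."""
    def __init__(self, ch_in, ch_out, kernel=3, dilation=1):
        super().__init__()
        self.pad = (kernel - 1) * dilation
        self.conv = nn.Conv1d(ch_in, ch_out, kernel, dilation=dilation)

    def forward(self, x):
        return self.conv(nn.functional.pad(x, (self.pad, 0)))

class TactileReconstructor(nn.Module):
    """Sketch of a VA2T-style reconstructor: a stack of dilated causal
    convolutions maps a fused audio-visual feature sequence to temporal
    friction and normal-force traces."""
    def __init__(self, feat_dim=128, hidden=64, depth=4):
        super().__init__()
        layers, ch = [], feat_dim
        for i in range(depth):
            layers += [CausalConv1d(ch, hidden, dilation=2 ** i), nn.ReLU()]
            ch = hidden                      # dilation doubles at each layer
        self.body = nn.Sequential(*layers)
        self.head = nn.Conv1d(hidden, 2, kernel_size=1)  # friction + normal force

    def forward(self, feats):                # feats: (B, feat_dim, T)
        return self.head(self.body(feats))   # (B, 2, T) temporal force traces

traces = TactileReconstructor()(torch.randn(2, 128, 512))  # placeholder input
```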