| As China vigorously promotes energy conservation and emission reduction, energy-efficient building design has become an important part of architectural design, and determining design parameters requires reasonable and reliable basic data. For the indoor environment, developed countries have established basic thermal comfort databases and built standard systems on them, whereas China still lacks an industry-recognized database. The quality of basic thermal comfort data directly affects the results of data mining and analysis, so controlling that quality is essential. However, current quality control methods need improvement, and few studies have analyzed how data characteristics affect the quality of a thermal comfort database. This study considers six thermal comfort factors (indoor air temperature, relative humidity, mean radiant temperature, air velocity, metabolic rate and clothing insulation) and three subjective indices (thermal sensation vote, thermal comfort vote and thermal acceptability vote). Taking into account the characteristics of the building environment and occupants' living habits, it proposes a data quality control method for thermal comfort field surveys of office and residential buildings that covers three aspects: missing value identification, data consistency checking and outlier detection. Subjective votes are prone to quality problems such as respondents misunderstanding the questions and errors during data entry. After reviewing common outlier detection methods and drawing on thermal comfort principles, a two-step outlier detection method for thermal sensation votes based on the standard effective temperature (SET) distance was developed. The method uses SET as the index. A K-nearest neighbor (KNN) classification based on the distance d first identifies samples collected under similar conditions, and outliers are then identified with a criterion based on the Gaussian distribution. A data visualization method is also presented for detecting outliers in thermal comfort votes and thermal acceptability votes. Based on these methods, a data processing system for thermal comfort field surveys was developed in MATLAB. Field studies of office and residential buildings were carried out in Dalian, Zhengzhou, Luoyang and Tianjin, in China's cold climate zone, yielding 2447 valid questionnaires. In addition to these data, the study collected thermal comfort field survey data from other universities and applied the proposed quality control method to all the collected data. The results show that the method is clearly effective. With strict quality control, a comprehensive thermal comfort database for China was established. It contains 41977 sets of field survey data collected from 2001 to 2021 in 50 cities across 24 provinces, spanning China's five thermal climate zones. The data are widely distributed and representative in terms of both ambient temperature and survey location. Using this database, the study took the thermal sensation model as the basis for evaluating data set quality and analyzed how three data characteristics (sample size, data distribution and data range) affect that quality. A sample size calculation based on interval estimation gave a minimum sample size of 350 for thermal comfort field surveys of office and residential buildings. Compared with normally and uniformly distributed data, positively skewed data yield a lower neutral temperature and negatively skewed data yield a higher one. Thermal sensation models built from uniformly distributed data are more robust. As the temperature range narrows, the winter neutral temperature rises, the summer neutral temperature falls, and the accuracy of the resulting thermal sensation model declines. |
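Missing value identification and data consistency checking can be expressed as per-record rules. The Python sketch below is a minimal illustration under stated assumptions and is not the study's MATLAB implementation. The field names `ta`, `rh`, `va` and `tsv`, the helper `check_record`, and the bounds on air temperature and air velocity are all hypothetical. Only two sets of bounds follow from definitions: the TSV bounds, which assume the ASHRAE 7-point scale, and the 0 to 100 % bound on relative humidity. A range check is only one possible form of consistency check.

```python
import math

# Hypothetical field names and plausibility bounds. The TSV bounds assume the
# ASHRAE 7-point scale (-3 to +3); relative humidity is bounded by definition;
# the air temperature and air velocity limits are illustrative assumptions only.
RANGES = {
    "ta": (5.0, 45.0),    # indoor air temperature, °C (assumed bound)
    "rh": (0.0, 100.0),   # relative humidity, %
    "va": (0.0, 5.0),     # air velocity, m/s (assumed bound)
    "tsv": (-3.0, 3.0),   # thermal sensation vote
}


def check_record(record):
    """Return a list of missing-value and out-of-range problems for one survey record."""
    problems = []
    for field, (lo, hi) in RANGES.items():
        value = record.get(field)
        # Treat both an absent key / None and a NaN placeholder as missing.
        if value is None or (isinstance(value, float) and math.isnan(value)):
            problems.append(f"{field}: missing")
        elif not lo <= value <= hi:
            problems.append(f"{field}: {value} outside [{lo}, {hi}]")
    return problems
```

For example, the record `{"ta": 24.0, "rh": 120.0, "tsv": 0}` is reported with two problems: an out-of-range humidity and a missing air velocity.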
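The two-step SET-distance outlier detection for thermal sensation votes could be sketched roughly as follows. This is a sketch under several assumptions, not the study's exact procedure:

- SET has already been computed for every sample. Computing SET requires a physiological model, which is not shown here.
- "KNN based on distance d" is read as taking the k samples with the smallest d = |SET_i - SET_j|.
- The Gaussian criterion is a mean ± z·σ rule applied to the neighbors' votes.

The function name `set_knn_outliers` and the parameters `k`, `z` and `min_sigma` are hypothetical, and their defaults are not values taken from the study.

```python
import numpy as np


def set_knn_outliers(set_values, tsv, k=5, z=3.0, min_sigma=0.5):
    """Flag thermal sensation votes (TSV) that disagree with votes cast at similar SET.

    Step 1: the neighbors of sample i are the k other samples with the smallest
            SET distance d = |SET_i - SET_j| (its "similar operating conditions").
    Step 2: the neighbors' votes are treated as Gaussian, and sample i is flagged
            when its vote lies more than z standard deviations from their mean.
    min_sigma is a hypothetical floor so that a small deviation is not flagged
    merely because all neighbors happened to cast the same vote.
    """
    set_values = np.asarray(set_values, dtype=float)
    tsv = np.asarray(tsv, dtype=float)
    n = set_values.size
    if not 2 <= k < n:
        raise ValueError("k must satisfy 2 <= k < number of samples")
    flags = np.zeros(n, dtype=bool)
    for i in range(n):
        d = np.abs(set_values - set_values[i])
        d[i] = np.inf  # a sample is not its own neighbor
        neighbors = np.argsort(d, kind="stable")[:k]
        mu = tsv[neighbors].mean()
        sigma = max(tsv[neighbors].std(ddof=1), min_sigma)
        flags[i] = abs(tsv[i] - mu) > z * sigma
    return flags


if __name__ == "__main__":
    set_values = [22.0, 22.5, 23.0, 23.5, 24.0, 24.5, 25.0, 25.5, 26.0, 26.5]
    votes = [0, 0, 0, 0, 3, 0, 0, 1, 1, 1]  # the +3 vote at SET 24.0 is out of line
    print(np.flatnonzero(set_knn_outliers(set_values, votes, k=4)))  # -> [4]
```

Each sample is excluded from its own neighborhood, so an extreme vote cannot inflate the standard deviation it is judged against. The choice of `k` trades locality in SET against the stability of the neighbors' mean and standard deviation.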
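The abstract does not give the exact interval-estimation formula used for the minimum sample size. One common textbook form applies when a mean must be estimated to within a margin E at a given confidence level: n = ceil((z·σ / E)²), where z is the two-sided normal critical value. The sketch below implements that form only. The helper name `min_sample_size` is hypothetical, and the σ and E values behind the study's figure of 350 are not stated, so none are assumed here.

```python
import math
from statistics import NormalDist


def min_sample_size(sigma, margin, confidence=0.95):
    """Smallest n whose normal-approximation confidence interval for a mean,
    mean ± z * sigma / sqrt(n), has half-width no larger than `margin`.

    sigma:  assumed population standard deviation (e.g. of thermal sensation votes)
    margin: largest acceptable half-width of the interval, in the units of sigma
    """
    if sigma <= 0 or margin <= 0 or not 0 < confidence < 1:
        raise ValueError("sigma and margin must be positive; 0 < confidence < 1")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value
    return math.ceil((z * sigma / margin) ** 2)
```

For instance, `min_sample_size(1.0, 0.1)` returns 385 at 95 % confidence. Halving the margin roughly quadruples the required sample.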
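Neutral temperature is commonly derived from a linear thermal sensation model. TSV is regressed on indoor temperature, and the model is solved for the temperature at which the predicted TSV is zero. The abstract does not specify the regression form or the temperature index, so this sketch assumes an ordinary least-squares line on a single temperature variable. `neutral_temperature` is a hypothetical helper name.

```python
import numpy as np


def neutral_temperature(temps, tsv):
    """Fit TSV = a * T + b by least squares and return (a, b, T_n), where
    T_n = -b / a is the temperature at which the fitted sensation is neutral (TSV = 0)."""
    # np.polyfit returns coefficients from the highest power down: [slope, intercept].
    a, b = np.polyfit(np.asarray(temps, dtype=float), np.asarray(tsv, dtype=float), deg=1)
    if np.isclose(a, 0.0):
        raise ValueError("slope is ~0: votes do not vary with temperature")
    return a, b, -b / a
```

With temperatures `[20, 22, 24, 26, 28]` and votes `[-2, -1, 0, 1, 2]`, the fit gives a slope of 0.5, an intercept of -12 and a neutral temperature of 24, up to floating-point rounding. Ordinary least squares weights every vote equally. As a result, how the samples are spread over the temperature range (skewness, range width) directly shapes the fitted line and therefore T_n.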