
Research On Determining Image Base Of Firmware For ARM Devices

Posted on: 2017-10-11
Degree: Doctor
Type: Dissertation
Country: China
Candidate: R J Zhu
Full Text: PDF
GTID: 1318330566456056
Subject: Computer application technology
Abstract/Summary:
Embedded devices have become common in daily life. All of these devices contain a special kind of software called firmware. Like traditional software, firmware usually has defects and vulnerabilities that pose a security threat to the device and can even cause device failure. Reverse engineering is a common method for security analysis of embedded device firmware, and disassembly is a basic and important component of reverse engineering. Disassembly is a technique that recovers the equivalent assembly instructions from their binary representation. Before a firmware can be disassembled, its processor type and image base must be known. For a given embedded system firmware, the processor type can easily be obtained by consulting the product manual or physically examining the device, but the image base cannot be obtained directly. With the correct image base, a disassembler can build accurate cross references in cases where an address reference uses an absolute address rather than an offset within the firmware. These cross references, which include code cross references and data cross references, are very important for analyzing the functions and evaluating the security of the firmware. However, most embedded system firmware is non-standard, and its format is unknown. To the best of our knowledge, there is still no automatic method that can determine the image base of a file with an unknown format. At present, about 63% of embedded systems are based on ARM processors. Hence, this dissertation focuses on firmware for the ARM architecture. By studying the instruction characteristics of binary functions, the function entry table, the storage and reference methods of strings, and the storage of literal pools, we propose four methods to determine the image base of firmware. The contributions are summarized as follows.

(1) Considering the function entry table (FET) and the prologue of binary functions, we propose a method of Determining Image Base of Firmware by Matching Function Entry Table (DBMFET). First, based on an analysis of the sources and features of FETs, we present the FIND-FET algorithm to identify possible FETs in a firmware. Next, we study the characteristics of the binary function prologue. Finally, based on the characteristics of FETs and binary function prologues, we propose the DBMFET algorithm to determine the image base of firmware. DBMFET uses the function entry addresses in a FET to define the range of possible image bases. If, for a certain address in this range, the percentage of matched function entry addresses is greater than a predefined threshold, this address is output as a candidate image base. We then run DBMFET on all FETs of a binary file and obtain multiple candidate image bases. If the number of FETs corresponding to a particular candidate image base is much larger than for the others, this candidate is considered the actual image base.

(2) Firmware usually contains strings. Based on the correspondence between string offsets and string addresses, we propose a method named Determining image Base by Matching Set of String Addresses (DBMSSA). First, according to the storage characteristics of strings in firmware, we present the FIND-String algorithm to identify all strings in a firmware and output each string's offset and length. Since a string address is usually loaded by an LDR instruction, we analyze the machine code format of the LDR instruction in ARM state and in Thumb state, and present the FIND-LDR algorithm to identify LDR instructions and calculate the addresses they load. Next, the addresses output by FIND-LDR are sorted and de-duplicated, which produces the set of addresses loaded by LDR instructions. Finally, using the set of string offsets calculated by FIND-String and the set of addresses loaded by LDR, we propose the DBMSSA algorithm. The algorithm calculates the difference between each address in the address set and each offset in the string offset set, and then counts the number of times each difference value appears. If one difference value appears far more often than any other, it is considered the actual image base.

(3) Inspired by the observation that the order of string addresses in a literal pool is consistent with the order of the strings in the firmware, we propose a method of Determining Image Base of Firmware by Matching Literal pool (DBMLP). Based on the storage characteristics of literal pools that contain string addresses, we present the FIND-LP algorithm to recognize all possible literal pools in a firmware. We then propose the DBMLP algorithm to determine the image base of firmware. Besides literal pools, DBMLP also needs the string offsets and lengths output by the FIND-String algorithm. DBMLP first derives string lengths from the string addresses in a literal pool, and then tries to match these lengths against the string lengths output by FIND-String. If the match succeeds, the relationship between the string addresses in the literal pool and the string offsets is obtained, and a candidate image base can be calculated accordingly. We then run DBMLP on all literal pools of a binary file, which produces multiple candidate image bases. If the number of literal pools matched at one candidate image base is far greater than at the others, this candidate is considered the actual image base of the firmware.

(4) Because compilers usually store the strings referenced by adjacent code together, we propose a method of Determining image Base by Group Matching String Storage Length (DBGMSSL). This method takes as input the set of string offsets and the set of string addresses, each of which yields a vector of string storage lengths. We then choose a group with a fixed number of elements from the vector derived from the set of string offsets, and match the group against the vector derived from the set of string addresses. If the match succeeds, the relationship between string offsets and string addresses is obtained, from which a candidate image base can be calculated. Each group is processed in the same way, which yields multiple candidate image bases. If one candidate image base appears far more often than any other, it is considered the actual image base.

To test the validity and efficiency of the four methods, we select 16 firmware files as the test set. The experimental results show that each method has its own applicable scenarios. Ranked by success rate from highest to lowest, the methods are (1), (2), (3), and (4), with success rates of 93.75%, 87.5%, 81.25%, and 68.75%, respectively. Ranked by average runtime from shortest to longest, the methods are (4), (3), (1), and (2), with average runtimes of 27.2 seconds, 88.6 seconds, 102.6 seconds, and 1181.6 seconds, respectively.
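The difference-counting step that contribution (2) describes for DBMSSA can be illustrated with a minimal sketch. This is not the dissertation's implementation: the function names `vote_image_base` and `pick_base`, the `ratio` threshold used to stand in for "appears far more often than any other", and the toy offsets and addresses are all assumptions made for illustration. The sketch also assumes the string offsets and the LDR-loaded addresses have already been extracted (the roles played by FIND-String and FIND-LDR in the abstract).

```python
from collections import Counter


def vote_image_base(string_offsets, loaded_addresses):
    """Count how often each (address - offset) difference occurs."""
    votes = Counter()
    for addr in set(loaded_addresses):      # de-duplicated LDR-loaded addresses
        for off in set(string_offsets):     # file offsets of detected strings
            diff = addr - off
            if diff >= 0:                   # a base address cannot be negative
                votes[diff] += 1
    return votes


def pick_base(votes, ratio=2.0):
    """Return the top difference if it clearly dominates the runner-up.

    `ratio` is a hypothetical dominance threshold; the abstract does not
    state how "much more often" is quantified.
    """
    ranked = votes.most_common(2)
    if not ranked:
        return None
    best, best_count = ranked[0]
    runner_count = ranked[1][1] if len(ranked) > 1 else 0
    if best_count >= ratio * max(runner_count, 1):
        return best
    return None


# Toy data (assumed): four strings, their absolute addresses under a
# made-up base, plus two unrelated addresses that act as noise.
TRUE_BASE = 0x08000000
demo_offsets = [0x100, 0x10C, 0x11F, 0x140]
demo_addresses = [TRUE_BASE + o for o in demo_offsets] + [0x20000000, 0x40021000]

demo_votes = vote_image_base(demo_offsets, demo_addresses)
print(hex(pick_base(demo_votes)))
```

In this toy input the true base collects one vote per string, while every wrong pairing of address and offset produces a difference that occurs only once, which is the effect the method relies on. The nested loop also makes the quadratic cost of comparing every address with every offset visible.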
Keywords/Search Tags: Firmware, Reverse Engineering, Disassembling, Image Base