The Use of Hashes in Signature-based Malware Detection

A hash is a fingerprint of a file that is, for practical purposes, unique to that file’s exact contents. It is almost impossible to compromise an operating system without changing a system file, and any change to a file’s contents also changes its hash, or signature. This allows corrupted files, and threats whose signatures have been recognized and recorded, to be detected through file integrity checking. Any difference in a file, even one extra comma in a ten-thousand-word text document, alters the file’s ‘fingerprint’ to a completely different hash value (this applies to all file formats). Any malicious modification therefore changes the signature. A genuine modification such as an update or patch changes the hash too, and the new value must be recorded. In theory this provides a reliable method to check for malware. In practice, hash recognition relies on security software maintaining a current directory of known threats for comparison, so it is not fully effective as a first line of defence and prevention.
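The ‘one extra comma’ effect is easy to see for yourself. The following is a minimal sketch, assuming SHA-256 as the hash function (the article does not name one) and using a short stand-in sentence rather than a real document; the helper name sha256_hex is made up for illustration.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a 64-character hex string."""
    return hashlib.sha256(data).hexdigest()


# Stand-in text; the second version differs by a single extra comma.
original = b"The quick brown fox jumps over the lazy dog"
modified = b"The quick brown fox, jumps over the lazy dog"

digest_original = sha256_hex(original)
digest_modified = sha256_hex(modified)

# Count how many hex characters differ, position by position.
differing = sum(a != b for a, b in zip(digest_original, digest_modified))

print(digest_original)
print(digest_modified)
print(f"{differing} of {len(digest_original)} hex characters differ")
```

Running it shows two digests with no visible relationship to each other, which is why a hash comparison can flag a change but cannot say how large or how harmful the change was.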

An operating system can also have inherent vulnerabilities. For example, up to Windows 2000, the directory of the system’s file hashes could be altered or written over, making it possible for infections to escape detection. It is also difficult to keep a record of hashes current enough to distinguish accurately between legitimate and unauthorized system file changes. Real-time detection by this kind of scanning is far from fool-proof: Windows System File Checker is sometimes incapable of detecting infections, while genuinely patched files can be labeled as threats because the updated hash was not registered or recognized, resulting in what is known as a false positive alert.
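The false positive problem can be illustrated with a small sketch. It assumes a baseline stored as a plain mapping of file name to SHA-256 digest; the file names, contents, and the check_integrity helper are hypothetical and are not how Windows System File Checker works internally.

```python
import hashlib


def digest(data: bytes) -> str:
    """SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def check_integrity(baseline: dict, current: dict) -> dict:
    """Compare current file contents against a baseline of expected digests.

    baseline maps file name -> expected hex digest.
    current maps file name -> file contents (bytes).
    """
    report = {"modified": [], "missing": [], "unlisted": []}
    for name, expected in baseline.items():
        if name not in current:
            report["missing"].append(name)
        elif digest(current[name]) != expected:
            report["modified"].append(name)
    for name in current:
        if name not in baseline:
            report["unlisted"].append(name)
    return report


# Hypothetical system files and the baseline recorded for them.
files_v1 = {"kernel.sys": b"kernel v1", "driver.sys": b"driver v1"}
baseline_v1 = {name: digest(data) for name, data in files_v1.items()}

# A legitimate vendor patch replaces one file.
files_v2 = {**files_v1, "driver.sys": b"driver v2 (vendor patch)"}

# Stale baseline: the patched file is reported as modified (a false positive).
report_stale = check_integrity(baseline_v1, files_v2)

# Refreshed baseline: the same files now pass.
baseline_v2 = {name: digest(data) for name, data in files_v2.items()}
report_fresh = check_integrity(baseline_v2, files_v2)

print(report_stale)
print(report_fresh)
```

The check itself cannot tell a vendor patch from tampering. That distinction lives entirely in how the baseline is maintained, and it is also why a baseline that an attacker can overwrite defeats the whole scheme.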

In response to hash comparison, malware has also evolved various escape and evasion techniques. Malware like the ZeroAccess rootkit can choose and deploy one of two techniques, depending on the specifications of the target operating system. On 32-bit Windows, it overwrites a driver to hide in, and on any subsequent scan presents the original for examination, which retains the hash of an undamaged system file.

To further reduce its exposure to signature-based checks, malware will often try to hide its rewritten and downloaded files in a folder that is not covered by routine scans, or by the majority of anti-malware products. This is the obfuscation strategy used by the ZeroAccess rootkit on 64-bit Windows systems, where it hides in folders of the Global Assembly Cache (the generalized way in which these folders are listed makes this a fairly secure place for the virus’ hash to avoid detection). This form of concealment is comparable to steganography, from the Greek steganos (covered) and graphein (writing): the ‘secret’ remains hidden by not drawing attention to itself as something that requires scrutiny. In this case the virus appears to be a genuine element of a host folder, and its properties are not examined as a separate constituent by any but the most rigorous of searches. In computer terms, the simplest way to detect such files is to compare them to known, unadulterated original files, or against a database of known malware hashes, though the questionable file first has to be located in order to do this.
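The point that a file must be located before it can be compared is sketched below. This is an illustration only: the folder names, file names, and the scan_tree helper are hypothetical, the ‘malicious’ bytes are a harmless stand-in, and SHA-256 is an assumed choice of hash.

```python
import hashlib
import os
import tempfile


def sha256_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def scan_tree(root: str, known_bad: set) -> list:
    """Return sorted paths under root whose digest is in known_bad."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if sha256_file(path) in known_bad:
                hits.append(path)
    return sorted(hits)


# Harmless stand-in for a sample whose hash is already in the database.
sample = b"stand-in bytes for a known malicious file"
known_bad = {hashlib.sha256(sample).hexdigest()}

with tempfile.TemporaryDirectory() as root:
    scanned = os.path.join(root, "scanned")  # folder the routine scan covers
    skipped = os.path.join(root, "skipped")  # folder the routine scan ignores
    os.makedirs(scanned)
    os.makedirs(skipped)
    with open(os.path.join(scanned, "notes.txt"), "wb") as fh:
        fh.write(b"harmless")
    with open(os.path.join(scanned, "dropper.bin"), "wb") as fh:
        fh.write(sample)
    with open(os.path.join(skipped, "hidden.bin"), "wb") as fh:
        fh.write(sample)

    # Routine scan: only sees the copy inside the covered folder.
    hits_partial = [os.path.relpath(p, root) for p in scan_tree(scanned, known_bad)]
    # Rigorous scan: walks everything and finds both copies.
    hits_full = [os.path.relpath(p, root) for p in scan_tree(root, known_bad)]

print(hits_partial)
print(hits_full)
```

Both copies have the same hash and both are in the database, yet the routine scan reports only one, because the other was never hashed in the first place.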

Signature-based anti-virus software is database-driven: it compares files on a system against commercial databases, as well as public malware signature repositories, to identify known infection signatures. As noted, this requires the suspect file to be made available for comparison first. Besides the obfuscation tactics that botnet-type infections employ (these ‘covert’ programs are sometimes known as ghostware), malware developers have side-stepped this detection method to some extent by using polymorphic code and by encrypting sections of code, which makes it difficult to create a reliable, consistent signature for AV software to reference. As a result, infections can be missed in the data packets that anti-malware examines.
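Why re-encoded code defeats a plain hash signature can be shown with a toy example. The sketch below uses single-byte XOR purely as a stand-in for the encryption a polymorphic engine might apply; the payload is harmless text, and the keys and helper name are made up.

```python
import hashlib


def xor_bytes(data: bytes, key: int) -> bytes:
    """Toy encoding: XOR every byte with a single-byte key (its own inverse)."""
    return bytes(b ^ key for b in data)


# Harmless stand-in for code that stays the same across variants.
payload = b"stand-in for an unchanging payload"

# Two 'variants' of the same payload, encoded with different keys.
variant_a = xor_bytes(payload, 0x21)
variant_b = xor_bytes(payload, 0x5C)

hash_a = hashlib.sha256(variant_a).hexdigest()
hash_b = hashlib.sha256(variant_b).hexdigest()

# The signature database only knows the first variant.
known_bad = {hash_a}

print(hash_a in known_bad)  # first variant is recognized
print(hash_b in known_bad)  # second variant is not
# Decoded, both variants are byte-for-byte the same payload.
print(xor_bytes(variant_a, 0x21) == xor_bytes(variant_b, 0x5C))
```

Each new key produces a file with a new hash, so a database of exact hashes always trails behind the variants it is meant to catch.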

This is why most security software does not rely solely on this method but increasingly combines signature-based comparison with the behavioral study of malware, known as heuristic analysis. Heuristics is the monitoring of unusual system behavior, such as inordinate CPU usage, RAM activity, unauthorized file modification or execution, and random port use, judged against models built from previous virtual runs of a particular malware variant. Observed behavior is matched against such a model to generate a threat warning for a system. This protection is only effective after the developers have run a sample of a new variant through virtual trials and updated their threat index accordingly. Signatures are an important part of cross-referencing in data protection and file management, though they are not practical as the sole system of defence given ongoing malware advancements and tactical changes.
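As a rough illustration of how the two approaches combine, the sketch below checks for a signature match first and falls back to a simple behavior score. Everything in it is assumed for the example: the Observation fields, the weights, and the thresholds are invented, and real products use far richer models.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """Hypothetical behavior summary for one process."""
    cpu_percent: float
    unexpected_file_writes: int
    new_listening_ports: int


def heuristic_score(obs: Observation) -> int:
    """Add up invented weights for each kind of unusual behavior."""
    score = 0
    if obs.cpu_percent > 90:
        score += 2  # inordinate CPU usage
    if obs.unexpected_file_writes > 0:
        score += 3  # unauthorized file modification
    if obs.new_listening_ports > 0:
        score += 2  # unexpected port use
    return score


def verdict(obs: Observation, signature_match: bool, alert_threshold: int = 4) -> str:
    """Signature match wins outright; otherwise fall back to the behavior score."""
    if signature_match:
        return "alert: signature match"
    if heuristic_score(obs) >= alert_threshold:
        return "alert: suspicious behavior"
    return "no alert"


quiet = Observation(cpu_percent=12.0, unexpected_file_writes=0, new_listening_ports=0)
noisy = Observation(cpu_percent=97.0, unexpected_file_writes=3, new_listening_ports=1)

print(verdict(quiet, signature_match=False))
print(verdict(noisy, signature_match=False))
print(verdict(quiet, signature_match=True))
```

The second case is the one signatures alone would miss: no known hash matches, but the behavior crosses the threshold.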
