Are Metamorphic Viruses Beyond Antivirus Software? Exploring Challenges of Defeating Self-Modifying Code Attacks

Attacks and intrusions on computer systems have made headlines in recent years. Perhaps one of the most visible was the WannaCry ransomware attack in 2017, which affected more than 200,000 computers in 150 countries. It caused major disruption to our own UK National Health Service, with some 70,000 NHS computers, imaging systems, blood storage and other equipment effectively disabled by the attack [BBC, 2017].

Given the increasing installed base of computing devices, including desktops, laptops, tablets, smartphones and other embedded devices, progressively connected via ubiquitous networks, computer security researchers face ongoing challenges in detecting and stopping undesirable computer intrusions. Moreover, while hacking may have been the realm of nerdy students in years past, nowadays intruders are more likely to be cyber criminals or adversarial state actors, operating on an industrial scale [Anderson, 2008].

Computer viruses pose a major threat to society, costing billions of dollars, by causing systems to fail, corrupting data, increasing maintenance costs, wasting computer resources, and in some cases, putting human life in peril. The WannaCry attack alone is estimated to have caused more than US$4 billion in damage [CBS, 2017].

One category of viruses called metamorphic viruses can transform their code and structure to avoid detection by antivirus software. Because they are considered a very challenging type of virus to detect, they have posed considerable security risks to computers and systems that lack adequate defenses.

Introduction: computer viruses

Before a detailed discussion of metamorphic viruses, it is helpful to provide some basic definitions and concepts about computer viruses more generally. The term “computer virus” was coined by Dr. Fred Cohen in 1986, as “a program that can infect other programs by modifying them to include a possibly evolved copy of itself” [Cohen, 1986]. While there is no perfect definition of a virus, researchers today would also say that computer viruses do not necessarily have to change the code of other programs, nor do they need to include a copy of themselves in other programs. Finally, unlike other types of malware such as worms and spyware, viruses require user interaction and a host file, and do not operate on a standalone basis in systems [Konstantinou, 2008].

A typical structure of a virus contains three subroutines. These subroutines were provided by Dr. Cohen in the form of pseudo-code (p-code) as follows:

1. Infect-executable: this subroutine is responsible for finding executable files and copying the virus code into them, thereby infecting them.

2. Do-damage: this subroutine is responsible for delivering the malicious part of the virus, otherwise known as the payload of the virus.

3. Trigger-pulled: this subroutine checks if pre-specified conditions are met, and if they are, then the payload is delivered and the damage is done. The trigger condition could be anything, for example a day of week.

Below is the pseudo-code for a simple virus:

program virus :=
{1234567;

    subroutine infect-executable :=
        {loop: file = random-executable;
        if first-line-of-file = 1234567
            then goto loop;
        prepend virus to file;
        }

    subroutine do-damage :=
        {whatever damage is desired}

    subroutine trigger-pulled :=
        {return true on desired conditions}

    main-program :=
        {infect-executable;
        if trigger-pulled then do-damage;
        goto next;
        }

next :}

Since viruses seek to replicate and attach themselves to a host undetected, they often target operating systems and are written in machine code [Dewdney, 1989]. As is evident, while the concept of the virus subroutines is simple, there can be, and indeed are, a huge number of variations in practice.

The evolution of viruses

Computer viruses have evolved over time, with virus writers employing ever more sophisticated methods to evade detection. Here we cite three key ways in which viruses have evolved that represent a continuum of sophistication.

Encrypted viruses in their basic form have a decrypter, which is followed by an encrypted virus. The decrypter executes when the infected program runs. The purpose of this approach is to hide the functionality of the virus. The encryption seeks to achieve the following:

1. Prevent static code analysis. Encryption makes it harder to dissect the code and review it for malicious or suspicious instructions.

2. Make the process of dissection longer. Encryption will increase the difficulty of disassembling the code and thus add time and draw on more computing resources.

3. Prevent altering. Encryption makes it more difficult to tamper with and change the virus itself.

4. Evade detection. Though early versions of encrypted viruses used the same decrypter for all infected files, more sophisticated viruses use self-modifying encryption which makes detection very difficult.

Antivirus programs typically focus on detecting the decrypter itself, since the encrypted virus body is more difficult to detect. To make this harder for antivirus programs, virus writers have employed a number of techniques with encryption, such as changing the direction of the decryption loop, storing the decryption key in the virus, or decrypting the code in different locations in memory [Szor, 2005].

Oligomorphic viruses take these methods a step further by changing the code pattern of the decrypter with each new generation, making detection based on the decrypter’s code impractical [Patel, 2017]. This can be achieved simply by having more than one decrypter [Konstantinou, 2008]. With oligomorphic viruses, detection becomes focused on the constant code in the decrypted virus body.

In a polymorphic virus, the decrypters mutate into a very large number of versions (in some cases, millions) that use different encryption methods to encrypt the constant part of the virus body. Polymorphic viruses use a mutation engine to create a different decryption routine each time they infect a program. The new decryption routine has the same functionality, but the sequence of instructions may be completely different [Konstantinou, 2008, p. 28; Patel, 2017, p. 2].

Virus detection mechanisms

To detect and combat virus intrusion, anti-virus software employs a range of scanning techniques that have been developed by researchers over the years.

First generation scanners use simple methods to detect computer viruses. The technique relies on scanning suspect code or locations for a pre-defined sequence of bytes, called strings, that are likely to be a match for the same sequence in a known virus. Key examples of first generation scanners are [Konstantinou, 2008]:

1. Signature or “string” scanning – Signature scanning is the simplest technique used by anti-virus software. It searches for a sequence of bytes (a string) that is specific to a virus (the “signature”) but not likely to be found in other programs. Anti-virus software developers collect signatures and store them in a database. The virus scanner compares the strings in this database with those found in files and system areas on a computer. A key challenge with any such method is the presence of “false positives”, where the anti-virus believes it has a close enough match but in reality the code under scrutiny is legitimate and not a virus. The number of bytes required to minimise false positives depends on the size of the virus. For example, to detect 16-bit malicious code, 16 bytes is long enough to detect it without false positives. For 32-bit viruses, longer strings are required.

The following is an example of a typical pattern for the W32/Beast virus in EXE files, shown in hexadecimal form:

83EB 0274 1683 EB0E 740A 81EB 0301 0000

2. Wildcards – Some scanners are designed to ignore or skip specific bytes within the string, and therefore only seek to match the bytes which precede or follow it. Below, the wildcard is represented by the ‘?’ and the ‘%2’ means the scanner will try to match the next byte, 03 in the example below, in the two positions that follow it.

83EB 0274 ??83 EB0? 740A 81EB %2 0301 0000

3. Mismatches – Mismatches permit a certain number of bytes in a string to be of arbitrary value, independent of their position in the string. Whereas wildcards allow substitutions at fixed positions within, say, a 16-byte string, mismatches permit other bytes to be interlaced and so extend the overall length of the scanned string. The idea is that this method can be better at detecting virus families. However, because it handles more complex situations, it is a slower method. For example, the string 11 22 33 44 55 66 77 88, with a mismatch value of 3, would match each of the following:

A3 11 22 33 C9 44 55 66 0A 77 88

11 34 22 33 C4 44 55 66 67 77 88

11 22 33 44 D4 DD E5 55 66 77 88

4. Generic detection – Generic detection takes two or more detected viruses, and looks for a common pattern in the string. It is another method employed to identify virus families, and often makes use of wildcards and mismatches.

5. Bookmarks – Bookmarks are other characteristics of the virus that may be recorded to more accurately detect the virus. Examples of bookmarks include the length of the virus code, or even the distance (offset) between the start of virus body and the detection string.
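The wildcard and mismatch techniques described above can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular scanner is implemented. The function names are hypothetical, and two readings of the text are assumed: ‘%n’ is taken to mean that the next byte may appear in any of the next n positions, and a mismatch value of k is taken to allow up to k arbitrary bytes interleaved between the signature bytes, as in the examples given.

```python
def parse_signature(text):
    """Parse a signature such as '83EB ??83 EB0? %2 0301' into tokens.

    Each token is ("byte", mask, value) or ("skip", n).
    """
    tokens = []
    for group in text.split():
        if group.startswith("%"):
            tokens.append(("skip", int(group[1:])))
            continue
        for i in range(0, len(group), 2):
            mask = value = 0
            for ch in group[i:i + 2]:
                mask, value = mask << 4, value << 4
                if ch != "?":  # '?' leaves this half-byte unconstrained
                    mask |= 0xF
                    value |= int(ch, 16)
            tokens.append(("byte", mask, value))
    return tokens


def match_wildcard(data, pos, tokens, t=0):
    """True if tokens[t:] match data starting at offset pos."""
    while t < len(tokens):
        token = tokens[t]
        if token[0] == "skip":
            # Assumed reading of '%n': the next byte may sit in any of the
            # next n positions, so try skipping 0 .. n-1 bytes.
            return any(match_wildcard(data, pos + k, tokens, t + 1)
                       for k in range(token[1]))
        if pos >= len(data) or (data[pos] & token[1]) != token[2]:
            return False
        pos, t = pos + 1, t + 1
    return True


def match_mismatch(data, pos, pattern, extra, p=0):
    """True if pattern[p:] matches at data[pos:] with at most `extra`
    arbitrary bytes interleaved between the pattern bytes."""
    if p == len(pattern):
        return True
    if pos >= len(data):
        return False
    if data[pos] == pattern[p] and match_mismatch(data, pos + 1, pattern, extra, p + 1):
        return True
    # Treat data[pos] as an interleaved byte (never before the first pattern byte).
    return p > 0 and extra > 0 and match_mismatch(data, pos + 1, pattern, extra - 1, p)


def scan(data, matcher, *args):
    """Return the first offset at which matcher succeeds, or -1."""
    for pos in range(len(data)):
        if matcher(data, pos, *args):
            return pos
    return -1
```

For instance, scan(sample, match_wildcard, parse_signature("83EB 0274 ??83 EB0? 740A 81EB %2 0301 0000")) finds the W32/Beast pattern even when the wildcarded bytes differ, and scan(sample, match_mismatch, bytes.fromhex("1122334455667788"), 3) accepts each of the three mismatch examples above. The recursive matcher is easy to follow but slow, which fits the observation that mismatch scanning costs more than plain string scanning.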

Over time, viruses became more sophisticated and more easily eluded basic scanning techniques. As a result, next generation scanning techniques have been developed. Key examples of these second generation scanners are:

6. Smart scanning – Virus writers began to introduce junk code into viruses, in order to mutate the virus as a kind of camouflage and evade detection while preserving its functionality. Examples of junk code include NOP instructions, or characters like TAB or space, which have no references to data or other subroutines. Smart scanning looks for junk code and skips it, thereby enhancing detection.

7. Skeleton detection – Skeleton detection is used with macro viruses. It parses the code of the virus line by line and drops all non-essential programming statements and white space, so that what is left is the “skeleton” of the code. This allows the scanner to focus on the core macro code common to the macro virus. Skeleton detection does not use strings or checksums.

8. Exact identification – Exact identification uses ranges of constant bytes in the virus body to calculate a checksum of all constant bits of the virus body. The checksum is essentially a form of hash function, an efficient way of looking up a larger set of information or values [Dewdney, 1989]. Variable bytes of the virus body are eliminated. This method guarantees precise identification of virus families, and differentiates between the variants. Knowing the variants can also aid disinfection, because variants may require different methods of disinfection. It is often combined with first generation methods. While precise, this method can be slow, and it can be difficult to map the constant ranges of large viruses.

9. Heuristics analysis – Heuristics (or rules-based) analysis is helpful in detecting macro viruses and new viruses. For example, heuristics may examine the structure of the suspect code for suspicious section names, redirections, or code execution that starts in the last section. One challenge with heuristic analysis is that it can create many false positives.

10. Dynamic detection methods – Dynamic methods generally employ an emulator in the form of a virtual machine that simulates the CPU, memory and operating environment. The malicious code is executed step by step within the virtual machine, where it cannot harm the real system, and the state of the system can be analysed at every instruction. Dynamic methods can be combined with the other methods mentioned above; examples include dynamic heuristic analysis and dynamic decrypter detection.
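Two of the second generation ideas above, smart scanning and exact identification, can be illustrated with a short Python sketch. It rests on simplifying assumptions. Every 0x90 byte is treated as a junk NOP, although a real scanner would need to decode instructions to be sure. CRC-32 from Python’s zlib module stands in for whatever checksum a real product uses. The function names and byte ranges are hypothetical.

```python
import zlib

NOP = 0x90  # single-byte x86 NOP; treating every 0x90 as junk is a simplification


def strip_nops(code):
    """Drop junk bytes so that padding no longer breaks up the signature."""
    return bytes(b for b in code if b != NOP)


def smart_scan(code, signature):
    """Signature scan that skips junk NOPs before comparing."""
    return signature in strip_nops(code)


def exact_id(body, constant_ranges):
    """Checksum only the constant byte ranges of a virus body.

    constant_ranges is a list of (start, end) offsets; bytes outside these
    ranges are treated as variable and ignored.
    """
    crc = 0
    for start, end in constant_ranges:
        crc = zlib.crc32(body[start:end], crc)
    return crc
```

With these definitions, a body padded with NOPs is still found by smart_scan although a plain substring search misses it, and two samples that differ only in their variable bytes yield the same exact_id value, while a change inside a constant range yields a different one.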

The metamorphic virus

Metamorphic viruses pose more challenging problems for detection. Metamorphic viruses do not have a decrypter, or a constant virus body. They have a single code body that carries data as code. They may generate new variants that look different and do not use a constant data area filled with string constants. One of the first known metamorphic viruses was Win95/Regswap, which swapped registers (as explained below) as its method of metamorphosis.

Metamorphic viruses are comprised of several functional units, typically including the following [Patel, 2017]:

1. Locate own code – This unit locates the code so that it can be transformed.

2. Decode – This unit decodes the information needed to execute the transformation.

3. Analyse – This unit analyses and constructs a control flow graph of the program, so that it can be used to re-write the control flow logic of the program if a transformation expands the code.

4. Transform – This unit converts the code into equivalent code for the next generation of the virus.

5. Attach – This unit attaches the new generation of the virus to a host file.

Metamorphic viruses then employ one or more specific techniques to effect the transformation. Key examples are:

1. Garbage code insertion – Inserting garbage or junk code into the virus program is designed to make the code look different but not impact the functionality of the program. This technique is designed to thwart simple signature scanning.

2. Register usage exchange/permutation – The virus may use different registers, but have the same code. This technique can be detected through wildcards.

3. Permutation techniques – The virus may divide the code into subroutines, then use branch instructions so that the virus executes in the correct order. This technique could also include changing the order of subroutines from one generation to another. Further, some viruses may actually replace their instructions with other equivalent instructions.

4. Insertion of jump instructions – The virus may change the sequence of instructions in subsequent generations (even randomly), but also insert jump instructions so that the program executes in the correct order. It is very similar to permutation techniques.

5. Host code mutation – Some viruses may not only mutate themselves from generation to generation, but also mutate the code of the host, creating the opportunity for new viruses or worms to be embedded. To do this, the virus uses a randomly executed code-morphing routine.

6. Code integration – Viruses which employ this technique are extremely sophisticated. One example virus can decompile Portable Executable (PE) files to their smallest elements, move blocks out of the way, insert itself into the code, and rebuild the executable code and data references. Not only are such viruses hard to detect, they are hard to repair [Szor, 2001].
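As noted under register usage exchange, a variant that differs only in which registers it uses can still be caught with wildcards. The Python sketch below illustrates this from the scanner’s side. It relies on the fact that the x86 opcodes for pop r32 (58 to 5F) and mov r32, imm32 (B8 to BF) keep the register number in their low three bits. The two byte sequences are made-up examples for illustration, not taken from any real virus, and find_masked is a hypothetical helper.

```python
# (mask, value) pairs: a byte b matches when b & mask == value.
REG_MASKED_PATTERN = [
    (0xF8, 0x58),  # pop r32        (opcodes 58..5F, register in low 3 bits)
    (0xF8, 0xB8),  # mov r32, imm32 (opcodes B8..BF, register in low 3 bits)
    (0xFF, 0x04), (0xFF, 0x00), (0xFF, 0x00), (0xFF, 0x00),  # imm32 = 4
]


def find_masked(code, pattern):
    """Return the first offset where every byte matches its mask, or -1."""
    for start in range(len(code) - len(pattern) + 1):
        if all((code[start + i] & mask) == value
               for i, (mask, value) in enumerate(pattern)):
            return start
    return -1


# Two illustrative generations that differ only in register choice.
gen_a = bytes.fromhex("5ABF04000000")  # pop edx ; mov edi, 4
gen_b = bytes.fromhex("58BB04000000")  # pop eax ; mov ebx, 4
```

Both gen_a and gen_b match REG_MASKED_PATTERN at offset 0, although they share no fixed signature covering the opcode bytes. A pattern this short would produce false positives in practice, so a real signature would mask register bits over a much longer instruction sequence.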

Clearly, metamorphic viruses represent a significant evolution of the computer virus. Simple search string scanning is easily rendered useless by them, so they pose massive challenges to anti-virus software and must be addressed through more sophisticated methods of detection. It should also be noted that metamorphic viruses are difficult for virus writers to develop.

Metamorphic virus detection

Metamorphic virus detection relies on more advanced techniques, such as analysing the file structure itself, or analysing the behaviour of the code. Given the nature of metamorphic viruses, a “perfect” detection ought to be able to analyse a given instance of the virus and generate its essential instruction code. Current techniques for detecting metamorphic viruses are:

1. Geometric detection – Geometric detection looks for modifications to file structure that are likely due to the virus. For example, if the virtual size of a file is 32KB larger than its physical size, then this could be an indication that an encrypted version of the virus has inserted itself into the data file. However, this technique on its own is subject to many false positives. To mitigate the false positives, this technique can be combined with other techniques, such as bookmarks.

2. Code disassembling – Code disassembling involves separating the virus into individual instructions. This technique can be particularly useful if the virus is inserting junk code and is too long for simple string scanning. This technique can be made more powerful when combined with a dynamic or emulator based technique, because it can look for the same instructions executed in the same order.

3. Code emulation – Code emulation uses a virtual machine environment, allowing the antivirus scanner to observe the executed instructions and look for suspicious code.

4. Machine learning – Since metamorphic viruses can mutate or transform their own bodies, some security researchers are developing more advanced techniques based on machine learning, for example neural networks or hidden Markov models (HMMs). HMMs are a method of statistical pattern analysis used in applications such as speech recognition. The HMM developed by Wong and Stamp [Wong, 2006] uses a state machine in which the transitions between states have fixed probabilities. An HMM is trained to represent a set of data, based on recognising common features between generations of viruses, and can thus identify virus families. Once trained, an HMM should be able to recognise a virus it has never seen before as being a member of the same family. This probabilistic approach is feasible because, even though metamorphic viruses mutate, some similarities remain within metamorphic virus families.
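To make the HMM idea more concrete, the sketch below scores an opcode sequence against an already trained model using the standard forward algorithm with scaling. It is not the model of Wong and Stamp. The two-state model, its probabilities and the four-opcode alphabet are invented purely for illustration, and training (for example with Baum-Welch) is not shown.

```python
import math

OPCODES = {"mov": 0, "push": 1, "pop": 2, "nop": 3}

# Hypothetical "trained" model: (start, transition, emission) probabilities.
TOY_MODEL = (
    [0.9, 0.1],
    [[0.1, 0.9], [0.9, 0.1]],
    [[0.7, 0.1, 0.1, 0.1], [0.1, 0.7, 0.1, 0.1]],
)


def log_likelihood(obs, start, trans, emit):
    """Forward algorithm with per-step scaling; returns log P(obs | model).

    obs must be a non-empty list of symbol indices.
    """
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    total = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                     for j in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")
        total += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return total


def score(opcodes, model):
    """Log-likelihood per opcode, so sequences of different lengths compare."""
    ids = [OPCODES[name] for name in opcodes]
    return log_likelihood(ids, *model) / len(ids)
```

A sequence that resembles the data the model was trained on, such as alternating mov and push for this toy model, receives a higher score than an unrelated sequence such as a run of nop instructions. A detector would flag a file as a likely family member when its score exceeds a threshold chosen from known samples.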

Comparison

In Dr. Cohen’s definition of a computer virus, the three components are the insertion/replication, payload, and trigger. The component of viruses that has evolved the most over the years is the replication component. In simple first generation viruses, the virus simply infects other files by modifying them and attaching itself to them. The virus itself does not change, which makes it easily detectable by antivirus software: the antivirus vendor obtains a copy of the virus, extracts a hexadecimal string from the virus code, and scans for that string on the target computer.

In the case of an encrypted virus, the virus has two parts: a decryption routine (the “decrypter”) and the virus body itself. Antivirus software cannot match the virus body directly because it is encrypted, usually with a different encryption key each time. However, the underlying virus body remains constant, and the decrypter is present unchanged in every infection, so the antivirus detects the virus by scanning for the decrypter.

The polymorphic virus changes the code of the decrypter from generation to generation, while the virus body is constant (but encrypted). Therefore, antivirus software cannot use the search string method of detection on the decrypter. More advanced antivirus software can decrypt the virus body and identify the virus, because eventually the virus has to decrypt itself, with its constant body, somewhere in memory in order to execute its malicious code.

The metamorphic virus does not use encryption. Each generation of a metamorphic virus will not have constant data, and may look completely different whilst still maintaining its functionality.

Weaknesses

Fundamentally, metamorphic viruses are difficult to detect because they are designed to avoid the known techniques of antivirus scanners. Metamorphic viruses change their code from generation to generation and use techniques to hide it, in order to avoid detection by static signature/string scanning. They may even avoid dynamic analysis such as emulators, if they are aware that they are being executed in a controlled environment.

However, metamorphic viruses, while sophisticated, are not without their weaknesses. Chief among these is that, in order to mutate effectively, the metamorphic virus needs to analyse its own code. And there’s the rub. The metamorphic virus faces the same limits as the antivirus software in analysing itself. This is true for each successive generation, and the ability to analyse itself in the current generation will be a function of the complexity of the previous transformation. Metamorphic viruses need to have some special algorithms that will help them detect their own obfuscations [Lakhotia, 2001].

In theory, therefore, antivirus software ought to be able to analyse a metamorphic virus using the virus’ own method of analysis. This would allow the antivirus to reverse the mutation and identify the real virus code. Security researchers would need a sample of the virus in order to extract the algorithm.

Trends/conclusion

Metamorphic viruses are a particularly difficult class of viruses which are not easily detectable by conventional antivirus methods.

There are a number of emerging techniques to address metamorphic viruses, including the machine learning based approaches discussed here. Further, given the ability of metamorphic viruses to mutate, it would help if a kind of shorthand notation or taxonomy could be developed for different types of metamorphic viruses with equivalent functionality (a technique called “canonicalisation” [Gollman, 2011]). This could increase the efficiency and effectiveness of virus scanners, because they would leverage a common version of the code that can be easily recognised.
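The idea of canonicalisation can be illustrated with a small Python sketch that reduces functionally equivalent instruction sequences to one common form. It is a toy under stated assumptions. Instructions are given as already disassembled (mnemonic, operand, ...) tuples, the only junk removed is nop, and registers are simply renamed in order of first use. Real canonicalisers must also handle reordered code, inserted jumps and equivalent instruction substitutions. All names are hypothetical.

```python
JUNK = {"nop"}
REGISTERS = {"eax", "ebx", "ecx", "edx", "esi", "edi"}


def canonicalise(instructions):
    """Drop junk instructions and rename registers by order of first use."""
    names = {}
    canonical = []
    for mnemonic, *operands in instructions:
        if mnemonic in JUNK:
            continue
        renamed = []
        for operand in operands:
            if operand in REGISTERS:
                # First register seen becomes r0, the second r1, and so on.
                names.setdefault(operand, "r%d" % len(names))
                renamed.append(names[operand])
            else:
                renamed.append(operand)
        canonical.append((mnemonic, *renamed))
    return canonical


# Two illustrative generations: different registers, different junk.
gen_a = [("pop", "edx"), ("nop",), ("mov", "edi", "4"), ("add", "edx", "0x88")]
gen_b = [("pop", "eax"), ("mov", "ebx", "4"), ("nop",), ("nop",), ("add", "eax", "0x88")]
```

Here canonicalise(gen_a) and canonicalise(gen_b) produce the same list, so a scanner could compare a single canonical signature against both generations instead of keeping one signature per variant.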

Metamorphism is moving into other forms of malware, such as worms (i.e., infection of networks, not just files), spyware (from websites) and rootkits (which allow threat actors to remotely control others’ computers). Classically, the most common way in which viruses were introduced to a host computer was via email, for example in a phishing scam where a user unwittingly opens an attachment that turns out to be a vehicle for inserting the virus into the computer. Today, users are ever more engaged outside of email, in websites, eCommerce, embedded systems such as the Internet of Things, mobile telephone and data networks, social media and cloud computing, all of which create new opportunities for metamorphism [Anderson, 2008].

More broadly, given the potential for interaction across files, networks, websites, and other domains, there is a potential future threat of multiple metamorphic viruses communicating with each other and operating across these domains in order to improve their resilience. Thus, even if global attacks from viruses may have subsided over time, there is potential for a new generation of metamorphic viruses which may, for example, operate across computing platforms and be even more adept at delivering malicious payloads such as rootkits.

As with other aspects of cyber security, the continuous cat and mouse game between the developers of malware such as metamorphic viruses and the security researchers who are seeking to thwart them goes on. To stay ahead, security researchers will need to develop techniques that not only apply advanced analytical methods and machine learning, but also address metamorphism across computing platforms and technologies.
