The following is from ai-one’s white paper on Machine Learning for Cyber Security at Network Speed & Scale. Click here to download a copy from SlideShare (registration required).
A Call to Action
Our research indicates that cyber security is far worse than is commonly reported in news outlets. We estimate there is an extreme shortage of human capital with the skills necessary to thwart attacks from rapidly evolving, highly adaptive adversaries. Research for this paper includes publicly available sources of information found on the Internet as well as interviews with network and software security experts and experts in artificial intelligence. We speculate on how machine learning might impact the security of large-scale (enterprise) networks from both offensive and defensive perspectives. In particular, we seek ways that machine learning might create and thwart zero-day attacks in networks deploying the most current security technologies, such as neural-network-enabled intrusion detection and prevention systems (IDPS), heuristic and fuzzy-matching anti-malware software, distributed firewalls, and packet encryption technologies. Furthermore, we evaluate ways that adaptive adversaries might bypass application-level security measures such as:
- address space layout randomization (ASLR)
- heap hardening
- data execution prevention (DEP)
We conclude that machine learning provides first-mover advantages to both attackers and defenders. However, we find that machine learning's ability to understand complexity provides the greater advantage to network defenders when deployed as part of a multi-layer defensive framework.
As networks grow in value they become exponentially more vulnerable to cyber attacks. Metcalfe's Law states that the value of a network is proportional to the square of the number of its users. From a practical standpoint, usability is proportional to functionality: the more a network can do, the more people will use it. From a cyber security standpoint, each additional function (or application) running on a network increases the threat surface. Vulnerabilities grow super-linearly because attacks can happen both at the application surface (through an API) and in the connections between applications (through malicious packets).
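The super-linear growth described above can be illustrated with a toy calculation based on the footnoted corollary to Metcalfe's Law. The function name and the proportionality constant are ours, for illustration only:

```python
# Toy illustration of super-linear vulnerability growth.
# Per the paper's footnoted corollary to Metcalfe's Law, threat
# exposure T is proportional to n^2 * p^2, where n = connected
# users and p = APIs. The constant k is arbitrary here.

def threat_exposure(users: int, apis: int, k: float = 1.0) -> float:
    """Relative threat exposure: T = k * n^2 * p^2."""
    return k * (users ** 2) * (apis ** 2)

# Doubling both users and APIs multiplies exposure by 16x, not 4x.
base = threat_exposure(100, 10)
doubled = threat_exposure(200, 20)
print(doubled / base)  # 16.0
```

The point of the sketch is only the shape of the curve: modest growth in users and applications compounds into a much larger attack surface.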
Coordinated cyber attacks using more than one method are the most effective means to find zero-day vulnerabilities. The December 2009 attack on Google reportedly relied upon exploiting previously discovered pigeonholes to extract information while human analysts were concurrently distracted by what appeared to be an unrelated attack.
Sources & Types of Cyber Attacks
- Internal threats (employees, contractors, etc.)
- External threats (hostile nations, terrorist organizations, criminals, etc.)
Cyber attacks are usually derivatives of previously successful tactics. Attackers know that software programmers are human: they make mistakes. Moreover, they tend to repeat the same mistakes, making it relatively easy to exploit vulnerabilities once they are detected. Thus, if a hacker finds that a particular part of a network can be breached with a particular byte-pattern (such as a birthday attack), he will often create numerous variations of that pattern to secure future entry into the network (such as a pigeonhole).
Let’s evaluate a few of these types of attacks to compare and contrast computer programming and machine learning approaches to exploiting and defending against cyber vulnerabilities.
Exploiting API Weaknesses (Application Hijacking)
Detecting flaws in application programming interfaces (APIs) is a rapidly evolving form of cyber attack in which vulnerabilities in the underlying application are exploited. For example, an attacker may use video files to embed code that will cause a video player to erase files. This approach often involves incrementally inserting malicious code, frame-by-frame, to corrupt the file buffer and/or hijack the application. The incremental approach depends upon finding flaws within the code base, which is easily done if the attacker has access to the application outside the network, such as a commercial or open-source copy of the software.
Programming Measures and Counter-Measures to API Exploits
Traditional approaches to thwarting derivative attacks on an API are relatively straightforward and human-resource intensive: First, the attack is analyzed to identify markers (such as identifiers within the packet payload). Next, the markers are categorized, classified and recorded, usually into a master library (e.g., McAfee Global Threat Intelligence). Finally, anti-malware software (such as McAfee) and IDPS network appliances (such as ForeScout CounterACT) scan packets to detect threats from known sources (malware, IPs, DNS, etc.). Threats that are close derivatives of known threats are easily thwarted using lookup tables, algorithms and heuristics, while anomalous network behavior is concurrently detected and isolated for further human review.
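The analyze-record-scan pipeline above can be sketched in a few lines. This is a minimal illustration, not any vendor's implementation; the payloads, the choice of a hash as the "marker", and all names are invented for illustration:

```python
# Minimal sketch of a signature-based pipeline: extract a marker
# from a known-bad payload, record it in a master library, then
# scan new packets against the recorded signatures.

import hashlib

threat_library: set[str] = set()  # stands in for a master signature library

def marker(payload: bytes) -> str:
    """Derive a simple marker (here, a hash) from a packet payload."""
    return hashlib.sha256(payload).hexdigest()

def record_threat(payload: bytes) -> None:
    """Analyze a known-bad payload and add its marker to the library."""
    threat_library.add(marker(payload))

def is_known_threat(payload: bytes) -> bool:
    """Scan a packet payload against recorded signatures."""
    return marker(payload) in threat_library

record_threat(b"\x90\x90\x90EVIL")
print(is_known_threat(b"\x90\x90\x90EVIL"))   # True: exact match caught
print(is_known_threat(b"\x90\x90\x90EVIL2"))  # False: a trivial variant slips by
```

The last line previews the weakness discussed next: an exact-signature defense knows only what it has been told, so even a one-byte variant evades it until a human records the new marker.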
Problems with the Computer Programming Approach
“Should we fear hackers? Intent is at the heart of this question.” — Kevin Mitnick, hacker, after his release from federal prison in 2000.
There are many problems with defenses that know only what they are programmed to know. First, it is almost impossible for a person to predict, and program a computer to handle, every possible attack. Even if one could, it is practically impossible to scale human resources to address each potential threat as network complexity grows exponentially. A single adaptive adversary can keep many security analysts very busy. Next, cyber threats are far easier to produce than they are to detect: it takes roughly ten times more effort to isolate and develop counter-measures against a virus than it does to create one. Finally, the sheer scale of external intelligence and human resources far outstrips the defensive resources available inside the firewall. For example, the US Army’s estimated 21,000 security analysts must counter the collective learning capacity and computational resources of all hackers seeking to disrupt ARCYBER, potentially facing a 100:1 disadvantage worldwide.
Moreover, new approaches to malware involve incremental loading of fragments of malware into a network where they are later assembled and executed by a native application. Often the malicious code fragments are placed over many disparate channels and inputs thereby disguising themselves as noise or erroneous packets.
Machine Learning Measures and Counter-Measures to API Exploits
Machine learning is an ideal technology for both attacking and defending against API source code vulnerabilities. Knowing that programmers tend to repeat mistakes, an attacker can find similarities across the code base to identify vulnerabilities. A sophisticated attacker might use genetic algorithms and/or statistical techniques (such as naïve Bayes) to find new vulnerabilities that are similar to those found previously. Machine learning gives defenders an advantage over attackers because it can detect these flaws before an attack occurs. This enables the defender to entrap, deceive or deploy other counter-measures against the attacker.
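To make the naïve Bayes idea concrete, here is a hand-rolled sketch that scores code snippets by how closely their tokens resemble snippets with previously found flaws. The training snippets, tokenization and class labels are invented for illustration; a real system would use far richer features than bare identifiers:

```python
# Tiny naive Bayes classifier over code tokens: learn token
# frequencies from known-vulnerable and known-safe snippets,
# then score new code by its log-odds of resembling the former.

import math
import re
from collections import Counter

def tokens(code: str) -> list[str]:
    return re.findall(r"[A-Za-z_]\w*", code)

class NaiveBayes:
    def __init__(self):
        self.counts = {"vuln": Counter(), "safe": Counter()}
        self.totals = {"vuln": 0, "safe": 0}

    def train(self, code: str, label: str) -> None:
        for t in tokens(code):
            self.counts[label][t] += 1
            self.totals[label] += 1

    def score(self, code: str) -> float:
        """Log-odds that a snippet resembles known-vulnerable code.
        Uses add-one (Laplace) smoothing over the shared vocabulary."""
        vocab = set(self.counts["vuln"]) | set(self.counts["safe"])
        logodds = 0.0
        for t in tokens(code):
            p_v = (self.counts["vuln"][t] + 1) / (self.totals["vuln"] + len(vocab))
            p_s = (self.counts["safe"][t] + 1) / (self.totals["safe"] + len(vocab))
            logodds += math.log(p_v / p_s)
        return logodds

nb = NaiveBayes()
nb.train("strcpy(buf, input); gets(line);", "vuln")            # classic unsafe calls
nb.train("strncpy(buf, input, n); fgets(line, n, f);", "safe")  # bounded variants
print(nb.score("strcpy(dest, user_data);") > 0)  # True: resembles the flawed pattern
```

The same scoring works for either side of the contest: an attacker ranks unseen code for likely flaws, while a defender ranks its own code base for audit priority before the attack arrives.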
Machine learning provides a first-mover advantage to both defender and attacker, but the advantage is far stronger for the defender because it can detect any anomaly within the byte-patterns of the network, even after malicious code has bypassed cyber defenses, as in a sleeper attack. Thus, the attacker would need to camouflage byte-patterns in addition to finding and exploiting vulnerabilities, adding tremendous complexity to his tactics. Since machine learning becomes more intelligent with use, the defender’s systems harden with each attack, becoming exponentially more secure over time.
Impersonating Legitimate Users
Counterfeiting network authentication to gain illicit access to network assets is one of the oldest tricks in the hacker’s book. This can be done as easily as leaving a thumb drive infected with malware in a parking lot for a curious insider to insert into a network computer. It can also involve sophisticated social engineering to crack passwords, find use patterns and points of entry for a hacker to impersonate a legitimate user.
Programming Measures and Counter-Measures to Impersonations
Traditional defenses against impersonation attacks depend upon user authentication and controlling access to network assets with predetermined permissions. Once an attacker is inside the network with a false identity, he can roam freely so long as he does not trigger any alarms by violating those permissions. This defense is entirely programmatic: it assumes that an attacker who gets past the firewall will behave differently than a legitimate user. Even if he behaves legitimately, however, the attacker can use his presence to learn about network assets and attack them in other ways. For example, he can identify APIs and network appliances, and determine other security protocols, to find further vulnerabilities that might be compromised with an external attack.
Problems with the Computer Programming Approach to Prevent Impersonations
Rules-based permissions are only as good as the rules’ ability to model human behavior. Attackers familiar with these rules and with standard network security practices can easily stay within the acceptable boundaries of use.
Machine Learning Measures and Counter-Measures to Impersonation
In the case of insider threats, machine learning provides the defender more advantages than the attacker. Although attackers can use machine learning of byte-patterns to “hack” an identity, they are limited to behaving exactly as that identity would, to the extent that they must know how that person has behaved in the past and how the system will perceive their every movement. The defender’s advantage is that machine learning creates an “entology” – an ontology of the entity – for every authenticated user. This is a heterarchical representation of all past behavior at the byte or packet level. It enables network security to evaluate use patterns to find anomalies that would be difficult (if not impossible) to predict with a set of computer programming commands. Machine learning does not depend on rules, only on observation, to find associations and patterns. This can be done at every point within the network: routers, network appliances, APIs, database access points, etc.
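The observation-based approach above can be sketched as a per-user profile that flags actions deviating from learned history. The event names, the frequency threshold and the class name are invented for illustration; the paper envisions this operating at the byte or packet level, not on labeled events:

```python
# Illustrative per-entity behavior profile: record observed events,
# then flag any event that is absent from, or rare within, the
# entity's own history. No rules are written; only observation.

from collections import Counter

class EntityProfile:
    def __init__(self, rarity_threshold: float = 0.05):
        self.events = Counter()
        self.rarity_threshold = rarity_threshold

    def observe(self, event: str) -> None:
        self.events[event] += 1

    def is_anomalous(self, event: str) -> bool:
        """Flag events absent from, or rare within, the learned history."""
        total = sum(self.events.values())
        if total == 0:
            return True  # no history yet: treat everything as anomalous
        return self.events[event] / total < self.rarity_threshold

profile = EntityProfile()
for _ in range(50):
    profile.observe("login:workstation-7")
    profile.observe("read:shared-drive")
print(profile.is_anomalous("read:shared-drive"))   # False: routine behavior
print(profile.is_anomalous("dump:user-database"))  # True: never seen before
```

An impersonator constrained by such a profile must reproduce the legitimate user's entire behavioral history, which is exactly the asymmetry the paper argues favors the defender.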
Notes
1. The shortage of cyber warriors in the US Government is widely reported. For example, see http://www.npr.org/templates/story/story.php?storyId=128574055
2. Threats to the Information Highway: Cyber Warfare, Cyber Terrorism and Cyber Crime
3. V ∝ n², where value (V) is proportional to the square of the number of connected users of a network (n).
4. Threat vulnerability is a corollary to Metcalfe’s Law whereby each additional network connection provides an additional point of security exposure: T ∝ n²p², where vulnerability (T) is proportional to the square of the number of connected users of a network (n) times the square of the number of APIs (p).
5. Interview with a former hacker (anonymous).
6. Yamaguchi, Fabian. “Automated Extraction of API Usage Patterns from Source Code for Vulnerability Identification.” Diploma Thesis, TU Berlin, January 2011.
7. Examples of this technique were discussed at the BlackHat Security Conference in early August 2011.
8. For a discussion of sleeper attacks see: Borg, Scott. “Securing the Supply Chain for Electronic Equipment: A Strategy and Framework.” The Internet Security Alliance report to the White House (available at http://www.whitehouse.gov/files/documents/cyber/); see also The US Cyber Consequences Unit (http://www.usccu.us/).
9. Interview with a former forensic network security agent at a major investment bank.