Combatting Botnet Traffic with Behavioral Analysis: Part III

Posted by Will Woodson | Lead Security Engineer on Jul 12, 2018 11:54:37 AM

In the first two parts of this series we discussed methods to identify and block a few types of botnet traffic: commodity comment/form spam and more targeted attacks such as distributed password guessing. This final part covers a different way of analyzing malicious web traffic: grouping attackers based on their behavioral characteristics.

Behavioral Analysis - Grouping Bot Actors

Tracking Behavior Across Multiple Actors

As we saw in the previous post, Bot Behavior - Distributed Attacks, bots can be used to evade basic volumetric attack detection. Because each bot may use a unique source IP address, aggregating the traffic for alerting, reporting, or blocking can be difficult.

We can try to work around this by baselining traffic patterns for a given URL within a web application, but rules built around these traffic norms are not always sufficient to distinguish botnet traffic and take appropriate action against offenders.
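As a rough illustration of URL-level baselining, the sketch below flags a request count that sits well above the historical norm for one URL, regardless of how many source IPs contributed to it. The function name, the three-standard-deviation threshold, and the sample counts are all assumptions made for this example, not part of any ThreatX rule set.

```python
from statistics import mean, pstdev

def is_anomalous(history, current, sigmas=3.0):
    """Return True if `current` exceeds the historical mean by `sigmas` standard deviations."""
    baseline = mean(history)
    spread = pstdev(history)
    return current > baseline + sigmas * spread

# Made-up hourly request counts for a single URL, summed across all source IPs.
hourly_requests = [40, 42, 38, 41, 39]

print(is_anomalous(hourly_requests, 43))   # within the normal range
print(is_anomalous(hourly_requests, 400))  # far above the baseline
```

A check like this can notice that a URL is under unusual load, but it says nothing about which of the requesters belong together, which is why it needs to be complemented by grouping.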

To complement web application baselining we also need a method for grouping attackers by their behavior. Below is an example heuristic for clustering attacks based on common characteristics: the first step in using behavior to identify likely botnets.

Clustering Attacks - Analysis

For the purpose of this analysis, the overall profile of an attacker or "threat actor" is made up of all of their attacks, which are just the subset of their HTTP requests that we've already decided are malicious in some way.

If one attacker's attacks are similar enough to another threat actor's, we can group the two and call the combined entity a 'botnet', attributing control of each threat actor to a single entity or individual. In other words, the actors are bots, and each one's individual risk, behavior, etc. contributes to the profile of the whole.
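One simple way to express "similar enough" between two actor profiles is set overlap. The sketch below uses Jaccard similarity over the (hostname, path) targets each actor attacked. This is a stand-in technique chosen for illustration, not the heuristic used in the rest of this post, and the actor names, hosts, and 0.5 threshold are hypothetical.

```python
def jaccard(a, b):
    """Overlap between two sets of targets: size of intersection over size of union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical profiles: each actor maps to the set of (hostname, path) targets it attacked.
profiles = {
    "actor-1": {("www.example.test", "/login"), ("www.example.test", "/admin")},
    "actor-2": {("www.example.test", "/login"), ("www.example.test", "/admin"),
                ("www.example.test", "/debug")},
    "actor-3": {("shop.example.test", "/cart")},
}

def similar_pairs(profiles, threshold=0.5):
    """Return pairs of actors whose target overlap meets the threshold."""
    actors = sorted(profiles)
    return [(x, y)
            for i, x in enumerate(actors)
            for y in actors[i + 1:]
            if jaccard(profiles[x], profiles[y]) >= threshold]

print(similar_pairs(profiles))  # [('actor-1', 'actor-2')]
```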

Example attacks.json

The following 25 records are a stripped-down and sanitized set of attack logs from the ThreatX WAF, containing only enough information for the clustering example in the next section. In practice, each attack logs more than 15 fields on which we could potentially make clustering decisions.


Clustering Attacks - Python Example

The example code below clusters the attack JSON provided in the previous section. To cluster attacks we're using a very basic set of rules:

1. Start with a likeness score of 0 each time an attack is compared against an existing cluster

2. If the attack hostname (HTTP Host: header) matches an existing cluster, add 0.25 to likeness

3. If the attack path (the resource path portion of the URL in the request) matches an existing cluster, add 0.50 to likeness

4. If the attack timestamp is within roughly 16.7 hours (60000s) of the last timestamp on an existing cluster, add 0.25 to likeness

5. If the attack's total likeness is >= 0.75 for an existing cluster, add the attack to that cluster (and stop checking likeness against the rest of the clusters)

6. If the attack's total likeness is < 0.75 for all existing clusters, create a cluster for the attack, using the attack's hostname, path, and timestamp as metadata for the new cluster
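To make the arithmetic in rules 2 through 4 concrete, here is a minimal sketch that scores a single attack against a single cluster. The helper name `likeness`, the `attack_epoch` parameter, and the sample hosts and timestamps are hypothetical and exist only for this illustration.

```python
WINDOW_SECONDS = 60000  # time window from rule 4

def likeness(attack, cluster, attack_epoch):
    """Score one attack against one cluster using the weights from rules 2-4."""
    score = 0.0
    if attack["hostname"] == cluster["prototype_hostname"]:
        score += 0.25
    if attack["path"] == cluster["prototype_path"]:
        score += 0.50
    if abs(attack_epoch - cluster["last_timestamp"]) <= WINDOW_SECONDS:
        score += 0.25
    return score

# Made-up cluster and attacks for illustration only.
existing = {"prototype_hostname": "a.example.test",
            "prototype_path": "/wp-login.php",
            "last_timestamp": 1000}

same_path_other_host = {"hostname": "b.example.test", "path": "/wp-login.php"}
same_host_other_path = {"hostname": "a.example.test", "path": "/xmlrpc.php"}

print(likeness(same_path_other_host, existing, 2000))  # 0.75 -> joins the cluster
print(likeness(same_host_other_path, existing, 2000))  # 0.5  -> starts a new cluster
```

With these weights, hostname and time together only reach 0.50, so a path match is required to join a cluster, and it must be accompanied by either a hostname match or a timestamp inside the window.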

Clustering based primarily on resource path gives us a pretty simple approximation of intent:

The actor intended to perform some kind of attack against the resource located at this path.

At this point, we don't really know if the actor was successful, the type of attack (we've stripped those parts out), or who actually controls the actor, but adding the other scores and clustering based on total likeness allows us to at least state:

These actors intended to perform some attacks against this resource path maybe somewhere around the same time, and possibly against the same host.

Which, though a big leap, is enough to say for the purpose of this example:

These actors could be coordinated; let's consider them to be controlled by the same entity (botnet).

Example Code

#!/usr/bin/env python3
import argparse, datetime, hashlib, json

# get attacks source file
parser = argparse.ArgumentParser()
parser.add_argument('-a', '--attack_file', required=True, help="attacks.json")
args = parser.parse_args()

# attempt to cluster the attack with others
def cluster(attack, clusters):
    # get an epoch timestamp we can work with easily
    # NOTE: the original expression is missing from this listing; the two lines
    # below are an assumption that attack['timestamp'] is a UTC string
    # formatted like YYYY-MM-DDTHH:MM:SSZ
    parsed = datetime.datetime.strptime(attack['timestamp'], '%Y-%m-%dT%H:%M:%SZ')
    last_timestamp = int(parsed.replace(tzinfo=datetime.timezone.utc).timestamp())
    for existing in clusters:
        likeness = 0.0
        # test attack likeness to existing cluster based on field matches
        if attack['hostname'] == existing['prototype_hostname']:
            likeness += 0.25
        if attack['path'] == existing['prototype_path']:
            likeness += 0.50
        if abs(last_timestamp - existing['last_timestamp']) <= 60000:
            likeness += 0.25
        # if alike enough, add the attack to cluster
        if likeness >= 0.75:
            existing['attacks'].append(attack)
            # update the cluster last_timestamp if newer
            if last_timestamp > existing['last_timestamp']:
                existing['last_timestamp'] = last_timestamp
            return clusters
    # if the attack didn't match a cluster, it is unique enough to prototype new cluster
    new_cluster = {}
    new_cluster['attacks'] = [attack]
    # cluster metadata
    cid = "%s:%s:%s" % (attack['ip'], attack['hostname'], attack['path'])
    new_cluster['cluster_id'] = hashlib.sha1(cid.encode('utf-8')).hexdigest()
    new_cluster['last_timestamp'] = last_timestamp
    # prototype defines the cluster
    new_cluster['prototype_hostname'] = attack['hostname']
    new_cluster['prototype_path'] = attack['path']
    # add new cluster to list of clusters
    clusters.append(new_cluster)
    return clusters

def main():
    # init clusters data
    clusters = []
    # load the attack data
    with open(args.attack_file, 'r') as data:
        attack_data = json.load(data)
    # cluster the attacks
    for attack in attack_data:
        clusters = cluster(attack, clusters)
    # print the clusters
    print(json.dumps(clusters, indent=4))

if __name__ == '__main__':
    main()

Example Clustering Results

Below are selected results from our basic clustering algorithm. It successfully identified the behavior seen in Part II: Case #1 - Distributed Password Guessing (/wp-login.php) and Part II: Case #2 - Distributed Parameter Fuzzing (/path/to/debug/script.php) as related, even though the attacks in each case came from multiple source IP addresses and, in the case of the debug script, also targeted different hosts.



    {
        "cluster_id": "2d303dc16dc0a40c5c2b97d28a3c2d32cf881362",
        "last_timestamp": 1529963834,
        "prototype_path": "/wp-login.php",
        "prototype_hostname": "wordpress.threatxlabs.local",
        "attacks": [ ... ]
    },
    {
        "cluster_id": "94d0b5a7b4f6ff580d7474f26eac0a6a3ecc9c10",
        "last_timestamp": 1529954635,
        "prototype_path": "/path/to/debug/script.php",
        "prototype_hostname": "www.threatxlabs.local",
        "attacks": [ ... ]
    },
    {
        "cluster_id": "adc0522852e356492dbc3939658ebe3c6ad5db27",
        "last_timestamp": 1529955308,
        "prototype_path": "/wwwstats",
        "prototype_hostname": "payments.threatxlabs.local",
        "attacks": [ ... ]
    },
    {
        "cluster_id": "1f1d730e936e217d903c895f7f8ff5485b7725ef",
        "last_timestamp": 1529963811,
        "prototype_path": "/htaccess",
        "prototype_hostname": "securecheckout.threatxlabs.local",
        "attacks": [ ... ]
    }





We also see examples of one-off attacks from the example data set; these were sorted into clusters of their own.

Taking Action on the Entire Botnet

What does grouping attackers get us?

The clustering example included in this post is just the beginning of the techniques that can be applied to identify like attack traffic. By clustering attacks and grouping attackers we gain the ability to make decisions based on the entire group's behavior. For botnet traffic, this means quickly detecting, tarpitting, and/or blocking malicious traffic from all identified members of the botnet, potentially before the individual member is able to fully participate in the attack.
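As a sketch of what acting on the whole group could look like, the snippet below takes clusters shaped like the output of the example code (each with a `cluster_id` and a list of `attacks` carrying an `ip` field) and lists the distinct source IPs for every cluster that has more than one source. The function name, the `min_sources` threshold, and the sample addresses are assumptions for illustration; how the resulting list is fed into blocking or tarpitting is left open.

```python
def botnet_members(clusters, min_sources=2):
    """Map cluster_id to its sorted distinct source IPs, keeping multi-source clusters only."""
    members = {}
    for c in clusters:
        ips = sorted({attack["ip"] for attack in c["attacks"]})
        if len(ips) >= min_sources:
            members[c["cluster_id"]] = ips
    return members

# Made-up clusters using documentation-reserved IP ranges.
sample_clusters = [
    {"cluster_id": "c1",
     "attacks": [{"ip": "192.0.2.10"}, {"ip": "192.0.2.11"}, {"ip": "192.0.2.10"}]},
    {"cluster_id": "c2",
     "attacks": [{"ip": "198.51.100.7"}]},
]

print(botnet_members(sample_clusters))
# {'c1': ['192.0.2.10', '192.0.2.11']}
```

One-off clusters with a single source are left out here, since a lone actor hitting a path once is weak evidence of coordination.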

ThreatX is continuously updating our capability to perform this kind of clustering based on attacker behavior. This, combined with active interrogation of suspicious actors and dynamic site profiling, ensures malicious bots are quickly identified and stopped.

Topics: Threat Intelligence
