Last updated at Wed, 08 Jan 2025 19:43:51 GMT
As botnets continue to evolve, so do the techniques required to detect them. While Transport Layer Security (TLS) encryption is widely adopted for legitimate secure communications, botnets also leverage TLS to obscure command-and-control (C2) traffic. The TLS certificates these malicious actors use, however, often contain identifiable characteristics, opening a potential pathway for advanced detection techniques.
In first-of-its-kind research, Rapid7’s Dr. Stuart Millar, in collaboration with Kumar Shashwat, Francis Hahn, and Prof. Xinming Ou of the University of South Florida, studied the use of AI large language models (LLMs) to detect botnets' use of TLS encryption, analyzing embedding similarities to weed out botnet certificates from a sea of benign ones. The work was presented toward the end of last year at AISec 2024 in Salt Lake City, part of the leading ACM CCS conference, where Rapid7 previously collected the best paper award.
Botnets — networks of hacked devices that attackers control remotely — often use TLS encryption to hide their activity. This encryption keeps the traffic secure, making it challenging for traditional security tools to detect whether a device is part of a botnet. Millar and company found they could detect botnets by analyzing the unique characteristics in the TLS certificates that each server uses to identify itself, dramatically reducing the time and human effort required.
Large language models can represent text as embeddings, or numerical vectors that capture the meaning and structure of the text. These embeddings were used to create vector representations of the text in TLS certificates, such as the organization names and country codes listed on them. By projecting these representations into a vector space and then using a similarity search, any new certificate can first be compared to a known set of botnet and benign certificates, and then a decision made as to whether or not it is malicious.
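The embed-then-compare decision described above can be sketched as a nearest-neighbour lookup. This is a minimal illustration, not the paper's implementation: the three-dimensional vectors below stand in for real LLM embeddings of certificate text, and the labels are invented for the example.

```python
from math import sqrt

def cosine(a, b):
    # cosine similarity between two embedding vectors
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

# Illustrative reference embeddings; in practice these come from an LLM
# encoder applied to certificate fields (organization name, country code, ...).
known = [
    ([0.9, 0.1, 0.0], "benign"),
    ([0.8, 0.2, 0.1], "benign"),
    ([0.1, 0.9, 0.3], "botnet"),
]

def classify(query):
    # nearest-neighbour decision: the most similar known certificate wins
    return max(known, key=lambda kv: cosine(query, kv[0]))[1]

print(classify([0.15, 0.85, 0.25]))  # -> botnet
```

In a production setting the reference set would hold thousands of embeddings, so an approximate-nearest-neighbour index would replace the linear scan, but the decision rule stays the same.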
Using an open-source LLM called C-BERT, the model achieved an accuracy of 0.994, surpassing proprietary alternatives in accuracy, speed, and cost-efficiency. This means it could reliably distinguish botnet from benign certificates far more effectively and efficiently than standard practice, a result confirmed through random sampling.
To simulate a real-world scenario, the researchers tested the model on 150,000 TLS certificates. It flagged 13 certificates as potential botnets which, when verified against a malware detection service, yielded one certificate confirmed as malicious. This approach eliminated the time-intensive and costly process of identifying malicious botnet certificates manually.
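That triage step, narrowing a large batch down to a handful of candidates worth sending to a verification service, can be sketched as a threshold on similarity to known-botnet embeddings. The threshold value, reference vectors, and function names here are illustrative assumptions, not details from the paper.

```python
from math import sqrt

def cosine(a, b):
    # cosine similarity between two embedding vectors
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

# Illustrative known-botnet embeddings; in practice these would be LLM
# encodings of certificate text from confirmed botnet C2 servers.
BOTNET_REFS = [[0.1, 0.9, 0.3], [0.2, 0.7, 0.6]]
THRESHOLD = 0.95  # illustrative cut-off, not a value from the paper

def triage(batch):
    """Return IDs of certificates similar enough to a known botnet
    embedding to deserve external verification."""
    flagged = []
    for cert_id, emb in batch:
        if max(cosine(emb, ref) for ref in BOTNET_REFS) >= THRESHOLD:
            flagged.append(cert_id)
    return flagged

batch = [("cert-001", [0.9, 0.1, 0.0]),     # far from any botnet reference
         ("cert-002", [0.12, 0.88, 0.31])]  # very close to a botnet reference
print(triage(batch))  # -> ['cert-002']
```

Only the flagged IDs go on to the costly verification stage, which is how a scan of 150,000 certificates collapses into a manual check of just a few candidates.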
The model was also able to identify zero-day botnets, or those that had not been documented before. By omitting certain known botnets during training and then testing with these omitted samples, the researchers demonstrated that the model could still detect them, even without prior exposure.
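The holdout evaluation works because a never-before-seen botnet certificate can still land near embeddings of other botnet families. A minimal sketch of that leave-one-family-out test, with invented embeddings, family names, and threshold:

```python
from math import sqrt

def cosine(a, b):
    # cosine similarity between two embedding vectors
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

# Toy corpus of (embedding, family) pairs; all values are illustrative.
certs = [
    ([0.9, 0.1, 0.0], "benign"),
    ([0.8, 0.2, 0.1], "benign"),
    ([0.1, 0.9, 0.3], "botnet_A"),
    ([0.2, 0.8, 0.4], "botnet_B"),
]

def is_botnet(query, refs, threshold=0.9):
    # flag the query if it sits close to ANY known botnet embedding
    botnets = [e for e, fam in refs if fam.startswith("botnet")]
    return any(cosine(query, e) >= threshold for e in botnets)

# Hold out family botnet_B entirely, then test with one of its samples:
held_out = [0.2, 0.8, 0.4]
refs = [(e, f) for e, f in certs if f != "botnet_B"]
print(is_botnet(held_out, refs))  # -> True: it lands near botnet_A
```

The held-out sample is detected despite never appearing in the reference set, mirroring how the model generalized to undocumented botnets.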
Deploying this AI solution in a real-world environment offers cybersecurity teams a substantial advantage in botnet detection by reducing false positives and minimizing manual inspection. Future research aims to expand the range of certificate attributes used in embeddings, improve real-time processing capabilities, and integrate additional datasets for a broader scope. Explore the full research paper for an in-depth look at the methodology and results of an LLM-based approach to botnet TLS certificate detection.