Last updated at Fri, 06 Oct 2023 14:35:28 GMT
This week is the Virus Bulletin Conference in London. Part of the conference is the Cyber Threat Alliance summit, where CTA members like Rapid7 showcase their research into all kinds of cyber threats and techniques.
Traditionally, when we investigate a campaign, the focus is mostly on the code of the file, the inner workings of the malware, and communications towards threat actor-controlled infrastructure. Having a background in forensics, and in particular data forensics, I’m always interested in new ways of looking at and investigating data. New techniques can help proactively track, detect, and hunt for artifacts.
In this blog, which highlights my presentation at the conference, I will dive into the world of Shell Link files (LNK) and Virtual Hard Disk files (VHD). As part of this research, Rapid7 is releasing a new feature in Velociraptor that can parse LNK files and will be released with the posting of this blog.
VHD files
VHD and its successor VHDX are formats representing a virtual hard disk. They can contain contents usually found on a physical hard drive, such as disk partitions and files. They are typically used as the hard disk of a virtual machine, are built into modern versions of Windows, and are the native file format for Microsoft's hypervisor, Hyper-V. The format was created by Connectix for their Virtual PC, known as Microsoft Virtual since Microsoft acquired Connectix in 2003. As we will see later, the word “Connectix” is still part of the footer of a VHD file.
Why would threat actors use VHD files in their campaigns? Microsoft has a security technology that is called “Mark of the Web” (MOTW). When files are downloaded from the internet using Windows, they are marked with a secret Zone.Identifier NTFS Alternate Data Stream (ADS) with a particular value called the MOTW. MOTW-tagged files are restricted and unable to carry out specific operations. Windows Defender SmartScreen, which compares files with an allowlist of well-known executables, will process executables marked with the MOTW. SmartScreen will stop the execution of the file if it is unknown or untrusted and will alert the user not to run it. Since VHD files are a virtual hard-disk, they can contain files and folders. When files are inside a VHD container, they will not receive the MOTW and bypass the security restrictions.
Note: Microsoft released a patch for https://msrc.microsoft.com/update-guide/vulnerability/CVE-2022-41091 impacting container files at the end of 2022. This should mitigate this issue by providing a warning to the user.
Depending on the underlying operating system, the VHD file can be in FAT or NTFS. The great thing about that is that traditional file system forensics can be applied. Think about Master-File_Table analysis, Header/Footer analysis and data carving, to name a few.
Example case:
In the past we investigated a case where a threat-actor was using a VHD file as part of their campaign. The flow of the campaign demonstrates how this attack worked:
After sending a spear-phishing email with a VHD file, the victim would open up the VHD file that would auto-mount in Windows. Next, the MOTW is bypassed and a PDF file with backdoor is opened to download either the Sednit or Zebrocy malware. The backdoor would then establish a connection with the command-and-control (C2) server controlled by the threat actor.
After retrieving the VHD file, first it is mounted as ‘read-only’ so we cannot change anything about the digital evidence. Secondly, the Master-File-Table (MFT) is retrieved and analyzed:
Besides the valuable information like creation and last modification times (always take into consideration that these can be altered on purpose), two of the files were copied from a system into the VHD file. Another interesting discovery here is that the VHD disk contained a RECYCLE.BIN file that contained deleted files. That’s great since depending on the filesize of the VHD (the bigger, the more chance that files are not overwritten), it is possible to retrieve these deleted files by using a technique called “data carving.”
Using Photorec as one of the data carving tools, again the VHD file is mounted read-only and the tool pointed towards this share to attempt to recover the deleted files.
After running for a short bit, the deleted files could be retrieved and used as part of the investigation. Since this is not relevant for this blog, we continue with the footer analysis.
Footer analysis of a VHD file
The footer, which is often referred to as the trailer, is an addition to the original header that is appended to the end of a file. It is a data structure that resembles a header.
A footer is never located at a fixed offset from the beginning of an image file unless the image data is always the same size because by definition it comes after the image data, which is typically of variable length. It is often situated a certain distance from the end of a picture file. Similar to headers, footers often have a defined size. A rendering application can use a footer's identification field or magic number, like a header's, to distinguish it from other data structures in the file.
When we look at the footer of the VHD file, certain interesting fields can be observed:
These values are some of the examples of the data structures that are specified for the footer of a VHD file, but there are also other values like “type of disk” that can be valuable during comparisons of multiple campaigns by an actor.
From the screenshot, we can see that “conectix” is the magic number value of the footer of a VHD file, you can compare it to a small fingerprint. From the other values, we can determine that the actor used a Windows operating system, and we can derive from the HEX value the creation time of the VHD file.
From a threat hunting or tracking perspective, these values can be very useful. In the below example, a Yara rule was written to identify the file as a VHD file and secondly the serial number of the hard drive used by the actor:
Shell link files (LNK), aka Shortcut files
A Shell link, also known as a Shortcut, is a data object in this format that houses data that can be used to reach another data object. Windows files with the "LNK" extension are in a format known as the Shell Link Binary File Format. Shell links can also be used by programs that require the capacity to store a reference to a destination file. Shell links are frequently used to facilitate application launching and linking scenarios, such as Object Linking and Embedding (OLE).
LNK files are massively abused in multiple cybercrime campaigns to download next stage payloads or contain code hidden in certain data fields. The data structure specification of LNK files mentions that LNK files store various information, including “optional data” in the “extra data” sections. That is an interesting area to focus on.
Below is a summarized overview of the Extra Data structure:
The ‘Header’ LinkInfo part contains interesting data on the type of drive used, but more importantly it contains the SerialNumber of the hard drive used by the actor when creating the LNK file:
Other interesting information can be found; for example, around a value with regards to the icon used and in this file used, it contains an interesting string.
Combining again that information, a simple Yara rule can be written for this particular LNK file which might have been used in multiple campaigns:
One last example is to look for the ‘Droids’ values in the Extra Data sections. Droids stands for Digital Record Object Identification. There are two values present in the example file:
The value in these fields translates to the MAC address of the attacker’s system… yes, you read this correctly and may close your open mouth now…
Also this can be used to build upon the previous LNK Yara rule, where you could replace the “.\\3.jpg” part with the MAC address value to hunt for LNK files that were created on that particular device with that MAC address.
In a recent campaign called “Raspberry Robin”, LNK files were used to distribute the malware. Analyzing the LNK files and using the above investigation technique, the following Yara rule was created:
Velociraptor LNK parser
Based on our research into LNK files, an updated LNK parser was developed by Matt Green from Rapid7 for Velociraptor, our advanced open-source endpoint monitoring, digital forensics, and cyber response platform.
With the parser, multiple LNK files can be processed and information can be extracted to use as an input for Yara rules that can be pushed back into the platform to hunt.
Windows.Forensics.Lnk parses LNK shortcut files using Velociraptor’s built-in binary parser. The artifact outputs fields aligning to Microsoft’s ms-shllink protocol specification and some analysis hints to assist review or detection use cases. Users have the option to search for specific indicators in key fields with regex, or control the definitions for suspicious items to bubble up during parsing.
Some of the default targeted suspicious attributes include:
- Large size
- Startup path location for auto execution
- Environment variable script — environment variable with a common script configured to execute
- No target with an environment variable only execution
- Suspicious argument size — large sized arguments over 250 characters as default
- Arguments have ticks — ticks are common in malicious LNK files
- Arguments have environment variables — environment variables are common in malicious LNKs
- Arguments have rare characters — look for specific rare characters that may indicate obfuscation
- Arguments that have leading space. Malicious LNK files may have many leading spaces to obfuscate some tools
- Arguments that have http strings — LNKs are regularly used as a download cradle
- Suspicious arguments — some common malicious arguments observed in field
- Suspicious trackerdata hostname
- Hostname mismatch with trackerdata hostname
Due to the use of Velociraptor’s binary parser, the artifact is significantly faster than other analysis tools. It can be deployed as part of analysis or at scale as a hunting function using the IOCRegex and/or SuspiciousOnly flag.
Summary
It is worth investigating the characteristics of file types we tend to skip in threat actor campaigns. In this blog I provided a few examples of how artifacts can be retrieved from VHD and LNK files and then used for the creation of hunting logic. As a result of this research, Rapid7 is happy to release a new LNK parser feature in Velociraptor and we welcome any feedback.