December 2020, the weeks before Christmas, saw an increase in reported malware activity that culminated most prominently in the Sunburst Trojan attacks - events that are still developing as of today. As we were asserting our readiness to respond to new threats under our watch, we identified a suspicious executable being copied to a remote network share.
When notepad.exe connects to the Internet…
We caught a sample of an executable named functrl64.exe as it was moving laterally between hosts, and a file named functrl32.dat shortly after that. VirusTotal already strongly suggested that we were dealing with some type of trojan or backdoor, so at this point we expected to find a remote access mechanism or a gadget installing a backdoor. What we found was…
a) Pestudio file analysis summary
b) ResourceHacker screenshot
… notepad++.exe. Or it would be more accurate to say: A program that was trying very hard to look like Notepad++. It is not without irony, paraphrasing virtually any textbook about reverse engineering on process injection - Notepad is the textbook example of a harmless process being hijacked.
Luckily, Notepad++ is open source, which facilitates the analysis immensely: We can download the original source code from GitHub and place it side by side with our sample. If we can identify the points at which the paths of execution diverge between the original source code and the malicious sample, we can be fairly confident that we are looking at modifications made by the malware authors. And indeed, we quickly come across a modification that does not exist in the official release:
a) Extract of decompiled notepad++.exe (Official release version 7.8.6)
b) Extract of decompiled functrl64.exe
You can easily convince yourself that the decompiled listings in Fig. 2a and Fig. 2b show variants of the same original code base, give or take a few compiler settings, optimizations, and assumptions made by the decompiler. In the juxtaposition, there are a number of recognizable function calls at (1), (2), (3), and (4). The sequence of instructions in Fig. 2a roughly pairs with that in Fig. 2b but starts diverging visibly around (5). Looking closer at how the malicious sample proceeds, we can see that it opens a handle to a file whose name and path match those of the second sample that was found, functrl32.dat. The modified code range is preceded by a routine loading resources from a file stylers.model.xml, and followed by one loading resources from userDefineLang.xml. Both can be identified in the official version and in the sample (userDefineLang.xml is not shown here).
Fig. 3 shows the modifications in between these two that were found in functrl64.exe, but not in the official release.
The malicious executable opens functrl32.dat (1) and copies it into a local buffer (2). Next, it applies what appears to be a decryption routine to the buffer (3). The program then changes the protection of the buffer’s memory to read-write-execute (4), and as soon as the payload has been delivered, functrl64.exe deletes the file from disk (5).
At this point we are realizing how lucky we were and how important it was to catch the second sample in transit: If we had only been monitoring activity on the disk, our analysis might have already hit an unwelcome dead end by now. It is time to have a closer look at functrl32.dat:
0010h: DA 4B 9A 43 82 84 4B 32 31 DE 43 82 84 52 4B C2  ÚKšC‚„K21ÞC‚„RKÂ
0020h: 31 58 49 C2 31 83 43 82 84 43 2E 84 31 80 B9 46  1XIÂ1ƒC‚„C.„1€¹F
0030h: 74 C2 AB AA DB FF A3 A8 D4 FF FF FF 54 E1 59 C7  t«ªÛÿ£¨ÔÿÿÿTáYÇ
0040h: 54 4F 5A C7 99 FB 4B C2 F1 FB 4B C2 F1 E0 C2 DD  TOZÇ™ûKÂñûKÂñàÂÝ
0050h: 24 69 67 5C E7 BD 1B 5C E7 02 08 34 97 F7 AA 22  $ig\ç½.\ç..4—÷ª"
0060h: FF F3 AA 22 FF 24 15 B2 FF 24 15 B2 FF 24 15 B2  ÿóª"ÿ$.²ÿ$.²ÿ$.²
0070h: FF 24 15 B2 FF 24 15 B2 FF 24 15 B2 FF 24 15 B2  ÿ$.²ÿ$.²ÿ$.²ÿ$.²
0080h: 4F 24 15 B2 C1 3B EF BC C1 4F 26 71 E0 B7 27 3D  O$.²Á;ï¼ÁO&qà·'=
0090h: 2D 96 F3 55 04 65 D3 A5 F6 4A B4 57 97 67 94 34  -–óU.eÓ¥öJ´W—g”4
...
We already know that the loader removes a layer of encryption, so it is no surprise that we are initially looking at a seemingly random jumble. One thing not visible here, because it happens later in the code, is that once the file has been copied into a local buffer, functrl64.exe uses the pointer to the buffer as a function pointer and calls it. There are other giveaways, too, that the de-obfuscated file contains executable code.
0010h: 5A 8B 1A 83 C2 04 8B 32 31 DE 83 C2 04 52 8B 02  Z‹.ƒÂ.‹21ÞƒÂ.R‹.
0020h: 31 D8 89 02 31 C3 83 C2 04 83 EE 04 31 C0 39 C6  1؉.1ÃÂ.ƒî.1À9Æ
0030h: 74 02 EB EA 5B FF E3 E8 D4 FF FF FF 54 21 D9 47  t.ëê[ÿãèÔÿÿÿT!ÙG
0040h: 54 0F DA 47 19 7B 8B 02 F1 7B 8B 02 F1 20 02 DD  T.ÚG.{‹.ñ{‹.ñ .Ý
0050h: A4 A9 E7 5C 67 BD 9B 5C 67 42 48 34 97 F7 EA 62  ¤©ç\g½›\gBH4—÷êb
0060h: FF F3 EA 62 FF A4 15 B2 FF A4 15 B2 FF A4 15 B2  ÿóêbÿ¤.²ÿ¤.²ÿ¤.²
0070h: FF A4 15 B2 FF A4 15 B2 FF A4 15 B2 FF A4 15 B2  ÿ¤.²ÿ¤.²ÿ¤.²ÿ¤.²
0080h: 0F A4 15 B2 01 BB AF BC 01 0F A6 71 20 B7 A7 3D  .¤.².»¯¼..¦q ·§=
0090h: ED 96 F3 55 84 E5 D3 25 F6 8A B4 57 97 E7 94 34  í–óU„åÓ%öŠ´W—ç”4
...
Jump and call instructions in the x86 instruction set are remarkably recognisable. Shellcode has to rely on position-independent code because it cannot anticipate its location in memory or the location of any other structures. It so happens that x86 CALL instructions to a relative offset are encoded as E8, JMP instructions to a relative 8-bit offset as EB, and the opcodes of almost all variants of short conditional jumps, i.e. conditional jumps to an 8-bit offset are between 70 and 7F. An elevated density of bytes looking like these opcodes is often an indicator for executable code, especially if their operands look like legitimate jump targets. The disassembled version reveals a second de-obfuscation routine:
In Fig. 6, execution first jumps in two steps from (1) via (2) to (3), and then calls back to offset 031E1118. A CALL directly followed by a POP instruction is a common method for shellcode to obtain the value of the instruction pointer. If the author has placed data somewhere in the shellcode, the instruction pointer is a useful position-independent base address from which to locate said data.
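As a rough illustration of the opcode heuristic and the relative branches described above, the following Python sketch scans a buffer for E8, EB and 70-7F bytes and keeps only those whose decoded target lies inside the buffer. The function name `plausible_branches` and the in-buffer test are assumptions made for this example. A naive byte scan like this will also match operand bytes in larger samples. The input is the first 48 bytes of the de-obfuscated dump, so the offsets are relative to the start of the stub.

```python
import struct

def plausible_branches(buf: bytes):
    """Return (offset, target) pairs for branch-like opcodes whose target lies inside buf."""
    found = []
    for off, op in enumerate(buf):
        if op == 0xE8 and off + 5 <= len(buf):
            # CALL rel32: the target is relative to the end of the 5-byte instruction
            rel = struct.unpack_from("<i", buf, off + 1)[0]
            target = off + 5 + rel
        elif (op == 0xEB or 0x70 <= op <= 0x7F) and off + 2 <= len(buf):
            # JMP rel8 / Jcc rel8: the target is relative to the end of the 2-byte instruction
            rel = struct.unpack_from("<b", buf, off + 1)[0]
            target = off + 2 + rel
        else:
            continue
        if 0 <= target < len(buf):
            found.append((off, target))
    return found

# Rows 0010h-0030h of the de-obfuscated hex dump
stub = bytes.fromhex(
    "5A 8B 1A 83 C2 04 8B 32 31 DE 83 C2 04 52 8B 02 "
    "31 D8 89 02 31 C3 83 C2 04 83 EE 04 31 C0 39 C6 "
    "74 02 EB EA 5B FF E3 E8 D4 FF FF FF 54 21 D9 47"
)
print([(hex(o), hex(t)) for o, t in plausible_branches(stub)])
# [('0x20', '0x24'), ('0x22', '0xe'), ('0x27', '0x0')]
```

All three candidates resolve to targets inside the stub, and the E8 at 0x27 points back to the very first byte, which is the CALL/POP pattern discussed here.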
The CALL at (3) thus stores the address of the instruction directly following it, and POP EDX loads this address again. In the loop between (4) and (5) this address serves as a pointer into an array, which the loop decrypts in place. Judging by the instruction bytes in the dump above, the loop works on four bytes at a time rather than on single bytes. The address of the very first decrypted datum would later turn out to be used as a function pointer: after the loop has run, this is the next target to which the shellcode jumps at (6). Let us look one more time at the resulting hex dump:
0010h: 5A 8B 1A 83 C2 04 8B 32 31 DE 83 C2 04 52 8B 02  Z‹.ƒÂ.‹21ÞƒÂ.R‹.
0020h: 31 D8 89 02 31 C3 83 C2 04 83 EE 04 31 C0 39 C6  1؉.1ÃÂ.ƒî.1À9Æ
0030h: 74 02 EB EA 5B FF E3 E8 D4 FF FF FF 54 21 D9 47  t.ëê[ÿãèÔÿÿÿT!ÙG
0040h: 54 0F DA 47 4D 5A 52 45 E8 00 00 00 00 5B 89 DF  T.ÚGMZREè....[‰ß
0050h: 55 89 E5 81 C3 14 7C 00 00 FF D3 68 F0 B5 A2 56  U‰å.Ã.|..ÿÓhðµ¢V
0060h: 68 04 00 00 00 57 FF D0 00 00 00 00 00 00 00 00  h....WÿÐ........
0070h: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0080h: F0 00 00 00 0E 1F BA 0E 00 B4 09 CD 21 B8 01 4C  ð.....º..´.Í!¸.L
0090h: CD 21 54 68 69 73 20 70 72 6F 67 72 61 6D 20 63  Í!This program c
00A0h: 61 6E 6E 6F 74 20 62 65 20 72 75 6E 20 69 6E 20  annot be run in
00B0h: 44 4F 53 20 6D 6F 64 65 2E 0D 0D 0A 24 00 00 00  DOS mode....$...
00C0h: 00 00 00 00 A1 5A 12 04 E5 3B 7C 57 E5 3B 7C 57  ....¡Z..å;|Wå;|W
00D0h: E5 3B 7C 57 58 74 EA 57 E4 3B 7C 57 FB 69 F8 57  å;|WXtêWä;|WûiøW
00E0h: CD 3B 7C 57 FB 69 E9 57 F1 3B 7C 57 FB 69 FF 57  Í;|WûiéWñ;|WûiÿW
00F0h: 67 3B 7C 57 C2 FD 07 57 EE 3B 7C 57 E5 3B 7D 57  g;|WÂý.Wî;|Wå;}W
0100h: 28 3B 7C 57 FB 69 F5 57 2F 3B 7C 57 FB 69 EE 57  (;|WûiõW/;|WûiîW
0110h: E4 3B 7C 57 FB 69 ED 57 E4 3B 7C 57 52 69 63 68  ä;|WûiíWä;|WRich
0120h: E5 3B 7C 57 00 00 00 00 00 00 00 00 00 00 00 00  å;|W............
0130h: 00 00 00 00 50 45 00 00 4C 01 04 00 02 B2 A0 5F  ....PE..L....² _
...
Looks familiar! Apparently, functrl32.dat contains an embedded PE file (see Portable Executable), the primary format for Windows executable files. A PE file starts with a small DOS executable, indicated by the magic number 4D 5A (“MZ”) and a stub program containing the string “This program cannot be run in DOS mode”. The PE header itself begins after the DOS stub with the signature 50 45 00 00. If we dump the successfully unpacked file to disk, we can now put the debugger aside and inspect the file statically in a disassembler like IDA or Ghidra.
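Returning briefly to the second de-obfuscation layer: read from the instruction bytes in the dump (an interpretation of the bytes, not the authors’ source), the loop appears to be a chained XOR over 32-bit words. The word following the CALL acts as the initial key, the next word XORed with the key gives the payload length, and each ciphertext word becomes the key for the word after it. The function name `unchain_xor` is hypothetical. A minimal sketch under these assumptions:

```python
import struct

def unchain_xor(blob: bytes) -> bytes:
    """Decode a buffer laid out as: key dword, (length ^ key) dword, chained-XOR payload."""
    key, masked_len = struct.unpack_from("<II", blob, 0)
    remaining = masked_len ^ key          # payload length in bytes
    out = bytearray()
    pos = 8
    while remaining > 0 and pos + 4 <= len(blob):
        cipher = struct.unpack_from("<I", blob, pos)[0]
        out += struct.pack("<I", cipher ^ key)
        key = cipher                      # the ciphertext word keys the next word
        pos += 4
        remaining -= 4
    return bytes(out)

# Bytes 003Ch-004Fh of the dump after the first layer was removed.
# The input is truncated here, so only three words are decoded.
blob = bytes.fromhex("54 21 D9 47 54 0F DA 47 19 7B 8B 02 F1 7B 8B 02 F1 20 02 DD")
print(unchain_xor(blob).hex())   # 4d5a5245e8000000005b89df
```

The decoded bytes match offsets 0044h-004Fh of the final dump, beginning with “MZRE”.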
A DLL in not-so-disguise
The first thing a static analyser can tell is whether a given PE file is an executable image or a DLL. The PE file header consists of three main components: the signature, the COFF (Common Object File Format) file header, and the Optional Header (see PE Format). The Characteristics field at offset 18 in the COFF header, i.e. at offset 22 from the PE signature, confirms that the de-obfuscated functrl32.dat is indeed a DLL. As a DLL, this file generally cannot be run independently. In contrast to executable images, which as consumers primarily import functionality from libraries, DLLs export their own interface functions. DLLs can implement a DllMain function for one-time initialization of resources that the library may require to function properly. This function is called by the Windows loader when it has successfully finished loading the library into a process.
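The header check described above can be sketched in a few lines of Python. The flag value 0x2000 (IMAGE_FILE_DLL) and the e_lfanew field at offset 0x3C of the DOS header come from the PE format. The helper `fake_header` and its sample values are made up purely to exercise the function without a real file.

```python
import struct

IMAGE_FILE_DLL = 0x2000  # Characteristics flag defined by the PE format

def is_dll(image: bytes) -> bool:
    """Follow e_lfanew to the PE signature and test the DLL bit in Characteristics."""
    if image[:2] != b"MZ":
        raise ValueError("missing DOS magic")
    e_lfanew = struct.unpack_from("<I", image, 0x3C)[0]
    if image[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # 4 bytes of signature + offset 18 inside the COFF header = 22
    characteristics = struct.unpack_from("<H", image, e_lfanew + 22)[0]
    return bool(characteristics & IMAGE_FILE_DLL)

def fake_header(characteristics: int) -> bytes:
    """Build a minimal, otherwise empty header for demonstration (hypothetical data)."""
    hdr = bytearray(0x200)
    hdr[0:2] = b"MZ"
    struct.pack_into("<I", hdr, 0x3C, 0xF0)          # e_lfanew
    hdr[0xF0:0xF4] = b"PE\x00\x00"
    struct.pack_into("<H", hdr, 0xF0 + 22, characteristics)
    return bytes(hdr)

print(is_dll(fake_header(0x2102)))   # True
print(is_dll(fake_header(0x0102)))   # False
```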
Figure 8: Export table of functrl32.dat (Ghidra)
As shown in Fig. 8, functrl32.dat exports precisely one function: _ReflectiveLoader@4. Further, it shows that the DLL’s name in the export directory is beacon.dll. This looks alarmingly familiar. In 2008, Stephen Fewer, then an independent researcher at Harmony Security, published a paper on Reflective DLL Injection, which he later followed up with a proof of concept. A beacon is generally a malware component which serves as a client, registering itself with a C2 server to request further instructions. We are going to talk about the beacon in more depth in a dedicated article. For now, let us focus on the reflective loader.
Stephen Fewer describes in his paper a technique to load a library into a process without invoking the Windows loader, but where the library performs the necessary bootstrapping itself. This task is further complicated by the custom loader not being able to make any assumptions about its own location in memory. Hence again, the bootstrapping code needs to be position independent and needs to manually locate the system library functions that it requires.
Injection Techniques
Injection techniques in general all try to instrument an otherwise harmless or trusted program. The objective is to coerce the program into executing code that did not originate from that program. If successful, the task of preventing harm becomes considerably more difficult. Malware on disk can be statically analyzed and is therefore the easiest to identify and neutralize, provided it is not encrypted in any shape or form. Malware in process memory, though, is only observable in two ways:
- Calls to external APIs can be intercepted and recorded,
- One can take a snapshot of the process memory at a given instance for analysis.
Malware trying to stay under the radar is therefore often trying to reduce its visible surface in four ways: a) Minimize the time spent on disk, b) obfuscate the resources that have to touch disk, c) hide in benign or seemingly benign processes, and d) perform setup actions autonomously without the help of the system.
DLL Injection is a subcategory of Process Injection, which aims to load malware covertly on a system into a target process. Admittedly, DLL Injection is not only performed by malware. Replacing a DLL, a Dynamic-Link Library, at load time for instance, with a custom implementation of that DLL can be a legitimate way to record an application’s behaviour by tallying the calls and call arguments of the application that flow through the DLL. But this case would still require the DLL to be recognizable as such. Imagine you were a hacker and wanted to bring your tool set to work: Would you expect to pass through a security control carrying a lock pick and a crow bar?
The Windows Loader and Reflective Loading
To reiterate what we said about DLL Injection: The aim is to deploy a payload in the form of a library into a process and coerce the target process into executing it. The difference in our specific functrl32.dat case, is that the target is not a remote process, but the calling notepad++.exe imposter. Secondly as mentioned, after being copied into the process’ address space, the library here performs the task of the Windows loader itself. But what is the task of the Windows loader anyway?
The Windows loader, or more correctly the image loader, is a set of API functions prefixed with Ldr in the user-mode system DLL ntdll.dll. The image loader has a number of responsibilities; the three that are arguably most significant, at least from our perspective, are:
- The initialisation of the user-mode state of the process,
- The resolution of imports of the process from DLLs that it requires, and
- The dynamic loading and unloading of DLLs at runtime.
The first two tasks are also the last stages in the creation of new processes. The image loader takes over after the kernel-side initialisation of the process, e.g. the allocation of the process control block, the address space of the process, and mapping the image into the process, has been completed. For a new process, that typically involves the creation of internal exception handler tables, the creation of a process heap, checks whether the CLR (Common Language Runtime) needs to be invoked, etc.
It also involves the creation of a module database, which is where things become interesting for us again. The module database is maintained by the loader throughout the lifetime of the process. It keeps track of all the modules that are mapped into the process’ address space, including ntdll.dll itself as the first entry, and the main executable as the second. The Ldr field at offset 0x0C on x86 of the Process Environment Block (PEB), which is mapped by the kernel into the address space of the process, points to a PEB_LDR_DATA structure. This structure contains three linked lists of LDR_DATA_TABLE_ENTRY elements. These entries store the information about the modules that have been loaded by the image loader. All three lists contain the same entries and only vary in the order in which they are linked. At process startup, the loader parses the import directory of the executable, locates each dependency on disk, maps it into the address space of the process and adds a new database entry. After the DLL is loaded, the loader queries its export table for the functions that were requested by the executable or other DLLs in the process. This procedure is repeated recursively for each newly loaded dependency. After the entire dependency graph has been resolved and loaded into memory, the loader calls DllMain for each DLL, allowing them to run per-library initializations.
If the application calls LoadLibrary or one of its sibling functions, the loader performs the same steps again to resolve the newly requested dependency. This all assumes that the image, regardless of whether it is the main executable or a DLL, was loaded at its preferred base address as specified by the ImageBase field in the PE header. If not, the loader has to parse the .reloc section of the PE file and apply all of the listed relocations, so that calls to imported functions and references to global and static data point to the correct addresses.
Obviously, the reflective loader is not going to mirror all of this behaviour, and it does not need to. In fact, not adding a new entry to the module database for the reflectively loaded DLL is a considerable benefit to the attacker. Without such an entry, there is no traceable evidence of the DLL that can be queried from outside the process, other than a seemingly lost range of memory pages marked with read-write-execute permissions in the process’ address space. Images loaded in the orderly way do not leave such pages behind, as the section parsing is normally performed in kernel space. Only then is the image mapped into process memory, at which point a) page permissions have been set correctly, and b) the module database is amended by the loader.
As Stephen Fewer points out, the minimal but critical steps that the reflective loader has to implement to load an arbitrary DLL are:
- Find the base address of the image,
- Locate kernel32.dll in the module database of the current process and retrieve the addresses to the functions LoadLibraryA, GetProcAddress, and VirtualAlloc,
- Copy the image to a new permanent, page aligned memory address,
- Parse and load all sections of the image file,
- Resolve the imports of the DLL (using LoadLibraryA and GetProcAddress),
- Perform relocations, and
- Call the DLL’s entry point with DLL_PROCESS_ATTACH.
Back to functrl32.dat
_ReflectiveLoader@4 first retrieves the current EIP and uses it as the starting point to walk back through memory to find the base address of the image. Remember that the file has been copied into a buffer that has been allocated with LocalAlloc, meaning there are no guarantees in terms of memory alignment, etc.
Figure 9: Disassembly _ReflectiveLoader@4 at offset 10008A6B
Fig. 9 shows the function walking back byte by byte, looking for the memory address that holds the DOS magic number “MZ\x00\x00” at (1), which marks the base address of the image. You may wonder about this, since Fig. 7 shows that the four bytes at the image base address are “MZRE”. Effectively, the function only compares the first two bytes, trying to match “MZ”. Once it has found a candidate for the base address, it tentatively treats it as a DOS header and reads the offset to the PE header at (2). The offset needs to lie within a reasonable range, i.e. at least as large as the size of the DOS header, which is 64 bytes, and no larger than 1 KB. If so, the reflective loader checks at (3) whether the PE header offset actually points to the PE header signature, to avoid false positives. Without these additional sanity checks, the function might accidentally dereference an invalid address and crash the process.
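The backward search can be sketched in Python over a byte buffer standing in for process memory. The function name `find_image_base` and the test layout are assumptions for illustration. The layout places “MZ” at 0x44, e_lfanew = 0xF0 and the PE signature at 0x134, mirroring the final hex dump, plus a stray “MZ” that the sanity checks should reject.

```python
import struct

def find_image_base(mem: bytes, start: int) -> int:
    """Walk backwards from start; return the offset of a plausible DOS header, or -1."""
    pos = min(start, len(mem) - 2)
    while pos >= 0:
        # Only the two bytes "MZ" are compared, as described in the text
        if mem[pos:pos + 2] == b"MZ" and pos + 0x40 <= len(mem):
            e_lfanew = struct.unpack_from("<I", mem, pos + 0x3C)[0]
            # The PE header offset must lie between 64 bytes and 1 KB
            if 0x40 <= e_lfanew <= 0x400:
                sig = pos + e_lfanew
                if mem[sig:sig + 4] == b"PE\x00\x00":
                    return pos
        pos -= 1
    return -1

mem = bytearray(0x400)
mem[0x44:0x46] = b"MZ"
struct.pack_into("<I", mem, 0x44 + 0x3C, 0xF0)   # e_lfanew of the real header
mem[0x134:0x138] = b"PE\x00\x00"
mem[0x200:0x202] = b"MZ"                         # decoy without a valid PE header

print(hex(find_image_base(bytes(mem), 0x300)))   # 0x44
```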
After storing the image base address, the next step is to find the module database and search it for specific modules that the process has hopefully already loaded. We do not know exactly where the module database is in memory, but we know that it is referenced by the PEB. We do not know where the PEB is located either. However, the location of the Thread Environment Block (TEB) on Windows (on x86) can reliably be found via the FS segment register.
Figure 10: Disassembly _ReflectiveLoader@4 at offset 10008AD3
With the help of the TEB at (1) in Fig. 10, the PEB can be located by dereferencing the pointer at offset 0x30. From there we know how to proceed: look for the module database at offset 0xC (2). Inside the PEB_LDR_DATA structure, we can choose one of the three linked lists. This code chooses the linked list at offset 0x14 (3), which links the entries in memory placement order.
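To make the pointer chain concrete, here is a small Python simulation that walks the same offsets over a fake 32-bit address space. Everything about the fake memory (the class `FakeMemory`, the addresses, the two module entries and their base addresses) is invented for illustration. Only the structure offsets follow the x86 layout described above, with DllBase and BaseDllName addressed relative to the InMemoryOrderLinks field of each entry.

```python
import struct

class FakeMemory:
    """A flat, made-up 32-bit address space backed by a bytearray."""
    def __init__(self, size: int = 0x1000):
        self.data = bytearray(size)
    def u16(self, addr: int) -> int:
        return struct.unpack_from("<H", self.data, addr)[0]
    def u32(self, addr: int) -> int:
        return struct.unpack_from("<I", self.data, addr)[0]
    def put16(self, addr: int, value: int) -> None:
        struct.pack_into("<H", self.data, addr, value)
    def put32(self, addr: int, value: int) -> None:
        struct.pack_into("<I", self.data, addr, value)

def walk_modules(mem: FakeMemory, teb: int):
    """Yield (name, base) for each entry of the in-memory-order module list."""
    peb = mem.u32(teb + 0x30)              # TEB -> PEB
    ldr = mem.u32(peb + 0x0C)              # PEB -> PEB_LDR_DATA
    head = ldr + 0x14                      # list head of InMemoryOrderModuleList
    link = mem.u32(head)                   # first Flink
    while link != head:                    # the list is circular
        base = mem.u32(link + 0x10)        # DllBase
        length = mem.u16(link + 0x24)      # BaseDllName.Length (in bytes)
        buf = mem.u32(link + 0x28)         # BaseDllName.Buffer
        yield bytes(mem.data[buf:buf + length]).decode("utf-16-le"), base
        link = mem.u32(link)               # follow Flink

def build_demo():
    mem, teb, peb, ldr = FakeMemory(), 0x100, 0x200, 0x300
    mem.put32(teb + 0x30, peb)
    mem.put32(peb + 0x0C, ldr)
    entries = [(0x400, "ntdll.dll", 0x77000000), (0x500, "kernel32.dll", 0x76000000)]
    head = ldr + 0x14
    chain = [head] + [addr + 0x08 for addr, _, _ in entries] + [head]
    for prev, nxt in zip(chain, chain[1:]):
        mem.put32(prev, nxt)               # only Flink is needed for this walk
    for addr, name, base in entries:
        raw = name.encode("utf-16-le")
        mem.data[addr + 0x40:addr + 0x40 + len(raw)] = raw
        mem.put32(addr + 0x18, base)       # DllBase
        mem.put16(addr + 0x2C, len(raw))   # BaseDllName.Length
        mem.put32(addr + 0x30, addr + 0x40)  # BaseDllName.Buffer
    return mem, teb

mem, teb = build_demo()
for name, base in walk_modules(mem, teb):
    print(name, hex(base))
```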
Each LDR_DATA_TABLE_ENTRY contains static information about the loaded module, including the base address of the module, and a UNICODE_STRING with the name of the DLL. The reflective loader uses a simple ror13 hash algorithm to compare the DLL names against some precomputed hashes. Specifically, it is looking for the module whose name hashes to 0x6A4ABC5B, which is kernel32.dll. If the entry matches, the loader can just follow the pointer to the base address of kernel32.dll, and locate its export directory. The export directory lists the exported functions in lexicographical order.
Figure 11: Disassembly _ReflectiveLoader@4 at offset 10008C10
By the same ror13 hash method, the loader tries to locate six specific functions in kernel32.dll:
- LoadLibraryA (hash = 0xEC0E4E8E)
- GetProcAddress (hash = 0x7C0DFCAA)
- VirtualAlloc (hash = 0x91AFCA54)
- VirtualProtect (hash = 0x7946C61B)
- LoadLibraryExA (hash = 0x753A4FC)
- GetModuleHandleA (hash = 0xD3324904)
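The hashes listed above can be reproduced with a short Python sketch of the ror13 scheme. The split into two functions is an assumption based on Stephen Fewer’s public proof of concept: function names are hashed as plain ASCII, while module names are hashed over the bytes of their UTF-16 string with lowercase letters folded to uppercase. Whether this sample follows the proof of concept in every detail is not confirmed here.

```python
def ror13(value: int) -> int:
    """Rotate a 32-bit value right by 13 bits."""
    return ((value >> 13) | (value << 19)) & 0xFFFFFFFF

def hash_function_name(name: str) -> int:
    """ror13-and-add over the ASCII bytes of an exported function name."""
    h = 0
    for byte in name.encode("ascii"):
        h = (ror13(h) + byte) & 0xFFFFFFFF
    return h

def hash_module_name(name: str) -> int:
    """Same scheme over the UTF-16LE bytes of a module name, folded to uppercase."""
    h = 0
    for byte in name.encode("utf-16-le"):
        if byte >= ord("a"):
            byte -= 0x20
        h = (ror13(h) + byte) & 0xFFFFFFFF
    return h

print(hex(hash_function_name("LoadLibraryA")))   # 0xec0e4e8e
print(hex(hash_module_name("kernel32.dll")))     # 0x6a4abc5b
```

Comparing hashes rather than strings means that the names of the modules and functions being searched for never have to appear in the binary.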
Next, the reflective loader has to properly align its own DLL in memory which involves copying it over to a new page aligned address. After that, first the import directory of the DLL is parsed, using the previously detected function pointers to LoadLibraryA and GetProcAddress to resolve the imports in the directory. Lastly, the loader has to go through the base relocation table and block for block change the listed references to properly account for the new base address of the image. The reflective loader has to account for the different possible types of relocations specified for each entry in the block. It could be as simple as adjusting the reference by the delta between the preferred image address, as specified in the PE header, and the actual address it has now in memory. More involved are the relocations where only the high or low order bits are adjusted, or which are assembled from multiple relocation entries in the block. At this point one more task remains to be done, although the malicious DLL still takes detours to zero out its traces - e.g. module and function names that were required along the way (cf. Fig. 11), pointers to the functions that have been used to perform the loading, etc.
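The simpler relocation cases described above can be sketched as follows. The block layout (a page RVA and block size followed by 16-bit entries, with the type in the upper four bits and the page offset in the lower twelve) and the type constants come from the PE format. The function name, the tiny image buffer and both base addresses are made up for illustration. Relocation types that are assembled from several entries are not handled in this sketch.

```python
import struct

IMAGE_REL_BASED_ABSOLUTE = 0   # padding entry, skipped
IMAGE_REL_BASED_HIGH = 1       # adjust only the high 16 bits
IMAGE_REL_BASED_LOW = 2        # adjust only the low 16 bits
IMAGE_REL_BASED_HIGHLOW = 3    # adjust a full 32-bit address

def apply_relocations(image: bytearray, reloc_rva: int, reloc_size: int, delta: int) -> None:
    """Patch a mapped image in place; delta = actual base - preferred base."""
    pos, end = reloc_rva, reloc_rva + reloc_size
    while pos + 8 <= end:
        page_rva, block_size = struct.unpack_from("<II", image, pos)
        if block_size < 8:
            break                                   # malformed block, stop
        for entry_pos in range(pos + 8, pos + block_size, 2):
            entry = struct.unpack_from("<H", image, entry_pos)[0]
            kind, target = entry >> 12, page_rva + (entry & 0x0FFF)
            if kind == IMAGE_REL_BASED_HIGHLOW:
                value = struct.unpack_from("<I", image, target)[0]
                struct.pack_into("<I", image, target, (value + delta) & 0xFFFFFFFF)
            elif kind == IMAGE_REL_BASED_HIGH:
                value = struct.unpack_from("<H", image, target)[0]
                struct.pack_into("<H", image, target, (value + (delta >> 16)) & 0xFFFF)
            elif kind == IMAGE_REL_BASED_LOW:
                value = struct.unpack_from("<H", image, target)[0]
                struct.pack_into("<H", image, target, (value + delta) & 0xFFFF)
        pos += block_size

# Hypothetical image: one pointer at RVA 0x104, one relocation block at RVA 0x200
image = bytearray(0x300)
struct.pack_into("<I", image, 0x104, 0x10001234)    # linked for base 0x10000000
struct.pack_into("<IIHH", image, 0x200, 0x100, 12,
                 (IMAGE_REL_BASED_HIGHLOW << 12) | 0x004, 0)
apply_relocations(image, 0x200, 12, 0x031E0000 - 0x10000000)
print(hex(struct.unpack_from("<I", image, 0x104)[0]))   # 0x31e1234
```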
Figure 12: Disassembly _ReflectiveLoader@4 at offset 100089E8
To complete loading, the DLL needs to be notified that it has been attached to the process in an orderly fashion. At (1) in Fig. 12, the entry point, i.e. DllMain, which has been located and stored during the preceding loading procedure, is called with the argument DLL_PROCESS_ATTACH at (2).
DllMain is in fact called twice: the first call completes the attachment of the DLL, and the second call passes a custom notification code to execute the payload. The payload that we found (remember beacon.dll?) is the subject of another dedicated article.
Detecting the Dropper and the Reflectively Loaded DLL
How does any of this behaviour stand out from the ordinary and hint at a program delivering a malicious payload? At which point in the process can we observe indicators of compromise? Can we statically analyse and detect this attack pattern apart from fingerprinting the samples that have already been detected?
A running system generates very noisy data about the software running on it, and much of this software is performing legitimate and useful tasks. In the case of delivering a payload via reflective DLL injection, it is particularly difficult to single out isolated features that serve as reliable indicators of an attack or an exploit. Even Microsoft acknowledges that detecting reflective DLL loading is challenging solely based on static indicators. They suggest a behavioural approach, monitoring memory allocations of a process and correlating features such as: Allocation size, allocation history, information from the specific thread requesting the allocation, etc.
In our specific malware case, there have been static indicators, which in combination should raise the suspicion of a malware detection engine. We also happen to have caught another sample from the same malware authors, so we can cross check, which of these indicators could potentially scale to detect variants of this attack:
- Most obviously, the name of the executable, when it was copied over network, differed from the name found in the version document in the resources directory. Unfortunately, this is not true for the second sample that we found. The second sample was a recompiled version of a Japanese text editor called “Sakura editor”, that has received similar modifications to the first sample. In this case, the file name has remained unchanged.
- The listed debug directories in either sample are conspicuous:
○ Notepad++, official release: c:\sources\notepad-plus-plus\powereditor\bin\npp.pdb
○ Notepad++, malicious sample: j:\neudttjvnmfzpyzyrew\fkvksyfydr\feerorgrkl\wxmxqtitqcqgzxfnxkz\tisepdjbuefu\grycyeulksa.pdb
○ Sakura, official release: c:\projects\sakura\win32\release\sakura.pdb
○ Sakura, malicious sample: j:\hcimjeauqba\ituesatfkpebikthwcns\jlenkwcpywi\ceithycybrtlvy.pdb
- Comparing the import table of functrl64.exe, which was compiled from the Notepad++ source code, with that of the official release of matching version and target architecture reveals imports in the malicious binary that the official executable does not import:
○ CreateFileA
○ DeleteFileA
○ VirtualProtectEx
These imports are not suspicious on their own. What stands out for the first two imports, though, is that the official release uses the wide-character versions of these functions, which the malicious version consequently imports as well. The same is true for both samples: each breaks the pattern of normally using the UTF-16 API by additionally importing the ANSI version of a few very specific API functions.
It is important to note, though, that the official Notepad++ release still imports VirtualProtect, albeit through a statically linked concurrency library. The question of whether the use is legitimate would therefore come back to monitoring how the application uses VirtualProtect/VirtualProtectEx.
- Luckily, the official Notepad++ release has been signed and can be verified, which cannot be said about the official release of the Sakura Editor.
- In either case, the malware attempted to run from a system directory, which is an immediate red flag for any software that has not been released by Microsoft.
- The modifications by the malware authors are nearly identical in either case, looking for a .dat file with a name similar, but not necessarily equal, to the executable’s name in the same system directory. E.g.:
○ "c:\\windows\\help\\functrl32.dat",
○ "c:\\windows\\debug\\sakura.dat",
○ "c:\\windows\\help\\sakura.dat".
- A sandboxed dynamic analysis engine that served functrl64.exe’s request to open the file "c:\\windows\\help\\functrl32.dat" with a dummy file would have been able to detect that the contents of the memory area whose permissions were changed via VirtualProtectEx exactly matched those of the dummy file. Executing the contents of a file in this way would be a serious security vulnerability under any circumstances, whether functrl64.exe were a legitimate piece of software or not.
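The import-table indicator from the list above lends itself to a simple check. The following Python sketch compares a sample’s imports against those of a reference build and flags ANSI functions whose wide-character sibling is imported as well. The function name and the abbreviated import lists are illustrative assumptions, not the actual import tables of either binary, and a hit is only a hint that needs to be combined with the other indicators.

```python
def import_anomalies(sample_imports, reference_imports):
    """Return (extra, mixed): imports missing from the reference build, and
    ANSI imports (...A) among them whose wide sibling (...W) is also imported."""
    sample = set(sample_imports)
    extra = sample - set(reference_imports)
    mixed = {name for name in extra
             if name.endswith("A") and name[:-1] + "W" in sample}
    return sorted(extra), sorted(mixed)

# Abbreviated, hypothetical import lists using the function names from the article
reference = ["CreateFileW", "DeleteFileW", "VirtualProtect"]
sample = reference + ["CreateFileA", "DeleteFileA", "VirtualProtectEx"]

extra, mixed = import_anomalies(sample, reference)
print(extra)   # ['CreateFileA', 'DeleteFileA', 'VirtualProtectEx']
print(mixed)   # ['CreateFileA', 'DeleteFileA']
```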