George Wu | DEFENDER ENGAGEMENT — Structural Detection Research

TABLE OF CONTENTS

01 Situation Report
02 Mutation Pipeline Results
03 Detection Chain Analysis
04 Defender Emulation Engine
05 ETW Threat Intelligence (Kernel)
06 NtContinue — The Documented Gap
07 Why dark_room Passes but inject_dll Fails
08 Pipeline Fixes — Defender Quarantine Handling
09 Structural Diff — VADER vs SKYWALKER
10 Research Directions
11 Sources

01 // SITUATION REPORT

The VADER rootkit kill chain has 8 components. 7 pass Defender's static scan cleanly after XOR key mutation. One component — inject_dll (signature set HOTEL) — is detected regardless of which XOR key is used. After 10 consecutive key rotations across the full 0x00-0xFF range, every compiled binary is flagged.

This document analyses WHY Defender detects inject_dll, maps the three-layer detection chain, and identifies documented gaps in each layer. This is research into detection mechanics — understanding how endpoint security works from the inside.

COMPONENT STATUS

Component	Signature	Mutation	Status
dark_room	ALPHA	0x38 → 0xBE	CLEAN (attempt 1)
inject_dll	HOTEL	10 high-bit keys tried	DETECTED (all 10)
inject_exe	ECHO	0x30 → 0xB5	CLEAN (attempt 1)
v4_svc_replace	V4	0xCA → 0xDC	CLEAN (attempt 1)
v5_dll_proxy	V5	0xB1 → 0xC9	CLEAN (attempt 1)
v6_path_hijack	V6	0xF7 → 0xC6	CLEAN (attempt 1)
v7_phantom_dll	V7	0x93 → 0xBD	CLEAN (attempt 1)
shell	BRAVO	0xB9 → 0xF7	CLEAN (attempt 1)

SKYWALKER (cold standby fork) has the same inject_dll pattern. Previously found a clean key after 8 attempts, but as of 18 June 2026, SKYWALKER inject_dll also fails 10/10 with high-bit keys. Defender has updated its signatures — both variants are now equally detected. The detection is purely structural.

02 // MUTATION PIPELINE RESULTS

The XOR key mutation pipeline (mutate.py) automates the compile → scan → rotate cycle. For each component, it generates a new random XOR key, re-encodes all string arrays, recompiles, and scans with MpCmdRun.exe -Scan -ScanType 3. Up to 10 attempts per component.

What XOR Mutation Does

XOR encoding transforms plaintext strings (API names, DLL names, file paths) into byte arrays that don't match Defender's string signatures. At runtime, a single-byte XOR key decodes them back. The encoding is its own inverse: A ^ K = E, E ^ K = A.

What XOR Mutation Does NOT Do

It changes the data in .rdata, not the code in .text. The API call sequence, the function structure, the import table, the PE layout — all remain identical regardless of key. If Defender's signature is on code structure rather than string patterns, XOR rotation has zero effect.

KEY FINDING

inject_dll is detected via structural signature, not string matching. The detection targets the code pattern itself: VEH registration + thread enumeration + cross-thread SetThreadContext with debug register flags. This is why 10/10 key rotations fail — the key changes the strings but the code shape is constant.

03 // DETECTION CHAIN ANALYSIS

Defender operates a three-layer detection pipeline. Each layer uses different signals. inject_dll likely triggers all three.

Layer 1 — Static Analysis (Import Table)

Before any code executes, Defender inspects the PE file's Import Address Table. Combinations of APIs associated with injection patterns raise a heuristic score:

SUSPICIOUS IMPORT COMBINATION (inject_dll):
  CreateToolhelp32Snapshot   ← thread/process enumeration
  Thread32First / Thread32Next
  OpenThread                 ← accessing other threads
  SuspendThread / ResumeThread
  GetThreadContext / SetThreadContext  ← context manipulation
  AddVectoredExceptionHandler         ← VEH registration
  GetProcAddress / LoadLibraryA       ← dynamic resolution

A DLL with this import combination and minimal legitimate exports is flagged with elevated confidence before the emulator even starts.
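Seen from the scanner's side, the import-combination heuristic above can be pictured as a weighted score over API groups. The following is a minimal sketch only: the group names, weights, threshold, and the `score_imports` helper are illustrative assumptions, not Defender's actual rules or values.

```python
# Toy import-combination scorer, written from the detection side.
# All weights and the threshold are illustrative assumptions.
SUSPICIOUS_GROUPS = {
    "thread_enum": ({"CreateToolhelp32Snapshot", "Thread32First", "Thread32Next"}, 2),
    "thread_access": ({"OpenThread", "SuspendThread", "ResumeThread"}, 2),
    "context_manipulation": ({"GetThreadContext", "SetThreadContext"}, 3),
    "veh": ({"AddVectoredExceptionHandler"}, 2),
    "dynamic_resolution": ({"GetProcAddress", "LoadLibraryA"}, 1),
}


def score_imports(imports, export_count, threshold=7):
    """Return (score, flagged, matched_groups) for a list of imported API names."""
    imported = set(imports)
    matched = []
    score = 0
    for group, (apis, weight) in SUSPICIOUS_GROUPS.items():
        if apis & imported:          # any API from the group is present
            matched.append(group)
            score += weight
    if export_count <= 1:            # few or no legitimate exports
        score += 1
    return score, score >= threshold, sorted(matched)


# Import list taken from the combination shown above.
inject_like = [
    "CreateToolhelp32Snapshot", "Thread32First", "Thread32Next", "OpenThread",
    "SuspendThread", "ResumeThread", "GetThreadContext", "SetThreadContext",
    "AddVectoredExceptionHandler", "GetProcAddress", "LoadLibraryA",
]
# A hypothetical ordinary DLL for contrast.
ordinary = ["CreateFileW", "ReadFile", "CloseHandle", "GetProcAddress"]

print(score_imports(inject_like, export_count=0))
print(score_imports(ordinary, export_count=12))
```

The point of the sketch is that no single API is decisive; it is the co-occurrence of several groups in one small binary that pushes the score over the threshold.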

Layer 2 — Emulation (mpengine.dll)

Defender's ~14MB emulation engine translates machine code to an intermediate language and executes it in a sandbox. It emulates x86/x64 instructions, Windows API calls, kernel functions, filesystem, and registry. The emulator enters at DllMain(DLL_PROCESS_ATTACH) and follows code flow from there.

inject_dll does ALL suspicious work inside DllMain: resolve targets, register VEH, enumerate threads, set debug registers. This is the worst case for emulation detection — everything is reachable within the emulator's instruction budget in a single linear code path.

Layer 3 — Runtime Behavioral (ETW-TI)

The Microsoft-Windows-Threat-Intelligence ETW provider instruments thread manipulation from kernel space. This layer only fires at runtime (not during file scan), but Defender's cloud analysis and AMSI integration can correlate runtime signals back to the file.

04 // DEFENDER EMULATION ENGINE

The emulator is the second line. Understanding its capabilities and limits is essential for understanding detection.

Capabilities

Full x86/x64 instruction set emulation
Emulated Windows API (CreateFile, LoadLibrary, GetProcAddress, etc.)
Emulated NT kernel calls (NtCreateFile, NtAllocateVirtualMemory, etc.)
Emulated filesystem and registry
DLL loading and dependency resolution
Entry point analysis: DllMain for DLLs, main/WinMain for EXEs

Known Limits

Finite instruction budget — terminates after N instructions or N API calls
Time limit on total emulation
Memory ceiling on emulated allocations
Known environment artifacts: GetComputerName returns "HAL9TH", GetUserName returns "JohnDoe"
Cannot fully emulate multi-threaded behavior
Network calls return simulated responses

IMPLICATION FOR INJECT_DLL

All suspicious API calls in inject_dll happen inside DllMain in a linear sequence. The emulator can follow this entire path within its budget. If the suspicious work were moved to a separate exported function called later (not from DllMain), the emulator would need to follow a more complex call graph and might not reach it within budget.

Reference

BlackHat 2018: Windows Offender — Reverse Engineering Windows Defender's Emulator

0xAlexei/WindowsDefenderTools (GitHub)

05 // ETW THREAT INTELLIGENCE (KERNEL TELEMETRY)

ETW-TI is the highest-fidelity detection layer. It operates at kernel level — usermode code cannot hook, intercept, or disable it without a kernel driver.

Relevant Events

ETW-TI Event           Trigger                         inject_dll fires?
---------------------  ------------------------------  -----------------
Task 5  (local)        NtSetContextThread (self)       YES (own thread)
Task 10 (remote)       NtSetContextThread (other)      YES (all threads)
Memory alloc events    VirtualAllocEx (remote)         NO
APC queue events       NtQueueApcThread                NO

Debug Register Inspection

When NtSetContextThread is called with the CONTEXT_DEBUG_REGISTERS flag, kernel-level telemetry can inspect the DR0-DR3 values being written. If those values resolve to addresses within amsi.dll (AmsiScanBuffer) or ntdll.dll (EtwEventWrite), that is a near-certain indicator of a patchless AMSI/ETW bypass.
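The address check described above amounts to a range lookup on the telemetry consumer's side. The sketch below illustrates that logic under stated assumptions: the module bases, sizes, and register values are made-up example numbers, and `classify_debug_registers` is a hypothetical helper, not part of any real telemetry API.

```python
# Detection-side sketch: flag debug-register values that land inside
# monitored security modules. All addresses are made-up examples.
def classify_debug_registers(dr_values, module_ranges):
    """Return sorted (register, module) pairs for DRs pointing into a monitored module.

    dr_values:     dict such as {"DR0": address, ...}
    module_ranges: dict such as {"amsi.dll": (base, size), ...}
    """
    findings = []
    for register, address in dr_values.items():
        if address == 0:             # register not in use
            continue
        for module, (base, size) in module_ranges.items():
            if base <= address < base + size:
                findings.append((register, module))
    return sorted(findings)


# Hypothetical module layout for one process.
modules = {
    "amsi.dll": (0x7FFA10000000, 0x20000),
    "ntdll.dll": (0x7FFA50000000, 0x1F0000),
}
# Hypothetical register values observed in a set-context event.
observed = {"DR0": 0x7FFA10003A40, "DR1": 0x7FFA5004F120, "DR2": 0, "DR3": 0}

print(classify_debug_registers(observed, modules))
```

A real consumer would resolve the address to a specific export rather than just a module, but the module-level check already separates this pattern from ordinary debugger breakpoints in application code.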

THIS IS THE PRIMARY DETECTION SIGNAL

Cross-thread SetThreadContext with debug registers pointing at security module functions is the highest-confidence heuristic. No amount of string encoding, import hiding, or timing evasion affects this signal. It is generated by the kernel when the actual API is called at runtime.

06 // NtContinue — THE DOCUMENTED GAP

Praetorian Security documented that NtContinue updates thread context — including debug registers — WITHOUT traversing the kernel code path that emits EtwTiLogSetContextThread.

CRITICAL RESEARCH FINDING

NtSetContextThread  →  fires ETW-TI  →  Defender sees DR manipulation  →  DETECTED
NtContinue          →  NO ETW-TI     →  kernel telemetry BLIND          →  ???

This is the most significant documented gap in the detection chain for hardware breakpoint placement. NtContinue is normally used by the exception dispatcher to restore context after an exception is handled, but it can be called directly to set arbitrary thread context including debug registers.

How NtContinue Works

NtContinue takes a CONTEXT structure and a boolean RaiseAlert. It restores the thread's context from the structure — including RIP, RSP, and debug registers. Because it's designed for the exception dispatcher path (which fires millions of times in normal operation), the kernel does NOT instrument it with ETW-TI telemetry the way it instruments NtSetThreadContext.

Implications

A thread can set its OWN debug registers via NtContinue without generating kernel telemetry
Combined with APC injection, each target thread could call NtContinue to set its own DRs
This removes the cross-thread SetThreadContext pattern entirely
The thread enumeration + suspend/resume pattern is also eliminated

Source: Praetorian — ETW Threat Intelligence and Hardware Breakpoints

07 // WHY dark_room PASSES BUT inject_dll FAILS

Both components use the same VEH + HWBP technique. Both set DR0 on AmsiScanBuffer and DR1 on EtwEventWrite. Both use the same VEH handler logic. But dark_room passes Defender's scan and inject_dll does not.

STRUCTURAL COMPARISON

Attribute	dark_room.exe	inject_dll.dll
PE type	EXE	DLL
Target	Own thread only	ALL threads in process
Thread context	GetCurrentThread() pseudo-handle	OpenThread() on remote TIDs
Suspend/Resume	NOT NEEDED (self)	SuspendThread/ResumeThread on each thread
Thread enumeration	NONE	CreateToolhelp32Snapshot full enum
ETW-TI event	Task 5 only (local)	Task 5 + Task 10 (local + remote)
Import table	Minimal	Thread manipulation APIs
Entry point work	main() — sequential	DllMain — under loader lock
Scan result	CLEAN	DETECTED

The difference is the cross-thread manipulation. Setting your own debug registers via GetCurrentThread() is a benign-looking operation — debuggers do this constantly. Enumerating every thread in a process, suspending each one, setting debug registers, and resuming is an injection pattern that Defender specifically recognizes.

08 // PIPELINE FIXES — DEFENDER QUARANTINE HANDLING

During mutation testing, a secondary problem emerged: when Defender detects and quarantines vader_inject.dll, it places an exclusive file lock. The mutation pipeline then fails because the compiler can't overwrite the locked binary.

The Quarantine Lock Problem

Defender detects vader_inject.dll
  → Places exclusive handle on file
  → os.remove()   → PermissionError [WinError 5]
  → os.rename()   → OSError [WinError 225]
  → cl.exe /Fe:   → LNK1104 "cannot open file"
  → Mutation pipeline CRASHES

The Fix — Temp-Dir Compile Fallback

When the target binary is locked, the pipeline now:

Detects the lock via os.remove() try/except
Creates a temporary build directory (tempfile.mkdtemp())
Compiles to the temp directory instead
Attempts os.replace() to swap the new binary in
Falls back to .new suffix if replace fails
Scans whichever file exists (prefers .new)
On CLEAN result, promotes .new to expected filename

This fix was applied to 4 files across both VADER and SKYWALKER repositories:

vader-rootkit/mutate.py — compile_component() + rotate_component()
vader-rootkit/deploy.py — compile_component()
skywalker/mutate.py — compile_component() + rotate_component()
skywalker/deploy.py — compile_component()

PIPELINE STATUS

All compile/scan/mutate operations now handle Defender quarantine locks gracefully. The pipeline continues operating even when previous binaries are quarantined. 7/8 components rotate successfully on both VADER and SKYWALKER. inject_dll is the sole holdout — structural detection, not string matching. Updated 18 June 2026.

09 // STRUCTURAL DIFF — VADER vs SKYWALKER

Priority research direction #1: diff the two inject_dll source files. VADER fails 10/10 key rotations. SKYWALKER originally found a clean key after 8 attempts (now also fails 10/10 as of 18 June 2026 — Defender updated signatures). What's different?

FINDING: THE FILES ARE FUNCTIONALLY IDENTICAL

Every function signature, every API call, every logic branch, every control flow path, every variable declaration, every compile flag is the same. After stripping comments and blank lines, the diff produces zero logic changes.

The detection difference is caused by the XOR key choice and how it transforms the .rdata section of the compiled DLL.

The Key Difference (Literally)

XOR KEY COMPARISON

Attribute	VADER	SKYWALKER
XOR Key	`0x77`	`0xE3`
Key high bit	0 (ASCII range)	1 (above ASCII)
Encoded byte range	`0x00 - 0x59`	`0x80 - 0xD9`
Bytes ≥ 0x80	0 of 77 (0%)	77 of 77 (100%)
Bytes < 0x20	60 of 77 (78%)	0 of 77 (0%)
Null bytes (0x00)	2 (from 'w' ^ 0x77)	0
Scan result (original)	DETECTED 10/10	CLEAN after 8 attempts
Scan result (18 Jun 2026)	DETECTED 10/10	DETECTED 10/10

Why This Matters for Detection

VADER's byte distribution is a statistical fingerprint. Key 0x77 is itself an ASCII character (lowercase 'w'). XOR-encoding ASCII text with an ASCII-range key produces encoded values that cluster in the low half of the byte space (0x00-0x59). This narrow, low-range distribution screams "XOR-encoded ASCII strings" to entropy-based scanners.

VADER produces null bytes. Where plaintext contains 'w' (two occurrences in "Windows" in the canary path), the encoding produces 0x00. Null bytes in .rdata const arrays adjacent to non-null data is an unusual pattern that can serve as a heuristic anchor point.

SKYWALKER's bytes look like normal binary data. Key 0xE3 has its high bit set, which flips the high bit of every ASCII character during encoding. All encoded values land in the 0x80-0xD9 range — overlapping with common binary data patterns like relocations, padding, and encoded resources. This looks unremarkable in a PE .rdata section.

YARA-STYLE SIGNATURE POSSIBILITY

A rule matching "5+ consecutive bytes all below 0x60 in .rdata near GetProcAddress import" would hit VADER every time and miss SKYWALKER every time. This suggests Defender has a static signature anchored on the byte distribution produced by low-range keys.

Detection Model

The evidence suggests Defender uses BOTH detection layers against inject_dll:

Layer A — Static byte pattern signature
  Key 0x77 → low-range clustering → ALWAYS MATCHES → DETECTED
  Key 0xE3 → high-range bytes     → MISSES          → passes static

Layer B — Behavioral/heuristic rule (API call sequence)
  Both keys → same code structure → SOMETIMES MATCHES → inconsistent

VADER:   Layer A hits 100%  →  always DETECTED (Layer B irrelevant)
SKYWALKER: Layer A misses    →  Layer B fires inconsistently → CLEAN after 8 tries

Implications for Mutation Pipeline

The mutate.py key selection should prefer keys with high bit set (0x80-0xFF)
Keys that produce null bytes when XOR'd with common ASCII ('a'-'z', 'A'-'Z', '.', '\') should be avoided
Key selection alone does not guarantee evasion — it only defeats Layer A (static). Layer B (behavioral) still fires based on the code structure
To defeat both layers, the code structure must change (see research directions below)

10 // RESEARCH DIRECTIONS

Prioritised by expected impact and feasibility. Each direction targets a specific layer of the detection chain.

PRIORITY 1 — DIFF VADER vs SKYWALKER [COMPLETED]

Files are functionally identical. Zero logic differences. The detection gap was caused by XOR key byte distribution (Layer A), not code structure. High-bit key fix applied to mutate.py. As of 18 June 2026, Defender has updated — both variants now fail 10/10 regardless of key range. Layer A is no longer bypassable via key selection alone. The remaining detection is purely Layer B (structural/behavioral).

COMPLETED. Finding: structural detection confirmed. Key selection exhausted as evasion vector.

PRIORITY 2 — NtContinue FOR HWBP PLACEMENT

Replace NtSetContextThread with NtContinue for debug register manipulation. Bypasses ETW-TI kernel telemetry entirely. Documented by Praetorian. Each thread sets its OWN DRs via NtContinue, eliminating the cross-thread SetThreadContext pattern.

Targets: Layer 3 (ETW-TI). Impact: Removes highest-fidelity detection signal.

PRIORITY 3 — DECOUPLE DllMain

Move VEH registration + thread blinding out of DllMain into a separate exported init function. DllMain becomes a no-op (DisableThreadLibraryCalls only). Injector calls init via second CreateRemoteThread after LoadLibrary returns.

Targets: Layer 2 (emulation). Impact: Code not reachable from DllMain entry point.

PRIORITY 4 — APC-BASED THREAD BLINDING

Replace Suspend/SetContext/Resume with QueueUserAPC. Each thread receives an APC that calls NtContinue to set its own debug registers. Eliminates thread enumeration + cross-thread context manipulation entirely.

Targets: All 3 layers. Impact: Completely different API surface.

PRIORITY 5 — DIRECT SYSCALLS

Replace Win32 API calls with direct NT syscall invocations (SysWhispers-style). Removes suspicious imports from IAT. Note: does NOT bypass ETW-TI kernel telemetry — syscalls still fire kernel callbacks.

Targets: Layer 1 (static) + Layer 2 (emulation). Impact: Clean IAT, harder to emulate.

PRIORITY 6 — STAGGER TIMING

Insert computation delays between VEH registration and thread enumeration. May exceed emulator's instruction budget. Use CreateTimerQueueTimer for deferred execution.

Targets: Layer 2 (emulation). Impact: Uncertain — cloud analysis has no time budget.
