Excel RC4-based encryption belongs to the legacy era of Office security. It still appears in older .xls files and occasionally in compatibility modes. This analysis provides a complete technical pipeline from password entry to encrypted bytes, with concrete examples and security implications.

This deep dive focuses strictly on file-level "Open Password" encryption, not worksheet or workbook structure protection.

🔹 What Is RC4?

RC4 is a symmetric stream cipher developed by Ron Rivest that encrypts data by XORing plaintext with a pseudorandom keystream derived from a secret key. It was widely adopted in legacy software, including early Microsoft Excel versions, due to its high performance and low computational cost.

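Today, RC4 is considered cryptographically weak and deprecated because of known statistical biases in its keystream. In Excel's case, the fast key derivation also makes offline brute-force attacks practical. Modern security standards forbid its use in production systems; RFC 7465, for example, prohibits RC4 cipher suites in TLS.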

| Property | Details |
| --- | --- |
| Cipher Type | Stream cipher |
| Key Size | 40-128 bits (Excel 97-2003) |
| Designer | Ron Rivest (RSA Security) |
| Year Introduced | 1987 |
| Current Status | Deprecated / Broken |

1️⃣ Where RC4 Fits in Excel Encryption Timeline

Understanding when RC4 was used helps contextualize its security weaknesses. Microsoft gradually phased out RC4 as computational attacks became feasible.

| Excel Version | File Format | Encryption Method | Security Level |
| --- | --- | --- | --- |
| Excel 97-2003 | .xls | RC4 (40-128 bit) | Critical Risk |
| Excel 2007 (early) | .xls | RC4 (optional) | Moderate Risk |
| Excel 2007+ | .xlsx | AES-128/256 (default) | Secure |

RC4 is used only for workbook "Open Password" encryption, not worksheet protection. Once a file is saved in .xls format, the encryption method remains RC4 unless explicitly converted to a newer format.

2️⃣ High-Level Encryption Flow

The RC4 encryption process in Excel follows a specific sequence. Here is the conceptual pipeline:

User Password
   ↓
Password Hashing (SHA-1)
   ↓
Key Derivation (with salt + block index)
   ↓
RC4 Key Schedule Algorithm (KSA)
   ↓
RC4 Keystream Generator (PRGA)
   ↓
XOR with Workbook Bytes
   ↓
Encrypted .xls File

RC4 is a stream cipher, which means encryption and decryption are symmetric operations. The same keystream XORed with ciphertext produces the original plaintext.

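⚠️ Important: The same cryptographic process is used to decrypt the file when the correct password is supplied. This symmetry keeps the implementation simple, but it also means that anyone who recovers the keystream (or the password behind it) can decrypt the data, and that reusing one keystream for two plaintexts leaks their XOR.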

3️⃣ Step-by-Step Internal Mechanics

🔹 Step 1: User Password Input

When a user sets a password, Excel receives plaintext input. For example:

    Summer2024!

Excel never stores this password directly. The plaintext password is immediately transformed through hashing and never persists in readable form.

🔹 Step 2: Password Hashing

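Excel encodes the password as UTF-16LE and hashes it using SHA-1, a cryptographic hash function that produces a fixed 160-bit output. (SHA-1 applies to the RC4 CryptoAPI variant; the original Excel 97/2000 RC4 scheme uses MD5 instead.)

SHA1("Summer2024!") =
9d1a4f5c1f0e5c8b1d6d8fbbf7c6a0b2d8c2f4a1   (illustrative value showing the 40-hex-digit output shape)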

This hash becomes the root secret for all further cryptographic operations. Any change to the password produces a completely different hash.
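As a minimal sketch of this step, the snippet below hashes the example password with Python's standard `hashlib` and shows that a one-character change yields an unrelated digest. Encoding the password as UTF-16LE is an assumption taken from the MS-OFFCRYPTO specification, and `hash_password` is a name introduced only for illustration.

```python
import hashlib


def hash_password(password: str) -> bytes:
    """Return the 20-byte SHA-1 digest of a password.

    Assumption: the password is encoded as UTF-16LE before hashing.
    """
    return hashlib.sha1(password.encode("utf-16-le")).digest()


original = hash_password("Summer2024!")
changed = hash_password("Summer2024?")  # one character differs

print(original.hex())       # 40 hex characters = 160 bits
print(changed.hex())        # shares no obvious structure with the first digest
print(len(original) * 8)    # digest size in bits
```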

🔹 Step 3: Key Derivation (Weak KDF)

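Excel mixes the password hash with additional inputs:

  • Password hash (from Step 2)
  • Workbook salt (random 16-byte value, stored unencrypted in the file)
  • Block index (important for re-keying)

The simplified pseudo-logic looks like this:

RC4Key = SHA1(PasswordHash + Salt + BlockIndex)

Strictly speaking, the RC4 CryptoAPI scheme documented in MS-OFFCRYPTO folds the salt into the first hash and the block index into a second one, then truncates the result to the key size:

H0 = SHA1(Salt + Password)
RC4Key = first KeyBits of SHA1(H0 + BlockIndex)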

⚠️ Critical Weakness: No memory-hard function. No iterations. The key derivation is extremely fast, making brute-force attacks highly efficient. Each password guess can be tested independently and offline.
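The derivation can be sketched in a few lines, which is itself a demonstration of how cheap each guess is. This is a sketch under assumptions drawn from the MS-OFFCRYPTO RC4 CryptoAPI description: the salt is hashed together with the UTF-16LE password, the block index is appended as a 32-bit little-endian integer, and 40-bit keys are zero-padded to 128 bits. The function name and the sample salt are hypothetical.

```python
import hashlib
import struct


def derive_rc4_key(password: str, salt: bytes, block_index: int,
                   key_bits: int = 128) -> bytes:
    """Derive the RC4 key for one block: two SHA-1 calls, no iterations."""
    # Assumption: first hash covers salt + UTF-16LE password.
    h0 = hashlib.sha1(salt + password.encode("utf-16-le")).digest()
    # Assumption: block index is a 32-bit little-endian integer.
    h_final = hashlib.sha1(h0 + struct.pack("<I", block_index)).digest()
    key = h_final[: key_bits // 8]
    if key_bits == 40:
        # Assumption: 40-bit keys are padded with zero bytes to 16 bytes.
        key += b"\x00" * 11
    return key


salt = bytes.fromhex("00112233445566778899aabbccddeeff")  # made-up salt
print(derive_rc4_key("Summer2024!", salt, 0).hex())
print(derive_rc4_key("Summer2024!", salt, 1).hex())  # different block, different key
```

Nothing in this function is tunable for cost: there is no iteration count or memory parameter an administrator could raise.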

🔹 Step 4: RC4 Key Schedule Algorithm (KSA)

RC4 initializes a 256-byte state array S through the Key Scheduling Algorithm. This permutation is the foundation of the keystream.

S = [0, 1, 2, 3, ..., 255]
j = 0

for i in 0..255:
    j = (j + S[i] + Key[i mod key_length]) mod 256
    swap(S[i], S[j])

This step permutes the state array S using the derived key. The quality of this permutation directly affects keystream predictability and security.

💡 Technical Note: Known biases in the initial bytes of RC4 keystreams have been exploited in practical attacks. This is one reason RC4 is deprecated.
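The KSA pseudocode above translates almost line for line into runnable Python. The sketch below uses a hypothetical function name `ksa` and checks one property worth knowing: the result is always a permutation of 0..255, only the order depends on the key.

```python
def ksa(key: bytes) -> list[int]:
    """RC4 Key Scheduling Algorithm: return the initial 256-byte state."""
    if not key:
        raise ValueError("key must not be empty")
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]  # swap
    return s


state = ksa(b"Key")
print(state[:8])                          # first few entries of the permuted state
print(sorted(state) == list(range(256)))  # still a permutation of 0..255
```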

🔹 Step 5: Keystream Generation (PRGA)

RC4 generates a pseudorandom byte stream using the Pseudo-Random Generation Algorithm (PRGA):

i = (i + 1) mod 256
j = (j + S[i]) mod 256
swap(S[i], S[j])
keystream_byte = S[(S[i] + S[j]) mod 256]

Example keystream output:

A3 F1 9C 77 2B 8E D4 5A ...

This keystream is combined with file data byte by byte using XOR operations.
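Putting the KSA and PRGA together gives a complete keystream generator. The sketch below is a plain-Python illustration, not optimized, and `rc4_keystream` is a name chosen here. It can be checked against the widely published RC4 test vector for the ASCII key "Key".

```python
from itertools import islice
from typing import Iterator


def rc4_keystream(key: bytes) -> Iterator[int]:
    """Yield RC4 keystream bytes for the given key (KSA, then PRGA)."""
    # Key Scheduling Algorithm
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    # Pseudo-Random Generation Algorithm
    i = j = 0
    while True:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        yield s[(s[i] + s[j]) % 256]


first_bytes = bytes(islice(rc4_keystream(b"Key"), 4))
print(first_bytes.hex())  # eb9f7781 (published test vector for key "Key")
```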

🔹 Step 6: XOR Encryption

Plaintext bytes are XORed with keystream bytes to produce ciphertext. This is the actual encryption step:

| Component | Hex Values |
| --- | --- |
| Plaintext (sample bytes) | 50 4B 03 04 |
| Keystream | A3 F1 9C 77 |
| Ciphertext (XOR result) | F3 BA 9F 73 |

Decryption repeats the same XOR operation. XOR symmetry is why the identical process decrypts the file.
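The arithmetic in the table can be reproduced directly. This small sketch (the helper name `xor_bytes` is illustrative) XORs the sample bytes with the sample keystream and then applies the same operation again to recover the plaintext.

```python
def xor_bytes(data: bytes, keystream: bytes) -> bytes:
    """XOR two byte strings position by position."""
    return bytes(d ^ k for d, k in zip(data, keystream))


plaintext = bytes.fromhex("504b0304")
keystream = bytes.fromhex("a3f19c77")

ciphertext = xor_bytes(plaintext, keystream)
print(ciphertext.hex(" ").upper())  # F3 BA 9F 73

# Applying the same keystream a second time undoes the encryption.
recovered = xor_bytes(ciphertext, keystream)
print(recovered == plaintext)       # True
```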

4️⃣ Block-Based Re-Keying

Excel does not use a single continuous keystream for the entire file. Instead, it re-derives the RC4 key per data block:

Block 0 → Key₀
Block 1 → Key₁
Block 2 → Key₂
...

This approach avoids certain stream cipher vulnerabilities associated with long keystreams, but:

  • Still uses SHA-1 for key derivation
  • Still computationally fast
  • Still brute-force friendly

⚠️ Security Impact: Re-keying improves structural security but does nothing to slow password guessing attacks.
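A rough sketch of block-based re-keying follows. It restarts RC4 with a fresh key for every block of data. The block size of 1024 bytes, the function names, and the way the block key is derived (SHA-1 over a base hash plus a 32-bit little-endian block index) are assumptions for illustration; real files also leave some structures, such as record headers, unencrypted, which this sketch ignores.

```python
import hashlib
import struct


def rc4(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt data with RC4 (the operation is symmetric)."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    i = j = 0
    out = bytearray()
    for byte in data:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)


def block_key(base_hash: bytes, block_index: int) -> bytes:
    """Derive a 128-bit key for one block (assumed derivation)."""
    return hashlib.sha1(base_hash + struct.pack("<I", block_index)).digest()[:16]


def crypt_blocks(base_hash: bytes, data: bytes, block_size: int = 1024) -> bytes:
    """Process data block by block, re-keying RC4 at each block boundary."""
    out = bytearray()
    for index, offset in enumerate(range(0, len(data), block_size)):
        chunk = data[offset:offset + block_size]
        out += rc4(block_key(base_hash, index), chunk)
    return bytes(out)


base = hashlib.sha1(b"demo base hash input").digest()  # stand-in for the password-derived hash
data = bytes(3000)                                      # three blocks of zero bytes
encrypted = crypt_blocks(base, data)
print(encrypted[:1024] != encrypted[1024:2048])  # identical plaintext blocks encrypt differently
print(crypt_blocks(base, encrypted) == data)     # the same call decrypts
```

Note that every block key still comes from one cheap hash of the same password-derived value, so an attacker who guesses the password obtains all block keys at once.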

5️⃣ Practical Example: Office Hash Format

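Extraction scripts such as office2john.py pull the salt and verifier fields out of the file header into a text format that password recovery tools like Hashcat consume for offline cracking. The layout below is schematic; Hashcat's own legacy-Office modes use an `$oldoffice$` prefix.

$office$*2003*128*16*SALTHEX*ENCRYPTEDVERIFIER*VERIFIERHASH

Example in that layout (field values are illustrative):

$office$*2003*128*16*13b7cf2d2a5a436f*d048c5880a286c7d*99909f3bb868d81b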

Attackers brute-force passwords and verify correctness via decrypted verifier bytes. This verification step avoids decrypting the full workbook for each guess, dramatically accelerating attacks.
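The verifier check can be sketched end to end. The code below first builds a verifier record for a known password and then tests candidate passwords against it without touching any workbook data. It assumes the MS-OFFCRYPTO RC4 CryptoAPI layout (a 16-byte verifier followed by its 20-byte SHA-1 hash, both encrypted with the block-0 key in one RC4 stream); all function names and the word list are hypothetical.

```python
import hashlib
import os
import struct


def rc4(key: bytes, data: bytes) -> bytes:
    """RC4 encrypt/decrypt (symmetric)."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    i = j = 0
    out = bytearray()
    for byte in data:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)


def derive_key(password: str, salt: bytes, block: int = 0) -> bytes:
    """Assumed derivation: SHA1(SHA1(salt + UTF-16LE password) + block)[:16]."""
    h0 = hashlib.sha1(salt + password.encode("utf-16-le")).digest()
    return hashlib.sha1(h0 + struct.pack("<I", block)).digest()[:16]


def make_verifier(password: str, salt: bytes) -> tuple[bytes, bytes]:
    """Build the encrypted verifier and encrypted verifier hash."""
    verifier = os.urandom(16)
    blob = rc4(derive_key(password, salt), verifier + hashlib.sha1(verifier).digest())
    return blob[:16], blob[16:]


def check_password(candidate: str, salt: bytes,
                   enc_verifier: bytes, enc_hash: bytes) -> bool:
    """Decrypt both fields and compare SHA1(verifier) with the stored hash."""
    plain = rc4(derive_key(candidate, salt), enc_verifier + enc_hash)
    return hashlib.sha1(plain[:16]).digest() == plain[16:]


salt = os.urandom(16)
enc_verifier, enc_hash = make_verifier("Summer2024!", salt)

for guess in ["password", "Summer2023!", "Summer2024!"]:  # toy word list
    print(guess, check_password(guess, salt, enc_verifier, enc_hash))
```

Each guess costs three SHA-1 calls and 36 bytes of RC4 output, which is why the check parallelizes so well.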

6️⃣ Why RC4 Excel Encryption Is Weak

| Weakness | Explanation |
| --- | --- |
| ❌ RC4 is deprecated | Known statistical biases in keystream output |
| ❌ No key stretching | Millions of password guesses per second possible |
| ❌ SHA-1 only | Fast hashing algorithm, GPU-friendly for attacks |
| ❌ Short passwords common | Human password patterns easily exploited |
| ❌ No memory hardness | Cheap parallel attacks on commodity hardware |

Because each guess costs only a few SHA-1 computations and a short RC4 run, modern GPUs can test candidate passwords at extremely high rates. Security depends almost entirely on password entropy and length.
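To see how password length translates into search time, the following back-of-the-envelope calculation may help. The guess rate is a placeholder assumption, not a benchmark; substitute a measured figure for your own hardware. The function name is illustrative.

```python
def seconds_to_exhaust(alphabet_size: int, length: int,
                       guesses_per_second: float) -> float:
    """Worst-case time to try every password of exactly `length` characters."""
    return alphabet_size ** length / guesses_per_second


ASSUMED_RATE = 1e9  # hypothetical guesses per second, for illustration only

for length in (6, 8, 10, 12):
    days = seconds_to_exhaust(62, length, ASSUMED_RATE) / 86_400
    print(f"{length} chars (a-z, A-Z, 0-9): {days:,.2f} days")
```

Each extra character multiplies the search space by the alphabet size (62 here), which is the only defense the legacy scheme leaves to the user.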

7️⃣ Comparison With Modern Excel AES

| Feature | RC4 Excel (Legacy) | AES Excel (2007+) |
| --- | --- | --- |
| Cipher Algorithm | RC4 stream cipher | AES-128/256 block cipher |
| Key Derivation | SHA-1 (1 round) | Iterated salted hash (SHA-1, SHA-512 in newer versions) |
| Iterations | ❌ 1 | ✅ 50,000-100,000 |
| GPU Resistance | ❌ None | ⚠️ Partial (slowed) |
| Security Status | Broken | Acceptable |

AES significantly raises the computational cost of brute-force attacks through key stretching and stronger cryptographic primitives.
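The effect of key stretching can be illustrated with a generic iterated hash. The sketch below is not Office's exact algorithm; it is a simplified stand-in (salted SHA-1 followed by repeated re-hashing with a counter) meant only to show how an iteration count multiplies the cost of every guess. Function names and the sample salt are hypothetical.

```python
import hashlib
import struct
import time


def single_hash(password: str, salt: bytes) -> bytes:
    """One salted SHA-1 call, comparable in cost to the legacy scheme."""
    return hashlib.sha1(salt + password.encode("utf-16-le")).digest()


def stretched_hash(password: str, salt: bytes, iterations: int = 100_000) -> bytes:
    """Generic key stretching: re-hash the result `iterations` times."""
    h = single_hash(password, salt)
    for counter in range(iterations):
        h = hashlib.sha1(struct.pack("<I", counter) + h).digest()
    return h


salt = bytes(range(16))  # made-up salt

start = time.perf_counter()
single_hash("Summer2024!", salt)
fast = time.perf_counter() - start

start = time.perf_counter()
stretched_hash("Summer2024!", salt)
slow = time.perf_counter() - start

print(f"single hash:        {fast:.6f} s")
print(f"100,000 iterations: {slow:.6f} s")
```

The attacker pays the same multiplier on every candidate password, while a legitimate user pays it once per file open.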

8️⃣ Security Takeaways 🔒

✓ What RC4 Protects

RC4 Excel encryption protects against casual snooping only. It prevents average users from opening locked files without tools.

✗ What RC4 Does Not Protect

It does not meet modern security standards. Any sensitive .xls file should be considered recoverable by motivated attackers.

Best Practices

  • Convert legacy .xls files to .xlsx format
  • Use AES-256 encryption for sensitive workbooks
  • Store passwords in a dedicated password manager
  • Assume legacy RC4-encrypted files are already compromised
  • Implement regular security audits for document protection

💡 Pro Tip: Security improves only after re-saving legacy files with modern encryption. Opening the file in Excel 2013+ and saving it as .xlsx with a password applies AES; saving it again as .xls keeps RC4.
