Pseudo Randomness

Randomness: "the fact of being done, chosen, etc. without somebody deciding in advance what is going to happen, or without any regular pattern"

Based on the definition, we can state that randomness is non-deterministic. In computer science, randomness is typically defined statistically:

"A sequence is random if it passes statistical tests and is unpredictable under computational constraints."

This doesn't require metaphysical nondeterminism; deterministic processes can produce sequences indistinguishable from true randomness. Even then, we rely on randomness to generate session tokens, encryption keys, passwords, and initialization vectors. But there is a fundamental contradiction at the heart of computing: computers are deterministic machines.

This raises a question: "If randomness is non-deterministic, how can a deterministic function produce a non-deterministic output?"

Through this post, I'm trying to explore the limitations of software-generated randomness, why even our "secure" standards have cracks, and why the industry looks toward physical phenomena for additional entropy.

My intention is not to claim that modern classical randomness is broken. The state of computation is changing, which means our practices need to evolve. Also, note that current classical randomness is deterministic. This post is subject to change over time. Thanks.

Cloudflare uses a physical system for randomness. They state: "To produce the unpredictable, chaotic data necessary for strong encryption, a computer must have a source of random data. The 'real world' turns out to be a great source for randomness, because events in the physical world are unpredictable." You can learn more about their lava lamps here. I revisit this system briefly at the end of the post.

Standard PRNGs

The first generation of random number generators in software consists of Pseudo-Random Number Generators (PRNGs). These are algorithms that produce a sequence of numbers that looks random to a human observer but is fully determined by an initial value, known as the "seed."

If an attacker can guess or retrieve the seed, the entire sequence becomes predictable. Let's demonstrate this weakness across several popular modern languages.

Python

Python’s random module is a developer favorite. It is fast, easy to use, and statistically excellent for simulations. However, it uses the Mersenne Twister algorithm, which is not cryptographically secure.

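The vulnerability lies in the seed. By default, Python seeds the generator from operating-system entropy and falls back to the system time only when none is available, but developers often seed it explicitly with the system time.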

import random
import time

# Attacker guesses the seed was the timestamp of server start
approx_seed = int(time.time())

# Attacker initializes their own generator with the guessed seed
random.seed(approx_seed)

# If the guess is correct, the attacker knows the "random" token
predicted_token = random.randint(0, 999999)
print(f"Predicted Token: {predicted_token}")

If an attacker can approximate when a server rebooted or when a token was generated (down to the second or millisecond), they can brute-force the seed and replicate the exact "random" values generated by the server. Furthermore, the Mersenne Twister’s internal state can be reconstructed after observing just 624 consecutive 32-bit outputs, with no knowledge of the seed at all.
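
The 624-output claim can be checked directly. The sketch below is a minimal illustration, not a hardened attack tool. It assumes the attacker sees 624 consecutive, untruncated 32-bit outputs (obtained here with getrandbits(32)), inverts the Mersenne Twister's output tempering to rebuild the internal state, and loads that state into a second generator. The helper names (undo_right_shift_xor, undo_left_shift_xor, untemper, clone_generator) are made up for this example, and the state layout passed to setstate() is the one CPython uses.

```python
import random


def undo_right_shift_xor(y, shift):
    """Invert x -> x ^ (x >> shift) for a 32-bit value."""
    x = y
    for _ in range(32 // shift + 1):
        x = y ^ (x >> shift)
    return x


def undo_left_shift_xor(y, shift, mask):
    """Invert x -> x ^ ((x << shift) & mask) for a 32-bit value."""
    x = y
    for _ in range(32 // shift + 1):
        x = y ^ ((x << shift) & mask)
    return x


def untemper(output):
    """Recover the raw state word behind one 32-bit MT19937 output."""
    y = undo_right_shift_xor(output, 18)
    y = undo_left_shift_xor(y, 15, 0xEFC60000)
    y = undo_left_shift_xor(y, 7, 0x9D2C5680)
    y = undo_right_shift_xor(y, 11)
    return y


def clone_generator(observed):
    """Build a generator that continues the sequence after 624 observed outputs."""
    if len(observed) != 624:
        raise ValueError("need exactly 624 consecutive 32-bit outputs")
    state = tuple(untemper(value) for value in observed)
    clone = random.Random()
    # CPython state layout: (version, 624 state words + position index, gauss_next).
    clone.setstate((3, state + (624,), None))
    return clone


victim = random.Random()  # seeded from OS entropy; the attacker never learns the seed
observed = [victim.getrandbits(32) for _ in range(624)]
clone = clone_generator(observed)
print(clone.getrandbits(32) == victim.getrandbits(32))  # prints True
```

Outputs that are truncated or reduced to a range (for example randint(0, 999999)) leak fewer bits each, so an attacker needs more samples and more work, but the generator is no more secure for it.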

Java

Java’s java.util.Random class relies on a Linear Congruential Generator (LCG). While statistically passable for basic tasks, it is mathematically brittle.

An LCG works on the formula: next_seed = (a * current_seed + c) % m

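Because this is a linear equation, it is solvable. java.util.Random keeps a 48-bit state, and nextInt() returns its top 32 bits, so each output hides only 16 bits. Given two consecutive nextInt() outputs, an attacker can brute-force the 65,536 possible values of the hidden bits; for more heavily truncated outputs, lattice reconstruction can recover the state from a longer run. Once the state is known, every future output is compromised.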

import java.util.Random;

// Developer mistake: using java.util.Random for a secure token
Random rand = new Random();
int token = rand.nextInt();
// Attacker observes previous tokens -> solves for seed -> predicts token.
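
To make the attack concrete, here is a rough Python model of it. The multiplier, increment, and 48-bit modulus are the constants documented for java.util.Random; the JavaLikeRandom class and the helper names are hypothetical and exist only for this sketch. It uses three observed outputs: the first two to search the 16 hidden bits and the third to confirm the candidate.

```python
MULTIPLIER = 0x5DEECE66D
INCREMENT = 0xB
MASK = (1 << 48) - 1  # the state is 48 bits wide


def step(state):
    """Advance the LCG by one step."""
    return (state * MULTIPLIER + INCREMENT) & MASK


def to_int32(state):
    """Top 32 bits of the state as a signed 32-bit integer, like nextInt()."""
    value = state >> 16
    return value - (1 << 32) if value >= (1 << 31) else value


class JavaLikeRandom:
    """Hypothetical Python model of java.util.Random's nextInt()."""

    def __init__(self, seed):
        self.state = (seed ^ MULTIPLIER) & MASK  # initial seed scrambling

    def next_int(self):
        self.state = step(self.state)
        return to_int32(self.state)


def recover_state(outputs):
    """Return the generator state after the last observed output, or None."""
    known_bits = (outputs[0] & 0xFFFFFFFF) << 16
    for hidden in range(1 << 16):  # try every value of the 16 hidden bits
        candidate = known_bits | hidden
        for expected in outputs[1:]:
            candidate = step(candidate)
            if to_int32(candidate) != expected:
                break
        else:
            return candidate
    return None


victim = JavaLikeRandom(123456789)
observed = [victim.next_int() for _ in range(3)]
state = recover_state(observed)
print(to_int32(step(state)) == victim.next_int())  # prints True
```

The search space is only 65,536 candidates, so the recovery finishes in well under a second even in pure Python.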

Go

In versions of Go prior to 1.20, if a developer used math/rand without explicitly setting a seed, the generator behaved as if it had been seeded with the value 1. This meant that every time the program restarted, it would generate the exact same sequence of numbers. Newer versions seed automatically, but a manually seeded math/rand is still deterministic and unsuitable for keys, tokens, or other secrets.

Node.js

In Node.js and browsers, Math.random() is the default go-to. However, the ECMAScript specification does not require a cryptographically secure algorithm. Most implementations use variations of XorShift or LCG. Like java.util.Random and Python’s random module, Math.random() is designed for speed and distribution, not unpredictability.
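
To see why this family of generators is fast but predictable, consider a minimal 32-bit xorshift written in Python. This is Marsaglia's textbook variant with shifts 13, 17, and 5, chosen for brevity; it is not the algorithm any particular JavaScript engine ships, and the names below are illustrative. In this simplified form each output is the full internal state, so a single leaked value is enough to predict everything that follows.

```python
def xorshift32(state):
    """One step of Marsaglia's 32-bit xorshift (shifts 13, 17, 5)."""
    state ^= (state << 13) & 0xFFFFFFFF
    state ^= state >> 17
    state ^= (state << 5) & 0xFFFFFFFF
    return state


def take(state, count):
    """Return the next `count` outputs starting from `state`."""
    values = []
    for _ in range(count):
        state = xorshift32(state)
        values.append(state)
    return values


server_outputs = take(2463534242, 5)  # arbitrary non-zero seed
leaked = server_outputs[2]            # attacker observes one value
predicted = take(leaked, 2)           # and replays the generator from it
print(predicted == server_outputs[3:])  # prints True
```

Production generators in this family keep a larger state and expose only part of it per call, which raises the cost of recovery but does not make the output cryptographically unpredictable.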

Are CSPRNGs Truly Secure?

To fix the flaws of standard PRNGs, modern languages introduced Cryptographically Secure Pseudo-Random Number Generators (CSPRNGs).

Python: secrets module
Java: java.security.SecureRandom
Node.js: crypto.randomBytes()
Go: crypto/rand

Implementation
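
As a minimal sketch in Python, the secrets module covers the common cases. The variable names and sizes below are illustrative choices, not recommendations from any standard.

```python
import secrets

session_token = secrets.token_urlsafe(32)     # 32 random bytes as URL-safe text
api_key = secrets.token_hex(16)               # 16 random bytes as 32 hex characters
otp = f"{secrets.randbelow(1_000_000):06d}"   # six-digit one-time code
raw_key = secrets.token_bytes(32)             # 256 bits of raw key material


def verify_token(supplied, expected):
    """Compare tokens in constant time to avoid leaking a match position."""
    return secrets.compare_digest(supplied, expected)


print(len(api_key), len(raw_key), len(otp))  # prints 32 32 6
```

The other languages follow the same pattern: ask the platform's CSPRNG for bytes and encode them, rather than seeding a general-purpose generator yourself.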

Are they secure?

For most modern applications, yes. They are "computationally secure": predicting their output is possible in theory with unbounded computing power, but with current technology the time required would exceed the lifespan of the universe.

CSPRNGs derive their entropy from the operating system’s "entropy pool", a collection of unpredictable events such as CPU jitter, hardware interrupts, RDSEED/RDRAND, network timing noise, and disk interrupts. They are designed to resist state compromise extension attacks: if an attacker learns the current state, they cannot easily determine previous states.

But they are not perfect.

Entropy Starvation: This is the "boot-up" problem. On embedded systems, IoT devices, or freshly spun-up cloud containers, there is very little "noise" (interrupt timing, user interaction) to draw from. If the entropy pool has not been filled, the CSPRNG may block (stalling the program) or, in worst-case scenarios, degrade into weaker randomness. Weak entropy and flawed generators have caused real-world vulnerabilities:
- Debian OpenSSL bug (2008): A Debian patch accidentally removed almost all of OpenSSL's seeding, leaving the process ID as the only source of variation (at most 32,768 possible seeds). Every SSL and SSH key generated on affected systems during that period was guessable. {1}
- Android SecureRandom bug (2013): A flaw in Android's SecureRandom initialization left it improperly seeded, and Bitcoin wallet apps that relied on it produced signatures that let attackers recover private keys. {2}
- Dual_EC_DRBG controversy: The NSA reportedly engineered a backdoor into this NIST-standardized DRBG, revealing the dangers of trusting cryptographic primitives whose design choices cannot be independently verified. {3}

Deterministic Nature Remains: CSPRNGs are still algorithms. They are complex algorithms, but they are bound by the rules of code. If an attacker gains root access to the OS and reads the memory state, the "randomness" of that specific session can be compromised.

Not True Randomness: While CSPRNG output is computationally indistinguishable from true randomness, CSPRNGs are still deterministic algorithms. For the highest security requirements, physical sources are preferred as entropy seeds.

Modern OS RNG Architecture

The modern RNG pipeline follows this structure:

Physical Entropy → Entropy Pool → DRBG (Deterministic Random Bit Generator) → Application

Common DRBG algorithms include:

CTR_DRBG (AES in counter mode)
HMAC_DRBG (HMAC-based)
ChaCha20-based RNG (used in modern Linux)
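
The last two stages of that pipeline can be sketched in a few lines of Python. The class below follows the general shape of HMAC_DRBG as described in NIST SP 800-90A, but it is deliberately simplified: it has no reseed counter, personalization string, or request-size limits, and the class name is made up for this post. Treat it as an illustration of how a small seed is stretched into a long output stream, not as something to deploy.

```python
import hashlib
import hmac
import os


class HmacDrbgSketch:
    """Simplified HMAC_DRBG-style generator using SHA-256 (illustration only)."""

    def __init__(self, seed_material):
        self.key = b"\x00" * 32
        self.value = b"\x01" * 32
        self._update(seed_material)

    @staticmethod
    def _hmac(key, data):
        return hmac.new(key, data, hashlib.sha256).digest()

    def _update(self, data=b""):
        """Mix optional input into the state and move the key forward."""
        self.key = self._hmac(self.key, self.value + b"\x00" + data)
        self.value = self._hmac(self.key, self.value)
        if data:
            self.key = self._hmac(self.key, self.value + b"\x01" + data)
            self.value = self._hmac(self.key, self.value)

    def reseed(self, fresh_entropy):
        """Fold new entropy from the pool into the state."""
        self._update(fresh_entropy)

    def generate(self, num_bytes):
        output = b""
        while len(output) < num_bytes:
            self.value = self._hmac(self.key, self.value)
            output += self.value
        self._update()  # change the key so earlier outputs cannot be recomputed
        return output[:num_bytes]


drbg = HmacDrbgSketch(os.urandom(48))  # "entropy pool" stands in as os.urandom
key = drbg.generate(32)                # what the application receives
print(len(key))  # prints 32
```

Two instances built from the same seed produce identical output, which is exactly the deterministic property discussed above: all of the unpredictability comes from the seed.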

On Linux, /dev/random and /dev/urandom serve different purposes:

/dev/random historically blocked whenever the kernel's entropy estimate ran low
/dev/urandom never blocks; it reuses the CSPRNG state to provide unlimited random bytes

After initialization they are effectively equivalent: both draw from the same CSPRNG, and since Linux 5.6 /dev/random blocks only until the pool has been initialized. For most applications, including key generation, /dev/urandom is sufficient. The case that needs care is reading randomness very early at boot, before the pool has been initialized.
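
From application code, the usual way to reach the kernel CSPRNG is through a wrapper rather than by opening the device files. The Python sketch below assumes small requests (a few dozen bytes); read_random_bytes is a hypothetical helper name. On Linux it tries the getrandom() system call in non-blocking mode first, which makes the "not yet initialized" case visible instead of silently waiting.

```python
import os


def read_random_bytes(count):
    """Return `count` bytes from the kernel CSPRNG (intended for small requests)."""
    if hasattr(os, "getrandom"):  # available on Linux
        try:
            # GRND_NONBLOCK: raise instead of waiting if the pool is not initialized.
            return os.getrandom(count, os.GRND_NONBLOCK)
        except BlockingIOError:
            # Early boot: the pool is not ready yet, so wait for it explicitly.
            return os.getrandom(count)
    # Other platforms: os.urandom uses the OS-provided source.
    return os.urandom(count)


key = read_random_bytes(32)
print(len(key))  # prints 32
```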

Hardware TRNGs

Physical randomness sources include:

Intel RDRAND/RDSEED: Hardware RNG built into modern CPUs
TPM (Trusted Platform Module): Dedicated security chip with onboard RNG
Hardware noise diodes: Electronic components that amplify thermal noise
Quantum random number generators: Exploit quantum mechanics for randomness

If software is deterministic, where can we find true chaos? The answer lies in the physical world. This brings us to the concept of True Random Number Generators (TRNGs). To achieve true randomness, we must measure physical phenomena that are fundamentally unpredictable.

The Cloudflare LavaRand Solution

Cloudflare needs an immense amount of randomness to generate SSL keys for millions of requests. They installed a wall of lava lamps in their San Francisco office. A camera points at the wall and takes photos at regular intervals. The data from these photos is converted into a stream of numbers.

Why do they do this? The "lava" inside the lamps moves due to fluid dynamics and thermodynamics, processes that are chaotic and highly sensitive to tiny variations (air currents, temperature fluctuations, etc.). No two photos are ever the same. Even if an attacker stood in the room with the lamps, they could not predict the exact pixel values recorded by the camera's sensor.

This creates a source of entropy that is:

Unpredictable: It relies on physics, not math.
External: It is not generated by the computer’s internal state.

Cloudflare uses the lava lamps as a supplementary entropy source. They mix this physical randomness with their system's existing entropy (CPU jitter, RDRAND, etc.) to ensure that their keys are resistant to theoretical attacks. Notably, Cloudflare states that the lava lamps are not strictly necessary - their systems already have strong entropy sources. The lamps serve as additional entropy and a public demonstration of randomness.
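
The mixing step can be pictured as hashing several inputs together. The sketch below is an assumption-laden illustration, not Cloudflare's actual implementation: mix_entropy is a hypothetical helper, SHA-256 is simply a convenient choice, and camera_frame is placeholder data standing in for real image bytes.

```python
import hashlib
import os


def mix_entropy(*sources):
    """Hash several entropy inputs into a single 32-byte seed."""
    digest = hashlib.sha256()
    for source in sources:
        # Length-prefix each input so the boundaries between sources are unambiguous.
        digest.update(len(source).to_bytes(8, "big"))
        digest.update(source)
    return digest.digest()


camera_frame = bytes(range(256)) * 4  # placeholder for bytes from a real photo
seed = mix_entropy(camera_frame, os.urandom(32))
print(len(seed))  # prints 32
```

The useful property of this construction is that the result is hard to predict as long as at least one input is, so adding a physical source can only help: even a fully predictable camera frame would not weaken the system entropy it is mixed with.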

As we move into an era of quantum computing and advanced cryptographic attacks, the demand for True Random Number Generators is growing. We are reaching a point where simulating randomness is no longer enough; we must start harnessing the fundamental unpredictability of the universe itself.