
UUID Deep Dive: RFC 4122, Versions, Collision Probability, and Best Practices

The definitive guide to UUID v1, v4, v5, and Nil — and how to choose the right identifier strategy for your database, API, and distributed system.


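UUIDs (Universally Unique Identifiers) are something every developer uses daily but few understand in depth. The version you choose, and how you store UUIDs, can have major implications for database performance, security, and system architecture. This guide covers the RFC 4122 layout, the common versions (v1, v4, v5, Nil, and the newer v7), collision probability, database indexing trade-offs, and why UUIDs should not be used as secrets.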


UUID Structure (RFC 4122)

A UUID is 128 bits, canonically formatted as 32 hexadecimal characters in 5 hyphen-separated groups:

550e8400-e29b-41d4-a716-446655440000

time_low (8 hex) - time_mid (4) - version + time_hi (4) - variant + clock_seq (4) - node (12)

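The version nibble, the 13th hex digit (the first digit of the third group), identifies the version, for example 1, 4, 5, or 7. The most significant bits of the 17th hex digit (the first digit of the fourth group) encode the variant: RFC 4122 UUIDs always have the bit pattern 10xx there, so that digit is 8, 9, a, or b. In the example above, 41d4 marks version 4 and a716 marks the RFC 4122 variant.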
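The positions above can be checked with a few lines of Python. The inspect_uuid helper is hypothetical and exists only for illustration; the standard library's uuid.UUID parser reports the same information through its version and variant attributes.

```python
import uuid


def inspect_uuid(text: str) -> dict:
    """Read the version and variant nibbles straight out of a canonical UUID string."""
    digits = text.replace("-", "").lower()   # 32 hex digits once hyphens are removed
    if len(digits) != 32:
        raise ValueError("expected 32 hex digits")
    version_nibble = int(digits[12], 16)     # 13th hex digit (index 12)
    variant_nibble = int(digits[16], 16)     # 17th hex digit (index 16)
    return {
        "version": version_nibble,
        # RFC 4122 variant: the top two bits of the 17th digit are 10 (hex 8, 9, a, b)
        "rfc4122_variant": (variant_nibble >> 2) == 0b10,
    }


example = "550e8400-e29b-41d4-a716-446655440000"
print(inspect_uuid(example))   # {'version': 4, 'rfc4122_variant': True}

# Cross-check with the standard library parser
parsed = uuid.UUID(example)
print(parsed.version, parsed.variant == uuid.RFC_4122)   # 4 True
```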

UUID Versions Explained

Version 1 — Time + Node

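v1 derives uniqueness from a 60-bit timestamp (100-nanosecond intervals since October 15, 1582, the start of the Gregorian calendar), a 14-bit clock sequence that guards against duplicates if the clock moves backwards, and a 48-bit node (typically the host's MAC address or a random value).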
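Python's standard uuid module exposes these v1 fields directly as time, clock_seq, and node. The sketch below decodes the timestamp back into a date. Note that Python's uuid1() tries to use the host's hardware address for the node by default, so the printed node may be your real MAC.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Start of the v1 timestamp scale (the Gregorian calendar reform date)
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)

u = uuid.uuid1()
# u.time is the 60-bit count of 100 ns ticks; // 10 converts ticks to microseconds
created = GREGORIAN_EPOCH + timedelta(microseconds=u.time // 10)

print(u, "version", u.version)
print("timestamp:", created.isoformat())
print("clock_seq:", u.clock_seq)        # 14-bit clock sequence
print("node:     ", f"{u.node:012x}")   # 48-bit node (MAC address or random)
```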

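Advantages: Each v1 UUID embeds its creation time, and a single generator produces increasing timestamps, so v1 values can be ordered chronologically once the timestamp is decoded. The layout, however, stores the low 32 bits of the timestamp (time_low) first, so v1 UUIDs do not sort chronologically as raw bytes or strings. To get append-friendly B-tree inserts, which are much faster than random insertion, the timestamp fields must be reordered before storage; MySQL's UUID_TO_BIN(uuid, 1), for example, swaps the time-low and time-high groups.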
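To see why raw byte order does not follow creation time, the sketch below builds two v1 UUIDs by hand from explicit timestamps 100 ns apart, chosen so that time_low wraps from ffffffff to 00000000. The helper names and the fixed clock sequence and node values are illustrative assumptions; real generators fill those fields themselves. The reordering key follows the same idea as MySQL's UUID_TO_BIN(uuid, 1) swap but is not MySQL's code.

```python
import uuid


def v1_from_timestamp(ts: int, clock_seq: int = 0, node: int = 0x0123456789AB) -> uuid.UUID:
    """Hypothetical helper: build a v1 UUID from an explicit 60-bit timestamp."""
    time_low = ts & 0xFFFFFFFF
    time_mid = (ts >> 32) & 0xFFFF
    time_hi_version = ((ts >> 48) & 0x0FFF) | (1 << 12)       # version 1 in the top nibble
    clock_seq_hi_variant = ((clock_seq >> 8) & 0x3F) | 0x80   # RFC 4122 variant bits 10
    clock_seq_low = clock_seq & 0xFF
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version,
                             clock_seq_hi_variant, clock_seq_low, node))


earlier = v1_from_timestamp(0x1EF0000FFFFFFFF)
later = v1_from_timestamp(0x1EF0000FFFFFFFF + 1)   # 100 ns later; time_low wraps to 0

print(earlier)                    # ffffffff-0000-11ef-8000-0123456789ab
print(later)                      # 00000000-0001-11ef-8000-0123456789ab
print(str(later) < str(earlier))  # True: the later UUID sorts first as a string


def time_ordered_key(u: uuid.UUID) -> bytes:
    """Reorder to time_hi, time_mid, time_low so byte order follows the timestamp."""
    b = u.bytes
    return b[6:8] + b[4:6] + b[0:4] + b[8:16]


print(time_ordered_key(earlier) < time_ordered_key(later))   # True
```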

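Disadvantage: Using the real MAC address exposes the identity of the generating machine, and the embedded timestamp reveals when each UUID was created. Many implementations can substitute a random node ID to hide the MAC address, but the timestamp remains visible.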
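With Python's uuid1(), the node can be supplied explicitly. A minimal sketch, assuming you want a random node: draw 48 random bits and set the multicast bit (the least significant bit of the first octet), which RFC 4122 requires for node IDs that are not real IEEE 802 addresses so they can never collide with a network card's address.

```python
import secrets
import uuid

# Random 48-bit node; bit 40 is the least significant bit of the first octet (multicast bit)
random_node = secrets.randbits(48) | (1 << 40)
u = uuid.uuid1(node=random_node)

print(u, f"node={u.node:012x}")   # the node no longer reveals the host's MAC address
```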

Version 4 — Random (Default Choice)

v4 fills 122 bits with random data, which implementations should draw from a cryptographically secure random number generator. The remaining 6 bits encode the version (4 bits) and variant (2 bits). No state, no coordination, no infrastructure needed.

// JavaScript: built-in
crypto.randomUUID()

# Python
import uuid
str(uuid.uuid4())

// Go
import "github.com/google/uuid"
uuid.New().String()

// C#
Guid.NewGuid().ToString()

-- PostgreSQL (built in since version 13; older versions need the pgcrypto extension)
SELECT gen_random_uuid();
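The one-liners above all call a library generator. As a sketch of what a v4 generator does internally, the hypothetical helper below takes 16 random bytes from os.urandom and overwrites exactly the 6 fixed bits. Python's own uuid.uuid4() is built the same way, from os.urandom(16) with the version set to 4.

```python
import os
import uuid
from typing import Optional


def uuid4_from_bytes(raw: Optional[bytes] = None) -> uuid.UUID:
    """Build a v4 UUID by hand: 16 random bytes, then overwrite 6 bits."""
    b = bytearray(raw if raw is not None else os.urandom(16))
    b[6] = (b[6] & 0x0F) | 0x40   # version nibble = 4 (4 bits)
    b[8] = (b[8] & 0x3F) | 0x80   # variant bits = 10 (2 bits)
    return uuid.UUID(bytes=bytes(b))


u = uuid4_from_bytes()
print(u, u.version, u.variant == uuid.RFC_4122)

# Worked example with all-ones input: only the 6 fixed bits change
print(uuid4_from_bytes(b"\xff" * 16))   # ffffffff-ffff-4fff-bfff-ffffffffffff
```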

v4 is a sensible default for most use cases: database primary keys (subject to the index-locality caveats covered below), request IDs, feature flag keys, and test data identifiers.

Version 5 — Namespace + SHA-1 (Deterministic)

v5 is deterministic: the same namespace UUID and name string always produce the same output UUID. It computes SHA-1 over the namespace UUID's 16 bytes followed by the name, keeps the first 128 bits of the 160-bit digest, and overwrites the version and variant bits.

ℹ️
Standard Namespace UUIDs
RFC 4122 defines four well-known namespaces: DNS (6ba7b810-9dad-11d1-80b4-00c04fd430c8), URL, OID, and X.500. Use these for interoperability — any implementation using the same namespace and name will produce the same UUID.
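
// v5 use cases, written with the npm "uuid" signature uuidv5(name, namespace).
// customNamespace: a fixed UUID you generate once (for example with v4) and hard-code.

// 1. Stable user ID from email (same email → same UUID across services)
uuidv5("alice@example.com", customNamespace)
// → the same UUID on every call, in every service that shares customNamespace

// 2. Deduplicate events by content hash
//    (JSON.stringify is key-order sensitive, so serialize the payload canonically)
uuidv5(JSON.stringify(eventPayload), customNamespace)

// 3. Content-addressable IDs for documents
uuidv5(documentTitle, customNamespace)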
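Python's standard library exposes the same construction as uuid.uuid5(namespace, name) (note the argument order is the reverse of the npm package), along with the predefined NAMESPACE_DNS, NAMESPACE_URL, NAMESPACE_OID, and NAMESPACE_X500 constants. The sketch below checks determinism and recomputes the value by hand with hashlib to show the hash, truncate, and overwrite steps; uuid5_by_hand is a hypothetical helper for illustration.

```python
import hashlib
import uuid


def uuid5_by_hand(namespace: uuid.UUID, name: str) -> uuid.UUID:
    """Recompute a v5 UUID: SHA-1(namespace bytes + UTF-8 name), truncated to 128 bits."""
    digest = bytearray(hashlib.sha1(namespace.bytes + name.encode("utf-8")).digest()[:16])
    digest[6] = (digest[6] & 0x0F) | 0x50   # version 5
    digest[8] = (digest[8] & 0x3F) | 0x80   # RFC 4122 variant
    return uuid.UUID(bytes=bytes(digest))


a = uuid.uuid5(uuid.NAMESPACE_DNS, "example.com")   # DNS namespace for a domain name
b = uuid.uuid5(uuid.NAMESPACE_DNS, "example.com")

print(a == b)                                                 # True: deterministic
print(a == uuid5_by_hand(uuid.NAMESPACE_DNS, "example.com"))  # True: same construction
print(uuid.NAMESPACE_DNS)   # 6ba7b810-9dad-11d1-80b4-00c04fd430c8
```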

Nil UUID

All 128 bits zero: 00000000-0000-0000-0000-000000000000. Used as a sentinel "no value" or "unset" indicator — equivalent to null but typed as a UUID. Common in databases where a nullable UUID column would complicate queries.

Collision Probability

For v4, the probability that a given pair of UUIDs collides is 1 / 2^122. Because of the birthday paradox, a collision somewhere in a large population becomes likely much sooner: a 50% chance of at least one collision requires roughly 2.7 × 10¹⁸ UUIDs, about 86 years of generating 1 billion UUIDs per second.
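These figures follow from the standard birthday approximation p ≈ 1 - exp(-n^2 / (2 × 2^122)), where n is the number of v4 UUIDs generated. A small sketch that evaluates it both ways, assuming an ideal random source; the function names are illustrative.

```python
import math

BITS = 122           # random bits in a v4 UUID
SPACE = 2 ** BITS    # number of distinct v4 values


def collision_probability(n: float) -> float:
    """Birthday approximation: p ≈ 1 - exp(-n^2 / (2 * 2^122))."""
    return -math.expm1(-(n * n) / (2 * SPACE))


def uuids_for_probability(p: float) -> float:
    """Invert the approximation: n ≈ sqrt(2 * 2^122 * ln(1 / (1 - p)))."""
    return math.sqrt(2 * SPACE * math.log(1 / (1 - p)))


n50 = uuids_for_probability(0.5)
years = n50 / 1e9 / (365.25 * 86400)   # at one billion UUIDs per second

print(f"{n50:.2e}")                        # 2.71e+18
print(f"{years:.0f} years")                # 86 years
print(f"{collision_probability(1e12):.1e}")  # 9.4e-14 for one trillion UUIDs
```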

💡
Practical Uniqueness
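For any real application, UUID v4 collision probability is so low it can be ignored, provided the generator draws from a properly seeded cryptographic random source. If duplicates do appear, suspect the random source (a non-cryptographic PRNG, or RNG state duplicated by forking a process or cloning a VM) before suspecting chance. The risk of your database hardware failing is orders of magnitude higher than a true random collision between v4 UUIDs.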

Database Performance Considerations

This is where UUID version choice matters most in production systems:

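ID Strategy | Insert Perf | Globally Unique | Client-Generated | Sortable
Auto-increment INT | Excellent | No (per-DB) | No | Yes
UUID v4 (random) | Degrades at scale | Yes | Yes | No
UUID v1 (time) | Good | Yes | Yes | Only with reordered fields
UUID v7 (timestamp prefix) | Excellent | Yes | Yes | Yes (millisecond granularity)
ULID | Excellent | Yes | Yes | Yes
CUID2 | Degrades at scale (random) | Yes | Yes | No (by design)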

Random v4 UUIDs cause B-tree index fragmentation because each new row inserts at a random position in the index rather than appending to the end. At millions of rows this triggers frequent page splits and cache misses, measurably degrading INSERT performance.
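A rough way to see the locality difference is to count how often a new key lands at the right-hand end of a sorted index. The sketch below uses a plain sorted Python list as a stand-in for a B-tree; it models insert positions only, not pages, splits, or caching, so treat it as an illustration rather than a benchmark.

```python
import bisect
import uuid


def append_fraction(keys) -> float:
    """Share of inserts that land at the end of a sorted index (a crude B-tree proxy)."""
    index = []
    appends = 0
    for key in keys:
        pos = bisect.bisect_right(index, key)
        if pos == len(index):
            appends += 1          # key goes after every existing key: an append
        index.insert(pos, key)    # keep the list sorted, like an index would
    return appends / len(index)


N = 10_000
sequential = append_fraction(range(N))                             # auto-increment keys
random_v4 = append_fraction([uuid.uuid4().bytes for _ in range(N)])  # random v4 keys

print(sequential)   # 1.0: every insert is an append
print(random_v4)    # close to 0: almost every insert lands mid-index
```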

UUID v7 — The Modern Answer

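UUID v7, standardized in RFC 9562 (May 2024, which obsoletes RFC 4122), starts with a 48-bit Unix millisecond timestamp, followed by the version and variant bits and 74 random bits. Because the timestamp comes first, v7 values sort by creation time at millisecond granularity, combining the B-tree locality of sequential IDs with the global uniqueness of random UUIDs. Within a single millisecond, ordering is random unless the implementation uses some of those bits as a counter, which RFC 9562 permits. PostgreSQL 18 ships a built-in uuidv7() function; elsewhere, v7 values are commonly generated in application code or through extensions and libraries.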
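The v7 bit layout is simple enough to sketch by hand. The uuid7_sketch function below is a hypothetical helper, not a library API; it accepts an optional timestamp and random bytes so the output is reproducible, and it deliberately omits the optional intra-millisecond counter, so two IDs from the same millisecond come out in random order.

```python
import os
import time
import uuid
from typing import Optional


def uuid7_sketch(unix_ms: Optional[int] = None, rand: Optional[bytes] = None) -> uuid.UUID:
    """Layout: 48-bit ms timestamp | version (4) | 12 random | variant (2) | 62 random."""
    ms = time.time_ns() // 1_000_000 if unix_ms is None else unix_ms
    r = bytearray(rand if rand is not None else os.urandom(10))
    b = bytearray(ms.to_bytes(6, "big")) + r   # 6 timestamp bytes + 10 random bytes
    b[6] = (b[6] & 0x0F) | 0x70   # version 7
    b[8] = (b[8] & 0x3F) | 0x80   # RFC variant bits 10
    return uuid.UUID(bytes=bytes(b))


def v7_timestamp_ms(u: uuid.UUID) -> int:
    """The top 48 bits hold the Unix timestamp in milliseconds."""
    return u.int >> 80


first = uuid7_sketch(unix_ms=1_700_000_000_000)
second = uuid7_sketch(unix_ms=1_700_000_000_001)

print(first < second, str(first) < str(second))   # True True: later ms always sorts later
print(v7_timestamp_ms(first))                      # 1700000000000
```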

UUIDs Are Identifiers, Not Secrets

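A common mistake is using a UUID as a secret token (password reset link, email verification token, API key). UUID v4 has 122 bits of entropy, which sounds secure, but UUIDs are designed to be shared: they appear in URLs, logs, headers, and error messages. RFC 4122 itself warns not to assume UUIDs are hard to guess, since not every generator uses a cryptographic random source and time-based versions such as v1 and v7 are partly predictable. If a UUID is guessable from context or leaked through logs, any security that depends on it breaks down.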

For secrets, generate a dedicated cryptographic random token using crypto.randomBytes(32).toString('hex') (Node.js) or the equivalent in your language — and store only its hash.
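In Python, the standard library covers the same pattern: secrets for the token and hashlib for the stored digest. The function names below are hypothetical. A fast hash like SHA-256 is reasonable here only because the token itself is high-entropy; user-chosen passwords need a slow password hash instead.

```python
import hashlib
import hmac
import secrets


def issue_reset_token() -> tuple:
    """Return (token to send to the user, SHA-256 hex digest to store in the database)."""
    token = secrets.token_urlsafe(32)   # 32 random bytes, URL-safe base64 (43 chars)
    return token, hashlib.sha256(token.encode()).hexdigest()


def verify_reset_token(presented: str, stored_hash: str) -> bool:
    """Hash the presented token and compare in constant time."""
    candidate = hashlib.sha256(presented.encode()).hexdigest()
    return hmac.compare_digest(candidate, stored_hash)


token, stored = issue_reset_token()
print(verify_reset_token(token, stored))         # True
print(verify_reset_token(token + "x", stored))   # False
```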
