
URL encoding and Base64: what they do and when to use each

May 28, 2026·7 min read

URL encoding and Base64 are two of the most common text-safety conversions on the web. They look similar at a glance because both turn input into a string of letters, digits, and symbols, but they exist for different reasons. This post explains what each one does, the alphabet it uses, and the practical cases where one is the right tool and the other is not.

The short version

URL encoding takes a piece of text and rewrites the characters that are unsafe to put inside a URL as percent-hex sequences. A space becomes %20. An ampersand becomes %26. Letters, digits, and a small set of safe punctuation stay as they are.

Base64 takes a sequence of bytes (text, an image, a serialised payload, anything) and rewrites the whole stream using a 64-character alphabet of letters, digits, and two symbols. The output is unreadable and about a third larger than the input, but it can be dropped into any text-only context.

The two formats solve different problems. URL encoding keeps a string mostly readable so it can travel inside a URL. Base64 sacrifices readability so that any sequence of bytes can travel inside a text-only channel.

URL encoding (also called percent-encoding)

A URL has a limited safe character set defined by RFC 3986. Inside any single path or query component the safe characters are letters, digits, and a short list of punctuation: - _ . ~. Almost everything else has to be replaced by its UTF-8 byte values written as %XX hex pairs.

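The encoder follows a simple rule. For each character: if it is in the safe set, copy it through unchanged; otherwise, encode it as UTF-8 and write each resulting byte as % followed by two hex digits.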

That rule explains every example a person typically runs into:

" "       →  %20
"&"       →  %26
"="       →  %3D
"?"       →  %3F
"é"       →  %C3%A9   (two UTF-8 bytes)
"你"      →  %E4%BD%A0  (three UTF-8 bytes)
"🚀"     →  %F0%9F%9A%80  (four UTF-8 bytes)

Decoding reverses the process. The decoder reads each %XX pair as a byte, accumulates the bytes, and interprets the result as UTF-8 text.
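The rule and its reverse can be sketched in a few lines of Python. This is a minimal illustration, not a replacement for a library encoder: the names percent_encode and percent_decode are hypothetical, and the decoder assumes well-formed input in which every % is followed by two hex digits.

```python
from urllib.parse import quote

# Unreserved characters from RFC 3986: letters, digits, and - _ . ~
SAFE = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-_.~"
)


def percent_encode(text: str) -> str:
    """Hypothetical helper: escape every character outside the safe set."""
    out = []
    for ch in text:
        if ch in SAFE:
            out.append(ch)
        else:
            # One %XX pair per UTF-8 byte of the character.
            out.extend(f"%{byte:02X}" for byte in ch.encode("utf-8"))
    return "".join(out)


def percent_decode(text: str) -> str:
    """Hypothetical helper: assumes every % is followed by two hex digits."""
    raw = bytearray()
    i = 0
    while i < len(text):
        if text[i] == "%":
            raw.append(int(text[i + 1:i + 3], 16))
            i += 3
        else:
            raw.extend(text[i].encode("utf-8"))
            i += 1
    # The accumulated bytes are interpreted as UTF-8 text.
    return raw.decode("utf-8")


sample = "a b&c=é你🚀"
encoded = percent_encode(sample)
```

In practice the standard library does the same job: quote(sample, safe="") from urllib.parse should agree with the sketch on inputs like this one.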

Component vs full URL

There are two common encoders. The component encoder escapes everything that is not in the safe set, including :, /, ?, and &. The full-URL encoder leaves those alone because they form the structure of the URL. Most online tools default to the component encoder, which is what you want when escaping a single value to put inside a query string or path segment.
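The difference is easy to see side by side. The sketch below uses Python's urllib.parse.quote for both modes. Python has no built-in full-URL encoder, so encode_full_url and its URL_STRUCTURE list are assumptions made for illustration, and the example.com addresses are placeholders.

```python
from urllib.parse import quote

# Assumption: the structural characters a full-URL encoder leaves alone.
URL_STRUCTURE = ":/?#[]@!$&'()*+,;="


def encode_component(value: str) -> str:
    """Escape everything outside the unreserved set, including : / ? &."""
    return quote(value, safe="")


def encode_full_url(url: str) -> str:
    """Hypothetical helper: escape unsafe characters but keep URL structure."""
    return quote(url, safe=URL_STRUCTURE)


component = encode_component("a/b?c=d&e")
full = encode_full_url("https://example.com/my docs/?q=a b")

# Building a query string: only the value goes through the component encoder.
search_url = "https://example.com/search?q=" + encode_component("rock&roll")
```

Running the full-URL encoder over the value "rock&roll" would leave the & in place, and the receiving server would read it as the start of a second parameter.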

Where URL encoding shows up

One small gotcha: + versus %20

Standard URL component encoding (RFC 3986) writes a space as %20 and leaves + as a literal plus sign. HTML form submissions use an older convention called application/x-www-form-urlencoded, where a space is written as + and a literal plus is written as %2B. The two conventions produce similar-looking output but disagree on what + means.

When decoding something that came out of a form submission, replace + with a space before running the decode. When decoding a string that came from a path or a properly-encoded query parameter, leave + alone.
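Python's urllib.parse exposes both conventions, which makes the difference concrete. A small sketch, with "1 + 1" as an arbitrary sample value:

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

text = "1 + 1"

# RFC 3986 component style: space becomes %20, plus becomes %2B.
rfc3986_form = quote(text, safe="")

# Form style (application/x-www-form-urlencoded): space becomes +.
form_style = quote_plus(text)

# Decoding: unquote leaves + alone, unquote_plus turns it into a space.
path_decoded = unquote("a+b")
form_decoded = unquote_plus("a+b")
```

Picking the decoder that matches where the string came from avoids silently turning plus signs into spaces, or the other way round.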

Base64

Base64 is defined by RFC 4648. It treats the input as a stream of 8-bit bytes, takes them in groups of three (24 bits), and rewrites each group as four characters drawn from this alphabet:

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
a b c d e f g h i j k l m n o p q r s t u v w x y z
0 1 2 3 4 5 6 7 8 9 + /

That is 64 characters, which is exactly enough to represent the 64 possible values of 6 bits. Three input bytes (24 bits) become four output characters (4 × 6 bits = 24 bits). The mapping is fixed and reversible.

When the input length is not a multiple of three, the encoder pads the result with one or two = characters to keep the output length a multiple of four: one = when the final group held two bytes, two = when it held one. The padding has no meaning beyond signalling how many bytes were in the final group.
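The grouping and padding can be written out by hand. The function below is a teaching sketch (b64encode_sketch is a hypothetical name); real code would call base64.b64encode from the standard library, which the sketch is compared against.

```python
import base64
import string

# The 64-character alphabet from RFC 4648, in index order 0..63.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def b64encode_sketch(data: bytes) -> str:
    """Hypothetical helper: encode bytes three at a time into four characters."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        missing = 3 - len(chunk)  # 0, 1, or 2 bytes short in the final group
        # Pack the group into one 24-bit integer, zero-filling a short group.
        n = int.from_bytes(chunk + b"\x00" * missing, "big")
        # Slice the 24 bits into four 6-bit indexes into the alphabet.
        chars = [ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        if missing:
            # Characters that carry no input bits are replaced by padding.
            chars[-missing:] = "=" * missing
        out.append("".join(chars))
    return "".join(out)


three_bytes = b64encode_sketch(b"Man")
two_bytes = b64encode_sketch(b"Ma")
one_byte = b64encode_sketch(b"M")
```

The three sample inputs cover each padding case: a full group, a two-byte final group with one =, and a one-byte final group with two.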

What Base64 is not

Base64 is not encryption. The mapping is public and reversible by anyone. The output is unreadable to a human eye, but it is not secret. A Base64-encoded password is the same as a plaintext password as far as security goes.
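A two-line round trip makes the point. In this sketch the password value is an arbitrary placeholder; note that decoding needs no key or secret of any kind.

```python
import base64

# Placeholder value standing in for a password.
password = "correct horse"

encoded = base64.b64encode(password.encode("utf-8"))

# Anyone who sees the encoded form can recover the original immediately.
recovered = base64.b64decode(encoded).decode("utf-8")
```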

The URL-safe variant

The standard Base64 alphabet contains + and /. Both have special meanings in URLs, which means a Base64 string with those characters cannot be dropped into a URL without further encoding. RFC 4648 §5 defines a URL-safe variant that replaces + with - and / with _. Implementations usually also omit the trailing = padding.
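In Python the URL-safe alphabet is available as base64.urlsafe_b64encode, which keeps the padding. The helpers below are a sketch of the common unpadded form; b64url_encode and b64url_decode are hypothetical names, and the two input bytes are chosen so that the standard encoding contains both + and /.

```python
import base64


def b64url_encode(data: bytes) -> str:
    """Hypothetical helper: URL-safe alphabet with trailing = removed."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    """Hypothetical helper: restore the padding, then decode."""
    padding = "=" * (-len(text) % 4)
    return base64.urlsafe_b64decode(text + padding)


raw = bytes([0xFB, 0xFF])
standard = base64.b64encode(raw).decode("ascii")
url_safe = b64url_encode(raw)
```

Restoring the padding before decoding matters because Python's decoder rejects input whose length is not a multiple of four.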

This is the form used inside JSON Web Tokens (JWT). A JWT is three URL-safe Base64 strings separated by dots: header, payload, and signature. The payload is plain JSON encoded with URL-safe Base64 so that the whole token can be passed as a header value or a query parameter without further escaping.
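Reading a JWT payload therefore needs nothing more than a split and a decode. The sketch below builds a sample token from made-up claims and a placeholder signature, then reads the payload back. It does not verify the signature, so it is suitable for inspection only, and every name and value in it is hypothetical.

```python
import base64
import json


def b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(segment: str) -> bytes:
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def read_jwt_payload(token: str) -> dict:
    """Hypothetical helper: decode the middle segment WITHOUT verifying it."""
    header_b64, payload_b64, signature_b64 = token.split(".")
    return json.loads(b64url_decode(payload_b64))


# Made-up header and claims, with placeholder bytes where a signature would go.
header = {"alg": "HS256", "typ": "JWT"}
claims = {"sub": "1234", "name": "Ada"}
token = ".".join([
    b64url_encode(json.dumps(header).encode("utf-8")),
    b64url_encode(json.dumps(claims).encode("utf-8")),
    b64url_encode(b"placeholder-not-a-real-signature"),
])

payload = read_jwt_payload(token)
```

Because the payload is only encoded, anyone holding the token can read its claims. Trusting them is a separate step that requires checking the signature.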

Where Base64 shows up

When to use which

The decision is usually clear once the input and the destination are named.

A common mistake

Some teams use Base64 to "encode" a string before putting it in a URL because the output looks safe. The output does avoid most URL-unsafe characters, but standard Base64 still includes + and /, which then need to be URL-encoded again. The result is double escaping for no benefit. URL-safe Base64 fixes the alphabet so the second pass is unnecessary.
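The double escaping is easy to reproduce. In this sketch the three input bytes are chosen so that the standard encoding consists entirely of + and / characters, which is the worst case for the mistake described above.

```python
import base64
from urllib.parse import quote

raw = bytes([0xFB, 0xFF, 0xFE])

# Standard alphabet: the result still needs percent-encoding.
standard = base64.b64encode(raw).decode("ascii")
standard_in_url = quote(standard, safe="")

# URL-safe alphabet: percent-encoding leaves the string unchanged.
url_safe = base64.urlsafe_b64encode(raw).decode("ascii")
url_safe_in_url = quote(url_safe, safe="")
```

Here the standard form triples in length once escaped, while the URL-safe form passes through untouched.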

Decoding errors and what they usually mean

Both encoders are reversible. Errors during decoding almost always come from input that is not in the expected form.
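How these failures surface depends on the library. As one illustration, the sketch below uses Python's standard library, where urllib.parse.unquote is lenient by default and base64.b64decode only rejects stray characters when validate=True is passed. The wrapper names try_url_decode and try_b64_decode are hypothetical.

```python
import base64
from urllib.parse import unquote


def try_url_decode(text: str):
    """Return the decoded text, or None if the bytes are not valid UTF-8."""
    try:
        return unquote(text, errors="strict")
    except UnicodeDecodeError:
        return None


def try_b64_decode(text: str):
    """Return the decoded bytes, or None on bad characters or bad padding."""
    try:
        return base64.b64decode(text, validate=True)
    except ValueError:  # binascii.Error is a subclass of ValueError
        return None


ok_url = try_url_decode("%E4%BD%A0")   # complete three-byte sequence
truncated_url = try_url_decode("%E4%BD")  # sequence cut off after two bytes
stray_percent = unquote("100%")        # malformed escape is passed through as-is

ok_b64 = try_b64_decode("TWFu")
missing_padding = try_b64_decode("TQ")
bad_character = try_b64_decode("TW!u")
```

A lenient decoder hides problems rather than reporting them, so strict modes like these are worth enabling when the input comes from an untrusted source.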

URL decode errors

Base64 decode errors


What this post is not

It is not a security guide. Neither URL encoding nor Base64 protects a value. Both formats are public, fully reversible, and intended to make characters or bytes survive transport. Any value that needs to stay private has to be encrypted separately, before encoding.
