URL encoding and Base64: what they do and when to use each
URL encoding and Base64 are two of the most common text-safety conversions on the web. They look similar at a glance because both turn input into a string of letters, digits, and symbols, but they exist for different reasons. This post explains what each one does, the alphabet it uses, and the practical cases where one is the right tool and the other is not.
The short version
URL encoding takes a piece of text and rewrites the characters that are unsafe to put inside a URL as percent-hex sequences. A space becomes %20. An ampersand becomes %26. Letters, digits, and a small set of safe punctuation stay as they are.
Base64 takes a sequence of bytes — text, an image, a serialised payload, anything — and rewrites the whole stream as letters and digits drawn from a 64-character alphabet. The output is unreadable but compact enough to drop into any text-only context.
The two formats solve different problems. URL encoding keeps a string mostly readable so it can travel inside a URL. Base64 sacrifices readability so that any sequence of bytes can travel inside a text-only channel.
URL encoding (also called percent-encoding)
A URL has a limited safe character set defined by RFC 3986. Inside any single path or query component, the always-safe characters (the RFC calls them unreserved) are letters, digits, and four punctuation marks: - _ . ~. Almost everything else has to be replaced by its UTF-8 byte values written as %XX hex pairs.
The encoder follows a simple rule. For each character:
- If the character is in the safe set, leave it alone.
- If it is not, convert it to its UTF-8 byte representation and write each byte as % followed by two hex digits.
That rule explains every example a person typically runs into:
- " " → %20
- "&" → %26
- "=" → %3D
- "?" → %3F
- "é" → %C3%A9 (two UTF-8 bytes)
- "你" → %E4%BD%A0 (three UTF-8 bytes)
- "🚀" → %F0%9F%9A%80 (four UTF-8 bytes)
Decoding reverses the process. The decoder reads each %XX pair as a byte, accumulates the bytes, and interprets the result as UTF-8 text.
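Here is a minimal Python sketch of the rule and its reverse. It is written out by hand so each step is visible. The helper names percent_encode and percent_decode are illustrative and are not a library API. In real code, the standard library's urllib.parse.quote(value, safe="") and urllib.parse.unquote do the same job. This decoder is deliberately strict and raises ValueError on a malformed escape.

```python
# RFC 3986 unreserved characters: letters, digits, and - _ . ~
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-_.~"
)
HEX_DIGITS = frozenset("0123456789abcdefABCDEF")


def percent_encode(text: str) -> str:
    """Component-style encoding: escape every character outside the unreserved set."""
    out = []
    for ch in text:
        if ch in UNRESERVED:
            out.append(ch)
        else:
            # One %XX sequence per UTF-8 byte, so "é" (2 bytes) becomes %C3%A9.
            out.extend(f"%{byte:02X}" for byte in ch.encode("utf-8"))
    return "".join(out)


def percent_decode(encoded: str) -> str:
    """Collect bytes from %XX pairs and literal characters, then decode as UTF-8."""
    buf = bytearray()
    i = 0
    while i < len(encoded):
        if encoded[i] == "%":
            pair = encoded[i + 1:i + 3]
            if len(pair) != 2 or not all(c in HEX_DIGITS for c in pair):
                raise ValueError(f"malformed escape at index {i}")
            buf.append(int(pair, 16))
            i += 3
        else:
            # Unescaped characters contribute their own UTF-8 bytes.
            buf.extend(encoded[i].encode("utf-8"))
            i += 1
    # Raises UnicodeDecodeError (a ValueError) if the bytes are not valid UTF-8.
    return buf.decode("utf-8")


for sample in [" ", "&", "=", "?", "é", "你", "🚀"]:
    encoded = percent_encode(sample)
    assert percent_decode(encoded) == sample
    print(repr(sample), "->", encoded)
```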
Component vs full URL
There are two common encoders. The component encoder escapes everything that is not in the safe set, including :, /, ?, and &. The full-URL encoder leaves those alone because they form the structure of the URL. Many online tools default to the component encoder, which is what you want when escaping a single value to put inside a query string or path segment.
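In Python, the two behaviours map onto urllib.parse.quote with different safe arguments. The sketch below assumes that leaving the RFC 3986 reserved characters unescaped is a good enough stand-in for a full-URL encoder. The names encode_component and encode_full_url are hypothetical. In browser JavaScript, the corresponding built-ins are encodeURIComponent and encodeURI. One caveat: encodeURIComponent also leaves ! ' ( ) * unescaped.

```python
from urllib.parse import quote

# RFC 3986 reserved characters (gen-delims and sub-delims).
RESERVED = ":/?#[]@!$&'()*+,;="


def encode_component(value: str) -> str:
    # safe="" means only letters, digits and - _ . ~ survive unescaped.
    return quote(value, safe="")


def encode_full_url(url: str) -> str:
    # Keep the structural characters so the URL still parses the same way.
    return quote(url, safe=RESERVED)


print(encode_component("a/b?c=d&e"))                      # a%2Fb%3Fc%3Dd%26e
print(encode_full_url("https://example.com/café?q=a b"))  # https://example.com/caf%C3%A9?q=a%20b
```

% is not in RESERVED, so running encode_full_url over an already-encoded URL turns %20 into %2520. Apply it to raw text only.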
Where URL encoding shows up
- Query string values: ?q=caf%C3%A9 for a search of café.
- Sitemap entries with non-ASCII paths: every URL in sitemap.xml should have the parts outside the safe set percent-encoded.
- Hreflang and canonical tags pointing at internationalised URLs.
- Redirect rules that move a path containing accents or spaces to a new location.
- The href attribute of any link generated from user input.
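The following Python sketch covers two of those cases and assumes a hypothetical origin of https://example.com. The first is a search URL built with urllib.parse.urlencode. The second is a sitemap loc value built with urllib.parse.quote. Passing quote_via=quote makes urlencode write spaces as %20, whereas its default, quote_plus, would write them as +. The helper names search_url and sitemap_loc are illustrative only.

```python
from urllib.parse import quote, urlencode


def search_url(base: str, term: str) -> str:
    # quote_via=quote writes spaces as %20 instead of the form-style "+".
    return base + "?" + urlencode({"q": term}, quote_via=quote)


def sitemap_loc(origin: str, path: str) -> str:
    # Keep "/" so the path structure survives; escape everything outside the safe set.
    return origin + quote(path, safe="/")


print(search_url("https://example.com/search", "café"))
print(sitemap_loc("https://example.com", "/menü/crème brûlée"))
```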
One small gotcha: + versus %20
Standard URL component encoding (RFC 3986) writes a space as %20, and a + in the encoded text means a literal plus sign. HTML form submissions use a different convention called application/x-www-form-urlencoded, where a space is written as + and a literal plus is written as %2B. The two forms look similar but mean different things for spaces.
When decoding something that came out of a form submission, replace + with a space before running the decode. When decoding a string that came from a path or a properly-encoded query parameter, leave + alone.
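Python's urllib.parse keeps the two conventions in separate functions, so the difference is easy to see. The input string below is only an example.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

text = "1 + 1"

component = quote(text, safe="")   # RFC 3986 style: "1%20%2B%201"
form = quote_plus(text)            # form style:     "1+%2B+1"

# Decoding form data: unquote_plus turns "+" into a space before handling %XX.
assert unquote_plus(form) == text
# Decoding the same form data as a URI component keeps "+" literally.
assert unquote(form) == "1+++1"
# The RFC 3986 string round-trips with plain unquote.
assert unquote(component) == text
```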
Base64
Base64 is defined by RFC 4648. It treats the input as a stream of 8-bit bytes, takes them in groups of three (24 bits), and rewrites each group as four characters drawn from this alphabet:
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z a b c d e f g h i j k l m n o p q r s t u v w x y z 0 1 2 3 4 5 6 7 8 9 + /
That is 64 characters, which is exactly enough to represent the 64 possible values of 6 bits. Three input bytes (24 bits) become four output characters (4 × 6 bits = 24 bits). The mapping is fixed and reversible.
When the input length is not a multiple of three, the encoder pads the result with one or two = characters to keep the output length a multiple of four. The padding has no meaning beyond signalling how many bytes were in the final group.
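The grouping and padding rules fit in a short function. This hand-written b64_encode is for illustration only. Its output should agree with Python's base64.b64encode, which is what real code should call. The sample inputs are the RFC 4648 test vectors.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def b64_encode(data: bytes) -> str:
    out = []
    for i in range(0, len(data), 3):
        group = data[i:i + 3]
        # Pack up to three bytes into 24 bits, zero-filling a short final group.
        n = int.from_bytes(group + b"\x00" * (3 - len(group)), "big")
        # Cut the 24 bits into four 6-bit indexes into the 64-character alphabet.
        chars = "".join(ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0))
        # 1 input byte -> 2 real chars + "==", 2 bytes -> 3 chars + "=", 3 bytes -> 4 chars.
        keep = len(group) + 1
        out.append(chars[:keep] + "=" * (4 - keep))
    return "".join(out)


for sample in (b"f", b"fo", b"foo", b"foobar"):
    print(sample, "->", b64_encode(sample))
```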
What Base64 is not
Base64 is not encryption. The mapping is public and reversible by anyone. The output is unreadable to a human eye, but it is not secret. A Base64-encoded password is the same as a plaintext password as far as security goes.
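Decoding needs no key, which is the whole point. Python's standard base64 module shows this in two calls. The sample password is made up.

```python
import base64

secret = b"hunter2"  # a made-up password
encoded = base64.b64encode(secret).decode("ascii")
print(encoded)  # aHVudGVyMg==

# Anyone holding the encoded string gets the original back, no key involved.
recovered = base64.b64decode(encoded).decode("utf-8")
print(recovered)  # hunter2
```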
The URL-safe variant
The standard Base64 alphabet contains + and /. Both have special meanings in URLs, which means a Base64 string with those characters cannot be dropped into a URL without further encoding. RFC 4648 §5 defines a URL-safe variant that replaces + with - and / with _. Formats built on it, JWT among them, often omit the trailing = padding as well, which RFC 4648 allows when the referring specification says so.
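The difference shows up with bytes whose 6-bit groups land on index 62 or 63. The sketch uses Python's base64.urlsafe_b64encode and urlsafe_b64decode. The two input bytes are chosen only to trigger + and /, and urlsafe_b64decode_unpadded is a hypothetical helper. Python's urlsafe_b64decode expects the padding to be present, so the helper restores it first.

```python
import base64

raw = b"\xfb\xff"  # 6-bit groups 62, 63, 60 -> "+", "/", "8" in the standard alphabet

standard = base64.b64encode(raw).decode("ascii")          # "+/8="
url_safe = base64.urlsafe_b64encode(raw).decode("ascii")  # "-_8="
unpadded = url_safe.rstrip("=")                           # "-_8"


def urlsafe_b64decode_unpadded(s: str) -> bytes:
    # Add back the "=" characters so the length is a multiple of four.
    return base64.urlsafe_b64decode(s + "=" * (-len(s) % 4))


assert urlsafe_b64decode_unpadded(unpadded) == raw
```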
This is the form used inside JSON Web Tokens (JWT). A JWT is three URL-safe Base64 strings separated by dots: header, payload, and signature. The payload is plain JSON encoded with URL-safe Base64 so that the whole token can be passed as a header value or a query parameter without further escaping.
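Reading the header and payload of a token therefore comes down to URL-safe Base64 decoding plus JSON parsing. The helpers below are hypothetical. They build an unsigned demo token (alg "none") so the example is self-contained. Decoding is not verification: anything that trusts a token must check the signature with a proper JWT library.

```python
import base64
import json


def b64url_encode(data: bytes) -> str:
    # URL-safe alphabet, trailing "=" padding dropped as JWTs do.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(segment: str) -> bytes:
    # Restore padding before handing the segment to the standard decoder.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def jwt_claims(token: str) -> tuple:
    """Return (header, payload) as dicts. Does NOT check the signature."""
    header_b64, payload_b64, _signature_b64 = token.split(".")
    return json.loads(b64url_decode(header_b64)), json.loads(b64url_decode(payload_b64))


# Unsigned demo token: an empty third segment stands in for the signature.
demo = ".".join([
    b64url_encode(json.dumps({"alg": "none", "typ": "JWT"}).encode("utf-8")),
    b64url_encode(json.dumps({"sub": "123", "name": "Zoë"}).encode("utf-8")),
    "",
])
print(demo)
print(jwt_claims(demo))
```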
Where Base64 shows up
- Embedded images inside CSS or HTML: data:image/png;base64,iVBORw0KGgo.... The favicon on this site is one example.
- The credentials part of an HTTP Authorization: Basic header is a Base64-encoded username:password pair.
- JWT header and payload sections, both URL-safe Base64.
- Binary attachments inside JSON or XML payloads that have no native binary type.
- Small icons or fonts inlined into a page to avoid an extra HTTP request.
- Anywhere a binary blob has to travel through a channel that only accepts text.
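Here are two of these uses sketched in Python. The names data_url and basic_auth_header are hypothetical helpers. The credentials are the example pair from RFC 7617. Encoding them as UTF-8 is an assumption, since RFC 7617 leaves the character encoding open unless the server advertises charset="UTF-8".

```python
import base64


def data_url(mime_type: str, payload: bytes) -> str:
    # data:<type>;base64,<standard Base64 of the raw bytes>
    return f"data:{mime_type};base64,{base64.b64encode(payload).decode('ascii')}"


def basic_auth_header(username: str, password: str) -> str:
    # Authorization: Basic <Base64 of "username:password">; UTF-8 is an assumption.
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


print(basic_auth_header("Aladdin", "open sesame"))  # Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
print(data_url("text/plain", b"hi"))                 # data:text/plain;base64,aGk=
```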
When to use which
The decision is usually clear once the input and the destination are named.
- Going into a URL? Use URL encoding. It keeps the value mostly readable and is the convention every HTTP client and server already understands.
- Carrying raw bytes through a text-only channel? Use Base64. It can represent any byte sequence, but the output is roughly 33% larger than the input and looks like noise.
- Embedding a small binary asset inside text? Use Base64 inside a data: URL. Browsers and CSS parsers understand this form.
- Passing a binary token inside a URL? Use URL-safe Base64. Standard Base64 would need a second pass of URL encoding for +, /, and =; URL-safe Base64 without padding avoids that step.
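The 33% figure follows from the 3-to-4 ratio: with padding, n input bytes become 4 × ceil(n / 3) characters. The following small sketch uses a hypothetical base64_length helper with integer arithmetic. It also gives the unpadded length used by URL-safe tokens.

```python
import base64


def base64_length(n_bytes: int, padded: bool = True) -> int:
    full, tail = divmod(n_bytes, 3)
    if padded:
        # Every started group of three bytes becomes four characters.
        return 4 * (full + (1 if tail else 0))
    # Unpadded: a 1-byte tail needs 2 characters, a 2-byte tail needs 3.
    return 4 * full + (tail + 1 if tail else 0)


for n in (1, 3, 300, 3000):
    chars = base64_length(n)
    # Cross-check the formula against the real encoder.
    assert chars == len(base64.b64encode(bytes(n)))
    print(f"{n} bytes -> {chars} chars ({chars / n:.2f}x)")
```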
A common mistake
Some teams Base64-encode a string before putting it in a URL because the output looks safe. The output does avoid most URL-unsafe characters, but standard Base64 still includes + and /, plus = padding, and all three then need to be URL-encoded again. The result is double escaping for no benefit. URL-safe Base64 without padding fixes the alphabet so the second pass is unnecessary.
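Running both alphabets through urllib.parse.quote makes the double escaping visible. The two-byte token is arbitrary. It was picked so that standard Base64 produces +, /, and =.

```python
import base64
from urllib.parse import quote

token = b"\xfb\xff"  # arbitrary bytes that hit "+", "/" and padding

standard = base64.b64encode(token).decode("ascii")
url_safe = base64.urlsafe_b64encode(token).rstrip(b"=").decode("ascii")

# Standard Base64 needs a second, URL-encoding pass before it can sit in a query string.
print(standard, "->", quote(standard, safe=""))  # +/8= -> %2B%2F8%3D
# URL-safe Base64 without padding passes through unchanged.
print(url_safe, "->", quote(url_safe, safe=""))  # -_8 -> -_8
```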
Decoding errors and what they usually mean
Both encoders are reversible. Errors during decoding almost always come from input that is not in the expected form.
URL decode errors
- Malformed percent escape. The input has a % followed by something that is not two hex digits. Examples: a stray % on its own, %2 at the end of a string, or %2G with a non-hex second digit.
- Mixed conventions. The string was form-encoded but is being decoded as a URI component. The result will keep + instead of turning it back into a space.
Base64 decode errors
- Wrong alphabet. A string being decoded as standard Base64 contains - or _, or a string being decoded as URL-safe Base64 contains + or /. The two alphabets should not be mixed in the same input.
- Wrong padding. One = where two are expected, or extra characters after the padding. Missing padding is accepted by many decoders, because RFC 4648 lets formats such as JWT omit it, but strict decoders reject it.
- Decoded bytes are not text. Decoding succeeds at the byte level, but the resulting bytes do not form valid UTF-8. This is the normal case for Base64 input that encodes a binary asset rather than text. The bytes are intact, just not displayable as a string.
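The following sketch is a hypothetical strict decoder built on Python's base64.b64decode that reports each of these cases separately. It accepts missing padding, to match JWT-style producers, but rejects characters from the other alphabet and wrong amounts of padding. The try_as_text helper shows the final text-or-binary check. These policy choices are assumptions, not requirements of RFC 4648.

```python
import base64
import re

_STANDARD = re.compile(r"[A-Za-z0-9+/]*")
_URL_SAFE = re.compile(r"[A-Za-z0-9_-]*")


def decode_base64(s: str, url_safe: bool = False) -> bytes:
    body = s.rstrip("=")
    # Wrong alphabet: anything outside the chosen 64 characters (including stray "=").
    if not (_URL_SAFE if url_safe else _STANDARD).fullmatch(body):
        raise ValueError("character outside the expected Base64 alphabet")
    # A 1-character tail can never come from whole bytes.
    if len(body) % 4 == 1:
        raise ValueError("impossible Base64 length")
    # Wrong padding: if any "=" is present it must bring the length to a multiple of 4.
    expected_pad = -len(body) % 4
    given_pad = len(s) - len(body)
    if given_pad and given_pad != expected_pad:
        raise ValueError(f"expected {expected_pad} '=' characters, got {given_pad}")
    altchars = b"-_" if url_safe else None
    return base64.b64decode(body + "=" * expected_pad, altchars=altchars, validate=True)


def try_as_text(data: bytes):
    """Return the bytes as str if they are valid UTF-8, else None (binary payload)."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None
```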
What this post is not
It is not a security guide. Neither URL encoding nor Base64 protects a value. Both formats are public, fully reversible, and intended to make characters or bytes survive transport. Any value that needs to stay private has to be encrypted separately, before encoding.