What is URL encoding?

URL encoding, also known as percent-encoding, is a method of encoding arbitrary data in a Uniform Resource Identifier (URI) using only a limited set of US-ASCII characters. It is also used to prepare data of the application/x-www-form-urlencoded media type, which is commonly used when submitting HTML form data in HTTP requests.

Details of URL encoding

A percent-encoded octet is encoded as a character triplet, consisting of the percent character "%" followed by the two hexadecimal digits representing that octet's numeric value. For example, "%20" is the percent-encoding for the binary octet "00100000" (ABNF: %x20), which in US-ASCII corresponds to the space character (SP).
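
The triplet rule above can be sketched in a few lines of JavaScript. The function name percentEncodeOctet is hypothetical and used only for illustration; the sketch assumes the octet is given as an integer from 0 to 255 and emits uppercase hexadecimal digits.

```javascript
// Hypothetical helper (not part of any library): percent-encode one octet.
function percentEncodeOctet(octet) {
  if (!Number.isInteger(octet) || octet < 0 || octet > 255) {
    throw new RangeError("octet must be an integer from 0 to 255");
  }
  // "%" followed by exactly two hexadecimal digits, zero-padded.
  return "%" + octet.toString(16).toUpperCase().padStart(2, "0");
}

console.log(percentEncodeOctet(0x20)); // "%20" (the space character)
console.log(percentEncodeOctet(0x0a)); // "%0A" (line feed)
```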

According to RFC 3986, the characters allowed in a URI are either reserved or unreserved (or a percent character as part of a percent-encoding).

Reserved characters are those that sometimes have a special meaning. For example, forward slash characters separate different parts of a URL (or, more generally, a URI). The reserved characters are:

! * ' ( ) ; : @ & = + $ , / ? # [ ]

Unreserved characters can be percent-encoded, but should not be. They are the letters A to Z and a to z, the digits 0 to 9, and the characters - _ . ~
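
As a rough illustration of how the reserved/unreserved split drives an encoder, the sketch below leaves unreserved characters untouched and percent-encodes every other octet. The name percentEncode is hypothetical, and converting the text to UTF-8 first is an assumption (it is the common convention, but not something the text above requires).

```javascript
// Unreserved set: A-Z, a-z, 0-9 and - _ . ~
const UNRESERVED = /^[A-Za-z0-9\-_.~]$/;

// Hypothetical helper: encode everything except unreserved characters.
function percentEncode(text) {
  // Assumption: the text is converted to UTF-8 octets before encoding.
  const bytes = new TextEncoder().encode(text);
  let out = "";
  for (const byte of bytes) {
    const ch = String.fromCharCode(byte);
    out += UNRESERVED.test(ch)
      ? ch
      : "%" + byte.toString(16).toUpperCase().padStart(2, "0");
  }
  return out;
}

console.log(percentEncode("a b/c~")); // "a%20b%2Fc~"
console.log(percentEncode("é"));      // "%C3%A9" (two UTF-8 octets)
```

This sketch encodes all reserved characters, which suits a single component such as a query value. It is not suitable for a whole URL, where delimiters like / and ? must stay as they are.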

The JavaScript function encodeURIComponent performs URL encoding and is widely used in web development. However, it does not encode the characters ! * ' ( ), which RFC 3986 lists as reserved, so its output does not fully follow that specification. The tools on this page are intended to follow RFC 3986 more closely than encodeURIComponent.
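
One common workaround is to wrap encodeURIComponent and encode the five remaining characters afterwards. The sketch below is only an illustration of that idea, not necessarily how the tools on this page work; the name strictEncodeURIComponent is hypothetical.

```javascript
// Hypothetical wrapper: encodeURIComponent, plus the characters ! ' ( ) *
// that it leaves unencoded.
function strictEncodeURIComponent(value) {
  return encodeURIComponent(value).replace(
    /[!'()*]/g,
    (ch) => "%" + ch.charCodeAt(0).toString(16).toUpperCase()
  );
}

console.log(encodeURIComponent("hello (world)!"));       // "hello%20(world)!"
console.log(strictEncodeURIComponent("hello (world)!")); // "hello%20%28world%29%21"
```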

See also