URLs Explained
A URL (Uniform Resource Locator) is a means of locating a resource on the web, i.e., its address. A URL is the text you enter into the address field of a browser when you want to visit a Web site. For example, http://www.example.com is a valid URL.
Elements of a URL
URLs conform to a general format of nine elements:
<scheme>://<user>:<password>@<host>:<port>/<path>;<parameters>?<query>#<fragment>
Not all of these elements need be present in a URL.
- scheme: The protocol to use when accessing a resource. Example schemes include http, https, ftp and file.
- user: Some schemes require a username to access a resource, e.g., mailto:bob@example.com.
- password: A password can sometimes be included after the username, separated from it by a colon (:).
- host: The host name or IP address of the server hosting the resource.
- port: Servers listen for connections on specific ports. Most schemes have a default port number. The default port number for http is port 80.
- path: The local name for the resource on the hosting server. For example, the following URL contains the path component images/welcome.gif: http://www.example.co.uk/images/welcome.gif
- parameters: Parameters are name/value pairs used by some schemes to specify input data. Multiple pairs can be specified, separated from each other and the remainder of the URL by semi-colons (;).
- query: Used by some schemes to pass data to applications (e.g., databases, search engines). There is no fixed format for query data. The query is separated from the rest of the URL by a question mark (?).
- fragment: Sometimes known as a reference, it specifies a part of the resource. The fragment is not passed to the server but is processed locally by the client. It is often used to navigate to specific points on a Web page. It is separated from the rest of the URL by a hash character (#).
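The elements listed above can be illustrated with a short sketch. It uses Python's standard urllib.parse module rather than any Eggplant Performance class, and the split_url helper, the example URL and its credentials are all hypothetical, chosen only so that every one of the nine elements is present.

```python
from urllib.parse import urlparse


def split_url(url):
    """Return the nine URL elements as a dictionary (illustrative helper)."""
    parts = urlparse(url)
    return {
        "scheme": parts.scheme,
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        "path": parts.path,
        "parameters": parts.params,
        "query": parts.query,
        "fragment": parts.fragment,
    }


# Hypothetical URL that contains all nine elements.
example = "http://bob:secret@www.example.com:8080/images/welcome.gif;type=a?size=large#top"
elements = split_url(example)

for name, value in elements.items():
    print(f"{name}: {value}")
```

Note that urlparse reports the path with its leading slash (/images/welcome.gif). For a URL that omits elements, such as http://www.example.com, the missing user, password and port come back as None and the remaining missing elements as empty strings.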
Encoding of Unsafe Characters
URLs are only allowed to contain characters from a relatively small safe alphabet. Any unsafe characters within a URL should be encoded.
Unsafe characters are encoded using an escape sequence. This sequence is a percent (%) character followed by two hexadecimal characters representing the ASCII code of the character. For example, the SPACE character is an unsafe character that has an ASCII code of 32 (Hex value 20), so it is encoded as %20.
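As a rough illustration of the escape sequence, the sketch below builds the encoding of a single character by hand and then uses the quote and unquote functions from Python's standard urllib.parse module to round-trip a whole string. The percent_encode_char helper and the file name are hypothetical, and this is not the FcUrl implementation.

```python
from urllib.parse import quote, unquote


def percent_encode_char(ch):
    """Encode one ASCII character as % followed by two hexadecimal digits."""
    return "%{:02X}".format(ord(ch))


# SPACE has ASCII code 32, which is 20 in hexadecimal.
space_encoded = percent_encode_char(" ")
print(space_encoded)

# Round trip of a hypothetical file name containing a space.
encoded = quote("my file.gif")
decoded = unquote(encoded)
print(encoded)
print(decoded)
```

Here the space in the file name is the only character that needs escaping, so the rest of the string passes through unchanged and decoding restores the original text.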
The FcUrl class has methods encode() and decode() for encoding and decoding strings. In most circumstances, Eggplant Performance automatically handles the URL encoding for you.