URL components and their names
This post explains the names of the different parts of a URL, using the following URL as an example:
https://blog.mysite.com:8080/marketing/parts-url;foo=bar?key1=hello&key2=world#qux
Names for individual components:
- https: Scheme
- blog: Subdomain
- mysite: 2nd-level domain
- com: Top-level domain
- blog.mysite.com: Host
- 8080: Port
- /marketing/parts-url: Path
- foo=bar: Params
- key1=hello&key2=world: Query
- qux: Fragment
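As a rough illustration of how the host breaks down into subdomain, 2nd-level domain and top-level domain, here is a minimal sketch. The helper name `split_host` is made up for this post, and the split is deliberately naive: it assumes the top-level domain is a single label, so it gives the wrong answer for hosts under multi-label suffixes such as `co.uk`.

```python
def split_host(host):
    """Naively split a host into (subdomain, second_level, tld).

    Assumption: the top-level domain is exactly one label. Hosts such as
    'shop.example.co.uk' need a public suffix list to split correctly.
    """
    labels = host.split(".")
    tld = labels[-1]
    second_level = labels[-2] if len(labels) >= 2 else ""
    # Everything left of the 2nd-level domain is treated as the subdomain.
    subdomain = ".".join(labels[:-2])
    return subdomain, second_level, tld


print(split_host("blog.mysite.com"))  # ('blog', 'mysite', 'com')
```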
Names for combinations of multiple components:
- blog.mysite.com:8080: Network location/Socket, i.e. <host>:<port>
- https://blog.mysite.com:8080: Origin, i.e. <scheme>://<host>:<port>
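The two combinations above can be rebuilt from the individual components. The sketch below uses `urlparse` from Python's standard library. The helper names `network_location_of` and `origin_of` are made up for illustration, and the code assumes the URL has a host and carries no default-port normalization (an explicit `:443` would be kept as written).

```python
from urllib.parse import urlparse


def network_location_of(url):
    """Return '<host>:<port>', or just '<host>' when no port is given."""
    parts = urlparse(url)
    # .hostname is lowercased and excludes any 'user:password@' prefix.
    if parts.port is None:
        return parts.hostname
    return f"{parts.hostname}:{parts.port}"


def origin_of(url):
    """Return '<scheme>://<host>:<port>' for the given URL."""
    return f"{urlparse(url).scheme}://{network_location_of(url)}"


url = "https://blog.mysite.com:8080/marketing/parts-url;foo=bar?key1=hello&key2=world#qux"
print(network_location_of(url))  # blog.mysite.com:8080
print(origin_of(url))            # https://blog.mysite.com:8080
```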
Parsing example in Python:
>>> from urllib.parse import urlparse
>>> urlparse("https://blog.example.com:8080/marketing/parts-url;foo=bar?key1=hello&key2=world#qux")
ParseResult(scheme='https', netloc='blog.example.com:8080', path='/marketing/parts-url', params='foo=bar', query='key1=hello&key2=world', fragment='qux')
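The query component can be broken down further into key/value pairs. A small sketch, using `parse_qs` and `urlunparse` from the same standard-library module: it decodes the query into a dictionary and then reassembles the parsed components to check that nothing was lost for this particular URL.

```python
from urllib.parse import urlparse, parse_qs, urlunparse

url = "https://blog.example.com:8080/marketing/parts-url;foo=bar?key1=hello&key2=world#qux"
parts = urlparse(url)

# parse_qs maps each key to a list, because a key may repeat in a query.
query = parse_qs(parts.query)
print(query)  # {'key1': ['hello'], 'key2': ['world']}

# urlunparse joins the six components back into a single URL string.
rebuilt = urlunparse(parts)
print(rebuilt == url)  # True
```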
References:
- RFC 1808, Relative Uniform Resource Locators
- Generic-RL syntax: <scheme>://<net_loc>/<path>;<params>?<query>#<fragment>