URI-XS/lib/URI/XS.pod
=head1 NAME
URI::XS - fast URI framework, compatible with classic URI.pm, with C++ interface
=head1 DESCRIPTION
URI::XS offers functionality similar to L<URI>, but is much faster (sometimes 100x).
URI::XS conforms to RFC 3986.
=head1 SYNOPSIS
    use feature 'say';
    use Data::Dumper;
    use URI::XS;

    my $u = URI::XS->new("http://mysite.com:8080/my/path?a=b&c=d#myhash");
    say $u->scheme;
    say $u->host;
    say $u->port;
    say $u->path;
    say $u->query_string;
    say Dumper($u->query);
    say $u->fragment;

    $u = URI::XS->new("about:blank");
    say $u->scheme;
    say $u->path;

    $u->clone;
=head1 FUNCTIONS
=head4 uri($url, [$flags])
Creates a URI object from the string $url. The created object will be of a special subclass (URI::XS::http, URI::XS::ftp, ...) if
the scheme is supported. Otherwise it will be of class URI::XS.

The created object is in "strict" mode, i.e. it has additional methods according to its scheme, but you cannot change its
scheme. You can still set a new url on this object, but it must have the same scheme or an error will be raised.

"Strict" (customized) classes also have their own constructors, possibly with additional arguments, like this:

    my $url = URI::XS::http->new("http://google.com?b=20", q => 'something', a => 10);
    say $url->query_string; # q=something&a=10&b=20
    $url->scheme('ftp'); # CROAKS, changing scheme is disallowed.

See custom classes' docs for details.
$flags is a bitmask of one or more of these:
=over
=item ALLOW_SUFFIX_REFERENCE
By default, the RFC doesn't allow urls to begin with an authority (i.e. host and port). For example

    www.google.com/hello

is not interpreted as you might think. In this case, the url is treated as relative and "www.google.com/hello" is the path.

Enabling this flag makes URI::XS detect such urls:

    www.google.com/hello    no scheme, host is www.google.com, path is /hello
    hello/world             no scheme, host is hello, path is world
    /hello/world            no scheme, no host, path is /hello/world

However, URI::XS never produces RFC-noncompliant urls on output, so

    say uri("www.google.com/hello", ALLOW_SUFFIX_REFERENCE);

prints "//www.google.com/hello" (scheme-relative format), making it valid.

See L<https://tools.ietf.org/html/rfc3986#section-4.5>
=item QUERY_PARAM_SEMICOLON
If set, URI::XS uses ';' instead of the default '&' as the delimiter between query string params, for both input and output.
=item ALLOW_EXTENDED_CHARS
Allows parsing of non-RFC urls that contain additional chars in the query string ('"', '{', '}' and '|'), as browsers and other software handling user input do,
for example

    http://host.com?param={"key":"val"}

The stringified form, however, will always be RFC-compliant.
=back
=head4 register_scheme($scheme, $perl_class)
Registers a new scheme and a perl class for that scheme (the class must inherit from URI::XS). This only applies when creating
"strict" (customized) urls via the uri() function (or via a custom class' constructor).

As URI::XS is at its core a C++ framework, you will also want to register a C++ class for that scheme in XS, to be able to do something
when such uris are constructed (even from XS/C code).

See L</REGISTERING SCHEMAS> for how to do that.
=head4 encode_uri_component($bytes, [$use_plus]), encodeURIComponent($bytes, [$use_plus])
Does what JavaScript's encodeURIComponent does.
    $uri = encode_uri_component("http://www.example.com/");
    # http%3A%2F%2Fwww.example.com%2F

If $use_plus is true, spaces are encoded as '+' instead of '%20'.
=head4 decode_uri_component($bytes), decodeURIComponent($bytes)
Does what JavaScript's decodeURIComponent does.
    $str = decode_uri_component("http%3A%2F%2Fwww.example.com%2F");
    # http://www.example.com/
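The two functions above are inverses of each other, which the following sketch illustrates. It assumes the functions can be called by their fully qualified names (as register_scheme() is elsewhere in this document), so it does not depend on what URI::XS exports; the sample strings are arbitrary.

```perl
use strict;
use warnings;
use URI::XS;

# Round trip: encode a string, then decode it back.
my $raw     = "http://www.example.com/";
my $encoded = URI::XS::encode_uri_component($raw);
my $decoded = URI::XS::decode_uri_component($encoded);

print "$encoded\n"; # http%3A%2F%2Fwww.example.com%2F
print "$decoded\n"; # http://www.example.com/

# A true second argument ($use_plus) switches the encoding of spaces.
my $percent = URI::XS::encode_uri_component("a b");    # a%20b
my $plus    = URI::XS::encode_uri_component("a b", 1); # a+b
print "$percent\n$plus\n";
```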
=head1 CLASS METHODS
=head4 new($url, [$flags])
Creates a URI object from the string $url. The created object will be "non-strict", i.e. it will be of class "URI::XS" and won't have any
scheme-specific methods, but you can change its scheme and set new urls with a different scheme on the object.

register_scheme() has no effect on this method.

$flags are the same as for the uri() function.
=head1 OBJECT METHODS
=head4 url([$newurl], [$flags])
Returns the url as a string. If $newurl is present, sets this url in the object (respecting $flags). May croak if the object is in "strict" mode
and $newurl's scheme differs from the current one. If the object is "strict" and $newurl has no scheme, the current scheme is assumed
(for a non-strict object, the scheme is left empty). Examples:
    my $u = URI::XS->new("http://facebook.com"); # non-strict mode
    $u->query({a => 1, b => 2});
    say $u->url; # http://facebook.com?a=1&b=2
    $u->url("//twitter.com"); # scheme-relative url
    say $u; # //twitter.com

    $u = uri("http://facebook.com"); # strict mode
    $u->url("//twitter.com");
    say $u; # http://twitter.com, force object's scheme as it cannot change
    $u->url("svn://svn.com"); # croaks, scheme cannot change

    $u = URI::XS::ftp->new("//my.com"); # strict mode
    say $u; # ftp://my.com
    $u->url("http://ya.ru"); # croaks
=head4 scheme([$new_scheme]), proto([$new_scheme]), protocol([$new_scheme])
Sets/returns uri's scheme. May croak if object is strict and new scheme differs from current.
=head4 user_info([$new_uinfo])
Sets/returns user_info part of uri (ftp://<user_info>@host/...)
=head4 host([$new_host])
Sets/returns host part of uri
=head4 port([$new_port])
Sets/returns port. If no port is explicitly present in uri, returns default port for uri's scheme. If no scheme in uri, returns 0.
=head4 explicit_port()
Returns port if it was explicitly set via port() or was present in uri. Otherwise returns 0.
=head4 default_port()
Returns default port for the uri's scheme. Returns 0 if scheme is not specified/not supported.
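As a small illustration of how port(), explicit_port() and default_port() relate, the sketch below reads all three on urls with and without an explicit port. The urls are arbitrary samples; the values in the comments follow from the descriptions above and from the location() examples in this document.

```perl
use strict;
use warnings;
use URI::XS;

# No port in the url: port() falls back to the scheme's default.
my $implicit = URI::XS->new("http://ya.ru");
print $implicit->port, "\n";          # 80
print $implicit->explicit_port, "\n"; # 0
print $implicit->default_port, "\n";  # 80

# Port given in the url: port() and explicit_port() agree.
my $explicit = URI::XS->new("http://ya.ru:8080");
print $explicit->port, "\n";          # 8080
print $explicit->explicit_port, "\n"; # 8080
print $explicit->default_port, "\n";  # 80

# No scheme at all: there is no default to fall back to.
my $schemeless = URI::XS->new("//ya.ru");
print $schemeless->port, "\n";        # 0
```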
=head4 path([$new_path])
Sets/returns path part of uri as string.
=head4 path_info()
Returns URI-decoded path as per RFC 3875
=head4 query_string([$new_query_string])
Sets/returns query string part of uri as string. String is expected/returned in decoded, but plain format,
i.e. after uri encode of all params, but before encode_uri_component of the whole result string.
=head4 raw_query([$new_query_string])
Sets/returns query string part of uri as string. String is expected/returned in RAW (encoded) format, i.e.
after uri encode of all params and after encode_uri_component of the whole result string.
=head4 query([\%new_query | %new_query | $new_query_string])
If no params specified, returns query part of uri as hashref. Keys/values are returned unencoded.
If uri has no query params, empty hash is returned.
If you change returned hash, no changes will occur in uri object.
To commit these changes, set this hash again via query($hash) or use param() method.
If params are specified, sets new query from hash or hashref or string.
Keys/values are accepted unencoded for hash/hashref.
If you pass query as string, the effect will be the same as calling query_string($new_query).
If you want to make query strings like 'a=1&a=2&a=3', set "a"'s value to an arrayref of values, like:

    $u = URI::XS->new("http://ya.ru");
    $u->query(b => 10, a => [1,2,3]);
    say $u; # http://ya.ru?b=10&a=1&a=2&a=3

Note, however, that multiparams are NOT returned in the hashref:

    say Dumper($u->query); # {b => 10, a => 1/2/3 }

The value of "a" may be any of 1, 2 or 3, depending on hash order. This is done because most of the time you don't want multiparams and don't
want to be surprised by an arrayref in the query if someone passes you a second value for some key.

If you want to get all values of a multiparam, use multiparam().
=head4 add_query(\%query | %query | $query_string)
Like query() but instead of replacing, adds passed query to existing query. If some key already exists in uri's query, it doesn't
get replaced, instead it becomes a multiparam.
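A minimal sketch of the difference from query(): adding a key that already exists turns it into a multiparam rather than overwriting it. The url and keys are arbitrary samples, and the values are sorted before printing because the order of multiparam values is not specified here.

```perl
use strict;
use warnings;
use URI::XS;

my $u = URI::XS->new("http://ya.ru?a=1");

# "a" already exists, so it gains a second value; "b" is simply added.
$u->add_query(a => 2, b => 3);

my @values = $u->multiparam('a');
my @sorted = sort { $a <=> $b } @values;
my $count  = $u->nparam;

print "a: @sorted\n";    # a: 1 2
print "total: $count\n"; # total: 3
```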
=head4 param($name, [$value | \@values])
Without a second arg, returns the value of query param '$name'. If no such param exists, returns undef. If param $name is a multiparam,
returns one of its values.
With $value supplied, replaces current value(values) of $name with $value.
With \@values supplied, replaces current value(values) of $name with \@values ($name becomes multiparam).
=head4 multiparam($name, [$value | \@values])
Does the same as param() does. The only difference is when called without second arg, returns a list of param's values if param
is a multiparam. Also returns empty list instead of undef if there is no such param in query.
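The difference between param() and multiparam() shows up only when reading, as the following sketch tries to illustrate. The url and parameter names are arbitrary samples; multiparam values are sorted before printing since their order is not specified here.

```perl
use strict;
use warnings;
use URI::XS;

my $u = URI::XS->new("http://ya.ru?a=1&a=2&b=3");

# Reading: param() yields a single value, multiparam() yields all of them.
my $single  = $u->param('b');             # 3
my $one_of  = $u->param('a');             # either 1 or 2
my @all     = sort { $a <=> $b } $u->multiparam('a');
my $missing = $u->param('missing');       # undef
my @none    = $u->multiparam('missing');  # empty list

print "b: $single\n";
print "a: @all\n";                        # a: 1 2

# Writing: a plain value replaces every existing value of the param.
$u->param('a', 5);
my @after = $u->multiparam('a');
print "a after set: @after\n";            # a after set: 5
```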
=head4 nparam()
Returns the number of parameters in the query, counting every value of a multiparam separately. For example:

    "http://google.com"; # nparam() == 0
    "http://google.com?a=1&b=2"; # nparam() == 2
    "http://google.com?a=1&b=2&b=3&b=4"; # nparam() == 4
=head4 remove_param($name)
Removes param $name from query. If param is a multiparam, removes all its values.
=head4 fragment([$new_fragment]), hash([$new_fragment])
Sets/returns fragment (hash) part of uri.
=head4 location([$new_location])
Sets/returns the location part of the uri. Location is "host:port" together. If no port was explicitly set, the returned
location will contain the default port for the scheme. If no scheme is defined, or the scheme is unknown, the returned location will contain
port 0, i.e. "host:0". Examples:

    say URI::XS->new("http://ya.ru:8080")->location; # ya.ru:8080
    say URI::XS->new("http://ya.ru")->location; # ya.ru:80
    say URI::XS->new("//ya.ru")->location; # ya.ru:0
    say URI::XS->new("http://ya.ru")->explicit_location; # ya.ru
    say URI::XS->new("http://ya.ru:8080")->explicit_location; # ya.ru:8080
=head4 explicit_location()
Returns location with explicit port set if any, otherwise returns location without port (i.e. just host).
The effect is the same as

    $u->explicit_port ? $u->host.':'.$u->port : $u->host
=head4 relative(), rel()
Returns the uri relative to the current scheme and location, for example:

    say uri("http://ya.ru/mypath")->relative; # /mypath
=head4 to_string(), as_string(), '""'
Returns the whole uri as string.
=head4 to_bool(), 'bool'
Returns true if the url is not empty. Note that if a uri object has only user_info or only port set, it is considered empty, as it is not printable.

Actually

    if ($uri) {} # the same as if ($uri->to_bool())

is the same as

    if ($uri->to_string) {}

but runs faster.
=head4 empty()
Same as C<!$uri>.
=head4 secure()
Returns true if uri's scheme is secure (for example, https).
=head4 set($other_uri)
Sets uri from another uri object making them equal. May croak if current object is strict and other object has different scheme.
=head4 assign($url, [$flags])
Same as url($url, [$flags])
=head4 equals($other_uri), 'eq'
Returns true if $other_uri contains the same url (including all parts - query, fragment, etc).
=head4 clone()
Clones current uri. If current uri is in strict mode, then cloned uri will be in strict mode too.
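As a rough illustration of strictness surviving a clone, the sketch below clones a strict http uri and checks that the copy has the same class, compares equal, is independent of the original, and still refuses a scheme change. It assumes uri() can be called by its fully qualified name; the url is an arbitrary sample.

```perl
use strict;
use warnings;
use URI::XS;

my $original = URI::XS::uri("http://ya.ru/path?a=1"); # strict mode
my $copy     = $original->clone;

print ref($copy), "\n";                                     # URI::XS::http
print $copy->equals($original) ? "equal\n" : "different\n"; # equal

# The clone is a separate object: changing it leaves the original alone.
$copy->path("/other");
print $original->path, "\n";                                # /path

# The clone is strict too, so changing its scheme croaks.
my $changed = eval { $copy->scheme('ftp'); 1 };
print $changed ? "scheme changed\n" : "scheme change rejected\n";
```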
=head4 path_segments([@new_segments])
Sets/returns path segments as a list.

    $u = uri("http://ya.ru/abc/def/jopa");
    say join(", ", $u->path_segments); # abc, def, jopa
    $u->path_segments('my', 'folder');
    say $u; # http://ya.ru/my/folder
=head4 user([$new_user])
Sets/returns user part of user_info in uri.
=head4 password([$new_pass])
Sets/returns password part of user_info in uri.
=head1 STRICT CLASSES
=head2 URI::XS::http
=head2 URI::XS::https
=head2 URI::XS::ws
=head2 URI::XS::wss
=head4 new($url, [\%query | %query | $query_string])
If provided, adds query params to $url after creating object.
=head2 URI::XS::ftp
=head2 URI::XS::socks
=head2 URI::XS::ssh
=head2 URI::XS::telnet
=head2 URI::XS::sftp
=head1 CLONING/SERIALIZING
URI::XS supports:
=over
=item cloning via Storable
=item cloning via L<Data::Recursive>'s C<clone>
=item serializing/deserializing via Storable
=item serializing via JSON::XS with C<convert_blessed> flag enabled
=back
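A short sketch of the Storable support listed above, using Storable's standard dclone(), freeze() and thaw() functions. The url is an arbitrary sample; equality is checked with equals() rather than by comparing strings.

```perl
use strict;
use warnings;
use Storable qw(dclone freeze thaw);
use URI::XS;

my $u = URI::XS->new("http://ya.ru:8080/path?a=1#frag");

# Cloning via Storable.
my $cloned = dclone($u);

# Serializing to a byte string and restoring from it.
my $frozen   = freeze($u);
my $restored = thaw($frozen);

print $cloned->equals($u)   ? "clone ok\n"   : "clone differs\n";
print $restored->equals($u) ? "restore ok\n" : "restore differs\n";
```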
=head1 REGISTERING SCHEMAS
=head2 PURE PERL IMPLEMENTATION
You will not be able to change core features of URI via a pure perl implementation.

    package URI::myproto;
    use parent 'URI::XS';

    # the registered class name must match this package
    URI::XS::register_scheme("myproto", "URI::myproto");

    sub some_data_from_user_info {}

    # run-time
    # single quotes, so that '@google' is not interpolated as an array
    my $u = uri('myproto://info@google.com');
    say ref $u; # URI::myproto
    say $u->some_data_from_user_info;
=head2 NATIVE IMPLEMENTATION
Let's create our custom scheme "myproto" which, like FTP, uses some info from "user_info". Our protocol won't be secure and its default
port is, for example, 12345.

First, we need to create our own C++ class. It must inherit from panda::uri::URI::Strict:

    #include <panda/uri.h>
    using panda::uri::URI;

    struct URImyproto : URI::Strict<URImyproto> {
        using URI::Strict<URImyproto>::Strict;

        string some_data_from_user_info () const {
            // parse user_info
            // return result
        }

        void some_data_from_user_info (const string& new_data) {
            // change user_info
        }
    };
Register your new scheme somewhere in the program's initialization:

    void init () {
        ...
        URI::register_scheme("myproto", &typeid(URImyproto), [](const URI& u) -> URI* { return new URImyproto(u); }, 12345, false);
    }
That's it. Now use your custom scheme:

    auto uri = URI::create("myproto://myinfo@google.com");
    auto myuri = dynamic_cast<URImyproto*>(uri.get()); // will return not-null
    cout << myuri->some_data_from_user_info();
Finally, create an XS file and register a perl class:

    #include <xs/uri.h>

    // TYPEMAP
    template <> struct Typemap<URImyproto*> : Typemap<panda::uri::URI*, URImyproto*> {};

    MODULE = URI::myproto                PACKAGE = URI::myproto
    PROTOTYPES: DISABLE

    URImyproto* URImyproto::new (string url = string(), int flags = 0)

    string URImyproto::some_data_from_user_info (SV* newval = NULL) {
        if (newval) {
            THIS->some_data_from_user_info(xs::in<string>(newval));
            XSRETURN_UNDEF;
        }
        RETVAL = THIS->some_data_from_user_info();
    }

    # perl package
    package URI::myproto;
    use parent 'URI::XS';

    URI::XS::register_scheme("myproto", "URI::myproto");
=head1 EXPORTED TYPEMAPS
=head2 TYPEMAPS
=head4 panda::uri::URI*, panda::uri::URISP
Typemap for input/output of any URI object.
=head4 panda::uri::URI::http*, panda::uri::URI::https*, panda::uri::URI::ftp*, panda::uri::URI::socks*, ... other protocols
Typemaps for input/output of strict uris.
=head4 xs::uri::URIx
Output-only typemap that autodetects the strict uri type and blesses the result into the right perl class.

    # XS
    URISP my_cool_uri_create1 (string url) {
        RETVAL = URI::create(url);
    }

    URIx my_cool_uri_create2 (string url) {
        RETVAL = URI::create(url);
    }

    # Perl
    say ref my_cool_uri_create1("http://ya.ru"); # URI::XS
    say ref my_cool_uri_create2("http://ya.ru"); # URI::XS::http
=head1 AUTHOR
Pronin Oleg <syber@crazypanda.ru>, Crazy Panda LTD
=head1 LICENSE
You may distribute this code under the same terms as Perl itself.
=cut