User Guide
==========

.. currentmodule:: urllib3

Installing
----------

urllib3 can be installed with pip:

.. code-block:: bash

    $ python -m pip install urllib3

Making Requests
---------------

First things first, import the urllib3 module:

.. code-block:: python

    import urllib3

You'll need a :class:`~poolmanager.PoolManager` instance to make requests.
This object handles all of the details of connection pooling and thread safety
so that you don't have to:

.. code-block:: python

    http = urllib3.PoolManager()

To make a request use :meth:`~urllib3.PoolManager.request`:

.. code-block:: python

    import urllib3

    # Creating a PoolManager instance for sending requests.
    http = urllib3.PoolManager()

    # Sending a GET request and getting back the response as an HTTPResponse object.
    resp = http.request("GET", "https://httpbin.org/robots.txt")

    # Print the returned data.
    print(resp.data)
    # b"User-agent: *\nDisallow: /deny\n"

``request()`` returns a :class:`~response.HTTPResponse` object. The
:ref:`response_content` section explains how to handle various responses.

You can use :meth:`~urllib3.PoolManager.request` to make requests using any
HTTP verb:

.. code-block:: python

    import urllib3

    http = urllib3.PoolManager()
    resp = http.request(
        "POST",
        "https://httpbin.org/post",
        fields={"hello": "world"}  # Add custom form fields
    )

    print(resp.data)
    # b"{\n "form": {\n "hello": "world"\n }, ... }

The :ref:`request_data` section covers sending other kinds of request data,
including JSON, files, and binary data.

.. note:: For quick scripts and experiments you can also use a top-level ``urllib3.request()``.
    It uses a module-global ``PoolManager`` instance.
    Because of that, its side effects could be shared across dependencies relying on it.
    To avoid side effects, create a new ``PoolManager`` instance and use it instead.
    In addition, the function does not accept the low-level ``**urlopen_kw`` keyword arguments.
    System CA certificates are loaded by default.

.. _response_content:

Response Content
----------------

The :class:`~response.HTTPResponse` object provides
:attr:`~response.HTTPResponse.status`, :attr:`~response.HTTPResponse.data`, and
:attr:`~response.HTTPResponse.headers` attributes:

.. code-block:: python

    import urllib3

    # Making the request (the request function returns an HTTPResponse object)
    resp = urllib3.request("GET", "https://httpbin.org/ip")

    print(resp.status)
    # 200
    print(resp.data)
    # b"{\n "origin": "104.232.115.37"\n}\n"
    print(resp.headers)
    # HTTPHeaderDict({"Content-Length": "32", ...})

JSON Content
~~~~~~~~~~~~

JSON content can be loaded with the :meth:`~response.HTTPResponse.json`
method of the response:

.. code-block:: python

    import urllib3

    resp = urllib3.request("GET", "https://httpbin.org/ip")

    print(resp.json())
    # {"origin": "127.0.0.1"}

Alternatively, custom JSON libraries such as `orjson` can be used to encode
the request data and to decode and deserialize the
:attr:`~response.HTTPResponse.data` attribute of the response:

.. code-block:: python

    import orjson
    import urllib3

    encoded_data = orjson.dumps({"attribute": "value"})
    resp = urllib3.request(method="POST", url="http://httpbin.org/post", body=encoded_data)

    print(orjson.loads(resp.data)["json"])
    # {'attribute': 'value'}

Binary Content
~~~~~~~~~~~~~~

The :attr:`~response.HTTPResponse.data` attribute of the response is always set
to a byte string representing the response content:

.. code-block:: python

    import urllib3

    resp = urllib3.request("GET", "https://httpbin.org/bytes/8")

    print(resp.data)
    # b"\xaa\xa5H?\x95\xe9\x9b\x11"

.. note:: For larger responses, it's sometimes better to :ref:`stream `
    the response.

Using io Wrappers with Response Content
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Sometimes you want to use :class:`io.TextIOWrapper` or similar objects like a CSV reader
directly with :class:`~response.HTTPResponse` data.
Making these two interfaces play nicely together requires setting the
:attr:`~response.HTTPResponse.auto_close` attribute to ``False``. By default, HTTP
responses are closed after all bytes have been read; setting ``auto_close`` to
``False`` disables that behavior:

.. code-block:: python

    import io
    import urllib3

    resp = urllib3.request("GET", "https://example.com", preload_content=False)
    resp.auto_close = False

    for line in io.TextIOWrapper(resp):
        print(line)
    # Prints the response body (an HTML document) one line at a time.

.. _request_data:

Request Data
------------

Headers
~~~~~~~

You can specify headers as a dictionary in the ``headers`` argument in
:meth:`~urllib3.PoolManager.request`:

.. code-block:: python

    import urllib3

    resp = urllib3.request(
        "GET",
        "https://httpbin.org/headers",
        headers={
            "X-Something": "value"
        }
    )

    print(resp.json()["headers"])
    # {"X-Something": "value", ...}

Or you can use the ``HTTPHeaderDict`` class to create multi-valued HTTP headers:

.. code-block:: python

    import urllib3

    # Create an HTTPHeaderDict and add headers
    headers = urllib3.HTTPHeaderDict()
    headers.add("Accept", "application/json")
    headers.add("Accept", "text/plain")

    # Make the request using the headers
    resp = urllib3.request(
        "GET",
        "https://httpbin.org/headers",
        headers=headers
    )

    print(resp.json()["headers"])
    # {"Accept": "application/json, text/plain", ...}

Cookies
~~~~~~~

Cookies are specified using the ``Cookie`` header with a string containing
the ``;`` delimited key-value pairs:

.. code-block:: python

    import urllib3

    resp = urllib3.request(
        "GET",
        "https://httpbin.org/cookies",
        headers={
            "Cookie": "session=f3efe9db; id=30"
        }
    )

    print(resp.json())
    # {"cookies": {"id": "30", "session": "f3efe9db"}}

Note that the ``Cookie`` header will be stripped if the server redirects to a
different host.

Cookies provided by the server are stored in the ``Set-Cookie`` header:

.. code-block:: python

    import urllib3

    resp = urllib3.request(
        "GET",
        "https://httpbin.org/cookies/set/session/f3efe9db",
        redirect=False
    )

    print(resp.headers["Set-Cookie"])
    # session=f3efe9db; Path=/

Query Parameters
~~~~~~~~~~~~~~~~

For ``GET``, ``HEAD``, and ``DELETE`` requests, you can simply pass the
arguments as a dictionary in the ``fields`` argument to
:meth:`~urllib3.PoolManager.request`:

.. code-block:: python

    import urllib3

    resp = urllib3.request(
        "GET",
        "https://httpbin.org/get",
        fields={"arg": "value"}
    )

    print(resp.json()["args"])
    # {"arg": "value"}

For ``POST`` and ``PUT`` requests, you need to manually encode query parameters
in the URL:

.. code-block:: python

    from urllib.parse import urlencode
    import urllib3

    # Encode the args into url grammar.
    encoded_args = urlencode({"arg": "value"})

    # Create a URL with args encoded.
    url = "https://httpbin.org/post?" + encoded_args
    resp = urllib3.request("POST", url)

    print(resp.json()["args"])
    # {"arg": "value"}

.. _form_data:

Form Data
~~~~~~~~~

For ``PUT`` and ``POST`` requests, urllib3 will automatically form-encode the
dictionary in the ``fields`` argument provided to
:meth:`~urllib3.PoolManager.request`:

.. code-block:: python

    import urllib3

    resp = urllib3.request(
        "POST",
        "https://httpbin.org/post",
        fields={"field": "value"}
    )

    print(resp.json()["form"])
    # {"field": "value"}

.. _json:

JSON
~~~~

To send JSON in the body of a request, provide the data in the ``json`` argument to
:meth:`~urllib3.PoolManager.request` and urllib3 will automatically encode the data
using the ``json`` module with ``UTF-8`` encoding.
In addition, when ``json`` is provided, the ``"Content-Type"`` header is set to
``"application/json"`` unless specified otherwise.

.. code-block:: python

    import urllib3

    resp = urllib3.request(
        "POST",
        "https://httpbin.org/post",
        json={"attribute": "value"},
        headers={"Content-Type": "application/json"}
    )

    print(resp.json())
    # {'headers': {'Content-Type': 'application/json', ...},
    #  'data': '{"attribute":"value"}', 'json': {'attribute': 'value'}, ...}

Files & Binary Data
~~~~~~~~~~~~~~~~~~~

For uploading files using ``multipart/form-data`` encoding you can use the same
approach as :ref:`form_data` and specify the file field as a tuple of
``(file_name, file_data)``:

.. code-block:: python

    import urllib3

    # Reading the text file from local storage.
    with open("example.txt") as fp:
        file_data = fp.read()

    # Sending the request.
    resp = urllib3.request(
        "POST",
        "https://httpbin.org/post",
        fields={
            "filefield": ("example.txt", file_data),
        }
    )

    print(resp.json()["files"])
    # {"filefield": "..."}

While specifying the filename is not strictly required, it's recommended in
order to match browser behavior. You can also pass a third item in the tuple
to specify the file's MIME type explicitly:

.. code-block:: python

    resp = urllib3.request(
        "POST",
        "https://httpbin.org/post",
        fields={
            "filefield": ("example.txt", file_data, "text/plain"),
        }
    )

For sending raw binary data simply specify the ``body`` argument. It's also
recommended to set the ``Content-Type`` header:

.. code-block:: python

    import urllib3

    # Read the image as raw bytes.
    with open("/home/samad/example.jpg", "rb") as fp:
        binary_data = fp.read()

    resp = urllib3.request(
        "POST",
        "https://httpbin.org/post",
        body=binary_data,
        headers={"Content-Type": "image/jpeg"}
    )

    print(resp.json()["data"])
    # data:application/octet-stream;base64,...

.. _ssl:

Certificate Verification
------------------------

.. note:: *New in version 1.25:*

    HTTPS connections are now verified by default (``cert_reqs = "CERT_REQUIRED"``).

    While you can disable certificate verification by setting ``cert_reqs = "CERT_NONE"``,
    it is highly recommended to leave it on.

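
For orientation, the ``cert_reqs`` values named above share their names with
constants in the standard library :mod:`ssl` module. The following is a minimal
sketch that uses only ``ssl`` (no urllib3 calls) to show what the two modes mean
on a plain ``SSLContext``. How urllib3 applies these settings internally is not
shown here, and the variable names are illustrative only:

.. code-block:: python

    import ssl

    # A default client context verifies certificates and hostnames.
    verified = ssl.create_default_context()
    print(verified.verify_mode == ssl.CERT_REQUIRED)  # True
    print(verified.check_hostname)  # True

    # Turning verification off (not recommended) requires disabling
    # hostname checking first; the reverse order raises ValueError.
    unverified = ssl.create_default_context()
    unverified.check_hostname = False
    unverified.verify_mode = ssl.CERT_NONE

The second context accepts any certificate, which is why leaving verification
enabled is the safer default.
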

Unless otherwise specified urllib3 will try to load the default system certificate stores.
The most reliable cross-platform method is to use the certifi package, which
provides Mozilla's root certificate bundle:

.. code-block:: bash

    $ python -m pip install certifi

Once you have certificates, you can create a :class:`~poolmanager.PoolManager`
that verifies certificates when making requests:

.. code-block:: python

    import certifi
    import urllib3

    http = urllib3.PoolManager(
        cert_reqs="CERT_REQUIRED",
        ca_certs=certifi.where()
    )

The :class:`~poolmanager.PoolManager` will automatically handle certificate
verification and will raise :class:`~exceptions.SSLError` if verification fails:

.. code-block:: python

    import certifi
    import urllib3

    http = urllib3.PoolManager(
        cert_reqs="CERT_REQUIRED",
        ca_certs=certifi.where()
    )

    http.request("GET", "https://httpbin.org/")
    # (No exception)

    http.request("GET", "https://expired.badssl.com")
    # urllib3.exceptions.SSLError ...

.. note:: You can use OS-provided certificates if desired. Just specify the full
    path to the certificate bundle as the ``ca_certs`` argument instead of
    ``certifi.where()``. For example, most Linux systems store the certificates
    at ``/etc/ssl/certs/ca-certificates.crt``. Other operating systems can
    be difficult.

Using Timeouts
--------------

Timeouts allow you to control how long (in seconds) requests are allowed to run
before being aborted. In simple cases, you can specify a timeout as a ``float``
to :meth:`~urllib3.PoolManager.request`:

.. code-block:: python

    import urllib3

    resp = urllib3.request(
        "GET",
        "https://httpbin.org/delay/3",
        timeout=4.0
    )

    print(type(resp))
    # <class 'urllib3.response.HTTPResponse'>

    # This request will take more time to process than the timeout allows.
    urllib3.request(
        "GET",
        "https://httpbin.org/delay/3",
        timeout=2.5
    )
    # MaxRetryError caused by ReadTimeoutError

For more granular control you can use a :class:`~util.timeout.Timeout`
instance which lets you specify separate connect and read timeouts:

.. code-block:: python

    import urllib3

    resp = urllib3.request(
        "GET",
        "https://httpbin.org/delay/3",
        timeout=urllib3.Timeout(connect=1.0)
    )

    print(type(resp))
    # <class 'urllib3.response.HTTPResponse'>

    urllib3.request(
        "GET",
        "https://httpbin.org/delay/3",
        timeout=urllib3.Timeout(connect=1.0, read=2.0)
    )
    # MaxRetryError caused by ReadTimeoutError

If you want all requests to be subject to the same timeout, you can specify
the timeout at the :class:`~urllib3.poolmanager.PoolManager` level:

.. code-block:: python

    import urllib3

    http = urllib3.PoolManager(timeout=3.0)

    http = urllib3.PoolManager(
        timeout=urllib3.Timeout(connect=1.0, read=2.0)
    )

You can still override this pool-level timeout by specifying ``timeout`` to
:meth:`~urllib3.PoolManager.request`.

Retrying Requests
-----------------

urllib3 can automatically retry idempotent requests. This same mechanism also
handles redirects. You can control the retries using the ``retries`` parameter
to :meth:`~urllib3.PoolManager.request`. By default, urllib3 will retry
requests 3 times and follow up to 3 redirects.

To change the number of retries just specify an integer:

.. code-block:: python

    import urllib3

    urllib3.request("GET", "https://httpbin.org/ip", retries=10)

To disable all retry and redirect logic specify ``retries=False``:

.. code-block:: python

    import urllib3

    urllib3.request(
        "GET",
        "https://nxdomain.example.com",
        retries=False
    )
    # NewConnectionError

    resp = urllib3.request(
        "GET",
        "https://httpbin.org/redirect/1",
        retries=False
    )

    print(resp.status)
    # 302

To disable redirects but keep the retrying logic, specify ``redirect=False``:

.. code-block:: python

    resp = urllib3.request(
        "GET",
        "https://httpbin.org/redirect/1",
        redirect=False
    )

    print(resp.status)
    # 302

For more granular control you can use a :class:`~util.retry.Retry` instance.
This class allows you far greater control of how requests are retried.

For example, to do a total of 3 retries, but limit to only 2 redirects:

.. code-block:: python

    urllib3.request(
        "GET",
        "https://httpbin.org/redirect/3",
        retries=urllib3.Retry(3, redirect=2)
    )
    # MaxRetryError

You can also disable exceptions for too many redirects and just return the
``302`` response:

.. code-block:: python

    resp = urllib3.request(
        "GET",
        "https://httpbin.org/redirect/3",
        retries=urllib3.Retry(
            redirect=2,
            raise_on_redirect=False
        )
    )

    print(resp.status)
    # 302

If you want all requests to be subject to the same retry policy, you can
specify the retry at the :class:`~urllib3.poolmanager.PoolManager` level:

.. code-block:: python

    import urllib3

    http = urllib3.PoolManager(retries=False)

    http = urllib3.PoolManager(
        retries=urllib3.Retry(5, redirect=2)
    )

You can still override this pool-level retry policy by specifying ``retries`` to
:meth:`~urllib3.PoolManager.request`.

Errors & Exceptions
-------------------

urllib3 wraps lower-level exceptions, for example:

.. code-block:: python

    import urllib3

    try:
        urllib3.request("GET", "https://nx.example.com", retries=False)
    except urllib3.exceptions.NewConnectionError:
        print("Connection failed.")
    # Connection failed.

See :mod:`~urllib3.exceptions` for the full list of all exceptions.

Logging
-------

If you are using the standard library :mod:`logging` module, urllib3 will
emit several logs. In some cases this can be undesirable. You can use the
standard logger interface to change the log level for urllib3's logger:

.. code-block:: python

    import logging

    logging.getLogger("urllib3").setLevel(logging.WARNING)
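
Because loggers form a hierarchy, raising the level on the ``urllib3`` logger
also affects its child loggers, while the rest of an application keeps its own
level. The following is a minimal sketch using only the standard library. The
``myapp`` logger name is hypothetical, and ``urllib3.connectionpool`` is used
here as an example of a child logger name:

.. code-block:: python

    import logging

    # Application-wide configuration: show everything from DEBUG upward.
    logging.basicConfig(level=logging.DEBUG, force=True)

    # Raise the threshold for urllib3 only.
    logging.getLogger("urllib3").setLevel(logging.WARNING)

    # Child loggers without their own level inherit the effective level.
    pool_logger = logging.getLogger("urllib3.connectionpool")
    app_logger = logging.getLogger("myapp")  # hypothetical application logger

    urllib3_debug_enabled = pool_logger.isEnabledFor(logging.DEBUG)
    app_debug_enabled = app_logger.isEnabledFor(logging.DEBUG)

    print(urllib3_debug_enabled)  # False
    print(app_debug_enabled)  # True

With this setup, debug and info messages from urllib3 are suppressed while
warnings and errors are still reported.
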