labthings_fastapi.outputs.blob
==============================

.. py:module:: labthings_fastapi.outputs.blob

.. autoapi-nested-parse::

   BLOB Output Module.

   The ``.Blob`` class is used when you need to return something file-like that can't easily (or efficiently) be converted to JSON. This is useful for returning large objects like images, especially where an existing file type is the obvious way to handle them.

   There is a documentation page on :ref:`blobs` that explains how to use this mechanism.

   To return a file from an action, declare its return type as a `.Blob` subclass that defines the `.Blob.media_type` attribute.

   .. code-block:: python

      from labthings_fastapi.outputs.blob import Blob

      # Thing and action are assumed to be imported from labthings_fastapi (not shown).


      class MyImageBlob(Blob):
          media_type = "image/png"


      class MyThing(Thing):
          @action
          def get_image(self) -> MyImageBlob:
              # Do something to get the image data (hypothetical helper).
              data = self._get_image_data()
              return MyImageBlob.from_bytes(data)

   The action should then return an instance of that subclass, with data supplied either as a `bytes` object or as a file on disk. If files are used, it's your responsibility to ensure the file is deleted after the `.Blob` object is garbage-collected. Constructing it using the class methods `.Blob.from_bytes` or `.Blob.from_temporary_directory` will ensure this is done for you.

   Bear in mind that a `tempfile.TemporaryFile` object only holds a file descriptor and is not safe for concurrent use. This does not work well with the HTTP API: action outputs may be retrieved multiple times after the action has completed, possibly concurrently. Creating a temporary folder and making a file inside it with `.Blob.from_temporary_directory` is the safest way to deal with this.

   **Serialisation**

   `.Blob` objects are serialised to a JSON representation that includes a download ``href``\ . This is generated using `.middleware.url_for`, which uses a context variable to pass the function that generates URLs to the serialiser code.
   That context variable is available in every response handler function in the FastAPI app, but it is not, in general, available in action or property code (because actions and properties run their code in separate threads). The sequence of events that leads to a `.Blob` being downloaded as a result of an action is roughly:

   * A ``POST`` request invokes the action.

     * `.middleware.url_for.url_for_middleware` makes `url_for` accessible via a context variable.
     * A ``201`` response is returned that includes an ``href`` to poll the action.

   * Action code is run in a separate thread (without `url_for` in the context):

     * The action creates a `.Blob` object.
     * The function that creates the `.Blob` object also creates a `.BlobData` object as a property of the `.Blob`.
     * The `.BlobData` object's constructor adds it to the ``blob_manager`` and sets its ``id`` property accordingly.
     * The `.Blob` is returned by the action.
     * The output value of the action is stored in the `.Invocation` thread.

   * A ``GET`` request polls the action. Once it has completed:

     * `.middleware.url_for.url_for_middleware` makes `url_for` accessible via a context variable.
     * The `.Invocation` model is returned, which includes the `.Blob` in the ``output`` field.
     * FastAPI serialises the invocation model, which in turn serialises the `.Blob` and uses ``url_for`` to generate a valid download ``href`` including the ``id`` of the `.BlobData` object.

   * A further ``GET`` request actually downloads the `.Blob`\ .

   This slightly complicated sequence ensures that we only ever send URLs back to the client using `url_for` from the current `.fastapi.Request` object. That means the URL used should be consistent with the URL of the request: if an action is started by a client using one IP address or DNS name, and polled by a different client, each client will get a download ``href`` that matches the address they are already using.
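
   The reason URLs are generated at serialisation time, rather than when the `.Blob` is created, can be illustrated with a standalone sketch that does not use LabThings. It assumes a module-level `contextvars.ContextVar` that holds a URL-building function and is set only while a simulated request is being handled. ``SketchBlob``, ``handle_request``, ``url_for_blob`` and the ``/blob/<id>`` URL pattern are hypothetical names chosen for illustration, not part of this module's API. The same blob serialises to a different ``href`` for each request, and serialising it while no request is being handled raises `LookupError`.

   .. code-block:: python

      import contextvars
      import uuid
      from typing import Dict

      # Hypothetical stand-in for the url_for middleware: a context variable
      # holding a URL-building function for the request being handled.
      _url_builder = contextvars.ContextVar("url_builder")


      def url_for_blob(blob_id: uuid.UUID) -> str:
          """Build a download URL; raises LookupError outside a request."""
          return _url_builder.get()(blob_id)


      class SketchBlob:
          """Stands in for a Blob: it knows its ID, but not its URL."""

          def __init__(self) -> None:
              self.id = uuid.uuid4()

          def serialise(self) -> Dict[str, str]:
              # The href is generated when serialising, not when created.
              return {"href": url_for_blob(self.id), "rel": "output"}


      def handle_request(base_url: str, blob: SketchBlob) -> Dict[str, str]:
          """Simulate a request handler: set the builder, serialise, reset."""
          token = _url_builder.set(lambda blob_id: f"{base_url}/blob/{blob_id}")
          try:
              return blob.serialise()
          finally:
              _url_builder.reset(token)


      # The blob is created once, outside any request (as action code would be).
      blob = SketchBlob()
      print(handle_request("http://192.168.0.10:5000", blob)["href"])
      print(handle_request("http://microscope.local:5000", blob)["href"])

   Generating the ``href`` lazily, inside the handler that answers each request, is what allows two clients using different addresses to each receive a URL they can reach.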

   In the future, it may be possible to respond to the original ``POST`` request directly with the `.Blob` data. However, this would only work for quick actions, so for now we use the sequence above, which works for both quick and slow actions.



Attributes
----------

.. autoapisummary::

   labthings_fastapi.outputs.blob.router


Classes
-------

.. autoapisummary::

   labthings_fastapi.outputs.blob.BlobData
   labthings_fastapi.outputs.blob.RemoteBlobData
   labthings_fastapi.outputs.blob.LocalBlobData
   labthings_fastapi.outputs.blob.BlobBytes
   labthings_fastapi.outputs.blob.BlobFile
   labthings_fastapi.outputs.blob.BlobModel
   labthings_fastapi.outputs.blob.Blob


Functions
---------

.. autoapisummary::

   labthings_fastapi.outputs.blob.parse_media_type
   labthings_fastapi.outputs.blob.match_media_types
   labthings_fastapi.outputs.blob.download_blob
   labthings_fastapi.outputs.blob.url_to_id


Module Contents
---------------

.. py:class:: BlobData(media_type: str)

   The data store of a Blob.

   `.Blob` objects can represent their data in various ways. Each of those options must provide three ways to access the data: the `content` property, the `save()` method, and the `open()` method.

   This base class defines the interface needed by any data store used by a `.Blob`.

   Blobs that store their data locally should subclass `.LocalBlobData`, which adds a `response()` method and an `id` property, appropriate for data that would need to be downloaded from a server. It also takes care of generating a download URL when it's needed.

   Initialise a `.BlobData` object.

   :param media_type: the MIME type of the data.


   .. py:attribute:: _media_type


   .. py:property:: media_type
      :type: str

      The MIME type of the data, e.g. 'image/png' or 'application/json'.


   .. py:method:: get_href() -> str
      :abstractmethod:

      Return the URL to download the blob.

      The implementation of this method for local blobs will need `.url_for.url_for`, so it should only be called in a response handler when the `.middleware.url_for` middleware is enabled.

      :return: the URL as a string.
      :raises NotImplementedError: always, as this must be implemented by subclasses.


   .. py:property:: content
      :type: bytes
      :abstractmethod:

      The data as a `bytes` object.

      :raises NotImplementedError: always, as this must be implemented by subclasses.


   .. py:method:: save(filename: str) -> None
      :abstractmethod:

      Save the data to a file.

      :param filename: the path where the file should be saved.
      :raises NotImplementedError: always, as this must be implemented by subclasses.


   .. py:method:: open() -> io.IOBase
      :abstractmethod:

      Return a file-like object that may be read from.

      :return: an open file-like object.
      :raises NotImplementedError: always, as this must be implemented by subclasses.


.. py:class:: RemoteBlobData(media_type: str, href: str, client: httpx.Client | None = None)

   Bases: :py:obj:`BlobData`

   A BlobData subclass that references remote data via a URL.

   This `.BlobData` implementation will download data lazily, and provides it in the three ways defined by `.BlobData`\ . It does not cache downloaded data: if the `.content` attribute is accessed multiple times, the data will be downloaded again each time.

   .. note::

      This class is rarely instantiated directly. It is usually best to use `.Blob.from_url` on a `.Blob` subclass.

   Create a reference to remote `.Blob` data.

   :param media_type: the MIME type of the data.
   :param href: the URL where it may be downloaded.
   :param client: if supplied, this `httpx.Client` will be used to download the data.


   .. py:attribute:: _href


   .. py:attribute:: _client


   .. py:method:: get_href() -> str

      Return the URL to download the data.

      :return: the URL as a string.


   .. py:property:: content
      :type: bytes

      The binary data, as a `bytes` object.


   .. py:method:: save(filepath: str) -> None

      Save the output to a file.

      Note that the current implementation retrieves the data into memory in its entirety, and saves to file afterwards.

      :param filepath: the file will be saved at this location.


   .. py:method:: open() -> io.IOBase

      Open the output as a binary file-like object.

      Internally, this will download the file to memory, and wrap the resulting `bytes` object in an `io.BytesIO` object to allow it to function as a file-like object.

      To work with the data on disk, use `save` instead.

      :return: a file-like object containing the downloaded data.


.. py:class:: LocalBlobData(media_type: str)

   Bases: :py:obj:`BlobData`

   A BlobData subclass where the data is stored locally.

   `.Blob` objects can reference data by a URL, or can wrap data held in memory or on disk. For the non-URL options, we need to register the data with the `.BlobManager` and allow it to be downloaded. This class takes care of registering with the `.BlobManager` and adds the `.response` method, which must be overridden by subclasses to allow downloading.

   See `.BlobBytes` or `.BlobFile` for concrete implementations.

   Initialise the `.LocalBlobData` object.

   :param media_type: the MIME type of the data.


   .. py:attribute:: _all_blobdata
      :type: ClassVar[weakref.WeakValueDictionary[uuid.UUID, LocalBlobData]]

      A way to retrieve `.LocalBlobData` objects by their ID.

      Note that this does not interfere with garbage collection, as it only holds weak references to the `.LocalBlobData` objects.


   .. py:attribute:: _id


   .. py:method:: from_id(id: uuid.UUID) -> LocalBlobData
      :classmethod:

      Retrieve a `.LocalBlobData` object by its ID.

      Note that this does not imply `.LocalBlobData` objects are permanently stored: if there are no strong references to the object, it may have been garbage collected and will no longer be available.

      :param id: the UUID of the desired `.LocalBlobData` object.
      :return: the corresponding `.LocalBlobData` object.
      :raise KeyError: if no such object exists.


   .. py:method:: all_ids() -> list[uuid.UUID]
      :classmethod:

      Return a list of all currently registered BlobData IDs.

      :return: a list of UUIDs for all registered `.LocalBlobData` objects.


   .. py:property:: id
      :type: uuid.UUID

      A unique identifier for this BlobData object.

      The ID is set when the BlobData object is added to the `BlobDataManager` during initialisation.


   .. py:method:: get_href() -> str

      Return a URL where this data may be downloaded.

      Note that this should only be called in a response handler, as it relies on `.url_for.url_for`\ .

      :return: the URL as a string.


   .. py:method:: response() -> fastapi.responses.Response
      :abstractmethod:

      Return a `fastapi.Response` object that sends binary data.

      :return: a response that streams the data from disk or memory.
      :raises NotImplementedError: always, as this must be implemented by subclasses.


.. py:class:: BlobBytes(data: bytes, media_type: str)

   Bases: :py:obj:`LocalBlobData`

   A `.BlobData` that holds its data in memory as a `bytes` object.

   `.Blob` objects use objects conforming to the `.BlobData` protocol to store their data either in memory or on disk. This class implements the protocol using a `bytes` object in memory.

   .. note::

      This class is rarely instantiated directly. It is usually best to use `.Blob.from_bytes` on a `.Blob` subclass.

   Create a `.BlobBytes` object.

   :param data: is the data to be wrapped.
   :param media_type: is the MIME type of the data.


   .. py:attribute:: _id
      :type: uuid.UUID


   .. py:attribute:: _bytes


   .. py:property:: content
      :type: bytes

      The wrapped data, as a `bytes` object.


   .. py:method:: save(filename: str) -> None

      Save the wrapped data to a file.

      :param filename: where to save the data.


   .. py:method:: open() -> io.IOBase

      Return an open file-like object containing the data.

      This wraps the underlying `bytes` in an `io.BytesIO`.

      :return: an `io.BytesIO` object wrapping the data.


   .. py:method:: response() -> fastapi.responses.Response

      Send the underlying data over the network.

      :return: a response that streams the data from memory.


.. py:class:: BlobFile(file_path: str, media_type: str, **kwargs: Any)

   Bases: :py:obj:`LocalBlobData`

   A `.BlobData` backed by a file on disk.

   Only the file path is retained by default. If you are using e.g. a temporary directory, you should add the `.TemporaryDirectory` as an instance attribute, to stop it being garbage collected. See `.Blob.from_temporary_directory`.

   .. note::

      This class is rarely instantiated directly. It is usually best to use `.Blob.from_file` on a `.Blob` subclass.

   Create a `.BlobFile` to wrap data stored on disk.

   `.BlobFile` objects wrap data stored on disk as files. They are not usually instantiated directly, but made using `.Blob.from_temporary_directory` or `.Blob.from_file`.

   :param file_path: is the path to the file.
   :param media_type: is the MIME type of the data.
   :param \**kwargs: will be added to the object as instance attributes. This may be used to stop temporary directories from being garbage collected while the `.Blob` exists.
   :raise IOError: if the file specified does not exist.


   .. py:attribute:: _file_path


   .. py:property:: content
      :type: bytes

      The wrapped data, as a `bytes` object in memory.

      This reads the file on disk into a `bytes` object.

      :return: the contents of the file in a `bytes` object.


   .. py:method:: save(filename: str) -> None

      Save the wrapped data to a file.

      `.BlobFile` objects already store their data on disk. Currently, this method copies the file to the given filename. In the future, this may change to ``move`` for increased efficiency.

      :param filename: the path where the file should be saved.


   .. py:method:: open() -> io.IOBase

      Return an open file-like object containing the data.

      In the case of `.BlobFile`, this is an open file handle to the underlying file, which is where the data is already stored. It is opened with mode ``"rb"``, i.e. read-only and binary.

      :return: an open file handle.


   .. py:method:: response() -> fastapi.responses.Response

      Generate a response allowing the file to be downloaded.

      :return: a response that streams the file from disk.


.. py:class:: BlobModel(/, **data: Any)

   Bases: :py:obj:`pydantic.BaseModel`

   A model for JSON-serialised `.Blob` objects.

   This model describes the JSON representation of a `.Blob` and does not offer any useful functionality.

   Create a new model by parsing and validating input data from keyword arguments.

   Raises `pydantic_core.ValidationError` if the input data cannot be validated to form a valid model.

   `self` is explicitly positional-only to allow `self` as a field name.


   .. py:attribute:: href
      :type: str

      The URL where the data may be retrieved.


   .. py:attribute:: media_type
      :type: str

      The MIME type of the data. This should be overridden in subclasses.


   .. py:attribute:: rel
      :type: Literal['output']
      :value: 'output'

      The relation of this link to the host object.

      Currently, `.Blob` objects are found in the output of :ref:`actions`, so they always have ``rel = "output"``.


   .. py:attribute:: description
      :type: str
      :value: 'The output from this action is not serialised to JSON, so it must be retrieved as a file. This...

      This description is added to the serialised `.Blob`.


.. py:function:: parse_media_type(media_type: str) -> tuple[str, str]

   Parse a media type string into its type and subtype.

   :param media_type: the media type string to parse.
   :return: a tuple of (type, subtype) where each is a string or None.
   :raises ValueError: if the media type is invalid.


.. py:function:: match_media_types(media_type: str, pattern: str) -> bool

   Check if a media type matches a pattern.

   The pattern may include wildcards, e.g. ``image/*`` or ``*/*``.

   :param media_type: the media type to check.
   :param pattern: the pattern to match against.
   :return: True if the media type matches the pattern, False otherwise.


.. py:class:: Blob(data: BlobData, description: str | None = None)

   A container for binary data that may be retrieved over HTTP.

   See :ref:`blobs` for more information on how to use this class.
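
   `.Blob` subclasses typically set ``media_type`` to a specific type such as ``image/png``, while the base class uses the wildcard ``*/*``, and `match_media_types` accepts patterns such as ``image/*``. The standalone sketch that follows shows one way such wildcard matching can work. It is not this module's implementation: ``parse_media_type_sketch`` and ``match_media_types_sketch`` are hypothetical names, and ignoring parameters after ``;`` and comparing case-insensitively are assumptions made for the example.

   .. code-block:: python

      from typing import Tuple


      def parse_media_type_sketch(media_type: str) -> Tuple[str, str]:
          """Split 'type/subtype' (ignoring any ';' parameters) into a tuple."""
          main = media_type.split(";", 1)[0].strip().lower()
          parts = main.split("/")
          if len(parts) != 2 or not parts[0] or not parts[1]:
              raise ValueError(f"Invalid media type: {media_type!r}")
          return parts[0], parts[1]


      def match_media_types_sketch(media_type: str, pattern: str) -> bool:
          """Return True if media_type matches pattern; '*' is a wildcard."""
          mtype, msub = parse_media_type_sketch(media_type)
          ptype, psub = parse_media_type_sketch(pattern)
          type_ok = ptype == "*" or ptype == mtype
          subtype_ok = psub == "*" or psub == msub
          return type_ok and subtype_ok


      print(match_media_types_sketch("image/png", "image/*"))  # True
      print(match_media_types_sketch("image/png", "*/*"))  # True
      print(match_media_types_sketch("text/plain", "image/*"))  # False

   Treating the pattern and the concrete type symmetrically keeps the sketch short; a stricter implementation might reject wildcards in the first argument.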

   A `.Blob` may be created to hold data using the class methods `.Blob.from_bytes`, `.Blob.from_file` or `.Blob.from_temporary_directory`\ . It may also reference remote data, using `.Blob.from_url`\ , though this is currently only used on the client side. The constructor requires a `.BlobData` instance, so the methods mentioned previously are likely a more convenient way to instantiate a `.Blob`\ .

   You are strongly advised to use a subclass of this class that specifies the `.Blob.media_type` attribute, as this will propagate to the auto-generated documentation and make the return type of your action clearer.

   This class is `pydantic` compatible, in that it provides a schema, validator and serialiser. However, it may use `.url_for.url_for` during serialisation, so it should only be serialised in a request handler function. This functionality is intended for use by LabThings library functions only.

   Validation and serialisation behaviour is described in the docstrings of `.Blob._validate` and `.Blob._serialize`.

   Create a `.Blob` object wrapping the given data.

   :param data: the `.BlobData` object that stores the data.
   :param description: an optional description of the blob.
   :raise ValueError: if the media_type of the data does not match the media_type of the `.Blob` subclass.


   .. py:attribute:: media_type
      :type: str
      :value: '*/*'

      The MIME type of the data. This should be overridden in subclasses.


   .. py:attribute:: description
      :type: str | None
      :value: None

      An optional description that may be added to the serialised `.Blob`.


   .. py:attribute:: _data
      :type: BlobData

      This object stores the data: in memory, on disk, or at a URL.


   .. py:method:: __get_pydantic_core_schema__(source: type[Any], handler: pydantic.GetCoreSchemaHandler) -> pydantic_core.core_schema.CoreSchema
      :classmethod:

      Get the pydantic core schema for this type.

      This magic method allows `pydantic` to serialise `.Blob` instances, and generate a JSONSchema for them.

      We tell `pydantic` to base its handling of `.Blob` on the `.BlobModel` schema, with custom validation and serialisation. Validation and serialisation behaviour is described in the docstrings of `.Blob._validate` and `.Blob._serialize`.

      The JSONSchema is generated for `.BlobModel` but is then refined in `__get_pydantic_json_schema__` to include the ``media_type`` and ``description`` defaults.

      :param source: The source type being converted.
      :param handler: The pydantic core schema handler.
      :return: The pydantic core schema for the `.Blob` type.


   .. py:method:: __get_pydantic_json_schema__(core_schema: Blob.__get_pydantic_json_schema__.core_schema, handler: pydantic.GetJsonSchemaHandler) -> pydantic.json_schema.JsonSchemaValue
      :classmethod:

      Customise the JSON Schema to include the media_type.

      :param core_schema: The core schema for the Blob type.
      :param handler: The pydantic JSON schema handler.
      :return: The JSON schema for the Blob type, with media_type included.


   .. py:method:: _validate(value: Any, handler: collections.abc.Callable[[Any], BlobModel]) -> Self
      :classmethod:

      Validate and convert a value to a `.Blob` instance.

      If the value is already a `.Blob`, it will be returned directly. Otherwise, we first validate the input using the `.BlobModel` schema.

      When a `.Blob` is validated, we check to see if the URL given as its ``href`` looks like a `.Blob` download URL on this server. If it does, the returned object will hold a reference to the local data. If we can't match the URL to a `.Blob` on this server, we will raise an error. Handling of `.Blob` input is currently experimental, and limited to passing the output of one Action as input to a subsequent one.

      :param value: The input value, as passed in or loaded from JSON.
      :param handler: A function that runs the validation logic of BlobModel.
      :return: a `.Blob` object pointing to the data.
      :raise ValueError: if the ``href`` does not contain a valid Blob ID, or if the Blob ID is not found on this server.


   .. py:method:: _serialize(obj: Self, handler: collections.abc.Callable[[BlobModel], Mapping[str, str]]) -> Mapping[str, str]
      :classmethod:

      Serialise the Blob to a dictionary.

      See `.Blob.to_blobmodel` for a description of how we serialise.

      :param obj: the `.Blob` instance to serialise.
      :param handler: the handler (provided by pydantic) takes a BlobModel and converts it to a dictionary. The handler runs the serialiser of the core schema we've wrapped, in this case the BlobModel serialiser.
      :return: a JSON-serialisable dictionary with a URL that allows the `.Blob` to be downloaded from the `.BlobManager`.


   .. py:method:: to_blobmodel() -> BlobModel

      Represent the `.Blob` as a `.BlobModel` to get ready to serialise.

      When `pydantic` serialises this object, we first generate a `.BlobModel` with just the information to be serialised. We use `.url_for.url_for` to generate the URL, so this will raise an error if it is serialised anywhere other than a request handler with the middleware from `.middleware.url_for` enabled.

      :return: a `.BlobModel` with a URL that allows the `.Blob` to be downloaded from the `.BlobManager`.


   .. py:property:: data
      :type: BlobData

      The data store for this Blob.

      It is recommended to use the `.Blob.content` property or the `.Blob.save` or `.Blob.open` methods rather than accessing this property directly.

      :return: the data store wrapping data in memory, on disk, or at a URL.


   .. py:property:: content
      :type: bytes

      Return the output as a `bytes` object.

      This property may return the `bytes` object, or if we have a file it will read the file and return the contents. Client objects may use this property to download the output.

      This property is read-only. You should also only read it once, as no guarantees are given about caching: reading it many times risks reading the file from disk many times, or re-downloading an artifact.

      :return: a `bytes` object containing the data.


   .. py:method:: save(filepath: str) -> None

      Save the output to a file.

      This may remove the need to hold the output in memory, especially if it is already stored on disk.

      :param filepath: The location to save the data on disk.


   .. py:method:: open() -> io.IOBase

      Open the data as a binary file-like object.

      This will return a file-like object that may be read from. It may be either on disk (i.e. an open file handle) or in memory (e.g. an `io.BytesIO` wrapper).

      :return: a binary file-like object.


   .. py:method:: from_bytes(data: bytes) -> Self
      :classmethod:

      Create a `.Blob` from a bytes object.

      This is the recommended way to create a `.Blob` from data that is held in memory. It should ideally be called on a subclass that has set the ``media_type``.

      :param data: the data as a `bytes` object.
      :return: a `.Blob` wrapping the supplied data.


   .. py:method:: from_temporary_directory(folder: tempfile.TemporaryDirectory, file: str) -> Self
      :classmethod:

      Create a `.Blob` from a file in a temporary directory.

      This is the recommended way to create a `.Blob` from data that is saved to a file, when the file should not be retained. It should ideally be called on a subclass that has set the ``media_type``.

      The `tempfile.TemporaryDirectory` object will persist as long as this `.Blob` does, which prevents the directory from being cleaned up until the `.Blob` is garbage collected. This means the file will stay on disk until it is no longer needed.

      :param folder: a `tempfile.TemporaryDirectory` where the file is saved.
      :param file: the path to the file, relative to the ``folder``.
      :return: a `.Blob` wrapping the file.


   .. py:method:: from_file(file: str) -> Self
      :classmethod:

      Create a `.Blob` from a regular file.

      This is the recommended way to create a `.Blob` from a file, if that file will persist on disk. It should ideally be called on a subclass of `.Blob` that has set ``media_type``.

      .. note::

         The file should exist for at least as long as the `.Blob` does; this is assumed to be the case and nothing is done to ensure it's not temporary.
         If you are using temporary files, consider creating your `.Blob` with `from_temporary_directory` instead.

      :param file: is the path to the file. This file must exist.
      :return: a `.Blob` object referencing the specified file.


   .. py:method:: from_url(href: str, client: httpx.Client | None = None) -> Self
      :classmethod:

      Create a `.Blob` that references data at a URL.

      This is the recommended way to create a `.Blob` that references data held remotely. It should ideally be called on a subclass of `.Blob` that has set ``media_type``.

      :param href: the URL where the data may be downloaded.
      :param client: if supplied, this `httpx.Client` will be used to download the data.
      :return: a `.Blob` object referencing the specified URL.


   .. py:method:: response() -> fastapi.responses.Response

      Return a suitable response for serving the output.

      This method is called by the `~lt.ThingServer` to generate a response that returns the data over HTTP.

      :return: an HTTP response that streams data from memory or file.
      :raise NotImplementedError: if the data is not local. It's not currently possible to serve remote data via the `.BlobManager`.


.. py:data:: router

   A FastAPI router for BlobData download endpoints.


.. py:function:: download_blob(blob_id: uuid.UUID) -> fastapi.responses.Response

   Download a `.Blob`.

   This function returns a `fastapi.Response` allowing the data to be downloaded, using the `.LocalBlobData.response` method.

   :param blob_id: the unique ID of the blob data.
   :return: a `fastapi.Response` object that will send the content of the blob over HTTP.
   :raises HTTPException: if the requested blob is not found.


.. py:function:: url_to_id(url: str) -> uuid.UUID | None

   Extract the blob ID from a URL.

   Currently, this checks for a UUID at the end of a URL. In the future, it might check if the URL refers to this server.

   :param url: a URL previously generated by `blobdata_to_url`.
   :return: the UUID blob ID extracted from the URL.
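
   The "UUID at the end of a URL" check can be sketched with the standard library alone. ``url_to_id_sketch`` is a hypothetical name; tolerating a trailing slash, and returning ``None`` when the final path segment is not a UUID, are assumptions made for this example rather than documented behaviour of `url_to_id`.

   .. code-block:: python

      import uuid
      from typing import Optional


      def url_to_id_sketch(url: str) -> Optional[uuid.UUID]:
          """Return the UUID in the last path segment of url, or None."""
          # Take the final path segment, tolerating a trailing slash.
          last_segment = url.rstrip("/").rsplit("/", 1)[-1]
          try:
              return uuid.UUID(last_segment)
          except ValueError:
              # The URL does not end in something that parses as a UUID.
              return None


      blob_id = uuid.uuid4()
      print(url_to_id_sketch(f"http://localhost:5000/blob/{blob_id}") == blob_id)  # True
      print(url_to_id_sketch("http://localhost:5000/things/camera"))  # None

   Because only the last path segment is inspected, this sketch does not check which server the URL points to.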