Overview
This document describes Akri's components. The word "resource" describes what is being searched for and ultimately utilized. Resources offer services: for example, USB or IP cameras serve video frames, and GPUs provide computation. A resource can be locally attached to a worker node (a USB device), embedded in it (a GPU), or remotely accessible to it (an IP camera).

How Akri Works

Akri's architecture is made up of five key components: two custom resources, Discovery Handlers, an Agent (a device plugin implementation), and a custom Controller. The first custom resource, the Akri Configuration, is where you name the device you want; this tells Akri what kind of device it should look for. Akri then finds it: Akri's Discovery Handlers look for the device and inform the Agent of discovered devices. The Agent creates Akri's second custom resource, the Akri Instance, to track the availability and usage of each device. Once your device has been found, the Akri Controller helps you use it. It sees each Akri Instance (which represents a leaf device) and deploys a "broker" Pod that knows how to connect to the resource and utilize it.

Custom Resource Definitions

There are two Akri CRDs:
  1. Configuration
  2. Instance

Akri Configuration CRD

The configuration of Akri is enabled by the Configuration CRD. Akri users will create Configurations to describe what resources should be discovered and what pod should be deployed on the nodes that discover a resource. Take a look at the Akri Configuration CRD. It specifies what components all Configurations must have, including the following:
  • the desired discovery protocol used for finding resources, e.g. ONVIF or udev.
  • a capacity (spec.capacity) that defines the maximum number of nodes that may schedule workloads on this resource.
  • a PodSpec (spec.brokerPodSpec) that defines the "broker" pod that will be scheduled to each of these reported resources.
  • a ServiceSpec (spec.instanceServiceSpec) that defines the service that provides a single stable endpoint to access each individual resource's set of broker pods.
  • a ServiceSpec (spec.configurationServiceSpec) that defines the service that provides a single stable endpoint to access the set of all brokers for all resources associated with the Configuration.
Akri has already provided two Configurations, one for discovering IP cameras using the ONVIF protocol and the other for discovering USB cameras via udev. Let's look at an example ONVIF Configuration yaml. It specifies the ONVIF protocol, an image for the broker pod, a capacity of 5, and two Kubernetes services. In this case, the broker pod is a sample frame server we have provided. To get only the frames from a specific camera, a user could point an application at the Instance service, while the Configuration service provides the frames from all the cameras.
The ONVIF Configuration can be customized using Helm. When installing the ONVIF Configuration to your Akri-enabled cluster, you can specify the values you want inserted into the ONVIF Configuration template. Learn more about deploying the ONVIF sample here.

Akri Instance CRD

Each Instance represents an individual resource that is visible to the cluster. So, if there are 5 IP cameras visible to the cluster, there will be 5 Instances. Akri coordination and resource sharing are enabled by the Instance CRD. Instances store internal Akri state and are not intended to be edited by users. For a more in-depth understanding of how resource sharing is accomplished, see Resource Sharing In-depth.

Agent

The Akri Agent implements Kubernetes Device-Plugins for discovered resources.
The basic flow of the Akri Agent is:
  1. Watch for Configuration changes to determine what resources to search for
  2. Monitor resource availability (as edge devices may come and go) to determine what resources to advertise
  3. Inform Kubernetes of resource health/availability as it changes
This basic flow combined with the state stored in the Instance allows multiple nodes to share a resource while respecting the limitations defined by Configuration.capacity.
For a more in-depth understanding, see Agent In-depth.
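As a rough illustration of how Instance state and capacity interact, the sketch below models an Instance's deviceUsage map as a plain Python dictionary with one slot per unit of capacity. The helper names (new_device_usage, free_slots), the Instance name, and the node name are hypothetical and not part of Akri; the slot naming simply mirrors the Instance examples in the flow walkthrough in this document.

```python
from typing import Dict, List


def new_device_usage(instance_name: str, capacity: int) -> Dict[str, str]:
    """Create one empty usage slot per unit of capacity (hypothetical helper)."""
    return {f"{instance_name}-{i}": "" for i in range(capacity)}


def free_slots(device_usage: Dict[str, str]) -> List[str]:
    """Return the names of slots that no node has reserved yet."""
    return [slot for slot, node in device_usage.items() if node == ""]


# Illustrative names only: a capacity of 3 yields three slots.
usage = new_device_usage("akri-protocola-abc123", capacity=3)
usage["akri-protocola-abc123-1"] = "node-a"  # one node reserves a slot
print(free_slots(usage))  # the two slots that are still empty
```

Once every slot holds a node name, no further broker can claim the resource, which is how the capacity limit is respected in this simplified model.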

Discovery Handlers

Discovery Handlers discover devices around the cluster, whether connected to Nodes (e.g. USB sensors), embedded in Nodes (e.g. GPUs), or on the network (e.g. IP cameras), and report them to the Agent. They are often protocol implementations for discovering a set of devices, whether a network protocol like OPC UA or a proprietary protocol. Discovery Handlers implement the DiscoveryHandler service defined in discovery.proto. In order to be utilized, a Discovery Handler must register with the Agent, which hosts the Registration service defined in discovery.proto.
To get started creating a Discovery Handler, see Discovery Handler development.

Controller

The Akri controller serves two purposes:
  1. Handle (create and/or delete) the Pods & Services that enable resource availability
  2. Ensure that Instances are aligned to the cluster state at any given moment
To achieve these goals, the basic flow of the controller is:
  1. Watch for Instance changes to determine what Pods and Services should exist
  2. Watch for Nodes that no longer exist but are still listed in Instances
This basic flow allows the Akri controller to ensure that protocol brokers and Kubernetes Services are running on all nodes exposing desired resources while respecting the limitations defined by Configuration.capacity.
For a more in-depth understanding, see Controller In-depth.
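The watch-and-align behavior described above can be pictured as a set difference between the broker Pods that should exist and the ones that do. The following is a minimal sketch under simplifying assumptions, not the Controller's actual implementation: the reconcile function and the sample names are hypothetical, Services are left out, and capacity is not modeled here.

```python
from typing import Dict, List, Set, Tuple


def reconcile(
    instances: Dict[str, List[str]],
    existing_pods: Set[Tuple[str, str]],
) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str]]]:
    """Compare desired (instance, node) broker Pods with the existing ones.

    instances maps an Instance name to its nodes list.
    existing_pods holds (instance_name, node_name) pairs for current brokers.
    """
    desired = {
        (name, node) for name, nodes in instances.items() for node in nodes
    }
    to_create = sorted(desired - existing_pods)  # brokers that are missing
    to_delete = sorted(existing_pods - desired)  # brokers whose Instance or node is gone
    return to_create, to_delete


# Illustrative state: one live Instance on two nodes, one stale broker.
instances = {"akri-a-1": ["node-a", "node-b"]}
existing = {("akri-a-1", "node-a"), ("akri-gone", "node-c")}
to_create, to_delete = reconcile(instances, existing)
print(to_create)  # [('akri-a-1', 'node-b')]
print(to_delete)  # [('akri-gone', 'node-c')]
```

Expressing the flow as a comparison of desired and actual state keeps it idempotent: running it again after the changes are applied yields two empty lists.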

Akri Flow - In Depth

For the sake of this example, some content has been excluded from the Pod, Configuration and Instances shown below.
  1. Operator applies a Configuration with a capacity of 3 to the single node cluster.
        kind: Configuration
        metadata:
          name: akri-<protocolA>
        spec:
          discoveryHandler:
            name: protocolA
            discovery_details: {}
          brokerPodSpec:
            containers:
            - name: custom-broker
              image: "ghcr.io/…"
          # ...
          capacity: 3
  2. The Akri Agent sees the Configuration and discovers a leaf device using the protocol specified in the Configuration. It creates a device plugin for that leaf device and registers it with the kubelet. When creating the device plugin, it tells the kubelet to set connection information for that specific device and additional metadata from a Configuration's brokerProperties as environment variables in all Pods that request this device's resource. This information is also set in the brokerProperties section of the Instance the Agent creates to represent the discovered leaf device. In the Instance, the Agent also lists its own node under nodes as one that can access the device. Note that the Instance has 3 available deviceUsage slots, since capacity was set to 3 and no brokers have been scheduled to the leaf device yet.
        kind: Instance
        metadata:
          name: akri-<protocolA>-<hash>
        spec:
          configurationName: akri-<protocolA>
          shared: true
          deviceUsage:
            akri-<protocolA>-<hash>-0: ""
            akri-<protocolA>-<hash>-1: ""
            akri-<protocolA>-<hash>-2: ""
          brokerProperties:
            BROKER_ENV_VAR_1: <value>
            BROKER_ENV_VAR_N: <value>
          nodes:
          - "<this-node>"
  3. The Controller is notified by the API Server of Instance changes. It is informed that a new Instance has been created. It schedules a pod to one of the nodes on the Instance’s nodes list, adding the Instance’s name as a resource limit of the pod. Note that the pod is currently in the Pending state.
        kind: Pod
        metadata:
          labels:
            app: akri-<protocolA>-<hash>-pod
            controller: akri.sh
            akri.sh/configuration: akri-<protocolA>
            akri.sh/instance: akri-<protocolA>-<hash>
            akri.sh/target-node: <this-node>
          name: akri-<protocolA>-<hash>-pod
        spec:
          affinity:
            nodeAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                nodeSelectorTerms:
                - matchFields:
                  - key: metadata.name
                    operator: In
                    values:
                    - <this-node>
          containers:
          - image: ghcr.io/…
            name: custom-broker
            resources:
              limits:
                akri.sh/akri-<protocolA>-<hash>: "1"
              requests:
                akri.sh/akri-<protocolA>-<hash>: "1"
        status:
          # ...
          phase: Pending
  4. The kubelet on the selected node sees the scheduled pod and resource limit. It checks to see if the resource is available by calling allocate on the device plugin running in the Agent for the requested leaf device. When calling allocate, the kubelet requests a specific deviceUsage slot. Let's say the kubelet requested akri-<protocolA>-<hash>-1. The leaf device's device plugin checks that the requested deviceUsage slot has not been taken by another node. If it is available, it reserves that deviceUsage slot for this node (as shown below) and returns true. In the allocate response, the Agent also tells the kubelet to mount the Instance.brokerProperties as environment variables in the broker Pod.
        kind: Instance
        metadata:
          name: akri-<protocolA>-<hash>
        spec:
          configurationName: akri-<protocolA>
          shared: true
          deviceUsage:
            akri-<protocolA>-<hash>-0: ""
            akri-<protocolA>-<hash>-1: "<this-node>"
            akri-<protocolA>-<hash>-2: ""
          brokerProperties:
            BROKER_ENV_VAR_1: <value>
            BROKER_ENV_VAR_N: <value>
          nodes:
          - "<this-node>"
  5. Allocate will return false if the kubelet requests a deviceUsage slot that is already taken. See the resource sharing document for a better understanding of how this is resolved. Otherwise, upon a true result, the kubelet will run the pod. The broker is now running and has the information necessary to communicate with the specific device.
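The slot check in steps 4 and 5 can be sketched as a small function over the Instance's deviceUsage map. This is only an illustration of the described behavior, not the Agent's code: the Python allocate function, the slot names, and the node names are hypothetical, and two details are assumptions of this sketch (an unknown slot is rejected, and a node re-requesting a slot it already holds succeeds).

```python
from typing import Dict


def allocate(device_usage: Dict[str, str], slot: str, node: str) -> bool:
    """Reserve `slot` for `node` if it is free; report whether that succeeded."""
    current = device_usage.get(slot)
    if current is None:
        return False  # assumption: a slot the Instance does not list is rejected
    if current not in ("", node):
        return False  # slot already taken by another node
    device_usage[slot] = node  # record the reservation in the Instance state
    return True


# Illustrative Instance state with a capacity of 3.
usage = {"akri-a-0": "", "akri-a-1": "", "akri-a-2": ""}
first = allocate(usage, "akri-a-1", "node-a")   # slot is free
second = allocate(usage, "akri-a-1", "node-b")  # slot now belongs to node-a
print(first, second, usage["akri-a-1"])  # True False node-a
```

In a real cluster the Instance lives in the API server and several nodes may try to reserve slots at once, so conflicting updates would need to be handled; this sketch ignores concurrency entirely.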