# Architecture
> This bundle contains all pages in the Architecture section.
> Source: https://www.union.ai/docs/v2/union/deployment/selfmanaged/architecture/

=== PAGE: https://www.union.ai/docs/v2/union/deployment/selfmanaged/architecture ===

# Architecture

> **📝 Note**
>
> An LLM-optimized bundle of this entire section is available at [`section.md`](https://www.union.ai/docs/v2/union/deployment/selfmanaged/section.md).
> This single file contains all pages in this section, optimized for AI coding agent context.

This section covers the architecture of the Union.ai data plane.
It provides an overview of the components and their interactions within the system.
Understanding the architecture is crucial for effectively deploying and managing your Union.ai cluster.

=== PAGE: https://www.union.ai/docs/v2/union/deployment/selfmanaged/architecture/overview ===

# Overview

The Union.ai architecture consists of two components, referred to as planes: the control plane and the data plane.

![](../../../_static/images/deployment/architecture.svg)

## Control plane

The control plane:
  * Runs within the Union.ai AWS account.
  * Provides the user interface through which users can access authentication, authorization, observation, and management functions.
  * Is responsible for placing executions onto data plane clusters and performing other cluster control and management functions.

## Data plane

Union.ai operates one control plane for each supported region, which supports all data planes within that region. You can choose the region in which to locate your data plane. Currently, Union.ai supports the `us-west`, `us-east`, `eu-west`, and `eu-central` regions, and more are being added.

### Data plane nodes

Worker nodes are responsible for executing your workloads. You have full control over the configuration of your worker nodes. When worker nodes are not in use, they automatically scale down to the configured minimum.

## Union.ai operator

The Union.ai hybrid architecture lets you maintain ultimate ownership and control of your data and compute infrastructure while enabling Union.ai to handle the details of managing that infrastructure.

Management of the data plane is mediated by a dedicated operator (the Union.ai operator) resident on that plane.
This operator is designed to perform its functions with the minimum set of permissions required.
It allows the control plane to spin clusters up and down, and it gives Union.ai's support engineers access to system-level logs and the ability to apply changes at the customer's request.
It _does not_ provide direct access to secrets or data.

In addition, communication is always initiated by the Union.ai operator in the data plane toward the Union.ai control plane, not the other way around.
This further enhances the security of your data plane.
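
The outbound-only pattern described above can be illustrated with a minimal sketch. This is not Union.ai's actual protocol: `ControlPlane`, `Operator`, `fetch_pending`, and `report` are hypothetical names, and an in-memory queue stands in for the network. The point it shows is that the control plane never opens a connection into the data plane; it only answers requests that the operator initiates.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ControlPlane:
    """Hypothetical stand-in for the control plane: it only answers requests."""

    pending: deque = field(default_factory=deque)
    results: list = field(default_factory=list)

    def enqueue(self, work: str) -> None:
        # Work waits here until an operator asks for it.
        self.pending.append(work)

    def fetch_pending(self) -> list:
        # Called by the operator; drains and returns the queued work.
        items = list(self.pending)
        self.pending.clear()
        return items

    def report(self, status: str) -> None:
        # Called by the operator to send status back.
        self.results.append(status)


@dataclass
class Operator:
    """Hypothetical data plane operator: every exchange starts here."""

    control_plane: ControlPlane

    def poll_once(self) -> int:
        work = self.control_plane.fetch_pending()  # outbound request
        for item in work:
            self.control_plane.report(f"done:{item}")  # outbound request
        return len(work)


cp = ControlPlane()
cp.enqueue("execution-1")
cp.enqueue("execution-2")
op = Operator(cp)
handled = op.poll_once()
```

Because the operator drives every exchange in this model, the data plane needs no inbound endpoint that the control plane can reach.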

Union.ai is SOC-2 Type 2 certified. A copy of the audit report is available upon request.

## Registry data

Registry data consists of:

* Names of workflows, tasks, launch plans, and artifacts
* Input and output types for workflows and tasks
* Execution status, start time, end time, and duration of workflows and tasks
* Version information for workflows, tasks, launch plans, and artifacts
* Artifact definitions

This type of data is stored in the control plane and is used to manage the execution of your workflows.
This does not include any workflow or task code, nor any data that is processed by your workflows or tasks.

## Execution data

Execution data consists of:

* Event data
* Workflow inputs
* Workflow outputs
* Data passed between tasks (task inputs and outputs)

This data is divided into two categories: *raw data* and *literal data*.

### Raw data

Raw data consists of:

* Files and directories
* Dataframes
* Models
* Python-pickled types

These are passed by reference between tasks and are always stored in an object store in your data plane.
This type of data is read by (and may be temporarily cached by) the control plane as needed, but is never stored there.

### Literal data

Literal data consists of:

* Primitive execution inputs (int, string, etc.)
* JSON-serializable dataclasses

These are passed by value, not by reference, and may be stored in the Union.ai control plane.

## Data privacy

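If you need to maintain strict data privacy, do not pass private information in literal form between tasks: literal data is passed by value and may be stored in the Union.ai control plane. Pass such information as raw data instead, which is stored only in the object store in your data plane.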
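
One way to act on this is to hand sensitive values over by reference rather than by value. The sketch below is illustrative only: `DataPlaneStore`, `put`, `get`, `produce`, and `consume` are hypothetical names, and the in-memory dictionary stands in for an object store in your data plane. None of them are Union.ai APIs. Only the opaque reference travels between the two functions, so the sensitive record never appears in literal form.

```python
import uuid


class DataPlaneStore:
    """Hypothetical in-memory stand-in for an object store in the data plane."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, payload: bytes) -> str:
        # Return an opaque reference; the payload itself stays in the store.
        ref = f"store://{uuid.uuid4().hex}"
        self._objects[ref] = payload
        return ref

    def get(self, ref: str) -> bytes:
        return self._objects[ref]


store = DataPlaneStore()


def produce(record: str) -> str:
    # Pass the reference downstream instead of the record itself.
    return store.put(record.encode("utf-8"))


def consume(ref: str) -> int:
    # Dereference inside the data plane and return only a non-sensitive result.
    return len(store.get(ref).decode("utf-8"))


ref = produce("patient-id=12345")
length = consume(ref)
```

In this model, the value that could end up outside the data plane is the reference string and the derived length, not the record.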

=== PAGE: https://www.union.ai/docs/v2/union/deployment/selfmanaged/architecture/kubernetes-rbac ===

# Kubernetes Access Controls

Union's data plane runs entirely within your Kubernetes cluster. This page documents the Kubernetes RBAC configuration
applied by the [dataplane Helm chart](https://github.com/unionai/helm-charts/tree/main/charts/dataplane), including service account configuration,
namespace-scoped Roles, and cluster-wide ClusterRoles for each component.

## Service account

By default, all data plane components share a single Kubernetes service account: `union-system`. This service account is configured through the `commonServiceAccount` Helm value and is used by the operator, executor, proxy, webhook, and FluentBit.

Users can disable the common service account and configure per-component service accounts instead. When `commonServiceAccount` is disabled, each component falls back to its own service account (for example, `operator-system` for the operator, `fluentbit-system` for FluentBit). Refer to the [dataplane Helm chart reference](https://www.union.ai/docs/v2/union/deployment/selfmanaged/helm-chart-reference/dataplane) for the full set of per-component service account values.

See the [dataplane Helm chart](https://github.com/unionai/helm-charts/tree/main/charts/dataplane) for the full set of Roles and ClusterRoles.

## Standard mode vs. low-privilege mode

The data plane supports two RBAC modes:

| Mode | RBAC scope | Use case |
|------|-----------|----------|
| **Standard** (default) | ClusterRoles + namespace Roles | Multi-namespace deployments, full feature set |
| **Low-privilege** (`low_privilege: true`) | Namespace-scoped Roles only | Single-namespace deployments, restricted environments |

Choose low-privilege mode when your cluster policies prohibit ClusterRoles (e.g. OPA Gatekeeper, Kyverno), when Union is
a tenant on a shared cluster, or when compliance requires minimizing blast radius to a single namespace. The tradeoff is
that multi-namespace workflow execution, automatic namespace provisioning (ClusterResourceSync), cluster-wide monitoring,
and usage collection are disabled.

In low-privilege mode, the chart automatically:
- Replaces ClusterRoles with namespace-scoped Roles
- Limits resource sync, executor, and monitoring to the release namespace
- Disables features that require cluster-wide access, such as ClusterResourceSync (which propagates configs and RBAC into user namespaces) and OpenCost (which aggregates spend across all namespaces)

## Namespace-scoped Roles

##### `proxy-system-secret`
- Scoped to `union` namespace
- Permissions on secrets: get, list, create, update, delete

##### `operator-system`
- Scoped to `union` namespace
- Permissions on secrets and deployments: get, list, watch, create, update

##### `union-operator-admission` (for webhook)
- Scoped to `union` namespace
- Permissions on secrets: get, create
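
To make the permission lists above concrete, the following sketch renders the `proxy-system-secret` permissions as a Kubernetes `Role` manifest. It is an illustration rather than the chart's actual template: the `make_role` helper is hypothetical, and the manifest rendered by the Helm chart may differ in labels, rule grouping, and other details.

```python
import json


def make_role(name: str, namespace: str, resources: list[str], verbs: list[str]) -> dict:
    """Build a namespace-scoped Role for core-API-group resources (hypothetical helper)."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": name, "namespace": namespace},
        "rules": [
            # "" is the core API group, which is where secrets live.
            {"apiGroups": [""], "resources": resources, "verbs": verbs}
        ],
    }


proxy_secret_role = make_role(
    name="proxy-system-secret",
    namespace="union",
    resources=["secrets"],
    verbs=["get", "list", "create", "update", "delete"],
)

# kubectl accepts JSON manifests as well as YAML.
manifest = json.dumps(proxy_secret_role, indent=2)
```

A `Role` only grants access within its own namespace, which is why these permissions on secrets do not extend beyond `union`.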

## ClusterRoles (standard mode only)

> [!NOTE] Low-privilege mode
> The ClusterRoles below are **not created** in low-privilege mode. Equivalent namespace-scoped Roles are created instead.

### Metrics and Monitoring

##### `release-name-kube-state-metrics`

- **Purpose**: Collects metrics from Kubernetes resources
- **Access Pattern**: Read-only (`list`, `watch`) access to numerous resources across multiple API groups
- **Scope**: Comprehensive: covers core resources, workloads, networking, storage, and authentication

##### `prometheus-operator`
- **Access**: Full control (`*`) over Prometheus monitoring resources
- **Key Permissions**:
  - Complete access to monitoring.coreos.com API group resources
  - Full access to statefulsets, configmaps, secrets
  - Pod management (list, delete)
  - Service/endpoint management
  - Read-only for nodes, namespaces, ingresses

##### `union-operator-prometheus`
- **Access**: Read-only access to metrics sources
- **Resources**: nodes, services, endpoints, pods, endpointslices, ingresses
- **Special**: Access to `/metrics` and `/metrics/cadvisor` endpoints

### Resource Management

##### `clustersync-resource`
- **Access**: Full control (`*`) over core and RBAC resources
- **Resources**:
  - Core: configmaps, namespaces, pods, resourcequotas, secrets, services, serviceaccounts, podtemplates
  - RBAC: roles, rolebindings, clusterrolebindings
- **API Groups**: `""` (core) and `rbac.authorization.k8s.io`

##### `proxy-system`
- **Access**: Read-only (`get`, `list`, `watch`)
- **Resources**: events, flyteworkflows, pods/log, pods, rayjobs, resourcequotas
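
As a rough way to reason about what a read-only ClusterRole such as `proxy-system` permits, the sketch below models its rule and checks individual requests against it. This is a deliberately simplified model, not the Kubernetes authorizer: the `allows` and `is_read_only` helpers are hypothetical, and API groups, `resourceNames`, and non-resource URLs are omitted because the listing above does not specify them.

```python
READ_ONLY_VERBS = {"get", "list", "watch"}

# Rule transcribed from the `proxy-system` entry above (API groups omitted).
proxy_system_rules = [
    {
        "resources": ["events", "flyteworkflows", "pods/log", "pods", "rayjobs", "resourcequotas"],
        "verbs": ["get", "list", "watch"],
    }
]


def allows(rules: list[dict], resource: str, verb: str) -> bool:
    """Return True if any rule grants `verb` on `resource` ("*" matches anything)."""
    return any(
        (resource in rule["resources"] or "*" in rule["resources"])
        and (verb in rule["verbs"] or "*" in rule["verbs"])
        for rule in rules
    )


def is_read_only(rules: list[dict]) -> bool:
    """Return True if every rule uses only get, list, or watch."""
    return all(set(rule["verbs"]) <= READ_ONLY_VERBS for rule in rules)


can_list_pods = allows(proxy_system_rules, "pods", "list")
can_delete_pods = allows(proxy_system_rules, "pods", "delete")
```

On a live cluster, `kubectl auth can-i` answers the same kind of question against the real RBAC configuration.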

### Workflow Management

##### `operator-system`
- **Access**: Full control over Flyte workflows, CRUD for core resources
- **Resources**:
  - Full access to flyteworkflows
  - Management of pods, configmaps, resourcequotas, podtemplates, nodes
  - Access to `/metrics` endpoint

##### `flytepropeller-webhook-role`
- **Access**: Get, create, update, patch
- **Resources**: mutatingwebhookconfigurations, secrets, pods, replicasets/finalizers

##### `flytepropeller-role`
- **Access**: Varied per resource type
- **Key Permissions**:
  - Read-only for pods
  - Event management
  - CRD management
  - Full control over flyteworkflows including finalizers

## Service Access

### `operator/operator-proxy`
Service that provides access to both cluster resources and cloud provider APIs, particularly focused on compute resource management.

#### Kubernetes Resources

##### Core Resources
- Pods: Access via informers to monitor and manage pod lifecycle
- Nodes: Access to retrieve node information
- ResourceQuotas: Read access
- ConfigMaps: Access for configuration management
- Secrets: Access for credentials storage
- Namespaces: Referenced in container/pod identification contexts

##### Custom Resources
- FlyteWorkflows: Management of v1alpha1.FlyteWorkflow resources
- Kueue Resources (optional): Access to ResourceFlavor, ClusterQueue, and other queue resources
- Karpenter NodePools (optional): For AWS-based compute resource management

##### Cloud Provider Resources
- Object Storage: Read/write operations to cloud storage buckets

##### Authentication and Configuration
- OAuth: Uses app ID for authentication with Union cloud services
- Service Account Roles: Configured via UserRoleKey and UserRole
- Cluster Information: Access to cluster metadata and metrics

### `FlytePropeller/PropellerWebhook`
Kubernetes operator that executes Flyte graphs natively on Kubernetes. The webhook runs as a separate deployment with configurable certificate management (Helm-generated, cert-manager, external, or legacy).

#### Kubernetes Resources
- Manages pod creation for executions
- Secret injection
- MutatingWebhookConfiguration management (standard mode only; disabled in low-privilege mode)

#### Custom Resources
- FlyteWorkflows: Management of v1alpha1.FlyteWorkflow resources

