# Self-managed deployment
> This bundle contains all pages in the Self-managed deployment section.
> Source: https://www.union.ai/docs/v2/union/deployment/selfmanaged/

=== PAGE: https://www.union.ai/docs/v2/union/deployment/selfmanaged ===

# Self-managed deployment

> **📝 Note**
>
> An LLM-optimized bundle of this entire section is available at [`section.md`](section.md).
> This single file contains all pages in this section, optimized for AI coding agent context.

In a self-managed deployment, you operate the data plane on your own Kubernetes infrastructure.
Union.ai runs the control plane, but you manage the cluster, upgrades, and operational aspects of the data plane yourself.
Union.ai has no access to your cluster, providing the highest level of data isolation.

## Getting started

1. Review the [architecture](./architecture/_index) to understand the control plane, data plane operators, and security model.
2. Check the **Self-managed deployment > Cluster recommendations** for Kubernetes version, networking, and IP planning requirements.
3. Set up your data plane on your cloud provider:
   - [Generic Kubernetes](./selfmanaged-generic/_index) (on-premise or any S3-compatible environment)
   - [AWS](./selfmanaged-aws/_index)
   - [GCP](./selfmanaged-gcp/_index)
   - [Azure](./selfmanaged-azure/_index)
   - [OCI](./selfmanaged-oci/_index)
   - [CoreWeave](./selfmanaged-coreweave/_index)

## Configuration

After initial setup, configure platform features on your cluster:

- **Self-managed deployment > Advanced Configurations > Authentication**
- **Self-managed deployment > Advanced Configurations > Image Builder**
- **Self-managed deployment > Advanced Configurations > Multiple Clusters**
- **Self-managed deployment > Advanced Configurations > Configuring Service and Worker Node Pools**
- **Self-managed deployment > Advanced Configurations > Monitoring**
- **Self-managed deployment > Advanced Configurations > Persistent logs**
- **Self-managed deployment > Advanced Configurations > Data retention policies**
- **Self-managed deployment > Advanced Configurations > Namespace mapping**
- **Self-managed deployment > Advanced Configurations > Secrets**

## Reference

- [Helm chart reference](./helm-chart-reference/_index) for available chart values
- **Self-managed deployment > Architecture > Kubernetes Access Controls** for RBAC configuration details

=== PAGE: https://www.union.ai/docs/v2/union/deployment/selfmanaged/architecture ===

# Architecture

> **📝 Note**
>
> An LLM-optimized bundle of this entire section is available at [`section.md`](section.md).
> This single file contains all pages in this section, optimized for AI coding agent context.

This section covers the architecture of the Union.ai data plane.
It provides an overview of the components and their interactions within the system.
Understanding the architecture is crucial for effectively deploying and managing your Union.ai cluster.

=== PAGE: https://www.union.ai/docs/v2/union/deployment/selfmanaged/architecture/overview ===

# Overview

The Union.ai architecture consists of two components, referred to as planes — the control plane and the data plane.

![](../../../_static/images/deployment/architecture.svg)

## Control plane

The control plane:
  * Runs within the Union.ai AWS account.
  * Provides the user interface through which users can access authentication, authorization, observation, and management functions.
  * Is responsible for placing executions onto data plane clusters and performing other cluster control and management functions.

## Data plane

Union.ai operates one control plane for each supported region, which supports all data planes within that region. You can choose the region in which to locate your data plane. Currently, Union.ai supports the `us-west`, `us-east`, `eu-west`, and `eu-central` regions, and more are being added.

### Data plane nodes

Worker nodes are responsible for executing your workloads. You have full control over the configuration of your worker nodes. When worker nodes are not in use, they automatically scale down to the configured minimum.

## Union.ai operator

The Union.ai hybrid architecture lets you maintain ultimate ownership and control of your data and compute infrastructure while enabling Union.ai to handle the details of managing that infrastructure.

Management of the data plane is mediated by a dedicated operator (the Union.ai operator) resident on that plane.
This operator is designed to perform its functions with only the very minimum set of required permissions.
It allows the control plane to spin up and down clusters and provides Union.ai's support engineers with access to system-level logs and the ability to apply changes at customer request.
It _does not_ provide direct access to secrets or data.

In addition, communication is always initiated by the Union.ai operator in the data plane toward the Union.ai control plane, not the other way around.
This further enhances the security of your data plane.

Union.ai is SOC-2 Type 2 certified. A copy of the audit report is available upon request.

## Registry data

Registry data consists of:

* Names of workflows, tasks, launch plans, and artifacts
* Input and output types for workflows and tasks
* Execution status, start time, end time, and duration of workflows and tasks
* Version information for workflows, tasks, launch plans, and artifacts
* Artifact definitions

This type of data is stored in the control plane and is used to manage the execution of your workflows.
This does not include any workflow or task code, nor any data that is processed by your workflows or tasks.

## Execution data

Execution data consists of:

* Event data
* Workflow inputs
* Workflow outputs
* Data passed between tasks (task inputs and outputs)

This data is divided into two categories: *raw data* and *literal data*.

### Raw data

Raw data consists of:

* Files and directories
* Dataframes
* Models
* Python-pickled types

These are passed by reference between tasks and are always stored in an object store in your data plane.
This type of data is read (and may be temporarily cached) by the control plane as needed, but is never stored there.

### Literal data

Literal data consists of:

* Primitive execution inputs (int, string, etc.)
* JSON-serializable dataclasses

These are passed by value, not by reference, and may be stored in the Union.ai control plane.

## Data privacy

If you are concerned with maintaining strict data privacy, be sure not to pass private information in literal form between tasks.

=== PAGE: https://www.union.ai/docs/v2/union/deployment/selfmanaged/architecture/kubernetes-rbac ===

# Kubernetes Access Controls

Union's data plane runs entirely within your Kubernetes cluster. This page documents the Kubernetes RBAC configuration
applied by the [dataplane Helm chart](https://github.com/unionai/helm-charts/tree/main/charts/dataplane), including service account configuration,
namespace-scoped Roles, and cluster-wide ClusterRoles for each component.

## Service account

By default, all data plane components share a single Kubernetes service account: `union-system`. This service account is configured through the `commonServiceAccount` Helm value and is used by the operator, executor, proxy, webhook, and FluentBit.

Users can disable the common service account and configure per-component service accounts instead. When `commonServiceAccount` is disabled, each component falls back to its own service account (for example, `operator-system` for the operator, `fluentbit-system` for FluentBit). Refer to the [dataplane Helm chart reference](../helm-chart-reference/dataplane) for the full set of per-component service account values.

See the [dataplane helm charts](https://github.com/unionai/helm-charts/tree/main/charts/dataplane) for the full set of Roles and ClusterRoles.
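
A quick way to confirm which service account each data plane component is actually using is to inspect the pods in the data plane namespace. A minimal sketch, assuming the chart was installed into the `union` namespace as in the deployment guides:

```bash
# List service accounts in the data plane namespace
kubectl get serviceaccounts -n union

# Show which service account each data plane pod runs under
kubectl get pods -n union \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.serviceAccountName}{"\n"}{end}'
```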

## Standard mode vs. low-privilege mode

The data plane supports two RBAC modes:

| Mode | RBAC scope | Use case |
|------|-----------|----------|
| **Standard** (default) | ClusterRoles + namespace Roles | Multi-namespace deployments, full feature set |
| **Low-privilege** (`low_privilege: true`) | Namespace-scoped Roles only | Single-namespace deployments, restricted environments |

Choose low-privilege mode when your cluster policies prohibit ClusterRoles (e.g. OPA Gatekeeper, Kyverno), when Union is
a tenant on a shared cluster, or when compliance requires minimizing blast radius to a single namespace. The tradeoff is
that multi-namespace workflow execution, automatic namespace provisioning (ClusterResourceSync), cluster-wide monitoring,
and usage collection are disabled.

In low-privilege mode, the chart automatically:
- Replaces ClusterRoles with namespace-scoped Roles
- Limits resource sync, executor, and monitoring to the release namespace
- Disables features that require cluster-wide access, such as ClusterResourceSync and OpenCost (OpenCost needs cluster-wide access to aggregate spend across all namespaces; ClusterResourceSync needs it to propagate configs and RBAC into user namespaces)
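
If your environment requires it, you can enable low-privilege mode at install or upgrade time. A minimal sketch, assuming `low_privilege` is set as a top-level chart value (as listed in the table above) and reusing the install command from the deployment guides:

```bash
# Install (or upgrade) the data plane with namespace-scoped RBAC only
helm upgrade --install union unionai/dataplane \
  -f <GENERATED_VALUES_FILE> \
  --namespace union \
  --create-namespace \
  --set low_privilege=true
```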

## Namespace-scoped Roles

##### `proxy-system-secret`
- Scoped to `union` namespace
- Permissions on secrets: get, list, create, update, delete

##### `operator-system`
- Scoped to `union` namespace
- Permissions on secrets and deployments: get, list, watch, create, update

##### `union-operator-admission` (for webhook)
- Scoped to `union` namespace
- Permissions on secrets: get, create

## ClusterRoles (standard mode only)

> [!NOTE] Low-privilege mode
> The ClusterRoles below are **not created** in low-privilege mode. Equivalent namespace-scoped Roles are created instead.

### Metrics and Monitoring

##### `release-name-kube-state-metrics`

- **Purpose**: Collects metrics from Kubernetes resources
- **Access Pattern**: Read-only (`list`, `watch`) to numerous resources across multiple API groups
- **Scope**: Comprehensive — covers core resources, workloads, networking, storage, and authentication

##### `prometheus-operator`
- **Access**: Full control (`*`) over Prometheus monitoring resources
- **Key Permissions**:
  - Complete access to monitoring.coreos.com API group resources
  - Full access to statefulsets, configmaps, secrets
  - Pod management (list, delete)
  - Service/endpoint management
  - Read-only for nodes, namespaces, ingresses

##### `union-operator-prometheus`
- **Access**: Read-only access to metrics sources
- **Resources**: nodes, services, endpoints, pods, endpointslices, ingresses
- **Special**: Access to `/metrics` and `/metrics/cadvisor` endpoints

### Resource Management

##### `clustersync-resource`
- **Access**: Full control (`*`) over core and RBAC resources
- **Resources**:
  - Core: configmaps, namespaces, pods, resourcequotas, secrets, services, serviceaccounts, podtemplates
  - RBAC: roles, rolebindings, clusterrolebindings
- **API Groups**: `""` (core) and `rbac.authorization.k8s.io`

##### `proxy-system`
- **Access**: Read-only (`get`, `list`, `watch`)
- **Resources**: events, flyteworkflows, pods/log, pods, rayjobs, resourcequotas

### Workflow Management

##### `operator-system`
- **Access**: Full control over Flyte workflows, CRUD for core resources
- **Resources**:
  - Full access to flyteworkflows
  - Management of pods, configmaps, resourcequotas, podtemplates, nodes
  - Access to `/metrics` endpoint

##### `flytepropeller-webhook-role`
- **Access**: Get, create, update, patch
- **Resources**: mutatingwebhookconfigurations, secrets, pods, replicasets/finalizers

##### `flytepropeller-role`
- **Access**: Varied per resource type
- **Key Permissions**:
  - Read-only for pods
  - Event management
  - CRD management
  - Full control over flyteworkflows including finalizers

## Service Access

### `operator/operator-proxy`
Service that provides access to both cluster resources and cloud provider APIs, particularly focused on compute resource management.

#### Kubernetes Resources

##### Core Resources
- Pods: Access via informers to monitor and manage pod lifecycle
- Nodes: Access to retrieve node information
- ResourceQuotas: Read access
- ConfigMaps: Access for configuration management
- Secrets: Access for credentials storage
- Namespaces: Referenced in container/pod identification contexts

##### Custom Resources
- FlyteWorkflows: Management of v1alpha1.FlyteWorkflow resources
- Kueue Resources (optional): Access to ResourceFlavor, ClusterQueue, and other queue resources
- Karpenter NodePools (optional): For AWS-based compute resource management

##### Cloud Provider Resources
- Object Storage: Read/write operations to cloud storage buckets

##### Authentication and Configuration
- OAuth: Uses app ID for authentication with Union cloud services
- Service Account Roles: Configured via UserRoleKey and UserRole
- Cluster Information: Access to cluster metadata and metrics

### `FlytePropeller/PropellerWebhook`
Kubernetes operator that executes Flyte graphs natively on Kubernetes. The webhook runs as a separate deployment with configurable certificate management (Helm-generated, cert-manager, external, or legacy).

#### Kubernetes Resources
- Manages pod creation for executions
- Secret injection
- MutatingWebhookConfiguration management (standard mode only; disabled in low-privilege mode)

#### Custom Resources
- FlyteWorkflows: Management of v1alpha1.FlyteWorkflow resources

=== PAGE: https://www.union.ai/docs/v2/union/deployment/selfmanaged/cluster-recommendations ===

# Cluster recommendations

Union.ai is capable of running on any Kubernetes cluster.
This includes managed Kubernetes services such as Google Kubernetes Engine (GKE), Azure Kubernetes Service (AKS), and Amazon Elastic Kubernetes Service (EKS), as well as self-managed Kubernetes clusters.

While many configurations are supported, we have some recommendations to ensure the best performance and reliability of your Union deployment.

## Kubernetes Versions

We recommend running Kubernetes versions that are [actively supported by the Kubernetes community](https://kubernetes.io/releases/).  This
typically means running one of the most recent three minor versions.  For example, if the most recent version is 1.32, we recommend
running 1.32, 1.31, or 1.30.

## Networking Requirements

Many Container Network Interface (CNI) plugins require planning for IP address allocation capacity.
For example, [Amazon's VPC CNI](https://docs.aws.amazon.com/eks/latest/userguide/managing-vpc-cni.html) and [GKE's Dataplane v2](https://cloud.google.com/kubernetes-engine/docs/concepts/dataplane-v2)
allocate IP addresses to Kubernetes Pods out of one or more of your VPC's subnets.
If you are using one of these CNI plugins, you should ensure that your VPC's subnets have enough available IP addresses to support the number of concurrent tasks you expect to run.

### VPC and subnet sizing

We recommend using at least a `/16` CIDR range (65,536 addresses) for the overall VPC. Within that range, size your subnets according to their role:

| Subnet type | Recommended size | Purpose |
|-------------|-----------------|---------|
| **Private subnets** (worker nodes) | `/18` per AZ (16,384 addresses) | Pods receive IPs from these subnets. Size for your peak concurrent task count — each running task pod consumes at least one IP. |
| **Public subnets** (load balancers) | `/24` per AZ (256 addresses) | Only needed for internet-facing load balancers and NAT Gateways. Minimal IP consumption. |

As a rule of thumb, you should have at least 1 available IP address for each task you expect to run concurrently.
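
For a rough capacity check, the arithmetic is simple. The sketch below assumes one subnet IP per running task pod and a small fixed allowance for nodes, system pods, and cloud-reserved addresses; adjust the numbers for your CNI and environment:

```bash
# Approximate concurrent-task capacity of a /18 private subnet in one AZ
SUBNET_PREFIX=18
TOTAL_IPS=$(( 2 ** (32 - SUBNET_PREFIX) ))   # /18 -> 16,384 addresses
OVERHEAD=500                                 # rough allowance for nodes, system pods, reserved IPs
echo "Approximate concurrent task capacity: $(( TOTAL_IPS - OVERHEAD ))"
```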

### Public vs. private subnets

We recommend running worker nodes in **private subnets** (no direct internet ingress). This is the default for managed Kubernetes services like EKS, GKE, and AKS. Public subnets are only needed for internet-facing load balancers or bastion hosts.

A typical layout per availability zone:

```
VPC (/16)
├── Public subnet  (/24) — NAT Gateway, load balancers
└── Private subnet (/18) — Worker nodes, pods
```

### NAT Gateway requirements

Worker nodes in private subnets need outbound internet access to pull container images from public registries (e.g. Docker Hub, ECR Public, ghcr.io) and to communicate with the Union control plane. This requires a **NAT Gateway** (or equivalent) in each availability zone's public subnet.

| Cloud | Service | Notes |
|-------|---------|-------|
| AWS | [NAT Gateway](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-nat-gateway.html) | One per AZ for high availability. `eksctl` creates these automatically with the `--managed` flag. |
| GCP | [Cloud NAT](https://cloud.google.com/nat/docs/overview) | Attach to the Cloud Router for your VPC. Private GKE clusters require this. |
| Azure | [NAT Gateway](https://learn.microsoft.com/en-us/azure/nat-gateway/nat-overview) | Associate with the AKS subnet. Alternatively, AKS `outboundType: loadBalancer` (default) provides outbound access via the LB. |

> [!NOTE] If you use a fully private cluster with no outbound internet access, you must configure private endpoints or mirrors for all container registries and the Union control plane.
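
If your cluster was not created with tooling that provisions NAT automatically, you can create one yourself. The following is a sketch for AWS; `<PUBLIC_SUBNET_ID>` and `<PRIVATE_ROUTE_TABLE_ID>` are placeholders for your public subnet and the route table used by your private subnets:

```bash
# Allocate an Elastic IP and create a NAT Gateway in the public subnet
ALLOCATION_ID=$(aws ec2 allocate-address --domain vpc --query AllocationId --output text)
NAT_GW_ID=$(aws ec2 create-nat-gateway \
  --subnet-id <PUBLIC_SUBNET_ID> \
  --allocation-id ${ALLOCATION_ID} \
  --query 'NatGateway.NatGatewayId' --output text)

# Route outbound traffic from the private subnets through the NAT Gateway
aws ec2 create-route \
  --route-table-id <PRIVATE_ROUTE_TABLE_ID> \
  --destination-cidr-block 0.0.0.0/0 \
  --nat-gateway-id ${NAT_GW_ID}
```

Cloud NAT on GCP and Azure NAT Gateway have equivalent provisioning flows; see the linked documentation in the table above.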

## Service accounts

The Union.ai data plane uses a single Kubernetes service account, `union-system`, shared by all platform components (operator, executor, webhook, proxy, and FluentBit). This service account needs cloud provider credentials to access:

- **Object storage** (S3, GCS, or Azure Blob Storage) — read/write workflow execution data (task inputs/outputs, and bundled code in the fast registration bucket).
- **Container registry** (ECR, Artifact Registry, or ACR) — pull task container images; push images when Image Builder is enabled.

See the cloud-specific setup pages for details on configuring this service account:
[AWS](./selfmanaged-aws/_index), [GCP](./selfmanaged-gcp/_index), [Azure](./selfmanaged-azure/_index).

> [!NOTE] Common service account
> In previous versions, each component had its own service account. The consolidated `union-system` service account simplifies IAM configuration — you only need to bind cloud permissions to a single identity.

## Performance Recommendations

### Node Pools

It is recommended but not required to use separate node pools for the Union services and the Union worker pods.  This allows you to
guard against resource contention between Union services and other tasks running in your cluster.  You can find additional information
in the [Configuring Node Pools](./configuration/node-pools) section.

By default, the Union installation requests the following resources:

|          | CPU (vCPUs)| Memory (GiB) |
|----------|------------|--------------|
| Requests |          14|          27.1|
| Limits   |          17|            32|

For GPU access, Union injects tolerations and label selectors into execution Pods.
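
To see exactly what was injected into a given execution pod, inspect its spec directly. A small sketch; the pod name and namespace are placeholders for one of your own task executions:

```bash
# Show the tolerations and node selector on a running task pod
kubectl get pod <TASK_POD_NAME> -n <PROJECT_NAMESPACE> \
  -o jsonpath='{.spec.tolerations}{"\n"}{.spec.nodeSelector}{"\n"}'
```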

=== PAGE: https://www.union.ai/docs/v2/union/deployment/selfmanaged/selfmanaged-generic ===

# Data plane setup on generic Kubernetes

Union.ai's modular architecture allows for great flexibility and control.
You can decide how many clusters to have, their shape, and who has access to what.
All communication is encrypted.  The Union architecture is described on the [Architecture](../architecture/_index) page.

> [!NOTE] These instructions cover installing Union.ai in an on-premise Kubernetes cluster.
> If you are installing at a cloud provider, use the cloud provider specific instructions: [AWS](../selfmanaged-aws/_index), [GCP](../selfmanaged-gcp/_index), [Azure](../selfmanaged-azure/_index), [OCI](../selfmanaged-oci/_index).

If you already have a Kubernetes cluster, S3-compatible object storage, a container registry, and credentials configured, skip directly to **Self-managed deployment > Data plane setup on generic Kubernetes > Deploy the dataplane**.

Otherwise, start with **Self-managed deployment > Data plane setup on generic Kubernetes > Prepare infrastructure** to set up the required resources.

=== PAGE: https://www.union.ai/docs/v2/union/deployment/selfmanaged/selfmanaged-generic/prepare-infra ===

# Prepare infrastructure

This page walks you through creating the resources needed for a Union data plane on generic (on-premise) Kubernetes. If you already have these resources, skip to [Deploy the dataplane](../selfmanaged-generic/deploy-dataplane).

> [!NOTE] If you are installing at a cloud provider, use the cloud provider specific instructions: [AWS](../selfmanaged-aws/_index), [GCP](../selfmanaged-gcp/_index), [Azure](../selfmanaged-azure/_index), [OCI](../selfmanaged-oci/_index).

## Kubernetes Cluster

You need a Kubernetes cluster running one of the most recent three minor Kubernetes versions. See [Cluster Recommendations](../cluster-recommendations) for networking and node pool guidance.

If you don't already have a cluster, common options for provisioning one include:

- [kubeadm](https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/) — the standard Kubernetes bootstrap tool
- [k3s](https://k3s.io/) — lightweight Kubernetes distribution
- [RKE2](https://docs.rke2.io/) — Rancher's hardened Kubernetes distribution

Regardless of how you create your cluster, verify the following requirements are met:

- **Kubernetes version**: one of the most recent three minor versions. [Learn more](https://kubernetes.io/releases/version-skew-policy/).
- **Networking**: a CNI plugin is installed and functioning (e.g. Calico, Flannel, Cilium).
- **DNS**: CoreDNS is running in the cluster.
- **Storage**: a default StorageClass is configured for persistent volume claims.
- **Load balancer or ingress**: an ingress controller or load balancer is available for exposing services.

Union supports autoscaling and the use of spot (interruptible) instances if your infrastructure provides them.

## Object Storage

Each data plane uses S3-compatible object storage (such as [MinIO](https://min.io)) to store data used in workflow execution.
Union recommends the use of two buckets:

1. **Metadata bucket**: contains workflow execution data such as task inputs and outputs.
2. **Fast registration bucket**: contains local code artifacts copied into the Flyte task container at runtime when using `flyte deploy` or `flyte run --copy-style all`.

You can also choose to use a single bucket.

Create the buckets. For example, using the [MinIO Client (`mc`)](https://min.io/docs/minio/linux/reference/minio-mc.html):

```bash
# Set an alias for your MinIO server (if not already configured)
mc alias set myminio https://minio.example.com MINIO_ACCESS_KEY MINIO_SECRET_KEY

# Create the buckets
mc mb myminio/union-metadata
mc mb myminio/union-fast-reg
```

### CORS Configuration

To enable the [Code Viewer](../configuration/code-viewer) in the Union UI, configure a CORS policy on your bucket(s). This allows the UI to securely fetch code bundles directly from storage.

Save the following as `cors.json`:

```json
{
  "CORSRules": [
    {
      "AllowedOrigins": ["https://*.unionai.cloud"],
      "AllowedMethods": ["GET", "HEAD"],
      "AllowedHeaders": ["*"],
      "ExposeHeaders": ["ETag"],
      "MaxAgeSeconds": 3600
    }
  ]
}
```

Apply it to both buckets:

```bash
mc anonymous set-json cors.json myminio/union-metadata
mc anonymous set-json cors.json myminio/union-fast-reg
```

Consult your object storage provider's documentation for the equivalent configuration if you are not using MinIO.

### Data Retention

Union recommends using lifecycle policies on these buckets to manage storage costs. See [Data retention policy](../configuration/data-retention) for more information.

## Container Registry

You need a container registry accessible from your cluster for Image Builder to push and pull container images. Options include:

- A private [Docker Registry](https://docs.docker.com/registry/)
- [Harbor](https://goharbor.io/)
- Any OCI-compliant registry

For a basic private Docker Registry, you can start one with:

```bash
docker run -d -p 5000:5000 --restart=always --name registry registry:2
```

> [!NOTE] This runs an unauthenticated registry suitable for testing. For production, configure TLS and authentication. See the [Docker Registry documentation](https://docs.docker.com/registry/deploying/) for details.

Note the registry URL (e.g. `registry.example.com:5000/union`) — you will configure it in your Helm values.
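
Before proceeding, it can be useful to verify that you can push to the registry from a machine with Docker installed. A quick sketch, using a placeholder registry URL:

```bash
# Log in, tag a small public image, and push it to your private registry
docker login registry.example.com:5000
docker pull alpine:latest
docker tag alpine:latest registry.example.com:5000/union/smoke-test:latest
docker push registry.example.com:5000/union/smoke-test:latest
```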

## Identity & Access

On generic Kubernetes, Union authenticates to object storage and the container registry using static credentials (access key and secret key). These are configured in the generated values file during deployment.

### Storage credentials

Ensure the credentials you provide have read/write access to your storage bucket(s). If you are using MinIO, you can create a dedicated access key pair through the MinIO Console or the `mc` CLI:

```bash
# Create a new access key on your MinIO server
mc admin user add myminio union-service GENERATED_SECRET_KEY

# Attach a read/write policy for the Union buckets
mc admin policy attach myminio readwrite --user union-service
```

> [!NOTE] For production, create a scoped policy that limits access to only the Union buckets rather than using the built-in `readwrite` policy.

### Registry credentials

Ensure you have push/pull credentials for your container registry. The specifics depend on your registry choice (Docker Registry basic auth, Harbor robot accounts, etc.).

> [!NOTE] Worker pod authentication
> Worker pods (task executions) use the same storage credentials as the platform services. The credentials are injected into per-project namespaces via the cluster resource sync mechanism.

=== PAGE: https://www.union.ai/docs/v2/union/deployment/selfmanaged/selfmanaged-generic/deploy-dataplane ===

# Deploy the dataplane

If you have not yet set up the required resources (Kubernetes cluster, object storage, container registry, credentials), see [Prepare infrastructure](../selfmanaged-generic/prepare-infra) first.

## Assumptions

* You have a Union.ai organization, and you know the control plane URL for your organization (e.g. `https://your-org-name.us-east-2.unionai.cloud`).
* You have a cluster name provided by or coordinated with Union.
* You have a Kubernetes cluster, running one of the most recent three minor Kubernetes versions. [Learn more](https://kubernetes.io/releases/version-skew-policy/).
* You have object storage provided by a vendor or an S3-compatible platform (such as [MinIO](https://min.io)), with CORS configured as described in [Prepare infrastructure](../selfmanaged-generic/prepare-infra).
* You have a container registry accessible from your cluster.

## Prerequisites

* Install [Helm 3](https://helm.sh/docs/intro/install/).
* Install [uctl](https://www.union.ai/docs/v2/union/deployment/api-reference/uctl-cli/_index).

## Deploy the Union.ai operator

1. Add the Union.ai Helm repo:

   ```bash
   helm repo add unionai https://unionai.github.io/helm-charts/
   helm repo update
   ```

2. Use the `uctl selfserve provision-dataplane-resources` command to generate a new client and client secret for communicating with your Union control plane, provision authorization permissions for the app to operate on the Union cluster name you have selected, generate a values file for installing the data plane in your Kubernetes cluster, and print follow-up instructions:

   ```bash
   uctl config init --host=<YOUR_UNION_CONTROL_PLANE_URL>
   uctl selfserve provision-dataplane-resources --clusterName <YOUR_SELECTED_CLUSTERNAME>  --provider metal
   ```

   * The command will output the ID, name, and a secret that will be used by the Union services to communicate with your control plane.
     It will also generate a YAML file specific to the provider that you specify, in this case `metal` (bare metal / generic).

   * Save the secret that is displayed. Union does not store the credentials; you can rerun the same command to retrieve the secret later.

3. Update the generated values file with your infrastructure details:

   - Set `storage.endpoint` to your S3-compatible storage endpoint (e.g. your MinIO URL).
   - Set `storage.accessKey` and `storage.secretKey` to your storage credentials.
   - Set `storage.bucketName` and `storage.fastRegistrationBucketName` to your bucket name(s).
   - Set `storage.region` to the region of your storage provider.
   - The same credentials are also needed in `fluentbit.env` for log shipping.

4. Install the data plane Helm chart:

   ```bash
   helm upgrade --install union unionai/dataplane \
     -f <GENERATED_VALUES_FILE> \
     --namespace union \
     --create-namespace \
     --force-conflicts
   ```

5. Create an API key for your organization. This is required for v2 workflow executions on the data plane. If you have already created one, rerun the same command to propagate the key to the new cluster:

   ```bash
   uctl create apikey --keyName EAGER_API_KEY --org <YOUR_ORG_NAME>
   ```

6. Once deployed you can check to see if the cluster has been successfully registered to the control plane:

   ```bash
   uctl get cluster
    ----------- ------- --------------- -----------
   | NAME      | ORG   | STATE         | HEALTH    |
    ----------- ------- --------------- -----------
   | <cluster> | <org> | STATE_ENABLED | HEALTHY   |
    ----------- ------- --------------- -----------
   1 rows
   ```
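
   You can also confirm that the data plane pods themselves are healthy:

   ```bash
   # All pods in the union namespace should eventually reach Running (or Completed)
   kubectl get pods -n union
   ```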

7. Follow the [Quickstart](https://www.union.ai/docs/v2/union/user-guide/quickstart) to run your first workflow and verify your cluster is working correctly.

=== PAGE: https://www.union.ai/docs/v2/union/deployment/selfmanaged/selfmanaged-aws ===

# Data plane setup on AWS

Union.ai's modular architecture allows for great flexibility and control.
You can decide how many clusters to have, their shape, and who has access to what.
All communication is encrypted.  The Union architecture is described on the [Architecture](../architecture/_index) page.

If you already have an EKS cluster, S3 buckets, ECR repository, and IAM role configured, skip directly to **Self-managed deployment > Data plane setup on AWS > Deploy the dataplane**.

Otherwise, start with **Self-managed deployment > Data plane setup on AWS > Prepare infrastructure** to set up the required AWS resources.

=== PAGE: https://www.union.ai/docs/v2/union/deployment/selfmanaged/selfmanaged-aws/prepare-infra ===

# Prepare infrastructure

This page walks you through creating the AWS resources needed for a Union data plane. If you already have these resources, skip to [Deploy the dataplane](../selfmanaged-aws/deploy-dataplane).

## Environment variables

Set these variables before running the commands below. Customize the names if you are deploying multiple data planes in the same AWS account.

```bash
export AWS_REGION=us-east-2                          # AWS region for all resources
export CLUSTER_NAME=union-dataplane                  # EKS cluster name
export BUCKET_PREFIX=union-dataplane                 # prefix for S3 buckets (must be globally unique)
export ECR_REPO_NAME=union-dataplane                 # ECR repository name
export IAM_ROLE_NAME=union-system-role               # IAM role name
```

## EKS Cluster

You need an EKS cluster running one of the most recent three minor Kubernetes versions. See [Cluster Recommendations](../cluster-recommendations) for networking and node pool guidance.

If you don't already have a cluster, create one with `eksctl`:

```bash
eksctl create cluster \
  --name ${CLUSTER_NAME} \
  --region ${AWS_REGION} \
  --version 1.31 \
  --node-type m5.2xlarge \
  --nodes 3 \
  --with-oidc \
  --managed
```

> [!NOTE] The `--with-oidc` flag creates an IAM OIDC provider for the cluster, which is required for **Self-managed deployment > Data plane setup on AWS > Prepare infrastructure > IAM** below.

The following EKS add-ons are required and come pre-installed on managed clusters created with `eksctl`:
  - CoreDNS
  - Amazon VPC CNI
  - Kube-proxy

If you created your cluster through other means, verify they are installed:

```bash
aws eks list-addons --cluster-name ${CLUSTER_NAME} --region ${AWS_REGION}
```

Union supports Autoscaling and the use of spot (interruptible) instances.

## S3

Each data plane uses S3 buckets to store data used in workflow execution.
Union recommends the use of two S3 buckets:

1. **Metadata bucket**: contains workflow execution data such as task inputs and outputs.
2. **Code bundle/Fast registration bucket**: contains local code artifacts copied into the Flyte task container at runtime when using `flyte deploy` or `flyte run --copy-style all`.

You can also choose to use a single bucket.

Create the buckets:

```bash
aws s3api create-bucket \
  --bucket ${BUCKET_PREFIX}-metadata \
  --region ${AWS_REGION} \
  --create-bucket-configuration LocationConstraint=${AWS_REGION}

aws s3api create-bucket \
  --bucket ${BUCKET_PREFIX}-fast-reg \
  --region ${AWS_REGION} \
  --create-bucket-configuration LocationConstraint=${AWS_REGION}
```

> [!NOTE] If your region is `us-east-1`, omit the `--create-bucket-configuration` flag.

### CORS Configuration

To enable the [Code Viewer](../configuration/code-viewer) in the Union UI, configure a CORS policy on your buckets. This allows the UI to securely fetch code bundles directly from S3.

Save the following as `cors.json`:

```json
{
  "CORSRules": [
    {
      "AllowedHeaders": ["*"],
      "AllowedMethods": ["GET", "HEAD"],
      "AllowedOrigins": ["https://*.unionai.cloud"],
      "ExposeHeaders": ["ETag"],
      "MaxAgeSeconds": 3600
    }
  ]
}
```

Apply it to both buckets:

```bash
aws s3api put-bucket-cors --bucket ${BUCKET_PREFIX}-metadata --cors-configuration file://cors.json
aws s3api put-bucket-cors --bucket ${BUCKET_PREFIX}-fast-reg --cors-configuration file://cors.json
```

### Data Retention

Union recommends using Lifecycle Policy on these buckets to manage storage costs. See [Data retention policy](../configuration/data-retention) for more information.

## ECR

Create an [ECR private repository](https://docs.aws.amazon.com/AmazonECR/latest/userguide/repository-create.html) for Image Builder to push and pull container images:

```bash
aws ecr create-repository \
  --repository-name ${ECR_REPO_NAME} \
  --region ${AWS_REGION} \
  --image-scanning-configuration scanOnPush=true
```

Note the repository URI from the output (e.g. `<AWS_ACCOUNT_ID>.dkr.ecr.<AWS_REGION>.amazonaws.com/${ECR_REPO_NAME}`) — you will reference it when configuring IAM permissions below.
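
If you need the repository URI again later, you can query it directly:

```bash
aws ecr describe-repositories \
  --repository-names ${ECR_REPO_NAME} \
  --region ${AWS_REGION} \
  --query 'repositories[0].repositoryUri' \
  --output text
```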

## IAM

Create an IAM role that both the Union platform services and workflow task pods will use to access S3 and ECR. This role is assumed via [IAM Roles for Service Accounts (IRSA)](https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html).

### 1. Enable OIDC

If you created your cluster with `--with-oidc` above, this is already done. Otherwise, create an [IAM OIDC provider for your EKS cluster](https://docs.aws.amazon.com/eks/latest/userguide/enable-iam-roles-for-service-accounts.html#_create_oidc_provider_eksctl):

```bash
eksctl utils associate-iam-oidc-provider --cluster ${CLUSTER_NAME} --region ${AWS_REGION} --approve
```

Get the OIDC provider URL (you'll need it for the trust policy):

```bash
export OIDC_PROVIDER=$(aws eks describe-cluster \
  --region ${AWS_REGION} \
  --name ${CLUSTER_NAME} \
  --query "cluster.identity.oidc.issuer" \
  --output text | sed 's|https://||')

export AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
```

### 2. Create the IAM role

Save the following trust policy as `trust-policy.json`:

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Federated": "arn:aws:iam::$AWS_ACCOUNT_ID:oidc-provider/$OIDC_PROVIDER"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    "$OIDC_PROVIDER:aud": "sts.amazonaws.com"
                },
                "StringLike": {
                    "$OIDC_PROVIDER:sub": "system:serviceaccount:*"
                }
            }
        }
    ]
}
```

> [!NOTE] Why `system:serviceaccount:*`?
> Union platform services run in the data plane namespace (e.g. `union`), but workflow task pods run in per-project namespaces (e.g. `union-health-monitoring-development`). Both need to assume this role to access S3 and ECR.

Substitute your values and create the role:

```bash
envsubst < trust-policy.json > /tmp/trust-policy.json

aws iam create-role \
  --role-name ${IAM_ROLE_NAME} \
  --assume-role-policy-document file:///tmp/trust-policy.json
```

### 3. Attach the S3 policy

Save as `s3-policy.json` (replace `<BUCKET_PREFIX>` with your actual prefix):

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "S3BucketAccess",
            "Effect": "Allow",
            "Action": [
                "s3:DeleteObject*",
                "s3:GetObject*",
                "s3:ListBucket",
                "s3:PutObject*"
            ],
            "Resource": [
                "arn:aws:s3:::<BUCKET_PREFIX>-metadata",
                "arn:aws:s3:::<BUCKET_PREFIX>-metadata/*",
                "arn:aws:s3:::<BUCKET_PREFIX>-fast-reg",
                "arn:aws:s3:::<BUCKET_PREFIX>-fast-reg/*"
            ]
        }
    ]
}
```

```bash
aws iam put-role-policy \
  --role-name ${IAM_ROLE_NAME} \
  --policy-name union-s3-access \
  --policy-document file://s3-policy.json
```

### 4. Attach the ECR policy

Save as `ecr-policy.json` (replace `<AWS_REGION>`, `<AWS_ACCOUNT_ID>`, and `<REPOSITORY>`):

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ECRAuth",
            "Effect": "Allow",
            "Action": [
                "ecr:GetAuthorizationToken"
            ],
            "Resource": "*"
        },
        {
            "Sid": "ECRReadWrite",
            "Effect": "Allow",
            "Action": [
                "ecr:BatchCheckLayerAvailability",
                "ecr:BatchGetImage",
                "ecr:GetDownloadUrlForLayer",
                "ecr:DescribeImages",
                "ecr:PutImage",
                "ecr:InitiateLayerUpload",
                "ecr:UploadLayerPart",
                "ecr:CompleteLayerUpload"
            ],
            "Resource": "arn:aws:ecr:<AWS_REGION>:<AWS_ACCOUNT_ID>:repository/<REPOSITORY>"
        }
    ]
}
```

```bash
aws iam put-role-policy \
  --role-name ${IAM_ROLE_NAME} \
  --policy-name union-ecr-access \
  --policy-document file://ecr-policy.json
```

### 5. Configure the service account annotation

In your Helm values, annotate the `union-system` service account with the role ARN:

```yaml
commonServiceAccount:
  annotations:
    eks.amazonaws.com/role-arn: "arn:aws:iam::<AWS_ACCOUNT_ID>:role/${IAM_ROLE_NAME}"
```
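
You can retrieve the exact role ARN to use in the annotation with:

```bash
aws iam get-role --role-name ${IAM_ROLE_NAME} --query 'Role.Arn' --output text
```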

=== PAGE: https://www.union.ai/docs/v2/union/deployment/selfmanaged/selfmanaged-aws/deploy-dataplane ===

# Deploy the dataplane

If you have not yet set up the required AWS resources (EKS cluster, S3, ECR, IAM), see [Prepare infrastructure](../selfmanaged-aws/prepare-infra) first.

## Assumptions

* You have a Union.ai organization, and you know the control plane URL for your organization.
* You have a cluster name provided by or coordinated with Union.
* You have an EKS cluster with OIDC enabled, running one of the most recent three minor K8s versions.
  [Learn more](https://kubernetes.io/releases/version-skew-policy/)
* You have configured S3 bucket(s), ECR, and IAM role as described in [Prepare infrastructure](../selfmanaged-aws/prepare-infra).

## Prerequisites

* Install [Helm 3](https://helm.sh/docs/intro/install/).
* Install [uctl](https://www.union.ai/docs/v2/union/deployment/api-reference/uctl-cli/_index).

## Deploy the Union.ai operator

1. Add the Union.ai Helm repo:

   ```bash
   helm repo add unionai https://unionai.github.io/helm-charts/
   helm repo update
   ```

2. Use the `uctl selfserve provision-dataplane-resources` command to generate a new client and client secret for communicating with your Union control plane, provision authorization permissions for the app to operate on the Union cluster name you have selected, generate a values file for installing the data plane in your Kubernetes cluster, and print follow-up instructions:

   ```bash
   uctl config init --host=<YOUR_UNION_CONTROL_PLANE_URL>
   uctl selfserve provision-dataplane-resources --clusterName <YOUR_SELECTED_CLUSTERNAME>  --provider aws
   ```

   * The command will output the ID, name, and a secret that will be used by the Union services to communicate with your control plane.
     It will also generate a YAML file `<org>-values.yaml` specific to the provider that you specify, in this case `aws`.

   * Save the secret that is displayed. Union does not store the credentials; you can rerun the same command to retrieve the secret later.

3. Update the generated values file with your infrastructure details:

   Using the [environment variables](../selfmanaged-aws/prepare-infra#environment-variables) from the prepare infrastructure step (a sketch of the resulting overrides follows this list):

   - Set `global.AWS_ACCOUNT_ID` to your AWS account ID. You can retrieve it with `aws sts get-caller-identity --query Account --output text`.
   - Set `global.METADATA_BUCKET` to `${BUCKET_PREFIX}-metadata`.
   - Set `global.FAST_REGISTRATION_BUCKET` to `${BUCKET_PREFIX}-fast-reg`.
   - Set `global.BACKEND_IAM_ROLE_ARN` to `arn:aws:iam::${AWS_ACCOUNT_ID}:role/${IAM_ROLE_NAME}` (where `AWS_ACCOUNT_ID` is your 12-digit account ID).
   - Set `global.WORKER_IAM_ROLE_ARN` to the same value (or a separate role if you use distinct worker permissions).
   - Set `storage.bucketName` to `${BUCKET_PREFIX}-metadata`.
   - Set `storage.fastRegistrationBucketName` to `${BUCKET_PREFIX}-fast-reg`.
   - Set `storage.region` to `${AWS_REGION}`.
   - Set `commonServiceAccount.annotations."eks.amazonaws.com/role-arn"` to `arn:aws:iam::${AWS_ACCOUNT_ID}:role/${IAM_ROLE_NAME}`.
   - Set `imageBuilder.registryName` to `${ECR_REPO_NAME}` (defaults to `union-dataplane`; the chart auto-generates the full ECR URL from the account ID and region).
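
   Taken together, the edits amount to a small set of overrides. The sketch below shows one way to keep them in a separate overlay file (example values derived from the environment variables above; Helm merges multiple `-f` files, with later files taking precedence):

   ```bash
   cat > aws-overrides.yaml <<'EOF'
   global:
     AWS_ACCOUNT_ID: "123456789012"
     METADATA_BUCKET: union-dataplane-metadata
     FAST_REGISTRATION_BUCKET: union-dataplane-fast-reg
     BACKEND_IAM_ROLE_ARN: arn:aws:iam::123456789012:role/union-system-role
     WORKER_IAM_ROLE_ARN: arn:aws:iam::123456789012:role/union-system-role
   storage:
     bucketName: union-dataplane-metadata
     fastRegistrationBucketName: union-dataplane-fast-reg
     region: us-east-2
   commonServiceAccount:
     annotations:
       eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/union-system-role
   imageBuilder:
     registryName: union-dataplane
   EOF
   ```

   If you use an overlay file, add `-f aws-overrides.yaml` after the generated values file in the install command below.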

4. Install the data plane Helm chart:

   ```bash
   helm upgrade --install union unionai/dataplane \
     -f <GENERATED_VALUES_FILE> \
     --namespace union \
     --create-namespace \
     --force-conflicts
   ```

5. Create an API key for your organization. This is required for v2 workflow executions on the data plane. If you have already created one, rerun the same command to propagate the key to the new cluster:

   ```bash
   uctl create apikey --keyName EAGER_API_KEY --org <YOUR_ORG_NAME>
   ```

6. Once deployed you can check to see if the cluster has been successfully registered to the control plane:

   ```bash
   uctl get cluster
    ----------- ------- --------------- -----------
   | NAME      | ORG   | STATE         | HEALTH    |
    ----------- ------- --------------- -----------
   | <cluster> | <org> | STATE_ENABLED | HEALTHY   |
    ----------- ------- --------------- -----------
   1 rows
   ```

7. Follow the [Quickstart](https://www.union.ai/docs/v2/union/user-guide/quickstart) to run your first workflow and verify your cluster is working correctly.

=== PAGE: https://www.union.ai/docs/v2/union/deployment/selfmanaged/selfmanaged-gcp ===

# Data plane setup on GKE (GCP)

Union.ai's modular architecture allows for great flexibility and control.
You can decide how many clusters to have, their shape, and who has access to what.
All communication is encrypted.  The Union architecture is described on the [Architecture](../architecture/_index) page.

If you already have a GKE cluster, GCS buckets, Artifact Registry repository, and Workload Identity configured, skip directly to **Self-managed deployment > Data plane setup on GKE (GCP) > Deploy the dataplane**.

Otherwise, start with **Self-managed deployment > Data plane setup on GKE (GCP) > Prepare infrastructure** to set up the required GCP resources.

=== PAGE: https://www.union.ai/docs/v2/union/deployment/selfmanaged/selfmanaged-gcp/prepare-infra ===

# Prepare infrastructure

This page walks you through creating the GCP resources needed for a Union data plane. If you already have these resources, skip to [Deploy the dataplane](../selfmanaged-gcp/deploy-dataplane).

## Environment variables

Set these variables before running the commands below. Customize the names if you are deploying multiple data planes in the same GCP project.

```bash
export PROJECT_ID=my-project            # your GCP project ID
export REGION=us-central1               # GCP region for all resources
export CLUSTER_NAME=union-dataplane     # GKE cluster name
export BUCKET_PREFIX=union-dataplane    # prefix for GCS buckets (must be globally unique)
export AR_REPOSITORY=union-dataplane    # Artifact Registry repository name
export GSA_NAME=union-system            # Google Service Account name
```

## GKE Cluster

You need a GKE cluster running one of the most recent three minor Kubernetes versions. See [Cluster Recommendations](../cluster-recommendations) for networking and node pool guidance.

If you don't already have a cluster, create one with `gcloud`:

First, enable the required APIs:

```bash
gcloud services enable container.googleapis.com --project ${PROJECT_ID}
```

> [!NOTE] If the project has no default VPC network, create one before proceeding:
> ```bash
> gcloud compute networks create default --project ${PROJECT_ID} --subnet-mode=auto
> ```

```bash
gcloud container clusters create ${CLUSTER_NAME} \
  --project ${PROJECT_ID} \
  --region ${REGION} \
  --release-channel regular \
  --machine-type e2-standard-4 \
  --num-nodes 1 \
  --workload-pool ${PROJECT_ID}.svc.id.goog
```

> [!NOTE] The `--workload-pool` flag enables [GKE Workload Identity](https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity), which is required for the **Self-managed deployment > Data plane setup on GKE (GCP) > Prepare infrastructure > Workload Identity** setup below.

The following GKE add-ons are required and come pre-installed on GKE clusters:
  - CoreDNS (kube-dns)
  - GKE networking (Dataplane V2 / Calico)
  - Kube-proxy

If you created your cluster through other means, verify that Workload Identity is enabled:

```bash
gcloud container clusters describe ${CLUSTER_NAME} \
  --region ${REGION} \
  --project ${PROJECT_ID} \
  --format="value(workloadIdentityConfig.workloadPool)"
```

Union supports Autoscaling and the use of preemptible (spot) instances.

### BuildKit node pool

Image Builder (BuildKit) requires 4 CPUs and 50Gi ephemeral storage, which can exceed what's allocatable on a standard `e2-standard-4` node when other pods are running. Add a dedicated node pool with a larger machine type and boot disk:

```bash
gcloud container node-pools create buildkit-pool \
  --cluster ${CLUSTER_NAME} \
  --region ${REGION} \
  --project ${PROJECT_ID} \
  --machine-type e2-standard-8 \
  --disk-size 200GB \
  --num-nodes 0 \
  --enable-autoscaling \
  --min-nodes 0 \
  --max-nodes 2
```

## GCS

Each data plane uses GCS buckets to store data used in workflow execution.
Union recommends the use of two buckets:

1. **Metadata bucket**: contains workflow execution data such as task inputs and outputs.
2. **Fast registration bucket**: contains local code artifacts copied into the Flyte task container at runtime when using `flyte deploy` or `flyte run --copy-style all`.

You can also choose to use a single bucket.

Create the buckets:

```bash
gcloud storage buckets create gs://${BUCKET_PREFIX}-metadata \
  --project ${PROJECT_ID} \
  --location ${REGION}

gcloud storage buckets create gs://${BUCKET_PREFIX}-fast-reg \
  --project ${PROJECT_ID} \
  --location ${REGION}
```

### CORS Configuration

To enable the [Code Viewer](../configuration/code-viewer) in the Union UI, configure a CORS policy on your buckets. This allows the UI to securely fetch code bundles directly from GCS.

Save the following as `cors.json`:

```json
[
    {
        "origin": ["https://*.unionai.cloud"],
        "method": ["HEAD", "GET"],
        "responseHeader": ["ETag"],
        "maxAgeSeconds": 3600
    }
]
```

Apply it to both buckets:

```bash
gcloud storage buckets update gs://${BUCKET_PREFIX}-metadata --cors-file=cors.json
gcloud storage buckets update gs://${BUCKET_PREFIX}-fast-reg --cors-file=cors.json
```

### Data Retention

Union recommends using Lifecycle Policy on these buckets to manage storage costs. See [Data retention policy](../configuration/data-retention) for more information.

## Artifact Registry

Create an [Artifact Registry Docker repository](https://cloud.google.com/artifact-registry/docs/docker/store-docker-container-images#create) for Image Builder to push and pull container images:

```bash
gcloud artifacts repositories create ${AR_REPOSITORY} \
  --project ${PROJECT_ID} \
  --location ${REGION} \
  --repository-format docker \
  --description "Union Image Builder repository"
```

Note the repository path (`${REGION}-docker.pkg.dev/${PROJECT_ID}/${AR_REPOSITORY}`) -- you will reference it when configuring Workload Identity permissions below.
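
To confirm the repository was created (and to retrieve its details later), you can describe it:

```bash
gcloud artifacts repositories describe ${AR_REPOSITORY} \
  --project ${PROJECT_ID} \
  --location ${REGION}
```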

## Workload Identity

Union recommends using [GKE Workload Identity](https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity) to securely access GCP resources.

### 1. Create a Google Service Account

```bash
gcloud iam service-accounts create ${GSA_NAME} \
  --project ${PROJECT_ID} \
  --display-name "Union data plane service account"
```

### 2. Bind the GSA to the Kubernetes service account

Bind both the `union-system` and `union` Kubernetes service accounts in the `union` namespace to impersonate the Google Service Account:

```bash
gcloud iam service-accounts add-iam-policy-binding \
  ${GSA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com \
  --project ${PROJECT_ID} \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:${PROJECT_ID}.svc.id.goog[union/union-system]"

gcloud iam service-accounts add-iam-policy-binding \
  ${GSA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com \
  --project ${PROJECT_ID} \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:${PROJECT_ID}.svc.id.goog[union/union]"
```

> [!NOTE] Why bind both `union/union-system` and `union/union`?
> Union platform services run under `union-system`, while task pods in the `union` namespace run under the `union` service account. Both need Workload Identity access to GCS.
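
To confirm both bindings were applied, inspect the IAM policy on the Google Service Account; both `union/union-system` and `union/union` should appear as `roles/iam.workloadIdentityUser` members:

```bash
gcloud iam service-accounts get-iam-policy \
  ${GSA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com \
  --project ${PROJECT_ID} \
  --format json
```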

### 3. Grant GCS access

```bash
gcloud projects add-iam-policy-binding ${PROJECT_ID} \
  --member "serviceAccount:${GSA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com" \
  --role roles/storage.objectAdmin \
  --condition="expression=resource.name.startsWith('projects/_/buckets/${BUCKET_PREFIX}'),title=union-bucket-access"
```

Alternatively, grant the role on each bucket directly:

```bash
gcloud storage buckets add-iam-policy-binding gs://${BUCKET_PREFIX}-metadata \
  --member "serviceAccount:${GSA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com" \
  --role roles/storage.objectAdmin

gcloud storage buckets add-iam-policy-binding gs://${BUCKET_PREFIX}-fast-reg \
  --member "serviceAccount:${GSA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com" \
  --role roles/storage.objectAdmin
```

Also grant `legacyBucketReader` on each bucket. This is required for `storage.buckets.get` access, which the operator needs to verify the bucket exists at startup:

```bash
gcloud storage buckets add-iam-policy-binding gs://${BUCKET_PREFIX}-metadata \
  --member "serviceAccount:${GSA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com" \
  --role roles/storage.legacyBucketReader

gcloud storage buckets add-iam-policy-binding gs://${BUCKET_PREFIX}-fast-reg \
  --member "serviceAccount:${GSA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com" \
  --role roles/storage.legacyBucketReader
```

### 4. Grant Artifact Registry access

```bash
gcloud artifacts repositories add-iam-policy-binding ${AR_REPOSITORY} \
  --project ${PROJECT_ID} \
  --location ${REGION} \
  --member "serviceAccount:${GSA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com" \
  --role roles/artifactregistry.writer
```

### 5. Grant token creator access

This role includes `iam.serviceAccounts.signBlob`, which is required for Image Builder authentication:

```bash
gcloud projects add-iam-policy-binding ${PROJECT_ID} \
  --member "serviceAccount:${GSA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com" \
  --role roles/iam.serviceAccountTokenCreator
```

> [!NOTE] If prompted to specify a condition, select **None**. This role applies project-wide and does not require a condition. The prompt appears because the policy already contains other conditional bindings.

Once your infrastructure is ready, proceed to [Deploy the dataplane](../selfmanaged-gcp/deploy-dataplane).

=== PAGE: https://www.union.ai/docs/v2/union/deployment/selfmanaged/selfmanaged-gcp/deploy-dataplane ===

# Deploy the dataplane

If you have not yet set up the required GCP resources (GKE cluster, GCS, Artifact Registry, Workload Identity), see [Prepare infrastructure](../selfmanaged-gcp/prepare-infra) first.

## Assumptions

* You have a Union.ai organization, and you know the control plane URL for your organization (e.g. `https://your-org-name.us-east-2.unionai.cloud`).
* You have a cluster name provided by or coordinated with Union.
* You have a GKE cluster with Workload Identity enabled, running one of the most recent three minor Kubernetes versions.
  [Learn more](https://kubernetes.io/releases/version-skew-policy/)
* You have configured GCS bucket(s), Artifact Registry, and Workload Identity as described in [Prepare infrastructure](../selfmanaged-gcp/prepare-infra).

## Prerequisites

* Install [Helm 3](https://helm.sh/docs/intro/install/).
* Install [uctl](https://www.union.ai/docs/v2/union/deployment/api-reference/uctl-cli/_index).

## Deploy the Union.ai operator

1. Add the Union.ai Helm repo:

   ```bash
   helm repo add unionai https://unionai.github.io/helm-charts/
   helm repo update
   ```

2. Use the `uctl selfserve provision-dataplane-resources` command to generate a new client and client secret for communicating with your Union control plane, provision authorization permissions for the app to operate on the Union cluster name you have selected, generate a values file for installing the data plane in your Kubernetes cluster, and print follow-up instructions:

   ```bash
   uctl config init --host=<YOUR_UNION_CONTROL_PLANE_URL>
   uctl selfserve provision-dataplane-resources --clusterName <YOUR_SELECTED_CLUSTERNAME>  --provider gcp
   ```

   * The command will output the ID, name, and a secret that will be used by the Union services to communicate with your control plane.
     It will also generate a YAML file specific to the provider that you specify, in this case `gcp`.

   * Save the secret that is displayed. Union does not store the credentials; you can rerun the same command to retrieve the secret later.

3. Update the generated values file with your infrastructure details:

   Using the [environment variables](../selfmanaged-gcp/prepare-infra#environment-variables) from the prepare infrastructure step:

   - Set `global.METADATA_BUCKET` to `${BUCKET_PREFIX}-metadata`.
   - Set `global.FAST_REGISTRATION_BUCKET` to `${BUCKET_PREFIX}-fast-reg`.
   - Set `global.BACKEND_IAM_ROLE_ARN` to `${GSA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com`.
   - Set `global.WORKER_IAM_ROLE_ARN` to the same value (or a separate GSA if you use distinct worker permissions).
   - Set `storage.bucketName` to `${BUCKET_PREFIX}-metadata`.
   - Set `storage.fastRegistrationBucketName` to `${BUCKET_PREFIX}-fast-reg`.
   - Set `storage.region` to `${REGION}`.
   - Set `storage.gcp.projectId` to `${PROJECT_ID}`.
   - Set `commonServiceAccount.annotations."iam.gke.io/gcp-service-account"` to `${GSA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com`.
   - Set `imageBuilder.registryName` to `${AR_REPOSITORY}` (defaults to `union-dataplane`; the chart auto-generates the full Artifact Registry URL from the project ID and region).

4. Install the data plane Helm chart:

   ```bash
   helm upgrade --install union unionai/dataplane \
     -f <GENERATED_VALUES_FILE> \
     --namespace union \
     --create-namespace \
     --force-conflicts
   ```

5. Create an API key for your organization. This is required for v2 workflow executions on the data plane. If you have already created one, rerun the same command to propagate the key to the new cluster:

   ```bash
   uctl create apikey --keyName EAGER_API_KEY --org <YOUR_ORG_NAME>
   ```

6. Once deployed you can check to see if the cluster has been successfully registered to the control plane:

   ```bash
   uctl get cluster
    ----------- ------- --------------- -----------
   | NAME      | ORG   | STATE         | HEALTH    |
    ----------- ------- --------------- -----------
   | <cluster> | <org> | STATE_ENABLED | HEALTHY   |
    ----------- ------- --------------- -----------
   1 rows
   ```

7. Follow the [Quickstart](https://www.union.ai/docs/v2/union/user-guide/quickstart) to run your first workflow and verify your cluster is working correctly.

=== PAGE: https://www.union.ai/docs/v2/union/deployment/selfmanaged/selfmanaged-azure ===

# Data plane setup on Azure

Union.ai's modular architecture allows for great flexibility and control.
You can decide how many clusters to have, their shape, and who has access to what.
All communication is encrypted.  The Union architecture is described on the [Architecture](../architecture/_index) page.

If you already have an AKS cluster, Storage Account with Data Lake Gen2, Managed Identities, and Workload Identity configured, skip directly to **Self-managed deployment > Data plane setup on Azure > Deploy the dataplane**.

Otherwise, start with **Self-managed deployment > Data plane setup on Azure > Prepare infrastructure** to set up the required Azure resources.

=== PAGE: https://www.union.ai/docs/v2/union/deployment/selfmanaged/selfmanaged-azure/prepare-infra ===

# Prepare infrastructure

This page walks you through the Azure infrastructure required before deploying the Union dataplane on AKS. If you already have these resources, skip to [Deploy the dataplane](../selfmanaged-azure/deploy-dataplane).

> [!NOTE] **Deployment model**: This guide covers the **Self-managed** model, in which you run only the dataplane chart and Union hosts the control plane.

## Prerequisites
- Azure CLI [installed](https://learn.microsoft.com/en-us/cli/azure/install-azure-cli?view=azure-cli-latest) and [configured](https://learn.microsoft.com/en-us/cli/azure/get-started-with-azure-cli?view=azure-cli-latest#sign-in-to-azure)

## Environment variables

Set these once at the top of your terminal session. All commands below reference them. Customize the names if you are deploying multiple data planes in the same subscription.

```bash
# --- Your environment ---
export SUBSCRIPTION_ID=$(az account show --query id --output tsv)
export TENANT_ID=$(az account show --query tenantId --output tsv)
export RESOURCE_GROUP=union-rg
export LOCATION=eastus2
export CLUSTER_NAME=union-dataplane
export ORG_NAME=<your-union-org-name>       # provided by Union

# --- Storage ---
export STORAGE_ACCOUNT=uniondataplane       # 3-24 lowercase alphanumeric, globally unique
export METADATA_CONTAINER=union-metadata

# --- Identities ---
export BACKEND_IDENTITY_NAME=union-backend
export WORKER_IDENTITY_NAME=union-executions

# --- AKS namespace (do not change) ---
export DATAPLANE_NAMESPACE=union
```

## 1. Subscription and Resource Group

All Union infrastructure lives in a dedicated resource group for access control and cost tracking.

```bash
az account set --subscription $SUBSCRIPTION_ID

az group create \
  --name $RESOURCE_GROUP \
  --location $LOCATION
```

## 2. AKS Cluster

You need an AKS cluster running one of the most recent three minor Kubernetes versions. See [Cluster Recommendations](../cluster-recommendations) for networking and node pool guidance.

Three specific add-ons are required:

| Add-on | Why |
|--------|-----|
| `--enable-oidc-issuer` | Enables the OIDC token issuer AKS needs for Workload Identity |
| `--enable-workload-identity` | Allows pods to assume Azure Managed Identities without credentials |
| `--enable-managed-identity` | The AKS control plane uses a managed identity rather than a service principal |

```bash
az aks create \
  --resource-group $RESOURCE_GROUP \
  --name $CLUSTER_NAME \
  --location $LOCATION \
  --enable-oidc-issuer \
  --enable-workload-identity \
  --enable-managed-identity \
  --node-count 2 \
  --node-vm-size Standard_D4s_v3
```

Save the OIDC issuer URL — you will need it when creating federated credentials:

```bash
export AKS_OIDC_ISSUER=$(az aks show \
  --resource-group $RESOURCE_GROUP \
  --name $CLUSTER_NAME \
  --query "oidcIssuerProfile.issuerUrl" \
  --output tsv)
```

Get cluster credentials:

```bash
az aks get-credentials \
  --resource-group $RESOURCE_GROUP \
  --name $CLUSTER_NAME
```

## 3. Node Pools

Union workloads run on dedicated node pools. Separating system, worker, and GPU nodes allows independent scaling and keeps system pods stable.

### System node pool

The system pool was created with the cluster above. Recommended minimum: `Standard_D4s_v3` (4 vCPU / 16 GB) x 2 nodes.

### CPU worker node pool

```bash
az aks nodepool add \
  --resource-group $RESOURCE_GROUP \
  --cluster-name $CLUSTER_NAME \
  --name workers \
  --node-count 2 \
  --node-vm-size Standard_B8as_v2 \
  --labels union.ai/node-role=worker
```

### GPU node pool (optional)

```bash
az aks nodepool add \
  --resource-group $RESOURCE_GROUP \
  --cluster-name $CLUSTER_NAME \
  --name gpuworkers \
  --node-count 1 \
  --node-vm-size Standard_NC6s_v3 \
  --node-taints sku=gpu:NoSchedule \
  --labels union.ai/node-role=worker
```

> [!NOTE] **Spot VMs**: Union supports interruptible workloads on Azure Spot. Spot nodes are identified by the label `kubernetes.azure.com/scalesetpriority: spot`, which AKS sets automatically when `--priority Spot` is used.

## 4. Storage Account and Container

Union stores workflow metadata and code bundle artifacts in Azure Blob Storage. The storage account **must have Data Lake Storage Gen2 enabled** (`--enable-hierarchical-namespace`) — Union uses the `abfs://` protocol which requires this.

```bash
az storage account create \
  --name $STORAGE_ACCOUNT \
  --resource-group $RESOURCE_GROUP \
  --location $LOCATION \
  --sku Standard_LRS \
  --kind StorageV2 \
  --enable-hierarchical-namespace true \
  --allow-blob-public-access false

az storage container create \
  --name $METADATA_CONTAINER \
  --account-name $STORAGE_ACCOUNT
```

### CORS Configuration

To enable the [Code Viewer](../configuration/code-viewer) in the Union UI, configure a CORS rule on your Storage Account:

```bash
az storage cors add \
  --services b \
  --methods GET HEAD \
  --origins "https://*.unionai.cloud" "https://*.union.ai" \
  --allowed-headers "*" \
  --exposed-headers "ETag" \
  --max-age 3600 \
  --account-name $STORAGE_ACCOUNT
```

### Data Retention

Union recommends using lifecycle management policies on your Storage Account to manage storage costs. See [Data retention policy](../configuration/data-retention) for more information.

## 5. Managed Identities

Union separates infrastructure-level access from workload-level access using two identities:

| Identity | Used by | Needs access to |
|----------|---------|-----------------|
| `union-backend` | Operator, propeller, clusterresourcesync | Storage account, Key Vault |
| `union-executions` | Task execution pods (user workloads) | Storage account + any customer Azure services |

```bash
# Backend identity (for Union system components)
az identity create \
  --name $BACKEND_IDENTITY_NAME \
  --resource-group $RESOURCE_GROUP

# Worker identity (for task pods)
az identity create \
  --name $WORKER_IDENTITY_NAME \
  --resource-group $RESOURCE_GROUP

# Save client IDs and principal IDs
export BACKEND_CLIENT_ID=$(az identity show \
  --name $BACKEND_IDENTITY_NAME \
  --resource-group $RESOURCE_GROUP \
  --query clientId --output tsv)

export BACKEND_PRINCIPAL_ID=$(az identity show \
  --name $BACKEND_IDENTITY_NAME \
  --resource-group $RESOURCE_GROUP \
  --query principalId --output tsv)

export WORKER_CLIENT_ID=$(az identity show \
  --name $WORKER_IDENTITY_NAME \
  --resource-group $RESOURCE_GROUP \
  --query clientId --output tsv)

export WORKER_PRINCIPAL_ID=$(az identity show \
  --name $WORKER_IDENTITY_NAME \
  --resource-group $RESOURCE_GROUP \
  --query principalId --output tsv)
```

## 6. Workload Identity and Federated Credentials

Azure Workload Identity lets Kubernetes pods authenticate to Azure services using a projected service account token — no credentials stored in secrets.

### Backend identity (Union system components)

```bash
az identity federated-credential create \
  --name "union-backend-federated" \
  --identity-name $BACKEND_IDENTITY_NAME \
  --resource-group $RESOURCE_GROUP \
  --issuer $AKS_OIDC_ISSUER \
  --subject "system:serviceaccount:${DATAPLANE_NAMESPACE}:union-system" \
  --audiences api://AzureADTokenExchange
```

### Worker identity (task execution pods)

Task pods run under the `union` service account. In single-namespace mode (the default with `low_privilege: true`), create one federated credential for the release namespace:

```bash
az identity federated-credential create \
  --name "union-worker-single-ns" \
  --identity-name $WORKER_IDENTITY_NAME \
  --resource-group $RESOURCE_GROUP \
  --issuer $AKS_OIDC_ISSUER \
  --subject "system:serviceaccount:${DATAPLANE_NAMESPACE}:union" \
  --audiences api://AzureADTokenExchange
```

If using multi-namespace mode (`low_privilege: false`), also create credentials for each project-domain namespace:

```bash
for ns in development staging production; do
  az identity federated-credential create \
    --name "union-worker-${ns}" \
    --identity-name $WORKER_IDENTITY_NAME \
    --resource-group $RESOURCE_GROUP \
    --issuer $AKS_OIDC_ISSUER \
    --subject "system:serviceaccount:${ns}:default" \
    --audiences api://AzureADTokenExchange
done
```

## 7. Role Assignments

The managed identities need explicit RBAC permissions on the storage account.

First, obtain the Storage Account ID:

```bash
STORAGE_ACCOUNT_ID=$(az storage account show \
    --name $STORAGE_ACCOUNT \
    --resource-group $RESOURCE_GROUP \
    --query id -o tsv)
```

```bash
# Backend identity: read/write workflow metadata
az role assignment create \
  --assignee-object-id $BACKEND_PRINCIPAL_ID \
  --assignee-principal-type ServicePrincipal \
  --role "Storage Blob Data Contributor" \
  --scope $STORAGE_ACCOUNT_ID

# Worker identity: read/write artifacts
az role assignment create \
  --assignee-object-id $WORKER_PRINCIPAL_ID \
  --assignee-principal-type ServicePrincipal \
  --role "Storage Blob Data Contributor" \
  --scope $STORAGE_ACCOUNT_ID
```

## 8. Azure Key Vault (optional)

Union provides an embedded secrets management backend. If your organization needs to integrate with Azure Key Vault, create a vault and grant the backend identity access:

```bash
export KEY_VAULT_NAME=union-${ORG_NAME}

az keyvault create \
  --name $KEY_VAULT_NAME \
  --resource-group $RESOURCE_GROUP \
  --location $LOCATION \
  --enable-rbac-authorization true

KEY_VAULT_RESOURCE_ID=$(az keyvault show \
  --name $KEY_VAULT_NAME \
  --query id --output tsv)

az role assignment create \
  --assignee-object-id $BACKEND_PRINCIPAL_ID \
  --assignee-principal-type ServicePrincipal \
  --role "Key Vault Secrets Officer" \
  --scope $KEY_VAULT_RESOURCE_ID
```

The Key Vault URI (`https://${KEY_VAULT_NAME}.vault.azure.net/`) maps to `AZURE_KEY_VAULT_URI` in the chart values.

Once your infrastructure is ready, proceed to [Deploy the dataplane](../selfmanaged-azure/deploy-dataplane).

=== PAGE: https://www.union.ai/docs/v2/union/deployment/selfmanaged/selfmanaged-azure/deploy-dataplane ===

# Deploy the dataplane

If you have not yet set up the required Azure resources (AKS cluster, Storage Account, Managed Identities, Workload Identity), see [Prepare infrastructure](../selfmanaged-azure/prepare-infra) first.

## Assumptions

* You have a Union.ai organization, and you know the control plane URL for your organization.
* You have a cluster name provided by or coordinated with Union.
* You have an AKS cluster with OIDC issuer and Workload Identity enabled, running one of the most recent three minor Kubernetes versions.
  [Learn more](https://kubernetes.io/releases/version-skew-policy/).
* You have configured a Storage Account, Managed Identities, and Workload Identity as described in [Prepare infrastructure](../selfmanaged-azure/prepare-infra).

## Prerequisites

* Install [Helm 3](https://helm.sh/docs/intro/install/).
* Install [uctl](https://www.union.ai/docs/v2/union/deployment/api-reference/uctl-cli/_index).

## Deploy the Union.ai operator

1. Add the Union.ai Helm repo:

   ```bash
   helm repo add unionai https://unionai.github.io/helm-charts/
   helm repo update
   ```

2. Use the `uctl selfserve provision-dataplane-resources` command to generate a new client and client secret for communicating with your Union control plane, provision authorization permissions for the app to operate on the Union cluster name you have selected, generate a values file for installing the data plane in your Kubernetes cluster, and provide follow-up instructions:

   ```bash
   uctl config init --host=<YOUR_UNION_CONTROL_PLANE_URL>
   uctl selfserve provision-dataplane-resources --clusterName <YOUR_SELECTED_CLUSTERNAME>  --provider azure
   ```

   * The command will output the ID, name, and a secret that will be used by the Union services to communicate with your control plane.
     It will also generate a YAML file `<org>-values.yaml` specific to the provider that you specify, in this case `azure`.

   * Save the secret that is displayed. Union does not store the credentials; you can rerun the same command later to retrieve the secret.

3. Update the generated values file with your infrastructure details:

   Using the [environment variables](../selfmanaged-azure/prepare-infra#environment-variables) from the prepare infrastructure step:

   - Set `global.BACKEND_IAM_ROLE_ARN` to `${BACKEND_CLIENT_ID}` (the backend managed identity client ID).
   - Set `global.WORKER_IAM_ROLE_ARN` to `${WORKER_CLIENT_ID}` (the worker managed identity client ID).
   - Set `global.METADATA_BUCKET` to `${METADATA_CONTAINER}`.
   - Set `storage.custom.stow.config.account` to `${STORAGE_ACCOUNT}`.
   - Set `storage.region` to `${LOCATION}`.
   - Set `commonServiceAccount.annotations."azure.workload.identity/client-id"` to `${BACKEND_CLIENT_ID}`.

   If using Azure Key Vault (optional):
   - Set `AZURE_KEY_VAULT_URI` to `https://${KEY_VAULT_NAME}.vault.azure.net/`.
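
   For reference, here is a minimal sketch of how these keys might look once filled in, using the example names from the prepare infrastructure step (`uniondataplane` storage account, `union-metadata` container, `eastus2` region). The client IDs are the GUIDs of your managed identities, and the exact placement of `AZURE_KEY_VAULT_URI` may differ in the generated file:

   ```yaml
   global:
     BACKEND_IAM_ROLE_ARN: "<BACKEND_CLIENT_ID>"   # client ID (GUID) of union-backend
     WORKER_IAM_ROLE_ARN: "<WORKER_CLIENT_ID>"     # client ID (GUID) of union-executions
     METADATA_BUCKET: "union-metadata"

   storage:
     region: "eastus2"
     custom:
       stow:
         config:
           account: "uniondataplane"

   commonServiceAccount:
     annotations:
       azure.workload.identity/client-id: "<BACKEND_CLIENT_ID>"

   # Optional, only if you created an Azure Key Vault:
   # AZURE_KEY_VAULT_URI: "https://<KEY_VAULT_NAME>.vault.azure.net/"
   ```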

4. Install the data plane Helm chart:

   ```bash
   helm upgrade --install union unionai/dataplane \
     -f <GENERATED_VALUES_FILE> \
     --namespace union \
     --create-namespace \
     --force-conflicts
   ```

5. Create an API key for your organization. This is required for v2 workflow executions on the data plane. If you have already created one, rerun the same command to propagate the key to the new cluster:

   ```bash
   uctl create apikey --keyName EAGER_API_KEY --org <YOUR_ORG_NAME>
   ```

6. Once deployed, you can check whether the cluster has been successfully registered with the control plane:

   ```bash
   uctl get cluster
    ----------- ------- --------------- -----------
   | NAME      | ORG   | STATE         | HEALTH    |
    ----------- ------- --------------- -----------
   | <cluster> | <org> | STATE_ENABLED | HEALTHY   |
    ----------- ------- --------------- -----------
   1 rows
   ```

7. Follow the [Quickstart](https://www.union.ai/docs/v2/union/user-guide/quickstart) to run your first workflow and verify your cluster is working correctly.

=== PAGE: https://www.union.ai/docs/v2/union/deployment/selfmanaged/selfmanaged-oci ===

# Data plane setup on OCI

Union.ai's modular architecture allows for great flexibility and control.
You can decide how many clusters to have, their shape, and who has access to what.
All communication is encrypted.  The Union architecture is described on the [Architecture](../architecture/_index) page.

If you already have an OKE cluster, Object Storage buckets, Container Registry, and IAM access configured, skip directly to **Self-managed deployment > Data plane setup on OCI > Deploy the dataplane**.

Otherwise, start with **Self-managed deployment > Data plane setup on OCI > Prepare infrastructure** to set up the required OCI resources.

=== PAGE: https://www.union.ai/docs/v2/union/deployment/selfmanaged/selfmanaged-oci/prepare-infra ===

# Prepare infrastructure

This page walks you through creating the OCI resources needed for a Union data plane. If you already have these resources, skip to [Deploy the dataplane](../selfmanaged-oci/deploy-dataplane).

## OKE Cluster

You need an OKE cluster running one of the most recent three minor Kubernetes versions. See [Cluster Recommendations](../cluster-recommendations) for networking and node pool guidance.

If you don't already have a cluster, create one via the [OCI Console](https://docs.oracle.com/en-us/iaas/Content/ContEng/Tasks/contengcreatingclusterusingoke.htm) or the OCI CLI:

```bash
export COMPARTMENT_ID=<YOUR_COMPARTMENT_OCID>
export REGION=<YOUR_OCI_REGION>              # e.g. us-ashburn-1
export VCN_ID=<YOUR_VCN_OCID>
export SUBNET_ID=<YOUR_KUBERNETES_API_SUBNET_OCID>

oci ce cluster create \
  --compartment-id ${COMPARTMENT_ID} \
  --name union-dataplane \
  --kubernetes-version v1.31.1 \
  --vcn-id ${VCN_ID} \
  --endpoint-subnet-id ${SUBNET_ID} \
  --region ${REGION}
```

> [!NOTE] The OKE cluster creation requires a pre-existing VCN and subnet. See the [OCI networking documentation](https://docs.oracle.com/en-us/iaas/Content/ContEng/Concepts/contengnetworkconfigexample.htm) for details on setting up the required network resources.

Union supports autoscaling and the use of preemptible instances.

## Object Storage

Each data plane uses OCI Object Storage buckets to store data used in workflow execution.
Union recommends the use of two buckets:

1. **Metadata bucket**: contains workflow execution data such as task inputs and outputs.
2. **Fast registration bucket**: contains local code artifacts copied into the Flyte task container at runtime when using `flyte deploy` or `flyte run --copy-style all`.

You can also choose to use a single bucket.

Create the buckets:

```bash
export BUCKET_PREFIX=union-dataplane   # choose a unique prefix within your tenancy

oci os bucket create \
  --compartment-id ${COMPARTMENT_ID} \
  --name ${BUCKET_PREFIX}-metadata \
  --region ${REGION}

oci os bucket create \
  --compartment-id ${COMPARTMENT_ID} \
  --name ${BUCKET_PREFIX}-fast-reg \
  --region ${REGION}
```

### CORS Configuration

To enable the [Code Viewer](../configuration/code-viewer) in the Union UI, configure a CORS policy on your bucket(s). This allows the UI to securely fetch code bundles directly from storage.

OCI Object Storage CORS is configured via bucket settings. See the [OCI CORS documentation](https://docs.oracle.com/en-us/iaas/Content/Object/Tasks/managingbuckets_topic-CORS.htm) for details. Apply the following rule:

- **Allowed Origins:** `https://*.unionai.cloud`
- **Allowed Methods:** `GET`, `HEAD`
- **Allowed Headers:** `*`
- **Expose Headers:** `ETag`
- **Max Age Seconds:** `3600`

### Data Retention

Union recommends using lifecycle policies on these buckets to manage storage costs. See [Data retention policy](../configuration/data-retention) for more information.

## Container Registry

Create an [OCI Container Registry (OCIR)](https://docs.oracle.com/en-us/iaas/Content/Registry/Concepts/registryoverview.htm) repository for Image Builder to push and pull container images:

```bash
oci artifacts container-repository create \
  --compartment-id ${COMPARTMENT_ID} \
  --display-name union-dataplane/imagebuilder \
  --is-public false
```

Note the repository path (e.g. `${REGION}.ocir.io/<TENANCY_NAMESPACE>/union-dataplane/imagebuilder`) — you will reference it when configuring access below.

## Identity & Access

Union services and workflow task pods need access to your Object Storage buckets and Container Registry. OCI supports two authentication models:

### Option A: Instance Principals (recommended)

Use [Instance Principals](https://docs.oracle.com/en-us/iaas/Content/Identity/Tasks/callingservicesfrominstances.htm) so that pods running on OKE nodes inherit permissions automatically.

#### 1. Create a Dynamic Group

Create a [Dynamic Group](https://docs.oracle.com/en-us/iaas/Content/Identity/Tasks/managingdynamicgroups.htm) matching your OKE worker nodes:

```bash
oci iam dynamic-group create \
  --compartment-id ${COMPARTMENT_ID} \
  --name union-dataplane-nodes \
  --description "OKE worker nodes for Union data plane" \
  --matching-rule "ALL {instance.compartment.id = '${COMPARTMENT_ID}'}"
```

#### 2. Create IAM policies

Grant the dynamic group access to Object Storage and OCIR:

```bash
oci iam policy create \
  --compartment-id ${COMPARTMENT_ID} \
  --name union-dataplane-policy \
  --description "Allow Union data plane access to Object Storage and OCIR" \
  --statements \
  '["Allow dynamic-group union-dataplane-nodes to manage objects in compartment id '"${COMPARTMENT_ID}"' where target.bucket.name='"'"''"${BUCKET_PREFIX}"'-metadata'"'"'",
    "Allow dynamic-group union-dataplane-nodes to manage objects in compartment id '"${COMPARTMENT_ID}"' where target.bucket.name='"'"''"${BUCKET_PREFIX}"'-fast-reg'"'"'",
    "Allow dynamic-group union-dataplane-nodes to manage repos in compartment id '"${COMPARTMENT_ID}"'"]'
```

### Option B: Static Credentials

If Instance Principals are not available, you can use S3-compatible access keys:

#### 1. Generate a Customer Secret Key

Create a [Customer Secret Key](https://docs.oracle.com/en-us/iaas/Content/Identity/Tasks/managingcredentials.htm#s3) for S3 Compatibility API access:

```bash
export USER_OCID=<YOUR_USER_OCID>

oci iam customer-secret-key create \
  --user-id ${USER_OCID} \
  --display-name union-dataplane-s3-compat
```

> [!NOTE] The command output contains the secret key value. Save it immediately — it cannot be retrieved again.

You will configure these credentials in the generated values file during deployment (see step 3 in [Deploy the dataplane](../selfmanaged-oci/deploy-dataplane)).

=== PAGE: https://www.union.ai/docs/v2/union/deployment/selfmanaged/selfmanaged-oci/deploy-dataplane ===

# Deploy the dataplane

If you have not yet set up the required OCI resources (OKE cluster, Object Storage, Container Registry, IAM access), see [Prepare infrastructure](../selfmanaged-oci/prepare-infra) first.

## Assumptions

* You have a Union.ai organization, and you know the control plane URL for your organization.
* You have a cluster name provided by or coordinated with Union.
* You have an OKE cluster running one of the most recent three minor Kubernetes versions.
  [Learn more](https://kubernetes.io/releases/version-skew-policy/)
* You have configured Object Storage bucket(s), Container Registry, and IAM access as described in [Prepare infrastructure](../selfmanaged-oci/prepare-infra).

## Prerequisites

* Install [Helm 3](https://helm.sh/docs/intro/install/).
* Install [uctl](https://www.union.ai/docs/v2/union/deployment/api-reference/uctl-cli/_index).

## Deploy the Union.ai operator

1. Add the Union.ai Helm repo:

   ```bash
   helm repo add unionai https://unionai.github.io/helm-charts/
   helm repo update
   ```

2. Use the `uctl selfserve provision-dataplane-resources` command to generate a new client and client secret for communicating with your Union control plane, provision authorization permissions for the app to operate on the Union cluster name you have selected, generate a values file for installing the data plane in your Kubernetes cluster, and provide follow-up instructions:

   ```bash
   uctl config init --host=<YOUR_UNION_CONTROL_PLANE_URL>
   uctl selfserve provision-dataplane-resources --clusterName <YOUR_SELECTED_CLUSTERNAME>  --provider oci
   ```

   * The command will output the ID, name, and a secret that will be used by the Union services to communicate with your control plane.
     It will also generate a YAML file specific to the provider that you specify, in this case `oci`.

   * Save the secret that is displayed. Union does not store the credentials; you can rerun the same command later to retrieve the secret.

3. Update the generated values file with your infrastructure details:

   - Set `storage.bucketName` and `storage.fastRegistrationBucketName` to your Object Storage bucket name(s).
   - Set `storage.region` to your OCI region.
   - If using static credentials (Option B), set `storage.accessKey` and `storage.secretKey` to your S3 Compatibility API credentials.
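
   For reference, here is a minimal sketch of the storage settings, assuming the `union-dataplane` bucket prefix and `us-ashburn-1` region from the prepare infrastructure step; the access key lines apply only when using Option B:

   ```yaml
   storage:
     bucketName: "union-dataplane-metadata"
     fastRegistrationBucketName: "union-dataplane-fast-reg"
     region: "us-ashburn-1"
     # Only when using static credentials (Option B):
     accessKey: "<S3_COMPAT_ACCESS_KEY_ID>"
     secretKey: "<S3_COMPAT_SECRET_KEY>"
   ```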

4. Install the data plane Helm chart:

   ```bash
   helm upgrade --install union unionai/dataplane \
     -f <GENERATED_VALUES_FILE> \
     --namespace union \
     --create-namespace \
     --force-conflicts
   ```

5. Create an API key for your organization. This is required for v2 workflow executions on the data plane. If you have already created one, rerun the same command to propagate the key to the new cluster:

   ```bash
   uctl create apikey --keyName EAGER_API_KEY --org <YOUR_ORG_NAME>
   ```

6. Once deployed, you can check whether the cluster has been successfully registered with the control plane:

   ```bash
   uctl get cluster
    ----------- ------- --------------- -----------
   | NAME      | ORG   | STATE         | HEALTH    |
    ----------- ------- --------------- -----------
   | <cluster> | <org> | STATE_ENABLED | HEALTHY   |
    ----------- ------- --------------- -----------
   1 rows
   ```

7. Follow the [Quickstart](https://www.union.ai/docs/v2/union/user-guide/quickstart) to run your first workflow and verify your cluster is working correctly.

=== PAGE: https://www.union.ai/docs/v2/union/deployment/selfmanaged/selfmanaged-coreweave ===

# Data plane setup on CoreWeave

Union.ai's modular architecture allows for great flexibility and control.
You can decide how many clusters to have, their shape, and who has access to what.
All communication is encrypted.  The Union architecture is described on the [Architecture](../architecture/_index) page.

If you already have a CoreWeave Kubernetes Service (CKS) cluster and CoreWeave AI Object Storage (bucket, access keys, access policy) configured, skip directly to **Self-managed deployment > Data plane setup on CoreWeave > Deploy the dataplane**.

Otherwise, start with **Self-managed deployment > Data plane setup on CoreWeave > Prepare infrastructure** to set up the required CoreWeave resources.

=== PAGE: https://www.union.ai/docs/v2/union/deployment/selfmanaged/selfmanaged-coreweave/prepare-infra ===

# Prepare infrastructure

This page walks you through creating the resources needed for a Union data plane on CoreWeave Kubernetes Service (CKS) with [CoreWeave AI Object Storage](https://docs.coreweave.com/products/storage/object-storage). If you already have these resources, skip to [Deploy the dataplane](../selfmanaged-coreweave/deploy-dataplane).

## CKS cluster

You need a [CKS cluster](https://docs.coreweave.com/products/cks/clusters/introduction) running one of the most recent three minor Kubernetes versions, with `kubectl` access configured. See [Cluster Recommendations](../cluster-recommendations) for networking and node pool guidance.

For instructions on creating a cluster, see [Create a CKS cluster](https://docs.coreweave.com/products/cks/clusters/create).

## CoreWeave AI Object Storage

Union uses S3-compatible object storage to store workflow data and artifacts. CoreWeave AI Object Storage is supported, but requires virtual-hosted style S3 URLs (the default Union configuration uses path-style URLs, which CoreWeave doesn't support — you'll set the relevant overrides during deployment).

### Create a bucket

Create a bucket in the [CoreWeave Cloud Console](https://console.coreweave.com/). Navigate to **Storage > Object Storage** and create a bucket in your desired Availability Zone.

For detailed instructions, see [Create a bucket](https://docs.coreweave.com/products/storage/object-storage/buckets/create-bucket).

### Generate access credentials

Navigate to **Administration > Object Storage Access Keys** in the Cloud Console and create an access key pair. Record the **Access Key ID** and **Secret Key** for use during deployment.

You need this key pair so the Helm values and workload environment variables can authenticate to your bucket.

For detailed instructions, see [Create access keys](https://docs.coreweave.com/products/storage/object-storage/auth-access/manage-access-keys/create-keys).

### Create an access policy

Create an organization access policy that grants your access key permissions on the bucket. Navigate to **Administration > Policies > Object Storage Access** in the Cloud Console and create a policy with the following JSON. Replace `<BUCKET_NAME>` with the name of your bucket.

```json
{
  "policy": {
    "version": "v1alpha1",
    "name": "union-ai-bucket-access",
    "statements": [
      {
        "name": "allow-union-s3-access",
        "effect": "Allow",
        "actions": ["s3:*"],
        "resources": [
          "<BUCKET_NAME>",
          "<BUCKET_NAME>/*"
        ],
        "principals": ["*"]
      }
    ]
  }
}
```

For detailed instructions on creating and managing policies, see [Organization access policies](https://docs.coreweave.com/products/storage/object-storage/auth-access/organization-policies/about).

> [!WARNING]
> Without an access policy, API operations return `403 Forbidden` errors even with valid access keys.

=== PAGE: https://www.union.ai/docs/v2/union/deployment/selfmanaged/selfmanaged-coreweave/deploy-dataplane ===

# Deploy the dataplane

If you have not yet set up the required CoreWeave resources (CKS cluster, AI Object Storage bucket, access keys, access policy), see [Prepare infrastructure](../selfmanaged-coreweave/prepare-infra) first.

## Assumptions

* You have a Union.ai organization, and you know the control plane URL for your organization.
* You have a cluster name provided by or coordinated with Union.
* You have a CKS cluster running one of the most recent three minor Kubernetes versions. [Learn more](https://kubernetes.io/releases/version-skew-policy/)
* You have a CoreWeave AI Object Storage bucket, access keys, and access policy as described in [Prepare infrastructure](../selfmanaged-coreweave/prepare-infra).

## Prerequisites

* Install [Helm 3](https://helm.sh/docs/intro/install/).
* Install [uctl](https://www.union.ai/docs/v2/union/deployment/api-reference/uctl-cli/_index).
* Install the [`flyte` CLI](https://www.union.ai/docs/v2/union/api-reference/flyte-cli) (used later to run a sample workflow).

## Deploy the Union.ai operator

1. Set your `KUBECONFIG` to the CKS cluster where you want to deploy the data plane:

   ```bash
   export KUBECONFIG=<PATH_TO_KUBECONFIG>
   ```

2. Configure the Union CLI and provision data plane resources:

   ```bash
   uctl config init --host=<ORG_NAME>.union.ai
   uctl selfserve provision-dataplane-resources --clusterName <CLUSTER_NAME> --provider metal
   ```

   * The command will output the ID, name, and a secret that will be used by the Union services to communicate with your control plane.
     It will also generate a YAML values file specific to the `metal` provider.

   * Save the secret that is displayed. Union does not store the credentials; you can rerun the same command later to retrieve the secret.

3. Update the generated values file with your CoreWeave-specific storage configuration. AI Object Storage requires virtual-hosted style S3 URLs, so you must override the default storage configuration. Replace the placeholders with your actual credentials and settings.

   ```yaml
   host: <ORG_NAME>.union.ai
   clusterName: <CLUSTER_NAME>
   orgName: <ORG_NAME>
   provider: metal

   storage:
     provider: custom
     bucketName: <BUCKET_NAME>
     fastRegistrationBucketName: <BUCKET_NAME>
     custom:
       type: stow
       container: <BUCKET_NAME>
       stow:
         kind: s3
         config:
           region: <AVAILABILITY_ZONE>
           auth_type: accesskey
           access_key_id: <ACCESS_KEY_ID>
           secret_key: <SECRET_ACCESS_KEY>
           endpoint: https://cwobject.com
           disable_ssl: false
           disable_force_path_style: true

   executor:
     extraEnvVars:
       - name: FLYTE_AWS_S3_ADDRESSING_STYLE
         value: "virtual"
       - name: FLYTE_AWS_ENDPOINT
         value: "https://cwobject.com"
       - name: AWS_ACCESS_KEY_ID
         value: <ACCESS_KEY_ID>
       - name: AWS_SECRET_ACCESS_KEY
         value: <SECRET_ACCESS_KEY>
       - name: AWS_DEFAULT_REGION
         value: <AVAILABILITY_ZONE>

   config:
     k8s:
       plugins:
         k8s:
           default-env-vars:
             - FLYTE_AWS_ENDPOINT: https://<BUCKET_NAME>.cwobject.com
             - FLYTE_AWS_S3_ADDRESSING_STYLE: "virtual"
             - AWS_ACCESS_KEY_ID: <ACCESS_KEY_ID>
             - AWS_SECRET_ACCESS_KEY: <SECRET_ACCESS_KEY>
             - AWS_DEFAULT_REGION: <AVAILABILITY_ZONE>

   operator:
     enableTunnelService: true

   secrets:
     admin:
       create: true
       clientId: <CLIENT_ID>
       clientSecret: <CLIENT_SECRET>

   fluentbit:
     enabled: true
     env:
       - name: AWS_ACCESS_KEY_ID
         value: <ACCESS_KEY_ID>
       - name: AWS_SECRET_ACCESS_KEY
         value: <SECRET_ACCESS_KEY>
   ```

   > [!NOTE]
   > The `uctl selfserve provision-dataplane-resources` command in step 2 generates the `<CLIENT_ID>` and `<CLIENT_SECRET>` values. Use the values from that command's output.

   The settings below are required for AI Object Storage compatibility:

   | Setting                                                                 | Value                                | Purpose                                                         |
   | ----------------------------------------------------------------------- | ------------------------------------ | --------------------------------------------------------------- |
   | `storage.custom.stow.config.disable_force_path_style`                   | `true`                               | Enables virtual-hosted style S3 URLs.                           |
   | `storage.custom.stow.config.endpoint`                                   | `https://cwobject.com`               | AI Object Storage endpoint for the control plane.               |
   | `config.k8s.plugins.k8s.default-env-vars` (entry: `FLYTE_AWS_ENDPOINT`) | `https://<BUCKET_NAME>.cwobject.com` | Bucket-specific endpoint injected into task pods.               |
   | `executor.extraEnvVars.FLYTE_AWS_S3_ADDRESSING_STYLE`                   | `virtual`                            | Configures the executor to use virtual-hosted style addressing. |

4. Add the Union.ai Helm repo:

   ```bash
   helm repo add unionai https://unionai.github.io/helm-charts/
   helm repo update
   ```

5. Install the Custom Resource Definitions (CRDs):

   ```bash
   helm upgrade --install unionai-dataplane-crds unionai/dataplane-crds
   ```

6. Install the data plane. Replace `<PATH_TO_VALUES_FILE>` with the path to the Helm values file you customized in step 3.

   ```bash
   helm upgrade --install unionai-dataplane unionai/dataplane \
     --namespace union --create-namespace \
     --values <PATH_TO_VALUES_FILE> \
     --timeout 10m
   ```

7. Verify the pods are running:

   ```bash
   kubectl get pods -n union
   ```

   When the deployment succeeds, all pods show a `Running` status, including `union-operator-proxy`, `union-operator-buildkit`, `flytepropeller`, and `executor`.

8. Verify the cluster is registered with the control plane:

   ```bash
   uctl get cluster
   ```

   The output is similar to the following:

   ```text
   NAME               ORG          STATE          HEALTH
   union-coreweave    my-org       STATE_ENABLED  HEALTHY
   ```

9. Create an API key for your organization. This is required for v2 workflow executions on the data plane. If you have already created one, rerun the same command to propagate the key to the new cluster:

   ```bash
   uctl create apikey --keyName EAGER_API_KEY --org <ORG_NAME>
   ```

   > [!NOTE]
   > If you receive a `PermissionDenied` error, contact [Union.ai support](https://www.union.ai/) to have the permission enabled for your organization.

## Test a workflow

To run a sample workflow, complete the following steps:

1. Create a Flyte CLI configuration file at the path `.flyte/config.yaml` in your project directory. Replace `<ORG_NAME>` and `<PROJECT_NAME>` with your organization and project identifiers.

   ```yaml
   admin:
     endpoint: dns:///<ORG_NAME>.union.ai
   image:
     builder: remote
   task:
     domain: development
     org: <ORG_NAME>
     project: <PROJECT_NAME>
   ```

2. Run a sample workflow:

   ```bash
   flyte run --image ghcr.io/flyteorg/flyte:py3.13-v2.0.2 \
     hello_world.py main --n 5
   ```

   > [!NOTE]
   > If the remote image builder isn't enabled for your organization, use the `--image` flag with a pre-built container image as in the preceding `flyte run` example.

3. Check the run status. Replace `<RUN_NAME>` with the workflow run identifier.

   ```bash
   flyte get run <RUN_NAME>
   ```

   Look for `ACTION_PHASE_SUCCEEDED` in the output to confirm the workflow completed successfully.

## Troubleshooting

| Symptom                                      | Cause                                                  | Fix                                                                                                                                                                 |
| -------------------------------------------- | ------------------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `PathStyleRequestNotAllowed 400`             | The control plane generates path-style S3 URLs.        | Set `storage.custom.stow.config.disable_force_path_style` to `true` in the Helm values file.                                                                        |
| `403 Forbidden` on S3 operations             | No access policy for the storage key.                  | Create an object storage access policy in the [Cloud Console](https://docs.coreweave.com/products/storage/object-storage/auth-access/organization-policies/manage). |
| Task pods reach `s3.us-east-1.amazonaws.com` | Task pods are missing the CoreWeave endpoint.          | Add `FLYTE_AWS_ENDPOINT` to `config.k8s.plugins.k8s.default-env-vars` with the value `https://<BUCKET_NAME>.cwobject.com`.                                          |
| "All enabled clusters are unhealthy"         | The control plane can't reach the data plane.          | Verify the tunnel service is running: `kubectl get pods -n union \| grep proxy`.                                                                                    |
| "Remote image builder is not enabled"        | The remote builder isn't enabled on the control plane. | Contact Union.ai to enable the remote builder, or use `--image` with a pre-built image.                                                                             |
| `invalid keys: collectbillableresourceusage` | Chart version mismatch with the operator.              | Use matching chart and operator image versions.                                                                                                                     |
| Helm "another operation in progress"         | An interrupted Helm upgrade.                           | Run `helm rollback unionai-dataplane <LAST_GOOD_REVISION> -n union`.                                                                                                |
| "Provided Tunnel token is not valid"         | The control plane isn't configured for this cluster.   | Complete cluster registration first.                                                                                                                                |

## Additional resources

For more information, see the following resources:

- [CoreWeave AI Object Storage](https://docs.coreweave.com/products/storage/object-storage)
- [Create a CKS cluster](https://docs.coreweave.com/products/cks/clusters/create)

=== PAGE: https://www.union.ai/docs/v2/union/deployment/selfmanaged/configuration ===

# Advanced Configurations

> **📝 Note**
>
> An LLM-optimized bundle of this entire section is available at [`section.md`](section.md).
> This single file contains all pages in this section, optimized for AI coding agent context.

This section covers the configuration of Union features on your Union.ai cluster.

=== PAGE: https://www.union.ai/docs/v2/union/deployment/selfmanaged/configuration/node-pools ===

# Configuring Service and Worker Node Pools

As a best practice, we recommend using separate node pools for the Union services and the Union worker pods. This guards against resource contention between Union services and other tasks running in your cluster.

Start by creating two node pools in your cluster: one for the Union services and one for the Union worker pods. Label the service node pool with `union.ai/node-role: services` and the worker pool with `union.ai/node-role: worker`. You will also need to taint the nodes in both pools so that only the appropriate pods are scheduled on them.

The nodes for Union services should be tainted with:

```bash
kubectl taint nodes <node-name> union.ai/node-role=services:NoSchedule
```
The nodes for execution workers should be tainted with:

```bash
kubectl taint nodes <node-name> union.ai/node-role=worker:NoSchedule
```

Vendor interfaces and provisioning tools may support tainting nodes automatically through configuration options.

Set the scheduling constraints for the Union services in your values file:

```yaml
scheduling:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: union.ai/node-role
            operator: In
            values:
            - services
  tolerations:
    - effect: NoSchedule
      key: union.ai/node-role
      operator: Equal
      value: services
```

To ensure that task pods are scheduled on the worker node pool, set the following for the Flyte Kubernetes plugin:

```yaml
config:
  k8s:
    plugins:
      k8s:
        default-affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
              - matchExpressions:
                - key: union.ai/node-role
                  operator: In
                  values:
                  - worker
        default-tolerations:
          - effect: NoSchedule
            key: union.ai/node-role
            operator: Equal
            value: worker
```

=== PAGE: https://www.union.ai/docs/v2/union/deployment/selfmanaged/configuration/authentication ===

# Authentication

Union.ai uses [OpenID Connect (OIDC)](https://openid.net/specs/openid-connect-core-1_0.html) for user authentication and [OAuth 2.0](https://tools.ietf.org/html/rfc6749) for service-to-service authorization. You must configure an external Identity Provider (IdP) to enable authentication on your deployment.

## Overview

Authentication is enforced at two layers:

1. **Ingress layer** — The control plane nginx ingress validates every request to protected routes via an auth subrequest to the `/me` endpoint.
2. **Application layer** — `flyteadmin` manages browser sessions, validates tokens, and exposes OIDC discovery endpoints.

The following diagram shows how these layers interact for browser-based authentication:

```mermaid
sequenceDiagram
    participant B as Browser
    participant N as Nginx Ingress
    participant F as Flyteadmin
    participant IdP as Identity Provider
    B->>N: Request protected route
    N->>F: Auth subrequest (GET /me)
    F-->>N: 401 (no session)
    N-->>B: 302 → /login
    B->>F: GET /login (unprotected)
    F-->>B: 302 → IdP authorize endpoint
    B->>IdP: Authenticate (PKCE)
    IdP-->>B: 302 → /callback?code=...
    B->>F: GET /callback (exchange code)
    F->>IdP: Exchange code for tokens
    F-->>B: Set-Cookie + 302 → original URL
    B->>N: Retry with session cookie
    N->>F: Auth subrequest (GET /me)
    F-->>N: 200 OK
    N-->>B: Forward to backend service
```

## Prerequisites

- A Union.ai deployment with the control plane installed.
- An OIDC-compliant Identity Provider (IdP).
- Access to create OAuth applications in your IdP.
- A secret management solution for delivering client secrets to pods (e.g., External Secrets Operator with AWS Secrets Manager, HashiCorp Vault, or native Kubernetes secrets).

## Configuring your Identity Provider

You must create three OAuth applications in your IdP:

| Application | Type | Grant Types | Purpose |
|---|---|---|---|
| Web app (browser login) | Web | `authorization_code` | Console/web UI authentication |
| Native app (SDK/CLI) | Native (PKCE) | `authorization_code`, `device_code` | SDK and CLI authentication |
| Service app (internal) | Service | `client_credentials` | All service-to-service communication |

> [!NOTE]
> A single service app is shared by both control plane and dataplane services. If your security policy requires separate credentials per component, you can create additional service apps, but the configuration below assumes a single shared client.

### Authorization server setup

1. Create a custom authorization server in your IdP (or use the default).
2. Add a scope named `all`.
3. Add an access policy that allows all registered clients listed above.
4. Add a policy rule that permits `authorization_code`, `client_credentials`, and `device_code` grant types.
5. Note the **Issuer URI** (e.g., `https://your-idp.example.com/oauth2/<server-id>`).
6. Note the **Token endpoint** (e.g., `https://your-idp.example.com/oauth2/<server-id>/v1/token`).

### Application details

#### 1. Web application (browser login)

- **Type**: Web Application
- **Sign-on method**: OIDC
- **Grant types**: `authorization_code`
- **Sign-in redirect URI**: `https://<your-domain>/callback`
- **Sign-out redirect URI**: `https://<your-domain>/logout`
- Note the **Client ID** → used as `OIDC_CLIENT_ID`
- Note the **Client Secret** → stored in `flyte-admin-secrets` (see **Self-managed deployment > Advanced Configurations > Authentication > Secret delivery**)

#### 2. Native application (SDK/CLI)

- **Type**: Native Application
- **Sign-on method**: OIDC
- **Grant types**: `authorization_code`, `urn:ietf:params:oauth:grant-type:device_code`
- **Sign-in redirect URI**: `http://localhost:53593/callback`
- **Require PKCE**: Always
- **Consent**: Trusted (skip consent screen)
- Note the **Client ID** → used as `CLI_CLIENT_ID` (no secret needed for public clients)

#### 3. Service application (internal)

- **Type**: Service (machine-to-machine)
- **Grant types**: `client_credentials`
- Note the **Client ID** → used as `INTERNAL_CLIENT_ID` (control plane) and `AUTH_CLIENT_ID` (dataplane)
- Note the **Client Secret** → stored in multiple Kubernetes secrets (see **Self-managed deployment > Advanced Configurations > Authentication > Secret delivery**)

## Control plane Helm configuration

The control plane Helm chart requires auth configuration in several sections. All examples below use the global variables defined in `values.<cloud>.selfhosted-intracluster.yaml`.

### Global variables

Set these in your customer overrides file:

```yaml
global:
  OIDC_BASE_URL: "<issuer-uri>"             # e.g. "https://your-idp.example.com/oauth2/default"
  OIDC_CLIENT_ID: "<web-app-client-id>"     # Browser login
  CLI_CLIENT_ID: "<native-app-client-id>"   # SDK/CLI
  INTERNAL_CLIENT_ID: "<service-client-id>" # Service-to-service
  AUTH_TOKEN_URL: "<token-endpoint>"         # e.g. "https://your-idp.example.com/oauth2/default/v1/token"
```

### Flyteadmin OIDC configuration

Configure `flyteadmin` to act as the OIDC relying party. This enables the `/login`, `/callback`, `/me`, and `/logout` endpoints:

```yaml
flyte:
  configmap:
    adminServer:
      server:
        security:
          useAuth: true
      auth:
        grpcAuthorizationHeader: flyte-authorization
        httpAuthorizationHeader: flyte-authorization
        authorizedUris:
          - "http://flyteadmin:80"
          - "http://flyteadmin.<namespace>.svc.cluster.local:80"
        appAuth:
          authServerType: External
          externalAuthServer:
            baseUrl: "<issuer-uri>"
          thirdPartyConfig:
            flyteClient:
              clientId: "<native-app-client-id>"
              redirectUri: "http://localhost:53593/callback"
              scopes:
                - all
        userAuth:
          openId:
            baseUrl: "<issuer-uri>"
            clientId: "<web-app-client-id>"
            scopes:
              - profile
              - openid
              - offline_access
          cookieSetting:
            sameSitePolicy: LaxMode
            domain: ""
          idpQueryParameter: idp
```

Key settings:

- `useAuth: true` — registers the `/login`, `/callback`, `/me`, and `/logout` HTTP endpoints. **Required** for auth to function.
- `authServerType: External` — use your IdP as the authorization server (not flyteadmin's built-in server).
- `grpcAuthorizationHeader: flyte-authorization` — the header name used for bearer tokens. Both the SDK and internal services use this header.

### Flyteadmin and scheduler admin SDK client

Flyteadmin and the scheduler use the admin SDK to communicate with other control plane services. Configure client credentials so these calls are authenticated:

```yaml
flyte:
  configmap:
    adminServer:
      admin:
        clientId: "<service-client-id>"
        clientSecretLocation: "/etc/secrets/client_secret"
```

The secret is mounted from the `flyte-admin-secrets` Kubernetes secret (see **Self-managed deployment > Advanced Configurations > Authentication > Secret delivery**).

### Scheduler auth secret

The flyte-scheduler mounts a separate Kubernetes secret (`flyte-secret-auth`) at `/etc/secrets/`. Enable this mount:

```yaml
flyte:
  secrets:
    adminOauthClientCredentials:
      enabled: true
      clientSecret: "placeholder"
```

> [!NOTE]
> Setting `clientSecret: "placeholder"` causes the subchart to render the `flyte-secret-auth` Kubernetes Secret. Use External Secrets Operator with `creationPolicy: Merge` to overwrite the placeholder with the real credential, or create the secret directly before installing the chart.

### Service-to-service authentication

Control plane services communicate through nginx and need OAuth tokens. Configure the admin SDK client credentials and the union service auth:

```yaml
configMap:
  admin:
    clientId: "<service-client-id>"
    clientSecretLocation: "/etc/secrets/union/client_secret"
  union:
    auth:
      enable: true
      type: ClientSecret
      clientId: "<service-client-id>"
      clientSecretLocation: "/etc/secrets/union/client_secret"
      tokenUrl: "<token-endpoint>"
      authorizationMetadataKey: flyte-authorization
      scopes:
        - all
```

The secret is mounted from the control plane service secret (see **Self-managed deployment > Advanced Configurations > Authentication > Secret delivery**).

### Executions service

The executions service has its own admin client connection that also needs auth:

```yaml
services:
  executions:
    configMap:
      executions:
        app:
          adminClient:
            connection:
              authorizationHeader: flyte-authorization
              clientId: "<service-client-id>"
              clientSecretLocation: "/etc/secrets/union/client_secret"
              tokenUrl: "<token-endpoint>"
              scopes:
                - all
```

### Ingress auth annotations

The control plane ingress uses nginx auth subrequests to enforce authentication. These annotations are set on protected ingress routes:

```yaml
ingress:
  protectedIngressAnnotations:
    nginx.ingress.kubernetes.io/auth-url: "https://$host/me"
    nginx.ingress.kubernetes.io/auth-signin: "https://$host/login?redirect_url=$escaped_request_uri"
    nginx.ingress.kubernetes.io/auth-response-headers: "Set-Cookie"
    nginx.ingress.kubernetes.io/auth-cache-key: "$http_flyte_authorization$http_cookie"
  protectedIngressAnnotationsGrpc:
    nginx.ingress.kubernetes.io/auth-url: "https://$host/me"
    nginx.ingress.kubernetes.io/auth-response-headers: "Set-Cookie"
    nginx.ingress.kubernetes.io/auth-cache-key: "$http_authorization$http_flyte_authorization$http_cookie"
```

For every request to a protected route, nginx makes a subrequest to `/me`. If flyteadmin returns 200 (valid session or token), the request is forwarded. If 401, the user is redirected to `/login` for browser clients, or the 401 is returned directly for API clients.

## Dataplane Helm configuration

When the control plane has OIDC enabled, the dataplane must also authenticate. All dataplane services use the same service app credentials (`AUTH_CLIENT_ID`), which is the same client as `INTERNAL_CLIENT_ID` on the control plane.

### Dataplane global variables

```yaml
global:
  AUTH_CLIENT_ID: "<service-client-id>"  # Same as INTERNAL_CLIENT_ID
```

### Cluster resource sync

```yaml
clusterresourcesync:
  config:
    union:
      auth:
        enable: true
        type: ClientSecret
        clientId: "<service-client-id>"
        clientSecretLocation: "/etc/union/secret/client_secret"
        authorizationMetadataKey: flyte-authorization
        tokenRefreshWindow: 5m
```

### Operator (union service auth)

```yaml
config:
  union:
    auth:
      enable: true
      type: ClientSecret
      clientId: "<service-client-id>"
      clientSecretLocation: "/etc/union/secret/client_secret"
      authorizationMetadataKey: flyte-authorization
      tokenRefreshWindow: 5m
```

### Propeller admin client

```yaml
config:
  admin:
    admin:
      clientId: "<service-client-id>"
      clientSecretLocation: "/etc/union/secret/client_secret"
```

### Executor (eager mode)

Injects the `EAGER_API_KEY` secret into task pods for authenticated eager-mode execution:

```yaml
executor:
  config:
    unionAuth:
      injectSecret: true
      secretName: EAGER_API_KEY
```

### Dataplane secrets

Enable the `union-secret-auth` Kubernetes secret mount for dataplane pods:

```yaml
secrets:
  admin:
    enable: true
    create: false
    clientId: "<service-client-id>"
    clientSecret: "placeholder"
```

> [!NOTE]
> `create: false` means the chart does not create the `union-secret-auth` Kubernetes Secret. You must provision it externally (see **Self-managed deployment > Advanced Configurations > Authentication > Secret delivery**). Setting `clientSecret: "placeholder"` with `create: true` is also supported if you want the chart to create the secret and then overwrite it via External Secrets Operator.

## Secret delivery

Client secrets must be delivered to pods as files mounted into the container filesystem. The table below lists the required Kubernetes secrets, their mount paths, and which components use them:

| Kubernetes Secret | Mount Path | Components | Namespace |
| --- | --- | --- | --- |
| `flyte-admin-secrets` | `/etc/secrets/` | flyteadmin | `union-cp` |
| `flyte-secret-auth` | `/etc/secrets/` | flyte-scheduler | `union-cp` |
| Control plane service secret | `/etc/secrets/union/` | executions, cluster, usage, and other CP services | `union-cp` |
| `union-secret-auth` | `/etc/union/secret/` | operator, propeller, CRS | `union` |

All secrets must contain a key named `client_secret` with the service app's OAuth client secret value.

### Option A: External Secrets Operator (recommended)

If you use [External Secrets Operator (ESO)](https://external-secrets.io/) with a cloud secret store, create `ExternalSecret` resources that sync the client secret into each Kubernetes secret:

```yaml
apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: flyte-admin-secrets-auth
  namespace: union-cp
spec:
  secretStoreRef:
    name: default
    kind: SecretStore
  refreshInterval: 1h
  target:
    name: flyte-admin-secrets
    creationPolicy: Merge
    deletionPolicy: Retain
  data:
    - secretKey: client_secret
      remoteRef:
        key: "<your-secret-store-key>"
---
apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: flyte-secret-auth
  namespace: union-cp
spec:
  secretStoreRef:
    name: default
    kind: SecretStore
  refreshInterval: 1h
  target:
    name: flyte-secret-auth
    creationPolicy: Merge
    deletionPolicy: Retain
  data:
    - secretKey: client_secret
      remoteRef:
        key: "<your-secret-store-key>"
---
apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: union-secret-auth
  namespace: union
spec:
  secretStoreRef:
    name: default
    kind: SecretStore
  refreshInterval: 1h
  target:
    name: union-secret-auth
    creationPolicy: Merge
    deletionPolicy: Retain
  data:
    - secretKey: client_secret
      remoteRef:
        key: "<your-secret-store-key>"
```

> [!NOTE]
> `creationPolicy: Merge` ensures the ExternalSecret adds the `client_secret` key alongside any existing keys in the target secret.
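
The `ExternalSecret` resources above reference a `SecretStore` named `default` in each namespace. If you do not already have one, the following is a minimal sketch assuming AWS Secrets Manager with the ESO controller authenticating through its own IAM role (IRSA); other providers such as Vault, GCP Secret Manager, or Azure Key Vault are configured analogously, and you need one store per namespace (`union-cp` and `union`):

```yaml
apiVersion: external-secrets.io/v1
kind: SecretStore
metadata:
  name: default
  namespace: union-cp   # create an equivalent store in the union namespace for the dataplane secret
spec:
  provider:
    aws:
      service: SecretsManager
      region: us-east-1   # region where your client secret is stored
```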

### Option B: Direct Kubernetes secrets

If you manage secrets directly:

```bash
# Control plane — flyteadmin
kubectl create secret generic flyte-admin-secrets \
  --from-literal=client_secret='<SERVICE_CLIENT_SECRET>' \
  -n union-cp

# Control plane — scheduler
kubectl create secret generic flyte-secret-auth \
  --from-literal=client_secret='<SERVICE_CLIENT_SECRET>' \
  -n union-cp

# Control plane — union services (add to existing secret)
kubectl create secret generic union-controlplane-secrets \
  --from-literal=pass.txt='<DB_PASSWORD>' \
  --from-literal=client_secret='<SERVICE_CLIENT_SECRET>' \
  -n union-cp --dry-run=client -o yaml | kubectl apply -f -

# Dataplane — operator, propeller, CRS
kubectl create secret generic union-secret-auth \
  --from-literal=client_secret='<SERVICE_CLIENT_SECRET>' \
  -n union
```
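
To confirm that each secret exists and carries the required key, you can decode the `client_secret` field in each one (a quick sanity check; adjust secret names and namespaces if your deployment differs):

```bash
# Check that every required secret contains a client_secret key
for entry in "union-cp/flyte-admin-secrets" "union-cp/flyte-secret-auth" \
             "union-cp/union-controlplane-secrets" "union/union-secret-auth"; do
  ns="${entry%%/*}"; name="${entry##*/}"
  if kubectl get secret "$name" -n "$ns" -o jsonpath='{.data.client_secret}' | grep -q .; then
    echo "OK: $entry contains client_secret"
  else
    echo "MISSING: $entry has no client_secret key"
  fi
done
```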

## SDK and CLI authentication

The SDK and CLI use PKCE (Proof Key for Code Exchange) for interactive authentication:

1. The SDK calls `AuthMetadataService/GetPublicClientConfig` (an unprotected endpoint) to discover the `flytectl` client ID and redirect URI.
2. The SDK opens a browser to the IdP's authorize endpoint with a PKCE challenge.
3. The user authenticates in the browser.
4. The IdP redirects to `localhost:53593/callback` with an authorization code.
5. The SDK exchanges the code for tokens and stores them locally.
6. Subsequent requests include the token in the `flyte-authorization` header.

No additional SDK configuration is required beyond the standard `uctl` or Union config:

```yaml
admin:
  endpoint: dns:///<your-domain>
  authType: Pkce
  insecure: false
```

For headless environments (CI/CD), use the **Self-managed deployment > Advanced Configurations > Authentication > SDK and CLI authentication > Client credentials for CI/CD** flow instead.

### Client credentials for CI/CD

For automated pipelines, create a service app in your IdP and configure:

```yaml
admin:
  endpoint: dns:///<your-domain>
  authType: ClientSecret
  clientId: "<your-ci-client-id>"
  clientSecretLocation: "/path/to/client_secret"
```

Or use environment variables:

```bash
export FLYTE_CREDENTIALS_CLIENT_ID="<your-ci-client-id>"
export FLYTE_CREDENTIALS_CLIENT_SECRET="<your-ci-client-secret>"
export FLYTE_CREDENTIALS_AUTH_MODE=basic
```

## Troubleshooting

### Browser login redirects in a loop

Verify that `useAuth: true` is set in `flyte.configmap.adminServer.server.security`. Without this, the `/login`, `/callback`, and `/me` endpoints are not registered.

### SDK gets 401 Unauthenticated

1. Check that the `AuthMetadataService` routes are in the **unprotected** ingress (no auth-url annotation).
2. Verify the SDK can reach the token endpoint. The SDK discovers it via `AuthMetadataService/GetOAuth2Metadata`.
3. Check that `grpcAuthorizationHeader` matches the header name used by the SDK (`flyte-authorization`).

### Internal services get 401

1. Verify that `configMap.union.auth.enable: true` and the `client_secret` file exists at the configured `clientSecretLocation`.
2. Check `ExternalSecret` sync status: `kubectl get externalsecret -n <namespace>`.
3. Verify the secret contains the correct key: `kubectl get secret <secret-name> -n <namespace> -o jsonpath='{.data.client_secret}' | base64 -d`.

### Operator or propeller cannot authenticate

1. Verify `union-secret-auth` exists in the dataplane namespace and contains `client_secret`.
2. Check operator logs for auth errors: `kubectl logs -n union -l app.kubernetes.io/name=operator --tail=50 | grep -i auth`.
3. Verify the `AUTH_CLIENT_ID` matches the control plane's `INTERNAL_CLIENT_ID`.
4. Verify the service app is included in the authorization server's access policy.

### Scheduler fails to start

1. Verify `flyte-secret-auth` exists in the control plane namespace: `kubectl get secret flyte-secret-auth -n union-cp`.
2. Check that `flyte.secrets.adminOauthClientCredentials.enabled: true` is set.
3. Check scheduler logs: `kubectl logs -n union-cp deploy/flytescheduler --tail=50`.

=== PAGE: https://www.union.ai/docs/v2/union/deployment/selfmanaged/configuration/code-viewer ===

# Code Viewer

The Union UI allows you to view the exact code that executed a specific task. Union securely transfers the [code bundle](https://www.union.ai/docs/v2/union/user-guide/run-scaling/life-of-a-run) directly to your browser without routing it through the control plane.

![Code Viewer](https://www.union.ai/docs/v2/union/_static/images/deployment/configuration/code-viewer/demo.png)

## Enable CORS policy on your fast registration bucket

To support this feature securely, your bucket must allow CORS access from Union. The configuration steps vary depending on your cloud provider.

### AWS S3 Console

1. Open the AWS Console.
2. Navigate to the S3 dashboard.
3. Select your fast registration bucket. By default, this is the same as the metadata bucket configured during initial deployment.
4. Click the **Permissions** tab and scroll to **Cross-origin resource sharing (CORS)**.
5. Click **Edit** and enter the following policy:
![S3 CORS Policy](https://www.union.ai/docs/v2/union/_static/images/deployment/configuration/code-viewer/s3.png)

```json
[
    {
        "AllowedHeaders": [
            "*"
        ],
        "AllowedMethods": [
            "GET",
            "HEAD"
        ],
        "AllowedOrigins": [
            "https://*.unionai.cloud"
        ],
        "ExposeHeaders": [
            "ETag"
        ],
        "MaxAgeSeconds": 3600
    }
]
```

For more details, see the [AWS S3 CORS documentation](https://docs.aws.amazon.com/AmazonS3/latest/userguide/cors.html).
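
If you prefer the AWS CLI to the console, the same policy can be applied with `put-bucket-cors` (a sketch; the bucket name is a placeholder, and the CLI expects the rules wrapped in a `CORSRules` object):

```bash
# Apply the CORS policy shown above via the AWS CLI
cat > cors.json <<'EOF'
{
  "CORSRules": [
    {
      "AllowedHeaders": ["*"],
      "AllowedMethods": ["GET", "HEAD"],
      "AllowedOrigins": ["https://*.unionai.cloud"],
      "ExposeHeaders": ["ETag"],
      "MaxAgeSeconds": 3600
    }
  ]
}
EOF
aws s3api put-bucket-cors --bucket <fast_registration_bucket> --cors-configuration file://cors.json

# Confirm the configuration was applied
aws s3api get-bucket-cors --bucket <fast_registration_bucket>
```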

### Google GCS

Google Cloud Storage requires CORS configuration via the command line.

1. Create a `cors.json` file with the following content:
    ```json
    [
        {
        "origin": ["https://*.unionai.cloud"],
        "method": ["HEAD", "GET"],
        "responseHeader": ["ETag"],
        "maxAgeSeconds": 3600
        }
    ]
    ```
2. Apply the CORS configuration to your bucket:
    ```bash
    gcloud storage buckets update gs://<fast_registration_bucket> --cors-file=cors.json
    ```
3. Verify the configuration was applied:
   ```bash
   gcloud storage buckets describe gs://<fast_registration_bucket> --format="default(cors_config)"

   cors_config:
   - maxAgeSeconds: 3600
     method:
     - GET
     - HEAD
     origin:
     - https://*.unionai.cloud
     responseHeader:
     - ETag
   ```
For more details, see the [Google Cloud Storage CORS documentation](https://docs.cloud.google.com/storage/docs/using-cors#command-line).

### Azure Storage

For Azure Storage CORS configuration, see the [Azure Storage CORS documentation](https://learn.microsoft.com/en-us/rest/api/storageservices/cross-origin-resource-sharing--cors--support-for-the-azure-storage-services).
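
As a starting point, CORS rules equivalent to the AWS and GCS examples above can be applied to the blob service with the Azure CLI (a sketch; the storage account name is a placeholder, and you should confirm the rules against your own security requirements):

```bash
# Allow GET/HEAD requests from Union.ai origins on the blob service
az storage cors add \
  --services b \
  --methods GET HEAD \
  --origins "https://*.unionai.cloud" \
  --allowed-headers "*" \
  --exposed-headers "ETag" \
  --max-age 3600 \
  --account-name <STORAGE_ACCOUNT>

# List the current CORS rules
az storage cors list --services b --account-name <STORAGE_ACCOUNT>
```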

## Troubleshooting

| Error Message | Cause | Fix |
|---------------|-------|-----|
| `Not available: No code available for this action.` | The task does not have a code bundle. This occurs when the code is baked into the Docker image or the task is not a code-based task. | This is expected behavior for tasks without code bundles. |
| `Not Found: The code bundle file could not be found. This may be due to your organization's data retention policy.` | The code bundle was deleted from the bucket, likely due to a retention policy. | Review your fast registration bucket's retention policy settings. |
| `Error: Code download is blocked by your storage bucket's configuration. Please contact your administrator to enable access.` | CORS is not configured on the bucket. | Configure CORS on your bucket using the instructions above. |

=== PAGE: https://www.union.ai/docs/v2/union/deployment/selfmanaged/configuration/image-builder ===

# Image Builder

Union Image Builder supports the ability to build container images within the dataplane. This enables the use of the `remote` builder type for any defined [Container Image](https://www.union.ai/docs/v2/union/user-guide/task-configuration/container-images).

Configure the use of remote image builder:
```bash
flyte create config --builder=remote --endpoint...
```

Write custom [container images](https://www.union.ai/docs/v2/union/user-guide/task-configuration/container-images):
```python
env = flyte.TaskEnvironment(
    name="hello_v2",
    image=flyte.Image.from_debian_base()
        .with_pip_packages("<package 1>", "<package 2>")
)
```

> By default, Image Builder is disabled and must be enabled by setting the builder type to `remote` in your flyte config.

## Requirements

* The image building process runs in the target run's project and domain. Any image push secrets required to push images to the registry must be accessible from the project and domain where the build happens.

## Build backends

Image Builder supports two build backends:

| Backend | Helm configuration | Description |
|---------|-------------------|-------------|
| **BuildKit** (default) | `imageBuilder.buildkit.enabled: true` | Runs a BuildKit daemon in the cluster for building images |
| **Depot** | `imageBuilder.buildkit.enabled: false` | Uses Depot's hosted build service for faster builds |

When BuildKit is disabled and no custom `buildkitUri` is set, the chart automatically configures Depot as the build backend. In single-namespace mode, a task PodTemplate with the Depot token imagePullSecret is created automatically.
[Depot](https://depot.dev/) is a remote, persistent BuildKit builder service, while [BuildKit](https://docs.docker.com/build/buildkit/) is the underlying container image builder engine developed by Moby/Docker.

## Configuration

Image Builder is configured directly through Helm values.

```yaml
imageBuilder:

  # Enable Image Builder
  enabled: true

  # -- The config map build-image container task attempts to reference.
  # -- Should not change unless coordinated with Union technical support.
  targetConfigMapName: "build-image-config"

  # -- The URI of the buildkitd service. Used for externally managed buildkitd services.
  # -- Leaving empty and setting imageBuilder.buildkit.enabled to true will create a buildkitd service and configure the Uri appropriately.
  # -- E.g. "tcp://buildkitd.buildkit.svc.cluster.local:1234"
  buildkitUri: ""

  # -- The default repository to publish images to when "registry" is not specified in ImageSpec.
  # -- Note, the build-image task will fail unless "registry" is specified or a default repository is provided.
  defaultRepository: ""

  # -- How build-image task and operator proxy will attempt to authenticate against the default repository.
  # -- Supported values are "noop", "google", "aws", "azure"
  # -- "noop" no authentication is attempted
  # -- "google" uses docker-credential-gcr to authenticate to the default registry
  # -- "aws" uses docker-credential-ecr-login to authenticate to the default registry
  # -- "azure" uses az acr login to authenticate to the default registry. Requires Azure Workload Identity to be enabled.
  authenticationType: "noop"

  buildkit:

    # -- Enable buildkit service within this release.
    # -- Set to false to use Depot instead.
    enabled: true

    # Configuring Union managed buildkitd Kubernetes resources.
    ...
```

## Authentication

### AWS

By default, Union is designed to use [IAM roles for service accounts (IRSA)](https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html) for authentication. Setting `authenticationType` to `aws` configures the Union image builder services to use the AWS default credential chain. Additionally, the Union image builder uses [`docker-credential-ecr-login`](https://github.com/awslabs/amazon-ecr-credential-helper) to authenticate to the ECR repository configured with `defaultRepository`.

`defaultRepository` should be the fully qualified ECR repository name, e.g. `<AWS_ACCOUNT_ID>.dkr.ecr.<AWS_REGION>.amazonaws.com/<REPOSITORY_NAME>`.

Therefore, it is necessary to configure the user role with the following permissions.

```json
{
  "Effect": "Allow",
  "Action": [
    "ecr:GetAuthorizationToken"
  ],
  "Resource": "*"
},
{
  "Effect": "Allow",
  "Action": [
    "ecr:BatchCheckLayerAvailability",
    "ecr:BatchGetImage",
    "ecr:GetDownloadUrlForLayer"
  ],
  "Resource": "*"
  // Or
  // "Resource": "arn:aws:ecr:<AWS_REGION>:<AWS_ACCOUNT_ID>:repository/<REPOSITORY>"
}
```

Similarly, the `operator-proxy` requires the following permissions:

```json
{
  "Effect": "Allow",
  "Action": [
    "ecr:GetAuthorizationToken"
  ],
  "Resource": "*"
},
{
  "Effect": "Allow",
  "Action": [
    "ecr:DescribeImages"
  ],
  "Resource": "arn:aws:ecr:<AWS_REGION>:<AWS_ACCOUNT_ID>:repository/<REPOSITORY>"
}
```

#### AWS Cross Account access

Access to repositories that do not exist in the same AWS account as the data plane requires additional ECR resource-based permissions. An ECR policy like the following is required if the configured `defaultRepository` or `ImageSpec`'s `registry` exists in an AWS account different from the dataplane's.

```json
{
  "Statement": [
    {
      "Sid": "AllowPull",
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam::<DATAPLANE_AWS_ACCOUNT>:role/<user-role>",
          "arn:aws:iam::<DATAPLANE_AWS_ACCOUNT>:role/<node-role>",
          // ... Additional roles that require image pulls
        ]
      },
      "Action": [
        "ecr:BatchCheckLayerAvailability",
        "ecr:BatchGetImage",
        "ecr:GetDownloadUrlForLayer"
      ]
    },
    {
      "Sid": "AllowDescribeImages",
      "Action": [
        "ecr:DescribeImages"
      ],
      "Principal": {
        "AWS": [
          "arn:aws:iam::<DATAPLANE_AWS_ACCOUNT>:role/<operator-proxy-role>",
        ]
      },
      "Effect": "Allow"
    },
    {
      "Sid": "ManageRepositoryContents"
      // ...
    }
  ],
  "Version": "2012-10-17"
}
```
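
The resource-based policy above is attached to the repository in the account that owns it, for example with `set-repository-policy` (a sketch; the repository name and policy file path are placeholders):

```bash
# Run against the AWS account that owns the ECR repository
aws ecr set-repository-policy \
  --repository-name <REPOSITORY> \
  --policy-text file://ecr-cross-account-policy.json \
  --region <AWS_REGION>

# Inspect the applied policy
aws ecr get-repository-policy --repository-name <REPOSITORY> --region <AWS_REGION>
```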

To support a private ImageSpec `base_image`, the following permissions are required.

```json
{
  "Statement": [
    {
      "Sid": "AllowPull",
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam::<DATAPLANE_AWS_ACCOUNT>:role/<user-role>",
          "arn:aws:iam::<DATAPLANE_AWS_ACCOUNT>:role/<node-role>",
          // ... Additional roles that require image pulls
        ]
      },
      "Action": [
        "ecr:BatchCheckLayerAvailability",
        "ecr:BatchGetImage",
        "ecr:GetDownloadUrlForLayer"
      ]
    }
  ]
}
```

### Google Cloud Platform

By default, GCP uses [Kubernetes Service Accounts to GCP IAM](https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity#kubernetes-sa-to-iam) for authentication. Setting `authenticationType` to `google` configures the Union image builder services to use the GCP default credential chain. Additionally, the Union image builder uses [`docker-credential-gcr`](https://github.com/GoogleCloudPlatform/docker-credential-gcr) to authenticate to the Google Artifact Registry repositories referenced by `defaultRepository`.

`defaultRepository` should be the full repository name, optionally combined with an image name prefix: `<GCP_LOCATION>-docker.pkg.dev/<GCP_PROJECT_ID>/<REPOSITORY_NAME>/<IMAGE_PREFIX>`.

It is necessary to grant the GCP user service account the `iam.serviceAccounts.signBlob` permission at the project level.

#### GCP Cross Project access

Access to registries that do not exist in the same GCP project as the data plane requires additional GCP permissions.

* Configure the user "role" service account with the `Artifact Registry Writer` role.
* Configure the GCP worker node and union-operator-proxy service accounts with the `Artifact Registry Reader` role.
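
For example, the role bindings above could be granted at the repository level with gcloud (a sketch; the service account names and repository location are placeholders):

```bash
# Push access for the user ("role") service account in the other project
gcloud artifacts repositories add-iam-policy-binding <REPOSITORY_NAME> \
  --project <OTHER_PROJECT_ID> --location <GCP_LOCATION> \
  --member "serviceAccount:<USER_ROLE_SA>@<DATAPLANE_PROJECT_ID>.iam.gserviceaccount.com" \
  --role "roles/artifactregistry.writer"

# Pull access for the worker node and operator-proxy service accounts
gcloud artifacts repositories add-iam-policy-binding <REPOSITORY_NAME> \
  --project <OTHER_PROJECT_ID> --location <GCP_LOCATION> \
  --member "serviceAccount:<NODE_SA>@<DATAPLANE_PROJECT_ID>.iam.gserviceaccount.com" \
  --role "roles/artifactregistry.reader"
```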

### Azure

By default, Union is designed to use Azure [Workload Identity Federation](https://learn.microsoft.com/en-us/azure/aks/workload-identity-deploy-cluster) for authentication using [user-assigned managed identities](https://learn.microsoft.com/en-us/entra/identity/managed-identities-azure-resources/how-manage-user-assigned-managed-identities?pivots=identity-mi-methods-azp) in place of AWS IAM roles.

* Configure the user "role" user-assigned managed identity with the `AcrPush` role.
* Configure the Azure kubelet identity and the operator-proxy user-assigned managed identity with the `AcrPull` role.
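
For example, these assignments can be made with the Azure CLI against the container registry scope (a sketch; the identity client IDs and ACR resource path are placeholders):

```bash
# Push access for the user ("role") managed identity
az role assignment create \
  --assignee <USER_IDENTITY_CLIENT_ID> \
  --role AcrPush \
  --scope /subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.ContainerRegistry/registries/<ACR_NAME>

# Pull access for the kubelet and operator-proxy identities (repeat per identity)
az role assignment create \
  --assignee <KUBELET_IDENTITY_CLIENT_ID> \
  --role AcrPull \
  --scope /subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.ContainerRegistry/registries/<ACR_NAME>
```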

### Private registries

Follow guidance in this section to integrate Image Builder with private registries:

#### GitHub Container Registry

1. Follow the [GitHub guide](https://docs.github.com/en/packages/working-with-a-github-packages-registry/working-with-the-container-registry) to log in to the registry locally.
2. Create a Union secret:
```bash
flyte create secret --type image_pull --from-docker-config --registries ghcr.io SECRET_NAME
```

> This secret will be available to all projects and domains in your tenant. [Learn more about Union Secrets](./union-secrets)
> Check alternative ways to create image pull secrets in the [API reference](https://www.union.ai/docs/v2/union/api-reference/flyte-cli)

3. Reference this secret in the Image object:

```python
env = flyte.TaskEnvironment(
    name="hello_v2",
    # Allow image builder to pull and push from the private registry. `registry` field isn't required if it's configured
    # as the default registry in imagebuilder section in the helm chart values file.
    image=flyte.Image.from_debian_base(registry="<my registry url>", name="private", registry_secret="<YOUR_SECRET_NAME>")
        .with_pip_packages("<package 1>", "<package 2>"),
    # Mount the same secret to allow tasks to pull that image
    secrets=["<YOUR_SECRET_NAME>"]
)
```

This enables Image Builder to push images and layers to a private GHCR repository, and it allows pods in this task environment to pull the image at runtime.

=== PAGE: https://www.union.ai/docs/v2/union/deployment/selfmanaged/configuration/multi-cluster ===

# Multiple Clusters

Union enables you to integrate multiple Kubernetes clusters into a single Union control plane using the `clusterPool` abstraction.

Currently, the clusterPool configuration is performed by Union.ai in the control plane. You provide the mapping between clusterPool names and cluster names using the following structure:

```yaml
clusterPoolName:
  - clusterName
```

Here, `clusterName` must match the name you used when installing the Union operator Helm chart.

You can have as many cluster pools as needed:

**Example**

```yaml
default: # this is the clusterPool where executions will run unless another mapping is specified
  - my-dev-cluster
development-cp:
  - my-dev-cluster
staging-cp:
  - my-staging-cluster
production-cp:
  - production-cluster-1
  - production-cluster-2
dr-region:
  - dr-site-cluster
```

## Using cluster pools

Once the Union team configures the clusterPools in the control plane, you can proceed to configure mappings:

### project-domain-clusterPool mapping

1. Create a YAML file that includes the project, domain, and clusterPool:

**Example: cpa-dev.yaml**

```yaml
domain: development
project: flytesnacks
clusterPoolName: development-cp
```

2. Update the control plane with this mapping:

```bash
uctl update cluster-pool-attributes --attrFile cpa-dev.yaml
```
3. New executions in `flytesnacks-development` should now run on `my-dev-cluster`.

### project-domain-workflow-clusterPool mapping

1. Create a YAML file that includes the project, domain, workflow, and clusterPool:

**Example: cpa-prod.yaml**

```yaml
domain: production
project: flytesnacks
workflow: my_critical_wf
clusterPoolName: production-cp
```

2. Update the control plane with this mapping:

```bash
uctl update cluster-pool-attributes --attrFile cpa-prod.yaml
```
3. New executions of the `my_critical_wf` workflow in `flytesnacks-production` should now run on any of the clusters under `production-cp`.

## Data sharing between cluster pools

The sharing of metadata is controlled by the cluster pool to which a cluster belongs. If two clusters are in the same cluster pool, then they must share the same metadata bucket, defined in the Helm values as `storage.bucketName`.

If they are in different cluster pools, then they **must** have different metadata buckets. You could, for example, have a single metadata bucket for all your development clusters, and a separate one for all your production clusters, by grouping the clusters into cluster pools accordingly.

Alternatively, you could have a separate metadata bucket for each cluster by putting each cluster in its own cluster pool.

=== PAGE: https://www.union.ai/docs/v2/union/deployment/selfmanaged/configuration/persistent-logs ===

# Persistent logs

Persistent logging is enabled by default. The data plane deploys [FluentBit](https://fluentbit.io/) as a DaemonSet that collects container logs from every node and writes them to the `persisted-logs/` path in the object store configured for your data plane.

FluentBit runs under the `fluentbit-system` Kubernetes service account. This service account must have write access to the storage bucket so FluentBit can push logs. The sections below describe how to grant that access on each cloud provider.

## AWS (IRSA)

On EKS, use [IAM Roles for Service Accounts (IRSA)](https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html) to grant the FluentBit service account permission to write to S3.

### 1. Create an IAM policy

Create an IAM policy that allows writing to your metadata S3 bucket:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:PutObjectAcl",
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::<BUCKET_NAME>",
        "arn:aws:s3:::<BUCKET_NAME>/persisted-logs/*"
      ]
    }
  ]
}
```

Replace `<BUCKET_NAME>` with the name of your data plane metadata bucket.

### 2. Create an IAM role with a trust policy

Create an IAM role that trusts the EKS OIDC provider and is scoped to the `fluentbit-system` service account:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::<ACCOUNT_ID>:oidc-provider/<OIDC_PROVIDER>"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "<OIDC_PROVIDER>:sub": "system:serviceaccount:<NAMESPACE>:fluentbit-system",
          "<OIDC_PROVIDER>:aud": "sts.amazonaws.com"
        }
      }
    }
  ]
}
```

Replace:

- `<ACCOUNT_ID>` with your AWS account ID
- `<OIDC_PROVIDER>` with your EKS cluster's OIDC provider (e.g. `oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE`)
- `<NAMESPACE>` with the namespace where the data plane is installed (default: `union`)

You can retrieve the OIDC provider URL with:

```bash
aws eks describe-cluster --name <CLUSTER_NAME> --region <REGION> \
  --query "cluster.identity.oidc.issuer" --output text
```

Attach the IAM policy from step 1 to this role.
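
If you script steps 1 and 2 with the AWS CLI, the sequence might look like this (a sketch; the policy and role names and the JSON file paths are placeholders):

```bash
# Step 1: create the S3 write policy
aws iam create-policy \
  --policy-name union-fluentbit-logs \
  --policy-document file://fluentbit-s3-policy.json

# Step 2: create the role with the OIDC trust policy, then attach the policy to it
aws iam create-role \
  --role-name union-fluentbit-logs \
  --assume-role-policy-document file://fluentbit-trust-policy.json

aws iam attach-role-policy \
  --role-name union-fluentbit-logs \
  --policy-arn arn:aws:iam::<ACCOUNT_ID>:policy/union-fluentbit-logs
```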

### 3. Configure the Helm values

Set the IRSA annotation on the FluentBit service account in your data plane Helm values:

```yaml
fluentbit:
  serviceAccount:
    annotations:
      eks.amazonaws.com/role-arn: "arn:aws:iam::<ACCOUNT_ID>:role/<FLUENTBIT_ROLE_NAME>"
```
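
After upgrading the release, you can confirm that FluentBit is running and writing objects under the `persisted-logs/` prefix (a rough check; the pod label may differ depending on your release name):

```bash
# FluentBit DaemonSet pods should be running on every node
kubectl get pods -n union -l app.kubernetes.io/name=fluent-bit

# Recently written log objects should appear in the metadata bucket
aws s3 ls s3://<BUCKET_NAME>/persisted-logs/ --recursive | tail -n 20
```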

## Azure (Workload Identity Federation)

On AKS, use [Microsoft Entra Workload Identity](https://learn.microsoft.com/en-us/azure/aks/workload-identity-overview) to grant the FluentBit service account access to Azure Blob Storage.

### Azure prerequisites

- Your AKS cluster must be [enabled as an OIDC Issuer](https://learn.microsoft.com/en-us/azure/aks/use-oidc-issuer)
- The [Azure Workload Identity](https://learn.microsoft.com/en-us/azure/aks/workload-identity-deploy-cluster) mutating webhook must be installed on your cluster

### 1. Create or reuse a Managed Identity

Create a User Assigned Managed Identity (or reuse an existing one):

```bash
az identity create \
  --name fluentbit-identity \
  --resource-group <RESOURCE_GROUP> \
  --location <LOCATION>
```

Note the `clientId` from the output.

### 2. Add a federated credential

Create a federated credential that maps the `fluentbit-system` Kubernetes service account to the managed identity:

```bash
az identity federated-credential create \
  --name fluentbit-federated-credential \
  --identity-name fluentbit-identity \
  --resource-group <RESOURCE_GROUP> \
  --issuer <AKS_OIDC_ISSUER_URL> \
  --subject "system:serviceaccount:<NAMESPACE>:fluentbit-system" \
  --audiences "api://AzureADTokenExchange"
```

Replace:

- `<RESOURCE_GROUP>` with your Azure resource group
- `<AKS_OIDC_ISSUER_URL>` with the OIDC issuer URL of your AKS cluster
- `<NAMESPACE>` with the namespace where the data plane is installed (default: `union`)

You can retrieve the OIDC issuer URL with:

```bash
az aks show --name <CLUSTER_NAME> --resource-group <RESOURCE_GROUP> \
  --query "oidcIssuerProfile.issuerUrl" --output tsv
```

### 3. Assign a storage role

Assign the `Storage Blob Data Contributor` role to the managed identity at the storage account level:

```bash
az role assignment create \
  --assignee <CLIENT_ID> \
  --role "Storage Blob Data Contributor" \
  --scope "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.Storage/storageAccounts/<STORAGE_ACCOUNT>"
```

### 4. Configure the Azure Helm values

Set the Workload Identity annotation on the FluentBit service account in your data plane Helm values:

```yaml
fluentbit:
  serviceAccount:
    annotations:
      azure.workload.identity/client-id: "<CLIENT_ID>"
```

You must also ensure the FluentBit pods have the Workload Identity label. If you have already set `additionalPodLabels` for your data plane, confirm the following label is present:

```yaml
additionalPodLabels:
  azure.workload.identity/use: "true"
```

## GCP (Workload Identity)

On GKE, use [GKE Workload Identity](https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity) to grant the FluentBit service account access to GCS.

### GCP prerequisites

- [Workload Identity](https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity#enable) must be enabled on your GKE cluster

### 1. Create or reuse a GCP service account

Create a GCP service account (or reuse an existing one):

```bash
gcloud iam service-accounts create fluentbit-gsa \
  --display-name "FluentBit logging service account" \
  --project <PROJECT_ID>
```

### 2. Grant storage permissions

Grant the service account write access to the metadata bucket:

```bash
gcloud storage buckets add-iam-policy-binding gs://<BUCKET_NAME> \
  --member "serviceAccount:fluentbit-gsa@<PROJECT_ID>.iam.gserviceaccount.com" \
  --role "roles/storage.objectAdmin"
```

### 3. Bind the Kubernetes service account to the GCP service account

Allow the `fluentbit-system` Kubernetes service account to impersonate the GCP service account:

```bash
gcloud iam service-accounts add-iam-policy-binding \
  fluentbit-gsa@<PROJECT_ID>.iam.gserviceaccount.com \
  --role "roles/iam.workloadIdentityUser" \
  --member "serviceAccount:<PROJECT_ID>.svc.id.goog[<NAMESPACE>/fluentbit-system]"
```

Replace:

- `<PROJECT_ID>` with your GCP project ID
- `<BUCKET_NAME>` with the name of your data plane metadata bucket
- `<NAMESPACE>` with the namespace where the data plane is installed (default: `union`)

### 4. Configure the GCP Helm values

Set the Workload Identity annotation on the FluentBit service account in your data plane Helm values:

```yaml
fluentbit:
  serviceAccount:
    annotations:
      iam.gke.io/gcp-service-account: "fluentbit-gsa@<PROJECT_ID>.iam.gserviceaccount.com"
```

## Disabling persistent logs

To disable persistent logging entirely, set the following in your Helm values:

```yaml
fluentbit:
  enabled: false
```

=== PAGE: https://www.union.ai/docs/v2/union/deployment/selfmanaged/configuration/monitoring ===

# Monitoring

The Union.ai data plane deploys a static [Prometheus](https://prometheus.io/) instance that collects metrics required for platform features like cost tracking, task-level resource monitoring, and execution observability. This Prometheus instance is pre-configured and requires no additional setup.

For operational monitoring of the cluster itself (node health, API server metrics, CoreDNS, etc.), the data plane chart includes an optional [kube-prometheus-stack](https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack) instance that can be enabled separately.

## Architecture overview

The data plane supports two independent monitoring concerns:

| Concern | What it monitors | How it's deployed | Configurable |
|---------|-----------------|-------------------|--------------|
| **Union features** | Task execution metrics, cost tracking, GPU utilization, container resources | Prometheus with pre-built scrape config (`prometheus` or `prometheus-simple`) | Retention, resources, scheduling |
| **Cluster health** (optional) | Kubernetes components, node health, alerting, Grafana dashboards | `kube-prometheus-stack` via `monitoring.enabled` | Full kube-prometheus-stack values |

The chart offers two Prometheus deployment options for Union features:

| Option | Helm key | CRDs required | Cluster-wide RBAC | Best for |
|--------|----------|--------------|-------------------|----------|
| **Static Prometheus** (default) | `prometheus` | No | Yes | Standard deployments |
| **Prometheus Simple** | `prometheus-simple` | No | No | Low-privilege / single-namespace deployments |

> [!NOTE] Mutual exclusivity
> `prometheus` and `prometheus-simple` cannot be enabled at the same time. The chart will fail validation if both are enabled.

```
                    ┌─────────────────────────────────────┐
                    │          Data Plane Cluster         │
                    │                                     │
                    │  ┌──────────────────────┐           │
                    │  │  Prometheus          │           │
                    │  │  (Union features)    │           │
                    │  │  ┌────────────────┐  │           │
                    │  │  │ Scrape targets │  │           │
                    │  │  │ - kube-state   │  │           │
                    │  │  │ - cAdvisor     │  │           │
                    │  │  │ - propeller    │  │           │
                    │  │  │ - opencost     │  │           │
                    │  │  │ - dcgm (GPU)   │  │           │
                    │  │  │ - envoy        │  │           │
                    │  │  └────────────────┘  │           │
                    │  └──────────────────────┘           │
                    │                                     │
                    │  ┌──────────────────────┐           │
                    │  │  kube-prometheus     │           │
                    │  │  -stack (optional)   │           │
                    │  │  - Prometheus        │           │
                    │  │  - Alertmanager      │           │
                    │  │  - Grafana           │           │
                    │  │  - node-exporter     │           │
                    │  └──────────────────────┘           │
                    └─────────────────────────────────────┘
```

## Union features Prometheus

The static Prometheus instance is always deployed and pre-configured to scrape the metrics that Union.ai requires. No Prometheus Operator or CRDs are needed. This instance is a platform dependency and should not be replaced or reconfigured.

### Scrape targets

The following targets are scraped automatically:

| Job | Metrics collected | Used for |
|-----|--------|------------------|
| `kube-state-metrics` | Pod/node resource requests, limits, status, capacity | Cost calculations, resource tracking |
| `kubernetes-cadvisor` | Container CPU and memory usage via kubelet | Task-level resource monitoring |
| `flytepropeller` | Execution round info, fast task duration | Execution observability |
| `opencost` | Node hourly cost rates (CPU, RAM, GPU) | Cost tracking |
| `gpu-metrics` | DCGM exporter metrics (when `dcgm-exporter.enabled`) | GPU utilization |
| `serving-envoy` | Envoy upstream request counts and latency (when `serving.enabled`) | Inference serving metrics |

### Configuration

The static Prometheus instance is configured under the `prometheus` key in your data plane values:

```yaml
prometheus:
  image:
    repository: prom/prometheus
    tag: v3.3.1
  # Data retention period
  retention: 3d
  # Route prefix for the web UI and API
  routePrefix: /prometheus/
  resources:
    limits:
      cpu: "3"
      memory: "3500Mi"
    requests:
      cpu: "1"
      memory: "1Gi"
  serviceAccount:
    create: true
    annotations: {}
  priorityClassName: system-cluster-critical
  nodeSelector: {}
  tolerations: []
  affinity: {}
```

> [!NOTE] Retention and storage
> The default 3-day retention is sufficient for Union.ai features. Increase `retention` if you query historical feature metrics directly.

### Internal service endpoint

Other data plane components reach Prometheus at:

```
http://union-operator-prometheus.<NAMESPACE>.svc:80/prometheus
```

OpenCost is pre-configured to use this endpoint. You do not need to change it unless you rename the Helm release.
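
If you want to inspect this instance directly, you can port-forward the service and browse the UI at the configured route prefix (a sketch; adjust the namespace if your data plane is not installed in `union`):

```bash
# Forward the Prometheus service locally, then open http://localhost:9090/prometheus
kubectl port-forward -n union svc/union-operator-prometheus 9090:80
```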

## Prometheus Simple (low-privilege mode)

For deployments that cannot use cluster-wide RBAC (e.g., single-namespace or low-privilege mode), enable `prometheus-simple` instead of the default static Prometheus:

```yaml
prometheus:
  enabled: false
prometheus-simple:
  enabled: true
  rbac:
    create: false  # Namespace-scoped Role is created by the dataplane chart
  kube-state-metrics:
    enabled: true
    rbac:
      useClusterRole: false
    releaseNamespace: true
```

This deploys a standalone Prometheus instance with namespace-scoped RBAC. The dataplane chart creates the necessary Role and RoleBinding automatically.

> [!NOTE] Node-level metrics
> In low-privilege mode, kube-state-metrics only watches the release namespace. Pod-level metrics (`kube_pod_*`, `kube_pod_container_*`) are available, but node-level metrics (`kube_node_*`) are not, since nodes are cluster-scoped resources.

### Recording rules

The chart includes pre-built recording rules for cost tracking and execution observability (GPU allocation, execution metadata, workspace metrics). These rules are:

- Embedded in a `PrometheusRule` when using `kube-prometheus-stack`
- Embedded in a ConfigMap when using `prometheus-simple`
- Only enabled when `cost.enabled: true` and the deployment is not in low-privilege mode

## Enabling cluster health monitoring

To enable operational monitoring with Prometheus Operator, Alertmanager, Grafana, and node-exporter:

```yaml
monitoring:
  enabled: true
```

This deploys a full [kube-prometheus-stack](https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack) instance with sensible defaults:

- Prometheus with 7-day retention
- Grafana with admin credentials (override `monitoring.grafana.adminPassword` in production)
- Node exporter, kube-state-metrics, kubelet, CoreDNS, API server, etcd, and scheduler monitoring
- Default alerting and recording rules

### Prometheus Operator CRDs

The `kube-prometheus-stack` uses the Prometheus Operator, which discovers scrape targets and alerting rules through Kubernetes CRDs (ServiceMonitor, PodMonitor, PrometheusRule, etc.). If you prefer to use static scrape configs with your own Prometheus instead, see **Self-managed deployment > Advanced Configurations > Monitoring > Scraping Union services from your own Prometheus**.

To install the CRDs, use the `dataplane-crds` chart:

```yaml
# dataplane-crds values
crds:
  flyte: true
  prometheusOperator: true  # Install Prometheus Operator CRDs
```

Then install or upgrade the CRDs chart before the data plane chart:

```shell
helm upgrade --install union-dataplane-crds unionai/dataplane-crds \
  --namespace union \
  --set crds.prometheusOperator=true
```

> [!NOTE] CRD installation order
> CRDs must be installed before the data plane chart. The `dataplane-crds` chart should be deployed first, and the monitoring stack's own CRD installation is disabled (`monitoring.crds.enabled: false`) to avoid conflicts.
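
You can confirm the Prometheus Operator CRDs are registered before installing or upgrading the data plane chart (a quick check):

```bash
kubectl get crd | grep monitoring.coreos.com
```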

### Customizing the monitoring stack

The monitoring stack accepts all [kube-prometheus-stack values](https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack#configuration) under the `monitoring` key. Common overrides:

```yaml
monitoring:
  enabled: true

  # Grafana
  grafana:
    enabled: true
    adminPassword: "my-secure-password"
    ingress:
      enabled: true
      ingressClassName: nginx
      hosts:
        - grafana.example.com

  # Prometheus retention and resources
  prometheus:
    prometheusSpec:
      retention: 30d
      resources:
        requests:
          memory: "2Gi"

  # Alertmanager
  alertmanager:
    enabled: true
    # Configure receivers, routes, etc.
```

The monitoring stack's Prometheus supports [remote write](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#remote_write) for forwarding metrics to external time-series databases (Amazon Managed Prometheus, Grafana Cloud, Thanos, etc.):

```yaml
monitoring:
  prometheus:
    prometheusSpec:
      remoteWrite:
        - url: "https://aps-workspaces.<REGION>.amazonaws.com/workspaces/<WORKSPACE_ID>/api/v1/remote_write"
          sigv4:
            region: <REGION>
```

For the full set of configurable values, see the [kube-prometheus-stack chart documentation](https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack).

## Scraping Union services from your own Prometheus

If you already run Prometheus in your cluster, you can scrape Union.ai data plane services for operational visibility. All services expose metrics on standard ports.

> [!NOTE] Union features Prometheus
> The built-in static Prometheus handles all metrics required for Union.ai platform features. Scraping from your own Prometheus is for additional operational visibility only -- it does not replace the built-in instance.

### Static scrape configs

Add these jobs to your Prometheus configuration:

```yaml
scrape_configs:
  # Data plane service metrics (operator, propeller, etc.)
  - job_name: union-dataplane-services
    kubernetes_sd_configs:
      - role: endpoints
        namespaces:
          names: [union]
    relabel_configs:
      - source_labels: [__meta_kubernetes_service_label_app_kubernetes_io_instance]
        regex: union-dataplane
        action: keep
      - source_labels: [__meta_kubernetes_endpoint_port_name]
        regex: debug
        action: keep
```

### ServiceMonitor (Prometheus Operator)

If you run the Prometheus Operator, create a ServiceMonitor instead:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: union-dataplane-services
  namespace: union
spec:
  selector:
    matchLabels:
      app.kubernetes.io/instance: union-dataplane
  namespaceSelector:
    matchNames:
      - union
  endpoints:
    - port: debug
      path: /metrics
      interval: 30s
```

This requires the Prometheus Operator CRDs. Install them via the `dataplane-crds` chart with `crds.prometheusOperator: true`.

## Further reading

- [Prometheus documentation](https://prometheus.io/docs/introduction/overview/) -- comprehensive guide to Prometheus configuration, querying, and operation
- [Prometheus remote write](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#remote_write) -- forwarding metrics to external storage
- [Prometheus `kubernetes_sd_config`](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#kubernetes_sd_config) -- Kubernetes service discovery for scrape targets
- [kube-prometheus-stack chart](https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack) -- full monitoring stack with Grafana and alerting
- [OpenCost documentation](https://www.opencost.io/docs/) -- cost allocation and tracking

=== PAGE: https://www.union.ai/docs/v2/union/deployment/selfmanaged/configuration/union-secrets ===

# Secrets

[Union Secrets](https://www.union.ai/docs/v2/union/user-guide/task-configuration/secrets) are enabled by default. Union Secrets are managed secrets created through the native Kubernetes secret manager.

The only configurable option is the namespace where the secret is stored. To override the default behavior, set `proxy.secretManager.namespace` in the values file used by the Helm chart. If this is not specified, the release namespace (`union` by default) is used.

Example:
```yaml
proxy:
  secretManager:
    # -- Set the namespace for union managed secrets created through the native Kubernetes secret manager. If the namespace is not set,
    # the release namespace will be used.
    namespace: "secret"
```

=== PAGE: https://www.union.ai/docs/v2/union/deployment/selfmanaged/configuration/data-retention ===

Implications of object storage retention or lifecycle policies on the default bucket and metadata.

# Data retention policies

Union.ai relies on object storage for both **metadata** and **raw data** (your data that is passing through the workflow). Bucket-level retention and lifecycle policies (such as S3 lifecycle rules) that affect the metadata store can cause execution failures, broken history, and data loss.

## How Union.ai uses the default bucket

The platform uses a **default object store bucket** in the data plane for two distinct purposes:

1. **Metadata store** — References, execution state, and pointers to task outputs. The control plane and UI use this metadata to schedule workflows, resolve task dependencies, display execution history, and resolve output locations. This data is required for the correct operation of the platform.

2. **Raw data store** — Large task inputs and outputs or complex types (for example `FlyteFile`, dataframes, etc.). The metadata store holds only pointers to these blobs; the actual bytes live in the raw data store.

Because the **default bucket contains the metadata store**, it must be treated as **durable storage**. Retention or lifecycle policies that delete or overwrite objects in this bucket are **not supported** and can lead to data loss and system failure. There is **no supported way** to recover from metadata loss.

## Impact of metadata loss

| Area | Impact |
|------|--------|
| **UI and APIs** | Execution list or detail views may show errors or "resource not found." Output previews may fail to load. |
| **Execution engine** | In-flight or downstream tasks that depend on a node's output can fail. Retry state may be lost. |
| **Caching** | Pointers to cached outputs may be lost, resulting in cache misses; tasks may re-run or fail. |
| **Traces** | [Trace](https://www.union.ai/docs/v2/union/user-guide/task-programming/traces) checkpoint data (used by `@flyte.trace` for fine-grained recovery from system failures) may be lost, preventing resume-from-checkpoint. |
| **Data** | Raw blobs may still exist, but without metadata the system has no pointers to them. That data becomes **orphaned**. Downstream tasks that consume outputs by reference will fail at runtime. |
| **Operations** | Audit trails and the record of what ran, when, and with what outputs are lost. |

## Retention on a separate raw-data location

If you separate raw data from metadata, you can apply retention policies **only to the raw data location** while keeping metadata durable. This is the only supported approach for applying retention. You can do this either by configuring separate buckets using `configuration.storage.metadataContainer` and `configuration.storage.userDataContainer` in the [data plane chart](https://github.com/unionai/helm-charts/blob/master/charts/dataplane/values.yaml), or by using a metadata prefix within the same bucket (see **Self-managed deployment > Advanced Configurations > Data retention policies > Customizing the metadata path** below).

Be aware of the trade-offs:

- **Historical executions** that reference purged raw data will fail.
- **Cached task outputs** stored as raw data will be lost, causing cache misses and task re-execution.
- **Trace checkpoints** stored in the raw-data location will be purged, preventing resume-from-checkpoint for affected executions.

Data correctness is not silently violated, but the benefits of caching and trace-based recovery are lost for purged data.

## Customizing the metadata path

You can control where metadata is stored within the bucket via the **`config.core.propeller.metadata-prefix`** setting (e.g. `metadata/propeller` in the [data plane chart values](https://github.com/unionai/helm-charts/blob/master/charts/dataplane/values.yaml)). This lets you design lifecycle rules that **exclude** the metadata prefix (for example, in S3 lifecycle rules, apply expiration only to prefixes that do not include the metadata path) so that only non-metadata paths are subject to retention.
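
As an illustration, an S3 lifecycle rule scoped to a raw-data prefix (and therefore not touching the metadata prefix) could look like the following (a sketch only; the prefix, bucket name, and retention period are placeholders you must adapt and validate):

```bash
cat > lifecycle.json <<'EOF'
{
  "Rules": [
    {
      "ID": "expire-raw-data-only",
      "Status": "Enabled",
      "Filter": { "Prefix": "data/" },
      "Expiration": { "Days": 30 }
    }
  ]
}
EOF
aws s3api put-bucket-lifecycle-configuration \
  --bucket <BUCKET_NAME> \
  --lifecycle-configuration file://lifecycle.json
```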

Confirm the exact prefix and bucket layout for your deployment from the chart configuration, and validate any retention rules in a non-production environment before applying them broadly.

=== PAGE: https://www.union.ai/docs/v2/union/deployment/selfmanaged/configuration/plugins ===

# Compute plugins

Union.ai supports distributed computing plugins that extend the platform with specialized workloads like [Dask](https://www.union.ai/docs/v2/union/deployment/integrations/dask/_index) and [Ray](https://www.union.ai/docs/v2/union/deployment/integrations/ray/_index). These plugins require their respective Kubernetes operators to be installed on your data plane cluster, along with Helm configuration to enable the plugin and configure log and dashboard links.

## Dask

[Dask](https://www.dask.org/) is a flexible parallel computing library for analytics. The Dask plugin enables you to run distributed Dask workloads on your Union.ai cluster.

### Install the Dask operator

Install the Dask Kubernetes operator using Helm:

```bash
helm repo add dask https://helm.dask.org
helm repo update
helm upgrade --install dask-kubernetes-operator dask/dask-kubernetes-operator \
  --create-namespace \
  --namespace dask-operator \
  --version 2024.4.1 \
  --timeout 600s
```
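
Before enabling the plugin, you can confirm the operator is healthy and its CRDs are registered (a quick check):

```bash
kubectl get pods -n dask-operator
kubectl get crd | grep -i dask
```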

### Configure the data plane Helm values

Add the following to your data plane Helm values to enable the Dask plugin and configure log and dashboard links.

### AWS

```yaml
config:
  enabled_plugins:
    tasks:
      task-plugins:
        enabled-plugins:
          - connector-service
          - container
          - dask
          - echo
          - fast-task
          - k8s-array
          - sidecar
        default-for-task-types:
          dask: dask

  task_logs:
    plugins:
      dask:
        logs:
          cloudwatch-enabled: false
          kubernetes-enabled: false
          templates:
            - displayName: "Cloudwatch Logs"
              scheme: TaskExecution
              templateUris:
                - 'https://{{ ternary .Values.storage.region "us-east-2" (eq .Values.storage.provider "s3") }}.console.aws.amazon.com/cloudwatch/home?region={{ ternary .Values.storage.region "us-east-2" (eq .Values.storage.provider "s3") }}#logsV2:log-groups/log-group/$252Funion$252Fcluster-{{.Values.clusterName}}$252Ftask/log-events/kube.namespace-{{`{{.namespace}}`}}.pod-{{`{{.podName}}`}}.cont-job-runner'
            - displayName: Dask Dashboard
              linkType: dashboard
              scheme: TaskExecution
              templateUris:
                - "/dataplane/dask/v1/generated_name/task/{{`{{.executionProject}}`}}/{{`{{.executionDomain}}`}}/{{`{{.executionName}}`}}/{{`{{.nodeID}}`}}/{{`{{.taskRetryAttempt}}`}}/{{.Values.clusterName}}/{{`{{.namespace}}`}}/{{`{{.taskProject}}`}}/{{`{{.taskDomain}}`}}/{{`{{.taskID}}`}}/{{`{{.taskVersion}}`}}/{{`{{.generatedName}}`}}/status"
            - displayName: Dask Runner logs
              scheme: TaskExecution
              templateUris:
                - "/{{`{{.executionProject}}`}}/domains/{{`{{.executionDomain}}`}}/executions/{{`{{.executionName}}`}}/nodeId/{{`{{.nodeID}}`}}/taskId/{{`{{.taskID}}`}}/attempt/{{`{{.taskRetryAttempt}}`}}/view/logs?duration=all&fromExecutionNav=true"
```

### GCP

```yaml
config:
  enabled_plugins:
    tasks:
      task-plugins:
        enabled-plugins:
          - connector-service
          - container
          - dask
          - echo
          - fast-task
          - k8s-array
          - sidecar
        default-for-task-types:
          dask: dask

  task_logs:
    plugins:
      dask:
        logs:
          cloudwatch-enabled: false
          kubernetes-enabled: false
          templates:
            - displayName: "Stackdriver Logs"
              scheme: TaskExecution
              templateUris:
                - "https://console.cloud.google.com/logs/query;query=resource.labels.namespace_name%3D%22{{`{{.namespace}}`}}%22%0Aresource.labels.pod_name%3D%22{{`{{.podName}}`}}%22%0Aresource.labels.container_name%3D%22job-runner%22?project={{.Values.storage.gcp.projectId}}&angularJsUrl=%2Flogs%2Fviewer%3Fproject%3D{{.Values.storage.gcp.projectId}}"
            - displayName: Dask Dashboard
              linkType: dashboard
              scheme: TaskExecution
              templateUris:
                - "/dataplane/dask/v1/generated_name/task/{{`{{.executionProject}}`}}/{{`{{.executionDomain}}`}}/{{`{{.executionName}}`}}/{{`{{.nodeID}}`}}/{{`{{.taskRetryAttempt}}`}}/{{.Values.clusterName}}/{{`{{.namespace}}`}}/{{`{{.taskProject}}`}}/{{`{{.taskDomain}}`}}/{{`{{.taskID}}`}}/{{`{{.taskVersion}}`}}/{{`{{.generatedName}}`}}/status"
            - displayName: Dask Runner logs
              scheme: TaskExecution
              templateUris:
                - "/{{`{{.executionProject}}`}}/domains/{{`{{.executionDomain}}`}}/executions/{{`{{.executionName}}`}}/nodeId/{{`{{.nodeID}}`}}/taskId/{{`{{.taskID}}`}}/attempt/{{`{{.taskRetryAttempt}}`}}/view/logs?duration=all&fromExecutionNav=true"
```

## Ray

[Ray](https://www.ray.io/) is a unified framework for scaling AI and Python applications. The Ray plugin enables you to run distributed Ray workloads on your Union.ai cluster.

### Install the KubeRay operator

Install the KubeRay CRDs and operator:

```bash
kubectl create -k "https://github.com/ray-project/kuberay/ray-operator/config/crd?ref=v1.1.0"
helm repo add kuberay https://ray-project.github.io/kuberay-helm/
helm repo update
helm upgrade --install kuberay-operator kuberay/kuberay-operator \
  --create-namespace \
  --namespace kuberay-operator \
  --version 1.1.0 \
  --set resources.limits.memory=1Gi \
  --skip-crds
```
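
As with Dask, you can confirm the KubeRay operator is healthy and its CRDs are registered before enabling the plugin (a quick check):

```bash
kubectl get pods -n kuberay-operator
kubectl get crd | grep ray.io
```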

### Configure the data plane Helm values

Add the following to your data plane Helm values to enable the Ray plugin and configure log and dashboard links.

### AWS

```yaml
config:
  enabled_plugins:
    tasks:
      task-plugins:
        enabled-plugins:
          - connector-service
          - container
          - echo
          - fast-task
          - k8s-array
          - ray
          - sidecar
        default-for-task-types:
          ray: ray

  task_logs:
    plugins:
      ray:
        logs:
          templates:
            - displayName: "Ray Dashboard"
              linkType: dashboard
              scheme: TaskExecution
              templateUris:
                - "/dataplane/ray/v1/generated_name/task/{{`{{.executionProject}}`}}/{{`{{.executionDomain}}`}}/{{`{{.executionName}}`}}/{{`{{.nodeID}}`}}/{{`{{.taskRetryAttempt}}`}}/{{.Values.clusterName}}/{{`{{.namespace}}`}}/{{`{{.taskProject}}`}}/{{`{{.taskDomain}}`}}/{{`{{.taskID}}`}}/{{`{{.taskVersion}}`}}/{{`{{.generatedName}}`}}/"
            - displayName: "Cloudwatch Logs (Ray All)"
              scheme: TaskExecution
              templateUris:
                - 'https://{{ternary .Values.storage.region "us-east-2" (eq .Values.storage.provider "s3")}}.console.aws.amazon.com/cloudwatch/home?region={{ternary .Values.storage.region "us-east-2" (eq .Values.storage.provider "s3")}}#logsV2:log-groups/log-group/$252Funion$252Fcluster-{{.Values.clusterName}}$252Ftask$3FlogStreamNameFilter$3Dkube.namespace-{{`{{.namespace}}`}}.pod-{{`{{.executionName}}`}}-{{`{{.nodeID}}`}}-{{`{{.taskRetryAttempt}}`}}'
            - displayName: Ray Head logs
              scheme: TaskExecution
              templateUris:
                - "/{{`{{.executionProject}}`}}/domains/{{`{{.executionDomain}}`}}/executions/{{`{{.executionName}}`}}/nodeId/{{`{{.nodeID}}`}}/taskId/{{`{{.taskID}}`}}/attempt/{{`{{.taskRetryAttempt}}`}}/view/logs?duration=all&fromExecutionNav=true"
```

### GCP

```yaml
config:
  enabled_plugins:
    tasks:
      task-plugins:
        enabled-plugins:
          - connector-service
          - container
          - echo
          - fast-task
          - k8s-array
          - ray
          - sidecar
        default-for-task-types:
          ray: ray

  task_logs:
    plugins:
      ray:
        logs:
          templates:
            - displayName: "Ray Dashboard"
              linkType: dashboard
              scheme: TaskExecution
              templateUris:
                - "/dataplane/ray/v1/generated_name/task/{{`{{.executionProject}}`}}/{{`{{.executionDomain}}`}}/{{`{{.executionName}}`}}/{{`{{.nodeID}}`}}/{{`{{.taskRetryAttempt}}`}}/{{.Values.clusterName}}/{{`{{.namespace}}`}}/{{`{{.taskProject}}`}}/{{`{{.taskDomain}}`}}/{{`{{.taskID}}`}}/{{`{{.taskVersion}}`}}/{{`{{.generatedName}}`}}/"
            - displayName: "Stackdriver Logs (Ray All)"
              scheme: TaskExecution
              templateUris:
                - "https://console.cloud.google.com/logs/query;query=resource.labels.namespace_name%3D%22{{`{{.namespace}}`}}%22%0Aresource.labels.pod_name%3D%7E%22{{`{{.executionName}}`}}-{{`{{.nodeID}}`}}-{{`{{.taskRetryAttempt}}`}}%22?project={{.Values.storage.gcp.projectId}}&angularJsUrl=%2Flogs%2Fviewer%3Fproject%3D{{.Values.storage.gcp.projectId}}"
            - displayName: Ray Head logs
              scheme: TaskExecution
              templateUris:
                - "/{{`{{.executionProject}}`}}/domains/{{`{{.executionDomain}}`}}/executions/{{`{{.executionName}}`}}/nodeId/{{`{{.nodeID}}`}}/taskId/{{`{{.taskID}}`}}/attempt/{{`{{.taskRetryAttempt}}`}}/view/logs?duration=all&fromExecutionNav=true"
```

=== PAGE: https://www.union.ai/docs/v2/union/deployment/selfmanaged/configuration/namespace-mapping ===

# Namespace mapping

By default, Union.ai maps each project-domain pair to a Kubernetes namespace using the pattern `{project}-{domain}`. For example, the project `flytesnacks` in domain `development` runs workloads in namespace `flytesnacks-development`.

You can customize this mapping by setting the `namespace_mapping.template` value in your Helm configuration.

## Template syntax

The template uses Go template syntax with two variables:

- `{{ project }}` — the project name
- `{{ domain }}` — the domain name (e.g., `development`, `staging`, `production`)

### Examples

| Template | Project | Domain | Resulting namespace |
|----------|---------|--------|---------------------|
| `{{ project }}-{{ domain }}` (default) | `flytesnacks` | `development` | `flytesnacks-development` |
| `{{ domain }}` | `flytesnacks` | `development` | `development` |
| `myorg-{{ project }}-{{ domain }}` | `flytesnacks` | `development` | `myorg-flytesnacks-development` |

> [!WARNING]
> Changing namespace mapping after workflows have run will cause existing data in old namespaces to become inaccessible. Plan your namespace mapping before initial deployment.

## Data plane configuration

Set the `namespace_mapping` value at the top level of your dataplane Helm values. This single value cascades to all services that need it: clusterresourcesync, propeller, operator, and executor.

```yaml
namespace_mapping:
  template: "myorg-{{ '{{' }} project {{ '}}' }}-{{ '{{' }} domain {{ '}}' }}"
```

> [!NOTE]
> The Go template delimiters must be escaped so that Helm does not render `{{ project }}` and `{{ domain }}` when processing the values. In your values file, escape the `{{` and `}}` delimiters around `project` and `domain` as shown above.

## How it works

Namespace mapping controls several components:

| Component | Role |
|-----------|------|
| **Clusterresourcesync** | Creates Kubernetes namespaces and per-namespace resources (service accounts, resource quotas) based on the mapping |
| **Propeller** | Resolves the target namespace when scheduling workflow pods |
| **Operator** | Resolves the target namespace for operator-managed resources |
| **Executor** | Resolves the target namespace for task execution |
| **Flyteadmin** (control plane) | Determines the target namespace when creating V1 executions |

All components must agree on the mapping. The dataplane chart's top-level `namespace_mapping` value is the canonical source that cascades to clusterresourcesync, propeller, operator, and executor automatically. You should **not** set per-service overrides.
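
Once cluster resources have synced, you can confirm that namespaces matching the new template exist (a quick check using the hypothetical `myorg-` prefix from the example above):

```bash
kubectl get namespaces | grep '^myorg-'
```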

=== PAGE: https://www.union.ai/docs/v2/union/deployment/selfmanaged/helm-chart-reference ===

# Helm chart reference

> **📝 Note**
>
> An LLM-optimized bundle of this entire section is available at [`section.md`](section.md).
> This single file contains all pages in this section, optimized for AI coding agent context.

A full list of Helm values available for configuration can be found here:

* **Self-managed deployment > Helm chart reference > Dataplane**
* **Self-managed deployment > Helm chart reference > Knative Operator**

=== PAGE: https://www.union.ai/docs/v2/union/deployment/selfmanaged/helm-chart-reference/dataplane ===

Deploys the Union dataplane components to onboard a Kubernetes cluster to the Union Cloud.

## Chart info

| | |
|---|---|
| **Chart version** | 2026.3.12 |
| **App version** | 2026.3.9 |
| **Kubernetes version** | `>= 1.28.0-0` |

## Dependencies

| Repository | Name | Version |
|------------|------|---------|
| https://fluent.github.io/helm-charts | fluentbit(fluent-bit) | 0.48.9 |
| https://kubernetes-sigs.github.io/metrics-server/ | metrics-server(metrics-server) | 3.12.2 |
| https://kubernetes.github.io/ingress-nginx | ingress-nginx | 4.12.3 |
| https://nvidia.github.io/dcgm-exporter/helm-charts | dcgm-exporter | 4.7.1 |
| https://opencost.github.io/opencost-helm-chart | opencost | 1.42.0 |
| https://prometheus-community.github.io/helm-charts | monitoring(kube-prometheus-stack) | 80.8.0 |
| https://prometheus-community.github.io/helm-charts | kube-state-metrics | 5.30.1 |
| https://unionai.github.io/helm-charts | knative-operator(knative-operator) | 2025.5.0 |

## Values

| Key | Type | Description | Default |
|-----|------|-------------|---------|
| additionalPodAnnotations | object | Define additional pod annotations for all of the Union pods. | `{}` |
| additionalPodEnvVars | object | Define additional pod environment variables for all of the Union pods. | `{}` |
| additionalPodLabels | object | Define additional pod labels for all of the Union pods. | `{}` |
| additionalPodSpec | object | Define additional PodSpec values for all of the Union pods. | `{}` |
| clusterName | string | Cluster name should be shared with Union for proper functionality. | `"{{ .Values.global.CLUSTER_NAME }}"` |
| clusterresourcesync | object | clusterresourcesync contains the configuration information for the syncresources service. | `(see values.yaml)` |
| clusterresourcesync.additionalTemplates | list | Additional cluster resource templates to create per project namespace. Use this instead of overriding `templates` to avoid accidentally removing the default namespace, service account, and resource quota templates. Each entry has a `key` (filename stem) and `value` (Kubernetes manifest). | `[]` |
| clusterresourcesync.additionalVolumeMounts | list | Appends additional volume mounts to the main container's spec. May include template values. | `[]` |
| clusterresourcesync.additionalVolumes | list | Appends additional volumes to the deployment spec. May include template values. | `[]` |
| clusterresourcesync.affinity | object | affinity configurations for the syncresources pods | `{}` |
| clusterresourcesync.config | object | Syncresources service configuration | `(see values.yaml)` |
| clusterresourcesync.config.clusterResourcesPrivate | object | Additional configuration for the cluster resources service | `{"app":{"isServerless":false}}` |
| clusterresourcesync.config.clusterResourcesPrivate.app | object | Configuration of app serving services. | `{"isServerless":false}` |
| clusterresourcesync.config.cluster_resources.clusterName | string | The name of the cluster.  This should always be the same as the cluster name in the config. | `"{{ include \"getClusterName\" . }}"` |
| clusterresourcesync.config.cluster_resources.refreshInterval | string | How frequently to sync the cluster resources | `"5m"` |
| clusterresourcesync.config.cluster_resources.standaloneDeployment | bool | Start the cluster resource manager in standalone mode. | `true` |
| clusterresourcesync.config.cluster_resources.templatePath | string | The path to the templates used to configure project resource quotas. | `"/etc/flyte/clusterresource/templates"` |
| clusterresourcesync.config.union | object | Connection information for the sync resources service to connect to the Union control plane. | `(see values.yaml)` |
| clusterresourcesync.config.union.connection.host | string | Host to connect to | `"dns:///{{ tpl .Values.host . }}"` |
| clusterresourcesync.enabled | bool | Enable or disable the syncresources service | `true` |
| clusterresourcesync.nodeName | string | nodeName constraints for the syncresources pods | `""` |
| clusterresourcesync.nodeSelector | object | nodeSelector constraints for the syncresources pods | `{}` |
| clusterresourcesync.podAnnotations | object | Additional pod annotations for the syncresources service | `{}` |
| clusterresourcesync.podEnv | object | Additional pod environment variables for the syncresources service | `{}` |
| clusterresourcesync.resources | object | Kubernetes resource configuration for the syncresources service | `{"limits":{"cpu":"1","memory":"500Mi"},"requests":{"cpu":"500m","memory":"100Mi"}}` |
| clusterresourcesync.serviceAccount | object | Override service account values for the syncresources service | `{"annotations":{},"name":""}` |
| clusterresourcesync.serviceAccount.annotations | object | Additional annotations for the syncresources service account | `{}` |
| clusterresourcesync.serviceAccount.name | string | Override the service account name for the syncresources service | `""` |
| clusterresourcesync.templates | list | The templates that are used to create and/or update kubernetes resources for Union projects. | `(see values.yaml)` |
| clusterresourcesync.templates[0] | object | Template for namespaces resources | `(see values.yaml)` |
| clusterresourcesync.templates[1] | object | Patch default service account | `(see values.yaml)` |
| clusterresourcesync.tolerations | list | tolerations for the syncresources pods | `[]` |
| clusterresourcesync.topologySpreadConstraints | object | topologySpreadConstraints for the syncresources pods | `{}` |
| config | object | Global configuration settings for all Union services. | `(see values.yaml)` |
| config.admin | object | Admin Client configuration [structure](https://pkg.go.dev/github.com/flyteorg/flytepropeller/pkg/controller/nodes/subworkflow/launchplan#AdminConfig) | `(see values.yaml)` |
| config.catalog | object | Catalog Client configuration [structure](https://pkg.go.dev/github.com/flyteorg/flytepropeller/pkg/controller/nodes/task/catalog#Config) Additional advanced Catalog configuration [here](https://pkg.go.dev/github.com/lyft/flyteplugins/go/tasks/pluginmachinery/catalog#Config) | `(see values.yaml)` |
| config.configOverrides | object | Override any configuration settings. | `{"cache":{"identity":{"enabled":false}}}` |
| config.copilot | object | Copilot configuration | `(see values.yaml)` |
| config.copilot.plugins.k8s.co-pilot | object | Structure documented [here](https://pkg.go.dev/github.com/lyft/flyteplugins@v0.5.28/go/tasks/pluginmachinery/flytek8s/config#FlyteCoPilotConfig) | `(see values.yaml)` |
| config.core | object | Core propeller configuration | `(see values.yaml)` |
| config.core.propeller | object | follows the structure specified [here](https://pkg.go.dev/github.com/flyteorg/flytepropeller/pkg/controller/config). | `(see values.yaml)` |
| config.domain | object | Domains configuration for Union projects. This enables the specified number of domains across all projects in Union. | `(see values.yaml)` |
| config.enabled_plugins.tasks | object | Tasks specific configuration [structure](https://pkg.go.dev/github.com/flyteorg/flytepropeller/pkg/controller/nodes/task/config#GetConfig) | `(see values.yaml)` |
| config.enabled_plugins.tasks.task-plugins | object | Plugins configuration, [structure](https://pkg.go.dev/github.com/flyteorg/flytepropeller/pkg/controller/nodes/task/config#TaskPluginConfig) | `(see values.yaml)` |
| config.enabled_plugins.tasks.task-plugins.enabled-plugins | list | [Enabled Plugins](https://pkg.go.dev/github.com/lyft/flyteplugins/go/tasks/config#Config). Enable sagemaker*, athena if you install the backend plugins | `["container","sidecar","k8s-array","echo","fast-task","connector-service"]` |
| config.k8s | object | Kubernetes specific Flyte configuration | `{"plugins":{"k8s":{"default-cpus":"100m","default-env-vars":[],"default-memory":"100Mi"}}}` |
| config.k8s.plugins.k8s | object | Configuration section for all K8s specific plugins [Configuration structure](https://pkg.go.dev/github.com/lyft/flyteplugins/go/tasks/pluginmachinery/flytek8s/config) | `{"default-cpus":"100m","default-env-vars":[],"default-memory":"100Mi"}` |
| config.logger | object | Logging configuration | `{"level":4,"show-source":true}` |
| config.operator | object | Configuration for the Union operator service | `(see values.yaml)` |
| config.operator.apps | object | Enable app serving | `{"enabled":"{{ .Values.serving.enabled }}"}` |
| config.operator.billing | object | Billing model: None, Legacy, or ResourceUsage. | `{"model":"Legacy"}` |
| config.operator.clusterData | object | Dataplane cluster configuration. | `(see values.yaml)` |
| config.operator.clusterData.appId | string | The client id used to authenticate to the control plane.  This will be provided by Union. | `"{{ tpl .Values.secrets.admin.clientId . }}"` |
| config.operator.clusterData.bucketName | string | The bucket name for object storage. | `"{{ tpl .Values.storage.bucketName . }}"` |
| config.operator.clusterData.bucketRegion | string | The bucket region for object storage. | `"{{ tpl .Values.storage.region . }}"` |
| config.operator.clusterData.cloudHostName | string | The host name for control plane access. This will be provided by Union. | `"{{ tpl .Values.host . }}"` |
| config.operator.clusterData.gcpProjectId | string | For GCP only, the project id for object storage. | `"{{ tpl .Values.storage.gcp.projectId . }}"` |
| config.operator.clusterData.metadataBucketPrefix | string | The prefix for constructing object storage URLs. | `"{{ include \"storage.metadata-prefix\" . }}"` |
| config.operator.clusterId | object | Set the cluster information for the operator service | `{"organization":"{{ tpl .Values.orgName . }}"}` |
| config.operator.clusterId.organization | string | The organization name for the cluster.  This should match your organization name that you were provided. | `"{{ tpl .Values.orgName . }}"` |
| config.operator.collectUsages | object | Configuration for the usage reporting service. | `{"enabled":true}` |
| config.operator.collectUsages.enabled | bool | Enable usage collection in the operator service. | `true` |
| config.operator.dependenciesHeartbeat | object | Heartbeat check configuration. | `(see values.yaml)` |
| config.operator.dependenciesHeartbeat.prometheus | object | Define the prometheus health check endpoint. | `{"endpoint":"{{ include \"prometheus.health.url\" . }}"}` |
| config.operator.dependenciesHeartbeat.propeller | object | Define the propeller health check endpoint. | `{"endpoint":"{{ include \"propeller.health.url\" . }}"}` |
| config.operator.dependenciesHeartbeat.proxy | object | Define the operator proxy health check endpoint. | `{"endpoint":"{{ include \"proxy.health.url\" . }}"}` |
| config.operator.enableTunnelService | bool | Enable the cloudflare tunnel service for secure communication with the control plane. | `true` |
| config.operator.enabled | bool | Enables the operator service | `true` |
| config.operator.syncClusterConfig | object | Sync the configuration from the control plane. This will overwrite any configuration values set as part of the deploy. | `{"enabled":false}` |
| config.proxy | object | Configuration for the operator proxy service. | `(see values.yaml)` |
| config.proxy.smConfig | object | Secret manager configuration | `(see values.yaml)` |
| config.proxy.smConfig.enabled | string | Enable or disable secret manager support for the Union dataplane. | `"{{ .Values.proxy.secretManager.enabled }}"` |
| config.proxy.smConfig.k8sConfig | object | Kubernetes specific secret manager configuration. | `{"namespace":"{{ include \"proxy.secretsNamespace\" . }}"}` |
| config.proxy.smConfig.type | string | The type of secret manager to use. | `"{{ .Values.proxy.secretManager.type }}"` |
| config.resource_manager | object | Resource manager configuration | `{"propeller":{"resourcemanager":{"type":"noop"}}}` |
| config.resource_manager.propeller | object | resource manager configuration | `{"resourcemanager":{"type":"noop"}}` |
| config.sharedService | object | Section that configures shared union services | `{"features":{"gatewayV2":true},"port":8081}` |
| config.task_logs | object | Section that configures how the Task logs are displayed on the UI. This has to be changed based on your actual logging provider. Refer to [structure](https://pkg.go.dev/github.com/lyft/flyteplugins/go/tasks/logs#LogConfig) to understand how to configure various logging engines | `(see values.yaml)` |
| config.task_logs.plugins.logs.cloudwatch-enabled | bool | One option is to enable cloudwatch logging for EKS, update the region and log group accordingly | `false` |
| config.task_resource_defaults | object | Task default resources configuration Refer to the full [structure](https://pkg.go.dev/github.com/lyft/flyteadmin@v0.3.37/pkg/runtime/interfaces#TaskResourceConfiguration). | `(see values.yaml)` |
| config.task_resource_defaults.task_resources | object | Task default resources parameters | `{"defaults":{"cpu":"100m","memory":"500Mi"},"limits":{"cpu":4096,"gpu":256,"memory":"2Ti"}}` |
| config.union.connection | object | Connection information to the union control plane. | `{"host":"dns:///{{ tpl .Values.host . }}"}` |
| config.union.connection.host | string | Host to connect to | `"dns:///{{ tpl .Values.host . }}"` |
| cost.enabled | bool | Enable or disable the cost service resources.  This does not include the opencost or other compatible monitoring services. | `true` |
| cost.serviceMonitor.matchLabels | object | Match labels for the ServiceMonitor. | `{"app.kubernetes.io/name":"opencost"}` |
| cost.serviceMonitor.name | string | The name of the ServiceMonitor. | `"cost"` |
| databricks | object | Databricks integration configuration | `{"enabled":false,"plugin_config":{}}` |
| dcgm-exporter | object | Dcgm exporter configuration | `(see values.yaml)` |
| dcgm-exporter.enabled | bool | Enable or disable the dcgm exporter | `false` |
| dcgm-exporter.serviceMonitor | object | ServiceMonitor configuration for the dcgm exporter. It's common practice to taint and label GPU nodes so the dcgm exporter does not run on all nodes; use affinity, nodeSelector, and tolerations to ensure it only runs on GPU nodes. | `{"enabled":false}` |
| executor.additionalVolumeMounts | list | Appends additional volume mounts to the main container's spec. May include template values. | `[]` |
| executor.additionalVolumes | list | Appends additional volumes to the deployment spec. May include template values. | `[]` |
| executor.affinity | object | affinity for executor deployment | `{}` |
| executor.config.cluster | string |  | `"{{ tpl .Values.clusterName . }}"` |
| executor.config.evaluatorCount | int |  | `64` |
| executor.config.maxActions | int |  | `2000` |
| executor.config.organization | string |  | `"{{ tpl .Values.orgName . }}"` |
| executor.config.unionAuth.injectSecret | bool |  | `true` |
| executor.config.unionAuth.secretName | string |  | `"EAGER_API_KEY"` |
| executor.config.workerName | string |  | `"worker1"` |
| executor.enabled | bool |  | `true` |
| executor.idl2Executor | bool |  | `false` |
| executor.nodeName | string | nodeName constraints for executor deployment | `""` |
| executor.nodeSelector | object | nodeSelector for executor deployment | `{}` |
| executor.plugins.fasttask | object | Configuration section for all K8s specific plugins [Configuration structure](https://pkg.go.dev/github.com/lyft/flyteplugins/go/tasks/pluginmachinery/flytek8s/config) | `(see values.yaml)` |
| executor.plugins.ioutils.remoteFileOutputPaths.deckFilename | string |  | `"report.html"` |
| executor.plugins.k8s.disable-inject-owner-references | bool |  | `true` |
| executor.podEnv | list | Appends additional environment variables to the executor container's spec. | `[]` |
| executor.podLabels.app | string |  | `"executor"` |
| executor.propeller.node-config.disable-input-file-writes | bool |  | `true` |
| executor.raw_config | object |  | `{}` |
| executor.resources.limits.cpu | int |  | `4` |
| executor.resources.limits.memory | string |  | `"8Gi"` |
| executor.resources.requests.cpu | int |  | `1` |
| executor.resources.requests.memory | string |  | `"1Gi"` |
| executor.serviceAccount.annotations | object |  | `{}` |
| executor.sharedService.metrics.scope | string |  | `"executor:"` |
| executor.sharedService.security.allowCors | bool |  | `true` |
| executor.sharedService.security.allowLocalhostAccess | bool |  | `true` |
| executor.sharedService.security.allowedHeaders[0] | string |  | `"Content-Type"` |
| executor.sharedService.security.allowedOrigins[0] | string |  | `"*"` |
| executor.sharedService.security.secure | bool |  | `false` |
| executor.sharedService.security.useAuth | bool |  | `false` |
| executor.task_logs.plugins.logs.cloudwatch-enabled | bool | One option is to enable cloudwatch logging for EKS, update the region and log group accordingly | `false` |
| executor.task_logs.plugins.logs.dynamic-log-links[0].vscode.displayName | string |  | `"VS Code Debugger"` |
| executor.task_logs.plugins.logs.dynamic-log-links[0].vscode.linkType | string |  | `"ide"` |
| executor.task_logs.plugins.logs.dynamic-log-links[0].vscode.templateUris[0] | string |  | `(see values.yaml)` |
| executor.task_logs.plugins.logs.dynamic-log-links[1].wandb-execution-id.displayName | string |  | `"Weights & Biases"` |
| executor.task_logs.plugins.logs.dynamic-log-links[1].wandb-execution-id.linkType | string |  | `"dashboard"` |
| executor.task_logs.plugins.logs.dynamic-log-links[1].wandb-execution-id.templateUris[0] | string |  | `(see values.yaml)` |
| executor.task_logs.plugins.logs.dynamic-log-links[2].wandb-custom-id.displayName | string |  | `"Weights & Biases"` |
| executor.task_logs.plugins.logs.dynamic-log-links[2].wandb-custom-id.linkType | string |  | `"dashboard"` |
| executor.task_logs.plugins.logs.dynamic-log-links[2].wandb-custom-id.templateUris[0] | string |  | `(see values.yaml)` |
| executor.task_logs.plugins.logs.dynamic-log-links[3].comet-ml-execution-id.displayName | string |  | `"Comet"` |
| executor.task_logs.plugins.logs.dynamic-log-links[3].comet-ml-execution-id.linkType | string |  | `"dashboard"` |
| executor.task_logs.plugins.logs.dynamic-log-links[3].comet-ml-execution-id.templateUris | string |  | `(see values.yaml)` |
| executor.task_logs.plugins.logs.dynamic-log-links[4].comet-ml-custom-id.displayName | string |  | `"Comet"` |
| executor.task_logs.plugins.logs.dynamic-log-links[4].comet-ml-custom-id.linkType | string |  | `"dashboard"` |
| executor.task_logs.plugins.logs.dynamic-log-links[4].comet-ml-custom-id.templateUris | string |  | `(see values.yaml)` |
| executor.task_logs.plugins.logs.dynamic-log-links[5].neptune-scale-run.displayName | string |  | `"Neptune Run"` |
| executor.task_logs.plugins.logs.dynamic-log-links[5].neptune-scale-run.linkType | string |  | `"dashboard"` |
| executor.task_logs.plugins.logs.dynamic-log-links[5].neptune-scale-run.templateUris[0] | string |  | `"https://scale.neptune.ai/{{`{{ .taskConfig.project }}`}}/-/run/?customId={{`{{ .podName }}`}}"` |
| executor.task_logs.plugins.logs.dynamic-log-links[6].neptune-scale-custom-id.displayName | string |  | `"Neptune Run"` |
| executor.task_logs.plugins.logs.dynamic-log-links[6].neptune-scale-custom-id.linkType | string |  | `"dashboard"` |
| executor.task_logs.plugins.logs.dynamic-log-links[6].neptune-scale-custom-id.templateUris[0] | string |  | `(see values.yaml)` |
| executor.task_logs.plugins.logs.kubernetes-enabled | bool |  | `true` |
| executor.tolerations | list | tolerations for executor deployment | `[]` |
| executor.topologySpreadConstraints | object | topologySpreadConstraints for executor deployment | `{}` |
| extraObjects | list |  | `[]` |
| fluentbit | object | Configuration for fluentbit, used for the persistent logging feature. FluentBit runs as a DaemonSet and ships container logs to the persisted-logs/ path in the configured object store. The fluentbit-system service account must have write access to the storage bucket; grant access using cloud-native identity federation. AWS (IRSA): annotate the service account with `eks.amazonaws.com/role-arn: "arn:aws:iam::<ACCOUNT_ID>:role/<ROLE_NAME>"`. Azure (Workload Identity): `azure.workload.identity/client-id: "<CLIENT_ID>"`. GCP (Workload Identity): `iam.gke.io/gcp-service-account: "<GSA_NAME>@<PROJECT_ID>.iam.gserviceaccount.com"`. See https://www.union.ai/docs/v1/selfmanaged/deployment/configuration/persistent-logs/ | `(see values.yaml)` |
| flyteagent | object | Flyteagent configuration | `{"enabled":false,"plugin_config":{}}` |
| flyteconnector.additionalContainers | list | Appends additional containers to the deployment spec. May include template values. | `[]` |
| flyteconnector.additionalEnvs | list | Appends additional envs to the deployment spec. May include template values | `[]` |
| flyteconnector.additionalVolumeMounts | list | Appends additional volume mounts to the main container's spec. May include template values. | `[]` |
| flyteconnector.additionalVolumes | list | Appends additional volumes to the deployment spec. May include template values. | `[]` |
| flyteconnector.affinity | object | affinity for flyteconnector deployment | `{}` |
| flyteconnector.autoscaling.maxReplicas | int |  | `5` |
| flyteconnector.autoscaling.minReplicas | int |  | `2` |
| flyteconnector.autoscaling.targetCPUUtilizationPercentage | int |  | `80` |
| flyteconnector.autoscaling.targetMemoryUtilizationPercentage | int |  | `80` |
| flyteconnector.configPath | string | Default glob string for searching configuration files | `"/etc/flyteconnector/config/*.yaml"` |
| flyteconnector.enabled | bool |  | `false` |
| flyteconnector.extraArgs | object | Appends extra command line arguments to the main command | `{}` |
| flyteconnector.image.pullPolicy | string | Docker image pull policy | `"IfNotPresent"` |
| flyteconnector.image.repository | string | Docker image for flyteconnector deployment | `"ghcr.io/flyteorg/flyte-connectors"` |
| flyteconnector.image.tag | string |  | `"py3.13-2.0.0b50.dev3-g695bb1db3.d20260122"` |
| flyteconnector.nodeSelector | object | nodeSelector for flyteconnector deployment | `{}` |
| flyteconnector.podAnnotations | object | Annotations for flyteconnector pods | `{}` |
| flyteconnector.ports.containerPort | int |  | `8000` |
| flyteconnector.ports.name | string |  | `"grpc"` |
| flyteconnector.priorityClassName | string | Sets priorityClassName for flyteconnector pod(s). | `""` |
| flyteconnector.prometheusPort.containerPort | int |  | `9090` |
| flyteconnector.prometheusPort.name | string |  | `"metric"` |
| flyteconnector.replicaCount | int | Replicas count for flyteconnector deployment | `2` |
| flyteconnector.resources | object | Default resources requests and limits for flyteconnector deployment | `(see values.yaml)` |
| flyteconnector.service | object | Service settings for flyteconnector | `{"clusterIP":"None","type":"ClusterIP"}` |
| flyteconnector.serviceAccount | object | Configuration for service accounts for flyteconnector | `{"annotations":{},"create":true,"imagePullSecrets":[]}` |
| flyteconnector.serviceAccount.annotations | object | Annotations for ServiceAccount attached to flyteconnector pods | `{}` |
| flyteconnector.serviceAccount.create | bool | Should a service account be created for flyteconnector | `true` |
| flyteconnector.serviceAccount.imagePullSecrets | list | ImagePullSecrets to automatically assign to the service account | `[]` |
| flyteconnector.tolerations | list | tolerations for flyteconnector deployment | `[]` |
| flytepropeller | object | Flytepropeller configuration | `(see values.yaml)` |
| flytepropeller.additionalVolumeMounts | list | Appends additional volume mounts to the main container's spec. May include template values. | `[]` |
| flytepropeller.additionalVolumes | list | Appends additional volumes to the deployment spec. May include template values. | `[]` |
| flytepropeller.affinity | object | affinity for Flytepropeller deployment | `{}` |
| flytepropeller.configPath | string | Default glob string for searching configuration files | `"/etc/flyte/config/*.yaml"` |
| flytepropeller.extraArgs | object | extra arguments to pass to propeller. | `{}` |
| flytepropeller.nodeName | string | nodeName constraints for Flytepropeller deployment | `""` |
| flytepropeller.nodeSelector | object | nodeSelector for Flytepropeller deployment | `{}` |
| flytepropeller.podAnnotations | object | Annotations for Flytepropeller pods | `{}` |
| flytepropeller.podLabels | object | Labels for the Flytepropeller pods | `{}` |
| flytepropeller.replicaCount | int | Replicas count for Flytepropeller deployment | `1` |
| flytepropeller.resources | object | Default resources requests and limits for Flytepropeller deployment | `{"limits":{"cpu":"3","memory":"3Gi"},"requests":{"cpu":"1","memory":"1Gi"}}` |
| flytepropeller.serviceAccount | object | Configuration for service accounts for FlytePropeller | `{"annotations":{},"imagePullSecrets":[]}` |
| flytepropeller.serviceAccount.annotations | object | Annotations for ServiceAccount attached to FlytePropeller pods | `{}` |
| flytepropeller.serviceAccount.imagePullSecrets | list | ImagePullSecrets to automatically assign to the service account | `[]` |
| flytepropeller.tolerations | list | tolerations for Flytepropeller deployment | `[]` |
| flytepropeller.topologySpreadConstraints | object | topologySpreadConstraints for Flytepropeller deployment | `{}` |
| flytepropellerwebhook | object | Configuration for the Flytepropeller webhook | `(see values.yaml)` |
| flytepropellerwebhook.additionalVolumeMounts | list | Appends additional volume mounts to the main container's spec. May include template values. | `[]` |
| flytepropellerwebhook.additionalVolumes | list | Appends additional volumes to the deployment spec. May include template values. | `[]` |
| flytepropellerwebhook.affinity | object | affinity for webhook deployment | `{}` |
| flytepropellerwebhook.enabled | bool | enable or disable secrets webhook | `true` |
| flytepropellerwebhook.nodeName | string | nodeName constraints for webhook deployment | `""` |
| flytepropellerwebhook.nodeSelector | object | nodeSelector for webhook deployment | `{}` |
| flytepropellerwebhook.podAnnotations | object | Annotations for webhook pods | `{}` |
| flytepropellerwebhook.podEnv | object | Additional webhook container environment variables | `{}` |
| flytepropellerwebhook.podLabels | object | Labels for webhook pods | `{}` |
| flytepropellerwebhook.priorityClassName | string | Sets priorityClassName for webhook pod | `""` |
| flytepropellerwebhook.replicaCount | int | Replicas | `1` |
| flytepropellerwebhook.securityContext | object | Sets securityContext for webhook pod(s). | `(see values.yaml)` |
| flytepropellerwebhook.service | object | Service settings for the webhook | `(see values.yaml)` |
| flytepropellerwebhook.service.port | int | HTTPS port for the webhook service | `443` |
| flytepropellerwebhook.service.targetPort | int | Target port for the webhook service (container port) | `9443` |
| flytepropellerwebhook.serviceAccount | object | Configuration for service accounts for the webhook | `{"imagePullSecrets":[]}` |
| flytepropellerwebhook.serviceAccount.imagePullSecrets | list | ImagePullSecrets to automatically assign to the service account | `[]` |
| flytepropellerwebhook.tolerations | list | tolerations for webhook deployment | `[]` |
| flytepropellerwebhook.topologySpreadConstraints | object | topologySpreadConstraints for webhook deployment | `{}` |
| fullnameOverride | string | Override the chart fullname. | `""` |
| global.CLIENT_ID | string |  | `""` |
| global.CLUSTER_NAME | string |  | `""` |
| global.FAST_REGISTRATION_BUCKET | string |  | `""` |
| global.METADATA_BUCKET | string |  | `""` |
| global.ORG_NAME | string |  | `""` |
| global.UNION_CONTROL_PLANE_HOST | string |  | `""` |
| host | string | Set the control plane host for your Union dataplane installation.  This will be provided by Union. | `"{{ .Values.global.UNION_CONTROL_PLANE_HOST }}"` |
| image.flytecopilot | object | flytecopilot repository and tag. | `{"pullPolicy":"IfNotPresent","repository":"cr.flyte.org/flyteorg/flytecopilot","tag":"v1.14.1"}` |
| image.kubeStateMetrics | object | Kubestatemetrics repository and tag. | `(see values.yaml)` |
| image.union | object | Image repository for the operator and union services | `{"pullPolicy":"IfNotPresent","repository":"public.ecr.aws/p0i0a9q8/unionoperator","tag":""}` |
| imageBuilder.authenticationType | string | "azure" uses az acr login to authenticate to the default registry. Requires Azure Workload Identity to be enabled. | `"noop"` |
| imageBuilder.buildkit.additionalVolumeMounts | list | Additional volume mounts to add to the buildkit container | `[]` |
| imageBuilder.buildkit.additionalVolumes | list | Additional volumes to add to the pod | `[]` |
| imageBuilder.buildkit.autoscaling | object | buildkit HPA configuration | `{"enabled":false,"maxReplicas":2,"minReplicas":1,"targetCPUUtilizationPercentage":60}` |
| imageBuilder.buildkit.autoscaling.targetCPUUtilizationPercentage | int | We can adjust this as needed. | `60` |
| imageBuilder.buildkit.deploymentStrategy | string | deployment strategy for buildkit deployment | `"Recreate"` |
| imageBuilder.buildkit.enabled | bool | Enable buildkit service within this release. | `true` |
| imageBuilder.buildkit.fullnameOverride | string | The name to use for the buildkit deployment, service, configmap, etc. | `""` |
| imageBuilder.buildkit.image.pullPolicy | string | Pull policy | `"IfNotPresent"` |
| imageBuilder.buildkit.image.repository | string | Image name | `"docker.io/moby/buildkit"` |
| imageBuilder.buildkit.image.tag | string | Image tag. When rootless mode is enabled, "-rootless" is appended (e.g. "buildx-stable-1" becomes "buildx-stable-1-rootless") unless the tag already contains "rootless". | `"buildx-stable-1"` |
| imageBuilder.buildkit.log | object | Enable debug logging | `{"debug":false,"format":"text"}` |
| imageBuilder.buildkit.nodeSelector | object | Node selector | `{}` |
| imageBuilder.buildkit.oci | object | Buildkitd service configuration | `{"maxParallelism":0}` |
| imageBuilder.buildkit.oci.maxParallelism | int | maxParallelism limits the number of concurrent builds; the default of 0 is unbounded. | `0` |
| imageBuilder.buildkit.pdb.minAvailable | int | Minimum available pods | `1` |
| imageBuilder.buildkit.podAnnotations | object | Pod annotations | `{}` |
| imageBuilder.buildkit.podEnv | list | Appends additional environment variables to the buildkit container's spec. | `[]` |
| imageBuilder.buildkit.replicaCount | int | Replicas count for Buildkit deployment | `1` |
| imageBuilder.buildkit.resources | object | Resource definitions | `{"requests":{"cpu":1,"ephemeral-storage":"20Gi","memory":"1Gi"}}` |
| imageBuilder.buildkit.rootless | bool | Run buildkit in rootless mode. Requires kernel >= 5.11 with unprivileged user namespace support. | `true` |
| imageBuilder.buildkit.service.annotations | object | Service annotations | `{}` |
| imageBuilder.buildkit.service.loadbalancerIp | string | Static ip address for load balancer | `""` |
| imageBuilder.buildkit.service.port | int | Service port | `1234` |
| imageBuilder.buildkit.service.type | string | Service type | `"ClusterIP"` |
| imageBuilder.buildkit.serviceAccount | object | Service account configuration for buildkit | `{"annotations":{},"create":true,"imagePullSecret":"","name":"union-imagebuilder"}` |
| imageBuilder.buildkit.tolerations | list | Tolerations | `[]` |
| imageBuilder.buildkitUri | string | The buildkit service URI, e.g. "tcp://buildkitd.buildkit.svc.cluster.local:1234". | `""` |
| imageBuilder.defaultRepository | string | The default repository for built images. The build-image task will fail unless "registry" is specified or a default repository is provided. | `""` |
| imageBuilder.enabled | bool |  | `true` |
| imageBuilder.targetConfigMapName | string | Should not change unless coordinated with Union technical support. | `"build-image-config"` |
| ingress-nginx.controller.admissionWebhooks.enabled | bool |  | `false` |
| ingress-nginx.controller.allowSnippetAnnotations | bool |  | `true` |
| ingress-nginx.controller.config.annotations-risk-level | string |  | `"Critical"` |
| ingress-nginx.controller.config.grpc-connect-timeout | string |  | `"1200"` |
| ingress-nginx.controller.config.grpc-read-timeout | string |  | `"604800"` |
| ingress-nginx.controller.config.grpc-send-timeout | string |  | `"604800"` |
| ingress-nginx.controller.ingressClassResource.controllerValue | string |  | `"union.ai/dataplane"` |
| ingress-nginx.controller.ingressClassResource.default | bool |  | `false` |
| ingress-nginx.controller.ingressClassResource.enabled | bool |  | `true` |
| ingress-nginx.controller.ingressClassResource.name | string |  | `"dataplane"` |
| ingress-nginx.enabled | bool |  | `false` |
| ingress.dataproxy | object | Dataproxy specific ingress configuration. | `{"annotations":{},"class":"","hostOverride":"","tls":{}}` |
| ingress.dataproxy.annotations | object | Annotations to apply to the ingress resource. | `{}` |
| ingress.dataproxy.class | string | Ingress class name | `""` |
| ingress.dataproxy.hostOverride | string | Ingress host | `""` |
| ingress.dataproxy.tls | object | Ingress TLS configuration | `{}` |
| ingress.enabled | bool |  | `false` |
| ingress.host | string |  | `""` |
| ingress.serving | object | Serving specific ingress configuration. | `{"annotations":{},"class":"","hostOverride":"","tls":{}}` |
| ingress.serving.annotations | object | Annotations to apply to the ingress resource. | `{}` |
| ingress.serving.class | string | Ingress class name | `""` |
| ingress.serving.hostOverride | string | Optional host override for the serving ingress rule. Defaults to *.apps.{{ .Values.host }}. | `""` |
| ingress.serving.tls | object | Ingress TLS configuration | `{}` |
| knative-operator.crds.install | bool |  | `true` |
| knative-operator.enabled | bool |  | `false` |
| kube-state-metrics | object | Standalone kube-state-metrics for Union features (cost tracking, pod resource metrics). Metric filtering is handled in the Prometheus static scrape config. | `{}` |
| low_privilege | bool | Scopes the deployment, permissions and actions created into a single namespace | `false` |
| metrics-server.enabled | bool |  | `false` |
| monitoring.alerting.enabled | bool |  | `false` |
| monitoring.alertmanager.enabled | bool |  | `false` |
| monitoring.coreDns.enabled | bool |  | `true` |
| monitoring.crds.enabled | bool |  | `false` |
| monitoring.dashboards.enabled | bool |  | `true` |
| monitoring.dashboards.label | string |  | `"grafana_dashboard"` |
| monitoring.dashboards.labelValue | string |  | `"1"` |
| monitoring.defaultRules.create | bool |  | `true` |
| monitoring.enabled | bool |  | `false` |
| monitoring.fullnameOverride | string |  | `"monitoring"` |
| monitoring.grafana.adminPassword | string |  | `"admin"` |
| monitoring.grafana.enabled | bool |  | `true` |
| monitoring.grafana.fullNameOverride | string |  | `"monitoring-grafana"` |
| monitoring.kube-state-metrics.fullnameOverride | string |  | `"monitoring-kube-state-metrics"` |
| monitoring.kube-state-metrics.nameOverride | string |  | `"monitoring-kube-state-metrics"` |
| monitoring.kubeApiServer.enabled | bool |  | `true` |
| monitoring.kubeControllerManager.enabled | bool |  | `true` |
| monitoring.kubeEtcd.enabled | bool |  | `true` |
| monitoring.kubeProxy.enabled | bool |  | `true` |
| monitoring.kubeScheduler.enabled | bool |  | `true` |
| monitoring.kubeStateMetrics.enabled | bool |  | `true` |
| monitoring.kubelet.enabled | bool |  | `true` |
| monitoring.nameOverride | string |  | `"monitoring"` |
| monitoring.nodeExporter.enabled | bool |  | `true` |
| monitoring.prometheus.agentMode | bool |  | `false` |
| monitoring.prometheus.enabled | bool |  | `true` |
| monitoring.prometheus.prometheusSpec.maximumStartupDurationSeconds | int |  | `600` |
| monitoring.prometheus.prometheusSpec.podMonitorSelectorNilUsesHelmValues | bool |  | `false` |
| monitoring.prometheus.prometheusSpec.resources.limits.cpu | string |  | `"2"` |
| monitoring.prometheus.prometheusSpec.resources.limits.memory | string |  | `"4Gi"` |
| monitoring.prometheus.prometheusSpec.resources.requests.cpu | string |  | `"500m"` |
| monitoring.prometheus.prometheusSpec.resources.requests.memory | string |  | `"1Gi"` |
| monitoring.prometheus.prometheusSpec.retention | string |  | `"7d"` |
| monitoring.prometheus.prometheusSpec.ruleSelectorNilUsesHelmValues | bool |  | `false` |
| monitoring.prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues | bool |  | `false` |
| monitoring.prometheus.service.port | int |  | `80` |
| monitoring.prometheusOperator.enabled | bool |  | `true` |
| monitoring.prometheusRules.enabled | bool |  | `true` |
| monitoring.serviceMonitors.enabled | bool |  | `true` |
| monitoring.slos.alerting.enabled | bool |  | `false` |
| monitoring.slos.enabled | bool |  | `false` |
| monitoring.slos.targets.availability | float |  | `0.999` |
| monitoring.slos.targets.latencyP99 | int |  | `5` |
| nameOverride | string | Override the chart name. | `""` |
| namespace_mapping | object | Namespace mapping template for mapping Union runs to Kubernetes namespaces. This is the canonical source of truth. All dataplane services (propeller, clusterresourcesync, operator, executor) will inherit this value unless explicitly overridden in their service-specific config sections (config.namespace_config, config.operator.org, executor.raw_config). | `{}` |
| namespaces.enabled | bool |  | `true` |
| nodeobserver | object | nodeobserver contains the configuration information for the node observer service. | `(see values.yaml)` |
| nodeobserver.additionalVolumeMounts | list | Appends additional volume mounts to the main container's spec. May include template values. | `[]` |
| nodeobserver.additionalVolumes | list | Appends additional volumes to the daemonset spec. May include template values. | `[]` |
| nodeobserver.affinity | object | affinity configurations for the pods associated with nodeobserver services | `{}` |
| nodeobserver.enabled | bool | Enable or disable nodeobserver | `false` |
| nodeobserver.nodeName | string | nodeName constraints for the pods associated with nodeobserver services | `""` |
| nodeobserver.nodeSelector | object | nodeSelector constraints for the pods associated with nodeobserver services | `{}` |
| nodeobserver.podAnnotations | object | Additional pod annotations for the nodeobserver services | `{}` |
| nodeobserver.podEnv | list | Additional pod environment variables for the nodeobserver services | `(see values.yaml)` |
| nodeobserver.resources | object | Kubernetes resource configuration for the nodeobserver service | `{"limits":{"cpu":"1","memory":"500Mi"},"requests":{"cpu":"500m","memory":"100Mi"}}` |
| nodeobserver.tolerations | list | tolerations for the pods associated with nodeobserver services | `[{"effect":"NoSchedule","operator":"Exists"}]` |
| nodeobserver.topologySpreadConstraints | object | topologySpreadConstraints for the pods associated with nodeobserver services | `{}` |
| objectStore | object | Union Object Store configuration | `{"service":{"grpcPort":8089,"httpPort":8080}}` |
| opencost.enabled | bool | Enable or disable the opencost installation. | `true` |
| opencost.opencost.exporter.resources.limits.cpu | string |  | `"1000m"` |
| opencost.opencost.exporter.resources.limits.memory | string |  | `"4Gi"` |
| opencost.opencost.exporter.resources.requests.cpu | string |  | `"500m"` |
| opencost.opencost.exporter.resources.requests.memory | string |  | `"1Gi"` |
| opencost.opencost.metrics.serviceMonitor.enabled | bool |  | `false` |
| opencost.opencost.prometheus.external.enabled | bool |  | `true` |
| opencost.opencost.prometheus.external.url | string |  | `"http://union-operator-prometheus.{{.Release.Namespace}}.svc:80/prometheus"` |
| opencost.opencost.prometheus.internal.enabled | bool |  | `false` |
| opencost.opencost.ui.enabled | bool |  | `false` |
| operator.additionalVolumeMounts | list | Appends additional volume mounts to the main container's spec. May include template values. | `[]` |
| operator.additionalVolumes | list | Appends additional volumes to the deployment spec. May include template values. | `[]` |
| operator.affinity | object | affinity configurations for the operator pods | `{}` |
| operator.autoscaling.enabled | bool |  | `false` |
| operator.enableTunnelService | bool |  | `true` |
| operator.imagePullSecrets | list |  | `[]` |
| operator.nodeName | string | nodeName constraints for the operator pods | `""` |
| operator.nodeSelector | object | nodeSelector constraints for the operator pods | `{}` |
| operator.podAnnotations | object |  | `{}` |
| operator.podEnv | object |  | `{}` |
| operator.podLabels | object |  | `{}` |
| operator.podSecurityContext | object |  | `{}` |
| operator.priorityClassName | string |  | `""` |
| operator.replicas | int |  | `1` |
| operator.resources.limits.cpu | string |  | `"2"` |
| operator.resources.limits.memory | string |  | `"3Gi"` |
| operator.resources.requests.cpu | string |  | `"1"` |
| operator.resources.requests.memory | string |  | `"1Gi"` |
| operator.secretName | string |  | `"union-secret-auth"` |
| operator.securityContext | object |  | `{}` |
| operator.serviceAccount.annotations | object |  | `{}` |
| operator.serviceAccount.create | bool |  | `true` |
| operator.serviceAccount.name | string |  | `"operator-system"` |
| operator.tolerations | list | tolerations for the operator pods | `[]` |
| operator.topologySpreadConstraints | object | topologySpreadConstraints for the operator pods | `{}` |
| orgName | string | Organization name should be provided by Union. | `"{{ .Values.global.ORG_NAME }}"` |
| prometheus | object | Union features Prometheus configuration. Deploys a static Prometheus instance (no Prometheus Operator required) for Union features like cost tracking and task-level monitoring. | `(see values.yaml)` |
| prometheus.affinity | object | Affinity rules for the Prometheus pod. | `{}` |
| prometheus.nodeSelector | object | Node selector for the Prometheus pod. | `{}` |
| prometheus.priorityClassName | string | Priority class for the Prometheus pod. | `"system-cluster-critical"` |
| prometheus.resources | object | Resource limits and requests. | `{"limits":{"cpu":"3","memory":"3500Mi"},"requests":{"cpu":"1","memory":"1Gi"}}` |
| prometheus.retention | string | Data retention period. | `"3d"` |
| prometheus.routePrefix | string | Route prefix for Prometheus web UI and API. | `"/prometheus/"` |
| prometheus.serviceAccount | object | Service account configuration. | `{"annotations":{},"create":true}` |
| prometheus.tolerations | list | Tolerations for the Prometheus pod. | `[]` |
| proxy | object | Union operator proxy configuration | `(see values.yaml)` |
| proxy.additionalVolumeMounts | list | Appends additional volume mounts to the main container's spec. May include template values. | `[]` |
| proxy.additionalVolumes | list | Appends additional volumes to the deployment spec. May include template values. | `[]` |
| proxy.affinity | object | affinity configurations for the proxy pods | `{}` |
| proxy.nodeName | string | nodeName constraint for the proxy pods | `""` |
| proxy.nodeSelector | object | nodeSelector constraints for the proxy pods | `{}` |
| proxy.secretManager.namespace | string | Set the namespace for union managed secrets created through the native Kubernetes secret manager. If the namespace is not set, the release namespace will be used. | `""` |
| proxy.tolerations | list | tolerations for the proxy pods | `[]` |
| proxy.topologySpreadConstraints | object | topologySpreadConstraints for the proxy pods | `{}` |
| resourcequota | object | Create global resource quotas for the cluster. | `{"create":false}` |
| scheduling | object | Global kubernetes scheduling constraints that will be applied to the pods.  Application specific constraints will always take precedence. | `{"affinity":{},"nodeName":"","nodeSelector":{},"tolerations":[],"topologySpreadConstraints":{}}` |
| scheduling.affinity | object | See https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node | `{}` |
| scheduling.nodeSelector | object | See https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node | `{}` |
| scheduling.tolerations | list | See https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration | `[]` |
| scheduling.topologySpreadConstraints | object | See https://kubernetes.io/docs/concepts/scheduling-eviction/topology-spread-constraints | `{}` |
| secrets | object | Connection secrets for the Union control plane services. | `{"admin":{"clientId":"dataplane-operator","clientSecret":"","create":true,"enable":true}}` |
| secrets.admin.clientId | string | The client id used to authenticate to the control plane.  This will be provided by Union. | `"dataplane-operator"` |
| secrets.admin.clientSecret | string | The client secret used to authenticate to the control plane.  This will be provided by Union. | `""` |
| secrets.admin.create | bool | Create the secret resource containing the client id and secret.  If set to false the user is responsible for creating the secret before the installation. | `true` |
| secrets.admin.enable | bool | Enable or disable the admin secret.  This is used to authenticate to the control plane. | `true` |
| serving | object | Configure app serving and knative. | `(see values.yaml)` |
| serving.auth | object | Union authentication and authorization configuration. | `{"enabled":true}` |
| serving.auth.enabled | bool | Disabling is common if not leveraging Union Cloud SSO. | `true` |
| serving.enabled | bool | Enables the serving components. Installs Knative Serving. Knative-Operator must be running in the cluster for this to work. Enables app serving in operator. | `false` |
| serving.extraConfig | object | Additional configuration for Knative serving | `{}` |
| serving.metrics | bool | Enables scraping of metrics from the serving component | `true` |
| serving.replicas | int | The number of replicas to create for all components for high availability. | `2` |
| serving.resources | object | Resources for serving components | `(see values.yaml)` |
| sparkoperator.enabled | bool |  | `false` |
| sparkoperator.plugin_config | object |  | `{}` |
| storage | object | Object storage configuration used by all Union services. | `(see values.yaml)` |
| storage.accessKey | string | The access key used for object storage. | `""` |
| storage.authType | string | The authentication type.  Currently supports "accesskey" and "iam". | `"accesskey"` |
| storage.bucketName | string | The bucket name used for object storage. | `"{{ .Values.global.METADATA_BUCKET }}"` |
| storage.cache | object | Cache configuration for objects retrieved from object storage. | `{"maxSizeMBs":0,"targetGCPercent":70}` |
| storage.custom | object | Define custom configurations for the object storage.  Only used if the provider is set to "custom". | `{}` |
| storage.disableSSL | bool | Disable SSL for object storage.  This should only be used for local/sandbox installations. | `false` |
| storage.endpoint | string | Define or override the endpoint used for the object storage service. | `""` |
| storage.fastRegistrationBucketName | string | The bucket name used for fast registration uploads. | `"{{ .Values.global.FAST_REGISTRATION_BUCKET }}"` |
| storage.fastRegistrationURL | string | Override the URL for signed fast registration uploads.  This is only used for local/sandbox installations. | `""` |
| storage.gcp | object | Define GCP specific configuration for object storage. | `{"projectId":""}` |
| storage.injectPodEnvVars | bool | Injects the object storage access information into the pod environment variables.  Needed for providers that only support access and secret key based authentication. | `true` |
| storage.limits | object | Internal service limits for object storage access. | `{"maxDownloadMBs":1024}` |
| storage.metadataPrefix | string | The prefix used to construct object storage metadata URLs. Example for Azure: "abfs://my-container@mystorageaccount.dfs.core.windows.net" | `""` |
| storage.provider | string | The storage provider to use.  Currently supports "compat", "aws", "oci", and "custom". | `"compat"` |
| storage.region | string | The bucket region used for object storage. | `"us-east-1"` |
| storage.s3ForcePathStyle | bool | Use path style instead of domain style urls to access the object storage service. | `true` |
| storage.secretKey | string | The secret key used for object storage. | `""` |
| userRoleAnnotationKey | string | This is the annotation key that is added to service accounts.  Used with GCP and AWS. | `"eks.amazonaws.com/role-arn"` |
| userRoleAnnotationValue | string | This is the value of the annotation key that is added to service accounts. Used with GCP and AWS. | `"arn:aws:iam::ACCOUNT_ID:role/flyte_project_role"` |
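
To tie the table together, the sketch below is a minimal, illustrative values file built only from keys documented above. All placeholder values (host, organization, cluster, and bucket names) are hypothetical; the real control plane host, organization name, and client credentials are provided by Union.

```yaml
# Illustrative minimal values file, using only keys documented above.
# All placeholder values are hypothetical; Union provides the real
# control plane host, organization name, and client credentials.
global:
  UNION_CONTROL_PLANE_HOST: "example.union.example.com"
  ORG_NAME: "my-org"
  CLUSTER_NAME: "my-cluster"
  METADATA_BUCKET: "my-union-metadata"
  FAST_REGISTRATION_BUCKET: "my-union-fast-registration"

secrets:
  admin:
    create: true
    clientId: "dataplane-operator"
    clientSecret: "<provided-by-union>"   # placeholder; supplied by Union

storage:
  provider: "aws"      # one of "compat", "aws", "oci", "custom"
  authType: "iam"      # or "accesskey"
  region: "us-west-2"
```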

=== PAGE: https://www.union.ai/docs/v2/union/deployment/selfmanaged/helm-chart-reference/knative-operator ===

Deploys Knative Operator

## Chart info

| | |
|---|---|
| **Chart version** | 2025.6.3 |
| **App version** | 1.16.0 |
| **Kubernetes version** | `>= 1.28.0-0` |

## Values

| Key | Type | Description | Default |
|-----|------|-------------|---------|
| crds.install | bool |  | `true` |
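
This chart is listed as a dependency of the dataplane chart (see its dependency table above) and is normally enabled through the dataplane values rather than installed on its own. Below is a hedged sketch of enabling it together with app serving, using the `knative-operator.enabled`, `knative-operator.crds.install`, and `serving.enabled` keys from the dataplane chart reference above.

```yaml
# Illustrative sketch: enabling the bundled knative-operator dependency and
# app serving through the dataplane chart values (see the dataplane chart
# reference above). serving.enabled installs Knative Serving and requires
# the Knative Operator to be running in the cluster.
knative-operator:
  enabled: true
  crds:
    install: true
serving:
  enabled: true
```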

