Service Providers
This guide shows you how to create a Service Provider for the OpenMCP ecosystem from scratch. Service Providers are the heart of the OpenMCP platform: they provide the capabilities to offer Infrastructure as Data services to end users.
In this guide, we will walk you through the steps of creating a Service Provider using the service-provider-template, explain the context a service provider operates in, and demonstrate how to run end-to-end tests for it.
By the end of this guide, you should have a solid understanding of how a service provider works and be ready to build a real-world provider such as service-provider-velero.
Let's get started!
Overview
A service provider consists of two major parts, similar to a regular Kubernetes controller:
- A user-facing ServiceProviderAPI: This allows end users to request a DomainService for a ManagedControlPlane, e.g. FooService or Velero.
- A controller that reconciles the ServiceProviderAPI: This controller manages the lifecycle of the provided DomainService and its API (such as Foo or the CRDs of Velero).
For a visual overview of how these components fit into an openMCP installation, refer to the service provider deployment model.
Prerequisites
Start by creating a new repository for your service provider using the service-provider-template. Click the "Use this template" button on the GitHub page and give your new repository a name that reflects the domain service it provides, e.g. service-provider-velero for a service provider that deploys Velero.
Clone the newly created repository to your local machine and open it with your favorite IDE. Finally, ensure that you have Go installed. You can download it from go.dev.
Service Provider Template Usage
The template allows you to create a service provider without requiring deep knowledge of the underlying OpenMCP platform.
Run the following command to generate a new provider. Replace velero with the kind of your service:
go run ./cmd/template -v -module github.com/openmcp-project/service-provider-velero -kind Velero -group velero
The template generates a fully functional service provider that can be executed and deployed on your local machine using cluster-provider-kind and openmcp-testing.
To run the generated end-to-end test using task, initialize the build submodule and execute the e2e test:
git submodule update --init --recursive
task test-e2e
This test bootstraps a complete local openMCP installation with all required components, including:
- The platform cluster where your service provider is managed by the openmcp-operator
- The onboarding cluster where end users request the domain service your provider offers.
- A managed control plane cluster (MCP) where your service provider installs its DomainServiceAPI and optionally its workload.
- An optional workload cluster, provisioned when using the template flag -w. Your service provider then requests a workload cluster from the openmcp-operator to deploy its workload outside the MCP. This results in another kind cluster.
To enable this, run the command from the Service Provider Template Usage section with -w:
go run ./cmd/template -v -w -module github.com/openmcp-project/service-provider-velero -kind Velero -group velero
The template generator removes its own code after execution. If you want to revert your changes and start fresh, simply use git and delete any generated untracked files. For the same reason, remove the template-generation step from the e2e test workflow .github/workflows/go.yaml before committing your changes (otherwise your workflow will fail):
- name: Generate template
run: |
go run ./cmd/template -v -w
Project Structure
The service provider template is built with kubebuilder, so the project follows the conventions of typical Kubernetes controllers:
- api/ includes generated types and their CRDs, which the CRD manager will install during the init command.
- internal/controller/ contains the Reconciler where you implement your domain-specific reconcile logic.
- pkg/runtime contains generic reconcilers that handle openMCP-specific logic such as cluster access management and provider config updates. You normally should not modify this package. If you encounter any issues, create an issue in the template repository.
If you are new to implementing Kubernetes controllers, consider completing the Building CronJob tutorial before returning to this guide. The rest of this guide highlights the most important steps for creating a service provider and the differences compared to a regular Kubernetes controller.
Create your ServiceProviderAPI
The ServiceProviderAPI type defines the options available to end users when consuming your managed service offering. This API is watched by your ServiceProviderReconciler. The template starts with a simple example field:
// FooServiceSpec defines the desired state of FooService
type FooServiceSpec struct {
// foo is an example field of FooService. Edit fooservice_types.go to remove/update
// +optional
Foo *string `json:"foo,omitempty"`
}
Modify the spec to expose the configuration your provider supports.
For example, Velero allows the user to choose a version and which plugins to install:
apiVersion: velero.services.openmcp.cloud/v1alpha1
kind: Velero
metadata:
name: test-mcp
spec:
version: "v1.17.1"
plugins:
- name: "aws"
version: "v1.13.0"
Each onboarding API type must include common status fields. These are included in the template and you can add additional optional fields.
// FooServiceStatus defines the observed state of FooService.
type FooServiceStatus struct {
// The status of each condition is one of True, False, or Unknown.
// +listType=map
// +listMapKey=type
// +optional
Conditions []metav1.Condition `json:"conditions,omitempty"`
// ObservedGeneration is the generation of this resource that was last reconciled by the controller.
ObservedGeneration int64 `json:"observedGeneration"`
// Phase is the current phase of the resource.
Phase string `json:"phase"`
}
For example, Velero includes a list of managed resources.
// Constants representing the phases of a velero instance lifecycle.
const (
Pending InstancePhase = "Pending"
Progressing InstancePhase = "Progressing"
Ready InstancePhase = "Ready"
Failed InstancePhase = "Failed"
Terminating InstancePhase = "Terminating"
Unknown InstancePhase = "Unknown"
)
type VeleroStatus struct {
// The status of each condition is one of True, False, or Unknown.
// +listType=map
// +listMapKey=type
// +optional
Conditions []metav1.Condition `json:"conditions,omitempty"`
// ObservedGeneration is the generation of this resource that was last reconciled by the controller.
ObservedGeneration int64 `json:"observedGeneration"`
// Phase is the current phase of the resource.
Phase string `json:"phase"`
// Resources managed by this velero instance
// +optional
Resources []ManagedResource `json:"resources,omitempty"`
}
// ManagedResource defines a kubernetes object with its lifecycle phase
type ManagedResource struct {
corev1.TypedObjectReference `json:",inline"`
Phase InstancePhase `json:"phase"`
Message string `json:"message,omitempty"`
}
Edit the ProviderConfig API
Service providers must expose a ProviderConfig which platform operators use to configure provider behavior. Because provider deployment does not support passing arguments to the binary directly, configuration must be expressed via this API.
Typical settings include version constraints, image locations, and pull secrets, which are especially important for supporting air-gapped environments.
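Such settings might be modeled as Go API types along these lines. This is a sketch with illustrative names; AvailableImage and allowedVersion are not part of the template:

```go
package main

import "fmt"

// AvailableImage pins an image location and the versions the operator
// allows. Image can point to a mirror, which matters for air-gapped
// environments.
type AvailableImage struct {
	Name     string   `json:"name"`
	Versions []string `json:"versions"`
	Image    string   `json:"image"`
}

// ProviderConfigSpec sketches operator-facing configuration.
type ProviderConfigSpec struct {
	PollInterval     string           `json:"pollInterval,omitempty"`
	AvailableImages  []AvailableImage `json:"availableImages,omitempty"`
	ImagePullSecrets []string         `json:"imagePullSecrets,omitempty"`
}

// allowedVersion reports whether a requested image version is permitted
// by the operator's configuration.
func allowedVersion(spec ProviderConfigSpec, name, version string) bool {
	for _, img := range spec.AvailableImages {
		if img.Name != name {
			continue
		}
		for _, v := range img.Versions {
			if v == version {
				return true
			}
		}
	}
	return false
}

func main() {
	spec := ProviderConfigSpec{
		AvailableImages: []AvailableImage{
			{Name: "velero", Versions: []string{"v1.17.1"}, Image: "velero/velero"},
		},
	}
	fmt.Println(allowedVersion(spec, "velero", "v1.17.1"))
}
```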
For example, Velero also implicitly defines the available plugins via the ProviderConfig.
apiVersion: velero.services.openmcp.cloud/v1alpha1
kind: ProviderConfig
metadata:
name: default
spec:
pollInterval: 1m
availableImages:
- name: velero
versions: ["v1.17.1"]
image: "velero/velero"
- name: aws
versions: ["v1.13.0"]
image: "velero/velero-plugin-for-aws"
imagePullSecrets:
- name: privateregcred
Note that these image pull secrets reference secrets stored on the platform cluster. It is the responsibility of the service provider to ensure that the referenced secrets are copied to the cluster in which the deployments run.
To synchronize the image pull secrets between the platform cluster and the cluster where your workloads run, you can use the SecretMutator from controller-utils inside your CreateOrUpdate logic.
Velero implements this synchronization as follows:
// internal/controller/velero_controller.go
func (r *VeleroReconciler) CreateOrUpdate(ctx context.Context, obj *apiv1alpha1.Velero, pc *apiv1alpha1.ProviderConfig, clusters spruntime.ClusterContext) (ctrl.Result, error) {
...
workloadCluster := resources.NewManagedCluster(clusters.WorkloadCluster.Client(), clusters.WorkloadCluster.RESTConfig(), instance.Namespace(obj), resources.WorkloadCluster)
secret.Configure(workloadCluster, r.PlatformCluster, pc.Spec.ImagePullSecrets, r.PodNamespace)
...
}
// pkg/secret/secret.go
func Configure(cluster resources.ManagedCluster, platformCluster *clusters.Cluster, imagePullSecrets []corev1.LocalObjectReference, sourceNamespace string) {
for _, pullSecret := range imagePullSecrets {
secret := resources.NewManagedObject(&corev1.Secret{
ObjectMeta: metav1.ObjectMeta{
Name: pullSecret.Name,
Namespace: cluster.GetDefaultNamespace(),
},
}, resources.ManagedObjectContext{
ReconcileFunc: func(ctx context.Context, o client.Object) error {
oSecret := o.(*corev1.Secret)
sourceSecret := &corev1.Secret{
ObjectMeta: metav1.ObjectMeta{
Name: pullSecret.Name,
Namespace: sourceNamespace,
},
}
// retrieve source secret from platform cluster
if err := platformCluster.Client().Get(ctx, client.ObjectKeyFromObject(sourceSecret), sourceSecret); err != nil {
return err
}
mutator := openmcpresources.NewSecretMutator(pullSecret.Name, cluster.GetDefaultNamespace(), sourceSecret.Data, corev1.SecretTypeDockerConfigJson)
return mutator.Mutate(oSecret)
},
StatusFunc: resources.SimpleStatus,
})
cluster.AddObject(secret)
}
}
Finally, ensure that the synchronized image pull secrets are referenced in any workload you reconcile.
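For instance, a Deployment reconciled into the workload cluster would reference the copied secret roughly like this (a fragment; the names follow the ProviderConfig example above):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: velero
spec:
  # ...
  template:
    spec:
      imagePullSecrets:
        - name: privateregcred  # the secret synchronized from the platform cluster
      containers:
        - name: velero
          image: velero/velero:v1.17.1
```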
Edit the ServiceProviderReconciler
Your Reconciler must implement two functions: CreateOrUpdate and Delete. The template sets up your reconciler together with the generic reconciler from pkg/runtime, which handles openMCP-specifics (cluster access, config updates, etc.).
CreateOrUpdate Operation
The template example contains a basic implementation that installs a CRD into the tenant MCP.
// CreateOrUpdate is called on every add or update event
func (r *FooServiceReconciler) CreateOrUpdate(ctx context.Context, svcobj *apiv1alpha1.FooService, _ *apiv1alpha1.ProviderConfig, clusters spruntime.ClusterContext) (ctrl.Result, error) {
spruntime.StatusProgressing(svcobj, "Reconciling", "Reconcile in progress")
managedObj := &apiextensionsv1.CustomResourceDefinition{
ObjectMeta: metav1.ObjectMeta{
Name: "foos.example.domain",
},
}
_, err := ctrl.CreateOrUpdate(ctx, clusters.MCPCluster.Client(), managedObj, func() error {
managedObj.Spec = fooCRD().Spec
return nil
})
if err == nil {
spruntime.StatusReady(svcobj)
}
return ctrl.Result{}, err
}
In a real-world provider like Velero, this step involves installing and reconciling every required resource, including CRDs, namespace(s), service account(s), deployment(s), etc. into the MCP and workload cluster.
The ClusterContext provides access to all request specific clusters. These clusters include the managed control plane and, when requested, the workload cluster associated with the current request. Note that the workload cluster is optional and will not be available for providers that deploy their workload directly to the managed control plane.
type ClusterContext struct {
MCPCluster *clusters.Cluster
WorkloadCluster *clusters.Cluster
}
In contrast, the platform and onboarding clusters are static clusters that are assigned to the reconciler at initialization.
type FooServiceReconciler struct {
OnboardingCluster *clusters.Cluster
PlatformCluster *clusters.Cluster
}
Any update to either the ServiceProviderAPI or ProviderConfig triggers reconciliation.
Delete Operation
When a user removes a service, the provider must delete all managed resources.
The basic template example deletes the Foo CRD:
func (r *FooServiceReconciler) Delete(ctx context.Context, obj *apiv1alpha1.FooService, _ *apiv1alpha1.ProviderConfig, clusters spruntime.ClusterContext) (ctrl.Result, error) {
l := logf.FromContext(ctx)
spruntime.StatusTerminating(obj)
managedObj := fooCRD()
if err := clusters.MCPCluster.Client().Delete(ctx, managedObj); client.IgnoreNotFound(err) != nil {
l.Error(err, "delete object failed")
return ctrl.Result{}, err
}
if err := clusters.MCPCluster.Client().Get(ctx, client.ObjectKeyFromObject(managedObj), managedObj); err != nil {
return reconcile.Result{}, client.IgnoreNotFound(err)
}
// object still exists
return ctrl.Result{
RequeueAfter: time.Second * 10,
}, nil
}
Restrict Cluster Access
The template code runs the provider with full admin permissions. Before releasing your provider, restrict its permissions to what it actually needs using the ClusterAccessReconciler. The relevant configuration appears in several places in the service provider setup in main.go.
spr := spruntime.NewSPReconciler[*fooservicesv1alpha1.FooService, *fooservicesv1alpha1.ProviderConfig](
func() *fooservicesv1alpha1.FooService { return &fooservicesv1alpha1.FooService{} },
).
WithClusterAccessReconciler(clusteraccess.NewClusterAccessReconciler(platformCluster.Client(), "FooService").
WithMCPScheme(mcpScheme).
WithMCPPermissions(adminPermissions).WithMCPRoleRefs([]common.RoleRef{
{
Name: "cluster-admin",
Kind: "ClusterRole",
}}))
Taskfile Usage
OpenMCP controllers use task instead of make to generate code, build your controller image, etc. The most important developer commands are:
- task generate to regenerate code after API changes
- task build:img:build-test to build an image for local testing
- task test-e2e for a full pipeline run including code generation, validation and e2e test execution.
Run task -l to see all available tasks.
openmcp-testing
The template includes an e2e test suite built with the Kubernetes e2e-framework and openmcp-testing. It provides helper functions such as:
- ConfigByPrefix to get a config for the platform, mcp, or onboarding clusters
- CreateObjectsFromDir to apply manifests to a cluster
Example:
Assess("verify service can be consumed", func(ctx context.Context, t *testing.T, c *envconf.Config) context.Context {
onboardingConfig, err := clusterutils.ConfigByPrefix("onboarding", "test")
if err != nil {
t.Error(err)
return ctx
}
objList, err := resources.CreateObjectsFromDir(ctx, onboardingConfig, "onboarding")
if err != nil {
t.Errorf("failed to create onboarding cluster objects: %v", err)
return ctx
}
for _, obj := range objList.Items {
if err := wait.For(openmcpconditions.Match(&obj, onboardingConfig, "Ready", corev1.ConditionTrue)); err != nil {
t.Error(err)
}
}
return ctx
})
Best Practices
For simple deployments
For simple deployments that don't require much orchestration, it is often easiest to implement the deployment generation and creation directly in Go code, using plain Deployment objects and ServiceAccount objects for access setup. For example:
// in authz package
// Configure adds a managed ClusterRoleBinding object to the given cluster.
// The passed in service account is granted the cluster-admin role.
func Configure(cluster resources.ManagedCluster, msa *authn.ManagedServiceAccount) {
crb := resources.NewManagedObject(&rbacv1.ClusterRoleBinding{
ObjectMeta: metav1.ObjectMeta{
Name: clusterRoleBindingName,
},
}, resources.ManagedObjectContext{
ReconcileFunc: func(_ context.Context, o client.Object) error {
oCRB := o.(*rbacv1.ClusterRoleBinding)
oCRB.Subjects = []rbacv1.Subject{
{
Kind: rbacv1.ServiceAccountKind,
Name: msa.Name,
Namespace: msa.Namespace,
},
}
oCRB.RoleRef = rbacv1.RoleRef{
APIGroup: rbacv1.GroupName,
Kind: "ClusterRole",
Name: "cluster-admin",
}
return nil
},
StatusFunc: resources.SimpleStatus,
})
cluster.AddObject(crb)
}
Then, during the CreateOrUpdate or Delete flows, you call them as follows:
// CreateOrUpdate is called on every add or update event
func (r *VeleroReconciler) CreateOrUpdate(ctx context.Context, obj *apiv1alpha1.Velero, pc *apiv1alpha1.ProviderConfig, clusters spruntime.ClusterContext) (ctrl.Result, error) {
l := log.FromContext(ctx)
spruntime.StatusProgressing(obj, "Reconciling", "Reconcile in progress")
err := r.ensureInstanceID(ctx, obj)
if err != nil {
return ctrl.Result{}, err
}
mgr, err := r.configResources(obj, pc, clusters)
if err != nil {
return ctrl.Result{}, err
}
results := mgr.Apply(ctx)
errRes := false
for _, r := range results {
if r.Error != nil {
l.Error(r.Error, objectutils.ObjectID(r.Object.GetObject()))
errRes = true
}
}
managedResources := resultsToResources(results)
obj.Status.Resources = managedResources
if allResourcesReady(managedResources) {
spruntime.StatusReady(obj)
}
if errRes {
return ctrl.Result{}, errors.New("reconciliation result contains errors")
}
return ctrl.Result{}, nil
}
Where configResources calls each individual package's Configure function like this:
namespace.Configure(mcpCluster, resources.Orphan)
...
authz.Configure(mcpCluster, mcpServiceAccount)
...
deployment.ConfigureMcp(mcpCluster, images["velero"], instance.GetID(obj))
...
...
This will always reconcile all the objects handled by this service provider.
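The allResourcesReady helper used in CreateOrUpdate above could look like the following sketch over a reduced ManagedResource type (the real Velero implementation may differ):

```go
package main

import "fmt"

// InstancePhase mirrors the phase type from the status section above.
type InstancePhase string

const Ready InstancePhase = "Ready"

// ManagedResource is reduced here to the single field the check needs.
type ManagedResource struct {
	Phase InstancePhase
}

// allResourcesReady reports whether every managed resource has reached
// the Ready phase; an empty list counts as ready.
func allResourcesReady(resources []ManagedResource) bool {
	for _, r := range resources {
		if r.Phase != Ready {
			return false
		}
	}
	return true
}

func main() {
	mixed := []ManagedResource{{Phase: Ready}, {Phase: "Progressing"}}
	fmt.Println(allResourcesReady(mixed))
}
```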
For complex deployments with lots of moving parts
Now consider a more complex deployment with many knobs and switches to set during installation. Managing such a deployment from Go code would be cumbersome and error-prone.
We recommend using Flux (or any other deployment-handling operator that understands Helm) for such deployments.
At the time of this writing, a Flux provider implementation is in the works.
Simply create a HelmRelease object with all the bells and whistles, optionally configured via the ProviderConfig.
Here is a plain example of an OCIRepository pointing to a configurable URL, with a configurable version coming from either the ServiceProviderAPI or the ProviderConfig:
// createOciRepository creates a repository pointing to the helm repo containing the to be deployed chart.
func createOciRepository(url, version string) *sourcev1.OCIRepository {
return &sourcev1.OCIRepository{
ObjectMeta: metav1.ObjectMeta{
Name: OCIRepositoryName,
Namespace: metav1.NamespaceDefault,
},
Spec: sourcev1.OCIRepositorySpec{
Interval: metav1.Duration{Duration: time.Minute},
URL: url,
Reference: &sourcev1.OCIRepositoryRef{
Tag: version,
},
},
}
}
// createHelmRelease creates a release for a chart with some optional helm values.
func createHelmRelease() (*helmv2.HelmRelease, error) {
values := make(map[string]interface{})
values["manager"] = map[string]interface{}{
"concurrency": map[string]interface{}{
"resource": 21,
},
"logging": map[string]interface{}{
"level": "debug",
},
}
content, err := json.Marshal(values)
if err != nil {
return nil, fmt.Errorf("failed to marshal helm value overrides: %w", err)
}
return &helmv2.HelmRelease{
ObjectMeta: metav1.ObjectMeta{
Name: HelmReleaseName,
Namespace: metav1.NamespaceDefault,
},
Spec: helmv2.HelmReleaseSpec{
Interval: metav1.Duration{Duration: time.Minute},
TargetNamespace: metav1.NamespaceDefault,
ChartRef: &helmv2.CrossNamespaceSourceReference{
Kind: "OCIRepository",
Name: OCIRepositoryName,
Namespace: metav1.NamespaceDefault,
},
Values: &apiextensionsv1.JSON{Raw: content},
// ServiceAccountName: --> could be created first and then passed in here.
KubeConfig: &meta.KubeConfigReference{
ConfigMapRef: &meta.LocalObjectReference{},
SecretRef: &meta.SecretKeyReference{},
},
},
}, nil
}
Here, the values could, for example, come from the ProviderConfig. Further image access could be configured via service accounts and secrets created as a first step.
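As an illustration, the hard-coded values map from createHelmRelease could be derived from operator-provided settings instead. The parameters here (concurrency, logLevel) are hypothetical knobs, not part of any real API:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// helmValuesFromConfig derives the Helm value overrides from configurable
// inputs instead of hard-coding them. The value structure matches the
// createHelmRelease example; the parameters are hypothetical knobs that
// could come from the ProviderConfig.
func helmValuesFromConfig(concurrency int, logLevel string) ([]byte, error) {
	values := map[string]interface{}{
		"manager": map[string]interface{}{
			"concurrency": map[string]interface{}{"resource": concurrency},
			"logging":     map[string]interface{}{"level": logLevel},
		},
	}
	content, err := json.Marshal(values)
	if err != nil {
		return nil, fmt.Errorf("failed to marshal helm value overrides: %w", err)
	}
	return content, nil
}

func main() {
	raw, err := helmValuesFromConfig(21, "debug")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(raw))
}
```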
Also notice the KubeConfig section. Your service provider controller is deployed in the platform cluster, while the target controller that your provider deploys runs in either the MCP or the workload cluster. This is your choice; however, it is recommended to deploy into the workload cluster, as documented here. This means that Flux needs a KubeConfig in order to deploy to the ManagedControlPlane cluster, and this is the KubeConfig given to Flux in this section.
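In manifest form, pointing Flux at the target cluster via a kubeconfig secret looks roughly like this (a fragment; the secret name and key are placeholders):

```yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: my-release
spec:
  # ...
  kubeConfig:
    secretRef:
      name: mcp-kubeconfig  # secret containing the target cluster's kubeconfig
      key: kubeconfig
```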
Note that, as of this writing, it is a bit difficult to obtain the KubeConfig for the workload cluster. The MCP cluster KubeConfig is obtained via an AccessRequest. For an example, see how service-provider-crossplane does this.