Machine Controller
A Machine is the declarative spec for a Node, as represented in Kubernetes
core. If a new Machine object is created, a provider-specific controller will
handle provisioning and installing a new host to register as a new Node
matching the Machine spec. If the Machine's spec is updated, a
provider-specific controller is responsible for updating the Node in-place or
replacing the host with a new one matching the updated spec. If a Machine
object is deleted, the corresponding Node should have its external resources
released by the provider-specific controller, and should be deleted as well.
Machines can be associated with a Cluster using a custom label,
cluster.k8s.io/cluster-name. When the label is set and non-empty, it must
reference the name of a cluster residing in the same namespace. The label must
be set only once and updates are not permitted; an admission controller is
going to enforce this in a future version.
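The label rules above can be sketched in a few lines of Go. This is an illustration only: clusterNameFor and validateLabelUpdate are hypothetical helpers, not part of the Cluster API, and the eventual admission controller may behave differently.

```go
package main

import (
	"errors"
	"fmt"
)

// clusterLabel is the label key named in the text above.
const clusterLabel = "cluster.k8s.io/cluster-name"

// clusterNameFor returns the cluster a Machine's labels point at.
// Hypothetical helper: an unset or empty label means "no cluster".
func clusterNameFor(labels map[string]string) (string, bool) {
	name := labels[clusterLabel]
	return name, name != ""
}

// validateLabelUpdate rejects changes to a cluster label that was already set.
// Hypothetical helper modelling the "set only once" rule.
func validateLabelUpdate(oldLabels, newLabels map[string]string) error {
	oldName, wasSet := clusterNameFor(oldLabels)
	if !wasSet {
		return nil // first assignment is allowed
	}
	if newName, _ := clusterNameFor(newLabels); newName != oldName {
		return errors.New(clusterLabel + " is immutable once set")
	}
	return nil
}

func main() {
	oldLabels := map[string]string{clusterLabel: "prod"}
	newLabels := map[string]string{clusterLabel: "staging"}
	fmt.Println(validateLabelUpdate(nil, oldLabels))       // first assignment
	fmt.Println(validateLabelUpdate(oldLabels, newLabels)) // attempted change
}
```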
Machine
Machine has 4 fields:
- Spec contains the desired machine state specified by the object. While much
  of the Spec is defined by users, unspecified parts may be filled in with
  defaults or by Controllers such as autoscalers.
- Status contains only observed machine state and is only written by
  controllers. Status is not the source of truth for any information, but
  instead aggregates and publishes observed state.
- TypeMeta contains metadata about the API itself, such as Group, Version,
  and Kind.
- ObjectMeta contains metadata about the specific object instance, for
  example its name, namespace, labels, and annotations. ObjectMeta contains
  data common to most objects.
// Machine is the Schema for the machines API
// +k8s:openapi-gen=true
// +kubebuilder:resource:shortName=ma
// +kubebuilder:subresource:status
// +kubebuilder:printcolumn:name="ProviderID",type="string",JSONPath=".spec.providerID",description="Provider ID"
// +kubebuilder:printcolumn:name="Phase",type="string",JSONPath=".status.phase",description="Machine status such as Terminating/Pending/Running/Failed etc"
// +kubebuilder:printcolumn:name="NodeName",type="string",JSONPath=".status.nodeRef.name",description="Node name associated with this machine",priority=1
type Machine struct {
metav1.TypeMeta `json:",inline"`
metav1.ObjectMeta `json:"metadata,omitempty"`
Spec MachineSpec `json:"spec,omitempty"`
Status MachineStatus `json:"status,omitempty"`
}
MachineSpec
The ProviderSpec is recommended to be a serialized API object in a format
owned by that provider. This will allow the configuration to be strongly typed,
versioned, and have as much nested depth as appropriate. These provider-specific
API definitions are meant to live outside of the Machine API, which will allow
them to evolve independently of it. Attributes like instance type, which
network to use, and the OS image all belong in the ProviderSpec.
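As a rough sketch of what "a serialized API object in a format owned by that provider" can look like, the example below decodes raw JSON into a typed, versioned struct. ExampleProviderConfig, its fields, and the example.com/v1alpha1 version string are all hypothetical; a real provider defines its own types and would normally use its own decoding machinery.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ExampleProviderConfig is a hypothetical provider-owned, versioned type.
// It lives outside the Machine API and can evolve independently of it.
type ExampleProviderConfig struct {
	APIVersion   string `json:"apiVersion"`
	Kind         string `json:"kind"`
	InstanceType string `json:"instanceType"`
	Network      string `json:"network"`
	Image        string `json:"image"`
}

// decodeProviderConfig turns the serialized bytes carried in a Machine's
// provider spec into the strongly typed provider object.
func decodeProviderConfig(raw []byte) (*ExampleProviderConfig, error) {
	var cfg ExampleProviderConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return nil, fmt.Errorf("decoding provider config: %w", err)
	}
	if cfg.Kind != "ExampleProviderConfig" {
		return nil, fmt.Errorf("unexpected kind %q", cfg.Kind)
	}
	return &cfg, nil
}

func main() {
	raw := []byte(`{
		"apiVersion": "example.com/v1alpha1",
		"kind": "ExampleProviderConfig",
		"instanceType": "small",
		"network": "default",
		"image": "os-image-1"
	}`)
	cfg, err := decodeProviderConfig(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(cfg.InstanceType, cfg.Network, cfg.Image)
}
```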
Some providers and tooling depend on an annotation being set on the Machine
to determine whether provisioning has completed. For example, the clusterctl
command performs the following check:
// TODO: update once machine controllers have a way to indicate a machine has been provisioned. https://github.com/kubernetes-sigs/cluster-api/issues/253
// Seeing a node cannot be purely relied upon because the machine running the control plane
// will not be registering with the stack that provisions it.
ready := m.Status.NodeRef != nil || len(m.Annotations) > 0
return ready, nil
// MachineSpec defines the desired state of Machine
type MachineSpec struct {
// ObjectMeta will autopopulate the Node created. Use this to
// indicate what labels, annotations, name prefix, etc., should be used
// when creating the Node.
// +optional
metav1.ObjectMeta `json:"metadata,omitempty"`
// The list of the taints to be applied to the corresponding Node in additive
// manner. This list will not overwrite any other taints added to the Node on
// an ongoing basis by other entities. These taints should be actively reconciled
// (e.g. if you ask the machine controller to apply a taint and then manually remove
// the taint, the machine controller will put it back), but the machine controller
// should not remove any other taints.
// +optional
Taints []corev1.Taint `json:"taints,omitempty"`
// ProviderSpec details Provider-specific configuration to use during node creation.
// +optional
ProviderSpec ProviderSpec `json:"providerSpec"`
// Versions of key software to use. This field is optional at cluster
// creation time, and omitting the field indicates that the cluster
// installation tool should select defaults for the user. These
// defaults may differ based on the cluster installer, but the tool
// should populate the values it uses when persisting Machine objects.
// A Machine spec missing this field at runtime is invalid.
// +optional
Versions MachineVersionInfo `json:"versions,omitempty"`
// ConfigSource is used to populate the associated Node for dynamic kubelet config. This
// field already exists in Node, so any updates to it in the Machine
// spec will be automatically copied to the linked NodeRef from the
// status. The rest of dynamic kubelet config support should then work
// as-is.
// +optional
ConfigSource *corev1.NodeConfigSource `json:"configSource,omitempty"`
// ProviderID is the identification ID of the machine provided by the provider.
// This field must match the provider ID as seen on the node object corresponding to this machine.
// This field is required by higher level consumers of cluster-api. Example use case is cluster autoscaler
// with cluster-api as provider. Clean-up logic in the autoscaler compares machines to nodes to find out
// machines at provider which could not get registered as Kubernetes nodes. With cluster-api as a
// generic out-of-tree provider for autoscaler, this field is required by autoscaler to be
// able to have a provider view of the list of machines. Another list of nodes is queried from the k8s apiserver
// and then a comparison is done to find unregistered machines, which are marked for deletion.
// This field will be set by the actuators and consumed by higher level entities like autoscaler that will
// be interfacing with cluster-api as generic provider.
// +optional
ProviderID *string `json:"providerID,omitempty"`
}
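The additive taint behaviour described in the Taints comment can be illustrated with a small merge function. This is a simplified sketch: the local Taint type stands in for corev1.Taint, taints are compared by full equality, and mergeTaints is a hypothetical helper rather than the machine controller's actual code.

```go
package main

import "fmt"

// Taint is a simplified stand-in for corev1.Taint.
type Taint struct {
	Key    string
	Value  string
	Effect string
}

// mergeTaints returns the Node's taints plus any Machine taints that are
// missing. Taints added to the Node by other entities are never removed.
func mergeTaints(nodeTaints, machineTaints []Taint) []Taint {
	merged := append([]Taint(nil), nodeTaints...)
	for _, want := range machineTaints {
		found := false
		for _, have := range merged {
			if have == want {
				found = true
				break
			}
		}
		if !found {
			merged = append(merged, want)
		}
	}
	return merged
}

func main() {
	// The Node carries a taint set by some other entity.
	node := []Taint{{Key: "maintenance", Value: "true", Effect: "NoSchedule"}}
	// The Machine spec asks for one additional taint.
	machine := []Taint{{Key: "dedicated", Value: "gpu", Effect: "NoSchedule"}}

	node = mergeTaints(node, machine)
	fmt.Println(len(node))

	// Reconciling again is a no-op: the desired taint is already present.
	node = mergeTaints(node, machine)
	fmt.Println(len(node))
}
```

Calling mergeTaints on every reconcile also restores a Machine taint that someone removed from the Node by hand, which matches the "put it back" behaviour described in the comment.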
MachineStatus
Like ProviderSpec, ProviderStatus is recommended to be a serialized API
object in a format owned by that provider.
Note that NodeRef may not be set. This can happen if the Machine and
corresponding Node are not within the same cluster. Two reasons this might be
the case are:
- During bootstrapping, the control plane Machine will initially not be in the
  same cluster which is being created.
- Some providers distinguish between manager and managed clusters. For these
  providers a Machine and its corresponding Node may never be within the same
  cluster. TODO: There are open issues to address this.
// MachineStatus defines the observed state of Machine
type MachineStatus struct {
// NodeRef will point to the corresponding Node if it exists.
// +optional
NodeRef *corev1.ObjectReference `json:"nodeRef,omitempty"`
// LastUpdated identifies when this status was last observed.
// +optional
LastUpdated *metav1.Time `json:"lastUpdated,omitempty"`
// Versions specifies the current versions of software on the corresponding Node (if it
// exists). This is provided for a few reasons:
//
// 1) It is more convenient than checking the NodeRef, traversing it to
// the Node, and finding the appropriate field in Node.Status.NodeInfo
// (which uses different field names and formatting).
// 2) It removes some of the dependency on the structure of the Node,
// so that if the structure of Node.Status.NodeInfo changes, only
// machine controllers need to be updated, rather than every client
// of the Machines API.
// 3) There is no other simple way to check the control plane
// version. A client would have to connect directly to the apiserver
// running on the target node in order to find out its version.
// +optional
Versions *MachineVersionInfo `json:"versions,omitempty"`
// ErrorReason will be set in the event that there is a terminal problem
// reconciling the Machine and will contain a succinct value suitable
// for machine interpretation.
//
// This field should not be set for transient errors that a controller
// faces that are expected to be fixed automatically over
// time (like service outages), but instead indicate that something is
// fundamentally wrong with the Machine's spec or the configuration of
// the controller, and that manual intervention is required. Examples
// of terminal errors would be invalid combinations of settings in the
// spec, values that are unsupported by the controller, or the
// responsible controller itself being critically misconfigured.
//
// Any transient errors that occur during the reconciliation of Machines
// can be added as events to the Machine object and/or logged in the
// controller's output.
// +optional
ErrorReason *common.MachineStatusError `json:"errorReason,omitempty"`
// ErrorMessage will be set in the event that there is a terminal problem
// reconciling the Machine and will contain a more verbose string suitable
// for logging and human consumption.
//
// This field should not be set for transient errors that a controller
// faces that are expected to be fixed automatically over
// time (like service outages), but instead indicate that something is
// fundamentally wrong with the Machine's spec or the configuration of
// the controller, and that manual intervention is required. Examples
// of terminal errors would be invalid combinations of settings in the
// spec, values that are unsupported by the controller, or the
// responsible controller itself being critically misconfigured.
//
// Any transient errors that occur during the reconciliation of Machines
// can be added as events to the Machine object and/or logged in the
// controller's output.
// +optional
ErrorMessage *string `json:"errorMessage,omitempty"`
// ProviderStatus details a Provider-specific status.
// It is recommended that providers maintain their
// own versioned API types that should be
// serialized/deserialized from this field.
// +optional
ProviderStatus *runtime.RawExtension `json:"providerStatus,omitempty"`
// Addresses is a list of addresses assigned to the machine. Queried from cloud provider, if available.
// +optional
Addresses []corev1.NodeAddress `json:"addresses,omitempty"`
// Conditions lists the conditions synced from the node conditions of the corresponding node-object.
// Machine-controller is responsible for keeping conditions up-to-date.
// MachineSet controller will be taking these conditions as a signal to decide if
// machine is healthy or needs to be replaced.
// Refer: https://kubernetes.io/docs/concepts/architecture/nodes/#condition
// +optional
Conditions []corev1.NodeCondition `json:"conditions,omitempty"`
// LastOperation describes the last-operation performed by the machine-controller.
// This API should be useful as a history in terms of the latest operation performed on the
// specific machine. It should also convey the state of the latest-operation for example if
// it is still on-going, failed or completed successfully.
// +optional
LastOperation *LastOperation `json:"lastOperation,omitempty"`
// Phase represents the current phase of machine actuation.
// E.g. Pending, Running, Terminating, Failed etc.
// +optional
Phase *string `json:"phase,omitempty"`
}
// LastOperation represents the detail of the last performed operation on the MachineObject.
type LastOperation struct {
// Description is the human-readable description of the last operation.
Description *string `json:"description,omitempty"`
// LastUpdated is the timestamp at which LastOperation API was last-updated.
LastUpdated *metav1.Time `json:"lastUpdated,omitempty"`
// State is the current status of the last performed operation.
// E.g. Processing, Failed, Successful etc
State *string `json:"state,omitempty"`
// Type is the type of operation which was last performed.
// E.g. Create, Delete, Update etc
Type *string `json:"type,omitempty"`
}
Machine Actuator Interface
All methods should be idempotent. Each time the Machine controller attempts to reconcile the state it will call one or more of the following actuator methods.
- Create() will only be called when Exists() returns false.
- Update() will only be called when Exists() returns true.
- Delete() will only be called when the Machine is in the process of being
  deleted.
The definition of Exists() is determined by the provider.
TODO: Provide more guidance on Exists().
// Actuator controls machines on a specific infrastructure. All
// methods should be idempotent unless otherwise specified.
type Actuator interface {
// Create the machine.
Create(context.Context, *clusterv1.Cluster, *clusterv1.Machine) error
// Delete the machine. If no error is returned, it is assumed that all dependent resources have been cleaned up.
Delete(context.Context, *clusterv1.Cluster, *clusterv1.Machine) error
// Update the machine to the provided definition.
Update(context.Context, *clusterv1.Cluster, *clusterv1.Machine) error
// Checks if the machine currently exists.
Exists(context.Context, *clusterv1.Cluster, *clusterv1.Machine) (bool, error)
}
Machine Controller Semantics
- Determine the Cluster associated with the Machine from its
  cluster.k8s.io/cluster-name label.
- If the Machine hasn't been deleted and doesn't have a finalizer, add one.
- If the Machine is being deleted and there is no finalizer, we're done.
  Otherwise:
  - Check if the Machine is allowed to be deleted. [1]
  - Call the provider-specific actuator's Delete() method.
    - If the Delete() method returns no error, remove the finalizer.
- Check if the Machine exists by calling the provider-specific Exists() method.
  - If it does, call the Update() method.
  - If the Update() fails and returns a retryable error:
    - Retry the Update() after N seconds.
- If the machine does not exist, attempt to create it by calling the
  actuator's Create() method.
The Machine actuator methods expect both a Cluster and a Machine to be
passed in. While there is not a strong link between Clusters and Machines,
the machine controller will determine which cluster to pass by looking for a
Cluster in the same namespace as the Machine.
There are two consequences of this:
- The machine actuator assumes there will be exactly one Cluster in the same
  namespace as any Machines it reconciles. See getCluster() for the details.
- If the Cluster is deleted before the Machine, it will not be possible to
  delete the Machine. Therefore Machines must be deleted before Clusters.
[Figure: machine reconciliation logic]
[Figure: machine deletion block]
[Figure: machine object creation sequence]
[Figure: machine object deletion sequence]
[1] One reason a Machine may not be deleted is if it corresponds to the
node running the Machine controller.