Ref.: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/
First: Install metrics-server
Scales on only two nodes.
kubectl label nodes kind-worker2 php-apache-node=true
kubectl label nodes kind-worker3 php-apache-node=true
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
name: php-apache
spec:
selector:
matchLabels:
run: php-apache
template:
metadata:
labels:
run: php-apache
spec:
containers:
- name: php-apache
image: registry.k8s.io/hpa-example
ports:
- containerPort: 80
resources:
limits:
cpu: 500m
requests:
cpu: 200m
## nodeAffinity
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: php-apache-node
operator: In
values:
- "true"
## nodeSelector
# nodeSelector:
# php-apache-node: "true"
---
apiVersion: v1
kind: Service
metadata:
name: php-apache
labels:
run: php-apache
spec:
# ports:
# - port: 80
ports:
- port: 80
targetPort: 80
nodePort: 30009
selector:
run: php-apache
type: NodePort
EOF
Since each pod requests 200 milli-cores by kubectl run, this means an average CPU usage of 100 milli-cores. See Algorithm details for more details on the algorithm.
kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10
To do this, you’ll start a different Pod to act as a client. The container within the client Pod runs in an infinite loop, sending queries to the php-apache service.
Watch:
watch kubectl describe hpa php-apache
watch kubectl get hpa
Load:
kubectl run -i --tty load-generator --rm --image=busybox:1.28 --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://php-apache; done"
cat <<EOF | kubectl apply -f -
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: php-apache
namespace: default
spec:
behavior:
scaleDown:
policies:
- periodSeconds: 60
type: Pods
value: 1
selectPolicy: Max
stabilizationWindowSeconds: 0
scaleUp:
policies:
- periodSeconds: 60
type: Percent
value: 900
selectPolicy: Max
stabilizationWindowSeconds: 0
maxReplicas: 7
metrics:
- resource:
name: cpu
target:
averageUtilization: 50
type: Utilization
type: Resource
minReplicas: 1
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: php-apache
EOF
Ref.: https://kubernetes.io/docs/tasks/administer-cluster/manage-resources/cpu-constraint-namespace/
Units are in two formats:
1000m == 1 CPU
1.5 == 1.5 CPU
Limits and requets:
limits:
cpu: 500m <- Max limit for startup etc.
requests:
cpu: 200m <- Expected usage
apiVersion: v1
kind: LimitRange
metadata:
name: cpu-min-max-demo-lr
spec:
limits:
- max:
cpu: "800m"
min:
cpu: "200m"
type: Container
kubectl get services -o json | grep kubernetes.fqdn
kubectl -n kube-system get svc kube-dns
kubectl -n namespace get ep
periodSeconds The first policy (Pods) allows at most 4 replicas to be scaled down in one minute. The second policy (Percent) allows at most 10% of the current replicas to be scaled down in one minute.
behavior:
scaleDown:
policies:
- type: Pods
value: 4
periodSeconds: 60
- type: Percent
value: 10
periodSeconds: 60
Pods are removed by the HPA to 10% per minute OR no more than 5 Pods are removed per minute. selectPolicy Min means that the autoscaler chooses the policy that affects the smallest number of Pods.
behavior:
scaleDown:
policies:
- type: Percent
value: 10
periodSeconds: 60
- type: Pods
value: 5
periodSeconds: 60
selectPolicy: Min