[k8s] OpenInfra & Cloud Native Day Korea 2022

Hayley Shim 2023. 10. 29. 00:34

These are my notes on the sessions I found most interesting at OpenInfra & Cloud Native Day Korea 2022, held in November 2022.

Session) The Future of Service Mesh

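Service mesh adoption in Korea

Service mesh difficulties : hard to apply security, increased complexity, managing multi-cluster / multi-deployment environments
Obstacles to adopting Istio : in-house technical understanding, higher TCO, managing multi-cluster / multi-deployment environments

Service mesh solutions

solo.io : solutions and services based on Istio and Envoy; "the future of Istio is sidecar-less"
Reference : https://www.solo.io/blog/ebpf-for-service-mesh/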

— our enterprise service-mesh product (Gloo Mesh Enterprise) with eBPF to optimize the functionality around networking, observability, and security.

— A service mesh provides complex application-networking behaviors for services such as service discovery, traffic routing, resilience (timeout/retry/circuit breaking), authentication/authorization, observability (logging/metrics/tracing) and more.

— eBPF is Turing incomplete.

— eBPF is ideal for O(1) complexity (such as inspecting a packet, manipulating some bits, and sending it on its way). Implementing complex protocols like HTTP/2 and gRPC can be O(n) complexity and very difficult to debug

— For the complexities of Layer 7, Envoy remains the data plane for the service mesh.

— we see eBPF as a powerful way to optimize the service mesh, and we see Envoy proxy as the cornerstone of the data plane.

Ambient Mesh : splits mesh functionality into L4 and L7 layers and pushes more functionality into the CNI

Session) Beyond Applications: Managing Infrastructure and Kubernetes Infrastructure with GitOps

How to build a k8s cluster

1. infra provisioning : managed k8s service with IaC

2. k8s cluster bootstrapping : kubeadm, kubespray (https://kubernetes.io/ko/docs/setup/production-environment/tools/kubespray/)

3. addon

Built around kubespray, with additional Ansible playbooks added and integrated

Managed per Ansible inventory

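Problems

Works well on-prem and on VMs, but changes are applied only when a playbook runs, so manual changes to the environment conflict with later re-runs
Hard to compare, track, and reconcile what is in the git repository against what is actually applied to the environment
Inefficient and burdensome to maintain when expanding to public cloud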

Improving application deployment

Use decapod to deploy and manage apps and services in a GitOps fashion
GitOps advantage : the contents of git match the actual environment
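The GitOps property noted above (git matches the running environment) can be pictured as a drift check. Here is a minimal Python sketch, assuming the desired and actual state are available as plain dictionaries. `find_drift` and the sample keys are hypothetical illustrations and do not show how decapod or any GitOps controller is actually implemented.

```python
def find_drift(desired: dict, actual: dict) -> dict:
    """Return {key: (desired_value, actual_value)} for every key that differs."""
    drift = {}
    for key in sorted(desired.keys() | actual.keys()):
        want = desired.get(key)
        have = actual.get(key)
        if want != have:
            drift[key] = (want, have)
    return drift


# Hypothetical example: someone changed the replica count by hand on the cluster.
desired_state = {"image": "app:1.4.0", "replicas": 3}  # what git says
actual_state = {"image": "app:1.4.0", "replicas": 5}   # what is running
print(find_drift(desired_state, actual_state))
```

A GitOps tool would typically run a comparison like this continuously and then either report the drift or re-apply the state from git.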

Cluster API

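Cluster API is a Kubernetes subproject that provides declarative APIs and tooling for provisioning, upgrading, and operating Kubernetes clusters

Cluster API advantages

Not only the Kubernetes cluster itself but also the underlying infrastructure (virtual machines, networks, load balancers, VPCs) is defined the same way applications are deployed and managed on Kubernetes, so clusters can be deployed consistently across different infrastructure environments
Combined with GitOps, version and state management for the whole system is unified, and checking history or rolling back becomes much easier
In addition, final changes are applied through pull requests and reviews, which gives a chance to catch and correct operational mistakes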

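Session) Scalable Operations on Kubernetes (HPA)

HPA

One of the tools for solving scalability problems

No-autoscaling vs Autoscaling

Autoscaling automatically scales pods out and in according to traffic

How HPA works

Load is generated -> HPA performs autoscaling -> newly created pods also handle load -> load injection stops -> load drops and the pod count decreases to the desired state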
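The scale-out and scale-in flow above follows the replica formula given in the Kubernetes HPA documentation: desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue). Below is a minimal Python sketch of that formula. The function name is hypothetical, and the sketch deliberately ignores minReplicas/maxReplicas, stabilization windows, and not-yet-ready pods. The 0.1 tolerance is the documented default, which a cluster may override.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, tolerance: float = 0.1) -> int:
    """Replica count suggested by the basic HPA formula."""
    ratio = current_metric / target_metric
    # Within the tolerance band HPA leaves the replica count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


# Load is injected: average CPU utilization is 90% against a 50% target.
print(desired_replicas(4, 90, 50))
# Load injection stops: utilization falls to 20% against the same target.
print(desired_replicas(8, 20, 50))
```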

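Benefits of HPA

Reliability : responds appropriately to user and application load
Cost savings
Operational automation : automatic operation as load rises and falls, without administrator intervention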

How HPA works (components)

HPA horizontal controller
HPA Metric client : resourceclient, custom_metrics, external_metrics
HPA Conditions : the metric client referenced by the horizontal controller

[Reference]

HPA Spec

HPA CPU / Memory
HPA conditions : show whether autoscaling is currently possible; autoscaling is disabled when replicas=0 is set

HPA Tradeoff

Responsiveness vs cluster overhead
Responsiveness : average CPU usage does not always reflect load
Responsiveness : detection delay. HPA sync and metrics-server scraping introduce delay
A higher target utilization reduces cost but lowers responsiveness
Choose according to the characteristics of the service
Keep image size small, keep startup time short, keep readiness check short
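The cost versus responsiveness tradeoff of the target utilization can be made concrete with a little arithmetic. The sketch below is a simplified model under my own assumptions: steady load, identical pods, and CPU utilization measured against the pod's CPU request. The function names and the sample numbers are hypothetical.

```python
import math


def replicas_for_load(total_cpu_millicores: int, request_millicores: int,
                      target_percent: int) -> int:
    """Replicas needed to keep average utilization at the target for a steady load."""
    return math.ceil(total_cpu_millicores * 100 / (request_millicores * target_percent))


def headroom(target_percent: int) -> float:
    """Fraction by which load can grow before pods exceed their CPU request."""
    return 100 / target_percent - 1


# Same workload (4000m CPU in total, 500m request per pod) at two targets.
for target in (50, 80):
    print(target, replicas_for_load(4000, 500, target), headroom(target))
```

With the lower target the workload runs on more pods (higher cost) but each pod can absorb a larger increase in load while HPA is still detecting it. The higher target is cheaper but leaves less room during the detection delay.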

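Reliability tip

With HPA, pods are terminated and created frequently. Tips for maintaining reliability in that situation:

1. Graceful shutdown

1) Each framework supports graceful shutdown.

2) Can also be addressed with lifecycle and terminationGracePeriodSeconds settings

2. Replicas of deployment : the value set by HPA can be overwritten by the deployment manifest. Remove the replicas attribute from the manifest to avoid this

3. spike issue : HPA cannot respond to short-lived peaks. A Burstable QoS resource strategy can absorb them

4. readiness probe : without one, a pod that is not yet ready receives traffic and connection refused errors occur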
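As an illustration of the graceful shutdown tip, here is a minimal Python sketch. It assumes a worker loop that handles one request at a time. `serve`, `handle_one_request`, and `max_iterations` are hypothetical names for this example. The relevant Kubernetes behaviour is that the container first receives SIGTERM and is only killed once terminationGracePeriodSeconds (30 seconds by default) has elapsed.

```python
import signal
import threading

shutdown_requested = threading.Event()


def handle_sigterm(signum, frame):
    """Record the shutdown request instead of exiting in the middle of a request."""
    shutdown_requested.set()


def serve(handle_one_request, max_iterations=None):
    """Handle requests until shutdown is requested; return how many were handled."""
    handled = 0
    while not shutdown_requested.is_set():
        if max_iterations is not None and handled >= max_iterations:
            break
        handle_one_request()  # the in-flight request always runs to completion
        handled += 1
    return handled


if __name__ == "__main__":
    # Register the handler from the main thread, then run a short demo loop.
    signal.signal(signal.SIGTERM, handle_sigterm)
    print(serve(lambda: None, max_iterations=3))
```

The loop checks the flag only between requests, so work in progress finishes before the process exits. The grace period should be longer than the slowest request.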

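Session) A Look at the Conditions Property of Kubernetes Objects

General structure of a k8s object

apiVersion : defines the schema version the object represents
kind : specifies the REST resource the object represents, e.g. deployment, configmap
metadata : metadata describing the object itself

— name (a name unique within the namespace)

— namespace

— labels (used to distinguish groups of related objects)

— annotations (used to attach additional information to the object)

— resourceVersion (a string identifier that changes every time the object is modified)

— generation (a number that increases every time the object's spec changes)

— ownerReferences (used to express dependency relationships between k8s objects)

spec & status
controller : differs from an operator in whether a custom resource is used
operator : uses a custom resource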

Conditions structure per object

A list whose items are dictionaries, each describing one state
Some objects have a condition structure with additional fields
nodeCondition : the NodeCondition struct represents the state of a Node object. It has its own types and fields for expressing node state (when healthy, type: Ready, status: "True")
podCondition : the PodCondition struct represents the state of a Pod object (when healthy, type: Ready, status: "True")
deploymentCondition : the DeploymentCondition struct represents the state of a Deployment object (ReplicaFailure : occurs on errors such as quota. When healthy, type: Available, status: "True")
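Because conditions are a list of dictionaries, reading one usually means searching the list by `type`. The sketch below assumes the object has already been loaded as a Python dictionary (for example from `kubectl get -o json`). The helper names and the sample pod are hypothetical. Note that `status` is one of the strings "True", "False", or "Unknown", not a boolean.

```python
from typing import Optional


def get_condition(obj: dict, condition_type: str) -> Optional[dict]:
    """Return the condition entry with the given type, or None if it is absent."""
    for condition in obj.get("status", {}).get("conditions", []):
        if condition.get("type") == condition_type:
            return condition
    return None


def is_condition_true(obj: dict, condition_type: str) -> bool:
    """True only when the condition exists and its status is the string "True"."""
    condition = get_condition(obj, condition_type)
    return condition is not None and condition.get("status") == "True"


# Hypothetical pod that has been scheduled but is not ready yet.
pod = {"status": {"conditions": [
    {"type": "PodScheduled", "status": "True"},
    {"type": "Ready", "status": "False", "reason": "ContainersNotReady"},
]}}
print(is_condition_true(pod, "PodScheduled"), is_condition_true(pod, "Ready"))
```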

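How condition properties change by situation

nodeCondition : MemoryPressure, DiskPressure, PIDPressure, NetworkUnavailable

— available memory < 100Mi

— available space on the node's main filesystem < 10%

— available space on the container image and container filesystem < 15%

— free inodes on the node's main filesystem < 5%

— allocatable PIDs remaining on the node < 10%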
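The thresholds above can be read as a small decision table. The following Python sketch evaluates them for one node. The function and parameter names are hypothetical. The grouping of the filesystem and inode checks under DiskPressure follows my understanding of the kubelet eviction signals. A real kubelet may be configured with different thresholds.

```python
def node_pressure_conditions(memory_available_mi: float,
                             nodefs_available_pct: float,
                             imagefs_available_pct: float,
                             nodefs_inodes_free_pct: float,
                             pid_available_pct: float) -> dict:
    """Map node measurements to pressure conditions using the thresholds listed above."""
    disk_pressure = (
        nodefs_available_pct < 10       # main filesystem space
        or imagefs_available_pct < 15   # image / container filesystem space
        or nodefs_inodes_free_pct < 5   # main filesystem inodes
    )
    return {
        "MemoryPressure": memory_available_mi < 100,
        "DiskPressure": disk_pressure,
        "PIDPressure": pid_available_pct < 10,
    }


# Hypothetical node with only 80Mi of memory left and plenty of everything else.
print(node_pressure_conditions(80, 40, 40, 40, 40))
```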

podCondition : PodScheduled, Initialized, ContainersReady, Ready, PodHasNetwork

— PodScheduled : using NodeAffinity, etc.

— Initialized : the Initialized condition reflects a failure while running a container defined in initContainers

— ContainersReady : the container image could not be pulled, or the container failed

-> When the three conditions above are satisfied, the pod becomes Ready

deploymentCondition : Available, Progressing, ReplicaFailure

— ReplicaFailure : set when pod creation fails because of a transient error or insufficient quota

— Available : changes to True when the MinimumReplicasAvailable condition is met (spec.replicas, spec.strategy.rollingUpdate.maxUnavailable, ...)
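The Available condition can be approximated with the two spec fields mentioned above. This is a simplified sketch of the MinimumReplicasAvailable check, assuming the rule "available replicas >= spec.replicas - maxUnavailable" and that a percentage value of maxUnavailable is rounded down. The function names are hypothetical, and the real deployment controller handles further edge cases not covered here.

```python
def resolve_max_unavailable(max_unavailable, replicas: int) -> int:
    """Turn maxUnavailable (an int or a string like "25%") into a pod count, rounding down."""
    if isinstance(max_unavailable, str) and max_unavailable.endswith("%"):
        return replicas * int(max_unavailable[:-1]) // 100
    return int(max_unavailable)


def has_minimum_availability(available_replicas: int, replicas: int,
                             max_unavailable="25%") -> bool:
    """Simplified check behind the Available condition of a Deployment."""
    allowed_unavailable = resolve_max_unavailable(max_unavailable, replicas)
    return available_replicas >= replicas - allowed_unavailable


# 10 replicas with the default 25% maxUnavailable tolerates 2 unavailable pods.
print(has_minimum_availability(8, 10), has_minimum_availability(7, 10))
```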

Session) Handling the Flood of On-Call Issues from More Than 7,000 Clusters

One Large Cluster vs Lots of Small Clusters

-> Lots of Small Clusters

Isolation and security guarantees are mandatory for compliance
Ensures development freedom and per-cluster optimization

Private Kubernetes as a Service, DKOS (Datacenter of Kakao Operating System)

More than 7,000 clusters exist
On-call issues have grown accordingly

1. Too Many (idle) Clusters

Occupy computing resources

2. Lots of on-call Issues

Not every developer who uses k8s knows it well

3. Known issues being forgotten

Even though we know some of the issues right now, we forget them after a few years

>>> Automation for Operation -> We need a “Detection as a Code”

Surviving From Endless Issues Coming From 7K+ Kubernetes Clusters — Wanhae Lee & Seok-yong Hong, Kakao Corp

Collector / Detector -> Extensible Components
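To make the idea of detection as code with extensible detectors more tangible, here is a small Python sketch. It is entirely hypothetical: the registry, the detector names, and the snapshot fields are my own illustration and do not reflect Kakao's actual implementation. The point is that each known issue becomes a small function, so it is not forgotten, and new detectors can be added without touching the runner.

```python
from typing import Callable, Dict, List

Detector = Callable[[dict], List[str]]
DETECTORS: Dict[str, Detector] = {}


def detector(name: str):
    """Decorator that registers a detector function under a name."""
    def register(func: Detector) -> Detector:
        DETECTORS[name] = func
        return func
    return register


@detector("idle-cluster")
def detect_idle_cluster(snapshot: dict) -> List[str]:
    if snapshot.get("running_pods", 0) == 0:
        return ["cluster has no running workloads"]
    return []


@detector("not-ready-nodes")
def detect_not_ready_nodes(snapshot: dict) -> List[str]:
    return [f"node {name} is NotReady" for name in snapshot.get("not_ready_nodes", [])]


def run_detectors(snapshot: dict) -> Dict[str, List[str]]:
    """Run every registered detector against data gathered by a collector."""
    findings = {}
    for name, func in DETECTORS.items():
        result = func(snapshot)
        if result:
            findings[name] = result
    return findings


# Hypothetical collector output for one cluster.
print(run_detectors({"running_pods": 12, "not_ready_nodes": ["worker-3"]}))
```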

blog migration project

written in 2022.11.1

https://medium.com/techblog-hayleyshim/k8s-openinfra-cloud-native-day-korea-2022-a53db49d9efe
