Devcourse_Data Engineering

[Week10 Airflow] TIL Day 42: Using Airflow and Initial Setup

🪄Haru🪄 2023. 12. 12. 23:46

There are several ways to run Airflow.

Ⅰ. Ways to operate Airflow

1. Install and operate it yourself

1) Run Airflow from a Docker image (GCP)
2) Install it directly on a cloud provider's Linux server (AWS)

2. Use a managed service from a cloud provider

1) AWS

MWAA

2) GCP (Google Cloud Platform)

Cloud Composer

3) MS Azure

Airflow DAGs through Azure Data Factory
 
 
This post walks through the install-and-operate-it-yourself approach.

Ⅱ. Hands-on: installing and operating Airflow yourself

1. Run Airflow from a Docker image

Docker can be installed on a local computer, but running it there slows the machine down considerably.
So Docker is installed on a cloud server instead, and Airflow is pulled onto it as a Docker image.
This walkthrough uses a GCP Compute Engine VM.
(The goal is to get comfortable with both AWS and GCP.)
 

1) Create a VM instance

An N1 server in the Seoul region is used, paid for with free credits.
Under machine type, Custom was selected with the following values (vCPU: 4, Memory: 8 GB).

(Screenshot: machine configuration)

 
The boot disk uses Ubuntu 20.04.

(Screenshot: boot disk settings)

 
The firewall allows both HTTP and HTTPS traffic.

(Screenshot: firewall settings)

 

2) Install Docker on the server

(Screenshot: VM instance details)

 
Clicking the SSH button on the newly created instance opens an SSH session to the server in the browser.
 
① Update the system's package list
② Install the extra packages needed to download and install Docker (apt-transport-https, ca-certificates, curl, software-properties-common)
③ Download Docker's official GPG key and add it to the APT package manager
④ Add the repository that provides the Docker packages
⑤ Update the package list again so it includes the new Docker repository
⑥ Show the Docker versions available for installation
⑦ Install Docker Community Edition (CE) from that repository and enable the Docker engine

sudo apt update
sudo apt install apt-transport-https ca-certificates curl software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu focal stable"
sudo apt update
apt-cache policy docker-ce
sudo apt install docker-ce
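The release codename in the repository line should match the Ubuntu release on the server (focal for 20.04). As a rough sketch, the codename can be read from /etc/os-release instead of being typed by hand. The fallback value `focal` and the variable names are assumptions made for this example, and the last line is left commented out so that nothing is changed by accident.

```shell
# Read the release codename; fall back to "focal" (Ubuntu 20.04) if it cannot be read.
codename="$( . /etc/os-release 2>/dev/null && printf '%s' "${VERSION_CODENAME:-}" )"
codename="${codename:-focal}"

# Build the same repository line as above, with the detected codename.
repo_line="deb [arch=amd64] https://download.docker.com/linux/ubuntu ${codename} stable"
echo "${repo_line}"

# To register it, pass the line to add-apt-repository:
# sudo add-apt-repository "${repo_line}"
```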

 

3) Install Docker Compose on the server

① Download the Docker Compose binary (use version 2 or later; version 1 produces many errors)
② Give the downloaded Docker Compose binary execute permission
③ Add the current user to the docker group

sudo curl -L https://github.com/docker/compose/releases/download/v2.4.1/docker-compose-`uname -s`-`uname -m` -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose
sudo usermod -aG docker $USER

 

4) Set uid and gid as environment variables

User IDs and group IDs are recorded in /etc/passwd and /etc/group.
For the current user, id -u prints the user ID and id -g prints the group ID.

export AIRFLOW_UID=$(id -u)
export AIRFLOW_GID=$(id -g)
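Variables set with export only last for the current SSH session. Docker Compose also reads a `.env` file from the directory that holds docker-compose.yaml, so one option is to write the same two values there. This is a minimal sketch that assumes it is run from that directory; note that it overwrites any existing `.env`.

```shell
# Persist the IDs so Compose can pick them up in a later session as well.
# Caution: this replaces an existing .env file in the current directory.
env_file=".env"
printf 'AIRFLOW_UID=%s\nAIRFLOW_GID=%s\n' "$(id -u)" "$(id -g)" > "${env_file}"

# Show what was written.
cat "${env_file}"
```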

 
 

6) Build the environment from the Docker image

Clone the GitHub repository used for the setup

git clone https://github.com/keeyong/airflow-setup.git

 

Download the Docker Compose file for Apache Airflow

Move into the airflow-setup folder, then download the Docker Compose file.

cd airflow-setup
curl -LfO 'https://airflow.apache.org/docs/apache-airflow/2.5.1/docker-compose.yaml'

 
Edit the yaml file
Error 1

*** Log file does not exist: /opt/airflow/logs/~
*** Fetching from: http://:8793/log/dag_id=~
*** Failed to fetch log file from worker. Request URL is missing an 'http://' or 'https://' protocol.

 
Running a DAG in Docker produces the error above. There can be several causes,
but here it was resolved by editing part of the yaml file and granting permissions on the log folder inside the airflow-scheduler container.

 

- Edit part of docker/airflow-setup/docker-compose.yaml
List the modules to install in the following environment variable (these are all the modules used in the exercises). The variable passes additional package requirements to pip, the Python package manager, so the extra packages are installed automatically.

 _PIP_ADDITIONAL_REQUIREMENTS: ${_PIP_ADDITIONAL_REQUIREMENTS:- yfinance pandas numpy oauth2client gspread pymysql}
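The `${NAME:-default}` form in that line means "use NAME if it is set and non-empty, otherwise use the default", and Compose follows the same rule as the shell here. A small sketch of the behaviour in plain shell; `requests` is only a placeholder package chosen for the example.

```shell
default_pkgs="yfinance pandas numpy oauth2client gspread pymysql"

# Case 1: the variable is not set, so the default list is used.
unset _PIP_ADDITIONAL_REQUIREMENTS
when_unset="${_PIP_ADDITIONAL_REQUIREMENTS:-$default_pkgs}"
echo "unset: ${when_unset}"

# Case 2: the variable is set, so its value wins over the default list.
_PIP_ADDITIONAL_REQUIREMENTS="requests"
when_set="${_PIP_ADDITIONAL_REQUIREMENTS:-$default_pkgs}"
echo "set:   ${when_set}"
```

So exporting the variable (or putting it in `.env`) before starting the containers would replace the list in the yaml file rather than add to it.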

 

 

Error 2

Got permission denied while trying to connect to the Docker daemon socket

 

Pulling the Docker images produces the error above.

There can be several causes, but one possibility is that the user lacks the permissions needed to use Docker and Docker Compose.

 

1. Add the user to the docker group:

sudo usermod -aG docker $USER

 

2. Change the group ownership of the Docker daemon socket, then reboot the system

sudo chown :docker /var/run/docker.sock
sudo reboot
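A group change made with usermod only takes effect at the next login, which is why the error can persist until the reboot. After logging back in, membership can be checked with something like the following sketch; the helper name `user_in_group` is made up for this example.

```shell
# Succeed if the given user belongs to the given group.
user_in_group() {
    id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qx "$2"
}

if user_in_group "$(id -un)" docker; then
    echo "ok: $(id -un) is in the docker group"
else
    echo "not yet: log out and back in (or reboot) after running usermod"
fi
```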

 

Pull the Docker images and start the containers

Pull the Docker images, then start the containers from them.

docker-compose -f docker-compose.yaml pull
docker-compose -f docker-compose.yaml up -d

- The -d option
Runs in detached mode: the containers run in the background and are disconnected from the terminal.
This leaves the terminal free for other commands.
 
- Grant permissions on the log folder inside the airflow-scheduler container
① Find the CONTAINER ID of airflow-scheduler
docker ps
Use this command to get the CONTAINER ID of airflow-scheduler.
② Enter the container as the root user (replace CONTAINER_ID with the value from docker ps)

docker exec -u root -it CONTAINER_ID sh
③ Grant permissions on the log folder
sudo chmod -R 777 /opt/airflow/logs
 

7) Initial Airflow configuration

Log in to the airflow-scheduler container and change two values in airflow.cfg.
docker exec -it CONTAINER_ID sh
vim airflow.cfg
① [core] executor = LocalExecutor
② [database] sql_alchemy_conn = postgresql+psycopg2://airflow:airflow@localhost:5432/airflow
 

Then restart.

docker-compose restart
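Airflow also reads every airflow.cfg option from an environment variable named AIRFLOW__SECTION__KEY, and that variable takes precedence over the file. It may therefore be worth checking whether the containers already define the two options above. The sketch below only builds the variable names; the helper `airflow_env_name` is invented for this example.

```shell
# Build the environment-variable name Airflow uses for a config option.
airflow_env_name() {
    printf 'AIRFLOW__%s__%s\n' "$1" "$2" | tr '[:lower:]' '[:upper:]'
}

airflow_env_name core executor               # AIRFLOW__CORE__EXECUTOR
airflow_env_name database sql_alchemy_conn   # AIRFLOW__DATABASE__SQL_ALCHEMY_CONN

# To list what a running container actually has set (CONTAINER_ID from docker ps):
# docker exec CONTAINER_ID env | grep '^AIRFLOW__'
```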

 

8) Log in to the web UI

Make the GCP external IP address static

Under External IP addresses > Reserve external address, change the external address to a static IP address.
 

Change the firewall settings

Under VPC network > Firewall rules > Create firewall rule, allow access to port 8080.

(Screenshot: creating a firewall rule)

 

Rule details

  • Targets : All instances in the network
  • Source filter : IPv4 ranges
  • Source IPv4 ranges : 0.0.0.0/0
  • Protocols and ports : tcp - 8080
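The same rule can probably be created from the command line with gcloud instead of the console. The sketch below is a dry run that only prints the command. The rule name `allow-airflow-8080` and the `default` network are assumptions for this example, and 0.0.0.0/0 opens the port to everyone, so a narrower source range is safer if your own IP is known.

```shell
# Dry run: build the command and print it instead of executing it.
rule_name="allow-airflow-8080"   # hypothetical rule name
network="default"                # assumes the VM sits on the default VPC network
source_range="0.0.0.0/0"         # same as the console rule above; narrow it if possible

rule_cmd="gcloud compute firewall-rules create ${rule_name} --network=${network} --direction=INGRESS --allow=tcp:8080 --source-ranges=${source_range}"
echo "${rule_cmd}"

# To apply it, run the printed command in a shell where gcloud is authenticated.
```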

 

Access the Airflow UI through the static IP address

The screen below shows a successful connection to Airflow through the static IP address.

(Screenshot: Airflow web UI)

 

Reference)

 

Install Docker Engine on Ubuntu

Jumpstart your client-side server applications with Docker Engine on Ubuntu. This guide details prerequisites and multiple methods to install Docker Engine on Ubuntu.

docs.docker.com

 
 

2. Install Airflow directly on an AWS Linux server

This part uses AWS.
 

1) Create an EC2 server

Application and OS Images (Amazon Machine Image)

Use Ubuntu 20.04
 

Instance type

Select t3.small
 

Key pair

Windows 10 and later can use .pem files.

(Screenshot: EC2 key pair settings)

 

Network settings

(Screenshot: EC2 network settings)

 

2) Connect

Fix the key file permissions

If the file permissions are not changed, the following error occurs.
Permissions for 'airflow-dev.pem' are too open.
 
Follow the post below to fix the permissions.

 

Windows SSH connection errors (permissions ... too open, etc.)

On Linux, running chmod 400 "filename" on the pem file is enough. The "Permissions ... too open" error reads: Permissions for '.pem' are too open. It is required that your private key files are NOT accessible by others. This private key will be ignored.

rainbound.tistory.com

 

2)-1. [pem file Properties] - [Security] - [Advanced] - [Disable inheritance] - [Remove all inherited permissions from this object] - [Apply] - [OK]

 

2)-2. Alternatively, grant read and write permission only to yourself from cmd.

[cmd]
cd path\to\folder-with-the-pem-file
icacls keyname.pem /inheritance:r /grant:r "%USERNAME%:RW"
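On Linux or macOS the equivalent is the chmod 400 mentioned in the linked post. A small sketch that uses a temporary file as a stand-in for the real key, so it can be tried without touching anything important:

```shell
# Stand-in for the real .pem file; replace with the path to your key.
key_file="$(mktemp)"

# Owner may read; group and others get nothing.
chmod 400 "${key_file}"

# The first ten characters of ls -l show the resulting mode.
perms="$(ls -l "${key_file}" | cut -c1-10)"
echo "${perms}"   # -r--------

rm -f "${key_file}"
```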
 

3) Install Airflow on the EC2 server

ssh -i "keyname.pem" ubuntu@PUBLIC_IPV4_DNS
 
After that, follow this GitHub tutorial:
https://github.com/keeyong/airflow-setup/blob/main/docs/Airflow%202%20Installation.md

(In brief:)

Account name: ubuntu / airflow / postgresql

ubuntu

4) Update Ubuntu, install python3, update python-openssl
5) Install the MySQL-related modules, Airflow, and other modules
6) Create the airflow group and account
(user : airflow, password : airflow)
7) Install the metadata database (PostgreSQL)
10) Restart PostgreSQL
16) Register the webserver and scheduler as services → enable → start
(so that they keep running in the background and Airflow stays reachable after logging out of cmd)

airflow

11) Switch accounts
12) Create the dags folder
13) Initialize Airflow (init)
14) Configuration file
Change to LocalExecutor
Change the default DB to PostgreSQL
15) Re-initialize Airflow (init)
17) Create a login account for the Airflow webserver
(role : Admin, username : admin, password : admin4321)
18) Clone the DAG files from GitHub and save them in the airflow folder on the Ubuntu server
19) Set the airflow account's environment variable (AIRFLOW_HOME)

postgresql

8) Switch accounts
7) Create the airflow user in postgres
9) Create the airflow database

 

Step 10)

  • "executor" : change SequentialExecutor → LocalExecutor (SequentialExecutor is the one to use when SQLite is the default DB.)
  • "sql_alchemy_conn" : change sqlite → postgres
postgresql+psycopg2://new_user:new_password@new_host:new_port/new_database_name

The port number depends on the database in use (mysql : 3306, postgresql : 5432)

postgresql+psycopg2://airflow:airflow@localhost:5432/airflow
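Since a single typo in this string is enough to break the connection, it can help to assemble it from named parts. A minimal sketch using the values from this setup; the variable names are arbitrary.

```shell
# Parts of the connection string, using the values from this walkthrough.
db_user="airflow"
db_password="airflow"
db_host="localhost"
db_port="5432"        # PostgreSQL default port
db_name="airflow"

# driver://user:password@host:port/database
conn="postgresql+psycopg2://${db_user}:${db_password}@${db_host}:${db_port}/${db_name}"
echo "${conn}"
```

The printed value is what goes after `sql_alchemy_conn =` in airflow.cfg.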
 
 
After step 10), the following message appears.

 

Step 13)

After cloning the GitHub repository, copy every file in its airflow-setup/dags/ folder into the dags folder.

git clone https://github.com/keeyong/airflow-setup.git
cp -r airflow-setup/dags/* dags

 
Check the copied DAG files.

 

Step 14)

The default home directory of the airflow account is set as shown below.

 

20) Connect to the EC2 server from the web

Open port 8080 by adding an inbound rule to the instance's security group.
EC2 instance - Security - Inbound rules - click the security group
Select that security group, then Inbound rules - Edit inbound rules - Add rule
Add the following port.

 
http://EC2_DNS_ADDRESS:8080 can now be reached successfully.

(Screenshot: Airflow web access)

 
The list shows the three DAGs copied from GitHub earlier. (The rest are examples that Airflow loads itself.)

(Screenshot: list of added DAGs)

 
Opening a DAG shows its tasks and lets you activate it.
It consists of 2 tasks.

(Screenshot: task list of the DAG)

(Screenshot: activation button)

 
 
This session covered how to install and operate Airflow yourself.
I did not understand the lecture in one pass, but after deleting and rebuilding the server several times it sank in.
Airflow, Docker, GCP, and AWS were all unfamiliar services, so I was intimidated going in, and the various errors did cost a lot of time, but I got through it.
