์นดํ…Œ๊ณ ๋ฆฌ ์—†์Œ

[Week6 ๋ฐ์ดํ„ฐ ์›จ์–ดํ•˜์šฐ์Šค์™€ SQL ๊ธฐ๋ณธ] TIL 24์ผ์ฐจ - JOIN ์—ฐ์‚ฐ

๐Ÿช„ํ•˜๋ฃจ๐Ÿช„ 2023. 11. 17. 22:44
728x90

์ด๋ฒˆ์‹œ๊ฐ„์—๋Š” SQL์˜ JOIN ์—ฐ์‚ฐ์— ๋Œ€ํ•ด ์•Œ์•„๋ณด์ž.

0) JOIN ์ด๋ž€

๋‘ ๊ฐœ ์ด์ƒ์˜ ํ…Œ์ด๋ธ”์„ ํŠน์ • ์กฐ๊ฑด์— ๋งž์ถฐ์„œ merge ํ•˜๋Š” ๊ฒƒ.

์šฐ์„ , ๋‘ ๊ฐœ์˜ ํ…Œ์ด๋ธ”์ด๋ผ๊ณ  ์ƒ๊ฐํ•˜๊ณ  ์™ผ์ชฝ ํ…Œ์ด๋ธ”์„ left, ์˜ค๋ฅธ์ชฝ ํ…Œ์ด๋ธ”์„ right์ด๋ผ๊ณ  ํ•œ๋‹ค.

 

1) ํ…Œ์ด๋ธ”์˜ ๊ด€๊ณ„

1:1

์™„์ „ 1:1์ธ ํ…Œ์ด๋ธ”์ด๊ฑฐ๋‚˜, ํ•œ ํ…Œ์ด๋ธ”์ด ๋‹ค๋ฅธ ํ…Œ์ด๋ธ”์˜ ๋ถ€๋ถ„ ์ง‘ํ•ฉ์ธ ๊ฒฝ์šฐ์ด๋‹ค.

๋ณดํ†ต, join ์—ฐ์‚ฐ ์‹œ ํฌ๊ฒŒ ๋ฌธ์ œ๊ฐ€ ๋ฐœ์ƒํ•˜์ง€ ์•Š๋Š”๋‹ค.

1:๋‹ค

join ์—ฐ์‚ฐ์‹œ ๊ธฐ์ค€์„ ์ž˜ ๋ชป ์žก์œผ๋ฉด ๋ฌธ์ œ๊ฐ€ ๋ฐœ์ƒํ•œ๋‹ค.

๋‹ค:๋‹ค

๊ฐ€๋Šฅํ•˜๋ฉด join ์—ฐ์‚ฐ์„ ์ˆ˜ํ–‰ํ•˜์ง€ ์•Š๊ณ  ๋‹ค:1, 1:1, 1:๋‹ค ํ…Œ์ด๋ธ”๋กœ ๋ณ€ํ™˜ํ•ด์„œ ์‚ฌ์šฉํ•˜๋„๋ก ํ•œ๋‹ค.

 

โ€ป ์–ด๋Š ํ…Œ์ด๋ธ”์„ ๊ธฐ์ค€์œผ๋กœ ์กฐ์ธํ• ์ง€๋ฅผ ์ž˜ ์„ค์ •ํ•ด์•ผ ํ•œ๋‹ค. โ€ป

 

2) 6๊ฐ€์ง€ JOIN ์—ฐ์‚ฐ

INNER JOIN

LEFT, RIGHT ํ…Œ์ด๋ธ”์˜ ๊ต์ง‘ํ•ฉ์ธ ๋ ˆ์ฝ”๋“œ

FULL JOIN

LEFT, RIGHT ํ…Œ์ด๋ธ”์˜ ํ•ฉ์ง‘ํ•ฉ์ธ ๋ ˆ์ฝ”๋“œ(๋งค์นญ์ด ๋  ๊ฒฝ์šฐ MERGE, ๋งค์นญ๋˜์ง€ ์•Š๋Š” ๊ฐ’์— ๋Œ€ํ•ด์„œ๋Š” NULL)

CROSS JOIN

LEFT, RIGHT ํ…Œ์ด๋ธ”์˜ ๋ชจ๋“  ๋ ˆ์ฝ”๋“œ์˜ ์กฐํ•ฉ

LEFT JOIN

LEFT ํ…Œ์ด๋ธ” ๊ธฐ์ค€์œผ๋กœ ๋งค์นญํ•˜์—ฌ MERGE

RIGHT JOIN

RIGHT ํ…Œ์ด๋ธ” ๊ธฐ์ค€์œผ๋กœ ๋งค์นญํ•˜๋ฉฐ MERGE (๋ณดํ†ต LEFT JOIN์„ ์‚ฌ์šฉ)

SELF JOIN

๋™์ผํ•œ ํ…Œ์ด๋ธ”๋กœ ์กฐ์ธ์„ ์ˆ˜ํ–‰

 

์กฐ์ธ ํ•œ ๋ˆˆ์— ์‚ดํŽด๋ณด๊ธฐ

 

์ฝ”๋“œ

SELECT ~
FROM ํ…Œ์ด๋ธ” A ์กฐ์ธ์ด๋ฆ„JOIN ํ…Œ์ด๋ธ” B ON A.์ปฌ๋Ÿผ๋ช…=B.์ปฌ๋Ÿผ๋ช…
WHERE ~
GROUP BY ~
HAVING ~
ORDER BY ~
LIMIT ~

 

โ€ป ๊ทธ๋Ÿฐ๋ฐ ์•ž์„œ ํ•™์Šตํ•œ ๋ฐ์ดํ„ฐ ์›จ์–ดํ•˜์šฐ์Šค์˜ ํŠน์ง• ์ค‘์— Primary key์˜ unique๊ฐ€ ๋ณด์žฅ์ด ์•ˆ๋œ๋‹ค๋Š” ํ•ญ๋ชฉ์ด ์žˆ์—ˆ๋‹ค.

๋”ฐ๋ผ์„œ ์กฐ์ธ ์—ฐ์‚ฐ์„ ์ˆ˜ํ–‰ ํ•˜๊ธฐ ์ „์— ๋ฏธ๋ฆฌ Primary key ๋ฐ์ดํ„ฐ์˜ ์ค‘๋ณต์„ ์ฒดํฌํ•ด์•ผ ํ•œ๋‹ค. โ€ป

 

2) ์‹ค์Šต

์ž„์˜๋กœ ์ˆ˜์ง‘๋œ ๋ฐ์ดํ„ฐ๋ฅผ ๊ธฐ์ค€์œผ๋กœ ์กฐ์ธ ์—ฐ์‚ฐ์„ ์ˆ˜ํ–‰ํ•ด ๋ณด์ž.

raw_data

- vital (UserID, VitalID, Date, Weight) : ์‚ฌ์šฉ์ž๊ฐ€ ํ•˜๋ฃจ์— ํ•œ ๋ฒˆ์”ฉ ์ธก์ •ํ•ด์„œ ์ €์žฅ

- alert (UserID, VitalID, AlertType, Date, UserID) : ์‚ฌ์šฉ์ž์˜ ํ–‰๋™(์ •๋ณด)์— ๋”ฐ๋ผ ๋ฐœ์ƒํ•˜๋Š” ์•Œ๋žŒ, ์‚ฌ์šฉ์ž๊ฐ€ 2์ผ ์ด์ƒ ๋ฐ”์ดํƒˆ์„ ๊ธฐ๋กํ•˜์ง€ ์•Š๋Š” ๊ฒฝ์šฐ VitalID NULL๋กœ AlterType MissingType์ด ๋ฐœ์ƒํ•œ๋‹ค.

 

VitalID๋ฅผ ๊ธฐ์ค€์œผ๋กœ ์กฐํšŒํ•œ๋‹ค.

1. InnerJoin

(NULL)๊ฐ’์€ ์ œ์™ธํ•˜๊ณ  ๊ณตํ†ต์ผ ๊ฒฝ์šฐ merge๋ฅผ ์ˆ˜ํ–‰

SELECT ~
FROM VITAL A 
JOIN ALERT B ON A.VITALID=B.VITALID;

2. LeftJoin/RightJoin

(์™ผ์ชฝ ๋ชจ๋“  ๋ ˆ์ฝ”๋“œ์— ๋Œ€ํ•ด) ๊ฐ’์ด ๊ฐ™์€ ๋ ˆ์ฝ”๋“œ๊ฐ€ ์žˆ๋Š” ๊ฒฝ์šฐ merge, ์—†๋Š” ๊ฒฝ์šฐ null

SELECT ~
FROM VITAL A 
LEFT JOIN ALERT B ON A.VITALID=B.VITALID;

3. FullJoin

(์™ผ์ชฝ, ์˜ค๋ฅธ์ชฝ ๋ชจ๋“  ๋ ˆ์ฝ”๋“œ์— ๋Œ€ํ•ด) ๊ฐ’์ด ๊ฐ™์€ ๋ ˆ์ฝ”๋“œ๊ฐ€ ์žˆ๋Š” ๊ฒฝ์šฐ merge, ์—†๋Š” ๊ฒฝ์šฐ null

SELECT ~
FROM VITAL A 
FULL JOIN ALERT B ON A.VITALID=B.VITALID;

4. CrossJoin

์™ผ์ชฝ, ์˜ค๋ฅธ์ชฝ ๋ ˆ์ฝ”๋“œ์˜ ๋ชจ๋“  ์กฐํ•ฉ์˜ ์ˆ˜(์™ผ์ชฝ ๋ ˆ์ฝ”๋“œ ์ˆ˜ X์˜ค๋ฅธ์ชฝ ๋ ˆ์ฝ”๋“œ ์ˆ˜

SELECT ~
FROM VITAL A 
CROSS JOIN ALERT B ON A.VITALID=B.VITALID;

5. SelfJoin

๊ฐ™์€ ํ…Œ์ด๋ธ”์— ๋Œ€ํ•ด alias๋งŒ ๋‹ค๋ฅด๊ฒŒ ํ•ด์„œ ์กฐ์ธ

SELECT ~
FROM VITAL A 
JOIN VITAL B ON A.VITALID=B.VITALID;

 

๋ณดํ†ต ํ•ด๋‹น ํ˜•ํƒœ๋ณด๋‹ค, ๋‹ค๋ฅธ ์กฐ๊ฑด๋“ค์ด ์ถ”๊ฐ€๋œ๋‹ค.

 

 

3) SQL ์œ ์˜์‚ฌํ•ญ

1. BOOLEAN๊ฐ’ ์ฒ˜๋ฆฌํ•˜๊ธฐ(True, False)

๋™์ผํ•œ ํ‘œํ˜„

์ปฌ๋Ÿผ=True

์ปฌ๋Ÿผ is True

 

๋‹ค๋ฅธ ํ‘œํ˜„(์ปฌ๋Ÿผ๊ฐ’์ด True, False์™ธ์— ๋‹ค๋ฅธ ๊ฐ’์„ ๊ฐ€์งˆ ์ˆ˜ ์žˆ๋‹ค. ex.null๊ฐ’)

์ปฌ๋Ÿผ is True

์ปฌ๋Ÿผ is not False

 

2. NULL๊ฐ’ ์ฒ˜๋ฆฌํ•˜๊ธฐ

null๊ฐ’์€ ๋น„๊ต์—ฐ์‚ฐ์ž ๋Œ€์‹  IS NULL, IS NOT NULL ์ด์šฉํ•œ๋‹ค.

NULLIF(์ปฌ๋Ÿผ๊ฐ’, ๊ฐ’) : ๊ฐ’ NULL

๋‚˜๋ˆ—์…ˆ์˜ ๊ฒฝ์šฐ 0์œผ๋กœ ๋‚˜๋ˆ„๋ฉด ๋ฌธ์ œ๊ฐ€ ๋ฐœ์ƒํ•˜๋ฏ€๋กœ NULLIF ํ•จ์ˆ˜๋ฅผ ์‚ฌ์šฉํ•˜๋„๋ก ํ•œ๋‹ค.

NULLIF(์ปฌ๋Ÿผ๊ฐ’, ๊ฐ’) → ์ปฌ๋Ÿผ๊ฐ’=๊ฐ’์ด๋ฉด NULL์„ ๋ฐ˜ํ™˜ ๊ทธ๋ ‡์ง€ ์•Š์œผ๋ฉด ์ปฌ๋Ÿผ ๋ฐ˜ํ™˜

COALESCE(๊ฐ’ 1, ๊ฐ’ 2,...) : NULL ๊ฐ’

์ œ์ผ ์ฒ˜์Œ NULL์ด ์•„๋‹Œ ๊ฐ’์„ return(๋๊นŒ์ง€ NULL์ด๋ฉด NULL์„ ๋ฐ˜ํ™˜)

 

3. ์˜ˆ์•ฝ๋œ ํ‚ค์›Œ๋“œ/๊ณต๋ฐฑ์„ ์ปฌ๋Ÿผ ์ด๋ฆ„์œผ๋กœ ์‚ฌ์šฉํ•˜๊ณ  ์‹ถ์„ ๋•Œ

ํฐ ๋”ฐ์˜ดํ‘œ๋ฅผ ์ด์šฉํ•œ๋‹ค.

CREATE TABLE ํ…Œ์ด๋ธ”๋ช…(
	"์ปฌ ๋Ÿผ ๋ช…" ์ž๋ฃŒํ˜•,
    "์˜ˆ์•ฝ์–ด" ์ž๋ฃŒํ˜•
);

๋‚˜์ค‘์— ์‚ฌ์šฉํ•  ๋•Œ๋„ ํฐ๋”ฐ์˜ดํ‘œ๋ฅผ ๋ถ™์—ฌ์„œ ์‚ฌ์šฉํ•ด์•ผ ํ•œ๋‹ค.

 

 

4) ์–ด์ œ์ˆ™์ œ- ์›”๋ณ„ ์ฑ„๋„๋ณ„ ๋งค์ถœ์•ก ํ…Œ์ด๋ธ” ๋งŒ๋“ค๊ธฐ ์ฝ”๋“œ ์ˆ˜์ •ํ•˜๊ธฐ

SQL

SELECT
  TO_CHAR(T.ts, 'YYYY-MM') AS month,
  channel,
  COUNT(DISTINCT U.userid) as uniqueUsers, --ํ•œ ์‚ฌ์šฉ์ž๋Š” ํ•œ ๋ฒˆ๋งŒ count
  Count(DISTINCT CASE WHEN TR.amount >=0 THEN U.userid END)as paidUsers, --ํ•œ ์‚ฌ์šฉ์ž๋Š” ํ•œ ๋ฒˆ๋งŒ count
  ROUND(paidUsers AS Float*100.0/NULLIF(uniqueUsers, 0), 2) as conversionRate,
  Sum(CASE WHEN TR.refunded is NOT NULL then Tr.amount END) as grossRevenue,
  Sum(CASE WHEN TR.refunded=False then Tr.amount ELSE 0 END) as netRevenue
FROM raw_data.user_session_channel U
JOIN raw_data.session_timestamp T ON T.sessionid=U.sessionid
LEFT JOIN raw_data.session_transaction TR ON  T.sessionid=TR.sessionid
GROUP BY 1, 2
ORDER BY 1, 2;

 

- channel ํ…Œ์ด๋ธ” ์กฐ์ธ ํ•ด์ œ

user_session_channel ํ…Œ์ด๋ธ”์— ์ฑ„๋„ ์ด๋ฆ„ ์กด์žฌ

- paidUsers์— distinct userid ์‚ฌ์šฉ

ํ•œ ์‚ฌ๋žŒ์ด ์—ฌ๋Ÿฌ ๋ฒˆ ๊ตฌ๋งคํ•  ์ˆ˜ ์žˆ์œผ๋ฏ€๋กœ

- conversionRate์— typecasting ๋Œ€์‹  ์‹ค์ˆ˜ํ˜• ๊ณฑ์…ˆ

์‹ค์ˆ˜X์ •์ˆ˜=์‹ค์ˆ˜

- ๋‚˜๋ˆ—์…ˆ ์—ฐ์‚ฐ์— NULLIF ํ•จ์ˆ˜ ์‚ฌ์šฉ

0์œผ๋กœ ๋‚˜๋ˆŒ ์ˆ˜ ์—†์œผ๋ฏ€๋กœ NULL์„ ๋ฐ˜ํ™˜ํ•˜๋„๋ก ํ•œ๋‹ค.

 

CTAS ์‚ฌ์šฉํ•ด์„œ redshift์— ์ƒˆ๋กœ์šด ํ…Œ์ด๋ธ” ์ƒ์„ฑ

%%sql
DROP TABLE IF EXISTS ์Šคํ‚ค๋งˆ.session_summary;
CREATE TABLE ์Šคํ‚ค๋งˆ.session_summary AS

๊ทธ ํ›„์—, ์œ„์˜ SQL ๋ฌธ์žฅ์„ ์‚ฌ์šฉํ•œ๋‹ค.(CTAS์—์„œ๋Š” orderby ๋ถ€๋ถ„์€ ์ ์šฉ๋˜์ง€ ์•Š๋Š”๋‹ค)

 

 

5) ์˜ค๋Š˜ ์ˆ™์ œ

1. ์‚ฌ์šฉ์ž๋ณ„ ์ฒ˜์Œ ์ ‘์†ํ•œ ์ฑ„๋„, ๋งˆ์ง€๋ง‰ ์ฑ„๋„ ํ™•์ธํ•˜๊ธฐ

- first_value, last_value ์ด์šฉ

%%sql
SELECT
  distinct usc.userid,
  FIRST_VALUE(usc.channel) OVER (PARTITION BY usc.userid ORDER BY st.ts 
  ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS FirstValue,
  LAST_VALUE(usc.channel) OVER (PARTITION BY usc.userid ORDER BY st.ts 
  ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS LastValue
FROM raw_data.user_session_channel usc
JOIN raw_data.session_timestamp st ON usc.sessionid = st.sessionid;

2. Gross Revenue๊ฐ€ ๊ฐ€์žฅ ํฐ UserID 10๊ฐœ ํ™•์ธํ•˜๊ธฐ

%%sql
SELECT
    usc.userid,
    SUM(amount) gross_revenue
FROM raw_data.user_session_channel usc
LEFT JOIN raw_data.session_transaction str ON usc.sessionid = str.sessionid
where str.amount>0
GROUP BY 1 
ORDER BY 2 desc
limit 10;

3. nps ํ…Œ์ด๋ธ”์„ ๋ฐ”ํƒ•์œผ๋กœ ์›”๋ณ„ NPS ๊ณ„์‚ฐํ•˜๊ธฐ

%%sql
SELECT LEFT(created_at, 7) AS month,
  ROUND(SUM(
    CASE
    WHEN score >= 9 THEN 1
    WHEN score <= 6 THEN -1 
    END)*100.00/COUNT(1), 2)
FROM raw_data.nps
GROUP BY 1
ORDER BY 1;

 

 

728x90