Devcourse_Data Engineering

[Week3 Web Data Crawling and Analysis] TIL Day 9 - Scraping with Selenium

🪄하루🪄 2023. 10. 26. 15:37

0) The Selenium package

As mentioned yesterday, the BeautifulSoup package only parses the HTML it is given, so it cannot retrieve information that is generated dynamically.

This is because dynamic websites work asynchronously: the data has not necessarily finished loading by the time the page has finished rendering.

So we will use the Selenium package to get the information we want.
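
To make the limitation concrete, here is a minimal sketch using only Python's standard-library HTML parser (standing in for any static parser). The page markup, the `title` id, and the `TitleText` class are made up for illustration. The heading is filled in by a script, so a parser that never runs JavaScript sees it as empty.

```python
from html.parser import HTMLParser

# Hypothetical page: the <h1> is empty in the raw HTML and is only
# filled in when a browser runs the script.
RAW_HTML = """
<html><body>
  <h1 id="title"></h1>
  <script>
    document.getElementById("title").textContent = "Loaded by JavaScript";
  </script>
</body></html>
"""


class TitleText(HTMLParser):
    """Collects the text found inside <h1 id="title">."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.text = ""

    def handle_starttag(self, tag, attrs):
        if tag == "h1" and dict(attrs).get("id") == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "h1":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.text += data


parser = TitleText()
parser.feed(RAW_HTML)
static_text = parser.text.strip()
print(repr(static_text))  # prints '' because the script was never executed
```

A real browser driven by Selenium would execute the script, which is why the same element has text there.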

 

With the Selenium package, Python can do the following two things.

1. Scrape data that JavaScript generates dynamically

2. Send events (mouse clicks, keyboard input, etc.) to a site

 

Let's try both of these in practice.

 

 

1) Fetching the information we want

Many of today's web pages have randomly generated class names, which keeps web crawlers and other scrapers from easily selecting information by class.

ex) class="navbar__ADSXD"

 

In such cases, XPath offers a way to fetch the data we want by using its "position" in the document.
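
As a rough illustration of why position can be more stable than a generated class name, the sketch below parses two made-up versions of the same page that differ only in the class suffix. It uses Python's `xml.etree.ElementTree`, which supports a small subset of XPath. The markup and the two helper functions are assumptions for the example, not taken from a real site.

```python
import xml.etree.ElementTree as ET

# Two hypothetical builds of the same page: only the generated suffix differs.
BUILD_A = '<body><nav class="navbar__ADSXD"><a>Home</a><a>Courses</a></nav></body>'
BUILD_B = '<body><nav class="navbar__QWZRT"><a>Home</a><a>Courses</a></nav></body>'


def second_link_by_class(html):
    """Look up the nav by the class name seen in build A."""
    nav = ET.fromstring(html).find(".//nav[@class='navbar__ADSXD']")
    return None if nav is None else nav.find("a[2]").text


def second_link_by_position(html):
    """Look up the second link purely by its position in the tree."""
    return ET.fromstring(html).find("./nav/a[2]").text


print(second_link_by_class(BUILD_A), second_link_by_class(BUILD_B))
print(second_link_by_position(BUILD_A), second_link_by_position(BUILD_B))
```

The class-based lookup only works for the first build, while the positional lookup works for both. The trade-off is that a positional path breaks when the page layout itself changes.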

 

0. XPath

A language for specifying the path to a particular element in an XML or HTML document.

An XPath expression selects every node that matches its conditions, in an array-like form, but unlike ordinary arrays the positions start at 1.

ex) The HTML location of '//*[@id="__next"]/div/main/div[2]/div/div[4]/p[1]' is as follows

<div id="__next">
    <div ...>
        <main ...>
            <div id="1">
            </div>
            <div id="2">
                <div ...>
                    <div id="01">
                    </div>
                    <div id="02">
                    </div>
                    <div id="03">
                    </div>
                    <div id="04">
                        <p>The information we want is here</p>
                    </div>
                </div>
            </div>
        </main>
    </div>
</div>
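
The path can be checked against this structure without a browser. The sketch below is an approximation: it uses Python's `xml.etree.ElementTree` (a limited XPath subset, not the browser engine Selenium relies on) on a well-formed copy of the markup, with the elided attributes dropped and an `<html><body>` wrapper added so that `.//*` can reach the `__next` element.

```python
import xml.etree.ElementTree as ET

# Well-formed copy of the structure above (wrapper and empty attributes are assumptions).
DOC = """
<html><body>
<div id="__next">
  <div>
    <main>
      <div id="1"></div>
      <div id="2">
        <div>
          <div id="01"></div>
          <div id="02"></div>
          <div id="03"></div>
          <div id="04"><p>The information we want is here</p></div>
        </div>
      </div>
    </main>
  </div>
</div>
</body></html>
""".strip()

root = ET.fromstring(DOC)

# div[2] is the SECOND div child of <main>, because XPath positions start at 1.
second_div = root.find('.//*[@id="__next"]/div/main/div[2]')
target = root.find('.//*[@id="__next"]/div/main/div[2]/div/div[4]/p[1]')

print(second_div.get("id"))  # prints 2
print(target.text)           # prints The information we want is here
```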

 

1. Waits

As mentioned above, a dynamic website can show its screen to the user as soon as rendering finishes, even before the data has loaded. To fetch the data we want, we need to know whether that data has actually been loaded into the page.

This is where waits come in.

A wait can be done with a plain timer from the time package, but this time let's use explicit waits and implicit waits.

 

Explicit wait: waits, up to a specified time, until a condition on a specific element is satisfied

Implicit wait: waits, up to a specified time, for elements to finish loading whenever an element is looked up

 

Now let's use an explicit wait and an implicit wait to fetch the information we want from a dynamic web page.

 

 

2. Web scraping with an explicit wait

Waits, up to a specified time, until a condition on a specific element is satisfied

WebDriverWait()

  • until() : waits until the condition passed as the argument is satisfied
  • until_not() : waits until the condition passed as the argument is no longer satisfied
# Libraries needed for scraping
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager

url = "URL of the site you want"        # placeholder
xpath = "XPath of the data you want"    # placeholder

# Request the example site and fetch the title located at the XPath.
# The explicit wait gives the element up to 10 seconds to appear in the DOM.
with webdriver.Chrome(service=Service(ChromeDriverManager().install())) as driver:
    driver.get(url)
    element = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, xpath)))
    print(element.text)
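
The until() / until_not() behaviour can be pictured as a polling loop with a deadline. The sketch below is a simplified illustration and not Selenium's implementation: `PollingWait` and `FakePage` are hypothetical names, and the fake page simply finishes "loading" after it has been checked three times.

```python
import time


class PollingWait:
    """Simplified, hypothetical stand-in for the until()/until_not() idea."""

    def __init__(self, target, timeout, poll_interval=0.05):
        self.target = target
        self.timeout = timeout
        self.poll_interval = poll_interval

    def _poll(self, condition, want_truthy):
        # Call the condition repeatedly until its truthiness matches, or time runs out.
        deadline = time.monotonic() + self.timeout
        while True:
            value = condition(self.target)
            if bool(value) == want_truthy:
                return value
            if time.monotonic() >= deadline:
                raise TimeoutError(f"condition not met within {self.timeout} s")
            time.sleep(self.poll_interval)

    def until(self, condition):
        return self._poll(condition, want_truthy=True)

    def until_not(self, condition):
        return self._poll(condition, want_truthy=False)


class FakePage:
    """Pretend page whose loading ends after a few status checks."""

    def __init__(self, checks_until_loaded):
        self.checks_left = checks_until_loaded

    def is_loading(self):
        if self.checks_left > 0:
            self.checks_left -= 1
            return True
        return False

    def title(self):
        return None if self.checks_left > 0 else "Example title"


page = FakePage(checks_until_loaded=3)
wait = PollingWait(page, timeout=2)
still_loading = wait.until_not(lambda p: p.is_loading())  # waits for loading to stop
title = wait.until(lambda p: p.title())                   # waits for the title to exist
print(still_loading, title)
```

In this sketch, a condition that never holds ends in a `TimeoutError` once the time limit passes, which mirrors how an explicit wait gives up after its timeout.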

 

3. Web scraping with an implicit wait

Waits, up to a specified time, for elements to finish loading

implicitly_wait(time)

Each element lookup keeps waiting until the element has loaded (as long as the time limit has not been reached)

# Libraries needed for scraping
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager

url = "URL of the site you want"        # placeholder
xpath = "XPath of the data you want"    # placeholder

# Request the example site and fetch the title located at the XPath.
# With a 10-second implicit wait, find_element keeps retrying for up to 10 seconds.
with webdriver.Chrome(service=Service(ChromeDriverManager().install())) as driver:
    driver.get(url)
    driver.implicitly_wait(10)
    print(driver.find_element(By.XPATH, xpath).text)
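
The main practical difference from an explicit wait is that an implicit wait is set once and then applies to every later lookup. The sketch below imitates that idea with a hypothetical `FakeDriver` whose element "appears" 0.3 seconds after creation. It is not Selenium's WebDriver, and the names and delays are assumptions chosen for the example.

```python
import time


class FakeDriver:
    """Hypothetical stand-in for a browser session; not Selenium's WebDriver."""

    def __init__(self, appear_after):
        # appear_after maps an element name to the delay (seconds) before it exists.
        self._created = time.monotonic()
        self._appear_after = appear_after
        self._implicit_timeout = 0.0

    def implicitly_wait(self, seconds):
        # Set once; every later find_element call uses this time limit.
        self._implicit_timeout = seconds

    def _present(self, name):
        delay = self._appear_after.get(name)
        return delay is not None and time.monotonic() - self._created >= delay

    def find_element(self, name):
        deadline = time.monotonic() + self._implicit_timeout
        while True:
            if self._present(name):
                return f"<element {name}>"
            if time.monotonic() >= deadline:
                raise LookupError(f"no such element: {name}")
            time.sleep(0.05)


driver = FakeDriver(appear_after={"title": 0.3})

# Without an implicit wait the lookup fails at once: the element is not there yet.
try:
    driver.find_element("title")
    found_without_wait = True
except LookupError:
    found_without_wait = False

# With an implicit wait the same lookup keeps retrying until the element appears.
driver.implicitly_wait(2)
title = driver.find_element("title")
print(found_without_wait, title)
```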

 

 

2) Handling web page events with Python (keyboard and mouse input)

Let's open a site's login form and log in automatically.

For the various functions related to keyboard and mouse input, refer to the following site.

 

Mouse actions (www.selenium.dev): A representation of any pointer device for interacting with a web page.

 

In the login example, the mouse and keyboard inputs happen in the following order.

1. Mouse input (click the login button). click(login element)

  • Find the element that should receive the mouse input
  • Perform the desired mouse input.

2. Keyboard input (enter the id and password). send_keys_to_element(input element, string)

  • Find the element that should receive the keyboard input
  • Enter the string.

3. Mouse input (click the confirm button)

  • Find the element that should receive the mouse input
  • Perform the desired mouse input.

The original code kept raising errors, so I used explicit waits to account for the nature of a dynamic web page.

# Libraries needed for scraping
import time

from selenium import webdriver
from selenium.webdriver import ActionChains
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager

driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get("https://hashcode.co.kr")
time.sleep(1)

# Wait for the id and password fields of the login form, then type into them.
# (Step 1, clicking the site's "Login" button to open this form, is not part of this listing.)
id_input = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, '//*[@id="main-app-account"]/div/div[2]/div/div[2]/div[1]/div/div[2]/div[2]/input')))
ActionChains(driver).send_keys_to_element(id_input, "enter your id here").perform()
password_input = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, '//*[@id="main-app-account"]/div/div[2]/div/div[2]/div[1]/div/div[2]/div[4]/input')))
ActionChains(driver).send_keys_to_element(password_input, "enter your password here").perform()

# Click the "Login" button to complete the login.
button = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, '//*[@id="main-app-account"]/div/div[2]/div/div[2]/div[1]/div/div[2]/button')))
ActionChains(driver).click(button).perform()

 

After-class study)

Based on what I learned today, I wrote an automation program that logs in automatically and then enters the dashboard of the data engineering course. The related GitHub file is at the following address.

https://github.com/es3442/Programmers-DataEngineering/blob/main/Web/Crawling/Week3-Day4(%EB%B0%9C%EC%A0%84).%20%ED%94%84%EB%A1%9C%EA%B7%B8%EB%9E%98%EB%A8%B8%EC%8A%A4%20%EB%A1%9C%EA%B7%B8%EC%9D%B8%2B%EB%8C%80%EC%8B%9C%EB%B3%B4%EB%93%9C%20%EC%9E%85%EC%9E%A5%20%EC%9E%90%EB%8F%99%ED%99%94.ipynb

 
