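Selenium Crawler Project (1): Automatic Weibo Login

Introduction

I recently took an elective course on content security, where I was introduced to web crawling and anti-crawling techniques, so I wrote this blog post to record the development process and to give others something to refer to and improve on. (Note: this post is for technical discussion only; do not abuse crawling techniques!)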
Step 1 Register a Weibo account
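Step 2 Configure Selenium and the browser

Tutorials for this step are widely available online. If Selenium can open a web page, the configuration works.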
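Step 3 Use Selenium to open Baidu and search for "微博" (Weibo)

Code:

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    import time

    if __name__ == '__main__':
        # Path to the local chromedriver executable.
        chrome_driver = 'C:\\Users\\lenovo\\Anaconda3\\Lib\\site-packages\\chromedriver.exe'
        # executable_path is the Selenium 3 form; recent Selenium 4 releases
        # take Service(chrome_driver) through the service= argument instead.
        driver = webdriver.Chrome(executable_path=chrome_driver)
        driver.get('https://www.baidu.com')
        try:
            # Focus the search box and type the query.
            driver.find_element(By.XPATH, '//*[@id="kw"]').click()
            driver.find_element(By.XPATH, '//*[@id="kw"]').send_keys('微博')
            time.sleep(3)
            # Click the search button.
            driver.find_element(By.XPATH, '//*[@id="su"]').click()
            time.sleep(2)
        finally:
            # Keep the browser open for inspection, then shut it down.
            time.sleep(30)
            driver.quit()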
Explanation: This step is fairly simple; the comments in the code explain it.
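The fixed `time.sleep` calls in the script wait the full duration even when the page is ready sooner, and are too short when the page is slow. A minimal sketch of a polling alternative follows. `wait_until` and its parameters are hypothetical names introduced here, not part of Selenium (Selenium ships its own `WebDriverWait` for the same purpose). The helper calls a condition function repeatedly until it returns a truthy value or the timeout expires.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.5):
    """Call condition() until it returns a truthy value or timeout elapses.

    Hypothetical helper for illustration; not a Selenium API.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
        except Exception:
            # Treat errors such as "element not found yet" as "not ready".
            result = None
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)
```

A call such as `button = wait_until(lambda: locate_button(driver))`, where `locate_button` stands in for any element lookup from the script, could replace a `time.sleep(...)` placed before a click.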
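Step 4 Close the previous page, click the first Baidu result, then scroll to find the login button and click it

Code:

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    import time

    if __name__ == '__main__':
        # Path to the local chromedriver executable.
        chrome_driver = 'C:\\Users\\lenovo\\Anaconda3\\Lib\\site-packages\\chromedriver.exe'
        # executable_path is the Selenium 3 form; recent Selenium 4 releases
        # take Service(chrome_driver) through the service= argument instead.
        driver = webdriver.Chrome(executable_path=chrome_driver)
        driver.get('https://www.baidu.com')
        try:
            # Focus the search box and type the query.
            driver.find_element(By.XPATH, '//*[@id="kw"]').click()
            driver.find_element(By.XPATH, '//*[@id="kw"]').send_keys('微博')
            time.sleep(3)
            # Click the search button.
            driver.find_element(By.XPATH, '//*[@id="su"]').click()
            time.sleep(2)

            # Click the first search result; it opens as a new page.
            driver.find_element(By.XPATH, '//*[@id="1"]/div/div[1]/h3/a[1]').click()
            time.sleep(10)

            # Close the Baidu page and switch to the newly opened page.
            handles = driver.window_handles
            for handle in handles:
                if handle != driver.current_window_handle:
                    driver.close()
                    driver.switch_to.window(handle)

            # Scroll down so the login button is loaded, then click it.
            driver.execute_script("window.scrollBy(0,3000)")
            time.sleep(5)
            driver.find_element(By.XPATH, '//*[@id="app"]/div[1]/div[1]/div[2]/div[1]/div/div/div[3]/div[1]/div/a[1]').click()
            time.sleep(5)
        finally:
            # Keep the browser open for inspection, then shut it down.
            time.sleep(30)
            driver.quit()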
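Explanation: First close the previous window handle and make the new page's handle the current one; otherwise XPath lookups still run against the old page and cannot find elements on the new page. The scroll is needed because the login button is not present when the page first opens (it has not loaded yet) and only appears after scrolling, so the script simulates scrolling and then clicks the login button.

The screenshot above shows the page before scrolling; note that Chrome has already detected that the browser is not being operated by a human. After scrolling, the login button appears in the upper-right corner; click it.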
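The two ideas in this explanation (switching handles, and scrolling until a lazily loaded element appears) can be sketched as small reusable helpers. The function names and parameters below are hypothetical; they rely only on the `window_handles`, `switch_to.window`, `close`, and `execute_script` members that the script already uses. The first helper assumes that `window_handles` lists windows in the order they were opened, which is typical but not guaranteed.

```python
import time


def keep_only_newest_window(driver):
    """Close every window except the last one in window_handles and focus it.

    Assumption: the last handle is the most recently opened window.
    """
    handles = driver.window_handles
    newest = handles[-1]
    for handle in handles[:-1]:
        driver.switch_to.window(handle)
        driver.close()
    driver.switch_to.window(newest)
    return newest


def scroll_until_found(driver, locate, step=1000, max_scrolls=10, pause=1.0):
    """Scroll down in steps until locate() stops raising; return its result."""
    for _ in range(max_scrolls):
        try:
            return locate()
        except Exception:
            # Element not rendered yet: scroll further and let the page load.
            driver.execute_script("window.scrollBy(0, arguments[0]);", step)
            time.sleep(pause)
    # Final attempt; if the element is still missing, the lookup error propagates.
    return locate()
```

Scrolling in smaller steps and re-checking avoids hard-coding a single offset such as 3000 pixels, at the cost of a few extra lookups.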
Step 5 Click "account login" and close the previous page

Code:

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    import time

    if __name__ == '__main__':
        # Path to the local chromedriver executable.
        chrome_driver = 'C:\\Users\\lenovo\\Anaconda3\\Lib\\site-packages\\chromedriver.exe'
        # executable_path is the Selenium 3 form; recent Selenium 4 releases
        # take Service(chrome_driver) through the service= argument instead.
        driver = webdriver.Chrome(executable_path=chrome_driver)
        driver.get('https://www.baidu.com')
        try:
            # Focus the search box and type the query.
            driver.find_element(By.XPATH, '//*[@id="kw"]').click()
            driver.find_element(By.XPATH, '//*[@id="kw"]').send_keys('微博')
            time.sleep(3)
            # Click the search button.
            driver.find_element(By.XPATH, '//*[@id="su"]').click()
            time.sleep(2)

            # Click the first search result; it opens as a new page.
            driver.find_element(By.XPATH, '//*[@id="1"]/div/div[1]/h3/a[1]').click()
            time.sleep(10)

            # Close the Baidu page and switch to the newly opened page.
            handles = driver.window_handles
            for handle in handles:
                if handle != driver.current_window_handle:
                    driver.close()
                    driver.switch_to.window(handle)

            # Scroll down so the login button is loaded, then click it.
            driver.execute_script("window.scrollBy(0,3000)")
            time.sleep(5)
            driver.find_element(By.XPATH, '//*[@id="app"]/div[1]/div[1]/div[2]/div[1]/div/div/div[3]/div[1]/div/a[1]').click()
            time.sleep(5)

            # Click "account login"; the login form opens as another new page.
            driver.find_element(By.XPATH, '//*[@id="app"]/div[4]/div[1]/div/div[2]/div/div/div[5]/a[1]').click()

            # Close the previous page and switch to the login page.
            handles = driver.window_handles
            for handle in handles:
                if handle != driver.current_window_handle:
                    driver.close()
                    driver.switch_to.window(handle)
            time.sleep(10)
        finally:
            # Keep the browser open for inspection, then shut it down.
            time.sleep(30)
            driver.quit()
Explanation: This step is much like the previous one; see the comments in the code. The browser then jumps to the new page.
Step 6 Enter the username and password, then click login

Code:

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    import time

    if __name__ == '__main__':
        # Path to the local chromedriver executable.
        chrome_driver = 'C:\\Users\\lenovo\\Anaconda3\\Lib\\site-packages\\chromedriver.exe'
        # executable_path is the Selenium 3 form; recent Selenium 4 releases
        # take Service(chrome_driver) through the service= argument instead.
        driver = webdriver.Chrome(executable_path=chrome_driver)
        driver.get('https://www.baidu.com')
        try:
            # Focus the search box and type the query.
            driver.find_element(By.XPATH, '//*[@id="kw"]').click()
            driver.find_element(By.XPATH, '//*[@id="kw"]').send_keys('微博')
            time.sleep(3)
            # Click the search button.
            driver.find_element(By.XPATH, '//*[@id="su"]').click()
            time.sleep(2)

            # Click the first search result; it opens as a new page.
            driver.find_element(By.XPATH, '//*[@id="1"]/div/div[1]/h3/a[1]').click()
            time.sleep(10)

            # Close the Baidu page and switch to the newly opened page.
            handles = driver.window_handles
            for handle in handles:
                if handle != driver.current_window_handle:
                    driver.close()
                    driver.switch_to.window(handle)

            # Scroll down so the login button is loaded, then click it.
            driver.execute_script("window.scrollBy(0,3000)")
            time.sleep(5)
            driver.find_element(By.XPATH, '//*[@id="app"]/div[1]/div[1]/div[2]/div[1]/div/div/div[3]/div[1]/div/a[1]').click()
            time.sleep(5)

            # Click "account login"; the login form opens as another new page.
            driver.find_element(By.XPATH, '//*[@id="app"]/div[4]/div[1]/div/div[2]/div/div/div[5]/a[1]').click()

            # Close the previous page and switch to the login page.
            handles = driver.window_handles
            for handle in handles:
                if handle != driver.current_window_handle:
                    driver.close()
                    driver.switch_to.window(handle)
            time.sleep(10)

            # Fill in the username.
            driver.find_element(By.ID, "loginname").click()
            driver.find_element(By.ID, "loginname").send_keys("your_username")

            # Focus the password field and fill in the password.
            driver.find_element(By.XPATH, '//*[@id="pl_login_form"]/div/div[3]/div[2]/div/span').click()
            driver.find_element(By.XPATH, '//*[@id="pl_login_form"]/div/div[3]/div[2]/div/input').send_keys('your_password')
            time.sleep(5)

            # Click the login button.
            driver.find_element(By.CLASS_NAME, 'W_btn_a').click()
            time.sleep(5)
        finally:
            # Keep the browser open for inspection, then shut it down.
            time.sleep(30)
            driver.quit()
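Explanation: Locate the username and password input boxes, type in the credentials, and click login. Once the username and password are accepted, the site redirects to a verification page: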
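Since the script types real credentials, it may be preferable not to hard-code them in the source file. The sketch below reads them from environment variables instead. The function name and the variable names `WEIBO_USERNAME` and `WEIBO_PASSWORD` are hypothetical choices for this example, not something Weibo or Selenium defines.

```python
import os


def load_credentials(env=os.environ):
    """Return (username, password) read from environment variables.

    WEIBO_USERNAME and WEIBO_PASSWORD are hypothetical names for this sketch.
    """
    try:
        return env["WEIBO_USERNAME"], env["WEIBO_PASSWORD"]
    except KeyError as missing:
        # Fail early with a clear message instead of typing an empty string.
        raise RuntimeError(f"environment variable {missing} is not set") from None
```

With `username, password = load_credentials()` at the top of the script, the two `send_keys` calls would receive `username` and `password` rather than literal strings.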
Step 7 Verify the login by private message

Code:

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    import time

    if __name__ == '__main__':
        # Path to the local chromedriver executable.
        chrome_driver = 'C:\\Users\\lenovo\\Anaconda3\\Lib\\site-packages\\chromedriver.exe'
        # executable_path is the Selenium 3 form; recent Selenium 4 releases
        # take Service(chrome_driver) through the service= argument instead.
        driver = webdriver.Chrome(executable_path=chrome_driver)
        driver.get('https://www.baidu.com')
        try:
            # Focus the search box and type the query.
            driver.find_element(By.XPATH, '//*[@id="kw"]').click()
            driver.find_element(By.XPATH, '//*[@id="kw"]').send_keys('微博')
            time.sleep(3)
            # Click the search button.
            driver.find_element(By.XPATH, '//*[@id="su"]').click()
            time.sleep(2)

            # Click the first search result; it opens as a new page.
            driver.find_element(By.XPATH, '//*[@id="1"]/div/div[1]/h3/a[1]').click()
            time.sleep(10)

            # Close the Baidu page and switch to the newly opened page.
            handles = driver.window_handles
            for handle in handles:
                if handle != driver.current_window_handle:
                    driver.close()
                    driver.switch_to.window(handle)

            # Scroll down so the login button is loaded, then click it.
            driver.execute_script("window.scrollBy(0,3000)")
            time.sleep(5)
            driver.find_element(By.XPATH, '//*[@id="app"]/div[1]/div[1]/div[2]/div[1]/div/div/div[3]/div[1]/div/a[1]').click()
            time.sleep(5)

            # Click "account login"; the login form opens as another new page.
            driver.find_element(By.XPATH, '//*[@id="app"]/div[4]/div[1]/div/div[2]/div/div/div[5]/a[1]').click()

            # Close the previous page and switch to the login page.
            handles = driver.window_handles
            for handle in handles:
                if handle != driver.current_window_handle:
                    driver.close()
                    driver.switch_to.window(handle)
            time.sleep(10)

            # Fill in the username.
            driver.find_element(By.ID, "loginname").click()
            driver.find_element(By.ID, "loginname").send_keys("your_username")

            # Focus the password field and fill in the password.
            driver.find_element(By.XPATH, '//*[@id="pl_login_form"]/div/div[3]/div[2]/div/span').click()
            driver.find_element(By.XPATH, '//*[@id="pl_login_form"]/div/div[3]/div[2]/div/input').send_keys('your_password')
            time.sleep(5)

            # Click the login button.
            driver.find_element(By.CLASS_NAME, 'W_btn_a').click()
            time.sleep(5)

            # Choose verification by private message, then send the message.
            driver.find_element(By.XPATH, '//*[@id="dmCheck"]').click()
            time.sleep(1)
            driver.find_element(By.ID, 'send_dm_btn').click()
        finally:
            # Keep the browser open for inspection, then shut it down.
            time.sleep(30)
            driver.quit()
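Explanation: Click the private-message verification button, click send on the page that follows, and then confirm the login in the private message received in the Weibo mobile app. Verification passes and the login succeeds.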
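Because the confirmation happens on the phone, the script has no fixed point at which it knows the login finished; it simply sleeps for 30 seconds. One possible refinement, sketched below, is to poll `driver.current_url` until it changes. This rests on the assumption that the browser navigates away from the verification page once the login is confirmed, which the post does not verify. `wait_for_url_change` is a hypothetical helper name.

```python
import time


def wait_for_url_change(driver, timeout=120.0, interval=1.0):
    """Block until driver.current_url differs from its value at call time.

    Assumption: a successful confirmation makes the browser navigate away
    from the verification page.
    """
    start_url = driver.current_url
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if driver.current_url != start_url:
            return driver.current_url
        time.sleep(interval)
    raise TimeoutError("page did not navigate away before the timeout")
```

Calling `wait_for_url_change(driver)` right after the send button is clicked would let the script continue as soon as the confirmation goes through, or fail with a clear error if nobody confirms in time.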