The Python Crawler Handbook: Downloading Images from Baidu Made Easy

2023-02-09 19:58:48

Batch-downloading images from Baidu with a Python crawler

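As an experienced Python crawler developer, I want to share how to batch-download images from Baidu by keyword with a Python crawler. Whether you are a beginner or a seasoned programmer, this tutorial walks you through the whole process step by step. Get your toolkit ready and let's begin!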

Required tools

First, make sure you have the necessary tools:

A Python 3 environment
The requests library
The bs4 library
The Pillow library
Optional: proxy IPs (to cope with anti-crawler measures)
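
Before running the crawler, it can help to confirm that the libraries above are importable. The following is a minimal sketch using only the standard library; the helper name `missing_packages` and the `REQUIRED` mapping are illustrative and not part of the original tutorial.

```python
import importlib.util

# Import name -> PyPI package name. Pillow is imported as "PIL", and the
# bs4 module is published on PyPI as "beautifulsoup4".
REQUIRED = {"requests": "requests", "bs4": "beautifulsoup4", "PIL": "Pillow"}


def missing_packages(required=REQUIRED):
    """Return the PyPI names of the packages that cannot be imported."""
    return [
        pip_name
        for module, pip_name in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages, try: pip install " + " ".join(missing))
    else:
        print("All required packages are available.")
```

The check only looks up each module without importing it, so it has no side effects.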

Implementation

Now let's dive into the Python code:

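import os
import random             # needed for random.choice when picking a proxy
from io import BytesIO    # needed to wrap the downloaded bytes for Pillow

import requests
from bs4 import BeautifulSoup  # not used below: the endpoint returns JSON, not HTML
from PIL import Image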

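# Keyword to search for
keyword = "风景"

# Folder in which to save the images (created if it does not exist yet)
save_path = "D:/Pictures/baidu_images/"
os.makedirs(save_path, exist_ok=True)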

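# Build a proxy pool (optional). The addresses are placeholders: replace
# them with proxies that are actually running. Each entry only has an
# "http" key, so requests routes only http:// URLs through it; add an
# "https" key to an entry to cover https:// URLs as well.
proxy_pool = [
    {"http": "http://127.0.0.1:8080"},
    {"http": "http://127.0.0.1:8081"},
    {"http": "http://127.0.0.1:8082"}
]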

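# Fetch the result pages one by one
image_count = 0  # running counter so file names stay unique across pages
for page in range(100):
    # Build the request headers
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36"
    }

    # Pick a proxy at random
    proxy = random.choice(proxy_pool)

    # Send the request. Both placeholders now use {} so that str.format fills
    # in the keyword and the offset; requests percent-encodes the non-ASCII
    # keyword when it prepares the URL. pn is treated here as a result
    # offset, so it advances by rn (30) per page.
    response = requests.get(
        "https://image.baidu.com/search/acjson?tn=resultjson_com&ipn=rj&ct=201326592&fp=result&queryWord={}&cl=2&lm=-1&ie=utf-8&oe=utf-8&adpicid=&st=-1&z=&ic=0&hd=&latest=&copyright=&s=&se=&tab=&width=&height=&face=0&istype=2&qc=&nc=1&fr=&expermode=&force=&pn={}&rn=30&gsm=1e&1654147262220=".format(
            keyword, page * 30), headers=headers, proxies=proxy, timeout=10)

    # Parse the response body; skip the page if it is not valid JSON
    try:
        data = response.json()
    except ValueError:
        print("Page {} did not return valid JSON".format(page))
        continue

    # Extract the image links, skipping entries that have no "objURL"
    image_urls = [item["objURL"] for item in data.get("data", []) if item.get("objURL")]

    # Download the images
    for image_url in image_urls:
        try:
            # Request the image
            image_response = requests.get(image_url, headers=headers, proxies=proxy, timeout=10)

            # Save the image. JPEG cannot store an alpha channel or a
            # palette, so convert to RGB first.
            image = Image.open(BytesIO(image_response.content))
            image.convert("RGB").save(os.path.join(save_path, "{}.jpg".format(image_count)))
            image_count += 1
        except Exception as e:
            print(e)
            continue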
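
Three details in the download loop are easy to get wrong: the paging offset, result entries that carry no link, and file names that collide across pages. The sketch below isolates them as small functions that can be checked without any network access. The function names and the sample payload are made up for illustration; the payload only mimics the shape the listing expects and is not real Baidu output.

```python
import os


def page_offsets(pages, per_page=30):
    """Yield the offset for each page: 0, per_page, 2 * per_page, ..."""
    for page in range(pages):
        yield page * per_page


def extract_image_urls(payload, key="objURL"):
    """Collect non-empty links from payload["data"], skipping items without the key."""
    items = payload.get("data") or []
    return [
        item[key]
        for item in items
        if isinstance(item, dict) and item.get(key)
    ]


def numbered_path(folder, index, ext=".jpg"):
    """Build a zero-padded file path so names sort correctly and never repeat."""
    return os.path.join(folder, "{:06d}{}".format(index, ext))


# Made-up payload: one usable entry, one empty link, one empty item
sample = {"data": [{"objURL": "http://example.com/a.jpg"}, {"objURL": ""}, {}]}
urls = extract_image_urls(sample)

print(list(page_offsets(3)))  # [0, 30, 60]
print(urls)                   # ['http://example.com/a.jpg']
print(numbered_path("baidu_images", 7))
```

Keeping one counter for the whole run and passing it to `numbered_path` avoids the overwrite that happens when each page restarts its numbering at zero.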