Demystifying Python Web Scraping: 7 Striking Examples to Help You Unlock a Treasure Trove of Data

2023-04-21 15:55:59

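Python Web Scraping: A Powerful Tool for Unlocking Data

In the 21st century, data is an indispensable asset, and a Python web scraper is a handy tool for collecting it from the web.

Small Python Scraping Examples: Getting Hands-On with Data

To help you quickly grasp the essentials of Python web scraping, we have put together 7 fun and practical examples: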

Example 1: Scraping News

Get the latest news anytime, anywhere, without opening a browser.

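import requests
from bs4 import BeautifulSoup

# URL of the target site
url = 'https://www.bbc.com/news'

# Send the request; fail fast on timeouts and HTTP errors
response = requests.get(url, timeout=10)
response.raise_for_status()

# Parse the HTML response
soup = BeautifulSoup(response.text, 'html.parser')

# Extract each headline together with its link so the two always stay paired.
# Note: the selector depends on the site's current markup and may need updating.
news = []
for heading in soup.select('h3.media__title'):
    link = heading.find('a')
    if link is not None and link.has_attr('href'):
        news.append((heading.get_text(strip=True), link['href']))

# Print the results
for i, (title, href) in enumerate(news, start=1):
    print(f'{i}. {title} - {href}')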
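
Links scraped from a page are often relative (for example `/news/world-123`), so printing the raw `href` values may not give usable URLs. The following is a minimal sketch of one way to resolve them with the standard library's `urllib.parse.urljoin`. The helper name `absolutize` and the sample paths are hypothetical and not part of the original example.

```python
from urllib.parse import urljoin

BASE_URL = 'https://www.bbc.com/news'  # same page URL as in the news example


def absolutize(base_url, hrefs):
    """Resolve each scraped href against the page it came from."""
    return [urljoin(base_url, href) for href in hrefs]


# Made-up sample hrefs for illustration; real values depend on the live page.
sample_hrefs = ['/news/world-123', 'https://www.bbc.com/sport/456']
for link in absolutize(BASE_URL, sample_hrefs):
    print(link)
```

Links that are already absolute pass through unchanged, so the helper can be applied to every scraped link.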

Example 2: Collecting Product Data

Easily build your own product database in preparation for opening an online store.

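import requests
from bs4 import BeautifulSoup

# URL of the target site
url = 'https://www.amazon.com/s?k=headphones'

# Send a browser-like User-Agent; the site may still block automated requests
headers = {'User-Agent': 'Mozilla/5.0'}
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()

# Parse the HTML response
soup = BeautifulSoup(response.text, 'html.parser')

# Extract product information
products = soup.select('div.s-result-item')
for product in products:
    title = product.select_one('span.a-size-base-plus')
    price = product.select_one('span.a-offscreen')
    # Skip result items (such as ads or separators) that lack a title or a price
    if title is None or price is None:
        continue
    print(f'{title.get_text(strip=True)} - {price.get_text(strip=True)}')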
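
Scraped prices arrive as text such as `$1,299.99`, which is awkward to sort or compare in a product database. As a rough sketch, assuming US-style formatting with a dollar sign and comma thousands separators, a hypothetical helper `parse_price` could convert them to `Decimal` values:

```python
from decimal import Decimal, InvalidOperation


def parse_price(text):
    """Convert a scraped price string such as '$1,299.99' to a Decimal.

    Returns None when the cleaned text is not a valid number.
    """
    cleaned = text.strip().replace('$', '').replace(',', '')
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None


# Made-up sample strings for illustration.
for raw in ['$1,299.99', ' $19.99 ', 'N/A']:
    print(repr(raw), '->', parse_price(raw))
```

`Decimal` is used instead of `float` here so that currency amounts keep their exact decimal digits.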

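Example 3: Stock Data Analysis

Fetch the latest quote for a stock as a starting point for analyzing its trend and making better-informed investment decisions.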

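import requests
from bs4 import BeautifulSoup

# URL of the target site
url = 'https://finance.yahoo.com/quote/GOOGL'

# Send a browser-like User-Agent; the site may still block automated requests
headers = {'User-Agent': 'Mozilla/5.0'}
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()

# Parse the HTML response
soup = BeautifulSoup(response.text, 'html.parser')

# Extract the price and the change from two separate elements.
# The data-field values below are assumptions about the page markup.
price_tag = soup.select_one('fin-streamer[data-field="regularMarketPrice"]')
change_tag = soup.select_one('fin-streamer[data-field="regularMarketChange"]')
if price_tag is None or change_tag is None:
    print('Quote elements not found; the page layout may have changed.')
else:
    price = price_tag.get('value', price_tag.get_text(strip=True))
    change = change_tag.get('value', change_tag.get_text(strip=True))
    print(f'Current Price: {price}, Change: {change}')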
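
A price and an absolute change are easier to compare across stocks once they are expressed as a percentage. The sketch below assumes the scraped strings have already been converted with `float()`; the function name `percent_change` and the sample numbers are illustrative only and are not real market data.

```python
def percent_change(price, change):
    """Return the change as a percentage of the previous close.

    The previous close is taken to be the current price minus the change.
    """
    previous_close = price - change
    if previous_close == 0:
        raise ValueError('previous close is zero; percentage is undefined')
    return change / previous_close * 100


# Made-up numbers for illustration, not real market data.
print(f'{percent_change(102.0, 2.0):.2f}%')
```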

Example 4: Fetching the Weather

Check the weather anytime, anywhere, without opening a weather app.

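import requests

# OpenWeatherMap current-weather endpoint; replace YOUR_API_KEY with your own key
url = 'https://api.openweathermap.org/data/2.5/weather'
params = {'q': 'London', 'appid': 'YOUR_API_KEY'}

# Send the request; fail fast on timeouts and HTTP errors (such as an invalid key)
response = requests.get(url, params=params, timeout=10)
response.raise_for_status()

# Parse the JSON response
data = response.json()

# Extract the weather information (temperature is in Kelvin by default)
temperature = data['main']['temp']
humidity = data['main']['humidity']
print(f'Temperature: {temperature}K, Humidity: {humidity}%')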
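
Because the API reports temperature in Kelvin by default, the printed value is not very readable. Here is a small sketch of converting it to Celsius on the client side. The function names and the sample payload are made up for illustration; the payload only mirrors the two keys read in the example above.

```python
def kelvin_to_celsius(kelvin):
    """Convert a temperature from Kelvin to degrees Celsius."""
    return kelvin - 273.15


def summarize_weather(data):
    """Build a one-line summary from a payload that has a 'main' section."""
    main = data['main']
    celsius = kelvin_to_celsius(main['temp'])
    return f"Temperature: {celsius:.1f}°C, Humidity: {main['humidity']}%"


# Made-up payload with the same keys that the weather example reads.
sample = {'main': {'temp': 293.15, 'humidity': 60}}
print(summarize_weather(sample))
```

Alternatively, adding `units=metric` to the query string asks OpenWeatherMap to return Celsius directly, which avoids the conversion.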

Example 5: Looking Up Train Tickets

Quickly look up train times without opening a ticket-booking app.

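import requests
from bs4 import BeautifulSoup

# URL of the target site
url = 'https://www.trainline.com/train-times/london-euston-to-birmingham-new-street'

# Send the request; fail fast on timeouts and HTTP errors
response = requests.get(url, timeout=10)
response.raise_for_status()

# Parse the HTML response
soup = BeautifulSoup(response.text, 'html.parser')

# Extract the train information.
# The class names below are illustrative; check them against the page's actual markup.
trains = soup.select('li.train')
for train in trains:
    departure = train.select_one('span.departure-time')
    arrival = train.select_one('span.arrival-time')
    duration = train.select_one('span.duration')
    # Skip entries that are missing any of the three fields
    if departure is None or arrival is None or duration is None:
        continue
    print(f'{departure.get_text(strip=True)} - {arrival.get_text(strip=True)} - '
          f'{duration.get_text(strip=True)}')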
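
If a page lists departure and arrival times but no duration, the journey time can be derived from the two. This is a minimal sketch, assuming both times are scraped as 24-hour `HH:MM` strings and that an arrival earlier than the departure means the train arrives the next day. The name `journey_duration` is hypothetical.

```python
from datetime import datetime, timedelta


def journey_duration(departure, arrival):
    """Return the journey time as a timedelta, given 'HH:MM' strings.

    If the arrival time is earlier than the departure time, the train is
    assumed to arrive on the following day.
    """
    start = datetime.strptime(departure, '%H:%M')
    end = datetime.strptime(arrival, '%H:%M')
    if end < start:
        end += timedelta(days=1)
    return end - start


# Made-up times for illustration.
print(journey_duration('09:40', '11:02'))
print(journey_duration('23:30', '00:15'))
```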

Example 6: Fetching Live Football Scores

Get the latest football scores anytime, anywhere, without opening a live-score app.

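import requests
from bs4 import BeautifulSoup

# URL of the target site
url = 'https://www.skysports.com/football/live'

# Send the request; fail fast on timeouts and HTTP errors
response = requests.get(url, timeout=10)
response.raise_for_status()

# Parse the HTML response
soup = BeautifulSoup(response.text, 'html.parser')

# Extract the match scores.
# The class names below are illustrative; check them against the page's actual markup.
matches = soup.select('div.score-container')
for match in matches:
    teams = match.select('div.team-name')
    scores = match.select('div.score')
    # Skip containers that do not hold exactly two teams and two scores
    if len(teams) != 2 or len(scores) != 2:
        continue
    team1, team2 = (team.get_text(strip=True) for team in teams)
    score1, score2 = (score.get_text(strip=True) for score in scores)
    print(f'{team1} {score1} - {score2} {team2}')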
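
Scraped score text is not always a clean number: it may carry surrounding whitespace, or a placeholder when a match has not started. As a rough sketch, with the hypothetical helpers `parse_score` and `format_result` and made-up team names, the text could be validated before printing:

```python
def parse_score(text):
    """Convert scraped score text such as ' 2 ' to an int.

    Returns None for non-numeric text, for example when a match has not started.
    """
    cleaned = text.strip()
    return int(cleaned) if cleaned.isdecimal() else None


def format_result(team1, score1, team2, score2):
    """Describe the result, or report that the score is not available yet."""
    s1, s2 = parse_score(score1), parse_score(score2)
    if s1 is None or s2 is None:
        return f'{team1.strip()} vs {team2.strip()}: score not available'
    return f'{team1.strip()} {s1} - {s2} {team2.strip()}'


# Made-up sample values for illustration.
print(format_result(' Team A ', '1', 'Team B', ' 0 '))
print(format_result('Team C', '-', 'Team D', '-'))
```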