Python x Jin Yong = A Code Tour of the Wuxia World
Artificial Intelligence
2024-02-11 05:00:57
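Introduction
If you're a wuxia fan, you've probably imagined wandering through the world Jin Yong wrote about, where debts of gratitude and revenge are settled on the spot. With Python you can at least explore it as data. This post walks through three steps on a Jin Yong novel site: scraping the pages, organizing the results, and pulling out character names with regular expressions.
1. Scraping the site: starting the Jin Yong data journey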
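We start by fetching data from a Jin Yong novel website. BeautifulSoup is a good fit here because it makes parsing HTML documents easy. Install it first, together with requests, which the code below uses to download pages:
pip install requests beautifulsoup4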
The code looks like this:
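import requests
from bs4 import BeautifulSoup
url = 'https://www.jinyongwang.com/'
# Fetch the home page; the timeout keeps a stalled connection from hanging the script
response = requests.get(url, timeout=10)
response.raise_for_status()  # stop early on HTTP errors such as 404 or 500
# Parse the HTML with Python's built-in parser
soup = BeautifulSoup(response.text, 'html.parser')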
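Since the same download step comes up again for every novel page, it can help to wrap it in a small helper. The sketch below is one way to do that, not the only one: `fetch_html` and its `session` parameter are names made up for this post, and the encoding fallback is a precaution for Chinese pages whose headers declare no charset.

```python
import requests


def fetch_html(url, session=None, timeout=10):
    """Download a page and return its HTML as text (hypothetical helper)."""
    # Use the caller's session if one is given, otherwise the requests module itself.
    client = session if session is not None else requests
    response = client.get(url, timeout=timeout)
    # Raise an exception on HTTP error statuses instead of parsing an error page.
    response.raise_for_status()
    # requests falls back to ISO-8859-1 when a text response declares no charset,
    # which garbles Chinese text; in that case use the encoding guessed from the body.
    if response.encoding is None or response.encoding.lower() == "iso-8859-1":
        response.encoding = response.apparent_encoding
    return response.text
```

With this in place, `BeautifulSoup(fetch_html(url), 'html.parser')` does the same job as the snippet above.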
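With the page parsed, the next step is locating the data. BeautifulSoup does not support XPath, but its select() method takes a CSS selector, which pinpoints the links just as precisely: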
novel_list = soup.select('div.novellist_1 ul li a')
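To see what that selector actually picks up, you can try it on a small inline page without touching the network. The HTML below is a made-up fragment that imitates the structure the selector expects; the real site's markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: one list inside div.novellist_1, one unrelated list.
SAMPLE_HTML = """
<div class="novellist_1">
  <ul>
    <li><a href="/fei/">飞狐外传</a></li>
    <li><a href="/xue/">雪山飞狐</a></li>
  </ul>
</div>
<div class="footer"><ul><li><a href="/about/">About</a></li></ul></div>
"""

sample_soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
# Only <a> tags nested under div.novellist_1 > ul > li are selected.
sample_links = sample_soup.select("div.novellist_1 ul li a")
sample_pairs = [(a.get_text(strip=True), a["href"]) for a in sample_links]
print(sample_pairs)
```

The footer link is left out because it does not sit under a `div` with the class `novellist_1`.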
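2. Organizing the data: building your Jin Yong knowledge base
Once the links are scraped, we tidy them up with pandas (install it with pip install pandas if you don't have it yet):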
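from urllib.parse import urljoin
import pandas as pd
# Collect one record per link: the novel's title and its URL
novels = []
for novel in novel_list:
    novels.append({
        'name': novel.text.strip(),
        # urljoin makes a relative href absolute and leaves absolute ones alone
        'link': urljoin(url, novel['href'])
    })
df_novels = pd.DataFrame(novels)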
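Scraped link lists often contain the same novel twice or titles padded with whitespace. Here is a minimal cleanup sketch on made-up rows; `sample_rows` and `df_sample` are illustrative names, and the same two calls would apply to `df_novels`.

```python
import pandas as pd

# Hypothetical rows: one title has stray spaces and one link appears twice.
sample_rows = [
    {"name": " 飞狐外传 ", "link": "https://www.jinyongwang.com/fei/"},
    {"name": "雪山飞狐", "link": "https://www.jinyongwang.com/xue/"},
    {"name": "飞狐外传", "link": "https://www.jinyongwang.com/fei/"},
]
df_sample = pd.DataFrame(sample_rows)

# Trim whitespace around titles.
df_sample["name"] = df_sample["name"].str.strip()
# Keep the first row for each link and renumber the index from 0.
df_sample = df_sample.drop_duplicates(subset="link").reset_index(drop=True)
print(df_sample)
```

De-duplicating on `link` rather than `name` is a design choice: it guarantees each page is fetched only once in the next step.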
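3. Regex matching: teasing out the character data
Next we extract the characters. A regular expression run against each novel page picks out the names: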
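import re
# Assumes names sit in <span class="character"> elements; adjust to the real markup
pattern = r'class="character">(.+?)</span>'
characters = []
for novel in df_novels['link']:
    response = requests.get(novel, timeout=10)
    # Match against the raw HTML: soup.text strips the tags the pattern relies on
    characters.extend(re.findall(pattern, response.text))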
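One detail is easy to get wrong here: the pattern contains HTML markup, so it only matches the raw page source, never the tag-free text that BeautifulSoup returns. The sketch below shows the difference on a made-up snippet, along with a CSS-selector alternative that does the same job without a regex.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical page fragment using the markup the pattern assumes.
SAMPLE_PAGE = (
    '<p><span class="character">郭靖</span>与'
    '<span class="character">黄蓉</span></p>'
)
CHARACTER_PATTERN = r'class="character">(.+?)</span>'

page_soup = BeautifulSoup(SAMPLE_PAGE, "html.parser")

# Raw HTML still contains the tags, so the pattern finds both names.
from_raw = re.findall(CHARACTER_PATTERN, SAMPLE_PAGE)
# .text drops every tag, leaving nothing for the pattern to match.
from_text = re.findall(CHARACTER_PATTERN, page_soup.text)
# Selecting the spans directly gives the same names as the raw-HTML regex.
from_selector = [s.get_text(strip=True) for s in page_soup.select("span.character")]

print(from_raw, from_text, from_selector)
```

The selector version is usually the sturdier choice, since it keeps working if the site adds attributes or changes spacing inside the tag.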
4. Full example: your Python wuxia guide
Now let's put the pieces together into one script:
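import re
from urllib.parse import urljoin
import pandas as pd
import requests
from bs4 import BeautifulSoup
url = 'https://www.jinyongwang.com/'
# Step 1: fetch and parse the home page
response = requests.get(url, timeout=10)
response.raise_for_status()
soup = BeautifulSoup(response.text, 'html.parser')
# Locate the novel links with a CSS selector
novel_list = soup.select('div.novellist_1 ul li a')
# Step 2: collect titles and absolute URLs into a DataFrame
novels = []
for novel in novel_list:
    novels.append({
        'name': novel.text.strip(),
        'link': urljoin(url, novel['href'])
    })
df_novels = pd.DataFrame(novels)
# Step 3: pull character names out of each novel page
# Assumes names sit in <span class="character"> elements; adjust to the real markup
pattern = r'class="character">(.+?)</span>'
characters = []
for novel in df_novels['link']:
    response = requests.get(novel, timeout=10)
    # Match against the raw HTML: soup.text strips the tags the pattern relies on
    characters.extend(re.findall(pattern, response.text))
print(characters)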
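Run the script and you get a list of every string the pattern matched on the novel pages. How close that comes to a full cast list depends on whether the site really marks character names with class="character"; if it doesn't, the list comes back empty and the pattern needs adjusting.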
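The resulting list can contain the same name many times, once per match. If you want each character only once, a short post-processing step helps. The sketch below uses made-up names in `sample_characters` as a stand-in for the script's `characters` list.

```python
from collections import Counter

# Hypothetical matches, standing in for the `characters` list built by the script.
sample_characters = ["郭靖", "黄蓉", "郭靖", " 杨过", "黄蓉", "郭靖"]

# Strip stray whitespace, then count how often each name was matched.
name_counts = Counter(name.strip() for name in sample_characters)

# Unique names, most frequently matched first.
unique_names = [name for name, _ in name_counts.most_common()]
print(unique_names)
```

The counts in `name_counts` also give a rough sense of which characters the matched pages mention most.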
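Conclusion
This tour of Jin Yong's novels covered three Python skills: scraping pages, organizing the results, and matching text with regular expressions. With these tools in hand, the martial world Jin Yong wrote about is yours to explore. Fire up Python and start digging!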