
Python x Jin Yong = A Coding Journey Through the Wuxia World


Introduction

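As a wuxia fan, do you often imagine yourself roaming the jianghu of Jin Yong's novels, where gratitude is repaid and grudges are settled by the sword? With Python, that dream is closer than you think. This post uses Python to explore Jin Yong's fictional world, covering web scraping, data organization, and regex matching in one pass.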

1. Web Scraping: Starting the Jin Yong Data Journey

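We start by fetching data from a Jin Yong novel website. BeautifulSoup is a good fit here because it parses HTML documents with little effort. The code below also uses requests and pandas, so install all three first:

pip install requests beautifulsoup4 pandas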

Here is the code:

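import requests
from bs4 import BeautifulSoup

url = 'https://www.jinyongwang.com/'
response = requests.get(url, timeout=10)        # give up instead of hanging forever
response.raise_for_status()                     # stop on 4xx/5xx responses
response.encoding = response.apparent_encoding  # guess the charset so Chinese text is not garbled
soup = BeautifulSoup(response.text, 'html.parser')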

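With the page parsed, the next step is locating the data. A CSS selector passed to soup.select() pinpoints the novel links (BeautifulSoup does not support XPath; for XPath you would need a library such as lxml):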

# <a> tags inside <div class="novellist_1">; adjust the selector if the site's markup changes
novel_list = soup.select('div.novellist_1 ul li a')
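To see what soup.select() returns without hitting the network, here is a minimal sketch that runs the same selector against a small inline HTML snippet. The markup, titles, and href values are hypothetical, written only to match the selector; the real site's structure may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup shaped to match the selector; the real page may differ
SAMPLE_HTML = """
<div class="novellist_1">
  <ul>
    <li><a href="/shediao/">射雕英雄传</a></li>
    <li><a href="/tianlong/">天龙八部</a></li>
  </ul>
</div>
<div class="sidebar"><ul><li><a href="/about/">About</a></li></ul></div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Descendant selector: <a> inside <li> inside <ul> inside <div class="novellist_1">
links = soup.select("div.novellist_1 ul li a")

for a in links:
    # get_text(strip=True) trims whitespace; a["href"] reads the attribute value
    print(a.get_text(strip=True), a["href"])
```

Only the two links inside div.novellist_1 are selected; the sidebar link is skipped because it does not match the selector.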

2. Data Organization: Building Your Jin Yong Knowledge Base

With the links scraped, we tidy them up with pandas:

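import pandas as pd
from urllib.parse import urljoin

novels = []
for novel in novel_list:
    novels.append({
        'name': novel.get_text(strip=True),   # book title without stray whitespace
        'link': urljoin(url, novel['href'])   # turn relative hrefs into absolute URLs
    })

df_novels = pd.DataFrame(novels)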
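Links taken from href attributes are often relative (for example /shediao/), and requests.get() rejects a URL with no scheme. A minimal sketch, using hypothetical title and href values, of resolving them with the standard library's urllib.parse.urljoin before building the records that go into the DataFrame:

```python
from urllib.parse import urljoin

BASE_URL = "https://www.jinyongwang.com/"

# Hypothetical (title, href) pairs as they might be scraped from the <a> tags
scraped = [
    ("射雕英雄传", "/shediao/"),
    ("天龙八部", "tianlong/index.html"),
    ("笑傲江湖", "https://www.jinyongwang.com/xiao/"),
]

# urljoin resolves relative hrefs against the base URL and leaves absolute ones unchanged
records = [{"name": name, "link": urljoin(BASE_URL, href)} for name, href in scraped]

for record in records:
    print(record["name"], record["link"])
```

The resulting records list has the same shape as novels above, so it can be passed straight to pd.DataFrame.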

3. Regex Matching: Digging Out the Character Data

Next, we extract the character data, using a regular expression to pinpoint each name:

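import re

# Capture the text inside <span class="character">...</span>;
# .+? is non-greedy, so each match stops at the first closing </span>
pattern = r'class="character">(.+?)</span>'
characters = []
for link in df_novels['link']:
    response = requests.get(link, timeout=10)
    response.raise_for_status()
    response.encoding = response.apparent_encoding
    # Search the raw HTML: soup.text has its tags stripped, so class="character" would never match
    characters.extend(re.findall(pattern, response.text))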
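The key detail is which string the pattern runs on. A minimal sketch with a hypothetical inline fragment (assuming names are wrapped in <span class="character">, as the pattern above expects) shows that the pattern matches the raw HTML but finds nothing once the tags are removed, which is what soup.text does. The tag-stripping re.sub here is only a rough stand-in for soup.text.

```python
import re

pattern = r'class="character">(.+?)</span>'

# Hypothetical page fragment; real pages may mark up names differently
raw_html = (
    '<p><span class="character">郭靖</span>和'
    '<span class="character">黄蓉</span>来到桃花岛,'
    '<span class="character">郭靖</span>拜见黄药师。</p>'
)

# Matching against the raw HTML: the non-greedy (.+?) stops at the first </span>
from_html = re.findall(pattern, raw_html)

# Rough stand-in for soup.text: remove every tag, keeping only the text
plain_text = re.sub(r"<[^>]+>", "", raw_html)
from_text = re.findall(pattern, plain_text)

# Remove duplicates while preserving first-seen order
unique_names = list(dict.fromkeys(from_html))

print(from_html)     # ['郭靖', '黄蓉', '郭靖']
print(from_text)     # []
print(unique_names)  # ['郭靖', '黄蓉']
```

Note that 黄药师 is not captured because he is not wrapped in a character span in this fragment; the regex only finds what the markup labels.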

4. Complete Example: Your Python Wuxia Guide

Now let's put everything together into one script:

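import re
import time
from urllib.parse import urljoin

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = 'https://www.jinyongwang.com/'


def fetch(page_url):
    """Download a page and return its decoded HTML."""
    response = requests.get(page_url, timeout=10)
    response.raise_for_status()
    response.encoding = response.apparent_encoding
    return response.text


# 1. Fetch the home page and select the novel links
soup = BeautifulSoup(fetch(url), 'html.parser')
novel_list = soup.select('div.novellist_1 ul li a')

# 2. Organize titles and absolute links into a DataFrame
novels = []
for novel in novel_list:
    novels.append({
        'name': novel.get_text(strip=True),
        'link': urljoin(url, novel['href'])
    })
df_novels = pd.DataFrame(novels)

# 3. Extract character names from each novel page's raw HTML
pattern = r'class="character">(.+?)</span>'
characters = []
for link in df_novels['link']:
    characters.extend(re.findall(pattern, fetch(link)))
    time.sleep(1)  # pause between requests to go easy on the server

# Drop duplicates while keeping first-seen order
characters = list(dict.fromkeys(characters))
print(characters)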

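Run the script and, as long as the site marks character names with <span class="character">, you get a list of the character names found on the linked novel pages!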

Conclusion

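On this Python tour of Jin Yong's novels, you have picked up the whole pipeline: scraping, organizing, and regex matching. With these tools in hand, nothing stops you from exploring the martial-arts world of Jin Yong's fiction. Ignite your Python cosmos and let Jin Yong's world run free in code!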