Viewing Data in Series and DataFrame Objects, from Head to Tail

2023-04-16 18:53:04

Exploring data in Python with head(), tail(), loc[], iloc[], info() and describe()

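Previewing data: head() and tail()

When working with large datasets, reading every row is impractical. This is where head() and tail() come in: head() returns the first n rows of a Series or DataFrame, and tail() returns the last n rows. If n is omitted, both default to 5.

Usage example: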

import pandas as pd

data = pd.DataFrame({
    "Name": ["John", "Mary", "Bob", "Alice", "Tom"],
    "Age": [20, 25, 30, 35, 40],
    "City": ["New York", "London", "Paris", "Rome", "Berlin"]
})

print("First 3 rows:")
print(data.head(3))

print("\nLast 2 rows:")
print(data.tail(2))

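Output:

First 3 rows:
   Name  Age      City
0  John   20  New York
1  Mary   25    London
2   Bob   30     Paris

Last 2 rows:
    Name  Age    City
3  Alice   35    Rome
4    Tom   40  Berlin

With one line of code each, we previewed the start and the end of the data and got a first look at its columns and typical values.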
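
As a small supplement to the example above, the sketch below shows two details that are easy to miss: head() and tail() behave the same way on a Series, and a negative n means "all rows except the last (or first) |n|". The `ages` Series and its values are illustrative assumptions, not part of the original dataset.

```python
import pandas as pd

# Illustrative Series with ten values (not from the article's dataset).
ages = pd.Series([20, 25, 30, 35, 40, 45, 50, 55, 60, 65], name="Age")

# With no argument, head() and tail() return 5 rows.
first_five = ages.head()
last_five = ages.tail()

# A negative n drops rows from the opposite end instead.
all_but_last_two = ages.head(-2)
all_but_first_two = ages.tail(-2)

print(first_five.tolist())         # [20, 25, 30, 35, 40]
print(last_five.tolist())          # [45, 50, 55, 60, 65]
print(len(all_but_last_two))       # 8
print(all_but_first_two.iloc[0])   # 30
```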

Selecting data precisely: loc[] and iloc[]

For finer-grained selection, use the .loc[] and .iloc[] indexers. .loc[] selects by label, while .iloc[] selects by integer position.

Selecting data by label with .loc[]

print("Rows with index 1 and 3:")
print(data.loc[[1, 3]])

print("\nRows with 'John' and 'Alice' as names:")
# Each condition is wrapped in parentheses and combined with | into one boolean mask
print(data.loc[(data['Name'] == 'John') | (data['Name'] == 'Alice')])

Output:

Rows with index 1 and 3:
    Name  Age    City
1   Mary   25  London
3  Alice   35    Rome

Rows with 'John' and 'Alice' as names:
    Name  Age      City
0   John   20  New York
3  Alice   35      Rome

Selecting data by position with .iloc[]

print("First two rows:")
print(data.iloc[:2])

print("\nRows at positions 1, 3, and 4:")
print(data.iloc[[1, 3, 4]])

Output:

First two rows:
   Name  Age      City
0  John   20  New York
1  Mary   25    London

Rows at positions 1, 3, and 4:
    Name  Age    City
1   Mary   25  London
3  Alice   35    Rome
4    Tom   40  Berlin
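
With the default 0..n-1 index, labels and positions happen to coincide, which can hide the difference between the two indexers. The sketch below uses a hypothetical string index (`"a"` to `"e"`, not from the original example) to make the difference visible, and also shows that label slices include both endpoints while position slices exclude the stop.

```python
import pandas as pd

# Same kind of data as the article, but with hypothetical string labels.
people = pd.DataFrame(
    {"Name": ["John", "Mary", "Bob", "Alice", "Tom"],
     "Age": [20, 25, 30, 35, 40]},
    index=["a", "b", "c", "d", "e"],
)

# Label-based slice: both endpoints ("b" and "d") are included.
by_label = people.loc["b":"d"]

# Position-based slice: the stop position (3) is excluded, as in plain Python.
by_position = people.iloc[1:3]

print(by_label["Name"].tolist())     # ['Mary', 'Bob', 'Alice']
print(by_position["Name"].tolist())  # ['Mary', 'Bob']
```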

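Selecting rows and columns together with .loc[] and .iloc[]

Both .loc[] and .iloc[] accept a row selector followed by a column selector, so rows and columns can be narrowed in a single step.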

print("Row with index 1 and column 'City':")
print(data.loc[1, 'City'])

print("\nColumns 'Age' and 'City' of rows with index 1 and 3:")
print(data.iloc[[1, 3], [1, 2]])

Output:

Row with index 1 and column 'City':
London

Columns 'Age' and 'City' of rows with index 1 and 3:
   Age    City
1   25  London
3   35    Rome
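
Sometimes rows are known by position while columns are known by name. One way to handle this, sketched below, is to translate one side so that a single indexer can do the whole selection. The variable names (`row_labels`, `col_positions`) are illustrative; `Index.get_indexer` is the pandas method that maps labels to integer positions.

```python
import pandas as pd

data = pd.DataFrame({
    "Name": ["John", "Mary", "Bob", "Alice", "Tom"],
    "Age": [20, 25, 30, 35, 40],
    "City": ["New York", "London", "Paris", "Rome", "Berlin"],
})

# Option 1: turn row positions into row labels, then use .loc[].
row_labels = data.index[[1, 3]]
subset_loc = data.loc[row_labels, ["Age", "City"]]

# Option 2: turn column labels into column positions, then use .iloc[].
col_positions = data.columns.get_indexer(["Age", "City"])
subset_iloc = data.iloc[[1, 3], col_positions]

# Both routes select the same rows and columns.
print(subset_loc.equals(subset_iloc))  # True
```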

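Summarizing data: info() and describe()

Beyond looking at individual rows, we also need a picture of the dataset as a whole: its column types, missing values, and summary statistics. This is what info() and describe() provide.

Checking data types and non-null counts with info()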

print("Data types and non-null values:")
# info() prints its report itself and returns None, so it is not wrapped in print()
data.info()

Output:

Data types and non-null values:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Name    5 non-null      object
 1   Age     5 non-null      int64 
 2   City    5 non-null      object
dtypes: int64(1), object(2)
memory usage: 264.0+ bytes

Viewing summary statistics with describe()

print("Statistical summary:")
print(data.describe())

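Output:

Statistical summary:
             Age
count   5.000000
mean   30.000000
std     7.905694
min    20.000000
25%    25.000000
50%    30.000000
75%    35.000000
max    40.000000

By default describe() summarizes only the numeric columns, which is why only Age appears. Together, info() and describe() tell us the column types, the non-null counts, and the basic statistics of the data.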
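
The following sketch shows two related details under the same sample data: passing `include="all"` to describe() also summarizes the text columns, and the `std` row is the sample standard deviation (ddof=1), which differs from the population value (ddof=0). The variable names are illustrative.

```python
import pandas as pd

data = pd.DataFrame({
    "Name": ["John", "Mary", "Bob", "Alice", "Tom"],
    "Age": [20, 25, 30, 35, 40],
    "City": ["New York", "London", "Paris", "Rome", "Berlin"],
})

# include="all" adds count/unique/top/freq rows for non-numeric columns.
summary = data.describe(include="all")
print(summary.loc["unique", "Name"])  # 5
print(summary.loc["mean", "Age"])     # 30.0

# describe() uses the sample standard deviation (ddof=1).
sample_std = data["Age"].std()
population_std = data["Age"].std(ddof=0)
print(round(sample_std, 6))      # 7.905694
print(round(population_std, 6))  # 7.071068
```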

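Conclusion

With head(), tail(), loc[], iloc[], info() and describe(), we can move from individual rows to a summary of the whole dataset and back. These are everyday tools in data analysis and in preparing data for machine learning.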

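Frequently asked questions

How do I show only the first 5 rows of a DataFrame?
Call head(5), or simply head(), since 5 is the default.
How do I select the 2nd and 4th rows of a DataFrame by position?
Use the indexer iloc[[1, 3], :]. Positions start at 0.
How do I check the data types in a DataFrame?
Call info(), or read the dtypes attribute if only the column types are needed.
How do I compute the mean of a column in a DataFrame?
Call mean() on the column, for example data['Age'].mean(). describe() also reports it in the mean row.
How do I filter the rows of a DataFrame by a condition?
Pass a boolean condition to loc[], for example data.loc[data['Age'] > 25]. iloc[] selects by integer position and does not accept a boolean Series.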
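
The last two answers can be sketched as follows, using the same sample data as the article. The threshold of 25 and the variable names are illustrative choices.

```python
import pandas as pd

data = pd.DataFrame({
    "Name": ["John", "Mary", "Bob", "Alice", "Tom"],
    "Age": [20, 25, 30, 35, 40],
    "City": ["New York", "London", "Paris", "Rome", "Berlin"],
})

# Mean of a single column, without building the full describe() table.
mean_age = data["Age"].mean()

# Boolean mask with .loc[]: keep rows where the condition holds
# and choose the columns to return in the same step.
over_25 = data.loc[data["Age"] > 25, ["Name", "Age"]]

print(mean_age)                  # 30.0
print(over_25["Name"].tolist())  # ['Bob', 'Alice', 'Tom']
```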

Kyle
