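Pandas concat() Examples
The pandas concat() function concatenates pandas objects such as DataFrames and Series. We can pass various parameters to change the behavior of the concatenation.
1. pandas.concat() Syntax
The syntax of the concat() function is: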
concat(objs, axis=0, join='outer', join_axes=None, ignore_index=False,
keys=None, levels=None, names=None, verify_integrity=False,
sort=None, copy=True)
- objs: a sequence of pandas objects to concatenate.
- axis: the axis to concatenate along: 0 for rows (the default) or 1 for columns.
- join: optional parameter to define how to handle the indexes on the other axis. The valid values are ‘inner’ and ‘outer’.
- join_axes: deprecated in version 0.25.0 and removed in version 1.0.0, so omit it when using current pandas releases.
- ignore_index: if True, the indexes from the source objects will be ignored and a sequence of indexes from 0,1,2…n will be assigned to the result.
- keys: a sequence to add an identifier to the result indexes. It’s helpful in marking the source objects in the output.
- levels: a sequence to specify the unique levels to create multiindex.
- names: names for the levels in the resulting hierarchical index.
- verify_integrity: Check whether the new concatenated axis contains duplicates. It’s an expensive operation.
- sort: sort the non-concatenation axis if it is not already aligned when join is ‘outer’. Added in version 0.23.0.
- copy: if False, don’t copy data unnecessarily.
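The effect of join and verify_integrity is easier to see on a small input. The following is a minimal sketch, not taken from the original article: the frames `left` and `right` and their contents are made up for illustration, and sort=False is passed explicitly so the column order does not depend on the pandas version.

```python
import pandas as pd

# Hypothetical sample data: the two frames share only the "Name" column,
# and both use the row label 2.
left = pd.DataFrame({"Name": ["Pankaj", "Lisa"], "ID": [1, 2]}, index=[1, 2])
right = pd.DataFrame({"Name": ["David"], "Role": ["Admin"]}, index=[2])

# join='outer' (the default): keep the union of the columns; missing cells become NaN.
outer = pd.concat([left, right], join="outer", sort=False)

# join='inner': keep only the columns present in every input.
inner = pd.concat([left, right], join="inner")

print(list(outer.columns))  # ['Name', 'ID', 'Role']
print(list(inner.columns))  # ['Name']

# verify_integrity=True raises ValueError when the concatenated axis
# would contain duplicate labels (here, the label 2).
try:
    pd.concat([left, right], verify_integrity=True)
    duplicate_detected = False
except ValueError:
    duplicate_detected = True
print(duplicate_detected)  # True
```

Note that join acts on the axis that is not being concatenated: with the default axis=0 it decides which columns survive, not which rows.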
2. Example: Using the Pandas concat() Function
Let's look at a simple example that concatenates two DataFrame objects.
import pandas
d1 = {"Name": ["Pankaj", "Lisa"], "ID": [1, 2]}
d2 = {"Name": "David", "ID": 3}
# Pass the index as a list: a set is unordered, so a list keeps the row labels in a defined order.
df1 = pandas.DataFrame(d1, index=[1, 2])
df2 = pandas.DataFrame(d2, index=[3])
print('********\n', df1)
print('********\n', df2)
df3 = pandas.concat([df1, df2])
print('********\n', df3)
Output:
********
Name ID
1 Pankaj 1
2 Lisa 2
********
Name ID
3 David 3
********
Name ID
1 Pankaj 1
2 Lisa 2
3 David 3
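Note that the concatenation is performed row-wise, i.e. along axis 0. Also, the indexes of the source DataFrame objects are preserved in the output.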
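Because the source indexes are preserved, overlapping labels end up duplicated in the result. The sketch below illustrates this with two Series, which concat() handles the same way as DataFrames. The values and the name "Meghna" are invented for this example.

```python
import pandas as pd

# Hypothetical data: both Series use the label 2.
s1 = pd.Series(["Pankaj", "Lisa"], index=[1, 2], name="Name")
s2 = pd.Series(["David", "Meghna"], index=[2, 3], name="Name")

combined = pd.concat([s1, s2])

print(list(combined.index))      # [1, 2, 2, 3]
print(combined.index.is_unique)  # False

# Selecting a duplicated label returns every matching row.
print(list(combined.loc[2]))     # ['Lisa', 'David']
```

If duplicate labels would be a problem downstream, ignore_index=True or verify_integrity=True are the two parameters to consider.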
3. Concatenating Along Columns, i.e. Axis 1
d1 = {"Name": ["Pankaj", "Lisa"], "ID": [1, 2]}
d2 = {"Role": ["Admin", "Editor"]}
df1 = pandas.DataFrame(d1, index=[1, 2])
df2 = pandas.DataFrame(d2, index=[1, 2])
df3 = pandas.concat([df1, df2], axis=1)
print('********\n', df3)
Output:
********
Name ID Role
1 Pankaj 1 Admin
2 Lisa 2 Editor
Concatenating along columns makes sense when the source objects hold different kinds of data about the same entities.
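With axis=1, rows are matched by index label rather than by position, so it matters whether the source objects share the same index. The following sketch uses made-up frames `people` and `roles` whose indexes only partly overlap to show how join behaves in that case.

```python
import pandas as pd

# Hypothetical data: only the row label 2 exists in both frames.
people = pd.DataFrame({"Name": ["Pankaj", "Lisa"], "ID": [1, 2]}, index=[1, 2])
roles = pd.DataFrame({"Role": ["Editor", "Admin"]}, index=[2, 3])

# Default join='outer': union of the row labels, gaps filled with NaN.
outer = pd.concat([people, roles], axis=1)

# join='inner': only the row labels present in both frames.
inner = pd.concat([people, roles], axis=1, join="inner")

print(list(outer.index))            # [1, 2, 3]
print(pd.isna(outer.loc[1, "Role"]))  # True
print(list(inner.index))            # [2]
print(inner.loc[2, "Role"])         # Editor
```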
4. Assigning Keys to the Concatenated DataFrame Indexes
d1 = {"Name": ["Pankaj", "Lisa"], "ID": [1, 2]}
d2 = {"Name": "David", "ID": 3}
df1 = pandas.DataFrame(d1, index=[1, 2])
df2 = pandas.DataFrame(d2, index=[3])
df3 = pandas.concat([df1, df2], keys=["DF1", "DF2"])
print('********\n', df3)
Output:
********
Name ID
DF1 1 Pankaj 1
2 Lisa 2
DF2 3 David 3
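The keys become the outer level of a hierarchical index, which makes it possible to pull each source object back out of the result. Here is a minimal sketch of that idea. The level names "source" and "row" are arbitrary labels chosen for this example.

```python
import pandas as pd

df1 = pd.DataFrame({"Name": ["Pankaj", "Lisa"], "ID": [1, 2]}, index=[1, 2])
df2 = pd.DataFrame({"Name": ["David"], "ID": [3]}, index=[3])

# names labels the two index levels: the key level and the original index.
df3 = pd.concat([df1, df2], keys=["DF1", "DF2"], names=["source", "row"])

print(list(df3.index.names))          # ['source', 'row']

# Select all rows that came from the first DataFrame.
print(list(df3.loc["DF1"]["Name"]))   # ['Pankaj', 'Lisa']

# Select a single cell with a (key, original label) tuple.
print(df3.loc[("DF2", 3), "Name"])    # David
```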
5. Ignoring the Source DataFrame Indexes in Concatenation
d1 = {"Name": ["Pankaj", "Lisa"], "ID": [1, 2]}
d2 = {"Name": "David", "ID": 3}
df1 = pandas.DataFrame(d1, index=[10, 20])
df2 = pandas.DataFrame(d2, index=[30])
df3 = pandas.concat([df1, df2], ignore_index=True)
print('********\n', df3)
Output:
********
Name ID
0 Pankaj 1
1 Lisa 2
2 David 3
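This is useful when the indexes of the source objects carry little meaning. We can then ignore them and let the output DataFrame receive the default index.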
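To make the difference concrete, the sketch below concatenates the same two frames with and without ignore_index. It also compares the result with calling reset_index(drop=True) after a plain concatenation, which is an alternative suggested here rather than something the original example uses.

```python
import pandas as pd

df1 = pd.DataFrame({"Name": ["Pankaj", "Lisa"], "ID": [1, 2]}, index=[10, 20])
df2 = pd.DataFrame({"Name": ["David"], "ID": [3]}, index=[30])

kept = pd.concat([df1, df2])                           # source labels preserved
renumbered = pd.concat([df1, df2], ignore_index=True)  # labels replaced by 0..n-1

print(list(kept.index))        # [10, 20, 30]
print(list(renumbered.index))  # [0, 1, 2]

# For this input, dropping the old labels afterwards gives the same frame.
same = kept.reset_index(drop=True).equals(renumbered)
print(same)  # True
```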
6. References
- pandas.concat() API Doc