Pandas concat() Examples

2 years ago

清, 宇

The pandas concat() function concatenates pandas objects such as DataFrames and Series. We can pass various parameters to change how the concatenation behaves.

1. pandas.concat() syntax

The syntax of the concat() function is:

concat(objs, axis=0, join='outer', join_axes=None, ignore_index=False,
           keys=None, levels=None, names=None, verify_integrity=False,
           sort=None, copy=True)

objs: a sequence of pandas objects to concatenate.
axis: the axis to concatenate along: 0 for rows (the default) or 1 for columns.
join: optional parameter that defines how to handle the indexes on the other axis. The valid values are ‘inner’ and ‘outer’.
join_axes: deprecated in version 0.25.0 and removed in version 1.0.0.
ignore_index: if True, the indexes from the source objects are ignored and the result is labeled 0, 1, 2, …, n-1 along the concatenation axis.
keys: a sequence that adds an identifier to the result indexes. It’s helpful for marking the source objects in the output.
levels: a sequence that specifies the unique levels to use when building the MultiIndex.
names: names for the levels in the resulting hierarchical index.
verify_integrity: check whether the new concatenated axis contains duplicates. It’s an expensive operation.
sort: sort the non-concatenation axis if it is not already aligned when join is ‘outer’. Added in version 0.23.0; since version 1.0.0 the default is False.
copy: if False, don’t copy data unnecessarily.
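
The join parameter is easier to see in action than to describe. The following is a minimal sketch, not taken from the original article: the frames left and right and their Role column are made up for illustration, and they deliberately share only the ID column.

```python
import pandas as pd

# Hypothetical frames that share only the "ID" column.
left = pd.DataFrame({"Name": ["Pankaj", "Lisa"], "ID": [1, 2]})
right = pd.DataFrame({"ID": [3], "Role": ["Admin"]})

# join="outer" (the default) keeps the union of the columns
# and fills the missing cells with NaN.
outer = pd.concat([left, right], join="outer", ignore_index=True)

# join="inner" keeps only the columns present in every input.
inner = pd.concat([left, right], join="inner", ignore_index=True)

print(outer)
print(inner)
```

With these inputs, the outer result should carry all three columns with NaN where a frame had no value, while the inner result should keep only ID.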

2. Example: using the pandas concat() function

Let's look at a simple example that concatenates two DataFrame objects.

import pandas

d1 = {"Name": ["Pankaj", "Lisa"], "ID": [1, 2]}
d2 = {"Name": "David", "ID": 3}

# Pass the index as a list: a set has no guaranteed order.
df1 = pandas.DataFrame(d1, index=[1, 2])
df2 = pandas.DataFrame(d2, index=[3])

print('********\n', df1)
print('********\n', df2)

# Stack the two frames row-wise (axis=0 is the default).
df3 = pandas.concat([df1, df2])

print('********\n', df3)

Output:

********
      Name  ID
1  Pankaj   1
2    Lisa   2
********
     Name  ID
3  David   3
********
      Name  ID
1  Pankaj   1
2    Lisa   2
3   David   3

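Note that the concatenation is performed row-wise, i.e. along axis 0. Also, the indexes of the source DataFrame objects are preserved in the output.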
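
Because source indexes are preserved, overlapping labels end up duplicated in the result. The sketch below is an illustration added here, with hypothetical frames a and b that both use the label 2. It shows the default behavior and how verify_integrity=True could be used to catch the overlap.

```python
import pandas as pd

a = pd.DataFrame({"Name": ["Pankaj", "Lisa"], "ID": [1, 2]}, index=[1, 2])
b = pd.DataFrame({"Name": ["David"], "ID": [3]}, index=[2])  # label 2 repeats

# By default the duplicate label is kept silently.
combined = pd.concat([a, b])
print(combined.index.is_unique)

# verify_integrity=True turns the duplicate into a ValueError.
try:
    pd.concat([a, b], verify_integrity=True)
    overlap_detected = False
except ValueError as exc:
    overlap_detected = True
    print("concat refused:", exc)
```

The check costs extra work, so it is probably best reserved for cases where duplicate labels would break later lookups.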

3. Concatenating along columns, i.e. axis 1

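import pandas

d1 = {"Name": ["Pankaj", "Lisa"], "ID": [1, 2]}
d2 = {"Role": ["Admin", "Editor"]}

# Both frames use the same index labels, so the rows line up.
df1 = pandas.DataFrame(d1, index=[1, 2])
df2 = pandas.DataFrame(d2, index=[1, 2])

# axis=1 places the frames side by side.
df3 = pandas.concat([df1, df2], axis=1)
print('********\n', df3)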

Output:

********
      Name  ID    Role
1  Pankaj   1   Admin
2    Lisa   2  Editor

Concatenating along columns makes sense when the source objects hold different kinds of data about the same entities.
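
Column-wise concatenation aligns rows by index label, which matters when the indexes only partly overlap. Here is a small sketch under assumed data: the frames people and roles are hypothetical and share only the label 2.

```python
import pandas as pd

people = pd.DataFrame({"Name": ["Pankaj", "Lisa"], "ID": [1, 2]}, index=[1, 2])
roles = pd.DataFrame({"Role": ["Editor", "Admin"]}, index=[2, 3])

# Default outer join: every label from either frame appears,
# with NaN where a frame has no row for that label.
wide_outer = pd.concat([people, roles], axis=1)

# Inner join: only labels present in both frames survive.
wide_inner = pd.concat([people, roles], axis=1, join="inner")

print(wide_outer)
print(wide_inner)
```

In the outer result, label 1 has no Role and label 3 has no Name or ID. The inner result keeps only the row for label 2.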

4. Assigning keys to the concatenated DataFrame indexes

import pandas

d1 = {"Name": ["Pankaj", "Lisa"], "ID": [1, 2]}
d2 = {"Name": "David", "ID": 3}

df1 = pandas.DataFrame(d1, index=[1, 2])
df2 = pandas.DataFrame(d2, index=[3])

# keys adds an outer index level that marks which frame each row came from.
df3 = pandas.concat([df1, df2], keys=["DF1", "DF2"])
print('********\n', df3)

Output:

********
          Name  ID
DF1 1  Pankaj   1
    2    Lisa   2
DF2 3   David   3
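
The keys become the outer level of a hierarchical index, so they can be used to pull rows back out by source. The sketch below rebuilds the same frames and adds the names parameter. The level names "source" and "row" are illustrative choices, not part of the original example.

```python
import pandas as pd

df1 = pd.DataFrame({"Name": ["Pankaj", "Lisa"], "ID": [1, 2]}, index=[1, 2])
df2 = pd.DataFrame({"Name": ["David"], "ID": [3]}, index=[3])

# names labels the two index levels; "source" and "row" are made-up names.
df3 = pd.concat([df1, df2], keys=["DF1", "DF2"], names=["source", "row"])

# Select every row that came from the first frame.
from_df1 = df3.loc["DF1"]

# Address a single row by (key, original label).
david = df3.loc[("DF2", 3)]

print(from_df1)
print(david["Name"])
```

Selecting by the outer key returns the original rows with their original labels, which is one way to undo the concatenation later.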

5. Ignoring the source DataFrame indexes in concatenation

import pandas

d1 = {"Name": ["Pankaj", "Lisa"], "ID": [1, 2]}
d2 = {"Name": "David", "ID": 3}

df1 = pandas.DataFrame(d1, index=[10, 20])
df2 = pandas.DataFrame(d2, index=[30])

# ignore_index=True discards 10, 20, 30 and renumbers the rows from 0.
df3 = pandas.concat([df1, df2], ignore_index=True)
print('********\n', df3)

Output:

********
      Name  ID
0  Pankaj   1
1    Lisa   2
2   David   3

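This is useful when the indexes of the source objects carry little meaning. We can then ignore them and let the output DataFrame receive a default index.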
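
A common case is concatenating frames that were each created with a default index, so their labels collide. The following sketch uses hypothetical frames built without an explicit index and compares the result with and without ignore_index.

```python
import pandas as pd

df1 = pd.DataFrame({"Name": ["Pankaj", "Lisa"], "ID": [1, 2]})  # labels 0, 1
df2 = pd.DataFrame({"Name": ["David"], "ID": [3]})              # label 0

# Default: the source labels are kept, so 0 appears twice.
kept = pd.concat([df1, df2])

# ignore_index=True assigns fresh labels 0..n-1.
renumbered = pd.concat([df1, df2], ignore_index=True)

print(kept.index.tolist())        # [0, 1, 0]
print(renumbered.index.tolist())  # [0, 1, 2]
```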

6. References

pandas.concat() API Doc