Using the Pandas to_csv() Function to Convert a DataFrame to CSV

韵, 科

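The Pandas DataFrame to_csv() function converts a DataFrame into CSV data. We can pass a file path or a file object to write the CSV data to a file. Otherwise, the CSV data is returned as a string.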
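The two behaviours can be sketched as follows. This is a minimal illustration rather than part of the original article: the sample DataFrame is made up, and an in-memory io.StringIO buffer stands in for a real file object.

```python
import io

import pandas as pd

# Illustrative data (an assumption, not taken from a real dataset).
df = pd.DataFrame({"Name": ["Pankaj", "Meghna"], "ID": [1, 2]})

# No target given: the CSV text comes back as a string.
csv_text = df.to_csv()

# Target given (here an in-memory text buffer): the data is written
# to the buffer and the call itself returns None.
buffer = io.StringIO()
result = df.to_csv(buffer)

print(type(csv_text).__name__)
print(result)
print(buffer.getvalue() == csv_text)
```

The same CSV text ends up in both places; only the destination differs.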

Pandas DataFrame to_csv() Syntax

The syntax of the DataFrame to_csv() function is:

def to_csv(
    self,
    path_or_buf=None,
    sep=",",
    na_rep="",
    float_format=None,
    columns=None,
    header=True,
    index=True,
    index_label=None,
    mode="w",
    encoding=None,
    compression="infer",
    quoting=None,
    quotechar='"',
    lineterminator=None,  # spelled line_terminator before pandas 1.5
    chunksize=None,
    date_format=None,
    doublequote=True,
    escapechar=None,
    decimal=".",
)

Some of the important parameters are:

path_or_buf: the file path or file object to write the CSV data to. If this argument is not provided, the CSV data is returned as a string.
sep: the delimiter for the CSV data. It should be a string of length 1, the default is a comma.
na_rep: string representing null or missing values, default is empty string.
columns: a sequence to specify the columns to include in the CSV output.
header: the allowed values are a boolean or a list of strings, default is True. If False, the column names are not written in the output. If a list of strings, it is used to write the column names. The length of the list should be the same as the number of columns being written in the CSV file.
index: if True, index is included in the CSV data. If False, the index value is not written in the CSV output.
index_label: used to specify the column name for index.
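As a rough sketch of how these parameters combine in one call, the example below uses a made-up DataFrame with one missing value. The chosen delimiter, labels, and placeholder text are arbitrary assumptions for illustration.

```python
import pandas as pd

# Illustrative data with one missing Role value (an assumption).
df = pd.DataFrame(
    {"Name": ["Pankaj", "Meghna"], "ID": [1, 2], "Role": ["CEO", None]}
)

csv_text = df.to_csv(
    sep=";",                   # single-character delimiter
    na_rep="N/A",              # text written for missing values
    columns=["Name", "Role"],  # only these columns are written
    header=["NAME", "ROLE"],   # one label per written column
    index_label="row",         # header text for the index column
)
print(csv_text)
```

Note that the header list has two entries because only two columns are written.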

Pandas DataFrame to CSV Examples

Let's look at some common examples of using the to_csv() function to convert a DataFrame to CSV data.

1. Converting DataFrame to CSV String

import pandas as pd

d1 = {'Name': ['Pankaj', 'Meghna'], 'ID': [1, 2], 'Role': ['CEO', 'CTO']}

df = pd.DataFrame(d1)

print('DataFrame:\n', df)

# default CSV
csv_data = df.to_csv()
print('\nCSV String:\n', csv_data)

Output:

DataFrame:
      Name  ID Role
0  Pankaj   1  CEO
1  Meghna   2  CTO

CSV String:
 ,Name,ID,Role
0,Pankaj,1,CEO
1,Meghna,2,CTO

2. Specifying the Delimiter for the CSV Output

csv_data = df.to_csv(sep='|')
print(csv_data)

Output:

|Name|ID|Role
0|Pankaj|1|CEO
1|Meghna|2|CTO

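If the specified delimiter is not exactly one character long, a TypeError is raised: "delimiter" must be a 1-character string.

3. Selecting Only a Few Columns for the CSV Output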
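The failure mode can be reproduced with a short sketch. The DataFrame is made up for illustration, and the exact wording of the error message may differ between Python versions, so only the exception type is relied on here.

```python
import pandas as pd

# Illustrative data (an assumption).
df = pd.DataFrame({"Name": ["Pankaj", "Meghna"], "ID": [1, 2]})

try:
    # Two characters, so this is not a valid CSV delimiter.
    df.to_csv(sep="||")
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)
```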

csv_data = df.to_csv(columns=['Name', 'ID'])
print(csv_data)

Output:

,Name,ID
0,Pankaj,1
1,Meghna,2

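Notice that the index is not treated as one of the columns, so it is still written even though it is not listed.

4. Ignoring the Header Row in the CSV Output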

csv_data = df.to_csv(header=False)
print(csv_data)

Output:

0,Pankaj,1,CEO
1,Meghna,2,CTO

5. Setting Custom Column Names in the CSV

csv_data = df.to_csv(header=['NAME', 'ID', 'ROLE'])
print(csv_data)

Output:

,NAME,ID,ROLE
0,Pankaj,1,CEO
1,Meghna,2,CTO

Again, the index is not treated as a column of the DataFrame object.

6. Skipping the Index Column in the CSV Output

csv_data = df.to_csv(index=False)
print(csv_data)

Output:

Name,ID,Role
Pankaj,1,CEO
Meghna,2,CTO
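One practical reason to skip the index is round-tripping. The sketch below is an added illustration, not part of the original example set: it reads both variants back with pd.read_csv() from an in-memory buffer to show what happens to an index column that was written without a label.

```python
import io

import pandas as pd

df = pd.DataFrame(
    {"Name": ["Pankaj", "Meghna"], "ID": [1, 2], "Role": ["CEO", "CTO"]}
)

# Written with the default index=True, the unlabeled index comes back
# as an extra column when the CSV is read again.
with_index = pd.read_csv(io.StringIO(df.to_csv()))

# Written with index=False, only the original columns come back.
without_index = pd.read_csv(io.StringIO(df.to_csv(index=False)))

print(list(with_index.columns))
print(list(without_index.columns))
```

If the index does carry meaning, passing index_col=0 to pd.read_csv() is an alternative to dropping it on write.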

7. Setting the Index Column Name in the CSV

csv_data = df.to_csv(index_label='Sl No.')
print(csv_data)

Output:

Sl No.,Name,ID,Role
0,Pankaj,1,CEO
1,Meghna,2,CTO

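8. Converting DataFrame to a CSV File

# newline='' keeps the line endings written by to_csv() from being translated a second time on Windows
with open('csv_data.txt', 'w', newline='') as csv_file:
    df.to_csv(path_or_buf=csv_file)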
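Besides an open file object, path_or_buf also accepts a path string, in which case pandas opens and closes the file itself. The following sketch assumes a temporary directory and a hypothetical file name so that it leaves nothing behind after running.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame(
    {"Name": ["Pankaj", "Meghna"], "ID": [1, 2], "Role": ["CEO", "CTO"]}
)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name inside a throwaway directory.
    csv_path = os.path.join(tmp_dir, "csv_data.csv")

    # Passing a path: pandas handles opening and closing the file.
    df.to_csv(csv_path, index=False)

    # Read the file back to inspect what was written.
    with open(csv_path, newline="") as csv_file:
        written = csv_file.read()

print(written)
```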

9. Null, NA, or Missing Data Representation in the CSV Output

import pandas as pd

d1 = {'Name': ['Pankaj', 'Meghna'], 'ID': [1, pd.NaT], 'Role': [pd.NaT, 'CTO']}
df = pd.DataFrame(d1)
print('DataFrame:\n', df)

csv_data = df.to_csv()
print('\nCSV String:\n', csv_data)

csv_data = df.to_csv(na_rep="None")
print('CSV String with Null Data Representation:\n', csv_data)

Output:

DataFrame:
      Name   ID Role
0  Pankaj    1  NaT
1  Meghna  NaT  CTO

CSV String:
 ,Name,ID,Role
0,Pankaj,1,
1,Meghna,,CTO

CSV String with Null Data Representation:
 ,Name,ID,Role
0,Pankaj,1,None
1,Meghna,None,CTO
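The same na_rep setting also applies to the more common missing-value markers np.nan and None. The sketch below is an added illustration with made-up data; it also uses float_format, listed in the syntax above, to show that missing values still get the na_rep text when floats are formatted.

```python
import numpy as np
import pandas as pd

# Illustrative data: one missing float and one missing string (an assumption).
df = pd.DataFrame(
    {
        "Name": ["Pankaj", "Meghna"],
        "Score": [91.256, np.nan],
        "Role": [None, "CTO"],
    }
)

csv_text = df.to_csv(
    index=False,
    na_rep="NULL",         # written wherever a value is missing
    float_format="%.2f",   # two decimal places for float values
)
print(csv_text)
```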
