在将Excel或CSV文件中的内容导入到数据库时，使用Python的Pandas工具非常方便

2 年 ago

韵, 科

2 minutes

最近才知道，当提到Python的数据分析库时，Pandas是最受欢迎的选择。有了Pandas，可以轻松将Excel或CSV文件中的内容导入到数据库中。我想在这里总结一下。

环境

Pandas 2.0.2
SQLAlchemy 2.0.16

PostgreSQL 13.8

做好准备

这次我将尝试将数据导入到PostgreSQL数据库中。
无论是云端还是本地环境都可以，但这次我选择了在本地环境中建立一个PostgreSQL容器。

FROM postgres:13-alpine
ENV LANG ja_JP.utf8

version: '3.6'
services:
  db:
    container_name: postgres
    build: .
    ports:
      - 15432:5432
    volumes:
      - postgres_volume:/var/lib/postgresql/data
    environment:
      POSTGRES_USER: root 
      POSTGRES_PASSWORD: root 

volumes:
  postgres_volume:

在启动容器后，使用CREATE DATABASE 创建名为test_db的数据库，然后使用CREATE TABLE 在hoge模式下创建名为test_tbl的表格。

CREATE TABLE hoge.test_tbl (id integer, name varchar(20), address  varchar(30));

将数据导入数据库中（写入）。

我刚刚创建了一个与之前准备好的test_tbl相匹配的Excel文件。
这次我创建了一个名为sample.xlsx的文件。

我会写一个用于导入的Python脚本。
使用Pandas的read_excel方法读取Excel文件，然后使用to_sql方法将数据导入数据库。
create_engine方法的参数-csearch_path=hoge用于指定模式，这与编写SET search_path TO schema_name查询的意义相同。
如果要读取CSV文件而不是Excel文件，则使用read_csv方法。

import pandas as pd
from sqlalchemy import create_engine

connection_config = {
  'user': 'root',
  'password': 'root',
  'host': 'localhost',
  'port': '15432',
  'database': 'test_db'
}

if __name__ == '__main__':

  url = 'postgresql://{user}:{password}@{host}:{port}/{database}'.format(**connection_config)
  engine = create_engine(url, connect_args={'options': '-csearch_path=hoge'})

  with engine.connect() as conn:
    df = pd.read_excel('sample.xlsx')
    df.to_sql(name='test_tbl', con=conn, if_exists='append', index=False)

name

テーブル名

con

SQL接続エンジン

if_exists

append: データが追加される
replace: 既存データが置き換えられる

index

DataFrame のインデックスをテーブルに追加するか否か

schema（上記コードには書いていません）

スキーマの指定

关于其他参数，请参考官方页面。

从数据库中获取数据

从数据库中获取数据使用的是 read_sql 方法。

import pandas as pd
from sqlalchemy import create_engine

connection_config = {
  'user': 'root',
  'password': 'root',
  'host': 'localhost',
  'port': '15432',
  'database': 'test_db'
}

if __name__ == '__main__':

  url = 'postgresql://{user}:{password}@{host}:{port}/{database}'.format(**connection_config)
  engine = create_engine(url, connect_args={'options': '-csearch_path=hoge'})

  with engine.connect() as conn:
    df = pd.read_sql('SELECT * FROM test_tbl', con=conn)
    print(df)

# 実行結果
    id    name   address
0  100    Taro     Tokyo
1  101  Hanako     Chiba
2  102  Ichiro  Kanagawa

sql

SQL クエリ
テーブル名を指定することもでき、その場合は全カラムの値を取得できる

con

SQL接続エンジン

赠品

您可以使用read_excel方法，在AWS S3存储桶中读取存储的Excel（CSV）文件。需要安装fsspec和s3fs。

fsspec 2023.6.0
s3fs 2023.6.0

  with engine.connect() as conn:
    df = pd.read_excel('s3://your_bucket_name/sample.xlsx')
    df.to_sql(name='test_tbl', con=engine, if_exists='append', index=False)

执行Python脚本时，请指定AWS CLI的配置文件名。

AWS_PROFILE=プロファイル名 python pandas_sample.py

填补

由于 SQLAlchemy 支持不仅 PostgreSQL，还包括 MySQL、SQLite、Oracle、Microsoft SQLServer 等常见数据库，因此笔者尚未验证，但我认为除了 PostgreSQL 外，可以以同样的方式导入其他数据库的数据。

请参阅这篇文章。

https://towardsdatascience.com/work-with-sql-in-python-using-sqlalchemy-and-pandas-cd7693def708