Python批量插入数据至MySql

starlin 279 2018-08-30

最近工作中需要进行数据迁移,于是乎就想到的Python,当然用Java也是可以完成,但是和Python相比代码量会更少

在这个过程中,也遇到了一个很大的坑,就是MySql的Python驱动,之前一直用的pymysql,MySQLdb两个效率没有明显的提升,而且Google搜出来的基本都是用这两种方式连接MySql,平时用于测试当然是没什么问题,但是有大量数据的情况下,能明显感觉到效率不行,有兴趣的可以测试下,我这里就不比较了

最终在Mysql官网下载Mysql的Python驱动,那效率杠杠的,所以在这里提醒大家,尽量多看看官方的文档吧

代码实现

# coding = utf-8
import datetime;
from mysql.connector import MySQLConnection

# 数据库
conn_test =  MySQLConnection(host="127.0.0.1", port=3306, user="root", passwd="root", db="test", charset="utf8")

# 获取游标 cursor
cursor_test = conn_test.cursor()

###########################################################################################
class SynTbale:
    delte_table_name = ''
    insert_table_name = ''
    select_table_name = ''
    def __init__(self, delte_table_name, select_table_name, insert_table_name):
        self.delte_table_name = delte_table_name
        self.select_table_name = select_table_name
        self.insert_table_name = insert_table_name

    def delete_table(self, delte_table_name):
        cursor_test.execute(delte_table_name)

    def syn_mid_command_record(self,delte_table_name,select_table_name,insert_table_name):
        cursor_test.execute(delte_table_name) #清空表数据
        cursor_test.execute(select_table_name)
        rows = cursor_test.fetchall()
        print("总数:",len(rows))
        sql_insert_mid_command_record = insert_table_name
        list = []
        pushtime = 0
        for i in range(len(rows)):
            # print(len(list_data))
            list.append(rows[i])
            if ((i + 1) % 10000 == 0 or (i + 1) == len(rows)): #一次插入1W条,分多次插入
                pushtime += 1
                print("推送次数:", pushtime)
                cursor_test.executemany(sql_insert_mid_command_record, list)  # 核心
                conn_test.commit()
                list = []

class Mid_Command_Record:
    delte_table_name = 'DELETE FROM PartyCustomerSyn'

    select_table_name = """ SELECT name,gender,email 
                             FROM test   """



    insert_table_name = """ insert into test_copy1 ( name, gender, email) 
                            VALUES (%s, %s, %s) """


    def syn_info(self):
        x = SynTbale(self.delte_table_name, self.select_table_name, self.insert_table_name)
        x.syn_mid_command_record(self.delte_table_name, self.select_table_name, self.insert_table_name)


# 同步数据
if __name__ == '__main__':
    start_time = datetime.datetime.now()
    Mid_Command_Record().syn_info()
    end_time = datetime.datetime.now()
    print(end_time - start_time)


# 关闭cursor
cursor_test.close()

# 关闭连接
conn_test.close()

运行

运行代码之前了,要查询的表test里面有3000000条数据,插入的表test_copy1表里面已经有了3111111条数据,运行结果如下:

3000000
推送次数: 1
推送次数: 2
..........
省略
..........
推送次数: 297
推送次数: 298
推送次数: 299
推送次数: 300
0:06:03.010696

总共用时6分03秒,推送次数300次
用时里面我感觉很大一部分时间花费在查询数据上面,其实真正的插入时间很短,有兴趣的小伙伴可以试试把从数据库查询数据改成写死的数据,在比较下时间

参考

MySql-python官方文档


# Python