最近工作中需要进行数据迁移,于是乎就想到的Python,当然用Java也是可以完成,但是和Python相比代码量会更少
在这个过程中,也遇到了一个很大的坑,就是MySql的Python驱动,之前一直用的pymysql,MySQLdb两个效率没有明显的提升,而且Google搜出来的基本都是用这两种方式连接MySql,平时用于测试当然是没什么问题,但是有大量数据的情况下,能明显感觉到效率不行,有兴趣的可以测试下,我这里就不比较了
最终在Mysql官网下载Mysql的Python驱动,那效率杠杠的,所以在这里提醒大家,尽量多看看官方的文档吧
代码实现
# coding = utf-8
import datetime;
from mysql.connector import MySQLConnection
# 数据库
conn_test = MySQLConnection(host="127.0.0.1", port=3306, user="root", passwd="root", db="test", charset="utf8")
# 获取游标 cursor
cursor_test = conn_test.cursor()
###########################################################################################
class SynTbale:
delte_table_name = ''
insert_table_name = ''
select_table_name = ''
def __init__(self, delte_table_name, select_table_name, insert_table_name):
self.delte_table_name = delte_table_name
self.select_table_name = select_table_name
self.insert_table_name = insert_table_name
def delete_table(self, delte_table_name):
cursor_test.execute(delte_table_name)
def syn_mid_command_record(self,delte_table_name,select_table_name,insert_table_name):
cursor_test.execute(delte_table_name) #清空表数据
cursor_test.execute(select_table_name)
rows = cursor_test.fetchall()
print("总数:",len(rows))
sql_insert_mid_command_record = insert_table_name
list = []
pushtime = 0
for i in range(len(rows)):
# print(len(list_data))
list.append(rows[i])
if ((i + 1) % 10000 == 0 or (i + 1) == len(rows)): #一次插入1W条,分多次插入
pushtime += 1
print("推送次数:", pushtime)
cursor_test.executemany(sql_insert_mid_command_record, list) # 核心
conn_test.commit()
list = []
class Mid_Command_Record:
delte_table_name = 'DELETE FROM PartyCustomerSyn'
select_table_name = """ SELECT name,gender,email
FROM test """
insert_table_name = """ insert into test_copy1 ( name, gender, email)
VALUES (%s, %s, %s) """
def syn_info(self):
x = SynTbale(self.delte_table_name, self.select_table_name, self.insert_table_name)
x.syn_mid_command_record(self.delte_table_name, self.select_table_name, self.insert_table_name)
# 同步数据
if __name__ == '__main__':
start_time = datetime.datetime.now()
Mid_Command_Record().syn_info()
end_time = datetime.datetime.now()
print(end_time - start_time)
# 关闭cursor
cursor_test.close()
# 关闭连接
conn_test.close()
运行
运行代码之前了,要查询的表test里面有3000000条数据,插入的表test_copy1表里面已经有了3111111条数据,运行结果如下:
3000000
推送次数: 1
推送次数: 2
..........
省略
..........
推送次数: 297
推送次数: 298
推送次数: 299
推送次数: 300
0:06:03.010696
总共用时6分03秒,推送次数300次
用时里面我感觉很大一部分时间花费在查询数据上面,其实真正的插入时间很短,有兴趣的小伙伴可以试试把从数据库查询数据改成写死的数据,在比较下时间