pyspark 读mysql数据_pyspark 数据库读写数据

一.从数据库读数据

1.导入jar包

c45307bc9383

image.png

在spark-hadoop包下的jars中导入对应数据库驱动的jar包

如

c45307bc9383

image.png

我所用的是oracle数据库,则导入ojdbc6-11.2.0.jar

2.数据库配置

我的数据库配置采用的 ini 配置文件的方式(此步可省略,手写链接配置也可以)

c45307bc9383

image.png

获取配置的方法:

#dbtype为[]中的名称,config_path为配置文件的地址

def get_db_config(dbtype,config_path='/home/ap/cognos/JRJY_Rec/config/db_config.ini'):

import configparser

#读取ini配置文件

cf = configparser.ConfigParser()

cf.read(config_path)

url = cf.get(dbtype,'url')

user = cf.get(dbtype,'user')

password = cf.get(dbtype,'password')

driver = cf.get(dbtype,'driver')

prop = {'user': user,'password': password,'driver': driver}

return prop,url

prop,url = get_db_config('oracle-hasdb')

#prop中为用户名,密码,驱动

#url为jdbc链接

3.从数据库导出数据到pyspark的dataframe

df = spark.read.jdbc(url=url,table='table_name',properties=prop)

# url jdbc连接

# table 数据库表名,也可以是查询语句,如:select * from table_name where ....

# properties 配置信息,也可以手动填写,如:properties={'user':'username','password':'password','driver':'driver'}

二.dataframe写入数据到数据库

prop,url = get_db_config('oracle-hasdb')

df.write.jdbc(url=url, table='table_name', mode='append', properties=prop)

# 配置文件和读数据库配置一样

# table table为数据库建立的表,如果不存在,spark会为df建立表

# mode append为追加写人数据