1. Reading a plain JSON file
import org.apache.spark.sql.SparkSession

val session = SparkSession.builder().appName("sql").master("local").getOrCreate()
val df = session.read.format("json").load("./data/json")
2. Reading a nested JSON file
/**
 * Format:
 * {"name":"zhangsan","score":100,"infos":{"age":20,"gender":"man"}}
 */
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local")
.appName("nextjson")
.getOrCreate()
//read the nested JSON file
val frame = spark.read.format("json").load("./data/NestJsonFile")
frame.printSchema()
frame.show(100)
frame.createOrReplaceTempView("infosView")
//the JSON nests an object; in SQL, the dot path infos.age reaches into it and pulls the field out
spark.sql("select name,infos.age,score,infos.gender from infosView").show(100)
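The flattening that the SQL performs can be mimicked in plain Scala (no Spark) to show what the dot path infos.age resolves to: a field of a nested record pulled up to the top level. The case classes below mirror the sample JSON and are illustrative only.

```scala
// Plain-Scala sketch (no Spark) of nested-field access: the dot path
// infos.age navigates into the nested record and lifts the field into
// a flat row. Field names mirror the sample JSON above.
case class Infos(age: Int, gender: String)
case class Person(name: String, score: Int, infos: Infos)

val people = Seq(Person("zhangsan", 100, Infos(20, "man")))

// Equivalent of: select name, infos.age, score, infos.gender
val flat = people.map(p => (p.name, p.infos.age, p.score, p.infos.gender))
```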
3. Reading a nested JSON array
/**
 * Reads a nested JSON array, formatted as follows:
 * {"name":"lisi","age":19,"scores":[{"yuwen":58,"shuxue":50,"yingyu":78},{"dili":56,"shengwu":76,"huaxue":13}]}
 *
 * explode: expands the array so that each JSON object in it becomes a row of its own
 */
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
.appName("jsonArray")
.master("local")
.getOrCreate()
val frame = spark.read.format("json").load("./data/jsonArrayFile")
//show(false): do not truncate long column values
frame.show(false)
frame.printSchema()
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._
import spark.implicits._
//after the select, name is still name and age is still age; the exploded column is renamed allScores
val transDF = frame.select($"name",$"age",explode($"scores")).toDF("name","age","allScores")
transDF.show(100,false)
transDF.printSchema()
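What explode does here can be sketched in plain Scala (no Spark) as a flatMap: every element of the array becomes a row of its own, with the scalar columns repeated alongside it. The case class and data below mirror the sample JSON and are illustrative only.

```scala
// Plain-Scala sketch (no Spark) of explode: each element of the scores
// array becomes its own row, and the scalar columns (name, age) are
// repeated next to it.
case class Student(name: String, age: Int, scores: Seq[Map[String, Int]])

val rows = Seq(Student("lisi", 19, Seq(
  Map("yuwen" -> 58, "shuxue" -> 50, "yingyu" -> 78),
  Map("dili" -> 56, "shengwu" -> 76, "huaxue" -> 13))))

// Equivalent of: frame.select($"name", $"age", explode($"scores"))
val exploded = rows.flatMap(s => s.scores.map(sc => (s.name, s.age, sc)))
// one input row with a two-element array becomes two output rows
```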
val result: DataFrame = transDF.select($"name", $"age",
$"allScores.yuwen" as "yuwen",
$"allScores.shuxue" as "shuxue",
$"allScores.yingyu" as "yingyu",
$"allScores.dili" as "dili",
$"allScores.shengwu" as "shengwu",
$"allScores.huaxue" as "huaxue")
//subjects absent from a row's score object show up as null
result.show(100,true)
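How those nulls arise can be sketched in plain Scala (no Spark): each score object carries only some of the six subjects, so selecting all six leaves the missing ones without a value. The data below mirrors the exploded rows and is illustrative only.

```scala
// Plain-Scala sketch (no Spark) of selecting all six subject columns:
// a subject absent from a row's score map has no value (Spark shows null,
// plain Scala gives None).
val subjects = Seq("yuwen", "shuxue", "yingyu", "dili", "shengwu", "huaxue")

val exploded = Seq(
  ("lisi", 19, Map("yuwen" -> 58, "shuxue" -> 50, "yingyu" -> 78)),
  ("lisi", 19, Map("dili" -> 56, "shengwu" -> 76, "huaxue" -> 13)))

// Equivalent of: select($"allScores.yuwen", $"allScores.shuxue", ...)
val wide = exploded.map { case (name, age, sc) =>
  (name, age, subjects.map(sc.get))
}
```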
Copyright notice: this is an original article by qq_36299025, licensed under CC 4.0 BY-SA. Please attach a link to the original source and this notice when reposting.