使用过spark的人都知道广播变量这个概念。广播变量相当于一个共享变量,将一个小数据集复制分发到每个task,task直接从本地读取。flink中有两种广播变量,一种静态的广播变量,一种实时动态的广播变量。
静态广播变量示例:
使用场景如: 黑名单判断,将黑名单广播出去进行数据匹配。
public class FlinkBroadcast2 {
public static void main(String[] args) throws Exception {
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
DataSet<Integer> ds1 = env.fromElements(1, 2, 3, 4);
DataSource<Integer> ds2 = env.fromElements(7, 8, 9, 10);
ds2.map(new RichMapFunction<Integer, String>() {
List<Integer> list = new ArrayList<Integer>();
public void open(Configuration parameters) throws Exception {
list = getRuntimeContext().getBroadcastVariable("bs");
}
@Override
public String map(Integer integer) throws Exception {
return integer.intValue()+":"+list;
}
}).withBroadcastSet(ds1,"bs").print();
// env.execute();
}
动态广播变量示例:
使用场景: 数据依赖某些动态变化的处理规则
广播流一般都是从kafka或其他数据源获取,这里演示直接固定了。从kafka获取流,修改数据后,下游也会更新广播流。
key streaming 使用KeyedBroadcastProcessFunction
.
非key streaming 使用 BroadcastProcessFunction
.
public class FlinkBroadcast {
public static void main(String[] args) throws Exception {
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setParallelism(1);
DataStream<Integer> ds1 = env.fromElements(2);
DataStreamSource<Integer> ds2 = env.fromElements(7,8,9,10);
MapStateDescriptor<String, Integer> ruleStateDescriptor = new MapStateDescriptor<>(
"BroadcastState",
BasicTypeInfo.STRING_TYPE_INFO,
TypeInformation.of(Integer.TYPE));
BroadcastStream<Integer> ruleBroadcastStream = ds1
.broadcast(ruleStateDescriptor);
ds2.connect(ruleBroadcastStream)
.process(
new BroadcastProcessFunction<Integer,Integer,String>() {
@Override
public void processElement(Integer integer, ReadOnlyContext readOnlyContext, Collector<String> collector) throws Exception {
ReadOnlyBroadcastState<String, Integer> state = readOnlyContext.getBroadcastState(ruleStateDescriptor);
Integer integer1 = state.get("test");
collector.collect(integer+"="+integer1);
}
@Override
public void processBroadcastElement(Integer integer, Context context, Collector<String> collector) throws Exception {
context.getBroadcastState(ruleStateDescriptor).put("test",integer);
}
}
).print();
env.execute();
}
}
版权声明:本文为wflh323原创文章,遵循CC 4.0 BY-SA版权协议,转载请附上原文出处链接和本声明。