Flink on YARN troubleshooting notes

java.util.concurrent.CompletionException: org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Slot request bulk is not fulfillable! Could not allocate the required slot within slot request timeout
	at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292) ~[?:1.8.0_202]
	at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308) ~[?:1.8.0_202]
	at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:593) ~[?:1.8.0_202]
	at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577) ~[?:1.8.0_202]
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) ~[?:1.8.0_202]
	at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977) ~[?:1.8.0_202]
	at org.apache.flink.runtime.scheduler.SharedSlot.cancelLogicalSlotRequest(SharedSlot.java:222) ~[rtdw-1.0.jar:?]
	at org.apache.flink.runtime.scheduler.SlotSharingExecutionSlotAllocator.cancelLogicalSlotRequest(SlotSharingExecutionSlotAllocator.java:164) ~[rtdw-1.0.jar:?]
	at org.apache.flink.runtime.scheduler.SharingPhysicalSlotRequestBulk.cancel(SharingPhysicalSlotRequestBulk.java:86) ~[rtdw-1.0.jar:?]
	at org.apache.flink.runtime.jobmaster.slotpool.PhysicalSlotRequestBulkWithTimestamp.cancel(PhysicalSlotRequestBulkWithTimestamp.java:66) ~[rtdw-1.0.jar:?]
	at org.apache.flink.runtime.jobmaster.slotpool.PhysicalSlotRequestBulkCheckerImpl.lambda$schedulePendingRequestBulkWithTimestampCheck$0(PhysicalSlotRequestBulkCheckerImpl.java:91) ~[rtdw-1.0.jar:?]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_202]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_202]
	at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:440) ~[rtdw-1.0.jar:?]
	at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:208) ~[rtdw-1.0.jar:?]
	at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:77) ~[rtdw-1.0.jar:?]
	at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:158) ~[rtdw-1.0.jar:?]
	at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) [rtdw-1.0.jar:?]
	at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) [rtdw-1.0.jar:?]
	at scala.PartialFunction.applyOrElse(PartialFunction.scala:123) [rtdw-1.0.jar:?]
	at scala.PartialFunction.applyOrElse$(PartialFunction.scala:122) [rtdw-1.0.jar:?]
	at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) [rtdw-1.0.jar:?]
	at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) [rtdw-1.0.jar:?]
	at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172) [rtdw-1.0.jar:?]
	at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172) [rtdw-1.0.jar:?]
	at akka.actor.Actor.aroundReceive(Actor.scala:517) [rtdw-1.0.jar:?]
	at akka.actor.Actor.aroundReceive$(Actor.scala:515) [rtdw-1.0.jar:?]
	at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) [rtdw-1.0.jar:?]
	at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) [rtdw-1.0.jar:?]
	at akka.actor.ActorCell.invoke(ActorCell.scala:561) [rtdw-1.0.jar:?]
	at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) [rtdw-1.0.jar:?]
	at akka.dispatch.Mailbox.run(Mailbox.scala:225) [rtdw-1.0.jar:?]
	at akka.dispatch.Mailbox.exec(Mailbox.scala:235) [rtdw-1.0.jar:?]
	at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) [rtdw-1.0.jar:?]
	at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) [rtdw-1.0.jar:?]
	at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [rtdw-1.0.jar:?]
	at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [rtdw-1.0.jar:?]
Caused by: org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Slot request bulk is not fulfillable! Could not allocate the required slot within slot request timeout
	at org.apache.flink.runtime.jobmaster.slotpool.PhysicalSlotRequestBulkCheckerImpl.lambda$schedulePendingRequestBulkWithTimestampCheck$0(PhysicalSlotRequestBulkCheckerImpl.java:86) ~[rtdw-1.0.jar:?]
	... 26 more
Caused by: java.util.concurrent.TimeoutException: Timeout has occurred: 300000 ms
	at org.apache.flink.runtime.jobmaster.slotpool.PhysicalSlotRequestBulkCheckerImpl.lambda$schedulePendingRequestBulkWithTimestampCheck$0(PhysicalSlotRequestBulkCheckerImpl.java:86) ~[rtdw-1.0.jar:?]
	... 26 more
2022-08-21 17:13:56,998 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Discarding the results produced by task execution d192cd91bb3ed7aec7da70be712ccd7e.
2022-08-21 17:13:57,001 INFO  org.apache.flink.runtime.executiongraph.failover.flip1.RestartPipelinedRegionFailoverStrategy [] - Calculating tasks to restart to recover the failed task 6b4d471152f7c802fd98f04640343bd2_1.
2022-08-21 17:13:57,002 INFO  org.apache.flink.runtime.executiongraph.failover.flip1.RestartPipelinedRegionFailoverStrategy [] - 6 tasks should be restarted to recover the failed task 6b4d471152f7c802fd98f04640343bd2_1. 
2022-08-21 17:13:57,003 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Job insert-into_default_catalog.default_database.course_modules_completion (077ddf5e864b081bf94e2db5fbfb9de2) switched from state RUNNING to FAILING.
org.apache.flink.runtime.JobException: Recovery is suppressed by NoRestartBackoffTimeStrategy
java.util.concurrent.CompletionException: org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Slot request bulk is not fulfillable! Could not allocate the required slot within slot request timeout

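1. Could not allocate the required slots: the slot request timed out

As a result, the Flink web UI shows the job's tasks stuck in the CREATED state.

Fix:

If YARN has no resources left to provide more slots, reduce the job's parallelism.
Alternatively, if the CPU usage of a single parallel instance is high, increase the parallelism and reduce the memory given to each parallel instance.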
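
The 300000 ms in the trace matches Flink's default slot.request.timeout, so the job waited the full five minutes without getting its slots. To judge whether a given parallelism can fit at all, the arithmetic can be sketched as below. This is a rough estimate only: the function name and all numbers are hypothetical, it assumes YARN limits both memory and vcores, and it assumes default slot sharing, where a job needs about as many slots as its highest operator parallelism.

```python
def max_slots(queue_memory_mb, queue_vcores,
              tm_memory_mb, tm_vcores, slots_per_tm,
              jm_memory_mb=0, jm_vcores=0):
    """Rough upper bound on the task slots YARN can provide for one job."""
    # Resources left once the JobManager container has been placed.
    free_memory = queue_memory_mb - jm_memory_mb
    free_vcores = queue_vcores - jm_vcores
    if free_memory < 0 or free_vcores < 0:
        return 0
    # The number of TaskManager containers is capped by whichever
    # resource runs out first.
    task_managers = min(free_memory // tm_memory_mb, free_vcores // tm_vcores)
    return task_managers * slots_per_tm


# Hypothetical queue: 16 GB and 16 vcores free, JobManager takes 2 GB / 1 vcore.
# TaskManagers of 4 GB with 2 slots each:
print(max_slots(16384, 16, 4096, 2, 2, jm_memory_mb=2048, jm_vcores=1))
# Same queue, TaskManagers shrunk to 2 GB with 2 slots each:
print(max_slots(16384, 16, 2048, 2, 2, jm_memory_mb=2048, jm_vcores=1))
```

If the job's parallelism is above the first number, either lower it or shrink the memory per TaskManager so that more containers (and therefore more slots) fit, which is what the two fixes above amount to.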

2. rest.port is already in use

Caused by: java.net.BindException: Could not start rest endpoint on any port in port range 39144    # no port range was specified
org.apache.flink.runtime.entrypoint.ClusterEntrypointException: Failed to initialize the cluster entrypoint YarnJobClusterEntrypoint.
	at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:212) ~[rtdw-1.0.jar:?]
	at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runClusterEntrypoint(ClusterEntrypoint.java:600) [rtdw-1.0.jar:?]
	at org.apache.flink.yarn.entrypoint.YarnJobClusterEntrypoint.main(YarnJobClusterEntrypoint.java:99) [flink-dist_2.11-1.13.2.jar:1.13.2]
Caused by: org.apache.flink.util.FlinkException: Could not create the DispatcherResourceManagerComponent.
	at org.apache.flink.runtime.entrypoint.component.DefaultDispatcherResourceManagerComponentFactory.create(DefaultDispatcherResourceManagerComponentFactory.java:275) ~[rtdw-1.0.jar:?]
	at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runCluster(ClusterEntrypoint.java:250) ~[rtdw-1.0.jar:?]
	at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$startCluster$1(ClusterEntrypoint.java:189) ~[rtdw-1.0.jar:?]
	at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_202]
	at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_202]
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) ~[hadoop-common-3.1.1.jar:?]
	at org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41) ~[rtdw-1.0.jar:?]
	at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:186) ~[rtdw-1.0.jar:?]
	... 2 more
Caused by: java.net.BindException: Could not start rest endpoint on any port in port range 39144
	at org.apache.flink.runtime.rest.RestServerEndpoint.start(RestServerEndpoint.java:234) ~[rtdw-1.0.jar:?]
	at org.apache.flink.runtime.entrypoint.component.DefaultDispatcherResourceManagerComponentFactory.create(DefaultDispatcherResourceManagerComponentFactory.java:172) ~[rtdw-1.0.jar:?]
	at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runCluster(ClusterEntrypoint.java:250) ~[rtdw-1.0.jar:?]
	at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$startCluster$1(ClusterEntrypoint.java:189) ~[rtdw-1.0.jar:?]
	at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_202]
	at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_202]
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) ~[hadoop-common-3.1.1.jar:?]
	at org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41) ~[rtdw-1.0.jar:?]
	at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:186) ~[rtdw-1.0.jar:?]
	... 2 more

Fix: edit flink-conf.yaml (for example with vim) and change the port range.

The port to which the REST client connects to. If rest.bind-port has not been specified, then the server will bind to this port as well.

rest.port: 8087

Port range for the REST and web server to bind to 

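
Before picking a range for rest.bind-port, it can help to see which ports are actually free on the node. The following is a minimal sketch using only the Python standard library; the helper names and the 50100-50200 range are illustrative assumptions, not values from the original setup. On YARN the JobManager may start on any NodeManager host, so the check only reflects the machine it is run on.

```python
import socket


def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can currently bind to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use".
            return False
    return True


def free_ports(first, last, host="0.0.0.0"):
    """List the ports in [first, last] that can be bound right now."""
    return [p for p in range(first, last + 1) if port_is_free(p, host)]


if __name__ == "__main__":
    # Hypothetical candidate range for rest.bind-port.
    print(free_ports(50100, 50200))
```

If most of the range comes back free, it could then be set in flink-conf.yaml as rest.bind-port: 50100-50200, so that the REST endpoint has more than one port to try.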

3. java.lang.VerifyError: Cannot inherit from final class

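The logs show that this is caused by a CDC dependency problem in Flink.

If a ...cdc.jar already exists in Flink's lib directory, the problem is a version mismatch: its version must match the one used in the code.

If no ....cdc.jar exists there, download it.

Download link: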

wget https://repo1.maven.org/maven2/com/ververica/flink-connector-mysql-cdc/2.0.0/flink-connector-mysql-cdc-2.0.0.jar   
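
To check which CDC connector jars are in the lib directory and at which version, a small script like the one below can be used. It is only a sketch: the function name and the file-name pattern are assumptions based on the Maven artifact name in the wget command above. Note also that the frames in the trace below are loaded from rtdw-1.0.jar, which suggests the job jar bundles the CDC classes itself, so its version is the one to compare against.

```python
import re
from pathlib import Path

# Matches names such as flink-connector-mysql-cdc-2.0.0.jar or
# flink-sql-connector-mysql-cdc-2.0.0.jar (assumed naming convention).
JAR_PATTERN = re.compile(
    r"^(?P<artifact>flink-(?:sql-)?connector-[a-z0-9-]+-cdc)"
    r"-(?P<version>\d[\w.-]*)\.jar$"
)


def cdc_jars(lib_dir):
    """Return (artifact, version) pairs for the CDC jars found in lib_dir."""
    lib = Path(lib_dir)
    if not lib.is_dir():
        return []
    found = []
    for jar in sorted(lib.glob("*.jar")):
        match = JAR_PATTERN.match(jar.name)
        if match:
            found.append((match["artifact"], match["version"]))
    return found
```

Calling cdc_jars on the Flink lib directory and comparing the result with the CDC version declared in the project's build file shows whether the two agree, or whether more than one copy of the connector is present.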

at com.ververica.cdc.connectors.mysql.debezium.DebeziumUtils.openMySqlConnection(DebeziumUtils.java:55) ~[rtdw-1.0.jar:?]
	at com.ververica.cdc.connectors.mysql.MySqlValidator.validate(MySqlValidator.java:59) ~[rtdw-1.0.jar:?]
	at com.ververica.cdc.debezium.DebeziumSourceFunction.open(DebeziumSourceFunction.java:215) ~[rtdw-1.0.jar:?]
	at org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:34) ~[rtdw-1.0.jar:?]
	at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.open(AbstractUdfStreamOperator.java:102) ~[rtdw-1.0.jar:?]
	at org.apache.flink.streaming.runtime.tasks.OperatorChain.initializeStateAndOpenOperators(OperatorChain.java:442) ~[rtdw-1.0.jar:?]
	at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreGates(StreamTask.java:582) ~[rtdw-1.0.jar:?]
	at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.call(StreamTaskActionExecutor.java:100) ~[rtdw-1.0.jar:?]
	at org.apache.flink.streaming.runtime.tasks.StreamTask.executeRestore(StreamTask.java:562) ~[rtdw-1.0.jar:?]
	at org.apache.flink.streaming.runtime.tasks.StreamTask.runWithCleanUpOnFail(StreamTask.java:647) ~[rtdw-1.0.jar:?]
	at org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:537) ~[rtdw-1.0.jar:?]
	at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:759) ~[rtdw-1.0.jar:?]
	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:566) ~[rtdw-1.0.jar:?]

After fixing the errors above, the job finally ran successfully.


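Copyright notice: this is an original article by m0_58149226, licensed under CC 4.0 BY-SA. When reposting, include a link to the original source and this notice.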