Resolving the "Could not retrieve the web interface URL for the cluster" error

Problem description:

      org.apache.flink.client.program.rest.RestClusterClient:Could not retrieve the web interface URL for the cluster.

The detailed log is as follows:

Exception in thread "main" java.util.concurrent.ExecutionException: org.apache.flink.runtime.client.JobSubmissionException: Failed to submit JobGraph.
	at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
	at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
	at com.dtstack.flinkx.launcher.Launcher.main(Launcher.java:131)
Caused by: org.apache.flink.runtime.client.JobSubmissionException: Failed to submit JobGraph.
	at org.apache.flink.client.program.rest.RestClusterClient.lambda$submitJob$7(RestClusterClient.java:400)
	at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
	at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852)
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
	at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1595)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.TimeoutException
	at org.apache.flink.runtime.concurrent.FutureUtils$Timeout.run(FutureUtils.java:1255)
	at org.apache.flink.runtime.concurrent.DirectExecutorService.execute(DirectExecutorService.java:217)
	at org.apache.flink.runtime.concurrent.FutureUtils.lambda$orTimeout$15(FutureUtils.java:582)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
	at java.util.co

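Download the Flink 1.13 source and build it (be sure to build the whole project; building a single module on its own can cause all sorts of problems):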

Add logging to RestClusterClient:

// Method in org.apache.flink.client.program.rest.RestClusterClient,
// with two LOG.info calls added for diagnosis.
CompletableFuture<URL> getWebMonitorBaseUrl() {
    // Log how long the client waits for the REST leader address (milliseconds).
    LOG.info(
            "------------------getWebMonitorBaseUrl await leader timeout: {} ms",
            restClusterClientConfiguration.getAwaitLeaderTimeout());
    return FutureUtils.orTimeout(
                    webMonitorLeaderRetriever.getLeaderFuture(),
                    restClusterClientConfiguration.getAwaitLeaderTimeout(),
                    TimeUnit.MILLISECONDS)
            .thenApplyAsync(
                    leaderAddressSessionId -> {
                        final String url = leaderAddressSessionId.f0;
                        // Reached only if the leader future completed before the timeout.
                        LOG.info("------------------getWebMonitorBaseUrl url is {}", url);
                        try {
                            return new URL(url);
                        } catch (MalformedURLException e) {
                            throw new IllegalArgumentException(
                                    "Could not parse URL from " + url, e);
                        }
                    },
                    executorService);
}

Note: the build may fail with the following error:

Failed to execute goal com.diffplug.spotless:spotless-maven-plugin:2.4.2:check (spotless-check) on project flink-clients_2.11: The following files had format violations:
    src\main\java\org\apache\flink\client\program\rest\RestClusterClient.java
        @@ -1,895 +1,895 @@
        -/*\n

In that case, run mvn spotless:apply first to format the code, then build again.

After the build, run the job with the patched jar so the added log lines are printed. The output looks like this:

The log shows that the timeout is 30 seconds, which is the expected value.

Continuing to trace the problem:

In the failing environment the value is false, possibly because of how the job is scheduled.

The preliminary cause lies in these two places:

The orTimeout method in org.apache.flink.runtime.concurrent.FutureUtils.
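To make the TimeoutException in the stack trace easier to picture, here is a minimal sketch of the or-timeout pattern using only the JDK. It is not Flink's FutureUtils code. The class OrTimeoutSketch, its helper awaitOutcome and the sample URL are hypothetical names made up for illustration. The idea is that a timer fails the future with a TimeoutException if nobody completes it in time, which is what happens when the leader address never arrives.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical illustration class; not part of Flink.
final class OrTimeoutSketch {

    // Single daemon timer thread, so the JVM can exit without an explicit shutdown.
    private static final ScheduledExecutorService TIMER =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "timeout-timer");
                t.setDaemon(true);
                return t;
            });

    /** Fails the future with a TimeoutException unless it completes within the timeout. */
    static <T> CompletableFuture<T> orTimeout(
            CompletableFuture<T> future, long timeout, TimeUnit unit) {
        if (!future.isDone()) {
            ScheduledFuture<?> timer = TIMER.schedule(
                    () -> { future.completeExceptionally(new TimeoutException()); },
                    timeout, unit);
            // Cancel the timer once the future completes on its own.
            future.whenComplete((value, error) -> timer.cancel(false));
        }
        return future;
    }

    /** Blocks on the future; returns "completed" or the simple class name of the failure cause. */
    static String awaitOutcome(CompletableFuture<?> future) {
        try {
            future.join();
            return "completed";
        } catch (CompletionException e) {
            return e.getCause().getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        // Stand-in for a leader future that is never completed.
        CompletableFuture<String> neverCompleted = new CompletableFuture<>();
        System.out.println(awaitOutcome(orTimeout(neverCompleted, 200, TimeUnit.MILLISECONDS)));

        // Stand-in for a leader address that is already known (placeholder URL).
        CompletableFuture<String> known =
                CompletableFuture.completedFuture("http://localhost:8081");
        System.out.println(awaitOutcome(orTimeout(known, 200, TimeUnit.MILLISECONDS)));
    }
}
```

The first call reports a TimeoutException after roughly 200 ms, while the second returns immediately. In the article's case the wait is 30 seconds and the future being waited on is the web monitor leader future.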

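CompletableFuture.toString() returns a string identifying this CompletableFuture together with its completion state. The state, in brackets, contains the string "Completed Normally", the string "Completed Exceptionally", or the string "Not completed" followed by the number of CompletableFutures that depend on its completion, if any.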
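The states described above can be observed directly with a few lines of plain JDK code. This is only an illustrative sketch. The class FutureStateDemo and the helper stateOf are hypothetical names, and the exact wording inside the brackets varies slightly between JDK versions.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical illustration class; not part of Flink.
final class FutureStateDemo {

    /** Extracts the bracketed state suffix from CompletableFuture.toString(). */
    static String stateOf(CompletableFuture<?> future) {
        String text = future.toString();
        return text.substring(text.indexOf('['));
    }

    public static void main(String[] args) {
        // Never completed: comparable to a leader future that no one fulfils.
        CompletableFuture<String> pending = new CompletableFuture<>();

        // Completed with a value.
        CompletableFuture<String> done = CompletableFuture.completedFuture("ok");

        // Completed with an exception.
        CompletableFuture<String> failed = new CompletableFuture<>();
        failed.completeExceptionally(new IllegalStateException("boom"));

        // Pending future with one dependent stage registered on it.
        CompletableFuture<String> withDependent = new CompletableFuture<>();
        withDependent.thenApply(String::length);

        System.out.println(stateOf(pending));
        System.out.println(stateOf(done));
        System.out.println(stateOf(failed));
        System.out.println(stateOf(withDependent));
    }
}
```

Logging the leader future this way during debugging shows at a glance whether it is still pending when the timeout fires.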

The deeper root cause is still being investigated; see the follow-up article once it has been found.

Root cause identified:

The problem occurs when Flink is started with yarn-session.sh but ZooKeeper high availability is not configured in flink-conf.yaml.

After configuring it and restarting Flink, the problem was resolved.
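As a rough aid for checking this before submitting a job, the sketch below parses flink-conf.yaml-style text and reports whether ZooKeeper high availability appears to be configured. The class HaConfigCheck is a hypothetical helper, not part of Flink. The keys high-availability, high-availability.zookeeper.quorum and high-availability.storageDir are Flink's configuration options. The host names and the HDFS path in the sample are placeholders and are not taken from the article's environment.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical pre-flight helper; not part of Flink.
final class HaConfigCheck {

    /** Parses flat "key: value" lines as used in flink-conf.yaml, skipping comments and blanks. */
    static Map<String, String> parse(String confText) {
        Map<String, String> conf = new LinkedHashMap<>();
        for (String line : confText.split("\\R")) {
            String trimmed = line.trim();
            int sep = trimmed.indexOf(": ");
            if (trimmed.isEmpty() || trimmed.startsWith("#") || sep < 0) {
                continue;
            }
            conf.put(trimmed.substring(0, sep).trim(), trimmed.substring(sep + 2).trim());
        }
        return conf;
    }

    /** True if the ZooKeeper HA mode, quorum and storage directory are all set. */
    static boolean zookeeperHaConfigured(Map<String, String> conf) {
        return "zookeeper".equalsIgnoreCase(conf.get("high-availability"))
                && conf.containsKey("high-availability.zookeeper.quorum")
                && conf.containsKey("high-availability.storageDir");
    }

    public static void main(String[] args) {
        // Sample configuration; host names and path are placeholders.
        String sample =
                "# ZooKeeper high availability\n"
                + "high-availability: zookeeper\n"
                + "high-availability.zookeeper.quorum: zk1:2181,zk2:2181,zk3:2181\n"
                + "high-availability.storageDir: hdfs:///flink/ha/\n";
        System.out.println(zookeeperHaConfigured(parse(sample)));

        // A configuration without any HA settings, as in the failing environment.
        System.out.println(zookeeperHaConfigured(parse("jobmanager.rpc.port: 6123\n")));
    }
}
```

The parser only handles flat key: value lines, which is enough for a quick check but is not a general YAML parser.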

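Copyright notice: This is an original article by yfqfy, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.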