TVM Series - Finally Got Auto-TVM Deployed on a Phone

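Deploying Auto-TVM on a phone was a painful story: I hit quite a few pitfalls, tried several approaches, and needed nearly three weeks to get it working. I had assumed that once the host and the phone were connected over RPC, success was in sight. I celebrated too early. Here are the pitfalls, one by one.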

Pitfall 1: the RPC test script android_rpc_test.py fails

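I started the RPC service on the host, bound to port 9191, then registered the host's IP and port in the RPC app on the phone, with the key set to android. The phone and the host connected successfully.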

Then I ran the official android_rpc_test.py, and it kept failing with an error.

    # Establish remote connection with target hardware
    tracker = rpc.connect_tracker(tracker_host, tracker_port)
    remote = tracker.request(key, priority=0,
                             session_timeout=10, max_retry=2)

The problem was in the code above: the error message said that the socket connection between the host and the phone had failed.

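My first suspicion was that, although the host and the phone could connect over RPC, the ports bound by the sockets (5001 on the phone, 9191 on the host) might not be reachable. A check with telnet showed that the ports were reachable in both directions. I then added logging to the TVM and RPC source code and found that the sockets could also send and receive data.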
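The telnet check can also be scripted. Below is a minimal sketch, assuming Python 3 on the host. port_is_open is a hypothetical helper, not part of TVM; it only tests whether a TCP connection can be opened, which is the same thing telnet reports.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection performs the same TCP handshake telnet would.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

For example, port_is_open(host_ip, 9191) run from a machine on the same network checks the host side, and port_is_open(phone_ip, 5001) checks the phone side, where host_ip and phone_ip are placeholders for the real addresses. As this pitfall shows, an open port only proves TCP reachability; it says nothing about whether the RPC app managed to load its native libraries.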

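The last resort: I connected the phone to Android Studio over a USB cable to debug the RPC app's source, ran android_rpc_test.py, and saw the error in the Android Studio log: libtvm4j.so and libtvm_runtime.so could not be found. That was what had been breaking the RPC connection all along. The code that loads these two libraries is in tvm/jvm/core/src/main/java/org/apache/tvm/Base.java in the RPC source: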

static {
    boolean loadNativeRuntimeLib = true;
    try {
      try {
        tryLoadLibraryOS("tvm4j");
      } catch (UnsatisfiedLinkError e) {
        System.err.println("[WARN] TVM native library not found in path. ");
        NativeLibraryLoader.loadLibrary("tvm4j");
      }
    } catch (Throwable e) {
      System.err.println("[WARN] Couldn't find native library tvm4j.");
      e.printStackTrace();
      System.err.println("Try to load tvm4j (runtime packed version) ...");
      try {
        System.loadLibrary("tvm4j_runtime_packed");
        // if tvm runtime is packed in libtvm4j, we do not need to dlopen libtvm_runtime.so.
        loadNativeRuntimeLib = false;
      } catch (UnsatisfiedLinkError errFull) {
        System.err.println("[ERROR] Couldn't find native library tvm4j_runtime_packed.");
        throw new RuntimeException(errFull);
      }
    }

    System.err.println("libtvm4j loads successfully.");

    if (loadNativeRuntimeLib) {
      String tvmLibFilename = System.getProperty("libtvm.so.path");
      if (tvmLibFilename == null || !new File(tvmLibFilename).isFile()
          || _LIB.nativeLibInit(tvmLibFilename) != 0) {
        try {
          String runtimeLibname;
          String os = System.getProperty("os.name");
          // ref: http://lopica.sourceforge.net/os.html
          if (os.startsWith("Linux")) {
            runtimeLibname = "libtvm_runtime.so";
          } else if (os.startsWith("Mac")) {
            runtimeLibname = "libtvm_runtime.dylib";
          } else {
            // TODO(yizhi) support windows later
            throw new UnsatisfiedLinkError(os + " not supported currently");
          }
          NativeLibraryLoader.extractResourceFileToTempDir(runtimeLibname, new Action() {
            @Override public void invoke(File target) {
              System.err.println("Loading tvm runtime from " + target.getPath());
              checkCall(_LIB.nativeLibInit(target.getPath()));
            }
          });
        } catch (IOException e) {
          throw new RuntimeException(e);
        }
      }
    } else {
      _LIB.nativeLibInit(null);
    }
}

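As the code shows, it first tries to load libtvm4j.so locally; if that fails, it looks for libtvm4j_runtime_packed.so. This library is built by running the build.sh script in the jni directory of the RPC app project, which packs libtvm4j.so and libtvm_runtime.so into a single library file. If libtvm4j_runtime_packed.so loads successfully, libtvm_runtime.so does not need to be loaded separately.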

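Running build.sh in the jni directory of the RPC app project builds libtvm4j_runtime_packed.so for different platforms, such as arm64-v8a and x86_64; the target platforms can be changed in Application.mk in the same directory.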

With the library built, one last step remains: add the library-loading code in app/src/main/java/org/apache/tvm/tvmrpc/RPCProcessor.java:

public RPCProcessor(Activity activity) {
    super();
    rpc_activity = activity;
    System.loadLibrary("c++_shared");
    System.loadLibrary("tvm4j_runtime_packed");
//    System.loadLibrary("tvm4j");
//    System.loadLibrary("tvm_runtime");
  }
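Since the root cause here was a missing native library, it can help to confirm that the built APK actually ships the libraries before installing it. The sketch below relies only on the fact that an APK is a zip archive with native libraries under lib/<abi>/. missing_native_libs is a hypothetical helper, and the APK path you pass is whatever your build produces.

```python
import zipfile


def missing_native_libs(apk_path, abi, required):
    """Return the names in `required` that are absent from lib/<abi>/ in the APK."""
    with zipfile.ZipFile(apk_path) as apk:
        present = set(apk.namelist())
    # Native libraries are stored as lib/<abi>/<name> inside the archive.
    return [name for name in required if f"lib/{abi}/{name}" not in present]
```

For the setup described here, the call would look like missing_native_libs(apk_path, "arm64-v8a", ["libc++_shared.so", "libtvm4j_runtime_packed.so"]); an empty list means both libraries loaded by RPCProcessor are packaged for that ABI.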

With that, android_rpc_test.py ran successfully and returned the correct results. A problem that had blocked me for two weeks was finally solved!

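Also, deploying to the phone requires exporting the model as a library for the ARM architecture, which needs the path to the NDK compiler. You can set it through the TVM_NDK_CC environment variable; I simply hard-coded the path in tvm/python/tvm/contrib/ndk.py: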

    # if "TVM_NDK_CC" not in os.environ:
    #     raise RuntimeError("Require environment variable TVM_NDK_CC"
    #                        " to be the NDK standalone compiler")
    # compiler = os.environ["TVM_NDK_CC"]
    compiler = '/data_1/Projects/android/android-ndk/android-ndk-r21/opt2/android-toolchain-arm64/bin/aarch64-linux-android-clang'
    cmd = [compiler]
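Hard-coding the path works, but it modifies the TVM source tree. A gentler variant keeps TVM_NDK_CC as the primary source and only falls back to a fixed path when the variable is unset. This is a sketch of that idea, not what ndk.py does; resolve_ndk_compiler and its parameters are hypothetical.

```python
import os


def resolve_ndk_compiler(env=None, fallback=None):
    """Pick the NDK compiler: TVM_NDK_CC first, then an optional fallback path."""
    env = os.environ if env is None else env
    compiler = env.get("TVM_NDK_CC") or fallback
    if not compiler:
        # Same condition the original ndk.py check guards against.
        raise RuntimeError("Set TVM_NDK_CC to the NDK standalone compiler, "
                           "or pass a fallback path")
    return compiler
```

An even simpler option that leaves ndk.py untouched is to set os.environ["TVM_NDK_CC"] at the top of the deployment script, before the library is exported.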

Pitfall 2: the Auto-TVM test script deploy_model_on_android.py fails

With the RPC connection and data transfer problems solved, I was a little excited that android_rpc_test.py ran. Then I tried the official deploy_model_on_android.py, and it failed again!!!

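The error log on the host did not explain much; all I could tell was that the failure was in load_module over RPC. So I turned to the RPC app's runtime logs in Android Studio.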

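The first log shows that the exported resnet18.so had been uploaded to the phone successfully. The second shows: Binary was created using GraphRuntimeFactory but a loader of that name is not registered. In other words, GraphRuntimeFactory, the loader needed to parse resnet18.so, is not registered in tvm_runtime. The error is raised in tvm/src/runtime/library_module.cc in the TVM source: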

for (uint64_t i = 0; i < size; ++i) {
    std::string tkey;
    CHECK(stream->Read(&tkey));
    // Currently, _lib is for DSOModule, but we
    // don't have loadbinary function for it currently
    VLOG(true) << " ProcessModuleBlob tkey: " << tkey << "\n";
    if (tkey == "_lib") {
      auto dso_module = Module(make_object<LibraryModuleNode>(lib));
      modules.emplace_back(dso_module);
    } else if (tkey == "_import_tree") {
      CHECK(stream->Read(&import_tree_row_ptr));
      CHECK(stream->Read(&import_tree_child_indices));
    } else {
      std::string loadkey = "runtime.module.loadbinary_";
      std::string fkey = loadkey + tkey;
      // std::string fkey = "runtime.module.loadbinary_GraphRuntimeFactory";
      VLOG(true) << " ProcessModuleBlob fkey: " << fkey << "\n";
      const PackedFunc* f = Registry::Get(fkey);
      if (f == nullptr) {
        std::string loaders = "";
        for (auto name : Registry::ListNames()) {
          if (name.rfind(loadkey, 0) == 0) {
            if (loaders.size() > 0) {
              loaders += ", ";
            }
            loaders += name.substr(loadkey.size());
          }
        }
        VLOG(true) << " ProcessModuleBlob loaders: " << loaders << "\n";
        CHECK(f != nullptr)
            << "Binary was created using " << tkey
            << " but a loader of that name is not registered. Available loaders are " << loaders
            << ". Perhaps you need to recompile with this runtime enabled.";
      }
      Module m = (*f)(static_cast<void*>(stream));
      modules.emplace_back(m);
    }
  }
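To make the lookup easier to follow, here is a small Python model of the key construction and loader listing in the C++ loop above. It is only an illustration: find_loader and the plain list of names stand in for TVM's Registry and are not TVM APIs, and the result is sorted for readability, which the C++ code does not do.

```python
LOAD_PREFIX = "runtime.module.loadbinary_"


def find_loader(registered_names, tkey):
    """Return (fkey, found, available_loaders) for a module type key."""
    # The loader is looked up under the prefix plus the type key of the blob.
    fkey = LOAD_PREFIX + tkey
    # Every registered name with the prefix is an available loader.
    available = sorted(name[len(LOAD_PREFIX):]
                       for name in registered_names
                       if name.startswith(LOAD_PREFIX))
    return fkey, fkey in set(registered_names), available
```

If the runtime on the phone has no function named runtime.module.loadbinary_GraphRuntimeFactory, find_loader(names, "GraphRuntimeFactory") reports the key as not found, which corresponds to the CHECK failure in the C++ code.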

The CHECK fails because const PackedFunc* f = Registry::Get(fkey) finds no function registered under the key "runtime.module.loadbinary_GraphRuntimeFactory". Yet tvm/src/runtime/graph/graph_runtime_factory.cc shows that this key is in fact registered:

TVM_REGISTER_GLOBAL("runtime.module.loadbinary_GraphRuntimeFactory")
    .set_body_typed(GraphRuntimeFactoryModuleLoadBinary);

So the loader for GraphRuntimeFactory is registered in tvm_runtime after all. Then where is the problem?!

It turned out that tvm_runtime.h in the RPC project (tvm/apps/android_rpc/app/src/main/jni/tvm_runtime.h) does not include graph_runtime_factory.cc, which is why the loader could not be found. The fix is to add one line that includes graph_runtime_factory.cc:

#include "../src/runtime/graph/graph_runtime_factory.cc"

After re-running build.sh in the jni directory, the generated libtvm4j_runtime_packed.so contains the loader, and deploy_model_on_android.py runs successfully.

This is a bug in TVM, and it took nearly a week to solve. With that, the official Auto-TVM and RPC model deployment scripts all run successfully!!

Pitfall 3: the Auto-TVM test script tune_relay_arm.py fails

When actually running Auto-TVM on the phone to tune the convolution operators of mobilenet_v2, the output values were always 0.

Debugging showed that the problem was in tvm/python/tvm/autotvm/measure/measure.py:

try:
    random_fill = remote.get_function("tvm.contrib.random.random_fill")
except AttributeError:
    raise AttributeError("Please make sure USE_RANDOM is ON in the config.cmake "
                         "on the remote devices")

remote.get_function("tvm.contrib.random.random_fill") has to fetch the tvm.contrib.random.random_fill function from tvm_runtime on the phone. If it cannot, it raises "Please make sure USE_RANDOM is ON in the config.cmake on the remote devices", meaning that tvm_runtime must be compiled with USE_RANDOM=ON. The function is registered in tvm/src/runtime/contrib/random/random.cc:

TVM_REGISTER_GLOBAL("tvm.contrib.random.random_fill").set_body([](TVMArgs args, TVMRetValue* ret) {
  RandomThreadLocalEntry* entry = RandomThreadLocalEntry::ThreadLocal();
  DLTensor* out = args[0];
  entry->random_engine.RandomFill(out);
});

Solution: add an include of tvm/src/runtime/contrib/random/random.cc to the JNI header tvm_runtime.h of the RPC Android project, then re-run build.sh to rebuild the TVM runtime library.

#include "../src/runtime/contrib/random/random.cc"

Re-running tune_relay_arm.py then produced normal output.
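Before starting a long tuning run, a quick preflight check can confirm that the runtime on the phone exposes the functions the host will ask for. The sketch below assumes, as the measure.py excerpt does, that remote.get_function raises AttributeError for a function that is not registered; missing_remote_functions is a hypothetical helper, and remote is whatever session object the tracker request returned.

```python
def missing_remote_functions(remote, names):
    """Return the names for which remote.get_function raises AttributeError."""
    missing = []
    for name in names:
        try:
            # Only the lookup matters here; the function is never called.
            remote.get_function(name)
        except AttributeError:
            missing.append(name)
    return missing
```

Calling missing_remote_functions(remote, ["tvm.contrib.random.random_fill"]) and getting an empty list back means the random_fill lookup in measure.py should succeed.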

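To sum up: when running TVM on a phone, if fetching a function fails even though that function is registered in TVM, first check whether the JNI header tvm_runtime.h of the RPC Android project includes the file that implements it. At present the header does not include all the needed .cc files, so you have to add the missing ones yourself.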
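That check can be automated with a few lines of Python. This is a minimal sketch: missing_includes is a hypothetical helper that scans the header text for quoted #include lines and reports which of the required source files are not among them. It compares paths literally, so the required entries must be written exactly as they would appear in the header.

```python
import re

# Matches lines such as: #include "../src/runtime/contrib/random/random.cc"
# Lines commented out with // do not match, since they do not start with '#'.
INCLUDE_RE = re.compile(r'^\s*#\s*include\s+"([^"]+)"', re.MULTILINE)


def missing_includes(header_text, required_sources):
    """Return entries of required_sources that the header does not #include."""
    included = set(INCLUDE_RE.findall(header_text))
    return [src for src in required_sources if src not in included]
```

For the two fixes in this post, the required list would be "../src/runtime/graph/graph_runtime_factory.cc" and "../src/runtime/contrib/random/random.cc", with header_text read from tvm/apps/android_rpc/app/src/main/jni/tvm_runtime.h.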

Next, I plan to dig into the TVM source code.