ANR 之 Java进程的Trace文件解析

和你一起终身学习,这里是程序员Android

经典好文推荐,通过阅读本文,您将收获以下知识点:

一 概述

Android 系统采用的 art 虚拟机,所有的 Java 进程都运行在 ART 之上,当应用发生 ANR,其中最终的一个环节便是向目标进程发送信号 SIGNAL_QUIT, 传统的 linux 则是终止程序并输出 core;而对于 Android 进程来说当收到 SIGQUIT 时,Java 层面的进程都是跑在虚拟机之上的,ART 虚拟机会捕获该信号,并输出相应的 traces 信息保存到目录 /data/anr/traces.txt。

当然也可以通过一条命令来获取指定进程的 traces 信息,例如输出 pid=888 的进程信息:

adb shell kill -3 888 //可指定进程pid
  • 1

执行完该命令后traces信息的结果保存到文件/data/anr/traces.txt,如下:

//[见小节2.2]
----- pid 888 at 2016-11-11 22:22:22 -----
Cmd line: system_server
ABI: arm
Build type: optimized
//[见小节3.1]
Zygote loaded classes=4113 post zygote classes=3239
//[见小节3.2]
Intern table: 57550 strong; 9315 weak
//共加载16动态库 [见小节3.3]
JNI: CheckJNI is off; globals=2418 (plus 115 weak)
Libraries: /system/lib/libandroid.so /system/lib/libandroid_servers.so /system/lib/libaudioeffect_jni.so /system/lib/libcompiler_rt.so /system/lib/libjavacrypto.so /system/lib/libjnigraphics.so /system/lib/libmedia_jni.so /system/lib/librs_jni.so /system/lib/libsechook.so /system/lib/libshell_jni.so /system/lib/libsoundpool.so /system/lib/libwebviewchromium_loader.so /system/lib/libwifi-service.so /vendor/lib/libalarmservice_jni.so /vendor/lib/liblocationservice.so libjavacore.so (16)
//已分配堆内存大小40MB,其中29M已用,总分配207772个对象 [见小节3.4]
Heap: 27% free, 29MB/40MB; 307772 objects
... //省略GC相关信息

//当前进程总99个线程[见小节3.5]
DALVIK THREADS (99):
//主线程调用栈[见小节3.6]
"main" prio=5 tid=1 Native
  | group="main" sCount=1 dsCount=0 obj=0x75bd9fb0 self=0x5573d4f770
  | sysTid=12078 nice=-2 cgrp=default sched=0/0 handle=0x7fa75fafe8
  | state=S schedstat=( 5907843636 827600677 5112 ) utm=453 stm=137 core=0 HZ=100
  | stack=0x7fd64ef000-0x7fd64f1000 stackSize=8MB
  | held mutexes=
  //内核栈[见小节3.6.2]
  kernel: __switch_to+0x70/0x7c
  kernel: SyS_epoll_wait+0x2a0/0x324
  kernel: SyS_epoll_pwait+0xa4/0x120
  kernel: cpu_switch_to+0x48/0x4c
  native: #00 pc 0000000000069be4 /system/lib64/libc.so (__epoll_pwait+8)
  native: #01 pc 000000000001cca4 /system/lib64/libc.so (epoll_pwait+32)
  native: #02 pc 000000000001ad74 /system/lib64/libutils.so (_ZN7android6Looper9pollInnerEi+144)
  native: #03 pc 000000000001b154 /system/lib64/libutils.so (_ZN7android6Looper8pollOnceEiPiS1_PPv+80)
  native: #04 pc 00000000000d4bc0 /system/lib64/libandroid_runtime.so (_ZN7android18NativeMessageQueue8pollOnceEP7_JNIEnvP8_jobjecti+48)
  native: #05 pc 000000000000082c /data/dalvik-cache/arm64/system@framework@boot.oat (Java_android_os_MessageQueue_nativePollOnce__JI+144)
  at android.os.MessageQueue.nativePollOnce(Native method)
  at android.os.MessageQueue.next(MessageQueue.java:323)
  at android.os.Looper.loop(Looper.java:135)
  at com.android.server.SystemServer.run(SystemServer.java:290)
  at com.android.server.SystemServer.main(SystemServer.java:175)
  at java.lang.reflect.Method.invoke!(Native method)
  at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:738)
  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:628)
  
"Binder_1" prio=5 tid=8 Native
  | group="main" sCount=1 dsCount=0 obj=0x12c610a0 self=0x5573e5c750
  | sysTid=12092 nice=0 cgrp=default sched=0/0 handle=0x7fa2743450
  | state=S schedstat=( 796240075 863170759 3586 ) utm=50 stm=29 core=1 HZ=100
  | stack=0x7fa2647000-0x7fa2649000 stackSize=1013KB
  | held mutexes=
  kernel: __switch_to+0x70/0x7c
  kernel: binder_thread_read+0xd78/0xeb0
  kernel: binder_ioctl_write_read+0x178/0x24c
  kernel: binder_ioctl+0x2b0/0x5e0
  kernel: do_vfs_ioctl+0x4a4/0x578
  kernel: SyS_ioctl+0x5c/0x88
  kernel: cpu_switch_to+0x48/0x4c
  native: #00 pc 0000000000069cd0 /system/lib64/libc.so (__ioctl+4)
  native: #01 pc 0000000000073cf4 /system/lib64/libc.so (ioctl+100)
  native: #02 pc 000000000002d6e8 /system/lib64/libbinder.so (_ZN7android14IPCThreadState14talkWithDriverEb+164)
  native: #03 pc 000000000002df3c /system/lib64/libbinder.so (_ZN7android14IPCThreadState20getAndExecuteCommandEv+24)
  native: #04 pc 000000000002e114 /system/lib64/libbinder.so (_ZN7android14IPCThreadState14joinThreadPoolEb+124)
  native: #05 pc 0000000000036c38 /system/lib64/libbinder.so (???)
  native: #06 pc 000000000001579c /system/lib64/libutils.so (_ZN7android6Thread11_threadLoopEPv+208)
  native: #07 pc 0000000000090598 /system/lib64/libandroid_runtime.so (_ZN7android14AndroidRuntime15javaThreadShellEPv+96)
  native: #08 pc 0000000000014fec /system/lib64/libutils.so (???)
  native: #09 pc 0000000000067754 /system/lib64/libc.so (_ZL15__pthread_startPv+52)
  native: #10 pc 000000000001c644 /system/lib64/libc.so (__start_thread+16)
  (no managed stack frames)
... //此处省略剩余的N个线程.
  • 1

  • 2

  • 3

  • 4

  • 5

  • 6

  • 7

  • 8

  • 9

  • 10

  • 11

  • 12

  • 13

  • 14

  • 15

  • 16

  • 17

  • 18

  • 19

  • 20

  • 21

  • 22

  • 23

  • 24

  • 25

  • 26

  • 27

  • 28

  • 29

  • 30

  • 31

  • 32

  • 33

  • 34

  • 35

  • 36

  • 37

  • 38

  • 39

  • 40

  • 41

  • 42

  • 43

  • 44

  • 45

  • 46

  • 47

  • 48

  • 49

  • 50

  • 51

  • 52

  • 53

  • 54

  • 55

  • 56

  • 57

  • 58

  • 59

  • 60

  • 61

  • 62

  • 63

  • 64

  • 65

  • 66

  • 67

  • 68

  • 69

  • 70

  • 71

接下来从虚拟机角度说说目标进程收到该信号的处理过程,每一行关键信息都说明其所对应的输出方法。

1.1 trace参数解读

"Binder_1" prio=5 tid=8 Native
  | group="main" sCount=1 dsCount=0 obj=0x12c610a0 self=0x5573e5c750
  | sysTid=12092 nice=0 cgrp=default sched=0/0 handle=0x7fa2743450
  | state=S schedstat=( 796240075 863170759 3586 ) utm=50 stm=29 core=1 HZ=100
  | stack=0x7fa2647000-0x7fa2649000 stackSize=1013KB
  | held mutexes=
  • 1

  • 2

  • 3

  • 4

  • 5

  • 6

说明:

第0行:
线程名: Binder_1(如有daemon则代表守护线程)
prio: 线程优先级
tid: 线程内部id
线程状态: NATIVE
第1行:
group: 线程所属的线程组
sCount: 线程挂起次数
dsCount: 用于调试的线程挂起次数
obj: 当前线程关联的java线程对象
self: 当前线程地址
第2行:
sysTid:线程真正意义上的tid
nice: 调度有优先级
cgrp: 进程所属的进程调度组
sched: 调度策略
handle: 函数处理地址
第3行:
state: 线程状态
schedstat: CPU调度时间统计, 见proc/[pid]/task/[tid]/schedstat
utm/stm: 用户态/内核态的CPU时间(单位是jiffies), 见proc/[pid]/task/[tid]/stat
core: 该线程的最后运行所在核
HZ: 时钟频率
第4行:
stack:线程栈的地址区间
stackSize:栈的大小
第5行:
mutex: 所持有mutex类型,有独占锁exclusive和共享锁shared两类
schedstat含义说明:

nice值越小则优先级越高。此处nice=-2, 可见优先级还是比较高的;

schedstat括号中的3个数字依次是Running、Runable、Switch,紧接着的是utm和stm

Running时间:CPU运行的时间,单位ns
Runable时间:RQ队列的等待时间,单位ns
Switch次数:CPU调度切换次数
utm: 该线程在用户态所执行的时间,单位是jiffies,jiffies定义为sysconf(_SC_CLK_TCK),默认等于10ms
stm: 该线程在内核态所执行的时间,单位是jiffies,默认等于10ms
可见,该线程Running=186667489018ns,也约等于186667ms。在CPU运行时间包括用户态(utm)和内核态(stm)。utm + stm = (12112 + 6554) ×10 ms = 186666ms。

结论:utm + stm = schedstat第一个参数值。

1.2 涉及代码

/art/runtime/
    - signal_catcher.cc
    - runtime.cc
    - intern_table.cc
    - thread_list.cc
    - java_vm_ext.cc
    - class_linker.cc
    - gc/heap.cc
  • 1

  • 2

  • 3

  • 4

  • 5

  • 6

  • 7

  • 8

二 ART信号捕获

2.1 SignalCatcher

SignalCatcher.cc

void* SignalCatcher::Run(void* arg) {
  SignalCatcher* signal_catcher = reinterpret_cast<SignalCatcher*>(arg);
  Runtime* runtime = Runtime::Current();
  
  Thread* self = Thread::Current();
  //当前进程状态处于非Runnable是
  DCHECK_NE(self->GetState(), kRunnable);
  {
    MutexLock mu(self, signal_catcher->lock_);
    signal_catcher->thread_ = self;
    signal_catcher->cond_.Broadcast(self);
  }

  //设置需要handle的信号
  SignalSet signals;
  signals.Add(SIGQUIT); //信号3
  signals.Add(SIGUSR1); //信号10

  while (true) {
    int signal_number = signal_catcher->WaitForSignal(self, signals);
    if (signal_catcher->ShouldHalt()) {
      runtime->DetachCurrentThread();
      return nullptr;
    }

    switch (signal_number) {
    case SIGQUIT:
      //收到信号3 【见小节2.2】
      signal_catcher->HandleSigQuit();
      break;
    case SIGUSR1:
      signal_catcher->HandleSigUsr1();
      break;
    default:
      LOG(ERROR) << "Unexpected signal %d" << signal_number;
      break;
    }
  }
}
  • 1

  • 2

  • 3

  • 4

  • 5

  • 6

  • 7

  • 8

  • 9

  • 10

  • 11

  • 12

  • 13

  • 14

  • 15

  • 16

  • 17

  • 18

  • 19

  • 20

  • 21

  • 22

  • 23

  • 24

  • 25

  • 26

  • 27

  • 28

  • 29

  • 30

  • 31

  • 32

  • 33

  • 34

  • 35

  • 36

  • 37

  • 38

  • 39

2.2 SignalCatcher::HandleSigQuit

signal_catcher.cc

void SignalCatcher::HandleSigQuit() {
  Runtime* runtime = Runtime::Current();
  std::ostringstream os;
  os << "\n" << "----- pid " << getpid() << " at " << GetIsoDate() << " -----\n";

  DumpCmdLine(os);

  std::string fingerprint = runtime->GetFingerprint();
  os << "Build fingerprint: '" << (fingerprint.empty() ? "unknown" : fingerprint) << "'\n";
  os << "ABI: '" << GetInstructionSetString(runtime->GetInstructionSet()) << "'\n";
  os << "Build type: " << (kIsDebugBuild ? "debug" : "optimized") << "\n";
  // [见小节2.3]
  runtime->DumpForSigQuit(os);

  os << "----- end " << getpid() << " -----\n";
  // [见小节3.7]
  Output(os.str());
}
  • 1

  • 2

  • 3

  • 4

  • 5

  • 6

  • 7

  • 8

  • 9

  • 10

  • 11

  • 12

  • 13

  • 14

  • 15

  • 16

  • 17

  • 18

2.3 Runtime::DumpForSigQuit

runtime.cc

void Runtime::DumpForSigQuit(std::ostream& os) {
  GetClassLinker()->DumpForSigQuit(os); //[见小节3.1]
  GetInternTable()->DumpForSigQuit(os); //[见小节3.2]
  GetJavaVM()->DumpForSigQuit(os); //[见小节3.3]
  GetHeap()->DumpForSigQuit(os); //[见小节3.4]
  TrackedAllocators::Dump(os);
  os << "\n";

  thread_list_->DumpForSigQuit(os); //[见小节3.5]
  BaseMutex::DumpAll(os);
}

原文链接:https://blog.csdn.net/liuwg1226/article/details/113901721

友情推荐:

Android 开发干货集锦

至此,本篇已结束。转载网络的文章,小编觉得很优秀,欢迎点击阅读原文,支持原创作者,如有侵权,恳请联系小编删除,欢迎您的建议与指正。同时期待您的关注,感谢您的阅读,谢谢!

点个在看,方便您使用时快速查找