mysql 查看等待事件_PostgreSQL等待事件-锁等待分析

阅读使人充实,讨论使人敏捷,写作使人精确。bef1b79b89cc31d5d7e50128b5a4210e.pngDB印象推荐搜索PostgreSQLOracleMySQL分布式>>>锁等待处理概述锁等待是我们比较常见的问题,无论是Oracle、MySQL还是PostgreSQL,都设计了大量的锁来保证并发操作的数据一致性,同时,当锁等待产生时会以队列方式进行,先到先得,后者依序排队等候。在高并发场景中,锁等待往往在某些凑巧的情况下能发挥非常巨大的杀伤力,严重的锁等待队列,简单的行锁等待就可以将系统的CPU打垮。

例如一个表的操作非常频繁,如果刚好前面有一个长事务没有完成,然后又刚好有事务需要获得一个排他锁,那么接下来的频繁DML请求都会被堵塞,如果队列中再来一个DDL,那么后果就更加严重。对于PG来说,解决锁等待通常有以下几种方式:1.设置lock_timeout、deadlock_timeout让等待自动超时,以便进程及时从等待队列中退出。

2.杀掉锁资源的阻塞源头,即持有锁的源头进程holder(或者叫root blocker):select pg_terminate_backend(holder_pid);

注意这里不是杀掉waiter,尤其是在高并发场景,waiter是杀不完的,比如Oracle中杀掉enq: TX - row lock contention的等待会话并非治本办法。

3.最后,就是整改优化业务逻辑了。

锁问题的处理其实并不复杂,关键是如何找到持有锁的源头,下面使用Tbase(基于pgxc)分析演示。

>>>锁等待场景模拟分析

首先模拟一个最简单的行锁等待现场:

在pgxc:cn001节点开启会话session A

显式事务不提交testdb=# begin;

BEGIN

testdb=# update aken set name='aa' where name='aa' and id =1;

UPDATE 1

testdb=#

在pgxc:cn001节点session B

在session 1未提交的情况下更新同一行,事务挂起:postgres=# \c testdb;

You are now connected to database "testdb" as user "dbmgr".

testdb=# update aken set name='aa' where name='aa' and id =1;

pgxc:cn002节点session C

在session 1、session2未提交的情况下更新同一行,事务挂起:postgres=# \c testdb

You are now connected to database "testdb" as user "dbmgr".

testdb=# update aken set name='aa' where name='aa' and id =1;

接着观察DN主节点的日志:

可以在csv日志中直接搜索关键字“waiting”来查看相关信息:2020-10-29 17:51:51.312 CST,"dbmgr","testdb",12240,coord(0,0),"192.168.155.177:38707",5f9a90b3.2fd0,coord(0,0),1,"UPDATE waiting",

2020-10-29 17:51:47 CST,60/5771127,1743392585,LOG,00000,"process 12240 still waiting for ShareLock on transaction 1743392584 after 1000.042 ms",

"Process holding the lock: 9724. Wait queue: 12240.",,,,"while updating tuple (0,11) in relation ""aken""",

"UPDATE aken SET name = 'aa'::text WHERE ((name = 'aa'::text) AND (id = 1))",,,"pgxc:cn001"

2020-10-29 17:55:47.579 CST,"dbmgr","testdb",2406,coord(0,0),"192.168.136.55:42768",5f9a9192.966,coord(0,0),1,"UPDATE waiting",

2020-10-29 17:55:30 CST,100/6995248,1743392586,LOG,00000,"process 2406 still waiting for ExclusiveLock on tuple (0,11) of relation 19054 of database 18466 after 1000.044 ms",

"Process holding the lock: 12240. Wait queue: 2406.",,,,,"UPDATE aken SET name = 'aa'::text WHERE ((name = 'aa'::text) AND (id = 1))",,,"pgxc:cn002"

上面的日志可以看出:

1.由pgxc:cn001发起的事务transaction-1743392585的进程12240在请求ShareLock on transaction 1743392584时被进程9724的事务1743392584阻塞;

2.由pgxc:cn002发起的事务transaction-1743392586的进程2406在请求ExclusiveLock on tuple (0,11) of relation 19054 of database 18466时被进程12240阻塞;

3.整个等待链为:进程9724(root lock holder)--阻塞-->进程12240(ShareLock waiter)--阻塞-->进程2406(ExclusiveLock waiter)。

除了查看日志信息,我们也可以在DN主节点进入对应的database通过相关的监控视图查看lockinfo、wait_event信息。

1.查看lock信息:

可以看到各种mode的锁的进程持有、等待请况。postgres=# \c testdb;

You are now connected to database "testdb" as user "dbmgr".

testdb=# select pid,CASE granted WHEN 'f' THEN 'waiter' ELSE 'holder' END as pid_role,mode,pg_blocking_pids(pid) bloker_pid,relation::regclass,virtualxid,virtualtransaction,transactionid from pg_locks

testdb-# where pid<>pg_backend_pid() order by pid,pid_role;

pid | pid_role | mode | bloker_pid | relation | virtualxid | virtualtransaction | transactionid

-------+----------+------------------+------------+----------+-------------+--------------------+---------------

2406 | holder | RowExclusiveLock | {12240} | aken | | 100/6995248 |

2406 | holder | ExclusiveLock | {12240} | | 100/6995248 | 100/6995248 |

2406 | holder | ExclusiveLock | {12240} | | | 100/6995248 | 1743392586

2406 | waiter | ExclusiveLock | {12240} | aken | | 100/6995248 |

9724 | holder | RowExclusiveLock | {} | aken | | 57/6548285 |

9724 | holder | ExclusiveLock | {} | | 57/6548285 | 57/6548285 |

9724 | holder | ExclusiveLock | {} | | | 57/6548285 | 1743392584

12240 | holder | ExclusiveLock | {9724} | | | 60/5771127 | 1743392585

12240 | holder | ExclusiveLock | {9724} | | 60/5771127 | 60/5771127 |

12240 | holder | RowExclusiveLock | {9724} | aken | | 60/5771127 |

12240 | holder | ExclusiveLock | {9724} | aken | | 60/5771127 |

12240 | waiter | ShareLock | {9724} | | | 60/5771127 | 1743392584

(12 rows)

testdb=#

2.查看waitevent信息:testdb=# select pid waiter_pid,pg_blocking_pids(pid) holder_pid,backend_type,state,wait_event_type,wait_event,query from pg_stat_activity where wait_event_type = 'Lock' and pid!=pg_backend_pid();

waiter_pid | holder_pid | backend_type | state | wait_event_type | wait_event | query

------------+------------+----------------+--------+-----------------+---------------+----------------------------------------------------------------------------

12240 | {9724} | client backend | active | Lock | transactionid | UPDATE aken SET name = 'aa'::text WHERE ((name = 'aa'::text) AND (id = 1))

2406 | {12240} | client backend | active | Lock | tuple | UPDATE aken SET name = 'aa'::text WHERE ((name = 'aa'::text) AND (id = 1))

(2 rows)

testdb=#

testdb=# select pid waiter_pid,pg_blocking_pids(pid) holder_pid,backend_type,state,wait_event_type,wait_event,query from pg_stat_activity where pid in (2406,9724,12240);

waiter_pid | holder_pid | backend_type | state | wait_event_type | wait_event | query

------------+------------+----------------+---------------------+-----------------+---------------+----------------------------------------------------------------------------

9724 | {} | client backend | idle in transaction | Client | ClientRead | UPDATE aken SET name = 'aa'::text WHERE ((name = 'aa'::text) AND (id = 1))

12240 | {9724} | client backend | active | Lock | transactionid | UPDATE aken SET name = 'aa'::text WHERE ((name = 'aa'::text) AND (id = 1))

2406 | {12240} | client backend | active | Lock | tuple | UPDATE aken SET name = 'aa'::text WHERE ((name = 'aa'::text) AND (id = 1))

(3 rows)

testdb=#

这里涉及到两大类等待事件:

1.两个waiter进程的wait_event_type=lock,state=active

说明处于等待链lockchain中,持续等待holder持有的锁资源释放。高并发场景、业务频繁重试类场景中会出现大量的处于active的lock等待进程。

2.持有者holder的wait_event_type=Client,state=idle in transaction

说明该holder执行完事务后处于空闲状态,正等待客户端发送新的请求,常见于业务框架代码忘记提交的场景。

除了上面的lock和client类型的waitevent type,到目前PostgreSQL共9种(截至pg-13)。因为我这里使用的tbase内核为pg10版本(共10种):

src\include\pgstat.h/* ----------

* Wait Classes

* ----------

*/

#define PG_WAIT_LWLOCK 0x01000000U

#define PG_WAIT_LOCK          0x03000000U <<

#define PG_WAIT_BUFFER_PIN    0x04000000U

#define PG_WAIT_ACTIVITY 0x05000000U

#define PG_WAIT_CLIENT 0x06000000U <<

#define PG_WAIT_EXTENSION     0x07000000U

#define PG_WAIT_IPC 0x08000000U

#define PG_WAIT_TIMEOUT       0x09000000U

#define PG_WAIT_IO 0x0A000000U

然后每个大类再分多个小类,下面是lock类型细分的11个小类:const char *const LockTagTypeNames[] = {

"relation",

"extend",

"page",

"tuple",

"transactionid",

"virtualxid",

"speculative token",

"object",

"userlock",

"advisory"

};

如果无法从pg_lock查看到相关的锁信息,我们还可以从进程的stack得到锁等待的锁是哪个:

在DN主节点查看进程信息# waiter-12240:

[aken@dbhost]$ ps -ef | grep 12240 | grep -v grep

tbase 12240 10172 0 17:51 ? 00:00:00 postgres: dbmgr testdb 192.168.155.177(38707) UPDATE waiting

# waiter-2406:

[aken@dbhost]$ ps -ef | grep 2406 | grep -v grep

tbase 2406 10172 0 17:55 ? 00:00:00 postgres: dbmgr testdb 192.168.136.55(42768) UPDATE waiting

[aken@dbhost]$

# holder-9724:

[aken@dbhost]$ ps -ef | grep 9724 | grep -v grep

tbase 9724 10172 0 17:51 ? 00:00:00 postgres: dbmgr testdb 192.168.155.177(38388) idle in transaction

接着打印stack信息, the transactionid waiter info:[aken@dbhost]$ pstack 12240

#0 0x00007f53ea7c5eb3 in __epoll_wait_nocancel () from lib64/libc.so.6

#1 0x0000000000973cfd in WaitEventSetWaitBlock (set=0x229f9e0, cur_timeout=-1, occurred_events=0x7ffffefd62a0, nevents=1) at latch.c:1045

#2 0x0000000000973bd8 in WaitEventSetWait (set=0x229f9e0, timeout=-1, occurred_events=0x7ffffefd62a0, nevents=1, wait_event_info=50331652) at latch.c:997

#3 0x00000000009734e8 in WaitLatchOrSocket (latch=0x7f53678a4a24, wakeEvents=1, sock=-1, timeout=-1, wait_event_info=50331652) at latch.c:385

#4 0x00000000009733e9 in WaitLatch (latch=0x7f53678a4a24, wakeEvents=1, timeout=0, wait_event_info=50331652) at latch.c:339

#5 0x000000000098a262 in ProcSleep (locallock=0x1fe68b8, lockMethodTable=0xd2d200 ) at proc.c:1290

#6 0x00000000009844fd in WaitOnLock (locallock=0x1fe68b8, owner=0x2130648) at lock.c:1826

#7 0x000000000098334d in LockAcquireExtendedXC (locktag=0x7ffffefd67b0, lockmode=5, sessionLock=0 '\000', dontWait=0 '\000', reportMemoryError=1 '\001', only_increment=0 '\000') at lock.c:1047

#8 0x0000000000982a3f in LockAcquireExtended (locktag=0x7ffffefd67b0, lockmode=5, sessionLock=0 '\000', dontWait=0 '\000', reportMemoryError=1 '\001') at lock.c:734

#9 0x00000000009829b8 in LockAcquire (locktag=0x7ffffefd67b0, lockmode=5, sessionLock=0 '\000', dontWait=0 '\000') at lock.c:693

#10 0x00000000009819bf in XactLockTableWait (xid=1743392584, rel=0x7f4f0b611b98, ctid=0x7ffffefd6844, oper=XLTW_Update) at lmgr.c:602

#11 0x00000000004e5ccb in heap_update (relation=0x7f4f0b611b98, otid=0x7ffffefd6ac0, newtup=0x229e458, cid=0, crosscheck=0x0, wait=1 '\001', hufd=0x7ffffefd69d0, lockmode=0x7ffffefd69cc) at heapam.c:4377

#12 0x0000000000750158 in ExecUpdate (mtstate=0x229cb98, tupleid=0x7ffffefd6ac0, oldtuple=0x0, slot=0x229f500, planSlot=0x229ce28, epqstate=0x229cc60, estate=0x229c848, canSetTag=1 '\001') at nodeModifyTable.c:1358

#13 0x0000000000751d0e in ExecModifyTable (pstate=0x229cb98) at nodeModifyTable.c:2338

#14 0x00000000007257ae in ExecProcNodeFirst (node=0x229cb98) at execProcnode.c:529

#15 0x000000000071cc10 in ExecProcNode (node=0x229cb98) at ../../../src/include/executor/executor.h:273

#16 0x000000000071f4b9 in ExecutePlan (estate=0x229c848, planstate=0x229cb98, use_parallel_mode=0 '\000', operation=CMD_UPDATE, sendTuples=0 '\000', numberTuples=0, direction=ForwardScanDirection, dest=0x2298880, execute_once=1 '\001') at execMain.c:1955

#17 0x000000000071d249 in standard_ExecutorRun (queryDesc=0x1fde558, direction=ForwardScanDirection, count=0, execute_once=1 '\001') at execMain.c:465

#18 0x00007f53e2b736e1 in pgss_ExecutorRun (queryDesc=0x1fde558, direction=ForwardScanDirection, count=0, execute_once=1 '\001') at pg_stat_statements.c:893

#19 0x00007f53e1b27a5c in pgsl_ExecutorRun (queryDesc=0x1fde558, direction=ForwardScanDirection, count=0, execute_once=1 '\001') at pg_stat_log.c:494

#20 0x000000000071d0c5 in ExecutorRun (queryDesc=0x1fde558, direction=ForwardScanDirection, count=0, execute_once=1 '\001') at execMain.c:406

#21 0x00000000009a87ce in ProcessQuery (plan=0x2299940, sourceText=0x21ab2a8 "UPDATE aken SET name = 'aa'::text WHERE ((name = 'aa'::text) AND (id = 1))", params=0x0, queryEnv=0x0, dest=0x2298880, completionTag=0x7ffffefd7250 "") at pquery.c:209

#22 0x00000000009ab157 in PortalRunMulti (portal=0x20286e8, isTopLevel=1 '\001', setHoldSnapshot=0 '\000', dest=0x2298880, altdest=0x2298880, completionTag=0x7ffffefd7250 "") at pquery.c:2048

#23 0x00000000009aa0b3 in PortalRun (portal=0x20286e8, count=9223372036854775807, isTopLevel=1 '\001', run_once=1 '\001', dest=0x2298880, altdest=0x2298880, completionTag=0x7ffffefd7250 "") at pquery.c:1320

#24 0x00000000009a16b6 in exec_simple_query (query_string=0x21ab2a8 "UPDATE aken SET name = 'aa'::text WHERE ((name = 'aa'::text) AND (id = 1))") at postgres.c:1508

#25 0x00000000009a6bc2 in PostgresMain (argc=1, argv=0x2001708, dbname=0x2001650 "testdb", username=0x2001638 "dbmgr") at postgres.c:5409

#26 0x00000000008ecc32 in BackendRun (port=0x2021060) at postmaster.c:5013

#27 0x00000000008ec3db in BackendStartup (port=0x2021060) at postmaster.c:4685

#28 0x00000000008e824a in ServerLoop () at postmaster.c:1990

#29 0x00000000008e78d1 in PostmasterMain (argc=5, argv=0x1fbc600) at postmaster.c:1598

#30 0x00000000007f360e in main (argc=5, argv=0x1fbc600) at main.c:228

主要关注以下几行:#9、#8、#7行申请锁,lockmode为5(ShareLock)#4这行打印等待事件为wait_event_info=50331652#0行进入epoll_wait观察list链表并sleep等待。#0  0x00007f53ea7c5eb3 in __epoll_wait_nocancel () from /lib64/libc.so.6

#4  0x00000000009733e9 in WaitLatch (latch=0x7f53678a4a24, wakeEvents=1, timeout=0, wait_event_info=50331652) at latch.c:339

#7 0x000000000098334d in LockAcquireExtendedXC (locktag=0x7ffffefd67b0, lockmode=5, sessionLock=0 '\000', dontWait=0 '\000', reportMemoryError=1 '\001', only_increment=0 '\000') at lock.c:1047

#8 0x0000000000982a3f in LockAcquireExtended (locktag=0x7ffffefd67b0, lockmode=5, sessionLock=0 '\000', dontWait=0 '\000', reportMemoryError=1 '\001') at lock.c:734

#9  0x00000000009829b8 in LockAcquire (locktag=0x7ffffefd67b0, lockmode=5, sessionLock=0 '\000', dontWait=0 '\000') at lock.c:693

通过wait_event_info=50331652通过以下公式可以推算到进程具体是在等待什么。

公式来源详见另一篇文章:

《PostgreSQL等待事件-源码窥探》classId=wait_event_info & 0xFF000000 --意为保留wait_event_info的的高二位。

eventId=wait_event_info & 0x0000FFFF --意为保留wait_event_info的的低四位。

具体换算方法为:

1.首先,将wait_event_info=50331652转化为无符号十六进制为0x3000004U[root@DBhost ~]# printf "Decimal十六进制: %d\nHex十六进制: %x\nOctal八进制: %o\n" 50331652 50331652 50331652

Decimal十六进制: 50331652

Hex十六进制: 3000004

Octal八进制: 300000004

[root@DBhost ~]#

2.接着求出等待事件所属大类 classid:ClassID= wait_event_info & 0xFF000000 = 0x3000004U & 0xFF000000 = 0x3000000U

运算结果0x3000000U对应等待事件类型wait class定义的PG_WAIT_CLIENT类型:#define PG_WAIT_LOCK 0x03000000U

3. 求wait_event的locktab_type,运算取值:locktag_type=wait_event_info & 0x0000FFFF = 0x06000000U & 0x0000FFFF =0x00000004U

运算值0x00000004U,对应十进制值为4

4.根据locktab_type求得wait_event=transactionwait_event=LockTagTypeNames[locktag_type]=LockTagTypeNames[4] = transactionid

继续打印第二个等待进程的stack信息,the tuple waiter info:[aken@dbhost]$ pstack 2406

#0 0x00007f53ea7c5eb3 in __epoll_wait_nocancel () from lib64/libc.so.6

#1 0x0000000000973cfd in WaitEventSetWaitBlock (set=0x229f9e0, cur_timeout=-1, occurred_events=0x7ffffefd62a0, nevents=1) at latch.c:1045

#2 0x0000000000973bd8 in WaitEventSetWait (set=0x229f9e0, timeout=-1, occurred_events=0x7ffffefd62a0, nevents=1, wait_event_info=50331651) at latch.c:997

#3 0x00000000009734e8 in WaitLatchOrSocket (latch=0x7f53678a58a4, wakeEvents=1, sock=-1, timeout=-1, wait_event_info=50331651) at latch.c:385

#4 0x00000000009733e9 in WaitLatch (latch=0x7f53678a58a4, wakeEvents=1, timeout=0, wait_event_info=50331651) at latch.c:339

#5 0x000000000098a262 in ProcSleep (locallock=0x1fe6918, lockMethodTable=0xd2d200 ) at proc.c:1290

#6 0x00000000009844fd in WaitOnLock (locallock=0x1fe6918, owner=0x2130648) at lock.c:1826

#7 0x000000000098334d in LockAcquireExtendedXC (locktag=0x7ffffefd6770, lockmode=7, sessionLock=0 '\000', dontWait=0 '\000', reportMemoryError=1 '\001', only_increment=0 '\000') at lock.c:1047

#8 0x0000000000982a3f in LockAcquireExtended (locktag=0x7ffffefd6770, lockmode=7, sessionLock=0 '\000', dontWait=0 '\000', reportMemoryError=1 '\001') at lock.c:734

#9 0x00000000009829b8 in LockAcquire (locktag=0x7ffffefd6770, lockmode=7, sessionLock=0 '\000', dontWait=0 '\000') at lock.c:693

#10 0x00000000009817a3 in LockTuple (relation=0x7f4f0b611b98, tid=0x7ffffefd6844, lockmode=7) at lmgr.c:476

#11 0x00000000004e8ae2 in heap_acquire_tuplock (relation=0x7f4f0b611b98, tid=0x7ffffefd6844, mode=LockTupleNoKeyExclusive, wait_policy=LockWaitBlock, have_tuple_lock=0x7ffffefd6833 "") at heapam.c:5856

#12 0x00000000004e5ca7 in heap_update (relation=0x7f4f0b611b98, otid=0x7ffffefd6ac0, newtup=0x229e458, cid=0, crosscheck=0x0, wait=1 '\001', hufd=0x7ffffefd69d0, lockmode=0x7ffffefd69cc) at heapam.c:4375

#13 0x0000000000750158 in ExecUpdate (mtstate=0x229cb98, tupleid=0x7ffffefd6ac0, oldtuple=0x0, slot=0x229f500, planSlot=0x229ce28, epqstate=0x229cc60, estate=0x229c848, canSetTag=1 '\001') at nodeModifyTable.c:1358

#14 0x0000000000751d0e in ExecModifyTable (pstate=0x229cb98) at nodeModifyTable.c:2338

#15 0x00000000007257ae in ExecProcNodeFirst (node=0x229cb98) at execProcnode.c:529

#16 0x000000000071cc10 in ExecProcNode (node=0x229cb98) at ../../../src/include/executor/executor.h:273

#17 0x000000000071f4b9 in ExecutePlan (estate=0x229c848, planstate=0x229cb98, use_parallel_mode=0 '\000', operation=CMD_UPDATE, sendTuples=0 '\000', numberTuples=0, direction=ForwardScanDirection, dest=0x2296e68, execute_once=1 '\001') at execMain.c:1955

#18 0x000000000071d249 in standard_ExecutorRun (queryDesc=0x1fde558, direction=ForwardScanDirection, count=0, execute_once=1 '\001') at execMain.c:465

#19 0x00007f53e2b736e1 in pgss_ExecutorRun (queryDesc=0x1fde558, direction=ForwardScanDirection, count=0, execute_once=1 '\001') at pg_stat_statements.c:893

#20 0x00007f53e1b27a5c in pgsl_ExecutorRun (queryDesc=0x1fde558, direction=ForwardScanDirection, count=0, execute_once=1 '\001') at pg_stat_log.c:494

#21 0x000000000071d0c5 in ExecutorRun (queryDesc=0x1fde558, direction=ForwardScanDirection, count=0, execute_once=1 '\001') at execMain.c:406

#22 0x00000000009a87ce in ProcessQuery (plan=0x2297f28, sourceText=0x21ab2a8 "UPDATE aken SET name = 'aa'::text WHERE ((name = 'aa'::text) AND (id = 1))", params=0x0, queryEnv=0x0, dest=0x2296e68, completionTag=0x7ffffefd7250 "") at pquery.c:209

#23 0x00000000009ab157 in PortalRunMulti (portal=0x20286e8, isTopLevel=1 '\001', setHoldSnapshot=0 '\000', dest=0x2296e68, altdest=0x2296e68, completionTag=0x7ffffefd7250 "") at pquery.c:2048

#24 0x00000000009aa0b3 in PortalRun (portal=0x20286e8, count=9223372036854775807, isTopLevel=1 '\001', run_once=1 '\001', dest=0x2296e68, altdest=0x2296e68, completionTag=0x7ffffefd7250 "") at pquery.c:1320

#25 0x00000000009a16b6 in exec_simple_query (query_string=0x21ab2a8 "UPDATE aken SET name = 'aa'::text WHERE ((name = 'aa'::text) AND (id = 1))") at postgres.c:1508

#26 0x00000000009a6bc2 in PostgresMain (argc=1, argv=0x2001708, dbname=0x2001650 "testdb", username=0x2001638 "dbmgr") at postgres.c:5409

#27 0x00000000008ecc32 in BackendRun (port=0x2021060) at postmaster.c:5013

#28 0x00000000008ec3db in BackendStartup (port=0x2021060) at postmaster.c:4685

#29 0x00000000008e824a in ServerLoop () at postmaster.c:1990

#30 0x00000000008e78d1 in PostmasterMain (argc=5, argv=0x1fbc600) at postmaster.c:1598

#31 0x00000000007f360e in main (argc=5, argv=0x1fbc600) at main.c:228

这里的wait_event_info=50331651,转化为无符号的十六进制为0x03000003U:[root@DBhost ~]# printf "Decimal十六进制: %d\nHex十六进制: %x\nOctal八进制: %o\n" 50331651 50331651 50331651

Decimal十六进制: 50331651

Hex十六进制: 3000003

Octal八进制: 300000003

[root@DBhost ~]#

具体的转化方法和上面一样:

1.求等待事件所属大类 classid的运算取值:wait_event_type = wait_event_info & 0xFF000000 = 0x03000003U & 0xFF000000 = 0x03000000U

计算结果为0x03000000U,对应PG_WAIT_LOCK类等待事件:#define PG_WAIT_LOCK 0x03000000U

2. 求wait_event的locktab_type,运算取值得到0x00000003U:locktag_type=wait_event_info & 0x0000FFFF = 0x03000003U & 0x0000FFFF = 0x00000003U

3.根据locktab_type求得wait_event=tuple

上面第2步的运算值为0x00000003U,对应十进制值为3,即LockTagTypeNames的下标:wait_event=LockTagTypeNames[locktag_type]=LockTagTypeNames[3] = tuple

最后,我们再看看root blocker的情况, 首先是进程stack info :[aken@dbhost]$ pstack 9724

#0 0x00007f53ea7c5eb3 in __epoll_wait_nocancel () from lib64/libc.so.6

#1 0x0000000000973cfd in WaitEventSetWaitBlock (set=0x1fff408, cur_timeout=-1, occurred_events=0x7ffffefd7280, nevents=1) at latch.c:1045

#2 0x0000000000973bd8 in WaitEventSetWait (set=0x1fff408, timeout=-1, occurred_events=0x7ffffefd7280, nevents=1, wait_event_info=100663296) at latch.c:997

#3 0x000000000076f77b in secure_read (port=0x2021060, ptr=0x10caa20 , len=8192) at be-secure.c:166

#4 0x000000000077c95c in pq_recvbuf () at pqcomm.c:986

#5 0x000000000077c9f6 in pq_getbyte () at pqcomm.c:1029

#6 0x00000000009a0473 in SocketBackend (inBuf=0x7ffffefd74c0) at postgres.c:517

#7 0x00000000009a0943 in ReadCommand (inBuf=0x7ffffefd74c0) at postgres.c:726

#8 0x00000000009a6a93 in PostgresMain (argc=1, argv=0x2001708, dbname=0x2001650 "testdb", username=0x2001638 "dbmgr") at postgres.c:5331

#9 0x00000000008ecc32 in BackendRun (port=0x2021060) at postmaster.c:5013

#10 0x00000000008ec3db in BackendStartup (port=0x2021060) at postmaster.c:4685

#11 0x00000000008e824a in ServerLoop () at postmaster.c:1990

#12 0x00000000008e78d1 in PostmasterMain (argc=5, argv=0x1fbc600) at postmaster.c:1598

#13 0x00000000007f360e in main (argc=5, argv=0x1fbc600) at main.c:228

[aken@dbhost]$

这里wait_event_info=100663296,十六进制 0x06000000U,取高两位得到等待事件类型为PG_WAIT_CLIENT:#define PG_WAIT_CLIENT 0x06000000U

这个和LOCK类型的等待事件已经不是一个类别,在PG_WAIT_CLIENT类型的等待事件中,又分9种小类标签:

Definition at line 898 of file pgstat.h.{

WAIT_EVENT_CLIENT_READ = PG_WAIT_CLIENT,

WAIT_EVENT_CLIENT_WRITE,

WAIT_EVENT_GSS_OPEN_SERVER,

WAIT_EVENT_LIBPQWALRECEIVER_CONNECT,

WAIT_EVENT_LIBPQWALRECEIVER_RECEIVE,

WAIT_EVENT_SSL_OPEN_SERVER,

WAIT_EVENT_WAL_RECEIVER_WAIT_START,

WAIT_EVENT_WAL_SENDER_WAIT_WAL,

WAIT_EVENT_WAL_SENDER_WRITE_DATA,

} WaitEventClient;

最后通过pgstat_get_wait_client()函数返回具体的等待事件名称:pgstat_get_wait_client(WaitEventClient w)

{

const char *event_name = "unknown wait event";

switch (w)

{

case WAIT_EVENT_CLIENT_READ:

event_name = "ClientRead"; <<<<< 进程7274的等待大类为PG_WAIT_CLIENT,对应该类别的WAIT_EVENT_CLIENT_READ类型,即

break;

case WAIT_EVENT_CLIENT_WRITE:

event_name = "ClientWrite";

break;

.....此处省略后续代码....

上面只是做了对锁等待的测试模拟,实践中生产环境等待事件类案例分析诊断可阅读如下文章:

>>>参考资料1.http://amitkapila16.blogspot.com/2016/03/troubleshooting-waits-in-postgresql.html

2.https://www.postgresql.org/docs/13/monitoring-stats.html

2fd717b7d81f81bcba5d604b68f80978.png

感谢阅读93a308f9bf7a7425c5a4239835717bb4.png


版权声明:本文为weixin_34878397原创文章,遵循CC 4.0 BY-SA版权协议,转载请附上原文出处链接和本声明。