阅读使人充实,讨论使人敏捷,写作使人精确。DB印象推荐搜索PostgreSQLOracleMySQL分布式>>>锁等待处理概述锁等待是我们比较常见的问题,无论是Oracle、MySQL还是PostgreSQL,都设计了大量的锁来保证并发操作的数据一致性,同时,当锁等待产生时会以队列方式进行,先到先得,后者依序排队等候。在高并发场景中,锁等待往往在某些凑巧的情况下能发挥非常巨大的杀伤力,严重的锁等待队列,简单的行锁等待就可以将系统的CPU打垮。
例如一个表的操作非常频繁,如果刚好前面有一个长事务没有完成,然后又刚好有事务需要获得一个排他锁,那么接下来的频繁DML请求都会被堵塞,如果队列中再来一个DDL,那么后果就更加严重。对于PG来说,解决锁等待通常有以下几种方式:1.设置lock_timeout、deadlock_timeout让等待自动超时,以便进程及时从等待队列中退出。
2.杀掉锁资源的阻塞源头,即持有锁的源头进程holder(或者叫root blocker):select pg_terminate_backend(holder_pid);
注意这里不是杀掉waiter,尤其是在高并发场景,waiter是杀不完的,比如Oracle中杀掉enq: TX - row lock contention的等待会话并非治本办法。
3.最后,就是整改优化业务逻辑了。
锁问题的处理其实并不复杂,关键是如何找到持有锁的源头,下面使用Tbase(基于pgxc)分析演示。
>>>锁等待场景模拟分析
首先模拟一个最简单的行锁等待现场:
在pgxc:cn001节点开启会话session A
显式事务不提交testdb=# begin;
BEGIN
testdb=# update aken set name='aa' where name='aa' and id =1;
UPDATE 1
testdb=#
在pgxc:cn001节点session B
在session 1未提交的情况下更新同一行,事务挂起:postgres=# \c testdb;
You are now connected to database "testdb" as user "dbmgr".
testdb=# update aken set name='aa' where name='aa' and id =1;
pgxc:cn002节点session C
在session 1、session2未提交的情况下更新同一行,事务挂起:postgres=# \c testdb
You are now connected to database "testdb" as user "dbmgr".
testdb=# update aken set name='aa' where name='aa' and id =1;
接着观察DN主节点的日志:
可以在csv日志中直接搜索关键字“waiting”来查看相关信息:2020-10-29 17:51:51.312 CST,"dbmgr","testdb",12240,coord(0,0),"192.168.155.177:38707",5f9a90b3.2fd0,coord(0,0),1,"UPDATE waiting",
2020-10-29 17:51:47 CST,60/5771127,1743392585,LOG,00000,"process 12240 still waiting for ShareLock on transaction 1743392584 after 1000.042 ms",
"Process holding the lock: 9724. Wait queue: 12240.",,,,"while updating tuple (0,11) in relation ""aken""",
"UPDATE aken SET name = 'aa'::text WHERE ((name = 'aa'::text) AND (id = 1))",,,"pgxc:cn001"
2020-10-29 17:55:47.579 CST,"dbmgr","testdb",2406,coord(0,0),"192.168.136.55:42768",5f9a9192.966,coord(0,0),1,"UPDATE waiting",
2020-10-29 17:55:30 CST,100/6995248,1743392586,LOG,00000,"process 2406 still waiting for ExclusiveLock on tuple (0,11) of relation 19054 of database 18466 after 1000.044 ms",
"Process holding the lock: 12240. Wait queue: 2406.",,,,,"UPDATE aken SET name = 'aa'::text WHERE ((name = 'aa'::text) AND (id = 1))",,,"pgxc:cn002"
上面的日志可以看出:
1.由pgxc:cn001发起的事务transaction-1743392585的进程12240在请求ShareLock on transaction 1743392584时被进程9724的事务1743392584阻塞;
2.由pgxc:cn002发起的事务transaction-1743392586的进程2406在请求ExclusiveLock on tuple (0,11) of relation 19054 of database 18466时被进程12240阻塞;
3.整个等待链为:进程9724(root lock holder)--阻塞-->进程12240(ShareLock waiter)--阻塞-->进程2406(ExclusiveLock waiter)。
除了查看日志信息,我们也可以在DN主节点进入对应的database通过相关的监控视图查看lockinfo、wait_event信息。
1.查看lock信息:
可以看到各种mode的锁的进程持有、等待请况。postgres=# \c testdb;
You are now connected to database "testdb" as user "dbmgr".
testdb=# select pid,CASE granted WHEN 'f' THEN 'waiter' ELSE 'holder' END as pid_role,mode,pg_blocking_pids(pid) bloker_pid,relation::regclass,virtualxid,virtualtransaction,transactionid from pg_locks
testdb-# where pid<>pg_backend_pid() order by pid,pid_role;
pid | pid_role | mode | bloker_pid | relation | virtualxid | virtualtransaction | transactionid
-------+----------+------------------+------------+----------+-------------+--------------------+---------------
2406 | holder | RowExclusiveLock | {12240} | aken | | 100/6995248 |
2406 | holder | ExclusiveLock | {12240} | | 100/6995248 | 100/6995248 |
2406 | holder | ExclusiveLock | {12240} | | | 100/6995248 | 1743392586
2406 | waiter | ExclusiveLock | {12240} | aken | | 100/6995248 |
9724 | holder | RowExclusiveLock | {} | aken | | 57/6548285 |
9724 | holder | ExclusiveLock | {} | | 57/6548285 | 57/6548285 |
9724 | holder | ExclusiveLock | {} | | | 57/6548285 | 1743392584
12240 | holder | ExclusiveLock | {9724} | | | 60/5771127 | 1743392585
12240 | holder | ExclusiveLock | {9724} | | 60/5771127 | 60/5771127 |
12240 | holder | RowExclusiveLock | {9724} | aken | | 60/5771127 |
12240 | holder | ExclusiveLock | {9724} | aken | | 60/5771127 |
12240 | waiter | ShareLock | {9724} | | | 60/5771127 | 1743392584
(12 rows)
testdb=#
2.查看waitevent信息:testdb=# select pid waiter_pid,pg_blocking_pids(pid) holder_pid,backend_type,state,wait_event_type,wait_event,query from pg_stat_activity where wait_event_type = 'Lock' and pid!=pg_backend_pid();
waiter_pid | holder_pid | backend_type | state | wait_event_type | wait_event | query
------------+------------+----------------+--------+-----------------+---------------+----------------------------------------------------------------------------
12240 | {9724} | client backend | active | Lock | transactionid | UPDATE aken SET name = 'aa'::text WHERE ((name = 'aa'::text) AND (id = 1))
2406 | {12240} | client backend | active | Lock | tuple | UPDATE aken SET name = 'aa'::text WHERE ((name = 'aa'::text) AND (id = 1))
(2 rows)
testdb=#
testdb=# select pid waiter_pid,pg_blocking_pids(pid) holder_pid,backend_type,state,wait_event_type,wait_event,query from pg_stat_activity where pid in (2406,9724,12240);
waiter_pid | holder_pid | backend_type | state | wait_event_type | wait_event | query
------------+------------+----------------+---------------------+-----------------+---------------+----------------------------------------------------------------------------
9724 | {} | client backend | idle in transaction | Client | ClientRead | UPDATE aken SET name = 'aa'::text WHERE ((name = 'aa'::text) AND (id = 1))
12240 | {9724} | client backend | active | Lock | transactionid | UPDATE aken SET name = 'aa'::text WHERE ((name = 'aa'::text) AND (id = 1))
2406 | {12240} | client backend | active | Lock | tuple | UPDATE aken SET name = 'aa'::text WHERE ((name = 'aa'::text) AND (id = 1))
(3 rows)
testdb=#
这里涉及到两大类等待事件:
1.两个waiter进程的wait_event_type=lock,state=active
说明处于等待链lockchain中,持续等待holder持有的锁资源释放。高并发场景、业务频繁重试类场景中会出现大量的处于active的lock等待进程。
2.持有者holder的wait_event_type=Client,state=idle in transaction
说明该holder执行完事务后处于空闲状态,正等待客户端发送新的请求,常见于业务框架代码忘记提交的场景。
除了上面的lock和client类型的waitevent type,到目前PostgreSQL共9种(截至pg-13)。因为我这里使用的tbase内核为pg10版本(共10种):
src\include\pgstat.h/* ----------
* Wait Classes
* ----------
*/
#define PG_WAIT_LWLOCK 0x01000000U
#define PG_WAIT_LOCK 0x03000000U <<
#define PG_WAIT_BUFFER_PIN 0x04000000U
#define PG_WAIT_ACTIVITY 0x05000000U
#define PG_WAIT_CLIENT 0x06000000U <<
#define PG_WAIT_EXTENSION 0x07000000U
#define PG_WAIT_IPC 0x08000000U
#define PG_WAIT_TIMEOUT 0x09000000U
#define PG_WAIT_IO 0x0A000000U
然后每个大类再分多个小类,下面是lock类型细分的11个小类:const char *const LockTagTypeNames[] = {
"relation",
"extend",
"page",
"tuple",
"transactionid",
"virtualxid",
"speculative token",
"object",
"userlock",
"advisory"
};
如果无法从pg_lock查看到相关的锁信息,我们还可以从进程的stack得到锁等待的锁是哪个:
在DN主节点查看进程信息# waiter-12240:
[aken@dbhost]$ ps -ef | grep 12240 | grep -v grep
tbase 12240 10172 0 17:51 ? 00:00:00 postgres: dbmgr testdb 192.168.155.177(38707) UPDATE waiting
# waiter-2406:
[aken@dbhost]$ ps -ef | grep 2406 | grep -v grep
tbase 2406 10172 0 17:55 ? 00:00:00 postgres: dbmgr testdb 192.168.136.55(42768) UPDATE waiting
[aken@dbhost]$
# holder-9724:
[aken@dbhost]$ ps -ef | grep 9724 | grep -v grep
tbase 9724 10172 0 17:51 ? 00:00:00 postgres: dbmgr testdb 192.168.155.177(38388) idle in transaction
接着打印stack信息, the transactionid waiter info:[aken@dbhost]$ pstack 12240
#0 0x00007f53ea7c5eb3 in __epoll_wait_nocancel () from lib64/libc.so.6
#1 0x0000000000973cfd in WaitEventSetWaitBlock (set=0x229f9e0, cur_timeout=-1, occurred_events=0x7ffffefd62a0, nevents=1) at latch.c:1045
#2 0x0000000000973bd8 in WaitEventSetWait (set=0x229f9e0, timeout=-1, occurred_events=0x7ffffefd62a0, nevents=1, wait_event_info=50331652) at latch.c:997
#3 0x00000000009734e8 in WaitLatchOrSocket (latch=0x7f53678a4a24, wakeEvents=1, sock=-1, timeout=-1, wait_event_info=50331652) at latch.c:385
#4 0x00000000009733e9 in WaitLatch (latch=0x7f53678a4a24, wakeEvents=1, timeout=0, wait_event_info=50331652) at latch.c:339
#5 0x000000000098a262 in ProcSleep (locallock=0x1fe68b8, lockMethodTable=0xd2d200 ) at proc.c:1290
#6 0x00000000009844fd in WaitOnLock (locallock=0x1fe68b8, owner=0x2130648) at lock.c:1826
#7 0x000000000098334d in LockAcquireExtendedXC (locktag=0x7ffffefd67b0, lockmode=5, sessionLock=0 '\000', dontWait=0 '\000', reportMemoryError=1 '\001', only_increment=0 '\000') at lock.c:1047
#8 0x0000000000982a3f in LockAcquireExtended (locktag=0x7ffffefd67b0, lockmode=5, sessionLock=0 '\000', dontWait=0 '\000', reportMemoryError=1 '\001') at lock.c:734
#9 0x00000000009829b8 in LockAcquire (locktag=0x7ffffefd67b0, lockmode=5, sessionLock=0 '\000', dontWait=0 '\000') at lock.c:693
#10 0x00000000009819bf in XactLockTableWait (xid=1743392584, rel=0x7f4f0b611b98, ctid=0x7ffffefd6844, oper=XLTW_Update) at lmgr.c:602
#11 0x00000000004e5ccb in heap_update (relation=0x7f4f0b611b98, otid=0x7ffffefd6ac0, newtup=0x229e458, cid=0, crosscheck=0x0, wait=1 '\001', hufd=0x7ffffefd69d0, lockmode=0x7ffffefd69cc) at heapam.c:4377
#12 0x0000000000750158 in ExecUpdate (mtstate=0x229cb98, tupleid=0x7ffffefd6ac0, oldtuple=0x0, slot=0x229f500, planSlot=0x229ce28, epqstate=0x229cc60, estate=0x229c848, canSetTag=1 '\001') at nodeModifyTable.c:1358
#13 0x0000000000751d0e in ExecModifyTable (pstate=0x229cb98) at nodeModifyTable.c:2338
#14 0x00000000007257ae in ExecProcNodeFirst (node=0x229cb98) at execProcnode.c:529
#15 0x000000000071cc10 in ExecProcNode (node=0x229cb98) at ../../../src/include/executor/executor.h:273
#16 0x000000000071f4b9 in ExecutePlan (estate=0x229c848, planstate=0x229cb98, use_parallel_mode=0 '\000', operation=CMD_UPDATE, sendTuples=0 '\000', numberTuples=0, direction=ForwardScanDirection, dest=0x2298880, execute_once=1 '\001') at execMain.c:1955
#17 0x000000000071d249 in standard_ExecutorRun (queryDesc=0x1fde558, direction=ForwardScanDirection, count=0, execute_once=1 '\001') at execMain.c:465
#18 0x00007f53e2b736e1 in pgss_ExecutorRun (queryDesc=0x1fde558, direction=ForwardScanDirection, count=0, execute_once=1 '\001') at pg_stat_statements.c:893
#19 0x00007f53e1b27a5c in pgsl_ExecutorRun (queryDesc=0x1fde558, direction=ForwardScanDirection, count=0, execute_once=1 '\001') at pg_stat_log.c:494
#20 0x000000000071d0c5 in ExecutorRun (queryDesc=0x1fde558, direction=ForwardScanDirection, count=0, execute_once=1 '\001') at execMain.c:406
#21 0x00000000009a87ce in ProcessQuery (plan=0x2299940, sourceText=0x21ab2a8 "UPDATE aken SET name = 'aa'::text WHERE ((name = 'aa'::text) AND (id = 1))", params=0x0, queryEnv=0x0, dest=0x2298880, completionTag=0x7ffffefd7250 "") at pquery.c:209
#22 0x00000000009ab157 in PortalRunMulti (portal=0x20286e8, isTopLevel=1 '\001', setHoldSnapshot=0 '\000', dest=0x2298880, altdest=0x2298880, completionTag=0x7ffffefd7250 "") at pquery.c:2048
#23 0x00000000009aa0b3 in PortalRun (portal=0x20286e8, count=9223372036854775807, isTopLevel=1 '\001', run_once=1 '\001', dest=0x2298880, altdest=0x2298880, completionTag=0x7ffffefd7250 "") at pquery.c:1320
#24 0x00000000009a16b6 in exec_simple_query (query_string=0x21ab2a8 "UPDATE aken SET name = 'aa'::text WHERE ((name = 'aa'::text) AND (id = 1))") at postgres.c:1508
#25 0x00000000009a6bc2 in PostgresMain (argc=1, argv=0x2001708, dbname=0x2001650 "testdb", username=0x2001638 "dbmgr") at postgres.c:5409
#26 0x00000000008ecc32 in BackendRun (port=0x2021060) at postmaster.c:5013
#27 0x00000000008ec3db in BackendStartup (port=0x2021060) at postmaster.c:4685
#28 0x00000000008e824a in ServerLoop () at postmaster.c:1990
#29 0x00000000008e78d1 in PostmasterMain (argc=5, argv=0x1fbc600) at postmaster.c:1598
#30 0x00000000007f360e in main (argc=5, argv=0x1fbc600) at main.c:228
主要关注以下几行:#9、#8、#7行申请锁,lockmode为5(ShareLock)#4这行打印等待事件为wait_event_info=50331652#0行进入epoll_wait观察list链表并sleep等待。#0 0x00007f53ea7c5eb3 in __epoll_wait_nocancel () from /lib64/libc.so.6
#4 0x00000000009733e9 in WaitLatch (latch=0x7f53678a4a24, wakeEvents=1, timeout=0, wait_event_info=50331652) at latch.c:339
#7 0x000000000098334d in LockAcquireExtendedXC (locktag=0x7ffffefd67b0, lockmode=5, sessionLock=0 '\000', dontWait=0 '\000', reportMemoryError=1 '\001', only_increment=0 '\000') at lock.c:1047
#8 0x0000000000982a3f in LockAcquireExtended (locktag=0x7ffffefd67b0, lockmode=5, sessionLock=0 '\000', dontWait=0 '\000', reportMemoryError=1 '\001') at lock.c:734
#9 0x00000000009829b8 in LockAcquire (locktag=0x7ffffefd67b0, lockmode=5, sessionLock=0 '\000', dontWait=0 '\000') at lock.c:693
通过wait_event_info=50331652通过以下公式可以推算到进程具体是在等待什么。
公式来源详见另一篇文章:
《PostgreSQL等待事件-源码窥探》classId=wait_event_info & 0xFF000000 --意为保留wait_event_info的的高二位。
eventId=wait_event_info & 0x0000FFFF --意为保留wait_event_info的的低四位。
具体换算方法为:
1.首先,将wait_event_info=50331652转化为无符号十六进制为0x3000004U[root@DBhost ~]# printf "Decimal十六进制: %d\nHex十六进制: %x\nOctal八进制: %o\n" 50331652 50331652 50331652
Decimal十六进制: 50331652
Hex十六进制: 3000004
Octal八进制: 300000004
[root@DBhost ~]#
2.接着求出等待事件所属大类 classid:ClassID= wait_event_info & 0xFF000000 = 0x3000004U & 0xFF000000 = 0x3000000U
运算结果0x3000000U对应等待事件类型wait class定义的PG_WAIT_CLIENT类型:#define PG_WAIT_LOCK 0x03000000U
3. 求wait_event的locktab_type,运算取值:locktag_type=wait_event_info & 0x0000FFFF = 0x06000000U & 0x0000FFFF =0x00000004U
运算值0x00000004U,对应十进制值为4
4.根据locktab_type求得wait_event=transactionwait_event=LockTagTypeNames[locktag_type]=LockTagTypeNames[4] = transactionid
继续打印第二个等待进程的stack信息,the tuple waiter info:[aken@dbhost]$ pstack 2406
#0 0x00007f53ea7c5eb3 in __epoll_wait_nocancel () from lib64/libc.so.6
#1 0x0000000000973cfd in WaitEventSetWaitBlock (set=0x229f9e0, cur_timeout=-1, occurred_events=0x7ffffefd62a0, nevents=1) at latch.c:1045
#2 0x0000000000973bd8 in WaitEventSetWait (set=0x229f9e0, timeout=-1, occurred_events=0x7ffffefd62a0, nevents=1, wait_event_info=50331651) at latch.c:997
#3 0x00000000009734e8 in WaitLatchOrSocket (latch=0x7f53678a58a4, wakeEvents=1, sock=-1, timeout=-1, wait_event_info=50331651) at latch.c:385
#4 0x00000000009733e9 in WaitLatch (latch=0x7f53678a58a4, wakeEvents=1, timeout=0, wait_event_info=50331651) at latch.c:339
#5 0x000000000098a262 in ProcSleep (locallock=0x1fe6918, lockMethodTable=0xd2d200 ) at proc.c:1290
#6 0x00000000009844fd in WaitOnLock (locallock=0x1fe6918, owner=0x2130648) at lock.c:1826
#7 0x000000000098334d in LockAcquireExtendedXC (locktag=0x7ffffefd6770, lockmode=7, sessionLock=0 '\000', dontWait=0 '\000', reportMemoryError=1 '\001', only_increment=0 '\000') at lock.c:1047
#8 0x0000000000982a3f in LockAcquireExtended (locktag=0x7ffffefd6770, lockmode=7, sessionLock=0 '\000', dontWait=0 '\000', reportMemoryError=1 '\001') at lock.c:734
#9 0x00000000009829b8 in LockAcquire (locktag=0x7ffffefd6770, lockmode=7, sessionLock=0 '\000', dontWait=0 '\000') at lock.c:693
#10 0x00000000009817a3 in LockTuple (relation=0x7f4f0b611b98, tid=0x7ffffefd6844, lockmode=7) at lmgr.c:476
#11 0x00000000004e8ae2 in heap_acquire_tuplock (relation=0x7f4f0b611b98, tid=0x7ffffefd6844, mode=LockTupleNoKeyExclusive, wait_policy=LockWaitBlock, have_tuple_lock=0x7ffffefd6833 "") at heapam.c:5856
#12 0x00000000004e5ca7 in heap_update (relation=0x7f4f0b611b98, otid=0x7ffffefd6ac0, newtup=0x229e458, cid=0, crosscheck=0x0, wait=1 '\001', hufd=0x7ffffefd69d0, lockmode=0x7ffffefd69cc) at heapam.c:4375
#13 0x0000000000750158 in ExecUpdate (mtstate=0x229cb98, tupleid=0x7ffffefd6ac0, oldtuple=0x0, slot=0x229f500, planSlot=0x229ce28, epqstate=0x229cc60, estate=0x229c848, canSetTag=1 '\001') at nodeModifyTable.c:1358
#14 0x0000000000751d0e in ExecModifyTable (pstate=0x229cb98) at nodeModifyTable.c:2338
#15 0x00000000007257ae in ExecProcNodeFirst (node=0x229cb98) at execProcnode.c:529
#16 0x000000000071cc10 in ExecProcNode (node=0x229cb98) at ../../../src/include/executor/executor.h:273
#17 0x000000000071f4b9 in ExecutePlan (estate=0x229c848, planstate=0x229cb98, use_parallel_mode=0 '\000', operation=CMD_UPDATE, sendTuples=0 '\000', numberTuples=0, direction=ForwardScanDirection, dest=0x2296e68, execute_once=1 '\001') at execMain.c:1955
#18 0x000000000071d249 in standard_ExecutorRun (queryDesc=0x1fde558, direction=ForwardScanDirection, count=0, execute_once=1 '\001') at execMain.c:465
#19 0x00007f53e2b736e1 in pgss_ExecutorRun (queryDesc=0x1fde558, direction=ForwardScanDirection, count=0, execute_once=1 '\001') at pg_stat_statements.c:893
#20 0x00007f53e1b27a5c in pgsl_ExecutorRun (queryDesc=0x1fde558, direction=ForwardScanDirection, count=0, execute_once=1 '\001') at pg_stat_log.c:494
#21 0x000000000071d0c5 in ExecutorRun (queryDesc=0x1fde558, direction=ForwardScanDirection, count=0, execute_once=1 '\001') at execMain.c:406
#22 0x00000000009a87ce in ProcessQuery (plan=0x2297f28, sourceText=0x21ab2a8 "UPDATE aken SET name = 'aa'::text WHERE ((name = 'aa'::text) AND (id = 1))", params=0x0, queryEnv=0x0, dest=0x2296e68, completionTag=0x7ffffefd7250 "") at pquery.c:209
#23 0x00000000009ab157 in PortalRunMulti (portal=0x20286e8, isTopLevel=1 '\001', setHoldSnapshot=0 '\000', dest=0x2296e68, altdest=0x2296e68, completionTag=0x7ffffefd7250 "") at pquery.c:2048
#24 0x00000000009aa0b3 in PortalRun (portal=0x20286e8, count=9223372036854775807, isTopLevel=1 '\001', run_once=1 '\001', dest=0x2296e68, altdest=0x2296e68, completionTag=0x7ffffefd7250 "") at pquery.c:1320
#25 0x00000000009a16b6 in exec_simple_query (query_string=0x21ab2a8 "UPDATE aken SET name = 'aa'::text WHERE ((name = 'aa'::text) AND (id = 1))") at postgres.c:1508
#26 0x00000000009a6bc2 in PostgresMain (argc=1, argv=0x2001708, dbname=0x2001650 "testdb", username=0x2001638 "dbmgr") at postgres.c:5409
#27 0x00000000008ecc32 in BackendRun (port=0x2021060) at postmaster.c:5013
#28 0x00000000008ec3db in BackendStartup (port=0x2021060) at postmaster.c:4685
#29 0x00000000008e824a in ServerLoop () at postmaster.c:1990
#30 0x00000000008e78d1 in PostmasterMain (argc=5, argv=0x1fbc600) at postmaster.c:1598
#31 0x00000000007f360e in main (argc=5, argv=0x1fbc600) at main.c:228
这里的wait_event_info=50331651,转化为无符号的十六进制为0x03000003U:[root@DBhost ~]# printf "Decimal十六进制: %d\nHex十六进制: %x\nOctal八进制: %o\n" 50331651 50331651 50331651
Decimal十六进制: 50331651
Hex十六进制: 3000003
Octal八进制: 300000003
[root@DBhost ~]#
具体的转化方法和上面一样:
1.求等待事件所属大类 classid的运算取值:wait_event_type = wait_event_info & 0xFF000000 = 0x03000003U & 0xFF000000 = 0x03000000U
计算结果为0x03000000U,对应PG_WAIT_LOCK类等待事件:#define PG_WAIT_LOCK 0x03000000U
2. 求wait_event的locktab_type,运算取值得到0x00000003U:locktag_type=wait_event_info & 0x0000FFFF = 0x03000003U & 0x0000FFFF = 0x00000003U
3.根据locktab_type求得wait_event=tuple
上面第2步的运算值为0x00000003U,对应十进制值为3,即LockTagTypeNames的下标:wait_event=LockTagTypeNames[locktag_type]=LockTagTypeNames[3] = tuple
最后,我们再看看root blocker的情况, 首先是进程stack info :[aken@dbhost]$ pstack 9724
#0 0x00007f53ea7c5eb3 in __epoll_wait_nocancel () from lib64/libc.so.6
#1 0x0000000000973cfd in WaitEventSetWaitBlock (set=0x1fff408, cur_timeout=-1, occurred_events=0x7ffffefd7280, nevents=1) at latch.c:1045
#2 0x0000000000973bd8 in WaitEventSetWait (set=0x1fff408, timeout=-1, occurred_events=0x7ffffefd7280, nevents=1, wait_event_info=100663296) at latch.c:997
#3 0x000000000076f77b in secure_read (port=0x2021060, ptr=0x10caa20 , len=8192) at be-secure.c:166
#4 0x000000000077c95c in pq_recvbuf () at pqcomm.c:986
#5 0x000000000077c9f6 in pq_getbyte () at pqcomm.c:1029
#6 0x00000000009a0473 in SocketBackend (inBuf=0x7ffffefd74c0) at postgres.c:517
#7 0x00000000009a0943 in ReadCommand (inBuf=0x7ffffefd74c0) at postgres.c:726
#8 0x00000000009a6a93 in PostgresMain (argc=1, argv=0x2001708, dbname=0x2001650 "testdb", username=0x2001638 "dbmgr") at postgres.c:5331
#9 0x00000000008ecc32 in BackendRun (port=0x2021060) at postmaster.c:5013
#10 0x00000000008ec3db in BackendStartup (port=0x2021060) at postmaster.c:4685
#11 0x00000000008e824a in ServerLoop () at postmaster.c:1990
#12 0x00000000008e78d1 in PostmasterMain (argc=5, argv=0x1fbc600) at postmaster.c:1598
#13 0x00000000007f360e in main (argc=5, argv=0x1fbc600) at main.c:228
[aken@dbhost]$
这里wait_event_info=100663296,十六进制 0x06000000U,取高两位得到等待事件类型为PG_WAIT_CLIENT:#define PG_WAIT_CLIENT 0x06000000U
这个和LOCK类型的等待事件已经不是一个类别,在PG_WAIT_CLIENT类型的等待事件中,又分9种小类标签:
Definition at line 898 of file pgstat.h.{
WAIT_EVENT_CLIENT_READ = PG_WAIT_CLIENT,
WAIT_EVENT_CLIENT_WRITE,
WAIT_EVENT_GSS_OPEN_SERVER,
WAIT_EVENT_LIBPQWALRECEIVER_CONNECT,
WAIT_EVENT_LIBPQWALRECEIVER_RECEIVE,
WAIT_EVENT_SSL_OPEN_SERVER,
WAIT_EVENT_WAL_RECEIVER_WAIT_START,
WAIT_EVENT_WAL_SENDER_WAIT_WAL,
WAIT_EVENT_WAL_SENDER_WRITE_DATA,
} WaitEventClient;
最后通过pgstat_get_wait_client()函数返回具体的等待事件名称:pgstat_get_wait_client(WaitEventClient w)
{
const char *event_name = "unknown wait event";
switch (w)
{
case WAIT_EVENT_CLIENT_READ:
event_name = "ClientRead"; <<<<< 进程7274的等待大类为PG_WAIT_CLIENT,对应该类别的WAIT_EVENT_CLIENT_READ类型,即
break;
case WAIT_EVENT_CLIENT_WRITE:
event_name = "ClientWrite";
break;
.....此处省略后续代码....
上面只是做了对锁等待的测试模拟,实践中生产环境等待事件类案例分析诊断可阅读如下文章:
>>>参考资料1.http://amitkapila16.blogspot.com/2016/03/troubleshooting-waits-in-postgresql.html
2.https://www.postgresql.org/docs/13/monitoring-stats.html
感谢阅读