守望的麦子

Fixing OGG's Time Since Chkpt

2018-4-19    Dalian    /oracle/2018/04/19/ogg-tsc.html    oracle, linux

I ran into a problem where Oracle GoldenGate reported Time Since Chkpt as unknown for every process. The steps below fixed it, so I am writing them down.

[oracle@localhost ~]$ ggsci

Oracle GoldenGate Command Interpreter for Oracle
Version 11.2.1.0.3 14400833 OGGCORE_11.2.1.0.3_PLATFORMS_120823.1258_FBO
Linux, x64, 64bit (optimized), Oracle 11g on Aug 23 2012 20:20:21

Copyright (C) 1995, 2012, Oracle and/or its affiliates. All rights reserved.


GGSCI (localhost.localdomain) 1> info all

Program     Status      Group       Lag           Time Since Chkpt

MANAGER     RUNNING
EXTRACT     RUNNING     EXT12345     00:00:00      unknown
EXTRACT     RUNNING     EXT67889     00:00:00      unknown
EXTRACT     RUNNING     PUMP1234     00:00:00      unknown
EXTRACT     RUNNING     PUMP5678     00:00:00      unknown
REPLICAT    RUNNING     REP12345     00:00:00      unknown
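
With more than a handful of groups, it can help to pull the affected names out of the listing automatically. The following is a minimal sketch, not part of the original fix: the function name `unknown_chkpt_groups` is hypothetical, and it assumes the five-column `info all` layout shown above.

```shell
# Print the group names whose "Time Since Chkpt" column reads "unknown".
# Input: the text of an `info all` listing on standard input.
# Assumption: EXTRACT/REPLICAT rows have exactly five whitespace-separated
# fields (Program, Status, Group, Lag, Time Since Chkpt), as in the listing above.
unknown_chkpt_groups() {
    awk '($1 == "EXTRACT" || $1 == "REPLICAT") && NF == 5 && $5 == "unknown" { print $3 }'
}

# Hypothetical usage, assuming ggsci accepts commands on standard input:
#   echo "info all" | ggsci | unknown_chkpt_groups
```

Rows where only the Lag column is unknown are deliberately skipped, because the check looks at the fifth field only.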

Trying to stop the processes failed:

GGSCI (localhost.localdomain) 2> stop *

Sending STOP request to EXTRACT EXT12345 ...

ERROR: sending message to EXTRACT EXT12345 (Timeout waiting for message).

Sending STOP request to EXTRACT EXT67889 ...

ERROR: sending message to EXTRACT EXT67889 (Timeout waiting for message).

Sending STOP request to EXTRACT PUMP1234 ...

ERROR: sending message to EXTRACT PUMP1234 (Timeout waiting for message).

Sending STOP request to EXTRACT PUMP5678 ...

ERROR: sending message to EXTRACT PUMP5678 (Timeout waiting for message).

Sending STOP request to REPLICAT REP12345 ...

ERROR: sending message to REPLICAT REP12345 (Timeout waiting for message).

Trying to stop MANAGER:

GGSCI (localhost.localdomain) 3> stop mgr!

Sending STOP request to MANAGER ...
Request processed.
Manager stopped.

Check the status again:

GGSCI (localhost.localdomain) 4> info all

Program     Status      Group       Lag           Time Since Chkpt

MANAGER     STOPPED
EXTRACT     RUNNING     EXT12345     00:00:00      unknown
EXTRACT     RUNNING     EXT67889     00:00:00      unknown
EXTRACT     RUNNING     PUMP1234     00:00:00      unknown
EXTRACT     RUNNING     PUMP5678     00:00:00      unknown
REPLICAT    RUNNING     REP12345     00:00:00      unknown

MANAGER is now stopped, but the EXTRACT and REPLICAT processes are still running.

At this point the GGSCI kill command cannot end the processes, because it needs a running MANAGER:

GGSCI (localhost.localdomain) 5> kill EXT12345

ERROR: Manager not currently running.

GGSCI (localhost.localdomain) 6> kill EXT67889

ERROR: Manager not currently running.

Check the status:

GGSCI (localhost.localdomain) 7> info all

Program     Status      Group       Lag           Time Since Chkpt

MANAGER     STOPPED
EXTRACT     RUNNING     EXT12345     00:00:00      unknown
EXTRACT     RUNNING     EXT67889     00:00:00      unknown
EXTRACT     RUNNING     PUMP1234     00:00:00      unknown
EXTRACT     RUNNING     PUMP5678     00:00:00      unknown
REPLICAT    RUNNING     REP12345     00:00:00      unknown

Exit GGSCI:

GGSCI (localhost.localdomain) 8> exit

Look at the OGG processes at the operating-system level:

[oracle@localhost OGG]$ ps -ef|grep /opt/OGG
oracle    7479     1  0 Nov10 ?        00:03:31 /opt/OGG/extract PARAMFILE /opt/OGG/dirprm/EXT12345.prm REPORTFILE /opt/OGG/dirrpt/EXT12345.rpt PROCESSID EXT12345 USESUBDIRS
oracle    7480     1  0 Nov10 ?        00:02:30 /opt/OGG/extract PARAMFILE /opt/OGG/dirprm/EXT67889.prm REPORTFILE /opt/OGG/dirrpt/EXT67889.rpt PROCESSID EXT67889 USESUBDIRS
oracle    7483     1  0 Nov10 ?        00:00:01 /opt/OGG/extract PARAMFILE /opt/OGG/dirprm/PUMP1234.prm REPORTFILE /opt/OGG/dirrpt/PUMP1234.rpt PROCESSID PUMP1234 USESUBDIRS
oracle    7485     1  0 Nov10 ?        00:00:03 /opt/OGG/replicat PARAMFILE /opt/OGG/dirprm/REP12345.prm REPORTFILE /opt/OGG/dirrpt/REP12345.rpt PROCESSID REP12345 USESUBDIRS
oracle    7518     1  0 Nov10 ?        00:00:01 ./server -p 7847 -k -l /opt/OGG/ggserr.log
oracle    7677     1  0 Nov10 ?        00:00:15 /opt/OGG/extract PARAMFILE /opt/OGG/dirprm/PUMP5678.prm REPORTFILE /opt/OGG/dirrpt/PUMP5678.rpt PROCESSID PUMP5678 USESUBDIRS
oracle   25261 25112  0 24:48 pts/1    00:00:00 grep /opt/OGG

If the command above finds nothing, try searching by process name instead:

ps -ef | grep <replicat name>
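
Copying PIDs out of a `ps` listing by hand is easy to get wrong. The sketch below extracts them instead. The function name `ogg_pids` is hypothetical, and the code assumes the usual `ps -ef` column order (UID, PID, PPID, C, STIME, TTY, TIME, CMD) with a single-token start time, as in the listing above.

```shell
# Print the PIDs of processes whose command line mentions the given
# install directory, skipping the grep process itself.
# Input: `ps -ef` text on standard input. Argument: install path.
# Assumption: field 2 is the PID and field 8 is the start of the command.
ogg_pids() {
    awk -v home="$1" 'index($0, home) > 0 && $8 != "grep" { print $2 }'
}

# Hypothetical usage:
#   ps -ef | ogg_pids /opt/OGG
```

Matching on the install path also catches the `./server` collector, since its log file argument lives under the same directory.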

Kill the related processes:

[oracle@localhost OGG]$ kill -9 7479 7480 7482 7483 7485  7518 7677
[oracle@localhost OGG]$ ps -ef|grep /opt/OGG
oracle   25264 25112  0 24:48 pts/1    00:00:00 grep /opt/OGG
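
I went straight to `kill -9` here. A gentler variant, shown only as a sketch and not what I actually ran, sends SIGTERM first and falls back to SIGKILL after a grace period. The function name `terminate_pids` and the grace period are hypothetical, and nothing in this post shows whether hung OGG processes respond to SIGTERM at all.

```shell
# Send SIGTERM to the given PIDs, wait up to N seconds for them to exit,
# then send SIGKILL to any that are still alive.
# Usage: terminate_pids <grace_seconds> <pid>...
terminate_pids() {
    grace=$1
    shift
    kill -TERM "$@" 2>/dev/null || true
    while [ "$grace" -gt 0 ]; do
        alive=0
        for pid in "$@"; do
            # kill -0 sends no signal; it only tests whether the PID exists.
            if kill -0 "$pid" 2>/dev/null; then
                alive=1
            fi
        done
        if [ "$alive" -eq 0 ]; then
            return 0
        fi
        sleep 1
        grace=$((grace - 1))
    done
    for pid in "$@"; do
        if kill -0 "$pid" 2>/dev/null; then
            kill -KILL "$pid" 2>/dev/null || true
        fi
    done
    return 0
}

# Hypothetical usage with the PIDs from the listing above:
#   terminate_pids 10 7479 7480 7483 7485 7518 7677
```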

Log in to GGSCI and check the status:

[oracle@localhost OGG]$ ggsci

Oracle GoldenGate Command Interpreter for Oracle
Version 11.1.1.0.0 Build 078
Linux, x64, 64bit (optimized), Oracle 10 on Jul 28 2010 13:21:11

Copyright (C) 1995, 2010, Oracle and/or its affiliates. All rights reserved.


GGSCI (localhost.localdomain) 1> info all

Program     Status      Group       Lag           Time Since Chkpt

MANAGER     STOPPED
EXTRACT     ABENDED     EXT12345     00:00:00      unknown
EXTRACT     ABENDED     EXT67889     00:00:00      unknown
EXTRACT     ABENDED     PUMP1234     00:00:00      unknown
EXTRACT     ABENDED     PUMP5678     00:00:00      unknown
REPLICAT    ABENDED     REP12345     00:00:00      unknown

The status has changed to ABENDED. Start MANAGER:

GGSCI (localhost.localdomain) 2> start mgr

Manager started.

Check the status again:

GGSCI (localhost.localdomain) 3> info all

Program     Status      Group       Lag           Time Since Chkpt

MANAGER     RUNNING
EXTRACT     RUNNING     EXT12345     00:00:00      unknown
EXTRACT     RUNNING     EXT67889     00:00:00      unknown
EXTRACT     RUNNING     PUMP1234     00:00:00      unknown
EXTRACT     RUNNING     PUMP5678     00:00:00      unknown
REPLICAT    RUNNING     REP12345     00:00:00      unknown

The processes are running again, but Time Since Chkpt is still unknown. Stop one process and check again:

GGSCI (localhost.localdomain) 4> stop EXT12345

Sending STOP request to EXTRACT EXT12345 ...
Request processed.


GGSCI (localhost.localdomain) 5> info all

Program     Status      Group       Lag           Time Since Chkpt

MANAGER     RUNNING
EXTRACT     STOPPED     EXT12345     unknown       00:00:02
EXTRACT     RUNNING     EXT67889     00:00:00      unknown
EXTRACT     RUNNING     PUMP1234     00:00:00      unknown
EXTRACT     RUNNING     PUMP5678     00:00:00      unknown
REPLICAT    RUNNING     REP12345     00:00:00      unknown

Start the process:

GGSCI (localhost.localdomain) 6> start EXT12345

Sending START request to MANAGER ...
EXTRACT EXT12345 starting


GGSCI (localhost.localdomain) 7> info all

Program     Status      Group       Lag           Time Since Chkpt

MANAGER     RUNNING
EXTRACT     RUNNING     EXT12345     unknown       00:00:14
EXTRACT     RUNNING     EXT67889     00:00:00      unknown
EXTRACT     RUNNING     PUMP1234     00:00:00      unknown
EXTRACT     RUNNING     PUMP5678     00:00:00      unknown
REPLICAT    RUNNING     REP12345     00:00:00      unknown

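Time Since Chkpt now shows a real value for EXT12345, but its Lag reads unknown. That needs time to recover, so in the meantime move on and stop the next process: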

GGSCI (localhost.localdomain) 8> stop EXT67889

Sending STOP request to EXTRACT EXT67889 ...

The STOP xxx command can take a while to return. If a process has to stop immediately, SEND EXTRACT xxx, FORCESTOP can be used instead.

GGSCI (localhost.localdomain) 9> info all

Program     Status      Group       Lag           Time Since Chkpt

MANAGER     RUNNING
EXTRACT     RUNNING     EXT12345     unknown       00:00:02
EXTRACT     STOPPED     EXT67889     01:51:12      00:00:01
EXTRACT     RUNNING     PUMP1234     00:00:00      unknown
EXTRACT     RUNNING     PUMP5678     00:00:00      unknown
REPLICAT    RUNNING     REP12345     00:00:00      unknown

Start the process:

GGSCI (localhost.localdomain) 10> start EXT67889

Sending START request to MANAGER ...
EXTRACT EXT67889 starting


GGSCI (localhost.localdomain) 11> info all

Program     Status      Group       Lag           Time Since Chkpt

MANAGER     RUNNING
EXTRACT     RUNNING     EXT12345     99:53:02      00:00:01
EXTRACT     RUNNING     EXT67889     01:51:12      00:00:10
EXTRACT     RUNNING     PUMP1234     00:00:00      unknown
EXTRACT     RUNNING     PUMP5678     00:00:00      unknown
REPLICAT    RUNNING     REP12345     00:00:00      00:00:00

Continue to STOP and START the remaining processes:

GGSCI (localhost.localdomain) 15> stop PUMP1234

Sending STOP request to EXTRACT PUMP1234 ...
Request processed.


GGSCI (localhost.localdomain) 16> start PUMP1234

Sending START request to MANAGER ...
EXTRACT PUMP1234 starting


GGSCI (localhost.localdomain) 17> stop PUMP5678

Sending STOP request to EXTRACT PUMP5678 ...
Request processed.


GGSCI (localhost.localdomain) 18> start PUMP5678

Sending START request to MANAGER ...
EXTRACT PUMP5678 starting


GGSCI (localhost.localdomain) 19> info all

Program     Status      Group       Lag           Time Since Chkpt

MANAGER     RUNNING
EXTRACT     RUNNING     EXT12345     00:00:00      00:00:01
EXTRACT     RUNNING     EXT67889     00:00:00      00:00:10
EXTRACT     RUNNING     PUMP1234     00:00:00      00:00:04
EXTRACT     RUNNING     PUMP5678     00:00:00      00:00:05
REPLICAT    RUNNING     REP12345     00:00:00      00:00:05

Everything is back to normal.

Summary:

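First, force MANAGER to stop. Then exit GGSCI and kill the OGG-related processes at the operating-system level. Finally, go back into GGSCI, start MANAGER, and stop and start each affected process.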
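
The last step, stopping and starting each group in turn, is repetitive enough to script. Here is a minimal sketch. The function name `restart_commands` is hypothetical, and feeding its output to `ggsci` assumes that GGSCI reads commands from standard input and that each `stop` returns before the following `start` is issued, which was not always the case in my session.

```shell
# Emit GGSCI commands that stop and then start each named group in turn,
# followed by a final status check.
# Usage: restart_commands <group>...
restart_commands() {
    for group in "$@"; do
        printf 'stop %s\nstart %s\n' "$group" "$group"
    done
    printf 'info all\n'
}

# Hypothetical usage, assuming ggsci accepts commands on standard input:
#   restart_commands EXT12345 EXT67889 PUMP1234 PUMP5678 REP12345 | ggsci
```

Generating the commands as plain text keeps the sketch easy to review before anything is sent to a live instance.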

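About the author
Maizi (麦子), born in the 1980s, currently works in the telecommunications industry. Android enthusiast and reader for 一个人的书房.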