PostgreSQL 控制文件管理
0.1 控制文件的损坏。
- 不能使用备份的控制文件。
- 想要修复数据,只能重建控制文件。
0.2 控制文件损坏后
1、报错信息。
testdb=# checkpoint ;
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
Warning
控制文件一旦损坏了,不会立刻造成数据库宕机,但是做checkpoint 时 就会爆出以上错误。然后数据库实例就会立刻关闭。
0.3 重建控制文件
pg_resetwal
pg_resetxlog
0.3.1 备份pg_control
cp $PGDATA/global/pg_control $PGDATA/global/pg_control.bak
cd $PGDATA/global
rm -rf $PGDATA/global/pg_control
rm -rf $PGDATA/postmaster.pid
0.3.2 确定下一个WAL日志文件名称
- 确定
pg_wal
目录下,最新的WAL日志的文件名,并在logseq
部位+1
$PGDATA/pg_wal/000000010000000000000001
-l=
000000010000000000000001 + 1 =
000000010000000000000002
0.3.3 确定下一个对象ID
$PGDATA/pg_multixact/members/xxxx
-O=
(xxxx+1) * 0x10000
= (0000+1) * 0x10000
= 0x10000 * 0x10000
= 0x1000000000
0.3.4 确定下一个多事务 ID
$PGDATA/pg_multixact/offsets/xxxx
-m=
(xxxx+1) * 0x10000,(xxxx+1) * 0x10000
= (0000+1) * 0x1000 = 0x1000,0x1000
0.3.5 最小的事务ID
$PGDATA/pg_xact/0000
-x=
(xxxx+1) * 0x10000,(xxxx+1) * 0x10000
= (0000+1) * 0x1000 = 0x1000,0x1000
0.4 重建控制文件
cd $PGDATA/global
touch pg_control
pg_resetwal -l 000000050000000000000014 -O 0x100000000 -m 0x00010000,0x00010000 -x 0x10000 -f $PGDATA