PostgreSQL Page页结构解析 B-Tree索引的页头和行数据指针
本文简单介绍了PG索引数据页Page中存储的原始内容以及如何阅读它们,包括页头PageHeader和行数据指针ItemId(Line Pointer)。
测试数据准备
1 创建一张表,插入数据并创建索引
-- 创建一张表,插入几行数据
drop table if exists t_index;
create table t_index (id int,c1 char(8),c2 varchar(16));
insert into t_index values(2,'1','a');
insert into t_index values(4,'2','b');
insert into t_index values(8,'3','c');
insert into t_index values(16,'4','d');
-- 创建索引
alter table t_index add constraint pk_t_index primary key(id);
2 获取该索引对应的数据文件
testdb=# SELECT pg_relation_filepath('pk_t_index');
pg_relation_filepath
----------------------
base/13758/16441
3 Dump数据文件中的数据
[xdb@localhost utf8db\]$ hexdump -C $PGDATA/base/13758/16441
00000000 00 00 00 00 08 9f 79 01 00 00 00 00 48 00 f0 1f |......y.....H...|
00000010 f0 1f 04 20 00 00 00 00 62 31 05 00 04 00 00 00 |... ....b1......|
00000020 01 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 |................|
00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f0 bf |................|
00000040 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00001ff0 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00 00 |................|
00002000 00 00 00 00 78 9e 79 01 00 00 00 00 28 00 b0 1f |....x.y.....(...|
00002010 f0 1f 04 20 00 00 00 00 e0 9f 20 00 d0 9f 20 00 |... ...... ... .|
00002020 c0 9f 20 00 b0 9f 20 00 b0 9f 20 00 00 00 00 00 |.. ... ... .....|
00002030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00003fb0 00 00 00 00 04 00 10 00 10 00 00 00 00 00 00 00 |................|
00003fc0 00 00 00 00 03 00 10 00 08 00 00 00 00 00 00 00 |................|
00003fd0 00 00 00 00 02 00 10 00 04 00 00 00 00 00 00 00 |................|
00003fe0 00 00 00 00 01 00 10 00 02 00 00 00 00 00 00 00 |................|
00003ff0 00 00 00 00 00 00 00 00 00 00 00 00 03 00 00 00 |................|
00004000
PageHeader
上一节提到过PageHeaderData,其数据结构如下:
typedef struct PageHeaderData
{
/* XXX LSN is member of *any* block, not only page-organized ones */
PageXLogRecPtr pd_lsn; /* LSN: next byte after last byte of xlog
* record for last change to this page */
uint16 pd_checksum; /* checksum */
uint16 pd_flags; /* flag bits, see below */
LocationIndex pd_lower; /* offset to start of free space */
LocationIndex pd_upper; /* offset to end of free space */
LocationIndex pd_special; /* offset to start of special space */
uint16 pd_pagesize_version;
TransactionId pd_prune_xid; /* oldest prunable XID, or zero if none */
ItemIdData pd_linp\[1\]; /* beginning of line pointer array */
} PageHeaderData;
下面根据数据文件中的数据使用hexdump查看并逐个进行解析。
1 pd_lsn(8bytes)
[xdb@localhost utf8db\]$ hexdump -C $PGDATA/base/13758/16441 -s 0 -n 8
00000000 00 00 00 00 08 9f 79 01 |......y.|
00000008
数据文件的8个Bytes存储的是LSN,其中最开始的4个Bytes是TimelineID,在这里是\x0000 0000(即数字0),后面的4个Bytes是\x01799f08 ,组合起来LSN为0/01799f08
Warning
A、0000000&0000008是hexdump工具的输出,不是数据内容
B、X86使用小端模式,阅读字节码时注意高低位变换
2 pd_checksum(2bytes)
[xdb@localhost utf8db\]$ hexdump -C $PGDATA/base/13758/16441 -s 8 -n 2
00000008 00 00 |..|
0000000a
checksum为\x0000
3 pd_flags(2bytes)
\[xdb@localhost utf8db\]$ hexdump -C $PGDATA/base/13758/16441 -s 10 -n 2
0000000a 00 00 |..|
0000000c
flags为\\x0000
pd_flags 是标志位,初始化未定义的位,默认为零,并可能在将来使用。
- 此页面有空闲空间,且 行指针 位置小于pd_lower 则考虑重用.
- 此页面没有空闲空间时,UPDATE无法容纳新版本数据,则设置此位, 标志此页面出现行迁移.
4 pg_lower
[xdb@localhost utf8db]$ hexdump -C $PGDATA/base/13758/16441 -s 12 -n 2
0000000c 48 00 |(.|
0000000e
[xdb@localhost utf8db]$ echo $((0x0048))
lower为\\x0048,十进制值为72
5 pd_upper(2bytes)
\[xdb@localhost utf8db\]$ hexdump -C $PGDATA/base/13758/16441 -s 14 -n 2
0000000e f0 1f |\`.|
00000010
\[xdb@localhost utf8db\]$ echo $((0x1ff0))
8176
upper为\\x1f60,十进制为8032
6 pd_special(2bytes)
\[xdb@localhost utf8db\]$ hexdump -C $PGDATA/base/13758/16441 -s 16 -n 2
00000010 f0 1f |..|
00000012
\[xdb@localhost utf8db\]$ echo $((0x1ff0))
8176
Special Space为\x1ff0,十进制值为8176
- pd_special – 此变量用于索引。在表格中的页面中,它指向页面的末尾。(在索引内的页面中,它指向特殊空间的开头,该空间是仅由索引保存的数据区域,并根据索引类型(如B树,GiST,GiN等)包含特定数据。
7 pd_pagesize_version(2bytes)
\[xdb@localhost utf8db\]$ hexdump -C $PGDATA/base/13758/16441 -s 18 -n 2
00000012 04 20 |. |
00000014
pagesize_version为\\x2004,十进制为8196(即版本4)
8 pd_prune_xid(4bytes)
\[xdb@localhost utf8db\]$ hexdump -C $PGDATA/base/13758/16441 -s 20 -n 4
00000014 00 00 00 00 |....|
00000018
prune_xid为\\x0000,即0
ItemIds
PageHeaderData之后是ItemId数组,每个元素占用的空间为4Bytes,数据结构:
typedef struct ItemIdData
{
unsigned lp_off:15,/* offset to tuple (from start of page) */
lp_flags:2,/* state of item pointer, see below */
lp_len:15;/* byte length of tuple */
} ItemIdData;
typedef ItemIdData* ItemId;
1 lp_off
Warning
偏移量有问题,暂不知道问题所在.
\[xdb@localhost utf8db\]$ hexdump -C $PGDATA/base/13758/16441 -s 24 -n 2
00000018 62 31 |b1|
0000001a
取低15位
\[xdb@localhost utf8db\]$ echo $((0x3162 & ~$((1<<15))))
12641
12642 + 16
表示第1个Item(tuple)从12642开始
2 lp_len
[postgres@node1 ~]$ hexdump -C $PGDATA/base/13758/16441 -s 28 -n 2
0000001c 04 00 |..|
0000001e
取高15位
\[xdb@localhost utf8db\]$ echo $((0x0004))
4
表示第1个Item(tuple)的大小为4个字节
3 lp_flags
0x9fd8=1001111111011000
004e= 1001110000000000
取第17-16位,01,即1
当lp_flags 为1时,指定 一个未使用的行指针,是否重用.
当lp_flags 为其他值时,则不会立即重用.
小结
以上简单介绍了如何阅读Raw Datafile,包括页头和数据行指针信息,有兴趣的同学可在此基础上实现自己的“pageinspect"。下一节将介绍数据行信息。