PostgreSQL Page页结构解析 B-Tree索引行的存储结构
本文简单介绍了在PG数据库B-Tree索引的物理存储内容。
环境准备
创建数据表,插入数据并创建索引。
-- 创建一张表,插入几行数据
drop table if exists t_index;
create table t_index (id int,c1 char(8),c2 varchar(16));
insert into t_index values(2,'1','a'
insert into t_index values(4,'2','b');
insert into t_index values(8,'3','c');
insert into t_index values(16,'4','d');
-- 创建索引
alter table t_index add constraint pk_t_index primary key(id);
SELECT pg_relation_filepath('pk_t_index');
索引文件raw data
[xdb@localhost utf8db]$ hexdump -C $PGDATA/base/13758/16441
00000000 00 00 00 00 08 9f 79 01 00 00 00 00 48 00 f0 1f |......y.....H...|
00000010 f0 1f 04 20 00 00 00 00 62 31 05 00 04 00 00 00 |... ....b1......|
00000020 01 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 |................|
00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f0 bf |................|
00000040 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00001ff0 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00 00 |................|
00002000 00 00 00 00 78 9e 79 01 00 00 00 00 28 00 b0 1f |....x.y.....(...|
00002010 f0 1f 04 20 00 00 00 00 e0 9f 20 00 d0 9f 20 00 |... ...... ... .|
00002020 c0 9f 20 00 b0 9f 20 00 b0 9f 20 00 00 00 00 00 |.. ... ... .....|
00002030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00003fb0 00 00 00 00 04 00 10 00 10 00 00 00 00 00 00 00 |................|
00003fc0 00 00 00 00 03 00 10 00 08 00 00 00 00 00 00 00 |................|
00003fd0 00 00 00 00 02 00 10 00 04 00 00 00 00 00 00 00 |................|
00003fe0 00 00 00 00 01 00 10 00 02 00 00 00 00 00 00 00 |................|
00003ff0 00 00 00 00 00 00 00 00 00 00 00 00 03 00 00 00 |................|
00004000
B-Tree索引物理存储结构
我们可以通过pageinspect插件查看索引的PostgreSQL page页结构解析 解析方法#pageinspect插件
Page 0是索引元数据页:
select * from page_header(get_raw_page('pk_t_index',0));
select * from bt_metap('pk_t_index');
root=1提示root页在第1页,通过page_header查看页头数据:
testdb=# select * from page_header(get_raw_page('pk_t_index',1));
每个索引entries结构为IndexTupleData+Bitmap+Value,其中IndexTupleData占8个字节,Bitmap占4个字节,Value占4字节,合计占用16个字节,数据结构如下:
index tuple 组成
组成部分 | 字节 |
---|---|
InexTupleData | 8 bytes |
Bitmap | 4 bytes |
Value | 4 bytes |
InexTupleData 组成
组成部分 | 字节 |
---|---|
t_tid | 6 bytes |
t_info | 2 bytes |
t_tid组成 |
组成部分 | 字节 |
---|---|
ip_blkid | 4 bytes |
t_info | 2 bytes |
--------------------------------------itup.h-----------------------------------------
/*
* Index tuple header structure
*
* All index tuples start with IndexTupleData. If the HasNulls bit is set,
* this is followed by an IndexAttributeBitMapData. The index attribute
* values follow, beginning at a MAXALIGN boundary.
*
* Note that the space allocated for the bitmap does not vary with the number
* of attributes; that is because we don't have room to store the number of
* attributes in the header. Given the MAXALIGN constraint there's no space
* savings to be had anyway, for usual values of INDEX_MAX_KEYS.
typedef struct IndexTupleData
{
ItemPointerData t_tid; /* reference TID to heap tuple */
/* ---------------
* t_info is laid out in the following fashion:
*
* 15th (high) bit: has nulls
* 14th bit: has var-width attributes
* 13th bit: AM-defined meaning
* 12-0 bit: size of tuple
* ---------------
*/
unsigned short t_info; /* various info about tuple */
} IndexTupleData; /* MORE DATA FOLLOWS AT END OF STRUCT */
typedef IndexTupleData *IndexTuple;
typedef struct IndexAttributeBitMapData
{
bits8 bits[(INDEX_MAX_KEYS + 8 - 1) / 8];
} IndexAttributeBitMapData;
typedef IndexAttributeBitMapData * IndexAttributeBitMap;
// 说明: (32 + 8 - 1 /8)=4 ,表示4 个8位二进制,也就是4 bytes.
--------------------------------itemptr.h----------------------------------------
typedef struct ItemPointerData
{
BlockIdData ip_blkid;
OffsetNumber ip_posid;
}
/* If compiler understands packed and aligned pragmas, use those */
#if defined(pg_attribute_packed) && defined(pg_attribute_aligned)
pg_attribute_packed()
pg_attribute_aligned(2)
#endif
ItemPointerData;
----------------------------------block.h----------------------------------------------
typedef struct BlockIdData
{
uint16 bi_hi;
uint16 bi_lo;
} BlockIdData;
----------------------------------off.h------------------------------------------------
typedef uint16 OffsetNumber;
通过bt_page_items函数查看索引entries:
testdb=# select * from bt_page_items('pk_t_index',1);
itemoffset | ctid | itemlen | nulls | vars | data | dead | htid | tids
------------+-------+---------+-------+------+-------------------------+------+-------+------
1 | (0,1) | 16 | f | f | 02 00 00 00 00 00 00 00 | f | (0,1) |
2 | (0,2) | 16 | f | f | 04 00 00 00 00 00 00 00 | f | (0,2) |
3 | (0,3) | 16 | f | f | 08 00 00 00 00 00 00 00 | f | (0,3) |
4 | (0,4) | 16 | f | f | 10 00 00 00 00 00 00 00 | f | (0,4) |
相应的物理索引文件内容:
[xdb@localhost utf8db]$ hexdump -C $PGDATA/base/13758/16441
00000000 00 00 00 00 08 9f 79 01 00 00 00 00 48 00 f0 1f |......y.....H...|
00000010 f0 1f 04 20 00 00 00 00 62 31 05 00 04 00 00 00 |... ....b1......|
00000020 01 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 |................|
00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f0 bf |................|
00000040 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00001ff0 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00 00 |................|
00002000 00 00 00 00 78 9e 79 01 00 00 00 00 28 00 b0 1f |....x.y.....(...|
00002010 f0 1f 04 20 00 00 00 00 e0 9f 20 00 d0 9f 20 00 |... ...... ... .|
00002020 c0 9f 20 00 b0 9f 20 00 b0 9f 20 00 00 00 00 00 |.. ... ... .....|
00002030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00003fb0 00 00 00 00 04 00 10 00 10 00 00 00 00 00 00 00 |................|
00003fc0 00 00 00 00 03 00 10 00 08 00 00 00 00 00 00 00 |................|
00003fd0 00 00 00 00 02 00 10 00 04 00 00 00 00 00 00 00 |................|
00003fe0 00 00 00 00 01 00 10 00 02 00 00 00 00 00 00 00 |................|
00003ff0 00 00 00 00 00 00 00 00 00 00 00 00 03 00 00 00 |................|
00004000
查找index tuple 开始位置
echo $((0x00003fb0))
16304
ItemPointerData
[xdb@localhost utf8db]$ hexdump -C $PGDATA/base/13758/16441 -s 16304 -n 6
00003fb0 00 00 00 00 04 00 |......|
00003fb6
-- blockid=\x0000,offset=\x0004
t_info
[xdb@localhost utf8db]$ hexdump -C $PGDATA/base/13758/16441 -s 16310 -n 2
00003fb6 10 00 |..|
00003fb8
t_info=\x0010,即16,表示tuple(索引项)大小为16个字
values
select * from t_index ;
id | c1 | c2
----+----------+----
2 | 1 | a
4 | 2 | b
8 | 3 | c
16 | 4 | d
(4 rows)
[postgres@node1 ~]$ hexdump -C $PGDATA/base/13758/16441 -s 16312 -n 4
00003fb8 10 00 00 00 |....|
00003fbc
values=\x0010 ,即16 ,表示(0,4)的ID 值为16 .
小结
小结一下,主要有以下几点:
1、数据存储:索引数据页头和与普通数据表页头一样的结构,占用24个字节,ItemIds占用4个字节;
2、索引entries:结构为IndexTupleData+Bitmap+Value;
3、内容查看:可通过pageinspect插件,推荐通过hexdump物理文件查看,有助于理解数据结构和数据的底层存储格式.
4、 根据t_info 里面算出values 的长度。