diff --git a/doc/src/sgml/bki.sgml b/doc/src/sgml/bki.sgml index 33378b4..1e7915e 100644 *** a/doc/src/sgml/bki.sgml --- b/doc/src/sgml/bki.sgml *************** *** 1,38 **** ! <acronym>BKI</acronym> Backend Interface ! Backend Interface (BKI) files are scripts in a ! special language that is understood by the ! PostgreSQL backend when running in the ! bootstrap mode. The bootstrap mode allows system catalogs ! to be created and filled from scratch, whereas ordinary SQL commands ! require the catalogs to exist already. ! BKI files can therefore be used to create the ! database system in the first place. (And they are probably not ! useful for anything else.) ! initdb uses a BKI file ! to do part of its job when creating a new database cluster. The ! input file used by initdb is created as ! part of building and installing PostgreSQL ! by a program named genbki.pl, which reads some ! specially formatted C header files in the src/include/catalog/ ! directory of the source tree. The created BKI file ! is called postgres.bki and is ! normally installed in the ! share subdirectory of the installation tree. ! Related information can be found in the documentation for ! initdb. <acronym>BKI</acronym> File Format --- 1,641 ---- ! System Catalog Declarations and Initial Contents ! PostgreSQL uses many different system catalogs ! to keep track of the existence and properties of database objects, such as ! tables and functions. Physically there is no difference between a system ! catalog and a plain user table, but the backend C code knows the structure ! and properties of each catalog, and can manipulate it directly at a low ! level. Thus, for example, it is inadvisable to attempt to alter the ! structure of a catalog on-the-fly; that would break assumptions built into ! the C code about how rows of the catalog are laid out. But developers ! can change the structure of catalogs in new major versions. ! The structures of the catalogs are declared in specially formatted C ! 
header files in the src/include/catalog/ directory of ! the source tree. In particular, for each catalog there is a header file ! named after the catalog (e.g., pg_class.h ! for pg_class), which defines the set of columns ! the catalog has, as well as some other basic properties such as its OID. ! Other critical files defining the catalog structure ! include indexing.h, which defines the indexes present ! on all the system catalogs, and toasting.h, which ! defines TOAST tables for catalogs that need one. ! Many of the catalogs have initial data that must be loaded into them ! during the bootstrap phase ! of initdb, to bring the system up to a point ! where it is capable of executing SQL commands. (For ! example, pg_class must contain an entry for itself, ! as well as one for each other system catalog and index.) Much of this ! initial data is kept in editable form in data files that are also stored ! in the src/include/catalog/ directory. For example, ! pg_proc.dat describes all the initial rows that must ! be inserted into the pg_proc catalog. + + To create the catalog files and load this initial data into them, a + backend running in bootstrap mode reads a BKI + (Backend Interface) file containing commands and initial data. + The postgres.bki file used in this mode is prepared + from the aforementioned header and data files, by a Perl script + named genbki.pl, while building + a PostgreSQL distribution. + Although it's specific to a particular PostgreSQL + release, postgres.bki is platform-independent and is + normally installed in the share subdirectory of the + installation tree. + + + + genbki.pl also produces a derived header file for + each catalog, for example pg_class_d.h for + the pg_class catalog. This file contains + automatically-generated macro definitions, and may contain other macros, + enum declarations, and so on that can be useful for C code that reads a + particular catalog. 
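As a rough illustration of the kind of automatically generated macro definitions a derived header can carry, the following Python sketch numbers a catalog's columns from 1 and emits one #define per column plus a column count. It is not genbki.pl's actual code. The function name derived_header_macros, the shortened column list, and the output layout are assumptions made for illustration, and the Anum_/Natts_ prefixes are modeled on attribute-number macros rather than taken from the text above.

```python
def derived_header_macros(catalog, columns):
    """Return C #define lines that number a catalog's columns from 1.

    Illustrative sketch only: the macro naming is an assumption and this
    is not how genbki.pl is actually implemented.
    """
    lines = []
    for number, column in enumerate(columns, start=1):
        # One macro per column, giving its 1-based position in the row.
        lines.append(f"#define Anum_{catalog}_{column} {number}")
    # A final macro recording how many columns the catalog has.
    lines.append(f"#define Natts_{catalog} {len(columns)}")
    return lines


if __name__ == "__main__":
    # Hypothetical, shortened column list; a real catalog has many more.
    for line in derived_header_macros("pg_class", ["relname", "relnamespace", "reltype"]):
        print(line)
```

Because such definitions are derived mechanically from the catalog header, C code can refer to columns by macro name instead of hard-coding positions.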
+ + + + Most developers don't need to be directly concerned with + the BKI file, but almost any nontrivial feature + addition in the backend will require modifying the catalog header files + and/or initial data files. The rest of this chapter gives some + information about that, and for completeness describes + the BKI file format. + + + + System Catalog Declaration Rules + + + The key part of a catalog header file is a C structure definition + describing the layout of each row of the catalog. This begins with + a CATALOG macro, which so far as the C compiler is + concerned is just shorthand for typedef struct + FormData_catalogname. + Each field in the struct gives rise to a catalog column. + Fields can be annotated using the BKI property macros described + in genbki.h, to define a default value, mark the + field as nullable or not nullable, or specify a lookup rule that allows + OID values to be represented symbolically in the + corresponding .dat file. + The CATALOG line can also be annotated, with some + other BKI property macros described in genbki.h, to + define other properties of the catalog as a whole, such as whether + it has OIDs (by default, it does). + + + + The system catalog cache code (and most catalog-munging code in general) + assumes that the fixed-length portions of all system catalog tuples are + in fact present, because it maps this C struct declaration onto them. + Thus, all variable-length fields and nullable fields must be placed at + the end, and they cannot be accessed as struct fields. + For example, if you tried to + set pg_type.typrelid + to be NULL, it would fail when some piece of code tried to reference + typetup->typrelid (or worse, + typetup->typelem, because that follows + typrelid). This would result in + random errors or even segmentation violations. + + + + As a partial guard against this type of error, variable-length or + nullable fields should not be made directly visible to the C compiler. 
+ This is accomplished by wrapping them in #ifdef + CATALOG_VARLEN ... #endif. This prevents C + code from carelessly trying to dereference fields that might not be + there. As an independent guard against creating incorrect rows, we + require that all columns that should be non-nullable are marked so + in pg_attribute. The bootstrap code will + automatically mark catalog columns as NOT NULL + if they are fixed-width and are not preceded by any nullable column. + Where this rule is inadequate, you can force correct marking by using + BKI_FORCE_NOT_NULL + and BKI_FORCE_NULL annotations as needed. But note + that NOT NULL constraints are only enforced in the + executor, not against tuples that are generated by random C code, + so care is still needed when manually creating or updating catalog rows. + + + + Frontend code should not include any pg_xxx.h + catalog header file, as these files may contain C code that won't compile + outside the backend. (Typically, that happens because these files also + contain declarations for functions + in src/backend/catalog/ files.) + Instead, frontend code may include the corresponding + generated pg_xxx_d.h header, which will contain + OID #defines and any other data that might be of use + on the client side. If you want macros or other code in a catalog header + to be visible to frontend code, write #ifdef + EXPOSE_TO_CLIENT_CODE ... #endif around that + section to instruct genbki.pl to copy that section + to the pg_xxx_d.h header. + + + + A few of the catalogs are so fundamental that they can't even be created + by the BKI create command that's + used for most catalogs, because that command needs to write information + into these catalogs to describe the new catalog. 
These are + called bootstrap catalogs, and defining one takes + a lot of extra work: you have to manually prepare appropriate entries for + them in the pre-loaded contents of pg_class + and pg_type, and those entries will need to be + updated for subsequent changes to the catalog's structure. + (Bootstrap catalogs also need pre-loaded entries + in pg_attribute, but + fortunately genbki.pl handles that chore nowadays.) + Avoid making new catalogs be bootstrap catalogs if at all possible. + + + + + System Catalog Initial Data + + + Each catalog that has any manually-created initial data (some do not) + has a corresponding .dat file that contains its + initial data in an editable format. + + + + Data File Format + + + Each .dat file contains Perl data structure literals + that are simply eval'd to produce an in-memory data structure consisting + of an array of hash references, one per catalog row. + A slightly modified excerpt from pg_database.dat + will demonstrate the key features: + + + + [ + + # LC_COLLATE and LC_CTYPE will be replaced at initdb time with user choices + # that might contain non-word characters, so we must double-quote them. + + { oid => '1', oid_symbol => 'TemplateDbOid', + descr => 'database\'s default template', + datname => 'template1', datdba => 'PGUID', encoding => 'ENCODING', + datcollate => '"LC_COLLATE"', datctype => '"LC_CTYPE"', datistemplate => 't', + datallowconn => 't', datconnlimit => '-1', datlastsysoid => '0', + datfrozenxid => '0', datminmxid => '1', dattablespace => '1663', + datacl => '_null_' }, + + ] + + + + Points to note: + + + + + + + The overall file layout is: open square bracket, one or more sets of + curly braces each of which represents a catalog row, close square + bracket. Write a comma after each closing curly brace. + + + + + + Within each catalog row, write comma-separated + key => + value pairs. The + allowed keys are the names of the catalog's + columns, plus the metadata keys oid, + oid_symbol, and descr. 
+ (The use of oid and oid_symbol + is described in + the OID Assignment section below. descr supplies a description string for + the object, which will be inserted + into pg_description + or pg_shdescription as appropriate.) + While the metadata keys are optional, the catalog's defined columns + must all be provided, except when the catalog's .h + file specifies a default value for the column. + + + + + + All values must be single-quoted. Escape single quotes used within + a value with a backslash. (Backslashes meant as data need not be + doubled, however; this follows Perl's rules for simple quoted + literals.) + + + + + + Null values are represented by _null_. + + + + + + If a value is a macro to be expanded + by initdb, it should also contain double + quotes as shown above, unless we know that no special characters can + appear within the string that will be substituted. + + + + + + Comments are preceded by #, and must be on their + own lines. + + + + + + To aid readability, field values that are OIDs of other catalog + entries can be represented by names rather than numeric OIDs. + This is described in + the OID Reference Lookup section below. + + + + + + Since hashes are unordered data structures, field order and line + layout aren't semantically significant. However, to maintain a + consistent appearance, we set a few rules that are applied by the + formatting script reformat_dat_file.pl: + + + + + + Within each pair of curly braces, the metadata + fields oid, oid_symbol, + and descr (if present) come first, in that + order, then the catalog's own fields appear in their defined order. + + + + + + Newlines are inserted between fields as needed to limit line length + to 80 characters, if possible. A newline is also inserted between + the metadata fields and the regular fields. + + + + + + If the catalog's .h file specifies a default + value for a column, and a data entry has that same + value, reformat_dat_file.pl will omit it from + the data file. This keeps the data representation compact. 
+ + + + + + reformat_dat_file.pl preserves blank lines + and comment lines as-is. + + + + + + It's recommended to run reformat_dat_file.pl + before submitting catalog data patches. For convenience, you can + simply change to src/include/catalog/ and + run make reformat-dat-files. + That script can also be modified to perform bulk editing, as + described in the Recipes for Editing Data Files section below. + + + + + + If you want to add a new method of making the data representation + smaller, you must implement it + in reformat_dat_file.pl and also + teach Catalog::ParseData() how to expand the + data back into the full representation. + + + + + + + + OID Assignment + + + A catalog row appearing in the initial data can be given a + manually-assigned OID by writing an oid + => nnnn metadata field. + Furthermore, if an OID is assigned, a C macro for that OID can be + created by writing an oid_symbol + => name metadata field. + + + + Pre-loaded catalog rows must have preassigned OIDs if there are OID + references to them in other pre-loaded rows. A preassigned OID is + also needed if the row's OID must be referenced from C code. + If neither case applies, the oid metadata field can + be omitted, in which case the bootstrap code assigns an OID + automatically, or leaves it zero in a catalog that has no OIDs. + In practice we usually preassign OIDs for all or none of the pre-loaded + rows in a given catalog, even if only some of them are actually + cross-referenced. + + + + Writing the actual numeric value of any OID in C code is considered + very bad form; always use a macro, instead. Direct references + to pg_proc OIDs are common enough that there's + a special mechanism to create the necessary macros automatically; + see src/backend/utils/Gen_fmgrtab.pl. Similarly + (but, for historical reasons, not done the same way) + there's an automatic method for creating macros + for pg_type + OIDs. oid_symbol entries are therefore not + necessary in those two catalogs. 
Likewise, macros for + the pg_class OIDs of system catalogs and + indexes are set up automatically. For all other system catalogs, you + have to manually specify any macros you need + via oid_symbol entries. + + + + To find an available OID for a new pre-loaded row, run the + script src/include/catalog/unused_oids. + It prints inclusive ranges of unused OIDs (e.g., the output + line 45-900 means OIDs 45 through 900 have not been + allocated yet). Currently, OIDs 1-9999 are reserved for manual + assignment; the unused_oids script simply looks + through the catalog headers and .dat files + to see which ones do not appear. You can also use + the duplicate_oids script to check for mistakes. + (That script is run automatically at compile time, and will stop the + build if a duplicate is found.) + + + + The OID counter starts at 10000 at the beginning of a bootstrap run. + If a catalog row is in a table that requires OIDs, but no OID was + preassigned by an oid field, then it will + receive an OID of 10000 or above. + + + + + OID Reference Lookup + + + Cross-references from one initial catalog row to another can be written + by just writing the preassigned OID of the referenced row. But + that's error-prone and hard to understand, so for frequently-referenced + catalogs, genbki.pl provides mechanisms to write + symbolic references instead. Currently this is possible for references + to access methods, functions, operators, opclasses, opfamilies, and + types. The rules are as follows: + + + + + + + Use of symbolic references is enabled in a particular catalog column + by attaching BKI_LOOKUP(lookuprule) + to the column's definition, where lookuprule + is pg_am, pg_proc, + pg_operator, + pg_opclass, + pg_opfamily, + or pg_type. + BKI_LOOKUP can be attached to columns of + type Oid, regproc, oidvector, + or Oid[]; in the latter two cases it implies performing a + lookup on each element of the array. 
+ + + + + + In such a column, all entries must use the symbolic format except + when writing 0 for InvalidOid. (If the column is + declared regproc, you can optionally + write - instead of 0.) + genbki.pl will warn about unrecognized names. + + + + + + Access methods are just represented by their names, as are types. + Type names must match the referenced pg_type + entry's typname; you do not get to use any + aliases such as integer + for int4. + + + + + + A function can be represented by + its proname, if that is unique among + the pg_proc.dat entries (this works like regproc + input). Otherwise, write it + as proname(argtypename,argtypename,...), + like regprocedure. The argument type names must be spelled exactly as + they are in the pg_proc.dat entry's + proargtypes field. Do not insert any + spaces. + + + + + + Operators are represented + by oprname(lefttype,righttype), + writing the type names exactly as they appear in + the pg_operator.dat + entry's oprleft + and oprright fields. + (Write 0 for the omitted operand of a unary operator.) + + + + + + The names of opclasses and opfamilies are only unique within an + access method, so they are represented + by access_method_name/object_name. + + + + + + In none of these cases is there any provision for + schema-qualification; all objects created during bootstrap are + expected to be in the pg_catalog schema. + + + + + + + Recipes for Editing Data Files + + + Here are some suggestions about the easiest ways to perform common tasks + when updating catalog data files. + + + + Add a new column with a default to a catalog: + + Add the column to the header file with + a BKI_DEFAULT(value) + annotation. The data file need only be adjusted by adding the field + in existing rows where a non-default value is needed. + + + + + Add a default value to an existing column that doesn't have + one: + + Add a BKI_DEFAULT annotation to the header file, + then run make reformat-dat-files to remove + now-redundant field entries. 
+ + + + + Remove a column, whether it has a default or not: + + Remove the column from the header, then run make + reformat-dat-files to remove now-useless field entries. + + + + + Change or remove an existing default value: + + You cannot simply change the header file, since that will cause the + current data to be interpreted incorrectly. First run make + expand-dat-files to rewrite the data files with all + default values inserted explicitly, then change or remove + the BKI_DEFAULT annotation, then run make + reformat-dat-files to remove superfluous fields again. + + + + + Ad-hoc bulk editing: + + reformat_dat_file.pl can be adapted to perform + many kinds of bulk changes. Look for its block comments showing where + one-off code can be inserted. In the following example, we are going + to consolidate two boolean fields in pg_proc + into a char field: + + + + + Add the new column, with a default, + to pg_proc.h: + + + /* see PROKIND_ categories below */ + + char prokind BKI_DEFAULT(f); + + + + + + + Create a new script based on reformat_dat_file.pl + to insert appropriate values on-the-fly: + + - # At this point we have the full row in memory as a hash + - # and can do any operations we want. As written, it only + - # removes default values, but this script can be adapted to + - # do one-off bulk-editing. + + # One-off change to migrate to prokind + + # Default has already been filled in by now, so change to other + + # values as appropriate + + if ($values{proisagg} eq 't') + + { + + $values{prokind} = 'a'; + + } + + elsif ($values{proiswindow} eq 't') + + { + + $values{prokind} = 'w'; + + } + + + + + + + Run the new script: + + $ cd src/include/catalog + $ perl -I ../../backend/catalog rewrite_dat_with_prokind.pl pg_proc.dat + + At this point pg_proc.dat has all three + columns, prokind, + proisagg, + and proiswindow, though they will appear + only in rows where they have non-default values. 
+ + + + + + Remove the old columns from pg_proc.h: + + - /* is it an aggregate? */ + - bool proisagg BKI_DEFAULT(f); + - + - /* is it a window function? */ + - bool proiswindow BKI_DEFAULT(f); + + + + + + + Finally, run make reformat-dat-files to remove + the useless old entries from pg_proc.dat. + + + + + For further examples of scripts used for bulk editing, see + convert_oid2name.pl + and remove_pg_type_oid_symbols.pl attached to this + message: + + + + + + <acronym>BKI</acronym> File Format *************** *** 340,346 **** ! Example The following sequence of commands will create the --- 943,949 ---- ! BKI Example The following sequence of commands will create the