diff --git a/doc/src/sgml/advanced.sgml b/doc/src/sgml/advanced.sgml index 2d4ab85d45..5c3245c0ec 100644 --- a/doc/src/sgml/advanced.sgml +++ b/doc/src/sgml/advanced.sgml @@ -1,7 +1,7 @@ - Advanced Features + Advanced SQL Features Introduction diff --git a/doc/src/sgml/arch-dev.sgml b/doc/src/sgml/arch-dev.sgml index 7883c3cd82..9db0ae2c78 100644 --- a/doc/src/sgml/arch-dev.sgml +++ b/doc/src/sgml/arch-dev.sgml @@ -1,7 +1,7 @@ - Overview of PostgreSQL Internals + Overview of Query Handling Author diff --git a/doc/src/sgml/architecture.sgml b/doc/src/sgml/architecture.sgml new file mode 100644 index 0000000000..e547a87d08 --- /dev/null +++ b/doc/src/sgml/architecture.sgml @@ -0,0 +1,1517 @@ + + + + Overview of Architecture and Implementation + + + Every DBMS implements basic strategies to ensure a fast + and robust system. This chapter provides an overview of the + techniques PostgreSQL uses to + achieve this. + + + + Collaboration of Processes, RAM, and Files + + In a client/server architecture, clients do not have direct access + to database files and the data stored in them. Instead, they send + requests to the server and receive the requested data in the response. + In the case of PostgreSQL, the server + launches a single process for each client connection, referred to as a + Backend process. + Those Backend processes handle the client's requests by acting on the + Shared Memory. + This leads to other activities (file access, WAL, vacuum, ...) of the + Instance. The + Instance is a group of server-side processes acting on a common + Shared Memory. Notably, PostgreSQL does not utilize application + threading within its implementation. + + + + The first step in an Instance start is the start of the + Postmaster. + It loads the configuration files, allocates Shared Memory, and + starts the other processes of the Instance: + Background Writer, + Checkpointer, + WAL Writer, + WAL Archiver, + Autovacuum, + Statistics Collector, + Logger, and more. 
+ Later, the Postmaster starts + Backend processes + which communicate with clients and handle their requests. + visualizes the processes + of an Instance and the main aspects of their collaboration. + + +
+ Architecture + + + + + + + + + + +
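+ The processes named above can be observed from any SQL session. The following is a minimal sketch, assuming PostgreSQL 10 or later, where the pg_stat_activity view has a backend_type column. The rows returned depend on the configuration, and not every process of the Instance necessarily appears in this view.

```sql
-- List the server processes that report themselves in pg_stat_activity.
-- Client connections show up as 'client backend'; the other rows are
-- auxiliary processes such as the checkpointer or the WAL writer.
SELECT pid, backend_type, state
FROM pg_stat_activity
ORDER BY backend_type, pid;
```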
+ + + When a client application tries to connect to a + database, + this request is handled initially by the Postmaster. It + starts a new Backend process and instructs the client + application to connect to it. All further client requests + go to this process and are handled by it. + + + + Client requests like SELECT or + UPDATE usually lead to the + necessity to read or write some data. This is carried out + by the client's backend process. Reads involve a page-level + cache housed in Shared Memory (for details see: + ) for the benefit of all processes + in the instance. Writes also involve this cache, in addition + to a journal, called a write-ahead-log or WAL. + + + + Shared Memory is limited in size. Thus, it becomes necessary + to evict pages. As long as the content of such pages hasn't + changed, this is not a problem. But write actions also + take place in Shared Memory. Modified pages are called dirty + pages or dirty buffers and before they can be evicted they + must be written back to disk. This is done regularly by the + Background Writer and the Checkpointer process to ensure + that the disk versions of the pages are kept up-to-date. + The synchronisation from RAM to disk consists of two steps. + + + + + + First, whenever the content of a page changes, a + WAL record + is created out of the delta-information (difference between the + old and the new content) and stored in another area of + Shared Memory. The WAL Writer process, which runs in parallel, + reads these records and appends them to the end of the current + WAL file. + Such sequential writes are much faster than writes to random + positions of heap and index files. All WAL records created + out of one dirty page must be transferred to disk before the + dirty page itself can be transferred to disk in the second step. + + + + Second, the transfer of dirty buffers from Shared Memory to + files must take place. This is the primary task of the + Background Writer process. 
Because I/O activities can block + other processes significantly, it starts periodically and + acts only for a short period. Doing so, its extensive (and + expensive) I/O activities are spread over time, avoiding + debilitating I/O peaks. Also, the Checkpointer process + transfers dirty buffers to file. + + + + The Checkpointer creates + Checkpoints. + A Checkpoint is a point in time when all older dirty buffers, + all older WAL records, and finally a special Checkpoint record + have been written and flushed to disk. Heap and index files + on the one hand and WAL files on the other hand are in sync. + Previous WAL is no longer required. In other words, + a possibly occurring recovery, which integrates the delta + information of WAL into heap and index files, will happen + by replaying only WAL past the last recorded checkpoint + on top of the current heap and files. This speeds up recovery. + + + + While the Checkpointer ensures that a running system can crash + and restart itself in a valid state, the administrator needs + to handle the case where the heap and files themselves become + corrupted (and possibly the locally written WAL, though that is + less common). The options and details are covered extensively + in the backup and restore section (). + For our purposes here, note just that the WAL Archiver process + can be enabled and configured to run a script on filled WAL + files — usually to copy them to a remote location. + + + + + + + The Statistics Collector collects counters about accesses to + SQL objects like tables, rows, indexes, pages, and more. It + stores the obtained information in system tables. + + + + The Logger writes text lines about serious and less serious + events which can happen during database access, e.g., wrong + password, no permission, long-running queries, etc. + + +
+ + + The logical Perspective: Cluster, Database, Schema + + + A server contains one or more + database clusters + (clusters + for short). Each cluster contains three or more + databases. + Each database can contain many + schemas. + A schema can contain + tables, + views, and a lot + of other objects. Each table or view belongs to a single schema + only; they cannot belong to another schema as well. The same is + true for the schema/database and database/cluster relation. + visualizes + this hierarchy. + + +
+ Cluster, Database, Schema + + + + + + + + + + +
+ + + A cluster is the outer container for a + collection of databases. Clusters are created by the command + . + + + + template0 is the very first + database of any cluster. Database template0 + is created during the initialization phase of the cluster. + In a second step, database template1 is generated + as a copy of template0, and finally database + postgres is generated as a copy of + template1. Any + new databases + of the cluster that a user might need, + such as my_db, will be copied from the + template1 database. Due to the unique + role of template0 as the pristine original + of all other databases, no client can connect to it. + + + + Every database must contain at least one schema because all + SQL Objects + are contained in a schema. + Schemas are namespaces for their SQL objects and ensure + (with one exception) that within their scope names are used + only once across all types of SQL objects. E.g., it is not possible + to have a table employee and a view + employee within the same schema. But it is + possible to have two tables employee in + different schemas. In this case, the two tables + are separate objects and independent of each + other. The only exception to this cross-type uniqueness is that + unique constraints + and the corresponding unique index + () use the same name. + + + + Some schemas are predefined. public + acts as the default schema and contains all SQL objects + which are created within public or + without using an explicit schema name. public + should not contain user-defined SQL objects. Instead, it is + recommended to create a separate schema that holds individual + objects like application-specific tables or views. + pg_catalog is a schema for all tables and views of the + System Catalog. + information_schema is a schema for several + tables and views of the System Catalog in a way that conforms + to the SQL standard. 
+ + + + There are many different SQL object + types: database, schema, table, view, materialized + view, index, constraint, sequence, function, procedure, + trigger, role, data type, operator, tablespace, extension, + foreign data wrapper, and more. A few of them, the + Global SQL Objects, are outside of the + strict hierarchy: All database names, + all tablespace names, and all + role names are automatically known and + available throughout the cluster, independent of + the database or schema in which they were defined originally. + + shows the relation between the object types. + + +
+ Hierarchy of Internal Objects + + + + + + + + + + +
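+ The namespace rules described in this section can be tried out directly. The following is a minimal sketch; the schema names sales and hr are hypothetical and chosen only for illustration.

```sql
-- Two schemas (hypothetical names) in the current database.
CREATE SCHEMA sales;
CREATE SCHEMA hr;

-- The same table name may be used once per schema; the two tables
-- are independent objects.
CREATE TABLE sales.employee (id integer, name text);
CREATE TABLE hr.employee    (id integer, name text);

-- A view with the same name in the same schema is rejected, because
-- tables and views share one namespace. Uncomment to see the error:
-- CREATE VIEW hr.employee AS SELECT 1 AS id;

-- Databases and roles are global objects, visible from every database
-- of the cluster.
SELECT datname FROM pg_database ORDER BY datname;
SELECT rolname FROM pg_roles ORDER BY rolname;

-- Clean up the demo objects.
DROP SCHEMA sales CASCADE;
DROP SCHEMA hr CASCADE;
```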
+ +
+ + + The physical Perspective: Directories and Files + + + PostgreSQL organizes long-lasting + data as well as volatile state information about transactions + or replication actions in the file system. Every + has its root directory + somewhere in the file system. In many cases, the environment + variable PGDATA points to this directory. + The example shown in + uses + data as the name of this root directory. + + +
+ Directory Structure + + + + + + + + + + +
+ + + data contains many subdirectories and + some files, all of which are necessary to store long-lasting + as well as temporary data. The following paragraphs + describe the files and subdirectories in + data. + + + + base is a subdirectory in which one + subdirectory per database exists. The names of those + subdirectories consist of numbers. These are the internal + Object Identifiers (OID), which are numbers to identify + the database definition in the + System Catalog. + + + + Within the database-specific + subdirectories, there are many files: one or more for + every table and every index to store heap and index + data. Those files are accompanied by files for the + Free Space Maps + (extension _fsm) and + Visibility Maps + (extension _vm), which contain optimization information. + + + + Another subdirectory is global. + In analogy to the database-specific + subdirectories, there are files containing information about + Global SQL Objects. + One type of such Global SQL Objects are + tablespaces. + In global there is information about + the tablespaces, not the tablespaces themselves. + + + + The subdirectory pg_wal contains the + WAL files. + They arise and grow parallel to data changes in the + cluster and remain alive as long as + they are required for recovery, archiving, or replication. + + + + The subdirectory pg_xact contains + information about the status of each transaction: + in_progress, committed, + aborted, or sub_committed. + + + + In pg_tblspc, there are symbolic links + that point to directories containing such SQL objects + that are created within tablespaces. + + + + In the root directory data + there are also some files. In many cases, the configuration + files of the cluster are stored here. As long as the + instance is up and running, the file + postmaster.pid exists here + and contains the process ID (pid) of the + Postmaster which has started the instance. 
+ + + + For more details about the physical implementation + of database objects, see . + + +
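+ The mapping between SQL objects and the directories described above can be inspected from a SQL session. The following is a minimal sketch using catalog tables and built-in functions; reading data_directory usually requires superuser rights or membership in a role that may read all settings.

```sql
-- Root directory of the cluster (the 'data' directory of the example).
SHOW data_directory;

-- The OID of each database is the name of its subdirectory below 'base'.
SELECT oid, datname FROM pg_database ORDER BY oid;

-- Path of the first heap file of a table, relative to the root directory.
SELECT pg_relation_filepath('pg_class');

-- Tablespaces are global objects; their definitions live in 'global'.
SELECT oid, spcname FROM pg_tablespace;
```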
+ + + MVCC — Multiversion Concurrency Control + + + In most cases, PostgreSQL databases + support many clients at the same time. Therefore, it is necessary to + protect concurrently running requests from unwanted overwriting + of others' data as well as from reading inconsistent data. Imagine an + online shop offering the last copy of an article. Two clients have the + article displayed in their user interface. After a while, but at the same time, + both users decide to put it in their shopping cart or even to buy it. + Both have seen the article, but only one can be allowed to get it. + The database must serialize the two requests, permit the access + to one of them, block the other, and inform the blocked client + that the data was changed by a different process. + + + + A first approach to implement protections against concurrent + accesses to the same data may be the locking of critical + rows. Two such techniques are: + Optimistic Concurrency Control (OCC) + and Two Phase Locking (2PL). + PostgreSQL implements a third, more + sophisticated technique: Multiversion Concurrency + Control (MVCC). The crucial advantage of MVCC + over other technologies becomes evident in multiuser OLTP + environments with a massive number of concurrent write + actions. There, MVCC generally performs better than solutions + using locks. In a PostgreSQL + database reading never blocks writing and writing never + blocks reading, even in the strictest level of transaction + isolation. + + + + Instead of locking rows, the MVCC technique creates + a new version of the row when a data-change takes place. To + distinguish between these two versions and to track the timeline + of the row, each of the versions contains, in addition to its user-defined + columns, two special system columns, which are not visible + for the usual SELECT * FROM ... command. + The column xmin contains the transaction ID (xid) + of the transaction that created this version of the row. 
Accordingly, + xmax contains the xid of the transaction, which has + deleted this version, or zero, if the version is not + deleted. You can read both with the command + SELECT xmin, xmax, * FROM ... . + + + + When we speak about transaction IDs, you need to know that xids are like + sequences. Every new transaction receives the next number as its ID. + Therefore, this flow of xids represents the flow of transaction + start events over time. But keep in mind that xids are independent of + any time measurement — in milliseconds or whatever. If you dive + deeper into PostgreSQL, you will recognize + parameters with names such as 'xxx_age'. Despite their names, + these '_age' parameters do not specify a period of time but represent + a certain number of transactions, e.g., 100 million. + + + + The description in this chapter simplifies by omitting some details. + When many transactions are running simultaneously, things can + get complicated. Sometimes transactions get aborted via + ROLLBACK immediately or after a lot of other activities, sometimes + a single row is involved in more than one transaction, sometimes + a client crashes, sometimes the sequence of xids restarts + from zero, ... . Therefore, every version of a row contains more + system columns and flags, not only xmin + and xmax. + + + + So, what's going on in detail when write accesses take place? + shows details concerning + xmin, xmax, and user data. + + +
+ Multiversion Concurrency Control + + + + + + + + + +
+ + + An INSERT command creates the first + version of a row. Besides its user data 'x', + this version contains the ID of the creating transaction + 123 in xmin and + 0 in xmax. + xmin indicates that the version + exists since transaction 123 and + xmax that it is currently not deleted. + + + + Somewhat later, transaction 135 + executes an UPDATE of this row by + changing the user data from 'x' to + 'y'. According to the MVCC principles, + the data in the old version of the row does not change! + The value 'x' remains as it was before. + Only xmax changes to 135. + Now, this version is treated as valid exclusively for + transactions with xids from 123 to + 134. As a substitute for the non-occurring + data change in the old version, the UPDATE + creates a new version of the row with its xid in + xmin, 0 in + xmax, and 'y' in the + user data (plus all the other user data from the old version). + This version is now valid for all coming transactions. + + + + All subsequent UPDATE commands behave + in the same way as the first one: they put their xid to + xmax of the current version, create + the next version with their xid in xmin, + 0 in xmax, and the + new user data. + + + + Finally, a row may be deleted by a DELETE + command. Even in this case, all versions of the row remain as + before. Nothing is thrown away so far! Only xmax + of the last version changes to the xid of the DELETE + transaction, which indicates that it is only valid for + transactions with xids older than its own (from + 142 to 820 in this + example). + + + + In summary, the MVCC technology creates more and more versions + of the same row in the table's heap file and leaves them there, + even after a DELETE command. Only the youngest + version is relevant for all future transactions. But the + system must also preserve some of the older ones for a + certain amount of time because the possibility exists that + they are or could become relevant for any pending + transactions. 
Over time, also the older ones get out of scope + for ALL transactions and therefore become unnecessary. + Nevertheless, they do exist physically on the disk and occupy + space. + + + + Please keep in mind: + + + + + + xmin and xmax + indicate the range from where to where + row versions are valid (visible) for transactions. + This range doesn't imply any direct temporal meaning; + the sequence of xids reflects only the sequence of + transaction begin events. As + xids grow, old row versions get out of scope over time. + If an old row version is no longer valid for ALL existing + transactions, it's called dead. The + space occupied by dead row versions is part of the + bloat. + + + + + + Internally, an UPDATE command acts in the + same way as a DELETE command followed by + an INSERT command. + + + + + + Nothing gets wiped away — with the consequence that the database + occupies more and more disk space. It is obvious that + this behavior has to be corrected in some + way. The next chapter explains how autovacuum + fulfills this task. + + + + + +
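+ The life cycle of row versions described in this section can be followed with the system columns xmin and xmax. The following is a minimal sketch; the table name mvcc_demo is hypothetical, and the concrete xid values differ on every system.

```sql
-- Hypothetical demo table.
CREATE TABLE mvcc_demo (id integer PRIMARY KEY, val text);

-- INSERT creates the first row version.
INSERT INTO mvcc_demo VALUES (1, 'x');
SELECT xmin, xmax, * FROM mvcc_demo;  -- xmin: xid of the INSERT, xmax: 0

-- UPDATE leaves the old version in the heap and creates a new one.
-- Only the new version is visible to this query.
UPDATE mvcc_demo SET val = 'y' WHERE id = 1;
SELECT xmin, xmax, * FROM mvcc_demo;  -- xmin: xid of the UPDATE

-- DELETE sets xmax of the last version; the row is no longer returned,
-- although all versions still occupy space until they are vacuumed.
DELETE FROM mvcc_demo WHERE id = 1;
SELECT xmin, xmax, * FROM mvcc_demo;

DROP TABLE mvcc_demo;
```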
+ + + Vacuum + + + As we have seen in the previous chapter, the database + tends to occupy more and more disk space, the + bloat. + This chapter explains how the SQL command + VACUUM and the automatically running + Autovacuum processes clean up + by eliminating bloat. + + + + + Autovacuum runs automatically by + default. Its default parameters as well as such for + VACUUM fit well for most standard + situations. Therefore a novice database manager can + easily skip the rest of this chapter which explains + a lot of details. + + + + + Client processes can issue the SQL command VACUUM + at arbitrary points in time. DBAs do this when they recognize + special situations, or they start it in batch jobs which run + periodically. Autovacuum processes run as part of the + Instance at the server. + There is a constantly running Autovacuum daemon. It permanently + controls the state of all databases based on values that are collected by the + Statistics Collector + and starts Autovacuum processes whenever it detects + certain situations. Thus, it's a dynamic behavior of + PostgreSQL with the intention to tidy + up — whenever it is appropriate. + + + + VACUUM, as well as Autovacuum, don't just eliminate + bloat. They perform additional tasks for minimizing future + I/O activities of themselves as well as of other processes. + This extra work can be done in a very efficient way since in most + cases the expensive physical access to pages has taken place anyway + to eliminate bloat. The additional operations are: + + + + + + + Freeze: Mark the youngest row version + as frozen. This means that the version + is always treated as valid (visible) independent from + the wraparound problem (see below). + + + + + + Visibility Map and + Free Space Map: Log information about + the state of the handled pages in two additional files, the + Visibility Map and the Free Space Map. 
+ + + + + + Statistics: Collect statistics about the + number of rows per table, the distribution of values, and so on, + as the basis for decisions of the query planner. + + + + + + + The eagerness — you can call it 'aggression' — of the + operations eliminating bloat and + freeze is controlled by configuration + parameters, runtime flags, and in extreme situations by + the processes themselves. Because vacuum operations typically are I/O + intensive, which can hinder other activities, Autovacuum + avoids performing many vacuum operations in bulk. Instead, + it carries out many small actions with time gaps in between. + The SQL command VACUUM runs immediately + and without any time gaps. + + + Eliminate Bloat + + + To determine which of the row versions are superfluous, the + elimination operation must evaluate xmax + against several criteria which all must apply: + + + + + + xmax must be different from zero because a + value of zero indicates that the row version is still valid. + + + + + + xmax must contain an xid which is older + than the oldest xid of all currently running transactions + (min(pg_stat_activity.backend_xmin)). + This criterion guarantees that no existing or upcoming transaction + will have read or write access to this row version. + + + + + + The transaction of xmax must be committed. If it was rolled back, + this row version is treated as valid. + + + + + + If there is the situation that the row version is part of + multiple transactions, special care and some more actions + must be taken, see: . + + + + + + + After the vacuum operation detects a superfluous row version, it + marks its space as free for future use by write actions. Only + in rare situations (or in the case of VACUUM FULL), + this space is released to the operating system. In most cases, + it remains occupied by PostgreSQL + and will be used by future INSERT or + UPDATE commands concerning this row or a + completely different one. 
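+ The second criterion above depends on the oldest snapshot that is still in use. The following is a minimal sketch of how to find the session that currently holds back the elimination of bloat. It orders by the built-in function age() because the xid type has no direct ordering.

```sql
-- The session with the largest age(backend_xmin) holds the oldest
-- snapshot. Row versions deleted after that snapshot was taken
-- cannot be removed yet.
SELECT pid,
       backend_xmin,
       age(backend_xmin) AS xmin_age,
       state
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY age(backend_xmin) DESC
LIMIT 1;
```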
+ + + + Which actions start the elimination of bloat? + + + + + + When a client issues the SQL command VACUUM + in its default format, i.e., without any option. To boost performance, + in this and the next case VACUUM does not + read and act on all pages of the heap. + The Visibility Map, which is very compact and therefore has a small + size, contains information about pages, where bloat-candidates might + be found. Only such pages are processed. + + + + + + When a client issues the SQL command VACUUM + with the option FREEZE. (In this case, + it undertakes many more actions, see + Freeze Row Versions.) + + + + + + When a client issues the SQL command VACUUM + with the option FULL. + Also, in this mode, the bloat disappears, but the strategy used + is very different: In this case, the complete table is copied + to a different file skipping all outdated row versions. This + leads to a significant reduction of used disk space because + the new file contains only the actual data. The old file + is deleted. + + + + + + When an Autovacuum process acts. For optimization + purposes, it considers the Visibility Map in the same way as + VACUUM. Additionally, it ignores tables with few modifications; + see , + which defaults to 50 rows and + , + which defaults to 20%. + + + + + + + + This logic only applies to row versions of the heap. Index entries + don't use xmin/xmax. Nevertheless, such index + entries, which would lead to outdated row versions, are released + accordingly. + + + + The above descriptions omit the fact that xids on a real computer + have a limited size. They count up in the same way as sequences, and after + a certain number of new transactions they are forced to restart + from the beginning, which is called wraparound. + Therefore the terms 'old transaction' / 'young transaction' do + not always correlate with low / high values of xids. 
Near the + wraparound point, there are cases where xmin has + a higher value than xmax, although the transaction + of xmin is logically older than that of xmax. + + +
+ Cyclic usage of XIDs + + + + + + + + + + +
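+ The cyclic ordering of xids can be illustrated with plain arithmetic. The following is a minimal sketch, assuming 32-bit xids and ignoring the few reserved xid values; the sample numbers are invented. One xid counts as older than another if the distance from it to the other, taken modulo 2^32, is less than 2^31.

```sql
-- a and b are sample xid values stored as bigint.
-- a_is_older is true when a lies in the 'past' half as seen from b.
SELECT a,
       b,
       ((a - b + 4294967296) % 4294967296) > 2147483648 AS a_is_older
FROM (VALUES
        (100::bigint,        200::bigint),         -- no wraparound in between
        (4294967290::bigint, 5::bigint),           -- b was assigned after a wraparound
        (5::bigint,          4294967290::bigint)   -- the reverse case
     ) AS t(a, b);
```

+ In the second row, a has the higher numeric value but is nevertheless the older xid.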
+ + Freeze Row Versions + + + The use of a limited range of IDs for transactions leads + to the necessity to restart the sequence sooner or later. + This does not only have the rare consequence previously + described that sometimes xmin is + higher than xmax. The far + more critical problem is that whenever the system has + to evaluate a WHERE condition, it must decide which row + version is valid (visible) from the perspective of the + transaction of this query. If a wraparound couldn't happen, + this decision would be relatively easy: the xid + must be between xmin and xmax, + and the corresponding transactions of xmin + and xmax must be committed. However, + PostgreSQL has to consider the + possibility of wraparounds. + Therefore the decision becomes more complex. The general + idea of the solution is to use the 'between + xmin and xmax' + comparison only during the youngest period of the row + version's lifetime and afterward replace it with a + 'valid forever' flag in its header. + + + + + + + As a first step, PostgreSQL + divides the complete range of + possible xids into two halves with the two split-points + 'txid_current' and 'txid_current + 2^31'. The half behind + 'txid_current' is considered to represent xids of the + 'past' and the half ahead of 'txid_current' those of the + 'future'. Those of the 'past' are valid (visible) and those + of the 'future' not. + + + + + + With each newly created transaction the two split-points + move forward. When 'txid_current + 2^31' would reach a + row version with xmin equal to that value, it would + immediately jump from 'past' to 'future' and would be + no longer visible! + + + + + + To avoid this unacceptable extinction of data, the vacuum + operation freeze clears the situation + long before the split-point is reached. 
It sets a flag + in the header of the row version, which completely eliminates + the future use of xmin/xmax and indicates + that the version is valid not only in the 'past'-half + but also in the 'future'-half as well as in all coming + epochs. + + + + + + Which row versions can be frozen by the vacuum operation? + Again, several criteria must be checked, and all must be met. + + + + + + xmax must be zero because only + non-deleted rows can be visible 'forever'. + + + + + + xmin must be older than all currently + existing transactions. This guarantees that no existing + transaction can modify or delete the version. + + + + + + The transactions of xmin and + xmax must be committed. + + + + + + + At what point in time does the freeze operation take place? + + + + + When a client issues the SQL command VACUUM + with its FREEZE option. In this case, all + pages are processed that are marked in the Visibility Map + to potentially have unfrozen rows. + + + + + When a client issues the SQL command VACUUM without + any options but finds that there are xids older than + + (default: 150 million) minus + + (default: 50 million). + As before, all pages are processed that are + marked in the Visibility Map to potentially have unfrozen + rows. + + + + + When an Autovacuum process runs. Such a process acts in one + of two modes: + + + + + + In the normal mode, it skips + pages with row versions that are younger than + + (default: 50 million) and works only on pages where + all xids are older. The skipping of young xids prevents + work on such pages, which are likely to be changed + by one of the future SQL commands. + + + + + The process switches + to an aggressive mode if it recognizes + that the oldest xid of the processed table exceeds + + (default: 200 million). The value of the oldest unfrozen + xid is stored per table in pg_class.relfrozenxid. 
+ In this aggressive mode Autovacuum + processes all such pages of the selected table that are marked + in the Visibility Map to potentially have bloat or unfrozen rows. + + + + + + + + + + In the first two cases and with Autovacuum in + aggressive mode, the system knows + to which value the oldest unfrozen xid has moved forward and + logs the value in pg_class.relfrozenxid. + The distance between this value and the 'txid_current' split + point becomes smaller, and the distance to 'txid_current + 2^31' + becomes larger than before. + + +
+ Freeze + + + + + + + + + + +
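+ How far each table is from the limit that triggers the aggressive mode can be checked with the built-in function age(). The following is a minimal sketch; it lists the ten relations of the current database with the oldest unfrozen xid.

```sql
-- age(relfrozenxid) is the number of transactions between the oldest
-- unfrozen xid of the table and the current xid.
SELECT c.oid::regclass        AS table_name,
       age(c.relfrozenxid)    AS xid_age,
       current_setting('autovacuum_freeze_max_age')::bigint AS freeze_max_age
FROM pg_class AS c
WHERE c.relkind IN ('r', 'm', 't')   -- tables, materialized views, TOAST tables
ORDER BY age(c.relfrozenxid) DESC
LIMIT 10;
```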
+ + Protection against Wraparound Failure + + + The Autovacuum processes are initiated by the constantly running + Autovacuum daemon. If the daemon detects that for a table + autovacuum_freeze_max_age is exceeded, it + starts an Autovacuum process in aggressive mode + (see above) — even if Autovacuum is disabled. + + + Visibility Map and Free Space Map + + + The Visibility Map + (VM) contains two flags — stored as + two bits — for each page of the heap. If the first bit + is set, that indicates that the associated page does not + contain any bloat. If the second one is set, that indicates + that the page contains only frozen rows. + + + + Please consider two details. First, in most cases a page + contains many rows, some of them in many versions. + However, the flags are associated with the page, + not with a row or a row version. The flags are set + only under the condition that they are valid for ALL + row versions of the page. Second, since there + are only two bits per page, the VM is considerably + smaller than the heap. Therefore it is buffered + in RAM in almost all cases. + + + + The setting of the flags is silently done by VACUUM + and Autovacuum during their bloat and freeze operations. + This is done to speed up future vacuum actions, + regular accesses to heap pages, and some accesses to + the index. Every data-modifying operation on any row + version of the page clears the flags. + + + + The Free Space Map + (FSM) tracks the amount of free space per page. It is + organized as a highly condensed b-tree of (rounded) sizes. + As long as VACUUM or Autovacuum change + the free space on any processed page, they log the new + values in the FSM in the same way as all other writing + processes. + + + Statistics + + + Statistic information helps the Query Planner to make optimal + decisions for the generation of execution plans. This + information can be gathered with the SQL commands + ANALYZE or VACUUM ANALYZE. 
+ Autovacuum processes also gather + such information. Depending on the percentage of changed rows + per table , + the Autovacuum daemon starts Autovacuum processes to collect + statistics per table. This dynamic invocation of analyze + operations allows PostgreSQL to + adapt queries to changing circumstances. + + + + For more details about vacuum operations, especially for its + numerous parameters, see . + + +
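+ Whether a table is close to being processed by Autovacuum can be estimated from the collected statistics. The following is a minimal sketch, assuming the documented rule that a table is vacuumed once its dead row versions exceed the threshold (default 50) plus the scale factor (default 20%) times the number of rows. It ignores per-table storage parameters that may override the global settings.

```sql
SELECT s.relname,
       s.n_dead_tup,
       -- global settings only; reltuples is an estimate and may be
       -- negative for tables that were never analyzed
       current_setting('autovacuum_vacuum_threshold')::integer
         + current_setting('autovacuum_vacuum_scale_factor')::float8
           * greatest(c.reltuples, 0) AS approx_vacuum_trigger,
       s.last_autovacuum,
       s.last_autoanalyze
FROM pg_stat_user_tables AS s
JOIN pg_class AS c ON c.oid = s.relid
ORDER BY s.n_dead_tup DESC;
```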
+ + + Transactions + + Transactions + are a fundamental concept of relational database systems. + Their essential point is that they bundle multiple + read- or write-operations into a single all-or-nothing + operation. Furthermore, they separate and protect concurrent + actions of different connections from each other. Thereby + they implement the ACID paradigm. + + + + In PostgreSQL there are two ways + to establish a transaction. The explicit way uses the keywords + BEGIN and + COMMIT (respectively + ROLLBACK) before + and after a sequence of SQL statements. The keywords mark + the transaction's start- and end-point. On the other hand, you + can omit the keywords. This is the implicit way, where + every single SQL command automatically establishes a new + transaction. + + +BEGIN; -- establish a new transaction +UPDATE accounts SET balance = balance - 100.00 WHERE name = 'Alice'; +UPDATE accounts SET balance = balance + 100.00 WHERE name = 'Bob'; +COMMIT; -- finish the transaction + +-- this UPDATE runs as the only command of a separate transaction ... +UPDATE accounts SET balance = balance - 100.00 WHERE name = 'Alice'; + +-- ... and this one runs in another transaction +UPDATE accounts SET balance = balance + 100.00 WHERE name = 'Bob'; + + + + + As mentioned, the primary property of a transaction is its + atomicity: either all or none of its operations succeed, + regardless of the fact that it may consist of a lot of + different write-operations, and each such operation may + affect thousands or millions of rows. As soon as one of the + operations fails, all previous operations fail also, which + means that all modified rows retain their values as of the + beginning of the transaction. + + + + The atomicity also affects the visibility of changes. No + connection running simultaneously to a data modifying + transaction will ever see any change before the + transaction successfully executes a COMMIT + — even in the lowest + isolation level + of transactions. 
PostgreSQL + never shows uncommitted changes to other connections. + + + + The situation regarding visibility is somewhat different + from the point of view of the modifying transaction. + SELECT commands issued inside a + transaction deliver all changes done so far by this + transaction. + + + How does it work? + + + Every INSERT, UPDATE, + and DELETE command creates new row + versions or marks existing ones as outdated, according to the MVCC rules. This + creates the risk that other transactions may see the + new row versions, and after a while and some more + activities of the modifying transaction they may see the + next row versions. The result would be a kind of 'moving + target', in sharp contrast to the all-or-nothing + principle. + + + + PostgreSQL overcomes this + problem by showing only such row versions to other + transactions whose originating transaction is + successfully committed. It skips all row versions of + uncommitted transactions. + PostgreSQL also solves one more + problem. Even the single COMMIT + command needs a short time interval for its execution. + Therefore its critical 'dead-or-survival' phase + runs in a privileged mode where it cannot be + interrupted by other processes. + + + What are the benefits? + + + Transactions relieve applications of many standard + actions that must be implemented for nearly every use case. + + + + Business logic often contains strong, but for a computer, + relatively abstract requirements. The above example shows + the transfer of some money from one account to another. + It is obvious + that the decrease of the one and the increase of the + other must be indivisible. Nevertheless, there is no particular + need for an application to do something to ensure the + atomicity + of this behavior. It is enough to surround the statements with + BEGIN and COMMIT. + + + + Applications often demand the feature of 'undoing' + previously taken actions under some application-specific + conditions. 
In such cases, the application simply issues a + ROLLBACK command instead of a + COMMIT. The ROLLBACK + cancels the transaction, and all changes made so far remain + invisible forever; it is as if they had never happened. There + is no need for the application to log its activities and + undo every step of the transaction separately. + + + + Transactions ensure that the + consistency + of the complete database is always preserved. Declarative + rules like + primary or + foreign keys, + checks, + other constraints, or + triggers + are part of the all-or-nothing nature of transactions. + + + + Also, all self-evident, but possibly not obvious, + low-level demands on the database system are + ensured; e.g. index entries for rows must become + visible at the same moment as the rows themselves. + + + + There is the additional feature + 'isolation level', + which separates transactions from each other in certain ways. + It automatically protects applications from certain anomalies + that concurrent transactions could otherwise cause. + + + + Lastly, it is worth noting that changes done by a + committed transaction will survive all future application, + instance, or hardware failures. The next chapter + explains this + durability. + + + + + Reliability + + + Nothing is perfect and failures inevitably happen. + However, the most common types of failure are + well known and PostgreSQL + implements strategies to overcome them. + Such strategies use parts of the previously presented + techniques MVCC and transaction-rollback, plus additional + features. + + + Failures at the client side + + A client + can fail in different ways. Its hardware can get damaged, + the power supply can fail, the network connection to the + server can break, or the client application may run into + a severe software error like a null pointer exception. + Because PostgreSQL uses a + client/server architecture, no direct problem for the + database will occur. 
In all of these cases, the + Backend process, + which is the client's counterpart on the server side, + may recognize that the network connection is no longer + working, or it may run into a timeout after a while. It + terminates, and there is no harm to the database. As + usual, uncommitted data changes initiated by this client + are not visible to any other client. + + + Failures at the server side + + Instance failure + + The instance may suddenly fail because of a power failure + or other problems. This will affect all running processes, the RAM, + and possibly the consistency of disk files. + + + After a restart, PostgreSQL + automatically recognizes that the last shutdown of the + instance did not happen as expected: files might not be + closed properly and the postmaster.pid + file exists. PostgreSQL + tries to clean up the situation. This is possible because + all changes in the database are stored twice. First, + the WAL files contain them as a chronology of + WAL records, + which include the new data values and information about commit + actions. The WAL records are written first. Second, + the data itself should exist in the heap and index files. + In contrast to the WAL records, this part may or may + not have been transferred entirely from Shared Memory + to the files. + + + The automatic recovery searches within the WAL files for + the latest + checkpoint. + This checkpoint signals that the database files are in + a consistent state, in particular that all WAL records up to + this point were successfully stored in heap and index. Starting + here, the recovery process applies the subsequent WAL records + to heap and index. As a result, the files contain all + changes and reach a consistent state. Changes of committed + transactions are visible; those of uncommitted transactions + are also in the files, but, as usual, they are never seen + by any of the following transactions because uncommitted + changes are never shown. 
Such recovery actions run + completely automatically; a + database administrator does not need to configure or start + anything. + + + Disk crash + + If a disk crashes, the course of action described previously + cannot work. It is likely that the WAL files and/or the + data and index files are no longer available. The + database administrator must take special actions to + overcome such situations. + + + A backup is obviously needed. How to take such a backup + and use it as a starting point for a recovery of the + cluster is explained in more detail in the next + chapter. + + + Disk full + + It is conceivable that over time the disk gets full, + and there is no room for additional data. In this case, + PostgreSQL stops accepting + data-modifying commands or even terminates completely. + No data loss or data corruption will occur. + + + To recover from such a situation, the administrator should + remove unused files from this disk. But files must never be + deleted from the + data directory. + Nearly all of them are necessary for the consistency + of the database. + + + High availability + + Database servers can work together to allow a second + server to quickly take over the workload if the + primary server fails for whatever reason + (high availability), + or to allow several computers to serve the same data + for the purpose of load balancing. + + + + + + Backup + + + Taking backups is a basic task of database maintenance. + PostgreSQL supports + three different strategies; each has its own + strengths and weaknesses. + + + + File system level backup + + + + + Logical backup via pg_dump + + + + + Continuous archiving based on pg_basebackup + and WAL files + + + + + + File system level backup + + You can use any appropriate OS tool to create a + copy + of the cluster's directory structure and files. In + case of severe problems such a copy can serve as + the source of recovery. 
But in order to get a + USABLE backup by this method, + the database server MUST be + shut down during the complete runtime of the copy + command! + + + The obvious disadvantage of this method is that there + is downtime during which no user interaction is possible. + The other two strategies run during regular operating + times. + + + Logical backup via pg_dump + + The tool pg_dump is able to take a + copy + of a complete database or certain parts of it + (pg_dumpall covers the complete cluster). It stores + the copy in the form of SQL commands such as CREATE and + COPY (or, optionally, INSERT). It runs in + parallel to other processes in its own transaction. + + + The output of pg_dump may be used as + input of psql to restore the data + (or to copy it to another database). + + + The main advantage over the other two methods is that it + can pick parts of the cluster, e.g., a single table or one + database. The other two methods work only at the level of + the complete cluster. + + + Continuous archiving based on pg_basebackup and WAL files + + This method + is the most sophisticated and complex one. It + consists of two phases. + + + First, you need to create a so-called + basebackup with the tool + pg_basebackup. The result is a + directory structure plus files which contain a + consistent copy of the original cluster. + pg_basebackup runs in + parallel to other processes; the server does not need to be shut down. + + + The second step is recommended but not necessary. All + changes to the data are stored in WAL files. If you + continuously save such WAL files, you have the history + of the cluster. This history can be applied to a + basebackup in order to recreate + any state of the cluster between the time + pg_basebackup started and + any later point in time. This technique + is called 'Point-in-Time Recovery (PITR)'. + + + If configured, the + Archiver process + will automatically copy every single WAL file to a safe location. 
+ Its configuration + consists mainly of a string, which contains a copy command + in the operating system's syntax. 
In order to protect your + data against a disk crash, the destination location + of a basebackup as well as of the + archived WAL files should be on a + disk which is different from the data disk. + + + If it becomes necessary to restore the cluster, you have to + copy the basebackup back to + its original directory; the + archived WAL files are then applied on top of it. + The configuration of this + recovery procedure + contains a string with the reverse copy command: from + archive location to database location. + + + + + + +
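As an illustration of the archiving configuration described above, the following statements sketch one possible setup. The directory /mnt/server/archivedir is a hypothetical archive location assumed to be on a disk other than the data disk, and the copy commands use Unix shell syntax; treat this as a sketch under those assumptions rather than a recommended configuration.

```sql
-- Enable WAL archiving; archive_mode takes effect only after a server restart.
ALTER SYSTEM SET archive_mode = 'on';

-- %p is the path of the WAL file to archive, %f is its file name.
-- The test refuses to overwrite a file that already exists in the archive.
ALTER SYSTEM SET archive_command =
  'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f';

-- For recovery, the reverse copy would be configured in the restored cluster:
--   restore_command = 'cp /mnt/server/archivedir/%f %p'
-- (a regular server parameter in version 12 and later).

-- Re-read the configuration so that a changed archive_command is picked up.
SELECT pg_reload_conf();
```

The two command strings mirror each other: the first copies from the database location to the archive, the second from the archive back to the database location.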
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml index 38e8aa0bbf..7490d3c9c2 100644 --- a/doc/src/sgml/filelist.sgml +++ b/doc/src/sgml/filelist.sgml @@ -80,6 +80,7 @@ %allfiles; + diff --git a/doc/src/sgml/images/cluster-db-schema-ink-svgo.svg b/doc/src/sgml/images/cluster-db-schema-ink-svgo.svg new file mode 100644 index 0000000000..7e13753d48 --- /dev/null +++ b/doc/src/sgml/images/cluster-db-schema-ink-svgo.svg @@ -0,0 +1,160 @@ + + + Server (Hardware, Container, or VM) + + + + + + + + schema 'public' + + + tables, views, ... + + + + (more system schemas) + + + + + + + schema 'public' + + + tables, views, ... + + + + 'my_schema' (optional) + + + tables, views, ... + + + + (more system schemas) + + + + + UML Note + + + + + + + + + + Server (Hardware, Container, or VM) + + + + cluster 'data' (default, managed by one instance) + + + + cluster 'cluster_2' (optional, managed by a different instance) + + + + + database 'template0' + + + + + + database 'template1' + + + + + + database 'postgres' + + + + + + database 'my_db' (optional) + + + + + + Global SQL objects + + + + + + + + + + 1) + + + By default, you work in the cluster 'data', database 'postgres', + + + schema 'public'. + + + 2) + + + More system schemas: pg_catalog, information_schema, + + + pg_temp, pg_toast. + + + 3) + + + Global SQL objects: Some SQL objects are automatically active + + + and known database- or even cluster-wide. + + + 4) + + + The command 'initdb' creates a new cluster with the three + + + databases 'template0', 'template1', and 'postgres'. The command + + + 'createdb' creates a new database. + + + 5) + + + If multiple clusters are active on one server at the same time, + + + each one is managed by an individual instance. Each such instance + + + uses a different port. + + + 6) + + + No client application is allowed to connect to 'template0'. 
+ + + diff --git a/doc/src/sgml/images/cluster-db-schema-ink.svg b/doc/src/sgml/images/cluster-db-schema-ink.svg new file mode 100644 index 0000000000..1fffb9737a --- /dev/null +++ b/doc/src/sgml/images/cluster-db-schema-ink.svg @@ -0,0 +1,482 @@ + + + + + + image/svg+xml + + Server (Hardware, Container, or VM) + + + + + Server (Hardware, Container, or VM) + + + + + + + + + + schema 'public' + tables, views, ... + + (more system schemas) + + + + + + + + schema 'public' + tables, views, ... + + 'my_schema' (optional) + tables, views, ... + + (more system schemas) + + + UML Note + + + + + + + + + + + + + + Server (Hardware, Container, or VM) + + + + cluster 'data' (default, managed by one instance) + + + + cluster 'cluster_2' (optional, managed by a different instance) + + + + + database 'template0' + + + + + database 'template1' + + + + + database 'postgres' + + + + + database 'my_db' (optional) + + + + + Global SQL objects + + + + + + + + + 1) + By default, you work in the cluster 'data', database 'postgres', + schema 'public'. + 2) + More system schemas: pg_catalog, information_schema, + pg_temp, pg_toast. + 3) + Global SQL objects: Some SQL objects are automatically active + and known database- or even cluster-wide. + 4) + The command 'initdb' creates a new cluster with the three + databases 'template0', 'template1', and 'postgres'. The command + 'createdb' creates a new database. + 5) + If multiple clusters are active on one server at the same time, + each one is managed by an individual instance. Each such instance + uses a different port. + 6) + No client application is allowed to connect to 'template0'. + + diff --git a/doc/src/sgml/images/cluster-db-schema-raw.svg b/doc/src/sgml/images/cluster-db-schema-raw.svg new file mode 100644 index 0000000000..af50c07330 --- /dev/null +++ b/doc/src/sgml/images/cluster-db-schema-raw.svg @@ -0,0 +1,173 @@ + + + + Server (Hardware, Container, or VM) + + + + + + + + + + + + + + schema 'public' + tables, views, ... 
+ + + (more system schemas) + + + + + + + + + + schema 'public' + tables, views, ... + + + 'my_schema' (optional) + tables, views, ... + + + (more system schemas) + + + + UML Note + + + + + + + + + + + + + + + + + + Server (Hardware, Container, or VM) + + + + + cluster 'data' (default, managed by one instance) + + + + cluster 'cluster_2' (optional, managed by a different instance) + + + + + + + database 'template0' + + + + + + database 'template1' + + + + + + database 'postgres' + + + + + + database 'my_db' (optional) + + + + + + Global SQL objects + + + + + + + + + + + 1) + By default, you work in the cluster 'data', database 'postgres', + schema 'public'. + + 2) + More system schemas: pg_catalog, information_schema, + pg_temp, pg_toast. + + 3) + Global SQL objects: Some SQL objects are automatically active + and known database- or even cluster-wide. + + 4) + The command 'initdb' creates a new cluster with the three + databases 'template0', 'template1', and 'postgres'. The command + 'createdb' creates a new database. + + 5) + If multiple clusters are active on one server at the same time, + each one is managed by an individual instance. Each such instance + uses a different port. + + 6) + No client application is allowed to connect to 'template0'. + + + + diff --git a/doc/src/sgml/images/directories-ink-svgo.svg b/doc/src/sgml/images/directories-ink-svgo.svg new file mode 100644 index 0000000000..95fa76b9c6 --- /dev/null +++ b/doc/src/sgml/images/directories-ink-svgo.svg @@ -0,0 +1,164 @@ + + + Directory structure of a cluster + + + + + + Directory + + + + + + + File + + + + + + + + + + + Directory Structure + + + + + ... 
/pg/ + + + An arbitrary directory + + + + + + data/ + + + Root of cluster 'data' (see: PGDATA) + + + + + + base/ + + + Subdirectory containing per-database subdirectories + + + + + + 1/ + + + Subdirectory for data of first database 'template0' + + + + + + 12992/ + + + Subdirectory for data of second database 'template1' + + + + + + 12999/ + + + Subdirectory for data of third database 'postgres' + + + + + + nnnnn/ + + + Optional: more subdirectories for databases, e.g. 'my_db' + + + + + + global/ + + + Subdirectory with information about Global SQL Objects + + + + + + pg_wal/ + + + Subdirectory for Write Ahead Log files ('pg_xlog' before version 10) + + + + + + pg_xact/ + + + Subdirectory for transaction commit status ('pg_clog' before version 10) + + + + + + pg_tblspc/ + + + Subdirectory containing symbolic links to tablespaces + + + + + + pg_... / + + + Some more subdirectories + + + + + + + 'postmaster.pid' and other files with cluster-wide relevance + + + + + + ... /xyz/ + + + Same or another arbitrary directory + + + + + + cluster_2/ + + + Root of another cluster 'cluster_2' + + + diff --git a/doc/src/sgml/images/directories-ink.svg b/doc/src/sgml/images/directories-ink.svg new file mode 100644 index 0000000000..8151cf583a --- /dev/null +++ b/doc/src/sgml/images/directories-ink.svg @@ -0,0 +1,397 @@ + + + + + + image/svg+xml + + Directory structure of a cluster + + + + + Directory structure of a cluster + + + + + Directory + + + + + + File + + + + + + + + + + + + Directory Structure + + + + ... /pg/ + An arbitrary directory + + + + data/ + Root of cluster 'data' (see: PGDATA) + + + + base/ + Subdirectory containing per-database subdirectories + + + + + 1/ + Subdirectory for data of first database 'template0' + + + + 12992/ + Subdirectory for data of second database 'template1' + + + + 12999/ + Subdirectory for data of third database 'postgres' + + + + nnnnn/ + Optional: more subdirectories for databases, e.g. 
'my_db' + + + + global/ + Subdirectory with information about Global SQL Objects + + + + pg_wal/ + Subdirectory for Write Ahead Log files ('pg_xlog' before version 10) + + + + pg_xact/ + Subdirectory for transaction commit status ('pg_clog' before version 10) + + + + pg_tblspc/ + Subdirectory containing symbolic links to tablespaces + + + + pg_... / + Some more subdirectories + + + + + 'postmaster.pid' and other files with cluster-wide relevance + + + + + ... /xyz/ + Same or another arbitrary directory + + + + cluster_2/ + Root of another cluster 'cluster_2' + + diff --git a/doc/src/sgml/images/directories-raw.svg b/doc/src/sgml/images/directories-raw.svg new file mode 100644 index 0000000000..6d16a03169 --- /dev/null +++ b/doc/src/sgml/images/directories-raw.svg @@ -0,0 +1,144 @@ + + + + Directory structure of a cluster + + + + + + + + Directory + + + + + + + File + + + + + + + + + + + + + + + + Directory Structure + + + + + ... /pg/ + An arbitrary directory + + + + + data/ + Root of cluster 'data' (see: PGDATA) + + + + + base/ + Subdirectory containing per-database subdirectories + + + + + + 1/ + Subdirectory for data of first database 'template0' + + + + 12992/ + Subdirectory for data of second database 'template1' + + + + 12999/ + Subdirectory for data of third database 'postgres' + + + + nnnnn/ + Optional: more subdirectories for databases, e.g. 'my_db' + + + + + global/ + Subdirectory with information about Global SQL Objects + + + + + pg_wal/ + Subdirectory for Write Ahead Log files ('pg_xlog' before version 10) + + + + + pg_xact/ + Subdirectory for transaction commit status ('pg_clog' before version 10) + + + + + pg_tblspc/ + Subdirectory containing symbolic links to tablespaces + + + + + pg_... / + Some more subdirectories + + + + + + 'postmaster.pid' and other files with cluster-wide relevance + + + + + + ... 
/xyz/ + Same or another arbitrary directory + + + + + cluster_2/ + Root of another cluster 'cluster_2' + + + diff --git a/doc/src/sgml/images/freeze-ink-svgo.svg b/doc/src/sgml/images/freeze-ink-svgo.svg new file mode 100644 index 0000000000..6fedfb7633 --- /dev/null +++ b/doc/src/sgml/images/freeze-ink-svgo.svg @@ -0,0 +1,84 @@ + + + Freeze + + + + + + + + + + + + Freeze to keep visible + + + + + | (0) - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > (1) (5) | (2) | (3) | (4) + + + + PAST + + + FUTURE + + + + + + + + + + + + + + + + + + + + 0: 0 .. 2 ^ 32 - 1 + + + 1: txid_current + 2 ^ 31 (split-point) + + + 2: autovacuum_freeze_max_age (200 million) + + + 3: vacuum_freeze_table_age (150 million) + + + 4: vacuum_freeze_min_age (50 million) + + + 5: txid_current (split-point, youngest xid) + + + per table: pg_class.relfrozenxid must be between (1) and (5); + + + normally it is between (3) and (4) + + + + Unfrozen xid + + + + Frozen xid + + + (figure is not to scale) + + + diff --git a/doc/src/sgml/images/freeze-ink.svg b/doc/src/sgml/images/freeze-ink.svg new file mode 100644 index 0000000000..009cfe4b41 --- /dev/null +++ b/doc/src/sgml/images/freeze-ink.svg @@ -0,0 +1,365 @@ + + + + + + image/svg+xml + + Freeze + + + + + Freeze + + + + + + + + + + + + + + + + Freeze + to keep visible + + + + + | + (0) + + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > + (1) + (5) + | + (2) + | + (3) + | + (4) + + + + PAST + FUTURE + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + 0: 0 .. 2 ^ +32 + - 1 + 1: txid_current + 2 ^ 31 (split-point) + 2: autovacuum_freeze_max_age (200 million) + 3: vacuum_freeze_table_age (150 million) + 4: vacuum_freeze_min_age (50 million) 
+ 5: txid_current (split-point, youngest xid) + per table: pg_class.relfrozenxid + must + be between (1) and (5); + normally it is between (3) and (4) + + Unfrozen xid + + Frozen xid + (figure is not to scale) + + diff --git a/doc/src/sgml/images/freeze-raw.svg b/doc/src/sgml/images/freeze-raw.svg new file mode 100644 index 0000000000..2d1d256184 --- /dev/null +++ b/doc/src/sgml/images/freeze-raw.svg @@ -0,0 +1,123 @@ + + + + Freeze + + + + + + + + + + + + + + + + + + + + + + Freeze to keep visible + + + + + + + + | + (0) + + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > + (1) + (5) + + | + (2) + + | + (3) + + | + (4) + + + + + + PAST + FUTURE + + + + + + + + + + + + + + + + + + + + + + 0: 0 .. 2 ^ 32 - 1 + 1: txid_current + 2 ^ 31 (split-point) + 2: autovacuum_freeze_max_age (200 million) + 3: vacuum_freeze_table_age (150 million) + 4: vacuum_freeze_min_age (50 million) + 5: txid_current (split-point, youngest xid) + per table: pg_class.relfrozenxid + must be between (1) and (5); + normally it is between (3) and (4) + + + Unfrozen xid + + + Frozen xid + + (figure is not to scale) + + + diff --git a/doc/src/sgml/images/internal-objects-hierarchy-ink-svgo.svg b/doc/src/sgml/images/internal-objects-hierarchy-ink-svgo.svg new file mode 100644 index 0000000000..26bce6176d --- /dev/null +++ b/doc/src/sgml/images/internal-objects-hierarchy-ink-svgo.svg @@ -0,0 +1,83 @@ + + + Hierarchy of Internal Objects + + + + + Hierarchy of internal Objects + + + + + + Cluster + + + + + Database Names + + + + + + Tablespace + + + + + + Replication Origins + + + + + + Subscription for + + + Logical Replication + + + + + + Role + + + + + + + Database + + + + + Extension + + + + + + Collation + + + + + + Schema + + + + + Table, View, ... 
+ + + + + + diff --git a/doc/src/sgml/images/internal-objects-hierarchy-ink.svg b/doc/src/sgml/images/internal-objects-hierarchy-ink.svg new file mode 100644 index 0000000000..e5745818d9 --- /dev/null +++ b/doc/src/sgml/images/internal-objects-hierarchy-ink.svg @@ -0,0 +1,255 @@ + + + + + + image/svg+xml + + Hierarchy of Internal Objects + + + + + + Hierarchy of Internal Objects + + + + + + Hierarchy of internal Objects + + + + + Cluster + + + Database Names + + + + Tablespace + + + + Replication Origins + + + + Subscription for + Logical Replication + + + + Role + + + + + Database + + + + Extension + + + + Collation + + + + Schema + + + Table, View, ... + + + + + + diff --git a/doc/src/sgml/images/internal-objects-hierarchy-raw.svg b/doc/src/sgml/images/internal-objects-hierarchy-raw.svg new file mode 100644 index 0000000000..f0dc890f6b --- /dev/null +++ b/doc/src/sgml/images/internal-objects-hierarchy-raw.svg @@ -0,0 +1,95 @@ + + + + Hierarchy of Internal Objects + + + + + + + + Hierarchy of internal Objects + + + + + + + + Cluster + + + + Database Names + + + + + Tablespace + + + + + Replication Origins + + + + + Subscription for + Logical Replication + + + + + Role + + + + + + + Database + + + + + Extension + + + + + Collation + + + + + Schema + + + + Table, View, ... 
+ + + + + + + diff --git a/doc/src/sgml/images/mvcc-ink-svgo.svg b/doc/src/sgml/images/mvcc-ink-svgo.svg new file mode 100644 index 0000000000..8e67da93d1 --- /dev/null +++ b/doc/src/sgml/images/mvcc-ink-svgo.svg @@ -0,0 +1,151 @@ + + + MVCC + + + + + + + + + + + + + + + + + T 123 : INSERT + + + + + 123 + + + 0 + + + 'x' + + + + + T 135 : UPDATE + + + + + 135 + + + 0 + + + 'y' + + + + 123 + + + 135 + + + 'x' + + + + + T 142 : UPDATE + + + + + 142 + + + 0 + + + 'z' + + + + 135 + + + 142 + + + 'y' + + + + 123 + + + 135 + + + 'x' + + + + + T 821 : DELETE + + + + + 142 + + + 821 + + + 'z' + + + + 135 + + + 142 + + + 'y' + + + + 123 + + + 135 + + + 'x' + + + + + + Legend + + + + xmin + + + xmax + + + data + + + + diff --git a/doc/src/sgml/images/mvcc-ink.svg b/doc/src/sgml/images/mvcc-ink.svg new file mode 100644 index 0000000000..f4161b3e79 --- /dev/null +++ b/doc/src/sgml/images/mvcc-ink.svg @@ -0,0 +1,398 @@ + + + + + + image/svg+xml + + MVCC + + + + + MVCC + + + + + + + + + + + + + + + + + + + + + + + T + 123 +: INSERT + + + + 123 + 0 + 'x' + + + + T + 135 +: UPDATE + + + + 135 + 0 + 'y' + + 123 + 135 + 'x' + + + + T + 142 +: UPDATE + + + + 142 + 0 + 'z' + + 135 + 142 + 'y' + + 123 + 135 + 'x' + + + + T + 821 +: DELETE + + + + 142 + 821 + 'z' + + 135 + 142 + 'y' + + 123 + 135 + 'x' + + + + + Legend + + xmin + xmax + data + + + diff --git a/doc/src/sgml/images/mvcc-raw.svg b/doc/src/sgml/images/mvcc-raw.svg new file mode 100644 index 0000000000..0481c4c938 --- /dev/null +++ b/doc/src/sgml/images/mvcc-raw.svg @@ -0,0 +1,145 @@ + + + + MVCC + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + T + 123 + : INSERT + + + + + 123 + 0 + 'x' + + + + + T + 135 + : UPDATE + + + + + 135 + 0 + 'y' + + 123 + 135 + 'x' + + + + + T + 142 + : UPDATE + + + + + 142 + 0 + 'z' + + 135 + 142 + 'y' + + 123 + 135 + 'x' + + + + + T + 821 + : DELETE + + + + + 142 + 821 + 'z' + + 135 + 142 + 'y' + + 123 + 135 + 'x' + + + + + + Legend + + xmin + xmax + data + + 
+ + + diff --git
a/doc/src/sgml/images/ram-proc-file-ink-svgo.svg b/doc/src/sgml/images/ram-proc-file-ink-svgo.svg new file mode 100644 index 0000000000..723d67bd8d --- /dev/null +++ b/doc/src/sgml/images/ram-proc-file-ink-svgo.svg @@ -0,0 +1,285 @@ + + + PG Overall Server Architecture + + + + + + UML Note (200 x 20 px) + + + + + + UML Note (250 x 20 px) + + + + + + UML Note (100 x 35 px) + + + + + + UML Note (170 x 50 px) + + + + + + UML State (300x120) + + + + + + UML State (350x120) + + + + + + Disc + + + + + + + + + Laptop + + + + + + + + + + + + + + + + + + + Client + + + Server + + + + + + maintenance_work_mem (per connection) + + + work_mem (per query operation) + + + autovacuum_work_mem (per worker process) + + + temp_buffer (per connection) + + + ... + + + + Individual Memory + + + + + + shared_buffers (heap and index) + + + wal_buffers (WAL records) + + + ... + + + + Shared Memory (per Instance) + + + + + Postmaster + + + + + + 1 + + + + + Backend processes (one per connection) + + + + + + + + 3 + + + + + + + Creates backend processes + + + + + + 2 + + + + + + + + WAL Writer + + + + + + Checkpointer + + + + + + + + Checkpoint + + + Record + + + + + Background Writer + + + + + + WAL Archiver + + + + + + Autovacuum + + + + + + Logger + + + + Stats Collector + + + + + Log + + + text lines, + + + sequential + + + + + + + Heap and + + + Index + + + binary blocks, + + + random + + + + + + + Read heap and index + + + pages and transfer + + + them to shared_buffers + + + + + + WAL + + + binary records, + + + sequential + + + + + + Archived + + + WAL + + + + + + + Via TCP/IP or socket + + + + RAM + + + PROCESSES + + + FILES + + diff --git a/doc/src/sgml/images/ram-proc-file-ink.svg b/doc/src/sgml/images/ram-proc-file-ink.svg new file mode 100644 index 0000000000..4490bf51e1 --- /dev/null +++ b/doc/src/sgml/images/ram-proc-file-ink.svg @@ -0,0 +1,841 @@ + + + + + + image/svg+xml + + PG Overall Server Architecture + + + + + PG Overall Server Architecture + + + + + UML Note (200 x 20 
px) + + + + UML Note (250 x 20 px) + + + + UML Note (100 x 35 px) + + + + UML Note (170 x 50 px) + + + + + UML State (300x120) + + + + UML State (350x120) + + + + + Disc + + + + + + + + + + + + Laptop + + + + + + + + + + + + + + + + + + + + + + + Client + Server + + + + + maintenance_work_mem (per connection) + work_mem (per query operation) + autovacuum_work_mem (per worker process) + temp_buffer (per connection) + ... + + Individual Memory + + + + + shared_buffers (heap and index) + wal_buffers (WAL records) + ... + + Shared Memory (per Instance) + + + + + Postmaster + + + + + 1 + + + + + Backend processes (one per connection) + + + + + + + 3 + + + + + + Creates backend processes + + + + 2 + + + + + + + + + + WAL Writer + + + + + + + Checkpointer + + + + + + + Checkpoint + Record + + + + + Background Writer + + + + + + + WAL Archiver + + + + + + + Autovacuum + + + + + + + Logger + + + + + Stats Collector + + + + + + + Log + text lines, + sequential + + + + + Heap and + Index + binary blocks, + random + + + + + Read heap and index + pages and transfer + them to shared_buffers + + + + WAL + binary records, + sequential + + + + Archived + WAL + + + + + + Via TCP/IP or socket + + + + RAM + PROCESSES + FILES + + diff --git a/doc/src/sgml/images/ram-proc-file-raw.svg b/doc/src/sgml/images/ram-proc-file-raw.svg new file mode 100644 index 0000000000..aec5811c54 --- /dev/null +++ b/doc/src/sgml/images/ram-proc-file-raw.svg @@ -0,0 +1,301 @@ + + + + PG Overall Server Architecture + + + + + + + + UML Note (200 x 20 px) + + + + UML Note (250 x 20 px) + + + + UML Note (100 x 35 px) + + + + UML Note (170 x 50 px) + + + + + + UML State (300x120) + + + + UML State (350x120) + + + + + + Disc + + + + + + + + + Laptop + + + + + + + + + + + + + + + + + + + + + + + + + + + + Client + Server + + + + + + + maintenance_work_mem (per connection) + work_mem (per query operation) + autovacuum_work_mem (per worker process) + temp_buffer (per connection) + ... 
+ + Individual Memory + + + + + + shared_buffers (heap and index) + wal_buffers (WAL records) + ... + + Shared Memory (per Instance) + + + + + + Postmaster + + + + + 1 + + + + + + Backend processes (one per connection) + + + + + + + + 3 + + + + + + + Creates backend processes + + + + 2 + + + + + + + + + + + + WAL Writer + + + + + + + + Checkpointer + + + + + + + Checkpoint + Record + + + + + + Background Writer + + + + + + + + WAL Archiver + + + + + + + + Autovacuum + + + + + + + + Logger + + + + + + Stats Collector + + + + + + + + Log + text lines, + sequential + + + + + + Heap and + Index + binary blocks, + random + + + + + + Read heap and index + pages and transfer + them to shared_buffers + + + + + WAL + binary records, + sequential + + + + + Archived + WAL + + + + + + + Via TCP/IP or socket + + + + + RAM + PROCESSES + FILES + + + diff --git a/doc/src/sgml/images/wraparound-ink-svgo.svg b/doc/src/sgml/images/wraparound-ink-svgo.svg new file mode 100644 index 0000000000..9882d2be23 --- /dev/null +++ b/doc/src/sgml/images/wraparound-ink-svgo.svg @@ -0,0 +1,40 @@ + + + Cyclic usage of XIDs + + + + + + + + + + Cyclic usage of XIDs modulo 2 ^ 32 + + + + + | (0) - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > | (1) | (2) | (3) | (4) + + + + + 0: 0 .. 
2 ^ 32 - 1 + + + 1: oldest active xid (pg_stat_activity.backend_xmin) + + + 2: xmin of one row version + + + 3: xmax of the same row version + + + 4: youngest xid (txid_current) + + + diff --git a/doc/src/sgml/images/wraparound-ink.svg b/doc/src/sgml/images/wraparound-ink.svg new file mode 100644 index 0000000000..a9c51f4e43 --- /dev/null +++ b/doc/src/sgml/images/wraparound-ink.svg @@ -0,0 +1,198 @@ + + + + + + image/svg+xml + + Cyclic usage of XIDs + + + + + Cyclic usage of XIDs + + + + + + + + + + + + Cyclic usage of XIDs modulo 2 + ^ +32 + + + + + + | + (0) + + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > + | + (1) + | + (2) + | + (3) + | + (4) + + + + 0: 0 .. 
2 ^ +32 + - 1 + 1: oldest active + xid (pg_stat_activity.backend_xmin) + 2: xmin of one row version + 3: xmax of the same row version + 4: youngest xid (txid_current) + + diff --git a/doc/src/sgml/images/wraparound-raw.svg b/doc/src/sgml/images/wraparound-raw.svg new file mode 100644 index 0000000000..9406f52970 --- /dev/null +++ b/doc/src/sgml/images/wraparound-raw.svg @@ -0,0 +1,79 @@ + + + + Cyclic usage of XIDs + + + + + + + + + + + + + + + + Cyclic usage of XIDs modulo 2 + ^ 32 + + + + + + + + | + (0) + + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > + | + (1) + | + (2) + | + (3) + | + (4) + + + + + 0: 0 .. 2 ^ 32 - 1 + 1: oldest active xid (pg_stat_activity.backend_xmin) + 2: xmin of one row version + 3: xmax of the same row version + 4: youngest xid (txid_current) + + + diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml index 730d5fdc34..e9e9f9495f 100644 --- a/doc/src/sgml/postgres.sgml +++ b/doc/src/sgml/postgres.sgml @@ -248,6 +248,7 @@ break is not needed in a wider output rendering. + &architecture; &arch-dev; &catalogs; &protocol; diff --git a/doc/src/sgml/start.sgml b/doc/src/sgml/start.sgml index 9bb5c1a6d5..abb61445f2 100644 --- a/doc/src/sgml/start.sgml +++ b/doc/src/sgml/start.sgml @@ -53,7 +53,7 @@ - Architectural Fundamentals + Client/Server Model Before we proceed, you should understand the basic @@ -68,34 +68,52 @@ client/server model. A PostgreSQL session consists of the following cooperating processes (programs): + - - - - A server process, which manages the database files, accepts - connections to the database from client applications, and - performs database actions on behalf of the clients. The - database server program is called - postgres. - postgres - - + + + + A process on the server side with the name + Postmaster. + postgres + postmaster + It accepts connection requests from client applications, starts + (forks) a new + Backend process for each of them, and passes + the connection to it. 
From that point on, the client and the new + Backend process communicate directly without intervention by the original + Postmaster process. Thus, the Postmaster process is always running, + waiting for new client connections, whereas clients and associated + Backend processes come and go. (All of this is of course invisible + to the user. We only mention it here for completeness.) + + - - - The user's client (frontend) application that wants to perform - database operations. Client applications can be very diverse - in nature: a client could be a text-oriented tool, a graphical - application, a web server that accesses the database to - display web pages, or a specialized database maintenance tool. - Some client applications are supplied with the - PostgreSQL distribution; most are - developed by users. - - + + + A group of processes on the server side, the Instance, which also + includes the Postmaster process. These processes handle the + central, shared database activities such as file access, transaction + handling, vacuum, checkpoints, and replication. The + Backend processes mentioned above delegate those actions to the Instance. + + - - + + + The user's client (frontend) application that wants to perform + database operations. Client applications can be very diverse + in nature: a client could be a text-oriented tool, a graphical + application, a web server that accesses the database to + display web pages, or a specialized database maintenance tool. + Some client applications are supplied with the + PostgreSQL distribution; most are + developed by users. + + + + As is typical of client/server applications, the client and the @@ -106,18 +124,6 @@ file name) on the database server machine. - - The PostgreSQL server can handle - multiple concurrent connections from clients. To achieve this it - starts (forks) a new process for each connection. 
- From that point on, the client and the new server process - communicate without intervention by the original - postgres process. Thus, the - supervisor server process is always running, waiting for - client connections, whereas client and associated server processes - come and go. (All of this is of course invisible to the user. We - only mention it here for completeness.) -