diff --git a/www/wiki/docs/database.txt b/www/wiki/docs/database.txt
new file mode 100644
index 00000000..dbc92044
--- /dev/null
+++ b/www/wiki/docs/database.txt
@@ -0,0 +1,195 @@
+Some information about database access in MediaWiki.
+By Tim Starling, January 2006.
+
+------------------------------------------------------------------------
+ Database layout
+------------------------------------------------------------------------
+
+For information about the MediaWiki database layout, such as a
+description of the tables and their contents, please see:
+ https://www.mediawiki.org/wiki/Manual:Database_layout
+ https://phabricator.wikimedia.org/diffusion/MW/browse/master/maintenance/tables.sql
+
+
+------------------------------------------------------------------------
+ API
+------------------------------------------------------------------------
+
+To make a read query, something like this usually suffices:
+
+$dbr = wfGetDB( DB_REPLICA );
+$res = $dbr->select( /* ...see docs... */ );
+foreach ( $res as $row ) {
+ ...
+}
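+
+For example, a concrete lookup against the core page table might look
+like this (a minimal sketch; which fields and conditions you need will
+depend on your code):
+
+$dbr = wfGetDB( DB_REPLICA );
+$res = $dbr->select(
+	'page',                          // table name, without prefix
+	[ 'page_id', 'page_title' ],     // fields to SELECT
+	[ 'page_namespace' => NS_MAIN ], // WHERE conditions, escaped for you
+	__METHOD__                       // caller name, for query profiling
+);
+foreach ( $res as $row ) {
+	// Each row is an object with one property per selected field
+	wfDebug( "Found page {$row->page_id}: {$row->page_title}\n" );
+}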
+
+For a write query, use something like:
+
+$dbw = wfGetDB( DB_MASTER );
+$dbw->insert( /* ...see docs... */ );
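+
+For example (a sketch only; "mytable" and its mt_* columns are
+hypothetical):
+
+$dbw = wfGetDB( DB_MASTER );
+$dbw->insert(
+	'mytable',                                      // hypothetical table
+	[ 'mt_page' => $pageId, 'mt_value' => $value ], // column => value map
+	__METHOD__
+);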
+
+We use the convention $dbr for read and $dbw for write to help you keep
+track of whether the database object is a slave (read-only) or a master
+(read/write). If you write to a slave, the world will explode. Or to be
+precise, a subsequent write query which succeeded on the master may fail
+when replicated to the slave due to a unique key collision. Replication
+on the slave will stop and it may take hours to repair the database and
+get it back online. Setting read_only in my.cnf on the slave will avoid
+this scenario, but given the dire consequences, we prefer to have as
+many checks as possible.
+
+We provide a query() function for raw SQL, but the wrapper functions
+like select() and insert() are usually more convenient. They take care
+of things like table prefixes and escaping for you. If you really need
+to make your own SQL, please read the documentation for tableName() and
+addQuotes(). You will need both of them.
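+
+For instance, a hand-written query should pass table names through
+tableName() and user-supplied values through addQuotes(), roughly like
+this (hypothetical table and columns again):
+
+$dbr = wfGetDB( DB_REPLICA );
+$table = $dbr->tableName( 'mytable' );  // applies the wiki's table prefix
+$value = $dbr->addQuotes( $userInput ); // quotes and escapes the value
+$res = $dbr->query(
+	"SELECT mt_id FROM $table WHERE mt_value = $value",
+	__METHOD__
+);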
+
+
+------------------------------------------------------------------------
+ Basic query optimisation
+------------------------------------------------------------------------
+
+MediaWiki developers who need to write DB queries should have some
+understanding of databases and the performance issues associated with
+them. Patches containing unacceptably slow features will not be
+accepted. Unindexed queries are generally not welcome in MediaWiki,
+except in special pages derived from QueryPage. It's a common pitfall
+for new developers to submit code containing SQL queries which examine
+huge numbers of rows. Remember that COUNT(*) is O(N); counting rows in a
+table is like counting beans in a bucket.
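+
+If an approximate answer is enough, Database::estimateRowCount() asks
+the optimizer for an estimate (via EXPLAIN on MySQL) instead of
+scanning the table. A sketch, with a hypothetical table:
+
+$dbr = wfGetDB( DB_REPLICA );
+// Index-based estimate: cheap but approximate, so do not use it
+// anywhere an exact count is required
+$rows = $dbr->estimateRowCount( 'mytable', '*', [ 'mt_type' => 'x' ], __METHOD__ );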
+
+
+------------------------------------------------------------------------
+ Replication
+------------------------------------------------------------------------
+
+The largest installation of MediaWiki, Wikimedia, uses a large set of
+slave MySQL servers replicating writes made to a master MySQL server. It
+is important to understand the issues associated with this setup if you
+want to write code destined for Wikipedia.
+
+It's often the case that the best algorithm to use for a given task
+depends on whether or not replication is in use. Due to our unabashed
+Wikipedia-centrism, we often just use the replication-friendly version,
+but if you like, you can use wfGetLB()->getServerCount() > 1 to check
+whether replication is in use.
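+
+For example (a minimal sketch; the branch bodies are placeholders):
+
+if ( wfGetLB()->getServerCount() > 1 ) {
+	// Replicated setup: use the lag-friendly algorithm
+} else {
+	// Single server: a simpler or locking approach may be fine
+}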
+
+=== Lag ===
+
+Lag primarily occurs when large write queries are sent to the master.
+Writes on the master are executed in parallel, but they are executed in
+serial when they are replicated to the slaves. The master writes the
+query to the binlog when the transaction is committed. The slaves poll
+the binlog and start executing the query as soon as it appears. They can
+service reads while they are performing a write query, but will not read
+anything more from the binlog and thus will perform no more writes. This
+means that if the write query runs for a long time, the slaves will lag
+behind the master for the time it takes for the write query to complete.
+
+Lag can be exacerbated by high read load. MediaWiki's load balancer will
+stop sending reads to a slave when it is lagged by more than 30 seconds.
+If the load ratios are set incorrectly, or if there is too much load
+generally, this may lead to a slave permanently hovering around 30
+seconds lag.
+
+If all slaves are lagged by more than 30 seconds, MediaWiki will stop
+writing to the database. All edits and other write operations will be
+refused, with an error returned to the user. This gives the slaves a
+chance to catch up. Before we had this mechanism, the slaves would
+regularly lag by several minutes, making review of recent edits
+difficult.
+
+In addition to this, MediaWiki attempts to ensure that the user sees
+events occurring on the wiki in chronological order. A few seconds of lag
+can be tolerated, as long as the user sees a consistent picture from
+subsequent requests. This is done by saving the master binlog position
+in the session, and then at the start of each request, waiting for the
+slave to catch up to that position before doing any reads from it. If
+this wait times out, reads are allowed anyway, but the request is
+considered to be in "lagged slave mode". Lagged slave mode can be
+checked by calling wfGetLB()->getLaggedSlaveMode(). The only
+practical consequence at present is a warning displayed in the page
+footer.
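+
+For example (a sketch):
+
+if ( wfGetLB()->getLaggedSlaveMode() ) {
+	// Reads during this request may be stale; degrade gracefully
+}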
+
+=== Lag avoidance ===
+
+To avoid excessive lag, queries which write large numbers of rows should
+be split up, generally to write one row at a time. Multi-row INSERT ...
+SELECT queries are the worst offenders and should be avoided altogether.
+Instead, do the select first and then the insert.
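+
+A sketch of the replication-friendly version, with hypothetical source
+and destination tables:
+
+$dbr = wfGetDB( DB_REPLICA );
+$dbw = wfGetDB( DB_MASTER );
+$conds = [ /* your WHERE conditions */ ];
+$res = $dbr->select( 'source_table', [ 'src_a', 'src_b' ], $conds, __METHOD__ );
+foreach ( $res as $row ) {
+	// One small write at a time keeps each replicated query fast
+	$dbw->insert(
+		'dest_table',
+		[ 'dst_a' => $row->src_a, 'dst_b' => $row->src_b ],
+		__METHOD__
+	);
+}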
+
+=== Working with lag ===
+
+Despite our best efforts, it's not practical to guarantee a low-lag
+environment. Lag will usually be less than one second, but may
+occasionally be up to 30 seconds. For scalability, it's very important
+to keep load on the master low, so simply sending all your queries to
+the master is not the answer. When you have a genuine need for
+up-to-date data, the following approach is advised:
+
+1) Do a quick query to the master for a sequence number or timestamp.
+2) Run the full query on the slave and check if it matches the data you
+   got from the master.
+3) If it doesn't, run the full query on the master.
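+
+A sketch of that sequence, using a hypothetical mt_timestamp column:
+
+$dbw = wfGetDB( DB_MASTER );
+$dbr = wfGetDB( DB_REPLICA );
+$conds = [ /* your WHERE conditions */ ];
+// 1) Cheap query on the master
+$masterTs = $dbw->selectField( 'mytable', 'MAX(mt_timestamp)', '', __METHOD__ );
+// 2) Full query on the slave, plus the same cheap check there
+$res = $dbr->select( 'mytable', '*', $conds, __METHOD__ );
+$slaveTs = $dbr->selectField( 'mytable', 'MAX(mt_timestamp)', '', __METHOD__ );
+// 3) The slave is behind: repeat the full query on the master
+if ( $slaveTs !== $masterTs ) {
+	$res = $dbw->select( 'mytable', '*', $conds, __METHOD__ );
+}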
+
+To avoid swamping the master every time the slaves lag, use of this
+approach should be kept to a minimum. In most cases you should just read
+from the slave and let the user deal with the delay.
+
+
+------------------------------------------------------------------------
+ Lock contention
+------------------------------------------------------------------------
+
+Due to the high write rate on Wikipedia (and some other wikis),
+MediaWiki developers need to be very careful to structure their writes
+to avoid long-lasting locks. By default, MediaWiki opens a transaction
+at the first query, and commits it before the output is sent. Locks will
+be held from the time when the query is done until the commit. So you
+can reduce lock time by doing as much processing as possible before you
+do your write queries.
+
+Often this approach is not good enough, and it becomes necessary to
+enclose small groups of queries in their own transaction. Use the
+following syntax:
+
+$dbw = wfGetDB( DB_MASTER );
+$dbw->begin( __METHOD__ );
+/* Do queries */
+$dbw->commit( __METHOD__ );
+
+Use of locking reads (e.g. the FOR UPDATE clause) is not advised. They
+are poorly implemented in InnoDB and will cause regular deadlock errors.
+It's also surprisingly easy to cripple the wiki with lock contention.
+
+Instead of locking reads, combine your existence checks into your write
+queries, by using an appropriate condition in the WHERE clause of an
+UPDATE, or by using unique indexes in combination with INSERT IGNORE.
+Then use the affected row count to see if the query succeeded.
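+
+For example, with a hypothetical unique index on mt_key:
+
+$dbw = wfGetDB( DB_MASTER );
+$dbw->insert(
+	'mytable',
+	[ 'mt_key' => $key, 'mt_value' => $value ],
+	__METHOD__,
+	[ 'IGNORE' ] // skip, rather than fail on, unique key collisions
+);
+if ( $dbw->affectedRows() === 0 ) {
+	// The row already existed, so nothing was written
+}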
+
+------------------------------------------------------------------------
+ Supported DBMSs
+------------------------------------------------------------------------
+
+MediaWiki is written primarily for use with MySQL. Queries are optimized
+for it and its schema is considered the canonical version. However,
+MediaWiki does support the following other DBMSs to varying degrees.
+
+* PostgreSQL
+* SQLite
+* Oracle
+* MSSQL
+
+More information about each of these databases (known issues, level of
+support, extra configuration) can be found in the "databases"
+subdirectory in this folder.
+
+------------------------------------------------------------------------
+ Use of GROUP BY
+------------------------------------------------------------------------
+
+MySQL supports GROUP BY without checking anything in the SELECT clause.
+Other DBMSs (especially Postgres) are stricter and require that all the
+non-aggregate items in the SELECT clause appear in the GROUP BY. For
+this reason, using SELECT * in GROUP BY queries is strongly
+discouraged.
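+
+For example, list the grouped fields explicitly (hypothetical table):
+
+$dbr = wfGetDB( DB_REPLICA );
+$res = $dbr->select(
+	'mytable',
+	[ 'mt_type', 'COUNT(*) AS n' ], // every non-aggregate field is in GROUP BY
+	'',
+	__METHOD__,
+	[ 'GROUP BY' => 'mt_type' ]
+);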
+