Some information about database access in MediaWiki.
By Tim Starling, January 2006.

------------------------------------------------------------------------
 Database layout
------------------------------------------------------------------------

For information about the MediaWiki database layout, such as a
description of the tables and their contents, please see:
 https://www.mediawiki.org/wiki/Manual:Database_layout
 https://phabricator.wikimedia.org/diffusion/MW/browse/master/maintenance/tables.sql


------------------------------------------------------------------------
 API
------------------------------------------------------------------------

To make a read query, something like this usually suffices:

$dbr = wfGetDB( DB_REPLICA );
$res = $dbr->select( /* ...see docs... */ );
foreach ( $res as $row ) {
	...
}

For a write query, use something like:

$dbw = wfGetDB( DB_MASTER );
$dbw->insert( /* ...see docs... */ );

We use the convention $dbr for read and $dbw for write to help you keep
track of whether the database object is a slave (read-only) or a master
(read/write). If you write to a slave, the world will explode. Or to be
precise, a subsequent write query which succeeded on the master may fail
when replicated to the slave due to a unique key collision. Replication
on the slave will stop, and it may take hours to repair the database and
get it back online. Setting read_only in my.cnf on the slave will avoid
this scenario, but given the dire consequences, we prefer to have as
many checks as possible.

We provide a query() function for raw SQL, but the wrapper functions
like select() and insert() are usually more convenient. They take care
of things like table prefixes and escaping for you. If you really need
to write your own SQL, please read the documentation for tableName() and
addQuotes(). You will need both of them.
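As a rough sketch of how those two helpers fit together (the table and
column names below are only an illustration, not a recommendation to
hand-write this particular query):

$dbr = wfGetDB( DB_REPLICA );
// tableName() adds the configured table prefix and any needed quoting
$table = $dbr->tableName( 'page' );
// addQuotes() escapes the value and wraps it in quotes
$encTitle = $dbr->addQuotes( 'Main_Page' );
$res = $dbr->query(
	"SELECT page_id FROM $table WHERE page_namespace = 0 AND page_title = $encTitle",
	__METHOD__
);
foreach ( $res as $row ) {
	// use $row->page_id
}

In nearly all cases, though, select() with an array of conditions is
both shorter and safer than building the SQL yourself.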

------------------------------------------------------------------------
 Basic query optimisation
------------------------------------------------------------------------

MediaWiki developers who need to write DB queries should have some
understanding of databases and the performance issues associated with
them. Patches containing unacceptably slow features will not be
accepted. Unindexed queries are generally not welcome in MediaWiki,
except in special pages derived from QueryPage. It's a common pitfall
for new developers to submit code containing SQL queries which examine
huge numbers of rows. Remember that COUNT(*) is O(N); counting rows in
a table is like counting beans in a bucket.


------------------------------------------------------------------------
 Replication
------------------------------------------------------------------------

The largest installation of MediaWiki, Wikimedia, uses a large set of
slave MySQL servers replicating writes made to a master MySQL server.
It is important to understand the issues associated with this setup if
you want to write code destined for Wikipedia.

It's often the case that the best algorithm to use for a given task
depends on whether or not replication is in use. Due to our unabashed
Wikipedia-centrism, we often just use the replication-friendly version,
but if you like, you can use wfGetLB()->getServerCount() > 1 to check
whether replication is in use.

=== Lag ===

Lag primarily occurs when large write queries are sent to the master.
Writes on the master are executed in parallel, but they are executed
serially when they are replicated to the slaves. The master writes the
query to the binlog when the transaction is committed. The slaves poll
the binlog and start executing the query as soon as it appears. They
can service reads while they are performing a write query, but will not
read anything more from the binlog and thus will perform no more
writes. This means that if the write query runs for a long time, the
slaves will lag behind the master for the time it takes for the write
query to complete.

Lag can be exacerbated by high read load. MediaWiki's load balancer
will stop sending reads to a slave when it is lagged by more than 30
seconds. If the load ratios are set incorrectly, or if there is too
much load generally, this may lead to a slave permanently hovering
around 30 seconds of lag.

If all slaves are lagged by more than 30 seconds, MediaWiki will stop
writing to the database. All edits and other write operations will be
refused, with an error returned to the user. This gives the slaves a
chance to catch up. Before we had this mechanism, the slaves would
regularly lag by several minutes, making review of recent edits
difficult.

In addition to this, MediaWiki attempts to ensure that the user sees
events occurring on the wiki in chronological order. A few seconds of
lag can be tolerated, as long as the user sees a consistent picture
from subsequent requests. This is done by saving the master binlog
position in the session, and then at the start of each request, waiting
for the slave to catch up to that position before doing any reads from
it. If this wait times out, reads are allowed anyway, but the request
is considered to be in "lagged slave mode". Lagged slave mode can be
checked by calling wfGetLB()->getLaggedSlaveMode(). The only practical
consequence at present is a warning displayed in the page footer.

=== Lag avoidance ===

To avoid excessive lag, queries which write large numbers of rows
should be split up, generally to write one row at a time. Multi-row
INSERT ... SELECT queries are the worst offenders and should be avoided
altogether. Instead, do the select first and then the insert.

=== Working with lag ===

Despite our best efforts, it's not practical to guarantee a low-lag
environment. Lag will usually be less than one second, but may
occasionally be up to 30 seconds. For scalability, it's very important
to keep load on the master low, so simply sending all your queries to
the master is not the answer. When you have a genuine need for
up-to-date data, the following approach is advised:

1) Do a quick query to the master for a sequence number or timestamp.
2) Run the full query on the slave and check whether it matches the
   data you got from the master.
3) If it doesn't, run the full query on the master.

To avoid swamping the master every time the slaves lag, use of this
approach should be kept to a minimum. In most cases you should just
read from the slave and let the user deal with the delay.
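As a rough illustration of that pattern, here is a sketch which uses
page_touched as the freshness marker; the table, columns and the
$pageId variable are just an example, not a prescribed interface:

$pageId = 12345; // example page ID
$dbw = wfGetDB( DB_MASTER );
$dbr = wfGetDB( DB_REPLICA );

// 1) Cheap query on the master: fetch only a timestamp
$masterTouched = $dbw->selectField( 'page', 'page_touched',
	[ 'page_id' => $pageId ], __METHOD__ );

// 2) Full query on the slave
$row = $dbr->selectRow( 'page', [ 'page_touched', 'page_latest' ],
	[ 'page_id' => $pageId ], __METHOD__ );

// 3) If the slave copy is missing or stale, repeat on the master
if ( !$row || $row->page_touched < $masterTouched ) {
	$row = $dbw->selectRow( 'page', [ 'page_touched', 'page_latest' ],
		[ 'page_id' => $pageId ], __METHOD__ );
}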

------------------------------------------------------------------------
 Lock contention
------------------------------------------------------------------------

Due to the high write rate on Wikipedia (and some other wikis),
MediaWiki developers need to be very careful to structure their writes
to avoid long-lasting locks. By default, MediaWiki opens a transaction
at the first query, and commits it before the output is sent. Locks
will be held from the time when the query is done until the commit. So
you can reduce lock time by doing as much processing as possible before
you do your write queries.

Often this approach is not good enough, and it becomes necessary to
enclose small groups of queries in their own transaction. Use the
following syntax:

$dbw = wfGetDB( DB_MASTER );
$dbw->begin( __METHOD__ );
/* Do queries */
$dbw->commit( __METHOD__ );

Use of locking reads (e.g. the FOR UPDATE clause) is not advised. They
are poorly implemented in InnoDB and will cause regular deadlock
errors. It's also surprisingly easy to cripple the wiki with lock
contention.

Instead of locking reads, combine your existence checks into your write
queries, by using an appropriate condition in the WHERE clause of an
UPDATE, or by using unique indexes in combination with INSERT IGNORE.
Then use the affected row count to see if the query succeeded.

------------------------------------------------------------------------
 Supported DBMSs
------------------------------------------------------------------------

MediaWiki is written primarily for use with MySQL. Queries are
optimized for it, and its schema is considered the canonical version.
However, MediaWiki does support the following other DBMSs to varying
degrees:

* PostgreSQL
* SQLite
* Oracle
* MSSQL

More information about each of these databases (known issues, level of
support, extra configuration) can be found in the "databases"
subdirectory of this folder.

------------------------------------------------------------------------
 Use of GROUP BY
------------------------------------------------------------------------

MySQL supports GROUP BY without checking anything in the SELECT clause.
Other DBMSs (especially Postgres) are stricter and require that all the
non-aggregate items in the SELECT clause appear in the GROUP BY. For
this reason, using SELECT * with GROUP BY queries is strongly
discouraged.
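For instance, a query counting pages per namespace can name every
non-aggregate column explicitly; this sketch uses the core page table
purely as an illustration:

$dbr = wfGetDB( DB_REPLICA );
$res = $dbr->select(
	'page',
	// Every non-aggregate column is listed, and also appears in GROUP BY
	[ 'page_namespace', 'total' => 'COUNT(*)' ],
	'',
	__METHOD__,
	[ 'GROUP BY' => 'page_namespace' ]
);
foreach ( $res as $row ) {
	// use $row->page_namespace and $row->total
}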