Previous Page
Next Page

5.12. Comparison of Statement-Based Versus Row-Based Replication

Each binary logging format has advantages and disadvantages. This section summarizes them to give you a better basis for choosing the format most appropriate for your situation.

Advantages of statement-based replication:

  • Proven technology that has existed in MySQL since 3.23.

  • Smaller log files. When updates or deletes affect many rows, much smaller log files. Smaller log files require less storage space and are faster to back up.

  • Log files contain all statements that made any changes, so they can be used to audit the database.

  • Log files can be used for point-in-time recovery, not just for replication purposes. See Section 4.10.3, "Point-in-Time Recovery."

  • A slave can be a newer version of MySQL with a different row structure.

Disadvantages of statement-based replication:

  • Not all UPDATE statements can be replicated: Any non-deterministic behavior (for example, when using random functions in an SQL statement) is hard to replicate when using statement-based replication. For statements that use a non-deterministic user-defined function (UDF), it is not possible to replicate the result using statement-based replication, whereas row-based replication will just replicate the value returned by the UDF.

  • Statements cannot be replicated properly if they use a UDF that is non-deterministic (its value depends on things other than the given parameters).

  • Statements that use one of the following functions cannot be replicated properly:

    • LOAD_FILE()

    • UUID()

    • USER()

    • FOUND_ROWS()

    All other functions are replicated correctly (including RAND(), NOW(), LOAD DATA INFILE, and so forth).

  • INSERT ... SELECT requires a greater number of row-level locks than with row-based replication.

  • UPDATE statements that require a table scan (because no index is used in the WHERE clause) must lock a greater number of rows than with row-based replication.

  • For InnoDB: An INSERT statement that uses AUTO_INCREMENT blocks other non-conflicting INSERT statements.

  • Slower to apply data on slave for complex queries.

  • Stored functions (not stored procedures) will execute with the same NOW() value as the calling statement. (This may be regarded both as a bad thing and a good thing.)

  • Deterministic UDFs must be applied on the slaves.

  • When getting something wrong on the slave, the difference between master and slave will grow with time.

  • Tables have to be (almost) identical on master and slave.

Advantages of row-based replication:

  • Everything can be replicated. This is the safest form of replication. Note that currently, DDL (data definition language) statements such as CREATE TABLE are replicated using statement-based replication, while DML (data manipulation language) statements, as well as GRANT and REVOKE statements, are replicated using row-based-replication. For statements like CREATE ... SELECT, a CREATE statement is generated from the table definition and replicated statement-based, while the row insertions are replicated row-based.

  • The technology is the same as most other database management systems; knowledge about other systems transfers to MySQL.

  • In many cases, it is faster to apply data on the slave for tables that have primary keys.

  • Fewer locks are needed (and thus higher concurrency) on the master for the following types of statements:

    • INSERT ... SELECT

    • INSERT statements with AUTO_INCREMENT

    • UPDATE or DELETE statements with WHERE clauses that don't use keys or don't change most of the examined rows

  • Fewer locks on the slave for any INSERT, UPDATE, or DELETE statement.

  • It's possible to add multiple threads to apply data on the slave in the future (works better on SMP machines).

Disadvantages of row-based replication:

  • Larger log files (much larger in some cases).

  • Binary log will contain data for large statements that were rolled back.

  • When using row-based replication to replicate a statement (for example, an UPDATE or DELETE statement), each changed row must be written to the binary log. In contrast, when using statement-based replication, only the statement is written to the binary log. If the statement changes many rows, row-based replication may write significantly more data to the binary log. In these cases the binary log will be locked for a longer time to write the data, which may cause concurrency problems.

  • Deterministic UDFs that generate large BLOB values will be notably slower to replicate.

  • You cannot examine the logs to see what statements were executed.

  • You cannot see on the slave what statements were received from the master and executed.

  • When making a bulk operation that includes non-transactional storage engines, changes are applied as the statement executes. With row-based replication logging, this means that the binary log is written while the statement is running. On the master, this doesn't provide any problems with concurrency, because tables are locked until the bulk operation terminates. On the slave server, however, tables aren't locked while the slave applies changes, because it doesn't know that those changes are part of a bulk operation.

    In that scenario, if you retrieve data from a table on the master (for example, SELECT * FROM table_name), the server will wait for the bulk operation to complete before executing the SELECT statement, because the table is read-locked. On the slave, the server won't wait (because there is no lock). This means that until the "bulk operation" on the slave has completed you will get different results for the same SELECT query on the master and on the slave.

    This behavior will eventually change, but until it does, you should probably use statement-based replication in a scenario like this.


Previous Page
Next Page