Fix crashing innodb due to innobase_share usage_count not being reduced for temp table accesses #5212

Open
wants to merge 1 commit into
base: 5.7

lottaquestions

Description

The MySQL engine is designed on the assumption that a temporary table is only ever accessed from the thread that created it. Cross-thread access to temporary tables is not expected and therefore not designed for. However, when the information schema is queried for details about a temporary table, the table is accessed from a thread other than the one that created it, which breaks that assumption.

When this cross-thread access happens for a temporary table that uses the InnoDB storage engine, the INNOBASE_SHARE use_count for the table is incremented but never decremented when the non-owning thread completes. In certain situations this can crash the database engine, when an invariant in the function innobase_build_index_translation fails on assertion. For example, we found a use case that causes MySQL to crash with the assertion failure below:

(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007f3132141859 in __GI_abort () at abort.c:79
#2  0x000055ed7452a19a in ut_dbg_assertion_failed (expr=0x55ed7489f848 "share->idx_trans_tbl.index_count == mysql_num_index",
    file=0x55ed7489ea90 "/github/MySQL5.7/mysql-server/storage/innobase/handler/ha_innodb.cc", line=5680)
    at /github/MySQL5.7/mysql-server/storage/innobase/ut/ut0dbg.cc:75
#3  0x000055ed7437ecda in innobase_build_index_translation (table=0x7f30bcaaa4d0, ib_table=0x7f30bc023888, share=0x7f30b002c100)
    at /github/MySQL5.7/mysql-server/storage/innobase/handler/ha_innodb.cc:5680
#4  0x000055ed7437fe6d in ha_innobase::open (this=0x7f30bcaa6a10, name=0x7f30bcaab1c0 "./test/#sql-bae1d_3", mode=2, test_if_locked=2)
    at /github/MySQL5.7/mysql-server/storage/innobase/handler/ha_innodb.cc:6115

The issue is fixed by allowing the use_count of a temporary table's INNOBASE_SHARE object to be decremented during query cleanup whenever the temporary table is accessed from a non-owning thread. The decrement is performed inside a critical region protected by the innobase_share_mutex mutex.
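
The reference-counting problem and the shape of the fix can be illustrated with a toy model. This is only a sketch and not InnoDB code: `Share`, `ShareRegistry`, and their members are hypothetical names, and the idea that a leaked reference keeps a stale cached index count alive for the next temporary table with the same name is an assumption inferred from the assertion expression and the reproduction steps, not something the patch itself states.

```cpp
#include <map>
#include <mutex>
#include <string>

// Hypothetical stand-in for INNOBASE_SHARE: one object per table name that
// caches the number of indexes seen when the share was first populated.
struct Share {
    unsigned use_count = 0;    // open references to this share
    unsigned index_count = 0;  // cached index count; 0 means "not built yet"
};

// Hypothetical registry of shares keyed by table name. Its single mutex
// plays the role that innobase_share_mutex plays in the description above.
class ShareRegistry {
public:
    // Take a reference on the share for `name`. Returns false when a cached
    // index count left by an earlier user disagrees with `num_index`, which
    // is the condition the failing assertion checks for.
    bool open(const std::string& name, unsigned num_index) {
        std::lock_guard<std::mutex> guard(mutex_);
        Share& share = shares_[name];  // creates the share on first use
        ++share.use_count;
        if (share.index_count == 0) share.index_count = num_index;
        return share.index_count == num_index;
    }

    // Drop one reference. The decrement happens inside the critical section,
    // and the share is destroyed only when the last reference goes away.
    void close(const std::string& name) {
        std::lock_guard<std::mutex> guard(mutex_);
        auto it = shares_.find(name);
        if (it == shares_.end()) return;
        if (--it->second.use_count == 0) shares_.erase(it);
    }

    // Current reference count for `name`, or 0 if no share exists.
    unsigned use_count(const std::string& name) {
        std::lock_guard<std::mutex> guard(mutex_);
        auto it = shares_.find(name);
        return it == shares_.end() ? 0 : it->second.use_count;
    }

private:
    std::mutex mutex_;
    std::map<std::string, Share> shares_;
};
```

In this model, if the owning thread and a second thread each call `open(name, 2)` but only the owner calls `close(name)`, the share survives with a count of 1, and a later `open(name, 3)` for a new table that reuses the name reports a mismatch. When both threads call `close(name)`, the share is destroyed and the later `open(name, 3)` succeeds.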

Testing

Environment Preparation

Run the following statements against a MySQL 5.7.x instance:

create database test;
use test;
create table t_with_1_index(id int primary key,id1 int, key idx(id1));
create table t_with_2_index(id int primary key,id1 int, key idx(id1),key idx1(id1));

Reproduction of conditions that would cause a crash due to cross-thread temporary table access

Session 1:

In one terminal session, run the mysqlslap command below to send two ALTER statements, one after another, in a continuous loop to the MySQL 5.7.x instance:

mysqlslap -umsandbox -pmsandbox -h127.0.0.1 -P5741 --create-schema=information_schema --concurrency=1 --iterations=1000000 --create-schema=test --number-of-queries=10000000 --query="alter table t_with_1_index engine=innodb, algorithm=copy;alter table t_with_2_index engine=innodb, algorithm=copy;"

Session 2:

Open another terminal session, connect to the MySQL instance, and:

  1. Run "show full processlist;" and note the Id of the thread running the ALTER.
  2. Run "pgrep mysqld" to get the pid of mysqld. If a sandbox is enabled, enter the sandbox and get the pid there.

Here are the IDs in my case:

mysql [localhost:5741] {msandbox} ((none)) > show full processlist;
+----+----------+-----------------+------+---------+------+-------------+----------------------------------------------------------+
| Id | User     | Host            | db   | Command | Time | State       | Info                                                     |
+----+----------+-----------------+------+---------+------+-------------+----------------------------------------------------------+
|  9 | msandbox | localhost:60342 | NULL | Sleep   |    6 |             | NULL                                                     |
| 10 | msandbox | localhost:60344 | test | Query   |    0 | System lock | alter table t_with_2_index engine=innodb, algorithm=copy |
| 11 | msandbox | localhost       | NULL | Query   |    0 | starting    | show full processlist                                    |
+----+----------+-----------------+------+---------+------+-------------+----------------------------------------------------------+
3 rows in set (0.00 sec)

mysql [localhost:5741] {msandbox} ((none)) > \q
Bye
[ec2-user@ip-172-31-54-9 msb_5_7_41]$ ps auxf | grep 5741
ec2-user 1216361  0.1  0.0 142008 10552 pts/6    Sl+  02:28   0:00  |           \_ mysqlslap -umsandbox -px xxxxxx -h127.0.0.1 -P5741 --create-schema=information_schema --concurrency=1 --iterations=1000000 --number-of-queries=10000000 --create-schema=test --query=alter table t_with_1_index engine=innodb, algorithm=copy;alter table t_with_2_index engine=innodb, algorithm=copy;
ec2-user 1216367  0.0  0.0  12108  1080 pts/7    S+   02:30   0:00  |           \_ grep --color=auto 5741
ec2-user  194565  0.0  0.0  60408 14216 pts/2    S+   Jul12   0:00  |   \_ /home/ec2-user/opt/mysql/bin/mysql -uroot -px xxxxxx -P5741 -h127.0.0.1
ec2-user 1214947 16.5  0.2 1984352 318844 ?      Sl   02:21   1:31  \_ /home/ec2-user/opt/mysql/5.7.41/bin/mysqld --defaults-file=/home/ec2-user/sandboxes/msb_5_7_41/my.sandbox.cnf --basedir=/home/ec2-user/opt/mysql/5.7.41 --datadir=/home/ec2-user/sandboxes/msb_5_7_41/data --plugin-dir=/home/ec2-user/opt/mysql/5.7.41/lib/plugin --log-error=/home/ec2-user/sandboxes/msb_5_7_41/data/msandbox.err --pid-file=/home/ec2-user/sandboxes/msb_5_7_41/data/mysql_sandbox5741.pid --socket=/tmp/mysql_sandbox5741.sock --port=5741

  1. Calculate the hexadecimal values of both the pid and the processlist_id:

     [ec2-user@ip-172-31-54-9 msb_5_7_41]$ printf '%x\n' 1214947
     1289e3
     [ec2-user@ip-172-31-54-9 msb_5_7_41]$ printf '%x\n' 10
     a
    
  2. Connect to MySQL and run the following query, where table_name is #sql-<hex(pid)>_<hex(processlist_id)>. Below is the example in my case:

     mysql [localhost:5741] {msandbox} ((none)) > select table_rows from information_schema.tables where table_name='#sql-1289e3_a' and table_schema='test';
     +------------+
     | table_rows |
     +------------+
     |          0 |
     +------------+
     1 row in set (0.00 sec)
    
  3. Without the fix, the first session exits because the server has crashed; with the fix, the crash does not occur.
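
The table name queried in step 2 can also be derived programmatically from the two IDs. The helper below is a minimal sketch; `temp_table_name` is a hypothetical name, and it only reproduces the `#sql-<hex(pid)>_<hex(processlist_id)>` pattern described above with lowercase hex digits.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: builds "#sql-<hex(pid)>_<hex(processlist_id)>",
// matching what the two printf '%x\n' calls in the steps above produce.
std::string temp_table_name(unsigned long pid, unsigned long processlist_id) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "#sql-%lx_%lx", pid, processlist_id);
    return std::string(buf);
}
```

With the values from this walkthrough, `temp_table_name(1214947, 10)` yields `#sql-1289e3_a`, the name used in the information_schema query.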

All new code of the whole pull request, including one or several files that are either new files or modified ones, are contributed under the BSD-new license. I am contributing on behalf of my employer Amazon Web Services, Inc.

@VarunNagaraju
Contributor

Hi @lottaquestions, thanks a lot for reporting this issue. We were able to reproduce it. Regarding the fix, we need to analyze it further and might fix it slightly differently. We have created a JIRA ticket to keep track of the issue: https://perconadev.atlassian.net/browse/PS-9108. Any updates regarding the issue will be posted there.
