Even after a lot of advancement in Big Data/NoSQL technologies, more that 95 percent of the applications are still being developed using relational databases. For small to medium range applications, the preferred database of choice in open source technology is MySQL. Today, MySQL has a significant share of web application development projects.
Since Oracle’s acquisition of MySQL in 2010, significant effort has been put in improving the database’s performance. Still, as is true with all relational databases, managing heavy transaction volume and data growth with MySQL can prove to be really tricky at times. In this post, I have covered a few tricks to manage slow query performance for MySQL.
Fixing slow performing queries is a three step process:
Understanding the client domain and data model
Identifying the problem
Understanding client’s domain and data model
Before getting to any tech wizardry, the first step to solving performance issues related to slow queries is to try to get a good handle on the domain model. This helps us understand the reasoning behind the certain data model decisions and data retrieval approach. In fact it also helpful to understand the application architecture and deployment details.
Identify the problem
To identify the slow running queries, MySQL provides a slow query log which can be enabled by configuration or by using global variables. Slow running queries can be identified by analyzing this log file. In a moderately complex online application analyzing the slow query log can take quite a while. There are a lot of tools available in the market that can help you save this time. To get a visual representation of the slow queries I use HTML 5 browser based tool, “mysql-slow-query-log-visualizer [Ref-1]” which parses and analyzes MySQL slow query log and creates a visually navigable model.
Fig 2 . Sample visual report from Mysql Slow Query Log visualizer.
On Linux based platforms, MySQL provides “mysqldumpslow[Ref-2]” perl script which can summarize the MySQL slow query log files. Here is a sample summary output:
Both these tools are great to analyze slow query logs but they have a small limitation when profiling the queries based on date i.e today vs yesterday. A simple shell script can be written to analyze the logs and report the slow query for a given date. This can provide answer to the question – “ Which are my top 20 slow queries for today? “.
Following is sample output from a shell script that I have written – sqltop.sh
In addition to slow query log, MySQL provides run time data using command “show [Full] processlist [Ref-4]“ for query execution along with time taken. This is useful in solving the “too many connections” error message as well.
Besides database specific tools other database agnostic performance monitoring tools are also available to give you insights into the same.
Solving the problem
Once the slow queries are identified the next step is to optimize them following a three step process:
The first step is to understand the query, the tables and columns it uses & how its being called from the application (reference or transection data, cached or un-cached, Lazy or eager fetch, Prepared statement or statement , batch or web application ). This categorization comes in handy when we try to optimize the query or tune the server parameters.
Sometimes, while analyzing slow running stored procedures remotely, it can get a little difficult to see the complete output from this procedure . For instance, if you are connected to the database over a VPN connection, you may find yourself in a situation where the VPN connection drops before the stored procedure execution completes. To address this issue, you can execute the procedure in background and redirect the STD Error & STF Out streams to a file. Here is a small script I have written a small shell script which execute the sp in background and logs the verbose output in a log file.
echo Strated execution of ” $1 ” at `date`
Once you have figured out the basic information for the query, the next step is to benchmark it based on data volume / load i.e. record of query execution time, table(s) count and output of “Show Profile”[Ref-6] command. There are a lot of tools available which provide this feature out of the box. I have used SqlYog [Ref-5], which can give you more insight into the execution of a query by:
Using SHOW PROFILE information
Capturing the difference in STATUS variables due to execution of a query
Using EXPLAIN statement, and
Using EXPLAIN EXTENDED statement
Once the above analysis is complete, you can use the following tips to rewrite the query for optimization:
In most cases LEFT [OUTER] JOIN can be faster than an equivalent sub query —in fact this is not specific to MySQL Server alone. Prior to SQL-92, outer joins were not supported, so subqueries were the only way to do certain things. Today all databases support outer joins and that is much more performance efficient. For example, lets look at this query:
SELECT * FROM EMPLOYEE WHERE EMP_ID IN (SELECT EMP_ID FROM SALARY);
Can be rewritten as:
SELECT DISTINCT E.* FROM EMPLOYEE E, SALARY WHERE E.EMP_ID=SALARY.EMP_ID;
#Tables & Columns
MySQL has many specialized types that save disk space and memory. For example, use the smaller integer types if possible to get smaller tables. MEDIUMINT is often a better choice than INT because a MEDIUMINT column uses 25% less space. Similarly declare columns to be NOT NULL if possible. It makes SQL operations faster, by enabling better use of indexes and eliminating overhead for testing whether each value is NULL. You also save some storage space, one bit per column. If you really need NULL values in your tables, use them. Just avoid the default setting that allows NULL values in every column.
Now everyone knows that an index can make data retrieval faster as it lets the database avoid the full table scan. But redundant indexes can slow down the queries as MySQL has to examine more query plans. One should only define the index which are frequently used. Having unused indexes make insert and update slower.
Generally, the entire value of a column is used to build the index – this is fine for small data types (integers and the likes) but can result in a really big index size for larger data types (CHAR and VARCHAR, for example). Using an index prefix i.e index on subset of column can make the index size smaller. Once an index is added we can check if that it being used “explain[Ref-7]”. Over time, with new releases going into production, redundant databases find their way into production systems. You should review your index usage every once in a month to see if there are any indexes which are not used any longer. Such indexes should be pruned regularly for better performance.
Use the correct transection isolation level in your transactions. Read uncommitted isolation level may result in deadlocks for multithreaded application. From version 5.1, read committed does not lock the records under the select part of the insert / select, so it does not block other transactions [Ref-3].
#Storage Engine Optimization
With MySQL 5.5, the default DB engine is InnoDB. But you can also choose a different engine for each table. InnoDB provides row locking which allows much more threads to use the table than MyISAM. But InnoDB needs more resources than MyISAM. So, in case you are looking for full text search you should use MyISAM table . A word of caution though, at medium to heavy scale, Database full text search does not perform well. You should use specialized technologies like Solr or Sphinx for heavy volume full text searches.
In order to optimize the disc I/O, the right value for “innodb_buffer_pool_size” should be mentioned in the server configurations. When table data is cached in the InnoDB buffer pool, it can be processed over and over by queries without requiring any disk I/O. Similarly while working with large number of tables, increase the default value of “innodb_open_files” parameter for better performance.
#Replication & Sharding
In certain cases, esp. when dealing with really large data tables, you may not get the desired performance even after employing the above mentioned tips. In these cases, you will need to use “My Table Partition”, “MySQL Replication” or “Sharding”. Table partitioning gives you better scalability by dividing the table data in different physical segments. Replication or Sharding achieves better performance through redundancy across multiple MySQL instances.
Like other performance management areas, managing performance of MySQL based database is an ongoing thing. The performance needs to be periodically monitored and analyzed after every release or as the data volume grows. There are various monitoring tools available in market which can help you do that. I have used nagios – “check mysql plugin” and MonYog in the past.