Content extract
25 Point Basic MySQL Setup/Optimization Checklist by jestep Daily I run into new web programmers that are using PHP and MySQL to create their blogs and websites. I created this checklist as a guide for new and experienced to make sure they are covering the basics of a MySQL server setup. This guide is by no means all inclusive, but should help to cover some of the major gaps in knowledge and commonly overlooked fundamentals that I run into on a daily basis. The checklist is separated into 5 equal sections: Server Setup, Schema Design, Table Design, Index Optimization, Query Optimization, and a 6th Bonus Tips section. 1. Section 1 – Server Setup 1. Root User For security reasons, the root MYSQL user must be setup with a secure password, and should only have access from localhost. It is a bad idea to allow outside access to the root account Create additional users if you need to access the database remotely! 2. Backup and Restore Before allowing a database to be used in a production
environment, there should be a usable backup and restore process. I use the phrase “in case the database server is completely destroyed” because the backup location and method needs to be completely independent from the database server. Note: Even a weekly database backup is better than no backup at all 3. Benchmarking There’s no easy way to determine bottlenecks and trouble unless a method to benchmark performance is in place. The slow query log should always be enabled, and it’s a good idea to install a benchmarking program. Monyog is an external program that provides a number of real-time reports useful for monitoring and performance tuning. 4. DNS If you do not allow outside access, or you can access your server from known IP addresses, disabling DNS look-ups can speed up server operations. Additionally, if the MySQL server loses it’s DNS look-up service, the usability of the entire Database can all but halt. 5. Privileges When adding users to a database, only give them
the permissions that are absolutely required, and be specific in where they can access from. “GRANT ALL ON EVERYTHING TO USER@ANYWHERE” is a really bad idea. If you do need to give users full permissions for installations or another purpose, it’s a good practice to change them back to the minimum once complete. 2. Section 2 – Schema Design 6. Naming A standardized naming scheme should always be used. The best practice is to use lowercase letters with an underscore connecting names, such as `my personal database`. Tables, and individual column names should carry the same naming convention. Use descriptive names for every column including id columns. `id` is not descriptive whereas `contact id` is 7. Collation Use the same collation for all parts of the database, and avoid using UTF-8 or multi-byte formats unless you specifically need them. Keeping the same format on all tables and columns can help prevent data corruption and conversion problems. UTF-8 requires
significantly more disk space and overhead which can reduce performance. If you need UTF8, use it , but don’t make your entire database UTF-8 because you’re lazy 8. Foreign Keys Always use foreign keys to ensure that bad / incomplete data stays out of the database. Nothing replaces good application level programming, but foreign keys are the best way to prevent putting bad data into your database in the first place. You will need InnoDB to use foreign keys, but the benefit is worth it. 9. Logically Segmented Data in Tables Tables should be segmented logically by the data they contain and their association with other tables. In this manner, there may be more total tables, but will help eliminate tables with a huge number of columns which can really hurt performance. Additionally, it will make querying easier as it’s unlikely that every column is needed for every query. This also allows for Single-to-many relationships such as storing multiple addresses related to a single entry.
Don’t be afraid of 20 tables with 20 columns each, be afraid of 1 table with 400 columns! 10. Reserved Words Avoid using reserved words for any name in your database schema. Words like date, time, decimal, etc. are often used, and can wreak havoc trying to get queries to work properly, and can cause even more difficulty in debugging. You can technically use these words if they are placed in back-ticks (`date`), but this is a bad practice and should always be avoided. 3. Section 3 – Table Design 11. Data types MySQL has many data types, probably more than any other database. Using the correct data type for the data being stored is one of the most important aspects in design. Failing at this step can break a database’s speed and the usability of an application. Whole Numbers – BIGINT, INT, MEDIUMINT, SMALLINT, TINYINT Decimals – DECIMAL, FLOAT, DOUBLE, REAL Dates – TIMESTAMP*, DATETIME, DATE, TIME, YEAR Strings – CHAR, VARCHAR, BINARY, VARBINARY, BLOB, TEXT, ENUM, SET
Additionally, using the correct data type allows for the use of MySQL’s built-in functions which can sort, do math, comparisons, date conversions, etc. For example, I often see dates stored in VARCHAR columns, which completely prevents MySQL from sorting, or using any date related function. 12. Large Numerical Keys It’s common for new programmers to use a BIGINT(20) when they need a key column. While admirable, this is a waste of disk space. An UNSIGNED INT(10) has over 4 billion possible numbers, which is more than most will ever use. Even so, by that time, you will want to look into partitioning, and will have a variety of other problems on your hand. Don’t use BIGINT’s unless you need to store very-very large numbers. 13. Smallest Length Using the smallest length data length is important. Every byte of savings adds up when a database’s size and usage goes up. Lazy programming by using VARCHAR(255) or DECIMAL(20,2) creates unnecessary overhead and causes problems down the
road. Give yourself one extra byte of space if needed, but 100 is a overkill. 14. Avoid TEXT and BLOB Columns TEXT and BLOB type columns can eat up a server’s resources when being selected. While these are most certainly needed to store larger amounts of data, they should only be used for that purpose. VARCHAR can hold up to 255 bytes and should always be used before TEXT whenever possible. 15. Non‐Relational Storage A huge design mistake is storing data in a non-relational format. It’s common to see data stored in a CSV format like (value1,value2,value3) in a single column. This effectively bypasses MySQL’s ability to use the data. It’s best to use multiple tables for single-to-many relationships, as this allows for MySQL to handle the data in an elegant manner. There are some situations where storing csv-like data would make sense, but for all intensive purposes, avoid storing data like this. 4. Section 4 – Index Optimization 16. Use proper indexes MySQL supports
several types of indexes (PRIMARY, UNIQUE, NORMAL, PARTIAL, and FULLTEXT). It is important to use the correct type of index for the job It is also important to only use indexes when needed, and not to create duplicate indexes. For example a primary key column already has an index, so adding a second UNIQUE INDEX on the primary key is a complete waste of overhead and disk space. 17. Multi-Column Indexes If there is a data set that is constantly queried with more than one column in the WHERE clause, it may be a good idea to create a multi-column index. If you have an index on (`user id`,`user category`) the index will work when both are in the where clause or the first column (`user id`) is in the where clause. However, the index will not be used if only `user category` is in the where clause. 18. Modifying Indexed Fields During a Query Unless you specify the length of an index, modifying an indexed column will prevent the index from being used. For example, if there is an index on
`credit card number` and you perform a query like this: SELECT `user id` FROM `my table` WHERE LEFT(`credit card number`,4) = 5666; The index will not be used. If this was a common scenario, you could create a partial index of (`credit card number`,4), and the above query would use the index. 19. Indexes With a High Cardinality Indexes work best when there are many unique values in relation to the total number of rows. This allows the database engine to quickly reduce the number of possible rows in the result set. Indexes on columns with only a few unique values are inefficient and will end up being a waste of overhead. 20. Unique and NULL Column Indexes Allowing NULLS in index columns adds an additional byte of storage per row to the index. This again equates to a waste of space and overhead and will slow down MySQL’s performance. It’s better to use no value rather than NULL 5. Section 5 – Query Optimization 21. Specific Column Names Always use specific column names instead
of * when querying a table. SELECT * is lazy programming. While it is completely valid syntax, you won’t know the columns that will be returned. If you don’t know what you’re going to get with a query, there’s no reason to use it. right? Write out any column names that you need data from This way your code is intuitive, you won’t have problems trying to use data from a column that doesn’t exist, and the next person using your script wont hate you. 22. MySQL’s Built-in Functions MySQL has a variety of very advanced, and very fast, built-in functions. They probably are much more efficient than php or most other application level scripts. These functions can greatly increase your application’s speed, and reduce its complexity. MySQL has everything from mathematical operations, date comparisons, even spacial functions for calculation geographic equations. Learn to use them 23. Selecting TEXT and BLOB Columns When a TEXT or BLOB column is select in a query, MySQL will
create a temporary internal table. If large result sets are selected with TEXT or BLOB columns, this can create a major load on the database, and unnecessary overhead. This relates back to SELECT *, don’t select a TEXT or BLOB type column unless you actually need to use the data. 24. Use Transactions Transactions are another great way of preventing incomplete or corrupted data while inserting or altering data. When using a transaction, you can insert or alter any number of rows of data If there is an error, all of the queries in the transaction will be aborted. Think of inserting 50,000 rows into a report table, and having 10 arbitrary rows not insert correctly. That entire set of data is now corrupt, and a transaction would have prevented that. 25. SQL NO CACHE SQL NO CACHE is a great way to prevent MySQL from caching a query’s result. This is important for results with a rapidly changing data, or very large result sets. Both of these situations can eat up server resources without
any gain to the application or end-user. 6. Bonus 1. TIMESTAMP vs DATETIME TIMESTAMP and DATETIME store dates in the exact same format (YYYY-MM-DD HH:MM:SS) but TIMESTAMP uses less space to do so. The only limitation is that TIMESTAMP cannot be used for dates older than Jan 1st, 1970. 2. SIGNED INT Unless you need to store negative numbers, only use UNSIGNED INT and other numerical data type fields. There’s no reason to allow for negative numbers if they will never be used 3. Collation: ci vs cs The ci in a collation stands for “case insensitive”. If you care about case sensitivity use a collation that ends in cs. The can be very important for searching and other operations where John ≠ john! 4. InnoDB vs MyISAM If you’re using MyISAM as a storage engine only because it was the default, you may be making a mistake. InnoDB is superior in several areas (Reliability, Backups, Foreign Keys, and Performance in many situations) and while maybe not always the best option (Full
Text Indexing), you should know why you’re using the engine you’re using. You can also mix the 2, but this can make performance tuning especially difficult. 5. Consult a Professional When you get a project and the database design, usage or other factor is just over your head, it’s a good idea to consult a professional. It may cost a fair sum, but the cost down the road could be substantial. Planning is always cheaper than reacting 25‐point Basic MySQL Setup/Optimization Checklist 2009 Jamie Estep ∙ www.saynotoflashcom Server Setup 1. Root user is setup with a secure password, and local access only 2. Have a backup and restoration plan in place, in case database server is completely destroyed 3. Install method or program to benchmark performance, and enable the slow query log 4. Turn off MySQLs DNS service, and authenticate by IP only if possible Only grant access for specific "privileges ON db TO user @ host". Do not ever use: GRANT ALL ON EVERYTHING TO 5.
USER@ANYWHERE Schema Design Name database, tables and column with lower case letters and an " " separating words. 6. `id` is not a descriptive name! 7. 8. 9. 10. Database, tables, and columns are using the same collation and character set and not UTF‐8 unless required. Foreign keys being used to maintain data integrity ‐ N/A for MyISAM. Tables segmented logically based on their purpose. Dont use any reserved word (date, int, decimal, time, etc.) for a database, table or column names Table Design 11. Use proper data types For example, numbers and dates not stored in character or text fields 12. Use unsigned int(10) instead of bigint for columns requiring large numerical keys 13. Use the smallest length for the data type being used No VARCHAR(255), BIGINT(20), DECIMAL(20,2), etc 14. Avoid using TEXT and BLOB data types whenever possible Do not store data in a non‐relational format (value1,value2,value3, in a single column) . 15. Use multiple tables and joins to retrieve
one‐to‐many relationships and to preserve data integrity Index Optimization 16. Use the proper indexes (unique, normal, partial, etc) for the data, and do not create duplicate indexes Dont modify an indexed field in a query or the index will not be used. 17. " WHERE DATE(`date of birth`) > 2009‐07‐01 " 18. Use multi‐column (concatenated) indexes for columns that frequently get queried together 19. Try to use indexes in columns with a high cardinality (many unique values in relation to the total rows) 20. Avoid indexing columns with very few unique values or many nulls An empty value is better than NULL Query Optimization 21. Use specific column names when selecting Avoid: SELECT *. 22. Use MySQLs built‐in functions (COUNT, SUM, AVG, etc) instead of application level functions 23. Do not select TEXT or BLOB columns unless absolutely necessary 24. Use transactions whenever applicable 25. Use SQL NO CACHE for very large queries, and queries on data that changes
frequently Bonus ‐ 5 Tips and Tricks 1. Use TIMESTAMP and not DATETIME for all date fields newer than Jan 01 1970 2. Never use SIGNED INT fields unless you actually need to store negative numbers 3. The ci at the end of a collation means "case insensitive" Use ca collation if you care about case sensitivity Know that InnoDB is often a better choice than MyISAM. Dont use MyISAM just because it is default and somebody 4. told you its better If you’re making a serious application and you are unsure of your database design and management ability, consult a qualified MySQL expert to prevent major problems down the road. Spending some money now could save your 5. business down the road 2009 Jamie Estep ∙ www.saynotoflashcom Complete 9 8 9 8 9 8 9 8 9 8 9 8 9 9 9 9 8 8 8 8 9 9 9 9 8 8 8 8 9 8 9 8 9 8 9 9 9 8 8 8 9 9 9 9 9 8 8 8 8 8 9 9 9 8 8 8 9 8 9 8