Development, Analysis And Research


MySQL & PHP Performance Optimization Tips

Posted in Db, General by Andrew Johnstone on the July 25th, 2007

In high-performance web applications you will always have bottlenecks. Identifying and optimizing them is a tedious task, and they typically only show themselves under load. A single bad or unindexed query can bring a server to its knees. A large number of rows also helps to highlight poor queries, and on very large datasets you may reach the point where you have to decide whether to denormalize the database schema.

Explain each page

Whilst I develop sites, I typically print out all queries and EXPLAIN each SELECT statement at the bottom of each page, highlighting it in red if it is doing a full table scan or using temporary tables or a filesort. I also display the output of SHOW INDEXES FROM TABLE…

Not only will this help you to optimize sites, you will also spot bad logic and other areas to optimize, such as a query being issued on every iteration of a loop over a users table.
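The highlighting rule can be sketched outside the page footer as well. The following is a minimal shell sketch, not the PHP code used on the sites: it assumes tab-separated EXPLAIN output with a header row (as the mysql client prints with --batch), and flag_explain is a hypothetical helper name.

```shell
# Reads tab-separated EXPLAIN output (header row first) on stdin and prints
# one line per row that shows a full table scan, temporary table or filesort.
flag_explain() {
  awk -F'\t' '
    NR == 1 {
      # Locate columns by name, since the column set varies between versions.
      for (i = 1; i <= NF; i++) col[$i] = i
      next
    }
    {
      type   = $(col["type"])
      extra  = $(col["Extra"])
      tbl    = $(col["table"])
      reason = ""
      if (type == "ALL")              reason = reason " full-table-scan"
      if (extra ~ /Using temporary/)  reason = reason " temporary-table"
      if (extra ~ /Using filesort/)   reason = reason " filesort"
      if (reason != "") print tbl ":" reason
    }
  '
}

# Example (database and query are placeholders):
#   mysql --batch -e "EXPLAIN SELECT * FROM users ORDER BY name" mydb | flag_explain
```

Anything the function prints is a candidate for an index or a rewritten query; an empty result means none of the three warning signs were present.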

MySQL indexing optimization

How do you identify where bottlenecks occur?

One of my favourite Linux commands lately is watch. Mac users can get it from MacPorts via “sudo port install watch”. A few other handy applications are mysqlreport and mytop.

# Appends file with processlist
watch -n1 "mysqladmin -uroot processlist >>watch.processlist.txt"

# Count the number of locked processes
watch -n1 "mysqladmin -uroot processlist | grep -i 'lock' | wc -l ";

# Count the number of sleeping processes
watch -n1 "mysqladmin -uroot processlist | grep -i 'sleep' | wc -l ";

# Run a specific query every second
watch -n1 "mysql -uadmin -p`cat /etc/psa/.psa.shadow` trade_engine --execute \"SELECT NOW(),date_quote FROM sampleData WHERE 1=1 AND permission = '755' AND symbol='IBZL' GROUP BY date_quote;\""

# Emails mysqlreport every 60 seconds
watch -n60 mysqlreport --all --email andrew@email.com

# Displays process list as well as appending the contents to a file
watch -n1 "mysqladmin -uadmin -p`cat /etc/psa/.psa.shadow` processlist | tee -a process.list.txt"

Watching the processlist is very handy for identifying locked, sleeping or sorting process states. If you have a large number of locked processes, you should typically change the table type to InnoDB, which supports row-level locking. If you have a large number of sleeping connections and persistent connections are enabled, this most likely indicates that connections are not being reused.
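Grepping whole lines for 'lock' or 'sleep' also matches those words inside query text. A minimal sketch that looks only at the relevant columns, assuming the usual pipe-delimited table printed by mysqladmin processlist (Command as the fifth column, State as the seventh); summarize_processlist is a hypothetical helper name:

```shell
# Reads `mysqladmin processlist` output on stdin and prints how many threads
# there are, how many are sleeping, and how many have a State mentioning a lock.
summarize_processlist() {
  awk -F'|' '
    function trim(s) { gsub(/^ +| +$/, "", s); return s }
    NF < 9           { next }   # skip the +----+ border lines
    trim($2) == "Id" { next }   # skip the header row
    {
      total++
      if (trim($6) == "Sleep")        sleeping++   # Command column
      if (tolower(trim($8)) ~ /lock/) locked++     # State column
    }
    END { printf "total=%d sleeping=%d locked=%d\n", total, sleeping, locked }
  '
}

# Example:
#   mysqladmin -uroot processlist | summarize_processlist
```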

Running a specific query every second is exceptionally handy. The example I gave indicates whether one of our crons is functioning correctly: as each row is inserted or updated, you can watch the output change. mysqlreport gives numerous pieces of information that are extremely helpful in identifying issues; you can read about it in more depth at hackmysql.com/mysqlreportguide.

Look at the MySQL slow query log and optimize each query, starting with the most common. Consider whether you have to execute that query at all, or whether a cache such as memcached could serve the result instead.
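To find the most common entries, mysqldumpslow (shipped with MySQL) can sort by count with "-s c". As a rough illustration of the same idea, here is a crude sketch that counts identical single-line queries in a slow log; top_slow_queries is a hypothetical helper name, and unlike mysqldumpslow it does not group queries that differ only in their literal values or handle queries spanning several lines.

```shell
# Reads a MySQL slow query log on stdin and prints "count<TAB>query",
# most frequent first. Lines starting with "#", "SET timestamp=" or "use "
# are log metadata rather than queries and are skipped.
top_slow_queries() {
  grep -Ev '^(#|SET timestamp=|use )' |
    sort | uniq -c | sort -rn |
    awk '{ count = $1; sub(/^ *[0-9]+ /, ""); print count "\t" $0 }'
}

# Example (the log path is a placeholder):
#   top_slow_queries < /var/log/mysql/slow.log | head -n 10
```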

I also typically tend to look at the following:

  • vmstat -S M
  • ps axl | grep -i 'mysql'
  • pstree -G
  • free -m

Reference:
http://dev.mysql.com/tech-resources/presentations/presentation-oscon2000-20000719/index.html

C++

Posted in C++, General, PHP by Andrew Johnstone on the July 8th, 2007

I’ve had a lot of experience with other programming languages; however, a number of weeks ago I had to learn C++ from scratch in a very short period of time. This was to develop a real-time stock quote client. The goal was simply to push data from remote servers into our databases, filter which messages it would receive, and get something up and running fast as deadlines loomed. This was simple enough; however, because of the rush the application had inherent flaws, due to my lack of knowledge of C++, the API, and the goals it had to accomplish.

I’ve since had time to learn a little more C++ and limited time to design the application properly.

The Problems

The core problems with the application:

  • refactor, refactor, refactor
  • database connection pooling
  • Query remote CSP servers*1
  • Query remote CSP servers*1 from PHP
  • Configuration management
  • Monitoring
  • Flexible Database schema
    • Add columns to database schema dependent on datatype.
    • Log messages in XML per trade message with date/time, columns and values.

Compatible GCC

The first issue was that I used an API from interactive-data, which was compatible with “gcc version 3.2.3” and is not kept up to date. This meant compiling a compatible gcc from source, for 32-bit platforms only.


./configure --prefix=/usr/local/gcc/ --mandir=/usr/share/man --infodir=/usr/share/info --enable-shared --enable-threads=posix --disable-checking --with-system-zlib --enable-__cxa_atexit --enable-languages=c,c++,objc,obj-c++

make bootstrap
cd gcc
make
sudo make install

Once I had a compatible compiler, I had to modify the Makefile and move a number of lib/so files to get MySQL to compile and get things working. Unfortunately I did not have a local machine on which to attach a debugger, so everything was trial and error from the command line with g++32, which makes identifying runtime errors difficult.

The Logic

Once everything was in place, the logic was fairly simple: for each field retrieved, construct a query with the field name, checking whether the field value’s datatype is a datetime, varchar, etc. Insert each trade message into one table and update another; if either query fails, check whether the fault was due to a missing column and, if so, add the column and re-execute the queries.
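The type check and the add-column step can be sketched as follows. This is an illustration in shell rather than the C++ the client was written in; the helper names, the type mapping and the table and column names are all assumptions. MySQL reports a missing column as error 1054 with an "Unknown column" message, which is what the last helper looks for.

```shell
# Return success if the value ($1) matches the extended regex ($2).
matches() { printf '%s\n' "$1" | grep -Eq "$2"; }

# Guess a MySQL column type from a sample field value (assumed mapping).
infer_sql_type() {
  if matches "$1" '^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}$'; then
    echo 'DATETIME'
  elif matches "$1" '^-?[0-9]+$'; then
    echo 'BIGINT'
  elif matches "$1" '^-?[0-9]+\.[0-9]+$'; then
    echo 'DECIMAL(18,6)'
  else
    echo 'VARCHAR(255)'
  fi
}

# Build the ALTER TABLE statement: $1 = table, $2 = column, $3 = sample value.
add_column_sql() {
  printf 'ALTER TABLE `%s` ADD COLUMN `%s` %s NULL;\n' \
    "$1" "$2" "$(infer_sql_type "$3")"
}

# Extract the column name from an "Unknown column" error message;
# prints nothing if the failure had another cause.
missing_column_from_error() {
  printf '%s\n' "$1" | sed -n "s/.*Unknown column '\([^']*\)'.*/\1/p"
}
```

When an insert or update fails, the caller would pass the error text to missing_column_from_error, run the statement from add_column_sql if a column name comes back, and then re-execute the original queries.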

The problem soon arises when you need to know when each column was actually last updated, with which field, value and datetime, and the last insert id for the trade messages. Whilst looping through each trade message, I constructed an XML document containing the above; the tricky part is ensuring that it only updates the fragment matching the field in the schema. It is not an ideal format to query from a database.

Storing Data

One of the fundamental issues is managing and storing data. For some exchanges you don’t want to store every trade message; simply storing the current data for a number of instruments is enough. Which servers or databases do you peg data to? If one database goes down, how do you handle fault tolerance? MySQL Cluster is not a feasible solution, as it requires multiple servers and a large amount of memory per installation. The databases are highly susceptible to corruption or faults. Particular sites may also require data from multiple exchanges, so separating trade messages per database is not ideal either.

All of this fundamentally comes down to configuration management.

Configuration

One of the fundamental aspects of the application is configuration management. The configuration holds where data should be stored for a particular exchange; the type of data to store, whether per trade message, current data or both; which servers to source data from; whether the data is real time or delayed; and whether to source data for bonds, equities, automated trades etc… Queries can be grouped, or used to query remote servers. For example, some of the products for the London Stock Exchange alone are:

  • London Stock Exch – Covered Warrants L1
  • London Stock Exch – International Equity Mkt Service L1
  • London Stock Exch – International Equity Mkt Service Level 2
  • London Stock Exch – UK Equity Mkt Service L1
  • London Stock Exch – UK Equity Mkt Service Level 2 (Depth Refresh)
  • London Stock Exchange: UK Equity Market Service Level 2

All of this is stored in several database tables and managed via a MySQL database and a PHP frontend.