by Andrew Johnstone
In: General
2 Jan 2010
I just read "How to use locks in PHP cron jobs to avoid cron overlaps" and thought I would elaborate on it and provide some more examples. For a lock to work correctly it must handle atomicity / race conditions and signaling.
I use the following bash script to create locks for crontabs and ensure single execution of scripts.
"The clever bit is to get a lock file test and creation (if needed) to be atomic, that is, done without interruption. The set -C stops a redirection from overwriting a file. The : > touches a file. In combination, the effect is: when the lock file exists, the redirection fails and exits with an error. If it does not exist, the redirection creates the lock file and exits without an error. The final part is to make sure that the lock file is cleaned up. To make sure it is removed even if the script is terminated with a ctrl-c, a trap is used. Simply, when the script exits, the trap is run and the lock file is deleted." (The Lab Book Pages)
In addition, the script checks the process list and tests whether the PID recorded in the lock file is still active.
#!/bin/bash

LOCK_FILE=/tmp/my.lock
CRON_CMD="php /var/www/..../fork.php -t17"

function check_lock {
    (set -C; : > $LOCK_FILE) 2> /dev/null
    if [ $? != "0" ]; then
        RUNNING_PID=$(cat $LOCK_FILE 2> /dev/null || echo "0");
        if [ "$RUNNING_PID" -gt 0 ]; then
            if [ `ps -p $RUNNING_PID -o comm= | wc -l` -eq 0 ]; then
                echo "`date +'%Y-%m-%d %H:%M:%S'` WARN [Cron wrapper] Lock File exists but no process running $RUNNING_PID, continuing";
            else
                echo "`date +'%Y-%m-%d %H:%M:%S'` INFO [Cron wrapper] Lock File exists and process running $RUNNING_PID - exiting";
                exit 1;
            fi
        else
            echo "`date +'%Y-%m-%d %H:%M:%S'` CRIT [Cron wrapper] Lock File exists with no PID, wtf?";
            exit 1;
        fi
    fi
    trap "rm $LOCK_FILE;" EXIT
}

check_lock;
echo "`date +'%Y-%m-%d %H:%M:%S'` INFO [Cron wrapper] Starting process";
$CRON_CMD &
CURRENT_PID=$!;
echo "$CURRENT_PID" > $LOCK_FILE;
trap "rm -f $LOCK_FILE 2> /dev/null ; kill -9 $CURRENT_PID 2> /dev/null;" EXIT;
echo "`date +'%Y-%m-%d %H:%M:%S'` INFO [Cron wrapper] Started ($CURRENT_PID)";
wait;
# Remove the kill from the trap so it won't try to kill a process which took the place of the php one in the meantime (paranoid)
trap "rm -f $LOCK_FILE 2> /dev/null" EXIT;
rm -f $LOCK_FILE 2> /dev/null;
echo "`date +'%Y-%m-%d %H:%M:%S'` INFO [Cron wrapper] Finished process";
The implementation described in the post at abhinavsingh.com fails if you run the script as a background process, as the example below shows.
andrew@andrew-home:~/tmp.lock$ php x.php
==16169== Lock acquired, processing the job...
^C
andrew@andrew-home:~/tmp.lock$ php x.php
==16169== Previous job died abruptly...
==16170== Lock acquired, processing the job...
^C
andrew@andrew-home:~/tmp.lock$ php x.php
==16170== Previous job died abruptly...
==16187== Lock acquired, processing the job...
^Z
[1]+  Stopped                 php x.php
andrew@andrew-home:~/tmp.lock$ ps aux | grep php
andrew   16187  0.5  0.5  50148 10912 pts/2   T    09:53   0:00 php x.php
andrew   16192  0.0  0.0   3108   764 pts/2   R+   09:53   0:00 grep --color=auto php
andrew@andrew-home:~/tmp.lock$ php x.php
==16187== Already in progress...
You can use pcntl_signal to trap interruptions to the application and handle cleanup of the process. Below is a slightly modified implementation that handles cleanup. Note that register_shutdown_function will not help you clean up after a signal/interruption.
<?php
class lockHelper
{
    protected static $_pid;

    protected static $_lockDir = '/tmp/';

    protected static $_signals = array(
        // SIGKILL cannot be caught, so it is not registered
        SIGINT,
        SIGPIPE,
        SIGTSTP,
        SIGTERM,
        SIGHUP,
        SIGQUIT,
    );

    protected static $_signalHandlerSet = FALSE;

    const LOCK_SUFFIX = '.lock';

    protected static function isRunning()
    {
        $pids = explode(PHP_EOL, `ps -e | awk '{print $1}'`);
        return in_array(self::$_pid, $pids);
    }

    public static function lock()
    {
        self::setHandler();
        $lock_file = self::$_lockDir . $_SERVER['argv'][0] . self::LOCK_SUFFIX;
        if (file_exists($lock_file)) {
            self::$_pid = file_get_contents($lock_file);
            if (self::isRunning()) {
                error_log("==" . self::$_pid . "== Already in progress...");
                return FALSE;
            } else {
                error_log("==" . self::$_pid . "== Previous job died abruptly...");
            }
        }
        self::$_pid = getmypid();
        file_put_contents($lock_file, self::$_pid);
        error_log("==" . self::$_pid . "== Lock acquired, processing the job...");
        return self::$_pid;
    }

    public static function unlock()
    {
        $lock_file = self::$_lockDir . $_SERVER['argv'][0] . self::LOCK_SUFFIX;
        if (file_exists($lock_file)) {
            error_log("==" . self::$_pid . "== Releasing lock...");
            unlink($lock_file);
        }
        return TRUE;
    }

    protected static function setHandler()
    {
        if (!self::$_signalHandlerSet) {
            declare(ticks = 1);
            foreach (self::$_signals AS $signal) {
                if (!pcntl_signal($signal, array('lockHelper', "signal"))) {
                    error_log("==" . self::$_pid . "== Failed assigning signal - '{$signal}'");
                }
            }
            self::$_signalHandlerSet = TRUE;   // mark handlers as registered so they are only set once
        }
        return TRUE;
    }

    protected static function signal($signo)
    {
        if (in_array($signo, self::$_signals)) {
            if (!self::isRunning()) {
                self::unlock();
            }
        }
        return FALSE;
    }
}
As an example:
andrew@andrew-home:~/tmp.lock$ php t.php
==16268== Lock acquired, processing the job...
^Z==16268== Releasing lock...
Whilst the implementation above simply uses files, it could be implemented with shared memory (SHM/APC), distributed caching (memcached), or a database. If these are accessed over a network, factors such as packet loss and latency can cause race conditions and should be taken into account. Depending on the application it may be better to implement the job as a daemon. If you're looking to distribute tasks amongst servers, take a look at Gearman.
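As a simpler file-based alternative, flock() gives you the atomicity and the crash cleanup for free, since the kernel drops the lock when the process exits; a minimal sketch (the lock path and the PHP 5.3 closure syntax are assumptions):

<?php
// A minimal sketch, assuming a file-based lock with flock(): LOCK_NB makes the
// call non-blocking so an overlapping run exits immediately instead of queuing.
$fp = fopen('/tmp/my-job.lock', 'c');
if ($fp === false || !flock($fp, LOCK_EX | LOCK_NB)) {
    error_log('Already in progress...');
    exit(1);
}

// The kernel releases the lock when the process exits (including on a kill),
// so no stale-PID checks or signal handlers are needed for cleanup.
register_shutdown_function(function () use ($fp) {
    flock($fp, LOCK_UN);
    fclose($fp);
});

// ... long-running job ...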
In: General
31 Dec 2009
Last year I wrote an application to highlight media outlets and their reach (coverage of media outlets), selecting regions within the UK and highlighting areas of a map. This ran into many issues, from performance problems rendering within browsers to the limitations of converting KML to tiles via Google. These limitations include:
Some of these limits have since been increased by Google and are documented.
Maximum fetched file size (raw KML, raw GeoRSS, or compressed KMZ): 3MB
Maximum uncompressed KML file size: 10MB
Maximum number of Network Links: 10
Maximum number of total document-wide features: 1,000
In order to alleviate these issues I ended up with the following approach.
So depending on the depth (zoom) of the map, the area selected and the volume of data, it would either use tiles or Google's KML directly (for increased functionality).
In order to have greater control over the spatial data within our database, we split it into areas, regions and sub_regions, which held lookups to postcodes, towns and the spatial data itself (there are a lot of discrepancies over the outlines of maps).
Left hand menu:
<ul style="display: block;">
    <li id="East"><a href="#" onclick="loadTilesFromGeoXML('|1|'); return false;">East</a>
        <ul style="display: none;">
            <li><a href="#" onclick="loadTilesFromGeoXML('|1|6'); return false;">Bedfordshire</a></li>
            <li><a href="#" onclick="loadTilesFromGeoXML('|1|18'); return false;">Cambridgeshire</a></li>
            ...
        </ul>
    </li>
</ul>
Javascript to locate tiles
function loadTilesFromGeoXML(entity_id) {
    // Matches database record ids that are mapped to spatial data within MySQL
    mapTownsId      = entity_id.toString().split('|')[0];
    mapRegionsId    = entity_id.toString().split('|')[1];
    mapSubRegionsId = entity_id.toString().split('|')[2];

    locationUrl = 'map_towns_id=' + mapTownsId + '&map_regions_id=' + mapRegionsId + '&map_sub_regions_id=' + mapSubRegionsId;

    var cc = map.fromLatLngToDivPixel(map.getCenter());
    map.setZoom(1);

    // Request URL to cached tile links
    geoXMLUrl  = '/ajax/mapping/get/overlays/region?' + locationUrl;
    geoXMLUrl += '&format=JSON&method=getLinks&x=' + cc.x + '&y=' + cc.y + '&zoom=' + map.getZoom();

    // tileUrlTemplate: 'http://domain.com/maps/proxy/regions/?url=http%3A%2F%2Fdomain.com/ajax/mapping/get/cache/?filename=.1.6.0&x={X}&y={Y}&zoom={Z}',
    $.getJSON(geoXMLUrl, function(data) {
        $.each(data, function(i, link) {
            kmlLinks += encodeURIComponent(link) + ',';
        });

        // Builds the location for tiles to be mapped
        tileUrlTemplate = '/maps/proxy/regions/?url=' + kmlLinks + '&x={X}&y={Y}&zoom={Z}';

        var tileLayerOverlay = new GTileLayerOverlay(
            new GTileLayer(null, null, null, {
                tileUrlTemplate: tileUrlTemplate,
                isPng: true,
                opacity: 1.0
            })
        );

        if (debug) GLog.writeUrl('/maps/proxy/regions/?url=' + kmlLinks + '&x={X}&y={Y}&zoom={Z}');
        map.addOverlay(tileLayerOverlay);
    });
}
Response whilst retrieving links (if cached)
The code behind this simply serves the cached KML files if they exist, otherwise it attempts to create them, and outputs a JSON response listing the files matching the sequence, globbing for any files with a similar pattern; all files are suffixed with their page number. A rough sketch follows the example response below.
["/ajax/mapping/get/cache/?filename=.1..0&x=250&y=225&zoom=5","/ajax/mapping/get/cache/?filename=.1..1&x=250&y=225&zoom=5"]
Proxying Google's tiles and merging the layer ids
$kmlUrls = urlencode($_GET['url']);
$cachePath = dirname(__FILE__) . '/cache.maps/tiles/';
$cachedFiles = array_filter(explode(',', rawurldecode($kmlUrls)));
$hash = sha1(rawurldecode($kmlUrls) . ".w{$_GET['w']}.h{$_GET['h']}.x{$_GET['x']}.y{$_GET['y']}.{$_GET['zoom']}");
$cachePath .= "{$_GET['x']}.{$_GET['y']}/{$_GET['zoom']}/";
if (!is_dir($cachePath)) {
    @mkdir($cachePath, 0777, true);
}

// Return the image if it has already been aggregated and cached.
if (file_exists($path = $cachePath . $hash)) {
    header('Content-Type: image/png');
    $fp = fopen($path, 'rb');
    fpassthru($fp);
    exit;
}

// Extract layer ids from the KML files that are to be merged.
$layerIds = array();
foreach ($cachedFiles AS $kmlFile) {
    $kmlFile = "http://{$_SERVER['HTTP_HOST']}{$kmlFile}";
    $url = "http://maps.google.com/maps/gx?q={$kmlFile}&callback=_xdc_._1fsue7g2w";
    @$c = file_get_contents($url);
    if (!$c) {
        throw new Exception("Failed to request {$url} - {$c}");
    }
    preg_match_all('/layer_id:"kml:(.*)"/i', $c, $matches);
    if (count($matches) > 0 && isset($matches[1][0])) {
        $layerIds[] = "kml:{$matches[1][0]}";
    }
}

// Cache locally.
if (count($layerIds) > 0) {
    header('Content-Type: image/png');
    // Aggregate the layers into a single image
    $link = "http://mlt0.google.com/mapslt?lyrs=" . implode(',', $layerIds);
    $link .= "&x={$_GET['x']}&y={$_GET['y']}&z={$_GET['zoom']}&w={$_GET['w']}&h={$_GET['h']}&source=maps_api";
    echo $c = file_get_contents($link);
    @file_put_contents($path, $c);
} else {
    // Output a 1x1 png
    header('Content-Type: image/png');
    echo base64_decode('iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAAC0lEQVQIHWNgAAIAAAUAAY27m/MAAAAASUVORK5CYII=');
}
Paging GeoXML loading
function loadGeoXMLPaged(geoXMLUrl) {
    var cc = map.fromLatLngToDivPixel(map.getCenter());
    geoXMLUrl += '&format=JSON&method=getLinks&x=' + cc.x + '&y=' + cc.y + '&zoom=' + map.getZoom();
    if (debug) GLog.writeUrl(geoXMLUrl);
    $.getJSON(geoXMLUrl, function(data) {
        geoXmlPager = data;
        loadGeoXmlPage();
    });
}

var timeoutPID = null;

function loadGeoXmlPage() {
    if (data = geoXmlPager.pop()) {
        if (debug) GLog.writeUrl(BASE_URL + data);
        geoXmlStack.push(new GGeoXml(BASE_URL + data));
        map.addOverlay(geoXmlStack[geoXmlStack.length - 1]);
        GEvent.addListener(geoXmlStack[geoXmlStack.length - 1], "load", function() {
            timeoutPID = setTimeout("loadGeoXmlPage()", 500);
        });
    } else {
        clearTimeout(timeoutPID);
        map.setZoom(map.getBoundsZoomLevel(bounds));
        map.setCenter(bounds.getCenter());
        try {
            geoXmlStack[geoXmlStack.length - 1].gotoDefaultViewport(map);
        } catch(e) {}
    }
}
All the code above has been modified slightly to make it applicable to others; however, don't accept raw input like this, as it is simply an example.
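For instance, a minimal guard for the proxy above might validate the tile coordinates and whitelist the cache path before fetching anything (a sketch; the whitelist prefix is an assumption):

<?php
// Hypothetical input guard for the tile proxy: integers only for coordinates,
// and only our own cache URLs in the comma-separated url parameter.
$x    = filter_input(INPUT_GET, 'x',    FILTER_VALIDATE_INT);
$y    = filter_input(INPUT_GET, 'y',    FILTER_VALIDATE_INT);
$zoom = filter_input(INPUT_GET, 'zoom', FILTER_VALIDATE_INT);
if ($x === false || $x === null || $y === false || $y === null || $zoom === false || $zoom === null) {
    header('HTTP/1.0 400 Bad Request');
    exit;
}

foreach (array_filter(explode(',', rawurldecode($_GET['url']))) as $kmlFile) {
    if (strpos($kmlFile, '/ajax/mapping/get/cache/') !== 0) {   // only our own cache path
        header('HTTP/1.0 400 Bad Request');
        exit;
    }
}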
In: General
24 Oct 2009
I recently came across a peculiar issue where dates and times were causing problems with a product we had developed in Australia. Within "Red Hat Enterprise Linux Server release 5 (Tikanga)" the date within PHP was being read as EST instead of AEST/AEDT, yet running "date" from the terminal or "SELECT NOW()" from MySQL displayed the correct time.
[user@server ~]$ date
Wed Oct 14 22:24:20 EST 2009
[user@server ~]$ php -r'var_dump(date("r"));'
string(51) "Wed, 14 Oct 2009 21:25:07 +1000 Australia/Melbourne"
[user@server ~]$ php -r'var_dump(date("r e"));var_dump(getenv("TZ"));var_dump(ini_get("date.timezone"));var_dump(date_default_timezone_get());';
string(51) "Wed, 14 Oct 2009 21:25:07 +1000 Australia/Melbourne"
bool(false)
string(0) ""
string(19) "Australia/Melbourne"
[user@server ~]$ mysql -uuser -ppassword -e 'SELECT NOW();'
+---------------------+
| NOW()               |
+---------------------+
| 2009-10-14 22:26:12 |
+---------------------+
As you can see, PHP gets the time wrong, being an hour off. Running the above on Debian worked perfectly fine, and the zoneinfo files matched my local machine.
[user@server ~]$ md5sum /etc/localtime && md5sum /usr/share/zoneinfo/Australia/Sydney && md5sum /usr/share/zoneinfo/Australia/Melbourne
85285c5495cd5b8834ab62446d9110a9 /etc/localtime
85285c5495cd5b8834ab62446d9110a9 /usr/share/zoneinfo/Australia/Sydney
8a7f0f78d5a146db4bf865ca91cc1c42 /usr/share/zoneinfo/Australia/Melbourne
After a fair amount of digging I ended up coming across the following ticket, #478566. Amazingly, the ticket is marked as "CLOSED WONTFIX".
There were a few interesting points from some of the conversations I read.
"Alphabetic time zone abbreviations should not be used as unique identifiers for UTC offsets as they are ambiguous in practice. For example, 'EST' denotes 5 hours behind UTC in English-speaking North America, but it denotes 10 or 11 hours ahead of UTC in Australia; and French-speaking North Americans prefer 'HNE' to 'EST'." (twinsun)
Because different locations in Australia have various interpretations of summer time, with differing start/end dates and clock shifts, and because the operating system has no zoneinfo data for DEST, AEDT etc. (unless you create these yourself), you cannot rely on getting the correct time from PHP on Red Hat.
So far I have resorted to the following:
[user@server ~]$ php -r 'date_default_timezone_set("Etc/GMT-11"); var_dump(date("r"));'
string(31) "Wed, 14 Oct 2009 22:24:29 +1100"
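A quick sanity check of the workaround (note the inverted POSIX sign: Etc/GMT-11 means UTC+11) is to compare PHP's offset against the OS clock:

<?php
// Pin the zone explicitly and compare offsets; the two lines should agree.
date_default_timezone_set('Etc/GMT-11');     // POSIX Etc zones have inverted signs
echo date('r e'), PHP_EOL;                   // expect +1100 Etc/GMT-11
echo trim(shell_exec('date +%z')), PHP_EOL;  // the OS offset, e.g. +1100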
In: General
22 Jun 2009
I have been migrating a large number of websites and consolidating servers to reduce costs. As a result it is important to ensure that services are planned effectively and migrated smoothly, which led me to think through the aspects to consider prior to migrating services.
Let me know if there is anything you think I have missed.
In: Linux
14 Jun 2009
Recently we had an issue with one of our hosting provider's load balancing (LVS), which resulted in some very small outages. As a result we decided to set up our own load balancing that we had full control over and could manage ourselves, in addition to choosing a better-suited weighting algorithm.
Each webserver is set up using ucarp, an implementation of the Common Address Redundancy Protocol (CARP), allowing failover of a single Virtual IP (VIP) for high availability. We bound multiple VIPs to each host as we noticed some HTTP 1.0 clients incorrectly sending the host address to the server.
There are many ways you can then proxy the webservers and load balance, however we decided to use haproxy. This can also be achieved with pound, Apache mod_proxy, mod_backhand etc.
In order to setup ucarp & haproxy:
apt-get install -y haproxy ucarp
Modify /etc/network/interfaces, giving each interface a unique ucarp-vid, adjust ucarp-advskew for weighting on each server (increment it by one for each server), and set ucarp-master to yes if it is to be the master. Adapt the configuration below appropriately.
# The primary network interface
auto eth0
iface eth0 inet static
    address 10.10.10.2        # IP address of server
    netmask 255.255.255.255
    broadcast 10.10.10.10
    gateway 10.10.10.1
    ucarp-vid 3
    ucarp-vip 10.10.10.20     # VIP to listen to
    ucarp-password password
    ucarp-advskew 10
    ucarp-advbase 1
    ucarp-facility local1
    ucarp-master yes

iface eth0:ucarp inet static
    address 10.10.10.20       # VIP to listen to
    netmask 255.255.255.255
To bring the interface up, simply run the following:
ifdown eth0; ifup eth0
ifdown eth0:ucarp; ifup eth0:ucarp
In order to configure haproxy:
sed -i -e 's/^ENABLED.*$/ENABLED=1/' /etc/default/haproxy
Reconfigure Apache to listen only on the local interfaces (/etc/apache2/ports.conf), replacing "Listen 80" with:
Listen 10.10.10.20:80
Listen 10.10.10.2:80
Edit /etc/haproxy/haproxy.cfg:
listen web 10.10.10.20:80
    mode http
    balance leastconn
    stats enable
    stats realm Statistics
    stats auth stats:password
    stats scope .
    stats uri /stats?stats
    #persist
    server web1 10.10.10.2:80 check inter 2000 fall 3
    server web2 10.10.10.3:80 check inter 2000 fall 3
    server web3 10.10.10.4:80 check inter 2000 fall 3
    server web4 10.10.10.5:80 check inter 2000 fall 3
    server web5 10.10.10.6:80 check inter 2000 fall 3
Then restart haproxy with /etc/init.d/haproxy restart
Carp & HA Load Balancing
After changing your DNS to point to 10.10.10.20, you will be able to see traffic balanced between the servers by going to http://10.10.10.20/stats?stats with the credentials assigned above, where the bytes distributed to each of the listed servers are shown.
Some other alternatives are:
I was recently working on a project to expose our trading systems via XmlRpc, Rest and SOAP. It was quite an interesting project, which took two of us three weeks to develop (amongst other things).
This involved creating a testbed that would automatically generate the payload and response for each protocol. The parameters are introspected for each class method, capturing each parameter's data type and allowing for user input via standard HTML forms. This is probably best described with a picture or two.
Most of the documentation was generated via reflection and the comments within the docblocks; parameters and notes were also generated, making it quick and simple to update. In addition, the start and end lines of each method were parsed for any applicable error codes/faults that may be returned.
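A minimal sketch of that kind of introspection, using a stand-in service class rather than the real trading API:

<?php
// Stand-in class: the real services and their docblocks are not shown here.
class TradingService
{
    /**
     * Places an order.
     *
     * @param  string $symbol Instrument symbol
     * @param  int    $volume Number of shares
     * @return bool
     */
    public function placeOrder($symbol, $volume = 100) {}
}

$method = new ReflectionMethod('TradingService', 'placeOrder');

echo $method->getDocComment(), PHP_EOL;      // description plus @param/@return tags
foreach ($method->getParameters() as $param) {
    // Name and optionality; the @param tags supply the types used to render form fields.
    printf("%s (optional: %s)\n", $param->getName(), $param->isOptional() ? 'yes' : 'no');
}
// The start/end lines are what was scanned for applicable error codes/faults.
echo $method->getStartLine(), '-', $method->getEndLine(), PHP_EOL;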
Using the Zend Framework for the first time in a commercial product was not exactly hassle-free, and it still has quite a few issues with its webservices implementation. Currently there seems to be quite a bit of confusion regarding its Rest implementation and whether it is to be merged; it would be great if someone could clarify this.
The main issue I found with the Zend Framework's implementation of XmlRpc and Rest is that it assumes the payload it receives is valid. During development I tended to mix the payloads from SOAP, XmlRpc and Rest, yet it would assume that SimpleXML can parse the input.
For example, $this->_sxml is assumed to be a valid object; if it is not, you will get either an invalid method call or an undefined index, which doesn't render well for an XmlRpc server.
/**
 * Constructor
 *
 * @param string $data XML Result
 * @return void
 */
public function __construct($data)
{
    $this->_sxml = simplexml_load_string($data);
}

/**
 * toString overload
 *
 * Be sure to only call this when the result is a single value!
 *
 * @return string
 */
public function __toString()
{
    if (!$this->getStatus()) {
        $message = $this->_sxml->xpath('//message');
        return (string) $message[0];
    } else {
        $result = $this->_sxml->xpath('//response');
        if (sizeof($result) > 1) {
            return (string) "An error occured.";
        } else {
            return (string) $result[0];
        }
    }
}
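A defensive sketch of the same parsing step (not the ZF code itself): check the payload before handing it to xpath(), since simplexml_load_string() returns false on malformed XML rather than throwing.

<?php
// Hypothetical helper: fail loudly on an unparsable payload instead of letting
// later xpath() calls trigger undefined index / invalid method call errors.
function parseResponse($data)
{
    $sxml = simplexml_load_string($data);
    if ($sxml === false) {
        throw new Exception('Response is not valid XML: ' . substr($data, 0, 200));
    }
    return $sxml;
}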
One of the main issues with Rest was that it needed a ksort when using the Rest client, as the arguments were not necessarily passed in order. A request can be "rest.php?method=x&arg1=1&arg0=0" and each argument would be interpreted in the order it was received. This should be sorted in the next release of the ZF.
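To illustrate the ordering problem with a trivial example:

<?php
// "rest.php?method=x&arg1=1&arg0=0" yields the arguments out of positional
// order, so sort them by key before dispatching.
$args = array('arg1' => 1, 'arg0' => 0);   // as received
ksort($args);
var_dump(array_values($args));             // array(0, 1), positional order restored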
The webservices we are exposing need quite good performance given the number of transactions they will be handling, and Zend_Server_Reflection performs a large amount of reflection (which I only noticed after I started profiling), so I wanted to optimize away any overhead. That got me looking at Zend_XmlRpc_Server_Cache. The first thing I did was profile Zend_XmlRpc_Server_Cache, which itself added a considerable amount of overhead. Looking at its implementation, it uses serialize, which is a relatively slow process and should be avoided unless there is a large overhead in initializing objects, so most likely Zend_XmlRpc_Server_Cache will not add any benefit. var_dump'ing the reflection in XmlRpc spews out a shocking amount of information for some fairly large classes.
if (!Zend_XmlRpc_Server_Cache::get($cacheFile, $server)) { }
I tried a number of WSDL generators, including the implementation in the ZF incubator, which I found to be the best, yet I still had to write a large chunk of the WSDL by hand and adapt it.
The best way to debug is to run the SOAP client with verbose mode on, and it will typically tell you the issue straight away.
Another obscurity I found was in capturing the raw request data. In our local development environment, reading the raw request input and then reading it once again within the Zend Framework appears to work fine. However, in our pre-production environment (PHP 5.2.2) the second read of the raw request fails.
if (!isset($HTTP_RAW_POST_DATA)){
$HTTP_RAW_POST_DATA = file_get_contents('php://input');
}
It does seem a little odd that the XmlRpc server does not check whether $HTTP_RAW_POST_DATA is set before attempting to re-read the raw input.
Whilst running PHPUnit I noticed a very weird quirk in our local dev environment, which essentially did the following. You would expect this to output the contents of an array, right? Well, between the method call to x() and the result being returned to method y(), the value becomes NULL. This is very obscure and I've never seen anything like it, especially considering the value is explicitly set. I had a number of colleagues check this, which had us all scratching our heads. Has anyone else seen anything similar?
class test
{
    public function x()
    {
        $ret = array();
        foreach (range(1, 10) as $row) {   // placeholder loop; the original iterated over query results
            $ret[] = $row;
        }
        return $ret;
    }

    public function y()
    {
        $response = $this->x();
        var_dump($response);
    }
}

$t = new test();
$t->y();
Overall the project went pretty well, and I'm confident it is now stable, especially given the number of tests we ran against it. It is adaptable to other projects that we may need to expose via an API; in total there are about 6000 lines of code just testing the three different protocols it supports. I would rather have avoided the Rest implementation in ZF as it still needs a lot of work, however XmlRpc is a lot more stable and I would quite happily use it again. As there is a lot of overhead with reflection it is not the fastest implementation; its response times were comparable to some of the heavier web pages we have, despite serving fairly simple functionality. It would be ideal to replace the reflection with something lighter, such as an array of the corresponding methods, parameters and types, however I would only look into that if performance became a major issue.
PS. Just to note, I used PHP's built-in SOAP server.
I've had a lot of experience with other programming languages, however a number of weeks ago I had to learn C++ from scratch in a very short period of time. This was to develop a real-time stock quote client; the goal was simply to push data from remote servers into our databases, filter which messages it would receive, and get something up and running fast as deadlines loomed. This was simple enough, however with the rush the application had its inherent flaws, due to my lack of knowledge of C++, of the API, and of the goals it had to accomplish.
I’ve since had time to learn a little more C++ and limited time to design the application properly.
The core problems with the application:
The first issue was that I used an API from interactive-data, which was compatible with "gcc version 3.2.3" and is not kept up to date. This meant compiling a compatible gcc from source, for 32-bit platforms only.
./configure --prefix=/usr/local/gcc/ --mandir=/usr/share/man --infodir=/usr/share/info --enable-shared --enable-threads=posix --disable-checking --with-system-zlib --enable-__cxa_atexit --enable-languages=c,c++,objc,obj-c++
make bootstrap
cd gcc
make
sudo make install
Once I had a compatible compiler, I then had to modify the Makefile and move a number of lib/.so files to get MySQL to compile and get things working. Unfortunately I did not have a local machine to attach a debugger to, so everything was trial and error from the command line with g++32, which makes identifying runtime errors difficult.
Once everything was in place, the logic was fairly simple: for each field retrieved, construct a query with the field name, checking the field value's datatype, whether it be a datetime, varchar etc. Insert each trade message into one table and update another; if either fails, check whether the fault was due to a missing column and, if so, add it and re-execute the queries. A sketch of that loop is below.
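Although the client itself was written in C++, the insert/repair loop is easier to sketch in PHP; the table name, column handling and error code check here are illustrative only:

<?php
// Sketch of the insert/repair loop: try the insert, and if MySQL reports an
// unknown column (error 1054), add the column and re-execute the query.
// Field names are assumed to come from the trusted feed, not from users.
function insertTradeField(mysqli $db, $field, $value, $type = 'VARCHAR(255)')
{
    $sql = sprintf("INSERT INTO trade_messages (`%s`) VALUES ('%s')",
                   $field, $db->real_escape_string($value));
    if ($db->query($sql)) {
        return true;
    }
    if ($db->errno == 1054) {   // ER_BAD_FIELD_ERROR: the column does not exist yet
        $db->query("ALTER TABLE trade_messages ADD COLUMN `{$field}` {$type}");
        return $db->query($sql);
    }
    return false;
}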
The problem soon arises when you need to know when each column was actually last updated, with which field, value and datetime, and the last insert id for the trade messages. Whilst looping through each trade message I constructed an XML schema containing the above, however the tricky part is ensuring that it only updates the fragment matching the field in the schema. Not an ideal format to query from a database.
One of the fundamental issues is managing and storing data. For some exchanges you don't want to store every trade message; simply storing the current data for a number of instruments is enough. Which servers or databases do you peg data to? If one database goes down, how do you handle fault tolerance? MySQL Cluster is not a feasible solution, requiring multiple servers and large amounts of memory per installation. The databases are highly susceptible to corruption or faults. Also, particular sites may require data from multiple exchanges, so separating trade messages per database is also not ideal.
All of this fundamentally comes down to configuration management.
One of the fundamental aspects of the application is configuration management. This covers where data should be stored for a particular exchange; the type of data to store, whether per trade message, current data or both; which servers to source data from; whether the data is real time or delayed; and whether to source data for bonds, equities, automated trades etc. Queries can also be grouped, or sent to remote servers. Some of the products, just for the London Stock Exchange for example, are:
All of which is stored in several database tables and managed via a MySQL database and PHP frontend.
In high-performance web applications you will always have bottlenecks within your application. Identifying and optimizing these bottlenecks is a tedious task, and they typically show themselves under load. A single bad or unindexed query can bring a server to its knees. A large number of rows will also help to highlight any poor queries, and on very large datasets you may reach the point where you have to decide whether to denormalize the database schema.
Whilst I develop sites, I typically print out all queries and EXPLAIN each SELECT statement at the bottom of each page, highlighting it in red if it is doing a full table scan, using temporary tables or a filesort, as well as displaying SHOW INDEXES FROM table...
Not only will this help you to optimize sites, you can also spot bad logic and areas to optimize, such as a query executed on every iteration when looping through a users table, for example. A sketch of such a debug helper is below.
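A sketch of that debug helper, assuming the page collects its SQL statements into an array:

<?php
// For each SELECT run on the page, EXPLAIN it and flag full table scans,
// temporary tables and filesorts in red at the bottom of the page.
function explainQueries(mysqli $db, array $queries)
{
    foreach ($queries as $sql) {
        if (stripos(ltrim($sql), 'SELECT') !== 0) {
            continue;   // only SELECT statements are explained here
        }
        $result = $db->query('EXPLAIN ' . $sql);
        while ($row = $result->fetch_assoc()) {
            $extra = (string) $row['Extra'];
            $bad   = $row['type'] === 'ALL'
                  || stripos($extra, 'Using temporary') !== false
                  || stripos($extra, 'Using filesort') !== false;
            printf('<pre style="color:%s">%s%s%s</pre>',
                   $bad ? 'red' : 'green',
                   htmlspecialchars($sql), PHP_EOL,
                   htmlspecialchars(print_r($row, true)));
        }
    }
}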
One of my favourite Linux commands lately is watch. Mac users can get it from MacPorts via "sudo port install watch". A few other handy applications are mysqlreport and mytop.
# Append the process list to a file every second
watch -n1 "mysqladmin -uroot processlist >> watch.processlist.txt"

# Count the number of locked processes
watch -n1 "mysqladmin -uroot processlist | grep -i 'lock' | wc -l"

# Count the number of sleeping processes
watch -n1 "mysqladmin -uroot processlist | grep -i 'sleep' | wc -l"

# Run a specific query every second
watch -n1 "mysql -uadmin -p`cat /etc/psa/.psa.shadow` trade_engine --execute \"SELECT NOW(), date_quote FROM sampleData WHERE 1=1 AND permission = '755' AND symbol = 'IBZL' GROUP BY date_quote;\""

# Email mysqlreport every 60 seconds
watch -n60 "mysqlreport --all --email andrew@email.com"

# Display the process list as well as appending the contents to a file
watch -n1 "mysqladmin -uadmin -p`cat /etc/psa/.psa.shadow` processlist | tee -a process.list.txt"
Watching the processlist is very handy for identifying locked, sleeping or sorting process states. If you have a large number of locked processes you should typically change the table type to InnoDB, which supports row-level locking. If you have a large number of sleeping connections and you have persistent connections enabled, it most likely indicates that connections are not being reused.
Running a specific query every second is exceptionally handy; the example I gave indicates whether one of our crons is functioning correctly, and you can watch each row being inserted or updated. mysqlreport gives numerous pieces of information that are extremely helpful in identifying issues; you can read about it more in depth at hackmysql.com/mysqlreportguide.
Look at the MySQL slow query log and optimize each query, starting with the most common. Consider whether you have to execute that query at all, or whether you can use a cache such as memcached; a sketch of a simple read-through cache follows.
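For the heaviest queries, a simple read-through cache with the pecl memcache extension might look like this (the key scheme and TTL are arbitrary):

<?php
// Return cached rows when available; otherwise run the query and cache the
// result for $ttl seconds.
function cachedQuery(mysqli $db, Memcache $cache, $sql, $ttl = 60)
{
    $key  = 'sql:' . sha1($sql);
    $rows = $cache->get($key);
    if ($rows === false) {
        $rows   = array();
        $result = $db->query($sql);
        while ($row = $result->fetch_assoc()) {
            $rows[] = $row;
        }
        $cache->set($key, $rows, 0, $ttl);   // pecl memcache signature: key, value, flags, expire
    }
    return $rows;
}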
I also typically tend to look at the following:
Reference:
http://dev.mysql.com/tech-resources/presentations/presentation-oscon2000-20000719/index.html
Recently I had to install memcached on a number of servers, and I would always end up with errors while memcached tried to locate libevent. I always seem to forget about LD_DEBUG, so I figured I would write up the process for installing memcached.
One of the dependencies of memcached is libevent, so first download the source files for libevent.
tar -xvf libevent-1.3b.tar.gz
cd libevent-1.3b
./configure;make;make install;
Download the latest Memcached source code from danga.com
gunzip memcached-1.2.1.tar.gz
tar -xvf memcached-1.2.1.tar
cd memcached-1.2.1
./configure;make;make install;
Often libevent.so cannot be found when executing memcached. The LD_DEBUG environment variable is very helpful in determining where libraries are being loaded from.
LD_DEBUG=help memcached -v
LD_DEBUG=libs memcached -v 2>&1 > /dev/null | less
18990: find library=libevent-1.3b.so.1 [0]; searching
...
18990: trying file=/usr/lib/libevent-1.3b.so.1
18990:
memcached: error while loading shared libraries: libevent-1.3b.so.1: cannot open shared object file: No such file or directory
Simply place the library where memcached will find it and execute memcached.
ln -s /usr/local/lib/libevent-1.3b.so.1 /lib/libevent-1.3b.so.1
memcached -d -u nobody -m 512 -l 127.0.0.1 -p 11211
The options for memcached are:
-l <ip_addr>
Listen on <ip_addr>; defaults to INADDR_ANY. This is an important option to consider as there is no other way to secure the installation. Binding to an internal or firewalled network interface is suggested.
-d
Run memcached as a daemon.
-u <username>
Assume the identity of <username> (only when run as root).
-m <num>
Use a maximum of <num> MB of memory for object storage; the default is 64 megabytes.
-M
Instead of evicting items from the cache when max memory is reached, return an error; additions will not be possible until adequate space is freed up.
-c <num>
Use <num> max simultaneous connections; the default is 1024.
-k
Lock down all paged memory. This is a somewhat dangerous option with large caches, so consult the README and memcached homepage for configuration suggestions.
-p <num>
Listen on port <num>, the default is port 11211.
-r
Raise the core file size limit to the maximum allowable.
-h
Show the version of memcached and a summary of options.
-v
Be verbose during the event loop; print out errors and warnings.
-vv
Be even more verbose; same as -v but also print client commands and responses.
-i
Print memcached and libevent licenses.
-P <filename>
Print pidfile to <filename>, only used under -d option.
To install the pecl package for PHP
wget http://pecl.php.net/get/memcache-2.1.2.tgz
gzip -df memcache-2.1.2.tgz
tar -xvf memcache-2.1.2.tar
cd memcache-2.1.2
phpize
./configure;make;make install;
Add memcache.so to the php.ini file
extension=memcache.so
Then run
php -i | grep -i 'memcache'
memcache should be listed; then restart the web server.
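A quick smoke test of the extension against the daemon started above:

<?php
// Connect to the local memcached instance, store a value and read it back.
$memcache = new Memcache();
$memcache->connect('127.0.0.1', 11211) or die('Could not connect');

$memcache->set('greeting', 'hello from memcached', 0, 30);   // 30 second TTL
var_dump($memcache->get('greeting'));
var_dump($memcache->getVersion());                           // version reported by the daemon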
For further information:
Distributed Caching with Memcached
Currently I'm working with stock market data, and it's quite an interesting topic once you get to the point of real-time data, as it brings a number of new concepts into the mix. The first challenge is to import information from the feeds into our databases (MySQL); whilst this should be a relatively straightforward task, I'm sure we are going to hit issues with writes to the database (INSERTs/UPDATEs). The information from these feeds will be used for various tasks that will require a lot of processing. The information will be displayed to the user via the web, therefore we have to keep stock market information updated dynamically for the user via AJAX.
Real-time computing should ideally mean latency under 1 millisecond, however I have previously worked for companies where their definition of real time meant a 15-minute delay. Whilst delays over the web are inevitable, I believe a one to three second delay would be acceptable for users viewing current information via AJAX.
As we will be using the stock market data for multiple applications, we will need to replicate it from MySQL, which only adds a further bottleneck. Most notably, performance with replication will become an issue because every slave still needs to execute the same write queries as the master. As the majority of queries will be writes rather than reads, this becomes a fundamental problem in itself, making replication questionable. So we will have to look at a multi-master MySQL setup, or MySQL Cluster, which holds databases in memory. The fundamental problem with replication is ensuring the consistency of the data once replicated. Ideally, if a slave falls behind we want to ignore updates that have previously been issued and just use the current values, to ensure we do not have stale data.
We will ideally have to create a heartbeat monitor and validate the latency of data between nodes. As mentioned previously, we want to ensure that no slave falls behind; for any slave that does fall behind, we want only the latest updates for each stock to be applied and the rest of the binary log ignored. Additionally, we would need to separate inserts of historical data based on a sample time ('1 Min', '15 Min', 'Hour', 'Midday', 'End Of Day', 'End Of Week', 'End Of Month'); ideally this would be horizontally scaled.
This could be extended with a little JavaScript to monitor latency for the end user and notify them when the data is out of date relative to the last sync.
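A rough heartbeat sketch, assuming a single-row heartbeat table (id INT PRIMARY KEY, ts DATETIME) replicated from the master, with hypothetical hosts and credentials, and clocks kept in sync between servers:

<?php
// Written on the master every few seconds (e.g. from cron):
$master = new mysqli('master.example', 'user', 'pass', 'quotes');
$master->query("REPLACE INTO heartbeat (id, ts) VALUES (1, NOW())");

// Checked on each slave: how far behind is the replicated timestamp?
$slave  = new mysqli('slave.example', 'user', 'pass', 'quotes');
$result = $slave->query("SELECT TIMESTAMPDIFF(SECOND, ts, NOW()) AS lag FROM heartbeat WHERE id = 1");
$row    = $result->fetch_assoc();

if ($row['lag'] > 5) {
    error_log("Slave is {$row['lag']}s behind; flag quotes as stale");
}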
The website itself will have to use AJAX to dynamically update all stock prices and market activity applicable to that page. The fundamental issue is that prices update in real time: how often can we make an HTTP request while staying within reason on server resources? Looking at this further, we have the bottleneck of TCP/IP connections, the client's bandwidth (ideally tested per user), and whether the client accepts gzip or compressed content to reduce bandwidth costs.
With an AJAX request every second (a server typically handles around 200 requests per second):
with 25 users online, 25 * 60 = 1,500 requests per minute, or 2,160,000 per day
with 100 users online, 100 * 60 = 6,000 requests per minute, or 8,640,000 per day
We could optionally increase the client's connection limit in Internet Explorer with a registry key, raising the two-connection limit for persistent connections to HTTP 1.1 agents recommended by RFC 2616. The IE7 release does not increase this limit by default; this is most noticeable when, for example, a user downloads two files and IE waits for a connection to be released before starting a third download.
Windows Registry Editor Version 5.00

[HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Internet Settings]
"MaxConnectionsPerServer"=dword:00000010
"MaxConnectionsPer1_0Server"=dword:00000010
I have been a developer for roughly 10 years and have worked with an extensive range of technologies. Whilst working for relatively small companies, I have been involved in all aspects of the development life cycle, which has given me broad and in-depth experience.