$theTitle=wp_title(" - ", false); if($theTitle != "") { ?> } else { ?> } ?>
by Andrew Johnstone
One of our clients Gorkana was bought by Durrants. As such I have been working on integrating and merging data. As a result I wrote a script to synchronize data between MySQL and Oracle, which is presently a one way synchronization.
The script(s) to synchronize data was broken into a few scripts.
Key points of the script.
Whilst setting this up I came across a few issues. The infrastructure for Gorkana is located at Bytemark in Manchester, and Durrants in London.
The first task was to setup a VPN connection between Durrants and our office, as well as our production machines.
We had issues setting up the VPN and tried varying software VPN clients including racoon and openvpn with a site to site connection. This chewed up quite some time and I ended up using vpnc as we were unable to establish a site to site connection in adequate time.
Whilst using vpnc the connection dropped after long periods of time and I wrote a few scripts to reestablish the tunnel if it was down. Ping is restricted on the network, which is why I used “netcat -w 3 -z ${ORACLE_SERVER} ${PORT}” to test the connection.
/etc/vpnc/vpn.wrapper.sh download – Ensure single execution
/etc/vpnc/routes.sh download – Tests and attempts to reconnect VPN.
Whilst testing, the VPN connection would drop out and one of oci_connect, oci_parse, oci_execute would block IO indefinitely on a read syscall. Despite the php documentation stating “OCI8 can hang until a TCP timeout occurs and an error is returned, which might be several minutes.”.
There are a few ways I attempted to resolve the problem.
I added connect and send timeouts to “$ORACLE_HOME/network/admin/sqlnet.ora” with the following.
TCP.CONNECT_TIMEOUT=10 SQLNET.SEND_TIMEOUT=300
Oracle supports “Fast Application Notification (FAN) Support“, which effectively pings the oracle server and fails over to other connections in the pool. The only issue with this is that the network I am connecting to prohibit ping. As such not making this method feasible.
In order to enable FAN add to the php ini “oci8.ping_interval=1” and “oci8.events = On” and enable on oracle;
SQL> execute dbms_service.modify_service( SERVICE_NAME => 'GMD', AQ_HA_NOTIFICATIONS => TRUE);
I have yet to see the script become blocked on IO after making the above changes or tested over a long period, however if this fails, I can fork the script and either set an alarm or terminate the child from the parent process using IPC .
I first pushed the data to Oracle using Multi Table Inserts, which has a limitation in Oracle 10g where by the sum of all the INTO columns cannot exceed 999 (ORA-24335). As an example 100 rows (1000 columns divided with 10 columns = maximum 100 rows to insert). This has been corrected in 11g.
INSERT ALL INTO changelog_users (changetype, fieldname fromval, toval) VALUES ('INSERT', 'fname', '','Andrew') INTO changelog_users (changetype, fieldname fromval, toval) VALUES ('INSERT', 'sname', '','Johnstone')
I switched this to union the results, which avoids this limitation.
INSERT INTO changelog_addresses (addresses_changelog_id,date_created,changetype,fieldname,fromval,toval,entity_id) SELECT '11833904',(to_date('2011-01-18 15:44:37', 'yyyy-mm-dd hh24:mi:ss')),'INSERT','is_primary','','0','45012' FROM dual UNION ALL SELECT '11833905',(to_date('2011-01-18 15:44:37', 'yyyy-mm-dd hh24:mi:ss')),'INSERT','is_primary','','0','45013' FROM dual
Before pushing data to oracle I rewrote queries to take advantage of associative fieldnames in order, so I do not have to filter data in PHP. As an example the following converts enum types to boolean values to insert directly into oracle.
SELECT `cl`.*, IF(toval IS NULL,NULL, IF (toval = "YES",0,1)) AS `toval` FROM `users_changelog` AS `cl` INNER JOIN `users` AS `u` ON u.users_id = cl.entity_id WHERE (synchronized = 'NO') AND (u.user_type = "JOURNALIST") AND (`fieldname` IN ('search_hidden_gds','is_primary_contact'))
Queries are batched into chunks of 10,000 rows to insert into Oracle and grouped by the corresponding tables to insert. Database queries to source data from MySQL were reused within filters to ensure only necessary data was pushed across, as only a subset was required to be pushed.
There are a few things that need to be optimized yet, updating a single table with 14 million rows is taking 44 seconds to update and the index is not being used with the between statement. However once the data has been processed it no longer needs to be retained. As such can simply truncate the data.
UPDATE addresses_changelog SET synchronized = "YES", last_sync=NOW() WHERE synchronized = "NO" AND addresses_changelog_id BETWEEN 11833904 AND 11840954 Finished Query MySQL in 44.525539875
It takes 0.2 seconds to insert batches of 10k into oracle an and currently the MySQL queries have not been optimized, but execute between 0.2 to 40 seconds on tables that range up to 18 million rows. The data to be synced after the initial transfer of data will be fairly minimal as the application is not write intensive.
Ideally I need to fork the script when executing statements, which will vastly speed up the script.
I have been a developer for roughly 10 years and have worked with an extensive range of technologies. Whilst working for relatively small companies, I have worked with all aspects of the development life cycle, which has given me a broad and in-depth experience.