A tale of a three-phased hardware upgrade

The second part of our three-phased hardware upgrade is now complete. This phase included upgrading all electrical mains and fuses from 16 Amps to 64 Amps. This was needed because the additional disks, the SOL nodes and the new SQL server draw much more power than the previous setup. The electrical work was completed yesterday, slightly delayed by the need to get an official to sign off on the wiring (which is required for such jobs in the UK).

The part that we completed today was moving the transaction log of the SQL cluster from the SCSI array it had been using over to our fiber-channel array. This was needed because we learned during the first phase that the combination of our SCSI array and an active-standby setup of Microsoft Windows 2000 Server causes the array to enter re-synch mode each time there is a considerable change in the cluster. As we plan to replace the SQL Server cluster with brand new servers running Microsoft Windows Server 2003, we wanted to move the transaction logs over to the fiber-channel array to avoid a 12-hour re-synch period for the SCSI array.

This was all completed at around 14:00 today, but when we started up Tranquility, performance was way off. We of course suspected that the number of spindles servicing the transaction log disk volume was insufficient, so we increased it to 6 disks servicing the log drive. That didn’t change anything, and the problem was eventually chased down to a deadlock fault in the procedure that handles the loading of stations and the automatic rollback of failed item-exchange sessions. Because the hardware was the biggest change, we didn’t spot it straight away, and I apologize for this.

All I/O and power upgrades are now complete, and we are ready to replace the SQL cluster and add the additional SOL nodes next week, which will double TQ's capacity to handle future growth.

Once again, I apologize for having exceeded our downtime estimates.