System Status

Information on the Network Outage this Morning

This morning at 6:40AM, campus networking staff applied firmware upgrades to various network switches on campus, including those in the CARLI Production Data Center at UIUC. The upgrade failed on our redundant 10Gb networking equipment. Staff were able to get the network back online after a few hours, but without redundancy. At 12:10PM, they backed out the firmware upgrades on our data center switches, which took services offline for another half hour but restored the network to full redundancy.

VuFind Local Catalog outage this morning

At 10:10AM today, the server running the SOLR index containing the VuFind Local catalogs stopped responding. A Java process was consuming all available CPU, so the server was restarted to restore service. The service appears to have been offline for about 12 minutes.

This has happened before, and we think it is related to Java Garbage Collection (which runs at unpredictable intervals). The solution may be to schedule regular restarts of this server in the middle of the night.

Brandon Gant

CARLI System Services

VuFind Local Catalog Maintenance Tonight

On Sunday morning, we moved a rebuilt SOLR index of Local Catalogs to the VuFind server. That index has not updated for the past two nights. After 10PM tonight, we will copy the index over again. This will require a reboot of the server, which will take this service offline for a few minutes.

Brandon Gant

CARLI System Services

VuFind Problem 02/07

According to monitoring, the local catalog SOLR index for VuFind went offline at 7:16PM. This caused searches in all local catalogs to display a connection timeout error message ("Data not available for display"). That server has been rebooted and the system appears to be working again.

Brandon Gant

CARLI System Services
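For readers curious how an outage like this can be detected, here is a minimal sketch of an availability probe. It is not the monitoring CARLI actually runs; the function name `is_up`, the timeout value, and the example URL are all assumptions made for illustration.

```python
import urllib.request


def is_up(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return True if `url` answers with HTTP 200 within `timeout` seconds.

    `opener` is injectable so the check can be exercised without a network.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError, connection refusals, and socket timeouts
        # are all subclasses of OSError, so each one counts as "down".
        return False


# Hypothetical usage (placeholder address, not a real CARLI endpoint):
# is_up("http://solr.example.org/ping")
```

A probe like this only reports whether the service answers; it says nothing about why it stopped, so the server logs are still needed to diagnose the cause.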

Voyager Connection Problems on Tuesday, Jan 29th

Yesterday (January 29th) at 12:00PM and again at 1:30PM, client connections to Voyager were dropped. The Production CARLI servers are at the University of Illinois at Urbana-Champaign. The campus network received a large amount of external network traffic, which caused one of the firewalls to drop all existing external connections to campus. Campus staff worked with the firewall vendor and resolved the problem by 5PM.

Brandon Gant

CARLI System Services

Reports Server Updates

Right now, the Reports Server is a copy of Production from Saturday (before we took it offline). We are working on getting the Reports Server back in sync with Production, and it should be caught up by this afternoon.

Brandon Gant

CARLI System Services

Reports Server Online

The data transfer is complete and the server has been booted up on the new disk array. We still need to verify the data today and change a few more cables, but this should not require more downtime.

Brandon Gant

CARLI System Services

Extended Downtime for Reports Server

After 10 hours, the data copy process is about 25% complete. At this rate, the Reports Server should be back online Tuesday, January 15th. Sorry about the additional downtime, but we need to get this data migrated off the older Sun 6140 hardware.

Brandon Gant

CARLI System Services
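The copy-time estimate above can be worked through with simple arithmetic. The sketch below assumes the copy rate stays constant, which real transfers rarely do, and the function name `remaining_hours` is made up for illustration.

```python
def remaining_hours(elapsed_hours, fraction_done):
    """Estimate the hours left in a copy, assuming a constant rate."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    total_hours = elapsed_hours / fraction_done  # projected full duration
    return total_hours - elapsed_hours


# 10 hours elapsed at about 25% complete
print(remaining_hours(10, 0.25))  # prints 30.0
```

In other words, a copy that is a quarter done after 10 hours needs about 40 hours in total, or roughly 30 more hours of copying. The figure covers the copy itself only, not any verification afterwards.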

Reports Server Reboot

Tonight at midnight, I will bring the Reports Server down and kick off a script to migrate data from the current Sun 6140 disk array to our new Dell Compellent disk array. To be ready for tonight, I need to at least reboot the Reports Server so that it will apply kernel changes to see the new disk volumes. I will start the reboot process in the next few minutes.

Brandon Gant

CARLI

Voyager Downtime

Voyager should be back online in about 30 minutes. The maintenance on Oracle and VuFind is complete. The Voyager maintenance would be complete too, but many of the Voyager services were left in a "disabled" state by all the Oracle server reboots. They should all now be enabled, and I am rebooting the server again to make sure they all come online automatically.

Brandon Gant

CARLI