Analysis of the Voyager Issues Thursday Morning (11/07/13)
Friday, November 8, 2013 - 10:45am
Yesterday morning (Thursday, Nov 7th) at around 4:45AM, the disk on the Voyager server reached 100% usage. All services were still online, but the full disk prevented requests in VuFind and WebVoyage, blocked record updates in the Voyager Staff Clients, and caused other issues. Two problems combined to create this service outage:
First, a non-MARC file was manually generated to diagnose a record loading issue. An automated batch job retrieved this file and began working on it. We have modified the batch job to only retrieve files with a very specific file name structure.
Second, the automated batch job tried to parse the non-MARC data as if it were MARC data. This caused the job to go into an infinite loop writing data to disk (which filled up the disk space). We have modified this batch job to ignore any non-MARC data it may encounter in the future.
Once we understood the problem and cleaned up files to free up disk space, the fastest way to restore service was to reboot the Voyager server.
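The two batch job changes described above (only retrieving files with a specific file name structure, and ignoring non-MARC data) can be illustrated with a minimal Python sketch. This is not the actual CARLI batch job code. The file name pattern and the function names are hypothetical, since the real naming convention is not stated here. The content check relies only on the standard MARC 21 (ISO 2709) layout: a 24-byte leader whose first five bytes give the record length in ASCII digits, and a record terminator byte (hex 1D) at the end of each record.

```python
import re

# Hypothetical naming convention, for illustration only; the pattern
# actually used by the batch job is not given in this announcement.
EXPECTED_NAME = re.compile(r"[a-z]{3}_\d{8}\.mrc")

LEADER_LENGTH = 24        # every MARC 21 record starts with a 24-byte leader
RECORD_TERMINATOR = 0x1D  # ISO 2709 record terminator byte


def has_expected_name(filename: str) -> bool:
    """Accept only file names that match the expected structure exactly."""
    return EXPECTED_NAME.fullmatch(filename) is not None


def looks_like_marc(data: bytes) -> bool:
    """Cheap sanity check that data begins with one plausible MARC record."""
    if len(data) < LEADER_LENGTH:
        return False
    length_field = data[:5]
    # Leader positions 00-04 must hold the record length as ASCII digits.
    if not length_field.isdigit():
        return False
    record_length = int(length_field)
    if record_length < LEADER_LENGTH or record_length > len(data):
        return False
    # The last byte of the record must be the record terminator.
    return data[record_length - 1] == RECORD_TERMINATOR
```

A job using checks like these would skip any file that fails either test before parsing begins, so malformed input is rejected up front instead of being handed to the MARC parser.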
Thursday, November 7, 2013 - 9:35am
Planned network maintenance may cause intermittent outages for all CARLI services from midnight until 10AM Sunday morning.
RESOLVED: November 7, 2013: WebVoyage offline and problems working in Voyager clients
Thursday, November 7, 2013 - 7:57am
As of 9:15am, CARLI Staff have rebooted the Voyager server, and all systems should be back online and working properly.
WebVoyage stopped responding at approximately 5:00am on Thursday, November 7, 2013. Voyager server usage has been unusually high and, in addition to affecting WebVoyage, libraries may also experience problems logging into or working in the Voyager clients. CARLI Staff are aware of this issue and are working to resolve it.
I-Share Voyager offline October 2, 2013, midnight-6am
Tuesday, October 1, 2013 - 1:38pm
The CARLI Voyager system, I-Share, will be taken offline for all libraries from midnight until 6am Wednesday, October 2, to apply an emergency fix to a database index. During the outage VuFind will be available for searching the local and union catalogs, but VuFind will not be able to display item status information, nor will it support access to patron account records or the placing of requests.
During the outage Voyager staff client access will be limited to the Offline Backup function of the circulation client only. CARLI’s batch jobs that run circulation notices and the Voyager request promotion service will not run tonight; those data will be picked up in the batch run the following night. Other services such as SFX and discovery services that query Voyager will also be affected by this outage.
While we have successfully applied the fix on our test server, there is a slight risk that our work tonight may fail to resolve the problem. If that is the case, downtime may extend into the day on Wednesday. Please watch your email and the System Status News for the latest information and as always, if you have questions or experience any system problems not listed in our announcements, please contact us at firstname.lastname@example.org.
VuFind Problems on Monday 9/9/2013
Tuesday, September 10, 2013 - 10:58am
Yesterday between 12:06pm and 12:40pm, VuFind received many thousands of requests for the same search from the same IP address. That amount of traffic severely impacted VuFind performance. We have contacted the institution that owns the IP address to find out what happened and to prevent it from happening again.
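As a rough illustration of how a pattern like this (the same search repeated from one IP address) could be spotted in request data, here is a minimal Python sketch. It is not the tool CARLI used. The function name, the input format of (IP address, search) pairs, and the threshold are all assumptions made for the example, and the caller is assumed to have already limited the requests to the time window of interest.

```python
from collections import Counter


def find_repeated_searches(requests, threshold):
    """Return (ip, search) pairs seen at least `threshold` times.

    `requests` is an iterable of (ip_address, search_string) tuples,
    a hypothetical format standing in for parsed web server log lines.
    """
    counts = Counter(requests)
    return {key: n for key, n in counts.items() if n >= threshold}


# Made-up sample data using a documentation-only IP range.
sample = [("192.0.2.10", "q=history")] * 5 + [("192.0.2.11", "q=maps")]
repeated = find_repeated_searches(sample, threshold=3)
```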
VuFind Outage to Expand Disk
Wednesday, August 21, 2013 - 9:43pm
The "I-Share" portion of VuFind is going offline during the next hour so that the underlying disk volume can be expanded. "Local" catalogs will still be available.
This maintenance will allow CARLI staff to add new functionality to VuFind.
UIUC Campus Network Upgrades
Saturday, August 3, 2013 - 8:11am
The UIUC campus networking staff are upgrading routers on campus this weekend and next weekend. One of the routers in the data center failed last night at 8:30PM and has been powered off until it can be replaced. There have been a few short network disruptions to the CARLI servers during the last 12 hours and there could be more today and tomorrow.
Sunday 6-10AM Maintenance
Thursday, July 25, 2013 - 4:54pm
This Sunday, July 28th, between 6AM and 10AM, the Production Voyager, Oracle, WebVoyage, and VuFind servers will be restarted, causing a small amount of downtime.
This work is being done to make some final changes to the operating systems before the start of the Fall Semester.
Finished Voyager Migration to Linux
Saturday, July 13, 2013 - 9:55pm
Production Voyager is now running on Linux. Services have been cut over to the new server and the old Solaris server has been turned off. Voyager Cataloging and Acquisitions Clients should be enabled again.
Please contact email@example.com if you run into any problems with this new server or notice any performance issues.
Starting Voyager Migration to Linux
Friday, July 12, 2013 - 5:09pm
We are getting ready to start the Voyager migration from Solaris to Linux. Most services will be online during this work, but we are about to take Voyager Cataloging and Acquisitions Clients offline to make sure records are not updated while we are reindexing data. I will post another message when the migration is complete.