VuFind Outage this Morning (July 31)
Friday, July 31, 2015 - 2:52pm
The vufind.carli.illinois.edu service stopped responding this morning at around 10:30AM. The Apache web server was restarted and the service was back online at 12:10PM. I apologize for the delay in getting the system online, but a few staff are on vacation today, so troubleshooting took longer than normal.
This is the third time Apache has stopped responding on this server since we upgraded to Voyager 9.1.1. This time we saw that certain Apache processes are not being released properly and are accumulating over time. This eventually exhausts the number of web processes available on the system, causing it to ignore new requests.
We don't know exactly what is causing the processes to accumulate, but we know for certain that restarting Apache clears them out of the system. To prevent future outages, we have configured the server to restart the web server every day. The restart is quick and should not disrupt user searches.
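For anyone curious what watching for this kind of process buildup could look like, here is a minimal sketch. It is not the script running on our server. The process name `httpd`, the limit of 256 workers, and the 90% threshold are all assumed values chosen for illustration.

```python
import subprocess

# Hypothetical values: the process name and worker limit are assumptions,
# not taken from the actual server configuration.
APACHE_PROCESS_NAME = "httpd"
MAX_WORKERS = 256


def count_workers(ps_output: str, name: str = APACHE_PROCESS_NAME) -> int:
    """Count lines of `ps -eo comm` output that exactly match the process name."""
    return sum(1 for line in ps_output.splitlines() if line.strip() == name)


def needs_restart(worker_count: int, limit: int = MAX_WORKERS,
                  threshold: float = 0.9) -> bool:
    """Return True once the worker count reaches the given fraction of the limit."""
    return worker_count >= limit * threshold


def current_ps_output() -> str:
    """Read the command name of every running process (Linux/Unix only)."""
    result = subprocess.run(
        ["ps", "-eo", "comm"], capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    workers = count_workers(current_ps_output())
    print(f"{workers} {APACHE_PROCESS_NAME} processes; "
          f"restart advised: {needs_restart(workers)}")
```

A check like this only reports when the count is getting close to the limit. It does not explain why the processes are accumulating, which is why a scheduled daily restart remains the simpler safeguard for now.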
Voyager Maintenance this Sunday (July 19th)
Thursday, July 16, 2015 - 2:22pm
Starting at 12:01AM this Sunday, July 19th, the Voyager and Oracle servers will be brought down so that their data can be transferred to a different storage array. The data transfer is expected to take about 10 hours. I will send an update before 10AM Sunday if it looks like the transfer is taking longer than anticipated.
To avoid confusion, we will also take VuFind offline while Voyager is offline.
Voyager Outage at 6:36AM (July 10th)
Friday, July 10, 2015 - 11:14am
The Production Voyager server stopped responding at 6:36AM this morning. By 6:57AM, the VMware operating system decided that the server was really offline and initiated a restart. Services were back online by 7:05AM.
There is no indication that Voyager or the Linux operating system had any problem at that time. At 6:31AM, VMware automatically migrated some servers over to the same physical server that Production Voyager is on to rebalance the load. It looks like the Voyager virtual server lost access to the physical CPUs, so this migration process either caused or contributed to the crash.
Since we have spare capacity and are in no danger of overloading the servers, I have configured the automated rebalance to be a more manual process.
Sunday Server Patching (July 12th)
Wednesday, July 8, 2015 - 3:00pm
This Sunday morning starting at 12:01AM, Production servers will be patched and rebooted. All patching should be completed before 10AM Sunday, July 12th.
Voyager will be down from approximately 12:01AM to 12:30AM since the Oracle server needs to be patched and rebooted while Voyager is offline.
VuFind web server outages
Tuesday, July 7, 2015 - 1:09pm
There have been two VuFind outages that appear to be caused by the Apache web server going into an odd state. The first outage was Friday, June 26th at 9PM and the second was at around 5AM yesterday (Monday, July 6th). In both cases, no obvious cause was found and the issue was corrected by restarting Apache.
Please continue to contact CARLI support if you notice that VuFind has stopped responding or has slowed down dramatically. If the problem persists and no cause has been identified, we may need to implement a weekly or nightly restart of the application before the start of the Fall Semester.
CONTENTdm Down for Software Upgrade
Tuesday, June 23, 2015 - 3:43pm
CARLI's instance of CONTENTdm is down for a software upgrade from June 23 – 28, 2015. CARLI Digital Collections is still available for searching and browsing, though collections are static as of 5:00pm, June 23, 2015.
Note to library staff: Access to the CONTENTdm Project Clients and CONTENTdm Web Administration will not be available during the upgrade. Collections cannot be created or published, and no changes can be made to collections or collection home pages.
We expect CONTENTdm and the improved CARLI Digital Collections website to be back online no later than Monday, June 29.
Please refer to the CONTENTdm Upgrade webpage for more information, including what changes libraries will need to make after the upgrade.
Send any questions to the CARLI Office at firstname.lastname@example.org
Voyager Slow this Afternoon (Wednesday, June 17)
Wednesday, June 17, 2015 - 3:58pm
As many of you probably noticed, the system was extremely slow this afternoon (Wednesday, June 17) starting at around 12:30PM. We have narrowed things down to a set of Course Reserve database queries that originated through VuFind from one member library. We killed those queries at 3PM and the system recovered immediately.
We do not know exactly what is causing this yet, but we now know what to look for and which queries to kill. We will need to do more analysis on the queries before we find a permanent solution.
Additional info on the Universal Catalog outage
Wednesday, June 17, 2015 - 10:34am
This morning at 2:53AM, the Universal Catalog database stopped responding. The new Oracle version has a parameter that limits the amount of data that can be written to Archive Logs to 10GB. We have increased that setting for all databases and will monitor it to see if it should be increased further. After making that change, the Universal Catalog started working again at 8:53AM this morning.
As we use the upgraded system over the next few weeks, we will most likely encounter more limits/thresholds in the database and operating system that need to be increased. Thanks for your patience and keep sending us information about any anomalies or outages you encounter.
Resolved: Oracle Issues Affecting I-Share 6/17/2015
Wednesday, June 17, 2015 - 9:10am
This issue has been resolved. If you start seeing problems with holdings displays in I-Share, please email email@example.com.
Oracle Issues Affecting I-Share 6/17/2015
Wednesday, June 17, 2015 - 8:38am
This morning we are seeing Oracle issues affecting the display of holdings in I-Share. Searching is not affected. We are actively investigating this issue.