VuFind Local Catalog Outages
Monday, December 8, 2014 - 12:29pm
At 7:14AM this morning (Monday, December 8th), the SOLR search index for VuFind "local" catalogs failed (a separate SOLR index handles deduplicated "consortial" searches). The Java process running the index was pinned at 100% CPU and was not processing any requests. The SOLR service was restarted at 7:34AM, restoring service. At 11:22AM the service went offline again and was restarted.
The logs contain Java "OutOfMemoryError" messages. We have increased the Java memory allocation for SOLR from 14GB to 20GB of RAM and will continue to monitor this service.
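For anyone watching their own SOLR logs for the same symptom, here is a minimal sketch that tallies "OutOfMemoryError" messages by their detail text (for example "Java heap space" or "GC overhead limit exceeded"). It assumes the log is available as plain text lines; the sample lines are made up for illustration and are not taken from our logs.

```python
import re
from collections import Counter
from typing import Iterable

# Matches "java.lang.OutOfMemoryError" with an optional ": <detail>" suffix.
OOM_PATTERN = re.compile(r"java\.lang\.OutOfMemoryError(?::\s*(?P<kind>.*))?")


def count_oom_errors(lines: Iterable[str]) -> Counter:
    """Count OutOfMemoryError messages, grouped by their detail text."""
    counts: Counter = Counter()
    for line in lines:
        match = OOM_PATTERN.search(line)
        if match:
            # Messages without a detail string are grouped under "(no detail)".
            kind = (match.group("kind") or "").strip() or "(no detail)"
            counts[kind] += 1
    return counts


# Hypothetical log lines, for illustration only.
sample = [
    "2014-12-08 07:10:02 SEVERE: java.lang.OutOfMemoryError: Java heap space",
    "2014-12-08 07:10:05 INFO: request handled",
    'Exception in thread "qtp-42" java.lang.OutOfMemoryError: GC overhead limit exceeded',
    "2014-12-08 11:20:00 SEVERE: java.lang.OutOfMemoryError: Java heap space",
]
print(count_oom_errors(sample))
```

A steady stream of "Java heap space" entries points toward the memory allocation being too small, which is the kind of evidence behind raising it.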
Tuesday, September 9, 2014 - 9:40am
Last month UIUC migrated to our SFX server, becoming our 53rd hosted customer. While focusing on the SFX server during this migration, we (the CARLI IT staff) looked for changes that could improve overall performance. One of the tuning changes (virtual CPU allocation) was implemented this past Saturday with mixed results, but on Monday it became clear that SFX performance was getting much worse under load. We modified the settings again and rebooted the server during the lunch hour yesterday (Monday, Sept 8th). The new setting has been stable so far and appears to have improved performance. We will monitor the system and back out the changes if we reach the same "tipping point" we reached Monday morning.
I apologize for the SFX service disruption yesterday. As we continue to look for new ways to make our services perform better, we will do our best to avoid service outages.
Downtime Sunday, August 10th for Patching
Friday, August 1, 2014 - 4:48pm
On Sunday, August 10th, from 6AM to 10AM, Production Voyager, VuFind, CONTENTdm, and SFX services will be taken offline to apply operating system patches. Each service should be down for less than an hour.
If you discover any issues with the services after 10AM, please contact firstname.lastname@example.org.
Network Outage this Saturday, July 12th
Monday, July 7, 2014 - 3:51pm
This Saturday, July 12th, between 4AM and noon, UIUC campus networking staff (CITES) will replace the backbone routers connected to the Production CARLI Data Center.
At times during this upgrade, all CARLI services may be unavailable.
Information about this network event, any status updates, and an announcement that work has been completed can be found on the following website:
After the upgrade, please contact email@example.com if you notice that any CARLI services are not working properly. We will be checking systems and restarting any that were impacted by the network outage. If there are any ongoing issues after the work is completed, they will be posted to http://www.carli.illinois.edu/system-status.
Network Switch Failure
Friday, June 6, 2014 - 9:04pm
At approximately 6:19PM today (Friday, June 6th), one of the two redundant network switches in the Production Data Center went into an odd state: all of its lights were on, but no traffic was moving through it. CARLI servers are plugged into both switches for redundancy, but the failure still took down all networking. The failed switch was restarted, and all services were back online at 6:54PM.
We will review the log files closely to try to determine what caused the switch error and why the redundancy did not protect us from it.
UIUC Database Outage April 15
Tuesday, April 15, 2014 - 4:28pm
At 10:51AM this morning, our Database Administrator changed the Oracle password for the University of Illinois at Urbana-Champaign (UIUC) database account to run some tests. He thought he was logged into the Oracle Test server, but he was actually logged into the Production Oracle server. This caused Voyager client errors, forced UIUC circulation clients into "offline circ" mode, displayed a "The catalog is not available" message in UIUC's WebVoyage instance, and blocked UIUC's VuFind access. The problem was corrected, and UIUC Voyager services were back online at 11:12AM and VuFind at 11:29AM.
I apologize for this outage. At our next IT staff meeting we will discuss ways to prevent this type of error from happening again.
Oracle security changes caused Voyager and VuFind problems
Monday, April 14, 2014 - 4:08pm
At 9:30AM yesterday morning (Sunday, April 13th), we made a change to Production Oracle to enhance our database security. This change caused problems in VuFind, so it was backed out by 10AM Sunday morning. The same change had not caused any issues in our Test server environment. We have identified the differences between Production and Test and are working to make the two environments identical for future testing.
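Finding where two environments have drifted apart usually comes down to comparing their settings side by side. The sketch below assumes the settings from each server have already been exported into Python dictionaries; the setting names in the example are hypothetical and are not our actual Oracle parameters.

```python
from typing import Any, Dict, Tuple


def diff_settings(production: Dict[str, Any], test: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return {setting: (production_value, test_value)} for every setting that differs.

    A setting missing from one environment is reported as None on that side.
    """
    all_keys = production.keys() | test.keys()
    return {
        key: (production.get(key), test.get(key))
        for key in sorted(all_keys)  # sorted so the report order is stable
        if production.get(key) != test.get(key)
    }


# Hypothetical settings, for illustration only.
prod = {"audit_enabled": True, "max_sessions": 500, "legacy_grants": True}
test = {"audit_enabled": True, "max_sessions": 200}
print(diff_settings(prod, test))
```

An empty result means the two environments match on every exported setting, which is the condition we want before trusting a Test run to predict Production.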
The work on Sunday also introduced a permissions conflict on some database tables. The effect was that some of our weekend batch jobs did not run properly and will need to be submitted again. Some libraries were also not able to save changes to records or received errors in their Voyager clients. We identified this problem and corrected it at 10AM this morning (Monday, April 14th).
Although we believe we have identified the issues surrounding this security change, we will wait until Spring semester classes have ended before applying it to Production Oracle again.
No Heartbleed on CARLI Servers
Friday, April 11, 2014 - 4:31pm
The Heartbleed bug in OpenSSL has been all over the news this week; it is serious enough that it even has its own website (www.heartbleed.com). We scanned our systems and did not find the vulnerability on any of them, so there is no need to change passwords on CARLI systems at this time.
We are always looking for ways to improve the performance and security of our services. For example, we were already planning to make some changes to our web servers this summer to improve the strength of our SSL connections (newer versions, better ciphers, Perfect Forward Secrecy). If you have suggestions for improving our services, please contact us at firstname.lastname@example.org.
UIUC DNS Outage Today (April 4th)
Friday, April 4, 2014 - 6:03pm
The UIUC Domain Name Service (DNS) went offline at approximately 3:06PM today, and campus networking staff report that it was brought back online at 3:42PM. The DNS service translates human-friendly names (e.g., voyager.carli.illinois.edu) into computer-friendly addresses (e.g., 188.8.131.52).
Without this service, some CARLI servers cannot look up the addresses they need to communicate with other CARLI servers. The campus Voice-over-IP phone service was also affected by this outage, which resulted in busy signals when calling our office.
The UIUC campus maintains three DNS servers for redundancy: two located in Urbana and one in Chicago. A bad configuration was replicated to all three servers simultaneously, causing all of them to go offline. The DNS manager has provided campus IT staff with a list of the issues that occurred today and the changes they will make to avoid these issues in the future. If needed, CARLI IT staff also have the option to set up our own caching DNS servers to use alongside the UIUC DNS servers.
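To make the name-to-address translation concrete, here is a minimal sketch using Python's standard `socket.getaddrinfo`, which asks the operating system's configured resolver (on our servers, that means the campus DNS). The function name is our own, and the sketch looks up IPv4 addresses only.

```python
import socket
from typing import List


def resolve_ipv4(hostname: str) -> List[str]:
    """Return the sorted, de-duplicated IPv4 addresses the system resolver reports for hostname.

    Raises socket.gaierror (a subclass of OSError) if the lookup fails.
    """
    results = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    # Each result is (family, type, proto, canonname, sockaddr);
    # for IPv4, sockaddr is an (address, port) tuple.
    return sorted({sockaddr[0] for *_, sockaddr in results})
```

Calling `resolve_ipv4("voyager.carli.illinois.edu")` normally returns the server's address. During an outage like today's, that same call fails or times out, and any program that connects to other servers by name fails with it.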
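A caching DNS server remembers recent answers, so repeated lookups do not always have to reach the upstream servers. The sketch below shows the basic idea in a few lines. It is a hypothetical illustration, not the configuration of any real DNS server software: it keeps each answer for a fixed TTL and, if the upstream lookup fails, falls back to the last known answer. That fallback is similar in spirit to the "serve stale" option some caching DNS servers offer.

```python
import time
from typing import Callable, Dict, List, Tuple


class CachingResolver:
    """Hypothetical TTL cache placed in front of an upstream resolver function."""

    def __init__(self, upstream: Callable[[str], List[str]], ttl: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self._upstream = upstream  # e.g. a function that queries the campus DNS
        self._ttl = ttl            # seconds an answer is considered fresh
        self._clock = clock        # injectable so the behavior can be tested
        self._cache: Dict[str, Tuple[float, List[str]]] = {}  # name -> (expires_at, addresses)

    def resolve(self, name: str) -> List[str]:
        now = self._clock()
        cached = self._cache.get(name)
        if cached is not None and now < cached[0]:
            return cached[1]  # fresh cache hit: upstream is not contacted
        try:
            addresses = self._upstream(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            if cached is not None:
                return cached[1]  # upstream is down: serve the expired answer
            raise  # nothing cached for this name, so the failure surfaces
        self._cache[name] = (now + self._ttl, addresses)
        return addresses
```

The stale fallback is the part that would matter during an outage like today's: names looked up recently would keep working, while names that had never been looked up would still fail.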
RESOLVED: ILDS label problems when using Internet Explorer
Thursday, February 13, 2014 - 8:41am
The issue with Internet Explorer has been resolved. Users should now be able to create labels on the ILDS website when using the Internet Explorer browser.
Description of the problem that has been resolved: Users had a problem creating ILDS labels when using Internet Explorer. If you experience problems with the ILDS website, please try using Firefox or another browser to create labels. CARLI staff are investigating the problem.