System Status

Emergency Maintenance Starting

Because a new Linux glibc exploit called "GHOST" (CVE-2015-0235) could pose a serious threat to the security of our systems, Production CARLI servers and services will now start going offline to be patched and rebooted.

I will post another update when this work has been completed. If a service is offline or not working properly after emergency maintenance is completed, please contact support@carli.illinois.edu.

Brandon Gant
CARLI
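
For readers who want to check their own Linux hosts for GHOST, the following is a minimal sketch, not the procedure CARLI used. It assumes the upstream glibc range affected by CVE-2015-0235 (2.2 through 2.17, fixed in 2.18) and takes the first line of `ldd --version` output as input. The sample string in the usage note is illustrative only. Distributions often backport the fix without changing the upstream version number, so a "True" result only flags a host for a closer look at its package changelog.

```python
# Upstream glibc releases affected by GHOST (CVE-2015-0235): 2.2 through 2.17.
FIRST_VULNERABLE = (2, 2)
FIRST_FIXED = (2, 18)


def parse_glibc_version(ldd_first_line):
    """Return (major, minor) from the first line of `ldd --version` output.

    Assumes the version is the last whitespace-separated token on the line.
    """
    token = ldd_first_line.split()[-1]
    major, minor = token.split(".")[:2]
    return int(major), int(minor)


def upstream_range_affected(version):
    """True if (major, minor) falls in the upstream-affected range.

    This ignores distribution backports, so it can report False positives.
    """
    return FIRST_VULNERABLE <= version < FIRST_FIXED
```

For example, `upstream_range_affected(parse_glibc_version("ldd (Ubuntu EGLIBC 2.11.1-0ubuntu7.12) 2.11.1"))` flags the host for review, while a host reporting 2.19 is outside the upstream range.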

VuFind Performance this Morning (Monday, Jan 12)

At around 9:20AM this morning, the Production VuFind server (http://vufind.carli.illinois.edu) stopped responding. We adjusted the Apache 2.4 conf settings, and it was running smoothly again by 9:50AM.

Our apache2.conf file contains performance tuning parameters, and we discovered that the new Apache 2.4 ships with separate conf files that set those parameters back to their defaults. We still need to make a few more changes to clean up the Apache conf files, but I don't see any other issues from the operating system upgrade.
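
The kind of override described above can be spotted with a small script. This is a rough sketch under several assumptions: the conf files are supplied in the order Apache reads them, a later setting of the same directive replaces an earlier one, and section containers such as `<IfModule>` are ignored rather than evaluated. The file names, the `MaxRequestWorkers` directive, and the values in the usage note are hypothetical, not taken from the CARLI server.

```python
def directive_history(conf_files, directives):
    """Collect every place each directive is set.

    conf_files: list of (file_name, file_text) in the order Apache reads them.
    directives: directive names to track (Apache treats them case-insensitively).
    Returns {directive: [(file_name, value), ...]} in read order.
    """
    wanted = {name.lower(): name for name in directives}
    history = {}
    for file_name, text in conf_files:
        for line in text.splitlines():
            stripped = line.strip()
            # Skip blanks, comments, and section tags such as <IfModule ...>.
            if not stripped or stripped.startswith(("#", "<")):
                continue
            parts = stripped.split(None, 1)
            key = parts[0].lower()
            if key in wanted and len(parts) == 2:
                history.setdefault(wanted[key], []).append((file_name, parts[1]))
    return history


def overridden(history):
    """Return directives set more than once, mapped to the last (file, value)."""
    return {name: seen[-1] for name, seen in history.items() if len(seen) > 1}
```

With a hypothetical tuned value of `MaxRequestWorkers 400` in apache2.conf and `MaxRequestWorkers 150` inside mods-enabled/mpm_prefork.conf, `overridden(...)` reports the second file and the value `150` as the setting that wins.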

VuFind Downtime Sunday Morning Jan 11

This Sunday morning between 6AM and 10AM, we will upgrade the Production VuFind Apache/Linux server. This will cause an outage of at most 30 minutes while the operating system is being upgraded.

We are currently running Ubuntu Server 10.04 64-bit, and support for this release ends in April 2015. We are upgrading this server to 12.04 and then to 14.04, which is supported until April 2019. We will also take this opportunity to improve our HTTPS encryption settings in Apache by using the recommendations from https://cipherli.st and by verifying the new settings at https://ssllabs.com.

VuFind Local Catalog Outages

At 7:14AM this morning (Monday, December 8th), the SOLR search index for VuFind "local" catalogs failed (a separate SOLR index handles deduplicated "consortial" searches). The Java process controlling the index was pinned at 100% CPU, and no requests were being processed. At 7:34AM, the SOLR service was restarted, which restored service. At 11:22AM the service went offline again and was restarted.
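
A failure like this, where the process is still running but answers no requests, can be detected with a timed HTTP probe. The sketch below is one possible approach, not a description of CARLI's monitoring. It assumes the index exposes a ping URL that returns HTTP 200 when healthy. The URL in the usage note, the 5-second timeout, and the threshold of three consecutive failures are hypothetical choices.

```python
import urllib.request


def solr_is_responsive(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return True if `url` answers with HTTP 200 within `timeout` seconds.

    `opener` is injectable so the check can be exercised without a network.
    Timeouts, refused connections, and HTTP errors all count as unresponsive.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        return False


def should_restart(probe_results, threshold=3):
    """True once the most recent `threshold` probes have all failed.

    probe_results: list of booleans, oldest first.
    """
    recent = probe_results[-threshold:]
    return len(recent) == threshold and not any(recent)
```

A scheduler could call `solr_is_responsive("http://localhost:8080/solr/biblio/admin/ping")` once a minute, append the result to a list, and alert or restart when `should_restart` returns True. Requiring several consecutive failures avoids reacting to a single slow response.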

SFX Performance

Last month UIUC migrated to our SFX server, becoming our 53rd hosted customer. While focusing on the SFX server during this migration, we (the CARLI IT staff) wondered whether there were any changes we could make to improve overall performance. One of the tuning changes (virtual CPU allocation) was implemented this past Saturday with mixed results, but on Monday it became clear that SFX performance was getting much worse under load. We modified the settings again and rebooted the server during the lunch hour yesterday (Monday, Sept 8th).

Downtime Sunday, August 10th for Patching

On Sunday, August 10th from 6AM to 10AM, Production Voyager, VuFind, CONTENTdm, and SFX services will be taken offline to apply operating system patches. Downtime for each service should be less than an hour.

If you discover any issues with the services after 10AM, please contact support@carli.illinois.edu.

Brandon Gant
CARLI

Network Outage this Saturday, July 12th

This Saturday, July 12th between 4AM and Noon, UIUC campus networking staff (CITES) will replace the backbone routers connected to the Production CARLI Data Center.

For some period of time during this upgrade, all CARLI services could be unavailable.

Information about this network event, any status updates, and an announcement that work has been completed can be found on the following website:
http://status.cites.uiuc.edu/SystemStatus/jsp/view_events.jsp?eventId=427

Network Switch Failure

At approximately 6:19PM today (Friday, June 6th), one of the two redundant network switches in the Production Data Center went into an odd state. All lights were on, but no traffic was moving through the switch. CARLI servers are plugged into both switches for redundancy, but the failure still took down all networking. The failed switch was restarted and all services were back online at 6:54PM.

We will need to look closely at the log files to see if we can determine what caused the network switch error and why the redundancy did not protect us from this error.

Brandon Gant
CARLI

UIUC Database Outage April 15

At 10:51AM this morning, our Database Administrator changed the Oracle password for the University of Illinois at Urbana-Champaign (UIUC) database account to run some tests. He thought he was logged into the Oracle Test server, but he was actually logged into the Production Oracle server. This caused Voyager client errors, forced UIUC circulation clients into "offline circ" mode, displayed a "The catalog is not available" message in UIUC's WebVoyage instance, and blocked UIUC's VuFind access.

Oracle security changes caused Voyager and VuFind problems

At 9:30AM yesterday morning (Sunday, April 13th) we made a change to Production Oracle to enhance our database security. This change caused problems in VuFind, so it was backed out by 10AM Sunday morning. The change did not cause any issues in our Test server environment. We have identified what is different between Production and Test and are working to make sure they are identical for future testing.
