24 Feb 2016, 17:30: Emergency network restart

Due to a critical security vulnerability disclosed this evening, we have to apply an emergency update to our firewall software. This will cause brief interruptions to connectivity to the supercomputer.
20 Oct 2015, 15:00: Head node reboot

In order to apply an important system update, we need to restart the scheduler head node on Tandy. Running jobs will be paused, and users will briefly be unable to submit new jobs. We will resume all paused jobs immediately on completion of this reboot.
7 Apr 2015, 17:10: Available

We are in the process of completing the file system migration, working closely with our storage vendor.

Access to home directories is available again, and jobs accessing those directories are running properly. Some jobs that were running when the migration began may have exited and will need to be resubmitted. We are making additional nodes available to most users to help clear the backlog of jobs.

Although we do not believe that any data was lost, we need to run a file system check on home directories to confirm this. If any files appear to be missing or corrupted, please let us know and we will work to restore them.

25 Mar 2015, 12:45: Partially restored

We have set up your /scratch/<yourusername> directory as a temporary home directory. You can interact with the system as normal, EXCEPT that ANY ATTEMPTS TO ACCESS FILES IN /home/ or /panfshome/ will freeze until the migration is completed (and you'll probably need to restart your login session).

Once the issue is resolved, we will undo the workaround, and your logins will again automatically take you to /home/<yourusername>.

25 Mar 2015, 12:00: Recovering

The migration of home directories from the failed storage system to the backup is continuing, but slowly. Until it is finished, users will not be able to log in or access their home directories.

We are currently setting up a workaround to let users log in. Although home directories will remain unavailable for some time yet, users will be able to submit and monitor jobs.

24 Mar 2015, 18:05: Recovering

The migration of home directories from the failed storage system to the backup is underway but going slowly. We anticipate it will be at least overnight, perhaps longer, before it is completed and full access to Tandy is restored.

We have extended the run limit of currently running jobs to avoid their being killed due to execution delays caused by this issue. In the meantime, we are working with our storage vendor to find a way to accelerate this process or otherwise mitigate the problem. No user data has been lost.

24 Mar 2015, 11:30: Unavailable

The storage issue from Friday has occurred again. We believe there is a persistent problem with the component of our main storage array that handles users' home directories. As a precaution, we are migrating all services from that component to its backup.

We do not have a firm ETA for the completion of this operation, but it may take several hours. Until it is completed, home directories will not be available, and Tandy may be entirely inaccessible.

No user data has been lost, and in general file operations will simply hang until this transfer is completed, though it is possible that some currently running jobs will be interrupted and need to be resubmitted. 

Access to scratch directories by jobs continues uninterrupted, so jobs using exclusively scratch storage will continue to run normally.

20 Mar 2015, 7:45: Available

After some delay, the failed part of Tandy's main storage system switched over to its backup at 7:45 AM.

Access to Tandy is restored, although we have not yet identified the underlying problem.

No data was lost, and unless you receive an error e-mail from the scheduler, running jobs were not adversely affected.

20 Mar 2015, 6:53: Unavailable

We have been experiencing an issue with the main storage system on Tandy since 6:53 AM. We are working to isolate and resolve the issue as soon as possible. In the meantime, Tandy is unavailable.

24 Feb 2015, 17:34: Available

Connectivity to our upstream Internet service provider has been restored.

24 Feb 2015, 14:25: Unavailable

We have again lost connectivity to one of our upstream Internet service providers.

We have escalated the issue with their support team and are still waiting for more information from them. In the meantime, we are pursuing a workaround and will update you when we have more information.

24 Feb 2015, 10:30: Intermittent connectivity outage

TSC is still impacted by an ongoing partial outage of one of our upstream Internet service providers. Access to the documentation portal is unavailable, and connectivity to Tandy is likely to continue to be slow or intermittent. Software packages on Tandy requiring connections to external license servers may be temporarily unavailable.

23 Feb 2015, 19:24: Intermittent connectivity outage

We are currently experiencing an intermittent outage with our upstream Internet service provider. 

We have opened a case with them and hope the issue will be resolved promptly. In the meantime, connectivity to Tandy is likely to be slow or unavailable intermittently. Currently running or submitted jobs should not be impacted.

19 Feb 2015, 13:00: Login node reboot

In order to apply a critical software patch, we need to reboot the login node submit000 on Tandy. This will have no impact on running jobs, but it may interrupt any file transfers in progress to submit000, and users may briefly be unable to log in to it.