Human error? Try abject stupidity, Amtrak
Friday, Mar 1, 2019 - Posted by Rich Miller
* Tribune…
Human error caused the switching system issue that brought rail traffic to an almost standstill at Union Station Thursday, affecting tens of thousands of Amtrak and Metra passengers, Amtrak said Friday.
“The root cause was human error in the process of deploying a server upgrade in our technology facility that supports our dispatch control system at Chicago Union Station,” Amtrak said in a statement. “We failed to provide the service that Amtrak customers, Metra commuters and the general public expect of us. We own the system. We will fix this problem.”
Metra and Amtrak service returned to normal for Friday morning’s rush but it was an entirely different scene at Union Station on Thursday, as more than 60,000 Chicago-area commuters faced crowds and long delays.
* From Sen. Richard Durbin…
I talked to [Amtrak CEO and President Richard Anderson] this morning and in blunt terms, asked him what happened in Chicago yesterday? Why did thousands of commuters see their service interrupted? He was honest and direct, and admitted that Amtrak made a series of errors.
The most important error they made was to decide to do a server upgrade to their computers during peak hours of service. This should be done in the middle of the night when only a handful of trains are running. Along with that, a worker fell on a circuit board, which turned off the computers and led to the interruption of service that went on all day long.
Today, Amtrak issued a public apology to the people who were inconvenienced in Chicago yesterday. Mr. Anderson also told me there will be changes made when it comes to computer programming and upgrades in the future. But my job in Washington, along with Senator Tammy Duckworth, is to ensure that these federal agencies are held accountable. Amtrak failed yesterday, but I appreciate their honesty.
Emphasis added because… holy moly, how irresponsibly stupid can you get?
- Anon E Moose - Friday, Mar 1, 19 @ 12:06 pm:
Um, “a worker fell on a circuit board,” is something I have NEVER heard.
- Ron Burgundy - Friday, Mar 1, 19 @ 12:10 pm:
These rush hour mass debacles seem to happen at least ten times a year for one reason or another, and the frequency is increasing. Perhaps running station traffic out of Michigan remotely is a bad idea? Combine this with the wholly inadequate concourses at Union and one of these days someone is gonna get trampled. Is that what it takes to get some real solutions to the overcrowding and disrepair in and outside Union?
- wordslinger - Friday, Mar 1, 19 @ 12:12 pm:
–The most important error they made was to decide to do a server upgrade to their computers during peak hours of service.–
No one involved questioned the timing? That’s flabbergasting, and no way to run a railroad.
Perhaps those responsible in the decision-making loop need to be freed to pursue other opportunities.
- Regular democrat - Friday, Mar 1, 19 @ 12:12 pm:
I know u normally dont allow swearing on the blog but maybe today u can make an exception. Its mind numbing how dumb this was
- City Zen - Friday, Mar 1, 19 @ 12:22 pm:
==decide to do a server upgrade to their computers during peak hours of service==
Having to manage similar system upgrades from software to infrastructure, I wouldn’t be able to do a daytime upgrade of a production system no matter how hard I tried. It would never pass change management.
- wordslinger - Friday, Mar 1, 19 @ 12:22 pm:
–Combine this with the wholly inadequate concourses at Union and one of these days someone is gonna get trampled.–
That’s happened, more than once, that I’ve witnessed over the years during rush hours at Union.
Someone trips and falls and gets kicked around pretty good before the wildebeest herd can accommodate it.
The escalators at rush hours are particularly dangerous. Just too many people in a hurry all at once in very tight spaces. I go out of my way to enter and exit through the Great Hall. Room to maneuver.
- Blue Dog Dem - Friday, Mar 1, 19 @ 12:24 pm:
If we paid those servers the new minimum wage of $15/ hr now, instead of waiting til 2025, I bet this never would have happened.
- Ron Burgundy - Friday, Mar 1, 19 @ 12:29 pm:
— enter and exit through the Great Hall. Room to maneuver.—
Except when half of it is taken up by a squash court for no apparent reason. Also to continue my rant, communication both at the station as to what is going on and what people should do is still disjointed and insufficient. Communication to the public before they arrive is improving.
- Skeptic - Friday, Mar 1, 19 @ 12:32 pm:
“I wouldn’t be able to do a daytime upgrade of a production system no matter how hard I tried.” That would be the sane thing to do. Beancounters, however, when presented with a potential bill for overtime are frequently not sane.
- ike - Friday, Mar 1, 19 @ 12:32 pm:
I work by the Metra station by Western Avenue and lucked out that my afternoon train was only delayed 30 mins (though I had to board a completely filled train), but the signal/switch issues have been an ongoing problem that Metra never seems to try and fix.
- Skeptic - Friday, Mar 1, 19 @ 12:32 pm:
“a worker fell on a circuit board,” Is that what they’re calling it these days?
- PublicServant - Friday, Mar 1, 19 @ 12:33 pm:
Somebody ought to tell the server infrastructure group that they’re 24×7 when it comes to infrastructure upgrades. While general maintenance might lend itself to a regular 9-5 schedule, when large software migrations and/or server upgrades occur, Saturday and Sunday evenings are no-brainers for an entity that has such clearly defined heavy periods of use.
- anon - Friday, Mar 1, 19 @ 12:37 pm:
But did they try rebooting?
- Not It - Friday, Mar 1, 19 @ 12:38 pm:
Someone should be fired, but it is government, so they’ll probably be promoted up and out.
- Da Big Bad Wolf - Friday, Mar 1, 19 @ 12:38 pm:
“A worker fell on a circuit board”. Sometimes it’s a good idea to have a spare circuit board laying around.
- PublicServant - Friday, Mar 1, 19 @ 12:47 pm:
===Sometimes it’s a good idea to have a spare circuit board laying around.===
That’s why someone fell on the first one…just sayin.
- What's in a name? - Friday, Mar 1, 19 @ 12:53 pm:
Once again Wordslinger is my hero.
“Perhaps those responsible in the decision-making loop need to be freed to pursue other opportunities.”
Perfect.
- Stuff Happens - Friday, Mar 1, 19 @ 12:54 pm:
Something doesn’t add up here, and feels like it was easier to share this narrative than to explain what really happened.
Was there a zero-day exploit on the server? Was it being actively hacked and they had to address it immediately? I have a hard time believing that someone was so irresponsible as to pick that time without having a good reason. But if it’s a reason like that, maybe the story being shared is better…
- Perrid - Friday, Mar 1, 19 @ 1:02 pm:
Stuff Happens, I am a firm believer in Hanlon’s Razor: “Never attribute to malice that which is adequately explained by stupidity.”
The odds that they would lie about this to hide a security breach are much, much smaller than a bunch of people making stupid decisions.
- Tommydanger - Friday, Mar 1, 19 @ 1:03 pm:
Little did we know that Homer Simpson is now working at Amtrak.
- Anonymous - Friday, Mar 1, 19 @ 1:12 pm:
Give them time, they will show you.
- Lester Holt’s Mustache - Friday, Mar 1, 19 @ 1:12 pm:
==Um, “a worker fell on a circuit board,” is something I have NEVER heard.==
I’m going to assume this is just Durbin misspeaking, and what he was actually told was “control” board. If someone fell on a circuit board, that means Metra has people working around equipment with the outside panels removed.
- Huh? - Friday, Mar 1, 19 @ 1:12 pm:
So the three stooges were in charge of the computer upgrade? Or was it Laurel and Hardy?
- FormerParatrooper - Friday, Mar 1, 19 @ 1:13 pm:
1:12 was me
- Independent - Friday, Mar 1, 19 @ 1:14 pm:
While falling on the circuit board did the worker take out a vacuum tube as well?
- GA Watcher - Friday, Mar 1, 19 @ 1:17 pm:
“Communication to the public before they arrive is improving.”
The Transit Tracker on the Ventra App was worthless yesterday. It reported that all BNSF trains were running “As Scheduled” even as the media was reporting BNSF was implementing load and go.
- Anon - Friday, Mar 1, 19 @ 1:20 pm:
Survey says: Entity was trying to avoid paying overtime or shift differential so they did it during the middle of a week day
- Terry Salad - Friday, Mar 1, 19 @ 1:31 pm:
Did they try turning it off and on again?
- Sonny - Friday, Mar 1, 19 @ 1:57 pm:
What’s wild is they just did a free pass weekend as some kind of please forgive us campaign.
- ChrisB - Friday, Mar 1, 19 @ 1:59 pm:
So fun story: I was on the first train affected by the shutdown, scheduled to be in the station at 8:35. We sat outside of Union, just south of the post office for over an hour. We were already late/slow because BNSF switches went down earlier. Finally got off the train at 10:10.
What kills me about this explanation is that Amtrak didn’t even have switchmen on site to fix the problem. The conductor was trying to keep us updated, but there was literally nothing to say. No updates from the radios, no news from Union. He was audibly frustrated and couldn’t see any workers to manually flip the switches for a good hour. That tells me that there was absolutely no redundancy or failsafes in place for this server upgrade.
I didn’t even try to go home on the BNSF. The excuses are unacceptable.
This is all going to get really fun when the current contract is up on April 30th, and Metra and Amtrak have to negotiate a new one. Metra is paying relative peanuts ($244Mil total since 1984), and word is that Amtrak wants more. Meanwhile the concourses don’t have ceiling tiles, and the concrete above the tracks is literally falling on people’s heads, but hey, Ye Olde Fashioned Barbershop is top notch. Keep fiddling, Amtrak/Metra.
- Sir Reel - Friday, Mar 1, 19 @ 2:03 pm:
The second explanation was, my dog ate the server upgrade.
- OneMan - Friday, Mar 1, 19 @ 2:15 pm:
“holy moly, how irresponsibly stupid can you get”
As a long time BNSF commuter, I have every confidence that someone in this situation is going to get stupider as time progresses.
It would be interesting to see exactly what sort of “server upgrade” they were doing. Was this an OS upgrade, an upgrade to a software system or a hardware upgrade? All of those have different levels of risk and different risks involved.
The ‘circuit board’ getting fallen upon adds a whole different set of risk questions. Did someone fall into a server (it can sort of happen)? Was the board normally covered? Was the board normally exposed without a cover? What was its height above floor level? Did Amtrak have an additional board handy?
But the Amtrak part of this, while comical and annoying is a relatively small piece of the risk.
The risk in this whole thing is when we have the station full and the vestibules full and something happens. Let’s use the least ‘sinister’ option: someone has a can of mace accidentally discharge in an enclosed area with hundreds of people right next to each other jostling for position. There are a half dozen other things that have happened at Union Station that could cause a crowd like that to panic without it being a malicious act. A fist fight, ceiling tile coming down (that risk is mitigated by the station not having had ceiling tiles in most spaces in the drop ceiling for at least a year), pipe burst, platform ceiling collapse, etc. Don’t even think what a malicious actor could do in that situation.
You are going to have hundreds of people trying to get out a small space very quickly, that isn’t going to end well. Tbh that is why I avoid Union Station in these situations, because short of a ceiling collapse on the 2-4 platform I think that is the most likely way my commute might kill me someday, in a stampede of panicked people out of Union Station.
But why do people congregate in Union Station when the trains are having issues?
Because Metra does a terrible, and I mean terrible, job of communicating that there is a problem, how long they think the problem is going to last and what the cause of the issue is. I think at some level Governor Rod has more credibility than Metra.
Since Metra has 0 credibility and offers these sort of wishy washy alerts, people ignore them and head to the station.
I am a commuter who gets alerts from Metra and I found out the issues were going to impact the afternoon rush via tweets from Chicago media outlets. No alert email from them, no tweet, those came later.
With information I can make my decision: wait it out (go to a bar), take the UP-W line and plan how to get from Geneva to my station, or Uber to my station.
Fundamentally Metra and Amtrak need to make the ‘we are not letting people into the station at all’ call earlier and with more feeling. Their crowd control plans (relatively new), are a joke.
I have said this before, but Metra as an organization views itself as a victim, a victim of low funding, a victim of Amtrak and victim of the rail carriers (they don’t seem to hold the BNSF accountable for anything) and with that attitude they are destined to fail.
- Ron Burgundy - Friday, Mar 1, 19 @ 2:59 pm:
One thing they could do pretty quickly is to put up some electronic signage at the entrances to the north and south concourses for how they want traffic to go. For instance Southwest and Heritage Corridor here, BNSF that way, etc. When BNSF only has an issue, people on those other lines can’t get through the mob of people standing around. Ideally what they need is one method of internal communication. Announcements plus people with bullhorns who sometimes say two conflicting things is not working.
- OneMan - Friday, Mar 1, 19 @ 3:11 pm:
Ron
True that. The amount of times during an issue where Metra and the BNSF contradict each other is disappointingly high
- a drop in - Friday, Mar 1, 19 @ 3:18 pm:
Someone had to call in for a fix, “this is the AMTRAK Technical Support. All lines are busy helping other people. Please stay on the line…..”
- Keyrock - Friday, Mar 1, 19 @ 4:12 pm:
“A worker fell on a circuit board”.
They might as well have blamed the Pepsi Syndrome.