Cisco OpenStack Private Cloud delivers a true public cloud experience in a private cloud environment: on your premises, behind your firewall. We handle the engineering, setup, and ongoing operations, so you can focus on building and running your applications. It's a core component of Cisco's growing cloud portfolio.
Today we meet with one of the real people behind the scenes who make this solution possible: John Schofield.
John, what is it that you do for Cisco as part of the Cisco OpenStack Private Cloud team?
My thing has always been Linux system administration. Mostly I've been working on proactive hardware monitoring. Lately I've been doing more OpenStack work around automating live migrations. And of course helping our customers with the issues they run into with OpenStack.
Proactive hardware monitoring? Can you explain? What is it and why does it matter in a cloud environment?
Every five minutes, we are asking our customers' hardware "How ya doing?" Which really means, "How are your fans? How's your hard drive? How are your RAID controllers?" And most of the time the response is "My fans are fine. My hard drive is humming along nicely. And my RAID controllers are working perfectly."
We call it "proactive" because we're reaching out and checking for problems on a regular basis rather than waiting for them to happen and then dealing with the fallout. For example, in a simple redundant RAID setup, you have five drives, and if one of them fails, you don't lose data. You're fine, as long as you replace that first drive before a second drive fails. That's the kind of thing proactive hardware monitoring would catch.
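The five-drive example can be sketched in a few lines of Python. This is only an illustration of the reasoning, not the team's actual check: the function name, the "ok"/"failed" state strings, and the assumption that the array tolerates exactly one failed drive are all hypothetical.

```python
from enum import Enum


class Status(Enum):
    OK = 0
    WARNING = 1
    CRITICAL = 2


def raid_status(drive_states, tolerated_failures=1):
    """Classify an array from per-drive states ("ok" or anything else).

    tolerated_failures is an assumption: how many drives the array can
    lose without losing data (1 in the interview's five-drive example).
    """
    failed = sum(1 for state in drive_states if state != "ok")
    if failed == 0:
        return Status.OK
    if failed <= tolerated_failures:
        # Data is still intact, but redundancy is gone: replace the drive now.
        return Status.WARNING
    # More failures than the array can absorb: data loss is likely.
    return Status.CRITICAL
```

The point of the WARNING state is the one John makes: a single failed drive is harmless only if somebody hears about it before the next one fails.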
When we first rolled this out to clients, we found a whole lot of failures they were unaware of. Some were things that didn't cause data loss, but instead caused poor performance. Like the battery being dead on your RAID controller. A RAID controller has a cache. So your program thinks it's writing to the hard drive, but what's really happening is the RAID card has accepted the data and stored it in its cache and then it's going to write it to the hard drive later. Accepting it to the cache is much faster than writing it straight to the hard drive. And it will do that as long as it has a battery protecting what's in the cache. If the battery is dead, it just turns off the cache and everything is a little bit slower.
Talk to me about the software that makes this possible.
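The battery behavior described here is usually called write-back caching (acknowledge the write from cache) falling back to write-through (wait for the disk). A minimal sketch of how a check might flag it, assuming a boolean battery reading and a hypothetical 0 = OK / 1 = warning result code:

```python
def effective_cache_mode(battery_healthy):
    """Return the write mode a controller would typically use.

    With a healthy battery the cache can safely hold unwritten data
    (write-back); without one, writes go straight to disk (write-through).
    """
    return "write-back" if battery_healthy else "write-through"


def check_raid_cache(battery_healthy):
    """Return (code, message); the codes 0 and 1 are an assumed convention."""
    mode = effective_cache_mode(battery_healthy)
    if mode == "write-back":
        return 0, "OK: write cache enabled (write-back)"
    # No data is at risk, but every write now waits for the disk.
    return 1, "WARNING: cache battery unhealthy, controller in write-through"
```

Nothing is broken in the second case, which is why this kind of failure tends to go unnoticed until something asks about it.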
The basic software is provided by the server vendor. They will generally provide a tool that lets me type a command to see, for instance, the fans on one server. But that doesn't do me any good. We need to automate the process of asking how the fans are doing on all the boxes, and integrate it into our monitoring solution, so that if a fan goes out it will send an alert to let us know, and we can coordinate a fix with the customer.
And that's what you do?
It's one part of what I do. I wrote the code that allows these vendor-supplied tools to interact with the monitoring system in a way that's more useful to us. First I figured out what questions we needed to be asking the hardware, and then I wrote the code that automated the process of asking the questions and tying the answers into our monitoring infrastructure. Essentially what I did was aggregate the data, and put the data in a place that made it useful to our team. It's critical to ensuring maximum uptime for our cloud solution. Because if the hardware goes down, even if it's not our hardware, it impacts cloud performance and affects customer perception of how well the solution works.
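As a rough sketch of that glue code: take the text a vendor tool prints, parse it, and reduce it to a status code a monitoring system can alert on. The interview names neither the vendor tools nor the monitoring system, so everything specific here is an assumption: the "Fan1: 4200 RPM OK" line format is invented, and the 0/2/3 codes follow the common OK/critical/unknown plugin convention.

```python
import re

# Hypothetical vendor output format, one fan per line: "Fan1: 4200 RPM OK"
LINE_RE = re.compile(r"^(?P<name>\S+):\s+(?P<rpm>\d+)\s+RPM\s+(?P<state>\w+)$")


def parse_fan_report(text):
    """Turn the tool's text output into (name, rpm, state) tuples."""
    fans = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            fans.append((match["name"], int(match["rpm"]), match["state"]))
    return fans


def fan_check(text):
    """Return (code, message): 0 = OK, 2 = critical, 3 = unknown."""
    fans = parse_fan_report(text)
    if not fans:
        # Unparseable output is reported rather than silently treated as healthy.
        return 3, "UNKNOWN: no fan data parsed"
    bad = [name for name, rpm, state in fans if state != "OK" or rpm == 0]
    if bad:
        return 2, "CRITICAL: fan problem on " + ", ".join(bad)
    return 0, "OK: %d fans healthy" % len(fans)
```

In practice the text would come from running the vendor's command on each server on a schedule (every five minutes, per the interview), and the returned code is what triggers the alert.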
Why does a Linux guy need to be doing this?
Because all of the servers I'm referring to are Linux boxes.
Aha. How did you get into Linux?
Well, I started off managing computers for the printing and publishing industry. I did everything. Back end, front end, kept everything running. I've said that the theme throughout my career is that I'm the plumber. You don't see what I do, but I keep everything flowing. Then I took a job with a startup that made touchscreen computers to provide brain fitness software for seniors and sold them to senior living providers. So I ended up being the sole administrator for 850 of these Linux boxes that I had designed all of this backend stuff for. And that is where I got a lot of Linux chops, because the admin ratio was literally 1:850. It meant I had to create a lot of automation and do a lot of automated troubleshooting.
How do you feel about working on the Cisco OpenStack Private Cloud team?
I love it. The people are brilliant. They're nice. They're passionate. There are no jerks. You know, one of the things that I found at other companies where I've worked is that the people were also brilliant, but there was less of a team atmosphere and no training. You were expected to figure everything out for yourself. But here, shortly after I started, they sat down with me and explained their plan. Laid out the big picture, described the building blocks. I loved that. It seems like something every company would do, but in my experience that hasn't been true. Here at Cisco, on this team, they gave me the foundation I needed to succeed.
I also like the collaborative aspect of this group. Everyone pitches in to solve problems. I was doing something last night that totally should have worked, but didn't, and one of my colleagues came over to help me right away. When he and I still couldn't figure out why it didn't work, we called another colleague over, and together the three of us figured it out. We're all passionate about technology, and we don't just look for answers because we have to. We look for them because it's what we love to do.
John has his own blog, so next time you're in the mood to read about things like Sublime Text Multi-Line Edits or Working Around VirtualBox Bug 12879, check out schof.org.