Rob's not talking about real babies, of course. He's talking about OpenStack Core, and how shiny new projects that need nurturing and care (like babies do) are not good candidates for it, no matter how beautiful and chock-full of potential they may be to their doting supporters in the community. Core is a place for mature, stable projects that people can depend on.
And that's just a tiny little preview of the spot-on insights Rob shared during our interview with him on OpenStack Podcast #14. He also spoke about:
To see who we're interviewing next, or to sign up for the OpenStack Podcast, check out the show schedule! Interested in participating? Tweet us at @nextcast and @nikiacosta.
For a full transcript of the interview, click read more below.
Niki Acosta: All right, and we are live. Good morning, viewing audience, or good afternoon, depending on where you may be. I am Niki Acosta from Cisco and I am filling in today for Jeff Dickey of Redapt. We are here with a very awesome guest today, we are super excited to have Rob. Rob, introduce yourself.
Rob Hirschfeld: My name is Rob Hirschfeld. I am an OpenStack board member, and I'm also the founder of the OpenCrowbar Project, which is a physical infrastructure provisioning project. I recently stepped out of Dell, where I was leading the OpenStack and Hadoop technical projects, to pursue OpenCrowbar full time. We've built this company called RackN, where we're working on building that up as a supported platform and really being able to help customers leverage the potential we've opened up with Crowbar.
Niki Acosta: Wow. Rob, I've known you for quite some time. Talk about somebody who's been involved in OpenStack from the very beginning and I definitely want to talk to you about that. Before we do that let's go back a little bit further. Obviously you and I are in the same city-ish. Talk to us about your ... How you got into tech.
Rob Hirschfeld: Sure. Let me start with where I got into Cloud. I got into Cloud, believe it or not, in New Orleans in '99 with my partner in crime Dave McCrory, who is pretty well known in certain circles for his data gravity work. He and I basically started what was at the time called an application service provider back in '99. We would now call it an IaaS or a Cloud. I've been playing at this game for quite a while, trying to make it work, trying to figure out what the right way is to make infrastructure more available.
Niki Acosta: Where are your roots? Are your roots in systems administration? Are they in devops? What do you classify as your roots?
Rob Hirschfeld: I'm an engineer by training, industrial engineering, which might sound very strange: "How does an industrial engineer get into Cloud?" What I found over time is that Cloud is about process, it's about automation, and recently I've been really involved in lean, agile, that type of process. That type of process translates into what we do in Cloud; it's actually a natural evolution. It's the same type of revolution we saw in industry in the '90s when we moved into just-in-time manufacturing.
Cloud has been creating the same overall change in the industry as just-in-time and lean deployment have. In operations we've really changed the way we operate IT. With that frame on, I usually consider myself a developer. I've been doing ops since '99 because I found it was so hard to be a developer and actually deploy things. I've been trying to make that part of my job easier ever since.
If you look at my career it's always been about how do we take great ideas people can write as code and then make them usable for people at scale. Sometimes that scale has two users at a time, sometimes that can be hundreds or thousands. It's a significant gap. It's much easier to write code than deploy code unfortunately.
Niki Acosta: How was the Crowbar project... I guess you... It might be great to sum up for people what it initially was. But how did that get started, and fast forward now to where you are today with it?
Rob Hirschfeld: Crowbar started frankly because my team was up against the ropes at Dell. Not through anything we expected; we had a new class of servers from Dell that were cloud-enabled. This was pre-OpenStack, so we were dealing with the Azures and the Eucalyptuses and the early Hadoops, some of the early big data plays. We had the software, we had the hardware, but we would go to customers with it and it would fall apart. It was just really, really hard.
What we discovered was... This is the same time devops was getting defined as a word. What we found was this huge gap between the software and the hardware, we had to fill it with some automation, otherwise we would be doing one-off deployments every single time. Crowbar came out of those battle scars of: We can't walk into a data center and hope to get the install right in a reasonable amount of time. We had to automate our processes and do it in a repeatable way.
So we could duplicate it in our lab, we could duplicate it in the field, and then we could go back to that site six months later and have them still be successful. Crowbar was really about taking those best practices and automating them; that's really how we see Crowbar. When we did it for OpenStack, people ... We described it as an OpenStack installer. That's really sort of what we needed to describe it as. But when you look at what Crowbar is really doing, it's really an orchestration system that sequences physical operations as part of provisioning.
So it's really a system tool. I wouldn't call it system management, because that has its own meaning, but it really is responsible for coordinating all the operational activities needed to start doing scale deployment.
Niki Acosta: That is used for... Namely for installation and then what, for scaling as well?
Rob Hirschfeld: This is one of the things, you asked me about a little bit of history. We started Crowbar back in... The 2011 time frame is when we open sourced it and we had our first OpenStack install. Crowbar was actually the first OpenStack installer, back in the Cactus days. I've been doing OpenStack and OpenStack installs for a long time. We then took the feedback and literally re-architected Crowbar into OpenCrowbar.
Niki Acosta: Oh, oh, we lost you?
Rob Hirschfeld: You lost my audio?
Niki Acosta: We're losing a little bit. There we go. We're back.
Rob Hirschfeld: Okay. We literally re-architected Crowbar into Crowbar 2, and in that process we defined something called ready state, and it's been a really important thing for us. We found that in order to do OpenStack deployments... There are a lot of ways to deploy OpenStack: Chef, Puppet, Ansible, Salt, and there's a whole bunch of installers out in the market. Packstack is a good example of one that we've been playing with a little bit.
What we found was that all of the installers, for them to work, had to get to a point where there was a good ready state. The networking was set up, all of the infrastructure was passed out, keys were set up, all of that work. One of the things I know we want to talk about is how do you make OpenStack successful, what are my favorite parts of OpenStack?
One of the things I really feel like is that for the community to come together around these installers we have to have a baseline so that somebody can say, "I did it this way," and you can get to the same place and break in the same place when they're doing installs. That's a lot of what we want OpenCrowbar to be about. It's about getting you to a baseline, a repeatable baseline and then you build on top of that. Then you can build all sorts of different ways on top of that.
We're very flexible on that, and what's fun is that you don't have to just build OpenStack on top of it, because of this ready state boundary. You could build Cloud Foundry on top, or Ceph, or Hadoop, or Mesos, or Kubernetes. Our goal is to take all of that pain from the physical infrastructure and be vendor-agnostic, so you can swap out different types of hardware but your scripts above that would still run. Sort of a long answer, but it's a big vision. It's really an important thing to help make operations more consistent for a community like OpenStack.
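A rough sketch of the ready-state layering Rob describes, in Python: everything below the boundary is hardware-specific, and any workload is installed on top of the same repeatable baseline. The function, step, and workload names here are illustrative only, not actual OpenCrowbar APIs.

    # Illustrative only: the ready-state layering idea, not OpenCrowbar's real API.

    def reach_ready_state(node: str) -> str:
        """Hardware-specific work below the ready-state boundary."""
        for step in ("configure BIOS/RAID", "set up networking",
                     "install base OS", "distribute keys"):
            print(f"{node}: {step}")
        return node  # the node is now at a repeatable, vendor-neutral baseline

    def deploy_workload(nodes: list, workload: str) -> None:
        """Hardware-agnostic work above the boundary; the same scripts run site to site."""
        ready = [reach_ready_state(n) for n in nodes]
        print(f"installing {workload} on {len(ready)} ready nodes")

    # The same baseline can carry OpenStack, Cloud Foundry, Ceph, Hadoop, Mesos, or Kubernetes.
    deploy_workload(["node-1", "node-2"], "OpenStack")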
Niki Acosta: Being open I'm imagining that there would be a mechanism by which different types of hardware would be compatible with OpenCrowbar. Is that right?
Rob Hirschfeld: Exactly. Crowbar was designed to have basically an abstraction layer for hardware, so we can deal with the fact that you have Dell gear, HP, Cisco. It really doesn't matter from that perspective, because what we do is break out the control actions into very small pieces. Then each piece can be run separately. If it's a Dell... Actually, Dell has two types of gear that we've dealt with.
With one type you have to make all your changes and corrections internal to the system, what we call side-band or in-band changes. Then with some types of gear you only... You control those through their BMC networks, their out-of-band control planes. That's perfectly typical. We find the amount of gear, the variety of gear, very high and we just have to deal with it. We have to deal with it inside of Dell, we have to deal with it in every data center.
We can't... Practically, maybe if you wanted to buy from only one vendor and only one model you could have a completely homogeneous system. But you get to a thousand nodes and, even from one vendor, there's a fair bit of variety.
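A minimal sketch of the kind of hardware abstraction Rob is describing, with in-band and out-of-band (BMC) control paths behind one interface. This is illustrative Python, not OpenCrowbar's actual driver model; the class and node names are made up.

    # Illustrative only: one abstraction over in-band and out-of-band hardware control.
    from abc import ABC, abstractmethod

    class HardwareDriver(ABC):
        @abstractmethod
        def power_cycle(self, node: str) -> None: ...

    class InBandDriver(HardwareDriver):
        """Changes made from inside the running system (side-band / in-band)."""
        def power_cycle(self, node: str) -> None:
            print(f"{node}: rebooting via the host OS")

    class OutOfBandDriver(HardwareDriver):
        """Changes made over the BMC network (the out-of-band control plane)."""
        def power_cycle(self, node: str) -> None:
            print(f"{node}: power cycling via the BMC")

    def provision(node: str, driver: HardwareDriver) -> None:
        # The orchestration above doesn't care which vendor or control path is underneath.
        driver.power_cycle(node)

    provision("vendor-a-node-01", OutOfBandDriver())
    provision("vendor-b-node-02", InBandDriver())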
Niki Acosta: Now, speaking of vendors, you obviously spent a long time at Dell, and there's always been this notion, at least in how Rackspace founded OpenStack, that the hardware really shouldn't matter. You should be able to put in whatever kind of hardware you want, or make your own, or buy in mass quantities from some white-label provider. Do you believe that's true? How much does hardware matter in OpenStack?
Rob Hirschfeld: Really interesting question. Um, the hardware... The hardware shouldn't matter at all to the end user, of course. The whole purpose of OpenStack is to make all of the mess that is a physical data center go away. Unfortunately, when you're building something like OpenStack (Hadoop is a really good example, or Ceph, or Swift inside of OpenStack), they all really care about the physical topology, the infrastructure.
You care about your switch layouts, you care about your hard drive configurations, you care about which machines are next to each other in the racks, because they're in the same power zones. Those details do matter; the vendor doesn't matter. The vendor is much more a function of your operational needs and what you're trying to accomplish. But there's a really interesting movement called Open Compute, which we started to play in a little bit, where people try to open source the hardware designs.
Even with open source the hardware still comes from a vendor. It's still Supermicro or Dell or HP or Quanta making those boards. It's not until you get to Facebook scale that you go directly to the OEM and tell them what to make. The hardware... We thought that we could make hardware not matter. When you're dealing at the hardware level, the hardware matters. You have to get it right, you have to create an abstraction for it.
What we had to do is we decided that we needed a boundary, to say, "All right, we're gonna let ... We're going to try to keep all hardware details below this level, this ready state level, and then focus our scripts on dealing with the abstraction above it." I think we're getting to a point where there are sort of three layers, right? There's a cloud user level where you're using OpenStack, and that OpenStack infrastructure is very portable and homogeneous. There's a level below that where you're doing physical ops on top of an abstraction boundary, so that the scripts should be the same site to site.
Then there are the actual physical ops themselves, where you have to deal with the variety of NICs and RAID cards and topologies and how people want to manage their gear.
Niki Acosta: At Cisco, probably more so than when I was working at Rackspace... When I joined Metacloud we dealt with a ton of enterprise users, and it seems like the enterprise users actually do care what the hardware is, but I'm not sure it's for the right reasons. Obviously your data center guys and your infrastructure teams have probably been working with the same vendors for eons and have these big vendor relationships.
What happens to all the intelligence that people have laid down on all their boxes? Does that go away in the Crowbar scenario or do they still have access to that, the tools and different things that they can use to troubleshoot hardware?
Rob Hirschfeld: From our perspective, a lot of those tools we've tried to preserve and work with. Crowbar's job is not to be very opinionated. We started off more opinionated, and we heard very clearly that we were way too opinionated by selecting Chef. What we found is that there are a couple of aspects. One is that people are heterogeneous by design. Most data centers, to prevent themselves from being locked in to one vendor, pick multiple suppliers. They do it for business continuity reasons. In some cases they've done it because they've acquired companies, or they have different projects and have bought different gear and they have to pull those together.
Also very normal, we see that a lot in the devops tools. You'll get a team that likes Chef and a team that likes Puppet, and then they have to come together and live in harmony in the operations environment. What we've tried to do is make those things much more neutral and sort of be a neutral territory for that. When we were first building OpenStack certified hardware, building Dell's first reference architecture, I know that I was working with your team at Rackspace when we were in that process.
We would talk to one customer and they would have one opinion, so we'd morph to that opinion. Then we'd talk to somebody else and they'd have a second opinion and maybe even a third opinion. They'd be fighting internally. We found that it just wasn't worth telling people they were wrong, because they weren't. There are at least six different correct ways to install OpenStack, probably exponentially more than that. There's a lot of different hardware that works for this. I think the market over the last four years has converged into two or three patterns that are pretty consistent.
For example, people with compute typically are going for a 1U or 2U box with six drives and teamed 10 gig NICs and dual proc with 48 to 96 gigs of RAM... I can give you ... It's not that hard to come up with the general spec. Even inside that spec you see how much wiggle room I'm giving myself. The reality is that's normal, and it's a fool's errand in my opinion to try and tell people they're wrong when they have a reasonable alternative. Our job was to create ways that you could abstract that... from Dell selling OpenStack reference architectures.
If I told people, "Hey, we don't use teamed 10 gig NICs," which our original spec didn't, they told us no. It wasn't helpful, and they weren't wrong. It was more expensive. This is the biggest shock. Maybe it comes back to your question. Sometimes people make choices that are much more expensive than they have to be because it's what they're comfortable with.
Because they're comfortable with the vendor, or because they think they need 10 gig teamed NICs, or because they want to physically segregate their public and private traffic onto different physical networks. It's not helpful for me to tell them they're wrong, and to me this is part of the challenge with community. One of our big challenges here is that in communities there are a lot of right answers. There are a lot of people who are right, there are a lot of use cases that are different. You get into a weird corner if you tell everybody yes.
You'll also get into an equally weird corner if you don't tell anybody no, and I think that as we bridge into the OpenStack community and things like that we can come back to that.
Niki Acosta: Have you learned that the hard way? That telling people "no" is just as important as telling people "yes"?
Rob Hirschfeld: On the OpenStack side, and I hope we have people who have ... who are here to talk about OpenStack, to talk about the OpenStack Core pieces. Two years ago we had a real dilemma about how we were going to define OpenStack Core. OpenStack was still growing (at that point it seemed modestly, compared to today), but it was still growing, and people were having trouble figuring out how to make two OpenStack clouds work together.
Most famously, Rackspace's and HP's didn't, although we maybe gave them a harder time because they were the first, for something that's become a bit of a pattern.
Niki Acosta: Thanks for that Rob.
Rob Hirschfeld: This is the challenge of being first, right? I think kudos to them for getting the sites up. I think this was a classic case where the aspirations of having a uniform public cloud across multiple vendors, and the reality of what it took to do that, were much, much more sophisticated than we thought. In part because we had to say no to things in order to create this interoperable base, and we didn't. We spent a lot of time building a community, which meant saying yes to people.
You see this in the OpenStack community today as we keep adding more and more projects. We love to bring in developers, we love to bring in more projects, we like saying, "Yes, we want your code." That's exactly what a community should do. As an operator, and when I talk to operators, that same behavior is very frustrating. They will turn around and say, "Wait a second, do I have to have this component? Do I have to have that component? Is Ceilometer a required piece? Or can I substitute?"
Leaving that ambiguous is very frustrating to the community, and it leads to funny behaviors. I was listening to Michael Still talk about Nova, and additions that were made into Nova to support, I think it was Trove (I don't remember exactly which project), that were causing other projects to have challenges in implementation, or were getting stalled going through the gate. It's a complex series of interconnections.
The fact that Trove is a "yes" in OpenStack, which makes sense, but it's not yet clear to operators whether it's a required piece or not, really causes a lot of confusion. Then it causes us to make changes to the APIs to support the project because we feel like it's required, and then potentially disrupt other components that other people think are required. You end up with this interlock dilemma of who's most important. OpenStack has been going through some really interesting things on the TC side with this levels definition.
The big tent says, "We're going to say yes to more people," and the levels say, "But not all animals are equal," in my Animal Farm reference. There needs to be somebody at the bottom that says, "These things have to integrate together as level zero." Then we layer things on top of it, more radially: the things at the base have to be there, and then the pieces further out don't necessarily have to be related to each other. It's part of how you grow a project the size of OpenStack.
Jumping all the way back, this was just an emerging problem two years ago when we started this core definition work, when there was a lot of stress between Swift and Nova and how we would work out what was required and what wasn't required. We were already seeing Ceph deployments where people were substituting Ceph for Swift. I wish they'd made those phonetically more different.
What we've seen here is that we needed to be able to tell people very clearly, "This is what you have to have to have an OpenStack cloud, and this is what you don't have to have to have an OpenStack cloud."
Niki Acosta: That's a tough thing to do, right? Just by the sheer number of vendors that are involved. You look at all the projects, you look at all the priorities of the different companies that are participating. You're trying to sort through all of that and figure out what you need, what you don't need, what's required, what's not required. Does this work seamlessly with this other project? If I implement this, does this other piece work or does it break? That can be a very, very difficult process.
To the point where someone might just throw their hands up and say, "I'm done with OpenStack." I think we've seen a lot of people go that route, the DIY route, and say, "I'm going to build this on my own. It's going to be great." Then they start experiencing some of these intricacies and they just say, "Whoa." Rob, you've done probably more than anybody else on the DefCore front. If you wanted to talk ... Maybe you can define DefCore for people, though I guess we already have.
You've written a lot of blogs about it, you've talked to a lot of people about it and I think it's an important movement. One that I certainly get a lot of questions about. Let's hear a little bit about that.
Rob Hirschfeld: I'd be happy to talk about it. It's not as big and scary as some people think. I've had people come back and say, "When you explained it, it all made perfect sense." Let me see if I can help take the people who are DefCore's critics and explain what we're doing. The first thing that people don't realize is that it's about commercial use of OpenStack. We get a lot of people up in arms because they think we're trying to run the technical side of this, and it's not at all that.
What we're really doing is, the OpenStack Board controls the trademark for OpenStack. It lets people use the word OpenStack in their product and use the logo. If you want to do that we need to ... For trademark management reasons we have to tell people you can or can't use OpenStack in this way. If we don't do it, we actually could lose control of the trademark. It's very important for us to describe that, but it's only commercial.
People using it in the community and contributing code, that's managed by the TC, the Technical Committee, and they control which projects are in and which code is in and all that stuff. That's the first thing that sort of trips people up and that people need to understand. We have a very commercial flavor about what we're doing for DefCore because we're trying to help create a commercial ecosystem.
I'm very unapologetic, right: OpenStack has to make money for the people, the companies that are paying the developers who participate. We're at over 90% corporate-sponsored development in OpenStack; I think the number is even higher. The people sponsoring those developers have to see some return on their investment. It's all tied into that. I hate to be ... In an open source community you want to be able to say, "Kumbaya, we're all doing this because we love open source."
But OpenStack is not exactly a Kumbaya project. It's operations, infrastructure, and at the end of the day, like you were saying, it has to deliver workloads. It has to be stable. Stability is a primary feature for OpenStack, even more than some of the bells and whistles we want to add on. That said, when you start looking at Core you're going to have to say no to people. Right, I'm thinking this is going to be our theme for the podcast: you have to have a way to say no.
It doesn't work to say no if you just look at them and say, "You know, I don't like you. No." You have to give them a way to ... a reason why you're saying no, you have to tell them what it would take to say yes, and that's a lot of what we've spent the last two years doing with DefCore. In DefCore we started with some basic principles to describe how everything fits together. We made a decision at the time that it would be test-driven, so it would be very quantitative, not qualitative, in making decisions.
Then we had to say, "How do we pick the tests?" We spent another couple of months figuring out how to pick tests and came up with 12 criteria that say, "These tests are going to be in, these tests are not." They give very clear signals to the community. Things like documentation are part of it, use is part of it. That is important. Then we also had to say which parts of the code would be required, because OpenStack is not just an API, it's also a project with living code. We had to go through a process to say, "These parts of the code are required or not."
The reason we've done that is, when we start saying, "You are core. You're not core," it's really not so much you are core or you're not; it's telling a vendor, "You must implement these parts of OpenStack, and you don't have to implement those parts of OpenStack."
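To make the test-driven approach concrete, here is a minimal sketch, in Python, of what a quantitative, criteria-scored core definition looks like. The criteria names, weights, threshold, capability names, and test names below are invented for illustration; they are not DefCore's actual scoring rules.

    # Illustrative only: the shape of a test-driven core definition, not DefCore's real rules.

    CRITERIA_WEIGHTS = {"widely_deployed": 3, "documented": 1,
                        "used_by_ecosystem_tools": 2, "stable": 2}
    REQUIRED_SCORE = 6

    capabilities = {
        # Each capability maps to the tests that prove it, plus the criteria it meets.
        "compute-create-server": {"tests": ["example_test_create_server"],
                                  "meets": {"widely_deployed", "documented",
                                            "used_by_ecosystem_tools", "stable"}},
        "shiny-new-feature": {"tests": ["example_test_new_feature"],
                              "meets": {"documented"}},
    }

    def score(capability: dict) -> int:
        return sum(weight for name, weight in CRITERIA_WEIGHTS.items()
                   if name in capability["meets"])

    for name, capability in capabilities.items():
        verdict = "core (required)" if score(capability) >= REQUIRED_SCORE else "advisory"
        print(f"{name}: {verdict}")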
Niki Acosta: Do you expect a shorter list of vendors that meet a ton of criteria, or do you see a larger list of vendors that meet a smaller set of criteria? Where is that balance?
Rob Hirschfeld: It's a great question. We actually were struggling with that balance on the Board. Because we have some vendors, SwiftStack is a really good example, but they're not the only one, there are going to be others in their wake, who just want to use Swift, right? They don't need to use OpenStack Nova in their product. That doesn't help them. They really want a core definition that fits them, for their use. We have some vendors like DreamHost who want to use Ceph; they were one of the original proponents of Ceph.
They want to use Ceph as their object storage instead of Swift. They want OpenStack Nova and the packaging around the compute side, and they don't want to be told they have to use Swift as part of their deployment. In those cases, yes, we can support smaller vendors. This is the change we made in October, to create OpenStack components for Core, so there's a core component concept.
Then there'll be broader platform-level vendors; I know IBM is very interested at this level, and I believe Red Hat will be too. At the platform level they say, "We use all the components. Everything is good." I think Rackspace, because they have the history with Swift, would be at a component or a platform level, where they're using both Swift and the other components, but everything...
There's no free ride with any of this. I haven't talked to any vendor in the DefCore process that said, "We're already in compliance. It's not going to be a big deal for us." A great example is Keystone for Rackspace... Rackspace famously didn't implement Keystone, for, I believe, very sound technical reasons when they made those decisions. They have to figure out how they're going to implement the Keystone requirements. Today the Keystone code isn't required, just the APIs. Actually, not even the APIs in our current set.
It's very important to understand how all these pieces work together, but we definitely have an outlet for people who don't want the whole project. We have an added mark, a more flexible mark, for vendors who can implement the whole project. Then, what's very important to me is that we have a community process by which people can see what's coming with this, and can talk about which capabilities and tests they think should be in Core. They can talk about which code should be required or not.
Our goal is not... And really, the Board isn't capable of doing this; I can explain "capable" maybe a little bit more. We're not in a position to make every detailed decision with 24 board members: we want this, we don't want that. We have to have a way that people can say, "I see what you're doing. I come to you with an opinion. I have an objection to this." If the objection makes sense, we incorporate it. If it doesn't reflect the majority view, then we have an outlet so that they don't get trampled on by the community.
There's been a lot of nuance in those things. There are a lot of safety valves in the process, but at the end of the day we have to be able to define something, say, "This is the limit," and then move on.
Niki Acosta: That's got to be a really tough job, just thinking about the scope of the number of people involved in those decisions and all the things that vendors are going to have to do to meet that core definition. At the end of the day, for people who are seeking that commercial option, it's probably a good thing. It doesn't mean that if you're not using someone who meets that core definition that you're not going to get what you need, right?
There's going to be probably some due diligence on the part of users to figure that out for their own sake, right?
Rob Hirschfeld: I think so. What we really want is people to start with the Core when they do an implementation and then extend forward, right? People will pull in projects that they think are valuable. We want to know what those projects are so we can start adding them to Core. That's one of the other things: this isn't a recommendation, it's a base. And then from that, if you want to add Trove (Trove's really good, I think an up-and-coming candidate to be a component of Core), then you add Trove, you start using it.
The more people use it, the more likely it is to become a core component, because here's our goal. If I write an application... I'm very ecosystem focused with OpenStack. OpenStack has a great community, it has a great development community, but it's evolved. To be very frank on this, it doesn't really do us that much good if we have a whole bunch of people getting together in exotic locations (Paris isn't that exotic, but in Paris or wherever, in Vancouver) and we all pat each other on the back and say we're great.
Because that's internal, inside our own community. The thing that we really want is a thriving ecosystem of people who are building applications, building tools and things like that above us. Right? Anyway, Amazon is fantastic for having a very vibrant ecosystem...