If there were an award for "The Nicest Guy in OpenStack," my vote would be a coin toss between Edgar Magana from Workday and Tim Bell of CERN. Actually, Sean Lynch, Metacloud co-founder, is right up there, too, but since he's technically my boss's boss, he's ineligible for my vote. But I digress....
Edgar's story runs from his "low income" upbringing in Mexico, to an interest in computers, a PhD in Computer Science in Barcelona, and a great career as an authority on OpenStack at Workday. The most difficult part of Edgar's journey? Learning English!
This episode was certainly one of the most pragmatic shows we've done. I expected to hear a bunch of enthusiasm about OpenStack Neutron, but that wasn't completely the case. Edgar, formerly of Cisco, shared his opinion on the readiness of Neutron for large-scale production workloads and where Open vSwitch falls short (40-50 nodes, in case you're wondering). Edgar believes that from the operator perspective, Neutron still has shortcomings and more must be done to simplify networking for developers and operators.
Edgar was also kind enough to share information about how he and others have transformed the team at Workday to take advantage of the agility that cloud provides. Through training, labs, and bringing departments together, Edgar is building a private cloud suitable for scaling and deploying Workday's data-sensitive applications. We closed out with Edgar's willingness to build diverse teams, something Edgar is passionate about since he has one daughter and another on the way. (Congrats!)
If you follow Neutron, SDN, or networking in general, don't pass this podcast up!
You can follow Edgar on Twitter at @emaganap and find his OpenStack sessions (including a use case session) here.
Jeff and I are headed to Vancouver! Check out Jeff's sessions, my sessions, and follow @openstackpod to catch the Summit Minicasts of OSPod.
See past episodes, subscribe, or view the upcoming schedule on the OSPod website.
A full transcript of this podcast follows.
Niki Acosta: All right. Good morning, good morning, good morning. This is Niki Acosta from Cisco. Jeff is off today, so he will not be joining us. However, he might be spying on us from an airplane. I'm not sure. We have an awesome guest with us today, Edgar Magana from Workday. Edgar, introduce yourself.
Edgar Magana: Hey, thank you and good morning everybody as well. My name is Edgar Magana, I'm the cloud operations architect here at Workday. We are one of the top 20 SaaS companies here in the Bay Area. It's an awesome place to work. To be honest, it's been a change in my role. I had only worked for networking companies such as PLUMgrid or Cisco, and now I'm on the operator side, so it's very, very exciting.
Niki Acosta: Edgar, we typically start the podcast by asking about you. How did you get into tech? Obviously, you have an accent. Tell us where you're from and how you got to OpenStack.
Edgar Magana: Absolutely. This is a lot of fun. I was always curious about technology since I was a kid. I broke every single toy that I had when I was a kid, because I wanted to find out how they really worked. I knew I wanted to be an engineer, something like that. My sister, who is 4 years older than me, suffered the most, because I used to break her toys as well, because they were fancier than mine. Sorry about that. Sorry sister. Anyway, as I started growing up, when I was at elementary school, there is something that really changed my life, because I was a lower than average student, but I was smart, kind of.
Anyway, the thing is the principal in my school, who was a very good guy, talked to me one day and said, "Do you like computers?" I was like, "I guess so. I like video games. I never really played with a computer." I had heard about them, but no idea, right? Early '80s. He was like, "Okay. I can make a deal with you. If you actually get better grades, I can let you play with my computer." I don't know, 20, 30 days later, everything was an A, everything. I came back to him and he actually let me play with the first computer that I ever touched in my life. I was playing chess, and it was awesome. From that moment, I decided that I wanted to be involved with computers from then on.
I'm originally from Mexico. The technology over there is a little bit behind what we have here, so I had to read everything in a different language with terminology that I didn't even know about. I tried to get as many magazines as I could, as many ... Internet at that time was not even something, especially for me. Whenever I had the chance to get my hands on some electronic device, I did it. My first computer was a Commodore 64, and I was having a lot of fun with it. Before high school, I was able to play with my first programming language, which was Logo. I don't know if you ever knew about that one, but it was a lot of fun. That's where I'm coming from, very proud of the roots and the things that I've been doing so far.
Niki Acosta: I'm from El Paso which technically almost makes me from Mexico. My dad was born in the Philippines and my mom is Caucasian. Culturally, I think my twin sister and I are more Mexican than anything else.
Edgar Magana: Your last name is a typical Spanish last name.
Niki Acosta: I know. I have to explain how Spain invaded the Philippines, and made people pick names out of a book, and they're like, "Oh, okay." Good times on that. That's interesting. Growing up in Mexico and learning, did you teach yourself English by way of tech as well?
Edgar Magana: I always say to everybody, the single hardest thing for me was actually English. As you can tell, I'm still having trouble pronouncing some words. It's very hard for somebody who's coming from a low-income family in Mexico, because you don't have access to good teachers. Most of the people that you learn from ... They were other countrymen who came here to America, learned some English, and went back to get a job. Your accent is even worse, because you're learning from somebody who doesn't even speak properly, but this is what you have. I've always been in English courses, I don't know, my entire life. Well, not anymore. I just talk to people here and that's how I keep training.
It was very hard. As soon as I had the opportunity to move to a different country, I did it. When I graduated from school, I went to Spain, Barcelona specifically, to start a master's and a PhD in computer science. I know Spain speaks Spanish, the same thing, but it was actually an international PhD, so I collaborated with a lot of people from Europe in general, like France, UK, et cetera. I started taking more courses over there. Now my teachers were from the UK, so I could speak a little bit more proper English. It was a very fun journey actually.
Niki Acosta: It's so cool that you made that journey, most certainly, like you said, coming from a low-income family in Mexico, and then getting to do a PhD. What was your PhD dissertation on?
Edgar Magana: That's a funny part. I knew I liked networking. When I joined the PhD, the research team at the UPC in Barcelona, it's the university, it was around network management. The hot topic at that time was distributed systems, and the hot topic inside distributed systems was grid computing. If you ever review what grid computing is about, it's actually a lot of the foundation of what we have right now with cloud computing. My PhD was a policy-based management system for grid computing.
Niki Acosta: Probably coming full circle now. That seems to be a hot topic currently inside of the Neutron Project, right?
Edgar Magana: Exactly. It's coming back.
Niki Acosta: Yay. That's cool. It's worth it. At the end of the day, you did learn something that was applicable to what you're doing in your career, which is different than most people we talk to on the podcast, kind of interesting.
Edgar Magana: Yeah. The foundation of policy-based management, it's been there for probably 15 years now. It's nothing new. Having what we call PDPs, policy decision points, and PEPs, policy enforcement points, they've been there forever. Now they call it [inaudible 00:07:01] points or something different, but actually the idea behind them, it's nothing new. Now, what is new is actually the backend technology. Moving from core networks, physically wired, to this virtual networking, that is what is different. Obviously, some of the terminology needs to be adjusted and adapted to these new backend systems.
Niki Acosta: You work at Workday as a ... You're a cloud operations architect, right?
Edgar Magana: Mm-hmm (affirmative).
Niki Acosta: What is Workday doing with OpenStack?
Edgar Magana: Well, that's a good question. As any other SaaS company, we obviously host all our applications. Our data centers are really becoming huge. Workday is one of those companies who decided to invest in their own people, their own applications. We design everything from scratch, internally. Every single Workday application, which are right now oriented to finance, human resources, recruitment, analytics, et cetera, is being designed from scratch here. Everything has been working nicely on the bare metal side. We are growing. We are getting more and more customers. Our projections for this year and the next ones are really, really impressive, which means our operations planning for our hosting services should grow.
What's the problem? The problem is that running on bare metal, we identified that we are just consuming probably 20, 30% of the CPU for every single server that we have in our data center. On top of that, in memory, we consume no more than 50% per server. We have the typical use case. We urgently need to virtualize services for both our dev environments and also our customers' environments. I [inaudible 00:09:03] back in June last year. They already had a project called WPC, Workday Private Cloud. They already had an engagement with OpenStack.
They did some POCs. They liked it. They liked the APIs. They liked the OpenStack Foundation, that it was not a crazy open source project just hanging around. They invited me to be the chief architect for this WPC. Since I joined, we've been working on creating the new data center from scratch again. We are not extending our data center. We're actually creating new ones, physically located in the same places, but the core network, the aggregation network, and obviously the virtualized layer are all new, and it's a very amazing and awesome project.
Niki Acosta: Are you guys using any public cloud resources at all, or for security reasons, can you not do that? Is that an option?
Edgar Magana: We do it for development. For our customers, we cannot do it obviously for security reasons, as you said, and also for compliance. We manage very confidential information from our customers, and obviously we keep everything very secure. Even myself, chief architect, I don't have access to our customers' data. We just use public clouds like Amazon Web Services for our dev environments. We want to be as fast and dynamic as possible. I don't know if you've noticed, but we're one of those few companies that deliver a new patch every Friday night. Every Friday night the software is updated for all our customers at the same time, in a very short window.
Niki Acosta: How are you guys currently handling that, versus how you will handle ... I don't know. Are you running OpenStack in production today?
Edgar Magana: Not yet.
Niki Acosta: Not yet.
Edgar Magana: Very soon, otherwise I would get fired. We're very close to doing that. It's a huge project. We have an integration of 3 different teams. There's our cloud [inaudible 00:11:12] team, which is focusing on the software development. We have our DevOps team, which is focused on the Chef deployment of OpenStack. We are a Chef-based company, so we extended the StackForge open source cookbooks for OpenStack deployment. The third team is obviously our infrastructure team, which actually helps us to move out of our nice, very well-designed dev environment into the real world, which is the data centers, with minimal connections, a lot of security requirements. It's not easy, believe me. Going to production has been a very good challenge, but we're almost there.
Niki Acosta: Tell me about the cultural aspects. I'm asking that because when I go and I talk to big companies about cloud deployments, it seems like the traditional infrastructure teams are almost at odds with some of these developers who are writing next-generation super-scalable applications. Are you finding it easy to navigate and walk that line in between traditional infrastructure teams, people who are comfortable using bare metal or VMware, and now ushering in this new platform for everything moving forward?
Edgar Magana: Yeah. That's actually an amazing question. I'll tell you why. The good thing about Workday is it's still a young company. We just [inaudible 00:12:47] a couple of weeks ago. Most of the teams are very young and they are very open to learning new things. However, we manage very important information. The most difficult thing to deal with is security. It's not because they are not nice guys or they want to be mean to us. It's because they know how important it is to keep that zero number of security breaches that we have had so far. None so far. We want to keep it that way. They are sniffing every single piece of code that we write. They actually pen test, penetration test, all the OpenStack code. We found very good things. We're planning to provide all this information back to the OpenStack community obviously.
Coming back to the insights, the technology that they are running in the data center has been designed locally. Now, here comes this guy who looks like he came from South America, Central America, whatever, they don't even know where he's coming from, with this open source project, and they have a lot of questions. It's been part of my job to do a lot of evangelization inside of Workday, explaining, educating, actually motivating people to fall in love, as I am, with OpenStack. They are getting it. It's been a little bit difficult at the beginning, as with any change. The inertia, the beginning is very hard. Then you release a little bit of friction, and then you have all the way for the transition. We are in that mood right now. The company is ready. They are yelling at us, "When can I get it? Where can I have that API code that is going to give me [inaudible 00:14:46] and all that dynamic, all the networking?"
It's so funny. Workday is a typical company who designed everything very well, very structured, but at the same time, it wasn't very dynamic. Every time that they, each team, received a ticket about "I need to have this new port open," the typical networking case, "I have to have this new application running now. I need a new port open," it would take 2 or 3 weeks to get that. Security has to approve it. [Inaudible 00:15:22] has to test it. There are changes in 5, 6 routers, in 10 different switches [inaudible 00:15:29] enabled. It was a nightmare. We are transitioning from that model to the new virtual network model, and everybody is in love with it. Just one guy reviews the policy, sends it to our network management system, deploys it into the inside. We do some validation, done. The ticket can be closed the same day it was opened.
Niki Acosta: That's amazing. [Inaudible 00:15:56] something that took 2 or 3 weeks to a day. It has to be positive for the business.
Edgar Magana: Absolutely. We are growing really, really fast. Our [inaudible 00:16:07] team should keep that momentum, should keep growing at the same pace. We like to grow in a way that keeps our customers still very, very happy. Aneel, who is our CEO, always says to the whole company, "The most important thing for us is the customer's satisfaction." We were at the top, 98, 99, the last time that we were reviewed. He doesn't want that number to go lower for any [inaudible 00:16:37] reason. We review everything 2 or 3 times. We do a lot of testing. We take things very seriously, but we also have a very good environment.
I will tell you, the 100% software development environment is very different to the infrastructure operations environment. These guys, they know how to have fun for sure. The cloud environment is a more introverted environment. The infrastructure operations one is totally extroverted, and I enjoy being part of both. It's so amazing.
Niki Acosta: I'm sure they're benefiting from your expertise. Let's talk about that a little bit. You mentioned having worked at PLUMgrid, having worked at Cisco, now very much still involved as a core contributor to Neutron, how's the Neutron Project going?
Edgar Magana: It's going very well. It's been a very difficult journey for all the networking people at OpenStack. The first networking project started back in the Santa Clara Summit 2012, which actually was the very first summit I attended. It was I think the third summit. It was a bunch of people coming from Nicira, now VMware, Citrix, Cisco, NTT, and a few other companies that came together to say, "Yeah, nova network looks great." I think it's natural as the other projects evolved, nova volume and [inaudible 00:18:17], et cetera, into its own project.
We had, at that time, 2 main objectives. The first objective was to give power to the tenants. With nova network, it's the cloud operator who provides the networking to their tenants. The tenants don't have any power to actually create their own networking, so the network isolation is limited to certain technology like VLAN. Our first goal was to provide that power to the tenants, so they can create any kind of rich topology for their own applications. The second part was how we can make these systems pluggable. We have a lot of backend technology, so we wanted to have it in such a way that we can connect Linux bridges, or Open vSwitch, or for instance, a [inaudible 00:19:13] router, or something else, anything that we can grab from open source, or even vendor-specific technology.
We created this Quantum project at that time. As you know, we had to change the name for some legal issues because we were like ... I don't know. The Quantum company wasn't very happy with the name, so we chose Neutron. That was the main reason, everybody knows about that. It's been, from the [inaudible 00:19:46], a very challenging project. We have different vendors who have different ideas. We implement technology in different ways. Trying to make a project isolated 100% from any vendor-specific idea was very hard, but we did it as much as possible. Then it came to the point of how we really make it so that we can replace this simple nova network. We ended up having a trade-off. What do you want? Do you want complexity, assuming that you will need to train your own operations people on the networking side, or do you want to keep it in the simplest case where you cannot extend it as much as you can with Neutron?
It's been a discussion. Everybody says this nova network replacement discussion has been going on forever. We've been trying to deprecate nova network for many summits. Every time that we provide a migration path, we find out that it's not very easy. We probably miss a little thing that makes the nova network guys slow down the migration and the deprecation. It's nobody's fault. It's just that we ended up having 2 very different technologies and 2 very different ways to do things, so maybe it makes sense to keep both. Why not? If I just want to have simple flat networks, I don't really care about isolation between my tenants, because I chose them, because it's just a dev environment. Who cares about that? Maybe nova network is good to go. You don't need to go to the Neutron side.
We didn't make Neutron complex just because we like complexity. It was an evolution from what we had at hand, to actually provide the same kind of functionality that we already had in nova network and also extend it for vendor-specific technology. What was available at that time was just simple Linux bridges with iptables, and then we extended it with namespaces to provide more routing functionality. It's a very nice architectural design. Whoever reviews it will say, "Wow." It's highly complex, but it's also highly functional in terms of the features that it can provide: tenant isolation, routing, floating IP addresses, security groups, you name it. Now it's extended through services.
Back in the Boston Summit we were talking about services insertion for the first time. I remember I was leading that session, and everybody was like, "Is this an API? What is this?" It was too soon for services insertion, but we knew at that time that we had to extend Neutron to be able to provide functionality for other so-called advanced network functions like load balancing as a service, firewall as a service, et cetera. I do not believe they are advanced features, networking features. They are just network functions. Now they are very well known as NFVs. We are able to extend our APIs to actually adapt those things. It's a very rich project, has a lot of functionality, has a lot of components. Obviously, we extend the [inaudible 00:23:34] system, we extend the agents. We have obviously the API server.
It's been a very, very nice and also challenging role so far, but it's very enjoyable. The team is also willing to fix bugs, to collaborate more. We're trying to finally get a very specific document about networking. Hopefully, for Vancouver, we will be able to release the networking guide specifically for Neutron.
Niki Acosta: Yay. That will be awesome.
Edgar Magana: Yeah. Got 2 patches there.
Niki Acosta: Yeah. Look, I would say probably without a shadow of a doubt, the project that's probably most intimidating, to me at least, and I know that there are others that share this sentiment, is the Neutron Project. Just SDN in general is a quantum leap from what people are accustomed to doing in the network space. I want to ask you another question, because I hear both sides of this argument, and I know you did a talk with Randy about it a while back. Is Neutron ready for production? Is it stable enough? Does it scale enough? Is it a project that you feel comfortable telling people they can trust today?
Edgar Magana: Back in time when I was in San Francisco, with Randy, my answer to that question, if I recall properly, was just for private clouds, not for public clouds. Especially because the public cloud requires a higher level of security between the tenants. It was very difficult to provide that with the native tools that we are implementing in Neutron. I just mentioned them: iptables inside namespaces. This combination of having a Linux bridge and then an Open vSwitch bridge, and then creating [inaudible 00:25:31] to create tunnels, like end-to-end connections, in order to provide these security groups. I believe for a production system, that is not going to work 100%. We need to have a better way to handle these things.
I'm not saying it doesn't work. Ultimately, it works, and it will scale. I'm telling you, as an operator, if I have to debug an issue there, I need to have 10 of my guys who are experts in network virtualization, besides being experts in networking, in order to find where the packet has been dropped. Then I need to run tools like tcpdump on 4 or 5 different interfaces. Is that really what I want for my data center? Now in a position, let me wear my ... Let me take off my hat as an OpenStack Neutron developer, and put on my hat as an operator. My answer was, "Come on, no way. I need to find another solution."
Fortunately, Neutron provides a way to do that. We have a highly pluggable system. We have very good technology out there that will mitigate that nightmare. One of the first controller plug-ins that we had in Neutron was the OpenFlow plug-in. It's been extended to a certain level. We also have other open source plug-ins like the OpenContrail one, which provides that help to the operator, besides providing an extra level of backend technology. Every time that I've been asked about Neutron in production, I say, "How will the network [inaudible 00:27:22] layer be provided?" If it's just going to use OVS, and you will figure out everything, or Linux bridge, and you will try to figure out everything by yourself, good luck with that. I think you need extra help, something to provide you the information at a centralized point. It could be as simple as a syslog system in the backend, or as complicated as a fancy UI, providing some graphics about the nova functionality, that tries to collect analytics and lets you predict some failures in the network.
This is what the operator wants. We don't want something as [inaudible 00:27:58] where I need to open 10 different screens to find where the packet is dropping. There's nothing wrong with Neutron. We are using all the open source tools that we have available. The day somebody comes out with that open source, dynamic, configurable kernel module to do routing, and switching, and security groups all together, we will replace what we have right now, and it's going to be much simpler, but right now it doesn't exist.
Niki Acosta: When do you think that will happen if you had to predict? Who's going to come out with that first?
Edgar Magana: OVS has been working on that for a while. I think any time soon, they will release a new version. It will make the ACL configuration much simpler. That will simplify the network configuration a lot, especially on the Neutron side. We will start that alignment of having to use, or the need to use Open vSwitch. Now Open vSwitch will provide much of that functionality. As soon as that system is ... My concern with that system will be the performance. So far, and I'm talking on behalf of Workday, we tested Open vSwitch at high scale and the performance was really bad after 40, 50 nodes. We had our concerns, and we actually decided not to use it because of that. It's not just about providing more features; it also has to scale properly without the performance being degraded.
Niki Acosta: What's the best bet right now for large-scale cloud users that want to go down the Neutron path? I'm hearing from you that as an operator, it definitely has some shortcomings. If I wanted to go that route now, what should I look at? What should I be evaluating? What kind of skills do I need on my team to be able to validate that my implementation is going to work?
Edgar Magana: It was funny. When I came to Workday ... There was a lot of confusion about the use cases. It's funny to think about that, but sometimes the operator itself doesn't know the use case very well, or there is a lot of confusion about it. There were people asking me, "Do we really need this Neutron thing?" I would say, "Hell, you need it. Otherwise, you won't be able to fulfill even the simplest use case that we have here at Workday." I found that a lot of times they need to understand the use cases first. Do we really need security between tenants? What level of security do we need, at the port level, at the back level? Once you have all those questions answered, you need to find out what your requirements are in terms of performance. Maybe OVS will be good for you, will be enough, and you can try to implement it, or you need to look for other alternatives.
In the whole SDN space, I will also recommend operators look for who's really changing the way things will work in the data center. There are a few SDN companies that rely on OVS, which means your backend technology is [inaudible 00:31:25] OVS. You need to think about OVS performance. In other cases it's just Linux bridge. You need to think about the Linux bridge performance. For the ones who want more functionality and are looking for this controller, SDN controller base, well, you need to find out if that SDN really changes the backend technology, if this is a new model, and how it's going to perform.
Back to the operators' training and technology aspects, most of these guys, they ... I'm finding it out. They are very good at the core level, at the core technology. They understand fabric switches and routers very well. When we move from that environment to the virtualized one, they get a little bit lost. I always recommend helping them in that transition. I do not recommend creating 2 teams, one focused on the physical layer and one focused on the virtual layer, because you are creating 2 different teams, and they will start fighting each other over every single thing. You need to converge your physical networking team into a virtual networking team. It's a transition that will take time and training, but the limit, it's money that is worth investing in your team, in your people.
Also, they will actually come out with better solutions than you can even imagine, because they will bring that core knowledge into the virtualized world, and they will start asking such naive questions that actually will make people think like, "Well, that's interesting. Do we really need that part? What if we don't need it?" It's like when we talk about IPv6. The naive people say they just want to replace IPv4 with IPv6. They want to replace all the network functions that we have there, DHCP, NAT, and all those things. Sorry, but that is stupid. IPv6 is a highly extendable protocol t