
Content provided by Reblaze Technologies Ltd. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Reblaze Technologies Ltd. or its podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined at https://tr.player.fm/legal.

Episode 21: Maintaining Envoy Proxy with Snow Pettersen

34:09

Archived series ("Inactive feed" status)

When? This feed was archived on July 01, 2022 02:28 (2y ago). Last successful fetch was on October 25, 2021 23:04 (2+ y ago)

Why? Inactive feed status. Our servers were unable to retrieve a valid podcast feed for a sustained period.

What now? You might be able to find a more up-to-date version using the search function. This series will no longer be checked for updates. If you believe this to be in error, please check if the publisher's feed link below is valid and contact support to request the feed be restored or if you have any other concerns about this.


Sponsored by Reblaze, creators of Curiefense

Panelists

Justin Dorfman | Tzury Bar Yochay

Guest

Snow Pettersen
Envoy Proxy Senior Maintainer

Show Notes

Hello and welcome to Committing to Cloud Native Podcast! It’s the podcast by Reblaze where we talk about the confluence of Cloud Native and Open Source. Today, our special guest is Snow Pettersen, who is an Envoy Proxy Senior Maintainer working at Lyft on the Resilience team. Snow has done Cloud Native at Square, Netflix, and Lyft, and he tells us how it’s changed over the years and describes a particular challenge he had recently. He also talks about the release and rollout problems with sidecars in Envoy. Speaking of Envoy, Snow explains exactly what it is and what it does. We also learn about the architecture of Envoy, the new contrib folder proposal, extensions coming out, and the “golden rules” to follow when reviewing code. Go ahead and download this episode now to hear more and thank you for joining us today!

[00:02:06] Snow has done Cloud Native at Square, Netflix, and Lyft. Find out how it’s changed over the years. He also tells us about a recent challenge he had.

[00:03:47] We learn from Snow that the biggest headache he’s seeing with people using Envoy has been the release and rollout problem with sidecars.

[00:06:47] Tzury wonders how Snow would explain Envoy to someone. He also tells us how it switches to the new set of configurations while processing and Envoy’s scalability on a single machine.

[00:13:16] Snow goes more in depth about the architecture of Envoy and the new contrib folder proposal.

[00:20:24] Find out how many people are actually maintaining, monitoring, and moderating the process.

[00:24:02] Justin asks which extensions Snow anticipates will come out that can’t make it to core, and what people want that they can’t get right now.

[00:26:43] Tzury wonders what the most obscure, unexpected use of Envoy was in production that Snow came across.

[00:28:17] Snow tells us how much of his time on Envoy over the years has gone to writing new code versus reviewing others’ code versus answering emails and filing or responding to issues on GitHub. Justin shares some stats from Snow’s GitHub profile.

[00:29:54] Snow shares the “golden rules” for reviewing code.

[00:33:04] Find out where you can follow Snow online, and he gives a shout-out to the entire Envoy community!

Links

Curiefense

Curiefense Twitter

Curiefense Blog

Cloud Native Community Groups-Curiefense

community@curiefense.io

Reblaze

Justin Dorfman Twitter

jdorfman@curiefense.io

podcast@curiefense.io

Richard Littauer Twitter

Tzury Bar Yochay Twitter

Snow Pettersen Twitter

Snow Pettersen GitHub

Lyft

Envoy

Episode #17: “99.99999% Uptime with Anna Berenberg”

Credits


Transcript

[00:00] Snow Pettersen: There was a period of time around when I started being a maintainer, and a bit before, when I was writing a lot of code, just because, again, I think it aligned very well with what my company needed at the time. Now, over time I've just gotten review ownership over more and more code and been brought into more and more "hey, you know how this works, so can you chime in?" So I've definitely drifted away more towards the side of communication. It's always nice to get some code written every now and then, but there's so much other stuff that happens that I always have to be careful about making myself the blocker for the code landing.

[00:42] Intro: Hello, and welcome to Committing to Cloud Native, the podcast where we talk about the interface between open source and cloud native. We're super excited about our guest today, can't wait to introduce him. Our panelists today are Justin Dorfman and Tzury Bar Yochay, and they're going to have an awesome conversation. I really enjoyed listening to it and I really hope you enjoy this conversation.

[01:06] Justin: Today we have Snow Pettersen joining us from Lyft. He's on the Envoy Proxy project as well, as a senior maintainer. Tzury, you're here, what's up? I thought you almost had COVID, but you're good.

[01:18] Tzury: Hey JD. Hey Snow. How are you guys? I'm all good. I'm fine. Thank God.

[01:22] Justin: Okay. Thank God and Snow, how are you? Are you doing good?

[01:26] Snow Pettersen: I'm doing great, yes. Happy to be here. Thanks for having me.

[01:30] Justin: I really appreciate you coming back because for the audience that doesn't know the backstory, Snow was on like a month or two ago and the audio was so bad that we had to pull the plug. So we rescheduled and Snow, thank God said yes and that's where we're at. And we just want to basically go over what we talked about, but this time with a new recording platform and new equipment. So thank you again, Snow for really taking the time to do that.

[02:04] Snow Pettersen: Yes, no problem at all.

[02:06] Justin: So cloud-native, you've done it at Square, you've done it at Netflix, you've done it at Lyft. How has it changed over the years?

[02:13] Snow Pettersen: It's definitely matured a lot. I think a lot of the stuff we were doing early on at Square, particularly in the Envoy space, which is how I ended up in this whole space, was rough around the edges. It took quite a while to ramp up on things, and things didn't always work the way you wanted. I think now things have definitely matured. I guess it's been four or five years at this point. So more problems are solved, things are easier to do, but there are still a lot of challenges.

[02:42] Justin: What's a major challenge that you've recently experienced, whether it's at Lyft or just maintaining the project?

[02:49] Snow Pettersen: I think one of the interesting [Inaudible 02:51] there's been this push towards like a [Inaudible 02:58] approach where a lot of systems are relying more and more on these open source projects that run next to their services and Kubernetes and assessments as well, and this has been like a trend in cloud-native where more and more problems have been solved via sidecars, which on its own has caused a bunch of new problems around management of these sidecars. And I think a lot of people who jumped on the sidecar bandwagon early on are now running into issues with managing all of these sidecars, with companies having 5, 10, 15 sidecars running in their pods, resulting in a whole set of new difficulties that people didn't realize would be this bad when they were preaching about the value of sidecars.

[03:48] Justin: Is it like a performance issue or is it more of a security issue? What's the biggest headache that you're seeing with people using Envoy and sidecar loading?

[03:57] Snow Pettersen: It's a release and rollout problem, that's a huge one, where it's tricky to have a good release policy for sidecars because you're kind of torn between two sides. On one side, you want to get new code out quickly and safely, but it's hard to do quickly if you have to roll your entire fleet. There's a lot of work to do this safely, because if you try to roll your entire fleet, what kind of stats are you monitoring, what kinds of systems are in place to make sure that things don't go wrong? And just the idea of having a gradual rollout of sidecars can be very tricky, because you often end up having to build your own systems if you want something more granular than per Kubernetes cluster, for example. So if you do it per cluster, it's probably not too bad because you can just kind of do them one at a time, but taking down an entire cluster can be pretty bad as well, depending on your setup. Not everybody runs with a bunch of redundant clusters.

[05:01] Then you have this problem of once you start building automation for rolling the fleet to an up-to-date sidecar, if you have a lot of sidecars that need constant updating, because not only do you have the open-source, readily available sidecars but you're also building your own internal sidecars, you end up having to roll your fleet quite a lot, and it just creates a lot of churn, and each one of these can cause a lot of issues. Sometimes you bundle multiple sidecar updates into the same update, and then it gets very complicated because the person doing the actual rollout might not have any context around the sidecar being updated. And then the other way of doing it, where you don't roll the fleet, you put the onus on service owners to manage it, so whenever they update they get a brand new set of sidecars, has its own set of problems where it's like, oh, you updated your app, you have seven new sidecars, something's wrong. What do you do?

[05:55] Justin: That's defeating the purpose of the whole microservices architecture. Microservices are supposed to make it easier, and then this is just like monoliths all over again, kind of.

[06:05] Snow Pettersen: That's a nice analogy because you're going back to like a more monolithic setup in a way. But the app owner doesn't really know because they're just looking at their own code and they're like, oh, my code has 500 lines of code. This is barely anything, so I'm just going to deploy this rapidly because it's such a tiny amount. But if you're then also factoring in the tens of thousands of lines of code you are deploying every time a sidecar upgrade goes out, things become tricky and it's harder to move at the same velocity that you would have expected given how small your supposed microservice is.

[06:40] Justin: Right. Tzury, thoughts?

[06:42] Tzury: I think we have the perfect candidate to answer the following question. How would you explain Envoy to a newcomer, I wouldn't even say newcomer, somebody who knows nothing about Envoy? He or she, they heard the name and they went, what is Envoy? But they have no connection to the internet. They cannot type it in Google and get the answer. What would you say to them at a cocktail party on a Saturday afternoon? How would you explain Envoy to them?

[07:15] Snow Pettersen: Usually what I do, if I'm talking to somebody who might not even be familiar with microservices and distributed systems, is kind of tell the story of how you end up here, which is: once upon a time, there was a bunch of mainframes and everything ran on one machine and things were easy. If you wanted to have different subsystems call each other, you just call the function, it's all in there, it's easy. Then to scale out, you ended up building this microservice world where you have all these different components talking to each other, and then you have to teach each of these components how to talk to each other, and that also became difficult because you had a lot of different components wanting to talk to each other and they all had to understand how to talk to each other. So, Envoy in the capacity of service [Inaudible 07:58] in other ways as well, but as a service mesh, which I think is how most people use it, it serves a role of centralizing how these services talk to each other into a single process that can run alongside all of the applications, allowing them to communicate with each other, and this is like the very basics of what it does.

[08:22] Where its real attractiveness came from is its ability for dynamic reconfiguration, which provides a consistent way for people to interact with it. Many people have done similar things with things that relied on very laborious ways of updating the config or complicated config management systems. So the big thing that Envoy brought in to make this problem more attractive was all of these APIs and the xDS mechanism that allows for a management server to push all of the configuration that it might need, and update without restarting, in a very efficient manner, and react to changes really quickly. In addition to this, it scales really well, has a relatively low performance overhead (non-zero, but it's fairly low), and is also able to scale up to very large use cases.

[09:19] Justin: So when Envoy gets an updated configuration from an xDS server, whatever that might be, how does it gracefully switch to the new set of configurations while processing? By the way, what is the Envoy scalability on a single machine or a single core, to your knowledge?

[09:40] Snow Pettersen: In terms of concurrency, that's an interesting one, because the way that we suggest that you run it, and this will also come into play in explaining how this config update works, is that Envoy will generally run one, if you want to run on a single machine, you run one [Inaudible 09:57] per core, and there are very few cross-thread interactions. There's some, but for the data path, where you're handling requests and proxying them, a single worker thread accepts it and handles it for its entire lifetime. And Envoy will basically, each thread I think is able to handle like... that's a really hard question, the scalability, because it depends heavily on which features you're using. TLS versus no TLS makes a huge difference, for example. I recall in the past there being talks about, I think, 20,000 or 30,000 RPS in synthetic benchmarks, but that's obviously where you strip down everything and you run it with like nothing.

[10:39] Tzury: So, it wasn't HTTP/2, it was HTTP/1, for example, with HTTP, no TLS.

[10:45] Snow Pettersen: Yes, like no modification of the payload, no access control, no nothing, which is not how people would use it. So the value of such a benchmark becomes questionable because there's this whole other set of features that I might use.

[11:02] Tzury: Well, that's still an impressive number.

[11:05] Snow Pettersen: Going back to how the config update works, generally there's the static config that it comes up with, what we call the bootstrap config, which is required at run time. This has to be, you need to do a process [Inaudible 11:18] changes, but the typical way you would define this static config is that you basically say, get all of my actual config from the management server. Then the management server is responsible; it will receive a request from the client saying, hey, I need to know about all of my configuration, and so the server is [Inaudible 11:39] of these and the [Inaudible 11:42] comes up with my internal object [Inaudible 11:46] resource he wants. So these are like listeners, which define the port, protocol, and transport socket (TLS or not) configuration for each listener; clusters, how to communicate with other systems; and endpoints, the IP addresses associated with these clusters. Then as any new config comes in, this gets handled on the main thread, which is used exclusively for this control plane interaction. It gets processed, it gets validated, and then it gets posted to all of the worker threads, telling each to update its thread-local version of this data.

[12:19] Then there's another mechanism where generally [Inaudible 12:22] snapped to a stream or a request. So if you get a request and while you're processing the request you get a config update, it still acts on the old configuration, just to make sure that each request has a consistent view of what the configuration looks like.
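
The update flow Snow describes (validate on the main thread, publish to every worker, let in-flight requests keep the configuration they started with) can be sketched roughly as follows. This is a single-threaded Python illustration under stated assumptions, not Envoy's C++ implementation: `ConfigSnapshot`, `Worker`, `MainThread`, and `Request` are hypothetical names, and a real proxy would post the update to each worker's event loop rather than call it directly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfigSnapshot:
    """Immutable view of the configuration at one version (hypothetical)."""
    version: int
    routes: tuple  # ((path_prefix, cluster_name), ...), checked in order


class Request:
    """Pins the snapshot that was current when the request started."""

    def __init__(self, path, snapshot):
        self.path = path
        self.snapshot = snapshot

    def route(self):
        for prefix, cluster in self.snapshot.routes:
            if self.path.startswith(prefix):
                return cluster
        return None


class Worker:
    """Stands in for one worker thread and its thread-local config slot."""

    def __init__(self, snapshot):
        self.current = snapshot

    def update(self, snapshot):
        # Only the reference is swapped; in-flight requests keep the old one.
        self.current = snapshot

    def begin_request(self, path):
        return Request(path, self.current)


class MainThread:
    """Stands in for the thread that handles control-plane updates."""

    def __init__(self, workers):
        self.workers = workers

    def on_config(self, version, routes):
        # Validate before anything is published to the workers.
        for prefix, cluster in routes:
            if not prefix.startswith("/") or not cluster:
                raise ValueError(f"invalid route {prefix!r} -> {cluster!r}")
        snapshot = ConfigSnapshot(version, tuple(routes))
        for worker in self.workers:
            worker.update(snapshot)
```

A request created before `on_config` runs keeps routing against the old snapshot, while the next request on the same worker sees the new one. Because snapshots are immutable and only references are swapped, the request path needs no locking in this sketch.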

[12:42] Tzury: So in terms of the Envoy architecture, which is something that when you dive in a little, you find quite quickly: the beauty of having a stripped-down, barebones, I would say, minimal core of Envoy, while almost anything you want to implement and do, things which you consider obvious and ground level, are pretty much implemented as filters or as extensions, and I believe that was done by choice at the time. Can you elaborate a bit about this architecture?

[13:17] Snow Pettersen: So if we just start off with the filters for processing an [Inaudible 13:22] requests, you can define like a list of filters, where each filter will process the incoming request and the outgoing response and gives you a chance to modify the request. I think early on, this predates me, but I assume it was natural to just implement the routing mechanism as one of these, because what is the actual routing mechanism? Well, it's the thing that accepts the request and generates a response. So this fits neatly into the filter mechanism, and I think in constructing this extension mechanism, there was an early choice to make it very generic so that the way extensions are done can be reused for basically anything. Basically, you have some C++ API that you can implement, and you register a factory that accepts [Inaudible 14:10] that defines it, and this very generic extension mechanism means that you can do basically anything; it's very easy to make anything an extension point. So a lot of things were quickly [Inaudible 14:26] where you'd take something that was previously not, for example, TLS used to be baked in, but in order to better support other ways of transforming the data on that level, it was extracted into a transport socket extension, so that now TLS is just an extension, and this just opens up for so many other extensions to be built that kind of fit in the same spot in the stack as TLS.

[14:55] So this is definitely a fantastic choice and has allowed us to do stuff like have different security postures for different extensions. We can say that we have built in, say, 50 different filters. Only 10 of them are robust to untrusted downstreams or upstreams and whatnot, which allows us to better evaluate. We can tell people, we can guarantee that this has been vetted via fuzzing and via other production testing. We expect this to be secure. Others, newer extensions, generally will be flagged as "we don't know", or have a less robust one, which means that if we then get issues coming in saying, oh, I found a bug, if this, this and this happens, we can crash the process: if it falls into one of our less trusted extensions, we say, well, that's okay, we'll just fix it, while the other ones will go through a security release and we file a CVE for it, because we've already made the promise that these things are secure, and so we have to go through the right process to make sure that we disclose it in a responsible manner.
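
As a rough illustration of the two ideas above, a filter chain in which the router is simply the last filter and a generic factory registry keyed by name, here is a minimal Python sketch. It is not Envoy's C++ extension API: `register_filter`, `FILTER_FACTORIES`, the `example.*` filter names, and the dictionary-based request and response shapes are all assumptions made for this example.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical registry: extension name -> factory taking that filter's config.
FILTER_FACTORIES: Dict[str, Callable[[dict], "Filter"]] = {}


def register_filter(name: str):
    """Register a factory under a name so a config can refer to it."""
    def decorator(factory):
        FILTER_FACTORIES[name] = factory
        return factory
    return decorator


class Filter:
    def on_request(self, request: dict) -> Optional[dict]:
        """Return a response to stop the chain, or None to continue."""
        return None

    def on_response(self, response: dict) -> None:
        """Called on the way back out, in reverse order."""


@register_filter("example.add_header")
class AddHeader(Filter):
    def __init__(self, config: dict):
        self.name, self.value = config["name"], config["value"]

    def on_request(self, request):
        request["headers"][self.name] = self.value
        return None

    def on_response(self, response):
        response["headers"][self.name] = self.value


@register_filter("example.router")
class Router(Filter):
    """Terminal filter: accepts the request and generates the response."""

    def __init__(self, config: dict):
        self.routes = config["routes"]  # path prefix -> upstream cluster name

    def on_request(self, request):
        for prefix, cluster in self.routes.items():
            if request["path"].startswith(prefix):
                return {"status": 200, "upstream": cluster, "headers": {}}
        return {"status": 404, "upstream": None, "headers": {}}


def build_chain(configs: List[dict]) -> List[Filter]:
    # Look each filter up by name and hand its config to the factory.
    return [FILTER_FACTORIES[c["name"]](c.get("config", {})) for c in configs]


def handle(chain: List[Filter], request: dict) -> dict:
    visited, response = [], None
    for f in chain:
        visited.append(f)
        response = f.on_request(request)
        if response is not None:
            break
    if response is None:  # no filter produced a response
        response = {"status": 500, "upstream": None, "headers": {}}
    for f in reversed(visited):
        f.on_response(response)
    return response
```

Because the router is registered and configured like any other filter, swapping it for a different terminal filter needs no change to `handle`, which is the property the generic factory mechanism is meant to show here.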

[16:07] Justin: Does this have anything to do with the new contrib folder proposal, or is it completely different?

[16:15] Snow Pettersen: So the contrib folder is basically a proposed new way of structuring extensions, where instead of having basically core and extensions, we'd have core, extensions, and contrib, with contrib being like a new collection of extensions that are held to not the same bar as core extensions, in terms of things like coverage, making sure that it's been signed off by a core maintainer, and whatnot. The idea being that there's a lot of very useful extensions that we love proposing, but some of them, they show up with like a 5,000-line PR and they're like, hey, I made this extension. It's super useful, we've been using it in production. But in order for us to get it into a state where we'll be okay putting it into core, it just takes so much time and effort that unless we have somebody willing to do that work, it just sits there. So contrib is a way for us to say, yes, we would happily take it, we're going to put it into contrib, which means that we'll do some due diligence, make sure it looks sane, and then we'll probably put it in there.

[17:24] Justin: I mean, yes, that's definitely going to help. It's kind of like the WordPress model where you have this plugin directory, and I could just see this taking Envoy probably to the next level if this proposal gets accepted and then built upon, because it has to be very discouraging for developers coming and sending a pull request and then getting it declined. It's not like you want to do that, it's just that you've got to find the time, and there are other stakeholders that are going to be affected by this inclusion in the core. So I think this is probably the best way to kind of combat this issue that you're having.

[18:01] Snow Pettersen: I think it's going to help a lot. I think at the moment, there are so many different paths towards a very extensible platform. The WebAssembly work that is being done, it's a little bit too immature, I think, for a lot of people to be willing to use it, but it's getting there. There's also a proposal around adding better support for cgo, so you could write a filter that calls through a cgo API into a real Go binary, as opposed to a Go WebAssembly extension. There's a lot of interest in making the platform more extensible, and so I think [Inaudible 18:40] will help quite a bit and if I'm select the space of like native extensions, but there are other things which I think will help a lot as well, like WebAssembly. In particular, I think that has a lot of potential. The cgo one is also very interesting.

[18:54] Justin: I'm looking at the GitHub Envoy project and I see over 700 developers contributing code and patching, so how do you guys manage the time all this takes, the effort and the time to navigate between the community, users, and developers? And I believe even without that, you would have the roadmap and the priorities already set out for the upcoming years. I mean, Envoy has its own roadmap. Let me put it this way: when someone comes to a live project, and Envoy is definitely one of those that right now is super cool, super popular, taking over cloud vendors. We had the projected Joshi from Google last week and we just talked about how Envoy was actually embedded within Google Cloud products, and we know Microsoft Azure and AWS simply do the same. So Envoy has its own roadmap, and even without our community involvement, it will have its own tasks, projects, priorities, upcoming features, and so on. Now we come in with our own ideas, some of them matching what is already on the roadmap, some really cool ideas, some of them less cool, probably. How do you guys prioritize, manage, maintain these and find a balance between all of this? How many people within the community, if I may ask, how many of you guys are actually maintaining and monitoring and moderating all these processes?

[20:37] Snow Pettersen: We have maybe 10 or 12 maintainers, something around that. We have a weekly on-call rotation, where every week there is a new maintainer on call who is responsible for triaging issues and PRs and assigning them to a reviewer, and I think that works reasonably well. It's always tricky because sometimes there are very hard questions being asked and we don't know the answer, and that requires investigation time, but for a lot of them it's about finding the right person to tag on the issue, kind of understanding who the domain experts on the different parts are, and just knowing who might have opinions on things. And then there's prioritizing the work. That's always tricky because, as you can imagine, most people working on it generally have their plates full. Smaller things you can often just get done. Like if somebody asks for a very small addition and I know exactly how to do it, I can easily go do it right there and then, and there's a PR. But a lot of things are much larger, and then a lot of the time we're kind of reliant on somebody from the community being able to step up and take ownership over implementing it, or a maintainer or whatnot that has some company-internal motivation for getting it done. That tends to be very helpful.

[22:03] For example, Google did a lot of work to reduce the footprint of stats, because they had a lot of issues internally, as I understand it, with certain large deployments hitting memory issues. So for them it was easy to just kind of show up one day and say, hey, we're going to rework how the stats subsystem works in order to improve things. And we're seeing a lot of work now coming in from Google as well on implementing QUIC, because that's in their interest. If somebody shows up and they have a feature request around QUIC, where they might want, whatever it is, something that might not be on the immediate roadmap, given that there's already interest and there's a lot of people in the community who are working on it, it's a lot easier to prioritize that kind of work. But there's not a lot of like requests around stuff that none of the maintainers have a strong desire to work on, either via their company or personally, that you basically fall reliant on some reporter or somebody else stepping up and saying, yeah, I'd like to implement this.

[23:12] Justin: So I hate to go back a little, but I'm really interested in contrib. The reason I found it was that I subscribed to the Envoy mailing list and I saw the Google Doc that was submitted, and you can see the excitement in the Google Doc. So I believe that this will just be a new chapter for Envoy, because we did have Anna Berenberg on and she's very pro Envoy and also into running it very lean without too many extensions. So with this whole new way of using extensions that Google might never install into their infrastructure, but many others might, what do you anticipate on extensions that will be coming out that can't make it to core? What is it that people want that they can't get right now?

[24:12] Snow Pettersen: Small extensions have typically been a lot easier to get in. So, if somebody wants an extension that does a very small modification [Inaudible 24:22] as long as it's somewhat generic, then it's okay. So I think the bigger thing will be larger filters and extensions. A great example here, I think, are protocol parsers that will like generate stats and was like statutory ones and also like ones that can give routing. So various protocols that aren't currently supported. I say this because I've reviewed some of the PRs to add support for other protocols. I helped land the support for ZooKeeper stats generation; that's like a 3,000 or 4,000 line PR, which is fairly hard to get in because it definitely requires aligning interests between maintainers and contributors. Without requiring this maintainer sponsorship, I think it will be a lot easier for people to add support for parsing data of various other protocols. So I think that's probably a big one, because I've definitely seen a lot of issues where people are asking about support for some protocols that I'd never heard of. Stuff in telecom and all those things, where they want to be able to have some kind of understanding of the protocol, and without a sponsor that's been really hard to get in.

[25:31] Justin: Are you talking like layer four protocols or higher ones?

[25:36] Snow Pettersen: I think they're all higher. Yes, I think they run on top of TCP. I don't think the room for extensions and implementing layer four protocols is a bit tricky, I think just because we're kind of where they essentially lay today, but I'm sure there are people who will be interested in doing that too.

[25:52] Justin: But they can basically implement that, say, as a TCP filter, right?

[25:57] Snow Pettersen: Yeah, exactly.

[25:58] Justin: On top of TCP?

[25:59] Snow Pettersen: Yes, that's how the ZooKeeper filter is implemented. It's a TCP-level network filter that sits before the TCP proxy. So all it does is inspect the bytes as they flow through the proxy and parse the protocol, which then allows us to generate stats, I think mainly around which commands are used, before the data gets passed to the TCP proxy, which just does a standard TCP proxy of the data.
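
The pattern Snow describes, a filter that only inspects bytes and counts commands before handing the unchanged data to the TCP proxy, can be sketched as below. The wire format here is a made-up toy protocol (a 1-byte command code, a 2-byte big-endian payload length, then the payload), not ZooKeeper's real protocol, and `InspectingFilter`, `COMMANDS`, and `next_filter` are hypothetical names chosen for this sketch.

```python
import struct
from collections import Counter

# Hypothetical command codes for the toy protocol used in this sketch.
COMMANDS = {1: "get", 2: "set", 3: "delete"}
HEADER = struct.Struct(">BH")  # command code, payload length


class InspectingFilter:
    """Counts commands seen on the wire and forwards bytes unmodified."""

    def __init__(self, next_filter):
        self.next_filter = next_filter  # e.g. the TCP proxy's write callback
        self.stats = Counter()
        self._buf = bytearray()         # holds bytes of incomplete frames

    def on_data(self, data: bytes) -> None:
        self._buf += data
        self._parse_complete_frames()
        # Pass-through: the filter never alters what the proxy forwards.
        self.next_filter(data)

    def _parse_complete_frames(self) -> None:
        while len(self._buf) >= HEADER.size:
            code, length = HEADER.unpack_from(self._buf, 0)
            frame_len = HEADER.size + length
            if len(self._buf) < frame_len:
                return  # wait for the rest of this frame
            self.stats[COMMANDS.get(code, "unknown")] += 1
            del self._buf[:frame_len]
```

Since TCP delivers a byte stream rather than messages, the filter buffers partial frames and only counts a command once the whole frame has arrived; forwarding is not delayed by parsing in this sketch.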

[26:31] Justin: None of this is considered sidecar loading? This is just...

[26:36] Snow Pettersen: Yes, you're beefing up your sidecar.

[26:38] Tzury: Okay. Yes, I like that. What was the, I would say the most obscure, unexpected use of Envoy in production that you came across that you'd say, oh, we never imagined that people will use Envoy that way or for this purpose, then you just, oh, that's awesome, look to people's imagination?

[26:58] Snow Pettersen: I think there are some cases I've heard of them running it inside of vehicles. I don't think it's fully autonomous, but as part of some system, I forget the details, but I've definitely heard cases of distributed systems being operated within some kind of vehicles. I forget the details, I'm sorry.

[27:17] Justin: All good. Like autonomous, like in the car.

[27:20] Snow Pettersen: Yes. But if you really think about it, it's also not that weird because I'm sure they used to have some microservices running and they need to connect them somehow. So they're probably running Kubernetes too, I don't know.

[27:32] Justin: Like a Raspberry Pi stack.

[27:34] Tzury: Well, why would you have microservices inside your Tesla? How many services are running within Tesla, for example?

[27:41] Snow Pettersen: [Inaudible 27:42] I'm sure at some point you get the same problems in that space too, where you want one team to work on these components and they want to be able to have a nice contract so that they can make changes to their system without having to affect the other ones. So then yes, now you're talking about API contracts between components and you're not too far away from the microservice architecture. Maybe you're not going all over the internet but...

[28:09] Justin: They are probably using a portable data center, I would say, a mini data center, right? Interesting. So if I'm asking you, over the years in your roles in Envoy, how much of your time actually has to do with writing new code versus reviewing others' code versus answering emails and filing or responding to issues on GitHub, which is equivalent to emails and communication in general?

[28:41] Snow Pettersen: There was a period of time around when I started being a maintainer, and a bit before it, when I was writing a lot of code, just because, again, I think it aligned very well with what my company needed at the time. Now, over time I've just gotten review ownership over more and more code and been brought into more and more "hey, you know how this works, so can you chime in?" So I definitely drifted away more towards the side of just communication. It's always nice to get some code written every now and then, but there's so much other stuff that happens that I always have to be careful about making myself the blocker for code landing.

[29:21] Justin: Looking at your GitHub profile, 72% of your time is on code review, 12% on commits, and 16% on pull requests. So, you're doing quite a bit of review.

[29:33] Snow Pettersen: Yes, I think for my first, like, six-month review at work, I pulled my stats and I think I had like three or four hundred reviews, which is the number of times I hit submit review or whatever over six months. So it's quite a bit.

[29:51] Tzury: What are the golden rules for code review that you would share with the public? When you review code, what are the do's and the don'ts?

[30:01] Snow Pettersen: One really important part is to assume that the person who wrote the code has good intentions and that they're doing their best. So be respectful in that way, and if they do something wrong, tell them gently. There's no reason to be mean. So tell them gently, explain what's happening, ask questions, and don't be too arrogant. I think I see a trend where people leave comments like, "This doesn't make any sense at all, why are you doing it like this?" And sometimes they're wrong, because the thing the author did is perfectly reasonable and the reviewer just missed the context. Even if you're the supposed expert, it doesn't mean you know everything. So I take a much more cautious approach: if there's something that doesn't make sense to me, I'll ask them why it was done and try to understand it, and present it like that. I think showing that kind of respect for the people who wrote the code is very helpful because it helps the PR you're working on proceed smoothly. But it also means that they're more likely to come back because they've had a good experience. So I think we're just creating a very inclusive environment. Let's see, there's a running thing in Envoy PR reviews where the maintainers, 99% of the time, include the word "thanks" when they approve a PR. You always have to show your appreciation.

[31:30] Justin: No, it goes a long way. It really does because it's really hard to get into someone's head on the other side and you'd be like, are they mad at me? Are they annoyed? So yes, that definitely goes a long way.

[31:41] Snow Pettersen: I thought this was super helpful when I was ramping up on the project as well. Sometimes a PR drags on for a long time, and including a "hey, thanks for iterating on this, this will be great" makes sure that people are encouraged to keep working on it because, like you said, it's hard to communicate your own state of mind through GitHub issues or PRs. So including a very clear "hey, I'm very happy that you're doing this work" is, I think, very helpful.

[32:15] Justin: Definitely. I mean, the emoticons, those are cool, but sometimes people just like to do a thumbs down and that's just like a burn. But I think overall it definitely goes a long way, no doubt about it. It's very important that people do communicate their gratitude, and it would actually be a really interesting college thesis to see which projects, what the language is like, thanks versus not thanks, how healthy is the project in terms of adoption? That's not, I don't know, if we have a listener that wants to do that, that'd be really interesting, and I'm sure, Snow, it would help you out. Anyway, it was so great having you on. We're going to do a little after show, but before we go on to that next phase, how can people find you online? Is it Twitter? Is it GitHub? Where can people find Snow?

[33:10] Snow Pettersen: I'm on Twitter, I think. What am I, @snowypeas? And my GitHub is snowp. There are some links there to find me as well.

[33:20] Justin: Yes, it will be in those show notes for sure and is there anyone in the Envoy community specifically in the Envoy community that you'd like to give a shout out to so they can be like, oh my God, I was mentioned on the podcast?

[33:31] Snow Pettersen: The entire Envoy community, thank you. Thank you for everything.

[33:35] Justin: Awesome

[33:36] Outro: Listeners, I hope you enjoyed this one. Do tune in next time, we're really excited about our line-up of guests. We have super exciting guests next week as well. Check out the show notes for this podcast at podcast.curiefense.io. That's C U R I E F E N S E, podcast.curiefense.io, for the Committing to Cloud Native podcast. Thanks again for listening, tune in next week, catch you later.

Special Guest: Snow Pettersen.


Show Notes

[00:02:06] Snow has done Cloud Native at Square, Netflix, and Lyft. Find out how it’s changed over the years. He also tells us about a recent challenge he had.

[00:03:47] We learn from Snow that the biggest headache he’s seeing with people using Envoy has been the release and rollout problem with sidecars.

[00:06:47] Tzury wonders how Snow would explain Envoy to someone. He also tells us how it switches to the new set of configurations while processing and Envoy’s scalability on a single machine.

[00:13:16] Snow goes more in depth about the architecture of Envoy and the new contrib folder proposal.

[00:20:24] Find out how many people are actually maintaining, monitoring, and moderating the process.

[00:24:02] Justin asks what Snow anticipates on extensions that will be coming out that can’t make it to core and what is it that people want that they can’t get right now.

[00:26:43] Tzury wonders what the most obscure, unexpected use of Envoy was in production that Snow came across.

[00:28:17] Over the years that Snow has been at Envoy, he tells us how much of his time he spends writing new code versus reviewing others versus answering emails and file or responding to issues on GitHub. Justin shares some stats from Snow’s GitHub profile.

[00:29:54] Snow shares the “golden rules” when you review a code.

[00:33:04] Find out where you can follow Snow online, and he gives a shout-out to the entire Envoy community!

Links

Curiefense

Curiefense Twitter

Curiefense Blog

Cloud Native Community Groups-Curiefense

community@curiefense.io

Reblaze

Justin Dorfman Twitter

jdorfman@curiefense.io

podcast@curiefense.io

Richard Littauer Twitter

Tzury Bar Yochay Twitter

Snow Pettersen Twitter

Snow Pettersen GitHub

Lyft

Envoy

Episode #17: “99.99999% Uptime with Anna Berenberg”

Credits


Transcript

[00:00] Snow Pettersen: There was a period of time around the time when I started being a maintainer, and a bit before, when I was writing a lot of code, just because, again, I think it aligned very well with what my company needed at the time. Now, over time, I've just gotten review ownership over more and more code and been brought into more and more "hey, you know how this works, so can you chime in?" So I've definitely drifted more towards the side of communication. It's always nice to get some code written every now and then, but there's so much other stuff that happens that I always have to be careful about making myself the blocker for the code landing.

[00:42] Intro: Hello, and welcome to Committing to Cloud Native, the podcast where we talk about the interface between open source and cloud native. We're super excited about our guest today, can't wait to introduce him. Our panelists today are Justin Dorfman and Tzury Bar Yochay, and they're going to have an awesome conversation. I really enjoyed listening to it and I really hope you enjoy this conversation.

[01:06] Justin: Today we have Snow Pettersen joining us from Lyft. He's on the Envoy Proxy project as well, a senior maintainer. Tzury, you're here, what's up? I thought you almost had COVID, but you're good.

[01:18] Tzury: Hey JD. Hey Snow. How are you guys? I'm all good. I'm fine. Thank God.

[01:22] Justin: Okay. Thank God and Snow, how are you? Are you doing good?

[01:26] Snow Pettersen: I'm doing great. Yes. Happy to be here. Thanks for having me.

[01:30] Justin: I really appreciate you coming back because for the audience that doesn't know the backstory, Snow was on like a month or two ago and the audio was so bad that we had to pull the plug. So we rescheduled and Snow, thank God said yes and that's where we're at. And we just want to basically go over what we talked about, but this time with a new recording platform and new equipment. So thank you again, Snow for really taking the time to do that.

[02:04] Snow Pettersen: Yes, no problem at all.

[02:06] Justin: So cloud-native, you've done it at Square, you've done it at Netflix, you've done it at Lyft. How has it changed over the years?

[02:13] Snow Pettersen: It's definitely matured a lot. I think a lot of the stuff we were doing early on at Square, particularly in the Envoy space, which is how I ended up in this whole space, was rough around the edges. It took quite a while to ramp up on things, and things didn't always work the way you wanted, and I think now things have definitely matured. I guess it's been four or five years at this point. So more problems are solved, things are easier to do, but there are still a lot of challenges.

[02:42] Justin: What's a major challenge that you've recently experienced, whether it's at Lyft or just maintaining the project?

[02:49] Snow Pettersen: I think one of the interesting [Inaudible 02:51] there's been this push towards like a [Inaudible 02:58] approach where a lot of systems are relying more and more on these open source projects that run next to their services in Kubernetes and assessments as well, and this has been a trend in cloud-native where more and more problems have been solved via sidecars, which on its own has caused a bunch of new problems around management of these sidecars. And I think a lot of people who jumped on the sidecar bandwagon early on are now running into issues with managing all of these sidecars, with companies having 5, 10, 15 sidecars running in their pods, resulting in a whole set of new difficulties that people didn't realize would be this bad when they were preaching about the value of sidecars.

[03:48] Justin: Is it like a performance issue or is it more of a security one? What's the biggest headache that you're seeing with people using Envoy and sidecar loading?

[03:57] Snow Pettersen: It's a release and rollout problem. That's a huge one, where it's tricky to have a good release policy for sidecars because you're kind of torn between two sides. On one side, you want to get new code out quickly and safely, but it's hard to do quickly if you have to roll your entire fleet, and there's a lot of work to do this safely, because when you try to roll your entire fleet, what kind of stats are you monitoring? What kinds of systems are in place to make sure that things don't go wrong? And just the idea of having a gradual rollout of sidecars can be very tricky, because you often end up having to build your own systems if you want something more granular than, for example, per Kubernetes cluster. If you go per cluster, it's probably not too bad because you can just kind of do them one at a time, but taking down an entire cluster can be pretty bad as well, depending on your setup. Not everybody runs with a bunch of redundant clusters.

[05:01] Then you have this problem of, once you start building automation for rolling the fleet to up-to-date sidecars, if you have a lot of sidecars that need constant updating, because not only do you have the readily available open-source sidecars but you're also building your own internal sidecars, you end up having to roll your fleet quite a lot, and it just creates a lot of churn, and each one of these can cause a lot of issues. Sometimes you bundle multiple sidecar updates into the same update, and then it gets very complicated because the person doing the actual rollout might not have any context around the sidecar being updated. And then the other way of doing it, where you don't roll the fleet and you put the onus on service owners, so that whenever they update they get a brand new set of sidecars, has its own set of problems, where it's like: oh, you updated your app, you have seven new sidecars, something's wrong. What do you do?
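
[Editor's sketch] The gradual rollout Snow describes, upgrading a fleet in small batches and checking health before continuing, might look roughly like the following. This is a minimal illustration and not any real deployment tool: `staged_rollout`, the `upgrade` and `error_rate` callbacks, the pod names, and the 1% error threshold are all hypothetical.

```python
from typing import Callable, List, Sequence


def staged_rollout(
    pods: Sequence[str],
    batch_size: int,
    upgrade: Callable[[str], None],
    error_rate: Callable[[str], float],
    max_error_rate: float = 0.01,  # hypothetical health threshold
) -> List[str]:
    """Upgrade pods batch by batch and stop after the first unhealthy batch.

    Returns the pods that were upgraded before the rollout finished or halted.
    """
    upgraded: List[str] = []
    for start in range(0, len(pods), batch_size):
        batch = list(pods[start:start + batch_size])
        for pod in batch:
            upgrade(pod)
        upgraded.extend(batch)
        # Gate: look at the monitored stat before touching the next batch.
        if any(error_rate(pod) > max_error_rate for pod in batch):
            break
    return upgraded


# Toy fleet: pod-3 starts failing once it runs the new sidecar version.
versions = {f"pod-{i}": "v1" for i in range(6)}


def upgrade_pod(pod: str) -> None:
    versions[pod] = "v2"


def observed_error_rate(pod: str) -> float:
    return 0.2 if pod == "pod-3" and versions[pod] == "v2" else 0.0


done = staged_rollout(list(versions), 2, upgrade_pod, observed_error_rate)
print(done)  # the rollout halts after the batch containing pod-3
```

The sketch leaves the unhealthy batch upgraded rather than rolling it back; which stats to gate on and what to do on failure are exactly the open questions raised in the conversation.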

[05:55] Justin: That's defeating the purpose of the whole microservices architecture. Microservice is supposed to make it easier and then this is just like monoliths all over again, kind of.

[06:05] Snow Pettersen: That's a nice analogy because you're going back to a more monolithic setup in a way. But the app owner doesn't really know, because they're just looking at their own code and they're like, oh, my code has 500 lines of code. This is barely anything, so I'm just going to deploy this rapidly because it's such a tiny amount. But if you're then also factoring in the tens of thousands of lines of code you are deploying every time a sidecar upgrade goes out, things become tricky, and it's harder to move at the same velocity that you would have expected given how small your supposed microservice is.

[06:40] Justin: Right. Tzury, thoughts?

[06:42] Tzury: I think we have the perfect candidate to answer the following question. How would you explain Envoy to a new, I wouldn't even say newcomer, somebody who knows nothing about Envoy? They heard the name and they went, what is Envoy? But they have no connection to the internet. They cannot type it in Google and get the answer. What would you say to them at a cocktail party on a Saturday afternoon? How would you explain Envoy to them?

[07:15] Snow Pettersen: Usually what I do, if I'm talking to somebody who might not even be familiar with microservices and disparate systems, is tell the story of how you end up here, which is: once upon a time, there was a bunch of mainframes and everything ran on one machine and things were easy. If you wanted to have different subsystems call each other, you just called a function, it's all in there, it's easy. Then, to scale out, you ended up building this microservice world where you have all these different components talking to each other, and you have to teach each of these components how to talk to each other. That became difficult because you had a lot of different components wanting to talk to each other and they all had to understand how to do it. So, Envoy in the capacity of service [Inaudible 07:58] in other ways as well, but as a service mesh, which I think is how most people use it, it serves the role of centralizing how these services talk to each other into a single process that can run alongside all of the applications, allowing them to communicate with each other. This is the very basics of what it does.

[08:22] Where its real attractiveness came from is its ability for dynamic reconfiguration, which provides a consistent way for people to interact with it. Many people have done similar things with tools that relied on very laborious ways of updating the config or on complicated config management systems. So the big thing that Envoy brought in to make this problem more attractive was all of these APIs and the xDS mechanism, which allows a management server to push all of the configuration that it might need and update without restarting, in a very efficient manner, and react to changes really quickly. In addition to this, it scales really well, has a relatively low performance overhead, non-zero but fairly low, and is also able to scale up to very large use cases.

[09:19] Justin: So when Envoy gets an updated configuration from an xDS server, whatever that might be, how does it gracefully switch to the new set of configurations while processing? By the way, what is Envoy's scalability on a single machine or a single core, to your knowledge?

[09:40] Snow Pettersen: In terms of concurrency, that's an interesting one, because the way that we suggest that you run it, and this will also come into play in explaining how this config update works, is that if you want to run on a single machine, you run one [Inaudible 09:57] per core, and there are very few cross-thread interactions. There are some, but for the data path, where you're handling requests and proxying them, a single worker thread accepts a request and handles it for its entire lifetime. And each thread, I think, is able to handle like, that's a really hard question, the scalability, because it depends heavily on which features you're using. TLS versus no TLS makes a huge difference, for example. I recall in the past there being talk about, I think, 20,000 or 30,000 RPS in synthetic benchmarks, but that's obviously where you strip down everything and you run it with nothing.

[10:39] Tzury: So, it wasn't HTTP/2, it was HTTP/1, for example, with no HTTPS, no TLS.

[10:45] Snow Pettersen: Yes, like no modification of the payload, no access control, no nothing, which is not how people would use it. So the value of such a benchmark becomes questionable because there's this whole other set of features that I might use.

[11:02] Tzury: Well, that's still an impressive number.

[11:05] Snow Pettersen: Going back to how the config update works: generally, there's the static config that it comes up with, called the bootstrap config, which is required at run time. This has to be, you need to do a process [Inaudible 11:18] changes, but the typical way you would define this static config is that you basically say, get all of my actual config from the management server. The management server is then responsible: it will receive a request from the client saying, hey, I need to know about all of my configuration, and so the server is [Inaudible 11:39] of these and the [Inaudible 11:42] comes up with my internal object [Inaudible 11:46] resource it wants. So these are things like listeners, which define the port, protocol, and transport socket (TLS or not) configuration for each listener; clusters, which define how to communicate with other systems; and endpoints, the IP addresses associated with these clusters. Then, as any new config comes in, it gets handled on the main thread, which is used exclusively for this control plane interaction. It gets processed, it gets validated, and then it gets posted to all of the worker threads, telling each one to update its thread-local version of this data.

[12:19] Then there's another mechanism where generally [Inaudible 12:22] snapped to a stream or a request. So if you get a request, and while you're processing the request you get a config update, it still acts on the old configuration, just to make sure that each request has a consistent view of what the configuration looks like.
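
[Editor's sketch] The update flow described above (validate on a main thread, hand the new config to each worker's local slot, and let in-flight requests keep the snapshot they started with) can be modeled in a few lines. This is a simplified single-process model, not Envoy's actual code: `Config`, `Worker`, `Request`, and `MainThread` are hypothetical names, and real threads are left out for brevity.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Config:
    """Immutable configuration snapshot (hypothetical shape)."""
    version: int
    upstream: str


class Request:
    def __init__(self, config: Config) -> None:
        # The request pins the snapshot that was current when it started.
        self.config = config

    def route(self) -> str:
        # Uses the pinned snapshot, even if the worker was updated since.
        return f"v{self.config.version} -> {self.config.upstream}"


class Worker:
    """Stands in for one worker thread with its own local config slot."""

    def __init__(self, config: Config) -> None:
        self.local_config = config

    def start_request(self) -> Request:
        return Request(self.local_config)


class MainThread:
    """Stands in for the thread that talks to the management server."""

    def __init__(self, workers: List[Worker]) -> None:
        self.workers = workers

    def on_config_update(self, config: Config) -> None:
        # Validate once, then post the same immutable snapshot to every worker.
        if not config.upstream:
            raise ValueError("rejected: empty upstream")
        for worker in self.workers:
            worker.local_config = config


workers = [Worker(Config(1, "10.0.0.1:8080")) for _ in range(2)]
main = MainThread(workers)

in_flight = workers[0].start_request()
main.on_config_update(Config(2, "10.0.0.2:8080"))
new_request = workers[0].start_request()

print(in_flight.route())    # still served with version 1
print(new_request.route())  # served with version 2
```

Because each snapshot is immutable, a worker can swap its reference without locking the request path, which is one way to get the "consistent view per request" property mentioned in the answer.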

[12:42] Tzury: So in terms of the Envoy architecture, which is something that when you dive in a little, you find quite quickly: the beauty of having a stripped-down, barebones, I would say, minimal core of Envoy, while almost anything you want to implement and do, even things which you'd consider obvious and ground level, is pretty much implemented as filters or as extensions, and I believe that was done by choice at the time. Can you elaborate a bit about this architecture?

[13:17] Snow Pettersen: So if we just start off with the filters for processing an [Inaudible 13:22] request, you can define a list of filters, where each filter will process the incoming request and the outgoing response and gives you a chance to modify the request. I think early on, this predates me, but I assume it was natural to just implement the routing mechanism as one of these, because what is the actual routing mechanism? Well, it's the thing that accepts the request and generates a response. So this fits neatly into the filter mechanism, and I think in constructing this extension mechanism, there was an early choice to make it very generic so that the way extensions are done can be reused for basically anything. Basically, you have some C++ API that you can implement and you register a factory that accepts [Inaudible 14:10] that defines it, and this very generic extension mechanism means that it's very easy to make anything an extension point. So a lot of things were quickly [Inaudible 14:26] where you'd take something that was previously not an extension. For example, TLS used to be baked in, but in order to better support other ways of transforming the data on that level, it was extracted into a transport socket extension, so that now TLS is just an extension, and this opens things up for so many other extensions to be built that fit in the same spot in the stack as TLS.

[14:55] So this was definitely a fantastic choice and has allowed us to do stuff like have different security postures for different extensions. We can say that we have built in, say, 50 different filters. Only 10 of them are robust to untrusted downstreams or upstreams and whatnot, which allows us to better evaluate. We can tell people, we can guarantee that this has been vetted via fuzzing and via other production testing. We expect this to be secure. Other, newer extensions will generally be flagged as "we don't know" or as having a less robust posture, which means that if we then get issues coming in saying, oh, I found a bug, if this, this and this happens, we can crash the process, then if it falls into one of our less trusted extensions, we say, well, that's okay, we'll just fix it, while the other ones will go through a security release and we file a CVE for it, because we've already made the promise that these things are secure, and so we have to go through the right process to make sure that we disclose it in a responsible manner.
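
[Editor's sketch] The generic extension mechanism described here (a registry of named factories, each building a filter from its config, with routing implemented as just another filter) can be illustrated as follows. Envoy's real mechanism is a C++ API; this Python model only shows the shape of the idea, and `FILTER_FACTORIES`, `register`, `build_chain`, `handle`, and the filter names and config keys are all hypothetical.

```python
from typing import Callable, Dict, List, Optional

Request = dict
Response = dict

# A filter returns a Response to end the chain, or None to continue.
Filter = Callable[[Request], Optional[Response]]

# Registry mapping an extension name to a factory that builds a filter.
FILTER_FACTORIES: Dict[str, Callable[[dict], Filter]] = {}


def register(name: str):
    """Decorator that registers a filter factory under a name."""
    def decorator(factory):
        FILTER_FACTORIES[name] = factory
        return factory
    return decorator


@register("add_header")
def add_header_factory(config: dict) -> Filter:
    def apply(request: Request) -> Optional[Response]:
        request["headers"][config["key"]] = config["value"]
        return None  # let the next filter run
    return apply


@register("router")
def router_factory(config: dict) -> Filter:
    # Routing is "just another filter": it accepts the request and
    # produces the response, which ends the chain.
    def apply(request: Request) -> Optional[Response]:
        upstream = config["routes"].get(request["path"])
        if upstream is None:
            return {"status": 404, "body": "no route"}
        return {"status": 200, "body": f"proxied to {upstream}"}
    return apply


def build_chain(chain_config: List[dict]) -> List[Filter]:
    return [FILTER_FACTORIES[c["name"]](c["config"]) for c in chain_config]


def handle(chain: List[Filter], request: Request) -> Response:
    for f in chain:
        response = f(request)
        if response is not None:
            return response
    return {"status": 500, "body": "chain ended without a response"}


chain = build_chain([
    {"name": "add_header", "config": {"key": "x-team", "value": "resilience"}},
    {"name": "router", "config": {"routes": {"/users": "users-service"}}},
])
request = {"path": "/users", "headers": {}}
response = handle(chain, request)
print(response["status"], response["body"])
```

Since the chain is assembled purely from names and config, adding a new extension means registering one more factory, without touching `build_chain` or `handle`.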

[16:07] Justin: Does this have anything to do with the new contrib folder proposal, or is it completely different?

[16:15] Snow Pettersen: So the contrib folder is basically a proposed new way of structuring extensions, where instead of having basically core and extensions, we'd have core, extensions, and contrib, with contrib being a new collection of extensions that are not held to the same bar as core extensions in terms of coverage, making sure that it's been signed off by a core maintainer, and whatnot. The idea is that there are a lot of very useful extensions that people love proposing, but some of them show up as a 5,000-line PR and they're like, hey, I made this extension. It's super useful, we've been using it in production. But in order for us to get it into a state where we'll be okay putting it into core, it just takes so much time and effort that unless we have somebody willing to do that work, it just sits there. So contrib is a way for us to say, yes, we would happily take it, we're going to put it into contrib, which means that we'll do some due diligence, make sure it looks sane, and then we'll probably put it in there.

[17:24] Justin: I mean, yes, that's definitely going to help. It's kind of like the WordPress model where you have this plugin directory, and I could just see this taking Envoy probably to the next level if this proposal gets accepted and then built upon, because it has to be very discouraging for developers to come and send a pull request and then get it declined. It's not like you want to do that, it's just that you've got to find the time, and there are other stakeholders that are going to be affected by this inclusion in the core. So I think this is probably the best way to combat this issue that you're having.

[18:01] Snow Pettersen: I think it's going to help a lot. I think at the moment there are so many different paths towards a very extensible platform. The WebAssembly work that is being done is a little bit too immature, I think, for a lot of people to be willing to use it, but it's getting there. There's also a proposal around adding better support for cgo, so you could write a filter that calls through a cgo API into a real Go binary, as opposed to a Go WebAssembly extension. There's a lot of interest in making the platform more extensible, and so I think [Inaudible 18:40] will help quite a bit in the space of native extensions, but there are other things which I think will help a lot as well, like WebAssembly in particular. I think that has a lot of potential. The cgo one is also very interesting.

[18:54] Justin: I'm looking at the GitHub Envoy project and I see over 700 developers contributing code and patching, and so how do you guys manage the time all this takes, the effort and the time to navigate between the community users and developers? And I believe even without that, you would have the roadmap and the priorities already set out for the upcoming years. I mean, Envoy has its own roadmap. Let me put it this way: when someone comes to a live project, and Envoy is definitely one of those that right now is super cool, super popular, taking over cloud vendors. We had the projected Joshi from Google last week and we just talked about how Envoy is actually embedded within Google Cloud products, and we know Microsoft Azure and AWS do the same. So Envoy has its own roadmap, and even without community involvement, it will have its own tasks, projects, priorities, upcoming features and so on. Now we come in with our own ideas, some of them matching what is already on the roadmap, some really cool ideas, some of them less cool, probably. How do you guys prioritize, manage, and maintain these and find a balance between all of this? How many people within the community, if I may ask, how many of you guys are actually maintaining and monitoring and moderating all these processes?

[20:37] Snow Pettersen: We have maybe 10 or 12 maintainers, something around that. We have a weekly on-call rotation, where every week there is a new maintainer on call who is responsible for triaging issues and PRs and assigning them to a reviewer, and I think that works reasonably well. It's always tricky because sometimes there are very hard questions being asked and we don't know the answer, and that requires investigation time, but for a lot of them it's about finding the right person to tag on the issue, understanding who the domain experts on the different parts are, and just knowing who might have opinions on things. And then there's prioritizing the work. That's always tricky because, as you can imagine, most people working on it generally have their plates full. Smaller things you can often just get done. If somebody asks for a very small addition and I know exactly how to do it, I can easily go do it right there and then and submit a PR. But a lot of things are much larger, and then a lot of the time we're reliant on somebody from the community being able to step up and take ownership of implementing it, or on a maintainer or someone else who has some company-internal motivation for getting it done. That tends to be very helpful.

[22:03] For example, Google did a lot of work to reduce the footprint of stats because they had a lot of issues internally, as I understand it, with certain large deployments hitting memory issues. So for them it was easy to just show up one day and say, hey, we're going to rework how the stats subsystem works in order to improve things. We're seeing a lot of work now coming in from Google as well on implementing QUIC, because that's in their interest. If somebody shows up and they have a feature request around QUIC, whatever it is, something that might not be on the immediate roadmap, then given that there's already interest and there are a lot of people in the community working on it, it's a lot easier to prioritize that kind of work. But there are a lot of requests around stuff that none of the maintainers have a strong desire to work on, either via their company or personally, where you're basically reliant on the reporter or somebody else stepping up and saying, yeah, I'd like to implement this.

[23:12] Justin: So I hate to like go back a little, but I'm really interested in contrib. The reason I found it was I subscribed to the Envoy mailing list and I saw the Google doc base submitted and you see excitement in the Google doc. So I believe that this will just be a new chapter for Envoy and because we did have Anna Brandenburg on and she's very pro Envoy and in also running it very lean without too many extensions. So this whole new way of using extensions that might not be Google would never install and to their infrastructure, but many others might, what do you anticipate on extensions that will be coming out that can't make it to core? What is it that people want that they can't get right now?

[24:12] Snow Petterson:Small extensions have typically been a lot easier to get in. So, if somebody wants an extension that does like a very small modification [Inaudible 24:22]as long as it's somewhat generic, that it's okay. So I think the bigger thing will be like larger filters and extension. So, a great example here, I think are like protocol parsers that will like generate stats and was like statutory ones and also like ones that can give routing. So various protocols that aren't currently supported, I said, this is because I've reviewed some of the PRS to add support for other protocols. I helped align the support for suite keepers that generation, that's like a three, 4,000 line PR, which is fairly hard to get in because it definitely requires aligning interests between maintainers and contributors without requiring this maintainer sponsorship. I think it will be a lot easier for people to add in support for parsing data of various other protocols. So I think that that's probably a big one because I definitely seen a lot of issues where people asking about support for some protocols that I'd never heard about. Stuff in like Telecom and all those things where they want it to be able to have some kind of like understanding of the protocol and without a sponsor that's been really hard to get in.

[25:31] Justin: Are you talking about layer four protocols or higher ones?

[25:36] Snow Pettersen: I think they're all higher. Yes, I think they run on top of TCP. I think extensions implementing layer four protocols are a bit tricky, just because of where the extension points essentially lie today, but I'm sure there are people who will be interested in doing that too.

[25:52] Justin: But they can basically implement that, say, as a TCP filter, right?

[25:57] Snow Pettersen: Yeah, exactly.

[25:58] Justin: On top of TCP?

[25:59] Snow Pettersen: Yes, that's how the ZooKeeper filter is implemented. It's a TCP-level network filter that sits before the TCP proxy. So all it does is inspect the bytes as they flow through the proxy and parse the protocol, which then allows us to generate stats, I think mainly around which commands are used, before the data gets passed to the TCP proxy, which just does a standard TCP proxy of the data.
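
[Editor's note] The pattern Snow describes, a filter that parses traffic to produce per-command stats while handing every byte on unchanged, can be sketched in a few lines. This is only an illustration under loud assumptions: it is not Envoy's filter API and not ZooKeeper's real wire format. The framing (a 4-byte big-endian length followed by a payload whose first byte is a command code), the command table, and the names `SniffingFilter`, `on_data`, and `frame` are all hypothetical.

```python
import struct
from collections import Counter

# Hypothetical wire format, for illustration only (NOT ZooKeeper's real one):
# each frame is a 4-byte big-endian length followed by that many payload bytes,
# and the first payload byte is a command code.
COMMAND_NAMES = {1: "create", 2: "delete", 3: "get"}


class SniffingFilter:
    """Counts commands seen in a byte stream while passing the bytes through untouched."""

    def __init__(self) -> None:
        self.stats: Counter = Counter()
        self._buffer = b""

    def on_data(self, chunk: bytes) -> bytes:
        # Parse from a private copy; a frame may be split across several chunks.
        self._buffer += chunk
        while len(self._buffer) >= 4:
            (length,) = struct.unpack(">I", self._buffer[:4])
            if len(self._buffer) < 4 + length:
                break  # incomplete frame, wait for more data
            payload = self._buffer[4:4 + length]
            if payload:
                self.stats[COMMAND_NAMES.get(payload[0], "unknown")] += 1
            self._buffer = self._buffer[4 + length:]
        # The traffic itself is never modified; the next stage (standing in for
        # the TCP proxy) receives exactly the bytes that arrived.
        return chunk


def frame(command: int, body: bytes = b"") -> bytes:
    """Builds one frame in the hypothetical format above."""
    payload = bytes([command]) + body
    return struct.pack(">I", len(payload)) + payload


if __name__ == "__main__":
    sniffer = SniffingFilter()
    stream = frame(1, b"/a") + frame(3, b"/a") + frame(3, b"/b")
    # Feed the stream in two arbitrary pieces to mimic TCP segmentation.
    forwarded = sniffer.on_data(stream[:5]) + sniffer.on_data(stream[5:])
    print(forwarded == stream, dict(sniffer.stats))
```

The choice worth noting is that parsing happens on a side buffer, so stats generation cannot alter or delay what the downstream proxy stage receives.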

[26:31] Justin: None of this is considered sidecar bloating? This is just...

[26:36] Snow Pettersen: Yes, you're beefing up your sidecar.

[26:38] Tzury: Okay. Yes, I like that. What was, I would say, the most obscure, unexpected use of Envoy in production that you came across, where you'd say, oh, we never imagined that people would use Envoy that way or for this purpose, and then, oh, that's awesome, look at people's imagination?

[26:58] Snow Pettersen: I think there are some cases I've heard of where people are running it inside of vehicles. I don't think it's fully autonomous, but it's part of some system. I've definitely heard of cases of distributed systems being operated within some kind of vehicle. I forget the details, I'm sorry.

[27:17] Justin: All good. Like autonomous, like in the car.

[27:20] Snow Pettersen: Yes. But if you really think about it, it's also not that weird because I'm sure they used to have some microservices running and they need to connect them somehow. So they're probably running Kubernetes too, I don't know.

[27:32] Justin: Like a Raspberry Pi stack.

[27:34] Tzury: Well, why would you have microservices inside your Tesla? How many services are running within Tesla, for example?

[27:41] Snow Pettersen: [Inaudible 27:42] I'm sure at some point you get the same problems in that space too, where you want one team to work on these components and they want to be able to have a nice contract so that they can make changes to their system without having to affect the other ones. So then yes, now you're talking about API contracts between components and you're not too far away from the microservice architecture. Maybe you're not going all over the internet but...

[28:09] Justin: They are probably using a portable data center, I would say, a mini data center, right? Interesting. So if I'm asking you, over the years, in your roles in Envoy, how much of your time actually has to do with writing new code versus reviewing others' code versus answering emails and filing or responding to issues on GitHub, which is equivalent to emails and communication in general?

[28:41] Snow Pettersen: There was a period of time around when I started being a maintainer, and a bit before it, when I was writing a lot of code, just because, again, I think it aligned very well with what my company needed at the time. Now, over time I've just gotten review ownership of more and more code and been brought into more and more "hey, you know how this works, so can you chime in?" So I've definitely drifted away more towards the side of just communication. It's always nice to get some code written every now and then, but there's so much other stuff that happens that I always have to be careful about making myself the blocker for code landing.

[29:21] Justin: Looking at your GitHub profile, 72% of your time is on code review, 12% on commits and 16% on pull requests. So, you're doing quite a lot of review.

[29:33] Snow Pettersen: Yes. For my six-month review at work I pulled my stats, and I think I had three or four hundred reviews, which is the number of times I hit "submit review" or whatever over six months. So it's quite a bit.

[29:51] Tzury: What are the golden rules for code review that you would share with the public when you review code? What are the do's and the don'ts?

[30:01] Snow Pettersen: One really important part is to assume that the person who wrote the code has good intentions and that they're doing their best. So be respectful in that way, and if they do something wrong, tell them gently. There's no reason to be mean. So tell them gently, explain what's happening, ask questions, and don't be too arrogant. I think I see a trend sometimes where you see comments like, "This doesn't make any sense at all, why are you doing it like this?" And sometimes they're wrong, because the thing the author did is perfectly reasonable and the reviewer just missed the context. Even if you're the supposed expert, that doesn't mean you know everything. So I'll take a much more cautious approach, where if there's something that doesn't make sense to me, I'll ask them why this was done and try to understand it, and present it like that. I think showing that kind of respect for the people who wrote the code is very helpful because it helps the PR you're working on to proceed smoothly. But it also means that they're more likely to come back because they've had a good experience. So I think it's just about creating a very inclusive environment. Let's see, there's a running thing in Envoy PR reviews where the maintainers, 99% of the time, include the word "thanks" when they approve a PR. You always have to show your appreciation.

[31:30] Justin: No, it goes a long way. It really does because it's really hard to get into someone's head on the other side and you'd be like, are they mad at me? Are they annoyed? So yes, that definitely goes a long way.

[31:41] Snow Pettersen: I thought this was super helpful when I was ramping up on the project as well. Sometimes a PR drags on for a long time, and it helps to include a "hey, thanks for iterating on this, this will be great." Just making sure that people are encouraged to keep working on it because yes, like you said, it's hard to communicate your own state of mind through GitHub issues or PRs. So including a very clear "hey, I'm very happy that you're doing this work," I think that's very helpful.

[32:15] Justin: Definitely. I mean, the emoticons, those are cool, but sometimes people just like to do a thumbs down and that's just like a burn, but I think overall it definitely goes a long way, no doubt about it. It's very important that people do communicate their gratitude, and it would actually be a really interesting college thesis to see, across projects, what the language is like, thanks versus not thanks, and how healthy the project is in terms of adoption. I don't know, if we have a listener that wants to do that, that'd be really interesting and I'm sure, Snow, it would help you out. Anyway, it was so great having you on. We're going to do a little after show, but before we go on to that next phase, how can people find you online? Is it Twitter? Is it GitHub? Where can people find Snow?

[33:10] Snow Pettersen: I'm on Twitter, I think, what am I, @snowypeas? And my GitHub is snow peas; there are some links there to find me as well.

[33:20] Justin: Yes, it will be in the show notes for sure. And is there anyone, specifically in the Envoy community, that you'd like to give a shout-out to so they can be like, oh my God, I was mentioned on the podcast?

[33:31] Snow Pettersen: The entire Envoy community, thank you. Thank you for everything.

[33:35] Justin: Awesome.

[33:36] Outro: Listeners, I hope you enjoyed this one. Do tune in next time, we're really excited about our line-up of guests. We have super exciting guests next week as well. Check out the show notes for this podcast at podcast.curiefense.io. That's C U R I E F E N S E, podcast.curiefense.io, for the Committing to Cloud Native podcast. Thanks again for listening, tune in next week, catch you later.

Special Guest: Snow Pettersen.
