Mastering The Dark Side: Keysight's CIO Talks User Experience And Digital Transformation In IT
A lot has changed over the last few years. Remote work is here to stay, and hybrid architectures are shaking things up. But despite all that upheaval, one constant remains: user experience is as important as ever. That's why we sat down with Dan Krantz, Keysight Technologies' chief information officer, to get his thoughts on monitoring distributed networks, leading network operations teams, and the evolving role IT plays in digital transformation.
Organizationally, How Much Importance Do You Place On User Experience? In Your Opinion, How Much Effect Does It Have On The Business As A Whole?
We put a lot of focus on user experience in IT. In fact, two years ago, I started an employee recognition program called the Smarter IT Awards, where we hand out little Yoda statues because we're "battling the Dark Side" of bad user experience. The goal is to incentivize people to come up with new, Yoda-worthy ways to improve user experience through fewer clicks, faster response times, or clever automation. We have both a peer recognition award and one selected by management, and I always highlight the winners at my quarterly all-hands meetings. Then, at the end of the year, I present a Jedi Master award to someone who is not only battling the Dark Side of bad user experience but also teaching others to do so.
User experience is so important, in my opinion, because bad user experience leads to wasted time. Time is the one resource we share with our competitors, and none of us can get more of it. So, the more time I can get people in Keysight to spend doing something productive, the more competitive we're going to be. Conversely, if people are less productive because they're wasting time on bad user experience or bad IT interactions, we're just going to be less competitive. I always tell people on my team that it's not really about people liking the systems and applications, or even liking IT, for that matter. If something looks and feels really cool but takes just as long to navigate through, it's not worthwhile. For me, the business value of user experience is all about time compression and how much faster someone can get something done.
For example, as we help Keysight achieve its own digital transformation goals, our IT department is undergoing its own transformation. Whether an employee has an issue with a PC, wants to order a headset, or needs a user account on a reporting platform, we want to automate every one of those requests and fulfill them through an artificial intelligence (AI) engine. That way, if you put in a request and approval is needed, it’s automatically routed to your manager who simply needs to say “yes” or “no” before the action is autonomously fulfilled on the back end. Instead of taking days, it’ll take hours or minutes. It’s a real game changer. It’s going to require everyone in IT to tackle things differently, automate a lot of stuff, and essentially turn us all into software engineers.
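To make that idea concrete, here is a minimal sketch of what such a request, approval, and fulfillment flow could look like. The Request class and the route_to_manager and fulfill_request functions are hypothetical stand-ins for illustration only, not Keysight's actual engine.

```python
# Illustrative sketch only: a simplified request -> approval -> fulfillment flow.
# route_to_manager and fulfill_request are hypothetical stand-ins, not a real platform.
from dataclasses import dataclass


@dataclass
class Request:
    requester: str
    item: str                 # e.g. "headset" or "reporting-platform account"
    needs_approval: bool
    status: str = "pending"


def route_to_manager(req: Request) -> bool:
    """Stand-in for notifying the manager and waiting on a yes/no decision."""
    print(f"Routing '{req.item}' request from {req.requester} to their manager...")
    return True  # assume "yes" for the sketch


def fulfill_request(req: Request) -> None:
    """Stand-in for the back-end automation that actually provisions the item."""
    req.status = "fulfilled"
    print(f"Fulfilled '{req.item}' for {req.requester}")


def handle(req: Request) -> None:
    # Route for approval only when policy requires it; everything else is
    # fulfilled autonomously, turning days into minutes.
    if req.needs_approval and not route_to_manager(req):
        req.status = "rejected"
        return
    fulfill_request(req)


handle(Request("j.doe", "reporting-platform account", needs_approval=True))
```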
What Are Some Of The Most Common Challenges You've Experienced With Network Monitoring And Maintaining Consistent Quality Of Service?
The biggest challenge with network monitoring is trying to go beyond the basics. We are emerging from a place where we were very human-driven, relying on our outsourced partner to manually monitor and react to each switch, each router, and each circuit in our global network. To scale our growth, we've moved this in-house and begun automating. We want our systems and tools to inform us when there's an issue and, ideally, self-correct if they can. It's been hard enough to set up the basic, automated up/down tracking, but I really want to go beyond that. I want to measure performance at the raw level, like throughput in megabits per second or latency, and measure the real user experience.
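As an illustration of the gap between basic up/down tracking and raw performance measurement, here is a minimal Python sketch that times a TCP connection. The hostnames and the 200 ms threshold are placeholder assumptions, not Keysight's real monitoring configuration.

```python
# Minimal sketch: moving beyond up/down checks to a raw latency measurement.
# Hostnames and the 200 ms threshold are placeholders.
import socket
import time
from typing import Optional


def tcp_latency_ms(host: str, port: int = 443, timeout: float = 3.0) -> Optional[float]:
    """Return the TCP connect time in milliseconds, or None if the host is unreachable."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.perf_counter() - start) * 1000.0
    except OSError:
        return None  # this is the basic up/down signal


for target in ["intranet.example.com", "portal.example.com"]:  # placeholder hosts
    latency = tcp_latency_ms(target)
    if latency is None:
        print(f"{target}: DOWN")
    elif latency > 200:
        print(f"{target}: up, but slow ({latency:.0f} ms)")
    else:
        print(f"{target}: healthy ({latency:.0f} ms)")
```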
The other challenge is filtering out the noise, like false alarms. Now that we've moved past the human approach to a systems-based approach to monitoring, you have to get things configured properly. Otherwise, you can easily drown in meaningless alerts and have to fall back on people making sense of all the data that's coming in.
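One common way to cut that noise, sketched below under the assumption of a simple consecutive-failure rule, is to alert only after several failed checks in a row. The threshold is illustrative; real systems layer on deduplication, suppression windows, and correlation as well.

```python
# Sketch of one simple noise filter: alert only after several consecutive
# failures, so a single blip doesn't page anyone. The threshold is illustrative.
from collections import defaultdict

CONSECUTIVE_FAILURES_REQUIRED = 3
_failure_counts: dict = defaultdict(int)


def should_alert(device: str, check_ok: bool) -> bool:
    """Suppress one-off failures; fire exactly once when the streak hits the threshold."""
    if check_ok:
        _failure_counts[device] = 0
        return False
    _failure_counts[device] += 1
    return _failure_counts[device] == CONSECUTIVE_FAILURES_REQUIRED
```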
In Your Opinion, How Do Synthetic Monitoring Tools Fit Into A NetOps Tool Stack? Do They Work Well In Concert With More Passive Tools, Like Packet-Fed Application And Network Performance Management Platforms?
Synthetic monitoring tools are where an organization needs to be. You want to get to the stage where you can really see the true user experience, but you need to make sure you have your basic, table-stakes monitoring in place first. Network monitoring is a stack, and synthetic monitoring tools are the next layer up. If you don't have a solid foundation, like basic availability monitoring, who cares? You've got to get the basics right first and then implement synthetic monitoring for insights on user experience.
The challenge I see with synthetic monitoring tools is our new work-from-anywhere / cloud-connect-to-anywhere mode. As employees, we might be accessing systems on our corporate network, such as our data center, or maybe it's in a lab environment or systems in a manufacturing line. We might also need to access something in the cloud or a software-as-a-service (SaaS) application. But then you take your laptop, leave the office, and need to work from home for a while. Instrumenting synthetic monitoring there gets tricky because you need it on the endpoints. The endpoint is what users take with them whenever they connect to access things on the corporate network, in the public cloud, or through SaaS applications. The one constant is the endpoint. You need monitoring beyond your corporate network if you really want full resolution into the user experience.
For example, when everyone was working from home last year, we were fielding lots of escalations about individual applications not performing well. But the problem wasn't our network; it was employees' internet service providers (ISPs) or home Wi-Fi. One time, we were trying to play a video in an important virtual meeting, and it started glitching and freezing up. We'd tested it multiple times, but it turns out the person who was trying to play the video had suffered an internet outage in their neighborhood right at the same time. That's the real challenge right there. Even as we go back into the office, I think the idea of working anywhere is here to stay. So how can we use synthetic monitoring to understand the "real" user experience when people are transiting across all of these networks that we don't own?
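A rough sketch of that endpoint-side idea follows: probe corporate, cloud, and SaaS targets from the endpoint itself and use the pattern of failures to separate a local ISP or Wi-Fi problem from a single misbehaving service. All target URLs here are placeholders, not Keysight's actual systems.

```python
# Sketch of an endpoint-side synthetic check. Because the endpoint is the one
# constant, it probes corporate, cloud, and SaaS targets and tries to tell a
# local ISP or Wi-Fi problem apart from a single misbehaving service.
# All target URLs are placeholders.
import time
import urllib.request
from typing import Optional

TARGETS = {
    "corporate": "https://intranet.example.com/health",
    "cloud": "https://app.example-cloud.com/health",
    "saas": "https://example-saas.com/status",
}


def probe_ms(url: str, timeout: float = 5.0) -> Optional[float]:
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return (time.perf_counter() - start) * 1000.0
    except OSError:
        return None


results = {name: probe_ms(url) for name, url in TARGETS.items()}
failures = [name for name, ms in results.items() if ms is None]

if len(failures) == len(TARGETS):
    print("Everything is unreachable from this endpoint: likely local Wi-Fi or ISP trouble")
elif failures:
    print(f"Only {failures} failing: likely a problem on the service side")
else:
    print({name: f"{ms:.0f} ms" for name, ms in results.items()})
```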
With The Proliferation Of Things Like SD-WAN, Edge Computing, Virtualization, And Cloud, What Challenges Do You Foresee In Monitoring Increasingly Hybridized Networks? How Do You Think Organizations Will Evolve To Meet These New Challenges?
Since so many companies are stitching together various network providers, I'd love to see those providers start instrumenting their networks with APIs [application programming interfaces]. That way, companies like Keysight could tap into them to get insights and visibility into what's going on. Rather than just going to an ISP and seeing the standard red, yellow, and green network statuses, it would be really cool if they enabled companies like us to tap into the APIs of their network monitoring tools, so we could federate performance data across the ecosystem. Even if it were monetized, we might be willing to pay all the ISPs our employees use to tap into their APIs to see the performance of their networks. That way, we could better serve our employees who rely on those ISPs to do company work.
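Purely as a thought experiment, a federated view might look something like the sketch below. No such standard ISP monitoring API exists today; the endpoints and response fields are invented solely to illustrate the idea.

```python
# Entirely hypothetical sketch: the URLs and JSON fields below are invented to
# illustrate what federating ISP performance data might look like if providers
# exposed monitoring APIs. No such standard API exists today.
import json
import urllib.request

ISP_ENDPOINTS = {
    "isp-a": "https://api.isp-a.example/v1/regions/home-office/performance",
    "isp-b": "https://api.isp-b.example/v1/regions/home-office/performance",
}


def fetch_metrics(url: str) -> dict:
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


# Pull each provider's latency and loss figures into one federated view.
for isp, url in ISP_ENDPOINTS.items():
    metrics = fetch_metrics(url)  # imagined schema: {"latency_ms": ..., "packet_loss_pct": ...}
    print(isp, metrics.get("latency_ms"), metrics.get("packet_loss_pct"))
```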
The other piece of the puzzle is the endpoint. Now we have to toy with the endpoint agent side of things. That's one of the ways we are getting at this problem here at Keysight. We're putting a small, unobtrusive AI agent onto all computers that tries to anticipate issues before users observe them. That way, we can utilize the agent's machine learning algorithms to predict problems before they occur, including in the network space, widening our aperture into what's going on from a performance perspective. In other words, we're coming at network performance from the endpoint versus from the network switch or infrastructure. Since we no longer own the physical network in all cases, we have to come at it differently.
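Stripped to its simplest form, that kind of endpoint anomaly flagging might look like the toy sketch below: compare the latest metric sample against a rolling baseline and flag large deviations. A production agent would use far richer telemetry and real machine learning models; this only shows the shape of the idea.

```python
# Toy sketch of the endpoint-agent idea: flag a metric sample that sits far
# outside the recent baseline, before a user would necessarily notice.
from statistics import mean, stdev
from typing import List


def looks_anomalous(history: List[float], latest: float, sigma: float = 3.0) -> bool:
    """Flag the latest sample if it is far above the recent baseline."""
    if len(history) < 10:
        return False  # not enough data for a baseline yet
    mu, sd = mean(history), stdev(history)
    return sd > 0 and (latest - mu) > sigma * sd


latency_samples = [22.0, 25.0, 24.0, 23.0, 26.0, 25.0, 24.0, 23.0, 22.0, 24.0]
print(looks_anomalous(latency_samples, 27.0))   # False: within the normal range
print(looks_anomalous(latency_samples, 180.0))  # True: something is brewing
```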
If You Could Give A Network Operations Team Any Piece Of Advice, What Would It Be?
To run a global network, where you have interconnected switching, routing, and fabric at each site, the only way to make it scale and run effectively is to start with consistency and standardization. For example, when Keysight acquires a company, we typically rip out whatever network they had and put in our network. Even if it works fine or better than ours, it's not the same. And for us to be able to scale and run a global network, it needs to have the exact same models of network switches and routers as ours so we can standardize changes. That way, when we're going to make a small update and push it out, hundreds of devices are all going to take that change at scale. That's the way to run a network: making sure there's consistency and standardization across the board.
Then, when you have all that, you can start to build automation. So, instead of manually making changes, which can ruin a network engineer's weekend by forcing them to log on to hundreds of network switches and update each one individually, you need to have a way to just press a button and push out the change, so it propagates automatically. At the same time, that automation needs to be smart, so you have a way to roll things back if there's a mistake. I've seen issues where a company rolled out an application change that caused outages on our network. However, that same company showed us how, when they push out code changes, they don't just send it out to everyone at the same time. This particular application has over 130 million users at any given time, so they send out updates in what they call "rings." The change first goes to ring 1, then ring 2, ring 3, and starts to scale out. That way, if they detect a problem, they can contract it back in reverse order.
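The ring pattern can be sketched in a few lines. The device groups and the push_config, health_check, and rollback_config callables below are hypothetical stand-ins for whatever automation tooling is actually in place; the point is pushing ring by ring and contracting back in reverse order on failure.

```python
# Sketch of a ring-based rollout with automatic rollback. Device groups and the
# push_config / health_check / rollback_config callables are hypothetical stand-ins.
RINGS = [
    ["lab-sw-01"],                         # ring 1: a small canary group
    ["site-a-sw-01", "site-a-sw-02"],      # ring 2: one production site
    ["site-b-sw-01", "site-c-sw-01"],      # ring 3: the wider fleet
]


def deploy(change, push_config, health_check, rollback_config) -> bool:
    """Push the change ring by ring; contract back in reverse order on failure."""
    applied = []
    for ring in RINGS:
        for device in ring:
            push_config(device, change)
            applied.append(device)
        if not all(health_check(device) for device in ring):
            for device in reversed(applied):
                rollback_config(device, change)
            return False  # change pulled back before it reached the whole fleet
    return True
```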
Lastly, make sure you treat all your tools, whether it's network monitoring, automation, or whatever, with the same importance you place on the network fabric itself. Some teams have a habit of campaigning to get management to buy them new network monitoring tools. But once we've forked over the money and implemented them, the team doesn't take care of them. The tools start to erode in their effectiveness and lose their value. You've got to treat them just like you would a top-of-stack network switch that's of critical importance to your on-site network. You have to maintain your tools. You need to be updating them, patching them, and continuously making them better and better. Your network monitoring and automation tools are no different than any production component of your network. Make sure you don't shortchange them.