A lot has already been written about the persistent tension between operations and security. The security team’s mission is protecting the business from malicious activity, and that sometimes means locking systems down. The operations team’s mission is to maximize the business’s ability to operate on its IT systems, including managing software and configurations.
Then, of course, there is the user experience. Have the security tooling and other changes consumed so many system resources that users can’t perform their jobs? Is memory maxed out? Are applications crashing? You need to have a way to measure user experience to answer these questions.
Tracking user experience
When something breaks, how do you know? Change control is great, but you need a way to measure the impact of the changes that have been made. Let’s say you’ve closed 10 vulnerabilities on your endpoints. Are your applications crashing? Have your systems started using more resources? Do you have more systems running at 100% CPU usage than you did before? A system with no resources to spare means an employee is being prevented from doing their job.
This is where you need analytics. You can’t depend solely on users for timely, reliable information.
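As a minimal sketch of that kind of measurement, the Python below compares endpoint samples taken before and after a change and counts saturated systems and application crashes. The `Snapshot` fields, host names, and sample values are hypothetical, chosen only for illustration; a real deployment would pull these numbers from whatever endpoint telemetry it already collects.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Hypothetical per-endpoint sample; field names are illustrative."""
    host: str
    cpu_percent: float
    app_crashes: int


def change_impact(before, after, cpu_limit=100.0):
    """Summarize saturated systems and crash counts before and after a change."""
    def saturated(samples):
        # Count endpoints whose CPU usage is at or above the limit.
        return sum(1 for s in samples if s.cpu_percent >= cpu_limit)

    def crashes(samples):
        # Total application crashes reported across all endpoints.
        return sum(s.app_crashes for s in samples)

    return {
        "saturated_before": saturated(before),
        "saturated_after": saturated(after),
        "crashes_before": crashes(before),
        "crashes_after": crashes(after),
    }


# Made-up samples for three endpoints, before and after a round of fixes.
before = [Snapshot("pc-01", 40.0, 0), Snapshot("pc-02", 65.0, 1), Snapshot("pc-03", 100.0, 0)]
after = [Snapshot("pc-01", 55.0, 0), Snapshot("pc-02", 100.0, 3), Snapshot("pc-03", 100.0, 1)]

impact = change_impact(before, after)
print(impact)
```

If the "after" counts climb, the change made the user experience worse, whether or not anyone has opened a ticket.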
Analytics and the user experience
To take some of the burden off the service desk, many large organizations simply give all users admin rights. They resort to that because they don’t have a way to identify systems ahead of time that will generate problems.
These organizations have no way to measure resource utilization on user devices, even though it’s measured routinely on servers. So, they have no clue what the user experience is. They have no data except, “Has anybody opened a ticket?”
Performance metrics are a subset of IT analytics and they’re critical. When the security team wants to install more agents, operations can show that user systems are already running at 75% of maximum capacity. Add those new tools and users won’t be able to work. Those are the analytics that support business decisions.
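The capacity argument above comes down to simple arithmetic, sketched here in Python. The 90% ceiling, the 20% cost of the new agents, and the fleet figures are assumptions made up for this example; only the 75% utilization figure comes from the scenario described above.

```python
def fits_budget(current_percent, agent_cost_percent, ceiling_percent=90.0):
    """Return True if adding the agent keeps utilization at or under the ceiling."""
    return current_percent + agent_cost_percent <= ceiling_percent


def share_over_budget(fleet_usage, agent_cost_percent, ceiling_percent=90.0):
    """Percentage of endpoints the new agent would push past the ceiling."""
    if not fleet_usage:
        return 0.0
    over = sum(
        1 for usage in fleet_usage
        if not fits_budget(usage, agent_cost_percent, ceiling_percent)
    )
    return 100.0 * over / len(fleet_usage)


# Hypothetical utilization for four user systems, as a percentage of capacity.
fleet = [75.0, 75.0, 60.0, 82.0]

# Assumed cost of the proposed security agents: 20 points of utilization.
share = share_over_budget(fleet, agent_cost_percent=20.0)
print(f"{share:.0f}% of systems would exceed the ceiling")
```

A number like this gives operations something concrete to bring to the conversation with security, rather than a general worry that systems are already busy.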
Cyber hygiene and analytics for the C-level
When it comes to cyber hygiene, the primary question of C-level executives is “Can my users do their jobs?” Many IT decisions are based on the risk of IT systems getting in the way of employees being able to work. But making those decisions without supporting data leads to trouble.
This is where executive-level dashboards can make a huge difference. Easily consumable metrics can help execs figure out at a glance where to draw the line between security and operational risk.
For example, if a key indicator shows that 20% of organizational systems are missing critical patches, that’s normally cause for concern. However, if the dashboard shows that last month the figure was 50%, the trend is at least headed in the right direction. That’s certainly something to keep an eye on month over month to ensure the trend continues improving.
At the same time, if the performance monitoring indicator displays “green,” indicating minimal outages, the executive can see at a glance that risk was reduced this month without sacrificing system performance.
Here are three key indicators an executive dashboard might include:
Percentage of systems with baseline security tooling
Percentage of systems vulnerable because of missing patches
Percentage of systems performing above or below a defined performance threshold (CPU, RAM, disk utilization, etc.)
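The three indicators above could be computed from an endpoint inventory along these lines. This is a sketch under stated assumptions: the `Endpoint` record, its field names, the 90% threshold, and the sample fleet are all hypothetical, and a real dashboard would define its thresholds according to the organization’s approved standards.

```python
from dataclasses import dataclass


@dataclass
class Endpoint:
    """Hypothetical inventory record; field names are illustrative."""
    host: str
    has_baseline_tooling: bool
    missing_critical_patches: int
    cpu_percent: float
    ram_percent: float
    disk_percent: float


def pct(count, total):
    """Percentage rounded to one decimal place; 0.0 for an empty fleet."""
    return round(100.0 * count / total, 1) if total else 0.0


def dashboard(endpoints, threshold=90.0):
    """Compute three summary indicators for an executive view."""
    total = len(endpoints)
    tooled = sum(1 for e in endpoints if e.has_baseline_tooling)
    unpatched = sum(1 for e in endpoints if e.missing_critical_patches > 0)
    # An endpoint counts as strained if any one resource exceeds the threshold.
    strained = sum(
        1 for e in endpoints
        if max(e.cpu_percent, e.ram_percent, e.disk_percent) > threshold
    )
    return {
        "baseline_tooling_pct": pct(tooled, total),
        "missing_patches_pct": pct(unpatched, total),
        "over_threshold_pct": pct(strained, total),
    }


# Made-up fleet of four endpoints.
fleet = [
    Endpoint("pc-01", True, 0, 35.0, 60.0, 70.0),
    Endpoint("pc-02", True, 2, 95.0, 80.0, 50.0),
    Endpoint("pc-03", False, 0, 20.0, 40.0, 97.0),
    Endpoint("pc-04", True, 0, 50.0, 55.0, 45.0),
]

summary = dashboard(fleet)
print(summary)
```

Storing each month’s summary alongside the last makes the month-over-month trend visible without exposing executives to per-system detail.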
If there’s a problem at the summary level, executives can alert their IT teams to dig into it. They don’t need to know the details; they just need to know that approved standards are not being met.
The importance of fresh data
When an issue arises requiring intervention, it’s critical that engineers have access to real-time data on all their systems in one place. Without it, they’re forced to spot check systems or wait until they get the next scheduled report. They end up not knowing what’s accurate and what isn’t.
If you’re doing it right, the engineering team should always know before leadership does. Ideally, before an issue hits the executive dashboard, it’s resolved.
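One simple way to tell whether the data is fresh enough to act on is to track when each endpoint last reported in. The sketch below flags endpoints whose latest report is older than a chosen limit; the 15-minute limit, the host names, and the timestamps are assumptions for illustration only.

```python
from datetime import datetime, timedelta, timezone


def stale_hosts(last_seen, now, max_age=timedelta(minutes=15)):
    """Return the hosts whose most recent report is older than max_age."""
    return sorted(host for host, seen in last_seen.items() if now - seen > max_age)


# Fixed reference time so the example is repeatable; the values are made up.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
last_seen = {
    "pc-01": now - timedelta(minutes=2),
    "pc-02": now - timedelta(hours=26),
    "pc-03": now - timedelta(minutes=40),
}

stale = stale_hosts(last_seen, now)
print(stale)
```

Any host on that list is one where engineers would be working from old information, which is exactly the situation that forces spot checks.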
How did the move to a remote workforce affect the practice of cyber hygiene?
A lot of companies lost a minimum of six months adjusting to life with a distributed workforce. The information that IT executives needed to make qualified business decisions disappeared overnight. When 90% of the workforce went remote, the companies with great on-premises tools lost visibility into everyone working from home.
They couldn’t get data from, update, or even see endpoints that weren’t connected 24×7 to the corporate network. Companies that couldn’t connect with endpoints over the Internet lost the ability to gather endpoint data and understand their state. So, from an analytics and decision-making perspective, they were forced to guess.
Cyber hygiene, Zero Trust, and the remote workforce
When the pandemic hit, many companies couldn’t provide desktops or laptops for everyone, so they effectively said, “Use your own device and we’ll deal with the consequences later.” In some cases, critical patches were missed because organizations had no way to patch remotely.
Without tough decisions like that, people could not have worked and the business could not have functioned. So, this was the opposite of Zero Trust. It was blind trust, and hope for the best.
Without good cyber hygiene, there’s no moving to Zero Trust. With poor IT hygiene, Zero Trust can bring your operations to a grinding halt because nothing will be trusted.
A large number of users and devices will fall into the “don’t trust” category. So, before companies purchase and try to implement a Zero Trust solution, they need to get the basics of cyber hygiene right.