Business — Improving Cybersecurity:  1st Draft of a new way to put out the dumpster fire.


This post is long and dense.  But I hope … and I believe … it’s worth your time if your professional success depends on cybersecurity.


Cybersecurity … or, more accurately, cyber-insecurity … is worsening.


Threats and attacks are increasing, not subsiding.  Vendors continue to spew out products and services that are laden with vulnerabilities (aka cybersecurity quality defects) and are too difficult for mere mortals to securely install, configure, and operate. 


The impact of attacks is growing in both severity and scope. 


The economic impact is enormous.  The well-being of our businesses and our lives is at growing risk.


5G, IoT, and AI will change the attack surface in ways that will exacerbate the situation. 


Clearly, the current cybersecurity regime is not getting us where we need to be … and where we can be.  If I were a senior executive in any business, I’d be desperate for an answer.  Well, here you go; this is what I’d do and I bet it will work.


A … if not the …  impediment to improvement is the absence of an overall quantitative measure of cybersecurity performance … an architecture organizing and linking economic impact, useful operational performance metrics, and best-practice usage metrics … and a performance management system.  Caveat:  Perhaps such a thing exists but I haven’t seen it.  I’ve tried these ideas on a couple of cybersecurity experts and there was no uptake, i.e., maybe I’m wrong.


But I doubt it.  I’m very confident that the approach is implementable and can yield significant improvements in cybersecurity.  Why?  Because I have used a similar approach to deliver dramatic improvements in performance … financial, Customer satisfaction, and operations … in a variety of very different contexts:  product businesses, service businesses, network performance, and national security operations.  I can think of no reason why the same approach wouldn’t improve cybersecurity. 


I also know that these thoughts will change and improve as they are applied in the real world, as we learn by doing.  That’s why I call it a 1st draft.


What follows is how to do it.  I would love to work with a business unit or company that wants to give it a serious try.


Cybersecurity Financial Impact


The most useful top metric is the (estimated) Financial Impact due to actual cybersecurity events.  It’s easy to understand.  It’s in the language the Board and senior executives use.  While it’s retrospective, so are financial results.  Here’s how you could calculate it:

      • Revenue lost (e.g., refunds to Customers because a service wasn’t available or lost sales because an on-line ordering or CRM system wasn’t available).

      • Additional cost (e.g., recovering a database, replacing infected hardware, buying identity monitoring services for Customers impacted by a data breach).

      • Wasted cost (e.g., lost productive hours of your employees because they can’t use the tools and data they work with).  Anecdote:  I once analyzed IT outages and impairments at a work center with over a thousand people.  I used trouble tickets and MTTR.  On average, each of those thousand people couldn’t use all their tools and/or IT services for over 6 hours a week, almost a full workday every week!

      • Asset losses (e.g., the value of stolen intellectual property).

          Implementation tip:  Someone from the CFO team must be part of your Cybersecurity metrics effort, right from the start.  They need to develop a process for quantifying the Financial Impact using a consistent approach and consistent assumptions.  When you report results, this will assure you have the CFO’s validation that they stand behind the financial impact reported.
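To make the four categories above concrete, here is a minimal Python sketch of how the roll-up could work.  The class name, field names, the loaded hourly rate, and all dollar figures are my illustrative assumptions, not a standard; your CFO team would supply the real process and numbers.

```python
from dataclasses import dataclass


@dataclass
class IncidentCost:
    """Estimated cost of one cybersecurity event, in dollars (hypothetical structure)."""
    revenue_lost: float = 0.0     # refunds to Customers, lost sales
    additional_cost: float = 0.0  # recovery work, replacement hardware, monitoring services
    wasted_cost: float = 0.0      # lost productive hours, already converted to dollars
    asset_losses: float = 0.0     # e.g., value of stolen intellectual property

    def total(self) -> float:
        return (self.revenue_lost + self.additional_cost
                + self.wasted_cost + self.asset_losses)


def wasted_labor_cost(people: int, hours_lost_each: float, loaded_hourly_rate: float) -> float:
    """Convert lost productive hours to dollars using an assumed loaded hourly rate."""
    return people * hours_lost_each * loaded_hourly_rate


def financial_impact(incidents: list[IncidentCost]) -> float:
    """Cybersecurity Financial Impact for a reporting period: the sum over all events."""
    return sum(incident.total() for incident in incidents)


# Made-up numbers, for illustration only.
outage = IncidentCost(
    revenue_lost=20_000.0,
    wasted_cost=wasted_labor_cost(people=1_000, hours_lost_each=6, loaded_hourly_rate=50.0),
)
breach = IncidentCost(additional_cost=75_000.0, asset_losses=10_000.0)
print(financial_impact([outage, breach]))  # 405000.0
```

The point of the structure is consistency:  every event gets priced with the same four buckets and the same assumptions, so results are comparable from one period to the next.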


          Cybersecurity Primary Operational Metric — DPM


          I really, really like something analogous to the Network Reliability DPM (defects per million) metric now commonly used to measure the reliability of telecom and IT services.  (All credit to AT&T’s Frank Ianna and Art Deacon for that break-thru idea and its implementation.)


          For Network Reliability, a defect is defined as 1 minute of outage for a User (a Customer or an employee).  Using the word “defect” is a subtle but important way of framing downtime as a quality defect.


          How do you calculate it?  If a User had 1 minute of outage on a service in a 24-hour period (1,440 minutes), that would be 694 DPM (1 million times 1 outage minute divided by 1440). 


          High quality telecom networks run in the 50-200 DPM range.  100 DPM is 99.99% uptime … 4 nines.  100 DPM is about 1 minute of down time in a week.
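The arithmetic above is easy to check.  Here is a small Python sketch; the function names are mine, chosen for illustration.

```python
MINUTES_PER_DAY = 24 * 60            # 1,440
MINUTES_PER_WEEK = 7 * MINUTES_PER_DAY  # 10,080


def dpm(defect_minutes: float, total_user_minutes: float) -> float:
    """Defects per million: outage minutes per million user minutes."""
    if total_user_minutes <= 0:
        raise ValueError("total_user_minutes must be positive")
    return 1_000_000 * defect_minutes / total_user_minutes


def uptime_percent(dpm_value: float) -> float:
    """Convert a DPM figure to percent uptime."""
    return 100.0 * (1 - dpm_value / 1_000_000)


def downtime_minutes(dpm_value: float, period_minutes: float) -> float:
    """Minutes of downtime one User sees over a period at a given DPM."""
    return dpm_value / 1_000_000 * period_minutes


print(round(dpm(1, MINUTES_PER_DAY)))                    # 694
print(round(uptime_percent(100), 2))                     # 99.99
print(round(downtime_minutes(100, MINUTES_PER_WEEK), 3))  # 1.008
```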


          Here’s the important part:  Having such a metric allows you to measure your overall cybersecurity operational performance.  When consistently applied over time, you can track improvement progress.  You can look at DPM by product or service.  You can look at DPM by Customer or market segment.  When you look at the contributors to defects, it allows you to prioritize investments in reducing the User-impacting outages and/or speeding restoration.  These are the only 2 levers you can pull to make things better:  make it break less frequently … and fix it faster when it does break.


          Network Reliability DPM is easily understood by … and resonates with … Customers, your employees, and executives.  And there’s an obvious, clear, and logical link between financial impact and operations.  Reduce DPM and you reduce financial impact.


          A vignette.  I’ve used this technique to improve Reliability of a “new” telecom service.  The service was built with “enterprise grade” … not “carrier grade” hardware and software.  Providers had a “best effort” mindset.  In short, the industry had convinced itself that the service was condemned to perform poorly compared to other telecom services.  Not so.  Using this approach, we drove DPM from 1,000 to 100-200 DPM performance in a couple of years.  Customer satisfaction, revenue growth, and market share soared.


          We … and the world … need something like Network Reliability DPM for cybersecurity.


          So, what might that be?


          How about Cybersecurity PI+AIUM DPM:  the number of Potentially Impacted User Minutes (PIUMs) … and the number of Actually Impacted User Minutes (AIUMs) … each divided by total system-user minutes (User Minutes) … times 1 million.  Sorry, PI+AIUM is a mouthful … but at least it’s descriptive.


          Potentially Impacted User Minutes (PIUM)


          Vulnerabilities are quality defects and cybersecurity ticking time bombs.  PIUMs are the # of users who are using a system that has a cybersecurity vulnerability times the # of minutes the vulnerability is known to have existed. 


          Example:  Let’s say a vulnerability in a system is discovered to have existed for 1 week before it was identified and fixed (which I guess is fast).  Let’s say the system had 1,000 users.  PIUM = 1 week times 7 days per week times 24 hours per day times 60 minutes per hour times 1,000 users = 10.08M potentially impacted user minutes.  


          Want to improve?  Reduce the # of vulnerabilities and reduce how long they exist.


          Let’s say your company has 10 major systems, each with 1,000 users, and you report over a 4-week period.  User Minutes for that 4-week period would be 10 systems times 1,000 users each times 4 weeks times 7 days per week times 24 hours per day times 60 minutes per hour = 403.2M User Minutes.


          PIUM DPM would be 1,000,000 times the 10.08M PIUM divided by 403.2M User Minutes = 25,000 PIUM DPM.
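Here is the PIUM example worked in Python, as a sketch.  The helper name user_minutes is my own; the inputs are the ones used in the example above.

```python
MINUTES_PER_WEEK = 7 * 24 * 60  # 10,080


def user_minutes(users: int, weeks: float, systems: int = 1) -> float:
    """Users times minutes, across one or more equally sized systems."""
    return systems * users * weeks * MINUTES_PER_WEEK


# One vulnerability that existed for 1 week on a system with 1,000 users.
pium = user_minutes(users=1_000, weeks=1)

# 10 systems, 1,000 users each, over a 4-week reporting period.
total = user_minutes(users=1_000, weeks=4, systems=10)

pium_dpm = 1_000_000 * pium / total

print(pium)      # 10080000
print(total)     # 403200000
print(pium_dpm)  # 25000.0
```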


          Actually Impacted User Minutes (AIUM)


          These are the cybersecurity time bombs that have gone off.  AIUMs are the # of users who are using a system that is impaired or out-of-service due to a cybersecurity incident times the # of minutes the impairment or outage lasts. 


          Example:  Let’s say in the same 4-week period, there is a system with 1,000 users that is impaired or out-of-service for 1 hour (again, I guess this is fast).  AIUM = 1 hour times 60 minutes per hour times 1,000 users = 0.06 million actually impacted user minutes.  


          Want to improve?  Reduce the # of incidents and reduce their duration.


          Let’s again say your company has 10 major systems, each with 1,000 users.  User Minutes for this same 4-week period would be 10 systems times 1,000 users each times 4 weeks times 7 days per week times 24 hours per day times 60 minutes per hour = 403.2M User Minutes.


          AIUM DPM would be 1,000,000 times the 0.06M AIUM divided by 403.2M User Minutes = 149 AIUM DPM. 
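In practice you will have many vulnerabilities and incidents, not one of each.  The sketch below rolls up a list of events into PIUM DPM and AIUM DPM for a period, using the worked examples above as input.  The Event record and the system name are hypothetical; use whatever your ticketing data actually provides.

```python
from dataclasses import dataclass

MINUTES_PER_WEEK = 7 * 24 * 60  # 10,080


@dataclass
class Event:
    """One vulnerability or incident (hypothetical record layout)."""
    system: str
    users: int      # users of the affected system
    minutes: float  # how long the vulnerability existed or the incident lasted


def impacted_user_minutes(events: list[Event]) -> float:
    """Sum of users times minutes over all events."""
    return sum(e.users * e.minutes for e in events)


def dpm(impacted: float, total_user_minutes: float) -> float:
    """Impacted user minutes per million total user minutes."""
    return 1_000_000 * impacted / total_user_minutes


# 10 systems, 1,000 users each, 4-week period.
total_user_minutes = 10 * 1_000 * 4 * MINUTES_PER_WEEK

vulnerabilities = [Event("system-1", users=1_000, minutes=MINUTES_PER_WEEK)]
incidents = [Event("system-1", users=1_000, minutes=60)]

pium_dpm = dpm(impacted_user_minutes(vulnerabilities), total_user_minutes)
aium_dpm = dpm(impacted_user_minutes(incidents), total_user_minutes)
print(round(pium_dpm), round(aium_dpm))  # 25000 149
```

Because each event carries its system name, the same list can be filtered to report DPM by system, by Customer, or by market segment.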


          What matters is driving both PIUM DPM and AIUM DPM down with practical, straightforward operational actions you can take:

            • Reduce the # of vulnerabilities.  Demand improved quality from your software and hardware suppliers.

            • Reduce how long it takes to discover a vulnerability and fix it.  Demand quicker discovery and fixes from your suppliers.

            • Reduce the number of incidents.  Improve your defenses.

            • Reduce how long it takes to stop them.  Expect quicker incident sensing and corrective action from your CIO / CISO team.

          How do you “eat the elephant, one bite at a time”?

            • For each vulnerability and incident, identify the root cause. 

            • When you look at all the root causes over several months, you will see root causes that are the “fat rabbits”, those that contribute the most PIUMs and AIUMs.  (Almost certainly, there will be a Pareto distribution with a few root causes contributing most of the PIUMs and AIUMs.) 

            • Which of these “fat rabbits” can be most easily, cheaply, and quickly fixed … so that they “never” happen again … for all systems? 

            • Who owns fixing each?  What is their action plan and schedule for fixing each?

            • “Rinse and repeat.”
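Finding the “fat rabbits” is a simple ranking exercise once each event is tagged with a root cause.  Here is a minimal sketch; the root-cause labels and minute counts are made up for illustration.

```python
from collections import defaultdict


def pareto_by_root_cause(events):
    """Rank root causes by impacted user minutes.

    events: iterable of (root_cause, impacted_user_minutes) pairs.
    Returns a list of (root_cause, minutes, cumulative_share), largest first.
    """
    totals = defaultdict(float)
    for cause, minutes in events:
        totals[cause] += minutes

    grand_total = sum(totals.values())
    if grand_total == 0:
        return []

    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    rows, running = [], 0.0
    for cause, minutes in ranked:
        running += minutes
        rows.append((cause, minutes, running / grand_total))
    return rows


# Hypothetical events: (root cause, PIUMs or AIUMs contributed).
events = [
    ("unpatched server", 600),
    ("misconfigured certificate", 250),
    ("unpatched server", 100),
    ("phishing", 50),
]

for cause, minutes, cumulative in pareto_by_root_cause(events):
    print(f"{cause:28s} {minutes:8.0f} {cumulative:6.0%}")
```

The causes at the top of the list, the ones that get you to most of the cumulative share, are where to assign owners and action plans first.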

          Cybersecurity Secondary Operational Metrics


          For sure, you will need to measure: 

            • # of systems.  (You might start with your top N most important systems, perhaps those with the largest number of users, be they employees or Customers.)

            • # of users for each system.

            • Mean Time to Identify (how long did it take to detect a vulnerability or an attack).

            • Mean Time to Contain (how long did it take to fix the vulnerability or shield the attack).

            • # of known vulnerabilities in each system.

            • # of known out-of-service or impairment incidents on each system.
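Mean Time to Identify and Mean Time to Contain fall straight out of three timestamps per event.  In the sketch below, the field names are hypothetical, and measuring Mean Time to Contain from the moment of identification (rather than from the start of the event) is my assumption; pick one convention and apply it consistently.

```python
from datetime import datetime
from statistics import mean


def mean_minutes(intervals):
    """Average length, in minutes, of (start, end) datetime pairs."""
    return mean((end - start).total_seconds() / 60 for start, end in intervals)


# Hypothetical event records: when it began, when it was detected, when it was fixed.
records = [
    {"started": datetime(2024, 1, 1, 0, 0),
     "identified": datetime(2024, 1, 1, 2, 0),
     "contained": datetime(2024, 1, 1, 3, 0)},
    {"started": datetime(2024, 1, 2, 0, 0),
     "identified": datetime(2024, 1, 2, 4, 0),
     "contained": datetime(2024, 1, 2, 6, 0)},
]

mtti = mean_minutes((r["started"], r["identified"]) for r in records)
mttc = mean_minutes((r["identified"], r["contained"]) for r in records)
print(mtti, mttc)  # 180.0 90.0
```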

          I recommend starting with the above and letting what you learn drive what other metrics you need.  Following are some examples, which I provide for purely illustrative purposes:

            • # of vulnerabilities per vendor (Microsoft, Apple, IBM, AWS, etc.)  (We need a lot more attention to vendors releasing into the marketplace products and services that have too many vulnerabilities and that are overly difficult for mere mortals to install, configure, and operate.)

            • # of SSL certificates configured incorrectly.  (This is a vulnerability.)

            • Volume of data transferred on the network.  Look for suspicious anomalies, e.g., why is node A suddenly and unexpectedly sending data to node B?

            • # of users with access privileges beyond those required by their work role.

            • # of days to de-activate former employees’ and contractors’ access privileges and credentials.

            • # & Duration of open ports.

            • IP addresses attached to the network and the owner of each, by name.

            • # of systems without standards for “approved” cybersecurity configuration settings.

            • # of systems not compliant with “approved” cybersecurity configuration settings.
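Several of these illustrative metrics are simple counts once the underlying data is in hand.  As one example, here is a sketch for “# of users with access privileges beyond those required by their work role.”  The role names, privilege names, and data layout are all hypothetical.

```python
def users_with_excess_privileges(role_allowed, user_roles, user_granted):
    """Return {user: set of privileges beyond what the user's role allows}.

    role_allowed: {role: set of privileges the role requires}
    user_roles:   {user: role}
    user_granted: {user: set of privileges the user actually holds}
    A user with no known role is treated as entitled to nothing.
    """
    excess = {}
    for user, granted in user_granted.items():
        allowed = role_allowed.get(user_roles.get(user), set())
        extra = set(granted) - allowed
        if extra:
            excess[user] = extra
    return excess


# Hypothetical data.
role_allowed = {
    "analyst": {"read_reports"},
    "admin": {"read_reports", "manage_users"},
}
user_roles = {"alice": "analyst", "bob": "admin"}
user_granted = {
    "alice": {"read_reports", "manage_users"},
    "bob": {"read_reports", "manage_users"},
}

excess = users_with_excess_privileges(role_allowed, user_roles, user_granted)
print(len(excess), excess)  # 1 {'alice': {'manage_users'}}
```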

          After a few months, evaluate all your existing cybersecurity metrics to see if they are useful and if they have a clear logical linkage to Cybersecurity DPM.  If not, you can stop using them.


          Implementation


          Most of the cybersecurity industry’s attention has been on tertiary, best practice-oriented metrics.  That focus is not improving … much less radically improving … cybersecurity. 


          Those metrics lack the context and relative importance that higher-order metrics … Cybersecurity Financial Impact and Cybersecurity PI+AIUM DPM … will provide.


          The final ingredient in the secret sauce is a disciplined management system for reviewing results, identifying and prioritizing improvement opportunities, designing and resourcing improvement programs, and tracking improvement progress and programs.


          It will take 6 months to be well on your way.  Initially, the senior leader will need to devote 2-3 days per month to implementation and managing performance, supported by no more than a couple of staff people who can gather the data and report results in a consistent and uniform way.


          I welcome your comments on the above … and repeat my offer to help you give it a try.




