Assurance vs. Risk Management

August 29, 2012

One of my current hot buttons is the over-emphasis on assurance with regard to risk management. I recently was given visibility into a risk management framework where ‘management assurance’ was listed as the goal of the framework. However, the framework did not allow for management to actually manage risk.

Recently at BSidesLA I attempted to reduce the definitions of risk and ‘risk management’ down to fundamental attributes because there are so many different – and in a lot of cases contextually valid – definitions of risk.

Risk: Something that can happen that can result in loss. It is about the frequency of events that can have an adverse impact to our time, resources and of course our money.

Risk Management: Activities that allow us to reduce our uncertainty about risk(s) so we can make good trade off decisions.

So how does this tie into assurance? The shortcoming with an assurance-centric approach to risk management is that assurance IMPLIES 100% certainty that all risks are known and that all identified controls are comprehensive and effective. An assurance-centric approach also implies that a control gap, control failure or some other issue HAS to be mitigated so management can have FULL assurance regarding their risk management posture.

Where risk management comes into play is when management does not require 100% assurance because there may not be adequate benefit to their span of control or the organization proper. Thus, robust risk management frameworks need to have a management response process – i.e. risk treatment decisions – when issues or gaps are identified. A management response and risk treatment decision process has a few benefits:

1. It promotes transparency and accountability of management’s decisions regarding their risk management mindset (tolerance, appetite, etc.).

2. It empowers management to make the best business decision (think trade-off) given the information (containing elements of uncertainty) provided to them.

3. It potentially allows organizations to better understand the ‘total cost of risk’ (TCoR) relative to other operational costs associated with the business.

So here are the take-aways:

1. Assurance does not always equate to effective risk management.

2. Effective risk management can facilitate levels of assurance and confidence, as well as one’s understanding of uncertainty regarding the loss exposures one faces.

3. Empowering and enabling management to make effective risk treatment decisions can provide management a level of assurance that they are running their business the way they deem fit.


Metricon 6 Wrap-Up

August 10, 2011

Metricon 6 was held in San Francisco, CA on August 9th, 2011. A few months ago, I and a few others were asked by the conference chair – Mr. Alex Hutton (@alexhutton) – to assist in the planning and organization of the conference. One of the goals established early on was that this Metricon needed to be different than previous Metricon events. Having attended Metricon 5, I witnessed firsthand the inquisitive and skeptical nature of the conference attendees towards speakers and towards each other. So, one of our goals for Metricon 6 was to change the culture of the conference. In my opinion, we succeeded in doing that by establishing topics that would draw new speakers and strike a happy balance between metrics, security and information risk management.

Following are a few Metricon 6 after-thoughts…

Venue: This was my first non-military trip to San Francisco. I loved the city! The vibe was awesome! The sheer number of people made for great people-watching entertainment and so many countries / cultures were represented everywhere I went. It gave a whole new meaning to America being a melting pot of the world.

Speakers: We had some great speakers at Metricon. Every speaker did well, the audience was engaged, and while questions were limited due to time – they took some tough questions and dealt with them appropriately.

Full list of speakers and presentations…

Favorite Sessions: Three of the 11 sessions stood out to me:

Jake Kouns – Cyber Insurance. I enjoyed this talk for a few reasons: a. it is an area of interest I have and b. the talk was easy to understand. I would characterize it as an overview of what cyber insurance is [should be] as well as some of the nuances. Keep in mind it was an overview – commercial insurance policies can be very complex – especially for large organizations. Some organizations do not buy separate “cyber insurance” policies – but utilize their existing policies to cover potential claims / liability arising from operational information technology failures or other scenarios. Overall – Jake is offering a unique product and while I would like to know more details – he appears to be well positioned in the cyber insurance product space.

Allison Miller / Itai Zukerman – Operationalizing Analytics. Alli and Itai went from 0 to 60 in about 5 seconds. They presented some work that brought together data collection, modeling and analysis – in less than 30 minutes. Itai was asked a question about the underlying analytical engine used – and he just nonchalantly replied ‘I wrote it in Java myself’ – like it was no big deal. That was hot.

Richard Lippman – Metrics for Continuous Network Monitoring. Richard gave us a glimpse of a real-time monitoring application; specifically, tracking un-trusted devices on protected subnets. The demo was very impressive and probably gave a few in the room some ‘metrigasms’ (I heard this phrase from @mrmeritology).

People: All the attendees and speakers were cordial and professional. By the end of the day – the sense of community was stronger than what we started with. A few quick shout-outs:

Behind-the-scenes contributors / organizers. The Usenix staff helped us out a lot over the last few months. We also had some help from Richard Baker who performed some site reconnaissance in an effort to determine video recording / video streaming capabilities – thank you sir. There were a few others that helped in selecting conference topics – you know who you are – thank you!

@chort0 and his lovely fiancée Meredith. They pointed some of us to some great establishments around Union Square. Good luck to the two of you as you go on this journey together.

@joshcorman. I had some great discussions with Josh. While we have only known each other for a few months – he has challenged me to think about questions [scenarios] that no one else is addressing.

+Wendy Nather. Consummate professional. Wendy and I have known of each other for a few years but never met in person prior to Metricon 6. We had some great conversation; both professional and personal. She values human relationships and that is more important in my book than just the social networking aspect.

@alexhutton & @jayjacobs – yep – it rocked. Next… ?

All the attendees. Without attendance, there is no Metricon. The information sharing, hallway collaboration and presentation questions contributed greatly to the event. Thank you!

***

So there you go everyone! It was a great event! Keep your eyes and ears open for information about the next Metricon. Consider reanalyzing your favorite conferences and if you are looking for small, intimate and stimulating conferences – filled with thought leadership and progressive mindsets – give Metricon a chance!


Simple Risk Model (Part 4 of 5): Simulating both Loss Frequency & Loss Magnitude

February 5, 2011

Part 1 – Simulate Loss Frequency Method 1
Part 2 – Simulate Loss Frequency Method 2
Part 3 – Simulate Loss Frequency Method 3

In this post we want to combine the techniques demonstrated in parts two and three into a single simulation. To accomplish this simulation we will:

1.    Define input parameters
2.    Introduce VBA code – via a macro – that consumes the input parameters
3.    Perform functions within the VBA code
4.    Take the output from functions and store them in the spreadsheet
5.    Create a histogram of the simulation output.

Steps 3 & 4 will be performed many times; depending on the number of iterations we want to perform in our simulation.

You can download this spreadsheet to use as a reference throughout the post. The spreadsheet should be used in Excel only. The worksheets we are concerned with are:

test – This worksheet contains code that will step through each part of the loss magnitude portion of the simulation. By displaying this information, it allows you to validate that both the code and calculations are functioning as coded. This tab is also useful for testing code in small iterations. Thus, the number of iterations should be kept fairly low (“test”; B1).

prod – Unlike the “test” tab, this tab does not display the result of each loss magnitude calculation per iteration. This is the tab that you would want to run the full simulation on; thousands of iterations.

Here we go…and referencing the “prod” worksheet…

Input Parameters.
Expected Loss Frequency. It is assumed for this post that you have estimated or derived a most likely or average loss frequency value. Cell B2 contains this value. The value in this cell will be one of the input parameters into a POISSON probability distribution to return an inverse cumulative value (Part 2 of this Series).

Average Loss Magnitude. It is assumed for this post that you have estimated or derived a most likely or average loss magnitude value. Cell B3 contains this value. The value in this cell will be one of the input parameters into a NORMAL probability distribution to return an inverse cumulative value (Part 3 of this Series).

Loss Magnitude Standard Deviation. It is assumed for this post that you have estimated or derived the standard deviation for loss magnitude. Cell B4 contains this value. The value in this cell will be one of the input parameters into a NORMAL probability distribution to return an inverse cumulative value (Part 3 of this Series).

The Simulation.
On the “prod” tab, when you click the button labeled “Prod” – this will execute a macro composed of VBA code. I will let you explore the code on your own – it is fairly intuitive. I also left a few comments in the VBA so I remember what certain sections of the code are doing. There are four columns of simulation output that the macro will generate.

Iter# (B10). This is the iteration number. In cell B1 we set the number of iterations to be 5000. Thus, the VBA will cycle through a section of its code 5000 times.

LEF Random (C10). For each iteration, we will generate a random value between 0 and 1 to be used in generating a loss frequency value. Displaying the random value in the simulation is not necessary, but I prefer to see it so I can informally analyze the random values themselves and gauge the relationship between the random value and the inverse cumulative value in the next cell.

LEF Value (D10). For each iteration, we will use the random value we generated in the adjacent cell (column C), combine it with the Expected Loss Frequency value declared in B2 and input these values as parameters into a POISSON probability distribution that returns an inverse cumulative value. The value returned will be an integer – a whole number. Why a whole number? Because you can’t have half a loss event – just like a woman cannot be half pregnant ( <- one of my favorite analogies). This is a fairly important concept to realize from a loss event modeling perspective.

Loss Magnitude (E10). For each iteration, we will consume the value in the adjacent cell (column D) and apply logical rules to it.

a.    If the LEF Value = 0, then the loss magnitude is zero.
b.    If the LEF Value > 0, then for each instance of loss we will:
1.    Generate a random value
2.    Consume the average loss magnitude value in cell B3
3.    Consume the loss magnitude standard deviation in cell B4
4.    Use the values referenced in 1-3 as input parameters into a Normal probability distribution and return an inverse cumulative value. In other words, given a normal distribution with a mean of $2000 and a standard deviation of $1000 – what is the value at the point of that distribution that corresponds to a random value between 0 and 1?
5.    We will add all the instances of loss for that iteration and record the sum in column E.

Note: Steps 4 and 5 can be observed on the “test” worksheet by clicking the button labeled “test”.

The code will continue to loop until we have completed the number of iterations we specified in cell B1.
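The loop described above can also be sketched outside of Excel. The following is a minimal Python sketch under stated assumptions, not the VBA from the spreadsheet. The expected loss frequency of 0.5 is an assumed value for illustration (this post does not state what is in B2); the mean of $2000 and standard deviation of $1000 come from the example above; the function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def poisson_inverse_cdf(u, expected_frequency):
    """Smallest whole number k whose cumulative Poisson probability is >= u."""
    k = 0
    p = math.exp(-expected_frequency)  # P(X = 0)
    cdf = p
    # Walk up the distribution until the cumulative probability reaches u.
    # The p > 0 check ends the loop if the term underflows to zero.
    while cdf < u and p > 0.0:
        k += 1
        p *= expected_frequency / k  # P(X = k) derived from P(X = k - 1)
        cdf += p
    return k


def draw_unit(rng):
    """Random value strictly between 0 and 1 (inv_cdf rejects exactly 0)."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return u


def simulate(iterations, expected_frequency, mean_loss, loss_sd, seed=None):
    """Return one (iter#, LEF random, LEF value, loss magnitude) row per iteration."""
    rng = random.Random(seed)
    magnitude = NormalDist(mu=mean_loss, sigma=loss_sd)
    rows = []
    for iteration in range(1, iterations + 1):
        lef_random = draw_unit(rng)
        lef_value = poisson_inverse_cdf(lef_random, expected_frequency)
        # One normal draw per loss event; zero events gives a loss of zero.
        # A normal distribution can return negative values; they are not clipped here.
        loss = sum(magnitude.inv_cdf(draw_unit(rng)) for _ in range(lef_value))
        rows.append((iteration, lef_random, lef_value, loss))
    return rows


# Assumed inputs: 5000 iterations, frequency 0.5, mean $2000, standard deviation $1000.
rows = simulate(5000, expected_frequency=0.5, mean_loss=2000, loss_sd=1000, seed=42)
print("iterations with no loss:", sum(1 for r in rows if r[2] == 0))
print("iterations with loss:", sum(1 for r in rows if r[2] > 0))
print("total loss events:", sum(r[2] for r in rows))
```

Each row mirrors the four output columns (B through E) of the “prod” worksheet. Fixing the seed makes a run repeatable, which helps when comparing the sketch against a small run on the “test” worksheet.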

The Results. Now that the simulation is complete we can begin to analyze the output.

# of Iterations With No Loss (B5). This is the number of iterations where the returned inverse cumulative value was zero.

# of Iterations With Loss (B6). This is the number of iterations where the returned inverse cumulative value was greater than zero.

# of Loss Events (B7). This is the sum of loss events for all the iterations. There were some iterations with more than one loss event.

Max. # of Loss Events for an iteration (B8). This is the maximum number of loss events for any given iteration.

Next, let’s look at some of the simulation output in the context of loss severity; $.

Min. Loss (K6). This is the minimum loss value returned from the simulation. I round the results to the nearest hundred in the worksheet.

Max. Loss (K7). This is the maximum loss value returned from the simulation. I round the results to the nearest hundred in the worksheet.

Median (G5). This is the 50th percentile of the simulation results. In other words, 50% of the simulation results were equal to or less than this value.

Average (G6). This is the average loss value for the simulation: the sum of all the loss magnitude values divided by the number of iterations. This value can quickly be compared to the median to make inferences about the skew of the simulation output.

80th % (G7). This is the 80th percentile of the simulation results. In other words, 80% of the simulation results were equal to or less than this value. In some industries, this is often referred to as the 1-in-5 loss.

90th % (G8). This is the 90th percentile of the simulation results. In other words, 90% of the simulation results were equal to or less than this value. In some industries, this is often referred to as the 1-in-10 loss.

95th % (G9). This is the 95th percentile of the simulation results. In other words, 95% of the simulation results were equal to or less than this value. In some industries, this is often referred to as the 1-in-20 loss.

99th % (G10). This is the 99th percentile of the simulation results. In other words, 99% of the simulation results were equal to or less than this value. In some industries, this is often referred to as the 1-in-100 loss.

Note 2: Generally speaking, the 95th, 99th and greater percentiles are often considered as being part of the tail of the loss distribution. I consider all the points in cells G5:G10 to be useful. For some loss exposures, the median and average values are more than enough to make informed decisions. For other loss exposures, the 80th, 90th, 95th, 99th and even larger percentiles are necessary.
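The summary points in G5:G10 can be reproduced from any list of simulated loss values. Below is a minimal Python sketch using the standard library; the function name and the sample data are hypothetical, and the "inclusive" interpolation method is an assumption about how percentiles are calculated, not a statement of what the spreadsheet does.

```python
import statistics


def summarize_losses(losses):
    """Median, average and upper percentiles for a list of simulated losses."""
    # quantiles(n=100) returns 99 cut points; cuts[i] is the (i + 1)th percentile.
    cuts = statistics.quantiles(losses, n=100, method="inclusive")
    return {
        "median": statistics.median(losses),
        "average": statistics.fmean(losses),
        "p80": cuts[79],  # 1-in-5 loss
        "p90": cuts[89],  # 1-in-10 loss
        "p95": cuts[94],  # 1-in-20 loss
        "p99": cuts[98],  # 1-in-100 loss
    }


# Hypothetical stand-in for the simulation's loss column: 0, 1, ..., 100.
sample_losses = list(range(101))
summary = summarize_losses(sample_losses)
for name, value in summary.items():
    print(name, value)

# An average above the median suggests a longer right tail; below, a longer left tail.
print("average minus median:", summary["average"] - summary["median"])
```

With real simulation output, the list passed in would be the loss magnitude of every iteration, including the iterations with a loss of zero.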

Simulated Loss Magnitude Histogram. A histogram is a graphical representation showing the distribution of data. The histogram in the “prod” worksheet represents the distribution of data for all iterations where the loss was greater than zero.

Wrap Up. What I have presented in this post is a very simple model for a single loss exposure using randomness and probability distributions. Depending on your comfort level with VBA and creativity, one can easily build out more complex models; whether it is hundreds of loss exposures you want to model for or just a few dependent loss exposures.


Risk Fu Fighting

January 31, 2011

If you are an information risk analyst or perform any type of IT risk analysis – you should really consider joining the Society of Information Risk Analysts mailing list. Over the last several weeks there have been some amazing exchanges of ideas, opinions, and spirited debate over the legitimacy and value of risk analysis. Some of the content is no doubt a precursor to the anxiously awaited “Risk Management Smackdown” at RSA on 2/15. Regardless of my role within SIRA or the upcoming RSA debate, the SIRA mailing list is a great resource for learning more about IT risk analysis and IT risk management in general.

Below are a couple of links:

Society of Information Risk Analysts (SIRA)
Society of Information Risk Analysts – Mailing List
Society of Information Risk Analysts – Mailing List Archives (must be subscribed to the mailing list to view)
RSA Session Catalog (filter on security tag “risk management”)


More Heat Map Love

May 11, 2010

In my previous post “Heat Map Love” I attempted to illustrate the relationship between plots on a heat map and a loss distribution. In this post I am going to illustrate another method to show the relationship – hopefully in simpler terms.

In the heat map above I have plotted five example risk issues:

I: Application security; cross-site scripting; external facing application(s); actual loss events between 2-10 times a year; low magnitude per event – less than $10,000.

II: Data confidentiality; lost / stolen mobile device or computer; no hard disk encryption; simulated or actual one loss event per year, low to moderate magnitude per event.

III:  PCI-DSS Compliance; level 2 Visa merchant; not compliant with numerous PCI-DSS standards; merchant self-reports not being in compliance this year; merchant expects monthly fines of $5,000 for a one year total of $60,000.

IV: Malware outbreak; large malware outbreak (greater than 10% of your protected endpoints); less than once every ten years; magnitude per event could range between $100,000 and $1,000,000; productivity hit, external consulting, etc.

V: Availability; loss of data center; very low frequency; very large magnitude per event.

Since there is a frequency and magnitude of loss associated with each of these issues, we can conceptually associate these issues with a loss distribution (assuming that our loss distribution is normal-like or log normal).

Step 1: Hold a piece of paper with the heat map looking like the image below:

Step 2: Flip the paper towards you so the heat map looks like the image below (flip vertical):


Step 3: Rotate the paper counter-clockwise 90 degrees; it should look like the image below.


For ease of illustration; let’s overlay a log normal distribution.

What we see is in line with what we discussed in the “Heat Map Love” post:

Risk V – Loss of data center; is driving the tail; very low frequency; very large magnitude.
Risk IV – Malware outbreak; low frequency; but significant or high magnitude.
Risk III – Annual PCI fines from Visa via acquirer / processor; once per year; $60K.
Risk II – Lost or stolen laptop that had confidential information on it; response and notification costs not expected to be significant.
Risk I – Lots of small application security issues; for example cross site scripting; numerous detected and reported instances per year; low cost per event.

There you have it – a less technical way to perform a sniff test on your heat map plots and / or validate against a loss distribution.

Once you have taught everyone how to perform this paper rotation trick, you can have a paper airplane flying contest.


What’s In Your Wallet?

December 28, 2009

A few weeks back I jumped feet first into a blog post at Securosis by David Mortman titled “Changing The Game”. There are a lot of comments but one comment in particular by Rich Mogull has resulted in me doing some soul searching, adding a new question to my bank of interview questions, and forcing me to write a blog post (while on Christmas / New Year’s vacation).

Below is the majority of the comment that Rich made:

“The problem I think we have in infosec is that the economics are skewed to distort risk analysis (see my post on the anonymization of losses), and we fundamentally lack the proper data to make truly informed risk decisions.

I do think we are creeping slowly in the right direction- the Verizon report is one example on the data front, and it’s the main reason we are focusing so much on metrics models.

One area where I do think we need to be cautious is the need in many financial and insurance models to tie everything to monetary value. Since “loss” has a different meaning in the digital world due to us usually not losing access to the asset as with physical loss, the models don’t fully translate.”

So here is my question to you as a reader: What Is Your Information Risk Management Philosophy in regards to risk quantification? Do you even have one?

There is a lot of skepticism in our industry – sometimes packaged as healthy scrutiny – when it comes to the topic of risk quantification and tying loss forms to monetary values. Below are some of my “philosophical” thoughts about Information Risk Management specifically as it pertains to risk quantification.

1.    Security Events / Incidents Have An Opportunity Cost. When something “bad” occurs – it costs the company money to respond. Whether it is “green dollars” going out the door or soft dollars associated with the hourly cost of full time employees responding to the event, the reality is that the company will deal with the incident and that response effort usually takes away from other responsibilities or objectives. We can count green dollars, but counting the internal costs can be more challenging; the size and maturity of the HR/IT organization will factor into the ease of doing this. Bottom line: It costs money.

2.    It Costs Money to Maintain a Security Posture. One of the executives at my company referred to this concept as “anchoring costs”. A perfect example of this is malware protection. A company may spend $125,000 a year in malware maintenance / support fees; a solution that is considered to be 96% effective against malware in the wild with advanced detection / heuristic capabilities. For simple illustration purposes, let’s state that there are two full-time employees on the malware team ($50K each) – that’s an additional (minimum, excluding benefits, etc.) $100K on top of the $125K to manage, maintain, and support a malware protection capability; a grand total of $225K per year. This is an example of an anchoring cost: the company is spending $225K a year to protect against a malware outbreak or event that could result in loss of productivity – i.e. the ability to deliver its value proposition – or to prevent data theft / compromise. We could probably spend a few days debating if this particular anchoring cost accounts for the expected amount we would lose in a given year without malware protection or if this annual anchoring cost is to address a risk value further out in a loss distribution (1-in-20; 95th percentile, 1-in-100; 99th percentile). Bottom line: It costs money to maintain a security posture.

3.    Overcapitalization. Now we are moving into the ERM space – and this concept may be limited in scope from an industry perspective – but it is evolving and can facilitate decision making in some organizations. Economic capital models account for various types of risk. One of those risk types is operational risk – under which information security and continuity management risks fall. Below is a broad definition of economic capital (Wikipedia):

“Economic capital is the amount of risk capital, assessed on a realistic basis, which a firm requires to cover the risks that it is running or collecting as a going concern, such as market risk, credit risk, and operational risk.” (BTW, I really like the phrase “assessed on a realistic basis”)

One analogy I read on overcapitalization in the last few days was comparing overcapitalization to an overweight person. Too much weight can lead to health problems and other challenges. In addition, the extra weight inhibits our flexibility and speed.

Assuming that you are quantifying risk issues, and assuming that these data points can be rolled up into an economic capital model – it is clear that risk quantification for the information security / continuity management issues we manage can contribute to enterprise risk management. I think an argument can be made – especially in the insurance industry – that company leadership has much more opportunity and influence to manage (reduce) operational risk than other risk types, for example weather / catastrophe risk. Yes, operational risk is probably a very small percentage of economic capital. However, the higher the economic capital amount – the higher the cost to the company to maintain that amount and it could reduce their ability to use some of that money for other purposes. In addition, regardless of whether operational risk is only a tiny percentage of economic capital models – the margin of difference between competing products and competitors in the market place is sometimes so small that reducing just a small percentage of expenses or operational risk could result in some form of competitive advantage (product pricing, investments, expansion, etc.).

Bottom line: I would rather be contributing to our business in a strategic manner using words, concepts and measurement methods they are familiar with, versus some qualitative approach that does not lend itself to effective decision making.

4.    Motives. Given the current economic climate, a lot of people (infosec professionals, infosec executives, friends, relatives, etc.) are skeptical of risk models. I understand why. Here is how I professionally reconcile such concerns / skepticisms.

a.    Apples and Oranges. Economic capital models (and at a smaller level – risk issue quantification) and investment models have different purposes. The former is about ensuring a company can cover its liabilities. The latter – in most cases – is about opportunity – profit.

b.    Motives. I think you have to look at the motives of companies or individuals that are attempting to quantify information security / continuity management risk. What they are trying to do is ensure that their company understands their exposure in the information risk management space. This is where the phrase “assessed on a realistic basis” comes back to mind. Is a sound and repeatable risk assessment methodology being used consistently to assess risks? Are the loss forms being estimated best case, most likely, worst case or a combination (distribution) of the three? Are we packaging information that allows effective decision making, or are we “crying wolf” and packaging scare tactics? In most cases, information risk management groups are just trying to give the best information. Yes, there will be misses in either frequency of loss or magnitude of loss – but that is the nature of risk.

So there you have it, some of my thoughts on risk quantification and why I support it passionately. Ask yourself, “Can I defend why I am passionate about my favorite aspect of information risk management?” If not, I challenge you to go through the thought exercises.  I welcome your feedback.

Happy Holidays!


Reputation Risk Q&A – Richard Levick (2 of 2)

August 6, 2009

This is part two of a reputation risk Q&A with Mr. Richard Levick; President and CEO of Levick Strategic Communications in Washington, DC.

Part one can be found here.

6. In your opinion, how do you distinguish between worst-case reputation loss versus expected reputation loss?

Richard Levick: One word – experience. That’s how you anticipate what’s coming next and prevent the worst-case scenario from coming to fruition. It’s all about staying one step ahead.

Today, the period of time between the gating event that alerts you to a brand crisis and the bet-the-company moment is increasingly indistinguishable. When video of two Domino’s employees defiling customers’ food was posted to YouTube earlier this year, one million people – a number greater than those who subscribe to The New York Times or The Wall Street Journal – had viewed it within the first 48 hours. What that tells us is that crises now move faster than ever before and that companies have to be ready to act at a moment’s notice. That means preventing and responding to reputational risks and crises needs to be in the DNA. You don’t get that by accident. Or maybe you do, but at a terribly high price.

To do it right and prepare ahead of time means knowing what regulators, Congress, or state attorneys general are going to do next. It means anticipating the next moves of the plaintiffs’ bar. It means monitoring the blogosphere and other social and digital media for intelligence as to where the traditional media may soon be heading. It means identifying likely company risks now and extrapolating what this means in terms of Search Engine Optimization, High Authority Bloggers, and social media. If you are reading this last sentence and don’t understand what I mean, your company is at far greater risk than you think.

To get started, build a relationship with crisis managers now – before you need them – so that you can build the trust that fast action demands. In crisis, you’ve got to see how the dominos – no pun intended – are lined up and know how they’re going to fall. It’s the only way to keep up with a news cycle that is now measured in minutes, not hours.

7. What are the key controls an information security risk analyst should take into consideration when assessing reputation loss impact (or magnitude)?

Richard Levick: With virtually every traditional journalist now regularly reading blogs for story ideas, careful monitoring of the blogosphere provides invaluable intelligence as to the scope of the reputational damage that may result from an IT security breach.

That means knowing the high-authority bloggers – those with the greatest influence over perceptions – that cover your industry. And it also means being ready to engage them should a data breach occur. By bringing bloggers into the fold, companies allow themselves an opportunity to shape the narrative before it influences the traditional commentary to follow – and thus limit the reputational damage potential at play.

8. Do you have any tips for effectively communicating reputation risk to middle management and executive leadership?

Richard Levick: In today’s media environment, the C-Suite has to know that everything it does – or chooses not to do – can potentially impact the corporate brand. That means always thinking like your consumers, investors, regulators, and stakeholders that run the gamut – and taking their perceptions into consideration whenever a decision that could potentially impact these audiences is made.

I think middle managers need to own issues like understanding who the High Authority Bloggers are and having personal relationships; anticipating risks and knowing who controls those terms on the search engines; tracking YouTube, Twitter, and other sites for signs of consumer or stockholder dissatisfaction or industry unrest; and recommending instant positive intervention. Middle managers need to think differently. Today is a good day to start.

9. Do you have a favorite reputation risk engagement that you are willing to share (regardless of outcome)?

Richard Levick: I often look back to what Hasbro did during the 2007 lead-paint scare because it demonstrates how a crisis can be transformed into opportunity if a company articulates leadership in solving the problems at hand.

While Hasbro did not initiate a single recall during the lead paint crisis, the company recognized that its entire industry was under siege. Inaction could have led to guilt by association in the Court of Public Opinion. More important, remaining on the sidelines could have allowed a significant opportunity to differentiate itself from the competition to slip by.

So, rather than sit back and let the competition take the heat, Hasbro stepped up by implementing a “Total Safety Program” and making the initiative a central element of its traditional and online marketing strategies. As a result, the company became the “gold standard” around which all of its competitors were forced to rally. Though it wasn’t directly impacted by the crisis, Hasbro took action to abate it. As a result, its October 2007 earnings jumped 64 percent from the previous year.

10. Are there any good sources of information you can recommend for learning more about this subject?

Richard Levick: I would point to four such resources maintained by my firm…

Levick Strategic Communications’ Bulletproof Blog™ (www.bulletproofblog.com)…

Our e-newsletter, High Stakes™ (http://www.levick.com/resources/highstakes/)…

Our Crisis Communications Desk Reference (http://www.levick.com/crisis_communications_desktop_reference/)…

And our book, Stop The Presses (http://www.levick.com/resources/books/stop_the_presses/).

Also, I would encourage your readers to keep an eye out for our next book, on leadership during crisis in the digital age, which will be coming out in early 2010.

***

I intend to post some of my thoughts on Richard’s answers in an upcoming post. I hope you found Mr. Levick’s perspective to be as useful and intriguing as I do. Regardless, thank you, Richard, for participating in this effort; I look forward to continued interactions.


The Risk Is Right.

May 21, 2009

Of particular interest to me right now is the appropriate risk amount to report for any given issue. Being IT folks – warning, broad stroke in progress – we prefer “precise” numbers that no one can refute and that are supported by the overwhelming amount of electronic data we have at our disposal. However, in reality – and in the information security risk management space – we lack such data. As such, there are information security industry superstars who discourage the idea of taking a stand on quantifying information security risk and, from my perspective, devalue the subject matter expertise (some industry folks water this down to the word “opinion”) that security professionals offer to their organizations. I guess I am getting off-topic – so let’s get back to the topic: the appropriate risk value to report.

Quite a few risk quantification tools and methodologies tend to produce a risk value often referred to as the “expected loss amount”. Typically, this is the product of a loss event frequency value (LEF for those FAIR-minded folks) and the average monetary loss magnitude. For most information security risk practitioners and the organizations that employ them, the expected loss amount may be the most appropriate risk value to articulate to decision makers for any given risk issue. However, an additional minute or two of analysis of your loss distribution could result in you wanting to articulate a risk amount different than the expected loss amount.

Let’s take a look at some phrases and a few examples.

Loss event frequency: The probable frequency with which we expect a loss to occur.

Average loss magnitude: This is the average (or mean) loss value from a simulation or actual loss events. For example, if I perform 1001 simulations where a value between $1 and $10 is drawn, I would add up the values from all the simulations and divide the sum by 1001.

Expected loss magnitude: This is the product of the loss event frequency (most often the mean LEF) and the average loss magnitude. For example, if my loss event frequency is 0.1 per year (once every ten years), and my average loss magnitude is $10,000; my expected loss magnitude would be $1000.

Remember what the median is? The median is the number that is directly in the middle of a range of numbers. For example, if we perform 1001 simulations where a value between $1000 and $20,000 could be drawn and the number in the middle (value number 501, when ordered from lowest to highest) is $10,000 – that is our median.

At this point we have what could be the first comparison in determining which risk value to report. Generally speaking, if the mean and the median are close to each other, then the data set – or loss magnitude values – may not be too skewed. If the mean is a lot higher than the median, then this could be the result of large loss magnitude values that are having a significant impact on the mean – somewhat “inflating” the average loss magnitude. The same concept applies if the mean is a lot lower than the median.

In some cases, using the mean loss magnitude to calculate the expected loss magnitude is appropriate. In other cases, the median may be more appropriate because the values influencing the mean are so far out in the distribution – or tail – that it would be inappropriate to use the average loss magnitude.
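To make the mean-versus-median comparison concrete, here is a minimal Python sketch. The loss values are hypothetical and chosen only so that one large loss sits far out in the tail; the 0.1 LEF is the once-every-ten-years figure from the definition above.

```python
import statistics

# Hypothetical per-event loss magnitudes in dollars (illustrative only).
# One large loss sits far out in the tail.
losses = [4_000, 6_000, 8_000, 10_000, 12_000, 14_000, 250_000]

mean_lm = statistics.mean(losses)      # pulled upward by the $250,000 loss
median_lm = statistics.median(losses)  # the middle value; the tail does not move it

lef = 0.1  # loss event frequency: once every ten years

expected_using_mean = lef * mean_lm
expected_using_median = lef * median_lm

print(f"mean LM:   ${mean_lm:,.0f} -> expected loss ${expected_using_mean:,.0f}")
print(f"median LM: ${median_lm:,.0f} -> expected loss ${expected_using_median:,.0f}")
```

With data this skewed, the mean is more than four times the median, which is the signal that a single tail value is inflating the average loss magnitude.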

Now let’s look at another example. We have a risk scenario where the average loss value (per event) is $73,400, and you expect on average, 4 loss events per year. The annual expected loss ($73,400 x 4) is $293,600. However, we are dealing with probabilities and distributions and in reality there could be one year where we only have one loss event related to this specific issue and some years where we might have 10 loss events. How do we deal with this?

I performed a small experiment to help me better understand this. From a previous risk issue, I derived the mean and standard deviation of the simulated loss event frequency (LEF) values and of the loss magnitude (LM) values. In Excel, I wrote a small VBA macro that lets me define some simulation parameters and reference both the LEF and LM mean and standard deviation values. For each simulation iteration, the macro generates an LEF value from a distribution that leverages the LEF mean and standard deviation. Then, for each LEF value (which I round to the nearest integer), the macro generates a loss magnitude value for each loss event and sums those loss magnitude values. For example, if my LEF is two, then my utility randomly generates two loss values, using a distribution that leverages the LM mean and standard deviation, then sums those two values. The simulation continues until the desired number of iterations is complete. For my small experiment, I performed a simulation consisting of 3001 iterations. You can see the LEF and LM means and standard deviations in the image below.

risk_right_1_090521
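For readers who would rather not use Excel, the simulation loop described above can be sketched in Python. This is not the VBA macro itself. It assumes normal distributions for both LEF and LM (the distribution is not named above), clamps negative draws to zero, and uses hypothetical standard deviations of 1.0 and $15,000; only the 4.17 LEF mean, the $73,400 LM mean, and the 3001 iterations come from this post.

```python
import random
import statistics

def simulate_annual_losses(lef_mean, lef_sd, lm_mean, lm_sd, iterations, seed=None):
    """Return one summed loss amount per simulated year."""
    rng = random.Random(seed)
    totals = []
    for _ in range(iterations):
        # Draw the number of loss events for this year, rounded to a whole number.
        events = max(0, round(rng.gauss(lef_mean, lef_sd)))
        # Draw one loss magnitude per event and sum them.
        total = sum(max(0.0, rng.gauss(lm_mean, lm_sd)) for _ in range(events))
        totals.append(total)
    return totals

# Standard deviations here are assumptions for illustration.
totals = simulate_annual_losses(4.17, 1.0, 73_400, 15_000, iterations=3001, seed=1)
print(f"mean:   ${statistics.mean(totals):,.0f}")
print(f"median: ${statistics.median(totals):,.0f}")
```

Passing a seed makes a run repeatable, which helps when you want to show a decision maker the same numbers twice.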

Now that we have simulated loss values, we want to visually represent them. I want to represent the values two ways.

risk_right_2_090521

This is a small scatter plot diagram with a smoothed line. In Excel we create loss magnitude bins and count the number of times each iteration’s loss magnitude sum fell into these bins. As you can see the loss magnitude values look normally distributed.

risk_right_3_090521

In this chart, I want to show the percentage of loss magnitude values in relation to the loss amounts themselves. So in this chart, my simulated loss is greater than $14,924; 99.999% of the time. However, there is roughly a 10% chance that the risk could be greater than $404,924.

So what does all of this mean? What it means is that even though our expected loss value was $293,600* – the simulation resulted in the values below:

risk_right_4_090521

The lowest simulated loss magnitude was: $14,924.
The largest simulated loss magnitude was: $620,000.
The mean (average) loss magnitude was: $308,636.
The median of the loss magnitude value was: $309,000.
There is a 20% chance (80th percentile, or 1-in-5) that the loss amount could exceed: $380,000.
There is a 5% chance (95th percentile, or 1-in-20) that the loss amount could exceed: $441,900.

Note: The values above would change from simulation to simulation – but not significantly assuming the input parameters (LEF and LM mean and standard deviation values) remain constant.
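As a rough illustration of how the 1-in-5 and 1-in-20 points are read off a set of simulated losses, here is a small Python sketch using the nearest-rank method. The function name and the twenty sample values are hypothetical; they are not the simulation output shown above.

```python
import math

def loss_at_percentile(losses, pct):
    """Nearest-rank percentile: the smallest loss that at least pct% of results do not exceed."""
    ordered = sorted(losses)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical simulated annual losses in thousands of dollars (illustration only).
simulated = [15, 120, 180, 210, 240, 260, 280, 290, 300, 305,
             310, 320, 330, 345, 360, 380, 400, 440, 520, 620]

median = loss_at_percentile(simulated, 50)     # the middle of the distribution
one_in_five = loss_at_percentile(simulated, 80)    # exceeded roughly 20% of the time
one_in_twenty = loss_at_percentile(simulated, 95)  # exceeded roughly 5% of the time

print(median, one_in_five, one_in_twenty)
```

Other percentile conventions interpolate between neighboring values, so different tools can report slightly different numbers for the same data.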

Note: It is important to note that the term “tail risk” is usually associated with values at the 97.5th percentile or greater, or less than 2.5% of the time. While the numbers at the 1-in-20 and various tail risk points are tempting to use, please keep in mind that these are low probability / high magnitude loss amounts. Grandstanding on these values just for the shock factor is the equivalent of crying wolf and undermines the value we can provide to our decision makers.

Now, our decision maker is faced with a harder decision. Do I assume or mitigate the risk associated with an expected loss amount of $308,636 or does this 1-in-5 loss magnitude value of $380,000 stand out to me? While it may seem like we are dealing with a small difference between the mean and the 1-in-5 values – risk tolerance, risk thresholds, and risk management strategies vary between decision makers and organizations.

Here is the take away: as you start going down the risk quantification road keep the following in mind:

1.    There is NO absolute 100% guaranteed predictable loss value – especially from a simulated loss distribution; but you have to report something. Thus choose a tool that lets you see the points from the distribution – not just a single value.

2.    Be mindful of how you articulate risk values. A consistent theme I hear and read about on a regular basis is that risk implies uncertainty – always. You need to underscore this when articulating risk to leadership.

3.    Have the discussion with your management / decision makers as to what loss value they would prefer to see. Their feedback may highly influence the value you report.

4.    Use the right value for the right purpose. For single risk issues, expected loss amounts may be appropriate. For a loss distribution (model) that represents dozens or even hundreds of risks – the 1-in-5, 1-in-10, 1-in-20 and maybe some tail risk values may be the best values to react to or budget for.

Have a great Memorial Day weekend!

* In the interest of transparency, the observant reader will notice that my mean LEF is actually 4.17. For simulation purposes, I have rounded generated LEF values to the nearest integer. In a given year, you can’t have 4.17 loss events. You would either have 4 or you would have 5. However, if you take the product of 4.17 and $73,400 ($306,078), you will notice that it is within a few thousand dollars of the simulation’s mean and median values.


Risk Scenario – Hidden Field / Sensitive Information (Part 3 of 4)

January 15, 2009

The Assessment (Threat Community B – Initech Novelty, Inc.)

There is some duplicate information from part 2 at the beginning of this assessment to aid some readers who may have landed on this page without reading Part 1 or Part 2.

In part two of “Hidden Field / Sensitive Information” we assessed the risk for “Threat Community” A – Malware. The FAIR assessment resulted in a risk rating of MEDIUM. As mentioned in Part 2 – there is another Threat Community we need to address and that is Initech Novelty Inc. (INI) itself. Because INI is a PCI merchant and is accountable for the security of its applications that process payment card information, the vulnerability that has been identified and confirmed – in the eyes of the Security Manager – makes INI non-compliant with PCI-DSS. I am assuming that INI is going to declare / update this in their SAQ.

Note: For the “Hidden Field / Sensitive Information” Assessment, I am choosing to perform two assessments; one for each threat community. Usually, I would choose the most likely TCOMM and focus on that, but because there are PCI compliance implications with this scenario – it is appropriate to address that as well.

1.    Identify the Asset(s) at Risk: (Page 3 of the FAIR Basic Risk Assessment Guide; aka BRAG)

a.    Consumer payment card information. Specifically, the payment card primary account number (PAN) and CVV2/CID/CVC2 values, expiration dates, and cardholder name information.

b.    The state of Initech Novelty Inc. PCI Compliance.

2.    Identify the “Threat Community” (TCOMM); (Page 3 of the FAIR BRAG): There are multiple threat communities that pose a threat to the assets described above. For this scenario, the first two communities that come to mind are zero-day malware and Initech Novelty Inc. itself.

a.    Zero-Day Malware. I am choosing this TCOMM for the assessment for several reasons. Most of the INI consumers are accessing the INI ecommerce portal from a PC at their home or what they consider to be a trusted PC. The most likely threat to these types of machines / users is malware.

b.    Initech Novelty Inc. (INI). I am selecting INI as a TCOMM for several reasons. First, the INI Security Manager thinks that the security vulnerability no longer makes INI 100% compliant with PCI-DSS. The security manager will be updating the INI PCI Self-Assessment Questionnaire (SAQ) to reflect a gap with requirement 6.5 (specifically 6.5.7). Thus, INI is its own threat because declaring non-compliance subjects them to non-compliance implications.

** The remainder of this post will be focused on TCOMM B – Initech Novelty Inc.**

3.    Threat Event Frequency (TEF): TEF is the probable frequency, within a given time frame, that a threat agent will come into contact and act against an asset. For this step, I am going to select MODERATE or between 1 and 10 times per year. Here is why:

a.    INI is a level three merchant. Level 3 merchants are required to complete a SAQ once per year. SAQs may be updated throughout the year as needed.

b.    Keep in mind that self-reporting non-compliance does not mean a merchant will be fined. But it can be a precursor to being fined (assuming no breach or other related incident).

*NOTE – It may make more sense to skip to Step Five and then come back to Step Four.

4.    Threat Capability (TCAP); (Page 5 of the FAIR BRAG): The probable level of force that a threat agent (within a threat community) is capable of applying against an asset. For this step I am selecting a value of VERY HIGH; meaning that at least 98% of the threat community is capable of applying force (or reporting non-compliance) on INI’s PCI Compliance posture. Here is my reasoning:

a.    INI wants to be ethical and not appear to be covering up vulnerabilities that affect compliance but also could harm consumer confidence in INI.

b.    Given point a and some of the information in the Control Resistance section, INI is highly capable of self-reporting non-compliance.

5.    Control Resistance (CR; aka Control Strength); (Page 6 of the FAIR BRAG): The expected effectiveness of controls, over a given time frame, as measured against a baseline level of force. The baseline level of force in this case is going to be the greater threat population. In most scenarios it is usually easy to differentiate between a “threat community” and the “threat population” it is part of. For this particular assessment (TCOMM B), they are the same – Initech Novelty Inc.

Because we are assessing risk in the context of a state of compliance versus more tangible concepts like threats and security controls – there could be some confusion about this step of the assessment. Here is my reasoning for selecting VERY LOW “Control Resistance”:

a.    INI *has* to self report annually. Now a merchant could have found a vulnerability after completing an SAQ and made a conscious decision to not update their SAQ – that is their choice and probably a whole different discussion.

b.    In the spirit of optimism, I am assuming that INI takes PCI Compliance seriously and is risk averse in the sense they would rather declare non-compliance versus have a breach or a related incident and be found to be non-compliant.

c.    Finally, there are no other regulations, standards, or legal barriers (on-going litigation, security investigations, etc…) that would negate the need for INI to self-report the vulnerability that makes them non-compliant. Thus, they *have* to and *want* to report; making the VERY LOW Control Resistance selection the most appropriate.

6.    Vulnerability (VULN); (Page 7 of the FAIR BRAG): The probability that an asset will be unable to resist the actions of a threat agent. The basic FAIR methodology determines vulnerability via a look-up table that takes into consideration “Threat Capability” and “Control Resistance”.

a.    In step four – Threat Capability (TCAP) – we selected a value of VERY HIGH.

b.    In step five – Control Resistance (CR) – we selected a value of VERY LOW.

c.    Using the TCAP and CR inputs in the Vulnerability table, we are returned with a vulnerability value of VERY HIGH.

7.    Loss Event Frequency (LEF); (Page 8 of the FAIR BRAG): The probable frequency, within a given time frame, that a threat agent will inflict harm upon an asset. The basic FAIR methodology determines LEF via a look-up table that takes into consideration “Threat Event Frequency” and “Vulnerability”.

a.    In step three – Threat Event Frequency (TEF) – we selected a value of MODERATE; between 1 and 10 times per year.

b.    The outcome of step 6 was a VULN value of VERY HIGH.

c.    Using the TEF and VULN inputs in the Loss Event Frequency table, we are returned with a LEF value of MODERATE.

*Note: the loss magnitude table used in the FAIR BRAG and the loss magnitude table for the Initech, Inc. scenarios are different. The Initech loss magnitude table can be viewed at the Initech, Inc. page of this blog.

8.    Estimate Worst-Case Loss (WCL); (Page 9 of the FAIR BRAG): Now we want to start estimating loss values in terms of dollars. For the basic FAIR methodology there are two types of loss: worst case and probable (or expected) loss. The BRAG asks us to: determine the threat action that would most likely result in a worst-case outcome, estimate the magnitude for each loss form associated with that threat action, and sum the loss magnitude. For this step, I am going to select DISCLOSURE in the threat action columns and RESPONSE / FINES & JUDGMENTS / REPUTATION, in the loss form columns, with a WCL value of SIGNIFICANT (between $20,000 and $50,000). Here is why:

a.    The most likely worst case scenario (assuming no breach or related security incident), is that INI gets fined for not being compliant. For level 1 or level 2 merchants we know that monthly fines for non-compliance can be between $5K and $25K. There is also a chance that INI’s processor could increase INI’s transaction fees for not being compliant.

b.    Because INI does not store PAN in its web application, it is hard to envision a scenario where a large number of its consumers will have their payment card information breached all at one time. The Malware TCOMM better addresses this threat.

c.    There is precedent for level three or four merchants being fined in cases of a breach incident. Again, we are not too concerned about a breach scenario but having at least one known fine at this merchant level lets us contextualize a worst case scenario.

d.    Given the above, I am estimating that, worst case, INI could be fined one or two thousand a month, plus there could be some RESPONSE costs (a few thousand dollars), and maybe some reputation damage if the non-compliance is reported publicly. My estimate would be more around the $15k-$25k range, but I need to select the pre-defined FAIR BRAG range that best fits my estimates.

9.    Estimate Probable Loss Magnitude (PLM); (Page 10 of the FAIR BRAG): In step eight, we focused on worst-case loss. Now we are going to focus on probable loss. Probable loss is for the most part always going to be lower than “worst case” loss. The BRAG asks us to: determine the threat action that would most likely result in an expected outcome, estimate the magnitude for each loss form associated with that threat action, and sum the loss magnitude. For this step, I am going to select DISCLOSURE in the threat action columns and RESPONSE, in the loss form columns, with a PLM value of LOW (between $1000 and $5,000). Here is why:

a.    Since INI is going to update its SAQ and report non-compliance, there is a RESPONSE cost to performing this; between 5 and 10 hours.

b.    Also, the Security Manager estimates that it will take between 20 and 30 hours of development time to mitigate the risk. This includes development, testing, implementation, and validation.

c.    The security manager is assuming an internal resource cost of $75 per hour for the resources required to address a and b above; which results in an estimated cost of up to $3000 (40 hours at the high end of both estimates).

d.    Keep in mind we are looking for accuracy not precision. So the predefined range of between $1000 and $5000 (LOW) is the most appropriate selection.
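The arithmetic behind the PLM estimate can be checked with a few lines of Python. The hour ranges and the $75 rate come from the points above; the variable names are just for illustration.

```python
HOURLY_RATE = 75  # internal resource cost per hour, from point c

saq_hours = (5, 10)   # updating the SAQ and reporting non-compliance (point a)
dev_hours = (20, 30)  # development, testing, implementation, validation (point b)

low_cost = (saq_hours[0] + dev_hours[0]) * HOURLY_RATE
high_cost = (saq_hours[1] + dev_hours[1]) * HOURLY_RATE

# The predefined LOW range used in this scenario.
LOW_RANGE = (1_000, 5_000)
fits_low = LOW_RANGE[0] <= low_cost and high_cost <= LOW_RANGE[1]

print(f"estimated response cost: ${low_cost:,} to ${high_cost:,}; fits LOW: {fits_low}")
```

Both ends of the estimate land inside the LOW range, which is the accuracy-over-precision point: the exact hour count matters less than which range the total falls into.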

10.     Derive and Articulate Risk; (Page 11 of the FAIR BRAG): At this point in the basic FAIR methodology we can now derive a qualitative risk rating. Using the table on page 11 of the BRAG worksheet, we use the LEF value from step seven and the PROBABLE LOSS MAGNITUDE value from step nine to derive our overall qualitative risk label.

a.    LEF value from step seven was MODERATE.

b.    PLM from step nine was LOW.

c.    Overall risk using the BRAG table on page 11 is MEDIUM.
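The chain of look-ups in steps six, seven, and ten can be sketched as three small tables. In this Python sketch only the cells this assessment actually used are filled in; the full tables are in the FAIR BRAG and are deliberately not reproduced here, so any other combination raises a KeyError. The function and table names are hypothetical.

```python
# Partial look-up tables: only the cells used in this assessment.
VULN_TABLE = {("VERY HIGH", "VERY LOW"): "VERY HIGH"}  # (TCAP, CR) -> VULN
LEF_TABLE = {("MODERATE", "VERY HIGH"): "MODERATE"}    # (TEF, VULN) -> LEF
RISK_TABLE = {("MODERATE", "LOW"): "MEDIUM"}           # (LEF, PLM) -> overall risk

def derive_risk(tef, tcap, cr, plm):
    """Walk the look-ups in the same order as steps six, seven, and ten."""
    vuln = VULN_TABLE[(tcap, cr)]
    lef = LEF_TABLE[(tef, vuln)]
    return RISK_TABLE[(lef, plm)]

risk = derive_risk(tef="MODERATE", tcap="VERY HIGH", cr="VERY LOW", plm="LOW")
print(risk)
```

Writing the steps this way makes it easy to see which inputs are judgment calls (TEF, TCAP, CR, PLM) and which values are simply read from a table.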

In Part 4, I will summarize the risk assessment findings as well as summarize some possible mitigation solutions.

** Some personal notes on this part of the assessment. The “RESPONSE” loss form type has resulted in some interesting conversations. The concept of soft dollars and hard dollars comes up very often. Some business units only care about hard dollars (dollars going outside the company) and some business units only care about soft dollars (dollars within the company). As you are explaining response costs – it may make sense to highlight this differentiation. The way that I see it is that even by estimating “soft dollars” we are able to show that response takes away from other higher value activities that can help the business achieve its goals. This concept should not be discounted.


Risk and PCI-DSS

December 17, 2008

I recently had lunch with a friend and we spent a lot of time talking about PCI. I am heavily involved in some PCI-related activities at work that have resulted in me knowing more about PCI than I would care to. I wanted to document some of the subject matter we discussed, especially with regard to: how to approach a PCI compliance assessment project, states of compliance, and articulating risk related to requirement gaps.

So where to begin? Well, you can start by going to the PCI Standards Council website – there is a load of information there. You can find out:

1.    The merchant level of your business based on the number of card transactions your company performs.
2.    The kind of validation actions you are required to take.
3.    How your “validation actions” need to be validated.

One point I would like to throw out there for companies with numerous subsidiaries is you need to work with your QSA to determine if the subsidiaries that are under your company umbrella are separate from a compliance perspective or if they are under the compliance status of the parent company. Not understanding this early in the process could result in a lot of wasted time.

Another point on this topic is that you need to understand the flow of PAN through your environment. Visually (and accurately) representing this very early in the process will reduce assumptions and be of great benefit through the assessment lifecycle.

In general, PCI Compliance seems to be very binary – you are either compliant or not; the severity of one’s non-compliance is where the grey is. Non-compliance can result in fines, increased transaction fees, and whatever other penalties exist. I will end this thought with stating that not all QSAs are created equal nor are all payment processors created equal. On the PCI Council website – they advise people to choose QSAs that know your industry – that is great advice. Some payment processors will recommend QSAs for you. Do your homework and make sure it is a QSA you are comfortable with.

Let’s talk risk and PCI. I think of PCI risk differently than some folks. I separate the risk of being non-compliant from the risk of the gaps themselves. Let me explain.

Risk associated with just being non-compliant (no incident has occurred): Merchants can be fined for not being compliant. In this type of scenario – the merchant’s payment processor could levy the fine. As a matter of fact, the card associations can fine the payment processor who can then fine the merchant. Fine amounts can vary and there is no guarantee that you will be fined for being out of compliance. Also, it would appear that fines for this type of non-compliance are independent of the number of requirement gaps. So whether a merchant has one requirement gap or dozens – the fine is for being non-compliant. If an organization is not compliant – the expected annualized loss magnitude for not being compliant would be the expected monthly fine amount multiplied by 12.
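As a quick illustration of that annualization, here is a minimal Python sketch. The $2,000 monthly fine is a hypothetical figure, and the function name is made up for this example. Note that the number of gaps is not an input, since the fine is for the state of non-compliance itself.

```python
def annualized_noncompliance_loss(expected_monthly_fine):
    """Expected annual loss for being non-compliant, with no incident."""
    return expected_monthly_fine * 12

# Hypothetical expected monthly fine; the same whether there is one gap or a dozen.
annual = annualized_noncompliance_loss(2_000)
print(f"expected annualized loss: ${annual:,}")
```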

Risk associated with being non-compliant; an incident has occurred. This is where the pucker factor starts. This is a situation where a merchant has suffered a breach and is non-compliant or possibly even compliant. There are numerous resources out there about the fines that can be levied against the merchants by the card associations, payment processor fines / increased transaction charges, card replacement costs, reputation costs and much, much more. This is a worst case loss scenario from a risk perspective. For large merchants, I would think that monetary impact would easily be hundreds of thousands of dollars to a couple million dollars; more depending on the size of the breach. TJX is a good *worst* worst case benchmark.

Risk associated with the gaps themselves. When it comes to assessing the gaps – or risk issues – I prefer to assess them independent of the compliance status. Even though a single risk issue can result in a state of non-compliance which has its own risk (fine) amounts – what good is it for the risk issue itself to assume the risk amount of not being compliant in cases where you have multiple gaps? So, I prefer to think of them independently. This allows for better mitigation prioritization, cost benefit analysis, and being more objective in how you articulate the risk to the decision makers.

If you think about what I have written you can easily imagine a situation where a merchant could justify remaining out of compliance because the cost of not being compliant (in absence of a breach) is cheaper than becoming compliant. For some companies that may be a viable option (though few would probably ever admit that is their approach), but most companies want to be compliant, want to show due care to their customers / consumers, and do not want to take the chance on a breach and the reputation impact that comes with that.

Finally, for most companies – achieving compliance cannot happen with one stroke of the magic wand. There may be a period of time between gap identification and mitigation, which means the company needs to manage that risk accordingly. I hope some of my thoughts above might help with that (combined with your legal counsel input, leadership input, your QSA, etc.).

Now, if the PCI Security Standards Council would take more of a risk based approach in determining level of compliance and magnitude of fines – that would be pretty cool. And no, using CVSS scores to tag technical vulnerabilities is not really a risk based approach.

** Late addition to the post ** – My next post will be about a positive PCI QSA experience I recently had.

