Wonder Twin Powers Activate…

March 7, 2013

…form of risk professional. I really miss blogging. The last year or so has been a complete gaggle from a relocation and time-management perspective. So naturally, discretionary activities – like blogging – take a back seat. I want to share a few quick thoughts around the topic of transitioning from a pure information technology / information security mindset to a risk management professional mindset.

1. Embrace the Gray Space. Information technology is all about bits, bytes, ones and zeros. Things either work or don’t work; it is either black or white, it is either good or bad – you get the point. In the discipline of risk management we are interested in everything between the two extremes. It is within this space where there is information to allow decision makers to make more well informed decisions.

2. Embrace Uncertainty. Intuitively, the concept of uncertainty is contrary to a lot of information technology concepts. Foundational risk concepts revolve around understanding and managing uncertainty and infusing it into our analysis / conversation with decision makers. There is no reason why this cannot be done within information risk management programs as well.  At first, it may feel awkward as an IT professional to admit to a leader that there is uncertainty inherent within some of the variables included in your analysis. However, what you will find – assuming you can clearly articulate your analysis – is that infusing the topic of uncertainty in your conversations and analysis has indirect benefits. Such an approach implies rigor, maturity and builds confidence with the decision maker.

3. Find New Friends. Notice I did not type find different friends. There is an old adage that goes something to the effect of “you are who you surround yourself with”. Let me change this up: “you are who you are learning from”. You want to learn risk management? Indulge yourself in non-IT risk management knowledge sources, learn centuries old principles of risk management and then begin applying what you have learned to the information technology / information security problem space. Here are just a few places to begin:

a. https://www.societyinforisk.org/
b. Risk Management Magazine
c. The Risk Management Society
d. Property & Casualty  – Enterprise Risk Management

4. Change Your Thinking. This is going to sound heretical but bear with me. Stop thinking like an IT professional and begin thinking like a business and a risk management professional. Identify and follow the money trails for the various risk management problem spaces you are dealing with. Think like a commercial insurer. An entire industry exists to reduce the uncertainty associated with technology-related, operational risk – when bad things happen. Thus, learn how commercial insurers think so you can manage risk more effectively without having to overspend on third party risk financing products – as well as manage risk in such a way that can tie back to the financials – feelings and emotions. This is why I am so on-board with the AICPCU’s Associate in Risk Management (ARM) professional designation. You can also check out the FAIR risk measurement methodology which is also very useful for associating loss forms to adverse events which can also help tell the story around financial consequences.

5. Don’t Die On That Hill. I have to thank my new boss for this advice. Choose your risk management battles wisely and in the heat of the conversation ask yourself if you need to die on this hill. Not all of our conversations with decision makers, leaders or even between ourselves – as dear colleagues – is easy. It is way too easy for passion to get in the way of progress and influencing. Often, if you find yourself “on the hill” asking if you need to die – something has gone terribly wrong. Instead of dying and ruining a long term relationship – take a few steps back, get more information that will help in the situation, regroup and attack again. This is an example of being a quiet professional.

That is all for now. Take care.


OpenPERT – A FREE Add-In for Microsoft Office Excel

August 15, 2011

INTRODUCTION. In early June of this year, Jay Jacobs and I started having a long email / phone call discussion about risk modeling, model comparisons, descriptive statistics, and risk management in general. At some point in our conversation the topic of Excel add-ins came up and how nice it would be to NOT have to rely upon 3rd party add-ins that cost between hundreds and thousands of dollars to acquire. You can sort of think of the 80-20 rule when it comes to out of the box Excel functionality – though it is probably more like 95-5 depending on your profession – most of the functionality you need to perform analysis is there. However, there are at least two capabilities not included in Excel that are useful for risk modeling and analysis: the betaPERT distribution and Monte Carlo simulation. Thus,  the need for costly 3rd-party add-ins or a free alternative, the OpenPERT add-in.

ABOUT BETAPERT. You can get very thorough explanations about  the betaPERT distribution here, here, and here. What follows is the ‘cliff notes’ version. The betaPERT distribution is often used for modeling subject matter expert estimates in scenarios where there is no data or not enough of it. The underlying distribution is the beta distribution (which is included in Microsoft Office Excel).  If we can over-simply and define a distribution as a collection or range of values – the betaPERT distribution when initially used with three values, such as minimum, most likely (think mode) and maximum values will create a distribution of values (output) that can then be used for statistical analysis and modeling. By introducing a fourth parameter – which I will refer to as confidence, regarding the ‘most likely’ estimate – we can account for the kurtosis – or peakedness – of the distribution.

WHO USES BETAPERT? There are a few professions and disciplines that leverage the betaPERT distribution:

Project Management – The project management profession is most often associated with betaPERT. PERT stands for Program (or Project) Evaluation and Review Technique. PERT was developed by the Navy and Booz-Allen-Hamilton back in the 1950’s (ref.1; see below ) – as part of the Polaris missile program. Anyway, it is often used today in project management for project / task planning and I believe it is covered as part of the PMP certification curriculum.

Risk Analysis / Modeling – There are some risk analysis scenarios where due to a lack of data, estimates are used to bring form to components of scenarios that factor into risk. The FAIR methodology – specifically some tools that leverage the FAIR methodology as applied to IT risk – is such an example of using betaPERT for risk analysis and risk modeling.

Ad-Hoc Analysis – There are many times where having access to a distribution like betaPERT is useful outside the disciplines listed above. For example, if a baker is looking to compare the price of her/his product with the rest of the market – data could be collected, a distribution created, and analysis could occur. Or, maybe a church is analyzing its year to year growth and wants to create a dynamic model that accounts for both probable growth and shrinkage – betaPERT can help with that as well.

OPENPERT ADD-IN FOR MICROSOFT OFFICE EXCEL. Jay and I developed the OpenPERT add-in as an alternative to paying money to leverage the betaPERT distribution. Of course, we underestimated the complexity of not only creating an Excel add-in but also working with the distribution itself and specific cases where divide by zero errors can occur. That said, we are very pleased with version 1.0 of OpenPERT and are excited about future enhancements as well as releasing examples of problem scenarios that are better understood with betaPERT analysis. Version 1.0 has been tested on Microsoft Office Excel 2007 and 2010; on both 32 bit and 64 bit Microsoft Windows operating systems. Version 1.0 of OpenPERT is not supported on ANY Microsoft Office for Mac products.

The project home of OpenPERT is here.

The downloads page is here. Even if you are familiar with the betaPERT distribution, please read the reference guide before installing and using the OpenPERT add-in.

Your feedback is welcome via support@openpert.org

Finally – On behalf of Jay and myself – a special thank you to members of the Society of Information Risk Analysts (SIRA) that helped test and provided feedback on the OpenPERT add-in. Find out more about SIRA here.

Ref. 1 – Malcolm, D. G., J. H. Roseboom, C. E. Clark, W. Fazar Application of a Technique for Research and Development Program Evaluation OPERATIONS RESEARCH Vol. 7, No. 5, September-October 1959, pp. 646-669


Deconstructing Some HITECH Hype

February 23, 2011

A few days ago I began analyzing some model output and noticed that the amount of loss exposure for ISO 27002 section “Communications and Operations Management” had increased by 600% in a five week time frame. It only took a few seconds to zero-in on an issue that was responsible for the increase.

The issue was related to a gap with a 3rd party of which there was some Health Information Technology for Economic and Clinical Health Act (HITECH) fine exposure. The estimated HITECH fines were really LARGE. Large in the sense that the estimates:

a.    Did not pass the sniff test
b.    Could not be justified based off any documented fines / or statutes.
c.    From a simulation perspective were completely skewing the average simulated expected loss value for the scenario itself.

I reached out to better understand the rationale of the practitioner who performed the analysis and after some discussion we were in agreement that some additional analysis was warranted to accurately represent assumptions as well as refine loss magnitude estimates – especially for potential HITECH fines. About 10 minutes of additional information gathering yielded valuable information.

In a nutshell, the HITECH penalty structure is a tiered system that takes into consideration the nature of the data breach, the fine per violation and maximum amounts of fines for a given year. See below (the tier summary is from link # 2 at the bottom of this post; supported by links # 1 and 3):

Tier A is for violations in which the offender didn’t realize he or she violated the Act and would have handled the matter differently if he or she had. This results in a $100 fine for each violation, and the total imposed for such violations cannot exceed $25,000 for the calendar year.

Tier B is for violations due to reasonable cause, but not “willful neglect.” The result is a $1,000 fine for each violation, and the fines cannot exceed $100,000 for the calendar year.

Tier C is for violations due to willful neglect that the organization ultimately corrected. The result is a $10,000 fine for each violation, and the fines cannot exceed $250,000 for the calendar year.

Tier D is for violations of willful neglect that the organization did not correct. The result is a $50,000 fine for each violation, and the fines cannot exceed $1,500,000 for the calendar year.

Given this information and the nature of the control gap – one can quickly determine the penalty tier as well as estimate fine amounts. The opportunity cost to gather this additional information was minimal and the benefits of the additional analysis will result  in not only more accurate and defendable analysis – but also spare the risk practitioner from what would have been certain scrutiny from other IT risk leaders and possibly business partner allegations of IT Risk Management once again “crying wolf”.

Key Take-Away(s)

1.    Perform sniff tests on your analysis; have others review your analysis.
2.    There is probably more information then you realize about the problem space you are dealing with.
3.    Be able to defend assumptions and estimates that you make.
4.    Become the “expert” about the problem space not the repeater of information that may not be valid to begin with.

Links / References associated with this post:

1.    HIPAA Enforcement Rule ref. HITECH <- lots of legalese
2.    HITECH Summary <- less legalese
3.    HITECH Act scroll down to section 13410 for fine information <-lots of legalese
4.    Actual instance of a HITECH-related fine
5.    Interesting Record Loss Threshold Observation; Is 500 records the magic number?


Simple Risk Model (Part 3 of 5): Simulate Loss Magnitude

December 22, 2010

Part 1 – Simulate Loss Frequency Method 1
Part 2 – Simulate Loss Frequency Method 2

In parts one and two of this series we looked at two methods for simulating loss frequency. Method one – while useful – has shortcomings as it primarily requires working with expected loss frequency values less then 1 (once per year). In addition, with method one, it was not possible to determine iterations where loss frequency could be greater then once per year.

Method two overcame these limitations. We leveraged the Poisson probability distribution (discrete distribution) as well as an expected loss frequency value and a random value between 0 and 1 to return a loss value (an integer) for any given iteration. Using this method – about 10% of our iterations resulted in loss events and some of those iterations had multiple loss events. From my perspective method two is the more useful of the two – especially since it has the potential to account for low probability situations where there could be numerous loss events for any simulation iteration.

The purpose of this post is to simulate loss magnitude. Conceptually, we are going to do what we did with loss frequency method two – but our distribution and input parameters will differ. To simulate loss magnitude we need four things:

1.    A continuous probability distribution.
2.    A random value between 0 and 1
3.    An expected or average loss magnitude
4.    A loss magnitude standard deviation

Continuous Probability Distribution. Technically, if you have internal or external loss magnitude data, you would analyze that data and fit the data to an appropriate continuous probability distribution. There are dozens of such distributions. There are often times where we have limited data or we need to make good faith (or “educated”) assumptions about the shape of our loss magnitude curve. A lot of IT risk scenarios loss magnitude curves are often assumed to be normal or lognormal in nature. Normal is often assumed but it has its limitations since there can be negative values and rarely is there a “perfect” normal loss magnitude curve for IT risk scenarios. However, most of the “normal-like” distributions converge to normal (as data points increase). Thus, for the purposes of demonstration I am going to use the normal distribution.

Random Value Between 0 and 1. Because we are dealing with uncertainty and a distribution, we will use random values between 0 and 1 in our probability distribution; think Excel function RAND().

Expected or Average Loss Magnitude. Statistics 101 – If you take the sum of X values and divide by X you get the average. Quantitative risk analysis methodologies like FAIR can facilitate deriving an average loss magnitude estimate. Or maybe, you have actual loss magnitude data points. How you derive average loss magnitude is not the focus of this post – just remember that to use the normal distribution you need that average loss magnitude value.

Loss Magnitude Standard Deviation. More Statistics 101. At a high level, standard deviation is a statistic or measure of how spread out our data points are relative to the mean. The larger the number, the greater or flatter our distribution (think bell curve) will be; the smaller the number – the more narrow the bell curve will be. In the interest of brevity, it is assumed that either you can use existing Excel functions to calculate a standard deviation from your loss magnitude data points, or your risk analysis tool sets will provide this value to you. In some cases you may not have actual data sets to calculate a standard deviation let alone an average magnitude value – in those cases we have to make our best estimates and document assumptions accordingly.

How do these work together? In layman’s terms – given a normal loss distribution with an average loss magnitude of $5000 and a standard deviation of $1000; what is the loss value (inverse cumulative value) at any point in the distribution, given a random probability value?

You may want to download this Excel spreadsheet to reference for the rest of the post (it should work in Excel 2003, Excel 2007 and Excel 2010; I have not tested it on Office for Mac). Reference tab “magnitude” and make sure you view it in Excel and NOT in Google Apps.


a.    The average loss magnitude amount is $5000 (cell B1; tab “magnitude”)

b.    The loss magnitude standard deviation is $1000 (cell B2; tab “magnitude”)

c.    For purposes of demonstration, we will number some cells to reflect the number of iterations (A9:A1008; A9=1; A10=A9+1; drag A10 down to you get to 1000).

d.    In Excel, we would use the =RAND() function to generate the random values in cells B9:B1008.

e.    Now, in column C beginning in cell C9 – we are going to combine a Normal probability distribution with our average loss ($B$1), standard deviation ($B$2) and the random value(s) in column B to return a loss value. In other words, given a normal distribution with mean $5000 and standard deviation of $1000 – what is the value of that distribution given a random value between 0 and 1 – rounded to the nearest 10th? You would type the following in C9 and then drag C9 down to C1008:
=ROUND(MAX(NORMINV(B9,$B$1,$B$2),0),-1)

Let’s dissect this formula.

i.    ROUND. I am going to round the output of this formula to the nearest 10; annotated by the -1.
ii.    MAX. Because we are using the normal distribution and because some values could be less then zero which is not applicable for most IT scenarios, we are going to compare the value generated by the NORMINV function to 0. Which ever is larger is the value that then gets rounded to nearest 10.
iii.    NORMINV. This is the function built into Excel that returns an inverse cumulative value of a normal distribution given a probability, a mean and a standard deviation.

f.    Once you have values in all the cells – hit F9 a few times.

g.    Cell B3 gives the minimum loss value from cells C9 through C1008. The random value associated with the minimum value is probably less then 0.00xxxx.

h.    Cell B4 gives the maximum loss value from cells C9 through C1008. The random value associated with the maximum value is probably greater then 0.99xxxx.

i.    The histogram shows the count of iterations whose loss magnitude values falls within a loss magnitude bin. If you drew a line around the tops of each column it would resemble a bell curve. We expect to get this since we are using the normal distribution.

j.    Press the F9 key; new random values will be generated. Every time you press F9 think of it as a new simulation with 1000 iterations. Press F9 lots of times and you will notice that the histogram changes as well. While individual bin counts will change – the general shape of the histogram does not.

k.    By the way, if you change the average loss magnitude value in cell B1 – the histogram will probably break. But you can change the value in B2 to 500, hit F9 a few times and observer how the bell-curve shape becomes more narrow. Or, change B2 to 2000 and you will see a much flatter bell curve.

KEY TAKE-AWAY(S)

1.    As we did with simulating loss frequency, we leverage randomness to simulate loss magnitude.

2.    While we typically talk about an average loss magnitude value; losses can range in terms of magnitude. Being able to work within a range of loss values gives us a more complete view of our loss potential.

In part four of the series, we will combine loss frequency and loss magnitude into one simulation. For every iteration, we will randomly derive a loss frequency value (an integer) and a loss magnitude value. We will then calculate an expected loss, which is the product of the loss frequency and the loss magnitude values. Perform this cycle thousands or millions of time and you now have an expected loss distribution.


Simple Risk Model (Part 1 of 5): Simulate Loss Frequency #1

October 25, 2010

Let’s start this series by defining risk. I am going to use the FAIR definition of risk which is: the probable frequency and probable magnitude of future loss. From a modeling perspective, I need at least two variables to model the risk for any given risk issue: a loss frequency variable and a loss magnitude variable. Hopefully, you are using a risk analysis methodology that deconstructs risk into these two variables…

The examples I am sharing in this blog series are an example of stochastic modeling. The use of random values as an input to a probability distribution ensures there is variation in the output; thus making it stochastic. The variable output allows for analysis through many different lenses; especially when there are additional (meaningful) attributes associated with any given risk issue (policy section, business unit, risk type, etc…).

Part 1 and 2 of this series will focus on “probable or expected [loss] frequency”. Frequency implies a number of occurrences over a given period of time. Loss events are discrete in nature; there are no “partial” loss events. So, when we see probable loss frequency values like 0.10 or 0.25 – and our time period is a year – we interpret that to mean that there is a 10% or 25% chance of a loss event in any given year. Another way of thinking about it is in terms of time; we expect a loss event once every ten years (0.10) or once every four years (0.25). Make sense?

You may want to download this Excel spreadsheet to reference for the rest of the post (it should work in Excel 2003, Excel 2007 and Excel 2010; I have not tested it on Office for Mac).

Make sure you view it in Excel and NOT Google Apps.

In a simulation, how would we randomly draw loss frequency values for a risk issue whose expected loss frequency is 0.10, or once every ten years? I will share two ways; the first of which is the remainder of this post.

For any simulation iteration, we would generate a random value between 0 and 1; and compare the result to the expected loss value

a.    The stated expected loss frequency is 0.10 (cell B1; tab “loss 1”)

b.    For purposes of demonstration, we will number some cells to reflect the number of iterations (A6:A1005; A6=1; A7=A6+1; drag A7 down to you get to 1000).

c.    In Excel, we would use the =RAND() function to generate the random values in cells B6:B1005.

d.    We would then compare the randomly generated value to the expected loss frequency value in cell B1; with this code in C6 dragged down to C1005:

=IF(B6<=$B$1,1,0)

i.    If the generated random value in cell B6 is equal to or less then 0.1000 (cell B1), then the number of loss events for that iteration is 1.
ii.    If the generated random value in B6 is greater then 0.1000, then the number of loss events for that iteration is 0

e.    Once you have values in all the cells, you can now look at how many iterations resulted in a loss and how many did not. Cell B2 counts the number of iterations you had a loss and cell B3 counts the number of iterations you did not have a simulated loss; their corresponding percentages are next to each other.

f.    The pie chart shows the percentage and count for each loss versus no loss.

g.    Press the F9 key; new random values will be generated. Every time you press F9 think of it as a new simulation with 1000 iterations. Press F9 lots of times and you will notice that in some simulations loss events occur greater then 10% of the time and in some simulations less then 10% of the time.

h.    What you are observing is the effect of randomness. Over a large number of iterations and/or simulations we would expect the loss frequency to converge to 10%.

i.    Another thing worth mentioning, is that output from the RAND() function is uniform in nature. Thus, there is equal probability of all values between 0 and 1 being drawn for any given iteration.

j.    Since our expected loss frequency is 0.1000 and the RAND() functions output is uniform in nature – we would expect to see 10% of our iterations result in loss; some were more and some were less.

There are some limitations with this method for simulating the loss frequency portion of our risk model:

1.    If the expected loss frequency is greater then 1 then using RAND() is not viable, because RAND() only generates values between 0 and 1.

2.    In iterations where you had a loss event; this method does not reflect the actual number of loss events for that iteration. In reality, there could be some iterations (or years) where you have more then one loss event.

Some of the first models I built used this approach for generating loss frequency values. There is usefulness regardless of its simplicity. However, there are other methods to simulate loss frequency that are more appropriate for modeling and overcome the limitations listed above. In the next post, we will use random values, a discreet probability distribution and the expected loss frequency value to randomly generate loss frequency values.

NOTES / DISCLAIMERS: I am intentionally over-simplifying these modeling examples for a few reasons:
1.    To demonstrate that IT Risk modeling is achievable; even to someone that is not an actuarial or modeling professional.
2.    To give a glimpse of the larger forest past some of the trees blocking our view within the information risk management profession.
3.    As with any model – simple or complex – there is diligence involved to ensure that the right probability distributions and calculations are being used; reflective of the data being modeled.
4.    In cases where assumptions are being made in a model; they would be documented.


ANALYTICAL THINKING or F.U.D?

August 28, 2010

Image Source; fudsec.com

SHORT BLOG POST VERSION:
I get a lot of satisfaction from teaching others the FAIR methodology. But equally satisfying is me knowing that I am helping build a culture of analytical thinking for both the class participant and our employer.

LONG BLOG POST VERSION:
This past week I had the privilege of teaching a three-day BASIC FAIR course at my employer. This is the second FAIR course I have taught and I can honestly state that I learned a lot about my company and the course participants; most of which I will be interacting with in the coming months in a consulting capacity.

Teaching the FAIR methodology is very challenging and rewarding. Because people’s preconceived notions of risk are challenged within minutes of being introduced to FAIR – there is no shortage of AH-HAH moments for them as well as no shortage of the instructor being stretched to unimaginable limits to take their examples and questions and view them through the lens of FAIR. I have walked away from both classes feeling like I learned more then they did.

I am currently reading “The Flaw of Averages” by Sam L. Savage. I highly recommend this book for a seasoned information risk practitioner. I will probably reference the book may times in future posts but for this post I want to talk about a sentence or two from Chapter 11; page 85 (hardcover). Savage references Well Fargo in 1997 and how they ‘maintained a culture of analytical thinking’.

So ask yourself this: Does my information risk management program instill a culture of analytical thinking or one of F.U.D. (Fear, Uncertainty & Doubt)?

The FAIR methodology when used correctly will force the practitioner to be analytical. But for an entire information risk management program to require all of its members to go through this training is telling of the culture we are creating. And guess what? This analytical thinking is not limited to our information risk management program. Our practitioners have to be able to explain their risk analysis to those individuals (IT & Business) accountable for the risk and responsible for the mitigation activity.

In summary, I get a lot of satisfaction from teaching others the FAIR methodology. But equally satisfying is me knowing that I am helping build a culture of analytical thinking for both the class participant and our employer.


Heat Map Love

May 6, 2010

First, I would like to welcome Jack Jones *back* to the world of risk blogging. Jack blogged a few weeks ago on the subject of heat maps; “Lipstick on Pigs” and “Lipstick Part II”; prompting a response by Jared Pfost of Third Defense. These are great posts that underscore the need to structure and leverage heat maps in an effective and defensible manner.

The purpose of this post is to share a recent “ah hah” moment involving heat maps and loss distributions. Whether you are an advocate or not of risk quantification or simulation modeling – it is hard to criticize one for having tools or procedures in place that essentially serve as a “risk sniff test”. I consider reconciling portions of a loss distribution to a heat map – a pretty useful sniff test.

QUESTION TO BE ANSWERED: How do I reconcile – or validate – the plotting of a heat map bubble with a loss distribution?

Well, it depends…but let’s establish some context.

•    5-by-5 heat map. The X axis of the heat map represents “Frequency of Loss”; the Y axis represents “Magnitude of Loss”. Each axis is broken into 5 sections.

•    Let’ say we have a heat map whose bubbles represent categories of risk issues (ISO 27002 categories, BASEL II OpRisk Categories, etc…).

•    At a minimum, all of these issues have been assessed with some methodology and/or tool (I prefer FAIR) that allow us to associate the frequency and magnitude of loss for each and every issue.

•    We can perform thousands of simulation iterations for each and every issue in the risk repository, perform analysis, determine categories of risk that are contributing the most to various percentiles of the loss distribution, and then associate them with a heat map.

•    For the purpose of this post we are going to make a good faith assumption that our loss distribution resembles a log-normal or normal “like” distribution.

Back in my “Rainbow Risk” post I shared an example of a “rainbow chart”; a 100% stacked bar chart representing the contribution of loss that a category contributes (by percentage) to any given loss distribution percentile. For example, in the rainbow chart on that post, it showed that Business Continuity Mgmt category of risk issues accounted for about 55% of the risk in the 99th percentile. On a heat map, most significant IT Business Continuity issues are probably going to be very low frequency, very high magnitude events. Thus, it is fairly intuitive that very low frequency / very high frequency magnitude loss events would “drive” the tail of a given loss distribution.

In the images below, I have mapped areas on a heat map (image 1) to areas on a distribution (image 2). Specifically, I am trying to illustrate how frequency and magnitude for any given issue factors into or most likely represented in a loss distribution.

Image 1
Image 2

Area A – Very low frequency, very high magnitude risk issues. These types of events or risk issues drive the tail portion of a loss distribution.

Area B – Very low frequency, moderate or high magnitude risk issues; or low to moderate frequency, very high magnitude loss events. It can be said that these type of issues also drive the tail – but maybe not as much past the 99th percentile like issues associated with Area A.

Area C – Low to Moderate frequency, moderate or high magnitude.  These issues are best represented in the middle of the distribution; generally speaking, around one standard deviation on both sides of the mean.

Area D – Very frequent, moderate or high magnitude. Loss associated with these issues is not as severe as those of Areas A and B; but are typically greater then the mean expected loss.

Area E – Very frequent, very high magnitude. Generally speaking, these issues probably drive the portion of the distribution between 1 and 2.5 standard deviations (to the right of the mean).

Area F – Low or moderate frequency, low or moderate magnitude. These issues best factor into the area of the distribution left of the mean. Loss associated with these issues is less then the mean.

In closing, I would share at least one use case for performing this analysis or validation. Key risk heat maps. If all of your issues have frequency and magnitude values as well as some other attributes associated with the issue, you can:

1.    Perform simulations on all of these issues.
2.    Calculate their contributions to various distribution percentiles
3.    Analyze the results by various attributes (ISO 27002, BASEL II, IT Process, etc…).
4.    Chart derived information (categories of risk) on a heat map
5.    Review for plausibility / accuracy (this should occur all the time)

I welcome any feedback!


The Risk Is Right.

May 21, 2009

Of particular interest to me right now is the appropriate risk amount to report on for any given issue. Being IT folks –warning broad stroke in progress – we prefer to want “precise” numbers that are not refutable by anyone and are supported by the over-whelming amount of electronic data that we have at our disposal. However, in reality – and in the information security risk management space – we lack such data. As such, there are information security industry super-stars that discourage the idea of taking a stand on quantifying information security risk; and from my perspective – devalue the subject matter expertise (some industry folks water this down to the word “opinion”) that security professionals offer to their organization. I guess I am getting off-topic – so let’s get back to topic: appropriate risk value to report on.

Quite a few risk quantification tools and methodologies tend to produce a risk value often referred to as the “expected loss amount”. Typically, this is the product of a loss event frequency value (LEF for those FAIR-minded folks) and the average monetary loss magnitude. For most information security risk practitioners and the organizations that employ them, the expected loss amount may be the most appropriate risk value to articulate to decision makers for any given risk issue. However, an additional minute or two of analysis of your loss distribution could result in you wanting to articulate a risk amount different then the expected loss amount.

Let’s take a look at some phrases and a few examples.

Loss event frequency: The probable frequency of which we expect a loss to incur.

Average loss magnitude: This is the average (or mean) loss value from a simulation or actual loss events. For example, if I perform 1001 simulations where a value between $1 and $10 dollars is drawn– I would add up the sum of all the simulations and divide it by 1000.

Expected loss magnitude: This is the product of the loss event frequency (most often the mean LEF) and the average loss magnitude. For example, if my loss event frequency is 0.1 per year (once every ten years), and my average loss magnitude is $10,000; my expected loss magnitude would be $1000.

Remember what the median is? The median is the number that is directly in the middle of a range of numbers. For example, if we perform 1001 simulations where a value between $1000 and $20,000 could be drawn and the number in the middle (value number 501, when ordered from lowest to highest) is $10,000 – that is our median.

At this point we have what could be the first comparison in determining which risk value to report. Generally speaking, if the mean and the median are close to each other, then the data set – or loss magnitude values may not be too skewed. If the mean is a lot higher then the median, then this could be the result of large loss magnitude values that are having a significant impact on the mean – somewhat “inflating” the average loss magnitude. The same concept applies is the mean is a lot lower then the median.

In some cases, using the mean loss magnitude to calculate the expected loss magnitude is appropriate. In other cases, the median may be more appropriate because the values influencing the mean are so far out in the distribution – or tail – that it would be inappropriate to use the average loss magnitude.

Now let’s look at another example. We have a risk scenario where the average loss value (per event) is $73,400, and you expect on average, 4 loss events per year. The annual expected loss ($73,400 x 4) is $293,600. However, we are dealing with probabilities and distributions and in reality there could be one year where we only have one loss event related to this specific issue and some years where we might have 10 loss events. How do we deal with this?

I performed a small experiment to help me better understand this.
From a previous risk issue, I derived the mean and standard deviations from the simulated loss event frequency (LEF) values and loss magnitudes (LM) values. In Excel, I wrote a small VBA-macro that allows me to define some simulation parameters and reference both the LEF and LM mean and standard deviation values. For each simulation iteration, the macro generates an LEF value based off a distribution that leverages the LEF mean and standard deviation. Then for each LEF value ( I round to the nearest integer), the macro then generates a loss magnitude value for each loss event and then sums those loss magnitude values. For example, if my LEF is two, then my utility randomly generated two loss values, using a distribution that leverages the LM mean and standard deviation; then sums those two values. The simulation continues until the desired number of iterations is complete. For my small experiment, I performed a simulation consisting of 3001 iterations. You can see the LEF and LM means and standard deviations in the image below.

risk_right_1_090521

Now that we have simulated loss values, we want to visually represent them. I want to represent the values two ways.

risk_right_2_090521

This is a small scatter plot diagram with a smoothed line. In Excel we create loss magnitude bins and count the number of times each iteration’s loss magnitude sum fell into these bins. As you can see the loss magnitude values look normally distributed.

risk_right_3_090521

In this chart, I want to show the percentage of loss magnitude values in relation to the loss amounts themselves. So in this chart, my simulated loss is greater then $14,924; 99.999% of the time. However, there is roughly a 10% chance that the risk could be greater then $404,924.

So what does all of this mean? What it means is that even though our expected loss value was $293,600* – the simulation resulted in the values below:

risk_right_4_090521

The lowest simulated loss magnitude was: $14,924.
The largest simulated loss magnitude was: $620,000.
The mean (average) loss magnitude was: $308,636.
The median of the loss magnitude value was: $309,000.
There is a 20% chance (80th percentile or 1-in-5), that the loss amount could be: $380,000.
There is a 5% chance (95th percentile or 1-in-20), that the loss amount could be: $441,900.

Note: The values above would change from simulation to simulation – but not significantly assuming the input parameters (LEF and LM mean and standard deviation values) remain constant.

Note: It is important to note that the term “tail risk” is usually associated with values at the 97.5th percentile or greater, or less then 2.5% of the time. While the numbers at the 1-in-20 and various tail risk points are tempting to use: please keep in mind that these are low probability / high magnitude loss amounts. Grandstanding on these values just for the shock factor – is the equivalent of crying wolf and undermines the value we can provide to our decision makers.

Now, our decision maker is faced with a harder decision. Do I assume or mitigate the risk associated with an expected loss amount of $308,636 or does this 1-in-5 loss magnitude value of $380,000 stand out to me? While it may seem like we are dealing with a small difference between the mean and the 1-in-5 values – risk tolerance, risk thresholds, and risk management strategies vary between decision makers and organizations.

Here is the take away: as you start going down the risk quantification road keep the following in mind:

1.    There is NO absolute 100% guaranteed predictable loss value – especially from a simulated loss distribution; but you have to report something. Thus choose a tool that lets you see the points from the distribution – not just a single value.

2.    Be mindful of how you articulate risk values. A consistent theme I hear and read about on a regular basis is that risk implies uncertainty – always. You need to underscore this when articulating risk to leadership.

3.    Have the discussion with your management / decision makers as to what loss value they would prefer to see. Their feedback may highly influence the value you report.

4.    Use the right value for the right purpose. For single risk issues, expected loss amounts may be appropriate. For a loss distribution (model) that represents dozens or even hundred of risks – the 1-in-5, 1-in-10, 1-in-20 and maybe some tail risk values may be the best values to react to or budget for.

Have a great Memorial Day weekend!

* In the interest of transparency, the observant reader will notice that my mean LEF is actually 4.17. For simulation purposes, I have rounded generated loss values to the nearest integer. In a given year, you can’t have 4.17 loss events. You would either have 4 or you would have 5. However, if you take the product of 4.17 and $73,400; $306,078 – you will notice that it is within a few thousand dollars of the simulation’s mean and median values.


Stuart King – Information Security Annoyances – Response 2

March 21, 2009

In my last post, I provided some thoughts on one of Stuart King’s Top 5 Information Security Annoyances; specifically, security awareness programs. In this post, I want to touch on Stuart’s comments regarding Risk Modeling. Here are Stuart’s thoughts:

“Many “experts” preach the importance of working through risk models. It’s a load of tosh. No matter which way you try to do it, you’ll always come out with the answer you first thought of.  You might as well use a crystal ball and read tarot cards. Nobody needs to work through a complex risk model to understand that if a retail website suffers a denial of service that it’ll have some financial consequences, or  that if the internet connection is lost that there wont be  access to the..er..Internet. I’ve got better, more constructive and practical ways to spend my day than conspiring over risk models. Much more relevant is threat modelling – understand your systems and know the business so that you can make relevant risk-based decisions.”

In November of 2008, I posted a rebuttal regarding Stuart’s dislike for my approach to risk assessments. I am still convinced that Stuart’s approach is more a vulnerability assessment rather then a risk assessment – the latter of which focuses more on frequency of loss and impact while also accounting for how “vulnerable” something is. So, it is no wonder that Stuart is down on risk modeling; if the risk assessment foundation he is using is cracked, then any risk model built on top of it is probably flawed.

So what is a risk model? It means different things to different people. But here is a general description that I like from the Inter-American Development Bank : “A mathematical, graphical or verbal description of risk for a particular environment and set of activities within that environment. Useful in Risk Assessment for consistency, training and documentation of the assessment.”

Now, modeling activities themselves can be both complex and simple. I *think* that the complexity that Stuart may be referring to is more in the context of the modeling activity versus the output, or the model itself. However, information professionals can still model risk without being have degrees in statistics, being an actuarial, or attending months of technical training.  Let me explain…

Effective does not have to be expensive or complex.

simple_model

First, I beg your pardon for the image above – as it truly does push the limits of my standards for public-facing decency – but there is a real story behind this picture (essentially a risk model). My first real IT job outside the Marine Corps was for a holding company in Washington, DC of which there were five subsidiaries (two lobbying firms, two public relations firms, and a crisis management firm). The year was 1998 and our company had just hired its first CIO, who to this day is still one of a handful of folks I consider a close friend. The picture above is a representation of what our new CIO drew to the CFO of the holding company to justify purchasing a new firewall – little explanation needed. It worked. A few weeks after he presented the risk model – we were installing a Raptor firewall and were no longer relying on a Cisco router with NAT capabilities to protect our edge.

risktical_pi_chart

The image above is referred to as a Probability / Impact (P-I) Chart. It is often generically referred to as a heat map. For every risk issue and subsequent risk assessment, there is an associated loss event frequency and expected impact – that can be plotted within a P-I chart. These are not very complex to create and are very flexible. Combine some creativity with flexibility and you can visually represent risk issues in appealing ways. The ranges can be modified to be more reflective of thresholds for your particular company. It is definitely not as crude as the CIO/Firewall image above, and it allows us to plot numerous risk points. Finally, these charts are great tools for helping to prioritize which risks to mitigate first.

annloss_curve1

Above is an annualized “expected loss” curve that was produced by a risk tool I work with on a regular basis. Most tools of this nature leverage Monte-Carlo or Latin Hypercube simulation capabilities. It took only a few minutes to plug in the variables that the simulation model needs to perform the simulation (I use the FAIR methodology). For this particular risk issue, I asked the tool to perform 1000 Monte Carlo simulation iterations. It took about 8 seconds to perform. The output of the simulation gives me the expected loss event frequency and expected loss amount – both if which could be modeled like above. However, the curve above is the annualized risk curve. The annualized risk value is achieved by multiplying the expected loss event frequency by the expected loss amount. Do this a 1000 times and you get the curve above. Again, the tool I use does this all for me – in about 8 seconds. What this curve tells me is that about 90% of the simulations resulted in expected loss amounts of less then $80,000.

In closing, please understand that there are very simple and affordable risk assessment and risk model tools available to you. Most IT security risks do not require complex risk models or tools that can take hours, days, months, or even years to build – let alone simulate. Tremendous progress has been made in the last 10-15 years that gives security practitioners like ourselves capabilities that scientists and engineers only dreamed of as recent as 20 years ago.

Let’s stop hobbling ourselves and instead empower ourselves to make as big of a positive impact as possible to our employers as well as our profession. Be creative, educate yourself, be part of the solution – not part of the problem, periodically reassess your skills. This goes for Computer Weekly and the bloggers / writers they hire as well.


Risk Scenario – Hidden Field / Sensitive Information (Part 3 of 4)

January 15, 2009

The Assessment (Threat Community B – Initech Novelty, Inc.)

There is some duplicate information from part 2 at the beginning of this assessment to aid some readers who may have landed on this page with out reading Part 1 or Part 2.

In part two of “Hidden Field / Sensitive Information” we assessed the risk for “Threat Community” A – Malware. The FAIR assessment resulted in a risk rating of MEDIUM. As mentioned in Part 2 – there is another Threat Community we need to address and that is Initech Novelty Inc. (INI) itself. Because INI is a PCI merchant and is accountable for the security of its applications that process payment card information, the vulnerability that has been identified and confirmed – in the eyes of the Security Manager – makes INI non-compliant with PCI-DSS. I am assuming the INI is going to declare / update this in their SAQ.

Note: For the “Hidden Field / Sensitive Information” Assessment, I am choosing to perform two assessments; one for each threat community. Usually, I would choose the most likely TCOMM and focus on that, but because there are PCI compliance implications with this scenario – it is appropriate to address that as well.

1.    Identify the Asset(s) at Risk: (Page 3 of the FAIR Basic Risk Assessment Guide; aka BRAG)

a.    Consumer payment card information. Specifically, the payment card primary account number (PAN) and CVV2/CID/CVC2 values, expiration dates, and cardholder name information.

b.    The state of Initech Novelty Inc. PCI Compliance.

2.    Identify the “Threat Community” (TCOMM); (Page 3 of the FAIR BRAG): There are multiple threat communities that pose a threat to the assets described above. For this scenario, the first two communities that come to mind are zero-day malware and Initech Novelty Inc. itself.

a.    Zero-Day Malware. I am choosing this TCOMM for the assessment for several reasons. Most of the INI consumers are accessing the INI ecommerce portal from a PC at their home or what they consider to be a trusted PC. The most like threat to these types of machines / users is malware.

b.    Initech Novelty Inc. (INI). I am selecting INI as a TCOMM for several reasons. First, The INI Security Manager thinks that the security vulnerability no longer makes INI 100% compliant with PCI-DSS. The security manager will be updating the INI PCI Self-Assessment Questionnaire (SAQ) to reflect a gap with requirement 6.5 (specifically 6.5.7). Thus, INI is its own threat because declaring non-compliance subjects them to non-compliance implications.

** The remainder of this post will be focused on TCOMM B – Initech Novelty Inc.**

3.    Threat Event Frequency (TEF): TEF is the probable frequency, within a given time frame, that a threat agent will come into contact and act against an asset. For this step, I am going to select MODERATE or between 1 and 10 times per year. Here is why:

a.    INI is a level three merchant. Level 3 merchants are required to complete a SAQ once per year. SAQs may be updated through-out the year as needed.

b.    Keep in mind that self-reporting non-compliance does not mean a merchant will be fined. But it can be a pre-cursor to being fined (assuming no breach or other related incident).

*NOTE – It may make more sense to skip to Step Five and then come back to Step Four.

4.    Threat Capability (TCAP); (Page 5 of the FAIR BRAG): The probable level of force that a threat agent (within a threat community) is capable of applying against an asset. For this step I am selecting a value of VERY HIGH; meaning that at least 98% of the threat community is capable of applying force (or reporting non-compliance) on INI’s PCI Compliance posture. Here is my reasoning:

a.    INI wants to be ethical and not appear to be covering up vulnerabilities that affect compliance but also could harm consumer confidence in INI.

b.    Given a and some of the information in the Control Resistance section, INI is highly capable of reporting self compliance.

5.    Control Resistance (CR; aka Control Strength); (Page 6 of the FAIR BRAG): The expected effectiveness of controls, over a given time frame, as measured against a baseline level of force. The baseline level of force in this case is going to be the greater threat population. In most scenarios it is usually easy to differentiate between a “threat community” and the “threat population” it is part of. For this particular assessment (TCOMM B), they are the same – Initech Novelty Inc.

Because we are assessing risk in the context of a state of compliance versus more tangible concepts like threats and security controls – there could be some confusion about this step of the assessment. Here is my reasoning for selecting VERY LOW “Control Resistance”:

a.    INI *has* to self report annually. Now a merchant could have found vulnerability after completing an SAQ and make a conscious decision to not update their SAQ – that is their choice and probably a whole different discussion.

b.    In the spirit of optimism, I am assuming that INI takes PCI Compliance seriously and is risk averse in the sense they would rather declare non-compliancy versus have a breach or a related incident and be found to be non-compliant.

c.    Finally, there are no other regulations, standards, or legal barriers (on-going litigation, security investigations, etc…) that would negate the need for INI to not self-report the vulnerability that makes them non-compliant. Thus, they *have* to and *want* to report; making the VERY LOW Control Resistance selection the most appropriate.

6.    Vulnerability (VULN); (Page 7 of the FAIR BRAG): The probability that an asset will be unable to resist the actions of a threat agent. The basic FAIR methodology determines vulnerability via a look-up table that takes into consideration “Threat Capability” and “Control Resistance”.

a.    In step four – Threat Capability (TCAP) – we selected a value of VERY HIGH.

b.    In step five – Control Resistance (CR) – we selected a value of VERY LOW.

c.    Using the TCAP and CR inputs in the Vulnerability table, we are returned with a vulnerability value of VERY HIGH.

7.    Loss Event Frequency (LEF); (Page 8 of the FAIR BRAG): The probable frequency, within a given time frame, that a threat agent will inflict harm upon an asset. The basic FAIR methodology determines LEF via a look-up table that takes into consideration “Threat Event Frequency” and “Vulnerability”.

a.    In step three – Threat Event Frequency (TEF) – we selected a value of MODERATE; between 1 and 10 times per year.

b.    The outcome of step 6 was a VULN value of VERY HIGH.

c.    Using the TEF and VULN inputs in the Loss Event Frequency table, we are returned with a LEF value of MODERATE.

*Note: the loss magnitude table used in the FAIR BRAG and the loss magnitude table for the Initech, Inc. scenarios are different. The Initech loss magnitude table can be viewed at the Initech, Inc. page of this blog.

8.    Estimate Worst-Case Loss (WCL); (Page 9 of the FAIR BRAG): Now we want to start estimating loss values in terms of dollars. For the basic FAIR methodology there are two types of loss: worst case and probable (or expected) loss. The BRAG asks us to: determine the threat action that would most likely result in a worst-case outcome, estimate the magnitude for each loss form associated with that threat action, and sum the loss magnitude. For this step, I am going to select DISCLOSURE in the threat action columns and RESPONSE / FINES & JUDGMENTS / REPUTATION, in the loss form columns, with a WCL value of SIGNIFICANT (between $20,000 and $50,000). Here is why:

a.    The most likely worst case scenario (assuming no breach or related security incident), is that INI gets fined for not being compliant. For level 1 or level 2 merchants we know that monthly fines for non-compliance can be between $5K and $25K. There is also a chance that INI’s processor could increase INI’s transaction fees for not being compliant.

b.    Because INI does not store PAN in its web application, it is hard to envision a scenario where a large number of its consumers will have their payment card information breached all at one time. The Malware TCOMM better addresses this threat.

c.    There is precedence for level three or four merchants being fined in cases of a breach incident. Again, we are not too concerned about a breach scenario but having at least one known fine at this merchant level let’s us contextualize a worst case scenario.

d.    Given the above, I am estimating that worst case, that INI could be fined one or two thousand a month, plus there could be some RESPSONSE costs; a few thousand dollars, and maybe some reputation damage if the non-compliance is reported publicly. My estimate would be more around the $15k-$25k range, but I need to select the pre-defined FAIR BRAG range that best fits my estimates.

9.    Estimate Probable Loss Magnitude (PLM); (Page 10 of the FAIR BRAG): In step eight, we focused on worst-case loss. Now we are going to focus on probable loss. Probable loss is for the most part always going to be lower then “worst case” loss. The BRAG asks us to: determine the threat action that would most likely result in an expected outcome, estimate the magnitude for each loss form associated with that threat action, and sum the loss magnitude. For this step, I am going to select DISCLOSURE in the threat action columns and RESPONSE, in the loss form columns, with a PLM value of LOW (between $1000 and $5,000).:

a.    Since INI is going to update its SAQ and report non-compliance, there is a RESPONSE cost to performing this; between 5 and 10 hours.

b.    Also, the Security Manager estimates that it will take between 20 and 30 hours of development time to mitigate the risk. This includes development, testing, implementation, and validation.

c.    The security manager is assuming an internal resource cost of $75 per hour for the resources required to address a and b above; which results in an estimated cost of $3000.

d.    Keep in mind we are looking for accuracy not precision. So the predefined range of between $1000 and $5000 (LOW) is the most appropriate selection.

10.     Derive and Articulate Risk; (Page 11 of the FAIR BRAG): At this point in the basic FAIR methodology we can now derive a qualitative risk rating. Using the table on page 11 of the BRAG worksheet, we use the LEF value from step seven and the PROBABLE LOSS MAGNITUDE value from step nine to derive our overall qualitative risk label.

a.    LEF value from step seven was MODERATE.

b.    PLM from step nine was LOW.

c.    Overall risk using the BRAG table on page 11 is MEDIUM.

In Part 4, I will summarize the risk assessment findings as well as summarize some possible mitigation solutions.

** Some personal notes on this part of the assessment. The “RESPONSE” loss form type has resulted in some interesting conversations. The concept of soft dollars and hard dollars comes up very often. Some business units only care about hard dollars (dollars going outside the company) and some business units only care about soft dollars (dollars within the company). As you are explaining response costs – it may make sense to highlight this differentiation. The way that I see it is that even by estimating “soft dollars” we are able to show that response takes away from other higher value activities that can help the business achieve its goals. This concept should not be discounted.


Follow

Get every new post delivered to your Inbox.