The Boy Who Cried "Terrorist" instead of "Wolf"

Doha, 4th April 2017

Review by: Bachir El Nakib, Senior Consultant, Compliance Alert (LLC)

In the classic fable of the Boy Who Cried “Wolf,” a young boy thinks he sees a menacing wolf and cries out to the townspeople, “Wolf!” The townspeople come to examine the danger and find no wolf, so they presume he is mistaken. Soon thereafter the boy, sensing another menacing wolf, cries out again, “Wolf!” The townspeople again investigate and again find nothing; perhaps now they think the boy is doing it for attention. This continues a few more times until finally the townspeople no longer respond to the boy's cry of “Wolf,” because they believe it will again turn out to be a false alarm. (They no longer even care why the boy is doing it; they no longer give it any thought at all.) One day a real wolf comes along and the boy cries “Wolf!” to no avail. The boy gets eaten by the wolf. The moral of the story is: “If you are a consistent liar (or consistently wrong), you should not be surprised when people don't believe you.” (This is markedly different from Aesop's version, where the boy is tricking the townspeople because he is bored. My version introduces error and belief into consideration.)

So what is really happening here? What is the boy's intention? Presumably, to make people aware there is a dangerous wolf and to be rescued from that danger. However, we can only presume this; we can't actually know it because we have only the boy's word to go on. We must accept the boy's social engineering because there is no experiential wolf to be found. At first, the townspeople are willing to give him the benefit of the doubt. In time, with repeated false alarms, the boy's intention begins to be seen as manipulation rather than persuasion. This is the product of reduced transparency (we don't know what the actual intention is, because there is nary a wolf to be found, which would be the best experience of the intention) and increased force (the repetition). Finally, they completely ignore the boy who cries “wolf,” to his peril. The townspeople had the power to change the outcome and didn't, by doing nothing. The boy had the power to change the outcome and couldn't, even though he tried. There are two different interpretations of the boy's “crying wolf” paradigm. For the boy, for some reason unknown to us, his intention keeps demonstrating a false result to the townspeople. For the townspeople, this results in the boy's intention being considered false. The boy has changed the results of his intention and his conditions are no longer being satisfied.

What if the boy wasn't crying “wolf” but instead “terrorist”? How many times could the boy cry out without showing any results before the people stopped coming to his rescue? Is this scenario any different? In both cases the intentions have come up false until action has taken place. In other words, due to so many false alarms, the wolf needs to eat the boy in order for us to believe him. Likewise, the terrorist needs to terrorize if we keep getting false positives. Otherwise, we simply stop believing in the intention. Actions speak louder than ideas unless the action is an idea. “Crying wolf” is only an action if the idea is understood and accepted. Once it is not, it has become something else. Not knowing the prior intention of the boy is a detriment because we are unable to evaluate it; however, it is irrelevant because the intention in action has proven it false. We, as observers outside of the engineer of the intention, can only interpret it as we do. It should be expected that false positives would yield false responses.

Simply put, we thought the wolf wasn't real. It turns out the wolf was real; we know this because we can go and see the boy's torn-up, half-eaten corpse. In this way, it was the wolf that made the boy's intentions a reality; it is the wolf who cried, “boy!” However, this couldn't have happened without the demonstration of the intention, the rejection of the intention's meaning and finally the bloody proof. This makes the boy's death causally self-referential. The boy had the “cry wolf” paradigm and used it, falsely and repeatedly, until it lost any meaning. Then, when he needed it, it wouldn't work anymore and he was killed. If you didn't know the story, if you were a detective showing up on the scene, you could assume that he was killed by a wolf and leave it at that. Once you were informed by the townspeople of how they didn't bother helping him because of all the false alarms, who would you blame for the boy's death, beyond the wolf? I'm guessing the boy. After all, it was his engineering that programmed the townspeople not to respond, or rather, to respond a certain way.

Sometimes the discovery of a false positive can be a relief, but in AML/CFT prevention, that relief would be misplaced. A false screening hit represents an insulted customer, a potentially loyal buyer who was rejected by an over-cautious fraud prevention system. Chances are, given all the alternatives open to them, this annoyed and frustrated consumer won’t be coming back.

“Over-screening” is prevalent, with watch-lists including targets from distant sanctions regimes and “PEPs” who are not really PEPs at all. Meanwhile, financial institutions (FIs) tend not to screen domestic transactions due to a tacit agreement across the industry to rely upon each other’s customer screening controls. And, although systems are in place to monitor the money laundering risk associated with transactional behavior, no such controls are in place to evaluate the money laundering risk of transactions from the perspective of who they involve. An advanced sanctions prevention and Anti-Money Laundering (AML) screening program would consider:

1. Tailoring the content and application of watch-lists to the organization’s risk profile and appetite — significant operational cost savings can be made by streamlining watch lists in the following ways:

► Identifying a “core” set of sanctions lists to use for all customer relationships and transactions, and only utilizing lists belonging to other regimes where particular scenarios (e.g., destination of a transaction) dictate that this is necessary

► Determining the organization’s definition of a PEP upfront (e.g., based on public office held, country of office, relationship with the primary PEP, etc.) and tailoring PEP lists accordingly

► Avoiding duplication across screening controls — e.g., FIs may determine that it is not necessary to re-screen their own customers in transactions when they have already been screened against the relevant lists

2. Screening unconventional but known areas of sanctions and money laundering risk — real risks are present in some previously overlooked areas:

► In domestic transactions involving sanctions targets — relying on the customer screening controls of other domestic institutions could lead to a breach occurring

► Where individuals and organizations that pose a high risk of money laundering (e.g., PEPs) are transacting with the organization’s customers; it is likely many transactions involving such targets should give rise to suspicion of money laundering activity

Unfortunately, false positives can be hard to prevent, because there are times when good customers look like fraudulent ones. Often it feels like you’re looking at the picture at the top of this post: only one of those orange windows represents a peek into the life of a fraudster – but do you know which one?

Harmful excessive reporting, called crying wolf, can arise in this setup. As the bank cannot share its signal with the government, the government must make decisions based on whether or not it observes a report. Intuitively, if the bank identifies all transactions as suspicious, then it fails to identify any one of them, exactly as if it had not identified a single one. Thus, crying wolf can fully eliminate the information value of reports. Crying wolf can arise because excessively high fines for false negatives force the uncertain bank to err on the safe side and report transactions which are less suspicious. In the extreme case the bank is forced to report all transactions, thereby fully diluting the information value of reports. Fines have increased in the last ten years, especially so after the USA Patriot Act. In response, banks have reported an increasing number of transactions. However, the number of money laundering prosecutions has fallen, even though the estimates of money laundering volumes have been stable. Furthermore, regulatory agencies have identified ‘defensive filing’, which exhibits striking similarities with what happens under crying wolf.

By definition, “Crying Wolf” arises when excessive reporting dilutes the information value of reports. In the extreme case of crying wolf, reports become completely uninformative.
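The dilution effect can be illustrated with a small calculation. The following is only a sketch under stated assumptions: the figures (1,000 transactions, 10 of them truly suspicious) are hypothetical, and the bank is assumed to report every truly suspicious transaction first and then pad its filings with innocent ones. The function name `report_precision` is introduced here for illustration.

```python
def report_precision(total: int, suspicious: int, reported: int) -> float:
    """Share of filed reports that concern truly suspicious transactions.

    Assumption (not from the article): the bank reports every truly
    suspicious transaction first and pads the rest with innocent ones.
    """
    if not 0 < reported <= total or not 0 <= suspicious <= total:
        raise ValueError("need 0 < reported <= total and 0 <= suspicious <= total")
    return min(suspicious, reported) / reported


TOTAL, SUSPICIOUS = 1000, 10  # hypothetical figures
for reported in (10, 100, 1000):
    share = report_precision(TOTAL, SUSPICIOUS, reported)
    print(f"{reported:>4} reports filed -> {share:.1%} truly suspicious")
```

When every transaction is reported, the share of truly suspicious reports equals the base rate in the whole population, so observing a report tells the reader nothing that was not already known.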

There are at least four objectives which screening may be used to help address:

1. Sanctions: To not permit financial transactions with sanctions targets

2. Enhanced Due Diligence (EDD): To undertake EDD on PEPs and other high risk customers

3. Suspicious activity: To identify suspicious activity that may be indicative of money laundering

4. Negative news: To undertake negative news searches as a part of EDD on PEPs and high-risk customers

The disparate range of objectives may, in itself, shed some light on the sometimes conflicting priorities in screening programs.

For each objective, there is a need to accurately identify an individual or organization (a “target”) in order that some other action can be taken, whether it be stopping a transaction or identifying the need to undertake additional due diligence. It may be argued that the “Suspicious Activity” objective is more relevant to an AML transaction monitoring program than a screening program. However, its relevance here is in relation to the fact that transactions can be suspicious in virtue of the counterparties they involve, not just the behavior which they exhibit. Most screening programs implement at least two main screening controls to help meet these objectives: customer screening and transaction screening. Customer screening is used to identify new or existing customer relationships which may involve targets of interest; transaction screening is used to identify transactions involving such targets. Together, customer and transaction screening are intended to form a complete set of automated screening controls for identifying sanctions, PEPs and other high risk targets entering the organization or having financial dealings with it. However, there are a number of limitations in the way in which these controls are implemented, often resulting in risks not being effectively managed and significant inefficiencies being introduced into the screening programs.

Another assumption made is that all cross-border transactions need to be screened because controls in some countries are weaker. For example, in the diagram above, Bank B would not screen transactions with Bank A but it would screen transactions with Bank C because they are cross-border. However, in many cases there is no reason to believe that the strength of controls is any more varied between banks in different countries than it is between banks within the same country. Indeed, for at least some European countries the level of variation in controls is similar. Therefore, it seems arbitrary to assert that cross-border transactions ought to be screened and domestic transactions should not. There can be challenges in obtaining the relevant data for screening domestic transactions, as they often do not carry information about the originator and beneficiary that is as rich as that in wires. However, given that FIs have sufficient information to carry out these transactions, data quality is unlikely to provide a defense. Another concern from a sanctions perspective is the amount of over-screening which occurs, particularly when FIs utilize third-party list providers for customer screening. Often it is assumed that all sanctions lists should be screened.

Though sanctions regimes can be extra-territorial (notably the US regime), they are typically limited by one or more of the following:

► Jurisdiction — transactions involving the jurisdiction the regime belongs to

► Citizenship — nationals of the jurisdiction the regime belongs to

► Currency — transactions in the currency that belongs to the regime

► Correspondent banking — existence of correspondent banking relationships with banks which have operations in the jurisdiction the regime belongs to

The fact that regimes are limited means that for any given customer relationship or transaction, many sanctions lists may not be relevant.

For example, a primarily European bank may determine when opening a new customer relationship that screening against a Chinese sanctions list is not relevant, even if the potential customer is Chinese, as it may have no obligation to comply with the Chinese sanctions regime.

If that bank (Bank A) could potentially transact with China then it may need to comply when such transactions occur. In this case, it may decide to implement a complementary transaction screening control so that, when it does transact with China, it does screen the payment (including both its customer and the counterparty) against the Chinese list to help prevent any breaches occurring.

[Figure: sanctions lists for European Bank A transacting with Chinese Bank E. Core lists: EU, OFAC, ... Targeted lists: Chinese lists. New customers could generally be screened against “core” lists only, and screened against “targeted” lists only if the origin or destination of their transactions necessitates it.]
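The core versus targeted approach can be sketched as a simple selection rule. This is a minimal illustration, not a description of any real screening product: the list names, the country codes and the `TARGETED_LISTS` mapping are hypothetical, and a real program would derive them from the organization's own risk assessment.

```python
# Hypothetical configuration: lists applied to every relationship and transaction.
CORE_LISTS = {"EU", "OFAC"}

# Hypothetical mapping from a country code to the extra lists it triggers.
TARGETED_LISTS = {"CN": {"Chinese lists"}}


def lists_for_customer() -> set:
    """New customers are screened against the core lists only."""
    return set(CORE_LISTS)


def lists_for_transaction(origin: str, destination: str) -> set:
    """Core lists, plus any targeted lists triggered by origin or destination."""
    selected = set(CORE_LISTS)
    for country in (origin, destination):
        selected |= TARGETED_LISTS.get(country, set())
    return selected


print(sorted(lists_for_transaction("DE", "CN")))
print(sorted(lists_for_transaction("DE", "FR")))
```

Keeping the targeted lists in a lookup keyed by country means a new regime can be added without touching the customer onboarding path.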

How do you know if the thresholds are set correctly in your OFAC (Office of Foreign Assets Control) Sanctions Filtering or BSA (Bank Secrecy Act) Transaction Monitoring system? That is certainly an important question, and a mystery in the world of anti-money laundering (AML). Commonly in the financial industry, and for the purposes of this article, the act of tuning with the goal of false positive reduction will focus on OFAC Sanctions Filtering and/or AML Transaction Monitoring.

Tuning is often driven by the need to improve the quality of alerts, either voluntarily by the organization or as mandated by the regulators. Regardless of the reason, it is very important that tuning be done periodically and correctly. Tuning can reduce workload, allowing more time to be spent on alerts that are more meaningful, thereby improving quality (See chart 1). However, if the tuning process is not conducted correctly, a greater risk can be created due to missing alerts. The goal in this article is to explain the science behind the process of what is called false positive tuning, the terminology used, the iterations, and how this affects you.

The Definitions

Much of the industry is familiar with the term False Positives, correct?  Of course, but how familiar are you with the other terms? First, let us explain some of the terminology you should know such as Positives, Negatives, False Positives (Type I Errors) and False Negatives (Type II Errors) (Wikipedia, 2015).

  • Positives are suspicious activities that generate an alert, requiring that a SAR is completed and submitted to the government.
  • False Positives (Type I Errors) are non-suspicious activities that generate an alert.
  • False Negatives (Type II Errors) are suspicious activities that do not generate an alert, but should have had a SAR completed and submitted to the government.
  • Negatives are non-suspicious activities that do not generate an alert. (See chart 2)
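The four terms above can be expressed as a small classification routine. This is a minimal sketch for illustration; the function names and the sample data are assumptions, and in practice the "suspicious" flag would come from investigator review rather than being known up front.

```python
from collections import Counter


def classify(alerted: bool, suspicious: bool) -> str:
    """Map one activity to the outcome categories defined in the text."""
    if alerted and suspicious:
        return "positive"
    if alerted and not suspicious:
        return "false_positive"   # Type I error
    if not alerted and suspicious:
        return "false_negative"   # Type II error
    return "negative"


def tally(cases) -> Counter:
    """Count outcomes over an iterable of (alerted, suspicious) pairs."""
    return Counter(classify(alerted, suspicious) for alerted, suspicious in cases)


# Hypothetical sample: (alerted, suspicious)
sample = [(True, True), (True, False), (True, False), (False, True), (False, False)]
print(tally(sample))
```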

The Risks

Next, let’s talk about what this means and how it applies to you. A process that produces only Positives and Negatives would be perfect, but such a process rarely, if ever, exists in practice; False Positives and False Negatives are realistically always present. False Positives create some risk because the additional work takes the investigator's focus away from Positive alerts. However, False Negatives are by far the greatest risk for any AML department and the financial institution as a whole, since this activity was suspicious and should have generated an alert. Even an effective compliance operation has some level of False Negatives. The goal of tuning is to find the right balance between False Positives and False Negatives. The AML tools (software) allow for various levels of tuning, and expert consulting companies have various levels of expertise.

The Methodologies

One of the methodologies that we use is to conduct a statistical review. Our process is to review Positives, Negatives, False Positives (Type I Errors) and False Negatives (Type II Errors) (see chart 3), and develop a threshold that can be evaluated and justified.


From there we use a standard deviation formula that identifies a number of statistical clues, including the more obvious factors such as the outliers, and the percentile formula to determine the 85th percentile (Bland & Altman, 1996). This methodology should be conducted on each rule/scenario to isolate parameters and thresholds relative to the data or population that is being analyzed. The aggregate of the alerts that are generated can also have an impact on the determination of false positives and false negatives. However, you must also account for the AML risk profile/assessment of the organization to determine whether the threshold suggested by the statistical analysis is right for your institution. In the end, it is critical that the statistics, AML risk profile, technical experience and compliance experience all weigh in on determining what the new proposed thresholds should be for iteration testing.
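As a rough illustration of the statistical review, the sketch below computes an 85th percentile and flags outliers by standard deviation for one rule's population. The article does not specify which percentile method or outlier cut-off is used, so the nearest-rank method and the two-standard-deviation cut-off here are assumptions, as are the function names and the sample amounts.

```python
import math
import statistics


def percentile_nearest_rank(values, pct):
    """Return the pct-th percentile using the nearest-rank method (an assumption)."""
    if not values or not 0 < pct <= 100:
        raise ValueError("need a non-empty sample and 0 < pct <= 100")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def outliers(values, k=2.0):
    """Return values lying more than k sample standard deviations from the mean."""
    mean = statistics.mean(values)
    spread = statistics.stdev(values)
    return [v for v in values if abs(v - mean) > k * spread]


# Hypothetical transaction amounts for a single rule/scenario.
amounts = list(range(100, 2100, 100))  # 100, 200, ..., 2000
print(percentile_nearest_rank(amounts, 85))
print(outliers([100, 110, 90, 105, 95, 100, 1000]))
```

A value like the 85th percentile is only a candidate threshold; as the text notes, it still has to be weighed against the institution's risk profile before it is carried into iteration testing.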

Another methodology is to conduct iteration testing, or trial runs of the system. Typically, this first involves determining how many iterations you expect to run. To determine this, you need to know how many variables are involved in the process, such as how many threshold deviations, how many rule sets, how much data and so on. As stated above, we use the statistical data results to determine the thresholds and parameters of each iteration test. Generally, we see three to five iteration tests per rule set. This is best done in a clean test environment with all data files and a reset or rollback process in place. In addition, the process should be planned, executed and documented for each step, including justification of each test case as to what steps and tests have been conducted (See Chart 4).
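An iteration test can be sketched as replaying the same labelled history against each candidate threshold and recording the outcome of every run. This is a simplified illustration only: it assumes a single amount-based rule, hypothetical data in which the suspicious flag is already known, and invented names such as `run_iteration`.

```python
def run_iteration(threshold, history):
    """Replay history against one threshold.

    history is a list of (amount, suspicious) pairs; an alert fires
    when amount >= threshold (a hypothetical single-parameter rule).
    """
    result = {"threshold": threshold, "alerts": 0,
              "false_positives": 0, "false_negatives": 0}
    for amount, suspicious in history:
        alerted = amount >= threshold
        if alerted:
            result["alerts"] += 1
            if not suspicious:
                result["false_positives"] += 1
        elif suspicious:
            result["false_negatives"] += 1
    return result


# Hypothetical labelled history and three candidate thresholds.
history = [(500, False), (1200, False), (1500, True), (3000, True), (800, True)]
for candidate in (1000, 1400, 2000):
    print(run_iteration(candidate, history))
```

Because every run starts from the same untouched history, the results stay comparable across iterations, which mirrors the clean test environment and rollback process described above.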


These are just two of the methodologies used for false positive tuning. We often combine both to make the exercise more thorough and complete.

Risk vs. Quality

The management process of an alert, specifically risk and quality, is the key to a successful AML Governance Program, since most AML Governance Programs in the end are really managed by your AML software. We distinguish quality and risk in two ways, even though their relationship is intertwined. The quality of an alert process is improved by reducing false positives. The risk of an alert process is decreased by reducing false negatives (See Chart 5).


However, note that as we decrease false positives, the quality of an alert improves but that does not affect the risk of an alert. To affect the risk of an alert, we have to decrease the number of false negatives. There are tools that allow you to have more control over your false positives and false negatives, without adjusting your 85th percentile. The most common process in the industry in a tuning exercise is to conduct iteration testing; but we combine methodologies such as a statistical analysis (standard deviations, percentile, and distributions) and iteration testing for a more thorough review. We believe that one of the most important lessons to be learned is to understand the quality of alerts and the risks associated.
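The distinction between quality and risk can be made concrete with two simple ratios. The mapping used here is an interpretation rather than something the article defines: "quality" is taken as the share of alerts that are true positives, and "risk" as the share of suspicious activity that generated no alert. The counts are hypothetical.

```python
def alert_quality(true_positives: int, false_positives: int) -> float:
    """Share of alerts that were truly suspicious (higher is better)."""
    alerts = true_positives + false_positives
    return true_positives / alerts if alerts else 0.0


def miss_rate(true_positives: int, false_negatives: int) -> float:
    """Share of suspicious activity that raised no alert (lower is better)."""
    suspicious = true_positives + false_negatives
    return false_negatives / suspicious if suspicious else 0.0


# Hypothetical counts before and after removing 60 false positives.
print(alert_quality(20, 80), miss_rate(20, 5))
print(alert_quality(20, 20), miss_rate(20, 5))
```

In this example, cutting false positives raises the quality ratio while the miss rate stays where it was, which is the point made above: only a reduction in false negatives lowers the risk.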


We hope this brief article has level-set some of the terminology, summarized our methodology, and discussed the difference between quality and risk as it relates to alerts. While there are varying methods to conduct the tuning process, it is most important that you understand your risk profile/assessment and the different types of positives and negatives. Further, a better understanding of the triggers generating the alerts will help improve quality while lowering risk.


* Excerpt from Brian Taylor's book, Anti-Social Engineering the Hyper-Manipulated Self

Bland, J. M. & Altman, D. (June 29, 1996). Measurement Error. BMJ. Volume 312. Retrieved 2015, February 25.