Abstract: The United States has collected petabytes of data relevant to counterterrorism and the study of terrorism over the past 20 years. More recently, and especially since 2018, the U.S. government has been making some big moves to integrate and scale data-science, machine learning, and artificial intelligence-driven approaches across its national security enterprise as a way to push change, innovate, and prepare for the coming AI-driven future. This article examines the intersection of these two developments—the United States’ vast terrorism data holdings and the transformative power of data science and AI—by highlighting additional potential associated with four types of data: terrorism incident data, primary sources recovered by U.S. and partner forces, terror propaganda, and data about counterterrorism activity. It argues that the United States should create a new terrorism and counterterrorism data action plan, and it offers five recommended focus areas that deserve attention and emphasis as part of that plan. These five focus areas, which are not exhaustive and are only designed to shape conversations, include the need to: 1) reinvest in and advance core terrorism data, 2) strategically leverage captured material, 3) better develop and utilize counterterrorism data, 4) practice data alchemy, and 5) automate basic and other analytical tasks, and augment data.

The United States has collected petabytes of data relevant to counterterrorism and the study of terrorism over the past 20 years.1 The amalgamation and analysis of data led the United States to Usama bin Ladin’s hideout in Abbottabad, Pakistan. Innovations in how the United States processed and fused information and made data actionable have also been among the most important and game-changing achievements of the United States’ two-decade-long war on terrorism.2 Indeed, the story of the tactical and operational counterterrorism successes that the United States and its partners have achieved since 9/11 is intimately tied to how analysts and practitioners have exploited data to better understand and degrade terror networks.

The United States has shifted its strategic emphasis and focus to address the rise of, and threats posed by, China and Russia, a transition that is needed and overdue. Yet, despite the United States’ desire to put terrorism in the rear or side view mirror, terrorism is not going away anytime soon. The threats posed by transnational groups like al-Qa`ida and the Islamic State, key militant Iranian proxies, and other networks—to include a diverse mix of domestic extremists in the United States—will evolve and continue to manifest in one dangerous form or another. There is also a real risk, due to the enduring nature of the terrorism threat, that the manner of America’s terrorism pivot could end up complicating the United States’ ability to maintain its near-peer focus. This is because while the United States has moved on to other priorities, core U.S. terror adversaries have not, and they like to disrupt and spoil. Indeed, as Brian Michael Jenkins has noted in this publication, “Events, not plans or preferences, will determine how much the United States will be able to shift or not shift resources away from counterterrorism and toward near peer competition.”3

Much is riding on how the United States balances and manages these two national security priorities—counterterrorism and near-peer competition—in practice, as in the years and decades ahead the United States is going to need to be able to deal with both challenges and do so simultaneously, and well. It needs to get better at both.

Data, and what the United States does with data, will be a central part of that future. The United States recognizes that data is a strategic asseta and that data science-informed approaches and artificial intelligence (AI)—like electricity—“holds the secrets which will reorganize the life of the world.”4 b For the past several years, the United States has been making big moves to adjust, adapt to, and prepare for the coming AI-driven future, and to position itself to lead. One only needs to look at the mix of national-level to agency-specific AI strategy documents and plans,5 hefty financial investmentsc and organizational adaptations made to drive and scale AI initiatives,6 and the testing and operational application of machine learning (ML)/AI approaches7 to see that the large, ‘sea tanker-like bureaucracy’ of the U.S. government is in the process of making an important strategic pivot.d

A high-level overview of recent changes that have taken place within the Department of Defense (DoD) is instructive.e Since 2018, for example, the DoD has released its artificial intelligence strategy (2018), digital modernization strategy (2019), and data strategy (2020). In 2018, the Defense Advanced Research Projects Agency (DARPA) announced a “multi-year investment of more than $2 billion in new and existing programs called the ‘AI Next’ campaign” with emphasis placed on key areas.8 In 2018, these efforts and investments were given added organizational structure and form through the creation of the DoD’s Joint Artificial Intelligence Center (JAIC), an entity established to be the focal point for carrying out DoD’s AI Strategy.9 That same year, U.S. Special Operations Command (SOCOM) made a similar organizational move through the establishment of its Command Data Office, designed “to oversee … workforce transformation, as well as provide a node for industry outreach, data governance, and application of a data-focused perspective to capability development decision-making processes.”f The Defense Innovation Unit has been active in “pursuing a number of AI projects to optimize business processes in the DoD” as well.10

The JAIC, SOCOM, and the National Media Exploitation Center (NMEC) have also played critical roles in operationalizing ‘big data’ through AI and ML approaches.11 When it comes to counterterrorism, a seminal example is Project Maven, a “pathfinder effort” to employ AI and ML “in the fight against ISIS, al-Qaeda, and their geographically dispersed proxies.”12 As noted by the scholar Richard Shultz and SOCOM Commander General Richard Clarke, Project Maven’s initial objective was “to automate the processing, exploitation, and dissemination of massive amounts of full-motion video collected by intelligence, surveillance, and reconnaissance (ISR) assets.”13 This was achieved through the utilization of “specially trained algorithms,” which “could search for, identify, and categorize objects of interest in massive volumes of data and flag items of interest.”14

These moves are important signs of momentum and advancement. The data and AI strategy documents and plans that the U.S. government has released provide the broad framework for how it intends, or hopes, to move forward in the data and AI arena. That framework has been complemented by the vision offered by seasoned practitioners like former head of the Defense Intelligence Agency Lieutenant General (Ret) Robert Ashley for where these changes should, or are likely to, lead. For example, according to Ashley, in the future “Leveraging data from captured enemy material, applying machine learning and computer vision against petabytes of publicly available information, embracing open-source intelligence and open architectures should be a routine part of every military operation going forward.”15 But as the United States looks forward and works through how to ‘right size’ counterterrorism,16 it also needs a more defined plan for how it intends to utilize, integrate, and more fully leverage the petabytes of terror and counterterrorism data it has collected, collated, and created over the past 20 years. Those vast quantities of data are an incredible resource—a strategic asset that, if leveraged in smart and strategic ways, will help the United States to continue to learn and transfer knowledge across generations, track future terror developments, identify new counterterrorism opportunities, and gain analytical efficiencies.

There are two primary reasons why such a new terrorism and counterterrorism data action plan is needed and should be developed and resourced. First, like other domains, counterterrorism is evolving into a means of geopolitical influence that states, including the United States and its near-peer rivals, have been competitively using to develop relationships and to secure defense-related access and placement. And while counterterrorism assistance to foreign partners usually revolves around hardware, training, and financial assistance, the parties that will evolve and lead the counterterrorism field over the next decade are those actors who possess the ‘best’ data and who are able to make the most effective use of that data. General (Ret) Joseph Votel, the former commander of U.S. Central Command, highlighted this point in a recent interview in this publication: “I do think the future will be dominated by those who understand it [data] the best, whether it is through publicly available information sources, managing large data, or whether it is the ability to see and understand what is happening in areas so that it preserves our decision space and informs our policy choices.”17

A new terrorism and counterterrorism data action plan is also needed for efficiencies’ sake. For if the United States wants to focus less on terrorism and more on China and other strategic competitors, it needs to find areas where efficiencies can be gained in the processing, analysis, and use of terrorism-related data through deeper focus on data science-informed approaches and investment in and broader experimentation and adoption of ML and AI. This article is designed to help shape the conversation of how this can be done.

The piece starts by unpacking in greater detail why such a new terrorism and counterterrorism data action plan is needed. The article is then organized around five recommended focus areas (in addition to other priorities) that deserve attention and emphasis as part of that plan. There is a need to: 1) reinvest in and advance core terrorism data, 2) strategically leverage captured material, 3) better develop and utilize counterterrorism data, 4) practice data alchemy, and 5) automate basic and other analytical tasks, and augment data. To ground those discussions, each section contains practical examples that demonstrate how specific categories, or types, of data could be better leveraged in relation to the five “need” areas, and how different approaches could be used to extract more utility from existing data sources.

While this article discusses various types of data, emphasis is intentionally placed on four categories of data: 1) terrorism incident data, 2) primary sources picked up by U.S. military and partner forces, 3) official terror group propaganda, and 4) data about counterterrorism activity and assistance. This article does not substantively examine the potential associated with other types of data, such as social media data or the general category often referred to as publicly available information (PAI),g court records, financial data, signals intelligence (SIGINT), human intelligence (HUMINT), geospatial intelligence (GEOINT), detainee records, biometric information, or data about how extremists use and/or exploit digital platforms. That is not because these and other types of data are not important to operations or the future of counterterrorism; they are extraordinarily important. Indeed, as Nicholas Rasmussen stated in 2015 when he was serving as the director of the National Counterterrorism Center: “Just the sheer volume of threat information that we see every day in social media communications suggests that we need to increase our capacity to make better use of this information.”18

The importance of the four particular types of data discussed in this article—terrorism incident data, primary sources picked up by U.S. military and partner forces, official terror group propaganda, and data about counterterrorism activity and assistance—takes on even more significance when one considers that the fusion of these various sources, and the information gleaned from the integration of them, is usually even more valuable than the original sources themselves.

The decision to place emphasis on the four particular types of data was made for three reasons. The first relates to ease of use and access. Two of the four types of data—terrorism incident data and official terror propaganda—are open sources that can be found online. There are also fewer privacy concerns associated with using these two types of sources. The DoD has also taken recent steps to make primary source material recovered by U.S. and partner forces—another of the four types of data discussed in this article—more accessible and less controlled than it has been in the past. The combination of these factors makes these three types of data easier to access and use, which as Amy Zegart and others have highlighted should help to facilitate more rapid experimentation and testing of ML/AI tools and approaches, without the complexities associated with classified data.19

Secondly, emphasis was placed on data about counterterrorism activity and assistance because that category of data does not receive a lot of attention generally, despite its importance.

Thirdly, the public conversation about terrorism data and AI thus far has mostly focused on select types of ‘big data’ such as digital social media data or PAI, bulk telephone metadata, or full motion video collected from unmanned aerial vehicles or other surveillance assets.20 Instead of covering the same territory, this piece narrows its focus to the AI potential associated with just the aforementioned four important types of terrorism data. This article aims to broaden, diversify, and advance discussions about what can or should be done with terrorism and counterterrorism data moving forward, and what is possible.

This article represents the view of a researcher who evaluates data and sources to address strategic, and not operational or tactical, questions. The views that this article presents and the suggestions it offers are also limited, as there are other important issues, such as privacy and ethical considerations, relevant to the collection and study of data and measures taken to ensure AI safety21 that any new terror and counterterrorism data action plan would also need to consider and tackle.

U.S. Special Operations Command Chief Data Officer David Spirk, USSOCOM Commander General Richard D. Clarke, and USSOCOM Senior Enlisted Leader Chief Master Sergeant Gregory Smith officially open the USSOCOM Data Engineering Lab in Tampa, Florida, on September 25, 2019. (U.S. Air Force Master Sergeant Barry Loo/U.S. Department of Defense)

Why a New Terror Data Action Plan Is Needed
An overview of some hard truths and challenges brings the need for a ‘what to do with all that terrorism data’ vision into focus. First, despite counterterror accomplishments, terrorism as a global and regional problem, or even a local one in the United States, is not going away anytime soon. The United States and its counterterrorism allies have degraded the ability of al-Qa`ida and the Islamic State to conduct strategic attacks, and to directly attack the U.S. homeland in high-impact ways. The coalition to defeat the Islamic State has also been able to disrupt and limit the group’s ability to seize and hold territory in Syria and Iraq. Thanks to collaboration among technology and social media companies, it is now harder for terror networks and sympathizers to maintain a consistent presence online, spread propaganda and share information, and virtually interact. Those hard-earned gains are important, but without consistent pressure, focus, and appropriate levels of ongoing investment, many of those gains will also be fleeting.

Indeed, as noted by Michael Morell, former acting director of the Central Intelligence Agency, the pattern of activity from mainstay groups like al-Qa`ida and the Islamic State is like a sine wave: “They get very dangerous, you degrade them, they weaken, you take your eye off them, and they rebound. And I don’t think that pattern is going to stop. I think we’re going to see this for quite some period of time.”22

These challenges are compounded and complicated by other terrorism and threat landscape trends. Compared to 9/11, today’s terror threat is more geographically dispersed, more diverse, and more complex.23 Or put another way, today there are more terror groups active in more countries around the world, and more organizations, networks, interactions, and agendas for analysts and practitioners to understand and track.24 The complexity of both international and U.S. domestic terror threats has gone up over the past five years, while U.S. emphasis and willingness to pay attention to foreign terror activity are being ‘right sized’—or perhaps more cynically ‘downsized.’ One only needs to look at the proliferation of non-state, jihadi-inspired militants active in key regions of Africa,h or the diversity and fluidity of far-right extremist networks in the United States, to see that there are still a lot of active terrorism threats around the world, and a lot of different types of actors. And even though many of today’s foreign terror groups do not present a direct, or substantive, threat to the United States, they still complicate local and regional security environments and threaten U.S. partners—so they need to be monitored.

But that may be easier said than done, as the resources needed to monitor even a prioritized list of terror networks compete with those required to address the diverse array of other threats—from cyber, economic, and informational challenges to biological threats and those posed by new weapons systems or emerging technologies—with which the U.S. security enterprise must contend. Plus, as Amy Zegart has noted, “more threats” is only one of the “Five Mores”i dramatically changing the business of intelligence.

These threat-specific ‘scope, scale, and complexity challenges’ are complicated by two other issues: the flood of data that counterterrorism professionals need to weed through, and the velocity or speed at which information moves today.j The world is awash in data, information that can be leveraged to identify new threats or enhance understanding of existing terror dangers—that is, if one can identify which data is relevant and make sense of it in a timely way. Given the overwhelming and varied mix of data that the U.S. defense and intelligence enterprise collects and that is increasingly accessible through open or commercial sources, this is a fundamental challenge, one the intelligence community has acknowledged. For example, according to a key strategy document released by the Office of the Director of National Intelligence (ODNI) in 2019: “The pace at which data are generated, whether by collection or publically available information (PAI), is increasing exponentially and long ago exceeded our collective ability to understand it or to find the most relevant data with which to make analytic judgments.”25

Two ‘troves’ of terrorism data recovered by U.S. forces during operations nearly a decade apart—the first in 2006 in Iraq and the second in 2015 in Syria—provide insight into the change in scale of data being collected that analysts, leveraging a variety of tools and tradecraft, must wade through. During the first operation, a mission that led to critical information about Abu Musab al-Zarqawi, al-Qa`ida in Iraq’s leader at the time, U.S. forces recovered a thumb drive “with 1 gigabyte of memory,” which included “a trove of information … a treasure of data.”26 The second operation, a raid in eastern Syria, resulted in “four to seven terabytes of data” that was “harvested from laptops, cellphones, and other materials recovered.”27 The takeaway: what the U.S. government considered a trove of data recovered during operations jumped from one gigabyte in 2006 to multiple terabytes less than a decade later.k
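The size of that jump can be made concrete with a few lines of arithmetic. The sketch below is illustrative only: it assumes the decimal convention of 1,000 gigabytes per terabyte, and the function name `fold_increase` is a hypothetical helper introduced here, not something drawn from the sources cited above.

```python
# Assumption: decimal units (1 TB = 1,000 GB). A binary convention
# (1 TiB = 1,024 GiB) would shift the result by about 2 percent.
GB_PER_TB = 1000


def fold_increase(old_gb: float, new_tb: float) -> float:
    """Return how many times larger a trove of new_tb terabytes is than one of old_gb gigabytes."""
    return new_tb * GB_PER_TB / old_gb


# Figures quoted in the text: 1 gigabyte (2006) versus four to seven terabytes (2015).
low = fold_increase(1, 4)
high = fold_increase(1, 7)
print(f"{low:,.0f}x to {high:,.0f}x")  # prints "4,000x to 7,000x"
```

Under that assumption, the 2015 trove was roughly 4,000 to 7,000 times the size of the 2006 one.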

That jump in scale is not unique or limited to information scooped up during physical, on-the-ground operations, but is a broader challenge affecting multiple types of ‘INTs.’ An example cited by ODNI puts the scope of the challenge into context: the “Director of NGA [National Geospatial-Intelligence Agency] has publically estimated that at the current, accelerating pace of collection, we would need over 8 million imagery analysts by 2037 to process all imagery data.”28 l

Given the deluge, analysts often face three key problems: 1) navigating through the ‘noise’ to identify important pieces of information, 2) identifying how pieces of data relate to one another, or fit together, and 3) appreciating the deeper context, or history, associated with a particular issue or group.

This is not because government analysts are not talented (they are) or do not care (they do); it is because they typically do not have the luxury, space, or time to step away from the tactical and operationally focused tasks—like identifying and disrupting terror plots—that consume them. That level of focus is obviously needed, but it has also come at a strategic cost, as it has not been complemented by meaningful emphasis placed on the strategic review, analysis, and exploitation of terrorism data. For example, due to the pace of counterterrorism operations over the past two decades, it has not been uncommon for material, after it has been exploited for operational and tactical purposes, to be set aside, where it typically remains underutilized, gathering proverbial dust. Over time, the amount, diversity, and richness of the primary source data collected by the United States has only grown—and grown exponentially.

With this context in hand, this article next explores five areas that deserve attention and focus in formulating a new terrorism and counterterrorism data action plan.

Five Focus Areas for Terror Data

1. Focus on Fundamentals: (Re)Invest in and Advance Core Terrorism Data
To avoid detection and disruption, terror networks are security conscious and try to conduct their internal affairs in a clandestine way. This can make it hard to identify terror group plans or gain insight into how networks and their actions are evolving. The withdrawal of U.S. forces from Afghanistan and the ongoing U.S. effort to ‘right size’ counterterrorism are compounding this issue and are leading to a reduction in the number and quality of sensors—the human and technical ‘eyes and ears’—and key data injects that have been leveraged to better understand, track, and target priority terror groups.

To address and minimize this problem, the United States should identify how it can extract more meaning from existing terrorism data repositories; how it can creatively aggregate or stitch those sources of data together so it can better, and more efficiently, spot patterns, anomalies, and hidden trends; and what types of ‘new’ sensors or sources of data will need to be engineered or leveraged to maintain useful windows into the activity of terror groups around the world.

The United States should thus be looking at key terrorism data resources and holdings, and related data streams, to see if and how those resources can help to fill the gap and enhance understanding of terrorism dynamics and better illuminate how the strategic environment is evolving during this transitionary period. And it should start by looking at core data resources, like that provided by the Global Terrorism Database (GTD)—a key open-source repository that contains data on more than 200,000 global terror incidents since 1970—and then work outward.

When it comes to terror data resources, the GTD is not sexy: It catalogs base-level information about terror attacks, details such as “the date and location of the incident, the weapons used, nature of the target, the number of casualties, and—when identifiable—the group or individual responsible.”29 Analysis of GTD data is not going to help the U.S. government to identify or prevent the next terror attack, and as a result, in U.S. government circles it often does not receive the support or attention it deserves. But the GTD is a foundational and underutilized data resource that can be used to help identify longitudinal trends, evaluate shifting terror group priorities, and situate trends related to terror group interactions, tactics, or geography. Its core strength is that it provides data-driven historical context; information needed to baseline terror attack trends, identify change over time, and understand high-level threat patterns. Having that type of data on hand is critical for the United States to achieve strategic intelligence objectivesm and optimize its counterterrorism activity and investments.

An example helps to bring the strategic utility of the GTD to light. The Philippines is often positioned and viewed as one of the Global War on Terror’s success stories, an example of where and how a comparatively modest level of U.S. advise and assist activity and counterterrorism investment has led to the containment, or reduction, of Islamist-inspired terrorism. That is a useful narrative, but an analysis of 20 years’ worth of GTD attack data demonstrates how that view is disconnected from reality. According to the GTD, terrorism has become considerably more of a problem in the Philippines over the last decade than the decade prior. For example, 79 percent of all terror attacks in the Philippines (regardless of ideological orientation) occurred between 2011 and June 2019, while slightly more than 21 percent of such attacks took place from 2001 to 2010. Complicating matters further, the rise in the volume of terror attacks over the past decade in the country was not limited to one category of group, but was a consistent trend across Islamist militant groups, communist-inspired organizations, and operations conducted by unknown entities.
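The kind of period comparison described above is simple to reproduce once incident records are in hand. The following is a minimal sketch, not the method used to generate the figures in this article: the function `period_shares` is a hypothetical helper, and the list of incident years is invented toy data that does not come from the GTD.

```python
from collections import Counter


def period_shares(years, periods):
    """Return the fraction of incidents falling in each (label, start, end) period.

    Incidents outside every period are ignored, so the shares are computed
    against a common denominator and sum to 1 whenever any incident matches.
    """
    counts = Counter()
    for year in years:
        for label, start, end in periods:
            if start <= year <= end:
                counts[label] += 1
                break
    total = sum(counts.values())
    if total == 0:
        return {label: 0.0 for label, _, _ in periods}
    return {label: counts[label] / total for label, _, _ in periods}


# Toy incident years, invented for illustration only (NOT GTD values).
toy_years = [2003, 2008, 2012, 2014, 2015, 2016, 2017, 2018, 2019, 2019]
periods = [("2001-2010", 2001, 2010), ("2011-2019", 2011, 2019)]

shares = period_shares(toy_years, periods)
for label, share in shares.items():
    print(f"{label}: {share:.0%}")
```

Because both shares use the same denominator, they should add up to 100 percent, which is a useful sanity check when reporting decade-over-decade comparisons. Running the same calculation separately for each category of group (for instance Islamist, communist-inspired, and unknown) would show whether an increase is concentrated in one category or shared across them.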

Why do these long-term trends matter? Is not the GTD primarily a tool for academics? They matter because the Philippines is a treaty ally, a strategic partner, and a country that is extremely relevant to near-peer competition and the ongoing fight against Islamist militancy. The Philippines sits smack dab at the intersection of both of those issues. Given the importance of the Philippines alliance, it is critical that the United States understands how terror threats—from Islamist to communist-inspired—are evolving in the country so that it can be a better partner and optimize or adjust its investments or approach.

As explained in the automate and augment section below, the utility of the foundational data provided by the GTD is not limited to identifying long-term trends. Through automation, that data can also be leveraged, or enhanced, to help the U.S. government more effectively spot other more pressing patterns and changes in terror group behavior, information that can help the United States anticipate the direction and modality of future threats.

So as the United States’ security establishment re-evaluates its terrorism data resources and maps out what it can, or should, do with them, it would be wise to focus on fundamentals and (re)invest in resources like the GTD.30 The GTD is not a perfect resource, but there are few resources like it and the database provides core and reliable baseline data that the U.S. government can use to track change, inform shifts to U.S. counterterrorism policies and priorities, and enhance the utility of other tools and data resources.

2. Better Leverage and Make Strategic Use of Captured Material
The United States has collected a massive amount of information during counterterrorism operations conducted since 9/11. In an article published in Joint Forces Quarterly in 2020, it was estimated that the U.S. military was in possession of “over 300 terabytes of CEM gathered from across the globe.”31 n This diverse archive of material includes forensic material32 and data from computers, external hard drives, and cell phones, and from physical items like books, manuals, diaries, letters, and other types of personal correspondence that have been recovered.

All that data, what the U.S. military has taken to calling ‘Collected Exploitable Material (CEM)’ or battlefield evidence, has been critical in helping counterterror practitioners identify and locate new terror targets and enhance understanding of internal terror group dynamics, such as leader priorities, inter-group relationships, organizational challenges, and the bureaucratic minutiae associated with running terror networks. CEM holds great potential and has been used in important ways “to investigate and prosecute foreign terrorist fighters, screen and watchlist terrorist suspects, or deny” travel.33

Some additional detail highlights the strategic value of CEM.34 The U.S. military’s CEM archive includes personal correspondence between senior terror group leaders like Usama bin Ladin and his key lieutenants; the fingerprints and other signatures of bomb makers; detailed personnel, payment, and organizational data on tens of thousands of fighters who joined the Islamic State; internal records about and produced by the Afghan Taliban and key figures—including Jalaluddin Haqqani—who helped shape the direction of that movement; and troves of financial records produced in various languages.

Since 9/11, the U.S. government has made significant strides in how it processes and makes use of CEM, and that work—which has placed emphasis on speed, the use of various tools and approaches (e.g., “investments in text recognition technology, object detection, machine translation, audio and image categorization”), and the sharing of data—is ongoing.35 CEM data is operationally valued and utilized by the U.S. counterterrorism community.o One important reflection of this is the U.S. government’s use of AI to rapidly process the trove of more than 470,000 documents that Navy SEALs recovered from Usama bin Ladin’s compound in Abbottabad, Pakistan, and the efforts made to evaluate that collection of material in relation to a broader corpus of data to identify “future plots, emerging threats and [develop] a greater understanding of mysteries” about al-Qa`ida that were not well understood before.36 “Had AI not been used in that instance,” the Defense Intelligence Agency’s Science and Technology director of artificial intelligence remarked in 2020, “it would have taken the entire federal workforce to piece the puzzle together and it still probably wouldn’t have succeeded.”37 While the ability of seasoned experts to process such document collections and generate key national security takeaways on their own steam should not be underestimated, this sentiment speaks to the perceived benefits of AI-informed approaches.

For some of the ML and data analytics tools that the U.S. government has invested in, such as the Advanced Analytics and Machine Learning Microservices Platform (A2M2P), the bin Ladin archive was a key open-source test case. According to statements made in May 2019 by a representative of the company that developed the A2M2P tool, the “next step is to modify the tool to integrate sensitive-site exploitation data with information from open sources, signals intelligence and human intelligence.”38

The bin Ladin example illustrates the power of AI and how the U.S. government has been leveraging AI to exploit large collections of CEM and other data for operational purposes.p Those gains are critically important, but the U.S. military’s vast CEM holdings still remain a strategically underleveraged and underutilized resource. The Defense Department recognizes that the amount and varied nature of its CEM holdings have been key challenges and that it has “struggled to get these materials to our allies and partners in a usable format and timely manner.”39 q A new set of guidelines issued by former Secretary of Defense Mark Esper in January 2020 reflects DoD’s awareness of these issues and the need to solve them, as the memo directed that “all new CEM be unclassified unless sensitive sources, methods, or activities were used to acquire it.”40

As noted by Michael Fenzel, Leslie Slootmaker, and Kim Cragin, the “new guidance lays the foundation for CEM to be used well beyond the battlefield. It allows for easier transfer of CEM from the military to other U.S. Government agencies, as well as our allies and partner nations.”41 It will also make it easier for DoD to share CEM data with technology partners and other service providers.

Several factors have aligned to create a ripe window for the United States to step back and develop a plan for how it can make more effective and strategic use of CEM. For example, al-Qa`ida and the Islamic State both currently pose less of a significant threat to the United States. And as outlined above, the U.S. counterterrorism enterprise is also currently navigating a major inflection point, where U.S. counterterrorism posture and activity—given the emphasis placed on near-peer competition—is being reevaluated and ‘right sized’ across the board. If there ever was a ‘good’ time to re-envision how the U.S. government can better utilize and draw upon its rich repository of CEM, that time is now.

The opportunity may be fleeting, however. If the United States does not take advantage of this window, there is a danger that as less and less new material is added to the CEM archive, attitudes about the usefulness of CEM may shift, and CEM may increasingly be viewed as a dated, historical resource of diminishing relevance. The ongoing shift to strategic competition is only likely to amplify and compound these pressures.

There are numerous reasons why failing to develop a plan to strategically leverage CEM would be lamentable. Al-Qa`ida and the Islamic State—the two groups for which there is the largest amount of CEM—have been degraded, but they are not going away. Those two groups will evolve and will present terrorism threats in the months and years ahead. Yet as the United States continues to shift its strategic emphasis and focus toward near-peer competition, it is likely that deep knowledge about al-Qa`ida and the Islamic State will diminish over time, especially as seasoned government experts retire, shift to other problems, or take on new jobs. In the years to come, it seems likely that there will be fewer specialists well versed in the U.S. military’s rich and varied CEM stockpile. Thus, one reason why developing a plan to make strategic use of CEM makes sense is that it would help capture institutional knowledge and fuel the continued development of such knowledge about the two primary terrorist adversaries the United States has been fighting for the past 20 years.

The U.S. military’s primary source CEM archive, for example, could be used to write the definitive history of al-Qa`ida and the Islamic State, with specific chapters—and related CEM data appendices—tailored to key periods, regions, themes, or topics. Those two resources, which could be supported and underwritten by the U.S. government and developed by a mix of government personnel and leading scholars, would provide a detailed, thoughtful, and comprehensive ‘go-to’ resource for the next generation of analysts involved in activity to counter al-Qa`ida and the Islamic State, or their future spinoffs and manifestations. That would be a smart, and relatively low-cost, investment that would allow the U.S. government to preserve knowledge and gain analytical efficiencies over time.

Another related reason why it would be lamentable not to do more with CEM is that doing nothing more assumes that the U.S. government has learned all it can from its archive and that the CEM archive does not contain data relevant to new counterterrorism lines of operation. In fact, just as CEM can be leveraged to help look back and better understand the past, it can also be leveraged to help the United States uncover issues it has missed and identify information that could inform, or lead to, future actions. For example, the material could be used to enhance understanding of the Islamic State’s global supply chain network (with potential emphasis placed on suppliers utilized in countries such as Turkey or China).

The importance of leveraging CEM in this way takes on additional salience when one considers the United States’ withdrawal from Afghanistan and its current predicament. Due to that decision, the United States—as noted by General Votel in an interview in this publication—is going to need to develop ways to understand and disrupt or attack terror targets at a greater stand-off distance.42 One potential way it could do that is by focusing on the logistical and financial support networks that have helped to sustain groups like the Afghan Taliban or Islamic State Khorasan. The CEM archive contains thousands of financial ledgers, many of which are in Pashto or Dari, that were recovered by U.S. and partner forces in Afghanistan.43 If the United States needs new ways to continue to apply pressure to those two groups, or gain leverage, the CEM archive likely holds some important insights, clues, and undiscovered secrets.

Lastly, as Fenzel, Slootmaker, and Cragin have argued, CEM “also holds unlimited potential for strategic competition” and can be creatively leveraged in that regard.44

3. Create and Utilize CT Data Resources to Learn Lessons, Improve, and Advance the Study of CT
The stockpile of data the United States has acquired since 9/11 is not limited to data on terror adversaries. The U.S. military also holds detailed information about its own counterterrorism activity over the past two decades. This ‘blue’ data should be studied and leveraged so the United States can learn from it and identify which strategies or approaches have worked or not worked. Doing so would help the U.S. military determine how it can become more effective as a force. It would also advance the discipline of counterterrorism as an area of academic inquiry.

An example highlights why leveraging ‘blue’ data in a more comprehensive way to look inward would be a smart play.r One of the approaches that have guided U.S. counterterrorism since 9/11 is leadership decapitation: the removal, either through capture or killing, of key terror group personnel. The effectiveness of decapitation as a strategy has been the subject of academic debate for more than a decade, and key studies have put forth different interpretations of how leadership decapitation impacts the survivability, or endurance, of terror groups.45

The U.S. military holds data on thousands, if not tens of thousands, of real-world counterterrorism operations that could be used to inform some of the decapitation points of debate and advance understanding of where, how, and under what conditions that approach has or has not worked. For example, existing studies typically examine leadership decapitation through the lens of top, or senior, terror group leaders on whom information has been published. Data on mid-level leaders or key terror personnel who might not necessarily be leaders but who play critical roles in an organization (e.g., key financiers and logisticians) is less available via open sources.46 It therefore would be useful to know how removal of other key terror group members, beyond senior leaders, has impacted or not impacted the ability of al-Qa`ida, the Islamic State, the Haqqani network, or the Afghan Taliban to operate.

Similarly, if the United States has an interest in improving its future use of leadership decapitation, it would be useful to know what the data says about when the effects of counterterrorism actions are more lasting: Is it when leaders and mid-level managers are removed in rapid succession, when key support personnel are targeted, or perhaps when kinetic actions have been complemented with additional counterterrorism approaches that place other forms of pressure on a group? The data could support some of the academic findings on the topic: that leadership decapitation is of limited effectiveness when regularly applied across time, especially when applied against older and more seasoned groups. And if that ends up being the case, the United States should give additional consideration to when leadership decapitation would be most beneficial, when it is counterproductive, and when other approaches might lead to more lasting effects.

Looking back on decades’ worth of operational counterterrorism data will also likely pay other dividends, as when that data is reviewed in hindsight and from a strategic perspective it could reveal and spotlight patterns that the United States did not see, or was moving too fast to notice; information that could prove useful over the next five to 10 years as the fight against terror evolves.

Ideally, the United States would take a broad and comprehensive look at its counterterrorism data holdings, as there are additional types of data that can reveal important insights about the scale, application, and effectiveness of other U.S. counterterror tools and approaches.s In many cases, this includes data resources that are available but are either scattered—with pieces of data about a particular issue located in various places—or that are not well structured to facilitate data-driven analysis. One example is data about the United States’ use of two primary terror sanction tools: Executive Order 13224 and the State Department’s Foreign Terrorist Organization (FTO) designation. In September 2020, the Combating Terrorism Center released a major longitudinal study that leveraged data from those two tools to attempt to empirically evaluate outcomes associated with their use.47 Why? Because while the U.S. government believes those two tools are useful and lead to better or more effective outcomes, no one within government had done the work to empirically evaluate if that was the case. To facilitate that look, CTC just needed to add some structure to the data and examine it through an analytical framework. That took time, but it was not rocket science.

Unfortunately, when it comes to counterterror data, the shortcomings described above are not uncommon. Another example is U.S. security cooperation data. Given the ongoing ‘right sizing’ of U.S. counterterrorism, over the next decade the United States is likely going to need to rely on partners more, not less. U.S. security cooperation assistance provides an important set of authorities and tools to help the United States bolster the capabilities and capacity of allies, as well as maintain the relationships needed to enhance U.S. influence, shape or conduct counterterrorism operations, and enrich understanding of how foreign terror threats are evolving. Yet, despite the importance of U.S. security cooperation activity to the future of U.S. counterterrorism, the security cooperation data landscape leaves a lot to be desired. Much data exists on the topic, but it can be hard to find, and where structured publicly available data resources exist, they only provide a high-level view of security cooperation efforts. It takes an informed and discerning eye to make sense of the available data on counterterrorism-focused U.S. security cooperation programs. That makes it hard and inefficient to aggregate and stitch together data about historical and more current security cooperation programs that have a counterterrorism nexus, which in turn makes it even harder to provide a perspective on the effectiveness of those programs (typically valued at tens of millions of dollars) over time. It also makes it hard to empirically identify cross-cutting building partner capacity challenge areas tied to specific capabilities, systems, or U.S. approaches that can be common to various partners.

More structured analysis and tracking of security cooperation data also has other benefits, as when data on China’s and Russia’s counterterrorism-focused security assistance are matched with data on U.S. activity, it can reveal where the United States is competing, or not competing, with its near-peer adversaries around the world, and how that landscape is evolving.

4. Practice Alchemy: Aggregate, Integrate, Experiment, and Make Creative Use of Data
Sometimes, data relevant to terrorism and counterterrorism is hiding in plain sight: it just needs to be ‘found’ and leveraged in novel and creative ways. Thus, as the United States looks forward, it should also give serious consideration to what other types of under- or less-utilized data it can leverage to advance understanding of terror group behavior and counterterrorism activity. Two examples highlight the power of investing in and embracing terror and counterterror data alchemy.

The first example relates to data that was imaginatively extracted from a terror group’s writings about its own fallen recruits. Recruits are the life blood of terror groups, as without members—and the recruitment of new members—terror groups stagnate and wither away. This is one of the many reasons why it is essential to develop insight into who joins terrorist groups, what motivates those individuals, and how terror organizations attract and make use of their new members. For a group like the Islamic State and its predecessor entities, there is a plethora of recovered documents—from payment spreadsheets to individual registration forms—that provide insight into that group’s tens of thousands of recruits. But the same type of material is not as available for other important militant groups, like the Pakistani terror outfit Lashkar-e-Taiba (LeT), which orchestrated the high-profile and complex attack in India’s Mumbai in 2008.48 So, if developing a deeper understanding of who joins a group like LeT, how they join, and what those recruits do in the group is a priority, then that data needs to be found elsewhere, or it needs to be engineered.

Fortunately, terrorist groups like to ‘talk’ and they like to publish material about their worldview, their activity, and their accomplishments. LeT is no exception. Indeed, since the 1990s, LeT, as well as many other FTO-designated Pakistani terror organizations, has openly published a mix of periodicals targeted to specific audiences—from Urdu language magazines designed for men, women, and children to publications released in English.49 LeT’s magazines are chock full of all sorts of information the group has decided to publicly privilege and publish, including tributes to fallen fighters who have died during LeT operations. Month after month, year after year, and across two decades, the group has published details about its fallen recruits. That data has been available: it just needed to be extracted, coded/structured, and analyzed. So, in the early 2010s, that is what a CTC effort did.

The result: the creation of a 900-person dataset filled with details about the background, recruitment, training, deployment, and death of recruits who joined the group across a 13-year period—the largest public dataset on Pakistani militant group members of its kind at the time.50 The analytical report that accompanied the dataset provided granular, data-driven insights about where, down to the district and village level in Pakistan, LeT has historically recruited its members and the specific regions, usually in areas of India-occupied Kashmir, where those recruits died.51 To derive more meaning, information about the educational background of the militants was also extracted from LeT’s magazines and evaluated in relation to publicly available statistical data released by the government of Pakistan about male educational attainment levels in the country, data that helped to dispel some myths about the typical level and type of education LeT militants have received. In the LeT example, the explanatory power of the data did not lie in the individual martyrdom biographies that the group published, but in the longitudinal aggregation and statistical examination of that collection of data.
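The aggregation step described above, in which coded attributes are tallied across many biographies and then compared against a population baseline, can be sketched in a few lines of Python. This is a minimal illustration only: the function names, the category labels, the coded records, and the baseline shares are all placeholder assumptions and are not drawn from the LeT dataset or from Pakistani government statistics.

```python
from collections import Counter


def share_by_category(values):
    """Return the share of records falling into each coded category."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


def compare_to_baseline(sample_shares, baseline_shares):
    """Return sample share minus baseline share for every category seen in either."""
    categories = set(sample_shares) | set(baseline_shares)
    return {
        c: sample_shares.get(c, 0.0) - baseline_shares.get(c, 0.0)
        for c in categories
    }


# Placeholder coded records (hypothetical; one entry per coded biography).
coded_education = [
    "primary", "secondary", "secondary", "none",
    "secondary", "primary", "tertiary", "secondary",
]

# Placeholder population baseline (hypothetical shares that sum to 1.0).
baseline = {"none": 0.25, "primary": 0.25, "secondary": 0.375, "tertiary": 0.125}

sample = share_by_category(coded_education)
difference = compare_to_baseline(sample, baseline)

# Positive values mean the category is over-represented in the sample.
for category in sorted(difference):
    print(f"{category}: {difference[category]:+.3f}")
```

The point of the sketch is the one made in the text: no single record is informative on its own, but the aggregated shares, set against a baseline, are.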

The second example of a novel use of data for counterterror purposes comes not from physical magazines, but from information captured overhead: from satellites. In early 2021, Eric Robinson and Sean Mann published a series of reports through a collaborative effort between RAND and the National Geospatial-Intelligence Agency (NGA).52 As part of that effort, the two RAND researchers leveraged geospatial data on nighttime lighting to evaluate the temporal growth and continued use of 380 detention facilities in Xinjiang, China53—facilities the Chinese government has reportedly used to detain Uighur Muslims as part of its domestic efforts, it claims, to counter separatism, extremism, and terrorism in the country.

The project was unique due to the novel approach taken, particularly the creative analytical use of under-utilized data (i.e., nighttime lighting); its blended use of data; and its overarching focus (i.e., the initiative shined a data-driven light on the disconnect between China’s claims about its detention facilities in Xinjiang and what commercial satellite data suggests about their use).

The effort was also noteworthy because it appears that the approach taken could—with some data science engineering—be automated, scaled, or applied to other similar use cases around the world.

5. Automate and Augment: From ‘Big’ and Merged Data to Smaller Scale, Basic-Level Applications
Due to the diversity of terror threats and the complexity and scale of information that needs to be reviewed, the United States also needs to figure out which data processing and analytical tasks it can automate through investment in data science and ML/AI-driven approaches. Regardless of whether one is a fan of the Star Wars spin-off series The Mandalorian or not, when it comes to the future, like the creed espoused by the show’s main character, “This is the way.”

The U.S. government has been moving in the automation direction and has recognized the broad potential of data-enabled technologies for years.54 Indeed, as noted by Shultz and Clarke, Project Maven—DoD’s AI ‘path finder’ counterterrorism effort to automate the processing of ISR [intelligence, surveillance, and reconnaissance] data—“is not the endgame—it is a start point,” a “first step toward a data-enabled force.”55 t The “PED [Processing, Exploitation, and Dissemination] problem with FMV [Full Motion Video]” that Project Maven aimed to address was, according to the two authors, “a single point of entry. The intelligence warfighting function alone has many other data-rich nodes, such as digital media and other forms of captured enemy material, that are ripe for AI/ML application.”56 The broad vision that the United States has for automation is also reflected in the National Security Commission on Artificial Intelligence’s 2021 report, which recommended that “Starting immediately, the IC [Intelligence Community] should prioritize automating each stage of the intelligence cycle to the greatest extent possible and processing all available data and information through AI-enabled analytic systems before human analyst review.”57 Such steps, if taken and executed well, will—as researcher Brian Katz has observed—lead to numerous benefits, such as helping to create more “strategic bandwidth” for analysts.58

A well-publicized area where the power and automation benefits of AI have been utilized for counterterrorism purposes is the identification, moderation, and removal of online terror content by social media and technology companies, from Facebook, YouTube, and Twitter to Snap and others.59 Examples range from AI tools and approaches utilized at these companies, to a content classifier created by a data science/AI firm in partnership with the U.K. Home Office,60 to collaborative initiatives like the Global Internet Forum to Counter Terrorism (GIFCT)61 or the United Nations-supported Tech Against Terrorism,62 which were set up to foster technical know-how, share knowledge, and advance research, as well as other efforts.63 In July 2021, GIFCT made news by announcing it would be diversifying the ideological type of information it shares with partners via a central database it manages, with primary emphasis no longer placed just on Islamist extremist content, an historical area of focus, but also on information about far-right extremist activity.64 GIFCT also announced that it would expand the types of information it shares—beyond photos and videos—to three new categories of content: “PDFs of terrorist or violent extremist attackers, terrorist publications that use specific branding and logos, and URLs that are often shared on social networks.”65 These developments point to a maturation of how extremist digital content is handled, and the scaling of AI and technical solutions to different types of extremist content and material produced by networks motivated by different ideologies.

AI has been used in more controversial ways to automate the identification of patterns of interest to counterterrorism practitioners. As highlighted by Kathleen McKendrick, one such example comes from “leaked details of the US National Security Agency’s SKYNET” effort, “which was purportedly used in Pakistan in 2007.”66 The algorithm developed was “used to analyse metadata from 55 million domestic Pakistani mobile phone users. This was a machine learning model built by exposure to this data; it classified the phone users into two separate groups, one of which exhibited a usage pattern matching that of a small group of persons known to be terrorist couriers, the other comprising the remainder of the mobile phone users.”67 Even though the model reportedly had a low false positive rate (0.008 percent), the scale of data collected and purportedly analyzed “would result in the wrongful identification of some 15,000 individuals as of interest”—a large number.68 Despite the privacy considerations associated with the model and approach, it “shows how seemingly non-sensitive data may have predictive value when identifying close links with terrorism or likely intelligence value.”69
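The base-rate arithmetic behind this concern can be sketched as follows. The function names are illustrative assumptions, and the calculation assumes that nearly everyone screened is a non-target, so that expected false positives are approximately the screened population multiplied by the false positive rate. Applied to the 55 million phone users quoted above, a 0.008 percent rate yields roughly 4,400 wrongly flagged individuals; the quoted figure of some 15,000 would instead correspond to a screened population of roughly 190 million, which suggests the two quoted numbers may rest on different population bases. The sketch does not attempt to resolve which base the original reporting used.

```python
def expected_false_positives(population: int, false_positive_rate: float) -> float:
    """Expected number of non-targets wrongly flagged.

    Assumes almost everyone screened is a non-target, so the number of
    negatives is approximately the whole screened population.
    """
    return population * false_positive_rate


def implied_population(false_positives: float, false_positive_rate: float) -> float:
    """Population size that would produce a given false positive count."""
    return false_positives / false_positive_rate


# 0.008 percent expressed as a fraction.
FALSE_POSITIVE_RATE = 0.008 / 100

# Wrongly flagged individuals if 55 million users are screened.
flagged_55m = expected_false_positives(55_000_000, FALSE_POSITIVE_RATE)

# Screened population that would produce about 15,000 false positives.
implied_base = implied_population(15_000, FALSE_POSITIVE_RATE)

print(f"False positives among 55 million users: {flagged_55m:,.0f}")
print(f"Population implied by 15,000 false positives: {implied_base:,.0f}")
```

Either way, the underlying point holds: even a very small error rate produces thousands of wrongly flagged people once it is applied to a population measured in the tens of millions.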

DoD has also been placing AI emphasis on approaches that blend or fuse different types of data. Shultz and Clarke put the vision and emphasis into context:

The ability to access publicly available data, find connections within classified archives, and rapidly alert a strategic commander to a threat, update the situational awareness of a unit in the field, or enable quick and precise information operations became a very real possibility and an invaluable opportunity. Effectively, operationalized AI and cloud computing were inseparable. Recognizing the potential, USSOCOM extended the partnership with Project Maven and US Air Force research and development offices to build an algorithmic capability that will blend publicly available data with classified information across the intelligence, planning, and operational portfolios. This vision has expanded beyond Project Maven and USSOCOM, and now features prominently in DoD’s Digital Modernization Strategy.70

In the discussion on the GTD above, it was pointed out how analysis of foundational and readily available open-source data on terror incidents could be automated and leveraged to provide insight into changes occurring in a specific place, or in relation to a group or issue. As noted earlier, terrorism analysts have many groups and a lot of geographic ground to cover, and as a result, it can be hard for them to systematically track changes in key terrorism indicators—such as an uptick in the frequency of attacks in one area or a group’s shift to a new type of target in another—over time. Analysts’ time is precious and limited, and it is better spent on more complex analytical tasks than on tracking a set of terrorism indicators or crunching that data; these tasks can and should be automated.u For example, GTD or another similar data repository like the Armed Conflict Location & Event Data Project (ACLED) could provide the historical data needed to generate and identify longitudinal, empirical terror trends. With one of those resources functioning as a data backbone, a data interface or dashboard could be designed that would allow users to select from a defined list of meaningful terrorism indicators (e.g., changes in scale of activity, geographic shifts, targeting trends, organizational complexity measures, and lethality) that could be organized by region or group, tailorable to each user’s specific interests.

The tool would then present results to the user when noteworthy changes (which could be toggled at different levels of sensitivity) or data anomalies occur—and do so in an automated, alert-type way. The tool, and not the user, would run the data analytics. The value is that the tool would arm the user with both immediate context and notification that a meaningful change in data has been observed. The data and technology exist to create such a tool, which would be akin to an automated notification tool, not for specific events (like Dataminr) but for specific terrorism data trends. That way, analysts can focus less on context and change, and more on navigating other data complexities. The power of such a tool would be enhanced even more if it relied not just on a single source of data like GTD or ACLED, but also fused or integrated indicator data from multiple sources,v and eventually different types of data.w
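One simple way such an alert could work is sketched below: each period's incident count is compared with a trailing window of prior periods, and the period is flagged when it deviates by more than a user-set number of standard deviations. This is a minimal sketch under stated assumptions, not a description of any existing system. The function name, the parameters `window`, `sensitivity`, and `min_sigma`, and the monthly counts are all hypothetical; a real tool would draw its counts from a source such as GTD or ACLED and would likely use more robust methods.

```python
from statistics import mean, stdev


def detect_changes(counts, window=6, sensitivity=2.0, min_sigma=1.0):
    """Flag periods whose count departs sharply from the trailing window.

    Returns a list of (index, count, z_score) tuples for every period whose
    count differs from the mean of the previous `window` periods by more
    than `sensitivity` standard deviations. `min_sigma` puts a floor under
    the standard deviation so that a perfectly flat history does not cause
    a division by zero.
    """
    alerts = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mu = mean(history)
        sigma = max(stdev(history), min_sigma)
        z = (counts[i] - mu) / sigma
        if abs(z) > sensitivity:
            alerts.append((i, counts[i], z))
    return alerts


# Hypothetical monthly attack counts for one group in one region.
monthly_attacks = [4, 5, 3, 4, 6, 5, 4, 5, 14, 5]

alerts = detect_changes(monthly_attacks)
for index, count, z in alerts:
    print(f"Month {index}: {count} attacks (z = {z:.1f}) exceeds the alert threshold")
```

Lowering `sensitivity` plays the role of the toggle described above: the same series produces more alerts at a lower threshold and fewer at a higher one, which lets each user trade off noise against the risk of missing a change.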

To develop such a tool, the U.S. government could draw upon lessons learned from prior AI efforts sponsored by the Intelligence Advanced Research Projects Activity (IARPA) and DARPA that have grappled with similar challenges. One example is IARPA’s Open Source Indicators Program, which was launched in 2011, and aimed to “develop methods for continuous, automated analysis of publicly available data in order to anticipate and/or detect significant societal events, such as political crises, humanitarian crises, mass violence, riots, mass migrations, disease outbreaks.”71 Another example is the Integrated Crisis Early Warning System (ICEWS), a project that DARPA kicked off in 2007.72

Conclusion
U.S. counterterrorism is at an important inflection point. Data and what the United States does with data will enable, and be a critical driver of, what the future of U.S. counterterrorism looks like. To prepare for that future, the United States needs to figure out how it can more effectively harness, extract more meaning from, and make more efficient and timely use of the vast stockpiles of terrorism-related data it has, and will continue to acquire in the years ahead. Given the U.S. withdrawal from Afghanistan and the ongoing ‘right sizing’ of U.S. counterterrorism, the United States also needs to navigate how it can maintain visibility into the inner-workings and plans of key international terror networks, especially when operating more from afar, an issue that could affect the ‘quality’ or ‘currency’ of terror data over time, and create new risks.

Even though the United States’ national security apparatus has access to state-of-the-art, leading-edge technology, what it can do to better leverage data is still constrained by technical obstacles and other barriers.73 For example, as noted by Shultz and Clarke, the “greatest roadblock to advancing AI capabilities for the warfighter” is “the lack of a dedicated cloud-based data management infrastructure capable of quickly cutting across classification levels.”74 Navigating through those various challenges requires vision to guide change; the resources, leadership, infrastructure, technical know-how, and talent needed to advance it; and the development of a cultural environmentx that fosters creativity, experimentation, risk and therefore the acceptance of possible failure, and that creates the time, ‘space,’ and opportunities needed to bring that future to life.y

The ideas shared in this article are designed to advance conversations, and hopefully spur debate, about what a terror data action plan could, and arguably should, look like; the need for it; and how components of it could be pursued. Given this article’s emphasis on four types of data that have generally received less public data science and ML/AI-focused attention, the view that it offers is partial and limited. When it comes to the broader vision for what the U.S. government should do with its diverse and multifaceted data holdings that provide insight into terrorism and counterterrorism questions, a useful starting point is an effort that begins at two points of departure. The first would be more traditional and focus on the primary terrorism and counterterror data injects, resources, software, and systems that the United States already has in play, or plans to acquire, and develop approaches to help it structure, integrate, and derive more meaning in relation to key priorities.

The second approach would be a bit more unconventional and start from a blank sheet of paper. Such an approach is recommended as it would create an opportunity for the U.S. government to take a step back from the current suite of systems, tools, and solutions that it utilizes, which could limit or constrain the pace and power of change. This is because while existing data and AI-focused tools and systems are essential to the practice of intelligence and continuity of operations over the short-term, they may wed the United States to approaches that rely on those systems and in turn prevent the United States from designing and implementing new concepts and approaches that could help it to drive more radical, rapid, and meaningful change over the mid- to long term. Or, put another way, starting from a blank sheet of paper would give the United States the opportunity to think through and design a construct that would allow it to achieve the goals that it has for specific types of data and the collective integration of various forms of data. And do so without constraints.

Don Rassler is an Assistant Professor in the Department of Social Sciences and Director of Strategic Initiatives at the Combating Terrorism Center (CTC) at the U.S. Military Academy. His research interests are focused on how terrorist groups innovate and use technology; counterterrorism performance and the evolution of counterterrorism practices and strategy; and understanding the changing dynamics of militancy in select countries in Asia. Twitter: @DonRassler

The views expressed in this article are the author’s and do not necessarily reflect those of the Combating Terrorism Center, United States Military Academy, Department of Defense, or U.S. Government.

© 2021 Don Rassler

Substantive Notes
[a] For example, as noted in the U.S. Department of Defense’s 2020 Data Strategy: “The DoD now recognizes that data is a strategic asset that must be operationalized in order to provide a lethal and effective Joint Force that, combined with our network of allies and partners, sustains American influence and advances shared security and prosperity.” “DoD Data Strategy,” U.S. Department of Defense, 2020, p. i. For another perspective on the strategic utility of data to U.S. national security, see Edmund L. Andrews, “Re-Imagining Espionage in the Era of Artificial Intelligence,” Stanford Institute for Human-Centered Artificial Intelligence, August 17, 2021.

[b] When it comes to national security, as noted by the 2019 NSCAI report, “AI will change how we defend America,” “AI will change how intelligence agencies make sense of the world,” and “AI will change how we fight.” “Interim Report,” National Security Commission on Artificial Intelligence, November 2019. For other views on the transformative potential of AI, see Greg Allen and Taniel Chan, Artificial Intelligence and National Security (Cambridge, MA: Belfer Center for Science and International Affairs, July 2017) and Michael C. Horowitz, Gregory C. Allen, Edoardo Saravalle, Anthony Cho, Kara Frederick, and Paul Scharre, Artificial Intelligence and International Security (Washington, D.C.: Center for a New American Security, July 2018).

[c] For example, DARPA announced in September 2018 a multi-year investment of more than $2 billion in new and existing programs called the “AI Next” campaign. See “AI Next Campaign,” DARPA.

[d] One important reflection of this pivot is the set of goals outlined in the 2021 final report from the National Security Commission on Artificial Intelligence, which aim to have an “AI Ready IC by 2025” and an “AI Ready DoD by 2025.” See chapters 5 and 3 of the final report for full context.

[e] The U.S. Department of Defense has been investing in AI approaches for a considerable period. For example, the National Media Exploitation Center has been investing in AI for at least 15 years. See Brooke Crothers, “Artificial intelligence linked to Bin Laden raid is being used to find future threats,” Fox News, July 2, 2020.

[f] As Richard Shultz and Richard Clarke have noted: “Not only is SOF AT&L [Special Operations Forces Acquisition, Technology, and Logistics] taking steps to seek out potential applications of AI/ML in all existing programming lines, in March 2020, the directorate formally created a Program Executive Officer for SOF Digital Applications to improve enterprise-wide acquisition of software solutions.” See Richard H. Shultz and Richard D. Clarke, “Big Data at War: Special Operations Forces, Project Maven, and Twenty-First Century Warfare,” Modern War Institute, August 25, 2020.

[g] The author recognizes that social media platforms are often primarily outlets or tools through which terror groups and violent extremists release propaganda material. The point being made here is that this article does not directly focus on or explore the potential associated with social media data as a general category of data. Instead, it discusses official terror group propaganda, which is often distributed online through social media mechanisms, as an example of one specific type of data that remains underleveraged.

[h] As Tricia Bacon and Jason Warner noted in this publication, jihadi “violence now affects at least five regions on the [African] continent and 22 countries, including several that had no history of jihadism prior to 2001, such as Mozambique and Burkina Faso.” See Tricia Bacon and Jason Warner, “Twenty Years after 9/11: The Threat in Africa – The New Epicenter of Global Jihadi Terror,” CTC Sentinel 14:7 (2021).

[i] According to Zegart, the “Five Mores” are: 1) more threats, 2) more data, 3) more speed, 4) “the expanding number of decision makers who need intelligence,” and 5) more competition. See Andrews for background.

[j] Or, as succinctly characterized by Zegart, “more data” and “more speed.”

[k] Reporting by The Los Angeles Times provides another data point with respect to this issue. According to reporting by W.J. Hennigan in 2016, up till that point, “The largest data trove was recovered when U.S.-backed Syrian rebel forces recaptured Manbij, an Islamic State stronghold in northern Syria, in mid-August [2016]. Intelligence agencies recovered more than 120,000 documents, nearly 1,200 devices and more than 20 terabytes of digital information.” See W.J. Hennigan, “Captured battlefield cellphones, computers are helping the U.S. target and kill Islamic State’s leaders,” Los Angeles Times, October 26, 2016. It has been reported that the raid that targeted Usama bin Ladin in Abbottabad, Pakistan, resulted in the recovery of more than 470,000 individual files and 2.7 terabytes of data. See Sandra Erwin, “Can artificial intelligence help U.S. SOCOM track weapons of mass destruction?” Space News, April 24, 2018.

[l] Another example cited by Shultz and Clarke noted: “Full-motion video (FMV) collected by UAV platforms grew exponentially in the early 2010s. Understanding what this encompassed can be stupefying. For example, one estimate noted that in 2011, UAVs ‘sent back over 327,000 hours (or 37 years) of FMV footage.’ By 2017, it was estimated for that year that the video US Central Command collected could amount to ‘325,000 feature films [approximately 700,000 hours or eighty years].’” See Shultz and Clarke.

[m] For example, GTD data is important baseline data to identify and assess, as noted by the 2019 U.S. intelligence strategy, “the capabilities, activities, and intentions of states and non-state entities to develop a deep understanding of the strategic environment, warn of future developments on issues of enduring interest, and support U.S. national security policy and strategy decisions” or to “Broaden and deepen strategic knowledge of the global terrorism landscape to provide context to customers.” See “National Intelligence Strategy of the United States of America,” Office of the Director of National Intelligence, January 22, 2019, pp. 8, 12.

[n] Two other statistics help to provide a sense of scale of the amount of data collected. For example, over one four-month period in 2017, U.S. special operations forces were involved in or directly supported 2,175 ground operations against the Taliban, Islamic State Khorasan, and Haqqani network militants in Afghanistan. See “Operation Freedom’s Sentinel: Quarterly Report to the United States Congress – October 1, 2017 – December 31, 2017,” U.S. Department of Defense Office of Inspector General, February 16, 2018, p. 39. Second, as Shultz and Clarke noted about U.S. operations in Iraq, as “the JIATF took shape, and raids increased to three hundred a month, intelligence became unmanageable with massive amounts of captured enemy material—documents, hard drives, thumb drives, cell phones—flowing into the system.” See Shultz and Clarke.

[o] For example, in 2018 more “than 75 officials from across the U.S. government participated in a battlefield evidence senior leader seminar” held at SOCOM. See “US senior leaders explore battlefield evidence processes at USSOCOM,” USSOCOM, December 10, 2018.

[p] In another interesting use case, the Dutch Ministry of Justice and Security commissioned RAND Europe to conduct a study of a considerable portion of the full archive of material (470,000 records) recovered from Usama bin Ladin’s compound and released to the public. That study leveraged different technologies and ML to make sense of the collection. See Jacopo Bellasio et al., Insights from the Bin Laden Archive: Inventory of research knowledge and initial assessment and characterization of the Bin Laden archive (Santa Monica, CA: RAND Corporation, 2021). The UBL archive has also been creatively utilized in other ways. For an example, see “Machine learning, UFOs, and Darth Vader,” Fathom Information Design, August 20, 2018.

[q] As noted by Shultz and Clarke, “Data classification—both in terms of archival organization and security compartmentalization—had become a monumental roadblock.” See Shultz and Clarke.

[r] Two examples of data-driven efforts focused on this area include the Counterterrorism Net Assessment Data Structure (CT-NEADS) project and the Government Actions in Terror Environments (GATE) Dataset, the latter of which placed initial focus on Israel and Canada. For background on CT-NEADS, see “Counterterrorism Net Assessment Data Structure,” START. For background on GATE, see Laura Dugan and Erica Chenoweth, “Introducing Government Actions in Terror Environments (GATE) Dataset,” paper presented at The Construction of Terrorism Conference, December 3-4, 2015.

[s] One useful resource that could help kick-start and guide such an effort and related research inquiries would be the Influencing Violent Extremist Organizations (I-VEO) Knowledge Matrix tool developed by START, which contains a list of 183 hypotheses “about influencing VEOs, from positive incentives to punitive actions.” For background on that project, see “IVEO Knowledge Matrix,” start.umd.edu; start.foxtrotdev.com; and “START launches new tool for counterterrorism community,” start.umd.edu, August 31, 2012.

[t] As noted by the ODNI’s AIM Strategy, the U.S. intelligence community holds a similar view: “Leveraging artificial intelligence, automation, and augmentation technologies to amplify the effectiveness of our workforce will advance mission capability and enhance the IC’s ability to provide needed data interpretation to decision makers.” “The AIM Initiative: A Strategy for Augmenting Intelligence Using Machines,” Office of the Director of National Intelligence, January 16, 2019.

[u] Or as described by Shultz and Clarke: “Reinvesting the analyst’s expertise and energy away from screen-watching and onto more exquisite tasks is not just economical, it is a combat multiplier.” Shultz and Clarke.

[v] A useful proof of concept in this regard is the ExTrac tool, which combines real-time terror attack data with terror communications and is marketed as leveraging AI to develop analytical insights. For background, see https://extrac.io/. Another useful case to look at is Andi Peng’s undergraduate thesis paper, which provides “a novel approach to studying terrorism” by integrating “supervised machine learning techniques with terrorism specific domain knowledge to extract macro-level conclusions about the pattern of terrorist behavior.” See Andi Peng, “An Integrated Machine Learning Approach To Studying Terrorism,” Yale University thesis, April 20, 2018.

[w] As noted by the ODNI’s AIM Strategy: “Nearly all current commercial applications of AI are narrow solutions in that they solve a single problem with a single kind of data. Image classification, face recognition, and human language translation are all examples of narrow AI solutions. The IC must bring together data from multiple INTs to provide context and meaning to analysts over a variety of different data. Multimodal AI presents a whole new group of challenges in a number of areas that the IC must overcome.” See “The AIM Initiative: A Strategy for Augmenting Intelligence Using Machines.”

[x] As a CSIS Intelligence and Technology Task Force report noted: “The primary obstacle to intelligence innovation is not technology, it is culture.” See “Maintaining the Intelligence Edge: Reimagining and Reinventing Intelligence through Innovation,” Center for Strategic and International Studies, January 2021.

[y] There are a variety of ideas and methods to foster creative approaches to data and data application challenges. For example, as noted by Amy Zegart, “One of the most intriguing ideas that the [CSIS Intelligence and Technology] task force [report] came up with is to have AI ‘red cells,’ or teams that use open-source information and AI and compete against human analysts.” See Andrews. Another method, already pursued by SOCOM, is student competitions. See “MIT Army ROTC Cadets tackle SOCOM Innovation Challenge,” U.S. Army Cadet Public Affairs, December 1, 2020.

Citations
[1] Brooke Crothers, “Artificial intelligence linked to Bin Laden raid is being used to find future threats,” Fox News, July 2, 2020.

[2] For a perspective on this issue, see Richard H. Shultz and Richard D. Clarke, “Big Data at War: Special Operations Forces, Project Maven, and Twenty-First Century Warfare,” Modern War Institute, August 25, 2020.

[3] Brian Michael Jenkins, “The Future Role of the U.S. Armed Forces in Counterterrorism,” CTC Sentinel 13:9 (2020).

[4] “Final Report: National Security Commission on Artificial Intelligence,” National Security Commission on Artificial Intelligence, March 2021, p. 7.

[5] For examples, see the National Artificial Intelligence Research and Development Strategic Plan (2016), Department of Defense AI Strategy (2018), Office of the Director of National Intelligence’s AIM Strategy (2019), DoD’s Data Strategy (2020), and Department of Homeland Security’s AI Strategy (2020).

[6] For some examples, see Tajha Chappellet-Lanier, “Pentagon’s Joint AI Center is ‘established,’ but there’s much more to figure out,” FedScoop, July 20, 2018; Jennifer Bocanegra, “USSOCOM cuts ribbon on new Data Engineering Lab,” USSOCOM, October 3, 2019; Dave Nyczepir, “National AI Initiative Office Launched by White House,” FedScoop, January 2021; “The Biden Administration Launches AI.gov Aimed at Broadening Access to Federal Artificial Intelligence Innovation Efforts, Encouraging Innovators of Tomorrow,” The White House, May 5, 2021; “Today’s NCTC,” Office of the Director of National Intelligence, August 2018 (see in particular the section on the Office of Data Strategy and Innovation (ODSI) on p. 22).

[7] Examples of ODNI’s AI focus can be found on IARPA’s “Research Programs” webpage. For examples of JAIC’s testing of AI concepts and applications, see “Eye in the Sky: DOD Announces AI Challenge,” Defense Innovation Unit, August 15, 2019, and “JAIC partners with USSOCOM to deliver AI-enabled predictive maintenance capabilities,” JAIC, December 17, 2020. For a DARPA example, see Eduard Hovy, “Knowledge-directed Artificial Intelligence Reasoning Over Schemas (KAIROS),” DARPA.

[8] AI Next Campaign webpage, DARPA.

[9] To learn more about JAIC, see JAIC’s website. See also Don Rassler, “A View from the CT Foxhole: Lieutenant General John N.T. ‘Jack’ Shanahan, Director, Joint Artificial Intelligence Center, Department of Defense,” CTC Sentinel 12:11 (2019).

[10] “Final Report: National Security Commission on Artificial Intelligence,” Chapter 2.

[11] David Vergun, “AI Gleaned Information About Emerging Threats, Future Plots From bin Laden Raid,” DoD News, June 26, 2020.

[12] Shultz and Clarke. For other examples, see Kathleen McKendrick, “Artificial Intelligence Prediction and Counterterrorism,” RUSI, August 2019.

[13] Shultz and Clarke.

[14] Ibid. For a perspective on the ongoing nature of this type of work, see “ISR Processing, Exploitation, and Dissemination Laboratory,” MIT Lincoln Laboratory webpage.

[15] Robert P. Ashley, “Ten Years After Bin Laden, We Still Need Better Intelligence Sharing,” Defense One, May 1, 2021.

[16] For a perspective on some of the issues associated with ‘right sizing’ U.S. counterterrorism, see Stephen Tankel, “Making the U.S. Military’s Counter-Terrorism Mission Sustainable,” War on the Rocks, September 28, 2020.

[17] Paul Cruickshank, Don Rassler, and Kristina Hummel, “Twenty Years After 9/11: Reflections from General (Ret) Joseph Votel, Former Commander of U.S. Central Command,” CTC Sentinel 14:7 (2021).

[18] Paul Cruickshank, “A View from the CT Foxhole: An Interview with Nick Rasmussen, Director, NCTC,” CTC Sentinel 8:9 (2015).

[19] Edmund L. Andrews, “Re-Imagining Espionage in the Era of Artificial Intelligence,” Stanford Institute for Human-Centered Artificial Intelligence, August 17, 2021; “Final Report: National Security Commission on Artificial Intelligence,” Chapter 5; “Maintaining the Intelligence Edge: Reimagining and Reinventing Intelligence through Innovation,” Center for Strategic and International Studies, January 2021.

[20] There are some important exceptions to this. For example, see Boaz Ganor, “Artificial or Human: A New Era of Counterterrorism Intelligence,” Studies in Conflict and Terrorism 44:7 (2019) and Jonathan Fischbach, “A New AI Strategy to Combat Domestic Terrorism and Violent Extremism,” Harvard Law School National Security Journal (online), May 6, 2020. See also “Predicting Radicalisation Before It Happens – General AI for Law Enforcement,” in Marie Schroeter, Artificial Intelligence and Countering Violent Extremism: A Primer (London: GNET, 2020) and Walter Haydock, “Ghosts in the Machine: Targeting Homegrown Violent Extremists with Artificial Intelligence-enabled Investigations,” Lawfare, November 22, 2016.

[21] For background, see “Final Report: National Security Commission on Artificial Intelligence,” Chapter 8; Manjana Sold and Julian Junk, Researching Extremist Content on Social Media Platforms: Data Protection and Research Ethics – Challenges and Opportunities (London: GNET, 2021); and Thomas Hegghammer, “Resistance is Futile: The War on Terror Supercharged State Power,” Foreign Affairs, September/October 2021. See also the work of the Stanford Center for AI Safety (SAFE) at its website.

[22] Paul Cruickshank, Don Rassler, and Kristina Hummel, “Twenty Years After 9/11: Reflections from Michael Morell, Former Acting Director of the CIA,” CTC Sentinel 14:7 (2021).

[23] “‘Total Failure’: The War on Terror 20 Years On,” Agence France-Presse, August 26, 2021.

[24] Ibid.

[25] “The AIM Initiative: A Strategy for Augmenting Intelligence Using Machines,” Office of the Director of National Intelligence, January 16, 2019.

[26] “U.S. Reveals Face of Alleged New Terror Chief,” CNN, June 15, 2006.

[27] Eric Schmitt, “A Raid on ISIS Yields a Trove of Intelligence,” New York Times, June 9, 2015.

[28] “The AIM Initiative: A Strategy for Augmenting Intelligence Using Machines.”

[29] For background on the Global Terrorism Database, see https://www.start.umd.edu/gtd/

[30] For background on GTD’s funding and recent related challenges, see Karina Panyan, “START resumes Global Terrorism Database collection; 1970 – 2019 data file now available to researchers,” START, February 26, 2021.

[31] For the quote, see Michael Fenzel, Leslie Slootmaker, and Kim Cragin, “The Strategic Potential of Collected Exploitable Material,” Joint Forces Quarterly 99 (2020).

[32] For background on forensic material recovered from CEM, see Christopher Stoltz, “Augmenting the AOR: EOD Airman provides critical skillset to Army forensics team,” U.S. Air Force – 386th Air Expeditionary Wing Public Affairs, July 23, 2018, and Fenzel, Slootmaker, and Cragin.

[33] “US senior leaders explore battlefield evidence processes at USSOCOM,” USSOCOM, December 10, 2018. For additional background on utility of CEM for criminal prosecutions, see Adam Pearlman, “Strong CT Partners Make for Better Global Competitiveness,” The SCIF, March 30, 2021, and Robert Cryer, “The UN Guidelines on ‘Battlefield’ Evidence and Terrorist Offences: A Frame, a Monet, or a Patchwork?” Just Security, August 21, 2020.

[34] For another perspective on the strategic value of CEM, see Fenzel, Slootmaker, and Cragin.

[35] Crothers.

[36] Vergun; Crothers.

[37] Vergun.

[38] Darwin McDaniel, “Leidos Brings AI to Big Data Analytics Through New Tool,” Executive Biz, May 24, 2019; Stew Magnuson, “News from SOFIC: New Software Can Sift Through Decades of Stovepiped Intelligence Data,” National Defense, May 23, 2019; “Advanced Analytics and Machine Learning Microservices Platform,” Leidos.

[39] Fenzel, Slootmaker, and Cragin.

[40] Ibid.

[41] Ibid.

[42] Cruickshank, Rassler, and Hummel, “Twenty Years After 9/11: Reflections from General (Ret) Joseph Votel, Former Commander of U.S. Central Command.”

[43] Author’s review of unclassified and approved for release batches of primary sources/CEM recovered during operations in Afghanistan.

[44] Fenzel, Slootmaker, and Cragin.

[45] Jenna Jordan, “Attacking the Leader, Missing the Mark: Why Terrorist Groups Survive Decapitation Strikes,” International Security 38:4 (2014), pp. 7-38; Bryan Price, “Targeting Top Terrorists: How Leadership Decapitation Contributes to Counterterrorism,” International Security 36:4 (2012), pp. 9-46; Patrick B. Johnston, “Does Decapitation Work? Assessing the Effectiveness of Leadership Targeting in Counterinsurgency Campaigns,” International Security 36:4 (2012), pp. 47-79; Daniel Milton and Bryan Price, “Too central to fail? Terror networks and leadership decapitation,” International Interactions 46 (2020).

[46] There has been some limited work done in this area. For example, see Amira Jadoon, Andrew Mines, and Daniel Milton, “Leader Decapitation and its Short-term Effects,” working paper.

[47] Seth Loertscher, Daniel Milton, Bryan Price, and Cynthia Loertscher, The Terrorist Lists: An Examination of the U.S. Government’s Counterterrorism Designations Efforts (West Point, NY: Combating Terrorism Center, 2020).

[48] For background on LeT’s evolution, see Stephen Tankel, Storming the World Stage: The Story of Lashkar-e-Taiba (Oxford: Oxford University Press, 2013).

[49] For background on LeT’s primary sources, see C. Christine Fair, In Their Own Words: Understanding Lashkar-e-Tayyaba (New York: Oxford University Press, 2018), and Samina Yasmeen, Jihad and Dawah: Evolving Narratives of Lashkar-e-Taiba and Jamat ud Dawah (London: Hurst and Company, 2017).

[50] Don Rassler, C. Christine Fair, Anirban Ghosh, Arif Jamal, and Nadia Shoeb, “The Fighters of Lashkar-e-Taiba: Recruitment, Training, Deployment, and Death,” Occasional Paper, Combating Terrorism Center, April 2013.

[51] Ibid.

[52] For background, see the Human Rights topic category on the National Geospatial-Intelligence Agency’s tearline.mil website.

[53] Eric Robinson and Sean Mann, “Part 1: Investigating the Growth of Detention Facilities in Xinjiang Using Nighttime Lighting,” Tearline.mil, February 24, 2021.

[54] Shultz and Clarke. For useful context on computational approaches to counterterrorism, see V. S. Subrahmanian ed., Handbook of Computational Approaches to Counterterrorism (New York: Springer-Verlag, 2013).

[55] Shultz and Clarke.

[56] Ibid.

[57] “Final Report: National Security Commission on Artificial Intelligence,” Chapter 5.

[58] Brian Katz, “The Intelligence Edge: Opportunities and Challenges from Emerging Technologies for U.S. Intelligence,” CSIS Brief, April 17, 2020.

[59] For some background, see Monika Bickert and Brian Fishman, “Hard Questions: How We Counter Terrorism,” Facebook, June 15, 2017; Josh Constine, “Snap joins rivals Facebook and YouTube to fight terrorism,” TechCrunch, July 31, 2017; and Digital Counterterrorism: Fighting Jihadists Online (Washington, D.C.: Bipartisan Policy Center, 2018).

[60] “Stopping the spread of online Daesh propaganda,” Faculty.

[61] For background, see “About,” Global Internet Forum to Counter Terrorism. See also the Global Network on Extremism and Technology website.

[62] For background, see “About Tech Against Terrorism,” Tech Against Terrorism website.

[63] For an example of other initiatives that have leveraged AI to counter extremism, see the work of Moonshot CVE. See Nickie Louise, “Moonshot CVE, a Google-backed startup is using internet ads to counter online extremism,” Tech Startups, March 29, 2018.

[64] Adi Robertson, “Tech company anti-terrorism initiative will increase its focus on far-right groups,” Verge, July 26, 2021.

[65] Ibid.

[66] McKendrick.

[67] Ibid.

[68] Ibid.

[69] Ibid.

[70] Shultz and Clarke.

[71] See “Open Source Indicators (OSI)” webpage.

[72] For background, see “DARPA program to develop computer system to forecast wars and other political instability,” Military & Aerospace Electronics, February 1, 2007. See also “Integrated Crisis Early Warning System (ICEWS),” Lockheed Martin webpage.

[73] For background on some of these barriers, see “American Artificial Intelligence Initiative: Year One Annual Report,” White House Office of Science and Technology Policy, February 2020; Brian Katz, “The Analytic Edge: Leveraging Emerging Technologies to Transform Intelligence Analysis,” CSIS Brief, October 9, 2020.

[74] Shultz and Clarke.
