Data Analytics and Machine Learning Science

Thrust Area Leader:
Gregory Ditzler, The University of Arizona

Current approaches in cybersecurity are primarily signature-based: they detect a malicious attack by looking for a known signature, and they fail to identify malicious activities that do not exactly match that signature. Recently, machine learning and data science have proven highly successful in many application-driven fields, including cybersecurity. In fact, on some classification tasks, such as image classification, machine learning has been shown to outperform humans. Therefore, in this proposal, we seek to integrate machine learning into each of the cybersecurity and forensic tasks to improve upon the state of the art. Furthermore, many machine learning models are static in the sense that once they are trained, they are not updated again. Unfortunately, many real-world classification tasks are not static and naturally evolve or change over time. It is against this background that we focus on machine learning techniques that model an adversary in a cyber environment. We will research change detection algorithms and how information about a machine learning model and the adversary can be exploited to make the algorithm more robust in a cyber environment. Our machine learning and data science expertise will be applied across the entire data science pipeline (i.e., data collection, preprocessing, learning, and classification) to improve upon the state of the art in cybersecurity.

Research Approach:
To achieve the aforementioned goals, we propose to: (1) research how to learn in a dynamic and changing environment, so that models both achieve high accuracy and detect change in a complex environment, (2) leverage adversarial information (i.e., data and information about the attack model) to improve the accuracy of the system, (3) develop a data science pipeline for modeling the adversary from data collection to classification, (4) identify and develop metrics of success for learning in such environments, and (5) benchmark the proposed approaches against state-of-the-art algorithms in cybersecurity.

The following are descriptions of some of the research projects in this thrust area:


Machine Learning Techniques to Adjust to Continuous Changes in Data Streams

The goal of this project is to develop an Adaptive Big Data Analytics Environment (ABDAE) that can adjust computations to respond promptly to rapid changes in data and cyber physical environments. Addressing this problem space will require adaptive streaming big data analytics that can: (a) model cyber physical infrastructures that encompass realistically complex critical infrastructure operations, (b) ingest the massive data sets needed to capture the complexity of large-scale dynamic systems, and (c) process and update the analytics results in a timely manner in order to test contrasting mechanistic models and drive the next set of analyses. The outcomes of this project will be: (1) a proof-of-concept algorithm that implements active change detection and passive learning models for non-stationary environments; (2) a framework for hybrid/passive learning methods for learning in high-volume data streams; (3) a recommendation system for cybersecurity applications that face non-stationary streaming environments; and (4) a benchmark using synthetic and real-world data sets, along with statistics that measure the overall efficacy of the proposed approaches.

Our first focus will be on identifying and developing algorithms for change detection in large-volume data streams. We have identified change detection algorithms that use the probability of error from a classifier to detect a change in the data stream. Furthermore, many of the approaches described in this section can not only detect a change but also provide a warning of a potential change in the data stream.

The change detection algorithms can be implemented by monitoring the classification error of a model. We are primarily interested in increases in the error rate, since we expect the error to drop or converge as new data are presented over time. An increase in the error is an indication that some property of the data stream has changed and that the learner should be reset. We have also identified the computational intelligence CUSUM (CI-CUSUM) change detection test developed by Alippi and Roveri (2008). That work addresses the change detection aspect and leaves the design of just-in-time adaptive classification systems to a companion paper. It suggests two completely automatic tests for detecting nonstationarity, neither of which requires a priori information or assumptions about the process generating the data. In particular, it provides an effective computational intelligence-inspired test for multidimensional situations, a scenario where traditional change detection methods are generally not applicable or scarcely effective.
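To make the idea of error-based change detection with an early warning concrete, the following is a minimal sketch, not the proposal's algorithm. It follows the general style of the Drift Detection Method (DDM), which the text above does not name. The class name, the 2-sigma and 3-sigma levels, and the 30-sample warm-up are illustrative assumptions.

```python
import math


class ErrorRateDriftDetector:
    """Tracks a classifier's running error rate and reports a warning or a change.

    Hypothetical sketch in the style of DDM; the sigma levels and warm-up
    length are conventional defaults, not values from the proposal.
    """

    def __init__(self, warn_sigmas=2.0, change_sigmas=3.0, min_samples=30):
        self.warn_sigmas = warn_sigmas
        self.change_sigmas = change_sigmas
        self.min_samples = min_samples
        self.reset()

    def reset(self):
        """Forget all statistics, e.g. after the learner has been reset."""
        self.n = 0
        self.errors = 0
        self.best = float("inf")   # smallest value of p + s seen so far
        self.p_min = float("inf")
        self.s_min = float("inf")

    def update(self, mistake):
        """Feed one outcome (True if the latest sample was misclassified).

        Returns 'stable', 'warning', or 'change'.
        """
        self.n += 1
        self.errors += int(mistake)
        p = self.errors / self.n                 # running error rate
        s = math.sqrt(p * (1.0 - p) / self.n)    # its binomial standard deviation
        if self.n < self.min_samples:
            return "stable"
        if p + s < self.best:
            self.best, self.p_min, self.s_min = p + s, p, s
        # Only increases in the error rate are of interest.
        if p + s > self.p_min + self.change_sigmas * self.s_min:
            self.reset()
            return "change"
        if p + s > self.p_min + self.warn_sigmas * self.s_min:
            return "warning"
        return "stable"


# Toy stream: 10% error for 200 samples, then 50% error.
detector = ErrorRateDriftDetector()
stream = [(i % 10 == 0) for i in range(200)] + [(i % 2 == 0) for i in range(200)]
statuses = [detector.update(m) for m in stream]
```

In a full system, a "warning" could trigger buffering of recent samples and a "change" could trigger retraining on that buffer; both responses are design choices outside this sketch.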

Supervised classifiers will be identified and learned from domain-specific cybersecurity data sources (e.g., features collected from specific types of text that are often labeled). Because specific data sources have characteristic feature sets, we can choose the classifier that best fits each type of data. For example, the Hoeffding tree is an online algorithm for learning a decision tree classifier. Such algorithms have the desirable quality that they can easily handle many different data types that are harder to work with in other classifiers (e.g., neural networks). If data are sampled from real-valued quantities and the decision rule is simple, then an online linear classifier could be learned by using stochastic gradient descent to minimize a convex loss function (e.g., the hinge loss would produce a support vector machine with a linear kernel). To address the possibility of learning in a nonstationary environment, we propose to use ensemble classifiers, a popular solution that follows a passive strategy: the parameters of the model are updated continuously whenever labeled data are available. Our framework trains multiple classifiers on each of the data sources when labeled data are available (see Figure 3, which shows models being added to the ensemble as new data become available, with adaptive weights assigned to the different models to attempt to maximize the accuracy of the final model). Ensembles have been shown to provide desirable theoretical and empirical traits (Freund and Schapire, 1997; Breiman, 1996; Breiman, 2001), which is why we propose to learn a classifier on each data source and aggregate their decisions. We will investigate coupling the change detection algorithms with the supervised classifiers to enable them to learn rapidly in a changing cybersecurity environment.
Furthermore, we will also investigate the feasibility of using deep neural networks in these dynamically changing environments. Co-PI Ditzler’s group has recently been working in this area and plans to contribute to the research in neural networks as well as to the training of students in applied machine learning in cybersecurity.
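As an illustration of the online linear classifier mentioned earlier in this project description, the sketch below performs stochastic gradient descent on a regularized hinge loss, one labeled sample at a time. The class name, learning rate, regularization strength, and toy data stream are assumptions made for illustration; they are not part of the proposed framework.

```python
import random


class OnlineLinearSVM:
    """Linear classifier trained sample by sample with SGD on the
    L2-regularized hinge loss. Labels must be -1 or +1."""

    def __init__(self, n_features, learning_rate=0.1, reg=0.001):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.learning_rate = learning_rate
        self.reg = reg

    def decision(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    def predict(self, x):
        return 1 if self.decision(x) >= 0.0 else -1

    def partial_fit(self, x, y):
        """One SGD step on (reg / 2) * ||w||^2 + max(0, 1 - y * (w.x + b))."""
        margin = y * self.decision(x)
        lr = self.learning_rate
        # The regularizer's gradient shrinks the weights on every step.
        self.w = [wi * (1.0 - lr * self.reg) for wi in self.w]
        # The hinge term contributes a gradient only inside the margin.
        if margin < 1.0:
            self.w = [wi + lr * y * xi for wi, xi in zip(self.w, x)]
            self.b += lr * y


def toy_sample(rng):
    """Hypothetical stream: label is the sign of x0 + x1, with a small gap."""
    while True:
        x = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        if abs(x[0] + x[1]) >= 0.2:
            return x, (1 if x[0] + x[1] > 0 else -1)


rng = random.Random(0)
clf = OnlineLinearSVM(n_features=2)
for _ in range(1000):
    x, y = toy_sample(rng)
    clf.partial_fit(x, y)
```

Because each update touches only the current sample, a learner of this kind can be reset or reweighted cheaply when a change detector fires.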


Approaches to Counter Adversarial Manipulation of Machine Learning Algorithms

Given the significance of the problem and the urgent need for adversarial big data analytic capabilities for Intrusion Prevention/Detection Systems (IPS/IDS) in a cyber environment, we have identified a novel approach to implement these capabilities. The proposed multilayer framework is disposable, diverse, and autonomous. It can be applied broadly to problems that face large volumes of data, and it uses machine learning to leverage adversarial data as well.
Our goal is to address the following machine learning limitations:
– The increasing prevalence of classification problems with massive volumes of streaming data strains state-of-the-art data mining algorithms. Application areas include, but are not limited to, climate, remote sensing, fraud detection, web usage tracking, IPS/IDS, and malware detection for cybersecurity data. Many “commercial-off-the-shelf” classifiers are of little use if they cannot cope with learning in some of the harshest environments.
– Recent research has shown that many classifiers are susceptible to attacks even if the direct form of the model (e.g., logistic regression or neural network) is unknown.

Security and privacy weaknesses have been exposed in many classes of machine learning models (e.g., logistic regression, neural networks, etc.). In fact, adversarial machine learning has been a topic of recent interest and concern due to the mathematical flaws in the algorithms. Furthermore, cybersecurity domains face an even larger threat than many other applications of machine learning because an adversary can influence the training data or even the testing data. Application areas related to this proposal include, but are not limited to, remote sensing, fraud detection, web usage tracking, IPS/IDS, and malware detection for cybersecurity data. We will model the adversary in cybersecurity across a typical data science pipeline: from preprocessing the data, to learning a model for classification, to making predictions on unseen data. It should be noted that an adversary’s impact on the preprocessing of data is often overlooked (Liu and Ditzler, 2019; Elderman et al., 2017). Therefore, we have identified a data science pipeline that leverages information-theoretic feature selection (FS), which is the process of identifying features in data that are informative, to understand how the adversary can negatively impact the FS. Once robust and relevant features have been selected, we will focus on learning a classifier by leveraging adversarial data from cybersecurity. Co-PI Ditzler’s team was the first to show how to insert data samples into a training dataset to poison an information-theoretic feature selection algorithm. Note that while cybersecurity is one area that can benefit from adversarial data, such data are often not used in benchmarks. Therefore, our efforts will focus on mathematical models of an adversary and a data science pipeline for processing data for cybersecurity.
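To illustrate why information-theoretic feature selection is sensitive to inserted samples, the toy sketch below ranks discrete features by their mutual information with the label and then shows the ranking flip after crafted points are appended. This is a deliberately simplified illustration, not the attack of Liu and Ditzler (2019); the function names, the data, and the unrealistically large fraction of poisoned samples are all assumptions.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """I(X; Y) in bits for two equal-length sequences of discrete values."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def rank_features(rows, labels):
    """Return (feature indices sorted by decreasing MI with the label, scores)."""
    n_features = len(rows[0])
    scores = [mutual_information([r[j] for r in rows], labels)
              for j in range(n_features)]
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order, scores


# Clean data: feature 0 determines the label, feature 1 is uninformative.
clean_rows = [(0, 0), (0, 1), (1, 0), (1, 1)] * 2
clean_labels = [r[0] for r in clean_rows]
clean_order, clean_scores = rank_features(clean_rows, clean_labels)

# Inserted samples: feature 0 contradicts the label, feature 1 agrees with it.
poison_rows = [(1, 0)] * 4 + [(0, 1)] * 4
poison_labels = [0] * 4 + [1] * 4
poisoned_order, poisoned_scores = rank_features(
    clean_rows + poison_rows, clean_labels + poison_labels)
```

On the clean data feature 0 is ranked first; after the insertion, its mutual information with the label drops to zero and feature 1 is ranked first instead.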


Cybersecurity Detection, Protection and Forensic Analysis

Thrust Area Leader:
Salim Hariri, University of Arizona

Current cyberattack analysis and detection tools are mainly static and manually intensive. At the same time, the complexity of cyber systems, their dynamic behavior, and the availability of many heterogeneous devices, both static and mobile, make these tools incapable of accurately characterizing current states, detecting malicious attacks, and stopping them or their fast propagation and/or minimizing their impacts.
Research is needed to better understand human cognitive processes in relation to how alerts are processed and how best to present alerts to analysts so that appropriate action can be taken at the time of a cyberattack. In fact, what is needed is a paradigm shift in the way we model and characterize attacks, identify them, and develop prompt responses to stop them or their propagation and minimize their impacts on mission-critical operations. We urgently need to develop a cybersecurity science field that provides the mathematical foundations to build the next generation of autonomic monitoring tools. Such tools would continuously monitor cyber system resources and services 24/7, analyze the current state and predict the next operational state, perform anomaly behavior analysis to detect attacks, and proactively recommend actions to stop attacks and minimize their impacts and/or respond automatically, depending on the severity of these attacks and their potential impacts on the overall system operations.

Research Approach:
The main research activities will include (1) the development of a theoretical framework to perform anomaly behavior analysis of cyber systems, protocols and applications; (2) the development of cyber-social data structures and metrics that can be used to integrate social and cyber activities to improve the accuracy and the time to detect insider attacks; (3) the development of a bioinspired self-protection system; and (4) the development of a methodology to perform continuous forensic monitoring, analysis and protection.


Theoretical Framework for Anomaly Behavior Analysis of Cyber Systems and Protocols

Our anomaly behavior analysis methodology, as shown in Figure 2, is defined over a universe space $U$, which is a finite set of events. Since we are modeling the overall cyber system behavior, we can consider the event set $U$ as all possible transitions in the system. $U$ is partitioned into two subsets $N$ and $A$, which respectively denote the Normal and Abnormal events, such that $N \cup A = U$ and $N \cap A = \emptyset$. To model the $U$ space, we need a representation map to represent the events in the event set $U$ for further analysis.

Thus, the representation map $R$ maps the events in $U$ to their representation patterns in $U^R$, written $U \overset{R}{\Rightarrow} U^R$. Likewise, $N^R$ and $A^R$ respectively represent the normal event set $N$ and the abnormal event set $A$, such that $N \overset{R}{\Rightarrow} N^R$, $A \overset{R}{\Rightarrow} A^R$, and $N^R \cup A^R = U^R$. A detector is defined as a system $D = (f, M)$ with two components $f$ and $M$, where $f$ is the anomaly characterization function defined as $f: U^R \to [0, 1]$ and $M$ is the memory of the system that keeps the extracted normal patterns from $N^R$ as a normal behavior model. With an output between 0 and 1, the function $f$ specifies the degree of abnormality of a sample event sequence $s \in U^R$ by comparing it with the stored normal model $M$. The greater the value of $f$, the more abnormal the sample $s$. The detector $D$ is a binary classifier, which classifies a sample $s \in U^R$ as normal or abnormal by comparing it with the normal model $M$. We can consider $D$ as:

$$D(s) = \begin{cases} \text{abnormal} & \text{if } f(s, M) > \tau_i \\ \text{normal} & \text{otherwise} \end{cases}$$
where $\tau_i$ is the $i$th element of an $n$-dimensional threshold vector. Detection occurs when the detector classifies a sample as abnormal. The detection errors are defined over a test set $U_t^R$, which is a subset of $U^R$ ($U_t^R \subseteq U^R$). Two types of errors are considered for a detector: false positives and false negatives. A false positive happens when a sample from the normal set $N^R$ is detected as an abnormal event, which is defined as $\varepsilon^+ = \{ s \in N^R \mid D(s) = \text{abnormal} \}$; a false negative occurs when the detector classifies an abnormal sample $s \in A^R$ as a normal event (an undetected anomaly), that is, $\varepsilon^- = \{ s \in A^R \mid D(s) = \text{normal} \}$.
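The detector and its two error sets can be written down directly. The sketch below is a minimal illustration under stated assumptions: the characterization function `unseen_fraction` is a hypothetical stand-in for f (the fraction of events in a sample that are absent from the normal model), a single scalar threshold plays the role of one element of the threshold vector, and the event names are invented.

```python
def unseen_fraction(sample, memory):
    """Toy f(s, M): fraction of events in s that do not appear in M (in [0, 1])."""
    return sum(1 for event in sample if event not in memory) / len(sample)


def detect(sample, memory, f, tau):
    """D(s): 'abnormal' if f(s, M) > tau, otherwise 'normal'."""
    return "abnormal" if f(sample, memory) > tau else "normal"


def detection_errors(normal_samples, abnormal_samples, memory, f, tau):
    """Return the false positive set and the false negative set."""
    false_pos = [s for s in normal_samples
                 if detect(s, memory, f, tau) == "abnormal"]
    false_neg = [s for s in abnormal_samples
                 if detect(s, memory, f, tau) == "normal"]
    return false_pos, false_neg


memory = {"a", "b", "c"}                    # extracted normal patterns (M)
normal_set = [("a", "b"), ("a", "c", "x")]  # labeled normal test samples
abnormal_set = [("x", "y"), ("a", "b", "y")]

strict = detection_errors(normal_set, abnormal_set, memory, unseen_fraction, 0.2)
loose = detection_errors(normal_set, abnormal_set, memory, unseen_fraction, 0.4)
```

With the lower threshold the detector raises one false positive and misses nothing; with the higher threshold the false positive disappears but one abnormal sample goes undetected, which is the trade-off the threshold controls.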

We have applied this methodology to detect attacks against several protocols including WiFi, DNS, BACnet, Modbus, and Bluetooth (Alipour, 2015; Satam, 2015; Pan, 2014; Mallouhi, 2011; Satam, 2018). In what follows, we briefly describe how to apply the approach to model the behavior of the WiFi protocol as shown in Figure 3. In our protocol behavior analysis approach, we consider the frequency of a sequence of protocol transitions over a period of time as a measure of whether or not the protocol is behaving normally. During the training phase, state transitions are represented as a multiset of n-gram patterns ($N_T^R$), and their statistical properties are captured in the corresponding normal behavior model ($M$). During the testing phase, the frequency of any $n$ consecutive transitions of the protocol in each session $S_l$ is computed and compared with the frequency of the normal transitions stored in the normal behavior model $M$. The difference between these two values specifies the anomaly degree for each sub-session $S_{l,\Delta T}$.
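The training and testing phases described above can be sketched as follows. This is an illustrative reading, not the published method: the score used here (half the summed absolute difference between observed and stored n-gram frequencies, which lies between 0 and 1) is an assumption and is scaled differently from the a-score reported for Figure 4, and the frame names are hypothetical.

```python
from collections import Counter


def ngrams(transitions, n):
    """All runs of n consecutive transitions in a session."""
    return [tuple(transitions[i:i + n]) for i in range(len(transitions) - n + 1)]


def train_model(normal_sessions, n):
    """Normal behavior model M: relative frequency of each n-gram."""
    counts = Counter()
    for session in normal_sessions:
        counts.update(ngrams(session, n))
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()}


def anomaly_score(window, model, n):
    """Distance between the window's n-gram frequencies and the model."""
    counts = Counter(ngrams(window, n))
    total = sum(counts.values())
    if total == 0:
        return 0.0  # window too short to contain any n-gram
    observed = {gram: c / total for gram, c in counts.items()}
    grams = set(observed) | set(model)
    return 0.5 * sum(abs(observed.get(g, 0.0) - model.get(g, 0.0)) for g in grams)


# Hypothetical frame-type sequences standing in for protocol transitions.
normal_session = ["probe", "auth", "assoc", "data", "data", "deauth"]
model = train_model([normal_session] * 3, n=2)

normal_score = anomaly_score(normal_session, model, n=2)
flood_score = anomaly_score(["deauth"] * 6, model, n=2)
```

A window that repeats the training behavior scores 0, while a window consisting only of repeated deauthentication frames shares no bigram with the model and scores 1.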

Figure 4 depicts the a-score (anomaly score) distributions of both attack and normal traffic in the same graph. By comparing the two distributions, we observe that normal and abnormal traffic can be easily differentiated with a good margin, as indicated by the blue dashed area. This means the a-score threshold $\tau$ can be set to some value between 6 and 15. A detailed discussion of our approach and these evaluation results is presented in (Alipour, 2013).


Wireless Cybersecurity, Modeling and Analysis

Thrust Area Leader:
Tamal Bose, University of Arizona

Cybersecurity is a critical area in wireless communications and networks, with applications in multiple domains. For mission-critical operations, it is imperative to ensure that transmissions are conducted reliably and discreetly. Without protocols in place to protect the content of these transmissions, adversaries can interfere with these communications through jamming or through interception and hacking. The High Frequency (HF) band (3-30 MHz), for example, is a common operating range used by government radios for long-range communications without satellites. A protocol called Automatic Link Establishment (ALE) was designed to facilitate the linking process between HF stations; however, while it provides capabilities for encryption/decryption, it is still susceptible to jamming [Johnson 2013]. There has also been work on ensuring that the terminals of space systems are not compromised by adversaries, to prevent “[…] major damage to space missions by exfiltration or corrupting critical mission information or disrupting mission communications at critical times” [Gavins 2010]. The importance of cybersecurity also extends to commercial applications. With the growing popularity of Internet of Things (IoT) technologies, for example, security protocols will be required to protect user data stored across networked devices. In addition, with the incorporation of vehicular ad hoc networks (VANETs) in the automotive industry, measures must be adopted to ensure that the network is not contaminated with invalid information, which would cause confusion among all participating vehicles [Stolyarova 2018]. In summary, with the surge of technological advancements in wireless communications and networks, cybersecurity protocols must be created to ensure that these systems are not compromised or exploited by unauthorized users.
One area of research for investigating the application of cybersecurity to wireless communications/networks is signal classification, where attributes of a signal (e.g., modulation, coding technique) can be detected and used to identify whether a transmission originates from an ally or an adversary. We have conducted multiple research efforts in modulation classification, which consists of three main components: preprocessing, feature extraction, and classification [Vanhoy 2016]. In [Vanhoy 2016], modulation classification was analyzed in the context of multiple transmitters/receivers and non-cooperative communications, which assumes that “[…] the signal of interest is coming from a transmitter that does not intend for its data to be interpreted by the observing radio”. In [Vanhoy 2017], modulation classification was extended to classify several radar waveforms designed so that they cannot be recovered by an outside source, preserving the secrecy of the transmitter. More recently, in [Vanhoy 2018], deep learning architectures combined with a hierarchical structure were investigated as a means of modulation classification. Currently, we are building on this work by investigating different deep learning techniques to classify waveforms from different protocols (e.g., LTE, Wi-Fi, 5G).
In this thrust area, we will investigate the following two research projects:


Intelligent/Resilient Secure Transmission Techniques for HF Communications

In this project, we will investigate the development of new security protocols for HF communications. While techniques for keeping transmissions in the HF band secure are already implemented, we will implement protocols aimed at higher security and greater resilience to distortion caused by the ionosphere. Multiple adaptive frequency-hopping techniques have been presented for application in HF communications. When utilizing adaptive frequency-hopping, the receiver uses a Link Quality Analyzer (LQA) to inform the transmitter which of the available frequencies are or are not viable for use [Wenlong 2011]. In [Zhu 2018], dynamic spectrum anti-jamming (DSAJ) under an HF channel model was studied to exploit spectrum holes, defined as “[…] portions [of spectrum] where there is no jamming, or the power of jamming is below a particular threshold […]”. In [Liu 2018], a deep Q network architecture along with a deep reinforcement learning algorithm is proposed to select frequencies for transmission under an HF channel model by fusing information regarding the channel gain and spectral state (i.e., jamming, interference). We will implement techniques in a similar vein, utilizing machine learning and signal processing, so that transmissions in the HF band are secure against channel variations and adversarial attacks.
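As a simple illustration of restricting frequency hopping to spectrum holes, the sketch below keeps the channels whose reported jamming power is below a threshold and draws a hop pattern from them. It is far simpler than the cited learning-based approaches. The function names, the power readings, the threshold, and the use of a shared seed in place of a real cryptographic key are all assumptions.

```python
import random


def spectrum_holes(jamming_power_dbm, threshold_dbm):
    """Channels (MHz) whose measured jamming power is below the threshold."""
    return [channel for channel, power in sorted(jamming_power_dbm.items())
            if power < threshold_dbm]


def hop_sequence(viable_channels, n_hops, seed):
    """Pseudo-random hop pattern restricted to viable channels.

    Transmitter and receiver reproduce the same pattern from a shared seed;
    a deployed system would derive it from a secret key instead.
    """
    if not viable_channels:
        raise ValueError("no viable channel available")
    rng = random.Random(seed)
    return [rng.choice(viable_channels) for _ in range(n_hops)]


# Hypothetical link quality report: channel center frequency -> jamming power.
report = {3.5: -110.0, 7.1: -60.0, 10.1: -105.0, 14.2: -58.0, 21.3: -100.0}
holes = spectrum_holes(report, threshold_dbm=-90.0)
pattern = hop_sequence(holes, n_hops=20, seed=42)
```

Rerunning `spectrum_holes` on each new report adapts the hop set as jamming conditions change; a learning-based selector such as those cited above would replace the uniform random choice.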


Development of Intelligent Jammers for Wireless Protocols

In this project, we will investigate the development of an intelligent jammer against multiple wireless protocols. In our earlier effort [Thurston 2018], an intelligent jammer was implemented using a deep Q network via recurrent neural networks and was shown to be effective in forcing the transmitter to choose lower-order modulation encodings by learning when and with how much power to jam. Specifically, the jammer was “[…] motivated to keep the modulation below the threshold with as little jamming and as low jamming power as possible” [Thurston 2018]. As noted in [Thurston 2018], an extension of this work could be to have the jammer observe and learn to exploit weaknesses in different protocols, including “[…] handshakes, ACK/NACK packets, and static packet structures […]”. In addition to strengthening the abilities of the jammer, research would also be conducted to develop techniques for radios “[…] to constantly change their behavior in order to deny an intelligent attacker the time to learn the pattern they are following” [Thurston 2018]. Thus, this project would benefit the development of attack and defense mechanisms against potential adversaries.


Federated Cybersecurity Testbeds for Experimentation, Validation and Demonstration

Thrust Area Leader:
Danda Rawat, Howard University

Emerging smart city systems and applications are so complex and diverse that traditional approaches for cybersecurity, performance prediction, measurement, and management are not applicable in a straightforward manner. Furthermore, current work on securing different systems is done in isolation and focuses on solutions limited to patching or to a single domain, with limited cooperation across industry, government, and academia. Note that security was not considered when the protocols for the Internet were designed. Using the same protocols for emerging connected systems, including smart city applications, would expose them to significant cyberattacks with potentially devastating consequences. It is clear that traditional cybersecurity and cyber-defense solutions cannot meet the security requirements of emerging connected systems.


Furthermore, traditional solutions did not consider the following:
i) high mobility of end users (in the case of smart transportation systems), where the network topology changes dynamically based on the speed of vehicles and drivers’ destinations;
ii) heterogeneous wireless access environments, where the security solution in one differs from that in another;
iii) data offloading to a third party (cloud, edge, or peer) for data analytics and getting the response back in the case of big data;
iv) self-healing and resilience operations, since there was no control mechanism, unlike in emerging cyber physical systems;
v) authentication and access control where millions or billions of devices are connected to the Internet or to each other; and
vi) an efficient and automated password/PIN or policy update process, since most of the time IoT devices run without keyboard-type input devices.

Thus, cyber defense solutions for emerging IoT-enabled connected systems in smart city environments will have to meet the expected exponential growth in demand through a variety of strategies beyond traditional security approaches.


Develop Tools to Support Automatic Federated Cybersecurity Testbeds

There are many testbeds (physical, virtual, and simulated) for critical infrastructures and cyber systems. However, it is extremely difficult for one organization to have all the expertise required to perform research and development on these heterogeneous testbeds, and it is cost prohibitive to own and manage them. To understand the interdependencies among these testbeds, their implications for cybersecurity, and how to develop effective defense solutions, researchers and educators need full access to federated testbeds that accurately model the testbeds’ operations and interdependencies. It is important to be able to compose several testbeds into one federated testbed that includes smart devices and sensors, IoT devices, cloud systems, smart grids, smart buildings, etc.


The federated testbed can then be used to train students on how to:
(1) analyze the normal operations of the composed testbeds,
(2) identify their interdependencies and vulnerabilities and how these can be exploited to launch sophisticated cyberattacks,
(3) develop innovative defense techniques, and
(4) protect them.
There are currently many isolated cybersecurity and cyber-physical testbeds (Adjih, 2015; Siaterlis, 2014; Nati, 2013; Cintuglu, 2017), but there are no methodologies and tools to automatically build a federated testbed (a testbed of heterogeneous testbeds). The goal of this project is to leverage the NSF Federated Cybersecurity Testbed as a Service (FCTaaS) project and extend its capabilities to further support experimentation, training, and validation. This will allow researchers and students to experiment with and evaluate different techniques and tools to detect attacks on smart infrastructures and their services and to protect them from malicious cyberattacks, faults, or accidents.
In addition, the PACT researchers will be provided with the tools to add their cybersecurity testbeds to the FCTaaS portal. The initial testbed portal will include the UA IoT Testbed, the Virtual Cybersecurity Testbed that is currently hosted on the Amazon public cloud, and our Wireless Security Testbed. The FCTaaS architecture shown in Figure 8 will utilize open communication standards and the cybersecurity tools developed by Dr. Hariri’s team to maintain the security and privacy of the federated security testbed. These services will allow heterogeneous testbeds to communicate their data syntactically and semantically (so we can understand the data semantics and the dependencies among these testbeds). The experiment management services will also allow users to configure the required testbeds and their interactions, manage the global time among all testbeds used in the experiment, and adapt these testbeds as required by the experiment goals.


Illustrative Use Cases For Experimentation, Validation and Training

D2.1 Smart City Services

One important class of services to be studied is the one related to smart grid systems. We will apply the research developed in thrust areas A, B, and C to develop cyber-attack detection and cyber-defense solutions, since the smart grid system has assets distributed across the PACT sites and needs feedback (for cyber-defense or fault tolerance) within milliseconds (3 ms to 500 ms) to avoid power outages. Specifically, our goals are:
• To develop a system that minimizes the attack time (which the attacker is trying to maximize), the probability of false alarm, and the incident response time using the federated framework (Howard University, University of Arizona)
• To tightly couple control, communication, and computing with security for smart grid systems using the federated framework (Howard University, Navajo Technical University)
• To design, develop, and evaluate cyber-attack detection and countermeasures for resilient smart grid systems (Howard University, University of Arizona, Navajo Technical University)
• To validate and evaluate the performance of cyber-defense solutions enabled by the federated framework (Howard University, University of Arizona, Navajo Technical University)

The cyberattack detection approaches and cyber defense solutions would be applicable to other smart city applications, such as transportation systems, where delay/latency plays an important role in distributing emergency messages in a secure manner. Other sample projects related to smart city applications include:
• Smart Transportation Cyber Physical Systems
• Unmanned vehicular systems
• Smart health care systems
• Smart mobile computing for data driven applications

D2.1 Waggle-based Power Utility Infrastructure Testbed
Argonne National Laboratory is collaborating with Exelon Corporation to adapt Argonne’s open source, modular “Waggle” platform to provide intelligent measurement and predictive analytics for power utility infrastructure. Waggle has been deployed in Chicago and a growing number of cities worldwide through an NSF-funded project, the Array of Things, measuring environmental conditions, air quality, and urban activity. The platform, which supports edge computation (embedded machine learning and other remotely programmable capabilities), is an example of “smart infrastructure” that supports research, development, and deployment of novel cyberinfrastructure approaches.
In this project, we will collaborate with the Argonne researchers to add the Waggle testbed to our FCTaaS pool and add the developed security and defense tools to make the Waggle services highly secure and trustworthy.


Cybersecurity Education and Training Programs

Thrust Area Leaders: Salim Hariri, UA and Frank Stomp, Navajo Technical University

In this thrust area, we will develop a virtual cybersecurity laboratory that can be offered as a cloud service and then develop the required educational and training cybersecurity programs. In what follows, we highlight our approach to develop these capabilities.


Cybersecurity Lab as a Service (CLaaS)

Earlier approaches to cybersecurity testing and training involved setting up dedicated computers, routers, switches, and many other networking components. After the hardware environment is set up, the software systems and tools must be installed and configured. Setting up an experimental cybersecurity environment this way is time-intensive and requires deep knowledge of software systems, hardware, and network devices. In dedicated cybersecurity labs, the security experiments to be performed by students require creating a complex, manually intensive, and costly environment that supports all the software systems and applications required to run the experiments. For example, in current cybersecurity labs that use VirtualBox or VMware, the user needs to create every Virtual Machine (VM) that is needed (this might take a long time) and needs good software, hardware, and network knowledge to successfully build the virtual cybersecurity environment (Willems, 2011). In (Stewart, 2009), the authors describe the importance of the virtualized environment for training and education and suggest using a virtual network for experimentation. Similarly, Wang et al. suggested using VMware Center Lab Manager to manage the pool of resources and provide a virtualized lab environment to undergraduate students with remote Web access (Wang, 2010).
In contrast to these methods, the CLaaS environment gives the student a complete, prebuilt virtual experiment that contains all the required VMs and preconfigured software tools needed for the experiment tasks. In our approach, students focus on the cybersecurity issues being investigated or studied, not on how to build the components of the virtual cybersecurity experiment.
In this project, we propose to further develop our Cybersecurity Lab as a Service (CLaaS) prototype (Tunc, 2015a; Tunc, 2015b; Tunc, 2015c). The CLaaS aims at offering virtual cybersecurity experiments as a cloud service that can be accessed from anywhere and from any device (desktop, laptop, tablet, smart mobile device, etc.) with Internet connectivity.
The CLaaS enables students or trainees to conduct virtual cybersecurity experiments in a closed virtual cloud environment to:
• Understand the methodology of launching cyberattacks;
• Train on how to use cybersecurity detection and protection tools;
• Perform penetration testing for software systems; and
• Evaluate new cybersecurity detection and protection algorithms.


Undergraduate Cybersecurity Education and Training Programs

In this project, we will develop a Cybersecurity Training Program for undergraduate students that not only educates them in cybersecurity but also mentors them in research. This project represents an active collaboration between an HBCU, a Latino/Hispanic university, and a Native American-serving institution to educate and train underrepresented and minority students at these institutions. Our collaboration will bring together students from these diverse backgrounds in an 8-week program, hosted at the University of Arizona, that provides education and mentoring in cybersecurity and machine learning. During the first two weeks of the program, students will attend tutorial sessions on cybersecurity and research methods given by professors from the participating institutions.
The remaining weeks will cover advanced topics in machine learning and cybersecurity, supplemented with lab sessions that provide hands-on experience to reinforce learning. We will also invite speakers to mentor students on how to write papers and deliver effective presentations. This is an essential component of the program because it exposes students to research, teaches them how to convey complex subjects to a non-technical audience, and gives them the opportunity to interact with experts in their respective fields. The Cybersecurity Training Program will be project-based: students will work in groups of two or three, and each group will work on a different topic in the domain of cybersecurity. The groups will be mentored by faculty as well as graduate students. The summer program will culminate in a symposium where each group will present its work in a poster format and demonstrate its algorithms on a laptop.
Our educational and training outreach will also be closely coupled with our collaboration with Argonne National Laboratory. We have allocated funds so that our students can not only participate in the Cybersecurity Training Program but also complete internships with DOE and other government labs. This component of undergraduate education is essential for several reasons. First, the undergraduate students will have the opportunity to see real-world problems that they may not have been exposed to in the classroom. Second, the internships introduce them to different areas of research in cybersecurity. Finally, the internships can lead to jobs in the cyber workforce or serve as a catalyst for attending graduate school. We believe our training and mentoring will give the students the guidance they need to make the best career choice. National lab researchers, academics, and industry collaborators will all take part in mentoring the students.
Our education and training programs also include an exchange of graduate students among the collaborating universities. Graduate students will be exchanged for two weeks to enable collaboration as well as mentoring.


Integrated Research, Educational and Training Programs

Develop new cybersecurity and forensic teaching modules:

We will develop new teaching modules on (a) Security Modeling and Analysis of cyber systems and applications; (b) Cyber Forensic Analysis and Protection; and (c) Cybersecurity Modeling and Analysis of Smart City Resources and Services. The objective of adopting modular course development is to target a broader audience with different backgrounds. Because of the flexibility of the modular design, these modules can be integrated into various existing senior undergraduate and junior graduate courses that are part of the PIs’ teaching activities, such as the UA undergraduate programs in cyber operations (UGC-CO) and cyber security (UGC-CS), ECE 509 (Cyber Security), ECE 524 (Cloud Security), ECE 478/578 (Computer Networks), ECE 677 (Distributed Computing Systems), and MIS 429/529 (Detection of Deception and Intent). Howard University also teaches CSCI 453/653 Cybersecurity, CSCI 454/654 Cybersecurity II, and the recently developed EECE/CSCI 479 & EECE 676 – Cybersecurity for CPS/IoT.

Develop and expand Online Education programs offered by AskCyPert site:

In addition to new teaching modules for existing courses, short courses as well as complete online training classes will be developed to reach a wider range of engineers, practitioners, and IT managers. These courses will be developed with a variety of audiences in mind, ranging from IT professionals and practitioners to managers of business units and C-level executives. The Distance Learning Laboratory (DLL) at the University of Arizona has committed to providing media for outreach education to industry. We anticipate that the module development specific to this research will begin in the second year of the project.

Organize Cybersecurity Workshops for Underrepresented Undergraduate Students:

To increase the participation of underrepresented students in our cybersecurity education and training programs, we will also organize one-week cybersecurity workshops that introduce cybersecurity concepts and tools and show how to protect computers, networks, and data. In addition, we will highlight the many job opportunities available to students who enroll in our SFF cybersecurity programs or in the UGC-CO and UGC-CS programs.

Training and Preparing Students for Cybersecurity Competitions:

There are numerous types of cybersecurity competitions, including “Pwn2Own”, capture the flag, and red-team/blue-team. In “Pwn2Own”, the goal is to exploit widely used software and mobile devices through vulnerabilities that have not yet been publicly disclosed, in exchange for the device in question and cash prizes. Capture the flag is designed around both attacking and defending: a team must defend its own virtual assets while capturing “flags” by hacking its opponents. Red-team/blue-team competitions involve numerous blue teams that must defend information assets while the red team attempts to hack each of them. The proposed CLaaS services will be used to train users so that they can compete effectively in cybersecurity competitions. In addition, we will expand CLaaS by incorporating cognitive computing services to develop training programs that prepare students for puzzle-based time trial competitions. In these competitions, players must solve a series of cybersecurity puzzles in a fixed amount of time. CLaaS virtual cybersecurity experiments will provide viable training for these competitions in the form of a game with increasingly difficult puzzles. CLaaS will have three difficulty levels, aimed at individuals at the high school, college, and graduate levels.