USC-ISI Networking and Cybersecurity Seminar Talk
Mon, Mar 17, 2025 @ 01:30 PM - 02:30 PM
Information Sciences Institute
Conferences, Lectures, & Seminars
Speaker: Dr. Latifur Khan, University of Texas at Dallas
Talk Title: Generative AI including Large Language Models for Cyber-Security
Series: Networking and Cybersecurity
Abstract: In this presentation, I will explore three applications of generative AI, specifically Large Language Models (LLMs), in the domains of tabular security datasets, cyber-threat reports, and Federal and State legislation related to autonomous vehicles.
1. Learning for Tabular Security Datasets and Its Applications to Automotive Security
Tabular datasets in cybersecurity present significant challenges for machine learning due to their heavily imbalanced nature—with a small number of labeled attack samples buried in a vast sea of mostly benign, unlabeled data. Semi-supervised learning leverages a small subset of labeled data alongside a large subset of unlabeled data to train a model. While semi-supervised methods have been extensively studied in image and language domains, they remain underutilized in security contexts—particularly for tabular security datasets, where challenges such as contextual information loss and class imbalance hinder machine learning performance. To address these issues, we propose MCoM (Mixup Contrastive Mixup), a novel semi-supervised learning methodology that introduces a triplet mixup data augmentation approach to mitigate the imbalanced data problem in tabular security datasets.
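The triplet mixup idea can be sketched in a few lines. The following is a minimal illustration under our own assumptions (Dirichlet-weighted convex combinations of three parent samples, producing soft labels); the function name and weighting scheme are illustrative, not the published MCoM formulation.

```python
import numpy as np

def triplet_mixup(X, y, n_aug, alpha=0.2, seed=0):
    """Generate synthetic tabular samples by convexly mixing three
    randomly chosen parent rows; labels are mixed with the same weights,
    yielding soft labels that reflect the blend of classes."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    idx = rng.integers(0, n, size=(n_aug, 3))      # 3 parent rows per synthetic sample
    lam = rng.dirichlet([alpha] * 3, size=n_aug)   # convex weights, each row sums to 1
    X_aug = np.einsum("ij,ijk->ik", lam, X[idx])   # weighted feature mix
    y_aug = np.einsum("ij,ij->i", lam, y[idx])     # weighted (soft) label mix
    return X_aug, y_aug
```

Because every synthetic row is a convex combination of real rows, augmented features stay inside the data's range while minority-class labels get amplified in soft form, which is what helps under heavy class imbalance.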
Many automotive security datasets are tabular in nature, so these techniques carry over directly to securing smart vehicles. Machine learning is a natural choice for detecting attacks on vehicle networks from payload information, but such models typically require a large dataset for training. Because manufacturers independently gather this data from their own cars, it is unlikely that all of it will ever be available in one place. To address this issue, we explore federated solutions that learn in a distributed manner for increased smart vehicle security, including challenging scenarios in which we do not assume an independent and identically distributed (IID) setting for the data. With a combination of techniques, including triplet-mixup-based augmentation and a data exchange scheme involving synthetically generated samples, we show that we can attain strong performance in the most challenging label distribution scenarios.
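The federated setting described above can be grounded with the standard FedAvg aggregation step, shown below as an illustrative baseline; it is not the talk's full scheme, which layers augmentation and synthetic-sample exchange on top.

```python
import numpy as np

def fed_avg(client_weights, client_sizes):
    """FedAvg aggregation: average client model parameters, weighting each
    client by the size of its local training set. No raw data leaves a client;
    only parameter vectors are shared."""
    sizes = np.asarray(client_sizes, dtype=float)
    coeffs = sizes / sizes.sum()                   # per-client weights sum to 1
    return sum(c * np.asarray(w) for c, w in zip(coeffs, client_weights))
```

Under non-IID label distributions this plain size-weighted average degrades, since clients pull the model toward their local label mix; that degradation is what techniques like synthetic-sample exchange aim to counteract.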
2. AI for Cybersecurity Intelligence and Policy
In this area, we will discuss several related research projects:
• Optimizing Cyber Threat Intelligence with Active Learning: We propose a framework for efficiently identifying cyber-attacks, called ALERT. The ALERT framework addresses challenges in extracting actionable intelligence from complex Cyber Threat Intelligence (CTI) reports by automating the identification of attack techniques and mapping them to the MITRE ATT&CK framework. By combining active learning strategies with Large Language Models (LLMs), our approach selects only the most informative instances for annotation, achieving comparable performance with 77% less data. This significantly reduces the resource-intensive process of manual annotation by security professionals while maintaining effectiveness in threat technique extraction.
• Automating Cyber-Threat Intelligence with LLMs: In collaboration with researchers from NIST, we focus on automating the extraction of attack techniques from Common Vulnerabilities and Exposures (CVE) and Cyber Threat Intelligence (CTI) reports. We then map these techniques to the standardized MITRE ATT&CK framework using a combination of LLMs and active learning. This talk will demonstrate how this curated knowledge enables security analysts to respond more effectively to cyber threats by streamlining intelligence gathering and threat attribution.
• Identifying Legislative Gaps in Autonomous Vehicle Regulations: Leveraging LLMs and Retrieval-Augmented Generation (RAG), we have identified gaps in Federal and State legislation concerning data privacy and cybersecurity within the autonomous vehicle domain. This presentation will showcase how modifications or additions to existing legislative frameworks can proactively address emerging cybersecurity and privacy challenges in autonomous vehicle regulations.
By integrating generative AI and LLMs into these domains, we aim to bridge critical gaps in cybersecurity, intelligence automation, and policy-making, demonstrating the transformative potential of AI in real-world applications.
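Two mechanisms recur across these projects: selecting the most informative instances for annotation, and retrieving relevant passages before prompting an LLM. Both can be sketched minimally; the entropy-based selection rule and cosine-similarity retrieval below are our illustrative assumptions, not the published ALERT or RAG implementations.

```python
import numpy as np

def select_for_annotation(probs, budget):
    """Uncertainty sampling: choose the unlabeled instances whose predicted
    class distribution has the highest entropy, i.e. where the model is
    least sure and a human label is most informative."""
    p = np.clip(probs, 1e-12, 1.0)
    entropy = -(p * np.log(p)).sum(axis=1)
    return np.argsort(entropy)[::-1][:budget]

def retrieve_top_k(query_vec, doc_vecs, k=3):
    """Retrieval step of a RAG pipeline: rank document embeddings by cosine
    similarity to the query embedding; the top-k passages are then placed
    in the LLM's prompt as grounding context."""
    q = query_vec / np.linalg.norm(query_vec)
    D = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    return np.argsort(D @ q)[::-1][:k]
```

In a CTI workflow, the first function decides which report sentences an analyst labels with ATT&CK techniques; in the legislative analysis, the second decides which statute passages ground the LLM's answer.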
3. Responsible Active Online Learning for Streaming Data
In many practical applications, machine learning systems face three interconnected challenges: processing continuous streams of incoming data, handling predominantly unlabeled datasets, and ensuring responsible and unbiased predictions across diverse demographic groups. Current approaches rarely address all three aspects effectively. We propose a framework called FACTION, which strategically identifies and selects the most valuable data points for annotation by balancing model uncertainty with ethical considerations for various subpopulations. The system additionally demonstrates exceptional capability in detecting anomalous data points within streaming contexts. Through comprehensive testing on real-world datasets and rigorous theoretical validation, FACTION shows promising results in maintaining both accuracy and responsible AI principles in evolving data landscapes. This approach could potentially be applied to transportation safety systems, particularly pedestrian detection in autonomous vehicles where cameras continuously capture diverse individuals in varying conditions. By intelligently selecting informative detection scenarios for annotation, such an application might help address potential disparities in detection accuracy while optimizing the labeling process for continuous video feeds.
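The core selection idea, balancing model uncertainty against coverage of underrepresented subpopulations, can be sketched as a toy scoring rule. The entropy-plus-inverse-count score below is our own illustrative choice; FACTION's actual criterion and theoretical guarantees are in the underlying work.

```python
import numpy as np

def fair_select(probs, groups, labeled_group_counts, budget, lam=1.0):
    """Pick streaming points to annotate by combining predictive uncertainty
    (entropy of the class distribution) with a bonus for demographic groups
    that are underrepresented in the labeled pool so far."""
    p = np.clip(probs, 1e-12, 1.0)
    entropy = -(p * np.log(p)).sum(axis=1)
    counts = np.array([labeled_group_counts[g] for g in groups], dtype=float)
    bonus = lam / (1.0 + counts)          # rarer group in the labeled set -> larger bonus
    return np.argsort(entropy + bonus)[::-1][:budget]
```

With equal uncertainty, a point from a group with no labels yet outranks one from a well-covered group, which is the behavior that helps close detection-accuracy gaps across subpopulations.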
*This work is funded by NSF, DOT, NIH, ONR, the U.S. Army, and NSA.
Biography:
Dr. Latifur Khan is currently a full Professor (tenured) in the Computer Science department at the University of Texas at Dallas, USA where he has been teaching and conducting research since September 2000. He received his Ph.D. degree in Computer Science from the University of Southern California (USC) in August of 2000.
Dr. Khan is a fellow of IEEE, IET, BCS, and an ACM Distinguished Scientist. He has received prestigious awards including the IEEE Technical Achievement Award for Intelligence and Security Informatics, IEEE Big Data Security Award, and IBM Faculty Award (research) 2016. Dr. Khan has published over 300 papers in premier journals and prestigious conferences. Currently, Dr. Khan’s research focuses on big data management and analytics, data mining and its application to cyber security, and complex data management including geospatial data and multimedia data. His research has been supported by grants from NSF, NIH, the Air Force Office of Scientific Research (AFOSR), DOE, NSA, IBM, and HPE. More details can be found at www.utdallas.edu/~lkhan.
Hosts: David Balenson and Jelena Mirkovic
More Info: https://www.isi.edu/events/5655/generative-ai-including-large-language-models-for-cyber-security/
Location: Information Sciences Institute (ISI) - 1135/37
Webcast: https://www.isi.edu/events/5655/generative-ai-including-large-language-models-for-cyber-security/
Audiences: Everyone Is Invited
Contact: Matt Binkley / Information Sciences Institute
This event is open to all eligible individuals. USC Viterbi operates all of its activities consistent with the University's Notice of Non-Discrimination. Eligibility is not determined based on race, sex, ethnicity, sexual orientation, or any other prohibited factor.