Our current research interests are:
String Pattern Matching Algorithms

Formal Interpretation of Practical Regex Features: Practical regexes often incorporate advanced features, such as counting, lookaround, and backreferences, which are not accounted for in the classical regular-expression model. Some of these features even allow regexes to express non-regular languages. We formalize these extensions within the framework of automata and formal language theory, and develop efficient algorithms for them.
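Backreferences are one such feature that pushes regexes beyond regular languages. A small illustration using Python's `re` module (the pattern names are ours):

```python
import re

# A backreference pattern recognizing the copy language {ww : w in {a,b}+},
# which is provably non-regular (and not even context-free).
copy = re.compile(r"(?P<w>[ab]+)(?P=w)")

print(bool(copy.fullmatch("abab")))  # True: w = "ab" repeated twice
print(bool(copy.fullmatch("aba")))   # False: no w with ww = "aba"

# Lookahead restricts a match without consuming input:
# here, "foo" matches only when not immediately followed by "bar".
look = re.compile(r"foo(?!bar)")
print(bool(look.search("foobaz")))   # True
print(bool(look.search("foobar")))   # False
```

The copy language is the standard witness that backreferences exceed the classical model.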
Parikh Matrix Equivalence: The M-equivalence test checks whether two strings are equivalent by comparing their Parikh matrices, which record the occurrence counts of certain ordered (scattered) subwords of the strings. This line of work involves characterizing M-equivalence classes as well as designing algorithms for efficient matrix computation and comparison.
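For an ordered alphabet, the Parikh matrix can be computed in one left-to-right pass by multiplying elementary upper-triangular matrices; a minimal sketch (function names are ours):

```python
def parikh_matrix(word, alphabet):
    """Parikh matrix of `word` over the ordered `alphabet`.

    Entry (i, j) for i < j counts the occurrences of the scattered
    subword alphabet[i..j-1] in `word`; the diagonal is all ones.
    """
    k = len(alphabet)
    # Start from the (k+1) x (k+1) identity matrix.
    m = [[int(i == j) for j in range(k + 1)] for i in range(k + 1)]
    pos = {c: q for q, c in enumerate(alphabet)}
    for c in word:
        q = pos[c]
        # Right-multiplying by I + E_{q,q+1} adds column q into column q+1.
        for i in range(q + 1):
            m[i][q + 1] += m[i][q]
    return m

def m_equivalent(u, v, alphabet):
    """Two strings are M-equivalent iff their Parikh matrices coincide."""
    return parikh_matrix(u, alphabet) == parikh_matrix(v, alphabet)

# Over {a < b}: entry (0,1) counts a's, (1,2) counts b's, (0,2) counts
# occurrences of "ab" as a scattered subword.
print(parikh_matrix("abab", "ab"))       # [[1, 2, 3], [0, 1, 2], [0, 0, 1]]
print(m_equivalent("abba", "baab", "ab"))  # True: same counts of a, b, and ab
```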
Simon's Congruence Matching: Given an integer k, two strings are Simon's k-congruent if their sets of subsequences of length at most k coincide. We study pattern matching problems under Simon's congruence, where two strings match if they are Simon's k-congruent. We are also interested in the approximate version of the matching problem, as well as in finding a string within a given language that is Simon's k-congruent to another string.
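A brute-force sketch of the k-congruence test, enumerating the subsequence sets directly (exponential in the string length, so purely illustrative; efficient algorithms are precisely what we study):

```python
from itertools import combinations

def subseqs_up_to(s, k):
    """All distinct subsequences of s with length at most k (naive)."""
    out = {""}
    for length in range(1, k + 1):
        out.update("".join(t) for t in combinations(s, length))
    return out

def simon_k_congruent(u, v, k):
    """u and v are Simon's k-congruent iff their sets of
    subsequences of length <= k are equal."""
    return subseqs_up_to(u, k) == subseqs_up_to(v, k)

# "abab" and "abba" share all subsequences up to length 2 ({a, b, aa,
# ab, ba, bb}) but differ at length 3 ("aab" occurs only in "abab").
print(simon_k_congruent("abab", "abba", 2))  # True
print(simon_k_congruent("abab", "abba", 3))  # False
```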
Formal Grammars for Deep Learning

PCFG-based Parsing Technique for Data Augmentation: We utilize probabilistic context-free grammars (PCFGs) to generate diverse and syntactically correct variations of input data, increasing dataset size and improving model generalization.
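The generation side can be sketched by sampling derivations from a toy PCFG; the grammar, vocabulary, and probabilities below are illustrative, not from our work:

```python
import random

# Toy PCFG: nonterminal -> list of (right-hand side, probability).
PCFG = {
    "S":   [(["NP", "VP"], 1.0)],
    "NP":  [(["the", "N"], 0.7), (["the", "Adj", "N"], 0.3)],
    "VP":  [(["V", "NP"], 0.6), (["V"], 0.4)],
    "N":   [(["model"], 0.5), (["dataset"], 0.5)],
    "Adj": [(["large"], 1.0)],
    "V":   [(["improves"], 0.5), (["augments"], 0.5)],
}

def sample(symbol="S", rng=random):
    """Sample one derivation from the PCFG, returning a token list."""
    if symbol not in PCFG:  # terminal symbol
        return [symbol]
    rules, weights = zip(*PCFG[symbol])
    rule = rng.choices(rules, weights=weights)[0]
    out = []
    for sym in rule:
        out.extend(sample(sym, rng))
    return out

# Every sampled sentence is grammatical by construction, which is
# what makes PCFG sampling attractive for augmentation.
print(" ".join(sample()))
```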
Analysis and Interpretation of Neural Networks via Probabilistic Automata: We employ probabilistic automata to understand and interpret the behavior and decisions of neural networks, gaining insight into their underlying processes and improving their transparency and explainability.
Neuro-Symbolic AI for Logical Reasoning: Neuro-symbolic AI combines neural networks with symbolic reasoning to enhance decision-making and problem-solving, bridging the gap between statistical learning and logical inference. It leverages the strengths of both approaches to tackle complex reasoning tasks, offering more robust and interpretable solutions.
Deep Learning for Software Codes

Code Time Complexity Prediction: We predict the time complexity of code by analyzing its structure, identifying key factors such as loops and recursion, and expressing the runtime in terms of Big O notation.
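A crude static heuristic in this spirit, counting loop nesting with Python's `ast` module (a real predictor would be learned and would also handle recursion; this sketch covers loops only):

```python
import ast

def max_loop_depth(source):
    """Maximum nesting depth of for/while loops in Python source:
    a rough structural proxy suggesting O(n^depth) runtime."""
    tree = ast.parse(source)

    def depth(node):
        d = 0
        for child in ast.iter_child_nodes(node):
            inner = depth(child)
            if isinstance(child, (ast.For, ast.While)):
                inner += 1  # this child opens one more loop level
            d = max(d, inner)
        return d

    return depth(tree)

bubble_sort = """
def bubble_sort(a):
    for i in range(len(a)):
        for j in range(len(a) - 1):
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
"""
print(max_loop_depth(bubble_sort))  # 2, suggesting O(n^2)
```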
Small-scale Code LLMs: We develop small-scale code LLMs that run on a single GPU with typical memory capacity. Our goal is a model that performs well across various code-related tasks despite its small size, using techniques such as model merging and instruction tuning.
Natural Language Code Search: NL code search retrieves function-level code snippets that match a natural language query. It combines techniques from NLP and code analysis to understand both the query and the codebase.
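A minimal bag-of-words retrieval sketch; practical systems use learned embeddings, and the snippets and queries below are illustrative:

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Split on non-letters, so "read_file(path)" -> ["read", "file", "path"].
    return re.findall(r"[a-z]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def search(query, code_snippets):
    """Rank snippets by similarity to the query, dropping zero scores."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(s))), s) for s in code_snippets]
    return [s for score, s in sorted(scored, reverse=True) if score > 0]

snippets = [
    "def read_file(path): return open(path).read()",
    "def sort_list(xs): return sorted(xs)",
]
print(search("read a file", snippets)[0])  # the read_file snippet ranks first
```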
Deep Learning for NLP

Hate Speech Detection: We aim to develop algorithms or AI models to automatically identify and categorize language that expresses hatred or prejudice towards specific individuals or groups, aiding in the moderation of online content and fostering a safer digital environment.
Few-Shot Text Classification using Self-Training: We leverage a small labeled dataset together with a larger unlabeled one, iteratively refining predictions and updating the model's parameters through self-training to improve classification performance.
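The self-training loop can be sketched on a toy one-dimensional problem with a nearest-centroid classifier; the data, threshold, and round count are illustrative:

```python
def nearest_centroid(labeled):
    """Fit one centroid per class from (feature, label) pairs (1-D toy)."""
    sums, counts = {}, {}
    for x, y in labeled:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def self_train(labeled, unlabeled, rounds=3, margin=2.0):
    """Each round: fit centroids, pseudo-label unlabeled points whose
    gap between the two nearest centroids exceeds `margin` (a crude
    confidence measure), and move them into the labeled pool."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        centroids = nearest_centroid(labeled)
        confident = []
        for x in pool:
            dists = sorted((abs(x - c), y) for y, c in centroids.items())
            if len(dists) > 1 and dists[1][0] - dists[0][0] >= margin:
                confident.append((x, dists[0][1]))  # pseudo-label x
        if not confident:
            break  # no confident predictions left; stop early
        labeled += confident
        kept = {x for x, _ in confident}
        pool = [x for x in pool if x not in kept]
    return nearest_centroid(labeled)

# Two labeled seeds; the ambiguous point 5.1 is never pseudo-labeled.
model = self_train([(0.0, "neg"), (10.0, "pos")], [1.0, 2.0, 9.0, 5.1])
print(model)  # centroids refined by the pseudo-labeled points
```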
Machine-Generated Text Detection: Machine-generated text detection identifies text produced by AI models rather than humans, a task crucial for combating fake news and misinformation. It involves analyzing linguistic patterns and inconsistencies to distinguish human-written from machine-generated content.