I am looking for self-motivated student researchers / research interns with interests and expertise in LLMs! Feel free to contact me if you would like to work on cutting-edge challenges in machine learning and LLMs.
We propose Model Swarms, a collaborative search algorithm to adapt LLMs via swarm intelligence, the collective behavior guiding individual systems. Specifically, Model Swarms starts with a pool of LLM experts and a utility function. Guided by the best-found checkpoints across models, diverse LLM experts collaboratively move in the weight space and optimize the utility function representing the model adaptation objective.
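Conceptually, the search resembles particle swarm optimization applied directly to model weights. The sketch below is a minimal toy version of that loop on flat numpy vectors; the utility function, coefficients, and flattened-weight representation are illustrative assumptions, not the released Model Swarms implementation.

```python
import numpy as np

def swarm_search(experts, utility, steps=50, inertia=0.5, c_personal=1.5, c_global=1.5):
    """Toy particle-swarm search over flattened model weights.

    experts : list of 1-D numpy arrays (one flattened weight vector per LLM expert)
    utility : callable mapping a weight vector to a scalar adaptation score
    """
    positions = [w.copy() for w in experts]
    velocities = [np.zeros_like(w) for w in experts]
    personal_best = [w.copy() for w in experts]
    personal_score = [utility(w) for w in experts]
    g = int(np.argmax(personal_score))
    global_best, global_score = personal_best[g].copy(), personal_score[g]

    for _ in range(steps):
        for i, w in enumerate(positions):
            r1, r2 = np.random.rand(), np.random.rand()
            # Velocity mixes the expert's own momentum with pulls toward its
            # personal-best and the swarm's global-best checkpoints.
            velocities[i] = (inertia * velocities[i]
                             + c_personal * r1 * (personal_best[i] - w)
                             + c_global * r2 * (global_best - w))
            positions[i] = w + velocities[i]
            score = utility(positions[i])
            if score > personal_score[i]:
                personal_best[i], personal_score[i] = positions[i].copy(), score
                if score > global_score:
                    global_best, global_score = positions[i].copy(), score
    return global_best, global_score

# Toy usage: 4 "experts" in a 10-dimensional weight space; utility peaks at the origin.
experts = [np.random.randn(10) for _ in range(4)]
best, score = swarm_search(experts, utility=lambda w: -np.linalg.norm(w))
```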
Speculative Knowledge Distillation (SKD) is a novel approach that leverages cooperation between student and teacher models to generate high-quality training data on-the-fly while aligning with the student's inference-time distribution. In SKD, the student proposes tokens, and the teacher replaces poorly ranked ones based on its own distribution, transferring high-quality knowledge adaptively.
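As a rough illustration of that token-level interplay, the toy step below samples a token from a "student" distribution and keeps it only if it falls within the teacher's top-k; otherwise the teacher's own sample is used. The random categorical distributions and the top-k acceptance rule are simplifying assumptions, not the exact SKD criterion.

```python
import numpy as np

def skd_step(student_probs, teacher_probs, top_k=5, rng=np.random.default_rng()):
    """One toy SKD decoding step over a shared vocabulary.

    The student proposes a token; if it is not among the teacher's top-k
    tokens, the proposal is replaced by a sample from the teacher.
    """
    proposal = rng.choice(len(student_probs), p=student_probs)
    teacher_top_k = np.argsort(teacher_probs)[::-1][:top_k]
    if proposal in teacher_top_k:
        return proposal, "student"      # student token accepted
    replacement = rng.choice(len(teacher_probs), p=teacher_probs)
    return replacement, "teacher"       # teacher overrides a poorly ranked token

# Toy usage with random 100-token vocabularies.
rng = np.random.default_rng(0)
vocab = 100
student = rng.dirichlet(np.ones(vocab))
teacher = rng.dirichlet(np.ones(vocab))
token, source = skd_step(student, teacher, rng=rng)
```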
Reverse Thinking Makes LLMs Stronger Reasoners
Justin Chih-Yao Chen, Zifeng Wang, Hamid Palangi, Rujun Han, Sayna Ebrahimi, Long Le, Vincent Perot, Swaroop Mishra, Mohit Bansal, Chen-Yu Lee, Tomas Pfister
The Nations of the Americas Chapter of the Association for Computational Linguistics (NAACL), 2025.
[paper]
To enable LLMs to perform reverse thinking, we introduce Reverse-Enhanced Thinking (RevThink), a framework composed of data augmentation and learning objectives. RevThink outperforms a standard fine-tuning method trained on 10x more forward reasoning, and it also exhibits strong generalization to out-of-distribution held-out datasets.
Speculative RAG is a framework that leverages a larger generalist LM to efficiently verify multiple RAG drafts produced in parallel by a smaller, distilled specialist LM. (A complementary strategy to the CaLM paper below from a very different perspective!)
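A rough sketch of that draft-then-verify flow is below; draft_lm and verify_lm are hypothetical stand-ins for the small specialist and large generalist models, and the scoring rule is illustrative.

```python
from concurrent.futures import ThreadPoolExecutor

def speculative_rag(question, doc_subsets, draft_lm, verify_lm):
    """Toy Speculative RAG loop with caller-supplied model callables.

    draft_lm(question, docs)   -> a draft answer string (small specialist LM)
    verify_lm(question, draft) -> a scalar score (large generalist LM)
    doc_subsets                -> list of retrieved-document subsets, one per draft
    """
    # Draft in parallel with the small LM, one draft per document subset.
    with ThreadPoolExecutor() as pool:
        drafts = list(pool.map(lambda docs: draft_lm(question, docs), doc_subsets))
    # The large LM only scores the short drafts instead of reading every document.
    scores = [verify_lm(question, d) for d in drafts]
    best = max(range(len(drafts)), key=lambda i: scores[i])
    return drafts[best], scores[best]

# Toy usage with stand-in callables.
answer, score = speculative_rag(
    "Who wrote Hamlet?",
    doc_subsets=[["doc A"], ["doc B"]],
    draft_lm=lambda q, docs: f"Draft grounded in {docs[0]}",
    verify_lm=lambda q, d: len(d),  # placeholder verification score
)
```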
TableRAG is a Retrieval-Augmented Generation (RAG) framework specifically designed for LM-based table understanding. TableRAG leverages query expansion combined with schema and cell retrieval to pinpoint crucial information before providing it to the LMs.
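A toy illustration of the schema-then-cell retrieval idea follows; plain substring matching stands in for the embedding-based retrievers and LLM query expansion used in the actual system, and the prompt format is made up for the example.

```python
def tablerag_prompt(question, table, expanded_queries, top_k=3):
    """Toy TableRAG-style context builder over a table given as a dict
    {column_name: [cell values]}. Substring matching is a stand-in for
    the real retrievers."""
    queries = [question] + expanded_queries

    def score(text):
        text = str(text).lower()
        return sum(1 for q in queries if q.lower() in text or text in q.lower())

    # Schema retrieval: keep only the most relevant column names.
    columns = sorted(table, key=score, reverse=True)[:top_k]
    # Cell retrieval: keep only the most relevant cell values from those columns.
    cells = sorted(
        {(c, v) for c in columns for v in table[c]},
        key=lambda cv: score(cv[1]), reverse=True)[:top_k]

    schema_part = "Columns: " + ", ".join(columns)
    cell_part = "Cells: " + ", ".join(f"{c}={v}" for c, v in cells)
    return f"{schema_part}\n{cell_part}\nQuestion: {question}"

# Toy usage.
table = {"country": ["France", "Japan"], "population": [68, 125], "capital": ["Paris", "Tokyo"]}
print(tablerag_prompt("What is the capital of Japan?", table, expanded_queries=["Japan", "capital"]))
```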
CaLM is a novel verification framework that leverages the insight that a robust grounded response should be consistent with information derived solely from its cited sources. Our framework empowers smaller LMs, which rely less on parametric memory and excel at processing relevant information given a query, to validate the output of larger LMs.
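A minimal sketch of this verify-by-citation loop, assuming small_lm and agree as hypothetical stand-ins for the verifier model and the consistency check (the actual CaLM feedback loop is more involved):

```python
def calm_verify(question, large_answer, cited_passages, small_lm, agree):
    """Toy CaLM-style verification with caller-supplied callables.

    small_lm(question, passages) -> an answer produced ONLY from the cited passages
    agree(a, b)                  -> True if two answers are consistent
    Returns the large model's answer if it is grounded in its citations,
    otherwise a rejection that can be fed back for another attempt.
    """
    citation_only_answer = small_lm(question, cited_passages)
    if agree(large_answer, citation_only_answer):
        return {"verified": True, "answer": large_answer}
    return {"verified": False,
            "feedback": f"Cited sources alone support: {citation_only_answer!r}"}

# Toy usage with stand-in callables.
result = calm_verify(
    "When was the Eiffel Tower completed?",
    large_answer="1889",
    cited_passages=["The Eiffel Tower was completed in 1889."],
    small_lm=lambda q, ps: "1889" if any("1889" in p for p in ps) else "unknown",
    agree=lambda a, b: a.strip() == b.strip(),
)
```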
LLMs often struggle to capture relevant information in the middle of their input due to an intrinsic U-shaped attention bias, favoring tokens at the beginning and end. To address this, we propose a calibration mechanism called "found-in-the-middle" to mitigate this bias, greatly improving context relevance and RAG performance.
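One way such a calibration could look in code, assuming per-document attention scores and a positional-bias estimate (e.g., from a content-free query) are available; the exact calibration and reordering used in the paper may differ.

```python
import numpy as np

def calibrate_and_reorder(docs, attention, positional_bias):
    """Toy attention calibration in the spirit of "found-in-the-middle".

    attention[i]       : attention mass the model puts on document i at its position
    positional_bias[i] : attention mass put on position i regardless of content
    Subtracting the bias gives a relevance estimate; documents are then reordered
    so the most relevant ones sit at the positions the model attends to most.
    """
    attention = np.asarray(attention, dtype=float)
    bias = np.asarray(positional_bias, dtype=float)
    relevance = attention - bias                   # calibrated relevance per document
    by_relevance = np.argsort(relevance)[::-1]     # most relevant first
    favored_positions = np.argsort(bias)[::-1]     # positions the model favors most
    order = np.empty(len(docs), dtype=int)
    order[favored_positions] = by_relevance        # relevant docs go into favored slots
    return [docs[i] for i in order], relevance

# Toy usage with 5 retrieved passages and a U-shaped positional bias.
docs = ["d0", "d1", "d2", "d3", "d4"]
reordered, rel = calibrate_and_reorder(
    docs,
    attention=[0.30, 0.10, 0.25, 0.08, 0.27],
    positional_bias=[0.28, 0.12, 0.10, 0.14, 0.36],
)
```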
CodecLM is a general framework for adaptively generating high-quality synthetic data for LLM alignment with different downstream instruction distributions and LLMs. Drawing on Encode-Decode principles, we use LLMs as codecs to guide the data generation process.
CHAIN-OF-TABLE enhances the reasoning capability of LLMs by leveraging tabular structures to express intermediate thoughts for table-based reasoning. It instructs LLMs to dynamically plan an operation chain according to the input table and its associated question.
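A toy version of that plan-and-execute loop is sketched below; plan_next and answer are hypothetical stand-ins for the LLM calls, and only two table operations are implemented for illustration.

```python
def chain_of_table(question, table, plan_next, answer, max_steps=5):
    """Toy Chain-of-Table loop with caller-supplied LLM callables.

    table     : list of dict rows, e.g. [{"city": "Paris", "pop": 2.1}, ...]
    plan_next : (question, table, history) -> ("select_rows"/"select_columns"/"end", arg)
    answer    : (question, table) -> final answer read off the reduced table
    """
    ops = {
        "select_rows":    lambda t, keep: [r for r in t if keep(r)],
        "select_columns": lambda t, cols: [{c: r[c] for c in cols} for r in t],
    }
    history = []
    for _ in range(max_steps):
        op, arg = plan_next(question, table, history)   # the LLM plans the next operation
        if op == "end":
            break
        table = ops[op](table, arg)                     # the operation transforms the table
        history.append(op)
    return answer(question, table)                      # reason over the small final table

# Toy usage with hand-written "planner" and "reader" in place of LLM calls.
rows = [{"city": "Paris", "pop": 2.1}, {"city": "Berlin", "pop": 3.6}]
result = chain_of_table(
    "Which city has the larger population?",
    rows,
    plan_next=lambda q, t, h: ("select_columns", ["city", "pop"]) if not h else ("end", None),
    answer=lambda q, t: max(t, key=lambda r: r["pop"])["city"],
)
```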
QueryForm: A Simple Zero-shot Form Entity Query Framework
Zifeng Wang, Zizhao Zhang, Jacob Devlin, Chen-Yu Lee, Guolong Su, Hao Zhang, Jennifer Dy, Vincent Perot, Tomas Pfister
Findings of the Association for Computational Linguistics (ACL), 2023.
[paper]
QueryForm consists of a novel prompting-based framework for zero-shot document entity recognition with large language models (LLMs), and a large-scale weakly-supervised pre-training method on publicly available webpages.
DualHSIC consists of two complementary components that stem from the Hilbert-Schmidt independence criterion (HSIC): HSIC-Bottleneck for Rehearsal (HBR) lessens inter-task interference, and HSIC Alignment (HA) promotes task-invariant knowledge sharing.
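Both components rest on the same empirical HSIC estimator. A minimal numpy sketch of that estimator with Gaussian kernels is below; the kernel choice, bandwidth, and biased estimator form are illustrative, and this is not the DualHSIC training code.

```python
import numpy as np

def gaussian_kernel(X, sigma=1.0):
    """Pairwise Gaussian (RBF) kernel matrix for the rows of X."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2 * sigma ** 2))

def hsic(X, Y, sigma=1.0):
    """Biased empirical HSIC estimator: tr(K H L H) / (n - 1)^2,
    where K and L are kernel matrices of X and Y and H centers them."""
    n = X.shape[0]
    K, L = gaussian_kernel(X, sigma), gaussian_kernel(Y, sigma)
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

# Dependent pairs give a larger HSIC value than independent ones.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
print(hsic(X, X + 0.1 * rng.normal(size=(200, 1))))   # strongly dependent
print(hsic(X, rng.normal(size=(200, 1))))             # roughly independent
```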
SparCL: Sparse Continual Learning on the Edge
Zifeng Wang*, Zheng Zhan*, Yifan Gong, Geng Yuan, Wei Niu, Tong Jian, Bin Ren, Stratis Ioannidis, Yanzhi Wang, Jennifer Dy
Neural Information Processing Systems (NeurIPS), 2022.
[paper] [code]
SparCL explores sparsity for efficient continual learning and achieves both training acceleration and accuracy preservation through the synergy of three aspects: weight sparsity, data efficiency, and gradient sparsity.
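A toy PyTorch sketch of the weight- and gradient-sparsity ingredients in a single update step is shown below; the keep ratios, magnitude-based masks, and plain SGD step are illustrative assumptions, and the data-efficiency component is omitted.

```python
import torch

def magnitude_mask(tensor, keep_ratio):
    """Binary mask keeping the largest-magnitude entries of a tensor."""
    k = max(1, int(keep_ratio * tensor.numel()))
    threshold = tensor.abs().flatten().kthvalue(tensor.numel() - k + 1).values
    return (tensor.abs() >= threshold).float()

def sparse_training_step(model, loss, weight_keep=0.2, grad_keep=0.5):
    """One toy sparse update: gradients are pruned by magnitude, and weights
    are re-masked to stay sparse. Real SparCL also adapts the mask across
    tasks and selects informative rehearsal data; this only sketches the
    weight- and gradient-sparsity pieces."""
    loss.backward()
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                continue
            p.grad *= magnitude_mask(p.grad, grad_keep)   # gradient sparsity
            p -= 0.01 * p.grad                            # plain SGD step
            p *= magnitude_mask(p, weight_keep)           # weight sparsity
            p.grad = None

# Toy usage on a tiny model.
model = torch.nn.Linear(8, 2)
x, y = torch.randn(4, 8), torch.randint(0, 2, (4,))
sparse_training_step(model, torch.nn.functional.cross_entropy(model(x), y))
```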
DualPrompt presents a novel approach to attach complementary prompts to the pre-trained backbone, and then formulates the continual learning objective as learning task-invariant and task-specific "instructions".
Learning to Prompt for Continual Learning
Zifeng Wang, Zizhao Zhang, Chen-Yu Lee, Han Zhang, Ruoxi Sun, Xiaoqi Ren, Guolong Su, Vincent Perot, Jennifer Dy, Tomas Pfister
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
[paper] [code] [blog]
We propose a new learning paradigm for continual learning: our method learns to dynamically prompt (L2P) a pre-trained model to learn tasks sequentially under different task transitions.
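A minimal PyTorch sketch of the prompt-pool idea behind L2P follows: learnable keys select prompts via query matching, and the selected prompts are prepended to the frozen backbone's token embeddings. The sizes and cosine-similarity matching are illustrative choices, not the released code.

```python
import torch

class PromptPool(torch.nn.Module):
    """Toy L2P-style prompt pool: learnable (key, prompt) pairs; an input's
    query feature picks the closest keys and their prompts are prepended to
    the frozen model's token embeddings."""
    def __init__(self, pool_size=10, prompt_len=5, dim=768, top_n=3):
        super().__init__()
        self.keys = torch.nn.Parameter(torch.randn(pool_size, dim))
        self.prompts = torch.nn.Parameter(torch.randn(pool_size, prompt_len, dim))
        self.top_n = top_n

    def forward(self, query, token_embeddings):
        # query: [batch, dim] feature from a frozen encoder (e.g., a [CLS] feature)
        sim = torch.nn.functional.cosine_similarity(
            query[:, None, :], self.keys[None, :, :], dim=-1)    # [batch, pool]
        idx = sim.topk(self.top_n, dim=-1).indices               # pick matching prompts
        chosen = self.prompts[idx].flatten(1, 2)                 # [batch, top_n*len, dim]
        # Prepend the selected prompts; only keys/prompts are trained, the backbone stays frozen.
        return torch.cat([chosen, token_embeddings], dim=1)

# Toy usage.
pool = PromptPool()
query = torch.randn(2, 768)        # stand-in for frozen-encoder features
tokens = torch.randn(2, 16, 768)   # stand-in for input token embeddings
extended = pool(query, tokens)     # [2, 3*5 + 16, 768]
```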
We investigate the HSIC (Hilbert-Schmidt independence criterion) bottleneck as a regularizer for learning an adversarially robust deep neural network classifier, both theoretically and empirically.
Learn-Prune-Share for Lifelong Learning
Zifeng Wang*, Tong Jian*, Kaushik Chowdhury, Yanzhi Wang, Jennifer Dy, Stratis Ioannidis
International Conference on Data Mining (ICDM), 2020.
[paper]
We propose a learn-prune-share (LPS) algorithm which addresses the challenges of catastrophic forgetting, parsimony, and knowledge reuse simultaneously.
Open-World Class Discovery with Kernel Networks
Zifeng Wang, Batool Salehi, Andrey Gritsenko, Kaushik Chowdhury, Stratis Ioannidis, Jennifer Dy
International Conference on Data Mining (ICDM), 2020.
Best paper candidate
[paper]
We propose Class Discovery Kernel Network with Expansion (CD-KNet-Exp), a deep learning framework for the open-world class discovery problem.
Invited Talks
Sparse Continual Learning on the Edge
- ContinualAI, 2023, Remote
- AI Time, 2023, Remote (Chinese version)
Revisiting Hilbert-Schmidt Information Bottleneck for Adversarial Robustness
- INFORMS Annual Meeting, 2022, Indianapolis, Indiana [abstract]
- AI Time, 2022, Remote (Chinese version) [recording] [blog]
Awards
Outstanding Student Award in Research, Northeastern University, 2023
Scholar Award, NeurIPS 2022
Best Paper Candidate, ICDM 2020
Best Paper Award, DySPAN 2019
Travel Award, DySPAN 2019
Travel Award, NeurIPS 2019
Dean's Fellowship, Northeastern University, 2018    Highest honor awarded to new PhD students for outstanding academic background.
Evergrande Scholarship, Tsinghua University, 2016    Awarded to students with excellent academic performance, scientific potential and overall development.