Zifeng Wang

I am a research scientist at Google. I received my PhD in Machine Learning from Northeastern University, where I was a member of the SPIRAL Group advised by Prof. Jennifer G. Dy. During my PhD, I also worked closely with Prof. Stratis Ioannidis and Prof. Yanzhi Wang. I received my BS degree in Electronic Engineering from Tsinghua University. During my undergraduate years, I worked with Prof. Jiwen Lu (Tsinghua) and Prof. Jia Deng (Princeton) on computer vision, and with Prof. Yong Li (Tsinghua) on big data.

I am looking for self-motivated student researchers / research interns with interests and expertise in LLMs! Feel free to contact me if you would like to work on cutting-edge challenges in machine learning and LLMs.

Email  /  Google Scholar  /  LinkedIn  /  Twitter  /  CV

Research

  • Large language models
    • Model adaptation and customization
    • Multi-LLM collaboration and interaction
    • Agents and their applications
  • Continual (Lifelong) learning
  • Open set recognition / novel class discovery
  • Adversarial robustness and model compression

Experiences

  • Aug 2023 - Present, Google,
    Research Scientist at Cloud AI Research
  • Sep 2018 - July 2023, Northeastern University,
    Research assistant at SPIRAL Group
  • June 2021 - Jan 2023, Google,
    Student researcher / Research Intern at Cloud AI Research
  • Feb 2017 - July 2018, Tsinghua University,
    Research assistant at i-Vision Group
  • July 2017 - Sep 2017, University of Michigan,
    Visiting researcher at Vision & Learning Lab

Selected Publications
See Google Scholar for all publications.
Model Swarms: Collaborative Search to Adapt LLM Experts via Swarm Intelligence
Shangbin Feng, Zifeng Wang, Yike Wang, Sayna Ebrahimi, Hamid Palangi, Lesly Miculicich, Achin Kulshrestha, Nathalie Rauschmayr, Yejin Choi, Yulia Tsvetkov, Chen-Yu Lee, Tomas Pfister
arXiv preprint, 2024.
[paper]

We propose Model Swarms, a collaborative search algorithm to adapt LLMs via swarm intelligence, the collective behavior guiding individual systems. Specifically, Model Swarms starts with a pool of LLM experts and a utility function. Guided by the best-found checkpoints across models, diverse LLM experts collaboratively move in the weight space and optimize a utility function representing model adaptation objectives.
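
At its core this is a particle-swarm-style search in weight space. The minimal sketch below is a generic particle-swarm optimization over flat weight vectors, not the paper's implementation; the pool size, the update coefficients, and the toy `utility` function are illustrative assumptions.

```python
# Toy particle-swarm search over "expert" weight vectors (illustration only).
import numpy as np

rng = np.random.default_rng(0)

def utility(w):
    # Hypothetical adaptation objective: closeness to an unknown "ideal" adapter.
    return -np.linalg.norm(w - 1.0)

experts = [rng.normal(size=16) for _ in range(8)]   # pool of LLM expert weight vectors
velocity = [np.zeros(16) for _ in experts]
personal_best = list(experts)
global_best = max(experts, key=utility)

for _ in range(50):
    for i, w in enumerate(experts):
        r1, r2 = rng.random(2)
        # Each expert moves toward its own best and the swarm's best-found checkpoint.
        velocity[i] = (0.5 * velocity[i]
                       + 1.5 * r1 * (personal_best[i] - w)
                       + 1.5 * r2 * (global_best - w))
        experts[i] = w + velocity[i]
        if utility(experts[i]) > utility(personal_best[i]):
            personal_best[i] = experts[i]
    global_best = max(personal_best, key=utility)

print("best utility found:", utility(global_best))
```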

Speculative Knowledge Distillation: Bridging the Teacher-Student Gap Through Interleaved Sampling
Wenda Xu, Rujun Han, Zifeng Wang, Long T. Le, Dhruv Madeka, Lei Li, William Yang Wang, Rishabh Agarwal, Chen-Yu Lee, Tomas Pfister
International Conference on Learning Representations (ICLR), 2025.
[paper]

Speculative Knowledge Distillation (SKD) is a novel approach that leverages cooperation between student and teacher models to generate high-quality training data on-the-fly while aligning with the student's inference-time distribution. In SKD, the student proposes tokens, and the teacher replaces poorly ranked ones based on its own distribution, transferring high-quality knowledge adaptively.
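
The toy sketch below conveys the interleaved-sampling idea only: random logits stand in for real student and teacher models, and the vocabulary size, top-k acceptance rule, and `fake_logits` helper are assumptions for illustration, not the SKD code.

```python
# Toy interleaved sampling: student proposes a token, the teacher replaces it
# if the token ranks poorly under the teacher's own distribution.
import numpy as np

rng = np.random.default_rng(0)
VOCAB, TOP_K = 20, 5

def fake_logits(model_name, prefix):
    # Stand-in for a language model's next-token logits given the prefix.
    seed = hash((model_name, tuple(prefix))) % (2**32)
    return np.random.default_rng(seed).normal(size=VOCAB)

def sample(logits):
    p = np.exp(logits - logits.max())
    return int(rng.choice(VOCAB, p=p / p.sum()))

prefix = []
for _ in range(10):
    token = sample(fake_logits("student", prefix))          # student proposes
    teacher_logits = fake_logits("teacher", prefix)
    rank = int((teacher_logits > teacher_logits[token]).sum())
    if rank >= TOP_K:                                        # poorly ranked by the teacher
        token = sample(teacher_logits)                       # replace with a teacher sample
    prefix.append(token)

print("generated token ids:", prefix)
```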

Reverse Thinking Makes LLMs Stronger Reasoners
Justin Chih-Yao Chen, Zifeng Wang, Hamid Palangi, Rujun Han, Sayna Ebrahimi, Long Le, Vincent Perot, Swaroop Mishra, Mohit Bansal, Chen-Yu Lee, Tomas Pfister
The Nations of the Americas Chapter of the Association for Computational Linguistics (NAACL), 2025.
[paper]

To enable LLMs to perform reverse thinking, we introduce Reverse-Enhanced Thinking (RevThink), a framework composed of data augmentation and learning objectives. RevThink outperforms standard fine-tuning trained on 10x more forward reasoning data, and it also generalizes strongly to out-of-distribution held-out datasets.

Speculative RAG: Enhancing Retrieval Augmented Generation through Drafting
Zilong Wang, Zifeng Wang, Long Le, Huaixiu Steven Zheng, Swaroop Mishra, Vincent Perot, Yuwei Zhang, Anush Mattapalli, Ankur Taly, Jingbo Shang, Chen-Yu Lee, Tomas Pfister
International Conference on Learning Representations (ICLR), 2025.
[paper][blog]

Speculative RAG is a framework that leverages a larger generalist LM to efficiently verify multiple RAG drafts produced in parallel by a smaller, distilled specialist LM. (A complementary strategy to the CaLM paper below from a very different perspective!)
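
As a rough sketch of the draft-then-verify flow (not the paper's system), the snippet below drafts answers in parallel from document subsets with a hypothetical `specialist_draft` and keeps the draft preferred by a hypothetical `generalist_score`; both stand-ins would be LLM calls in practice.

```python
# Parallel drafting by a small specialist, selection by a large verifier (toy).
from concurrent.futures import ThreadPoolExecutor

def specialist_draft(question, doc_subset):
    # A real system would prompt the small specialist RAG drafter here.
    return f"draft for {question!r} grounded in {len(doc_subset)} docs"

def generalist_score(question, draft):
    # A real system would ask the large generalist LM to rate the draft.
    return len(draft)  # toy scoring rule

def speculative_rag(question, retrieved_docs, n_drafts=3):
    subsets = [retrieved_docs[i::n_drafts] for i in range(n_drafts)]
    with ThreadPoolExecutor() as pool:
        drafts = list(pool.map(lambda s: specialist_draft(question, s), subsets))
    return max(drafts, key=lambda d: generalist_score(question, d))

print(speculative_rag("Who wrote Dune?", ["doc1", "doc2", "doc3", "doc4"]))
```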

TableRAG: Million-Token Table Understanding with Language Models
Si-An Chen, Lesly Miculicich, Julian Martin Eisenschlos, Zifeng Wang, Zilong Wang, Yanfei Chen, Yasuhisa Fujii, Hsuan-Tien Lin, Chen-Yu Lee, Tomas Pfister
Neural Information Processing Systems (NeurIPS), 2024.
[paper][code]

TableRAG is a Retrieval-Augmented Generation (RAG) framework specifically designed for LM-based table understanding. TableRAG leverages query expansion combined with schema and cell retrieval to pinpoint crucial information before providing it to the LMs.
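
The sketch below conveys the schema/cell retrieval idea on a tiny pandas table; it uses a crude lexical-overlap scorer in place of a real encoder and omits query expansion, so everything here is an illustrative assumption rather than TableRAG itself.

```python
# Retrieve only the relevant schema entries and cell values, not the whole table.
import pandas as pd

table = pd.DataFrame({"country": ["France", "Japan"], "population_m": [68, 125]})

def overlap(query, text):
    # Crude lexical scorer standing in for an embedding-based retriever.
    return sum(w in str(text).lower() for w in query.lower().split())

def table_rag_context(query, table, k=2):
    schema_db = [f"column '{c}' (dtype {table[c].dtype})" for c in table.columns]
    cell_db = [f"{c}={v}" for c in table.columns for v in table[c].unique()]
    top_schema = sorted(schema_db, key=lambda s: overlap(query, s), reverse=True)[:k]
    top_cells = sorted(cell_db, key=lambda s: overlap(query, s), reverse=True)[:k]
    return top_schema + top_cells   # compact context handed to the LM instead of the full table

print(table_rag_context("population of France", table))
```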

CaLM: Contrasting Large and Small Language Models to Verify Grounded Generation
I-Hung Hsu, Zifeng Wang, Long T Le, Lesly Miculicich, Nanyun Peng, Chen-Yu Lee, Tomas Pfister
Findings of the Association for Computational Linguistics (ACL), 2024.
[paper]

CaLM is a novel verification framework that leverages the insight that a robust grounded response should be consistent with information derived solely from its cited sources. Our framework empowers smaller LMs, which rely less on parametric memory and excel at processing relevant information given a query, to validate the output of larger LMs.

Found in the Middle: Calibrating Positional Attention Bias Improves Long Context Utilization
Cheng-Yu Hsieh, Yung-Sung Chuang, Chun-Liang Li, Zifeng Wang, Long T. Le, Abhishek Kumar, James Glass, Alexander Ratner, Chen-Yu Lee, Ranjay Krishna, Tomas Pfister
Findings of the Association for Computational Linguistics (ACL), 2024.
[paper]

LLMs often struggle to capture relevant information in the middle of their input due to an intrinsic U-shaped attention bias that favors tokens at the beginning and end. We propose a calibration mechanism, "found-in-the-middle", that mitigates this bias and greatly improves context relevance and RAG performance.
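
A toy numpy illustration of the calibration idea: if the position-only attention component can be estimated, subtracting it recovers relevance-driven attention. The synthetic bias profile and relevance scores below are made up for the example and are not how the paper estimates the bias.

```python
# Subtracting a (synthetic) U-shaped positional bias recovers the relevant middle document.
import numpy as np

n_docs = 10
positions = np.arange(n_docs)
# Hypothetical U-shaped positional bias: attention is higher at both ends.
positional_bias = 0.5 * np.abs(positions - (n_docs - 1) / 2) / n_docs
true_relevance = np.zeros(n_docs)
true_relevance[5] = 0.1                                 # the relevant document sits in the middle
observed_attention = true_relevance + positional_bias

calibrated = observed_attention - positional_bias       # remove the position-only component
print("top doc before calibration:", int(observed_attention.argmax()))  # an edge document
print("top doc after calibration: ", int(calibrated.argmax()))          # the middle document
```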

CodecLM: Aligning Language Models with Tailored Synthetic Data
Zifeng Wang, Chun-Liang Li, Vincent Perot, Long T. Le, Jin Miao, Zizhao Zhang, Chen-Yu Lee, Tomas Pfister
Findings of the North American Chapter of the Association for Computational Linguistics (NAACL), 2024.
[paper][blogpost]

CodecLM is a general framework for adaptively generating high-quality synthetic data for LLM alignment with different downstream instruction distributions and LLMs. Drawing on the Encode-Decode principles, we use LLMs as codecs to guide the data generation process.

Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding
Zilong Wang, Hao Zhang, Chun-Liang Li, Julian Martin Eisenschlos, Vincent Perot, Zifeng Wang, Lesly Miculicich, Yasuhisa Fujii, Jingbo Shang, Chen-Yu Lee, Tomas Pfister
International Conference on Learning Representations (ICLR), 2024.
[paper]

CHAIN-OF-TABLE enhances the reasoning capability of LLMs by leveraging tabular structures to express intermediate thoughts for table-based reasoning. It instructs LLMs to dynamically plan an operation chain according to the input table and its associated question.
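
The sketch below executes a hard-coded operation chain over a pandas table just to make the idea concrete; in the actual method an LLM plans the chain step by step, and the operation set and plan here are illustrative assumptions.

```python
# Apply a planned chain of table operations, passing the evolved table along.
import pandas as pd

table = pd.DataFrame({"team": ["A", "B", "C"], "wins": [10, 7, 12], "losses": [4, 9, 2]})

OPS = {
    "filter_rows":    lambda t, cond: t.query(cond),
    "sort_by":        lambda t, col:  t.sort_values(col, ascending=False),
    "select_columns": lambda t, cols: t[cols],
}

# A chain an LLM might plan for "Which team has the most wins?"
plan = [("filter_rows", "wins > 5"), ("sort_by", "wins"), ("select_columns", ["team", "wins"])]

for op, arg in plan:
    table = OPS[op](table, arg)       # the evolved table is shown to the LLM at each step
print(table.head(1))
```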

QueryForm: A Simple Zero-shot Form Entity Query Framework
Zifeng Wang, Zizhao Zhang, Jacob Devlin, Chen-Yu Lee, Guolong Su, Hao Zhang, Jennifer Dy, Vincent Perot, Tomas Pfister
Findings of the Association for Computational Linguistics (ACL), 2023.
[paper]

QueryForm consists of a novel prompting-based framework for zero-shot document entity recognition with large language models (LLMs), and a large-scale weakly-supervised pre-training method on publicly available webpages.

DualHSIC: HSIC-Bottleneck and Alignment for Continual Learning
Zifeng Wang*, Zheng Zhan*, Yifan Gong, Yucai Shao, Stratis Ioannidis, Yanzhi Wang, Jennifer Dy
International Conference on Machine Learning (ICML), 2023.
[paper][code]

DualHSIC consists of two complementary components that stem from the Hilbert-Schmidt independence criterion (HSIC): HSIC-Bottleneck for Rehearsal (HBR) lessens inter-task interference, while HSIC Alignment (HA) promotes task-invariant knowledge sharing.

SparCL: Sparse Continual Learning on the Edge
Zifeng Wang*, Zheng Zhan*, Yifan Gong, Geng Yuan, Wei Niu, Tong Jian, Bin Ren, Stratis Ioannidis, Yanzhi Wang, Jennifer Dy
Neural Information Processing Systems (NeurIPS), 2022.
[paper] [code]

SparCL explores sparsity for efficient continual learning and achieves both training acceleration and accuracy preservation through the synergy of three aspects: weight sparsity, data efficiency, and gradient sparsity.
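
As a loose illustration of the weight-sparsity ingredient only (not the SparCL algorithm), the snippet below builds a magnitude-based mask that keeps a fixed fraction of weights; during training, gradients would be masked the same way.

```python
# Magnitude-based weight masking at a fixed sparsity budget (toy example).
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(size=(128, 128))
sparsity = 0.8                                    # drop 80% of the weights

def magnitude_mask(w, sparsity):
    k = int(w.size * (1 - sparsity))              # number of weights to keep
    threshold = np.sort(np.abs(w), axis=None)[-k]
    return (np.abs(w) >= threshold).astype(w.dtype)

mask = magnitude_mask(weights, sparsity)
sparse_weights = weights * mask                   # gradients would be masked identically
print("fraction of weights kept:", mask.mean())
```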

DualPrompt: Complementary Prompting for Rehearsal-free Continual Learning
Zifeng Wang, Zizhao Zhang, Sayna Ebrahimi, Ruoxi Sun, Han Zhang, Chen-Yu Lee, Xiaoqi Ren, Guolong Su, Vincent Perot, Jennifer Dy, Tomas Pfister
European Conference on Computer Vision (ECCV), 2022.
[paper] [code]

DualPrompt presents a novel approach that attaches complementary prompts to the pre-trained backbone and formulates the continual learning objective as learning task-invariant and task-specific "instructions".

Learning to Prompt for Continual Learning
Zifeng Wang, Zizhao Zhang, Chen-Yu Lee, Han Zhang, Ruoxi Sun, Xiaoqi Ren, Guolong Su, Vincent Perot, Jennifer Dy, Tomas Pfister
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
[paper] [code] [blog]

We propose a new learning paradigm for continual learning: our method learns to dynamically prompt (L2P) a pre-trained model to learn tasks sequentially under different task transitions.
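
A minimal sketch of the key-query prompt selection mechanism, with toy numpy features in place of a frozen pre-trained encoder; the pool size, prompt length, and shapes are illustrative assumptions rather than the released configuration.

```python
# Select the top-N prompts from a pool by key-query cosine similarity (toy features).
import numpy as np

rng = np.random.default_rng(0)
pool_size, prompt_len, dim, top_n = 10, 5, 32, 3

prompt_keys = rng.normal(size=(pool_size, dim))                 # learnable keys
prompt_values = rng.normal(size=(pool_size, prompt_len, dim))   # learnable prompt tokens

def select_prompts(query_feature):
    # Cosine similarity between the input's feature and every prompt key.
    q = query_feature / np.linalg.norm(query_feature)
    k = prompt_keys / np.linalg.norm(prompt_keys, axis=1, keepdims=True)
    idx = np.argsort(k @ q)[-top_n:]                            # top-N matching prompts
    return prompt_values[idx].reshape(-1, dim)                  # prepended to the input tokens

query = rng.normal(size=dim)                                    # stand-in for a frozen [CLS] feature
print(select_prompts(query).shape)                              # (top_n * prompt_len, dim)
```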

Revisiting Hilbert-Schmidt Information Bottleneck for Adversarial Robustness
Zifeng Wang*, Tong Jian*, Aria Masoomi, Stratis Ioannidis, Jennifer Dy
Neural Information Processing Systems (NeurIPS), 2021.
[paper] [code]
Invited oral presentation at INFORMS 2022

We investigate the HSIC (Hilbert-Schmidt independence criterion) bottleneck as a regularizer for learning an adversarially robust deep neural network classifier, both theoretically and empirically.
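
For concreteness, the snippet below computes the standard biased empirical HSIC estimator with RBF kernels, the quantity such a regularizer is built from; the kernel bandwidth and the random data are arbitrary choices for the example.

```python
# Empirical HSIC with RBF kernels: HSIC ~= tr(K H L H) / (n - 1)^2.
import numpy as np

def rbf_kernel(x, sigma=1.0):
    sq = np.sum(x**2, axis=1, keepdims=True)
    d2 = sq + sq.T - 2 * x @ x.T
    return np.exp(-d2 / (2 * sigma**2))

def hsic(x, y, sigma=1.0):
    n = x.shape[0]
    k, l = rbf_kernel(x, sigma), rbf_kernel(y, sigma)
    h = np.eye(n) - np.ones((n, n)) / n            # centering matrix
    return np.trace(k @ h @ l @ h) / (n - 1) ** 2

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 8))
print("HSIC(x, x):     ", round(hsic(x, x), 4))                         # dependent: clearly positive
print("HSIC(x, random):", round(hsic(x, rng.normal(size=(64, 8))), 4))  # independent: near zero
```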

Learn-Prune-Share for Lifelong Learning
Zifeng Wang*, Tong Jian*, Kaushik Chowdhury, Yanzhi Wang, Jennifer Dy, Stratis Ioannidis
International Conference on Data Mining (ICDM), 2020.
[paper]

We propose a learn-prune-share (LPS) algorithm which addresses the challenges of catastrophic forgetting, parsimony, and knowledge reuse simultaneously.

Open-World Class Discovery with Kernel Networks
Zifeng Wang, Batool Salehi, Andrey Gritsenko, Kaushik Chowdhury, Stratis Ioannidis, Jennifer Dy
International Conference on Data Mining (ICDM), 2020.
Best paper candidate
[paper]

We propose Class Discovery Kernel Network with Expansion (CD-KNet-Exp), a deep learning framework for the open-world class discovery problem.

Invited Talks
Awards
  • Outstanding Student Award in Research, Northeastern University, 2023
  • Scholar Award, NeurIPS 2022
  • Best Paper Candidate, ICDM 2020
  • Best Paper Award, DySPAN 2019
  • Travel Award, DySPAN 2019
  • Travel Award, NeurIPS 2019
  • Dean's Fellowship, Northeastern University, 2018
     Highest honor awarded to new PhD students for outstanding academic background.
  • Evergrande Scholarship, Tsinghua University, 2016
       Awarded to students with excellent academic performance, scientific potential and overall development.
Academic Service

Conference Reviewer: NeurIPS, ICML, ICLR, CVPR, ICCV, ACL ARR
PC Member: SDM
Journal Reviewer: TPAMI, TMLR, Neural Networks

Template Credit: Jon Barron