I’m an M.S. student in Electrical and Computer Engineering (Machine Learning and Data Science) at UC San Diego, where I focus on Generative AI, Multimodal Reasoning, and AI Safety. Previously, I earned my B.Eng. in Intelligent Science and Technology from Sun Yat-sen University, with a GPA of 3.8/4.0 (upper-division 4.0/4.0).

🔬 Research Interests

My research focuses on multimodal generative models, particularly text, image, and video generation, with the goal of achieving semantic alignment and mutual enhancement between modalities.

Recently, I have been especially interested in memory-augmented video generation, exploring how retrieval and structured memory modules can improve long-horizon consistency, physical plausibility, and content controllability in generative models.

Looking forward, I aim to study memory mechanisms in world models and their integration with VLA (Vision-Language-Action) systems, enabling more reliable embodied reasoning, interactive prediction, and agentic planning in dynamic environments.

I am also interested in unified generation and understanding models, with the goal of building general-purpose frameworks that combine multimodal comprehension, reasoning, and generation in a single architecture.

In addition, I am exploring AI safety and robustness, studying how large models can remain reliable and secure under adversarial and backdoor attacks.


🎖 Honors and Awards

  • Merit Scholarship, Sun Yat-sen University (2020–2024)

📖 Education

  • 2024.09 – present, M.S. in Electrical and Computer Engineering (Machine Learning and Data Science), UC San Diego
  • 2020.09 – 2024.06, B.Eng. in Intelligent Science and Technology, Sun Yat-sen University

💻 Internships

  • Research Internship in Biwei Huang’s Lab at UCSD (2025.4 – Present)
    Focused on memory-based video generation and world models, with the aim of extending memory mechanisms to world-model and VLA settings.
  • Internship at Sun Yat-sen University under Professor Xiaodan Liang (2023.10 – 2024.09)
    Conducted research on visual-language pretraining and representation alignment in multimodal learning.

📑 Publications

Learning Plug-and-play Memory for Guiding Video Diffusion Models arXiv

Selena Song*, Ziming Xu*, Zijun Zhang, Kun Zhou, Jiaxian Guo, Lianhui Qin, Biwei Huang

Under Review (CVPR submission)

Steganographic Backdoor Attacks in NLP: Ultra-Low Poisoning and Defense Evasion arXiv

Eric Xue, Ruiyi Zhang, Zijun Zhang, Pengtao Xie

arXiv Preprint

💬 Contact