Mengyu Ye

Ph.D. student @ Tohoku University Fundamental AI lab.


ye.mengyu.s1 [at] dc.tohoku.ac.jp

I am a second-year Ph.D. student in NLP at the Fundamental AI lab (part of the Tohoku NLP Group) at Tohoku University, advised by Prof. Jun Suzuki. I am also a Google PhD Fellow. My research centers on the evaluation of large language models, in particular evaluation reliability and the faithfulness of chain-of-thought (CoT) reasoning. I combine controlled experiments with causal methods to characterize how models reason, how reliably we can measure it, and where they fail.

My work focuses on uncovering overlooked failure modes in reasoning and its evaluation, and on validating their underlying mechanisms, with resulting publications at NeurIPS, ACL, and EMNLP. A central aim is to develop tools and methodologies that make both model capabilities and the metrics used to assess them more predictable, reliable, and controllable.

I have recently extended this evaluation-centric perspective to diffusion language models, identifying an overlooked failure mode in open-ended generation — the regime where diffusion LMs fall furthest behind autoregressive models — and tracing it to a train–inference mismatch induced by standard training objectives. I also explore agentic systems, applying the same experimental discipline; our deep-research agent received the Best Static Evaluation Prize in the MMU-RAG competition at NeurIPS 2025.

news

Jan 30, 2026 Our new paper on relaxing positional alignment in masked diffusion LMs, identifying a key failure mode in open-ended generation, is now on arXiv.
Dec 08, 2025 Our team won the Best Static Evaluation Prize in the MMU-RAG NeurIPS 2025 Competition.
Dec 01, 2025 Released a CLI tool that uses an LLM agent to automatically clean, format, and update BibTeX references.
Oct 24, 2025 I’m honored to receive the 2025 Google PhD Fellowship in Natural Language Processing.
Sep 19, 2025 Our paper demonstrating that key-value memories closely match sparse autoencoders in interpretability has been accepted to NeurIPS 2025.

previous news