
Publications

2026

arXiv

Relaxing Positional Alignment in Masked Diffusion Language Models

Mengyu Ye, Ryosuke Takahashi, Keito Kudo, and Jun Suzuki

2026


Masked diffusion language models (MDLMs) have emerged as a promising alternative to dominant autoregressive approaches. Although they achieve competitive performance on several tasks, a substantial gap remains in open-ended text generation. We hypothesize that one cause of this gap is that strict positional prediction makes MDLM decoding highly sensitive to token misalignment, and we show through controlled interventions that a one-position shift can severely disrupt semantics. This observation suggests that enforcing strict positional supervision during training is misaligned with the irreversible denoising dynamics of MDLM decoding. Motivated by this mismatch, we adopt an alignment-flexible supervision strategy during fine-tuning. Specifically, we introduce a special token <slack> via the connectionist temporal classification objective. We apply this approach to the widely used MDLM model and conduct experiments on five open-ended text generation benchmarks. Our method consistently outperforms the original model and improves robustness to positional shifts, indicating that relaxing strict positional supervision is an important factor in improving generation quality in MDLMs.

2025

NeurIPS

Transformer Key-Value Memories Are Nearly as Interpretable as Sparse Autoencoders

Mengyu Ye, Jun Suzuki, Tatsuro Inaba, and Tatsuki Kuribayashi

In The Thirty-ninth Annual Conference on Neural Information Processing Systems, Dec 2025


Recent interpretability work on large language models (LLMs) is increasingly dominated by a feature discovery approach with the help of proxy modules, where the quality of features learned by, e.g., sparse autoencoders (SAEs), has been evaluated. This paradigm naturally raises a critical question: how much better are the features that such proxies discover, especially compared to those already represented within the original model parameters? Unfortunately, such a comparison has rarely been made so far. In this work, we revisit the interpretability of feature vectors stored in feed-forward (FF) layers, viewing FF layers as key-value memories, with modern interpretability benchmarks. Our extensive evaluation revealed that SAEs and FFs exhibit a similar range of interpretability, although SAEs displayed an observable but minimal improvement in some aspects. Furthermore, in certain aspects, surprisingly, even vanilla FFs yielded better interpretability scores than the SAEs, and features discovered in SAEs and FFs diverged. These findings raise questions about the advantage of SAEs from the perspectives of both feature quality and faithfulness, compared to directly interpreting FF feature vectors.

arXiv

Camellia: Benchmarking Cultural Biases in LLMs for Asian Languages

Tarek Naous, Anagha Savit, Carlos Rafael Catalan, Geyang Guo, Jaehyeok Lee, Kyungdon Lee, Lheane Marie Dizon, Mengyu Ye, and 12 more authors

Dec 2025

ACL Findings

Can Input Attributions Explain Inductive Reasoning in In-Context Learning?

Mengyu Ye, Tatsuki Kuribayashi, Goro Kobayashi, and Jun Suzuki

In Findings of the Association for Computational Linguistics: ACL 2025, Jul 2025


Interpreting the internal process of neural models has long been a challenge. This challenge remains relevant in the era of large language models (LLMs) and in-context learning (ICL); for example, ICL poses a new issue of interpreting which of the few-shot examples contributed to identifying/solving the task. To this end, in this paper, we design synthetic diagnostic tasks of inductive reasoning, inspired by the generalization tests in linguistics; here, most in-context examples are ambiguous w.r.t. their underlying rule, and one critical example disambiguates the task demonstrated. The question is whether conventional input attribution (IA) methods can track such a reasoning process, i.e., identify the influential example, in ICL. Our experiments provide several practical findings; for example, a certain simple IA method works the best, and larger models are generally harder to interpret in ICL with gradient-based IA methods.

2023

EMNLP

Assessing Step-by-Step Reasoning against Lexical Negation: A Case Study on Syllogism

Mengyu Ye, Tatsuki Kuribayashi, Jun Suzuki, Goro Kobayashi, and Hiroaki Funayama

In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (oral), Dec 2023


Large language models (LLMs) take advantage of step-by-step reasoning instructions, e.g., chain-of-thought (CoT) prompting. Building on this, their ability to perform CoT-style reasoning robustly is of interest from a probing perspective. In this study, we inspect the step-by-step reasoning ability of LLMs with a focus on negation, which is a core linguistic phenomenon that is difficult to process. In particular, we introduce several controlled settings (e.g., reasoning in the case of fictional entities) to evaluate the logical reasoning abilities of the models. We observed that dozens of modern LLMs were not robust against lexical negation (e.g., plausible → implausible) when performing CoT-style reasoning, and the results highlight unique limitations in each LLM family.

SemEval

TohokuNLP at SemEval-2023 Task 5: Clickbait Spoiling via Simple Seq2Seq Generation and Ensembling

Hiroto Kurita, Ikumi Ito, Hiroaki Funayama, Shota Sasaki, Shoji Moriya, Mengyu Ye, Kazuma Kokuta, Ryujin Hatakeyama, and 2 more authors

In Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023), Jul 2023


This paper describes our system submitted to SemEval-2023 Task 5: Clickbait Spoiling. We work on spoiler generation in subtask 2 and develop a system that comprises two parts: 1) simple seq2seq spoiler generation and 2) post-hoc model ensembling. Using this simple method, we address the challenge of generating multipart spoilers. On the test set, our submitted system outperformed the baseline by a large margin (approximately 10 BLEU points) for mixed types of spoilers. We also found that our system successfully handled the challenge of multipart spoilers, confirming the effectiveness of our approach.

Projects

2025

CLI Tool LLM Agent System

BibTeX Cleaning Agent

An LLM-based agentic system that automatically cleans and standardizes BibTeX entries. It connects to DBLP, Semantic Scholar, and arXiv to fix formatting, normalize fields, enrich metadata, and update a paper’s official publication and venue information. The tool can generate consistent citation keys, detect and remove duplicates, and produce a JSON file that maps original keys to their cleaned versions. It also reports entries that are missing required fields so users can review and correct them manually.
