AISTATS Poster A Likelihood Based Approach for Watermark Detection

Guanxun Li · Anirban Bhattacharya


Abstract:

Watermarking techniques embed statistical signals within content generated by large language models to help trace its source. Although existing methods perform well on long texts, their effectiveness significantly decreases for shorter texts. We introduce a statistical detection approach that improves the power of watermark detection, particularly in shorter texts. Our method leverages both the watermark key sequence and the next token probabilities (NTPs) to determine whether a text is generated by a large language model. We demonstrate the optimality of our approach and analyze its power properties. We also investigate an approach to estimating NTPs and extend our method to scenarios where texts face potential attacks such as substitutions, insertions, or deletions. We validate the effectiveness of our technique using texts generated by Meta-Llama-3-8B from Meta and Mistral-7B-v0.1 from Mistral AI, utilizing prompts extracted from Google's C4 dataset. In scenarios without attacks and with short text lengths, our method demonstrates approximately 65% power improvement compared to the baseline method on average. We release all code publicly at https://github.com/doccstat/llm-watermark-adaptive.
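To make the idea of combining the watermark key sequence with the NTPs concrete, here is a minimal sketch of a likelihood-ratio statistic. It is not taken from the paper or its repository. It assumes a Gumbel-max (exponential minimum sampling) watermark, in which the generated token maximizes `u_k ** (1 / p_k)` over the vocabulary, and it assumes the true NTPs are known to the detector. Under that assumption the key value of the chosen token has density `(1 / p) * u ** (1 / p - 1)` for watermarked text and is uniform for unwatermarked text, so the log-likelihood ratio is the sum of `-log(p) + (1 / p - 1) * log(u)` over tokens. The function names, the four-token vocabulary, and the simulation setup are all hypothetical.

```python
import math
import random


def gumbel_max_sample(probs, key):
    """Return the index k maximizing key[k] ** (1 / probs[k]) (assumed watermark rule)."""
    best, best_score = None, -math.inf
    for k, (p, u) in enumerate(zip(probs, key)):
        if p <= 0.0:
            continue
        score = math.log(u) / p  # log of u ** (1 / p); monotone, so the argmax is unchanged
        if score > best_score:
            best, best_score = k, score
    return best


def log_likelihood_ratio(key_values, ntps):
    """Sum of per-token log-likelihood ratios, watermarked versus unwatermarked.

    key_values[t] is the key entry of the observed token at step t, in (0, 1].
    ntps[t] is the next token probability of that observed token, in (0, 1].
    """
    total = 0.0
    for u, p in zip(key_values, ntps):
        total += -math.log(p) + (1.0 / p - 1.0) * math.log(u)
    return total


def simulate_llr(watermarked, length, rng, probs=(0.4, 0.3, 0.2, 0.1)):
    """Simulate one toy text and return its statistic (hypothetical fixed NTPs)."""
    key_values, ntps = [], []
    for _ in range(length):
        key = [1.0 - rng.random() for _ in probs]  # uniform on (0, 1], avoids log(0)
        if watermarked:
            token = gumbel_max_sample(probs, key)
        else:
            # Unwatermarked text: the token is drawn independently of the key.
            token = rng.choices(range(len(probs)), weights=probs)[0]
        key_values.append(key[token])
        ntps.append(probs[token])
    return log_likelihood_ratio(key_values, ntps)


if __name__ == "__main__":
    rng = random.Random(0)
    wm = [simulate_llr(True, 20, rng) for _ in range(200)]
    plain = [simulate_llr(False, 20, rng) for _ in range(200)]
    print("mean statistic, watermarked:  ", sum(wm) / len(wm))
    print("mean statistic, unwatermarked:", sum(plain) / len(plain))
```

In this toy setting the statistic tends to be positive for watermarked text and negative for unwatermarked text, even at a length of 20 tokens. In practice the NTPs would have to be estimated, and a rejection threshold would have to be calibrated, for example by recomputing the statistic under freshly drawn keys.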
