
💧 X-SIR: A text watermark that survives translation


Implementation of our paper:

Can Watermarks Survive Translation? On the Cross-lingual Consistency of Text Watermark for Large Language Models

🔥 News

  • [Apr 8, 2024]: New repo released!

Conda environment

Tested with the following environment, but other versions should also work.

  • python 3.10.10
  • pytorch
  • pip3 install -r requirements.txt
  • [optional] pip3 install flash-attn==2.3.3
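
For reference, a minimal setup sketch assuming a fresh conda environment (the environment name and the generic PyTorch install command are illustrative; pick the PyTorch build that matches your CUDA version):

# Create and activate a fresh environment (name is arbitrary)
conda create -n x-sir python=3.10.10 -y
conda activate x-sir

# Install PyTorch (choose the build matching your CUDA setup)
pip3 install torch

# Install the remaining dependencies
pip3 install -r requirements.txt

# Optional: flash attention for faster inference
pip3 install flash-attn==2.3.3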

Overview

  • src_watermark implements three text watermarking methods (X-SIR, SIR, and KGW) with a unified interface.
  • attack contains two watermark removal methods: paraphrase and translation.
  • Scripts:
    • gen.py: generate text with watermark
    • detect.py: compute z-score for given texts
    • eval_detection.py: calculate AUC, TPR, and F1 for watermark detection
    • You can use --help to see the full usage of these scripts (see the example after this list).
  • Supported models:
    • meta-llama/Llama-2-7b-hf
    • baichuan-inc/Baichuan2-7B-Base
    • baichuan-inc/Baichuan-7B
    • mistralai/Mistral-7B-v0.1
  • Supported languages: English (En), German (De), French (Fr), Chinese (Zh), Japanese (Ja)
  • You can learn how to extend the supported models and languages in from-scratch.md.
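
For example, to inspect all available options of each script:

python3 gen.py --help
python3 detect.py --help
python3 eval_detection.py --help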

Usage (No attack)

Generate text with watermark

MODEL_NAME=baichuan-inc/Baichuan-7B
MODEL_ABBR=baichuan-7b
TRANSFORM_MODEL=data/model/transform_model_x-sbert_10K.pth
MAPPING_FILE=data/mapping/xsir/300_mapping_$MODEL_ABBR.json

WATERMARK_METHOD_FLAG="--watermark_method xsir  --transform_model $TRANSFORM_MODEL --embedding_model paraphrase-multilingual-mpnet-base-v2 --mapping_file $MAPPING_FILE"

python3 gen.py \
    --base_model $MODEL_NAME \
    --fp16 \
    --batch_size 32 \
    --input_file data/dataset/mc4/mc4.en.jsonl \
    --output_file gen/$MODEL_ABBR/xsir/mc4.en.mod.jsonl \
    $WATERMARK_METHOD_FLAG

Compute the z-scores

# Compute z-score for human-written text
python3 detect.py \
    --base_model $MODEL_NAME \
    --detect_file data/dataset/mc4/mc4.en.jsonl \
    --output_file gen/$MODEL_ABBR/xsir/mc4.en.hum.z_score.jsonl \
    $WATERMARK_METHOD_FLAG

# Compute z-score for watermarked text
python3 detect.py \
    --base_model $MODEL_NAME \
    --detect_file gen/$MODEL_ABBR/xsir/mc4.en.mod.jsonl \
    --output_file gen/$MODEL_ABBR/xsir/mc4.en.mod.z_score.jsonl \
    $WATERMARK_METHOD_FLAG

Evaluation

python3 eval_detection.py \
	--hm_zscore gen/$MODEL_ABBR/xsir/mc4.en.hum.z_score.jsonl \
	--wm_zscore gen/$MODEL_ABBR/xsir/mc4.en.mod.z_score.jsonl

AUC: 0.994

TPR@FPR=0.1: 0.994
TPR@FPR=0.01: 0.862

F1@FPR=0.1: 0.955
F1@FPR=0.01: 0.921

Usage (With attack)

Here we test watermark detection after translating the watermarked text into other languages (De, Fr, Zh, Ja).

Preparation

We use ChatGPT to perform paraphrasing and translation. Therefore:

  • Set your OpenAI API key: export OPENAI_API_KEY=xxxx
  • You may also want to adjust the RPM (requests per minute) and TPM (tokens per minute) limits in attack/const.py.

Translation

TGT_LANGS=("de" "fr" "zh" "ja")
for TGT_LANG in "${TGT_LANGS[@]}"; do
    python3 attack/translate.py \
        --input_file gen/$MODEL_ABBR/xsir/mc4.en.mod.jsonl \
        --output_file gen/$MODEL_ABBR/xsir/mc4.en-$TGT_LANG.mod.jsonl \
        --model gpt-3.5-turbo-1106 \
        --src_lang en \
        --tgt_lang $TGT_LANG
done

Compute the z-scores

for TGT_LANG in "${TGT_LANGS[@]}"; do
    python3 detect.py \
        --base_model $MODEL_NAME \
        --detect_file gen/$MODEL_ABBR/xsir/mc4.en-$TGT_LANG.mod.jsonl \
        --output_file gen/$MODEL_ABBR/xsir/mc4.en-$TGT_LANG.mod.z_score.jsonl \
        $WATERMARK_METHOD_FLAG
done

Evaluation

for TGT_LANG in "${TGT_LANGS[@]}"; do
    echo "En->$TGT_LANG"
    python3 eval_detection.py \
        --hm_zscore gen/$MODEL_ABBR/xsir/mc4.en.hum.z_score.jsonl \
        --wm_zscore gen/$MODEL_ABBR/xsir/mc4.en-$TGT_LANG.mod.z_score.jsonl
done

En->de
AUC: 0.769

TPR@FPR=0.1: 0.318
TPR@FPR=0.01: 0.060

F1@FPR=0.1: 0.450
F1@FPR=0.01: 0.112

En->fr
AUC: 0.810

TPR@FPR=0.1: 0.354
TPR@FPR=0.01: 0.046

F1@FPR=0.1: 0.488
F1@FPR=0.01: 0.087

En->zh
AUC: 0.905

TPR@FPR=0.1: 0.702
TPR@FPR=0.01: 0.182

F1@FPR=0.1: 0.781
F1@FPR=0.01: 0.305

En->ja
AUC: 0.911

TPR@FPR=0.1: 0.696
TPR@FPR=0.01: 0.112

F1@FPR=0.1: 0.775
F1@FPR=0.01: 0.200

Other watermarking methods

You can use the following flags to specify the watermarking method:

KGW

WATERMARK_METHOD_FLAG="--watermark_method kgw"

SIR

MODEL_NAME=baichuan-inc/Baichuan-7B
MODEL_ABBR=baichuan-7b
TRANSFORM_MODEL=data/model/transform_model_x-sbert_10K.pth
MAPPING_FILE=data/mapping/sir/300_mapping_$MODEL_ABBR.json

WATERMARK_METHOD_FLAG="--watermark_method sir  --transform_model $TRANSFORM_MODEL --embedding_model paraphrase-multilingual-mpnet-base-v2 --mapping_file $MAPPING_FILE"
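
These flags drop into the same gen.py, detect.py, and eval_detection.py commands shown above. For example, a minimal sketch of generation with KGW (the gen/$MODEL_ABBR/kgw/ output path is just an illustrative choice):

WATERMARK_METHOD_FLAG="--watermark_method kgw"

python3 gen.py \
    --base_model $MODEL_NAME \
    --fp16 \
    --batch_size 32 \
    --input_file data/dataset/mc4/mc4.en.jsonl \
    --output_file gen/$MODEL_ABBR/kgw/mc4.en.mod.jsonl \
    $WATERMARK_METHOD_FLAG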

Acknowledgement

This work could not have been done without the help of the following repos:

Citation

@article{he2024can,
  title={Can Watermarks Survive Translation? On the Cross-lingual Consistency of Text Watermark for Large Language Models},
  author={He, Zhiwei and Zhou, Binglin and Hao, Hongkun and Liu, Aiwei and Wang, Xing and Tu, Zhaopeng and Zhang, Zhuosheng and Wang, Rui},
  journal={arXiv preprint arXiv:2402.14007},
  year={2024}
}