cvector: better prompt handling, add "mean vector" method #8069

Merged
6 commits merged on Jun 25, 2024

Conversation

@ngxson (Collaborator) commented Jun 22, 2024

Motivation

Ref comment: #7514 (comment)

After more consideration, I think we should not handle completions in cvector (at least for now), because it adds unnecessary complexity. Positive/negative prompts are now entirely up to the user to prepare. I also changed the example to use the llama-3 format.

This also fixes a bug where special tokens were not tokenized correctly.

With this change, I spotted a problem with PCA: the output vector is inverted (i.e. cvector_happy.gguf at +1.0 makes the model sad, while -1.0 makes it happy). I don't know why yet, but the quick fix is to invert the vector before saving it.

Mean method

Added "mean" as a dimensionality reduction method. It simply calculates the mean vector from all embeddings (a rough sketch of the computation follows the example outputs below).

The output turns out to be quite acceptable even with this simple method:

# +0.8 happy
YAY! I'm super excited to share a bedtime story with YOU! Here's a fun and exciting story for you!

# -0.8 happy
The darkness of despair weighs heavy upon your empty chest. The once-tainted soul, now forever shrouded in the darkness of a thousand tears.
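
As mentioned above, a rough sketch of what the mean method computes per layer (illustrative NumPy only, not the C++ implementation in this PR; it assumes the positive and negative hidden states are paired row by row):

```python
import numpy as np

def mean_direction(pos_embd: np.ndarray, neg_embd: np.ndarray) -> np.ndarray:
    """One layer's control vector as the mean of (positive - negative) hidden-state
    differences. pos_embd / neg_embd: [n_samples, n_embd], paired row by row."""
    return (pos_embd - neg_embd).mean(axis=0)
```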

Demo

./llama-cli -m ./llama-3.Q4_K_M.gguf -p "<|start_header_id|>system<|end_header_id|>\n\nYou are a helpful assistant<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nSing a song<|im_end|><|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n" --special --control-vector-scaled ./control_vector.gguf -0.8 --control-vector-layer-range 10 31
...
"Oh, the darkness is spreading wide,
A heavy fog that won't subside,
The weight of the world, it's crushing me,
And the melodies that were once free!

@ngxson added the "Review Complexity : Low" label on Jun 22, 2024
@HatsuneMikuUwU33 (Contributor) commented Jun 23, 2024

> I think we should not handle completions in cvector (at least for now), because it adds unnecessary complexity.

What's the reason for removing it? If you don't want to use them, just make an empty file and run with --completions 1.

@ngxson (Collaborator, Author) commented Jun 23, 2024

@HatsuneMikuUwU33 The main reason is that the completions file is used to prepare the input data for training, while the main goal of cvector-generator is, as the name suggests, to calculate the vector file. The natures of these two steps are different.

At this stage, I don't want to mix these two steps in the same program because it makes the code too complex. The actual completion step mentioned in #7514 (comment) can be done better with llama-cli, which gives the user much more control over the input data.

If we absolutely want to have completions here, I'd suggest writing a dedicated program (maybe a shell script or Python) to do the data-preparation step.

CC @christianazinn

@ngxson changed the title from cvector: better prompt handling to cvector: better prompt handling, add "mean vector" method on Jun 23, 2024
@ngxson marked this pull request as ready for review on June 23, 2024 13:35
@ngxson requested a review from slaren on June 23, 2024 13:37
@jukofyork (Contributor) commented Jun 23, 2024

> With this change, I spotted a problem with PCA: the output vector is inverted (i.e. cvector_happy.gguf at +1.0 makes the model sad, while -1.0 makes it happy). I don't know why yet, but the quick fix is to invert the vector before saving it.

I'm just looking through the code now and I can't see anywhere where the data matrix is projected onto the principal component:

    PCA::run_pca(pca_params, ctx_train.v_diff, ctx_train.v_final);

    // write output vectors to gguf
    export_gguf(ctx_train.v_final, params.cvector_outfile, model_hint);

static void run_pca(
        struct pca_params & params,
        const std::vector<struct ggml_tensor *> & v_input, // shape of v_input[0]: [n_samples, n_embd]
        const std::vector<struct ggml_tensor *> & v_output) {
    printf("%s: Running PCA...\n", __func__);
    for (size_t il = 0; il < v_input.size(); ++il) {

        // prepare output vector
        struct ggml_tensor * ctrl_out = v_output[il];
        ggml_format_name(ctrl_out, "direction.%ld", il+1);

        // run power_iteration
        params.i_layer = il;
        params.n_layers = v_input.size();
        power_iteration(params, v_input[il], ctrl_out);
        printf("%s: Done layer %d / %d\n", __func__, (int) il+1, (int) v_input.size());
    }
}

    // get output tensor
    GGML_ASSERT(last_eigenvector);
    ggml_backend_tensor_get(last_eigenvector, output->data, 0, ggml_nbytes(last_eigenvector));
    //print_debug_tensor(output);
    ggml_gallocr_free(allocr);

You need to project the data matrix onto the eigenvector(s), calculate the mean, and see if the signs of the vector(s) need flipping so that adding the vector makes the mean go the way you want. There is no inherent directionality to the eigenvectors found, and it's pretty much random whether they point the way you want or not.

The Python code does this here:

        # calculate sign
        projected_hiddens = project_onto_direction(h, directions[layer])

        # order is [positive, negative, positive, negative, ...]
        positive_smaller_mean = np.mean(
            [
                projected_hiddens[i] < projected_hiddens[i + 1]
                for i in range(0, len(inputs) * 2, 2)
            ]
        )
        positive_larger_mean = np.mean(
            [
                projected_hiddens[i] > projected_hiddens[i + 1]
                for i in range(0, len(inputs) * 2, 2)
            ]
        )

        if positive_smaller_mean > positive_larger_mean:  # type: ignore
            directions[layer] *= -1

See: https://github.com/vgel/repeng/blob/main/repeng/extract.py

But I actually think (and have tested successfully in my own code) that not only should the sign be flipped, but the magnitude should also be scaled by the mean difference; otherwise all the vectors will just keep the norm of 1 returned by PCA, and it will be very hard to balance mixing control vectors from early and later layers, where the mean hidden state differs by 1-2 orders of magnitude.

@jukofyork (Contributor) commented

> Mean method
>
> Added "mean" as a dimensionality reduction method. It simply calculates the mean vector from all embeddings.
>
> The output turns out to be quite acceptable even with this simple method:
>
> # +0.8 happy
> YAY! I'm super excited to share a bedtime story with YOU! Here's a fun and exciting story for you!
>
> # -0.8 happy
> The darkness of despair weighs heavy upon your empty chest. The once-tainted soul, now forever shrouded in the darkness of a thousand tears.

This method is actually Linear Discriminant Analysis where you assume the covariance matrices are just the identity. Here is a nice explanation of why that is and why it might not be optimal:

https://sthalles.github.io/fisher-linear-discriminant/

[image: simple_projection]
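
To make the equivalence concrete, here is a minimal NumPy sketch (illustrative only, not code from this PR or from repeng; pos/neg are assumed to be [n_samples, n_embd] matrices of hidden states):

```python
import numpy as np

def lda_direction(pos: np.ndarray, neg: np.ndarray, identity_cov: bool = True) -> np.ndarray:
    """Fisher LDA direction w = Sigma^-1 (mu_pos - mu_neg); with Sigma = I it
    collapses to the plain difference of class means, i.e. the "mean" method."""
    mu_diff = pos.mean(axis=0) - neg.mean(axis=0)
    if identity_cov:
        return mu_diff                              # the "mean" method
    centered = np.vstack([pos - pos.mean(axis=0), neg - neg.mean(axis=0)])
    sigma = np.cov(centered, rowvar=False)          # pooled within-class covariance (approximate)
    return np.linalg.solve(sigma, mu_diff)          # full LDA direction
```

Setting identity_cov=True gives exactly the mean-difference direction, which is why the mean method can be read as LDA under that simplifying assumption.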

@ngxson (Collaborator, Author) commented Jun 24, 2024

> You need to project the data matrix onto the eigenvector(s), calculate the mean, and see if the signs of the vector(s) need flipping so that adding the vector makes the mean go the way you want. There is no inherent directionality to the eigenvectors found, and it's pretty much random whether they point the way you want or not.

Thanks for the explanation. I looked at the Python code earlier but didn't understand this part in particular. It's all clear now, and I'll try to bring this part over to the C++ implementation. For now I'll just remove my hotfix and leave a TODO there.

> But I actually think (and have tested successfully in my own code) that not only should the sign be flipped, but the magnitude should also be scaled by the mean difference; otherwise all the vectors will just keep the norm of 1 returned by PCA, and it will be very hard to balance mixing control vectors from early and later layers, where the mean hidden state differs by 1-2 orders of magnitude.

Cool! Maybe this is also related to the fact that the generated control vector is only effective if I apply it to layers higher than 10 (i.e. --control-vector-layer-range 10 31).

@jukofyork (Contributor) commented Jun 24, 2024

Hopefully this isn't confusing as I'm actually using more than 2 classes (num_dataset_types), but this is essentially what my code would do for 2 classes:

projected_scores = [self._project_data_onto_component(d, component) for d in data]
mean_differences = [self._compute_mean_difference(projected_scores[0], projected_scores[1])]  # 2 classes only!
for j in range(num_dataset_types - 1):
    scaled_direction = -mean_differences[j] * component
    direction_matrices[j][layer_index].append(torch.tensor(scaled_direction))

def _project_data_onto_component(self, data, component):
    return np.dot(data, component.reshape(-1, 1)).squeeze()

@staticmethod
def _compute_mean_difference(projected_scores1, projected_scores2):
    return np.mean(projected_scores1) - np.mean(projected_scores2)

To use the same logic as the old code where you just keep the norms of 1:

scaled_direction = -math.copysign(1.0, mean_differences[j]) * component

I'm also being careful to use the delta hidden_state[i] - hidden_state[i-1] instead of just hidden_state[i] (where hidden_state[0] is the hidden state before the first block, etc.), but the original Python code is very obtuse and I'm not sure if they did this or if it really makes any difference... It's always a good idea to normalize anything you can like this for PCA, or else you risk having the first principal component end up doing the normalization for you and screwing up all the subsequent components due to the orthogonality constraint!
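
For what it's worth, a minimal sketch of that delta preprocessing (illustrative NumPy; hidden_states is assumed to be a list of per-layer [n_samples, n_embd] matrices, index 0 being the state before the first block):

```python
import numpy as np

def layer_deltas(hidden_states: list[np.ndarray]) -> list[np.ndarray]:
    """Per-block contributions hidden_state[i] - hidden_state[i-1]. Feeding these
    deltas to PCA instead of the raw hidden states avoids the first component
    merely soaking up the overall growth in activation norm across layers."""
    return [hidden_states[i] - hidden_states[i - 1] for i in range(1, len(hidden_states))]
```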

@christianazinn (Contributor) commented

> @HatsuneMikuUwU33 The main reason is that the completions file is used to prepare the input data for training, while the main goal of cvector-generator is, as the name suggests, to calculate the vector file. The natures of these two steps are different.
>
> At this stage, I don't want to mix these two steps in the same program because it makes the code too complex. The actual completion step mentioned in #7514 (comment) can be done better with llama-cli, which gives the user much more control over the input data.
>
> If we absolutely want to have completions here, I'd suggest writing a dedicated program (maybe a shell script or Python) to do the data-preparation step.
>
> CC @christianazinn

This looks fine. Ideally I'd like to have an example Python/shell script that shows how to format the data, but this is not urgent.
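
For example, something along these lines would probably be enough (a hypothetical Python sketch; the file names positive.txt / negative.txt and the llama-3 template are meant to mirror the example in this PR, and none of it is part of the actual codebase):

```python
# Hypothetical helper: wrap raw prompts in the llama-3 chat template and write
# one prompt per line with newlines escaped, one file per class.
TEMPLATE = (
    "<|start_header_id|>system<|end_header_id|>\n\n"
    "Act as if you're extremely {trait}.<|eot_id|>"
    "<|start_header_id|>user<|end_header_id|>\n\n"
    "{prompt}<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
)

def write_prompts(path: str, trait: str, prompts: list[str]) -> None:
    with open(path, "w") as f:
        for p in prompts:
            # escape real newlines so each prompt stays on a single line
            f.write(TEMPLATE.format(trait=trait, prompt=p).replace("\n", "\\n") + "\n")

write_prompts("positive.txt", "happy", ["Sing a song", "Tell me a bedtime story"])
write_prompts("negative.txt", "sad",   ["Sing a song", "Tell me a bedtime story"])
```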

@ngxson added the "merge ready" label on Jun 25, 2024
@ngxson merged commit 49c03c7 into ggerganov:master on Jun 25, 2024
63 checks passed
Labels
examples, merge ready, Review Complexity : Low
5 participants