
How do we compute eigenvalues of Hessian matrices for each weight matrix or each module in a model, rather than a single eigenvalue for the whole model? #10

Open
Phuoc-Hoan-Le opened this issue Nov 9, 2021 · 10 comments

Comments


Phuoc-Hoan-Le commented Nov 9, 2021

Because it seems that the function which calculates the eigenvalues only returns one eigenvalue for the whole model.
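For context, that single value is what power iteration produces when the inner product is summed over every parameter tensor. A minimal self-contained sketch in plain PyTorch (not the repo's exact code; `top_eigenvalue` is a hypothetical name) shows why only one scalar can come out:

```python
import torch

def group_product(xs, ys):
    # Sums the per-tensor inner products over ALL parameters,
    # collapsing the model-wide Hessian quadratic form to one scalar.
    return sum(torch.sum(x * y) for x, y in zip(xs, ys))

def top_eigenvalue(loss, params, iters=50):
    """Power iteration for the top Hessian eigenvalue of the whole model."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    v = [torch.randn_like(p) for p in params]
    norm = torch.sqrt(group_product(v, v))
    v = [vi / norm for vi in v]
    eig = 0.0
    for _ in range(iters):
        # Hessian-vector product over the full parameter list
        hv = torch.autograd.grad(grads, params, grad_outputs=v, retain_graph=True)
        eig = group_product(hv, v).item()  # Rayleigh quotient: a single number
        norm = torch.sqrt(group_product(hv, hv)) + 1e-6
        v = [h / norm for h in hv]
    return eig
```

Because `group_product()` collapses the list, the Rayleigh quotient is one number for the whole parameter set, not one per weight matrix.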

@Phuoc-Hoan-Le Phuoc-Hoan-Le changed the title How do we compute eigenvalues of Hessian matrices and know which eigenvalues correspond to which weight matrices or to which module in a model? How do we compute eigenvalues of Hessian matrices for each weight matrices or for each module in a model rather than calculating the eigenvalue for the whole model?? Nov 27, 2021
@345308394

I also encountered the same problem. Have you solved it? Or can we discuss it?


@Phuoc-Hoan-Le (Author)


We can discuss it.

@345308394


To calculate the maximum eigenvalue of the Hessian with respect to the weights, you first need the weights and their first-order gradients. The function get_params_grad(model) returns all the weights and the corresponding gradients. My approach is to modify this function so that it returns the weights of each block together with their gradients, and then compute the maximum Hessian eigenvalue block by block.
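A minimal sketch of that idea (the helper name `get_params_grad_per_block` is mine, not the repo's): instead of one flat list over `model.parameters()`, group the weights and gradients by top-level child module.

```python
import torch

def get_params_grad_per_block(model):
    """Return (block_name, params, grads) for each top-level child module."""
    blocks = []
    for name, module in model.named_children():
        params, grads = [], []
        for p in module.parameters():
            if not p.requires_grad:
                continue
            params.append(p)
            # Mirror PyHessian's convention: 0 when no gradient is populated
            grads.append(0. if p.grad is None else p.grad + 0.)
        if params:  # skip parameter-free blocks such as activations
            blocks.append((name, params, grads))
    return blocks
```

Each (params, grads) pair can then be fed to the existing power-iteration loop to get one top eigenvalue per block.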


Phuoc-Hoan-Le commented Dec 2, 2021

Wow, that sounds like a complicated solution.

The way I solved it: if you look at the eigenvalues() function, you will see that the final eigenvalue is a single value because group_product() returns the sum over the whole list rather than the list itself. This suggests that the per-weight-matrix quantities are effectively already computed, and are simply summed to produce one eigenvalue for the whole model. Also, in eigenvalues(), the eigenvectors are returned as a list of lists of tensors: each element of the outer list holds the n-th eigenvector for every weight matrix, so the first element holds the first eigenvector of each weight matrix, and so on.

Note that when you modify group_product(), normalization() is affected as well, so you have to change it too (or introduce a new function) to make it work. I did not have to change get_params_grad(model).
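Concretely, the two changes described above might look like this (a sketch assuming PyHessian's list-of-tensors convention for `group_product()` and `normalization()`; the `_per_layer` names are mine):

```python
import torch

def group_product_per_layer(xs, ys):
    # Return one inner product per parameter tensor instead of their sum,
    # so power iteration yields one eigenvalue estimate per weight matrix.
    return [torch.sum(x * y) for x, y in zip(xs, ys)]

def normalization_per_layer(vs):
    # Normalize each parameter's direction independently; the original
    # normalization() divides every tensor by one global norm.
    return [v / (torch.norm(v) + 1e-6) for v in vs]
```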

@345308394


Can we exchange our calculations?


huanmei9 commented Jan 7, 2022

I also encountered the same question. Have you solved it? @345308394 @CharlesLeeeee

@xchuwenbo

Hi! I also find this question important!

I just want to see the layer-wise eigenvalues of a specific model.


katayoun-cadence commented Feb 3, 2022

@345308394 @CharlesLee did your code solve this issue? Can we exchange the calculations?

@BiaoFangAIA

I know how to calculate each layer's Hessian trace:

```python
def get_trace(self, maxIter=100, tol=1e-3):
    """Estimate the Hessian trace of each layer with Hutchinson's method."""
    device = self.device
    traces_vhv = []  # results for all layers
    seed = 1
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    for (i_grad, i_param, (module_name, _)) in zip(
            self.gradsH, self.params, self.model.named_modules()):
        v = [torch.randint_like(i_param, high=2, device=device)]
        for v_i in v:
            v_i[v_i == 0] = -1  # Rademacher vector with entries in {-1, +1}
        i_v = v
        trace_vhv = []
        trace = 0.
        trace_pair = {"layer_name": " ", "trace": 0}
        self.model.zero_grad()
        for i in range(maxIter):
            if len(i_grad.shape) > 1:  # note: skips 1-D parameters such as biases
                hv = hessian_vector_product(i_grad, i_param, i_v)
                trace_vhv.append(group_product(hv, i_v).cpu().item())
                if abs(np.mean(trace_vhv) - trace) / (abs(trace) + 1e-6) < tol:
                    trace_pair["layer_name"] = module_name
                    trace_pair["trace"] = np.mean(trace_vhv)
                    traces_vhv.append(trace_pair)
                    break
                else:
                    trace = np.mean(trace_vhv)
    return traces_vhv
```

@EdenBelouadah

@BiaoFangAIA


Did you solve the issue?

The proposed solution does not work: my ViT model contains 75 layers, each with a weight and a bias, but your code returns a list with only 52 traces.

Thanks.
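One plausible cause of the 52-vs-75 mismatch (my reading, not confirmed in the thread): the zip pairs self.gradsH/self.params (one entry per parameter tensor) with model.named_modules() (one entry per module, including container modules), and the `len(i_grad.shape) > 1` guard additionally drops every 1-D tensor, i.e. all biases. A sketch that instead pairs each gradient with its parameter name (`trace_per_parameter` is a hypothetical name; `gradsH` is assumed to hold `create_graph=True` gradients in `model.parameters()` order):

```python
import numpy as np
import torch

def trace_per_parameter(model, gradsH, maxIter=100, tol=1e-3):
    """Hutchinson trace estimate for every parameter tensor, biases included.

    gradsH: gradients computed with create_graph=True, in
    model.parameters() order. Sketch, not the repo's code.
    """
    results = []
    named = [(n, p) for n, p in model.named_parameters() if p.requires_grad]
    for (name, param), grad in zip(named, gradsH):
        trace_samples, prev = [], 0.0
        for _ in range(maxIter):
            v = torch.randint_like(param, high=2)
            v[v == 0] = -1  # Rademacher vector in {-1, +1}
            # Hessian-vector product restricted to this parameter's block
            hv = torch.autograd.grad(grad, param, grad_outputs=v,
                                     retain_graph=True)[0]
            trace_samples.append(torch.sum(hv * v).item())
            est = float(np.mean(trace_samples))
            if abs(est - prev) / (abs(prev) + 1e-6) < tol:
                break
            prev = est
        results.append({"layer_name": name,
                        "trace": float(np.mean(trace_samples))})
    return results
```

Because every entry of named_parameters() gets a result and nothing is filtered by shape, the returned list should have exactly one trace per weight and per bias.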


7 participants