
[budecosystem/code-millenials-1b] Issue with getting coherent results #1

Open

samveen opened this issue Mar 28, 2024 · 0 comments

samveen commented Mar 28, 2024

I'm trying to run code-millenials-1b, but I'm unable to get any coherent responses. Please provide the following information to help me figure out the root cause:

  • CPU requirements, if any
  • Memory requirements, if any
  • GPU requirements, in addition to the CPU requirements above
  • A set of example prompts and their associated expected results, as tested by you, for comparison against the results in my environment

As of now, I'm using the example code from the repository, shown below:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model from a local copy of the checkpoint
tokenizer = AutoTokenizer.from_pretrained("./code-millenials-1b")
model = AutoModelForCausalLM.from_pretrained("./code-millenials-1b")

# Prompt template from the model card
template = """A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
### Instruction: {instruction} ### Response:"""

instruction = "please write a function  taking a string as input and printing 'hello world' postfixed with the input string"

prompt = template.format(instruction=instruction)

# Tokenize the prompt and generate up to 128 tokens total (prompt + completion)
inputs = tokenizer(prompt, return_tensors="pt")
sample = model.generate(**inputs, max_length=128)
print(tokenizer.decode(sample[0]))
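
For reference, here is a variant of the generate call I also intend to test, with an explicit pad token and greedy decoding so that runs are deterministic and comparable across environments. The max_new_tokens and pad_token_id settings are my additions, not part of the repository example; it reuses model, tokenizer, and inputs from the snippet above.

# Deterministic generation for an apples-to-apples comparison; generate()
# is greedy by default, so do_sample=False mainly documents the intent.
sample = model.generate(
    **inputs,
    max_new_tokens=128,                   # budget for newly generated tokens only
    do_sample=False,                      # no sampling noise between runs
    pad_token_id=tokenizer.eos_token_id,  # silences the missing-pad-token warning
)
print(tokenizer.decode(sample[0], skip_special_tokens=True))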

My result from the repository example is as below:

A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
### Instruction: please write a function  taking a string as input and printing 'hello world' postfixed with the input string ### Response: downstream Sabbathements censored Lect UkrainianLooks Membershall CASerylements censored bilateralphan circ Blaz presc NvidiaCover Din Kardisites Chronimeter Laure TDDesign McDhall CASerylPointicide butterfly censored Lect Ukrainianroo unwillingness cmd undergradEngland Slovhall CASeryl mysteries Ukrainianroo censored bilateralphan CollectorFrame notingオ CAS Ukrainianroo censored bilateralphan circ Blaz presc disparateオ

The complete output of the example code is as below:

~/workspace/locallm$ python3 test.py 
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Some weights of the model checkpoint at ./code-millenials-1b were not used when initializing PhiForCausalLM: ['lm_head.linear.bias', 'lm_head.linear.weight', 'lm_head.ln.bias', 'lm_head.ln.weight', 'transformer.embd.wte.weight', 'transformer.h.0.ln.bias', 'transformer.h.0.ln.weight', 'transformer.h.0.mixer.Wqkv.bias', 'transformer.h.0.mixer.Wqkv.weight', 'transformer.h.0.mixer.out_proj.bias', 'transformer.h.0.mixer.out_proj.weight', 'transformer.h.0.mlp.fc1.bias', 'transformer.h.0.mlp.fc1.weight', 'transformer.h.0.mlp.fc2.bias', 'transformer.h.0.mlp.fc2.weight', 'transformer.h.1.ln.bias', 'transformer.h.1.ln.weight', 'transformer.h.1.mixer.Wqkv.bias', 'transformer.h.1.mixer.Wqkv.weight', 'transformer.h.1.mixer.out_proj.bias', 'transformer.h.1.mixer.out_proj.weight', 'transformer.h.1.mlp.fc1.bias', 'transformer.h.1.mlp.fc1.weight', 'transformer.h.1.mlp.fc2.bias', 'transformer.h.1.mlp.fc2.weight', 'transformer.h.10.ln.bias', 'transformer.h.10.ln.weight', 'transformer.h.10.mixer.Wqkv.bias', 'transformer.h.10.mixer.Wqkv.weight', 'transformer.h.10.mixer.out_proj.bias', 'transformer.h.10.mixer.out_proj.weight', 'transformer.h.10.mlp.fc1.bias', 'transformer.h.10.mlp.fc1.weight', 'transformer.h.10.mlp.fc2.bias', 'transformer.h.10.mlp.fc2.weight', 'transformer.h.11.ln.bias', 'transformer.h.11.ln.weight', 'transformer.h.11.mixer.Wqkv.bias', 'transformer.h.11.mixer.Wqkv.weight', 'transformer.h.11.mixer.out_proj.bias', 'transformer.h.11.mixer.out_proj.weight', 'transformer.h.11.mlp.fc1.bias', 'transformer.h.11.mlp.fc1.weight', 'transformer.h.11.mlp.fc2.bias', 'transformer.h.11.mlp.fc2.weight', 'transformer.h.12.ln.bias', 'transformer.h.12.ln.weight', 'transformer.h.12.mixer.Wqkv.bias', 'transformer.h.12.mixer.Wqkv.weight', 'transformer.h.12.mixer.out_proj.bias', 'transformer.h.12.mixer.out_proj.weight', 'transformer.h.12.mlp.fc1.bias', 'transformer.h.12.mlp.fc1.weight', 'transformer.h.12.mlp.fc2.bias', 'transformer.h.12.mlp.fc2.weight', 'transformer.h.13.ln.bias', 'transformer.h.13.ln.weight', 'transformer.h.13.mixer.Wqkv.bias', 'transformer.h.13.mixer.Wqkv.weight', 'transformer.h.13.mixer.out_proj.bias', 'transformer.h.13.mixer.out_proj.weight', 'transformer.h.13.mlp.fc1.bias', 'transformer.h.13.mlp.fc1.weight', 'transformer.h.13.mlp.fc2.bias', 'transformer.h.13.mlp.fc2.weight', 'transformer.h.14.ln.bias', 'transformer.h.14.ln.weight', 'transformer.h.14.mixer.Wqkv.bias', 'transformer.h.14.mixer.Wqkv.weight', 'transformer.h.14.mixer.out_proj.bias', 'transformer.h.14.mixer.out_proj.weight', 'transformer.h.14.mlp.fc1.bias', 'transformer.h.14.mlp.fc1.weight', 'transformer.h.14.mlp.fc2.bias', 'transformer.h.14.mlp.fc2.weight', 'transformer.h.15.ln.bias', 'transformer.h.15.ln.weight', 'transformer.h.15.mixer.Wqkv.bias', 'transformer.h.15.mixer.Wqkv.weight', 'transformer.h.15.mixer.out_proj.bias', 'transformer.h.15.mixer.out_proj.weight', 'transformer.h.15.mlp.fc1.bias', 'transformer.h.15.mlp.fc1.weight', 'transformer.h.15.mlp.fc2.bias', 'transformer.h.15.mlp.fc2.weight', 'transformer.h.16.ln.bias', 'transformer.h.16.ln.weight', 'transformer.h.16.mixer.Wqkv.bias', 'transformer.h.16.mixer.Wqkv.weight', 'transformer.h.16.mixer.out_proj.bias', 'transformer.h.16.mixer.out_proj.weight', 'transformer.h.16.mlp.fc1.bias', 'transformer.h.16.mlp.fc1.weight', 'transformer.h.16.mlp.fc2.bias', 'transformer.h.16.mlp.fc2.weight', 'transformer.h.17.ln.bias', 'transformer.h.17.ln.weight', 'transformer.h.17.mixer.Wqkv.bias', 'transformer.h.17.mixer.Wqkv.weight', 'transformer.h.17.mixer.out_proj.bias', 
'transformer.h.17.mixer.out_proj.weight', 'transformer.h.17.mlp.fc1.bias', 'transformer.h.17.mlp.fc1.weight', 'transformer.h.17.mlp.fc2.bias', 'transformer.h.17.mlp.fc2.weight', 'transformer.h.18.ln.bias', 'transformer.h.18.ln.weight', 'transformer.h.18.mixer.Wqkv.bias', 'transformer.h.18.mixer.Wqkv.weight', 'transformer.h.18.mixer.out_proj.bias', 'transformer.h.18.mixer.out_proj.weight', 'transformer.h.18.mlp.fc1.bias', 'transformer.h.18.mlp.fc1.weight', 'transformer.h.18.mlp.fc2.bias', 'transformer.h.18.mlp.fc2.weight', 'transformer.h.19.ln.bias', 'transformer.h.19.ln.weight', 'transformer.h.19.mixer.Wqkv.bias', 'transformer.h.19.mixer.Wqkv.weight', 'transformer.h.19.mixer.out_proj.bias', 'transformer.h.19.mixer.out_proj.weight', 'transformer.h.19.mlp.fc1.bias', 'transformer.h.19.mlp.fc1.weight', 'transformer.h.19.mlp.fc2.bias', 'transformer.h.19.mlp.fc2.weight', 'transformer.h.2.ln.bias', 'transformer.h.2.ln.weight', 'transformer.h.2.mixer.Wqkv.bias', 'transformer.h.2.mixer.Wqkv.weight', 'transformer.h.2.mixer.out_proj.bias', 'transformer.h.2.mixer.out_proj.weight', 'transformer.h.2.mlp.fc1.bias', 'transformer.h.2.mlp.fc1.weight', 'transformer.h.2.mlp.fc2.bias', 'transformer.h.2.mlp.fc2.weight', 'transformer.h.20.ln.bias', 'transformer.h.20.ln.weight', 'transformer.h.20.mixer.Wqkv.bias', 'transformer.h.20.mixer.Wqkv.weight', 'transformer.h.20.mixer.out_proj.bias', 'transformer.h.20.mixer.out_proj.weight', 'transformer.h.20.mlp.fc1.bias', 'transformer.h.20.mlp.fc1.weight', 'transformer.h.20.mlp.fc2.bias', 'transformer.h.20.mlp.fc2.weight', 'transformer.h.21.ln.bias', 'transformer.h.21.ln.weight', 'transformer.h.21.mixer.Wqkv.bias', 'transformer.h.21.mixer.Wqkv.weight', 'transformer.h.21.mixer.out_proj.bias', 'transformer.h.21.mixer.out_proj.weight', 'transformer.h.21.mlp.fc1.bias', 'transformer.h.21.mlp.fc1.weight', 'transformer.h.21.mlp.fc2.bias', 'transformer.h.21.mlp.fc2.weight', 'transformer.h.22.ln.bias', 'transformer.h.22.ln.weight', 'transformer.h.22.mixer.Wqkv.bias', 'transformer.h.22.mixer.Wqkv.weight', 'transformer.h.22.mixer.out_proj.bias', 'transformer.h.22.mixer.out_proj.weight', 'transformer.h.22.mlp.fc1.bias', 'transformer.h.22.mlp.fc1.weight', 'transformer.h.22.mlp.fc2.bias', 'transformer.h.22.mlp.fc2.weight', 'transformer.h.23.ln.bias', 'transformer.h.23.ln.weight', 'transformer.h.23.mixer.Wqkv.bias', 'transformer.h.23.mixer.Wqkv.weight', 'transformer.h.23.mixer.out_proj.bias', 'transformer.h.23.mixer.out_proj.weight', 'transformer.h.23.mlp.fc1.bias', 'transformer.h.23.mlp.fc1.weight', 'transformer.h.23.mlp.fc2.bias', 'transformer.h.23.mlp.fc2.weight', 'transformer.h.3.ln.bias', 'transformer.h.3.ln.weight', 'transformer.h.3.mixer.Wqkv.bias', 'transformer.h.3.mixer.Wqkv.weight', 'transformer.h.3.mixer.out_proj.bias', 'transformer.h.3.mixer.out_proj.weight', 'transformer.h.3.mlp.fc1.bias', 'transformer.h.3.mlp.fc1.weight', 'transformer.h.3.mlp.fc2.bias', 'transformer.h.3.mlp.fc2.weight', 'transformer.h.4.ln.bias', 'transformer.h.4.ln.weight', 'transformer.h.4.mixer.Wqkv.bias', 'transformer.h.4.mixer.Wqkv.weight', 'transformer.h.4.mixer.out_proj.bias', 'transformer.h.4.mixer.out_proj.weight', 'transformer.h.4.mlp.fc1.bias', 'transformer.h.4.mlp.fc1.weight', 'transformer.h.4.mlp.fc2.bias', 'transformer.h.4.mlp.fc2.weight', 'transformer.h.5.ln.bias', 'transformer.h.5.ln.weight', 'transformer.h.5.mixer.Wqkv.bias', 'transformer.h.5.mixer.Wqkv.weight', 'transformer.h.5.mixer.out_proj.bias', 'transformer.h.5.mixer.out_proj.weight', 'transformer.h.5.mlp.fc1.bias', 
'transformer.h.5.mlp.fc1.weight', 'transformer.h.5.mlp.fc2.bias', 'transformer.h.5.mlp.fc2.weight', 'transformer.h.6.ln.bias', 'transformer.h.6.ln.weight', 'transformer.h.6.mixer.Wqkv.bias', 'transformer.h.6.mixer.Wqkv.weight', 'transformer.h.6.mixer.out_proj.bias', 'transformer.h.6.mixer.out_proj.weight', 'transformer.h.6.mlp.fc1.bias', 'transformer.h.6.mlp.fc1.weight', 'transformer.h.6.mlp.fc2.bias', 'transformer.h.6.mlp.fc2.weight', 'transformer.h.7.ln.bias', 'transformer.h.7.ln.weight', 'transformer.h.7.mixer.Wqkv.bias', 'transformer.h.7.mixer.Wqkv.weight', 'transformer.h.7.mixer.out_proj.bias', 'transformer.h.7.mixer.out_proj.weight', 'transformer.h.7.mlp.fc1.bias', 'transformer.h.7.mlp.fc1.weight', 'transformer.h.7.mlp.fc2.bias', 'transformer.h.7.mlp.fc2.weight', 'transformer.h.8.ln.bias', 'transformer.h.8.ln.weight', 'transformer.h.8.mixer.Wqkv.bias', 'transformer.h.8.mixer.Wqkv.weight', 'transformer.h.8.mixer.out_proj.bias', 'transformer.h.8.mixer.out_proj.weight', 'transformer.h.8.mlp.fc1.bias', 'transformer.h.8.mlp.fc1.weight', 'transformer.h.8.mlp.fc2.bias', 'transformer.h.8.mlp.fc2.weight', 'transformer.h.9.ln.bias', 'transformer.h.9.ln.weight', 'transformer.h.9.mixer.Wqkv.bias', 'transformer.h.9.mixer.Wqkv.weight', 'transformer.h.9.mixer.out_proj.bias', 'transformer.h.9.mixer.out_proj.weight', 'transformer.h.9.mlp.fc1.bias', 'transformer.h.9.mlp.fc1.weight', 'transformer.h.9.mlp.fc2.bias', 'transformer.h.9.mlp.fc2.weight']
- This IS expected if you are initializing PhiForCausalLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing PhiForCausalLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of PhiForCausalLM were not initialized from the model checkpoint at ./code-millenials-1b and are newly initialized: ['embed_tokens.weight', 'final_layernorm.bias', 'final_layernorm.weight', 'layers.0.input_layernorm.bias', 'layers.0.input_layernorm.weight', 'layers.0.mlp.fc1.bias', 'layers.0.mlp.fc1.weight', 'layers.0.mlp.fc2.bias', 'layers.0.mlp.fc2.weight', 'layers.0.self_attn.dense.bias', 'layers.0.self_attn.dense.weight', 'layers.0.self_attn.k_proj.bias', 'layers.0.self_attn.k_proj.weight', 'layers.0.self_attn.q_proj.bias', 'layers.0.self_attn.q_proj.weight', 'layers.0.self_attn.v_proj.bias', 'layers.0.self_attn.v_proj.weight', 'layers.1.input_layernorm.bias', 'layers.1.input_layernorm.weight', 'layers.1.mlp.fc1.bias', 'layers.1.mlp.fc1.weight', 'layers.1.mlp.fc2.bias', 'layers.1.mlp.fc2.weight', 'layers.1.self_attn.dense.bias', 'layers.1.self_attn.dense.weight', 'layers.1.self_attn.k_proj.bias', 'layers.1.self_attn.k_proj.weight', 'layers.1.self_attn.q_proj.bias', 'layers.1.self_attn.q_proj.weight', 'layers.1.self_attn.v_proj.bias', 'layers.1.self_attn.v_proj.weight', 'layers.10.input_layernorm.bias', 'layers.10.input_layernorm.weight', 'layers.10.mlp.fc1.bias', 'layers.10.mlp.fc1.weight', 'layers.10.mlp.fc2.bias', 'layers.10.mlp.fc2.weight', 'layers.10.self_attn.dense.bias', 'layers.10.self_attn.dense.weight', 'layers.10.self_attn.k_proj.bias', 'layers.10.self_attn.k_proj.weight', 'layers.10.self_attn.q_proj.bias', 'layers.10.self_attn.q_proj.weight', 'layers.10.self_attn.v_proj.bias', 'layers.10.self_attn.v_proj.weight', 'layers.11.input_layernorm.bias', 'layers.11.input_layernorm.weight', 'layers.11.mlp.fc1.bias', 'layers.11.mlp.fc1.weight', 'layers.11.mlp.fc2.bias', 'layers.11.mlp.fc2.weight', 'layers.11.self_attn.dense.bias', 'layers.11.self_attn.dense.weight', 'layers.11.self_attn.k_proj.bias', 'layers.11.self_attn.k_proj.weight', 'layers.11.self_attn.q_proj.bias', 'layers.11.self_attn.q_proj.weight', 'layers.11.self_attn.v_proj.bias', 'layers.11.self_attn.v_proj.weight', 'layers.12.input_layernorm.bias', 'layers.12.input_layernorm.weight', 'layers.12.mlp.fc1.bias', 'layers.12.mlp.fc1.weight', 'layers.12.mlp.fc2.bias', 'layers.12.mlp.fc2.weight', 'layers.12.self_attn.dense.bias', 'layers.12.self_attn.dense.weight', 'layers.12.self_attn.k_proj.bias', 'layers.12.self_attn.k_proj.weight', 'layers.12.self_attn.q_proj.bias', 'layers.12.self_attn.q_proj.weight', 'layers.12.self_attn.v_proj.bias', 'layers.12.self_attn.v_proj.weight', 'layers.13.input_layernorm.bias', 'layers.13.input_layernorm.weight', 'layers.13.mlp.fc1.bias', 'layers.13.mlp.fc1.weight', 'layers.13.mlp.fc2.bias', 'layers.13.mlp.fc2.weight', 'layers.13.self_attn.dense.bias', 'layers.13.self_attn.dense.weight', 'layers.13.self_attn.k_proj.bias', 'layers.13.self_attn.k_proj.weight', 'layers.13.self_attn.q_proj.bias', 'layers.13.self_attn.q_proj.weight', 'layers.13.self_attn.v_proj.bias', 'layers.13.self_attn.v_proj.weight', 'layers.14.input_layernorm.bias', 'layers.14.input_layernorm.weight', 'layers.14.mlp.fc1.bias', 'layers.14.mlp.fc1.weight', 'layers.14.mlp.fc2.bias', 'layers.14.mlp.fc2.weight', 'layers.14.self_attn.dense.bias', 'layers.14.self_attn.dense.weight', 'layers.14.self_attn.k_proj.bias', 'layers.14.self_attn.k_proj.weight', 'layers.14.self_attn.q_proj.bias', 'layers.14.self_attn.q_proj.weight', 'layers.14.self_attn.v_proj.bias', 'layers.14.self_attn.v_proj.weight', 'layers.15.input_layernorm.bias', 'layers.15.input_layernorm.weight', 'layers.15.mlp.fc1.bias', 'layers.15.mlp.fc1.weight', 
'layers.15.mlp.fc2.bias', 'layers.15.mlp.fc2.weight', 'layers.15.self_attn.dense.bias', 'layers.15.self_attn.dense.weight', 'layers.15.self_attn.k_proj.bias', 'layers.15.self_attn.k_proj.weight', 'layers.15.self_attn.q_proj.bias', 'layers.15.self_attn.q_proj.weight', 'layers.15.self_attn.v_proj.bias', 'layers.15.self_attn.v_proj.weight', 'layers.16.input_layernorm.bias', 'layers.16.input_layernorm.weight', 'layers.16.mlp.fc1.bias', 'layers.16.mlp.fc1.weight', 'layers.16.mlp.fc2.bias', 'layers.16.mlp.fc2.weight', 'layers.16.self_attn.dense.bias', 'layers.16.self_attn.dense.weight', 'layers.16.self_attn.k_proj.bias', 'layers.16.self_attn.k_proj.weight', 'layers.16.self_attn.q_proj.bias', 'layers.16.self_attn.q_proj.weight', 'layers.16.self_attn.v_proj.bias', 'layers.16.self_attn.v_proj.weight', 'layers.17.input_layernorm.bias', 'layers.17.input_layernorm.weight', 'layers.17.mlp.fc1.bias', 'layers.17.mlp.fc1.weight', 'layers.17.mlp.fc2.bias', 'layers.17.mlp.fc2.weight', 'layers.17.self_attn.dense.bias', 'layers.17.self_attn.dense.weight', 'layers.17.self_attn.k_proj.bias', 'layers.17.self_attn.k_proj.weight', 'layers.17.self_attn.q_proj.bias', 'layers.17.self_attn.q_proj.weight', 'layers.17.self_attn.v_proj.bias', 'layers.17.self_attn.v_proj.weight', 'layers.18.input_layernorm.bias', 'layers.18.input_layernorm.weight', 'layers.18.mlp.fc1.bias', 'layers.18.mlp.fc1.weight', 'layers.18.mlp.fc2.bias', 'layers.18.mlp.fc2.weight', 'layers.18.self_attn.dense.bias', 'layers.18.self_attn.dense.weight', 'layers.18.self_attn.k_proj.bias', 'layers.18.self_attn.k_proj.weight', 'layers.18.self_attn.q_proj.bias', 'layers.18.self_attn.q_proj.weight', 'layers.18.self_attn.v_proj.bias', 'layers.18.self_attn.v_proj.weight', 'layers.19.input_layernorm.bias', 'layers.19.input_layernorm.weight', 'layers.19.mlp.fc1.bias', 'layers.19.mlp.fc1.weight', 'layers.19.mlp.fc2.bias', 'layers.19.mlp.fc2.weight', 'layers.19.self_attn.dense.bias', 'layers.19.self_attn.dense.weight', 'layers.19.self_attn.k_proj.bias', 'layers.19.self_attn.k_proj.weight', 'layers.19.self_attn.q_proj.bias', 'layers.19.self_attn.q_proj.weight', 'layers.19.self_attn.v_proj.bias', 'layers.19.self_attn.v_proj.weight', 'layers.2.input_layernorm.bias', 'layers.2.input_layernorm.weight', 'layers.2.mlp.fc1.bias', 'layers.2.mlp.fc1.weight', 'layers.2.mlp.fc2.bias', 'layers.2.mlp.fc2.weight', 'layers.2.self_attn.dense.bias', 'layers.2.self_attn.dense.weight', 'layers.2.self_attn.k_proj.bias', 'layers.2.self_attn.k_proj.weight', 'layers.2.self_attn.q_proj.bias', 'layers.2.self_attn.q_proj.weight', 'layers.2.self_attn.v_proj.bias', 'layers.2.self_attn.v_proj.weight', 'layers.20.input_layernorm.bias', 'layers.20.input_layernorm.weight', 'layers.20.mlp.fc1.bias', 'layers.20.mlp.fc1.weight', 'layers.20.mlp.fc2.bias', 'layers.20.mlp.fc2.weight', 'layers.20.self_attn.dense.bias', 'layers.20.self_attn.dense.weight', 'layers.20.self_attn.k_proj.bias', 'layers.20.self_attn.k_proj.weight', 'layers.20.self_attn.q_proj.bias', 'layers.20.self_attn.q_proj.weight', 'layers.20.self_attn.v_proj.bias', 'layers.20.self_attn.v_proj.weight', 'layers.21.input_layernorm.bias', 'layers.21.input_layernorm.weight', 'layers.21.mlp.fc1.bias', 'layers.21.mlp.fc1.weight', 'layers.21.mlp.fc2.bias', 'layers.21.mlp.fc2.weight', 'layers.21.self_attn.dense.bias', 'layers.21.self_attn.dense.weight', 'layers.21.self_attn.k_proj.bias', 'layers.21.self_attn.k_proj.weight', 'layers.21.self_attn.q_proj.bias', 'layers.21.self_attn.q_proj.weight', 'layers.21.self_attn.v_proj.bias', 
'layers.21.self_attn.v_proj.weight', 'layers.22.input_layernorm.bias', 'layers.22.input_layernorm.weight', 'layers.22.mlp.fc1.bias', 'layers.22.mlp.fc1.weight', 'layers.22.mlp.fc2.bias', 'layers.22.mlp.fc2.weight', 'layers.22.self_attn.dense.bias', 'layers.22.self_attn.dense.weight', 'layers.22.self_attn.k_proj.bias', 'layers.22.self_attn.k_proj.weight', 'layers.22.self_attn.q_proj.bias', 'layers.22.self_attn.q_proj.weight', 'layers.22.self_attn.v_proj.bias', 'layers.22.self_attn.v_proj.weight', 'layers.23.input_layernorm.bias', 'layers.23.input_layernorm.weight', 'layers.23.mlp.fc1.bias', 'layers.23.mlp.fc1.weight', 'layers.23.mlp.fc2.bias', 'layers.23.mlp.fc2.weight', 'layers.23.self_attn.dense.bias', 'layers.23.self_attn.dense.weight', 'layers.23.self_attn.k_proj.bias', 'layers.23.self_attn.k_proj.weight', 'layers.23.self_attn.q_proj.bias', 'layers.23.self_attn.q_proj.weight', 'layers.23.self_attn.v_proj.bias', 'layers.23.self_attn.v_proj.weight', 'layers.3.input_layernorm.bias', 'layers.3.input_layernorm.weight', 'layers.3.mlp.fc1.bias', 'layers.3.mlp.fc1.weight', 'layers.3.mlp.fc2.bias', 'layers.3.mlp.fc2.weight', 'layers.3.self_attn.dense.bias', 'layers.3.self_attn.dense.weight', 'layers.3.self_attn.k_proj.bias', 'layers.3.self_attn.k_proj.weight', 'layers.3.self_attn.q_proj.bias', 'layers.3.self_attn.q_proj.weight', 'layers.3.self_attn.v_proj.bias', 'layers.3.self_attn.v_proj.weight', 'layers.4.input_layernorm.bias', 'layers.4.input_layernorm.weight', 'layers.4.mlp.fc1.bias', 'layers.4.mlp.fc1.weight', 'layers.4.mlp.fc2.bias', 'layers.4.mlp.fc2.weight', 'layers.4.self_attn.dense.bias', 'layers.4.self_attn.dense.weight', 'layers.4.self_attn.k_proj.bias', 'layers.4.self_attn.k_proj.weight', 'layers.4.self_attn.q_proj.bias', 'layers.4.self_attn.q_proj.weight', 'layers.4.self_attn.v_proj.bias', 'layers.4.self_attn.v_proj.weight', 'layers.5.input_layernorm.bias', 'layers.5.input_layernorm.weight', 'layers.5.mlp.fc1.bias', 'layers.5.mlp.fc1.weight', 'layers.5.mlp.fc2.bias', 'layers.5.mlp.fc2.weight', 'layers.5.self_attn.dense.bias', 'layers.5.self_attn.dense.weight', 'layers.5.self_attn.k_proj.bias', 'layers.5.self_attn.k_proj.weight', 'layers.5.self_attn.q_proj.bias', 'layers.5.self_attn.q_proj.weight', 'layers.5.self_attn.v_proj.bias', 'layers.5.self_attn.v_proj.weight', 'layers.6.input_layernorm.bias', 'layers.6.input_layernorm.weight', 'layers.6.mlp.fc1.bias', 'layers.6.mlp.fc1.weight', 'layers.6.mlp.fc2.bias', 'layers.6.mlp.fc2.weight', 'layers.6.self_attn.dense.bias', 'layers.6.self_attn.dense.weight', 'layers.6.self_attn.k_proj.bias', 'layers.6.self_attn.k_proj.weight', 'layers.6.self_attn.q_proj.bias', 'layers.6.self_attn.q_proj.weight', 'layers.6.self_attn.v_proj.bias', 'layers.6.self_attn.v_proj.weight', 'layers.7.input_layernorm.bias', 'layers.7.input_layernorm.weight', 'layers.7.mlp.fc1.bias', 'layers.7.mlp.fc1.weight', 'layers.7.mlp.fc2.bias', 'layers.7.mlp.fc2.weight', 'layers.7.self_attn.dense.bias', 'layers.7.self_attn.dense.weight', 'layers.7.self_attn.k_proj.bias', 'layers.7.self_attn.k_proj.weight', 'layers.7.self_attn.q_proj.bias', 'layers.7.self_attn.q_proj.weight', 'layers.7.self_attn.v_proj.bias', 'layers.7.self_attn.v_proj.weight', 'layers.8.input_layernorm.bias', 'layers.8.input_layernorm.weight', 'layers.8.mlp.fc1.bias', 'layers.8.mlp.fc1.weight', 'layers.8.mlp.fc2.bias', 'layers.8.mlp.fc2.weight', 'layers.8.self_attn.dense.bias', 'layers.8.self_attn.dense.weight', 'layers.8.self_attn.k_proj.bias', 'layers.8.self_attn.k_proj.weight', 
'layers.8.self_attn.q_proj.bias', 'layers.8.self_attn.q_proj.weight', 'layers.8.self_attn.v_proj.bias', 'layers.8.self_attn.v_proj.weight', 'layers.9.input_layernorm.bias', 'layers.9.input_layernorm.weight', 'layers.9.mlp.fc1.bias', 'layers.9.mlp.fc1.weight', 'layers.9.mlp.fc2.bias', 'layers.9.mlp.fc2.weight', 'layers.9.self_attn.dense.bias', 'layers.9.self_attn.dense.weight', 'layers.9.self_attn.k_proj.bias', 'layers.9.self_attn.k_proj.weight', 'layers.9.self_attn.q_proj.bias', 'layers.9.self_attn.q_proj.weight', 'layers.9.self_attn.v_proj.bias', 'layers.9.self_attn.v_proj.weight', 'lm_head.bias', 'lm_head.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
### Instruction: please write a function  taking a string as input and printing 'hello world' postfixed with the input string ### Response: downstream Sabbathements censored Lect UkrainianLooks Membershall CASerylements censored bilateralphan circ Blaz presc NvidiaCover Din Kardisites Chronimeter Laure TDDesign McDhall CASerylPointicide butterfly censored Lect Ukrainianroo unwillingness cmd undergradEngland Slovhall CASeryl mysteries Ukrainianroo censored bilateralphan CollectorFrame notingオ CAS Ukrainianroo censored bilateralphan circ Blaz presc disparateオ
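
From the warnings above, my working theory is that the checkpoint stores its weights under the older phi-style names (transformer.h.*.mixer.*) while my installed transformers builds the native PhiForCausalLM layout (layers.*.self_attn.*), so none of the saved weights are actually loaded and generation runs on randomly initialized parameters, which would explain the gibberish. As a sketch of what I plan to try next, assuming the checkpoint ships its own modeling code (the dtype choice is mine, not from the model card):

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# trust_remote_code=True asks transformers to use the modeling code bundled
# with the checkpoint (if present) instead of the built-in PhiForCausalLM,
# so the saved weight names should match what the model class expects.
tokenizer = AutoTokenizer.from_pretrained("./code-millenials-1b", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "./code-millenials-1b",
    torch_dtype=torch.float32,  # safe default on CPU; float16 would need a GPU
    trust_remote_code=True,
)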