I'm trying to run code-millenials-1B, but I can't get any coherent responses from it. Please share the following to help me find the root cause:

- CPU requirements, if any
- Memory requirements, if any
- GPU requirements, in addition to the CPU requirements above
- A set of example prompts and the expected result for each, as tested by you, so I can compare them against the results in my environment
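For the memory question, a rough lower bound follows from the parameter count alone: weight memory is about the number of parameters times the bytes per parameter of the dtype the weights are loaded in (`from_pretrained` loads in float32 unless `torch_dtype` is passed). The sketch below assumes roughly 1 billion parameters, going only by the model's name; the exact count isn't stated here, and `model.num_parameters()` reports it for a loaded model. It ignores activations, the KV cache, and framework overhead, so actual usage will be higher.

```python
# Bytes per parameter for a few common dtypes.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1}


def weight_memory_gib(num_params: int, dtype: str = "float32") -> float:
    """Approximate size of the model weights alone, in GiB.

    Excludes activations, the KV cache and framework overhead.
    """
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


if __name__ == "__main__":
    n = 1_000_000_000  # assumption: ~1B parameters, based only on the model name
    for dtype in ("float32", "float16", "int8"):
        print(f"{dtype:>8}: {weight_memory_gib(n, dtype):.2f} GiB")
```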
For now, I'm using the example code from the repository:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("./code-millenials-1b")
model = AutoModelForCausalLM.from_pretrained("./code-millenials-1b")

template = """A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
### Instruction: {instruction} ### Response:"""

instruction = "please write a function taking a string as input and printing 'hello world' postfixed with the input string"
prompt = template.format(instruction=instruction)

inputs = tokenizer(prompt, return_tensors="pt")
sample = model.generate(**inputs, max_length=128)
print(tokenizer.decode(sample[0]))
```
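One detail of the `generate()` call worth keeping in mind when comparing outputs: for a decoder-only model, `max_length` counts the prompt tokens as well as the generated ones, so with `max_length=128` the response only gets what is left after the prompt, whereas `max_new_tokens` bounds the response alone. A minimal sketch of that arithmetic; the helper name is hypothetical and the 60-token prompt is a made-up number (`inputs["input_ids"].shape[1]` gives the real prompt length).

```python
def new_token_budget(prompt_tokens: int, max_length: int) -> int:
    """Tokens generate() can still add when only max_length is set.

    For decoder-only models max_length includes the prompt, so a long
    prompt leaves fewer (possibly zero) tokens for the response.
    """
    return max(0, max_length - prompt_tokens)


if __name__ == "__main__":
    # Hypothetical 60-token prompt with the max_length used in the example.
    print(new_token_budget(60, 128))
```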
My result is below:

```
A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
### Instruction: please write a function taking a string as input and printing 'hello world' postfixed with the input string ### Response: downstream Sabbathements censored Lect UkrainianLooks Membershall CASerylements censored bilateralphan circ Blaz presc NvidiaCover Din Kardisites Chronimeter Laure TDDesign McDhall CASerylPointicide butterfly censored Lect Ukrainianroo unwillingness cmd undergradEngland Slovhall CASeryl mysteries Ukrainianroo censored bilateralphan CollectorFrame notingオ CAS Ukrainianroo censored bilateralphan circ Blaz presc disparateオ
```
The complete output of the example code is below:

```
~/workspace/locallm$ python3 test.py
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Some weights of the model checkpoint at ./code-millenials-1b were not used when initializing PhiForCausalLM: ['lm_head.linear.bias', 'lm_head.linear.weight', 'lm_head.ln.bias', 'lm_head.ln.weight', 'transformer.embd.wte.weight', 'transformer.h.0.ln.bias', 'transformer.h.0.ln.weight', 'transformer.h.0.mixer.Wqkv.bias', 'transformer.h.0.mixer.Wqkv.weight', 'transformer.h.0.mixer.out_proj.bias', 'transformer.h.0.mixer.out_proj.weight', 'transformer.h.0.mlp.fc1.bias', 'transformer.h.0.mlp.fc1.weight', 'transformer.h.0.mlp.fc2.bias', 'transformer.h.0.mlp.fc2.weight', 'transformer.h.1.ln.bias', 'transformer.h.1.ln.weight', 'transformer.h.1.mixer.Wqkv.bias', 'transformer.h.1.mixer.Wqkv.weight', 'transformer.h.1.mixer.out_proj.bias', 'transformer.h.1.mixer.out_proj.weight', 'transformer.h.1.mlp.fc1.bias', 'transformer.h.1.mlp.fc1.weight', 'transformer.h.1.mlp.fc2.bias', 'transformer.h.1.mlp.fc2.weight', 'transformer.h.10.ln.bias', 'transformer.h.10.ln.weight', 'transformer.h.10.mixer.Wqkv.bias', 'transformer.h.10.mixer.Wqkv.weight', 'transformer.h.10.mixer.out_proj.bias', 'transformer.h.10.mixer.out_proj.weight', 'transformer.h.10.mlp.fc1.bias', 'transformer.h.10.mlp.fc1.weight', 'transformer.h.10.mlp.fc2.bias', 'transformer.h.10.mlp.fc2.weight', 'transformer.h.11.ln.bias', 'transformer.h.11.ln.weight', 'transformer.h.11.mixer.Wqkv.bias', 'transformer.h.11.mixer.Wqkv.weight', 'transformer.h.11.mixer.out_proj.bias', 'transformer.h.11.mixer.out_proj.weight', 'transformer.h.11.mlp.fc1.bias', 'transformer.h.11.mlp.fc1.weight', 'transformer.h.11.mlp.fc2.bias', 'transformer.h.11.mlp.fc2.weight', 'transformer.h.12.ln.bias', 'transformer.h.12.ln.weight', 'transformer.h.12.mixer.Wqkv.bias', 'transformer.h.12.mixer.Wqkv.weight', 'transformer.h.12.mixer.out_proj.bias', 'transformer.h.12.mixer.out_proj.weight', 'transformer.h.12.mlp.fc1.bias', 'transformer.h.12.mlp.fc1.weight', 'transformer.h.12.mlp.fc2.bias', 'transformer.h.12.mlp.fc2.weight', 'transformer.h.13.ln.bias',
'transformer.h.13.ln.weight', 'transformer.h.13.mixer.Wqkv.bias', 'transformer.h.13.mixer.Wqkv.weight', 'transformer.h.13.mixer.out_proj.bias', 'transformer.h.13.mixer.out_proj.weight', 'transformer.h.13.mlp.fc1.bias', 'transformer.h.13.mlp.fc1.weight', 'transformer.h.13.mlp.fc2.bias', 'transformer.h.13.mlp.fc2.weight', 'transformer.h.14.ln.bias', 'transformer.h.14.ln.weight', 'transformer.h.14.mixer.Wqkv.bias', 'transformer.h.14.mixer.Wqkv.weight', 'transformer.h.14.mixer.out_proj.bias', 'transformer.h.14.mixer.out_proj.weight', 'transformer.h.14.mlp.fc1.bias', 'transformer.h.14.mlp.fc1.weight', 'transformer.h.14.mlp.fc2.bias', 'transformer.h.14.mlp.fc2.weight', 'transformer.h.15.ln.bias', 'transformer.h.15.ln.weight', 'transformer.h.15.mixer.Wqkv.bias', 'transformer.h.15.mixer.Wqkv.weight', 'transformer.h.15.mixer.out_proj.bias', 'transformer.h.15.mixer.out_proj.weight', 'transformer.h.15.mlp.fc1.bias', 'transformer.h.15.mlp.fc1.weight', 'transformer.h.15.mlp.fc2.bias', 'transformer.h.15.mlp.fc2.weight', 'transformer.h.16.ln.bias', 'transformer.h.16.ln.weight', 'transformer.h.16.mixer.Wqkv.bias', 'transformer.h.16.mixer.Wqkv.weight', 'transformer.h.16.mixer.out_proj.bias', 'transformer.h.16.mixer.out_proj.weight', 'transformer.h.16.mlp.fc1.bias', 'transformer.h.16.mlp.fc1.weight', 'transformer.h.16.mlp.fc2.bias', 'transformer.h.16.mlp.fc2.weight', 'transformer.h.17.ln.bias', 'transformer.h.17.ln.weight', 'transformer.h.17.mixer.Wqkv.bias', 'transformer.h.17.mixer.Wqkv.weight', 'transformer.h.17.mixer.out_proj.bias', 'transformer.h.17.mixer.out_proj.weight', 'transformer.h.17.mlp.fc1.bias', 'transformer.h.17.mlp.fc1.weight', 'transformer.h.17.mlp.fc2.bias', 'transformer.h.17.mlp.fc2.weight', 'transformer.h.18.ln.bias', 'transformer.h.18.ln.weight', 'transformer.h.18.mixer.Wqkv.bias', 'transformer.h.18.mixer.Wqkv.weight', 'transformer.h.18.mixer.out_proj.bias', 'transformer.h.18.mixer.out_proj.weight', 'transformer.h.18.mlp.fc1.bias',
'transformer.h.18.mlp.fc1.weight', 'transformer.h.18.mlp.fc2.bias', 'transformer.h.18.mlp.fc2.weight', 'transformer.h.19.ln.bias', 'transformer.h.19.ln.weight', 'transformer.h.19.mixer.Wqkv.bias', 'transformer.h.19.mixer.Wqkv.weight', 'transformer.h.19.mixer.out_proj.bias', 'transformer.h.19.mixer.out_proj.weight', 'transformer.h.19.mlp.fc1.bias', 'transformer.h.19.mlp.fc1.weight', 'transformer.h.19.mlp.fc2.bias', 'transformer.h.19.mlp.fc2.weight', 'transformer.h.2.ln.bias', 'transformer.h.2.ln.weight', 'transformer.h.2.mixer.Wqkv.bias', 'transformer.h.2.mixer.Wqkv.weight', 'transformer.h.2.mixer.out_proj.bias', 'transformer.h.2.mixer.out_proj.weight', 'transformer.h.2.mlp.fc1.bias', 'transformer.h.2.mlp.fc1.weight', 'transformer.h.2.mlp.fc2.bias', 'transformer.h.2.mlp.fc2.weight', 'transformer.h.20.ln.bias', 'transformer.h.20.ln.weight', 'transformer.h.20.mixer.Wqkv.bias', 'transformer.h.20.mixer.Wqkv.weight', 'transformer.h.20.mixer.out_proj.bias', 'transformer.h.20.mixer.out_proj.weight', 'transformer.h.20.mlp.fc1.bias', 'transformer.h.20.mlp.fc1.weight', 'transformer.h.20.mlp.fc2.bias', 'transformer.h.20.mlp.fc2.weight', 'transformer.h.21.ln.bias', 'transformer.h.21.ln.weight', 'transformer.h.21.mixer.Wqkv.bias', 'transformer.h.21.mixer.Wqkv.weight', 'transformer.h.21.mixer.out_proj.bias', 'transformer.h.21.mixer.out_proj.weight', 'transformer.h.21.mlp.fc1.bias', 'transformer.h.21.mlp.fc1.weight', 'transformer.h.21.mlp.fc2.bias', 'transformer.h.21.mlp.fc2.weight', 'transformer.h.22.ln.bias', 'transformer.h.22.ln.weight', 'transformer.h.22.mixer.Wqkv.bias', 'transformer.h.22.mixer.Wqkv.weight', 'transformer.h.22.mixer.out_proj.bias', 'transformer.h.22.mixer.out_proj.weight', 'transformer.h.22.mlp.fc1.bias', 'transformer.h.22.mlp.fc1.weight', 'transformer.h.22.mlp.fc2.bias', 'transformer.h.22.mlp.fc2.weight', 'transformer.h.23.ln.bias', 'transformer.h.23.ln.weight', 'transformer.h.23.mixer.Wqkv.bias', 'transformer.h.23.mixer.Wqkv.weight',
'transformer.h.23.mixer.out_proj.bias', 'transformer.h.23.mixer.out_proj.weight', 'transformer.h.23.mlp.fc1.bias', 'transformer.h.23.mlp.fc1.weight', 'transformer.h.23.mlp.fc2.bias', 'transformer.h.23.mlp.fc2.weight', 'transformer.h.3.ln.bias', 'transformer.h.3.ln.weight', 'transformer.h.3.mixer.Wqkv.bias', 'transformer.h.3.mixer.Wqkv.weight', 'transformer.h.3.mixer.out_proj.bias', 'transformer.h.3.mixer.out_proj.weight', 'transformer.h.3.mlp.fc1.bias', 'transformer.h.3.mlp.fc1.weight', 'transformer.h.3.mlp.fc2.bias', 'transformer.h.3.mlp.fc2.weight', 'transformer.h.4.ln.bias', 'transformer.h.4.ln.weight', 'transformer.h.4.mixer.Wqkv.bias', 'transformer.h.4.mixer.Wqkv.weight', 'transformer.h.4.mixer.out_proj.bias', 'transformer.h.4.mixer.out_proj.weight', 'transformer.h.4.mlp.fc1.bias', 'transformer.h.4.mlp.fc1.weight', 'transformer.h.4.mlp.fc2.bias', 'transformer.h.4.mlp.fc2.weight', 'transformer.h.5.ln.bias', 'transformer.h.5.ln.weight', 'transformer.h.5.mixer.Wqkv.bias', 'transformer.h.5.mixer.Wqkv.weight', 'transformer.h.5.mixer.out_proj.bias', 'transformer.h.5.mixer.out_proj.weight', 'transformer.h.5.mlp.fc1.bias', 'transformer.h.5.mlp.fc1.weight', 'transformer.h.5.mlp.fc2.bias', 'transformer.h.5.mlp.fc2.weight', 'transformer.h.6.ln.bias', 'transformer.h.6.ln.weight', 'transformer.h.6.mixer.Wqkv.bias', 'transformer.h.6.mixer.Wqkv.weight', 'transformer.h.6.mixer.out_proj.bias', 'transformer.h.6.mixer.out_proj.weight', 'transformer.h.6.mlp.fc1.bias', 'transformer.h.6.mlp.fc1.weight', 'transformer.h.6.mlp.fc2.bias', 'transformer.h.6.mlp.fc2.weight', 'transformer.h.7.ln.bias', 'transformer.h.7.ln.weight', 'transformer.h.7.mixer.Wqkv.bias', 'transformer.h.7.mixer.Wqkv.weight', 'transformer.h.7.mixer.out_proj.bias', 'transformer.h.7.mixer.out_proj.weight', 'transformer.h.7.mlp.fc1.bias', 'transformer.h.7.mlp.fc1.weight', 'transformer.h.7.mlp.fc2.bias', 'transformer.h.7.mlp.fc2.weight', 'transformer.h.8.ln.bias', 'transformer.h.8.ln.weight',
'transformer.h.8.mixer.Wqkv.bias', 'transformer.h.8.mixer.Wqkv.weight', 'transformer.h.8.mixer.out_proj.bias', 'transformer.h.8.mixer.out_proj.weight', 'transformer.h.8.mlp.fc1.bias', 'transformer.h.8.mlp.fc1.weight', 'transformer.h.8.mlp.fc2.bias', 'transformer.h.8.mlp.fc2.weight', 'transformer.h.9.ln.bias', 'transformer.h.9.ln.weight', 'transformer.h.9.mixer.Wqkv.bias', 'transformer.h.9.mixer.Wqkv.weight', 'transformer.h.9.mixer.out_proj.bias', 'transformer.h.9.mixer.out_proj.weight', 'transformer.h.9.mlp.fc1.bias', 'transformer.h.9.mlp.fc1.weight', 'transformer.h.9.mlp.fc2.bias', 'transformer.h.9.mlp.fc2.weight']
- This IS expected if you are initializing PhiForCausalLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing PhiForCausalLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of PhiForCausalLM were not initialized from the model checkpoint at ./code-millenials-1b and are newly initialized: ['embed_tokens.weight', 'final_layernorm.bias', 'final_layernorm.weight', 'layers.0.input_layernorm.bias', 'layers.0.input_layernorm.weight', 'layers.0.mlp.fc1.bias', 'layers.0.mlp.fc1.weight', 'layers.0.mlp.fc2.bias', 'layers.0.mlp.fc2.weight', 'layers.0.self_attn.dense.bias', 'layers.0.self_attn.dense.weight', 'layers.0.self_attn.k_proj.bias', 'layers.0.self_attn.k_proj.weight', 'layers.0.self_attn.q_proj.bias', 'layers.0.self_attn.q_proj.weight', 'layers.0.self_attn.v_proj.bias', 'layers.0.self_attn.v_proj.weight', 'layers.1.input_layernorm.bias', 'layers.1.input_layernorm.weight', 'layers.1.mlp.fc1.bias', 'layers.1.mlp.fc1.weight', 'layers.1.mlp.fc2.bias', 'layers.1.mlp.fc2.weight', 'layers.1.self_attn.dense.bias', 'layers.1.self_attn.dense.weight', 'layers.1.self_attn.k_proj.bias', 'layers.1.self_attn.k_proj.weight', 'layers.1.self_attn.q_proj.bias', 'layers.1.self_attn.q_proj.weight', 'layers.1.self_attn.v_proj.bias', 'layers.1.self_attn.v_proj.weight', 'layers.10.input_layernorm.bias', 'layers.10.input_layernorm.weight', 'layers.10.mlp.fc1.bias', 'layers.10.mlp.fc1.weight', 'layers.10.mlp.fc2.bias', 'layers.10.mlp.fc2.weight', 'layers.10.self_attn.dense.bias', 'layers.10.self_attn.dense.weight', 'layers.10.self_attn.k_proj.bias', 'layers.10.self_attn.k_proj.weight', 'layers.10.self_attn.q_proj.bias', 'layers.10.self_attn.q_proj.weight', 'layers.10.self_attn.v_proj.bias', 'layers.10.self_attn.v_proj.weight', 'layers.11.input_layernorm.bias', 'layers.11.input_layernorm.weight', 'layers.11.mlp.fc1.bias', 'layers.11.mlp.fc1.weight', 'layers.11.mlp.fc2.bias', 'layers.11.mlp.fc2.weight', 'layers.11.self_attn.dense.bias', 'layers.11.self_attn.dense.weight', 'layers.11.self_attn.k_proj.bias', 'layers.11.self_attn.k_proj.weight', 'layers.11.self_attn.q_proj.bias', 'layers.11.self_attn.q_proj.weight', 'layers.11.self_attn.v_proj.bias',
'layers.11.self_attn.v_proj.weight', 'layers.12.input_layernorm.bias', 'layers.12.input_layernorm.weight', 'layers.12.mlp.fc1.bias', 'layers.12.mlp.fc1.weight', 'layers.12.mlp.fc2.bias', 'layers.12.mlp.fc2.weight', 'layers.12.self_attn.dense.bias', 'layers.12.self_attn.dense.weight', 'layers.12.self_attn.k_proj.bias', 'layers.12.self_attn.k_proj.weight', 'layers.12.self_attn.q_proj.bias', 'layers.12.self_attn.q_proj.weight', 'layers.12.self_attn.v_proj.bias', 'layers.12.self_attn.v_proj.weight', 'layers.13.input_layernorm.bias', 'layers.13.input_layernorm.weight', 'layers.13.mlp.fc1.bias', 'layers.13.mlp.fc1.weight', 'layers.13.mlp.fc2.bias', 'layers.13.mlp.fc2.weight', 'layers.13.self_attn.dense.bias', 'layers.13.self_attn.dense.weight', 'layers.13.self_attn.k_proj.bias', 'layers.13.self_attn.k_proj.weight', 'layers.13.self_attn.q_proj.bias', 'layers.13.self_attn.q_proj.weight', 'layers.13.self_attn.v_proj.bias', 'layers.13.self_attn.v_proj.weight', 'layers.14.input_layernorm.bias', 'layers.14.input_layernorm.weight', 'layers.14.mlp.fc1.bias', 'layers.14.mlp.fc1.weight', 'layers.14.mlp.fc2.bias', 'layers.14.mlp.fc2.weight', 'layers.14.self_attn.dense.bias', 'layers.14.self_attn.dense.weight', 'layers.14.self_attn.k_proj.bias', 'layers.14.self_attn.k_proj.weight', 'layers.14.self_attn.q_proj.bias', 'layers.14.self_attn.q_proj.weight', 'layers.14.self_attn.v_proj.bias', 'layers.14.self_attn.v_proj.weight', 'layers.15.input_layernorm.bias', 'layers.15.input_layernorm.weight', 'layers.15.mlp.fc1.bias', 'layers.15.mlp.fc1.weight', 'layers.15.mlp.fc2.bias', 'layers.15.mlp.fc2.weight', 'layers.15.self_attn.dense.bias', 'layers.15.self_attn.dense.weight', 'layers.15.self_attn.k_proj.bias', 'layers.15.self_attn.k_proj.weight', 'layers.15.self_attn.q_proj.bias', 'layers.15.self_attn.q_proj.weight', 'layers.15.self_attn.v_proj.bias', 'layers.15.self_attn.v_proj.weight', 'layers.16.input_layernorm.bias', 'layers.16.input_layernorm.weight', 'layers.16.mlp.fc1.bias',
'layers.16.mlp.fc1.weight', 'layers.16.mlp.fc2.bias', 'layers.16.mlp.fc2.weight', 'layers.16.self_attn.dense.bias', 'layers.16.self_attn.dense.weight', 'layers.16.self_attn.k_proj.bias', 'layers.16.self_attn.k_proj.weight', 'layers.16.self_attn.q_proj.bias', 'layers.16.self_attn.q_proj.weight', 'layers.16.self_attn.v_proj.bias', 'layers.16.self_attn.v_proj.weight', 'layers.17.input_layernorm.bias', 'layers.17.input_layernorm.weight', 'layers.17.mlp.fc1.bias', 'layers.17.mlp.fc1.weight', 'layers.17.mlp.fc2.bias', 'layers.17.mlp.fc2.weight', 'layers.17.self_attn.dense.bias', 'layers.17.self_attn.dense.weight', 'layers.17.self_attn.k_proj.bias', 'layers.17.self_attn.k_proj.weight', 'layers.17.self_attn.q_proj.bias', 'layers.17.self_attn.q_proj.weight', 'layers.17.self_attn.v_proj.bias', 'layers.17.self_attn.v_proj.weight', 'layers.18.input_layernorm.bias', 'layers.18.input_layernorm.weight', 'layers.18.mlp.fc1.bias', 'layers.18.mlp.fc1.weight', 'layers.18.mlp.fc2.bias', 'layers.18.mlp.fc2.weight', 'layers.18.self_attn.dense.bias', 'layers.18.self_attn.dense.weight', 'layers.18.self_attn.k_proj.bias', 'layers.18.self_attn.k_proj.weight', 'layers.18.self_attn.q_proj.bias', 'layers.18.self_attn.q_proj.weight', 'layers.18.self_attn.v_proj.bias', 'layers.18.self_attn.v_proj.weight', 'layers.19.input_layernorm.bias', 'layers.19.input_layernorm.weight', 'layers.19.mlp.fc1.bias', 'layers.19.mlp.fc1.weight', 'layers.19.mlp.fc2.bias', 'layers.19.mlp.fc2.weight', 'layers.19.self_attn.dense.bias', 'layers.19.self_attn.dense.weight', 'layers.19.self_attn.k_proj.bias', 'layers.19.self_attn.k_proj.weight', 'layers.19.self_attn.q_proj.bias', 'layers.19.self_attn.q_proj.weight', 'layers.19.self_attn.v_proj.bias', 'layers.19.self_attn.v_proj.weight', 'layers.2.input_layernorm.bias', 'layers.2.input_layernorm.weight', 'layers.2.mlp.fc1.bias', 'layers.2.mlp.fc1.weight', 'layers.2.mlp.fc2.bias', 'layers.2.mlp.fc2.weight', 'layers.2.self_attn.dense.bias', 'layers.2.self_attn.dense.weight',
'layers.2.self_attn.k_proj.bias', 'layers.2.self_attn.k_proj.weight', 'layers.2.self_attn.q_proj.bias', 'layers.2.self_attn.q_proj.weight', 'layers.2.self_attn.v_proj.bias', 'layers.2.self_attn.v_proj.weight', 'layers.20.input_layernorm.bias', 'layers.20.input_layernorm.weight', 'layers.20.mlp.fc1.bias', 'layers.20.mlp.fc1.weight', 'layers.20.mlp.fc2.bias', 'layers.20.mlp.fc2.weight', 'layers.20.self_attn.dense.bias', 'layers.20.self_attn.dense.weight', 'layers.20.self_attn.k_proj.bias', 'layers.20.self_attn.k_proj.weight', 'layers.20.self_attn.q_proj.bias', 'layers.20.self_attn.q_proj.weight', 'layers.20.self_attn.v_proj.bias', 'layers.20.self_attn.v_proj.weight', 'layers.21.input_layernorm.bias', 'layers.21.input_layernorm.weight', 'layers.21.mlp.fc1.bias', 'layers.21.mlp.fc1.weight', 'layers.21.mlp.fc2.bias', 'layers.21.mlp.fc2.weight', 'layers.21.self_attn.dense.bias', 'layers.21.self_attn.dense.weight', 'layers.21.self_attn.k_proj.bias', 'layers.21.self_attn.k_proj.weight', 'layers.21.self_attn.q_proj.bias', 'layers.21.self_attn.q_proj.weight', 'layers.21.self_attn.v_proj.bias', 'layers.21.self_attn.v_proj.weight', 'layers.22.input_layernorm.bias', 'layers.22.input_layernorm.weight', 'layers.22.mlp.fc1.bias', 'layers.22.mlp.fc1.weight', 'layers.22.mlp.fc2.bias', 'layers.22.mlp.fc2.weight', 'layers.22.self_attn.dense.bias', 'layers.22.self_attn.dense.weight', 'layers.22.self_attn.k_proj.bias', 'layers.22.self_attn.k_proj.weight', 'layers.22.self_attn.q_proj.bias', 'layers.22.self_attn.q_proj.weight', 'layers.22.self_attn.v_proj.bias', 'layers.22.self_attn.v_proj.weight', 'layers.23.input_layernorm.bias', 'layers.23.input_layernorm.weight', 'layers.23.mlp.fc1.bias', 'layers.23.mlp.fc1.weight', 'layers.23.mlp.fc2.bias', 'layers.23.mlp.fc2.weight', 'layers.23.self_attn.dense.bias', 'layers.23.self_attn.dense.weight', 'layers.23.self_attn.k_proj.bias', 'layers.23.self_attn.k_proj.weight', 'layers.23.self_attn.q_proj.bias', 'layers.23.self_attn.q_proj.weight',
'layers.23.self_attn.v_proj.bias', 'layers.23.self_attn.v_proj.weight', 'layers.3.input_layernorm.bias', 'layers.3.input_layernorm.weight', 'layers.3.mlp.fc1.bias', 'layers.3.mlp.fc1.weight', 'layers.3.mlp.fc2.bias', 'layers.3.mlp.fc2.weight', 'layers.3.self_attn.dense.bias', 'layers.3.self_attn.dense.weight', 'layers.3.self_attn.k_proj.bias', 'layers.3.self_attn.k_proj.weight', 'layers.3.self_attn.q_proj.bias', 'layers.3.self_attn.q_proj.weight', 'layers.3.self_attn.v_proj.bias', 'layers.3.self_attn.v_proj.weight', 'layers.4.input_layernorm.bias', 'layers.4.input_layernorm.weight', 'layers.4.mlp.fc1.bias', 'layers.4.mlp.fc1.weight', 'layers.4.mlp.fc2.bias', 'layers.4.mlp.fc2.weight', 'layers.4.self_attn.dense.bias', 'layers.4.self_attn.dense.weight', 'layers.4.self_attn.k_proj.bias', 'layers.4.self_attn.k_proj.weight', 'layers.4.self_attn.q_proj.bias', 'layers.4.self_attn.q_proj.weight', 'layers.4.self_attn.v_proj.bias', 'layers.4.self_attn.v_proj.weight', 'layers.5.input_layernorm.bias', 'layers.5.input_layernorm.weight', 'layers.5.mlp.fc1.bias', 'layers.5.mlp.fc1.weight', 'layers.5.mlp.fc2.bias', 'layers.5.mlp.fc2.weight', 'layers.5.self_attn.dense.bias', 'layers.5.self_attn.dense.weight', 'layers.5.self_attn.k_proj.bias', 'layers.5.self_attn.k_proj.weight', 'layers.5.self_attn.q_proj.bias', 'layers.5.self_attn.q_proj.weight', 'layers.5.self_attn.v_proj.bias', 'layers.5.self_attn.v_proj.weight', 'layers.6.input_layernorm.bias', 'layers.6.input_layernorm.weight', 'layers.6.mlp.fc1.bias', 'layers.6.mlp.fc1.weight', 'layers.6.mlp.fc2.bias', 'layers.6.mlp.fc2.weight', 'layers.6.self_attn.dense.bias', 'layers.6.self_attn.dense.weight', 'layers.6.self_attn.k_proj.bias', 'layers.6.self_attn.k_proj.weight', 'layers.6.self_attn.q_proj.bias', 'layers.6.self_attn.q_proj.weight', 'layers.6.self_attn.v_proj.bias', 'layers.6.self_attn.v_proj.weight', 'layers.7.input_layernorm.bias', 'layers.7.input_layernorm.weight', 'layers.7.mlp.fc1.bias', 'layers.7.mlp.fc1.weight',
'layers.7.mlp.fc2.bias', 'layers.7.mlp.fc2.weight', 'layers.7.self_attn.dense.bias', 'layers.7.self_attn.dense.weight', 'layers.7.self_attn.k_proj.bias', 'layers.7.self_attn.k_proj.weight', 'layers.7.self_attn.q_proj.bias', 'layers.7.self_attn.q_proj.weight', 'layers.7.self_attn.v_proj.bias', 'layers.7.self_attn.v_proj.weight', 'layers.8.input_layernorm.bias', 'layers.8.input_layernorm.weight', 'layers.8.mlp.fc1.bias', 'layers.8.mlp.fc1.weight', 'layers.8.mlp.fc2.bias', 'layers.8.mlp.fc2.weight', 'layers.8.self_attn.dense.bias', 'layers.8.self_attn.dense.weight', 'layers.8.self_attn.k_proj.bias', 'layers.8.self_attn.k_proj.weight', 'layers.8.self_attn.q_proj.bias', 'layers.8.self_attn.q_proj.weight', 'layers.8.self_attn.v_proj.bias', 'layers.8.self_attn.v_proj.weight', 'layers.9.input_layernorm.bias', 'layers.9.input_layernorm.weight', 'layers.9.mlp.fc1.bias', 'layers.9.mlp.fc1.weight', 'layers.9.mlp.fc2.bias', 'layers.9.mlp.fc2.weight', 'layers.9.self_attn.dense.bias', 'layers.9.self_attn.dense.weight', 'layers.9.self_attn.k_proj.bias', 'layers.9.self_attn.k_proj.weight', 'layers.9.self_attn.q_proj.bias', 'layers.9.self_attn.q_proj.weight', 'layers.9.self_attn.v_proj.bias', 'layers.9.self_attn.v_proj.weight', 'lm_head.bias', 'lm_head.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
### Instruction: please write a function taking a string as input and printing 'hello world' postfixed with the input string ### Response: downstream Sabbathements censored Lect UkrainianLooks Membershall CASerylements censored bilateralphan circ Blaz presc NvidiaCover Din Kardisites Chronimeter Laure TDDesign McDhall CASerylPointicide butterfly censored Lect Ukrainianroo unwillingness cmd undergradEngland Slovhall CASeryl mysteries Ukrainianroo censored bilateralphan CollectorFrame notingオ CAS Ukrainianroo censored bilateralphan circ Blaz presc disparateオ
```
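The two warnings in the log suggest that the checkpoint and the model class disagree on parameter names: the checkpoint holds keys like `transformer.h.0.mixer.Wqkv.weight`, while `PhiForCausalLM` expects keys like `layers.0.self_attn.q_proj.weight`. Since every expected weight, including `embed_tokens` and `lm_head`, is listed as newly initialized, the model appears to be running with random weights, which would be consistent with the unrelated tokens in the response. The hypothetical helper below is a minimal sketch for quantifying that overlap. It assumes the two key lists have already been collected, for example from `model.state_dict().keys()` and from the checkpoint file in `./code-millenials-1b` (the file name isn't shown here).

```python
def compare_keys(checkpoint_keys, model_keys):
    """Compare parameter names in a checkpoint with those a model expects.

    Returns (unused, missing, fraction_missing):
      unused  - names in the checkpoint that the model ignores
      missing - names the model expects but the checkpoint lacks
                (these end up randomly initialized)
    """
    ckpt, expected = set(checkpoint_keys), set(model_keys)
    unused = sorted(ckpt - expected)
    missing = sorted(expected - ckpt)
    fraction_missing = len(missing) / len(expected) if expected else 0.0
    return unused, missing, fraction_missing


if __name__ == "__main__":
    # A few names copied from the log above; the real lists are much longer.
    ckpt_keys = ["transformer.h.0.mixer.Wqkv.weight", "lm_head.linear.weight"]
    model_keys = ["layers.0.self_attn.q_proj.weight", "lm_head.weight"]
    unused, missing, frac = compare_keys(ckpt_keys, model_keys)
    print("unused by model:", unused)
    print("missing from checkpoint:", missing)
    print(f"fraction of model weights newly initialized: {frac:.0%}")
```

A fraction near 100% means essentially nothing from the checkpoint was loaded into the model; a fraction near 0% means the names line up.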