[fix] fix some typo #206

Open · wants to merge 1 commit into base: main

8 changes: 4 additions & 4 deletions README_EN.md
@@ -18,7 +18,7 @@ CodeGeeX2 is the second-generation model of the multilingual code generation mod

* **More Powerful Coding Capabilities**: Based on the ChatGLM2-6B model, CodeGeeX2-6B has been further pre-trained on 600B code tokens, which has been comprehensively improved in coding capability compared to the first-generation. On the [HumanEval-X](https://huggingface.co/datasets/THUDM/humaneval-x) benchmark, all six languages have been significantly improved (Python +57%, C++ +71%, Java +54%, JavaScript +83%, Go +56%, Rust +321\%), and in Python it reached 35.9% of Pass@1 one-time pass rate, surpassing the larger StarCoder-15B.
* **More Useful Features**: Inheriting the ChatGLM2-6B model features, CodeGeeX2-6B better supports both Chinese and English prompts, maximum 8192 sequence length, and the inference speed is significantly improved compared to the first-generation. After quantization, it only needs 6GB of GPU memory for inference, thus supports lightweight local deployment.
- * **Comprehensive AI Coding Assistant**: The backend of CodeGeeX plugin ([VS Code](https://marketplace.visualstudio.com/items?itemName=aminer.codegeex), [Jetbrains](https://plugins.jetbrains.com/plugin/20587-codegeex)) is upgraded, supporting 100+ programming languages, and adding practical functions such as infilling and cross-file completion. Combined with the "Ask CodeGeeX" interactive AI coding assistant, it can be used to solve various programming problems via Chinese or English dialogue, including but not limited to code summarization, code translation, debugging, and comment generation, which helps increasing the efficiency of developpers.
+ * **Comprehensive AI Coding Assistant**: The backend of CodeGeeX plugin ([VS Code](https://marketplace.visualstudio.com/items?itemName=aminer.codegeex), [Jetbrains](https://plugins.jetbrains.com/plugin/20587-codegeex)) is upgraded, supporting 100+ programming languages, and adding practical functions such as infilling and cross-file completion. Combined with the "Ask CodeGeeX" interactive AI coding assistant, it can be used to solve various programming problems via Chinese or English dialogue, including but not limited to code summarization, code translation, debugging, and comment generation, which helps increasing the efficiency of developers.
* **Open License**: CodeGeeX2-6B weights are fully open to academic research, and please apply for commercial use by filling in the [registration form](https://open.bigmodel.cn/mla/form?mcode=CodeGeeX2-6B).


@@ -66,8 +66,8 @@ python ./demo/run_demo.py
```

❗️Attention:
- * CodeGeeX2 is a base model, which is not instruction-tuned for chatting. It can do tasks like code completion/translation/explaination. To try the instruction-tuned version in CodeGeeX plugins ([VS Code](https://marketplace.visualstudio.com/items?itemName=aminer.codegeex), [Jetbrains](https://plugins.jetbrains.com/plugin/20587-codegeex)).
- * Programming languages can be controled by adding `language tag`, e.g., `# language: Python`. The format should be respected to ensure performance, full list can be found [here](https://github.com/THUDM/CodeGeeX2/blob/main/evaluation/utils.py#L14). Please write comments under the format of the selected programming language to achieve better results.
+ * CodeGeeX2 is a base model, which is not instruction-tuned for chatting. It can do tasks like code completion/translation/explanation. To try the instruction-tuned version in CodeGeeX plugins ([VS Code](https://marketplace.visualstudio.com/items?itemName=aminer.codegeex), [Jetbrains](https://plugins.jetbrains.com/plugin/20587-codegeex)).
+ * Programming languages can be controlled by adding `language tag`, e.g., `# language: Python`. The format should be respected to ensure performance, full list can be found [here](https://github.com/THUDM/CodeGeeX2/blob/main/evaluation/utils.py#L14). Please write comments under the format of the selected programming language to achieve better results.
* If the GPU doesn't support `bfloat16` format, it will cause incorrect output. Please convert the model to `float16` format:
```python
model = AutoModel.from_pretrained("THUDM/codegeex2-6b", trust_remote_code=True).half().cuda()
@@ -151,7 +151,7 @@ CodeGeeX2 is more friendly to deployment than the previous generation. Thanks to
| :--------------: | :-------: | :-----: | :----: |
| CodeGeeX-13B | 26\.9 GB | 14\.7 GB | - |
| **CodeGeeX2-6B** | 13\.1 GB | 8\.2 GB | 5\.5 GB |
- > Based on PyTorch 2.0, using `torch.nn.functional.scaled_dot_product_attention` for effecient attention mechanism。
+ > Based on PyTorch 2.0, using `torch.nn.functional.scaled_dot_product_attention` for efficient attention mechanism。
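
For reference, the fused attention that this note points to is the standard PyTorch 2.0 call. A minimal standalone sketch (the tensor shapes are illustrative, not taken from the model):

```python
import torch
import torch.nn.functional as F

# Illustrative (batch, heads, sequence, head_dim) shapes.
q = torch.randn(1, 8, 128, 64)
k = torch.randn(1, 8, 128, 64)
v = torch.randn(1, 8, 128, 64)

# Fused attention: PyTorch dispatches to FlashAttention or a memory-efficient
# kernel when hardware and dtypes allow it, otherwise falls back to the math path.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([1, 8, 128, 64])
```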

### Acceleration

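Before moving to the next file, a minimal prompting sketch that ties together the ❗️Attention notes above: the `# language:` tag plus the `float16` fallback. It follows the Hugging Face `transformers` pattern referenced in the README excerpt; the prompt text and generation arguments are illustrative assumptions:

```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/codegeex2-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/codegeex2-6b", trust_remote_code=True)
model = model.half().cuda()  # float16 fallback for GPUs without bfloat16 support
model = model.eval()

# The language tag steers the target language; the comment below it follows
# Python comment syntax, as the README recommends.
prompt = "# language: Python\n# write a bubble sort function\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
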
8 changes: 4 additions & 4 deletions demo/run_demo.py
@@ -219,7 +219,7 @@ def add_code_generation_args(parser):


def set_random_seed(seed):
"""Set random seed for reproducability."""
"""Set random seed for reproducibility."""
random.seed(seed)
numpy.random.seed(seed)
torch.manual_seed(seed)
@@ -312,8 +312,8 @@ def predict(
* 按照所选编程语言的格式写注释可以获得更好的结果,请参照下方给出的示例。

This is the DEMO for CodeGeeX2. Please note that:
- * CodeGeeX2 is a base model, which is not instruction-tuned for chatting. It can do tasks like code completion/translation/explaination. To try the instruction-tuned version in CodeGeeX plugins ([VS Code](https://marketplace.visualstudio.com/items?itemName=aminer.codegeex), [Jetbrains](https://plugins.jetbrains.com/plugin/20587-codegeex)).
- * Programming languages can be controled by adding `language tag`, e.g., `# language: Python`. The format should be respected to ensure performance, full list can be found [here](https://github.com/THUDM/CodeGeeX2/blob/main/evaluation/utils.py#L14).
+ * CodeGeeX2 is a base model, which is not instruction-tuned for chatting. It can do tasks like code completion/translation/explanation. To try the instruction-tuned version in CodeGeeX plugins ([VS Code](https://marketplace.visualstudio.com/items?itemName=aminer.codegeex), [Jetbrains](https://plugins.jetbrains.com/plugin/20587-codegeex)).
+ * Programming languages can be controlled by adding `language tag`, e.g., `# language: Python`. The format should be respected to ensure performance, full list can be found [here](https://github.com/THUDM/CodeGeeX2/blob/main/evaluation/utils.py#L14).
* Write comments under the format of the selected programming language to achieve better results, see examples below.
""")

@@ -348,7 +348,7 @@ def predict(
clr.click(fn=lambda value: gr.update(value=""), inputs=clr, outputs=prompt)

gr_examples = gr.Examples(examples=examples, inputs=[prompt, lang],
label="Example Inputs (Click to insert an examplet it into the input box)",
label="Example Inputs (Click to insert an example into the input box)",
examples_per_page=20)
if not args.auth:
demo.launch(server_name=args.listen, server_port=args.port)
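
For context on the `gr.Examples` and `demo.launch(...)` lines in the hunks above, here is a pared-down Gradio wiring of the same shape; the `predict` stub and the concrete argument values are placeholders, not the demo's real logic:

```python
import gradio as gr

def predict(prompt, lang):
    # Placeholder for the model call made by the real demo.
    return f"# language: {lang}\n{prompt}"

with gr.Blocks() as demo:
    prompt = gr.Textbox(label="Prompt", lines=8)
    lang = gr.Dropdown(choices=["Python", "C++", "Java"], value="Python", label="Language")
    output = gr.Textbox(label="Output", lines=8)
    gr.Button("Generate").click(fn=predict, inputs=[prompt, lang], outputs=output)
    gr.Examples(examples=[["# write a bubble sort function", "Python"]],
                inputs=[prompt, lang],
                label="Example Inputs (Click to insert an example into the input box)",
                examples_per_page=20)

demo.launch(server_name="127.0.0.1", server_port=7860)
```
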
2 changes: 1 addition & 1 deletion evaluation/utils.py
@@ -170,7 +170,7 @@


def set_random_seed(seed):
"""Set random seed for reproducability."""
"""Set random seed for reproducibility."""
random.seed(seed)
numpy.random.seed(seed)
torch.manual_seed(seed)
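
The `set_random_seed` hunk is cut off by the diff view. A self-contained sketch of the same reproducibility helper, where the CUDA line is a common addition assumed here rather than quoted from the file:

```python
import random

import numpy
import torch


def set_random_seed(seed):
    """Set random seed for reproducibility."""
    random.seed(seed)
    numpy.random.seed(seed)
    torch.manual_seed(seed)
    # Commonly also seeded when running on GPU (assumption, not quoted from the file).
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)


set_random_seed(42)
```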