
truncated_output_suffixes & #32

Closed
thistleknot opened this issue Apr 29, 2024 · 2 comments


thistleknot commented Apr 29, 2024

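import json

# `tokenizer` is assumed to be the tokenizer already loaded earlier in the notebook.
with open("data/all_truncated_outputs.json") as f:
    output_suffixes = json.load(f)

# Every proper token prefix of every suffix, decoded back to a string.
truncated_output_suffixes = [
    tokenizer.convert_tokens_to_string(tokens[:i])
    for tokens in (tokenizer.tokenize(s) for s in output_suffixes)
    for i in range(1, len(tokens))
]

# Same as above, restricted to the first 512 suffixes.
truncated_output_suffixes_512 = [
    tokenizer.convert_tokens_to_string(tokens[:i])
    for tokens in (tokenizer.tokenize(s) for s in output_suffixes[:512])
    for i in range(1, len(tokens))
]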

The files referenced here do not appear to exist in the repo, and I need them for the MVE.

Another example is true_facts.json (I did not find anything in the paper that mentions facts or a .json file).
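For anyone without the data file, here is a rough sketch of what the comprehension quoted above computes: every proper token prefix of each string, decoded back to text. The `truncated_prefixes` helper and the whitespace split/join pair are hypothetical stand-ins for the real tokenizer, used only so the snippet runs without a model.

```python
def truncated_prefixes(strings, tokenize, detokenize):
    """Return every proper token prefix of every string, as decoded text.

    `tokenize` and `detokenize` are stand-ins (assumptions) for
    tokenizer.tokenize and tokenizer.convert_tokens_to_string.
    """
    return [
        detokenize(tokens[:i])
        for tokens in (tokenize(s) for s in strings)
        for i in range(1, len(tokens))  # stops before the full string
    ]


# Whitespace "tokenizer" for illustration only; a real tokenizer splits differently.
prefixes = truncated_prefixes(["the cat sat", "hello"], str.split, " ".join)
print(prefixes)  # ['the', 'the cat']
```

Note that a single-token string such as "hello" contributes nothing, because `range(1, 1)` is empty.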

@thistleknot
Author

I created a script that I think mimics what you were showcasing:

https://gist.github.com/thistleknot/b936477ee82ce608b3c7f47381f6b15d

@vgel
Owner

vgel commented May 24, 2024

Make sure you're running the notebook with the cwd set to the notebooks folder; the data folder is notebooks/data. Alternatively, you can just copy the data folder to wherever you need it. You can find the current cwd with import os; print(os.getcwd()) and copy the data folder there. It's pretty small.
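A minimal sketch of that check, in case it helps. The helper names `find_data_file` and `copy_data_dir` are made up for illustration and are not part of the repo; the only assumption taken from this thread is that the notebook reads files from a `data` folder relative to the cwd.

```python
import os
import shutil
from pathlib import Path
from typing import Optional


def find_data_file(name: str, cwd: Optional[Path] = None) -> Optional[Path]:
    """Return the path to data/<name> under cwd, or None if it is missing."""
    base = Path.cwd() if cwd is None else Path(cwd)
    candidate = base / "data" / name
    return candidate if candidate.is_file() else None


def copy_data_dir(source: Path, cwd: Optional[Path] = None) -> Path:
    """Copy the data folder into cwd unless a data folder already exists there."""
    base = Path.cwd() if cwd is None else Path(cwd)
    target = base / "data"
    if not target.exists():
        shutil.copytree(source, target)
    return target


# Show where relative paths like "data/..." will be resolved from.
print("cwd:", os.getcwd())
print("found:", find_data_file("all_truncated_outputs.json"))
```

If the second line prints `None`, either start the notebook from the notebooks folder or call `copy_data_dir(Path("path/to/notebooks/data"))`, with the placeholder path replaced by wherever your checkout lives.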

vgel closed this as completed May 24, 2024