Evaluation of BLOOM on code generation using the HumanEval benchmark.
This generates code for the 164 prompts in the benchmark, with 200 generations per problem. The experiment is run three times, once for each of three temperatures: 0.2, 0.6, and 0.8.
`transformers` and `accelerate` are installed from source along with `datasets`; we also clone the HumanEval benchmark so it can be used offline.

```bash
bash setup.sh
```
The following commands generate code for each experiment/temperature; you can increase the batch size if you have enough memory. Each run outputs two files, `generations.json` and `references.json`, placed in the corresponding `output_file`. You can also change `MODEL_CKPT` to a local repository to load the model offline.
```bash
export HF_DATASETS_OFFLINE=1
OUTPUT_file1=code_generations_exp1
OUTPUT_file2=code_generations_exp2
OUTPUT_file3=code_generations_exp3
MODEL_CKPT=bigscience/bloom
echo "Using $MODEL_CKPT as the model checkpoint; if needed, change it to a local repository"

python code_eval.py --model_ckpt $MODEL_CKPT \
    --batch_size 1 \
    --do_sample True \
    --temperature 0.2 \
    --top_p 0.95 \
    --n_samples 200 \
    --output_file $OUTPUT_file1

python code_eval.py --model_ckpt $MODEL_CKPT \
    --batch_size 1 \
    --do_sample True \
    --temperature 0.6 \
    --top_p 0.95 \
    --n_samples 200 \
    --output_file $OUTPUT_file2

python code_eval.py --model_ckpt $MODEL_CKPT \
    --batch_size 1 \
    --do_sample True \
    --temperature 0.8 \
    --top_p 0.95 \
    --n_samples 200 \
    --output_file $OUTPUT_file3
```
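The sampling flags above map directly onto `transformers` generation arguments. The following is an illustrative sketch of that mapping only, not the repository's `code_eval.py`; the prompt and the `max_new_tokens` value are placeholder assumptions.

```python
# Illustrative sketch: how the CLI flags (--do_sample, --temperature, --top_p,
# --n_samples) correspond to transformers' generate() arguments.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_ckpt = "bigscience/bloom"  # same value as $MODEL_CKPT above
tokenizer = AutoTokenizer.from_pretrained(model_ckpt)
model = AutoModelForCausalLM.from_pretrained(model_ckpt)  # the full BLOOM checkpoint needs accelerate / multi-GPU memory

# Placeholder HumanEval-style prompt (the real prompts come from the benchmark).
prompt = 'def add(a, b):\n    """Return the sum of a and b."""\n'
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(
    **inputs,
    do_sample=True,          # --do_sample True
    temperature=0.2,         # --temperature (0.2 / 0.6 / 0.8 depending on the experiment)
    top_p=0.95,              # --top_p 0.95
    num_return_sequences=4,  # the real runs draw 200 samples per problem (--n_samples 200), in batches
    max_new_tokens=128,      # assumed value, not taken from the commands above
)
completions = tokenizer.batch_decode(outputs, skip_special_tokens=True)
```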
All experiments must be placed in the `output_file` folder. Setting `HF_ALLOW_CODE_EVAL=1` allows executing the code generated by the model. This prints the `pass@k` scores for each experiment and saves them as JSON files in `output_file`.
```bash
pip install datasets transformers
python run_evaluation.py --HF_ALLOW_CODE_EVAL 1 --output_file bloom --num_tasks 164
```
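For context, `pass@k` is the unbiased estimator from the Codex/HumanEval paper: with `n` generations per problem of which `c` pass the unit tests, pass@k = 1 - C(n-c, k)/C(n, k), averaged over the 164 problems. A minimal sketch of that estimator is shown below for reference; the actual scoring here is done by `run_evaluation.py`, not by this snippet.

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single problem.

    n: total generations sampled for the problem (200 in these experiments)
    c: number of those generations that pass the unit tests
    k: the k in pass@k (1, 10, or 100)
    """
    if n - c < k:
        return 1.0
    # 1 - C(n-c, k) / C(n, k), computed as a numerically stable product
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# Example: out of 200 samples for a problem, 17 pass the tests.
print(pass_at_k(200, 17, 1), pass_at_k(200, 17, 10), pass_at_k(200, 17, 100))
```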
As the final score, we take the best result across the three experiments for each of the `pass@1`, `pass@10`, and `pass@100` scores.
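The best-of-three selection can be scripted; the sketch below is hypothetical and assumes each experiment's scores were saved as a JSON dictionary keyed by `pass@1`, `pass@10`, and `pass@100` (adjust the file paths to match your `output_file` folders).

```python
import json

# Hypothetical paths: point these at the score files produced for each experiment.
score_files = [
    "code_generations_exp1/scores.json",
    "code_generations_exp2/scores.json",
    "code_generations_exp3/scores.json",
]
runs = [json.load(open(path)) for path in score_files]

# Best result across the three temperatures, taken per metric.
final = {metric: max(run[metric] for run in runs) for metric in ("pass@1", "pass@10", "pass@100")}
print(final)
```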
Note: If you are evaluating the existing generations from bloom in this repo, please set `replace_eos=True` in `run_evaluation.py`.