Commit a05f9da

[Feature] Make dump-eval-details default behavior (open-compass#1999)
* Update * update * update
1 parent fd82bea commit a05f9da

3 files changed, with 10 additions and 5 deletions
docs/en/user_guides/experimentation.md

Lines changed: 1 addition & 1 deletion
@@ -57,7 +57,7 @@ The parameter explanation is as follows:
 - `-w`: Specify the working path, default is `./outputs/default`.
 - `-l`: Enable status reporting via Lark bot.
 - `--dry-run`: When enabled, inference and evaluation tasks will be dispatched but won't actually run for debugging.
-- `--dump-eval-details`: When enabled,evaluation under the `results` folder will include more details, such as the correctness of each sample.
+- `--dump-eval-details`: Default enabled,evaluation under the `results` folder will include more details, such as the correctness of each sample. Set `--dump-eval-details False` to disable it。

 Using run mode `-m all` as an example, the overall execution flow is as follows:


docs/zh_cn/user_guides/experimentation.md

Lines changed: 1 addition & 1 deletion
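@@ -57,7 +57,7 @@ python run.py $EXP {--slurm | --dlc | None} [-p PARTITION] [-q QUOTATYPE] [--deb
 - `-w`: Specify the working path; the default is `./outputs/default`
 - `-l`: Enable status reporting via the Feishu (Lark) bot.
 - `--dry-run`: When enabled, inference and evaluation tasks are dispatched but not actually run, which is convenient for debugging;
-- `--dump-eval-details`: When enabled, the evaluation results under `results` will include more detailed evaluation information, such as whether each sample is correct.
+- `--dump-eval-details`: Enabled by default. The evaluation results under `results` will include more detailed evaluation information, such as whether each sample is correct. To disable it, set `--dump-eval-details False`

 Taking run mode `-m all` as an example, the overall execution flow is as follows:
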

opencompass/cli/main.py

Lines changed: 8 additions & 3 deletions
@@ -119,8 +119,11 @@ def parse_args():
     parser.add_argument(
         '--dump-eval-details',
         help='Whether to dump the evaluation details, including the '
-        'correctness of each sample, bpb, etc.',
-        action='store_true',
+        'correctness of each sample, bpb, etc. Defaults to True.',
+        nargs='?',
+        const=True,
+        default=True,
+        type=lambda x: False if x and x.lower() == 'false' else True
     )
     parser.add_argument(
         '--dump-extract-rate',
@@ -233,7 +236,6 @@ def parse_custom_dataset_args(custom_dataset_parser):

 def main():
     args = parse_args()
-
     if args.num_gpus is not None:
         raise ValueError('The `--num-gpus` argument is deprecated, please use '
                          '`--hf-num-gpus` to describe number of gpus used for '
@@ -350,6 +352,9 @@ def main():
         if args.dlc or args.slurm or cfg.get('eval', None) is None:
             fill_eval_cfg(cfg, args)
         if args.dump_eval_details:
+            logger.warning('Default to dump eval details, it might take extra'
+                           'space to save all the evaluation details. '
+                           'Set --dump-eval-details False to skip the details dump')
             cfg.eval.runner.task.dump_details = True
         if args.dump_extract_rate:
             cfg.eval.runner.task.cal_extract_rate = True
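
The new `--dump-eval-details` definition relies on how `argparse` combines `nargs='?'`, `const`, `default`, and `type`. The following is a minimal standalone sketch, not the project's actual `parse_args()`: `build_parser`, `parse_flag`, and `WARNING_TEXT` are hypothetical names introduced here for illustration, and only the argument settings and message literals are copied from the diff. It suggests that the flag stays enabled unless the literal value `false` (in any letter case) is passed, and that the added warning message is assembled from adjacent string literals with no space between `extra` and `space`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical stand-in for parse_args(); defines only the changed flag."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '--dump-eval-details',
        nargs='?',     # the value after the flag is optional
        const=True,    # used when the flag is given without a value
        default=True,  # used when the flag is absent
        # Same conversion as in the diff: only the string 'false'
        # (case-insensitive) maps to False; anything else maps to True.
        type=lambda x: False if x and x.lower() == 'false' else True,
    )
    return parser


def parse_flag(argv):
    """Return the parsed value of --dump-eval-details for an argv list."""
    return build_parser().parse_args(argv).dump_eval_details


# The three adjacent literals from the new logger.warning call. Python joins
# adjacent string literals without inserting a separator.
WARNING_TEXT = ('Default to dump eval details, it might take extra'
                'space to save all the evaluation details. '
                'Set --dump-eval-details False to skip the details dump')


if __name__ == '__main__':
    for argv in ([],
                 ['--dump-eval-details'],
                 ['--dump-eval-details', 'False'],
                 ['--dump-eval-details', 'no']):
        print(argv, '->', parse_flag(argv))
    # Show whether the first two literals ran together.
    print('extraspace' in WARNING_TEXT)
```

Under these settings, values such as `no` or `0` do not disable the dump, since the converter checks only for the string `false`. If the missing space in the message is unintended, a trailing space after `extra` in the first literal would restore it.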
