Unfreezing the vision encoder during pretraining gives very poor results #430
Unanswered
liuheng0111 asked this question in Q&A
Replies: 0 comments
Using the LLaVA-1.5 architecture, in the first pretraining stage I trained on a large amount of caption data with both the vision encoder and mlp_2x unfrozen, with a global batch size of 192. After many training steps I evaluated the model and found it had no captioning ability at all, and the loss was much higher than when only mlp_2x was unfrozen. An ablation showed that once the vision encoder was unfrozen, its ability to extract image features degraded badly. If the vision encoder is to be unfrozen, how should training be set up? Yi-VL stage 1 uses Laion-400M data for pretraining with both the ViT and mlp_2x unfrozen — in that setup, does the vision encoder still extract image features well?
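One common mitigation (not confirmed by this thread) is to give the unfrozen vision tower a much smaller learning rate than the projector, so the pretrained CLIP features are not destroyed early in training. A minimal PyTorch sketch, where `vision_encoder` and `mlp_projector` are hypothetical stand-ins for the real CLIP ViT and the mlp2x projector:

```python
import torch
import torch.nn as nn

# Placeholder modules: in LLaVA-1.5 these would be the CLIP vision tower
# and the mlp2x_gelu projector, not tiny linear layers.
vision_encoder = nn.Linear(8, 8)
mlp_projector = nn.Sequential(nn.Linear(8, 8), nn.GELU(), nn.Linear(8, 8))

# Separate optimizer parameter groups: the pretrained vision tower gets a
# learning rate one to two orders of magnitude below the randomly
# initialized projector (the exact values here are illustrative).
optimizer = torch.optim.AdamW(
    [
        {"params": vision_encoder.parameters(), "lr": 2e-6},
        {"params": mlp_projector.parameters(), "lr": 1e-4},
    ],
    weight_decay=0.0,
)
```

With a single group at lr=1e-4, the ViT weights move far faster than they did during CLIP pretraining, which is consistent with the feature-degradation the ablation observed.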
My learning rate was also 1e-4, mlp_2x has no layer normalization, training is in bf16 with no gradient clipping, and the batch size is 192. Do you train with bfloat16? If so, how do you set gradient clipping for bfloat16 training — does DeepSpeed's bfloat16 mode not support that parameter?
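For reference, in DeepSpeed `gradient_clipping` is a top-level config key, separate from the `fp16`/`bf16` blocks, so it can be combined with bf16 training. A minimal config sketch (the clipping value of 1.0 is illustrative):

```json
{
  "train_batch_size": 192,
  "gradient_clipping": 1.0,
  "bf16": {
    "enabled": true
  }
}
```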