Replies: 1 comment
-
Personally, I'd check out how Google built and trained the ViT-B/16 or ViT-B/32 models that CLIP in this code is set up to use.
-
Not sure if I'm using the right lingo, but is there a way to see the words/phrases the model knows, so I can better choose the words I give it to describe the image?
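For anyone landing here with the same question: as far as I know, CLIP has no fixed list of phrases it "knows". Its tokenizer has a vocabulary of sub-word pieces, and any text gets split into those pieces. Here is a minimal sketch of how you might search such a vocabulary. The `find_tokens` helper and the toy vocabulary are made up for illustration. With OpenAI's `clip` package I believe the real mapping is available as `SimpleTokenizer().encoder`, but whether this repo uses that exact package is an assumption on my part.

```python
def find_tokens(vocab, query):
    """Return the vocabulary entries that contain `query`, sorted alphabetically.

    `vocab` is any mapping from token string to token id.
    """
    needle = query.lower()
    return sorted(token for token in vocab if needle in token.lower())


# Toy stand-in for a real tokenizer vocabulary (hypothetical entries).
# In CLIP-style BPE vocabularies, "</w>" marks a piece that ends a word.
toy_vocab = {"cat</w>": 0, "cats</w>": 1, "dog</w>": 2, "cat": 3}

matches = find_tokens(toy_vocab, "cat")
print(matches)

# Assumed usage with OpenAI's clip package (not run here):
#   from clip.simple_tokenizer import SimpleTokenizer
#   matches = find_tokens(SimpleTokenizer().encoder, "sunset")
```

Keep in mind that a word appearing in the vocabulary only means the tokenizer can represent it as one piece. It doesn't tell you how well the model understands it, so trying a few candidate phrasings against your image and comparing the scores is probably still the most reliable guide.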