they attach an MLP projector that maps the visual tokens from FastViTHD into the LLM's embedding space
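to make the projection step concrete, here's a rough sketch in plain Python. The post only says "an MLP", so the two-layer shape, the GELU activation, and the tiny dimensions (`VISION_DIM`, `LLM_DIM`, `NUM_TOKENS`) are all assumptions picked for illustration, and `MLPProjector` is a hypothetical name, not FastVLM's actual code. The point it shows: the projector changes the width of each token, not the number of tokens.

```python
import math
import random


def gelu(x: float) -> float:
    # tanh approximation of GELU (the activation choice is an assumption)
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))


def linear(vec, weight, bias):
    # weight holds one row per output feature
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weight, bias)]


def init_linear(rng, in_dim, out_dim):
    # small random weights, zero bias; stands in for trained parameters
    scale = 1.0 / math.sqrt(in_dim)
    weight = [[rng.uniform(-scale, scale) for _ in range(in_dim)] for _ in range(out_dim)]
    return weight, [0.0] * out_dim


class MLPProjector:
    """Hypothetical two-layer projector: vision width -> LLM width."""

    def __init__(self, vision_dim, llm_dim, seed=0):
        rng = random.Random(seed)
        self.w1, self.b1 = init_linear(rng, vision_dim, llm_dim)
        self.w2, self.b2 = init_linear(rng, llm_dim, llm_dim)

    def __call__(self, visual_tokens):
        projected = []
        for token in visual_tokens:  # each token is projected independently
            hidden = [gelu(h) for h in linear(token, self.w1, self.b1)]
            projected.append(linear(hidden, self.w2, self.b2))
        return projected


# Toy sizes, far smaller than any real encoder or LLM.
VISION_DIM, LLM_DIM, NUM_TOKENS = 8, 16, 4

rng = random.Random(1)
visual_tokens = [[rng.gauss(0.0, 1.0) for _ in range(VISION_DIM)] for _ in range(NUM_TOKENS)]

projector = MLPProjector(VISION_DIM, LLM_DIM)
llm_inputs = projector(visual_tokens)

print(len(llm_inputs), len(llm_inputs[0]))  # 4 16
```

since the projector keeps the token count as it is, any savings in sequence length have to come from the vision encoder itself.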
the result: way fewer tokens (about 4× fewer than FastViT and 16× fewer than ViT‑L/14 at 336‑pixel resolution). I mean, that's a big drop in token count and complexity, while
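for a sense of scale, here's the arithmetic behind those ratios as a small Python sketch. The only hard number is the ViT‑L/14 patch grid at 336 pixels (class token not counted); the FastViTHD and FastViT counts are just what the quoted 16× and 4× ratios would imply if taken at face value, not measured figures, and `vit_patch_tokens` is a helper name made up for this example.

```python
def vit_patch_tokens(image_size: int, patch_size: int) -> int:
    """Patch tokens for a square image and square patches (no class token)."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be a multiple of patch_size")
    per_side = image_size // patch_size
    return per_side * per_side


# ViT-L/14 at 336 px: a 24 x 24 grid of patches.
vit_l14_tokens = vit_patch_tokens(336, 14)

# Ratios quoted in the post, treated as exact for illustration only.
FEWER_THAN_VIT_L14 = 16
FEWER_THAN_FASTVIT = 4

fastvithd_tokens = vit_l14_tokens // FEWER_THAN_VIT_L14   # implied, not measured
fastvit_tokens = fastvithd_tokens * FEWER_THAN_FASTVIT    # implied, not measured

# Self-attention compares every token with every other token, so the
# pairwise work over visual tokens shrinks with the square of the ratio.
attention_pair_ratio = (vit_l14_tokens ** 2) // (fastvithd_tokens ** 2)

print(vit_l14_tokens, fastvit_tokens, fastvithd_tokens)  # 576 144 36
print(attention_pair_ratio)  # 256
```

the squared term covers only the visual tokens attending to each other; the text tokens in the prompt add to the sequence length too, so the end-to-end saving will be smaller than this.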
GasFeeLover · 14h ago
That's just how it is, what is there to brag about?
ser_we_are_early · 14h ago
It seems FastVLM is really amazing.
BlockchainBard · 14h ago
Impressive! I was shocked by the number of tokens.