Google Gemini Embeddings | Multimodal Processing & Insights

Google has introduced a new version of its Gemini Embedding model, now with multimodal embeddings!

This new model can natively process videos up to 2 minutes long, handle multiple PDF pages, and process audio alongside text. It is available both in the free tier and via the paid API. The embeddings are structured like a nesting doll (Matryoshka-style): a shorter leading slice of the full vector can be used on its own as a smaller embedding, although it is less precise.
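To make the nesting-doll idea concrete, here is a minimal sketch of how a shortened embedding is typically derived from a full one: keep the leading components, then rescale to unit length so cosine similarity still behaves. The vectors are synthetic toy data, and the helper names `truncate_and_normalize` and `cosine` are hypothetical; nothing here calls the Gemini API.

```python
import math


def truncate_and_normalize(vec, dim):
    """Keep the first `dim` components of `vec` and rescale to unit length."""
    if not 0 < dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in head]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Synthetic 8-dimensional "embeddings" (assumption: real ones are far larger).
doc = [0.5, 0.5, 0.5, 0.5, 0.1, -0.1, 0.05, 0.0]
query = [0.5, 0.5, 0.4, 0.6, -0.1, 0.1, 0.0, 0.05]

full_score = cosine(doc, query)
short_score = cosine(
    truncate_and_normalize(doc, 4),
    truncate_and_normalize(query, 4),
)

print(f"full (8 dims):  {full_score:.4f}")
print(f"short (4 dims): {short_score:.4f}")
```

The two scores generally differ, which is the precision trade-off: the shorter vector is cheaper to store and compare but carries less information.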

Unfortunately, Google’s service prices have risen again. Text processing now costs about $0.20 per million tokens, while the price for multimodal data has increased significantly: video processing, for example, now costs $12 per million tokens (approximately 15,000 frames). Google is taking advantage of the fact that it has few competitors in this segment, since other major companies have yet to ship comparable updates. OpenAI, for instance, last updated its embedding models in January 2024, in the same release that updated GPT-3.5 Turbo and GPT-4 Turbo.
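For a rough sense of what these prices mean per request, the arithmetic can be sketched as follows. The rates are the approximate figures quoted in this post, not an official price list, and the one-frame-per-second sampling rate in the usage lines is purely an assumption for illustration; the function names are hypothetical.

```python
# Approximate rates quoted in this post (not an official price list).
TEXT_USD_PER_MILLION_TOKENS = 0.20
VIDEO_USD_PER_MILLION_TOKENS = 12.00
FRAMES_PER_MILLION_VIDEO_TOKENS = 15_000  # "approximately 15,000 frames"


def text_cost_usd(tokens):
    """Estimated cost of embedding `tokens` text tokens."""
    return tokens / 1_000_000 * TEXT_USD_PER_MILLION_TOKENS


def video_cost_usd(frames):
    """Estimated cost of embedding `frames` video frames."""
    millions_of_tokens = frames / FRAMES_PER_MILLION_VIDEO_TOKENS
    return millions_of_tokens * VIDEO_USD_PER_MILLION_TOKENS


# Assumption for illustration only: a 2-minute clip sampled at 1 frame/second.
clip_frames = 2 * 60 * 1

print(f"2-minute clip:      ${video_cost_usd(clip_frames):.4f}")
print(f"10,000 text tokens: ${text_cost_usd(10_000):.4f}")
```

Under these figures a single frame works out to about $0.0008 (12 / 15,000), so the total scales linearly with however many frames are actually sampled from the video.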

This matters because there are few widely available alternatives on the market.

Created with n8n:
https://cutt.ly/n8n

Created with syllaby:
https://cutt.ly/syllaby
