Google Gemini Embeddings | Multimodal Processing & Insights

Google has introduced a new version of its Gemini Embedding model, now with multimodal embeddings!

This new model can natively process videos up to 2 minutes long, handle multi-page PDFs, and take in audio alongside text. It is available both on the free tier and through the paid API. The embeddings are structured like a nesting doll: a shorter leading piece of the full vector can be used as an embedding on its own, although it is less precise than the full one.
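To make the nesting-doll idea concrete, here is a minimal sketch in plain Python. It assumes the nesting works the way Matryoshka-style embeddings usually do: you keep the first few dimensions of the full vector and rescale them to unit length. The helper names and the toy four-dimensional vectors are made up for illustration; they are not part of any Google API and not real model output.

```python
import math

def truncate_embedding(vec, dims):
    """Keep the first `dims` components and rescale them to unit length.

    Hypothetical helper: assumes a Matryoshka-style embedding where a
    prefix of the full vector is itself a usable, coarser embedding.
    """
    if not 0 < dims <= len(vec):
        raise ValueError("dims must be between 1 and len(vec)")
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in head]

def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for two full-size embeddings (not real model output).
full_a = [0.5, 0.5, 0.5, 0.5]
full_b = [0.5, 0.5, -0.5, 0.5]

# Shorter "inner dolls": the first two dimensions only.
short_a = truncate_embedding(full_a, 2)
short_b = truncate_embedding(full_b, 2)

print("full similarity: ", cosine(full_a, full_b))
print("short similarity:", cosine(short_a, short_b))
```

In this toy case the two vectors differ only in the third dimension, so the truncated versions look more alike than the full ones do. That is the trade-off in miniature: shorter vectors are cheaper to store and compare, but they lose some of the detail that tells items apart.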

Unfortunately, Google’s service prices have risen again. Text processing now costs about $0.20 per million tokens, and multimodal data has become significantly more expensive: video processing, for example, now costs $12 per million tokens (approximately 15,000 frames). Google is taking advantage of the fact that it has few competitors in this segment, since other major companies have yet to ship updates on this scale. OpenAI, for instance, last updated its embedding models in January 2024, in the same release that brought improvements to GPT-3.5 Turbo and GPT-4 Turbo.
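The quoted prices translate into rough per-item costs with a little arithmetic. The sketch below uses only the figures from this post ($0.20 and $12 per million tokens, and roughly 15,000 frames per million video tokens). The function names are made up for illustration, and the one-frame-per-second sampling rate in the example is an assumption, not something Google has confirmed here.

```python
# Prices quoted in this post (approximate, USD).
TEXT_USD_PER_MILLION_TOKENS = 0.20
VIDEO_USD_PER_MILLION_TOKENS = 12.00
# The post equates one million video tokens with roughly 15,000 frames.
FRAMES_PER_MILLION_VIDEO_TOKENS = 15_000

# Derived figure: approximate cost of embedding a single video frame.
USD_PER_VIDEO_FRAME = VIDEO_USD_PER_MILLION_TOKENS / FRAMES_PER_MILLION_VIDEO_TOKENS

def text_cost_usd(tokens):
    """Estimated cost of embedding `tokens` text tokens."""
    return tokens / 1_000_000 * TEXT_USD_PER_MILLION_TOKENS

def video_cost_usd(frames):
    """Estimated cost of embedding `frames` video frames."""
    return frames / FRAMES_PER_MILLION_VIDEO_TOKENS * VIDEO_USD_PER_MILLION_TOKENS

# Example: a 2-minute clip, ASSUMING one sampled frame per second (120 frames).
clip_frames = 2 * 60 * 1
print(f"per frame:     ${USD_PER_VIDEO_FRAME:.4f}")
print(f"2-minute clip: ${video_cost_usd(clip_frames):.3f}")
print(f"1M text tokens: ${text_cost_usd(1_000_000):.2f}")
```

Under those assumptions a single frame costs about $0.0008 and a full 2-minute clip about $0.096, so the per-item price is small; it is the volume of a large video archive that makes the bill add up.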

All of this matters because there are still so few widely available alternatives on the market.

Created with n8n:
https://cutt.ly/n8n

Created with syllaby:
https://cutt.ly/syllaby
