AI Model Updates & Developments | Microsoft, OpenAI, Anthropic & More

Microsoft announced a multimodal version of Phi-4, named Phi-4-reasoning-vision-15B. The model combines the SigLIP-2 encoder with the Phi-4 reasoning architecture and adapts its inference to the task, automatically choosing how much step-by-step reasoning to apply based on complexity. For mathematical or logical problems it reasons in depth, while for simple requests such as image description or OCR it answers without extended deliberation.

Beyond standard visual analysis and understanding tasks, Microsoft has also aimed the model at AI agents that control computers: it can interpret screen content, recognize interactive elements, and select suitable actions within a graphical user interface.

All model weights are available on Hugging Face and Microsoft Foundry under the MIT license.

OpenAI is in the final stages of developing a bidirectional audio model. The system will process sound continuously in the background, recognize user responses instantly, and adjust its replies on the fly. This should make dialogue more natural: the model can respond appropriately when the interlocutor interrupts or changes the topic mid-sentence.

This is especially important in complex interaction scenarios, such as when a virtual support operator needs to adapt to rapidly changing conversation contexts without losing track of the discussion.

The prototype still has issues in long sessions. While these are being fixed, the public release of the model has been postponed until at least Q2 of this year.

Anthropic is attempting to retain its contract with the U.S. Department of Defense after negotiations stalled and the agency threatened to exclude the company from military sector contracts. The talks were reopened at the initiative of Emil Michael, who last week publicly called Anthropic’s head a “liar with a god complex.”

The company is now seeking a compromise and hopes to keep its eligibility for major government tenders. The situation is made worse by the fact that OpenAI recently signed a contract with the military. Inside the company, Amodei called competitors’ and officials’ statements on this matter “blatant lies.”

Lightricks announced LTX Desktop, a local video editor built around the LTX-2.3 model. The application combines nonlinear editing tools with the new model's generative capabilities, allowing users to create videos from text, images, or sound. Users work on a familiar timeline and can regenerate unsuccessful segments with the Retake feature.

Full functionality requires Windows with an NVIDIA GPU (minimum 32 GB VRAM), 32 GB RAM, and 160 GB of free disk space. Users on Mac or on less powerful PCs can only access the cloud mode via API.
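The requirements above amount to a simple eligibility check. The sketch below is only an illustration of that rule and is not part of LTX Desktop: the `Machine` type and `pick_mode` function are hypothetical names, and the thresholds are the figures quoted above.

```python
from dataclasses import dataclass

# Thresholds taken from the requirements quoted in the text.
MIN_VRAM_GB = 32
MIN_RAM_GB = 32
MIN_FREE_DISK_GB = 160


@dataclass
class Machine:
    """Hypothetical description of a user's machine (not an LTX Desktop type)."""
    os_name: str        # e.g. "windows", "macos"
    gpu_vendor: str     # e.g. "nvidia", "amd", "apple"
    vram_gb: float
    ram_gb: float
    free_disk_gb: float


def pick_mode(m: Machine) -> str:
    """Return "local" if every stated requirement is met, otherwise "cloud"."""
    meets_all = (
        m.os_name.lower() == "windows"
        and m.gpu_vendor.lower() == "nvidia"
        and m.vram_gb >= MIN_VRAM_GB
        and m.ram_gb >= MIN_RAM_GB
        and m.free_disk_gb >= MIN_FREE_DISK_GB
    )
    return "local" if meets_all else "cloud"


workstation = Machine("windows", "nvidia", 32, 64, 500)
laptop = Machine("macos", "apple", 0, 32, 500)
print(pick_mode(workstation))  # local
print(pick_mode(laptop))       # cloud
```

Under these assumptions, a machine that misses any single threshold falls back to the cloud mode, which matches the all-or-nothing wording of the requirements.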

The project code is fully open source, and using the LTX-2.3 model locally is free for enthusiasts and small companies with annual revenue under $10 million.

Following the recent departures of key developers from Qwen, major players are moving to capitalize on the situation at their competitor. Omar Sanseviero of Google DeepMind posted an open invitation on X to specialists from Alibaba’s team: he is looking for talented engineers with experience working on Qwen family models to help expand his team's open-source ecosystem. He said he is open to being contacted directly by professionals considering a job change.

The move signals DeepMind’s aim to strengthen its position in large language model development and highlights the ongoing competition for AI talent.
