Microsoft Unveils Maia 200 AI Chip, Claiming Performance Lead Over Amazon and Google

In a move to bolster its AI capabilities, Microsoft has unveiled the Maia 200, a new AI chip the company claims outperforms offerings from Amazon and Google. The announcement comes amid a growing arms race in AI hardware, as companies vie for an edge in the rapidly evolving field of artificial intelligence.

According to reports, the Maia 200 is designed to deliver superior performance and efficiency compared with its rivals. Artificial Analysis, an independent AI testing organization, has run benchmarks that place the Maia 200 at the top of its Intelligence Index, a metric that evaluates large language models' performance on real-world use cases. The Maia 200 is said to have scored 51 on the Intelligence Index, ahead of the competition: Claude Opus 4.5 with reasoning enabled scored 49, Gemini 3 Pro Preview set to high reasoning scored 48, and GLM-4.7 was the top open-weights large language model with a score of 42.

Artificial Analysis' methodology involves feeding identical prompts to various AI models at different reasoning and temperature settings, then evaluating the responses on their ability to produce useful documents, spreadsheets, and diagrams, and on their accuracy in answering technical questions without hallucinating.

The reports also highlight Microsoft's efforts to drive AI adoption within its own workforce. The company has launched "Camp AIR," a three-week boot camp that trains employees to integrate AI tools into their daily workflows. The initiative underscores the challenges many companies face in implementing AI effectively, even among tech-savvy employees.

While the Maia 200's performance claims are impressive, the reports also note ongoing concerns around data theft and model cloning.
Google and OpenAI have both raised concerns about distillation attacks, in which bad actors attempt to extract the internal logic of AI models to create cheaper clones, potentially skipping billions of dollars in training costs. As the AI landscape continues to evolve, the competition for hardware and software supremacy is intensifying. Microsoft's Maia 200 chip represents the company's latest bid for dominance in the field, but the broader challenges of AI adoption and data security remain key considerations for businesses and researchers alike.
Anthropic's Claude Code: Revolutionizing the Future of Software Development

In the rapidly evolving world of AI-powered coding tools, Anthropic's Claude Code has emerged as a game-changer, reshaping the way software is developed. According to engineers in Silicon Valley, the buzz around the technology has reached a fever pitch in recent months.

At the helm of Claude Code is Boris Cherny, who explains that the team's goal was to create "the simplest possible thing." The impact of their creation, however, has been anything but simple. Cherny acknowledges that early versions of Claude Code often stumbled, but Anthropic built the tool with an eye toward the future of AI capabilities rather than the present. That foresight has paid off: several developers say AI coding products have reached an inflection point, particularly with the launch of Anthropic's latest AI model, Claude Opus 4.5. Kian Katanforoosh, CEO of Workera, says his company recently switched to Claude Code after testing various tools, and he believes the latest version represents a "step-function improvement in coding abilities."

The business of AI coding agents has taken off, with Anthropic announcing last year that Claude Code had reached $1 billion in annual recurring revenue. To further expand the tool's capabilities, Anthropic has introduced a new feature called Cowork, which brings Claude Code's agent-based workflow to people who don't write code. According to Anthropic, Cowork allows Claude to read, edit, and create files on its own, shifting the AI from answering questions to completing tasks. The company has also added new skills for creating documents and presentations, and has integrated Claude into Chrome, allowing the AI to tackle tasks that require browser access. Notably, Cherny revealed that the Cowork feature was built in under two weeks, with Claude Code writing the majority of the code.
This demonstrates the tool's ability to accelerate software development, even for Anthropic's own products.

In addition to Cowork, Anthropic has introduced Claude Code Security, a new integrated tool aimed at security teams and open-source maintainers. The feature uses the latest Claude Opus 4.6 model to identify complex vulnerabilities, such as business logic flaws and access control issues, that conventional static analysis tools often miss.

As the AI coding landscape continues to evolve, Claude Code has emerged as a powerful and versatile tool that is reshaping the future of software development. With capabilities spanning autonomous code generation and security analysis, it is poised to transform how developers and non-programmers alike approach their work.
Distilling the tool-using capabilities of large language models (LLMs) into smaller, more efficient small language models (SLMs) is a key challenge for their practical application. The predominant approach, supervised fine-tuning (SFT), suffers from poor generalization because it trains models to imitate a static set of teacher trajectories rather than learn a robust methodology. While reinforcement learning (RL) offers an alternative, standard RL with sparse rewards fails to effectively guide SLMs, causing them to struggle with inefficient exploration and to adopt suboptimal strategies. To address these distinct challenges, we propose MENTOR, a framework that synergistically combines RL with teacher-guided distillation. Instead of simple imitation, MENTOR employs an RL-based process to learn a more generalizable policy through exploration. In addition, to solve the problem of reward sparsity, it uses a teacher's reference trajectory to construct a dense, composite teacher-guided reward that provides fine-grained guidance. Extensive experiments demonstrate that MENTOR significantly improves the cross-domain generalization and strategic competence of SLMs compared to both SFT and standard sparse-reward RL baselines.
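The core idea of a dense, composite teacher-guided reward can be sketched in a few lines. This is an illustrative toy example based only on the abstract, not the paper's actual reward design: the function names, the similarity measure, and the weighting `alpha` are all assumptions. It shows how a sparse outcome reward can be augmented with a dense term comparing the student's tool-call trajectory against a teacher reference trajectory.

```python
def sparse_task_reward(success: bool) -> float:
    """Standard sparse RL reward: signal only on final task success."""
    return 1.0 if success else 0.0

def trajectory_similarity(student_calls, teacher_calls) -> float:
    """Fraction of the teacher's tool calls the student reproduced in
    order (a crude subsequence-matching proxy for step-level guidance)."""
    if not teacher_calls:
        return 0.0
    matched = 0
    for call in student_calls:
        if matched < len(teacher_calls) and call == teacher_calls[matched]:
            matched += 1
    return matched / len(teacher_calls)

def composite_reward(success, student_calls, teacher_calls, alpha=0.5):
    """Dense composite reward: sparse outcome term plus a dense
    teacher-guided process term weighted by alpha (hypothetical value)."""
    return (sparse_task_reward(success)
            + alpha * trajectory_similarity(student_calls, teacher_calls))

# A student that reproduces 2 of 3 teacher tool calls but fails the task
# still receives a nonzero learning signal instead of a flat zero.
r = composite_reward(False,
                     ["search(query)", "calculator(2+2)"],
                     ["search(query)", "calculator(2+2)", "finish()"])
```

Under a purely sparse reward, this failed trajectory would contribute nothing; the dense term is what gives the SLM fine-grained guidance during exploration.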
Gradient-based data attribution methods, such as influence functions, are critical for understanding the impact of individual training samples without requiring repeated model retraining. However, their scalability is often limited by the high computational and memory costs associated with per-sample gradient computation. In this work, we propose GraSS, a novel gradient compression algorithm, and its variant FactGraSS, specialized for linear layers, which explicitly leverage the inherent sparsity of per-sample gradients to achieve sub-linear space and time complexity. Extensive experiments demonstrate the effectiveness of our approach, achieving substantial speedups while preserving data influence fidelity. In particular, FactGraSS achieves up to 165% faster throughput on billion-scale models compared to the previous state-of-the-art baselines. Our code is publicly available at https://github.com/TRAIS-Lab/GraSS.
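The sparsity-exploiting idea described in the abstract can be illustrated with a minimal sketch. This is not the GraSS implementation (see the linked repository for that); the function names and the sparsify-then-project scheme below are assumptions used only to show why touching just the nonzero coordinates yields sub-linear cost in the gradient dimension.

```python
import numpy as np

def sparsify(grad: np.ndarray, k: int):
    """Keep the k largest-magnitude entries of a per-sample gradient;
    return their indices and values."""
    idx = np.argpartition(np.abs(grad), -k)[-k:]
    return idx, grad[idx]

def sketch(idx: np.ndarray, vals: np.ndarray, dim: int) -> np.ndarray:
    """CountSketch-style projection into `dim` buckets. Only the k
    surviving coordinates are touched, so the cost is O(k), not
    O(len(grad)). Bucket and sign are hashed from the coordinate index
    so every sample is projected consistently."""
    buckets = idx % dim
    sign_bits = ((idx * 2654435761) >> 8) & 1  # cheap deterministic hash
    signs = np.where(sign_bits == 0, 1.0, -1.0)
    out = np.zeros(dim)
    np.add.at(out, buckets, signs * vals)      # accumulate with collisions
    return out

# A typically sparse per-sample gradient: 3 nonzeros out of 10,000 entries.
grad = np.zeros(10_000)
grad[[3, 1_234, 9_876]] = [5.0, -2.0, 0.5]
idx, vals = sparsify(grad, k=2)        # keeps the entries at 3 and 1_234
compressed = sketch(idx, vals, dim=64) # 10,000-dim gradient -> 64-dim sketch
```

Inner products between such sketches approximate inner products between the original gradients, which is what influence-style attribution ultimately consumes; the compression makes storing and comparing per-sample gradients tractable at scale.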
Large language models (LLMs) have demonstrated promising performance in generating diagnostic conclusions from imaging findings, thereby supporting radiology reporting, trainee education, and quality control. However, systematic guidance on how to optimize prompt design across different clinical contexts remains underexplored. Moreover, a comprehensive and standardized framework for assessing the trustworthiness of LLM-generated radiology reports is yet to be established. This study aims to enhance the trustworthiness of LLM-generated liver MRI reports by introducing a Multi-Dimensional Credibility Assessment (MDCA) framework and providing guidance on institution-specific prompt optimization. The proposed framework is applied to evaluate and compare the performance of several advanced LLMs, including Kimi-K2-Instruct-0905, Qwen3-235B-A22B-Instruct-2507, DeepSeek-V3, and ByteDance-Seed-OSS-36B-Instruct, using the SiliconFlow platform.