Top Stories

Anthropic has announced a $50 billion investment in American AI infrastructure, signaling its commitment to the technology's future at a time when the AI industry faces increasing scrutiny and calls for regulation. According to a report by Sequoia Capital, the AI revolution is poised to be even more transformative than the Industrial Revolution, with the potential to create a "cognitive assembly line" that could outpace previous technological advancements. Anthropic's CEO, Dario Amodei, has acknowledged the need for responsible and thoughtful regulation of AI, saying he is "deeply uncomfortable" with decisions about the technology's future being made by a few tech leaders.

Anthropic has emerged as a leading rival to OpenAI and Google in the race to build advanced AI models. Its Claude family of models has gained significant traction in the enterprise market, putting the company on track to hit an annualized run rate of close to $10 billion by the end of 2025. That growth has come without the massive capital expenditures seen elsewhere in the industry, as Anthropic has found ways to train and run its models more efficiently.

The company's stance on AI safety, however, has put it at odds with the Pentagon, which is seeking unrestricted access to its AI technology. Anthropic has refused to grant the military unfettered access, insisting on safeguards against the use of its AI for autonomous weapons and domestic surveillance. The standoff has led the Pentagon to consider scaling back or ending the partnership, since other companies such as OpenAI, Google, and xAI have been more willing to work with the military without such restrictions.

To further its commitment to AI safety, Anthropic has donated $20 million to a super PAC focused on AI regulation and safety, setting up a fight with rival OpenAI, which has backed a super PAC that has raised over $125 million ahead of the US midterm elections. The industry's growing political spending reflects rising public concern over AI's risks: a 2025 Gallup poll found that 80% of Americans support rules for AI safety even if they slow innovation.

Anthropic's infrastructure investment and its stance on responsible AI development highlight the complex, evolving landscape of the AI industry. As the technology advances, the debate over its regulation and the role of tech companies in shaping its future is likely to intensify, with Anthropic positioning itself as a leader in the pursuit of safe and ethical AI.

Claude Code: A Highly Agentic Coding Assistant

Headline: "Claude Code: A Highly Agentic Coding Assistant Pushes the Boundaries of AI-Powered Programming" In the rapidly evolving world of AI-powered coding assistants, a new tool called Claude Code is making waves with its unprecedented level of autonomy and agency. Developed by Anthropic, Claude Code is described as a "highly agentic" assistant that can autonomously plan, execute, and improve code with minimal human input. According to a report in The Batch by Andrew Ng, Claude Code represents a significant advancement in the capabilities of AI coding tools. Unlike earlier assistants that primarily helped with occasional coding questions and completion, Claude Code can now work in parallel with developers, running multiple instances to tackle different parts of a codebase. However, this increased autonomy requires new best practices to effectively coordinate the AI's efforts. In a new short course, Anthropic's Head of Technical Education, Elie Schoppik, outlines key tips for using Claude Code, such as providing clear context, specifying relevant files, and connecting the assistant to servers. These practices are demonstrated through examples like exploring a chatbot codebase, analyzing ecommerce data, and creating a web app. Meanwhile, DeepMind has also been exploring the potential of AI-powered code security with its "CodeMender" agent. This system leverages advanced AI models to automatically find and patch software vulnerabilities, with the goal of helping developers focus on building good software rather than constantly chasing security issues. According to the DeepMind blog, CodeMender has already upstreamed 72 security fixes to open-source projects, including some with millions of lines of code. The system operates by using AI reasoning capabilities to debug and fix complex vulnerabilities, while also validating the changes to ensure they don't cause regressions. As these AI coding assistants continue to evolve, experts caution that developers must maintain a critical eye and strong technical skills to ensure the agents are used effectively. As Sean Goedecke notes, "If you're good at reviewing code, you'll be good at using tools like Claude Code, Codex, or the Copilot coding agent." The ability to spot when the AI is going down the wrong track is crucial to avoiding complex, unnecessary solutions. Overall, the emergence of highly agentic coding assistants like Claude Code and CodeMender represents a significant step forward in the integration of AI into the software development process. While these tools hold great promise, their effective use will require a careful balance of human expertise and AI capabilities.

Sea Ltd inks deal with Google to deploy agentic AI across its ecosystem
Zuckerberg: Meta building city-sized AI data center, going on $65 billion spending spree
AI is everywhere except in the data, suggesting it will enhance labor in some sectors rather than replace workers in all sectors, top economist says
Redefine what’s possible for your business with Microsoft AI

New Stories

Research Papers

MENTOR: A Reinforcement Learning Framework for Enabling Tool Use in Small Models via Teacher-Optimized Rewards

ChangSu Choi, Hoyun Song, Dongyeon Kim, WooHyeon Jung, Minkyung Cho, Sunjin Park, NohHyeob Bae, Seona Yu, KyungTae Lim

Distilling the tool-using capabilities of large language models (LLMs) into smaller, more efficient small language models (SLMs) is a key challenge for their practical application. The predominant approach, supervised fine-tuning (SFT), suffers from poor generalization because it trains models to imitate a static set of teacher trajectories rather than learn a robust methodology. While reinforcement learning (RL) offers an alternative, standard RL with sparse rewards fails to guide SLMs effectively, leaving them to explore inefficiently and adopt suboptimal strategies. To address these distinct challenges, we propose MENTOR, a framework that synergistically combines RL with teacher-guided distillation. Instead of simple imitation, MENTOR employs an RL-based process to learn a more generalizable policy through exploration. In addition, to solve the problem of reward sparsity, it uses a teacher's reference trajectory to construct a dense, composite teacher-guided reward that provides fine-grained guidance. Extensive experiments demonstrate that MENTOR significantly improves the cross-domain generalization and strategic competence of SLMs compared to both SFT and standard sparse-reward RL baselines.
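To make the reward design concrete, here is a minimal sketch of a dense, teacher-guided composite reward of the kind the abstract describes: a sparse outcome term blended with a shaping term that scores how closely the student's tool-call sequence tracks the teacher's reference trajectory. The similarity measure, the weighting, and every name below are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a dense, teacher-guided composite reward in the spirit of MENTOR.
# Function names, the similarity measure, and the weighting are illustrative only.

from difflib import SequenceMatcher

def trajectory_similarity(student_calls: list[str], teacher_calls: list[str]) -> float:
    """Dense shaping signal: how closely the student's tool-call sequence
    matches the teacher's reference trajectory (0.0 to 1.0)."""
    if not teacher_calls:
        return 0.0
    return SequenceMatcher(None, student_calls, teacher_calls).ratio()

def composite_reward(student_calls: list[str],
                     teacher_calls: list[str],
                     task_solved: bool,
                     alpha: float = 0.7) -> float:
    """Blend the sparse outcome reward with the dense teacher-guided term.
    alpha weights final task success; (1 - alpha) weights process guidance."""
    sparse = 1.0 if task_solved else 0.0
    dense = trajectory_similarity(student_calls, teacher_calls)
    return alpha * sparse + (1.0 - alpha) * dense

# Example rollout: the student reproduces part of the teacher's tool sequence
# but fails the task, so it still receives a nonzero learning signal.
teacher = ["search(query)", "calculator(expr)", "finalize(answer)"]
student = ["search(query)", "finalize(answer)"]
print(composite_reward(student, teacher, task_solved=False))
```

Under a scheme like this, a rollout that fails the task but partially follows the teacher's trajectory still produces a learning signal, which is the kind of fine-grained guidance the abstract credits with mitigating the sparse-reward exploration problem.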

GraSS: Scalable Data Attribution with Gradient Sparsification and Sparse Projection

Pingbang Hu, Joseph Melkonian, Weijing Tang, Han Zhao, Jiaqi W. Ma

Gradient-based data attribution methods, such as influence functions, are critical for understanding the impact of individual training samples without requiring repeated model retraining. However, their scalability is often limited by the high computational and memory costs associated with per-sample gradient computation. In this work, we propose GraSS, a novel gradient compression algorithm, and its variant FactGraSS, designed specifically for linear layers, both of which explicitly leverage the inherent sparsity of per-sample gradients to achieve sub-linear space and time complexity. Extensive experiments demonstrate the effectiveness of our approach, achieving substantial speedups while preserving data influence fidelity. In particular, FactGraSS achieves up to 165% faster throughput on billion-scale models compared to previous state-of-the-art baselines. Our code is publicly available at https://github.com/TRAIS-Lab/GraSS.
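As a rough illustration of the two ingredients named in the title, the sketch below sparsifies a per-sample gradient to its top-k entries and then projects it with a random matrix, reading only the rows that correspond to surviving entries. This is a simplified stand-in under stated assumptions (in particular, the actual method would not materialize a dense projection matrix); names and shapes are not taken from the released implementation.

```python
# Illustrative sketch (not the authors' code) of gradient sparsification followed
# by a projection that only touches the rows selected by the sparsification step.

import numpy as np

def topk_sparsify(grad: np.ndarray, k: int):
    """Keep the k largest-magnitude entries; return (indices, values)."""
    idx = np.argpartition(np.abs(grad), -k)[-k:]
    return idx, grad[idx]

def sparse_project(idx: np.ndarray, vals: np.ndarray, proj: np.ndarray) -> np.ndarray:
    """Project a sparse gradient: only k rows of the projection matrix are read,
    so the cost scales with k rather than with the full gradient dimension d."""
    return vals @ proj[idx]          # (k,) @ (k, m) -> (m,)

rng = np.random.default_rng(0)
d, m, k = 100_000, 64, 256           # gradient dim, projection dim, kept entries
proj = rng.standard_normal((d, m)) / np.sqrt(m)

g1, g2 = rng.standard_normal(d), rng.standard_normal(d)
z1 = sparse_project(*topk_sparsify(g1, k), proj)
z2 = sparse_project(*topk_sparsify(g2, k), proj)

# Influence-style attribution scores can then be approximated from the
# compressed gradients, e.g. via inner products.
print(float(z1 @ z2))
```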

Large language models (LLMs) have demonstrated promising performance in generating diagnostic conclusions from imaging findings, thereby supporting radiology reporting, trainee education, and quality control. However, systematic guidance on how to optimize prompt design across different clinical contexts remains underexplored. Moreover, a comprehensive and standardized framework for assessing the trustworthiness of LLM-generated radiology reports is yet to be established. This study aims to enhance the trustworthiness of LLM-generated liver MRI reports by introducing a Multi-Dimensional Credibility Assessment (MDCA) framework and providing guidance on institution-specific prompt optimization. The proposed framework is applied to evaluate and compare the performance of several advanced LLMs, including Kimi-K2-Instruct-0905, Qwen3-235B-A22B-Instruct-2507, DeepSeek-V3, and ByteDance-Seed-OSS-36B-Instruct, using the SiliconFlow platform.
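For a sense of what institution-specific prompt optimization might look like in practice, here is a hypothetical sketch of assembling a findings-to-impression prompt from local reporting conventions. The template, field names, and example finding are invented for illustration and do not come from the study, and the MDCA scoring itself is not shown.

```python
# Hypothetical illustration of institution-specific prompt construction for
# findings-to-impression generation; all strings below are invented examples.

from string import Template

INSTITUTION_STYLE = (
    "Report impressions as a numbered list, use LI-RADS terminology where "
    "applicable, and flag any finding that requires follow-up imaging."
)

PROMPT = Template(
    "You are assisting with liver MRI reporting.\n"
    "Institution conventions: $style\n\n"
    "Imaging findings:\n$findings\n\n"
    "Write the diagnostic impression only."
)

def build_prompt(findings: str, style: str = INSTITUTION_STYLE) -> str:
    """Assemble the institution-specific prompt sent to the chosen LLM."""
    return PROMPT.substitute(style=style, findings=findings.strip())

print(build_prompt("1.8 cm arterial-phase hyperenhancing lesion in segment VI."))
```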

AI Jobs
