Anthropic, a leading AI research company, has introduced a new tool called Claude Code Security that uses advanced language models to detect complex software vulnerabilities that traditional scanners often miss. This development has sent shockwaves through the cybersecurity industry, with shares of major security firms tumbling in the wake of the announcement. The key innovation of Claude Code Security is its ability to understand code the way a human security researcher would, rather than relying solely on pattern matching against known vulnerability signatures. Anthropic's models can track how data flows through an application and spot subtle business logic errors or access control issues that evade conventional scanning tools. The system also has multiple verification stages to minimize false positives before presenting findings to human analysts. In testing, Anthropic's latest Opus 4.6 model was able to uncover previously undetected, high-severity vulnerabilities in widely used open-source software - some of which had gone unnoticed for decades. Anthropic's Frontier Red Team, a group dedicated to stress-testing the company's AI systems, found that the model's capabilities in this area have significantly improved. While Claude Code Security is designed to assist security teams rather than replace them, the tool's ability to autonomously analyze codebases at scale has understandably rattled the cybersecurity industry. Anthropic is rolling out the feature cautiously, making it available initially as a limited research preview for enterprise customers. Separately, Anthropic has been making strides in developing AI agents that can interact with computer interfaces, a capability known as "Computer Use." This involves models that can process screen images, understand the state of a system, and issue mouse and keyboard commands to carry out tasks. The company has released educational materials on building towards this functionality, which could enable AI assistants to autonomously navigate and operate software applications. Additionally, Anthropic has introduced "Agent Skills," a framework that allows specialized capabilities to be packaged and deployed across different AI agents. This modular approach enables agents to become domain experts by equipping them with relevant skills, from data analysis to code generation and review. These advancements in agentic AI capabilities, from autonomous vulnerability detection to computer interaction and specialized skills, highlight Anthropic's efforts to push the boundaries of what language models can achieve. As these technologies continue to evolve, they will likely have significant implications for the future of software development, cybersecurity, and human-AI collaboration.
OpenAI Is Asking Contractors to Upload Work From Past Jobs to Evaluate the Performance of AI Agents
OpenAI Grapples with Prompt Injection Attacks and the Limitations of Current AI Models In a series of recent developments, OpenAI has been confronting the challenges of securing its AI agents against prompt injection attacks, while also acknowledging the fundamental limitations of current AI models in learning from mistakes. Firstly, OpenAI has released a security update for the browser agent in its ChatGPT Atlas system, which includes a newly adversarially trained model and enhanced security measures. This comes in response to a new class of prompt injection attacks discovered through the company's internal automated red-teaming efforts. As the agent mode in ChatGPT Atlas can view web pages and perform actions like a human user, it has become an easy target for such attacks, which aim to manipulate the agent's behavior. According to The Decoder, OpenAI admits that prompt injections may never be fully solved, casting doubt on the "agentic AI vision" that the company has been pursuing. The attack surface is virtually unlimited, as any text-based input that an AI model reads can potentially be a target for malicious instructions. Separately, former OpenAI researcher Jerry Tworek has highlighted another fundamental problem with current AI models: their inability to learn from mistakes. Tworek, who worked on OpenAI's reasoning models, believes that unless AI systems can "work themselves through difficulties and get unstuck on solving a problem," they cannot be considered true Artificial General Intelligence (AGI). He describes AI training as a "fundamentally fragile process," in contrast with the robust and self-stabilizing nature of human learning. Meanwhile, OpenAI CEO Sam Altman has expressed optimism about working with the incoming Trump administration, stating that the U.S. and its allies must build the best AI infrastructure to maintain a technological edge over China. Altman believes President-elect Trump will be "very good" at helping to achieve this goal, which he sees as "one of these unusually important moments in the history of technology." In a related development, OpenAI has hired Peter Steinberger, the creator of the viral OpenClaw AI assistant, to spearhead the company's efforts in developing "the next generation of personal agents." This move comes amid an intensifying race among tech giants to create more advanced and secure AI agents. Overall, these events highlight the ongoing challenges and complexities faced by leading AI companies like OpenAI as they strive to push the boundaries of artificial intelligence while addressing critical security and technical limitations.
Distilling the tool-using capabilities of large language models (LLMs) into smaller, more efficient small language models (SLMs) is a key challenge for their practical application. The predominant approach, supervised fine-tuning (SFT), suffers from poor generalization as it trains models to imitate a static set of teacher trajectories rather than learn a robust methodology. While reinforcement learning (RL) offers an alternative, the standard RL using sparse rewards fails to effectively guide SLMs, causing them to struggle with inefficient exploration and adopt suboptimal strategies. To address these distinct challenges, we propose MENTOR, a framework that synergistically combines RL with teacher-guided distillation. Instead of simple imitation, MENTOR employs an RL-based process to learn a more generalizable policy through exploration. In addition, to solve the problem of reward sparsity, it uses a teacher's reference trajectory to construct a dense, composite teacher-guided reward that provides fine-grained guidance. Extensive experiments demonstrate that MENTOR significantly improves the cross-domain generalization and strategic competence of SLMs compared to both SFT and standard sparse-reward RL baselines.
Gradient-based data attribution methods, such as influence functions, are critical for understanding the impact of individual training samples without requiring repeated model retraining. However, their scalability is often limited by the high computational and memory costs associated with per-sample gradient computation. In this work, we propose GraSS, a novel gradient compression algorithm and its variants FactGraSS for linear layers specifically, that explicitly leverage the inherent sparsity of per-sample gradients to achieve sub-linear space and time complexity. Extensive experiments demonstrate the effectiveness of our approach, achieving substantial speedups while preserving data influence fidelity. In particular, FactGraSS achieves up to 165% faster throughput on billion-scale models compared to the previous state-of-the-art baselines. Our code is publicly available at https://github.com/TRAIS-Lab/GraSS.
Large language models (LLMs) have demonstrated promising performance in generating diagnostic conclusions from imaging findings, thereby supporting radiology reporting, trainee education, and quality control. However, systematic guidance on how to optimize prompt design across different clinical contexts remains underexplored. Moreover, a comprehensive and standardized framework for assessing the trustworthiness of LLM-generated radiology reports is yet to be established. This study aims to enhance the trustworthiness of LLM-generated liver MRI reports by introducing a Multi-Dimensional Credibility Assessment (MDCA) framework and providing guidance on institution-specific prompt optimization. The proposed framework is applied to evaluate and compare the performance of several advanced LLMs, including Kimi-K2-Instruct-0905, Qwen3-235B-A22B-Instruct-2507, DeepSeek-V3, and ByteDance-Seed-OSS-36B-Instruct, using the SiliconFlow platform.