Using Gain-of-Function Techniques to Protect Against Tomorrow's Autonomous Systems
With recent advances in AI, autonomous systems may soon be able to develop novel exploits that no human analyst has anticipated. Traditional defenses, built around signatures, heuristics, and known indicators of compromise, are insufficient against agents that can reason, adapt, and generate entirely new attack strategies. Drawing on biology’s gain-of-function research as a model, this article explores how controlled experiments on AI agents can surface hidden failure modes and unforeseen attack vectors, enabling the design of resilient “digital vaccines” before such threats emerge in the wild.
AI Guardrails
Gain-of-Function
Cybersecurity
Science
AI Safety