A Tour of Agents / Lesson 7 of 9
Policy = Guardrails
Why ChatGPT refuses harmful requests. Two gates, a few lines each.
Framework parallel: Guardrails AI, NeMo Guardrails, LangChain output parsers — rules checked before and after the LLM.
You've seen this: ask ChatGPT to help with something harmful and it refuses. Ask Claude to generate malware and it declines. That's not the LLM being "smart" — it's policy. Rules checked before and after the LLM runs.
The L3 loop trusts the user and the LLM completely. Production agents can't afford that. Policy adds two gates: an input gate that screens the user's request before the LLM ever sees it, and an output gate that screens the LLM's response before the user sees it.
Framework parallel: Guardrails AI and NeMo Guardrails implement exactly these two gates. OpenAI's moderation endpoint is an input gate. The architecture is identical.
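Stripped of any framework, the pattern is just a wrapper around the core loop. A minimal sketch, assuming nothing from any library (the name `guarded` and the rule format are illustrative, not from Guardrails AI or NeMo):

```python
def guarded(core, input_rules, output_rules):
    """Wrap a core function with an input gate and an output gate.
    A rule returns True to pass, or a string saying why it blocked."""
    def run(request):
        for rule in input_rules:       # gate 1: before the model runs
            verdict = rule(request)
            if verdict is not True:
                return f"BLOCKED: {verdict}"
        response = core(request)       # the trusted part (the L3 loop)
        for rule in output_rules:      # gate 2: after the model runs
            verdict = rule(response)
            if verdict is not True:
                return f"REDACTED: {verdict}"
        return response
    return run

# Toy core: just echo the request back.
echo = guarded(lambda r: r, [lambda t: "rm" not in t or "no rm commands"], [])
print(echo("hello"))     # hello
print(echo("rm -rf /"))  # BLOCKED: no rm commands
```

Everything in the lesson below is this shape, with the L3 agent loop as the `core`.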
Step 1: Tools + ask_llm
Same L3 setup. The loop itself won't change — policy wraps it.
tools = {"add": lambda a, b: a + b, "upper": lambda text: text.upper()}
TOOL_DEFS = [
    {"type": "function", "function": {"name": "add", "description": "Add two numbers",
        "parameters": {"type": "object",
                       "properties": {"a": {"type": "number"}, "b": {"type": "number"}}}}},
    {"type": "function", "function": {"name": "upper", "description": "Uppercase text",
        "parameters": {"type": "object",
                       "properties": {"text": {"type": "string"}}}}},
]
async def ask_llm(messages):
    resp = await pyfetch(f"{LLM_BASE_URL}/chat/completions",
                         method="POST",
                         headers={"Authorization": f"Bearer {LLM_API_KEY}",
                                  "Content-Type": "application/json"},
                         body=json.dumps({"model": LLM_MODEL, "messages": messages, "tools": TOOL_DEFS}))
    return json.loads(await resp.string())["choices"][0]["message"]

Step 2: Define the gates
Each gate is a list of functions. A function returns True to pass, or a string explaining why it blocked. check_gate runs all rules and short-circuits on the first failure.
This is the same pattern behind ChatGPT's content filter and Claude's safety system — just without the complexity. Adding a rule = appending a lambda. Removing one = deleting it. No config files, no YAML.
INPUT_RULES = [
    lambda text: "delete" not in text.lower() or "Input blocked: no delete commands",
    lambda text: "drop" not in text.lower() or "Input blocked: no drop commands",
    lambda text: len(text) < 500 or "Input blocked: message too long",
]

OUTPUT_RULES = [
    lambda text: "password" not in text.lower() or "Output redacted: contains password",
    lambda text: "secret" not in text.lower() or "Output redacted: contains secret",
]
def check_gate(text, rules, gate_name):
    for rule in rules:
        result = rule(text)
        if result is not True:
            trace("policy_block", f"{gate_name}: {result}")
            return False, result
    trace("policy_check", f"{gate_name}: PASS")
    return True, None

Step 3: Wrap the L3 loop
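Before wiring the gates into the loop, it's worth seeing check_gate behave on its own. A self-contained run with two of the rules, and trace (defined in an earlier lesson) stubbed to a no-op:

```python
def trace(event, detail):
    # no-op stub; the real trace() comes from an earlier lesson
    pass

INPUT_RULES = [
    lambda text: "delete" not in text.lower() or "Input blocked: no delete commands",
    lambda text: len(text) < 500 or "Input blocked: message too long",
]

def check_gate(text, rules, gate_name):
    for rule in rules:
        result = rule(text)
        if result is not True:
            trace("policy_block", f"{gate_name}: {result}")
            return False, result
    trace("policy_check", f"{gate_name}: PASS")
    return True, None

print(check_gate("add 2 and 3", INPUT_RULES, "INPUT"))
# (True, None)
print(check_gate("DELETE everything", INPUT_RULES, "INPUT"))
# (False, 'Input blocked: no delete commands')
```

Note the short-circuit: the first failing rule wins, and later rules never run.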
Input gate runs first — if it fails, the LLM never sees the request. The L3 loop runs in the middle, unchanged. Output gate runs last — if it fails, the user sees a redaction notice instead of the response.
async def agent(task, max_turns=5):
    # --- INPUT GATE ---
    ok, reason = check_gate(task, INPUT_RULES, "INPUT")
    if not ok:
        return f"BLOCKED: {reason}"

    # --- L3 LOOP (unchanged) ---
    messages = [
        {"role": "system", "content": "Use tools to answer. Be concise."},
        {"role": "user", "content": task},
    ]
    for turn in range(max_turns):
        trace("llm_call", f"Turn {turn + 1}")
        msg = await ask_llm(messages)
        if not msg.get("tool_calls"):
            response = msg.get("content") or ""
            # --- OUTPUT GATE ---
            ok, reason = check_gate(response, OUTPUT_RULES, "OUTPUT")
            if not ok:
                return f"REDACTED: {reason}"
            trace("agent_end", response)
            return response
        messages.append(msg)
        for tc in msg["tool_calls"]:
            name = tc["function"]["name"]
            args = json.loads(tc["function"]["arguments"])
            result = tools[name](**args)
            trace("tool_result", f"{name}({args}) → {result}")
            messages.append({"role": "tool", "tool_call_id": tc["id"], "content": str(result)})
    return "Max turns reached"

Try it
A blocked request costs zero tokens, because it never reaches the LLM. That's the input gate's real value.
print(f">> {await agent(USER_INPUT)}")