Define a hidden-golden eval partition

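A hidden-golden partition is a set of eval scenarios that the agent under test can never read. It keeps the metric honest because the agent cannot tune itself against scenarios it has never seen. The agent process has no IAM access to the S3 objects that define the partition; only the host process that runs the eval can load them.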

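```typescript
import { defineSuite } from "@ahamie/eval";
// Hypothetical local modules: the agent under test and a host-side S3 loader.
import { agent } from "./agent";
import { loadFromS3WithProtectedRole } from "./host/load-golden";

export default defineSuite({
  id: "summarizer.suite",
  controller: agent,
  threshold: 0.8, // suite-level pass threshold
  hiddenGolden: {
    refs: ["s3://my-brain-eval-private/golden/summarizer/*"],
    threshold: 0.85, // stricter threshold for the hidden-golden partition
  },
  loadGoldenScenarios: async (refs) => {
    // The host process has IAM access to this prefix.
    // The agent process does not.
    return loadFromS3WithProtectedRole(refs);
  },
  scenarios: [
    /* observable scenarios: the agent CAN see these */
  ],
});
```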
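The two `threshold` values suggest that each partition is scored against its own bar. Below is a minimal sketch of such a gate, assuming each threshold applies to the mean score of its own partition and the suite passes only when both do. `ScenarioResult`, `gateSuite`, and the averaging rule are hypothetical and not taken from `@ahamie/eval`, whose actual scoring may differ.

```typescript
// Hypothetical result shape: one score in [0, 1] per scenario, tagged by partition.
type Partition = "observable" | "hidden_golden";

interface ScenarioResult {
  partition: Partition;
  score: number;
}

interface GateReport {
  observableMean: number;
  goldenMean: number;
  passed: boolean;
}

function mean(xs: number[]): number {
  // An empty partition scores 0 so it cannot pass by default.
  return xs.length === 0 ? 0 : xs.reduce((a, b) => a + b, 0) / xs.length;
}

// Assumption: `threshold` gates the observable mean and
// `hiddenGolden.threshold` gates the golden mean; both must pass.
function gateSuite(
  results: ScenarioResult[],
  threshold: number,
  goldenThreshold: number,
): GateReport {
  const scoresFor = (p: Partition) =>
    results.filter((r) => r.partition === p).map((r) => r.score);
  const observableMean = mean(scoresFor("observable"));
  const goldenMean = mean(scoresFor("hidden_golden"));
  return {
    observableMean,
    goldenMean,
    passed: observableMean >= threshold && goldenMean >= goldenThreshold,
  };
}

const report = gateSuite(
  [
    { partition: "observable", score: 0.9 },
    { partition: "hidden_golden", score: 0.8 },
  ],
  0.8,
  0.85,
);
console.log(report.passed); // false: the golden mean 0.8 is below 0.85
```

Under this assumption, a high observable score cannot compensate for a weak golden score, because the two means are gated separately rather than pooled.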

Two prefixes, two roles:

| Prefix | Role | Agent can read? |
|---|---|---|
| `s3://my-brain-eval-public/observable/*` | `eval-read` | yes |
| `s3://my-brain-eval-private/golden/*` | `eval-host` | no |
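One way to express the two roles is as IAM identity policies that each grant `s3:GetObject` on exactly one prefix. The sketch below builds those policy documents from the bucket names in the table. The `ROLE_PREFIXES` map, `readPolicyFor`, and `canRead` are illustrative assumptions, and `canRead` is a simplified trailing-wildcard match, not full IAM evaluation (it ignores Deny statements, conditions, bucket policies, and `s3:ListBucket`).

```typescript
// Simplified IAM policy document shape (only the fields used here).
interface PolicyStatement {
  Effect: "Allow" | "Deny";
  Action: string[];
  Resource: string[];
}

interface PolicyDocument {
  Version: "2012-10-17";
  Statement: PolicyStatement[];
}

// Role -> the single S3 prefix it may read, taken from the table above.
const ROLE_PREFIXES: Record<string, string> = {
  "eval-read": "s3://my-brain-eval-public/observable/*",
  "eval-host": "s3://my-brain-eval-private/golden/*",
};

// Convert an s3:// URI into the ARN form IAM uses for object resources.
function toObjectArn(uri: string): string {
  return uri.replace(/^s3:\/\//, "arn:aws:s3:::");
}

function readPolicyFor(role: string): PolicyDocument {
  const prefix = ROLE_PREFIXES[role];
  if (prefix === undefined) throw new Error(`unknown role: ${role}`);
  return {
    Version: "2012-10-17",
    Statement: [
      { Effect: "Allow", Action: ["s3:GetObject"], Resource: [toObjectArn(prefix)] },
    ],
  };
}

// Simplified check: does any Allow statement's resource pattern cover the object?
// Only a trailing "*" wildcard is handled.
function canRead(policy: PolicyDocument, objectUri: string): boolean {
  const arn = toObjectArn(objectUri);
  return policy.Statement.some(
    (s) =>
      s.Effect === "Allow" &&
      s.Action.includes("s3:GetObject") &&
      s.Resource.some((r) =>
        r.endsWith("*") ? arn.startsWith(r.slice(0, -1)) : arn === r,
      ),
  );
}

const agentPolicy = readPolicyFor("eval-read");
console.log(canRead(agentPolicy, "s3://my-brain-eval-public/observable/case-1.json")); // true
console.log(canRead(agentPolicy, "s3://my-brain-eval-private/golden/summarizer/case-1.json")); // false
```

Keeping each role scoped to one prefix means the question "can the agent read golden data?" reduces to "does any agent tool run under `eval-host`?", which is the check described next.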

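The agent’s tool catalog must not include any tool that runs under the `eval-host` role; such a tool would give the agent an indirect path to the golden prefix. The `ahamie doctor` command warns when it detects roles shared between the agent and the eval host.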
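The catalog rule can also be checked in code before the agent starts. This is a minimal sketch of such a check, not the implementation of `ahamie doctor`: the `ToolSpec` shape, its `role` field, `findGoldenLeaks`, and `assertNoGoldenLeaks` are all hypothetical.

```typescript
// Hypothetical tool descriptor: each tool declares the IAM role it runs under.
interface ToolSpec {
  name: string;
  role: string;
}

// Return the names of agent tools that run under a host-only role.
function findGoldenLeaks(
  agentTools: ToolSpec[],
  hostOnlyRoles: ReadonlySet<string> = new Set(["eval-host"]),
): string[] {
  return agentTools.filter((t) => hostOnlyRoles.has(t.role)).map((t) => t.name);
}

// Fail fast at startup instead of merely warning.
function assertNoGoldenLeaks(
  agentTools: ToolSpec[],
  hostOnlyRoles?: ReadonlySet<string>,
): void {
  const leaks = findGoldenLeaks(agentTools, hostOnlyRoles);
  if (leaks.length > 0) {
    throw new Error(`agent tools with host-only roles: ${leaks.join(", ")}`);
  }
}

// Hypothetical catalog: the second tool would expose the golden prefix.
const catalog: ToolSpec[] = [
  { name: "read_observable", role: "eval-read" },
  { name: "fetch_reference", role: "eval-host" },
];
console.log(findGoldenLeaks(catalog)); // [ 'fetch_reference' ]
```

Matching on role names only catches the direct case. A role with a different name whose policy still covers the golden prefix would slip through, so a stricter check would evaluate each role's policy against that prefix.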

When to add scenarios to golden vs observable

| Scenario property | Add to |
|---|---|
| Useful for the agent to learn from | observable |
| Reveals a known failure mode without hinting | hidden_golden |
| User report from production | hidden_golden |
| Synthetic edge case | observable |
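The table reads as a small decision rule. A sketch, assuming each candidate scenario carries metadata about its origin and whether it probes a known failure mode (the `ScenarioCandidate` fields and `choosePartition` are hypothetical, not part of `@ahamie/eval`). The table does not say what happens when rows conflict; this sketch lets the hidden_golden rows win, on the reasoning that a scenario moved to observable cannot be un-leaked.

```typescript
type Partition = "observable" | "hidden_golden";

// Hypothetical metadata attached to a candidate scenario.
interface ScenarioCandidate {
  origin: "production-report" | "synthetic" | "curated";
  revealsKnownFailure: boolean; // exposes a known failure mode without hinting
}

function choosePartition(c: ScenarioCandidate): Partition {
  // Production reports and unhinted failure probes stay hidden, so the agent
  // cannot tune itself against them. These rules take precedence.
  if (c.origin === "production-report" || c.revealsKnownFailure) {
    return "hidden_golden";
  }
  // Curated teaching examples and synthetic edge cases are visible.
  return "observable";
}

console.log(choosePartition({ origin: "curated", revealsKnownFailure: false })); // observable
console.log(choosePartition({ origin: "production-report", revealsKnownFailure: false })); // hidden_golden
```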