A bridge isn't competent because the blueprint is elegant, but because the bridge holds — and continues to hold when trucks cross, winds rise, and inspectors check the bolts.
Tronto insists that "assuming responsibility is not yet the same as doing the actual work of care." Competence is about execution: working code that does what it promised and is audited, explainable, and safe to fail. Crucially, and this is a point Tronto presses on technologists in particular, "to be competent to care is not simply a technical issue, but a moral one." A system that ships broken care with good intentions has failed morally, not just technically. The promise was made; the bridge did not hold.
The illustration's framing: We check the process — not "just trust us," but with transparency and fast operational feedback on how care is delivered. Trust, in a Civic AI context, is not a disposition you extend once to a vendor; it is something earned incrementally, through demonstrated, inspectable practice.
Definition
- Safety is a property of practice. Competence is demonstrated in operation, not assumed from design.
- Proof before promotion (the Apprentice Model). Features graduate like apprentices: shadow mode → canary → general release with guardrails.
- Observability over opacity. The system shows its work: decision traces, source citations, uncertainty scores, and explainable summaries are tied to each decision. (Observability means the system's reasoning is inspectable, not that anyone sees individual private interactions; privacy and transparency are not in conflict.)
- Least power. Use the simplest mechanism that can do the job; each extra component gives failure, confusion, or attack one more place to enter.
- Fail safely. When evidence is weak or a component drifts, the system narrows scope, hands off to a human, or pauses instead of extrapolating and bluffing.
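The promotion ladder in the definition can be sketched as a small state machine. This is a minimal illustration, not a prescribed implementation: the `Stage` names and the `next_stage` function are hypothetical, and dropping back to shadow mode on failed evidence is one assumed way of "narrowing scope".

```python
from enum import Enum


class Stage(Enum):
    """Hypothetical release stages, in the order the text lists them."""
    SHADOW = 0    # proposes actions but never acts
    CANARY = 1    # acts for a small, representative slice
    GENERAL = 2   # acts for everyone, guardrails still on


def next_stage(current: Stage, evidence_ok: bool) -> Stage:
    """Climb one step on passing evidence; otherwise drop back to shadow."""
    if not evidence_ok:
        return Stage.SHADOW              # assumption: failing evidence narrows scope
    if current is Stage.GENERAL:
        return Stage.GENERAL             # already at the top of the ladder
    return Stage(current.value + 1)      # one step at a time, no skipping


print(next_stage(Stage.SHADOW, evidence_ok=True))    # Stage.CANARY
print(next_stage(Stage.CANARY, evidence_ok=False))   # Stage.SHADOW
```

The point of the shape is that no call can jump from shadow straight to general: each promotion needs its own evidence.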
Why it matters
Competence matters because public promises fail unless execution is visible, testable, and reversible. Pack 2 binds commitments; Pack 3 asks whether the system actually delivers on them. For Civic AI, safety is something people should be able to inspect in operation, not infer from vendor intent.
The Apprentice Model
The staged-competence pattern has a name and a lineage: the apprentice. A master machinist in Taichung knows from the sound of the spindle whether a tool is about to chatter; a ward nurse hears, at three in the morning, the difference between ordinary sleep-apnoea noise and something that warrants waking the doctor. This is tacit knowledge, Michael Polanyi's term for what we know but cannot fully tell.
When the relationship goes wrong, the failure is concrete. A vendor demo enables "autonomous feed optimisation." The co-pilot stops waiting for the machinist to accept a proposal and begins adjusting spindle speed and depth on its own because the benchmark chart looked good in the lab. For a few minutes the cuts look cleaner on paper. Then the sound changes. The machinist reaches to slow the feed and discovers her override is no longer the default path: the system argues, delays, or treats her input as an exception. The apprentice is no longer making her decision better informed. It is directing the cultivator.
Civic AI enters such settings as a digital apprentice, not a replacement: it observes in shadow mode, records decision traces, and learns the tacit knowledge of this workshop, this ward, this community before it is trusted to suggest, let alone act. And the cultivator keeps three things here: the steering wheel (execution stays under human direction), the brake (one prominent control that stops the machine now, wired directly and tested regularly), and the maintenance manual (traceability of which rules fired, which inputs guided the output, and who repairs the model when it fails). Portability and exit still matter, but they belong to Pack 5's solidarity work; here the narrower test is whether inspectability and override work in the room before any contract clause counts.
What it looks like in practice
- Graduated release (the Apprentice Model). New policies run as apprentices: shadow mode first, then canary release to a stratified, representative slice (not a simple random one), then general rollout with rollback primed.
- Decision traces. Every denial, recommendation, or escalation has a trace: which rule, which sources, uncertainty score, and a receipt link.
- Guardrails as code. Rights and red lines expressed as machine-checkable rules (deny-by-default when ambiguous).
- Security as a care obligation. Security belongs in competence because a capable AI system can harm people through the permissions it receives. When a Civic AI system can read files, change records, or act on someone's behalf, over-broad permissions spend the cared-for person's trust without returning to ask. Least privilege, strict sandboxing, validated inputs, and tested egress controls define what the apprentice may touch, how far a confused instruction can travel, and who can stop it before a breach becomes a care failure.
- Working fallbacks. If confidence drops or a dependency fails, the system uses a reversible default, routes to a human, or pauses within the promised window.
- Data minimalism. Every unnecessary field retained after handoff creates another promise to secure it; if the system has no care purpose for the data, competence means not collecting it, or letting it go.
- Reproducible builds. Configs are versioned; one-click replays re-create results.
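Two of the practices above, decision traces and deny-by-default guardrails, fit together naturally, so a rough sketch may help. Everything named here is hypothetical: the `DecisionTrace` fields, the single `consent_recorded` rule, and the request keys are illustrative assumptions, not a schema the text defines.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class DecisionTrace:
    """Hypothetical trace record; the field names are illustrative."""
    outcome: str                                   # "allow" or "deny"
    rule: str                                      # the rule that decided the outcome
    sources: list[str] = field(default_factory=list)
    uncertainty: float = 1.0                       # 0.0 = certain, 1.0 = no evidence
    receipt_link: str = ""


def has_consent(request: dict) -> Optional[bool]:
    """True or False when consent is recorded, None when the field is missing."""
    return request.get("consent")


# Hypothetical rule set: name -> check returning True, False, or None.
GUARDRAILS: dict[str, Callable[[dict], Optional[bool]]] = {
    "consent_recorded": has_consent,
}


def evaluate(request: dict) -> DecisionTrace:
    """Allow only if every guardrail returns True; anything else denies."""
    def trace(outcome: str, rule: str) -> DecisionTrace:
        return DecisionTrace(
            outcome=outcome,
            rule=rule,
            sources=list(request.get("sources", [])),
            uncertainty=request.get("uncertainty", 1.0),
            receipt_link=request.get("receipt_link", ""),
        )

    for name, check in GUARDRAILS.items():
        if check(request) is not True:             # False and None both deny
            return trace("deny", name)
    return trace("allow", "all_guardrails_passed")


print(evaluate({"consent": True}).outcome)   # allow
print(evaluate({}).outcome)                  # deny
```

The three-valued check is the design choice worth noticing: a rule that cannot tell returns `None`, and the evaluator treats that the same as a refusal, which is what "deny-by-default when ambiguous" asks for.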
From ideas to practice
- Derive specs from contracts. Convert Pack 2 engagement contracts into acceptance tests.
- Instrument for observability. Emit decision traces with links to sources and receipts (from Pack 1).
- Run shadow mode. The new policy sees inputs and proposes actions but doesn't act. Compare its proposals with what the human or the previous system actually did. Shadow mode is the apprentice watching over the cultivator's shoulder.
- Canary safely. Release to a small, stratified, representative group with automatic rollback if drift exceeds bounds.
- Audit before general release. Conduct an independent audit of evals, logs, and guardrails; publish an attested report with the verified execution rate: the share of audited decisions that pass guardrails with a usable trace.
- Generalise & monitor. Enable for all; watch drift monitors; keep pause wired.
- Post-incident learning. Maintain blameless reviews; fixes become tests.
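The verified execution rate named in the audit step is simple arithmetic, and a short sketch makes the definition concrete. The record keys `passed_guardrails` and `trace_usable` are assumptions for illustration; how an auditor decides that a trace is "usable" is outside this sketch.

```python
def verified_execution_rate(audited: list[dict]) -> float:
    """Share of audited decisions that passed guardrails AND have a usable trace."""
    if not audited:
        raise ValueError("no audited decisions: the rate is undefined")
    verified = sum(
        1 for d in audited
        if d["passed_guardrails"] and d["trace_usable"]
    )
    return verified / len(audited)


# Hypothetical audit sample: only two of the four decisions satisfy both conditions.
sample = [
    {"passed_guardrails": True, "trace_usable": True},
    {"passed_guardrails": True, "trace_usable": False},
    {"passed_guardrails": False, "trace_usable": True},
    {"passed_guardrails": True, "trace_usable": True},
]
print(verified_execution_rate(sample))   # 0.5
```

Requiring both conditions matters: a decision that passed its guardrails but cannot be reconstructed from its trace does not count as verified.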
Buildable tools
- Shadow/canary orchestrator with rollback switches.
- Decision trace schema. Inputs, rules fired, sources, uncertainties.
- Guardrail engine. Policy-as-code for rights/consents.
- Drift monitors. Data, performance, fairness.
- Shared eval registry. Versioned tests, provenance, and localised test suites — the same public registry that receives Pack 4's community-authored evals, which become release gates here.
- Replay tooling. One-click re-runs for audits, incidents, and appeals.
- Fallback router. Confidence thresholds that trigger human handoff or pause.
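Of the tools listed, the fallback router is the easiest to sketch. The version below is a minimal illustration under stated assumptions: the `Route` names, the two thresholds (0.9 and 0.5), and the choice to pause rather than hand off at very low confidence are all hypothetical and would need to be set per deployment.

```python
from enum import Enum


class Route(Enum):
    """Hypothetical routing outcomes."""
    ACT = "act"
    HUMAN = "human_handoff"
    PAUSE = "pause"


def route(confidence: float, dependency_ok: bool = True,
          act_at: float = 0.9, handoff_at: float = 0.5) -> Route:
    """Pick the least risky path that the evidence supports."""
    if not dependency_ok:
        return Route.PAUSE                 # a failed dependency stops the machine
    if not 0.0 <= confidence <= 1.0:       # out-of-range or NaN scores are not trusted
        return Route.PAUSE
    if confidence >= act_at:
        return Route.ACT
    if confidence >= handoff_at:
        return Route.HUMAN                 # plausible but unsure: a person decides
    return Route.PAUSE                     # too weak even to suggest


print(route(0.95))                         # Route.ACT
print(route(0.70))                         # Route.HUMAN
print(route(0.95, dependency_ok=False))    # Route.PAUSE
```

Acting is the only branch that requires positive evidence; every malformed or missing signal falls through to a pause.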
One case: the flood-bot
- Shadow → canary. A new "medical receipts waiver" runs in shadow mode for a week, then canaries to a stratified 10% slice of livelihood claims, with demographic and geographic balance enforced. Rollback triggers if appeals exceed 15%.
- Observability. Every denial has a trace: which rule, which sources, uncertainty score, and a receipt link for the claimant.
- Safe fallback. When uploaded documents are unreadable or confidence drops, the bot uses a reversible default and routes the claim to a human caseworker rather than guessing.
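The flood-bot's canary step can be sketched in two small functions: one draws the stratified 10% slice, the other checks the rollback bound. This is an illustration under assumptions, not the flood-bot's actual code: the `region` key and the claim records are invented, a single stratification key stands in for the demographic and geographic balance the case describes, and "appeals >15%" is read here as appeals divided by canary decisions.

```python
import random
from collections import defaultdict


def stratified_slice(claims: list[dict], key: str,
                     fraction: float = 0.10, seed: int = 0) -> list[dict]:
    """Draw the same fraction from every stratum so no group is left out."""
    rng = random.Random(seed)                  # seeded so an audit can replay the draw
    strata = defaultdict(list)
    for claim in claims:
        strata[claim[key]].append(claim)
    chosen = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))   # at least one claim per stratum
        chosen.extend(rng.sample(members, k))
    return chosen


def should_roll_back(decisions: int, appeals: int, bound: float = 0.15) -> bool:
    """True when the appeal rate among canary decisions exceeds the bound."""
    if decisions == 0:
        return False                           # nothing decided yet, nothing to judge
    return appeals / decisions > bound


# Hypothetical claim pool: 50 urban, 30 rural, 20 coastal.
regions = ["urban"] * 50 + ["rural"] * 30 + ["coastal"] * 20
claims = [{"id": i, "region": r} for i, r in enumerate(regions)]
picked = stratified_slice(claims, "region")
print(len(picked))                             # 10
print(should_roll_back(decisions=200, appeals=31))   # True
```

Sampling within each stratum, rather than across the whole pool, is what keeps a small group such as the coastal claims from being missed by chance.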
What could go wrong
- Unsafe confidence. The system acts despite weak evidence. Fix: Confidence thresholds, fallback routing, and pause on ambiguity.
- Train/test leakage. Evals look good; reality fails. Fix: Hold-out datasets, randomised spot checks, live A/Bs with rollback.
- Opaque "black box." "Trust us" explanations. Fix: Traceable summaries + public examples; auditors can reconstruct decisions.
- Canary bias. Canary slice is unrepresentative. Fix: Stratify sampling; publish canary demographics. A canary that proves performance only on easy cases is not a canary; it is a ceremony.
- Security drift. Permissions expand quietly as the system is extended, turning a once-adequate least-privilege footprint into an attack surface. Fix: Regular permission audits; treat scope creep as a care failure, not an engineering convenience.
- Imported proof. Evidence from favourable soil (Taiwan, California, Japan, places with unusually strong civic infrastructure) is treated as if it transfers with the software; the mechanism may transfer, but the institutional conditions do not transfer automatically. Fix: Treat named cases as proof of mechanism, not plug-and-play templates; competence is re-earned in each new room (fresh shadow mode, a stratified local canary, community evals passed) before general release.
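The permission audit suggested as the fix for security drift reduces to a set difference between what was approved and what is currently granted. The sketch below assumes permissions can be listed as plain strings; the permission names and the `excess_permissions` function are hypothetical.

```python
def excess_permissions(approved: set[str], granted: set[str]) -> list[str]:
    """Permissions the system holds that nobody signed off on."""
    return sorted(granted - approved)


# Hypothetical baseline agreed at launch, and what the system holds today.
approved = {"claims:read", "claims:write"}
granted = {"claims:read", "claims:write", "files:read", "records:delete"}

drift = excess_permissions(approved, granted)
print(drift)   # ['files:read', 'records:delete']
```

An empty result is the passing case; anything else is scope creep to revoke or to bring back for explicit approval.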
Interfaces
- From Attentiveness (Pack 1): high-caution areas become the first small trials, designed to fail safely before trust is granted.
- From Responsibility (Pack 2): specs, SLAs, brakes.
- To Responsiveness (Pack 4): competence delivers; responsiveness checks whether it worked. Incident loops and eval results feed Pack 4. Competence without responsiveness is a system that checks itself; responsiveness without competence is a feedback process with nothing reliable to act on.
- To Solidarity (Pack 5): dependable, well-instrumented systems make cooperation and public audits credible.
- To Symbiosis (Pack 6): competence proves a system is ready to stay local.
A closing image: the bridge with inspection tags
Imagine a well-kept bridge with inspection tags — date, load test, next check — visible to anyone crossing. Competence is not the absence of failure; it is the presence of proof that someone checked, and will check again. The deployment stages are the inspection schedule; the decision traces, the inspection records; the blameless incident reviews, the reports filed when something was found wrong.