Turning AI Collaboration Into Infrastructure

[Figure: manager AI and worker AI agents maintaining skills, qmd, memory, and verification gates together.] AI collaboration infrastructure is not one prompt. It is a loop of intent, context search, skill routing, execution, verification, and memory promotion.

After reading Eugene Yan’s “How to Work and Compound with AI”, I wanted to write down how bluetape4k actually runs its AI working environment. The previous post was about what Claude Code and Codex helped build. This post is about what had to exist around them so that every session did not start from zero again.

At first, I thought a good prompt would be enough. It was not. Once the repositories grew and the rules around Kotlin, Spring, Exposed, Ktor, GitHub issues, PRs, lessons, CI, and releases started to interlock, prompts alone became too fragile. AI needed the same kind of onboarding docs, working procedures, searchable history, and verification gates that a human teammate would need.

In bluetape4k, that responsibility is shared by AGENTS.md, CLAUDE.md, skills, qmd, memory, and hooks. Models keep changing, but this working environment carries over into the next session.

AGENTS.md and CLAUDE.md Are Onboarding Docs

When a human joins a project, you give them more than a README. You explain the code layout, branch policy, test style, documentation language, choices to avoid, and old decisions that still shape the project. AI needs the same information.

In the bluetape4k workspace, AGENTS.md is the primary onboarding document for Codex. It says that conversation with the user stays in Korean, while public KDoc, PRs, and commit messages are written in English. It records the Git policy: develop is the integration branch, and main is release-only. It also records the Kotlin workflow: inspect references and impact before editing Kotlin code; after touching .kt files, run IDE diagnostics, optimize imports, resolve deprecations, then compile and test the affected modules.

Claude gets the same kind of guidance through ~/.claude/CLAUDE.md, commands, skills, and hooks. Codex gets it through ~/.codex/config.toml, skills, MCP, native subagents, and repo-local AGENTS.md files. The tooling is different, but the purpose is the same: onboard each new session like a new teammate and promote repeated preferences into durable configuration.

These documents are layered. Global rules live under the home directory. Workspace-wide bluetape4k rules live at the workspace root. Each repository can add narrower rules. A Kotlin library repository and a blog repository cannot share every detail, so the closest document provides the most specific guidance.
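
The layering can be pictured as a walk from the working directory up to the workspace root. The following is a minimal sketch, not how Codex or Claude actually resolve their files: the function name `guidance_chain` and the single-tree walk are assumptions for illustration, and a global file under the home directory would have to be prepended separately.

```python
from pathlib import Path


def guidance_chain(start: Path, root: Path, name: str = "AGENTS.md") -> list[Path]:
    """Return guidance files ordered from the broadest (root) to the closest (start).

    Later entries are more specific, so a reader that applies them in
    order lets the closest document have the last word.
    """
    start, root = start.resolve(), root.resolve()
    if start != root and root not in start.parents:
        raise ValueError(f"{start} is not inside {root}")

    found = []
    current = start
    while True:
        candidate = current / name
        if candidate.is_file():
            found.append(candidate)
        if current == root:
            break
        current = current.parent

    # Files were collected closest-first; reverse so broad rules come first.
    return list(reversed(found))
```

Calling `guidance_chain(repo_dir, workspace_root)` returns the workspace-level file before the repository-level one, which matches the idea that the closest document provides the most specific guidance.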

Skills Are Procedures for Repeated Work

If a prompt says “think this way,” a skill says “work through this procedure.” Repeated work becomes fragile when it is left to a few prompt lines. A skill can say which context to read first, which gate to pass, and which verification evidence is required.

In bluetape4k, bluetape4k-workflow is the first router. It classifies work into lanes such as Full Design, Fast Track, Bug Fix, Code Review, and Maintenance, then selects the verification level that fits the work.
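
To make the routing idea concrete, here is a small sketch of a lane router. Only the lane names come from the workflow described above. The `Task` fields, the classification rules, and the verification strings are hypothetical placeholders, since the real skill defines its own criteria.

```python
from dataclasses import dataclass
from enum import Enum


class Lane(Enum):
    FULL_DESIGN = "Full Design"
    FAST_TRACK = "Fast Track"
    BUG_FIX = "Bug Fix"
    CODE_REVIEW = "Code Review"
    MAINTENANCE = "Maintenance"


# Hypothetical verification levels, one per lane.
VERIFICATION = {
    Lane.FULL_DESIGN: "spec + plan + affected module tests",
    Lane.FAST_TRACK: "targeted test or build check",
    Lane.BUG_FIX: "regression test + affected module tests",
    Lane.CODE_REVIEW: "review findings with evidence",
    Lane.MAINTENANCE: "build check",
}


@dataclass(frozen=True)
class Task:
    kind: str  # assumed vocabulary: "feature", "bug", "review", "chore"
    new_module: bool = False
    public_api_change: bool = False


def route(task: Task) -> tuple[Lane, str]:
    """Pick a lane for the task and return it with its verification level."""
    if task.kind == "review":
        lane = Lane.CODE_REVIEW
    elif task.kind == "bug":
        lane = Lane.BUG_FIX
    elif task.kind == "chore":
        lane = Lane.MAINTENANCE
    elif task.new_module or task.public_api_change:
        lane = Lane.FULL_DESIGN
    else:
        lane = Lane.FAST_TRACK
    return lane, VERIFICATION[lane]
```

The point of the sketch is the shape: classification happens once, up front, and the verification standard follows from the lane rather than from whatever the prompt happened to mention.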

More focused skills sit below it. bluetape4k-design handles new modules, broad API changes, and multi-layer work. bluetape4k-patterns handles Kotlin implementation and final checklists. ecc-kotlin-exposed, ecc-springboot-kotlin, ecc-kotlin-testing, and kotlin-coroutines-skill separate Exposed, Spring Boot, testing, and coroutine-specific judgment. review-delta, review-pr, code-review, and bugfix-workflow handle review and follow-up fixes.

Instructions like “also update the README,” “watch for deprecated Exposed imports,” or “prove this with affected module tests” are easy to forget when they have to be repeated every time. Loading the right skill brings the checklist into the task and makes the verification standard explicit.

qmd Is the Search Layer for Old Decisions

AI sessions forget easily. Repositories accumulate docs, lessons, issues, PRs, plans, and experiments. qmd bridges the two: it makes the accumulated history searchable for a session that starts without it.

In bluetape4k, prior decisions, lessons, specs, plans, and historical context are searched through qmd first. Workspace docs live in the bluetape4k-docs collection. Personal or cross-project knowledge lives in the wiki collection.

Exact code symbols and filenames are still better handled with rg. But questions like “why did we choose this?”, “where did we build something similar?”, or “did this fail before?” are better suited to qmd. It narrows the relevant documents first, then the session can descend into code and PR history.
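
The split between rg and qmd can be approximated with a crude heuristic. The sketch below is an illustration only: `pick_search_tool` is a hypothetical helper, and the rule it uses (a single identifier-like token goes to rg, anything else goes to qmd) is an assumption, not the actual routing logic. It returns a tool name and deliberately does not build a command line.

```python
import re

# A single token that looks like an identifier, dotted name, or file path.
_SYMBOL = re.compile(r"[A-Za-z_][\w.\-/]*")


def pick_search_tool(query: str) -> str:
    """Return "rg" for exact symbols or filenames, "qmd" for everything else."""
    q = query.strip()
    if q and _SYMBOL.fullmatch(q):
        return "rg"
    return "qmd"
```

A single ordinary word also goes to rg under this rule, so a real router would need more signal than token shape alone.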

That difference matters. Asking AI to read the whole repository is expensive and unstable. A search layer lets the session find the relevant context first and only read the range needed for the task.

Memory Must Escape the Session

Claude and Codex both become unreliable when session memory is the only source of truth, because whatever a session learned disappears when it ends. bluetape4k therefore keeps memory in several layers.

Short-lived work state lives in runtime artifacts such as .omx/state, .omx/notepad.md, and .omx/plans. Longer-lived design decisions live in docs/superpowers/specs and docs/superpowers/plans. Lessons learned from completed work live in docs/lessons.

A lesson does not need to be long. It should record the context, decision, outcome, verification evidence, and what the next agent should do differently. That is enough to keep the next AI session from re-inferring the same decision.
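
One way to keep lessons short and uniform is to give them a fixed shape. The sketch below assumes a Markdown rendering and a `Lesson` type of my own naming. The five fields mirror the list above, while the headings and layout are illustrative rather than the actual format used under docs/lessons.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lesson:
    title: str
    context: str
    decision: str
    outcome: str
    evidence: str
    next_time: str

    def to_markdown(self) -> str:
        """Render the lesson as a short Markdown note with fixed sections."""
        sections = [
            ("Context", self.context),
            ("Decision", self.decision),
            ("Outcome", self.outcome),
            ("Verification evidence", self.evidence),
            ("What the next agent should do differently", self.next_time),
        ]
        lines = [f"# {self.title}", ""]
        for heading, body in sections:
            lines += [f"## {heading}", "", body.strip(), ""]
        return "\n".join(lines)
```

Fixed headings also help the search layer: a query for a decision or its evidence lands on a predictable section instead of a free-form paragraph.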

The important part is not storage. The important part is whether the next task reads it. If a lesson does not feed a skill, cannot be found through search, and is never read during the next task, it is just an old note. Repeated lessons should become skills, AGENTS.md rules, hooks, or tests.

Hooks Catch Frequent Mistakes Early

Telling AI to “be careful” is not enough. Humans need CI, pre-commit checks, and linting. AI needs guardrails too. Rules that matter repeatedly should become hooks.

Claude has hooks for sensitive-file blocking, destructive-git guarding, Kotlin checks, Gradle test guards, README sync reminders, and keyword detection. Codex uses hooks, native subagents, skill routing, and MCP surfaces for similar purposes.

Hooks are not a sign that the model is untrusted. They are a way to catch common mistakes early. Destructive commands, branch-name mistakes, sensitive-file access, workflow drift, and missing tests are cheaper to stop during the task than after CI or review finds them.
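
As an illustration of what such a guard checks, here is a sketch of a pre-execution command filter. It is not the actual hook code and does not use any hook API. The deny-lists, the `check_command` name, and the decision to treat every force push as destructive are assumptions made for the example.

```python
import re
import shlex
from typing import Optional

# Hypothetical deny-lists; a real setup would tune these per repository.
DESTRUCTIVE_GIT = [
    re.compile(r"^git\s+reset\s+--hard\b"),
    re.compile(r"^git\s+clean\s+-[a-zA-Z]*f"),
    # Also matches --force-with-lease, which this sketch treats as risky too.
    re.compile(r"^git\s+push\b.*(--force\b|\s-f\b)"),
]
SENSITIVE_NAMES = {".env", "id_rsa", "credentials.json"}


def check_command(command: str) -> Optional[str]:
    """Return a reason to block the command, or None to allow it."""
    normalized = " ".join(command.split())
    for pattern in DESTRUCTIVE_GIT:
        if pattern.search(normalized):
            return f"destructive git command: {normalized}"
    try:
        tokens = shlex.split(normalized)
    except ValueError:
        # Unbalanced quotes and similar input are blocked rather than guessed at.
        return "unparseable command"
    for token in tokens:
        if token.rsplit("/", 1)[-1] in SENSITIVE_NAMES:
            return f"sensitive file: {token}"
    return None
```

Returning a reason instead of a bare boolean matters in practice: the agent sees why the command was stopped and can choose a safer alternative within the same task.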

Delegation Needs Verification

Codex native subagents and OMX team mode can run work in parallel. Parallelism does not guarantee quality. It changes the bottleneck from implementation speed to ownership and verification: which agent owns which files, how the result is checked, and where conflicts are reported.

That is why bluetape4k does not treat an explanation as proof of completion. For a small change, a targeted test or build check may be enough. If Kotlin code changed, IDE diagnostics, import cleanup, deprecation checks, and affected module tests are expected. If a public API changed, KDoc and README coverage matter. If GitHub workflow files changed, nightly workflow impact is checked too.
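
Those expectations can be written down as a function from the change set to a checklist. This is a sketch under assumptions: `required_checks` is a hypothetical helper, the check names are plain labels rather than real commands, and whether a public API changed is passed in as a flag because detecting it is outside the scope of the example.

```python
def required_checks(changed_files: list[str], public_api_changed: bool = False) -> list[str]:
    """Map a change set to the verification evidence expected before completion."""
    checks: list[str] = []
    if any(path.endswith(".kt") for path in changed_files):
        checks += [
            "IDE diagnostics",
            "import cleanup",
            "deprecation check",
            "affected module tests",
        ]
    if public_api_changed:
        checks += ["KDoc coverage", "README coverage"]
    if any(path.startswith(".github/workflows/") for path in changed_files):
        checks.append("nightly workflow impact")
    if not checks:
        # Small changes fall back to the lightest form of proof.
        checks.append("targeted test or build check")
    return checks
```

An agent's final report can then be compared against this list item by item, which is what separates evidence from explanation.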

Those rules make delegation usable. Multiple agents can move at once only when the final judgment can still rely on verification evidence and a clear change boundary.
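
A clear change boundary can be checked mechanically before parallel work starts. The sketch below assumes ownership is expressed as a mapping from agent name to file paths. Both the mapping and the `ownership_conflicts` helper are hypothetical and are not part of Codex or OMX.

```python
from collections import defaultdict


def ownership_conflicts(assignments: dict[str, set[str]]) -> dict[str, list[str]]:
    """Return every file claimed by more than one agent, with its claimants."""
    owners: dict[str, list[str]] = defaultdict(list)
    for agent, files in assignments.items():
        for path in files:
            owners[path].append(agent)
    return {
        path: sorted(agents)
        for path, agents in owners.items()
        if len(agents) > 1
    }
```

An empty result means each file has one owner, so a failing check later can be traced to a single agent's change.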

Codex and Claude Read the Same Environment Differently

In practice, Codex and Claude do not work exactly the same way. Codex is governed through AGENTS.md, skills, MCP/context-mode, qmd, and native subagents. Claude is governed through CLAUDE.md, commands, skills, hooks, and project histories.

That means durable guidance cannot stay on only one side. If Codex discovers a repeated rule that Claude also needs, it should move into repository docs or a shared skill. If Claude repeatedly catches a problem, the fix should become something Codex can read too.

The core instruction is simple:

Follow this repository workflow, inspect the impact, prove the change with tests, and leave a short lesson for the next agent.

The Environment Also Needs Refactoring

An AI working environment ages like a codebase. Skills can overlap. Hooks can warn about the wrong thing. Long AGENTS.md and CLAUDE.md files can bury the rule that actually matters. The environment itself needs maintenance.

My rule is simple. If I make the same correction twice, it should become a rule or skill. If I see the same failure three times, it should become a hook or test. If a procedure is no longer used, delete it. Do not preserve one-off notes forever; promote repeatable decisions into durable artifacts.
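
The thresholds in that rule are simple enough to express as counters. The sketch below is only a way to state the rule precisely. `PromotionTracker` and its string keys are hypothetical, and in practice the counting happens in my head or in lesson files, not in code.

```python
from collections import Counter
from typing import Optional


class PromotionTracker:
    """Counts repeated corrections and failures and flags when to promote them."""

    def __init__(self) -> None:
        self.corrections: Counter = Counter()
        self.failures: Counter = Counter()

    def record_correction(self, key: str) -> Optional[str]:
        """The second identical correction should become a rule or skill."""
        self.corrections[key] += 1
        if self.corrections[key] == 2:
            return f"promote '{key}' to a rule or skill"
        return None

    def record_failure(self, key: str) -> Optional[str]:
        """The third identical failure should become a hook or test."""
        self.failures[key] += 1
        if self.failures[key] == 3:
            return f"promote '{key}' to a hook or test"
        return None
```

Each threshold fires exactly once, on the occurrence that crosses it, so a promoted item is not flagged again on later repeats.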

Conclusion

The main lesson from working with AI over time is that the environment compounds more than the model. Models change. A well-organized repository, clear AGENTS.md and CLAUDE.md, reusable skills, qmd-searchable knowledge, verification hooks, and short lessons remain useful across sessions.

AI collaboration is less about a better prompt and more about a better working environment. If AI is used as a one-off generator, each session drifts. If AI is onboarded like a teammate, procedures are managed like code, and verification plus memory become infrastructure, the work becomes more stable across sessions.
