How it works

Capturing knowledge

This is the WRITE side — where the real intelligence lives. Raw conversation comes in; a small set of durable, honestly-rated entries comes out. Everything here runs inside the single background worker.

Front of the pipeline: pull, scrub, window

Pull (once a day, or on a poll)

Chat is pulled on a daily schedule (default hour 22:00 local). Meeting recordings are polled every 10 minutes so a finished recording reaches the approval channel quickly.

Why batch, not live: this knowledge is for answering future questions — it is not time-sensitive. So we avoid all the complexity of streaming (buffering, splitting live conversations) and just read history in clean chunks.
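
As a rough illustration of the two schedules, here is a minimal sketch. The function names are hypothetical, and it assumes naive local datetimes with no special handling of daylight-saving changes. Only the 22:00 default and the 10-minute interval come from the text above.

```python
from datetime import datetime, timedelta

CHAT_PULL_HOUR = 22        # default daily pull hour, local time (from the text)
MEETING_POLL_MINUTES = 10  # meeting-recording poll interval (from the text)


def next_chat_pull(now: datetime, hour: int = CHAT_PULL_HOUR) -> datetime:
    """Return the next daily pull time strictly after `now` (hypothetical helper)."""
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if candidate <= now:
        # Today's slot has already passed, so schedule tomorrow's.
        candidate += timedelta(days=1)
    return candidate


def next_meeting_poll(last_poll: datetime) -> datetime:
    """Return the time of the next meeting-recording poll (hypothetical helper)."""
    return last_poll + timedelta(minutes=MEETING_POLL_MINUTES)
```

Because both jobs read history in batches, a missed run costs nothing more than a later pull of the same history.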

Scrub sensitive data once, early

Immediately after fetch, and before any text reaches an LLM, every message runs through a scrubber:

  • emails become [EMAIL]
  • phone-like runs (confirmed by digit count, 10–15 digits) become [PHONE]
  • dollar amounts become [AMOUNT]

Why here: if scrubbing happened later, raw data would already have been sent to the extraction model. Doing it once at the front means no raw email, phone number, or dollar amount ever leaves the box. The extraction prompt is a second layer: it also refuses names and figures that cannot be scrubbed mechanically. The bias is deliberately aggressive: masking a "$50 lunch" is fine, leaking a "$2M deal" is not.
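
The mechanical layer could look something like the sketch below. The regular expressions and the `scrub` name are assumptions made for illustration, not the project's actual patterns. Only the three placeholders and the 10 to 15 digit rule come from the text.

```python
import re

# Illustrative patterns only; the real scrubber's patterns are not shown in this document.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
AMOUNT_RE = re.compile(r"\$\s?\d[\d,]*(?:\.\d+)?(?:[KkMmBb]\b)?")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{8,}\d")


def _mask_phone(match: re.Match) -> str:
    """Mask a phone-like run only if its digit count confirms it (10 to 15 digits)."""
    digits = sum(ch.isdigit() for ch in match.group())
    return "[PHONE]" if 10 <= digits <= 15 else match.group()


def scrub(text: str) -> str:
    """Replace emails, dollar amounts, and phone-like runs with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    # Amounts go before phones so a long figure is never mistaken for a phone number.
    text = AMOUNT_RE.sub("[AMOUNT]", text)
    text = PHONE_RE.sub(_mask_phone, text)
    return text


print(scrub("Mail bob@example.com or call +1 (415) 555-0123 about the $2M deal"))
# prints: Mail [EMAIL] or call [PHONE] about the [AMOUNT] deal
```

Confirming by digit count keeps short numeric runs such as order references intact while still catching phone numbers written with spaces, dashes, or parentheses.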

Window (group messages into conversations)

Raw messages are grouped into windows — coherent conversation slices — before extraction, so the model sees a whole thought and not disconnected lines.

  • Threads are windowed separately.
  • Long stretches are chunked at the largest time-gaps.
  • The principal's own lines are always labeled so the model knows whose reasoning to capture.
  • If a window has to be trimmed for length, the oldest non-principal messages are dropped first. The principal's words are never dropped.
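
The chunking and trimming rules above can be sketched as follows. The `Message` shape, the function names, the recursive splitting strategy, and the message-count and character budgets are all assumptions made for illustration. The text only states that long stretches are cut at the largest time gaps and that the principal's lines are labeled and never dropped.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Message:
    ts: float    # timestamp in seconds (assumed representation)
    author: str
    text: str


def split_at_largest_gaps(messages: List[Message], max_len: int) -> List[List[Message]]:
    """Split a time-ordered list at its largest time gap until every chunk has <= max_len messages."""
    if len(messages) <= max_len:
        return [messages]
    # Index i refers to the gap between messages[i - 1] and messages[i].
    cut = max(range(1, len(messages)), key=lambda i: messages[i].ts - messages[i - 1].ts)
    return split_at_largest_gaps(messages[:cut], max_len) + split_at_largest_gaps(messages[cut:], max_len)


def trim_window(messages: List[Message], principal: str, max_chars: int) -> List[Message]:
    """Drop the oldest non-principal messages until the window fits; never drop the principal."""
    kept = list(messages)
    total = sum(len(m.text) for m in kept)
    i = 0
    while total > max_chars and i < len(kept):
        if kept[i].author != principal:
            total -= len(kept[i].text)
            del kept[i]  # the next message slides into position i
        else:
            i += 1       # principal lines are skipped, never removed
    return kept


def label(message: Message, principal: str) -> str:
    """Mark the principal's own lines so the model knows whose reasoning to capture."""
    tag = "PRINCIPAL" if message.author == principal else message.author
    return f"[{tag}] {message.text}"
```

Note that in this sketch a window made up mostly of the principal's own words can stay over budget, since those lines are never removed.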

Extraction: turning conversation into a few durable entries

This is the core intelligence. The guiding inversion, and the single most important design decision in the whole project, is:

We are building a reasoning model of the principal, NOT a transcript log.

The old pipeline optimized for recall — atomize everything said — and one meeting produced 270 entries. The goal is the opposite: a small, deduplicated, honestly-rated set. Every rule below flows from that inversion.

Two entry points, one rubric

Path: Meeting (long transcript)
  • Used for: recorded meetings.
  • How (primary): read the ENTIRE transcript in ONE pass, which applies when it fits within ~24k tokens, so the model sees the whole arc before extracting.
  • How (fallback, transcript too long): MAP each window into candidate observations, then REDUCE them into a few synthesized entries (a two-level reduce when there are more than 40 candidates).

Path: Window (chat)
  • Used for: a single conversation.
  • How: one extraction call, since a window is already small enough.
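
A minimal sketch of how the meeting path might choose between the single pass and the map/reduce fallback. The function names, the `reduce_fn` callable (a stand-in for the LLM reduce call), and the group size are hypothetical. The ~24k token budget and the 40-candidate threshold come from the text.

```python
from typing import Callable, List

SINGLE_PASS_TOKEN_BUDGET = 24_000  # "~24k tokens" from the text
TWO_LEVEL_REDUCE_ABOVE = 40        # candidate count above which reduce runs in two levels


def choose_meeting_path(token_count: int) -> str:
    """Prefer one whole-transcript pass; fall back to map/reduce only when it will not fit."""
    return "single_pass" if token_count <= SINGLE_PASS_TOKEN_BUDGET else "map_reduce"


def reduce_candidates(
    candidates: List[str],
    reduce_fn: Callable[[List[str]], List[str]],
    group_size: int = 40,  # assumed group size for the first reduce level
) -> List[str]:
    """Reduce candidates into a few entries, using two levels when there are too many."""
    if len(candidates) <= TWO_LEVEL_REDUCE_ABOVE:
        return reduce_fn(candidates)
    # Level 1: reduce each group on its own.
    groups = [candidates[i:i + group_size] for i in range(0, len(candidates), group_size)]
    partial = [entry for group in groups for entry in reduce_fn(group)]
    # Level 2: reduce the partial results into the final set.
    return reduce_fn(partial)
```

With 90 candidates and the default group size, `reduce_fn` would be called on groups of 40, 40, and 10, then once more on the combined partial results.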

Why whole-transcript-in-one-pass is preferred: context is everything. Reading the whole meeting lets the model tell reasoning from meta-talk, gauge how firmly a position is held, and — crucially — recognize when the principal takes the opposite stance in a different context, which becomes a variant rather than a contradiction. Chopping into windows loses that.

What the extractor is told to do (the rubric)

The prompt forces four judgments on every candidate:

  1. Reasoning vs meta. Most of a meeting is logistics, status, and talk about the project, the system, or this AI. Drop all of it. Keep only transferable reasoning, philosophy, standards, decisions (with their WHY), and reactions.

  2. Commitment sets confidence and stability. Firm conviction the principal would enforce becomes high confidence and durable stability. A current working take becomes medium confidence. Tentative talk ("I'm debating", thinking out loud) about a one-off action gets dropped: a passing intention is not knowledge. A tentative but genuinely transferable belief is kept at low confidence and evolving stability, never phrased as a firm rule.

  3. Context-dependence. The topic MUST match the position — we never emit an entry whose topic contradicts its own position. When the principal applies a known move the opposite way, the extractor describes the move AND its context so it is captured as a variant.

  4. The generalization test (the keep-or-drop bar): "keep an entry only if it would help answer a question the principal has NEVER been directly asked." Drop anything tied to just this meeting, project, or person — and anything the principal would be annoyed to see stored as "their knowledge." Reasoning (the WHY) is mandatory: no WHY, drop it.

The extractor returns at most 8 entries per meeting (a hard cap enforced in code, because we never trust the model to self-limit) and is explicitly told to return fewer, even zero. Most meetings yield only a handful. Never pad.
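
The cap is simple enough to show in full. This sketch assumes the model's output has already been parsed into a list; the names are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")

MAX_ENTRIES_PER_MEETING = 8  # hard cap from the text


def cap_entries(entries: Sequence[T], limit: int = MAX_ENTRIES_PER_MEETING) -> List[T]:
    """Enforce the cap in code, whatever the model returned; never pad a short list."""
    return list(entries)[:limit]
```

An empty list passes through unchanged, which matches the rule that zero entries is a valid result.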

The Judge: a strict, fail-closed second gate

Every extracted entry then faces an independent judge. For each entry it returns three booleans, checked against the source the entry was drawn from:

  • keep — durable, transferable, firmly held; NOT chatter, NOT talk about the system itself, NOT a tentative one-off.
  • grounded — actually supported by the source, not invented or stretched.
  • distinctive — a real position, not a generic truism anyone would say.

An entry survives only if all three are true.

Why "fail-closed": if the judge call errors or its output cannot be parsed (after one retry), we drop the whole batch and fire an alert rather than let unverified entries slip in. The default bias everywhere is "when in doubt, keep it OUT." The same judge runs on both ingestion paths.
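
The fail-closed behavior can be sketched as below. The JSON response shape, the `call_judge` and `alert` callables, and the function names are assumptions made for illustration. The three booleans, the single retry, and the drop-the-batch rule come from the text.

```python
import json
from typing import Callable, Dict, List

REQUIRED_FLAGS = ("keep", "grounded", "distinctive")


def parse_verdicts(raw: str, expected: int) -> List[Dict[str, bool]]:
    """Parse the judge output strictly: one verdict per entry, three real booleans each."""
    data = json.loads(raw)
    if not isinstance(data, list) or len(data) != expected:
        raise ValueError("expected exactly one verdict per entry")
    for verdict in data:
        if not isinstance(verdict, dict):
            raise ValueError("verdict is not an object")
        if not all(isinstance(verdict.get(flag), bool) for flag in REQUIRED_FLAGS):
            raise ValueError("verdict is missing a boolean flag")
    return data


def judge_batch(
    entries: List[str],
    source: str,
    call_judge: Callable[[List[str], str], str],  # stand-in for the LLM judge call
    alert: Callable[[str], None],                 # stand-in for the alerting hook
    retries: int = 1,
) -> List[str]:
    """Return only entries with all three flags true; on judge failure, return nothing."""
    for _ in range(retries + 1):
        try:
            verdicts = parse_verdicts(call_judge(entries, source), len(entries))
        except Exception:
            # Any error or unparseable output counts as a failed attempt.
            continue
        return [e for e, v in zip(entries, verdicts) if all(v[flag] for flag in REQUIRED_FLAGS)]
    # Fail closed: drop the whole batch and alert rather than pass unverified entries.
    alert(f"judge failed after {retries + 1} attempts; dropped {len(entries)} entries")
    return []
```

Strict parsing matters here: a verdict with a missing or non-boolean flag is treated as a judge failure, not as a pass.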

Validation (mechanical)

Finally, each survivor is normalized:

  • type must be one of the five valid entry types (see Anatomy of an entry).
  • topic / position / reasoning must be non-empty.
  • reasoning must clear a length and anti-boilerplate check (rejects "it's important", "this is better", and the like).
  • confidence defaults to medium if missing; stability defaults to stable.
  • tags must be a list of strings or become empty.

Anything malformed is dropped, not guessed.
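
The checks above can be sketched as one function. The field names follow the list, but the minimum reasoning length, the substring-based boilerplate test, and the `valid_types` parameter are assumptions. The five real type names are defined elsewhere, so they are passed in rather than guessed here.

```python
from typing import Any, Dict, Optional, Set

MIN_REASONING_CHARS = 40  # assumed threshold; the real value is not given in this document
BOILERPLATE = ("it's important", "this is better")  # examples from the text


def validate_entry(entry: Dict[str, Any], valid_types: Set[str]) -> Optional[Dict[str, Any]]:
    """Normalize one surviving entry, or return None to drop it. Never guess at a fix."""
    entry_type = entry.get("type")
    if not isinstance(entry_type, str) or entry_type not in valid_types:
        return None
    out: Dict[str, Any] = {"type": entry_type}

    # topic / position / reasoning must be non-empty strings.
    for field in ("topic", "position", "reasoning"):
        value = entry.get(field)
        if not isinstance(value, str) or not value.strip():
            return None
        out[field] = value.strip()

    # Length and anti-boilerplate check on the reasoning.
    reasoning = out["reasoning"].lower()
    if len(reasoning) < MIN_REASONING_CHARS or any(p in reasoning for p in BOILERPLATE):
        return None

    # Defaults stated in the list above.
    out["confidence"] = entry.get("confidence") or "medium"
    out["stability"] = entry.get("stability") or "stable"

    # tags must be a list of strings, otherwise it becomes empty.
    tags = entry.get("tags")
    is_str_list = isinstance(tags, list) and all(isinstance(t, str) for t in tags)
    out["tags"] = tags if is_str_list else []
    return out
```

Returning `None` instead of repairing a bad field keeps this step in line with the "dropped, not guessed" rule.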