Cogniscape captures development activity metadata — who did what, when, and why — not source code. Every event from GitHub, Linear, and developer tools passes through a multi-stage sanitization pipeline that strips code blocks, diff hunks, and raw payloads before anything reaches our knowledge graph. In rare cases, the Cogniscape MCP may return content that resembles code. This is not stored code leaking out — it is your own LLM reconstructing code-like patterns from the semantic descriptions stored in the graph. This page explains exactly how that works and why your source code remains safe.
Cogniscape stores semantic descriptions of what happened in your codebase — never the code itself.

How we process developer activity

Every event that enters Cogniscape passes through four stages before it reaches the knowledge graph. Each stage reduces the payload to only the semantic information needed for developer intelligence.
Stage 1: Event Reception
  GitHub / Linear / Developer Tools → raw event payload received

Stage 2: Normalization
  Raw payload → structured event model
  Only selected fields are mapped (title, action, developer, timestamps)
  The full raw payload is discarded

Stage 3: Sanitization
  Fenced code blocks (```...```) → removed
  Inline code (`...`) → removed
  Diff hunks → excluded
  Raw payloads → excluded
  Internal identifiers → excluded

Stage 4: Knowledge Graph Ingestion
  Sanitized data → AI extraction → entities, facts, and episodes
  The AI extracts semantic meaning: "who did what to what, and why"
  Output: natural-language descriptions, not code
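The TypeScript below is a rough illustration of Stage 3. It strips fenced and inline code from a free-text field such as a PR body. It is a minimal sketch that assumes a regex-based approach. `sanitizeText` and both patterns are hypothetical and are not taken from Cogniscape's code.

```typescript
// Hypothetical sketch of Stage 3 text sanitization.
// Assumption: free-text fields are cleaned with regular expressions.
// This is not Cogniscape's actual implementation.

// Fenced block: three backticks, any content including newlines, three backticks.
const FENCED_CODE = /```[\s\S]*?```/g;
// Inline code: a single-backtick span that does not cross a line break.
const INLINE_CODE = /`[^`\n]*`/g;

function sanitizeText(text: string): string {
  return text
    // Remove fenced blocks first so their backticks are not read as inline spans.
    .replace(FENCED_CODE, "")
    .replace(INLINE_CODE, "")
    // Collapse runs of spaces or tabs left behind where inline code was removed.
    .replace(/[ \t]{2,}/g, " ")
    .trim();
}

const body = [
  "Fixes the retry loop in `retryPayment()` when the gateway times out.",
  "```ts",
  "for (let i = 0; i < 3; i++) { await charge(); }",
  "```",
  "Adds exponential backoff.",
].join("\n");

console.log(sanitizeText(body));
```

The surrounding prose survives, but the fenced loop and the backticked function name do not. A sketch this simple can leave a slightly awkward sentence behind, but it leaves no code.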
Every event type (pull requests, reviews, comments, issues, pushes) has a dedicated processing path that explicitly selects which fields to include. Unknown event types fall back to a conservative default that still excludes all code and sensitive fields.
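Dedicated paths with a conservative default can be sketched as per-type allowlists. This TypeScript is hypothetical. The event type names (`pull_request`, `pull_request_review`, `push`), the field names, and the exact default are assumptions chosen to mirror the tables below. They are not Cogniscape's actual schema.

```typescript
// Hypothetical sketch of per-event-type field selection (Stage 2).
// Event type and field names are illustrative assumptions, not Cogniscape's schema.
type RawPayload = Record<string, unknown>;
type NormalizedEvent = Record<string, unknown>;

// Dedicated processing paths: each event type lists exactly which fields it keeps.
const FIELD_ALLOWLISTS = new Map<string, readonly string[]>([
  ["pull_request", ["developer", "repository", "action", "number", "title", "state",
                    "head_branch", "base_branch", "labels", "assignees", "reviewers",
                    "body"]], // body is kept here and cleaned later in Stage 3
  ["pull_request_review", ["developer", "repository", "review_state", "body", "file_path"]],
  ["push", ["developer", "repository", "branch", "commit_messages"]],
]);

// Assumed conservative default for event types without a dedicated path:
// who and what only, with no free text and no code-bearing fields.
const DEFAULT_FIELDS: readonly string[] = ["developer", "repository", "action"];

function normalize(eventType: string, raw: RawPayload): NormalizedEvent {
  const allowed = FIELD_ALLOWLISTS.get(eventType) ?? DEFAULT_FIELDS;
  const event: NormalizedEvent = { event_type: eventType };
  for (const field of allowed) {
    // Copy only allowlisted fields. Everything else (diffs, commit SHAs,
    // file lists, the raw payload itself) is never copied.
    if (Object.prototype.hasOwnProperty.call(raw, field)) {
      event[field] = raw[field];
    }
  }
  return event;
}

const pr = normalize("pull_request", {
  developer: "alice",
  repository: "acme/backend",
  action: "opened",
  title: "Add retry logic to payment service",
  diff: "@@ -1,3 +1,5 @@",
});
console.log(pr);
```

Fields are copied in rather than filtered out. As a result, a new or unexpected field in a payload, such as a `diff`, is dropped by default instead of leaking through.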

What we store vs. what we don’t

The tables below show exactly which fields from common developer events are kept and which are discarded.

Pull request events

Field | Stored | Example of what reaches the graph
Developer | Yes | "alice"
Repository | Yes | "acme/backend"
Action | Yes | "opened", "merged"
PR number | Yes | 42
Title | Yes | "Add retry logic to payment service"
State | Yes | "open", "closed"
Branch names | Yes | "feat/retry-payments"
Labels | Yes | ["bug", "priority:high"]
Assignees / Reviewers | Yes | ["bob", "carol"]
PR body (description) | Sanitized | Code blocks and inline code removed; surrounding text kept
Diff / changed files content | No | Never captured
Raw event payload | No | Always excluded

Review and comment events

Field | Stored | Notes
Review state | Yes | "approved", "changes_requested"
Review body | Sanitized | Code blocks removed
Comment body | Sanitized | Code blocks removed
Code diffs | No | Contains actual code; always excluded
File path | Yes | "src/payments/retry.ts" (path only, not content)

Push events

Field | Stored | Notes
Branch | Yes | "main"
Commit messages | Yes | Human-written text describing intent
Commit identifiers | No | Excluded
File lists (added/modified/removed) | No | Excluded
File contents | No | Not included in event payloads
Commit messages are written by developers and may occasionally reference code patterns. Cogniscape stores them as-is because they represent developer intent, not source code.

Understanding LLM-reconstructed content

This is the most important section of this document. Even with all code stripped from stored data, you may occasionally see what looks like source code in a Cogniscape MCP response. Here is why.

What the knowledge graph actually contains

When Cogniscape processes a sanitized event, our AI engine extracts entities and facts in natural language. For example, from a pull request review that discusses a timestamp bug fix, the graph might store the following. The function and variable names are extracted from PR discussions, not from source code.
Entities:
  • addNotification: “A helper function that captures the current ID before incrementing to ensure correct timestamp alignment in notification creation.”
  • currentId: “A variable used to generate sequential notification IDs and corresponding timestamp offsets.”
Facts:
  • “The addNotification helper was introduced to fix an off-by-one bug where the template literal evaluated before the ID increment.”
These are natural-language descriptions. There is no stored code.
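The hypothetical TypeScript types below show one way to picture this data. An entity is a name plus a prose summary, and a fact is a prose statement. The type and field names are assumptions, not Cogniscape's schema. The point is that this model has no field for a function body, only names and natural-language text.

```typescript
// Hypothetical model of graph records; type and field names are assumptions.
interface Entity {
  name: string;    // an identifier mentioned in discussion, e.g. "addNotification"
  summary: string; // natural-language description produced by extraction
}

interface Fact {
  subject: string; // name of the entity the fact is about
  text: string;    // natural-language statement
}

const entities: Entity[] = [
  {
    name: "addNotification",
    summary: "A helper function that captures the current ID before incrementing to ensure correct timestamp alignment in notification creation.",
  },
  {
    name: "currentId",
    summary: "A variable used to generate sequential notification IDs and corresponding timestamp offsets.",
  },
];

const facts: Fact[] = [
  {
    subject: "addNotification",
    text: "The addNotification helper was introduced to fix an off-by-one bug where the template literal evaluated before the ID increment.",
  },
];

// Printing the records shows names and prose only.
for (const e of entities) console.log(`${e.name}: ${e.summary}`);
for (const f of facts) console.log(`(${f.subject}) ${f.text}`);
```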

How code-like content appears in responses

When you query the Cogniscape MCP — for example, asking “What technical issues were found in the notifications PR?” — the following happens:
  1. The Cogniscape MCP searches the knowledge graph and retrieves relevant entities and facts
  2. These results are passed to your LLM (the one powering your Claude Code, Claude Desktop, or other MCP client)
  3. Your LLM synthesizes a response from the semantic descriptions
Because the entity names are code identifiers (addNotification, currentId) and the fact descriptions are detailed enough to convey the logic, your LLM can reconstruct plausible code as part of its response. It is doing what LLMs do — generating the most helpful answer from the context it received.
The code in such responses is generated by your own LLM at query time, not retrieved from the Cogniscape database. It may not even match your actual implementation — it is the LLM’s best interpretation of the semantic descriptions.
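The following sketches the three steps above. The MCP side retrieves records and formats them as plain text. Only step 3, inside your client, turns that text into an answer. `searchGraph`, its naive keyword matching, and the output format are hypothetical stand-ins, not the Cogniscape MCP's actual tool interface.

```typescript
// Hypothetical sketch of what an MCP search tool could hand back to the client.
// searchGraph and its keyword matching are illustrative stand-ins for real search.
interface Entity { name: string; summary: string; }
interface Fact { text: string; }

const ENTITIES: Entity[] = [
  { name: "addNotification", summary: "A helper function that captures the current ID before incrementing to ensure correct timestamp alignment in notification creation." },
  { name: "currentId", summary: "A variable used to generate sequential notification IDs and corresponding timestamp offsets." },
];
const FACTS: Fact[] = [
  { text: "The addNotification helper was introduced to fix an off-by-one bug where the template literal evaluated before the ID increment." },
];

// Step 1: retrieve entities and facts that mention any query keyword (4+ letters).
function searchGraph(query: string): { entities: Entity[]; facts: Fact[] } {
  const words = query.toLowerCase().split(/\W+/).filter((w) => w.length > 3);
  const hit = (s: string) => words.some((w) => s.toLowerCase().includes(w));
  return {
    entities: ENTITIES.filter((e) => hit(e.name) || hit(e.summary)),
    facts: FACTS.filter((f) => hit(f.text)),
  };
}

// Step 2: format the results as plain text for the client's LLM.
function toToolResult(query: string): string {
  const { entities, facts } = searchGraph(query);
  return [
    "Entities:",
    ...entities.map((e) => `- ${e.name}: ${e.summary}`),
    "Facts:",
    ...facts.map((f) => `- ${f.text}`),
  ].join("\n");
}

// Step 3 happens in the client: its LLM reads this text and writes the answer.
console.log(toToolResult("notification timestamp bug"));
```

Everything this function returns is stored prose. If code appears in the final answer, the client's model wrote it after this point.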

A concrete example

Here is what is stored in the graph for the notifications example:
Entity: addNotification
Summary: "A helper function that captures the current ID before incrementing to ensure correct timestamp alignment."

Fact: "The template literal was previously causing potential misalignment due to how the ID was being incremented inside the function."
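Given only the two descriptions above, a client LLM might produce something like the following TypeScript. This illustration was written for this page and was not retrieved from Cogniscape. The `AppNotification` type, the fixed base time, the message format, and the one-second offset per ID are all invented details that a model would have to guess. This is exactly why reconstructed code may not match your real implementation.

```typescript
// ILLUSTRATIVE ONLY: the kind of code an LLM might reconstruct from the stored
// descriptions. Everything except the names addNotification and currentId is invented.
interface AppNotification {
  id: number;
  message: string;
  createdAt: string;
}

const BASE_TIME_MS = Date.UTC(2024, 0, 1); // invented: a fixed starting time
let currentId = 0;

function addNotification(text: string): AppNotification {
  // Capture the ID before incrementing, so the template literal and the
  // timestamp offset both use the same value (the fix the stored fact describes).
  const id = currentId;
  currentId += 1;
  return {
    id,
    message: `#${id}: ${text}`,
    createdAt: new Date(BASE_TIME_MS + id * 1000).toISOString(),
  };
}

console.log(addNotification("Build finished"));
console.log(addNotification("Deploy started"));
```

A different model, or the same model given a different prompt, could just as plausibly use a class, a closure-based counter, or millisecond offsets. None of those choices come from the graph.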

Security by design

Cogniscape’s data protection is enforced at multiple layers, ensuring that no single point of failure can expose source code.
Layer | Protection
Event reception | Only selected event types are accepted; others are rejected
Normalization | Raw payloads are discarded; only structured metadata fields proceed
Sanitization | Code blocks, inline code, diff hunks, and sensitive fields are stripped
Knowledge graph | AI extracts natural-language entities and facts, not code
Cogniscape MCP | Returns semantic search results; any code in the final response is generated by the client’s own LLM

Questions?

If you have questions about how Cogniscape handles your data, contact us at [email protected].