Data Privacy & Code Safety

Cogniscape captures development activity metadata (who did what, when, and why), not source code. Every event from GitHub, Linear, and developer tools passes through a multi-stage sanitization pipeline that strips code blocks, diff hunks, and raw payloads before anything reaches our knowledge graph. In rare cases, a response you receive through the Cogniscape MCP may contain content that resembles code. This is not stored code leaking out; it is your own LLM reconstructing code-like patterns from the semantic descriptions stored in the graph. This page explains exactly how that works and why your source code remains safe.

Cogniscape stores semantic descriptions of what happened in your codebase — never the code itself.

How we process developer activity

Every event that enters Cogniscape passes through four stages before it reaches the knowledge graph. Each stage reduces the payload to only the semantic information needed for developer intelligence.

Stage 1: Event Reception
  GitHub / Linear / Developer Tools → raw event payload received

Stage 2: Normalization
  Raw payload → structured event model
  Only selected fields are mapped (title, action, developer, timestamps)
  The full raw payload is discarded

Stage 3: Sanitization
  Fenced code blocks (```...```) → removed
  Inline code (`...`) → removed
  Diff hunks → excluded
  Raw payloads → excluded
  Internal identifiers → excluded

Stage 4: Knowledge Graph Ingestion
  Sanitized data → AI extraction → entities, facts, and episodes
  The AI extracts semantic meaning: "who did what to what, and why"
  Output: natural-language descriptions, not code
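The fenced-code and inline-code rules in Stage 3 can be illustrated with a minimal sketch, assuming they are applied as regular-expression passes over free-text fields. The function name `sanitizeText` and the exact patterns are hypothetical and not Cogniscape's implementation; a production sanitizer would also need to handle cases such as unterminated fences or tilde (~~~) fences.

```typescript
// Hypothetical sketch of the Stage 3 text sanitizer, not Cogniscape's actual code.
// The fence string is built at runtime so this example renders cleanly as Markdown.
const FENCE = "`".repeat(3);

// Fenced blocks are removed first so their backticks are not mistaken for inline code.
const FENCED_BLOCK = new RegExp(FENCE + "[\\s\\S]*?" + FENCE, "g"); // may span lines
const INLINE_CODE = /`[^`\n]*`/g; // single-line inline code

function sanitizeText(text: string): string {
  return text
    .replace(FENCED_BLOCK, "")
    .replace(INLINE_CODE, "")
    .replace(/[ \t]{2,}/g, " ") // collapse gaps left by removed inline code
    .replace(/[ \t]+\n/g, "\n") // drop trailing whitespace
    .replace(/\n{3,}/g, "\n\n") // collapse blank lines left by removed blocks
    .trim();
}

// Usage: the code is dropped, the surrounding prose survives.
const prBody = [
  "Adds retry logic to the payment service.",
  FENCE + "ts",
  "await retry(() => charge(card), { attempts: 3 });",
  FENCE,
  "The `retry` helper backs off exponentially.",
].join("\n");

console.log(sanitizeText(prBody));
// Adds retry logic to the payment service.
//
// The helper backs off exponentially.
```

Removing fenced blocks before inline code matters: running the inline pattern first could pair a fence's backticks with unrelated ones and leave fragments of the block behind.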

Every event type (pull requests, reviews, comments, issues, pushes) has a dedicated processing path that explicitly selects which fields to include. Unknown event types fall back to a conservative default that still excludes all code and sensitive fields.
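The per-event-type field selection described above could look roughly like the following sketch. The event type names, field names, the `normalize` function, and the contents of the default list are all illustrative assumptions; the point is the pattern of copying only named fields and using a minimal list when the type is unknown.

```typescript
// Hypothetical sketch of Stage 2 normalization: each known event type copies an
// explicit allowlist of fields, and unknown types use a conservative default.
// Event type names and field names are illustrative assumptions.
type RawEvent = { type: string; payload: Record<string, unknown> };
type NormalizedEvent = { type: string; fields: Record<string, unknown> };

const FIELD_ALLOWLISTS = new Map<string, readonly string[]>([
  ["pull_request", ["developer", "repository", "action", "number", "title", "state", "body"]],
  ["review", ["developer", "repository", "state", "body"]],
  ["push", ["developer", "repository", "branch", "commitMessages"]],
]);

// Conservative default: who, where, what, and when. No bodies, diffs, or payloads.
const DEFAULT_FIELDS: readonly string[] = ["developer", "repository", "action", "timestamp"];

function normalize(event: RawEvent): NormalizedEvent {
  const allowed = FIELD_ALLOWLISTS.get(event.type) ?? DEFAULT_FIELDS;
  const fields: Record<string, unknown> = {};
  for (const key of allowed) {
    if (Object.prototype.hasOwnProperty.call(event.payload, key)) {
      fields[key] = event.payload[key];
    }
  }
  // Nothing else is copied, so the rest of the raw payload is discarded here.
  // Free-text fields such as "body" would still go through Stage 3 afterwards.
  return { type: event.type, fields };
}

// Usage: an event type with no dedicated path keeps only the default fields.
const deployment = normalize({
  type: "deployment",
  payload: { developer: "alice", repository: "acme/backend", action: "created", diff: "@@ -1 +1 @@" },
});
console.log(JSON.stringify(deployment.fields));
// {"developer":"alice","repository":"acme/backend","action":"created"}
```

Because fields are copied by name rather than removed by name, a field that is newly added to an upstream payload is ignored by default instead of passing through.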

What we store vs. what we don’t

The tables below show exactly which fields from common developer events are kept and which are discarded.

Pull request events

Field	Stored	Example of what reaches the graph
Developer	Yes	`"alice"`
Repository	Yes	`"acme/backend"`
Action	Yes	`"opened"`, `"merged"`
PR number	Yes	`42`
Title	Yes	`"Add retry logic to payment service"`
State	Yes	`"open"`, `"closed"`
Branch names	Yes	`"feat/retry-payments"`
Labels	Yes	`["bug", "priority:high"]`
Assignees / Reviewers	Yes	`["bob", "carol"]`
PR body (description)	Sanitized	Code blocks and inline code removed; surrounding text kept
Diff / changed files content	No	Never captured
Raw event payload	No	Always excluded
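As an illustration of the pull request table, here is a hypothetical mapping from a GitHub `pull_request` webhook payload to the stored fields. The input field names follow GitHub's webhook format; the `StoredPullRequest` shape, the function names, and the choice to record a merged PR as `"merged"` are assumptions made for this sketch.

```typescript
// Hypothetical mapping from a GitHub "pull_request" webhook payload to stored fields.
// Input names follow GitHub's payload format; everything else is an assumption.
interface GitHubUser { login: string }

interface PullRequestEvent {
  action: string;
  number: number;
  sender: GitHubUser;
  repository: { full_name: string };
  pull_request: {
    title: string;
    state: string;
    body: string | null;
    merged?: boolean;
    head: { ref: string };
    base: { ref: string };
    labels: { name: string }[];
    assignees: GitHubUser[];
    requested_reviewers: GitHubUser[];
  };
}

interface StoredPullRequest {
  developer: string;
  repository: string;
  action: string;
  number: number;
  title: string;
  state: string;
  branches: { head: string; base: string };
  labels: string[];
  assignees: string[];
  reviewers: string[];
  body: string; // sanitized description
}

// Minimal code stripper (see Stage 3); the fence string is built at runtime.
const FENCE = "`".repeat(3);
const stripCode = (text: string): string =>
  text
    .replace(new RegExp(FENCE + "[\\s\\S]*?" + FENCE, "g"), "")
    .replace(/`[^`\n]*`/g, "")
    .trim();

function toStoredPullRequest(e: PullRequestEvent): StoredPullRequest {
  const pr = e.pull_request;
  // Diffs, changed-file contents, and the raw payload are never read here.
  return {
    developer: e.sender.login,
    repository: e.repository.full_name,
    // GitHub reports a merge as action "closed" with merged: true.
    action: e.action === "closed" && pr.merged ? "merged" : e.action,
    number: e.number,
    title: pr.title,
    state: pr.state,
    branches: { head: pr.head.ref, base: pr.base.ref },
    labels: pr.labels.map((l) => l.name),
    assignees: pr.assignees.map((u) => u.login),
    reviewers: pr.requested_reviewers.map((u) => u.login),
    body: stripCode(pr.body ?? ""),
  };
}

// Usage with a trimmed-down example payload (values are made up).
const stored = toStoredPullRequest({
  action: "closed",
  number: 42,
  sender: { login: "alice" },
  repository: { full_name: "acme/backend" },
  pull_request: {
    title: "Add retry logic to payment service",
    state: "closed",
    body: "Retries failed charges.\n" + FENCE + "ts\nretry(charge);\n" + FENCE,
    merged: true,
    head: { ref: "feat/retry-payments" },
    base: { ref: "main" },
    labels: [{ name: "bug" }, { name: "priority:high" }],
    assignees: [{ login: "bob" }],
    requested_reviewers: [{ login: "carol" }],
  },
});
console.log(stored.action, stored.body); // merged Retries failed charges.
```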

Review and comment events

Field	Stored	Notes
Review state	Yes	`"approved"`, `"changes_requested"`
Review body	Sanitized	Code blocks and inline code removed
Comment body	Sanitized	Code blocks and inline code removed
Code diffs	No	Contains actual code — always excluded
File path	Yes	`"src/payments/retry.ts"` (path only, not content)
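The same allowlist idea applies to review comments. GitHub's `pull_request_review_comment` payload carries both the file path and a `diff_hunk` containing the surrounding code. In this hypothetical sketch only the path and a sanitized body are kept; the output shape and function names are assumptions.

```typescript
// Hypothetical mapping for a GitHub "pull_request_review_comment" webhook payload.
// comment.body, comment.path, and comment.diff_hunk follow GitHub's payload format;
// StoredReviewComment and the helper names are assumptions.
interface ReviewCommentEvent {
  action: string;
  comment: {
    body: string;
    path: string;      // file path only, no contents
    diff_hunk: string; // surrounding diff lines, i.e. actual code
    user: { login: string };
  };
}

interface StoredReviewComment {
  developer: string;
  filePath: string;
  body: string; // sanitized
}

// Minimal code stripper (see Stage 3); the fence string is built at runtime.
const FENCE = "`".repeat(3);
const stripCode = (text: string): string =>
  text
    .replace(new RegExp(FENCE + "[\\s\\S]*?" + FENCE, "g"), "")
    .replace(/`[^`\n]*`/g, "")
    .trim();

function toStoredReviewComment(e: ReviewCommentEvent): StoredReviewComment {
  // diff_hunk is never read, so the code it contains cannot reach storage.
  return {
    developer: e.comment.user.login,
    filePath: e.comment.path,
    body: stripCode(e.comment.body),
  };
}

// Usage with a trimmed-down example payload (values are made up).
const storedComment = toStoredReviewComment({
  action: "created",
  comment: {
    body: "Use exponential backoff here, e.g. `delay * 2 ** attempt`",
    path: "src/payments/retry.ts",
    diff_hunk: "@@ -12,2 +12,3 @@\n   await charge(card);\n+  await sleep(delay);",
    user: { login: "carol" },
  },
});
console.log(JSON.stringify(storedComment));
// {"developer":"carol","filePath":"src/payments/retry.ts","body":"Use exponential backoff here, e.g."}
```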

Push events

Field	Stored	Notes
Branch	Yes	`"main"`
Commit messages	Yes	Human-written text describing intent
Commit identifiers	No	Excluded
File lists (added/modified/removed)	No	Excluded
File contents	No	Not included in event payloads

Commit messages are written by developers and may occasionally reference code patterns. Cogniscape stores them as-is because they represent developer intent, not source code.
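A hypothetical sketch of the push mapping, using field names from GitHub's `push` webhook payload (`ref`, `commits[].message`, `commits[].id`, and the `added`/`removed`/`modified` file lists). Commit messages are copied verbatim, as described above, while commit identifiers and file lists are never read. The output shape, the function name, and keeping the sender as the developer are assumptions.

```typescript
// Hypothetical mapping for a GitHub "push" webhook payload. Input field names follow
// GitHub's payload format; StoredPush and toStoredPush are assumptions.
interface PushCommit {
  id: string;
  message: string;
  added: string[];
  removed: string[];
  modified: string[];
}

interface PushEvent {
  ref: string; // e.g. "refs/heads/main"
  sender: { login: string };
  commits: PushCommit[];
}

interface StoredPush {
  developer: string; // assumed to be kept, as for other event types
  branch: string;
  commitMessages: string[];
}

function toStoredPush(e: PushEvent): StoredPush {
  return {
    developer: e.sender.login,
    branch: e.ref.replace(/^refs\/heads\//, ""),
    // Messages are kept as written; commit ids and file lists are never copied.
    commitMessages: e.commits.map((c) => c.message),
  };
}

// Usage with a trimmed-down example payload (values are made up).
const storedPush = toStoredPush({
  ref: "refs/heads/main",
  sender: { login: "alice" },
  commits: [
    {
      id: "9f2c1e0",
      message: "Fix off-by-one in notification timestamps",
      added: [],
      removed: [],
      modified: ["src/notifications.ts"],
    },
  ],
});
console.log(JSON.stringify(storedPush));
// {"developer":"alice","branch":"main","commitMessages":["Fix off-by-one in notification timestamps"]}
```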

Understanding LLM-reconstructed content

This is the most important section of this document. Even with all code stripped from stored data, you may occasionally see what looks like source code in a Cogniscape MCP response. Here is why.

What the knowledge graph actually contains

When Cogniscape processes a sanitized event, our AI engine extracts entities and facts in natural language. For example, from a pull request review that discusses a timestamp bug fix, the graph might store the following. The function and variable names come from the PR discussion, not from source code.

Entities:

addNotification — “A helper function that captures the current ID before incrementing to ensure correct timestamp alignment in notification creation.”
currentId — “A variable used to generate sequential notification IDs and corresponding timestamp offsets.”

Facts:

“The addNotification helper was introduced to fix an off-by-one bug where the template literal evaluated before the ID increment.”

These are natural-language descriptions. There is no stored code.
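To make the distinction concrete, the example entities and facts above can be modeled as plain text records. The `Entity` and `Fact` types, the entity linkage on the fact, and the `looksLikeCode` heuristic below are hypothetical; the heuristic is a rough check for code punctuation, not a real classifier.

```typescript
// Hypothetical model of the stored example: entities and facts are records of text.
interface Entity {
  name: string;    // an identifier mentioned in the PR discussion
  summary: string; // natural-language description
}

interface Fact {
  text: string;    // a natural-language statement
  about: string[]; // names of related entities (assumed linkage)
}

const entities: Entity[] = [
  { name: "addNotification", summary: "A helper function that captures the current ID before incrementing to ensure correct timestamp alignment in notification creation." },
  { name: "currentId", summary: "A variable used to generate sequential notification IDs and corresponding timestamp offsets." },
];

const facts: Fact[] = [
  { text: "The addNotification helper was introduced to fix an off-by-one bug where the template literal evaluated before the ID increment.", about: ["addNotification"] },
];

// Rough heuristic: code usually contains braces, semicolons, arrows, or assignments.
const looksLikeCode = (s: string): boolean => /[{};]|=>|\s=\s/.test(s);

const storedText = entities.map((e) => e.summary).concat(facts.map((f) => f.text));
console.log(storedText.some(looksLikeCode)); // false
```

Only the entity names resemble code, which is exactly what gives a downstream LLM enough of a hint to write code-like output.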

How code-like content appears in responses

When you query the Cogniscape MCP — for example, asking “What technical issues were found in the notifications PR?” — the following happens:

1. The Cogniscape MCP searches the knowledge graph and retrieves relevant entities and facts.
2. These results are passed to your LLM (the one powering Claude Code, Claude Desktop, or another MCP client).
3. Your LLM synthesizes a response from the semantic descriptions.
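The first two steps above can be sketched as follows. Cogniscape's actual search method is not described on this page, so simple term matching stands in for it here; the `GraphItem` type, the `search` function, and the context format are assumptions. What the sketch shows is the shape of the hand-off: the client's LLM receives only stored natural-language text, and any code in its answer is generated after this point.

```typescript
// Hypothetical sketch: retrieval over stored text, then the hand-off to the client LLM.
// Simple term matching stands in for Cogniscape's real search, which is not shown here.
interface GraphItem {
  kind: "entity" | "fact";
  text: string;
}

const graph: GraphItem[] = [
  { kind: "entity", text: "addNotification: A helper function that captures the current ID before incrementing to ensure correct timestamp alignment in notification creation." },
  { kind: "entity", text: "currentId: A variable used to generate sequential notification IDs and corresponding timestamp offsets." },
  { kind: "fact", text: "The addNotification helper was introduced to fix an off-by-one bug where the template literal evaluated before the ID increment." },
];

// Retrieval: rank stored items by how many query terms they contain.
function search(query: string, limit = 3): GraphItem[] {
  const terms = query.toLowerCase().split(/\s+/).filter((t) => t.length > 0);
  return graph
    .map((item) => {
      const text = item.text.toLowerCase();
      return { item, score: terms.filter((t) => text.includes(t)).length };
    })
    .filter((r) => r.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map((r) => r.item);
}

// Hand-off: the results reach the client's LLM as plain text, with no code attached.
const results = search("notification timestamp bug");
const contextForLlm = results.map((r) => `[${r.kind}] ${r.text}`).join("\n");
console.log(contextForLlm);
```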

Because the entity names are code identifiers (addNotification, currentId) and the fact descriptions are detailed enough to convey the logic, your LLM can reconstruct plausible code as part of its response. It is doing what LLMs do — generating the most helpful answer from the context it received.

The code in such responses is generated by your own LLM at query time, not retrieved from the Cogniscape database. It may not even match your actual implementation — it is the LLM’s best interpretation of the semantic descriptions.

A concrete example

Here is what is stored in the graph versus what your LLM might generate:

What Cogniscape stores:

  Entity: addNotification
  Summary: "A helper function that captures the current ID before incrementing to ensure correct timestamp alignment."
  Fact: "The template literal was previously causing potential misalignment due to how the ID was being incremented inside the function."

What your LLM might generate:

  // Before (bug): currentId++ increments AFTER being
  // read in the template literal
  `notif-${currentId++}`

  // After (fix): helper captures id BEFORE incrementing
  const addNotification = () => {
      const id = currentId;
      currentId++;
      return { id: `notif-${id}`, timestamp: ... }
  }

This code was not stored anywhere in Cogniscape. The LLM reconstructed it from the natural-language descriptions to illustrate the concept in its response.

Security by design

Cogniscape’s data protection is enforced at multiple layers, ensuring that no single point of failure can expose source code.

Layer	Protection
Event reception	Only selected event types are accepted; others are rejected
Normalization	Raw payloads are discarded — only structured metadata fields proceed
Sanitization	Code blocks, inline code, diff hunks, and sensitive fields are stripped
Knowledge graph	AI extracts natural-language entities and facts, not code
Cogniscape MCP	Returns semantic search results; any code in the final response is generated by the client’s own LLM

Questions?

If you have questions about how Cogniscape handles your data, contact us at [email protected].
