Summary
We identified and resolved a data quality issue affecting how AI-assisted coding sessions (Claude Code) were captured and displayed in Cogniscape dashboards. Instead of showing meaningful details about what developers worked on — files touched, technologies used, objectives and outcomes — the system was producing generic entries such as “Developer completed a session.” This significantly reduced the value of the five core visibility dimensions Cogniscape provides: what, who, when, how, and why. The issue was fully resolved within 24 hours. Session data quality improved from 4/10 to approximately 8/10 across all dimensions.

What Was Affected
Engineering managers relying on Cogniscape for AI coding session insights were seeing:
- Vague descriptions — no detail about what was actually built or investigated
- Missing file data — no visibility into which files or areas of the codebase were touched
- Incorrect branch information — always showed the default branch, even for feature work
- Incomplete session capture — only the first few minutes of long sessions were reflected
Root Cause
Multiple issues in the data processing pipeline compounded to degrade session quality:
- Data loss during processing — session details (title, result, files) were being dropped when certain metadata was present, replaced by a generic description
- Only the first snapshot was kept — long-running sessions send periodic updates, but only the earliest update was retained, missing hours of subsequent work
- Unstructured summaries — raw conversation fragments were used instead of structured descriptions, making it difficult to extract meaningful insights
- Missing metadata — rich session data (files modified, commands run, tools used) was available but not being captured
- Noisy file references — file paths included developer-specific system paths instead of clean, project-relative references
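The snapshot-retention bug above follows a common "insert if absent" anti-pattern. The sketch below is purely illustrative (the function and store names are hypothetical, not Cogniscape's actual code), but it shows how keeping only the first update silently discards all later work:

```python
# Illustrative sketch of the snapshot-retention bug. Long-running sessions
# send periodic updates; the buggy store ignored every update after the first.
buggy_store: dict = {}
fixed_store: dict = {}

def store_buggy(store: dict, session_id: str, snapshot: dict) -> None:
    # Bug: "insert if absent" drops every later update for the session.
    if session_id not in store:
        store[session_id] = snapshot

def store_fixed(store: dict, session_id: str, snapshot: dict) -> None:
    # Fix: always upsert, so the record reflects the latest session state.
    store[session_id] = snapshot

# A session that starts small and grows over hours of work:
for snapshot in [{"files_touched": 2}, {"files_touched": 30}]:
    store_buggy(buggy_store, "s1", snapshot)
    store_fixed(fixed_store, "s1", snapshot)

print(buggy_store["s1"]["files_touched"])  # 2  — only the first few minutes kept
print(fixed_store["s1"]["files_touched"])  # 30 — the entire session reflected
```

The fix in phase 3 went further than this sketch by storing each update rather than only the latest, but the core change is the same: later snapshots must not be discarded.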
Resolution
We deployed a series of targeted fixes over four releases:

| Phase | What Changed |
|---|---|
| 1 | Fixed data loss — all session details are now preserved through the full processing pipeline |
| 2 | Improved entity classification — entities are now correctly categorized (developers, repositories, pull requests, etc.) |
| 3 | Enabled progressive session capture — each session update is now stored, reflecting the full scope of work regardless of duration |
| 4 | Added AI-powered summarization — raw session data is automatically transformed into structured summaries covering objective, work performed, technologies used, and outcome |
In addition:
- File paths are now clean and project-relative
- Branch detection now works correctly for all Git workflows
- Metadata noise was reduced by roughly 90%
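To make the phase 4 output and the path cleanup concrete, here is a minimal sketch in Python. `SessionSummary` and `to_project_relative` are hypothetical names chosen for illustration, not Cogniscape's actual API:

```python
import posixpath
from dataclasses import dataclass

@dataclass
class SessionSummary:
    # The four fields the structured summaries now cover.
    objective: str
    work_performed: str
    technologies: list[str]
    outcome: str

def to_project_relative(path: str, project_root: str) -> str:
    # Strip the developer-specific system prefix so file references
    # are clean and project-relative.
    return posixpath.relpath(path, project_root)

print(to_project_relative("/Users/alice/dev/app/src/api/handlers.py",
                          "/Users/alice/dev/app"))
# → src/api/handlers.py
```

Structuring summaries into fixed fields like these, rather than storing raw conversation fragments, is what makes downstream filtering and aggregation in the dashboards tractable.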
Results
Quality Score Comparison
| Dimension | Before | After |
|---|---|---|
| What (work performed) | 2/10 | 8/10 |
| Who (people involved) | 3/10 | 7/10 |
| When (timing) | 9/10 | 9/10 |
| How (methods and files) | 1/10 | 8/10 |
| Why (motivation and context) | 1/10 | 7/10 |
| Overall | 4/10 | ~8/10 |
Key Improvements
| Metric | Before | After |
|---|---|---|
| Files captured per session | 0 | 30+ (modified and read) |
| Commands captured | 0 | 5 most recent |
| Tool usage breakdown | Not available | Full session breakdown |
| Branch accuracy | Incorrect | Correct |
| Session coverage | First few minutes only | Entire session |
| Summary quality | Generic fragments | Structured objective/work/outcome |
Privacy
During the investigation, we evaluated capturing developer prompts (the messages typed to the AI assistant) to improve “why” context. This was immediately rejected as a privacy violation — no such data was ever persisted or made available. Cogniscape’s policy remains unchanged: we never capture what developers type to their AI coding assistants. All insights are derived from structured activity metadata (files, commands, tool usage) and AI-generated summaries of the assistant’s responses only.

Lessons Learned
- Test with realistic data — the primary bug went undetected because test scenarios didn’t match real-world session patterns
- Progressive data matters — coding sessions evolve over hours; capturing only the initial state misses the majority of the work
- AI summarization is essential — raw conversation text is not a summary; AI-powered structuring dramatically improves downstream analysis
- Privacy by design — even when a data point would improve analytics, it must be evaluated against trust and compliance requirements first
Timeline
| Date | Milestone |
|---|---|
| March 20, 9:00 AM | Issue identified — quality investigation started |
| March 20, 3:00 PM | Root causes identified |
| March 20, 7:30 PM | First two fixes deployed |
| March 21, 9:30 AM | All fixes deployed and verified in production |
| March 21, 9:52 AM | Updated client (v1.6.0) released |
| March 21, 9:56 AM | Auto-update to all active installations confirmed |
Next Steps
- Minor client patch (v1.6.1) to clean up a residual configuration item
- Continued improvements to entity classification and deduplication
- Additional data sanitization for sensitive command-line content
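One possible shape for the planned command-line sanitization is a pattern-based redactor applied before commands are persisted. This is a sketch under assumptions — the flag patterns and function name are illustrative, not the planned implementation:

```python
import re

# Illustrative sanitizer: mask values passed to common secret-bearing
# flags before a captured command is persisted.
SECRET_FLAGS = re.compile(
    r"(--?(?:password|token|api-key|secret)[= ])\S+",
    re.IGNORECASE,
)

def sanitize_command(cmd: str) -> str:
    # \1 keeps the flag itself; only the secret value is replaced.
    return SECRET_FLAGS.sub(r"\1[REDACTED]", cmd)

print(sanitize_command("deploy --password hunter2 --region us-east-1"))
# → deploy --password [REDACTED] --region us-east-1
```

A real implementation would likely need a broader pattern set (environment variable assignments, bearer headers, connection strings) and an allowlist review, but the persist-time choke point is the key design decision: commands are scrubbed once, before any storage or display.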