Summary
We identified and resolved a data quality issue affecting how AI-assisted coding sessions (Claude Code) were captured and displayed in Cogniscape dashboards. Instead of showing meaningful details about what developers worked on — files touched, technologies used, objectives and outcomes — the system was producing generic entries such as “Developer completed a session.” This significantly reduced the value of the five core visibility dimensions Cogniscape provides: what, who, when, how, and why. The issue was fully resolved within 24 hours. Session data quality improved from 4/10 to approximately 8/10 across all dimensions.

What Was Affected
Engineering managers relying on Cogniscape for AI coding session insights were seeing:
- Vague descriptions — no detail about what was actually built or investigated
- Missing file data — no visibility into which files or areas of the codebase were touched
- Incorrect branch information — always showed the default branch, even for feature work
- Incomplete session capture — only the first few minutes of long sessions were reflected
Root Cause
Multiple issues in the data processing pipeline compounded to degrade session quality:
- Data loss during processing — session details (title, result, files) were being dropped when certain metadata was present, replaced by a generic description
- Only the first snapshot was kept — long-running sessions send periodic updates, but only the earliest update was retained, missing hours of subsequent work
- Unstructured summaries — raw conversation fragments were used instead of structured descriptions, making it difficult to extract meaningful insights
- Missing metadata — rich session data (files modified, commands run, tools used) was available but not being captured
- Noisy file references — file paths included developer-specific system paths instead of clean, project-relative references
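The snapshot-retention bug above follows a common "insert if absent" anti-pattern. The sketch below is purely illustrative (the function and store names are hypothetical, not Cogniscape's actual code), but it shows how keeping only the first update silently discards all later work:

```python
# Illustrative sketch of the snapshot-retention bug. Long-running sessions
# send periodic updates; the buggy store ignored every update after the first.
buggy_store: dict = {}
fixed_store: dict = {}

def store_buggy(store: dict, session_id: str, snapshot: dict) -> None:
    # Bug: "insert if absent" drops every later update for the session.
    if session_id not in store:
        store[session_id] = snapshot

def store_fixed(store: dict, session_id: str, snapshot: dict) -> None:
    # Fix: always upsert, so the record reflects the latest session state.
    store[session_id] = snapshot

# A session that starts small and grows over hours of work:
for snapshot in [{"files_touched": 2}, {"files_touched": 30}]:
    store_buggy(buggy_store, "s1", snapshot)
    store_fixed(fixed_store, "s1", snapshot)

print(buggy_store["s1"]["files_touched"])  # 2  — only the first few minutes kept
print(fixed_store["s1"]["files_touched"])  # 30 — the entire session reflected
```

The fix in phase 3 went further than this sketch by storing each update rather than only the latest, but the core change is the same: later snapshots must not be discarded.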
Resolution
We deployed a series of targeted fixes over four releases:

| Phase | What Changed |
|---|---|
| 1 | Fixed data loss — all session details are now preserved through the full processing pipeline |
| 2 | Improved entity classification — entities are now correctly categorized (developers, repositories, pull requests, etc.) |
| 3 | Enabled progressive session capture — each session update is now stored, reflecting the full scope of work regardless of duration |
| 4 | Added AI-powered summarization — raw session data is automatically transformed into structured summaries covering objective, work performed, technologies used, and outcome |
In addition:
- File paths are now clean and project-relative
- Branch detection now works correctly for all Git workflows
- Metadata noise was reduced by roughly 90%
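To make the phase 4 output and the path cleanup concrete, here is a minimal sketch in Python. `SessionSummary` and `to_project_relative` are hypothetical names chosen for illustration, not Cogniscape's actual API:

```python
import posixpath
from dataclasses import dataclass

@dataclass
class SessionSummary:
    # The four fields the structured summaries now cover.
    objective: str
    work_performed: str
    technologies: list[str]
    outcome: str

def to_project_relative(path: str, project_root: str) -> str:
    # Strip the developer-specific system prefix so file references
    # are clean and project-relative.
    return posixpath.relpath(path, project_root)

print(to_project_relative("/Users/alice/dev/app/src/api/handlers.py",
                          "/Users/alice/dev/app"))
# → src/api/handlers.py
```

Structuring summaries into fixed fields like these, rather than storing raw conversation fragments, is what makes downstream filtering and aggregation in the dashboards tractable.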
Results
Quality Score Comparison
| Dimension | Before | After |
|---|---|---|
| What (work performed) | 2/10 | 8/10 |
| Who (people involved) | 3/10 | 7/10 |
| When (timing) | 9/10 | 9/10 |
| How (methods and files) | 1/10 | 8/10 |
| Why (motivation and context) | 1/10 | 7/10 |
| Overall | 4/10 | ~8/10 |
Key Improvements
| Metric | Before | After |
|---|---|---|
| Files captured per session | 0 | 30+ (modified and read) |
| Commands captured | 0 | 5 most recent |
| Tool usage breakdown | Not available | Full session breakdown |
| Branch accuracy | Incorrect | Correct |
| Session coverage | First few minutes only | Entire session |
| Summary quality | Generic fragments | Structured objective/work/outcome |
Privacy
During the investigation, we evaluated capturing developer prompts (the messages typed to the AI assistant) to improve “why” context. This was immediately rejected as a privacy violation — no such data was ever persisted or made available. Cogniscape’s policy remains unchanged: we never capture what developers type to their AI coding assistants. All insights are derived from structured activity metadata (files, commands, tool usage) and AI-generated summaries of the assistant’s responses only.

Lessons Learned
- Test with realistic data — the primary bug went undetected because test scenarios didn’t match real-world session patterns
- Progressive data matters — coding sessions evolve over hours; capturing only the initial state misses the majority of the work
- AI summarization is essential — raw conversation text is not a summary; AI-powered structuring dramatically improves downstream analysis
- Privacy by design — even when a data point would improve analytics, it must be evaluated against trust and compliance requirements first
Timeline
| Date | Milestone |
|---|---|
| March 20, 9:00 AM | Issue identified — quality investigation started |
| March 20, 3:00 PM | Root causes identified |
| March 20, 7:30 PM | First two fixes deployed |
| March 21, 9:30 AM | All fixes deployed and verified in production |
| March 21, 9:52 AM | Updated client (v1.6.0) released |
| March 21, 9:56 AM | Auto-update to all active installations confirmed |
Next Steps
- Minor client patch (v1.6.1) to clean up a residual configuration item
- Continued improvements to entity classification and deduplication
- Additional data sanitization for sensitive command-line content
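One possible shape for the planned command-line sanitization is a pattern-based redactor applied before commands are persisted. This is a sketch under assumptions — the flag patterns and function name are illustrative, not the planned implementation:

```python
import re

# Illustrative sanitizer: mask values passed to common secret-bearing
# flags before a captured command is persisted.
SECRET_FLAGS = re.compile(
    r"(--?(?:password|token|api-key|secret)[= ])\S+",
    re.IGNORECASE,
)

def sanitize_command(cmd: str) -> str:
    # \1 keeps the flag itself; only the secret value is replaced.
    return SECRET_FLAGS.sub(r"\1[REDACTED]", cmd)

print(sanitize_command("deploy --password hunter2 --region us-east-1"))
# → deploy --password [REDACTED] --region us-east-1
```

A real implementation would likely need a broader pattern set (environment variable assignments, bearer headers, connection strings) and an allowlist review, but the persist-time choke point is the key design decision: commands are scrubbed once, before any storage or display.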