Code Provenance Logger
Code Provenance Logger is a VS Code extension for CS1 (Introduction to Computer Science) education and research.
It logs how students write, modify, execute, and debug code, while preserving privacy by design.
The extension focuses on coding behavior and process, not on storing or reconstructing source code content.
Features
Usage
Install the extension
Install using a .vsix file:
VS Code → Extensions → … → Install from VSIX…
or via command line:
code --install-extension code-provenance-logger-0.0.6.vsix
Restart VS Code
The extension activates automatically on startup.
Code as usual
Write, save, and run programs normally.
No additional interaction is required for local logging.
Find local log files
By default, logs are stored as JSONL files under:
Windows
%APPDATA%\Code\User\globalStorage\HongwonJeong.code-provenance-logger\logs
Each file corresponds to a single VS Code session.
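For offline analysis, the logs can be read with a few lines of Python. The following is a minimal sketch, not part of the extension: it assumes the log files use a `.jsonl` extension and hold one JSON object per line, and the helper names (`default_log_dir`, `read_jsonl`, `load_sessions`) are hypothetical.

```python
import json
import os
from pathlib import Path


def default_log_dir():
    """Windows default location listed above; requires APPDATA to be set."""
    return (Path(os.environ["APPDATA"]) / "Code" / "User" / "globalStorage"
            / "HongwonJeong.code-provenance-logger" / "logs")


def read_jsonl(path):
    """Yield one parsed JSON object per non-empty line of a log file."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)


def load_sessions(log_dir):
    """Map each log file name to its records (one file = one session).

    Assumption: log files end in ".jsonl".
    """
    return {p.name: list(read_jsonl(p))
            for p in sorted(Path(log_dir).glob("*.jsonl"))}
```

On Windows, `load_sessions(default_log_dir())` would return a dictionary keyed by file name; on other platforms, pass the log directory explicitly.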
Local Logging and Server Upload
Local Logging (Default, No Token Required)
- Enabled by default (localLoggingEnabled = true)
- Logs are saved only on the local machine
- Suitable for personal use, offline analysis, and classroom deployment without server infrastructure
No configuration or token is required.
Server Upload (Optional, Token Required)
Server upload is disabled by default and requires explicit opt-in.
Data Schema Overview
This section provides a high-level overview of the data schema to clarify what is collected and what is not.
Session Header
Recorded once per session, at the beginning of the log.
- schemaVersion: schema version number
- clientId: anonymized, persistent identifier for the local environment
- sessionId: unique identifier for the current VS Code session
- sessionStartTs: session start timestamp
- workspaceName: workspace name (if available)
- extensionVersion: extension version string
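When loading logs for analysis, it can help to check that a session header carries the fields listed above. This is a small sketch under assumptions: the field names come from the list, `workspaceName` is treated as optional because it is recorded only "if available", and the function name and example values are hypothetical.

```python
# Field names taken from the Session Header list above.
SESSION_HEADER_FIELDS = (
    "schemaVersion",
    "clientId",
    "sessionId",
    "sessionStartTs",
    "workspaceName",
    "extensionVersion",
)

# Assumption: workspaceName may be absent when no workspace is open.
OPTIONAL_FIELDS = {"workspaceName"}


def missing_header_fields(record):
    """Return the required session header fields absent from a record."""
    return [name for name in SESSION_HEADER_FIELDS
            if name not in OPTIONAL_FIELDS and name not in record]


# Hypothetical header; every value here is made up for illustration.
example_header = {
    "schemaVersion": 1,
    "clientId": "client-placeholder",
    "sessionId": "session-placeholder",
    "sessionStartTs": 0,
    "extensionVersion": "0.0.6",
}
```

An empty list from `missing_header_fields(example_header)` means the record has every required field.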
Batch Record
Each batch groups multiple events flushed together.
- batchId: unique batch identifier
- batchStartTs, batchEndTs: batch time window
- flushReason: trigger for the batch (interval, save, run, etc.)
- eventCount: number of events in the batch
- metrics:
  - Inter-event time intervals
  - Approximate typing speed
  - Edit burst count
- batchHash: integrity hash for the batch
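The batch metrics above could be derived along the following lines. This is an illustrative sketch only; the extension's actual formulas are not documented here. It assumes timestamps in milliseconds, treats a burst as a run of events whose gaps do not exceed a threshold (2000 ms is an arbitrary choice), and uses SHA-256 over canonical JSON as a stand-in for the integrity hash. All function names are hypothetical.

```python
import hashlib
import json


def inter_event_intervals(timestamps_ms):
    """Gaps between consecutive event timestamps, in milliseconds."""
    return [b - a for a, b in zip(timestamps_ms, timestamps_ms[1:])]


def count_bursts(timestamps_ms, max_gap_ms=2000):
    """Count runs of events separated by gaps of at most max_gap_ms.

    Assumption: the burst definition and threshold are illustrative.
    """
    if not timestamps_ms:
        return 0
    gaps = inter_event_intervals(timestamps_ms)
    return 1 + sum(1 for gap in gaps if gap > max_gap_ms)


def chars_per_minute(total_chars, start_ms, end_ms):
    """Approximate typing speed over a batch time window."""
    duration_ms = end_ms - start_ms
    if duration_ms <= 0:
        return 0.0
    return total_chars * 60000 / duration_ms


def batch_hash(events):
    """SHA-256 over a canonical JSON encoding of the batch's events.

    Assumption: the real batchHash algorithm may differ.
    """
    payload = json.dumps(events, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Sorting keys before hashing makes the digest independent of dictionary key order, which matters if a batch is re-serialized before verification.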
Event Types (Summary)
The following event types may appear inside a batch:
- Edit
  - Edit type (insert / delete / replace)
  - Text length, newline count
  - Character class distribution
  - Hashed text content
- Save
  - Hashed full-document content
  - Line count
- Selection
  - Cursor position
  - Selection ranges
- Open / Close
  - Timestamped file access events
- Run (Task / Terminal)
  - Run count and phase (start / end)
  - Exit code when available
  - Hashed command line (terminal only)
- Diagnostics
  - Total diagnostic counts by severity
  - Top diagnostics summarized by hashes
Raw source code, raw messages, and raw commands are never stored.
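To make the "statistics and hashes, never raw text" idea concrete, here is a hedged sketch of how an edit could be summarized. It is not the extension's implementation: the hash algorithm (SHA-256), the optional salt, the four character classes, and the output field names are all assumptions chosen for illustration.

```python
import hashlib


def hash_text(text, salt=""):
    """One-way SHA-256 digest of the text (algorithm and salt are assumptions)."""
    return hashlib.sha256((salt + text).encode("utf-8")).hexdigest()


def char_classes(text):
    """Count characters per coarse class; the class set is illustrative."""
    counts = {"letter": 0, "digit": 0, "whitespace": 0, "other": 0}
    for ch in text:
        if ch.isalpha():
            counts["letter"] += 1
        elif ch.isdigit():
            counts["digit"] += 1
        elif ch.isspace():
            counts["whitespace"] += 1
        else:
            counts["other"] += 1
    return counts


def summarize_edit(text):
    """Summarize inserted text without keeping the text itself.

    The field names below are hypothetical, not the real schema.
    """
    return {
        "length": len(text),
        "newlineCount": text.count("\n"),
        "charClasses": char_classes(text),
        "textHash": hash_text(text),
    }
```

The summary still allows equality checks (two identical edits share a hash) and coarse behavioral analysis, while the original characters are not stored in the record.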
Logged Data (Summary)
- Edit statistics (length, newline count, character classes)
- Save-time document hash and line count
- Cursor position and selection ranges
- File open and close timestamps
- Program run attempts and exit codes (when available)
- Diagnostic counts and summaries
Use Cases
- Distinguishing typing-based coding from copy-and-paste behavior
- Analyzing error–fix–run cycles in CS1 assignments
- Studying students’ coding strategies and learning progress
- Building datasets for educational data mining
- Supporting research on programming process and code provenance
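As an example of the first use case, a simple heuristic can flag large single insertions as possible pastes. This sketch assumes each edit event is a dictionary with hypothetical keys `type`, `editType`, and `textLength`; the 50-character threshold is arbitrary and would need tuning against real data (auto-completion and snippets can also produce large insertions).

```python
def is_insert(event):
    """True for edit events that insert text (key names are hypothetical)."""
    return event.get("type") == "edit" and event.get("editType") == "insert"


def flag_possible_pastes(events, min_length=50):
    """Return insert events whose text length meets the threshold."""
    return [e for e in events
            if is_insert(e) and e.get("textLength", 0) >= min_length]


def paste_ratio(events, min_length=50):
    """Share of inserted characters that arrived in large single inserts."""
    inserted = sum(e.get("textLength", 0) for e in events if is_insert(e))
    if inserted == 0:
        return 0.0
    pasted = sum(e.get("textLength", 0)
                 for e in flag_possible_pastes(events, min_length))
    return pasted / inserted
```

A ratio near 0 suggests mostly keystroke-level typing, while a ratio near 1 suggests that most content arrived in large chunks.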
Notes
- Intended for educational and research purposes
- User consent is strongly recommended before classroom deployment
- The extension is designed to minimize data collection by default
- Server upload is strictly opt-in and token-gated
License
MIT
Author
Hongwon Jeong
Hanyang University
beatsbywoni@hanyang.ac.kr