
Score Prediction from User Logs with BERT

Tags: nlp, bert, user-modeling, education-technology, transformers

Applying BERT-based sequence modeling to predict user performance scores from interaction logs - demonstrating how transformer architectures can extract learning

User logs - sequences of clicks, answers, navigation paths, and time-on-task - contain rich signals about cognitive engagement and learning progress. This project explores whether a BERT-based sequence model can extract those signals well enough to predict a user's eventual score on a knowledge assessment.

Problem Framing

Given: a sequence of user interactions with a learning platform (e.g., which questions were attempted, answer correctness, time between attempts, navigation patterns).

Predict: the user's final assessment score.

This is framed as a sequence classification problem: the final score is discretized into buckets, and the model predicts which bucket a session falls into. The NLP analogy: each interaction is a "token," and the full session is a "sentence" whose meaning (the predicted score) we want to model.
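To make the token analogy concrete, here is a minimal sketch of turning a session into BERT-style inputs. The event names, the vocabulary, `encode_session`, `score_to_bucket`, and the bucket edges are all hypothetical; the project's actual event schema and score buckets are not described here. Each event becomes one token id, a `[CLS]` token is prepended (its final representation is what the classification head reads), and the sequence is truncated or padded to a fixed length with an attention mask marking the real positions.

```python
# Hypothetical event vocabulary; the project's real event schema is not specified.
SPECIAL_TOKENS = ["[PAD]", "[CLS]", "[UNK]"]
EVENT_TYPES = ["attempt_correct", "attempt_incorrect", "resource_access", "navigate"]
VOCAB = {tok: i for i, tok in enumerate(SPECIAL_TOKENS + EVENT_TYPES)}


def encode_session(events, max_len=8):
    """Map a list of event names to (input_ids, attention_mask).

    Prepends [CLS], keeps at most max_len tokens (the earliest ones), and
    right-pads with [PAD]; the mask is 1 for real tokens and 0 for padding.
    """
    ids = [VOCAB["[CLS]"]] + [VOCAB.get(e, VOCAB["[UNK]"]) for e in events]
    ids = ids[:max_len]
    mask = [1] * len(ids) + [0] * (max_len - len(ids))
    ids += [VOCAB["[PAD]"]] * (max_len - len(ids))
    return ids, mask


def score_to_bucket(score, edges=(50, 70, 85)):
    """Discretize a 0-100 score into a class index (the edges are assumptions)."""
    return sum(score >= edge for edge in edges)


ids, mask = encode_session(["attempt_incorrect", "resource_access", "attempt_correct"], max_len=6)
# ids  -> [1, 4, 5, 3, 0, 0]
# mask -> [1, 1, 1, 1, 0, 0]
```

Truncation here keeps the earliest events of a long session; whether to keep the start or the end is a modeling choice the write-up does not specify.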

Approach

BERT's pre-trained representations are adapted to the interaction-log domain through fine-tuning. Rather than tokenizing text, the model ingests a structured sequence of interaction events, each represented as a learned embedding.

The architecture:

Event embedding layer - maps each interaction type (question attempt, resource access, navigation) to a dense vector
BERT encoder - processes the sequence with self-attention, capturing long-range dependencies between events
Classification head - maps the [CLS] token representation to a predicted score bucket
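The data flow through these three components can be sketched in plain Python. This is a deliberately tiny stand-in, assuming a single attention head, a single layer, no layer normalization or feed-forward sublayer, and random untrained weights; a real BERT encoder stacks many such layers and would start from pre-trained weights. `TinyScoreModel` and its parameters are hypothetical names chosen for illustration.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (stored as a list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


class TinyScoreModel:
    """Hypothetical toy: event embedding -> one self-attention layer -> [CLS] head.

    Random, untrained weights; illustrates the data flow only, not BERT itself.
    """

    def __init__(self, vocab_size, max_len, d=8, n_buckets=4, seed=0):
        rng = random.Random(seed)

        def rand(rows, cols):
            return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.d = d
        self.tok_emb = rand(vocab_size, d)  # event embedding layer
        self.pos_emb = rand(max_len, d)     # learned position embeddings, as in BERT
        self.Wq, self.Wk, self.Wv = rand(d, d), rand(d, d), rand(d, d)
        self.W_out = rand(n_buckets, d)     # classification head weights
        self.b_out = [0.0] * n_buckets

    def forward(self, ids, mask):
        """Return a probability distribution over score buckets."""
        # 1. Event embedding plus position embedding for every slot.
        x = [[t + p for t, p in zip(self.tok_emb[tok], self.pos_emb[pos])]
             for pos, tok in enumerate(ids)]
        # 2. Self-attention: each position attends to all non-padded positions.
        q = [matvec(self.Wq, h) for h in x]
        k = [matvec(self.Wk, h) for h in x]
        v = [matvec(self.Wv, h) for h in x]
        keep = [j for j, m in enumerate(mask) if m]
        hidden = []
        for i in range(len(x)):
            scores = [sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(self.d)
                      for j in keep]
            w = softmax(scores)
            attended = [sum(wj * v[j][c] for wj, j in zip(w, keep))
                        for c in range(self.d)]
            hidden.append([a + b for a, b in zip(x[i], attended)])  # residual
        # 3. Classification head on the [CLS] position (index 0).
        logits = [z + b for z, b in zip(matvec(self.W_out, hidden[0]), self.b_out)]
        return softmax(logits)


# Hypothetical ids: 1 = [CLS], 0 = [PAD], others are event tokens.
model = TinyScoreModel(vocab_size=7, max_len=8)
probs = model.forward([1, 4, 5, 3, 0, 0], [1, 1, 1, 1, 0, 0])
```

Padded positions are excluded from the attention keys, so the predicted distribution depends only on the real events. In practice the same structure would be built with a deep-learning framework and typically trained with cross-entropy over the score buckets.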

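The key insight: BERT's self-attention suits this task because early events in a session (e.g., which topics a user struggles with in the first 10 minutes) carry predictive value for later outcomes. Attention connects every event to every other event in a single step, whereas a standard RNN must carry early information forward through every intermediate step, so by the end of a long session those early signals tend to be underweighted.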
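A toy calculation illustrates the path-length argument. It assumes no positional encodings and a fixed-decay linear recurrence, both simplifications: real LSTMs and GRUs use gates precisely to slow this decay, so the numbers show the direction of the effect, not its size. In attention, a query's weight on a matching key is the same whether that key sits at position 0 or position 99; in the recurrence, the first input's coefficient shrinks geometrically with session length. The function names are hypothetical.

```python
import math


def attention_weights(query, keys):
    """Softmax of scaled dot products between one query and a list of keys."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def first_input_weight_in_recurrence(T, decay=0.9):
    """Toy linear recurrence h_t = decay * h_{t-1} + x_t over T steps.

    Feeds x_1 = 1 and x_t = 0 afterwards, so the final h is exactly the
    coefficient the first input still carries at the end of the session.
    """
    h = 0.0
    for t in range(T):
        x_t = 1.0 if t == 0 else 0.0
        h = decay * h + x_t
    return h


# Hypothetical 2-d keys: one "signal" event that matches the query, 99 "noise" events.
query = [1.0, 0.0]
signal, noise = [3.0, 0.0], [0.0, 3.0]
early = attention_weights(query, [signal] + [noise] * 99)  # signal first
late = attention_weights(query, [noise] * 99 + [signal])   # signal last
print(early[0], late[-1])  # equal up to rounding: position does not matter
print(first_input_weight_in_recurrence(100))  # 0.9 ** 99, up to rounding
```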

Results and Takeaways

Fine-tuned BERT outperformed LSTM and GRU baselines on the held-out test set, with the performance gap widening for longer sessions - consistent with the hypothesis that self-attention better captures long-range dependencies in behavioral sequences.
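The claim that the gap widens for longer sessions implies breaking test accuracy down by session length. A minimal sketch of that evaluation follows, assuming predictions are available as `(session_length, true_bucket, predicted_bucket)` triples; the bin edges and function names are hypothetical, and no results from the project are reproduced here.

```python
from collections import defaultdict


def accuracy_by_length(examples, edges=(20, 50, 100)):
    """Group (session_length, y_true, y_pred) triples into length bins.

    Returns {bin_label: accuracy}; bins with no examples are omitted.
    """
    hits = defaultdict(int)
    counts = defaultdict(int)
    for length, y_true, y_pred in examples:
        # First edge the length falls under, otherwise the overflow bin.
        label = next((f"<{e}" for e in edges if length < e), f">={edges[-1]}")
        counts[label] += 1
        hits[label] += int(y_true == y_pred)
    return {label: hits[label] / counts[label] for label in counts}


def gap_by_length(examples_a, examples_b, edges=(20, 50, 100)):
    """Accuracy of model A minus model B for each bin present in both."""
    acc_a = accuracy_by_length(examples_a, edges)
    acc_b = accuracy_by_length(examples_b, edges)
    return {label: acc_a[label] - acc_b[label] for label in acc_a if label in acc_b}
```

With BERT as model A and an LSTM or GRU baseline as model B, a gap that grows across bins would match the pattern described above; per-bin sample sizes are worth reporting alongside, since long-session bins are often small.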

This project illustrates a recurring theme in applied ML: the right framing matters as much as the architecture. Treating user logs as text-like sequences, and adapting an NLP architecture accordingly, produced a better model than building a dedicated time-series predictor from scratch.
