I study the long-term effects of AI use on people through large-scale randomized experiments, and use the findings to design better evaluations and post-training methods. These long-term effects have received little attention, partly because the experiments and longitudinal studies needed to measure them are difficult to design and run. The signals we currently use to train and evaluate models are mostly short-term preferences, which resemble surveys more than experiments and say little about outcomes like learning, creativity, and wellbeing. I want to bring signals about those outcomes into how models are trained and evaluated.
LLMs can enhance human creativity when people co-create with them, but it is unclear what happens to unassisted human creativity. In our CHI 2025 paper, we conducted randomized experiments with 1,100 participants and found that LLM assistance boosted creative performance during use but hindered independent performance afterward, with evidence of homogenization (code). In ongoing work, we study how LLMs with different personas affect mathematical problem solving and open-ended writing (under review; code). In a CHI 2026 paper led by Jiayin Zhi, we investigated how the timing of LLM access affects critical thinking under time constraints. Larger experiments are underway, including on the emotional effects of creating with LLMs and on how AI involvement changes the way others receive a creative artifact.
Does help from an LLM build understanding that persists once the model is gone? I have conducted randomized experiments in classrooms and online to study this: on guiding students in their use of LLMs (CSCW 2024), supporting self-reflection at scale (L@S 2024), and math tutoring with 1,800 learners (AIED 2025). These experiments used designs that test participants after assistance is removed, separating performance with AI from capability without it. I also design interventions that target outcomes beyond retention, such as learners’ self-confidence and interest in the subject.
People increasingly bring personal struggles to LLMs. With Mental Health America, I led the technical side of a text-messaging intervention that was deployed to over 10,000 people (IAAI 2024), part of the Small Steps program (Journal of Affective Disorders). In our CHI 2026 paper, we compared AI and human responses to real advice-seeking posts. Our ICML 2026 position paper argues that LLMs should be optimized for our wellbeing rather than for short-horizon preferences. I am currently running longitudinal studies and designing LLM agents that optimize for longer-term wellbeing outcomes rather than immediate user satisfaction alone.
My experiments run in both controlled and field settings: online platforms, classrooms, and deployed interventions. For accepted papers, I release the code and data (wherever allowed) on GitHub so that others can build on these experiments. I am comfortable with the LLM post-training stack: supervised fine-tuning and preference optimization (TRL, LoRA/QLoRA), inference with vLLM, multi-GPU training, and LLM-as-judge evaluation pipelines.
My long-term agenda is to build AI systems with positive societal impact, where success is measured not only by immediate task performance but also by how these systems shape human learning, creativity, and wellbeing over time. I want to run longitudinal experiments over longer horizons, and to scale the evaluations these experiments make possible. I also want to move beyond the single user as the unit of analysis: many of the effects I care about play out in groups and communities, and these can be studied experimentally too. On the technical side, I want to develop post-training methods tailored directly to the domains I study, so that what we measure changes how models are trained.
The problems I work on are interdisciplinary. Besides HCI and ML researchers, I have worked with psychologists, philosophers, economists, learning scientists, clinicians, tutors, and artists. I have led teams, as in the Mental Health America project, and worked within larger organizations, as an Applied Scientist Intern at Microsoft. I regularly work with undergraduates, many of whom are co-authors on my papers. I have also served on the program committees of AIED (2025) and Learning@Scale (2025–26), and as an expert reviewer for the Tools Competition. If you work on similar questions, I’d be happy to talk.