Gabriel Guralnick

My goal is to use AI to better understand human cognition and improve the way we interact with technology.

About Me

Imbue

January 2026 - Present

Member of Technical Staff

Abacus.AI

November 2024 - December 2025

Research Software Engineer

I worked on LLM benchmarking and agent evaluations. I was the primary maintainer of LiveBench, a popular general-purpose LLM benchmark that evaluates models on reasoning, mathematics, language, coding, and data analysis capabilities. I also developed LiveSWEBench, a novel benchmark for AI coding assistants that evaluates models on real-world coding tasks, both when they work on their own and when they collaborate with a human developer. Most recently, I helped design and develop an evaluation suite for DeepAgent, a generalist agent with full computer access. This involved creating complex LLM-as-a-judge systems to evaluate the quality of a variety of outputs, such as research reports and websites.

University of Toronto

September - December 2023

Teaching Assistant: Software Design

I worked as a teaching assistant for the software design class. I mentored groups of second-year students as they developed their term projects. I helped students define goals, brainstorm ideas, and implement their solutions to complex problems in the creditors' insurance industry.

Konrad Group

May - August 2023

Associate Software Developer, Intern

I worked on a variety of proof-of-concept research projects involving multimodal machine learning pipelines and large language models. The most notable of these was an application that automatically generated Agile-style tickets from a website screenshot, combining image segmentation and captioning models with an LLM. I worked extensively with both LLM APIs and self-hosted and fine-tuned models.

Konrad Group

May - August 2022

Associate Software Developer, Intern

I contributed to the maintenance and development of KGPortal, the primary internal web application used by Konrad employees to manage projects and other internal tasks. I helped maintain and improve both the frontend and backend components, implementing new UI components and full-page redesigns as part of a new UI library. I also led development of an entirely new secure file upload service, which involved updating the existing database and backend and integrating with AWS S3 and cron to support the new functionality.

My Projects

© 2026 Gabriel Guralnick