Gabriel Guralnick

Full-Stack Software Engineer

Machine Learning Researcher

Cognitive Scientist

My goal is to use AI to better understand human cognition and improve the way we interact with technology.

About Me

Ever since I was young, I've been fascinated by technology. I grew up watching the development of all kinds of 'magical' new tools, and I'm excited to now be a part of their creation. I'm also immensely curious about the workings of the human mind, and I believe one of the best ways to understand it is to try to replicate it. My studies in cognitive science and computer science have given me the tools to do just that.

I'm an experienced researcher, dedicated to digging into problems until I've figured out how to solve them. At the University of Toronto, I researched multi-agent reinforcement learning, applying insights from studies of social influence in online social networks to create a learning objective that motivates more selfless action. At Konrad Group, I researched large language model (LLM) fine-tuning and developed several full-stack proof-of-concept projects to demonstrate potential enterprise applications of LLMs, such as a tool that lets project managers automatically generate Agile-style tickets from website design screenshots. Now, at Abacus.AI, I work on LLM benchmarking and agent evaluations, developing novel tasks and evaluation metrics to improve capabilities and prevent regressions.

I'm a skilled software developer with experience in full-stack web development and machine learning. I've worked on projects in a variety of languages and frameworks, including Python, React, and Java. I'm always looking for opportunities to learn, grow, and engage with cutting-edge research and technology, and I thrive in fast-paced, challenging environments where I have the chance to explore, experiment, and learn in order to solve complex problems.

Experience

Abacus.AI

November 2024 - Present

Research Software Engineer


At Abacus.AI, I've worked on LLM benchmarking and agent evaluations. One of my primary focuses has been LiveBench, a popular general-purpose LLM benchmark that evaluates models on reasoning, mathematics, language, coding, and data analysis. As the benchmark's primary maintainer, I evaluate new models as they are released, develop new tasks, and refresh existing ones to prevent contamination and score saturation as model capabilities evolve. I also developed LiveSWEBench, a novel benchmark for AI coding assistants that evaluates models on their performance working alone or in collaboration with a human developer on real-world coding tasks. Most recently, I've helped design and develop an evaluation suite for Abacus's DeepAgent product, a generalist agent with full computer access. This has involved creating LLM-based judging systems to evaluate a variety of outputs, including research reports, presentations, websites, videos, and spreadsheets. Artifacts are evaluated both holistically and against specific criteria such as factual accuracy; to support this, I've built agentic evaluation pipelines that provide the LLM judges with the context they need for a comprehensive evaluation.

University of Toronto

September - December 2023

Teaching Assistant: Software Design


I worked as a teaching assistant for the software design class. In addition to helping the professors grade assignments and manage the class, I mentored groups of second-year students as they developed their term projects, helping them define goals, brainstorm ideas, and implement solutions to complex problems in the creditors' insurance industry. I guided them throughout the semester as they built full-stack web applications to aid creditors, advising them on development frameworks and cloud hosting providers, and I made sure their projects adhered to modern software design best practices.

Konrad Group

May - August 2023

Associate Software Developer, Intern


For my second summer at Konrad, I supported the company's entry into the enterprise AI space by working on a variety of proof-of-concept research projects involving multimodal machine learning pipelines and large language models. The most impressive of these was an application that automatically generated Agile-style tickets from a website screenshot, combining image segmentation and captioning models with an LLM. I worked extensively with both LLM APIs and self-hosted, fine-tuned models. This experience greatly improved my familiarity with all areas of applied machine learning, including building, testing, and fine-tuning models and integrating them into enterprise applications.

Konrad Group

May - August 2022

Associate Software Developer, Intern


At Konrad, I contributed to the maintenance and development of KGPortal, the primary internal web application Konrad employees use to manage projects and other internal tasks. I helped maintain and improve both the frontend and backend, implementing new UI components and full-page redesigns as part of a new UI library. I also led development of an entirely new secure file upload service, which involved updating the existing database and backend and integrating AWS S3 and cron jobs to support the new functionality. I gained meaningful industry experience and further ignited my passion for software development.

My Projects

© 2025 Gabriel Guralnick