Gabriel Guralnick
Full-Stack Software Engineer
Machine Learning Researcher
Cognitive Scientist
My goal is to use AI to better understand human cognition and improve the way we interact with technology.
I've been fascinated by technology ever since I was young.
I grew up seeing the development of all kinds of 'magical' new tools, and I'm excited to now be a part of their creation.
I'm also immensely curious about the workings of the human mind and believe one of the best ways to better understand it is through trying to replicate it.
My studies in cognitive science and computer science have given me the tools to do just that.
I'm an experienced researcher, dedicated to sitting down and figuring out how to solve problems and make things work.
At the University of Toronto, I did research on multi-agent reinforcement learning, trying to apply insights from research into social influence in online social networks to create a learning objective that would motivate more selfless action.
At Konrad Group, I researched large language model (LLM) fine-tuning and developed several full-stack proof-of-concept projects to demonstrate potential enterprise applications of LLMs, such as a tool for project managers to automatically generate Agile-style tickets from website design screenshots.
Now, at Abacus.AI, I work on LLM benchmarking and agent evaluations, developing novel tasks and evaluation metrics to improve capabilities and prevent regressions.
I'm a skilled software developer, with experience in full-stack web development and machine learning.
I've worked on projects in a variety of languages and frameworks, including Python, React, and Java.
I'm always looking for new opportunities to learn, grow, and be involved with cutting-edge research and technology.
I thrive in fast-paced, challenging environments where I'm given the chance to explore, experiment, and learn in order to solve complex problems.
November 2024 - Present
Abacus.AI
At Abacus.AI, I've worked on LLM benchmarking and agent evaluations. One of my primary focuses has been LiveBench, a popular general-purpose LLM benchmark that evaluates models on reasoning, mathematics, language, coding, and data analysis capabilities. As its primary maintainer, I've evaluated new models as they are released, developed new tasks, and refreshed existing ones to prevent contamination and score saturation as model capabilities evolve. I also developed LiveSWEBench, a novel benchmark for AI coding assistants, which evaluated models based on their performance when working on their own or in collaboration with a human developer to solve real-world coding tasks. Most recently, I've helped design and develop an evaluation suite for Abacus's DeepAgent product, a generalist agent with full computer access. This has involved creating judging systems with LLM integration to evaluate a variety of outputs, including research reports, presentations, websites, videos, and spreadsheets. Artifacts are evaluated both holistically and against specific criteria such as factual accuracy; to support this, I've developed complex agentic evaluation pipelines that give the LLM judges the context they need for a comprehensive evaluation.
September - December 2023
University of Toronto
I worked as a teaching assistant for the software design class. In addition to helping the professors grade assignments and manage the class, I mentored groups of second-year students as they developed their term projects. I helped students define goals, brainstorm ideas, and implement their solutions to complex problems in the creditors' insurance industry. I guided them throughout the semester as they built full-stack web applications to aid creditors, advising them on frameworks to use for development as well as providers for cloud hosting. Throughout this process, I made sure the projects adhered to modern software design best practices.
May - August 2023
Konrad Group
For my second summer at Konrad, I supported the company's entry into the enterprise AI space by working on a variety of proof-of-concept research projects involving multimodal machine learning pipelines and large language models. The most impressive of these was an application that combined image segmentation and captioning models with an LLM to automatically generate Agile-style tickets from a website screenshot. I worked extensively with both LLM APIs and self-hosted and fine-tuned models. This experience greatly improved my familiarity with all areas of applied machine learning, including building, testing, and fine-tuning models and incorporating them into enterprise applications.
May - August 2022
Konrad Group
At Konrad, I contributed to the maintenance and development of KGPortal, the primary internal web application used by Konrad employees to manage projects and other internal tasks. I helped maintain and improve both the frontend and backend components, implementing new UI components and full-page redesigns as part of a new UI library. I also led development of an entirely new secure file upload service, which involved updating the existing database and backend and adding integration with AWS S3 and Cron to support the new functionality. I gained meaningful industry experience, which further ignited my passion for software development.
LLM Playground
Implemented full-stack application using React and FastAPI to interact with large language models.
Social Learning in Multi-Agent Systems
Collaborated with a professor and several peers to apply insights from research on social networks to multi-agent reinforcement learning.
Content Market Model of Online Social Networks
Implemented and simulated model of online social networks as a content market to evaluate theoretical accuracy.
Fine-Tuning Stable Diffusion
Evaluated the efficacy of LoRA, Textual Inversion, and Dreambooth for introducing new concepts to diffusion image generators.
GRIMM AutoFinance
Created and deployed a web application to help car buyers save time in the dealership.
Wikipedia 6 Degrees of Separation
Represented Wikipedia article categories using a graph to model connections and implemented the PageRank algorithm to judge article importance and relevance.
Climate Change Snowfall Effects
Modeled the relationship between the Global Land-Ocean Temperature Index and US Snowfall by region using Pandas and Plotly.
© 2025 Gabriel Guralnick