Software Architect - Agentic Evals Job at Datagrid AI, Sunnyvale, CA

  • Datagrid AI
  • Sunnyvale, CA

Job Description

Fully remote, except for occasional in-person meetings in San Francisco to collaborate.

Bay Area residency required.

We believe that everyone deserves their own personal army of AI helpers with deep access to company data to automate any task. Datagrid continuously ingests business data from 100+ sources, makes it all available to AI, and eliminates grunt work, such as categorizing 10k support tickets in minutes.

We are a Series-A startup headquartered in San Francisco, but operate as a distributed company. We offer competitive salaries and health benefits, along with equity and respect for work/life balance.

Join our tight-knit team that ships fast and pushes the boundaries of AI! In the last few months, our Agents have learned to use Microsoft Teams, write SQL queries, and automate tasks on complex schedules like “MWF at half past 9”. Our Agents live where people work (Slack, Microsoft Teams, etc.) and automatically take useful actions, such as producing safety reports from worksite photos.

Responsibilities

Datagrid Agents operate where our customers work: across Teams, Slack, and even SMS. Agents make multistep plans, leverage vectorized data from 100+ sources, use tools like DocuSign, and manipulate the Datagrid app. We cannot possibly test all of this manually.

Your job will be to:

  • Work closely with an ex-Googler who built Gemini evals to create a harness for evaluating Agent performance, make that harness available both for local development and in CI/CD pipelines, and set up alerting for when Agents misbehave.
  • Influence and contribute to the extension of Datagrid’s Agentic capabilities.
  • Choose the best open- and closed-source components to build out the testing infrastructure.
  • Integrate publicly available benchmarks such as RAGBench into the testing system.
  • Enable subject matter experts to add to the test library using customer queries, manually authored cases, and synthetically generated questions.
  • Expose evaluation performance so the company can track improvement over time.

Desired Experience

  • Proven track record of building test harnesses for chat agents from 0 to 1.
  • 10+ years of B2B software engineering experience.
  • Ability to write effective LLM prompts without assistance.
  • Proficiency with Node.js and server-side frameworks such as NestJS or Next.js.
  • Familiarity with JavaScript frameworks such as React and AngularJS.
  • Experience with databases such as Weaviate and BigQuery.
  • Experience working with GCP or similar cloud providers.

Salary Range: $200k - $240k

  • Equity
  • 100% covered medical, dental and vision
  • 401k

All candidates for this role will be asked the following interview question: “Work with me to design a system to evaluate the Agent’s performance at writing SQL queries.” We don’t expect you to have the perfect answer, but we will evaluate your ability to clearly explain your thinking.
