About the Role

We’re looking for someone who can design realistic and structured evaluation scenarios for LLM-based agents.... You’ll create test cases that simulate human-performed tasks and define gold-standard behavior to compare agent actions...