게시물 작성자 Sara Wiltberger

1 결과

필터 지우기
  • 2025년 8월 27일 / Google Labs

    Stop “vibe testing” your LLMs. It's time for real evals.

    Stax, an experimental developer tool, addresses the insufficient nature of "vibe testing" LLMs by streamlining the LLM evaluation lifecycle, allowing users to rigorously test their AI stack and make data-driven decisions through human labeling and scalable LLM-as-a-judge auto-raters.

    Stax