The OpenAI team built a new benchmark dataset, SimpleQA, that evaluates large language models' (LLMs) ability to answer factual questions. A particularly intriguing aspect of the paper is how, in this era of LLMs, the researchers leverage LLMs in their own workflow to design, iterate on, and analyze a new dataset.
Read More

Thought Preference Optimization (TPO)
Thought Preference Optimization (TPO): prompt the model to generate an internal thought process before its final response. TPO delivers significant performance gains on non-reasoning categories, including translation, marketing, and health; reasoning categories such as math and analysis also improve.
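The thought-then-response format can be sketched as follows. This is a minimal illustration, not the paper's exact prompt: the tag names, template wording, and parser are assumptions made for the example.

```python
# Hypothetical TPO-style template: the model writes its reasoning
# between <thought> tags, then the user-visible response after them.
THOUGHT_PROMPT = (
    "Respond to the query below. First write your internal reasoning "
    "between <thought> and </thought> tags, then write the final "
    "response after the closing tag.\n\nQuery: {query}"
)

def split_thought(model_output: str) -> tuple[str, str]:
    """Separate the hidden thought from the user-visible response."""
    open_tag, close_tag = "<thought>", "</thought>"
    start = model_output.find(open_tag)
    end = model_output.find(close_tag)
    if start == -1 or end == -1:
        # No thought section found; treat everything as the response.
        return "", model_output.strip()
    thought = model_output[start + len(open_tag):end].strip()
    response = model_output[end + len(close_tag):].strip()
    return thought, response

thought, response = split_thought(
    "<thought>Keep the idiom natural in French.</thought>Bonne chance !"
)
print(thought)    # → Keep the idiom natural in French.
print(response)   # → Bonne chance !
```

Only the response part would be shown to the user (or to a judge model), which is what lets the thought process improve outputs without cluttering them.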
Read More