
No discussion on problem difficulty, or on result quality besides "the Edgee run generated slightly more output tokens than the baseline".


More info in the GitHub repo, in the reports folder (sorry, I'm not sure I can add the link here without being flagged).

"Codex + Edgee consumes roughly half the fresh tokens of the normal Codex baseline. Output tokens are marginally higher (+3,312, +19.5%), suggesting the Edgee scenario produces slightly more verbose responses but dramatically reduces context ingestion."
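As a rough sanity check on the quoted figures, here is a minimal sketch of the arithmetic, assuming the +19.5% is measured relative to the baseline's output tokens (the report excerpt doesn't say so explicitly). The function name is made up for illustration, and the results are only approximate because the percentage is rounded.

```python
def implied_baseline(delta_tokens: float, pct_increase: float) -> float:
    """Back out the baseline count from an absolute and a relative increase.

    Assumes pct_increase is expressed relative to the baseline:
    delta = baseline * pct_increase / 100.
    """
    return delta_tokens / (pct_increase / 100.0)


# Figures quoted from the report: +3,312 output tokens, +19.5%.
baseline_output = implied_baseline(3312, 19.5)
edgee_output = baseline_output + 3312

print(f"implied baseline output tokens: ~{baseline_output:,.0f}")
print(f"implied Edgee output tokens:    ~{edgee_output:,.0f}")
```

So under that assumption the baseline produced roughly 17k output tokens and the Edgee run roughly 20k, which is a small absolute difference next to the claimed halving of fresh input tokens.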


I think the problem given to Codex for the benchmark is the one in the attached video, where two Codex instances run side by side, working on a "standard" dev task.



