AI benchmarks don't work the way they should.

A new system lets people decide what really matters when judging models. That could be something critical, like ensuring alignment for human safety, or something small, like avoiding em dashes in text.

With this system, you design the tests.
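As a rough illustration, a user-defined test might be nothing more than a named check applied to model outputs. The sketch below is hypothetical; the article does not describe the system's actual interface, so the `Criterion` and `score` names are placeholders, and the em-dash check simply mirrors the example above.

```python
# A minimal, hypothetical sketch of a user-defined benchmark criterion.
# Names (Criterion, score) are placeholders, not the system's real API.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Criterion:
    """A named pass/fail check the user cares about."""
    name: str
    check: Callable[[str], bool]  # returns True if a model output passes


def score(outputs: List[str], criteria: List[Criterion]) -> dict:
    """Fraction of outputs that pass each criterion."""
    return {
        c.name: sum(c.check(o) for o in outputs) / len(outputs)
        for c in criteria
    }


# Example criterion from the article: avoid em dashes in generated text.
no_em_dashes = Criterion("no_em_dashes", lambda text: "\u2014" not in text)

outputs = ["A clean sentence.", "A sentence \u2014 with an em dash."]
print(score(outputs, [no_em_dashes]))  # {'no_em_dashes': 0.5}
```

In this framing, "designing the tests" just means choosing which criteria to include and how much weight each one gets.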
Comments
AirdropLicker · 20h ago
You can set your own standards now, not bad.

TaxEvader · 08-17 01:22
Ah, yes, yes, yes. Letting people design it themselves is pretty good.

StealthDeployer · 08-17 01:19
The indicators still have to be determined by humans.

WhaleWatcher · 08-17 01:18
Another testing process? Same old routine.

nft_widow · 08-17 00:56
The test standards still depend on how the person sets them.