Human Combo Leaderboard: 30 day

  • Ranking: The forecaster's position in the leaderboard, ordered by Overall Score (lower is better)
  • Organization: The group responsible for the model or forecasts
  • Model: The LLM and prompt information, or the human group and forecast aggregation method
    • zero shot: used a zero-shot prompt
    • scratchpad: used a scratchpad prompt with instructions that outline a procedure the model should use to reason about the question
    • with freeze values: for questions from market sources, the prompt was supplemented with the aggregate human forecast from the relevant platform on the day the question set was generated
    • with news: the prompt was supplemented with relevant news summaries obtained through an automated process
  • Dataset Score: The average Brier score across all questions sourced from datasets
  • Market Score (resolved): The average Brier score across all resolved questions sourced from prediction markets and forecast aggregation platforms
  • Market Score (unresolved): The average Brier score across all unresolved questions sourced from prediction markets and forecast aggregation platforms
  • Market Score (overall): The average Brier score across all questions sourced from prediction markets and forecast aggregation platforms
  • Overall Resolved Score: The average of the Dataset Score and the Market Score (resolved) columns
  • Overall Score: The average of the Dataset Score and the Market Score (overall) columns
  • Overall Score 95% CI: The 95% confidence interval for the Overall Score
  • Pairwise p-value comparing to No. 1 (bootstrapped): The p-value calculated by bootstrapping the differences in overall score between each forecaster and the best forecaster (the group with rank 1), under the null hypothesis that there is no difference
  • Pct. more accurate than No. 1: The percent of questions where this forecaster had a better overall score than the best forecaster (with rank 1)
  • Pct. imputed: The percent of questions for which this forecaster did not provide a forecast and hence had a forecast value imputed (0.5 for dataset questions and the aggregate human forecast on the forecast due date for questions sourced from prediction markets or forecast aggregation platforms)
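The score columns above can be sketched in a few lines of Python. This is a minimal illustration, not ForecastBench's actual code: the record layout (the keys source, forecast, outcome, human_aggregate) and all function names are assumptions made for the example. It takes the Brier score of a single binary question to be the squared difference between the forecast probability and the value it is scored against, applies the imputation rule from the Pct. imputed entry, and averages the two per-source means with equal weight.

```python
from statistics import mean


def brier(forecast, outcome):
    """Brier score for one binary question: squared error of the probability."""
    return (forecast - outcome) ** 2


def impute(question):
    """Return the forecast, filling in a missing one per the 'Pct. imputed' rule."""
    if question["forecast"] is not None:
        return question["forecast"]
    if question["source"] == "dataset":
        return 0.5  # dataset questions fall back to 0.5
    return question["human_aggregate"]  # market questions fall back to the human aggregate


def source_score(questions, source):
    """Average Brier score over the questions from one source type."""
    return mean(
        brier(impute(q), q["outcome"]) for q in questions if q["source"] == source
    )


def overall_score(questions):
    """Equal-weight average of the dataset score and the market score."""
    return mean([source_score(questions, "dataset"), source_score(questions, "market")])


# Hypothetical records, invented purely for illustration.
questions = [
    {"source": "dataset", "forecast": 0.8, "outcome": 1, "human_aggregate": None},
    {"source": "dataset", "forecast": None, "outcome": 0, "human_aggregate": None},
    {"source": "market", "forecast": 0.3, "outcome": 0, "human_aggregate": 0.4},
    {"source": "market", "forecast": None, "outcome": 1, "human_aggregate": 0.9},
]

print("dataset score:", round(source_score(questions, "dataset"), 4))
print("market score: ", round(source_score(questions, "market"), 4))
print("overall score:", round(overall_score(questions), 4))
```

Because the two sources are averaged with equal weight, each Overall Score is the midpoint of its Dataset Score and Market Score (overall) columns. For the rank 1 row, (0.095 + 0.040) / 2 = 0.0675, which agrees with the reported 0.068 up to rounding of the displayed inputs.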
Ranking Organization Model Dataset Score (N=442) Market Score (resolved) (N=26) Market Score (unresolved) (N=275) Market Score (overall) (N=301) Overall Resolved Score (N=468) Overall Score (N=743) Overall Score 95% CI Pairwise p-value comparing to No. 1 (bootstrapped) Pct. more accurate than No. 1 Pct. imputed
1 ForecastBench Superforecaster median forecast 0.095 0.064 0.037 0.040 0.080 0.068 [0.058, 0.078] N/A 0% 0%
2 ForecastBench Public median forecast 0.124 0.137 0.029 0.039 0.130 0.081 [0.071, 0.091] <0.001 28% 0%
3 Anthropic Claude-3-5-Sonnet-20240620 (scratchpad with freeze values) 0.165 0.185 0.058 0.069 0.175 0.117 [0.104, 0.13] <0.001 28% 0%
4 OpenAI GPT-4o (scratchpad with news with freeze values) 0.180 0.136 0.053 0.060 0.158 0.120 [0.109, 0.131] <0.001 27% 0%
5 OpenAI GPT-4-Turbo-2024-04-09 (scratchpad with freeze values) 0.173 0.153 0.060 0.068 0.163 0.121 [0.108, 0.133] <0.001 26% 0%
6 OpenAI GPT-4o (scratchpad with freeze values) 0.198 0.146 0.039 0.048 0.172 0.123 [0.111, 0.135] <0.001 28% 0%
7 Google Gemini-1.5-Pro (scratchpad with freeze values) 0.156 0.195 0.082 0.092 0.175 0.124 [0.112, 0.136] <0.001 24% 0%
8 Anthropic Claude-3-5-Sonnet-20240620 (scratchpad with news with freeze values) 0.170 0.187 0.078 0.088 0.179 0.129 [0.116, 0.142] <0.001 23% 0%
9 Google Gemini-1.5-Pro (scratchpad with news with freeze values) 0.154 0.191 0.096 0.104 0.173 0.129 [0.117, 0.142] <0.001 23% 0%
10 Anthropic Claude-3-5-Sonnet-20240620 (zero shot with freeze values) 0.183 0.214 0.067 0.079 0.199 0.131 [0.117, 0.145] <0.001 26% 0%
11 Anthropic Claude-3-Opus-20240229 (zero shot with freeze values) 0.182 0.164 0.075 0.083 0.173 0.132 [0.118, 0.146] <0.001 27% 0%
12 OpenAI GPT-4-Turbo-2024-04-09 (zero shot with freeze values) 0.203 0.156 0.055 0.064 0.179 0.133 [0.12, 0.147] <0.001 29% 0%
13 Google Gemini-1.5-Pro (scratchpad) 0.156 0.221 0.102 0.112 0.189 0.134 [0.122, 0.147] <0.001 21% 0%
14 OpenAI GPT-4-Turbo-2024-04-09 (scratchpad) 0.173 0.220 0.084 0.096 0.196 0.135 [0.123, 0.146] <0.001 18% 0%
15 Qwen Qwen1.5-110B-Chat (scratchpad with freeze values) 0.177 0.151 0.088 0.093 0.164 0.135 [0.124, 0.146] <0.001 17% 0%
16 Anthropic Claude-3-5-Sonnet-20240620 (scratchpad) 0.165 0.256 0.091 0.105 0.210 0.135 [0.122, 0.148] <0.001 22% 0%
17 Google Gemini-1.5-Pro (scratchpad with news) 0.154 0.183 0.110 0.116 0.169 0.135 [0.123, 0.148] <0.001 22% 0%
18 OpenAI GPT-4o (scratchpad with news) 0.180 0.170 0.086 0.093 0.175 0.137 [0.125, 0.149] <0.001 21% 0%
19 OpenAI GPT-4 (scratchpad with freeze values) 0.196 0.192 0.070 0.080 0.194 0.138 [0.127, 0.149] <0.001 20% 0%
20 Google Gemini-1.5-Pro (zero shot with freeze values) 0.207 0.109 0.065 0.069 0.158 0.138 [0.125, 0.151] <0.001 25% 15%
21 ForecastBench Imputed Forecaster 0.250 0.058 0.026 0.029 0.154 0.140 [0.134, 0.145] <0.001 28% 100%
22 Anthropic Claude-3-5-Sonnet-20240620 (scratchpad with news) 0.170 0.191 0.103 0.110 0.181 0.140 [0.128, 0.153] <0.001 20% 0%
23 OpenAI GPT-4 (zero shot with freeze values) 0.217 0.155 0.056 0.064 0.186 0.141 [0.128, 0.153] <0.001 26% 0%
24 Meta Llama-3-70b-Chat-Hf (zero shot with freeze values) 0.193 0.159 0.082 0.089 0.176 0.141 [0.128, 0.154] <0.001 23% 0%
25 Google Gemini-1.5-Flash (scratchpad with freeze values) 0.175 0.204 0.098 0.107 0.190 0.141 [0.128, 0.155] <0.001 22% 0%
26 OpenAI GPT-4o (scratchpad) 0.198 0.221 0.074 0.086 0.210 0.142 [0.13, 0.155] <0.001 20% 0%
27 Anthropic Claude-3-Opus-20240229 (scratchpad with freeze values) 0.194 0.150 0.089 0.095 0.172 0.144 [0.132, 0.157] <0.001 19% 0%
28 Anthropic Claude-3-5-Sonnet-20240620 (superforecaster with news 3) 0.177 0.182 0.105 0.111 0.179 0.144 [0.131, 0.157] <0.001 19% 2%
29 Google Gemini-1.5-Flash (scratchpad) 0.175 0.194 0.106 0.114 0.185 0.145 [0.132, 0.157] <0.001 17% 0%
30 Mistral AI Mistral-Large-Latest (scratchpad with freeze values) 0.190 0.140 0.098 0.102 0.165 0.146 [0.135, 0.157] <0.001 17% 0%
31 Meta Llama-3-70b-Chat-Hf (scratchpad with freeze values) 0.204 0.141 0.084 0.089 0.173 0.147 [0.136, 0.157] <0.001 18% 0%
32 Google Gemini-1.5-Pro (zero shot) 0.207 0.160 0.081 0.087 0.184 0.147 [0.134, 0.161] <0.001 21% 15%
33 Qwen Qwen1.5-110B-Chat (scratchpad) 0.177 0.181 0.113 0.119 0.179 0.148 [0.138, 0.158] <0.001 15% 0%
34 OpenAI GPT-4-Turbo-2024-04-09 (scratchpad with news with freeze values) 0.217 0.158 0.072 0.080 0.187 0.148 [0.136, 0.161] <0.001 24% 0%
35 ForecastBench LLM Crowd (gpt-4o, claude-3.5-sonnet, gemini-1.5-pro) with news 0.251 0.114 0.042 0.048 0.182 0.149 [0.144, 0.155] <0.001 22% 81%
36 Google Gemini-1.5-Flash (zero shot with freeze values) 0.216 0.160 0.077 0.084 0.188 0.150 [0.134, 0.166] <0.001 26% 0%
37 ForecastBench LLM Crowd (gpt-4o, claude-3.5-sonnet, gemini-1.5-pro) with news 0.251 0.111 0.044 0.050 0.181 0.150 [0.145, 0.156] <0.001 23% 81%
38 ForecastBench LLM Crowd (gpt-4o, claude-3.5-sonnet, gemini-1.5-pro) with news 0.251 0.114 0.045 0.051 0.182 0.151 [0.145, 0.156] <0.001 22% 81%
39 Anthropic Claude-2.1 (scratchpad) 0.226 0.062 0.077 0.076 0.144 0.151 [0.141, 0.16] <0.001 24% 23%
40 OpenAI GPT-4 (scratchpad) 0.196 0.166 0.101 0.107 0.181 0.151 [0.141, 0.161] <0.001 15% 0%
41 OpenAI GPT-4-Turbo-2024-04-09 (zero shot) 0.203 0.211 0.089 0.100 0.207 0.151 [0.137, 0.166] <0.001 20% 0%
42 Google Gemini-1.5-Pro (superforecaster with news 3) 0.181 0.198 0.115 0.122 0.189 0.152 [0.138, 0.165] <0.001 21% 0%
43 Anthropic Claude-3-Opus-20240229 (zero shot) 0.182 0.188 0.116 0.122 0.185 0.152 [0.137, 0.166] <0.001 21% 0%
44 Mistral AI Mistral-Large-Latest (zero shot with freeze values) 0.204 0.178 0.097 0.104 0.191 0.154 [0.139, 0.168] <0.001 22% 0%
45 Qwen Qwen1.5-110B-Chat (scratchpad with news with freeze values) 0.203 0.161 0.105 0.109 0.182 0.156 [0.145, 0.168] <0.001 18% 0%
46 Anthropic Claude-3-5-Sonnet-20240620 (zero shot) 0.183 0.312 0.113 0.130 0.248 0.157 [0.141, 0.172] <0.001 20% 1%
47 OpenAI GPT-4-Turbo-2024-04-09 (scratchpad with news) 0.217 0.156 0.091 0.097 0.186 0.157 [0.145, 0.169] <0.001 19% 0%
48 Meta Llama-3-70b-Chat-Hf (zero shot) 0.193 0.173 0.116 0.121 0.183 0.157 [0.145, 0.169] <0.001 18% 0%
49 Anthropic Claude-2.1 (scratchpad with freeze values) 0.226 0.101 0.088 0.089 0.163 0.158 [0.146, 0.169] <0.001 23% 17%
50 OpenAI GPT-4-Turbo-2024-04-09 (superforecaster with news 3) 0.207 0.165 0.103 0.108 0.186 0.158 [0.146, 0.17] <0.001 19% 11%
51 Anthropic Claude-3-5-Sonnet-20240620 (superforecaster with news 1) 0.206 0.213 0.105 0.114 0.209 0.160 [0.148, 0.172] <0.001 20% 22%
52 OpenAI GPT-4o (superforecaster with news 3) 0.210 0.154 0.107 0.111 0.182 0.160 [0.149, 0.172] <0.001 17% 6%
53 Mistral AI Mistral-Large-Latest (scratchpad) 0.190 0.201 0.124 0.131 0.196 0.161 [0.15, 0.171] <0.001 16% 0%
54 Anthropic Claude-3-Opus-20240229 (scratchpad) 0.194 0.172 0.126 0.130 0.183 0.162 [0.149, 0.174] <0.001 17% 0%
55 OpenAI GPT-4o (superforecaster with news 1) 0.201 0.244 0.112 0.123 0.223 0.162 [0.147, 0.177] <0.001 21% 0%
56 Google Gemini-1.5-Flash (scratchpad with news with freeze values) 0.210 0.208 0.108 0.117 0.209 0.163 [0.149, 0.177] <0.001 18% 0%
57 Qwen Qwen1.5-110B-Chat (superforecaster with news 1) 0.205 0.264 0.109 0.123 0.234 0.164 [0.152, 0.175] <0.001 18% 19%
58 Mistral AI Mixtral-8x22B-Instruct-V0.1 (scratchpad with freeze values) 0.205 0.152 0.121 0.124 0.179 0.165 [0.151, 0.178] <0.001 18% 0%
59 OpenAI GPT-4o (zero shot with freeze values) 0.219 0.187 0.104 0.111 0.203 0.165 [0.149, 0.181] <0.001 27% 0%
60 OpenAI GPT-4o (zero shot) 0.219 0.259 0.097 0.111 0.239 0.165 [0.15, 0.18] <0.001 21% 3%
61 Qwen Qwen1.5-110B-Chat (scratchpad with news) 0.203 0.186 0.123 0.128 0.195 0.166 [0.154, 0.177] <0.001 17% 0%
62 Mistral AI Mixtral-8x22B-Instruct-V0.1 (scratchpad) 0.205 0.208 0.122 0.130 0.207 0.168 [0.156, 0.179] <0.001 17% 0%
63 Anthropic Claude-3-5-Sonnet-20240620 (superforecaster with news 2) 0.198 0.192 0.132 0.138 0.195 0.168 [0.154, 0.182] <0.001 20% 0%
64 Meta Llama-3-70b-Chat-Hf (scratchpad) 0.204 0.195 0.127 0.133 0.200 0.168 [0.158, 0.178] <0.001 15% 0%
65 Google Gemini-1.5-Flash (superforecaster with news 2) 0.220 0.152 0.114 0.117 0.186 0.168 [0.155, 0.182] <0.001 18% 11%
66 Qwen Qwen1.5-110B-Chat (zero shot with freeze values) 0.221 0.199 0.111 0.118 0.210 0.170 [0.155, 0.184] <0.001 20% 0%
67 Anthropic Claude-3-Opus-20240229 (scratchpad with news with freeze values) 0.214 0.165 0.123 0.127 0.190 0.170 [0.157, 0.183] <0.001 17% 0%
68 OpenAI GPT-4-Turbo-2024-04-09 (superforecaster with news 1) 0.213 0.284 0.113 0.127 0.248 0.170 [0.157, 0.184] <0.001 18% 0%
69 Anthropic Claude-2.1 (scratchpad with news) 0.233 0.206 0.098 0.108 0.220 0.170 [0.159, 0.182] <0.001 20% 14%
70 Qwen Qwen1.5-110B-Chat (superforecaster with news 3) 0.216 0.191 0.119 0.125 0.204 0.171 [0.16, 0.181] <0.001 16% 4%
71 Google Gemini-1.5-Pro (superforecaster with news 1) 0.207 0.281 0.121 0.135 0.244 0.171 [0.157, 0.185] <0.001 20% 5%
72 Anthropic Claude-3-Opus-20240229 (superforecaster with news 3) 0.199 0.160 0.142 0.144 0.180 0.171 [0.158, 0.185] <0.001 17% 6%
73 Anthropic Claude-3-Opus-20240229 (superforecaster with news 1) 0.202 0.241 0.133 0.142 0.222 0.172 [0.159, 0.185] <0.001 17% 10%
74 Meta Llama-3-8b-Chat-Hf (zero shot with freeze values) 0.223 0.214 0.113 0.122 0.218 0.172 [0.156, 0.188] <0.001 20% 0%
75 Google Gemini-1.5-Flash (scratchpad with news) 0.210 0.210 0.128 0.135 0.210 0.172 [0.159, 0.186] <0.001 17% 0%
76 Mistral AI Mixtral-8x22B-Instruct-V0.1 (zero shot with freeze values) 0.219 0.212 0.121 0.129 0.216 0.174 [0.158, 0.19] <0.001 22% 0%
77 Anthropic Claude-3-Opus-20240229 (scratchpad with news) 0.214 0.178 0.133 0.137 0.196 0.175 [0.162, 0.188] <0.001 16% 0%
78 Mistral AI Mistral-Large-Latest (zero shot) 0.204 0.156 0.147 0.147 0.180 0.176 [0.161, 0.19] <0.001 17% 0%
79 Anthropic Claude-2.1 (scratchpad with news with freeze values) 0.233 0.166 0.115 0.119 0.200 0.176 [0.164, 0.188] <0.001 20% 9%
80 OpenAI GPT-4 (zero shot) 0.217 0.145 0.135 0.136 0.181 0.176 [0.163, 0.19] <0.001 17% 0%
81 Qwen Qwen1.5-110B-Chat (zero shot) 0.221 0.186 0.128 0.133 0.203 0.177 [0.165, 0.189] <0.001 16% 1%
82 Google Gemini-1.5-Flash (superforecaster with news 3) 0.226 0.199 0.131 0.137 0.213 0.182 [0.17, 0.194] <0.001 16% 11%
83 Anthropic Claude-3-Opus-20240229 (superforecaster with news 2) 0.214 0.216 0.145 0.151 0.215 0.183 [0.169, 0.197] <0.001 18% 0%
84 Google Gemini-1.5-Flash (zero shot) 0.216 0.208 0.144 0.150 0.212 0.183 [0.167, 0.199] <0.001 18% 0%
85 Mistral AI Mixtral-8x22B-Instruct-V0.1 (superforecaster with news 3) 0.234 0.189 0.128 0.133 0.211 0.183 [0.173, 0.194] <0.001 15% 16%
86 Mistral AI Mixtral-8x22B-Instruct-V0.1 (zero shot) 0.219 0.228 0.143 0.151 0.224 0.185 [0.17, 0.2] <0.001 18% 0%
87 Mistral AI Mixtral-8x7B-Instruct-V0.1 (scratchpad) 0.244 0.132 0.126 0.126 0.188 0.185 [0.172, 0.198] <0.001 21% 11%
88 Meta Llama-3-8b-Chat-Hf (zero shot) 0.223 0.295 0.136 0.150 0.259 0.186 [0.17, 0.203] <0.001 19% 0%
89 Mistral AI Mixtral-8x22B-Instruct-V0.1 (scratchpad with news) 0.234 0.258 0.130 0.141 0.246 0.187 [0.176, 0.199] <0.001 16% 0%
90 Mistral AI Mixtral-8x7B-Instruct-V0.1 (zero shot with freeze values) 0.259 0.163 0.113 0.117 0.211 0.188 [0.17, 0.206] <0.001 26% 0%
91 Mistral AI Mixtral-8x22B-Instruct-V0.1 (superforecaster with news 1) 0.233 0.237 0.134 0.143 0.235 0.188 [0.175, 0.201] <0.001 17% 24%
92 Mistral AI Mixtral-8x22B-Instruct-V0.1 (scratchpad with news with freeze values) 0.234 0.227 0.135 0.143 0.231 0.189 [0.176, 0.201] <0.001 15% 0%
93 Mistral AI Mixtral-8x7B-Instruct-V0.1 (scratchpad with freeze values) 0.244 0.169 0.133 0.136 0.207 0.190 [0.176, 0.204] <0.001 21% 10%
94 Anthropic Claude-2.1 (superforecaster with news 2) 0.265 0.159 0.116 0.120 0.212 0.193 [0.177, 0.208] <0.001 22% 21%
95 Anthropic Claude-2.1 (zero shot with freeze values) 0.242 0.193 0.140 0.144 0.218 0.193 [0.178, 0.208] <0.001 22% 0%
96 Mistral AI Mixtral-8x7B-Instruct-V0.1 (superforecaster with news 3) 0.264 0.179 0.118 0.123 0.222 0.194 [0.181, 0.207] <0.001 21% 16%
97 Mistral AI Mistral-Large-Latest (scratchpad with news with freeze values) 0.242 0.188 0.143 0.147 0.215 0.195 [0.181, 0.208] <0.001 15% 0%
98 Qwen Qwen1.5-110B-Chat (superforecaster with news 2) 0.237 0.212 0.148 0.153 0.225 0.195 [0.183, 0.208] <0.001 17% 5%
99 OpenAI GPT-4o (superforecaster with news 2) 0.263 0.213 0.121 0.129 0.238 0.196 [0.18, 0.212] <0.001 19% 6%
100 Mistral AI Mixtral-8x22B-Instruct-V0.1 (superforecaster with news 2) 0.248 0.187 0.142 0.146 0.217 0.197 [0.185, 0.209] <0.001 16% 1%
101 Google Gemini-1.5-Pro (superforecaster with news 2) 0.256 0.305 0.122 0.138 0.280 0.197 [0.18, 0.214] <0.001 18% 0%
102 Mistral AI Mistral-Large-Latest (scratchpad with news) 0.242 0.164 0.151 0.153 0.203 0.197 [0.185, 0.21] <0.001 16% 0%
103 OpenAI GPT-4-Turbo-2024-04-09 (superforecaster with news 2) 0.255 0.214 0.136 0.143 0.234 0.199 [0.183, 0.214] <0.001 18% 10%
104 Google Gemini-1.5-Flash (superforecaster with news 1) 0.236 0.286 0.151 0.163 0.261 0.199 [0.185, 0.214] <0.001 16% 25%
105 Anthropic Claude-2.1 (superforecaster with news 1) 0.275 0.188 0.124 0.130 0.232 0.203 [0.188, 0.217] <0.001 21% 24%
106 Mistral AI Mistral-Large-Latest (superforecaster with news 2) 0.235 0.216 0.168 0.173 0.225 0.204 [0.189, 0.219] <0.001 16% 12%
107 Anthropic Claude-2.1 (superforecaster with news 3) 0.259 0.241 0.144 0.153 0.250 0.206 [0.192, 0.219] <0.001 19% 9%
108 Anthropic Claude-2.1 (zero shot) 0.242 0.242 0.165 0.172 0.242 0.207 [0.194, 0.22] <0.001 14% 0%
109 Mistral AI Mixtral-8x7B-Instruct-V0.1 (superforecaster with news 1) 0.296 0.163 0.117 0.121 0.230 0.209 [0.194, 0.223] <0.001 21% 17%
110 Meta Llama-2-70b-Chat-Hf (scratchpad with freeze values) 0.258 0.238 0.152 0.159 0.248 0.209 [0.197, 0.22] <0.001 16% 0%
111 Mistral AI Mistral-Large-Latest (superforecaster with news 1) 0.251 0.261 0.158 0.167 0.256 0.209 [0.194, 0.223] <0.001 16% 32%
112 Mistral AI Mixtral-8x7B-Instruct-V0.1 (scratchpad with news) 0.292 0.169 0.126 0.129 0.231 0.211 [0.196, 0.225] <0.001 20% 15%
113 Mistral AI Mixtral-8x7B-Instruct-V0.1 (scratchpad with news with freeze values) 0.292 0.218 0.122 0.131 0.255 0.211 [0.197, 0.226] <0.001 20% 13%
114 Mistral AI Mistral-Large-Latest (superforecaster with news 3) 0.269 0.213 0.150 0.156 0.241 0.212 [0.2, 0.225] <0.001 16% 4%
115 ForecastBench Always 0.5 0.250 0.250 0.168 0.175 0.250 0.212 [0.207, 0.217] <0.001 13% 0%
116 Meta Llama-2-70b-Chat-Hf (scratchpad) 0.258 0.244 0.162 0.169 0.251 0.213 [0.203, 0.224] <0.001 15% 0%
117 Meta Llama-3-8b-Chat-Hf (scratchpad with freeze values) 0.273 0.213 0.152 0.157 0.243 0.215 [0.202, 0.228] <0.001 14% 0%
118 Anthropic Claude-3-Haiku-20240307 (zero shot with freeze values) 0.280 0.168 0.149 0.150 0.224 0.215 [0.202, 0.229] <0.001 16% 0%
119 Mistral AI Mixtral-8x7B-Instruct-V0.1 (zero shot) 0.259 0.230 0.170 0.175 0.244 0.217 [0.198, 0.236] <0.001 20% 0%
120 Mistral AI Mixtral-8x7B-Instruct-V0.1 (superforecaster with news 2) 0.286 0.240 0.140 0.149 0.263 0.217 [0.202, 0.233] <0.001 23% 21%
121 Anthropic Claude-3-Haiku-20240307 (scratchpad with freeze values) 0.271 0.184 0.166 0.167 0.227 0.219 [0.207, 0.231] <0.001 15% 0%
122 Anthropic Claude-3-Haiku-20240307 (scratchpad) 0.271 0.249 0.173 0.180 0.260 0.225 [0.214, 0.236] <0.001 15% 0%
123 Anthropic Claude-3-Haiku-20240307 (superforecaster with news 2) 0.261 0.201 0.189 0.190 0.231 0.225 [0.213, 0.238] <0.001 15% 5%
124 Meta Llama-3-8b-Chat-Hf (scratchpad) 0.273 0.224 0.179 0.183 0.249 0.228 [0.216, 0.24] <0.001 13% 0%
125 Anthropic Claude-3-Haiku-20240307 (zero shot) 0.280 0.192 0.175 0.176 0.236 0.228 [0.216, 0.241] <0.001 15% 0%
126 ForecastBench Always 0 0.271 0.385 0.177 0.195 0.328 0.233 [0.205, 0.261] <0.001 35% 0%
127 Anthropic Claude-3-Haiku-20240307 (scratchpad with news with freeze values) 0.304 0.189 0.172 0.173 0.247 0.238 [0.227, 0.25] <0.001 14% 0%
128 Anthropic Claude-3-Haiku-20240307 (superforecaster with news 3) 0.282 0.221 0.193 0.196 0.252 0.239 [0.228, 0.249] <0.001 14% 21%
129 OpenAI GPT-3.5-Turbo-0125 (scratchpad with freeze values) 0.286 0.281 0.187 0.195 0.283 0.240 [0.229, 0.252] <0.001 14% 0%
130 Anthropic Claude-3-Haiku-20240307 (scratchpad with news) 0.304 0.205 0.181 0.183 0.254 0.244 [0.232, 0.255] <0.001 14% 0%
131 OpenAI GPT-3.5-Turbo-0125 (scratchpad) 0.286 0.284 0.203 0.210 0.285 0.248 [0.236, 0.26] <0.001 14% 0%
132 Meta Llama-2-70b-Chat-Hf (zero shot with freeze values) 0.303 0.234 0.212 0.214 0.269 0.258 [0.241, 0.276] <0.001 17% 0%
133 Anthropic Claude-3-Haiku-20240307 (superforecaster with news 1) 0.288 0.289 0.230 0.235 0.288 0.261 [0.247, 0.276] <0.001 14% 8%
134 Meta Llama-2-70b-Chat-Hf (zero shot) 0.303 0.308 0.238 0.244 0.305 0.274 [0.259, 0.289] <0.001 16% 0%
135 ForecastBench Random Uniform 0.348 0.342 0.216 0.227 0.345 0.288 [0.267, 0.308] <0.001 18% 0%
136 OpenAI GPT-3.5-Turbo-0125 (zero shot with freeze values) 0.445 0.253 0.192 0.198 0.349 0.321 [0.3, 0.342] <0.001 23% 0%
137 OpenAI GPT-3.5-Turbo-0125 (zero shot) 0.445 0.273 0.239 0.242 0.359 0.344 [0.324, 0.363] <0.001 17% 0%
138 ForecastBench Always 1 0.729 0.615 0.658 0.655 0.672 0.692 [0.661, 0.722] <0.001 17% 0%
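
The two comparison columns (the pairwise p-value and Pct. more accurate than No. 1) can be illustrated with a paired bootstrap. The leaderboard does not spell out its exact procedure, so the Python sketch below is only one plausible reading, under stated assumptions: both forecasters have a per-question Brier score on the same questions, the null hypothesis is imposed by centering the paired differences at zero, the test is one-sided, and the equal weighting of dataset and market questions is ignored for brevity. The function names, the resample count, and the seed are all hypothetical.

```python
import random
from statistics import mean


def pct_more_accurate(model_scores, best_scores):
    """Percent of questions where the model's Brier score is strictly lower."""
    wins = sum(m < b for m, b in zip(model_scores, best_scores))
    return 100.0 * wins / len(model_scores)


def bootstrap_p_value(model_scores, best_scores, n_resamples=2000, seed=0):
    """One-sided paired bootstrap p-value for 'model is worse than the best'.

    Assumption: the null of no difference is imposed by subtracting the
    observed mean difference from every paired difference before resampling.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    diffs = [m - b for m, b in zip(model_scores, best_scores)]
    observed = mean(diffs)
    centered = [d - observed for d in diffs]
    n = len(centered)

    hits = 0
    for _ in range(n_resamples):
        # Resample the centered differences with replacement.
        sample = rng.choices(centered, k=n)
        if mean(sample) >= observed:
            hits += 1
    return hits / n_resamples


# Hypothetical per-question Brier scores, invented purely for illustration.
best = [0.02, 0.10, 0.01, 0.30, 0.05, 0.00, 0.16, 0.04]
model = [0.09, 0.25, 0.04, 0.20, 0.16, 0.01, 0.36, 0.09]

print("pct more accurate:", pct_more_accurate(model, best))
print("bootstrap p-value:", bootstrap_p_value(model, best))
```

A small p-value here indicates that a gap as large as the observed one rarely arises when the centered differences are resampled, which is how the <0.001 entries in the table would be read under this interpretation.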