Updated Feb. 17, 2025

| Ranking | Organization | Model | Dataset Score (N=1,754) | Market Score (resolved) (N=193) | Market Score (unresolved) (N=103) | Market Score (overall) (N=296) | Overall Resolved Score (N=1,947) | Overall Score (N=2,050) | Overall Score 95% CI | Pairwise p-value comparing to No. 1 (bootstrapped) | Pct. more accurate than No. 1 | Pct. Imputed |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | ForecastBench | Superforecaster median forecast | 0.091 | 0.062 | 0.062 | 0.062 | 0.076 | 0.076 | [0.067, 0.086] | | 0% | 0% |
| 2 | ForecastBench | Public median forecast | 0.119 | 0.092 | 0.035 | 0.072 | 0.105 | 0.096 | [0.086, 0.105] | <0.001 | 23% | 0% |
| 3 | OpenAI | GPT-4o (scratchpad with freeze values) | 0.175 | 0.107 | 0.043 | 0.085 | 0.141 | 0.130 | [0.119, 0.141] | <0.001 | 24% | 0% |
| 4 | Anthropic | Claude-3-5-Sonnet-20240620 (scratchpad with freeze values) | 0.154 | 0.144 | 0.038 | 0.107 | 0.149 | 0.131 | [0.118, 0.143] | <0.001 | 24% | 0% |
| 5 | OpenAI | GPT-4-Turbo-2024-04-09 (scratchpad with freeze values) | 0.164 | 0.126 | 0.055 | 0.101 | 0.145 | 0.133 | [0.121, 0.145] | <0.001 | 23% | 0% |
| 6 | OpenAI | GPT-4o (scratchpad with news with freeze values) | 0.171 | 0.132 | 0.051 | 0.104 | 0.151 | 0.137 | [0.125, 0.149] | <0.001 | 20% | 0% |
| 7 | Google | Gemini-1.5-Pro (scratchpad with freeze values) | 0.152 | 0.158 | 0.077 | 0.130 | 0.155 | 0.141 | [0.13, 0.152] | <0.001 | 21% | 0% |
| 8 | Google | Gemini-1.5-Pro (scratchpad with news with freeze values) | 0.154 | 0.164 | 0.075 | 0.133 | 0.159 | 0.143 | [0.133, 0.154] | <0.001 | 21% | 1% |
| 9 | Anthropic | Claude-3-5-Sonnet-20240620 (scratchpad with news with freeze values) | 0.160 | 0.158 | 0.078 | 0.130 | 0.159 | 0.145 | [0.132, 0.158] | <0.001 | 20% | 0% |
| 10 | Anthropic | Claude-3-5-Sonnet-20240620 (zero shot with freeze values) | 0.174 | 0.148 | 0.064 | 0.119 | 0.161 | 0.146 | [0.133, 0.16] | <0.001 | 22% | 0% |
| 11 | Google | Gemini-1.5-Pro (scratchpad) | 0.152 | 0.172 | 0.089 | 0.143 | 0.162 | 0.148 | [0.137, 0.158] | <0.001 | 20% | 1% |
| 12 | OpenAI | GPT-4-Turbo-2024-04-09 (scratchpad) | 0.164 | 0.154 | 0.091 | 0.132 | 0.159 | 0.148 | [0.138, 0.158] | <0.001 | 17% | 0% |
| 13 | Google | Gemini-1.5-Pro (scratchpad with news) | 0.154 | 0.165 | 0.102 | 0.143 | 0.159 | 0.148 | [0.137, 0.16] | <0.001 | 21% | 1% |
| 14 | Anthropic | Claude-3-5-Sonnet-20240620 (scratchpad) | 0.154 | 0.166 | 0.099 | 0.143 | 0.160 | 0.149 | [0.137, 0.16] | <0.001 | 20% | 0% |
| 15 | OpenAI | GPT-4o (scratchpad) | 0.175 | 0.144 | 0.082 | 0.122 | 0.159 | 0.149 | [0.138, 0.159] | <0.001 | 19% | 1% |
| 16 | Anthropic | Claude-3-Opus-20240229 (zero shot with freeze values) | 0.173 | 0.162 | 0.054 | 0.124 | 0.167 | 0.149 | [0.135, 0.162] | <0.001 | 21% | 0% |
| 17 | OpenAI | GPT-4o (scratchpad with news) | 0.171 | 0.146 | 0.092 | 0.127 | 0.159 | 0.149 | [0.138, 0.16] | <0.001 | 18% | 0% |
| 18 | OpenAI | GPT-4-Turbo-2024-04-09 (zero shot with freeze values) | 0.200 | 0.125 | 0.052 | 0.100 | 0.163 | 0.150 | [0.138, 0.162] | <0.001 | 24% | 0% |
| 19 | Qwen | Qwen1.5-110B-Chat (scratchpad with freeze values) | 0.171 | 0.160 | 0.078 | 0.131 | 0.165 | 0.151 | [0.14, 0.162] | <0.001 | 16% | 1% |
| 20 | Anthropic | Claude-3-5-Sonnet-20240620 (scratchpad with news) | 0.160 | 0.174 | 0.103 | 0.149 | 0.167 | 0.154 | [0.143, 0.166] | <0.001 | 19% | 0% |
| 21 | ForecastBench | Imputed Forecaster | 0.250 | 0.073 | 0.034 | 0.059 | 0.161 | 0.155 | [0.147, 0.163] | <0.001 | 22% | 100% |
| 22 | OpenAI | GPT-4 (zero shot with freeze values) | 0.213 | 0.125 | 0.052 | 0.099 | 0.169 | 0.156 | [0.144, 0.168] | <0.001 | 21% | 0% |
| 23 | Google | Gemini-1.5-Pro (zero shot with freeze values) | 0.205 | 0.125 | 0.082 | 0.110 | 0.165 | 0.157 | [0.144, 0.171] | <0.001 | 20% | 15% |
| 24 | Anthropic | Claude-3-5-Sonnet-20240620 (superforecaster with news 3) | 0.167 | 0.181 | 0.088 | 0.149 | 0.174 | 0.158 | [0.146, 0.169] | <0.001 | 17% | 2% |
| 25 | OpenAI | GPT-4 (scratchpad with freeze values) | 0.190 | 0.158 | 0.064 | 0.125 | 0.174 | 0.158 | [0.145, 0.171] | <0.001 | 19% | 1% |
| 26 | Anthropic | Claude-3-Opus-20240229 (scratchpad with freeze values) | 0.185 | 0.161 | 0.084 | 0.134 | 0.173 | 0.159 | [0.148, 0.171] | <0.001 | 18% | 0% |
| 27 | ForecastBench | LLM Crowd (gpt-4o, claude-3.5-sonnet, gemini-1.5-pro) with news | 0.241 | 0.097 | 0.050 | 0.080 | 0.169 | 0.161 | [0.153, 0.168] | <0.001 | 18% | 76% |
| 28 | OpenAI | GPT-4-Turbo-2024-04-09 (scratchpad with news with freeze values) | 0.209 | 0.132 | 0.081 | 0.114 | 0.170 | 0.161 | [0.149, 0.173] | <0.001 | 20% | 0% |
| 29 | ForecastBench | LLM Crowd (gpt-4o, claude-3.5-sonnet, gemini-1.5-pro) with news | 0.242 | 0.100 | 0.050 | 0.083 | 0.171 | 0.162 | [0.155, 0.17] | <0.001 | 18% | 76% |
| 30 | ForecastBench | LLM Crowd (gpt-4o, claude-3.5-sonnet, gemini-1.5-pro) with news | 0.243 | 0.099 | 0.050 | 0.082 | 0.171 | 0.162 | [0.155, 0.17] | <0.001 | 18% | 76% |
| 31 | Google | Gemini-1.5-Pro (superforecaster with news 3) | 0.176 | 0.176 | 0.106 | 0.151 | 0.176 | 0.164 | [0.153, 0.175] | <0.001 | 19% | 1% |
| 32 | Mistral AI | Mistral-Large-Latest (scratchpad with freeze values) | 0.185 | 0.166 | 0.100 | 0.143 | 0.176 | 0.164 | [0.154, 0.175] | <0.001 | 16% | 0% |
| 33 | OpenAI | GPT-4 (scratchpad) | 0.190 | 0.169 | 0.086 | 0.140 | 0.180 | 0.165 | [0.156, 0.174] | <0.001 | 15% | 1% |
| 34 | Qwen | Qwen1.5-110B-Chat (scratchpad) | 0.171 | 0.187 | 0.111 | 0.161 | 0.179 | 0.166 | [0.156, 0.175] | <0.001 | 15% | 0% |
| 35 | Google | Gemini-1.5-Pro (zero shot) | 0.205 | 0.136 | 0.113 | 0.128 | 0.171 | 0.167 | [0.154, 0.179] | <0.001 | 19% | 15% |
| 36 | Google | Gemini-1.5-Flash (scratchpad with freeze values) | 0.179 | 0.182 | 0.103 | 0.154 | 0.180 | 0.167 | [0.153, 0.18] | <0.001 | 18% | 0% |
| 37 | Anthropic | Claude-2.1 (scratchpad) | 0.228 | 0.117 | 0.084 | 0.105 | 0.172 | 0.167 | [0.157, 0.177] | <0.001 | 20% | 24% |
| 38 | Meta | Llama-3-70b-Chat-Hf (zero shot with freeze values) | 0.205 | 0.168 | 0.065 | 0.132 | 0.186 | 0.168 | [0.155, 0.182] | <0.001 | 19% | 0% |
| 39 | Meta | Llama-3-70b-Chat-Hf (scratchpad with freeze values) | 0.208 | 0.156 | 0.078 | 0.129 | 0.182 | 0.169 | [0.158, 0.179] | <0.001 | 17% | 0% |
| 40 | Anthropic | Claude-3-Opus-20240229 (zero shot) | 0.173 | 0.201 | 0.098 | 0.165 | 0.187 | 0.169 | [0.156, 0.183] | <0.001 | 18% | 0% |
| 41 | Google | Gemini-1.5-Flash (zero shot with freeze values) | 0.217 | 0.150 | 0.069 | 0.122 | 0.183 | 0.169 | [0.155, 0.183] | <0.001 | 23% | 0% |
| 42 | OpenAI | GPT-4-Turbo-2024-04-09 (zero shot) | 0.200 | 0.160 | 0.100 | 0.139 | 0.180 | 0.169 | [0.157, 0.182] | <0.001 | 19% | 1% |
| 43 | OpenAI | GPT-4-Turbo-2024-04-09 (scratchpad with news) | 0.209 | 0.149 | 0.096 | 0.131 | 0.179 | 0.170 | [0.159, 0.18] | <0.001 | 17% | 0% |
| 44 | Google | Gemini-1.5-Flash (scratchpad) | 0.179 | 0.185 | 0.115 | 0.161 | 0.182 | 0.170 | [0.159, 0.181] | <0.001 | 16% | 0% |
| 45 | OpenAI | GPT-4-Turbo-2024-04-09 (superforecaster with news 3) | 0.202 | 0.163 | 0.093 | 0.139 | 0.183 | 0.170 | [0.16, 0.181] | <0.001 | 17% | 9% |
| 46 | Qwen | Qwen1.5-110B-Chat (scratchpad with news with freeze values) | 0.198 | 0.172 | 0.097 | 0.146 | 0.185 | 0.172 | [0.161, 0.183] | <0.001 | 16% | 0% |
| 47 | Anthropic | Claude-3-5-Sonnet-20240620 (zero shot) | 0.174 | 0.197 | 0.122 | 0.171 | 0.185 | 0.172 | [0.158, 0.187] | <0.001 | 17% | 1% |
| 48 | Mistral AI | Mistral-Large-Latest (zero shot with freeze values) | 0.203 | 0.184 | 0.071 | 0.145 | 0.194 | 0.174 | [0.16, 0.188] | <0.001 | 19% | 0% |
| 49 | Anthropic | Claude-2.1 (scratchpad with freeze values) | 0.228 | 0.147 | 0.070 | 0.120 | 0.187 | 0.174 | [0.162, 0.186] | <0.001 | 20% | 18% |
| 50 | OpenAI | GPT-4o (superforecaster with news 3) | 0.206 | 0.169 | 0.100 | 0.145 | 0.188 | 0.175 | [0.165, 0.186] | <0.001 | 16% | 7% |
| 51 | Anthropic | Claude-3-Opus-20240229 (scratchpad) | 0.185 | 0.206 | 0.106 | 0.171 | 0.195 | 0.178 | [0.167, 0.189] | <0.001 | 17% | 0% |
| 52 | Mistral AI | Mistral-Large-Latest (scratchpad) | 0.185 | 0.197 | 0.122 | 0.171 | 0.191 | 0.178 | [0.168, 0.188] | <0.001 | 15% | 0% |
| 53 | Anthropic | Claude-3-5-Sonnet-20240620 (scratchpad with SECOND news) | 0.217 | 0.165 | 0.094 | 0.140 | 0.191 | 0.178 | [0.167, 0.19] | <0.001 | 15% | 1% |
| 54 | OpenAI | GPT-4o (scratchpad with SECOND news) | 0.232 | 0.139 | 0.101 | 0.126 | 0.185 | 0.179 | [0.168, 0.19] | <0.001 | 15% | 11% |
| 55 | Qwen | Qwen1.5-110B-Chat (superforecaster with news 1) | 0.203 | 0.184 | 0.101 | 0.155 | 0.194 | 0.179 | [0.168, 0.191] | <0.001 | 17% | 15% |
| 56 | Mistral AI | Mixtral-8x22B-Instruct-V0.1 (scratchpad) | 0.198 | 0.195 | 0.098 | 0.161 | 0.197 | 0.180 | [0.169, 0.19] | <0.001 | 16% | 0% |
| 57 | Mistral AI | Mixtral-8x22B-Instruct-V0.1 (scratchpad with freeze values) | 0.198 | 0.189 | 0.113 | 0.162 | 0.193 | 0.180 | [0.167, 0.193] | <0.001 | 16% | 0% |
| 58 | Anthropic | Claude-3-5-Sonnet-20240620 (superforecaster with news 1) | 0.203 | 0.190 | 0.100 | 0.159 | 0.196 | 0.181 | [0.168, 0.193] | <0.001 | 18% | 17% |
| 59 | OpenAI | GPT-4o (superforecaster with news 1) | 0.200 | 0.194 | 0.102 | 0.162 | 0.197 | 0.181 | [0.167, 0.195] | <0.001 | 19% | 0% |
| 60 | Qwen | Qwen1.5-110B-Chat (scratchpad with news) | 0.198 | 0.191 | 0.113 | 0.164 | 0.194 | 0.181 | [0.171, 0.191] | <0.001 | 16% | 0% |
| 61 | OpenAI | GPT-4o (zero shot with freeze values) | 0.210 | 0.175 | 0.111 | 0.153 | 0.192 | 0.181 | [0.165, 0.197] | <0.001 | 21% | 1% |
| 62 | Google | Gemini-1.5-Flash (scratchpad with news with freeze values) | 0.213 | 0.174 | 0.105 | 0.150 | 0.194 | 0.182 | [0.169, 0.195] | <0.001 | 16% | 0% |
| 63 | Meta | Llama-3-70b-Chat-Hf (zero shot) | 0.205 | 0.188 | 0.113 | 0.162 | 0.196 | 0.183 | [0.172, 0.194] | <0.001 | 16% | 0% |
| 64 | OpenAI | GPT-4o (zero shot) | 0.210 | 0.196 | 0.085 | 0.157 | 0.203 | 0.183 | [0.17, 0.197] | <0.001 | 18% | 3% |
| 65 | Anthropic | Claude-3-Opus-20240229 (scratchpad with news with freeze values) | 0.203 | 0.198 | 0.106 | 0.166 | 0.200 | 0.184 | [0.172, 0.196] | <0.001 | 16% | 0% |
| 66 | Anthropic | Claude-3-Opus-20240229 (superforecaster with news 1) | 0.197 | 0.211 | 0.103 | 0.173 | 0.204 | 0.185 | [0.173, 0.197] | <0.001 | 16% | 8% |
| 67 | Anthropic | Claude-3-5-Sonnet-20240620 (superforecaster with news 2) | 0.200 | 0.202 | 0.111 | 0.171 | 0.201 | 0.185 | [0.173, 0.198] | <0.001 | 17% | 1% |
| 68 | Google | Gemini-1.5-Flash (superforecaster with news 2) | 0.222 | 0.161 | 0.128 | 0.150 | 0.191 | 0.186 | [0.174, 0.197] | <0.001 | 18% | 11% |
| 69 | Anthropic | Claude-3-Opus-20240229 (superforecaster with news 3) | 0.195 | 0.206 | 0.119 | 0.176 | 0.201 | 0.186 | [0.174, 0.198] | <0.001 | 16% | 5% |
| 70 | Anthropic | Claude-3-Opus-20240229 (scratchpad with news) | 0.203 | 0.201 | 0.115 | 0.171 | 0.202 | 0.187 | [0.176, 0.198] | <0.001 | 16% | 0% |
| 71 | Qwen | Qwen1.5-110B-Chat (superforecaster with news 3) | 0.213 | 0.191 | 0.104 | 0.161 | 0.202 | 0.187 | [0.178, 0.197] | <0.001 | 16% | 4% |
| 72 | Google | Gemini-1.5-Flash (scratchpad with news) | 0.213 | 0.187 | 0.117 | 0.162 | 0.200 | 0.188 | [0.177, 0.199] | <0.001 | 16% | 0% |
| 73 | Qwen | Qwen1.5-110B-Chat (zero shot with freeze values) | 0.218 | 0.196 | 0.090 | 0.159 | 0.207 | 0.189 | [0.174, 0.203] | <0.001 | 16% | 1% |
| 74 | Mistral AI | Mixtral-8x22B-Instruct-V0.1 (zero shot with freeze values) | 0.207 | 0.197 | 0.123 | 0.171 | 0.202 | 0.189 | [0.174, 0.204] | <0.001 | 19% | 0% |
| 75 | Meta | Llama-3-8b-Chat-Hf (zero shot with freeze values) | 0.224 | 0.159 | 0.148 | 0.155 | 0.191 | 0.189 | [0.175, 0.204] | <0.001 | 18% | 0% |
| 76 | Anthropic | Claude-2.1 (scratchpad with news) | 0.234 | 0.179 | 0.092 | 0.148 | 0.206 | 0.191 | [0.179, 0.203] | <0.001 | 18% | 14% |
| 77 | Meta | Llama-3-70b-Chat-Hf (scratchpad) | 0.208 | 0.206 | 0.116 | 0.175 | 0.207 | 0.192 | [0.183, 0.2] | <0.001 | 16% | 0% |
| 78 | Anthropic | Claude-2.1 (scratchpad with news with freeze values) | 0.234 | 0.182 | 0.090 | 0.150 | 0.208 | 0.192 | [0.18, 0.204] | <0.001 | 18% | 10% |
| 79 | Google | Gemini-1.5-Pro (superforecaster with news 1) | 0.209 | 0.217 | 0.104 | 0.177 | 0.213 | 0.193 | [0.18, 0.206] | <0.001 | 18% | 4% |
| 80 | OpenAI | GPT-4-Turbo-2024-04-09 (superforecaster with news 1) | 0.216 | 0.205 | 0.107 | 0.171 | 0.210 | 0.193 | [0.18, 0.206] | <0.001 | 16% | 0% |
| 81 | Qwen | Qwen1.5-110B-Chat (zero shot) | 0.218 | 0.203 | 0.104 | 0.169 | 0.211 | 0.193 | [0.182, 0.204] | <0.001 | 15% | 1% |
| 82 | OpenAI | GPT-4 (zero shot) | 0.213 | 0.212 | 0.111 | 0.177 | 0.213 | 0.195 | [0.184, 0.206] | <0.001 | 16% | 0% |
| 83 | Mistral AI | Mixtral-8x22B-Instruct-V0.1 (superforecaster with news 3) | 0.228 | 0.197 | 0.099 | 0.163 | 0.213 | 0.196 | [0.186, 0.205] | <0.001 | 14% | 13% |
| 84 | Mistral AI | Mixtral-8x22B-Instruct-V0.1 (scratchpad with news) | 0.229 | 0.195 | 0.103 | 0.163 | 0.212 | 0.196 | [0.187, 0.205] | <0.001 | 15% | 0% |
| 85 | Mistral AI | Mistral-Large-Latest (zero shot) | 0.203 | 0.216 | 0.142 | 0.190 | 0.209 | 0.196 | [0.184, 0.209] | <0.001 | 16% | 0% |
| 86 | Mistral AI | Mixtral-8x7B-Instruct-V0.1 (scratchpad) | 0.236 | 0.179 | 0.117 | 0.158 | 0.208 | 0.197 | [0.185, 0.208] | <0.001 | 19% | 12% |
| 87 | Anthropic | Claude-3-Opus-20240229 (superforecaster with news 2) | 0.207 | 0.218 | 0.127 | 0.187 | 0.213 | 0.197 | [0.184, 0.21] | <0.001 | 16% | 1% |
| 88 | Google | Gemini-1.5-Flash (superforecaster with news 3) | 0.225 | 0.197 | 0.121 | 0.170 | 0.211 | 0.198 | [0.187, 0.209] | <0.001 | 15% | 9% |
| 89 | Mistral AI | Mixtral-8x22B-Instruct-V0.1 (superforecaster with news 1) | 0.234 | 0.183 | 0.123 | 0.163 | 0.209 | 0.198 | [0.185, 0.211] | <0.001 | 17% | 19% |
| 90 | Mistral AI | Mixtral-8x22B-Instruct-V0.1 (scratchpad with news with freeze values) | 0.229 | 0.210 | 0.099 | 0.171 | 0.219 | 0.200 | [0.189, 0.211] | <0.001 | 16% | 0% |
| 91 | Mistral AI | Mixtral-8x22B-Instruct-V0.1 (zero shot) | 0.207 | 0.222 | 0.140 | 0.193 | 0.214 | 0.200 | [0.186, 0.214] | <0.001 | 16% | 0% |
| 92 | Meta | Llama-3-8b-Chat-Hf (zero shot) | 0.224 | 0.194 | 0.164 | 0.184 | 0.209 | 0.204 | [0.188, 0.219] | <0.001 | 18% | 0% |
| 93 | Mistral AI | Mixtral-8x7B-Instruct-V0.1 (zero shot with freeze values) | 0.260 | 0.181 | 0.099 | 0.153 | 0.220 | 0.206 | [0.19, 0.222] | <0.001 | 23% | 0% |
| 94 | Google | Gemini-1.5-Flash (zero shot) | 0.217 | 0.212 | 0.166 | 0.196 | 0.214 | 0.206 | [0.192, 0.221] | <0.001 | 18% | 1% |
| 95 | Mistral AI | Mixtral-8x7B-Instruct-V0.1 (scratchpad with freeze values) | 0.236 | 0.203 | 0.130 | 0.178 | 0.220 | 0.207 | [0.193, 0.221] | <0.001 | 18% | 11% |
| 96 | Google | Gemini-1.5-Pro (superforecaster with news 2) | 0.242 | 0.210 | 0.115 | 0.177 | 0.226 | 0.210 | [0.195, 0.225] | <0.001 | 18% | 0% |
| 97 | Anthropic | Claude-2.1 (superforecaster with news 2) | 0.265 | 0.189 | 0.099 | 0.157 | 0.227 | 0.211 | [0.197, 0.225] | <0.001 | 20% | 21% |
| 98 | Mistral AI | Mistral-Large-Latest (scratchpad with news with freeze values) | 0.242 | 0.209 | 0.130 | 0.181 | 0.226 | 0.212 | [0.2, 0.223] | <0.001 | 15% | 0% |
| 99 | OpenAI | GPT-4-Turbo-2024-04-09 (superforecaster with news 2) | 0.246 | 0.210 | 0.119 | 0.178 | 0.228 | 0.212 | [0.198, 0.226] | <0.001 | 18% | 8% |
| 100 | Mistral AI | Mixtral-8x7B-Instruct-V0.1 (superforecaster with news 3) | 0.270 | 0.168 | 0.129 | 0.154 | 0.219 | 0.212 | [0.201, 0.223] | <0.001 | 18% | 17% |
| 101 | Mistral AI | Mixtral-8x22B-Instruct-V0.1 (superforecaster with news 2) | 0.246 | 0.216 | 0.111 | 0.179 | 0.231 | 0.213 | [0.203, 0.223] | <0.001 | 16% | 1% |
| 102 | Qwen | Qwen1.5-110B-Chat (superforecaster with news 2) | 0.243 | 0.224 | 0.115 | 0.186 | 0.233 | 0.214 | [0.203, 0.225] | <0.001 | 16% | 4% |
| 103 | Mistral AI | Mistral-Large-Latest (scratchpad with news) | 0.242 | 0.218 | 0.127 | 0.186 | 0.230 | 0.214 | [0.203, 0.225] | <0.001 | 15% | 0% |
| 104 | Anthropic | Claude-2.1 (zero shot with freeze values) | 0.244 | 0.214 | 0.132 | 0.186 | 0.229 | 0.215 | [0.198, 0.231] | <0.001 | 18% | 0% |
| 105 | OpenAI | GPT-4o (superforecaster with news 2) | 0.257 | 0.210 | 0.112 | 0.176 | 0.234 | 0.216 | [0.203, 0.23] | <0.001 | 18% | 5% |
| 106 | Anthropic | Claude-2.1 (superforecaster with news 3) | 0.257 | 0.218 | 0.104 | 0.178 | 0.237 | 0.217 | [0.205, 0.23] | <0.001 | 18% | 10% |
| 107 | Anthropic | Claude-2.1 (superforecaster with news 1) | 0.274 | 0.199 | 0.103 | 0.165 | 0.237 | 0.220 | [0.206, 0.233] | <0.001 | 19% | 24% |
| 108 | Google | Gemini-1.5-Flash (superforecaster with news 1) | 0.237 | 0.234 | 0.144 | 0.203 | 0.236 | 0.220 | [0.206, 0.234] | <0.001 | 15% | 20% |
| 109 | Mistral AI | Mistral-Large-Latest (superforecaster with news 2) | 0.231 | 0.244 | 0.159 | 0.215 | 0.237 | 0.223 | [0.209, 0.236] | <0.001 | 15% | 9% |
| 110 | Mistral AI | Mixtral-8x7B-Instruct-V0.1 (superforecaster with news 1) | 0.295 | 0.175 | 0.123 | 0.157 | 0.235 | 0.226 | [0.212, 0.239] | <0.001 | 18% | 17% |
| 111 | Mistral AI | Mixtral-8x7B-Instruct-V0.1 (scratchpad with news with freeze values) | 0.291 | 0.170 | 0.144 | 0.161 | 0.231 | 0.226 | [0.214, 0.238] | <0.001 | 18% | 14% |
| 112 | Mistral AI | Mistral-Large-Latest (superforecaster with news 1) | 0.248 | 0.232 | 0.159 | 0.207 | 0.240 | 0.227 | [0.213, 0.242] | <0.001 | 15% | 25% |
| 113 | Mistral AI | Mixtral-8x7B-Instruct-V0.1 (scratchpad with news) | 0.291 | 0.186 | 0.123 | 0.164 | 0.238 | 0.228 | [0.215, 0.24] | <0.001 | 17% | 15% |
| 114 | Anthropic | Claude-2.1 (zero shot) | 0.244 | 0.240 | 0.158 | 0.211 | 0.242 | 0.228 | [0.215, 0.24] | <0.001 | 15% | 0% |
| 115 | Mistral AI | Mistral-Large-Latest (superforecaster with news 3) | 0.267 | 0.217 | 0.137 | 0.189 | 0.242 | 0.228 | [0.216, 0.239] | <0.001 | 15% | 5% |
| 116 | ForecastBench | Always 0.5 | 0.250 | 0.250 | 0.133 | 0.209 | 0.250 | 0.230 | [0.225, 0.234] | <0.001 | 14% | 0% |
| 117 | Meta | Llama-2-70b-Chat-Hf (scratchpad with freeze values) | 0.262 | 0.234 | 0.134 | 0.199 | 0.248 | 0.230 | [0.22, 0.241] | <0.001 | 16% | 1% |
| 118 | Mistral AI | Mixtral-8x7B-Instruct-V0.1 (superforecaster with news 2) | 0.289 | 0.197 | 0.129 | 0.173 | 0.243 | 0.231 | [0.218, 0.244] | <0.001 | 20% | 20% |
| 119 | Meta | Llama-3-8b-Chat-Hf (scratchpad with freeze values) | 0.272 | 0.222 | 0.137 | 0.192 | 0.247 | 0.232 | [0.22, 0.244] | <0.001 | 15% | 0% |
| 120 | Mistral AI | Mixtral-8x7B-Instruct-V0.1 (zero shot) | 0.260 | 0.234 | 0.159 | 0.208 | 0.247 | 0.234 | [0.216, 0.251] | <0.001 | 19% | 0% |
| 121 | Anthropic | Claude-3-Haiku-20240307 (zero shot with freeze values) | 0.280 | 0.226 | 0.123 | 0.190 | 0.253 | 0.235 | [0.222, 0.247] | <0.001 | 15% | 1% |
| 122 | Meta | Llama-2-70b-Chat-Hf (scratchpad) | 0.262 | 0.249 | 0.138 | 0.211 | 0.255 | 0.236 | [0.226, 0.246] | <0.001 | 16% | 1% |
| 123 | Anthropic | Claude-3-Haiku-20240307 (scratchpad with freeze values) | 0.270 | 0.248 | 0.140 | 0.210 | 0.259 | 0.240 | [0.229, 0.252] | <0.001 | 14% | 1% |
| 124 | Meta | Llama-3-8b-Chat-Hf (scratchpad) | 0.272 | 0.246 | 0.149 | 0.212 | 0.259 | 0.242 | [0.231, 0.252] | <0.001 | 15% | 0% |
| 125 | Anthropic | Claude-3-Haiku-20240307 (scratchpad) | 0.270 | 0.262 | 0.137 | 0.218 | 0.266 | 0.244 | [0.235, 0.254] | <0.001 | 15% | 1% |
| 126 | Anthropic | Claude-3-Haiku-20240307 (superforecaster with news 2) | 0.263 | 0.271 | 0.140 | 0.226 | 0.267 | 0.244 | [0.233, 0.256] | <0.001 | 15% | 5% |
| 127 | ForecastBench | Always 0 | 0.269 | 0.218 | 0.231 | 0.222 | 0.243 | 0.246 | [0.221, 0.27] | <0.001 | 46% | 0% |
| 128 | Anthropic | Claude-3-Haiku-20240307 (zero shot) | 0.280 | 0.253 | 0.137 | 0.212 | 0.266 | 0.246 | [0.236, 0.256] | <0.001 | 15% | 1% |
| 129 | Anthropic | Claude-3-Haiku-20240307 (superforecaster with news 3) | 0.287 | 0.258 | 0.156 | 0.223 | 0.272 | 0.255 | [0.245, 0.264] | <0.001 | 14% | 18% |
| 130 | Anthropic | Claude-3-Haiku-20240307 (scratchpad with news with freeze values) | 0.305 | 0.252 | 0.122 | 0.207 | 0.278 | 0.256 | [0.246, 0.266] | <0.001 | 14% | 0% |
| 131 | OpenAI | GPT-3.5-Turbo-0125 (scratchpad with freeze values) | 0.288 | 0.269 | 0.166 | 0.233 | 0.278 | 0.261 | [0.25, 0.271] | <0.001 | 14% | 0% |
| 132 | Anthropic | Claude-3-Haiku-20240307 (scratchpad with news) | 0.305 | 0.262 | 0.135 | 0.218 | 0.283 | 0.261 | [0.251, 0.271] | <0.001 | 15% | 1% |
| 133 | OpenAI | GPT-3.5-Turbo-0125 (scratchpad) | 0.288 | 0.291 | 0.162 | 0.246 | 0.289 | 0.267 | [0.256, 0.278] | <0.001 | 15% | 0% |
| 134 | Meta | Llama-2-70b-Chat-Hf (zero shot with freeze values) | 0.304 | 0.288 | 0.189 | 0.254 | 0.296 | 0.279 | [0.262, 0.296] | <0.001 | 17% | 0% |
| 135 | Anthropic | Claude-3-Haiku-20240307 (superforecaster with news 1) | 0.290 | 0.321 | 0.168 | 0.268 | 0.306 | 0.279 | [0.265, 0.292] | <0.001 | 15% | 6% |
| 136 | Meta | Llama-2-70b-Chat-Hf (zero shot) | 0.304 | 0.337 | 0.186 | 0.284 | 0.320 | 0.294 | [0.281, 0.307] | <0.001 | 16% | 1% |
| 137 | ForecastBench | Random Uniform | 0.340 | 0.311 | 0.191 | 0.269 | 0.325 | 0.304 | [0.287, 0.322] | <0.001 | 18% | 0% |
| 138 | OpenAI | GPT-3.5-Turbo-0125 (zero shot with freeze values) | 0.446 | 0.257 | 0.172 | 0.227 | 0.351 | 0.337 | [0.318, 0.355] | <0.001 | 20% | 0% |
| 139 | OpenAI | GPT-3.5-Turbo-0125 (zero shot) | 0.446 | 0.320 | 0.200 | 0.279 | 0.383 | 0.362 | [0.346, 0.378] | <0.001 | 17% | 0% |
| 140 | ForecastBench | Always 1 | 0.731 | 0.782 | 0.536 | 0.696 | 0.757 | 0.714 | [0.688, 0.74] | <0.001 | 20% | 0% |
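
The table does not say how its aggregate columns are derived. The published values are consistent with the following reading, which is an inference from the numbers and not a documented formula: the overall market score looks like a question-weighted mean of the resolved (N=193) and unresolved (N=103) market scores, and each "Overall" column looks like an unweighted mean of the dataset score and the matching market score. The sketch below recomputes the "Always 1" row under that assumption. The function names are hypothetical, and small differences from the published figures are expected because the inputs are already rounded to three decimals.

```python
# Question counts copied from the column headers of the table.
N_MARKET_RESOLVED = 193
N_MARKET_UNRESOLVED = 103


def market_overall(resolved: float, unresolved: float) -> float:
    """Assumed: question-weighted mean of the two market scores."""
    total = N_MARKET_RESOLVED + N_MARKET_UNRESOLVED
    weighted = resolved * N_MARKET_RESOLVED + unresolved * N_MARKET_UNRESOLVED
    return weighted / total


def overall(dataset: float, market: float) -> float:
    """Assumed: unweighted mean of the dataset score and a market score."""
    return (dataset + market) / 2


def close(recomputed: float, published: float, tol: float = 0.0015) -> bool:
    """Compare with a tolerance that absorbs three-decimal rounding of inputs."""
    return abs(recomputed - published) < tol


# Published values for the "Always 1" baseline (rank 140).
dataset, resolved, unresolved = 0.731, 0.782, 0.536

market = market_overall(resolved, unresolved)
print(close(market, 0.696))                      # Market Score (overall)
print(close(overall(dataset, resolved), 0.757))  # Overall Resolved Score
print(close(overall(dataset, market), 0.714))    # Overall Score
```

The same check can be repeated for any other row by substituting its dataset, resolved, and unresolved scores. A row that fails it would suggest either that the assumed formula is wrong or that the row was damaged in transcription.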