This post combines some new results, older results, and reddit.com/u/invectorgator's results (with permission) to give a clear picture of all testing so far. Links to the relevant posts can be found below.
This was a lot of fun, and it has lit a fire under me about benchmarking. I have some ideas for a personal benchmarking tool using Wilmer that will be easier for me to run; I'll share more info once I dig into it.
As usual, a few notes about the tests:
- These tests were performed using u/chibop1's MMLU-Pro project. Be sure to swing by and thank them for giving us this fun toy.
- With u/invectorgator's permission, this post combines all of our results.
- We both used the same commits of the MMLU-Pro project, we both used only q8 GGUFs (unless otherwise specified), and we both used Text-Generation-WebUI as the backend to guarantee correct prompt templating, so our test results are directly comparable. (A rough sketch of that setup follows this list.)
- I didn't run these tests expecting them to be super scientific, perfectly accurate assessments of an LLM's knowledge, and I understand the concerns people have about them. But they do test a combination of knowledge AND instruction following. They aren't perfect, but they're better than perplexity testing alone.
- Invectorgator is covering the Gemma models, so I'm not.
- Qwen 2 7b just really does not like this test, at least when running in text-gen.
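
For anyone curious about the mechanics, here is a minimal sketch of how a harness like this can talk to Text-Generation-WebUI. This is NOT the MMLU-Pro project's actual code: the `ask` helper and prompt wording are made up for illustration, and it assumes the server was started with `--api` (which exposes an OpenAI-compatible endpoint on port 5000). The point it illustrates is real, though: routing requests through the backend's chat endpoint lets the server apply the loaded model's own instruction template, which is why we both used Text-Generation-WebUI.

```python
# Minimal sketch, not the MMLU-Pro project's actual code.
# Assumptions: Text-Generation-WebUI launched with --api (OpenAI-compatible
# endpoint on port 5000) and with enough context for long chain-of-thought
# answers (e.g. --n_ctx 8192 for the llama.cpp loader).
import requests

API_URL = "http://127.0.0.1:5000/v1/chat/completions"  # local text-gen-webui

def ask(question: str, choices: list[str]) -> str:
    """Send one multiple-choice question and return the raw model reply."""
    lettered = "\n".join(f"{chr(65 + i)}. {c}" for i, c in enumerate(choices))
    prompt = (
        f"{question}\n\n{lettered}\n\n"
        'Think step by step, then finish with: "The answer is (X)".'
    )
    resp = requests.post(
        API_URL,
        json={
            # The backend applies the loaded model's own instruction
            # template to these messages, which keeps prompt formatting
            # correct across every model tested here.
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": 2048,
            "temperature": 0.0,
        },
        timeout=600,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```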
 
New Models In This Test
This test adds the following new models to the pile. I went with some of my personal favorite fine-tunes. You can find the exact GGUFs that I used below, and you can see the above posts for the exact GGUFs for the other models:
Old Posts Combined Into This One:
Key Takeaway
I am now convinced that Hermes 2 Theta Llama 3 8b is secretly a 30b in disguise. To say it is punching above its weight is an understatement.
All tests below are GGUFs (q8 unless otherwise noted) running in Text-Generation-WebUI. The tests require more than 4096 tokens of context, so some model versions were chosen specifically to meet that requirement.
Models are listed in loose groups, roughly by size.
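
For clarity on how the numbers are computed: each Score (%) is simply correct/total × 100 for that category, and the Totals table at the end sums the raw counts across all fourteen categories (12,032 questions). Here is a quick sanity check in Python, using Hermes-2-Theta's actual per-category counts from the tables below:

```python
# Re-deriving one model's Totals row from the per-category tables below.
# The (correct, total) pairs are Hermes-2-Theta-Llama-3-8b's results.
hermes_theta = {
    "Business":   (330, 789),  "Law":              (280, 1101),
    "Psychology": (452, 798),  "Biology":          (453, 717),
    "Chemistry":  (330, 1132), "History":          (155, 381),
    "Other":      (429, 924),  "Health":           (388, 818),
    "Economics":  (448, 844),  "Math":             (509, 1351),
    "Physics":    (417, 1299), "Computer Science": (169, 410),
    "Philosophy": (194, 499),  "Engineering":      (245, 969),
}

correct = sum(c for c, _ in hermes_theta.values())
total = sum(t for _, t in hermes_theta.values())
print(f"{correct}/{total} = {100 * correct / total:.2f}%")
# -> 4799/12032 = 39.89%, matching the Totals table at the bottom.
```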
Business
| Model | Correct | Score (%) |
| --- | --- | --- |
| WizardLM-2-7b | 277/789 | 35.11 |
| Open-Hermes-2.5-7b | 285/789 | 36.12 |
| Mistral-7b-Inst-v0.3-q8 | 265/789 | 33.59 |
| Llama-3-8b-q4_K_M | 148/789 | 18.76 |
| Llama-3-8b-q8 | 160/789 | 20.28 |
| Llama-3-8b-SPPO-Iter-3 | 247/789 | 31.31 |
| Hermes-2-Theta-Llama-3-8b | 330/789 | 41.83 |
| Yi-1.5-9b-32k-q8 | 240/789 | 30.42 |
| Phi-Medium-128k-q8 | 260/789 | 32.95 |
| Mixtral-8x7b-Instruct-Q8 | 310/789 | 39.29 |
| Dolphin-Mixtral-2.5-8x7b | 350/789 | 44.36 |
| Nous-Capybara-34b | 313/789 | 39.67 |
| Yi-1.5-34B-32K-Q8 | 325/789 | 41.19 |
| Command-R-v01-Q8 | 126/789 | 15.97 |
| Llama-3-70b-FP16-Q2_KXXS | 254/789 | 32.19 |
| Llama-3-70b-FP16-Q2_K | 309/789 | 39.16 |
| Llama-3-70b-FP16-Q4_K_M | 427/789 | 54.12 |
| Llama-3-70b-FP16-Q5_K_M | 415/789 | 52.60 |
| Llama-3-70b-FP16-Q6_K | 408/789 | 51.71 |
| Llama-3-70b-FP16-Q8_0 | 411/789 | 52.09 |
Law
| Model | Correct | Score (%) |
| --- | --- | --- |
| WizardLM-2-7b | 282/1101 | 25.61 |
| Open-Hermes-2.5-7b | 260/1101 | 23.61 |
| Mistral-7b-Inst-v0.3-q8 | 248/1101 | 22.52 |
| Yi-1.5-9b-32k-q8 | 191/1101 | 17.35 |
| Phi-Medium-128k-q8 | 255/1101 | 23.16 |
| Llama-3-8b-q4_K_M | 161/1101 | 14.62 |
| Llama-3-8b-q8 | 172/1101 | 15.62 |
| Llama-3-8b-SPPO-Iter-3 | 200/1101 | 18.17 |
| Hermes-2-Theta-Llama-3-8b | 280/1101 | 25.43 |
| Mixtral-8x7b-Instruct-Q8 | 282/1101 | 25.61 |
| Dolphin-Mixtral-2.5-8x7b | 271/1101 | 24.61 |
| Nous-Capybara-34b | 369/1101 | 33.51 |
| Yi-1.5-34B-32K-Q8 | 417/1101 | 37.87 |
| Command-R-v01-Q8 | 146/1101 | 13.26 |
| Llama-3-70b-FP16-Q2_KXXS | 362/1101 | 32.88 |
| Llama-3-70b-FP16-Q2_K | 416/1101 | 37.78 |
| Llama-3-70b-FP16-Q4_K_M | 471/1101 | 42.78 |
| Llama-3-70b-FP16-Q5_K_M | 469/1101 | 42.60 |
| Llama-3-70b-FP16-Q6_K | 469/1101 | 42.60 |
| Llama-3-70b-FP16-Q8_0 | 464/1101 | 42.14 |
Psychology
| Model | Correct | Score (%) |
| --- | --- | --- |
| WizardLM-2-7b | 430/798 | 53.88 |
| Open-Hermes-2.5-7b | 434/798 | 54.39 |
| Mistral-7b-Inst-v0.3-q8 | 343/798 | 42.98 |
| Llama-3-8b-q4_K_M | 328/798 | 41.10 |
| Llama-3-8b-q8 | 372/798 | 46.62 |
| Llama-3-8b-SPPO-Iter-3 | 252/798 | 31.58 |
| Hermes-2-Theta-Llama-3-8b | 452/798 | 56.64 |
| Yi-1.5-9b-32k-q8 | 173/798 | 21.68 |
| Phi-Medium-128k-q8 | 358/798 | 44.86 |
| Mixtral-8x7b-Instruct-Q8 | 365/798 | 45.74 |
| Dolphin-Mixtral-2.5-8x7b | 468/798 | 58.65 |
| Nous-Capybara-34b | 474/798 | 59.40 |
| Yi-1.5-34B-32K-Q8 | 510/798 | 63.91 |
| Command-R-v01-Q8 | 131/798 | 16.42 |
| Llama-3-70b-FP16-Q2_KXXS | 493/798 | 61.78 |
| Llama-3-70b-FP16-Q2_K | 565/798 | 70.80 |
| Llama-3-70b-FP16-Q4_K_M | 597/798 | 74.81 |
| Llama-3-70b-FP16-Q5_K_M | 611/798 | 76.57 |
| Llama-3-70b-FP16-Q6_K | 605/798 | 75.81 |
| Llama-3-70b-FP16-Q8_0 | 605/798 | 75.81 |
Biology
| Model | Correct | Score (%) |
| --- | --- | --- |
| WizardLM-2-7b | 427/717 | 59.55 |
| Open-Hermes-2.5-7b | 417/717 | 58.16 |
| Mistral-7b-Inst-v0.3-q8 | 390/717 | 54.39 |
| Llama-3-8b-q4_K_M | 412/717 | 57.46 |
| Llama-3-8b-q8 | 424/717 | 59.14 |
| Llama-3-8b-SPPO-Iter-3 | 316/717 | 44.07 |
| Hermes-2-Theta-Llama-3-8b | 453/717 | 63.18 |
| Yi-1.5-9b-32k-q8 | 288/717 | 40.17 |
| Phi-Medium-128k-q8 | 262/717 | 36.54 |
| Mixtral-8x7b-Instruct-Q8 | 334/717 | 46.58 |
| Dolphin-Mixtral-2.5-8x7b | 434/717 | 60.53 |
| Nous-Capybara-34b | 473/717 | 65.97 |
| Yi-1.5-34B-32K-Q8 | 521/717 | 72.66 |
| Command-R-v01-Q8 | 138/717 | 19.25 |
| Llama-3-70b-FP16-Q2_KXXS | 510/717 | 71.13 |
| Llama-3-70b-FP16-Q2_K | 556/717 | 77.55 |
| Llama-3-70b-FP16-Q4_K_M | 581/717 | 81.03 |
| Llama-3-70b-FP16-Q5_K_M | 579/717 | 80.75 |
| Llama-3-70b-FP16-Q6_K | 574/717 | 80.06 |
| Llama-3-70b-FP16-Q8_0 | 581/717 | 81.03 |
Chemistry
| Model | Correct | Score (%) |
| --- | --- | --- |
| WizardLM-2-7b | 246/1132 | 21.73 |
| Open-Hermes-2.5-7b | 298/1132 | 26.33 |
| Mistral-7b-Inst-v0.3-q8 | 265/1132 | 23.41 |
| Llama-3-8b-q4_K_M | 163/1132 | 14.40 |
| Llama-3-8b-q8 | 175/1132 | 15.46 |
| Llama-3-8b-SPPO-Iter-3 | 236/1132 | 20.85 |
| Hermes-2-Theta-Llama-3-8b | 330/1132 | 29.15 |
| Yi-1.5-9b-32k-q8 | 270/1132 | 23.85 |
| Phi-Medium-128k-q8 | 207/1132 | 18.29 |
| Mixtral-8x7b-Instruct-Q8 | 338/1132 | 29.86 |
| Dolphin-Mixtral-2.5-8x7b | 369/1132 | 32.60 |
| Nous-Capybara-34b | 368/1132 | 32.51 |
| Yi-1.5-34B-32K-Q8 | 350/1132 | 30.92 |
| Command-R-v01-Q8 | 129/1132 | 11.40 |
| Llama-3-70b-FP16-Q2_KXXS | 331/1132 | 29.24 |
| Llama-3-70b-FP16-Q2_K | 378/1132 | 33.39 |
| Llama-3-70b-FP16-Q4_K_M | 475/1132 | 41.96 |
| Llama-3-70b-FP16-Q5_K_M | 493/1132 | 43.55 |
| Llama-3-70b-FP16-Q6_K | 461/1132 | 40.72 |
| Llama-3-70b-FP16-Q8_0 | 502/1132 | 44.35 |
History
| Model | Correct | Score (%) |
| --- | --- | --- |
| WizardLM-2-7b | 143/381 | 37.53 |
| Open-Hermes-2.5-7b | 148/381 | 38.85 |
| Mistral-7b-Inst-v0.3-q8 | 120/381 | 31.50 |
| Llama-3-8b-q4_K_M | 82/381 | 21.52 |
| Llama-3-8b-q8 | 94/381 | 24.67 |
| Llama-3-8b-SPPO-Iter-3 | 70/381 | 18.37 |
| Hermes-2-Theta-Llama-3-8b | 155/381 | 40.68 |
| Yi-1.5-9b-32k-q8 | 69/381 | 18.11 |
| Phi-Medium-128k-q8 | 119/381 | 31.23 |
| Mixtral-8x7b-Instruct-Q8 | 116/381 | 30.45 |
| Dolphin-Mixtral-2.5-8x7b | 155/381 | 40.68 |
| Nous-Capybara-34b | 105/381 | 27.56 |
| Yi-1.5-34B-32K-Q8 | 174/381 | 45.67 |
| Command-R-v01-Q8 | 40/381 | 10.50 |
| Llama-3-70b-FP16-Q2_KXXS | 174/381 | 45.67 |
| Llama-3-70b-FP16-Q2_K | 213/381 | 55.91 |
| Llama-3-70b-FP16-Q4_K_M | 232/381 | 60.89 |
| Llama-3-70b-FP16-Q5_K_M | 231/381 | 60.63 |
| Llama-3-70b-FP16-Q6_K | 231/381 | 60.63 |
| Llama-3-70b-FP16-Q8_0 | 231/381 | 60.63 |
Other
| Model | Correct | Score (%) |
| --- | --- | --- |
| WizardLM-2-7b | 375/924 | 40.58 |
| Open-Hermes-2.5-7b | 392/924 | 42.42 |
| Mistral-7b-Inst-v0.3-q8 | 327/924 | 35.39 |
| Llama-3-8b-q4_K_M | 269/924 | 29.11 |
| Llama-3-8b-q8 | 292/924 | 31.60 |
| Llama-3-8b-SPPO-Iter-3 | 270/924 | 29.22 |
| Hermes-2-Theta-Llama-3-8b | 429/924 | 46.43 |
| Yi-1.5-9b-32k-q8 | 227/924 | 24.57 |
| Phi-Medium-128k-q8 | 388/924 | 41.99 |
| Mixtral-8x7b-Instruct-Q8 | 355/924 | 38.42 |
| Dolphin-Mixtral-2.5-8x7b | 448/924 | 48.48 |
| Nous-Capybara-34b | 451/924 | 48.81 |
| Yi-1.5-34B-32K-Q8 | 481/924 | 52.06 |
| Command-R-v01-Q8 | 131/924 | 14.18 |
| Llama-3-70b-FP16-Q2_KXXS | 395/924 | 42.75 |
| Llama-3-70b-FP16-Q2_K | 472/924 | 51.08 |
| Llama-3-70b-FP16-Q4_K_M | 529/924 | 57.25 |
| Llama-3-70b-FP16-Q5_K_M | 552/924 | 59.74 |
| Llama-3-70b-FP16-Q6_K | 546/924 | 59.09 |
| Llama-3-70b-FP16-Q8_0 | 556/924 | 60.17 |
Health
| Model | Correct | Score (%) |
| --- | --- | --- |
| WizardLM-2-7b | 376/818 | 45.97 |
| Open-Hermes-2.5-7b | 356/818 | 43.52 |
| Mistral-7b-Inst-v0.3-q8 | 294/818 | 35.94 |
| Llama-3-8b-q4_K_M | 216/818 | 26.41 |
| Llama-3-8b-q8 | 263/818 | 32.15 |
| Llama-3-8b-SPPO-Iter-3 | 229/818 | 28.00 |
| Hermes-2-Theta-Llama-3-8b | 388/818 | 47.43 |
| Yi-1.5-9b-32k-q8 | 227/818 | 27.75 |
| Phi-Medium-128k-q8 | 349/818 | 42.67 |
| Mixtral-8x7b-Instruct-Q8 | 325/818 | 39.73 |
| Dolphin-Mixtral-2.5-8x7b | 367/818 | 44.87 |
| Nous-Capybara-34b | 348/818 | 42.54 |
| Yi-1.5-34B-32K-Q8 | 468/818 | 57.21 |
| Command-R-v01-Q8 | 110/818 | 13.45 |
| Llama-3-70b-FP16-Q2_KXXS | 406/818 | 49.63 |
| Llama-3-70b-FP16-Q2_K | 502/818 | 61.37 |
| Llama-3-70b-FP16-Q4_K_M | 542/818 | 66.26 |
| Llama-3-70b-FP16-Q5_K_M | 551/818 | 67.36 |
| Llama-3-70b-FP16-Q6_K | 546/818 | 66.75 |
| Llama-3-70b-FP16-Q8_0 | 544/818 | 66.50 |
Economics
| Model | Correct | Score (%) |
| --- | --- | --- |
| WizardLM-2-7b | 391/844 | 46.33 |
| Open-Hermes-2.5-7b | 407/844 | 48.22 |
| Mistral-7b-Inst-v0.3-q8 | 343/844 | 40.64 |
| Llama-3-8b-q4_K_M | 307/844 | 36.37 |
| Llama-3-8b-q8 | 309/844 | 36.61 |
| Llama-3-8b-SPPO-Iter-3 | 249/844 | 29.50 |
| Hermes-2-Theta-Llama-3-8b | 448/844 | 53.08 |
| Yi-1.5-9b-32k-q8 | 290/844 | 34.36 |
| Phi-Medium-128k-q8 | 369/844 | 43.72 |
| Mixtral-8x7b-Instruct-Q8 | 415/844 | 49.17 |
| Dolphin-Mixtral-2.5-8x7b | 462/844 | 54.74 |
| Nous-Capybara-34b | 451/844 | 53.44 |
| Yi-1.5-34B-32K-Q8 | 519/844 | 61.49 |
| Command-R-v01-Q8 | 198/844 | 23.46 |
| Llama-3-70b-FP16-Q2_KXXS | 494/844 | 58.53 |
| Llama-3-70b-FP16-Q2_K | 565/844 | 66.94 |
| Llama-3-70b-FP16-Q4_K_M | 606/844 | 71.80 |
| Llama-3-70b-FP16-Q5_K_M | 623/844 | 73.82 |
| Llama-3-70b-FP16-Q6_K | 614/844 | 72.75 |
| Llama-3-70b-FP16-Q8_0 | 625/844 | 74.05 |
Math
| Model | Correct | Score (%) |
| --- | --- | --- |
| WizardLM-2-7b | 379/1351 | 28.05 |
| Open-Hermes-2.5-7b | 423/1351 | 31.31 |
| Mistral-7b-Inst-v0.3-q8 | 399/1351 | 29.53 |
| Llama-3-8b-q4_K_M | 202/1351 | 14.95 |
| Llama-3-8b-q8 | 167/1351 | 12.36 |
| Llama-3-8b-SPPO-Iter-3 | 392/1351 | 29.02 |
| Hermes-2-Theta-Llama-3-8b | 509/1351 | 37.68 |
| Yi-1.5-9b-32k-q8 | 370/1351 | 27.39 |
| Phi-Medium-128k-q8 | 299/1351 | 22.13 |
| Mixtral-8x7b-Instruct-Q8 | 475/1351 | 35.16 |
| Dolphin-Mixtral-2.5-8x7b | 487/1351 | 36.04 |
| Nous-Capybara-34b | 347/1351 | 25.68 |
| Yi-1.5-34B-32K-Q8 | 467/1351 | 34.57 |
| Command-R-v01-Q8 | 166/1351 | 12.29 |
| Llama-3-70b-FP16-Q2_KXXS | 336/1351 | 24.87 |
| Llama-3-70b-FP16-Q2_K | 436/1351 | 32.27 |
| Llama-3-70b-FP16-Q4_K_M | 529/1351 | 39.16 |
| Llama-3-70b-FP16-Q5_K_M | 543/1351 | 40.19 |
| Llama-3-70b-FP16-Q6_K | 547/1351 | 40.49 |
| Llama-3-70b-FP16-Q8_0 | 532/1351 | 39.38 |
Physics
| Model | Correct | Score (%) |
| --- | --- | --- |
| WizardLM-2-7b | 344/1299 | 26.48 |
| Open-Hermes-2.5-7b | 351/1299 | 27.02 |
| Mistral-7b-Inst-v0.3-q8 | 338/1299 | 26.02 |
| Llama-3-8b-q4_K_M | 168/1299 | 12.93 |
| Llama-3-8b-q8 | 178/1299 | 13.70 |
| Llama-3-8b-SPPO-Iter-3 | 312/1299 | 24.02 |
| Hermes-2-Theta-Llama-3-8b | 417/1299 | 32.10 |
| Yi-1.5-9b-32k-q8 | 321/1299 | 24.71 |
| Phi-Medium-128k-q8 | 312/1299 | 24.02 |
| Mixtral-8x7b-Instruct-Q8 | 442/1299 | 34.03 |
| Dolphin-Mixtral-2.5-8x7b | 410/1299 | 31.56 |
| Nous-Capybara-34b | 404/1299 | 31.10 |
| Yi-1.5-34B-32K-Q8 | 483/1299 | 37.18 |
| Command-R-v01-Q8 | 166/1299 | 12.78 |
| Llama-3-70b-FP16-Q2_KXXS | 382/1299 | 29.41 |
| Llama-3-70b-FP16-Q2_K | 478/1299 | 36.80 |
| Llama-3-70b-FP16-Q4_K_M | 541/1299 | 41.65 |
| Llama-3-70b-FP16-Q5_K_M | 565/1299 | 43.49 |
| Llama-3-70b-FP16-Q6_K | 550/1299 | 42.34 |
| Llama-3-70b-FP16-Q8_0 | 544/1299 | 41.88 |
Computer Science
| Model | Correct | Score (%) |
| --- | --- | --- |
| WizardLM-2-7b | 137/410 | 33.41 |
| Open-Hermes-2.5-7b | 166/410 | 40.49 |
| Mistral-7b-Inst-v0.3-q8 | 120/410 | 29.27 |
| Llama-3-8b-q4_K_M | 105/410 | 25.61 |
| Llama-3-8b-q8 | 125/410 | 30.49 |
| Llama-3-8b-SPPO-Iter-3 | 130/410 | 31.71 |
| Hermes-2-Theta-Llama-3-8b | 169/410 | 41.22 |
| Yi-1.5-9b-32k-q8 | 96/410 | 23.41 |
| Phi-Medium-128k-q8 | 131/410 | 31.95 |
| Mixtral-8x7b-Instruct-Q8 | 150/410 | 36.59 |
| Dolphin-Mixtral-2.5-8x7b | 177/410 | 43.17 |
| Nous-Capybara-34b | 134/410 | 32.68 |
| Yi-1.5-34B-32K-Q8 | 191/410 | 46.59 |
| Command-R-v01-Q8 | 61/410 | 14.88 |
| Llama-3-70b-FP16-Q2_KXXS | 186/410 | 45.37 |
| Llama-3-70b-FP16-Q2_K | 199/410 | 48.54 |
| Llama-3-70b-FP16-Q4_K_M | 239/410 | 58.29 |
| Llama-3-70b-FP16-Q5_K_M | 241/410 | 58.78 |
| Llama-3-70b-FP16-Q6_K | 240/410 | 58.54 |
| Llama-3-70b-FP16-Q8_0 | 238/410 | 58.05 |
Philosophy
| Model | Correct | Score (%) |
| --- | --- | --- |
| WizardLM-2-7b | 170/499 | 34.07 |
| Open-Hermes-2.5-7b | 200/499 | 40.08 |
| Mistral-7b-Inst-v0.3-q8 | 175/499 | 35.07 |
| Llama-3-8b-q4_K_M | 152/499 | 30.46 |
| Llama-3-8b-q8 | 161/499 | 32.26 |
| Llama-3-8b-SPPO-Iter-3 | 142/499 | 28.46 |
| Hermes-2-Theta-Llama-3-8b | 194/499 | 38.88 |
| Yi-1.5-9b-32k-q8 | 114/499 | 22.85 |
| Phi-Medium-128k-q8 | 187/499 | 37.47 |
| Mixtral-8x7b-Instruct-Q8 | 194/499 | 38.88 |
| Dolphin-Mixtral-2.5-8x7b | 212/499 | 42.48 |
| Nous-Capybara-34b | 197/499 | 39.48 |
| Yi-1.5-34B-32K-Q8 | 257/499 | 51.50 |
| Command-R-v01-Q8 | 160/499 | 32.06 |
| Llama-3-70b-FP16-Q2_KXXS | 200/499 | 40.08 |
| Llama-3-70b-FP16-Q2_K | 258/499 | 51.70 |
| Llama-3-70b-FP16-Q4_K_M | 282/499 | 56.51 |
| Llama-3-70b-FP16-Q5_K_M | 281/499 | 56.31 |
| Llama-3-70b-FP16-Q6_K | 283/499 | 56.71 |
| Llama-3-70b-FP16-Q8_0 | 278/499 | 55.71 |
Engineering
| Model | Correct | Score (%) |
| --- | --- | --- |
| WizardLM-2-7b | 196/969 | 20.23 |
| Open-Hermes-2.5-7b | 193/969 | 19.92 |
| Mistral-7b-Inst-v0.3-q8 | 198/969 | 20.43 |
| Llama-3-8b-q4_K_M | 149/969 | 15.38 |
| Llama-3-8b-q8 | 166/969 | 17.13 |
| Llama-3-8b-SPPO-Iter-3 | 165/969 | 17.03 |
| Hermes-2-Theta-Llama-3-8b | 245/969 | 25.28 |
| Yi-1.5-9b-32k-q8 | 190/969 | 19.61 |
| Phi-Medium-128k-q8 | 183/969 | 18.89 |
| Mixtral-8x7b-Instruct-Q8 | 234/969 | 24.15 |
| Dolphin-Mixtral-2.5-8x7b | 236/969 | 24.35 |
| Nous-Capybara-34b | 393/969 | 40.56 |
| Yi-1.5-34B-32K-Q8 | 408/969 | 42.11 |
| Command-R-v01-Q8 | 145/969 | 14.96 |
| Llama-3-70b-FP16-Q2_KXXS | 326/969 | 33.64 |
| Llama-3-70b-FP16-Q2_K | 375/969 | 38.70 |
| Llama-3-70b-FP16-Q4_K_M | 394/969 | 40.66 |
| Llama-3-70b-FP16-Q5_K_M | 417/969 | 43.03 |
| Llama-3-70b-FP16-Q6_K | 406/969 | 41.90 |
| Llama-3-70b-FP16-Q8_0 | 398/969 | 41.07 |
Totals
| Model | Total Correct | Total Score (%) |
| --- | --- | --- |
| WizardLM-2-7b | 4173/12032 | 34.68 |
| Open-Hermes-2.5-7b | 4330/12032 | 35.99 |
| Mistral-7b-Inst-v0.3-q8 | 3825/12032 | 31.79 |
| Llama-3-8b-q4_K_M | 2862/12032 | 23.79 |
| Llama-3-8b-q8 | 3058/12032 | 25.42 |
| Llama-3-8b-SPPO-Iter-3 | 3210/12032 | 26.68 |
| Hermes-2-Theta-Llama-3-8b | 4799/12032 | 39.89 |
| Yi-1.5-9b-32k-q8 | 3066/12032 | 25.48 |
| Phi-Medium-128k-q8 | 3679/12032 | 30.58 |
| Mixtral-8x7b-Instruct-Q8 | 4335/12032 | 36.03 |
| Dolphin-Mixtral-2.5-8x7b | 4846/12032 | 40.27 |
| Nous-Capybara-34b | 4827/12032 | 40.12 |
| Yi-1.5-34B-32K-Q8 | 5571/12032 | 46.30 |
| Command-R-v01-Q8 | 1847/12032 | 15.35 |
| Llama-3-70b-FP16-Q2_KXXS | 4849/12032 | 40.30 |
| Llama-3-70b-FP16-Q2_K | 5722/12032 | 47.56 |
| Llama-3-70b-FP16-Q4_K_M | 6445/12032 | 53.57 |
| Llama-3-70b-FP16-Q5_K_M | 6571/12032 | 54.61 |
| Llama-3-70b-FP16-Q6_K | 6480/12032 | 53.86 |
| Llama-3-70b-FP16-Q8_0 | 6509/12032 | 54.10 |