Each question was posed individually to several LLMs, each running with its default settings. The models tested included Claude 3 Opus, ChatGPT3.5 and ChatGPT4, Gemini 1.5 Pro and Gemini 1.0, Mistral Large, ...