Construction And Analysis Of The "TruthEval" Dataset To Expose LLM Weaknesses 31/01/2025 Large Language Models
SportQA, A New Dataset That Measures The Comprehension Of Sports In Large Language Models 30/01/2025 Large Language Models
Proposal For A New Evaluation Method For AI Assistants Based On Human Preferences 29/01/2025 Large Language Models
New UrbanSARFloods Dataset Solves Flood Detection Challenges 15/01/2025 Datasets
Persona Hub, A Large Dataset Built From 1 Billion Personas, Is Now Available! 19/12/2024 Persona-driven Data Synthesis
[InfiMM-WebMath-40B] Improves The Mathematical Performance Of LLM With A Dataset Consisting Of 2.4 Billion Mathematical Documents! 30/10/2024 Datasets
IndiBias, A New Dataset For Measuring India-specific Social Biases 16/08/2024 Large Language Models
[EDAT24] Event-based Dataset Specialized For Manufacturing Operation Classification 05/08/2024 Datasets
[JMMLU] Prompt Politeness Affects LLM Performance! 26/07/2024 ChatGPT
Analog And Multimodal Manufacturing Data Sets Acquired On The Future Factory Platform 30/05/2024 Datasets
OpenToM, A Benchmark For Evaluating Whether An LLM Has A "Theory Of Mind," Is Now Available! 24/05/2024 Datasets
"BioPlanner" And "BIOPROT Dataset" Automate Experimental Protocols For Biological Research 24/05/2024 Large Language Models
Investigation Of A Method To Continuously Authenticate Users With Mouse Movements 20/05/2024 Machine Learning
Machine Learning System For Continuous Certification With New Datasets 17/05/2024 Machine Learning