Summary confidence: high
This dataset contains 12,383 multiple-choice questions tagged by subject, grade, and skill, likely from an educational platform. The content is heavily skewed toward language arts (10,068 rows) over social studies (2,315 rows), and grade 5 is the largest grade bucket at 2,537 rows. Question text has a 24.3% duplicate rate (3,008 repeats), so deduplication is worth considering before any modeling. Answer indices range from 0 to 3 but are concentrated at 0 and 1 (43% are 0), which suggests possible position bias in where the correct answer is placed. Skill coverage is broad: there are 402 distinct skills and none dominates (the top skill accounts for only 1.9% of rows).
citing: row_count · columns.subject.top_values · columns.grade.top_values · columns.question.stats.duplicate_rate · columns.question.stats.n_duplicates · columns.answer_idx.stats.zero_rate · columns.answer_idx.n_unique · columns.skill.n_unique · columns.skill.stats.top_rate
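The deduplication and position-bias checks suggested above could be sketched roughly as follows. This is a minimal illustration, not the pipeline that produced the stats: it assumes rows are loaded as dictionaries with `question` and `answer_idx` keys (names taken from the cited columns), treats duplicates as exact string matches, and uses a small made-up `rows` sample. The helper names `dedupe_by_question` and `answer_idx_shares` are hypothetical.

```python
from collections import Counter


def dedupe_by_question(rows):
    """Keep only the first row for each distinct question text.

    Assumption: duplicates are exact string matches on `question`.
    """
    seen = set()
    kept = []
    for row in rows:
        if row["question"] not in seen:
            seen.add(row["question"])
            kept.append(row)
    return kept


def answer_idx_shares(rows):
    """Return the fraction of rows at each answer index, sorted by index."""
    counts = Counter(row["answer_idx"] for row in rows)
    total = sum(counts.values())
    return {idx: n / total for idx, n in sorted(counts.items())}


# Toy sample (invented for illustration; not drawn from the dataset).
rows = [
    {"question": "Which word is a noun?", "answer_idx": 0},
    {"question": "Which word is a verb?", "answer_idx": 1},
    {"question": "Which word is a noun?", "answer_idx": 0},  # repeat
    {"question": "Who was the first U.S. president?", "answer_idx": 0},
    {"question": "Which sentence is punctuated correctly?", "answer_idx": 2},
]

unique_rows = dedupe_by_question(rows)
shares_before = answer_idx_shares(rows)
shares_after = answer_idx_shares(unique_rows)

print(len(rows), "->", len(unique_rows), "rows after deduplication")
print("answer_idx shares before:", shares_before)
print("answer_idx shares after: ", shares_after)
```

Comparing the shares before and after deduplication is one way to see whether the concentration at index 0 comes from repeated questions or persists across unique ones. With four answer options, a share well above 0.25 for any single index on the deduplicated rows would be worth a closer look.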