{"columns":[{"alerts":[],"column":"subject","extras":{"singletons":0,"top_values":[["language arts",10068],["social studies",2315]]},"kind":"categorical","n":12383,"n_null":0,"n_unique":2,"null_rate":0.0,"stats":{"cardinality":2,"entropy":0.6950469974113564,"entropy_ratio":0.6950469974113564,"top_rate":0.8130501493983687,"top_value":"language arts"}},{"alerts":[],"column":"grade","extras":{"singletons":0,"top_values":[["grade-5",2537],["grade-3",1559],["grade-8",1389],["grade-9",1025],["grade-6",1017],["grade-7",807],["grade-11",790],["grade-2",653],["grade-10",649],["grade-12",564],["grade-4",541],["kindergarten",446],["grade-1",284],["pre-k",122]]},"kind":"categorical","n":12383,"n_null":0,"n_unique":14,"null_rate":0.0,"stats":{"cardinality":14,"entropy":3.513169970171711,"entropy_ratio":0.922732459172231,"top_rate":0.2048776548493903,"top_value":"grade-5"}},{"alerts":[],"column":"skill","extras":{"singletons":24,"top_values":[["understand-overall-supply-and-demand",238],["choose-between-adjectives-and-adverbs",176],["determine-the-meanings-of-greek-and-latin-roots",165],["describe-the-difference-between-related-words",162],["costs-and-benefits",160],["is-it-a-complete-sentence-or-a-fragment",156],["use-greek-and-latin-roots-as-clues-to-the-meanings-of-words",137],["identify-vague-pronoun-references",136],["what-does-the-punctuation-suggest",131],["determine-the-meanings-of-words-with-greek-and-latin-roots",128],["use-the-correct-homophone",128],["analogies",127],["classify-logical-fallacies",123],["is-it-a-phrase-or-a-clause",117],["is-the-sentence-declarative-interrogative-imperative-or-exclamatory",116],["analogies-challenge",113],["is-the-sentence-simple-compound-complex-or-compound-complex",108],["is-it-a-complete-sentence-or-a-run-on",107],["use-guide-words",106],["recall-the-source-of-an-allusion",105]]},"kind":"categorical","n":12383,"n_null":0,"n_unique":402,"null_rate":0.0,"stats":{"cardinality":402,"entropy":7.948651455939462,"entropy_ratio":0.9188075322731373,"top_rate":0.01921989824759751,"top_value":"understand-overall-supply-and-demand"}},{"alerts":[{"code":"duplicates","level":"warn","message":"24.3% duplicate strings"}],"column":"question","extras":{"language_counts":{"__engine":"fasttext:4,381","en":4379,"es":2},"language_sample_size":5000,"length_histogram":{"counts":[8306,2142,439,337,253,150,65,30,32,52,38,20,30,28,43,42,26,67,51,27,12,13,22,13,13,5,10,21,15,23,12,13,11,4,9,6,1,0,0,2],"edges":[3.0,163.95,324.9,485.84999999999997,646.8,807.75,968.6999999999999,1129.6499999999999,1290.6,1451.55,1612.5,1773.4499999999998,1934.3999999999999,2095.35,2256.2999999999997,2417.25,2578.2,2739.1499999999996,2900.1,3061.0499999999997,3222.0,3382.95,3543.8999999999996,3704.85,3865.7999999999997,4026.7499999999995,4187.7,4348.65,4509.599999999999,4670.549999999999,4831.5,4992.45,5153.4,5314.349999999999,5475.299999999999,5636.25,5797.2,5958.15,6119.099999999999,6280.049999999999,6441.0]},"near_unique":false,"sample":["Biology is the study of living things. The root **logy** means \"the study of something\". What does the root **bio** mean?","Complete the text with the better word.\n\nThe managers were impressed with Jamie's ___ during his interview. He spoke clearly about his relevant experience and didn't shy away from asking questions.","Guitar makers use a special kind of wood to build their instruments. But the trees that produce that kind of wood are running out. Over time, what will probably happen to the overall supply of new guitars?","What does the following sentence suggest?\n\nUnfortunately, all the wedding rings in Gabby's store that are made of pure gold are above my price range.","What is the meaning of **joined at the hip**?\n\nJulia and her friend Beth met in middle school and immediately became **joined at the hip**.","Which is the best way to complete the text?\n\nA colony of carpenter bees built a nest in the decrepit wood shed in our ___ don't intend to disturb them in their new home.","Complete the sentence with the correct homophone.\n\nWe heard a noisy owl in the woods last  ___","What kind of sentence is this?\n\nWhy are Dillon and Maya arguing?","Which expression of thanks is more formal?","In 1800, where did most Americans live?","Which is a complete sentence?","What does **call** mean in this sentence?\n\nThe **call** of the blue whale can be heard hundreds of miles away.","The word hydrant contains the root hydro. What does the root hydro mean?","Which text best completes the sentence?\n\nTisha is ___.","Pick the uppercase letter that matches.\n\nm","Complete the text with the better conjunctive adverb.\n\n\"I'm allergic to peanuts, so I'll pass on the peanut butter bars,\" Rodrigo said. \n\"___, they must be stale; they've been sitting on the counter for more than a week.\"","Which of the following contains a vague pronoun reference?","Consider this claim:\n\nWikis, or websites that allow users to generate and alter content, are not reliable sources for academic research.\n\nWhich is the strongest evidence to support the claim?","Select the quotation from the story that most strongly suggests that the theme is **It's often better to do a good job on something simple than attempt too many things at once**.","The artist planned to carve the presidents' heads and bodies. Why are only the heads included now?","Review paragraphs 4 and 5.\n\ntitle: The Burning Cuyahoga\nIn northern Ohio, the Cuyahoga River winds through the city of Cleveland before flowing into Lake Erie. People enjoy kayaking and fishing on the nearly one-hundred-mile-long river. Birds, beavers, and other wildlife can sometimes be spotted along its banks. But the river wasn't always a clean, safe place to be. In fact, the river used to be so polluted that it often caught on fire!\nThe Cuyahoga started getting polluted around the middle of the nineteenth century, when factories began using the river as a place to dump their waste. This practice continued for over one hundred years. Over time, the river became filled with oil and other harmful materials. Some of these materials were highly flammable, and this led to fires on the river.\nOn June 22, 1969, a spark from a passing train once again ignited the river. Luckily, the fire was put out in about twenty minutes. Over the years, people in Cleveland had gotten used to the fires, so no one paid much attention to this one. Later that summer, though, a reporter from a popular American news magazine wrote an article about the event. There were no photos of the June fire, so the magazine used a photo from an earlier, bigger fire. That fire had burned for three days.\nThe magazine article shocked many Americans. The burning Cuyahoga became a symbol of America's pollution predicament. Dirty land, air, and water were creating trouble across the country, even hurting people and animals. More and more people were saying that the government needed to take action. As a result, President Nixon helped form the Environmental Protection Agency (EPA) in 1970. As part of the government, the EPA makes rules about the kinds and amounts of materials that can be put into the environment.\nWhen the EPA first examined the Cuyahoga River in 1972, they found hardly any fish. They discovered high levels of harmful metals and chemicals. But after years of work, the Cuyahoga now has thousands of fish. The city of Cleveland has even used the river as a source for drinking water. These improvements are, in part, the result of a fire that helped spark change.\n\nBased on clues in the text, which of the following has most likely contributed to the improved condition of the Cuyahoga River?","Which figure of speech is used in this text?\n\nWhen I was a teenager, **reading was my ticket** to foreign lands: I traveled to India, China, and Antarctica without ever leaving the comfort of my home.","What is the meaning of **walking on air**?\n\nVijay was **walking on air** when he became editor of the city's largest newspaper.","Complete the sentence with the correct helping verb or verbs.\n\n\"We ___ eating dinner at seven o'clock, so don't be late,\" Mrs. Christensen said.","Read the text.\n\nThe flying beetle known as the firefly or lightning bug is named for its most noticeable feature: its nighttime bioluminescence, or glow. The firefly's light-producing organ contains calcium, the pigment luciferin, the light-producing enzyme luciferase, and the chemical adenosine triphosphate (ATP). When oxygen is added to this mix, it creates light, and the insect glows. Scientists don't fully understand how the firefly makes its light blink, but one theory is that the firefly turns the light on and off by controlling the input of oxygen into the light-producing organ.\n\nWhich organizational structure does this text primarily use?","Complete the sentence.\n\nFinding the Northwest Passage would make it easier for the United States to trade with ___. ","Read the sentences.\n\nKayla held her head high. She smiled and shook the teacher's hand.\n\nHow is Kayla probably feeling?","Read the claim and the supporting evidence.\n\n**Claim:** Diana is a good friend.\n**Evidence:** After Diana inadvertently hurt Henry's feelings, she apologized.\n\nWhy does the evidence support the claim? Choose the **analysis** that better explains the connection.","What does the following sentence suggest?\n\nThe need for discipline and the importance of accepting defeat with grace are lessons that Wendy learned from her coach, Darrell.","What is the meaning of **par for the course**?","Choose the poem that uses **onomatopoeia**.","When is Rosh Hashanah?","Which text uses the word **disinterested** in its traditional sense?","Read the sentence.\n\nChewing his lip, Franklin tapped his fingers on the table as he waited for his test grade.\n\nBased on this sentence, how is Franklin probably feeling?","The following texts both describe going up a mountain in a chairlift.\n\nThis text is from a poster at the Jefferson Ski Center:\nThe chairlifts at the Jefferson Ski Center are new and improved. You will love their safe, modern design so much that riding up the mountain might be more fun than skiing down! Enjoy the scenery as you soar to the top of the mountain.\n\nThis text is told from the point of view of Lila, a woman on a chairlift:\nThe chairlift whipped around the bend. It picked me up, and I hung on anxiously as it carried me high into the sky. I peeked down at the happy skiers on the ground below but quickly shut my eyes again. I couldn't wait to get off.\n\nHow is Lila's point of view different from the poster's?","The word **diameter** contains the root **meter**. What does the root **meter** mean?","Last year, 50,000 people lived in the city of Dayton. But since then, 8,000 people have moved away. What probably happened to the overall supply of houses for sale in Dayton?","Complete the sentence.\n\nAt the start of the Great Depression, ___ was president of the United States.","Which sentence is correct?","What does the word dispensary mean?","As soon as he became Secretary of the Treasury, Alexander Hamilton began to work on a financial plan for the country. The financial plan was a plan for how to set up the government's financial system and how to improve the economy.\n\nIn the passage below, President Washington describes one reason why setting up a financial plan for the country was so important in the 1790s. Read the modified passage. Then complete the text below.\n\nDuring the Revolutionary War, we, the United States, needed much more money than we had to fight the war.\nAmerican citizens fought hard and showed their loyalty. This dedication led to our victory. But a load of debt was left upon us.\n\nComplete the sentence.\n\nWashington explained that Hamilton's financial plan was necessary because ___.","What does **well** mean in this sentence?\n\nNow that Brenda takes lessons, she plays the fiddle quite **well**.","Which narrative point of view is shown in the passage?\n\nThe bus is coming now, and you're staring at the tips of your black shoes. You've got to be prepared. You put your hand in your pocket, search among the coins, and finally take out thirty centavos. You've got to be prepared. You grab the handrail\u2014the bus slows down but doesn't stop\u2014and jump aboard.\n\nFrom Carlos Fuentes, Aura. Copyright 1962 by Carlos Fuentes","Is the group of words in bold a phrase or a clause?\n\n**Gina's suitcase was too large to fit in the overhead bin**, so unfortunately she had to check it.","You want to **persuade** someone **to make homemade bread**. \"Persuade\" means \"get someone to do something or believe something\". What should you write?","What is the difference between conversation and communication?","Complete the sentence with the word that best fits the overall meaning and tone.\n\nRafi appreciated the gallery owner's ___ efforts to include artwork from the local community.","Is the word in bold a direct object or an indirect object?\n\nThe University of Morristown gave Caleb **funding** for his research project.","Which type of sentence is this?\n\nHector always approaches difficult tasks enthusiastically, and he frequently motivates others with his energy and fervor.","Which type of sentence is this?\n\nDesmond always approaches difficult tasks enthusiastically, and he frequently motivates others with his energy and fervor."],"top_values":[["Which sentence is correct?",171],["Which sentence states a fact?",146],["Which of the following contains a vague pronoun reference?",136],["Which is a **complete sentence**?",92],["Which is a **run-on sentence**?",90],["Which is a complete sentence?",71],["Which word does not rhyme?",70],["Which sentence is more formal?",65],["Which is a sentence fragment?",56],["Which is a **simple sentence**?",39],["Complete the text.\n\nIn the United States, Thanksgiving is always celebrated on the fourth Thursday of ___.",35],["Which president signed the declaration of war on Great Britain in 1812?",35],["Who was president of the United States during the War of 1812?",35],["Which sentence is in the past tense?",33],["Which sentence is in passive voice?",33],["Which is a **compound sentence**?",31],["Which is a thesis statement?",30],["Consider this claim:\n\nWikis, or websites that allow users to generate and alter content, are not reliable sources for academic research.\n\nWhich is the strongest evidence to support the claim?",26],["Which is a **sentence fragment**?",24],["Which word is not like the others?",21]],"top_words":[["the",28598],["of",8646],["to",8624],["a",8434],["in",6854],["is",6493],["and",5740],["which",3583],["that",3383],["with",2878],["complete",2733],["what",2661],["was",2657],["sentence",2590],["for",2369],["on",2073],["___",1901],["or",1857],["this",1855],["word",1673],["at",1620],["he",1578],["his",1568],["as",1549],["her",1502]],"vocab_skipped":null,"word_histogram":{"counts":[9406,1266,485,287,201,67,30,43,46,26,21,38,49,47,73,63,42,20,22,23,9,21,4,23,20,19,15,11,4,2],"edges":[1.0,33.8,66.6,99.39999999999999,132.2,165.0,197.79999999999998,230.59999999999997,263.4,296.2,329.0,361.79999999999995,394.59999999999997,427.4,460.19999999999993,492.99999999999994,525.8,558.5999999999999,591.4,624.1999999999999,657.0,689.8,722.5999999999999,755.4,788.1999999999999,820.9999999999999,853.8,886.5999999999999,919.3999999999999,952.1999999999999,985.0]}},"kind":"text","n":12383,"n_null":0,"n_unique":9375,"null_rate":0.0,"stats":{"allcaps_rate":0.00032302349995962207,"boilerplate_rate":0.0,"duplicate_rate":0.24291367196963579,"emoji_rate":0.0,"len_max":6441,"len_mean":313.4049099571994,"len_median":120.0,"len_min":3,"len_p95":1501.899999999996,"n_duplicates":3008,"n_empty":0,"one_word_rate":0.00024226762496971655,"readability_flesch_mean":73.85258447547054,"url_rate":0.0,"vocab_size":31660,"word_mean":52.19793264960026,"word_median":20.0}},{"alerts":[{"code":"skipped","level":"info","message":"no profiler for kind=unknown"}],"column":"choices","extras":{},"kind":"unknown","n":12383,"n_null":0,"n_unique":null,"null_rate":0.0,"stats":{}},{"alerts":[],"column":"answer_idx","extras":{"histogram":{"counts":[5317,0,0,0,0,0,0,0,0,0,0,0,0,5132,0,0,0,0,0,0,0,0,0,0,0,0,1357,0,0,0,0,0,0,0,0,0,0,0,0,577],"edges":[0.0,0.075,0.15,0.22499999999999998,0.3,0.375,0.44999999999999996,0.525,0.6,0.6749999999999999,0.75,0.825,0.8999999999999999,0.975,1.05,1.125,1.2,1.275,1.3499999999999999,1.425,1.5,1.575,1.65,1.7249999999999999,1.7999999999999998,1.875,1.95,2.025,2.1,2.175,2.25,2.3249999999999997,2.4,2.475,2.55,2.625,2.6999999999999997,2.775,2.85,2.925,3.0]},"sample":[0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,1.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,1.0,2.0,0.0,1.0,1.0,1.0,0.0,1.0,1.0,0.0,1.0,0.0,1.0,1.0,1.0,1.0,2.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,2.0,1.0,0.0,0.0,0.0,2.0,0.0,2.0,0.0,0.0,0.0,3.0,3.0,3.0,1.0,0.0,1.0,1.0,1.0,3.0,0.0,0.0,1.0,3.0,2.0,2.0,1.0,0.0,1.0,1.0,1.0,0.0,1.0,0.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,3.0,0.0,1.0,2.0,1.0,0.0,0.0,2.0,2.0,0.0,1.0,1.0,0.0,1.0,1.0,0.0,1.0,1.0,2.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,3.0,1.0,1.0,0.0,1.0,1.0,0.0,0.0,2.0,2.0,1.0,0.0,0.0,1.0,0.0,1.0,1.0,2.0,0.0,0.0,0.0,1.0,0.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,0.0,1.0,1.0,0.0,1.0,0.0,1.0,2.0,1.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,2.0,0.0,3.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,2.0,0.0,1.0,0.0,0.0,0.0,2.0,0.0,1.0,0.0,3.0,1.0,0.0,0.0,1.0,1.0,0.0,0.0,1.0,1.0,2.0,1.0,0.0,2.0,2.0,0.0,1.0,0.0,0.0,2.0,1.0,0.0,1.0,0.0,1.0,1.0,1.0,0.0,0.0,1.0,1.0,2.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,1.0,2.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,3.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,2.0,0.0,3.0,1.0,0.0,2.0,0.0,1.0,1.0,3.0,2.0,1.0,0.0,1.0,2.0,0.0,0.0,2.0,1.0,1.0,1.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0,1.0,0.0,0.0,3.0,1.0,0.0,1.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,2.0,0.0,3.0,0.0,0.0,0.0,3.0,3.0,0.0,0.0,0.0,1.0,1.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,0.0,1.0,0.0,1.0,2.0,1.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,3.0,1.0,0.0,1.0,0.0,1.0,1.0,1.0,2.0,0.0,1.0,0.0,2.0,0.0,0.0,1.0,0.0,3.0,0.0,0.0,0.0,1.0,2.0,1.0,2.0,0.0,2.0,1.0,1.0,0.0,1.0,0.0,0.0,2.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,2.0,1.0,2.0,0.0,0.0,0.0,2.0,3.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0,1.0,0.0,0.0,2.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,0.0,1.0,0.0,0.0,1.0,0.0,1.0,2.0,0.0,1.0,1.0,1.0,1.0,2.0,1.0,2.0,0.0,1.0,1.0,0.0,1.0,0.0,2.0,1.0,0.0,1.0]},"kind":"numeric","n":12383,"n_null":0,"n_unique":4,"null_rate":0.0,"stats":{"iqr":1.0,"kurtosis":0.40778788837660596,"max":3.0,"mean":0.7733990147783252,"median":1.0,"min":0.0,"n_outliers":577,"outlier_rate":0.04659613986917548,"q1":0.0,"q3":1.0,"skew":0.9448090103574662,"std":0.8210089503857249,"zero_rate":0.4293789873213276}}],"insights":{"errors":[],"insights":[{"confidence":"high","critiques":[],"evidence_keys":["row_count","columns.subject.top_values","columns.grade.top_values","columns.question.stats.duplicate_rate","columns.question.stats.n_duplicates","columns.answer_idx.stats.zero_rate","columns.answer_idx.n_unique","columns.skill.n_unique","columns.skill.stats.top_rate"],"featured_charts":[{"caption":"Shows the heavy tilt toward language arts over social studies.","column":"subject","kind":"donut"},{"caption":"Reveals which grade levels are best represented, with grade-5 leading.","column":"grade","kind":"bar"},{"caption":"Check whether correct-answer positions are evenly spread or biased toward 0 and 1.","column":"answer_idx","kind":"histogram"},{"caption":"Question lengths span from 3 to 6,441 characters; look for a long tail of unusually long prompts.","column":"question","kind":"length"},{"caption":"Top skills are fairly evenly distributed across 402 categories \u2014 no single skill dominates.","column":"skill","kind":"bar"}],"model":"anthropic:claude-opus-4-7","narrative":"This dataset contains 12,383 multiple-choice questions tagged by subject, grade, and skill, likely from an educational platform. The content is heavily skewed toward language arts (10,068 rows) over social studies (2,315), and grade-5 is the single largest grade bucket at 2,537 rows. The question text shows a notable 24.3% duplicate rate with 3,008 repeats, so deduplication is worth considering before any modeling. Answer indices range 0-3 but are concentrated at 0 and 1 (43% are zero), suggesting possible position bias in the correct-answer distribution. Skill coverage is broad with 402 distinct skills, none dominating (top skill is only 1.9% of rows).","scope":"dataset","target":"__global__"},{"confidence":"high","critiques":[],"evidence_keys":["n","n_unique","null_rate","stats.cardinality","stats.top_rate","stats.top_value","top_values"],"model":"anthropic:claude-opus-4-7","narrative":"Binary subject indicator with only two values: 'language arts' (10068 rows, 81.3%) and 'social studies' (2315 rows). No nulls across 12383 rows. The class imbalance is notable\u2014roughly 4.3:1 in favor of language arts\u2014which will skew any per-subject aggregation or modelling.","role":"label","scope":"column","target":"subject","treatment":"One-hot or binary encode; account for the 4:1 imbalance when stratifying or modelling."},{"confidence":"high","critiques":[],"evidence_keys":["n","n_unique","null_rate","stats.cardinality","stats.entropy_ratio","stats.top_rate","stats.top_value","top_values"],"model":"anthropic:claude-opus-4-7","narrative":"Categorical grade level with 14 unique values across 12,383 rows and no nulls. Distribution is fairly even (entropy ratio 0.92), though grade-5 leads at 20.5% and the top 10 values shown span grade-2 through grade-12, suggesting a K-12 schema with a few additional buckets not displayed.","role":"feature","scope":"column","target":"grade","treatment":"Encode as ordinal (extract numeric grade level) before modelling."},{"confidence":"high","critiques":[],"evidence_keys":["n","n_unique","null_rate","stats.cardinality","stats.entropy_ratio","stats.top_rate","stats.top_value","top_values"],"model":"anthropic:claude-opus-4-7","narrative":"Slug-style identifiers for educational skills (e.g., \"understand-overall-supply-and-demand\", \"choose-between-adjectives-and-adverbs\"), spanning topics from economics to grammar and vocabulary. With 402 unique values across 12,383 rows and entropy ratio 0.919, the distribution is nearly flat \u2014 the most common skill accounts for only 1.92% of rows. Content mixes domains (ELA and economics) suggesting this column tags items from a multi-subject curriculum.","role":"label","scope":"column","target":"skill","treatment":"Treat as a high-cardinality categorical label; target-encode or embed rather than one-hot."},{"confidence":"high","critiques":[],"evidence_keys":["n","n_unique","null_rate","language_counts","stats.duplicate_rate","stats.n_duplicates","stats.len_min","stats.len_median","stats.len_max","stats.word_mean","stats.readability_flesch_mean","stats.vocab_size","top_values"],"model":"anthropic:claude-opus-4-7","narrative":"Free-text question prompts, almost entirely English (4379 en vs 2 es detected) with a wide length spread (min 3, median 120, max 6441 chars; mean 52 words). High duplication is the standout: 3008 rows (24.3%) repeat, with stems like 'Which sentence is correct?' (171) and 'Which sentence states a fact?' (146) dominating, suggesting templated educational/grammar items. Readability is easy (Flesch 73.9) and vocab is moderate (31660 unique tokens across 9375 distinct questions).","role":"free_text","scope":"column","target":"question","treatment":"Tokenize/embed for modelling and decide whether to dedupe the 3008 repeated prompts before train/test splitting to avoid leakage."},{"confidence":"low","critiques":[],"evidence_keys":["alerts","kind","n","null_rate","n_unique","stats"],"model":"anthropic:claude-opus-4-7","narrative":"The column 'choices' was skipped by the profiler (kind 'unknown'), so no descriptive statistics, cardinality, or value samples are available beyond a row count of 12383 and a null rate of 0.0. Without type inference or distributional signals it is impossible to characterise the contents from this evidence alone. The name suggests it may hold structured option sets (e.g. lists or JSON), which would explain why the dissector could not coerce it into a primitive type.","role":"other","scope":"column","target":"choices","treatment":"Inspect raw values manually and parse (likely JSON/list) before deciding on a downstream representation."},{"confidence":"high","critiques":[],"evidence_keys":["n","n_unique","null_rate","stats.min","stats.max","stats.zero_rate","stats.median","stats.skew","stats.n_outliers","stats.outlier_rate"],"model":"anthropic:claude-opus-4-7","narrative":"This is almost certainly a categorical answer index encoded as an integer, taking only 4 distinct values from 0 to 3 across 12,383 rows with no nulls. The distribution is heavily weighted toward the low end: 43% are zero and the median is 1, with skew 0.94 indicating answer choice 3 is comparatively rare. The 577 flagged outliers (4.7%) are an artifact of applying numeric outlier rules to what is really a small categorical set.","role":"label","scope":"column","target":"answer_idx","treatment":"Treat as a categorical class label, not a numeric value."}],"providers":["anthropic:claude-opus-4-7"],"total_usage":{"completion_tokens":2354,"prompt_tokens":7555,"total_tokens":9909}},"language_counts":{"en":4379,"es":2},"meta":{"generated_at":"2026-05-01T18:08:01+00:00","mode":"full","row_count":12383,"sampled_rows":12383,"seed":42,"source":"/home/coolhand/html/datavis/data_trove/cache/quirky/social_norms.parquet"},"notes":[],"saturn_version":"0.2.0","schema":{"answer_idx":"numeric","choices":"unknown","grade":"categorical","question":"text","skill":"categorical","subject":"categorical"}}
