{"columns":[{"alerts":[{"code":"long_tail","level":"info","message":"192 singleton categories"}],"column":"ID","extras":{"singletons":192,"top_values":[["1A",1],["2A",1],["3A",1],["4A",1],["5A",1],["6A",1],["7A",1],["8A",1],["9A",1],["10A",1],["10B",1],["11A",1],["12A",1],["13A",1],["14A",1],["15A",1],["16A",1],["17A",1],["18A",1],["19A",1]]},"kind":"categorical","n":192,"n_null":0,"n_unique":192,"null_rate":0.0,"stats":{"cardinality":192,"entropy":7.584962500721157,"entropy_ratio":1.0000000000000002,"top_rate":0.005208333333333333,"top_value":"1A"}},{"alerts":[{"code":"long_tail","level":"info","message":"192 singleton categories"}],"column":"Name","extras":{"singletons":192,"top_values":[["Consonant Inventories",1],["Vowel Quality Inventories",1],["Consonant-Vowel Ratio",1],["Voicing in Plosives and Fricatives",1],["Voicing and Gaps in Plosive Systems",1],["Uvular Consonants",1],["Glottalized Consonants",1],["Lateral Consonants",1],["The Velar Nasal",1],["Vowel Nasalization",1],["Nasal Vowels in West Africa",1],["Front Rounded Vowels",1],["Syllable Structure",1],["Tone",1],["Fixed Stress Locations",1],["Weight-Sensitive Stress",1],["Weight Factors in Weight-Sensitive Stress Systems",1],["Rhythm Types",1],["Absence of Common Consonants",1],["Presence of Uncommon Consonants",1]]},"kind":"categorical","n":192,"n_null":0,"n_unique":192,"null_rate":0.0,"stats":{"cardinality":192,"entropy":7.584962500721157,"entropy_ratio":1.0000000000000002,"top_rate":0.005208333333333333,"top_value":"Consonant Inventories"}},{"alerts":[{"code":"skipped","level":"info","message":"no profiler for kind=unknown"}],"column":"Description","extras":{},"kind":"unknown","n":192,"n_null":0,"n_unique":null,"null_rate":0.0,"stats":{}},{"alerts":[{"code":"skipped","level":"info","message":"no profiler for kind=unknown"}],"column":"ColumnSpec","extras":{},"kind":"unknown","n":192,"n_null":0,"n_unique":null,"null_rate":0.0,"stats":{}},{"alerts":[],"column":"Chapter_ID","extras":{"histogram":{"counts":[12,12,12,12,11,12,11,13,17,13,11,12,44],"edges":[1.0,12.0,23.0,34.0,45.0,56.0,67.0,78.0,89.0,100.0,111.0,122.0,133.0,144.0]},"sample":[1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0,10.0,10.0,11.0,12.0,13.0,14.0,15.0,16.0,17.0,18.0,19.0,20.0,21.0,21.0,22.0,23.0,24.0,25.0,25.0,26.0,27.0,28.0,29.0,30.0,31.0,32.0,33.0,34.0,35.0,36.0,37.0,38.0,39.0,39.0,40.0,41.0,42.0,43.0,44.0,45.0,46.0,47.0,48.0,49.0,50.0,51.0,52.0,53.0,54.0,55.0,56.0,57.0,58.0,58.0,59.0,60.0,61.0,62.0,63.0,64.0,65.0,66.0,67.0,68.0,69.0,70.0,71.0,72.0,73.0,74.0,75.0,76.0,77.0,78.0,79.0,79.0,80.0,81.0,81.0,82.0,83.0,84.0,85.0,86.0,87.0,88.0,89.0,90.0,90.0,90.0,90.0,90.0,90.0,90.0,91.0,92.0,93.0,94.0,95.0,96.0,97.0,98.0,99.0,100.0,101.0,102.0,103.0,104.0,105.0,106.0,107.0,108.0,108.0,109.0,109.0,110.0,111.0,112.0,113.0,114.0,115.0,116.0,117.0,118.0,119.0,120.0,121.0,122.0,123.0,124.0,125.0,126.0,127.0,128.0,129.0,130.0,130.0,131.0,132.0,133.0,134.0,135.0,136.0,136.0,137.0,137.0,138.0,139.0,140.0,141.0,142.0,143.0,143.0,143.0,143.0,143.0,143.0,143.0,144.0,144.0,144.0,144.0,144.0,144.0,144.0,144.0,144.0,144.0,144.0,144.0,144.0,144.0,144.0,144.0,144.0,144.0,144.0,144.0,144.0,144.0,144.0,144.0,144.0]},"kind":"numeric","n":192,"n_null":0,"n_unique":144,"null_rate":0.0,"stats":{"iqr":84.5,"kurtosis":-1.261831897362796,"max":144.0,"mean":84.515625,"median":89.5,"min":1.0,"n_outliers":0,"outlier_rate":0.0,"q1":44.75,"q3":129.25,"skew":-0.19499062085643412,"std":45.75177467018306,"zero_rate":0.0}}],"insights":{"errors":[],"insights":[{"confidence":"medium","critiques":[],"evidence_keys":["row_count","column_count","n_unique","null_rate","mean","median","skew","max","min","n_unique"],"featured_charts":[{"caption":"Look for whether parameters cluster in certain chapter ranges or are evenly spread across the 1\u2013144 chapter span.","column":"Chapter_ID","kind":"histogram"},{"caption":"Browse the top parameter names to get a sense of which linguistic features are catalogued \u2014 useful for a quick domain orientation.","column":"Name","kind":"bar"},{"caption":"Check the ID naming pattern (e.g., '1A', '2A') to understand how parameters are coded and whether letter suffixes imply sub-groupings.","column":"ID","kind":"bar"}],"model":"anthropic:default","narrative":"This dataset is a catalogue of 192 linguistic parameters from the World Atlas of Language Structures (WALS), each representing a distinct typological feature of human languages (e.g., 'Consonant Inventories', 'Vowel Quality Inventories'). Every row is uniquely identified by both an ID (e.g., '1A', '2A') and a Name, meaning there are no duplicates or groupings to aggregate within those columns. The most analytically interesting column is Chapter_ID, which groups these 192 parameters into 144 chapters \u2014 indicating that some chapters contain multiple parameters worth investigating. The Chapter_ID distribution is fairly uniform (mean ~84.5, median ~89.5, near-symmetric with slight left skew), suggesting chapters are spread across the full range with no heavy clustering.","scope":"dataset","target":"__global__"},{"confidence":"high","critiques":[],"evidence_keys":["n","n_unique","null_rate","entropy_ratio","top_rate","top_values","alerts"],"model":"anthropic:default","narrative":"This column is a row identifier, with 192 unique values across 192 rows, zero nulls, and a maximum entropy ratio of 1.0 \u2014 every value appears exactly once. The alphanumeric pattern (e.g., '1A', '2A', \u2026, '10A') suggests a structured code system, possibly denoting sequential items with a suffix category. The long_tail alert is a statistical artifact of perfect uniqueness rather than a genuine distributional concern.","role":"identifier","scope":"column","target":"ID","treatment":"Exclude from modelling features; retain as a row key for traceability or join operations."},{"confidence":"high","critiques":[],"evidence_keys":["n","n_unique","cardinality","entropy_ratio","top_value","top_values","null_rate"],"model":"anthropic:default","narrative":"This column contains the names of linguistic typology features or chapters, almost certainly from the World Atlas of Language Structures (WALS) or a similar comparative linguistics dataset. Every one of the 192 rows has a unique name (cardinality 192, entropy_ratio 1.0), meaning this column is a perfect natural-language identifier with no repetition. The top values \u2014 'Consonant Inventories', 'Vowel Quality Inventories', 'Consonant-Vowel Ratio' \u2014 confirm these are named phonological/typological feature categories, not free-form text.","role":"label","scope":"column","target":"Name","treatment":"Use as a human-readable row label or index; do not encode as a categorical feature \u2014 treat as an identifier or join key on feature name."},{"confidence":"low","critiques":[],"evidence_keys":["alerts","n","null_rate","kind","n_unique","stats"],"model":"anthropic:default","narrative":"The column 'ColumnSpec' was skipped by the profiler, yielding no distributional statistics, cardinality, or type classification beyond 192 non-null rows. Without further evidence it is impossible to characterise the column's content, range, or role. The complete absence of stats alongside the 'skipped' alert is the surprising signal here \u2014 likely due to an unsupported or complex data type (e.g. nested struct, binary, or array).","role":"other","scope":"column","target":"ColumnSpec","treatment":"Inspect raw values manually to determine type and content before assigning a profiling strategy or modelling treatment."},{"confidence":"low","critiques":[],"evidence_keys":["alerts","n","null_rate","kind","n_unique"],"model":"anthropic:default","narrative":"This column contains textual descriptions with 192 non-null rows and a null rate of 0.0%, suggesting complete population coverage. Profiling was skipped entirely (alert: 'skipped'), so no uniqueness, frequency, or length statistics are available \u2014 the column's content distribution is opaque. The 'unknown' kind designation indicates the profiler could not classify it, which is common for free-text or mixed-content fields. Downstream treatment should proceed cautiously given the absence of any statistical evidence.","role":"free_text","scope":"column","target":"Description","treatment":"Inspect raw values manually, then tokenize and embed or apply NLP preprocessing before modelling."},{"confidence":"high","critiques":[],"evidence_keys":["n","n_unique","stats.min","stats.max","stats.kurtosis","stats.skew","stats.iqr","null_rate"],"model":"anthropic:default","narrative":"This column is a chapter identifier, likely a foreign or primary key linking rows to one of 144 distinct chapters across a book or structured document. With 192 rows but only 144 unique values, each chapter ID appears on average 1.33 times, indicating some chapters have multiple associated records. The near-uniform distribution (kurtosis \u20131.26, skew \u20130.19, IQR spanning almost the full 1\u2013144 range) suggests IDs are assigned sequentially and coverage across chapters is fairly even, though the duplication pattern warrants checking whether it reflects intentional one-to-many relationships or data quality issues.","role":"foreign_key","scope":"column","target":"Chapter_ID","treatment":"Left-join on this ID to the chapters reference table; investigate the 48 duplicate entries to confirm they represent legitimate one-to-many relationships."}],"providers":["anthropic:default"],"total_usage":{"completion_tokens":1521,"prompt_tokens":4036,"total_tokens":5557}},"language_counts":{},"meta":{"generated_at":"2026-06-22T00:13:54+00:00","mode":"full","row_count":192,"sampled_rows":192,"seed":42,"source":"/home/coolhand/html/datavis/data_trove/data/linguistic/wals_parameters.csv"},"notes":[],"saturn_version":"0.2.0","schema":{"Chapter_ID":"numeric","ColumnSpec":"unknown","Description":"unknown","ID":"categorical","Name":"categorical"}}
