{"columns":[{"alerts":[{"code":"long_tail","level":"info","message":"28 singleton categories"}],"column":"shape","extras":{"singletons":28,"top_values":[["light",1],["triangle",1],["circle",1],["fireball",1],["unknown",1],["other",1],["sphere",1],["disk",1],["oval",1],["formation",1],["cigar",1],["changing",1],["flash",1],["rectangle",1],["cylinder",1],["diamond",1],["chevron",1],["teardrop",1],["egg",1],["cone",1]]},"kind":"categorical","n":28,"n_null":0,"n_unique":28,"null_rate":0.0,"stats":{"cardinality":28,"entropy":4.807354922057606,"entropy_ratio":1.0000000000000004,"top_rate":0.03571428571428571,"top_value":"light"}},{"alerts":[{"code":"high_skew","level":"info","message":"skew=+2.06"}],"column":"count","extras":{"histogram":{"counts":[19,6,2,0,1],"edges":[1.0,2576.2,5151.4,7726.599999999999,10301.8,12877.0]},"sample":[12877.0,6268.0,5885.0,4935.0,4358.0,4209.0,4131.0,3850.0,2879.0,1906.0,1569.0,1515.0,1024.0,1010.0,977.0,884.0,774.0,560.0,554.0,235.0,177.0,6.0,2.0,1.0,1.0,1.0,1.0,1.0]},"kind":"numeric","n":28,"n_null":0,"n_unique":24,"null_rate":0.0,"stats":{"iqr":3786.0,"kurtosis":4.84487195488125,"max":12877.0,"mean":2163.9285714285716,"median":993.5,"min":1.0,"n_outliers":1,"outlier_rate":0.03571428571428571,"q1":134.25,"q3":3920.25,"skew":2.0595403749984733,"std":2876.242352233738,"zero_rate":0.0}},{"alerts":[{"code":"skipped","level":"info","message":"no profiler for kind=unknown"}],"column":"sightings","extras":{},"kind":"unknown","n":28,"n_null":0,"n_unique":null,"null_rate":0.0,"stats":{}},{"alerts":[{"code":"skipped","level":"info","message":"no profiler for kind=unknown"}],"column":"yearlyTrend","extras":{},"kind":"unknown","n":28,"n_null":0,"n_unique":null,"null_rate":0.0,"stats":{}},{"alerts":[{"code":"high_skew","level":"info","message":"skew=+3.95"},{"code":"outliers","level":"warn","message":"7.1% rows beyond 1.5 IQR"}],"column":"avgDuration","extras":{"histogram":{"counts":[26,0,1,0,1],"edges":[30.0,7584.0,15138.0,22692.0,30246.0,37800.0]},"sample":[15681.518888716317,1438.8383391831524,2748.2666270178424,3161.8138865248225,6414.719573198715,4195.949322879544,1610.0007141128056,1419.1405792207793,2334.1149704758595,1021.5827806925499,2240.8626513703,2152.8633663366336,3118.99365234375,643.3143564356436,4317.102354145343,1453.2392533936652,484.32512919896647,980.40625,2193.5191335740074,1660.4340425531916,765.3107344632768,2682.5,452.5,30.0,3600.0,120.0,240.0,37800.0]},"kind":"numeric","n":28,"n_null":0,"n_unique":28,"null_rate":0.0,"stats":{"iqr":2203.0663397731987,"kurtosis":15.416452925463599,"max":37800.0,"mean":3748.61845020847,"median":1906.6487044449127,"min":30.0,"n_outliers":2,"outlier_rate":0.07142857142857142,"q1":926.6323711158192,"q3":3129.698710889018,"skew":3.9480599708675395,"std":7305.740611459118,"zero_rate":0.0}}],"insights":{"errors":[],"insights":[{"confidence":"high","critiques":[],"evidence_keys":["row_count","column_count","columns[avgDuration].stats","columns[count].stats","columns[shape].stats","columns[shape].top_values"],"featured_charts":[{"caption":"Check the long right tail \u2014 most shapes have low counts but a few exceed several thousand sightings.","column":"count","kind":"histogram"},{"caption":"Look for the extreme outliers stretching up to 37,800 against a median near 1,907.","column":"avgDuration","kind":"histogram"},{"caption":"Each shape appears once, so use this as a lookup of the 28 categories rather than a frequency view.","column":"shape","kind":"bar"}],"model":"anthropic:claude-opus-4-7","narrative":"This dataset aggregates UFO sightings by shape, with 28 rows and 5 columns covering shape categories, sighting counts, average durations, and nested sightings/yearly trend data. The numeric fields are highly skewed: avgDuration ranges from 30 to 37,800 with a mean of about 3,749 and skew near 3.95, while count ranges from 1 to 12,877 with a median of just 993.5. Both fields flag outliers worth inspecting \u2014 likely a few dominant shape categories pulling the distribution. The shape column has 28 unique values (one row per shape), so it functions as an identifier rather than a grouping variable. Start by looking at which shapes drive the count and duration extremes.","scope":"dataset","target":"__global__"},{"confidence":"high","critiques":[],"evidence_keys":["n","n_unique","cardinality","entropy_ratio","top_rate","top_value","top_values","null_rate"],"model":"anthropic:claude-opus-4-7","narrative":"This column enumerates UFO shape descriptors (light, triangle, circle, fireball, sphere, disk, oval, formation, etc.). Every one of the 28 rows holds a distinct value, giving cardinality 28 and entropy_ratio 1.0 \u2014 the column behaves as a unique key rather than a categorical feature. The presence of bucket terms like 'unknown' and 'other' alongside specific shapes suggests this is a reference/lookup list of shape categories, not observations.","role":"identifier","scope":"column","target":"shape","treatment":"Treat as a lookup dimension; left-join on this key rather than using as a model feature."},{"confidence":"high","critiques":[],"evidence_keys":["n","n_unique","null_rate","zero_rate","stats.min","stats.max","stats.mean","stats.median","stats.skew","stats.kurtosis","stats.iqr","stats.std","stats.n_outliers"],"model":"anthropic:claude-opus-4-7","narrative":"Numeric tally column with 28 rows, 24 unique values, no nulls or zeros, ranging from 1 to 12877 with a median of 993.5 and mean of 2163.93. The distribution is heavily right-skewed (skew 2.06, kurtosis 4.84) with one outlier flagged at the high end and an IQR of 3786 against a std of 2876.24.","role":"feature","scope":"column","target":"count","treatment":"log-transform before modelling to tame the right skew."},{"confidence":"low","critiques":[],"evidence_keys":["alerts","kind","n","null_rate","n_unique","stats"],"model":"anthropic:claude-opus-4-7","narrative":"The column 'sightings' was skipped by the profiler, so its kind is unknown and no descriptive statistics were computed. The only confirmed facts are 28 rows and a null rate of 0.0; cardinality and value distribution are unavailable.","role":"other","scope":"column","target":"sightings","treatment":"Re-profile or inspect manually to determine type before any downstream use."},{"confidence":"low","critiques":[],"evidence_keys":["alerts","kind","n","null_rate","stats"],"model":"anthropic:claude-opus-4-7","narrative":"The column 'yearlyTrend' was skipped by the profiler, so its kind is unknown and no statistics were computed beyond a row count of 28 and a null rate of 0.0. With no uniqueness, type, or value signals available, its content and role cannot be inferred from this evidence.","role":"other","scope":"column","target":"yearlyTrend","treatment":"Re-profile this column with parsing enabled before deciding on any downstream handling."},{"confidence":"high","critiques":[],"evidence_keys":["n","n_unique","null_rate","stats.mean","stats.median","stats.min","stats.max","stats.std","stats.skew","stats.kurtosis","stats.q1","stats.q3","stats.iqr","stats.n_outliers","stats.outlier_rate"],"model":"anthropic:claude-opus-4-7","narrative":"Likely a per-group average duration metric (probably seconds) summarised across 28 unique entities with no nulls. The distribution is heavily right-skewed (skew 3.95, kurtosis 15.42) with a median of 1906.65 but a max of 37800 \u2014 roughly 20x the median \u2014 and 2 outliers (7.14%) pulling the mean up to 3748.62. Standard deviation (7305.74) exceeds the mean, confirming a long tail.","role":"feature","scope":"column","target":"avgDuration","treatment":"Log-transform before modelling to tame the right tail and outliers."}],"providers":["anthropic:claude-opus-4-7"],"total_usage":{"completion_tokens":1732,"prompt_tokens":4833,"total_tokens":6565}},"language_counts":{},"meta":{"generated_at":"2026-05-01T17:08:23+00:00","mode":"full","row_count":28,"sampled_rows":28,"seed":42,"source":"/home/coolhand/html/datavis/data_trove/data/quirky/ufo_shapes_aggregated.json"},"notes":[],"saturn_version":"0.2.0","schema":{"avgDuration":"numeric","count":"numeric","shape":"categorical","sightings":"unknown","yearlyTrend":"unknown"}}
