data-trove-steam-games-catalog

Overview

Source: /home/coolhand/html/datavis/data_trove/entertainment/gaming/enriched/games.csv

Saturn profiled 122,611 rows across 40 columns. The stats below are deterministic and machine-readable; the prose is a language-model interpretation of those stats (opt-in, added after the fact, never sees raw rows).

[2]:

!pip install saturn-dissect
import subprocess
subprocess.run([
    "saturn", "analyze", "/home/coolhand/html/datavis/data_trove/entertainment/gaming/enriched/games.csv",
    "--findings", "data-trove-steam-games-catalog.json",
    "--llm", "anthropic:default",
])

Summary confidence: high

This is a Steam games catalogue with 122,611 rows and 40 columns, covering titles, publishers, developers, genres, pricing, review counts, and associated URLs. The first thing to examine is the extreme skew across nearly all numeric engagement columns (column23, column24, column27, column29, column31): medians sit at 0–5 while means run into the hundreds or thousands, so a tiny fraction of blockbuster titles accounts for the vast majority of reviews and activity. A second area worth attention is genre distribution (column36), where a handful of Casual/Indie/Action combinations account for the bulk of the catalogue. The estimated owner-count banding (column03) tells the same story: 61.5% of games fall in the 0–20,000 band and a further 17.7% in 0–0, so about 79% of the catalogue has fewer than 20,000 estimated owners, pointing to a long-tail market dominated by low-visibility titles.

citing: column23.stats.median · column23.stats.mean · column27.stats.zero_rate · column29.stats.zero_rate · column03.top_values · column03.stats.top_rate · column36.top_values · column36.stats.duplicate_rate · column06.stats.median · column06.stats.mean

Out[4]:

saturn.schema() · 40 columns

column	kind	n	null%	unique	alerts
column00	numeric	122,611	0.0%	122,611
column01	text	122,611	0.0%	121,454	near_unique
column02	text	122,611	0.0%	5,081	short_text duplicates
column03	categorical	122,611	0.0%	14
column04	numeric	122,611	0.0%	1,110	high_skew outliers
column05	numeric	122,611	0.0%	15	high_skew
column06	numeric	122,611	0.0%	941	high_skew outliers
column07	numeric	122,611	0.0%	88
column08	numeric	122,611	0.0%	117	high_skew outliers
column09	text	122,611	6.9%	113,556	near_unique
column10	text	122,611	0.0%	19,113	one_word duplicates
column11	text	122,611	0.0%	3,710	one_word duplicates
column12	text	122,611	90.2%	11,884	near_unique null_rate
column13	text	122,611	0.1%	122,420	near_unique one_word url_heavy
column14	text	122,611	59.5%	39,703	one_word url_heavy null_rate duplicates
column15	text	122,611	55.8%	35,399	one_word url_heavy null_rate duplicates
column16	text	122,611	18.1%	60,519	one_word duplicates
column17	categorical	122,611	0.0%	2	imbalance
column18	categorical	122,611	0.0%	2
column19	categorical	122,611	0.0%	2
column20	numeric	122,611	0.0%	73	high_skew
column21	text	122,611	96.5%	4,160	near_unique one_word url_heavy null_rate
column22	numeric	122,611	0.0%	31	high_skew
column23	numeric	122,611	0.0%	5,540	high_skew outliers
column24	numeric	122,611	0.0%	2,725	high_skew outliers
column25	numeric	122,611	100.0%	3	null_rate
column26	numeric	122,611	0.0%	448	high_skew outliers
column27	numeric	122,611	0.0%	5,332	high_skew outliers
column28	text	122,611	81.7%	18,620	multilingual null_rate
column29	numeric	122,611	0.0%	3,037	high_skew outliers
column30	numeric	122,611	0.0%	993	high_skew
column31	numeric	122,611	0.0%	2,511	high_skew outliers
column32	numeric	122,611	0.0%	993	high_skew
column33	text	122,611	6.9%	70,816	one_word duplicates
column34	text	122,611	7.2%	62,689	one_word duplicates
column35	text	122,611	7.3%	13,291	duplicates
column36	text	122,611	6.9%	2,894	one_word duplicates
column37	text	122,611	32.0%	77,179	multilingual null_rate
column38	text	122,611	4.9%	116,483	near_unique one_word url_heavy
column39	unknown	122,611	0.0%	—	skipped

Fig 1.

column03 · Owner-count bands reveal the long tail: 61.5% of games sit in the 0–20,000 band and another 17.7% in 0–0, with rapid drop-off at higher counts.

Top values for column03 (14 unique shown, of 14 total).
value	count	share
0 - 20000	75404	61.5%
0 - 0	21641	17.7%
20000 - 50000	11396	9.3%
50000 - 100000	5355	4.4%
100000 - 200000	3454	2.8%
200000 - 500000	2853	2.3%
500000 - 1000000	1154	0.9%
1000000 - 2000000	729	0.6%
2000000 - 5000000	405	0.3%
5000000 - 10000000	125	0.1%
10000000 - 20000000	51	0.0%
20000000 - 50000000	31	0.0%
50000000 - 100000000	9	0.0%
100000000 - 200000000	4	0.0%
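
The long-tail reading can be checked directly from the table above. The following is a minimal sketch that sums the listed counts for every band whose upper edge is at most 20,000; the names `band_counts` and `upper_bound` are illustrative and are not part of Saturn's output.

```python
# Counts copied from the column03 top-values table (Fig 1).
band_counts = {
    "0 - 20000": 75404,
    "0 - 0": 21641,
    "20000 - 50000": 11396,
    "50000 - 100000": 5355,
    "100000 - 200000": 3454,
    "200000 - 500000": 2853,
    "500000 - 1000000": 1154,
    "1000000 - 2000000": 729,
    "2000000 - 5000000": 405,
    "5000000 - 10000000": 125,
    "10000000 - 20000000": 51,
    "20000000 - 50000000": 31,
    "50000000 - 100000000": 9,
    "100000000 - 200000000": 4,
}


def upper_bound(label: str) -> int:
    """Return the upper edge of a 'low - high' band label."""
    return int(label.split(" - ")[1])


total = sum(band_counts.values())
# Rows in bands whose upper edge does not exceed 20,000 ('0 - 0' and '0 - 20000').
below_20k = sum(n for label, n in band_counts.items() if upper_bound(label) <= 20_000)
share_below_20k = below_20k / total
print(f"{below_20k:,} of {total:,} rows ({share_below_20k:.1%}) are in the two lowest bands")
```

The counts sum to the 122,611 profiled rows, which is a quick sanity check that the table lists every band.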

Fig 2.

column36 · Genre tag combinations show Casual, Indie, and Action dominate Steam listings — look for which combos are over-represented relative to their review volumes.

Character-length distribution for column36 (mean: 22.205064887301003).
chars	count
3 – 9	12259
9 – 15	21084
15 – 20	22318
20 – 26	25837
26 – 32	12284
32 – 38	8026
38 – 44	5596
44 – 50	2995
50 – 55	1587
55 – 61	848
61 – 67	593
67 – 73	229
73 – 79	196
79 – 85	137
85 – 90	71
90 – 96	35
96 – 102	37
102 – 108	13
108 – 114	15
114 – 120	10
120 – 125	6
125 – 131	6
131 – 137	4
137 – 143	2
143 – 149	2
149 – 154	1
154 – 160	0
160 – 166	1
166 – 172	4
172 – 178	0
178 – 184	0
184 – 189	0
189 – 195	0
195 – 201	0
201 – 207	0
207 – 213	1
213 – 219	0
219 – 224	0
224 – 230	0
230 – 236	1

Fig 3.

column23 · Review counts are extremely right-skewed with a median of 5 and a max of over 7 million, illustrating the blockbuster vs. obscurity divide.

Histogram bins for column23 (median: 5.0).
bin	count
0 – 1.911e+05	122511
1.911e+05 – 3.821e+05	57
3.821e+05 – 5.732e+05	16
5.732e+05 – 7.642e+05	10
7.642e+05 – 9.553e+05	6
9.553e+05 – 1.146e+06	5
1.146e+06 – 1.337e+06	1
1.337e+06 – 1.528e+06	2
1.528e+06 – 1.719e+06	0
1.719e+06 – 1.911e+06	1
1.911e+06 – 2.102e+06	1
2.102e+06 – 2.293e+06	0
2.293e+06 – 2.484e+06	0
2.484e+06 – 2.675e+06	0
2.675e+06 – 2.866e+06	0
2.866e+06 – 3.057e+06	0
3.057e+06 – 3.248e+06	0
3.248e+06 – 3.439e+06	0
3.439e+06 – 3.63e+06	0
3.63e+06 – 3.821e+06	0
3.821e+06 – 4.012e+06	0
4.012e+06 – 4.203e+06	0
4.203e+06 – 4.394e+06	0
4.394e+06 – 4.585e+06	0
4.585e+06 – 4.776e+06	0
4.776e+06 – 4.967e+06	0
4.967e+06 – 5.158e+06	0
5.158e+06 – 5.349e+06	0
5.349e+06 – 5.541e+06	0
5.541e+06 – 5.732e+06	0
5.732e+06 – 5.923e+06	0
5.923e+06 – 6.114e+06	0
6.114e+06 – 6.305e+06	0
6.305e+06 – 6.496e+06	0
6.496e+06 – 6.687e+06	0
6.687e+06 – 6.878e+06	0
6.878e+06 – 7.069e+06	0
7.069e+06 – 7.26e+06	0
7.26e+06 – 7.451e+06	0
7.451e+06 – 7.642e+06	1

Fig 4.

column06 · Price distribution is heavily skewed toward low values (median $2.24, mean $4.77) with a long tail up to $999.98 — check for outlier pricing strategies.

Histogram bins for column06 (median: 2.24).
bin	count
0 – 25	120926
25 – 50	1081
50 – 75	248
75 – 100	47
100 – 125	6
125 – 150	13
150 – 175	1
175 – 200	282
200 – 225	0
225 – 250	0
250 – 275	1
275 – 300	2
300 – 325	0
325 – 350	0
350 – 375	0
375 – 400	0
400 – 425	0
425 – 450	0
450 – 475	0
475 – 500	0
500 – 525	1
525 – 550	0
550 – 575	0
575 – 600	0
600 – 625	0
625 – 650	0
650 – 675	0
675 – 700	0
700 – 725	0
725 – 750	0
750 – 775	0
775 – 800	0
800 – 825	0
825 – 850	0
850 – 875	0
875 – 900	0
900 – 925	0
925 – 950	0
950 – 975	0
975 – 1000	3

Fig 5.

column02 · Release dates cluster heavily in 2024–2025, showing recent acceleration in Steam publishing volume worth comparing against owner and review counts.

Character-length distribution for column02 (mean: 11.718671244831214).
chars	count
11	34494
12	88117

Fig 6.

Per-column null rate across the corpus. Columns are ordered by input position.

Per-column null rate across the corpus.
column	kind	null %
column00	numeric	0.0%
column01	text	0.0%
column02	text	0.0%
column03	categorical	0.0%
column04	numeric	0.0%
column05	numeric	0.0%
column06	numeric	0.0%
column07	numeric	0.0%
column08	numeric	0.0%
column09	text	6.9%
column10	text	0.0%
column11	text	0.0%
column12	text	90.2%
column13	text	0.1%
column14	text	59.5%
column15	text	55.8%
column16	text	18.1%
column17	categorical	0.0%
column18	categorical	0.0%
column19	categorical	0.0%
column20	numeric	0.0%
column21	text	96.5%
column22	numeric	0.0%
column23	numeric	0.0%
column24	numeric	0.0%
column25	numeric	100.0%
column26	numeric	0.0%
column27	numeric	0.0%
column28	text	81.7%
column29	numeric	0.0%
column30	numeric	0.0%
column31	numeric	0.0%
column32	numeric	0.0%
column33	text	6.9%
column34	text	7.2%
column35	text	7.3%
column36	text	6.9%
column37	text	32.0%
column38	text	4.9%
column39	unknown	0.0%

Fig 7.

Language mix across all text columns (per-string detection, sampled).

Per-language counts (total 18,468 detected strings).
lang	count	share
en	18419	99.7%
da	12	0.1%
de	10	0.1%
zh	9	0.0%
ja	9	0.0%
es	6	0.0%
pt	1	0.0%
fr	1	0.0%
ca	1	0.0%

Fig 8.

Pearson correlation across numeric columns (sampled, bounded).

Pearson correlation across 12 numeric columns (values clipped to 2 decimals).
	column00	column04	column05	column06	column07	column08	column20	column22	column23	column24	column25	column26
column00	+1.00	+0.19	+nan	-0.04	-0.16	-0.05	-0.25	+nan	-0.43	-0.49	-0.06	-0.29
column04	+0.19	+1.00	+nan	-0.04	-0.06	+0.38	-0.03	+nan	+0.11	-0.05	-0.04	-0.04
column05	+nan	+nan	+nan	+nan	+nan	+nan	+nan	+nan	+nan	+nan	+nan	+nan
column06	-0.04	-0.04	+nan	+1.00	-0.17	+0.10	-0.08	+nan	+0.12	-0.04	-0.27	+0.14
column07	-0.16	-0.06	+nan	-0.17	+1.00	+0.09	-0.06	+nan	+0.10	+0.26	+0.01	-0.04
column08	-0.05	+0.38	+nan	+0.10	+0.09	+1.00	-0.07	+nan	+0.19	+0.24	-0.22	+0.20
column20	-0.25	-0.03	+nan	-0.08	-0.06	-0.07	+1.00	+nan	-0.09	-0.07	+0.20	+0.08
column22	+nan	+nan	+nan	+nan	+nan	+nan	+nan	+nan	+nan	+nan	+nan	+nan
column23	-0.43	+0.11	+nan	+0.12	+0.10	+0.19	-0.09	+nan	+1.00	+0.71	-0.10	+0.21
column24	-0.49	-0.05	+nan	-0.04	+0.26	+0.24	-0.07	+nan	+0.71	+1.00	-0.14	+0.12
column25	-0.06	-0.04	+nan	-0.27	+0.01	-0.22	+0.20	+nan	-0.10	-0.14	+1.00	+0.08
column26	-0.29	-0.04	+nan	+0.14	-0.04	+0.20	+0.08	+nan	+0.21	+0.12	+0.08	+1.00

column00 numeric identifier

This column contains 122,611 numeric values that are all unique, null-free, and span from 10 to 4,264,350, which strongly suggests a unique numeric identifier; in a Steam catalogue this is most plausibly the app ID, though the stats alone cannot confirm it. The distribution is broad and roughly flat over most of the range: kurtosis is -1.05 (a perfectly uniform distribution gives -1.2), skew is a negligible 0.18, and no outliers are detected, which is unusual for a natural measurement and consistent with a sequentially assigned integer key. The IQR of 1,806,385 covers about 42% of the full range, against 50% for a perfectly uniform spread; Fig 9 shows why, with counts thinning at both ends of the ID space.

Treatment: Drop before modelling or use as a row key only; do not use as a predictive feature.

anthropic:default · confidence high

Out[14]:

saturn.columns["column00"].stats

stat	value
n	122,611
nulls	0 (0.0%)
unique	122,611
min	10
max	4.264e+06
mean	1.985e+06
median	1.907e+06
std	1.088e+06
q1	1.063e+06
q3	2.87e+06
iqr	1.806e+06
skew	0.1772
kurtosis	-1.05
n_outliers	0
outlier_rate	0
zero_rate	0

Fig 9.

Distribution of column00. Vertical dash marks the median.

Histogram bins for column00 (median: 1907380.0).
bin	count
10 – 1.066e+05	1208
1.066e+05 – 2.132e+05	280
2.132e+05 – 3.198e+05	2468
3.198e+05 – 4.264e+05	3789
4.264e+05 – 5.331e+05	3626
5.331e+05 – 6.397e+05	3651
6.397e+05 – 7.463e+05	4043
7.463e+05 – 8.529e+05	4024
8.529e+05 – 9.595e+05	3772
9.595e+05 – 1.066e+06	3926
1.066e+06 – 1.173e+06	4091
1.173e+06 – 1.279e+06	3895
1.279e+06 – 1.386e+06	3659
1.386e+06 – 1.493e+06	3825
1.493e+06 – 1.599e+06	3975
1.599e+06 – 1.706e+06	3815
1.706e+06 – 1.812e+06	3854
1.812e+06 – 1.919e+06	3820
1.919e+06 – 2.026e+06	3749
2.026e+06 – 2.132e+06	2991
2.132e+06 – 2.239e+06	3698
2.239e+06 – 2.345e+06	3661
2.345e+06 – 2.452e+06	3593
2.452e+06 – 2.559e+06	3305
2.559e+06 – 2.665e+06	3120
2.665e+06 – 2.772e+06	3232
2.772e+06 – 2.878e+06	3172
2.878e+06 – 2.985e+06	3134
2.985e+06 – 3.092e+06	3098
3.092e+06 – 3.198e+06	3011
3.198e+06 – 3.305e+06	2792
3.305e+06 – 3.411e+06	2880
3.411e+06 – 3.518e+06	2626
3.518e+06 – 3.625e+06	2386
3.625e+06 – 3.731e+06	2314
3.731e+06 – 3.838e+06	2229
3.838e+06 – 3.945e+06	2212
3.945e+06 – 4.051e+06	1778
4.051e+06 – 4.158e+06	1353
4.158e+06 – 4.264e+06	556

column01 text label

This column contains short, near-unique text strings averaging ~3 words and 18 characters, consistent with product titles; given the catalogue, these are most likely game names. The dominant top words ('playtest', 'vr', 'simulator') fit that reading, plausibly reflecting playtest builds, VR titles, and simulator games among the listings. Notable signals include 1,156 duplicates (~0.94% duplicate rate) despite the near-unique alert, a small emoji presence (0.26%), and a maximum length of 413 characters, which is anomalously long relative to the median of 16.

Treatment: Use as a descriptive label; deduplicate or flag the 1,156 repeated entries, and investigate the long-tail outliers (len_max 413) before any downstream grouping or embedding.

anthropic:default · confidence medium
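
The deduplication step in the treatment can be sketched as follows. This is a minimal example on made-up titles, assuming exact string matching is the intended notion of a duplicate; `flag_duplicates` is a hypothetical helper, not a Saturn function.

```python
from collections import Counter


def flag_duplicates(titles):
    """Return one boolean per title: True when the exact string occurs more than once."""
    counts = Counter(t for t in titles if t is not None)
    return [t is not None and counts[t] > 1 for t in titles]


# Hypothetical sample values, not rows from the dataset.
sample = ["Alpha", "Beta Playtest", "Alpha", None, "Gamma VR"]
flags = flag_duplicates(sample)
print(flags)
```

This flags every occurrence of a repeated title, so the number of flagged rows will be larger than `n_duplicates`, which appears to count only the repeats beyond the first occurrence (non-null rows minus unique values).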

Out[17]:

saturn.columns["column01"].stats

stat	value
n	122,611
nulls	1 (0.0%)
unique	121,454
len_min	1
len_max	413
len_mean	18.07
len_median	16
len_p95	37.55
word_mean	2.912
word_median	3
n_empty	0
n_duplicates	1,156
duplicate_rate	0.009428
vocab_size	18,813
readability_flesch_mean	52.87
emoji_rate	0.002585
url_rate	0
one_word_rate	0.1866
allcaps_rate	0.06731
boilerplate_rate	4.078e-05
alert: near_unique	99.1% of rows are unique strings

Fig 10.

Character-length distribution for column01.

Character-length distribution for column01 (mean: 18.069627273468722).
chars	count
1 – 11	34028
11 – 22	53727
22 – 32	22850
32 – 42	8502
42 – 52	2281
52 – 63	858
63 – 73	213
73 – 83	69
83 – 94	28
94 – 104	21
104 – 114	13
114 – 125	8
125 – 135	4
135 – 145	3
145 – 156	1
156 – 166	1
166 – 176	0
176 – 186	1
186 – 197	0
197 – 207	0
207 – 217	0
217 – 228	0
228 – 238	0
238 – 248	0
248 – 258	1
258 – 269	0
269 – 279	0
279 – 289	0
289 – 300	0
300 – 310	0
310 – 320	0
320 – 331	0
331 – 341	0
341 – 351	0
351 – 362	0
362 – 372	0
372 – 382	0
382 – 392	0
392 – 403	0
403 – 413	1

column02 text timestamp

This column contains dates formatted as 'Mon DD, YYYY' (e.g., 'Oct 23, 2025'), stored as text rather than a native date type. The values span at least 2021–2025 based on top word frequencies. The duplicate rate is 95.86%: all 122,611 rows take one of only 5,081 distinct dates, so 117,530 rows repeat a date already seen elsewhere and many records map to the same calendar day. The near-constant string length (min 11, median 12, max 12; the two lengths are consistent with one- and two-digit days) and a vocabulary of just 68 tokens confirm this is a tightly formatted date field with no free-text variation.

Treatment: Parse to a native date type (e.g., datetime64) before any time-series analysis or feature engineering.

anthropic:default · confidence high
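
The parsing step can be sketched with the standard library alone. This is a minimal example, assuming English month abbreviations under the default locale; `parse_release_date` is a hypothetical helper, and 'Oct 3, 2025' is an invented value used only to show the one-digit-day case.

```python
from datetime import date, datetime


def parse_release_date(text):
    """Parse 'Mon D, YYYY' or 'Mon DD, YYYY'; return None when the text does not match."""
    try:
        return datetime.strptime(text.strip(), "%b %d, %Y").date()
    except (ValueError, AttributeError):
        # ValueError: wrong format; AttributeError: non-string input such as None.
        return None


parsed = [parse_release_date(s) for s in ["Oct 23, 2025", "Oct 3, 2025", "not a date", None]]
print(parsed)
```

Returning None instead of raising keeps unparseable rows visible, so they can be counted before any time-series work.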

Out[20]:

saturn.columns["column02"].stats

stat	value
n	122,611
nulls	0 (0.0%)
unique	5,081
len_min	11
len_max	12
len_mean	11.72
len_median	12
len_p95	12
word_mean	3
word_median	3
n_empty	0
n_duplicates	117,530
duplicate_rate	0.9586
vocab_size	68
readability_flesch_mean	98.6
emoji_rate	0
url_rate	0
one_word_rate	0
allcaps_rate	0
boilerplate_rate	0
alert: short_text	95th-percentile length under 20 chars
alert: duplicates	95.9% duplicate strings

Fig 11.

Character-length distribution for column02.

Character-length distribution for column02 (mean: 11.718671244831214).
chars	count
11	34494
12	88117

column03 categorical feature

This column encodes a numeric quantity as binned range labels. Given the catalogue context and the summary above, these are most plausibly estimated owner-count bands: the scale runs from 0 to 200,000,000 with bin edges on a roughly logarithmic 1-2-5 progression. The distribution is heavily concentrated at the low end: 61.5% of rows fall in the '0 - 20000' bucket alone, and a notable 21,641 rows (17.7%) sit in '0 - 0', a zero-value spike that may warrant separate treatment. With only 14 distinct values and zero nulls across 122,611 rows, the encoding is clean but lossy.

Treatment: Ordinal-encode using the natural bin order, or extract bin midpoints as a numeric approximation; investigate the '0 - 0' segment (21,641 rows) as a potential distinct class.

anthropic:default · confidence high
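
Both encodings named in the treatment can be derived from the labels themselves. The following is a minimal sketch, assuming every label follows the 'low - high' pattern seen in the top-values table; `parse_band`, `ordinal`, and `midpoint` are illustrative names.

```python
def parse_band(label):
    """Split a 'low - high' label into a (low, high) tuple of integers."""
    low, high = (int(part) for part in label.split(" - "))
    return low, high


# The 14 labels listed for column03.
bands = [
    "0 - 20000", "0 - 0", "20000 - 50000", "50000 - 100000",
    "100000 - 200000", "200000 - 500000", "500000 - 1000000",
    "1000000 - 2000000", "2000000 - 5000000", "5000000 - 10000000",
    "10000000 - 20000000", "20000000 - 50000000",
    "50000000 - 100000000", "100000000 - 200000000",
]

# Ordinal codes follow the numeric bin order, so '0 - 0' gets rank 0.
ordered = sorted(bands, key=parse_band)
ordinal = {label: rank for rank, label in enumerate(ordered)}

# Midpoints give a rough numeric approximation of each band.
midpoint = {label: sum(parse_band(label)) / 2 for label in bands}
```

Sorting on the parsed tuple rather than the raw string matters here: a plain string sort would place '100000 - 200000' before '20000 - 50000'. Keeping '0 - 0' as its own rank also preserves it as a distinct class.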

Out[23]:

saturn.columns["column03"].stats

stat	value
n	122,611
nulls	0 (0.0%)
unique	14
top_value	0 - 20000
top_rate	0.615
cardinality	14
entropy	1.814
entropy_ratio	0.4764

Fig 12.

Top values for column03.

Top values for column03 (14 unique shown, of 14 total).
value	count	share
0 - 20000	75404	61.5%
0 - 0	21641	17.7%
20000 - 50000	11396	9.3%
50000 - 100000	5355	4.4%
100000 - 200000	3454	2.8%
200000 - 500000	2853	2.3%
500000 - 1000000	1154	0.9%
1000000 - 2000000	729	0.6%
2000000 - 5000000	405	0.3%
5000000 - 10000000	125	0.1%
10000000 - 20000000	51	0.0%
20000000 - 50000000	31	0.0%
50000000 - 100000000	9	0.0%
100000000 - 200000000	4	0.0%

column04 numeric feature

This column is a heavily zero-inflated numeric field (likely a count, transaction amount, or event frequency): 83.95% of values are exactly zero and the interquartile range is 0.0, meaning the entire middle half of the distribution is flat at zero. The remaining values are extremely right-skewed (skew = 209.95, kurtosis = 51452.44) with a max of 1,013,936 against a mean of only 54.59, indicating a small number of very large values. Because the IQR is zero, the 1.5 IQR rule flags every non-zero row, so the 16.05% outlier rate (19,676 rows) is simply the complement of the zero rate rather than evidence of widespread anomalies. The 1,110 unique values and zero null rate suggest this may be a sparse activity or volume metric.

Treatment: Consider a two-part model (zero-inflation indicator + log1p transform on non-zero values) or cap/winsorize at a high percentile before modelling.

anthropic:default · confidence high
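
The two-part idea in the treatment can be sketched as a pair of derived features per row. This is a minimal illustration, assuming non-negative inputs as the stats indicate (min 0); `two_part_features` is a hypothetical helper and the sample values are chosen only to include zeros and the reported maximum.

```python
import math


def two_part_features(values):
    """Map each value to (non-zero indicator, log1p magnitude)."""
    return [(1 if v > 0 else 0, math.log1p(v) if v > 0 else 0.0) for v in values]


features = two_part_features([0, 0, 3, 1013936])
for indicator, magnitude in features:
    print(indicator, round(magnitude, 3))
```

The indicator carries the zero-versus-non-zero signal on its own, while log1p compresses the heavy tail of the non-zero part so that the largest values no longer dominate scaling.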

Out[26]:

saturn.columns["column04"].stats

stat	value
n	122,611
nulls	0 (0.0%)
unique	1,110
min	0
max	1.014e+06
mean	54.59
median	0
std	3729
q1	0
q3	0
iqr	0
skew	210
kurtosis	5.145e+04
n_outliers	19,676
outlier_rate	0.1605
zero_rate	0.8395
alert: high_skew	skew=+209.95
alert: outliers	16.0% rows beyond 1.5 IQR

Fig 13.

Distribution of column04. Vertical dash marks the median.

Histogram bins for column04 (median: 0.0).
bin	count
0 – 2.535e+04	122573
2.535e+04 – 5.07e+04	22
5.07e+04 – 7.605e+04	6
7.605e+04 – 1.014e+05	2
1.014e+05 – 1.267e+05	2
1.267e+05 – 1.521e+05	1
1.521e+05 – 1.774e+05	2
1.774e+05 – 2.028e+05	0
2.028e+05 – 2.281e+05	0
2.281e+05 – 2.535e+05	0
2.535e+05 – 2.788e+05	0
2.788e+05 – 3.042e+05	0
3.042e+05 – 3.295e+05	1
3.295e+05 – 3.549e+05	0
3.549e+05 – 3.802e+05	0
3.802e+05 – 4.056e+05	0
4.056e+05 – 4.309e+05	0
4.309e+05 – 4.563e+05	0
4.563e+05 – 4.816e+05	0
4.816e+05 – 5.07e+05	0
5.07e+05 – 5.323e+05	0
5.323e+05 – 5.577e+05	0
5.577e+05 – 5.83e+05	0
5.83e+05 – 6.084e+05	0
6.084e+05 – 6.337e+05	1
6.337e+05 – 6.591e+05	0
6.591e+05 – 6.844e+05	0
6.844e+05 – 7.098e+05	0
7.098e+05 – 7.351e+05	0
7.351e+05 – 7.605e+05	0
7.605e+05 – 7.858e+05	0
7.858e+05 – 8.111e+05	0
8.111e+05 – 8.365e+05	0
8.365e+05 – 8.618e+05	0
8.618e+05 – 8.872e+05	0
8.872e+05 – 9.125e+05	0
9.125e+05 – 9.379e+05	0
9.379e+05 – 9.632e+05	0
9.632e+05 – 9.886e+05	0
9.886e+05 – 1.014e+06	1

column05 numeric feature

This column is a low-cardinality integer field (only 15 distinct values, range 0–21) where 98.96% of rows are exactly zero. The distribution is severely right-skewed (skew 9.88, kurtosis 96.52), and only 1,272 rows (1.04%), all flagged as outliers, carry any non-zero value; the IQR is zero because all three quartiles collapse to 0. Fig 14 shows the non-zero values clustering around 13, 17, and 18 rather than decaying from 1 upward, which fits a threshold such as a minimum age rating better than a count of rare events; the stats alone cannot confirm either reading.

Treatment: Treat as a sparse count; consider binarising (0 vs >0) or applying log1p transform, and flag the 1,272 non-zero rows as a minority sub-population for modelling.

anthropic:default · confidence high

Out[29]:

saturn.columns["column05"].stats

stat	value
n	122,611
nulls	0 (0.0%)
unique	15
min	0
max	21
mean	0.1676
median	0
std	1.654
q1	0
q3	0
iqr	0
skew	9.883
kurtosis	96.52
n_outliers	1,272
outlier_rate	0.01037
zero_rate	0.9896
alert: high_skew	skew=+9.88

Fig 14.

Distribution of column05. Vertical dash marks the median.

Histogram bins for column05 (median: 0.0).
bin	count
0 – 0.525	121339
0.525 – 1.05	2
1.05 – 1.575	0
1.575 – 2.1	0
2.1 – 2.625	0
2.625 – 3.15	6
3.15 – 3.675	0
3.675 – 4.2	0
4.2 – 4.725	0
4.725 – 5.25	0
5.25 – 5.775	0
5.775 – 6.3	4
6.3 – 6.825	0
6.825 – 7.35	5
7.35 – 7.875	0
7.875 – 8.4	0
8.4 – 8.925	0
8.925 – 9.45	0
9.45 – 9.975	0
9.975 – 10.5	26
10.5 – 11.03	0
11.03 – 11.55	0
11.55 – 12.08	23
12.08 – 12.6	0
12.6 – 13.12	175
13.12 – 13.65	0
13.65 – 14.18	1
14.18 – 14.7	0
14.7 – 15.23	3
15.23 – 15.75	0
15.75 – 16.28	32
16.28 – 16.8	0
16.8 – 17.32	828
17.32 – 17.85	0
17.85 – 18.38	164
18.38 – 18.9	0
18.9 – 19.43	0
19.43 – 19.95	0
19.95 – 20.48	1
20.48 – 21	2

column06 numeric feature

This column is a continuous non-negative measure where most values are small; the overview (Fig 4) reads it as price in dollars, which fits the scale. The distribution is extreme: the median is 2.24 and Q3 is only 5.24, yet the max reaches 999.98, producing a skew of 22.4 and a kurtosis of 1,135. About 7.6% of rows (9,297) are flagged as outliers, and 21.4% of values are exactly zero, suggesting a two-part structure (zero-inflation plus a heavy-tailed positive component) that would violate standard regression assumptions.

Treatment: Model the zero-inflation separately (e.g., hurdle or Tweedie model), then log1p-transform the positive portion before regression or scaling.

anthropic:default · confidence medium

Out[32]:

saturn.columns["column06"].stats

stat	value
n	122,611
nulls	0 (0.0%)
unique	941
min	0
max	1000
mean	4.765
median	2.24
std	12.53
q1	0.55
q3	5.24
iqr	4.69
skew	22.4
kurtosis	1135
n_outliers	9,297
outlier_rate	0.07583
zero_rate	0.2137
alert: high_skew	skew=+22.40
alert: outliers	7.6% rows beyond 1.5 IQR

Fig 15.

Distribution of column06. Vertical dash marks the median.

Histogram bins for column06 (median: 2.24).
bin	count
0 – 25	120926
25 – 50	1081
50 – 75	248
75 – 100	47
100 – 125	6
125 – 150	13
150 – 175	1
175 – 200	282
200 – 225	0
225 – 250	0
250 – 275	1
275 – 300	2
300 – 325	0
325 – 350	0
350 – 375	0
375 – 400	0
400 – 425	0
425 – 450	0
450 – 475	0
475 – 500	0
500 – 525	1
525 – 550	0
550 – 575	0
575 – 600	0
600 – 625	0
625 – 650	0
650 – 675	0
675 – 700	0
700 – 725	0
725 – 750	0
750 – 775	0
775 – 800	0
800 – 825	0
825 – 850	0
850 – 875	0
875 – 900	0
900 – 925	0
925 – 950	0
950 – 975	0
975 – 1000	3

column07 numeric feature

This column is a bounded numeric score or percentage, ranging from 0 to 100 with only 88 distinct values, suggesting a discretized or rounded measurement (e.g., a completion rate, satisfaction score, or grade). The most striking feature is that 66.8% of values are exactly zero, making the distribution heavily zero-inflated; the median is 0.0 while the mean is 18.35 and Q3 is only 40.0, confirming the mass is concentrated at the floor. The near-zero kurtosis (−0.05) describes the combined distribution and should not be read as a flat non-zero portion: Fig 16 shows no values between the zero bin and 10, and pronounced spikes at round values, the largest in the 50–52.5 bin (9,742 rows). Analysts should treat this as a zero-inflated bounded variable rather than a standard continuous feature.

Treatment: Model with a two-part (hurdle/zero-inflated) approach, or apply an indicator for zero alongside the raw value; avoid log-transform without offset due to zero mass.

anthropic:default · confidence high

Out[35]:

saturn.columns["column07"].stats

stat	value
n	122,611
nulls	0 (0.0%)
unique	88
min	0
max	100
mean	18.35
median	0
std	28.86
q1	0
q3	40
iqr	40
skew	1.22
kurtosis	-0.05072
n_outliers	0
outlier_rate	0
zero_rate	0.6682

Fig 16.

Distribution of column07. Vertical dash marks the median.

Histogram bins for column07 (median: 0.0).
bin	count
0 – 2.5	81930
2.5 – 5	0
5 – 7.5	0
7.5 – 10	0
10 – 12.5	620
12.5 – 15	16
15 – 17.5	437
17.5 – 20	15
20 – 22.5	2394
22.5 – 25	124
25 – 27.5	1471
27.5 – 30	39
30 – 32.5	2689
32.5 – 35	482
35 – 37.5	955
37.5 – 40	46
40 – 42.5	2358
42.5 – 45	52
45 – 47.5	419
47.5 – 50	73
50 – 52.5	9742
52.5 – 55	24
55 – 57.5	534
57.5 – 60	33
60 – 62.5	2493
62.5 – 65	45
65 – 67.5	1150
67.5 – 70	166
70 – 72.5	3120
72.5 – 75	77
75 – 77.5	2918
77.5 – 80	106
80 – 82.5	3754
82.5 – 85	263
85 – 87.5	1353
87.5 – 90	252
90 – 92.5	2217
92.5 – 95	56
95 – 97.5	182
97.5 – 100	6

column08 numeric feature

This column is a sparse count or event-frequency field: 85.5% of its 122,611 rows are exactly zero, the median and IQR are both 0, yet the mean is 0.55 and the max reaches 3,703. The extreme concentration at zero combined with a skew of 171.8 and kurtosis of 38,359 indicates a heavy-tailed distribution driven by rare but very large values. Because the IQR is 0, the 14.5% of rows (17,771) flagged as outliers are exactly the non-zero rows, so the outlier alert restates the zero rate rather than adding information. Only 117 distinct values across 122,611 rows further suggests this is a discrete count, not a continuous measure.

Treatment: Apply log1p transform or use a zero-inflated / Poisson model; consider capping or winsorizing at a high quantile given the max of 3703.

anthropic:default · confidence high
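
Capping at a high quantile, as the treatment suggests, can be sketched with a nearest-rank quantile. This is a minimal example on invented values; `winsorize_upper` is a hypothetical helper, and the 0.75 quantile is used only to keep the toy sample small (a much higher quantile would be the natural choice for a column this sparse).

```python
import math


def winsorize_upper(values, q):
    """Cap values at the empirical q-th quantile (nearest-rank method)."""
    ordered = sorted(values)
    # Nearest rank: the smallest value with at least a fraction q of the data at or below it.
    rank = max(1, math.ceil(q * len(ordered)))
    cap = ordered[rank - 1]
    return [min(v, cap) for v in values], cap


# Invented sample: mostly zeros plus one extreme value equal to the reported max.
capped, cap = winsorize_upper([0, 0, 0, 0, 1, 2, 5, 3703], q=0.75)
print(cap, capped)
```

With a zero rate of 85.5%, any quantile at or below that level would cap every value at zero, so the chosen quantile has to sit above the zero mass to retain any signal.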

Out[38]:

saturn.columns["column08"].stats

stat	value
n	122,611
nulls	0 (0.0%)
unique	117
min	0
max	3,703
mean	0.5459
median	0
std	14.52
q1	0
q3	0
iqr	0
skew	171.8
kurtosis	3.836e+04
n_outliers	17,771
outlier_rate	0.1449
zero_rate	0.8551
alert: high_skew	skew=+171.83
alert: outliers	14.5% rows beyond 1.5 IQR

Fig 17.

Distribution of column08. Vertical dash marks the median.

Histogram bins for column08 (median: 0.0).
bin	count
0 – 92.58	122551
92.58 – 185.2	39
185.2 – 277.7	6
277.7 – 370.3	3
370.3 – 462.9	2
462.9 – 555.5	0
555.5 – 648	0
648 – 740.6	0
740.6 – 833.2	5
833.2 – 925.8	1
925.8 – 1018	1
1018 – 1111	1
1111 – 1203	0
1203 – 1296	0
1296 – 1389	0
1389 – 1481	0
1481 – 1574	0
1574 – 1666	0
1666 – 1759	0
1759 – 1852	0
1852 – 1944	0
1944 – 2037	1
2037 – 2129	0
2129 – 2222	0
2222 – 2314	0
2314 – 2407	0
2407 – 2500	0
2500 – 2592	0
2592 – 2685	0
2685 – 2777	0
2777 – 2870	0
2870 – 2962	0
2962 – 3055	0
3055 – 3148	0
3148 – 3240	0
3240 – 3333	0
3333 – 3425	0
3425 – 3518	0
3518 – 3610	0
3610 – 3703	1

column09 text free_text

This column contains long-form natural language text with a mean of 1,297 characters and 214 words per entry and a vocabulary of 105,903 unique terms; in a games catalogue this is most plausibly the store-page description, though reviews or other long-form copy would produce similar stats. The near-unique alert (113,556 unique values out of 122,611 rows) confirms these are essentially free-text narratives rather than categorical labels. Notably, 4.7% of entries contain emojis, suggesting informal or consumer-facing content, and the max length of 89,665 characters indicates some extreme outliers well beyond the 95th-percentile length of 2,966 characters. A Flesch readability mean of 58.75 sits just below the 'standard' band (60–70) on the Flesch Reading Ease scale, consistent with consumer-facing writing.

Treatment: Tokenize and embed (e.g., sentence-transformers) before modelling; flag or truncate the extreme-length outliers above len_p95 of 2,966 characters.

anthropic:default · confidence high

Out[41]:

saturn.columns["column09"].stats

stat	value
n	122,611
nulls	8,449 (6.9%)
unique	113,556
len_min	1
len_max	89,665
len_mean	1297
len_median	1,064
len_p95	2,966
word_mean	214.3
word_median	177
n_empty	0
n_duplicates	606
duplicate_rate	0.005308
vocab_size	105,903
readability_flesch_mean	58.75
emoji_rate	0.04672
url_rate	0.0003504
one_word_rate	0.0004029
allcaps_rate	0.007559
boilerplate_rate	0.01517
alert: near_unique	99.5% of rows are unique strings

Fig 18.

Character-length distribution for column09.

Character-length distribution for column09 (mean: 1297.112095092938).
chars	count
1 – 2243	100879
2243 – 4484	11956
4484 – 6726	1006
6726 – 8967	201
8967 – 11209	66
11209 – 13451	22
13451 – 15692	14
15692 – 17934	7
17934 – 20175	0
20175 – 22417	3
22417 – 24659	0
24659 – 26900	1
26900 – 29142	0
29142 – 31383	1
31383 – 33625	0
33625 – 35867	0
35867 – 38108	0
38108 – 40350	0
40350 – 42591	0
42591 – 44833	0
44833 – 47075	1
47075 – 49316	0
49316 – 51558	0
51558 – 53799	0
53799 – 56041	0
56041 – 58283	0
58283 – 60524	0
60524 – 62766	1
62766 – 65007	0
65007 – 67249	0
67249 – 69491	1
69491 – 71732	0
71732 – 73974	0
73974 – 76215	0
76215 – 78457	0
78457 – 80699	0
80699 – 82940	0
82940 – 85182	0
85182 – 87423	0
87423 – 89665	3

column10 text feature

This column contains serialized Python lists of language names, representing the supported or available languages for each record (likely a software product or game). The dominant value is `['English']` appearing 55,314 times, with `[]` (no languages listed) in 8,380 rows. The duplicate rate is extremely high at 84.4%, which is expected given the limited vocabulary of 217 unique tokens and only 19,113 unique values across 122,611 rows — the data is stored as raw string-serialized lists rather than a normalized structure, which is a notable preprocessing concern.

Treatment: Parse the string-serialized lists into actual list structures, then multi-hot encode each language as a binary feature column.

anthropic:default · confidence high
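
The parse-then-encode step can be sketched with `ast.literal_eval`, which evaluates Python literals without executing arbitrary code. This is a minimal example; `parse_language_list` and `multi_hot` are hypothetical helpers, and 'French' is an invented value (the stats above only name `['English']` and `[]`).

```python
import ast


def parse_language_list(text):
    """Parse a string such as "['English', 'French']" into a list of strings."""
    try:
        value = ast.literal_eval(text)
    except (ValueError, SyntaxError, TypeError):
        return []  # Malformed or non-string input is treated as an empty list.
    return [str(item) for item in value] if isinstance(value, list) else []


def multi_hot(rows):
    """Return (sorted vocabulary, one 0/1 row per input) for serialized lists."""
    parsed = [parse_language_list(row) for row in rows]
    vocab = sorted({lang for langs in parsed for lang in langs})
    matrix = [[1 if lang in langs else 0 for lang in vocab] for langs in parsed]
    return vocab, matrix


vocab, matrix = multi_hot(["['English']", "[]", "['English', 'French']"])
print(vocab)
print(matrix)
```

Treating malformed strings as empty lists keeps the sketch short; in practice it may be worth counting those rows separately so parsing failures are not confused with genuinely empty lists.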

Out[44]:

saturn.columns["column10"].stats

stat	value
n	122,611
nulls	0 (0.0%)
unique	19,113
len_min	2
len_max	1,216
len_mean	68.02
len_median	11
len_p95	224
word_mean	6.889
word_median	1
n_empty	0
n_duplicates	103,498
duplicate_rate	0.8441
vocab_size	217
readability_flesch_mean	14.07
emoji_rate	0
url_rate	0
one_word_rate	0.5333
allcaps_rate	0
boilerplate_rate	0
alert: one_word	53.3% rows are a single word
alert: duplicates	84.4% duplicate strings

Fig 19.

Character-length distribution for column10.

Character-length distribution for column10 (mean: 68.01866064219361).
chars	count
2 – 32	79054
32 – 63	13178
63 – 93	7232
93 – 123	5289
123 – 154	5081
154 – 184	3884
184 – 214	2265
214 – 245	1157
245 – 275	630
275 – 306	318
306 – 336	341
336 – 366	555
366 – 397	1037
397 – 427	512
427 – 457	49
457 – 488	28
488 – 518	12
518 – 548	5
548 – 579	9
579 – 609	2
609 – 639	0
639 – 670	2
670 – 700	0
700 – 730	2
730 – 761	0
761 – 791	1
791 – 821	2
821 – 852	6
852 – 882	1
882 – 912	4
912 – 943	8
943 – 973	0
973 – 1004	1
1004 – 1034	2
1034 – 1064	5
1064 – 1095	1
1095 – 1125	4
1125 – 1155	6
1155 – 1186	1
1186 – 1216	1927

column11 text feature

This column contains serialized Python lists of language names, representing the supported or available languages for each record (likely a software product or media item). The dominant value is '[]' (empty list) appearing 72,730 times — nearly 60% of rows — indicating most records have no language metadata populated. Despite 122,611 rows, only 3,710 unique values exist and the duplicate rate is 96.97%, which is expected for a categorical-list field, but the vocabulary is tiny at just 194 words, confirming a closed set of language names.

Treatment: Parse the serialized list strings into proper multi-label indicators (one binary column per language) before modelling; treat '[]' as missing/unknown.

anthropic:default · confidence high

Out[47]:

saturn.columns["column11"].stats

stat	value
n	122,611
nulls	0 (0.0%)
unique	3,710
len_min	2
len_max	1,216
len_mean	24.31
len_median	2
len_p95	46
word_mean	2.854
word_median	1
n_empty	0
n_duplicates	118,901
duplicate_rate	0.9697
vocab_size	194
readability_flesch_mean	8.003
emoji_rate	0
url_rate	0
one_word_rate	0.813
allcaps_rate	0
boilerplate_rate	0
alert: one_word	81.3% rows are a single word
alert: duplicates	97.0% duplicate strings

Fig 20.

Character-length distribution for column11.

Character-length distribution for column11 (mean: 24.311350531355263).
chars	count
2 – 32	110168
32 – 63	7499
63 – 93	1015
93 – 123	742
123 – 154	555
154 – 184	422
184 – 214	254
214 – 245	161
245 – 275	75
275 – 306	32
306 – 336	34
336 – 366	85
366 – 397	345
397 – 427	70
427 – 457	10
457 – 488	11
488 – 518	1
518 – 548	1
548 – 579	0
579 – 609	0
609 – 639	0
639 – 670	0
670 – 700	0
700 – 730	1
730 – 761	0
761 – 791	0
791 – 821	0
821 – 852	5
852 – 882	0
882 – 912	0
912 – 943	0
943 – 973	0
973 – 1004	0
1004 – 1034	1
1034 – 1064	3
1064 – 1095	0
1095 – 1125	1
1125 – 1155	3
1155 – 1186	0
1186 – 1216	1117

column12 text free_text

This column contains substantial free-text descriptions or reviews, most likely about games: the word 'game' is the top non-stopword at 7,882 occurrences, the mean text length is ~340 characters (~57 words), and the vocabulary spans 61,840 unique tokens. The 90.16% null rate is a major alert: only 12,070 of 122,611 rows carry any content, so this field is sparsely populated. An emoji_rate of ~1.6% and a mean Flesch readability score of ~57.8 suggest informal, consumer-written prose. The near_unique flag is computed on the populated rows: 11,884 unique values among 12,070 non-null rows (186 duplicates) means almost every populated entry is distinct.

Treatment: Tokenize and embed (e.g., TF-IDF or sentence transformer) before modelling; impute or mask nulls explicitly given the 90.16% null rate.

anthropic:default · confidence high
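
As a rough sketch of the TF-IDF option with an explicit null mask, the following uses only the standard library. The tokenizer, the smoothed IDF formula, and the sample texts are illustrative assumptions; a real pipeline would more likely use an established vectorizer or a sentence-embedding model.

```python
import math
import re
from collections import Counter

TOKEN = re.compile(r"[a-z0-9']+")


def tfidf(docs):
    """docs: list of str or None. Returns (mask, vectors).

    mask[i] is True where a text exists; vectors[i] maps term -> weight
    and is empty for null rows, so nulls stay explicit.
    """
    mask = [doc is not None for doc in docs]
    tokens = [TOKEN.findall(doc.lower()) if doc is not None else [] for doc in docs]
    n_docs = sum(mask)
    doc_freq = Counter(term for toks in tokens for term in set(toks))
    vectors = []
    for toks in tokens:
        if not toks:
            vectors.append({})
            continue
        term_freq = Counter(toks)
        vectors.append({
            # term frequency times a smoothed inverse document frequency
            term: (count / len(toks)) * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in term_freq.items()
        })
    return mask, vectors


# Hypothetical sample texts; None stands for a null cell.
mask, vectors = tfidf(["A fun game", None, "A hard game about trains"])
```

Carrying `mask` alongside the vectors lets a downstream model distinguish "no text" from "text with no informative terms", which matters at a 90.16% null rate.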

Out[50]:

saturn.columns["column12"].stats

stat	value
n	122,611
nulls	110,541 (90.2%)
unique	11,884
len_min	3
len_max	2,912
len_mean	340.3
len_median	295
len_p95	763
word_mean	57.37
word_median	49
n_empty	0
n_duplicates	186
duplicate_rate	0.01541
vocab_size	61,840
readability_flesch_mean	57.83
emoji_rate	0.01649
url_rate	0
one_word_rate	0
allcaps_rate	0.008202
boilerplate_rate	0
alert: near_unique	98.5% of rows are unique strings
alert: null_rate	90.2% null

Fig 21.

Character-length distribution for column12.

Character-length distribution for column12 (mean: 340.28823529411767).
chars	count
3 – 76	812
76 – 148	1475
148 – 221	1866
221 – 294	1856
294 – 367	1625
367 – 439	1313
439 – 512	954
512 – 585	652
585 – 658	495
658 – 730	299
730 – 803	226
803 – 876	128
876 – 948	104
948 – 1021	80
1021 – 1094	45
1094 – 1167	28
1167 – 1239	18
1239 – 1312	31
1312 – 1385	20
1385 – 1458	11
1458 – 1530	4
1530 – 1603	5
1603 – 1676	3
1676 – 1748	3
1748 – 1821	5
1821 – 1894	2
1894 – 1967	1
1967 – 2039	1
2039 – 2112	1
2112 – 2185	1
2185 – 2257	0
2257 – 2330	0
2330 – 2403	1
2403 – 2476	1
2476 – 2548	1
2548 – 2621	0
2621 – 2694	0
2694 – 2767	0
2767 – 2839	1
2839 – 2912	2

column13 text metadata

This column contains Steam CDN URLs pointing to game header images hosted on Akamai's steamstatic.com infrastructure — specifically `header.jpg` assets keyed by Steam app ID. With a url_rate of 1.0 and one_word_rate of 1.0, every single value is a single URL. The column is near-unique (122,420 distinct values out of 122,611 rows), with only 110 duplicates, suggesting these map closely to individual game or product records; the small number of repeated URLs (max frequency 5) likely reflects games appearing in multiple dataset rows.

Treatment: Extract Steam app ID from URL path for joining; drop raw URL before modelling or store as-is for image retrieval pipelines.

anthropic:default · confidence high
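
Extracting the app ID might look like the sketch below. It assumes the ID appears as a numeric path segment right after `/apps/`, which this report does not confirm; the helper name `steam_app_id` and the example URLs are hypothetical.

```python
import re
from urllib.parse import urlsplit

# Assumption: the app ID is the numeric segment following "/apps/" in the path.
APP_ID = re.compile(r"/apps/(\d+)/")


def steam_app_id(url):
    """Return the integer app ID from a header-image URL, or None if absent."""
    if not url:
        return None
    match = APP_ID.search(urlsplit(url).path)
    return int(match.group(1)) if match else None


# Hypothetical URL, not taken from the dataset.
example = steam_app_id("https://example.invalid/steam/apps/12345/header.jpg?t=1")
```

Returning `None` instead of raising keeps the 81 null rows and any unexpected URL shapes visible for later inspection.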

Out[53]:

saturn.columns["column13"].stats

stat	value
n	122,611
nulls	81 (0.1%)
unique	122,420
len_min	93
len_max	153
len_mean	104.6
len_median	98
len_p95	139
word_mean	1
word_median	1
n_empty	0
n_duplicates	110
duplicate_rate	0.0008977
vocab_size	19,992
readability_flesch_mean	-834.3
emoji_rate	0
url_rate	1
one_word_rate	1
allcaps_rate	0
boilerplate_rate	0
alert: near_unique	99.9% of rows are unique strings
alert: one_word	100.0% rows are a single word
alert: url_heavy	100.0% rows contain a URL

Fig 22.

Character-length distribution for column13.

Character-length distribution for column13 (mean: 104.62648331020975).
chars	count
93 – 94	29
94 – 96	238
96 – 98	27191
98 – 99	74714
99 – 100	0
100 – 102	0
102 – 104	0
104 – 105	0
105 – 106	0
106 – 108	0
108 – 110	0
110 – 111	1
111 – 112	16
112 – 114	0
114 – 116	0
116 – 117	0
117 – 118	0
118 – 120	0
120 – 122	0
122 – 123	0
123 – 124	0
124 – 126	0
126 – 128	0
128 – 129	0
129 – 130	0
130 – 132	0
132 – 134	0
134 – 135	0
135 – 136	11
136 – 138	39
138 – 140	19722
140 – 141	0
141 – 142	0
142 – 144	0
144 – 146	0
146 – 147	0
147 – 148	0
148 – 150	0
150 – 152	64
152 – 153	505

column14 text metadata

This column contains publisher or developer website URLs, almost certainly scraped from a Steam or similar games catalogue. Virtually every non-null value is a single URL (one_word_rate 0.9999, url_rate 0.9999), pointing to publisher homepages, Facebook pages, or Steam publisher/group pages. Two signals stand out: 59.48% of rows are null, meaning many game records carry no website; and 20.08% of non-null values are duplicates (9,973 repeated URLs), reflecting publishers with large catalogues who share one website across many titles.

Treatment: Extract domain as a categorical publisher identifier; flag or impute nulls; do not embed raw URL strings.

anthropic:default · confidence high
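
One possible way to derive a domain feature from column14 is sketched here with `urllib.parse`. The helper name `url_domain`, the choice to strip a leading `www.`, and the sample values are assumptions made for illustration.

```python
from urllib.parse import urlsplit


def url_domain(value):
    """Return a lowercased host without a leading 'www.', or None if unavailable."""
    if value is None:
        return None
    text = value.strip()
    if not text:
        return None
    if "://" not in text:
        text = "//" + text  # let urlsplit treat scheme-less values as host + path
    try:
        host = urlsplit(text).hostname
    except ValueError:
        return None
    if not host or " " in host:
        return None
    return host[4:] if host.startswith("www.") else host


# Hypothetical sample values.
domains = [url_domain(v) for v in ["https://www.example.com/games", "example.org/support", None]]
```

Shared platforms such as social-media or store pages would collapse many publishers into one domain, so the result is better read as a coarse grouping than as a clean publisher ID.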

Out[56]:

saturn.columns["column14"].stats

stat	value
n	122,611
nulls	72,935 (59.5%)
unique	39,703
len_min	7
len_max	236
len_mean	32.57
len_median	29
len_p95	56
word_mean	1
word_median	1
n_empty	0
n_duplicates	9,973
duplicate_rate	0.2008
vocab_size	17,059
readability_flesch_mean	-260.3
emoji_rate	0
url_rate	0.9999
one_word_rate	0.9999
allcaps_rate	0
boilerplate_rate	0
alert: one_word	100.0% rows are a single word
alert: url_heavy	100.0% rows contain a URL
alert: null_rate	59.5% null
alert: duplicates	20.1% duplicate strings

Fig 23.

Character-length distribution for column14.

Character-length distribution for column14 (mean: 32.568805861985666).
chars	count
7 – 13	2
13 – 18	1050
18 – 24	10802
24 – 30	13931
30 – 36	10202
36 – 41	5397
41 – 47	3308
47 – 53	1755
53 – 59	1232
59 – 64	791
64 – 70	277
70 – 76	245
76 – 81	223
81 – 87	149
87 – 93	53
93 – 99	34
99 – 104	22
104 – 110	29
110 – 116	26
116 – 122	33
122 – 127	26
127 – 133	18
133 – 139	16
139 – 144	20
144 – 150	4
150 – 156	6
156 – 162	12
162 – 167	3
167 – 173	1
173 – 179	7
179 – 184	1
184 – 190	0
190 – 196	0
196 – 202	0
202 – 207	0
207 – 213	0
213 – 219	0
219 – 225	0
225 – 230	0
230 – 236	1

column15 text metadata

This column is a support/contact URL field — almost certainly a developer or publisher support link associated with game or software records. 95.6% of non-null values are URLs, and the one-word rate is 99.9%, consistent with bare URL strings. Two surprises stand out: the null rate is very high at 55.8%, meaning more than half of records lack this URL, and the duplicate rate is 34.7% (18,808 duplicate values out of ~54,200 non-null rows), reflecting that many games share the same support domain (e.g., Big Fish Games, EA, Facebook pages).

Treatment: Extract domain as a categorical feature; treat raw URL as a grouping key rather than a text feature; impute or flag nulls separately given 55.8% null rate.

anthropic:default · confidence high
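
Because url_rate is 0.9559, a small share of populated values in column15 are not URLs. A hedged sketch for flagging them follows; the regular expressions and the guess that some non-URL values are email addresses are assumptions, and Saturn's own URL detection may differ.

```python
import re

# Loose, illustrative patterns; not Saturn's detection rules.
URL = re.compile(r"^(https?://|www\.)\S+$", re.IGNORECASE)
EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def classify_contact(value):
    """Label a support-field value as 'missing', 'url', 'email', or 'other'."""
    if value is None or not value.strip():
        return "missing"
    text = value.strip()
    if URL.match(text):
        return "url"
    if EMAIL.match(text):
        return "email"
    return "other"


# Hypothetical sample values.
kinds = [classify_contact(v) for v in
         ["https://example.com/help", "support@example.com", "see forum", None]]
```

The resulting label can serve as the separate null/non-URL flag mentioned in the treatment.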

Out[59]:

saturn.columns["column15"].stats

stat	value
n	122,611
nulls	68,404 (55.8%)
unique	35,399
len_min	1
len_max	851
len_mean	31.19
len_median	29
len_p95	51
word_mean	1.002
word_median	1
n_empty	0
n_duplicates	18,808
duplicate_rate	0.347
vocab_size	14,875
readability_flesch_mean	-245.1
emoji_rate	0
url_rate	0.9559
one_word_rate	0.9993
allcaps_rate	0.0007933
boilerplate_rate	0
alert: one_word	99.9% rows are a single word
alert: url_heavy	95.6% rows contain a URL
alert: null_rate	55.8% null
alert: duplicates	34.7% duplicate strings

Fig 24.

Character-length distribution for column15.

Character-length distribution for column15 (mean: 31.185455752947036).
chars	count
1 – 22	9827
22 – 44	38992
44 – 65	4610
65 – 86	477
86 – 107	155
107 – 128	82
128 – 150	34
150 – 171	10
171 – 192	2
192 – 214	12
214 – 235	1
235 – 256	2
256 – 277	0
277 – 298	0
298 – 320	1
320 – 341	0
341 – 362	1
362 – 384	0
384 – 405	0
405 – 426	0
426 – 447	0
447 – 468	0
468 – 490	0
490 – 511	0
511 – 532	0
532 – 554	0
554 – 575	0
575 – 596	0
596 – 617	0
617 – 638	0
638 – 660	0
660 – 681	0
681 – 702	0
702 – 724	0
724 – 745	0
745 – 766	0
766 – 787	0
787 – 808	0
808 – 830	0
830 – 851	1

column16 text foreign_key

This column contains email addresses for game developers or publishers, as evidenced by the top values (e.g., 'info@bigfishgames.com', 'support@quanticlab.com'). Nearly all values (99.86%) are single tokens, consistent with email format. The duplicate rate is high at 39.7% (39,849 duplicates out of 100,368 non-null rows), indicating that many records share a contact email, as expected for a publisher-level field where one entity owns multiple titles. The null rate of 18.14% is notable and should be investigated for systematic missingness.

Treatment: Use as a grouping/join key on publisher or developer entity; normalize to lowercase and strip whitespace before joining.

anthropic:default · confidence high
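
The normalise-then-group step could be sketched as follows. `normalize_email` and `group_by_contact` are hypothetical names, and the rows are invented (row_id, email) pairs rather than dataset values.

```python
from collections import defaultdict


def normalize_email(value):
    """Lowercase and strip an address; return None for null or blank input."""
    if value is None:
        return None
    text = value.strip().lower()
    return text or None


def group_by_contact(rows):
    """rows: iterable of (row_id, raw_email). Returns {email: [row_id, ...]}."""
    groups = defaultdict(list)
    for row_id, raw in rows:
        key = normalize_email(raw)
        if key is not None:  # null rows are left out of every group
            groups[key].append(row_id)
    return dict(groups)


# Hypothetical rows.
groups = group_by_contact([(1, " Info@Example.com "), (2, "info@example.com"), (3, None)])
```

Excluding nulls from the grouping avoids treating the 18.14% of rows without an address as a single large entity.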

Out[62]:

saturn.columns["column16"].stats

stat	value
n	122,611
nulls	22,243 (18.1%)
unique	60,519
len_min	1
len_max	169
len_mean	22.91
len_median	23
len_p95	31
word_mean	1.004
word_median	1
n_empty	0
n_duplicates	39,849
duplicate_rate	0.397
vocab_size	15,319
readability_flesch_mean	-223.7
emoji_rate	9.963e-06
url_rate	0.003906
one_word_rate	0.9986
allcaps_rate	0.001016
boilerplate_rate	0
alert: one_word	99.9% rows are a single word
alert: duplicates	39.7% duplicate strings

Fig 25.

Character-length distribution for column16.

Character-length distribution for column16 (mean: 22.90802845528455).
chars	count
1 – 5	54
5 – 9	23
9 – 14	1014
14 – 18	10969
18 – 22	28430
22 – 26	39631
26 – 30	14560
30 – 35	3948
35 – 39	954
39 – 43	416
43 – 47	184
47 – 51	55
51 – 56	61
56 – 60	37
60 – 64	8
64 – 68	4
68 – 72	3
72 – 77	2
77 – 81	1
81 – 85	0
85 – 89	3
89 – 93	0
93 – 98	0
98 – 102	3
102 – 106	1
106 – 110	3
110 – 114	1
114 – 119	0
119 – 123	0
123 – 127	0
127 – 131	0
131 – 135	0
135 – 140	0
140 – 144	0
144 – 148	0
148 – 152	0
152 – 156	2
156 – 161	0
161 – 165	0
165 – 169	1

column17 categorical feature

This column is a boolean flag stored as string values ('True'/'False'), covering 122,611 rows with no nulls. It is severely imbalanced: 'True' accounts for 99.964% of rows (122,567 occurrences) while 'False' appears only 44 times. The near-zero entropy (0.0046) confirms the column carries almost no information, making it nearly constant.

Treatment: Investigate whether the 44 'False' rows are meaningful anomalies; otherwise drop as near-constant with no predictive variance.

anthropic:default · confidence high
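
The entropy figure can be checked from the counts alone. The sketch below assumes Saturn reports Shannon entropy in bits and normalises entropy_ratio by log2 of the cardinality; that is an inference from the stats (entropy equals entropy_ratio when cardinality is 2), not documented behaviour.

```python
import math


def entropy_bits(counts):
    """Shannon entropy in bits for a list of category counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def entropy_ratio(counts):
    """Entropy divided by its maximum, log2(number of non-empty categories)."""
    k = sum(1 for c in counts if c > 0)
    return entropy_bits(counts) / math.log2(k) if k > 1 else 0.0


# Counts taken from the Top values table for column17.
column17_entropy = entropy_bits([122567, 44])
```

With the column17 counts this gives roughly 0.0046 bits, in line with the reported 0.004625, against a maximum of 1 bit for a balanced two-value flag.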

Out[65]:

saturn.columns["column17"].stats

stat	value
n	122,611
nulls	0 (0.0%)
unique	2
top_value	True
top_rate	0.9996
cardinality	2
entropy	0.004625
entropy_ratio	0.004625
alert: imbalance	top value is 100.0% of rows

Fig 26.

Top values for column17.

Top values for column17 (2 unique shown, of 2 total).
value	count	share
True	122567	100.0%
False	44	0.0%

column18 categorical feature

This column is a binary boolean flag stored as string literals 'True'/'False', with zero nulls across 122,611 rows. The dominant value is 'False' at 82.6% (101,319 occurrences), leaving 'True' at roughly 17.4% (21,292) — a moderately imbalanced split that may matter for classification tasks. The entropy ratio of 0.666 confirms meaningful but uneven information content.

Treatment: Cast to boolean/integer (0/1) and monitor class imbalance if used as a target or predictor.

anthropic:default · confidence high
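
Casting needs an explicit mapping, because in Python any non-empty string is truthy and `bool("False")` evaluates to `True`. A minimal sketch, with `to_binary` as a hypothetical helper name:

```python
_FLAG = {"true": 1, "false": 0}


def to_binary(value):
    """Map the string literals 'True'/'False' to 1/0; reject anything else."""
    key = str(value).strip().lower()
    if key not in _FLAG:
        raise ValueError(f"unexpected flag value: {value!r}")
    return _FLAG[key]


encoded = [to_binary(v) for v in ["True", "False", " false "]]
```

Raising on unexpected values is a deliberate choice here: the column has no nulls and only two values, so anything else would signal a parsing problem upstream.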

Out[68]:

saturn.columns["column18"].stats

stat	value
n	122,611
nulls	0 (0.0%)
unique	2
top_value	False
top_rate	0.8263
cardinality	2
entropy	0.666
entropy_ratio	0.666

Fig 27.

Top values for column18.

Top values for column18 (2 unique shown, of 2 total).
value	count	share
False	101319	82.6%
True	21292	17.4%

column19 categorical label

This column is a boolean flag stored as string literals 'True'/'False', covering all 122,611 rows with zero nulls. The distribution is heavily skewed: 'False' dominates at 87.2% (106,905 rows) versus 'True' at only 12.8% (15,706 rows). The low entropy of 0.552 confirms the imbalance. An analyst building a classifier on this as a target should anticipate class imbalance requiring resampling or adjusted class weights.

Treatment: encode as binary integer (False=0, True=1) and address class imbalance (~87/13 split) before modelling.

anthropic:default · confidence high
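
If column19 is used as a target, one common way to adjust class weights is the "balanced" heuristic, weight = n_samples / (n_classes × class_count). The sketch below applies it to the counts from the Top values table; whether weighting or resampling suits a given model is left open.

```python
def balanced_class_weights(counts):
    """counts: {label: n}. Returns {label: n_samples / (n_classes * n)}."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


# Counts from the column19 Top values table.
weights = balanced_class_weights({"False": 106905, "True": 15706})
```

For these counts the minority class 'True' receives a weight of about 3.90 and 'False' about 0.57, so each class contributes equally to a weighted loss.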

Out[71]:

saturn.columns["column19"].stats

stat	value
n	122,611
nulls	0 (0.0%)
unique	2
top_value	False
top_rate	0.8719
cardinality	2
entropy	0.5522
entropy_ratio	0.5522

Fig 28.

Top values for column19.

Top values for column19 (2 unique shown, of 2 total).
value	count	share
False	106905	87.2%
True	15706	12.8%

column20 numeric feature

This column is a sparse numeric field with only 73 distinct values across 122,611 rows. The distribution is concentrated at zero: 96.5% of values are exactly 0, so the median and IQR are both 0.0, while the maximum reaches 97, producing strong positive skew (5.23) and kurtosis (25.75). The 4,256 non-zero rows (3.47%) are all flagged as outliers because the IQR is 0; in the histogram they cluster between roughly 50 and 90, which looks more like a bounded 0 to 100 score, with 0 standing in for "no score", than like an event count. The zero count (118,355) also equals the null count of column21, which suggests the two fields are populated together. The non-zero rows are the analytically interesting segment.

Treatment: Apply log1p transform or binarise (zero vs. non-zero) before modelling; consider separating the zero-inflated mass from the active tail for two-part modelling.

anthropic:default · confidence high
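
The two-part idea in the treatment can be sketched as a split into a presence indicator and the non-zero values. `split_zero_inflated` is a hypothetical helper and the sample values are invented.

```python
def split_zero_inflated(values):
    """Return (indicator, positives, zero_rate) for a zero-inflated column.

    indicator[i] is 1 where values[i] is non-zero; positives holds only the
    non-zero values, in their original order, for a separate second model.
    """
    indicator = [int(v != 0) for v in values]
    positives = [v for v in values if v != 0]
    zero_rate = 1 - len(positives) / len(values) if values else 0.0
    return indicator, positives, zero_rate


# Hypothetical sample values.
indicator, positives, zero_rate = split_zero_inflated([0, 0, 0, 72, 0, 85, 0, 0])
```

The indicator can be modelled on all rows, while any transform such as log1p is applied only to `positives`.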

Out[74]:

saturn.columns["column20"].stats

stat	value
n	122,611
nulls	0 (0.0%)
unique	73
min	0
max	97
mean	2.565
median	0
std	13.66
q1	0
q3	0
iqr	0
skew	5.227
kurtosis	25.75
n_outliers	4,256
outlier_rate	0.03471
zero_rate	0.9653
alert: high_skew	skew=+5.23

Fig 29.

Distribution of column20. Vertical dash marks the median.

Histogram bins for column20 (median: 0.0).
bin	count
0 – 2.425	118355
2.425 – 4.85	0
4.85 – 7.275	1
7.275 – 9.7	0
9.7 – 12.12	0
12.12 – 14.55	0
14.55 – 16.97	0
16.97 – 19.4	0
19.4 – 21.82	1
21.82 – 24.25	2
24.25 – 26.67	0
26.67 – 29.1	2
29.1 – 31.52	2
31.52 – 33.95	3
33.95 – 36.38	8
36.38 – 38.8	6
38.8 – 41.22	16
41.22 – 43.65	14
43.65 – 46.07	19
46.07 – 48.5	21
48.5 – 50.92	29
50.92 – 53.35	62
53.35 – 55.77	56
55.77 – 58.2	100
58.2 – 60.62	85
60.62 – 63.05	178
63.05 – 65.47	154
65.47 – 67.9	179
67.9 – 70.32	398
70.32 – 72.75	270
72.75 – 75.17	514
75.17 – 77.6	383
77.6 – 80.02	605
80.02 – 82.45	367
82.45 – 84.88	283
84.88 – 87.3	268
87.3 – 89.72	109
89.72 – 92.15	88
92.15 – 94.57	24
94.57 – 97	9

column21 text

Out[77]:

saturn.columns["column21"].stats

stat	value
n	122,611
nulls	118,355 (96.5%)
unique	4,160
len_min	42
len_max	142
len_mean	72.43
len_median	70
len_p95	91
word_mean	1
word_median	1
n_empty	0
n_duplicates	96
duplicate_rate	0.02256
vocab_size	4,160
readability_flesch_mean	-704.1
emoji_rate	0
url_rate	1
one_word_rate	1
allcaps_rate	0
boilerplate_rate	0
alert: near_unique	97.7% of rows are unique strings
alert: one_word	100.0% rows are a single word
alert: url_heavy	100.0% rows contain a URL
alert: null_rate	96.5% null

Fig 30.

Character-length distribution for column21.

Character-length distribution for column21 (mean: 72.42857142857143).
chars	count
42 – 44	2
44 – 47	1
47 – 50	0
50 – 52	5
52 – 54	5
54 – 57	1
57 – 60	89
60 – 62	209
62 – 64	597
64 – 67	444
67 – 70	632
70 – 72	377
72 – 74	441
74 – 77	238
77 – 80	307
80 – 82	189
82 – 84	241
84 – 87	94
87 – 90	106
90 – 92	74
92 – 94	74
94 – 97	33
97 – 100	38
100 – 102	17
102 – 104	10
104 – 107	10
107 – 110	12
110 – 112	3
112 – 114	1
114 – 117	0
117 – 120	5
120 – 122	0
122 – 124	0
124 – 127	0
127 – 130	0
130 – 132	0
132 – 134	0
134 – 137	0
137 – 140	0
140 – 142	1

column22 numeric feature

This column is almost certainly a sparse indicator or rare-event count: 99.97% of its 122,611 values are exactly zero, with only 40 flagged outliers and a maximum of 100.0. The 31 unique values and an IQR of 0.0 confirm that the vast majority of rows carry no signal at all. The extreme skew (59.25) and kurtosis (3,627.8) are a direct consequence of this near-total zero mass, making standard continuous modelling inappropriate without transformation or binarisation.

Treatment: Binarise (zero vs. non-zero) or treat as a rare-event indicator; if the raw magnitude matters, cap at a sensible percentile and log1p-transform before modelling.

anthropic:default · confidence high

Out[80]:

saturn.columns["column22"].stats

stat	value
n	122,611
nulls	0 (0.0%)
unique	31
min	0
max	100
mean	0.02455
median	0
std	1.395
q1	0
q3	0
iqr	0
skew	59.25
kurtosis	3628
n_outliers	40
outlier_rate	0.0003262
zero_rate	0.9997
alert: high_skew	skew=+59.25

Fig 31.

Distribution of column22. Vertical dash marks the median.

Histogram bins for column22 (median: 0.0).
bin	count
0 – 2.5	122571
2.5 – 5	0
5 – 7.5	0
7.5 – 10	0
10 – 12.5	0
12.5 – 15	0
15 – 17.5	0
17.5 – 20	0
20 – 22.5	0
22.5 – 25	0
25 – 27.5	0
27.5 – 30	0
30 – 32.5	0
32.5 – 35	0
35 – 37.5	1
37.5 – 40	0
40 – 42.5	0
42.5 – 45	0
45 – 47.5	2
47.5 – 50	0
50 – 52.5	2
52.5 – 55	1
55 – 57.5	2
57.5 – 60	0
60 – 62.5	2
62.5 – 65	1
65 – 67.5	2
67.5 – 70	3
70 – 72.5	1
72.5 – 75	1
75 – 77.5	3
77.5 – 80	1
80 – 82.5	3
82.5 – 85	3
85 – 87.5	1
87.5 – 90	1
90 – 92.5	1
92.5 – 95	1
95 – 97.5	3
97.5 – 100	5

column23 numeric feature

This column is a numeric count or magnitude field — likely representing activity volume, transaction amount, or similar accumulation metric — with 122,611 non-null records and only 5,540 distinct values. The distribution is extraordinarily right-skewed (skew=177.84, kurtosis=45,295.94): the median is just 5.0 while the mean is 1,044.99, and the maximum reaches 7,642,084 — a value roughly 272x the standard deviation above the mean. About 34.5% of values are zero and 17.0% are flagged as outliers (20,797 rows), indicating a heavy zero-inflated tail with extreme rare events dominating the mean.

Treatment: Apply log1p-transform (to handle zeros) before modelling, and consider capping or winsorizing at a high percentile to suppress the extreme outliers up to 7,642,084.

anthropic:default · confidence high
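
A sketch of capping followed by log1p is shown below. The percentile uses simple linear interpolation and the 99th-percentile default is an arbitrary illustrative choice; the sample values reuse the column23 median, Q3, and max purely as convenient numbers.

```python
import math


def percentile(sorted_values, q):
    """Linear-interpolation percentile for q in [0, 100] on a sorted list."""
    if not sorted_values:
        raise ValueError("percentile of empty data")
    pos = (len(sorted_values) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def winsorize_log1p(values, upper_q=99.0):
    """Cap values at the upper_q percentile, then apply log1p (zeros stay 0)."""
    cap = percentile(sorted(values), upper_q)
    return [math.log1p(min(v, cap)) for v in values]


# Small illustrative sample; upper_q=75 makes the cap land on 37.
transformed = winsorize_log1p([0, 0, 5, 37, 7642084], upper_q=75)
```

Capping before the log keeps a single extreme row from stretching the transformed range, at the cost of discarding how extreme it was.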

Out[83]:

saturn.columns["column23"].stats

stat	value
n	122,611
nulls	0 (0.0%)
unique	5,540
min	0
max	7.642e+06
mean	1045
median	5
std	2.809e+04
q1	0
q3	37
iqr	37
skew	177.8
kurtosis	4.53e+04
n_outliers	20,797
outlier_rate	0.1696
zero_rate	0.3448
alert: high_skew	skew=+177.84
alert: outliers	17.0% rows beyond 1.5 IQR

Fig 32.

Distribution of column23. Vertical dash marks the median.

Histogram bins for column23 (median: 5.0).
bin	count
0 – 1.911e+05	122511
1.911e+05 – 3.821e+05	57
3.821e+05 – 5.732e+05	16
5.732e+05 – 7.642e+05	10
7.642e+05 – 9.553e+05	6
9.553e+05 – 1.146e+06	5
1.146e+06 – 1.337e+06	1
1.337e+06 – 1.528e+06	2
1.528e+06 – 1.719e+06	0
1.719e+06 – 1.911e+06	1
1.911e+06 – 2.102e+06	1
2.102e+06 – 2.293e+06	0
2.293e+06 – 2.484e+06	0
2.484e+06 – 2.675e+06	0
2.675e+06 – 2.866e+06	0
2.866e+06 – 3.057e+06	0
3.057e+06 – 3.248e+06	0
3.248e+06 – 3.439e+06	0
3.439e+06 – 3.63e+06	0
3.63e+06 – 3.821e+06	0
3.821e+06 – 4.012e+06	0
4.012e+06 – 4.203e+06	0
4.203e+06 – 4.394e+06	0
4.394e+06 – 4.585e+06	0
4.585e+06 – 4.776e+06	0
4.776e+06 – 4.967e+06	0
4.967e+06 – 5.158e+06	0
5.158e+06 – 5.349e+06	0
5.349e+06 – 5.541e+06	0
5.541e+06 – 5.732e+06	0
5.732e+06 – 5.923e+06	0
5.923e+06 – 6.114e+06	0
6.114e+06 – 6.305e+06	0
6.305e+06 – 6.496e+06	0
6.496e+06 – 6.687e+06	0
6.687e+06 – 6.878e+06	0
6.878e+06 – 7.069e+06	0
7.069e+06 – 7.26e+06	0
7.26e+06 – 7.451e+06	0
7.451e+06 – 7.642e+06	1

column24 numeric feature

This column is likely a count or frequency measure (e.g., event occurrences, transaction counts, or interaction tallies) given its non-negative integer-like range and high zero rate. The distribution is extraordinarily right-skewed: the median is 1.0 and Q3 is only 10.0, yet the maximum reaches 1,173,003 — a difference of over six orders of magnitude. With 45% zeros, ~16.9% flagged outliers (20,696 rows), a skew of 156.86, and kurtosis exceeding 30,000, the bulk of records cluster near zero while a small number of extreme values dominate the mean (169.20 vs. median 1.0). This is a severe long-tail distribution that will distort any linear model if used as-is.

Treatment: Apply log1p-transform (or cap at a high percentile) before modelling to reduce extreme skew.

anthropic:default · confidence medium

Out[86]:

saturn.columns["column24"].stats

stat	value
n	122,611
nulls	0 (0.0%)
unique	2,725
min	0
max	1.173e+06
mean	169.2
median	1
std	5375
q1	0
q3	10
iqr	10
skew	156.9
kurtosis	3.063e+04
n_outliers	20,696
outlier_rate	0.1688
zero_rate	0.4502
alert: high_skew	skew=+156.86
alert: outliers	16.9% rows beyond 1.5 IQR

Fig 33.

Distribution of column24. Vertical dash marks the median.

Histogram bins for column24 (median: 1.0).
bin	count
0 – 2.933e+04	122529
2.933e+04 – 5.865e+04	40
5.865e+04 – 8.798e+04	18
8.798e+04 – 1.173e+05	9
1.173e+05 – 1.466e+05	3
1.466e+05 – 1.76e+05	3
1.76e+05 – 2.053e+05	0
2.053e+05 – 2.346e+05	1
2.346e+05 – 2.639e+05	2
2.639e+05 – 2.933e+05	1
2.933e+05 – 3.226e+05	1
3.226e+05 – 3.519e+05	1
3.519e+05 – 3.812e+05	0
3.812e+05 – 4.106e+05	0
4.106e+05 – 4.399e+05	0
4.399e+05 – 4.692e+05	1
4.692e+05 – 4.985e+05	0
4.985e+05 – 5.279e+05	0
5.279e+05 – 5.572e+05	0
5.572e+05 – 5.865e+05	0
5.865e+05 – 6.158e+05	0
6.158e+05 – 6.452e+05	0
6.452e+05 – 6.745e+05	0
6.745e+05 – 7.038e+05	0
7.038e+05 – 7.331e+05	0
7.331e+05 – 7.625e+05	0
7.625e+05 – 7.918e+05	0
7.918e+05 – 8.211e+05	0
8.211e+05 – 8.504e+05	0
8.504e+05 – 8.798e+05	0
8.798e+05 – 9.091e+05	0
9.091e+05 – 9.384e+05	0
9.384e+05 – 9.677e+05	0
9.677e+05 – 9.971e+05	0
9.971e+05 – 1.026e+06	0
1.026e+06 – 1.056e+06	1
1.056e+06 – 1.085e+06	0
1.085e+06 – 1.114e+06	0
1.114e+06 – 1.144e+06	0
1.144e+06 – 1.173e+06	1

column25 numeric

Out[89]:

saturn.columns["column25"].stats

stat	value
n	122,611
nulls	122,571 (100.0%)
unique	3
min	98
max	100
mean	99.17
median	99
std	0.6751
q1	99
q3	100
iqr	1
skew	-0.2149
kurtosis	-0.7872
n_outliers	0
outlier_rate	0
zero_rate	0
alert: null_rate	100.0% null

Fig 34.

Distribution of column25. Vertical dash marks the median.

Histogram bins for column25 (median: 99.0).
bin	count
98 – 98.33	6
98.33 – 98.67	0
98.67 – 99	0
99 – 99.33	21
99.33 – 99.67	0
99.67 – 100	13

column26 numeric feature

This column is likely a count or frequency metric (e.g., event occurrences, transaction counts, or tenure in days/months), given its non-negative integer values with only 448 distinct values across 122,611 rows. The distribution is severely right-skewed (skew=32.63, kurtosis=1192.15): the median is just 2.0 while the mean is 18.09, Q1 is 0.0, and the maximum reaches 9,821—an extreme outlier relative to the IQR of 19. Nearly half the rows (48.6%) are zero, and 6.9% are flagged as outliers, signaling a heavy zero-inflated tail that will distort any linear model trained on raw values.

Treatment: Apply log1p-transform (or a zero-inflated model) to compress the extreme right tail before modelling.

anthropic:default · confidence high

Out[92]:

saturn.columns["column26"].stats

stat	value
n	122,611
nulls	0 (0.0%)
unique	448
min	0
max	9,821
mean	18.09
median	2
std	141.5
q1	0
q3	19
iqr	19
skew	32.63
kurtosis	1192
n_outliers	8,433
outlier_rate	0.06878
zero_rate	0.4859
alert: high_skew	skew=+32.63
alert: outliers	6.9% rows beyond 1.5 IQR

Fig 35.

Distribution of column26. Vertical dash marks the median.

Histogram bins for column26 (median: 2.0).
bin	count
0 – 245.5	122280
245.5 – 491.1	109
491.1 – 736.6	44
736.6 – 982.1	16
982.1 – 1228	18
1228 – 1473	14
1473 – 1719	12
1719 – 1964	4
1964 – 2210	11
2210 – 2455	5
2455 – 2701	2
2701 – 2946	2
2946 – 3192	7
3192 – 3437	4
3437 – 3683	1
3683 – 3928	0
3928 – 4174	3
4174 – 4419	2
4419 – 4665	2
4665 – 4910	5
4910 – 5156	68
5156 – 5402	1
5402 – 5647	0
5647 – 5893	0
5893 – 6138	0
6138 – 6384	0
6384 – 6629	0
6629 – 6875	0
6875 – 7120	0
7120 – 7366	0
7366 – 7611	0
7611 – 7857	0
7857 – 8102	0
8102 – 8348	0
8348 – 8593	0
8593 – 8839	0
8839 – 9084	0
9084 – 9330	0
9330 – 9575	0
9575 – 9821	1

column27 numeric feature

This column is a sparse, heavily right-skewed numeric count or amount field, likely an event frequency, volume, or similar quantity that is zero for the vast majority of records. 82.9% of the 122,611 rows are exactly zero, so the median and IQR are both 0.0, yet the mean is 961.8 and the maximum reaches 4,830,455: a small fraction of extreme values drives nearly all the variance. The skew of 113.9 and kurtosis of 20,874.5 are extraordinary. The 17.1% outlier rate needs care: with an IQR of 0 the 1.5 IQR fence sits at zero, so the flag marks every non-zero row (20,906 rows; zero_rate 0.8295 + outlier_rate 0.1705 = 1) and measures sparsity rather than how extreme the tail is.

Treatment: Apply log1p-transform (or treat as two-part: zero/non-zero indicator + log-transformed non-zero value) before modelling to handle extreme skew and outliers.

anthropic:default · confidence high
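
The outlier alert reads "beyond 1.5 IQR", which suggests the usual Tukey fences, Q1 − 1.5·IQR and Q3 + 1.5·IQR; that Saturn uses exactly this rule is an assumption. The sketch shows what the rule does when Q1 = Q3 = 0, as in column27.

```python
def upper_fence(q1, q3, k=1.5):
    """Upper Tukey fence: Q3 + k * IQR."""
    return q3 + k * (q3 - q1)


def outlier_rate(values, q1, q3, k=1.5):
    """Share of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    flagged = sum(1 for v in values if v < low or v > high)
    return flagged / len(values)


# With Q1 = Q3 = 0 the fence collapses to 0, so any positive value is flagged.
fence_column27 = upper_fence(0, 0)
# Hypothetical sample: eight zeros and two non-zero values.
sample_rate = outlier_rate([0] * 8 + [3, 4830455], q1=0, q3=0)
```

Under this rule a value of 3 is flagged just like 4,830,455, so for columns with an IQR of 0 the outlier rate is the non-zero rate.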

Out[95]:

saturn.columns["column27"].stats

stat	value
n	122,611
nulls	0 (0.0%)
unique	5,332
min	0
max	4.83e+06
mean	961.8
median	0
std	2.188e+04
q1	0
q3	0
iqr	0
skew	113.9
kurtosis	2.087e+04
n_outliers	20,906
outlier_rate	0.1705
zero_rate	0.8295
alert: high_skew	skew=+113.91
alert: outliers	17.1% rows beyond 1.5 IQR

Fig 36.

Distribution of column27. Vertical dash marks the median.

Histogram bins for column27 (median: 0.0).
bin	count
0 – 1.208e+05	122458
1.208e+05 – 2.415e+05	85
2.415e+05 – 3.623e+05	30
3.623e+05 – 4.83e+05	11
4.83e+05 – 6.038e+05	2
6.038e+05 – 7.246e+05	4
7.246e+05 – 8.453e+05	8
8.453e+05 – 9.661e+05	2
9.661e+05 – 1.087e+06	2
1.087e+06 – 1.208e+06	1
1.208e+06 – 1.328e+06	5
1.328e+06 – 1.449e+06	0
1.449e+06 – 1.57e+06	0
1.57e+06 – 1.691e+06	0
1.691e+06 – 1.811e+06	1
1.811e+06 – 1.932e+06	1
1.932e+06 – 2.053e+06	0
2.053e+06 – 2.174e+06	0
2.174e+06 – 2.294e+06	0
2.294e+06 – 2.415e+06	0
2.415e+06 – 2.536e+06	0
2.536e+06 – 2.657e+06	0
2.657e+06 – 2.778e+06	0
2.778e+06 – 2.898e+06	0
2.898e+06 – 3.019e+06	0
3.019e+06 – 3.14e+06	0
3.14e+06 – 3.261e+06	0
3.261e+06 – 3.381e+06	0
3.381e+06 – 3.502e+06	0
3.502e+06 – 3.623e+06	0
3.623e+06 – 3.744e+06	0
3.744e+06 – 3.864e+06	0
3.864e+06 – 3.985e+06	0
3.985e+06 – 4.106e+06	0
4.106e+06 – 4.227e+06	0
4.227e+06 – 4.347e+06	0
4.347e+06 – 4.468e+06	0
4.468e+06 – 4.589e+06	0
4.589e+06 – 4.71e+06	0
4.71e+06 – 4.83e+06	1

column28 text free_text

This column contains free-text content warnings or age-rating disclosures for video games, with recurring phrases about mature content, nudity, sexual content, and violence. It is massively sparse — 81.68% of rows are null — meaning most games carry no such warning. The duplicate rate of 17.09% (3,839 duplicates across 18,620 unique values) reflects the use of templated boilerplate warning strings, while a small multilingual signal (2 Chinese, 1 Japanese entries) indicates some non-English publisher submissions. Flesch readability of 44.38 and a median length of 124 characters are consistent with dense legal/disclaimer prose.

Treatment: Encode as binary 'has_warning' flag and/or extract categorical warning types (violence, nudity, sexual content) via keyword/regex before modelling; drop raw text.

anthropic:default · confidence high
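
A keyword-based encoding of column28 might look like this sketch. The pattern lists are illustrative guesses built from the themes named above (violence, nudity, sexual content), not a validated taxonomy, and they only cover English text.

```python
import re

# Hypothetical keyword patterns; extend or replace after inspecting real values.
WARNING_PATTERNS = {
    "violence": re.compile(r"\b(violen\w*|gore|blood)\b", re.IGNORECASE),
    "nudity": re.compile(r"\bnud\w*\b", re.IGNORECASE),
    "sexual_content": re.compile(r"\bsex\w*\b", re.IGNORECASE),
}


def warning_features(text):
    """Return a has_warning flag plus one 0/1 flag per warning type."""
    has_text = bool(text and text.strip())
    features = {"has_warning": int(has_text)}
    for name, pattern in WARNING_PATTERNS.items():
        features[name] = int(has_text and bool(pattern.search(text)))
    return features


# Hypothetical sample text.
flags = warning_features("Contains nudity and graphic violence.")
```

Rows in other languages would get `has_warning` = 1 but no type flags, which is worth checking given the multilingual alert.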

Out[98]:

saturn.columns["column28"].stats

stat	value
n	122,611
nulls	100,152 (81.7%)
unique	18,620
len_min	2
len_max	2,020
len_mean	164.1
len_median	124
len_p95	445
word_mean	25.74
word_median	20
n_empty	0
n_duplicates	3,839
duplicate_rate	0.1709
vocab_size	23,061
readability_flesch_mean	44.38
emoji_rate	0.0007124
url_rate	8.905e-05
one_word_rate	0.009039
allcaps_rate	0.008193
boilerplate_rate	0.009484
alert: multilingual	4 languages detected in sample
alert: null_rate	81.7% null

Fig 37.

Character-length distribution for column28.

Character-length distribution for column28 (mean: 164.09902488979918).
chars	count
2 – 52	4251
52 – 103	5273
103 – 153	3975
153 – 204	3096
204 – 254	1915
254 – 305	1167
305 – 355	787
355 – 406	579
406 – 456	389
456 – 506	246
506 – 557	175
557 – 607	143
607 – 658	91
658 – 708	64
708 – 759	60
759 – 809	49
809 – 860	34
860 – 910	25
910 – 961	32
961 – 1011	23
1011 – 1061	18
1061 – 1112	13
1112 – 1162	10
1162 – 1213	6
1213 – 1263	4
1263 – 1314	8
1314 – 1364	5
1364 – 1415	3
1415 – 1465	2
1465 – 1516	4
1516 – 1566	2
1566 – 1616	3
1616 – 1667	0
1667 – 1717	2
1717 – 1768	2
1768 – 1818	0
1818 – 1869	2
1869 – 1919	0
1919 – 1970	0
1970 – 2020	1

column29 numeric feature

This column is a heavily zero-inflated count or amount field: 78.7% of its 122,611 rows are exactly zero, and the interquartile range is 0.0, meaning the entire middle 50% of the distribution is zero. Despite a median of 0 and a mean of only 208, the max reaches 3,429,544, producing extreme skew (262.89) and kurtosis (75,698). The 21.3% outlier rate (26,119 rows) equals the non-zero rate, because with an IQR of 0 every non-zero value falls beyond the 1.5 IQR fence. This pattern is consistent with a sparse event-count, amount, or usage metric where most entities are inactive but a small tail drives enormous values.

Treatment: Apply log1p-transform or treat as two-part model (zero vs. non-zero) before regression or ML use.

anthropic:default · confidence high

Out[101]:

saturn.columns["column29"].stats

stat	value
n	122,611
nulls	0 (0.0%)
unique	3,037
min	0
max	3.43e+06
mean	208
median	0
std	1.122e+04
q1	0
q3	0
iqr	0
skew	262.9
kurtosis	7.57e+04
n_outliers	26,119
outlier_rate	0.213
zero_rate	0.787
alert: high_skew	skew=+262.89
alert: outliers	21.3% rows beyond 1.5 IQR

Fig 38.

Distribution of column29. Vertical dash marks the median.

Histogram bins for column29 (median: 0.0).
bin	count
0 – 8.574e+04	122594
8.574e+04 – 1.715e+05	10
1.715e+05 – 2.572e+05	3
2.572e+05 – 3.43e+05	1
3.43e+05 – 4.287e+05	1
4.287e+05 – 5.144e+05	0
5.144e+05 – 6.002e+05	0
6.002e+05 – 6.859e+05	0
6.859e+05 – 7.716e+05	0
7.716e+05 – 8.574e+05	0
8.574e+05 – 9.431e+05	0
9.431e+05 – 1.029e+06	0
1.029e+06 – 1.115e+06	0
1.115e+06 – 1.2e+06	0
1.2e+06 – 1.286e+06	0
1.286e+06 – 1.372e+06	0
1.372e+06 – 1.458e+06	0
1.458e+06 – 1.543e+06	0
1.543e+06 – 1.629e+06	0
1.629e+06 – 1.715e+06	1
1.715e+06 – 1.801e+06	0
1.801e+06 – 1.886e+06	0
1.886e+06 – 1.972e+06	0
1.972e+06 – 2.058e+06	0
2.058e+06 – 2.143e+06	0
2.143e+06 – 2.229e+06	0
2.229e+06 – 2.315e+06	0
2.315e+06 – 2.401e+06	0
2.401e+06 – 2.486e+06	0
2.486e+06 – 2.572e+06	0
2.572e+06 – 2.658e+06	0
2.658e+06 – 2.744e+06	0
2.744e+06 – 2.829e+06	0
2.829e+06 – 2.915e+06	0
2.915e+06 – 3.001e+06	0
3.001e+06 – 3.087e+06	0
3.087e+06 – 3.172e+06	0
3.172e+06 – 3.258e+06	0
3.258e+06 – 3.344e+06	0
3.344e+06 – 3.43e+06	1

column30 numeric feature

This column is a heavily zero-inflated count or amount field: 96.8% of its 122,611 rows are exactly zero, driving a median of 0.0 and an IQR of 0.0. The remaining values are extremely skewed (skew = 51.68, kurtosis = 3252.96), with a mean of 13.79 pulled far right by a maximum of 20,088, consistent with a rare but occasionally large quantity. The 3,898 flagged outliers (3.2% of rows) are simply the non-zero rows, since an IQR of 0 puts the 1.5 IQR fence at zero; they therefore carry all of the column's variance by construction.

Treatment: Apply zero-inflated modelling or split into a binary indicator plus a log-transformed positive-value sub-model before regression.

anthropic:default · confidence high

Out[104]:

saturn.columns["column30"].stats

stat	value
n	122,611
nulls	0 (0.0%)
unique	993
min	0
max	20,088
mean	13.79
median	0
std	270.4
q1	0
q3	0
iqr	0
skew	51.68
kurtosis	3253
n_outliers	3,898
outlier_rate	0.03179
zero_rate	0.9682
alert: high_skew	skew=+51.68

Fig 39.

Distribution of column30. Vertical dash marks the median.

Histogram bins for column30 (median: 0.0).
bin	count
0 – 502.2	121943
502.2 – 1004	377
1004 – 1507	112
1507 – 2009	57
2009 – 2511	35
2511 – 3013	14
3013 – 3515	7
3515 – 4018	7
4018 – 4520	9
4520 – 5022	2
5022 – 5524	2
5524 – 6026	3
6026 – 6529	4
6529 – 7031	5
7031 – 7533	2
7533 – 8035	2
8035 – 8537	4
8537 – 9040	1
9040 – 9542	0
9542 – 1.004e+04	2
1.004e+04 – 1.055e+04	2
1.055e+04 – 1.105e+04	1
1.105e+04 – 1.155e+04	0
1.155e+04 – 1.205e+04	1
1.205e+04 – 1.256e+04	0
1.256e+04 – 1.306e+04	1
1.306e+04 – 1.356e+04	3
1.356e+04 – 1.406e+04	0
1.406e+04 – 1.456e+04	0
1.456e+04 – 1.507e+04	0
1.507e+04 – 1.557e+04	0
1.557e+04 – 1.607e+04	0
1.607e+04 – 1.657e+04	3
1.657e+04 – 1.707e+04	3
1.707e+04 – 1.758e+04	0
1.758e+04 – 1.808e+04	0
1.808e+04 – 1.858e+04	0
1.858e+04 – 1.908e+04	0
1.908e+04 – 1.959e+04	0
1.959e+04 – 2.009e+04	9

column31 numeric feature

This column is a sparse count or activity metric: 78.7% of records are zero, giving a median of 0 and an IQR of 0. The distribution is extraordinarily right-skewed (skew = 263.99, kurtosis = 76112.44), with a maximum of 3,429,544 against a mean of only 173.57, so a tiny fraction of records carries massive values. The 21.3% outlier rate (26,119 rows) follows from the zero IQR rather than from data errors: with q1 = q3 = 0, the 1.5 IQR rule flags every non-zero row. The heavy tail itself is visible in the histogram, where all but 19 rows fall in the first bin.

Treatment: Apply log1p-transform (or a zero-inflated model) before regression; consider capping at a high percentile to manage extreme outliers.
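
A minimal sketch of the cap-then-log1p treatment, using a nearest-rank percentile so it needs only the standard library. The helper names and the toy values are illustrative assumptions, not part of saturn.

```python
import math

def nearest_rank_percentile(sorted_values, p):
    """Nearest-rank percentile (0 < p <= 100) of an already sorted list."""
    rank = math.ceil(p * len(sorted_values) / 100)
    return sorted_values[max(rank, 1) - 1]

def cap_then_log1p(values, p=99.0):
    """Cap values at the p-th percentile, then apply log(1 + x)."""
    cap = nearest_rank_percentile(sorted(values), p)
    return cap, [math.log1p(min(v, cap)) for v in values]

# Toy values: eight zeros, one moderate value, one extreme value.
sample = [0] * 8 + [10, 1_000_000]
cap, transformed = cap_then_log1p(sample, p=90)
```

With 78.7% of rows at zero, any cap percentile below that share would itself be zero, so the cap needs to sit well inside the non-zero tail (for example the 99th percentile or higher).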

anthropic:default · confidence high

Out[107]:

saturn.columns["column31"].stats

stat	value
n	122,611
nulls	0 (0.0%)
unique	2,511
min	0
max	3.43e+06
mean	173.6
median	0
std	1.12e+04
q1	0
q3	0
iqr	0
skew	264
kurtosis	7.611e+04
n_outliers	26,119
outlier_rate	0.213
zero_rate	0.787
alert: high_skew	skew=+263.99
alert: outliers	21.3% rows beyond 1.5 IQR
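
The outlier alert is easier to interpret with the fence arithmetic written out. The sketch below assumes the alert uses the usual Tukey fences (q1 - 1.5 IQR, q3 + 1.5 IQR); saturn's exact quantile method is not stated in this report. When q1 = q3 = 0, both fences collapse to 0 and every non-zero value is flagged.

```python
import statistics

def tukey_outlier_flags(values, k=1.5):
    """Flag values outside [q1 - k*IQR, q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v < lower or v > upper for v in values]

# Toy values: 80% zeros, so q1 = q3 = 0 and the IQR is 0.
sample = [0] * 8 + [3, 5000]
flags = tukey_outlier_flags(sample)
```

Here the small value 3 is flagged alongside 5000, which is why the outlier rate (0.213) equals one minus the zero rate (0.787) for this column.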

Fig 40.

Distribution of column31. Vertical dash marks the median.

Histogram bins for column31 (median: 0.0).
bin	count
0 – 8.574e+04	122592
8.574e+04 – 1.715e+05	12
1.715e+05 – 2.572e+05	3
2.572e+05 – 3.43e+05	1
3.43e+05 – 4.287e+05	1
4.287e+05 – 5.144e+05	0
5.144e+05 – 6.002e+05	0
6.002e+05 – 6.859e+05	0
6.859e+05 – 7.716e+05	0
7.716e+05 – 8.574e+05	0
8.574e+05 – 9.431e+05	0
9.431e+05 – 1.029e+06	0
1.029e+06 – 1.115e+06	0
1.115e+06 – 1.2e+06	0
1.2e+06 – 1.286e+06	0
1.286e+06 – 1.372e+06	0
1.372e+06 – 1.458e+06	0
1.458e+06 – 1.543e+06	0
1.543e+06 – 1.629e+06	0
1.629e+06 – 1.715e+06	1
1.715e+06 – 1.801e+06	0
1.801e+06 – 1.886e+06	0
1.886e+06 – 1.972e+06	0
1.972e+06 – 2.058e+06	0
2.058e+06 – 2.143e+06	0
2.143e+06 – 2.229e+06	0
2.229e+06 – 2.315e+06	0
2.315e+06 – 2.401e+06	0
2.401e+06 – 2.486e+06	0
2.486e+06 – 2.572e+06	0
2.572e+06 – 2.658e+06	0
2.658e+06 – 2.744e+06	0
2.744e+06 – 2.829e+06	0
2.829e+06 – 2.915e+06	0
2.915e+06 – 3.001e+06	0
3.001e+06 – 3.087e+06	0
3.087e+06 – 3.172e+06	0
3.172e+06 – 3.258e+06	0
3.258e+06 – 3.344e+06	0
3.344e+06 – 3.43e+06	1

column32 numeric feature

This column is a sparse count or occurrence field. The zero_rate of 96.8% means the vast majority of rows are zero, while the remaining ~3.2% drive an extreme right tail (skew = 48.9, kurtosis = 2848.5) reaching a maximum of 20,088 against a median of 0 and a mean of 14.7. The IQR of 0 means the middle 50% of the distribution sits at zero, so the 3,898 flagged outliers are the non-zero rows and carry all of the variance. The n, unique count (993), maximum, zero_rate, and n_outliers match column30 exactly, which suggests the two columns are closely related measures and may be largely redundant for modelling.

Treatment: Apply log1p transform or treat as binary (zero vs. non-zero) flag before modelling; consider capping at a high percentile to suppress the extreme tail.
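
The stats tables for column30 and column32 report the same unique count (993), max (20,088), zero_rate (0.9682), and n_outliers (3,898). Before transforming both, it may be worth checking how far they agree row by row. The sketch below is one way to do that; `column_agreement` is a hypothetical helper and the values are toy data.

```python
def column_agreement(a, b):
    """Return (share of identical values, share of rows with the same zero/non-zero state)."""
    if len(a) != len(b):
        raise ValueError("columns must have the same length")
    n = len(a)
    same_value = sum(x == y for x, y in zip(a, b)) / n
    same_zero_pattern = sum((x == 0) == (y == 0) for x, y in zip(a, b)) / n
    return same_value, same_zero_pattern

# Toy columns that share their zeros and most of their values.
col_a = [0, 0, 120, 0, 20088, 35]
col_b = [0, 0, 95, 0, 20088, 35]
same_value, same_zero_pattern = column_agreement(col_a, col_b)
```

If the zero pattern agrees on every row, a single binary flag covers both columns, and only one of the two positive-value parts may be needed.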

anthropic:default · confidence high

Out[110]:

saturn.columns["column32"].stats

stat	value
n	122,611
nulls	0 (0.0%)
unique	993
min	0
max	20,088
mean	14.72
median	0
std	294.5
q1	0
q3	0
iqr	0
skew	48.91
kurtosis	2848
n_outliers	3,898
outlier_rate	0.03179
zero_rate	0.9682
alert: high_skew	skew=+48.91

Fig 41.

Distribution of column32. Vertical dash marks the median.

Histogram bins for column32 (median: 0.0).
bin	count
0 – 502.2	121952
502.2 – 1004	342
1004 – 1507	114
1507 – 2009	66
2009 – 2511	35
2511 – 3013	19
3013 – 3515	5
3515 – 4018	8
4018 – 4520	12
4520 – 5022	6
5022 – 5524	2
5524 – 6026	4
6026 – 6529	5
6529 – 7031	5
7031 – 7533	1
7533 – 8035	1
8035 – 8537	3
8537 – 9040	1
9040 – 9542	0
9542 – 1.004e+04	0
1.004e+04 – 1.055e+04	2
1.055e+04 – 1.105e+04	1
1.105e+04 – 1.155e+04	1
1.155e+04 – 1.205e+04	1
1.205e+04 – 1.256e+04	1
1.256e+04 – 1.306e+04	1
1.306e+04 – 1.356e+04	3
1.356e+04 – 1.406e+04	0
1.406e+04 – 1.456e+04	0
1.456e+04 – 1.507e+04	1
1.507e+04 – 1.557e+04	0
1.557e+04 – 1.607e+04	1
1.607e+04 – 1.657e+04	3
1.657e+04 – 1.707e+04	4
1.707e+04 – 1.758e+04	0
1.758e+04 – 1.808e+04	0
1.808e+04 – 1.858e+04	0
1.858e+04 – 1.908e+04	0
1.908e+04 – 1.959e+04	1
1.959e+04 – 2.009e+04	10

column33 text label

This column contains game developer or publisher names, as shown by top values such as 'Choice of Games' and 'KOEI TECMO GAMES CO., LTD.' and by a vocabulary dominated by 'games', 'studio', 'studios', 'interactive', and 'entertainment'. The duplicate rate of 37.98% (43,364 duplicates among the 114,180 non-null rows) is expected, since a company releases multiple titles. The 70,816 unique values and a maximum length of 584 characters suggest occasional free-text entries or combined multi-publisher strings. The one-word rate of 31.8% and mean word count of about 2 are consistent with company-name formats, but the long tail beyond the 95th-percentile length of 27 characters warrants inspection.

Treatment: Normalize casing and strip punctuation variants before grouping; use as a categorical grouping key or encode as a feature via target/frequency encoding.
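
A minimal sketch of the normalise-then-frequency-encode step. The normalisation rules (casefold, punctuation to spaces, collapse whitespace) are an assumption about what "punctuation variants" should cover; a real pipeline may need extra rules for suffixes such as 'Inc.' or 'LLC'.

```python
import re
from collections import Counter

def normalize_name(name):
    """Casefold, replace punctuation with spaces, and collapse whitespace."""
    if name is None:
        return None
    cleaned = re.sub(r"[^\w\s]", " ", name.casefold())
    return " ".join(cleaned.split())

def frequency_encode(names):
    """Map each name to the share of non-null rows with the same normalised key."""
    keys = [normalize_name(n) for n in names]
    counts = Counter(k for k in keys if k is not None)
    total = sum(counts.values())
    return [counts[k] / total if k is not None else 0.0 for k in keys]

# Toy rows: two casing variants of one company, one other company, one null.
names = ["Choice of Games", "CHOICE OF GAMES", "KOEI TECMO GAMES CO., LTD.", None]
encoded = frequency_encode(names)
```

Nulls are encoded as 0.0 here; a separate missing indicator would be a reasonable alternative given the 6.9% null rate.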

anthropic:default · confidence high

Out[113]:

saturn.columns["column33"].stats

stat	value
n	122,611
nulls	8,431 (6.9%)
unique	70,816
len_min	1
len_max	584
len_mean	14.37
len_median	13
len_p95	27
word_mean	2.019
word_median	2
n_empty	0
n_duplicates	43,364
duplicate_rate	0.3798
vocab_size	18,429
readability_flesch_mean	38.73
emoji_rate	0.0008933
url_rate	0.000219
one_word_rate	0.3181
allcaps_rate	0.07974
boilerplate_rate	0
alert: one_word	31.8% rows are a single word
alert: duplicates	38.0% duplicate strings
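
The duplicate figures in the table are consistent with both the duplicate count and the rate being computed over non-null rows. This is inferred from the reported numbers, not from saturn's documentation; the arithmetic is:

```python
# Values copied from the column33 stats table.
n, nulls, unique = 122_611, 8_431, 70_816

non_null = n - nulls                      # rows that actually hold a string
n_duplicates = non_null - unique          # rows beyond the first copy of each string
duplicate_rate = n_duplicates / non_null  # share of non-null rows that are repeats
```

Dividing by all 122,611 rows instead would give about 0.354, which does not match the reported 0.3798.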

Fig 42.

Character-length distribution for column33.

Character-length distribution for column33 (mean: 14.37).
chars	count
1 – 16	77264
16 – 30	33047
30 – 45	2696
45 – 59	611
59 – 74	241
74 – 88	129
88 – 103	71
103 – 118	34
118 – 132	18
132 – 147	15
147 – 161	10
161 – 176	7
176 – 190	7
190 – 205	8
205 – 220	2
220 – 234	4
234 – 249	1
249 – 263	4
263 – 278	1
278 – 292	2
292 – 307	2
307 – 322	0
322 – 336	1
336 – 351	0
351 – 365	0
365 – 380	1
380 – 395	0
395 – 409	0
409 – 424	1
424 – 438	0
438 – 453	0
453 – 467	0
467 – 482	0
482 – 497	0
497 – 511	0
511 – 526	0
526 – 540	1
540 – 555	1
555 – 569	0
569 – 584	1

column34 text label

This column contains game publisher or developer company names, as shown by top values like 'BFG Entertainment', 'Choice of Games', and 'Strategy First', and by top words dominated by 'games', 'studio', 'studios', 'entertainment', and corporate suffixes ('llc', 'inc.', 'ltd.'). The duplicate rate is high at 44.9% (51,089 duplicates among the 113,778 non-null rows), which is expected since many games share a publisher. The one-word rate of 31.8% reflects single-token studio names, and the 7.2% null rate (8,833 rows) marks records with no company recorded.

Treatment: Encode as a categorical feature (e.g. frequency or target encoding); investigate nulls at 7.2% before modelling.
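
One way to combine the two parts of this treatment is smoothed target encoding with nulls kept as their own category. The sketch below is illustrative only: the target values, the `smoothing` weight, and the `__missing__` label are assumptions, and in practice the encoding should be fitted on training folds only to avoid target leakage.

```python
from collections import defaultdict

def target_encode(categories, targets, smoothing=10.0, missing="__missing__"):
    """Replace each category with a mean of its targets, shrunk toward the global mean."""
    keys = [c if c is not None else missing for c in categories]
    global_mean = sum(targets) / len(targets)
    sums, counts = defaultdict(float), defaultdict(int)
    for key, target in zip(keys, targets):
        sums[key] += target
        counts[key] += 1
    # Rare categories (small count) stay close to the global mean.
    encoded = {
        key: (sums[key] + smoothing * global_mean) / (counts[key] + smoothing)
        for key in counts
    }
    return [encoded[key] for key in keys]

# Toy data: publisher "A" twice, "B" once, and one null publisher.
publishers = ["A", "A", "B", None]
targets = [1.0, 3.0, 10.0, 2.0]
enc = target_encode(publishers, targets, smoothing=2.0)
```

With 62,689 unique values, most publishers appear only once or twice, so the smoothing term matters more here than it would for a low-cardinality column.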

anthropic:default · confidence high

Out[116]:

saturn.columns["column34"].stats

stat	value
n	122,611
nulls	8,833 (7.2%)
unique	62,689
len_min	1
len_max	164
len_mean	13.82
len_median	13
len_p95	26
word_mean	1.988
word_median	2
n_empty	0
n_duplicates	51,089
duplicate_rate	0.449
vocab_size	15,765
readability_flesch_mean	40.22
emoji_rate	0.0009141
url_rate	0.0002285
one_word_rate	0.3178
allcaps_rate	0.0817
boilerplate_rate	0
alert: one_word	31.8% rows are a single word
alert: duplicates	44.9% duplicate strings

Fig 43.

Character-length distribution for column34.

Character-length distribution for column34 (mean: 13.82).
chars	count
1 – 5	6733
5 – 9	22067
9 – 13	32481
13 – 17	28105
17 – 21	13359
21 – 25	5059
25 – 30	2901
30 – 34	1309
34 – 38	764
38 – 42	388
42 – 46	199
46 – 50	132
50 – 54	58
54 – 58	90
58 – 62	46
62 – 66	35
66 – 70	11
70 – 74	15
74 – 78	7
78 – 82	5
82 – 87	2
87 – 91	1
91 – 95	1
95 – 99	2
99 – 103	3
103 – 107	1
107 – 111	0
111 – 115	0
115 – 119	0
119 – 123	0
123 – 127	1
127 – 131	1
131 – 135	0
135 – 140	0
140 – 144	0
144 – 148	0
148 – 152	0
152 – 156	0
156 – 160	0
160 – 164	2

column35 text feature

This column contains a comma-delimited list of Steam game features/categories (e.g., 'Single-player', 'Steam Achievements', 'Family Sharing', 'Full controller support'), typical of the Steam store's supported-features field. The extreme duplicate rate (88.3%, 100,367 of the 113,658 non-null rows) is expected because many games share identical feature sets, and the small vocabulary of 589 words confirms a finite, enumerated tag system. The 'da' language detection on 12 rows is almost certainly a false positive from short comma-separated tokens, not actual Danish text. With only 13,291 unique combinations, this column is well suited to multi-label binarization.

Treatment: Split on commas and one-hot encode each feature tag for modelling.
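
A minimal sketch of the split-and-encode step in plain Python. It assumes no feature name contains a comma and treats null rows as an empty feature set; `multi_hot` is a hypothetical helper, and the sample rows reuse feature names quoted above.

```python
def multi_hot(rows, sep=","):
    """Return (sorted vocabulary, 0/1 matrix) for delimiter-separated tag strings."""
    token_lists = [
        [t.strip() for t in row.split(sep) if t.strip()] if row else []
        for row in rows
    ]
    vocab = sorted({t for tokens in token_lists for t in tokens})
    index = {t: i for i, t in enumerate(vocab)}
    matrix = []
    for tokens in token_lists:
        vec = [0] * len(vocab)
        for t in tokens:
            vec[index[t]] = 1
        matrix.append(vec)
    return vocab, matrix

rows = ["Single-player,Steam Achievements", "Single-player,Family Sharing", None]
vocab, matrix = multi_hot(rows)
```

Each row can activate several columns at once, so this is strictly a multi-hot encoding; libraries such as scikit-learn provide an equivalent multi-label binarizer.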

anthropic:default · confidence high

Out[119]:

saturn.columns["column35"].stats

stat	value
n	122,611
nulls	8,953 (7.3%)
unique	13,291
len_min	3
len_max	534
len_mean	71.58
len_median	59
len_p95	178
word_mean	5.089
word_median	4
n_empty	0
n_duplicates	100,367
duplicate_rate	0.8831
vocab_size	589
readability_flesch_mean	-105.9
emoji_rate	0
url_rate	0
one_word_rate	0.04047
allcaps_rate	8.798e-06
boilerplate_rate	0
alert: duplicates	88.3% duplicate strings

Fig 44.

Character-length distribution for column35.

Character-length distribution for column35 (mean: 71.58).
chars	count
3 – 16	4980
16 – 30	24221
30 – 43	6419
43 – 56	20213
56 – 69	12806
69 – 83	10422
83 – 96	9469
96 – 109	5839
109 – 122	3729
122 – 136	2863
136 – 149	2700
149 – 162	2033
162 – 176	1967
176 – 189	1499
189 – 202	1190
202 – 215	830
215 – 229	625
229 – 242	475
242 – 255	363
255 – 268	282
268 – 282	175
282 – 295	137
295 – 308	105
308 – 322	70
322 – 335	66
335 – 348	54
348 – 361	38
361 – 375	19
375 – 388	21
388 – 401	18
401 – 415	6
415 – 428	7
428 – 441	3
441 – 454	4
454 – 468	1
468 – 481	6
481 – 494	1
494 – 507	0
507 – 521	0
521 – 534	2

column36 text label

This column contains comma-separated game genre tags (e.g., 'Casual,Indie', 'Action,Adventure,Indie'), consistent with a Steam or similar game catalog dataset. The duplicate rate is extremely high at 97.5% because games share genre combinations: only 2,894 unique tag-sets exist among the 114,198 non-null rows. The top words 'to', 'access', and 'play' most likely come from multi-word genre labels such as 'Early Access' and 'Free to Play', which Steam lists as genres, rather than from free text; they surface as separate words only because word statistics are computed on whitespace. The same tokenization explains the one-word alert (78.9%): a value like 'Casual,Indie' contains no spaces and counts as a single word.

Treatment: Split on commas to multi-hot encode genre tags before modelling, keeping multi-word labels such as 'Early Access' and 'Free to Play' intact as single tags.
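
One way to check where the top words 'to', 'access', and 'play' come from is to count tags after splitting on commas and compare them with whitespace-based word counts. The sketch below uses toy rows; both helpers are hypothetical.

```python
from collections import Counter

def tag_counts(rows, sep=","):
    """Count tags after splitting each row on the delimiter."""
    counts = Counter()
    for row in rows:
        if row:
            counts.update(t.strip() for t in row.split(sep) if t.strip())
    return counts

def word_counts(rows):
    """Count whitespace-separated tokens, as a word-level profiler would."""
    counts = Counter()
    for row in rows:
        if row:
            counts.update(row.lower().split())
    return counts

rows = ["Casual,Indie", "Action,Free to Play", "Indie,Early Access", None]
tags = tag_counts(rows)
words = word_counts(rows)
```

If 'to' and 'play' never appear as standalone tags in the comma-based counts, they belong to multi-word genre labels and no cleansing is needed for them.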

anthropic:default · confidence high

Out[122]:

saturn.columns["column36"].stats

stat	value
n	122,611
nulls	8,413 (6.9%)
unique	2,894
len_min	3
len_max	236
len_mean	22.21
len_median	21
len_p95	45
word_mean	1.364
word_median	1
n_empty	0
n_duplicates	111,304
duplicate_rate	0.9747
vocab_size	940
readability_flesch_mean	-206.1
emoji_rate	0
url_rate	0
one_word_rate	0.7892
allcaps_rate	0.009781
boilerplate_rate	0
alert: one_word	78.9% rows are a single word
alert: duplicates	97.5% duplicate strings

Fig 45.

Character-length distribution for column36.

Character-length distribution for column36 (mean: 22.21).
chars	count
3 – 9	12259
9 – 15	21084
15 – 20	22318
20 – 26	25837
26 – 32	12284
32 – 38	8026
38 – 44	5596
44 – 50	2995
50 – 55	1587
55 – 61	848
61 – 67	593
67 – 73	229
73 – 79	196
79 – 85	137
85 – 90	71
90 – 96	35
96 – 102	37
102 – 108	13
108 – 114	15
114 – 120	10
120 – 125	6
125 – 131	6
131 – 137	4
137 – 143	2
143 – 149	2
149 – 154	1
154 – 160	0
160 – 166	1
166 – 172	4
172 – 178	0
178 – 184	0
184 – 189	0
189 – 195	0
195 – 201	0
201 – 207	0
207 – 213	1
213 – 219	0
219 – 224	0
224 – 230	0
230 – 236	1

column37 text label

This column contains comma-separated genre/tag lists for software or game products (e.g., 'Adventure,Casual,Hidden Object', 'Action,Indie'), consistent with a Steam-style app catalog. The null rate of 32.02% (39,265 rows) is high and warrants investigation before modelling. A multilingual alert is raised, but the non-English content is negligible (26 records out of 3,376 detected), suggesting near-uniform English data with minor noise. The duplicate rate is low at 7.4% (6,167 duplicates among the 83,346 non-null rows): with a median length of 163 characters, most rows carry a long tag list that few other games share exactly.

Treatment: Split on commas to multi-hot encode genre tags; investigate and decide on imputation strategy for the 32.02% null rows before modelling.
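
Because roughly a third of rows are null, one option is to keep an explicit missing indicator instead of imputing tags, and to encode only the most frequent tags. The sketch below illustrates that choice on toy rows; `encode_top_tags`, the `tags_missing` feature name, and `top_k` are assumptions.

```python
from collections import Counter

def encode_top_tags(rows, top_k=3, sep=","):
    """Return (top tags, per-row feature dicts) with a missing-value indicator."""
    token_lists = [
        None if row is None else [t.strip() for t in row.split(sep) if t.strip()]
        for row in rows
    ]
    counts = Counter(t for tokens in token_lists if tokens for t in tokens)
    top = [t for t, _ in counts.most_common(top_k)]
    features = []
    for tokens in token_lists:
        present = set(tokens or [])
        row = {"tags_missing": int(tokens is None)}
        row.update({f"tag={t}": int(t in present) for t in top})
        features.append(row)
    return top, features

rows = ["Action,Indie", "Action,Indie,Casual", "Action", None]
top, features = encode_top_tags(rows, top_k=2)
```

The indicator lets a model distinguish "no tags recorded" from "tags recorded, none of them in the top set", which plain zero-filling would conflate.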

anthropic:default · confidence high

Out[125]:

saturn.columns["column37"].stats

stat	value
n	122,611
nulls	39,265 (32.0%)
unique	77,179
len_min	3
len_max	295
len_mean	141.3
len_median	163
len_p95	228
word_mean	4.923
word_median	5
n_empty	0
n_duplicates	6,167
duplicate_rate	0.07399
vocab_size	57,260
readability_flesch_mean	-449.7
emoji_rate	0
url_rate	0
one_word_rate	0.1233
allcaps_rate	4.799e-05
boilerplate_rate	0
alert: multilingual	9 languages detected in sample
alert: null_rate	32.0% null

Fig 46.

Character-length distribution for column37.

Character-length distribution for column37 (mean: 141.3).
chars	count
3 – 10	584
10 – 18	1637
18 – 25	2337
25 – 32	3056
32 – 40	2371
40 – 47	2362
47 – 54	2419
54 – 61	1862
61 – 69	1883
69 – 76	1806
76 – 83	1905
83 – 91	1584
91 – 98	1595
98 – 105	1879
105 – 112	1648
112 – 120	1657
120 – 127	1916
127 – 134	1722
134 – 142	1741
142 – 149	1845
149 – 156	2108
156 – 164	1958
164 – 171	2426
171 – 178	3392
178 – 186	3991
186 – 193	4902
193 – 200	6178
200 – 207	5471
207 – 215	4901
215 – 222	3666
222 – 229	2999
229 – 237	1612
237 – 244	918
244 – 251	599
251 – 258	239
258 – 266	120
266 – 273	37
273 – 280	13
280 – 288	5
288 – 295	2

column38 text metadata

This column contains comma-separated lists of Steam screenshot URLs (Akamai CDN), one packed string per row holding all screenshot images for a given game. Every value counts as 'one word' because the URLs are joined without whitespace, which explains the one_word_rate of 1.0 alongside a mean length of about 1,319 characters and a maximum of 29,132. With 116,483 unique values among the 116,593 non-null rows and only 110 duplicates, the column is near-unique; the few duplicates likely reflect games with identical screenshot sets.

Treatment: Split on commas to extract individual screenshot URLs per game; store as a list-type column or explode into a separate screenshots table keyed by game id.
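
A minimal sketch of the explode step, producing one row per screenshot keyed by game id. It assumes URLs contain no commas; the ids and the example.com URLs are placeholders, and `explode_screenshots` is a hypothetical helper.

```python
def explode_screenshots(games, sep=","):
    """games: iterable of (game_id, packed_urls or None) -> list of row dicts."""
    rows = []
    for game_id, packed in games:
        if not packed:
            continue  # null or empty: the game contributes no screenshot rows
        urls = [u.strip() for u in packed.split(sep) if u.strip()]
        for position, url in enumerate(urls):
            rows.append({"game_id": game_id, "position": position, "url": url})
    return rows

games = [
    (10, "https://example.com/a.jpg,https://example.com/b.jpg"),
    (20, None),
    (30, "https://example.com/c.jpg"),
]
screenshots = explode_screenshots(games)
```

Keeping `position` preserves the original ordering of screenshots within each game, which a plain split would otherwise lose once the rows are stored in a separate table.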

anthropic:default · confidence high

Out[128]:

saturn.columns["column38"].stats

stat	value
n	122,611
nulls	6,018 (4.9%)
unique	116,483
len_min	144
len_max	29,132
len_mean	1319
len_median	1,039
len_p95	2,773
word_mean	1
word_median	1
n_empty	0
n_duplicates	110
duplicate_rate	0.0009435
vocab_size	19,994
readability_flesch_mean	-5099
emoji_rate	0
url_rate	1
one_word_rate	1
allcaps_rate	0
boilerplate_rate	0
alert: near_unique	99.9% of rows are unique strings
alert: one_word	100.0% rows are a single word
alert: url_heavy	100.0% rows contain a URL

Fig 47.

Character-length distribution for column38.

Character-length distribution for column38 (mean: 1319).
chars	count
144 – 869	28597
869 – 1593	59377
1593 – 2318	18118
2318 – 3043	6216
3043 – 3768	2157
3768 – 4492	959
4492 – 5217	470
5217 – 5942	274
5942 – 6666	167
6666 – 7391	95
7391 – 8116	47
8116 – 8840	29
8840 – 9565	25
9565 – 10290	20
10290 – 11014	7
11014 – 11739	10
11739 – 12464	2
12464 – 13189	4
13189 – 13913	2
13913 – 14638	0
14638 – 15363	1
15363 – 16087	5
16087 – 16812	2
16812 – 17537	1
17537 – 18262	0
18262 – 18986	1
18986 – 19711	0
19711 – 20436	0
20436 – 21160	4
21160 – 21885	0
21885 – 22610	0
22610 – 23334	0
23334 – 24059	0
24059 – 24784	1
24784 – 25508	0
25508 – 26233	0
26233 – 26958	1
26958 – 27683	0
27683 – 28407	0
28407 – 29132	1

column39 unknown other

This column was skipped by the profiler, so its content and type are entirely unknown. With 122,611 rows, zero nulls, and no computed statistics or uniqueness information, no data-driven characterisation is possible. The 'skipped' alert is the only signal available.

Treatment: Manually inspect raw values to determine type and role before any further processing.
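
Manual inspection can start with a small sample of raw values and a rough type tally. The sketch below uses only the standard library and runs on an in-memory toy CSV; `peek_column` and `guess_type` are hypothetical helpers, and reading the real file by position assumes that the columnNN names are zero-based positional labels.

```python
import csv
import io
from collections import Counter

def guess_type(value):
    """Very rough type guess for a raw CSV cell."""
    if value == "":
        return "empty"
    for kind, cast in (("int", int), ("float", float)):
        try:
            cast(value)
            return kind
        except ValueError:
            pass
    return "text"

def peek_column(handle, index, limit=1000, has_header=True):
    """Return (first few raw values, Counter of guessed types) for one column."""
    reader = csv.reader(handle)
    if has_header:
        next(reader, None)
    samples, kinds = [], Counter()
    for i, row in enumerate(reader):
        if i >= limit:
            break
        value = row[index] if index < len(row) else ""
        kinds[guess_type(value)] += 1
        if len(samples) < 5:
            samples.append(value)
    return samples, kinds

toy = io.StringIO("id,extra\n1,foo\n2,3.5\n3,\n4,7\n")
samples, kinds = peek_column(toy, index=1)
```

For the real file, the same function could be called with `open(path, newline="", encoding="utf-8")` and `index=39`, under the positional-naming assumption above.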

anthropic:default · confidence low

Out[131]:

saturn.columns["column39"].stats

stat	value
n	122,611
nulls	0 (0.0%)
unique	—
alert: skipped	no profiler for kind=unknown

How to cite