data trove steam users

saturn notebook · generated 2026-06-21

Overview

Source: /home/coolhand/html/datavis/data_trove/entertainment/gaming/users_metadata.json

Saturn profiled 1 row across 6 columns. The statistics below are deterministic and machine-readable; the prose is a language-model interpretation of those statistics (opt-in, added after the fact, and never shown the raw rows).

[2]:

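# Install the profiler (IPython shell escape; run inside a notebook cell).
!pip install saturn-dissect
import subprocess

# Run the saturn CLI on the metadata file, with the flags used for this report.
subprocess.run([
    "saturn", "analyze", "/home/coolhand/html/datavis/data_trove/entertainment/gaming/users_metadata.json",
    "--findings", "data-trove-steam-users.json",
    "--llm", "anthropic:default",
])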

Summary confidence: low

This is a single-row metadata record describing the Steam Users dataset, a collection of 14,306,064 Steam user profiles sourced from Steam Store data (likely via Kaggle or SteamSpy) and last updated on 2025-01-20. It is a data catalogue entry rather than an analytical dataset: it points analysts to the actual user data file (185 MB), which links to recommendations.csv via user_id. The main point is scale: more than 14 million user profiles covering library size and review activity. Before any analysis, locate the user data file and join it to recommendations.csv on user_id; the relational value of the dataset lies in that join.

citing: record_count.max · notes.top_value · source.top_value · last_updated.top_value · row_count
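
The join mentioned in the summary can be sketched with the standard library alone. This is a minimal illustration, not the dataset's documented schema: only the key user_id comes from the notes field, while the other column names (products, app_id) and the in-memory rows are hypothetical stand-ins for what csv.DictReader would yield from the real files.

```python
def join_on_user_id(users, recommendations):
    """Inner-join recommendation rows to user rows on user_id."""
    # Index the user table once so each lookup is O(1).
    users_by_id = {row["user_id"]: row for row in users}
    joined = []
    for rec in recommendations:
        user = users_by_id.get(rec["user_id"])
        if user is None:
            continue  # recommendation with no matching user profile
        joined.append({**user, **rec})
    return joined


# Hypothetical rows; real ones would come from csv.DictReader.
users = [
    {"user_id": "1", "products": "12"},
    {"user_id": "2", "products": "3"},
]
recommendations = [
    {"user_id": "1", "app_id": "100"},
    {"user_id": "1", "app_id": "200"},
    {"user_id": "9", "app_id": "300"},
]

rows = join_on_user_id(users, recommendations)
print(len(rows))
```

With roughly 14 million user rows, an in-memory index like this may be too large for some machines; a database or a chunked reader would be a reasonable alternative.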

Out[4]:

saturn.schema() · 6 columns

column	kind	n	null%	unique	alerts
dataset_name	categorical	1	0.0%	1	long_tail imbalance
last_updated	categorical	1	0.0%	1	long_tail imbalance
source	categorical	1	0.0%	1	long_tail imbalance
record_count	numeric	1	0.0%	1	constant
fields	unknown	1	0.0%	—	skipped
notes	categorical	1	0.0%	1	long_tail imbalance

Fig 1.

dataset_name · Confirms this is a single-entry catalogue record for the 'Steam Users' dataset — useful as a quick identity check.

Top values for dataset_name (1 unique shown, of 1 total).
value	count	share
Steam Users	1	100.0%

Fig 2.

source · Shows the data provenance (Steam Store via Kaggle or SteamSpy), which is critical for understanding lineage and any licensing constraints.

Top values for source (1 unique shown, of 1 total).
value	count	share
Steam Store user data (likely via Kaggle or SteamSpy)	1	100.0%

Fig 3.

last_updated · Displays the single update timestamp of 2025-01-20, flagging how current the underlying data is.

Top values for last_updated (1 unique shown, of 1 total).
value	count	share
2025-01-20	1	100.0%

Fig 4.

notes · Renders the full metadata notes field, which contains the key facts about file size, record scope, and the join key to recommendations.csv.

Top values for notes (1 unique shown, of 1 total).
value	count	share
14.3 million Steam user profiles with library size and review activity. File is 185 MB. Links to recommendations.csv via user_id.	1	100.0%

Fig 5.

Per-column null rate across the corpus. Columns are ordered by input position.

Per-column null rate across the corpus.
column	kind	null %
dataset_name	categorical	0.0%
last_updated	categorical	0.0%
source	categorical	0.0%
record_count	numeric	0.0%
fields	unknown	0.0%
notes	categorical	0.0%
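
A per-column null rate like the one tabulated above can be computed as follows. This is a sketch of the general technique, not saturn's implementation; the record below reuses the six column names and the values reported in this notebook, except for fields, whose value was not profiled and is shown as a placeholder empty list.

```python
def null_rates(records):
    """Return {column: share of records where the value is None}."""
    if not records:
        return {}
    # Keep columns in input order, taken from the first record.
    columns = list(records[0].keys())
    total = len(records)
    return {
        col: sum(1 for rec in records if rec.get(col) is None) / total
        for col in columns
    }


record = {
    "dataset_name": "Steam Users",
    "last_updated": "2025-01-20",
    "source": "Steam Store user data (likely via Kaggle or SteamSpy)",
    "record_count": 14306064,
    "fields": [],  # placeholder: the real value is not shown in this report
    "notes": "14.3 million Steam user profiles with library size and review activity.",
}

rates = null_rates([record])
for column, rate in rates.items():
    print(f"{column}\t{rate:.1%}")
```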

dataset_name categorical metadata

This column is a dataset-level identifier or metadata tag indicating the source dataset, with every row labelled 'Steam Users'. With only 1 row and 1 unique value, the column carries zero entropy (0.0) and a top_rate of 1.0 — it is a constant and provides no discriminative information. The long_tail and imbalance alerts are technically correct but trivially explained by the single-row, single-value nature of the data.

Treatment: Drop before modelling; constant column with no variance and only 1 row.

anthropic:default · confidence high

Out[11]:

saturn.columns["dataset_name"].stats

stat	value
n	1
nulls	0 (0.0%)
unique	1
top_value	Steam Users
top_rate	1
cardinality	1
entropy	0
entropy_ratio	0
alert: long_tail	1 singleton categories
alert: imbalance	top value is 100.0% of rows
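
The categorical statistics above can be reproduced with a short sketch, which also shows why a single-row column must have zero entropy and a top_rate of 1. The function name and the definition of entropy_ratio (Shannon entropy divided by log2 of the cardinality, taken as 0 when there is only one category) are assumptions for illustration; saturn's exact definitions are not given in this report.

```python
import math
from collections import Counter


def categorical_stats(values):
    """Summarise a non-empty list of categorical values."""
    counts = Counter(values)
    n = len(values)
    cardinality = len(counts)
    top_value, top_count = counts.most_common(1)[0]
    # Shannon entropy in bits.
    entropy = sum(-(c / n) * math.log2(c / n) for c in counts.values())
    # Assumed normalisation: divide by the maximum possible entropy.
    max_entropy = math.log2(cardinality) if cardinality > 1 else 0.0
    return {
        "n": n,
        "cardinality": cardinality,
        "top_value": top_value,
        "top_rate": top_count / n,
        "entropy": entropy,
        "entropy_ratio": entropy / max_entropy if max_entropy > 0 else 0.0,
        "singletons": sum(1 for c in counts.values() if c == 1),
    }


stats = categorical_stats(["Steam Users"])
print(stats)
```

With one observation the only category has probability 1, so its entropy term is 1 × log2(1) = 0, and that category is also a singleton holding 100% of rows. Both alerts therefore fire on any one-row categorical column.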

Fig 6.

Top values for dataset_name.

Top values for dataset_name (1 unique shown, of 1 total).
value	count	share
Steam Users	1	100.0%

last_updated categorical timestamp

This column is a timestamp or date field indicating when a record was last updated, stored as a categorical string. The dataset contains only a single row (n=1), and that row holds the value '2025-01-20', giving a cardinality of 1 and top_rate of 1.0. With only one observation, no distributional insight is possible; the 'long_tail' and 'imbalance' alerts are artefacts of the trivial sample size rather than meaningful signals.

Treatment: Parse to a proper date type; defer any temporal analysis until the full dataset is loaded, as current sample has only 1 row.

anthropic:default · confidence high
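
Parsing the string into a date type, as the treatment suggests, might look like the sketch below. The two date strings are taken from this report (the last_updated value and the notebook's generation date); the variable names are illustrative.

```python
from datetime import date

# ISO 8601 calendar dates parse directly with the standard library.
last_updated = date.fromisoformat("2025-01-20")
generated = date.fromisoformat("2026-06-21")

# Age of the underlying data at the time this report was generated.
age_days = (generated - last_updated).days
print(last_updated.year, last_updated.month, last_updated.day)
print(age_days)
```

Once the value is a real date, staleness checks such as this one become simple arithmetic rather than string comparison.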

Out[14]:

saturn.columns["last_updated"].stats

stat	value
n	1
nulls	0 (0.0%)
unique	1
top_value	2025-01-20
top_rate	1
cardinality	1
entropy	0
entropy_ratio	0
alert: long_tail	1 singleton categories
alert: imbalance	top value is 100.0% of rows

Fig 7.

Top values for last_updated.

Top values for last_updated (1 unique shown, of 1 total).
value	count	share
2025-01-20	1	100.0%

source categorical metadata

This column records the data provenance or source attribution for the dataset, with every single row carrying the identical value 'Steam Store user data (likely via Kaggle or SteamSpy)'. With n=1, cardinality=1, entropy=0.0, and top_rate=1.0, there is zero variance whatsoever — this is a constant column. It adds no analytical signal and likely exists as a metadata annotation injected during data collection or curation.

Treatment: Drop before modelling — zero-variance constant column carries no predictive information.

anthropic:default · confidence high

Out[17]:

saturn.columns["source"].stats

stat	value
n	1
nulls	0 (0.0%)
unique	1
top_value	Steam Store user data (likely via Kaggle or SteamSpy)
top_rate	1
cardinality	1
entropy	0
entropy_ratio	0
alert: long_tail	1 singleton categories
alert: imbalance	top value is 100.0% of rows

Fig 8.

Top values for source.

Top values for source (1 unique shown, of 1 total).
value	count	share
Steam Store user data (likely via Kaggle or SteamSpy)	1	100.0%

record_count numeric metadata

This column is a record count field, almost certainly a metadata scalar reporting the total row count of the source dataset: 14,306,064 in the single profiled row. With n=1, unique=1, and one value that is at once the mean, min, and max, there is zero variance; saturn has flagged it as 'constant'. This is not a feature or target but a summary statistic embedded in the dataset, likely from an ETL or export header row.

Treatment: Drop before modelling; store separately as a provenance scalar if needed.

anthropic:default · confidence high

Out[20]:

saturn.columns["record_count"].stats

stat	value
n	1
nulls	0 (0.0%)
unique	1
min	1.431e+07
max	1.431e+07
mean	1.431e+07
median	1.431e+07
std	0
q1	1.431e+07
q3	1.431e+07
iqr	0
skew	0
kurtosis	0
n_outliers	0
outlier_rate	0
zero_rate	0
alert: constant	only one distinct value
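
The stats table shows 1.431e+07 while the prose cites 14,306,064. The two agree once display rounding is taken into account, as this sketch shows. It assumes the table is rendered with four significant digits, which matches the displayed values but is not confirmed by the report.

```python
record_count = 14306064

# Assumed display format: four significant digits, general notation.
displayed = format(record_count, ".4g")
print(displayed)

# The rounded display loses the exact count, so keep the raw integer
# if the value is stored as a provenance scalar.
lossy = float(displayed) != record_count

# A column is constant when it holds exactly one distinct value.
values = [record_count]
is_constant = len(set(values)) == 1
```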

Fig 9.

Distribution of record_count. Vertical dash marks the median.

Histogram bins for record_count (median: 14306064.0).
bin	count
1.431e+07 – 1.431e+07	0
1.431e+07 – 1.431e+07	0
1.431e+07 – 1.431e+07	1
1.431e+07 – 1.431e+07	0
1.431e+07 – 1.431e+07	0
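
The five bins above look identical only because their edges differ by less than the display precision. The sketch below reproduces the pattern under two assumptions that the report does not confirm: that a constant column's range is padded by 0.5 on each side (as numpy.histogram does for a zero-width range), and that edges are printed with four significant digits.

```python
def constant_histogram(value, n_bins=5, pad=0.5):
    """Bin a single value into n_bins equal bins over [value - pad, value + pad]."""
    lo, hi = value - pad, value + pad
    width = (hi - lo) / n_bins
    edges = [lo + i * width for i in range(n_bins + 1)]
    counts = [0] * n_bins
    # Clamp so a value on the upper edge falls in the last bin.
    index = min(int((value - lo) / width), n_bins - 1)
    counts[index] += 1
    return edges, counts


edges, counts = constant_histogram(14306064)
labels = [format(edge, ".4g") for edge in edges]
print(counts)
print(labels)
```

The single observation lands in the middle bin, and every edge rounds to the same label, which is consistent with the table shown.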

fields unknown other

This column contains only a single row and was skipped by the profiler, yielding no distributional statistics. With n=1 and no uniqueness or type information available, no meaningful inference about its content or role is possible beyond the fact that it is non-null. The 'unknown' kind designation and empty stats block indicate the profiler could not parse or classify the value.

Treatment: Manually inspect the single row value to determine type and role before any downstream use.

anthropic:default · confidence low

Out[23]:

saturn.columns["fields"].stats

stat	value
n	1
nulls	0 (0.0%)
unique	—
alert: skipped	no profiler for kind=unknown
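
Manual inspection of the skipped value could start with a small helper like the one below. The helper name and the sample JSON are hypothetical: the report never shows the content of fields, so the list here is an invented placeholder used only to demonstrate the check.

```python
import json


def describe_value(value):
    """Report the Python type of a value, plus size and element types where relevant."""
    info = {"type": type(value).__name__}
    if isinstance(value, (list, dict, str)):
        info["length"] = len(value)
    if isinstance(value, list):
        info["element_types"] = sorted({type(v).__name__ for v in value})
    return info


# Hypothetical content; replace with the record loaded from users_metadata.json.
raw = '{"fields": ["user_id", "products", "reviews"]}'
record = json.loads(raw)

report = describe_value(record["fields"])
print(report)
```

A nested list or object in this column would explain why a profiler that handles only scalar kinds had nothing to apply.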

notes categorical metadata

This column is a dataset-level metadata note, not a real data column — it contains a single static string describing the dataset itself (14.3 million Steam user profiles, file size 185 MB, join key user_id). With n=1 and cardinality=1, it appears to be a singleton annotation row or a schema-level descriptor accidentally included in the profiled data. The entropy of 0.0 and top_rate of 1.0 confirm it carries zero analytical signal.

Treatment: Drop before modelling; use the embedded join hint (user_id → recommendations.csv) as a schema reference only.

anthropic:default · confidence high

Out[25]:

saturn.columns["notes"].stats

stat	value
n	1
nulls	0 (0.0%)
unique	1
top_value	14.3 million Steam user profiles with library size and review activity. File is 185 MB. Links to recommendations.csv via user_id.
top_rate	1
cardinality	1
entropy	0
entropy_ratio	0
alert: long_tail	1 singleton categories
alert: imbalance	top value is 100.0% of rows
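
Because the notes string repeats facts held elsewhere in the record, it can be cross-checked against record_count. The sketch below uses the notes text and the count exactly as reported here; the regular expression and variable names are illustrative.

```python
import re

notes = (
    "14.3 million Steam user profiles with library size and review activity. "
    "File is 185 MB. Links to recommendations.csv via user_id."
)
record_count = 14306064

# Pull the "<number> million" figure out of the free-text note.
match = re.search(r"([\d.]+)\s+million", notes)
stated_millions = float(match.group(1))

# Round the exact count to the same precision the note uses.
actual_millions = round(record_count / 1_000_000, 1)

consistent = stated_millions == actual_millions
print(stated_millions, actual_millions, consistent)
```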

Fig 10.

Top values for notes.

Top values for notes (1 unique shown, of 1 total).
value	count	share
14.3 million Steam user profiles with library size and review activity. File is 185 MB. Links to recommendations.csv via user_id.	1	100.0%

How to cite

BibTeX

@misc{saturn-data-trove-steam-users-2026,
  author       = {Steuber, Luke},
  title        = {Saturn reading: data trove steam users},
  year         = {2026},
  howpublished = {\url{https://dr.eamer.dev/saturn/view/data-trove-steam-users}},
  note         = {Profiled with saturn-dissect v0.2.0, prompt saturn-insight-v2, model anthropic:default},
}

APA

Steuber, L. (2026). Saturn reading: data trove steam users. Source: /home/coolhand/html/datavis/data_trove/entertainment/gaming/users_metadata.json. Profiled with saturn-dissect v0.2.0 (saturn-insight-v2, anthropic:default). Retrieved from https://dr.eamer.dev/saturn/view/data-trove-steam-users