data trove steam users

saturn notebook · generated 2026-06-21

Overview

Source: /home/coolhand/html/datavis/data_trove/entertainment/gaming/users_metadata.json

Saturn profiled 1 row across 6 columns. The stats below are deterministic and machine-readable; the prose is a language-model interpretation of those stats (opt-in, added after the fact, never sees raw rows).

[2]:
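# Install the profiler (notebook shell escape), then run it on the metadata file.
!pip install saturn-dissect
import subprocess

subprocess.run(
    [
        "saturn", "analyze", "/home/coolhand/html/datavis/data_trove/entertainment/gaming/users_metadata.json",
        "--findings", "data-trove-steam-users.json",
        "--llm", "anthropic:default",
    ],
    check=True,  # raise CalledProcessError if saturn exits with a non-zero status
)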

Summary confidence: low

This is a single-row metadata record describing the Steam Users dataset, a collection of 14,306,064 Steam user profiles sourced from Steam Store data (likely via Kaggle or SteamSpy) and last updated on 2025-01-20. It is a data catalogue entry rather than an analytical dataset: it points analysts to the actual user data file (185 MB), which links to recommendations.csv via user_id. The scale is the key fact: more than 14 million user profiles covering library size and review activity. Before any analysis, locate the user data file and join it to recommendations.csv on user_id.

citing: record_count.max · notes.top_value · source.top_value · last_updated.top_value · row_count
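The join recommended in the summary could be sketched as follows. This is a minimal illustration, not the dataset's documented schema: only user_id comes from the notes field, while the other column names (products, reviews, app_id, is_recommended) and all sample rows are hypothetical stand-ins.

```python
import csv
import io

# Hypothetical stand-ins for the two files. Only `user_id` is confirmed by the
# metadata notes; every other column name and value here is an assumption.
USERS_CSV = """user_id,products,reviews
1,120,3
2,15,0
"""
RECS_CSV = """user_id,app_id,is_recommended
1,730,true
1,570,false
3,440,true
"""


def join_recommendations_to_users(users_file, recs_file):
    """Attach the matching user row to each recommendation, keyed on user_id."""
    users = {row["user_id"]: row for row in csv.DictReader(users_file)}
    joined, orphans = [], 0
    for rec in csv.DictReader(recs_file):
        user = users.get(rec["user_id"])
        if user is None:
            orphans += 1  # recommendation with no matching user profile
            continue
        joined.append({**user, **rec})
    return joined, orphans


joined, orphans = join_recommendations_to_users(
    io.StringIO(USERS_CSV), io.StringIO(RECS_CSV)
)
print(len(joined), "joined rows;", orphans, "unmatched recommendations")
```

Counting unmatched keys is a cheap integrity check worth running before trusting the join. Holding every user row in a dictionary is fine for a sketch, but at roughly 14 million rows a dataframe library or a database may be the more practical choice.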

Out[4]:

saturn.schema() · 6 columns

column        kind         n  null%  unique  alerts
dataset_name  categorical  1  0.0%   1       long_tail, imbalance
last_updated  categorical  1  0.0%   1       long_tail, imbalance
source        categorical  1  0.0%   1       long_tail, imbalance
record_count  numeric      1  0.0%   1       constant
fields        unknown      1  0.0%           skipped
notes         categorical  1  0.0%   1       long_tail, imbalance
Fig 1.
dataset_name · Confirms this is a single-entry catalogue record for the 'Steam Users' dataset — useful as a quick identity check.
Top values for dataset_name (1 unique shown, of 1 total).
value        count  share
Steam Users  1      100.0%
Fig 2.
source · Shows the data provenance (Steam Store via Kaggle or SteamSpy), which is critical for understanding lineage and any licensing constraints.
Top values for source (1 unique shown, of 1 total).
value                                                  count  share
Steam Store user data (likely via Kaggle or SteamSpy)  1      100.0%
Fig 3.
last_updated · Displays the single update timestamp of 2025-01-20, flagging how current the underlying data is.
Top values for last_updated (1 unique shown, of 1 total).
value       count  share
2025-01-20  1      100.0%
Fig 4.
notes · Renders the full metadata notes field, which contains the key facts about file size, record scope, and the join key to recommendations.csv.
Top values for notes (1 unique shown, of 1 total).
value                                                                                                                              count  share
14.3 million Steam user profiles with library size and review activity. File is 185 MB. Links to recommendations.csv via user_id.  1      100.0%
Fig 5.
Per-column null rate across the corpus. Columns are ordered by input position.
column        kind         null %
dataset_name  categorical  0.0%
last_updated  categorical  0.0%
source        categorical  0.0%
record_count  numeric      0.0%
fields        unknown      0.0%
notes         categorical  0.0%

dataset_name categorical metadata

This column is a dataset-level identifier or metadata tag indicating the source dataset, with every row labelled 'Steam Users'. With only 1 row and 1 unique value, the column carries zero entropy (0.0) and a top_rate of 1.0 — it is a constant and provides no discriminative information. The long_tail and imbalance alerts are technically correct but trivially explained by the single-row, single-value nature of the data.

Treatment: Drop before modelling; constant column with no variance and only 1 row.

anthropic:default · confidence high
Out[11]:

saturn.columns["dataset_name"].stats

stat              value
n                 1
nulls             0 (0.0%)
unique            1
top_value         Steam Users
top_rate          1
cardinality       1
entropy           0
entropy_ratio     0
alert: long_tail  1 singleton categories
alert: imbalance  top value is 100.0% of rows
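The categorical figures in this table can be reproduced by hand, which shows why a single value always yields zero entropy and a top_rate of 1. The sketch below is an assumption about how such stats are commonly computed, not saturn's actual implementation. In particular, treating entropy_ratio as entropy divided by log2(cardinality), with 0 when cardinality is 1, is a guess that happens to match the reported value.

```python
import math
from collections import Counter


def categorical_stats(values):
    """Compute simple categorical summary stats for a list of values."""
    counts = Counter(values)
    n = len(values)
    top_value, top_count = counts.most_common(1)[0]
    # Shannon entropy in bits; a single category contributes 1 * log2(1) = 0.
    entropy = sum(-(c / n) * math.log2(c / n) for c in counts.values())
    # Assumed normalisation: divide by the maximum possible entropy.
    max_entropy = math.log2(len(counts))
    return {
        "n": n,
        "cardinality": len(counts),
        "top_value": top_value,
        "top_rate": top_count / n,
        "entropy": entropy,
        "entropy_ratio": entropy / max_entropy if max_entropy > 0 else 0.0,
        "singletons": sum(1 for c in counts.values() if c == 1),
    }


stats = categorical_stats(["Steam Users"])
for name, value in stats.items():
    print(name, value)
```

With one row, the only category is necessarily a singleton and necessarily holds 100% of the rows, so both alerts fire regardless of what the value is.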
Fig 6.
Top values for dataset_name.
Top values for dataset_name (1 unique shown, of 1 total).
value        count  share
Steam Users  1      100.0%

last_updated categorical timestamp

This column is a timestamp or date field indicating when a record was last updated, stored as a categorical string. The dataset contains only a single row (n=1), and that row holds the value '2025-01-20', giving a cardinality of 1 and top_rate of 1.0. With only one observation, no distributional insight is possible; the 'long_tail' and 'imbalance' alerts are artefacts of the trivial sample size rather than meaningful signals.

Treatment: Parse to a proper date type; defer any temporal analysis until the full dataset is loaded, as current sample has only 1 row.

anthropic:default · confidence high
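Parsing the string into a date type, as the treatment suggests, might look like the sketch below. It assumes the value is always an ISO 8601 calendar date such as '2025-01-20'; the helper name parse_last_updated is illustrative and not part of saturn.

```python
from datetime import date


def parse_last_updated(raw):
    """Return a date for an ISO 8601 string such as '2025-01-20', else None."""
    try:
        return date.fromisoformat(raw.strip())
    except (AttributeError, ValueError):
        # AttributeError covers non-string input such as None;
        # ValueError covers strings that are not valid ISO dates.
        return None


parsed = parse_last_updated("2025-01-20")
print(parsed, type(parsed).__name__)
```

Returning None for unparseable input keeps bad values visible as nulls instead of raising midway through a load.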
Out[14]:

saturn.columns["last_updated"].stats

stat              value
n                 1
nulls             0 (0.0%)
unique            1
top_value         2025-01-20
top_rate          1
cardinality       1
entropy           0
entropy_ratio     0
alert: long_tail  1 singleton categories
alert: imbalance  top value is 100.0% of rows
Fig 7.
Top values for last_updated.
Top values for last_updated (1 unique shown, of 1 total).
value       count  share
2025-01-20  1      100.0%

source categorical metadata

This column records the data provenance or source attribution for the dataset, with every single row carrying the identical value 'Steam Store user data (likely via Kaggle or SteamSpy)'. With n=1, cardinality=1, entropy=0.0, and top_rate=1.0, there is zero variance whatsoever — this is a constant column. It adds no analytical signal and likely exists as a metadata annotation injected during data collection or curation.

Treatment: Drop before modelling — zero-variance constant column carries no predictive information.

anthropic:default · confidence high
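A constant-column check of the kind behind this treatment can be sketched in a few lines. The record below uses the values reported in this notebook (the fields column is left out because its content is not shown); the second row is purely hypothetical and exists only to show the check on more than one row.

```python
def constant_columns(rows):
    """Return the names of columns whose value is identical in every row."""
    if not rows:
        return []
    constant = []
    for name in rows[0]:
        first = rows[0][name]
        if all(row.get(name) == first for row in rows):
            constant.append(name)
    return constant


# Values as reported in this notebook; `fields` omitted (content not shown).
record = {
    "dataset_name": "Steam Users",
    "last_updated": "2025-01-20",
    "source": "Steam Store user data (likely via Kaggle or SteamSpy)",
    "record_count": 14306064,
    "notes": "14.3 million Steam user profiles with library size and review "
             "activity. File is 185 MB. Links to recommendations.csv via user_id.",
}

# Hypothetical second catalogue entry, invented purely for contrast.
other = dict(record, dataset_name="Hypothetical Second Entry", record_count=1)

print(constant_columns([record]))
print(constant_columns([record, other]))
```

With a single row every column is trivially constant, so the drop recommendation says more about the sample size than about the column.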
Out[17]:

saturn.columns["source"].stats

stat              value
n                 1
nulls             0 (0.0%)
unique            1
top_value         Steam Store user data (likely via Kaggle or SteamSpy)
top_rate          1
cardinality       1
entropy           0
entropy_ratio     0
alert: long_tail  1 singleton categories
alert: imbalance  top value is 100.0% of rows
Fig 8.
Top values for source.
Top values for source (1 unique shown, of 1 total).
value                                                  count  share
Steam Store user data (likely via Kaggle or SteamSpy)  1      100.0%

record_count numeric metadata

This column is a record count field, almost certainly a metadata scalar reporting the total row count of the source dataset: 14,306,064 in the only row profiled. With n=1, unique=1, and a single value that is also the mean, min, and max, there is zero variance; saturn has flagged it as 'constant'. This is not a feature or target but a summary statistic embedded in the dataset, likely from an ETL or export header row.

Treatment: Drop before modelling; store separately as a provenance scalar if needed.

anthropic:default · confidence high
Out[20]:

saturn.columns["record_count"].stats

stat             value
n                1
nulls            0 (0.0%)
unique           1
min              1.431e+07
max              1.431e+07
mean             1.431e+07
median           1.431e+07
std              0
q1               1.431e+07
q3               1.431e+07
iqr              0
skew             0
kurtosis         0
n_outliers       0
outlier_rate     0
zero_rate        0
alert: constant  only one distinct value
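The stats table shows 1.431e+07 where the exact value is 14,306,064, which is consistent with formatting to four significant digits. The sketch below reproduces that display and shows why histogram bin labels for a constant column all look identical. The binning rule used here, five equal-width bins over the value plus or minus 0.5, is an assumption for illustration; this notebook does not document saturn's actual rule.

```python
record_count = 14_306_064

# Four significant digits reproduces the value shown in the stats table.
displayed = f"{record_count:.4g}"
print(displayed)

# Assumption: a constant column is binned over [value - 0.5, value + 0.5]
# with five equal-width bins. saturn's real binning rule may differ.
n_bins = 5
low, high = record_count - 0.5, record_count + 0.5
width = (high - low) / n_bins
edges = [low + i * width for i in range(n_bins + 1)]

# Place the single observation in its bin (clamped to the last bin).
counts = [0] * n_bins
index = min(int((record_count - low) / width), n_bins - 1)
counts[index] += 1

# Bin edges differ by fractions of 1, so they round to the same label.
labels = [f"{edges[i]:.4g} – {edges[i + 1]:.4g}" for i in range(n_bins)]
for label, count in zip(labels, counts):
    print(label, count)
```

For exact work, read the integer from the source file or the findings JSON instead of the rounded display value.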
Fig 9.
Distribution of record_count. Vertical dash marks the median.
Histogram bins for record_count (median: 14306064.0).
bin                    count
1.431e+07 – 1.431e+07  0
1.431e+07 – 1.431e+07  0
1.431e+07 – 1.431e+07  1
1.431e+07 – 1.431e+07  0
1.431e+07 – 1.431e+07  0

fields unknown other

This column contains only a single row and was skipped by the profiler, yielding no distributional statistics. With n=1 and no uniqueness or type information available, no meaningful inference about its content or role is possible beyond the fact that it is non-null. The 'unknown' kind designation and empty stats block indicate the profiler could not parse or classify the value.

Treatment: Manually inspect the single row value to determine type and role before any downstream use.

anthropic:default · confidence low
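The manual inspection suggested here could start with a small type-and-shape summary. The sample record below is hypothetical: the real content of fields is not shown in this report, so the list is a placeholder. One plausible reason for the 'unknown' kind is a nested list or object, but that is a guess.

```python
import json

# Hypothetical record: the actual value of `fields` is not shown in the report,
# so this list is a placeholder and not the real content.
SAMPLE = (
    '{"dataset_name": "Steam Users", '
    '"fields": ["user_id", "placeholder_a", "placeholder_b"]}'
)


def describe(value):
    """Summarise the JSON type and shape of a value without dumping it whole."""
    if isinstance(value, dict):
        return f"object with keys {sorted(value)}"
    if isinstance(value, list):
        kinds = sorted({type(item).__name__ for item in value})
        return f"array of {len(value)} items (element types: {', '.join(kinds)})"
    return f"scalar {type(value).__name__}: {value!r}"


record = json.loads(SAMPLE)
print("fields ->", describe(record["fields"]))
```

To run this against the real file, replace json.loads(SAMPLE) with json.load on the opened source path, and index into the result first if the top level turns out to be a list.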
Out[23]:

saturn.columns["fields"].stats

stat            value
n               1
nulls           0 (0.0%)
unique
alert: skipped  no profiler for kind=unknown

notes categorical metadata

This column is a dataset-level metadata note, not a real data column — it contains a single static string describing the dataset itself (14.3 million Steam user profiles, file size 185 MB, join key user_id). With n=1 and cardinality=1, it appears to be a singleton annotation row or a schema-level descriptor accidentally included in the profiled data. The entropy of 0.0 and top_rate of 1.0 confirm it carries zero analytical signal.

Treatment: Drop before modelling; use the embedded join hint (user_id → recommendations.csv) as a schema reference only.

anthropic:default · confidence high
Out[25]:

saturn.columns["notes"].stats

stat              value
n                 1
nulls             0 (0.0%)
unique            1
top_value         14.3 million Steam user profiles with library size and review activity. File is 185 MB. Links to recommendations.csv via user_id.
top_rate          1
cardinality       1
entropy           0
entropy_ratio     0
alert: long_tail  1 singleton categories
alert: imbalance  top value is 100.0% of rows
Fig 10.
Top values for notes.
Top values for notes (1 unique shown, of 1 total).
value                                                                                                                              count  share
14.3 million Steam user profiles with library size and review activity. File is 185 MB. Links to recommendations.csv via user_id.  1      100.0%

How to cite

BibTeX
@misc{saturn-data-trove-steam-users-2026,
  author       = {Steuber, Luke},
  title        = {Saturn reading: data trove steam users},
  year         = {2026},
  howpublished = {\url{https://dr.eamer.dev/saturn/view/data-trove-steam-users}},
  note         = {Profiled with saturn-dissect v0.2.0, prompt saturn-insight-v2, model anthropic:default},
}
APA
Steuber, L. (2026). Saturn reading: data trove steam users. Source: /home/coolhand/html/datavis/data_trove/entertainment/gaming/users_metadata.json. Profiled with saturn-dissect v0.2.0 (saturn-insight-v2, anthropic:default). Retrieved from https://dr.eamer.dev/saturn/view/data-trove-steam-users