saturn·

data trove steam users

source /home/coolhand/html/datavis/data_trove/entertainment/gaming/users_metadata.json · 1 row · 6 columns · profiled 2026-06-21

Reading

dataset summary · low confidence anthropic:default

This is a single-row metadata record describing the Steam Users dataset: 14,306,064 Steam user profiles sourced from Steam Store data (likely via Kaggle or SteamSpy) and last updated on 2025-01-20. It is a data catalogue entry rather than an analytical dataset, and it points to the actual user data file (185 MB), which links to recommendations.csv via user_id. The scale is the main point of interest: more than 14 million user profiles covering library size and review activity. Analysts should locate the user data file and join it to recommendations.csv on user_id before drawing any conclusions, since this record alone holds no user-level data.

citing: record_count.max · notes.top_value · source.top_value · last_updated.top_value · row_count
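
The join described in the summary can be sketched with the standard library alone. Only the file name recommendations.csv and the key user_id come from the metadata note; the miniature CSV contents, the other column names, and the helper name left_join_on_user_id are hypothetical stand-ins, since the real files were not profiled here.

```python
import csv
import io
from collections import defaultdict

# Hypothetical miniature stand-ins for the real files. Only `user_id` and the
# file name recommendations.csv are taken from the metadata note; all other
# column names and values are invented for illustration.
USERS_CSV = """user_id,products,reviews
1,120,3
2,15,0
3,42,1
"""
RECS_CSV = """user_id,app_id,is_recommended
1,10,true
1,20,false
3,10,true
"""


def left_join_on_user_id(users_text, recs_text):
    """Attach each user's recommendation rows to that user's record."""
    recs_by_user = defaultdict(list)
    for rec in csv.DictReader(io.StringIO(recs_text)):
        recs_by_user[rec["user_id"]].append(rec)

    joined = []
    for user in csv.DictReader(io.StringIO(users_text)):
        # .get() keeps users without recommendations (left join semantics).
        matches = recs_by_user.get(user["user_id"], [])
        joined.append({**user, "recommendations": matches})
    return joined


joined = left_join_on_user_id(USERS_CSV, RECS_CSV)
# Users with no recommendation rows are worth counting before analysis.
unmatched = [u["user_id"] for u in joined if not u["recommendations"]]
```

At the reported scale of roughly 14 million users, an in-memory dictionary join like this one may not be practical; a dataframe library or a database would be the more likely choice, with the same join key.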

Schema

6 columns
Per-column summary.

column        type         nulls  unique  alerts
dataset_name  categorical  0.0%   1       long_tail, imbalance
last_updated  categorical  0.0%   1       long_tail, imbalance
source        categorical  0.0%   1       long_tail, imbalance
record_count  numeric      0.0%   1       constant
fields        unknown      0.0%           skipped
notes         categorical  0.0%   1       long_tail, imbalance

dataset_name

categorical metadata long_tail imbalance
This column is a dataset-level identifier: the single row is labelled 'Steam Users'. With 1 row and 1 unique value, the column has zero entropy (0.0) and a top_rate of 1.0, so it is a constant and provides no discriminative information. The long_tail and imbalance alerts are technically correct but follow trivially from the single-row, single-value data. Treatment: Drop before modelling; constant column with no variance and only 1 row.
high · anthropic:default
n: 1
nulls: 0 (0.0%)
unique: 1
top_value: Steam Users
top_rate: 1
cardinality: 1
entropy: 0
entropy_ratio: 0
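
As a rough illustration of why a single-value column yields entropy 0 and top_rate 1, the statistics above can be reproduced as follows. This is a sketch, not saturn's implementation: the function name categorical_profile is hypothetical, and the choice of base-2 logarithm and of log2(cardinality) as the normaliser for entropy_ratio are assumptions.

```python
import math
from collections import Counter


def categorical_profile(values):
    """Compute basic categorical statistics for a list of values."""
    counts = Counter(values)
    n = len(values)
    top_value, top_count = counts.most_common(1)[0]

    # Shannon entropy in bits: sum of p * log2(1 / p) over the categories.
    entropy = sum((c / n) * math.log2(n / c) for c in counts.values())

    # Assumed normaliser: maximum entropy for this cardinality.
    # With one category the maximum is 0, so the ratio is defined as 0.
    max_entropy = math.log2(len(counts)) if len(counts) > 1 else 0.0
    entropy_ratio = entropy / max_entropy if max_entropy else 0.0

    return {
        "n": n,
        "cardinality": len(counts),
        "top_value": top_value,
        "top_rate": top_count / n,
        "entropy": entropy,
        "entropy_ratio": entropy_ratio,
    }


single = categorical_profile(["Steam Users"])
balanced = categorical_profile(["a", "b"])
```

For the one-row column, the only category has probability 1, so its entropy term is 1 × log2(1) = 0; a balanced two-value column, by contrast, reaches the maximum of 1 bit.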

last_updated

categorical timestamp long_tail imbalance
This column is a date field indicating when the catalogued dataset was last updated, stored as a categorical string. The file contains a single row (n=1) holding the value '2025-01-20', giving a cardinality of 1 and a top_rate of 1.0. With one observation, no distributional insight is possible; the 'long_tail' and 'imbalance' alerts are artefacts of the trivial row count rather than meaningful signals. Treatment: Parse to a proper date type; no temporal analysis is possible on this one-row record, so defer it to the underlying user data.
high · anthropic:default
n: 1
nulls: 0 (0.0%)
unique: 1
top_value: 2025-01-20
top_rate: 1
cardinality: 1
entropy: 0
entropy_ratio: 0
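
Parsing the string to a date type, as the treatment suggests, might look like the sketch below. The value and the profiling date are taken from this report; the variable names and the idea of computing the record's age are illustrative additions.

```python
from datetime import date

raw_last_updated = "2025-01-20"   # top_value reported for last_updated
last_updated = date.fromisoformat(raw_last_updated)

# The report header states the file was profiled on 2026-06-21.
profiled_on = date(2026, 6, 21)

# Age of the catalogued dataset at profiling time, in days.
age_days = (profiled_on - last_updated).days
```

date.fromisoformat raises ValueError on a malformed string, which makes it a reasonable validation step if more metadata records are added later.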

source

categorical metadata long_tail imbalance
This column records the provenance of the dataset; the single row carries the value 'Steam Store user data (likely via Kaggle or SteamSpy)'. With n=1, cardinality=1, entropy=0.0, and top_rate=1.0, there is no variance: this is a constant column. It adds no analytical signal and most likely exists as a metadata annotation added during data collection or curation. Note that the value itself is hedged ("likely"), so the exact origin is unconfirmed. Treatment: Drop before modelling; a zero-variance constant column carries no predictive information.
high · anthropic:default
n: 1
nulls: 0 (0.0%)
unique: 1
top_value: Steam Store user data (likely via Kaggle or SteamSpy)
top_rate: 1
cardinality: 1
entropy: 0
entropy_ratio: 0
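
A minimal sketch of a zero-variance check follows, to show why every column in a one-row file is flagged as constant. The function name constant_columns is hypothetical, and RECORD reproduces only a subset of the values reported here (fields is omitted because its content is unknown).

```python
import json

# Subset of the profiled record, using values reported in this document.
RECORD = {
    "dataset_name": "Steam Users",
    "last_updated": "2025-01-20",
    "source": "Steam Store user data (likely via Kaggle or SteamSpy)",
    "record_count": 14306064,
}


def constant_columns(rows):
    """Return the column names that hold one distinct value across all rows."""
    if not rows:
        return []
    constant = []
    for column in rows[0]:
        # json.dumps makes nested values (lists, dicts) hashable and comparable.
        distinct = {json.dumps(row.get(column), sort_keys=True) for row in rows}
        if len(distinct) == 1:
            constant.append(column)
    return constant


# With a single row, every column is trivially constant.
one_row = constant_columns([RECORD])

# A hypothetical second record that differs only in record_count.
other = {**RECORD, "record_count": 1}
two_rows = constant_columns([RECORD, other])
```

The check only becomes informative with two or more rows, which is why the constant and imbalance alerts in this report say little about the data itself.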

record_count

numeric metadata constant
This column is a record count: a metadata scalar reporting the total row count of the catalogued dataset, 14,306,064 in the single row. The statistics below display it rounded to four significant figures (1.431e+07). With n=1 and n_unique=1, the value equals the mean, min, and max, and saturn has flagged the column as 'constant'. The std, skew, and kurtosis are reported as 0, but with a single observation these are not meaningfully defined. This is not a feature or target but a summary statistic about another file. Treatment: Drop before modelling; store separately as a provenance scalar if needed.
high · anthropic:default
n: 1
nulls: 0 (0.0%)
unique: 1
min: 1.431e+07
max: 1.431e+07
mean: 1.431e+07
median: 1.431e+07
std: 0
q1: 1.431e+07
q3: 1.431e+07
iqr: 0
skew: 0
kurtosis: 0
n_outliers: 0
outlier_rate: 0
zero_rate: 0
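
The sketch below illustrates two points about these numbers: the displayed 1.431e+07 is a rounding of the exact count 14,306,064 cited in the dataset summary, and a spread statistic on one observation is either zero by convention or undefined. The four-significant-figure format string is an assumption about how the report was rendered, not a confirmed detail of saturn.

```python
import statistics

record_count = 14_306_064          # exact value cited in the dataset summary

# Assumed display format: four significant figures.
displayed = f"{record_count:.4g}"

# Reading the displayed value back loses the exact count.
rounding_error = abs(float(displayed) - record_count)

values = [record_count]

# Population standard deviation of a single value is 0.
population_std = statistics.pstdev(values)

# Sample standard deviation needs at least two data points.
try:
    statistics.stdev(values)
    sample_std_defined = True
except statistics.StatisticsError:
    sample_std_defined = False
```

If the exact count matters downstream, it is safer to read record_count from the JSON file than to copy it from the rounded statistics.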

fields

unknown other skipped
This column has a single value and was skipped by the profiler, so no distributional statistics are available. With n=1 and no uniqueness or type information, nothing can be inferred about its content or role beyond the fact that it is non-null. The 'unknown' kind and the empty stats block indicate that the profiler could not parse or classify the value. Treatment: Manually inspect the single row value to determine type and role before any downstream use.
low · anthropic:default
n: 1
nulls: 0 (0.0%)
unique: (not reported)
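
The manual inspection suggested above could start from a small helper like this one. The real content of fields is unknown, so the sample value below is a purely hypothetical stand-in, and describe_value is an illustrative name rather than part of any tool.

```python
import json


def describe_value(value):
    """Summarise the Python type and size of a decoded JSON value."""
    summary = {"type": type(value).__name__}
    if isinstance(value, (list, dict, str)):
        summary["length"] = len(value)
    if isinstance(value, list) and value:
        summary["element_types"] = sorted({type(v).__name__ for v in value})
    if isinstance(value, dict):
        summary["keys"] = sorted(value)
    return summary


# Hypothetical stand-in: the actual `fields` value was not profiled.
sample_record = json.loads('{"fields": ["user_id", "products", "reviews"]}')
summary = describe_value(sample_record["fields"])
```

Against the real file, the same call on the loaded record's fields entry would show whether the value is a list, an object, or a plain string, which determines how it should be flattened or documented.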

notes

categorical metadata long_tail imbalance
This column is a dataset-level metadata note rather than a data column: it contains a single static string describing the dataset (14.3 million Steam user profiles, file size 185 MB, join key user_id). With n=1 and cardinality=1, it is consistent with the file being a one-row catalogue record rather than tabular data. The entropy of 0.0 and top_rate of 1.0 confirm it carries no analytical signal. Treatment: Drop before modelling; use the embedded join hint (user_id → recommendations.csv) as a schema reference only.
high · anthropic:default
n: 1
nulls: 0 (0.0%)
unique: 1
top_value: 14.3 million Steam user profiles with library size and review activity. File is 185 MB. Links to recommendations.csv via user_id.
top_rate: 1
cardinality: 1
entropy: 0
entropy_ratio: 0
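
If the join hint and file size need to be used programmatically, they can be pulled out of the note with regular expressions, as sketched below. The patterns are written against this one string only and are assumptions about its wording; other catalogue entries may phrase their notes differently.

```python
import re

# top_value of the notes column, copied from this report.
NOTE = (
    "14.3 million Steam user profiles with library size and review activity. "
    "File is 185 MB. Links to recommendations.csv via user_id."
)

# "Links to <file> via <key>": the key stops before the trailing full stop.
JOIN_HINT = re.compile(r"Links to (?P<target>\S+) via (?P<key>\w+)")
# "File is <number> <unit>".
FILE_SIZE = re.compile(r"File is (?P<size>\d+(?:\.\d+)?) (?P<unit>[KMG]B)")

join_match = JOIN_HINT.search(NOTE)
size_match = FILE_SIZE.search(NOTE)

# None when the note does not follow the expected wording.
join_hint = join_match.groupdict() if join_match else None
file_size = size_match.groupdict() if size_match else None
```

Free-text notes are fragile to parse; storing the join target and key as separate fields in the metadata record would be more robust if the catalogue format can be changed.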