data trove steam user reviews

source: /home/coolhand/html/datavis/data_trove/entertainment/gaming/recommendations_metadata.json
1 row · 6 columns · profiled 2026-06-21

Reading

dataset summary · low confidence anthropic:default

This is a single-row metadata descriptor for the 'Steam Game Recommendations' dataset, last updated 2025-01-20. It is a catalog entry rather than the underlying data itself. The key takeaway is the scale of what it describes: 41,154,794 (about 41.1 million) Steam user reviews stored in a 1.9 GB file, with the source recorded as 'Steam Store user reviews (likely via Kaggle or SteamSpy)'. The notes field states that the full dataset links to companion files (games.csv and users.csv) via app_id and user_id and includes playtime and helpfulness metrics, so those join keys are the most important fields to validate before any analysis. Analysts should treat this file as a data dictionary and move on to the referenced source files for substantive exploration.
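The join-key validation suggested above can be sketched as follows. This is a minimal illustration, not the report's own method: the key names app_id and user_id come from the notes field, while the in-memory sample rows, the other column names (hours, title), and the helper name orphan_keys are hypothetical stand-ins for the real files.

```python
import csv
import io


def orphan_keys(child_rows, parent_rows, key):
    """Return key values present in child_rows but missing from parent_rows."""
    parent_keys = {row[key] for row in parent_rows}
    return sorted({row[key] for row in child_rows} - parent_keys)


# Tiny in-memory stand-ins for the real files (illustrative values only).
recommendations = list(csv.DictReader(io.StringIO(
    "app_id,user_id,hours\n10,1,5.0\n20,2,1.5\n30,3,0.2\n"
)))
games = list(csv.DictReader(io.StringIO("app_id,title\n10,A\n20,B\n")))
users = list(csv.DictReader(io.StringIO("user_id\n1\n2\n3\n")))

missing_games = orphan_keys(recommendations, games, "app_id")
missing_users = orphan_keys(recommendations, users, "user_id")
print("app_id values with no match in games:", missing_games)
print("user_id values with no match in users:", missing_users)
```

For a file of the size described here, the csv.DictReader objects could be passed to orphan_keys directly instead of being wrapped in list(), so the rows are streamed rather than held in memory.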

citing: record_count.max · dataset_name.top_value · last_updated.top_value · source.top_value · notes.top_value · row_count

Schema

6 columns. Per-column summary.

Column        Type         Nulls  Unique  Alerts
dataset_name  categorical  0.0%   1       long_tail, imbalance
last_updated  categorical  0.0%   1       long_tail, imbalance
source        categorical  0.0%   1       long_tail, imbalance
record_count  numeric      0.0%   1       constant
fields        unknown      0.0%           skipped
notes         categorical  0.0%   1       long_tail, imbalance

dataset_name

categorical metadata long_tail imbalance
This column is a dataset-level metadata tag identifying the source dataset; the only row (n=1) carries the value 'Steam Game Recommendations'. Cardinality is 1 and entropy is 0.0, so the column is constant and carries no information. The 'long_tail' and 'imbalance' alerts are triggered mechanically by the top_rate of 1.0 and are not meaningful here, because any single-row column has a top_rate of 1.0. Treatment: Drop before modelling; a zero-variance constant column adds no predictive signal.
high · anthropic:default
n: 1
nulls: 0 (0.0%)
unique: 1
top_value: Steam Game Recommendations
top_rate: 1
cardinality: 1
entropy: 0
entropy_ratio: 0
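The categorical statistics above can be reproduced with a short sketch. It assumes entropy is Shannon entropy in bits and that entropy_ratio is entropy divided by log2(cardinality), taken as 0 when cardinality is 1; the report does not state the profiler's exact definitions, and the function name categorical_profile is hypothetical.

```python
import math
from collections import Counter


def categorical_profile(values):
    """Compute top_value, top_rate, cardinality, entropy (bits) and entropy_ratio."""
    counts = Counter(values)
    n = len(values)
    cardinality = len(counts)
    top_value, top_count = counts.most_common(1)[0]
    entropy = sum(-(c / n) * math.log2(c / n) for c in counts.values())
    # Assumption: ratio = entropy / log2(cardinality); with one distinct value
    # the maximum possible entropy is 0, so the ratio is defined as 0.
    max_entropy = math.log2(cardinality)
    ratio = entropy / max_entropy if max_entropy > 0 else 0.0
    return {
        "top_value": top_value,
        "top_rate": top_count / n,
        "cardinality": cardinality,
        "entropy": entropy,
        "entropy_ratio": ratio,
    }


profile = categorical_profile(["Steam Game Recommendations"])
print(profile)
```

Any single-row column yields top_rate 1 and entropy 0 under these definitions, which is why the long_tail and imbalance alerts fire on every categorical column in this report.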

last_updated

categorical metadata long_tail imbalance
This column records the date on which the dataset was last updated. With only 1 row and a single value of '2025-01-20' (top_rate 1.0), the column is constant and has zero variance. The long_tail and imbalance alerts are technically correct but vacuous for a single-row dataset; no meaningful distribution analysis is possible. The profiler typed the value as categorical, so it is treated as a string rather than a parsed date. Treatment: Exclude from modelling features; retain as audit metadata, but re-evaluate once the full dataset is loaded.
high · anthropic:default
n: 1
nulls: 0 (0.0%)
unique: 1
top_value: 2025-01-20
top_rate: 1
cardinality: 1
entropy: 0
entropy_ratio: 0
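If last_updated is kept as audit metadata, it may be more useful parsed as a date than left as a categorical string. A minimal sketch, assuming the value is always in ISO 8601 form (only one value is observed) and using the profiling date shown in the report header to compute the metadata's age:

```python
from datetime import date

# Both values are taken from this report; the ISO format is assumed to hold
# for any future values of the column as well.
last_updated = date.fromisoformat("2025-01-20")
profiled = date.fromisoformat("2026-06-21")

age_days = (profiled - last_updated).days
print(f"metadata was {age_days} days old when profiled")
```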

source

categorical metadata long_tail imbalance
This column records the provenance of the dataset and contains exactly one unique value: 'Steam Store user reviews (likely via Kaggle or SteamSpy)'. With a cardinality of 1, entropy of 0.0, and top_rate of 1.0, it is a constant column carrying no discriminative information. The long_tail and imbalance alerts are technically triggered but are trivially explained by the single value. Note that the value itself is hedged ('likely'), so the provenance is not confirmed. Treatment: Drop before modelling; a constant column adds no signal.
high · anthropic:default
n: 1
nulls: 0 (0.0%)
unique: 1
top_value: Steam Store user reviews (likely via Kaggle or SteamSpy)
top_rate: 1
cardinality: 1
entropy: 0
entropy_ratio: 0
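The "drop constant columns" treatment can be sketched as below. The helper name constant_columns and the multi-row sample are hypothetical; the sketch also shows why the check is uninformative on a single-row file, where every column is constant by construction.

```python
def constant_columns(rows):
    """Return names of columns whose values are identical in every row."""
    if not rows:
        return []
    constant = []
    for column in rows[0]:
        # repr() lets unhashable values (lists, dicts) be compared too.
        distinct = {repr(row.get(column)) for row in rows}
        if len(distinct) == 1:
            constant.append(column)
    return constant


# One metadata row: every column is reported as constant.
one_row = [{"dataset_name": "Steam Game Recommendations", "record_count": 41154794}]
# Hypothetical multi-row data: only the column that never varies is flagged.
many_rows = [{"app_id": 10, "source": "x"}, {"app_id": 20, "source": "x"}]

print(constant_columns(one_row))
print(constant_columns(many_rows))
```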

record_count

numeric metadata constant
This column records the total row count of the described dataset, with a single observed value of 41,154,794. The profile contains only 1 row (n=1), so this column is a scalar summary rather than a per-record attribute. It is flagged as 'constant' with zero variance, zero IQR, and min/max/mean/median all equal to 41,154,794.0. It carries no discriminative power and exists purely as a metadata annotation. Treatment: Drop before modelling; retain only as a data-quality audit reference if needed.
high · anthropic:default
n: 1
nulls: 0 (0.0%)
unique: 1
min: 4.115e+07
max: 4.115e+07
mean: 4.115e+07
median: 4.115e+07
std: 0
q1: 4.115e+07
q3: 4.115e+07
iqr: 0
skew: 0
kurtosis: 0
n_outliers: 0
outlier_rate: 0
zero_rate: 0
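Using record_count as an audit reference could look like the sketch below: count the data rows in the referenced file and compare against the declared 41,154,794. The declared value comes from this report; the helper names and the two-row in-memory sample are hypothetical, and the real reviews file would be opened with open(path, newline="") at whatever path it actually lives.

```python
import csv
import io

DECLARED_RECORD_COUNT = 41_154_794  # record_count from the metadata row


def count_data_rows(handle):
    """Count CSV records, excluding the header, without loading the file."""
    reader = csv.reader(handle)
    next(reader, None)  # skip the header row if there is one
    return sum(1 for _ in reader)


def audit_row_count(actual, declared=DECLARED_RECORD_COUNT):
    """Compare an observed row count with the declared one."""
    return {
        "declared": declared,
        "actual": actual,
        "difference": actual - declared,
        "match": actual == declared,
    }


# Illustrative stand-in for the real file.
sample = io.StringIO("app_id,user_id\n10,1\n20,2\n")
result = audit_row_count(count_data_rows(sample))
print(result)
```

Using csv.reader rather than counting raw lines keeps the count correct if any review text contains embedded newlines inside quoted fields.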

fields

unknown other skipped
This column ('fields') has a single row and was skipped by the profiler, so no distributional statistics or inferred type are available. The column name suggests it may hold a nested list or object describing the dataset's fields, which would explain the skip, but the report does not confirm this. The absence of nulls is the only positive signal available. Treatment: Inspect the raw value manually before deciding on handling; re-profile with a larger sample if this column appears in a fuller dataset.
low · anthropic:default
n
1
nulls
0 (0.0%)
unique
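The manual inspection recommended for this column can be sketched as below. The report does not show the actual content of fields, so the sample JSON is a hypothetical stand-in, and the sketch assumes the metadata file holds either one object or a list containing one object.

```python
import json


def describe_field(raw_json, name="fields"):
    """Report the Python type and size of one key in the metadata record."""
    data = json.loads(raw_json)
    # Assumption: the file is a single object or a one-element list of objects.
    record = data[0] if isinstance(data, list) else data
    value = record[name]
    size = len(value) if isinstance(value, (list, dict, str)) else None
    return type(value).__name__, size


# Hypothetical stand-in; the real value of `fields` is not shown in this report.
sample = '{"dataset_name": "Steam Game Recommendations", "fields": ["app_id", "user_id"]}'
kind, size = describe_field(sample)
print(kind, size)
```

If the reported type is list or dict, the column is nested and would need flattening before a tabular profiler could summarise it.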

notes

categorical metadata long_tail imbalance
This column is a dataset-level metadata note, not a real data column: its single value is a documentation string describing the broader dataset (41.1 million Steam reviews, file size, join keys, and available metrics). With n=1, cardinality=1, and top_rate=1.0, it carries no analytical signal and exists only because the metadata file was profiled as if it were a dataset. The entropy of 0.0 confirms there is no variation. Treatment: Drop before modelling; this is a documentation artifact with no predictive or analytical value.
high · anthropic:default
n: 1
nulls: 0 (0.0%)
unique: 1
top_value: 41.1 million Steam user reviews/recommendations. File is 1.9 GB. Links to games.csv via app_id and to users.csv via user_id. Includes playtime and helpfulness metrics.
top_rate: 1
cardinality: 1
entropy: 0
entropy_ratio: 0