
data trove steam user reviews

saturn notebook · generated 2026-06-21

Overview

Source: /home/coolhand/html/datavis/data_trove/entertainment/gaming/recommendations_metadata.json

Saturn profiled 1 row across 6 columns. The stats below are deterministic and machine-readable; the prose is a language-model interpretation of those stats (opt-in, added after the fact, never sees raw rows).

[2]:
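# Notebook cell: the leading "!" runs pip as a shell command (IPython syntax).
!pip install saturn-dissect
import subprocess

# Profile the metadata file, write the findings to JSON, and request LLM prose.
subprocess.run([
    "saturn", "analyze", "/home/coolhand/html/datavis/data_trove/entertainment/gaming/recommendations_metadata.json",
    "--findings", "data-trove-steam-user-reviews.json",
    "--llm", "anthropic:default",
])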

Summary confidence: low

This is a single-row metadata descriptor for the 'Steam Game Recommendations' dataset, last updated 2025-01-20; it is a catalog entry rather than the underlying data itself. The key takeaway is the scale of what it describes: 41.1 million Steam user reviews (record_count is 41,154,794) stored in a 1.9 GB file, likely sourced via Kaggle or SteamSpy. The metadata notes that the full dataset links to companion files (games.csv and users.csv) via app_id and user_id and includes playtime and helpfulness metrics, which makes those join keys the most important fields to validate before any analysis. Analysts should treat this file as a data dictionary and move quickly to the referenced source files for substantive exploration.
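
The join-key validation recommended in the summary can be sketched with the standard library alone. This is a minimal illustration, not part of the saturn output: the helper name `orphan_keys` and the tiny in-memory tables are hypothetical stand-ins for the real recommendations file, games.csv, and users.csv, whose exact column layouts are not shown in this profile.

```python
import csv
import io


def orphan_keys(child_rows, parent_rows, key):
    """Return key values present in child_rows but missing from parent_rows."""
    parent_values = {row[key] for row in parent_rows}
    return {row[key] for row in child_rows} - parent_values


# Hypothetical miniature stand-ins for the real files (assumed layouts).
recommendations = list(csv.DictReader(io.StringIO(
    "app_id,user_id,hours\n10,1,5.0\n20,2,1.5\n30,3,0.2\n")))
games = list(csv.DictReader(io.StringIO("app_id,title\n10,A\n20,B\n")))
users = list(csv.DictReader(io.StringIO("user_id,reviews\n1,4\n2,9\n3,1\n")))

# Any value listed here would be silently dropped by an inner join.
missing_games = orphan_keys(recommendations, games, "app_id")
missing_users = orphan_keys(recommendations, users, "user_id")
print("app_id values with no match in games:", sorted(missing_games))
print("user_id values with no match in users:", sorted(missing_users))
```

For the real 1.9 GB file, streaming rows from `csv.DictReader` instead of materialising them with `list()` would keep memory use bounded.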

citing: record_count.max · dataset_name.top_value · last_updated.top_value · source.top_value · notes.top_value · row_count

Out[4]:

saturn.schema() · 6 columns

column | kind | n | null% | unique | alerts
dataset_name | categorical | 1 | 0.0% | 1 | long_tail, imbalance
last_updated | categorical | 1 | 0.0% | 1 | long_tail, imbalance
source | categorical | 1 | 0.0% | 1 | long_tail, imbalance
record_count | numeric | 1 | 0.0% | 1 | constant
fields | unknown | 1 | 0.0% |  | skipped
notes | categorical | 1 | 0.0% | 1 | long_tail, imbalance
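
The alerts column can be read as simple rules over value counts. The sketch below is one plausible reading, not saturn's actual implementation: the function name `categorical_alerts` and the 0.9 imbalance threshold are assumptions chosen for illustration, guided only by the alert messages reported later in this notebook ("1 singleton categories", "top value is 100.0% of rows").

```python
from collections import Counter


def categorical_alerts(values, imbalance_threshold=0.9):
    """Return alert names for a categorical column (assumed rules, not saturn's)."""
    if not values:
        return []
    counts = Counter(values)
    alerts = []
    # long_tail: at least one category appears exactly once.
    singletons = sum(1 for count in counts.values() if count == 1)
    if singletons > 0:
        alerts.append("long_tail")
    # imbalance: the most common value dominates the column.
    top_rate = counts.most_common(1)[0][1] / len(values)
    if top_rate >= imbalance_threshold:
        alerts.append("imbalance")
    return alerts


single_row_alerts = categorical_alerts(["Steam Game Recommendations"])
print(single_row_alerts)
```

Under these rules a one-row column trips both alerts at once, since its only value is both a singleton and 100% of the rows. That would explain why every categorical column in the table carries the same pair.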
Fig 1.
dataset_name · Confirms this is a single-dataset metadata file with 100% of rows describing 'Steam Game Recommendations'.
Top values for dataset_name (1 unique shown, of 1 total).
value | count | share
Steam Game Recommendations | 1 | 100.0%
Fig 2.
record_count · Shows the single constant value of 41.1 million records, underscoring the scale of the underlying dataset.
Histogram bins for record_count (median: 41154794.0).
bin | count
4.115e+07 – 4.115e+07 | 0
4.115e+07 – 4.115e+07 | 0
4.115e+07 – 4.115e+07 | 1
4.115e+07 – 4.115e+07 | 0
4.115e+07 – 4.115e+07 | 0
Fig 3.
source · Identifies the data origin as Steam Store user reviews, likely via Kaggle or SteamSpy, which is important for provenance checks.
Top values for source (1 unique shown, of 1 total).
value | count | share
Steam Store user reviews (likely via Kaggle or SteamSpy) | 1 | 100.0%
Fig 4.
last_updated · Shows the single update timestamp of 2025-01-20, useful for tracking data freshness.
Top values for last_updated (1 unique shown, of 1 total).
value | count | share
2025-01-20 | 1 | 100.0%
Fig 5.
Per-column null rate across the corpus. Columns are ordered by input position.
column | kind | null %
dataset_name | categorical | 0.0%
last_updated | categorical | 0.0%
source | categorical | 0.0%
record_count | numeric | 0.0%
fields | unknown | 0.0%
notes | categorical | 0.0%
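
The per-column null rate is the share of records in which a column is missing or null. A minimal sketch follows, assuming the metadata file holds one JSON object with the six profiled columns. The helper name `null_rates` is hypothetical, and the `fields` value is a placeholder because its real content is not shown in this profile.

```python
def null_rates(records):
    """Map each column to the fraction of records where it is None or absent."""
    columns = []
    for record in records:
        for name in record:
            if name not in columns:
                columns.append(name)  # preserve input order, as in Fig 5
    total = len(records)
    return {
        name: sum(1 for record in records if record.get(name) is None) / total
        for name in columns
    }


# One record shaped like the profiled metadata entry.
metadata_record = {
    "dataset_name": "Steam Game Recommendations",
    "last_updated": "2025-01-20",
    "source": "Steam Store user reviews (likely via Kaggle or SteamSpy)",
    "record_count": 41154794,
    "fields": [],  # placeholder (assumption): the real value is not shown in the profile
    "notes": "41.1 million Steam user reviews/recommendations. File is 1.9 GB. "
             "Links to games.csv via app_id and to users.csv via user_id. "
             "Includes playtime and helpfulness metrics.",
}

rates = null_rates([metadata_record])
for column, rate in rates.items():
    print(f"{column}: {rate:.1%}")
```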

dataset_name categorical metadata

This column is a dataset-level metadata tag identifying the source dataset, with every single row (n=1) carrying the value 'Steam Game Recommendations'. Cardinality is 1 and entropy is 0.0, meaning the column is entirely constant and carries zero information. The 'long_tail' and 'imbalance' alerts are triggered mechanically by the extreme top_rate of 1.0, but are not meaningful here — this is simply a constant label.

Treatment: Drop before modelling; zero-variance constant column adds no predictive signal.

anthropic:default · confidence high
Out[11]:

saturn.columns["dataset_name"].stats

stat | value
n | 1
nulls | 0 (0.0%)
unique | 1
top_value | Steam Game Recommendations
top_rate | 1
cardinality | 1
entropy | 0
entropy_ratio | 0
alert: long_tail | 1 singleton categories
alert: imbalance | top value is 100.0% of rows
Fig 6.
Top values for dataset_name.
Top values for dataset_name (1 unique shown, of 1 total).
value | count | share
Steam Game Recommendations | 1 | 100.0%
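
The top_rate, cardinality, and entropy figures reported for dataset_name can be reproduced by hand. The sketch below uses Shannon entropy in bits. Treating entropy_ratio as entropy divided by its maximum, log2(cardinality), and defining it as 0 for a single category, is an assumption about how saturn computes it; the function name `categorical_stats` is hypothetical.

```python
import math
from collections import Counter


def categorical_stats(values):
    """Compute top_value, top_rate, cardinality, entropy, and entropy_ratio."""
    counts = Counter(values)
    n = len(values)
    top_value, top_count = counts.most_common(1)[0]
    cardinality = len(counts)
    # Shannon entropy in bits: sum of p * log2(1 / p) over categories.
    entropy = sum((c / n) * math.log2(n / c) for c in counts.values())
    # Normalise by the maximum possible entropy (assumed definition).
    entropy_ratio = entropy / math.log2(cardinality) if cardinality > 1 else 0.0
    return {
        "top_value": top_value,
        "top_rate": top_count / n,
        "cardinality": cardinality,
        "entropy": entropy,
        "entropy_ratio": entropy_ratio,
    }


stats = categorical_stats(["Steam Game Recommendations"])
print(stats)
```

A single category has probability 1, and log2(1) is 0, so the entropy is exactly 0 regardless of how many rows carry that value.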

last_updated categorical metadata

This column is a metadata timestamp recording when the described dataset was last updated. With only 1 row in the profile and a single value of '2025-01-20' holding a top_rate of 1.0, the column is entirely constant, with zero variance. The long_tail and imbalance alerts are technically correct but vacuous for a single-row file; no meaningful distribution analysis is possible.

Treatment: Exclude from modelling features; retain as audit metadata, but re-evaluate once the full dataset is loaded.

anthropic:default · confidence high
Out[14]:

saturn.columns["last_updated"].stats

stat | value
n | 1
nulls | 0 (0.0%)
unique | 1
top_value | 2025-01-20
top_rate | 1
cardinality | 1
entropy | 0
entropy_ratio | 0
alert: long_tail | 1 singleton categories
alert: imbalance | top value is 100.0% of rows
Fig 7.
Top values for last_updated.
Top values for last_updated (1 unique shown, of 1 total).
value | count | share
2025-01-20 | 1 | 100.0%
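
Using last_updated as audit metadata usually means turning it into an age. The sketch below measures the gap between the profiled value and this notebook's generation date. The 365-day staleness limit (`MAX_AGE_DAYS`) is a hypothetical threshold for illustration, not something the report specifies.

```python
from datetime import date

last_updated = date.fromisoformat("2025-01-20")      # value from the profile
report_generated = date.fromisoformat("2026-06-21")  # notebook generation date

MAX_AGE_DAYS = 365  # hypothetical freshness limit (assumption)

age_days = (report_generated - last_updated).days
is_stale = age_days > MAX_AGE_DAYS
print(f"Metadata age at report time: {age_days} days (stale: {is_stale})")
```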

source categorical metadata

This column records the data provenance/source attribution for the dataset, and contains exactly one unique value across all rows: 'Steam Store user reviews (likely via Kaggle or SteamSpy)'. With cardinality of 1, entropy of 0.0, and a top_rate of 1.0, it is a constant column carrying zero discriminative information. The alerts for long_tail and imbalance are technically triggered but are trivially explained by the single-value nature of the column.

Treatment: Drop before modelling; constant column adds no signal and wastes memory.

anthropic:default · confidence high
Out[17]:

saturn.columns["source"].stats

stat | value
n | 1
nulls | 0 (0.0%)
unique | 1
top_value | Steam Store user reviews (likely via Kaggle or SteamSpy)
top_rate | 1
cardinality | 1
entropy | 0
entropy_ratio | 0
alert: long_tail | 1 singleton categories
alert: imbalance | top value is 100.0% of rows
Fig 8.
Top values for source.
Top values for source (1 unique shown, of 1 total).
value | count | share
Steam Store user reviews (likely via Kaggle or SteamSpy) | 1 | 100.0%
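
The "drop before modelling" treatment recommended for constant columns such as source amounts to a zero-variance filter. A minimal sketch follows. The helper names and the three review rows are hypothetical, standing in for what the full dataset might look like once a constant provenance label is attached to every row.

```python
def constant_columns(records):
    """Return names of columns whose value never varies across records."""
    if not records:
        return []
    first = records[0]
    return [
        name for name in first
        if all(record.get(name) == first[name] for record in records)
    ]


def drop_columns(records, to_drop):
    """Return copies of the records without the named columns."""
    return [
        {key: value for key, value in record.items() if key not in to_drop}
        for record in records
    ]


# Hypothetical rows (assumed layout) with a constant provenance label.
rows = [
    {"app_id": 10, "hours": 5.0, "source": "Steam Store user reviews"},
    {"app_id": 20, "hours": 1.5, "source": "Steam Store user reviews"},
    {"app_id": 30, "hours": 0.2, "source": "Steam Store user reviews"},
]

to_drop = constant_columns(rows)
features = drop_columns(rows, to_drop)
print("dropped:", to_drop)
print("remaining columns:", list(features[0]))
```

With a one-row input every column is trivially constant, so a filter like this only becomes informative once it runs against multi-row data.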

record_count numeric metadata

This column appears to be a summary or metadata field recording total row count for a dataset or batch, with a single observed value of 41,154,794. The dataset profile contains only 1 row (n=1), meaning this column is a scalar summary rather than a per-record attribute. It is flagged as 'constant' with zero variance, zero IQR, and min/max/mean/median all equal to 41,154,794.0. There is no analytical signal here — it carries no discriminative power and exists purely as a metadata annotation.

Treatment: Drop before modelling; retain only as a data-quality audit reference if needed.

anthropic:default · confidence high
Out[20]:

saturn.columns["record_count"].stats

stat | value
n | 1
nulls | 0 (0.0%)
unique | 1
min | 4.115e+07
max | 4.115e+07
mean | 4.115e+07
median | 4.115e+07
std | 0
q1 | 4.115e+07
q3 | 4.115e+07
iqr | 0
skew | 0
kurtosis | 0
n_outliers | 0
outlier_rate | 0
zero_rate | 0
alert: constant | only one distinct value
Fig 9.
Distribution of record_count. Vertical dash marks the median.
Histogram bins for record_count (median: 41154794.0).
bin | count
4.115e+07 – 4.115e+07 | 0
4.115e+07 – 4.115e+07 | 0
4.115e+07 – 4.115e+07 | 1
4.115e+07 – 4.115e+07 | 0
4.115e+07 – 4.115e+07 | 0
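
The five bins with identical labels look odd but are consistent with ordinary histogram behaviour on a constant column. The sketch below is an assumption about how the bins arise, not saturn's documented method: when min equals max, the range is padded by 0.5 on each side (the convention NumPy's `np.histogram` uses for a zero-width range) and then split into five equal bins. Each bin is 0.2 wide, so four significant digits cannot tell the edges apart.

```python
def histogram(values, bins=5):
    """Equal-width histogram; pads a zero-width range by 0.5 on each side."""
    lo, hi = min(values), max(values)
    if lo == hi:
        lo, hi = lo - 0.5, hi + 0.5  # assumed convention, as in np.histogram
    width = (hi - lo) / bins
    counts = [0] * bins
    for value in values:
        # Clamp so the maximum value lands in the last bin.
        index = min(int((value - lo) / width), bins - 1)
        counts[index] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, counts


edges, counts = histogram([41154794])
for left, right, count in zip(edges, edges[1:], counts):
    # Four significant digits, matching the precision shown in the report.
    print(f"{left:.4g} – {right:.4g} | {count}")
```

The single observation sits at the centre of the padded range, so it falls in the third of the five bins, matching the table above.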

fields unknown other

This column ('fields') contains only a single row and was skipped by the profiler, yielding no distributional statistics. With n=1 and no type inference completed, essentially nothing can be determined about its content or role. The absence of nulls is the only positive signal available.

Treatment: Inspect raw value manually before deciding on handling; re-profile with a larger sample if this column appears in a fuller dataset.

anthropic:default · confidence low
Out[23]:

saturn.columns["fields"].stats

stat | value
n | 1
nulls | 0 (0.0%)
unique |
alert: skipped | no profiler for kind=unknown
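
Inspecting the raw fields value manually, as the treatment suggests, can start with a type and size summary. The helper `describe_value` and the example content below are hypothetical: the profile does not show what fields actually holds, and a nested list or object is only a guess at why no profiler applied.

```python
import json


def describe_value(value, preview_chars=80):
    """Summarise a raw JSON value: Python type, size, and a short preview."""
    size = len(value) if isinstance(value, (list, dict, str)) else None
    preview = json.dumps(value)[:preview_chars]
    return {"type": type(value).__name__, "size": size, "preview": preview}


# Hypothetical content (assumption): the real `fields` value is not shown here.
example_fields = ["app_id", "user_id"]
summary = describe_value(example_fields)
print(summary)
```

Against the real file, the same helper could be applied to the fields entry after loading the JSON with `json.load`, once the file's top-level structure has been confirmed.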

notes categorical metadata

This column is a dataset-level metadata note, not a real data column — its single value is a documentation string describing the broader dataset (41.1 million Steam reviews, file size, join keys, and available metrics). With n=1, cardinality=1, and top_rate=1.0, it carries zero analytical signal and is purely an artifact of how the dataset profile was constructed. The entropy of 0.0 confirms there is no variation whatsoever.

Treatment: Drop before modelling; this is a documentation artifact with no predictive or analytical value.

anthropic:default · confidence high
Out[25]:

saturn.columns["notes"].stats

stat | value
n | 1
nulls | 0 (0.0%)
unique | 1
top_value | 41.1 million Steam user reviews/recommendations. File is 1.9 GB. Links to games.csv via app_id and to users.csv via user_id. Includes playtime and helpfulness metrics.
top_rate | 1
cardinality | 1
entropy | 0
entropy_ratio | 0
alert: long_tail | 1 singleton categories
alert: imbalance | top value is 100.0% of rows
Fig 10.
Top values for notes.
Top values for notes (1 unique shown, of 1 total).
value | count | share
41.1 million Steam user reviews/recommendations. File is 1.9 GB. Links to games.csv via app_id and to users.csv via user_id. Includes playtime and helpfulness metrics. | 1 | 100.0%
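
Although notes is free text, the facts it carries can be extracted and cross-checked against record_count. The sketch below is an illustration only. The regular expressions are fitted to this one string, and reading "GB" as 10^9 bytes is an assumption.

```python
import re

notes = (
    "41.1 million Steam user reviews/recommendations. File is 1.9 GB. "
    "Links to games.csv via app_id and to users.csv via user_id. "
    "Includes playtime and helpfulness metrics."
)
record_count = 41154794  # value of the record_count column

# Pairs of the form "to <file>.csv via <key>".
join_keys = dict(re.findall(r"to (\w+\.csv) via (\w+)", notes))

stated_millions = float(re.search(r"([\d.]+) million", notes).group(1))
size_gb = float(re.search(r"File is ([\d.]+) GB", notes).group(1))

# The rounded figure in the text should agree with the exact count.
counts_agree = abs(record_count / 1e6 - stated_millions) < 0.1

# Rough average row size, assuming 1 GB = 10**9 bytes.
bytes_per_record = size_gb * 1e9 / record_count

print(join_keys)
print(f"counts agree: {counts_agree}, about {bytes_per_record:.0f} bytes per record")
```

An average of roughly 46 bytes per record suggests compact rows made up mostly of identifiers and short numeric fields, though the profile does not show the file layout.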

How to cite


BibTeX
@misc{saturn-data-trove-steam-user-reviews-2026,
  author       = {Steuber, Luke},
  title        = {Saturn reading: data trove steam user reviews},
  year         = {2026},
  howpublished = {\url{https://dr.eamer.dev/saturn/view/data-trove-steam-user-reviews}},
  note         = {Profiled with saturn-dissect v0.2.0, prompt saturn-insight-v2, model anthropic:default},
}
APA
Steuber, L. (2026). Saturn reading: data trove steam user reviews. Source: /home/coolhand/html/datavis/data_trove/entertainment/gaming/recommendations_metadata.json. Profiled with saturn-dissect v0.2.0 (saturn-insight-v2, anthropic:default). Retrieved from https://dr.eamer.dev/saturn/view/data-trove-steam-user-reviews