data trove steam user reviews
Reading
This is a single-row metadata descriptor for the 'Steam Game Recommendations' dataset, last updated 2025-01-20. It is a catalog entry, not the underlying data. The key takeaway is the scale of what it describes: 41,154,794 Steam user reviews (described as 41.1 million in the notes field) stored in a 1.9 GB file, with the source recorded as 'Steam Store user reviews (likely via Kaggle or SteamSpy)'. The notes state that the review file links to two companion files, games.csv via app_id and users.csv via user_id, and that it includes playtime and helpfulness metrics. Those two join keys are the most important fields to validate before any analysis. Analysts should treat this file as a data dictionary and go to the referenced source files for substantive exploration.
citing: record_count.max · dataset_name.top_value · last_updated.top_value · source.top_value · notes.top_value · row_count
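The join-key validation recommended in the summary could be sketched as follows. This is a minimal standard-library sketch, not the dataset author's method. It assumes the files are CSVs with `app_id` and `user_id` header columns, and the review file name `recommendations.csv` used in the usage note is hypothetical (the descriptor names only games.csv and users.csv). The review file is streamed row by row because the descriptor reports it as 1.9 GB.

```python
import csv
from typing import Iterable, Set, Tuple


def load_keys(path: str, column: str) -> Set[str]:
    """Collect the distinct values of one column from a CSV file."""
    with open(path, newline="", encoding="utf-8") as handle:
        return {row[column] for row in csv.DictReader(handle)}


def count_orphans(rows: Iterable[dict], column: str, valid_keys: Set[str]) -> Tuple[int, int]:
    """Return (rows seen, rows whose key is absent from valid_keys)."""
    seen = orphans = 0
    for row in rows:
        seen += 1
        if row[column] not in valid_keys:
            orphans += 1
    return seen, orphans


def validate_join_key(reviews_path: str, lookup_path: str, column: str) -> Tuple[int, int]:
    """Stream the review file once and count keys missing from the lookup file."""
    valid_keys = load_keys(lookup_path, column)
    with open(reviews_path, newline="", encoding="utf-8") as handle:
        return count_orphans(csv.DictReader(handle), column, valid_keys)
```

A call such as `validate_join_key("recommendations.csv", "games.csv", "app_id")` would return the number of review rows read and how many of them reference an `app_id` that games.csv does not contain. Keys are compared as strings, so differences such as leading zeros or stray whitespace would count as mismatches.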
Charts the summary said to look at first
dataset_name
| value | count | share |
|---|---|---|
| Steam Game Recommendations | 1 | 100.0% |
record_count (histogram; all five bins share the same edges because the column holds a single value)
| bin | count |
|---|---|
| 4.115e+07 – 4.115e+07 | 1 |
source
| value | count | share |
|---|---|---|
| Steam Store user reviews (likely via Kaggle or SteamSpy) | 1 | 100.0% |
last_updated
| value | count | share |
|---|---|---|
| 2025-01-20 | 1 | 100.0% |
Schema
6 columns
| Column | Type | Nulls | Unique | Alerts |
|---|---|---|---|---|
| dataset_name | categorical | 0.0% | 1 | long_tail, imbalance |
| last_updated | categorical | 0.0% | 1 | long_tail, imbalance |
| source | categorical | 0.0% | 1 | long_tail, imbalance |
| record_count | numeric | 0.0% | 1 | constant |
| fields | unknown | 0.0% | — | skipped |
| notes | categorical | 0.0% | 1 | long_tail, imbalance |
dataset_name
categorical · metadata · long_tail · imbalance
This column is a dataset-level metadata tag naming the dataset. Its only row (n=1) carries the value 'Steam Game Recommendations'. Cardinality is 1 and entropy is 0.0, so the column is constant and carries no information. The 'long_tail' and 'imbalance' alerts are triggered mechanically by the top_rate of 1.0 and are not meaningful here. Treatment: Drop before modelling; a zero-variance constant column adds no predictive signal.
- n: 1
- nulls: 0 (0.0%)
- unique: 1
- top_value: Steam Game Recommendations
- top_rate: 1
- cardinality: 1
- entropy: 0
- entropy_ratio: 0
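The cardinality, top_rate, entropy, and entropy_ratio figures above can be reproduced with a short sketch. The profiler's exact definitions are not stated in this report, so two assumptions are made here: entropy is Shannon entropy in bits, and entropy_ratio is entropy divided by its maximum, log2(cardinality), reported as 0 when there is only one category.

```python
import math
from collections import Counter
from typing import Dict, Sequence


def categorical_summary(values: Sequence[str]) -> Dict[str, float]:
    """Compute cardinality, top_rate, entropy (bits), and entropy_ratio."""
    if not values:
        raise ValueError("values must be non-empty")
    counts = Counter(values)
    total = sum(counts.values())
    cardinality = len(counts)
    top_rate = max(counts.values()) / total
    # Shannon entropy: sum of -p * log2(p) over the observed categories.
    entropy = sum(-(c / total) * math.log2(c / total) for c in counts.values())
    # Assumed definition: normalise by the maximum entropy, log2(cardinality).
    # With one category that maximum is 0, so the ratio is reported as 0.
    max_entropy = math.log2(cardinality) if cardinality > 1 else 0.0
    entropy_ratio = entropy / max_entropy if max_entropy > 0 else 0.0
    return {
        "cardinality": cardinality,
        "top_rate": top_rate,
        "entropy": entropy,
        "entropy_ratio": entropy_ratio,
    }
```

For a one-row column such as `categorical_summary(["Steam Game Recommendations"])`, the single category has probability 1, so top_rate is 1 and entropy is 0. That is why the alerts fire on any single-row profile regardless of the value.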
last_updated
categorical · metadata · long_tail · imbalance
This column is a metadata timestamp recording when the dataset was last updated. With only 1 row and a single value of '2025-01-20' at a top_rate of 1.0, the column is constant and has zero variance. The long_tail and imbalance alerts are technically correct but vacuous for a single-row dataset; no meaningful distribution analysis is possible. Treatment: Exclude from modelling features; retain as audit metadata, and re-evaluate once the full dataset is loaded.
- n: 1
- nulls: 0 (0.0%)
- unique: 1
- top_value: 2025-01-20
- top_rate: 1
- cardinality: 1
- entropy: 0
- entropy_ratio: 0
source
categorical · metadata · long_tail · imbalance
This column records the provenance of the dataset and holds one value in its single row: 'Steam Store user reviews (likely via Kaggle or SteamSpy)'. With a cardinality of 1, entropy of 0.0, and top_rate of 1.0, it is a constant column carrying no discriminative information. The long_tail and imbalance alerts are triggered but are trivially explained by the single value. Treatment: Drop before modelling; a constant column adds no signal.
- n: 1
- nulls: 0 (0.0%)
- unique: 1
- top_value: Steam Store user reviews (likely via Kaggle or SteamSpy)
- top_rate: 1
- cardinality: 1
- entropy: 0
- entropy_ratio: 0
record_count
numeric · metadata · constant
This column is a metadata field recording the total row count of the described dataset, with a single observed value of 41,154,794. The profile contains only 1 row (n=1), so this column is a scalar summary rather than a per-record attribute. It is flagged as 'constant': variance and IQR are zero, and min, max, mean, and median all equal 41,154,794. It carries no discriminative power and exists purely as a metadata annotation. Treatment: Drop before modelling; retain only as a data-quality audit reference if needed.
- n: 1
- nulls: 0 (0.0%)
- unique: 1
- min: 4.115e+07
- max: 4.115e+07
- mean: 4.115e+07
- median: 4.115e+07
- std: 0
- q1: 4.115e+07
- q3: 4.115e+07
- iqr: 0
- skew: 0
- kurtosis: 0
- n_outliers: 0
- outlier_rate: 0
- zero_rate: 0
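The "drop before modelling" treatment for constant columns could be sketched like this. It is a minimal illustration that assumes the data is held as a list of row dictionaries; the function names are hypothetical and are not part of the profiler.

```python
from typing import Any, Dict, Iterable, List


def constant_columns(rows: List[Dict[str, Any]]) -> List[str]:
    """Return the columns whose value is identical in every row."""
    if not rows:
        return []
    first = rows[0]
    return [
        name
        for name in first
        if all(row[name] == first[name] for row in rows)
    ]


def drop_columns(rows: List[Dict[str, Any]], names: Iterable[str]) -> List[Dict[str, Any]]:
    """Return copies of the rows without the named columns."""
    excluded = set(names)
    return [{k: v for k, v in row.items() if k not in excluded} for row in rows]
```

With a single row, `constant_columns` returns every column, since each one trivially holds one value. The 'constant' flag in this profile is therefore a consequence of n=1 and says nothing about how the column would behave in a larger table.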
fields
unknown · other · skipped
The 'fields' column was skipped by the profiler, so no type was inferred and no distributional statistics are available. With n=1, nothing can be determined about its content or role from this profile; the only confirmed fact is that the value is not null. Treatment: Inspect the raw value manually before deciding on handling; re-profile with a larger sample if this column appears in a fuller dataset.
- n: 1
- nulls: 0 (0.0%)
- unique: —
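The manual inspection suggested above could start with a small helper like the one below. The profile does not reveal what `fields` actually contains, so this sketch only reports the shape of whatever value it is given. The idea that the cell might hold a JSON-encoded structure is a guess, and the helper name is hypothetical.

```python
import json
from typing import Any


def describe_raw_value(value: Any) -> str:
    """Summarise a raw cell so a reader can decide how to handle the column."""
    if isinstance(value, str):
        try:
            parsed = json.loads(value)
        except json.JSONDecodeError:
            # Not valid JSON: treat it as free text.
            return f"plain string of length {len(value)}"
        return f"JSON-encoded {type(parsed).__name__}"
    if isinstance(value, (list, tuple, dict)):
        return f"{type(value).__name__} with {len(value)} items"
    return type(value).__name__
```

If the result is a nested type such as a list or dict, that would be one plausible reason for the profiler to skip the column, though the report itself does not state why it did.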
notes
categorical · metadata · long_tail · imbalance
This column is a dataset-level note, not a data column. Its single value is a documentation string describing the broader dataset: the review count, the file size, the join keys, and the available metrics. With n=1, cardinality=1, and top_rate=1.0, it carries no analytical signal, and the entropy of 0.0 confirms there is no variation. Treatment: Drop before modelling; this is a documentation artifact with no predictive or analytical value.
- n: 1
- nulls: 0 (0.0%)
- unique: 1
- top_value: 41.1 million Steam user reviews/recommendations. File is 1.9 GB. Links to games.csv via app_id and to users.csv via user_id. Includes playtime and helpfulness metrics.
- top_rate: 1
- cardinality: 1
- entropy: 0
- entropy_ratio: 0
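The file size in the note and the record_count value together give a rough average row size, which can help when planning chunked reads of the review file. The arithmetic below is a sketch: the note does not say whether "GB" means 10^9 or 2^30 bytes, so both are computed, and the chunk-size helper is a hypothetical illustration.

```python
RECORD_COUNT = 41_154_794  # record_count from the descriptor
FILE_SIZE_GB = 1.9         # "File is 1.9 GB" from the notes field


def average_row_bytes(size_gb: float, rows: int, bytes_per_gb: int) -> float:
    """Average on-disk bytes per row for a file of the given size."""
    if rows <= 0:
        raise ValueError("rows must be positive")
    return size_gb * bytes_per_gb / rows


def rows_per_chunk(memory_budget_bytes: int, row_bytes: float) -> int:
    """Rows that fit in a memory budget, given an assumed per-row cost."""
    return max(1, int(memory_budget_bytes // row_bytes))


# Decimal gigabytes (10**9 bytes) versus binary gibibytes (2**30 bytes).
decimal_estimate = average_row_bytes(FILE_SIZE_GB, RECORD_COUNT, 10**9)
binary_estimate = average_row_bytes(FILE_SIZE_GB, RECORD_COUNT, 2**30)
```

Both estimates fall between 46 and 50 bytes per row on disk. In-memory representations are usually larger than the on-disk text, so a figure from this range is better treated as a lower bound when passed to `rows_per_chunk`.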