data trove boy bands
Reading
This dataset is a small reference list of famous boy bands: 15 rows covering 11 distinct bands, each row recording a band's name and its years active. The most immediately interesting angle is the band frequency distribution: four bands (Westlife, Jonas Brothers, Take That, and Blue) each appear twice, which suggests either duplicate rows or multiple entries per group and is worth investigating. The Years Active column is entirely unique across all 15 rows, so the repeated bands differ at least in their Years Active strings. The column spans acts from 1958 (The Osmonds) to present-day groups, hinting at a wide generational spread that could reward closer reading.
citing: row_count · column_count · top_value · top_rate · n_unique · null_rate
Charts the summary said to look at first
Band (value counts)
| value | count | share |
|---|---|---|
| Westlife | 2 | 13.3% |
| Jonas Brothers | 2 | 13.3% |
| Take That | 2 | 13.3% |
| Blue | 2 | 13.3% |
| NSync | 1 | 6.7% |
| Backstreet Boys | 1 | 6.7% |
| BTS | 1 | 6.7% |
| One Direction | 1 | 6.7% |
| The Osmonds | 1 | 6.7% |
| New Kids on the Block | 1 | 6.7% |
| The Beatles | 1 | 6.7% |

Years Active (value counts)
| value | count | share |
|---|---|---|
| 1995-2002 | 1 | 6.7% |
| 1998-2012 | 1 | 6.7% |
| 2018-present | 1 | 6.7% |
| 1993-present | 1 | 6.7% |
| 2013-present | 1 | 6.7% |
| 2010-2016 | 1 | 6.7% |
| 2005-2013 | 1 | 6.7% |
| 2019-present | 1 | 6.7% |
| 1958-present | 1 | 6.7% |
| 1984-1994 | 1 | 6.7% |
| 1990-1996 | 1 | 6.7% |
| 2005-present | 1 | 6.7% |
| 1960-1970 | 1 | 6.7% |
| 2000-2005 | 1 | 6.7% |
| 2011-present | 1 | 6.7% |
Schema
4 columns

| Column | Type | Null rate | Unique | Alerts |
|---|---|---|---|---|
| index | numeric | 0.0% | 15 | |
| S.No. | numeric | 0.0% | 15 | |
| Band | categorical | 0.0% | 11 | long_tail |
| Years Active | categorical | 0.0% | 15 | long_tail |
index
numeric identifier
This column is a row index running 0–14 across all 15 records, with perfect uniqueness and no nulls. Values are evenly spaced (mean = median = 7.0, skew = 0.0, excess kurtosis −1.21, i.e. platykurtic), consistent with an auto-generated sequential integer index. It carries no analytical information. Treatment: Drop before modelling; it is a row counter with no predictive value.
- n: 15
- nulls: 0 (0.0%)
- unique: 15
- min: 0
- max: 14
- mean: 7
- median: 7
- std: 4.472
- q1: 3.5
- q3: 10.5
- iqr: 7
- skew: 0
- kurtosis: -1.211
- n_outliers: 0
- outlier_rate: 0
- zero_rate: 0.06667
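The figures above can be reproduced from the integers 0 through 14 alone. The following is a minimal sketch using only the Python standard library. The report does not say which estimators the profiler uses, so the choices here (sample standard deviation, inclusive quantiles, population-moment excess kurtosis) are assumptions that happen to match the reported values.

```python
import statistics

# The index column as described in the report: integers 0 through 14.
values = list(range(15))
n = len(values)

mean = statistics.mean(values)
median = statistics.median(values)
std = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)

# Quartiles with linear interpolation over the closed range (assumed method).
q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1

# Population central moments, used for skew and excess kurtosis (assumed).
m2 = sum((x - mean) ** 2 for x in values) / n
m3 = sum((x - mean) ** 3 for x in values) / n
m4 = sum((x - mean) ** 4 for x in values) / n
skew = m3 / m2 ** 1.5
excess_kurtosis = m4 / m2 ** 2 - 3

# Share of rows whose value is exactly zero (one row out of 15).
zero_rate = values.count(0) / n

print(mean, median, round(std, 3), q1, q3, iqr)
print(round(skew, 3), round(excess_kurtosis, 3), round(zero_rate, 5))
```

The non-zero zero_rate is simply the first row of a zero-based counter, not a data-quality signal.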
S.No.
numeric identifier
This column is a sequential row index (serial number), running from 1 to 15 with all 15 values unique and no nulls. The distribution is perfectly symmetric (skew = 0.0, mean = median = 8.0) and evenly spread, consistent with a simple integer counter. It is the index column shifted by one and carries no analytical information beyond row ordering. Treatment: Drop before modelling; use only for row traceability if needed.
- n: 15
- nulls: 0 (0.0%)
- unique: 15
- min: 1
- max: 15
- mean: 8
- median: 8
- std: 4.472
- q1: 4.5
- q3: 11.5
- iqr: 7
- skew: 0
- kurtosis: -1.211
- n_outliers: 0
- outlier_rate: 0
- zero_rate: 0
Band
categorical label long_tail
This column contains the names of pop/boy bands and functions as a categorical label in what appears to be a small reference dataset: 15 rows covering 11 distinct acts. The top four values (Westlife, Jonas Brothers, Take That, Blue) each appear exactly twice, while the remaining 7 bands appear once, which triggers the long_tail alert despite the tiny dataset size. With a high entropy ratio (0.975) and near-unique cardinality (11 distinct values in 15 rows), this column behaves more like an identifier than a grouping feature. Treatment: Use as a grouping label for lookup or display; with only 15 rows and 11 unique values, avoid treating it as a statistical feature without acquiring significantly more data.
- n: 15
- nulls: 0 (0.0%)
- unique: 11
- top_value: Westlife
- top_rate: 0.1333
- cardinality: 11
- entropy: 3.374
- entropy_ratio: 0.9752
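The entropy figures follow directly from the Band value counts. The sketch below recomputes them with the standard library, assuming the profiler uses Shannon entropy in bits and normalises by log2 of the cardinality; that assumption reproduces the reported 3.374 and 0.9752. The counts are copied from the Band value table in this report.

```python
import math
from collections import Counter

# Counts copied from the Band value table in this report.
band_counts = Counter({
    "Westlife": 2, "Jonas Brothers": 2, "Take That": 2, "Blue": 2,
    "NSync": 1, "Backstreet Boys": 1, "BTS": 1, "One Direction": 1,
    "The Osmonds": 1, "New Kids on the Block": 1, "The Beatles": 1,
})

n_rows = sum(band_counts.values())   # total rows
cardinality = len(band_counts)       # distinct band names

# Share of rows per band, then Shannon entropy in bits.
shares = {band: count / n_rows for band, count in band_counts.items()}
entropy = -sum(p * math.log2(p) for p in shares.values())

# Normalise by the maximum entropy for this many categories (assumed definition).
entropy_ratio = entropy / math.log2(cardinality)

# Bands that occur more than once are the candidates for a duplicate check.
repeated = sorted(band for band, count in band_counts.items() if count > 1)

print(n_rows, cardinality, round(entropy, 3), round(entropy_ratio, 4))
print(repeated)
```

A ratio this close to 1 means the counts are nearly uniform, which is why the column carries little grouping signal.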
Years Active
categorical feature long_tail
This column captures the active career span of each entity (here, bands) as free-form date-range strings such as '1995-2002' or '2018-present'. With a cardinality of 15 out of 15 rows and an entropy_ratio of ~1.0, every value is unique, so the column is effectively free text with no repeated categories. The trailing whitespace visible in values like '1995-2002 ' and '1958-present ' indicates inconsistent formatting that will require cleaning before any date parsing. Treatment: Strip whitespace, split on '-' to extract the start year and the end year (or the 'present' flag), then engineer a numeric duration and an is_active boolean feature.
- n: 15
- nulls: 0 (0.0%)
- unique: 15
- top_value: 1995-2002
- top_rate: 0.06667
- cardinality: 15
- entropy: 3.907
- entropy_ratio: 1
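The suggested treatment for this column can be sketched as a small parser. This is an illustration under assumptions, not the profiler's own code: the function name `parse_years_active` and the `current_year` parameter are hypothetical, and the sketch assumes every value has the form 'YYYY-YYYY' or 'YYYY-present', optionally padded with whitespace.

```python
from typing import Optional, TypedDict


class YearsActive(TypedDict):
    start_year: int
    end_year: Optional[int]  # None when the band is still active
    is_active: bool
    duration: int            # in years, measured to current_year if active


def parse_years_active(raw: str, current_year: int) -> YearsActive:
    """Parse a 'YYYY-YYYY' or 'YYYY-present' string (hypothetical helper)."""
    cleaned = raw.strip()                        # drop trailing/leading whitespace
    start_text, end_text = cleaned.split("-", 1)
    start_year = int(start_text)

    is_active = end_text.strip().lower() == "present"
    end_year = None if is_active else int(end_text)

    # For active bands, measure duration up to the caller-supplied year.
    last_year = current_year if end_year is None else end_year
    return {
        "start_year": start_year,
        "end_year": end_year,
        "is_active": is_active,
        "duration": last_year - start_year,
    }


# Two values quoted in the report, including their trailing spaces.
closed = parse_years_active("1995-2002 ", current_year=2024)
ongoing = parse_years_active("1958-present ", current_year=2024)
print(closed)
print(ongoing)
```

Passing `current_year` explicitly keeps the duration reproducible; deriving it from the system clock would make the feature change from run to run.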