blissapi
Reading
This dataset contains 6,181 rows and 2 columns drawn from blissapi.db, pairing a text 'keyword' field with a categorical 'symbol_count' field. Every keyword is unique (6,181 distinct values across 6,181 rows) and is exactly one word, with lengths ranging from 2 to 72 characters (median 12, mean 14.5, 95th percentile 31). The 'symbol_count' column is constant at the value '1', so it carries no information for analysis. The most useful first look is the distribution of keyword lengths, since that is essentially the only varying signal in the data.
citing: row_count · column_count · columns[keyword].n_unique · columns[keyword].stats.one_word_rate · columns[keyword].stats.len_min · columns[keyword].stats.len_max · columns[keyword].stats.len_median · columns[keyword].stats.len_mean · columns[keyword].stats.len_p95 · columns[symbol_count].n_unique · columns[symbol_count].stats.top_value · columns[symbol_count].stats.top_rate
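The figures cited in the summary can be recomputed from the raw keywords. The following is a minimal sketch, not the profiler's actual implementation: `profile_keywords` is a hypothetical helper, and it assumes that "one word" means a single whitespace-separated token and that length is counted in characters.

```python
from statistics import mean, median


def profile_keywords(keywords):
    """Recompute a few of the cited stats for a list of keyword strings."""
    lengths = [len(k) for k in keywords]
    # Assumption: a keyword counts as one word if splitting on whitespace yields one token.
    one_word = sum(1 for k in keywords if len(k.split()) == 1)
    return {
        "row_count": len(keywords),
        "n_unique": len(set(keywords)),
        "one_word_rate": one_word / len(keywords),
        "len_min": min(lengths),
        "len_max": max(lengths),
        "len_median": median(lengths),
        "len_mean": mean(lengths),
    }


# Two keywords quoted in the report, used here only as a tiny illustration.
sample = ["cns_injury", "tangerine_clementine_mandarin"]
stats = profile_keywords(sample)
```

On the full column, the same function would be expected to reproduce the values listed under keyword, provided the profiler uses the same definitions.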
Charts the summary said to look at first
Keyword length distribution (characters per keyword)
| chars | count |
|---|---|
| 2 – 4 | 84 |
| 4 – 6 | 531 |
| 6 – 7 | 641 |
| 7 – 9 | 374 |
| 9 – 11 | 774 |
| 11 – 12 | 696 |
| 12 – 14 | 588 |
| 14 – 16 | 268 |
| 16 – 18 | 438 |
| 18 – 20 | 395 |
| 20 – 21 | 267 |
| 21 – 23 | 132 |
| 23 – 25 | 223 |
| 25 – 26 | 172 |
| 26 – 28 | 142 |
| 28 – 30 | 61 |
| 30 – 32 | 96 |
| 32 – 34 | 62 |
| 34 – 35 | 56 |
| 35 – 37 | 26 |
| 37 – 39 | 41 |
| 39 – 40 | 24 |
| 40 – 42 | 26 |
| 42 – 44 | 14 |
| 44 – 46 | 10 |
| 46 – 48 | 9 |
| 48 – 49 | 5 |
| 49 – 51 | 3 |
| 51 – 53 | 6 |
| 53 – 54 | 3 |
| 54 – 56 | 0 |
| 56 – 58 | 3 |
| 58 – 60 | 2 |
| 60 – 62 | 1 |
| 62 – 63 | 6 |
| 63 – 65 | 0 |
| 65 – 67 | 0 |
| 67 – 68 | 0 |
| 68 – 70 | 1 |
| 70 – 72 | 1 |
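The bin labels in this table look uneven ("6 – 7" sits next to "7 – 9"). They are consistent with 40 equal-width bins spanning len_min = 2 to len_max = 72 (width 1.75), with the edges rounded to integers for display. The sketch below reconstructs the labels under that assumption. The half-open bin convention is also an assumption; the report does not state how edge values are assigned.

```python
LEN_MIN, LEN_MAX = 2, 72  # len_min and len_max from the profile
N_BINS = 40               # number of rows in the table (assumed to equal the bin count)

width = (LEN_MAX - LEN_MIN) / N_BINS  # 1.75 characters per bin
edges = [LEN_MIN + i * width for i in range(N_BINS + 1)]

# Python's round() rounds exact halves to the nearest even integer,
# so 5.5 becomes 6 and 12.5 becomes 12.
labels = [f"{round(lo)} – {round(hi)}" for lo, hi in zip(edges, edges[1:])]


def lengths_in_bin(i):
    """Integer lengths that fall in bin i, assuming [lo, hi) bins with a closed last bin."""
    lo, hi = edges[i], edges[i + 1]
    is_last = i == N_BINS - 1
    return [n for n in range(LEN_MIN, LEN_MAX + 1)
            if lo <= n < hi or (is_last and n == hi)]
```

If the assumption holds, bins alternately cover two integer lengths or one (for example, `lengths_in_bin(2)` covers lengths 6 and 7 while `lengths_in_bin(3)` covers only 8). Part of the sawtooth in the counts, such as 641 in "6 – 7" against 374 in "7 – 9", may therefore be a binning artifact and not a feature of the data. Re-binning with integer-width bins would avoid this.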
symbol_count value counts
| value | count | share |
|---|---|---|
| 1 | 6181 | 100.0% |
Schema
2 columns
| Column | Type | Nulls | Unique | Alerts |
|---|---|---|---|---|
| keyword | text | 0.0% | 6,181 | near_unique, one_word |
| symbol_count | categorical | 0.0% | 1 | imbalance |
keyword
text · identifier · near_unique · one_word
This column is a single-word keyword or concept tag, with every one of the 6,181 rows holding a unique value (n_unique = 6181, duplicate_rate = 0.0, one_word_rate = 1.0). Tokens are short (len_mean 14.5, len_median 12) and many are compound forms joined by underscores like 'tangerine_clementine_mandarin' or 'cns_injury', suggesting a controlled vocabulary of concept labels rather than free text. The fully unique vocabulary means it behaves like an identifier for distinct concepts, not a categorical feature.
Treatment: Treat as a concept key; split underscore-joined tokens and embed if semantic similarity is needed, otherwise leave out of modelling.
- n: 6,181
- nulls: 0 (0.0%)
- unique: 6,181
- len_min: 2
- len_max: 72
- len_mean: 14.53
- len_median: 12
- len_p95: 31
- word_mean: 1
- word_median: 1
- n_empty: 0
- n_duplicates: 0
- duplicate_rate: 0
- vocab_size: 6,181
- readability_flesch_mean: -75.9
- emoji_rate: 0
- url_rate: 0
- one_word_rate: 1
- allcaps_rate: 0
- boilerplate_rate: 0
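The treatment suggested for keyword (splitting underscore-joined tokens) could look like the sketch below. `split_keyword` and `part_counts` are hypothetical names, and treating every underscore as a separator is an assumption that may not hold for all entries.

```python
from collections import Counter


def split_keyword(keyword):
    """Split a compound keyword on underscores, dropping empty parts."""
    return [part for part in keyword.split("_") if part]


# Two compound keywords quoted in the report.
examples = ["tangerine_clementine_mandarin", "cns_injury"]
parts = {k: split_keyword(k) for k in examples}

# Counting parts across all keywords shows which sub-tokens recur,
# which is one way to recover some categorical signal from a fully unique column.
part_counts = Counter(p for k in examples for p in split_keyword(k))
```

The resulting parts can be fed to an embedding model if semantic similarity is needed. The choice of model is outside the scope of this report.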
symbol_count
categorical · metadata · imbalance
This column records a symbol count, but every one of the 6,181 rows holds the value "1" (top_rate 1.0, cardinality 1, entropy 0). It carries no information and was flagged for imbalance. There are no nulls, just a single constant.
Treatment: Drop; constant column with zero entropy.
- n: 6,181
- nulls: 0 (0.0%)
- unique: 1
- top_value: 1
- top_rate: 1
- cardinality: 1
- entropy: 0
- entropy_ratio: 0
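The zero-entropy figure for symbol_count follows directly from the definition of Shannon entropy: a single value with probability 1 contributes 1 × log(1) = 0. The sketch below illustrates this. The base-2 logarithm and the normalisation used for `entropy_ratio` (entropy divided by the log of the cardinality, defined as 0 for a single value) are assumptions about the profiler, not documented behaviour.

```python
from collections import Counter
from math import log2


def shannon_entropy(values):
    """Shannon entropy in bits of the empirical distribution of values."""
    n = len(values)
    counts = Counter(values)
    # p * log2(1/p) summed over distinct values, written with n/c to avoid a negative zero.
    return sum((c / n) * log2(n / c) for c in counts.values())


def entropy_ratio(values):
    """Entropy divided by its maximum for the observed cardinality (assumed definition)."""
    cardinality = len(set(values))
    if cardinality <= 1:
        return 0.0
    return shannon_entropy(values) / log2(cardinality)


symbol_count = ["1"] * 6181  # the constant column described in the report
is_constant = len(set(symbol_count)) == 1
```

A column for which `is_constant` is true can be dropped before modelling, in line with the treatment given for symbol_count.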