saturn·

blissapi

source /home/coolhand/data/blissapi.db · 6,181 rows · 2 columns · profiled 2026-05-01

Reading

dataset summary · high confidence anthropic:claude-opus-4-7

This dataset contains 6,181 rows and 2 columns drawn from blissapi.db, pairing a text 'keyword' field with a categorical 'symbol_count' field. Every keyword is unique (6,181 distinct values across 6,181 rows) and is exactly one word (one_word_rate 1). Keyword lengths range from 2 to 72 characters, with a median of 12, a mean of 14.53, and a 95th percentile of 31. The 'symbol_count' column holds the value '1' in every row, so it carries no information for analysis. The most useful first look is the distribution of keyword lengths, since the keyword column is the only one that varies.

citing: row_count · column_count · columns[keyword].n_unique · columns[keyword].stats.one_word_rate · columns[keyword].stats.len_min · columns[keyword].stats.len_max · columns[keyword].stats.len_median · columns[keyword].stats.len_mean · columns[keyword].stats.len_p95 · columns[symbol_count].n_unique · columns[symbol_count].stats.top_value · columns[symbol_count].stats.top_rate
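
The length distribution recommended above could be computed along these lines. This is a minimal sketch, not the profiler's own code: the function names, the bin width, and the nearest-rank percentile method are assumptions, and the sample keyword 'at' is made up for illustration (the other two appear in the column detail below).

```python
import math
import statistics
from collections import Counter


def length_summary(keywords):
    """Return basic character-length statistics for a list of keywords."""
    lengths = sorted(len(k) for k in keywords)
    # Nearest-rank 95th percentile (assumed; the profiler's method is not stated).
    p95_index = max(0, math.ceil(0.95 * len(lengths)) - 1)
    return {
        "len_min": lengths[0],
        "len_max": lengths[-1],
        "len_mean": round(statistics.mean(lengths), 2),
        "len_median": statistics.median(lengths),
        "len_p95": lengths[p95_index],
    }


def length_histogram(keywords, bin_width=5):
    """Count keywords per length bin, keyed by each bin's lower edge."""
    counts = Counter((len(k) // bin_width) * bin_width for k in keywords)
    return dict(sorted(counts.items()))


# Tiny illustrative sample; 'at' is a hypothetical keyword.
sample = ["cns_injury", "tangerine_clementine_mandarin", "at"]
summary = length_summary(sample)
histogram = length_histogram(sample)
```

In practice the keyword list would be read from blissapi.db first; the table name is not given in this report, so that step is left out here.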

Schema

2 columns
Per-column summary.
column | type | nulls | unique | alerts
keyword | text | 0.0% | 6,181 | near_unique, one_word
symbol_count | categorical | 0.0% | 1 | imbalance
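
The null rate and distinct count per column can be reproduced with plain SQL. The sketch below uses an in-memory SQLite database as a stand-in for blissapi.db; the table name `keywords`, the helper `column_overview`, and the third sample row are hypothetical, since the report does not name the table.

```python
import sqlite3


def column_overview(conn, table, columns):
    """Return null rate and distinct count for each listed column.

    Table and column names are interpolated as identifiers, so they
    must come from a trusted source (here, hard-coded names).
    """
    total = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    overview = []
    for col in columns:
        nulls, distinct = conn.execute(
            f'SELECT SUM("{col}" IS NULL), COUNT(DISTINCT "{col}") FROM "{table}"'
        ).fetchone()
        nulls = nulls or 0  # SUM returns NULL on an empty table
        overview.append({
            "column": col,
            "null_rate": nulls / total if total else 0.0,
            "n_unique": distinct,
        })
    return overview


# Stand-in data: hypothetical table name, three illustrative rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE keywords (keyword TEXT, symbol_count TEXT)")
conn.executemany(
    "INSERT INTO keywords VALUES (?, ?)",
    [("cns_injury", "1"), ("tangerine_clementine_mandarin", "1"), ("at", "1")],
)
overview = column_overview(conn, "keywords", ["keyword", "symbol_count"])
```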

keyword

text identifier near_unique one_word
This column holds a single-word keyword or concept tag, and every one of the 6,181 rows has a unique value (n_unique = 6,181, duplicate_rate = 0.0, one_word_rate = 1.0). Values are short (len_mean 14.53, len_median 12), and many are compound forms joined by underscores, such as 'tangerine_clementine_mandarin' or 'cns_injury', which suggests a controlled vocabulary of concept labels rather than free text. Because no value repeats, the column behaves like an identifier for distinct concepts, not a categorical feature. Treatment: Treat as a concept key; split underscore-joined tokens and embed them if semantic similarity is needed, otherwise leave the column out of modelling. high · anthropic:claude-opus-4-7
n: 6,181
nulls: 0 (0.0%)
unique: 6,181
len_min: 2
len_max: 72
len_mean: 14.53
len_median: 12
len_p95: 31
word_mean: 1
word_median: 1
n_empty: 0
n_duplicates: 0
duplicate_rate: 0
vocab_size: 6,181
readability_flesch_mean: -75.9
emoji_rate: 0
url_rate: 0
one_word_rate: 1
allcaps_rate: 0
boilerplate_rate: 0
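
The suggested treatment for keyword, splitting underscore-joined tokens before any embedding step, might look like the sketch below. The helper names `split_keyword` and `to_phrase` are illustrative, and the choice of embedding model is left open because the report does not specify one.

```python
def split_keyword(keyword):
    """Split an underscore-joined keyword into parts, dropping empty pieces."""
    return [part for part in keyword.split("_") if part]


def to_phrase(keyword):
    """Join the parts with spaces, a form most text embedders accept."""
    return " ".join(split_keyword(keyword))


# Both example keywords are quoted from the column description above.
parts = split_keyword("tangerine_clementine_mandarin")
phrase = to_phrase("cns_injury")
```

Keeping the original keyword alongside the split form preserves its role as a concept key.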

symbol_count

categorical metadata imbalance
This column records a symbol count, but every one of the 6,181 rows holds the value "1" (top_rate 1.0, cardinality 1, entropy 0). It has no nulls and no variation, so it carries no information; the imbalance alert reflects that single constant value. Treatment: Drop; constant column with zero entropy. high · anthropic:claude-opus-4-7
n: 6,181
nulls: 0 (0.0%)
unique: 1
top_value: 1
top_rate: 1
cardinality: 1
entropy: 0
entropy_ratio: 0
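
The constant-column check behind the drop recommendation can be illustrated as follows. This is a sketch under assumptions: entropy is taken to be Shannon entropy in bits, and entropy_ratio is taken to be entropy divided by log2(cardinality), defined as 0 for a single-valued column. The profiler's exact definitions are not stated in this report.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy of the value distribution, in bits."""
    counts = Counter(values)
    total = len(values)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def column_profile(values):
    """Summarise a categorical column and flag it if it is constant."""
    counts = Counter(values)
    top_value, top_count = counts.most_common(1)[0]
    cardinality = len(counts)
    entropy = shannon_entropy(values)
    # Assumed normalisation: divide by the maximum entropy for this cardinality.
    max_entropy = math.log2(cardinality) if cardinality > 1 else 0.0
    return {
        "top_value": top_value,
        "top_rate": top_count / len(values),
        "cardinality": cardinality,
        "entropy": entropy,
        "entropy_ratio": entropy / max_entropy if max_entropy else 0.0,
        "is_constant": cardinality == 1,
    }


# A column shaped like symbol_count: 6,181 rows, all "1".
constant = column_profile(["1"] * 6181)
# A balanced two-value column for contrast (illustrative data).
balanced = column_profile(["1", "1", "2", "2"])
```

A column with `is_constant` set can be dropped before modelling without losing information.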