
quirky cheese list

source: /home/coolhand/html/datavis/data_trove/data/quirky/cheese_list.json · 7,146 rows · 4 columns · profiled 2026-05-01

Reading

dataset summary · high confidence anthropic:claude-opus-4-7

This dataset is a catalogue of 7,146 cheese product entries, each with a name, a category, a country of origin, and a constant value field. Cheeses span 32 categories and 111 countries; France alone accounts for 25.9% of rows, with Germany and the United States rounding out the top three. Category is led by Cream Cheese (1,187 rows, 16.6%), followed by Mozzarella and Soft Cheese. The name column is multilingual (predominantly English and French, with notable German, Italian, and Spanish presence) and has an 11.3% duplicate rate, which should be investigated before any analysis that assumes one row per distinct cheese. The value column is constant at 1.0 across all rows and carries no analytical signal.

citing: row_count · column_count · columns.name.language_counts · columns.name.stats.duplicate_rate · columns.category.n_unique · columns.category.top_values · columns.category.stats.top_rate · columns.country.n_unique · columns.country.top_values · columns.country.stats.top_rate · columns.value.n_unique · columns.value.stats.mean

Schema

4 columns
Per-column summary.
column    type         nulls  unique  alerts
name      text         0.0%   6,337   multilingual
country   categorical  0.0%   111
category  categorical  0.0%   32
value     numeric      0.0%   1       constant

name

text label multilingual
Short product names for cheeses, averaging 3.4 words / 23 characters, with top tokens 'cheese' (1,465), 'fromage' (406), 'queso' (239) and varieties like Mozzarella, Cottage cheese, Gouda. Language detection spans 30 codes, predominantly en (1,847), fr (1,029), de (552), it (389), es (251), confirming the 'multilingual' alert. 809 rows (11.3%) are exact duplicates of another name. Casing variants such as 'Cottage cheese' (29) and 'Cottage Cheese' (27) are counted as separate values, so duplication after case normalization would be higher. With 6,337 unique values out of 7,146, the column is high-cardinality but not an identifier. Treatment: Normalize case and language before grouping; consider canonicalizing to a cheese-variety taxonomy. high · anthropic:claude-opus-4-7
n
7,146
nulls
0 (0.0%)
unique
6,337
len_min
4
len_max
191
len_mean
22.96
len_median
21
len_p95
44
word_mean
3.443
word_median
3
n_empty
0
n_duplicates
809
duplicate_rate
0.1132
vocab_size
4,732
readability_flesch_mean
53.61
emoji_rate
0.0005598
url_rate
0
one_word_rate
0.1041
allcaps_rate
0.01707
boilerplate_rate
0
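
The case-normalization treatment suggested for this column might be sketched as follows. This is a minimal illustration using only the Python standard library: `normalize_name` and `duplicate_rate` are hypothetical helpers, the sample names are made up rather than read from the file, and the duplicate-rate formula (rows minus unique values, divided by rows) is an assumption that happens to match the reported 809 / 7,146 = 0.1132.

```python
import unicodedata
from collections import Counter


def normalize_name(name: str) -> str:
    """Hypothetical helper: Unicode-normalize, collapse whitespace, case-fold."""
    text = unicodedata.normalize("NFKC", name)
    return " ".join(text.split()).casefold()


def duplicate_rate(values: list[str]) -> float:
    """Share of rows that repeat an earlier value: (n - n_unique) / n."""
    if not values:
        return 0.0
    return (len(values) - len(set(values))) / len(values)


# Illustrative sample only; not taken from cheese_list.json.
names = ["Cottage cheese", "Cottage Cheese", "cottage  cheese", "Gouda", "Fromage blanc"]

raw_rate = duplicate_rate(names)              # exact-match duplicates only
normalized = [normalize_name(n) for n in names]
normalized_rate = duplicate_rate(normalized)  # duplicates after case folding
variant_counts = Counter(normalized)          # rows per normalized name
```

Comparing `raw_rate` with `normalized_rate` on the real column would show how much duplication is hidden by casing and spacing differences alone. Language normalization is a separate step and is not covered here.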

country

categorical feature
This is a country-of-origin or location categorical with 111 distinct values across 7146 rows and no nulls. The distribution is Europe-heavy and concentrated: France alone accounts for 25.9% of records, with Germany (907) and the United States (759) trailing, giving an entropy ratio of 0.63. The long tail of 100+ smaller countries means rare-category handling will matter. Treatment: Group rare countries into an 'Other' bucket before one-hot or target encoding. high · anthropic:claude-opus-4-7
n
7,146
nulls
0 (0.0%)
unique
111
top_value
France
top_rate
0.2593
cardinality
111
entropy
4.268
entropy_ratio
0.6281
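
Grouping rare countries into an 'Other' bucket, as the treatment suggests, could look like the sketch below. The `bucket_rare` helper and its `min_share` threshold are assumptions for illustration; the report does not specify a cutoff, and the sample values are not the real distribution.

```python
from collections import Counter


def bucket_rare(values, min_share=0.01, other_label="Other"):
    """Replace values whose share of rows is below min_share with other_label."""
    counts = Counter(values)
    total = len(values)
    keep = {v for v, c in counts.items() if c / total >= min_share}
    return [v if v in keep else other_label for v in values]


# Illustrative sample only; the threshold is chosen to make the effect visible.
countries = ["France"] * 6 + ["Germany"] * 3 + ["Iceland"]
bucketed = bucket_rare(countries, min_share=0.2)
bucket_counts = Counter(bucketed)
```

The threshold should be chosen before encoding and applied to the training split only, so that the set of kept countries does not depend on held-out data.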

category

categorical feature
This is a categorical product-type field for cheese items, with 32 distinct categories across 7146 rows and no nulls. Cream Cheese leads at 16.6% (1187 rows), followed by Mozzarella and Soft Cheese, and entropy ratio 0.82 indicates a fairly even spread rather than dominance by one value. No rare-label or drift signals are present in the evidence. Treatment: One-hot or target-encode for modelling; cardinality of 32 is manageable. high · anthropic:claude-opus-4-7
n
7,146
nulls
0 (0.0%)
unique
32
top_value
Cream Cheese
top_rate
0.1661
cardinality
32
entropy
4.098
entropy_ratio
0.8195
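
The reported entropy figures are consistent with Shannon entropy in bits divided by log2 of the cardinality: 4.098 / log2(32) ≈ 0.82 here, and 4.268 / log2(111) ≈ 0.63 for country. That definition is inferred from the numbers, not documented by the profiler, so the sketch below should be read as one plausible reconstruction.

```python
import math
from collections import Counter


def entropy_bits(values):
    """Shannon entropy of the value distribution, in bits."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_ratio(values):
    """Entropy divided by its maximum, log2(number of distinct values)."""
    k = len(set(values))
    if k <= 1:
        return 0.0  # a constant or empty column has no spread to measure
    return entropy_bits(values) / math.log2(k)


# Illustrative samples: a perfectly even column and a skewed one.
uniform = ["a", "b", "c", "d"]
skewed = ["a"] * 9 + ["b"]
uniform_ratio = entropy_ratio(uniform)
skewed_ratio = entropy_ratio(skewed)
```

A ratio of 1 means all values are equally frequent; values closer to 0 mean a few labels dominate.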

value

numeric other constant
The column 'value' is numeric but completely constant: all 7146 rows hold the value 1.0, with zero variance and a single unique value. It carries no information for analysis or modelling. Treatment: Drop; constant column with no signal. high · anthropic:claude-opus-4-7
n
7,146
nulls
0 (0.0%)
unique
1
min
1
max
1
mean
1
median
1
std
0
q1
1
q3
1
iqr
0
skew
0
kurtosis
0
n_outliers
0
outlier_rate
0
zero_rate
0
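
Dropping constant columns such as 'value' can be automated. The sketch below assumes the records are loaded as a list of dicts keyed by the four column names; the actual layout of the JSON file is not shown in this report, and the sample rows (including the category labels) are illustrative.

```python
def constant_columns(rows):
    """Return names of columns holding a single distinct value across all rows."""
    if not rows:
        return []
    return [key for key in rows[0] if len({row.get(key) for row in rows}) == 1]


def drop_columns(rows, to_drop):
    """Return copies of the rows without the named columns."""
    drop = set(to_drop)
    return [{k: v for k, v in row.items() if k not in drop} for row in rows]


# Illustrative rows only; not taken from cheese_list.json.
rows = [
    {"name": "Gouda", "country": "Netherlands", "category": "Hard Cheese", "value": 1.0},
    {"name": "Brie", "country": "France", "category": "Soft Cheese", "value": 1.0},
]

constant = constant_columns(rows)
cleaned = drop_columns(rows, constant)
```

Detecting constants programmatically, instead of hard-coding 'value', keeps the step valid if a later refresh of the file gives the column real variation.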