quirky cheese list

saturn notebook · generated 2026-05-01

Overview

Source: /home/coolhand/html/datavis/data_trove/data/quirky/cheese_list.json

Saturn profiled 7,146 rows across 4 columns. The stats below are deterministic and machine-readable; the prose is a language-model interpretation of those stats (opt-in, added after the fact, never sees raw rows).

[2]:
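# Install the profiler from inside the notebook, then run its CLI on the source file.
!pip install saturn-dissect
import subprocess
subprocess.run([
    "saturn", "analyze", "/home/coolhand/html/datavis/data_trove/data/quirky/cheese_list.json",
    "--findings", "quirky-cheese_list.json",
    "--llm", "anthropic:claude-opus-4-7",
], check=True)  # check=True raises CalledProcessError if the CLI exits non-zero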

Summary confidence: high

This dataset is a catalogue of 7,146 cheese product entries with a name, a category, a country of origin, and a constant value field. Cheeses span 32 categories and 111 countries, with France alone accounting for 25.9% of rows and Germany and the United States rounding out the top three. Category is led by Cream Cheese (1,187 rows, 16.6%), followed by Mozzarella and Soft Cheese, suggesting some categories are far more populated than others. The name column is multilingual (predominantly English and French, with notable German, Spanish, and Italian presence) and has an 11.3% duplicate rate worth investigating before any de-duplicated analysis. Note that the value column is constant at 1.0 across all rows and carries no analytical signal.

citing: row_count · column_count · columns.name.language_counts · columns.name.stats.duplicate_rate · columns.category.n_unique · columns.category.top_values · columns.category.stats.top_rate · columns.country.n_unique · columns.country.top_values · columns.country.stats.top_rate · columns.value.n_unique · columns.value.stats.mean
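
The keys on the "citing:" line read like dotted paths into the findings file. The sketch below shows one way to resolve such a path, assuming the findings JSON is a nested object that mirrors those paths; that layout is an assumption, not something the report confirms. The `findings` dict is a hypothetical excerpt filled with values quoted in this report, and `resolve` is an illustrative helper, not part of saturn-dissect.

```python
from functools import reduce

# Hypothetical excerpt of a findings file. The nesting is assumed from the
# dotted keys on the "citing:" line; the values are copied from this report.
findings = {
    "row_count": 7146,
    "column_count": 4,
    "columns": {
        "name": {"stats": {"duplicate_rate": 0.1132}},
        "country": {"n_unique": 111, "stats": {"top_rate": 0.2593}},
        "category": {"n_unique": 32, "stats": {"top_rate": 0.1661}},
        "value": {"n_unique": 1, "stats": {"mean": 1.0}},
    },
}


def resolve(data, dotted_key):
    """Walk a nested dict along a dotted key such as 'columns.name.stats.duplicate_rate'."""
    return reduce(lambda node, part: node[part], dotted_key.split("."), data)


# Look up a few of the cited keys.
for key in ["row_count", "columns.country.stats.top_rate", "columns.value.n_unique"]:
    print(f"{key} = {resolve(findings, key)}")
```

A missing key raises `KeyError`, which is probably the right behaviour when checking that every cited statistic actually exists.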

Out[4]:

saturn.schema() · 4 columns

column    kind         n      null%  unique  alerts
name      text         7,146  0.0%   6,337   multilingual
country   categorical  7,146  0.0%   111
category  categorical  7,146  0.0%   32
value     numeric      7,146  0.0%   1       constant
Fig 1.
country · France dominates origin counts at ~26%, with a long tail across 111 countries.
Top values for country (20 unique shown, of 111 total).
value           count  share
France          1853   25.9%
Germany         907    12.7%
United States   759    10.6%
Belgium         334    4.7%
United Kingdom  333    4.7%
Spain           319    4.5%
Italy           307    4.3%
Switzerland     209    2.9%
Poland          145    2.0%
Netherlands     134    1.9%
Austria         127    1.8%
Canada          125    1.7%
Sweden          123    1.7%
Portugal        115    1.6%
Ireland         114    1.6%
Czech Republic  103    1.4%
Australia       100    1.4%
Finland         88     1.2%
Norway          75     1.0%
Bulgaria        60     0.8%
Fig 2.
category · Cream Cheese, Mozzarella, and Soft Cheese lead the 32 categories — check whether top categories are over-represented.
Top values for category (20 unique shown, of 32 total).
value           count  share
Cream Cheese    1187   16.6%
Mozzarella      702    9.8%
Soft Cheese     637    8.9%
Grated Cheese   571    8.0%
Cottage Cheese  544    7.6%
Goat Cheese     526    7.4%
Cheese Spread   473    6.6%
Gouda           456    6.4%
Hard Cheese     340    4.8%
Feta            246    3.4%
Fresh Cheese    196    2.7%
Fromage Blanc   150    2.1%
Raclette        144    2.0%
Comté           99     1.4%
Edam            97     1.4%
Havarti         95     1.3%
Burrata         88     1.2%
Halloumi        87     1.2%
Ricotta         85     1.2%
Dairy Products  77     1.1%
Fig 3.
name · Most cheese names are short (median 21 characters, ~3 words); look for outliers up to 191 characters.
Character-length distribution for name (mean: 22.96).
chars      count
4 – 9      438
9 – 13     994
13 – 18    1446
18 – 23    1162
23 – 27    1119
27 – 32    774
32 – 37    389
37 – 41    341
41 – 46    184
46 – 51    107
51 – 55    69
55 – 60    44
60 – 65    28
65 – 69    17
69 – 74    7
74 – 79    2
79 – 83    2
83 – 88    6
88 – 93    0
93 – 98    1
98 – 102   3
102 – 107  1
107 – 112  6
112 – 116  0
116 – 121  0
121 – 126  3
126 – 130  1
130 – 135  0
135 – 140  0
140 – 144  0
144 – 149  0
149 – 154  0
154 – 158  1
158 – 163  0
163 – 168  0
168 – 172  0
172 – 177  0
177 – 182  0
182 – 186  0
186 – 191  1
Fig 4.
name · Language mix of names is led by English and French — confirm this matches your expected scope.
Fig 5.
Per-column null rate across the corpus. Columns are ordered by input position.
Per-column null rate across the corpus.
column    kind         null %
name      text         0.0%
country   categorical  0.0%
category  categorical  0.0%
value     numeric      0.0%
Fig 6.
Language mix across all text columns (per-string detection, sampled).
Per-language counts (total 4,745 detected strings).
lang  count  share
en    1847   38.9%
fr    1029   21.7%
de    552    11.6%
it    389    8.2%
es    251    5.3%
nl    146    3.1%
pt    96     2.0%
pl    77     1.6%
ca    50     1.1%
fi    39     0.8%
sv    30     0.6%
uk    29     0.6%
cs    27     0.6%
ru    24     0.5%
sk    19     0.4%
tr    17     0.4%
et    13     0.3%
no    13     0.3%
ro    12     0.3%
ja    12     0.3%
el    10     0.2%
da    9      0.2%
hu    7      0.1%
he    7      0.1%
lt    7      0.1%
ceb   7      0.1%
eo    7      0.1%
lv    7      0.1%
sh    6      0.1%
id    6      0.1%

name text label

Short product names for cheeses, averaging 3.4 words / 23 characters, with top tokens 'cheese' (1,465), 'fromage' (406), 'queso' (239) and varieties like Mozzarella, Cottage cheese, Gouda. The per-language counts list 30 codes, predominantly en (1,847), fr (1,029), de (552), it (389), es (251), which is consistent with the 'multilingual' alert (the alert itself reports 31 languages detected in the sample). The 809 duplicates (11.3%) include casing variants ('Cottage cheese' 29 vs 'Cottage Cheese' 27). With 6,337 unique values out of 7,146, the column is high-cardinality but not a unique identifier.

Treatment: Normalize case and language before grouping; consider canonicalizing to a cheese-variety taxonomy.
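
The case-normalization step in the treatment above can be sketched as follows. This is a minimal illustration, not Saturn's own method: `normalize` is a hypothetical helper, and the sample list is invented except for the two 'Cottage' spellings and their counts, which come from the report. The last lines check that the reported duplicate count matches `n - unique`, which is an assumed definition that happens to agree with the published numbers.

```python
from collections import Counter


def normalize(name: str) -> str:
    """Collapse runs of whitespace and fold case so casing variants compare equal."""
    return " ".join(name.split()).casefold()


# Hypothetical sample: only the two 'Cottage' spellings and their counts
# (29 and 27) are taken from the report; the 'Gouda' rows are invented.
names = ["Cottage cheese"] * 29 + ["Cottage Cheese"] * 27 + ["Gouda", "gouda "]

raw_counts = Counter(names)
folded_counts = Counter(normalize(n) for n in names)

print(len(raw_counts), "distinct raw names")
print(len(folded_counts), "distinct names after folding")
print(folded_counts["cottage cheese"], "rows merged under 'cottage cheese'")

# Assumed definition: duplicates = rows minus distinct values.
n, unique = 7146, 6337
n_duplicates = n - unique
print(n_duplicates, round(n_duplicates / n, 4))
```

Case folding only merges spelling variants within one language; mapping 'fromage', 'queso', and 'cheese' onto a shared taxonomy would need a separate lookup.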

anthropic:claude-opus-4-7 · confidence high
Out[12]:

saturn.columns["name"].stats

stat                     value
n                        7,146
nulls                    0 (0.0%)
unique                   6,337
len_min                  4
len_max                  191
len_mean                 22.96
len_median               21
len_p95                  44
word_mean                3.443
word_median              3
n_empty                  0
n_duplicates             809
duplicate_rate           0.1132
vocab_size               4,732
readability_flesch_mean  53.61
emoji_rate               0.0005598
url_rate                 0
one_word_rate            0.1041
allcaps_rate             0.01707
boilerplate_rate         0
alert: multilingual (31 languages detected in sample)
Fig 7.
Character-length distribution for name.

country categorical feature

This is a country-of-origin or location categorical with 111 distinct values across 7,146 rows and no nulls. The distribution is Europe-heavy and concentrated: France alone accounts for 25.9% of records, followed by Germany (907) and the United States (759), giving an entropy ratio of 0.63. The long tail is substantial: the 91 countries outside the top 20 each hold at most 0.8% of rows, so rare-category handling will matter.

Treatment: Group rare countries into an 'Other' bucket before one-hot or target encoding.
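
One way to apply the 'Other' bucket described above is a share threshold. The sketch below is illustrative only: `bucket_rare`, its `min_share` parameter, and the toy column are hypothetical, and the threshold should be chosen for the downstream model rather than copied from here.

```python
from collections import Counter


def bucket_rare(values, min_share=0.01, other="Other"):
    """Replace values whose share of rows is below min_share with a single label."""
    counts = Counter(values)
    total = len(values)
    keep = {value for value, count in counts.items() if count / total >= min_share}
    return [value if value in keep else other for value in values]


# Hypothetical toy column of 100 rows (not the real country counts).
countries = (
    ["France"] * 60 + ["Germany"] * 30 + ["Norway"] * 5
    + ["Bulgaria"] * 3 + ["Malta"] * 2
)

bucketed = bucket_rare(countries, min_share=0.04)
print(Counter(bucketed).most_common())
```

When the column feeds a model, the set of kept countries should be derived from training data only and then reused on new data, so that the encoding stays stable.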

anthropic:claude-opus-4-7 · confidence high
Out[15]:

saturn.columns["country"].stats

stat           value
n              7,146
nulls          0 (0.0%)
unique         111
top_value      France
top_rate       0.2593
cardinality    111
entropy        4.268
entropy_ratio  0.6281
Fig 8.
Top values for country.

category categorical feature

This is a categorical product-type field for cheese items, with 32 distinct categories across 7146 rows and no nulls. Cream Cheese leads at 16.6% (1187 rows), followed by Mozzarella and Soft Cheese, and entropy ratio 0.82 indicates a fairly even spread rather than dominance by one value. No rare-label or drift signals are present in the evidence.

Treatment: One-hot or target-encode for modelling; cardinality of 32 is manageable.
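
The entropy ratios quoted for country (0.63) and category (0.82) are consistent with Shannon entropy in bits divided by its maximum, log2 of the cardinality. That definition is inferred from the reported numbers and is not confirmed as Saturn's implementation. The sketch below, with hypothetical helper names, computes entropy from raw values and reproduces both ratios from the reported entropy and cardinality.

```python
import math
from collections import Counter


def entropy_bits(values):
    """Shannon entropy (base 2) of the empirical distribution of values."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_ratio(entropy, cardinality):
    """Entropy divided by its maximum, log2(cardinality); 0.0 for a single value."""
    return entropy / math.log2(cardinality) if cardinality > 1 else 0.0


# Reported (entropy, cardinality) pairs from the stats tables in this report.
reported = {"country": (4.268, 111), "category": (4.098, 32)}
for column, (entropy, cardinality) in reported.items():
    print(column, round(entropy_ratio(entropy, cardinality), 3))

# Sanity check on a uniform toy column: four equally likely values give 2 bits.
print(entropy_bits(["a", "b", "c", "d"]))
```

A ratio near 1 means rows are spread almost evenly across the available values; a ratio near 0 means a few values dominate, which is why country reads as more concentrated than category.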

anthropic:claude-opus-4-7 · confidence high
Out[18]:

saturn.columns["category"].stats

stat           value
n              7,146
nulls          0 (0.0%)
unique         32
top_value      Cream Cheese
top_rate       0.1661
cardinality    32
entropy        4.098
entropy_ratio  0.8195
Fig 9.
Top values for category.

value numeric other

The column 'value' is numeric but completely constant: all 7146 rows hold the value 1.0, with zero variance and a single unique value. It carries no information for analysis or modelling.

Treatment: Drop; constant column with no signal.
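
Dropping a constant column can be sketched as below. The example assumes the records load as a list of dicts sharing the same four keys; the actual layout of cheese_list.json is not shown in this report, and the three rows are hypothetical. `constant_columns` is an illustrative helper.

```python
def constant_columns(rows):
    """Return the names of columns that hold one distinct value across all rows."""
    if not rows:
        return []
    return [col for col in rows[0] if len({row[col] for row in rows}) == 1]


# Hypothetical rows using the four column names from the schema.
rows = [
    {"name": "Gouda jong", "country": "Netherlands", "category": "Gouda", "value": 1.0},
    {"name": "Comté 18 mois", "country": "France", "category": "Comté", "value": 1.0},
    {"name": "Burrata", "country": "Italy", "category": "Burrata", "value": 1.0},
]

to_drop = constant_columns(rows)
cleaned = [{k: v for k, v in row.items() if k not in to_drop} for row in rows]

print(to_drop)
print(cleaned[0])
```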

anthropic:claude-opus-4-7 · confidence high
Out[21]:

saturn.columns["value"].stats

stat          value
n             7,146
nulls         0 (0.0%)
unique        1
min           1
max           1
mean          1
median        1
std           0
q1            1
q3            1
iqr           0
skew          0
kurtosis      0
n_outliers    0
outlier_rate  0
zero_rate     0
alert: constant (only one distinct value)
Fig 10.
Distribution of value. Vertical dash marks the median.
Histogram bins for value (median: 1.0): 40 bins of width 0.025 spanning 0.5 to 1.5. The bin 1 – 1.025 holds all 7,146 rows; the other 39 bins are empty.

How to cite

BibTeX
@misc{saturn-quirky-cheese-list-2026,
  author       = {Steuber, Luke},
  title        = {Saturn reading: quirky cheese list},
  year         = {2026},
  howpublished = {\url{https://dr.eamer.dev/saturn/view/quirky-cheese_list}},
  note         = {Profiled with saturn-dissect v0.2.0, prompt saturn-insight-v2, model anthropic:claude-opus-4-7},
}
APA
Steuber, L. (2026). Saturn reading: quirky cheese list. Source: /home/coolhand/html/datavis/data_trove/data/quirky/cheese_list.json. Profiled with saturn-dissect v0.2.0 (saturn-insight-v2, anthropic:claude-opus-4-7). Retrieved from https://dr.eamer.dev/saturn/view/quirky-cheese_list