saturn

/home/coolhand/html/datavis/data_trove/data/quirky/cheese_list.json 7,146 rows sample n=7,146 seed 42 2026-06-21T23:45:17+00:00

Overview

Source: /home/coolhand/html/datavis/data_trove/data/quirky/cheese_list.json
Total rows: 7,146
Profiled sample: 7,146
Columns: 4
Generated: 2026-06-21T23:45:17+00:00
Per-column null rate across the corpus.
column    kind         null %
name      text         0.0%
country   categorical  0.0%
category  categorical  0.0%
value     numeric      0.0%

Insights opt-in

Model-generated narrative. These are opinions, not facts — the stats below are what saturn measured. Generated by: anthropic:default.

Dataset high anthropic:default

This dataset is a multilingual catalogue of 7,146 cheese products spanning 32 categories and 111 countries of origin. The most striking pattern is geographic concentration: France alone accounts for 26% of all entries (1,853), followed by Germany (907) and the United States (759), so the dataset skews heavily toward Western European dairy traditions. On the category side, Cream Cheese leads with 1,187 entries (17%), and the top 5 categories together cover about 51% of the rows (3,641), which is worth examining for over-representation. The 'value' column is constant at 1.0 and can be safely ignored. The product names are highly multilingual (30 languages detected), and 11% of them (809 of 7,146) exactly repeat another entry's name, so some products are listed more than once under an identical name.
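The top-5 coverage figure quoted above can be checked by hand from the category counts reported later in this profile. The snippet below is a minimal sketch of that arithmetic; `top_k_share` is a hypothetical helper name, not part of saturn.

```python
def top_k_share(counts, total, k):
    """Fraction of `total` rows covered by the k largest counts."""
    top = sorted(counts, reverse=True)[:k]
    return sum(top) / total


# Counts of the five largest categories, copied from the category table
# in this report (Cream Cheese, Mozzarella, Soft Cheese, Grated Cheese,
# Cottage Cheese).
category_counts = [1187, 702, 637, 571, 544]

share = top_k_share(category_counts, total=7146, k=5)
print(f"top-5 category share: {share:.1%}")  # 3,641 of 7,146 rows
```

The same helper applied to the top 10 counts of either categorical column gives the cumulative coverage of the head of the distribution.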

name high anthropic:default

This column contains the names of cheese products (or cheese-related food items), as evidenced by top values like 'Mozzarella', 'Cottage Cheese', and 'Gouda', and dominant words including 'cheese', 'mozzarella', 'fromage', and 'queso'. Notably, 11.3% of values are duplicates (809 of 7,146 rows, which is the row count minus the 6,337 unique values). Case-inconsistent entries such as 'Cottage cheese' (29) and 'Cottage Cheese' (27) are counted as separate values, so a case-insensitive comparison would report a higher duplicate rate. A multilingual alert is triggered across 30 detected languages, with English (1,847), French (1,029), German (552), Italian (389), and Spanish (251) dominant; this reflects international product naming rather than true language mixing within a single entry. The mean name length is about 23 characters with a median of 3 words, consistent with structured product label strings rather than free text.
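The reported duplicate count is consistent with the definition "rows minus unique values" (7,146 − 6,337 = 809). The sketch below assumes that definition and shows how case folding changes the result; `duplicate_stats` and the toy list are illustrative and are not saturn's implementation or data from the file.

```python
def duplicate_stats(values, fold_case=False):
    """Return (n_duplicates, duplicate_rate) for a list of strings.

    A duplicate is any row beyond the first occurrence of a value,
    so n_duplicates = n_rows - n_unique.
    """
    keys = [v.casefold() if fold_case else v for v in values]
    n_rows = len(keys)
    n_duplicates = n_rows - len(set(keys))
    rate = n_duplicates / n_rows if n_rows else 0.0
    return n_duplicates, rate


# Toy data (hypothetical): two spellings that differ only by case.
names = ["Cottage Cheese", "Cottage cheese", "Gouda", "Gouda", "Feta"]

print(duplicate_stats(names))                  # exact match: only "Gouda" repeats
print(duplicate_stats(names, fold_case=True))  # case-insensitive: one more duplicate
```

Running the case-folded variant on the real column would show how much of the 6,337-value vocabulary is inflated by capitalization alone.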

value high anthropic:default

This column is a numeric constant: every one of its 7,146 non-null rows holds exactly the value 1.0, with zero variance, zero skew, and a single unique value. It carries no information and will contribute nothing to any model or analysis. This is likely a placeholder, a join artifact, or a weight/flag column that was never populated with real data.
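A constant column like this one can be detected and dropped before analysis. The following is a minimal sketch that assumes the JSON file loads as a list of flat records with hashable values; that layout, the helper name `constant_columns`, and the toy rows are assumptions for illustration, not facts about the file.

```python
def constant_columns(rows):
    """Return the names of columns that hold exactly one distinct value."""
    seen = {}
    for row in rows:
        for col, value in row.items():
            seen.setdefault(col, set()).add(value)
    return [col for col, values in seen.items() if len(values) == 1]


# Illustrative rows only; the pairings below are not taken from the file.
rows = [
    {"name": "Example A", "country": "France", "category": "Comté", "value": 1.0},
    {"name": "Example B", "country": "Germany", "category": "Gouda", "value": 1.0},
]

to_drop = constant_columns(rows)
cleaned = [{k: v for k, v in row.items() if k not in to_drop} for row in rows]
print(to_drop)
```

Dropping the column is lossless here because a single repeated value carries no per-row information.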

category high anthropic:default

This column is a product category label for what appears to be a cheese-focused retail or food dataset, with 32 distinct categories across 7,146 records and no nulls. The distribution is moderately uneven: 'Cream Cheese' leads at 16.6% (1,187 rows), and the top 10 categories account for about 79.5% of records (5,682 rows). The entropy ratio of 0.82 suggests reasonable spread across categories, with the remaining 22 categories forming a long tail that shares roughly 20% of the rows. No anomalies or alerts are present.
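The entropy figures in this profile are consistent with Shannon entropy measured in bits and an entropy ratio normalized by log2 of the cardinality (4.098 / log2(32) ≈ 0.82 for category, 4.268 / log2(111) ≈ 0.628 for country). The sketch below assumes that definition; the function names are hypothetical and saturn's actual implementation is not shown in this report.

```python
import math


def entropy_bits(counts):
    """Shannon entropy in bits of a distribution given by raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def entropy_ratio(counts):
    """Entropy divided by its maximum, log2(k), for k non-empty classes."""
    k = sum(1 for c in counts if c > 0)
    return entropy_bits(counts) / math.log2(k) if k > 1 else 0.0


# Sanity checks against the values reported in this profile.
print(round(4.098 / math.log2(32), 2))    # category entropy ratio
print(round(4.268 / math.log2(111), 3))   # country entropy ratio

# A uniform distribution reaches the maximum ratio; a skewed one does not.
print(entropy_ratio([10, 10, 10, 10]))
print(entropy_ratio([97, 1, 1, 1]))
```

A ratio near 1 means rows are spread almost evenly across values, while a ratio well below 1 (as for country) signals that a few values dominate.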

country high anthropic:default

This column records the country associated with each product, spanning 111 distinct countries across 7,146 rows with no nulls. France dominates, accounting for 25.9% of all records (1,853 rows), followed by Germany (907) and the United States (759), which suggests a strongly Europe-centric dataset, likely French-sourced. The entropy ratio of 0.628 indicates moderate distributional spread: the top 10 countries hold about 74% of the rows (5,300), so the remaining 101 countries are sparsely represented.

Languages detected

Per-string language detection across text columns (sampled).

Per-language counts (total 4,745 detected strings).
lang  count  share
en    1847   38.9%
fr    1029   21.7%
de    552    11.6%
it    389    8.2%
es    251    5.3%
nl    146    3.1%
pt    96     2.0%
pl    77     1.6%
ca    50     1.1%
fi    39     0.8%
sv    30     0.6%
uk    29     0.6%
cs    27     0.6%
ru    24     0.5%
sk    19     0.4%
tr    17     0.4%
et    13     0.3%
no    13     0.3%
ro    12     0.3%
ja    12     0.3%
el    10     0.2%
da    9      0.2%
hu    7      0.1%
he    7      0.1%
lt    7      0.1%
ceb   7      0.1%
eo    7      0.1%
lv    7      0.1%
sh    6      0.1%
id    6      0.1%

name text

31 languages detected in sample
rows: 7,146
null: 0 (0.0%)
unique: 6,337
len_min: 4
len_max: 191
len_mean: 22.962
len_median: 21.000
len_p95: 44.000
word_mean: 3.443
word_median: 3.000
n_empty: 0
n_duplicates: 809
duplicate_rate: 0.113
vocab_size: 4,732
readability_flesch_mean: 53.613
emoji_rate: 5.60e-04
url_rate: 0.000
one_word_rate: 0.104
allcaps_rate: 0.017
boilerplate_rate: 0.000
Character-length distribution for name (mean: 22.962).
chars      count
4 – 9      438
9 – 13     994
13 – 18    1446
18 – 23    1162
23 – 27    1119
27 – 32    774
32 – 37    389
37 – 41    341
41 – 46    184
46 – 51    107
51 – 55    69
55 – 60    44
60 – 65    28
65 – 69    17
69 – 74    7
74 – 79    2
79 – 83    2
83 – 88    6
88 – 93    0
93 – 98    1
98 – 102   3
102 – 107  1
107 – 112  6
112 – 116  0
116 – 121  0
121 – 126  3
126 – 130  1
130 – 135  0
135 – 140  0
140 – 144  0
144 – 149  0
149 – 154  0
154 – 158  1
158 – 163  0
163 – 168  0
168 – 172  0
172 – 177  0
177 – 182  0
182 – 186  0
186 – 191  1
Sample values (first 10)
  1. Cœur Complice
  2. Queso Feta de Grecia
  3. Kremost Naturell
  4. Comté 18 mois
  5. Oneg kosher gourmet, mozzarella cheese
  6. Sir Gauda
  7. Cottage Cheese small curd 4% milkfat minimum
  8. Cheddar cheese spread squeeze
  9. Goat's cheese
  10. Cacouyard

country categorical

rows: 7,146
null: 0 (0.0%)
unique: 111
top_value: France
top_rate: 0.259
cardinality: 111
entropy: 4.268
entropy_ratio: 0.628
Top values for country (20 unique shown, of 111 total).
value           count  share
France          1853   25.9%
Germany         907    12.7%
United States   759    10.6%
Belgium         334    4.7%
United Kingdom  333    4.7%
Spain           319    4.5%
Italy           307    4.3%
Switzerland     209    2.9%
Poland          145    2.0%
Netherlands     134    1.9%
Austria         127    1.8%
Canada          125    1.7%
Sweden          123    1.7%
Portugal        115    1.6%
Ireland         114    1.6%
Czech Republic  103    1.4%
Australia       100    1.4%
Finland         88     1.2%
Norway          75     1.0%
Bulgaria        60     0.8%
Top values (rank 1–20)
  1. France — 1,853
  2. Germany — 907
  3. United States — 759
  4. Belgium — 334
  5. United Kingdom — 333
  6. Spain — 319
  7. Italy — 307
  8. Switzerland — 209
  9. Poland — 145
  10. Netherlands — 134
  11. Austria — 127
  12. Canada — 125
  13. Sweden — 123
  14. Portugal — 115
  15. Ireland — 114
  16. Czech Republic — 103
  17. Australia — 100
  18. Finland — 88
  19. Norway — 75
  20. Bulgaria — 60

category categorical

rows: 7,146
null: 0 (0.0%)
unique: 32
top_value: Cream Cheese
top_rate: 0.166
cardinality: 32
entropy: 4.098
entropy_ratio: 0.820
Top values for category (20 unique shown, of 32 total).
value           count  share
Cream Cheese    1187   16.6%
Mozzarella      702    9.8%
Soft Cheese     637    8.9%
Grated Cheese   571    8.0%
Cottage Cheese  544    7.6%
Goat Cheese     526    7.4%
Cheese Spread   473    6.6%
Gouda           456    6.4%
Hard Cheese     340    4.8%
Feta            246    3.4%
Fresh Cheese    196    2.7%
Fromage Blanc   150    2.1%
Raclette        144    2.0%
Comté           99     1.4%
Edam            97     1.4%
Havarti         95     1.3%
Burrata         88     1.2%
Halloumi        87     1.2%
Ricotta         85     1.2%
Dairy Products  77     1.1%
Top values (rank 1–20)
  1. Cream Cheese — 1,187
  2. Mozzarella — 702
  3. Soft Cheese — 637
  4. Grated Cheese — 571
  5. Cottage Cheese — 544
  6. Goat Cheese — 526
  7. Cheese Spread — 473
  8. Gouda — 456
  9. Hard Cheese — 340
  10. Feta — 246
  11. Fresh Cheese — 196
  12. Fromage Blanc — 150
  13. Raclette — 144
  14. Comté — 99
  15. Edam — 97
  16. Havarti — 95
  17. Burrata — 88
  18. Halloumi — 87
  19. Ricotta — 85
  20. Dairy Products — 77

value numeric

only one distinct value
rows: 7,146
null: 0 (0.0%)
unique: 1
min: 1.000
max: 1.000
mean: 1.000
median: 1.000
std: 0.000
q1: 1.000
q3: 1.000
iqr: 0.000
skew: 0.000
kurtosis: 0.000
n_outliers: 0
outlier_rate: 0.000
zero_rate: 0.000
Histogram bins for value (median: 1.0).
bin            count
0.5 – 0.525    0
0.525 – 0.55   0
0.55 – 0.575   0
0.575 – 0.6    0
0.6 – 0.625    0
0.625 – 0.65   0
0.65 – 0.675   0
0.675 – 0.7    0
0.7 – 0.725    0
0.725 – 0.75   0
0.75 – 0.775   0
0.775 – 0.8    0
0.8 – 0.825    0
0.825 – 0.85   0
0.85 – 0.875   0
0.875 – 0.9    0
0.9 – 0.925    0
0.925 – 0.95   0
0.95 – 0.975   0
0.975 – 1      0
1 – 1.025      7146
1.025 – 1.05   0
1.05 – 1.075   0
1.075 – 1.1    0
1.1 – 1.125    0
1.125 – 1.15   0
1.15 – 1.175   0
1.175 – 1.2    0
1.2 – 1.225    0
1.225 – 1.25   0
1.25 – 1.275   0
1.275 – 1.3    0
1.3 – 1.325    0
1.325 – 1.35   0
1.35 – 1.375   0
1.375 – 1.4    0
1.4 – 1.425    0
1.425 – 1.45   0
1.45 – 1.475   0
1.475 – 1.5    0