quirky cheese list
Reading
This dataset is a catalogue of 7,146 cheese product entries, each with a name, a category, a country of origin, and a constant value field. Entries span 32 categories and 111 countries; France alone accounts for 25.9% of rows, and Germany (12.7%) and the United States (10.6%) round out the top three. The largest category is Cream Cheese (1,187 rows, 16.6%), followed by Mozzarella (9.8%) and Soft Cheese (8.9%), so categories are unevenly populated even though no single one dominates. The name column is multilingual (predominantly English and French, with notable German, Italian, and Spanish presence) and has an 11.3% duplicate rate, which is worth investigating before any de-duplicated analysis. The value column is constant at 1.0 across all rows and carries no analytical signal.
citing: row_count · column_count · columns.name.language_counts · columns.name.stats.duplicate_rate · columns.category.n_unique · columns.category.top_values · columns.category.stats.top_rate · columns.country.n_unique · columns.country.top_values · columns.country.stats.top_rate · columns.value.n_unique · columns.value.stats.mean
Charts the summary said to look at first
| country | count | share |
|---|---|---|
| France | 1853 | 25.9% |
| Germany | 907 | 12.7% |
| United States | 759 | 10.6% |
| Belgium | 334 | 4.7% |
| United Kingdom | 333 | 4.7% |
| Spain | 319 | 4.5% |
| Italy | 307 | 4.3% |
| Switzerland | 209 | 2.9% |
| Poland | 145 | 2.0% |
| Netherlands | 134 | 1.9% |
| Austria | 127 | 1.8% |
| Canada | 125 | 1.7% |
| Sweden | 123 | 1.7% |
| Portugal | 115 | 1.6% |
| Ireland | 114 | 1.6% |
| Czech Republic | 103 | 1.4% |
| Australia | 100 | 1.4% |
| Finland | 88 | 1.2% |
| Norway | 75 | 1.0% |
| Bulgaria | 60 | 0.8% |
| category | count | share |
|---|---|---|
| Cream Cheese | 1187 | 16.6% |
| Mozzarella | 702 | 9.8% |
| Soft Cheese | 637 | 8.9% |
| Grated Cheese | 571 | 8.0% |
| Cottage Cheese | 544 | 7.6% |
| Goat Cheese | 526 | 7.4% |
| Cheese Spread | 473 | 6.6% |
| Gouda | 456 | 6.4% |
| Hard Cheese | 340 | 4.8% |
| Feta | 246 | 3.4% |
| Fresh Cheese | 196 | 2.7% |
| Fromage Blanc | 150 | 2.1% |
| Raclette | 144 | 2.0% |
| Comté | 99 | 1.4% |
| Edam | 97 | 1.4% |
| Havarti | 95 | 1.3% |
| Burrata | 88 | 1.2% |
| Halloumi | 87 | 1.2% |
| Ricotta | 85 | 1.2% |
| Dairy Products | 77 | 1.1% |
| name length (chars) | count |
|---|---|
| 4 – 9 | 438 |
| 9 – 13 | 994 |
| 13 – 18 | 1446 |
| 18 – 23 | 1162 |
| 23 – 27 | 1119 |
| 27 – 32 | 774 |
| 32 – 37 | 389 |
| 37 – 41 | 341 |
| 41 – 46 | 184 |
| 46 – 51 | 107 |
| 51 – 55 | 69 |
| 55 – 60 | 44 |
| 60 – 65 | 28 |
| 65 – 69 | 17 |
| 69 – 74 | 7 |
| 74 – 79 | 2 |
| 79 – 83 | 2 |
| 83 – 88 | 6 |
| 88 – 93 | 0 |
| 93 – 98 | 1 |
| 98 – 102 | 3 |
| 102 – 107 | 1 |
| 107 – 112 | 6 |
| 112 – 116 | 0 |
| 116 – 121 | 0 |
| 121 – 126 | 3 |
| 126 – 130 | 1 |
| 130 – 135 | 0 |
| 135 – 140 | 0 |
| 140 – 144 | 0 |
| 144 – 149 | 0 |
| 149 – 154 | 0 |
| 154 – 158 | 1 |
| 158 – 163 | 0 |
| 163 – 168 | 0 |
| 168 – 172 | 0 |
| 172 – 177 | 0 |
| 177 – 182 | 0 |
| 182 – 186 | 0 |
| 186 – 191 | 1 |
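
The bin labels in the length histogram share their end points (4 – 9, then 9 – 13), which looks odd until the edges are treated as rounded values. The labels are consistent with 40 equal-width bins between len_min (4) and len_max (191), with each edge rounded to the nearest integer. The sketch below reproduces them under that assumption; `bin_labels` is a hypothetical helper, not something the profiling tool is known to expose.

```python
def bin_labels(lo, hi, bins):
    """Return 'start – end' labels for equal-width bins with integer-rounded edges."""
    # Multiply before dividing so edges such as 97.5 are computed exactly.
    edges = [lo + (hi - lo) * k / bins for k in range(bins + 1)]
    return [f"{round(a)} – {round(b)}" for a, b in zip(edges, edges[1:])]

# Assumed inputs: len_min = 4, len_max = 191, 40 bins (one per table row).
labels = bin_labels(4, 191, 40)
print(labels[:3])
```

Each bin is (191 - 4) / 40 = 4.675 characters wide, so the true edges fall between integers and the printed labels only approximate them.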
Schema
4 columns
| Column | Type | Nulls | Unique | Alerts |
|---|---|---|---|---|
| name | text | 0.0% | 6,337 | multilingual |
| country | categorical | 0.0% | 111 | |
| category | categorical | 0.0% | 32 | |
| value | numeric | 0.0% | 1 | constant |
name
text · label · multilingual
Short product names for cheeses, averaging 3.4 words / 23 characters, with top tokens 'cheese' (1465), 'fromage' (406), 'queso' (239) and varieties like Mozzarella, Cottage cheese, Gouda. Language detection spans 30 codes, predominantly en (1847), fr (1029), de (552), it (389), es (251), confirming the 'multilingual' alert. There are 809 exact duplicates (11.3%), and some repeated names also differ only in casing ('Cottage cheese' 29 vs 'Cottage Cheese' 27), so case-insensitive matching would merge further rows. With 6337 unique values out of 7146, the column is high-cardinality but not an identifier. Treatment: Normalize case and language before grouping; consider canonicalizing to a cheese-variety taxonomy.
- n: 7,146
- nulls: 0 (0.0%)
- unique: 6,337
- len_min: 4
- len_max: 191
- len_mean: 22.96
- len_median: 21
- len_p95: 44
- word_mean: 3.443
- word_median: 3
- n_empty: 0
- n_duplicates: 809
- duplicate_rate: 0.1132
- vocab_size: 4,732
- readability_flesch_mean: 53.61
- emoji_rate: 0.0005598
- url_rate: 0
- one_word_rate: 0.1041
- allcaps_rate: 0.01707
- boilerplate_rate: 0
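
The case-normalization treatment for name can be sketched in plain Python. This is a minimal illustration, assuming that Unicode normalization, case folding, and whitespace collapsing are enough to form a grouping key; `normalize_name` and `duplicate_rate` are hypothetical helpers, and the sample list is made up for the example rather than drawn from the dataset. Language normalization is not covered here.

```python
import unicodedata

def normalize_name(name: str) -> str:
    """Build a grouping key: NFKC-normalized, case-folded, whitespace collapsed."""
    text = unicodedata.normalize("NFKC", name)
    return " ".join(text.casefold().split())

def duplicate_rate(names) -> float:
    """Share of rows that repeat another value: (n - unique) / n."""
    names = list(names)
    if not names:
        return 0.0
    return (len(names) - len(set(names))) / len(names)

# Illustrative sample only: three spellings of one name plus two others.
sample = ["Cottage cheese", "Cottage Cheese", "  cottage  cheese ", "Gouda", "Comté"]
raw_rate = duplicate_rate(sample)
norm_rate = duplicate_rate(normalize_name(n) for n in sample)
print(raw_rate, norm_rate)
```

The same formula matches the reported figures: (7,146 - 6,337) / 7,146 gives 809 duplicates, or about 0.1132. On the real column, the normalized rate would be expected to come out higher than the raw one, since casing variants collapse into one key.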
country
categorical · feature
This is a country-of-origin or location categorical with 111 distinct values across 7146 rows and no nulls. The distribution is Europe-heavy and concentrated: France alone accounts for 25.9% of records, with Germany (907) and the United States (759) trailing, giving an entropy ratio of 0.63. The long tail of 100+ smaller countries means rare-category handling will matter. Treatment: Group rare countries into an 'Other' bucket before one-hot or target encoding.
- n: 7,146
- nulls: 0 (0.0%)
- unique: 111
- top_value: France
- top_rate: 0.2593
- cardinality: 111
- entropy: 4.268
- entropy_ratio: 0.6281
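
Grouping rare countries into an 'Other' bucket might look like the sketch below. The share threshold is an assumption chosen for illustration, since the report does not say where to cut, and `bucket_rare` is a hypothetical helper written in plain Python so it runs without dependencies.

```python
from collections import Counter

def bucket_rare(values, min_share=0.01, other="Other"):
    """Replace values whose share of rows is below min_share with `other`."""
    values = list(values)
    if not values:
        return []
    counts = Counter(values)
    n = len(values)
    # Keep only values that meet the (assumed) minimum share of rows.
    keep = {v for v, c in counts.items() if c / n >= min_share}
    return [v if v in keep else other for v in values]

# Toy sample, not the real data: 6 + 3 + 1 = 10 rows.
sample = ["France"] * 6 + ["Germany"] * 3 + ["Bulgaria"]
bucketed = bucket_rare(sample, min_share=0.2)
print(Counter(bucketed))
```

Applied to the counts in the country table, a 1% cut-off would keep the 19 countries from France through Norway (75 / 7,146 ≈ 1.05%) and fold Bulgaria (60 rows, ≈ 0.84%) and every smaller country into 'Other'. For modelling, the set of kept values would normally be fixed on training data and reused on new data.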
category
categorical · feature
This is a categorical product-type field for cheese items, with 32 distinct categories across 7146 rows and no nulls. Cream Cheese leads at 16.6% (1187 rows), followed by Mozzarella and Soft Cheese, and entropy ratio 0.82 indicates a fairly even spread rather than dominance by one value. No rare-label or drift signals are present in the evidence. Treatment: One-hot or target-encode for modelling; cardinality of 32 is manageable.
- n: 7,146
- nulls: 0 (0.0%)
- unique: 32
- top_value: Cream Cheese
- top_rate: 0.1661
- cardinality: 32
- entropy: 4.098
- entropy_ratio: 0.8195
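
The entropy figures can be read as follows. The reported numbers are consistent with Shannon entropy in bits divided by its maximum, log2 of the number of distinct values: 4.098 / log2(32) = 4.098 / 5 ≈ 0.82 for category, and 4.268 / log2(111) ≈ 0.63 for country. The sketch below assumes that definition; the report does not state its formula, and the function names are illustrative.

```python
import math
from collections import Counter

def entropy_bits(values) -> float:
    """Shannon entropy of the value distribution, in bits."""
    counts = Counter(values)
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def entropy_ratio(values) -> float:
    """Entropy divided by log2(number of distinct values); 1.0 means uniform."""
    values = list(values)
    k = len(set(values))
    if k <= 1:
        return 0.0  # a constant (or empty) column has no spread
    return entropy_bits(values) / math.log2(k)

# Check the reported ratios from the rounded entropies in the stats above.
category_ratio = 4.098 / math.log2(32)
country_ratio = 4.268 / math.log2(111)
print(round(category_ratio, 2), round(country_ratio, 2))
```

A ratio near 1 means rows are spread almost evenly across the distinct values, while a ratio near 0 means a few values hold most rows. This is why country (0.63) reads as concentrated and category (0.82) as fairly even.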
value
numeric · other · constant
The column 'value' is numeric but completely constant: all 7146 rows hold the value 1.0, with zero variance and a single unique value. It carries no information for analysis or modelling. Treatment: Drop; constant column with no signal.
- n: 7,146
- nulls: 0 (0.0%)
- unique: 1
- min: 1
- max: 1
- mean: 1
- median: 1
- std: 0
- q1: 1
- q3: 1
- iqr: 0
- skew: 0
- kurtosis: 0
- n_outliers: 0
- outlier_rate: 0
- zero_rate: 0
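
Dropping the constant column can be done generically rather than by name, so the same check also catches other constant fields. This is a minimal sketch, assuming rows are held as a list of dictionaries with identical keys; `constant_columns` is a hypothetical helper and the three rows are invented for illustration.

```python
def constant_columns(rows):
    """Return the names of columns whose value is identical in every row."""
    if not rows:
        return []
    first = rows[0]
    return [col for col in first if all(r[col] == first[col] for r in rows)]

# Made-up rows using the four column names from the schema.
rows = [
    {"name": "Cottage cheese", "country": "France", "category": "Cottage Cheese", "value": 1.0},
    {"name": "Gouda", "country": "Germany", "category": "Gouda", "value": 1.0},
    {"name": "Mozzarella", "country": "United States", "category": "Mozzarella", "value": 1.0},
]
drop = constant_columns(rows)
cleaned = [{k: v for k, v in r.items() if k not in drop} for r in rows]
print(drop)
```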