quirky hot sauces
Reading
This dataset catalogs 258 hot sauce products sourced entirely from OpenFoodFacts. It has 9 categorical columns: brand, categories, countries, ingredients, labels, name and url, plus two constant metadata columns, source and type. Brands are highly fragmented across 158 unique values. Tabasco (12) and McIlhenny Company, Tabasco (11) lead, but no brand dominates, and 37 records have a blank brand worth investigating. Geographically, the United States (54) and France (28) are the most common of the 123 distinct country values. However, inconsistent encoding (e.g. 'en:us' vs 'United States') splits each country across several spellings, so the values need cleaning before true per-country shares can be measured. The labels column is sparse: 145 of 258 rows are blank, so dietary tags such as 'No gluten' or 'Non GMO project' apply to only a small minority of products. Note that source and type are constant (OpenFoodFacts / hot_sauce_product) and carry no analytical signal.
citing: brand · countries · labels · categories · name · source · type
Charts the summary recommends looking at first
brand (top 20 values)
| value | count | share |
|---|---|---|
| (blank) | 37 | 14.3% |
| Tabasco | 12 | 4.7% |
| McIlhenny Company, Tabasco | 11 | 4.3% |
| Flying Goose Brand | 6 | 2.3% |
| Melinda's | 5 | 1.9% |
| Lola's Fine Hot Sauce | 5 | 1.9% |
| Cholula | 4 | 1.6% |
| Encona | 4 | 1.6% |
| El Yucateco | 4 | 1.6% |
| Mrs. Renfro's | 4 | 1.6% |
| Huy Fong Foods, Inc. | 3 | 1.2% |
| Sauce Shop | 3 | 1.2% |
| Go-Tan | 2 | 0.8% |
| Vitasia | 2 | 0.8% |
| Valentina | 2 | 0.8% |
| Heinz | 2 | 0.8% |
| sauce shop | 2 | 0.8% |
| CHOLULA | 2 | 0.8% |
| TABASCO | 2 | 0.8% |
| Serpis | 2 | 0.8% |
countries (top 20 values)
| value | count | share |
|---|---|---|
| United States | 54 | 20.9% |
| France | 28 | 10.9% |
| en:us | 10 | 3.9% |
| en:gb | 8 | 3.1% |
| en:fr | 8 | 3.1% |
| en:france | 4 | 1.6% |
| en:germany | 4 | 1.6% |
| United States, World | 4 | 1.6% |
| en:United States | 4 | 1.6% |
| United Kingdom | 3 | 1.2% |
| en:United Kingdom | 3 | 1.2% |
| France, United States | 3 | 1.2% |
| en:Canada | 3 | 1.2% |
| World | 3 | 1.2% |
| France, en:morocco | 2 | 0.8% |
| en:ma | 2 | 0.8% |
| France,Royaume-Uni | 2 | 0.8% |
| en:Germany | 2 | 0.8% |
| Belgique,France | 2 | 0.8% |
| Canada | 2 | 0.8% |
labels (top 20 values)
| value | count | share |
|---|---|---|
| (blank) | 145 | 56.2% |
| No gluten | 9 | 3.5% |
| No GMOs, Non GMO project | 9 | 3.5% |
| Sans gluten | 5 | 1.9% |
| Halal | 4 | 1.6% |
| en:vegan | 4 | 1.6% |
| No GMOs, Non GMO project, en:no-gluten | 3 | 1.2% |
| Point Vert | 3 | 1.2% |
| Vegetarian, Vegan, Green Dot | 2 | 0.8% |
| Triman | 2 | 0.8% |
| No gluten, en:vegan | 2 | 0.8% |
| Punto Verde | 2 | 0.8% |
| Sans OGM,en:Non GMO project | 2 | 0.8% |
| en:halal | 2 | 0.8% |
| en:no-gluten | 2 | 0.8% |
| Sin gluten,Punto Verde | 1 | 0.4% |
| Vegetarian, Vegan, European Vegetarian Union, European Vegetarian Union Vegan, Nutriscore, Rainforest Alliance, en:green-dot | 1 | 0.4% |
| Vegetarian | 1 | 0.4% |
| Thai quality label, Halal, Natural colorings, Thailand Diversity & Refinement, The Central Islamic Committee of Thailand | 1 | 0.4% |
| No gluten, No added MSG | 1 | 0.4% |
categories (top 20 values)
| value | count | share |
|---|---|---|
| (blank) | 35 | 13.6% |
| Condiments, Sauces, Hot sauces, Groceries | 32 | 12.4% |
| Condiments, Sauces, Groceries | 23 | 8.9% |
| Condiments, Sauces, Dips, Groceries | 13 | 5.0% |
| Condiments, Sauces, Sauces chili, en:groceries | 9 | 3.5% |
| Condiments,Sauces | 8 | 3.1% |
| Condiments, Sauces, Hot sauces | 7 | 2.7% |
| Hot sauces | 5 | 1.9% |
| Condiments,Sauces,Hot sauces | 5 | 1.9% |
| Condiments,Sauces,Hot sauces,Groceries | 5 | 1.9% |
| Condiments, Sauces, en:hot-sauces | 4 | 1.6% |
| Condiments,Sauces,Sauces chili | 4 | 1.6% |
| Sauces chili | 3 | 1.2% |
| Condiments, Sauces, Sauces chili, Sauces sriracha, en:groceries | 3 | 1.2% |
| Condiments, Sauces, Barbecue sauces, Groceries | 3 | 1.2% |
| Condiments, Sauces | 3 | 1.2% |
| undefined | 3 | 1.2% |
| Condimentos,Salsas,Salsas de chiles,en:groceries | 2 | 0.8% |
| en:hot-sauces | 2 | 0.8% |
| Condiments, Sauces, Hot sauces, Sriracha sauces | 2 | 0.8% |
name (top 20 values)
| value | count | share |
|---|---|---|
| Carolina Reaper Hot Sauce | 6 | 2.3% |
| Tabasco | 5 | 1.9% |
| Sriracha Hot Chilli Sauce | 3 | 1.2% |
| Sriracha Hot Chili Sauce | 3 | 1.2% |
| Sauce de piment sriracha | 3 | 1.2% |
| (blank) | 3 | 1.2% |
| Ghost pepper hot sauce | 3 | 1.2% |
| Carolina Reaper Sauce | 3 | 1.2% |
| Carolina reaper hot sauce | 3 | 1.2% |
| Carolina Reaper | 3 | 1.2% |
| Salsa Picante | 2 | 0.8% |
| Sriracha Sauce | 2 | 0.8% |
| Sriracha | 2 | 0.8% |
| Sauce sriracha | 2 | 0.8% |
| Sauce de Piment Sriracha | 2 | 0.8% |
| Tabasco Green Pepper Sauce | 2 | 0.8% |
| Tabasco® brand pepper sauce | 2 | 0.8% |
| Habanero Hot Sauce | 2 | 0.8% |
| Hot Sauce Chile Habanero | 2 | 0.8% |
| Ghost Pepper | 2 | 0.8% |
Schema
9 columns

| column | type | null rate | unique | alerts |
|---|---|---|---|---|
| name | categorical | 0.0% | 221 | long_tail |
| brand | categorical | 0.0% | 158 | long_tail |
| countries | categorical | 0.0% | 123 | long_tail |
| categories | categorical | 0.0% | 106 | long_tail |
| ingredients | categorical | 0.0% | 207 | long_tail |
| labels | categorical | 0.0% | 77 | long_tail |
| url | categorical | 0.0% | 258 | long_tail |
| source | categorical | 0.0% | 1 | imbalance |
| type | categorical | 0.0% | 1 | imbalance |

Null rates count only true nulls; name, brand, categories, ingredients and labels also encode missing values as empty strings, as noted in the per-column sections.
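The per-column sections report entropy and entropy_ratio without defining them. A minimal sketch, assuming entropy is Shannon entropy in bits over the observed value frequencies and entropy_ratio divides it by log2(cardinality); this assumption reproduces the reported figures (e.g. name: 7.666 / log2(221) ≈ 0.984; url: log2(258) ≈ 8.011 with ratio 1). The helper `profile_stats` is hypothetical, not the profiler's actual code.

```python
import math
from collections import Counter


def profile_stats(values):
    """Compute top_rate, cardinality, entropy (bits) and entropy_ratio.

    Assumes entropy_ratio = entropy / log2(cardinality), reported as 0
    for a constant column, where log2(1) = 0.
    """
    counts = Counter(values)
    n = len(values)
    cardinality = len(counts)
    top_value, top_count = counts.most_common(1)[0]
    # Shannon entropy in bits: sum of p * log2(1/p) over observed values.
    entropy = sum((c / n) * math.log2(n / c) for c in counts.values())
    max_entropy = math.log2(cardinality) if cardinality > 1 else 0.0
    ratio = entropy / max_entropy if max_entropy > 0 else 0.0
    return {
        "n": n,
        "unique": cardinality,
        "top_value": top_value,
        "top_rate": top_count / n,
        "entropy": entropy,
        "entropy_ratio": ratio,
    }


# 258 distinct values (like url): entropy is log2(258), ratio is 1.
stats = profile_stats([f"https://world.openfoodfacts.org/product/{i}" for i in range(258)])
print(round(stats["entropy"], 3), round(stats["entropy_ratio"], 4))  # 8.011 1.0

# A constant column (like source or type): entropy 0, ratio 0 by convention.
const = profile_stats(["OpenFoodFacts"] * 258)
print(const["entropy"], const["entropy_ratio"])  # 0.0 0.0
```

Under this definition, a ratio near 1 means values are spread almost evenly across categories, which is why the identifier-like url and name columns score highest.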
name
categorical · label · long_tail
This is a product name field for hot sauces, with 221 unique values across 258 rows and a near-maximal entropy ratio of 0.984. The top value, 'Carolina Reaper Hot Sauce', covers only 2.3% of rows. Casing and spelling variants ('Carolina Reaper Hot Sauce' vs 'Carolina reaper hot sauce', 'Sriracha Hot Chilli Sauce' vs 'Sriracha Hot Chili Sauce'), French-language entries such as 'Sauce de piment sriracha', and 3 empty strings show inconsistent normalization despite the 0.0% null rate. Treatment: normalize casing and spelling variants, and treat empty strings as missing, before grouping or joining.
- n: 258
- nulls: 0 (0.0%)
- unique: 221
- top_value: Carolina Reaper Hot Sauce
- top_rate: 0.02326
- cardinality: 221
- entropy: 7.666
- entropy_ratio: 0.9843
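A minimal sketch of the name treatment, assuming casefolding, whitespace collapsing and a small hand-maintained spelling map are enough to merge the variants shown above. `SPELLING_FIXES` and `normalize_name` are hypothetical names, and the sketch does not translate French entries such as 'Sauce de piment sriracha'; that would need a separate mapping.

```python
import re
from collections import Counter

# Hypothetical word-level spelling fixes; extend as new variants are found.
SPELLING_FIXES = {"chilli": "chili"}


def normalize_name(raw):
    """Return a normalized product name, or None for blank entries."""
    if raw is None:
        return None
    # Collapse runs of whitespace and casefold so casing variants merge.
    text = re.sub(r"\s+", " ", raw).strip().casefold()
    if not text:
        return None  # empty strings are missing values, not a category
    words = [SPELLING_FIXES.get(word, word) for word in text.split(" ")]
    return " ".join(words)


names = [
    "Carolina Reaper Hot Sauce",
    "Carolina reaper hot sauce",
    "Sriracha Hot Chilli Sauce",
    "Sriracha Hot Chili Sauce",
    "",
]
print(Counter(normalize_name(n) for n in names))
```

Lowercased keys are convenient for grouping and joining; a display label can be recovered from the most frequent original spelling in each group.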
brand
categorical · feature · long_tail
Categorical brand label for what appears to be a hot sauce catalogue, with 158 distinct brands across 258 rows and a high entropy ratio (0.894), indicating a long tail. The most common value is the empty string (37 rows, 14.3% top rate), so missing-as-blank outnumbers every real brand, including Tabasco (12) and McIlhenny Company, Tabasco (11). 'Tabasco' and 'McIlhenny Company, Tabasco' likely refer to the same maker but appear as separate categories, and casing variants ('Tabasco' vs 'TABASCO', 'Cholula' vs 'CHOLULA', 'Sauce Shop' vs 'sauce shop') split brands further, pointing to inconsistent normalisation. Treatment: replace empty strings with explicit nulls, normalise brand aliases and casing (e.g. Tabasco vs McIlhenny), then group rare brands into 'Other' before encoding.
- n: 258
- nulls: 0 (0.0%)
- unique: 158
- top_value: "" (empty string)
- top_rate: 0.1434
- cardinality: 158
- entropy: 6.53
- entropy_ratio: 0.8941
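The three-step brand treatment can be sketched as follows. This is a minimal illustration, assuming a hand-built alias table; `BRAND_ALIASES`, `clean_brand`, `group_rare` and the `min_count=3` threshold are all hypothetical, and whether 'McIlhenny Company, Tabasco' should merge into 'Tabasco' is an assumption to confirm against the source records.

```python
from collections import Counter

# Hypothetical alias table: casefolded variants -> one canonical brand.
# Merging McIlhenny into Tabasco is an assumption, not a verified fact.
BRAND_ALIASES = {
    "tabasco": "Tabasco",
    "mcilhenny company, tabasco": "Tabasco",
    "cholula": "Cholula",
    "sauce shop": "Sauce Shop",
}


def clean_brand(raw):
    """Map blanks to None and known aliases to a canonical spelling."""
    text = (raw or "").strip()
    if not text:
        return None
    return BRAND_ALIASES.get(text.casefold(), text)


def group_rare(values, min_count=3, other="Other"):
    """Replace brands seen fewer than min_count times with a shared label.

    None (missing) is kept as-is so it can be encoded separately.
    """
    counts = Counter(v for v in values if v is not None)
    return [v if v is None or counts[v] >= min_count else other for v in values]


raw = ["Tabasco", "TABASCO", "McIlhenny Company, Tabasco", "", "Go-Tan",
       "Cholula", "CHOLULA", "Cholula"]
print(group_rare([clean_brand(b) for b in raw]))
# ['Tabasco', 'Tabasco', 'Tabasco', None, 'Other', 'Cholula', 'Cholula', 'Cholula']
```

Keeping missing brands as None rather than folding them into 'Other' preserves the distinction between "unknown brand" and "rare brand", which matters here because blanks are the single largest group.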
countries
categorical · feature · long_tail
This is a country-of-sale or country-of-origin label for 258 records, with 123 distinct values and no nulls. The encoding is inconsistent: plain names ('United States', 54) coexist with Open Food Facts-style tag prefixes ('en:us', 10; 'en:United States', 4), French-language names ('Belgique,France', 'France,Royaume-Uni') and multi-country strings ('United States, World'), so the same country appears under several spellings. An entropy ratio of 0.82 and a long tail show how fragmented the values are; even the most common value covers only 20.9% of rows. Treatment: normalize to ISO country codes (strip 'en:' prefixes, split comma lists, translate non-English names) before grouping or encoding.
- n: 258
- nulls: 0 (0.0%)
- unique: 123
- top_value: United States
- top_rate: 0.2093
- cardinality: 123
- entropy: 5.676
- entropy_ratio: 0.8176
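A minimal sketch of the country normalization, assuming a small hypothetical lookup table (`COUNTRY_TO_ISO`) that covers only the spellings visible in the table above; a real pipeline would need a complete mapping to ISO 3166-1 alpha-2 codes. Dropping 'World' is a design choice of this sketch, since it names no country.

```python
# Hypothetical lookup from cleaned country strings to ISO 3166-1 alpha-2
# codes; only spellings visible in the profile are covered.
COUNTRY_TO_ISO = {
    "united states": "US", "us": "US",
    "france": "FR", "fr": "FR",
    "united kingdom": "GB", "gb": "GB", "royaume-uni": "GB",
    "germany": "DE",
    "canada": "CA",
    "morocco": "MA", "ma": "MA",
    "belgique": "BE",
}


def parse_countries(raw):
    """Split a comma list, strip 'en:' prefixes and map entries to ISO codes.

    'World' is dropped because it is not a country; unknown names are kept
    (casefolded) so they can be reviewed rather than silently lost.
    """
    codes = []
    for part in (raw or "").split(","):
        token = part.strip()
        if token.lower().startswith("en:"):
            token = token[3:]
        token = token.casefold()
        if not token or token == "world":
            continue
        code = COUNTRY_TO_ISO.get(token, token)
        if code not in codes:  # keep order, avoid duplicates
            codes.append(code)
    return codes


for value in ["United States", "en:us", "en:United States",
              "France,Royaume-Uni", "United States, World", "France, en:morocco"]:
    print(value, "->", parse_countries(value))
```

Because one product can be sold in several countries, the resulting code lists suit a multi-hot encoding better than a single categorical column.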
categories
categorical · feature · long_tail
Comma-delimited product category tags, dominated by condiment/sauce/hot-sauce hierarchies. Cardinality is high (106 unique values across 258 rows, entropy ratio 0.82), and the most common value is the empty string (35 rows, 13.6%), indicating missing labels encoded as blanks rather than nulls; a further 3 rows hold the literal string 'undefined'. Near-duplicate variants differ only by spacing, casing, hyphenation or 'en:' prefixes (e.g. 'Condiments,Sauces' vs 'Condiments, Sauces', or 'Hot sauces' vs 'en:hot-sauces'), and some rows use Spanish tags ('Condimentos,Salsas,...'), so raw cardinality overstates the true taxonomy. Treatment: normalise delimiters and casing, treat empty strings (and 'undefined') as missing, then split into a multi-hot tag encoding.
- n: 258
- nulls: 0 (0.0%)
- unique: 106
- top_value: "" (empty string)
- top_rate: 0.1357
- cardinality: 106
- entropy: 5.506
- entropy_ratio: 0.8183
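The categories treatment can be sketched as a tag splitter plus a multi-hot encoder. This is a minimal illustration, assuming that casefolding, stripping 'en:' prefixes and treating hyphens as spaces is enough to merge the near-duplicates listed above; `category_tags` and `multi_hot` are hypothetical names, and Spanish tags such as 'Salsas' are not translated.

```python
import re

MISSING = {"", "undefined"}


def category_tags(raw):
    """Split a category string into normalized tags.

    Tags are casefolded, stripped of an 'en:' prefix and have hyphens
    turned into spaces, so 'Hot sauces' and 'en:hot-sauces' match.
    """
    text = (raw or "").strip()
    if text.casefold() in MISSING:
        return set()
    tags = set()
    for part in text.split(","):
        tag = part.strip().casefold()
        if tag.startswith("en:"):
            tag = tag[3:]
        tag = re.sub(r"[-\s]+", " ", tag).strip()
        if tag:
            tags.add(tag)
    return tags


def multi_hot(rows):
    """Return (vocabulary, matrix) with one 0/1 column per tag."""
    tag_sets = [category_tags(r) for r in rows]
    vocab = sorted(set().union(*tag_sets))
    matrix = [[1 if t in tags else 0 for t in vocab] for tags in tag_sets]
    return vocab, matrix


vocab, matrix = multi_hot(["Condiments,Sauces", "Condiments, Sauces, en:hot-sauces",
                           "Hot sauces", "undefined"])
print(vocab)  # ['condiments', 'hot sauces', 'sauces']
for row in matrix:
    print(row)  # [1, 0, 1], then [1, 1, 1], [0, 1, 0], [0, 0, 0]
```

An all-zero row marks a product with no usable category, which keeps missing values visible without inventing a placeholder tag.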
ingredients
categorical · free_text · long_tail
Free-text ingredient lists for what appears to be hot sauce or chili products, with 207 distinct strings across 258 rows and an entropy ratio of 0.90, indicating near-unique values. The most common value is an empty string (49 rows, 19% top rate), so roughly a fifth of records have no ingredients recorded. The remaining entries mix multiple languages (English, French, Norwegian, German) and formatting conventions, so direct categorical use is not viable. Treatment: treat empty strings as missing, then tokenize and normalize across languages and extract ingredient features before modelling.
- n: 258
- nulls: 0 (0.0%)
- unique: 207
- top_value: "" (empty string)
- top_rate: 0.1899
- cardinality: 207
- entropy: 6.922
- entropy_ratio: 0.8997
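A minimal tokenization sketch for the ingredients treatment, assuming ingredients are separated by commas or semicolons. The helper `ingredient_tokens` is hypothetical. It drops bracketed sub-ingredient details and percentages and strips accents, but it does not translate between languages; a synonym or translation table would be a further step.

```python
import re
import unicodedata


def ingredient_tokens(raw):
    """Split a free-text ingredient list into normalized ingredient tokens.

    Returns None for blank entries so they stay distinct from an empty list.
    Simplification: bracketed details such as '(60%)' are discarded.
    """
    text = (raw or "").strip()
    if not text:
        return None
    # Drop bracketed sub-ingredient details and percentages such as '70%'.
    text = re.sub(r"\([^)]*\)", " ", text)
    text = re.sub(r"\d+(?:[.,]\d+)?\s*%", " ", text)
    # Decompose accented characters and drop the combining marks.
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    tokens = []
    for part in re.split(r"[,;]", text):
        token = re.sub(r"\s+", " ", part).strip(" .").casefold()
        if token:
            tokens.append(token)
    return tokens


print(ingredient_tokens("Piments rouges (60%), vinaigre, sel."))
# ['piments rouges', 'vinaigre', 'sel']
```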
labels
categorical · feature · long_tail
Free-form product label tags (dietary, certification, packaging) with 77 distinct values across 258 rows. Over half the rows (145, 56.2%) hold an empty string rather than a true null, so the 0.0% null rate is misleading. Values mix languages (English 'No gluten', French 'Sans gluten', Spanish 'Sin gluten') and formats (raw text vs Open Food Facts taxonomy codes such as 'en:vegan'), and many cells concatenate multiple labels with commas. Treatment: treat empty strings as missing, split on commas, normalise language and taxonomy variants, then multi-hot encode.
- n: 258
- nulls: 0 (0.0%)
- unique: 77
- top_value: "" (empty string)
- top_rate: 0.562
- cardinality: 77
- entropy: 3.557
- entropy_ratio: 0.5675
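A minimal sketch of the label normalization, assuming a hand-built synonym table (`LABEL_SYNONYMS`, hypothetical) that maps the English, French and Spanish spellings and 'en:' taxonomy codes seen in the labels table to one canonical tag each. The sketch keeps 'No GMOs' and the 'Non GMO project' certification as separate tags, since one is a claim and the other a specific certification.

```python
# Hypothetical synonym table built from label spellings seen in the profile.
LABEL_SYNONYMS = {
    "no gluten": "no-gluten",
    "sans gluten": "no-gluten",
    "sin gluten": "no-gluten",
    "en:no-gluten": "no-gluten",
    "vegan": "vegan",
    "en:vegan": "vegan",
    "halal": "halal",
    "en:halal": "halal",
    "point vert": "green-dot",
    "punto verde": "green-dot",
    "green dot": "green-dot",
    "en:green-dot": "green-dot",
    "no gmos": "no-gmos",
    "sans ogm": "no-gmos",
    "non gmo project": "non-gmo-project",
    "en:non gmo project": "non-gmo-project",
}


def label_tags(raw):
    """Split a comma-separated label cell into canonical tags.

    Blank cells return an empty set; unmapped labels are kept casefolded.
    """
    tags = set()
    for part in (raw or "").split(","):
        label = part.strip().casefold()
        if label:
            tags.add(LABEL_SYNONYMS.get(label, label))
    return tags


cells = ["No gluten", "Sans gluten", "Sin gluten,Punto Verde",
         "No GMOs, Non GMO project, en:no-gluten", "Sans OGM,en:Non GMO project", ""]
for cell in cells:
    print(repr(cell), "->", sorted(label_tags(cell)))
```

The resulting tag sets can then be multi-hot encoded with one 0/1 column per canonical tag.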
url
categorical · identifier · long_tail
This column holds Open Food Facts product URLs, one per row, whose trailing path segment is the product barcode. All 258 values are unique (entropy_ratio 1.0, top_rate 0.0039 = 1/258), so the column functions as a row identifier rather than a feature. Treatment: drop from modelling; keep as a lookup link or as a join key via the embedded barcode.
- n: 258
- nulls: 0 (0.0%)
- unique: 258
- top_value: https://world.openfoodfacts.org/product/8710605030051
- top_rate: 0.003876
- cardinality: 258
- entropy: 8.011
- entropy_ratio: 1
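To use the url column as a join key, the barcode can be pulled out of the path. A minimal sketch, assuming every URL follows the `/product/<barcode>` layout seen in the top_value; `barcode_from_url` is a hypothetical helper built on the standard-library `urllib.parse.urlparse`.

```python
from urllib.parse import urlparse


def barcode_from_url(url):
    """Return the barcode segment of an Open Food Facts product URL.

    Assumes the '/product/<barcode>' layout seen in this column; returns
    None when the URL does not follow it.
    """
    segments = [s for s in urlparse(url).path.split("/") if s]
    if len(segments) >= 2 and segments[0] == "product" and segments[1].isdigit():
        return segments[1]
    return None


print(barcode_from_url("https://world.openfoodfacts.org/product/8710605030051"))
# 8710605030051
```

The barcode is returned as a string rather than an integer so that any leading zeros are preserved.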
source
categorical · metadata · imbalance
This column records data provenance: all 258 rows are tagged 'OpenFoodFacts'. With cardinality 1 and entropy 0, it carries no information for modelling and only documents that the entire slice came from a single source. Treatment: drop before modelling; retain only as dataset-level provenance.
- n: 258
- nulls: 0 (0.0%)
- unique: 1
- top_value: OpenFoodFacts
- top_rate: 1
- cardinality: 1
- entropy: 0
- entropy_ratio: 0
type
categorical · metadata · imbalance
This column is a constant categorical tag that marks all 258 records as 'hot_sauce_product', with no nulls. With cardinality 1 and entropy 0, it carries no discriminative information; it likely served as a type marker in an ingestion pipeline rather than as a usable feature. Treatment: drop before modelling, since a single constant value provides no signal.
- n: 258
- nulls: 0 (0.0%)
- unique: 1
- top_value: hot_sauce_product
- top_rate: 1
- cardinality: 1
- entropy: 0
- entropy_ratio: 0
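Both source and type can be removed by a generic constant-column filter. A minimal sketch, assuming records are loaded as a list of dicts with identical keys; `drop_constant_columns` is a hypothetical helper, and it skips tables with fewer than two rows because every column would trivially look constant there.

```python
def drop_constant_columns(rows):
    """Drop keys whose value is identical in every row (cardinality 1).

    `rows` is a list of dicts sharing the same keys, with hashable values.
    Tables with fewer than two rows are returned unchanged.
    """
    if len(rows) < 2:
        return rows
    constant = {key for key in rows[0] if len({row[key] for row in rows}) == 1}
    return [{k: v for k, v in row.items() if k not in constant} for row in rows]


rows = [
    {"name": "Tabasco", "source": "OpenFoodFacts", "type": "hot_sauce_product"},
    {"name": "Cholula", "source": "OpenFoodFacts", "type": "hot_sauce_product"},
]
print(drop_constant_columns(rows))  # [{'name': 'Tabasco'}, {'name': 'Cholula'}]
```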