Saturn profiled 50 rows across 545 columns. The stats below are deterministic and machine-readable; the prose is a language-model interpretation of those stats (opt-in, added after the fact, never sees raw rows).
This is a 50-product sample from the Open Food Facts database, an open crowdsourced food product catalogue with 545 columns spanning multilingual product names, ingredient texts, allergen data, nutritional scores, packaging details, and community metadata. The most striking structural issue is extreme sparsity: the vast majority of language-specific columns (e.g. product_name_dz, ingredients_text_ja) have null rates of 96–98%, meaning content is concentrated in French and English fields. Two things most deserve a closer look: first, the Nutri-Score distribution is heavily skewed toward grade 'e' (54% of products), suggesting the sample leans toward nutritionally poor items; second, scan counts (scans_n, mean 578, max 2523) show a strong right-skewed tail with a few highly popular products dominating community attention.
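As a rough illustration of how per-column null rates like those quoted above could be reproduced, the sketch below computes the share of missing values per column from a list of row dictionaries. The function name `null_rates` and the toy rows are hypothetical, and this is not Saturn's implementation.

```python
from typing import Any


def null_rates(rows: list[dict[str, Any]]) -> dict[str, float]:
    """Return, per column, the fraction of rows where the value is None or absent."""
    if not rows:
        return {}
    columns = sorted({key for row in rows for key in row})
    n = len(rows)
    return {
        col: sum(1 for row in rows if row.get(col) is None) / n
        for col in columns
    }


# Toy rows (hypothetical, not taken from the profiled sample).
sample = [
    {"product_name_fr": "Chocolat noir", "product_name_ja": None},
    {"product_name_fr": "Biscuits", "product_name_ja": None},
    {"product_name_fr": None, "product_name_ja": None},
    {"product_name_fr": "Chips"},  # product_name_ja absent, counted as null
]
rates = null_rates(sample)
```

A missing key is treated the same as an explicit null here, which is an assumption about how sparse language-specific fields would be represented.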
nutriscore_grade · Check whether grade 'e' dominates the distribution, which would indicate the sample skews toward nutritionally poor products.
Top values for nutriscore_grade (6 unique shown, of 6 total).

| value | count | share |
| --- | --- | --- |
| e | 27 | 54.0% |
| d | 9 | 18.0% |
| c | 7 | 14.0% |
| a | 4 | 8.0% |
| b | 2 | 4.0% |
| unknown | 1 | 2.0% |

Fig 2.
scans_n · Look for the strong right skew (mean 578, max 2523) revealing a small set of highly-scanned popular products.
Histogram bins for scans_n (median: 492.0).

| bin | count |
| --- | --- |
| 333 – 645.9 | 39 |
| 645.9 – 958.7 | 7 |
| 958.7 – 1272 | 3 |
| 1272 – 1584 | 0 |
| 1584 – 1897 | 0 |
| 1897 – 2210 | 0 |
| 2210 – 2523 | 1 |

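The bin edges listed for scans_n look like seven equal-width intervals between the minimum (333) and the maximum (2523). That reading is inferred from the numbers rather than from Saturn's documentation; the helper name `equal_width_edges` is hypothetical.

```python
def equal_width_edges(lo: float, hi: float, bins: int = 7) -> list[float]:
    """Return bins + 1 edges splitting [lo, hi] into equal-width intervals."""
    width = (hi - lo) / bins
    return [lo + i * width for i in range(bins + 1)]


# Minimum and maximum of scans_n as reported in the histogram above.
edges = equal_width_edges(333, 2523)
print([round(e, 1) for e in edges])
```

Rounded to four significant figures, these edges match the bin boundaries shown in the table (645.9, 958.7, 1272, and so on).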
Fig 3.
nova_groups · Notice that NOVA group 4 (ultra-processed) accounts for the majority of products in this sample.
Top values for nova_groups (3 unique shown, of 3 total).

| value | count | share |
| --- | --- | --- |
| 4 | 33 | 66.0% |
| 3 | 14 | 28.0% |
| 1 | 1 | 2.0% |

Fig 4.
ecoscore_grade · Inspect how eco-scores spread across grades a-plus through f: 'e' is the largest single grade (24%), and 12% of products are 'unknown'.
Top values for ecoscore_grade (9 unique shown, of 9 total).

| value | count | share |
| --- | --- | --- |
| e | 12 | 24.0% |
| d | 9 | 18.0% |
| b | 8 | 16.0% |
| c | 8 | 16.0% |
| unknown | 6 | 12.0% |
| a | 3 | 6.0% |
| a-plus | 2 | 4.0% |
| not-applicable | 1 | 2.0% |
| f | 1 | 2.0% |

Fig 5.
pnns_groups_1 · See that 'Sugary snacks' dominates the food group classification, confirming the sample's heavy confectionery bias.
Top values for pnns_groups_1 (7 unique shown, of 7 total).

| value | count | share |
| --- | --- | --- |
| Sugary snacks | 38 | 76.0% |
| Salty snacks | 4 | 8.0% |
| Cereals and potatoes | 3 | 6.0% |
| unknown | 2 | 4.0% |
| Milk and dairy products | 1 | 2.0% |
| Beverages | 1 | 2.0% |
| Fruits and vegetables | 1 | 2.0% |

Fig 6.
Per-column null rate across the corpus. Columns are ordered by input position.
Distribution of ingredients_without_ciqual_codes_n. Vertical dash marks the median.
Histogram bins for ingredients_without_ciqual_codes_n (median: 3.5).

| bin | count |
| --- | --- |
| 0 – 3.143 | 25 |
| 3.143 – 6.286 | 9 |
| 6.286 – 9.429 | 8 |
| 9.429 – 12.57 | 4 |
| 12.57 – 15.71 | 3 |
| 15.71 – 18.86 | 0 |
| 18.86 – 22 | 1 |

origin_sv
categorical
other
This column is most likely the Swedish-language variant of the origin field (the '_sv' suffix matches other language-specific columns such as packaging_text_sv), and it is effectively empty: 92% of its 50 rows are null, and the only non-null value, present in 4 rows, is an empty string. With a cardinality of 1 and an entropy of 0, the column carries no usable signal.
Treatment: Drop. The column contains no information (92% null, remaining values are empty strings).
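A minimal sketch of the check behind this kind of "drop" recommendation: a column is a drop candidate when every value is either null or a blank string. The helper name `is_effectively_empty` is hypothetical, and the example column is rebuilt from the reported counts (46 nulls, 4 empty strings), not from the raw data.

```python
from typing import Any, Iterable


def is_effectively_empty(values: Iterable[Any]) -> bool:
    """True when every value is None or a blank (whitespace-only) string."""
    return all(
        v is None or (isinstance(v, str) and v.strip() == "")
        for v in values
    )


# Column rebuilt from the reported stats: 46 nulls and 4 empty strings.
origin_sv = [None] * 46 + [""] * 4

print(is_effectively_empty(origin_sv))         # candidate for dropping
print(is_effectively_empty([None, "France"]))  # carries at least one value
```

Treating whitespace-only strings as blank is a design choice of this sketch; a stricter check would compare against the empty string only.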
Out[49]:
saturn.columns["origin_sv"].stats
stat
value
n
50
nulls
46 (92.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
alert: null_rate
92.0% null
alert: imbalance
top value is 100.0% of rows
Fig 18.
Top values for origin_sv.
Top values for origin_sv (1 unique shown, of 1 total).

| value | count | share |
| --- | --- | --- |
| (empty string) | 4 | 8.0% |

product_name_ja
categorical
Out[52]:
saturn.columns["product_name_ja"].stats
stat
value
n
50
nulls
49 (98.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
alert: long_tail
1 singleton categories
alert: null_rate
98.0% null
alert: imbalance
top value is 100.0% of rows
Fig 19.
Top values for product_name_ja.
Top values for product_name_ja (1 unique shown, of 1 total).
Top values for product_name_fi (4 unique shown, of 4 total).
value
count
share
2
4.0%
Excellence: 90% cocoa Dark Supreme
1
2.0%
Arriba 85% Cacao Dark Chocolate
1
2.0%
Original
1
2.0%
origin_de
categorical
label
This column appears to be a German-language origin/source label field ('origin_de'), but it contains effectively no usable data: the only observed value across all 50 rows is an empty string, appearing 20 times, with 60% of rows (30) being null. Cardinality is 1, entropy is 0, and top_rate is 1.0 — the column is entirely uninformative in its current state.
Treatment: Drop this column; it carries zero information (all non-null values are empty strings and 60% are null).
Out[68]:
saturn.columns["origin_de"].stats
stat
value
n
50
nulls
30 (60.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
alert: null_rate
60.0% null
alert: imbalance
top value is 100.0% of rows
Fig 21.
Top values for origin_de.
Top values for origin_de (1 unique shown, of 1 total).

| value | count | share |
| --- | --- | --- |
| (empty string) | 20 | 40.0% |

packaging_lc
categorical
Out[71]:
saturn.columns["packaging_lc"].stats
stat
value
n
50
nulls
6 (12.0%)
unique
7
top_value
fr
top_rate
0.3864
cardinality
7
entropy
1.992
entropy_ratio
0.7094
Fig 22.
Top values for packaging_lc.
Top values for packaging_lc (7 unique shown, of 7 total).

| value | count | share |
| --- | --- | --- |
| fr | 17 | 34.0% |
| en | 17 | 34.0% |
| de | 5 | 10.0% |
| pt | 2 | 4.0% |
| it | 1 | 2.0% |
| es | 1 | 2.0% |
| hr | 1 | 2.0% |

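The entropy (1.992) and entropy_ratio (0.7094) reported for packaging_lc are consistent with Shannon entropy in bits over the non-null values, divided by log2 of the cardinality. This is inferred from the figures above, not taken from Saturn's documentation; the function names are hypothetical.

```python
import math


def entropy_bits(counts: list[int]) -> float:
    """Shannon entropy in bits of a categorical distribution given by counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def entropy_ratio(counts: list[int]) -> float:
    """Entropy divided by its maximum, log2(number of observed categories)."""
    k = sum(1 for c in counts if c > 0)
    if k <= 1:
        return 0.0  # a single category has no spread to normalise
    return entropy_bits(counts) / math.log2(k)


# Non-null value counts for packaging_lc, from the table above.
packaging_lc_counts = [17, 17, 5, 2, 1, 1, 1]
h = entropy_bits(packaging_lc_counts)
r = entropy_ratio(packaging_lc_counts)
print(round(h, 3), round(r, 4))
```

Under the same reading, a column with a single observed value gets an entropy and ratio of 0, which matches the all-blank columns elsewhere in this report.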
correctors_tags
unknown
Out[74]:
saturn.columns["correctors_tags"].stats
stat
value
n
50
nulls
0 (0.0%)
unique
—
alert: skipped
no profiler for kind=unknown
categories_hierarchy
unknown
Out[76]:
saturn.columns["categories_hierarchy"].stats
stat
value
n
50
nulls
0 (0.0%)
unique
—
alert: skipped
no profiler for kind=unknown
ingredients_ids_debug
unknown
Out[78]:
saturn.columns["ingredients_ids_debug"].stats
stat
value
n
50
nulls
0 (0.0%)
unique
—
alert: skipped
no profiler for kind=unknown
traces_lc
categorical
Out[80]:
saturn.columns["traces_lc"].stats
stat
value
n
50
nulls
2 (4.0%)
unique
6
top_value
fr
top_rate
0.4792
cardinality
6
entropy
1.575
entropy_ratio
0.6093
Fig 23.
Top values for traces_lc.
Top values for traces_lc (6 unique shown, of 6 total).
Top values for ingredients_text_with_allergens_nb.
Top values for ingredients_text_with_allergens_nb (1 unique shown, of 1 total).
value
count
share
2
4.0%
quantity
categorical
Out[102]:
saturn.columns["quantity"].stats
stat
value
n
50
nulls
1 (2.0%)
unique
36
top_value
100 g
top_rate
0.1224
cardinality
36
entropy
4.956
entropy_ratio
0.9587
alert: long_tail
28 singleton categories
Fig 29.
Top values for quantity.
Top values for quantity (20 unique shown, of 36 total).
value
count
share
100 g
6
12.0%
100g
3
6.0%
125g
2
4.0%
42g
2
4.0%
90g
2
4.0%
2
4.0%
100 gram
2
4.0%
230 g
2
4.0%
300 g
1
2.0%
22 g
1
2.0%
230g
1
2.0%
500 ml
1
2.0%
150 g
1
2.0%
304 g
1
2.0%
275 g
1
2.0%
150g
1
2.0%
225 g
1
2.0%
85 g
1
2.0%
36 g
1
2.0%
52
1
2.0%
countries_hierarchy
unknown
Out[105]:
saturn.columns["countries_hierarchy"].stats
stat
value
n
50
nulls
0 (0.0%)
unique
—
alert: skipped
no profiler for kind=unknown
data_quality_tags
unknown
Out[107]:
saturn.columns["data_quality_tags"].stats
stat
value
n
50
nulls
0 (0.0%)
unique
—
alert: skipped
no profiler for kind=unknown
ingredients_n
numeric
Out[109]:
saturn.columns["ingredients_n"].stats
stat
value
n
50
nulls
0 (0.0%)
unique
22
min
1
max
39
mean
11.7
median
9
std
8.244
q1
5
q3
16
iqr
11
skew
1.237
kurtosis
1.435
n_outliers
2
outlier_rate
0.04
zero_rate
0
Fig 30.
Distribution of ingredients_n. Vertical dash marks the median.
Histogram bins for ingredients_n (median: 9.0).

| bin | count |
| --- | --- |
| 1 – 6.429 | 18 |
| 6.429 – 11.86 | 9 |
| 11.86 – 17.29 | 13 |
| 17.29 – 22.71 | 6 |
| 22.71 – 28.14 | 2 |
| 28.14 – 33.57 | 0 |
| 33.57 – 39 | 2 |

grades
unknown
Out[112]:
saturn.columns["grades"].stats
stat
value
n
50
nulls
0 (0.0%)
unique
—
alert: skipped
no profiler for kind=unknown
additives_original_tags
unknown
Out[114]:
saturn.columns["additives_original_tags"].stats
stat
value
n
50
nulls
0 (0.0%)
unique
—
alert: skipped
no profiler for kind=unknown
nutrition_score_beverage
numeric
Out[116]:
saturn.columns["nutrition_score_beverage"].stats
stat
value
n
50
nulls
0 (0.0%)
unique
2
min
0
max
1
mean
0.02
median
0
std
0.1414
q1
0
q3
0
iqr
0
skew
6.857
kurtosis
45.02
n_outliers
1
outlier_rate
0.02
zero_rate
0.98
alert: high_skew
skew=+6.86
Fig 31.
Distribution of nutrition_score_beverage. Vertical dash marks the median.
Histogram bins for nutrition_score_beverage (median: 0.0).

| bin | count |
| --- | --- |
| 0 – 0.1429 | 49 |
| 0.1429 – 0.2857 | 0 |
| 0.2857 – 0.4286 | 0 |
| 0.4286 – 0.5714 | 0 |
| 0.5714 – 0.7143 | 0 |
| 0.7143 – 0.8571 | 0 |
| 0.8571 – 1 | 1 |

packaging_text_nl
categorical
other
This column appears to hold Dutch-language packaging text for products, but is effectively empty: 76% of the 50 rows are null, and the sole non-null value is an empty string appearing 12 times, giving a cardinality of 1 and zero entropy. Every observed value is either missing or a blank string, meaning this column carries no usable information in this sample.
Treatment: Drop this column; it contains no informative values in the current dataset.
Out[119]:
saturn.columns["packaging_text_nl"].stats
stat
value
n
50
nulls
38 (76.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
alert: null_rate
76.0% null
alert: imbalance
top value is 100.0% of rows
Fig 32.
Top values for packaging_text_nl.
Top values for packaging_text_nl (1 unique shown, of 1 total).

| value | count | share |
| --- | --- | --- |
| (empty string) | 12 | 24.0% |

photographers
unknown
Out[122]:
saturn.columns["photographers"].stats
stat
value
n
50
nulls
0 (0.0%)
unique
—
alert: skipped
no profiler for kind=unknown
pnns_groups_1
categorical
Out[124]:
saturn.columns["pnns_groups_1"].stats
stat
value
n
50
nulls
0 (0.0%)
unique
7
top_value
Sugary snacks
top_rate
0.76
cardinality
7
entropy
1.36
entropy_ratio
0.4846
Fig 33.
Top values for pnns_groups_1.
Top values for pnns_groups_1 (7 unique shown, of 7 total).

| value | count | share |
| --- | --- | --- |
| Sugary snacks | 38 | 76.0% |
| Salty snacks | 4 | 8.0% |
| Cereals and potatoes | 3 | 6.0% |
| unknown | 2 | 4.0% |
| Milk and dairy products | 1 | 2.0% |
| Beverages | 1 | 2.0% |
| Fruits and vegetables | 1 | 2.0% |

product_name_en
categorical
Out[127]:
saturn.columns["product_name_en"].stats
stat
value
n
50
nulls
7 (14.0%)
unique
34
top_value
top_rate
0.2326
cardinality
34
entropy
4.654
entropy_ratio
0.9147
alert: long_tail
33 singleton categories
Fig 34.
Top values for product_name_en.
Top values for product_name_en (20 unique shown, of 34 total).
(en) en:gluten,en:Amande,en:Arachides,en:Avoine,en:Blé,en:Lait,en:Noisettes,en:Noix,en:Noix de cajou,en:Noix de macadamia,en:Noix de pécan,en:Noix du brésil,en:Orge,en:Pistaches,en:Seigle
1
2.0%
(fr) en:lupin,en:milk,en:mustard,en:soybeans
1
2.0%
generic_name_nl
categorical
Out[133]:
saturn.columns["generic_name_nl"].stats
stat
value
n
50
nulls
38 (76.0%)
unique
4
top_value
top_rate
0.75
cardinality
4
entropy
1.208
entropy_ratio
0.6038
alert: long_tail
3 singleton categories
alert: null_rate
76.0% null
Fig 36.
Top values for generic_name_nl.
Top values for generic_name_nl (4 unique shown, of 4 total).
value
count
share
9
18.0%
Extra fijne pure chocolade
1
2.0%
Biscuits bedekt met melkchocolade
1
2.0%
Krokante volkorentoasts
1
2.0%
nutrition_grade_fr
categorical
Out[136]:
saturn.columns["nutrition_grade_fr"].stats
stat
value
n
50
nulls
0 (0.0%)
unique
6
top_value
e
top_rate
0.54
cardinality
6
entropy
1.913
entropy_ratio
0.7399
Fig 37.
Top values for nutrition_grade_fr.
Top values for nutrition_grade_fr (6 unique shown, of 6 total).
Top values for last_editor (20 unique shown, of 24 total).
value
count
share
foodless
21
42.0%
municorn-calorie-counter-app
3
6.0%
charlesnepote
2
4.0%
macrofactor
2
4.0%
bodysupport
2
4.0%
moon-rabbit
1
2.0%
gmlaa
1
2.0%
prepperapp
1
2.0%
marmotte73
1
2.0%
laura-chaud
1
2.0%
org-barilla-france-sa
1
2.0%
tom1707
1
2.0%
bubu63
1
2.0%
moncoachigbas
1
2.0%
natrius
1
2.0%
clxtng
1
2.0%
roboto-app
1
2.0%
fgouget
1
2.0%
ludolm
1
2.0%
foodiq
1
2.0%
nutrient_levels_tags
unknown
Out[145]:
saturn.columns["nutrient_levels_tags"].stats
stat
value
n
50
nulls
0 (0.0%)
unique
—
alert: skipped
no profiler for kind=unknown
product_name_nb
categorical
Out[147]:
saturn.columns["product_name_nb"].stats
stat
value
n
50
nulls
48 (96.0%)
unique
2
top_value
top_rate
0.5
cardinality
2
entropy
1
entropy_ratio
1
alert: long_tail
2 singleton categories
alert: null_rate
96.0% null
Fig 40.
Top values for product_name_nb.
Top values for product_name_nb (2 unique shown, of 2 total).
value
count
share
1
2.0%
99% mørk sjokolade
1
2.0%
packaging_shapes_tags
unknown
Out[150]:
saturn.columns["packaging_shapes_tags"].stats
stat
value
n
50
nulls
0 (0.0%)
unique
—
alert: skipped
no profiler for kind=unknown
_keywords
unknown
Out[152]:
saturn.columns["_keywords"].stats
stat
value
n
50
nulls
0 (0.0%)
unique
—
alert: skipped
no profiler for kind=unknown
emb_codes_tags
unknown
Out[154]:
saturn.columns["emb_codes_tags"].stats
stat
value
n
50
nulls
0 (0.0%)
unique
—
alert: skipped
no profiler for kind=unknown
images
unknown
Out[156]:
saturn.columns["images"].stats
stat
value
n
50
nulls
0 (0.0%)
unique
—
alert: skipped
no profiler for kind=unknown
states_tags
unknown
Out[158]:
saturn.columns["states_tags"].stats
stat
value
n
50
nulls
0 (0.0%)
unique
—
alert: skipped
no profiler for kind=unknown
packaging_text_sv
categorical
other
This column appears to be a Swedish-language packaging text field ('sv' suffix indicating Swedish locale), but it is effectively empty in this dataset. A 92% null rate leaves only 4 non-null rows, and all 4 of those contain an empty string — meaning there is zero usable content across all 50 rows. The cardinality of 1 and entropy of 0.0 confirm complete absence of informational signal.
Treatment: Drop — 100% of present values are empty strings and 92% are null, yielding no usable signal.
Out[160]:
saturn.columns["packaging_text_sv"].stats
stat
value
n
50
nulls
46 (92.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
alert: null_rate
92.0% null
alert: imbalance
top value is 100.0% of rows
Fig 41.
Top values for packaging_text_sv.
Top values for packaging_text_sv (1 unique shown, of 1 total).

| value | count | share |
| --- | --- | --- |
| (empty string) | 4 | 8.0% |

informers_tags
unknown
Out[163]:
saturn.columns["informers_tags"].stats
stat
value
n
50
nulls
0 (0.0%)
unique
—
alert: skipped
no profiler for kind=unknown
ingredients_text_pl
categorical
Out[165]:
saturn.columns["ingredients_text_pl"].stats
stat
value
n
50
nulls
45 (90.0%)
unique
3
top_value
top_rate
0.6
cardinality
3
entropy
1.371
entropy_ratio
0.865
alert: long_tail
2 singleton categories
alert: null_rate
90.0% null
Fig 42.
Top values for ingredients_text_pl.
Top values for ingredients_text_pl (3 unique shown, of 3 total).
value
count
share
3
6.0%
Miazga kakaowa, cukier, tłuszcz kakaowy, kakao w proszku o obniżonej zawartości tłuszczu, emulgator: lecytyny (soja); naturalny aromat waniliowy. Czekolada gorzka: masa kakaowa minimum 74 %. Może zawierać orzeszki ziemne, orzechy, mleko i gluten (pszenica, żyt jęczmień, owies, pszenica orkisz i pszenica khorosan).
1
2.0%
Miazga kakaowa, cukier, tłuszcz kakaowy, wanilia.
1
2.0%
labels
categorical
Out[168]:
saturn.columns["labels"].stats
stat
value
n
50
nulls
1 (2.0%)
unique
42
top_value
top_rate
0.1633
cardinality
42
entropy
5.125
entropy_ratio
0.9504
alert: long_tail
41 singleton categories
Fig 43.
Top values for labels.
Top values for labels (20 unique shown, of 42 total).
value
count
share
8
16.0%
Distributor labels,Charte LU Harmony,Triman
1
2.0%
Point Vert,Triman
1
2.0%
No preservatives, Made in France, Natural flavors, No colorings, No palm oil, Nutriscore, Nutriscore Grade B, Triman, en:green-dot
Commerce équitable,Bio,Végétarien,Bio européen,Fairtrade International,Agriculture non UE,Végétalien,FR-BIO-01,en:FSC,FSC Mix,Point Vert,Max Havelaar,PL-EKO-07,en:Soil Association Organic,The Vegan Society
1
2.0%
Agriculture non UE,Fabriqué en Belgique,Fabriqué en France,Sans huile de palme,Triman
1
2.0%
Organic,EU Organic,Non-EU Agriculture,Certified B Corporation,EU Agriculture,EU/non-EU Agriculture,FR-BIO-01,No palm oil,Pure cocoa butter,AB Agriculture Biologique,fr:Farine de blé français
Top values for nutrition_data_prepared.
Top values for nutrition_data_prepared (1 unique shown, of 1 total).
value
count
share
48
96.0%
packaging_text_fi
categorical
metadata
This column appears to be Finnish-language packaging text for a product dataset, but it is almost entirely empty: 90% of the 50 rows are null, and the sole non-null value across all 5 populated rows is an empty string. With cardinality of 1 and entropy of 0, the column carries zero information — it is effectively unpopulated.
Treatment: Drop this column; it is 90% null with only empty strings in the remaining rows and provides no signal for modelling or analysis.
Out[187]:
saturn.columns["packaging_text_fi"].stats
stat
value
n
50
nulls
45 (90.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
alert: null_rate
90.0% null
alert: imbalance
top value is 100.0% of rows
Fig 48.
Top values for packaging_text_fi.
Top values for packaging_text_fi (1 unique shown, of 1 total).

| value | count | share |
| --- | --- | --- |
| (empty string) | 5 | 10.0% |

interface_version_created
categorical
Out[190]:
saturn.columns["interface_version_created"].stats
stat
value
n
50
nulls
1 (2.0%)
unique
3
top_value
20120622
top_rate
0.5918
cardinality
3
entropy
1.167
entropy_ratio
0.7363
Fig 49.
Top values for interface_version_created.
Top values for interface_version_created (3 unique shown, of 3 total).
Top values for ingredients_text_with_allergens_es.
Top values for ingredients_text_with_allergens_es (13 unique shown, of 13 total).
value
count
share
7
14.0%
Pasta de cacao, manteca de cacao, cacao magro en polvo, azúcar, vainilla.
1
2.0%
Azúcar, Grasa vegetal de palmiste parcialmente hidrogenada, Leche en polvo, Almendras, Cacao desgrasado en polvo, suero lácteo en polvo, Emulgente (lecitina de soja), aroma (vainilla).
1
2.0%
Crema de avellanas y cacao 40% (azúcar, manteca de palma, avellanas 13%, leche desnatada en polvo 8,7%, cacao desgrasado 7.4%, emulgentes (lecitinas (soja), vainillina), harina de trigo 32,5%, grasas vegetales (palma, palmiste), azúcar de caña 8,5% (trigo), lactosa, salvado de trigo, leche entera en polvo, extracto en polvo de malta de cebada y maíz, miel, gasificantes (difosfato disódico, carbonato ácido de sodio, carbonato ácido de amonio), cacao desgrasado, sal, almidón de trigo, harina de cebada, malteada, emulsionantes (lecitinas (soja), vainillina.
1
2.0%
70% pasta de cacao*, azúcar, rnanteca de cacao, cacao desgrasado en polvo, emulgente: lecitlna de girasol (E-322), aroma natural de vainilla. *Pasta de cacao Ralnforest Alliance Certified cocoa. Cacao: 74% mínimo.
1
2.0%
Harina de TRIGO, grasa de palma, extracto de malta de CEBADA, gasificantes (carbonatos de amonio, carbonatos de sodio), sal, HUEVO, aroma, agente de tratamiento de la harina (METABISULFITO sódico).
1
2.0%
Pasta de cacao, azúcar, manteca de cacao, vainilla.
1
2.0%
Pasta de cacao, azúcar, manteca de cacao, emulgente: lecitina de girasol (E-322), extracto de vainilla. Cacao: 70% mínimo.
1
2.0%
Copos de avena integral (60%),azúcar, aceite refinado de girasol, miel (3%), sal, melaza de caña, emulgente (lecitina de girasol), gasificante (carbonato ácido de sodio),
1
2.0%
Pasta de cacao, cacao magro, manteca de cacao, azúcar moreno de caña
pasta de cacao, azúcar, manteca de cacao, emulgente (lecitina de soja), vainilla. Cacao: 70% mínimo.
1
2.0%
Pasta de cacao, cacao desgrasado en polvo, manteca de cacao, azúcar, leche en polvo, pasta de almendras y avellanas, emulgentes (lecitinas de soja, girasol), aroma
1
2.0%
labels_lc
categorical
Out[213]:
saturn.columns["labels_lc"].stats
stat
value
n
50
nulls
1 (2.0%)
unique
6
top_value
en
top_rate
0.449
cardinality
6
entropy
1.57
entropy_ratio
0.6072
Fig 54.
Top values for labels_lc.
Top values for labels_lc (6 unique shown, of 6 total).

| value | count | share |
| --- | --- | --- |
| en | 22 | 44.0% |
| fr | 22 | 44.0% |
| es | 2 | 4.0% |
| de | 1 | 2.0% |
| it | 1 | 2.0% |
| pl | 1 | 2.0% |

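In the labels_lc figures above, top_rate (0.449) and the 44.0% share of the top value differ slightly. The numbers appear consistent with top_rate being computed over non-null rows (22 of 49) while share is computed over all rows (22 of 50). This is inferred from the reported values, not from Saturn's documentation, and the function names below are hypothetical.

```python
def share_of_all_rows(top_count: int, n_rows: int) -> float:
    """Top-value count as a fraction of every row, nulls included."""
    return top_count / n_rows


def rate_of_non_null(top_count: int, n_rows: int, nulls: int) -> float:
    """Top-value count as a fraction of non-null rows only."""
    return top_count / (n_rows - nulls)


# labels_lc as reported: 50 rows, 1 null, top value 'en' with 22 rows.
share = share_of_all_rows(22, 50)
top_rate = rate_of_non_null(22, 50, 1)
print(round(share, 2), round(top_rate, 3))
```

The same reading would explain why a mostly-null column such as origin_sv reports a top_rate of 1 while its top value covers only 8.0% of rows.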
nova_group_debug
categorical
Out[216]:
saturn.columns["nova_group_debug"].stats
stat
value
n
50
nulls
0 (0.0%)
unique
3
top_value
top_rate
0.96
cardinality
3
entropy
0.2823
entropy_ratio
0.1781
alert: long_tail
2 singleton categories
alert: imbalance
top value is 96.0% of rows
Fig 55.
Top values for nova_group_debug.
Top values for nova_group_debug (3 unique shown, of 3 total).
value
count
share
48
96.0%
no nova group if too many ingredients are unknown: 5 out of 5
1
2.0%
no nova group if too many ingredients are unknown: 13 out of 13
unique_scans_n
numeric
This column counts unique scans per product record (likely barcode scan events), with 50 observations and no nulls. The middle half of the values lies between 362.75 (Q1) and 560.75 (Q3), but a right-skewed tail (skew=3.91, kurtosis=18.71) driven by 4 outliers pulls the mean (525.38) well above the median (432.0), and the maximum of 2257.0 is about 4× the Q3 value. An outlier rate of 8% in just 50 rows indicates that a small number of records see far higher scan volumes than the rest.
Treatment: Log-transform or apply robust scaling before modelling to reduce the influence of the 4 extreme outliers; investigate those records for data-quality issues.
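To make the outlier rule and the suggested transform concrete, the sketch below derives the 1.5 × IQR fences from the reported quartiles and applies a log transform to the reported median and maximum. It works from the summary statistics only, since the raw rows are not part of this report; `iqr_fences` is a hypothetical helper, and `log1p` is one possible choice of log transform.

```python
import math


def iqr_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Return (lower, upper) fences at k * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles of unique_scans_n as reported above.
lower, upper = iqr_fences(362.75, 560.75)
print(lower, upper)  # values above the upper fence count as outliers

# log1p compresses the right tail: compare the reported median and maximum.
log_median = math.log1p(432)
log_max = math.log1p(2257)
print(round(log_max - log_median, 2))
```

With an upper fence of 857.75, the four rows in the histogram bins from 872.7 upward are consistent with the reported n_outliers of 4.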
Out[246]:
saturn.columns["unique_scans_n"].stats
stat
value
n
50
nulls
0 (0.0%)
unique
48
min
319
max
2,257
mean
525.4
median
432
std
306.4
q1
362.8
q3
560.8
iqr
198
skew
3.911
kurtosis
18.71
n_outliers
4
outlier_rate
0.08
zero_rate
0
alert: high_skew
skew=+3.91
alert: outliers
8.0% rows beyond 1.5 IQR
Fig 61.
Distribution of unique_scans_n. Vertical dash marks the median.
Histogram bins for unique_scans_n (median: 432.0).

| bin | count |
| --- | --- |
| 319 – 595.9 | 39 |
| 595.9 – 872.7 | 7 |
| 872.7 – 1150 | 3 |
| 1150 – 1426 | 0 |
| 1426 – 1703 | 0 |
| 1703 – 1980 | 0 |
| 1980 – 2257 | 1 |

update_key
categorical
Out[249]:
saturn.columns["update_key"].stats
stat
value
n
50
nulls
0 (0.0%)
unique
9
top_value
brands
top_rate
0.56
cardinality
9
entropy
2.015
entropy_ratio
0.6357
alert: long_tail
5 singleton categories
Fig 62.
Top values for update_key.
Top values for update_key (9 unique shown, of 9 total).

| value | count | share |
| --- | --- | --- |
| brands | 28 | 56.0% |
| sort | 10 | 20.0% |
| divinfood | 5 | 10.0% |
| key_1748337248 | 2 | 4.0% |
| nova-yogurts | 1 | 2.0% |
| key_1744830970 | 1 | 2.0% |
| ingredients20240805 | 1 | 2.0% |
| germany2 | 1 | 2.0% |
| france | 1 | 2.0% |

emb_codes_orig
categorical
Out[252]:
saturn.columns["emb_codes_orig"].stats
stat
value
n
50
nulls
17 (34.0%)
unique
5
top_value
top_rate
0.8485
cardinality
5
entropy
0.9048
entropy_ratio
0.3897
alert: long_tail
3 singleton categories
alert: null_rate
34.0% null
Fig 63.
Top values for emb_codes_orig.
Top values for emb_codes_orig (5 unique shown, of 5 total).
Farine de maïs* (70%), farine de riz*, sel marin. * K issus de l'agriculture biologique. • sans sucres ajoutés(¹) (contient des sucres naturellement présents.
kakaomassa, fettreducerat kakaopulver, kakaosmör, socker, emulgeringsmedel (sojalecitin), vaniljextrakt. Minst 85 % kakao i chokladen. Kan innehålla spår av nötter och mjölk.
Distribution of ingredients_sweeteners_n. Vertical dash marks the median.
Histogram bins for ingredients_sweeteners_n (median: 0.0).
bin
count
-0.5 – -0.3571
0
-0.3571 – -0.2143
0
-0.2143 – -0.07143
0
-0.07143 – 0.07143
50
0.07143 – 0.2143
0
0.2143 – 0.3571
0
0.3571 – 0.5
0
ingredients_text_ja
categorical
Out[300]:
saturn.columns["ingredients_text_ja"].stats
stat
value
n
50
nulls
49 (98.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
alert: long_tail
1 singleton categories
alert: null_rate
98.0% null
alert: imbalance
top value is 100.0% of rows
Fig 75.
Top values for ingredients_text_ja.
Top values for ingredients_text_ja (1 unique shown, of 1 total).
value
count
share
1
2.0%
allergens_tags
unknown
Out[303]:
saturn.columns["allergens_tags"].stats
stat
value
n
50
nulls
0 (0.0%)
unique
—
alert: skipped
no profiler for kind=unknown
origin_es
categorical
other
This column appears to be a Spanish-language origin/source label field ('origin_es'), but it is entirely devoid of meaningful content: the sole observed value is an empty string, appearing 20 times across 50 rows. With a 60% null rate and the remaining 40% being empty strings, the column carries zero informational entropy and is effectively blank across the entire dataset. This is a strong signal that the field was never populated.
Treatment: Drop this column; it contains no usable signal (cardinality 1, top value is empty string, 60% nulls).
Out[305]:
saturn.columns["origin_es"].stats
stat
value
n
50
nulls
30 (60.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
alert: null_rate
60.0% null
alert: imbalance
top value is 100.0% of rows
Fig 76.
Top values for origin_es.
Top values for origin_es (1 unique shown, of 1 total).

| value | count | share |
| --- | --- | --- |
| (empty string) | 20 | 40.0% |

last_updated_t
numeric
Out[308]:
saturn.columns["last_updated_t"].stats
stat
value
n
50
nulls
0 (0.0%)
unique
50
min
1.739e+09
max
1.769e+09
mean
1.763e+09
median
1.767e+09
std
8.037e+06
q1
1.762e+09
q3
1.768e+09
iqr
6.138e+06
skew
-1.945
kurtosis
2.892
n_outliers
6
outlier_rate
0.12
zero_rate
0
alert: outliers
12.0% rows beyond 1.5 IQR
Fig 77.
Distribution of last_updated_t. Vertical dash marks the median.
Histogram bins for last_updated_t (median: 1766580948.5).

| bin | count |
| --- | --- |
| 1.739e+09 – 1.743e+09 | 3 |
| 1.743e+09 – 1.747e+09 | 1 |
| 1.747e+09 – 1.752e+09 | 1 |
| 1.752e+09 – 1.756e+09 | 2 |
| 1.756e+09 – 1.76e+09 | 3 |
| 1.76e+09 – 1.764e+09 | 8 |
| 1.764e+09 – 1.769e+09 | 32 |

origin_fr
categorical
Out[311]:
saturn.columns["origin_fr"].stats
stat
value
n
50
nulls
4 (8.0%)
unique
7
top_value
top_rate
0.8696
cardinality
7
entropy
0.8958
entropy_ratio
0.3191
alert: long_tail
6 singleton categories
Fig 78.
Top values for origin_fr.
Top values for origin_fr (7 unique shown, of 7 total).
value
count
share
40
80.0%
Fabriqué par: Aachen Allemagne
1
2.0%
Germe de blé origine ue. Sésame origine non-ue.
1
2.0%
France
1
2.0%
fabriqué en France.pommes origine UE. noisettes origine UE et non UE
1
2.0%
Fabriqué en France par Nutrition et Santé. Farine de blé: France. Figues : non UE
1
2.0%
Pâte de cacao (Afrique de l'Ouest, Amérique du Sud)Afrique, Europe, Madagascar, Amérique du Sud, Afrique de l'Ouest
Top values for ingredients_text_with_allergens_it.
Top values for ingredients_text_with_allergens_it (12 unique shown, of 12 total).
value
count
share
5
10.0%
Pasta di cacao, burro di cacao, cacao magro in polvere, zucchero. Può contenere nocciole, mandorle, altra frutta a guscio, latte, soia.
1
2.0%
crema alle NOCCIOLE e al cacao 40% (zucchero, olio di palma, NOCCIOLE 13%, LATTE Scremato in polvere 8.7%, cacao magro 7,4%, emulsionanti: lecitine (SOIA): vanillina), farina di FRUMENTO (32%), grassi vegetali (palma, palmisto), zucchero di canna (9%), LATTOSIO, crusca di FRUMENTO, LATTE intero in polvere, estratto in polvere di malto d'ORZO e mais, miele, agenti lievitanti (difosfato disodico. carbonato acido di ammonio, carbonato acido di sodio), cacao magro, sale, amido di FRUMENTO, farina di ORZO maltato, emulsionanti: lecitine (SOIA), vanillina.
1
2.0%
pasta di cacao, zucchero, burro di cacao, vaniglia
1
2.0%
patate, olio di girasole, sale marino.
1
2.0%
Pasta di cacao, cacao magro, burro di cacao, zucchero grezzo di canna, vaniglia.
1
2.0%
Farina integrale di segale (59 g), crusca di grano (27 g), fiocchi d'avena (12 g), semi di sesamo (7,0 g), germe di grano, sale. Può contenere tracce di latte.
1
2.0%
Farina di FRUMENTO, olio di palma, sciroppo di glucosio, estratto di malto d'ORZO, agenti lievitanti (carbonati di ammonio, carbonati di sodio), sale, UOVA, aroma, agente di trattamento della farina (METABISOLFITO di sodio).
1
2.0%
Pasta di cacao, zucchero, burro di cacao, vaniglia.
1
2.0%
Massa di cacao, zucchero, burro di cacao, emulsionante: lecitine (soia); estratto di vaniglia. Può contenere tracce di frutta a guscio e latte. Il 40% della massa di cacao proviene da piantagioni selezionate dell'Ecuador.
1
2.0%
wdrated potatoes, sunflower oll, wheat flour, corn lour.test NRC b ber otin. Emulgator (E471), Salz, Farbstoff (Annatto Norbirin, k hottom (BB). Packaged in a protective atmosphere, (DE) KNAEF Kam ef s1sel colorant (n0rbixine de rocou). Peut contenir lait, soja. À conse gie vepackt. (FR) SNACK SALE. INGREDIENTS: Pommes de terre disht SNCK SALATO. : Patate disidratate, olio di girasole, (arina d frmu botisiha d annatto). Puo contenere latte, sola. Da consumarsi prelerbilmetp SEL NGREDIENTES: Batatas desidratadas, óleo de girasol, farinha de trigo.(aimha d mh e o, Pode conter leite, soja. Consumir de preferëncia antes de: ver fundo (BB), Enbazhyer OHTS Pttas deshidratadas, aceite de qirasol, harina de trigo, harina de maiz, haia ca rm e eche, soja. Consumir preferentemente antes del: ver parte interior (8B), Enast et 'Releenc itle dn 100 g | RI" /30g| Eectsge/Ayt acuilo medo 84U bole / Prodoth te /30g ji begja /Valor energetico Tpas (Grassi/ Unjdos / Grasas tan eậticte Fetsäuren / dont 2214 kJ 664 kJ 530 kcal 159 kcal adulo medio / 8% 31g 3.0 9 9.3 0.9g 17g 13% Produoad by: see yd Aii dd cassi satui / dos quais Producido por urdes thtrde | Glucites | 5% oidrati / MedaCoyK Sabd 55g 7% Uont sucres /di eui *FRSCAME QNg
1
2.0%
25% noci, 25% mandorle, 25% uva sultanina (99,5% uva sultanina, olio di semi di girasole), 25% mirtilli rossi americani, essiccati e zuccherati (60% mirtilli rossi americani, 39% zucchero, olio di semi di girasole). Può contenere tracce di altra frutta a guscio e arachidi. Confezionato in atmosfera protettiva.
1
2.0%
data_quality_errors_tags
unknown
Out[322]:
saturn.columns["data_quality_errors_tags"].stats
stat
value
n
50
nulls
0 (0.0%)
unique
—
alert: skipped
no profiler for kind=unknown
origin_pl
categorical
metadata
This column is most likely the Polish-language variant of the origin field (the '_pl' suffix matches other language-specific columns such as packaging_text_pl), and it is essentially empty: 90% of its 50 rows are null, and the only non-null value is an empty string appearing 5 times. With a cardinality of 1 and an entropy of 0.0, it carries zero information. The combination of a high null rate and a single blank value strongly suggests this field was never populated in this dataset slice.
Treatment: Drop. Zero variance and 90% nulls make this column useless for modelling or analysis.
Out[324]:
saturn.columns["origin_pl"].stats
stat
value
n
50
nulls
45 (90.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
alert: null_rate
90.0% null
alert: imbalance
top value is 100.0% of rows
Fig 81.
Top values for origin_pl.
Top values for origin_pl (1 unique shown, of 1 total).

| value | count | share |
| --- | --- | --- |
| (empty string) | 5 | 10.0% |

packaging_text_fr
categorical
Out[327]:
saturn.columns["packaging_text_fr"].stats
stat
value
n
50
nulls
3 (6.0%)
unique
14
top_value
top_rate
0.7234
cardinality
14
entropy
1.874
entropy_ratio
0.4923
alert: long_tail
13 singleton categories
Fig 82.
Top values for packaging_text_fr.
Top values for packaging_text_fr (14 unique shown, of 14 total).
value
count
share
34
68.0%
1 film en plastique à recycler
1 étui en papier ondulé à recycler
1
2.0%
carton, plastique
1
2.0%
1 bouchon en plastique à trier
1 bouteille en plastique à trier
1
2.0%
1 étui en carton à recycler
1 feuille en aluminium à recycler
1
2.0%
1 sachet plastique à jeter
1
2.0%
1 étui en carton à recycler
1 feuille en aluminium à recycler
1
2.0%
LE TRI +FACILE + BAC DE TRI
1
2.0%
4 FILMS PLASTIQUE A JETER
1 ÉTUI CARTON À RECYCLER
1
2.0%
FR LE TRI + FACILE ÉTUI 8+ SACHETS BAC DE TRI A consommer de préférence avant le : en France par et Santé S.A.S. 10:02 11914538 112 eCastelnaudary REVEL 30 04 2024
1
2.0%
1 étui carton à recycler, 1 film plastique à jeter, 1 barquette plastique à jeter.
1
2.0%
1 FEUILLE PAPIER À RECYCLER, 1 FEUILLE METAL À RECYCLER, 1 FILM PLASTIQUE À JETER
1
2.0%
Sachet, clip à recycler
1
2.0%
2 sachets en plastique à recycler
1 boîte en carton à recycler
kakaomassa, fettreducerat kakaopulver, kakaosmör, socker, emulgeringsmedel (_sojalecitin_), vaniljextrakt. Minst 85 % kakao i chokladen. Kan innehålla spår av nötter och mjölk.
en:gluten,en:Amande,en:Arachides,en:Avoine,en:Blé,en:Lait,en:Noisettes,en:Noix,en:Noix de cajou,en:Noix de macadamia,en:Noix de pécan,en:Noix du brésil,en:Orge,en:Pistaches,en:Seigle
1
2.0%
en:lupin,en:milk,en:mustard,en:soybeans
1
2.0%
en:gluten,en:milk
1
2.0%
en:gluten,en:nuts,en:peanuts,en:soybeans
1
2.0%
en:nuts,en:peanuts,en:soybeans
1
2.0%
en:gluten,en:nuts
1
2.0%
known_ingredients_n
numeric
Out[346]:
saturn.columns["known_ingredients_n"].stats
stat
value
n
50
nulls
0 (0.0%)
unique
22
min
0
max
36
mean
11.76
median
9
std
8.721
q1
5
q3
18.5
iqr
13.5
skew
0.8598
kurtosis
0.07411
n_outliers
0
outlier_rate
0
zero_rate
0.04
Fig 87.
Distribution of known_ingredients_n. Vertical dash marks the median.
Histogram bins for known_ingredients_n (median: 9.0).
bin
count
0 – 5.143
16
5.143 – 10.29
12
10.29 – 15.43
6
15.43 – 20.57
7
20.57 – 25.71
5
25.71 – 30.86
2
30.86 – 36
2
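The n_outliers value of 0 for known_ingredients_n can be reproduced from the quartiles alone. The sketch below assumes the profiler uses the common 1.5 × IQR (Tukey) fence rule, which matches the wording of the outlier alerts elsewhere in this report but is not confirmed by Saturn's documentation; the function name tukey_fences is hypothetical.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) fences at k * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles, min and max as reported for known_ingredients_n.
q1, q3 = 5.0, 18.5
col_min, col_max = 0, 36

lower, upper = tukey_fences(q1, q3)
print(lower, upper)  # -15.25 38.75

# A column has outliers only if some value falls outside the fences.
has_outliers = col_min < lower or col_max > upper
print(has_outliers)  # False
```

Because the maximum (36) sits below the upper fence (38.75), no row is flagged even though the distribution is right-skewed.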
packaging_text_pl
categorical
other
This column appears to be a Polish-language packaging text field that is almost entirely empty: 90% of its 50 rows are null, and the sole non-null value present in 5 rows is an empty string. With cardinality of 1 and entropy of 0, the column carries zero information. The combination of a 90% null rate and a top_value of '' means not a single meaningful entry exists in this sample.
Treatment: Drop this column; it contains no usable information in the current sample.
Out[349]:
saturn.columns["packaging_text_pl"].stats
stat
value
n
50
nulls
45 (90.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
alert: null_rate
90.0% null
alert: imbalance
top value is 100.0% of rows
Fig 88.
Top values for packaging_text_pl.
Top values for packaging_text_pl (1 unique shown, of 1 total).
Top values for ingredients_text_with_allergens_fi.
Top values for ingredients_text_with_allergens_fi (4 unique shown, of 4 total).
value
count
share
(blank)
2
4.0%
kaakaomassa, kaakaovoi, vähärasvainen kaakaojauhe, sokeri, vanilja. Saattaa sisältää hasselpähkinää, muita pähkinöitä, maitoa, soijaa. Tummassa suklaassa kaakaota vähintään 90%.
1
2.0%
kaakaomassa, vähärasvainen kaakaojauhe, kaakaovoi, sokeri, emulgointiaine (soijalesitiini), vaniljauute. Suklaassa kaakaota vähintään 85 %. Saattaa sisältää pieniä määriä pähkinää ja maitoa.
Top values for ingredients_text_with_allergens_pl.
Top values for ingredients_text_with_allergens_pl (3 unique shown, of 3 total).
value
count
share
(blank)
2
4.0%
Miazga kakaowa, cukier, tłuszcz kakaowy, kakao w proszku o obniżonej zawartości tłuszczu, emulgator: lecytyny (soja); naturalny aromat waniliowy. Czekolada gorzka: masa kakaowa minimum 74 %. Może zawierać orzeszki ziemne, orzechy, mleko i gluten (pszenica, żyt jęczmień, owies, pszenica orkisz i pszenica khorosan).
1
2.0%
Miazga kakaowa, cukier, tłuszcz kakaowy, wanilia.
1
2.0%
allergens_hierarchy
unknown
Out[433]:
saturn.columns["allergens_hierarchy"].stats
stat
value
n
50
nulls
0 (0.0%)
unique
—
alert: skipped
no profiler for kind=unknown
languages_hierarchy
unknown
Out[435]:
saturn.columns["languages_hierarchy"].stats
stat
value
n
50
nulls
0 (0.0%)
unique
—
alert: skipped
no profiler for kind=unknown
nova_groups_tags
unknown
Out[437]:
saturn.columns["nova_groups_tags"].stats
stat
value
n
50
nulls
0 (0.0%)
unique
—
alert: skipped
no profiler for kind=unknown
ingredients_tags
unknown
Out[439]:
saturn.columns["ingredients_tags"].stats
stat
value
n
50
nulls
0 (0.0%)
unique
—
alert: skipped
no profiler for kind=unknown
ingredients_text_it
categorical
Out[441]:
saturn.columns["ingredients_text_it"].stats
stat
value
n
50
nulls
34 (68.0%)
unique
12
top_value
top_rate
0.3125
cardinality
12
entropy
3.274
entropy_ratio
0.9134
alert: long_tail
11 singleton categories
alert: null_rate
68.0% null
Fig 106.
Top values for ingredients_text_it.
Top values for ingredients_text_it (12 unique shown, of 12 total).
value
count
share
(blank)
5
10.0%
Pasta di cacao, burro di cacao, cacao magro in polvere, zucchero. Può contenere nocciole, mandorle, altra frutta a guscio, latte, soia.
1
2.0%
crema alle NOCCIOLE e al cacao 40% (zucchero, olio di palma, NOCCIOLE 13%, LATTE Scremato in polvere 8.7%, cacao magro 7,4%, emulsionanti: lecitine (SOIA): vanillina), farina di FRUMENTO (32%), grassi vegetali (palma, palmisto), zucchero di canna (9%), LATTOSIO, crusca di FRUMENTO, LATTE intero in polvere, estratto in polvere di malto d'ORZO e mais, miele, agenti lievitanti (difosfato disodico. carbonato acido di ammonio, carbonato acido di sodio), cacao magro, sale, amido di FRUMENTO, farina di ORZO maltato, emulsionanti: lecitine (SOIA), vanillina.
1
2.0%
pasta di cacao, zucchero, burro di cacao, vaniglia
1
2.0%
patate, olio di girasole, sale marino.
1
2.0%
Pasta di cacao, cacao magro, burro di cacao, zucchero grezzo di canna, vaniglia.
1
2.0%
Farina integrale di _segale_ (59 g), crusca di _grano_ (27 g), fiocchi d'_avena_ (12 g), semi di _sesamo_ (7,0 g), germe di _grano_, sale. Può contenere tracce di _latte_.
1
2.0%
Farina di _FRUMENTO_, olio di palma, sciroppo di glucosio, estratto di malto d'_ORZO_, agenti lievitanti (carbonati di ammonio, carbonati di sodio), sale, _UOVA_, aroma, agente di trattamento della farina (_METABISOLFITO_ di sodio).
1
2.0%
Pasta di cacao, zucchero, burro di cacao, vaniglia.
1
2.0%
Massa di cacao, zucchero, burro di cacao, emulsionante: lecitine (soia); estratto di vaniglia. Può contenere tracce di frutta a guscio e latte. Il 40% della massa di cacao proviene da piantagioni selezionate dell'Ecuador.
1
2.0%
wdrated potatoes, sunflower oll, wheat flour, corn lour.test NRC b ber otin. Emulgator (E471), Salz, Farbstoff (Annatto Norbirin, k hottom (BB). Packaged in a protective atmosphere, (DE) KNAEF Kam ef s1sel colorant (n0rbixine de rocou). Peut contenir lait, soja. À conse gie vepackt. (FR) SNACK SALE. INGREDIENTS: Pommes de terre disht SNCK SALATO. : Patate disidratate, olio di girasole, (arina d frmu botisiha d annatto). Puo contenere latte, sola. Da consumarsi prelerbilmetp SEL NGREDIENTES: Batatas desidratadas, óleo de girasol, farinha de trigo.(aimha d mh e o, Pode conter leite, soja. Consumir de preferëncia antes de: ver fundo (BB), Enbazhyer OHTS Pttas deshidratadas, aceite de qirasol, harina de trigo, harina de maiz, haia ca rm e eche, soja. Consumir preferentemente antes del: ver parte interior (8B), Enast et 'Releenc itle dn 100 g | RI" /30g| Eectsge/Ayt acuilo medo 84U bole / Prodoth te /30g ji begja /Valor energetico Tpas (Grassi/ Unjdos / Grasas tan eậticte Fetsäuren / dont 2214 kJ 664 kJ 530 kcal 159 kcal adulo medio / 8% 31g 3.0 9 9.3 0.9g 17g 13% Produoad by: see yd Aii dd cassi satui / dos quais Producido por urdes thtrde | Glucites | 5% oidrati / MedaCoyK Sabd 55g 7% Uont sucres /di eui *FRSCAME QNg
1
2.0%
25% noci, 25% mandorle, 25% uva sultanina (99,5% uva sultanina, olio di semi di girasole), 25% mirtilli rossi americani, essiccati e zuccherati (60% mirtilli rossi americani, 39% zucchero, olio di semi di girasole). Può contenere tracce di altra frutta a guscio e arachidi. Confezionato in atmosfera protettiva.
1
2.0%
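The top_rate stat (0.3125) and the share column (10.0%) for ingredients_text_it describe the same 5 rows with different denominators. The sketch below shows the arithmetic; the choice of denominators is inferred from the reported numbers, not taken from Saturn's documentation.

```python
n_rows = 50
n_nulls = 34
top_count = 5  # rows holding the most frequent (blank) value

non_null = n_rows - n_nulls
top_rate = top_count / non_null  # denominator: non-null rows only
share = top_count / n_rows       # denominator: all rows

print(f"top_rate = {top_rate:.4f}")  # top_rate = 0.3125
print(f"share    = {share:.1%}")     # share    = 10.0%
```

When comparing columns with very different null rates, the share column is the safer figure, since top_rate can look large on a handful of non-null rows.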
informers
unknown
Out[444]:
saturn.columns["informers"].stats
stat
value
n
50
nulls
0 (0.0%)
unique
—
alert: skipped
no profiler for kind=unknown
origin_nb
categorical
Out[446]:
saturn.columns["origin_nb"].stats
stat
value
n
50
nulls
48 (96.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
alert: null_rate
96.0% null
alert: imbalance
top value is 100.0% of rows
Fig 107.
Top values for origin_nb.
Top values for origin_nb (1 unique shown, of 1 total).
value
count
share
(blank)
2
4.0%
creator
categorical
Out[449]:
saturn.columns["creator"].stats
stat
value
n
50
nulls
0 (0.0%)
unique
13
top_value
openfoodfacts-contributors
top_rate
0.46
cardinality
13
entropy
2.351
entropy_ratio
0.6353
alert: long_tail
10 singleton categories
Fig 108.
Top values for creator.
Top values for creator (13 unique shown, of 13 total).
value
count
share
openfoodfacts-contributors
23
46.0%
kiliweb
15
30.0%
javichu
2
4.0%
meryemali
1
2.0%
vichenze
1
2.0%
mllep
1
2.0%
andre
1
2.0%
sqoia
1
2.0%
shaolan
1
2.0%
tacite
1
2.0%
mambl
1
2.0%
norbert45fr
1
2.0%
date-limite-app
1
2.0%
packaging_text_ja
categorical
Out[452]:
saturn.columns["packaging_text_ja"].stats
stat
value
n
50
nulls
49 (98.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
alert: long_tail
1 singleton categories
alert: null_rate
98.0% null
alert: imbalance
top value is 100.0% of rows
Fig 109.
Top values for packaging_text_ja.
Top values for packaging_text_ja (1 unique shown, of 1 total).
value
count
share
(blank)
1
2.0%
sortkey
numeric
Out[455]:
saturn.columns["sortkey"].stats
stat
value
n
50
nulls
6 (12.0%)
unique
44
min
1.568e+09
max
1.611e+09
mean
1.605e+09
median
1.608e+09
std
8.692e+06
q1
1.604e+09
q3
1.61e+09
iqr
6.16e+06
skew
-2.782
kurtosis
8.091
n_outliers
4
outlier_rate
0.09091
zero_rate
0
alert: high_skew
skew=-2.78
alert: outliers
9.1% rows beyond 1.5 IQR
Fig 110.
Distribution of sortkey. Vertical dash marks the median.
Histogram bins for sortkey (median: 1608147866.0).
bin
count
1.568e+09 – 1.575e+09
1
1.575e+09 – 1.582e+09
1
1.582e+09 – 1.589e+09
1
1.589e+09 – 1.596e+09
1
1.596e+09 – 1.604e+09
5
1.604e+09 – 1.611e+09
35
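The sortkey values fall in the range of Unix epoch seconds, which would explain the strong left skew: most products cluster near the latest timestamp, with a few older ones forming the tail. The sketch below decodes the reported median under that assumption (the report itself does not state the unit) and also shows that the outlier rate is consistent with a non-null denominator.

```python
from datetime import datetime, timezone

# Median of sortkey as printed in the histogram caption.
median_sortkey = 1608147866

# Assumption: the value is seconds since the Unix epoch (UTC).
median_date = datetime.fromtimestamp(median_sortkey, tz=timezone.utc).date()
print(median_date)  # 2020-12-16

# The reported outlier rate matches 4 outliers out of 44 non-null rows.
n_rows, n_nulls, n_outliers = 50, 6, 4
outlier_rate = n_outliers / (n_rows - n_nulls)
print(round(outlier_rate, 5))  # 0.09091
```

If the epoch interpretation holds, sortkey is better treated as a datetime than as a numeric feature, and the skew and outlier alerts simply reflect record age.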
packagings_materials_main
categorical
Out[458]:
saturn.columns["packagings_materials_main"].stats
stat
value
n
50
nulls
31 (62.0%)
unique
3
top_value
en:paper-or-cardboard
top_rate
0.6842
cardinality
3
entropy
1.105
entropy_ratio
0.6972
alert: null_rate
62.0% null
Fig 111.
Top values for packagings_materials_main.
Top values for packagings_materials_main (3 unique shown, of 3 total).
value
count
share
en:paper-or-cardboard
13
26.0%
en:plastic
5
10.0%
en:unknown
1
2.0%
ingredients_percent_analysis
numeric
feature
This column appears to be a binary flag or pass/fail indicator for ingredient percentage analysis, taking only two distinct values across all 50 rows: 1.0 (the large majority) and -1.0 (a minority). Q1, median, and Q3 all equal 1.0, and the mean is 0.84. For a variable restricted to +1 and -1 the mean equals 2p − 1, where p is the share of 1.0 values, so p = 0.92: 46 records are coded 1.0 and the remaining 4 (8%) are -1.0, which are exactly the 4 flagged outliers (8% outlier rate). The extreme skew (−3.10) and kurtosis (7.59) are entirely explained by this near-constant binary distribution, not by a continuous numeric spread.
Treatment: Recode as a binary categorical (1 / -1 → 1 / 0) before modelling; verify whether -1.0 encodes 'fail' or 'missing' to avoid misinterpretation.
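The reported mean and the two observed values pin down the split between 1.0 and -1.0 exactly. The sketch below works through that arithmetic and then applies the suggested recoding; mapping -1.0 to 0 is only the treatment's proposal, and whether -1.0 means 'fail' or 'missing' still has to be verified.

```python
n_rows = 50
mean = 0.84  # reported mean of ingredients_percent_analysis

# For values in {+1, -1}: mean = p - (1 - p) = 2p - 1, so p = (mean + 1) / 2.
p_pos = (mean + 1) / 2
n_pos = round(p_pos * n_rows)
n_neg = n_rows - n_pos
print(n_pos, n_neg)  # 46 4

# Recode to 1/0 as suggested in the treatment (assumed meaning of -1.0: not analysed).
RECODE = {1.0: 1, -1.0: 0}
sample = [1.0] * n_pos + [-1.0] * n_neg
recoded = [RECODE[v] for v in sample]
print(sum(recoded) / len(recoded))  # 0.92
```

The 4 rows coded -1.0 are the same 4 rows counted by n_outliers, so the outlier alert carries no extra information for this column.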
environment_impact_level
categorical
This column is intended to capture an environmental impact level category, but it is effectively empty: 28 of the 50 rows (56%) are null and the remaining 22 (44%) contain only a blank string, yielding a single unique value and zero entropy. In its current state it carries no usable information for modelling or analysis.
Treatment: Drop this column; all non-null values are blank strings and it contains zero informational signal.
Out[468]:
saturn.columns["environment_impact_level"].stats
stat
value
n
50
nulls
28 (56.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
alert: null_rate
56.0% null
alert: imbalance
top value is 100.0% of rows
Fig 113.
Top values for environment_impact_level.
Top values for environment_impact_level (1 unique shown, of 1 total).
value
count
share
(blank)
22
44.0%
expiration_date
categorical
Out[471]:
saturn.columns["expiration_date"].stats
stat
value
n
50
nulls
2 (4.0%)
unique
34
top_value
top_rate
0.3125
cardinality
34
entropy
4.364
entropy_ratio
0.8578
alert: long_tail
33 singleton categories
Fig 114.
Top values for expiration_date.
Top values for expiration_date (20 unique shown, of 34 total).
Top values for ingredients_text_with_allergens.
Top values for ingredients_text_with_allergens (20 unique shown, of 50 total).
value
count
share
milk cream, cream, sugar, banana, bacteria
1
2.0%
Céréale 50 % (Farine de blé 34,8 %, farine de blé complet 15,2 %), sucre, huiles végétales (palme, colza), cacao maigre en poudre 4,5 %, sirop de glucose, amidon de blé, poudres à lever (carbonates d'ammonium, carbonates de sodium), émulsifiant (lécithines de soja), sel, lait écrémé en poudre, perméat de lactosérum (de lait), arômes. Peut contenir œuf.
1
2.0%
Pâte de cacao, beurre de cacao, cacao maigre, sucre, vanille.
1
2.0%
Coffret fourré au cacao (41,6%) et à la vanille (208) - Ingrédients Farine de blé, sucre, huile végétale non hydrogénée (huile de palme), filtrat de lait, poudre de cacao Émulsifiant à faible teneur en cacao (322) Lécithine de soja) Agent levant (5000) Sucre artificiel (vanilline) Sel Contient du lait, du blé (gluten) du soja
1
2.0%
Farine de blé 57%, sucre de canne roux, huile de colza, sésame toasté 10,6%, germe de blé 5,4%, farine complète de blé 5,4%, arôme naturel, magnésium, émulsifiant : lécithines, poudres à lever (tartrates de potassium, carbonates de sodium, carbonates d'ammonium), sel de mer, amidon de blé, vitamines (E, PP, B6, B1, B9).
1
2.0%
Какаова маса, нискомаслено какао на прах, какаово масло, захар, емулгатор: лецитин (соеви), екстракт от ванилия, Може да съдържа следи от ядки и мляко,
1
2.0%
Eau de source
1
2.0%
Farine de froment, sucre, graisse végétale, sucre inverti, agents levants ( bicarbonate d'ammonium - bicarbonate de sodium), sel, arome.
cocoa mass #, cane sugar #, cocoa butter #, vanilla extract #, may contain nuts, milk,
1
2.0%
دقيقالقمح،رقائق الشوكولاته20%[عجينة زيت النخلة.الكاكاو،سكر،دكستروز و مستحلب
1
2.0%
Farine de froment, sucre, graisse végétale, noix de coco râpée, poudre de lait, poudre de lactosérum, sucre inverti, agents levants (bicarbonate d'ammonium - bicarbonate de Sodium), sel, arômes.
1
2.0%
Pâte à tartiner aux NOISETTES et au cacao 40% (sucre, huile de palme, NOISETTES 13%**, LAIT écrémé en poudre 8,7%**, cacao maigre 7,4%**, émulsifiants : lécithines [SOJA]; vanilline), farine de FROMENT 32,5%, graisses végétales (palme, palmiste), sucre de canne (contient BLE) 8,5%, LACTOSE, son de BLE, LAIT en poudre, miel, poudres à lever (diphosphate disodique, carbonate acide de sodium, carbonate acide d'ammonium), farine d'ORGE malté, cacao maigre en poudre, sel, extrait en poudre de malt d'ORGE et de maïs, amidon de FROMENT, émulsifiants: lécithines [SOJA]; vanilline.
1
2.0%
Farine complète de SEIGLE (77 g*), farine de SEIGLE (28 g*), levure, sel. Peut contenir des traces de LUPIN, LAIT, MOUTARDE, GRAINES DE SÉSAME et SOJA. *en g pour 100 g de produit.
1
2.0%
Pâte de cacao, sucre, beurre de cacao, vanille.
Peut contenir des fruits à coque, du lait, du soja et des graines de sésame.
1
2.0%
Kartoffeln, Sonnenblumenöl, Meersalz.
1
2.0%
pâte de cacao*, beurre de cacao*, cacao maigre en poudre*, sucre de canne*, extrait de vanille*, * ingrédients issus de l'agriculture biologique
1
2.0%
Pâte de cacao, cacao maigre, beurre de cacao, cassonade, vanille
1
2.0%
Farine de blé* 41%, Chocolat noir* 22% (pâte de cacao*, sucre de canne", beurre de cacao"), Sucre de canne* roux non raffiné, Farine complète de blé* 16%, Huile de tournesol oléique*, Arôme naturel de vanille, Lait écrémé en poudre, Sel de mer, carbonates d'ammonium, carbonates de sodium, gomme d'acacia*, extraits de romarin* Peut contenir du soja, des œufs, des fruits à coque, des graines de sésame et de la moutarde. *Ingrédients biologiques.
Distribution of ingredients_with_specified_percent_sum. Vertical dash marks the median.
Histogram bins for ingredients_with_specified_percent_sum (median: 0.0).
bin
count
0 – 14.23
33
14.23 – 28.46
0
28.46 – 42.69
2
42.69 – 56.91
4
56.91 – 71.14
5
71.14 – 85.37
4
85.37 – 99.6
2
nutriscore_version
categorical
Out[486]:
saturn.columns["nutriscore_version"].stats
stat
value
n
50
nulls
0 (0.0%)
unique
1
top_value
2023
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
alert: imbalance
top value is 100.0% of rows
Fig 119.
Top values for nutriscore_version.
Top values for nutriscore_version (1 unique shown, of 1 total).
value
count
share
2023
50
100.0%
lang
categorical
Out[489]:
saturn.columns["lang"].stats
stat
value
n
50
nulls
0 (0.0%)
unique
5
top_value
fr
top_rate
0.7
cardinality
5
entropy
1.294
entropy_ratio
0.5572
Fig 120.
Top values for lang.
Top values for lang (5 unique shown, of 5 total).
value
count
share
fr
35
70.0%
en
10
20.0%
de
3
6.0%
bg
1
2.0%
ro
1
2.0%
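The entropy and entropy_ratio stats for lang can be reproduced from the value counts in the table. The sketch below assumes Shannon entropy in bits, normalised by the maximum entropy for the observed number of categories; this reading matches the reported figures but is inferred, not taken from Saturn's documentation.

```python
import math

# Value counts for lang, copied from the table above.
counts = {"fr": 35, "en": 10, "de": 3, "bg": 1, "ro": 1}

total = sum(counts.values())
probs = [c / total for c in counts.values()]

# Shannon entropy in bits.
entropy = -sum(p * math.log2(p) for p in probs)
# Normalise by the maximum possible entropy, log2(number of categories).
entropy_ratio = entropy / math.log2(len(counts))

print(round(entropy, 3), round(entropy_ratio, 4))  # 1.294 0.5572
```

A ratio near 1 means values are spread almost evenly, while a ratio near 0 means one value dominates. For a column with a single category the denominator is zero, so the ratio needs a special case (the report shows 0 there).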
origins_hierarchy
unknown
Out[492]:
saturn.columns["origins_hierarchy"].stats
stat
value
n
50
nulls
0 (0.0%)
unique
—
alert: skipped
no profiler for kind=unknown
origins_lc
categorical
Out[494]:
saturn.columns["origins_lc"].stats
stat
value
n
50
nulls
2 (4.0%)
unique
6
top_value
fr
top_rate
0.4792
cardinality
6
entropy
1.575
entropy_ratio
0.6093
Fig 121.
Top values for origins_lc.
Top values for origins_lc (6 unique shown, of 6 total).
value
count
share
fr
23
46.0%
en
20
40.0%
es
2
4.0%
de
1
2.0%
it
1
2.0%
pl
1
2.0%
origin_it
categorical
other
This column is most likely the Italian-language (it) variant of the origin field, following the same language-suffix pattern as ingredients_text_it, but it is effectively empty: 34 of its 50 rows (68%) are null, and the sole non-null value is an empty string appearing 16 times. With a cardinality of 1 and an entropy of 0, the column carries no information. The combination of a high null rate and a blank-string-only value suggests the field was never populated in this sample.
Treatment: Drop — zero variance and entirely unpopulated (null or empty string); contributes no signal to any downstream task.
Out[497]:
saturn.columns["origin_it"].stats
stat
value
n
50
nulls
34 (68.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
alert: null_rate
68.0% null
alert: imbalance
top value is 100.0% of rows
Fig 122.
Top values for origin_it.
Top values for origin_it (1 unique shown, of 1 total).
value
count
share
(blank)
16
32.0%
serving_quantity
categorical
Out[500]:
saturn.columns["serving_quantity"].stats
stat
value
n
50
nulls
6 (12.0%)
unique
27
top_value
100
top_rate
0.1591
cardinality
27
entropy
4.322
entropy_ratio
0.9089
alert: long_tail
21 singleton categories
Fig 123.
Top values for serving_quantity.
Top values for serving_quantity (20 unique shown, of 27 total).
value
count
share
100
7
14.0%
10
7
14.0%
20
3
6.0%
25
2
4.0%
42
2
4.0%
30
2
4.0%
23
1
2.0%
11.5
1
2.0%
1000
1
2.0%
13.8
1
2.0%
11.4
1
2.0%
18
1
2.0%
50
1
2.0%
85
1
2.0%
36
1
2.0%
40
1
2.0%
45
1
2.0%
8.4
1
2.0%
7.143
1
2.0%
58
1
2.0%
checkers
unknown
Out[503]:
saturn.columns["checkers"].stats
stat
value
n
50
nulls
0 (0.0%)
unique
—
alert: skipped
no profiler for kind=unknown
editors_tags
unknown
Out[505]:
saturn.columns["editors_tags"].stats
stat
value
n
50
nulls
0 (0.0%)
unique
—
alert: skipped
no profiler for kind=unknown
stores
categorical
Out[507]:
saturn.columns["stores"].stats
stat
value
n
50
nulls
2 (4.0%)
unique
31
top_value
top_rate
0.2917
cardinality
31
entropy
4.233
entropy_ratio
0.8543
alert: long_tail
29 singleton categories
Fig 124.
Top values for stores.
Top values for stores (20 unique shown, of 31 total).
Top values for compared_to_category.
Top values for compared_to_category (20 unique shown, of 35 total).
value
count
share
en:dark-chocolate-bar-with-more-than-70-cocoa
5
10.0%
en:biscuits
4
8.0%
en:extra-fine-dark-chocolates
3
6.0%
en:dark-chocolates
3
6.0%
en:snacks-sucres
3
6.0%
en:sandwich-biscuits
2
4.0%
en:extruded-crispbreads
2
4.0%
en:plain-fermented-dairy-desserts-with-cream
1
2.0%
en:chocolate-stuffed-wafers
1
2.0%
en:spring-waters
1
2.0%
en:food
1
2.0%
en:drop-cookies
1
2.0%
en:shortbread-cookie-with-coconut
1
2.0%
en:biscuits-cookies-shelf-stable
1
2.0%
en:crispbreads
1
2.0%
fr:chips-de-pommes-de-terre-classiques
1
2.0%
en:dark-chocolate-bar
1
2.0%
en:cacao-et-derives
1
2.0%
en:crispbreads-wholemeal
1
2.0%
en:biscuit-snack-with-chocolate-filling
1
2.0%
generic_name_es
categorical
Out[529]:
saturn.columns["generic_name_es"].stats
stat
value
n
50
nulls
30 (60.0%)
unique
7
top_value
top_rate
0.65
cardinality
7
entropy
1.817
entropy_ratio
0.6471
alert: long_tail
5 singleton categories
alert: null_rate
60.0% null
Fig 130.
Top values for generic_name_es.
Top values for generic_name_es (7 unique shown, of 7 total).
value
count
share
(blank)
13
26.0%
Chocolate negro
2
4.0%
Chocolate negro con un 74% de cacao mínimo
1
2.0%
Crackers
1
2.0%
Tableta de chocolate negro extrafino con 70% de cacao
1
2.0%
Tableta de chocolate negro Ecuador con un 70% de cacao mínimo
1
2.0%
Chocolate Negro 99%
1
2.0%
correctors
unknown
Out[532]:
saturn.columns["correctors"].stats
stat
value
n
50
nulls
0 (0.0%)
unique
—
alert: skipped
no profiler for kind=unknown
additives_n
numeric
Out[534]:
saturn.columns["additives_n"].stats
stat
value
n
50
nulls
0 (0.0%)
unique
8
min
0
max
8
mean
1.52
median
1
std
1.821
q1
0
q3
2
iqr
2
skew
1.473
kurtosis
2.105
n_outliers
2
outlier_rate
0.04
zero_rate
0.4
Fig 131.
Distribution of additives_n. Vertical dash marks the median.
Histogram bins for additives_n (median: 1.0).
bin
count
0 – 1.143
29
1.143 – 2.286
11
2.286 – 3.429
3
3.429 – 4.571
3
4.571 – 5.714
2
5.714 – 6.857
1
6.857 – 8
1
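The fractional bin edges in the additives_n histogram look odd for an integer count, but they are what seven equal-width bins between the minimum and maximum produce. The sketch below reproduces them; the bin count of 7 is read off the table, and the helper name equal_width_edges is hypothetical.

```python
def equal_width_edges(lo, hi, n_bins):
    """Return n_bins + 1 evenly spaced edges from lo to hi."""
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]


# additives_n: min 0, max 8, and 7 bins in the table above.
edges = equal_width_edges(0, 8, 7)
print([round(e, 3) for e in edges])
# [0.0, 1.143, 2.286, 3.429, 4.571, 5.714, 6.857, 8.0]
```

Because the first bin spans 0 to 1.143, it holds products with either 0 or 1 additive. With a zero_rate of 0.4 (20 products), the remaining 9 of its 29 products would have exactly one additive.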
ingredients_text_nb
categorical
Out[537]:
saturn.columns["ingredients_text_nb"].stats
stat
value
n
50
nulls
48 (96.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
alert: null_rate
96.0% null
alert: imbalance
top value is 100.0% of rows
Fig 132.
Top values for ingredients_text_nb.
Top values for ingredients_text_nb (1 unique shown, of 1 total).
value
count
share
(blank)
2
4.0%
ingredients_text_es
categorical
Out[540]:
saturn.columns["ingredients_text_es"].stats
stat
value
n
50
nulls
30 (60.0%)
unique
13
top_value
top_rate
0.4
cardinality
13
entropy
3.122
entropy_ratio
0.8437
alert: long_tail
12 singleton categories
alert: null_rate
60.0% null
Fig 133.
Top values for ingredients_text_es.
Top values for ingredients_text_es (13 unique shown, of 13 total).
value
count
share
(blank)
8
16.0%
Pasta de cacao, manteca de cacao, cacao magro en polvo, azúcar, vainilla.
1
2.0%
Azúcar, Grasa vegetal de palmiste parcialmente hidrogenada, Leche en polvo, Almendras, Cacao desgrasado en polvo, suero lácteo en polvo, Emulgente (lecitina de soja), aroma (vainilla).
1
2.0%
Crema de avellanas y cacao 40% (azúcar, manteca de palma, avellanas 13%, leche desnatada en polvo 8,7%, cacao desgrasado 7.4%, emulgentes (lecitinas (soja), vainillina), harina de trigo 32,5%, grasas vegetales (palma, palmiste), azúcar de caña 8,5% (trigo), lactosa, salvado de trigo, leche entera en polvo, extracto en polvo de malta de cebada y maíz, miel, gasificantes (difosfato disódico, carbonato ácido de sodio, carbonato ácido de amonio), cacao desgrasado, sal, almidón de trigo, harina de cebada, malteada, emulsionantes (lecitinas (soja), vainillina.
1
2.0%
70% pasta de cacao*, azúcar, rnanteca de cacao, cacao desgrasado en polvo, emulgente: lecitlna de girasol (E-322), aroma natural de vainilla. *Pasta de cacao Ralnforest Alliance Certified cocoa. Cacao: 74% mínimo.
1
2.0%
Harina de _TRIGO_, grasa de palma, extracto de malta de _CEBADA_, gasificantes (carbonatos de amonio, carbonatos de sodio), sal, _HUEVO_, aroma, agente de tratamiento de la harina (_METABISULFITO_ sódico).
1
2.0%
Pasta de cacao, azúcar, manteca de cacao, vainilla.
1
2.0%
Pasta de cacao, azúcar, manteca de cacao, emulgente: lecitina de girasol (E-322), extracto de vainilla. Cacao: 70% mínimo.
1
2.0%
Copos de avena integral (60%),azúcar, aceite refinado de girasol, miel (3%), sal, melaza de caña, emulgente (lecitina de girasol), gasificante (carbonato ácido de sodio),
1
2.0%
Pasta de cacao, cacao magro, manteca de cacao, azúcar moreno de caña
pasta de cacao, azúcar, manteca de cacao, emulgente (lecitina de _soja_), vainilla. Cacao: 70% mínimo.
1
2.0%
Pasta de cacao, cacao desgrasado en polvo, manteca de cacao, azúcar, leche en polvo, pasta de almendras y avellanas, emulgentes (lecitinas de soja, girasol), aroma
1
2.0%
manufacturing_places_tags
unknown
Out[543]:
saturn.columns["manufacturing_places_tags"].stats
stat
value
n
50
nulls
0 (0.0%)
unique
—
alert: skipped
no profiler for kind=unknown
origin
categorical
Out[545]:
saturn.columns["origin"].stats
stat
value
n
50
nulls
3 (6.0%)
unique
6
top_value
top_rate
0.8936
cardinality
6
entropy
0.7359
entropy_ratio
0.2847
alert: long_tail
5 singleton categories
Fig 134.
Top values for origin.
Top values for origin (6 unique shown, of 6 total).
value
count
share
(blank)
42
84.0%
Fabriqué par: Aachen Allemagne
1
2.0%
Germe de blé origine ue. Sésame origine non-ue.
1
2.0%
France
1
2.0%
fabriqué en France.pommes origine UE. noisettes origine UE et non UE
1
2.0%
Fabriqué en France par Nutrition et Santé. Farine de blé: France. Figues : non UE
1
2.0%
origins_old
categorical
Out[548]:
saturn.columns["origins_old"].stats
stat
value
n
50
nulls
11 (22.0%)
unique
9
top_value
top_rate
0.7949
cardinality
9
entropy
1.347
entropy_ratio
0.4251
alert: long_tail
8 singleton categories
alert: null_rate
22.0% null
Fig 135.
Top values for origins_old.
Top values for origins_old (9 unique shown, of 9 total).
Top values for packaging_text_de (2 unique shown, of 2 total).
value
count
share
(blank)
19
38.0%
1 Folie aus 22 PAP zum Recyclen
1
2.0%
languages
unknown
Out[554]:
saturn.columns["languages"].stats
stat
value
n
50
nulls
0 (0.0%)
unique
—
alert: skipped
no profiler for kind=unknown
categories_old
categorical
Out[556]:
saturn.columns["categories_old"].stats
stat
value
n
50
nulls
1 (2.0%)
unique
45
top_value
Snacks, Snacks sucrés, Biscuits et gâteaux, Biscuits, Biscuits secs
top_rate
0.04082
cardinality
45
entropy
5.451
entropy_ratio
0.9926
alert: long_tail
41 singleton categories
Fig 137.
Top values for categories_old.
Show data table
Top values for categories_old (20 unique shown, of 45 total).
value
count
share
Snacks, Snacks sucrés, Biscuits et gâteaux, Biscuits, Biscuits secs
2
4.0%
Snacks, Snacks sucrés, Biscuits et gâteaux, Biscuits
2
4.0%
Aliments et boissons à base de végétaux, Aliments d'origine végétale, Snacks, Céréales et pommes de terre, Pains, Tartines craquantes extrudées, Pains croustillants
2
4.0%
Snacks, Sweet snacks, Cocoa and its products, Chocolates, Dark chocolates
Boissons et préparations de boissons, Boissons, Eaux, Eaux de sources, Boissons sans sucre ajouté
1
2.0%
Snacks, Snacks sucrés, Confiseries, Succédanés du chocolat, en:Vegecaos
1
2.0%
Snacks, Sweet snacks, Cocoa and its products, Confectioneries, Chocolates, Dark chocolates
1
2.0%
Snacks, Snacks sucrés, Biscuits et gâteaux, Biscuits, en:Biscuits et gâteaux, en:Snacks sucrés
1
2.0%
Snacks, Snacks sucrés, Biscuits et gâteaux, Biscuits, Biscuits sablés, Sablés à la noix de coco
1
2.0%
Botanas,Snacks dulces,Galletas y pasteles,Galletas,Galletas rellenas
1
2.0%
Produits laitiers, Produits fermentés, Produits laitiers fermentés, Snacks, Fromages, Snacks sucrés, Cacao et dérivés, Chocolats, Chocolats noirs, Chocolats noirs en tablette, Chocolat noir en tablette extra dégustation à 70% de cacao minimum
1
2.0%
Aliments et boissons à base de végétaux, Aliments d'origine végétale, Snacks, Céréales et pommes de terre, Snacks salés, Amuse-gueules, Chips et frites, Chips, Chips de pommes de terre, Chips de pommes de terre à l'huile de tournesol, en:Aliments d'origine végétale, en:Aliments et boissons à base de végétaux, en:Amuse-gueules, en:Chips, en:Chips de pommes de terre, en:Chips de pommes de terre classiques, en:Chips de pommes de terre à l'huile de tournesol, en:Chips et frites, en:Céréales et pommes de terre, en:Snacks salés
1
2.0%
Snacks, Snacks sucrés, Cacao et dérivés, Chocolats, Chocolats noirs, Chocolats noirs en tablette
1
2.0%
Snacks,Sweet snacks,Biscuits and cakes,Biscuits,Chocolate biscuits,Filled biscuits,Dark chocolate biscuits
1
2.0%
Snacks, Sweet snacks, Cocoa and its products, Chocolates, Dark chocolates, Cacao-et-derives, Chocolats, Chocolats-noirs, Chocolats-noirs-extra-fin
1
2.0%
Aliments et boissons à base de végétaux,Aliments d'origine végétale,Céréales et pommes de terre,Pains,Pains croustillants
origin_fi
categorical
This column is most likely the Finnish-language (fi) variant of the origin field, following the same language-suffix pattern as ingredients_text_fi, and it is almost entirely empty: 45 of 50 rows (90%) are null. All 5 non-null values are empty strings, so the column contains no meaningful information: cardinality is 1, entropy is 0, and the sole 'value' is a blank.
Treatment: Drop this column entirely; it carries no information and is 100% effectively empty across all 50 rows.
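Several columns in this report share the same pattern (one distinct value, and that value is blank), so the drop decision can be made from the summary stats alone, without touching raw rows. The sketch below uses hypothetical plain dicts that mirror a few of the stat tables in this report; they are not Saturn's API, and the rule is only one possible reading of the treatment.

```python
# Hypothetical plain dicts mirroring stat tables in this report; not Saturn's API.
profiles = {
    "origin_pl": {"n": 50, "nulls": 45, "cardinality": 1, "top_value": ""},
    "origin_it": {"n": 50, "nulls": 34, "cardinality": 1, "top_value": ""},
    "origin_fi": {"n": 50, "nulls": 45, "cardinality": 1, "top_value": ""},
    "nutriscore_version": {"n": 50, "nulls": 0, "cardinality": 1, "top_value": "2023"},
    "lang": {"n": 50, "nulls": 0, "cardinality": 5, "top_value": "fr"},
}


def is_blank_only(stats):
    """True when the single distinct non-null value is an empty string."""
    return stats["cardinality"] == 1 and stats["top_value"].strip() == ""


drop = sorted(name for name, s in profiles.items() if is_blank_only(s))
print(drop)  # ['origin_fi', 'origin_it', 'origin_pl']
```

Note that nutriscore_version is also constant but is kept by this rule because its single value is not blank; dropping zero-variance columns in general would need a separate check on cardinality alone.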
Out[563]:
saturn.columns["origin_fi"].stats
stat
value
n
50
nulls
45 (90.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
alert: null_rate
90.0% null
alert: imbalance
top value is 100.0% of rows
Fig 138.
Top values for origin_fi.
Top values for origin_fi (1 unique shown, of 1 total).
value
count
share
(blank)
5
10.0%
packaging_old
categorical
Out[566]:
saturn.columns["packaging_old"].stats
stat
value
n
50
nulls
7 (14.0%)
unique
40
top_value
Plastique
top_rate
0.06977
cardinality
40
entropy
5.269
entropy_ratio
0.9901
alert: long_tail
38 singleton categories
Fig 139.
Top values for packaging_old.
Top values for packaging_old (20 unique shown, of 40 total).
value
count
share
Plastique
3
6.0%
(blank)
2
4.0%
Paquet, Etui en carton, Film en plastique
1
2.0%
Cardboard, Container, Packaging, Paperboard, Aluminium wrap, Caja de cartón, Box cardboard, Card-box, Foil-wrapper, pt:Papel de aluminio
Sachet, Sous atmosphère protectrice, en:mixed plastic-packet
1
2.0%
Paper, Film
1
2.0%
fr:emballage carton, fr:papier aluminium
1
2.0%
Film en plastique, Film plastique à jeter, Étui carton à recycler
1
2.0%
fr:Plastique,fr:Sachet plastique de 3g,en:mixed plastic-packet
1
2.0%
Papier, Enveloppe
1
2.0%
Papier
1
2.0%
Plastic
1
2.0%
Container, Caja de cartón, Aluminium-wrapper, Card-carton, pt:Papel de aluminio
1
2.0%
ingredients_text_fi
categorical
Out[569]:
saturn.columns["ingredients_text_fi"].stats
stat
value
n
50
nulls
45 (90.0%)
unique
4
top_value
top_rate
0.4
cardinality
4
entropy
1.922
entropy_ratio
0.961
alert: long_tail
3 singleton categories
alert: null_rate
90.0% null
Fig 140.
Top values for ingredients_text_fi.
Top values for ingredients_text_fi (4 unique shown, of 4 total).
value
count
share
(blank)
2
4.0%
kaakaomassa, kaakaovoi, vähärasvainen kaakaojauhe, sokeri, vanilja. Saattaa sisältää hasselpähkinää, muita pähkinöitä, maitoa, soijaa. Tummassa suklaassa kaakaota vähintään 90%.
1
2.0%
kaakaomassa, vähärasvainen kaakaojauhe, kaakaovoi, sokeri, emulgointiaine (_soijalesitiini_), vaniljauute. Suklaassa kaakaota vähintään 85 %. Saattaa sisältää pieniä määriä pähkinää ja maitoa.
*Referentie inname van een gemiddelde volwassehe (8400 kJ/ 2000 ReJI), 16,7 g 46x4, www,snackmindful,com Milka www,milka,com ER Mondelez France SAS, 6 avenue Réaumur, CS 50014, 92142 Clamart Cedex, Service Consommateurs Nº Cristal:09,69,39,79,79 BE Mondelez Belgium, Stationsstraat 100, 2800 Mechelen, ND Mondelez Nederland, Verlengde Poolseweg 34, 4818 CL Breda, eu mondelezinternational,com e 100 g COCOA LIFE www,cocoalife,org 8 FR FRANCE ONLY 05 pp 3 045140 105502
Snacks,Breakfasts,Sweet snacks,Biscuits and cakes,Biscuits and crackers,Sandwich biscuits
1
2.0%
Snacks,Snacks sucrés,Cacao et dérivés,Chocolats,Chocolats noirs,Chocolats noirs en tablette,Chocolats noirs extra fin
1
2.0%
Snacks sucrés,Biscuits et gâteaux,Gaufrettes fourrées au chocolat
1
2.0%
Boissons et préparations de boissons,Boissons,Snacks,Eaux,Eaux de sources
1
2.0%
Snacks,Snacks sucrés,Biscuits et gâteaux,Biscuits
1
2.0%
Snacks,Sweet snacks,Cocoa and its products,Confectioneries,Chocolates,Compound chocolates,Food
1
2.0%
Snacks,Sweet snacks,Biscuits and cakes,Biscuits and crackers,Biscuits,Drop cookies
1
2.0%
Snacks,Snacks sucrés,Biscuits et gâteaux,Biscuits,Biscuits sablés,Sablés à la noix de coco
1
2.0%
Botanas,Snacks dulces,Galletas y pasteles,en:Biscuits and crackers,Galletas,en:Biscuits/Cookies (Shelf Stable),fr:Biscoitos recheados
1
2.0%
Aliments d'origine végétale,Snacks,Céréales et pommes de terre,Pains,Pains croustillants,Petit-déjeuners
1
2.0%
Produits fermentés,Snacks,Snacks sucrés,Cacao et dérivés,Chocolats,Chocolats noirs,Chocolats noirs en tablette,Chocolat noir en tablette extra dégustation à 70% de cacao minimum
1
2.0%
Plant-based foods and beverages,Plant-based foods,Snacks,Cereals and potatoes,Salty snacks,Appetizers,Chips and fries,Crisps,Potato crisps,Potato crisps in sunflower oil,fr:Chips de pommes de terre classiques
1
2.0%
Snacks,Snacks sucrés,Cacao et dérivés,Confiseries,Confiseries chocolatées,Chocolats,Chocolats noirs
1
2.0%
Snacks,Snacks sucrés,Cacao et dérivés,Chocolats,Chocolats noirs,Chocolats noirs en tablette
1
2.0%
Snacks, Sweet snacks, Biscuits and cakes, Biscuits and crackers, Biscuits, Chocolate biscuits, Filled biscuits, Dark chocolate biscuits, Sandwich biscuits
1
2.0%
Snacks,Sweet snacks,Cocoa and its products,Chocolates,Dark chocolates,Extra fine dark chocolates,Cacao-et-derives
1
2.0%
nutrition_grades_tags
unknown
Out[603]:
saturn.columns["nutrition_grades_tags"].stats
stat
value
n
50
nulls
0 (0.0%)
unique
—
alert: skipped
no profiler for kind=unknown
nutriscore_2023_tags
unknown
Out[605]:
saturn.columns["nutriscore_2023_tags"].stats
stat
value
n
50
nulls
0 (0.0%)
unique
—
alert: skipped
no profiler for kind=unknown
origin_ja
categorical
Out[607]:
saturn.columns["origin_ja"].stats
stat
value
n
50
nulls
49 (98.0%)
unique
1
top_value
(blank)
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
alert: long_tail
1 singleton categories
alert: null_rate
98.0% null
alert: imbalance
top value is 100.0% of rows
Fig 148.
Top values for origin_ja.
Top values for origin_ja (1 unique shown, of 1 total).
value
count
share
(blank)
1
2.0%
nutrition_score_debug
categorical
Out[610]:
saturn.columns["nutrition_score_debug"].stats
stat
value
n
50
nulls
0 (0.0%)
unique
2
top_value
(blank)
top_rate
0.98
cardinality
2
entropy
0.1414
entropy_ratio
0.1414
alert: imbalance
top value is 98.0% of rows
Fig 149.
Top values for nutrition_score_debug.
Top values for nutrition_score_debug (2 unique shown, of 2 total).
Farine de maïs* (70%), farine de riz*, sel marin. * K issus de l'agriculture biologique. • sans sucres ajoutés(¹) (contient des sucres naturellement présents.
1
2.0%
misc_tags
unknown
Out[698]:
saturn.columns["misc_tags"].stats
stat
value
n
50
nulls
0 (0.0%)
unique
—
alert: skipped
no profiler for kind=unknown
photographers_tags
unknown
Out[700]:
saturn.columns["photographers_tags"].stats
stat
value
n
50
nulls
0 (0.0%)
unique
—
alert: skipped
no profiler for kind=unknown
packaging_materials_tags
unknown
Out[702]:
saturn.columns["packaging_materials_tags"].stats
stat
value
n
50
nulls
0 (0.0%)
unique
—
alert: skipped
no profiler for kind=unknown
product_name_nl
categorical
Out[704]:
saturn.columns["product_name_nl"].stats
stat
value
n
50
nulls
38 (76.0%)
unique
7
top_value
(blank)
top_rate
0.5
cardinality
7
entropy
2.292
entropy_ratio
0.8166
alert: long_tail
6 singleton categories
alert: null_rate
76.0% null
Fig 171.
Top values for product_name_nl.
Top values for product_name_nl (7 unique shown, of 7 total).
Distribution of ingredients_with_specified_percent_n. Vertical dash marks the median.
Histogram bins for ingredients_with_specified_percent_n (median: 0.0).
bin
count
0 – 1.143
36
1.143 – 2.286
5
2.286 – 3.429
3
3.429 – 4.571
4
4.571 – 5.714
1
5.714 – 6.857
0
6.857 – 8
1
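The bin edges listed above are consistent with seven equal-width bins spanning the column's minimum and maximum (here 0 and 8). The sketch below reproduces them under that assumption; the helper name and the default of seven bins are illustrative, not taken from Saturn.

```python
def equal_width_edges(lo, hi, n_bins=7):
    """Return n_bins + 1 evenly spaced edges from lo to hi."""
    width = (hi - lo) / n_bins
    # Use hi itself for the last edge to avoid floating-point drift.
    return [lo + i * width for i in range(n_bins)] + [float(hi)]

edges = equal_width_edges(0, 8)
print([round(e, 3) for e in edges])
# [0.0, 1.143, 2.286, 3.429, 4.571, 5.714, 6.857, 8.0]
```

With a median of 0 and 36 of 50 rows in the first bin, equal-width bins leave most of the axis nearly empty, which is worth keeping in mind when reading the chart.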
origin_nl
categorical
other
This column ('origin_nl') is a categorical field, probably a Dutch-language origin label or description, but it is effectively empty: 38 of the 50 rows (76%) are null, and the remaining 12 all hold the empty string (''). With a cardinality of 1, zero entropy, and a top_rate of 1.0, the column carries no information.
Treatment: Drop this column; it contains no usable signal (every one of the 50 rows is either null or an empty string).
Out[723]:
saturn.columns["origin_nl"].stats
stat
value
n
50
nulls
38 (76.0%)
unique
1
top_value
(blank)
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
alert: null_rate
76.0% null
alert: imbalance
top value is 100.0% of rows
Fig 176.
Top values for origin_nl.
Top values for origin_nl (1 unique shown, of 1 total).
Top values for packaging_text_en (5 unique shown, of 5 total).
value
count
share
(blank)
39
78.0%
1 plastic bottle to recycle
1 plastic cap to recycle
1
2.0%
1 cardboard sleeve recyclable, 1 sheet of aluminium recyclable
1
2.0%
Terracycle. Please dispose of this pack responsibly. Find out more at www.terracycle.co.uk.
1
2.0%
cardboard (to recycle)
foil paper (to throw away)
1
2.0%
packaging_text_it
categorical
Out[753]:
saturn.columns["packaging_text_it"].stats
stat
value
n
50
nulls
34 (68.0%)
unique
3
top_value
(blank)
top_rate
0.875
cardinality
3
entropy
0.6686
entropy_ratio
0.4218
alert: long_tail
2 singleton categories
alert: null_rate
68.0% null
Fig 184.
Top values for packaging_text_it.
Top values for packaging_text_it (3 unique shown, of 3 total).
value
count
share
(blank)
14
28.0%
Incarto esterno in carta da riciclare, Incarto interno in alluminio da riciclare.
1
2.0%
1 tubo C/PAP 85 da indifferenziata, 1 sigillo C/PAP 84 da indifferenziata, 1 tappo di plastica PP5 da riciclare.
1
2.0%
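The stats panel reports a top_rate of 0.875 for packaging_text_it, while the table gives the same blank value a share of 28.0%. The two figures agree if top_rate is computed over non-null rows and share over all rows. That reading is inferred from the numbers, and the function names below are illustrative.

```python
def top_rate(top_count, n_rows, n_nulls):
    """Share of the most frequent value among non-null rows only."""
    non_null = n_rows - n_nulls
    return top_count / non_null if non_null else float("nan")

def share(count, n_rows):
    """Share of a value among all rows, nulls included."""
    return count / n_rows

# packaging_text_it: n=50, nulls=34, top value seen 14 times.
print(top_rate(14, 50, 34))    # 0.875
print(f"{share(14, 50):.1%}")  # 28.0%
```

The same pattern holds for the other sparse columns in this report, so a high top_rate can coexist with a small table share when most rows are null.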
traces_tags
unknown
Out[756]:
saturn.columns["traces_tags"].stats
stat
value
n
50
nulls
0 (0.0%)
unique
—
alert: skipped
no profiler for kind=unknown
brands_tags
unknown
Out[758]:
saturn.columns["brands_tags"].stats
stat
value
n
50
nulls
0 (0.0%)
unique
—
alert: skipped
no profiler for kind=unknown
nutriscore_2021_tags
unknown
Out[760]:
saturn.columns["nutriscore_2021_tags"].stats
stat
value
n
50
nulls
0 (0.0%)
unique
—
alert: skipped
no profiler for kind=unknown
packaging_text
categorical
Out[762]:
saturn.columns["packaging_text"].stats
stat
value
n
50
nulls
2 (4.0%)
unique
13
top_value
(blank)
top_rate
0.75
cardinality
13
entropy
1.708
entropy_ratio
0.4614
alert: long_tail
12 singleton categories
Fig 185.
Top values for packaging_text.
Top values for packaging_text (13 unique shown, of 13 total).
value
count
share
(blank)
36
72.0%
1 film en plastique à recycler
1 étui en papier ondulé à recycler
1
2.0%
carton, plastique
1
2.0%
1 bouchon en plastique à trier
1 bouteille en plastique à trier
1
2.0%
1 étui en carton à recycler
1 feuille en aluminium à recycler
1
2.0%
1 sachet plastique à jeter
1
2.0%
1 étui en carton à recycler
1 feuille en aluminium à recycler
1
2.0%
LE TRI +FACILE + BAC DE TRI
1
2.0%
4 FILMS PLASTIQUE A JETER
1 ÉTUI CARTON À RECYCLER
1
2.0%
cardboard (to recycle)
foil paper (to throw away)
1
2.0%
FR LE TRI + FACILE ÉTUI 8+ SACHETS BAC DE TRI A consommer de préférence avant le : en France par et Santé S.A.S. 10:02 11914538 112 eCastelnaudary REVEL 30 04 2024
1
2.0%
Sachet, clip à recycler
1
2.0%
2 sachets en plastique à recycler
1 boîte en carton à recycler
1
2.0%
popularity_key
numeric
identifier
This column appears to be a synthetic or encoded identifier rather than a true popularity metric. Values cluster tightly just below 24 billion (median ~23,999,500,422, max ~23,999,992,269, IQR ~400,000), which suggests a fixed-prefix integer key scheme. The strong negative skew (−2.67) and high kurtosis (5.11) are driven by 5 outlier values near the minimum of ~22,999,500,355, about 1 billion below the bulk of the records. Only 49 of the 50 values are unique, so the key is not strictly one-per-row in this sample. Despite the name 'popularity_key', the distribution does not look like an organic popularity signal and is most likely a generated or composite key.
Treatment: Treat as an opaque identifier and do not use it as a numeric feature. Investigate the 5 outlier records (10% of rows) for data integrity issues before joining or filtering.
Out[765]:
saturn.columns["popularity_key"].stats
stat
value
n
50
nulls
0 (0.0%)
unique
49
min
2.3e+10
max
2.4e+10
mean
2.39e+10
median
2.4e+10
std
3.03e+08
q1
2.4e+10
q3
2.4e+10
iqr
4.002e+05
skew
-2.667
kurtosis
5.111
n_outliers
5
outlier_rate
0.1
zero_rate
0
alert: high_skew
skew=-2.67
alert: outliers
10.0% rows beyond 1.5 IQR
Fig 186.
Distribution of popularity_key. Vertical dash marks the median.
Histogram bins for popularity_key (median: 23999500422.0).
bin
count
2.3e+10 – 2.314e+10
5
2.314e+10 – 2.329e+10
0
2.329e+10 – 2.343e+10
0
2.343e+10 – 2.357e+10
0
2.357e+10 – 2.371e+10
0
2.371e+10 – 2.386e+10
0
2.386e+10 – 2.4e+10
45
ingredients_text
categorical
Out[768]:
saturn.columns["ingredients_text"].stats
stat
value
n
50
nulls
0 (0.0%)
unique
50
top_value
milk cream, cream, sugar, banana, bacteria
top_rate
0.02
cardinality
50
entropy
5.644
entropy_ratio
1
alert: long_tail
50 singleton categories
Fig 187.
Top values for ingredients_text.
Top values for ingredients_text (20 unique shown, of 50 total).
value
count
share
milk cream, cream, sugar, banana, bacteria
1
2.0%
Céréale 50 % (Farine de blé 34,8 %, farine de blé complet 15,2 %), sucre, huiles végétales (palme, colza), cacao maigre en poudre 4,5 %, sirop de glucose, amidon de blé, poudres à lever (carbonates d'ammonium, carbonates de sodium), émulsifiant (lécithines de soja), sel, lait écrémé en poudre, perméat de lactosérum (de lait), arômes. Peut contenir œuf.
1
2.0%
Pâte de cacao, beurre de cacao, cacao maigre, sucre, vanille.
1
2.0%
Coffret fourré au cacao (41,6%) et à la vanille (208) - Ingrédients Farine de blé, sucre, huile végétale non hydrogénée (huile de palme), filtrat de lait, poudre de cacao Émulsifiant à faible teneur en cacao (322) Lécithine de soja) Agent levant (5000) Sucre artificiel (vanilline) Sel Contient du lait, du blé (gluten) du soja
1
2.0%
Farine de blé 57%, sucre de canne roux, huile de colza, sésame toasté 10,6%, germe de blé 5,4%, farine complète de blé 5,4%, arôme naturel, magnésium, émulsifiant : lécithines, poudres à lever (tartrates de potassium, carbonates de sodium, carbonates d'ammonium), sel de mer, amidon de blé, vitamines (E, PP, B6, B1, B9).
1
2.0%
Какаова маса, нискомаслено какао на прах, какаово масло, захар, емулгатор: лецитин (соеви), екстракт от ванилия, Може да съдържа следи от ядки и мляко,
1
2.0%
Eau de source
1
2.0%
Farine de froment, sucre, graisse végétale, sucre inverti, agents levants ( bicarbonate d'ammonium - bicarbonate de sodium), sel, arome.
cocoa mass #, cane sugar #, cocoa butter #, vanilla extract #, may contain nuts, milk,
1
2.0%
دقيقالقمح،رقائق الشوكولاته20%[عجينة زيت النخلة.الكاكاو،سكر،دكستروز و مستحلب
1
2.0%
Farine de _froment_, sucre, graisse végétale, noix de coco râpée, poudre de _lait_, poudre de _lactosérum_, sucre inverti, agents levants (bicarbonate d'ammonium - bicarbonate de Sodium), sel, arômes.
1
2.0%
Pâte à tartiner aux NOISETTES et au cacao 40% (sucre, huile de palme, NOISETTES 13%**, LAIT écrémé en poudre 8,7%**, cacao maigre 7,4%**, émulsifiants : lécithines [SOJA]; vanilline), farine de FROMENT 32,5%, graisses végétales (palme, palmiste), sucre de canne (contient BLE) 8,5%, LACTOSE, son de BLE, LAIT en poudre, miel, poudres à lever (diphosphate disodique, carbonate acide de sodium, carbonate acide d'ammonium), farine d'ORGE malté, cacao maigre en poudre, sel, extrait en poudre de malt d'ORGE et de maïs, amidon de FROMENT, émulsifiants: lécithines [SOJA]; vanilline.
1
2.0%
Farine complète de SEIGLE (77 g*), farine de SEIGLE (28 g*), levure, sel. Peut contenir des traces de LUPIN, LAIT, MOUTARDE, GRAINES DE SÉSAME et SOJA. *en g pour 100 g de produit.
1
2.0%
Pâte de cacao, sucre, beurre de cacao, vanille.
Peut contenir des fruits à coque, du lait, du soja et des graines de sésame.
1
2.0%
Kartoffeln, Sonnenblumenöl, Meersalz.
1
2.0%
pâte de cacao*, beurre de cacao*, cacao maigre en poudre*, sucre de canne*, extrait de vanille*, * ingrédients issus de l'agriculture biologique
1
2.0%
Pâte de cacao, cacao maigre, beurre de cacao, cassonade, vanille
1
2.0%
Farine de blé* 41%, Chocolat noir* 22% (pâte de cacao*, sucre de canne", beurre de cacao"), Sucre de canne* roux non raffiné, Farine complète de blé* 16%, Huile de tournesol oléique*, Arôme naturel de vanille, Lait écrémé en poudre, Sel de mer, carbonates d'ammonium, carbonates de sodium, gomme d'acacia*, extraits de romarin* Peut contenir du soja, des œufs, des fruits à coque, des graines de sésame et de la moutarde. *Ingrédients biologiques.
Top values for ingredients_text_with_allergens_fr.
Top values for ingredients_text_with_allergens_fr (20 unique shown, of 47 total).
value
count
share
(blank)
2
4.0%
Lait écrémé, crème, SUcre, ferments laciques
1
2.0%
Céréale 50 % (Farine de blé 34,8 %, farine de blé complet 15,2 %), sucre, huiles végétales (palme, colza), cacao maigre en poudre 4,5 %, sirop de glucose, amidon de blé, poudres à lever (carbonates d'ammonium, carbonates de sodium), émulsifiant (lécithines de soja), sel, lait écrémé en poudre, perméat de lactosérum (de lait), arômes. Peut contenir œuf.
1
2.0%
Pâte de cacao, beurre de cacao, cacao maigre, sucre, vanille.
1
2.0%
Coffret fourré au cacao (41,6%) et à la vanille (208) - Ingrédients Farine de blé, sucre, huile végétale non hydrogénée (huile de palme), filtrat de lait, poudre de cacao Émulsifiant à faible teneur en cacao (322) Lécithine de soja) Agent levant (5000) Sucre artificiel (vanilline) Sel Contient du lait, du blé (gluten) du soja
1
2.0%
Farine de blé 57%, sucre de canne roux, huile de colza, sésame toasté 10,6%, germe de blé 5,4%, farine complète de blé 5,4%, arôme naturel, magnésium, émulsifiant : lécithines, poudres à lever (tartrates de potassium, carbonates de sodium, carbonates d'ammonium), sel de mer, amidon de blé, vitamines (E, PP, B6, B1, B9).
1
2.0%
Pâte de cacao, cacao maigre en poudre, beurre de cacao, sucre, émulsifiant : lécithines (soja) ; extrait de vanille. Traces éventuelles de fruits à coque et de lait.
1
2.0%
Eau de source
1
2.0%
Farine de froment, sucre, graisse végétale, sucre inverti, agents levants ( bicarbonate d'ammonium - bicarbonate de sodium), sel, arome.
1
2.0%
Sucre, graisse vegetale de palmiste hidrogenée, Lait Enteir en poudre, Amandes, Cacao Dégraissé en poudre, lactoserum en poudre, Emulsifiant Lécithine de soja, Arômes (Vainilline).
1
2.0%
دقيقالقمح،رقائق الشوكولاته20%[عجينة زيت النخلة.الكاكاو،سكر،دكستروز و مستحلب
1
2.0%
Farine de froment, sucre, graisse végétale, noix de coco râpée, poudre de lait, poudre de lactosérum, sucre inverti, agents levants (bicarbonate d'ammonium - bicarbonate de Sodium), sel, arômes.
1
2.0%
Pâte à tartiner aux NOISETTES et au cacao 40% (sucre, huile de palme, NOISETTES 13%**, LAIT écrémé en poudre 8,7%**, cacao maigre 7,4%**, émulsifiants : lécithines [SOJA]; vanilline), farine de FROMENT 32,5%, graisses végétales (palme, palmiste), sucre de canne (contient BLE) 8,5%, LACTOSE, son de BLE, LAIT en poudre, miel, poudres à lever (diphosphate disodique, carbonate acide de sodium, carbonate acide d'ammonium), farine d'ORGE malté, cacao maigre en poudre, sel, extrait en poudre de malt d'ORGE et de maïs, amidon de FROMENT, émulsifiants: lécithines [SOJA]; vanilline.
1
2.0%
Farine complète de SEIGLE (77 g*), farine de SEIGLE (28 g*), levure, sel. Peut contenir des traces de LUPIN, LAIT, MOUTARDE, GRAINES DE SÉSAME et SOJA. *en g pour 100 g de produit.
1
2.0%
Pâte de cacao, sucre, beurre de cacao, vanille.
Peut contenir des fruits à coque, du lait, du soja et des graines de sésame.
1
2.0%
pâte de cacao*, beurre de cacao*, cacao maigre en poudre*, sucre de canne*, extrait de vanille*, * ingrédients issus de l'agriculture biologique
1
2.0%
Pâte de cacao, cacao maigre, beurre de cacao, cassonade, vanille
1
2.0%
Farine de blé* 41%, Chocolat noir* 22% (pâte de cacao*, sucre de canne", beurre de cacao"), Sucre de canne* roux non raffiné, Farine complète de blé* 16%, Huile de tournesol oléique*, Arôme naturel de vanille, Lait écrémé en poudre, Sel de mer, carbonates d'ammonium, carbonates de sodium, gomme d'acacia*, extraits de romarin* Peut contenir du soja, des œufs, des fruits à coque, des graines de sésame et de la moutarde. *Ingrédients biologiques.
1
2.0%
Pâte de cacao, sucre, beurre de cacao, cacao maigre en poudre, émulsifiant : lécithines (soja), arôme naturel de vanille.
1
2.0%
Farine complète de SEIGLE 59 g*, son de BLÉ 27 g*, flocons d'AVOINE 12 g*, GRAINES DE SÉSAME 7,0 g*, germe de BLÉ, sel. *en g pour 100 g de produit fini. Peut contenir des traces de LUPIN, LAIT, MOUTARDE et SOJA.
1
2.0%
ingredients_text_nl
categorical
Out[774]:
saturn.columns["ingredients_text_nl"].stats
stat
value
n
50
nulls
38 (76.0%)
unique
9
top_value
(blank)
top_rate
0.3333
cardinality
9
entropy
2.918
entropy_ratio
0.9206
alert: long_tail
8 singleton categories
alert: null_rate
76.0% null
Fig 189.
Top values for ingredients_text_nl.
Top values for ingredients_text_nl (9 unique shown, of 9 total).
*Referentie inname van een gemiddelde volwassehe (8400 kJ/ 2000 ReJI), 16,7 g 46x4, www,snackmindful,com Milka www,milka,com ER Mondelez France SAS, 6 avenue Réaumur, CS 50014, 92142 Clamart Cedex, Service Consommateurs Nº Cristal:09,69,39,79,79 BE Mondelez Belgium, Stationsstraat 100, 2800 Mechelen, ND Mondelez Nederland, Verlengde Poolseweg 34, 4818 CL Breda, eu mondelezinternational,com e 100 g COCOA LIFE www,cocoalife,org 8 FR FRANCE ONLY 05 pp 3 045140 105502
Top values for product_name_es (17 unique shown, of 17 total).
value
count
share
(blank)
4
8.0%
Príncipe Galletas de Chocolate
1
2.0%
Excellence chocolate 90% cacao
1
2.0%
Chocolate negro 85% cacao
1
2.0%
Nutella Biscuits
1
2.0%
Biscotes integrales original
1
2.0%
Excellence 85% cacao
1
2.0%
Chocolate negro 74% cacao
1
2.0%
Tostadas crujientes de fibra
1
2.0%
Original
1
2.0%
Excellence 70% Cocoa Intense Dark
1
2.0%
Chocolate negro Ecuador 70% cacao
1
2.0%
Nutella
1
2.0%
Crunchy Oats & Honey
1
2.0%
Excellence 99% Cacao Noir Absolu
1
2.0%
Chocolate Con Leche Milka
1
2.0%
Excellence suave 70% cacao
1
2.0%
data_sources_tags
unknown
Out[780]:
saturn.columns["data_sources_tags"].stats
stat
value
n
50
nulls
0 (0.0%)
unique
—
alert: skipped
no profiler for kind=unknown
data_quality_bugs_tags
unknown
Out[782]:
saturn.columns["data_quality_bugs_tags"].stats
stat
value
n
50
nulls
0 (0.0%)
unique
—
alert: skipped
no profiler for kind=unknown
obsolete_since_date
categorical
Out[784]:
saturn.columns["obsolete_since_date"].stats
stat
value
n
50
nulls
6 (12.0%)
unique
1
top_value
(blank)
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
alert: imbalance
top value is 100.0% of rows
Fig 191.
Top values for obsolete_since_date.
Top values for obsolete_since_date (1 unique shown, of 1 total).
value
count
share
(blank)
44
88.0%
weighers_tags
unknown
Out[787]:
saturn.columns["weighers_tags"].stats
stat
value
n
50
nulls
0 (0.0%)
unique
—
alert: skipped
no profiler for kind=unknown
ingredients_text_debug
categorical
Out[789]:
saturn.columns["ingredients_text_debug"].stats
stat
value
n
50
nulls
14 (28.0%)
unique
35
top_value
(blank)
top_rate
0.05556
cardinality
35
entropy
5.114
entropy_ratio
0.9971
alert: long_tail
34 singleton categories
alert: null_rate
28.0% null
Fig 192.
Top values for ingredients_text_debug.
Top values for ingredients_text_debug (20 unique shown, of 35 total).
value
count
share
(blank)
2
4.0%
Lait écrémé, créme, sucre, ferments lactiques. matière grosse 3% , sa première date de publication au maroc 01/10/1993 le changement du packaging 10 ans par 10 ans depuis vingt-cinq ans de l’offre
1
2.0%
Céréale 50,7 % (farine de blé 35 %, farine de blé complète 15,7 %), sucre, huiles végétales (palme, colza), cacao maigre en poudre 4,5 %, sirop de glucose, amidon de blé, poudre à lever : (carbonate acide d'ammonium, carbonate acide de sodium, diphosphate disodique), émulsifiants : (lécithine de soja, lécithine de tournesol), sel, lait écrémé en poudre, lactose et protéines de lait, arômes.
1
2.0%
Pâte de cacao, beurre de cacao, cacao maige, sucre, vanille. Cacao: 90% minimum.
1
2.0%
Farine de blé 55,1%, sucre de canne roux, huile de colza 14,3%, sésame toasté 11,6%, germe de blé 5,2%, levain de seigle dévitalisé en poudre, fibres d'avoine, calcium, sel de mer, arôme naturel, magnésium, émulsifiant : lécithines de colza, poudres à lever : (tartrates de potassium, carbonates de sodium, carbonates d'ammonium), acidifiant : acide malique, protéines de lait, amidon de blé, vitamines B1, B6, B9, PP et E (lactose, protéines de lait).
1
2.0%
Eau de source
1
2.0%
Farine de froment sucre, graisse végétale ,sucre inverti, agents levants ( bicarbonate d'ammonium-bicarbonate de sodium, sel , arome. Contient du gluten Peut contenir traces de lait et soja. Conserver dans un endroit frais et sec
1
2.0%
Sucre, graisse végétale de palmiste hydrogénée, _Lait_ entier en poudre, Amandes, Cacao dégraissé en poudre, _lactosérum_ en poudre, Émulsifiant : Lécithine de _soja_, Arômes (Vanilline).
1
2.0%
Pâte à tartiner aux _noisettes_ et au cacao 40% (sucre, huile de palme, _noisettes_ 13%, _lait_ écrémé en poudre 8,7%, cacao maigre 7,4%, émulsifiants : lécithines _soja_ ; vanilline), farine de _froment_ 32%, graisses végétales (palme, palmiste), sucre de canne 9%, _lactose_, son de _blé_, _lait_ en poudre, extrait en poudre de malt d'orge et de maïs, miel, poudres à lever : (disphosfate disodique, carbonate acide d'ammonium, carbonate acide de sodium), cacao maigre, sel, amidon de _froment_, farine d'_orge_ malté, lécithines _soja_ ; vanilline.
1
2.0%
Farine complète de _seigle_, farine de _seigle_ 29%, levure, sel.
1
2.0%
Pâte de cacao, sucre, beurre de cacao, vanille.
1
2.0%
Pomme de terre, huile de tournesol, sel de mer.
1
2.0%
pâte de cacao, cacao maigre, beurre de cacao, cassonade, vanille.
1
2.0%
Céréales 54%(*farine de _blé_, *farine complète de _blé_ (15%)), *chocolat noir (25%) (*pâte de cacao, *sucre de canne non raffiné, *beurre de cacao), *sucre de canne roux non raffiné, *huile de tournesol oléique (9,7%), arôme naturel de vanille, *_lait_ écrémé en poudre, sel de mer non raffiné, poudres à lever : carbonates d'ammonium et de sodium, épaississant : *gomme d'acacia, antioxydant : , *extraits de romarin.
en gras peuvent provoquer yne réaction chez tes personnes souffrant d'allergies d'intolérahces alimentaires. en g pour 100g de produit. ou 4
1
2.0%
farine de froment, sucre, Graisse végétale , Sucre inverti, Agents levants (Bicarbonate d'ammonium, Bicarbonate de sodium), arôme vanille
1
2.0%
Farine de _Blé_ 73.5 %, matière grasse végétale,extrait de malt d'_orge_, sirop de glucose, sel, poudre à lever : (carbonate acide d’ammonium, carbonate acide de sodium), _œufs_, agent de traitement de la farine : (_sulfite_ de sodium_), arôme
1
2.0%
Pasta de cacao, azúcar, manteca de cacao, vainilla Bourbon natural. (Cacao: 70% mínimo)
1
2.0%
Farine de _blé_ 68,4%, huile de colza, sirop de sucres issu de fruits, jus concentré de pomme 5,3%, _noisettes_ torréfiées 5,3%, germe de _blé_ 5,2%, fibres de chicorée : fructo-oligosaccharides, extrait de malt d'_orge_, arôme naturel de pomme, émulsifiant : lécithines de colza, amidon de _blé_, poudres à lever : (tartrates de potassium, carbonates de potassium, carbonates d‘ammonium), protéines de _lait_, vitamines B1, B2, B6, B9, PP et E (_lactose_, protéines de _lait_).
1
2.0%
link
categorical
Out[792]:
saturn.columns["link"].stats
stat
value
n
50
nulls
2 (4.0%)
unique
28
top_value
(blank)
top_rate
0.4375
cardinality
28
entropy
3.663
entropy_ratio
0.762
alert: long_tail
27 singleton categories
Fig 193.
Top values for link.
Top values for link (20 unique shown, of 28 total).
Distribution of created_t. Vertical dash marks the median.
Histogram bins for created_t (median: 1475927880.5).
bin
count
1.338e+09 – 1.393e+09
13
1.393e+09 – 1.448e+09
8
1.448e+09 – 1.503e+09
8
1.503e+09 – 1.558e+09
9
1.558e+09 – 1.614e+09
7
1.614e+09 – 1.669e+09
3
1.669e+09 – 1.724e+09
2
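The created_t values are easier to read as dates. The sketch below assumes the column holds Unix epoch seconds in UTC, which fits the magnitude of the values but is not stated in this report. Under that assumption the median falls on 8 October 2016.

```python
from datetime import datetime, timezone

def epoch_to_utc(seconds):
    """Convert Unix epoch seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

# Median and outer bin edges from the histogram (edges are rounded in the report).
points = [("first edge", 1.338e9), ("median", 1475927880.5), ("last edge", 1.724e9)]
for label, ts in points:
    print(label, epoch_to_utc(ts).date())
```

Read this way, the sample spans roughly 2012 to 2024, with the largest single bin (13 products) at the early end.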
ingredients_text_fr
categorical
Out[798]:
saturn.columns["ingredients_text_fr"].stats
stat
value
n
50
nulls
2 (4.0%)
unique
47
top_value
(blank)
top_rate
0.04167
cardinality
47
entropy
5.543
entropy_ratio
0.998
alert: long_tail
46 singleton categories
Fig 195.
Top values for ingredients_text_fr.
Top values for ingredients_text_fr (20 unique shown, of 47 total).
value
count
share
(blank)
2
4.0%
Lait écrémé, crème, SUcre, ferments laciques
1
2.0%
Céréale 50 % (Farine de blé 34,8 %, farine de blé complet 15,2 %), sucre, huiles végétales (palme, colza), cacao maigre en poudre 4,5 %, sirop de glucose, amidon de blé, poudres à lever (carbonates d'ammonium, carbonates de sodium), émulsifiant (lécithines de soja), sel, lait écrémé en poudre, perméat de lactosérum (de lait), arômes. Peut contenir œuf.
1
2.0%
Pâte de cacao, beurre de cacao, cacao maigre, sucre, vanille.
1
2.0%
Coffret fourré au cacao (41,6%) et à la vanille (208) - Ingrédients Farine de blé, sucre, huile végétale non hydrogénée (huile de palme), filtrat de lait, poudre de cacao Émulsifiant à faible teneur en cacao (322) Lécithine de soja) Agent levant (5000) Sucre artificiel (vanilline) Sel Contient du lait, du blé (gluten) du soja
1
2.0%
Farine de blé 57%, sucre de canne roux, huile de colza, sésame toasté 10,6%, germe de blé 5,4%, farine complète de blé 5,4%, arôme naturel, magnésium, émulsifiant : lécithines, poudres à lever (tartrates de potassium, carbonates de sodium, carbonates d'ammonium), sel de mer, amidon de blé, vitamines (E, PP, B6, B1, B9).
1
2.0%
Pâte de cacao, cacao maigre en poudre, beurre de cacao, sucre, émulsifiant : lécithines (soja) ; extrait de vanille. Traces éventuelles de fruits à coque et de lait.
1
2.0%
Eau de source
1
2.0%
Farine de froment, sucre, graisse végétale, sucre inverti, agents levants ( bicarbonate d'ammonium - bicarbonate de sodium), sel, arome.
1
2.0%
Sucre, graisse vegetale de palmiste hidrogenée, Lait Enteir en poudre, Amandes, Cacao Dégraissé en poudre, lactoserum en poudre, Emulsifiant Lécithine de soja, Arômes (Vainilline).
1
2.0%
دقيقالقمح،رقائق الشوكولاته20%[عجينة زيت النخلة.الكاكاو،سكر،دكستروز و مستحلب
1
2.0%
Farine de _froment_, sucre, graisse végétale, noix de coco râpée, poudre de _lait_, poudre de _lactosérum_, sucre inverti, agents levants (bicarbonate d'ammonium - bicarbonate de Sodium), sel, arômes.
1
2.0%
Pâte à tartiner aux NOISETTES et au cacao 40% (sucre, huile de palme, NOISETTES 13%**, LAIT écrémé en poudre 8,7%**, cacao maigre 7,4%**, émulsifiants : lécithines [SOJA]; vanilline), farine de FROMENT 32,5%, graisses végétales (palme, palmiste), sucre de canne (contient BLE) 8,5%, LACTOSE, son de BLE, LAIT en poudre, miel, poudres à lever (diphosphate disodique, carbonate acide de sodium, carbonate acide d'ammonium), farine d'ORGE malté, cacao maigre en poudre, sel, extrait en poudre de malt d'ORGE et de maïs, amidon de FROMENT, émulsifiants: lécithines [SOJA]; vanilline.
1
2.0%
Farine complète de SEIGLE (77 g*), farine de SEIGLE (28 g*), levure, sel. Peut contenir des traces de LUPIN, LAIT, MOUTARDE, GRAINES DE SÉSAME et SOJA. *en g pour 100 g de produit.
1
2.0%
Pâte de cacao, sucre, beurre de cacao, vanille.
Peut contenir des fruits à coque, du lait, du soja et des graines de sésame.
1
2.0%
pâte de cacao*, beurre de cacao*, cacao maigre en poudre*, sucre de canne*, extrait de vanille*, * ingrédients issus de l'agriculture biologique
1
2.0%
Pâte de cacao, cacao maigre, beurre de cacao, cassonade, vanille
1
2.0%
Farine de blé* 41%, Chocolat noir* 22% (pâte de cacao*, sucre de canne", beurre de cacao"), Sucre de canne* roux non raffiné, Farine complète de blé* 16%, Huile de tournesol oléique*, Arôme naturel de vanille, Lait écrémé en poudre, Sel de mer, carbonates d'ammonium, carbonates de sodium, gomme d'acacia*, extraits de romarin* Peut contenir du soja, des œufs, des fruits à coque, des graines de sésame et de la moutarde. *Ingrédients biologiques.
1
2.0%
Pâte de cacao, sucre, beurre de cacao, cacao maigre en poudre, émulsifiant : lécithines (_soja_), arôme naturel de vanille.
1
2.0%
Farine complète de SEIGLE 59 g*, son de BLÉ 27 g*, flocons d'AVOINE 12 g*, GRAINES DE SÉSAME 7,0 g*, germe de BLÉ, sel. *en g pour 100 g de produit fini. Peut contenir des traces de LUPIN, LAIT, MOUTARDE et SOJA.
Fair trade, Organic, Vegetarian, EU Organic, Fairtrade International, Vegan, Soil Association Organic, The Vegan Society, Commerce équitable
1
2.0%
Point Vert, Non-bio, Triman
1
2.0%
Sans conservateurs, Fabriqué en France, Triman
1
2.0%
Sans gluten, Végétarien, Sans arômes artificiels, Végétalien, Assured Food Standards, Point Vert, Sans colorants artificiels, Sans exhausteur de goût, Sans glutamate, en:Made-in-england, en:Terracycle
1
2.0%
Organic, Vegetarian, EU Organic, Fair trade, Non-EU Agriculture, Vegan, Fairtrade International, FR-BIO-01, FSC, FSC Mix, Green Dot, Max Havelaar, PL-EKO-07, Soil Association Organic, The Vegan Society
1
2.0%
Agriculture non UE, Fabriqué en Belgique, Fabriqué en France, Sans huile de palme, Triman
Source de fibres alimentaires,Point Vert,Riche en fibres,Triman,Emballage-recyclable
1
2.0%
Halal
1
2.0%
Vegetariano,Vegano,Punto Verde
1
2.0%
Commerce équitable, Sans gluten, Bio, Végétarien, Épi barré, Bio européen, Kascher, Végétalien, Point Vert, Fabriqué en France, Nutriscore, Nutriscore A, The Vegan Society, AB Agriculture Biologique, Afdiag
1
2.0%
Peu ou pas de sucre, Peu de sucre, Pauvre ou sans sodium, Sans conservateurs, Agriculture non UE, Allégé en sucre, Riche en vitamine E, Source de fibres alimentaires, Agriculture durable, Enrichi en vitamines, Agriculture UE, Agriculture UE/Non UE, Riche en fibres, Faible teneur en sodium, Fabriqué en France, Arômes naturels, Sans colorants, Sans colorants ou conservateurs, Sans huile de palme, Nutriscore, Nutriscore A, Riche en vitamine B1, Riche en vitamine B9, Source de vitamine B6, Sans édulcorants, Farine de blé français, Triman
This column likely represents a count of scans per product (most plausibly barcode scans), with 50 records and no nulls. The bulk of values sit in a moderate range (Q1=387, median=492, Q3=604), but extreme positive skew (3.90) and very high kurtosis (18.72) are driven by 4 outliers (8% of rows) reaching up to 2523, about 5× the median. The min of 333 suggests a natural floor, possibly a minimum scan threshold or truncation artefact.
Treatment: Investigate the 4 outliers before modelling; apply a log transform or robust scaling to reduce the impact of skew in regression or distance-based models.
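As a rough illustration of the log transform suggested in the treatment, the sketch below applies log1p to the five summary values reported for scans_n. These stand in for the raw rows, which are not shown here, and the spread_ratio helper is a hypothetical measure chosen only to make the effect visible.

```python
import math

# Five summary values reported for scans_n (min, q1, median, q3, max),
# used as stand-in data because the raw rows are not available.
summary = [333, 387, 492, 604, 2523]

logged = [math.log1p(v) for v in summary]

def spread_ratio(values):
    """Distance of the maximum above the median, in units of the interquartile range."""
    lo, q1, med, q3, hi = values
    return (hi - med) / (q3 - q1)

print(round(spread_ratio(summary), 2))  # raw scale
print(round(spread_ratio(logged), 2))   # after log1p
```

The ratio shrinks after the transform, meaning the maximum sits fewer IQRs above the median. Whether that is enough for a given model is a separate question that the raw data would need to answer.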
Out[836]:
saturn.columns["scans_n"].stats
stat
value
n
50
nulls
0 (0.0%)
unique
49
min
333
max
2,523
mean
577.9
median
492
std
343.9
q1
387
q3
604
iqr
217
skew
3.899
kurtosis
18.72
n_outliers
4
outlier_rate
0.08
zero_rate
0
alert: high_skew
skew=+3.90
alert: outliers
8.0% rows beyond 1.5 IQR
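The outlier alert can be checked against the quartiles above. The sketch assumes the alert uses Tukey's rule (values below Q1 − 1.5·IQR or above Q3 + 1.5·IQR); the sample list is hypothetical apart from the reported min and max.

```python
def tukey_fences(q1, q3, k=1.5):
    """Lower and upper fences; values outside them count as outliers."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

low, high = tukey_fences(387, 604)
print(low, high)  # 61.5 929.5

# Hypothetical values for illustration; only 333 and 2523 (min and max) come from the report.
sample = [333, 492, 900, 1100, 2523]
print([v for v in sample if v < low or v > high])  # [1100, 2523]
```

Because the reported minimum (333) is above the lower fence of 61.5, all four flagged rows must lie on the high side, above 929.5.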
Fig 205.
Distribution of scans_n. Vertical dash marks the median.
Top values for carbon_footprint_from_known_ingredients_debug.
Top values for carbon_footprint_from_known_ingredients_debug (14 unique shown, of 14 total).
value
count
share
en:cereal 50% x 0.3 = 15 g -
1
2.0%
en:wheat-flour 55.1% x 1.2 = 66.12 g -
1
2.0%
en:wheat-flour 32% x 1.2 = 38.4 g - en:cane-sugar 9% x 1.3 = 11.7 g -
1
2.0%
en:wholemeal-rye-flour 77% x 1.2 = 92.4 g - en:rye-flour 28% x 1.2 = 33.6 g -
1
2.0%
en:wheat-flour 39% x 1.2 = 46.8 g - en:dark-chocolate 25% x 4.9 = 122.5 g - en:whole-wheat-flour 15% x 1.2 = 18 g -
1
2.0%
en:wholemeal-rye-flour 59% x 1.2 = 70.8 g - en:wheat-bran 27% x 0.6 = 16.2 g - en:oat-flakes 12% x 0.3 = 3.6 g -
1
2.0%
en:wheat-flour 68.5% x 1.2 = 82.2 g - en:wheat-germ 5.2% x 0.6 = 3.12 g -
1
2.0%
en:hazelnut-oil 13% x 2.6 = 33.8 g -
1
2.0%
en:whole-wheat-flour 26.5% x 1.2 = 31.8 g - en:wheat-flour 26.1% x 1.2 = 31.32 g - en:wheat-bran 19.9% x 0.6 = 11.94 g - en:fig-paste 5.1% x 0.3 = 1.53 g -
1
2.0%
en:wheat-flour 41% x 1.2 = 49.2 g - en:fresh-egg 11% x 2.6 = 28.6 g -
1
2.0%
en:walnut-kernel 25% x 1.3 = 32.5 g - en:almond 25% x 5.9 = 147.5 g - en:cranberry 25% x 0.3 = 7.5 g -
1
2.0%
en:whole-fresh-eggs 8% x 2.6 = 20.8 g -
1
2.0%
en:wheat-flour 37% x 1.2 = 44.4 g - en:milk-chocolate 27% x 5.9 = 159.3 g - en:whole-wheat-flour 12% x 1.2 = 14.4 g -
1
2.0%
en:cereal 98.3% x 0.3 = 29.49 g -
1
2.0%
packaging_text_ar
categorical
metadata
This column appears to hold Arabic-language packaging text, but it is effectively empty: 80% of the 50 rows are null, and the remaining 10 non-null rows contain only an empty string — giving a single unique value with top_rate of 1.0 and zero entropy. The column carries no information whatsoever in this dataset snapshot.
Treatment: Drop this column; it contains no usable signal (100% null or empty string across all rows).
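A check for this "null or empty string only" pattern could look like the sketch below. It is a minimal illustration over plain Python lists, not the profiler's own logic. `is_effectively_empty`, `droppable_columns`, and the miniature `table` are hypothetical, and treating whitespace-only strings as blank is an added assumption.

```python
def is_effectively_empty(values):
    """True when every entry is None or a blank (whitespace-only) string."""
    return all(
        v is None or (isinstance(v, str) and v.strip() == "")
        for v in values
    )


def droppable_columns(table):
    """Return the names of columns whose values are all null or blank."""
    return [name for name, values in table.items() if is_effectively_empty(values)]


# Hypothetical miniature mirroring the reported shape:
# 40 nulls plus 10 empty strings for packaging_text_ar.
table = {
    "packaging_text_ar": [None] * 40 + [""] * 10,
    "last_checker": [None] * 43
    + ["aleene"] * 3
    + ["moon-rabbit"] * 2
    + ["beniben", "sebleouf"],
}
to_drop = droppable_columns(table)
```

Counting empty strings together with nulls matters here: a null-rate threshold alone would see this column as 80% null and might keep it.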
Out[858]:
saturn.columns["packaging_text_ar"].stats
stat
value
n
50
nulls
40 (80.0%)
unique
1
top_value
(empty string)
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
alert: null_rate
80.0% null
alert: imbalance
top value is 100.0% of rows
Fig 211.
Top values for packaging_text_ar.
Top values for packaging_text_ar (1 unique shown, of 1 total).
value
count
share
(empty string)
10
20.0%
generic_name_uk
categorical
Out[861]:
saturn.columns["generic_name_uk"].stats
stat
value
n
50
nulls
49 (98.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
alert: long_tail
1 singleton categories
alert: null_rate
98.0% null
alert: imbalance
top value is 100.0% of rows
Fig 212.
Top values for generic_name_uk.
Top values for generic_name_uk (1 unique shown, of 1 total).
value
count
share
1
2.0%
last_checker
categorical
Out[864]:
saturn.columns["last_checker"].stats
stat
value
n
50
nulls
43 (86.0%)
unique
4
top_value
aleene
top_rate
0.4286
cardinality
4
entropy
1.842
entropy_ratio
0.9212
alert: null_rate
86.0% null
Fig 213.
Top values for last_checker.
Top values for last_checker (4 unique shown, of 4 total).
value
count
share
aleene
3
6.0%
moon-rabbit
2
4.0%
beniben
1
2.0%
sebleouf
1
2.0%
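The entropy and entropy_ratio reported for last_checker can be reproduced from the value counts above. The sketch assumes Shannon entropy in bits over non-null values, normalized by log2 of the cardinality. The report does not document its formula, and `entropy_stats` is a hypothetical helper.

```python
import math
from collections import Counter


def entropy_stats(values):
    """Shannon entropy (bits) of non-null values and its ratio to log2(cardinality)."""
    counts = Counter(v for v in values if v is not None)
    total = sum(counts.values())
    entropy = sum(-(c / total) * math.log2(c / total) for c in counts.values())
    k = len(counts)
    # A single category has no uncertainty, so the ratio is defined as 0.
    ratio = entropy / math.log2(k) if k > 1 else 0.0
    return entropy, ratio


# Counts from the last_checker table: 43 nulls, then 3, 2, 1, 1.
values = (
    [None] * 43
    + ["aleene"] * 3
    + ["moon-rabbit"] * 2
    + ["beniben", "sebleouf"]
)
entropy, ratio = entropy_stats(values)
```

Under these assumptions the results round to 1.842 and 0.9212, matching the stats for this column.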
checked
categorical
feature
This column appears to be a binary checkbox field (HTML-style 'on'/'off'), but only the value 'on' is ever recorded: cardinality is 1, with 'on' appearing in all 7 non-null rows. The 86% null rate is the dominant signal. Nulls most likely represent an unchecked state rather than missing data, in which case the column encodes a boolean with an unconventional null-as-false convention, though this cannot be confirmed from the stats alone. Zero entropy confirms there is no variation among non-null values.
Treatment: Recode nulls as 0 and 'on' as 1 to produce a proper boolean/integer column before modelling.
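The recoding described above can be sketched as follows, assuming nulls arrive as `None` and that 'on' is the only legitimate non-null value (as in this sample). `recode_checked` is a hypothetical helper, not part of Saturn.

```python
def recode_checked(values):
    """Map 'on' to 1 and null (None) to 0; raise on anything unexpected."""
    out = []
    for v in values:
        if v is None:
            out.append(0)
        elif v == "on":
            out.append(1)
        else:
            raise ValueError(f"unexpected value in checked column: {v!r}")
    return out


# Shape reported in the stats: 43 nulls and 7 'on' values.
checked = [None] * 43 + ["on"] * 7
flags = recode_checked(checked)
```

Raising on unknown values rather than silently mapping them to 0 keeps the null-as-false assumption visible if a later snapshot introduces an explicit 'off'.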
Out[867]:
saturn.columns["checked"].stats
stat
value
n
50
nulls
43 (86.0%)
unique
1
top_value
on
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
alert: null_rate
86.0% null
alert: imbalance
top value is 100.0% of rows
Fig 214.
Top values for checked.
Top values for checked (1 unique shown, of 1 total).
value
count
share
on
7
14.0%
product_name_ar
categorical
Out[870]:
saturn.columns["product_name_ar"].stats
stat
value
n
50
nulls
39 (78.0%)
unique
6
top_value
top_rate
0.5455
cardinality
6
entropy
2.049
entropy_ratio
0.7928
alert: long_tail
5 singleton categories
alert: null_rate
78.0% null
Fig 215.
Top values for product_name_ar.
Top values for product_name_ar (6 unique shown, of 6 total).
Distribution of carbon_footprint_percent_of_known_ingredients. Vertical dash marks the median.
Histogram bins for carbon_footprint_percent_of_known_ingredients (median: 70.0).
bin
count
8 – 27.4
3
27.4 – 46.8
2
46.8 – 66.2
3
66.2 – 85.6
8
85.6 – 105
3
origin_ar
categorical
other
This column appears to be an Arabic-language origin field ('origin_ar') that is almost entirely empty. With an 80% null rate and cardinality of 1, the sole 'unique' value is itself an empty string appearing 10 times across 50 rows — meaning the column contains no actual data at all. This is a fully degenerate column with zero informational content.
Treatment: Drop — column carries no information (100% null or empty string, entropy 0.0).
Out[891]:
saturn.columns["origin_ar"].stats
stat
value
n
50
nulls
40 (80.0%)
unique
1
top_value
(empty string)
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
alert: null_rate
80.0% null
alert: imbalance
top value is 100.0% of rows
Fig 222.
Top values for origin_ar.
Top values for origin_ar (1 unique shown, of 1 total).
Kakavova masa, manjmasten kakavov prah, kakavovo maslo, sladkor, emulgator: lecitini (sojin lecitin); ekstrakt vanilije. Lahko vsebuje sledi oreškov (lešniki, mandlji, pistacija) in mleka. Uporabno najmanj do: glej odtis na zadnji strani embalaže.
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
alert: long_tail
1 singleton categories
alert: null_rate
98.0% null
alert: imbalance
top value is 100.0% of rows
Fig 225.
Top values for ingredients_text_with_allergens_sl.
Top values for ingredients_text_with_allergens_sl (1 unique shown, of 1 total).
value
count
share
Kakavova masa, manjmasten kakavov prah, kakavovo maslo, sladkor, emulgator: lecitini (sojin lecitin); ekstrakt vanilije. Lahko vsebuje sledi oreškov (lešniki, mandlji, pistacija) in mleka. Uporabno najmanj do: glej odtis na zadnji strani embalaže.
1
2.0%
packaging_text_sk
categorical
Out[909]:
saturn.columns["packaging_text_sk"].stats
stat
value
n
50
nulls
49 (98.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
alert: long_tail
1 singleton categories
alert: null_rate
98.0% null
alert: imbalance
top value is 100.0% of rows
Fig 226.
Top values for packaging_text_sk.
Top values for packaging_text_sk (1 unique shown, of 1 total).
Какаова маса, нискомаслено какао на прах, какаово масло, захар, емулгатор: лецитин (соеви), екстракт от ванилия, Може да съдържа следи от ядки и мляко,
top_rate
0.3333
cardinality
3
entropy
1.585
entropy_ratio
1
alert: long_tail
3 singleton categories
alert: null_rate
94.0% null
Fig 227.
Top values for ingredients_text_with_allergens_bg.
Top values for ingredients_text_with_allergens_bg (3 unique shown, of 3 total).
value
count
share
Какаова маса, нискомаслено какао на прах, какаово масло, захар, емулгатор: лецитин (соеви), екстракт от ванилия, Може да съдържа следи от ядки и мляко,
1
2.0%
1
2.0%
Захар, палмово масло, ЛЕШНИЦИ (13%), обезмаслено МЛЯКО на прах (8,7%), нискомаслено какао на прах (7,4%), емулгатор: лецитини (СОЯ), ванилин.
1
2.0%
ingredients_text_pt
categorical
Out[915]:
saturn.columns["ingredients_text_pt"].stats
stat
value
n
50
nulls
40 (80.0%)
unique
4
top_value
top_rate
0.7
cardinality
4
entropy
1.357
entropy_ratio
0.6784
alert: long_tail
3 singleton categories
alert: null_rate
80.0% null
Fig 228.
Top values for ingredients_text_pt.
Top values for ingredients_text_pt (4 unique shown, of 4 total).
value
count
share
7
14.0%
Creme para barrar de AVELAS e cacau 40% (açúcar, gordura de palma, AVELAS (13%), LEITE desnatado em pó (8,7%), cacau magro (7,4%), emulsionantes: lecitinas (SOJA), vanilina), farinha de TRIGO (32,5%), gorduras vegetais (palma, palmiste), açúcar de cana (contém TRIGO) (8,5%), LACTOSE, farelo de TRIGO, LEITE inteiro em pó, mel, levedantes químicos (difosfato dissódico, hidrogenocarbonato de sódio, hidrogenocarbonato de amónio), farinha de CEVADA maltada, cacau magro, sal, extrato em pó de malte de CEVADA e milho, amido de TRIGO, emulsionantes: lecitinas (SOJA), vanilina.
1
2.0%
Farinha de _TRIGO_, gordura de palma, xarope de glucose, extrato de _CEVADA_ malteada, levedantes (carbonatos de amónio, carbonatos de sódio), sal, _OVOS_, aroma, agente de tratamento da farinha (_METABISSULFITO_ de sódio).
1
2.0%
Pasta de cacau, açúcar, manteiga de cacau, baunilha.
1
2.0%
ingredients_text_dz
categorical
Out[918]:
saturn.columns["ingredients_text_dz"].stats
stat
value
n
50
nulls
49 (98.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
alert: long_tail
1 singleton categories
alert: null_rate
98.0% null
alert: imbalance
top value is 100.0% of rows
Fig 229.
Top values for ingredients_text_dz.
Top values for ingredients_text_dz (1 unique shown, of 1 total).
value
count
share
1
2.0%
generic_name_ca
categorical
Out[921]:
saturn.columns["generic_name_ca"].stats
stat
value
n
50
nulls
48 (96.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
alert: null_rate
96.0% null
alert: imbalance
top value is 100.0% of rows
Fig 230.
Top values for generic_name_ca.
Top values for generic_name_ca (1 unique shown, of 1 total).
value
count
share
2
4.0%
generic_name_bg
categorical
label
This column appears to be a Bulgarian-language generic name field (the '_bg' suffix marks the locale of a localized generic product name), but it is almost entirely absent: 94% of rows are null and the remaining 3 non-null rows contain only an empty string. With cardinality of 1 and entropy of 0, the column carries zero information.
Treatment: Drop this column; it is 94% null and the only observed value is an empty string, making it analytically useless.
Out[924]:
saturn.columns["generic_name_bg"].stats
stat
value
n
50
nulls
47 (94.0%)
unique
1
top_value
(empty string)
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
alert: null_rate
94.0% null
alert: imbalance
top value is 100.0% of rows
Fig 231.
Top values for generic_name_bg.
Top values for generic_name_bg (1 unique shown, of 1 total).
value
count
share
(empty string)
3
6.0%
origin_sl
categorical
Out[927]:
saturn.columns["origin_sl"].stats
stat
value
n
50
nulls
49 (98.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
alert: long_tail
1 singleton categories
alert: null_rate
98.0% null
alert: imbalance
top value is 100.0% of rows
Fig 232.
Top values for origin_sl.
Top values for origin_sl (1 unique shown, of 1 total).
value
count
share
1
2.0%
product_name_et
categorical
Out[930]:
saturn.columns["product_name_et"].stats
stat
value
n
50
nulls
47 (94.0%)
unique
3
top_value
Chocolat noir - 85% cacao
top_rate
0.3333
cardinality
3
entropy
1.585
entropy_ratio
1
alert: long_tail
3 singleton categories
alert: null_rate
94.0% null
Fig 233.
Top values for product_name_et.
Top values for product_name_et (3 unique shown, of 3 total).
value
count
share
Chocolat noir - 85% cacao
1
2.0%
1
2.0%
Excellence 70% Cocoa Intense Dark
1
2.0%
origin_et
categorical
metadata
This column appears to be an Estonian-language origin field (the '_et' suffix is the ISO 639-1 code for Estonian), but it is effectively empty: 94% of the 50 rows are null, and the sole non-null value present is an empty string appearing 3 times. With cardinality of 1 and entropy of 0.0, the column carries zero information. This is likely an unfilled localization field.
Treatment: Drop this column; it contains no usable signal (94% null, sole value is empty string).
Out[933]:
saturn.columns["origin_et"].stats
stat
value
n
50
nulls
47 (94.0%)
unique
1
top_value
(empty string)
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
alert: null_rate
94.0% null
alert: imbalance
top value is 100.0% of rows
Fig 234.
Top values for origin_et.
Top values for origin_et (1 unique shown, of 1 total).
Distribution of nutrition_score_warning_nutriments_estimated. Vertical dash marks the median.
Histogram bins for nutrition_score_warning_nutriments_estimated (median: 1.0).
bin
count
0.5 – 0.7
0
0.7 – 0.9
0
0.9 – 1.1
2
1.1 – 1.3
0
1.3 – 1.5
0
ingredients_text_sk
categorical
Out[945]:
saturn.columns["ingredients_text_sk"].stats
stat
value
n
50
nulls
49 (98.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
alert: long_tail
1 singleton categories
alert: null_rate
98.0% null
alert: imbalance
top value is 100.0% of rows
Fig 238.
Top values for ingredients_text_sk.
Top values for ingredients_text_sk (1 unique shown, of 1 total).
value
count
share
1
2.0%
generic_name_pt
categorical
Out[948]:
saturn.columns["generic_name_pt"].stats
stat
value
n
50
nulls
40 (80.0%)
unique
3
top_value
top_rate
0.8
cardinality
3
entropy
0.9219
entropy_ratio
0.5817
alert: long_tail
2 singleton categories
alert: null_rate
80.0% null
Fig 239.
Top values for generic_name_pt.
Top values for generic_name_pt (3 unique shown, of 3 total).
value
count
share
8
16.0%
Bolachas recheadas de creme para barrar de avelãs e cacau NUTELLA®
1
2.0%
Chocolate extrafino com 70% de cacau
1
2.0%
ingredients_text_bg
categorical
Out[951]:
saturn.columns["ingredients_text_bg"].stats
stat
value
n
50
nulls
47 (94.0%)
unique
3
top_value
Какаова маса, нискомаслено какао на прах, какаово масло, захар, емулгатор: лецитин (соеви), екстракт от ванилия, Може да съдържа следи от ядки и мляко,
top_rate
0.3333
cardinality
3
entropy
1.585
entropy_ratio
1
alert: long_tail
3 singleton categories
alert: null_rate
94.0% null
Fig 240.
Top values for ingredients_text_bg.
Top values for ingredients_text_bg (3 unique shown, of 3 total).
value
count
share
Какаова маса, нискомаслено какао на прах, какаово масло, захар, емулгатор: лецитин (соеви), екстракт от ванилия, Може да съдържа следи от ядки и мляко,
1
2.0%
1
2.0%
Захар, палмово масло, ЛЕШНИЦИ (13%), обезмаслено МЛЯКО на прах (8,7%), нискомаслено какао на прах (7,4%), емулгатор: лецитини (СОЯ), ванилин.
1
2.0%
packaging_text_et
categorical
free_text
This column contains Estonian-language packaging text (`_et` locale suffix), but is effectively empty: 94% of its 50 rows are null, and the sole non-null value across all 3 populated rows is an empty string. With cardinality of 1 and entropy of 0.0, the column carries zero information — it has never been populated in this dataset.
Treatment: Drop — 94% null rate and only empty-string values provide no usable signal.
Out[954]:
saturn.columns["packaging_text_et"].stats
stat
value
n
50
nulls
47 (94.0%)
unique
1
top_value
(empty string)
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
alert: null_rate
94.0% null
alert: imbalance
top value is 100.0% of rows
Fig 241.
Top values for packaging_text_et.
Top values for packaging_text_et (1 unique shown, of 1 total).
value
count
share
(empty string)
3
6.0%
product_name_sk
categorical
Out[957]:
saturn.columns["product_name_sk"].stats
stat
value
n
50
nulls
49 (98.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
alert: long_tail
1 singleton categories
alert: null_rate
98.0% null
alert: imbalance
top value is 100.0% of rows
Fig 242.
Top values for product_name_sk.
Top values for product_name_sk (1 unique shown, of 1 total).
value
count
share
1
2.0%
ingredients_text_ca
categorical
Out[960]:
saturn.columns["ingredients_text_ca"].stats
stat
value
n
50
nulls
48 (96.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
alert: null_rate
96.0% null
alert: imbalance
top value is 100.0% of rows
Fig 243.
Top values for ingredients_text_ca.
Top values for ingredients_text_ca (1 unique shown, of 1 total).
Top values for ingredients_text_with_allergens_ca.
Top values for ingredients_text_with_allergens_ca (1 unique shown, of 1 total).
value
count
share
1
2.0%
product_name_dz
categorical
Out[966]:
saturn.columns["product_name_dz"].stats
stat
value
n
50
nulls
49 (98.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
alert: long_tail
1 singleton categories
alert: null_rate
98.0% null
alert: imbalance
top value is 100.0% of rows
Fig 245.
Top values for product_name_dz.
Top values for product_name_dz (1 unique shown, of 1 total).
value
count
share
1
2.0%
product_name_sl
categorical
Out[969]:
saturn.columns["product_name_sl"].stats
stat
value
n
50
nulls
49 (98.0%)
unique
1
top_value
ARRIBA 85% cacao
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
alert: long_tail
1 singleton categories
alert: null_rate
98.0% null
alert: imbalance
top value is 100.0% of rows
Fig 246.
Top values for product_name_sl.
Top values for product_name_sl (1 unique shown, of 1 total).
value
count
share
ARRIBA 85% cacao
1
2.0%
origin_sk
categorical
Out[972]:
saturn.columns["origin_sk"].stats
stat
value
n
50
nulls
49 (98.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
alert: long_tail
1 singleton categories
alert: null_rate
98.0% null
alert: imbalance
top value is 100.0% of rows
Fig 247.
Top values for origin_sk.
Top values for origin_sk (1 unique shown, of 1 total).
value
count
share
1
2.0%
generic_name_et
categorical
label
This column appears to be an Estonian-language generic name field ('et' locale suffix), but it is effectively empty: 94% of its 50 rows are null, and the sole non-null value is a blank string appearing 3 times, giving a cardinality of 1. The column carries zero information — entropy is 0.0 and top_rate is 1.0 across a single empty token.
Treatment: Drop this column; it contains no usable data (94% null, remaining values are blank strings).
Out[975]:
saturn.columns["generic_name_et"].stats
stat
value
n
50
nulls
47 (94.0%)
unique
1
top_value
(empty string)
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
alert: null_rate
94.0% null
alert: imbalance
top value is 100.0% of rows
Fig 248.
Top values for generic_name_et.
Top values for generic_name_et (1 unique shown, of 1 total).
Top values for packaging_text_ca (1 unique shown, of 1 total).
value
count
share
2
4.0%
packaging_text_sl
categorical
Out[984]:
saturn.columns["packaging_text_sl"].stats
stat
value
n
50
nulls
49 (98.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
alert: long_tail
1 singleton categories
alert: null_rate
98.0% null
alert: imbalance
top value is 100.0% of rows
Fig 251.
Top values for packaging_text_sl.
Top values for packaging_text_sl (1 unique shown, of 1 total).
value
count
share
1
2.0%
generic_name_dz
categorical
Out[987]:
saturn.columns["generic_name_dz"].stats
stat
value
n
50
nulls
49 (98.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
alert: long_tail
1 singleton categories
alert: null_rate
98.0% null
alert: imbalance
top value is 100.0% of rows
Fig 252.
Top values for generic_name_dz.
Top values for generic_name_dz (1 unique shown, of 1 total).
value
count
share
1
2.0%
origin_ca
categorical
Out[990]:
saturn.columns["origin_ca"].stats
stat
value
n
50
nulls
48 (96.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
alert: null_rate
96.0% null
alert: imbalance
top value is 100.0% of rows
Fig 253.
Top values for origin_ca.
Top values for origin_ca (1 unique shown, of 1 total).
value
count
share
2
4.0%
product_name_ca
categorical
Out[993]:
saturn.columns["product_name_ca"].stats
stat
value
n
50
nulls
48 (96.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
alert: null_rate
96.0% null
alert: imbalance
top value is 100.0% of rows
Fig 254.
Top values for product_name_ca.
Top values for product_name_ca (1 unique shown, of 1 total).
value
count
share
2
4.0%
packaging_text_pt
categorical
free_text
This column appears to be a Portuguese-language packaging text field, almost certainly intended to carry product label or packaging descriptions. With an 80% null rate and the sole non-null value being an empty string appearing 10 times, the column contains zero usable information across all 50 rows. The effective data-present rate is 0%, making this column entirely empty in practice.
Treatment: Drop this column; it carries no information and all present values are empty strings.
Out[996]:
saturn.columns["packaging_text_pt"].stats
stat
value
n
50
nulls
40 (80.0%)
unique
1
top_value
(empty string)
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
alert: null_rate
80.0% null
alert: imbalance
top value is 100.0% of rows
Fig 255.
Top values for packaging_text_pt.
Top values for packaging_text_pt (1 unique shown, of 1 total).
value
count
share
(empty string)
10
20.0%
origin_bg
categorical
other
This column ('origin_bg') is a categorical field with 50 rows, but 94% of values are null and the sole non-null value is an empty string appearing 3 times — making it entirely devoid of usable information. Cardinality is 1, entropy is 0, and top_rate is 1.0, confirming complete uniformity across non-null entries. Both alerts (null_rate and imbalance) are triggered, which is expected given the near-total absence of data.
Treatment: Drop this column; it carries zero information with 94% nulls and only empty strings remaining.
Out[999]:
saturn.columns["origin_bg"].stats
stat
value
n
50
nulls
47 (94.0%)
unique
1
top_value
(empty string)
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
alert: null_rate
94.0% null
alert: imbalance
top value is 100.0% of rows
Fig 256.
Top values for origin_bg.
Top values for origin_bg (1 unique shown, of 1 total).
value
count
share
(empty string)
3
6.0%
packaging_text_bg
categorical
free_text
This column contains Bulgarian-language packaging text for products, but it is almost entirely empty: 94% of the 50 rows are null, and the sole non-null value observed is an empty string appearing 3 times (top_rate 1.0). With cardinality of 1 and entropy of 0.0, the column carries zero information in its current state.
Treatment: Drop from modelling; re-evaluate only if Bulgarian market data is backfilled, otherwise exclude as zero-variance.
Out[1002]:
saturn.columns["packaging_text_bg"].stats
stat
value
n
50
nulls
47 (94.0%)
unique
1
top_value
(empty string)
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
alert: null_rate
94.0% null
alert: imbalance
top value is 100.0% of rows
Fig 257.
Top values for packaging_text_bg.
Top values for packaging_text_bg (1 unique shown, of 1 total).
value
count
share
(empty string)
3
6.0%
origin_pt
categorical
other
This column is most likely the Portuguese-language origin field (the '_pt' suffix matches the locale suffix on sibling columns such as generic_name_pt), and it is almost entirely empty: 80% of its 50 rows are null, and the only non-null value present is an empty string appearing 10 times, so the column contains no actual information. With a cardinality of 1 and entropy of 0.0, it is completely invariant. The combination of a high null rate and an empty string as the sole value suggests the field was never populated in this dataset.
Treatment: Drop — column carries zero information due to 80% nulls and a single empty-string value across all remaining rows.
Out[1005]:
saturn.columns["origin_pt"].stats
stat
value
n
50
nulls
40 (80.0%)
unique
1
top_value
(empty string)
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
alert: null_rate
80.0% null
alert: imbalance
top value is 100.0% of rows
Fig 258.
Top values for origin_pt.
Top values for origin_pt (1 unique shown, of 1 total).
Top values for ingredients_text_with_allergens_pt.
Top values for ingredients_text_with_allergens_pt (4 unique shown, of 4 total).
value
count
share
5
10.0%
Creme para barrar de AVELAS e cacau 40% (açúcar, gordura de palma, AVELAS (13%), LEITE desnatado em pó (8,7%), cacau magro (7,4%), emulsionantes: lecitinas (SOJA), vanilina), farinha de TRIGO (32,5%), gorduras vegetais (palma, palmiste), açúcar de cana (contém TRIGO) (8,5%), LACTOSE, farelo de TRIGO, LEITE inteiro em pó, mel, levedantes químicos (difosfato dissódico, hidrogenocarbonato de sódio, hidrogenocarbonato de amónio), farinha de CEVADA maltada, cacau magro, sal, extrato em pó de malte de CEVADA e milho, amido de TRIGO, emulsionantes: lecitinas (SOJA), vanilina.
1
2.0%
Farinha de TRIGO, gordura de palma, xarope de glucose, extrato de CEVADA malteada, levedantes (carbonatos de amónio, carbonatos de sódio), sal, OVOS, aroma, agente de tratamento da farinha (METABISSULFITO de sódio).
1
2.0%
Pasta de cacau, açúcar, manteiga de cacau, baunilha.
1
2.0%
product_name_bg
categorical
Out[1011]:
saturn.columns["product_name_bg"].stats
stat
value
n
50
nulls
47 (94.0%)
unique
3
top_value
Шоколад 85% какаова маса
top_rate
0.3333
cardinality
3
entropy
1.585
entropy_ratio
1
alert: long_tail
3 singleton categories
alert: null_rate
94.0% null
Fig 260.
Top values for product_name_bg.
Top values for product_name_bg (3 unique shown, of 3 total).
value
count
share
Шоколад 85% какаова маса
1
2.0%
Тъмен шоколад 74% какао
1
2.0%
Лешниково-какаов крем
1
2.0%
ingredients_text_sl
categorical
Out[1014]:
saturn.columns["ingredients_text_sl"].stats
stat
value
n
50
nulls
49 (98.0%)
unique
1
top_value
Kakavova masa, manjmasten kakavov prah, kakavovo maslo, sladkor, emulgator: lecitini (sojin lecitin); ekstrakt vanilije. Lahko vsebuje sledi oreškov (lešniki, mandlji, pistacija) in mleka. Uporabno najmanj do: glej odtis na zadnji strani embalaže.
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
alert: long_tail
1 singleton categories
alert: null_rate
98.0% null
alert: imbalance
top value is 100.0% of rows
Fig 261.
Top values for ingredients_text_sl.
Top values for ingredients_text_sl (1 unique shown, of 1 total).
value
count
share
Kakavova masa, manjmasten kakavov prah, kakavovo maslo, sladkor, emulgator: lecitini (sojin lecitin); ekstrakt vanilije. Lahko vsebuje sledi oreškov (lešniki, mandlji, pistacija) in mleka. Uporabno najmanj do: glej odtis na zadnji strani embalaže.
1
2.0%
generic_name_sl
categorical
Out[1017]:
saturn.columns["generic_name_sl"].stats
stat
value
n
50
nulls
49 (98.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
alert: long_tail
1 singleton categories
alert: null_rate
98.0% null
alert: imbalance
top value is 100.0% of rows
Fig 262.
Top values for generic_name_sl.
Top values for generic_name_sl (1 unique shown, of 1 total).
value
count
share
1
2.0%
generic_name_sk
categorical
Out[1020]:
saturn.columns["generic_name_sk"].stats
stat
value
n
50
nulls
49 (98.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
alert: long_tail
1 singleton categories
alert: null_rate
98.0% null
alert: imbalance
top value is 100.0% of rows
Fig 263.
Top values for generic_name_sk.
Top values for generic_name_sk (1 unique shown, of 1 total).
value
count
share
1
2.0%
product_name_pt
categorical
Out[1023]:
saturn.columns["product_name_pt"].stats
stat
value
n
50
nulls
40 (80.0%)
unique
7
top_value
top_rate
0.4
cardinality
7
entropy
2.522
entropy_ratio
0.8983
alert: long_tail
6 singleton categories
alert: null_rate
80.0% null
Fig 264.
Top values for product_name_pt.
Top values for product_name_pt (7 unique shown, of 7 total).
value
count
share
4
8.0%
Cioccolato Fondente 85% Cacao
1
2.0%
Crocantes bolachas com um coração cremoso de Nutella®
1
2.0%
70% Cacao noir intense
1
2.0%
Excellence 70% Cocoa Intense Dark
1
2.0%
Original
1
2.0%
Mix com sultanas e arandos
1
2.0%
lc_imported
categorical
Out[1026]:
saturn.columns["lc_imported"].stats
stat
value
n
50
nulls
42 (84.0%)
unique
2
top_value
fr
top_rate
0.875
cardinality
2
entropy
0.5436
entropy_ratio
0.5436
alert: null_rate
84.0% null
Fig 265.
Top values for lc_imported.
Top values for lc_imported (2 unique shown, of 2 total).
Top values for abbreviated_product_name_fr_imported.
Top values for abbreviated_product_name_fr_imported (7 unique shown, of 7 total).
value
count
share
CRISTALINE Eau De Source 0.5L
1
2.0%
Nutella biscuits t22
1
2.0%
Authentique 275g, fr
1
2.0%
Fibres 230g, fr
1
2.0%
ORG Original 175g
1
2.0%
NESTLE DESSERT Noir 205g
1
2.0%
BRIOCHE TRANCHEE BIO 400g
1
2.0%
generic_name_zh
categorical
Out[1032]:
saturn.columns["generic_name_zh"].stats
stat
value
n
50
nulls
49 (98.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
alert: long_tail
1 singleton categories
alert: null_rate
98.0% null
alert: imbalance
top value is 100.0% of rows
Fig 267.
Top values for generic_name_zh.
Top values for generic_name_zh (1 unique shown, of 1 total).
value
count
share
1
2.0%
obsolete_imported
categorical
other
This column appears to be a boolean flag, most likely the imported value of an 'obsolete' marker (the '_imported' suffix also appears on sibling columns such as lc_imported and owner_imported). It contains only the value '0' across all 7 non-null rows. With an 86% null rate and a cardinality of 1, the column carries zero information: entropy is exactly 0.0 and the single observed value covers 100% of non-null records. Both the high null rate and the complete value imbalance are flagged as alerts.
Treatment: Drop. Zero variance and 86% nulls make this column uninformative for modelling in this sample.
Out[1035]:
saturn.columns["obsolete_imported"].stats
stat
value
n
50
nulls
43 (86.0%)
unique
1
top_value
0
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
alert: null_rate
86.0% null
alert: imbalance
top value is 100.0% of rows
Fig 268.
Top values for obsolete_imported.
Top values for obsolete_imported (1 unique shown, of 1 total).
value
count
share
0
7
14.0%
generic_name_fr_imported
categorical
Out[1038]:
saturn.columns["generic_name_fr_imported"].stats
stat
value
n
50
nulls
43 (86.0%)
unique
7
top_value
Eau De Source
top_rate
0.1429
cardinality
7
entropy
2.807
entropy_ratio
1
alert: long_tail
7 singleton categories
alert: null_rate
86.0% null
Fig 269.
Top values for generic_name_fr_imported.
Top values for generic_name_fr_imported (7 unique shown, of 7 total).
value
count
share
Eau De Source
1
2.0%
Biscuit fourré à la pâte à tartiner aux noisettes et au cacao Nutella®
1
2.0%
Pain croustillant a la farine de seigle
1
2.0%
Pain croustillant à la farine complète de seigle, avoine et sésame.
1
2.0%
Snack salé
1
2.0%
Chocolat noir supérieur
1
2.0%
Brioche tranchée issue de l'agriculture biologique
1
2.0%
owners_tags
categorical
Out[1041]:
saturn.columns["owners_tags"].stats
stat
value
n
50
nulls
43 (86.0%)
unique
6
top_value
org-barilla-france-sa
top_rate
0.2857
cardinality
6
entropy
2.522
entropy_ratio
0.9755
alert: long_tail
5 singleton categories
alert: null_rate
86.0% null
Fig 270.
Top values for owners_tags.
Top values for owners_tags (6 unique shown, of 6 total).
value
count
share
org-barilla-france-sa
2
4.0%
org-gie-sources-alma
1
2.0%
org-ferrero-france-commerciale
1
2.0%
org-kellogg-s
1
2.0%
org-nestle-france
1
2.0%
org-la-boulangere-co
1
2.0%
owner_imported
categorical
Out[1044]:
saturn.columns["owner_imported"].stats
stat
value
n
50
nulls
44 (88.0%)
unique
5
top_value
org-barilla-france-sa
top_rate
0.3333
cardinality
5
entropy
2.252
entropy_ratio
0.9697
alert: long_tail
4 singleton categories
alert: null_rate
88.0% null
Fig 271.
Top values for owner_imported.
Top values for owner_imported (5 unique shown, of 5 total).
value
count
share
org-barilla-france-sa
2
4.0%
org-gie-sources-alma
1
2.0%
org-ferrero-france-commerciale
1
2.0%
org-nestle-france
1
2.0%
org-la-boulangere-co
1
2.0%
customer_service
categorical
Out[1047]:
saturn.columns["customer_service"].stats
stat
value
n
50
nulls
43 (86.0%)
unique
6
top_value
Service Consommateurs, : www.wasa.com/fr-fr/contact (depuis la France), www.wasa.com/fr-be/contact (depuis la Belgique)
top_rate
0.2857
cardinality
6
entropy
2.522
entropy_ratio
0.9755
alert: long_tail
5 singleton categories
alert: null_rate
86.0% null
Fig 272.
Top values for customer_service.
Top values for customer_service (6 unique shown, of 6 total).
value
count
share
Service Consommateurs, : www.wasa.com/fr-fr/contact (depuis la France), www.wasa.com/fr-be/contact (depuis la Belgique)
2
4.0%
Service Consommateurs Cristaline, 70 avenue des Sources 03270 SAINT YORRE
1
2.0%
FERRERO FRANCE COMMERCIALE - Service Consommateurs, CS 90058 - 76136 MONT SAINT AIGNAN Cedex
1
2.0%
Service Conseil Consommateurs, Kellogg's Produits Alimentaires S.A.S. - Immeuble Neptune - 1 rue Galilée 93160 Noisy-le-Grand (France)
1
2.0%
Nestlé France, BP 900 Noisiel 77446 Marne la Vallée Cedex 2
1
2.0%
Service consommateurs La Boulangère & Co, La Boulangère & Co 1 rue du petit bocage CS 40 201 85140 ESSARTS
This column captures the unit basis for imported nutrition data (e.g., 'per 100g'), and is effectively a constant — the only observed value is '100g' across all 7 non-null rows. With an 86% null rate and cardinality of 1, it carries zero discriminative information. The combination of near-total missingness and zero entropy is a strong signal this field was either sparsely populated at ingestion or serves as a fixed schema placeholder.
Treatment: Drop before modelling; column is a zero-variance constant with 86% nulls and provides no analytical value.
Service Consommateurs, : www.wasa.com/fr-fr/contact (depuis la France), www.wasa.com/fr-be/contact (depuis la Belgique)
top_rate
0.2857
cardinality
6
entropy
2.522
entropy_ratio
0.9755
alert: long_tail
5 singleton categories
alert: null_rate
86.0% null
Fig 282.
Top values for customer_service_fr.
Top values for customer_service_fr (6 unique shown, of 6 total).
value | count | share
Service Consommateurs, : www.wasa.com/fr-fr/contact (depuis la France), www.wasa.com/fr-be/contact (depuis la Belgique) | 2 | 4.0%
Service Consommateurs Cristaline, 70 avenue des Sources 03270 SAINT YORRE | 1 | 2.0%
FERRERO FRANCE COMMERCIALE - Service Consommateurs, CS 90058 - 76136 MONT SAINT AIGNAN Cedex | 1 | 2.0%
Service Conseil Consommateurs, Kellogg's Produits Alimentaires S.A.S. - Immeuble Neptune - 1 rue Galilée 93160 Noisy-le-Grand (France) | 1 | 2.0%
Nestlé France, 34-40 rue Guynemer 92130 Issy-les-Moulineaux | 1 | 2.0%
Service consommateurs La Boulangère & Co, La Boulangère & Co 1 rue du petit bocage CS 40 201 85140 ESSARTS | 1 | 2.0%
nutrition_data_per_imported
categorical
metadata
This column represents the unit basis for imported nutrition data, and every non-null value is identically '100g' — giving it a cardinality of 1 and an entropy of 0.0. With an 84% null rate across 50 rows, only 8 observations carry a value at all, making the column almost entirely absent. The combination of extreme nullity and zero variance means this column provides no discriminating information whatsoever.
Treatment: Drop — 84% null with a single constant value ('100g') offers no predictive or analytical signal.
Top values for product_name_fr_imported.
Top values for product_name_fr_imported (7 unique shown, of 7 total).
value | count | share
CRISTALINE Eau De Source 0.5L | 1 | 2.0%
Biscuits Nutella x22 biscuits fourrés - 304g | 1 | 2.0%
Wasa tartine croustillante authentique au seigle 275g | 1 | 2.0%
Wasa tartine croustillante fibres 230g | 1 | 2.0%
Chips Pringles Original | 1 | 2.0%
NESTLE DESSERT Noir 205g | 1 | 2.0%
Brioche Tranchée Bio 400g | 1 | 2.0%
lang_imported
categorical
metadata
This column records the imported language of a record, and across the full 50-row dataset every non-null value is 'fr' (French) — a single unique value with zero entropy. With an 86% null rate, only 7 of 50 rows carry any value at all, making the column nearly empty and entirely constant where populated. Both the extreme null rate and perfect imbalance are flagged as alerts, suggesting this field may be partially populated metadata from an import pipeline rather than a reliable feature.
Treatment: Drop or impute cautiously — 86% nulls and zero variance make this column uninformative for modelling; investigate import pipeline for why values are absent.
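The drop recommendation above rests on two conditions holding together: a high null rate and a single distinct value among the populated rows. A minimal sketch of that check follows, assuming the column is available as a Python list with None for nulls. The helper names and the 0.8 null-rate threshold are illustrative assumptions, not Saturn settings.

```python
def profile_column(values):
    """Summarise a column given as a list where None marks a null."""
    non_null = [v for v in values if v is not None]
    n = len(values)
    return {
        "null_rate": (n - len(non_null)) / n if n else 0.0,
        "cardinality": len(set(non_null)),
    }

def is_drop_candidate(values, max_null_rate=0.8):
    """Flag columns that are mostly null AND constant where populated.

    The 0.8 threshold is an illustrative assumption, not a Saturn setting.
    """
    stats = profile_column(values)
    return stats["null_rate"] >= max_null_rate and stats["cardinality"] <= 1

# Shape reported for lang_imported: 43 nulls and 7 rows equal to 'fr'.
lang_imported = [None] * 43 + ["fr"] * 7
print(profile_column(lang_imported))
print(is_drop_candidate(lang_imported))
```

Requiring both conditions keeps sparse but varied columns, such as conservation_conditions, out of the automatic drop list.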
Out[1117]:
saturn.columns["lang_imported"].stats
stat
value
n
50
nulls
43 (86.0%)
unique
1
top_value
fr
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
alert: null_rate
86.0% null
alert: imbalance
top value is 100.0% of rows
Fig 292.
Top values for lang_imported.
Top values for lang_imported (1 unique shown, of 1 total).
Top values for ingredients_text_fr_imported.
Top values for ingredients_text_fr_imported (7 unique shown, of 7 total).
value | count | share
Eau de Source | 1 | 2.0%
Pâte à tartiner aux NOISETTES et au cacao 40% (sucre, huile de palme, NOISETTES 13%, LAIT écrémé en poudre 8,7%, cacao maigre 7,4%, émulsifiants : lécithines [SOJA] ; vanilline), farine de FROMENT 32%, graisses végétales (palme, palmiste), sucre de canne 8,5%, LACTOSE, son de BLE, LAIT en poudre, extrait en poudre de malt d'ORGE et de maïs, miel, poudres à lever (disphosphate disodique, carbonate acide d'ammonium, carbonate acide de sodium), cacao maigre, sel, amidon de FROMENT, farine d'ORGE malté, émulsifiants : lécithines [SOJA] ; vanilline. | 1 | 2.0%
Farine complète de SEIGLE (77 g*), farine de SEIGLE (28 g*), levure, sel. Peut contenir des traces de LUPIN, LAIT, MOUTARDE, GRAINES DE SÉSAME et SOJA. *en g pour 100 g de produit. | 1 | 2.0%
Farine complète de SEIGLE 59 g*, son de BLÉ 27 g*, flocons d'AVOINE 12 g*, GRAINES DE SÉSAME 7,0 g*, germe de BLÉ, sel. *en g pour 100 g de produit fini. Peut contenir des traces de LUPIN, LAIT, MOUTARDE et SOJA. | 1 | 2.0%
Pommes de terre déshydratées, huiles végétales (tournesol, maïs), farine de riz, amidon de BLÉ, farine de maïs, émulsifiant (E471), maltodextrine, sel, extrait de levure, levure en poudre, colorant (rocou). | 1 | 2.0%
Sucre, pâte de cacao (Afrique de l'Ouest, Amérique du Sud), beurre de cacao, émulsifiant (lécithine), arôme naturel de vanille de Madagascar. Cacao : 53% minimum. Peut contenir : LAIT, FRUITS A COQUE. | 1 | 2.0%
Farine de BLÉ*/** 54%, ŒUFS entiers*/** 14%, sucre de canne roux*, huile de tournesol*/** 8%, levain* (eau, farines de BLÉ*/** 2% et de SEIGLE*, levures), GLUTEN DE BLÉ*, sel, levure, arôme naturel de vanille* (contient alcool*), extrait de vanille*, levure désactivée. Traces éventuelles de lait, moutarde et soja. *Ingrédients issus de l'Agriculture Biologique. **Ingrédients issus du commerce équitable français. | 1 | 2.0%
conservation_conditions
categorical
Out[1126]:
saturn.columns["conservation_conditions"].stats
stat
value
n
50
nulls
43 (86.0%)
unique
7
top_value
A conserver de préférence à l'abri du soleil, dans un endroit propre, frais et sans odeur.
top_rate
0.1429
cardinality
7
entropy
2.807
entropy_ratio
1
alert: long_tail
7 singleton categories
alert: null_rate
86.0% null
Fig 295.
Top values for conservation_conditions.
Top values for conservation_conditions (7 unique shown, of 7 total).
value | count | share
A conserver de préférence à l'abri du soleil, dans un endroit propre, frais et sans odeur. | 1 | 2.0%
A conserver au sec et à l'abri de la chaleur. Ne pas mettre au réfrigérateur. | 1 | 2.0%
A conserver dans un endroit sec à l'abri de la lumière. | 1 | 2.0%
Conserver dans un endroit frais et sec. | 1 | 2.0%
À conserver dans un endroit sec | 1 | 2.0%
A conserver au frais et au sec. | 1 | 2.0%
À conserver dans son emballage fermé, dans un endroit sec, à température ambiante. | 1 | 2.0%
nova_group_error
categorical
Out[1129]:
saturn.columns["nova_group_error"].stats
stat
value
n
50
nulls
48 (96.0%)
unique
1
top_value
too_many_unknown_ingredients
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
alert: null_rate
96.0% null
alert: imbalance
top value is 100.0% of rows
Fig 296.
Top values for nova_group_error.
Top values for nova_group_error (1 unique shown, of 1 total).
Top values for ingredients_text_ro.
Top values for ingredients_text_ro (1 unique shown, of 1 total).
value | count | share
(blank) | 2 | 4.0%
producer_version_id
categorical
Out[1150]:
saturn.columns["producer_version_id"].stats
stat
value
n
50
nulls
46 (92.0%)
unique
3
top_value
1
top_rate
0.5
cardinality
3
entropy
1.5
entropy_ratio
0.9464
alert: long_tail
2 singleton categories
alert: null_rate
92.0% null
Fig 303.
Top values for producer_version_id.
Top values for producer_version_id (3 unique shown, of 3 total).
value | count | share
1 | 2 | 4.0%
2021-01-25T13:53:49+01:00 | 1 | 2.0%
44217063 | 1 | 2.0%
labels_imported
categorical
Out[1153]:
saturn.columns["labels_imported"].stats
stat
value
n
50
nulls
45 (90.0%)
unique
3
top_value
Végétarien
top_rate
0.6
cardinality
3
entropy
1.371
entropy_ratio
0.865
alert: long_tail
2 singleton categories
alert: null_rate
90.0% null
Fig 304.
Top values for labels_imported.
Top values for labels_imported (3 unique shown, of 3 total).
value | count | share
Végétarien | 3 | 6.0%
Point Vert, Rainforest Alliance, Triman | 1 | 2.0%
Commerce équitable, Bio, Bio européen, en:organic | 1 | 2.0%
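Note that top_rate (0.6) and the table share for the same value (6.0%) differ by a factor of ten. The numbers are consistent with top_rate being computed over the 5 non-null rows and share over all 50 rows; the report does not say so explicitly, so treat that as an inference. A short sketch with a hypothetical helper, `top_value_rates`:

```python
from collections import Counter

def top_value_rates(values):
    """Return (top_value, rate_over_non_null, share_over_all_rows)."""
    non_null = [v for v in values if v is not None]
    top_value, count = Counter(non_null).most_common(1)[0]
    return top_value, count / len(non_null), count / len(values)

# Shape reported for labels_imported: 45 nulls and 5 populated rows.
labels_imported = (
    [None] * 45
    + ["Végétarien"] * 3
    + ["Point Vert, Rainforest Alliance, Triman"]
    + ["Commerce équitable, Bio, Bio européen, en:organic"]
)
top, top_rate, share = top_value_rates(labels_imported)
print(top, top_rate, f"{share:.1%}")
```

When comparing columns with very different null rates, the share over all rows is usually the safer figure to quote.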
allergens_imported
categorical
Out[1156]:
saturn.columns["allergens_imported"].stats
stat
value
n
50
nulls
45 (90.0%)
unique
4
top_value
Gluten
top_rate
0.4
cardinality
4
entropy
1.922
entropy_ratio
0.961
alert: long_tail
3 singleton categories
alert: null_rate
90.0% null
Fig 305.
Top values for allergens_imported.
Top values for allergens_imported (4 unique shown, of 4 total).
value | count | share
Gluten | 2 | 4.0%
Gluten, Lait, Fruits à coque, Soja, Gs1:T4078:ML | 1 | 2.0%
Gluten, Graines de sésame | 1 | 2.0%
Œufs, Gluten | 1 | 2.0%
origin_ro
categorical
Out[1159]:
saturn.columns["origin_ro"].stats
stat
value
n
50
nulls
48 (96.0%)
unique
1
top_value
(blank)
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
alert: null_rate
96.0% null
alert: imbalance
top value is 100.0% of rows
Fig 306.
Top values for origin_ro.
Top values for origin_ro (1 unique shown, of 1 total).
value | count | share
(blank) | 2 | 4.0%
no_nutrition_data_imported
categorical
feature
This column is a boolean flag indicating whether nutrition data was absent for a record. It has a 92% null rate across 50 rows, and the only 4 non-null values all carry the single value 'false', giving it zero entropy and cardinality of 1. The extreme null rate combined with complete value uniformity among non-nulls means this column carries no predictive signal whatsoever — it is effectively empty.
Treatment: Drop — zero variance and 92% nulls make this column useless for modelling or analysis.
Top values for abbreviated_product_name_imported.
Top values for abbreviated_product_name_imported (3 unique shown, of 3 total).
value | count | share
Authentique 275g, fr | 1 | 2.0%
Fibres 230g, fr | 1 | 2.0%
DESSERT Noir 205g | 1 | 2.0%
traces_imported
categorical
Out[1183]:
saturn.columns["traces_imported"].stats
stat
value
n
50
nulls
46 (92.0%)
unique
4
top_value
Lupin, Lait, Moutarde, Graines de sésame, Soja
top_rate
0.25
cardinality
4
entropy
2
entropy_ratio
1
alert: long_tail
4 singleton categories
alert: null_rate
92.0% null
Fig 314.
Top values for traces_imported.
Top values for traces_imported (4 unique shown, of 4 total).
value | count | share
Lupin, Lait, Moutarde, Graines de sésame, Soja | 1 | 2.0%
Lupin, Lait, Moutarde, Soja | 1 | 2.0%
Lait, Fruits à coque | 1 | 2.0%
Lait, Moutarde, Soja | 1 | 2.0%
specific_ingredients
unknown
Out[1186]:
saturn.columns["specific_ingredients"].stats
stat
value
n
50
nulls
0 (0.0%)
unique
—
alert: skipped
no profiler for kind=unknown
packaging_text_ru
categorical
metadata
This column holds Russian-language packaging text, but is almost entirely empty: 94% of the 50 rows are null, and the sole non-null value appearing 3 times is an empty string — giving a cardinality of 1 and zero entropy. In practice the column carries no information whatsoever across the observed sample.
Treatment: Drop this column; it is effectively unpopulated (94% null, remaining values are empty strings) and provides no signal for modelling or analysis.
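Because the only populated values are empty strings, the reported 94% null rate understates how empty this column is. The following is a minimal sketch of measuring missingness with empty strings counted as missing, assuming the column is a Python list with None for nulls; `effective_null_rate` is a hypothetical helper, not part of Saturn.

```python
def effective_null_rate(values):
    """Null rate when None, empty, and whitespace-only strings all count as missing."""
    def is_missing(v):
        return v is None or (isinstance(v, str) and v.strip() == "")

    if not values:
        return 0.0
    return sum(is_missing(v) for v in values) / len(values)

# Shape reported for packaging_text_ru: 47 nulls plus 3 empty strings.
packaging_text_ru = [None] * 47 + [""] * 3
raw_null_rate = packaging_text_ru.count(None) / len(packaging_text_ru)
print(raw_null_rate, effective_null_rate(packaging_text_ru))
```

Normalising empty strings to null before profiling would make columns like this one show up as fully missing rather than as constant.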
Out[1188]:
saturn.columns["packaging_text_ru"].stats
stat
value
n
50
nulls
47 (94.0%)
unique
1
top_value
(empty string)
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
alert: null_rate
94.0% null
alert: imbalance
top value is 100.0% of rows
Fig 315.
Top values for packaging_text_ru.
Top values for packaging_text_ru (1 unique shown, of 1 total).
value | count | share
(empty string) | 3 | 6.0%
origin_ru
categorical
other
This column appears to be a Russian-language origin/source field that is almost entirely unpopulated: 94% of the 50 rows are null, and the sole non-null value is an empty string appearing 3 times. With cardinality of 1, zero entropy, and a top_rate of 1.0, the column carries absolutely no information. It was likely intended to capture Russian-locale origin metadata but was never populated.
Treatment: Drop this column — it contains no usable signal (94% null, remaining values are empty strings).
Out[1191]:
saturn.columns["origin_ru"].stats
stat
value
n
50
nulls
47 (94.0%)
unique
1
top_value
(empty string)
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
alert: null_rate
94.0% null
alert: imbalance
top value is 100.0% of rows
Fig 316.
Top values for origin_ru.
Top values for origin_ru (1 unique shown, of 1 total).
This column is intended to store Russian-language ingredients text with allergen information for food products. It is effectively empty: 94% of the 50 rows are null, and the sole non-null value present is an empty string (''), giving a cardinality of 1 and entropy of 0. There is no usable signal whatsoever in this column for the sampled data.
Treatment: Drop this column; it carries no information (94% null, remaining values are empty strings).
Top values for ingredients_text_with_allergens_ru.
Top values for ingredients_text_with_allergens_ru (1 unique shown, of 1 total).
value | count | share
(empty string) | 3 | 6.0%
product_name_ru
categorical
Out[1197]:
saturn.columns["product_name_ru"].stats
stat
value
n
50
nulls
47 (94.0%)
unique
2
top_value
(blank)
top_rate
0.6667
cardinality
2
entropy
0.9183
entropy_ratio
0.9183
alert: null_rate
94.0% null
Fig 318.
Top values for product_name_ru.
Top values for product_name_ru (2 unique shown, of 2 total).
value | count | share
(blank) | 2 | 4.0%
Экселенс 99% какао | 1 | 2.0%
generic_name_ru
categorical
Out[1200]:
saturn.columns["generic_name_ru"].stats
stat
value
n
50
nulls
47 (94.0%)
unique
2
top_value
(blank)
top_rate
0.6667
cardinality
2
entropy
0.9183
entropy_ratio
0.9183
alert: null_rate
94.0% null
Fig 319.
Top values for generic_name_ru.
Top values for generic_name_ru (2 unique shown, of 2 total).
value | count | share
(blank) | 2 | 4.0%
Плитка горького шоколада (99% какао) | 1 | 2.0%
ingredients_text_ru
categorical
other
This column is a Russian-language ingredients text field for food/product records, almost certainly a localized variant of a broader ingredients column. It is 94% null across 50 rows, and the only non-null value observed is an empty string (appearing 3 times), meaning there is effectively zero usable content in this column. Cardinality of 1 and entropy of 0.0 confirm complete absence of informational signal.
Treatment: Drop; 94% null with only empty-string values provides no modelling or analytical value.
Out[1203]:
saturn.columns["ingredients_text_ru"].stats
stat
value
n
50
nulls
47 (94.0%)
unique
1
top_value
(empty string)
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
alert: null_rate
94.0% null
alert: imbalance
top value is 100.0% of rows
Fig 320.
Top values for ingredients_text_ru.
Top values for ingredients_text_ru (1 unique shown, of 1 total).
value | count | share
(empty string) | 3 | 6.0%
packaging_text_da
categorical
Out[1206]:
saturn.columns["packaging_text_da"].stats
stat
value
n
50
nulls
48 (96.0%)
unique
1
top_value
(blank)
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
alert: null_rate
96.0% null
alert: imbalance
top value is 100.0% of rows
Fig 321.
Top values for packaging_text_da.
Top values for packaging_text_da (1 unique shown, of 1 total).
value | count | share
(blank) | 2 | 4.0%
generic_name_da
categorical
Out[1209]:
saturn.columns["generic_name_da"].stats
stat
value
n
50
nulls
48 (96.0%)
unique
2
top_value
Kiks
top_rate
0.5
cardinality
2
entropy
1
entropy_ratio
1
alert: long_tail
2 singleton categories
alert: null_rate
96.0% null
Fig 322.
Top values for generic_name_da.
Top values for generic_name_da (2 unique shown, of 2 total).
value | count | share
Kiks | 1 | 2.0%
(blank) | 1 | 2.0%
forest_footprint_data
unknown
Out[1212]:
saturn.columns["forest_footprint_data"].stats
stat
value
n
50
nulls
0 (0.0%)
unique
—
alert: skipped
no profiler for kind=unknown
product_name_da
categorical
Out[1214]:
saturn.columns["product_name_da"].stats
stat
value
n
50
nulls
48 (96.0%)
unique
2
top_value
Original
top_rate
0.5
cardinality
2
entropy
1
entropy_ratio
1
alert: long_tail
2 singleton categories
alert: null_rate
96.0% null
Fig 323.
Top values for product_name_da.
Top values for product_name_da (2 unique shown, of 2 total).
kakaómassza, cukor, kakaó - vaj, természetes bourbon vanília. Nyomokban egyéb dióféléket, tejet, szóját, szezámmagot es búzát tartalmazhat.
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
alert: long_tail
1 singleton categories
alert: null_rate
98.0% null
alert: imbalance
top value is 100.0% of rows
Fig 330.
Top values for ingredients_text_hu_ocr_1571428260_result.
Top values for ingredients_text_hu_ocr_1571428260_result (1 unique shown, of 1 total).
value | count | share
kakaómassza, cukor, kakaó - vaj, természetes bourbon vanília. Nyomokban egyéb dióféléket, tejet, szóját, szezámmagot es búzát tartalmazhat. | 1 | 2.0%
packaging_text_cs
categorical
metadata
This column appears to be Czech-language packaging text (`_cs` locale suffix), but it is almost entirely empty: 94% null rate across 50 rows, and the only observed non-null value is an empty string appearing 3 times. With cardinality of 1 and entropy of 0, the column carries zero information — it is effectively unpopulated for this dataset slice.
Treatment: Drop this column; it contains no usable signal (94% nulls, sole value is empty string).
Out[1238]:
saturn.columns["packaging_text_cs"].stats
stat
value
n
50
nulls
47 (94.0%)
unique
1
top_value
(empty string)
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
alert: null_rate
94.0% null
alert: imbalance
top value is 100.0% of rows
Fig 331.
Top values for packaging_text_cs.
Top values for packaging_text_cs (1 unique shown, of 1 total).
value | count | share
(empty string) | 3 | 6.0%
ingredients_text_sr
categorical
Out[1241]:
saturn.columns["ingredients_text_sr"].stats
stat
value
n
50
nulls
48 (96.0%)
unique
2
top_value
Šećer, kakao masa, kakao buter, vanile.
top_rate
0.5
cardinality
2
entropy
1
entropy_ratio
1
alert: long_tail
2 singleton categories
alert: null_rate
96.0% null
Fig 332.
Top values for ingredients_text_sr.
Top values for ingredients_text_sr (2 unique shown, of 2 total).
value | count | share
Šećer, kakao masa, kakao buter, vanile. | 1 | 2.0%
(blank) | 1 | 2.0%
origin_sr
categorical
Out[1244]:
saturn.columns["origin_sr"].stats
stat
value
n
50
nulls
48 (96.0%)
unique
1
top_value
(blank)
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
alert: null_rate
96.0% null
alert: imbalance
top value is 100.0% of rows
Fig 333.
Top values for origin_sr.
Top values for origin_sr (1 unique shown, of 1 total).
kakaómassza, cukor, kakaó- vaj, természetes bourbon vanília. Nyomokban egyéb dióféléket, tejet, szóját, szezámmagot es búzát tartalmazhat.
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
alert: long_tail
1 singleton categories
alert: null_rate
98.0% null
alert: imbalance
top value is 100.0% of rows
Fig 334.
Top values for ingredients_text_hu_ocr_1571428260.
Top values for ingredients_text_hu_ocr_1571428260 (1 unique shown, of 1 total).
value | count | share
kakaómassza, cukor, kakaó- vaj, természetes bourbon vanília. Nyomokban egyéb dióféléket, tejet, szóját, szezámmagot es búzát tartalmazhat. | 1 | 2.0%
packaging_text_hu
categorical
feature
This column contains Hungarian-language packaging text, but is almost entirely empty: 92% null rate across 50 rows, and the only non-null value observed is an empty string appearing 4 times. With cardinality of 1 and entropy of 0.0, the column carries zero information — it is effectively unpopulated.
Treatment: Drop — 92% nulls and a single empty-string value provide no modelling or analytical signal.
Out[1250]:
saturn.columns["packaging_text_hu"].stats
stat
value
n
50
nulls
46 (92.0%)
unique
1
top_value
(empty string)
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
alert: null_rate
92.0% null
alert: imbalance
top value is 100.0% of rows
Fig 335.
Top values for packaging_text_hu.
Top values for packaging_text_hu (1 unique shown, of 1 total).
value | count | share
(empty string) | 4 | 8.0%
origin_cs
categorical
Out[1253]:
saturn.columns["origin_cs"].stats
stat
value
n
50
nulls
48 (96.0%)
unique
1
top_value
(blank)
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
alert: null_rate
96.0% null
alert: imbalance
top value is 100.0% of rows
Fig 336.
Top values for origin_cs.
Top values for origin_cs (1 unique shown, of 1 total).
Top values for ingredients_text_with_allergens_sr.
Top values for ingredients_text_with_allergens_sr (2 unique shown, of 2 total).
value | count | share
Šećer, kakao masa, kakao buter, vanile. | 1 | 2.0%
(blank) | 1 | 2.0%
ingredients_text_hu
categorical
Out[1274]:
saturn.columns["ingredients_text_hu"].stats
stat
value
n
50
nulls
46 (92.0%)
unique
4
top_value
Kakaómassza, cukor, kakaó - vaj, vanília.
top_rate
0.25
cardinality
4
entropy
2
entropy_ratio
1
alert: long_tail
4 singleton categories
alert: null_rate
92.0% null
Fig 343.
Top values for ingredients_text_hu.
Top values for ingredients_text_hu (4 unique shown, of 4 total).
value
count
share
Kakaómassza, cukor, kakaó - vaj, vanília.
1
2.0%
HU Étcsokoládé. Kakaó szárazanyag legalább 70% . ÖSszetevők: kakaómassza, cukor, kakaóvaj, emulgeálószerek: lecitinek (szójából); vanília kivonat. Nyomokban dióféléket és tejet tartalmazhat. Bontatlan csomagolásban tárolva minőségét megórzi (nap/hónap/év): a csomagolás hátoldalán feltüntetett időpontig. Száraz, hűvös helyen tárolandó! Készült: Németországban. A kakaóbab származási helye: Ecuador, Elefántcsontpart, Ghána, Kamerun és Nigeria. A Fairtrade Cocoa Program (Fairtrade Kakaó Program) előnyökhöz juttatja a kistermelőket azáltal, hogy több kakaót értékesítenek Fairtrade termékként. Látogasson el a www.info.fairtrade.net/program oldalra.
RO Ciocolată amăruie. Substantă uscată de cacao minimum 70% Ingrediente: masă de cacao, zahăr, unt de cacao, emulsifiant: lecitine din soia; extract din vanilie. Cu ingrediente din tări UE şi non UE. Poate contine urme de fructe cu coajă lemnoasă şi lapte. A se consuma de preferintă înainte de/Nr. Lot: vezi spate ambalaj. A se păstra la loc uscat şi răcoros, ferit de razele soarelui și de înghet, atât înainte, cât şi după deschidere. A se consuma în cel mai scurt timp după deschidere. Fairtrade Cocoa Program (Programul Fairtrade de Cacao) permite micilor agricultori să beneficieze de vânzarea propriei cacao ca Fairtrade. Vizitați www.info.fairtrade.net/program. Produs in U.E. pentru S.C. Lidl Discount SRL, Sat Nedelea, Comuna Ariceştii Rahtivani, DN 72, Crângul lui Bot, KM 73+810, județul Prahova, România.
BG Натурален шоколад. Съдържа мин. 70% какаова маса. Съставки: какаова маса, захар, какаово масло, емулгатор: лецитин (соев); екстракт от ванилия. Може да съдържа следи от ядки и мляко. Неотворен най-добър до:/ Партида: виж задната страна. Да се съхранява на сухо и хладно място. Програмата за сертифициране на какао Fairtrade Сосоа Program дава възможност на малките производители да продават повече какао при справедливи условия на търговия. Повече информация на www.info.fairtrade.net/program Произведено в Германия за Лидл Щифтунг енд Ко. КГ, Щифтсбергщрасе 1, 74167 Некарзулм, Германия. LIDL
Top values for product_name_hu (3 unique shown, of 3 total).
value | count | share
(blank) | 2 | 4.0%
Excellence 70% Cocoa Intense Dark | 1 | 2.0%
Dark Chocolate 70% Cacao | 1 | 2.0%
generic_name_sr
categorical
Out[1280]:
saturn.columns["generic_name_sr"].stats
stat
value
n
50
nulls
48 (96.0%)
unique
2
top_value
Tamna čokolada sa 70% kakaa
top_rate
0.5
cardinality
2
entropy
1
entropy_ratio
1
alert: long_tail
2 singleton categories
alert: null_rate
96.0% null
Fig 345.
Top values for generic_name_sr.
Top values for generic_name_sr (2 unique shown, of 2 total).
value | count | share
Tamna čokolada sa 70% kakaa | 1 | 2.0%
(blank) | 1 | 2.0%
origin_hu
categorical
other
This column appears to be a Hungarian-language origin field (`_hu` locale suffix, as in packaging_text_hu) that is almost entirely unpopulated: 92% of its 50 rows are null, and the sole non-null value is an empty string appearing 4 times. With a cardinality of 1, zero entropy, and a top_rate of 1.0 across non-null values, the column carries no discriminative information. It is effectively a blank field in the current dataset snapshot.
Treatment: Drop — 92% null with a single empty-string value provides zero signal for any downstream task.
Out[1283]:
saturn.columns["origin_hu"].stats
stat
value
n
50
nulls
46 (92.0%)
unique
1
top_value
(empty string)
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
alert: null_rate
92.0% null
alert: imbalance
top value is 100.0% of rows
Fig 346.
Top values for origin_hu.
Top values for origin_hu (1 unique shown, of 1 total).
Top values for ingredients_text_with_allergens_hu.
Top values for ingredients_text_with_allergens_hu (3 unique shown, of 3 total).
value
count
share
Kakaómassza, cukor, kakaó - vaj, vanília.
1
2.0%
HU Étcsokoládé. Kakaó szárazanyag legalább 70% . ÖSszetevők: kakaómassza, cukor, kakaóvaj, emulgeálószerek: lecitinek (szójából); vanília kivonat. Nyomokban dióféléket és tejet tartalmazhat. Bontatlan csomagolásban tárolva minőségét megórzi (nap/hónap/év): a csomagolás hátoldalán feltüntetett időpontig. Száraz, hűvös helyen tárolandó! Készült: Németországban. A kakaóbab származási helye: Ecuador, Elefántcsontpart, Ghána, Kamerun és Nigeria. A Fairtrade Cocoa Program (Fairtrade Kakaó Program) előnyökhöz juttatja a kistermelőket azáltal, hogy több kakaót értékesítenek Fairtrade termékként. Látogasson el a www.info.fairtrade.net/program oldalra.
RO Ciocolată amăruie. Substantă uscată de cacao minimum 70% Ingrediente: masă de cacao, zahăr, unt de cacao, emulsifiant: lecitine din soia; extract din vanilie. Cu ingrediente din tări UE şi non UE. Poate contine urme de fructe cu coajă lemnoasă şi lapte. A se consuma de preferintă înainte de/Nr. Lot: vezi spate ambalaj. A se păstra la loc uscat şi răcoros, ferit de razele soarelui și de înghet, atât înainte, cât şi după deschidere. A se consuma în cel mai scurt timp după deschidere. Fairtrade Cocoa Program (Programul Fairtrade de Cacao) permite micilor agricultori să beneficieze de vânzarea propriei cacao ca Fairtrade. Vizitați www.info.fairtrade.net/program. Produs in U.E. pentru S.C. Lidl Discount SRL, Sat Nedelea, Comuna Ariceştii Rahtivani, DN 72, Crângul lui Bot, KM 73+810, județul Prahova, România.
BG Натурален шоколад. Съдържа мин. 70% какаова маса. Съставки: какаова маса, захар, какаово масло, емулгатор: лецитин (соев); екстракт от ванилия. Може да съдържа следи от ядки и мляко. Неотворен най-добър до:/ Партида: виж задната страна. Да се съхранява на сухо и хладно място. Програмата за сертифициране на какао Fairtrade Сосоа Program дава възможност на малките производители да продават повече какао при справедливи условия на търговия. Повече информация на www.info.fairtrade.net/program Произведено в Германия за Лидл Щифтунг енд Ко. КГ, Щифтсбергщрасе 1, 74167 Некарзулм, Германия. LIDL
This column appears to be a Czech-language generic name field (indicated by the '_cs' suffix) that is almost entirely empty: 94% of its 50 rows are null, and the sole non-null value is an empty string appearing 3 times. With cardinality of 1 and entropy of 0, the column carries zero information — it is effectively unpopulated.
Treatment: Drop this column; it contains no usable signal with a 94% null rate and only empty-string values in the remainder.
Out[1289]:
saturn.columns["generic_name_cs"].stats
stat
value
n
50
nulls
47 (94.0%)
unique
1
top_value
(empty string)
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
alert: null_rate
94.0% null
alert: imbalance
top value is 100.0% of rows
Fig 348.
Top values for generic_name_cs.
Top values for generic_name_cs (1 unique shown, of 1 total).
value | count | share
(empty string) | 3 | 6.0%
ingredients_text_xx
categorical
Out[1292]:
saturn.columns["ingredients_text_xx"].stats
stat
value
n
50
nulls
48 (96.0%)
unique
1
top_value
(blank)
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
alert: null_rate
96.0% null
alert: imbalance
top value is 100.0% of rows
Fig 349.
Top values for ingredients_text_xx.
Top values for ingredients_text_xx (1 unique shown, of 1 total).
value | count | share
(blank) | 2 | 4.0%
origin_xx
categorical
Out[1295]:
saturn.columns["origin_xx"].stats
stat
value
n
50
nulls
49 (98.0%)
unique
1
top_value
(blank)
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
alert: long_tail
1 singleton categories
alert: null_rate
98.0% null
alert: imbalance
top value is 100.0% of rows
Fig 350.
Top values for origin_xx.
Top values for origin_xx (1 unique shown, of 1 total).
value | count | share
(blank) | 1 | 2.0%
product_name_xx
categorical
Out[1298]:
saturn.columns["product_name_xx"].stats
stat
value
n
50
nulls
48 (96.0%)
unique
1
top_value
(blank)
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
alert: null_rate
96.0% null
alert: imbalance
top value is 100.0% of rows
Fig 351.
Top values for product_name_xx.
Top values for product_name_xx (1 unique shown, of 1 total).
value | count | share
(blank) | 2 | 4.0%
packaging_text_xx
categorical
Out[1301]:
saturn.columns["packaging_text_xx"].stats
stat
value
n
50
nulls
49 (98.0%)
unique
1
top_value
(blank)
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
alert: long_tail
1 singleton categories
alert: null_rate
98.0% null
alert: imbalance
top value is 100.0% of rows
Fig 352.
Top values for packaging_text_xx.
Top values for packaging_text_xx (1 unique shown, of 1 total).
value | count | share
(blank) | 1 | 2.0%
generic_name_xx
categorical
Out[1304]:
saturn.columns["generic_name_xx"].stats
stat
value
n
50
nulls
48 (96.0%)
unique
1
top_value
(blank)
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
alert: null_rate
96.0% null
alert: imbalance
top value is 100.0% of rows
Fig 353.
Top values for generic_name_xx.
Top values for generic_name_xx (1 unique shown, of 1 total).
Cioccolato amaro extra.
Cacao: 99% minimo.
Ingredienti: pasta di cacao, cacao magro, burro di cacao, zucchero grezzo di canna.
Può contenere frutta a guscio, latte e soia.
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
alert: long_tail
1 singleton categories
alert: null_rate
98.0% null
alert: imbalance
top value is 100.0% of rows
Fig 375.
Top values for ingredients_text_it_ocr_1559410715.
Top values for ingredients_text_it_ocr_1559410715 (1 unique shown, of 1 total).
value
count
share
Cioccolato amaro extra.
Cacao: 99% minimo.
Ingredienti: pasta di cacao, cacao magro, burro di cacao, zucchero grezzo di canna.
Può contenere frutta a guscio, latte e soia.
Pâte de cacao, cacao en poudre dégraissé, beurre de cacao, sucre, lait en poudre, pâte de amandes et de noisettes, émulsifiants (lécithines (soja, toumesol)) et arôme. Cacao 92% minimum. Peut contenir des traces d'autres fruits à coque.
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
alert: long_tail
1 singleton categories
alert: null_rate
98.0% null
alert: imbalance
top value is 100.0% of rows
Fig 420.
Top values for ingredients_text_fr_ocr_1713713129_result.
Top values for ingredients_text_fr_ocr_1713713129_result (1 unique shown, of 1 total).
value
count
share
Pâte de cacao, cacao en poudre dégraissé, beurre de cacao, sucre, lait en poudre, pâte de amandes et de noisettes, émulsifiants (lécithines (soja, toumesol)) et arôme. Cacao 92% minimum. Peut contenir des traces d'autres fruits à coque.
Ingrédients : Pâte de cacao, cacao en poudre dégraissé, beurre de cacao, sucre, lait en poudre, pâte de amandes et de noisettes, émulsifiants (lécithines (soja, toumesol)) et arôme. Cacao 92% minimum. Peut contenir des traces d'autres fruits à coque.
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
alert: long_tail
1 singleton categories
alert: null_rate
98.0% null
alert: imbalance
top value is 100.0% of rows
Fig 421.
Top values for ingredients_text_fr_ocr_1713713129.
Top values for ingredients_text_fr_ocr_1713713129 (1 unique shown, of 1 total).
value | count | share
Ingrédients : Pâte de cacao, cacao en poudre dégraissé, beurre de cacao, sucre, lait en poudre, pâte de amandes et de noisettes, émulsifiants (lécithines (soja, toumesol)) et arôme. Cacao 92% minimum. Peut contenir des traces d'autres fruits à coque. | 1 | 2.0%
How to cite
BibTeX
@misc{saturn-data-trove-openfoodfacts-database-2026,
author = {Steuber, Luke},
title = {Saturn reading: data trove openfoodfacts database},
year = {2026},
howpublished = {\url{https://dr.eamer.dev/saturn/view/data-trove-openfoodfacts-database}},
note = {Profiled with saturn-dissect v0.2.0, prompt saturn-insight-v2, model anthropic:default},
}
APA
Steuber, L. (2026). Saturn reading: data trove openfoodfacts database. Source: /home/coolhand/html/datavis/data_trove/cache/wild/openfoodfacts_sample.json. Profiled with saturn-dissect v0.2.0 (saturn-insight-v2, anthropic:default). Retrieved from https://dr.eamer.dev/saturn/view/data-trove-openfoodfacts-database