
data trove openfoodfacts database

source: /home/coolhand/html/datavis/data_trove/cache/wild/openfoodfacts_sample.json · 50 rows · 545 columns · profiled 2026-06-21

Reading

dataset summary · medium confidence anthropic:default

This is a 50-product sample from the Open Food Facts database, an open crowdsourced food product catalogue. Its 545 columns span multilingual product names, ingredient texts, allergen data, nutritional scores, packaging details, and community metadata. The most striking structural issue is extreme sparsity: most language-specific columns are largely empty (for example, product_name_dz and ingredients_text_ja are 98% null, and many others exceed 90%), so content is concentrated in the French and English fields. Two things most deserve a closer look. First, the Nutri-Score distribution is heavily skewed toward grade 'e' (54% of products), suggesting the sample leans toward nutritionally poor items. Second, scan counts (scans_n, mean 578, max 2523) are strongly right-skewed, with a few highly popular products dominating community attention.

citing: nutriscore_grade.top_value · nutriscore_grade.stats.top_rate · scans_n.stats.mean · scans_n.stats.max · scans_n.alerts · nova_groups.top_value · nova_groups.stats.top_rate · ecoscore_grade.stats.cardinality · emb_code.null_rate · ingredients_text_ja.null_rate
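
The sparsity described above can be checked with a short script. This is a minimal sketch, assuming the sample is loaded as a list of dicts (one per product) and that a missing key counts as null; the helper names `null_rates` and `sparse_columns`, the 0.9 threshold, and the toy records are hypothetical and are not part of the profiler.

```python
def null_rates(records):
    """Map each column to the fraction of records where it is absent or None."""
    columns = set()
    for rec in records:
        columns.update(rec)
    n = len(records)
    return {
        col: sum(1 for rec in records if rec.get(col) is None) / n
        for col in columns
    }


def sparse_columns(records, threshold=0.9):
    """Return the columns whose null rate is at or above the threshold, sorted by name."""
    return sorted(col for col, rate in null_rates(records).items() if rate >= threshold)


# Hypothetical toy records, not taken from the sample file.
records = [{"product_name_fr": f"produit {i}"} for i in range(10)]
records[0]["product_name_ja"] = "example"

rates = null_rates(records)
sparse = sparse_columns(records)
```

On the real file, the same functions could be applied to the parsed JSON to list which language-specific columns cross a chosen sparsity threshold.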

Schema

545 columns
Per-column summary. Each entry lists column name, type, null rate, and unique count; alert flags, when present, follow on the next line.
ingredients_with_unspecified_percent_sum numeric 0.0% 22
purchase_places categorical 2.0% 32
long_tail
rev numeric 0.0% 46
product_name_it categorical 68.0% 12
long_tail null_rate
editors unknown 0.0%
skipped
nutrition_score_warning_fruits_vegetables_nuts_estimate_from_ingredients numeric 10.0% 1
constant
traces_hierarchy unknown 0.0%
skipped
packaging categorical 12.0% 41
long_tail
packagings_n numeric 18.0% 5
outliers
categories_properties unknown 0.0%
skipped
generic_name_en categorical 14.0% 8
long_tail
food_groups categorical 2.0% 11
ingredients_without_ciqual_codes_n numeric 0.0% 15
origin_sv categorical 92.0% 1
null_rate imbalance
product_name_ja categorical 98.0% 1
long_tail null_rate imbalance
data_quality_warnings_tags unknown 0.0%
skipped
packaging_recycling_tags unknown 0.0%
skipped
scores unknown 0.0%
skipped
nucleotides_prev_tags unknown 0.0%
skipped
data_quality_dimensions unknown 0.0%
skipped
product_name_fi categorical 90.0% 4
long_tail null_rate
origin_de categorical 60.0% 1
null_rate imbalance
packaging_lc categorical 12.0% 7
correctors_tags unknown 0.0%
skipped
categories_hierarchy unknown 0.0%
skipped
ingredients_ids_debug unknown 0.0%
skipped
traces_lc categorical 4.0% 6
environment_impact_level_tags unknown 0.0%
skipped
last_image_t numeric 0.0% 50
high_skew
ingredients_that_may_be_from_palm_oil_n numeric 8.0% 3
high_skew outliers
max_imgid categorical 0.0% 38
long_tail
nutriscore_tags unknown 0.0%
skipped
generic_name_sv categorical 92.0% 4
long_tail null_rate
ingredients_text_with_allergens_nb categorical 96.0% 1
null_rate imbalance
quantity categorical 2.0% 36
long_tail
countries_hierarchy unknown 0.0%
skipped
data_quality_tags unknown 0.0%
skipped
ingredients_n numeric 0.0% 22
grades unknown 0.0%
skipped
additives_original_tags unknown 0.0%
skipped
nutrition_score_beverage numeric 0.0% 2
high_skew
packaging_text_nl categorical 76.0% 1
null_rate imbalance
photographers unknown 0.0%
skipped
pnns_groups_1 categorical 0.0% 7
product_name_en categorical 14.0% 34
long_tail
traces_from_user categorical 0.0% 35
long_tail
generic_name_nl categorical 76.0% 4
long_tail null_rate
nutrition_grade_fr categorical 0.0% 6
image_front_thumb_url categorical 0.0% 50
long_tail
last_editor categorical 2.0% 24
long_tail
nutrient_levels_tags unknown 0.0%
skipped
product_name_nb categorical 96.0% 2
long_tail null_rate
packaging_shapes_tags unknown 0.0%
skipped
_keywords unknown 0.0%
skipped
emb_codes_tags unknown 0.0%
skipped
images unknown 0.0%
skipped
states_tags unknown 0.0%
skipped
packaging_text_sv categorical 92.0% 1
null_rate imbalance
informers_tags unknown 0.0%
skipped
ingredients_text_pl categorical 90.0% 3
long_tail null_rate
labels categorical 2.0% 42
long_tail
sources unknown 0.0%
skipped
checkers_tags unknown 0.0%
skipped
product_quantity_unit categorical 10.0% 2
imbalance
last_modified_by categorical 2.0% 24
long_tail
image_front_url categorical 0.0% 50
long_tail
nutrition_data_prepared categorical 4.0% 1
imbalance
packaging_text_fi categorical 90.0% 1
null_rate imbalance
interface_version_created categorical 2.0% 3
nutrient_levels unknown 0.0%
skipped
languages_tags unknown 0.0%
skipped
vitamins_prev_tags unknown 0.0%
skipped
other_nutritional_substances_tags unknown 0.0%
skipped
product_name_de categorical 60.0% 16
long_tail null_rate
nutrition_grades categorical 0.0% 6
countries_beforescanbot categorical 14.0% 38
long_tail
ingredients_text_with_allergens_es categorical 62.0% 13
long_tail null_rate
labels_lc categorical 2.0% 6
nova_group_debug categorical 0.0% 3
long_tail imbalance
nutrition_score_warning_fruits_vegetables_legumes_estimate_from_ingredients_value numeric 8.0% 6
high_skew outliers
lc categorical 0.0% 5
allergens_from_user categorical 0.0% 34
long_tail
debug_param_sorted_langs unknown 0.0%
skipped
ecoscore_tags unknown 0.0%
skipped
nutriscore_score_opposite numeric 2.0% 28
image_small_url categorical 0.0% 50
long_tail
codes_tags unknown 0.0%
skipped
pnns_groups_2_tags unknown 0.0%
skipped
ingredients_analysis_tags unknown 0.0%
skipped
purchase_places_tags unknown 0.0%
skipped
unique_scans_n numeric 0.0% 48
high_skew outliers
update_key categorical 0.0% 9
long_tail
emb_codes_orig categorical 34.0% 5
long_tail null_rate
ingredients_text_with_allergens_de categorical 66.0% 16
long_tail null_rate
ingredients_without_ecobalyse_ids_n numeric 0.0% 20
main_countries_tags unknown 0.0%
skipped
ingredients_text_with_allergens_en categorical 16.0% 36
long_tail
nucleotides_tags unknown 0.0%
skipped
ingredients_text_with_allergens_sv categorical 92.0% 4
long_tail null_rate
entry_dates_tags unknown 0.0%
skipped
allergens_from_ingredients categorical 0.0% 35
long_tail
nova_groups categorical 4.0% 3
product_quantity categorical 6.0% 27
long_tail
ingredients_debug unknown 0.0%
skipped
generic_name categorical 4.0% 28
long_tail
origins_tags unknown 0.0%
skipped
added_countries_tags unknown 0.0%
skipped
categories_lc categorical 0.0% 6
image_url categorical 0.0% 50
long_tail
ingredients_sweeteners_n numeric 0.0% 1
constant
ingredients_text_ja categorical 98.0% 1
long_tail null_rate imbalance
allergens_tags unknown 0.0%
skipped
origin_es categorical 60.0% 1
null_rate imbalance
last_updated_t numeric 0.0% 50
outliers
origin_fr categorical 8.0% 7
long_tail
nutrition_score_warning_fruits_vegetables_nuts_estimate_from_ingredients_value numeric 10.0% 13
high_skew outliers
ingredients_without_ecobalyse_ids unknown 0.0%
skipped
ingredients_text_with_allergens_it categorical 68.0% 12
long_tail null_rate
data_quality_errors_tags unknown 0.0%
skipped
origin_pl categorical 90.0% 1
null_rate imbalance
packaging_text_fr categorical 6.0% 14
long_tail
debug_tags unknown 0.0%
skipped
ingredients_text_sv categorical 92.0% 4
long_tail null_rate
cities_tags unknown 0.0%
skipped
ingredients_with_unspecified_percent_n numeric 0.0% 18
product_name_fr categorical 2.0% 47
long_tail
traces categorical 0.0% 23
long_tail
known_ingredients_n numeric 0.0% 22
packaging_text_pl categorical 90.0% 1
null_rate imbalance
image_front_small_url categorical 0.0% 50
long_tail
origin_en categorical 14.0% 2
imbalance
interface_version_modified categorical 0.0% 2
serving_size categorical 12.0% 37
long_tail
states categorical 0.0% 26
long_tail
generic_name_fi categorical 90.0% 5
long_tail null_rate
schema_version numeric 0.0% 1
constant
packaging_old_before_taxonomization categorical 24.0% 36
long_tail null_rate
nova_groups_markers unknown 0.0%
skipped
amino_acids_prev_tags unknown 0.0%
skipped
product unknown 0.0%
skipped
emb_codes categorical 4.0% 11
long_tail
labels_tags unknown 0.0%
skipped
selected_images unknown 0.0%
skipped
nutriscore unknown 0.0%
skipped
packaging_tags unknown 0.0%
skipped
traces_from_ingredients categorical 0.0% 12
long_tail
nutrition_data_per categorical 0.0% 2
ecoscore_grade categorical 0.0% 9
packaging_hierarchy unknown 0.0%
skipped
nova_group numeric 4.0% 3
high_skew
additives_tags unknown 0.0%
skipped
emb_codes_20141016 categorical 58.0% 7
long_tail null_rate
ingredients_without_ciqual_codes unknown 0.0%
skipped
categories_tags unknown 0.0%
skipped
category_properties unknown 0.0%
skipped
packagings unknown 0.0%
skipped
languages_codes unknown 0.0%
skipped
ingredients_text_with_allergens_fi categorical 90.0% 4
long_tail null_rate
ciqual_food_name_tags unknown 0.0%
skipped
complete numeric 0.0% 2
ingredients_text_with_allergens_pl categorical 92.0% 3
long_tail null_rate
allergens_hierarchy unknown 0.0%
skipped
languages_hierarchy unknown 0.0%
skipped
nova_groups_tags unknown 0.0%
skipped
ingredients_tags unknown 0.0%
skipped
ingredients_text_it categorical 68.0% 12
long_tail null_rate
informers unknown 0.0%
skipped
origin_nb categorical 96.0% 1
null_rate imbalance
creator categorical 0.0% 13
long_tail
packaging_text_ja categorical 98.0% 1
long_tail null_rate imbalance
sortkey numeric 12.0% 44
high_skew outliers
packagings_materials_main categorical 62.0% 3
null_rate
ingredients_percent_analysis numeric 0.0% 2
high_skew outliers
amino_acids_tags unknown 0.0%
skipped
categories_properties_tags unknown 0.0%
skipped
environment_impact_level categorical 56.0% 1
null_rate imbalance
expiration_date categorical 4.0% 34
long_tail
ingredients_from_or_that_may_be_from_palm_oil_n numeric 6.0% 3
nutriscore_score numeric 2.0% 28
ingredients_text_with_allergens categorical 0.0% 50
long_tail
ingredients_with_specified_percent_sum numeric 0.0% 22
nutriscore_version categorical 0.0% 1
imbalance
lang categorical 0.0% 5
origins_hierarchy unknown 0.0%
skipped
origins_lc categorical 4.0% 6
origin_it categorical 68.0% 1
null_rate imbalance
serving_quantity categorical 12.0% 27
long_tail
checkers unknown 0.0%
skipped
editors_tags unknown 0.0%
skipped
stores categorical 4.0% 31
long_tail
product_name_pl categorical 90.0% 3
long_tail null_rate
weighters_tags unknown 0.0%
skipped
ecoscore_score numeric 14.0% 31
generic_name_it categorical 68.0% 5
long_tail null_rate
obsolete categorical 12.0% 1
imbalance
other_nutritional_substances_prev_tags unknown 0.0%
skipped
compared_to_category categorical 0.0% 35
long_tail
generic_name_es categorical 60.0% 7
long_tail null_rate
correctors unknown 0.0%
skipped
additives_n numeric 0.0% 8
ingredients_text_nb categorical 96.0% 1
null_rate imbalance
ingredients_text_es categorical 60.0% 13
long_tail null_rate
manufacturing_places_tags unknown 0.0%
skipped
origin categorical 6.0% 6
long_tail
origins_old categorical 22.0% 9
long_tail null_rate
packaging_text_de categorical 60.0% 2
null_rate
languages unknown 0.0%
skipped
categories_old categorical 2.0% 45
long_tail
ingredients_from_palm_oil_tags unknown 0.0%
skipped
minerals_prev_tags unknown 0.0%
skipped
origin_fi categorical 90.0% 1
null_rate imbalance
packaging_old categorical 14.0% 40
long_tail
ingredients_text_fi categorical 90.0% 4
long_tail null_rate
product_type categorical 0.0% 1
imbalance
ingredients_hierarchy unknown 0.0%
skipped
removed_countries_tags unknown 0.0%
skipped
unknown_nutrients_tags unknown 0.0%
skipped
no_nutrition_data categorical 4.0% 1
imbalance
ingredients_analysis unknown 0.0%
skipped
packagings_materials unknown 0.0%
skipped
serving_quantity_unit categorical 8.0% 2
imbalance
product_name categorical 0.0% 49
long_tail
id categorical 0.0% 50
long_tail
ingredients_text_with_allergens_nl categorical 78.0% 9
long_tail null_rate
categories categorical 0.0% 46
long_tail
nutrition_grades_tags unknown 0.0%
skipped
nutriscore_2023_tags unknown 0.0%
skipped
origin_ja categorical 98.0% 1
long_tail null_rate imbalance
nutrition_score_debug categorical 0.0% 2
imbalance
teams categorical 8.0% 39
long_tail
unknown_ingredients_n numeric 0.0% 6
high_skew outliers
url categorical 0.0% 50
long_tail
data_quality_completeness_tags unknown 0.0%
skipped
ecoscore_data unknown 0.0%
skipped
generic_name_pl categorical 90.0% 2
null_rate
nutrition_data categorical 2.0% 1
imbalance
generic_name_ja categorical 98.0% 1
long_tail null_rate imbalance
nutriments unknown 0.0%
skipped
last_image_dates_tags unknown 0.0%
skipped
brands categorical 0.0% 41
long_tail
minerals_tags unknown 0.0%
skipped
nutrition_data_prepared_per categorical 0.0% 1
imbalance
popularity_tags unknown 0.0%
skipped
packaging_text_es categorical 60.0% 2
null_rate
manufacturing_places categorical 2.0% 20
long_tail
generic_name_nb categorical 96.0% 1
null_rate imbalance
last_modified_t numeric 0.0% 50
outliers
vitamins_tags unknown 0.0%
skipped
_id categorical 0.0% 50
long_tail
teams_tags unknown 0.0%
skipped
countries categorical 0.0% 43
long_tail
pnns_groups_2 categorical 0.0% 11
states_hierarchy unknown 0.0%
skipped
code categorical 0.0% 50
long_tail
countries_lc categorical 2.0% 6
stores_tags unknown 0.0%
skipped
generic_name_de categorical 60.0% 9
long_tail null_rate
ingredients_n_tags unknown 0.0%
skipped
allergens categorical 0.0% 16
allergens_lc categorical 4.0% 6
ingredients_text_en categorical 12.0% 36
long_tail
misc_tags unknown 0.0%
skipped
photographers_tags unknown 0.0%
skipped
packaging_materials_tags unknown 0.0%
skipped
product_name_nl categorical 76.0% 7
long_tail null_rate
nutrition_score_warning_fruits_vegetables_legumes_estimate_from_ingredients numeric 8.0% 1
constant
product_name_sv categorical 92.0% 4
long_tail null_rate
food_groups_tags unknown 0.0%
skipped
completeness numeric 0.0% 14
outliers
pnns_groups_1_tags unknown 0.0%
skipped
ingredients_with_specified_percent_n numeric 0.0% 7
origin_nl categorical 76.0% 1
null_rate imbalance
fruits-vegetables-nuts_100g_estimate numeric 46.0% 2
null_rate high_skew
brands_old categorical 32.0% 29
long_tail null_rate
generic_name_fr categorical 6.0% 34
long_tail
ingredients unknown 0.0%
skipped
countries_tags unknown 0.0%
skipped
ingredients_original_tags unknown 0.0%
skipped
ingredients_text_de categorical 60.0% 16
long_tail null_rate
nutriscore_grade categorical 0.0% 6
image_thumb_url categorical 0.0% 50
long_tail
packaging_text_en categorical 14.0% 5
long_tail
packaging_text_it categorical 68.0% 3
long_tail null_rate
traces_tags unknown 0.0%
skipped
brands_tags unknown 0.0%
skipped
nutriscore_2021_tags unknown 0.0%
skipped
packaging_text categorical 4.0% 13
long_tail
popularity_key numeric 0.0% 49
high_skew outliers
ingredients_text categorical 0.0% 50
long_tail
ingredients_text_with_allergens_fr categorical 4.0% 47
long_tail
ingredients_text_nl categorical 76.0% 9
long_tail null_rate
product_name_es categorical 60.0% 17
long_tail null_rate
data_sources_tags unknown 0.0%
skipped
data_quality_bugs_tags unknown 0.0%
skipped
obsolete_since_date categorical 12.0% 1
imbalance
weighers_tags unknown 0.0%
skipped
ingredients_text_debug categorical 28.0% 35
long_tail null_rate
link categorical 4.0% 28
long_tail
created_t numeric 0.0% 50
ingredients_text_fr categorical 4.0% 47
long_tail
labels_hierarchy unknown 0.0%
skipped
ingredients_non_nutritive_sweeteners_n numeric 0.0% 1
constant
last_edit_dates_tags unknown 0.0%
skipped
packaging_text_nb categorical 96.0% 1
null_rate imbalance
packagings_complete numeric 4.0% 2
data_sources categorical 0.0% 43
long_tail
labels_old categorical 8.0% 38
long_tail
data_quality_info_tags unknown 0.0%
skipped
ingredients_from_palm_oil_n numeric 8.0% 2
outliers
ingredients_text_with_allergens_ja categorical 98.0% 1
long_tail null_rate imbalance
ingredients_lc categorical 0.0% 4
origins categorical 4.0% 20
long_tail
nutriscore_data unknown 0.0%
skipped
scans_n numeric 0.0% 49
high_skew outliers
ingredients_that_may_be_from_palm_oil_tags unknown 0.0%
skipped
generic_name_ar categorical 80.0% 2
null_rate
product_name_uk categorical 98.0% 1
long_tail null_rate imbalance
last_checked_t numeric 86.0% 7
null_rate
last_check_dates_tags unknown 0.0%
skipped
ingredients_text_uk categorical 98.0% 1
long_tail null_rate imbalance
carbon_footprint_from_known_ingredients_debug categorical 72.0% 14
long_tail null_rate
packaging_text_ar categorical 80.0% 1
null_rate imbalance
generic_name_uk categorical 98.0% 1
long_tail null_rate imbalance
last_checker categorical 86.0% 4
null_rate
checked categorical 86.0% 1
null_rate imbalance
product_name_ar categorical 78.0% 6
long_tail null_rate
ingredients_text_with_allergens_uk categorical 98.0% 1
long_tail null_rate imbalance
origin_uk categorical 98.0% 1
long_tail null_rate imbalance
packaging_text_uk categorical 98.0% 1
long_tail null_rate imbalance
ingredients_text_ar categorical 78.0% 2
null_rate
ingredients_text_with_allergens_ar categorical 82.0% 2
null_rate
carbon_footprint_percent_of_known_ingredients numeric 62.0% 19
null_rate
origin_ar categorical 80.0% 1
null_rate imbalance
nutrition_score_warning_no_fiber numeric 70.0% 1
null_rate constant
ingredients_text_debug_tags unknown 0.0%
skipped
nutriments_estimated unknown 0.0%
skipped
completed_t numeric 68.0% 16
null_rate
taxonomies_enhancer_tags unknown 0.0%
skipped
ingredients_text_with_allergens_sl categorical 98.0% 1
long_tail null_rate imbalance
packaging_text_sk categorical 98.0% 1
long_tail null_rate imbalance
ingredients_text_with_allergens_bg categorical 94.0% 3
long_tail null_rate
ingredients_text_pt categorical 80.0% 4
long_tail null_rate
ingredients_text_dz categorical 98.0% 1
long_tail null_rate imbalance
generic_name_ca categorical 96.0% 1
null_rate imbalance
generic_name_bg categorical 94.0% 1
null_rate imbalance
origin_sl categorical 98.0% 1
long_tail null_rate imbalance
product_name_et categorical 94.0% 3
long_tail null_rate
origin_et categorical 94.0% 1
null_rate imbalance
ingredients_text_with_allergens_sk categorical 98.0% 1
long_tail null_rate imbalance
ingredients_text_with_allergens_et categorical 94.0% 3
long_tail null_rate
nutrition_score_warning_nutriments_estimated numeric 96.0% 1
null_rate constant
ingredients_text_sk categorical 98.0% 1
long_tail null_rate imbalance
generic_name_pt categorical 80.0% 3
long_tail null_rate
ingredients_text_bg categorical 94.0% 3
long_tail null_rate
packaging_text_et categorical 94.0% 1
null_rate imbalance
product_name_sk categorical 98.0% 1
long_tail null_rate imbalance
ingredients_text_ca categorical 96.0% 1
null_rate imbalance
ingredients_text_with_allergens_ca categorical 98.0% 1
long_tail null_rate imbalance
product_name_dz categorical 98.0% 1
long_tail null_rate imbalance
product_name_sl categorical 98.0% 1
long_tail null_rate imbalance
origin_sk categorical 98.0% 1
long_tail null_rate imbalance
generic_name_et categorical 94.0% 1
null_rate imbalance
ingredients_text_et categorical 94.0% 3
long_tail null_rate
packaging_text_ca categorical 96.0% 1
null_rate imbalance
packaging_text_sl categorical 98.0% 1
long_tail null_rate imbalance
generic_name_dz categorical 98.0% 1
long_tail null_rate imbalance
origin_ca categorical 96.0% 1
null_rate imbalance
product_name_ca categorical 96.0% 1
null_rate imbalance
packaging_text_pt categorical 80.0% 1
null_rate imbalance
origin_bg categorical 94.0% 1
null_rate imbalance
packaging_text_bg categorical 94.0% 1
null_rate imbalance
origin_pt categorical 80.0% 1
null_rate imbalance
ingredients_text_with_allergens_pt categorical 84.0% 4
long_tail null_rate
product_name_bg categorical 94.0% 3
long_tail null_rate
ingredients_text_sl categorical 98.0% 1
long_tail null_rate imbalance
generic_name_sl categorical 98.0% 1
long_tail null_rate imbalance
generic_name_sk categorical 98.0% 1
long_tail null_rate imbalance
product_name_pt categorical 80.0% 7
long_tail null_rate
lc_imported categorical 84.0% 2
null_rate
abbreviated_product_name_fr_imported categorical 86.0% 7
long_tail null_rate
generic_name_zh categorical 98.0% 1
long_tail null_rate imbalance
obsolete_imported categorical 86.0% 1
null_rate imbalance
generic_name_fr_imported categorical 86.0% 7
long_tail null_rate
owners_tags categorical 86.0% 6
long_tail null_rate
owner_imported categorical 88.0% 5
long_tail null_rate
customer_service categorical 86.0% 6
long_tail null_rate
ingredients_text_zh_debug_tags unknown 0.0%
skipped
countries_imported categorical 84.0% 2
null_rate
data_sources_imported categorical 84.0% 8
long_tail null_rate
product_name_zh categorical 98.0% 1
long_tail null_rate imbalance
categories_imported categorical 88.0% 5
long_tail null_rate
quantity_imported categorical 86.0% 7
long_tail null_rate
ingredients_text_zh categorical 98.0% 1
long_tail null_rate imbalance
emb_code categorical 98.0% 1
long_tail null_rate imbalance
origins_fr categorical 96.0% 2
long_tail null_rate
nutrition_data_prepared_per_imported categorical 86.0% 1
null_rate imbalance
product_name_zh_debug_tags unknown 0.0%
skipped
sources_fields unknown 0.0%
skipped
customer_service_fr categorical 86.0% 6
long_tail null_rate
nutrition_data_per_imported categorical 84.0% 1
null_rate imbalance
owner categorical 86.0% 6
long_tail null_rate
abbreviated_product_name categorical 86.0% 7
long_tail null_rate
conservation_conditions_fr categorical 86.0% 7
long_tail null_rate
brands_imported categorical 86.0% 6
long_tail null_rate
owner_fields unknown 0.0%
skipped
conservation_conditions_fr_imported categorical 86.0% 7
long_tail null_rate
origin_fr_imported categorical 96.0% 2
long_tail null_rate
customer_service_fr_imported categorical 86.0% 6
long_tail null_rate
generic_name_zh_debug_tags unknown 0.0%
skipped
product_name_fr_imported categorical 86.0% 7
long_tail null_rate
lang_imported categorical 86.0% 1
null_rate imbalance
abbreviated_product_name_fr categorical 86.0% 7
long_tail null_rate
ingredients_text_fr_imported categorical 86.0% 7
long_tail null_rate
conservation_conditions categorical 86.0% 7
long_tail null_rate
nova_group_error categorical 96.0% 1
null_rate imbalance
producer_version_id_imported categorical 92.0% 3
long_tail null_rate
ingredients_text_de_ocr_1648990410 categorical 98.0% 1
long_tail null_rate imbalance
product_name_ro categorical 96.0% 2
long_tail null_rate
packaging_imported categorical 92.0% 2
null_rate
ingredients_text_de_ocr_1648990410_result categorical 98.0% 1
long_tail null_rate imbalance
ingredients_text_ro categorical 96.0% 1
null_rate imbalance
producer_version_id categorical 92.0% 3
long_tail null_rate
labels_imported categorical 90.0% 3
long_tail null_rate
allergens_imported categorical 90.0% 4
long_tail null_rate
origin_ro categorical 96.0% 1
null_rate imbalance
no_nutrition_data_imported categorical 92.0% 1
null_rate imbalance
serving_size_imported categorical 88.0% 6
long_tail null_rate
generic_name_ro categorical 96.0% 1
null_rate imbalance
ingredients_text_de_ocr_1648897071 categorical 98.0% 1
long_tail null_rate imbalance
ingredients_text_de_ocr_1648897071_result categorical 98.0% 1
long_tail null_rate imbalance
packaging_text_ro categorical 96.0% 1
null_rate imbalance
abbreviated_product_name_imported categorical 94.0% 3
long_tail null_rate
traces_imported categorical 92.0% 4
long_tail null_rate
specific_ingredients unknown 0.0%
skipped
packaging_text_ru categorical 94.0% 1
null_rate imbalance
origin_ru categorical 94.0% 1
null_rate imbalance
ingredients_text_with_allergens_ru categorical 94.0% 1
null_rate imbalance
product_name_ru categorical 94.0% 2
null_rate
generic_name_ru categorical 94.0% 2
null_rate
ingredients_text_ru categorical 94.0% 1
null_rate imbalance
packaging_text_da categorical 96.0% 1
null_rate imbalance
generic_name_da categorical 96.0% 2
long_tail null_rate
forest_footprint_data unknown 0.0%
skipped
product_name_da categorical 96.0% 2
long_tail null_rate
ingredients_text_with_allergens_da categorical 96.0% 2
long_tail null_rate
origin_da categorical 96.0% 1
null_rate imbalance
ingredients_text_da categorical 96.0% 2
long_tail null_rate
ingredients_text_cs categorical 94.0% 2
null_rate
ingredients_text_nl_ocr_1675675383_result categorical 98.0% 1
long_tail null_rate imbalance
product_name_cs categorical 94.0% 2
null_rate
ingredients_text_hu_ocr_1571428260_result categorical 98.0% 1
long_tail null_rate imbalance
packaging_text_cs categorical 94.0% 1
null_rate imbalance
ingredients_text_sr categorical 96.0% 2
long_tail null_rate
origin_sr categorical 96.0% 1
null_rate imbalance
ingredients_text_hu_ocr_1571428260 categorical 98.0% 1
long_tail null_rate imbalance
packaging_text_hu categorical 92.0% 1
null_rate imbalance
origin_cs categorical 96.0% 1
null_rate imbalance
ingredients_text_nl_ocr_1675675383 categorical 98.0% 1
long_tail null_rate imbalance
product_name_sr categorical 96.0% 2
long_tail null_rate
generic_name_hu categorical 92.0% 2
null_rate
packaging_text_sr categorical 96.0% 1
null_rate imbalance
ingredients_text_with_allergens_cs categorical 98.0% 1
long_tail null_rate imbalance
ingredients_text_with_allergens_sr categorical 96.0% 2
long_tail null_rate
ingredients_text_hu categorical 92.0% 4
long_tail null_rate
product_name_hu categorical 92.0% 3
long_tail null_rate
generic_name_sr categorical 96.0% 2
long_tail null_rate
origin_hu categorical 92.0% 1
null_rate imbalance
ingredients_text_with_allergens_hu categorical 94.0% 3
long_tail null_rate
generic_name_cs categorical 94.0% 1
null_rate imbalance
ingredients_text_xx categorical 96.0% 1
null_rate imbalance
origin_xx categorical 98.0% 1
long_tail null_rate imbalance
product_name_xx categorical 96.0% 1
null_rate imbalance
packaging_text_xx categorical 98.0% 1
long_tail null_rate imbalance
generic_name_xx categorical 96.0% 1
null_rate imbalance
ingredients_text_es_ocr_1548767061 categorical 98.0% 1
long_tail null_rate imbalance
ingredients_text_es_ocr_1548767061_result categorical 98.0% 1
long_tail null_rate imbalance
origin_ur categorical 98.0% 1
long_tail null_rate imbalance
product_name_he categorical 96.0% 2
long_tail null_rate
ingredients_text_he categorical 98.0% 1
long_tail null_rate imbalance
product_name_ur categorical 98.0% 1
long_tail null_rate imbalance
generic_name_he categorical 98.0% 1
long_tail null_rate imbalance
packaging_text_he categorical 98.0% 1
long_tail null_rate imbalance
ingredients_text_ur categorical 98.0% 1
long_tail null_rate imbalance
origin_he categorical 98.0% 1
long_tail null_rate imbalance
generic_name_ur categorical 98.0% 1
long_tail null_rate imbalance
packaging_text_ur categorical 98.0% 1
long_tail null_rate imbalance
ingredients_text_with_allergens_he categorical 98.0% 1
long_tail null_rate imbalance
nutriscore_grade_producer_imported categorical 94.0% 3
long_tail null_rate
nutriscore_grade_producer categorical 94.0% 3
long_tail null_rate
ingredients_text_el categorical 98.0% 1
long_tail null_rate imbalance
ingredients_text_with_allergens_el categorical 98.0% 1
long_tail null_rate imbalance
packaging_text_el categorical 98.0% 1
long_tail null_rate imbalance
origin_el categorical 98.0% 1
long_tail null_rate imbalance
product_name_el categorical 98.0% 1
long_tail null_rate imbalance
generic_name_el categorical 98.0% 1
long_tail null_rate imbalance
ingredients_text_it_ocr_1559410715 categorical 98.0% 1
long_tail null_rate imbalance
ingredients_text_de_ocr_1559410715 categorical 98.0% 1
long_tail null_rate imbalance
product_name_th categorical 98.0% 1
long_tail null_rate imbalance
ingredients_text_th categorical 98.0% 1
long_tail null_rate imbalance
ingredients_text_de_ocr_1548767354 categorical 98.0% 1
long_tail null_rate imbalance
ingredients_text_de_ocr_1548767354_result categorical 98.0% 1
long_tail null_rate imbalance
generic_name_th categorical 98.0% 1
long_tail null_rate imbalance
ingredients_text_it_ocr_1559410715_result categorical 98.0% 1
long_tail null_rate imbalance
packaging_text_th categorical 98.0% 1
long_tail null_rate imbalance
origin_th categorical 98.0% 1
long_tail null_rate imbalance
ingredients_text_with_allergens_th categorical 98.0% 1
long_tail null_rate imbalance
ingredients_text_de_ocr_1559410715_result categorical 98.0% 1
long_tail null_rate imbalance
packaging_text_fr_imported categorical 98.0% 1
long_tail null_rate imbalance
preparation categorical 98.0% 1
long_tail null_rate imbalance
preparation_fr_imported categorical 98.0% 1
long_tail null_rate imbalance
preparation_fr categorical 98.0% 1
long_tail null_rate imbalance
generic_name_lc categorical 98.0% 1
long_tail null_rate imbalance
product_name_lc categorical 98.0% 1
long_tail null_rate imbalance
ingredients_text_lc categorical 98.0% 1
long_tail null_rate imbalance
ingredients_text_with_allergens_lc categorical 98.0% 1
long_tail null_rate imbalance
generic_name_xx_debug_tags unknown 0.0%
skipped
ingredients_text_xx_debug_tags unknown 0.0%
skipped
product_name_xx_debug_tags unknown 0.0%
skipped
ingredients_text_fr_ocr_1561814324_result categorical 98.0% 1
long_tail null_rate imbalance
ingredients_text_fr_ocr_1561814324 categorical 98.0% 1
long_tail null_rate imbalance
ingredients_text_fr_ocr_1624039072 categorical 98.0% 1
long_tail null_rate imbalance
ingredients_text_fr_ocr_1624039072_result categorical 98.0% 1
long_tail null_rate imbalance
ingredients_text_fr_ocr_1573108349 categorical 98.0% 1
long_tail null_rate imbalance
ingredients_text_fr_ocr_1573108349_result categorical 98.0% 1
long_tail null_rate imbalance
ingredients_text_fr_ocr_1573107560_result categorical 98.0% 1
long_tail null_rate imbalance
ingredients_text_fr_ocr_1573108360 categorical 98.0% 1
long_tail null_rate imbalance
ingredients_text_fr_ocr_1573107556_result categorical 98.0% 1
long_tail null_rate imbalance
ingredients_text_fr_ocr_1573109955 categorical 98.0% 1
long_tail null_rate imbalance
ingredients_text_fr_ocr_1566920858 categorical 98.0% 1
long_tail null_rate imbalance
ingredients_text_fr_ocr_1573107560 categorical 98.0% 1
long_tail null_rate imbalance
ingredients_text_fr_ocr_1573108346 categorical 98.0% 1
long_tail null_rate imbalance
ingredients_text_fr_ocr_1573108346_result categorical 98.0% 1
long_tail null_rate imbalance
ingredients_text_fr_ocr_1573109955_result categorical 98.0% 1
long_tail null_rate imbalance
ingredients_text_fr_ocr_1566920858_result categorical 98.0% 1
long_tail null_rate imbalance
ingredients_text_fr_ocr_1573107556 categorical 98.0% 1
long_tail null_rate imbalance
ingredients_text_fr_ocr_1573108360_result categorical 98.0% 1
long_tail null_rate imbalance
ingredients_text_with_allergens_ro categorical 98.0% 1
long_tail null_rate imbalance
origin_lt categorical 98.0% 1
long_tail null_rate imbalance
ingredients_text_with_allergens_lt categorical 98.0% 1
long_tail null_rate imbalance
product_name_lt categorical 98.0% 1
long_tail null_rate imbalance
ingredients_text_lt categorical 98.0% 1
long_tail null_rate imbalance
packaging_text_lt categorical 98.0% 1
long_tail null_rate imbalance
generic_name_lt categorical 98.0% 1
long_tail null_rate imbalance
ingredients_text_fr_ocr_1713713129_result categorical 98.0% 1
long_tail null_rate imbalance
ingredients_text_fr_ocr_1713713129 categorical 98.0% 1
long_tail null_rate imbalance

ingredients_with_unspecified_percent_sum

numeric
n
50
nulls
0 (0.0%)
unique
22
min
0.4
max
100
mean
79.42
median
100
std
31.64
q1
53.6
q3
100
iqr
46.4
skew
-1.183
kurtosis
-0.133
n_outliers
0
outlier_rate
0
zero_rate
0

purchase_places

categorical long_tail
n
50
nulls
1 (2.0%)
unique
32
top_value
France
top_rate
0.1837
cardinality
32
entropy
4.479
entropy_ratio
0.8958
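
The entropy figures for purchase_places are consistent with Shannon entropy measured in bits and normalized by log2 of the cardinality: 4.479 / log2(32) = 4.479 / 5 ≈ 0.8958. The sketch below shows that calculation. It is an assumption about how the profiler derives entropy_ratio, inferred from the reported numbers, and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy_bits(values):
    """Shannon entropy in bits over the non-null values."""
    counts = Counter(v for v in values if v is not None)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_ratio(values):
    """Entropy divided by its maximum, log2(cardinality); 0.0 for a single distinct value."""
    cardinality = len({v for v in values if v is not None})
    if cardinality <= 1:
        return 0.0
    return entropy_bits(values) / math.log2(cardinality)


# Reproduce the reported ratio from the reported entropy and cardinality.
reported_ratio = 4.479 / math.log2(32)

# A uniform column reaches the maximum ratio; a constant column has ratio 0.
uniform_ratio = entropy_ratio(["a", "b", "c", "d"])
constant_ratio = entropy_ratio(["a", "a", None])
```

A ratio near 1 means the non-null values are spread almost evenly across categories, which fits the long_tail flag on this column.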

rev

numeric
n
50
nulls
0 (0.0%)
unique
46
min
19
max
674
mean
230
median
233.5
std
166.6
q1
72.75
q3
310.5
iqr
237.8
skew
0.7092
kurtosis
-0.02278
n_outliers
1
outlier_rate
0.02
zero_rate
0

product_name_it

categorical long_tail null_rate
n
50
nulls
34 (68.0%)
unique
12
top_value
top_rate
0.3125
cardinality
12
entropy
3.274
entropy_ratio
0.9134

editors

unknown skipped
n
50
nulls
0 (0.0%)
unique

nutrition_score_warning_fruits_vegetables_nuts_estimate_from_ingredients

numeric constant
n
50
nulls
5 (10.0%)
unique
1
min
1
max
1
mean
1
median
1
std
0
q1
1
q3
1
iqr
0
skew
0
kurtosis
0
n_outliers
0
outlier_rate
0
zero_rate
0

traces_hierarchy

unknown skipped
n
50
nulls
0 (0.0%)
unique

packaging

categorical long_tail
n
50
nulls
6 (12.0%)
unique
41
top_value
Plastique
top_rate
0.09091
cardinality
41
entropy
5.278
entropy_ratio
0.9851

packagings_n

numeric outliers
n
50
nulls
9 (18.0%)
unique
5
min
1
max
5
mean
2.073
median
2
std
0.8772
q1
2
q3
2
iqr
0
skew
0.9834
kurtosis
1.602
n_outliers
20
outlier_rate
0.4878
zero_rate
0

categories_properties

unknown skipped
n
50
nulls
0 (0.0%)
unique

generic_name_en

categorical long_tail
n
50
nulls
7 (14.0%)
unique
8
top_value
top_rate
0.8372
cardinality
8
entropy
1.098
entropy_ratio
0.366

food_groups

categorical
n
50
nulls
1 (2.0%)
unique
11
top_value
en:biscuits-and-cakes
top_rate
0.3469
cardinality
11
entropy
2.549
entropy_ratio
0.7367

ingredients_without_ciqual_codes_n

numeric
n
50
nulls
0 (0.0%)
unique
15
min
0
max
22
mean
4.98
median
3.5
std
4.825
q1
1
q3
7.75
iqr
6.75
skew
1.208
kurtosis
1.491
n_outliers
1
outlier_rate
0.02
zero_rate
0.18

origin_sv

categorical other null_rate imbalance
This column is most likely the Swedish-language origin field (the 'sv' suffix matches the language codes used by sibling columns such as packaging_text_sv and generic_name_sv), and it is effectively empty: 92% of its 50 rows are null, and the sole non-null value, present in 4 rows, is itself an empty string. With a cardinality of 1 and entropy of 0, the column carries no usable signal. Treatment: Drop; the column contains no information (92% null, remaining values are empty strings). high · anthropic:default
n
50
nulls
46 (92.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
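
The "null or empty string" pattern described above can be checked mechanically before dropping anything. The following is a minimal sketch using only the standard library; the helper names and the three-row sample are hypothetical and are not part of the profiler.

```python
from typing import Any


def is_effectively_empty(values: list[Any]) -> bool:
    """True when every value is None or a blank (whitespace-only) string."""
    return all(
        v is None or (isinstance(v, str) and v.strip() == "")
        for v in values
    )


def effectively_empty_columns(rows: list[dict[str, Any]]) -> list[str]:
    """Return the column names whose values are all null or blank."""
    columns = sorted({key for row in rows for key in row})
    return [
        col for col in columns
        if is_effectively_empty([row.get(col) for row in rows])
    ]


# Hypothetical three-row sample: origin_sv holds only None / "",
# while product_name_fr has one real value and must be kept.
rows = [
    {"origin_sv": None, "product_name_fr": "Henry's"},
    {"origin_sv": "", "product_name_fr": None},
    {"origin_sv": None, "product_name_fr": ""},
]
drop_candidates = effectively_empty_columns(rows)
print(drop_candidates)
```

Treating blank strings as missing before profiling would also make the null rate reflect reality: for this column, 92% null plus 8% blank is 100% unusable.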

product_name_ja

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

data_quality_warnings_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

packaging_recycling_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

scores

unknown skipped
n
50
nulls
0 (0.0%)
unique

nucleotides_prev_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

data_quality_dimensions

unknown skipped
n
50
nulls
0 (0.0%)
unique

product_name_fi

categorical long_tail null_rate
n
50
nulls
45 (90.0%)
unique
4
top_value
top_rate
0.4
cardinality
4
entropy
1.922
entropy_ratio
0.961

origin_de

categorical label null_rate imbalance
This column appears to be a German-language origin/source label field ('origin_de'), but it contains effectively no usable data: the only observed value across all 50 rows is an empty string, appearing 20 times, with 60% of rows (30) being null. Cardinality is 1, entropy is 0, and top_rate is 1.0 — the column is entirely uninformative in its current state. Treatment: Drop this column; it carries zero information (all non-null values are empty strings and 60% are null). high · anthropic:default
n
50
nulls
30 (60.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
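
The entropy figures in these blocks appear consistent with Shannon entropy in bits over the non-null values, with entropy_ratio equal to entropy divided by log2(cardinality); for example, purchase_places reports 4.479 / log2(32) = 0.8958. The sketch below rests on that assumption, since the profiler's actual implementation is not shown in this report, and the function name is hypothetical.

```python
import math
from collections import Counter


def entropy_stats(values):
    """Shannon entropy (bits) over non-null values, plus derived ratios."""
    present = [v for v in values if v is not None]
    counts = Counter(present)
    n = len(present)
    cardinality = len(counts)
    # Sum of -p * log2(p) over the distinct values.
    entropy = sum(-(c / n) * math.log2(c / n) for c in counts.values())
    # Normalise by the maximum possible entropy for this cardinality.
    ratio = entropy / math.log2(cardinality) if cardinality > 1 else 0.0
    top_rate = max(counts.values()) / n if n else 0.0
    return {
        "cardinality": cardinality,
        "entropy": entropy,
        "entropy_ratio": ratio,
        "top_rate": top_rate,
    }


# origin_de as described: 30 nulls and 20 empty strings.
origin_de = entropy_stats([None] * 30 + [""] * 20)

# A column shaped like generic_name_sv: 46 nulls, 4 distinct values.
four_distinct = entropy_stats([None] * 46 + ["a", "b", "c", "d"])
print(origin_de, four_distinct)
```

A single repeated value gives entropy 0 and top_rate 1 regardless of how many rows hold it, which is why these statistics alone cannot distinguish a blank column from a genuinely constant one.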

packaging_lc

categorical
n
50
nulls
6 (12.0%)
unique
7
top_value
fr
top_rate
0.3864
cardinality
7
entropy
1.992
entropy_ratio
0.7094

correctors_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

categories_hierarchy

unknown skipped
n
50
nulls
0 (0.0%)
unique

ingredients_ids_debug

unknown skipped
n
50
nulls
0 (0.0%)
unique

traces_lc

categorical
n
50
nulls
2 (4.0%)
unique
6
top_value
fr
top_rate
0.4792
cardinality
6
entropy
1.575
entropy_ratio
0.6093

environment_impact_level_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

last_image_t

numeric high_skew
n
50
nulls
0 (0.0%)
unique
50
min
1.639e+09
max
1.768e+09
mean
1.745e+09
median
1.752e+09
std
2.681e+07
q1
1.735e+09
q3
1.764e+09
iqr
2.896e+07
skew
-2.443
kurtosis
7.36
n_outliers
2
outlier_rate
0.04
zero_rate
0

ingredients_that_may_be_from_palm_oil_n

numeric high_skew outliers
n
50
nulls
4 (8.0%)
unique
3
min
0
max
2
mean
0.1957
median
0
std
0.4531
q1
0
q3
0
iqr
0
skew
2.23
kurtosis
4.321
n_outliers
8
outlier_rate
0.1739
zero_rate
0.8261

max_imgid

categorical long_tail
n
50
nulls
0 (0.0%)
unique
38
top_value
47
top_rate
0.06
cardinality
38
entropy
5.149
entropy_ratio
0.9811

nutriscore_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

generic_name_sv

categorical long_tail null_rate
n
50
nulls
46 (92.0%)
unique
4
top_value
Fin mörk choklad med 90% kakao
top_rate
0.25
cardinality
4
entropy
2
entropy_ratio
1

ingredients_text_with_allergens_nb

categorical null_rate imbalance
n
50
nulls
48 (96.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

quantity

categorical long_tail
n
50
nulls
1 (2.0%)
unique
36
top_value
100 g
top_rate
0.1224
cardinality
36
entropy
4.956
entropy_ratio
0.9587

countries_hierarchy

unknown skipped
n
50
nulls
0 (0.0%)
unique

data_quality_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

ingredients_n

numeric
n
50
nulls
0 (0.0%)
unique
22
min
1
max
39
mean
11.7
median
9
std
8.244
q1
5
q3
16
iqr
11
skew
1.237
kurtosis
1.435
n_outliers
2
outlier_rate
0.04
zero_rate
0

grades

unknown skipped
n
50
nulls
0 (0.0%)
unique

additives_original_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

nutrition_score_beverage

numeric high_skew
n
50
nulls
0 (0.0%)
unique
2
min
0
max
1
mean
0.02
median
0
std
0.1414
q1
0
q3
0
iqr
0
skew
6.857
kurtosis
45.02
n_outliers
1
outlier_rate
0.02
zero_rate
0.98

packaging_text_nl

categorical other null_rate imbalance
This column appears to hold Dutch-language packaging text for products, but is effectively empty: 76% of the 50 rows are null, and the sole non-null value is an empty string appearing 12 times, giving a cardinality of 1 and zero entropy. Every observed value is either missing or a blank string, meaning this column carries no usable information in this sample. Treatment: Drop this column; it contains no informative values in the current dataset. high · anthropic:default
n
50
nulls
38 (76.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

photographers

unknown skipped
n
50
nulls
0 (0.0%)
unique

pnns_groups_1

categorical
n
50
nulls
0 (0.0%)
unique
7
top_value
Sugary snacks
top_rate
0.76
cardinality
7
entropy
1.36
entropy_ratio
0.4846

product_name_en

categorical long_tail
n
50
nulls
7 (14.0%)
unique
34
top_value
top_rate
0.2326
cardinality
34
entropy
4.654
entropy_ratio
0.9147

traces_from_user

categorical long_tail
n
50
nulls
0 (0.0%)
unique
35
top_value
(en)
top_rate
0.14
cardinality
35
entropy
4.811
entropy_ratio
0.9379

generic_name_nl

categorical long_tail null_rate
n
50
nulls
38 (76.0%)
unique
4
top_value
top_rate
0.75
cardinality
4
entropy
1.208
entropy_ratio
0.6038

nutrition_grade_fr

categorical
n
50
nulls
0 (0.0%)
unique
6
top_value
e
top_rate
0.54
cardinality
6
entropy
1.913
entropy_ratio
0.7399

image_front_thumb_url

categorical long_tail
n
50
nulls
0 (0.0%)
unique
50
top_value
https://images.openfoodfacts.org/images/products/611/124/210/0992/front_fr.172.100.jpg
top_rate
0.02
cardinality
50
entropy
5.644
entropy_ratio
1

last_editor

categorical long_tail
n
50
nulls
1 (2.0%)
unique
24
top_value
foodless
top_rate
0.4286
cardinality
24
entropy
3.513
entropy_ratio
0.7662

nutrient_levels_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

product_name_nb

categorical long_tail null_rate
n
50
nulls
48 (96.0%)
unique
2
top_value
top_rate
0.5
cardinality
2
entropy
1
entropy_ratio
1

packaging_shapes_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

_keywords

unknown skipped
n
50
nulls
0 (0.0%)
unique

emb_codes_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

images

unknown skipped
n
50
nulls
0 (0.0%)
unique

states_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

packaging_text_sv

categorical other null_rate imbalance
This column appears to be a Swedish-language packaging text field ('sv' suffix indicating Swedish locale), but it is effectively empty in this dataset. A 92% null rate leaves only 4 non-null rows, and all 4 of those contain an empty string — meaning there is zero usable content across all 50 rows. The cardinality of 1 and entropy of 0.0 confirm complete absence of informational signal. Treatment: Drop — 100% of present values are empty strings and 92% are null, yielding no usable signal. high · anthropic:default
n
50
nulls
46 (92.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

informers_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

ingredients_text_pl

categorical long_tail null_rate
n
50
nulls
45 (90.0%)
unique
3
top_value
top_rate
0.6
cardinality
3
entropy
1.371
entropy_ratio
0.865

labels

categorical long_tail
n
50
nulls
1 (2.0%)
unique
42
top_value
top_rate
0.1633
cardinality
42
entropy
5.125
entropy_ratio
0.9504

sources

unknown skipped
n
50
nulls
0 (0.0%)
unique

checkers_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

product_quantity_unit

categorical imbalance
n
50
nulls
5 (10.0%)
unique
2
top_value
g
top_rate
0.9778
cardinality
2
entropy
0.1537
entropy_ratio
0.1537

last_modified_by

categorical long_tail
n
50
nulls
1 (2.0%)
unique
24
top_value
foodless
top_rate
0.4286
cardinality
24
entropy
3.513
entropy_ratio
0.7662

image_front_url

categorical long_tail
n
50
nulls
0 (0.0%)
unique
50
top_value
https://images.openfoodfacts.org/images/products/611/124/210/0992/front_fr.172.400.jpg
top_rate
0.02
cardinality
50
entropy
5.644
entropy_ratio
1

nutrition_data_prepared

categorical imbalance
n
50
nulls
2 (4.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

packaging_text_fi

categorical metadata null_rate imbalance
This column appears to be Finnish-language packaging text for a product dataset, but it is almost entirely empty: 90% of the 50 rows are null, and the sole non-null value across all 5 populated rows is an empty string. With cardinality of 1 and entropy of 0, the column carries zero information — it is effectively unpopulated. Treatment: Drop this column; it is 90% null with only empty strings in the remaining rows and provides no signal for modelling or analysis. high · anthropic:default
n
50
nulls
45 (90.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

interface_version_created

categorical
n
50
nulls
1 (2.0%)
unique
3
top_value
20120622
top_rate
0.5918
cardinality
3
entropy
1.167
entropy_ratio
0.7363

nutrient_levels

unknown skipped
n
50
nulls
0 (0.0%)
unique

languages_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

vitamins_prev_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

other_nutritional_substances_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

product_name_de

categorical long_tail null_rate
n
50
nulls
30 (60.0%)
unique
16
top_value
top_rate
0.25
cardinality
16
entropy
3.741
entropy_ratio
0.9354

nutrition_grades

categorical
n
50
nulls
0 (0.0%)
unique
6
top_value
e
top_rate
0.54
cardinality
6
entropy
1.913
entropy_ratio
0.7399

countries_beforescanbot

categorical long_tail
n
50
nulls
7 (14.0%)
unique
38
top_value
France
top_rate
0.1395
cardinality
38
entropy
5.066
entropy_ratio
0.9653

ingredients_text_with_allergens_es

categorical long_tail null_rate
n
50
nulls
31 (62.0%)
unique
13
top_value
top_rate
0.3684
cardinality
13
entropy
3.214
entropy_ratio
0.8684

labels_lc

categorical
n
50
nulls
1 (2.0%)
unique
6
top_value
en
top_rate
0.449
cardinality
6
entropy
1.57
entropy_ratio
0.6072

nova_group_debug

categorical long_tail imbalance
n
50
nulls
0 (0.0%)
unique
3
top_value
top_rate
0.96
cardinality
3
entropy
0.2823
entropy_ratio
0.1781

nutrition_score_warning_fruits_vegetables_legumes_estimate_from_ingredients_value

numeric high_skew outliers
n
50
nulls
4 (8.0%)
unique
6
min
0
max
50
mean
1.652
median
0
std
7.551
q1
0
q3
0
iqr
0
skew
5.932
kurtosis
35.23
n_outliers
5
outlier_rate
0.1087
zero_rate
0.8913

lc

categorical
n
50
nulls
0 (0.0%)
unique
5
top_value
fr
top_rate
0.7
cardinality
5
entropy
1.294
entropy_ratio
0.5572

allergens_from_user

categorical long_tail
n
50
nulls
0 (0.0%)
unique
34
top_value
(fr)
top_rate
0.16
cardinality
34
entropy
4.636
entropy_ratio
0.9112

debug_param_sorted_langs

unknown skipped
n
50
nulls
0 (0.0%)
unique

ecoscore_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

nutriscore_score_opposite

numeric
n
50
nulls
1 (2.0%)
unique
28
min
-40
max
0
mean
-17.47
median
-19
std
9.906
q1
-25
q3
-10
iqr
15
skew
0.1616
kurtosis
-0.5337
n_outliers
0
outlier_rate
0
zero_rate
0.08163

image_small_url

categorical long_tail
n
50
nulls
0 (0.0%)
unique
50
top_value
https://images.openfoodfacts.org/images/products/611/124/210/0992/front_fr.172.200.jpg
top_rate
0.02
cardinality
50
entropy
5.644
entropy_ratio
1

codes_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

pnns_groups_2_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

ingredients_analysis_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

purchase_places_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

unique_scans_n

numeric feature high_skew outliers
This column represents a count of unique scans (most likely barcode scan events) per product, with 50 observations and no nulls. The bulk of values cluster between 362.75 (Q1) and 560.75 (Q3), yet a right-skewed tail (skew=3.91, kurtosis=18.71) driven by 4 outliers pulls the mean (525.38) well above the median (432.0), with a maximum of 2257.0, about 4× the Q3 value. An outlier rate of 8% in just 50 rows indicates that a small number of products see dramatically higher scan volumes than the rest. Treatment: Log-transform or apply robust scaling before modelling to reduce the influence of the 4 extreme outliers; investigate those records for data-quality issues. high · anthropic:default
n
50
nulls
0 (0.0%)
unique
48
min
319
max
2,257
mean
525.4
median
432
std
306.4
q1
362.8
q3
560.8
iqr
198
skew
3.911
kurtosis
18.71
n_outliers
4
outlier_rate
0.08
zero_rate
0
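
The outlier flag and the suggested log transform can be illustrated with the quartiles reported above. This sketch assumes the profiler uses the conventional 1.5 × IQR (Tukey) fences, which the report does not state explicitly; the function name is hypothetical.

```python
import math


def tukey_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Lower and upper outlier fences from the quartiles (k * IQR rule)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles as reported for unique_scans_n.
lower, upper = tukey_fences(362.75, 560.75)
max_is_outlier = 2257 > upper

# log1p compresses the right tail: compare the max/median gap
# on the raw scale with the gap on the log scale.
raw_ratio = 2257 / 432
log_gap = math.log1p(2257) - math.log1p(432)
print(lower, upper, max_is_outlier, raw_ratio, log_gap)
```

Under that assumption, any product with more than 857.75 unique scans is flagged. After the transform the maximum sits far closer to the median, so a handful of popular products no longer dominates distance- or variance-based methods.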

update_key

categorical long_tail
n
50
nulls
0 (0.0%)
unique
9
top_value
brands
top_rate
0.56
cardinality
9
entropy
2.015
entropy_ratio
0.6357

emb_codes_orig

categorical long_tail null_rate
n
50
nulls
17 (34.0%)
unique
5
top_value
top_rate
0.8485
cardinality
5
entropy
0.9048
entropy_ratio
0.3897

ingredients_text_with_allergens_de

categorical long_tail null_rate
n
50
nulls
33 (66.0%)
unique
16
top_value
top_rate
0.1176
cardinality
16
entropy
3.97
entropy_ratio
0.9925

ingredients_without_ecobalyse_ids_n

numeric
n
50
nulls
0 (0.0%)
unique
20
min
0
max
29
mean
8.16
median
6.5
std
5.898
q1
4
q3
11
iqr
7
skew
1.28
kurtosis
1.743
n_outliers
1
outlier_rate
0.02
zero_rate
0.02

main_countries_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

ingredients_text_with_allergens_en

categorical long_tail
n
50
nulls
8 (16.0%)
unique
36
top_value
top_rate
0.1667
cardinality
36
entropy
4.924
entropy_ratio
0.9525

nucleotides_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

ingredients_text_with_allergens_sv

categorical long_tail null_rate
n
50
nulls
46 (92.0%)
unique
4
top_value
kakaomassa, kakaosmör, fettreducerat kakaopulver, socker, vanilj.
top_rate
0.25
cardinality
4
entropy
2
entropy_ratio
1

entry_dates_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

allergens_from_ingredients

categorical long_tail
n
50
nulls
0 (0.0%)
unique
35
top_value
top_rate
0.3
cardinality
35
entropy
4.432
entropy_ratio
0.864

nova_groups

categorical
n
50
nulls
2 (4.0%)
unique
3
top_value
4
top_rate
0.6875
cardinality
3
entropy
1.006
entropy_ratio
0.635

product_quantity

categorical long_tail
n
50
nulls
3 (6.0%)
unique
27
top_value
100
top_rate
0.234
cardinality
27
entropy
4.287
entropy_ratio
0.9017

ingredients_debug

unknown skipped
n
50
nulls
0 (0.0%)
unique

generic_name

categorical long_tail
n
50
nulls
2 (4.0%)
unique
28
top_value
top_rate
0.4375
cardinality
28
entropy
3.663
entropy_ratio
0.762

origins_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

added_countries_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

categories_lc

categorical
n
50
nulls
0 (0.0%)
unique
6
top_value
fr
top_rate
0.5
cardinality
6
entropy
1.628
entropy_ratio
0.6297

image_url

categorical long_tail
n
50
nulls
0 (0.0%)
unique
50
top_value
https://images.openfoodfacts.org/images/products/611/124/210/0992/front_fr.172.400.jpg
top_rate
0.02
cardinality
50
entropy
5.644
entropy_ratio
1

ingredients_sweeteners_n

numeric constant
n
50
nulls
0 (0.0%)
unique
1
min
0
max
0
mean
0
median
0
std
0
q1
0
q3
0
iqr
0
skew
0
kurtosis
0
n_outliers
0
outlier_rate
0
zero_rate
1

ingredients_text_ja

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

allergens_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

origin_es

categorical other null_rate imbalance
This column appears to be a Spanish-language origin/source label field ('origin_es'), but it is entirely devoid of meaningful content: the sole observed value is an empty string, appearing 20 times across 50 rows. With a 60% null rate and the remaining 40% being empty strings, the column carries zero informational entropy and is effectively blank across the entire dataset. This is a strong signal that the field was never populated. Treatment: Drop this column; it contains no usable signal (cardinality 1, top value is empty string, 60% nulls). high · anthropic:default
n
50
nulls
30 (60.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

last_updated_t

numeric outliers
n
50
nulls
0 (0.0%)
unique
50
min
1.739e+09
max
1.769e+09
mean
1.763e+09
median
1.767e+09
std
8.037e+06
q1
1.762e+09
q3
1.768e+09
iqr
6.138e+06
skew
-1.945
kurtosis
2.892
n_outliers
6
outlier_rate
0.12
zero_rate
0

origin_fr

categorical long_tail
n
50
nulls
4 (8.0%)
unique
7
top_value
top_rate
0.8696
cardinality
7
entropy
0.8958
entropy_ratio
0.3191

nutrition_score_warning_fruits_vegetables_nuts_estimate_from_ingredients_value

numeric high_skew outliers
n
50
nulls
5 (10.0%)
unique
13
min
0
max
100
mean
4.532
median
0
std
15.52
q1
0
q3
2.326
iqr
2.326
skew
5.411
kurtosis
30.37
n_outliers
7
outlier_rate
0.1556
zero_rate
0.7111

ingredients_without_ecobalyse_ids

unknown skipped
n
50
nulls
0 (0.0%)
unique

ingredients_text_with_allergens_it

categorical long_tail null_rate
n
50
nulls
34 (68.0%)
unique
12
top_value
top_rate
0.3125
cardinality
12
entropy
3.274
entropy_ratio
0.9134

data_quality_errors_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

origin_pl

categorical metadata null_rate imbalance
This column is most likely the Polish-language origin field (the 'pl' suffix matches the language codes used by sibling columns such as packaging_text_pl and product_name_pl), but it is essentially empty: 90% of its 50 rows are null, and the only non-null value is an empty string appearing 5 times. With cardinality of 1 and entropy of 0.0, it carries zero information. The combination of a high null rate and a single blank value strongly suggests this field was never populated in this sample. Treatment: Drop; zero variance and 90% nulls make this column useless for modelling or analysis. high · anthropic:default
n
50
nulls
45 (90.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

packaging_text_fr

categorical long_tail
n
50
nulls
3 (6.0%)
unique
14
top_value
top_rate
0.7234
cardinality
14
entropy
1.874
entropy_ratio
0.4923

debug_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

ingredients_text_sv

categorical long_tail null_rate
n
50
nulls
46 (92.0%)
unique
4
top_value
kakaomassa, kakaosmör, fettreducerat kakaopulver, socker, vanilj.
top_rate
0.25
cardinality
4
entropy
2
entropy_ratio
1

cities_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

ingredients_with_unspecified_percent_n

numeric
n
50
nulls
0 (0.0%)
unique
18
min
1
max
33
mean
8.8
median
7
std
6.061
q1
5
q3
11
iqr
6
skew
1.645
kurtosis
3.545
n_outliers
2
outlier_rate
0.04
zero_rate
0

product_name_fr

categorical long_tail
n
50
nulls
1 (2.0%)
unique
47
top_value
Henry’s
top_rate
0.04082
cardinality
47
entropy
5.533
entropy_ratio
0.9961

traces

categorical long_tail
n
50
nulls
0 (0.0%)
unique
23
top_value
top_rate
0.22
cardinality
23
entropy
3.922
entropy_ratio
0.8671

known_ingredients_n

numeric
n
50
nulls
0 (0.0%)
unique
22
min
0
max
36
mean
11.76
median
9
std
8.721
q1
5
q3
18.5
iqr
13.5
skew
0.8598
kurtosis
0.07411
n_outliers
0
outlier_rate
0
zero_rate
0.04

packaging_text_pl

categorical other null_rate imbalance
This column appears to be a Polish-language packaging text field that is almost entirely empty: 90% of its 50 rows are null, and the sole non-null value present in 5 rows is an empty string. With cardinality of 1 and entropy of 0, the column carries zero information. The combination of a 90% null rate and a top_value of '' means not a single meaningful entry exists in this sample. Treatment: Drop this column; it contains no usable information in the current sample. high · anthropic:default
n
50
nulls
45 (90.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

image_front_small_url

categorical long_tail
n
50
nulls
0 (0.0%)
unique
50
top_value
https://images.openfoodfacts.org/images/products/611/124/210/0992/front_fr.172.200.jpg
top_rate
0.02
cardinality
50
entropy
5.644
entropy_ratio
1

origin_en

categorical imbalance
n
50
nulls
7 (14.0%)
unique
2
top_value
top_rate
0.9767
cardinality
2
entropy
0.1594
entropy_ratio
0.1594

interface_version_modified

categorical
n
50
nulls
0 (0.0%)
unique
2
top_value
20150316.jqm2
top_rate
0.84
cardinality
2
entropy
0.6343
entropy_ratio
0.6343

serving_size

categorical long_tail
n
50
nulls
6 (12.0%)
unique
37
top_value
100g
top_rate
0.06818
cardinality
37
entropy
5.107
entropy_ratio
0.9803

states

categorical long_tail
n
50
nulls
0 (0.0%)
unique
26
top_value
en:to-be-completed, en:nutrition-facts-completed, en:ingredients-completed, en:expiration-date-completed, en:packaging-code-to-be-completed, en:characteristics-to-be-completed, en:origins-to-be-completed, en:categories-completed, en:brands-completed, en:packaging-completed, en:quantity-completed, en:product-name-completed, en:photos-validated, en:packaging-photo-selected, en:nutrition-photo-selected, en:ingredients-photo-selected, en:front-photo-selected, en:photos-uploaded
top_rate
0.16
cardinality
26
entropy
4.286
entropy_ratio
0.9119

generic_name_fi

categorical long_tail null_rate
n
50
nulls
45 (90.0%)
unique
5
top_value
Hieno tumma suklaa jossa 90% kaakaota
top_rate
0.2
cardinality
5
entropy
2.322
entropy_ratio
1

schema_version

numeric constant
n
50
nulls
0 (0.0%)
unique
1
min
996
max
996
mean
996
median
996
std
0
q1
996
q3
996
iqr
0
skew
0
kurtosis
0
n_outliers
0
outlier_rate
0
zero_rate
0

packaging_old_before_taxonomization

categorical long_tail null_rate
n
50
nulls
12 (24.0%)
unique
36
top_value
plastique
top_rate
0.07895
cardinality
36
entropy
5.123
entropy_ratio
0.9909

nova_groups_markers

unknown skipped
n
50
nulls
0 (0.0%)
unique

amino_acids_prev_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

product

unknown skipped
n
50
nulls
0 (0.0%)
unique

emb_codes

categorical long_tail
n
50
nulls
2 (4.0%)
unique
11
top_value
top_rate
0.7292
cardinality
11
entropy
1.72
entropy_ratio
0.4972

labels_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

selected_images

unknown skipped
n
50
nulls
0 (0.0%)
unique

nutriscore

unknown skipped
n
50
nulls
0 (0.0%)
unique

packaging_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

traces_from_ingredients

categorical long_tail
n
50
nulls
0 (0.0%)
unique
12
top_value
top_rate
0.78
cardinality
12
entropy
1.521
entropy_ratio
0.4243

nutrition_data_per

categorical
n
50
nulls
0 (0.0%)
unique
2
top_value
100g
top_rate
0.84
cardinality
2
entropy
0.6343
entropy_ratio
0.6343

ecoscore_grade

categorical
n
50
nulls
0 (0.0%)
unique
9
top_value
e
top_rate
0.24
cardinality
9
entropy
2.808
entropy_ratio
0.8857

packaging_hierarchy

unknown skipped
n
50
nulls
0 (0.0%)
unique

nova_group

numeric high_skew
n
50
nulls
2 (4.0%)
unique
3
min
1
max
4
mean
3.646
median
4
std
0.601
q1
3
q3
4
iqr
1
skew
-2.062
kurtosis
5.651
n_outliers
1
outlier_rate
0.02083
zero_rate
0

additives_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

emb_codes_20141016

categorical long_tail null_rate
n
50
nulls
29 (58.0%)
unique
7
top_value
top_rate
0.7143
cardinality
7
entropy
1.602
entropy_ratio
0.5705

ingredients_without_ciqual_codes

unknown skipped
n
50
nulls
0 (0.0%)
unique

categories_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

category_properties

unknown skipped
n
50
nulls
0 (0.0%)
unique

packagings

unknown skipped
n
50
nulls
0 (0.0%)
unique

languages_codes

unknown skipped
n
50
nulls
0 (0.0%)
unique

ingredients_text_with_allergens_fi

categorical long_tail null_rate
n
50
nulls
45 (90.0%)
unique
4
top_value
top_rate
0.4
cardinality
4
entropy
1.922
entropy_ratio
0.961

ciqual_food_name_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

complete

numeric
n
50
nulls
0 (0.0%)
unique
2
min
0
max
1
mean
0.32
median
0
std
0.4712
q1
0
q3
1
iqr
1
skew
0.7717
kurtosis
-1.404
n_outliers
0
outlier_rate
0
zero_rate
0.68

ingredients_text_with_allergens_pl

categorical long_tail null_rate
n
50
nulls
46 (92.0%)
unique
3
top_value
top_rate
0.5
cardinality
3
entropy
1.5
entropy_ratio
0.9464

allergens_hierarchy

unknown skipped
n
50
nulls
0 (0.0%)
unique

languages_hierarchy

unknown skipped
n
50
nulls
0 (0.0%)
unique

nova_groups_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

ingredients_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

ingredients_text_it

categorical long_tail null_rate
n
50
nulls
34 (68.0%)
unique
12
top_value
top_rate
0.3125
cardinality
12
entropy
3.274
entropy_ratio
0.9134

informers

unknown skipped
n
50
nulls
0 (0.0%)
unique

origin_nb

categorical null_rate imbalance
n
50
nulls
48 (96.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

creator

categorical long_tail
n
50
nulls
0 (0.0%)
unique
13
top_value
openfoodfacts-contributors
top_rate
0.46
cardinality
13
entropy
2.351
entropy_ratio
0.6353

packaging_text_ja

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

sortkey

numeric high_skew outliers
n
50
nulls
6 (12.0%)
unique
44
min
1.568e+09
max
1.611e+09
mean
1.605e+09
median
1.608e+09
std
8.692e+06
q1
1.604e+09
q3
1.61e+09
iqr
6.16e+06
skew
-2.782
kurtosis
8.091
n_outliers
4
outlier_rate
0.09091
zero_rate
0

packagings_materials_main

categorical null_rate
n
50
nulls
31 (62.0%)
unique
3
top_value
en:paper-or-cardboard
top_rate
0.6842
cardinality
3
entropy
1.105
entropy_ratio
0.6972

ingredients_percent_analysis

numeric feature high_skew outliers
This column appears to be a two-valued status flag for ingredient percentage analysis, taking only two distinct values across all 50 rows: 1.0 (the vast majority) and -1.0 (a minority case). With Q1, median, and Q3 all equal to 1.0 and a mean of 0.84, 92% of records (46 rows) are coded 1.0 and the remaining 8% (4 rows) are -1.0; those 4 rows are the flagged outliers (8% outlier rate). The extreme skew (−3.10) and kurtosis (7.59) are entirely explained by this near-constant two-valued distribution, not by a continuous numeric spread. Treatment: Recode as a binary categorical (1 / -1 → 1 / 0) before modelling; verify whether -1.0 encodes 'fail' or 'missing' to avoid misinterpretation. high · anthropic:default
n
50
nulls
0 (0.0%)
unique
2
min
-1
max
1
mean
0.84
median
1
std
0.5481
q1
1
q3
1
iqr
0
skew
-3.096
kurtosis
7.587
n_outliers
4
outlier_rate
0.08
zero_rate
0
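
For a flag that only takes the values 1 and -1, the mean fixes the share of each value: if p is the share of 1s, then mean = p − (1 − p), so p = (mean + 1) / 2. The sketch below works through that arithmetic with the reported mean and shows one possible recoding; the function names are hypothetical, and mapping -1 to 0 assumes that -1 means "failed" rather than "missing", which should be verified first.

```python
def share_of_ones(mean: float) -> float:
    """Share of +1 values in a column that only contains +1 and -1."""
    return (mean + 1) / 2


def recode_flag(value):
    """Map 1 -> 1 and -1 -> 0; leave anything unexpected as None for review."""
    if value == 1:
        return 1
    if value == -1:
        return 0
    return None


# Reported statistics: n = 50, mean = 0.84.
share = share_of_ones(0.84)
n_minus_one = round((1 - share) * 50)

# Reconstruct a column with that composition and recode it.
flags = [1] * (50 - n_minus_one) + [-1] * n_minus_one
recoded = [recode_flag(v) for v in flags]
print(share, n_minus_one, sum(flags) / len(flags), sum(recoded))
```

The result (46 rows at 1, 4 rows at -1) agrees with the reported n_outliers of 4 and outlier_rate of 0.08.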

amino_acids_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

categories_properties_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

environment_impact_level

categorical other null_rate imbalance
This column is intended to capture an environmental impact level category, but it is effectively empty: 56% of the 50 rows are null and the remaining 44% (22 rows) contain only a blank string, yielding a single unique value and zero entropy. The column carries no usable information in its current state and is entirely uninformative for modelling or analysis. Treatment: Drop this column; all non-null values are blank strings and it contains zero informational signal. high · anthropic:default
n
50
nulls
28 (56.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

expiration_date

categorical long_tail
n
50
nulls
2 (4.0%)
unique
34
top_value
top_rate
0.3125
cardinality
34
entropy
4.364
entropy_ratio
0.8578

ingredients_from_or_that_may_be_from_palm_oil_n

numeric
n
50
nulls
3 (6.0%)
unique
3
min
0
max
2
mean
0.3404
median
0
std
0.5625
q1
0
q3
1
iqr
1
skew
1.393
kurtosis
0.969
n_outliers
0
outlier_rate
0
zero_rate
0.7021

nutriscore_score

numeric
n
50
nulls
1 (2.0%)
unique
28
min
0
max
40
mean
17.47
median
19
std
9.906
q1
10
q3
25
iqr
15
skew
-0.1616
kurtosis
-0.5337
n_outliers
0
outlier_rate
0
zero_rate
0.08163

ingredients_text_with_allergens

categorical long_tail
n
50
nulls
0 (0.0%)
unique
50
top_value
milk cream, cream, sugar, banana, bacteria
top_rate
0.02
cardinality
50
entropy
5.644
entropy_ratio
1

ingredients_with_specified_percent_sum

numeric
n
50
nulls
0 (0.0%)
unique
22
min
0
max
99.6
mean
22.74
median
0
std
32.88
q1
0
q3
52.25
iqr
52.25
skew
0.9979
kurtosis
-0.5856
n_outliers
0
outlier_rate
0
zero_rate
0.58

nutriscore_version

categorical imbalance
n
50
nulls
0 (0.0%)
unique
1
top_value
2023
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

lang

categorical
n
50
nulls
0 (0.0%)
unique
5
top_value
fr
top_rate
0.7
cardinality
5
entropy
1.294
entropy_ratio
0.5572

origins_hierarchy

unknown skipped
n
50
nulls
0 (0.0%)
unique

origins_lc

categorical
n
50
nulls
2 (4.0%)
unique
6
top_value
fr
top_rate
0.4792
cardinality
6
entropy
1.575
entropy_ratio
0.6093

origin_it

categorical other null_rate imbalance
This column is most likely the Italian-language origin field (the 'it' suffix matches the language codes used by sibling columns such as product_name_it and ingredients_text_it), but it is effectively empty: 68% of its 50 rows are null, and the sole non-null value present is an empty string appearing 16 times. With cardinality of 1 and entropy of 0, the column carries zero information. The combination of high nulls and a blank-string-only value suggests the field was never populated in this sample. Treatment: Drop; zero variance and entirely unpopulated (null or empty string), so it contributes no signal to any downstream task. high · anthropic:default
n
50
nulls
34 (68.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

serving_quantity

categorical long_tail
n
50
nulls
6 (12.0%)
unique
27
top_value
100
top_rate
0.1591
cardinality
27
entropy
4.322
entropy_ratio
0.9089

checkers

unknown skipped
n
50
nulls
0 (0.0%)
unique

editors_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

stores

categorical long_tail
n
50
nulls
2 (4.0%)
unique
31
top_value
top_rate
0.2917
cardinality
31
entropy
4.233
entropy_ratio
0.8543

product_name_pl

categorical long_tail null_rate
n
50
nulls
45 (90.0%)
unique
3
top_value
top_rate
0.6
cardinality
3
entropy
1.371
entropy_ratio
0.865

weighters_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

ecoscore_score

numeric
n
50
nulls
7 (14.0%)
unique
31
min
13
max
94
mean
47.74
median
44
std
21.19
q1
27.5
q3
64
iqr
36.5
skew
0.3069
kurtosis
-0.7946
n_outliers
0
outlier_rate
0
zero_rate
0

generic_name_it

categorical long_tail null_rate
n
50
nulls
34 (68.0%)
unique
5
top_value
top_rate
0.6875
cardinality
5
entropy
1.497
entropy_ratio
0.6446

obsolete

categorical imbalance
n
50
nulls
6 (12.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

other_nutritional_substances_prev_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

compared_to_category

categorical long_tail
n
50
nulls
0 (0.0%)
unique
35
top_value
en:dark-chocolate-bar-with-more-than-70-cocoa
top_rate
0.1
cardinality
35
entropy
4.886
entropy_ratio
0.9526

generic_name_es

categorical long_tail null_rate
n
50
nulls
30 (60.0%)
unique
7
top_value
top_rate
0.65
cardinality
7
entropy
1.817
entropy_ratio
0.6471

correctors

unknown skipped
n
50
nulls
0 (0.0%)
unique

additives_n

numeric
n
50
nulls
0 (0.0%)
unique
8
min
0
max
8
mean
1.52
median
1
std
1.821
q1
0
q3
2
iqr
2
skew
1.473
kurtosis
2.105
n_outliers
2
outlier_rate
0.04
zero_rate
0.4

ingredients_text_nb

categorical null_rate imbalance
n
50
nulls
48 (96.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

ingredients_text_es

categorical long_tail null_rate
n
50
nulls
30 (60.0%)
unique
13
top_value
top_rate
0.4
cardinality
13
entropy
3.122
entropy_ratio
0.8437

manufacturing_places_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

origin

categorical long_tail
n
50
nulls
3 (6.0%)
unique
6
top_value
top_rate
0.8936
cardinality
6
entropy
0.7359
entropy_ratio
0.2847

origins_old

categorical long_tail null_rate
n
50
nulls
11 (22.0%)
unique
9
top_value
top_rate
0.7949
cardinality
9
entropy
1.347
entropy_ratio
0.4251

packaging_text_de

categorical null_rate
n
50
nulls
30 (60.0%)
unique
2
top_value
top_rate
0.95
cardinality
2
entropy
0.2864
entropy_ratio
0.2864
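
The entropy and entropy_ratio figures in these blocks are consistent with Shannon entropy in bits over the non-null values, with the ratio obtained by dividing by log2(cardinality). That reading is inferred from the reported numbers, not from the profiler's documentation, so the sketch below is an assumption. It reproduces the packaging_text_de figures from a hypothetical sample with the same shape: 20 non-null rows, one value at a 0.95 share.

```python
import math
from collections import Counter


def shannon_entropy_bits(values):
    """Shannon entropy (base 2) of the non-null values."""
    counts = Counter(v for v in values if v is not None)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # p * log2(1/p), written as log2(total / c) to avoid a negative zero.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def entropy_ratio(values):
    """Entropy divided by its maximum, log2(cardinality); 0 for constant columns."""
    cardinality = len({v for v in values if v is not None})
    if cardinality <= 1:
        return 0.0
    return shannon_entropy_bits(values) / math.log2(cardinality)


# Hypothetical sample: 30 nulls, 19 blanks, 1 other value ("x" is a placeholder).
sample = [None] * 30 + [""] * 19 + ["x"]
entropy = shannon_entropy_bits(sample)  # rounds to 0.2864
ratio = entropy_ratio(sample)           # same value, since log2(2) == 1
```

Under the same assumption, four distinct values seen once each give an entropy of 2 bits and a ratio of 1, which matches the product_name_sv block.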

languages

unknown skipped
n
50
nulls
0 (0.0%)
unique

categories_old

categorical long_tail
n
50
nulls
1 (2.0%)
unique
45
top_value
Snacks, Snacks sucrés, Biscuits et gâteaux, Biscuits, Biscuits secs
top_rate
0.04082
cardinality
45
entropy
5.451
entropy_ratio
0.9926

ingredients_from_palm_oil_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

minerals_prev_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

origin_fi

categorical other null_rate imbalance
This column is most likely the Finnish-language origin field (the '_fi' suffix is a locale code), and it is almost entirely empty: a 90% null rate leaves only 5 non-null rows across 50 records. Every one of those 5 non-null values is an empty string, so the column contains no meaningful information: cardinality is 1, entropy is 0, and the sole 'value' is a blank. Treatment: Drop this column entirely; it carries no information and is effectively empty across all 50 rows. high · anthropic:default
n
50
nulls
45 (90.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

packaging_old

categorical long_tail
n
50
nulls
7 (14.0%)
unique
40
top_value
Plastique
top_rate
0.06977
cardinality
40
entropy
5.269
entropy_ratio
0.9901

ingredients_text_fi

categorical long_tail null_rate
n
50
nulls
45 (90.0%)
unique
4
top_value
top_rate
0.4
cardinality
4
entropy
1.922
entropy_ratio
0.961

product_type

categorical imbalance
n
50
nulls
0 (0.0%)
unique
1
top_value
food
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

ingredients_hierarchy

unknown skipped
n
50
nulls
0 (0.0%)
unique

removed_countries_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

unknown_nutrients_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

no_nutrition_data

categorical imbalance
n
50
nulls
2 (4.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

ingredients_analysis

unknown skipped
n
50
nulls
0 (0.0%)
unique

packagings_materials

unknown skipped
n
50
nulls
0 (0.0%)
unique

serving_quantity_unit

categorical imbalance
n
50
nulls
4 (8.0%)
unique
2
top_value
g
top_rate
0.9783
cardinality
2
entropy
0.1511
entropy_ratio
0.1511

product_name

categorical long_tail
n
50
nulls
0 (0.0%)
unique
49
top_value
Henry’s
top_rate
0.04
cardinality
49
entropy
5.604
entropy_ratio
0.9981

id

categorical long_tail
n
50
nulls
0 (0.0%)
unique
50
top_value
6111242100992
top_rate
0.02
cardinality
50
entropy
5.644
entropy_ratio
1

ingredients_text_with_allergens_nl

categorical long_tail null_rate
n
50
nulls
39 (78.0%)
unique
9
top_value
top_rate
0.2727
cardinality
9
entropy
3.027
entropy_ratio
0.955

categories

categorical long_tail
n
50
nulls
0 (0.0%)
unique
46
top_value
Snacks,Snacks sucrés,Cacao et dérivés,Chocolats,Chocolats noirs,Chocolat noir en tablette extra dégustation à 70% de cacao minimum
top_rate
0.06
cardinality
46
entropy
5.469
entropy_ratio
0.9901

nutrition_grades_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

nutriscore_2023_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

origin_ja

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

nutrition_score_debug

categorical imbalance
n
50
nulls
0 (0.0%)
unique
2
top_value
top_rate
0.98
cardinality
2
entropy
0.1414
entropy_ratio
0.1414

teams

categorical long_tail
n
50
nulls
4 (8.0%)
unique
39
top_value
pain-au-chocolat
top_rate
0.1087
cardinality
39
entropy
5.124
entropy_ratio
0.9695

unknown_ingredients_n

numeric high_skew outliers
n
50
nulls
0 (0.0%)
unique
6
min
0
max
13
mean
0.66
median
0
std
2.255
q1
0
q3
0
iqr
0
skew
4.236
kurtosis
18.32
n_outliers
8
outlier_rate
0.16
zero_rate
0.84

url

categorical long_tail
n
50
nulls
0 (0.0%)
unique
50
top_value
https://world.openfoodfacts.org/product/6111242100992/perly
top_rate
0.02
cardinality
50
entropy
5.644
entropy_ratio
1

data_quality_completeness_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

ecoscore_data

unknown skipped
n
50
nulls
0 (0.0%)
unique

generic_name_pl

categorical null_rate
n
50
nulls
45 (90.0%)
unique
2
top_value
top_rate
0.8
cardinality
2
entropy
0.7219
entropy_ratio
0.7219

nutrition_data

categorical imbalance
n
50
nulls
1 (2.0%)
unique
1
top_value
on
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

generic_name_ja

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

nutriments

unknown skipped
n
50
nulls
0 (0.0%)
unique

last_image_dates_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

brands

categorical long_tail
n
50
nulls
0 (0.0%)
unique
41
top_value
Lindt
top_rate
0.08
cardinality
41
entropy
5.214
entropy_ratio
0.9731

minerals_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

nutrition_data_prepared_per

categorical imbalance
n
50
nulls
0 (0.0%)
unique
1
top_value
100g
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

popularity_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

packaging_text_es

categorical null_rate
n
50
nulls
30 (60.0%)
unique
2
top_value
top_rate
0.95
cardinality
2
entropy
0.2864
entropy_ratio
0.2864

manufacturing_places

categorical long_tail
n
50
nulls
1 (2.0%)
unique
20
top_value
top_rate
0.4082
cardinality
20
entropy
3.187
entropy_ratio
0.7374

generic_name_nb

categorical null_rate imbalance
n
50
nulls
48 (96.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

last_modified_t

numeric outliers
n
50
nulls
0 (0.0%)
unique
50
min
1.738e+09
max
1.769e+09
mean
1.763e+09
median
1.767e+09
std
8.093e+06
q1
1.762e+09
q3
1.768e+09
iqr
6.138e+06
skew
-1.961
kurtosis
2.972
n_outliers
6
outlier_rate
0.12
zero_rate
0

vitamins_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

_id

categorical long_tail
n
50
nulls
0 (0.0%)
unique
50
top_value
6111242100992
top_rate
0.02
cardinality
50
entropy
5.644
entropy_ratio
1

teams_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

countries

categorical long_tail
n
50
nulls
0 (0.0%)
unique
43
top_value
Maroc
top_rate
0.1
cardinality
43
entropy
5.252
entropy_ratio
0.9678

pnns_groups_2

categorical
n
50
nulls
0 (0.0%)
unique
11
top_value
Biscuits and cakes
top_rate
0.34
cardinality
11
entropy
2.599
entropy_ratio
0.7513

states_hierarchy

unknown skipped
n
50
nulls
0 (0.0%)
unique

code

categorical long_tail
n
50
nulls
0 (0.0%)
unique
50
top_value
6111242100992
top_rate
0.02
cardinality
50
entropy
5.644
entropy_ratio
1

countries_lc

categorical
n
50
nulls
1 (2.0%)
unique
6
top_value
en
top_rate
0.5714
cardinality
6
entropy
1.521
entropy_ratio
0.5883

stores_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

generic_name_de

categorical long_tail null_rate
n
50
nulls
30 (60.0%)
unique
9
top_value
top_rate
0.6
cardinality
9
entropy
2.171
entropy_ratio
0.6849

ingredients_n_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

allergens

categorical
n
50
nulls
0 (0.0%)
unique
16
top_value
top_rate
0.32
cardinality
16
entropy
3.364
entropy_ratio
0.8411

allergens_lc

categorical
n
50
nulls
2 (4.0%)
unique
6
top_value
en
top_rate
0.4583
cardinality
6
entropy
1.578
entropy_ratio
0.6104

ingredients_text_en

categorical long_tail
n
50
nulls
6 (12.0%)
unique
36
top_value
top_rate
0.2045
cardinality
36
entropy
4.811
entropy_ratio
0.9306

misc_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

photographers_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

packaging_materials_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

product_name_nl

categorical long_tail null_rate
n
50
nulls
38 (76.0%)
unique
7
top_value
top_rate
0.5
cardinality
7
entropy
2.292
entropy_ratio
0.8166

nutrition_score_warning_fruits_vegetables_legumes_estimate_from_ingredients

numeric constant
n
50
nulls
4 (8.0%)
unique
1
min
1
max
1
mean
1
median
1
std
0
q1
1
q3
1
iqr
0
skew
0
kurtosis
0
n_outliers
0
outlier_rate
0
zero_rate
0

product_name_sv

categorical long_tail null_rate
n
50
nulls
46 (92.0%)
unique
4
top_value
90% Cocoa
top_rate
0.25
cardinality
4
entropy
2
entropy_ratio
1

food_groups_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

completeness

numeric outliers
n
50
nulls
0 (0.0%)
unique
14
min
0.575
max
1.1
mean
0.91
median
0.9
std
0.1358
q1
0.8875
q3
1
iqr
0.1125
skew
-0.6678
kurtosis
0.32
n_outliers
6
outlier_rate
0.12
zero_rate
0

pnns_groups_1_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

ingredients_with_specified_percent_n

numeric
n
50
nulls
0 (0.0%)
unique
7
min
0
max
8
mean
1.1
median
0
std
1.729
q1
0
q3
2
iqr
2
skew
1.878
kurtosis
3.676
n_outliers
1
outlier_rate
0.02
zero_rate
0.58

origin_nl

categorical other null_rate imbalance
This column ('origin_nl') is a categorical field, likely a Dutch-language origin label or description, but it is effectively empty: 76% of the 50 rows are null, and the sole non-null value present is an empty string (''), appearing 12 times. With cardinality of 1, zero entropy, and a top_rate of 1.0 across only 12 non-null rows, the column carries no information whatsoever. Treatment: Drop this column; it contains no usable signal (100% null or empty string across all 50 rows). high · anthropic:default
n
50
nulls
38 (76.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

fruits-vegetables-nuts_100g_estimate

numeric null_rate high_skew
n
50
nulls
23 (46.0%)
unique
2
min
0
max
85
mean
3.148
median
0
std
16.36
q1
0
q3
0
iqr
0
skew
4.903
kurtosis
22.04
n_outliers
1
outlier_rate
0.03704
zero_rate
0.963

brands_old

categorical long_tail null_rate
n
50
nulls
16 (32.0%)
unique
29
top_value
Gerblé
top_rate
0.08824
cardinality
29
entropy
4.749
entropy_ratio
0.9776

generic_name_fr

categorical long_tail
n
50
nulls
3 (6.0%)
unique
34
top_value
top_rate
0.2979
cardinality
34
entropy
4.42
entropy_ratio
0.8689

ingredients

unknown skipped
n
50
nulls
0 (0.0%)
unique

countries_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

ingredients_original_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

ingredients_text_de

categorical long_tail null_rate
n
50
nulls
30 (60.0%)
unique
16
top_value
top_rate
0.25
cardinality
16
entropy
3.741
entropy_ratio
0.9354

nutriscore_grade

categorical
n
50
nulls
0 (0.0%)
unique
6
top_value
e
top_rate
0.54
cardinality
6
entropy
1.913
entropy_ratio
0.7399

image_thumb_url

categorical long_tail
n
50
nulls
0 (0.0%)
unique
50
top_value
https://images.openfoodfacts.org/images/products/611/124/210/0992/front_fr.172.100.jpg
top_rate
0.02
cardinality
50
entropy
5.644
entropy_ratio
1

packaging_text_en

categorical long_tail
n
50
nulls
7 (14.0%)
unique
5
top_value
top_rate
0.907
cardinality
5
entropy
0.6325
entropy_ratio
0.2724

packaging_text_it

categorical long_tail null_rate
n
50
nulls
34 (68.0%)
unique
3
top_value
top_rate
0.875
cardinality
3
entropy
0.6686
entropy_ratio
0.4218

traces_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

brands_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

nutriscore_2021_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

packaging_text

categorical long_tail
n
50
nulls
2 (4.0%)
unique
13
top_value
top_rate
0.75
cardinality
13
entropy
1.708
entropy_ratio
0.4614

popularity_key

numeric identifier high_skew outliers
This column appears to be a synthetic or encoded identifier rather than a true popularity metric — values cluster tightly in the 23.9–24.0 billion range, with a median of ~23,999,500,422 and a max of ~23,999,992,269, suggesting a fixed-prefix integer key scheme. The strong negative skew (−2.67) and high kurtosis (5.11) are driven by 5 outlier values that fall far below the cluster, near the minimum of ~22,999,500,355, which is about 1 billion lower than the bulk of records. Despite the name 'popularity_key', the distribution is inconsistent with any organic popularity signal and is almost certainly a generated or composite key. Treatment: Treat as an opaque identifier; do not use as a numeric feature — investigate the 5 outlier records (~10% of data) for data integrity issues before joining or filtering. medium · anthropic:default
n
50
nulls
0 (0.0%)
unique
49
min
2.3e+10
max
2.4e+10
mean
2.39e+10
median
2.4e+10
std
3.03e+08
q1
2.4e+10
q3
2.4e+10
iqr
4.002e+05
skew
-2.667
kurtosis
5.111
n_outliers
5
outlier_rate
0.1
zero_rate
0

ingredients_text

categorical long_tail
n
50
nulls
0 (0.0%)
unique
50
top_value
milk cream, cream, sugar, banana, bacteria
top_rate
0.02
cardinality
50
entropy
5.644
entropy_ratio
1

ingredients_text_with_allergens_fr

categorical long_tail
n
50
nulls
2 (4.0%)
unique
47
top_value
top_rate
0.04167
cardinality
47
entropy
5.543
entropy_ratio
0.998

ingredients_text_nl

categorical long_tail null_rate
n
50
nulls
38 (76.0%)
unique
9
top_value
top_rate
0.3333
cardinality
9
entropy
2.918
entropy_ratio
0.9206

product_name_es

categorical long_tail null_rate
n
50
nulls
30 (60.0%)
unique
17
top_value
top_rate
0.2
cardinality
17
entropy
3.922
entropy_ratio
0.9595

data_sources_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

data_quality_bugs_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

obsolete_since_date

categorical imbalance
n
50
nulls
6 (12.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

weighers_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

ingredients_text_debug

categorical long_tail null_rate
n
50
nulls
14 (28.0%)
unique
35
top_value
top_rate
0.05556
cardinality
35
entropy
5.114
entropy_ratio
0.9971

created_t

numeric
n
50
nulls
0 (0.0%)
unique
50
min
1.338e+09
max
1.724e+09
mean
1.483e+09
median
1.476e+09
std
1.043e+08
q1
1.386e+09
q3
1.555e+09
iqr
1.694e+08
skew
0.3311
kurtosis
-0.8095
n_outliers
0
outlier_rate
0
zero_rate
0

ingredients_text_fr

categorical long_tail
n
50
nulls
2 (4.0%)
unique
47
top_value
top_rate
0.04167
cardinality
47
entropy
5.543
entropy_ratio
0.998

labels_hierarchy

unknown skipped
n
50
nulls
0 (0.0%)
unique

ingredients_non_nutritive_sweeteners_n

numeric constant
n
50
nulls
0 (0.0%)
unique
1
min
0
max
0
mean
0
median
0
std
0
q1
0
q3
0
iqr
0
skew
0
kurtosis
0
n_outliers
0
outlier_rate
0
zero_rate
1

last_edit_dates_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

packaging_text_nb

categorical null_rate imbalance
n
50
nulls
48 (96.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

packagings_complete

numeric
n
50
nulls
2 (4.0%)
unique
2
min
0
max
1
mean
0.5208
median
1
std
0.5049
q1
0
q3
1
iqr
1
skew
-0.08341
kurtosis
-1.993
n_outliers
0
outlier_rate
0
zero_rate
0.4792

data_sources

categorical long_tail
n
50
nulls
0 (0.0%)
unique
43
top_value
App - yuka, Apps, App - Open Food Facts, App - smoothie-openfoodfacts
top_rate
0.08
cardinality
43
entropy
5.309
entropy_ratio
0.9783

labels_old

categorical long_tail
n
50
nulls
4 (8.0%)
unique
38
top_value
top_rate
0.1957
cardinality
38
entropy
4.903
entropy_ratio
0.9343

data_quality_info_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

ingredients_from_palm_oil_n

numeric outliers
n
50
nulls
4 (8.0%)
unique
2
min
0
max
1
mean
0.1522
median
0
std
0.3632
q1
0
q3
0
iqr
0
skew
1.937
kurtosis
1.751
n_outliers
7
outlier_rate
0.1522
zero_rate
0.8478

ingredients_text_with_allergens_ja

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

ingredients_lc

categorical
n
50
nulls
0 (0.0%)
unique
4
top_value
fr
top_rate
0.7
cardinality
4
entropy
1.212
entropy_ratio
0.6061

origins

categorical long_tail
n
50
nulls
2 (4.0%)
unique
20
top_value
top_rate
0.5
cardinality
20
entropy
3.027
entropy_ratio
0.7003

nutriscore_data

unknown skipped
n
50
nulls
0 (0.0%)
unique

scans_n

numeric feature high_skew outliers
This column likely represents the number of times each product's barcode has been scanned, with 50 records and no nulls. The bulk of values sit in a moderate range (Q1=387, median=492, Q3=604), but extreme positive skew (3.90) and very high kurtosis (18.72) are driven by 4 outliers (8% of rows) reaching up to 2523, roughly 5× the median. The min of 333 suggests a natural floor, possibly a minimum scan threshold or truncation artefact. Treatment: Investigate the 4 outliers before modelling; apply log-transform or robust scaling to reduce skew impact in regression or distance-based models. medium · anthropic:default
n
50
nulls
0 (0.0%)
unique
49
min
333
max
2,523
mean
577.9
median
492
std
343.9
q1
387
q3
604
iqr
217
skew
3.899
kurtosis
18.72
n_outliers
4
outlier_rate
0.08
zero_rate
0
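
The outlier check and the suggested transforms for scans_n can be sketched as follows. The quartiles are the ones reported above; the 1.5 × IQR fence rule is an assumption about how the profiler flags outliers, and the `scans` list is hypothetical data, not the actual column.

```python
import math


def iqr_fences(q1, q3, k=1.5):
    """Lower and upper Tukey fences for a given pair of quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def robust_scale(x, median, q1, q3):
    """Centre on the median and scale by the IQR."""
    return (x - median) / (q3 - q1)


# Quartiles reported for scans_n: Q1=387, Q3=604 (IQR=217).
lower, upper = iqr_fences(387, 604)

# Hypothetical scan counts spanning the reported min and max.
scans = [333, 387, 450, 492, 530, 604, 980, 2523]
outliers = [x for x in scans if x < lower or x > upper]

# log1p compresses the right tail while preserving order.
log_scans = [math.log1p(x) for x in scans]
scaled = [robust_scale(x, 492, 387, 604) for x in scans]
```

The log transform suits regression-style models, while robust scaling keeps the original units' shape but limits the influence of the tail on distance-based methods.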

ingredients_that_may_be_from_palm_oil_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

generic_name_ar

categorical null_rate
n
50
nulls
40 (80.0%)
unique
2
top_value
top_rate
0.9
cardinality
2
entropy
0.469
entropy_ratio
0.469

product_name_uk

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

last_checked_t

numeric null_rate
n
50
nulls
43 (86.0%)
unique
7
min
1.541e+09
max
1.73e+09
mean
1.607e+09
median
1.565e+09
std
7.772e+07
q1
1.556e+09
q3
1.652e+09
iqr
9.601e+07
skew
0.8106
kurtosis
-1.103
n_outliers
0
outlier_rate
0
zero_rate
0

last_check_dates_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

ingredients_text_uk

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

carbon_footprint_from_known_ingredients_debug

categorical long_tail null_rate
n
50
nulls
36 (72.0%)
unique
14
top_value
en:cereal 50% x 0.3 = 15 g -
top_rate
0.07143
cardinality
14
entropy
3.807
entropy_ratio
1

packaging_text_ar

categorical metadata null_rate imbalance
This column appears to hold Arabic-language packaging text, but it is effectively empty: 80% of the 50 rows are null, and the remaining 10 non-null rows contain only an empty string — giving a single unique value with top_rate of 1.0 and zero entropy. The column carries no information whatsoever in this dataset snapshot. Treatment: Drop this column; it contains no usable signal (100% null or empty string across all rows). high · anthropic:default
n
50
nulls
40 (80.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

generic_name_uk

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

last_checker

categorical null_rate
n
50
nulls
43 (86.0%)
unique
4
top_value
aleene
top_rate
0.4286
cardinality
4
entropy
1.842
entropy_ratio
0.9212

checked

categorical feature null_rate imbalance
This column appears to be a binary checkbox field (HTML-style 'on'/'off'), but only the value 'on' is ever recorded — cardinality is 1 with 'on' appearing in all 7 non-null rows. The 86% null rate is the dominant signal: nulls almost certainly represent unchecked state rather than missing data, meaning the column encodes a boolean with an unconventional null-as-false convention. Zero entropy confirms complete absence of variation among non-null values. Treatment: Recode nulls as 0 and 'on' as 1 to produce a proper boolean/integer column before modelling. high · anthropic:default
n
50
nulls
43 (86.0%)
unique
1
top_value
on
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
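
The recoding suggested for `checked` can be sketched as below, assuming nulls arrive as Python `None` after loading the JSON. The function name is hypothetical, and raising on unexpected values is a design choice of this sketch so that a future 'off' or typo is not silently coded as 0.

```python
def recode_checked(value):
    """Map 'on' to 1 and null to 0; reject anything else."""
    if value is None:
        return 0
    if value == "on":
        return 1
    raise ValueError(f"unexpected value for 'checked': {value!r}")


# Hypothetical slice of the column: mostly nulls, a few 'on' entries.
raw = [None, "on", None, None, "on"]
checked_flag = [recode_checked(v) for v in raw]
```

The null-as-false reading is the interpretation given in the note above; if nulls can also mean "unknown", a three-state encoding would be safer.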

product_name_ar

categorical long_tail null_rate
n
50
nulls
39 (78.0%)
unique
6
top_value
top_rate
0.5455
cardinality
6
entropy
2.049
entropy_ratio
0.7928

ingredients_text_with_allergens_uk

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

origin_uk

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

packaging_text_uk

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

ingredients_text_ar

categorical null_rate
n
50
nulls
39 (78.0%)
unique
2
top_value
top_rate
0.9091
cardinality
2
entropy
0.4395
entropy_ratio
0.4395

ingredients_text_with_allergens_ar

categorical null_rate
n
50
nulls
41 (82.0%)
unique
2
top_value
top_rate
0.8889
cardinality
2
entropy
0.5033
entropy_ratio
0.5033

carbon_footprint_percent_of_known_ingredients

numeric null_rate
n
50
nulls
31 (62.0%)
unique
19
min
8
max
105
mean
61.79
median
70
std
28.98
q1
45.5
q3
78.3
iqr
32.8
skew
-0.4493
kurtosis
-0.8083
n_outliers
0
outlier_rate
0
zero_rate
0

origin_ar

categorical other null_rate imbalance
This column appears to be an Arabic-language origin field ('origin_ar') that is almost entirely empty. With an 80% null rate and cardinality of 1, the sole 'unique' value is itself an empty string appearing 10 times across 50 rows — meaning the column contains no actual data at all. This is a fully degenerate column with zero informational content. Treatment: Drop — column carries no information (100% null or empty string, entropy 0.0). high · anthropic:default
n
50
nulls
40 (80.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

nutrition_score_warning_no_fiber

numeric null_rate constant
n
50
nulls
35 (70.0%)
unique
1
min
1
max
1
mean
1
median
1
std
0
q1
1
q3
1
iqr
0
skew
0
kurtosis
0
n_outliers
0
outlier_rate
0
zero_rate
0

ingredients_text_debug_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

nutriments_estimated

unknown skipped
n
50
nulls
0 (0.0%)
unique

completed_t

numeric null_rate
n
50
nulls
34 (68.0%)
unique
16
min
1.628e+09
max
1.763e+09
mean
1.7e+09
median
1.703e+09
std
4.07e+07
q1
1.663e+09
q3
1.74e+09
iqr
7.618e+07
skew
0.001247
kurtosis
-1.155
n_outliers
0
outlier_rate
0
zero_rate
0

taxonomies_enhancer_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

ingredients_text_with_allergens_sl

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
Kakavova masa, manjmasten kakavov prah, kakavovo maslo, sladkor, emulgator: lecitini (sojin lecitin); ekstrakt vanilije. Lahko vsebuje sledi oreškov (lešniki, mandlji, pistacija) in mleka. Uporabno najmanj do: glej odtis na zadnji strani embalaže.
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

packaging_text_sk

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

ingredients_text_with_allergens_bg

categorical long_tail null_rate
n
50
nulls
47 (94.0%)
unique
3
top_value
Какаова маса, нискомаслено какао на прах, какаово масло, захар, емулгатор: лецитин (соеви), екстракт от ванилия, Може да съдържа следи от ядки и мляко,
top_rate
0.3333
cardinality
3
entropy
1.585
entropy_ratio
1

ingredients_text_pt

categorical long_tail null_rate
n
50
nulls
40 (80.0%)
unique
4
top_value
top_rate
0.7
cardinality
4
entropy
1.357
entropy_ratio
0.6784

ingredients_text_dz

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

generic_name_ca

categorical null_rate imbalance
n
50
nulls
48 (96.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

generic_name_bg

categorical label null_rate imbalance
This column appears to be a Bulgarian-language generic name field (a localized generic name for the food product), but it is almost entirely absent: 94% of rows are null and the remaining 3 non-null rows contain only an empty string. With cardinality of 1 and entropy of 0, the column carries zero information. Treatment: Drop this column; it is 94% null and the only observed value is an empty string, making it analytically useless. high · anthropic:default
n
50
nulls
47 (94.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

origin_sl

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

product_name_et

categorical long_tail null_rate
n
50
nulls
47 (94.0%)
unique
3
top_value
Chocolat noir - 85% cacao
top_rate
0.3333
cardinality
3
entropy
1.585
entropy_ratio
1

origin_et

categorical metadata null_rate imbalance
This column appears to be the Estonian-language origin field ('et' is the ISO 639-1 code for Estonian), but it is effectively empty: 94% of the 50 rows are null, and the sole non-null value present is an empty string appearing 3 times. With cardinality of 1 and entropy of 0.0, the column carries zero information. This is likely an unfilled localization field. Treatment: Drop this column; it contains no usable signal (94% null, sole value is empty string). high · anthropic:default
n
50
nulls
47 (94.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
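
Locale-suffixed columns such as this one follow a `<field>_<language code>` naming pattern, so the locale can be read off the column name. The sketch below assumes the suffixes are two-letter ISO 639-1 codes; the lookup table covers only the codes seen in this section, and the function name is hypothetical.

```python
# ISO 639-1 codes that appear as column suffixes in this section.
LANGUAGE_NAMES = {
    "ar": "Arabic", "bg": "Bulgarian", "ca": "Catalan", "de": "German",
    "dz": "Dzongkha", "en": "English", "es": "Spanish", "et": "Estonian",
    "fi": "Finnish", "fr": "French", "it": "Italian", "ja": "Japanese",
    "nb": "Norwegian Bokmål", "nl": "Dutch", "pl": "Polish",
    "pt": "Portuguese", "sk": "Slovak", "sl": "Slovenian", "sv": "Swedish",
    "uk": "Ukrainian", "zh": "Chinese",
}


def locale_of(column):
    """Split 'field_xx' into (field, language name); None if no known suffix."""
    base, sep, suffix = column.rpartition("_")
    if sep and base and suffix in LANGUAGE_NAMES:
        return base, LANGUAGE_NAMES[suffix]
    return None


examples = ["origin_et", "origin_fi", "packaging_text_pt", "countries_lc", "_id"]
parsed = {name: locale_of(name) for name in examples}
```

Columns whose last token is not a known code (`countries_lc`, `_id`, `origins_old`) are left unclassified, which avoids guessing a locale for non-localized fields.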

ingredients_text_with_allergens_sk

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

ingredients_text_with_allergens_et

categorical long_tail null_rate
n
50
nulls
47 (94.0%)
unique
3
top_value
kakavova masa, manjmasten kakavov prah, kakavovo maslo, sladkor, emulgator: lecitini (sojin lecitin); ekstrakt vanilije.
top_rate
0.3333
cardinality
3
entropy
1.585
entropy_ratio
1

nutrition_score_warning_nutriments_estimated

numeric null_rate constant
n
50
nulls
48 (96.0%)
unique
1
min
1
max
1
mean
1
median
1
std
0
q1
1
q3
1
iqr
0
skew
0
kurtosis
0
n_outliers
0
outlier_rate
0
zero_rate
0

ingredients_text_sk

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

generic_name_pt

categorical long_tail null_rate
n
50
nulls
40 (80.0%)
unique
3
top_value
top_rate
0.8
cardinality
3
entropy
0.9219
entropy_ratio
0.5817

ingredients_text_bg

categorical long_tail null_rate
n
50
nulls
47 (94.0%)
unique
3
top_value
Какаова маса, нискомаслено какао на прах, какаово масло, захар, емулгатор: лецитин (соеви), екстракт от ванилия, Може да съдържа следи от ядки и мляко,
top_rate
0.3333
cardinality
3
entropy
1.585
entropy_ratio
1

packaging_text_et

categorical free_text null_rate imbalance
This column contains Estonian-language packaging text (`_et` locale suffix), but is effectively empty: 94% of its 50 rows are null, and the sole non-null value across all 3 populated rows is an empty string. With cardinality of 1 and entropy of 0.0, the column carries zero information — it has never been populated in this dataset. Treatment: Drop — 94% null rate and only empty-string values provide no usable signal. high · anthropic:default
n
50
nulls
47 (94.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

product_name_sk

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

ingredients_text_ca

categorical null_rate imbalance
n
50
nulls
48 (96.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

ingredients_text_with_allergens_ca

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

product_name_dz

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

product_name_sl

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
ARRIBA 85% cacao
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

origin_sk

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

generic_name_et

categorical label null_rate imbalance
This column appears to be an Estonian-language generic name field ('et' locale suffix), but it is effectively empty: 94% of its 50 rows are null, and the sole non-null value is a blank string appearing 3 times, giving a cardinality of 1. The column carries zero information — entropy is 0.0 and top_rate is 1.0 across a single empty token. Treatment: Drop this column; it contains no usable data (94% null, remaining values are blank strings). high · anthropic:default
n
50
nulls
47 (94.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

ingredients_text_et

categorical long_tail null_rate
n
50
nulls
47 (94.0%)
unique
3
top_value
kakavova masa, manjmasten kakavov prah, kakavovo maslo, sladkor, emulgator: lecitini (_sojin_ lecitin); ekstrakt vanilije.
top_rate
0.3333
cardinality
3
entropy
1.585
entropy_ratio
1

packaging_text_ca

categorical null_rate imbalance
n
50
nulls
48 (96.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

packaging_text_sl

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

generic_name_dz

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

origin_ca

categorical null_rate imbalance
n
50
nulls
48 (96.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

product_name_ca

categorical null_rate imbalance
n
50
nulls
48 (96.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

packaging_text_pt

categorical free_text null_rate imbalance
This column appears to be a Portuguese-language packaging text field, almost certainly intended to carry product label or packaging descriptions. With an 80% null rate and the sole non-null value being an empty string appearing 10 times, the column contains zero usable information across all 50 rows. The effective data-present rate is 0%, making this column entirely empty in practice. Treatment: Drop this column; it carries no information and all present values are empty strings. high · anthropic:default
n
50
nulls
40 (80.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

origin_bg

categorical other null_rate imbalance
This column ('origin_bg') is a categorical field with 50 rows, but 94% of values are null and the sole non-null value is an empty string appearing 3 times — making it entirely devoid of usable information. Cardinality is 1, entropy is 0, and top_rate is 1.0, confirming complete uniformity across non-null entries. Both alerts (null_rate and imbalance) are triggered, which is expected given the near-total absence of data. Treatment: Drop this column; it carries zero information with 94% nulls and only empty strings remaining. high · anthropic:default
n
50
nulls
47 (94.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

packaging_text_bg

categorical free_text null_rate imbalance
This column contains Bulgarian-language packaging text for products, but it is almost entirely empty: 94% of the 50 rows are null, and the sole non-null value observed is an empty string appearing 3 times (top_rate 1.0). With cardinality of 1 and entropy of 0.0, the column carries zero information in its current state. Treatment: Drop from modelling; re-evaluate only if Bulgarian market data is backfilled, otherwise exclude as zero-variance. high · anthropic:default
n
50
nulls
47 (94.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

origin_pt

categorical other null_rate imbalance
This column, most likely the Portuguese-language origin field (the '_pt' suffix is a locale code), is almost entirely empty: 80% of its 50 rows are null, and the only non-null value present is an empty string appearing 10 times, so the column contains no actual information. With a cardinality of 1 and entropy of 0.0, it is completely invariant. The combination of high null rate and a sole value being an empty string suggests the field was never populated in this dataset. Treatment: Drop. The column carries zero information due to 80% nulls and a single empty-string value across all remaining rows. high · anthropic:default
n
50
nulls
40 (80.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

ingredients_text_with_allergens_pt

categorical long_tail null_rate
n
50
nulls
42 (84.0%)
unique
4
top_value
top_rate
0.625
cardinality
4
entropy
1.549
entropy_ratio
0.7744

product_name_bg

categorical long_tail null_rate
n
50
nulls
47 (94.0%)
unique
3
top_value
Шоколад 85% какаова маса
top_rate
0.3333
cardinality
3
entropy
1.585
entropy_ratio
1

ingredients_text_sl

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
Kakavova masa, manjmasten kakavov prah, kakavovo maslo, sladkor, emulgator: lecitini (sojin lecitin); ekstrakt vanilije. Lahko vsebuje sledi oreškov (lešniki, mandlji, pistacija) in mleka. Uporabno najmanj do: glej odtis na zadnji strani embalaže.
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

generic_name_sl

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

generic_name_sk

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

product_name_pt

categorical long_tail null_rate
n
50
nulls
40 (80.0%)
unique
7
top_value
top_rate
0.4
cardinality
7
entropy
2.522
entropy_ratio
0.8983

lc_imported

categorical null_rate
n
50
nulls
42 (84.0%)
unique
2
top_value
fr
top_rate
0.875
cardinality
2
entropy
0.5436
entropy_ratio
0.5436
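The entropy and entropy_ratio figures reported for lc_imported are consistent with Shannon entropy in bits, divided by log2(cardinality). This definition is inferred from the reported numbers, not taken from the profiler's documentation. The sketch below assumes value counts of 7 and 1, which follow from 8 non-null rows and a top_rate of 0.875; the function names are hypothetical.

```python
import math

def entropy_bits(counts):
    """Shannon entropy in bits of a distribution given as value counts."""
    total = sum(counts)
    return sum((c / total) * math.log2(total / c) for c in counts if c > 0)

def entropy_ratio(counts):
    """Entropy divided by its maximum, log2(k), for k distinct values.

    Returns 0.0 for a single distinct value, matching the report's
    convention for constant columns.
    """
    k = sum(1 for c in counts if c > 0)
    if k <= 1:
        return 0.0
    return entropy_bits(counts) / math.log2(k)

# lc_imported: 8 non-null rows, 2 distinct values, top_rate 0.875 -> counts 7 and 1.
lc_imported_counts = [7, 1]
print(round(entropy_bits(lc_imported_counts), 4))   # 0.5436
print(round(entropy_ratio(lc_imported_counts), 4))  # 0.5436
```

With two distinct values the maximum entropy is 1 bit, which is why entropy and entropy_ratio coincide for this column.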

abbreviated_product_name_fr_imported

categorical long_tail null_rate
n
50
nulls
43 (86.0%)
unique
7
top_value
CRISTALINE Eau De Source 0.5L
top_rate
0.1429
cardinality
7
entropy
2.807
entropy_ratio
1

generic_name_zh

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

obsolete_imported

categorical other null_rate imbalance
This column appears to hold the imported value of the product's 'obsolete' flag, and it contains only the value '0' across all 7 non-null rows. With an 86% null rate and a cardinality of 1, the column carries no information: entropy is exactly 0.0 and the single observed value covers 100% of non-null records. Both the high null rate and the complete value imbalance are flagged as alerts. Treatment: Drop; zero variance and 86% nulls make this column uninformative for downstream use. high · anthropic:default
n
50
nulls
43 (86.0%)
unique
1
top_value
0
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

generic_name_fr_imported

categorical long_tail null_rate
n
50
nulls
43 (86.0%)
unique
7
top_value
Eau De Source
top_rate
0.1429
cardinality
7
entropy
2.807
entropy_ratio
1

owners_tags

categorical long_tail null_rate
n
50
nulls
43 (86.0%)
unique
6
top_value
org-barilla-france-sa
top_rate
0.2857
cardinality
6
entropy
2.522
entropy_ratio
0.9755

owner_imported

categorical long_tail null_rate
n
50
nulls
44 (88.0%)
unique
5
top_value
org-barilla-france-sa
top_rate
0.3333
cardinality
5
entropy
2.252
entropy_ratio
0.9697

customer_service

categorical long_tail null_rate
n
50
nulls
43 (86.0%)
unique
6
top_value
Service Consommateurs, : www.wasa.com/fr-fr/contact (depuis la France), www.wasa.com/fr-be/contact (depuis la Belgique)
top_rate
0.2857
cardinality
6
entropy
2.522
entropy_ratio
0.9755

ingredients_text_zh_debug_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

countries_imported

categorical null_rate
n
50
nulls
42 (84.0%)
unique
2
top_value
France
top_rate
0.875
cardinality
2
entropy
0.5436
entropy_ratio
0.5436

data_sources_imported

categorical long_tail null_rate
n
50
nulls
42 (84.0%)
unique
8
top_value
Producers, Producer - gie-sources-alma, Database - Equadis, Database - GDSN, Databases, Producers, Producer - gie-sources-alma
top_rate
0.125
cardinality
8
entropy
3
entropy_ratio
1

product_name_zh

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

categories_imported

categorical long_tail null_rate
n
50
nulls
44 (88.0%)
unique
5
top_value
Snacks, Snacks salés, Amuse-gueules, Chips et frites, Chips
top_rate
0.3333
cardinality
5
entropy
2.252
entropy_ratio
0.9697

quantity_imported

categorical long_tail null_rate
n
50
nulls
43 (86.0%)
unique
7
top_value
500 ml
top_rate
0.1429
cardinality
7
entropy
2.807
entropy_ratio
1

ingredients_text_zh

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

emb_code

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
EMB 44068 A
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

origins_fr

categorical long_tail null_rate
n
50
nulls
48 (96.0%)
unique
2
top_value
Chambon-la-Forêt,France,Cairanne,Provence-Alpes-Côte d'Azur,Vaucluse,Italie,Source Sainte Cécile,Source Ofélia,Source Éléonore,Source Emma,Source Éléna
top_rate
0.5
cardinality
2
entropy
1
entropy_ratio
1

nutrition_data_prepared_per_imported

categorical metadata null_rate imbalance
This column appears to capture the reference basis for imported nutrition data on the product as prepared (e.g., 'per 100g'), and is effectively a constant: the only observed value is '100g' across all 7 non-null rows. With an 86% null rate and cardinality of 1, it carries no discriminative information. The combination of high missingness and zero entropy suggests this field was either sparsely populated at ingestion or serves as a fixed schema placeholder. Treatment: Drop before modelling; the column is a zero-variance constant with 86% nulls and provides no analytical value. high · anthropic:default
n
50
nulls
43 (86.0%)
unique
1
top_value
100g
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

product_name_zh_debug_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

sources_fields

unknown skipped
n
50
nulls
0 (0.0%)
unique

customer_service_fr

categorical long_tail null_rate
n
50
nulls
43 (86.0%)
unique
6
top_value
Service Consommateurs, : www.wasa.com/fr-fr/contact (depuis la France), www.wasa.com/fr-be/contact (depuis la Belgique)
top_rate
0.2857
cardinality
6
entropy
2.522
entropy_ratio
0.9755

nutrition_data_per_imported

categorical metadata null_rate imbalance
This column represents the unit basis for imported nutrition data, and every non-null value is identically '100g' — giving it a cardinality of 1 and an entropy of 0.0. With an 84% null rate across 50 rows, only 8 observations carry a value at all, making the column almost entirely absent. The combination of extreme nullity and zero variance means this column provides no discriminating information whatsoever. Treatment: Drop — 84% null with a single constant value ('100g') offers no predictive or analytical signal. high · anthropic:default
n
50
nulls
42 (84.0%)
unique
1
top_value
100g
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

owner

categorical long_tail null_rate
n
50
nulls
43 (86.0%)
unique
6
top_value
org-barilla-france-sa
top_rate
0.2857
cardinality
6
entropy
2.522
entropy_ratio
0.9755

abbreviated_product_name

categorical long_tail null_rate
n
50
nulls
43 (86.0%)
unique
7
top_value
CRISTALINE Eau De Source 0.5L
top_rate
0.1429
cardinality
7
entropy
2.807
entropy_ratio
1

conservation_conditions_fr

categorical long_tail null_rate
n
50
nulls
43 (86.0%)
unique
7
top_value
A conserver de préférence à l'abri du soleil, dans un endroit propre, frais et sans odeur.
top_rate
0.1429
cardinality
7
entropy
2.807
entropy_ratio
1

brands_imported

categorical long_tail null_rate
n
50
nulls
43 (86.0%)
unique
6
top_value
Wasa
top_rate
0.2857
cardinality
6
entropy
2.522
entropy_ratio
0.9755

owner_fields

unknown skipped
n
50
nulls
0 (0.0%)
unique

conservation_conditions_fr_imported

categorical long_tail null_rate
n
50
nulls
43 (86.0%)
unique
7
top_value
A conserver de préférence à l'abri du soleil, dans un endroit propre, frais et sans odeur.
top_rate
0.1429
cardinality
7
entropy
2.807
entropy_ratio
1

origin_fr_imported

categorical long_tail null_rate
n
50
nulls
48 (96.0%)
unique
2
top_value
France
top_rate
0.5
cardinality
2
entropy
1
entropy_ratio
1

customer_service_fr_imported

categorical long_tail null_rate
n
50
nulls
43 (86.0%)
unique
6
top_value
Service Consommateurs, : www.wasa.com/fr-fr/contact (depuis la France), www.wasa.com/fr-be/contact (depuis la Belgique)
top_rate
0.2857
cardinality
6
entropy
2.522
entropy_ratio
0.9755

generic_name_zh_debug_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

product_name_fr_imported

categorical long_tail null_rate
n
50
nulls
43 (86.0%)
unique
7
top_value
CRISTALINE Eau De Source 0.5L
top_rate
0.1429
cardinality
7
entropy
2.807
entropy_ratio
1

lang_imported

categorical metadata null_rate imbalance
This column records the imported language of a record, and across the full 50-row dataset every non-null value is 'fr' (French) — a single unique value with zero entropy. With an 86% null rate, only 7 of 50 rows carry any value at all, making the column nearly empty and entirely constant where populated. Both the extreme null rate and perfect imbalance are flagged as alerts, suggesting this field may be partially populated metadata from an import pipeline rather than a reliable feature. Treatment: Drop or impute cautiously — 86% nulls and zero variance make this column uninformative for modelling; investigate import pipeline for why values are absent. high · anthropic:default
n
50
nulls
43 (86.0%)
unique
1
top_value
fr
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0
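The "drop" treatments in this report follow a common rule: a column is a drop candidate when it is mostly null and constant where populated. A minimal sketch of that rule follows. The function name `drop_candidates` and the 0.8 null-rate threshold are assumptions chosen for illustration, not settings taken from the profiler; the `profiles` entries copy the nulls and unique counts reported for four columns in this section.

```python
def drop_candidates(profiles, n_rows, min_null_rate=0.8):
    """Names of columns that are mostly null and have at most one distinct value."""
    return [
        name
        for name, p in profiles.items()
        if p["unique"] <= 1 and p["nulls"] / n_rows >= min_null_rate
    ]

# nulls / unique as reported in this section (50 rows each).
profiles = {
    "lang_imported": {"nulls": 43, "unique": 1},
    "lc_imported": {"nulls": 42, "unique": 2},
    "owners_tags": {"nulls": 43, "unique": 6},
    "nutrition_data_per_imported": {"nulls": 42, "unique": 1},
}

to_drop = drop_candidates(profiles, n_rows=50)
print(to_drop)
```

Columns such as lc_imported and owners_tags are sparse but not constant, so this rule leaves them for a separate decision based on null rate alone.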

abbreviated_product_name_fr

categorical long_tail null_rate
n
50
nulls
43 (86.0%)
unique
7
top_value
CRISTALINE Eau De Source 0.5L
top_rate
0.1429
cardinality
7
entropy
2.807
entropy_ratio
1

ingredients_text_fr_imported

categorical long_tail null_rate
n
50
nulls
43 (86.0%)
unique
7
top_value
Eau de Source
top_rate
0.1429
cardinality
7
entropy
2.807
entropy_ratio
1

conservation_conditions

categorical long_tail null_rate
n
50
nulls
43 (86.0%)
unique
7
top_value
A conserver de préférence à l'abri du soleil, dans un endroit propre, frais et sans odeur.
top_rate
0.1429
cardinality
7
entropy
2.807
entropy_ratio
1

nova_group_error

categorical null_rate imbalance
n
50
nulls
48 (96.0%)
unique
1
top_value
too_many_unknown_ingredients
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

producer_version_id_imported

categorical long_tail null_rate
n
50
nulls
46 (92.0%)
unique
3
top_value
1
top_rate
0.5
cardinality
3
entropy
1.5
entropy_ratio
0.9464

ingredients_text_de_ocr_1648990410

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
Kekse mit Nuss - Nugat- Creme - Füllung: Nuss-Nugat-Creme 40% (Zucker, Palmöl, HASELNÜSSE Magermilchpulver, fettarmer Kakao, Emulgator Lecithine (S0JA), Vanillin, Weizenmehl, pflanzliche Fette ( Palm, Palmkern), Rohrzucker, Milchzucker, Weizenkleie, VOLLMILCHPULVER, GERSTENMALZ-und Maisextraktpulver, Honig. Backtriebmittel: Dinatriumdiphosphat, Natriumhydrogencarbonat, Ammoniumhydrogencarbonat; fettarmer Kakao, Salz, Weizenstärke, Gerstenmalzmehl, Emulgator Lecithine (Soja), Vanillin
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

product_name_ro

categorical long_tail null_rate
n
50
nulls
48 (96.0%)
unique
2
top_value
top_rate
0.5
cardinality
2
entropy
1
entropy_ratio
1

packaging_imported

categorical null_rate
n
50
nulls
46 (92.0%)
unique
2
top_value
Enveloppe
top_rate
0.75
cardinality
2
entropy
0.8113
entropy_ratio
0.8113

ingredients_text_de_ocr_1648990410_result

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
Kekse mit Nuss - Nugat - Creme - Füllung: Nuss-Nugat-Creme 40% (Zucker, Palmöl, HASELNÜSSE Magermilchpulver, fettarmer Kakao, Emulgator Lecithine (S0JA), Vanillin, Weizenmehl, pflanzliche Fette ( Palm, Palmkern), Rohrzucker, Milchzucker, Weizenkleie, VOLLMILCHPULVER, GERSTENMALZ-und Maisextraktpulver, Honig. Backtriebmittel: Dinatriumdiphosphat, Natriumhydrogencarbonat, Ammoniumhydrogencarbonat; fettarmer Kakao, Salz, Weizenstärke, Gerstenmalzmehl, Emulgator Lecithine (Soja), Vanillin
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

ingredients_text_ro

categorical null_rate imbalance
n
50
nulls
48 (96.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

producer_version_id

categorical long_tail null_rate
n
50
nulls
46 (92.0%)
unique
3
top_value
1
top_rate
0.5
cardinality
3
entropy
1.5
entropy_ratio
0.9464

labels_imported

categorical long_tail null_rate
n
50
nulls
45 (90.0%)
unique
3
top_value
Végétarien
top_rate
0.6
cardinality
3
entropy
1.371
entropy_ratio
0.865

allergens_imported

categorical long_tail null_rate
n
50
nulls
45 (90.0%)
unique
4
top_value
Gluten
top_rate
0.4
cardinality
4
entropy
1.922
entropy_ratio
0.961

origin_ro

categorical null_rate imbalance
n
50
nulls
48 (96.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

no_nutrition_data_imported

categorical feature null_rate imbalance
This column is a boolean flag indicating whether nutrition data was absent for a record. It has a 92% null rate across 50 rows, and the only 4 non-null values all carry the single value 'false', giving it zero entropy and cardinality of 1. The extreme null rate combined with complete value uniformity among non-nulls means this column carries no predictive signal whatsoever — it is effectively empty. Treatment: Drop — zero variance and 92% nulls make this column useless for modelling or analysis. high · anthropic:default
n
50
nulls
46 (92.0%)
unique
1
top_value
false
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

serving_size_imported

categorical long_tail null_rate
n
50
nulls
44 (88.0%)
unique
6
top_value
13.8 g (1)
top_rate
0.1667
cardinality
6
entropy
2.585
entropy_ratio
1

generic_name_ro

categorical null_rate imbalance
n
50
nulls
48 (96.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

ingredients_text_de_ocr_1648897071

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
Nuss-Nougat-Creme 40% (Zucker, Palmöl, _Haselnüsse_ 13%, _Magermilchpulver_ 8,7%, fettarmer Kakao 7,4%, Emulgator Lecithine (_Soja_), Vanillin), _Weizenmehl_ 32,5%, pflanzliche Fette (Palm, Palmkern), Rohrzucker 8,5% (enthält _Weizen_), _Milchzucker_, _Weizenkleie_, _Vollmilchpulver_, _Gerstenmalz_- und Maisextraktpulver, Honig, Backtriebmittel: Dinatriumdiphosphat, Natriumhydrogencarbonat, Ammoniumhydrogencarbonat; fettarmer Kakao, Salz, _Weizenstärke_, _Gerstenmalzmehl_, Emulgator Lecithine (_Soja_), Vanillin
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

ingredients_text_de_ocr_1648897071_result

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
Nuss-Nougat-Creme 40% (Zucker, Palmöl, _Haselnüsse_ 13%, _Magermilchpulver_ 8,7%, fettarmer Kakao 7,4%, Emulgator Lecithine (_Soja_), Vanillin), _Weizenmehl_ 32,5%, pflanzliche Fette (Palm, Palmkern), Rohrzucker 8,5% (enthält _Weizen_), _Milchzucker_, _Weizenkleie_, _Vollmilchpulver_, _Gerstenmalz_ - und Maisextraktpulver, Honig, Backtriebmittel: Dinatriumdiphosphat, Natriumhydrogencarbonat, Ammoniumhydrogencarbonat; fettarmer Kakao, Salz, _Weizenstärke_, _Gerstenmalzmehl_, Emulgator Lecithine (_Soja_), Vanillin
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

packaging_text_ro

categorical null_rate imbalance
n
50
nulls
48 (96.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

abbreviated_product_name_imported

categorical long_tail null_rate
n
50
nulls
47 (94.0%)
unique
3
top_value
Authentique 275g, fr
top_rate
0.3333
cardinality
3
entropy
1.585
entropy_ratio
1

traces_imported

categorical long_tail null_rate
n
50
nulls
46 (92.0%)
unique
4
top_value
Lupin, Lait, Moutarde, Graines de sésame, Soja
top_rate
0.25
cardinality
4
entropy
2
entropy_ratio
1

specific_ingredients

unknown skipped
n
50
nulls
0 (0.0%)
unique

packaging_text_ru

categorical metadata null_rate imbalance
This column holds Russian-language packaging text, but is almost entirely empty: 94% of the 50 rows are null, and the sole non-null value appearing 3 times is an empty string — giving a cardinality of 1 and zero entropy. In practice the column carries no information whatsoever across the observed sample. Treatment: Drop this column; it is effectively unpopulated (94% null, remaining values are empty strings) and provides no signal for modelling or analysis. high · anthropic:default
n
50
nulls
47 (94.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

origin_ru

categorical other null_rate imbalance
This column appears to be a Russian-language origin/source field that is almost entirely unpopulated: 94% of the 50 rows are null, and the sole non-null value is an empty string appearing 3 times. With cardinality of 1, zero entropy, and a top_rate of 1.0, the column carries absolutely no information. It was likely intended to capture Russian-locale origin metadata but was never populated. Treatment: Drop this column — it contains no usable signal (94% null, remaining values are empty strings). high · anthropic:default
n
50
nulls
47 (94.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

ingredients_text_with_allergens_ru

categorical metadata null_rate imbalance
This column is intended to store Russian-language ingredients text with allergen information for food products. It is effectively empty: 94% of the 50 rows are null, and the sole non-null value present is an empty string (''), giving a cardinality of 1 and entropy of 0. There is no usable signal whatsoever in this column for the sampled data. Treatment: Drop this column; it carries no information (94% null, remaining values are empty strings). high · anthropic:default
n
50
nulls
47 (94.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

product_name_ru

categorical null_rate
n
50
nulls
47 (94.0%)
unique
2
top_value
top_rate
0.6667
cardinality
2
entropy
0.9183
entropy_ratio
0.9183

generic_name_ru

categorical null_rate
n
50
nulls
47 (94.0%)
unique
2
top_value
top_rate
0.6667
cardinality
2
entropy
0.9183
entropy_ratio
0.9183

ingredients_text_ru

categorical other null_rate imbalance
This column is a Russian-language ingredients text field for food/product records, almost certainly a localized variant of a broader ingredients column. It is 94% null across 50 rows, and the only non-null value observed is an empty string (appearing 3 times), meaning there is effectively zero usable content in this column. Cardinality of 1 and entropy of 0.0 confirm complete absence of informational signal. Treatment: Drop; 94% null with only empty-string values provides no modelling or analytical value. high · anthropic:default
n
50
nulls
47 (94.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

packaging_text_da

categorical null_rate imbalance
n
50
nulls
48 (96.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

generic_name_da

categorical long_tail null_rate
n
50
nulls
48 (96.0%)
unique
2
top_value
Kiks
top_rate
0.5
cardinality
2
entropy
1
entropy_ratio
1

forest_footprint_data

unknown skipped
n
50
nulls
0 (0.0%)
unique

product_name_da

categorical long_tail null_rate
n
50
nulls
48 (96.0%)
unique
2
top_value
Original
top_rate
0.5
cardinality
2
entropy
1
entropy_ratio
1

ingredients_text_with_allergens_da

categorical long_tail null_rate
n
50
nulls
48 (96.0%)
unique
2
top_value
VETEMJÖL/HVEDEMEL, palmolja/-olie, glukossirap, maltextrakt från KORN/BYG, bakpulver/hævemidler (ammoniumkarbonater, natriumkarbonater), salt, ÄGG/ÆG/EGG, arom, mjölbehandlingsmedel/melbehandlingsmiddel (NATRIUMDISULFIT).
top_rate
0.5
cardinality
2
entropy
1
entropy_ratio
1

origin_da

categorical null_rate imbalance
n
50
nulls
48 (96.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

ingredients_text_da

categorical long_tail null_rate
n
50
nulls
48 (96.0%)
unique
2
top_value
_VETEMJÖL_/_HVEDEMEL_, palmolja/-olie, glukossirap, maltextrakt från _KORN_/_BYG_, bakpulver/hævemidler (ammoniumkarbonater, natriumkarbonater), salt, _ÄGG_/_ÆG_/_EGG_, arom, mjölbehandlingsmedel/melbehandlingsmiddel (_NATRIUMDISULFIT_).
top_rate
0.5
cardinality
2
entropy
1
entropy_ratio
1

ingredients_text_cs

categorical null_rate
n
50
nulls
47 (94.0%)
unique
2
top_value
top_rate
0.6667
cardinality
2
entropy
0.9183
entropy_ratio
0.9183

ingredients_text_nl_ocr_1675675383_result

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
Cacaomassa, suiker, cacaoboter, natuurlijk Bourbon vanille - stokje.
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

product_name_cs

categorical null_rate
n
50
nulls
47 (94.0%)
unique
2
top_value
top_rate
0.6667
cardinality
2
entropy
0.9183
entropy_ratio
0.9183

ingredients_text_hu_ocr_1571428260_result

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
kakaómassza, cukor, kakaó - vaj, természetes bourbon vanília. Nyomokban egyéb dióféléket, tejet, szóját, szezámmagot es búzát tartalmazhat.
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

packaging_text_cs

categorical metadata null_rate imbalance
This column appears to be Czech-language packaging text (`_cs` locale suffix), but it is almost entirely empty: 94% null rate across 50 rows, and the only observed non-null value is an empty string appearing 3 times. With cardinality of 1 and entropy of 0, the column carries zero information — it is effectively unpopulated for this dataset slice. Treatment: Drop this column; it contains no usable signal (94% nulls, sole value is empty string). high · anthropic:default
n
50
nulls
47 (94.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

ingredients_text_sr

categorical long_tail null_rate
n
50
nulls
48 (96.0%)
unique
2
top_value
Šećer, kakao masa, kakao buter, vanile.
top_rate
0.5
cardinality
2
entropy
1
entropy_ratio
1

origin_sr

categorical null_rate imbalance
n
50
nulls
48 (96.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

ingredients_text_hu_ocr_1571428260

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
kakaómassza, cukor, kakaó- vaj, természetes bourbon vanília. Nyomokban egyéb dióféléket, tejet, szóját, szezámmagot es búzát tartalmazhat.
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

packaging_text_hu

categorical feature null_rate imbalance
This column contains Hungarian-language packaging text, but is almost entirely empty: 92% null rate across 50 rows, and the only non-null value observed is an empty string appearing 4 times. With cardinality of 1 and entropy of 0.0, the column carries zero information — it is effectively unpopulated. Treatment: Drop — 92% nulls and a single empty-string value provide no modelling or analytical signal. high · anthropic:default
n
50
nulls
46 (92.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

origin_cs

categorical null_rate imbalance
n
50
nulls
48 (96.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

ingredients_text_nl_ocr_1675675383

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
Cacaomassa, suiker, cacaoboter, natuurlijk Bourbon vanille- stokje.
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

product_name_sr

categorical long_tail null_rate
n
50
nulls
48 (96.0%)
unique
2
top_value
Excellence 70% Cocoa Intense Dark
top_rate
0.5
cardinality
2
entropy
1
entropy_ratio
1

generic_name_hu

categorical null_rate
n
50
nulls
46 (92.0%)
unique
2
top_value
top_rate
0.75
cardinality
2
entropy
0.8113
entropy_ratio
0.8113

packaging_text_sr

categorical null_rate imbalance
n
50
nulls
48 (96.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

ingredients_text_with_allergens_cs

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
Kakaová hmota, cukr, kakaové máslo, vanilka.
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

ingredients_text_with_allergens_sr

categorical long_tail null_rate
n
50
nulls
48 (96.0%)
unique
2
top_value
Šećer, kakao masa, kakao buter, vanile.
top_rate
0.5
cardinality
2
entropy
1
entropy_ratio
1

ingredients_text_hu

categorical long_tail null_rate
n
50
nulls
46 (92.0%)
unique
4
top_value
Kakaómassza, cukor, kakaó - vaj, vanília.
top_rate
0.25
cardinality
4
entropy
2
entropy_ratio
1

product_name_hu

categorical long_tail null_rate
n
50
nulls
46 (92.0%)
unique
3
top_value
top_rate
0.5
cardinality
3
entropy
1.5
entropy_ratio
0.9464

generic_name_sr

categorical long_tail null_rate
n
50
nulls
48 (96.0%)
unique
2
top_value
Tamna čokolada sa 70% kakaa
top_rate
0.5
cardinality
2
entropy
1
entropy_ratio
1

origin_hu

categorical other null_rate imbalance
This column appears to be the Hungarian-language origin field (the '_hu' suffix is a language code, as in packaging_text_hu and product_name_hu). It is almost entirely unpopulated: 92% of its 50 rows are null, and the sole non-null value is an empty string appearing 4 times. With cardinality of 1, zero entropy, and a top_rate of 1.0 across non-null values, the column carries no discriminative information; it is effectively a blank field in the current dataset snapshot. Treatment: Drop; 92% null with a single empty-string value provides no signal for any downstream task. high · anthropic:default
n
50
nulls
46 (92.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

ingredients_text_with_allergens_hu

categorical long_tail null_rate
n
50
nulls
47 (94.0%)
unique
3
top_value
Kakaómassza, cukor, kakaó - vaj, vanília.
top_rate
0.3333
cardinality
3
entropy
1.585
entropy_ratio
1

generic_name_cs

categorical label null_rate imbalance
This column appears to be a Czech-language generic name field (indicated by the '_cs' suffix) that is almost entirely empty: 94% of its 50 rows are null, and the sole non-null value is an empty string appearing 3 times. With cardinality of 1 and entropy of 0, the column carries zero information — it is effectively unpopulated. Treatment: Drop this column; it contains no usable signal with a 94% null rate and only empty-string values in the remainder. high · anthropic:default
n
50
nulls
47 (94.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

ingredients_text_xx

categorical null_rate imbalance
n
50
nulls
48 (96.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

origin_xx

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

product_name_xx

categorical null_rate imbalance
n
50
nulls
48 (96.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

packaging_text_xx

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

generic_name_xx

categorical null_rate imbalance
n
50
nulls
48 (96.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

ingredients_text_es_ocr_1548767061

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
Pasta de cacao, azúcar, manteca de cacao, emulgente: lecitina de girasol (E-322), extracto de vainilla. Cacao: 70% mínimo.
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

ingredients_text_es_ocr_1548767061_result

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
Pasta de cacao, azúcar, manteca de cacao, emulgente: lecitina de girasol (E-322), extracto de vainilla. Cacao: 70% mínimo.
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

origin_ur

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

product_name_he

categorical long_tail null_rate
n
50
nulls
48 (96.0%)
unique
2
top_value
נוטלה
top_rate
0.5
cardinality
2
entropy
1
entropy_ratio
1

ingredients_text_he

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

product_name_ur

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

generic_name_he

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
ממרח אגוזי לוז עם קקאו
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

packaging_text_he

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

ingredients_text_ur

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

origin_he

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

generic_name_ur

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

packaging_text_ur

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

ingredients_text_with_allergens_he

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

nutriscore_grade_producer_imported

categorical long_tail null_rate
n
50
nulls
47 (94.0%)
unique
3
top_value
c
top_rate
0.3333
cardinality
3
entropy
1.585
entropy_ratio
1

nutriscore_grade_producer

categorical long_tail null_rate
n
50
nulls
47 (94.0%)
unique
3
top_value
c
top_rate
0.3333
cardinality
3
entropy
1.585
entropy_ratio
1

ingredients_text_el

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

ingredients_text_with_allergens_el

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

packaging_text_el

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

origin_el

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

product_name_el

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

generic_name_el

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

ingredients_text_it_ocr_1559410715

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
Cioccolato amaro extra. Cacao: 99% minimo. Ingredienti: pasta di cacao, cacao magro, burro di cacao, zucchero grezzo di canna. Può contenere frutta a guscio, latte e soia.
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

ingredients_text_de_ocr_1559410715

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
Extra feine dunkle Schokolade. Schokolade enthält: Kakao: mind. 99%. Zutaten: Kakaomasse, fettarmes Kakaopulver, Kakaobutter, Rohrzucker. Kann Schalenfrüchte, Milch und Soja enthalten.
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

product_name_th

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
ลินด์ เอ็กเซอร์แลนซ์ ดาร์ก 99% โกโก้ ดาร์ก แอปโซลูท ช็อกโกแลต
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

ingredients_text_th

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
Cocoa solids 99%, Cocoa paste, fat-reduced cocoa, cocoa butter, demerara sugar. May contain nuts, milk and soya.
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

ingredients_text_de_ocr_1548767354

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
Extra feine dunkle Schokolade. Schokolade enthält: Kakao: mind. 99%. Zutaten: Kakaomasse, fettarmes Kakaopulver, Kakaobutter, Rohrzucker. Kann Schalenfrüchte, Milch und Soja enthalten.
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

ingredients_text_de_ocr_1548767354_result

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
Extra feine dunkle Schokolade. Schokolade enthält: Kakao: mind. 99%. Zutaten: Kakaomasse, fettarmes Kakaopulver, Kakaobutter, Rohrzucker. Kann Schalenfrüchte, Milch und Soja enthalten.
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

generic_name_th

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

ingredients_text_it_ocr_1559410715_result

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
pasta di cacao, cacao magro, burro di cacao, zucchero grezzo di canna. Può contenere frutta a guscio, latte e soia.
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

packaging_text_th

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

origin_th

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

ingredients_text_with_allergens_th

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
Cocoa solids 99%, Cocoa paste, fat-reduced cocoa, cocoa butter, demerara sugar. May contain nuts, milk and soya.
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

ingredients_text_de_ocr_1559410715_result

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
Kakaomasse, fettarmes Kakaopulver, Kakaobutter, Rohrzucker. Kann Schalenfrüchte, Milch und Soja enthalten.
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

packaging_text_fr_imported

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
1 FEUILLE PAPIER À RECYCLER, 1 FEUILLE METAL À RECYCLER.
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

preparation

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
Produit prêt à consommer
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

preparation_fr_imported

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
Produit prêt à consommer
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

preparation_fr

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
Produit prêt à consommer
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

generic_name_lc

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

product_name_lc

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

ingredients_text_lc

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

ingredients_text_with_allergens_lc

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

generic_name_xx_debug_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

ingredients_text_xx_debug_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

product_name_xx_debug_tags

unknown skipped
n
50
nulls
0 (0.0%)
unique

ingredients_text_fr_ocr_1561814324_result

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
25 % cerneaux de noix, 25 % amandes décortiquées 25 % raisins secs sultanines (raisins secs,huile de tournesol. antioxydant: anhydride lfureux), 15% canneberges, 9,8% sucre, huile de tournesol. Traces éventuelles d'autres fruits à coque et d'arachides.
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

ingredients_text_fr_ocr_1561814324

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
25 % cerneaux de noix, 25 % amandes décortiquées 25 % raisins secs sultanines (raisins secs,huile de tournesol. antioxydant: anhydride lfureux), 15% canneberges, 9,8% sucre, huile de tournesol. Traces éventuelles d'autres fruits à coque et d'arachides. Conditionné sous atmosphère protectrice.
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

ingredients_text_fr_ocr_1624039072

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
ingrédients : cacao, émulsifiant (lécithine de _soja_), vanille.
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

ingredients_text_fr_ocr_1624039072_result

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
Cacao, émulsifiant (lécithine de _soja_), vanille.
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

ingredients_text_fr_ocr_1573108349

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
Farine de blé, sucre, beurre frais 9,5 % , aeufs entiers frais, crème fraiche 5,5% , levure, sel, arômes naturels (contient alcool), gluten de blé, poudre de lait écrémé, eau de vie, émulsifiants (Mono et diglycérides d'acides gras, Stéaroyl-2- actylate de sodium, diacétyltartriques des mono et diglycérides d'acides désactivée, colorant (béta carotène) Traces éventuelles de fruits à coque. Esters et mono gras), protéines de lait levure
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

ingredients_text_fr_ocr_1573108349_result

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
Farine de blé, sucre, beurre frais 9,5 % , aeufs entiers frais, crème fraiche 5,5% , levure, sel, arômes naturels (contient alcool), gluten de blé, poudre de lait écrémé, eau de vie, émulsifiants (Mono et diglycérides d'acides gras, Stéaroyl-2 - actylate de sodium, diacétyltartriques des mono et diglycérides d'acides désactivée, colorant (béta carotène) Traces éventuelles de fruits à coque. Esters et mono gras), protéines de lait levure
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

ingredients_text_fr_ocr_1573107560_result

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
Farine de blé, sucre, beurre frais 9,5 % , aeufs entiers frais, crème fraiche 5,5% , levure, sel, arômes naturels (contient alcool), gluten de blé, poudre de lait écrémé, eau de vie, émulsifiants (Mono et diglycérides d'acides gras, Stéaroyl-2 - actylate de sodium, diacétyltartriques des mono et diglycérides d'acides désactivée, colorant (béta carotène) Traces éventuelles de fruits à coque. Esters et mono gras), protéines de lait levure
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

ingredients_text_fr_ocr_1573108360

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
Farine de blé, sucre, beurre frais 9,5 % , aeufs entiers frais, crème fraiche 5,5% , levure, sel, arômes naturels (contient alcool), gluten de blé, poudre de lait écrémé, eau de vie, émulsifiants (Mono et diglycérides d'acides gras, Stéaroyl-2- actylate de sodium, diacétyltartriques des mono et diglycérides d'acides désactivée, colorant (béta carotène) Traces éventuelles de fruits à coque. Esters et mono gras), protéines de lait levure
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

ingredients_text_fr_ocr_1573107556_result

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
Farine de blé, sucre, beurre frais 9,5 % , aeufs entiers frais, crème fraiche 5,5% , levure, sel, arômes naturels (contient alcool), gluten de blé, poudre de lait écrémé, eau de vie, émulsifiants (Mono et diglycérides d'acides gras, Stéaroyl-2 - actylate de sodium, diacétyltartriques des mono et diglycérides d'acides désactivée, colorant (béta carotène) Traces éventuelles de fruits à coque. Esters et mono gras), protéines de lait levure
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

ingredients_text_fr_ocr_1573109955

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
Farine de blé, sucre, beurre frais 9,5 % , aeufs entiers frais, crème fraiche 5,5% , levure, sel, arômes naturels (contient alcool), gluten de blé, poudre de lait écrémé, eau de vie, émulsifiants (Mono et diglycérides d'acides gras, Stéaroyl-2- actylate de sodium, diacétyltartriques des mono et diglycérides d'acides désactivée, colorant (béta carotène) Traces éventuelles de fruits à coque. Esters et mono gras), protéines de lait levure
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

ingredients_text_fr_ocr_1566920858

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
Farine de blé, sucre, beurre frais 9,5 % , oeufs entiers frais, crème fraîche 5,5% , levure, sel, arômes naturels (contient alcool), gluten de blé, poudre de lait écrémé, eau de vie, émulsifiants (Mono et diglycérides d'acides gras, Stéaroyl-2- lactylate de sodium, Esters et mono et diacétyltartriques des mono et diglycérides d'acides gras), protéines de lait, levure désactivée, colorant (béta carotène) Traces éventuelles de fruits à coque.
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

ingredients_text_fr_ocr_1573107560

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
Farine de blé, sucre, beurre frais 9,5 % , aeufs entiers frais, crème fraiche 5,5% , levure, sel, arômes naturels (contient alcool), gluten de blé, poudre de lait écrémé, eau de vie, émulsifiants (Mono et diglycérides d'acides gras, Stéaroyl-2- actylate de sodium, diacétyltartriques des mono et diglycérides d'acides désactivée, colorant (béta carotène) Traces éventuelles de fruits à coque. Esters et mono gras), protéines de lait levure
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

ingredients_text_fr_ocr_1573108346

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
Farine de blé, sucre, beurre frais 9,5 % , aeufs entiers frais, crème fraiche 5,5% , levure, sel, arômes naturels (contient alcool), gluten de blé, poudre de lait écrémé, eau de vie, émulsifiants (Mono et diglycérides d'acides gras, Stéaroyl-2- actylate de sodium, diacétyltartriques des mono et diglycérides d'acides désactivée, colorant (béta carotène) Traces éventuelles de fruits à coque. Esters et mono gras), protéines de lait levure
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

ingredients_text_fr_ocr_1573108346_result

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
Farine de blé, sucre, beurre frais 9,5 % , aeufs entiers frais, crème fraiche 5,5% , levure, sel, arômes naturels (contient alcool), gluten de blé, poudre de lait écrémé, eau de vie, émulsifiants (Mono et diglycérides d'acides gras, Stéaroyl-2 - actylate de sodium, diacétyltartriques des mono et diglycérides d'acides désactivée, colorant (béta carotène) Traces éventuelles de fruits à coque. Esters et mono gras), protéines de lait levure
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

ingredients_text_fr_ocr_1573109955_result

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
Farine de blé, sucre, beurre frais 9,5 % , aeufs entiers frais, crème fraiche 5,5% , levure, sel, arômes naturels (contient alcool), gluten de blé, poudre de lait écrémé, eau de vie, émulsifiants (Mono et diglycérides d'acides gras, Stéaroyl-2 - actylate de sodium, diacétyltartriques des mono et diglycérides d'acides désactivée, colorant (béta carotène) Traces éventuelles de fruits à coque. Esters et mono gras), protéines de lait levure
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

ingredients_text_fr_ocr_1566920858_result

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
Farine de blé, sucre, beurre frais 9,5 % , oeufs entiers frais, crème fraîche 5,5% , levure, sel, arômes naturels (contient alcool), gluten de blé, poudre de lait écrémé, eau de vie, émulsifiants (Mono et diglycérides d'acides gras, Stéaroyl-2 - lactylate de sodium, Esters et mono et diacétyltartriques des mono et diglycérides d'acides gras), protéines de lait, levure désactivée, colorant (béta carotène) Traces éventuelles de fruits à coque.
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

ingredients_text_fr_ocr_1573107556

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
Farine de blé, sucre, beurre frais 9,5 % , aeufs entiers frais, crème fraiche 5,5% , levure, sel, arômes naturels (contient alcool), gluten de blé, poudre de lait écrémé, eau de vie, émulsifiants (Mono et diglycérides d'acides gras, Stéaroyl-2- actylate de sodium, diacétyltartriques des mono et diglycérides d'acides désactivée, colorant (béta carotène) Traces éventuelles de fruits à coque. Esters et mono gras), protéines de lait levure
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

ingredients_text_fr_ocr_1573108360_result

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
Farine de blé, sucre, beurre frais 9,5 % , aeufs entiers frais, crème fraiche 5,5% , levure, sel, arômes naturels (contient alcool), gluten de blé, poudre de lait écrémé, eau de vie, émulsifiants (Mono et diglycérides d'acides gras, Stéaroyl-2 - actylate de sodium, diacétyltartriques des mono et diglycérides d'acides désactivée, colorant (béta carotène) Traces éventuelles de fruits à coque. Esters et mono gras), protéines de lait levure
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

ingredients_text_with_allergens_ro

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

origin_lt

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

ingredients_text_with_allergens_lt

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

product_name_lt

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

ingredients_text_lt

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

packaging_text_lt

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

generic_name_lt

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

ingredients_text_fr_ocr_1713713129_result

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
Pâte de cacao, cacao en poudre dégraissé, beurre de cacao, sucre, lait en poudre, pâte de amandes et de noisettes, émulsifiants (lécithines (soja, toumesol)) et arôme. Cacao 92% minimum. Peut contenir des traces d'autres fruits à coque.
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0

ingredients_text_fr_ocr_1713713129

categorical long_tail null_rate imbalance
n
50
nulls
49 (98.0%)
unique
1
top_value
Ingrédients : Pâte de cacao, cacao en poudre dégraissé, beurre de cacao, sucre, lait en poudre, pâte de amandes et de noisettes, émulsifiants (lécithines (soja, toumesol)) et arôme. Cacao 92% minimum. Peut contenir des traces d'autres fruits à coque.
top_rate
1
cardinality
1
entropy
0
entropy_ratio
0