data trove openfoodfacts database
Reading
This is a 50-product sample from the Open Food Facts database, an open, crowdsourced food product catalogue. The sample has 545 columns spanning multilingual product names, ingredient texts, allergen data, nutritional scores, packaging details, and community metadata. The most striking structural issue is extreme sparsity: most language-specific columns outside French and English are at least 60% null, and many (e.g. product_name_dz, ingredients_text_ja) reach 96–98%, so content is concentrated in the French and English fields. Two things deserve a closer look. First, the Nutri-Score distribution is heavily skewed toward grade 'e' (54% of products), suggesting the sample leans toward nutritionally poor items. Second, scan counts (scans_n, mean 578, max 2523) are strongly right-skewed: 39 of the 50 products fall in the lowest histogram bin, while a few highly popular products dominate community attention.
citing: nutriscore_grade.top_value · nutriscore_grade.stats.top_rate · scans_n.stats.mean · scans_n.stats.max · scans_n.alerts · nova_groups.top_value · nova_groups.stats.top_rate · ecoscore_grade.stats.cardinality · emb_code.null_rate · ingredients_text_ja.null_rate
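The sparsity figures in the summary are per-column null rates. The following is a minimal sketch of how such a rate could be computed, not the profiler's actual code. It assumes records are plain dictionaries and that an absent key, `None`, or an empty string all count as missing. The `null_rates` helper, the toy records, and the 0.2 flagging threshold are hypothetical.

```python
from typing import Any, Iterable


def null_rates(records: Iterable[dict[str, Any]]) -> dict[str, float]:
    """Return the share of records in which each column is missing.

    A value counts as missing when the key is absent, None, or an empty
    string (an assumption; the report does not define "null").
    """
    rows = list(records)
    if not rows:
        return {}
    columns = {key for row in rows for key in row}
    rates = {}
    for col in sorted(columns):
        missing = sum(1 for row in rows if row.get(col) in (None, ""))
        rates[col] = missing / len(rows)
    return rates


# Toy sample of 50 records: 49 carry a French name, 1 carries only a
# Japanese ingredient text. The values themselves are placeholders.
sample = [{"product_name_fr": "Biscuit"} for _ in range(49)]
sample.append({"ingredients_text_ja": "小麦粉"})

rates = null_rates(sample)
# Hypothetical threshold for flagging a column as sparse.
sparse = [col for col, rate in rates.items() if rate > 0.2]

print(rates)   # {'ingredients_text_ja': 0.98, 'product_name_fr': 0.02}
print(sparse)  # ['ingredients_text_ja']
```

With 50 rows, every null rate is a multiple of 2%, which is why the schema lists only even percentages.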
Charts the summary said to look at first
nutriscore_grade: value counts

| value | count | share |
|---|---|---|
| e | 27 | 54.0% |
| d | 9 | 18.0% |
| c | 7 | 14.0% |
| a | 4 | 8.0% |
| b | 2 | 4.0% |
| unknown | 1 | 2.0% |
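
The `top_value` and `top_rate` statistics cited for nutriscore_grade can be checked against the counts in this table. This is a small sketch, assuming `top_rate` is simply the most frequent value's count divided by the number of rows. That definition is an assumption, although it reproduces the 54% quoted in the summary.

```python
from collections import Counter

# Grade counts copied from the table above (50 products in total).
grade_counts = Counter({"e": 27, "d": 9, "c": 7, "a": 4, "b": 2, "unknown": 1})

total = sum(grade_counts.values())
top_value, top_count = grade_counts.most_common(1)[0]
top_rate = top_count / total

shares = {grade: count / total for grade, count in grade_counts.items()}
poor_share = shares["d"] + shares["e"]  # grades d and e combined

print(top_value, f"{top_rate:.1%}")  # e 54.0%
print(f"{poor_share:.1%}")           # 72.0%
```

Grades d and e together cover 72% of the sample, which supports the reading that it leans toward nutritionally poor items.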

scans_n: histogram

| bin | count |
|---|---|
| 333 – 645.9 | 39 |
| 645.9 – 958.7 | 7 |
| 958.7 – 1272 | 3 |
| 1272 – 1584 | 0 |
| 1584 – 1897 | 0 |
| 1897 – 2210 | 0 |
| 2210 – 2523 | 1 |
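
The bin boundaries in this histogram are consistent with seven equal-width bins between the smallest and largest scan counts. The sketch below recomputes the edges. It assumes a minimum of 333 (read off the first bin's lower edge) and the maximum of 2523 quoted in the summary. The `equal_width_edges` helper is hypothetical, and the profiler's real binning rule is not stated in the report.

```python
def equal_width_edges(lo: float, hi: float, bins: int) -> list[float]:
    """Return the bins + 1 edges of equal-width bins spanning [lo, hi]."""
    width = (hi - lo) / bins
    return [lo + i * width for i in range(bins + 1)]


edges = equal_width_edges(333, 2523, 7)
print([round(e, 1) for e in edges])
# [333.0, 645.9, 958.7, 1271.6, 1584.4, 1897.3, 2210.1, 2523.0]
```

The table shows the same edges rounded to four significant figures. Because a single product with 2523 scans stretches the range, 39 of the 50 products land in the first bin and three bins are empty.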

nova_groups: value counts

| value | count | share |
|---|---|---|
| 4 | 33 | 66.0% |
| 3 | 14 | 28.0% |
| 1 | 1 | 2.0% |

ecoscore_grade: value counts

| value | count | share |
|---|---|---|
| e | 12 | 24.0% |
| d | 9 | 18.0% |
| b | 8 | 16.0% |
| c | 8 | 16.0% |
| unknown | 6 | 12.0% |
| a | 3 | 6.0% |
| a-plus | 2 | 4.0% |
| not-applicable | 1 | 2.0% |
| f | 1 | 2.0% |

pnns_groups_1: value counts

| value | count | share |
|---|---|---|
| Sugary snacks | 38 | 76.0% |
| Salty snacks | 4 | 8.0% |
| Cereals and potatoes | 3 | 6.0% |
| unknown | 2 | 4.0% |
| Milk and dairy products | 1 | 2.0% |
| Beverages | 1 | 2.0% |
| Fruits and vegetables | 1 | 2.0% |
Schema
545 columns

| Column | Type | Null rate | Cardinality | Alerts |
|---|---|---|---|---|
| ingredients_with_unspecified_percent_sum | numeric | 0.0% | 22 | |
| purchase_places | categorical | 2.0% | 32 | long_tail |
| rev | numeric | 0.0% | 46 | |
| product_name_it | categorical | 68.0% | 12 | long_tail, null_rate |
| editors | unknown | 0.0% | — | skipped |
| nutrition_score_warning_fruits_vegetables_nuts_estimate_from_ingredients | numeric | 10.0% | 1 | constant |
| traces_hierarchy | unknown | 0.0% | — | skipped |
| packaging | categorical | 12.0% | 41 | long_tail |
| packagings_n | numeric | 18.0% | 5 | outliers |
| categories_properties | unknown | 0.0% | — | skipped |
| generic_name_en | categorical | 14.0% | 8 | long_tail |
| food_groups | categorical | 2.0% | 11 | |
| ingredients_without_ciqual_codes_n | numeric | 0.0% | 15 | |
| origin_sv | categorical | 92.0% | 1 | null_rate, imbalance |
| product_name_ja | categorical | 98.0% | 1 | long_tail, null_rate, imbalance |
| data_quality_warnings_tags | unknown | 0.0% | — | skipped |
| packaging_recycling_tags | unknown | 0.0% | — | skipped |
| scores | unknown | 0.0% | — | skipped |
| nucleotides_prev_tags | unknown | 0.0% | — | skipped |
| data_quality_dimensions | unknown | 0.0% | — | skipped |
| product_name_fi | categorical | 90.0% | 4 | long_tail, null_rate |
| origin_de | categorical | 60.0% | 1 | null_rate, imbalance |
| packaging_lc | categorical | 12.0% | 7 | |
| correctors_tags | unknown | 0.0% | — | skipped |
| categories_hierarchy | unknown | 0.0% | — | skipped |
| ingredients_ids_debug | unknown | 0.0% | — | skipped |
| traces_lc | categorical | 4.0% | 6 | |
| environment_impact_level_tags | unknown | 0.0% | — | skipped |
| last_image_t | numeric | 0.0% | 50 | high_skew |
| ingredients_that_may_be_from_palm_oil_n | numeric | 8.0% | 3 | high_skew, outliers |
| max_imgid | categorical | 0.0% | 38 | long_tail |
| nutriscore_tags | unknown | 0.0% | — | skipped |
| generic_name_sv | categorical | 92.0% | 4 | long_tail, null_rate |
| ingredients_text_with_allergens_nb | categorical | 96.0% | 1 | null_rate, imbalance |
| quantity | categorical | 2.0% | 36 | long_tail |
| countries_hierarchy | unknown | 0.0% | — | skipped |
| data_quality_tags | unknown | 0.0% | — | skipped |
| ingredients_n | numeric | 0.0% | 22 | |
| grades | unknown | 0.0% | — | skipped |
| additives_original_tags | unknown | 0.0% | — | skipped |
| nutrition_score_beverage | numeric | 0.0% | 2 | high_skew |
| packaging_text_nl | categorical | 76.0% | 1 | null_rate, imbalance |
| photographers | unknown | 0.0% | — | skipped |
| pnns_groups_1 | categorical | 0.0% | 7 | |
| product_name_en | categorical | 14.0% | 34 | long_tail |
| traces_from_user | categorical | 0.0% | 35 | long_tail |
| generic_name_nl | categorical | 76.0% | 4 | long_tail, null_rate |
| nutrition_grade_fr | categorical | 0.0% | 6 | |
| image_front_thumb_url | categorical | 0.0% | 50 | long_tail |
| last_editor | categorical | 2.0% | 24 | long_tail |
| nutrient_levels_tags | unknown | 0.0% | — | skipped |
| product_name_nb | categorical | 96.0% | 2 | long_tail, null_rate |
| packaging_shapes_tags | unknown | 0.0% | — | skipped |
| _keywords | unknown | 0.0% | — | skipped |
| emb_codes_tags | unknown | 0.0% | — | skipped |
| images | unknown | 0.0% | — | skipped |
| states_tags | unknown | 0.0% | — | skipped |
| packaging_text_sv | categorical | 92.0% | 1 | null_rate, imbalance |
| informers_tags | unknown | 0.0% | — | skipped |
| ingredients_text_pl | categorical | 90.0% | 3 | long_tail, null_rate |
| labels | categorical | 2.0% | 42 | long_tail |
| sources | unknown | 0.0% | — | skipped |
| checkers_tags | unknown | 0.0% | — | skipped |
| product_quantity_unit | categorical | 10.0% | 2 | imbalance |
| last_modified_by | categorical | 2.0% | 24 | long_tail |
| image_front_url | categorical | 0.0% | 50 | long_tail |
| nutrition_data_prepared | categorical | 4.0% | 1 | imbalance |
| packaging_text_fi | categorical | 90.0% | 1 | null_rate, imbalance |
| interface_version_created | categorical | 2.0% | 3 | |
| nutrient_levels | unknown | 0.0% | — | skipped |
| languages_tags | unknown | 0.0% | — | skipped |
| vitamins_prev_tags | unknown | 0.0% | — | skipped |
| other_nutritional_substances_tags | unknown | 0.0% | — | skipped |
| product_name_de | categorical | 60.0% | 16 | long_tail, null_rate |
| nutrition_grades | categorical | 0.0% | 6 | |
| countries_beforescanbot | categorical | 14.0% | 38 | long_tail |
| ingredients_text_with_allergens_es | categorical | 62.0% | 13 | long_tail, null_rate |
| labels_lc | categorical | 2.0% | 6 | |
| nova_group_debug | categorical | 0.0% | 3 | long_tail, imbalance |
| nutrition_score_warning_fruits_vegetables_legumes_estimate_from_ingredients_value | numeric | 8.0% | 6 | high_skew, outliers |
| lc | categorical | 0.0% | 5 | |
| allergens_from_user | categorical | 0.0% | 34 | long_tail |
| debug_param_sorted_langs | unknown | 0.0% | — | skipped |
| ecoscore_tags | unknown | 0.0% | — | skipped |
| nutriscore_score_opposite | numeric | 2.0% | 28 | |
| image_small_url | categorical | 0.0% | 50 | long_tail |
| codes_tags | unknown | 0.0% | — | skipped |
| pnns_groups_2_tags | unknown | 0.0% | — | skipped |
| ingredients_analysis_tags | unknown | 0.0% | — | skipped |
| purchase_places_tags | unknown | 0.0% | — | skipped |
| unique_scans_n | numeric | 0.0% | 48 | high_skew, outliers |
| update_key | categorical | 0.0% | 9 | long_tail |
| emb_codes_orig | categorical | 34.0% | 5 | long_tail, null_rate |
| ingredients_text_with_allergens_de | categorical | 66.0% | 16 | long_tail, null_rate |
| ingredients_without_ecobalyse_ids_n | numeric | 0.0% | 20 | |
| main_countries_tags | unknown | 0.0% | — | skipped |
| ingredients_text_with_allergens_en | categorical | 16.0% | 36 | long_tail |
| nucleotides_tags | unknown | 0.0% | — | skipped |
| ingredients_text_with_allergens_sv | categorical | 92.0% | 4 | long_tail, null_rate |
| entry_dates_tags | unknown | 0.0% | — | skipped |
| allergens_from_ingredients | categorical | 0.0% | 35 | long_tail |
| nova_groups | categorical | 4.0% | 3 | |
| product_quantity | categorical | 6.0% | 27 | long_tail |
| ingredients_debug | unknown | 0.0% | — | skipped |
| generic_name | categorical | 4.0% | 28 | long_tail |
| origins_tags | unknown | 0.0% | — | skipped |
| added_countries_tags | unknown | 0.0% | — | skipped |
| categories_lc | categorical | 0.0% | 6 | |
| image_url | categorical | 0.0% | 50 | long_tail |
| ingredients_sweeteners_n | numeric | 0.0% | 1 | constant |
| ingredients_text_ja | categorical | 98.0% | 1 | long_tail, null_rate, imbalance |
| allergens_tags | unknown | 0.0% | — | skipped |
| origin_es | categorical | 60.0% | 1 | null_rate, imbalance |
| last_updated_t | numeric | 0.0% | 50 | outliers |
| origin_fr | categorical | 8.0% | 7 | long_tail |
| nutrition_score_warning_fruits_vegetables_nuts_estimate_from_ingredients_value | numeric | 10.0% | 13 | high_skew, outliers |
| ingredients_without_ecobalyse_ids | unknown | 0.0% | — | skipped |
| ingredients_text_with_allergens_it | categorical | 68.0% | 12 | long_tail, null_rate |
| data_quality_errors_tags | unknown | 0.0% | — | skipped |
| origin_pl | categorical | 90.0% | 1 | null_rate, imbalance |
| packaging_text_fr | categorical | 6.0% | 14 | long_tail |
| debug_tags | unknown | 0.0% | — | skipped |
| ingredients_text_sv | categorical | 92.0% | 4 | long_tail, null_rate |
| cities_tags | unknown | 0.0% | — | skipped |
| ingredients_with_unspecified_percent_n | numeric | 0.0% | 18 | |
| product_name_fr | categorical | 2.0% | 47 | long_tail |
| traces | categorical | 0.0% | 23 | long_tail |
| known_ingredients_n | numeric | 0.0% | 22 | |
| packaging_text_pl | categorical | 90.0% | 1 | null_rate, imbalance |
| image_front_small_url | categorical | 0.0% | 50 | long_tail |
| origin_en | categorical | 14.0% | 2 | imbalance |
| interface_version_modified | categorical | 0.0% | 2 | |
| serving_size | categorical | 12.0% | 37 | long_tail |
| states | categorical | 0.0% | 26 | long_tail |
| generic_name_fi | categorical | 90.0% | 5 | long_tail, null_rate |
| schema_version | numeric | 0.0% | 1 | constant |
| packaging_old_before_taxonomization | categorical | 24.0% | 36 | long_tail, null_rate |
| nova_groups_markers | unknown | 0.0% | — | skipped |
| amino_acids_prev_tags | unknown | 0.0% | — | skipped |
| product | unknown | 0.0% | — | skipped |
| emb_codes | categorical | 4.0% | 11 | long_tail |
| labels_tags | unknown | 0.0% | — | skipped |
| selected_images | unknown | 0.0% | — | skipped |
| nutriscore | unknown | 0.0% | — | skipped |
| packaging_tags | unknown | 0.0% | — | skipped |
| traces_from_ingredients | categorical | 0.0% | 12 | long_tail |
| nutrition_data_per | categorical | 0.0% | 2 | |
| ecoscore_grade | categorical | 0.0% | 9 | |
| packaging_hierarchy | unknown | 0.0% | — | skipped |
| nova_group | numeric | 4.0% | 3 | high_skew |
| additives_tags | unknown | 0.0% | — | skipped |
| emb_codes_20141016 | categorical | 58.0% | 7 | long_tail, null_rate |
| ingredients_without_ciqual_codes | unknown | 0.0% | — | skipped |
| categories_tags | unknown | 0.0% | — | skipped |
| category_properties | unknown | 0.0% | — | skipped |
| packagings | unknown | 0.0% | — | skipped |
| languages_codes | unknown | 0.0% | — | skipped |
| ingredients_text_with_allergens_fi | categorical | 90.0% | 4 | long_tail, null_rate |
| ciqual_food_name_tags | unknown | 0.0% | — | skipped |
| complete | numeric | 0.0% | 2 | |
| ingredients_text_with_allergens_pl | categorical | 92.0% | 3 | long_tail, null_rate |
| allergens_hierarchy | unknown | 0.0% | — | skipped |
| languages_hierarchy | unknown | 0.0% | — | skipped |
| nova_groups_tags | unknown | 0.0% | — | skipped |
| ingredients_tags | unknown | 0.0% | — | skipped |
| ingredients_text_it | categorical | 68.0% | 12 | long_tail, null_rate |
| informers | unknown | 0.0% | — | skipped |
| origin_nb | categorical | 96.0% | 1 | null_rate, imbalance |
| creator | categorical | 0.0% | 13 | long_tail |
| packaging_text_ja | categorical | 98.0% | 1 | long_tail, null_rate, imbalance |
| sortkey | numeric | 12.0% | 44 | high_skew, outliers |
| packagings_materials_main | categorical | 62.0% | 3 | null_rate |
| ingredients_percent_analysis | numeric | 0.0% | 2 | high_skew, outliers |
| amino_acids_tags | unknown | 0.0% | — | skipped |
| categories_properties_tags | unknown | 0.0% | — | skipped |
| environment_impact_level | categorical | 56.0% | 1 | null_rate, imbalance |
| expiration_date | categorical | 4.0% | 34 | long_tail |
| ingredients_from_or_that_may_be_from_palm_oil_n | numeric | 6.0% | 3 | |
| nutriscore_score | numeric | 2.0% | 28 | |
| ingredients_text_with_allergens | categorical | 0.0% | 50 | long_tail |
| ingredients_with_specified_percent_sum | numeric | 0.0% | 22 | |
| nutriscore_version | categorical | 0.0% | 1 | imbalance |
| lang | categorical | 0.0% | 5 | |
| origins_hierarchy | unknown | 0.0% | — | skipped |
| origins_lc | categorical | 4.0% | 6 | |
| origin_it | categorical | 68.0% | 1 | null_rate, imbalance |
| serving_quantity | categorical | 12.0% | 27 | long_tail |
| checkers | unknown | 0.0% | — | skipped |
| editors_tags | unknown | 0.0% | — | skipped |
| stores | categorical | 4.0% | 31 | long_tail |
| product_name_pl | categorical | 90.0% | 3 | long_tail, null_rate |
| weighters_tags | unknown | 0.0% | — | skipped |
| ecoscore_score | numeric | 14.0% | 31 | |
| generic_name_it | categorical | 68.0% | 5 | long_tail, null_rate |
| obsolete | categorical | 12.0% | 1 | imbalance |
| other_nutritional_substances_prev_tags | unknown | 0.0% | — | skipped |
| compared_to_category | categorical | 0.0% | 35 | long_tail |
| generic_name_es | categorical | 60.0% | 7 | long_tail, null_rate |
| correctors | unknown | 0.0% | — | skipped |
| additives_n | numeric | 0.0% | 8 | |
| ingredients_text_nb | categorical | 96.0% | 1 | null_rate, imbalance |
| ingredients_text_es | categorical | 60.0% | 13 | long_tail, null_rate |
| manufacturing_places_tags | unknown | 0.0% | — | skipped |
| origin | categorical | 6.0% | 6 | long_tail |
| origins_old | categorical | 22.0% | 9 | long_tail, null_rate |
| packaging_text_de | categorical | 60.0% | 2 | null_rate |
| languages | unknown | 0.0% | — | skipped |
| categories_old | categorical | 2.0% | 45 | long_tail |
| ingredients_from_palm_oil_tags | unknown | 0.0% | — | skipped |
| minerals_prev_tags | unknown | 0.0% | — | skipped |
| origin_fi | categorical | 90.0% | 1 | null_rate, imbalance |
| packaging_old | categorical | 14.0% | 40 | long_tail |
| ingredients_text_fi | categorical | 90.0% | 4 | long_tail, null_rate |
| product_type | categorical | 0.0% | 1 | imbalance |
| ingredients_hierarchy | unknown | 0.0% | — | skipped |
| removed_countries_tags | unknown | 0.0% | — | skipped |
| unknown_nutrients_tags | unknown | 0.0% | — | skipped |
| no_nutrition_data | categorical | 4.0% | 1 | imbalance |
| ingredients_analysis | unknown | 0.0% | — | skipped |
| packagings_materials | unknown | 0.0% | — | skipped |
| serving_quantity_unit | categorical | 8.0% | 2 | imbalance |
| product_name | categorical | 0.0% | 49 | long_tail |
| id | categorical | 0.0% | 50 | long_tail |
| ingredients_text_with_allergens_nl | categorical | 78.0% | 9 | long_tail, null_rate |
| categories | categorical | 0.0% | 46 | long_tail |
| nutrition_grades_tags | unknown | 0.0% | — | skipped |
| nutriscore_2023_tags | unknown | 0.0% | — | skipped |
| origin_ja | categorical | 98.0% | 1 | long_tail, null_rate, imbalance |
| nutrition_score_debug | categorical | 0.0% | 2 | imbalance |
| teams | categorical | 8.0% | 39 | long_tail |
| unknown_ingredients_n | numeric | 0.0% | 6 | high_skew, outliers |
| url | categorical | 0.0% | 50 | long_tail |
| data_quality_completeness_tags | unknown | 0.0% | — | skipped |
| ecoscore_data | unknown | 0.0% | — | skipped |
| generic_name_pl | categorical | 90.0% | 2 | null_rate |
| nutrition_data | categorical | 2.0% | 1 | imbalance |
| generic_name_ja | categorical | 98.0% | 1 | long_tail, null_rate, imbalance |
| nutriments | unknown | 0.0% | — | skipped |
| last_image_dates_tags | unknown | 0.0% | — | skipped |
| brands | categorical | 0.0% | 41 | long_tail |
| minerals_tags | unknown | 0.0% | — | skipped |
| nutrition_data_prepared_per | categorical | 0.0% | 1 | imbalance |
| popularity_tags | unknown | 0.0% | — | skipped |
| packaging_text_es | categorical | 60.0% | 2 | null_rate |
| manufacturing_places | categorical | 2.0% | 20 | long_tail |
| generic_name_nb | categorical | 96.0% | 1 | null_rate, imbalance |
| last_modified_t | numeric | 0.0% | 50 | outliers |
| vitamins_tags | unknown | 0.0% | — | skipped |
| _id | categorical | 0.0% | 50 | long_tail |
| teams_tags | unknown | 0.0% | — | skipped |
| countries | categorical | 0.0% | 43 | long_tail |
| pnns_groups_2 | categorical | 0.0% | 11 | |
| states_hierarchy | unknown | 0.0% | — | skipped |
| code | categorical | 0.0% | 50 | long_tail |
| countries_lc | categorical | 2.0% | 6 | |
| stores_tags | unknown | 0.0% | — | skipped |
| generic_name_de | categorical | 60.0% | 9 | long_tail, null_rate |
| ingredients_n_tags | unknown | 0.0% | — | skipped |
| allergens | categorical | 0.0% | 16 | |
| allergens_lc | categorical | 4.0% | 6 | |
| ingredients_text_en | categorical | 12.0% | 36 | long_tail |
| misc_tags | unknown | 0.0% | — | skipped |
| photographers_tags | unknown | 0.0% | — | skipped |
| packaging_materials_tags | unknown | 0.0% | — | skipped |
| product_name_nl | categorical | 76.0% | 7 | long_tail, null_rate |
| nutrition_score_warning_fruits_vegetables_legumes_estimate_from_ingredients | numeric | 8.0% | 1 | constant |
| product_name_sv | categorical | 92.0% | 4 | long_tail, null_rate |
| food_groups_tags | unknown | 0.0% | — | skipped |
| completeness | numeric | 0.0% | 14 | outliers |
| pnns_groups_1_tags | unknown | 0.0% | — | skipped |
| ingredients_with_specified_percent_n | numeric | 0.0% | 7 | |
| origin_nl | categorical | 76.0% | 1 | null_rate, imbalance |
| fruits-vegetables-nuts_100g_estimate | numeric | 46.0% | 2 | null_rate, high_skew |
| brands_old | categorical | 32.0% | 29 | long_tail, null_rate |
| generic_name_fr | categorical | 6.0% | 34 | long_tail |
| ingredients | unknown | 0.0% | — | skipped |
| countries_tags | unknown | 0.0% | — | skipped |
| ingredients_original_tags | unknown | 0.0% | — | skipped |
| ingredients_text_de | categorical | 60.0% | 16 | long_tail, null_rate |
| nutriscore_grade | categorical | 0.0% | 6 | |
| image_thumb_url | categorical | 0.0% | 50 | long_tail |
| packaging_text_en | categorical | 14.0% | 5 | long_tail |
| packaging_text_it | categorical | 68.0% | 3 | long_tail, null_rate |
| traces_tags | unknown | 0.0% | — | skipped |
| brands_tags | unknown | 0.0% | — | skipped |
| nutriscore_2021_tags | unknown | 0.0% | — | skipped |
| packaging_text | categorical | 4.0% | 13 | long_tail |
| popularity_key | numeric | 0.0% | 49 | high_skew, outliers |
| ingredients_text | categorical | 0.0% | 50 | long_tail |
| ingredients_text_with_allergens_fr | categorical | 4.0% | 47 | long_tail |
| ingredients_text_nl | categorical | 76.0% | 9 | long_tail, null_rate |
| product_name_es | categorical | 60.0% | 17 | long_tail, null_rate |
| data_sources_tags | unknown | 0.0% | — | skipped |
| data_quality_bugs_tags | unknown | 0.0% | — | skipped |
| obsolete_since_date | categorical | 12.0% | 1 | imbalance |
| weighers_tags | unknown | 0.0% | — | skipped |
| ingredients_text_debug | categorical | 28.0% | 35 | long_tail, null_rate |
| link | categorical | 4.0% | 28 | long_tail |
| created_t | numeric | 0.0% | 50 | |
| ingredients_text_fr | categorical | 4.0% | 47 | long_tail |
| labels_hierarchy | unknown | 0.0% | — | skipped |
| ingredients_non_nutritive_sweeteners_n | numeric | 0.0% | 1 | constant |
| last_edit_dates_tags | unknown | 0.0% | — | skipped |
| packaging_text_nb | categorical | 96.0% | 1 | null_rate, imbalance |
| packagings_complete | numeric | 4.0% | 2 | |
| data_sources | categorical | 0.0% | 43 | long_tail |
| labels_old | categorical | 8.0% | 38 | long_tail |
| data_quality_info_tags | unknown | 0.0% | — | skipped |
| ingredients_from_palm_oil_n | numeric | 8.0% | 2 | outliers |
| ingredients_text_with_allergens_ja | categorical | 98.0% | 1 | long_tail, null_rate, imbalance |
| ingredients_lc | categorical | 0.0% | 4 | |
| origins | categorical | 4.0% | 20 | long_tail |
| nutriscore_data | unknown | 0.0% | — | skipped |
| scans_n | numeric | 0.0% | 49 | high_skew, outliers |
| ingredients_that_may_be_from_palm_oil_tags | unknown | 0.0% | — | skipped |
| generic_name_ar | categorical | 80.0% | 2 | null_rate |
| product_name_uk | categorical | 98.0% | 1 | long_tail, null_rate, imbalance |
| last_checked_t | numeric | 86.0% | 7 | null_rate |
| last_check_dates_tags | unknown | 0.0% | — | skipped |
| ingredients_text_uk | categorical | 98.0% | 1 | long_tail, null_rate, imbalance |
| carbon_footprint_from_known_ingredients_debug | categorical | 72.0% | 14 | long_tail, null_rate |
| packaging_text_ar | categorical | 80.0% | 1 | null_rate, imbalance |
| generic_name_uk | categorical | 98.0% | 1 | long_tail, null_rate, imbalance |
| last_checker | categorical | 86.0% | 4 | null_rate |
| checked | categorical | 86.0% | 1 | null_rate, imbalance |
| product_name_ar | categorical | 78.0% | 6 | long_tail, null_rate |
| ingredients_text_with_allergens_uk | categorical | 98.0% | 1 | long_tail, null_rate, imbalance |
| origin_uk | categorical | 98.0% | 1 | long_tail, null_rate, imbalance |
| packaging_text_uk | categorical | 98.0% | 1 | long_tail, null_rate, imbalance |
| ingredients_text_ar | categorical | 78.0% | 2 | null_rate |
| ingredients_text_with_allergens_ar | categorical | 82.0% | 2 | null_rate |
| carbon_footprint_percent_of_known_ingredients | numeric | 62.0% | 19 | null_rate |
| origin_ar | categorical | 80.0% | 1 | null_rate, imbalance |
| nutrition_score_warning_no_fiber | numeric | 70.0% | 1 | null_rate, constant |
| ingredients_text_debug_tags | unknown | 0.0% | — | skipped |
| nutriments_estimated | unknown | 0.0% | — | skipped |
| completed_t | numeric | 68.0% | 16 | null_rate |
| taxonomies_enhancer_tags | unknown | 0.0% | — | skipped |
| ingredients_text_with_allergens_sl | categorical | 98.0% | 1 | long_tail, null_rate, imbalance |
| packaging_text_sk | categorical | 98.0% | 1 | long_tail, null_rate, imbalance |
| ingredients_text_with_allergens_bg | categorical | 94.0% | 3 | long_tail, null_rate |
| ingredients_text_pt | categorical | 80.0% | 4 | long_tail, null_rate |
| ingredients_text_dz | categorical | 98.0% | 1 | long_tail, null_rate, imbalance |
| generic_name_ca | categorical | 96.0% | 1 | null_rate, imbalance |
| generic_name_bg | categorical | 94.0% | 1 | null_rate, imbalance |
| origin_sl | categorical | 98.0% | 1 | long_tail, null_rate, imbalance |
| product_name_et | categorical | 94.0% | 3 | long_tail, null_rate |
| origin_et | categorical | 94.0% | 1 | null_rate, imbalance |
| ingredients_text_with_allergens_sk | categorical | 98.0% | 1 | long_tail, null_rate, imbalance |
| ingredients_text_with_allergens_et | categorical | 94.0% | 3 | long_tail, null_rate |
| nutrition_score_warning_nutriments_estimated | numeric | 96.0% | 1 | null_rate, constant |
| ingredients_text_sk | categorical | 98.0% | 1 | long_tail, null_rate, imbalance |
| generic_name_pt | categorical | 80.0% | 3 | long_tail, null_rate |
| ingredients_text_bg | categorical | 94.0% | 3 | long_tail, null_rate |
| packaging_text_et | categorical | 94.0% | 1 | null_rate, imbalance |
| product_name_sk | categorical | 98.0% | 1 | long_tail, null_rate, imbalance |
| ingredients_text_ca | categorical | 96.0% | 1 | null_rate, imbalance |
| ingredients_text_with_allergens_ca | categorical | 98.0% | 1 | long_tail, null_rate, imbalance |
| product_name_dz | categorical | 98.0% | 1 | long_tail, null_rate, imbalance |
| product_name_sl | categorical | 98.0% | 1 | long_tail, null_rate, imbalance |
| origin_sk | categorical | 98.0% | 1 | long_tail, null_rate, imbalance |
| generic_name_et | categorical | 94.0% | 1 | null_rate, imbalance |
| ingredients_text_et | categorical | 94.0% | 3 | long_tail, null_rate |
| packaging_text_ca | categorical | 96.0% | 1 | null_rate, imbalance |
| packaging_text_sl | categorical | 98.0% | 1 | long_tail, null_rate, imbalance |
| generic_name_dz | categorical | 98.0% | 1 | long_tail, null_rate, imbalance |
| origin_ca | categorical | 96.0% | 1 | null_rate, imbalance |
| product_name_ca | categorical | 96.0% | 1 | null_rate, imbalance |
| packaging_text_pt | categorical | 80.0% | 1 | null_rate, imbalance |
| origin_bg | categorical | 94.0% | 1 | null_rate, imbalance |
| packaging_text_bg | categorical | 94.0% | 1 | null_rate, imbalance |
| origin_pt | categorical | 80.0% | 1 | null_rate, imbalance |
| ingredients_text_with_allergens_pt | categorical | 84.0% | 4 | long_tail, null_rate |
| product_name_bg | categorical | 94.0% | 3 | long_tail, null_rate |
| ingredients_text_sl | categorical | 98.0% | 1 | long_tail, null_rate, imbalance |
| generic_name_sl | categorical | 98.0% | 1 | long_tail, null_rate, imbalance |
| generic_name_sk | categorical | 98.0% | 1 | long_tail, null_rate, imbalance |
| product_name_pt | categorical | 80.0% | 7 | long_tail, null_rate |
| lc_imported | categorical | 84.0% | 2 | null_rate |
| abbreviated_product_name_fr_imported | categorical | 86.0% | 7 | long_tail, null_rate |
| generic_name_zh | categorical | 98.0% | 1 | long_tail, null_rate, imbalance |
| obsolete_imported | categorical | 86.0% | 1 | null_rate, imbalance |
| generic_name_fr_imported | categorical | 86.0% | 7 | long_tail, null_rate |
| owners_tags | categorical | 86.0% | 6 | long_tail, null_rate |
| owner_imported | categorical | 88.0% | 5 | long_tail, null_rate |
| customer_service | categorical | 86.0% | 6 | long_tail, null_rate |
| ingredients_text_zh_debug_tags | unknown | 0.0% | — | skipped |
| countries_imported | categorical | 84.0% | 2 | null_rate |
| data_sources_imported | categorical | 84.0% | 8 | long_tail, null_rate |
| product_name_zh | categorical | 98.0% | 1 | long_tail, null_rate, imbalance |
| categories_imported | categorical | 88.0% | 5 | long_tail, null_rate |
| quantity_imported | categorical | 86.0% | 7 | long_tail, null_rate |
| ingredients_text_zh | categorical | 98.0% | 1 | long_tail, null_rate, imbalance |
| emb_code | categorical | 98.0% | 1 | long_tail, null_rate, imbalance |
| origins_fr | categorical | 96.0% | 2 | long_tail, null_rate |
| nutrition_data_prepared_per_imported | categorical | 86.0% | 1 | null_rate, imbalance |
| product_name_zh_debug_tags | unknown | 0.0% | — | skipped |
| sources_fields | unknown | 0.0% | — | skipped |
| customer_service_fr | categorical | 86.0% | 6 | long_tail, null_rate |
| nutrition_data_per_imported | categorical | 84.0% | 1 | null_rate, imbalance |
| owner | categorical | 86.0% | 6 | long_tail, null_rate |
| abbreviated_product_name | categorical | 86.0% | 7 | long_tail, null_rate |
| conservation_conditions_fr | categorical | 86.0% | 7 | long_tail, null_rate |
| brands_imported | categorical | 86.0% | 6 | long_tail, null_rate |
| owner_fields | unknown | 0.0% | — | skipped |
| conservation_conditions_fr_imported | categorical | 86.0% | 7 | long_tail, null_rate |
| origin_fr_imported | categorical | 96.0% | 2 | long_tail, null_rate |
| customer_service_fr_imported | categorical | 86.0% | 6 | long_tail, null_rate |
| generic_name_zh_debug_tags | unknown | 0.0% | — | skipped |
| product_name_fr_imported | categorical | 86.0% | 7 | long_tail, null_rate |
| lang_imported | categorical | 86.0% | 1 | null_rate, imbalance |
| abbreviated_product_name_fr | categorical | 86.0% | 7 | long_tail, null_rate |
| ingredients_text_fr_imported | categorical | 86.0% | 7 | long_tail, null_rate |
| conservation_conditions | categorical | 86.0% | 7 | long_tail, null_rate |
| nova_group_error | categorical | 96.0% | 1 | null_rate, imbalance |
| producer_version_id_imported | categorical | 92.0% | 3 | long_tail, null_rate |
| ingredients_text_de_ocr_1648990410 | categorical | 98.0% | 1 | long_tail, null_rate, imbalance |
| product_name_ro | categorical | 96.0% | 2 | long_tail, null_rate |
| packaging_imported | categorical | 92.0% | 2 | null_rate |
| ingredients_text_de_ocr_1648990410_result | categorical | 98.0% | 1 | long_tail, null_rate, imbalance |
| ingredients_text_ro | categorical | 96.0% | 1 | null_rate, imbalance |
| producer_version_id | categorical | 92.0% | 3 | long_tail, null_rate |
| labels_imported | categorical | 90.0% | 3 | long_tail, null_rate |
| allergens_imported | categorical | 90.0% | 4 | long_tail, null_rate |
| origin_ro | categorical | 96.0% | 1 | null_rate, imbalance |
| no_nutrition_data_imported | categorical | 92.0% | 1 | null_rate, imbalance |
| serving_size_imported | categorical | 88.0% | 6 | long_tail, null_rate |
| generic_name_ro | categorical | 96.0% | 1 | null_rate, imbalance |
| ingredients_text_de_ocr_1648897071 | categorical | 98.0% | 1 | long_tail, null_rate, imbalance |
| ingredients_text_de_ocr_1648897071_result | categorical | 98.0% | 1 | long_tail, null_rate, imbalance |
| packaging_text_ro | categorical | 96.0% | 1 | null_rate, imbalance |
| abbreviated_product_name_imported | categorical | 94.0% | 3 | long_tail, null_rate |
| traces_imported | categorical | 92.0% | 4 | long_tail, null_rate |
| specific_ingredients | unknown | 0.0% | — | skipped |
| packaging_text_ru | categorical | 94.0% | 1 | null_rate, imbalance |
| origin_ru | categorical | 94.0% | 1 | null_rate, imbalance |
| ingredients_text_with_allergens_ru | categorical | 94.0% | 1 | null_rate, imbalance |
| product_name_ru | categorical | 94.0% | 2 | null_rate |
| generic_name_ru | categorical | 94.0% | 2 | null_rate |
| ingredients_text_ru | categorical | 94.0% | 1 | null_rate, imbalance |
| packaging_text_da | categorical | 96.0% | 1 | null_rate, imbalance |
| generic_name_da | categorical | 96.0% | 2 | long_tail, null_rate |
| forest_footprint_data | unknown | 0.0% | — | skipped |
| product_name_da | categorical | 96.0% | 2 | long_tail, null_rate |
| ingredients_text_with_allergens_da | categorical | 96.0% | 2 | long_tail, null_rate |
| origin_da | categorical | 96.0% | 1 | null_rate, imbalance |
| ingredients_text_da | categorical | 96.0% | 2 | long_tail, null_rate |
| ingredients_text_cs | categorical | 94.0% | 2 | null_rate |
| ingredients_text_nl_ocr_1675675383_result | categorical | 98.0% | 1 | long_tail, null_rate, imbalance |
| product_name_cs | categorical | 94.0% | 2 | null_rate |
| ingredients_text_hu_ocr_1571428260_result | categorical | 98.0% | 1 | long_tail, null_rate, imbalance |
| packaging_text_cs | categorical | 94.0% | 1 | null_rate, imbalance |
| ingredients_text_sr | categorical | 96.0% | 2 | long_tail, null_rate |
| origin_sr | categorical | 96.0% | 1 | null_rate, imbalance |
| ingredients_text_hu_ocr_1571428260 | categorical | 98.0% | 1 | long_tail, null_rate, imbalance |
| packaging_text_hu | categorical | 92.0% | 1 | null_rate, imbalance |
| origin_cs | categorical | 96.0% | 1 | null_rate, imbalance |
| ingredients_text_nl_ocr_1675675383 | categorical | 98.0% | 1 | long_tail, null_rate, imbalance |
| product_name_sr | categorical | 96.0% | 2 | long_tail, null_rate |
| generic_name_hu | categorical | 92.0% | 2 | null_rate |
| packaging_text_sr | categorical | 96.0% | 1 | null_rate, imbalance |
| ingredients_text_with_allergens_cs | categorical | 98.0% | 1 | long_tail, null_rate, imbalance |
| ingredients_text_with_allergens_sr | categorical | 96.0% | 2 | long_tail, null_rate |
| ingredients_text_hu | categorical | 92.0% | 4 | long_tail, null_rate |
| product_name_hu | categorical | 92.0% | 3 | long_tail, null_rate |
| generic_name_sr | categorical | 96.0% | 2 | long_tail, null_rate |
| origin_hu | categorical | 92.0% | 1 | null_rate, imbalance |
| ingredients_text_with_allergens_hu | categorical | 94.0% | 3 | long_tail, null_rate |
| generic_name_cs | categorical | 94.0% | 1 | null_rate, imbalance |
| ingredients_text_xx | categorical | 96.0% | 1 | null_rate, imbalance |
| origin_xx | categorical | 98.0% | 1 | long_tail, null_rate, imbalance |
| product_name_xx | categorical | 96.0% | 1 | null_rate, imbalance |
| packaging_text_xx | categorical | 98.0% | 1 | long_tail, null_rate, imbalance |
| generic_name_xx | categorical | 96.0% | 1 | null_rate, imbalance |
| ingredients_text_es_ocr_1548767061 | categorical | 98.0% | 1 | long_tail, null_rate, imbalance |
| ingredients_text_es_ocr_1548767061_result | categorical | 98.0% | 1 | long_tail, null_rate, imbalance |
| origin_ur | categorical | 98.0% | 1 | long_tail, null_rate, imbalance |
| product_name_he | categorical | 96.0% | 2 | long_tail, null_rate |
| ingredients_text_he | categorical | 98.0% | 1 | long_tail, null_rate, imbalance |
| product_name_ur | categorical | 98.0% | 1 | long_tail, null_rate, imbalance |
| generic_name_he | categorical | 98.0% | 1 | long_tail, null_rate, imbalance |
| packaging_text_he | categorical | 98.0% | 1 | long_tail, null_rate, imbalance |
| ingredients_text_ur | categorical | 98.0% | 1 | long_tail, null_rate, imbalance |
| origin_he | categorical | 98.0% | 1 | long_tail, null_rate, imbalance |
| generic_name_ur | categorical | 98.0% | 1 | long_tail, null_rate, imbalance |
| packaging_text_ur | categorical | 98.0% | 1 | long_tail, null_rate, imbalance |
| ingredients_text_with_allergens_he | categorical | 98.0% | 1 | long_tail, null_rate, imbalance |
| nutriscore_grade_producer_imported | categorical | 94.0% | 3 | long_tail, null_rate |
| nutriscore_grade_producer | categorical | 94.0% | 3 | long_tail, null_rate |
| ingredients_text_el | categorical | 98.0% | 1 | long_tail, null_rate, imbalance |
| ingredients_text_with_allergens_el | categorical | 98.0% | 1 | long_tail, null_rate, imbalance |
| packaging_text_el | categorical | 98.0% | 1 | long_tail, null_rate, imbalance |
| origin_el | categorical | 98.0% | 1 | long_tail, null_rate, imbalance |
| product_name_el | categorical | 98.0% | 1 | long_tail, null_rate, imbalance |
| generic_name_el | categorical | 98.0% | 1 |
long_tail
null_rate
imbalance
|
| ingredients_text_it_ocr_1559410715 | categorical | 98.0% | 1 |
long_tail
null_rate
imbalance
|
| ingredients_text_de_ocr_1559410715 | categorical | 98.0% | 1 |
long_tail
null_rate
imbalance
|
| product_name_th | categorical | 98.0% | 1 |
long_tail
null_rate
imbalance
|
| ingredients_text_th | categorical | 98.0% | 1 |
long_tail
null_rate
imbalance
|
| ingredients_text_de_ocr_1548767354 | categorical | 98.0% | 1 |
long_tail
null_rate
imbalance
|
| ingredients_text_de_ocr_1548767354_result | categorical | 98.0% | 1 |
long_tail
null_rate
imbalance
|
| generic_name_th | categorical | 98.0% | 1 |
long_tail
null_rate
imbalance
|
| ingredients_text_it_ocr_1559410715_result | categorical | 98.0% | 1 |
long_tail
null_rate
imbalance
|
| packaging_text_th | categorical | 98.0% | 1 |
long_tail
null_rate
imbalance
|
| origin_th | categorical | 98.0% | 1 |
long_tail
null_rate
imbalance
|
| ingredients_text_with_allergens_th | categorical | 98.0% | 1 |
long_tail
null_rate
imbalance
|
| ingredients_text_de_ocr_1559410715_result | categorical | 98.0% | 1 |
long_tail
null_rate
imbalance
|
| packaging_text_fr_imported | categorical | 98.0% | 1 |
long_tail
null_rate
imbalance
|
| preparation | categorical | 98.0% | 1 |
long_tail
null_rate
imbalance
|
| preparation_fr_imported | categorical | 98.0% | 1 |
long_tail
null_rate
imbalance
|
| preparation_fr | categorical | 98.0% | 1 |
long_tail
null_rate
imbalance
|
| generic_name_lc | categorical | 98.0% | 1 |
long_tail
null_rate
imbalance
|
| product_name_lc | categorical | 98.0% | 1 |
long_tail
null_rate
imbalance
|
| ingredients_text_lc | categorical | 98.0% | 1 |
long_tail
null_rate
imbalance
|
| ingredients_text_with_allergens_lc | categorical | 98.0% | 1 |
long_tail
null_rate
imbalance
|
| generic_name_xx_debug_tags | unknown | 0.0% | — |
skipped
|
| ingredients_text_xx_debug_tags | unknown | 0.0% | — |
skipped
|
| product_name_xx_debug_tags | unknown | 0.0% | — |
skipped
|
| ingredients_text_fr_ocr_1561814324_result | categorical | 98.0% | 1 |
long_tail
null_rate
imbalance
|
| ingredients_text_fr_ocr_1561814324 | categorical | 98.0% | 1 |
long_tail
null_rate
imbalance
|
| ingredients_text_fr_ocr_1624039072 | categorical | 98.0% | 1 |
long_tail
null_rate
imbalance
|
| ingredients_text_fr_ocr_1624039072_result | categorical | 98.0% | 1 |
long_tail
null_rate
imbalance
|
| ingredients_text_fr_ocr_1573108349 | categorical | 98.0% | 1 |
long_tail
null_rate
imbalance
|
| ingredients_text_fr_ocr_1573108349_result | categorical | 98.0% | 1 |
long_tail
null_rate
imbalance
|
| ingredients_text_fr_ocr_1573107560_result | categorical | 98.0% | 1 |
long_tail
null_rate
imbalance
|
| ingredients_text_fr_ocr_1573108360 | categorical | 98.0% | 1 |
long_tail
null_rate
imbalance
|
| ingredients_text_fr_ocr_1573107556_result | categorical | 98.0% | 1 |
long_tail
null_rate
imbalance
|
| ingredients_text_fr_ocr_1573109955 | categorical | 98.0% | 1 |
long_tail
null_rate
imbalance
|
| ingredients_text_fr_ocr_1566920858 | categorical | 98.0% | 1 |
long_tail
null_rate
imbalance
|
| ingredients_text_fr_ocr_1573107560 | categorical | 98.0% | 1 |
long_tail
null_rate
imbalance
|
| ingredients_text_fr_ocr_1573108346 | categorical | 98.0% | 1 |
long_tail
null_rate
imbalance
|
| ingredients_text_fr_ocr_1573108346_result | categorical | 98.0% | 1 |
long_tail
null_rate
imbalance
|
| ingredients_text_fr_ocr_1573109955_result | categorical | 98.0% | 1 |
long_tail
null_rate
imbalance
|
| ingredients_text_fr_ocr_1566920858_result | categorical | 98.0% | 1 |
long_tail
null_rate
imbalance
|
| ingredients_text_fr_ocr_1573107556 | categorical | 98.0% | 1 |
long_tail
null_rate
imbalance
|
| ingredients_text_fr_ocr_1573108360_result | categorical | 98.0% | 1 |
long_tail
null_rate
imbalance
|
| ingredients_text_with_allergens_ro | categorical | 98.0% | 1 |
long_tail
null_rate
imbalance
|
| origin_lt | categorical | 98.0% | 1 |
long_tail
null_rate
imbalance
|
| ingredients_text_with_allergens_lt | categorical | 98.0% | 1 |
long_tail
null_rate
imbalance
|
| product_name_lt | categorical | 98.0% | 1 |
long_tail
null_rate
imbalance
|
| ingredients_text_lt | categorical | 98.0% | 1 |
long_tail
null_rate
imbalance
|
| packaging_text_lt | categorical | 98.0% | 1 |
long_tail
null_rate
imbalance
|
| generic_name_lt | categorical | 98.0% | 1 |
long_tail
null_rate
imbalance
|
| ingredients_text_fr_ocr_1713713129_result | categorical | 98.0% | 1 |
long_tail
null_rate
imbalance
|
| ingredients_text_fr_ocr_1713713129 | categorical | 98.0% | 1 |
long_tail
null_rate
imbalance
|
ingredients_with_unspecified_percent_sum
numeric- n
- 50
- nulls
- 0 (0.0%)
- unique
- 22
- min
- 0.4
- max
- 100
- mean
- 79.42
- median
- 100
- std
- 31.64
- q1
- 53.6
- q3
- 100
- iqr
- 46.4
- skew
- -1.183
- kurtosis
- -0.133
- n_outliers
- 0
- outlier_rate
- 0
- zero_rate
- 0
purchase_places
categorical long_tail- n
- 50
- nulls
- 1 (2.0%)
- unique
- 32
- top_value
- France
- top_rate
- 0.1837
- cardinality
- 32
- entropy
- 4.479
- entropy_ratio
- 0.8958
rev
numeric- n
- 50
- nulls
- 0 (0.0%)
- unique
- 46
- min
- 19
- max
- 674
- mean
- 230
- median
- 233.5
- std
- 166.6
- q1
- 72.75
- q3
- 310.5
- iqr
- 237.8
- skew
- 0.7092
- kurtosis
- -0.02278
- n_outliers
- 1
- outlier_rate
- 0.02
- zero_rate
- 0
product_name_it
categorical long_tail null_rate- n
- 50
- nulls
- 34 (68.0%)
- unique
- 12
- top_value
- top_rate
- 0.3125
- cardinality
- 12
- entropy
- 3.274
- entropy_ratio
- 0.9134
editors
unknown skipped- n
- 50
- nulls
- 0 (0.0%)
- unique
- —
nutrition_score_warning_fruits_vegetables_nuts_estimate_from_ingredients
numeric constant- n
- 50
- nulls
- 5 (10.0%)
- unique
- 1
- min
- 1
- max
- 1
- mean
- 1
- median
- 1
- std
- 0
- q1
- 1
- q3
- 1
- iqr
- 0
- skew
- 0
- kurtosis
- 0
- n_outliers
- 0
- outlier_rate
- 0
- zero_rate
- 0
traces_hierarchy
unknown skipped- n
- 50
- nulls
- 0 (0.0%)
- unique
- —
packaging
categorical long_tail- n
- 50
- nulls
- 6 (12.0%)
- unique
- 41
- top_value
- Plastique
- top_rate
- 0.09091
- cardinality
- 41
- entropy
- 5.278
- entropy_ratio
- 0.9851
packagings_n
numeric outliers- n
- 50
- nulls
- 9 (18.0%)
- unique
- 5
- min
- 1
- max
- 5
- mean
- 2.073
- median
- 2
- std
- 0.8772
- q1
- 2
- q3
- 2
- iqr
- 0
- skew
- 0.9834
- kurtosis
- 1.602
- n_outliers
- 20
- outlier_rate
- 0.4878
- zero_rate
- 0
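The `outliers` flag on packagings_n is worth reading with care: with q1 and q3 both equal to 2, the IQR is 0, so any fence built from the IQR collapses to the single value 2 and every other count is flagged. The sketch below illustrates this, assuming the profiler uses the common 1.5 × IQR (Tukey) rule, which the report does not state. The `sample` list is a hypothetical reconstruction that happens to match the reported summary (41 non-null values, quartiles of 2, 20 flagged); it is not the actual data.

```python
import statistics

def tukey_outliers(values, k=1.5):
    """Return (iqr, flagged) using Tukey fences q1 - k*iqr and q3 + k*iqr."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    flagged = [v for v in values if v < low or v > high]
    return iqr, flagged

# Hypothetical sample consistent with the reported stats, not the real column.
sample = [1] * 10 + [2] * 21 + [3] * 8 + [4, 5]

iqr, flagged = tukey_outliers(sample)
rate = len(flagged) / len(sample)
print(f"iqr={iqr}, flagged={len(flagged)}, rate={rate:.4f}")
```

When the IQR is 0, the flag says more about the rule than about the data; a count-based summary of the distinct values is probably a more useful view of a column like this.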
categories_properties
unknown skipped- n
- 50
- nulls
- 0 (0.0%)
- unique
- —
generic_name_en
categorical long_tail- n
- 50
- nulls
- 7 (14.0%)
- unique
- 8
- top_value
- top_rate
- 0.8372
- cardinality
- 8
- entropy
- 1.098
- entropy_ratio
- 0.366
food_groups
categorical- n
- 50
- nulls
- 1 (2.0%)
- unique
- 11
- top_value
- en:biscuits-and-cakes
- top_rate
- 0.3469
- cardinality
- 11
- entropy
- 2.549
- entropy_ratio
- 0.7367
ingredients_without_ciqual_codes_n
numeric- n
- 50
- nulls
- 0 (0.0%)
- unique
- 15
- min
- 0
- max
- 22
- mean
- 4.98
- median
- 3.5
- std
- 4.825
- q1
- 1
- q3
- 7.75
- iqr
- 6.75
- skew
- 1.208
- kurtosis
- 1.491
- n_outliers
- 1
- outlier_rate
- 0.02
- zero_rate
- 0.18
origin_sv
categorical other null_rate imbalance
This column is the Swedish-language origin field (the 'sv' suffix is the Swedish language code), and it is effectively empty: 92% of its 50 rows are null, and the only non-null value, present in 4 rows, is an empty string. With a cardinality of 1 and an entropy of 0, the column carries no usable signal. Treatment: Drop; the column contains no information (92% null, remaining values are empty strings).
- n
- 50
- nulls
- 46 (92.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
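The "drop" treatment above applies to any column whose values are all either null or an empty string. A minimal sketch of that check follows, using only the standard library. The function names and the three-row `sample` are hypothetical and purely illustrative; only the column names come from this report.

```python
def is_blank(value):
    """True for None or a string that is empty after stripping whitespace."""
    return value is None or (isinstance(value, str) and value.strip() == "")

def uninformative_columns(rows):
    """Return the sorted names of columns whose every value is blank."""
    columns = {key for row in rows for key in row}
    return sorted(
        col for col in columns
        if all(is_blank(row.get(col)) for row in rows)
    )

# Hypothetical rows; the values are made up for illustration.
sample = [
    {"product_name_fr": "Chocolat noir", "origin_sv": None, "origin_de": ""},
    {"product_name_fr": "Biscuits", "origin_sv": "", "origin_de": None},
    {"product_name_fr": None, "origin_sv": None, "origin_de": ""},
]

to_drop = uninformative_columns(sample)
cleaned = [{k: v for k, v in row.items() if k not in to_drop} for row in sample]
print(to_drop)
```

Treating the empty string as missing before computing null rates would also make the reported null rate (92% here) reflect the true share of unusable rows, which is 100%.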
product_name_ja
categorical long_tail null_rate imbalance- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
scores
unknown skipped- n
- 50
- nulls
- 0 (0.0%)
- unique
- —
data_quality_dimensions
unknown skipped- n
- 50
- nulls
- 0 (0.0%)
- unique
- —
product_name_fi
categorical long_tail null_rate- n
- 50
- nulls
- 45 (90.0%)
- unique
- 4
- top_value
- top_rate
- 0.4
- cardinality
- 4
- entropy
- 1.922
- entropy_ratio
- 0.961
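The entropy figures in these blocks are consistent with Shannon entropy in bits over the non-null values, with entropy_ratio being that entropy divided by log2 of the cardinality. That is an inference from the reported numbers, not something the report states. The sketch below reproduces the product_name_fi figures under that assumption, using a hypothetical value list with the same shape (5 non-null values, 4 distinct, the most frequent one appearing twice).

```python
import math
from collections import Counter

def entropy_stats(values):
    """Return (entropy_bits, entropy_ratio, top_rate) over non-null values."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        return 0.0, 0.0, 0.0
    counts = Counter(non_null)
    total = len(non_null)
    probs = [c / total for c in counts.values()]
    entropy = sum(p * math.log2(1 / p) for p in probs)
    cardinality = len(counts)
    # A single distinct value has zero entropy; avoid dividing by log2(1) = 0.
    ratio = entropy / math.log2(cardinality) if cardinality > 1 else 0.0
    return entropy, ratio, max(probs)

# Hypothetical values shaped like product_name_fi: 45 nulls, counts 2/1/1/1.
values = ["", "", "name_a", "name_b", "name_c"] + [None] * 45

entropy, ratio, top = entropy_stats(values)
print(round(entropy, 3), round(ratio, 3), top)
```

Under this reading, an entropy_ratio of 1 (as for generic_name_sv) means every non-null value is distinct and equally frequent, and a ratio of 0 means a single repeated value.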
origin_de
categorical label null_rate imbalance
This column appears to be a German-language origin/source label field ('origin_de'), but it contains no usable data: 30 of the 50 rows (60%) are null, and the remaining 20 all hold an empty string. Cardinality is 1, entropy is 0, and top_rate is 1.0, so the column is entirely uninformative in its current state. Treatment: Drop this column; it carries zero information (all non-null values are empty strings and 60% are null).
- n
- 50
- nulls
- 30 (60.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
packaging_lc
categorical- n
- 50
- nulls
- 6 (12.0%)
- unique
- 7
- top_value
- fr
- top_rate
- 0.3864
- cardinality
- 7
- entropy
- 1.992
- entropy_ratio
- 0.7094
categories_hierarchy
unknown skipped- n
- 50
- nulls
- 0 (0.0%)
- unique
- —
ingredients_ids_debug
unknown skipped- n
- 50
- nulls
- 0 (0.0%)
- unique
- —
traces_lc
categorical- n
- 50
- nulls
- 2 (4.0%)
- unique
- 6
- top_value
- fr
- top_rate
- 0.4792
- cardinality
- 6
- entropy
- 1.575
- entropy_ratio
- 0.6093
last_image_t
numeric high_skew- n
- 50
- nulls
- 0 (0.0%)
- unique
- 50
- min
- 1.639e+09
- max
- 1.768e+09
- mean
- 1.745e+09
- median
- 1.752e+09
- std
- 2.681e+07
- q1
- 1.735e+09
- q3
- 1.764e+09
- iqr
- 2.896e+07
- skew
- -2.443
- kurtosis
- 7.36
- n_outliers
- 2
- outlier_rate
- 0.04
- zero_rate
- 0
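The values of last_image_t look like Unix epoch timestamps in seconds, so the summary statistics are easier to interpret as dates. The sketch below converts the reported min, median, and max, assuming the epoch-seconds interpretation holds. Because the report rounds these figures to four significant digits, the resulting dates are only approximate.

```python
from datetime import datetime, timezone

def epoch_to_utc(seconds):
    """Convert Unix epoch seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

# Rounded figures as reported in the stats block above.
reported = {"min": 1.639e9, "median": 1.752e9, "max": 1.768e9}

as_dates = {
    name: epoch_to_utc(value).date().isoformat()
    for name, value in reported.items()
}
span_days = (reported["max"] - reported["min"]) / 86_400  # seconds per day

for name, date in as_dates.items():
    print(f"{name}: {date}")
print(f"span: about {span_days:.0f} days")
```

Read this way, the negative skew probably reflects most products having a recent image, with a few older ones forming the left tail, rather than a data-quality problem.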
ingredients_that_may_be_from_palm_oil_n
numeric high_skew outliers- n
- 50
- nulls
- 4 (8.0%)
- unique
- 3
- min
- 0
- max
- 2
- mean
- 0.1957
- median
- 0
- std
- 0.4531
- q1
- 0
- q3
- 0
- iqr
- 0
- skew
- 2.23
- kurtosis
- 4.321
- n_outliers
- 8
- outlier_rate
- 0.1739
- zero_rate
- 0.8261
max_imgid
categorical long_tail- n
- 50
- nulls
- 0 (0.0%)
- unique
- 38
- top_value
- 47
- top_rate
- 0.06
- cardinality
- 38
- entropy
- 5.149
- entropy_ratio
- 0.9811
generic_name_sv
categorical long_tail null_rate- n
- 50
- nulls
- 46 (92.0%)
- unique
- 4
- top_value
- Fin mörk choklad med 90% kakao
- top_rate
- 0.25
- cardinality
- 4
- entropy
- 2
- entropy_ratio
- 1
ingredients_text_with_allergens_nb
categorical null_rate imbalance- n
- 50
- nulls
- 48 (96.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
quantity
categorical long_tail- n
- 50
- nulls
- 1 (2.0%)
- unique
- 36
- top_value
- 100 g
- top_rate
- 0.1224
- cardinality
- 36
- entropy
- 4.956
- entropy_ratio
- 0.9587
countries_hierarchy
unknown skipped- n
- 50
- nulls
- 0 (0.0%)
- unique
- —
ingredients_n
numeric- n
- 50
- nulls
- 0 (0.0%)
- unique
- 22
- min
- 1
- max
- 39
- mean
- 11.7
- median
- 9
- std
- 8.244
- q1
- 5
- q3
- 16
- iqr
- 11
- skew
- 1.237
- kurtosis
- 1.435
- n_outliers
- 2
- outlier_rate
- 0.04
- zero_rate
- 0
grades
unknown skipped- n
- 50
- nulls
- 0 (0.0%)
- unique
- —
nutrition_score_beverage
numeric high_skew- n
- 50
- nulls
- 0 (0.0%)
- unique
- 2
- min
- 0
- max
- 1
- mean
- 0.02
- median
- 0
- std
- 0.1414
- q1
- 0
- q3
- 0
- iqr
- 0
- skew
- 6.857
- kurtosis
- 45.02
- n_outliers
- 1
- outlier_rate
- 0.02
- zero_rate
- 0.98
packaging_text_nl
categorical other null_rate imbalance
This column appears to hold Dutch-language packaging text for products, but is effectively empty: 76% of the 50 rows are null, and the sole non-null value is an empty string appearing 12 times, giving a cardinality of 1 and zero entropy. Every observed value is either missing or a blank string, meaning this column carries no usable information in this sample. Treatment: Drop this column; it contains no informative values in the current dataset.
- n
- 50
- nulls
- 38 (76.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
photographers
unknown skipped- n
- 50
- nulls
- 0 (0.0%)
- unique
- —
pnns_groups_1
categorical- n
- 50
- nulls
- 0 (0.0%)
- unique
- 7
- top_value
- Sugary snacks
- top_rate
- 0.76
- cardinality
- 7
- entropy
- 1.36
- entropy_ratio
- 0.4846
product_name_en
categorical long_tail- n
- 50
- nulls
- 7 (14.0%)
- unique
- 34
- top_value
- top_rate
- 0.2326
- cardinality
- 34
- entropy
- 4.654
- entropy_ratio
- 0.9147
traces_from_user
categorical long_tail- n
- 50
- nulls
- 0 (0.0%)
- unique
- 35
- top_value
- (en)
- top_rate
- 0.14
- cardinality
- 35
- entropy
- 4.811
- entropy_ratio
- 0.9379
generic_name_nl
categorical long_tail null_rate- n
- 50
- nulls
- 38 (76.0%)
- unique
- 4
- top_value
- top_rate
- 0.75
- cardinality
- 4
- entropy
- 1.208
- entropy_ratio
- 0.6038
nutrition_grade_fr
categorical- n
- 50
- nulls
- 0 (0.0%)
- unique
- 6
- top_value
- e
- top_rate
- 0.54
- cardinality
- 6
- entropy
- 1.913
- entropy_ratio
- 0.7399
image_front_thumb_url
categorical long_tail- n
- 50
- nulls
- 0 (0.0%)
- unique
- 50
- top_value
- https://images.openfoodfacts.org/images/products/611/124/210/0992/front_fr.172.100.jpg
- top_rate
- 0.02
- cardinality
- 50
- entropy
- 5.644
- entropy_ratio
- 1
last_editor
categorical long_tail- n
- 50
- nulls
- 1 (2.0%)
- unique
- 24
- top_value
- foodless
- top_rate
- 0.4286
- cardinality
- 24
- entropy
- 3.513
- entropy_ratio
- 0.7662
product_name_nb
categorical long_tail null_rate- n
- 50
- nulls
- 48 (96.0%)
- unique
- 2
- top_value
- top_rate
- 0.5
- cardinality
- 2
- entropy
- 1
- entropy_ratio
- 1
_keywords
unknown skipped- n
- 50
- nulls
- 0 (0.0%)
- unique
- —
images
unknown skipped- n
- 50
- nulls
- 0 (0.0%)
- unique
- —
packaging_text_sv
categorical other null_rate imbalance
This column appears to be a Swedish-language packaging text field (the 'sv' suffix indicates the Swedish locale), but it is effectively empty in this dataset. A 92% null rate leaves only 4 non-null rows, and all 4 contain an empty string, so there is no usable content in any of the 50 rows. The cardinality of 1 and entropy of 0.0 confirm the absence of any signal. Treatment: Drop; 100% of present values are empty strings and 92% are null, yielding no usable signal.
- n
- 50
- nulls
- 46 (92.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
ingredients_text_pl
categorical long_tail null_rate- n
- 50
- nulls
- 45 (90.0%)
- unique
- 3
- top_value
- top_rate
- 0.6
- cardinality
- 3
- entropy
- 1.371
- entropy_ratio
- 0.865
labels
categorical long_tail- n
- 50
- nulls
- 1 (2.0%)
- unique
- 42
- top_value
- top_rate
- 0.1633
- cardinality
- 42
- entropy
- 5.125
- entropy_ratio
- 0.9504
sources
unknown skipped- n
- 50
- nulls
- 0 (0.0%)
- unique
- —
product_quantity_unit
categorical imbalance- n
- 50
- nulls
- 5 (10.0%)
- unique
- 2
- top_value
- g
- top_rate
- 0.9778
- cardinality
- 2
- entropy
- 0.1537
- entropy_ratio
- 0.1537
last_modified_by
categorical long_tail- n
- 50
- nulls
- 1 (2.0%)
- unique
- 24
- top_value
- foodless
- top_rate
- 0.4286
- cardinality
- 24
- entropy
- 3.513
- entropy_ratio
- 0.7662
image_front_url
categorical long_tail- n
- 50
- nulls
- 0 (0.0%)
- unique
- 50
- top_value
- https://images.openfoodfacts.org/images/products/611/124/210/0992/front_fr.172.400.jpg
- top_rate
- 0.02
- cardinality
- 50
- entropy
- 5.644
- entropy_ratio
- 1
nutrition_data_prepared
categorical imbalance- n
- 50
- nulls
- 2 (4.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
packaging_text_fi
categorical metadata null_rate imbalance
This column appears to hold Finnish-language packaging text, but it is almost entirely empty: 90% of the 50 rows are null, and all 5 populated rows contain an empty string. With a cardinality of 1 and an entropy of 0, the column carries zero information and is effectively unpopulated. Treatment: Drop this column; it is 90% null with only empty strings in the remaining rows and provides no signal for modelling or analysis.
- n
- 50
- nulls
- 45 (90.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
interface_version_created
categorical- n
- 50
- nulls
- 1 (2.0%)
- unique
- 3
- top_value
- 20120622
- top_rate
- 0.5918
- cardinality
- 3
- entropy
- 1.167
- entropy_ratio
- 0.7363
nutrient_levels
unknown skipped- n
- 50
- nulls
- 0 (0.0%)
- unique
- —
product_name_de
categorical long_tail null_rate- n
- 50
- nulls
- 30 (60.0%)
- unique
- 16
- top_value
- top_rate
- 0.25
- cardinality
- 16
- entropy
- 3.741
- entropy_ratio
- 0.9354
nutrition_grades
categorical- n
- 50
- nulls
- 0 (0.0%)
- unique
- 6
- top_value
- e
- top_rate
- 0.54
- cardinality
- 6
- entropy
- 1.913
- entropy_ratio
- 0.7399
countries_beforescanbot
categorical long_tail- n
- 50
- nulls
- 7 (14.0%)
- unique
- 38
- top_value
- France
- top_rate
- 0.1395
- cardinality
- 38
- entropy
- 5.066
- entropy_ratio
- 0.9653
ingredients_text_with_allergens_es
categorical long_tail null_rate- n
- 50
- nulls
- 31 (62.0%)
- unique
- 13
- top_value
- top_rate
- 0.3684
- cardinality
- 13
- entropy
- 3.214
- entropy_ratio
- 0.8684
labels_lc
categorical- n
- 50
- nulls
- 1 (2.0%)
- unique
- 6
- top_value
- en
- top_rate
- 0.449
- cardinality
- 6
- entropy
- 1.57
- entropy_ratio
- 0.6072
nova_group_debug
categorical long_tail imbalance- n
- 50
- nulls
- 0 (0.0%)
- unique
- 3
- top_value
- top_rate
- 0.96
- cardinality
- 3
- entropy
- 0.2823
- entropy_ratio
- 0.1781
nutrition_score_warning_fruits_vegetables_legumes_estimate_from_ingredients_value
numeric high_skew outliers- n
- 50
- nulls
- 4 (8.0%)
- unique
- 6
- min
- 0
- max
- 50
- mean
- 1.652
- median
- 0
- std
- 7.551
- q1
- 0
- q3
- 0
- iqr
- 0
- skew
- 5.932
- kurtosis
- 35.23
- n_outliers
- 5
- outlier_rate
- 0.1087
- zero_rate
- 0.8913
lc
categorical- n
- 50
- nulls
- 0 (0.0%)
- unique
- 5
- top_value
- fr
- top_rate
- 0.7
- cardinality
- 5
- entropy
- 1.294
- entropy_ratio
- 0.5572
allergens_from_user
categorical long_tail- n
- 50
- nulls
- 0 (0.0%)
- unique
- 34
- top_value
- (fr)
- top_rate
- 0.16
- cardinality
- 34
- entropy
- 4.636
- entropy_ratio
- 0.9112
debug_param_sorted_langs
unknown skipped- n
- 50
- nulls
- 0 (0.0%)
- unique
- —
nutriscore_score_opposite
numeric- n
- 50
- nulls
- 1 (2.0%)
- unique
- 28
- min
- -40
- max
- 0
- mean
- -17.47
- median
- -19
- std
- 9.906
- q1
- -25
- q3
- -10
- iqr
- 15
- skew
- 0.1616
- kurtosis
- -0.5337
- n_outliers
- 0
- outlier_rate
- 0
- zero_rate
- 0.08163
image_small_url
categorical long_tail- n
- 50
- nulls
- 0 (0.0%)
- unique
- 50
- top_value
- https://images.openfoodfacts.org/images/products/611/124/210/0992/front_fr.172.200.jpg
- top_rate
- 0.02
- cardinality
- 50
- entropy
- 5.644
- entropy_ratio
- 1
unique_scans_n
numeric feature high_skew outliers
This column represents a count of unique scans (likely barcode or QR-code scan events) per record, with 50 observations and no nulls. The middle half of the values lies between 362.75 (Q1) and 560.75 (Q3), yet a right-skewed tail (skew=3.91, kurtosis=18.71) driven by 4 outliers pulls the mean (525.38) well above the median (432.0), with a maximum of 2257.0, about 4× the Q3 value. An outlier rate of 8% in just 50 rows is a strong signal that a small number of records see dramatically higher scan volumes than the rest. Treatment: Log-transform or apply robust scaling before modelling to reduce the influence of the 4 extreme outliers; investigate those records for data-quality issues.
- n
- 50
- nulls
- 0 (0.0%)
- unique
- 48
- min
- 319
- max
- 2,257
- mean
- 525.4
- median
- 432
- std
- 306.4
- q1
- 362.8
- q3
- 560.8
- iqr
- 198
- skew
- 3.911
- kurtosis
- 18.71
- n_outliers
- 4
- outlier_rate
- 0.08
- zero_rate
- 0
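The log-transform treatment suggested for unique_scans_n can be sketched from the reported summary statistics alone. The example below assumes the outlier flag comes from a 1.5 × IQR upper fence (not stated in the report) and uses `math.log1p` as one reasonable choice of transform; it shows how far the maximum sits from the median before and after the transform.

```python
import math

# Figures taken from the stats block above.
q1, q3 = 362.75, 560.75
median, maximum = 432, 2257

iqr = q3 - q1
upper_fence = q3 + 1.5 * iqr  # assumed Tukey fence, not confirmed by the report

def log_scale(x):
    """log(1 + x): compresses large counts and stays defined at zero."""
    return math.log1p(x)

ratio_raw = maximum / median
ratio_log = log_scale(maximum) / log_scale(median)

print(f"upper fence: {upper_fence}")
print(f"max/median raw: {ratio_raw:.2f}, after log1p: {ratio_log:.2f}")
```

On the raw scale the maximum is more than five times the median; after the transform the two are much closer, which is the effect the treatment is after. Robust scaling, that is, subtracting the median and dividing by the IQR, is an alternative that keeps the values on a linear scale.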
update_key
categorical long_tail- n
- 50
- nulls
- 0 (0.0%)
- unique
- 9
- top_value
- brands
- top_rate
- 0.56
- cardinality
- 9
- entropy
- 2.015
- entropy_ratio
- 0.6357
emb_codes_orig
categorical long_tail null_rate- n
- 50
- nulls
- 17 (34.0%)
- unique
- 5
- top_value
- top_rate
- 0.8485
- cardinality
- 5
- entropy
- 0.9048
- entropy_ratio
- 0.3897
ingredients_text_with_allergens_de
categorical long_tail null_rate- n
- 50
- nulls
- 33 (66.0%)
- unique
- 16
- top_value
- top_rate
- 0.1176
- cardinality
- 16
- entropy
- 3.97
- entropy_ratio
- 0.9925
ingredients_without_ecobalyse_ids_n
numeric- n
- 50
- nulls
- 0 (0.0%)
- unique
- 20
- min
- 0
- max
- 29
- mean
- 8.16
- median
- 6.5
- std
- 5.898
- q1
- 4
- q3
- 11
- iqr
- 7
- skew
- 1.28
- kurtosis
- 1.743
- n_outliers
- 1
- outlier_rate
- 0.02
- zero_rate
- 0.02
ingredients_text_with_allergens_en
categorical long_tail- n
- 50
- nulls
- 8 (16.0%)
- unique
- 36
- top_value
- top_rate
- 0.1667
- cardinality
- 36
- entropy
- 4.924
- entropy_ratio
- 0.9525
ingredients_text_with_allergens_sv
categorical long_tail null_rate- n
- 50
- nulls
- 46 (92.0%)
- unique
- 4
- top_value
- kakaomassa, kakaosmör, fettreducerat kakaopulver, socker, vanilj.
- top_rate
- 0.25
- cardinality
- 4
- entropy
- 2
- entropy_ratio
- 1
allergens_from_ingredients
categorical long_tail- n
- 50
- nulls
- 0 (0.0%)
- unique
- 35
- top_value
- top_rate
- 0.3
- cardinality
- 35
- entropy
- 4.432
- entropy_ratio
- 0.864
nova_groups
categorical- n
- 50
- nulls
- 2 (4.0%)
- unique
- 3
- top_value
- 4
- top_rate
- 0.6875
- cardinality
- 3
- entropy
- 1.006
- entropy_ratio
- 0.635
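The top_rate of 0.6875 here differs from the 66.0% share shown for NOVA group 4 in the chart table near the top of the report. The two figures agree once the denominator is taken into account: top_rate appears to be computed over non-null rows only, while the chart share uses all 50 rows. The helper below is a hypothetical illustration of that arithmetic, not the profiler's own code.

```python
def top_rate(top_count, n_rows, n_nulls, include_nulls=False):
    """Share of the most frequent value, over all rows or non-null rows only."""
    denominator = n_rows if include_nulls else n_rows - n_nulls
    return top_count / denominator

# nova_groups: value 4 appears 33 times; 50 rows, 2 of them null.
over_non_null = top_rate(33, 50, 2)
over_all_rows = top_rate(33, 50, 2, include_nulls=True)

print(over_non_null, over_all_rows)
```

The same convention would explain other columns with nulls, for example purchase_places, where a top_rate of 0.1837 corresponds to 9 of 49 non-null rows.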
product_quantity
categorical long_tail- n
- 50
- nulls
- 3 (6.0%)
- unique
- 27
- top_value
- 100
- top_rate
- 0.234
- cardinality
- 27
- entropy
- 4.287
- entropy_ratio
- 0.9017
ingredients_debug
unknown skipped- n
- 50
- nulls
- 0 (0.0%)
- unique
- —
generic_name
categorical long_tail- n
- 50
- nulls
- 2 (4.0%)
- unique
- 28
- top_value
- top_rate
- 0.4375
- cardinality
- 28
- entropy
- 3.663
- entropy_ratio
- 0.762
categories_lc
categorical- n
- 50
- nulls
- 0 (0.0%)
- unique
- 6
- top_value
- fr
- top_rate
- 0.5
- cardinality
- 6
- entropy
- 1.628
- entropy_ratio
- 0.6297
image_url
categorical long_tail- n
- 50
- nulls
- 0 (0.0%)
- unique
- 50
- top_value
- https://images.openfoodfacts.org/images/products/611/124/210/0992/front_fr.172.400.jpg
- top_rate
- 0.02
- cardinality
- 50
- entropy
- 5.644
- entropy_ratio
- 1
ingredients_sweeteners_n
numeric constant- n
- 50
- nulls
- 0 (0.0%)
- unique
- 1
- min
- 0
- max
- 0
- mean
- 0
- median
- 0
- std
- 0
- q1
- 0
- q3
- 0
- iqr
- 0
- skew
- 0
- kurtosis
- 0
- n_outliers
- 0
- outlier_rate
- 0
- zero_rate
- 1
ingredients_text_ja
categorical long_tail null_rate imbalance- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
origin_es
categorical other null_rate imbalance
This column appears to be a Spanish-language origin/source label field ('origin_es'), but it is entirely devoid of meaningful content: the sole observed value is an empty string, appearing 20 times across 50 rows. With a 60% null rate and the remaining 40% being empty strings, the column carries zero informational entropy and is effectively blank across the entire dataset. This is a strong signal that the field was never populated. Treatment: Drop this column; it contains no usable signal (cardinality 1, top value is empty string, 60% nulls).
- n
- 50
- nulls
- 30 (60.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
last_updated_t
numeric outliers- n
- 50
- nulls
- 0 (0.0%)
- unique
- 50
- min
- 1.739e+09
- max
- 1.769e+09
- mean
- 1.763e+09
- median
- 1.767e+09
- std
- 8.037e+06
- q1
- 1.762e+09
- q3
- 1.768e+09
- iqr
- 6.138e+06
- skew
- -1.945
- kurtosis
- 2.892
- n_outliers
- 6
- outlier_rate
- 0.12
- zero_rate
- 0
origin_fr
categorical long_tail- n
- 50
- nulls
- 4 (8.0%)
- unique
- 7
- top_value
- top_rate
- 0.8696
- cardinality
- 7
- entropy
- 0.8958
- entropy_ratio
- 0.3191
nutrition_score_warning_fruits_vegetables_nuts_estimate_from_ingredients_value
numeric high_skew outliers- n
- 50
- nulls
- 5 (10.0%)
- unique
- 13
- min
- 0
- max
- 100
- mean
- 4.532
- median
- 0
- std
- 15.52
- q1
- 0
- q3
- 2.326
- iqr
- 2.326
- skew
- 5.411
- kurtosis
- 30.37
- n_outliers
- 7
- outlier_rate
- 0.1556
- zero_rate
- 0.7111
ingredients_without_ecobalyse_ids
unknown skipped- n
- 50
- nulls
- 0 (0.0%)
- unique
- —
ingredients_text_with_allergens_it
categorical long_tail null_rate- n
- 50
- nulls
- 34 (68.0%)
- unique
- 12
- top_value
- top_rate
- 0.3125
- cardinality
- 12
- entropy
- 3.274
- entropy_ratio
- 0.9134
origin_pl
categorical metadata null_rate imbalance
This column is the Polish-language origin field (the 'pl' suffix is the Polish language code), but it is essentially empty: 90% of its 50 rows are null, and the only non-null value is an empty string appearing 5 times. With a cardinality of 1 and an entropy of 0.0, it carries zero information. The combination of a high null rate and a single blank value strongly suggests this field was never populated in this dataset slice. Treatment: Drop; zero variance and 90% nulls make this column useless for modelling or analysis.
- n
- 50
- nulls
- 45 (90.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
packaging_text_fr
categorical long_tail- n
- 50
- nulls
- 3 (6.0%)
- unique
- 14
- top_value
- top_rate
- 0.7234
- cardinality
- 14
- entropy
- 1.874
- entropy_ratio
- 0.4923
ingredients_text_sv
categorical long_tail null_rate- n
- 50
- nulls
- 46 (92.0%)
- unique
- 4
- top_value
- kakaomassa, kakaosmör, fettreducerat kakaopulver, socker, vanilj.
- top_rate
- 0.25
- cardinality
- 4
- entropy
- 2
- entropy_ratio
- 1
ingredients_with_unspecified_percent_n
numeric- n
- 50
- nulls
- 0 (0.0%)
- unique
- 18
- min
- 1
- max
- 33
- mean
- 8.8
- median
- 7
- std
- 6.061
- q1
- 5
- q3
- 11
- iqr
- 6
- skew
- 1.645
- kurtosis
- 3.545
- n_outliers
- 2
- outlier_rate
- 0.04
- zero_rate
- 0
product_name_fr
categorical long_tail- n
- 50
- nulls
- 1 (2.0%)
- unique
- 47
- top_value
- Henry’s
- top_rate
- 0.04082
- cardinality
- 47
- entropy
- 5.533
- entropy_ratio
- 0.9961
traces
categorical long_tail- n
- 50
- nulls
- 0 (0.0%)
- unique
- 23
- top_value
- top_rate
- 0.22
- cardinality
- 23
- entropy
- 3.922
- entropy_ratio
- 0.8671
known_ingredients_n
numeric- n
- 50
- nulls
- 0 (0.0%)
- unique
- 22
- min
- 0
- max
- 36
- mean
- 11.76
- median
- 9
- std
- 8.721
- q1
- 5
- q3
- 18.5
- iqr
- 13.5
- skew
- 0.8598
- kurtosis
- 0.07411
- n_outliers
- 0
- outlier_rate
- 0
- zero_rate
- 0.04
packaging_text_pl
categorical other null_rate imbalance
This column appears to be a Polish-language packaging text field that is almost entirely empty: 90% of its 50 rows are null, and the sole non-null value present in 5 rows is an empty string. With cardinality of 1 and entropy of 0, the column carries zero information. The combination of a 90% null rate and a top_value of '' means not a single meaningful entry exists in this sample. Treatment: Drop this column; it contains no usable information in the current sample.
- n
- 50
- nulls
- 45 (90.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
image_front_small_url
categorical long_tail- n
- 50
- nulls
- 0 (0.0%)
- unique
- 50
- top_value
- https://images.openfoodfacts.org/images/products/611/124/210/0992/front_fr.172.200.jpg
- top_rate
- 0.02
- cardinality
- 50
- entropy
- 5.644
- entropy_ratio
- 1
origin_en
categorical imbalance- n
- 50
- nulls
- 7 (14.0%)
- unique
- 2
- top_value
- top_rate
- 0.9767
- cardinality
- 2
- entropy
- 0.1594
- entropy_ratio
- 0.1594
interface_version_modified
categorical- n
- 50
- nulls
- 0 (0.0%)
- unique
- 2
- top_value
- 20150316.jqm2
- top_rate
- 0.84
- cardinality
- 2
- entropy
- 0.6343
- entropy_ratio
- 0.6343
serving_size
categorical long_tail- n
- 50
- nulls
- 6 (12.0%)
- unique
- 37
- top_value
- 100g
- top_rate
- 0.06818
- cardinality
- 37
- entropy
- 5.107
- entropy_ratio
- 0.9803
states
categorical long_tail- n
- 50
- nulls
- 0 (0.0%)
- unique
- 26
- top_value
- en:to-be-completed, en:nutrition-facts-completed, en:ingredients-completed, en:expiration-date-completed, en:packaging-code-to-be-completed, en:characteristics-to-be-completed, en:origins-to-be-completed, en:categories-completed, en:brands-completed, en:packaging-completed, en:quantity-completed, en:product-name-completed, en:photos-validated, en:packaging-photo-selected, en:nutrition-photo-selected, en:ingredients-photo-selected, en:front-photo-selected, en:photos-uploaded
- top_rate
- 0.16
- cardinality
- 26
- entropy
- 4.286
- entropy_ratio
- 0.9119
generic_name_fi
categorical long_tail null_rate- n
- 50
- nulls
- 45 (90.0%)
- unique
- 5
- top_value
- Hieno tumma suklaa jossa 90% kaakaota
- top_rate
- 0.2
- cardinality
- 5
- entropy
- 2.322
- entropy_ratio
- 1
schema_version
numeric constant- n
- 50
- nulls
- 0 (0.0%)
- unique
- 1
- min
- 996
- max
- 996
- mean
- 996
- median
- 996
- std
- 0
- q1
- 996
- q3
- 996
- iqr
- 0
- skew
- 0
- kurtosis
- 0
- n_outliers
- 0
- outlier_rate
- 0
- zero_rate
- 0
packaging_old_before_taxonomization
categorical long_tail null_rate- n
- 50
- nulls
- 12 (24.0%)
- unique
- 36
- top_value
- plastique
- top_rate
- 0.07895
- cardinality
- 36
- entropy
- 5.123
- entropy_ratio
- 0.9909
nova_groups_markers
unknown skipped- n
- 50
- nulls
- 0 (0.0%)
- unique
- —
product
unknown skipped- n
- 50
- nulls
- 0 (0.0%)
- unique
- —
emb_codes
categorical long_tail- n
- 50
- nulls
- 2 (4.0%)
- unique
- 11
- top_value
- top_rate
- 0.7292
- cardinality
- 11
- entropy
- 1.72
- entropy_ratio
- 0.4972
selected_images
unknown skipped- n
- 50
- nulls
- 0 (0.0%)
- unique
- —
nutriscore
unknown skipped- n
- 50
- nulls
- 0 (0.0%)
- unique
- —
traces_from_ingredients
categorical long_tail- n
- 50
- nulls
- 0 (0.0%)
- unique
- 12
- top_value
- top_rate
- 0.78
- cardinality
- 12
- entropy
- 1.521
- entropy_ratio
- 0.4243
nutrition_data_per
categorical- n
- 50
- nulls
- 0 (0.0%)
- unique
- 2
- top_value
- 100g
- top_rate
- 0.84
- cardinality
- 2
- entropy
- 0.6343
- entropy_ratio
- 0.6343
ecoscore_grade
categorical- n
- 50
- nulls
- 0 (0.0%)
- unique
- 9
- top_value
- e
- top_rate
- 0.24
- cardinality
- 9
- entropy
- 2.808
- entropy_ratio
- 0.8857
packaging_hierarchy
unknown skipped- n
- 50
- nulls
- 0 (0.0%)
- unique
- —
nova_group
numeric high_skew- n
- 50
- nulls
- 2 (4.0%)
- unique
- 3
- min
- 1
- max
- 4
- mean
- 3.646
- median
- 4
- std
- 0.601
- q1
- 3
- q3
- 4
- iqr
- 1
- skew
- -2.062
- kurtosis
- 5.651
- n_outliers
- 1
- outlier_rate
- 0.02083
- zero_rate
- 0
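The outlier_rate reported for nova_group (0.02083) is not 1/50; it matches 1/48, which suggests the rates in this profile are computed over non-null rows. The sketch below checks that reading against three columns of this report. The helper name `rate_over_non_null` is illustrative, and the choice of denominator is an inference from the numbers, not something the report states.

```python
def rate_over_non_null(count, n_rows, n_nulls):
    """Share of `count` among the rows that actually hold a value."""
    return count / (n_rows - n_nulls)

# (n_outliers, n, nulls) exactly as reported for three numeric columns
reported = {
    "nova_group": (1, 50, 2),
    "sortkey": (4, 50, 6),
    "fruits-vegetables-nuts_100g_estimate": (1, 50, 23),
}

rates = {name: round(rate_over_non_null(*args), 5) for name, args in reported.items()}
print(rates)
```

If the three values agree with the outlier_rate lines in the report, the same convention probably applies to top_rate for categorical columns, but that would need a separate check.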
emb_codes_20141016
categorical long_tail null_rate- n
- 50
- nulls
- 29 (58.0%)
- unique
- 7
- top_value
- top_rate
- 0.7143
- cardinality
- 7
- entropy
- 1.602
- entropy_ratio
- 0.5705
ingredients_without_ciqual_codes
unknown skipped- n
- 50
- nulls
- 0 (0.0%)
- unique
- —
category_properties
unknown skipped- n
- 50
- nulls
- 0 (0.0%)
- unique
- —
packagings
unknown skipped- n
- 50
- nulls
- 0 (0.0%)
- unique
- —
languages_codes
unknown skipped- n
- 50
- nulls
- 0 (0.0%)
- unique
- —
ingredients_text_with_allergens_fi
categorical long_tail null_rate- n
- 50
- nulls
- 45 (90.0%)
- unique
- 4
- top_value
- top_rate
- 0.4
- cardinality
- 4
- entropy
- 1.922
- entropy_ratio
- 0.961
complete
numeric- n
- 50
- nulls
- 0 (0.0%)
- unique
- 2
- min
- 0
- max
- 1
- mean
- 0.32
- median
- 0
- std
- 0.4712
- q1
- 0
- q3
- 1
- iqr
- 1
- skew
- 0.7717
- kurtosis
- -1.404
- n_outliers
- 0
- outlier_rate
- 0
- zero_rate
- 0.68
ingredients_text_with_allergens_pl
categorical long_tail null_rate- n
- 50
- nulls
- 46 (92.0%)
- unique
- 3
- top_value
- top_rate
- 0.5
- cardinality
- 3
- entropy
- 1.5
- entropy_ratio
- 0.9464
allergens_hierarchy
unknown skipped- n
- 50
- nulls
- 0 (0.0%)
- unique
- —
languages_hierarchy
unknown skipped- n
- 50
- nulls
- 0 (0.0%)
- unique
- —
ingredients_text_it
categorical long_tail null_rate- n
- 50
- nulls
- 34 (68.0%)
- unique
- 12
- top_value
- top_rate
- 0.3125
- cardinality
- 12
- entropy
- 3.274
- entropy_ratio
- 0.9134
informers
unknown skipped- n
- 50
- nulls
- 0 (0.0%)
- unique
- —
origin_nb
categorical null_rate imbalance- n
- 50
- nulls
- 48 (96.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
creator
categorical long_tail- n
- 50
- nulls
- 0 (0.0%)
- unique
- 13
- top_value
- openfoodfacts-contributors
- top_rate
- 0.46
- cardinality
- 13
- entropy
- 2.351
- entropy_ratio
- 0.6353
packaging_text_ja
categorical long_tail null_rate imbalance- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
sortkey
numeric high_skew outliers- n
- 50
- nulls
- 6 (12.0%)
- unique
- 44
- min
- 1.568e+09
- max
- 1.611e+09
- mean
- 1.605e+09
- median
- 1.608e+09
- std
- 8.692e+06
- q1
- 1.604e+09
- q3
- 1.61e+09
- iqr
- 6.16e+06
- skew
- -2.782
- kurtosis
- 8.091
- n_outliers
- 4
- outlier_rate
- 0.09091
- zero_rate
- 0
packagings_materials_main
categorical null_rate- n
- 50
- nulls
- 31 (62.0%)
- unique
- 3
- top_value
- en:paper-or-cardboard
- top_rate
- 0.6842
- cardinality
- 3
- entropy
- 1.105
- entropy_ratio
- 0.6972
ingredients_percent_analysis
numeric feature high_skew outliers
This column appears to be a binary flag or pass/fail indicator for ingredient percentage analysis, taking only two distinct values across all 50 rows: 1.0 (the vast majority) and -1.0 (a minority). Q1, median, and Q3 all equal 1.0, and for a column restricted to 1 and -1 the mean of 0.84 implies that 92% of records (46) are coded 1.0 and the remaining 8% (4) are -1.0; those 4 rows are the flagged outliers (8% outlier rate). The extreme skew (−3.10) and kurtosis (7.59) are entirely explained by this near-constant binary distribution, not by a continuous numeric spread. Treatment: Recode as a binary categorical (1 / -1 → 1 / 0) before modelling; verify whether -1.0 encodes 'fail' or 'missing' to avoid misinterpretation.
- n
- 50
- nulls
- 0 (0.0%)
- unique
- 2
- min
- -1
- max
- 1
- mean
- 0.84
- median
- 1
- std
- 0.5481
- q1
- 1
- q3
- 1
- iqr
- 0
- skew
- -3.096
- kurtosis
- 7.587
- n_outliers
- 4
- outlier_rate
- 0.08
- zero_rate
- 0
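For a column that only takes the values 1 and -1, the mean and the share of 1s are linked by mean = 2p - 1, so the reported mean of 0.84 gives p = 0.92. The sketch below verifies that arithmetic and applies the suggested recode in plain Python. The value list is reconstructed from the reported counts (50 rows, 4 of them -1), and mapping -1 to 0 is an assumption that holds only if -1 turns out to mean 'fail' rather than 'missing'.

```python
from statistics import mean

# Reconstructed from the reported summary: 50 rows, 4 of them coded -1.
values = [1.0] * 46 + [-1.0] * 4

# For a column restricted to {1, -1}: mean = 2p - 1, so p = (1 + mean) / 2.
share_pos = (1 + mean(values)) / 2

# Assumed mapping: -1 becomes 0. Revisit if -1 actually encodes "missing".
recoded = [1 if v == 1.0 else 0 for v in values]

print(round(share_pos, 2), sum(recoded), len(recoded) - sum(recoded))
```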
environment_impact_level
categorical other null_rate imbalance
This column is intended to capture an environmental impact level category, but it is effectively empty: 56% of the 50 rows are null and the remaining 44% (22 rows) contain only a blank string, yielding a single unique value and zero entropy. The column carries no usable information in its current state and is entirely uninformative for modelling or analysis. Treatment: Drop this column; all non-null values are blank strings and it contains zero informational signal.
- n
- 50
- nulls
- 28 (56.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
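Several columns in this profile share the same pattern: every entry is either null or an empty string. A minimal way to flag such columns before dropping them is sketched below in plain Python. The function name `is_effectively_empty` is illustrative, the first column is reconstructed from the reported counts (28 nulls, 22 blanks), and the second column is a hypothetical populated column included only for contrast.

```python
def is_effectively_empty(values):
    """True when every entry is None or a blank (whitespace-only) string."""
    return all(
        v is None or (isinstance(v, str) and v.strip() == "")
        for v in values
    )

columns = {
    # Reconstructed from the report: 28 nulls plus 22 empty strings.
    "environment_impact_level": [None] * 28 + [""] * 22,
    # Hypothetical populated column, for contrast only.
    "example_populated": ["fr"] * 35 + ["en"] * 15,
}

to_drop = [name for name, vals in columns.items() if is_effectively_empty(vals)]
print(to_drop)
```

Checking for blanks as well as nulls matters here: a null-rate filter alone would keep this column, since 44% of its rows are technically non-null.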
expiration_date
categorical long_tail- n
- 50
- nulls
- 2 (4.0%)
- unique
- 34
- top_value
- top_rate
- 0.3125
- cardinality
- 34
- entropy
- 4.364
- entropy_ratio
- 0.8578
ingredients_from_or_that_may_be_from_palm_oil_n
numeric- n
- 50
- nulls
- 3 (6.0%)
- unique
- 3
- min
- 0
- max
- 2
- mean
- 0.3404
- median
- 0
- std
- 0.5625
- q1
- 0
- q3
- 1
- iqr
- 1
- skew
- 1.393
- kurtosis
- 0.969
- n_outliers
- 0
- outlier_rate
- 0
- zero_rate
- 0.7021
nutriscore_score
numeric- n
- 50
- nulls
- 1 (2.0%)
- unique
- 28
- min
- 0
- max
- 40
- mean
- 17.47
- median
- 19
- std
- 9.906
- q1
- 10
- q3
- 25
- iqr
- 15
- skew
- -0.1616
- kurtosis
- -0.5337
- n_outliers
- 0
- outlier_rate
- 0
- zero_rate
- 0.08163
ingredients_text_with_allergens
categorical long_tail- n
- 50
- nulls
- 0 (0.0%)
- unique
- 50
- top_value
- milk cream, cream, sugar, banana, bacteria
- top_rate
- 0.02
- cardinality
- 50
- entropy
- 5.644
- entropy_ratio
- 1
ingredients_with_specified_percent_sum
numeric- n
- 50
- nulls
- 0 (0.0%)
- unique
- 22
- min
- 0
- max
- 99.6
- mean
- 22.74
- median
- 0
- std
- 32.88
- q1
- 0
- q3
- 52.25
- iqr
- 52.25
- skew
- 0.9979
- kurtosis
- -0.5856
- n_outliers
- 0
- outlier_rate
- 0
- zero_rate
- 0.58
nutriscore_version
categorical imbalance- n
- 50
- nulls
- 0 (0.0%)
- unique
- 1
- top_value
- 2023
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
lang
categorical- n
- 50
- nulls
- 0 (0.0%)
- unique
- 5
- top_value
- fr
- top_rate
- 0.7
- cardinality
- 5
- entropy
- 1.294
- entropy_ratio
- 0.5572
origins_hierarchy
unknown skipped- n
- 50
- nulls
- 0 (0.0%)
- unique
- —
origins_lc
categorical- n
- 50
- nulls
- 2 (4.0%)
- unique
- 6
- top_value
- fr
- top_rate
- 0.4792
- cardinality
- 6
- entropy
- 1.575
- entropy_ratio
- 0.6093
origin_it
categorical other null_rate imbalance
This column appears to be the Italian-language origin field (the `_it` suffix matches other language-specific columns such as ingredients_text_it), but it is effectively empty: 68% of its 50 rows are null, and the sole non-null value is an empty string appearing 16 times. With cardinality of 1 and entropy of 0, the column carries zero information. The combination of high nulls and a blank-string-only value suggests the field was never populated in this dataset slice. Treatment: Drop; the column has zero variance and is entirely unpopulated (null or empty string), so it contributes no signal to any downstream task.
- n
- 50
- nulls
- 34 (68.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
serving_quantity
categorical long_tail- n
- 50
- nulls
- 6 (12.0%)
- unique
- 27
- top_value
- 100
- top_rate
- 0.1591
- cardinality
- 27
- entropy
- 4.322
- entropy_ratio
- 0.9089
checkers
unknown skipped- n
- 50
- nulls
- 0 (0.0%)
- unique
- —
stores
categorical long_tail- n
- 50
- nulls
- 2 (4.0%)
- unique
- 31
- top_value
- top_rate
- 0.2917
- cardinality
- 31
- entropy
- 4.233
- entropy_ratio
- 0.8543
product_name_pl
categorical long_tail null_rate- n
- 50
- nulls
- 45 (90.0%)
- unique
- 3
- top_value
- top_rate
- 0.6
- cardinality
- 3
- entropy
- 1.371
- entropy_ratio
- 0.865
ecoscore_score
numeric- n
- 50
- nulls
- 7 (14.0%)
- unique
- 31
- min
- 13
- max
- 94
- mean
- 47.74
- median
- 44
- std
- 21.19
- q1
- 27.5
- q3
- 64
- iqr
- 36.5
- skew
- 0.3069
- kurtosis
- -0.7946
- n_outliers
- 0
- outlier_rate
- 0
- zero_rate
- 0
generic_name_it
categorical long_tail null_rate- n
- 50
- nulls
- 34 (68.0%)
- unique
- 5
- top_value
- top_rate
- 0.6875
- cardinality
- 5
- entropy
- 1.497
- entropy_ratio
- 0.6446
obsolete
categorical imbalance- n
- 50
- nulls
- 6 (12.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
compared_to_category
categorical long_tail- n
- 50
- nulls
- 0 (0.0%)
- unique
- 35
- top_value
- en:dark-chocolate-bar-with-more-than-70-cocoa
- top_rate
- 0.1
- cardinality
- 35
- entropy
- 4.886
- entropy_ratio
- 0.9526
generic_name_es
categorical long_tail null_rate- n
- 50
- nulls
- 30 (60.0%)
- unique
- 7
- top_value
- top_rate
- 0.65
- cardinality
- 7
- entropy
- 1.817
- entropy_ratio
- 0.6471
correctors
unknown skipped- n
- 50
- nulls
- 0 (0.0%)
- unique
- —
additives_n
numeric- n
- 50
- nulls
- 0 (0.0%)
- unique
- 8
- min
- 0
- max
- 8
- mean
- 1.52
- median
- 1
- std
- 1.821
- q1
- 0
- q3
- 2
- iqr
- 2
- skew
- 1.473
- kurtosis
- 2.105
- n_outliers
- 2
- outlier_rate
- 0.04
- zero_rate
- 0.4
ingredients_text_nb
categorical null_rate imbalance- n
- 50
- nulls
- 48 (96.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
ingredients_text_es
categorical long_tail null_rate- n
- 50
- nulls
- 30 (60.0%)
- unique
- 13
- top_value
- top_rate
- 0.4
- cardinality
- 13
- entropy
- 3.122
- entropy_ratio
- 0.8437
origin
categorical long_tail- n
- 50
- nulls
- 3 (6.0%)
- unique
- 6
- top_value
- top_rate
- 0.8936
- cardinality
- 6
- entropy
- 0.7359
- entropy_ratio
- 0.2847
origins_old
categorical long_tail null_rate- n
- 50
- nulls
- 11 (22.0%)
- unique
- 9
- top_value
- top_rate
- 0.7949
- cardinality
- 9
- entropy
- 1.347
- entropy_ratio
- 0.4251
packaging_text_de
categorical null_rate- n
- 50
- nulls
- 30 (60.0%)
- unique
- 2
- top_value
- top_rate
- 0.95
- cardinality
- 2
- entropy
- 0.2864
- entropy_ratio
- 0.2864
languages
unknown skipped- n
- 50
- nulls
- 0 (0.0%)
- unique
- —
categories_old
categorical long_tail- n
- 50
- nulls
- 1 (2.0%)
- unique
- 45
- top_value
- Snacks, Snacks sucrés, Biscuits et gâteaux, Biscuits, Biscuits secs
- top_rate
- 0.04082
- cardinality
- 45
- entropy
- 5.451
- entropy_ratio
- 0.9926
origin_fi
categorical other null_rate imbalance
This column is most likely the Finnish-language origin field (compare generic_name_fi and ingredients_text_fi), and it is almost entirely empty: 90% of the 50 rows are null, leaving only 5 non-null rows. Every one of those 5 values is an empty string, so the column contains no meaningful information: cardinality is 1, entropy is 0, and the sole 'value' is a blank. Treatment: Drop this column entirely; it carries no information and is effectively empty across all 50 rows.
- n
- 50
- nulls
- 45 (90.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
packaging_old
categorical long_tail- n
- 50
- nulls
- 7 (14.0%)
- unique
- 40
- top_value
- Plastique
- top_rate
- 0.06977
- cardinality
- 40
- entropy
- 5.269
- entropy_ratio
- 0.9901
ingredients_text_fi
categorical long_tail null_rate- n
- 50
- nulls
- 45 (90.0%)
- unique
- 4
- top_value
- top_rate
- 0.4
- cardinality
- 4
- entropy
- 1.922
- entropy_ratio
- 0.961
product_type
categorical imbalance- n
- 50
- nulls
- 0 (0.0%)
- unique
- 1
- top_value
- food
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
ingredients_hierarchy
unknown skipped- n
- 50
- nulls
- 0 (0.0%)
- unique
- —
no_nutrition_data
categorical imbalance- n
- 50
- nulls
- 2 (4.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
ingredients_analysis
unknown skipped- n
- 50
- nulls
- 0 (0.0%)
- unique
- —
packagings_materials
unknown skipped- n
- 50
- nulls
- 0 (0.0%)
- unique
- —
serving_quantity_unit
categorical imbalance- n
- 50
- nulls
- 4 (8.0%)
- unique
- 2
- top_value
- g
- top_rate
- 0.9783
- cardinality
- 2
- entropy
- 0.1511
- entropy_ratio
- 0.1511
product_name
categorical long_tail- n
- 50
- nulls
- 0 (0.0%)
- unique
- 49
- top_value
- Henry’s
- top_rate
- 0.04
- cardinality
- 49
- entropy
- 5.604
- entropy_ratio
- 0.9981
id
categorical long_tail- n
- 50
- nulls
- 0 (0.0%)
- unique
- 50
- top_value
- 6111242100992
- top_rate
- 0.02
- cardinality
- 50
- entropy
- 5.644
- entropy_ratio
- 1
ingredients_text_with_allergens_nl
categorical long_tail null_rate- n
- 50
- nulls
- 39 (78.0%)
- unique
- 9
- top_value
- top_rate
- 0.2727
- cardinality
- 9
- entropy
- 3.027
- entropy_ratio
- 0.955
categories
categorical long_tail- n
- 50
- nulls
- 0 (0.0%)
- unique
- 46
- top_value
- Snacks,Snacks sucrés,Cacao et dérivés,Chocolats,Chocolats noirs,Chocolat noir en tablette extra dégustation à 70% de cacao minimum
- top_rate
- 0.06
- cardinality
- 46
- entropy
- 5.469
- entropy_ratio
- 0.9901
origin_ja
categorical long_tail null_rate imbalance- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
nutrition_score_debug
categorical imbalance- n
- 50
- nulls
- 0 (0.0%)
- unique
- 2
- top_value
- top_rate
- 0.98
- cardinality
- 2
- entropy
- 0.1414
- entropy_ratio
- 0.1414
teams
categorical long_tail- n
- 50
- nulls
- 4 (8.0%)
- unique
- 39
- top_value
- pain-au-chocolat
- top_rate
- 0.1087
- cardinality
- 39
- entropy
- 5.124
- entropy_ratio
- 0.9695
unknown_ingredients_n
numeric high_skew outliers- n
- 50
- nulls
- 0 (0.0%)
- unique
- 6
- min
- 0
- max
- 13
- mean
- 0.66
- median
- 0
- std
- 2.255
- q1
- 0
- q3
- 0
- iqr
- 0
- skew
- 4.236
- kurtosis
- 18.32
- n_outliers
- 8
- outlier_rate
- 0.16
- zero_rate
- 0.84
url
categorical long_tail- n
- 50
- nulls
- 0 (0.0%)
- unique
- 50
- top_value
- https://world.openfoodfacts.org/product/6111242100992/perly
- top_rate
- 0.02
- cardinality
- 50
- entropy
- 5.644
- entropy_ratio
- 1
ecoscore_data
unknown skipped- n
- 50
- nulls
- 0 (0.0%)
- unique
- —
generic_name_pl
categorical null_rate- n
- 50
- nulls
- 45 (90.0%)
- unique
- 2
- top_value
- top_rate
- 0.8
- cardinality
- 2
- entropy
- 0.7219
- entropy_ratio
- 0.7219
nutrition_data
categorical imbalance- n
- 50
- nulls
- 1 (2.0%)
- unique
- 1
- top_value
- on
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
generic_name_ja
categorical long_tail null_rate imbalance- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
nutriments
unknown skipped- n
- 50
- nulls
- 0 (0.0%)
- unique
- —
brands
categorical long_tail- n
- 50
- nulls
- 0 (0.0%)
- unique
- 41
- top_value
- Lindt
- top_rate
- 0.08
- cardinality
- 41
- entropy
- 5.214
- entropy_ratio
- 0.9731
nutrition_data_prepared_per
categorical imbalance- n
- 50
- nulls
- 0 (0.0%)
- unique
- 1
- top_value
- 100g
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
packaging_text_es
categorical null_rate- n
- 50
- nulls
- 30 (60.0%)
- unique
- 2
- top_value
- top_rate
- 0.95
- cardinality
- 2
- entropy
- 0.2864
- entropy_ratio
- 0.2864
manufacturing_places
categorical long_tail- n
- 50
- nulls
- 1 (2.0%)
- unique
- 20
- top_value
- top_rate
- 0.4082
- cardinality
- 20
- entropy
- 3.187
- entropy_ratio
- 0.7374
generic_name_nb
categorical null_rate imbalance- n
- 50
- nulls
- 48 (96.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
last_modified_t
numeric outliers- n
- 50
- nulls
- 0 (0.0%)
- unique
- 50
- min
- 1.738e+09
- max
- 1.769e+09
- mean
- 1.763e+09
- median
- 1.767e+09
- std
- 8.093e+06
- q1
- 1.762e+09
- q3
- 1.768e+09
- iqr
- 6.138e+06
- skew
- -1.961
- kurtosis
- 2.972
- n_outliers
- 6
- outlier_rate
- 0.12
- zero_rate
- 0
_id
categorical long_tail- n
- 50
- nulls
- 0 (0.0%)
- unique
- 50
- top_value
- 6111242100992
- top_rate
- 0.02
- cardinality
- 50
- entropy
- 5.644
- entropy_ratio
- 1
countries
categorical long_tail- n
- 50
- nulls
- 0 (0.0%)
- unique
- 43
- top_value
- Maroc
- top_rate
- 0.1
- cardinality
- 43
- entropy
- 5.252
- entropy_ratio
- 0.9678
pnns_groups_2
categorical- n
- 50
- nulls
- 0 (0.0%)
- unique
- 11
- top_value
- Biscuits and cakes
- top_rate
- 0.34
- cardinality
- 11
- entropy
- 2.599
- entropy_ratio
- 0.7513
states_hierarchy
unknown skipped- n
- 50
- nulls
- 0 (0.0%)
- unique
- —
code
categorical long_tail- n
- 50
- nulls
- 0 (0.0%)
- unique
- 50
- top_value
- 6111242100992
- top_rate
- 0.02
- cardinality
- 50
- entropy
- 5.644
- entropy_ratio
- 1
countries_lc
categorical- n
- 50
- nulls
- 1 (2.0%)
- unique
- 6
- top_value
- en
- top_rate
- 0.5714
- cardinality
- 6
- entropy
- 1.521
- entropy_ratio
- 0.5883
generic_name_de
categorical long_tail null_rate- n
- 50
- nulls
- 30 (60.0%)
- unique
- 9
- top_value
- top_rate
- 0.6
- cardinality
- 9
- entropy
- 2.171
- entropy_ratio
- 0.6849
allergens
categorical- n
- 50
- nulls
- 0 (0.0%)
- unique
- 16
- top_value
- top_rate
- 0.32
- cardinality
- 16
- entropy
- 3.364
- entropy_ratio
- 0.8411
allergens_lc
categorical- n
- 50
- nulls
- 2 (4.0%)
- unique
- 6
- top_value
- en
- top_rate
- 0.4583
- cardinality
- 6
- entropy
- 1.578
- entropy_ratio
- 0.6104
ingredients_text_en
categorical long_tail- n
- 50
- nulls
- 6 (12.0%)
- unique
- 36
- top_value
- top_rate
- 0.2045
- cardinality
- 36
- entropy
- 4.811
- entropy_ratio
- 0.9306
product_name_nl
categorical long_tail null_rate- n
- 50
- nulls
- 38 (76.0%)
- unique
- 7
- top_value
- top_rate
- 0.5
- cardinality
- 7
- entropy
- 2.292
- entropy_ratio
- 0.8166
nutrition_score_warning_fruits_vegetables_legumes_estimate_from_ingredients
numeric constant- n
- 50
- nulls
- 4 (8.0%)
- unique
- 1
- min
- 1
- max
- 1
- mean
- 1
- median
- 1
- std
- 0
- q1
- 1
- q3
- 1
- iqr
- 0
- skew
- 0
- kurtosis
- 0
- n_outliers
- 0
- outlier_rate
- 0
- zero_rate
- 0
product_name_sv
categorical long_tail null_rate- n
- 50
- nulls
- 46 (92.0%)
- unique
- 4
- top_value
- 90% Cocoa
- top_rate
- 0.25
- cardinality
- 4
- entropy
- 2
- entropy_ratio
- 1
completeness
numeric outliers- n
- 50
- nulls
- 0 (0.0%)
- unique
- 14
- min
- 0.575
- max
- 1.1
- mean
- 0.91
- median
- 0.9
- std
- 0.1358
- q1
- 0.8875
- q3
- 1
- iqr
- 0.1125
- skew
- -0.6678
- kurtosis
- 0.32
- n_outliers
- 6
- outlier_rate
- 0.12
- zero_rate
- 0
ingredients_with_specified_percent_n
numeric- n
- 50
- nulls
- 0 (0.0%)
- unique
- 7
- min
- 0
- max
- 8
- mean
- 1.1
- median
- 0
- std
- 1.729
- q1
- 0
- q3
- 2
- iqr
- 2
- skew
- 1.878
- kurtosis
- 3.676
- n_outliers
- 1
- outlier_rate
- 0.02
- zero_rate
- 0.58
origin_nl
categorical other null_rate imbalance
This column ('origin_nl') is a categorical field, likely a Dutch-language origin label or description, but it is effectively empty: 76% of the 50 rows are null, and the sole non-null value is an empty string (''), appearing 12 times. With cardinality of 1, zero entropy, and a top_rate of 1.0 across only 12 non-null rows, the column carries no information. Treatment: Drop this column; it contains no usable signal (100% null or empty string across all 50 rows).
- n
- 50
- nulls
- 38 (76.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
fruits-vegetables-nuts_100g_estimate
numeric null_rate high_skew- n
- 50
- nulls
- 23 (46.0%)
- unique
- 2
- min
- 0
- max
- 85
- mean
- 3.148
- median
- 0
- std
- 16.36
- q1
- 0
- q3
- 0
- iqr
- 0
- skew
- 4.903
- kurtosis
- 22.04
- n_outliers
- 1
- outlier_rate
- 0.03704
- zero_rate
- 0.963
brands_old
categorical long_tail null_rate- n
- 50
- nulls
- 16 (32.0%)
- unique
- 29
- top_value
- Gerblé
- top_rate
- 0.08824
- cardinality
- 29
- entropy
- 4.749
- entropy_ratio
- 0.9776
generic_name_fr
categorical long_tail- n
- 50
- nulls
- 3 (6.0%)
- unique
- 34
- top_value
- top_rate
- 0.2979
- cardinality
- 34
- entropy
- 4.42
- entropy_ratio
- 0.8689
ingredients
unknown skipped- n
- 50
- nulls
- 0 (0.0%)
- unique
- —
ingredients_text_de
categorical long_tail null_rate- n
- 50
- nulls
- 30 (60.0%)
- unique
- 16
- top_value
- top_rate
- 0.25
- cardinality
- 16
- entropy
- 3.741
- entropy_ratio
- 0.9354
nutriscore_grade
categorical- n
- 50
- nulls
- 0 (0.0%)
- unique
- 6
- top_value
- e
- top_rate
- 0.54
- cardinality
- 6
- entropy
- 1.913
- entropy_ratio
- 0.7399
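The entropy and entropy_ratio figures can be reproduced from the value counts. The sketch below assumes base-2 Shannon entropy, with the ratio defined as entropy divided by log2 of the number of distinct values. The report does not state its formula, but this assumption matches the nutriscore_grade figures when applied to the counts in the summary table (e 27, d 9, c 7, a 4, b 2, unknown 1). The function names are illustrative.

```python
import math

def entropy_bits(counts):
    """Shannon entropy in bits of a distribution given as raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def entropy_ratio(counts):
    """Entropy divided by its maximum, log2(k), for k distinct values."""
    k = sum(1 for c in counts if c > 0)
    if k <= 1:
        return 0.0  # a constant column has no spread to normalise
    return entropy_bits(counts) / math.log2(k)

# Counts taken from the nutriscore_grade table in this report.
nutriscore_counts = {"e": 27, "d": 9, "c": 7, "a": 4, "b": 2, "unknown": 1}
counts = list(nutriscore_counts.values())

h = entropy_bits(counts)
r = entropy_ratio(counts)
print(round(h, 3), round(r, 4))
```

A ratio near 1 means the values are spread almost evenly; the value here reflects how much grade 'e' dominates the six categories.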
image_thumb_url
categorical long_tail- n
- 50
- nulls
- 0 (0.0%)
- unique
- 50
- top_value
- https://images.openfoodfacts.org/images/products/611/124/210/0992/front_fr.172.100.jpg
- top_rate
- 0.02
- cardinality
- 50
- entropy
- 5.644
- entropy_ratio
- 1
packaging_text_en
categorical long_tail- n
- 50
- nulls
- 7 (14.0%)
- unique
- 5
- top_value
- top_rate
- 0.907
- cardinality
- 5
- entropy
- 0.6325
- entropy_ratio
- 0.2724
packaging_text_it
categorical long_tail null_rate- n
- 50
- nulls
- 34 (68.0%)
- unique
- 3
- top_value
- top_rate
- 0.875
- cardinality
- 3
- entropy
- 0.6686
- entropy_ratio
- 0.4218
packaging_text
categorical long_tail- n
- 50
- nulls
- 2 (4.0%)
- unique
- 13
- top_value
- top_rate
- 0.75
- cardinality
- 13
- entropy
- 1.708
- entropy_ratio
- 0.4614
popularity_key
numeric identifier high_skew outliers
This column appears to be a synthetic or encoded identifier rather than a true popularity metric: values cluster tightly in the 23.9–24.0 billion range, with a median of ~23,999,500,422 and a max of ~23,999,992,269, suggesting a fixed-prefix integer key scheme. The strong negative skew (−2.67) and high kurtosis (5.11) are driven by 5 outlier values that fall far below the cluster, near the minimum of ~22,999,500,355, which is about 1 billion lower than the bulk of records. Despite the name 'popularity_key', the distribution is inconsistent with any organic popularity signal and is almost certainly a generated or composite key. Treatment: Treat as an opaque identifier and do not use it as a numeric feature; investigate the 5 outlier records (10% of data) for data integrity issues before joining or filtering.
- n
- 50
- nulls
- 0 (0.0%)
- unique
- 49
- min
- 2.3e+10
- max
- 2.4e+10
- mean
- 2.39e+10
- median
- 2.4e+10
- std
- 3.03e+08
- q1
- 2.4e+10
- q3
- 2.4e+10
- iqr
- 4.002e+05
- skew
- -2.667
- kurtosis
- 5.111
- n_outliers
- 5
- outlier_rate
- 0.1
- zero_rate
- 0
ingredients_text
categorical long_tail- n
- 50
- nulls
- 0 (0.0%)
- unique
- 50
- top_value
- milk cream, cream, sugar, banana, bacteria
- top_rate
- 0.02
- cardinality
- 50
- entropy
- 5.644
- entropy_ratio
- 1
ingredients_text_with_allergens_fr
categorical long_tail- n
- 50
- nulls
- 2 (4.0%)
- unique
- 47
- top_value
- top_rate
- 0.04167
- cardinality
- 47
- entropy
- 5.543
- entropy_ratio
- 0.998
ingredients_text_nl
categorical long_tail null_rate- n
- 50
- nulls
- 38 (76.0%)
- unique
- 9
- top_value
- top_rate
- 0.3333
- cardinality
- 9
- entropy
- 2.918
- entropy_ratio
- 0.9206
product_name_es
categorical long_tail null_rate- n
- 50
- nulls
- 30 (60.0%)
- unique
- 17
- top_value
- top_rate
- 0.2
- cardinality
- 17
- entropy
- 3.922
- entropy_ratio
- 0.9595
obsolete_since_date
categorical imbalance- n
- 50
- nulls
- 6 (12.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
ingredients_text_debug
categorical long_tail null_rate- n
- 50
- nulls
- 14 (28.0%)
- unique
- 35
- top_value
- top_rate
- 0.05556
- cardinality
- 35
- entropy
- 5.114
- entropy_ratio
- 0.9971
link
categorical long_tail- n
- 50
- nulls
- 2 (4.0%)
- unique
- 28
- top_value
- top_rate
- 0.4375
- cardinality
- 28
- entropy
- 3.663
- entropy_ratio
- 0.762
created_t
numeric- n
- 50
- nulls
- 0 (0.0%)
- unique
- 50
- min
- 1.338e+09
- max
- 1.724e+09
- mean
- 1.483e+09
- median
- 1.476e+09
- std
- 1.043e+08
- q1
- 1.386e+09
- q3
- 1.555e+09
- iqr
- 1.694e+08
- skew
- 0.3311
- kurtosis
- -0.8095
- n_outliers
- 0
- outlier_rate
- 0
- zero_rate
- 0
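The created_t values fall in the range expected of Unix epoch seconds. Assuming that encoding (the report does not state it), the reported minimum and maximum can be turned into readable dates as sketched below. Because the report rounds these values to four significant digits, the resulting dates are only approximate. The helper name `epoch_to_utc` is illustrative.

```python
from datetime import datetime, timezone

def epoch_to_utc(seconds):
    """Interpret a number as Unix epoch seconds and return a UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

# Rounded min and max of created_t as shown in the report.
created_min = epoch_to_utc(1.338e9)
created_max = epoch_to_utc(1.724e9)

print(created_min.date(), created_max.date())
```

Converting before analysis also makes the summary statistics interpretable: skew and outlier counts on raw epoch seconds describe how record dates are distributed, not a measured quantity.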
ingredients_text_fr
categorical long_tail- n
- 50
- nulls
- 2 (4.0%)
- unique
- 47
- top_value
- top_rate
- 0.04167
- cardinality
- 47
- entropy
- 5.543
- entropy_ratio
- 0.998
labels_hierarchy
unknown skipped- n
- 50
- nulls
- 0 (0.0%)
- unique
- —
ingredients_non_nutritive_sweeteners_n
numeric constant- n
- 50
- nulls
- 0 (0.0%)
- unique
- 1
- min
- 0
- max
- 0
- mean
- 0
- median
- 0
- std
- 0
- q1
- 0
- q3
- 0
- iqr
- 0
- skew
- 0
- kurtosis
- 0
- n_outliers
- 0
- outlier_rate
- 0
- zero_rate
- 1
packaging_text_nb
categorical null_rate imbalance- n
- 50
- nulls
- 48 (96.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
packagings_complete
numeric- n
- 50
- nulls
- 2 (4.0%)
- unique
- 2
- min
- 0
- max
- 1
- mean
- 0.5208
- median
- 1
- std
- 0.5049
- q1
- 0
- q3
- 1
- iqr
- 1
- skew
- -0.08341
- kurtosis
- -1.993
- n_outliers
- 0
- outlier_rate
- 0
- zero_rate
- 0.4792
data_sources
categorical long_tail- n
- 50
- nulls
- 0 (0.0%)
- unique
- 43
- top_value
- App - yuka, Apps, App - Open Food Facts, App - smoothie-openfoodfacts
- top_rate
- 0.08
- cardinality
- 43
- entropy
- 5.309
- entropy_ratio
- 0.9783
labels_old
categorical long_tail- n
- 50
- nulls
- 4 (8.0%)
- unique
- 38
- top_value
- top_rate
- 0.1957
- cardinality
- 38
- entropy
- 4.903
- entropy_ratio
- 0.9343
ingredients_from_palm_oil_n
numeric outliers- n
- 50
- nulls
- 4 (8.0%)
- unique
- 2
- min
- 0
- max
- 1
- mean
- 0.1522
- median
- 0
- std
- 0.3632
- q1
- 0
- q3
- 0
- iqr
- 0
- skew
- 1.937
- kurtosis
- 1.751
- n_outliers
- 7
- outlier_rate
- 0.1522
- zero_rate
- 0.8478
ingredients_text_with_allergens_ja
categorical long_tail null_rate imbalance- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
ingredients_lc
categorical- n
- 50
- nulls
- 0 (0.0%)
- unique
- 4
- top_value
- fr
- top_rate
- 0.7
- cardinality
- 4
- entropy
- 1.212
- entropy_ratio
- 0.6061
origins
categorical long_tail- n
- 50
- nulls
- 2 (4.0%)
- unique
- 20
- top_value
- top_rate
- 0.5
- cardinality
- 20
- entropy
- 3.027
- entropy_ratio
- 0.7003
nutriscore_data
unknown skipped- n
- 50
- nulls
- 0 (0.0%)
- unique
- —
scans_n
numeric feature high_skew outliers
This column likely represents the number of times each product has been scanned, with 50 records and no nulls. The bulk of values sit in a moderate range (Q1=387, median=492, Q3=604), but extreme positive skew (3.90) and very high kurtosis (18.72) are driven by 4 outliers (8% of rows) reaching up to 2523, about 5× the median. The min of 333 suggests a natural floor, possibly a minimum scan threshold or truncation artefact. Treatment: Investigate the 4 outliers before modelling; apply a log transform or robust scaling to reduce the impact of skew in regression or distance-based models.
- n
- 50
- nulls
- 0 (0.0%)
- unique
- 49
- min
- 333
- max
- 2,523
- mean
- 577.9
- median
- 492
- std
- 343.9
- q1
- 387
- q3
- 604
- iqr
- 217
- skew
- 3.899
- kurtosis
- 18.72
- n_outliers
- 4
- outlier_rate
- 0.08
- zero_rate
- 0
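The outlier count and the suggested log transform can be illustrated with the reported quartiles. The sketch below assumes the profiler uses Tukey's 1.5 × IQR rule, which the report does not state, although the resulting upper fence of 929.5 is consistent with the 4 products shown above 958.7 in the scans_n histogram. The sample list is hypothetical apart from 333, 492, 604, and 2523, which are the reported min, median, Q3, and max.

```python
import math

q1, q3 = 387, 604            # quartiles reported for scans_n
iqr = q3 - q1                # 217, matching the reported iqr
lower = q1 - 1.5 * iqr       # assumed Tukey fence
upper = q3 + 1.5 * iqr

# Hypothetical scan counts; 1100 is invented to show a mid-range outlier.
sample = [333, 492, 604, 1100, 2523]

outliers = [x for x in sample if x < lower or x > upper]

# log1p compresses the right tail while keeping the ordering of values.
logged = [math.log1p(x) for x in sample]

print(lower, upper, outliers)
```

After the transform the largest value sits much closer to the rest of the sample in relative terms, which is the property that helps distance-based models.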
generic_name_ar
categorical null_rate- n
- 50
- nulls
- 40 (80.0%)
- unique
- 2
- top_value
- top_rate
- 0.9
- cardinality
- 2
- entropy
- 0.469
- entropy_ratio
- 0.469
product_name_uk
categorical long_tail null_rate imbalance- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
last_checked_t
numeric null_rate- n
- 50
- nulls
- 43 (86.0%)
- unique
- 7
- min
- 1.541e+09
- max
- 1.73e+09
- mean
- 1.607e+09
- median
- 1.565e+09
- std
- 7.772e+07
- q1
- 1.556e+09
- q3
- 1.652e+09
- iqr
- 9.601e+07
- skew
- 0.8106
- kurtosis
- -1.103
- n_outliers
- 0
- outlier_rate
- 0
- zero_rate
- 0
ingredients_text_uk
categorical long_tail null_rate imbalance- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
carbon_footprint_from_known_ingredients_debug
categorical long_tail null_rate- n
- 50
- nulls
- 36 (72.0%)
- unique
- 14
- top_value
- en:cereal 50% x 0.3 = 15 g -
- top_rate
- 0.07143
- cardinality
- 14
- entropy
- 3.807
- entropy_ratio
- 1
packaging_text_ar
categorical metadata null_rate imbalance
This column appears to hold Arabic-language packaging text, but it is effectively empty: 80% of the 50 rows are null, and the remaining 10 non-null rows contain only an empty string, giving a single unique value with a top_rate of 1.0 and zero entropy. The column carries no information in this dataset snapshot. Treatment: Drop this column; it contains no usable signal (100% null or empty string across all rows).
- n
- 50
- nulls
- 40 (80.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
generic_name_uk
categorical long_tail null_rate imbalance- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
last_checker
categorical null_rate- n
- 50
- nulls
- 43 (86.0%)
- unique
- 4
- top_value
- aleene
- top_rate
- 0.4286
- cardinality
- 4
- entropy
- 1.842
- entropy_ratio
- 0.9212
checked
categorical feature null_rate imbalance
This column appears to be a binary checkbox field (HTML-style 'on'/'off'), but only the value 'on' is ever recorded: cardinality is 1, with 'on' appearing in all 7 non-null rows. The 86% null rate is the dominant signal. Nulls almost certainly represent the unchecked state rather than missing data, meaning the column encodes a boolean with an unconventional null-as-false convention. Zero entropy confirms the complete absence of variation among non-null values. Treatment: Recode nulls as 0 and 'on' as 1 to produce a proper boolean/integer column before modelling.
- n
- 50
- nulls
- 43 (86.0%)
- unique
- 1
- top_value
- on
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
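The recode suggested for this column is a one-line mapping. The sketch below applies it in plain Python to a list reconstructed from the reported counts (7 rows with 'on', 43 nulls). Treating null as 'unchecked' is the working assumption from the column note and should be confirmed against the data source before it is relied on.

```python
# Reconstructed from the report: 7 rows hold 'on', 43 rows are null.
raw = ["on"] * 7 + [None] * 43

# Assumed convention: null means "not checked", so it maps to 0.
checked_flag = [1 if v == "on" else 0 for v in raw]

checked_share = sum(checked_flag) / len(checked_flag)
print(sum(checked_flag), checked_share)
```

After recoding, the column has no nulls and its mean is the share of checked products, which is easier to interpret than an 86% null rate.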
product_name_ar
categorical long_tail null_rate- n
- 50
- nulls
- 39 (78.0%)
- unique
- 6
- top_value
- top_rate
- 0.5455
- cardinality
- 6
- entropy
- 2.049
- entropy_ratio
- 0.7928
ingredients_text_with_allergens_uk
categorical long_tail null_rate imbalance- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
origin_uk
categorical long_tail null_rate imbalance- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
packaging_text_uk
categorical long_tail null_rate imbalance- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
ingredients_text_ar
categorical null_rate- n
- 50
- nulls
- 39 (78.0%)
- unique
- 2
- top_value
- top_rate
- 0.9091
- cardinality
- 2
- entropy
- 0.4395
- entropy_ratio
- 0.4395
ingredients_text_with_allergens_ar
categorical null_rate- n
- 50
- nulls
- 41 (82.0%)
- unique
- 2
- top_value
- top_rate
- 0.8889
- cardinality
- 2
- entropy
- 0.5033
- entropy_ratio
- 0.5033
carbon_footprint_percent_of_known_ingredients
numeric null_rate- n
- 50
- nulls
- 31 (62.0%)
- unique
- 19
- min
- 8
- max
- 105
- mean
- 61.79
- median
- 70
- std
- 28.98
- q1
- 45.5
- q3
- 78.3
- iqr
- 32.8
- skew
- -0.4493
- kurtosis
- -0.8083
- n_outliers
- 0
- outlier_rate
- 0
- zero_rate
- 0
origin_ar
categorical other null_rate imbalance
This column appears to be an Arabic-language origin field ('origin_ar') that is almost entirely empty. With an 80% null rate and cardinality of 1, the sole 'unique' value is itself an empty string appearing 10 times across 50 rows, meaning the column contains no actual data at all. This is a fully degenerate column with zero informational content. Treatment: Drop; the column carries no information (100% null or empty string, entropy 0.0).
- n
- 50
- nulls
- 40 (80.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
nutrition_score_warning_no_fiber
numeric null_rate constant- n
- 50
- nulls
- 35 (70.0%)
- unique
- 1
- min
- 1
- max
- 1
- mean
- 1
- median
- 1
- std
- 0
- q1
- 1
- q3
- 1
- iqr
- 0
- skew
- 0
- kurtosis
- 0
- n_outliers
- 0
- outlier_rate
- 0
- zero_rate
- 0
nutriments_estimated
unknown skipped- n
- 50
- nulls
- 0 (0.0%)
- unique
- —
completed_t
numeric null_rate- n
- 50
- nulls
- 34 (68.0%)
- unique
- 16
- min
- 1.628e+09
- max
- 1.763e+09
- mean
- 1.7e+09
- median
- 1.703e+09
- std
- 4.07e+07
- q1
- 1.663e+09
- q3
- 1.74e+09
- iqr
- 7.618e+07
- skew
- 0.001247
- kurtosis
- -1.155
- n_outliers
- 0
- outlier_rate
- 0
- zero_rate
- 0
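The magnitudes reported for `completed_t` (about 1.6e9 to 1.8e9) look like Unix timestamps in seconds, in which case mean and standard deviation are hard to read directly. A minimal sketch, assuming seconds since the epoch in UTC, converts the reported minimum and maximum to dates; `epoch_to_utc` is a hypothetical helper name.

```python
from datetime import datetime, timezone

def epoch_to_utc(seconds):
    """Interpret a number as Unix seconds (an assumption) and return a UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

# Rounded min and max as printed in the report, so the dates are approximate.
earliest = epoch_to_utc(1.628e9)
latest = epoch_to_utc(1.763e9)
print(earliest.date(), latest.date())
```

Because the report rounds the values to four significant figures, the converted dates are only accurate to within a few days.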
ingredients_text_with_allergens_sl
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- Kakavova masa, manjmasten kakavov prah, kakavovo maslo, sladkor, emulgator: lecitini (sojin lecitin); ekstrakt vanilije. Lahko vsebuje sledi oreškov (lešniki, mandlji, pistacija) in mleka. Uporabno najmanj do: glej odtis na zadnji strani embalaže.
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
packaging_text_sk
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
ingredients_text_with_allergens_bg
categorical long_tail null_rate
- n
- 50
- nulls
- 47 (94.0%)
- unique
- 3
- top_value
- Какаова маса, нискомаслено какао на прах, какаово масло, захар, емулгатор: лецитин (соеви), екстракт от ванилия, Може да съдържа следи от ядки и мляко,
- top_rate
- 0.3333
- cardinality
- 3
- entropy
- 1.585
- entropy_ratio
- 1
ingredients_text_pt
categorical long_tail null_rate
- n
- 50
- nulls
- 40 (80.0%)
- unique
- 4
- top_value
- top_rate
- 0.7
- cardinality
- 4
- entropy
- 1.357
- entropy_ratio
- 0.6784
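The `entropy` and `entropy_ratio` figures in these blocks are consistent with Shannon entropy in bits over the non-null values, divided by log2 of the cardinality. That reading is inferred from the numbers, not stated by the report. A minimal sketch reproduces the values above from the reported counts (10 non-null rows, 4 distinct values, top value in 7 rows); the placeholder values and the `entropy_stats` helper are hypothetical.

```python
import math
from collections import Counter

def entropy_stats(values):
    """Return (entropy in bits, entropy / log2(cardinality)) over non-null values."""
    counts = Counter(v for v in values if v is not None)
    total = sum(counts.values())
    if total == 0:
        return 0.0, 0.0
    probs = [c / total for c in counts.values()]
    entropy = sum(p * math.log2(1 / p) for p in probs)
    cardinality = len(counts)
    ratio = entropy / math.log2(cardinality) if cardinality > 1 else 0.0
    return entropy, ratio

# Placeholder values shaped like ingredients_text_pt: 40 nulls, one value 7 times, 3 singletons.
sample = [None] * 40 + ["A"] * 7 + ["B", "C", "D"]
entropy, ratio = entropy_stats(sample)
print(round(entropy, 3), round(ratio, 4))  # 1.357 0.6784
```

Under this definition a ratio of 1 means the non-null values are spread evenly, and a ratio of 0 means a single value, which matches the cardinality-1 columns in this section.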
ingredients_text_dz
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
generic_name_ca
categorical null_rate imbalance
- n
- 50
- nulls
- 48 (96.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
generic_name_bg
categorical label null_rate imbalance
This column is the Bulgarian-language generic name field (the `_bg` localization of a food product's generic name), but it is empty in practice: 47 of the 50 rows (94%) are null and the remaining 3 rows contain only an empty string. With cardinality of 1 and entropy of 0, the column carries no information. Treatment: Drop this column; it is 94% null and the only observed value is an empty string, making it analytically useless.
- n
- 50
- nulls
- 47 (94.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
origin_sl
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
product_name_et
categorical long_tail null_rate
- n
- 50
- nulls
- 47 (94.0%)
- unique
- 3
- top_value
- Chocolat noir - 85% cacao
- top_rate
- 0.3333
- cardinality
- 3
- entropy
- 1.585
- entropy_ratio
- 1
origin_et
categorical metadata null_rate imbalance
This column is the Estonian-language origin field (the `_et` suffix is the language code for Estonian), but it is effectively empty: 47 of the 50 rows (94%) are null, and the 3 non-null rows all contain the same empty string. With cardinality of 1 and entropy of 0.0, the column carries no information and is most likely an unfilled localization field. Treatment: Drop this column; it contains no usable signal (94% null, sole value is an empty string).
- n
- 50
- nulls
- 47 (94.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
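The two-letter suffixes on localized columns such as `origin_et` appear to be ISO 639-1 language codes, under which `et` is Estonian and `pt` is Portuguese. A minimal sketch of decoding the suffix follows; the `LANGUAGE_NAMES` table and the `locale_of` helper are hypothetical names introduced here, and the table covers only suffixes that appear in this report.

```python
# ISO 639-1 codes for the language suffixes seen in this report (illustrative subset).
LANGUAGE_NAMES = {
    "ar": "Arabic", "bg": "Bulgarian", "ca": "Catalan", "cs": "Czech",
    "da": "Danish", "de": "German", "dz": "Dzongkha", "et": "Estonian",
    "fr": "French", "hu": "Hungarian", "ja": "Japanese", "nl": "Dutch",
    "pt": "Portuguese", "ro": "Romanian", "ru": "Russian", "sk": "Slovak",
    "sl": "Slovenian", "uk": "Ukrainian", "zh": "Chinese",
}

def locale_of(column_name):
    """Return the language name for a trailing two-letter suffix, or None."""
    _, _, suffix = column_name.rpartition("_")
    return LANGUAGE_NAMES.get(suffix)

for name in ["origin_et", "origin_pt", "packaging_text_cs", "customer_service"]:
    print(name, "->", locale_of(name))
```

The helper only inspects the last underscore-separated token, so names such as `product_name_fr_imported` return `None` and would need the `_imported` suffix stripped first.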
ingredients_text_with_allergens_sk
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
ingredients_text_with_allergens_et
categorical long_tail null_rate
- n
- 50
- nulls
- 47 (94.0%)
- unique
- 3
- top_value
- kakavova masa, manjmasten kakavov prah, kakavovo maslo, sladkor, emulgator: lecitini (sojin lecitin); ekstrakt vanilije.
- top_rate
- 0.3333
- cardinality
- 3
- entropy
- 1.585
- entropy_ratio
- 1
nutrition_score_warning_nutriments_estimated
numeric null_rate constant
- n
- 50
- nulls
- 48 (96.0%)
- unique
- 1
- min
- 1
- max
- 1
- mean
- 1
- median
- 1
- std
- 0
- q1
- 1
- q3
- 1
- iqr
- 0
- skew
- 0
- kurtosis
- 0
- n_outliers
- 0
- outlier_rate
- 0
- zero_rate
- 0
ingredients_text_sk
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
generic_name_pt
categorical long_tail null_rate
- n
- 50
- nulls
- 40 (80.0%)
- unique
- 3
- top_value
- top_rate
- 0.8
- cardinality
- 3
- entropy
- 0.9219
- entropy_ratio
- 0.5817
ingredients_text_bg
categorical long_tail null_rate
- n
- 50
- nulls
- 47 (94.0%)
- unique
- 3
- top_value
- Какаова маса, нискомаслено какао на прах, какаово масло, захар, емулгатор: лецитин (соеви), екстракт от ванилия, Може да съдържа следи от ядки и мляко,
- top_rate
- 0.3333
- cardinality
- 3
- entropy
- 1.585
- entropy_ratio
- 1
packaging_text_et
categorical free_text null_rate imbalance
This column is the Estonian-language packaging text field (`_et` locale suffix), but it is effectively empty: 47 of its 50 rows (94%) are null, and all 3 populated rows contain the same empty string. With cardinality of 1 and entropy of 0.0, the column carries no information; it has not been populated for any product in this sample. Treatment: Drop; a 94% null rate and only empty-string values provide no usable signal.
- n
- 50
- nulls
- 47 (94.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
product_name_sk
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
ingredients_text_ca
categorical null_rate imbalance
- n
- 50
- nulls
- 48 (96.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
ingredients_text_with_allergens_ca
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
product_name_dz
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
product_name_sl
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- ARRIBA 85% cacao
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
origin_sk
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
generic_name_et
categorical label null_rate imbalance
This column is the Estonian-language generic name field (`_et` locale suffix), but it is effectively empty: 47 of its 50 rows (94%) are null, and the 3 non-null rows all contain the same blank string, giving a cardinality of 1. The column carries no information: entropy is 0.0 and top_rate is 1.0 for a single empty value. Treatment: Drop this column; it contains no usable data (94% null, remaining values are blank strings).
- n
- 50
- nulls
- 47 (94.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
ingredients_text_et
categorical long_tail null_rate
- n
- 50
- nulls
- 47 (94.0%)
- unique
- 3
- top_value
- kakavova masa, manjmasten kakavov prah, kakavovo maslo, sladkor, emulgator: lecitini (_sojin_ lecitin); ekstrakt vanilije.
- top_rate
- 0.3333
- cardinality
- 3
- entropy
- 1.585
- entropy_ratio
- 1
packaging_text_ca
categorical null_rate imbalance
- n
- 50
- nulls
- 48 (96.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
packaging_text_sl
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
generic_name_dz
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
origin_ca
categorical null_rate imbalance
- n
- 50
- nulls
- 48 (96.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
product_name_ca
categorical null_rate imbalance
- n
- 50
- nulls
- 48 (96.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
packaging_text_pt
categorical free_text null_rate imbalance
This column is the Portuguese-language packaging text field, presumably intended to carry packaging descriptions. 40 of the 50 rows (80%) are null, and the 10 non-null rows all contain the same empty string, so the effective data-present rate is 0%. Treatment: Drop this column; it carries no information and all present values are empty strings.
- n
- 50
- nulls
- 40 (80.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
origin_bg
categorical other null_rate imbalance
This column (`origin_bg`) is the Bulgarian-language origin field. 47 of its 50 rows (94%) are null and the 3 non-null rows all contain the same empty string, so it holds no usable information. Cardinality is 1, entropy is 0, and top_rate is 1.0, confirming that the non-null entries are identical. Both alerts (null_rate and imbalance) are triggered, as expected given the near-total absence of data. Treatment: Drop this column; it carries no information, with 94% nulls and only empty strings remaining.
- n
- 50
- nulls
- 47 (94.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
packaging_text_bg
categorical free_text null_rate imbalance
This column is the Bulgarian-language packaging text field, but it is almost entirely empty: 47 of the 50 rows (94%) are null, and the 3 non-null rows all contain the same empty string (top_rate 1.0). With cardinality of 1 and entropy of 0.0, the column carries no information in its current state. Treatment: Drop from modelling as a zero-variance column; re-evaluate only if Bulgarian-language data is backfilled.
- n
- 50
- nulls
- 47 (94.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
origin_pt
categorical other null_rate imbalance
This column is the Portuguese-language origin field (the `_pt` suffix is a language code, in line with `origin_bg`, `origin_et`, and the other localized origin columns). It is empty in practice: 40 of its 50 rows (80%) are null, and the 10 non-null rows all contain the same empty string. With a cardinality of 1 and entropy of 0.0, it is completely invariant, which suggests the field was never populated for the products in this sample. Treatment: Drop; the column carries no information, with 80% nulls and a single empty-string value across all remaining rows.
- n
- 50
- nulls
- 40 (80.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
ingredients_text_with_allergens_pt
categorical long_tail null_rate
- n
- 50
- nulls
- 42 (84.0%)
- unique
- 4
- top_value
- top_rate
- 0.625
- cardinality
- 4
- entropy
- 1.549
- entropy_ratio
- 0.7744
product_name_bg
categorical long_tail null_rate
- n
- 50
- nulls
- 47 (94.0%)
- unique
- 3
- top_value
- Шоколад 85% какаова маса
- top_rate
- 0.3333
- cardinality
- 3
- entropy
- 1.585
- entropy_ratio
- 1
ingredients_text_sl
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- Kakavova masa, manjmasten kakavov prah, kakavovo maslo, sladkor, emulgator: lecitini (sojin lecitin); ekstrakt vanilije. Lahko vsebuje sledi oreškov (lešniki, mandlji, pistacija) in mleka. Uporabno najmanj do: glej odtis na zadnji strani embalaže.
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
generic_name_sl
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
generic_name_sk
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
product_name_pt
categorical long_tail null_rate
- n
- 50
- nulls
- 40 (80.0%)
- unique
- 7
- top_value
- top_rate
- 0.4
- cardinality
- 7
- entropy
- 2.522
- entropy_ratio
- 0.8983
lc_imported
categorical null_rate
- n
- 50
- nulls
- 42 (84.0%)
- unique
- 2
- top_value
- fr
- top_rate
- 0.875
- cardinality
- 2
- entropy
- 0.5436
- entropy_ratio
- 0.5436
abbreviated_product_name_fr_imported
categorical long_tail null_rate
- n
- 50
- nulls
- 43 (86.0%)
- unique
- 7
- top_value
- CRISTALINE Eau De Source 0.5L
- top_rate
- 0.1429
- cardinality
- 7
- entropy
- 2.807
- entropy_ratio
- 1
generic_name_zh
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
obsolete_imported
categorical other null_rate imbalance
This column appears to hold the imported value of a product's `obsolete` flag. The `_imported` suffix, which it shares with many neighbouring columns, suggests the value comes from a producer import; it does not indicate that the column itself is deprecated. It contains only the value '0' across all 7 non-null rows. With an 86% null rate and a cardinality of 1, the column carries no information: entropy is exactly 0.0 and the single observed value covers 100% of non-null records. Both the high null rate and the complete value imbalance are flagged as alerts. Treatment: Drop; zero variance and 86% nulls make this column uninformative for downstream use.
- n
- 50
- nulls
- 43 (86.0%)
- unique
- 1
- top_value
- 0
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
generic_name_fr_imported
categorical long_tail null_rate
- n
- 50
- nulls
- 43 (86.0%)
- unique
- 7
- top_value
- Eau De Source
- top_rate
- 0.1429
- cardinality
- 7
- entropy
- 2.807
- entropy_ratio
- 1
owner_imported
categorical long_tail null_rate
- n
- 50
- nulls
- 44 (88.0%)
- unique
- 5
- top_value
- org-barilla-france-sa
- top_rate
- 0.3333
- cardinality
- 5
- entropy
- 2.252
- entropy_ratio
- 0.9697
customer_service
categorical long_tail null_rate
- n
- 50
- nulls
- 43 (86.0%)
- unique
- 6
- top_value
- Service Consommateurs, : www.wasa.com/fr-fr/contact (depuis la France), www.wasa.com/fr-be/contact (depuis la Belgique)
- top_rate
- 0.2857
- cardinality
- 6
- entropy
- 2.522
- entropy_ratio
- 0.9755
countries_imported
categorical null_rate
- n
- 50
- nulls
- 42 (84.0%)
- unique
- 2
- top_value
- France
- top_rate
- 0.875
- cardinality
- 2
- entropy
- 0.5436
- entropy_ratio
- 0.5436
data_sources_imported
categorical long_tail null_rate
- n
- 50
- nulls
- 42 (84.0%)
- unique
- 8
- top_value
- Producers, Producer - gie-sources-alma, Database - Equadis, Database - GDSN, Databases, Producers, Producer - gie-sources-alma
- top_rate
- 0.125
- cardinality
- 8
- entropy
- 3
- entropy_ratio
- 1
product_name_zh
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
categories_imported
categorical long_tail null_rate
- n
- 50
- nulls
- 44 (88.0%)
- unique
- 5
- top_value
- Snacks, Snacks salés, Amuse-gueules, Chips et frites, Chips
- top_rate
- 0.3333
- cardinality
- 5
- entropy
- 2.252
- entropy_ratio
- 0.9697
quantity_imported
categorical long_tail null_rate
- n
- 50
- nulls
- 43 (86.0%)
- unique
- 7
- top_value
- 500 ml
- top_rate
- 0.1429
- cardinality
- 7
- entropy
- 2.807
- entropy_ratio
- 1
ingredients_text_zh
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
emb_code
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- EMB 44068 A
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
origins_fr
categorical long_tail null_rate
- n
- 50
- nulls
- 48 (96.0%)
- unique
- 2
- top_value
- Chambon-la-Forêt,France,Cairanne,Provence-Alpes-Côte d'Azur,Vaucluse,Italie,Source Sainte Cécile,Source Ofélia,Source Éléonore,Source Emma,Source Éléna
- top_rate
- 0.5
- cardinality
- 2
- entropy
- 1
- entropy_ratio
- 1
nutrition_data_prepared_per_imported
categorical metadata null_rate imbalance
Judging by its name, this column captures the unit basis for imported nutrition data on the prepared product (for example, per 100 g). It is effectively a constant: the only observed value is '100g' across all 7 non-null rows. With an 86% null rate and cardinality of 1, it carries no discriminative information. The combination of high missingness and zero entropy suggests the field was either sparsely populated at ingestion or serves as a fixed schema placeholder. Treatment: Drop before modelling; the column is a zero-variance constant with 86% nulls and provides no analytical value.
- n
- 50
- nulls
- 43 (86.0%)
- unique
- 1
- top_value
- 100g
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
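The drop recommendations in this section rest on two distinct situations: columns whose non-null values are all empty strings, and columns that repeat one real value such as '100g'. A minimal sketch of a triage rule that separates the two follows; the `triage` helper, its labels, and the stand-in lists are hypothetical and only mirror the counts reported above.

```python
def triage(values):
    """Classify a column as 'empty', 'constant', or 'informative' (hypothetical labels)."""
    present = [v for v in values if v is not None and str(v).strip() != ""]
    if not present:
        return "empty"
    if len(set(present)) == 1:
        return "constant"
    return "informative"

# Stand-ins built from reported counts, not from the actual rows.
prepared_per = [None] * 43 + ["100g"] * 7    # nutrition_data_prepared_per_imported
packaging_text = [None] * 47 + [""] * 3      # e.g. packaging_text_bg
print(triage(prepared_per))    # constant
print(triage(packaging_text))  # empty
```

Keeping the two labels apart can matter on the full database, where a column that is constant in a 50-row sample may still vary.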
sources_fields
unknown skipped
- n
- 50
- nulls
- 0 (0.0%)
- unique
- —
customer_service_fr
categorical long_tail null_rate
- n
- 50
- nulls
- 43 (86.0%)
- unique
- 6
- top_value
- Service Consommateurs, : www.wasa.com/fr-fr/contact (depuis la France), www.wasa.com/fr-be/contact (depuis la Belgique)
- top_rate
- 0.2857
- cardinality
- 6
- entropy
- 2.522
- entropy_ratio
- 0.9755
nutrition_data_per_imported
categorical metadata null_rate imbalance
This column represents the unit basis for imported nutrition data, and every non-null value is '100g', giving it a cardinality of 1 and an entropy of 0.0. With an 84% null rate across 50 rows, only 8 rows carry a value at all. The combination of high nullity and zero variance means this column provides no discriminating information. Treatment: Drop; a column that is 84% null with a single constant value ('100g') offers no predictive or analytical signal.
- n
- 50
- nulls
- 42 (84.0%)
- unique
- 1
- top_value
- 100g
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
owner
categorical long_tail null_rate
- n
- 50
- nulls
- 43 (86.0%)
- unique
- 6
- top_value
- org-barilla-france-sa
- top_rate
- 0.2857
- cardinality
- 6
- entropy
- 2.522
- entropy_ratio
- 0.9755
abbreviated_product_name
categorical long_tail null_rate
- n
- 50
- nulls
- 43 (86.0%)
- unique
- 7
- top_value
- CRISTALINE Eau De Source 0.5L
- top_rate
- 0.1429
- cardinality
- 7
- entropy
- 2.807
- entropy_ratio
- 1
conservation_conditions_fr
categorical long_tail null_rate
- n
- 50
- nulls
- 43 (86.0%)
- unique
- 7
- top_value
- A conserver de préférence à l'abri du soleil, dans un endroit propre, frais et sans odeur.
- top_rate
- 0.1429
- cardinality
- 7
- entropy
- 2.807
- entropy_ratio
- 1
brands_imported
categorical long_tail null_rate
- n
- 50
- nulls
- 43 (86.0%)
- unique
- 6
- top_value
- Wasa
- top_rate
- 0.2857
- cardinality
- 6
- entropy
- 2.522
- entropy_ratio
- 0.9755
owner_fields
unknown skipped
- n
- 50
- nulls
- 0 (0.0%)
- unique
- —
conservation_conditions_fr_imported
categorical long_tail null_rate
- n
- 50
- nulls
- 43 (86.0%)
- unique
- 7
- top_value
- A conserver de préférence à l'abri du soleil, dans un endroit propre, frais et sans odeur.
- top_rate
- 0.1429
- cardinality
- 7
- entropy
- 2.807
- entropy_ratio
- 1
origin_fr_imported
categorical long_tail null_rate
- n
- 50
- nulls
- 48 (96.0%)
- unique
- 2
- top_value
- France
- top_rate
- 0.5
- cardinality
- 2
- entropy
- 1
- entropy_ratio
- 1
customer_service_fr_imported
categorical long_tail null_rate
- n
- 50
- nulls
- 43 (86.0%)
- unique
- 6
- top_value
- Service Consommateurs, : www.wasa.com/fr-fr/contact (depuis la France), www.wasa.com/fr-be/contact (depuis la Belgique)
- top_rate
- 0.2857
- cardinality
- 6
- entropy
- 2.522
- entropy_ratio
- 0.9755
product_name_fr_imported
categorical long_tail null_rate
- n
- 50
- nulls
- 43 (86.0%)
- unique
- 7
- top_value
- CRISTALINE Eau De Source 0.5L
- top_rate
- 0.1429
- cardinality
- 7
- entropy
- 2.807
- entropy_ratio
- 1
lang_imported
categorical metadata null_rate imbalance
This column records the imported language of a record, and across the 50-row sample every non-null value is 'fr' (French): a single unique value with zero entropy. With an 86% null rate, only 7 of 50 rows carry any value at all, so the column is nearly empty and constant wherever it is populated. Both the high null rate and the perfect imbalance are flagged as alerts, suggesting this field is partially populated metadata from an import pipeline and not a reliable feature. Treatment: Drop or impute cautiously; 86% nulls and zero variance make this column uninformative for modelling, and the import pipeline should be investigated to find out why values are absent.
- n
- 50
- nulls
- 43 (86.0%)
- unique
- 1
- top_value
- fr
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
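Many `_imported` columns in this section report the same 43 nulls, but the report does not say whether the populated rows are the same 7 products. A minimal sketch of one way to check this when investigating the import pipeline follows: group columns by their pattern of missing rows. The `shared_null_patterns` helper and the toy values are hypothetical.

```python
from collections import defaultdict

def shared_null_patterns(columns):
    """Group column names whose None positions are identical across rows."""
    groups = defaultdict(list)
    for name, values in columns.items():
        mask = tuple(v is None for v in values)
        groups[mask].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]

# Toy four-row table; the values are illustrative, not taken from the dataset rows.
columns = {
    "lang_imported": [None, "fr", None, "fr"],
    "brands_imported": [None, "Wasa", None, "Wasa"],
    "emb_code": ["EMB 44068 A", None, None, None],
}
print(shared_null_patterns(columns))  # [['brands_imported', 'lang_imported']]
```

If the real `_imported` columns fall into one group, their nulls would simply mark products that never went through a producer import, which would argue against imputing them.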
abbreviated_product_name_fr
categorical long_tail null_rate
- n
- 50
- nulls
- 43 (86.0%)
- unique
- 7
- top_value
- CRISTALINE Eau De Source 0.5L
- top_rate
- 0.1429
- cardinality
- 7
- entropy
- 2.807
- entropy_ratio
- 1
ingredients_text_fr_imported
categorical long_tail null_rate
- n
- 50
- nulls
- 43 (86.0%)
- unique
- 7
- top_value
- Eau de Source
- top_rate
- 0.1429
- cardinality
- 7
- entropy
- 2.807
- entropy_ratio
- 1
conservation_conditions
categorical long_tail null_rate
- n
- 50
- nulls
- 43 (86.0%)
- unique
- 7
- top_value
- A conserver de préférence à l'abri du soleil, dans un endroit propre, frais et sans odeur.
- top_rate
- 0.1429
- cardinality
- 7
- entropy
- 2.807
- entropy_ratio
- 1
nova_group_error
categorical null_rate imbalance
- n
- 50
- nulls
- 48 (96.0%)
- unique
- 1
- top_value
- too_many_unknown_ingredients
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
producer_version_id_imported
categorical long_tail null_rate
- n
- 50
- nulls
- 46 (92.0%)
- unique
- 3
- top_value
- 1
- top_rate
- 0.5
- cardinality
- 3
- entropy
- 1.5
- entropy_ratio
- 0.9464
ingredients_text_de_ocr_1648990410
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- Kekse mit Nuss - Nugat- Creme - Füllung: Nuss-Nugat-Creme 40% (Zucker, Palmöl, HASELNÜSSE Magermilchpulver, fettarmer Kakao, Emulgator Lecithine (S0JA), Vanillin, Weizenmehl, pflanzliche Fette ( Palm, Palmkern), Rohrzucker, Milchzucker, Weizenkleie, VOLLMILCHPULVER, GERSTENMALZ-und Maisextraktpulver, Honig. Backtriebmittel: Dinatriumdiphosphat, Natriumhydrogencarbonat, Ammoniumhydrogencarbonat; fettarmer Kakao, Salz, Weizenstärke, Gerstenmalzmehl, Emulgator Lecithine (Soja), Vanillin
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
product_name_ro
categorical long_tail null_rate
- n
- 50
- nulls
- 48 (96.0%)
- unique
- 2
- top_value
- top_rate
- 0.5
- cardinality
- 2
- entropy
- 1
- entropy_ratio
- 1
packaging_imported
categorical null_rate
- n
- 50
- nulls
- 46 (92.0%)
- unique
- 2
- top_value
- Enveloppe
- top_rate
- 0.75
- cardinality
- 2
- entropy
- 0.8113
- entropy_ratio
- 0.8113
ingredients_text_de_ocr_1648990410_result
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- Kekse mit Nuss - Nugat - Creme - Füllung: Nuss-Nugat-Creme 40% (Zucker, Palmöl, HASELNÜSSE Magermilchpulver, fettarmer Kakao, Emulgator Lecithine (S0JA), Vanillin, Weizenmehl, pflanzliche Fette ( Palm, Palmkern), Rohrzucker, Milchzucker, Weizenkleie, VOLLMILCHPULVER, GERSTENMALZ-und Maisextraktpulver, Honig. Backtriebmittel: Dinatriumdiphosphat, Natriumhydrogencarbonat, Ammoniumhydrogencarbonat; fettarmer Kakao, Salz, Weizenstärke, Gerstenmalzmehl, Emulgator Lecithine (Soja), Vanillin
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
ingredients_text_ro
categorical null_rate imbalance
- n
- 50
- nulls
- 48 (96.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
producer_version_id
categorical long_tail null_rate
- n
- 50
- nulls
- 46 (92.0%)
- unique
- 3
- top_value
- 1
- top_rate
- 0.5
- cardinality
- 3
- entropy
- 1.5
- entropy_ratio
- 0.9464
labels_imported
categorical long_tail null_rate
- n
- 50
- nulls
- 45 (90.0%)
- unique
- 3
- top_value
- Végétarien
- top_rate
- 0.6
- cardinality
- 3
- entropy
- 1.371
- entropy_ratio
- 0.865
allergens_imported
categorical long_tail null_rate
- n
- 50
- nulls
- 45 (90.0%)
- unique
- 4
- top_value
- Gluten
- top_rate
- 0.4
- cardinality
- 4
- entropy
- 1.922
- entropy_ratio
- 0.961
origin_ro
categorical null_rate imbalance
- n
- 50
- nulls
- 48 (96.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
no_nutrition_data_imported
categorical feature null_rate imbalance
This column appears to be the imported value of a boolean flag indicating whether nutrition data is absent for a record. It has a 92% null rate across 50 rows, and the 4 non-null values are all 'false', giving it zero entropy and a cardinality of 1. The high null rate combined with complete uniformity among non-nulls means this column carries no predictive signal in this sample. Treatment: Drop; zero variance and 92% nulls make this column useless for modelling or analysis.
- n
- 50
- nulls
- 46 (92.0%)
- unique
- 1
- top_value
- false
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
serving_size_imported
categorical long_tail null_rate
- n
- 50
- nulls
- 44 (88.0%)
- unique
- 6
- top_value
- 13.8 g (1)
- top_rate
- 0.1667
- cardinality
- 6
- entropy
- 2.585
- entropy_ratio
- 1
generic_name_ro
categorical null_rate imbalance
- n
- 50
- nulls
- 48 (96.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
ingredients_text_de_ocr_1648897071
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- Nuss-Nougat-Creme 40% (Zucker, Palmöl, _Haselnüsse_ 13%, _Magermilchpulver_ 8,7%, fettarmer Kakao 7,4%, Emulgator Lecithine (_Soja_), Vanillin), _Weizenmehl_ 32,5%, pflanzliche Fette (Palm, Palmkern), Rohrzucker 8,5% (enthält _Weizen_), _Milchzucker_, _Weizenkleie_, _Vollmilchpulver_, _Gerstenmalz_- und Maisextraktpulver, Honig, Backtriebmittel: Dinatriumdiphosphat, Natriumhydrogencarbonat, Ammoniumhydrogencarbonat; fettarmer Kakao, Salz, _Weizenstärke_, _Gerstenmalzmehl_, Emulgator Lecithine (_Soja_), Vanillin
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
ingredients_text_de_ocr_1648897071_result
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- Nuss-Nougat-Creme 40% (Zucker, Palmöl, _Haselnüsse_ 13%, _Magermilchpulver_ 8,7%, fettarmer Kakao 7,4%, Emulgator Lecithine (_Soja_), Vanillin), _Weizenmehl_ 32,5%, pflanzliche Fette (Palm, Palmkern), Rohrzucker 8,5% (enthält _Weizen_), _Milchzucker_, _Weizenkleie_, _Vollmilchpulver_, _Gerstenmalz_ - und Maisextraktpulver, Honig, Backtriebmittel: Dinatriumdiphosphat, Natriumhydrogencarbonat, Ammoniumhydrogencarbonat; fettarmer Kakao, Salz, _Weizenstärke_, _Gerstenmalzmehl_, Emulgator Lecithine (_Soja_), Vanillin
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
packaging_text_ro
categorical null_rate imbalance
- n
- 50
- nulls
- 48 (96.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
abbreviated_product_name_imported
categorical long_tail null_rate
- n
- 50
- nulls
- 47 (94.0%)
- unique
- 3
- top_value
- Authentique 275g, fr
- top_rate
- 0.3333
- cardinality
- 3
- entropy
- 1.585
- entropy_ratio
- 1
traces_imported
categorical long_tail null_rate
- n
- 50
- nulls
- 46 (92.0%)
- unique
- 4
- top_value
- Lupin, Lait, Moutarde, Graines de sésame, Soja
- top_rate
- 0.25
- cardinality
- 4
- entropy
- 2
- entropy_ratio
- 1
specific_ingredients
unknown skipped
- n
- 50
- nulls
- 0 (0.0%)
- unique
- —
packaging_text_ru
categorical metadata null_rate imbalance
This column is the Russian-language packaging text field, but it is almost entirely empty: 47 of the 50 rows (94%) are null, and the 3 non-null rows all contain the same empty string, giving a cardinality of 1 and zero entropy. In practice the column carries no information across the observed sample. Treatment: Drop this column; it is effectively unpopulated (94% null, remaining values are empty strings) and provides no signal for modelling or analysis.
- n
- 50
- nulls
- 47 (94.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
origin_ru
categorical other null_rate imbalance
This column is the Russian-language origin field, and it is almost entirely unpopulated: 47 of the 50 rows (94%) are null, and the 3 non-null rows all contain the same empty string. With cardinality of 1, zero entropy, and a top_rate of 1.0, the column carries no information. It was likely intended to capture Russian-locale origin metadata but was never filled in for these products. Treatment: Drop this column; it contains no usable signal (94% null, remaining values are empty strings).
- n
- 50
- nulls
- 47 (94.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
ingredients_text_with_allergens_ru
categorical metadata null_rate imbalance
This column is intended to store Russian-language ingredients text with allergen information for food products. It is effectively empty: 47 of the 50 rows (94%) are null, and the only non-null value present is an empty string (''), giving a cardinality of 1 and entropy of 0. There is no usable signal in this column for the sampled data. Treatment: Drop this column; it carries no information (94% null, remaining values are empty strings).
- n
- 50
- nulls
- 47 (94.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
product_name_ru
categorical null_rate
- n
- 50
- nulls
- 47 (94.0%)
- unique
- 2
- top_value
- top_rate
- 0.6667
- cardinality
- 2
- entropy
- 0.9183
- entropy_ratio
- 0.9183
generic_name_ru
categorical null_rate
- n
- 50
- nulls
- 47 (94.0%)
- unique
- 2
- top_value
- top_rate
- 0.6667
- cardinality
- 2
- entropy
- 0.9183
- entropy_ratio
- 0.9183
ingredients_text_ru
categorical other null_rate imbalance
This column is the Russian-language ingredients text field, a localized variant of the main ingredients column. It is 94% null across 50 rows, and the only non-null value observed is an empty string (appearing 3 times), so there is no usable content in this column. Cardinality of 1 and entropy of 0.0 confirm the absence of any signal. Treatment: Drop; a column that is 94% null with only empty-string values provides no modelling or analytical value.
- n
- 50
- nulls
- 47 (94.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
packaging_text_da
categorical null_rate imbalance
- n
- 50
- nulls
- 48 (96.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
generic_name_da
categorical long_tail null_rate
- n
- 50
- nulls
- 48 (96.0%)
- unique
- 2
- top_value
- Kiks
- top_rate
- 0.5
- cardinality
- 2
- entropy
- 1
- entropy_ratio
- 1
forest_footprint_data
unknown skipped
- n
- 50
- nulls
- 0 (0.0%)
- unique
- —
product_name_da
categorical long_tail null_rate
- n
- 50
- nulls
- 48 (96.0%)
- unique
- 2
- top_value
- Original
- top_rate
- 0.5
- cardinality
- 2
- entropy
- 1
- entropy_ratio
- 1
ingredients_text_with_allergens_da
categorical long_tail null_rate
- n
- 50
- nulls
- 48 (96.0%)
- unique
- 2
- top_value
- VETEMJÖL/HVEDEMEL, palmolja/-olie, glukossirap, maltextrakt från KORN/BYG, bakpulver/hævemidler (ammoniumkarbonater, natriumkarbonater), salt, ÄGG/ÆG/EGG, arom, mjölbehandlingsmedel/melbehandlingsmiddel (NATRIUMDISULFIT).
- top_rate
- 0.5
- cardinality
- 2
- entropy
- 1
- entropy_ratio
- 1
origin_da
categorical null_rate imbalance
- n
- 50
- nulls
- 48 (96.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
ingredients_text_da
categorical long_tail null_rate
- n
- 50
- nulls
- 48 (96.0%)
- unique
- 2
- top_value
- _VETEMJÖL_/_HVEDEMEL_, palmolja/-olie, glukossirap, maltextrakt från _KORN_/_BYG_, bakpulver/hævemidler (ammoniumkarbonater, natriumkarbonater), salt, _ÄGG_/_ÆG_/_EGG_, arom, mjölbehandlingsmedel/melbehandlingsmiddel (_NATRIUMDISULFIT_).
- top_rate
- 0.5
- cardinality
- 2
- entropy
- 1
- entropy_ratio
- 1
ingredients_text_cs
categorical null_rate
- n
- 50
- nulls
- 47 (94.0%)
- unique
- 2
- top_value
- top_rate
- 0.6667
- cardinality
- 2
- entropy
- 0.9183
- entropy_ratio
- 0.9183
ingredients_text_nl_ocr_1675675383_result
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- Cacaomassa, suiker, cacaoboter, natuurlijk Bourbon vanille - stokje.
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
product_name_cs
categorical null_rate
- n
- 50
- nulls
- 47 (94.0%)
- unique
- 2
- top_value
- top_rate
- 0.6667
- cardinality
- 2
- entropy
- 0.9183
- entropy_ratio
- 0.9183
ingredients_text_hu_ocr_1571428260_result
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- kakaómassza, cukor, kakaó - vaj, természetes bourbon vanília. Nyomokban egyéb dióféléket, tejet, szóját, szezámmagot es búzát tartalmazhat.
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
packaging_text_cs
categorical metadata null_rate imbalance
This column holds Czech-language packaging text (`_cs` locale suffix), but it is almost entirely empty: 94% of its 50 rows are null, and the only observed non-null value is an empty string appearing 3 times. With a cardinality of 1 and entropy of 0, the column carries no information for this dataset slice. Treatment: Drop this column; it contains no usable signal (94% nulls, sole value is an empty string).
- n
- 50
- nulls
- 47 (94.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
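The `entropy` and `entropy_ratio` figures quoted in these notes can be reproduced by hand. The sketch below assumes, since the report does not state it, that entropy is Shannon entropy in bits over the non-null values and that the ratio divides it by `log2(cardinality)`. That assumption is consistent with the reported values (0 for a single value, 0.9183 for a 2:1 split, 1.5 and 0.9464 for a 2:1:1 split). The function name `entropy_stats` is hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Iterable, Optional, Tuple


def entropy_stats(values: Iterable[Optional[Hashable]]) -> Tuple[float, float]:
    """Return (entropy in bits, entropy / log2(cardinality)) over non-null values."""
    counts = Counter(v for v in values if v is not None)
    total = sum(counts.values())
    if total == 0:
        return 0.0, 0.0
    # Shannon entropy: sum of p * log2(1 / p) over the distinct values.
    entropy = sum((c / total) * math.log2(total / c) for c in counts.values())
    max_entropy = math.log2(len(counts))
    # A single distinct value has max_entropy == 0; report a ratio of 0 in that case.
    ratio = entropy / max_entropy if max_entropy > 0 else 0.0
    return entropy, ratio


# One distinct value (the empty string, 3 times): entropy and ratio are both 0.
print(entropy_stats([None] * 47 + [""] * 3))

# Two distinct values split 2:1, as in a column with top_rate 0.6667.
entropy, ratio = entropy_stats([None] * 47 + ["", "", "x"])
print(round(entropy, 4), round(ratio, 4))
```

Under this definition a column with one distinct value always scores 0, which is why the cardinality-1 columns in this report are flagged as carrying no information.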
ingredients_text_sr
categorical long_tail null_rate
- n
- 50
- nulls
- 48 (96.0%)
- unique
- 2
- top_value
- Šećer, kakao masa, kakao buter, vanile.
- top_rate
- 0.5
- cardinality
- 2
- entropy
- 1
- entropy_ratio
- 1
origin_sr
categorical null_rate imbalance
- n
- 50
- nulls
- 48 (96.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
ingredients_text_hu_ocr_1571428260
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- kakaómassza, cukor, kakaó- vaj, természetes bourbon vanília. Nyomokban egyéb dióféléket, tejet, szóját, szezámmagot es búzát tartalmazhat.
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
packaging_text_hu
categorical feature null_rate imbalance
This column contains Hungarian-language packaging text, but it is almost entirely empty: 92% of its 50 rows are null, and the only non-null value observed is an empty string appearing 4 times. With a cardinality of 1 and entropy of 0, the column carries no information. Treatment: Drop; 92% nulls and a single empty-string value provide no modelling or analytical signal.
- n
- 50
- nulls
- 46 (92.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
origin_cs
categorical null_rate imbalance
- n
- 50
- nulls
- 48 (96.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
ingredients_text_nl_ocr_1675675383
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- Cacaomassa, suiker, cacaoboter, natuurlijk Bourbon vanille- stokje.
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
product_name_sr
categorical long_tail null_rate
- n
- 50
- nulls
- 48 (96.0%)
- unique
- 2
- top_value
- Excellence 70% Cocoa Intense Dark
- top_rate
- 0.5
- cardinality
- 2
- entropy
- 1
- entropy_ratio
- 1
generic_name_hu
categorical null_rate
- n
- 50
- nulls
- 46 (92.0%)
- unique
- 2
- top_value
- top_rate
- 0.75
- cardinality
- 2
- entropy
- 0.8113
- entropy_ratio
- 0.8113
packaging_text_sr
categorical null_rate imbalance
- n
- 50
- nulls
- 48 (96.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
ingredients_text_with_allergens_cs
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- Kakaová hmota, cukr, kakaové máslo, vanilka.
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
ingredients_text_with_allergens_sr
categorical long_tail null_rate
- n
- 50
- nulls
- 48 (96.0%)
- unique
- 2
- top_value
- Šećer, kakao masa, kakao buter, vanile.
- top_rate
- 0.5
- cardinality
- 2
- entropy
- 1
- entropy_ratio
- 1
ingredients_text_hu
categorical long_tail null_rate
- n
- 50
- nulls
- 46 (92.0%)
- unique
- 4
- top_value
- Kakaómassza, cukor, kakaó - vaj, vanília.
- top_rate
- 0.25
- cardinality
- 4
- entropy
- 2
- entropy_ratio
- 1
product_name_hu
categorical long_tail null_rate
- n
- 50
- nulls
- 46 (92.0%)
- unique
- 3
- top_value
- top_rate
- 0.5
- cardinality
- 3
- entropy
- 1.5
- entropy_ratio
- 0.9464
generic_name_sr
categorical long_tail null_rate
- n
- 50
- nulls
- 48 (96.0%)
- unique
- 2
- top_value
- Tamna čokolada sa 70% kakaa
- top_rate
- 0.5
- cardinality
- 2
- entropy
- 1
- entropy_ratio
- 1
origin_hu
categorical other null_rate imbalance
This column appears to be the Hungarian-language origin field (`_hu` locale suffix, matching sibling columns such as packaging_text_hu), and it is almost entirely unpopulated: 92% of its 50 rows are null, and the sole non-null value is an empty string appearing 4 times. With a cardinality of 1, zero entropy, and a top_rate of 1.0 across non-null values, the column carries no discriminative information. It is effectively a blank field in the current dataset snapshot. Treatment: Drop; 92% null with a single empty-string value provides no signal for any downstream task.
- n
- 50
- nulls
- 46 (92.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
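Because many columns in this report differ only by a trailing locale code, it helps to split the name before interpreting a field. The sketch below is a heuristic under stated assumptions, not part of the profiler: `split_locale` is a hypothetical helper that treats a trailing underscore plus two lowercase letters as a locale suffix.

```python
import re
from typing import Optional, Tuple

# Heuristic: a trailing underscore plus two lowercase letters is treated as a locale code.
_LOCALE_SUFFIX = re.compile(r"^(?P<base>.+)_(?P<lang>[a-z]{2})$")


def split_locale(column: str) -> Optional[Tuple[str, str]]:
    """Return (base_field, locale_code), or None if no two-letter suffix is present."""
    match = _LOCALE_SUFFIX.match(column)
    if match is None:
        return None
    return match.group("base"), match.group("lang")


for name in ["origin_hu", "packaging_text_hu", "generic_name_cs", "preparation"]:
    print(name, "->", split_locale(name))
```

Suffixes such as `_xx` or `_lc` also match this pattern, so the result should be checked against a list of real language codes before it is trusted.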
ingredients_text_with_allergens_hu
categorical long_tail null_rate
- n
- 50
- nulls
- 47 (94.0%)
- unique
- 3
- top_value
- Kakaómassza, cukor, kakaó - vaj, vanília.
- top_rate
- 0.3333
- cardinality
- 3
- entropy
- 1.585
- entropy_ratio
- 1
generic_name_cs
categorical label null_rate imbalance
This column appears to be a Czech-language generic name field (indicated by the '_cs' suffix) that is almost entirely empty: 94% of its 50 rows are null, and the sole non-null value is an empty string appearing 3 times. With a cardinality of 1 and entropy of 0, the column carries no information. Treatment: Drop this column; it contains no usable signal, with a 94% null rate and only empty-string values in the remainder.
- n
- 50
- nulls
- 47 (94.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
ingredients_text_xx
categorical null_rate imbalance
- n
- 50
- nulls
- 48 (96.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
origin_xx
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
product_name_xx
categorical null_rate imbalance
- n
- 50
- nulls
- 48 (96.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
packaging_text_xx
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
generic_name_xx
categorical null_rate imbalance
- n
- 50
- nulls
- 48 (96.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
ingredients_text_es_ocr_1548767061
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- Pasta de cacao, azúcar, manteca de cacao, emulgente: lecitina de girasol (E-322), extracto de vainilla. Cacao: 70% mínimo.
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
ingredients_text_es_ocr_1548767061_result
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- Pasta de cacao, azúcar, manteca de cacao, emulgente: lecitina de girasol (E-322), extracto de vainilla. Cacao: 70% mínimo.
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
origin_ur
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
product_name_he
categorical long_tail null_rate
- n
- 50
- nulls
- 48 (96.0%)
- unique
- 2
- top_value
- נוטלה
- top_rate
- 0.5
- cardinality
- 2
- entropy
- 1
- entropy_ratio
- 1
ingredients_text_he
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
product_name_ur
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
generic_name_he
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- ממרח אגוזי לוז עם קקאו
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
packaging_text_he
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
ingredients_text_ur
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
origin_he
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
generic_name_ur
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
packaging_text_ur
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
ingredients_text_with_allergens_he
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
nutriscore_grade_producer_imported
categorical long_tail null_rate
- n
- 50
- nulls
- 47 (94.0%)
- unique
- 3
- top_value
- c
- top_rate
- 0.3333
- cardinality
- 3
- entropy
- 1.585
- entropy_ratio
- 1
nutriscore_grade_producer
categorical long_tail null_rate
- n
- 50
- nulls
- 47 (94.0%)
- unique
- 3
- top_value
- c
- top_rate
- 0.3333
- cardinality
- 3
- entropy
- 1.585
- entropy_ratio
- 1
ingredients_text_el
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
ingredients_text_with_allergens_el
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
packaging_text_el
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
origin_el
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
product_name_el
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
generic_name_el
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
ingredients_text_it_ocr_1559410715
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- Cioccolato amaro extra. Cacao: 99% minimo. Ingredienti: pasta di cacao, cacao magro, burro di cacao, zucchero grezzo di canna. Può contenere frutta a guscio, latte e soia.
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
ingredients_text_de_ocr_1559410715
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- Extra feine dunkle Schokolade. Schokolade enthält: Kakao: mind. 99%. Zutaten: Kakaomasse, fettarmes Kakaopulver, Kakaobutter, Rohrzucker. Kann Schalenfrüchte, Milch und Soja enthalten.
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
product_name_th
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- ลินด์ เอ็กเซอร์แลนซ์ ดาร์ก 99% โกโก้ ดาร์ก แอปโซลูท ช็อกโกแลต
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
ingredients_text_th
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- Cocoa solids 99%, Cocoa paste, fat-reduced cocoa, cocoa butter, demerara sugar. May contain nuts, milk and soya.
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
ingredients_text_de_ocr_1548767354
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- Extra feine dunkle Schokolade. Schokolade enthält: Kakao: mind. 99%. Zutaten: Kakaomasse, fettarmes Kakaopulver, Kakaobutter, Rohrzucker. Kann Schalenfrüchte, Milch und Soja enthalten.
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
ingredients_text_de_ocr_1548767354_result
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- Extra feine dunkle Schokolade. Schokolade enthält: Kakao: mind. 99%. Zutaten: Kakaomasse, fettarmes Kakaopulver, Kakaobutter, Rohrzucker. Kann Schalenfrüchte, Milch und Soja enthalten.
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
generic_name_th
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
ingredients_text_it_ocr_1559410715_result
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- pasta di cacao, cacao magro, burro di cacao, zucchero grezzo di canna. Può contenere frutta a guscio, latte e soia.
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
packaging_text_th
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
origin_th
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
ingredients_text_with_allergens_th
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- Cocoa solids 99%, Cocoa paste, fat-reduced cocoa, cocoa butter, demerara sugar. May contain nuts, milk and soya.
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
ingredients_text_de_ocr_1559410715_result
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- Kakaomasse, fettarmes Kakaopulver, Kakaobutter, Rohrzucker. Kann Schalenfrüchte, Milch und Soja enthalten.
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
packaging_text_fr_imported
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- 1 FEUILLE PAPIER À RECYCLER, 1 FEUILLE METAL À RECYCLER.
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
preparation
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- Produit prêt à consommer
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
preparation_fr_imported
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- Produit prêt à consommer
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
preparation_fr
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- Produit prêt à consommer
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
generic_name_lc
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
product_name_lc
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
ingredients_text_lc
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
ingredients_text_with_allergens_lc
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
ingredients_text_fr_ocr_1561814324_result
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- 25 % cerneaux de noix, 25 % amandes décortiquées 25 % raisins secs sultanines (raisins secs,huile de tournesol. antioxydant: anhydride lfureux), 15% canneberges, 9,8% sucre, huile de tournesol. Traces éventuelles d'autres fruits à coque et d'arachides.
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
ingredients_text_fr_ocr_1561814324
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- 25 % cerneaux de noix, 25 % amandes décortiquées 25 % raisins secs sultanines (raisins secs,huile de tournesol. antioxydant: anhydride lfureux), 15% canneberges, 9,8% sucre, huile de tournesol. Traces éventuelles d'autres fruits à coque et d'arachides. Conditionné sous atmosphère protectrice.
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
ingredients_text_fr_ocr_1624039072
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- ingrédients : cacao, émulsifiant (lécithine de _soja_), vanille.
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
ingredients_text_fr_ocr_1624039072_result
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- Cacao, émulsifiant (lécithine de _soja_), vanille.
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
ingredients_text_fr_ocr_1573108349
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- Farine de blé, sucre, beurre frais 9,5 % , aeufs entiers frais, crème fraiche 5,5% , levure, sel, arômes naturels (contient alcool), gluten de blé, poudre de lait écrémé, eau de vie, émulsifiants (Mono et diglycérides d'acides gras, Stéaroyl-2- actylate de sodium, diacétyltartriques des mono et diglycérides d'acides désactivée, colorant (béta carotène) Traces éventuelles de fruits à coque. Esters et mono gras), protéines de lait levure
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
ingredients_text_fr_ocr_1573108349_result
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- Farine de blé, sucre, beurre frais 9,5 % , aeufs entiers frais, crème fraiche 5,5% , levure, sel, arômes naturels (contient alcool), gluten de blé, poudre de lait écrémé, eau de vie, émulsifiants (Mono et diglycérides d'acides gras, Stéaroyl-2 - actylate de sodium, diacétyltartriques des mono et diglycérides d'acides désactivée, colorant (béta carotène) Traces éventuelles de fruits à coque. Esters et mono gras), protéines de lait levure
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
ingredients_text_fr_ocr_1573107560_result
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- Farine de blé, sucre, beurre frais 9,5 % , aeufs entiers frais, crème fraiche 5,5% , levure, sel, arômes naturels (contient alcool), gluten de blé, poudre de lait écrémé, eau de vie, émulsifiants (Mono et diglycérides d'acides gras, Stéaroyl-2 - actylate de sodium, diacétyltartriques des mono et diglycérides d'acides désactivée, colorant (béta carotène) Traces éventuelles de fruits à coque. Esters et mono gras), protéines de lait levure
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
ingredients_text_fr_ocr_1573108360
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- Farine de blé, sucre, beurre frais 9,5 % , aeufs entiers frais, crème fraiche 5,5% , levure, sel, arômes naturels (contient alcool), gluten de blé, poudre de lait écrémé, eau de vie, émulsifiants (Mono et diglycérides d'acides gras, Stéaroyl-2- actylate de sodium, diacétyltartriques des mono et diglycérides d'acides désactivée, colorant (béta carotène) Traces éventuelles de fruits à coque. Esters et mono gras), protéines de lait levure
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
ingredients_text_fr_ocr_1573107556_result
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- Farine de blé, sucre, beurre frais 9,5 % , aeufs entiers frais, crème fraiche 5,5% , levure, sel, arômes naturels (contient alcool), gluten de blé, poudre de lait écrémé, eau de vie, émulsifiants (Mono et diglycérides d'acides gras, Stéaroyl-2 - actylate de sodium, diacétyltartriques des mono et diglycérides d'acides désactivée, colorant (béta carotène) Traces éventuelles de fruits à coque. Esters et mono gras), protéines de lait levure
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
ingredients_text_fr_ocr_1573109955
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- Farine de blé, sucre, beurre frais 9,5 % , aeufs entiers frais, crème fraiche 5,5% , levure, sel, arômes naturels (contient alcool), gluten de blé, poudre de lait écrémé, eau de vie, émulsifiants (Mono et diglycérides d'acides gras, Stéaroyl-2- actylate de sodium, diacétyltartriques des mono et diglycérides d'acides désactivée, colorant (béta carotène) Traces éventuelles de fruits à coque. Esters et mono gras), protéines de lait levure
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
ingredients_text_fr_ocr_1566920858
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- Farine de blé, sucre, beurre frais 9,5 % , oeufs entiers frais, crème fraîche 5,5% , levure, sel, arômes naturels (contient alcool), gluten de blé, poudre de lait écrémé, eau de vie, émulsifiants (Mono et diglycérides d'acides gras, Stéaroyl-2- lactylate de sodium, Esters et mono et diacétyltartriques des mono et diglycérides d'acides gras), protéines de lait, levure désactivée, colorant (béta carotène) Traces éventuelles de fruits à coque.
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
ingredients_text_fr_ocr_1573107560
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- Farine de blé, sucre, beurre frais 9,5 % , aeufs entiers frais, crème fraiche 5,5% , levure, sel, arômes naturels (contient alcool), gluten de blé, poudre de lait écrémé, eau de vie, émulsifiants (Mono et diglycérides d'acides gras, Stéaroyl-2- actylate de sodium, diacétyltartriques des mono et diglycérides d'acides désactivée, colorant (béta carotène) Traces éventuelles de fruits à coque. Esters et mono gras), protéines de lait levure
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
ingredients_text_fr_ocr_1573108346
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- Farine de blé, sucre, beurre frais 9,5 % , aeufs entiers frais, crème fraiche 5,5% , levure, sel, arômes naturels (contient alcool), gluten de blé, poudre de lait écrémé, eau de vie, émulsifiants (Mono et diglycérides d'acides gras, Stéaroyl-2- actylate de sodium, diacétyltartriques des mono et diglycérides d'acides désactivée, colorant (béta carotène) Traces éventuelles de fruits à coque. Esters et mono gras), protéines de lait levure
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
ingredients_text_fr_ocr_1573108346_result
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- Farine de blé, sucre, beurre frais 9,5 % , aeufs entiers frais, crème fraiche 5,5% , levure, sel, arômes naturels (contient alcool), gluten de blé, poudre de lait écrémé, eau de vie, émulsifiants (Mono et diglycérides d'acides gras, Stéaroyl-2 - actylate de sodium, diacétyltartriques des mono et diglycérides d'acides désactivée, colorant (béta carotène) Traces éventuelles de fruits à coque. Esters et mono gras), protéines de lait levure
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
ingredients_text_fr_ocr_1573109955_result
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- Farine de blé, sucre, beurre frais 9,5 % , aeufs entiers frais, crème fraiche 5,5% , levure, sel, arômes naturels (contient alcool), gluten de blé, poudre de lait écrémé, eau de vie, émulsifiants (Mono et diglycérides d'acides gras, Stéaroyl-2 - actylate de sodium, diacétyltartriques des mono et diglycérides d'acides désactivée, colorant (béta carotène) Traces éventuelles de fruits à coque. Esters et mono gras), protéines de lait levure
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
ingredients_text_fr_ocr_1566920858_result
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- Farine de blé, sucre, beurre frais 9,5 % , oeufs entiers frais, crème fraîche 5,5% , levure, sel, arômes naturels (contient alcool), gluten de blé, poudre de lait écrémé, eau de vie, émulsifiants (Mono et diglycérides d'acides gras, Stéaroyl-2 - lactylate de sodium, Esters et mono et diacétyltartriques des mono et diglycérides d'acides gras), protéines de lait, levure désactivée, colorant (béta carotène) Traces éventuelles de fruits à coque.
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
ingredients_text_fr_ocr_1573107556
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- Farine de blé, sucre, beurre frais 9,5 % , aeufs entiers frais, crème fraiche 5,5% , levure, sel, arômes naturels (contient alcool), gluten de blé, poudre de lait écrémé, eau de vie, émulsifiants (Mono et diglycérides d'acides gras, Stéaroyl-2- actylate de sodium, diacétyltartriques des mono et diglycérides d'acides désactivée, colorant (béta carotène) Traces éventuelles de fruits à coque. Esters et mono gras), protéines de lait levure
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
ingredients_text_fr_ocr_1573108360_result
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- Farine de blé, sucre, beurre frais 9,5 % , aeufs entiers frais, crème fraiche 5,5% , levure, sel, arômes naturels (contient alcool), gluten de blé, poudre de lait écrémé, eau de vie, émulsifiants (Mono et diglycérides d'acides gras, Stéaroyl-2 - actylate de sodium, diacétyltartriques des mono et diglycérides d'acides désactivée, colorant (béta carotène) Traces éventuelles de fruits à coque. Esters et mono gras), protéines de lait levure
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
ingredients_text_with_allergens_ro
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
origin_lt
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
ingredients_text_with_allergens_lt
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
product_name_lt
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
ingredients_text_lt
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
packaging_text_lt
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
generic_name_lt
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
ingredients_text_fr_ocr_1713713129_result
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- Pâte de cacao, cacao en poudre dégraissé, beurre de cacao, sucre, lait en poudre, pâte de amandes et de noisettes, émulsifiants (lécithines (soja, toumesol)) et arôme. Cacao 92% minimum. Peut contenir des traces d'autres fruits à coque.
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0
ingredients_text_fr_ocr_1713713129
categorical long_tail null_rate imbalance
- n
- 50
- nulls
- 49 (98.0%)
- unique
- 1
- top_value
- Ingrédients : Pâte de cacao, cacao en poudre dégraissé, beurre de cacao, sucre, lait en poudre, pâte de amandes et de noisettes, émulsifiants (lécithines (soja, toumesol)) et arôme. Cacao 92% minimum. Peut contenir des traces d'autres fruits à coque.
- top_rate
- 1
- cardinality
- 1
- entropy
- 0
- entropy_ratio
- 0