
wild openfoodfacts sample

saturn notebook · generated 2026-05-01

Overview

Source: /home/coolhand/html/datavis/data_trove/cache/wild/openfoodfacts_sample.json

Saturn profiled 50 rows across 545 columns. The stats below are deterministic and machine-readable; the prose is a language-model interpretation of those stats (opt-in, added after the fact, never sees raw rows).

[2]:
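!pip install saturn-dissect
# Run the Saturn CLI on the cached Open Food Facts sample; the arguments
# are passed through unchanged from the original run.
import subprocess

subprocess.run(
    [
        "saturn", "analyze",
        "/home/coolhand/html/datavis/data_trove/cache/wild/openfoodfacts_sample.json",
        "--findings", "wild-openfoodfacts_sample.json",
        "--llm", "anthropic:claude-opus-4-7",  # opt-in language-model prose
    ],
    check=True,  # raise CalledProcessError if the CLI exits non-zero
)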

Summary confidence: high

This is a 50-row sample from Open Food Facts with 545 columns, dominated by per-language localized fields (product names, generic names, ingredient texts, packaging texts, origin) plus nutrition, scoring, and provenance metadata. The shape is extremely sparse: most localized columns are 92–98% null, so the analytical signal lives in a small core of fields. Worth a closer look first: the Nutri-Score and NOVA distributions (the sample skews heavily to grade 'e' and NOVA group 4), the Eco-Score grade mix, and the food_groups / pnns_groups_2 breakdown, which shows the sample is concentrated in chocolate and biscuit products. Also note the heavy imbalance in `lang` (70% French) and `countries_lc`, which biases any text or origin analysis. Apart from the French and English variants (`product_name_fr` is only 2% null, `ingredients_text_en` 12%), treat the hundreds of `*_xx` and `ingredients_text_*` columns as effectively empty rather than as features.
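One way to act on the last point without hand-picking columns is to check, per language, how well its best-populated localized field is filled. A minimal sketch, assuming the null% values from the schema table below are available as a plain dict; the base-field list, the two-letter suffix rule and the 90% cutoff are illustrative assumptions, not Saturn behavior.

```python
import re
from collections import defaultdict

# Hypothetical naming rule: a localized column is one of these base fields
# followed by "_<two-letter code>", e.g. "product_name_sv" or "origin_xx".
BASES = ("product_name", "generic_name", "ingredients_text",
         "packaging_text", "origin")
LOCALIZED = re.compile(r"^(%s)_([a-z]{2})$" % "|".join(BASES))


def languages_worth_keeping(null_pct, cutoff=90.0):
    """Return sorted language codes whose localized columns are usable.

    null_pct maps column name -> null percentage (0-100), as in the schema
    table. A language is kept if at least one of its localized columns is
    below the cutoff; all other languages are treated as effectively empty.
    """
    best = defaultdict(lambda: 100.0)  # lowest null% seen per language
    for col, pct in null_pct.items():
        m = LOCALIZED.match(col)
        if m:
            lang = m.group(2)
            best[lang] = min(best[lang], pct)
    return sorted(lang for lang, pct in best.items() if pct < cutoff)


# Values copied from the schema table; non-localized columns are ignored.
null_pct = {
    "product_name_fr": 2.0, "product_name_en": 14.0, "product_name_de": 60.0,
    "product_name_sv": 92.0, "ingredients_text_ja": 98.0, "origin_xx": 98.0,
    "nutriscore_grade": 0.0,
}
print(languages_worth_keeping(null_pct))        # ['de', 'en', 'fr']
print(languages_worth_keeping(null_pct, 50.0))  # ['en', 'fr']
```

Lowering the cutoff shows how quickly the usable set shrinks to French and English in this sample.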

citing: column_count · row_count · nutriscore_grade · nova_groups · ecoscore_grade · food_groups · pnns_groups_2 · lang · countries_lc · ecoscore_score · nutriscore_score · completeness

Out[4]:

saturn.schema() · 545 columns

column kind n null% unique alerts
update_key categorical 50 0.0% 9 long_tail
categories_old categorical 50 2.0% 45 long_tail
ecoscore_score numeric 50 14.0% 31
environment_impact_level categorical 50 56.0% 1 null_rate imbalance
ingredients_text_fi categorical 50 90.0% 4 long_tail null_rate
nutrition_data_prepared categorical 50 4.0% 1 imbalance
packaging_shapes_tags unknown 50 0.0% skipped
nutrient_levels_tags unknown 50 0.0% skipped
packagings_materials unknown 50 0.0% skipped
ingredients_without_ecobalyse_ids unknown 50 0.0% skipped
generic_name_nl categorical 50 76.0% 4 long_tail null_rate
misc_tags unknown 50 0.0% skipped
product_name_sv categorical 50 92.0% 4 long_tail null_rate
scans_n numeric 50 0.0% 49 high_skew outliers
schema_version numeric 50 0.0% 1 constant
url categorical 50 0.0% 50 long_tail
vitamins_tags unknown 50 0.0% skipped
debug_param_sorted_langs unknown 50 0.0% skipped
packaging categorical 50 12.0% 41 long_tail
grades unknown 50 0.0% skipped
last_modified_t numeric 50 0.0% 50 outliers
origin_nl categorical 50 76.0% 1 null_rate imbalance
allergens_lc categorical 50 4.0% 6
states_hierarchy unknown 50 0.0% skipped
ingredients_text_ja categorical 50 98.0% 1 long_tail null_rate imbalance
teams_tags unknown 50 0.0% skipped
traces_from_user categorical 50 0.0% 35 long_tail
origins_tags unknown 50 0.0% skipped
serving_quantity_unit categorical 50 8.0% 2 imbalance
vitamins_prev_tags unknown 50 0.0% skipped
ingredients_hierarchy unknown 50 0.0% skipped
unique_scans_n numeric 50 0.0% 48 high_skew outliers
labels categorical 50 2.0% 42 long_tail
generic_name_en categorical 50 14.0% 8 long_tail
weighters_tags unknown 50 0.0% skipped
popularity_tags unknown 50 0.0% skipped
product_name_fi categorical 50 90.0% 4 long_tail null_rate
origin_fr categorical 50 8.0% 7 long_tail
generic_name categorical 50 4.0% 28 long_tail
nutriscore_version categorical 50 0.0% 1 imbalance
ingredients_without_ciqual_codes unknown 50 0.0% skipped
manufacturing_places_tags unknown 50 0.0% skipped
photographers_tags unknown 50 0.0% skipped
packaging_text_pl categorical 50 90.0% 1 null_rate imbalance
informers_tags unknown 50 0.0% skipped
ingredients_text_en categorical 50 12.0% 36 long_tail
ingredients_text_it categorical 50 68.0% 12 long_tail null_rate
origin_de categorical 50 60.0% 1 null_rate imbalance
nova_group numeric 50 4.0% 3 high_skew
packaging_text_fi categorical 50 90.0% 1 null_rate imbalance
states categorical 50 0.0% 26 long_tail
ingredients_with_unspecified_percent_sum numeric 50 0.0% 22
added_countries_tags unknown 50 0.0% skipped
id categorical 50 0.0% 50 long_tail
nutrient_levels unknown 50 0.0% skipped
sortkey numeric 50 12.0% 44 high_skew outliers
image_small_url categorical 50 0.0% 50 long_tail
packaging_recycling_tags unknown 50 0.0% skipped
food_groups categorical 50 2.0% 11
nova_groups_markers unknown 50 0.0% skipped
packaging_text_de categorical 50 60.0% 2 null_rate
categories_lc categorical 50 0.0% 6
checkers unknown 50 0.0% skipped
packaging_text_es categorical 50 60.0% 2 null_rate
unknown_nutrients_tags unknown 50 0.0% skipped
editors_tags unknown 50 0.0% skipped
nutrition_score_warning_fruits_vegetables_nuts_estimate_from_ingredients numeric 50 10.0% 1 constant
labels_lc categorical 50 2.0% 6
nutriscore_data unknown 50 0.0% skipped
other_nutritional_substances_tags unknown 50 0.0% skipped
product_name_nb categorical 50 96.0% 2 long_tail null_rate
nutrition_data_prepared_per categorical 50 0.0% 1 imbalance
product_quantity categorical 50 6.0% 27 long_tail
product_type categorical 50 0.0% 1 imbalance
checkers_tags unknown 50 0.0% skipped
nucleotides_tags unknown 50 0.0% skipped
languages_tags unknown 50 0.0% skipped
traces_lc categorical 50 4.0% 6
categories_hierarchy unknown 50 0.0% skipped
image_front_small_url categorical 50 0.0% 50 long_tail
entry_dates_tags unknown 50 0.0% skipped
ecoscore_tags unknown 50 0.0% skipped
nutrition_score_warning_fruits_vegetables_legumes_estimate_from_ingredients numeric 50 8.0% 1 constant
ingredients_without_ciqual_codes_n numeric 50 0.0% 15
rev numeric 50 0.0% 46
ingredients_non_nutritive_sweeteners_n numeric 50 0.0% 1 constant
ingredients_without_ecobalyse_ids_n numeric 50 0.0% 20
environment_impact_level_tags unknown 50 0.0% skipped
last_image_dates_tags unknown 50 0.0% skipped
labels_hierarchy unknown 50 0.0% skipped
product_name_en categorical 50 14.0% 34 long_tail
nutrition_score_warning_fruits_vegetables_legumes_estimate_from_ingredients_value numeric 50 8.0% 6 high_skew outliers
traces categorical 50 0.0% 23 long_tail
generic_name_fi categorical 50 90.0% 5 long_tail null_rate
emb_codes_orig categorical 50 34.0% 5 long_tail null_rate
ingredients_with_specified_percent_n numeric 50 0.0% 7
nutrition_grades categorical 50 0.0% 6
weighers_tags unknown 50 0.0% skipped
categories_tags unknown 50 0.0% skipped
image_url categorical 50 0.0% 50 long_tail
sources unknown 50 0.0% skipped
languages_hierarchy unknown 50 0.0% skipped
pnns_groups_1 categorical 50 0.0% 7
countries_lc categorical 50 2.0% 6
additives_tags unknown 50 0.0% skipped
codes_tags unknown 50 0.0% skipped
countries_tags unknown 50 0.0% skipped
creator categorical 50 0.0% 13 long_tail
ingredients unknown 50 0.0% skipped
product_name_nl categorical 50 76.0% 7 long_tail null_rate
ingredients_n_tags unknown 50 0.0% skipped
origin_es categorical 50 60.0% 1 null_rate imbalance
product_name_pl categorical 50 90.0% 3 long_tail null_rate
scores unknown 50 0.0% skipped
brands categorical 50 0.0% 41 long_tail
ingredients_text_de categorical 50 60.0% 16 long_tail null_rate
ingredients_text_nb categorical 50 96.0% 1 null_rate imbalance
packagings_n numeric 50 18.0% 5 outliers
complete numeric 50 0.0% 2
emb_codes_20141016 categorical 50 58.0% 7 long_tail null_rate
ingredients_tags unknown 50 0.0% skipped
packaging_text_ja categorical 50 98.0% 1 long_tail null_rate imbalance
generic_name_de categorical 50 60.0% 9 long_tail null_rate
last_editor categorical 50 2.0% 24 long_tail
minerals_prev_tags unknown 50 0.0% skipped
last_image_t numeric 50 0.0% 50 high_skew
obsolete_since_date categorical 50 12.0% 1 imbalance
pnns_groups_2_tags unknown 50 0.0% skipped
emb_codes_tags unknown 50 0.0% skipped
countries_beforescanbot categorical 50 14.0% 38 long_tail
nutrition_grade_fr categorical 50 0.0% 6
data_quality_tags unknown 50 0.0% skipped
ingredients_with_specified_percent_sum numeric 50 0.0% 22
origin_it categorical 50 68.0% 1 null_rate imbalance
nutrition_data_per categorical 50 0.0% 2
origin_pl categorical 50 90.0% 1 null_rate imbalance
product unknown 50 0.0% skipped
link categorical 50 4.0% 28 long_tail
ingredients_text_nl categorical 50 76.0% 9 long_tail null_rate
additives_n numeric 50 0.0% 8
generic_name_sv categorical 50 92.0% 4 long_tail null_rate
ingredients_that_may_be_from_palm_oil_tags unknown 50 0.0% skipped
known_ingredients_n numeric 50 0.0% 22
completeness numeric 50 0.0% 14 outliers
ingredients_sweeteners_n numeric 50 0.0% 1 constant
nova_groups categorical 50 4.0% 3
allergens_hierarchy unknown 50 0.0% skipped
obsolete categorical 50 12.0% 1 imbalance
origin_sv categorical 50 92.0% 1 null_rate imbalance
packaging_hierarchy unknown 50 0.0% skipped
ingredients_with_unspecified_percent_n numeric 50 0.0% 18
fruits-vegetables-nuts_100g_estimate numeric 50 46.0% 2 null_rate high_skew
emb_codes categorical 50 4.0% 11 long_tail
packagings unknown 50 0.0% skipped
purchase_places_tags unknown 50 0.0% skipped
additives_original_tags unknown 50 0.0% skipped
image_front_url categorical 50 0.0% 50 long_tail
data_quality_bugs_tags unknown 50 0.0% skipped
origin_fi categorical 50 90.0% 1 null_rate imbalance
images unknown 50 0.0% skipped
ingredients_analysis unknown 50 0.0% skipped
ingredients_text_with_allergens_pl categorical 50 92.0% 3 long_tail null_rate
product_name_de categorical 50 60.0% 16 long_tail null_rate
ingredients_text_with_allergens_nb categorical 50 96.0% 1 null_rate imbalance
packaging_text_it categorical 50 68.0% 3 long_tail null_rate
product_name_it categorical 50 68.0% 12 long_tail null_rate
serving_quantity categorical 50 12.0% 27 long_tail
product_name_ja categorical 50 98.0% 1 long_tail null_rate imbalance
ingredients_text_with_allergens_sv categorical 50 92.0% 4 long_tail null_rate
allergens_tags unknown 50 0.0% skipped
ingredients_text_fr categorical 50 4.0% 47 long_tail
nutrition_score_beverage numeric 50 0.0% 2 high_skew
ingredients_ids_debug unknown 50 0.0% skipped
nutrition_data categorical 50 2.0% 1 imbalance
origin_ja categorical 50 98.0% 1 long_tail null_rate imbalance
packaging_text_en categorical 50 14.0% 5 long_tail
unknown_ingredients_n numeric 50 0.0% 6 high_skew outliers
ingredients_from_palm_oil_tags unknown 50 0.0% skipped
labels_tags unknown 50 0.0% skipped
packaging_old_before_taxonomization categorical 50 24.0% 36 long_tail null_rate
packaging_text_nb categorical 50 96.0% 1 null_rate imbalance
nutrition_grades_tags unknown 50 0.0% skipped
category_properties unknown 50 0.0% skipped
nutriscore_score numeric 50 2.0% 28
packaging_tags unknown 50 0.0% skipped
labels_old categorical 50 8.0% 38 long_tail
packaging_text categorical 50 4.0% 13 long_tail
ingredients_percent_analysis numeric 50 0.0% 2 high_skew outliers
ecoscore_data unknown 50 0.0% skipped
ingredients_text_sv categorical 50 92.0% 4 long_tail null_rate
brands_tags unknown 50 0.0% skipped
compared_to_category categorical 50 0.0% 35 long_tail
data_sources categorical 50 0.0% 43 long_tail
other_nutritional_substances_prev_tags unknown 50 0.0% skipped
ingredients_from_palm_oil_n numeric 50 8.0% 2 outliers
last_updated_t numeric 50 0.0% 50 outliers
nutrition_score_debug categorical 50 0.0% 2 imbalance
popularity_key numeric 50 0.0% 49 high_skew outliers
product_name_es categorical 50 60.0% 17 long_tail null_rate
allergens_from_user categorical 50 0.0% 34 long_tail
informers unknown 50 0.0% skipped
brands_old categorical 50 32.0% 29 long_tail null_rate
data_quality_errors_tags unknown 50 0.0% skipped
ingredients_text categorical 50 0.0% 50 long_tail
categories categorical 50 0.0% 46 long_tail
nutrition_score_warning_fruits_vegetables_nuts_estimate_from_ingredients_value numeric 50 10.0% 13 high_skew outliers
ingredients_from_or_that_may_be_from_palm_oil_n numeric 50 6.0% 3
origins_old categorical 50 22.0% 9 long_tail null_rate
packaging_text_nl categorical 50 76.0% 1 null_rate imbalance
expiration_date categorical 50 4.0% 34 long_tail
selected_images unknown 50 0.0% skipped
traces_from_ingredients categorical 50 0.0% 12 long_tail
ingredients_text_with_allergens categorical 50 0.0% 50 long_tail
image_front_thumb_url categorical 50 0.0% 50 long_tail
lc categorical 50 0.0% 5
ingredients_text_debug categorical 50 28.0% 35 long_tail null_rate
packagings_materials_main categorical 50 62.0% 3 null_rate
data_quality_dimensions unknown 50 0.0% skipped
serving_size categorical 50 12.0% 37 long_tail
pnns_groups_1_tags unknown 50 0.0% skipped
origin categorical 50 6.0% 6 long_tail
ingredients_lc categorical 50 0.0% 4
packaging_old categorical 50 14.0% 40 long_tail
packaging_text_fr categorical 50 6.0% 14 long_tail
nova_group_debug categorical 50 0.0% 3 long_tail imbalance
ingredients_original_tags unknown 50 0.0% skipped
data_quality_completeness_tags unknown 50 0.0% skipped
cities_tags unknown 50 0.0% skipped
countries_hierarchy unknown 50 0.0% skipped
nutriscore_score_opposite numeric 50 2.0% 28
categories_properties_tags unknown 50 0.0% skipped
origins_lc categorical 50 4.0% 6
ciqual_food_name_tags unknown 50 0.0% skipped
countries categorical 50 0.0% 43 long_tail
ingredients_text_with_allergens_it categorical 50 68.0% 12 long_tail null_rate
packaging_lc categorical 50 12.0% 7
correctors_tags unknown 50 0.0% skipped
interface_version_created categorical 50 2.0% 3
states_tags unknown 50 0.0% skipped
nutriscore_2021_tags unknown 50 0.0% skipped
stores_tags unknown 50 0.0% skipped
image_thumb_url categorical 50 0.0% 50 long_tail
categories_properties unknown 50 0.0% skipped
nucleotides_prev_tags unknown 50 0.0% skipped
allergens_from_ingredients categorical 50 0.0% 35 long_tail
ingredients_text_with_allergens_fi categorical 50 90.0% 4 long_tail null_rate
_keywords unknown 50 0.0% skipped
manufacturing_places categorical 50 2.0% 20 long_tail
pnns_groups_2 categorical 50 0.0% 11
ingredients_text_pl categorical 50 90.0% 3 long_tail null_rate
generic_name_es categorical 50 60.0% 7 long_tail null_rate
origin_en categorical 50 14.0% 2 imbalance
generic_name_it categorical 50 68.0% 5 long_tail null_rate
ingredients_that_may_be_from_palm_oil_n numeric 50 8.0% 3 high_skew outliers
ingredients_text_es categorical 50 60.0% 13 long_tail null_rate
teams categorical 50 8.0% 39 long_tail
food_groups_tags unknown 50 0.0% skipped
data_quality_warnings_tags unknown 50 0.0% skipped
debug_tags unknown 50 0.0% skipped
main_countries_tags unknown 50 0.0% skipped
origins_hierarchy unknown 50 0.0% skipped
packagings_complete numeric 50 4.0% 2
nutriscore_tags unknown 50 0.0% skipped
ingredients_text_with_allergens_nl categorical 50 78.0% 9 long_tail null_rate
created_t numeric 50 0.0% 50
traces_hierarchy unknown 50 0.0% skipped
generic_name_nb categorical 50 96.0% 1 null_rate imbalance
ingredients_text_with_allergens_de categorical 50 66.0% 16 long_tail null_rate
ingredients_text_with_allergens_es categorical 50 62.0% 13 long_tail null_rate
product_name_fr categorical 50 2.0% 47 long_tail
stores categorical 50 4.0% 31 long_tail
_id categorical 50 0.0% 50 long_tail
nutriments unknown 50 0.0% skipped
editors unknown 50 0.0% skipped
max_imgid categorical 50 0.0% 38 long_tail
nutriscore_grade categorical 50 0.0% 6
product_quantity_unit categorical 50 10.0% 2 imbalance
ingredients_analysis_tags unknown 50 0.0% skipped
ingredients_text_with_allergens_fr categorical 50 4.0% 47 long_tail
interface_version_modified categorical 50 0.0% 2
data_sources_tags unknown 50 0.0% skipped
ingredients_text_with_allergens_en categorical 50 16.0% 36 long_tail
removed_countries_tags unknown 50 0.0% skipped
amino_acids_prev_tags unknown 50 0.0% skipped
code categorical 50 0.0% 50 long_tail
correctors unknown 50 0.0% skipped
generic_name_ja categorical 50 98.0% 1 long_tail null_rate imbalance
generic_name_fr categorical 50 6.0% 34 long_tail
generic_name_pl categorical 50 90.0% 2 null_rate
amino_acids_tags unknown 50 0.0% skipped
ingredients_debug unknown 50 0.0% skipped
ingredients_text_with_allergens_ja categorical 50 98.0% 1 long_tail null_rate imbalance
data_quality_info_tags unknown 50 0.0% skipped
last_edit_dates_tags unknown 50 0.0% skipped
last_modified_by categorical 50 2.0% 24 long_tail
no_nutrition_data categorical 50 4.0% 1 imbalance
nutriscore unknown 50 0.0% skipped
origin_nb categorical 50 96.0% 1 null_rate imbalance
origins categorical 50 4.0% 20 long_tail
nova_groups_tags unknown 50 0.0% skipped
languages unknown 50 0.0% skipped
nutriscore_2023_tags unknown 50 0.0% skipped
packaging_materials_tags unknown 50 0.0% skipped
lang categorical 50 0.0% 5
packaging_text_sv categorical 50 92.0% 1 null_rate imbalance
photographers unknown 50 0.0% skipped
languages_codes unknown 50 0.0% skipped
ecoscore_grade categorical 50 0.0% 9
ingredients_n numeric 50 0.0% 22
allergens categorical 50 0.0% 16
minerals_tags unknown 50 0.0% skipped
product_name categorical 50 0.0% 49 long_tail
purchase_places categorical 50 2.0% 32 long_tail
quantity categorical 50 2.0% 36 long_tail
traces_tags unknown 50 0.0% skipped
origin_uk categorical 50 98.0% 1 long_tail null_rate imbalance
generic_name_ar categorical 50 80.0% 2 null_rate
packaging_text_uk categorical 50 98.0% 1 long_tail null_rate imbalance
ingredients_text_ar categorical 50 78.0% 2 null_rate
ingredients_text_uk categorical 50 98.0% 1 long_tail null_rate imbalance
last_check_dates_tags unknown 50 0.0% skipped
checked categorical 50 86.0% 1 null_rate imbalance
packaging_text_ar categorical 50 80.0% 1 null_rate imbalance
carbon_footprint_percent_of_known_ingredients numeric 50 62.0% 19 null_rate
last_checker categorical 50 86.0% 4 null_rate
product_name_uk categorical 50 98.0% 1 long_tail null_rate imbalance
generic_name_uk categorical 50 98.0% 1 long_tail null_rate imbalance
product_name_ar categorical 50 78.0% 6 long_tail null_rate
carbon_footprint_from_known_ingredients_debug categorical 50 72.0% 14 long_tail null_rate
last_checked_t numeric 50 86.0% 7 null_rate
ingredients_text_with_allergens_uk categorical 50 98.0% 1 long_tail null_rate imbalance
ingredients_text_with_allergens_ar categorical 50 82.0% 2 null_rate
origin_ar categorical 50 80.0% 1 null_rate imbalance
nutriments_estimated unknown 50 0.0% skipped
nutrition_score_warning_no_fiber numeric 50 70.0% 1 null_rate constant
ingredients_text_debug_tags unknown 50 0.0% skipped
taxonomies_enhancer_tags unknown 50 0.0% skipped
completed_t numeric 50 68.0% 16 null_rate
product_name_bg categorical 50 94.0% 3 long_tail null_rate
ingredients_text_et categorical 50 94.0% 3 long_tail null_rate
origin_sl categorical 50 98.0% 1 long_tail null_rate imbalance
generic_name_dz categorical 50 98.0% 1 long_tail null_rate imbalance
ingredients_text_sl categorical 50 98.0% 1 long_tail null_rate imbalance
generic_name_ca categorical 50 96.0% 1 null_rate imbalance
ingredients_text_dz categorical 50 98.0% 1 long_tail null_rate imbalance
product_name_ca categorical 50 96.0% 1 null_rate imbalance
origin_ca categorical 50 96.0% 1 null_rate imbalance
product_name_et categorical 50 94.0% 3 long_tail null_rate
ingredients_text_with_allergens_bg categorical 50 94.0% 3 long_tail null_rate
ingredients_text_with_allergens_et categorical 50 94.0% 3 long_tail null_rate
origin_sk categorical 50 98.0% 1 long_tail null_rate imbalance
origin_bg categorical 50 94.0% 1 null_rate imbalance
packaging_text_sl categorical 50 98.0% 1 long_tail null_rate imbalance
generic_name_sk categorical 50 98.0% 1 long_tail null_rate imbalance
ingredients_text_with_allergens_sl categorical 50 98.0% 1 long_tail null_rate imbalance
ingredients_text_ca categorical 50 96.0% 1 null_rate imbalance
generic_name_sl categorical 50 98.0% 1 long_tail null_rate imbalance
product_name_dz categorical 50 98.0% 1 long_tail null_rate imbalance
origin_et categorical 50 94.0% 1 null_rate imbalance
ingredients_text_with_allergens_sk categorical 50 98.0% 1 long_tail null_rate imbalance
product_name_sk categorical 50 98.0% 1 long_tail null_rate imbalance
ingredients_text_with_allergens_pt categorical 50 84.0% 4 long_tail null_rate
ingredients_text_with_allergens_ca categorical 50 98.0% 1 long_tail null_rate imbalance
generic_name_pt categorical 50 80.0% 3 long_tail null_rate
packaging_text_pt categorical 50 80.0% 1 null_rate imbalance
ingredients_text_pt categorical 50 80.0% 4 long_tail null_rate
origin_pt categorical 50 80.0% 1 null_rate imbalance
nutrition_score_warning_nutriments_estimated numeric 50 96.0% 1 null_rate constant
packaging_text_bg categorical 50 94.0% 1 null_rate imbalance
generic_name_et categorical 50 94.0% 1 null_rate imbalance
packaging_text_ca categorical 50 96.0% 1 null_rate imbalance
product_name_sl categorical 50 98.0% 1 long_tail null_rate imbalance
generic_name_bg categorical 50 94.0% 1 null_rate imbalance
ingredients_text_sk categorical 50 98.0% 1 long_tail null_rate imbalance
ingredients_text_bg categorical 50 94.0% 3 long_tail null_rate
packaging_text_et categorical 50 94.0% 1 null_rate imbalance
packaging_text_sk categorical 50 98.0% 1 long_tail null_rate imbalance
product_name_pt categorical 50 80.0% 7 long_tail null_rate
abbreviated_product_name_fr categorical 50 86.0% 7 long_tail null_rate
obsolete_imported categorical 50 86.0% 1 null_rate imbalance
sources_fields unknown 50 0.0% skipped
emb_code categorical 50 98.0% 1 long_tail null_rate imbalance
lang_imported categorical 50 86.0% 1 null_rate imbalance
generic_name_zh categorical 50 98.0% 1 long_tail null_rate imbalance
conservation_conditions_fr_imported categorical 50 86.0% 7 long_tail null_rate
origin_fr_imported categorical 50 96.0% 2 long_tail null_rate
owner categorical 50 86.0% 6 long_tail null_rate
ingredients_text_fr_imported categorical 50 86.0% 7 long_tail null_rate
owners_tags categorical 50 86.0% 6 long_tail null_rate
product_name_zh categorical 50 98.0% 1 long_tail null_rate imbalance
nutrition_data_prepared_per_imported categorical 50 86.0% 1 null_rate imbalance
abbreviated_product_name_fr_imported categorical 50 86.0% 7 long_tail null_rate
generic_name_zh_debug_tags unknown 50 0.0% skipped
customer_service_fr categorical 50 86.0% 6 long_tail null_rate
customer_service_fr_imported categorical 50 86.0% 6 long_tail null_rate
ingredients_text_zh_debug_tags unknown 50 0.0% skipped
product_name_fr_imported categorical 50 86.0% 7 long_tail null_rate
brands_imported categorical 50 86.0% 6 long_tail null_rate
owner_imported categorical 50 88.0% 5 long_tail null_rate
product_name_zh_debug_tags unknown 50 0.0% skipped
lc_imported categorical 50 84.0% 2 null_rate
ingredients_text_zh categorical 50 98.0% 1 long_tail null_rate imbalance
quantity_imported categorical 50 86.0% 7 long_tail null_rate
nutrition_data_per_imported categorical 50 84.0% 1 null_rate imbalance
generic_name_fr_imported categorical 50 86.0% 7 long_tail null_rate
owner_fields unknown 50 0.0% skipped
categories_imported categorical 50 88.0% 5 long_tail null_rate
conservation_conditions_fr categorical 50 86.0% 7 long_tail null_rate
conservation_conditions categorical 50 86.0% 7 long_tail null_rate
countries_imported categorical 50 84.0% 2 null_rate
origins_fr categorical 50 96.0% 2 long_tail null_rate
abbreviated_product_name categorical 50 86.0% 7 long_tail null_rate
customer_service categorical 50 86.0% 6 long_tail null_rate
data_sources_imported categorical 50 84.0% 8 long_tail null_rate
nova_group_error categorical 50 96.0% 1 null_rate imbalance
ingredients_text_de_ocr_1648897071_result categorical 50 98.0% 1 long_tail null_rate imbalance
packaging_text_ro categorical 50 96.0% 1 null_rate imbalance
product_name_ro categorical 50 96.0% 2 long_tail null_rate
producer_version_id categorical 50 92.0% 3 long_tail null_rate
serving_size_imported categorical 50 88.0% 6 long_tail null_rate
no_nutrition_data_imported categorical 50 92.0% 1 null_rate imbalance
packaging_imported categorical 50 92.0% 2 null_rate
ingredients_text_ro categorical 50 96.0% 1 null_rate imbalance
producer_version_id_imported categorical 50 92.0% 3 long_tail null_rate
labels_imported categorical 50 90.0% 3 long_tail null_rate
ingredients_text_de_ocr_1648990410_result categorical 50 98.0% 1 long_tail null_rate imbalance
allergens_imported categorical 50 90.0% 4 long_tail null_rate
ingredients_text_de_ocr_1648990410 categorical 50 98.0% 1 long_tail null_rate imbalance
ingredients_text_de_ocr_1648897071 categorical 50 98.0% 1 long_tail null_rate imbalance
generic_name_ro categorical 50 96.0% 1 null_rate imbalance
origin_ro categorical 50 96.0% 1 null_rate imbalance
abbreviated_product_name_imported categorical 50 94.0% 3 long_tail null_rate
traces_imported categorical 50 92.0% 4 long_tail null_rate
specific_ingredients unknown 50 0.0% skipped
product_name_ru categorical 50 94.0% 2 null_rate
origin_ru categorical 50 94.0% 1 null_rate imbalance
ingredients_text_with_allergens_ru categorical 50 94.0% 1 null_rate imbalance
packaging_text_ru categorical 50 94.0% 1 null_rate imbalance
generic_name_ru categorical 50 94.0% 2 null_rate
ingredients_text_ru categorical 50 94.0% 1 null_rate imbalance
ingredients_text_da categorical 50 96.0% 2 long_tail null_rate
ingredients_text_with_allergens_da categorical 50 96.0% 2 long_tail null_rate
product_name_da categorical 50 96.0% 2 long_tail null_rate
packaging_text_da categorical 50 96.0% 1 null_rate imbalance
generic_name_da categorical 50 96.0% 2 long_tail null_rate
forest_footprint_data unknown 50 0.0% skipped
origin_da categorical 50 96.0% 1 null_rate imbalance
origin_sr categorical 50 96.0% 1 null_rate imbalance
ingredients_text_nl_ocr_1675675383_result categorical 50 98.0% 1 long_tail null_rate imbalance
ingredients_text_cs categorical 50 94.0% 2 null_rate
product_name_cs categorical 50 94.0% 2 null_rate
origin_hu categorical 50 92.0% 1 null_rate imbalance
packaging_text_hu categorical 50 92.0% 1 null_rate imbalance
origin_cs categorical 50 96.0% 1 null_rate imbalance
ingredients_text_with_allergens_hu categorical 50 94.0% 3 long_tail null_rate
generic_name_cs categorical 50 94.0% 1 null_rate imbalance
ingredients_text_hu categorical 50 92.0% 4 long_tail null_rate
ingredients_text_sr categorical 50 96.0% 2 long_tail null_rate
packaging_text_sr categorical 50 96.0% 1 null_rate imbalance
ingredients_text_nl_ocr_1675675383 categorical 50 98.0% 1 long_tail null_rate imbalance
ingredients_text_with_allergens_cs categorical 50 98.0% 1 long_tail null_rate imbalance
generic_name_sr categorical 50 96.0% 2 long_tail null_rate
packaging_text_cs categorical 50 94.0% 1 null_rate imbalance
product_name_sr categorical 50 96.0% 2 long_tail null_rate
ingredients_text_hu_ocr_1571428260_result categorical 50 98.0% 1 long_tail null_rate imbalance
ingredients_text_hu_ocr_1571428260 categorical 50 98.0% 1 long_tail null_rate imbalance
generic_name_hu categorical 50 92.0% 2 null_rate
product_name_hu categorical 50 92.0% 3 long_tail null_rate
ingredients_text_with_allergens_sr categorical 50 96.0% 2 long_tail null_rate
ingredients_text_es_ocr_1548767061_result categorical 50 98.0% 1 long_tail null_rate imbalance
product_name_xx categorical 50 96.0% 1 null_rate imbalance
generic_name_xx categorical 50 96.0% 1 null_rate imbalance
ingredients_text_es_ocr_1548767061 categorical 50 98.0% 1 long_tail null_rate imbalance
ingredients_text_xx categorical 50 96.0% 1 null_rate imbalance
origin_xx categorical 50 98.0% 1 long_tail null_rate imbalance
packaging_text_xx categorical 50 98.0% 1 long_tail null_rate imbalance
ingredients_text_ur categorical 50 98.0% 1 long_tail null_rate imbalance
product_name_ur categorical 50 98.0% 1 long_tail null_rate imbalance
origin_he categorical 50 98.0% 1 long_tail null_rate imbalance
product_name_he categorical 50 96.0% 2 long_tail null_rate
origin_ur categorical 50 98.0% 1 long_tail null_rate imbalance
generic_name_ur categorical 50 98.0% 1 long_tail null_rate imbalance
packaging_text_he categorical 50 98.0% 1 long_tail null_rate imbalance
ingredients_text_he categorical 50 98.0% 1 long_tail null_rate imbalance
packaging_text_ur categorical 50 98.0% 1 long_tail null_rate imbalance
generic_name_he categorical 50 98.0% 1 long_tail null_rate imbalance
ingredients_text_with_allergens_he categorical 50 98.0% 1 long_tail null_rate imbalance
nutriscore_grade_producer categorical 50 94.0% 3 long_tail null_rate
nutriscore_grade_producer_imported categorical 50 94.0% 3 long_tail null_rate
packaging_text_el categorical 50 98.0% 1 long_tail null_rate imbalance
ingredients_text_with_allergens_el categorical 50 98.0% 1 long_tail null_rate imbalance
ingredients_text_el categorical 50 98.0% 1 long_tail null_rate imbalance
generic_name_el categorical 50 98.0% 1 long_tail null_rate imbalance
origin_el categorical 50 98.0% 1 long_tail null_rate imbalance
product_name_el categorical 50 98.0% 1 long_tail null_rate imbalance
generic_name_th categorical 50 98.0% 1 long_tail null_rate imbalance
ingredients_text_de_ocr_1559410715_result categorical 50 98.0% 1 long_tail null_rate imbalance
ingredients_text_with_allergens_th categorical 50 98.0% 1 long_tail null_rate imbalance
packaging_text_th categorical 50 98.0% 1 long_tail null_rate imbalance
product_name_th categorical 50 98.0% 1 long_tail null_rate imbalance
ingredients_text_de_ocr_1548767354_result categorical 50 98.0% 1 long_tail null_rate imbalance
ingredients_text_th categorical 50 98.0% 1 long_tail null_rate imbalance
origin_th categorical 50 98.0% 1 long_tail null_rate imbalance
ingredients_text_de_ocr_1548767354 categorical 50 98.0% 1 long_tail null_rate imbalance
ingredients_text_de_ocr_1559410715 categorical 50 98.0% 1 long_tail null_rate imbalance
ingredients_text_it_ocr_1559410715 categorical 50 98.0% 1 long_tail null_rate imbalance
ingredients_text_it_ocr_1559410715_result categorical 50 98.0% 1 long_tail null_rate imbalance
packaging_text_fr_imported categorical 50 98.0% 1 long_tail null_rate imbalance
preparation_fr_imported categorical 50 98.0% 1 long_tail null_rate imbalance
preparation categorical 50 98.0% 1 long_tail null_rate imbalance
preparation_fr categorical 50 98.0% 1 long_tail null_rate imbalance
ingredients_text_lc categorical 50 98.0% 1 long_tail null_rate imbalance
product_name_lc categorical 50 98.0% 1 long_tail null_rate imbalance
ingredients_text_with_allergens_lc categorical 50 98.0% 1 long_tail null_rate imbalance
generic_name_lc categorical 50 98.0% 1 long_tail null_rate imbalance
ingredients_text_xx_debug_tags unknown 50 0.0% skipped
product_name_xx_debug_tags unknown 50 0.0% skipped
generic_name_xx_debug_tags unknown 50 0.0% skipped
ingredients_text_fr_ocr_1561814324 categorical 50 98.0% 1 long_tail null_rate imbalance
ingredients_text_fr_ocr_1561814324_result categorical 50 98.0% 1 long_tail null_rate imbalance
ingredients_text_fr_ocr_1624039072_result categorical 50 98.0% 1 long_tail null_rate imbalance
ingredients_text_fr_ocr_1624039072 categorical 50 98.0% 1 long_tail null_rate imbalance
ingredients_text_fr_ocr_1573108346 categorical 50 98.0% 1 long_tail null_rate imbalance
ingredients_text_fr_ocr_1566920858_result categorical 50 98.0% 1 long_tail null_rate imbalance
ingredients_text_fr_ocr_1573107556 categorical 50 98.0% 1 long_tail null_rate imbalance
ingredients_text_fr_ocr_1573108346_result categorical 50 98.0% 1 long_tail null_rate imbalance
ingredients_text_fr_ocr_1573107560_result categorical 50 98.0% 1 long_tail null_rate imbalance
ingredients_text_fr_ocr_1573108349_result categorical 50 98.0% 1 long_tail null_rate imbalance
ingredients_text_fr_ocr_1573108360 categorical 50 98.0% 1 long_tail null_rate imbalance
ingredients_text_fr_ocr_1573109955_result categorical 50 98.0% 1 long_tail null_rate imbalance
ingredients_text_fr_ocr_1573108349 categorical 50 98.0% 1 long_tail null_rate imbalance
ingredients_text_fr_ocr_1573109955 categorical 50 98.0% 1 long_tail null_rate imbalance
ingredients_text_fr_ocr_1573107556_result categorical 50 98.0% 1 long_tail null_rate imbalance
ingredients_text_fr_ocr_1573108360_result categorical 50 98.0% 1 long_tail null_rate imbalance
ingredients_text_fr_ocr_1573107560 categorical 50 98.0% 1 long_tail null_rate imbalance
ingredients_text_fr_ocr_1566920858 categorical 50 98.0% 1 long_tail null_rate imbalance
generic_name_lt categorical 50 98.0% 1 long_tail null_rate imbalance
ingredients_text_with_allergens_ro categorical 50 98.0% 1 long_tail null_rate imbalance
packaging_text_lt categorical 50 98.0% 1 long_tail null_rate imbalance
ingredients_text_lt categorical 50 98.0% 1 long_tail null_rate imbalance
origin_lt categorical 50 98.0% 1 long_tail null_rate imbalance
product_name_lt categorical 50 98.0% 1 long_tail null_rate imbalance
ingredients_text_with_allergens_lt categorical 50 98.0% 1 long_tail null_rate imbalance
ingredients_text_fr_ocr_1713713129 categorical 50 98.0% 1 long_tail null_rate imbalance
ingredients_text_fr_ocr_1713713129_result categorical 50 98.0% 1 long_tail null_rate imbalance
Fig 1.
nutriscore_grade · Check how heavily the sample skews to Nutri-Score 'e' (27 of 50) versus a/b/c.
Top values for nutriscore_grade (6 unique shown, of 6 total).
value count share
e 27 54.0%
d 9 18.0%
c 7 14.0%
a 4 8.0%
b 2 4.0%
unknown 1 2.0%
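Every "Top values" table in this notebook lists a value, its row count, and its share. The sketch below rebuilds the Fig 1 table in plain Python. It assumes shares are taken over all rows, nulls included; the mostly-null columns in Figs 11 to 13 still report shares out of 50, which points that way. `top_values` is a hypothetical helper, not part of saturn-dissect.

```python
from collections import Counter


def top_values(values, k=20):
    """Return (value, count, share) rows, most frequent first.

    None marks a null. Nulls are left out of the counts but kept in the
    denominator, so shares are out of all rows.
    """
    n = len(values)
    counts = Counter(v for v in values if v is not None)
    return [(value, count, count / n) for value, count in counts.most_common(k)]


# Rebuild the nutriscore_grade column from the Fig 1 counts.
grades = ["e"] * 27 + ["d"] * 9 + ["c"] * 7 + ["a"] * 4 + ["b"] * 2 + ["unknown"]
for value, count, share in top_values(grades):
    print(f"{value} {count} {share:.1%}")
```

Run as is, it prints the six rows of the table above in the same order. Ties would fall back to first-seen order, which `Counter.most_common` preserves.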
Fig 2.
nova_groups · See the split between ultra-processed (NOVA 4) and processed (NOVA 3) products, which dominate the sample.
Top values for nova_groups (3 unique shown, of 3 total).
value count share
4 33 66.0%
3 14 28.0%
1 1 2.0%
Fig 3.
ecoscore_grade · Compare Eco-Score grade frequencies including the notable 'unknown' and 'not-applicable' buckets.
Top values for ecoscore_grade (9 unique shown, of 9 total).
value count share
e 12 24.0%
d 9 18.0%
b 8 16.0%
c 8 16.0%
unknown 6 12.0%
a 3 6.0%
a-plus 2 4.0%
not-applicable 1 2.0%
f 1 2.0%
Fig 4.
pnns_groups_2 · Confirm that biscuits/cakes and chocolate products together account for most of the sample's food categories.
Top values for pnns_groups_2 (11 unique shown, of 11 total).
value count share
Biscuits and cakes 17 34.0%
Chocolate products 16 32.0%
Appetizers 4 8.0%
Pastries 3 6.0%
Bread 2 4.0%
unknown 2 4.0%
Sweets 2 4.0%
Dairy desserts 1 2.0%
Waters and flavored waters 1 2.0%
Cereals 1 2.0%
Dried fruits 1 2.0%
Fig 5.
lang · Spot the strong French-language bias (35/50) that will affect any text-based analysis.
Top values for lang (5 unique shown, of 5 total).
value count share
fr 35 70.0%
en 10 20.0%
de 3 6.0%
bg 1 2.0%
ro 1 2.0%
Fig 6.
Per-column null rate across the corpus. Columns are ordered by input position.
column kind null%
update_key categorical 0.0%
categories_old categorical 2.0%
ecoscore_score numeric 14.0%
environment_impact_level categorical 56.0%
ingredients_text_fi categorical 90.0%
nutrition_data_prepared categorical 4.0%
packaging_shapes_tags unknown 0.0%
nutrient_levels_tags unknown 0.0%
packagings_materials unknown 0.0%
ingredients_without_ecobalyse_ids unknown 0.0%
generic_name_nl categorical 76.0%
misc_tags unknown 0.0%
product_name_sv categorical 92.0%
scans_n numeric 0.0%
schema_version numeric 0.0%
url categorical 0.0%
vitamins_tags unknown 0.0%
debug_param_sorted_langs unknown 0.0%
packaging categorical 12.0%
grades unknown 0.0%
last_modified_t numeric 0.0%
origin_nl categorical 76.0%
allergens_lc categorical 4.0%
states_hierarchy unknown 0.0%
ingredients_text_ja categorical 98.0%
teams_tags unknown 0.0%
traces_from_user categorical 0.0%
origins_tags unknown 0.0%
serving_quantity_unit categorical 8.0%
vitamins_prev_tags unknown 0.0%
ingredients_hierarchy unknown 0.0%
unique_scans_n numeric 0.0%
labels categorical 2.0%
generic_name_en categorical 14.0%
weighters_tags unknown 0.0%
popularity_tags unknown 0.0%
product_name_fi categorical 90.0%
origin_fr categorical 8.0%
generic_name categorical 4.0%
nutriscore_version categorical 0.0%
ingredients_without_ciqual_codes unknown 0.0%
manufacturing_places_tags unknown 0.0%
photographers_tags unknown 0.0%
packaging_text_pl categorical 90.0%
informers_tags unknown 0.0%
ingredients_text_en categorical 12.0%
ingredients_text_it categorical 68.0%
origin_de categorical 60.0%
nova_group numeric 4.0%
packaging_text_fi categorical 90.0%
states categorical 0.0%
ingredients_with_unspecified_percent_sum numeric 0.0%
added_countries_tags unknown 0.0%
id categorical 0.0%
nutrient_levels unknown 0.0%
sortkey numeric 12.0%
image_small_url categorical 0.0%
packaging_recycling_tags unknown 0.0%
food_groups categorical 2.0%
nova_groups_markers unknown 0.0%
packaging_text_de categorical 60.0%
categories_lc categorical 0.0%
checkers unknown 0.0%
packaging_text_es categorical 60.0%
unknown_nutrients_tags unknown 0.0%
editors_tags unknown 0.0%
nutrition_score_warning_fruits_vegetables_nuts_estimate_from_ingredients numeric 10.0%
labels_lc categorical 2.0%
nutriscore_data unknown 0.0%
other_nutritional_substances_tags unknown 0.0%
product_name_nb categorical 96.0%
nutrition_data_prepared_per categorical 0.0%
product_quantity categorical 6.0%
product_type categorical 0.0%
checkers_tags unknown 0.0%
nucleotides_tags unknown 0.0%
languages_tags unknown 0.0%
traces_lc categorical 4.0%
categories_hierarchy unknown 0.0%
image_front_small_url categorical 0.0%
entry_dates_tags unknown 0.0%
ecoscore_tags unknown 0.0%
nutrition_score_warning_fruits_vegetables_legumes_estimate_from_ingredients numeric 8.0%
ingredients_without_ciqual_codes_n numeric 0.0%
rev numeric 0.0%
ingredients_non_nutritive_sweeteners_n numeric 0.0%
ingredients_without_ecobalyse_ids_n numeric 0.0%
environment_impact_level_tags unknown 0.0%
last_image_dates_tags unknown 0.0%
labels_hierarchy unknown 0.0%
product_name_en categorical 14.0%
nutrition_score_warning_fruits_vegetables_legumes_estimate_from_ingredients_value numeric 8.0%
traces categorical 0.0%
generic_name_fi categorical 90.0%
emb_codes_orig categorical 34.0%
ingredients_with_specified_percent_n numeric 0.0%
nutrition_grades categorical 0.0%
weighers_tags unknown 0.0%
categories_tags unknown 0.0%
image_url categorical 0.0%
sources unknown 0.0%
languages_hierarchy unknown 0.0%
pnns_groups_1 categorical 0.0%
countries_lc categorical 2.0%
additives_tags unknown 0.0%
codes_tags unknown 0.0%
countries_tags unknown 0.0%
creator categorical 0.0%
ingredients unknown 0.0%
product_name_nl categorical 76.0%
ingredients_n_tags unknown 0.0%
origin_es categorical 60.0%
product_name_pl categorical 90.0%
scores unknown 0.0%
brands categorical 0.0%
ingredients_text_de categorical 60.0%
ingredients_text_nb categorical 96.0%
packagings_n numeric 18.0%
complete numeric 0.0%
emb_codes_20141016 categorical 58.0%
ingredients_tags unknown 0.0%
packaging_text_ja categorical 98.0%
generic_name_de categorical 60.0%
last_editor categorical 2.0%
minerals_prev_tags unknown 0.0%
last_image_t numeric 0.0%
obsolete_since_date categorical 12.0%
pnns_groups_2_tags unknown 0.0%
emb_codes_tags unknown 0.0%
countries_beforescanbot categorical 14.0%
nutrition_grade_fr categorical 0.0%
data_quality_tags unknown 0.0%
ingredients_with_specified_percent_sum numeric 0.0%
origin_it categorical 68.0%
nutrition_data_per categorical 0.0%
origin_pl categorical 90.0%
product unknown 0.0%
link categorical 4.0%
ingredients_text_nl categorical 76.0%
additives_n numeric 0.0%
generic_name_sv categorical 92.0%
ingredients_that_may_be_from_palm_oil_tags unknown 0.0%
known_ingredients_n numeric 0.0%
completeness numeric 0.0%
ingredients_sweeteners_n numeric 0.0%
nova_groups categorical 4.0%
allergens_hierarchy unknown 0.0%
obsolete categorical 12.0%
origin_sv categorical 92.0%
packaging_hierarchy unknown 0.0%
ingredients_with_unspecified_percent_n numeric 0.0%
fruits-vegetables-nuts_100g_estimate numeric 46.0%
emb_codes categorical 4.0%
packagings unknown 0.0%
purchase_places_tags unknown 0.0%
additives_original_tags unknown 0.0%
image_front_url categorical 0.0%
data_quality_bugs_tags unknown 0.0%
origin_fi categorical 90.0%
images unknown 0.0%
ingredients_analysis unknown 0.0%
ingredients_text_with_allergens_pl categorical 92.0%
product_name_de categorical 60.0%
ingredients_text_with_allergens_nb categorical 96.0%
packaging_text_it categorical 68.0%
product_name_it categorical 68.0%
serving_quantity categorical 12.0%
product_name_ja categorical 98.0%
ingredients_text_with_allergens_sv categorical 92.0%
allergens_tags unknown 0.0%
ingredients_text_fr categorical 4.0%
nutrition_score_beverage numeric 0.0%
ingredients_ids_debug unknown 0.0%
nutrition_data categorical 2.0%
origin_ja categorical 98.0%
packaging_text_en categorical 14.0%
unknown_ingredients_n numeric 0.0%
ingredients_from_palm_oil_tags unknown 0.0%
labels_tags unknown 0.0%
packaging_old_before_taxonomization categorical 24.0%
packaging_text_nb categorical 96.0%
nutrition_grades_tags unknown 0.0%
category_properties unknown 0.0%
nutriscore_score numeric 2.0%
packaging_tags unknown 0.0%
labels_old categorical 8.0%
packaging_text categorical 4.0%
ingredients_percent_analysis numeric 0.0%
ecoscore_data unknown 0.0%
ingredients_text_sv categorical 92.0%
brands_tags unknown 0.0%
compared_to_category categorical 0.0%
data_sources categorical 0.0%
other_nutritional_substances_prev_tags unknown 0.0%
ingredients_from_palm_oil_n numeric 8.0%
last_updated_t numeric 0.0%
nutrition_score_debug categorical 0.0%
popularity_key numeric 0.0%
product_name_es categorical 60.0%
allergens_from_user categorical 0.0%
informers unknown 0.0%
brands_old categorical 32.0%
data_quality_errors_tags unknown 0.0%
ingredients_text categorical 0.0%
categories categorical 0.0%
nutrition_score_warning_fruits_vegetables_nuts_estimate_from_ingredients_value numeric 10.0%
ingredients_from_or_that_may_be_from_palm_oil_n numeric 6.0%
origins_old categorical 22.0%
packaging_text_nl categorical 76.0%
expiration_date categorical 4.0%
selected_images unknown 0.0%
traces_from_ingredients categorical 0.0%
ingredients_text_with_allergens categorical 0.0%
image_front_thumb_url categorical 0.0%
lc categorical 0.0%
ingredients_text_debug categorical 28.0%
packagings_materials_main categorical 62.0%
data_quality_dimensions unknown 0.0%
serving_size categorical 12.0%
pnns_groups_1_tags unknown 0.0%
origin categorical 6.0%
ingredients_lc categorical 0.0%
packaging_old categorical 14.0%
packaging_text_fr categorical 6.0%
nova_group_debug categorical 0.0%
ingredients_original_tags unknown 0.0%
data_quality_completeness_tags unknown 0.0%
cities_tags unknown 0.0%
countries_hierarchy unknown 0.0%
nutriscore_score_opposite numeric 2.0%
categories_properties_tags unknown 0.0%
origins_lc categorical 4.0%
ciqual_food_name_tags unknown 0.0%
countries categorical 0.0%
ingredients_text_with_allergens_it categorical 68.0%
packaging_lc categorical 12.0%
correctors_tags unknown 0.0%
interface_version_created categorical 2.0%
states_tags unknown 0.0%
nutriscore_2021_tags unknown 0.0%
stores_tags unknown 0.0%
image_thumb_url categorical 0.0%
categories_properties unknown 0.0%
nucleotides_prev_tags unknown 0.0%
allergens_from_ingredients categorical 0.0%
ingredients_text_with_allergens_fi categorical 90.0%
_keywords unknown 0.0%
manufacturing_places categorical 2.0%
pnns_groups_2 categorical 0.0%
ingredients_text_pl categorical 90.0%
generic_name_es categorical 60.0%
origin_en categorical 14.0%
generic_name_it categorical 68.0%
ingredients_that_may_be_from_palm_oil_n numeric 8.0%
ingredients_text_es categorical 60.0%
teams categorical 8.0%
food_groups_tags unknown 0.0%
data_quality_warnings_tags unknown 0.0%
debug_tags unknown 0.0%
main_countries_tags unknown 0.0%
origins_hierarchy unknown 0.0%
packagings_complete numeric 4.0%
nutriscore_tags unknown 0.0%
ingredients_text_with_allergens_nl categorical 78.0%
created_t numeric 0.0%
traces_hierarchy unknown 0.0%
generic_name_nb categorical 96.0%
ingredients_text_with_allergens_de categorical 66.0%
ingredients_text_with_allergens_es categorical 62.0%
product_name_fr categorical 2.0%
stores categorical 4.0%
_id categorical 0.0%
nutriments unknown 0.0%
editors unknown 0.0%
max_imgid categorical 0.0%
nutriscore_grade categorical 0.0%
product_quantity_unit categorical 10.0%
ingredients_analysis_tags unknown 0.0%
ingredients_text_with_allergens_fr categorical 4.0%
interface_version_modified categorical 0.0%
data_sources_tags unknown 0.0%
ingredients_text_with_allergens_en categorical 16.0%
removed_countries_tags unknown 0.0%
amino_acids_prev_tags unknown 0.0%
code categorical 0.0%
correctors unknown 0.0%
generic_name_ja categorical 98.0%
generic_name_fr categorical 6.0%
generic_name_pl categorical 90.0%
amino_acids_tags unknown 0.0%
ingredients_debug unknown 0.0%
ingredients_text_with_allergens_ja categorical 98.0%
data_quality_info_tags unknown 0.0%
last_edit_dates_tags unknown 0.0%
last_modified_by categorical 2.0%
no_nutrition_data categorical 4.0%
nutriscore unknown 0.0%
origin_nb categorical 96.0%
origins categorical 4.0%
nova_groups_tags unknown 0.0%
languages unknown 0.0%
nutriscore_2023_tags unknown 0.0%
packaging_materials_tags unknown 0.0%
lang categorical 0.0%
packaging_text_sv categorical 92.0%
photographers unknown 0.0%
languages_codes unknown 0.0%
ecoscore_grade categorical 0.0%
ingredients_n numeric 0.0%
allergens categorical 0.0%
minerals_tags unknown 0.0%
product_name categorical 0.0%
purchase_places categorical 2.0%
quantity categorical 2.0%
traces_tags unknown 0.0%
origin_uk categorical 98.0%
generic_name_ar categorical 80.0%
packaging_text_uk categorical 98.0%
ingredients_text_ar categorical 78.0%
ingredients_text_uk categorical 98.0%
last_check_dates_tags unknown 0.0%
checked categorical 86.0%
packaging_text_ar categorical 80.0%
carbon_footprint_percent_of_known_ingredients numeric 62.0%
last_checker categorical 86.0%
product_name_uk categorical 98.0%
generic_name_uk categorical 98.0%
product_name_ar categorical 78.0%
carbon_footprint_from_known_ingredients_debug categorical 72.0%
last_checked_t numeric 86.0%
ingredients_text_with_allergens_uk categorical 98.0%
ingredients_text_with_allergens_ar categorical 82.0%
origin_ar categorical 80.0%
nutriments_estimated unknown 0.0%
nutrition_score_warning_no_fiber numeric 70.0%
ingredients_text_debug_tags unknown 0.0%
taxonomies_enhancer_tags unknown 0.0%
completed_t numeric 68.0%
product_name_bg categorical 94.0%
ingredients_text_et categorical 94.0%
origin_sl categorical 98.0%
generic_name_dz categorical 98.0%
ingredients_text_sl categorical 98.0%
generic_name_ca categorical 96.0%
ingredients_text_dz categorical 98.0%
product_name_ca categorical 96.0%
origin_ca categorical 96.0%
product_name_et categorical 94.0%
ingredients_text_with_allergens_bg categorical 94.0%
ingredients_text_with_allergens_et categorical 94.0%
origin_sk categorical 98.0%
origin_bg categorical 94.0%
packaging_text_sl categorical 98.0%
generic_name_sk categorical 98.0%
ingredients_text_with_allergens_sl categorical 98.0%
ingredients_text_ca categorical 96.0%
generic_name_sl categorical 98.0%
product_name_dz categorical 98.0%
origin_et categorical 94.0%
ingredients_text_with_allergens_sk categorical 98.0%
product_name_sk categorical 98.0%
ingredients_text_with_allergens_pt categorical 84.0%
ingredients_text_with_allergens_ca categorical 98.0%
generic_name_pt categorical 80.0%
packaging_text_pt categorical 80.0%
ingredients_text_pt categorical 80.0%
origin_pt categorical 80.0%
nutrition_score_warning_nutriments_estimated numeric 96.0%
packaging_text_bg categorical 94.0%
generic_name_et categorical 94.0%
packaging_text_ca categorical 96.0%
product_name_sl categorical 98.0%
generic_name_bg categorical 94.0%
ingredients_text_sk categorical 98.0%
ingredients_text_bg categorical 94.0%
packaging_text_et categorical 94.0%
packaging_text_sk categorical 98.0%
product_name_pt categorical 80.0%
abbreviated_product_name_fr categorical 86.0%
obsolete_imported categorical 86.0%
sources_fields unknown 0.0%
emb_code categorical 98.0%
lang_imported categorical 86.0%
generic_name_zh categorical 98.0%
conservation_conditions_fr_imported categorical 86.0%
origin_fr_imported categorical 96.0%
owner categorical 86.0%
ingredients_text_fr_imported categorical 86.0%
owners_tags categorical 86.0%
product_name_zh categorical 98.0%
nutrition_data_prepared_per_imported categorical 86.0%
abbreviated_product_name_fr_imported categorical 86.0%
generic_name_zh_debug_tags unknown 0.0%
customer_service_fr categorical 86.0%
customer_service_fr_imported categorical 86.0%
ingredients_text_zh_debug_tags unknown 0.0%
product_name_fr_imported categorical 86.0%
brands_imported categorical 86.0%
owner_imported categorical 88.0%
product_name_zh_debug_tags unknown 0.0%
lc_imported categorical 84.0%
ingredients_text_zh categorical 98.0%
quantity_imported categorical 86.0%
nutrition_data_per_imported categorical 84.0%
generic_name_fr_imported categorical 86.0%
owner_fields unknown 0.0%
categories_imported categorical 88.0%
conservation_conditions_fr categorical 86.0%
conservation_conditions categorical 86.0%
countries_imported categorical 84.0%
origins_fr categorical 96.0%
abbreviated_product_name categorical 86.0%
customer_service categorical 86.0%
data_sources_imported categorical 84.0%
nova_group_error categorical 96.0%
ingredients_text_de_ocr_1648897071_result categorical 98.0%
packaging_text_ro categorical 96.0%
product_name_ro categorical 96.0%
producer_version_id categorical 92.0%
serving_size_imported categorical 88.0%
no_nutrition_data_imported categorical 92.0%
packaging_imported categorical 92.0%
ingredients_text_ro categorical 96.0%
producer_version_id_imported categorical 92.0%
labels_imported categorical 90.0%
ingredients_text_de_ocr_1648990410_result categorical 98.0%
allergens_imported categorical 90.0%
ingredients_text_de_ocr_1648990410 categorical 98.0%
ingredients_text_de_ocr_1648897071 categorical 98.0%
generic_name_ro categorical 96.0%
origin_ro categorical 96.0%
abbreviated_product_name_imported categorical 94.0%
traces_imported categorical 92.0%
specific_ingredients unknown 0.0%
product_name_ru categorical 94.0%
origin_ru categorical 94.0%
ingredients_text_with_allergens_ru categorical 94.0%
packaging_text_ru categorical 94.0%
generic_name_ru categorical 94.0%
ingredients_text_ru categorical 94.0%
ingredients_text_da categorical 96.0%
ingredients_text_with_allergens_da categorical 96.0%
product_name_da categorical 96.0%
packaging_text_da categorical 96.0%
generic_name_da categorical 96.0%
forest_footprint_data unknown 0.0%
origin_da categorical 96.0%
origin_sr categorical 96.0%
ingredients_text_nl_ocr_1675675383_result categorical 98.0%
ingredients_text_cs categorical 94.0%
product_name_cs categorical 94.0%
origin_hu categorical 92.0%
packaging_text_hu categorical 92.0%
origin_cs categorical 96.0%
ingredients_text_with_allergens_hu categorical 94.0%
generic_name_cs categorical 94.0%
ingredients_text_hu categorical 92.0%
ingredients_text_sr categorical 96.0%
packaging_text_sr categorical 96.0%
ingredients_text_nl_ocr_1675675383 categorical 98.0%
ingredients_text_with_allergens_cs categorical 98.0%
generic_name_sr categorical 96.0%
packaging_text_cs categorical 94.0%
product_name_sr categorical 96.0%
ingredients_text_hu_ocr_1571428260_result categorical 98.0%
ingredients_text_hu_ocr_1571428260 categorical 98.0%
generic_name_hu categorical 92.0%
product_name_hu categorical 92.0%
ingredients_text_with_allergens_sr categorical 96.0%
ingredients_text_es_ocr_1548767061_result categorical 98.0%
product_name_xx categorical 96.0%
generic_name_xx categorical 96.0%
ingredients_text_es_ocr_1548767061 categorical 98.0%
ingredients_text_xx categorical 96.0%
origin_xx categorical 98.0%
packaging_text_xx categorical 98.0%
ingredients_text_ur categorical 98.0%
product_name_ur categorical 98.0%
origin_he categorical 98.0%
product_name_he categorical 96.0%
origin_ur categorical 98.0%
generic_name_ur categorical 98.0%
packaging_text_he categorical 98.0%
ingredients_text_he categorical 98.0%
packaging_text_ur categorical 98.0%
generic_name_he categorical 98.0%
ingredients_text_with_allergens_he categorical 98.0%
nutriscore_grade_producer categorical 94.0%
nutriscore_grade_producer_imported categorical 94.0%
packaging_text_el categorical 98.0%
ingredients_text_with_allergens_el categorical 98.0%
ingredients_text_el categorical 98.0%
generic_name_el categorical 98.0%
origin_el categorical 98.0%
product_name_el categorical 98.0%
generic_name_th categorical 98.0%
ingredients_text_de_ocr_1559410715_result categorical 98.0%
ingredients_text_with_allergens_th categorical 98.0%
packaging_text_th categorical 98.0%
product_name_th categorical 98.0%
ingredients_text_de_ocr_1548767354_result categorical 98.0%
ingredients_text_th categorical 98.0%
origin_th categorical 98.0%
ingredients_text_de_ocr_1548767354 categorical 98.0%
ingredients_text_de_ocr_1559410715 categorical 98.0%
ingredients_text_it_ocr_1559410715 categorical 98.0%
ingredients_text_it_ocr_1559410715_result categorical 98.0%
packaging_text_fr_imported categorical 98.0%
preparation_fr_imported categorical 98.0%
preparation categorical 98.0%
preparation_fr categorical 98.0%
ingredients_text_lc categorical 98.0%
product_name_lc categorical 98.0%
ingredients_text_with_allergens_lc categorical 98.0%
generic_name_lc categorical 98.0%
ingredients_text_xx_debug_tags unknown 0.0%
product_name_xx_debug_tags unknown 0.0%
generic_name_xx_debug_tags unknown 0.0%
ingredients_text_fr_ocr_1561814324 categorical 98.0%
ingredients_text_fr_ocr_1561814324_result categorical 98.0%
ingredients_text_fr_ocr_1624039072_result categorical 98.0%
ingredients_text_fr_ocr_1624039072 categorical 98.0%
ingredients_text_fr_ocr_1573108346 categorical 98.0%
ingredients_text_fr_ocr_1566920858_result categorical 98.0%
ingredients_text_fr_ocr_1573107556 categorical 98.0%
ingredients_text_fr_ocr_1573108346_result categorical 98.0%
ingredients_text_fr_ocr_1573107560_result categorical 98.0%
ingredients_text_fr_ocr_1573108349_result categorical 98.0%
ingredients_text_fr_ocr_1573108360 categorical 98.0%
ingredients_text_fr_ocr_1573109955_result categorical 98.0%
ingredients_text_fr_ocr_1573108349 categorical 98.0%
ingredients_text_fr_ocr_1573109955 categorical 98.0%
ingredients_text_fr_ocr_1573107556_result categorical 98.0%
ingredients_text_fr_ocr_1573108360_result categorical 98.0%
ingredients_text_fr_ocr_1573107560 categorical 98.0%
ingredients_text_fr_ocr_1566920858 categorical 98.0%
generic_name_lt categorical 98.0%
ingredients_text_with_allergens_ro categorical 98.0%
packaging_text_lt categorical 98.0%
ingredients_text_lt categorical 98.0%
origin_lt categorical 98.0%
product_name_lt categorical 98.0%
ingredients_text_with_allergens_lt categorical 98.0%
ingredients_text_fr_ocr_1713713129 categorical 98.0%
ingredients_text_fr_ocr_1713713129_result categorical 98.0%
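The summary advises treating the sparse localized columns as effectively empty. One way to apply that to the raw records is sketched below, assuming each product is a JSON object in which an absent key and an explicit null both count as missing. The 0.9 cut-off is an illustrative choice, not a Saturn default, and the records are made up.

```python
def null_rates(records):
    """Fraction of records in which each key is absent or None."""
    columns = sorted({key for record in records for key in record})
    n = len(records)
    return {
        column: sum(1 for record in records if record.get(column) is None) / n
        for column in columns
    }


def mostly_empty(rates, threshold=0.9):
    """Columns whose null rate is at or above the threshold."""
    return sorted(column for column, rate in rates.items() if rate >= threshold)


# Hypothetical records: every product has a code, one in ten has a Finnish name.
records = [{"code": str(i), "product_name_fi": None} for i in range(9)]
records.append({"code": "9", "product_name_fi": "Tumma suklaa"})

rates = null_rates(records)
print(rates)                # {'code': 0.0, 'product_name_fi': 0.9}
print(mostly_empty(rates))  # ['product_name_fi']
```

Empty strings are not treated as missing here, which matches the profiler: it counts the 22 empty strings in environment_impact_level as present values.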
Fig 7.
Pearson correlation across numeric columns (sampled, bounded).
Pearson correlation across 12 numeric columns (values clipped to 2 decimals).
column ecoscore_score scans_n schema_version last_modified_t unique_scans_n nova_group ingredients_with_unspecified_percent_sum sortkey nutrition_score_warning_fruits_vegetables_nuts_estimate_from_ingredients nutrition_score_warning_fruits_vegetables_legumes_estimate_from_ingredients ingredients_without_ciqual_codes_n rev
ecoscore_score +1.00 +1.00 nan +1.00 +1.00 -1.00 +1.00 -1.00 nan nan -1.00 -1.00
scans_n +1.00 +1.00 nan +1.00 +1.00 -1.00 +1.00 -1.00 nan nan -1.00 -1.00
schema_version nan nan nan nan nan nan nan nan nan nan nan nan
last_modified_t +1.00 +1.00 nan +1.00 +1.00 -1.00 +1.00 -1.00 nan nan -1.00 -1.00
unique_scans_n +1.00 +1.00 nan +1.00 +1.00 -1.00 +1.00 -1.00 nan nan -1.00 -1.00
nova_group -1.00 -1.00 nan -1.00 -1.00 +1.00 -1.00 +1.00 nan nan +1.00 +1.00
ingredients_with_unspecified_percent_sum +1.00 +1.00 nan +1.00 +1.00 -1.00 +1.00 -1.00 nan nan -1.00 -1.00
sortkey -1.00 -1.00 nan -1.00 -1.00 +1.00 -1.00 +1.00 nan nan +1.00 +1.00
nutrition_score_warning_fruits_vegetables_nuts_estimate_from_ingredients nan nan nan nan nan nan nan nan nan nan nan nan
nutrition_score_warning_fruits_vegetables_legumes_estimate_from_ingredients nan nan nan nan nan nan nan nan nan nan nan nan
ingredients_without_ciqual_codes_n -1.00 -1.00 nan -1.00 -1.00 +1.00 -1.00 +1.00 nan nan +1.00 +1.00
rev -1.00 -1.00 nan -1.00 -1.00 +1.00 -1.00 +1.00 nan nan +1.00 +1.00
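Every defined cell in Fig 7 is exactly +1.00 or -1.00, and the schema_version and two nutrition_score_warning rows and columns are entirely nan. The notebook does not say how Saturn handles missing values or sampling here, but two general properties of Pearson's r would produce this pattern. If only two rows contribute to a coefficient, any two points with distinct values lie on a line, so r is ±1. A column that is constant (or missing) over those rows has no variance, so r is undefined. The sketch below is a plain-Python Pearson over pairwise-complete rows, not Saturn's implementation, and shows both effects.

```python
import math


def pearson(xs, ys):
    """Pearson r over pairwise-complete observations (None marks a missing value)."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return math.nan  # too few overlapping rows to define r
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        return math.nan  # zero variance: r is undefined
    return sxy / math.sqrt(sxx * syy)


# Two overlapping rows always give a perfect correlation.
print(pearson([1, 5, None], [10, 3, 7]))  # -1.0
# A constant column gives nan, whatever it is paired with.
print(pearson([3, 3, 3], [1, 2, 4]))      # nan
```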

update_key categorical metadata

A categorical update_key field with only 9 distinct values across 50 rows, dominated by 'brands' at 56% (28/50) and 'sort' at 20% (10/50). The long tail mixes human-readable labels ('divinfood', 'nova-yogurts', 'germany2', 'france') with timestamp- and date-stamped tokens ('key_1748337248', 'ingredients20240805'), suggesting inconsistent naming conventions for what appears to track update batches or jobs. An entropy ratio of 0.64 confirms the heavy concentration on a few keys.

Treatment: Group rare keys into 'other' or normalize naming before using as a grouping dimension.
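Grouping rare keys, as the treatment suggests, could look like the sketch below. The threshold of two occurrences and the label "other" are illustrative assumptions; with them, the five singleton keys in Fig 8 fold into a single bucket of five rows.

```python
from collections import Counter


def collapse_rare(values, min_count=2, other="other"):
    """Replace categories seen fewer than min_count times with a shared label."""
    counts = Counter(values)
    return [value if counts[value] >= min_count else other for value in values]


# update_key as reported in Fig 8: five of the nine keys are singletons.
update_key = (
    ["brands"] * 28 + ["sort"] * 10 + ["divinfood"] * 5 + ["key_1748337248"] * 2
    + ["nova-yogurts", "key_1744830970", "ingredients20240805", "germany2", "france"]
)
print(Counter(collapse_rare(update_key)))
```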

anthropic:claude-opus-4-7 · confidence high
Out[13]:

saturn.columns["update_key"].stats

stat value
n 50
nulls 0 (0.0%)
unique 9
top_value brands
top_rate 0.56
cardinality 9
entropy 2.015
entropy_ratio 0.6357
alert: long_tail 5 singleton categories
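The entropy figures in these stats blocks are consistent with Shannon entropy in bits over the non-null values, and entropy_ratio with that entropy divided by log2(cardinality): the update_key counts in Fig 8 give 2.015 bits, and 2.015 / log2(9) ≈ 0.6357. This is inferred from the reported numbers rather than documented Saturn behavior; the sketch reproduces them.

```python
import math
from collections import Counter


def entropy_bits(values):
    """Shannon entropy (base 2) of the non-null values."""
    counts = Counter(v for v in values if v is not None)
    total = sum(counts.values())
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def entropy_ratio(values):
    """Entropy divided by its maximum, log2 of the number of distinct non-null values."""
    k = len({v for v in values if v is not None})
    return entropy_bits(values) / math.log2(k) if k > 1 else 0.0


update_key = (
    ["brands"] * 28 + ["sort"] * 10 + ["divinfood"] * 5 + ["key_1748337248"] * 2
    + ["nova-yogurts", "key_1744830970", "ingredients20240805", "germany2", "france"]
)
print(round(entropy_bits(update_key), 3), round(entropy_ratio(update_key), 4))
# 2.015 0.6357
```

The same definitions also match the ingredients_text_fi card (1.922 bits, ratio 0.961) and give 0 for the single-valued columns.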
Fig 8.
Top values for update_key.
Top values for update_key (9 unique shown, of 9 total).
value count share
brands 28 56.0%
sort 10 20.0%
divinfood 5 10.0%
key_1748337248 2 4.0%
nova-yogurts 1 2.0%
key_1744830970 1 2.0%
ingredients20240805 1 2.0%
germany2 1 2.0%
france 1 2.0%

categories_old categorical feature

Hierarchical product category strings (Open Food Facts-style taxonomy paths), stored as comma-separated breadcrumbs. The column is near-unique, with 45 distinct values across 49 non-null rows and an entropy ratio of 0.99, and the strings appear in mixed languages (French, English, Spanish, Polish, Bulgarian Cyrillic), sometimes with 'en:' or 'fr:' prefixes, so direct grouping will fragment. The most common value covers only 2 of the 49 non-null rows (about 4%), and one row is null.

Treatment: Split on commas, normalize language, and keep only the top-level taxon as a categorical feature.
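A minimal sketch of that treatment, assuming the first comma-separated breadcrumb is the top-level taxon and that language prefixes always look like two lowercase letters plus a colon. The translation table is a hypothetical, hand-made mapping for the non-English top-level values visible in Fig 9; a real pipeline would more likely resolve names through the Open Food Facts taxonomy.

```python
import re

# Matches a taxonomy language prefix such as "en:" or "fr:".
LANG_PREFIX = re.compile(r"^[a-z]{2}:")

# Hypothetical mapping, built by hand from the Fig 9 values only.
TOP_LEVEL_EN = {
    "Botanas": "Snacks",
    "Przekąski": "Snacks",
    "Закуски": "Snacks",
    "Aliments et boissons à base de végétaux": "Plant-based foods and beverages",
    "Boissons et préparations de boissons": "Beverages",
    "Produits laitiers": "Dairies",
}


def top_level_taxon(path):
    """First breadcrumb of a category path, prefix stripped and mapped to English."""
    if path is None:
        return None
    first = LANG_PREFIX.sub("", path.split(",")[0].strip())
    return TOP_LEVEL_EN.get(first, first)


print(top_level_taxon("Botanas,Snacks dulces,Galletas y pasteles,Galletas,Galletas rellenas"))  # Snacks
print(top_level_taxon("Snacks, Snacks sucrés, Cacao et dérivés"))  # Snacks
```

Keeping only the first breadcrumb trades detail for stability; a second-level taxon could be extracted the same way if the 66% biscuits-plus-chocolate concentration in Fig 4 needs to be broken down further.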

anthropic:claude-opus-4-7 · confidence high
Out[16]:

saturn.columns["categories_old"].stats

stat value
n 50
nulls 1 (2.0%)
unique 45
top_value Snacks, Snacks sucrés, Biscuits et gâteaux, Biscuits, Biscuits secs
top_rate 0.04082
cardinality 45
entropy 5.451
entropy_ratio 0.9926
alert: long_tail 41 singleton categories
Fig 9.
Top values for categories_old.
Top values for categories_old (20 unique shown, of 45 total).
value count share
Snacks, Snacks sucrés, Biscuits et gâteaux, Biscuits, Biscuits secs 2 4.0%
Snacks, Snacks sucrés, Biscuits et gâteaux, Biscuits 2 4.0%
Aliments et boissons à base de végétaux, Aliments d'origine végétale, Snacks, Céréales et pommes de terre, Pains, Tartines craquantes extrudées, Pains croustillants 2 4.0%
Snacks, Sweet snacks, Cocoa and its products, Chocolates, Dark chocolates 2 4.0%
Dairies, Fermented foods, Fermented milk products, Cheeses, Cream cheeses, fr:Fromages-frais-sucres, en:yogurts 1 2.0%
Snacks, Snacks sucrés, Cacao et dérivés 1 2.0%
Przekąski, Słodkie przekąski, Kakao i produkty na bazie kakao, Czekolada, Czekolada deserowa, Czekolada gorzka 1 2.0%
Закуски, Сладки закуски, Какаови изделия, Шоколади, Тъмен шоколад 1 2.0%
Boissons et préparations de boissons, Boissons, Eaux, Eaux de sources, Boissons sans sucre ajouté 1 2.0%
Snacks, Snacks sucrés, Confiseries, Succédanés du chocolat, en:Vegecaos 1 2.0%
Snacks, Sweet snacks, Cocoa and its products, Confectioneries, Chocolates, Dark chocolates 1 2.0%
Snacks, Snacks sucrés, Biscuits et gâteaux, Biscuits, en:Biscuits et gâteaux, en:Snacks sucrés 1 2.0%
Snacks, Snacks sucrés, Biscuits et gâteaux, Biscuits, Biscuits sablés, Sablés à la noix de coco 1 2.0%
Botanas,Snacks dulces,Galletas y pasteles,Galletas,Galletas rellenas 1 2.0%
Produits laitiers, Produits fermentés, Produits laitiers fermentés, Snacks, Fromages, Snacks sucrés, Cacao et dérivés, Chocolats, Chocolats noirs, Chocolats noirs en tablette, Chocolat noir en tablette extra dégustation à 70% de cacao minimum 1 2.0%
Aliments et boissons à base de végétaux, Aliments d'origine végétale, Snacks, Céréales et pommes de terre, Snacks salés, Amuse-gueules, Chips et frites, Chips, Chips de pommes de terre, Chips de pommes de terre à l'huile de tournesol, en:Aliments d'origine végétale, en:Aliments et boissons à base de végétaux, en:Amuse-gueules, en:Chips, en:Chips de pommes de terre, en:Chips de pommes de terre classiques, en:Chips de pommes de terre à l'huile de tournesol, en:Chips et frites, en:Céréales et pommes de terre, en:Snacks salés 1 2.0%
Snacks, Snacks sucrés, Cacao et dérivés, Chocolats, Chocolats noirs, Chocolats noirs en tablette 1 2.0%
Snacks,Sweet snacks,Biscuits and cakes,Biscuits,Chocolate biscuits,Filled biscuits,Dark chocolate biscuits 1 2.0%
Snacks, Sweet snacks, Cocoa and its products, Chocolates, Dark chocolates, Cacao-et-derives, Chocolats, Chocolats-noirs, Chocolats-noirs-extra-fin 1 2.0%
Aliments et boissons à base de végétaux,Aliments d'origine végétale,Céréales et pommes de terre,Pains,Pains croustillants 1 2.0%

ecoscore_score numeric feature

Numeric Eco-Score rating per item, ranging from 13 to 94 with a mean of 47.7 and a median of 44. The distribution is mildly right-skewed (skew 0.31) and platykurtic (kurtosis -0.79), with a wide IQR of 36.5 and no outliers flagged. 14% of values (7 of 50) are null, and the 43 non-null rows hold only 31 distinct scores.

Treatment: Impute or flag the 14% nulls, then use as a continuous feature without transformation.
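A minimal sketch of the "impute or flag" option, assuming median imputation (the notebook does not name a statistic; the median is less sensitive to skew) plus a boolean indicator so a model can still see which rows were missing. The indicator may matter here: the 7 nulls match the 7 rows whose ecoscore_grade in Fig 3 is 'unknown' (6) or 'not-applicable' (1), which hints that the missingness is structural rather than random. The input values below are made up.

```python
import statistics


def impute_with_flag(values):
    """Fill None with the median of the observed values; also return a was-missing flag."""
    observed = [v for v in values if v is not None]
    fill = statistics.median(observed)
    filled = [fill if v is None else v for v in values]
    flags = [v is None for v in values]
    return filled, flags


scores = [13, 44, None, 94, 27.5, None]  # hypothetical ecoscore_score values
filled, flags = impute_with_flag(scores)
print(filled)  # [13, 44, 35.75, 94, 27.5, 35.75]
print(flags)   # [False, False, True, False, False, True]
```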

anthropic:claude-opus-4-7 · confidence high
Out[19]:

saturn.columns["ecoscore_score"].stats

stat value
n 50
nulls 7 (14.0%)
unique 31
min 13
max 94
mean 47.74
median 44
std 21.19
q1 27.5
q3 64
iqr 36.5
skew 0.3069
kurtosis -0.7946
n_outliers 0
outlier_rate 0
zero_rate 0
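The report does not say how n_outliers is computed. Assuming Tukey's 1.5 × IQR rule, the reported quartiles put the fences far outside the observed 13 to 94 range, which is consistent with n_outliers 0:

```python
def tukey_fences(q1, q3, k=1.5):
    """Lower and upper outlier fences: q1 - k*IQR and q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


low, high = tukey_fences(27.5, 64)  # q1 and q3 from the stats above
print(low, high)                    # -27.25 118.75
print(low <= 13 and 94 <= high)     # True: min and max sit inside the fences
```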
Fig 10.
Distribution of ecoscore_score. Vertical dash marks the median.
Histogram bins for ecoscore_score (median: 44.0).
bin count
13 – 26.5 10
26.5 – 40 4
40 – 53.5 12
53.5 – 67 8
67 – 80.5 6
80.5 – 94 3
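The six bins are equal-width, (94 - 13) / 6 = 13.5 units each starting at the minimum, and their counts sum to 43, the number of non-null scores. A sketch of that binning follows. It assumes bins are closed on the left with the maximum folded into the last bin; the notebook does not say how boundary values are assigned, and the input values are illustrative.

```python
def equal_width_histogram(values, lo, hi, bins):
    """Counts per equal-width bin over [lo, hi]; the last bin also includes hi."""
    width = (hi - lo) / bins
    edges = [lo + i * width for i in range(bins + 1)]
    counts = [0] * bins
    for v in values:
        if v is None or v < lo or v > hi:
            continue  # skip nulls and out-of-range values
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    return edges, counts


edges, counts = equal_width_histogram([13, 26.5, 50, 94, None], lo=13, hi=94, bins=6)
print(edges)   # [13.0, 26.5, 40.0, 53.5, 67.0, 80.5, 94.0]
print(counts)  # [1, 1, 1, 0, 0, 1]
```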

environment_impact_level categorical feature

This appears to be a categorical flag for environmental impact severity, but it carries no usable signal in this sample. 56% of the 50 rows are null, and the remaining 22 records all hold the empty string, leaving cardinality at 1 and entropy at 0.

Treatment: Drop; the column is effectively constant and majority-null.
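Columns like this one can be found mechanically: after removing nulls, they take at most one distinct value. A sketch, assuming (as the profiler does) that an empty string counts as a value rather than a null. The column dictionary is rebuilt from counts reported in this notebook, not read from the file.

```python
def is_uninformative(values):
    """True when the non-null values take at most one distinct value."""
    return len({v for v in values if v is not None}) <= 1


columns = {
    "environment_impact_level": [None] * 28 + [""] * 22,
    "nutrition_data_prepared": [None] * 2 + [""] * 48,
    "lang": ["fr"] * 35 + ["en"] * 10 + ["de"] * 3 + ["bg", "ro"],
}
to_drop = [name for name, values in columns.items() if is_uninformative(values)]
print(to_drop)  # ['environment_impact_level', 'nutrition_data_prepared']
```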

anthropic:claude-opus-4-7 · confidence high
Out[22]:

saturn.columns["environment_impact_level"].stats

stat value
n 50
nulls 28 (56.0%)
unique 1
top_value (empty string)
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: null_rate 56.0% null
alert: imbalance top value is 100.0% of rows
Fig 11.
Top values for environment_impact_level.
Top values for environment_impact_level (1 unique shown, of 1 total).
value count share
(empty string) 22 44.0%

ingredients_text_fi categorical free_text

Finnish-language ingredient declarations, almost entirely absent: 90% of the 50 rows are null, and the 5 populated rows hold 4 distinct values, the most common being the empty string (2 rows). The 3 non-empty entries are verbose ingredient lists (two high-cocoa chocolates and one wheat-based baked good) with allergens marked in underscores, so this is a localized free-text field rather than a categorical feature despite its low cardinality here.

Treatment: Drop or set aside; null_rate 0.9 makes it unusable as a feature without a Finnish-text NLP pipeline.

anthropic:claude-opus-4-7 · confidence high
Out[25]:

saturn.columns["ingredients_text_fi"].stats

stat value
n 50
nulls 45 (90.0%)
unique 4
top_value (empty string)
top_rate 0.4
cardinality 4
entropy 1.922
entropy_ratio 0.961
alert: long_tail 3 singleton categories
alert: null_rate 90.0% null
Fig 12.
Top values for ingredients_text_fi.
Top values for ingredients_text_fi (4 unique shown, of 4 total).
value count share
(empty string) 2 4.0%
kaakaomassa, kaakaovoi, vähärasvainen kaakaojauhe, sokeri, vanilja. Saattaa sisältää hasselpähkinää, muita pähkinöitä, maitoa, soijaa. Tummassa suklaassa kaakaota vähintään 90%. 1 2.0%
kaakaomassa, vähärasvainen kaakaojauhe, kaakaovoi, sokeri, emulgointiaine (_soijalesitiini_), vaniljauute. Suklaassa kaakaota vähintään 85 %. Saattaa sisältää pieniä määriä pähkinää ja maitoa. 1 2.0%
_VEHNÄJAUHO_, palmuöljy, tärkkelyssiirappi, _OHRAMALLASUUTE_, nostatusaineet ammoniumkarbonaatit, natriumkarbonaatit), suola, _KANANMUNAT_, aromi, jauhonparanne (_NATRIUMDISULFIITTI_). 1 2.0%
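The populated values mark allergens by wrapping them in underscores (for example `_KANANMUNAT_`, eggs). If the column were kept, a regular expression could pull those markers into a list of allergen terms. This is a sketch that assumes the underscore convention holds across rows; the terms stay in Finnish, so cross-language grouping would still need a lookup.

```python
import re

# An allergen is any run of non-underscore characters wrapped in underscores.
ALLERGEN_MARKUP = re.compile(r"_([^_]+)_")


def marked_allergens(text):
    """Return the underscore-marked allergen terms in an ingredient list, in order."""
    if not text:
        return []
    return ALLERGEN_MARKUP.findall(text)


sample = (
    "_VEHNÄJAUHO_, palmuöljy, tärkkelyssiirappi, _OHRAMALLASUUTE_, "
    "nostatusaineet ammoniumkarbonaatit, natriumkarbonaatit), suola, "
    "_KANANMUNAT_, aromi, jauhonparanne (_NATRIUMDISULFIITTI_)."
)
print(marked_allergens(sample))
# ['VEHNÄJAUHO', 'OHRAMALLASUUTE', 'KANANMUNAT', 'NATRIUMDISULFIITTI']
```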

nutrition_data_prepared categorical metadata

This appears to be a flag indicating whether nutrition data was prepared, but it carries no information: only one unique value (an empty string) appears across all 48 non-null rows, with a 4% null rate. Entropy is 0 and top_rate is 1.0, so the column is constant.

Treatment: Drop; constant column with no signal.

anthropic:claude-opus-4-7 · confidence high
Out[28]:

saturn.columns["nutrition_data_prepared"].stats

stat value
n 50
nulls 2 (4.0%)
unique 1
top_value (empty string)
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: imbalance top value is 100.0% of rows
Fig 13.
Top values for nutrition_data_prepared.
Top values for nutrition_data_prepared (1 unique shown, of 1 total).
value count share
(empty string) 48 96.0%

packaging_shapes_tags unknown free_text

This column, packaging_shapes_tags, was skipped by the profiler, so no descriptive statistics are available beyond a row count of 50 and a null rate of 0%. The name suggests a tag-style field listing packaging shape descriptors, likely multi-valued per row; that would explain why its kind resolved to unknown, for which the profiler has no handler. Without unique counts or value samples, nothing further can be confirmed.

Treatment: Re-profile after splitting the tag list, then one-hot or multi-label encode.

anthropic:claude-opus-4-7 · confidence low
Out[31]:

saturn.columns["packaging_shapes_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique n/a
alert: skipped (no profiler for kind=unknown)
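If packaging_shapes_tags holds a list of tag strings per row (an assumption from the name; the profiler did not confirm the type), re-profiling after splitting means counting individual tags rather than whole lists. A stdlib sketch; the function name and the example tags are illustrative, not taken from this dataset.

```python
from collections import Counter
from typing import Any, Optional


def profile_tag_column(values: list[Optional[list[str]]], top_k: int = 5) -> dict[str, Any]:
    """Profile a column whose cells are lists of tags (None marks a null)."""
    tag_counts: Counter = Counter()
    n_null = n_empty = 0
    lengths = []
    for cell in values:
        if cell is None:
            n_null += 1
            continue
        if not cell:
            n_empty += 1
        lengths.append(len(cell))
        # Count each tag once per row, so counts mean "rows containing the tag"
        tag_counts.update(set(cell))
    return {
        "n": len(values),
        "nulls": n_null,
        "empty_lists": n_empty,
        "distinct_tags": len(tag_counts),
        "mean_tags_per_row": sum(lengths) / len(lengths) if lengths else 0.0,
        "top_tags": tag_counts.most_common(top_k),
    }


# Illustrative cells only; real tag names in this column are unverified
cells = [["en:bar", "en:wrapper"], ["en:bar"], [], None, ["en:box", "en:bar"]]
print(profile_tag_column(cells, top_k=1))
```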

nutrient_levels_tags unknown feature

Column 'nutrient_levels_tags' was skipped by the profiler, so no statistics beyond a 50-row count and 0% null rate are available. The name suggests a list of nutrient classification tags (likely multi-valued strings like 'fat-in-low-quantity'), but uniqueness, cardinality, and value distribution are all unknown. Treat any downstream use cautiously until the column is re-profiled with list-aware parsing.

Treatment: Re-profile with list/tag-aware parsing, then one-hot or multi-label encode the individual tags.

anthropic:claude-opus-4-7 · confidence low
Out[33]:

saturn.columns["nutrient_levels_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique n/a
alert: skipped (no profiler for kind=unknown)
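"List/tag-aware parsing" mostly means accepting whatever form the raw export uses. The sketch below coerces a cell into a list of tags, assuming it arrives as a Python list, a JSON-encoded array string, or a comma-delimited string; which of these nutrient_levels_tags actually uses is unverified, and the example tags follow the naming guess above rather than observed values.

```python
import json
from typing import Any, Optional


def coerce_tag_list(cell: Any) -> Optional[list[str]]:
    """Normalise a raw cell into a list of stripped tag strings (None stays None)."""
    if cell is None:
        return None
    if isinstance(cell, list):
        items = cell
    elif isinstance(cell, str):
        text = cell.strip()
        if text.startswith("["):
            try:
                items = json.loads(text)  # JSON-encoded array
            except json.JSONDecodeError:
                items = text.strip("[]").split(",")  # malformed: fall back to commas
        else:
            items = text.split(",")  # comma-delimited string
    else:
        raise TypeError(f"unsupported cell type: {type(cell).__name__}")
    # Drop blanks left by trailing commas or empty strings
    return [str(item).strip() for item in items if str(item).strip()]


print(coerce_tag_list('["en:fat-in-low-quantity", "en:salt-in-high-quantity"]'))
```

Once every cell is a plain list, the individual tags can be one-hot or multi-label encoded as the treatment suggests.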

packagings_materials unknown other

The column packagings_materials was skipped by the profiler, so its kind is unknown and no descriptive statistics are available. We only know there are 50 rows and a 0.0 null rate; uniqueness, type, and value distribution are all missing. The name suggests structured packaging material data (likely nested or list-valued), which would explain why the profiler bailed out.

Treatment: Inspect raw values manually and parse the nested structure before any downstream use.

anthropic:claude-opus-4-7 · confidence low
Out[35]:

saturn.columns["packagings_materials"].stats

stat value
n 50
nulls 0 (0.0%)
unique n/a
alert: skipped (no profiler for kind=unknown)
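Before choosing a parser for packagings_materials, it helps to see what shapes its raw values actually take. A small recursive sketch that summarises a JSON-like value as a type signature and tallies signatures across rows; the example cells are hypothetical and only stand in for the nested data the note above speculates about.

```python
from collections import Counter
from typing import Any


def describe_shape(value: Any) -> str:
    """Return a compact structural signature for a JSON-like value."""
    if isinstance(value, dict):
        fields = ", ".join(f"{k}: {describe_shape(v)}" for k, v in sorted(value.items()))
        return "{" + fields + "}"
    if isinstance(value, list):
        # Distinct element shapes, so mixed-type lists stand out
        return "list[" + " | ".join(sorted({describe_shape(v) for v in value})) + "]"
    if value is None:
        return "null"
    return type(value).__name__


# Hypothetical cells; tally shapes across rows to see which layouts occur
cells = [[{"material": "en:cardboard", "weight": 12.5}], [], None]
print(Counter(describe_shape(c) for c in cells))
```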

ingredients_without_ecobalyse_ids unknown other

This column is named `ingredients_without_ecobalyse_ids`, suggesting it lists ingredients that lack matching identifiers in the Ecobalyse reference system. Saturn skipped profiling, so type, uniqueness, and value distribution are unknown; only the row count (50) and null rate (0.0%) were recorded.

Treatment: Inspect raw values manually to determine structure (likely a list) before deciding on parsing or join strategy.

anthropic:claude-opus-4-7 · confidence low
Out[37]:

saturn.columns["ingredients_without_ecobalyse_ids"].stats

stat value
n 50
nulls 0 (0.0%)
unique n/a
alert: skipped (no profiler for kind=unknown)

generic_name_nl categorical metadata

This appears to be a Dutch-language generic product name field, likely from a food product catalog. It is largely unusable as-is: 76% of rows are null and among the 12 non-null entries, 9 are empty strings, leaving only 3 distinct real values (e.g., 'Extra fijne pure chocolade'). Cardinality is just 4 across 50 rows, so there is essentially no signal here.

Treatment: Drop, or retain only as a descriptive label — too sparse to model.

anthropic:claude-opus-4-7 · confidence high
Out[39]:

saturn.columns["generic_name_nl"].stats

stat value
n 50
nulls 38 (76.0%)
unique 4
top_value (empty string)
top_rate 0.75
cardinality 4
entropy 1.208
entropy_ratio 0.6038
alert: long_tail (3 singleton categories)
alert: null_rate (76.0% null)
Fig 14.
Top values for generic_name_nl.
Top values for generic_name_nl (4 unique shown, of 4 total).
value | count | share
(empty string) | 9 | 18.0%
Extra fijne pure chocolade | 1 | 2.0%
Biscuits bedekt met melkchocolade | 1 | 2.0%
Krokante volkorentoasts | 1 | 2.0%

misc_tags unknown other

The column 'misc_tags' was skipped by the profiler, so no type inference, uniqueness count, or value statistics are available. The only confirmed signals are 50 rows with a 0.0 null rate. Without further stats, its content and structure cannot be characterized.

Treatment: Re-profile with a parser suited to this column (e.g., list/JSON tags) before deciding on use.

anthropic:claude-opus-4-7 · confidence low
Out[42]:

saturn.columns["misc_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique n/a
alert: skipped (no profiler for kind=unknown)

product_name_sv categorical metadata

Swedish-localised product name field, populated for only 4 of 50 rows (null_rate 0.92). The 4 present values are all unique, giving maximum entropy (entropy_ratio 1.0) but no repeated category to learn from. Values like "90% Cocoa" and "Dark 70%" look English rather than Swedish, suggesting localisation is incomplete or mislabelled.

Treatment: Drop or defer until localisation coverage improves; not usable as a feature at 92% null.

anthropic:claude-opus-4-7 · confidence high
Out[44]:

saturn.columns["product_name_sv"].stats

stat value
n 50
nulls 46 (92.0%)
unique 4
top_value 90% Cocoa
top_rate 0.25
cardinality 4
entropy 2
entropy_ratio 1
alert: long_tail (4 singleton categories)
alert: null_rate (92.0% null)
Fig 15.
Top values for product_name_sv.
Top values for product_name_sv (4 unique shown, of 4 total).
value | count | share
90% Cocoa | 1 | 2.0%
Arriba 85% Cacao Dark Chocolate | 1 | 2.0%
Dark 70% | 1 | 2.0%
Original | 1 | 2.0%
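The entropy figures in these tables appear consistent with Shannon entropy in bits over the non-null values, with entropy_ratio = entropy / log2(cardinality): four singletons give 2 bits and a ratio of 1, as for product_name_sv, and generic_name_nl's 9/1/1/1 split gives about 1.208 bits and 0.604. That reading is inferred from the reported numbers rather than from saturn's source; the sketch reproduces it.

```python
import math
from collections import Counter
from typing import Hashable, Iterable, Optional


def entropy_stats(values: Iterable[Optional[Hashable]]) -> tuple[float, float]:
    """Shannon entropy in bits over non-null values, and entropy / log2(cardinality)."""
    counts = Counter(v for v in values if v is not None)
    total = sum(counts.values())
    if len(counts) <= 1:
        return 0.0, 0.0  # empty or constant column: no uncertainty
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return entropy, entropy / math.log2(len(counts))


# product_name_sv: four singletons among 46 nulls -> (2.0, 1.0)
print(entropy_stats(["90% Cocoa", "Arriba 85% Cacao Dark Chocolate", "Dark 70%", "Original"] + [None] * 46))
```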

scans_n numeric feature

A numeric count of scans per record, with 49 unique values across 50 rows and no nulls or zeros. The middle half of the data is fairly compact (median 492, IQR 217), but the distribution is extremely right-skewed (skew 3.90, kurtosis 18.72): the max of 2523 sits far above Q3 (604), and all 4 outliers (8%) lie above the upper fence of 604 + 1.5 × 217 = 929.5. The mean (577.94) sits well above the median, confirming a heavy upper tail.

Treatment: Log-transform or winsorize before modelling to tame the heavy right tail.

anthropic:claude-opus-4-7 · confidence high
Out[47]:

saturn.columns["scans_n"].stats

stat value
n 50
nulls 0 (0.0%)
unique 49
min 333
max 2,523
mean 577.9
median 492
std 343.9
q1 387
q3 604
iqr 217
skew 3.899
kurtosis 18.72
n_outliers 4
outlier_rate 0.08
zero_rate 0
alert: high_skew (skew=+3.90)
alert: outliers (8.0% rows beyond 1.5 IQR)
Fig 16.
Distribution of scans_n. Vertical dash marks the median.
Histogram bins for scans_n (median: 492.0).
bin | count
333 – 645.9 | 39
645.9 – 958.7 | 7
958.7 – 1272 | 3
1272 – 1584 | 0
1584 – 1897 | 0
1897 – 2210 | 0
2210 – 2523 | 1
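Winsorizing at the same 1.5 IQR fences that drive the outlier alert can be sketched with the standard library. Saturn's quartile method is not stated, so using statistics.quantiles with method="inclusive" is an assumption; applied to the reported q1 = 387 and q3 = 604, the fences would be 61.5 and 929.5.

```python
import statistics


def iqr_fences(values: list[float], k: float = 1.5) -> tuple[float, float]:
    """Lower/upper Tukey fences: q1 - k*IQR and q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def winsorize(values: list[float], k: float = 1.5) -> list[float]:
    """Clip every value into the fence interval instead of dropping rows."""
    lo, hi = iqr_fences(values, k)
    return [min(max(v, lo), hi) for v in values]


values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(iqr_fences(values))  # (-3.5, 14.5)
print(winsorize(values)[-1])  # 14.5
```

Clipping keeps all 50 rows, which matters in a sample this small; dropping the 4 outliers would discard 8% of the data.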

schema_version numeric metadata

Constant numeric column holding the value 996.0 across all 50 rows with no nulls. Despite being typed as numeric, the zero variance (std 0.0, iqr 0.0) and single unique value indicate this is a schema/version tag rather than a measurement. Carries no signal for modelling.

Treatment: Drop before modelling; retain only as a provenance tag.

anthropic:claude-opus-4-7 · confidence high
Out[50]:

saturn.columns["schema_version"].stats

stat value
n 50
nulls 0 (0.0%)
unique 1
min 996
max 996
mean 996
median 996
std 0
q1 996
q3 996
iqr 0
skew 0
kurtosis 0
n_outliers 0
outlier_rate 0
zero_rate 0
alert: constant (only one distinct value)
Fig 17.
Distribution of schema_version. Vertical dash marks the median.
Histogram bins for schema_version (median: 996.0).
bin | count
995.5 – 995.6 | 0
995.6 – 995.8 | 0
995.8 – 995.9 | 0
995.9 – 996.1 | 50
996.1 – 996.2 | 0
996.2 – 996.4 | 0
996.4 – 996.5 | 0

url categorical identifier

This column holds Open Food Facts product URLs, one per row, with every value unique across all 50 rows (cardinality 50, entropy_ratio 1.0). The URL path embeds a product barcode plus a slugified name, so it functions as a permalink/identifier rather than a feature. No nulls, but the long_tail alert simply reflects that every row is its own category.

Treatment: Drop from modelling; keep as a join key or reference link to the source product page.

anthropic:claude-opus-4-7 · confidence high
Out[53]:

saturn.columns["url"].stats

stat value
n 50
nulls 0 (0.0%)
unique 50
top_value https://world.openfoodfacts.org/product/6111242100992/perly
top_rate 0.02
cardinality 50
entropy 5.644
entropy_ratio 1
alert: long_tail (50 singleton categories)
Fig 18.
Top values for url.
Top values for url (20 unique shown, of 50 total).
value | count | share
https://world.openfoodfacts.org/product/6111242100992/perly | 1 | 2.0%
https://world.openfoodfacts.org/product/7622210449283/prince-gout-chocolat-lu | 1 | 2.0%
https://world.openfoodfacts.org/product/3046920029759/edelbitter-schokolade-lindt | 1 | 2.0%
https://world.openfoodfacts.org/product/6111031005064/tonik-%D8%B9%D8%B1%D8%A8%D9%8A | 1 | 2.0%
https://world.openfoodfacts.org/product/3175680011480/gerble-sesame-cookie-230g-8-2oz | 1 | 2.0%
https://world.openfoodfacts.org/product/20995553/chocolat-noir-85-cacao-j-d-gross | 1 | 2.0%
https://world.openfoodfacts.org/product/3268840001008/hhhhh-cristaline | 1 | 2.0%
https://world.openfoodfacts.org/product/3362600011044/henry-s | 1 | 2.0%
https://world.openfoodfacts.org/product/8425197712024/compound-chocolate-with-milk-and-almonds-maruja | 1 | 2.0%
https://world.openfoodfacts.org/product/7622210578464/organic-70-dark-chocolate-bar-green-black-s | 1 | 2.0%
https://world.openfoodfacts.org/product/6111259343108/king-cookies-excelo | 1 | 2.0%
https://world.openfoodfacts.org/product/3362600011228/sable-coco-henry-s-42g | 1 | 2.0%
https://world.openfoodfacts.org/product/8000500310427/biscuits-nutella | 1 | 2.0%
https://world.openfoodfacts.org/product/7300400481595/authentique-wasa | 1 | 2.0%
https://world.openfoodfacts.org/product/3046920022651/excellence-noir-intense-70-cacao-lindt | 1 | 2.0%
https://world.openfoodfacts.org/product/5060042641000/tyrell-s-lightly-sea-salted-tyrrell-s | 1 | 2.0%
https://world.openfoodfacts.org/product/7622210584724/intense-dark-chocolate-green-and-black | 1 | 2.0%
https://world.openfoodfacts.org/product/3046920022606/excellence-85-cacao-chocolat-noir-puissant-lindt-lindt | 1 | 2.0%
https://world.openfoodfacts.org/product/3229820100234/filled-dark-chocolate-bjorg | 1 | 2.0%
https://world.openfoodfacts.org/product/20022464/extra-dark-74-cocoa-fin-carre | 1 | 2.0%
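Because the URL path embeds the barcode and a slug, both can be recovered with urllib.parse, for example to use the barcode as a join key. A sketch assuming the /product/<barcode>/<slug> layout seen in every URL listed above; other URL shapes are rejected rather than guessed at. Slugs are percent-decoded, so the Arabic one becomes readable.

```python
from urllib.parse import unquote, urlparse


def split_product_url(url: str) -> tuple[str, str]:
    """Return (barcode, decoded slug) from a /product/<barcode>/<slug> URL."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) < 2 or parts[0] != "product":
        raise ValueError(f"unexpected URL layout: {url}")
    barcode = parts[1]
    slug = unquote(parts[2]) if len(parts) > 2 else ""
    return barcode, slug


print(split_product_url("https://world.openfoodfacts.org/product/20995553/chocolat-noir-85-cacao-j-d-gross"))
# ('20995553', 'chocolat-noir-85-cacao-j-d-gross')
```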

vitamins_tags unknown other

The column `vitamins_tags` was skipped by the profiler, so no type, uniqueness, or distribution statistics are available beyond a row count of 50 and a null rate of 0.0. The name suggests a list-valued field enumerating vitamin identifiers (e.g., tag-style strings), but this cannot be confirmed from the evidence. Without parsing, downstream use is blocked.

Treatment: Re-profile after parsing as a list of tags, then one-hot or multi-hot encode.

anthropic:claude-opus-4-7 · confidence low
Out[56]:

saturn.columns["vitamins_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique n/a
alert: skipped (no profiler for kind=unknown)

debug_param_sorted_langs unknown metadata

This column was skipped by the profiler (alert: "skipped"), so its kind is unknown and no statistics were computed beyond a row count of 50 with 0% nulls. The name suggests a debug artefact holding sorted language codes, likely a list or compound value the profiler couldn't classify. Without unique counts or value samples there is nothing further to infer.

Treatment: Drop unless you can re-profile with list/struct support; it appears to be a debug field.

anthropic:claude-opus-4-7 · confidence low
Out[58]:

saturn.columns["debug_param_sorted_langs"].stats

stat value
n 50
nulls 0 (0.0%)
unique n/a
alert: skipped (no profiler for kind=unknown)

packaging categorical free_text

Free-form packaging descriptions from Open Food Facts, mixing language-prefixed tags ('en:', 'es:', 'pt:') with untagged text in several languages. Cardinality is extreme: 41 unique values across 50 rows (entropy ratio 0.985), and the top value 'Plastique' covers only 4 of the 44 non-null rows (9%). Most entries are comma-separated multi-tag strings, 12% of rows are null, and the long_tail alert (40 singleton categories) confirms there is no usable category structure as-is.

Treatment: Split on commas, normalize language prefixes, and one-hot encode the resulting material tags rather than using the raw string.

anthropic:claude-opus-4-7 · confidence high
Out[60]:

saturn.columns["packaging"].stats

stat value
n 50
nulls 6 (12.0%)
unique 41
top_value Plastique
top_rate 0.09091
cardinality 41
entropy 5.278
entropy_ratio 0.9851
alert: long_tail (40 singleton categories)
Fig 19.
Top values for packaging.
Top values for packaging (20 unique shown, of 41 total).
value | count | share
Plastique | 4 | 8.0%
Packet,Hdpe film-packet,Etui en carton,Film en plastique | 1 | 2.0%
en:Aluminium wrap,en:Box cardboard,en:Caja de cartón,en:Card-box,en:Foil-wrapper,es:Recipiente,pt:Papel de aluminio,Étui carton,Feuille aluminium | 1 | 2.0%
Cardboard,Plastic | 1 | 2.0%
Cardboard,Non-corrugated cardboard | 1 | 2.0%
Plastique,Bouteille ou Flacon,PET 1 - Polytéréphtalate d'éthylène,Bouteille,Bouchon en plastique | 1 | 2.0%
Métal,Papier,en:Recyclable Metals,Aluminium | 1 | 2.0%
Paper/Foil | 1 | 2.0%
Plastique,O 7 - Autres plastiques | 1 | 2.0%
Papier,Enveloppe,en:Package paper,en:Paper recycling | 1 | 2.0%
Métal,Carton,Métaux recyclables,Aluminium | 1 | 2.0%
en:MixedPlasticFilm-packet,en:mixed plastic film-packet | 1 | 2.0%
1 film to recycle, 1 paper wrap to recycle, en:paper-wrapper, en:foil-wrapper | 1 | 2.0%
fr:emballage carton,fr:papier aluminium | 1 | 2.0%
Étui,Carton,Plastique,Sec,Film | 1 | 2.0%
Plastikowe,Mixed plastic-packet,Sachet plastique de 3g, | 1 | 2.0%
Carta,Busta | 1 | 2.0%
Papier | 1 | 2.0%
(empty string) | 1 | 2.0%
Plástico | 1 | 2.0%
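The split-and-normalise treatment can be sketched as follows. Assumptions: pieces are separated by commas, an optional two-letter "xx:" prefix marks the language, and tags are compared case-insensitively. Mapping equivalent tags across languages (for example 'Plastique', 'Plástico' and 'Plastikowe') would still need a translation table, which is not shown; the one-hot step then runs over the resulting tag vocabulary.

```python
import re
from typing import Optional

# Optional two-letter language prefix such as "en:" or "fr:"
LANG_PREFIX = re.compile(r"^([a-z]{2}):(.+)$")


def split_packaging(raw: Optional[str]) -> list[tuple[Optional[str], str]]:
    """Split a packaging string into (language, casefolded tag) pairs."""
    if raw is None:
        return []
    pairs = []
    for piece in raw.split(","):
        piece = piece.strip()
        if not piece:
            continue  # skip blanks from trailing commas
        match = LANG_PREFIX.match(piece)
        if match:
            lang, tag = match.group(1), match.group(2).strip()
        else:
            lang, tag = None, piece  # untagged text: language unknown
        pairs.append((lang, tag.casefold()))
    return pairs


print(split_packaging("en:Aluminium wrap,Étui carton,Feuille aluminium"))
# [('en', 'aluminium wrap'), (None, 'étui carton'), (None, 'feuille aluminium')]
```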

grades unknown other

The column is named "grades" and contains 50 rows with no nulls, but Saturn skipped profiling it because it could not infer a kind, so no distributional stats are available. Without unique counts or value summaries, it's impossible to tell whether this holds letter grades, numeric scores, or a nested structure. The "skipped" alert (no profiler for kind=unknown) is the key signal: the stored values did not match any kind the profiler supports.

Treatment: Manually inspect a sample to determine the underlying type before deciding on a downstream encoding.

anthropic:claude-opus-4-7 · confidence low
Out[63]:

saturn.columns["grades"].stats

stat value
n 50
nulls 0 (0.0%)
unique n/a
alert: skipped (no profiler for kind=unknown)

last_modified_t numeric timestamp

Values are Unix epoch seconds (min 1737907641, max 1768643720), so this column is a last-modified timestamp spanning roughly 2025-01-26 to 2026-01-17 UTC. All 50 values are unique with no nulls, but the distribution is heavily left-skewed (skew -1.96): 6 outliers (12%) sit far below q1 (1761612624, about 2025-10-28 UTC), suggesting a small tail of much older edits, while the middle half of records falls within an IQR of about 6.1M seconds (roughly 71 days). Treat as a timestamp, not a numeric feature.

Treatment: Convert from epoch seconds to datetime and derive recency or bucketed features instead of using the raw integer.

anthropic:claude-opus-4-7 · confidence high
Out[65]:

saturn.columns["last_modified_t"].stats

stat value
n 50
nulls 0 (0.0%)
unique 50
min 1.738e+09
max 1.769e+09
mean 1.763e+09
median 1.767e+09
std 8.093e+06
q1 1.762e+09
q3 1.768e+09
iqr 6.138e+06
skew -1.961
kurtosis 2.972
n_outliers 6
outlier_rate 0.12
zero_rate 0
alert: outliers (12.0% rows beyond 1.5 IQR)
Fig 20.
Distribution of last_modified_t. Vertical dash marks the median.
Histogram bins for last_modified_t (median: 1766580948.5).
bin | count
1.738e+09 – 1.742e+09 | 3
1.742e+09 – 1.747e+09 | 1
1.747e+09 – 1.751e+09 | 1
1.751e+09 – 1.755e+09 | 2
1.755e+09 – 1.76e+09 | 3
1.76e+09 – 1.764e+09 | 8
1.764e+09 – 1.769e+09 | 32
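Converting the epoch seconds and deriving a recency feature needs only the datetime module. Measuring recency against the newest edit in the sample, rather than the current clock, is an assumption chosen to keep the feature reproducible across runs.

```python
from datetime import datetime, timezone


def to_utc(epoch_seconds: int) -> datetime:
    """Interpret a Unix timestamp (seconds) as an aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


def recency_days(timestamps: list[int]) -> list[float]:
    """Days between each edit and the most recent edit in the sample."""
    newest = max(timestamps)
    return [(newest - t) / 86_400 for t in timestamps]


print(to_utc(1737907641).date())  # 2025-01-26
print(to_utc(1768643720).date())  # 2026-01-17
```

Recency in days (or month buckets derived from it) is on a scale a model can use, unlike the raw 1.7e9-sized integers.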

origin_nl categorical metadata

origin_nl appears to be a categorical attribute (likely a Dutch-language origin label) but is effectively empty in this sample. 76% of the 50 rows are null, and the remaining 12 non-null entries are all the empty string, giving a cardinality of 1 and entropy of 0. There is no usable signal here.

Treatment: Drop; column has no variance and is mostly null.

anthropic:claude-opus-4-7 · confidence high
Out[68]:

saturn.columns["origin_nl"].stats

stat value
n 50
nulls 38 (76.0%)
unique 1
top_value (empty string)
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: null_rate (76.0% null)
alert: imbalance (top value is 100.0% of rows)
Fig 21.
Top values for origin_nl.
Top values for origin_nl (1 unique shown, of 1 total).
value | count | share
(empty string) | 12 | 24.0%

allergens_lc categorical metadata

Language code for the allergens text, with 6 distinct values across 50 rows and a 4% null rate. Values split almost evenly between 'en' (22) and 'fr' (21), with es (2) and de, it and pl (1 each) making up the rest, a language mix worth flagging before any text processing.

Treatment: Use as a language filter or routing key before tokenizing the allergens text.

anthropic:claude-opus-4-7 · confidence high
Out[71]:

saturn.columns["allergens_lc"].stats

stat value
n 50
nulls 2 (4.0%)
unique 6
top_value en
top_rate 0.4583
cardinality 6
entropy 1.578
entropy_ratio 0.6104
Fig 22.
Top values for allergens_lc.
Top values for allergens_lc (6 unique shown, of 6 total).
value | count | share
en | 22 | 44.0%
fr | 21 | 42.0%
es | 2 | 4.0%
de | 1 | 2.0%
it | 1 | 2.0%
pl | 1 | 2.0%

states_hierarchy unknown other

The column 'states_hierarchy' was skipped by the profiler, so its kind is unknown and no descriptive statistics were computed. We can only confirm there are 50 rows with no nulls; uniqueness, type, and value distribution are unavailable.

Treatment: Re-profile or inspect manually to determine type before any downstream use.

anthropic:claude-opus-4-7 · confidence low
Out[74]:

saturn.columns["states_hierarchy"].stats

stat value
n 50
nulls 0 (0.0%)
unique n/a
alert: skipped (no profiler for kind=unknown)

ingredients_text_ja categorical free_text

Japanese-language ingredients text, almost entirely absent from this sample. 98% of the 50 rows are null, and the single non-null value is an empty string, leaving cardinality at 1 and entropy at 0.

Treatment: Drop; the column carries no usable signal in this sample.

anthropic:claude-opus-4-7 · confidence high
Out[76]:

saturn.columns["ingredients_text_ja"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value (empty string)
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail (1 singleton categories)
alert: null_rate (98.0% null)
alert: imbalance (top value is 100.0% of rows)
Fig 23.
Top values for ingredients_text_ja.
Top values for ingredients_text_ja (1 unique shown, of 1 total).
value | count | share
(empty string) | 1 | 2.0%

teams_tags unknown other

The column `teams_tags` was skipped by the profiler, so its kind, uniqueness, and value distribution are unknown. Only two facts are available: 50 rows were seen and none were null. Without further stats, the content (e.g. whether it holds lists, delimited tags, or structured objects) cannot be characterised.

Treatment: Re-profile with a parser that handles this column's type before deciding on downstream use.

anthropic:claude-opus-4-7 · confidence low
Out[79]:

saturn.columns["teams_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique n/a
alert: skipped (no profiler for kind=unknown)

traces_from_user categorical free_text

This column appears to capture user-submitted allergen traces: a language code in parentheses, such as '(en)', '(fr)' or '(es)', followed by a comma-separated tag list such as 'en:milk,en:nuts'. With 35 unique values across 50 rows (entropy ratio 0.938) it is highly diverse; the top value '(en) ' (an empty tag list) covers only 14%, and 29 values are singletons. The language prefix is mixed, 11 rows (22%) carry an empty tag list under '(en)' or '(fr)', and some entries use free text ('(en) Eggs', '(fr) Lait,Fruits à coque') or French names under an 'en:' prefix instead of canonical tags, all of which complicates direct use as a category.

Treatment: Parse the language prefix and split the tag list into a multi-hot allergen feature before modelling.

anthropic:claude-opus-4-7 · confidence high
Out[81]:

saturn.columns["traces_from_user"].stats

stat value
n 50
nulls 0 (0.0%)
unique 35
top_value (en)
top_rate 0.14
cardinality 35
entropy 4.811
entropy_ratio 0.9379
alert: long_tail (29 singleton categories)
Fig 24.
Top values for traces_from_user.
Top values for traces_from_user (20 unique shown, of 35 total).
value | count | share
(en) | 7 | 14.0%
(fr) | 4 | 8.0%
(fr) en:milk,en:nuts | 4 | 8.0%
(en) en:milk,en:nuts | 2 | 4.0%
(en) en:milk,en:nuts,en:sesame-seeds,en:soybeans | 2 | 4.0%
(en) en:nuts | 2 | 4.0%
(en) Eggs | 1 | 2.0%
(fr) en:milk,en:nuts,en:soybeans | 1 | 2.0%
(fr) en:eggs,en:lupin,en:milk,en:mustard,en:nuts,en:soybeans | 1 | 2.0%
(fr) en:milk,en:soybeans | 1 | 2.0%
(fr) Lait,Fruits à coque | 1 | 2.0%
(fr) Soja | 1 | 2.0%
(es) en:mustard | 1 | 2.0%
(fr) en:lupin,en:milk,en:mustard,en:sesame-seeds,en:soybeans | 1 | 2.0%
(fr) en:milk,en:nuts,en:sesame-seeds,en:soybeans | 1 | 2.0%
(en) en:milk | 1 | 2.0%
(fr) Lait,Fruits à coque,Graines de sésame,Soja | 1 | 2.0%
(en) en:eggs,en:mustard,en:nuts,en:sesame-seeds,en:soybeans | 1 | 2.0%
(en) en:gluten,en:Amande,en:Arachides,en:Avoine,en:Blé,en:Lait,en:Noisettes,en:Noix,en:Noix de cajou,en:Noix de macadamia,en:Noix de pécan,en:Noix du brésil,en:Orge,en:Pistaches,en:Seigle | 1 | 2.0%
(fr) en:lupin,en:milk,en:mustard,en:soybeans | 1 | 2.0%
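A sketch of the suggested parse: pull out the parenthesised language code, split the remainder on commas, and build a multi-hot vector over a chosen vocabulary. The regex and the four-tag vocabulary are assumptions based on the values listed above; free-text entries such as '(fr) Lait,Fruits à coque' pass through as unmapped tags and would need translating to canonical 'en:' tags first.

```python
import re
from typing import Optional

# "(xx)" language code, optional whitespace, then the comma-separated tag list
ENTRY = re.compile(r"^\((?P<lang>[a-z]{2})\)\s*(?P<tags>.*)$")


def parse_traces(raw: str) -> tuple[Optional[str], list[str]]:
    """Split '(fr) en:milk,en:nuts' into ('fr', ['en:milk', 'en:nuts'])."""
    match = ENTRY.match(raw.strip())
    if not match:
        return None, []
    tags = [t.strip() for t in match.group("tags").split(",") if t.strip()]
    return match.group("lang"), tags


def multi_hot(tags: list[str], vocabulary: list[str]) -> list[int]:
    """1 where the vocabulary tag is present, else 0."""
    present = set(tags)
    return [int(v in present) for v in vocabulary]


vocab = ["en:milk", "en:nuts", "en:sesame-seeds", "en:soybeans"]
lang, tags = parse_traces("(fr) en:milk,en:soybeans")
print(lang, multi_hot(tags, vocab))  # fr [1, 0, 0, 1]
```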

origins_tags unknown other

The column `origins_tags` was skipped by the profiler, so kind is unknown and no descriptive statistics were computed. The only confirmed signals are 50 rows present and a 0.0 null rate; uniqueness, value distribution, and data type are all unavailable.

Treatment: Re-profile with an appropriate parser (likely a list/tag field) before deciding on downstream use.

anthropic:claude-opus-4-7 · confidence low
Out[84]:

saturn.columns["origins_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique n/a
alert: skipped (no profiler for kind=unknown)

serving_quantity_unit categorical metadata

This column records the unit of measurement for serving quantity, almost exclusively grams ('g' at 45 of 46 non-null rows, top_rate 0.978) with a single 'ml' entry. With only 2 unique values, an 8% null rate, and entropy_ratio of 0.151, it carries almost no information.

Treatment: Drop or collapse to a binary flag; near-constant with negligible signal.

anthropic:claude-opus-4-7 · confidence high
Out[86]:

saturn.columns["serving_quantity_unit"].stats

stat value
n 50
nulls 4 (8.0%)
unique 2
top_value g
top_rate 0.9783
cardinality 2
entropy 0.1511
entropy_ratio 0.1511
alert: imbalance (top value is 97.8% of rows)
Fig 25.
Top values for serving_quantity_unit.
Top values for serving_quantity_unit (2 unique shown, of 2 total).
value | count | share
g | 45 | 90.0%
ml | 1 | 2.0%

vitamins_prev_tags unknown other

The column "vitamins_prev_tags" was skipped by the profiler, so no type, uniqueness, or distribution stats are available. The only confirmed signals are 50 rows with a 0.0 null rate. Without further evidence the content (likely a list/array of prior tag values given the name) cannot be characterized.

Treatment: Re-profile with a parser that handles nested/array values before deciding on use.

anthropic:claude-opus-4-7 · confidence low
Out[89]:

saturn.columns["vitamins_prev_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique n/a
alert: skipped (no profiler for kind=unknown)

ingredients_hierarchy unknown other

This column is labelled ingredients_hierarchy but saturn skipped profiling it, so no type, uniqueness, or value statistics are available. The only confirmed signals are that it has 50 rows and zero nulls. Without further evidence, the structure (likely nested or list-valued, given the name) cannot be verified.

Treatment: Re-profile with a parser that handles nested or list-valued fields before deciding on use.

anthropic:claude-opus-4-7 · confidence low
Out[91]:

saturn.columns["ingredients_hierarchy"].stats

stat value
n 50
nulls 0 (0.0%)
unique n/a
alert: skipped (no profiler for kind=unknown)

unique_scans_n numeric feature

Numeric count of unique scans per row, with 48 distinct values across 50 records and no nulls or zeros. The distribution is heavily right-skewed (skew 3.91, kurtosis 18.71): the median is 432 against a mean of 525.38, and the max of 2257 sits far beyond q3 (560.75), producing 4 outliers (8% outlier rate). The standard deviation (306.41) exceeds the IQR (198), another sign of a long upper tail.

Treatment: Log-transform or winsorize before modelling to tame the long upper tail.

anthropic:claude-opus-4-7 · confidence high
Out[93]:

saturn.columns["unique_scans_n"].stats

stat value
n 50
nulls 0 (0.0%)
unique 48
min 319
max 2,257
mean 525.4
median 432
std 306.4
q1 362.8
q3 560.8
iqr 198
skew 3.911
kurtosis 18.71
n_outliers 4
outlier_rate 0.08
zero_rate 0
alert: high_skew (skew=+3.91)
alert: outliers (8.0% rows beyond 1.5 IQR)
Fig 26.
Distribution of unique_scans_n. Vertical dash marks the median.
Histogram bins for unique_scans_n (median: 432.0).
bin | count
319 – 595.9 | 39
595.9 – 872.7 | 7
872.7 – 1150 | 3
1150 – 1426 | 0
1426 – 1703 | 0
1703 – 1980 | 0
1980 – 2257 | 1
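To check whether a log transform actually tames the tail, compare sample skewness before and after log1p. The sketch uses the plain moment-based (Fisher-Pearson) skewness; whether saturn applies a bias correction is not stated, so values may differ slightly from the table. The sample counts are made up for illustration, loosely shaped like this column.

```python
import math
import statistics


def skewness(values: list[float]) -> float:
    """Moment-based (Fisher-Pearson, uncorrected) sample skewness."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


def log1p_all(values: list[float]) -> list[float]:
    """log(1 + x); safe for non-negative counts such as scan totals."""
    return [math.log1p(x) for x in values]


counts = [319, 350, 380, 410, 432, 470, 520, 560, 700, 2257]  # illustrative only
print(round(skewness(counts), 2), round(skewness(log1p_all(counts)), 2))
```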

labels categorical feature

Free-form labels/certifications column (e.g. organic, fair trade, Triman, Green Dot) stored as comma-separated multi-label strings, often mixing English, French, Portuguese and Spanish tokens. The 50 rows hold 42 distinct values (entropy ratio 0.95), so nearly every combination is unique; the only repeated value is the empty string (8 rows, 16%), which together with the single null (2%) means roughly one in five records carries no label at all. The long_tail alert is well earned: almost every non-empty cell is its own bag of tags.

Treatment: Split on commas into a multi-hot tag set (normalising language variants) before modelling.

anthropic:claude-opus-4-7 · confidence high
Out[96]:

saturn.columns["labels"].stats

stat value
n 50
nulls 1 (2.0%)
unique 42
top_value (empty string)
top_rate 0.1633
cardinality 42
entropy 5.125
entropy_ratio 0.9504
alert: long_tail (41 singleton categories)
Fig 27.
Top values for labels.
Top values for labels (20 unique shown, of 42 total).
value | count | share
(empty string) | 8 | 16.0%
Distributor labels,Charte LU Harmony,Triman | 1 | 2.0%
Point Vert,Triman | 1 | 2.0%
No preservatives, Made in France, Natural flavors, No colorings, No palm oil, Nutriscore, Nutriscore Grade B, Triman, en:green-dot | 1 | 2.0%
Vegetarian,Fair trade,Fairtrade International,No artificial flavors,Vegan,Fairtrade cocoa,FSC,FSC Mix,Max Havelaar | 1 | 2.0%
Triman,Sans Nitrates | 1 | 2.0%
Green Dot,Made in Spain,Ce | 1 | 2.0%
Commerce équitable,Bio,Végétarien,Bio européen,Fairtrade International,Végétalien,PL-EKO-07,en:Soil Association Organic,The Vegan Society,en:Commerce équitable | 1 | 2.0%
Green Dot | 1 | 2.0%
Vegetariano,fr:Ponto Verde | 1 | 2.0%
Végétarien,Point Vert,Triman | 1 | 2.0%
Sans conservateurs,Fabriqué en France,Triman,Lindt & Sprüngli Cacao Farming Program | 1 | 2.0%
No gluten,Vegetarian,No artificial flavors,Vegan,Assured Food Standards,Green Dot,No artificial colors,No flavour enhancer,No MSG,Triman,Made-in-england,Terracycle | 1 | 2.0%
Commerce équitable,Bio,Végétarien,Bio européen,Fairtrade International,Agriculture non UE,Végétalien,FR-BIO-01,en:FSC,FSC Mix,Point Vert,Max Havelaar,PL-EKO-07,en:Soil Association Organic,The Vegan Society | 1 | 2.0%
Agriculture non UE,Fabriqué en Belgique,Fabriqué en France,Sans huile de palme,Triman | 1 | 2.0%
Organic,EU Organic,Non-EU Agriculture,Certified B Corporation,EU Agriculture,EU/non-EU Agriculture,FR-BIO-01,No palm oil,Pure cocoa butter,AB Agriculture Biologique,fr:Farine de blé français | 1 | 2.0%
Vegetarian,Fair trade,Fairtrade International,Vegan,Fairtrade cocoa,Pure cocoa butter,Rainforest Alliance,Rainforest Alliance Cocoa,Commerce-equitable | 1 | 2.0%
Végétarien,Source de fibres alimentaires,Point Vert,Riche en fibres,Triman,Emballage-recyclable | 1 | 2.0%
Halal | 1 | 2.0%
en:Unknown | 1 | 2.0%
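For labels, the split-then-multi-hot treatment might look like this. Keeping only tags seen in at least min_count rows is an added assumption (with 41 singleton combinations, many individual tags will be rare). Normalisation here is limited to trimming, casefolding and dropping an "xx:" prefix, so translated variants such as 'Point Vert' and 'Green Dot' remain separate columns.

```python
import re
from collections import Counter
from typing import Optional

LANG_PREFIX = re.compile(r"^[a-z]{2}:")


def label_set(raw: Optional[str]) -> set[str]:
    """Split a labels cell into a set of normalised tags (empty for null/blank)."""
    if not raw:
        return set()
    tags = (LANG_PREFIX.sub("", piece.strip()).casefold() for piece in raw.split(","))
    return {t for t in tags if t}


def encode_labels(cells: list[Optional[str]], min_count: int = 2) -> tuple[list[str], list[list[int]]]:
    """Build a vocabulary of tags seen in >= min_count rows, then multi-hot encode."""
    sets = [label_set(c) for c in cells]
    freq = Counter(tag for s in sets for tag in s)
    vocab = sorted(t for t, n in freq.items() if n >= min_count)
    return vocab, [[int(t in s) for t in vocab] for s in sets]


cells = ["Point Vert,Triman", "Végétarien,Point Vert,Triman", "Green Dot", "", None, "Triman,Sans Nitrates"]
vocab, matrix = encode_labels(cells)
print(vocab)  # ['point vert', 'triman']
```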

generic_name_en categorical free_text

Likely an English-language generic product name field, but it is essentially empty: the top value is the blank string at 83.7% of non-null rows, with a further 14% null. Only 7 actual product descriptions appear across 50 rows (e.g. 'Dark chocolate', 'Crackers'), all singletons, giving cardinality 8 and entropy ratio 0.37. The long_tail alert reflects that every real value occurs exactly once.

Treatment: Drop or treat blanks as missing; too sparse and unique to use as a categorical feature.

anthropic:claude-opus-4-7 · confidence high
Out[99]:

saturn.columns["generic_name_en"].stats

stat value
n 50
nulls 7 (14.0%)
unique 8
top_value (empty string)
top_rate 0.8372
cardinality 8
entropy 1.098
entropy_ratio 0.366
alert: long_tail (7 singleton categories)
Fig 28.
Top values for generic_name_en.
Top values for generic_name_en (8 unique shown, of 8 total).
value | count | share
(empty string) | 36 | 72.0%
Extra fine dark chocolate 90% cocoa | 1 | 2.0%
Dark chocolate | 1 | 2.0%
Compound Chocolate with MILK AND ALMONDS | 1 | 2.0%
Lightly sea salted potato chips | 1 | 2.0%
Crackers | 1 | 2.0%
Dark Chocolate 70% cocoa | 1 | 2.0%
Chocolate bar with milk and hazelnuts | 1 | 2.0%
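Treating blanks as missing changes the picture considerably: for generic_name_en, 7 nulls plus 36 blank strings mean 43 of 50 rows (86%) carry no value. A sketch of that normalisation; also counting whitespace-only strings as blank is an assumption.

```python
from typing import Optional


def blank_to_none(value: Optional[str]) -> Optional[str]:
    """Map '', whitespace-only strings and None to None; keep real text."""
    if value is None or not value.strip():
        return None
    return value


def effective_null_rate(values: list[Optional[str]]) -> float:
    """Share of rows that are null or blank after normalisation."""
    cleaned = [blank_to_none(v) for v in values]
    return sum(v is None for v in cleaned) / len(cleaned)


# Counts mirror the generic_name_en profile: 7 nulls, 36 blanks, 7 stand-in names
column = [None] * 7 + [""] * 36 + ["Crackers"] * 7
print(effective_null_rate(column))  # 0.86
```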

weighters_tags unknown other

The column 'weighters_tags' was skipped by the profiler, so no type, uniqueness, or value statistics are available beyond a row count of 50 and a null rate of 0.0. Without kind detection or sample values, its content and structure cannot be characterised here. The name suggests it may hold tag-like annotations, but this is not confirmed by evidence.

Treatment: Re-profile with parsing enabled to determine type and cardinality before deciding on use.

anthropic:claude-opus-4-7 · confidence low
Out[102]:

saturn.columns["weighters_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique n/a
alert: skipped (no profiler for kind=unknown)

popularity_tags unknown other

The column `popularity_tags` was skipped by the profiler, so its kind is unknown and no descriptive statistics were computed. The only signals available are that 50 rows were seen with a null rate of 0.0, meaning no row is null (values could still be empty lists or strings). Cardinality, type, and distribution are all missing, so the column's actual content cannot be characterized from this evidence.

Treatment: Re-profile with the appropriate parser (likely list/JSON) before deciding on downstream use.

anthropic:claude-opus-4-7 · confidence low
Out[104]:

saturn.columns["popularity_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique n/a
alert: skipped (no profiler for kind=unknown)

product_name_fi categorical metadata

Likely a Finnish-localized product name field, but it is essentially empty: 90% of rows are null, and the most frequent observed value is the empty string (2 of the 5 non-null entries, top_rate 0.4). The populated names are in English (e.g., 'Excellence: 90% cocoa Dark Supreme', 'Arriba 85% Cacao Dark Chocolate'), contradicting the _fi suffix. With only 3 non-blank values across 50 rows, this column carries almost no usable signal.

Treatment: Drop or defer until localization coverage improves; do not use as a feature.

anthropic:claude-opus-4-7 · confidence high
Out[106]:

saturn.columns["product_name_fi"].stats

stat value
n 50
nulls 45 (90.0%)
unique 4
top_value (empty string)
top_rate 0.4
cardinality 4
entropy 1.922
entropy_ratio 0.961
alert: long_tail (3 singleton categories)
alert: null_rate (90.0% null)
Fig 29.
Top values for product_name_fi.
Top values for product_name_fi (4 unique shown, of 4 total).
value | count | share
(empty string) | 2 | 4.0%
Excellence: 90% cocoa Dark Supreme | 1 | 2.0%
Arriba 85% Cacao Dark Chocolate | 1 | 2.0%
Original | 1 | 2.0%

origin_fr categorical free_text

This appears to be a French-language origin/provenance field describing where a product or its ingredients are made. The column is essentially empty: 40 of 50 rows hold the empty string and another 4 (8%) are null, leaving only 6 distinct non-blank descriptions, ranging from a single country ('France') to multi-region ingredient breakdowns. The low entropy ratio (0.319) and top_rate of 0.87 reflect how dominant the blank value is, and the long_tail alert flags the 6 singleton descriptions, so there is almost no usable signal here.

Treatment: Drop or defer; too sparse and unstructured to use without targeted NER on the few populated strings.

anthropic:claude-opus-4-7 · confidence high
Out[109]:

saturn.columns["origin_fr"].stats

stat value
n 50
nulls 4 (8.0%)
unique 7
top_value (empty string)
top_rate 0.8696
cardinality 7
entropy 0.8958
entropy_ratio 0.3191
alert: long_tail (6 singleton categories)
Fig 30.
Top values for origin_fr.
Top values for origin_fr (7 unique shown, of 7 total).
value | count | share
(empty string) | 40 | 80.0%
Fabriqué par: Aachen Allemagne | 1 | 2.0%
Germe de blé origine ue. Sésame origine non-ue. | 1 | 2.0%
France | 1 | 2.0%
fabriqué en France.pommes origine UE. noisettes origine UE et non UE | 1 | 2.0%
Fabriqué en France par Nutrition et Santé. Farine de blé: France. Figues : non UE | 1 | 2.0%
Pâte de cacao (Afrique de l'Ouest, Amérique du Sud)Afrique, Europe, Madagascar, Amérique du Sud, Afrique de l'Ouest | 1 | 2.0%

generic_name categorical free_text

Free-text generic product names, predominantly French with some English entries (e.g., "Compound Chocolate with MILK AND ALMONDS") and at least one German one ("Nuss-Nugat-Creme"). The dominant value is the empty string, in 21 of the 48 non-null rows (top_rate 0.4375); adding the 2 nulls, 23 of 50 rows (46%) carry no usable name. The other 27 values are all singletons, producing the flagged long tail.

Treatment: Treat empty strings as missing, then tokenize/normalize language before embedding or matching.

anthropic:claude-opus-4-7 · confidence high
Out[112]:

saturn.columns["generic_name"].stats

stat value
n 50
nulls 2 (4.0%)
unique 28
top_value (empty string)
top_rate 0.4375
cardinality 28
entropy 3.663
entropy_ratio 0.762
alert: long_tail27 singleton categories
Fig 31.
Top values for generic_name.
Top values for generic_name (20 unique shown, of 28 total).
value count share
"" 21 42.0%
BISCUITS FOURRÉS (35%) PARFUM CHOCOLAT 1 2.0%
Chocolat noir extra-fin traditionnel à 90% de cacao 1 2.0%
Biscuits au sésame 1 2.0%
Eau de source 1 2.0%
Compound Chocolate with MILK AND ALMONDS 1 2.0%
Sablé coco 1 2.0%
Biscuit fourré à la pâte à tartiner aux noisettes et au cacao Nutella® 1 2.0%
Pain croustillant a la farine de seigle 1 2.0%
Chocolat noir extra-fin traditionnel 1 2.0%
Chips de pommes de terre légèrement salées au sel de mer 1 2.0%
Chocolat noir extra fin, traditionnel 1 2.0%
goûters fourrés au chocolat noir 1 2.0%
Pain croustillant à la farine complète de seigle, avoine et sésame. 1 2.0%
Crackers 1 2.0%
Dark Chocolate 70% cocoa 1 2.0%
Biscuits aux pommes et aux noisettes, très pauvres en sel, riches en vitamines B1, B2, B9 et E et source de vitamines PP et B6 1 2.0%
Nuss-Nugat-Creme 1 2.0%
Snack Salé 1 2.0%
Biscuits au son de blé et la figue, riches en fibres, magnesium et phosphore, source de fer, et tres pauvres en sodium. 1 2.0%

nutriscore_version categorical metadata

This column records the Nutri-Score version applied to each row, and every one of the 50 records carries the value "2023". With cardinality 1 and entropy 0, it offers no discriminative signal in this sample.

Treatment: Drop, constant column.

anthropic:claude-opus-4-7 · confidence high
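nutriscore_version is not the only column with this shape: packaging_text_pl, packaging_text_fi and origin_de further down hold nothing but nulls and blanks. The sketch below flags all such columns in one pass. It assumes rows are loaded as a list of dicts (the JSON layout is not shown in this report) and treats null and blank strings as absent. `is_blank` and `uninformative_columns` are hypothetical helpers.

```python
def is_blank(value):
    """True for None and for empty or whitespace-only strings."""
    return value is None or (isinstance(value, str) and not value.strip())

def uninformative_columns(records):
    """Return columns whose non-blank values take at most one distinct value."""
    columns = sorted({key for record in records for key in record})
    flagged = []
    for column in columns:
        # repr() keeps unhashable cells such as lists or dicts countable
        distinct = {repr(r.get(column)) for r in records if not is_blank(r.get(column))}
        if len(distinct) <= 1:
            flagged.append(column)
    return flagged

# Synthetic rows shaped like the reported columns
rows = [
    {
        "nutriscore_version": "2023",                      # constant, like the real column
        "packaging_text_pl": "" if i % 10 == 0 else None,  # 5 blanks, 45 nulls
        "nova_group": 4 if i % 3 else 3,                   # varies, so it is kept
    }
    for i in range(50)
]
print(uninformative_columns(rows))  # ['nutriscore_version', 'packaging_text_pl']
```

This rule also flags a column with one real value plus blanks, such as a 'yes'/'' flag, where the blank may carry meaning. Review the list before dropping anything.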
Out[115]:

saturn.columns["nutriscore_version"].stats

stat value
n 50
nulls 0 (0.0%)
unique 1
top_value 2023
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: imbalance (top value is 100.0% of rows)
Fig 32.
Top values for nutriscore_version.
Top values for nutriscore_version (1 unique shown, of 1 total).
value count share
2023 50 100.0%

ingredients_without_ciqual_codes unknown other

The profiler skipped `ingredients_without_ciqual_codes` (no profiler exists for kind=unknown), so no descriptive statistics are available beyond a row count of 50 and a 0% null rate. The name suggests it lists ingredients that lack a matching code in CIQUAL, the French food-composition database. The values are probably stored as a list or nested structure that the profiler could not introspect. Without unique counts or value samples, nothing further can be inferred.

Treatment: Re-profile after parsing the nested structure, or explode to a list before downstream use.

anthropic:claude-opus-4-7 · confidence low
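If re-profiling shows that the cells are lists, exploding them to one (row, item) pair per element produces a shape any categorical profiler can handle. This is a minimal sketch. The `en:`-prefixed ingredient tags in the sample rows are invented for illustration, because the real cell format has not been inspected, and `explode` is a hypothetical helper.

```python
from collections import Counter

def explode(records, column):
    """Yield (row_index, item) pairs, one per element of a list-valued cell."""
    for index, record in enumerate(records):
        value = record.get(column)
        if value is None:
            continue
        # Scalars are wrapped so mixed list/scalar columns still explode cleanly
        items = value if isinstance(value, (list, tuple)) else [value]
        for item in items:
            yield index, item

# Hypothetical cells; the real contents of this column have not been inspected
rows = [
    {"ingredients_without_ciqual_codes": ["en:flavouring", "en:emulsifier"]},
    {"ingredients_without_ciqual_codes": []},
    {"ingredients_without_ciqual_codes": ["en:flavouring"]},
]
pairs = list(explode(rows, "ingredients_without_ciqual_codes"))
print(pairs)
print(Counter(item for _, item in pairs).most_common())
```

Row 1 disappears from the output because its list is empty. If an empty list is meaningful (presumably "every ingredient had a CIQUAL code"), count those rows separately before exploding.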
Out[118]:

saturn.columns["ingredients_without_ciqual_codes"].stats

stat value
n 50
nulls 0 (0.0%)
unique n/a
alert: skipped (no profiler for kind=unknown)

manufacturing_places_tags unknown metadata

This column was skipped by the profiler, so no statistics beyond row count and null rate are available. The name suggests it holds tags for manufacturing locations, likely a multi-valued or list-like field that the profiler could not classify. With 50 rows and 0% nulls reported but no uniqueness or value stats, nothing further can be inferred.

Treatment: Re-profile after parsing the tag list, then one-hot or multi-label encode top tags.

anthropic:claude-opus-4-7 · confidence low
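Here is a sketch of the encoding step, assuming each cell has already been parsed into a list of tag strings. `multi_label_encode` is hypothetical and the sample tags are invented. Tags outside the top_k are collapsed into a single `__other__` flag so that rare places do not each get their own column.

```python
from collections import Counter

def multi_label_encode(tag_lists, top_k):
    """Encode tag lists as 0/1 columns for the top_k most frequent tags plus an 'other' flag."""
    # Count each tag at most once per row
    counts = Counter(tag for tags in tag_lists for tag in set(tags))
    vocabulary = [tag for tag, _ in counts.most_common(top_k)]
    matrix = []
    for tags in tag_lists:
        present = set(tags)
        row = [int(tag in present) for tag in vocabulary]
        row.append(int(bool(present - set(vocabulary))))  # any tag outside the top_k
        matrix.append(row)
    return vocabulary + ["__other__"], matrix

# Hypothetical, already-parsed cells
tags = [["france"], ["france", "bretagne"], ["france", "bretagne"], ["allemagne"], []]
header, matrix = multi_label_encode(tags, top_k=2)
print(header)
for row in matrix:
    print(row)
```

With only 50 rows, a small top_k keeps the encoded width manageable. The `__other__` column preserves the fact that a row had some less common tag.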
Out[120]:

saturn.columns["manufacturing_places_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique n/a
alert: skipped (no profiler for kind=unknown)

photographers_tags unknown other

The column `photographers_tags` was skipped by the profiler, so no kind, uniqueness, or value statistics are available beyond a row count of 50 and a null rate of 0.0. The name suggests it holds tag annotations associated with photographers, likely a list or delimited string, but this cannot be confirmed from the evidence. No further signal is present to characterise distribution, cardinality, or content.

Treatment: Re-profile with list/string parsing enabled before deciding on downstream handling.

anthropic:claude-opus-4-7 · confidence low
Out[122]:

saturn.columns["photographers_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique n/a
alert: skipped (no profiler for kind=unknown)

packaging_text_pl categorical metadata

Polish-language packaging text field that is effectively empty: 90% of the 50 rows are null and the remaining 10% are all the empty string, giving a single observed value and zero entropy. There is no usable signal here, only nulls and blanks.

Treatment: Drop; the column carries no information.

anthropic:claude-opus-4-7 · confidence high
Out[124]:

saturn.columns["packaging_text_pl"].stats

stat value
n 50
nulls 45 (90.0%)
unique 1
top_value ""
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: null_rate (90.0% null)
alert: imbalance (top value is 100.0% of rows)
Fig 33.
Top values for packaging_text_pl.
Top values for packaging_text_pl (1 unique shown, of 1 total).
value count share
"" 5 10.0%

informers_tags unknown other

The column `informers_tags` was skipped by the profiler, so no type, uniqueness, or value statistics were computed beyond a row count of 50 with 0% nulls. Without stats it's impossible to tell whether this holds scalar tags, delimited lists, or nested structures, though the plural name hints at a multi-valued tag field. Treat any interpretation as provisional until the column is re-profiled.

Treatment: Re-run profiling with parsing for list/JSON values before deciding on downstream use.

anthropic:claude-opus-4-7 · confidence low
Out[127]:

saturn.columns["informers_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique n/a
alert: skipped (no profiler for kind=unknown)

ingredients_text_en categorical free_text

English-language ingredient lists for food products, stored as free-form text rather than a controlled vocabulary. With 36 unique values across 50 rows and an entropy ratio of 0.93, values are nearly all distinct. The only repeated value is the empty string (9 occurrences, top_rate 0.20), and 6 rows (12%) are null, so 15 of the 50 rows (30%) carry no usable ingredient text. Content is heterogeneous: there are multi-sentence allergen-tagged lists, percentages, punctuation noise, at least one junk entry ('Hhhhh'), and some entries written in French despite the `_en` suffix.

Treatment: Normalize, tokenize, and embed (or parse into ingredient lists) before modelling; treat empty strings as nulls.

anthropic:claude-opus-4-7 · confidence high
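A plain `split(",")` will not parse these entries into ingredient lists, because sub-ingredients inside parentheses also use commas (e.g. "raising agents (ammonium bicarbonate, sodium bicarbonate, disodium diphosphate)"). The minimal sketch below splits only on commas at bracket depth zero. `split_top_level` is a hypothetical helper, and it does not attempt any sentence-level clean-up.

```python
def split_top_level(text):
    """Split an ingredient list on commas that sit outside any parentheses or brackets."""
    items, current, depth = [], [], 0
    for char in text:
        if char in "([":
            depth += 1
        elif char in ")]" and depth > 0:
            depth -= 1
        if char == "," and depth == 0:
            items.append("".join(current).strip())
            current = []
        else:
            current.append(char)
    items.append("".join(current).strip())
    # Drop empty items left by trailing commas and trim a final full stop
    return [item.rstrip(".").strip() for item in items if item.strip(" .")]

example = ("sugar, raising agents (ammonium bicarbonate, sodium bicarbonate, "
           "disodium diphosphate), emulsifiers (SOY lecithin, sunflower lecithin), salt,")
for item in split_top_level(example):
    print(item)
```

In entries such as "Potatoes, sunflower oil, sea salt. May contain Milk.", the allergen sentence stays attached to the last item. A second pass on full stops, or a dedicated ingredient parser, would be needed to get clean items.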
Out[129]:

saturn.columns["ingredients_text_en"].stats

stat value
n 50
nulls 6 (12.0%)
unique 36
top_value ""
top_rate 0.2045
cardinality 36
entropy 4.811
entropy_ratio 0.9306
alert: long_tail (35 singleton categories)
Fig 34.
Top values for ingredients_text_en.
Top values for ingredients_text_en (20 unique shown, of 36 total).
value count share
"" 9 18.0%
milk cream, cream, sugar, banana, bacteria 1 2.0%
WHEAT flour 35%, whole WHEAT flour 15.7%, sugar, vegetable oils (palm, rapeseed), low-fat cocoa powder 4.5%, glucose syrup, WHEAT starch, raising agents (ammonium bicarbonate, sodium bicarbonate, disodium diphosphate), emulsifiers (SOY lecithin, sunflower lecithin), salt, skimmed MILK powder, lactose and MILK proteins, flavors, MAY CONTAIN EGG. 1 2.0%
cocoa mass, cocoa butter, fat reduced cocoa, sugar, vanilla 1 2.0%
Wheat flour, brown cane sugar, rapeseed oil, toasted sesame 10.6%, wheat germ 5.4%, whole wheat flour 5.4%, natural flavor, magnesium, emulsifier: lecithins, raising agents (potassium tartrates, sodium carbonates, ammonium carbonates), sea salt, wheat starch, vitamins (E, PP, B6, B1, B9). 1 2.0%
cocoa mass, low-fat cocoa powder, cocoa butter, sugar, emulsifier: lecithin (soy), vanilla extract, may contain traces of nuts and milk, 1 2.0%
Hhhhh 1 2.0%
sugar, cocoa butter, whole milk powder, cocoa mass, almonds, emulsifier (soya lecithin), flavoring 1 2.0%
cocoa mass #, cane sugar #, cocoa butter #, vanilla extract #, may contain nuts, milk, 1 2.0%
wholemeal rye flour (77 g*), rye flour (28 g*), yeast, salt, may contain traces of milk and sesame seeds, *in g per 100 g of product, 1 2.0%
cocoa paste, sugar, cocoa butter, vanilla, 1 2.0%
Potatoes, sunflower oil, sea salt. May contain Milk. 1 2.0%
cocoa mass, cocoa butter, fat-reduced cocoa powder, cane sugar, vanilla extract 1 2.0%
Pâte de cacao, cacao maigre, beurre de cacao, cassonade, vanille bourbon naturelle en gousse. 1 2.0%
_Wheat_ flour 39%, dark chocolate 25% (cocoa mass, cane sugar, cocoa butter), unrefined brown cane sugar, wholemeal _wheat_ flour 15%, oleic sunflower oil, natural vanilla flavouring, skimmed _milk_ powder, sea salt, raising agents: ammonium carbonates, sodium carbonates, thickener: acacia gum, antioxidant: rosemary extract. 1 2.0%
cocoa mass, sugar, cocoa butter, fat reduced cocoa powder, emulsifier: lecithins (soya), natural vanilla flavouring, dark chocolate contains: cocoa solids 74% minimum, 1 2.0%
whole rye flour (57 g), wheat bran (27 g), oatmeal (13 g), sesame seeds (7.9 g), wheat germ, salt. 1 2.0%
wheat flour, palm oil, glucose syrup, barley malt extract, raising agents (ammonium carbonates, sodium carbonates), salt, eggs , flavouring, flour treatment agent (sodium metabisulfite ), 1 2.0%
cocoa mass, sugar, cocoa butter, vanilla, 1 2.0%
Farine de maïs* (70%), farine de riz*, sel marin. * K issus de l'agriculture biologique. • sans sucres ajoutés(¹) (contient des sucres naturellement présents. 1 2.0%

ingredients_text_it categorical free_text

Free-form Italian ingredient lists for food products; 34 of the 50 rows (68%) are null. Of the 16 non-null entries, 5 are empty strings (top_rate 0.3125) and the other 11 are all distinct, giving 12 unique values and an entropy ratio of 0.913. At least one entry is garbled multilingual text, apparently OCR output, rather than a clean Italian list. This is effectively unstructured text rather than a categorical field, despite being typed as one.

Treatment: Treat as free text: normalize empty strings to null, then tokenize/parse for allergen or ingredient extraction rather than one-hot encoding.

anthropic:claude-opus-4-7 · confidence high
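Several entries wrap allergens in underscores (e.g. _FRUMENTO_, _UOVA_). The sketch below treats that as deliberate allergen markup. This matches what Open Food Facts appears to use for allergen emphasis, but it is an assumption, not something the profile confirms. `marked_allergens` is a hypothetical helper, and the sample text is copied from one of the values in this column.

```python
import re

# Assumed markup: allergens wrapped in single underscores, e.g. _FRUMENTO_
ALLERGEN_MARK = re.compile(r"_([^_]+)_")

def marked_allergens(text):
    """Return the sorted, lower-cased set of underscore-marked terms in an ingredient text."""
    if not text:
        return []  # None and "" (the top value here) carry no allergens
    return sorted({term.strip().lower() for term in ALLERGEN_MARK.findall(text)})

text = ("Farina di _FRUMENTO_, olio di palma, sciroppo di glucosio, estratto di malto d'_ORZO_, "
        "agenti lievitanti (carbonati di ammonio, carbonati di sodio), sale, _UOVA_, aroma, "
        "agente di trattamento della farina (_METABISOLFITO_ di sodio).")
print(marked_allergens(text))  # ['frumento', 'metabisolfito', 'orzo', 'uova']
```

Some entries mark allergens in upper case instead (NOCCIOLE, LATTE, SOIA in the hazelnut-cream biscuit entry), and this regex misses them. A complete extractor would need a second rule or a lookup against a known allergen list.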
Out[132]:

saturn.columns["ingredients_text_it"].stats

stat value
n 50
nulls 34 (68.0%)
unique 12
top_value ""
top_rate 0.3125
cardinality 12
entropy 3.274
entropy_ratio 0.9134
alert: long_tail (11 singleton categories)
alert: null_rate (68.0% null)
Fig 35.
Top values for ingredients_text_it.
Top values for ingredients_text_it (12 unique shown, of 12 total).
value count share
"" 5 10.0%
Pasta di cacao, burro di cacao, cacao magro in polvere, zucchero. Può contenere nocciole, mandorle, altra frutta a guscio, latte, soia. 1 2.0%
crema alle NOCCIOLE e al cacao 40% (zucchero, olio di palma, NOCCIOLE 13%, LATTE Scremato in polvere 8.7%, cacao magro 7,4%, emulsionanti: lecitine (SOIA): vanillina), farina di FRUMENTO (32%), grassi vegetali (palma, palmisto), zucchero di canna (9%), LATTOSIO, crusca di FRUMENTO, LATTE intero in polvere, estratto in polvere di malto d'ORZO e mais, miele, agenti lievitanti (difosfato disodico. carbonato acido di ammonio, carbonato acido di sodio), cacao magro, sale, amido di FRUMENTO, farina di ORZO maltato, emulsionanti: lecitine (SOIA), vanillina. 1 2.0%
pasta di cacao, zucchero, burro di cacao, vaniglia 1 2.0%
patate, olio di girasole, sale marino. 1 2.0%
Pasta di cacao, cacao magro, burro di cacao, zucchero grezzo di canna, vaniglia. 1 2.0%
Farina integrale di _segale_ (59 g), crusca di _grano_ (27 g), fiocchi d'_avena_ (12 g), semi di _sesamo_ (7,0 g), germe di _grano_, sale. Può contenere tracce di _latte_. 1 2.0%
Farina di _FRUMENTO_, olio di palma, sciroppo di glucosio, estratto di malto d'_ORZO_, agenti lievitanti (carbonati di ammonio, carbonati di sodio), sale, _UOVA_, aroma, agente di trattamento della farina (_METABISOLFITO_ di sodio). 1 2.0%
Pasta di cacao, zucchero, burro di cacao, vaniglia. 1 2.0%
Massa di cacao, zucchero, burro di cacao, emulsionante: lecitine (soia); estratto di vaniglia. Può contenere tracce di frutta a guscio e latte. Il 40% della massa di cacao proviene da piantagioni selezionate dell'Ecuador. 1 2.0%
wdrated potatoes, sunflower oll, wheat flour, corn lour.test NRC b ber otin. Emulgator (E471), Salz, Farbstoff (Annatto Norbirin, k hottom (BB). Packaged in a protective atmosphere, (DE) KNAEF Kam ef s1sel colorant (n0rbixine de rocou). Peut contenir lait, soja. À conse gie vepackt. (FR) SNACK SALE. INGREDIENTS: Pommes de terre disht SNCK SALATO. : Patate disidratate, olio di girasole, (arina d frmu botisiha d annatto). Puo contenere latte, sola. Da consumarsi prelerbilmetp SEL NGREDIENTES: Batatas desidratadas, óleo de girasol, farinha de trigo.(aimha d mh e o, Pode conter leite, soja. Consumir de preferëncia antes de: ver fundo (BB), Enbazhyer OHTS Pttas deshidratadas, aceite de qirasol, harina de trigo, harina de maiz, haia ca rm e eche, soja. Consumir preferentemente antes del: ver parte interior (8B), Enast et 'Releenc itle dn 100 g | RI" /30g| Eectsge/Ayt acuilo medo 84U bole / Prodoth te /30g ji begja /Valor energetico Tpas (Grassi/ Unjdos / Grasas tan eậticte Fetsäuren / dont 2214 kJ 664 kJ 530 kcal 159 kcal adulo medio / 8% 31g 3.0 9 9.3 0.9g 17g 13% Produoad by: see yd Aii dd cassi satui / dos quais Producido por urdes thtrde | Glucites | 5% oidrati / MedaCoyK Sabd 55g 7% Uont sucres /di eui *FRSCAME QNg 1 2.0%
25% noci, 25% mandorle, 25% uva sultanina (99,5% uva sultanina, olio di semi di girasole), 25% mirtilli rossi americani, essiccati e zuccherati (60% mirtilli rossi americani, 39% zucchero, olio di semi di girasole). Può contenere tracce di altra frutta a guscio e arachidi. Confezionato in atmosfera protettiva. 1 2.0%

origin_de categorical feature

This appears to be the German-language counterpart of origin_fr, but it carries no information in this sample: 30 of 50 rows (60%) are null and the remaining 20 all hold the empty string, giving a single unique value and zero entropy. There is no signal to model on here.

Treatment: Drop; constant column with majority nulls.

anthropic:claude-opus-4-7 · confidence high
Out[135]:

saturn.columns["origin_de"].stats

stat value
n 50
nulls 30 (60.0%)
unique 1
top_value ""
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: null_rate (60.0% null)
alert: imbalance (top value is 100.0% of rows)
Fig 36.
Top values for origin_de.
Top values for origin_de (1 unique shown, of 1 total).
value count share
"" 20 40.0%

nova_group numeric feature

This is the NOVA food classification group (1-4 scale), which indicates processing level. The distribution is heavily skewed toward ultra-processed foods: of the 48 non-null rows, 33 are group 4, 14 are group 3 and 1 is group 1, so the median is 4.0 and Q1-Q3 spans 3-4. The skew of -2.06 and kurtosis of 5.65 reflect a long left tail, and the single group-1 product is the one flagged outlier. Although the column is numeric, only 3 of the 4 NOVA categories appear; group 2 is absent.

Treatment: Treat as ordinal categorical rather than continuous; impute the 4% nulls with the median (4) and/or add a missing indicator.

anthropic:claude-opus-4-7 · confidence high
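Here is a minimal sketch of that treatment. It keeps the integer codes as ordered categories, fills nulls with the observed median, and carries a separate 0/1 missing flag so a model can tell imputed rows apart. The input is rebuilt from the histogram counts in Fig 37 (one 1, fourteen 3s, thirty-three 4s) plus the two reported nulls. `impute_ordinal` is hypothetical. `statistics.median_low` is used so that the fill value is always an existing category.

```python
import statistics

def impute_ordinal(values):
    """Fill missing ordinal codes with the observed median and return a 0/1 missing flag."""
    observed = [v for v in values if v is not None]
    # median_low always returns an observed value, so the fill is a real NOVA group
    fill = statistics.median_low(observed)
    filled = [fill if v is None else v for v in values]
    missing = [int(v is None) for v in values]
    return filled, missing, fill

# Rebuilt from the reported histogram: one group 1, fourteen group 3, thirty-three group 4, two nulls
nova = [1] * 1 + [3] * 14 + [4] * 33 + [None] * 2
filled, missing, fill = impute_ordinal(nova)
observed = [v for v in nova if v is not None]
print(fill, sum(missing), round(statistics.mean(observed), 3))  # 4 2 3.646
```

The recomputed mean (3.646) and sample standard deviation (0.601) match the reported stats, which supports reading the histogram bins as exact group counts.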
Out[138]:

saturn.columns["nova_group"].stats

stat value
n 50
nulls 2 (4.0%)
unique 3
min 1
max 4
mean 3.646
median 4
std 0.601
q1 3
q3 4
iqr 1
skew -2.062
kurtosis 5.651
n_outliers 1
outlier_rate 0.02083
zero_rate 0
alert: high_skew (skew=-2.06)
Fig 37.
Distribution of nova_group. Vertical dash marks the median.
Histogram bins for nova_group (median: 4.0).
bin count
1 – 1.5 1
1.5 – 2 0
2 – 2.5 0
2.5 – 3 0
3 – 3.5 14
3.5 – 4 33

packaging_text_fi categorical free_text

Finnish-language packaging text field that is effectively empty in this sample. 90% of the 50 rows are null and the remaining 5 rows all hold the empty string, giving a single observed value and zero entropy.

Treatment: Drop; column carries no information in this sample.

anthropic:claude-opus-4-7 · confidence high
Out[141]:

saturn.columns["packaging_text_fi"].stats

stat value
n 50
nulls 45 (90.0%)
unique 1
top_value ""
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: null_rate (90.0% null)
alert: imbalance (top value is 100.0% of rows)
Fig 38.
Top values for packaging_text_fi.
Top values for packaging_text_fi (1 unique shown, of 1 total).
value count share
"" 5 10.0%

states categorical feature

This column packs an Open Food Facts-style product completion checklist into a single comma-joined string of `en:*-completed` / `en:*-to-be-completed` tags covering nutrition, ingredients, photos, packaging, and more. There are 26 unique combinations across just 50 rows (entropy ratio 0.91), and the most common combination appears only 8 times (16%). It therefore behaves like a long-tail composite status flag rather than a clean category. The values are multi-valued and should be split into individual status tags before any modelling.

Treatment: Split on comma and one-hot encode each `en:*` tag instead of treating the concatenated string as a single category.

anthropic:claude-opus-4-7 · confidence high
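Here is a sketch of the split-and-one-hot treatment. The two inputs are shortened excerpts of states strings from this sample, truncated for readability. `split_states` and `one_hot_states` are hypothetical helpers.

```python
def split_states(value):
    """Split one comma-joined states string into a set of individual en:* tags."""
    return {tag.strip() for tag in value.split(",") if tag.strip()}

def one_hot_states(values):
    """Return (vocabulary, rows) with one 0/1 column per distinct tag."""
    tag_sets = [split_states(v) for v in values]
    vocabulary = sorted(set().union(*tag_sets))
    rows = [[int(tag in tags) for tag in vocabulary] for tags in tag_sets]
    return vocabulary, rows

# Truncated excerpts of two states strings from this sample
states = [
    "en:to-be-completed, en:nutrition-facts-completed, en:photos-uploaded",
    "en:to-be-checked, en:complete, en:nutrition-facts-completed, en:photos-uploaded",
]
vocabulary, rows = one_hot_states(states)
print(vocabulary)
print(rows)  # [[0, 1, 1, 0, 1], [1, 1, 1, 1, 0]]
```

Many of the resulting columns come in complementary pairs, such as `en:packaging-code-completed` and `en:packaging-code-to-be-completed`. Collapsing each pair into one boolean per facet would roughly halve the width. This should lose little or no information, provided every row carries exactly one tag from each pair, as the values shown here do.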
Out[144]:

saturn.columns["states"].stats

stat value
n 50
nulls 0 (0.0%)
unique 26
top_value en:to-be-completed, en:nutrition-facts-completed, en:ingredients-completed, en:expiration-date-completed, en:packaging-code-to-be-completed, en:characteristics-to-be-completed, en:origins-to-be-completed, en:categories-completed, en:brands-completed, en:packaging-completed, en:quantity-completed, en:product-name-completed, en:photos-validated, en:packaging-photo-selected, en:nutrition-photo-selected, en:ingredients-photo-selected, en:front-photo-selected, en:photos-uploaded
top_rate 0.16
cardinality 26
entropy 4.286
entropy_ratio 0.9119
alert: long_tail (16 singleton categories)
Fig 39.
Top values for states.
Top values for states (20 unique shown, of 26 total).
valuecountshare
en:to-be-completed, en:nutrition-facts-completed, en:ingredients-completed, en:expiration-date-completed, en:packaging-code-to-be-completed, en:characteristics-to-be-completed, en:origins-to-be-completed, en:categories-completed, en:brands-completed, en:packaging-completed, en:quantity-completed, en:product-name-completed, en:photos-validated, en:packaging-photo-selected, en:nutrition-photo-selected, en:ingredients-photo-selected, en:front-photo-selected, en:photos-uploaded816.0%
en:to-be-checked, en:complete, en:nutrition-facts-completed, en:ingredients-completed, en:expiration-date-completed, en:packaging-code-completed, en:characteristics-completed, en:origins-completed, en:categories-completed, en:brands-completed, en:packaging-completed, en:quantity-completed, en:product-name-completed, en:photos-validated, en:packaging-photo-selected, en:nutrition-photo-selected, en:ingredients-photo-selected, en:front-photo-selected, en:photos-uploaded612.0%
en:to-be-completed, en:nutrition-facts-completed, en:ingredients-completed, en:expiration-date-completed, en:packaging-code-to-be-completed, en:characteristics-to-be-completed, en:origins-to-be-completed, en:categories-completed, en:brands-completed, en:packaging-completed, en:quantity-completed, en:product-name-completed, en:photos-to-be-validated, en:packaging-photo-to-be-selected, en:nutrition-photo-selected, en:ingredients-photo-selected, en:front-photo-selected, en:photos-uploaded510.0%
en:to-be-checked, en:complete, en:nutrition-facts-completed, en:ingredients-completed, en:expiration-date-completed, en:packaging-code-to-be-completed, en:characteristics-completed, en:origins-completed, en:categories-completed, en:brands-completed, en:packaging-completed, en:quantity-completed, en:product-name-completed, en:photos-validated, en:packaging-photo-selected, en:nutrition-photo-selected, en:ingredients-photo-selected, en:front-photo-selected, en:photos-uploaded36.0%
en:to-be-completed, en:nutrition-facts-completed, en:ingredients-completed, en:expiration-date-to-be-completed, en:packaging-code-to-be-completed, en:characteristics-to-be-completed, en:origins-to-be-completed, en:categories-completed, en:brands-completed, en:packaging-to-be-completed, en:quantity-to-be-completed, en:product-name-completed, en:photos-to-be-validated, en:packaging-photo-to-be-selected, en:nutrition-photo-selected, en:ingredients-photo-selected, en:front-photo-selected, en:photos-uploaded24.0%
en:checked, en:complete, en:nutrition-facts-completed, en:ingredients-completed, en:expiration-date-completed, en:packaging-code-completed, en:characteristics-completed, en:origins-completed, en:categories-completed, en:brands-completed, en:packaging-completed, en:quantity-completed, en:product-name-completed, en:photos-validated, en:packaging-photo-selected, en:nutrition-photo-selected, en:ingredients-photo-selected, en:front-photo-selected, en:photos-uploaded24.0%
en:to-be-completed, en:nutrition-facts-completed, en:ingredients-completed, en:expiration-date-to-be-completed, en:packaging-code-to-be-completed, en:characteristics-completed, en:origins-completed, en:categories-completed, en:brands-completed, en:packaging-completed, en:quantity-completed, en:product-name-completed, en:photos-to-be-validated, en:packaging-photo-to-be-selected, en:nutrition-photo-selected, en:ingredients-photo-selected, en:front-photo-selected, en:photos-uploaded24.0%
en:to-be-completed, en:nutrition-facts-completed, en:ingredients-completed, en:expiration-date-to-be-completed, en:packaging-code-to-be-completed, en:characteristics-to-be-completed, en:origins-to-be-completed, en:categories-completed, en:brands-completed, en:packaging-to-be-completed, en:quantity-completed, en:product-name-completed, en:photos-to-be-validated, en:packaging-photo-to-be-selected, en:nutrition-photo-selected, en:ingredients-photo-selected, en:front-photo-selected, en:photos-uploaded24.0%
en:to-be-completed, en:nutrition-facts-completed, en:ingredients-completed, en:expiration-date-completed, en:packaging-code-to-be-completed, en:characteristics-completed, en:origins-completed, en:categories-completed, en:brands-completed, en:packaging-completed, en:quantity-completed, en:product-name-completed, en:photos-to-be-validated, en:packaging-photo-to-be-selected, en:nutrition-photo-selected, en:ingredients-photo-selected, en:front-photo-selected, en:photos-uploaded24.0%
en:to-be-checked, en:complete, en:nutrition-facts-completed, en:ingredients-completed, en:expiration-date-to-be-completed, en:packaging-code-to-be-completed, en:characteristics-completed, en:origins-completed, en:categories-completed, en:brands-completed, en:packaging-completed, en:quantity-completed, en:product-name-completed, en:photos-validated, en:packaging-photo-selected, en:nutrition-photo-selected, en:ingredients-photo-selected, en:front-photo-selected, en:photos-uploaded24.0%
en:to-be-completed, en:nutrition-facts-completed, en:ingredients-completed, en:expiration-date-completed, en:packaging-code-completed, en:characteristics-completed, en:origins-completed, en:categories-completed, en:brands-completed, en:packaging-completed, en:quantity-completed, en:product-name-completed, en:photos-to-be-validated, en:packaging-photo-to-be-selected, en:nutrition-photo-to-be-selected, en:ingredients-photo-selected, en:front-photo-to-be-selected, en:photos-uploaded12.0%
en:to-be-completed, en:nutrition-facts-completed, en:ingredients-completed, en:expiration-date-completed, en:packaging-code-to-be-completed, en:characteristics-to-be-completed, en:origins-to-be-completed, en:categories-completed, en:brands-completed, en:packaging-completed, en:quantity-completed, en:product-name-completed, en:photos-to-be-validated, en:packaging-photo-selected, en:nutrition-photo-selected, en:ingredients-photo-selected, en:front-photo-to-be-selected, en:photos-uploaded12.0%
en:to-be-completed, en:nutrition-facts-completed, en:ingredients-completed, en:expiration-date-to-be-completed, en:packaging-code-to-be-completed, en:characteristics-to-be-completed, en:origins-completed, en:categories-completed, en:brands-completed, en:packaging-to-be-completed, en:quantity-completed, en:product-name-completed, en:photos-validated, en:packaging-photo-selected, en:nutrition-photo-selected, en:ingredients-photo-selected, en:front-photo-selected, en:photos-uploaded12.0%
en:checked, en:complete, en:nutrition-facts-completed, en:ingredients-completed, en:expiration-date-completed, en:packaging-code-to-be-completed, en:characteristics-completed, en:origins-completed, en:categories-completed, en:brands-completed, en:packaging-completed, en:quantity-completed, en:product-name-completed, en:photos-validated, en:packaging-photo-selected, en:nutrition-photo-selected, en:ingredients-photo-selected, en:front-photo-selected, en:photos-uploaded12.0%
en:to-be-completed, en:nutrition-facts-completed, en:ingredients-completed, en:expiration-date-completed, en:packaging-code-completed, en:characteristics-to-be-completed, en:origins-to-be-completed, en:categories-completed, en:brands-completed, en:packaging-completed, en:quantity-completed, en:product-name-completed, en:photos-to-be-validated, en:packaging-photo-to-be-selected, en:nutrition-photo-selected, en:ingredients-photo-selected, en:front-photo-selected, en:photos-uploaded12.0%
en:to-be-completed, en:nutrition-facts-completed, en:ingredients-completed, en:expiration-date-to-be-completed, en:packaging-code-to-be-completed, en:characteristics-to-be-completed, en:origins-to-be-completed, en:categories-completed, en:brands-completed, en:packaging-completed, en:quantity-completed, en:product-name-completed, en:photos-to-be-validated, en:packaging-photo-to-be-selected, en:nutrition-photo-selected, en:ingredients-photo-selected, en:front-photo-selected, en:photos-uploaded12.0%
en:to-be-completed, en:nutrition-facts-completed, en:ingredients-completed, en:expiration-date-completed, en:packaging-code-to-be-completed, en:characteristics-to-be-completed, en:origins-to-be-completed, en:categories-completed, en:brands-completed, en:packaging-completed, en:quantity-completed, en:product-name-to-be-completed, en:photos-to-be-validated, en:packaging-photo-to-be-selected, en:nutrition-photo-selected, en:ingredients-photo-selected, en:front-photo-selected, en:photos-uploaded12.0%
en:to-be-completed, en:nutrition-facts-completed, en:ingredients-completed, en:expiration-date-to-be-completed, en:packaging-code-completed, en:characteristics-to-be-completed, en:origins-to-be-completed, en:categories-completed, en:brands-completed, en:packaging-completed, en:quantity-completed, en:product-name-completed, en:photos-validated, en:packaging-photo-selected, en:nutrition-photo-selected, en:ingredients-photo-selected, en:front-photo-selected, en:photos-uploaded12.0%
en:to-be-completed, en:nutrition-facts-completed, en:ingredients-completed, en:expiration-date-to-be-completed, en:packaging-code-completed, en:characteristics-completed, en:origins-completed, en:categories-completed, en:brands-completed, en:packaging-completed, en:quantity-completed, en:product-name-completed, en:photos-to-be-validated, en:packaging-photo-to-be-selected, en:nutrition-photo-selected, en:ingredients-photo-selected, en:front-photo-selected, en:photos-uploaded12.0%
en:checked, en:complete, en:nutrition-facts-completed, en:ingredients-completed, en:expiration-date-to-be-completed, en:packaging-code-to-be-completed, en:characteristics-completed, en:origins-completed, en:categories-completed, en:brands-completed, en:packaging-completed, en:quantity-completed, en:product-name-completed, en:photos-validated, en:packaging-photo-selected, en:nutrition-photo-selected, en:ingredients-photo-selected, en:front-photo-selected, en:photos-uploaded12.0%

ingredients_with_unspecified_percent_sum numeric feature

This column appears to be a per-record sum of ingredient percentages whose precise share was not specified, on a 0–100 scale (min 0.4, max 100.0). The distribution is left-skewed (skew -1.18). Both the median and Q3 equal the maximum of 100.0, so at least half of the 50 rows have effectively all of their ingredient mass unspecified, and the top histogram bin (85.77–100) holds 34 rows. With only 22 unique values and a mean of 79.4 pulled down by a tail of lower sums, the bulk of the data sits at the upper bound.

Treatment: Treat as a data-quality indicator; consider binarizing (e.g. =100 vs <100) rather than using the raw left-skewed value.

anthropic:claude-opus-4-7 · confidence high
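Here is a sketch of the suggested binarization. The cut at 100 is the example threshold from the treatment above. The small tolerance guards against float sums such as 99.9999999, which is an assumption about how the sums are computed. `fully_unspecified` is hypothetical. The sample mixes reported values (100, 53.6, 0.4) with synthetic ones.

```python
def fully_unspecified(value, ceiling=100.0, tol=1e-6):
    """Return 1 when the unspecified-percent sum sits at the ceiling, 0 below it, None if missing."""
    if value is None:
        return None
    return int(value >= ceiling - tol)

sample = [100, 100.0, 99.9999999, 53.6, 0.4, None]
print([fully_unspecified(v) for v in sample])  # [1, 1, 1, 0, 0, None]
```

A binary flag sidesteps the left skew entirely. The raw value can still be kept alongside it if the partial-specification tail turns out to matter.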
Out[147]:

saturn.columns["ingredients_with_unspecified_percent_sum"].stats

stat value
n 50
nulls 0 (0.0%)
unique 22
min 0.4
max 100
mean 79.42
median 100
std 31.64
q1 53.6
q3 100
iqr 46.4
skew -1.183
kurtosis -0.133
n_outliers 0
outlier_rate 0
zero_rate 0
Fig 40.
Distribution of ingredients_with_unspecified_percent_sum. Vertical dash marks the median.
Histogram bins for ingredients_with_unspecified_percent_sum (median: 100.0).
bin count
0.4 – 14.63 2
14.63 – 28.86 4
28.86 – 43.09 4
43.09 – 57.31 3
57.31 – 71.54 2
71.54 – 85.77 1
85.77 – 100 34

added_countries_tags unknown other

The column 'added_countries_tags' was skipped by the profiler, so no type, uniqueness, or distributional statistics are available beyond a row count of 50 and a null rate of 0.0. The name suggests it holds country tags associated with record additions, likely a list-like or multi-valued field that the profiler could not classify. Without further stats, nothing can be said about cardinality, format, or content.

Treatment: Inspect raw values manually and re-profile after parsing into a normalized list type.

anthropic:claude-opus-4-7 · confidence low
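The raw representation is unknown, so a defensive parser that accepts the likely shapes is a reasonable first step. Those shapes are a real list, a JSON-encoded list, or a comma-joined string like the `states` column. This is a sketch: `to_tag_list` is hypothetical, and lower-casing plus de-duplication are normalization choices, not something the source prescribes.

```python
import json

def to_tag_list(value):
    """Coerce a raw cell into a sorted, de-duplicated list of lower-cased tag strings."""
    if value is None:
        return []
    if isinstance(value, str):
        text = value.strip()
        if text.startswith("["):
            value = json.loads(text)  # JSON-encoded list, e.g. '["en:france"]'
        else:
            value = text.split(",") if text else []  # comma-joined string
    if not isinstance(value, (list, tuple, set)):
        value = [value]  # single scalar tag
    tags = {str(tag).strip().lower() for tag in value}
    return sorted(tag for tag in tags if tag)

for raw in (None, "", '["en:France", "en:spain"]', "en:spain, en:france", ["en:France", "en:france"]):
    print(repr(raw), "->", to_tag_list(raw))
```

Once every cell is a plain list of strings, the column can be re-profiled, or exploded and counted like any other multi-valued field.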
Out[150]:

saturn.columns["added_countries_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique n/a
alert: skipped (no profiler for kind=unknown)

id categorical identifier

This column is a unique row identifier: all 50 values are distinct (entropy_ratio 1.0) and none are null. The values look like product barcodes. Most are 13-digit EAN/GTIN strings such as '6111242100992', plus shorter 8-digit codes such as '20995553' and '20022464'. This suggests a product-level key rather than a sequential surrogate ID. The long_tail alert simply reflects that every value occurs exactly once (top_rate 0.02).

Treatment: Drop from modelling; retain as a join key for linking to product metadata.

anthropic:claude-opus-4-7 · confidence high
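Because the ids look like GS1 barcodes, their trailing check digit offers a cheap integrity test, and zero-padding gives every key the same 13-digit form. The sketch below uses the standard GS1 mod-10 rule, with weights 3 and 1 alternating leftward from the check digit. The helper names are hypothetical. Padding to 13 digits matches the image URL paths in this sample (000/002/099/5553 for id 20995553). Whether downstream systems expect 13 or 14 digits is an assumption to check.

```python
def gtin_check_digit_ok(code):
    """Validate the GS1 mod-10 check digit of an 8-, 12-, 13- or 14-digit code."""
    if not code.isdigit() or len(code) not in (8, 12, 13, 14):
        return False
    body, check = code[:-1], int(code[-1])
    # Weights alternate 3, 1, 3, ... moving leftward from the digit before the check digit
    total = sum(int(d) * (3 if i % 2 == 0 else 1) for i, d in enumerate(reversed(body)))
    return (10 - total % 10) % 10 == check

def to_gtin13(code):
    """Left-pad with zeros to 13 digits, the form used in this sample's image paths."""
    return code.zfill(13)

for code in ["6111242100992", "20995553", "6111242100993"]:
    print(code, gtin_check_digit_ok(code), to_gtin13(code))
```

The ids checked here from this sample (6111242100992, 7622210449283, 3175680011480, 20995553, 20022464) all pass. A failure elsewhere would point to a typo or a non-GS1 internal code rather than a real barcode.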
Out[152]:

saturn.columns["id"].stats

stat value
n 50
nulls 0 (0.0%)
unique 50
top_value 6111242100992
top_rate 0.02
cardinality 50
entropy 5.644
entropy_ratio 1
alert: long_tail (50 singleton categories)
Fig 41.
Top values for id.
Top values for id (20 unique shown, of 50 total).
value count share
6111242100992 1 2.0%
7622210449283 1 2.0%
3046920029759 1 2.0%
6111031005064 1 2.0%
3175680011480 1 2.0%
20995553 1 2.0%
3268840001008 1 2.0%
3362600011044 1 2.0%
8425197712024 1 2.0%
7622210578464 1 2.0%
6111259343108 1 2.0%
3362600011228 1 2.0%
8000500310427 1 2.0%
7300400481595 1 2.0%
3046920022651 1 2.0%
5060042641000 1 2.0%
7622210584724 1 2.0%
3046920022606 1 2.0%
3229820100234 1 2.0%
20022464 1 2.0%

nutrient_levels unknown other

The column 'nutrient_levels' was skipped by the profiler, so its kind is unknown and no descriptive statistics are available. We only know it has 50 rows with a 0.0 null rate; uniqueness, distribution, and value structure are all missing from the evidence.

Treatment: Re-profile or manually inspect a sample to determine the underlying type before any downstream use.

anthropic:claude-opus-4-7 · confidence low
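A quick type census is one way to do that inspection before choosing a parser. This sketch uses a hypothetical helper and invented sample cells. The dict shape shown for nutrient_levels (nutrient name mapped to a level such as "high") is an assumption based on how Open Food Facts product records commonly look; this report does not confirm it.

```python
from collections import Counter

def type_census(values):
    """Count the Python shapes present in a column, separating blanks and empties."""
    census = Counter()
    for value in values:
        if value is None:
            census["null"] += 1
        elif isinstance(value, dict):
            census["dict" if value else "empty_dict"] += 1
        elif isinstance(value, (list, tuple)):
            census["list" if value else "empty_list"] += 1
        elif isinstance(value, str):
            census["str" if value.strip() else "blank_str"] += 1
        else:
            census[type(value).__name__] += 1
    return dict(census)

# Invented cells for illustration only
cells = [{"fat": "moderate", "salt": "low"}, {}, {"sugars": "high"}, None, ""]
print(type_census(cells))
```

Separating empty dicts and lists from populated ones is deliberate. They count as non-null, which could be one reason a column reports 0% nulls while offering little to profile.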
Out[155]:

saturn.columns["nutrient_levels"].stats

stat value
n 50
nulls 0 (0.0%)
unique n/a
alert: skipped (no profiler for kind=unknown)

sortkey numeric timestamp

Values range from 1,567,543,172 to 1,610,897,644, with a median of 1,608,147,866. This is consistent with Unix epoch seconds spanning early September 2019 to mid-January 2021, stored here as a numeric sort key. The distribution is heavily left-skewed (skew -2.78, kurtosis 8.09), with 4 outliers (9.1% of the 44 non-null rows) trailing toward older timestamps, and 12% of rows are null. The IQR of about 6.16M seconds (~71 days), against a range of about 43M seconds (~502 days), shows that most records cluster late in the window.

Treatment: Convert from epoch seconds to datetime and use as a temporal feature rather than a raw numeric.

anthropic:claude-opus-4-7 · confidence high
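Here is a sketch of the conversion. It assumes, as the prose infers rather than any source schema states, that sortkey holds Unix epoch seconds in UTC. `epoch_to_utc` is hypothetical; nulls pass through unchanged so the 12% missing rows stay visible.

```python
from datetime import datetime, timezone

def epoch_to_utc(seconds):
    """Convert Unix epoch seconds to a timezone-aware UTC datetime; None passes through."""
    if seconds is None:
        return None
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

# Reported minimum, median and maximum, plus a null
for value in (1567543172, 1608147866, 1610897644, None):
    converted = epoch_to_utc(value)
    print(value, converted.isoformat() if converted else None)
```

Under that assumption, the reported minimum, median and maximum fall on 2019-09-03, 2020-12-16 and 2021-01-17 (UTC). Derived features such as record age or month can then be computed from the datetime rather than the raw integer.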
Out[157]:

saturn.columns["sortkey"].stats

stat value
n 50
nulls 6 (12.0%)
unique 44
min 1.568e+09
max 1.611e+09
mean 1.605e+09
median 1.608e+09
std 8.692e+06
q1 1.604e+09
q3 1.61e+09
iqr 6.16e+06
skew -2.782
kurtosis 8.091
n_outliers 4
outlier_rate 0.09091
zero_rate 0
alert: high_skew (skew=-2.78)
alert: outliers (9.1% rows beyond 1.5 IQR)
Fig 42.
Distribution of sortkey. Vertical dash marks the median.
Histogram bins for sortkey (median: 1608147866.0).
bincount
1.568e+09 – 1.575e+091
1.575e+09 – 1.582e+091
1.582e+09 – 1.589e+091
1.589e+09 – 1.596e+091
1.596e+09 – 1.604e+095
1.604e+09 – 1.611e+0935
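The epoch-seconds reading of `sortkey` can be checked directly. The minimal sketch below uses only the standard library. It assumes the values are seconds (not milliseconds) since 1970-01-01 UTC. `epoch_to_utc` is a hypothetical helper name.

```python
from datetime import datetime, timezone

SECONDS_PER_DAY = 86_400

def epoch_to_utc(seconds):
    """Interpret Unix epoch seconds as a timezone-aware UTC datetime; None passes through."""
    if seconds is None:
        return None
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

# min, median and max reported for `sortkey`
for label, value in [("min", 1567543172), ("median", 1608147866), ("max", 1610897644)]:
    print(label, epoch_to_utc(value).date())

print(round(6.16e6 / SECONDS_PER_DAY, 1), "days in the IQR")
```

Under that assumption the endpoints fall on 2019-09-03 and 2021-01-17 (UTC), and the median falls on 2020-12-16. Keeping the result timezone-aware avoids silently shifting dates by the local offset.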

image_small_url categorical identifier

Per-row URL pointing to a 200px product front image hosted on images.openfoodfacts.org, with French/English locale suffixes embedded in the filename. All 50 rows are unique with zero nulls, so this acts as a row-level asset reference rather than a feature. The path segments encode the product barcode (e.g. 6111242100992), making this effectively a derivable identifier.

Treatment: Drop from modelling; retain as a fetch URL for image pipelines or extract the embedded barcode.

anthropic:claude-opus-4-7 · confidence high
Out[160]:

saturn.columns["image_small_url"].stats

stat value
n 50
nulls 0 (0.0%)
unique 50
top_value https://images.openfoodfacts.org/images/products/611/124/210/0992/front_fr.172.200.jpg
top_rate 0.02
cardinality 50
entropy 5.644
entropy_ratio 1
alert: long_tail · 50 singleton categories
Fig 43.
Top values for image_small_url.
Top values for image_small_url (20 unique shown, of 50 total).
value count share
https://images.openfoodfacts.org/images/products/611/124/210/0992/front_fr.172.200.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/762/221/044/9283/front_en.605.200.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/304/692/002/9759/front_en.492.200.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/611/103/100/5064/front_fr.56.200.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/317/568/001/1480/front_en.221.200.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/000/002/099/5553/front_en.314.200.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/326/884/000/1008/front_fr.422.200.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/336/260/001/1044/front_fr.50.200.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/842/519/771/2024/front_en.60.200.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/762/221/057/8464/front_en.29.200.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/611/125/934/3108/front_fr.25.200.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/336/260/001/1228/front_fr.38.200.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/800/050/031/0427/front_fr.488.200.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/730/040/048/1595/front_fr.242.200.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/304/692/002/2651/front_en.159.200.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/506/004/264/1000/front_en.179.200.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/762/221/058/4724/front_en.95.200.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/304/692/002/2606/front_en.102.200.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/322/982/010/0234/front_fr.246.200.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/000/002/002/2464/front_en.301.200.jpg 1 2.0%
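The barcode embedded in these URLs can be recovered by joining the digit groups that follow `/images/products/`. This sketch infers that layout from the URLs listed above, not from any Open Food Facts specification. `barcode_from_image_url` is a hypothetical helper. Short codes appear zero-padded to 13 digits in the path (for example `000/002/099/5553`), while this sample also contains the unpadded value `20995553`. Stripping leading zeros may therefore be needed before joining on a barcode column.

```python
import re

# Digit groups between /images/products/ and the filename, e.g. 611/124/210/0992/
_PRODUCT_PATH = re.compile(r"/images/products/((?:\d+/)+)")

def barcode_from_image_url(url, strip_leading_zeros=False):
    """Rebuild the barcode from the slash-separated digit groups of an image URL.

    Returns None when the URL does not contain the expected path pattern.
    """
    match = _PRODUCT_PATH.search(url)
    if match is None:
        return None
    code = match.group(1).replace("/", "")
    if strip_leading_zeros:
        code = code.lstrip("0") or "0"
    return code

url = "https://images.openfoodfacts.org/images/products/611/124/210/0992/front_fr.172.200.jpg"
print(barcode_from_image_url(url))  # 6111242100992
```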

packaging_recycling_tags unknown other

The column 'packaging_recycling_tags' was skipped by the profiler, so no type, uniqueness, or value statistics are available beyond a row count of 50 and a null rate of 0.0. The name suggests it holds packaging or recyclability tags, likely a multi-valued or list-like field that the profiler could not categorise. Without parsed values, nothing can be said about cardinality, distribution, or label vocabulary.

Treatment: Re-profile after parsing into a list or one-hot tag set before deciding on use.

anthropic:claude-opus-4-7 · confidence low
Out[163]:

saturn.columns["packaging_recycling_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique n/a
alert: skipped · no profiler for kind=unknown
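Once a tag column such as `packaging_recycling_tags` has been parsed into per-row lists, a multi-hot encoding gives one 0/1 column per distinct tag. This is a minimal sketch. It assumes each value is already a Python list of strings or None. The tag strings in the example are made up, and `multi_hot` is a hypothetical helper.

```python
def multi_hot(tag_lists, prefix="tag_"):
    """Build a sorted tag vocabulary and one dict of 0/1 indicators per row.

    Each element of `tag_lists` must be a list of strings or None (None means no tags).
    """
    vocabulary = sorted({tag for tags in tag_lists if tags for tag in tags})
    encoded = []
    for tags in tag_lists:
        present = set(tags or [])
        encoded.append({prefix + tag: int(tag in present) for tag in vocabulary})
    return vocabulary, encoded

vocab, encoded = multi_hot([["recycle", "discard"], ["recycle"], None])
print(vocab)       # ['discard', 'recycle']
print(encoded[1])  # {'tag_discard': 0, 'tag_recycle': 1}
```

A raw string passed in place of a list would be split into characters, so parsing has to happen first.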

food_groups categorical feature

This is a categorical food taxonomy field using Open Food Facts-style prefixed slugs (e.g., 'en:biscuits-and-cakes'). The distribution is heavily concentrated on sweets: 'en:biscuits-and-cakes' (17/49) and 'en:chocolate-products' (16/49) together account for roughly two-thirds of non-null rows, with 11 distinct categories across 50 records and a 2% null rate. Entropy ratio of 0.74 confirms moderate concentration rather than uniform spread.

Treatment: Strip the 'en:' prefix and one-hot or target-encode; consider grouping the long tail of single-occurrence categories into 'other'.

anthropic:claude-opus-4-7 · confidence high
Out[165]:

saturn.columns["food_groups"].stats

stat value
n 50
nulls 1 (2.0%)
unique 11
top_value en:biscuits-and-cakes
top_rate 0.3469
cardinality 11
entropy 2.549
entropy_ratio 0.7367
Fig 44.
Top values for food_groups.
Top values for food_groups (11 unique shown, of 11 total).
value count share
en:biscuits-and-cakes 17 34.0%
en:chocolate-products 16 32.0%
en:appetizers 4 8.0%
en:pastries 3 6.0%
en:bread 2 4.0%
en:sweets 2 4.0%
en:dairy-desserts 1 2.0%
en:unsweetened-beverages 1 2.0%
en:cereals 1 2.0%
en:dried-fruits 1 2.0%
en:cereals-and-potatoes 1 2.0%
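The treatment for `food_groups` (strip the `en:` prefix, then fold rare categories into 'other') could look like the sketch below. The threshold `min_count=2` is an assumption. With the counts from the table above, it folds the five single-occurrence groups into 'other'. Helper names are hypothetical.

```python
from collections import Counter

def strip_lang_prefix(tag):
    """'en:biscuits-and-cakes' -> 'biscuits-and-cakes'; unprefixed values and None pass through."""
    if tag is None:
        return None
    return tag.split(":", 1)[1] if ":" in tag else tag

def group_rare(values, min_count=2, other="other"):
    """Replace categories seen fewer than `min_count` times with `other`; None stays None."""
    counts = Counter(v for v in values if v is not None)
    return [None if v is None else (v if counts[v] >= min_count else other) for v in values]

# Rebuild the 50-row column from the food_groups counts reported above (plus 1 null).
food_group_counts = {
    "en:biscuits-and-cakes": 17, "en:chocolate-products": 16, "en:appetizers": 4,
    "en:pastries": 3, "en:bread": 2, "en:sweets": 2, "en:dairy-desserts": 1,
    "en:unsweetened-beverages": 1, "en:cereals": 1, "en:dried-fruits": 1,
    "en:cereals-and-potatoes": 1,
}
values = [tag for tag, n in food_group_counts.items() for _ in range(n)] + [None]
grouped = group_rare([strip_lang_prefix(v) for v in values])
print(Counter(g for g in grouped if g is not None).most_common())
```

The grouped column has 7 categories, which keeps a one-hot encoding small on a 50-row sample.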

nova_groups_markers unknown other

Column 'nova_groups_markers' was skipped by the profiler, so no type, uniqueness, or distribution stats are available beyond a row count of 50 and a null rate of 0.0. The name suggests it carries NOVA food-classification group markers, likely a list or structured field that the dissector could not parse. Without parsed values, nothing further can be said about its content.

Treatment: Inspect raw values manually and reparse (likely a list/struct) before deciding on use.

anthropic:claude-opus-4-7 · confidence low
Out[168]:

saturn.columns["nova_groups_markers"].stats

stat value
n 50
nulls 0 (0.0%)
unique n/a
alert: skipped · no profiler for kind=unknown

packaging_text_de categorical free_text

German-language packaging description field, almost entirely unpopulated. With a 60% null rate and the empty string accounting for 19 of the 20 non-null rows (95% top_rate), only one row carries actual content ("1 Folie aus 22 PAP zum Recyclen", roughly "one film made of PAP 22 paper, to be recycled"). Cardinality of 2 and entropy ratio of 0.29 confirm there is virtually no usable signal here.

Treatment: Drop; effectively empty with only one informative value.

anthropic:claude-opus-4-7 · confidence high
Out[170]:

saturn.columns["packaging_text_de"].stats

stat value
n 50
nulls 30 (60.0%)
unique 2
top_value ""
top_rate 0.95
cardinality 2
entropy 0.2864
entropy_ratio 0.2864
alert: null_rate · 60.0% null
Fig 45.
Top values for packaging_text_de.
Top values for packaging_text_de (2 unique shown, of 2 total).
value count share
"" 19 38.0%
1 Folie aus 22 PAP zum Recyclen 1 2.0%
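The "effectively empty" reading combines nulls and empty strings. Below is a small worked check using the counts reported for `packaging_text_de`: 30 nulls and 19 empty strings out of 50 rows (`packaging_text_es` shows the same counts). The helper name is hypothetical.

```python
def effective_missing_rate(n_rows, n_null, n_empty):
    """Share of rows that are either null or an empty string."""
    return (n_null + n_empty) / n_rows

rate = effective_missing_rate(n_rows=50, n_null=30, n_empty=19)
print(f"{rate:.0%}")  # 98%: only 1 of 50 rows carries text
```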

categories_lc categorical feature

This column appears to hold lowercase ISO language codes, with 6 distinct values across 50 rows and no nulls. The distribution is dominated by 'fr' (25) and 'en' (19), together covering 44 of 50 rows, while 'es', 'de', 'it', and 'pl' form a thin long tail. Entropy ratio of 0.63 reflects this Franco-English skew rather than a balanced multilingual mix.

Treatment: One-hot encode, optionally collapsing rare codes (it, pl, de, es) into an 'other' bucket.

anthropic:claude-opus-4-7 · confidence high
Out[173]:

saturn.columns["categories_lc"].stats

stat value
n 50
nulls 0 (0.0%)
unique 6
top_value fr
top_rate 0.5
cardinality 6
entropy 1.628
entropy_ratio 0.6297
Fig 46.
Top values for categories_lc.
Top values for categories_lc (6 unique shown, of 6 total).
value count share
fr 25 50.0%
en 19 38.0%
es 2 4.0%
de 2 4.0%
it 1 2.0%
pl 1 2.0%

checkers unknown other

The column 'checkers' was skipped by the profiler, so its data type and value distribution are unknown. Only the row count (50) and null rate (0.0) are reported; n_unique and all other statistics are missing. Without further inspection there is no basis to infer what this column represents.

Treatment: Re-profile or manually inspect the column before any downstream use.

anthropic:claude-opus-4-7 · confidence low
Out[176]:

saturn.columns["checkers"].stats

stat value
n 50
nulls 0 (0.0%)
unique n/a
alert: skipped · no profiler for kind=unknown

packaging_text_es categorical free_text

Spanish-language packaging description, populated for almost none of the rows. 60% of rows are null, and 19 of the 20 non-null entries are empty strings. That leaves exactly one real value, which describes a recyclable cardboard box and a recyclable plastic tray. Cardinality is 2 and the entropy ratio is 0.29, so this column carries virtually no signal in this sample.

Treatment: Drop unless a larger sample shows meaningful Spanish text coverage.

anthropic:claude-opus-4-7 · confidence high
Out[178]:

saturn.columns["packaging_text_es"].stats

stat value
n 50
nulls 30 (60.0%)
unique 2
top_value ""
top_rate 0.95
cardinality 2
entropy 0.2864
entropy_ratio 0.2864
alert: null_rate · 60.0% null
Fig 47.
Top values for packaging_text_es.
Top values for packaging_text_es (2 unique shown, of 2 total).
value count share
"" 19 38.0%
1 caja de cartón para reciclar, 1 bandeja de plástico para reciclar 1 2.0%

unknown_nutrients_tags unknown other

This column is labelled `unknown_nutrients_tags` and was skipped by the profiler, so no descriptive statistics, uniqueness count, or value samples are available. The only confirmed signals are that all 50 rows are non-null and the column kind is reported as 'unknown'. Without further evidence its content and structure cannot be characterised.

Treatment: Re-profile with type inference enabled before deciding on use.

anthropic:claude-opus-4-7 · confidence low
Out[181]:

saturn.columns["unknown_nutrients_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique n/a
alert: skipped · no profiler for kind=unknown

editors_tags unknown other

The column `editors_tags` was skipped by the profiler, so no type, cardinality, or value statistics are available beyond a row count of 50 and a null rate of 0.0. Without sample values or a detected kind, the content and structure are unknown — the name suggests editor-assigned tags, possibly a list or delimited string, but this is not confirmed by evidence.

Treatment: Re-profile with list/string parsing enabled to determine structure before any downstream use.

anthropic:claude-opus-4-7 · confidence low
Out[183]:

saturn.columns["editors_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique n/a
alert: skipped · no profiler for kind=unknown

nutrition_score_warning_fruits_vegetables_nuts_estimate_from_ingredients numeric metadata

This is a nutrition-score warning flag indicating whether fruit/vegetable/nut content was estimated from ingredients. Every one of the 45 non-null rows holds the value 1.0, and 10% of rows are null — so the column carries no discriminative signal in this sample, only a presence/absence distinction.

Treatment: Drop as a feature; optionally retain a binary is_null indicator if the missingness itself is meaningful.

anthropic:claude-opus-4-7 · confidence high
Out[185]:

saturn.columns["nutrition_score_warning_fruits_vegetables_nuts_estimate_from_ingredients"].stats

stat value
n 50
nulls 5 (10.0%)
unique 1
min 1
max 1
mean 1
median 1
std 0
q1 1
q3 1
iqr 0
skew 0
kurtosis 0
n_outliers 0
outlier_rate 0
zero_rate 0
alert: constant · only one distinct value
Fig 48.
Distribution of nutrition_score_warning_fruits_vegetables_nuts_estimate_from_ingredients. Vertical dash marks the median.
Histogram bins for nutrition_score_warning_fruits_vegetables_nuts_estimate_from_ingredients (median: 1.0).
bin count
0.5 – 0.6667 0
0.6667 – 0.8333 0
0.8333 – 1 0
1 – 1.167 45
1.167 – 1.333 0
1.333 – 1.5 0
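If the missingness of this constant flag turns out to matter, a 0/1 null indicator can replace the column before it is dropped. This is a minimal sketch, and `null_indicator` is a hypothetical helper. The example column mirrors only the reported shape (45 values of 1.0 and 5 nulls), not the actual row order.

```python
import math

def null_indicator(values):
    """1 where a value is missing (None or float NaN), 0 otherwise."""
    return [int(v is None or (isinstance(v, float) and math.isnan(v))) for v in values]

# Shape of this column in the sample: 45 rows equal to 1.0, 5 nulls
flag_column = [1.0] * 45 + [None] * 5
is_null = null_indicator(flag_column)
print(sum(is_null))  # 5
```

NaN is handled too, because the same column may arrive as float NaN after loading into a numeric array.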

labels_lc categorical label

This column appears to hold lowercase ISO language codes used as a label, with 6 distinct values across 50 rows and one null. English and French dominate at 22 occurrences each. That leaves es (2 rows) and de, it and pl (1 row each) with only 5 rows combined: a near-binary distribution despite the multilingual appearance. The entropy ratio of 0.61 confirms the imbalance.

Treatment: Group rare codes (es/de/it/pl) into 'other' before stratifying or one-hot encoding.

anthropic:claude-opus-4-7 · confidence high
Out[188]:

saturn.columns["labels_lc"].stats

stat value
n 50
nulls 1 (2.0%)
unique 6
top_value en
top_rate 0.449
cardinality 6
entropy 1.57
entropy_ratio 0.6072
Fig 49.
Top values for labels_lc.
Top values for labels_lc (6 unique shown, of 6 total).
value count share
en 22 44.0%
fr 22 44.0%
es 2 4.0%
de 1 2.0%
it 1 2.0%
pl 1 2.0%

nutriscore_data unknown other

The column 'nutriscore_data' was skipped by the profiler, so its kind, uniqueness, and value distribution are unknown. The only confirmed signals are 50 rows with a 0.0 null rate. Without further stats, the contents (likely a nested Nutri-Score payload given the name) cannot be characterised.

Treatment: Re-profile with nested/struct support enabled before deciding on use.

anthropic:claude-opus-4-7 · confidence low
Out[191]:

saturn.columns["nutriscore_data"].stats

stat value
n 50
nulls 0 (0.0%)
unique n/a
alert: skipped · no profiler for kind=unknown
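If `nutriscore_data` turns out to be a nested object (the name suggests this, but it is unconfirmed), flattening it into dotted keys is one way to make it profileable. Below is a minimal stdlib sketch. The payload keys in the example are invented for illustration and are not the real Open Food Facts schema. For whole tables, `pandas.json_normalize` offers similar flattening.

```python
def flatten(obj, parent_key="", sep="."):
    """Flatten nested dicts into one level with `sep`-joined keys.

    Lists and scalars are kept as leaf values; an empty nested dict contributes no keys.
    """
    items = {}
    for key, value in obj.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict):
            items.update(flatten(value, full_key, sep))
        else:
            items[full_key] = value
    return items

# Invented payload for illustration only; not the real nutriscore_data schema.
payload = {"grade": "e", "components": {"negative": {"energy": 10}, "positive": {}}, "flags": [1, 2]}
print(flatten(payload))
```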

other_nutritional_substances_tags unknown other

This column is flagged as skipped by the profiler, so no statistics beyond row count (50) and a null rate of 0.0 were computed. The name suggests it holds tag-style annotations for additional nutritional substances, likely a delimited or list-valued field that the dissector could not type. Without unique counts or value samples, its actual content and cardinality remain unverified.

Treatment: Manually inspect raw values and re-profile as a multi-label tag list before deciding to encode or drop.

anthropic:claude-opus-4-7 · confidence low
Out[193]:

saturn.columns["other_nutritional_substances_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique n/a
alert: skipped · no profiler for kind=unknown

product_name_nb categorical metadata

Norwegian product name field (suffix _nb suggests Bokmål locale) that is almost entirely empty: 96% null across 50 rows. Only 2 non-null observations remain: an empty string and '99% mørk sjokolade' ("99% dark chocolate"). With just two distinct values and effectively no signal, this column cannot support analysis as-is.

Treatment: Drop unless joined to a richer localized catalog; null rate is too high to model.

anthropic:claude-opus-4-7 · confidence high
Out[195]:

saturn.columns["product_name_nb"].stats

stat value
n 50
nulls 48 (96.0%)
unique 2
top_value ""
top_rate 0.5
cardinality 2
entropy 1
entropy_ratio 1
alert: long_tail · 2 singleton categories
alert: null_rate · 96.0% null
Fig 50.
Top values for product_name_nb.
Top values for product_name_nb (2 unique shown, of 2 total).
value count share
"" 1 2.0%
99% mørk sjokolade 1 2.0%

nutrition_data_prepared_per categorical metadata

This column records the basis on which nutrition data is reported, and every one of the 50 rows carries the single value "100g". With cardinality of 1, entropy of 0, and a top_rate of 1.0, the field provides no discriminating information whatsoever.

Treatment: Drop; constant column carries no signal.

anthropic:claude-opus-4-7 · confidence high
Out[198]:

saturn.columns["nutrition_data_prepared_per"].stats

stat value
n 50
nulls 0 (0.0%)
unique 1
top_value 100g
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: imbalance · top value is 100.0% of rows
Fig 51.
Top values for nutrition_data_prepared_per.
Top values for nutrition_data_prepared_per (1 unique shown, of 1 total).
value count share
100g 50 100.0%
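Constant columns such as `nutrition_data_prepared_per` and `product_type` can be found mechanically rather than one by one. This is a minimal sketch over a list of row dicts, which is an assumption about how the sample is loaded. With `ignore_nulls=True`, a column holding one value plus nulls also counts as constant, which matches how saturn flags the `nutrition_score_warning_*` columns. The toy rows are hypothetical.

```python
import json

def constant_columns(rows, ignore_nulls=True):
    """Return sorted names of columns with at most one distinct value across `rows`.

    Values are compared via their JSON serialisation so lists and dicts work too.
    """
    columns = {key for row in rows for key in row}
    constant = []
    for col in sorted(columns):
        values = [row.get(col) for row in rows]
        if ignore_nulls:
            values = [v for v in values if v is not None]
        distinct = {json.dumps(v, sort_keys=True, default=str) for v in values}
        if len(distinct) <= 1:
            constant.append(col)
    return constant

rows = [
    {"product_type": "food", "nutrition_data_prepared_per": "100g", "rev": 19},
    {"product_type": "food", "nutrition_data_prepared_per": "100g", "rev": 674},
    {"product_type": "food", "nutrition_data_prepared_per": "100g", "rev": None},
]
print(constant_columns(rows))
```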

product_quantity categorical feature

Product quantities that look numeric but are profiled as categorical, with 27 distinct values across 50 rows. The mode '100' covers 23.4% of non-null rows. The entropy ratio of 0.90 reflects a long tail in which most other values appear only once or twice. Note the 6% nulls and the two rows with a quantity of '0', which likely indicate missing or placeholder quantities rather than real package sizes.

Treatment: Cast to numeric and treat as a quantitative feature; investigate zeros and nulls before modelling.

anthropic:claude-opus-4-7 · confidence high
Out[201]:

saturn.columns["product_quantity"].stats

stat value
n 50
nulls 3 (6.0%)
unique 27
top_value 100
top_rate 0.234
cardinality 27
entropy 4.287
entropy_ratio 0.9017
alert: long_tail · 18 singleton categories
Fig 52.
Top values for product_quantity.
Top values for product_quantity (20 unique shown, of 27 total).
value count share
100 11 22.0%
230 3 6.0%
42 3 6.0%
125 2 4.0%
500 2 4.0%
150 2 4.0%
90 2 4.0%
0 2 4.0%
200 2 4.0%
300 1 2.0%
22 1 2.0%
304 1 2.0%
275 1 2.0%
225 1 2.0%
85 1 2.0%
36 1 2.0%
160 1 2.0%
20 1 2.0%
750 1 2.0%
175 1 2.0%
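Casting `product_quantity` to numbers and flagging the zeros might look like the sketch below. It assumes the values are plain numeric strings such as '100'; units are not recorded in this profile. Non-numeric or blank entries become None. The helper names are hypothetical.

```python
def parse_quantity(value):
    """Parse a quantity such as '100' or '230' to float; None for nulls, blanks and non-numeric text."""
    if value is None:
        return None
    try:
        return float(str(value).strip())
    except ValueError:
        return None

def quantity_features(values):
    """Pair each parsed quantity with a 0/1 flag marking zero quantities as suspicious."""
    parsed = [parse_quantity(v) for v in values]
    return [(q, int(q == 0)) for q in parsed]

print(quantity_features(["100", "0", None, "abc"]))
```

Keeping the zero flag separate, instead of nulling zeros outright, leaves the placeholder interpretation open until the raw records are checked.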

product_type categorical metadata

This is a categorical column recording product type, but every one of the 50 rows holds the same value, "food". Cardinality is 1 and entropy is 0, so the column carries no information for modelling or segmentation.

Treatment: Drop; constant column with zero entropy.

anthropic:claude-opus-4-7 · confidence high
Out[204]:

saturn.columns["product_type"].stats

stat value
n 50
nulls 0 (0.0%)
unique 1
top_value food
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: imbalance · top value is 100.0% of rows
Fig 53.
Top values for product_type.
Top values for product_type (1 unique shown, of 1 total).
value count share
food 50 100.0%

checkers_tags unknown other

The column `checkers_tags` was skipped by the profiler, so its kind is unknown and no statistics (uniqueness, value distribution, type) were computed. Only the row count (50) and null rate (0.0) are available; everything else is missing. The name suggests it may hold tag-like values associated with a checkers process, but this cannot be confirmed from the evidence.

Treatment: Re-run profiling or manually inspect a sample before deciding how to use this column.

anthropic:claude-opus-4-7 · confidence low
Out[207]:

saturn.columns["checkers_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique n/a
alert: skipped · no profiler for kind=unknown

nucleotides_tags unknown other

The column 'nucleotides_tags' was skipped by the profiler, so no type, uniqueness, or distribution statistics are available beyond a row count of 50 and a null rate of 0.0. The name suggests it holds tag-style annotations related to nucleotides, likely a list or delimited string, but this cannot be confirmed from the evidence. No further signal is present to characterise its values.

Treatment: Re-profile with list/string parsing enabled before deciding on use.

anthropic:claude-opus-4-7 · confidence low
Out[209]:

saturn.columns["nucleotides_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique n/a
alert: skipped · no profiler for kind=unknown

languages_tags unknown metadata

This column is named languages_tags, suggesting it holds language metadata (likely tag strings such as locale codes) for each record. Saturn skipped detailed profiling, so no cardinality, distribution, or value samples are available beyond a row count of 50 and a null rate of 0.0. Without uniqueness or value stats, no surprises can be flagged.

Treatment: Re-profile or inspect raw values to determine structure before deciding whether to split tags and one-hot encode.

anthropic:claude-opus-4-7 · confidence low
Out[211]:

saturn.columns["languages_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique n/a
alert: skipped · no profiler for kind=unknown

traces_lc categorical feature

This is a low-cardinality categorical column holding lowercase language codes (fr, en, es, de, it, pl), almost certainly a detected or declared language tag. The distribution is heavily concentrated on French (23/50) and English (20/50), with the top value covering 47.9% of non-null rows and entropy ratio of 0.61. Four percent of rows are null and three languages appear only once, so any per-language analysis will be unstable beyond fr/en.

Treatment: Keep fr/en as-is and bucket de/es/it/pl into an 'other' category before encoding.

anthropic:claude-opus-4-7 · confidence high
Out[213]:

saturn.columns["traces_lc"].stats

stat value
n 50
nulls 2 (4.0%)
unique 6
top_value fr
top_rate 0.4792
cardinality 6
entropy 1.575
entropy_ratio 0.6093
Fig 54.
Top values for traces_lc.
Top values for traces_lc (6 unique shown, of 6 total).
value count share
fr 23 46.0%
en 20 40.0%
es 2 4.0%
de 1 2.0%
it 1 2.0%
pl 1 2.0%
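The fr/en/other bucketing suggested for `traces_lc` can be sketched as a fixed keep-list followed by one-hot encoding. The same idea applies to `categories_lc` and `labels_lc`. Helper names and the output column prefix are hypothetical.

```python
KEEP = ("fr", "en")  # codes kept as their own category; everything else becomes "other"

def bucket_language(code, keep=KEEP):
    """Map a language code to itself if it is in `keep`, else 'other'; None stays None."""
    if code is None:
        return None
    return code if code in keep else "other"

def one_hot(values, categories, prefix="traces_lc_"):
    """One dict of 0/1 indicators per value; None produces all zeros."""
    return [{prefix + c: int(v == c) for c in categories} for v in values]

codes = ["fr", "en", "de", None, "pl"]
bucketed = [bucket_language(c) for c in codes]
print(bucketed)  # ['fr', 'en', 'other', None, 'other']
encoded = one_hot(bucketed, categories=["fr", "en", "other"])
print(encoded[2])  # {'traces_lc_fr': 0, 'traces_lc_en': 0, 'traces_lc_other': 1}
```

A fixed keep-list, rather than a count threshold, keeps the encoded columns stable if a larger sample adds new rare codes.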

categories_hierarchy unknown other

The column `categories_hierarchy` was skipped by the profiler, so no type, uniqueness, or distribution stats are available. The name suggests a nested or path-like categorical structure (e.g., taxonomy levels), but this cannot be confirmed from the evidence. Only the row count (50) and null rate (0.0) are known.

Treatment: Re-profile after parsing the hierarchy (e.g., split into level columns) before deciding on encoding.

anthropic:claude-opus-4-7 · confidence low
Out[216]:

saturn.columns["categories_hierarchy"].stats

stat value
n 50
nulls 0 (0.0%)
unique n/a
alert: skipped · no profiler for kind=unknown
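Until the real structure of `categories_hierarchy` is confirmed, a defensive parser can accept a list, a JSON array string, or a delimited string, and then spread the tags into level columns. This sketch rests on two assumptions: tags are ordered from general to specific, and plain strings use a comma as the delimiter. The tag values in the examples are made up.

```python
import json

def as_tag_list(value, sep=","):
    """Normalise a tag field to a list of stripped, non-empty strings.

    None -> []; a list is used as-is; a string starting with '[' is tried as a JSON array;
    any other string is split on `sep`.
    """
    if value is None:
        return []
    if isinstance(value, list):
        return [str(tag).strip() for tag in value if str(tag).strip()]
    text = str(value).strip()
    if text.startswith("["):
        try:
            parsed = json.loads(text)
        except json.JSONDecodeError:
            parsed = None
        if isinstance(parsed, list):
            return [str(tag).strip() for tag in parsed if str(tag).strip()]
    return [tag.strip() for tag in text.split(sep) if tag.strip()]

def level_columns(tags, depth=3):
    """Spread an ordered tag list into level_1..level_<depth> (None where absent)."""
    return {f"level_{i + 1}": (tags[i] if i < len(tags) else None) for i in range(depth)}

print(as_tag_list('["en:snacks", "en:sweet-snacks"]'))
print(level_columns(as_tag_list("en:snacks, en:sweet-snacks")))
```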

image_front_small_url categorical metadata

URLs pointing to small front-of-pack product images on the Open Food Facts CDN, one per row. Every one of 50 values is unique (entropy_ratio 1.0, top_rate 0.02) and there are no nulls, so this acts as a per-product asset link rather than a feature. URLs mix `front_fr` and `front_en` suffixes, hinting at a French/English language mix in the source catalogue.

Treatment: Keep as a reference link; drop from modelling or fetch the images separately for vision features.

anthropic:claude-opus-4-7 · confidence high
Out[218]:

saturn.columns["image_front_small_url"].stats

stat value
n 50
nulls 0 (0.0%)
unique 50
top_value https://images.openfoodfacts.org/images/products/611/124/210/0992/front_fr.172.200.jpg
top_rate 0.02
cardinality 50
entropy 5.644
entropy_ratio 1
alert: long_tail · 50 singleton categories
Fig 55.
Top values for image_front_small_url.
Top values for image_front_small_url (20 unique shown, of 50 total).
value count share
https://images.openfoodfacts.org/images/products/611/124/210/0992/front_fr.172.200.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/762/221/044/9283/front_en.605.200.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/304/692/002/9759/front_en.492.200.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/611/103/100/5064/front_fr.56.200.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/317/568/001/1480/front_en.221.200.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/000/002/099/5553/front_en.314.200.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/326/884/000/1008/front_fr.422.200.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/336/260/001/1044/front_fr.50.200.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/842/519/771/2024/front_en.60.200.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/762/221/057/8464/front_en.29.200.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/611/125/934/3108/front_fr.25.200.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/336/260/001/1228/front_fr.38.200.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/800/050/031/0427/front_fr.488.200.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/730/040/048/1595/front_fr.242.200.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/304/692/002/2651/front_en.159.200.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/506/004/264/1000/front_en.179.200.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/762/221/058/4724/front_en.95.200.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/304/692/002/2606/front_en.102.200.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/322/982/010/0234/front_fr.246.200.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/000/002/002/2464/front_en.301.200.jpg 1 2.0%

entry_dates_tags unknown other

This column was skipped by the profiler, so no statistics beyond row count and null rate are available. The name 'entry_dates_tags' suggests a composite field combining dates and tags, likely a nested or list-like structure that didn't fit a scalar type. With 50 rows and 0% nulls, every record has some value, but its shape is unknown from this evidence.

Treatment: Inspect raw values and parse into separate date and tag columns before use.

anthropic:claude-opus-4-7 · confidence low
Out[221]:

saturn.columns["entry_dates_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique n/a
alert: skipped · no profiler for kind=unknown
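`entry_dates_tags` may turn out to be a list mixing full dates with coarser tags. That is an assumption, since the profiler did not parse the column. If so, the full `YYYY-MM-DD` entries can be separated from the rest. This is a minimal sketch with a hypothetical helper and invented example tags. Strings that look like dates but are invalid are kept as ordinary tags.

```python
import re
from datetime import date

_FULL_DATE = re.compile(r"^(\d{4})-(\d{2})-(\d{2})$")

def split_date_tags(tags):
    """Return (earliest valid YYYY-MM-DD date or None, list of remaining tags)."""
    dates, others = [], []
    for tag in tags or []:
        match = _FULL_DATE.match(tag)
        if match:
            try:
                dates.append(date(*(int(part) for part in match.groups())))
                continue
            except ValueError:  # e.g. '2019-13-40' matches the pattern but is not a real date
                pass
        others.append(tag)
    return (min(dates) if dates else None), others

print(split_date_tags(["2019-09-03", "2019-09", "2019"]))
```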

ecoscore_tags unknown other

The column 'ecoscore_tags' was skipped by the profiler, so no statistics, uniqueness, or value samples are available beyond a row count of 50 and a null rate of 0.0. Based on the name alone, it likely holds Open Food Facts-style ecoscore category tags (e.g., 'en:b'), but this cannot be confirmed from the evidence. The 'skipped' alert is the key signal here and warrants a re-profile with appropriate parsing.

Treatment: Re-profile with list/tag-aware parsing before deciding on encoding or drop.

anthropic:claude-opus-4-7 · confidence low
Out[223]:

saturn.columns["ecoscore_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique n/a
alert: skipped · no profiler for kind=unknown

nutrition_score_warning_fruits_vegetables_legumes_estimate_from_ingredients numeric metadata

This appears to be a binary warning flag indicating that the fruits/vegetables/legumes share in a Nutri-Score calculation was estimated from ingredients. Every non-null value is 1.0 (n_unique=1, std=0), and 8% of rows are null, so the column carries no discriminative signal in this sample.

Treatment: Drop; constant value provides no information.

anthropic:claude-opus-4-7 · confidence high
Out[225]:

saturn.columns["nutrition_score_warning_fruits_vegetables_legumes_estimate_from_ingredients"].stats

stat value
n 50
nulls 4 (8.0%)
unique 1
min 1
max 1
mean 1
median 1
std 0
q1 1
q3 1
iqr 0
skew 0
kurtosis 0
n_outliers 0
outlier_rate 0
zero_rate 0
alert: constant · only one distinct value
Fig 56.
Distribution of nutrition_score_warning_fruits_vegetables_legumes_estimate_from_ingredients. Vertical dash marks the median.
Histogram bins for nutrition_score_warning_fruits_vegetables_legumes_estimate_from_ingredients (median: 1.0).
bin count
0.5 – 0.6667 0
0.6667 – 0.8333 0
0.8333 – 1 0
1 – 1.167 46
1.167 – 1.333 0
1.333 – 1.5 0

ingredients_without_ciqual_codes_n numeric feature

Counts the number of ingredients in a record that lack a CIQUAL code, so it's a data-quality feature describing how complete the ingredient mapping is. The distribution is right-skewed (skew 1.21) with a median of 3.5 but a max of 22 and one outlier; 18% of rows are already fully mapped (zero_rate 0.18). Only 15 unique values across 50 rows, so it behaves like a small ordinal count.

Treatment: Treat as a count; consider log1p or a binary 'fully mapped' flag before modelling.

anthropic:claude-opus-4-7 · confidence high
Out[228]:

saturn.columns["ingredients_without_ciqual_codes_n"].stats

stat value
n 50
nulls 0 (0.0%)
unique 15
min 0
max 22
mean 4.98
median 3.5
std 4.825
q1 1
q3 7.75
iqr 6.75
skew 1.208
kurtosis 1.491
n_outliers 1
outlier_rate 0.02
zero_rate 0.18
Fig 57.
Distribution of ingredients_without_ciqual_codes_n. Vertical dash marks the median.
Histogram bins for ingredients_without_ciqual_codes_n (median: 3.5).
bin count
0 – 3.143 25
3.143 – 6.286 9
6.286 – 9.429 8
9.429 – 12.57 4
12.57 – 15.71 3
15.71 – 18.86 0
18.86 – 22 1
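The two options in the treatment for `ingredients_without_ciqual_codes_n` can be combined. A `log1p` transform compresses the tail (max 22 against a median of 3.5), while a separate flag keeps the 18% of fully mapped rows distinct. This is a minimal sketch, and the feature names are hypothetical.

```python
import math

def ciqual_features(count):
    """log1p-compressed count of unmapped ingredients plus a 0/1 'fully mapped' flag."""
    return {"log1p_without_ciqual": math.log1p(count), "fully_mapped": int(count == 0)}

for count in (0, 1, 22):  # min, q1 and max reported above
    print(count, ciqual_features(count))
```

`log1p` is used instead of `log` because the zeros are meaningful here and `log(0)` is undefined.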

rev numeric feature

Most likely a revision counter rather than revenue: in Open Food Facts data, `rev` typically records how many times the product record has been edited. That reading fits integer values spanning 19 to 674 (mean 230, median 233.5). The distribution is moderately right-skewed (0.71) with a wide IQR of 237.75 and only one outlier (2%), so the spread is large but not pathological. All 50 rows are populated, with no zeros and 46 unique values.

Treatment: Consider a log or sqrt transform before regression to tame the right skew.

anthropic:claude-opus-4-7 · confidence high
Out[231]:

saturn.columns["rev"].stats

stat value
n 50
nulls 0 (0.0%)
unique 46
min 19
max 674
mean 230
median 233.5
std 166.6
q1 72.75
q3 310.5
iqr 237.8
skew 0.7092
kurtosis -0.02278
n_outliers 1
outlier_rate 0.02
zero_rate 0
Fig 58.
Distribution of rev. Vertical dash marks the median.
Histogram bins for rev (median: 233.5).
bin count
19 – 112.6 15
112.6 – 206.1 9
206.1 – 299.7 12
299.7 – 393.3 6
393.3 – 486.9 3
486.9 – 580.4 3
580.4 – 674 2

ingredients_non_nutritive_sweeteners_n numeric feature

This column appears to be a count of non-nutritive (artificial) sweeteners listed in a product's ingredients. Across all 50 rows it is exactly 0, with zero_rate of 1.0 and no nulls, so it carries no information in this sample.

Treatment: Drop; constant zero provides no signal.

anthropic:claude-opus-4-7 · confidence high
Out[234]:

saturn.columns["ingredients_non_nutritive_sweeteners_n"].stats

stat value
n 50
nulls 0 (0.0%)
unique 1
min 0
max 0
mean 0
median 0
std 0
q1 0
q3 0
iqr 0
skew 0
kurtosis 0
n_outliers 0
outlier_rate 0
zero_rate 1
alert: constant · only one distinct value
Fig 59.
Distribution of ingredients_non_nutritive_sweeteners_n. Vertical dash marks the median.
Histogram bins for ingredients_non_nutritive_sweeteners_n (median: 0.0).
bin count
-0.5 – -0.3571 0
-0.3571 – -0.2143 0
-0.2143 – -0.07143 0
-0.07143 – 0.07143 50
0.07143 – 0.2143 0
0.2143 – 0.3571 0
0.3571 – 0.5 0

ingredients_without_ecobalyse_ids_n numeric feature

This is a count of ingredients on a product that lack Ecobalyse identifiers, ranging from 0 to 29 with a median of 6.5 and mean 8.16. The distribution is right-skewed (skew 1.28) with one outlier at the high end, suggesting most products have a handful of unmapped ingredients while a few have many. Only 2% are zero, meaning nearly every row has at least one ingredient missing an Ecobalyse ID — a notable data-coverage gap.

Treatment: Use as-is or log-transform if feeding into a regression; treat as a coverage-quality signal.

anthropic:claude-opus-4-7 · confidence high
Out[237]:

saturn.columns["ingredients_without_ecobalyse_ids_n"].stats

stat value
n 50
nulls 0 (0.0%)
unique 20
min 0
max 29
mean 8.16
median 6.5
std 5.898
q1 4
q3 11
iqr 7
skew 1.28
kurtosis 1.743
n_outliers 1
outlier_rate 0.02
zero_rate 0.02
Fig 60.
Distribution of ingredients_without_ecobalyse_ids_n. Vertical dash marks the median.
Histogram bins for ingredients_without_ecobalyse_ids_n (median: 6.5).
bin count
0 – 4.143 15
4.143 – 8.286 16
8.286 – 12.43 8
12.43 – 16.57 6
16.57 – 20.71 3
20.71 – 24.86 1
24.86 – 29 1

environment_impact_level_tags unknown other

This column was skipped by the profiler, so no type, uniqueness, or distribution stats are available beyond a row count of 50 and a null rate of 0.0. The name suggests it holds tags describing environmental impact levels, likely a list-valued or multi-label field that the profiler could not classify. Without parsed values there is no way to confirm cardinality, label vocabulary, or whether tags are single- or multi-valued.

Treatment: Re-profile after parsing the tag structure (e.g., split lists) before deciding on encoding.

anthropic:claude-opus-4-7 · confidence low
Out[240]:

saturn.columns["environment_impact_level_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique n/a
alert: skipped · no profiler for kind=unknown

last_image_dates_tags unknown other

This column is named `last_image_dates_tags`, suggesting it holds image-related dates and tags, but saturn skipped profiling so type and content cannot be confirmed. The only evidence available is 50 rows with no nulls; uniqueness, distribution, and value samples are all missing.

Treatment: Inspect raw values manually to determine structure before deciding on parsing or modelling.

anthropic:claude-opus-4-7 · confidence low
Out[242]:

saturn.columns["last_image_dates_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique n/a
alert: skipped · no profiler for kind=unknown

labels_hierarchy unknown other

This column was skipped by the profiler, so its contents are uncharacterised beyond a row count of 50 and a null rate of 0.0. The name suggests a nested or structured label taxonomy, which likely tripped the profiler's type detection. No uniqueness, value, or distribution statistics are available to confirm.

Treatment: Inspect raw values manually and parse the hierarchy (e.g., split on delimiter or expand JSON) before profiling again.

anthropic:claude-opus-4-7 · confidence low
Out[244]:

saturn.columns["labels_hierarchy"].stats

stat value
n 50
nulls 0 (0.0%)
unique n/a
alert: skipped · no profiler for kind=unknown

product_name_en categorical free_text

Free-text English product names with 34 unique values across 50 rows and a high entropy ratio (0.91), which indicates heavy diversity. The null rate understates how much is missing. Besides the 14% nulls (7 rows), the empty string is the top value with 10 occurrences (23.3% of the 43 non-null rows), so 17 of 50 rows (34%) carry no usable name. Values mix languages (e.g., German 'Edelbitter-Schokolade', dark chocolate; French 'Chocolat noir - 85% cacao'; 'tonik') and include junk such as 'Hhhhh'. The long_tail alert reflects that all 33 non-empty values are singletons.

Treatment: Normalise empties to null, language-detect, then tokenize/embed before modelling.

anthropic:claude-opus-4-7 · confidence high
Out[246]:

saturn.columns["product_name_en"].stats

stat value
n 50
nulls 7 (14.0%)
unique 34
top_value (empty string)
top_rate 0.2326
cardinality 34
entropy 4.654
entropy_ratio 0.9147
alert: long_tail (33 singleton categories)
Fig 61. Top values for product_name_en.
Top values for product_name_en (20 unique shown, of 34 total).
value count share
(empty string) 10 20.0%
Perly 1 2.0%
Prince Gout Chocolat 1 2.0%
Edelbitter-Schokolade 1 2.0%
tonik 1 2.0%
Gerblé - Sesame Cookie, 230g (8.2oz) 1 2.0%
Chocolat noir - 85% cacao 1 2.0%
Hhhhh 1 2.0%
Organic 70% Dark Chocolate Bar 1 2.0%
biscuits 1 2.0%
AUTHENTIQUE 1 2.0%
Tyrell's Lightly Sea Salted 1 2.0%
Intense dark chocolate 1 2.0%
Excellence 85% Cacao Chocolat Noir Puissant Lindt % Lindt 1 2.0%
Filled - Dark Chocolate 1 2.0%
Extra dark 74% Cocoa 1 2.0%
Fine Rye Crispbread - Fibre 1 2.0%
Tuc Original 1 2.0%
Intense Dark 70% Cocoa 1 2.0%
Gerblé - Apple Hazelnut Cookie, 230g (8.2oz) 1 2.0%

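Below is a plain-Python sketch of the "normalise empties to null" step. It reconstructs only the reported counts for this column (7 nulls, 10 empty strings, 33 names, with placeholder strings standing in for the real names) and shows the missing rate moving from 14% to 34%. `normalise_empty` and the decision to treat whitespace-only strings as empty are assumptions.

```python
def normalise_empty(values):
    """Map None, '' and whitespace-only strings to None; strip the rest."""
    out = []
    for v in values:
        if v is None or (isinstance(v, str) and not v.strip()):
            out.append(None)
        else:
            out.append(v.strip() if isinstance(v, str) else v)
    return out

def missing_rate(values):
    """Share of entries that are None."""
    return sum(v is None for v in values) / len(values)

# Reported counts: 7 nulls, 10 empty strings, 33 real names (placeholders here).
col = [None] * 7 + [""] * 10 + [f"name_{i}" for i in range(33)]
print(missing_rate(col))                   # 0.14
print(missing_rate(normalise_empty(col)))  # 0.34
```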
nutrition_score_warning_fruits_vegetables_legumes_estimate_from_ingredients_value numeric feature

This appears to be a Nutri-Score warning value estimating fruit/vegetable/legume content from ingredients. Zeros dominate the distribution (zero_rate 0.89, median and IQR both 0), and a handful of larger values push the max to 50, producing severe skew (5.93) and kurtosis (35.2). Because the IQR is 0, every non-zero value lies beyond the 1.5 IQR fence. The five outliers (10.9% of the 46 non-null rows) are therefore exactly the non-zero values. They pull the mean up to 1.65 and inflate the std to 7.55. The remaining 8% of rows (4) are null.

Treatment: Binarize (zero vs non-zero) or winsorize before modelling given the heavy zero mass and extreme skew.

anthropic:claude-opus-4-7 · confidence high
Out[249]:

saturn.columns["nutrition_score_warning_fruits_vegetables_legumes_estimate_from_ingredients_value"].stats

stat value
n 50
nulls 4 (8.0%)
unique 6
min 0
max 50
mean 1.652
median 0
std 7.551
q1 0
q3 0
iqr 0
skew 5.932
kurtosis 35.23
n_outliers 5
outlier_rate 0.1087
zero_rate 0.8913
alert: high_skew (skew=+5.93)
alert: outliers (10.9% of rows beyond 1.5 IQR)
Fig 62. Distribution of nutrition_score_warning_fruits_vegetables_legumes_estimate_from_ingredients_value. Vertical dash marks the median.
Histogram bins for nutrition_score_warning_fruits_vegetables_legumes_estimate_from_ingredients_value (median: 0.0).
bin count
0 – 8.333 44
8.333 – 16.67 1
16.67 – 25 0
25 – 33.33 0
33.33 – 41.67 0
41.67 – 50 1

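Both suggested treatments take only a few lines of stdlib Python. The sketch below keeps nulls as `None` and caps values at the 95th percentile. That cut-off is an illustrative choice, not something the profile prescribes, and the toy values are made up. `statistics.quantiles` needs at least two data points on older Python versions.

```python
import statistics

def binarize(values):
    """1 if the value is non-zero, 0 if zero, None if missing."""
    return [None if v is None else int(v != 0) for v in values]

def winsorize_upper(values, pct=95):
    """Cap non-null values at the given percentile (inclusive method)."""
    present = [v for v in values if v is not None]
    # n=100 returns the 1st..99th percentile cut points; pick the pct-th one.
    cap = statistics.quantiles(present, n=100, method="inclusive")[pct - 1]
    return [None if v is None else min(v, cap) for v in values]

toy = [0] * 9 + [50, None]  # made-up: mostly zeros, one extreme value, one null
print(binarize(toy))  # [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, None]
print(max(v for v in winsorize_upper(toy) if v is not None))  # 27.5
```

With roughly 89% zeros, the binary flag probably carries most of the usable signal. Winsorising mainly stops the single value near 50 from dominating a linear model.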
traces categorical feature

This column holds comma-separated allergen trace tags ('may contain' declarations) with an `en:` prefix (e.g. `en:milk,en:nuts`), so each cell is a multi-label set rather than a single category. Across 50 rows there are 23 distinct combinations (entropy ratio 0.87). The most common value is the empty string at 22% (11 rows), which encodes missingness as an empty value rather than a true null (null_rate 0.0). The long_tail alert reflects 16 combinations that appear only once. One cell also carries untranslated French names under the `en:` prefix (e.g. `en:Lait` = milk, `en:Noisettes` = hazelnuts, `en:Blé` = wheat). These will not match English tags such as `en:milk` without a mapping step.

Treatment: Split on commas and multi-hot encode the individual `en:` tags; treat empty string as missing.

anthropic:claude-opus-4-7 · confidence high
Out[252]:

saturn.columns["traces"].stats

stat value
n 50
nulls 0 (0.0%)
unique 23
top_value (empty string)
top_rate 0.22
cardinality 23
entropy 3.922
entropy_ratio 0.8671
alert: long_tail (16 singleton categories)
Fig 63. Top values for traces.
Top values for traces (20 unique shown, of 23 total).
value count share
(empty string) 11 22.0%
en:milk,en:nuts 7 14.0%
en:nuts 5 10.0%
en:milk,en:nuts,en:sesame-seeds,en:soybeans 4 8.0%
en:milk,en:soybeans 3 6.0%
en:soybeans 2 4.0%
en:lupin,en:milk,en:nuts,en:sesame-seeds,en:soybeans 2 4.0%
en:eggs 1 2.0%
en:milk,en:nuts,en:soybeans 1 2.0%
en:eggs,en:lupin,en:milk,en:mustard,en:nuts,en:soybeans 1 2.0%
en:mustard 1 2.0%
en:lupin,en:milk,en:mustard,en:sesame-seeds,en:soybeans 1 2.0%
en:milk 1 2.0%
en:eggs,en:mustard,en:nuts,en:sesame-seeds,en:soybeans 1 2.0%
en:gluten,en:Amande,en:Arachides,en:Avoine,en:Blé,en:Lait,en:Noisettes,en:Noix,en:Noix de cajou,en:Noix de macadamia,en:Noix de pécan,en:Noix du brésil,en:Orge,en:Pistaches,en:Seigle 1 2.0%
en:lupin,en:milk,en:mustard,en:soybeans 1 2.0%
en:gluten,en:milk 1 2.0%
en:gluten,en:nuts,en:peanuts,en:soybeans 1 2.0%
en:nuts,en:peanuts,en:soybeans 1 2.0%
en:gluten,en:nuts 1 2.0%

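The split-and-multi-hot treatment can be sketched in plain Python. This minimal version assumes cells are comma-separated strings as in the table above and that an empty string means "no information", so those rows stay `None` instead of becoming all zeros. It does not map the untranslated French tags such as `en:Lait`. The helper names are illustrative.

```python
def split_tags(cell):
    """Turn 'en:milk,en:nuts' into a set of tags; '' or None -> None (missing)."""
    if not cell:
        return None
    return {t.strip() for t in cell.split(",") if t.strip()}

def multi_hot(cells):
    """Return (sorted vocabulary, rows of 0/1 lists); missing rows stay None."""
    parsed = [split_tags(c) for c in cells]
    vocab = sorted(set().union(*(p for p in parsed if p)))
    rows = [None if p is None else [int(t in p) for t in vocab] for p in parsed]
    return vocab, rows

cells = ["en:milk,en:nuts", "", "en:nuts", "en:milk,en:soybeans"]
vocab, rows = multi_hot(cells)
print(vocab)  # ['en:milk', 'en:nuts', 'en:soybeans']
print(rows)   # [[1, 1, 0], None, [0, 1, 0], [1, 0, 1]]
```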
generic_name_fi categorical free_text

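Finnish-language generic product name field, populated for only 5 of 50 rows (90% null). The 5 present values are all distinct, which gives maximum entropy (2.32 bits, ratio 1.0), but that overstates their variety. 'Tumma suklaa' and 'tumma suklaa' (dark chocolate) differ only in case, and one value is an empty string. After normalisation only three distinct names remain: 'tumma suklaa', 'Hieno tumma suklaa jossa 90% kaakaota' (fine dark chocolate with 90% cocoa), and 'Keksejä' (biscuits).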

Treatment: Normalise case and treat empty strings as null; too sparse (90% missing) to use as a feature without imputation or dropping.

anthropic:claude-opus-4-7 · confidence high
Out[255]:

saturn.columns["generic_name_fi"].stats

stat value
n 50
nulls 45 (90.0%)
unique 5
top_value Hieno tumma suklaa jossa 90% kaakaota
top_rate 0.2
cardinality 5
entropy 2.322
entropy_ratio 1
alert: long_tail (5 singleton categories)
alert: null_rate (90.0% null)
Fig 64. Top values for generic_name_fi.
Top values for generic_name_fi (5 unique shown, of 5 total).
value count share
Hieno tumma suklaa jossa 90% kaakaota 1 2.0%
Tumma suklaa 1 2.0%
tumma suklaa 1 2.0%
Keksejä 1 2.0%
(empty string) 1 2.0%

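Case-folding combined with empty-to-null shows how much of the apparent uniqueness is formatting noise. The sketch uses the five values from the table above, which collapse to three distinct names. `str.casefold()` is the Unicode-aware choice for caseless matching. `normalise_name` is an illustrative helper.

```python
values = ["Hieno tumma suklaa jossa 90% kaakaota", "Tumma suklaa",
          "tumma suklaa", "Keksejä", ""]

def normalise_name(v):
    """Strip and case-fold; treat None and empty strings as missing (None)."""
    if v is None or not v.strip():
        return None
    return v.strip().casefold()

distinct = {n for n in map(normalise_name, values) if n is not None}
print(len(set(values)), "->", len(distinct))  # 5 -> 3
```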
emb_codes_orig categorical metadata

This appears to hold the original packager codes in raw form: EMB-prefixed identifiers, the French packaging-establishment codes printed on food labels. The column is sparsely populated. 34% of rows are null, and among the 33 non-null rows the empty string dominates (28 rows, top_rate 0.848). Only 5 of 50 rows therefore carry a value, spread over 4 distinct non-empty entries. One entry is not a code at all but a pair of company names (SOLENT GMBH & CO. KG,SCHWARZ BETEILIGUNGS GMBH), which suggests inconsistent source formatting.

Treatment: Normalise empty strings to null and parse/validate the EMB code pattern before use; too sparse to model directly.

anthropic:claude-opus-4-7 · confidence medium
Out[258]:

saturn.columns["emb_codes_orig"].stats

stat value
n 50
nulls 17 (34.0%)
unique 5
top_value (empty string)
top_rate 0.8485
cardinality 5
entropy 0.9048
entropy_ratio 0.3897
alert: long_tail (3 singleton categories)
alert: null_rate (34.0% null)
Fig 65. Top values for emb_codes_orig.
Top values for emb_codes_orig (5 unique shown, of 5 total).
value count share
(empty string) 28 56.0%
EMB 31250 2 4.0%
EMB 44068A 1 2.0%
SOLENT GMBH & CO. KG,SCHWARZ BETEILIGUNGS GMBH 1 2.0%
EMB 64422 1 2.0%

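The "parse/validate" step can be sketched with a regular expression. The pattern is an assumption, not an official specification. It expects `EMB` followed by a five-character commune code (digits, or `2A`/`2B` plus three digits for Corsica) and an optional letter suffix. That covers the three codes seen here, but other label formats would need their own patterns.

```python
import re

# Assumed shape: "EMB" + 5-character commune code (digits, or 2A/2B + 3 digits
# for Corsica) + optional single letter. Not an official specification.
EMB_RE = re.compile(r"^EMB\s*(\d{5}|2[AB]\d{3})([A-Z]?)$", re.IGNORECASE)

def parse_emb(value):
    """Return (commune_code, suffix) for a code matching EMB_RE, else None."""
    if not value or not value.strip():
        return None
    m = EMB_RE.match(value.strip())
    return (m.group(1).upper(), m.group(2).upper()) if m else None

samples = ["EMB 31250", "EMB 44068A", "EMB 64422",
           "SOLENT GMBH & CO. KG,SCHWARZ BETEILIGUNGS GMBH", ""]
parsed = {s: parse_emb(s) for s in samples}
# "EMB 44068A" -> ('44068', 'A'); the company-name entry and "" -> None
```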
ingredients_with_specified_percent_n numeric feature

Counts the number of ingredients whose percentage is explicitly specified on a product label. The distribution is heavily zero-inflated (zero_rate 0.58) with median 0 and mean 1.1, but a long right tail reaches 8 (skew 1.88, kurtosis 3.68), and only 7 distinct values appear across 50 rows.

Treatment: Treat as a count feature; consider a binary 'has_specified_percent' flag plus log1p transform to tame the skew.

anthropic:claude-opus-4-7 · confidence high
Out[261]:

saturn.columns["ingredients_with_specified_percent_n"].stats

stat value
n 50
nulls 0 (0.0%)
unique 7
min 0
max 8
mean 1.1
median 0
std 1.729
q1 0
q3 2
iqr 2
skew 1.878
kurtosis 3.676
n_outliers 1
outlier_rate 0.02
zero_rate 0.58
Fig 66. Distribution of ingredients_with_specified_percent_n. Vertical dash marks the median.
Histogram bins for ingredients_with_specified_percent_n (median: 0.0).
bin count
0 – 1.143 36
1.143 – 2.286 5
2.286 – 3.429 3
3.429 – 4.571 4
4.571 – 5.714 1
5.714 – 6.857 0
6.857 – 8 1

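Here is a sketch of the suggested feature pair: a presence flag plus `log1p` of the count. The feature names are illustrative. `log1p` keeps zero at zero and compresses the tail, so the maximum of 8 maps to about 2.2.

```python
import math

def count_features(n):
    """Derive (has_specified_percent flag, log1p count) from a non-negative count."""
    return int(n > 0), math.log1p(n)

for n in [0, 1, 8]:
    flag, logged = count_features(n)
    print(n, flag, round(logged, 3))
# 0 0 0.0
# 1 1 0.693
# 8 1 2.197
```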
nutrition_grades categorical label

This is a Nutri-Score-style nutrition grade for each item, with six observed levels (a–e plus 'unknown'). The distribution is heavily skewed toward the worst grade: 'e' accounts for 27 of 50 rows (top_rate 0.54), while 'a' and 'b' together appear only 6 times. One row carries the literal value 'unknown' rather than null, so null_rate reads 0.0 even though that row's grade is missing.

Treatment: Treat as ordered categorical (a < b < c < d < e).

anthropic:claude-opus-4-7 · confidence high
Out[264]:

saturn.columns["nutrition_grades"].stats

stat value
n 50
nulls 0 (0.0%)
unique 6
top_value e
top_rate 0.54
cardinality 6
entropy 1.913
entropy_ratio 0.7399
Fig 67. Top values for nutrition_grades.
Top values for nutrition_grades (6 unique shown, of 6 total).
value count share
e 27 54.0%
d 9 18.0%
c 7 14.0%
a 4 8.0%
b 2 4.0%
unknown 1 2.0%

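One way to apply the ordered encoding is to recode the literal 'unknown' as missing at the same time. This sketch assumes integer ranks suit the downstream model (0 = best grade 'a') and that cells are strings or `None`.

```python
GRADE_ORDER = ["a", "b", "c", "d", "e"]  # best to worst Nutri-Score grade
GRADE_RANK = {g: i for i, g in enumerate(GRADE_ORDER)}

def encode_grade(value):
    """Map 'a'..'e' to 0..4; 'unknown', '' and None become None (missing)."""
    if value is None:
        return None
    return GRADE_RANK.get(value.strip().lower())

print([encode_grade(g) for g in ["e", "a", "unknown", "C"]])  # [4, 0, None, 2]
```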
weighers_tags unknown other

The column `weighers_tags` was skipped by the profiler, so no type, cardinality, or value statistics are available beyond a row count of 50 and a null rate of 0.0. Without `n_unique` or any sample values it is impossible to tell whether this holds tag strings, arrays, or something else. The name suggests a multi-valued tag field associated with 'weighers', which would explain why the profiler couldn't fit it into a standard kind.

Treatment: Re-profile with array/text handling enabled to determine structure before use.

anthropic:claude-opus-4-7 · confidence low
Out[267]:

saturn.columns["weighers_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique (not computed)
alert: skipped (no profiler for kind=unknown)

categories_tags unknown other

The column `categories_tags` was skipped by the profiler, so no type, cardinality, or distribution stats are available beyond the row count (n=50) and a null rate of 0.0. The name suggests a multi-valued tag field (e.g., comma- or colon-separated category labels), but this cannot be confirmed from the evidence. No further signal is present.

Treatment: Re-profile with parsing for delimited tag lists before deciding on encoding.

anthropic:claude-opus-4-7 · confidence low
Out[269]:

saturn.columns["categories_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique (not computed)
alert: skipped (no profiler for kind=unknown)

image_url categorical identifier

This column holds Open Food Facts product image URLs, one per row. The 20 shown all point to front-of-package JPEGs with a `.400.jpg` suffix, apparently a 400-pixel rendition. Every one of the 50 values is unique (entropy_ratio 1.0, top_rate 0.02), so the column works as a per-row asset link rather than a categorical feature. URL paths mix `_fr` and `_en` language suffixes, consistent with a multilingual product catalog.

Treatment: Drop from modelling; retain as a reference link or fetch for downstream image features.

anthropic:claude-opus-4-7 · confidence high
Out[271]:

saturn.columns["image_url"].stats

stat value
n 50
nulls 0 (0.0%)
unique 50
top_value https://images.openfoodfacts.org/images/products/611/124/210/0992/front_fr.172.400.jpg
top_rate 0.02
cardinality 50
entropy 5.644
entropy_ratio 1
alert: long_tail (50 singleton categories)
Fig 68. Top values for image_url.
Top values for image_url (20 unique shown, of 50 total).
value count share
https://images.openfoodfacts.org/images/products/611/124/210/0992/front_fr.172.400.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/762/221/044/9283/front_en.605.400.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/304/692/002/9759/front_en.492.400.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/611/103/100/5064/front_fr.56.400.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/317/568/001/1480/front_en.221.400.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/000/002/099/5553/front_en.314.400.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/326/884/000/1008/front_fr.422.400.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/336/260/001/1044/front_fr.50.400.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/842/519/771/2024/front_en.60.400.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/762/221/057/8464/front_en.29.400.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/611/125/934/3108/front_fr.25.400.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/336/260/001/1228/front_fr.38.400.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/800/050/031/0427/front_fr.488.400.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/730/040/048/1595/front_fr.242.400.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/304/692/002/2651/front_en.159.400.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/506/004/264/1000/front_en.179.400.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/762/221/058/4724/front_en.95.400.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/304/692/002/2606/front_en.102.400.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/322/982/010/0234/front_fr.246.400.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/000/002/002/2464/front_en.301.400.jpg 1 2.0%

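If the URLs are kept as references, their structure can still be unpacked. This sketch assumes every URL follows the pattern seen in the table above (`/images/products/<digit groups>/front_<lang>.<rev>.<size>.jpg`). Reading the joined digit groups as the product barcode, and `<size>` as the image size, are interpretations of the path that the profile does not confirm.

```python
import re

URL_RE = re.compile(
    r"/images/products/(?P<path>[\d/]+)/front_(?P<lang>[a-z]{2})"
    r"\.(?P<rev>\d+)\.(?P<size>\d+)\.jpg$"
)

def parse_image_url(url):
    """Split an image URL into path digits, language, revision and size."""
    m = URL_RE.search(url)
    if m is None:
        return None
    return {
        "barcode": m.group("path").replace("/", ""),  # digit groups joined
        "lang": m.group("lang"),
        "rev": int(m.group("rev")),
        "size": int(m.group("size")),
    }

url = ("https://images.openfoodfacts.org/images/products/"
       "611/124/210/0992/front_fr.172.400.jpg")
print(parse_image_url(url))
# {'barcode': '6111242100992', 'lang': 'fr', 'rev': 172, 'size': 400}
```

Short codes appear zero-padded in the path (e.g. `000/002/099/5553`), so the joined string may carry leading zeros that the printed barcode lacks.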
sources unknown other

The column "sources" was skipped by the profiler, so its kind is unknown and no statistics (uniqueness, value distribution, type) were computed. Only two facts are available: 50 rows were seen and none were null. Without further inspection, nothing can be said about its content or structure.

Treatment: Re-profile or manually inspect this column before any downstream use.

anthropic:claude-opus-4-7 · confidence low
Out[274]:

saturn.columns["sources"].stats

stat value
n 50
nulls 0 (0.0%)
unique (not computed)
alert: skipped (no profiler for kind=unknown)

languages_hierarchy unknown other

The column 'languages_hierarchy' was skipped by the profiler, so no statistics are available beyond a row count of 50 and a null rate of 0. The name suggests a nested or structured representation of languages (likely a list or path-like string), but the dissector did not characterize its values. No uniqueness, length, or value-distribution signals are present to confirm.

Treatment: Re-profile with a parser that handles nested/structured values before deciding on use.

anthropic:claude-opus-4-7 · confidence low
Out[276]:

saturn.columns["languages_hierarchy"].stats

stat value
n 50
nulls 0 (0.0%)
unique (not computed)
alert: skipped (no profiler for kind=unknown)

pnns_groups_1 categorical feature

This is a PNNS food-group classification with 7 distinct categories and no nulls across 50 rows. The distribution is severely imbalanced: 'Sugary snacks' accounts for 76% of records (38 rows), and the entropy ratio is just 0.48, so one food type dominates the sample. Two rows are explicitly labeled 'unknown', and three categories (Milk and dairy products, Beverages, Fruits and vegetables) appear only once each.

Treatment: One-hot encode, but expect the 'Sugary snacks' class to dominate any downstream model.

anthropic:claude-opus-4-7 · confidence high
Out[278]:

saturn.columns["pnns_groups_1"].stats

stat value
n 50
nulls 0 (0.0%)
unique 7
top_value Sugary snacks
top_rate 0.76
cardinality 7
entropy 1.36
entropy_ratio 0.4846
Fig 69. Top values for pnns_groups_1.
Top values for pnns_groups_1 (7 unique shown, of 7 total).
value count share
Sugary snacks 38 76.0%
Salty snacks 4 8.0%
Cereals and potatoes 3 6.0%
unknown 2 4.0%
Milk and dairy products 1 2.0%
Beverages 1 2.0%
Fruits and vegetables 1 2.0%

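The entropy figures in the stats table can be reproduced from the value counts. Doing so also suggests how `entropy_ratio` is defined: Shannon entropy in bits divided by log2 of the cardinality. That definition is inferred from the numbers matching, not taken from saturn's documentation.

```python
import math

def entropy_bits(counts):
    """Shannon entropy (base 2) of a distribution given as raw counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)

# Counts for pnns_groups_1 from the table above.
counts = [38, 4, 3, 2, 1, 1, 1]
h = entropy_bits(counts)
ratio = h / math.log2(len(counts))
print(round(h, 2), round(ratio, 4))  # 1.36 0.4846
```

A ratio near 1 means the categories are close to evenly used; a ratio near 0 means one category dominates.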
countries_lc categorical feature

Lowercase two-letter codes (en, fr, es, de, it, pl) with 6 distinct values across 50 rows and a 2% null rate. They look like ISO 639-1 language codes rather than country codes, since en is not an ISO 3166 country code. The distribution is English-dominant (en: 28 rows, top_rate 0.57), followed by fr (16), leaving es (2) and de/it/pl (1 each) as near-singletons. The entropy ratio of 0.59 reflects this thin tail, which is unlikely to support per-class modelling.

Treatment: Group rare codes into an 'other' bucket and one-hot encode; impute the 2% nulls.

anthropic:claude-opus-4-7 · confidence high
Out[281]:

saturn.columns["countries_lc"].stats

stat value
n 50
nulls 1 (2.0%)
unique 6
top_value en
top_rate 0.5714
cardinality 6
entropy 1.521
entropy_ratio 0.5883
Fig 70. Top values for countries_lc.
Top values for countries_lc (6 unique shown, of 6 total).
value count share
en 28 56.0%
fr 16 32.0%
es 2 4.0%
de 1 2.0%
it 1 2.0%
pl 1 2.0%

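Here is a sketch of the impute-then-group step followed by one-hot encoding. Mode imputation and a minimum count of 5 are illustrative choices, not prescribed by the profile. The column is rebuilt from the reported counts in arbitrary order. With these counts, en and fr are kept and es/de/it/pl fall into 'other'.

```python
from collections import Counter

def bucket_rare(values, min_count=5, other="other"):
    """Impute None with the mode, then replace codes seen < min_count times."""
    counts = Counter(v for v in values if v is not None)
    mode = counts.most_common(1)[0][0]
    filled = [mode if v is None else v for v in values]
    counts = Counter(filled)
    return [v if counts[v] >= min_count else other for v in filled]

def one_hot(values):
    """Return (sorted levels, one 0/1 row per value)."""
    levels = sorted(set(values))
    return levels, [[int(v == lvl) for lvl in levels] for v in values]

# Rebuild the column from the reported counts (order is arbitrary).
col = ["en"] * 28 + ["fr"] * 16 + ["es"] * 2 + ["de", "it", "pl", None]
grouped = bucket_rare(col)
levels, matrix = one_hot(grouped)
print(Counter(grouped))  # Counter({'en': 29, 'fr': 16, 'other': 5})
print(levels)            # ['en', 'fr', 'other']
```

The same helper fits the long-tailed `creator` column.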
additives_tags unknown other

The column `additives_tags` was skipped by the profiler, so no type, cardinality, or value statistics are available beyond a row count of 50 and a null rate of 0.0. The name suggests a list-style field enumerating food additive identifiers (e.g., E-numbers), but this cannot be confirmed from the evidence. No distributional signal is present to flag.

Treatment: Re-profile with list/string parsing enabled, then explode tags for one-hot or multi-label encoding.

anthropic:claude-opus-4-7 · confidence low
Out[284]:

saturn.columns["additives_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique (not computed)
alert: skipped (no profiler for kind=unknown)

codes_tags unknown other

Column `codes_tags` was skipped by the profiler, so no type inference, uniqueness, or value statistics are available beyond a row count of 50 and a null rate of 0.0. The name suggests a tag or code list (likely a delimited or array-valued field), but this cannot be confirmed from the evidence. Without `n_unique` or any sampled values, no distributional claims can be made.

Treatment: Re-profile with array/string parsing enabled before deciding on a downstream transform.

anthropic:claude-opus-4-7 · confidence low
Out[286]:

saturn.columns["codes_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique (not computed)
alert: skipped (no profiler for kind=unknown)

countries_tags unknown feature

The column `countries_tags` was skipped by the profiler (kind=unknown), so no statistics were computed beyond a 50-row count with 0% nulls. Based solely on the name, it likely holds country tag strings (e.g., comma- or colon-delimited slugs), but uniqueness, cardinality, and value distribution are all unknown here. Treat any interpretation as provisional until the column is reparsed.

Treatment: Reparse as a delimited tag list, then explode and one-hot or multi-label encode before modelling.

anthropic:claude-opus-4-7 · confidence low
Out[288]:

saturn.columns["countries_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique (not computed)
alert: skipped (no profiler for kind=unknown)

creator categorical metadata

Usernames of the people or bots that created each record, with 13 distinct creators across 50 rows and no nulls. Two accounts dominate: 'openfoodfacts-contributors' (23 rows, 46%) and 'kiliweb' (15 rows, 30%), together covering 76% of the column. The remaining 11 creators appear once or twice each, and 10 of them are singletons, which triggers the long_tail alert.

Treatment: Collapse rare creators into an 'other' bucket before any encoding.

anthropic:claude-opus-4-7 · confidence high
Out[290]:

saturn.columns["creator"].stats

stat value
n 50
nulls 0 (0.0%)
unique 13
top_value openfoodfacts-contributors
top_rate 0.46
cardinality 13
entropy 2.351
entropy_ratio 0.6353
alert: long_tail (10 singleton categories)
Fig 71. Top values for creator.
Top values for creator (13 unique shown, of 13 total).
value count share
openfoodfacts-contributors 23 46.0%
kiliweb 15 30.0%
javichu 2 4.0%
meryemali 1 2.0%
vichenze 1 2.0%
mllep 1 2.0%
andre 1 2.0%
sqoia 1 2.0%
shaolan 1 2.0%
tacite 1 2.0%
mambl 1 2.0%
norbert45fr 1 2.0%
date-limite-app 1 2.0%

ingredients unknown free_text

This column is named 'ingredients', but saturn skipped profiling it (kind=unknown, no stats computed). Across 50 rows there are zero nulls, but uniqueness, types, and value distribution are all unknown. Based on the name alone, it is likely a list of parsed ingredients or a free-text field, which would explain why a generic profiler bailed out.

Treatment: Parse into a list and one-hot or tokenize/embed before modelling.

anthropic:claude-opus-4-7 · confidence low
Out[293]:

saturn.columns["ingredients"].stats

stat value
n 50
nulls 0 (0.0%)
unique (not computed)
alert: skipped (no profiler for kind=unknown)

product_name_nl categorical free_text

Dutch-language product names, but the column is mostly empty. 76% of rows are null, and 6 of the 12 populated rows are blank strings, leaving only 6 actual names (7 unique values counting the empty string). The surviving entries mix languages: English 'Dark absolute'; French 'Tartines craquantes multi-céréales' (crunchy multigrain crispbread); Dutch 'Volkoren cracotte' (wholegrain cracotte). The field is therefore not consistently Dutch despite its name.

Treatment: Drop or defer; too sparse and linguistically inconsistent to use without upstream cleanup.

anthropic:claude-opus-4-7 · confidence high
Out[295]:

saturn.columns["product_name_nl"].stats

stat value
n 50
nulls 38 (76.0%)
unique 7
top_value (empty string)
top_rate 0.5
cardinality 7
entropy 2.292
entropy_ratio 0.8166
alert: long_tail (6 singleton categories)
alert: null_rate (76.0% null)
Fig 72. Top values for product_name_nl.
Top values for product_name_nl (7 unique shown, of 7 total).
value count share
(empty string) 6 12.0%
Excellence 70% Cocoa Intense Dark 1 2.0%
Tartines craquantes multi-céréales 1 2.0%
Dark absolute 1 2.0%
Nuts & Fruits Mix 1 2.0%
Granola 1 2.0%
Volkoren cracotte 1 2.0%

ingredients_n_tags unknown other

The column 'ingredients_n_tags' was skipped by the profiler, so no statistics, uniqueness, or type information are available beyond a row count of 50 and a null rate of 0.0. The name suggests a count of ingredient tags, but this cannot be confirmed from the evidence. Without stats, any downstream assumption about its distribution or role is unsupported.

Treatment: Re-profile with type coercion to determine whether this is numeric before use.

anthropic:claude-opus-4-7 · confidence low
Out[298]:

saturn.columns["ingredients_n_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique (not computed)
alert: skipped (no profiler for kind=unknown)

origin_es categorical metadata

This appears to be a Spanish-language origin field, but it carries no usable signal in this sample. 60% of rows are null and the remaining 20 non-null entries are all the empty string, giving cardinality 1 and entropy 0.

Treatment: Drop; the column has no variation and a 60% null rate.

anthropic:claude-opus-4-7 · confidence high
Out[300]:

saturn.columns["origin_es"].stats

stat value
n 50
nulls 30 (60.0%)
unique 1
top_value (empty string)
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: null_rate (60.0% null)
alert: imbalance (top value is 100.0% of rows)
Fig 73. Top values for origin_es.
Top values for origin_es (1 unique shown, of 1 total).
value count share
(empty string) 20 40.0%

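Many localized columns share this pattern of nulls plus empty strings and nothing else, so it may be worth screening for them in bulk rather than one column at a time. This sketch works over a list of dict records, which is an assumed layout. `effectively_empty_columns` is a hypothetical helper, and the records are toy values. Empty lists are not treated as empty here.

```python
def effectively_empty_columns(records):
    """Return column names whose values are all None, missing, or blank strings."""
    columns = {key for row in records for key in row}
    empty = []
    for col in sorted(columns):
        values = (row.get(col) for row in records)
        if all(v is None or (isinstance(v, str) and not v.strip()) for v in values):
            empty.append(col)
    return empty

toy = [
    {"origin_es": None, "product_name_en": "Tuc Original"},
    {"origin_es": "", "product_name_en": ""},
    {"origin_es": "", "product_name_en": None},
]
print(effectively_empty_columns(toy))  # ['origin_es']
```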
product_name_pl categorical metadata

Polish-language product names, populated for only 10% of rows (null_rate 0.9), with just 3 distinct values across 50 records. The top value is the empty string (3 of the 5 non-null rows, top_rate 0.6). That leaves only two real product names, 'Czekolada gorzka 74%' (74% dark chocolate) and 'Excellence 70% Cocoa Intense Dark', each appearing once. Both the long_tail and null_rate alerts fire, and empty strings are counted as a category rather than as nulls.

Treatment: Normalise empty strings to null and treat as a sparse localisation field; drop unless Polish-market analysis is required.

anthropic:claude-opus-4-7 · confidence high
Out[303]:

saturn.columns["product_name_pl"].stats

stat value
n 50
nulls 45 (90.0%)
unique 3
top_value (empty string)
top_rate 0.6
cardinality 3
entropy 1.371
entropy_ratio 0.865
alert: long_tail (2 singleton categories)
alert: null_rate (90.0% null)
Fig 74. Top values for product_name_pl.
Top values for product_name_pl (3 unique shown, of 3 total).
value count share
(empty string) 3 6.0%
Czekolada gorzka 74% 1 2.0%
Excellence 70% Cocoa Intense Dark 1 2.0%

scores unknown other

The column 'scores' was skipped by the profiler and reports kind 'unknown', so no statistics, uniqueness, or value distribution were computed. The only confirmed signals are 50 rows and a 0.0 null rate; everything else is missing. Without type inference or sample values, the column's actual content (numeric scores, lists, structured objects) cannot be determined from this evidence.

Treatment: Re-profile with type coercion or inspect raw values before deciding on a downstream use.

anthropic:claude-opus-4-7 · confidence low
Out[306]:

saturn.columns["scores"].stats

stat value
n 50
nulls 0 (0.0%)
unique (not computed)
alert: skipped (no profiler for kind=unknown)

brands categorical feature

Brand name of each product, with 41 distinct values across 50 rows and no nulls. The distribution is nearly flat (entropy ratio 0.97): Lindt leads with just 8% (4 occurrences), and 36 brands appear only once, which triggers the long_tail alert. Part of that tail consists of spelling variants of the same brand: 'Wasa'/'wasa', "Henry's"/"Henry’s" (curly apostrophe), and "Green & Black's"/'Green and black'. One value is the Arabic word عربي ('Arabic'), which may be a placeholder rather than a brand name.

Treatment: Group rare brands into an 'other' bucket and normalize encodings/scripts before one-hot or target encoding.

anthropic:claude-opus-4-7 · confidence high
Out[308]:

saturn.columns["brands"].stats

stat value
n 50
nulls 0 (0.0%)
unique 41
top_value Lindt
top_rate 0.08
cardinality 41
entropy 5.214
entropy_ratio 0.9731
alert: long_tail (36 singleton categories)
Fig 75. Top values for brands.
Top values for brands (20 unique shown, of 41 total).
value count share
Lindt 4 8.0%
Gerblé 3 6.0%
Excelo 3 6.0%
Henry's 2 4.0%
Pringles 2 4.0%
Perly 1 2.0%
LU 1 2.0%
عربي 1 2.0%
J. D. Gross 1 2.0%
Cristaline 1 2.0%
Maruja 1 2.0%
Green & Black's 1 2.0%
Nutella 1 2.0%
wasa 1 2.0%
Tyrrell's 1 2.0%
Green and black 1 2.0%
Bjorg 1 2.0%
fin CARRÉ 1 2.0%
Wasa 1 2.0%
Henry’s 1 2.0%

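Normalising case and apostrophes merges some of the variant spellings in the table above ('Wasa'/'wasa', "Henry's"/"Henry’s"). "Green & Black's" versus 'Green and black' still needs an alias map. This sketch uses a hand-written, illustrative alias entry that would need extending after reviewing the full column.

```python
import unicodedata

# Hand-maintained alias map (illustrative; extend after reviewing the data).
ALIASES = {"green and black": "green & black's"}

def normalise_brand(name):
    """Case-fold, unify apostrophes and whitespace, then apply aliases."""
    s = unicodedata.normalize("NFKC", name).replace("\u2019", "'")
    s = " ".join(s.split()).casefold()
    return ALIASES.get(s, s)

brands = ["Wasa", "wasa", "Henry's", "Henry\u2019s", "Green & Black's", "Green and black"]
print(sorted({normalise_brand(b) for b in brands}))
# ["green & black's", "henry's", 'wasa']
```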
ingredients_text_de categorical free_text

German-language ingredient declarations, likely taken from product packaging (e.g. Kakaomasse, Zucker, Weizenmehl: cocoa mass, sugar, wheat flour). Coverage is poor. 60% of rows are null, and the most common value is the empty string (5 of 50 rows, 25% of non-nulls). The other 15 distinct strings are free text, several with allergen markup such as _SOJA_ (soy) and _WEIZENMEHL_ (wheat flour). An entropy ratio of 0.94 confirms that each populated row is nearly unique, so this behaves as free text rather than a category.

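Treatment: Treat as free text: split into ingredient tokens or embed, and do not one-hot encode. When splitting, remember that commas inside parentheses and German decimal commas (e.g. 'Speisesalz 1,4 %') are not separators.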

anthropic:claude-opus-4-7 · confidence high
Out[311]:

saturn.columns["ingredients_text_de"].stats

stat value
n 50
nulls 30 (60.0%)
unique 16
top_value (empty string)
top_rate 0.25
cardinality 16
entropy 3.741
entropy_ratio 0.9354
alert: long_tail (15 singleton categories)
alert: null_rate (60.0% null)
Fig 76. Top values for ingredients_text_de.
Top values for ingredients_text_de (16 unique shown, of 16 total).
value count share
(empty string) 5 10.0%
Kakaomasse, Kakaobutter, fettarmes Kakaopulver, Zucker, Vanille 1 2.0%
Kakaomasse, fettarmes Kakaopulver, Kakaobutter, Zucker, Emulgator: Lecithine (Soja); Vanilleextrakt. 1 2.0%
Nuss-Nugat-Creme 40 % (Zucker, Palmöl, _HASELNÜSSE_ 13 %, _MAGERMILCHPULVER_ 8.7%, fettarmer Kakao 7,4 %, Emulgator Lecithine (_SOJA_), Vanillin), _WEIZENMEHL_ (32,5 %), pflanzliche Fette (Palm, Palmkern), Rohrzucker 8,5 % (enthält _WEIZEN_), _MILCHZUCKER_, _WEIZENKLEIE_, _VOLLMILCHPULVER_, _GERSTENMALZ_ - und Maisextraktpulver, Honig, Backtriebmittel: Dinatriumdiphosphat, Natriumhydrogencarbonat, Ammoniumhydrogencarbonat; fettarmer Kakao, Salz, _WEIZENSTÄRKE_, _GERSTENMALZMEHL_, Emulgator Lecithine (_SOJA_), Vanillin 1 2.0%
Kakaomasse, Zucker, Kakaobutter, Vanille 1 2.0%
Kartoffeln, Sonnenblumenöl, Meersalz. 1 2.0%
Kakaomasse, fettarmes Kakaopulver, Kakaobutter, Rohrzucker, Vanille. Kann Schalenfrüchte, Milch, Soja, Sesamsamen und Weizen enthalten. 1 2.0%
kakaomass of*, zucker, kakaobutter, kakaopulver stark entöit, emulgator: sonnenblumenlecithine (e-322), natürliche in vanille-aroma, * rainforest alliance certified, cocoa: 74% mindestens, 1 2.0%
_WEIZENMEHL_, Palmöl, Glukosesirup, _GERSTENMALZEXTRAKT_, Backtriebmittel (Ammoniumcarbonate, Natriumcarbonate), Speisesalz 1,4 %, _EIER_, Aroma, Mehlbehandlungsmittel (_NATRIUMMETABISULFIT_). 1 2.0%
Kakaomasse, Zucker, Kakaobutter, Emulgator: Lecithine (_Soja_); Vanilleextrakt. 1 2.0%
Kartoffelpüreepulver, pflanzliche Öle (Sonnenblume, Palm, Mais) in veränderlichen Gewichtsanteilen, Weizenmehl, Maismehl, Reismehl, Maltodextrin, Emulgator (E471), Salz, Farbstoff (Annatto Norbixin). 1 2.0%
Kakaomasse, fettarmes Kakaopulver, Kakaobutter . Kann Schalenfrüchte, Milch und Soja enthalten. 1 2.0%
Alpenmilch Schokolade. Zutaten: Zucker, Kakaobutter, Magermilchpulver, Kakaomasse, Süßmolkenpulver (aus Milch), Butterreinfett, Haselnüsse, Emulgatoren (Sojalecithin, E476), Aroma. Kakao: 30 % mindestens. Kann andere Nüsse und Weizen enthalten. Ohne Farbstoffe** und Konservierungsstoffe** -**Gemäß rechtlicher Vorschriften. 1 2.0%
Kakaomasse¹, Rohrzucker¹, Kakaobutter¹, Emulgator: Lecithine (_Soja_)¹. ¹aus kontrolliert ökologischem Anbau. 1 2.0%
25% _Walnusskerne_, 25% _Mandeln_, 25% Sultaninen geschwefelt (Sultaninen, Sonnenblumenöl, Konservierungsstoff: _Schwefeldioxid_), 25% Cranberries (Cranberries, Zucker, Sonnenblumenöl). 1 2.0%
Kakaomasse, Zucker, Kakaobutter, Emulgator (Sojalecithin), Vanille. Kann Haselnüsse, Mandeln, Milch enthalten. 1 2.0%

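A naive `split(',')` breaks these declarations in two ways. Commas inside parentheses belong to a sub-ingredient list, and German decimal commas ('Speisesalz 1,4 %') are not separators at all. This sketch handles both with a top-level splitter and also extracts the `_ALLERGEN_` markup. It is a heuristic, not a full ingredient parser: unbalanced brackets or stray underscores would confuse it. The sample text is an excerpt of one value from the table above.

```python
import re

ALLERGEN_RE = re.compile(r"_([^_]+)_")

def split_top_level(text):
    """Split on commas outside brackets, skipping digit,digit decimal commas."""
    parts, buf, depth = [], [], 0
    for i, ch in enumerate(text):
        if ch in "([":
            depth += 1
        elif ch in ")]":
            depth = max(depth - 1, 0)
        if ch == "," and depth == 0:
            prev_digit = i > 0 and text[i - 1].isdigit()
            next_digit = i + 1 < len(text) and text[i + 1].isdigit()
            if not (prev_digit and next_digit):  # a real separator
                parts.append("".join(buf).strip())
                buf = []
                continue
        buf.append(ch)
    parts.append("".join(buf).strip())
    return [p for p in parts if p]

def allergens(text):
    """Return the underscore-marked allergen names, case-folded."""
    return [m.casefold() for m in ALLERGEN_RE.findall(text)]

text = ("_WEIZENMEHL_, Palmöl, Backtriebmittel (Ammoniumcarbonate, "
        "Natriumcarbonate), Speisesalz 1,4 %, _EIER_")
print(split_top_level(text))
print(allergens(text))  # ['weizenmehl', 'eier']
```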
ingredients_text_nb categorical free_text

This appears to be a Norwegian Bokmål ingredients text field, likely from a multilingual product dataset. It is effectively empty: 96% of the 50 rows are null, and the only non-null value across the remaining 2 rows is an empty string, leaving cardinality at 1 and entropy at 0.

Treatment: Drop; no usable signal at this sample size.

anthropic:claude-opus-4-7 · confidence high
Out[314]:

saturn.columns["ingredients_text_nb"].stats

stat value
n 50
nulls 48 (96.0%)
unique 1
top_value (empty string)
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: null_rate (96.0% null)
alert: imbalance (top value is 100.0% of rows)
Fig 77. Top values for ingredients_text_nb.
Top values for ingredients_text_nb (1 unique shown, of 1 total).
value count share
(empty string) 2 4.0%

packagings_n numeric feature

Likely a count of packaging components per product, ranging from 1 to 5 with a mean of 2.07 and a median of 2. Q1 and Q3 both equal 2, so the IQR is 0 and the 1.5 IQR rule flags every value other than 2: 20 of the 41 non-null rows (outlier_rate 0.488). This is a quirk of the IQR rule on a low-cardinality integer, not a data-quality issue. Note also the 18% null rate (9 rows) and only 5 distinct values.

Treatment: Treat as a small-count integer feature; impute the 18% nulls and ignore the IQR-flagged outliers.

anthropic:claude-opus-4-7 · confidence high
Out[317]:

saturn.columns["packagings_n"].stats

stat value
n 50
nulls 9 (18.0%)
unique 5
min 1
max 5
mean 2.073
median 2
std 0.8772
q1 2
q3 2
iqr 0
skew 0.9834
kurtosis 1.602
n_outliers 20
outlier_rate 0.4878
zero_rate 0
alert: outliers (48.8% of rows beyond 1.5 IQR)
Fig 78. Distribution of packagings_n. Vertical dash marks the median.
Histogram bins for packagings_n (median: 2.0).
bin count
1 – 1.667 10
1.667 – 2.333 21
2.333 – 3 0
3 – 3.667 8
3.667 – 4.333 1
4.333 – 5 1

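Each histogram bin above contains only one integer, so the 41 non-null values can be read back as 10 × 1, 21 × 2, 8 × 3, 1 × 4 and 1 × 5. Applying the 1.5 IQR rule to that reconstruction reproduces the profile's 20 outliers, which shows the alert comes from Q1 = Q3. The sketch uses `statistics.quantiles` with the inclusive method. saturn's exact quantile method is not documented here, so the agreement is a consistency check, not a confirmation.

```python
import statistics

values = [1] * 10 + [2] * 21 + [3] * 8 + [4] + [5]  # read back from the histogram

q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [v for v in values if v < low or v > high]

print(q1, q3, iqr)                                           # 2.0 2.0 0.0
print(len(outliers), round(len(outliers) / len(values), 4))  # 20 0.4878
print(round(statistics.mean(values), 3))                     # 2.073
```

Any column where a single value spans both the Q1 and Q3 positions gets IQR = 0, and every other value is then flagged.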
complete numeric label

Binary 0/1 indicator (n_unique=2, min=0, max=1), likely flagging whether a product record is complete. Only 32% of rows (16 of 50) are marked complete (mean=0.32, zero_rate=0.68), so the negative class outnumbers the positive roughly 2:1 (34 vs 16). There are no nulls or outliers across the 50 rows.

Treatment: Treat as binary target; account for the 68/32 class imbalance during modelling.

anthropic:claude-opus-4-7 · confidence high
Out[320]:

saturn.columns["complete"].stats

stat value
n 50
nulls 0 (0.0%)
unique 2
min 0
max 1
mean 0.32
median 0
std 0.4712
q1 0
q3 1
iqr 1
skew 0.7717
kurtosis -1.404
n_outliers 0
outlier_rate 0
zero_rate 0.68
Fig 79. Distribution of complete. Vertical dash marks the median.
Histogram bins for complete (median: 0.0).
bin count
0 – 0.1429 34
0.1429 – 0.2857 0
0.2857 – 0.4286 0
0.4286 – 0.5714 0
0.5714 – 0.7143 0
0.7143 – 0.8571 0
0.8571 – 1 16

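One common way to account for the 34/16 split is inverse-frequency class weights. The formula is n_samples / (n_classes × count), the same one scikit-learn uses for `class_weight='balanced'`. This plain-Python sketch uses the counts implied by the profile (mean 0.32 over 50 rows).

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in sorted(counts.items())}

labels = [0] * 34 + [1] * 16
weights = balanced_class_weights(labels)
print({c: round(w, 4) for c, w in weights.items()})  # {0: 0.7353, 1: 1.5625}
```

The minority class ends up weighted about 2.1 times as heavily as the majority class, mirroring the 34:16 ratio.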
emb_codes_20141016 categorical metadata

This appears to be a packager/manufacturer code field from an Open Food Facts-style export; the `20141016` suffix suggests a snapshot dated 2014-10-16. It mixes French EMB establishment codes (e.g., 'EMB 44068A') with free-text manufacturer descriptors in German and Spanish. Of the 50 rows, 58% are null and another 30% (15 rows) are empty strings, which form the top value. The remaining 6 rows hold 6 distinct non-empty entries, each appearing exactly once. The entropy ratio of 0.57 and the dominance of blanks make this column nearly unusable as-is.

Treatment: Drop or defer; coverage is too sparse and values too heterogeneous to feature-engineer without a dedicated parser.

anthropic:claude-opus-4-7 · confidence high
Out[323]:

saturn.columns["emb_codes_20141016"].stats

stat value
n 50
nulls 29 (58.0%)
unique 7
top_value (empty string)
top_rate 0.7143
cardinality 7
entropy 1.602
entropy_ratio 0.5705
alert: long_tail (6 singleton categories)
alert: null_rate (58.0% null)
Fig 80.
Top values for emb_codes_20141016.
Top values for emb_codes_20141016 (7 unique shown, of 7 total).
value | count | share
(empty string) | 15 | 30.0%
LINDT & SPRÜNGLI SAS,CHOCOLADEFABRIKEN LINDT & SPRÜNGLI AG | 1 | 2.0%
EMB 44068A | 1 | 2.0%
//HERSTELLER UND VERPACKER://,SOLENT GMBH & CO. KG,//DIE ZUGEHÖRIGKEIT ZU://,SCHWARZ BETEILIGUNGS GMBH | 1 | 2.0%
//FABRICANTE Y ENVASADOR://,LINDT & SPRÜNGLI SAS,//PERTENECIENTE A://,CHOCOLADEFABRIKEN LINDT & SPRÜNGLI AG | 1 | 2.0%
//FABRICANTE Y ENVASADOR://,RAUSCH SCHOKOLADEN GMBH | 1 | 2.0%
EMB 64422 | 1 | 2.0%

ingredients_tags unknown free_text

The column 'ingredients_tags' was skipped by the profiler, so no type, uniqueness, or distributional statistics are available. The only confirmed signals are 50 rows with a 0.0 null rate. The name suggests a list-valued or delimited tag field (e.g., ingredient identifiers), which would explain why standard profiling bailed out.

Treatment: Parse/explode the tag list and one-hot or embed the individual ingredients before modelling.

anthropic:claude-opus-4-7 · confidence low
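Here is a minimal multi-hot sketch. It assumes each cell of `ingredients_tags` is a Python list of tag strings; the profiler did not confirm this. The example rows are hypothetical. Each row becomes a 0/1 vector over the sorted vocabulary of all tags seen in the column.

```python
def multi_hot(rows):
    """Map each row's tag list to a 0/1 vector over the sorted vocabulary of all tags."""
    vocab = sorted({tag for tags in rows if tags for tag in tags})
    index = {tag: i for i, tag in enumerate(vocab)}
    matrix = []
    for tags in rows:
        vec = [0] * len(vocab)
        for tag in tags or []:  # None or [] yields an all-zero row
            vec[index[tag]] = 1
        matrix.append(vec)
    return vocab, matrix

# Hypothetical rows; real ingredients_tags values were not profiled.
rows = [["en:sugar", "en:cocoa-butter"], ["en:wheat-flour", "en:sugar"], []]
vocab, X = multi_hot(rows)
```

With only 50 rows, most ingredient tags will be rare. A minimum-frequency cut on the vocabulary is probably needed before modelling.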
Out[326]:

saturn.columns["ingredients_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique (not computed)
alert: skipped (no profiler for kind=unknown)

packaging_text_ja categorical free_text

Japanese-language packaging text, almost entirely absent: 98% of the 50 rows are null and the single non-null value is itself an empty string, leaving cardinality at 1 and entropy at 0. There is no usable signal here for any downstream task.

Treatment: Drop the column; it is effectively empty.

anthropic:claude-opus-4-7 · confidence high
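The drop rule can be written as a small check: a column is effectively empty when every entry is either null or a blank string. This is a sketch, and the helper name `is_effectively_empty` is hypothetical. The same check applies to any localized column that is null or blank throughout.

```python
def is_effectively_empty(values):
    """True when no entry carries content: each value is None or a blank string."""
    return all(v is None or (isinstance(v, str) and not v.strip()) for v in values)

# packaging_text_ja as profiled: 49 nulls plus one empty string.
packaging_text_ja = [None] * 49 + [""]
print(is_effectively_empty(packaging_text_ja))  # True
```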
Out[328]:

saturn.columns["packaging_text_ja"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value (empty string)
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail (1 singleton categories)
alert: null_rate (98.0% null)
alert: imbalance (top value is 100.0% of rows)
Fig 81.
Top values for packaging_text_ja.
Top values for packaging_text_ja (1 unique shown, of 1 total).
value | count | share
(empty string) | 1 | 2.0%

generic_name_de categorical free_text

German-language generic product name, likely a free-text label for food items (chocolates, biscuits, spreads). Coverage is poor: 60% of rows are null. Among the 20 non-null values, the top value is the empty string (12 occurrences), and the remaining 8 distinct values each appear once, so no name repeats across products. The entropy_ratio of 0.68 reflects the empty-string mass dominating an otherwise unique long tail.

Treatment: Treat as free text; impute missing and normalize/tokenize before any categorical use.

anthropic:claude-opus-4-7 · confidence high
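Here is a sketch of the normalise-then-tokenize step. It assumes empty strings should be treated as missing, so that the imputation strategy can be chosen separately. The functions use Unicode NFC normalisation, casefolding, and a simple split on non-word characters. All helper names are hypothetical.

```python
import re
import unicodedata

def normalize_name(raw):
    """Blank -> None; otherwise NFC-normalize, casefold and collapse whitespace."""
    if raw is None or not raw.strip():
        return None
    text = unicodedata.normalize("NFC", raw).casefold()
    return re.sub(r"\s+", " ", text).strip()

def tokenize(text):
    """Split on runs of non-word characters (hyphens, dots, '%', spaces)."""
    return [t for t in re.split(r"[^\w]+", text) if t] if text else []

tokens = tokenize(normalize_name("Edel-Bitter-Schokolade. Ecuador 70% Kakao"))
```

Splitting on hyphens lets 'Edelbitter-Schokolade' and 'Edel-Bitter-Schokolade' share some tokens. Compound words such as 'Edelbitterschokolade' would still need a German decompounder to match.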
Out[331]:

saturn.columns["generic_name_de"].stats

stat value
n 50
nulls 30 (60.0%)
unique 9
top_value (empty string)
top_rate 0.6
cardinality 9
entropy 2.171
entropy_ratio 0.6849
alert: long_tail (8 singleton categories)
alert: null_rate (60.0% null)
Fig 82.
Top values for generic_name_de.
Top values for generic_name_de (9 unique shown, of 9 total).
value | count | share
(empty string) | 12 | 24.0%
Edelbitterschokolade 90% Kakao | 1 | 2.0%
Kekse mit Nuss-Nougat-Creme-Füllung | 1 | 2.0%
Extra feine dunkle Schokolade | 1 | 2.0%
Edelbitter-Schokolade 74% Kakao | 1 | 2.0%
Kräcker | 1 | 2.0%
Edel-Bitter-Schokolade. Ecuador 70% Kakao | 1 | 2.0%
Nuss-Nugat-Creme | 1 | 2.0%
Alpenmilch-Schokolade | 1 | 2.0%

last_editor categorical metadata

Likely the username or bot handle that last edited each record. One contributor, "foodless", dominates with 21 of the 49 non-null rows (top_rate 0.43). The remaining 28 rows spread across 23 other editors, producing a long tail (19 singletons) and an entropy ratio of 0.77. One row (2%) is null. Several handles look like apps or bots (e.g., municorn-calorie-counter-app, macrofactor) mixed with human usernames.

Treatment: Group rare editors into an "other" bucket and keep as a categorical provenance feature.

anthropic:claude-opus-4-7 · confidence high
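The "other" bucket can be built with a frequency threshold. This sketch keeps editors seen at least `min_count` times and keeps nulls as their own `missing` level. The threshold of 2 and the label names are assumptions, not part of the profile.

```python
from collections import Counter

def bucket_rare(values, min_count=2, other="other", missing="missing"):
    """Keep categories seen at least `min_count` times; map the rest to `other`."""
    counts = Counter(v for v in values if v is not None)
    return [
        missing if v is None else (v if counts[v] >= min_count else other)
        for v in values
    ]

editors = ["foodless"] * 3 + ["macrofactor"] * 2 + ["gmlaa", None]
print(bucket_rare(editors))
```

With `min_count=2`, the profiled column would collapse to the five editors seen more than once (foodless, municorn-calorie-counter-app, charlesnepote, macrofactor, bodysupport), plus `other` and `missing`.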
Out[334]:

saturn.columns["last_editor"].stats

stat value
n 50
nulls 1 (2.0%)
unique 24
top_value foodless
top_rate 0.4286
cardinality 24
entropy 3.513
entropy_ratio 0.7662
alert: long_tail (19 singleton categories)
Fig 83.
Top values for last_editor.
Top values for last_editor (20 unique shown, of 24 total).
value | count | share
foodless | 21 | 42.0%
municorn-calorie-counter-app | 3 | 6.0%
charlesnepote | 2 | 4.0%
macrofactor | 2 | 4.0%
bodysupport | 2 | 4.0%
moon-rabbit | 1 | 2.0%
gmlaa | 1 | 2.0%
prepperapp | 1 | 2.0%
marmotte73 | 1 | 2.0%
laura-chaud | 1 | 2.0%
org-barilla-france-sa | 1 | 2.0%
tom1707 | 1 | 2.0%
bubu63 | 1 | 2.0%
moncoachigbas | 1 | 2.0%
natrius | 1 | 2.0%
clxtng | 1 | 2.0%
roboto-app | 1 | 2.0%
fgouget | 1 | 2.0%
ludolm | 1 | 2.0%
foodiq | 1 | 2.0%

minerals_prev_tags unknown other

The column `minerals_prev_tags` was skipped by the profiler, so no type, cardinality, or value statistics are available beyond a row count of 50 and a null rate of 0.0. Without `n_unique` or any descriptive stats, the content and structure of this field are unknown. The name suggests it may hold prior tag annotations related to minerals, but this cannot be confirmed from the evidence.

Treatment: Re-profile with the appropriate parser (likely list/string) before deciding on downstream use.

anthropic:claude-opus-4-7 · confidence low
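Before re-profiling, a quick census of the raw Python types in the column shows which parser is needed (list, string, dict, or mixed). This sketch assumes the records have already been loaded as Python objects, for example with `json.load`. The sample values are hypothetical.

```python
from collections import Counter

def type_census(values):
    """Count the Python type names present in a column."""
    return Counter(type(v).__name__ for v in values)

# Hypothetical raw cells for a skipped *_tags column.
census = type_census([["en:calcium"], [], "en:zinc", None])
print(census)
```

A column made entirely of lists points to a list/tag parser. A mix of lists and strings means the export is inconsistent and should be coerced first.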
Out[337]:

saturn.columns["minerals_prev_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique (not computed)
alert: skipped (no profiler for kind=unknown)

last_image_t numeric timestamp

Values are 10-digit integers ranging from 1,639,159,016 to 1,767,675,445 with a median of 1,752,195,111. These are Unix epoch seconds, so the column is a 'last image' timestamp spanning 2021-12-10 to 2026-01-06 (UTC), with the median in mid-2025. All 50 values are unique, with no nulls or zeros. The distribution is strongly left-skewed (skew -2.44, kurtosis 7.36): 2 outliers (4%) sit far below the bulk in the lowest histogram bin (before roughly mid-2022), indicating a few very stale records against an otherwise recent cluster.

Treatment: Cast from epoch seconds to datetime and derive recency features rather than using the raw integer.

anthropic:claude-opus-4-7 · confidence high
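Here is a sketch of the cast-and-derive step using only the standard library. It interprets the integers as UTC epoch seconds and computes a days-since recency feature. Using the column maximum as the reference date is an assumption; a fixed snapshot date would also work. The helper names are hypothetical.

```python
from datetime import datetime, timezone

def to_utc(epoch_seconds):
    """Interpret an integer as Unix epoch seconds and return an aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)

def days_since(epoch_seconds, reference):
    """Recency feature: whole days between the timestamp and a reference datetime."""
    return (reference - to_utc(epoch_seconds)).days

# Min and max of last_image_t from the stats table.
oldest, newest = to_utc(1_639_159_016), to_utc(1_767_675_445)
print(oldest.date(), newest.date(), days_since(1_639_159_016, newest))
```

Measured against the newest image, the oldest record is about four years stale. That gap is what the two low outliers represent.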
Out[339]:

saturn.columns["last_image_t"].stats

stat value
n 50
nulls 0 (0.0%)
unique 50
min 1.639e+09
max 1.768e+09
mean 1.745e+09
median 1.752e+09
std 2.681e+07
q1 1.735e+09
q3 1.764e+09
iqr 2.896e+07
skew -2.443
kurtosis 7.36
n_outliers 2
outlier_rate 0.04
zero_rate 0
alert: high_skew (skew=-2.44)
Fig 84.
Distribution of last_image_t. Vertical dash marks the median.
Histogram bins for last_image_t (median: 1752195111.0).
bin | count
1.639e+09 – 1.658e+09 | 2
1.658e+09 – 1.676e+09 | 0
1.676e+09 – 1.694e+09 | 0
1.694e+09 – 1.713e+09 | 0
1.713e+09 – 1.731e+09 | 5
1.731e+09 – 1.749e+09 | 17
1.749e+09 – 1.768e+09 | 26

obsolete_since_date categorical metadata

This appears to be a date column marking when items became obsolete, but it carries no usable information in this sample. Across 50 rows there is a single non-null distinct value — the empty string — making up 100% of non-nulls (44 of 44), with a 12% null rate on top. Entropy is 0.0 and cardinality is 1, so the field is effectively blank.

Treatment: Drop; the column is constant (empty) and offers no signal.

anthropic:claude-opus-4-7 · confidence high
Out[342]:

saturn.columns["obsolete_since_date"].stats

stat value
n 50
nulls 6 (12.0%)
unique 1
top_value (empty string)
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: imbalance (top value is 100.0% of rows)
Fig 85.
Top values for obsolete_since_date.
Top values for obsolete_since_date (1 unique shown, of 1 total).
value | count | share
(empty string) | 44 | 88.0%

pnns_groups_2_tags unknown other

Column `pnns_groups_2_tags` was skipped by the profiler, so no statistics, uniqueness count, or value samples are available. The only confirmed signals are 50 rows present and a 0.0 null rate. The name suggests Open Food Facts PNNS group-2 category tags, typically a low-cardinality categorical, but this cannot be verified from the evidence.

Treatment: Re-run the profiler on this column to determine type before deciding; if categorical tags, one-hot or target-encode.

anthropic:claude-opus-4-7 · confidence low
Out[345]:

saturn.columns["pnns_groups_2_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique (not computed)
alert: skipped (no profiler for kind=unknown)

emb_codes_tags unknown other

The column 'emb_codes_tags' was skipped by saturn, so no statistics are available beyond the row count (50) and a null rate of 0.0. The name suggests it holds packager (EMB) code tags, likely a multi-valued categorical string field, but uniqueness, cardinality, and value distribution are unknown. No anomalies can be flagged without further profiling.

Treatment: Re-profile with string/list parsing enabled before deciding whether to one-hot encode or drop.

anthropic:claude-opus-4-7 · confidence low
Out[347]:

saturn.columns["emb_codes_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique (not computed)
alert: skipped (no profiler for kind=unknown)

countries_beforescanbot categorical feature

This appears to be a multi-country list field (likely product distribution countries from an Open Food Facts–style source, captured before a scan-bot pass). With 38 unique values across 50 rows and entropy_ratio 0.965, it is nearly free-form: France leads at only 6 occurrences (top_rate 0.14), and many cells are comma-separated lists. Values mix languages (French 'Belgique', Spanish 'Bélgica', Dutch 'nl:Duitsland', English 'en:Morocco') and taxonomy-prefixed codes, plus a 14% null rate.

Treatment: Split on comma, normalize language variants and 'xx:' prefixes to ISO country codes, then multi-hot encode.

anthropic:claude-opus-4-7 · confidence high
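Here is a sketch of the split / strip-prefix / map / multi-hot pipeline. The alias table is a deliberately tiny, hypothetical lookup covering a few spellings seen in this sample. A real pipeline needs a full multilingual country table. Names missing from the table (and hyphenated slugs such as 'en:la-reunion') are silently dropped here.

```python
# Hypothetical alias table: lower-cased country names/codes -> ISO 3166-1 alpha-2.
COUNTRY_ALIASES = {
    "france": "FR", "francia": "FR", "francie": "FR", "frankrijk": "FR", "fr": "FR",
    "belgique": "BE", "bélgica": "BE", "belgica": "BE", "belgium": "BE",
    "maroc": "MA", "morocco": "MA", "marruecos": "MA",
    "germany": "DE", "alemania": "DE", "duitsland": "DE",
    "spain": "ES", "espagne": "ES", "españa": "ES", "spanje": "ES",
}

def parse_countries(cell):
    """Split a comma-separated cell, drop 'xx:' language prefixes, map names to ISO codes."""
    if not cell:
        return set()
    codes = set()
    for part in cell.split(","):
        name = part.strip()
        if ":" in name:  # e.g. 'nl:Duitsland' -> 'Duitsland'
            name = name.split(":", 1)[1]
        code = COUNTRY_ALIASES.get(name.casefold())
        if code:
            codes.add(code)
    return codes

def multi_hot_countries(cells):
    """Multi-hot encode each cell over the sorted set of ISO codes found in the column."""
    sets = [parse_countries(c) for c in cells]
    vocab = sorted(set().union(*sets))
    return vocab, [[int(code in s) for code in vocab] for s in sets]
```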
Out[349]:

saturn.columns["countries_beforescanbot"].stats

stat value
n 50
nulls 7 (14.0%)
unique 38
top_value France
top_rate 0.1395
cardinality 38
entropy 5.066
entropy_ratio 0.9653
alert: long_tail (37 singleton categories)
Fig 86.
Top values for countries_beforescanbot.
Top values for countries_beforescanbot (20 unique shown, of 38 total).
value | count | share
France | 6 | 12.0%
Maroc | 1 | 2.0%
Belgique,France,Polynésie française,Guadeloupe,Luxembourg,Portugal,La Réunion | 1 | 2.0%
Argelia,Bélgica,República Checa,Finlandia,Francia,Polinesia Francesa,Alemania,Italia,Mauricio,Marruecos,Países Bajos,Reunión,Singapur,España,Suecia,Suiza,Reino Unido | 1 | 2.0%
en:Morocco | 1 | 2.0%
nl:Duitsland,nl:Slovenië,nl:Spanje,nl:Frankrijk | 1 | 2.0%
Belgique,Côte d'Ivoire,France,Luxembourg,Mali,Martinique,Russie,Suisse,Royaume-Uni | 1 | 2.0%
Algérie, Cameroun, France, Maroc, en:spain | 1 | 2.0%
France,Suède,Royaume-Uni | 1 | 2.0%
France,Allemagne,Italie | 1 | 2.0%
France,Italie,Espagne,Suisse | 1 | 2.0%
Česko,Francie,Německo,Guadeloupe,Itálie,en:algerie,en:espagne,en:la-reunion,en:royaume-uni,en:suisse | 1 | 2.0%
Belgique,France,Royaume-Uni | 1 | 2.0%
Austria,France,Italy,Réunion,Spain,Alemania,Belgica,Francia,Paises-bajos,Suiza | 1 | 2.0%
Finland,France,Germany,Spain | 1 | 2.0%
France,Guadeloupe,La Réunion,Suisse,en:en | 1 | 2.0%
en:fr | 1 | 2.0%
Germany | 1 | 2.0%
Australia, Belgium, Denmark, Estonia, France, Germany, Hungary, Italy, Lebanon, Portugal, Serbia, Spain, Switzerland, United Kingdom, en:nl | 1 | 2.0%
Belgique,France,Pays-Bas,Sénégal | 1 | 2.0%

nutrition_grade_fr categorical label

This is the French Nutri-Score grade (a-e) for each food item, with one row coded as 'unknown'. The distribution is heavily skewed toward the worst grade: 'e' alone covers 54% of the 50 rows, and grades d+e together dominate while only 6 rows are 'a' or 'b'. Entropy ratio of 0.74 confirms moderate concentration rather than a balanced ordinal spread.

Treatment: Treat as ordered categorical (a < b < c < d < e).

anthropic:claude-opus-4-7 · confidence high
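Here is a minimal ordinal encoding following the Nutri-Score order from 'a' (best) to 'e' (worst). Mapping the single 'unknown' row to `None` rather than to a sixth level is an assumption. It keeps a non-grade out of the ordinal scale.

```python
# Nutri-Score order: a (best) -> 1 ... e (worst) -> 5; anything else becomes None.
GRADE_ORDER = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}

def encode_grade(grade):
    """Return the ordinal code for a Nutri-Score letter, or None for missing/unknown."""
    return GRADE_ORDER.get(grade.strip().lower()) if grade else None
```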
Out[352]:

saturn.columns["nutrition_grade_fr"].stats

stat value
n 50
nulls 0 (0.0%)
unique 6
top_value e
top_rate 0.54
cardinality 6
entropy 1.913
entropy_ratio 0.7399
Fig 87.
Top values for nutrition_grade_fr.
Top values for nutrition_grade_fr (6 unique shown, of 6 total).
value | count | share
e | 27 | 54.0%
d | 9 | 18.0%
c | 7 | 14.0%
a | 4 | 8.0%
b | 2 | 4.0%
unknown | 1 | 2.0%

data_quality_tags unknown other

The column 'data_quality_tags' was skipped by the profiler, so no kind, uniqueness, or value statistics are available. The only confirmed signals are 50 rows with a null_rate of 0.0, meaning every row has some value, but its content and cardinality are unknown.

Treatment: Re-profile or manually inspect before use; the profiler skipped this column.

anthropic:claude-opus-4-7 · confidence low
Out[355]:

saturn.columns["data_quality_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique (not computed)
alert: skipped (no profiler for kind=unknown)

ingredients_with_specified_percent_sum numeric feature

This appears to be a numeric feature summing the percentages of ingredients whose proportions are explicitly disclosed (likely on food product labels). The distribution is heavily zero-inflated (zero_rate 0.58, median 0.0), while non-zero values stretch up to 99.6. Across all rows the mean is 22.74 and the std is 32.88. The right skew (0.998), the quartiles (q1 = 0, q3 = 52.25), and a histogram with a spike at zero plus a second cluster between roughly 43 and 85 suggest two regimes: products with no specified percentages and those with substantial disclosure.

Treatment: Consider a hurdle approach: a binary 'has_disclosure' flag plus the continuous value, since 58% of rows are zero.

anthropic:claude-opus-4-7 · confidence high
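The hurdle idea splits each value into a binary flag and an amount. The amount is left missing when the flag is 0, so a second-stage model is fitted only on rows that disclose percentages. This is a sketch, and the helper name is hypothetical.

```python
def hurdle_features(value):
    """Split one value into (has_disclosure, amount_if_disclosed)."""
    has = 1 if value is not None and value > 0 else 0
    return has, (value if has else None)

print([hurdle_features(v) for v in (0.0, 52.25, 99.6)])
```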
Out[357]:

saturn.columns["ingredients_with_specified_percent_sum"].stats

stat value
n 50
nulls 0 (0.0%)
unique 22
min 0
max 99.6
mean 22.74
median 0
std 32.88
q1 0
q3 52.25
iqr 52.25
skew 0.9979
kurtosis -0.5856
n_outliers 0
outlier_rate 0
zero_rate 0.58
Fig 88.
Distribution of ingredients_with_specified_percent_sum. Vertical dash marks the median.
Histogram bins for ingredients_with_specified_percent_sum (median: 0.0).
bin | count
0 – 14.23 | 33
14.23 – 28.46 | 0
28.46 – 42.69 | 2
42.69 – 56.91 | 4
56.91 – 71.14 | 5
71.14 – 85.37 | 4
85.37 – 99.6 | 2

origin_it categorical feature

This column appears to be the Italian-language origin field (the _it suffix follows the per-language naming of the other localized columns), but it carries no information. Of 50 rows, 68% are null, and the remaining 16 non-null values are all empty strings, giving a single unique value and zero entropy. There is no signal here to model on.

Treatment: Drop; constant column with majority nulls.

anthropic:claude-opus-4-7 · confidence high
Out[360]:

saturn.columns["origin_it"].stats

stat value
n 50
nulls 34 (68.0%)
unique 1
top_value (empty string)
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: null_rate (68.0% null)
alert: imbalance (top value is 100.0% of rows)
Fig 89.
Top values for origin_it.
Top values for origin_it (1 unique shown, of 1 total).
value | count | share
(empty string) | 16 | 32.0%

nutrition_data_per categorical metadata

This column records the basis on which nutrition values are reported, taking only two values: '100g' and 'serving'. The encoding is heavily skewed, with 84% of the 50 rows using '100g' and the remaining 8 rows using 'serving', and there are no nulls. Analysts should note that nutrition figures in other columns are not directly comparable across rows without normalising to a common basis.

Treatment: Use as a grouping flag and normalise nutrition fields to a single basis before aggregation.

anthropic:claude-opus-4-7 · confidence high
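Normalising to a per-100 g basis only needs the serving size for the 'serving' rows: per_100g = per_serving × 100 / serving_size_g. This sketch assumes the serving size is available in grams from some other field. That field is hypothetical here and was not part of this profile.

```python
def to_per_100g(value, basis, serving_size_g=None):
    """Convert a nutrient value to a per-100 g basis; 'serving' rows need a serving size in grams."""
    if basis == "100g":
        return value
    if basis == "serving":
        if not serving_size_g:
            return None  # cannot normalise without a serving size
        return value * 100.0 / serving_size_g
    raise ValueError(f"unexpected basis: {basis!r}")

print(to_per_100g(5.0, "serving", 25))  # 5 g per 25 g serving -> 20.0 g per 100 g
```

Returning `None` for 'serving' rows without a usable size keeps those rows out of aggregates instead of silently mixing bases.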
Out[363]:

saturn.columns["nutrition_data_per"].stats

stat value
n 50
nulls 0 (0.0%)
unique 2
top_value 100g
top_rate 0.84
cardinality 2
entropy 0.6343
entropy_ratio 0.6343
Fig 90.
Top values for nutrition_data_per.
Top values for nutrition_data_per (2 unique shown, of 2 total).
value | count | share
100g | 42 | 84.0%
serving | 8 | 16.0%

origin_pl categorical metadata

This appears to be the Polish-language origin field (the _pl suffix follows the per-language naming of the other localized columns), but it carries almost no information here. 90% of the 50 rows are null, and the remaining 5 non-null entries are all empty strings, giving cardinality 1 and entropy 0.

Treatment: Drop; the column is 90% null with only empty strings remaining.

anthropic:claude-opus-4-7 · confidence high
Out[366]:

saturn.columns["origin_pl"].stats

stat value
n 50
nulls 45 (90.0%)
unique 1
top_value (empty string)
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: null_rate (90.0% null)
alert: imbalance (top value is 100.0% of rows)
Fig 91.
Top values for origin_pl.
Top values for origin_pl (1 unique shown, of 1 total).
value | count | share
(empty string) | 5 | 10.0%

product unknown other

The column 'product' was skipped by the profiler, so no type, cardinality, or value statistics are available beyond a row count of 50 and a null rate of 0.0. Without unique counts or sample values, the role of this column cannot be inferred from the evidence.

Treatment: Re-run the profiler on this column to obtain stats before deciding on treatment.

anthropic:claude-opus-4-7 · confidence low
Out[369]:

saturn.columns["product"].stats

stat value
n 50
nulls 0 (0.0%)
unique (not computed)
alert: skipped (no profiler for kind=unknown)
Out[371]:

saturn.columns["link"].stats

stat value
n 50
nulls 2 (4.0%)
unique 28
top_value (empty string)
top_rate 0.4375
cardinality 28
entropy 3.663
entropy_ratio 0.762
alert: long_tail (27 singleton categories)
Fig 92.
Top values for link.
Top values for link (20 unique shown, of 28 total).
value | count | share
(empty string) | 21 | 42.0%
www.copag.ma | 1 | 2.0%
https://www.lu.fr/prince | 1 | 2.0%
http://www.lindt.es/swf/spa/productos/excellence/altos-porcentajes/excellence-90/www.lindt.com | 1 | 2.0%
https://www.gerble.fr/vitalite/biscuit-sesame | 1 | 2.0%
https://www.nutella.com/de/de/produkte/nutella-biscuits | 1 | 2.0%
http://www.wasa.fr/produits/tartines-croustillantes/authentique/pack/ | 1 | 2.0%
https://www.lindt.fr/excellence-noir-70 | 1 | 2.0%
https://www.tyrrellscrisps.co.uk/range/potato-crisps/lightly-sea-salted/ | 1 | 2.0%
https://www.lindt.fr/excellence-noir-85 | 1 | 2.0%
www.bjorg.fr | 1 | 2.0%
https://www.wasa.fr | 1 | 2.0%
www.henrys.ma | 1 | 2.0%
https://www.tuc.eu/produkte_de_at#tuc-prod-4 | 1 | 2.0%
http://www.lindt.es/swf/spa/productos/excellence/altos-porcentajes/excellence-70/ | 1 | 2.0%
https://www.lepaindesfleurs.fr/la-marque | 1 | 2.0%
https://www.gerble.fr/teneur-reduite/biscuit-pomme-noisette | 1 | 2.0%
https://www.pringles.com/de/products/flavours/pringles-original-product.html | 1 | 2.0%
http://www.lindt.ca/swf/fra/produits/excellence/barres/excellence-99-cacao/ | 1 | 2.0%
www.nestledessert.fr | 1 | 2.0%

ingredients_text_nl categorical free_text

Dutch-language ingredient lists for food products, present for only 24% of the 50 rows (null_rate 0.76). Among the 12 non-null entries there are 9 distinct strings with high entropy_ratio 0.92, and the modal value is actually the empty string (4 occurrences) rather than a real ingredient list. Contents range from short declarations like 'Aardappelen, zonnebloemolie, zeezout.' to long packaging blurbs containing addresses and URLs, so the field mixes ingredients with marketing text.

Treatment: Treat empty strings as nulls, then tokenize and embed (or parse comma-separated ingredients) before modelling.

anthropic:claude-opus-4-7 · confidence high
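Naive comma splitting breaks nested declarations such as 'granen 98.3% (_volkorentarwemeel_ 65.8%, ...)'. The sketch below splits only on commas outside parentheses or brackets, treats empty strings as missing, and strips trailing full stops. It does not detect the marketing-text rows; those would still need filtering, for example by length or by the presence of URLs.

```python
def split_ingredients(text):
    """Split on top-level commas only, keeping parenthesised sub-ingredients with their parent."""
    if text is None or not text.strip():
        return []  # empty strings are treated as missing
    parts, current, depth = [], [], 0
    for ch in text:
        if ch in "([":
            depth += 1
        elif ch in ")]":
            depth = max(depth - 1, 0)
        if ch == "," and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    cleaned = (p.strip().rstrip(".").strip() for p in parts)
    return [p for p in cleaned if p]
```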
Out[374]:

saturn.columns["ingredients_text_nl"].stats

stat value
n 50
nulls 38 (76.0%)
unique 9
top_value (empty string)
top_rate 0.3333
cardinality 9
entropy 2.918
entropy_ratio 0.9206
alert: long_tail (8 singleton categories)
alert: null_rate (76.0% null)
Fig 93.
Top values for ingredients_text_nl.
Top values for ingredients_text_nl (9 unique shown, of 9 total).
value | count | share
(empty string) | 4 | 8.0%
Cacaomassa, cacaoboter, magere cacaopoeder, suiker. | 1 | 2.0%
Aardappelen, zonnebloemolie, zeezout. | 1 | 2.0%
Cacaomassa, magere cacao, cacaoboter, bruine suiker, vanille. Kan noten, melk, soja, sesamzaad en tarwe bevatten. | 1 | 2.0%
Cacaomassa, suiker, cacaoboter, vanille. | 1 | 2.0%
Cacaomassa, magere cacaopoeder, cacaoboter, bruine suiker. | 1 | 2.0%
*Referentie inname van een gemiddelde volwassehe (8400 kJ/ 2000 ReJI), 16,7 g 46x4, www,snackmindful,com Milka www,milka,com ER Mondelez France SAS, 6 avenue Réaumur, CS 50014, 92142 Clamart Cedex, Service Consommateurs Nº Cristal:09,69,39,79,79 BE Mondelez Belgium, Stationsstraat 100, 2800 Mechelen, ND Mondelez Nederland, Verlengde Poolseweg 34, 4818 CL Breda, eu mondelezinternational,com e 100 g COCOA LIFE www,cocoalife,org 8 FR FRANCE ONLY 05 pp 3 045140 105502 | 1 | 2.0%
_tarwebloem_ 47%, _melkchocolade_ 29% (suiker, cacaomassa, cacaoboter, weipoeder (van _melk_), magere _melkpoeder_, plantaardige vetten (shea, palm in wisselende verhoudingen), _melkvet_, emulgatoren (_sojalecithine_, E476), lactose (van _melk_), aroma), plantaardige oliën (palm, kokos), suiker, suikerstroop, _tarwezemelen_, rijsmiddelen (natriumwaterstofcarbonaat, ammoniumwaterstofcarbonaat), zout, _tarwekiemen_, voedingszuur (citroenzuur) | 1 | 2.0%
granen 98.3% (_volkorentarwemeel_ 65.8%, _roggebloem_, _tarwebloem_ 10.2%, rijstbloem, gemoute _tarwebloem_, _tarwegriesmeel_, boekweitbloem, _gerstebloem_), suiker, magere _melkpoeder_, zout, palmolie, _tarwekiemen_, emulgator (zonnebloemlecithine) | 1 | 2.0%

additives_n numeric feature

Count of additives per product, ranging from 0 to 8 across 50 rows with no nulls and only 8 distinct values. The distribution is heavily right-skewed (skew 1.47, kurtosis 2.10) with a zero_rate of 0.4 and median of 1, while a small tail produces 2 outliers (outlier_rate 0.04). Mean (1.52) sits well above the median, confirming a few additive-heavy products pull the average up.

Treatment: Treat as a discrete count; consider log1p or binning (0 vs 1+ vs many) before modelling given the skew and high zero_rate.

anthropic:claude-opus-4-7 · confidence high
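Both suggested treatments fit in one helper: a log1p-transformed count and a coarse band. The 'many' threshold of 3 additives is an assumption, chosen because q3 is 2 in the stats table; adjust it to taste.

```python
import math

def additive_features(n, many_threshold=3):
    """Return (log1p(n), band) where band is 'none', 'some', or 'many'."""
    if n == 0:
        band = "none"
    elif n < many_threshold:
        band = "some"
    else:
        band = "many"
    return math.log1p(n), band

print([additive_features(n) for n in (0, 1, 2, 8)])
```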
Out[377]:

saturn.columns["additives_n"].stats

stat value
n 50
nulls 0 (0.0%)
unique 8
min 0
max 8
mean 1.52
median 1
std 1.821
q1 0
q3 2
iqr 2
skew 1.473
kurtosis 2.105
n_outliers 2
outlier_rate 0.04
zero_rate 0.4
Fig 94.
Distribution of additives_n. Vertical dash marks the median.
Histogram bins for additives_n (median: 1.0).
bin | count
0 – 1.143 | 29
1.143 – 2.286 | 11
2.286 – 3.429 | 3
3.429 – 4.571 | 3
4.571 – 5.714 | 2
5.714 – 6.857 | 1
6.857 – 8 | 1

generic_name_sv categorical free_text

Swedish-language generic product name, populated for only 4 of 50 rows (null_rate 0.92). The four observed values are all distinct (entropy_ratio 1.0), including one empty string, so there is effectively no usable signal here. Top value 'Fin mörk choklad med 90% kakao' appears just once.

Treatment: Drop or defer; the column is too sparse (92% null) and too unique to model.

anthropic:claude-opus-4-7 · confidence high
Out[380]:

saturn.columns["generic_name_sv"].stats

stat value
n 50
nulls 46 (92.0%)
unique 4
top_value Fin mörk choklad med 90% kakao
top_rate 0.25
cardinality 4
entropy 2
entropy_ratio 1
alert: long_tail (4 singleton categories)
alert: null_rate (92.0% null)
Fig 95.
Top values for generic_name_sv.
Top values for generic_name_sv (4 unique shown, of 4 total).
value | count | share
Fin mörk choklad med 90% kakao | 1 | 2.0%
Mörk choklad | 1 | 2.0%
(empty string) | 1 | 2.0%
Kex | 1 | 2.0%

ingredients_that_may_be_from_palm_oil_tags unknown feature

This column was skipped by the profiler, so no statistics, uniqueness, or value samples are available beyond a row count of 50 and a null rate of 0.0. The name suggests it holds tags listing ingredients potentially derived from palm oil, likely a multi-valued/list field typical of Open Food Facts exports. Without parsed values, nothing can be said about cardinality, distribution, or content.

Treatment: Re-profile after parsing as a list of tags, then one-hot or count-encode before modelling.

anthropic:claude-opus-4-7 · confidence low
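For the count-encoding option, the simplest feature is the number of palm-oil-suspect tags per product. This sketch assumes cells arrive as Python lists. The fallback for comma-delimited strings is an assumption in case the export flattens lists. The example tags are hypothetical.

```python
def tag_count(cell):
    """Count-encode a tag field: None -> 0, list -> len, delimited string -> non-blank parts."""
    if cell is None:
        return 0
    if isinstance(cell, str):  # assumed fallback for a flattened, comma-delimited export
        return len([t for t in cell.split(",") if t.strip()])
    return len(cell)

print([tag_count(c) for c in (None, [], ["tag-a", "tag-b"], "tag-a, tag-b,")])
```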
Out[383]:

saturn.columns["ingredients_that_may_be_from_palm_oil_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique (not computed)
alert: skipped (no profiler for kind=unknown)

known_ingredients_n numeric feature

A non-negative integer count of recognised ingredients per record, ranging from 0 to 36 with a mean of 11.76 and median of 9. The distribution is right-skewed (skew 0.86) with a wide IQR of 13.5, and 4% of rows are zero — meaning a small fraction had no ingredients matched at all. No outliers were flagged and there are no nulls across the 50 rows.

Treatment: Consider a log1p transform before modelling to tame the right skew.

anthropic:claude-opus-4-7 · confidence high
Out[385]:

saturn.columns["known_ingredients_n"].stats

stat value
n 50
nulls 0 (0.0%)
unique 22
min 0
max 36
mean 11.76
median 9
std 8.721
q1 5
q3 18.5
iqr 13.5
skew 0.8598
kurtosis 0.07411
n_outliers 0
outlier_rate 0
zero_rate 0.04
Fig 96.
Distribution of known_ingredients_n. Vertical dash marks the median.
Histogram bins for known_ingredients_n (median: 9.0).
bin | count
0 – 5.143 | 16
5.143 – 10.29 | 12
10.29 – 15.43 | 6
15.43 – 20.57 | 7
20.57 – 25.71 | 5
25.71 – 30.86 | 2
30.86 – 36 | 2

completeness numeric feature

A numeric quality score named 'completeness', ranging from 0.575 to 1.1 with mean 0.91 and median 0.9, so most rows are near-complete. Values above 1.0 are suspicious for a metric that nominally caps at 1.0, and they are not isolated: the top histogram bin (1.025–1.1) holds 9 rows. At the low end, 12% of values (6 rows) fall below the 1.5 IQR fence, consistent with the left skew of -0.67 and suggesting a tail of poorly-populated records. Only 14 unique values across 50 rows hints at a discretised or rounded score rather than a continuous measurement.

Treatment: Clip values above 1.0 and inspect the low-end outliers before using as a quality filter.

anthropic:claude-opus-4-7 · confidence high
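This sketch clips at 1.0 and lists the low-end outliers using the Tukey fences implied by the stats table: q1 = 0.8875 and q3 = 1.0 give fences 0.71875 and 1.16875. Because the upper fence sits above the maximum of 1.1, all six flagged outliers are on the low side. Passing the profiled quartiles as defaults, rather than recomputing them, is a choice made here to match saturn's numbers.

```python
def clip_and_flag(values, q1=0.8875, q3=1.0, k=1.5, cap=1.0):
    """Clip values at `cap` and return (clipped, low_outliers, (low_fence, high_fence))."""
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    clipped = [min(v, cap) for v in values]
    low_outliers = [v for v in values if v < low]  # flagged on the raw values
    return clipped, low_outliers, (low, high)

clipped, lows, (lo, hi) = clip_and_flag([0.575, 0.7, 0.9, 1.1])
```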
Out[388]:

saturn.columns["completeness"].stats

stat value
n 50
nulls 0 (0.0%)
unique 14
min 0.575
max 1.1
mean 0.91
median 0.9
std 0.1358
q1 0.8875
q3 1
iqr 0.1125
skew -0.6678
kurtosis 0.32
n_outliers 6
outlier_rate 0.12
zero_rate 0
alert: outliers (12.0% rows beyond 1.5 IQR)
Fig 97.
Distribution of completeness. Vertical dash marks the median.
Histogram bins for completeness (median: 0.9).
bin | count
0.575 – 0.65 | 3
0.65 – 0.725 | 3
0.725 – 0.8 | 2
0.8 – 0.875 | 2
0.875 – 0.95 | 22
0.95 – 1.025 | 9
1.025 – 1.1 | 9

ingredients_sweeteners_n numeric feature

This column appears to count sweetener ingredients per record, but every one of the 50 rows holds the value 0 (zero_rate 1.0, n_unique 1, std 0.0). It carries no information for modelling and is flagged constant.

Treatment: Drop; constant column with zero variance.

anthropic:claude-opus-4-7 · confidence high
Out[391]:

saturn.columns["ingredients_sweeteners_n"].stats

stat value
n 50
nulls 0 (0.0%)
unique 1
min 0
max 0
mean 0
median 0
std 0
q1 0
q3 0
iqr 0
skew 0
kurtosis 0
n_outliers 0
outlier_rate 0
zero_rate 1
alert: constant (only one distinct value)
Fig 98.
Distribution of ingredients_sweeteners_n. Vertical dash marks the median.
Histogram bins for ingredients_sweeteners_n (median: 0.0).
bin | count
-0.5 – -0.3571 | 0
-0.3571 – -0.2143 | 0
-0.2143 – -0.07143 | 0
-0.07143 – 0.07143 | 50
0.07143 – 0.2143 | 0
0.2143 – 0.3571 | 0
0.3571 – 0.5 | 0

nova_groups categorical feature

This column holds NOVA food classification groups, a 4-level ordinal scheme encoded as strings ('1' through '4'). Only 3 of the 4 possible groups appear. Group '4' (ultra-processed) dominates with 33 of the 48 non-null rows (top_rate 0.6875), and group '2' is entirely absent. Two rows (4%) are null, and the entropy ratio of 0.64 indicates concentration toward the ultra-processed end.

Treatment: Treat as ordinal (cast to int) and impute the 4% missing before modelling.

anthropic:claude-opus-4-7 · confidence high
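Here is a sketch of the cast-and-impute step. Mode imputation is one simple choice, not the only one. With group 4 covering 33 of 48 non-null rows, it would fill both missing rows with 4.

```python
from collections import Counter

def cast_and_impute(values):
    """Cast NOVA group strings to int and fill missing entries with the most common group."""
    ints = [int(v) if v not in (None, "") else None for v in values]
    mode = Counter(v for v in ints if v is not None).most_common(1)[0][0]
    return [mode if v is None else v for v in ints]

print(cast_and_impute(["4", "3", None, "4", "1"]))
```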
Out[394]:

saturn.columns["nova_groups"].stats

stat value
n 50
nulls 2 (4.0%)
unique 3
top_value 4
top_rate 0.6875
cardinality 3
entropy 1.006
entropy_ratio 0.635
Fig 99.
Top values for nova_groups.
Top values for nova_groups (3 unique shown, of 3 total).
value | count | share
4 | 33 | 66.0%
3 | 14 | 28.0%
1 | 1 | 2.0%

allergens_hierarchy unknown feature

This column is labeled 'allergens_hierarchy', suggesting it holds hierarchical allergen tags (likely a list or delimited path structure). Saturn skipped profiling, so no uniqueness, cardinality, or value statistics are available beyond the fact that all 50 rows are non-null. Without parsed content, the structure and value distribution cannot be characterized.

Treatment: Parse the hierarchy into a list, then one-hot or multi-label encode allergen tags before modelling.

anthropic:claude-opus-4-7 · confidence low
Out[397]:

saturn.columns["allergens_hierarchy"].stats

stat value
n 50
nulls 0 (0.0%)
unique (not computed)
alert: skipped (no profiler for kind=unknown)

obsolete categorical metadata

The 'obsolete' column has a single observed value—an empty string—across all 44 non-null rows, with the remaining 12% of rows null. Cardinality is 1 and entropy is 0, so this column carries no information as-is. The name suggests a deprecated flag, consistent with it being effectively unused.

Treatment: Drop; constant column with no signal.

anthropic:claude-opus-4-7 · confidence high
Out[399]:

saturn.columns["obsolete"].stats

stat value
n 50
nulls 6 (12.0%)
unique 1
top_value (empty string)
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: imbalance (top value is 100.0% of rows)
Fig 100.
Top values for obsolete.
Top values for obsolete (1 unique shown, of 1 total).
value | count | share
(empty string) | 44 | 88.0%

origin_sv categorical metadata

This appears to be a source/origin indicator (likely Swedish, given the _sv suffix) but it carries virtually no information in this sample. With a 92% null rate and the only non-null value being an empty string repeated 4 times, cardinality is 1 and entropy is 0. The column is effectively constant and unusable as-is.

Treatment: Drop the column; it is 92% null and otherwise constant.

anthropic:claude-opus-4-7 · confidence high
Out[402]:

saturn.columns["origin_sv"].stats

stat value
n 50
nulls 46 (92.0%)
unique 1
top_value (empty string)
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: null_rate (92.0% null)
alert: imbalance (top value is 100.0% of rows)
Fig 101.
Top values for origin_sv.
Top values for origin_sv (1 unique shown, of 1 total).
value | count | share
(empty string) | 4 | 8.0%

packaging_hierarchy unknown other

The column packaging_hierarchy was skipped by the profiler, so no type, uniqueness, or distribution stats are available. All 50 rows are non-null, but every other signal (kind, n_unique, summary stats) is missing. Without further inspection the contents and structure remain unknown.

Treatment: Re-profile or manually inspect a sample before deciding on downstream handling.

anthropic:claude-opus-4-7 · confidence low
Out[405]:

saturn.columns["packaging_hierarchy"].stats

stat value
n 50
nulls 0 (0.0%)
unique (not computed)
alert: skipped (no profiler for kind=unknown)

ingredients_with_unspecified_percent_n numeric feature

Likely a count of ingredients on a product whose declared percentage is unspecified, ranging from 1 to 33 with a mean of 8.8 and median of 7. The distribution is right-skewed (skew 1.64, kurtosis 3.55) with two outliers (4%) pulling the upper tail toward 33, well above the Q3 of 11. Every row has a value (null_rate 0, zero_rate 0), so no product in this sample fully discloses ingredient percentages.

Treatment: Apply a log or sqrt transform before modelling to tame the right skew.

anthropic:claude-opus-4-7 · confidence high
Out[407]:

saturn.columns["ingredients_with_unspecified_percent_n"].stats

stat value
n 50
nulls 0 (0.0%)
unique 18
min 1
max 33
mean 8.8
median 7
std 6.061
q1 5
q3 11
iqr 6
skew 1.645
kurtosis 3.545
n_outliers 2
outlier_rate 0.04
zero_rate 0
Fig 102.
Distribution of ingredients_with_unspecified_percent_n. Vertical dash marks the median.
Histogram bins for ingredients_with_unspecified_percent_n (median: 7.0).
bin | count
1 – 5.571 | 22
5.571 – 10.14 | 13
10.14 – 14.71 | 6
14.71 – 19.29 | 7
19.29 – 23.86 | 1
23.86 – 28.43 | 0
28.43 – 33 | 1

fruits-vegetables-nuts_100g_estimate numeric feature

This is an estimated percentage of fruits, vegetables, and nuts per 100g of product. The signal is almost absent: 46% of rows are null, and of the 27 non-null values, 96.3% (26) are zero. That leaves exactly one non-zero observation, 85.0, which drives the mean of 3.15 and the skew of 4.9.

Treatment: Drop or collapse to a binary has_value flag; the column carries almost no variance.

anthropic:claude-opus-4-7 · confidence high
Out[410]:

saturn.columns["fruits-vegetables-nuts_100g_estimate"].stats

stat value
n 50
nulls 23 (46.0%)
unique 2
min 0
max 85
mean 3.148
median 0
std 16.36
q1 0
q3 0
iqr 0
skew 4.903
kurtosis 22.04
n_outliers 1
outlier_rate 0.03704
zero_rate 0.963
alert: null_rate (46.0% null)
alert: high_skew (skew=+4.90)
Fig 103.
Distribution of fruits-vegetables-nuts_100g_estimate. Vertical dash marks the median.
Histogram bins for fruits-vegetables-nuts_100g_estimate (median: 0.0).
bin | count
0 – 17 | 26
17 – 34 | 0
34 – 51 | 0
51 – 68 | 0
68 – 85 | 1

emb_codes categorical metadata

Looks like a free-form certification/packaging code field (FSC-*, EMB *, LPL.*) with mixed formats, including one company-name string. The column is dominated by empty strings: 35 of 50 rows (70% of all rows, or top_rate 0.73 of the 48 non-null values), with a further 4% null. That leaves very little signal across 11 unique values. An entropy ratio of 0.50 and the long_tail alert (7 singletons) confirm that most non-empty codes appear only once or twice.

Treatment: Treat empty strings as missing and consider dropping or collapsing into a binary has_code flag given the sparsity and long tail.

anthropic:claude-opus-4-7 · confidence medium
Out[413]:

saturn.columns["emb_codes"].stats

stat value
n 50
nulls 2 (4.0%)
unique 11
top_value ""
top_rate 0.7292
cardinality 11
entropy 1.72
entropy_ratio 0.4972
alert: long_tail (7 singleton categories)
Fig 104.
Top values for emb_codes.
Top values for emb_codes (11 unique shown, of 11 total).
value count share
"" 35 70.0%
FSC-C021442 2 4.0%
FSC-C012484 2 4.0%
EMB 31250 2 4.0%
LPL.28.01.13 1 2.0%
EMB 44068A 1 2.0%
SOLENT GMBH & CO. KG,SCHWARZ BETEILIGUNGS GMBH 1 2.0%
200029-N4/7243 1 2.0%
EMB 64422 1 2.0%
FSC-C190426 1 2.0%
C-352-255-22-10 1 2.0%
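The entropy figures in these profiles are consistent with base-2 Shannon entropy over the non-null values, with `entropy_ratio` = entropy / log2(cardinality). For `emb_codes`, the counts in the table above (35, three values with 2, seven with 1, over 48 non-null rows) give about 1.72 bits and a ratio of 1.72 / log2(11) ≈ 0.497. This sketch of the computation is inferred from the reported numbers rather than from saturn's source; `entropy_stats` is a hypothetical helper.

```python
import math


def entropy_stats(counts):
    """Base-2 Shannon entropy of a categorical distribution, plus its ratio
    to the maximum possible value log2(cardinality)."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    cardinality = len(probs)
    # A single category carries no uncertainty; define the ratio as 0 there.
    ratio = entropy / math.log2(cardinality) if cardinality > 1 else 0.0
    return entropy, ratio


emb_codes_counts = [35, 2, 2, 2] + [1] * 7
print(entropy_stats(emb_codes_counts))
```

The same definition gives entropy 0 and ratio 0 for single-valued columns such as `origin_fi`, and a ratio of 1 for all-unique columns such as `image_front_url` (log2(50) ≈ 5.644 bits).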

packagings unknown other

This column was skipped by the profiler, so no statistics are available beyond a row count of 50 and a null rate of 0.0. The name 'packagings' suggests it likely holds nested or structured packaging descriptions (lists or objects), which is consistent with the profiler classifying its kind as 'unknown' and emitting a 'skipped' alert. Without unique counts or value summaries, nothing further can be inferred.

Treatment: Inspect raw values and parse/normalize the structure before deciding on a downstream treatment.

anthropic:claude-opus-4-7 · confidence low
Out[416]:

saturn.columns["packagings"].stats

stat value
n 50
nulls 0 (0.0%)
unique
alert: skipped (no profiler for kind=unknown)
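Several columns in this section (`packagings`, `purchase_places_tags`, `additives_original_tags`, `images`, and others below) were skipped, and their treatments start by inspecting the raw values. A minimal first step is to tally the Python types each column holds. This sketch assumes the source JSON loads as a top-level array of row objects, which the profile does not confirm; `value_types` and `load_rows` are hypothetical helpers, and the demo rows are made up.

```python
import json
from collections import Counter


def value_types(records, column):
    """Count the Python types seen in one column of a list of dict rows.

    Missing keys are counted as 'missing' so they are not confused with
    explicit JSON nulls (which load as None and count as 'NoneType').
    """
    tally = Counter()
    for row in records:
        if column not in row:
            tally["missing"] += 1
        else:
            tally[type(row[column]).__name__] += 1
    return tally


def load_rows(path):
    """Load the sample file; assumes a top-level JSON array of objects."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


# Illustrative rows only, not taken from the dataset.
demo = [{"packagings": [{"material": "example"}]}, {"packagings": []}, {}]
print(value_types(demo, "packagings"))
```

On the real file this would be `value_types(load_rows(path), "packagings")`, with `path` set to the source path in the Overview; a result dominated by `list` or `dict` would support the nested-structure reading above.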

purchase_places_tags unknown other

Profiling skipped this column, so type, uniqueness, and value distribution are unknown. The only confirmed facts are 50 rows present with a null rate of 0.0 and a name suggesting a tags-style field for purchase locations. No further signal is available to characterise content or cardinality.

Treatment: Re-run profiling with parsing enabled to inspect tag values before deciding on use.

anthropic:claude-opus-4-7 · confidence low
Out[418]:

saturn.columns["purchase_places_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique
alert: skipped (no profiler for kind=unknown)

additives_original_tags unknown other

The column 'additives_original_tags' was skipped by the profiler, so no statistics, uniqueness count, or value samples are available beyond a row count of 50 and a null rate of 0.0. Based solely on the name, it likely holds lists of food-additive tag identifiers (e.g., E-numbers) in their original locale, but this cannot be verified from the evidence.

Treatment: Re-run the profiler with list/tag parsing enabled, then explode tags before encoding.

anthropic:claude-opus-4-7 · confidence low
Out[420]:

saturn.columns["additives_original_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique
alert: skipped (no profiler for kind=unknown)

image_front_url categorical identifier

Per-row URLs pointing to Open Food Facts product front images (images.openfoodfacts.org), with locale suffixes such as front_fr and front_en in the path. All 50 values are unique (entropy_ratio 1.0, top_rate 0.02) and there are no nulls, so this is effectively a 1:1 asset link rather than a feature.

Treatment: Treat as an asset URL: drop from modelling, or fetch images out-of-band for vision pipelines.

anthropic:claude-opus-4-7 · confidence high
Out[422]:

saturn.columns["image_front_url"].stats

stat value
n 50
nulls 0 (0.0%)
unique 50
top_value https://images.openfoodfacts.org/images/products/611/124/210/0992/front_fr.172.400.jpg
top_rate 0.02
cardinality 50
entropy 5.644
entropy_ratio 1
alert: long_tail (50 singleton categories)
Fig 105.
Top values for image_front_url.
Top values for image_front_url (20 unique shown, of 50 total).
value count share
https://images.openfoodfacts.org/images/products/611/124/210/0992/front_fr.172.400.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/762/221/044/9283/front_en.605.400.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/304/692/002/9759/front_en.492.400.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/611/103/100/5064/front_fr.56.400.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/317/568/001/1480/front_en.221.400.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/000/002/099/5553/front_en.314.400.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/326/884/000/1008/front_fr.422.400.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/336/260/001/1044/front_fr.50.400.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/842/519/771/2024/front_en.60.400.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/762/221/057/8464/front_en.29.400.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/611/125/934/3108/front_fr.25.400.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/336/260/001/1228/front_fr.38.400.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/800/050/031/0427/front_fr.488.400.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/730/040/048/1595/front_fr.242.400.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/304/692/002/2651/front_en.159.400.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/506/004/264/1000/front_en.179.400.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/762/221/058/4724/front_en.95.400.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/304/692/002/2606/front_en.102.400.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/322/982/010/0234/front_fr.246.400.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/000/002/002/2464/front_en.301.400.jpg 1 2.0%
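If these URLs are kept for an out-of-band vision pipeline, the locale suffix and a product identifier can be pulled from the path. The sketch assumes every URL follows the pattern visible in the table (`/products/<digit groups>/front_<locale>.<n>.<n>.jpg`) and that concatenating the digit groups yields the product barcode; both are assumptions drawn from these values, not a documented Open Food Facts contract. `parse_front_url` is a hypothetical helper.

```python
import re

FRONT_URL = re.compile(
    r"/products/(?P<path>[\d/]+)/front_(?P<locale>[a-z]+)\.\d+\.\d+\.jpg$"
)


def parse_front_url(url):
    """Return (barcode, locale) for a front-image URL, or None if it does not match."""
    match = FRONT_URL.search(url)
    if match is None:
        return None
    barcode = match.group("path").replace("/", "")
    return barcode, match.group("locale")


url = "https://images.openfoodfacts.org/images/products/611/124/210/0992/front_fr.172.400.jpg"
print(parse_front_url(url))
```

Returning None for non-matching URLs makes it easy to count how many values deviate from the assumed pattern before trusting the extracted fields.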

data_quality_bugs_tags unknown other

This column was skipped by the profiler, so its kind is unknown and no descriptive statistics are available beyond a row count of 50 and a null rate of 0. The name suggests it holds tags related to data-quality bugs, likely a list or delimited string, but that structure is not confirmed by evidence. Without uniqueness, value, or length signals, no distributional claims can be made.

Treatment: Re-run the profiler with parsing enabled (e.g., explode tags) before deciding how to use this column.

anthropic:claude-opus-4-7 · confidence low
Out[425]:

saturn.columns["data_quality_bugs_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique
alert: skipped (no profiler for kind=unknown)

origin_fi categorical metadata

This is most likely the Finnish-language origin field (the `_fi` suffix follows the per-language naming of the other localized columns, such as `origin_ja`). It is essentially empty: 90% of the 50 rows are null, and all 5 non-null entries are the empty string, giving a single unique value and zero entropy. There is no usable signal here.

Treatment: Drop; the column is 90% null and the remaining values are blank.

anthropic:claude-opus-4-7 · confidence high
Out[427]:

saturn.columns["origin_fi"].stats

stat value
n 50
nulls 45 (90.0%)
unique 1
top_value ""
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: null_rate (90.0% null)
alert: imbalance (top value is 100.0% of rows)
Fig 106.
Top values for origin_fi.
Top values for origin_fi (1 unique shown, of 1 total).
value count share
"" 5 10.0%

images unknown other

The column 'images' was skipped by the profiler, so its kind is unknown and no descriptive statistics were computed. Only the row count (50) and a null rate of 0.0 are available; uniqueness, type, and value distribution are all missing. The name suggests binary or path-like image payloads, which would explain why the dissector bypassed it.

Treatment: Inspect raw values manually to confirm format, then route to an image-processing pipeline rather than tabular modelling.

anthropic:claude-opus-4-7 · confidence low
Out[430]:

saturn.columns["images"].stats

stat value
n 50
nulls 0 (0.0%)
unique
alert: skipped (no profiler for kind=unknown)

ingredients_analysis unknown other

The column 'ingredients_analysis' was skipped by the profiler, so no type, uniqueness, or distribution statistics are available. The only confirmed signals are 50 rows present and a 0.0 null rate. Without further inspection, its content and structure remain unknown.

Treatment: Inspect raw values manually to determine type before any downstream use.

anthropic:claude-opus-4-7 · confidence low
Out[432]:

saturn.columns["ingredients_analysis"].stats

stat value
n 50
nulls 0 (0.0%)
unique
alert: skipped (no profiler for kind=unknown)

ingredients_text_with_allergens_pl categorical free_text

Polish-language ingredient text with embedded allergen HTML markup, almost entirely absent from this sample. 92% of the 50 rows are null, and the 4 non-null rows hold only 3 distinct values: two product descriptions (one row each) and the empty string (2 rows, top_rate 0.5 among non-nulls).

Treatment: Drop for modelling given 92% nulls; if retained, strip HTML allergen tags and treat as free text.

anthropic:claude-opus-4-7 · confidence high
Out[434]:

saturn.columns["ingredients_text_with_allergens_pl"].stats

stat value
n 50
nulls 46 (92.0%)
unique 3
top_value ""
top_rate 0.5
cardinality 3
entropy 1.5
entropy_ratio 0.9464
alert: long_tail (2 singleton categories)
alert: null_rate (92.0% null)
Fig 107.
Top values for ingredients_text_with_allergens_pl.
Top values for ingredients_text_with_allergens_pl (3 unique shown, of 3 total).
value count share
"" 2 4.0%
Miazga kakaowa, cukier, tłuszcz kakaowy, kakao w proszku o obniżonej zawartości tłuszczu, emulgator: lecytyny (soja); naturalny aromat waniliowy. Czekolada gorzka: masa kakaowa minimum 74 %. Może zawierać orzeszki ziemne, orzechy, mleko i gluten (pszenica, żyt jęczmień, owies, pszenica orkisz i pszenica khorosan). 1 2.0%
Miazga kakaowa, cukier, tłuszcz kakaowy, wanilia. 1 2.0%

product_name_de categorical free_text

German-language product names, almost certainly the localized display label for food and confectionery items (chocolate, biscuits, Nutella). 60% of rows are null and the top non-null value is the empty string (5 rows), so only 15 rows carry an actual name, and all 15 names are distinct. The entropy ratio of 0.935 reflects this near-uniform spread. Not every entry is German: 'Lightly Sea Salted' is English and 'Noir intense 74%cacao' is French.

Treatment: Treat as free text: normalize empty strings to null, then tokenize/embed if used as a feature.

anthropic:claude-opus-4-7 · confidence high
Out[437]:

saturn.columns["product_name_de"].stats

stat value
n 50
nulls 30 (60.0%)
unique 16
top_value ""
top_rate 0.25
cardinality 16
entropy 3.741
entropy_ratio 0.9354
alert: long_tail (15 singleton categories)
alert: null_rate (60.0% null)
Fig 108.
Top values for product_name_de.
Top values for product_name_de (16 unique shown, of 16 total).
value count share
"" 5 10.0%
Edelbitterschokolade Mild 90% 1 2.0%
Edelbitter mild 85% 1 2.0%
Knusprige Kekse mit einem cremigen Herz aus Nutella® 1 2.0%
Lightly Sea Salted 1 2.0%
85% kraftvoller schwarzer Kakao 1 2.0%
Noir intense 74%cacao 1 2.0%
Tuc Original 1 2.0%
Schokolade Ecuador Edelbitter 70% Cacao 1 2.0%
Nutella 1 2.0%
Bitter Extra Kraftig 1 2.0%
Schokolade (Alpenmilch Schokolade) 1 2.0%
Granatapfel Sauerkirsche Fruchtgummi 1 2.0%
Bio-Bitterschokolade 70% 1 2.0%
Nuss-Frucht-Mix 1 2.0%
Dark Milde Edelbitter Scholade 70% 1 2.0%

ingredients_text_with_allergens_nb categorical free_text

This appears to be a Norwegian-language ingredients text field with allergen annotations, likely a localized variant of a product description column. It is effectively empty: 96% of rows are null and the only non-null value across the remaining 2 records is an empty string, giving a single unique value and zero entropy.

Treatment: Drop; the column carries no information at this sample size.

anthropic:claude-opus-4-7 · confidence high
Out[440]:

saturn.columns["ingredients_text_with_allergens_nb"].stats

stat value
n 50
nulls 48 (96.0%)
unique 1
top_value ""
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: null_rate (96.0% null)
alert: imbalance (top value is 100.0% of rows)
Fig 109.
Top values for ingredients_text_with_allergens_nb.
Top values for ingredients_text_with_allergens_nb (1 unique shown, of 1 total).
value count share
"" 2 4.0%

packaging_text_it categorical free_text

Italian-language packaging description text, almost entirely absent from this sample. 68% of rows are null and of the 16 non-null entries, 14 are empty strings, leaving only 2 substantive Italian descriptions of recycling instructions. With cardinality of 3 and a top_rate of 0.875 on the empty string, this column carries virtually no usable signal here.

Treatment: Drop unless joined with a much larger Italian-locale slice; too sparse to model.

anthropic:claude-opus-4-7 · confidence high
Out[443]:

saturn.columns["packaging_text_it"].stats

stat value
n 50
nulls 34 (68.0%)
unique 3
top_value ""
top_rate 0.875
cardinality 3
entropy 0.6686
entropy_ratio 0.4218
alert: long_tail (2 singleton categories)
alert: null_rate (68.0% null)
Fig 110.
Top values for packaging_text_it.
Top values for packaging_text_it (3 unique shown, of 3 total).
value count share
"" 14 28.0%
Incarto esterno in carta da riciclare, Incarto interno in alluminio da riciclare. 1 2.0%
1 tubo C/PAP 85 da indifferenziata, 1 sigillo C/PAP 84 da indifferenziata, 1 tappo di plastica PP5 da riciclare. 1 2.0%

product_name_it categorical free_text

Italian-language product name field, mostly empty: 68% of the 50 rows are null and the modal non-null value is the empty string "" (5 occurrences, top_rate 0.3125). Among the 12 distinct values the names are heterogeneous chocolate and snack labels (e.g. "Fondente Prodigioso 90% Cacao", "Pringles classiche 175 gr", "Milka"), with case-variant duplicates like "cioccolato fondente" vs "Cioccolato fondente" inflating cardinality. Entropy ratio 0.913 confirms the non-null tail is essentially flat, each name appearing once.

Treatment: Normalise case/whitespace, treat empty strings as null, then tokenize and embed; not usable as a categorical feature given 68% nulls.

anthropic:claude-opus-4-7 · confidence high
Out[446]:

saturn.columns["product_name_it"].stats

stat value
n 50
nulls 34 (68.0%)
unique 12
top_value ""
top_rate 0.3125
cardinality 12
entropy 3.274
entropy_ratio 0.9134
alert: long_tail (11 singleton categories)
alert: null_rate (68.0% null)
Fig 111.
Top values for product_name_it.
Top values for product_name_it (12 unique shown, of 12 total).
value count share
"" 5 10.0%
Fondente Prodigioso 90% Cacao 1 2.0%
Croccanti biscotti con cuore cremoso di Nutella 1 2.0%
Excellence 85% Cacao Chocolat Noir Puissant Lindt % Lindt 1 2.0%
cioccolato fondente 1 2.0%
Original 1 2.0%
Excellence 70% Cocoa Fondente Intenso 1 2.0%
Cioccolato fondente 1 2.0%
Pringles classiche 175 gr 1 2.0%
Milka 1 2.0%
Mix di frutta secca 1 2.0%
Granola 1 2.0%
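The treatment for `product_name_it` (and the empty-string handling suggested for `product_name_de`) can be sketched with the standard library: Unicode-normalise, collapse whitespace, case-fold for comparison, and map blank strings to null. Case-folding is one choice of normalisation and discards the original casing, so a real pipeline might keep the raw value alongside the key. `normalise_name` is a hypothetical helper.

```python
import unicodedata


def normalise_name(raw):
    """Return a comparison key for a product name, or None for null/blank values."""
    if raw is None:
        return None
    text = unicodedata.normalize("NFKC", raw)
    text = " ".join(text.split())  # collapse runs of whitespace, trim the ends
    return text.casefold() or None


names_it = [
    "Fondente Prodigioso 90% Cacao",
    "cioccolato fondente",
    "Cioccolato fondente",
    "",
]
print([normalise_name(n) for n in names_it])
```

On the 11 non-empty Italian names in the table, this collapses the case-variant pair "cioccolato fondente" / "Cioccolato fondente" and leaves 10 distinct keys.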

serving_quantity categorical feature

Numeric serving sizes stored as strings, with 27 distinct values across 50 rows and a 12% null rate. The distribution is long-tailed: top values "100" and "10" each cover only 7 records (top_rate 0.159), entropy_ratio is 0.909 indicating values are spread almost uniformly, and outliers like "1000" and decimals like "11.5" sit alongside round numbers.

Treatment: Cast to numeric, impute the 12% nulls, and consider log-transforming before modelling.

anthropic:claude-opus-4-7 · confidence high
Out[449]:

saturn.columns["serving_quantity"].stats

stat value
n 50
nulls 6 (12.0%)
unique 27
top_value 100
top_rate 0.1591
cardinality 27
entropy 4.322
entropy_ratio 0.9089
alert: long_tail (21 singleton categories)
Fig 112.
Top values for serving_quantity.
Top values for serving_quantity (20 unique shown, of 27 total).
value count share
100 7 14.0%
10 7 14.0%
20 3 6.0%
25 2 4.0%
42 2 4.0%
30 2 4.0%
23 1 2.0%
11.5 1 2.0%
1000 1 2.0%
13.8 1 2.0%
11.4 1 2.0%
18 1 2.0%
50 1 2.0%
85 1 2.0%
36 1 2.0%
40 1 2.0%
45 1 2.0%
8.4 1 2.0%
7.143 1 2.0%
58 1 2.0%
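A minimal sketch of the `serving_quantity` treatment: parse the strings as floats, fill nulls with the median of the parsed values, then apply log1p to compress the long tail (for example the single "1000"). Median imputation and log1p are illustrative choices, not something saturn prescribes, and `prepare_serving_quantity` is a hypothetical helper.

```python
import math
import statistics


def prepare_serving_quantity(raw_values):
    """Cast string serving sizes to float, median-impute missing values, then log1p."""
    parsed = []
    for raw in raw_values:
        if raw is None or str(raw).strip() == "":
            parsed.append(None)
            continue
        try:
            parsed.append(float(raw))
        except ValueError:
            parsed.append(None)  # unparseable text is treated as missing
    observed = [v for v in parsed if v is not None]
    if not observed:
        raise ValueError("no parseable serving sizes")
    fill = statistics.median(observed)
    # log1p keeps zero finite and compresses the right tail.
    return [math.log1p(v if v is not None else fill) for v in parsed]


features = prepare_serving_quantity(["100", "10", None, "11.5", "1000"])
print(features)
```

The median is computed after parsing, so unparseable strings do not distort the fill value; a missing-indicator column could be kept alongside if the 12% nulls turn out to be informative.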

product_name_ja categorical metadata

This appears to be a Japanese product name field that is effectively empty in this sample: 98% of the 50 rows are null and the single non-null value is itself the empty string, leaving cardinality at 1 and entropy at 0. There is no usable signal here whatsoever.

Treatment: Drop the column; it is 98% null with a single empty-string value.

anthropic:claude-opus-4-7 · confidence high
Out[452]:

saturn.columns["product_name_ja"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value ""
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail (1 singleton category)
alert: null_rate (98.0% null)
alert: imbalance (top value is 100.0% of rows)
Fig 113.
Top values for product_name_ja.
Top values for product_name_ja (1 unique shown, of 1 total).
value count share
"" 1 2.0%

ingredients_text_with_allergens_sv categorical free_text

Swedish-language ingredient lists with embedded HTML allergen markup, likely the Swedish localisation of a product ingredients field. Coverage is extremely poor: 92% of rows are null, and the 4 non-null rows hold 4 distinct values, each appearing once (top_rate 0.25 over the non-null subset). One value is the empty string, and another mixes Swedish with Danish/Norwegian terms (HVEDEMEL, BYG, EGG), indicating inconsistent locale handling.

Treatment: Strip HTML tags and parse allergen spans separately; given 92% nulls, do not use as a primary feature.

anthropic:claude-opus-4-7 · confidence high
Out[455]:

saturn.columns["ingredients_text_with_allergens_sv"].stats

stat value
n 50
nulls 46 (92.0%)
unique 4
top_value kakaomassa, kakaosmör, fettreducerat kakaopulver, socker, vanilj.
top_rate 0.25
cardinality 4
entropy 2
entropy_ratio 1
alert: long_tail (4 singleton categories)
alert: null_rate (92.0% null)
Fig 114.
Top values for ingredients_text_with_allergens_sv.
Top values for ingredients_text_with_allergens_sv (4 unique shown, of 4 total).
value count share
kakaomassa, kakaosmör, fettreducerat kakaopulver, socker, vanilj. 1 2.0%
kakaomassa, fettreducerat kakaopulver, kakaosmör, socker, emulgeringsmedel (sojalecitin), vaniljextrakt. Minst 85 % kakao i chokladen. Kan innehålla spår av nötter och mjölk. 1 2.0%
"" 1 2.0%
VETEMJÖL/HVEDEMEL, palmolja/-olie, glukossirap, maltextrakt från KORN/BYG, bakpulver/hævemidler (ammoniumkarbonater, natriumkarbonater), salt, ÄGG/ÆG/EGG, arom, mjölbehandlingsmedel/melbehandlingsmiddel (NATRIUMDISULFIT). 1 2.0%
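The treatment for this column is to strip the HTML and parse allergen spans separately. The markup itself is not visible in this export (the values shown are plain text), so this sketch assumes allergens are wrapped as `<span class="allergen">...</span>`; check the raw values before relying on that. It uses only the standard-library `html.parser.HTMLParser`. The class and function names are hypothetical, and the example string is a hand-marked copy of one Swedish row.

```python
from html.parser import HTMLParser


class AllergenTextParser(HTMLParser):
    """Collect plain text plus the text found inside allergen spans.

    Assumption: allergens are marked as <span class="allergen">...</span>.
    """

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.allergens = []
        self._span_stack = []  # one bool per open <span>: is it an allergen span?

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            is_allergen = ("class", "allergen") in attrs
            self._span_stack.append(is_allergen)
            if is_allergen:
                self.allergens.append("")

    def handle_endtag(self, tag):
        if tag == "span" and self._span_stack:
            self._span_stack.pop()

    def handle_data(self, data):
        self.text_parts.append(data)
        if any(self._span_stack):
            self.allergens[-1] += data


def split_allergen_text(html_text):
    """Return (plain_text, allergen_terms) for one ingredients string."""
    parser = AllergenTextParser()
    parser.feed(html_text)
    parser.close()  # flush any trailing text buffered by the parser
    plain = "".join(parser.text_parts)
    return plain, [term.strip() for term in parser.allergens]


plain, allergens = split_allergen_text(
    "kakaomassa, fettreducerat kakaopulver, kakaosmör, socker, emulgeringsmedel "
    '(<span class="allergen">sojalecitin</span>), vaniljextrakt.'
)
print(plain)
print(allergens)
```

The exact-match test on the class attribute is deliberately strict; spans carrying several classes would need `attrs` to be split on whitespace first.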

allergens_tags unknown feature

Column `allergens_tags` was skipped by the profiler, so no type, uniqueness, or value statistics are available beyond a row count of 50 and a null rate of 0.0. The name suggests a multi-valued tag field listing allergens (e.g., milk, nuts), but this cannot be verified from the evidence. Re-profile with list/string handling enabled to learn cardinality and tag distribution.

Treatment: Re-profile with tag-aware parsing, then one-hot or multi-hot encode the individual allergen tokens.

anthropic:claude-opus-4-7 · confidence low
Out[458]:

saturn.columns["allergens_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique
alert: skipped (no profiler for kind=unknown)
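If re-profiling confirms that `allergens_tags` holds a list of tag strings per row (an assumption, since the profiler skipped it), the multi-hot encoding named in the treatment can be sketched as below. The `en:`-prefixed tags in the example are hypothetical placeholders, chosen only because other columns in this sample use `en:`/`fr:` prefixes; `multi_hot` is a hypothetical helper.

```python
def multi_hot(rows):
    """Encode a list of tag lists as (vocabulary, 0/1 matrix).

    Each row becomes one vector with a 1 for every tag it contains;
    None or empty rows become all-zero vectors.
    """
    vocabulary = sorted({tag for tags in rows if tags for tag in tags})
    index = {tag: i for i, tag in enumerate(vocabulary)}
    matrix = []
    for tags in rows:
        vector = [0] * len(vocabulary)
        for tag in tags or []:
            vector[index[tag]] = 1
        matrix.append(vector)
    return vocabulary, matrix


vocab, matrix = multi_hot([["en:milk", "en:soybeans"], [], None, ["en:milk"]])
print(vocab)
print(matrix)
```

A sorted vocabulary keeps column order stable between runs, which matters if the encoded matrix is reused across samples.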

ingredients_text_fr categorical free_text

This is the French-language ingredients list, stored as free-form text. Only 2 rows (4%) are null, and the remaining 48 hold 47 distinct values (entropy ratio 0.998), so values are essentially all distinct long strings; the only repeat is the empty string (2 rows). Contents range from the three-word 'Eau de source' to multi-sentence ingredient declarations with percentages, allergens and additive codes, and one entry is written in Arabic rather than French.

Treatment: Tokenize and embed (or extract structured ingredient/allergen features) rather than treating as a category.

anthropic:claude-opus-4-7 · confidence high
Out[460]:

saturn.columns["ingredients_text_fr"].stats

stat value
n 50
nulls 2 (4.0%)
unique 47
top_value ""
top_rate 0.04167
cardinality 47
entropy 5.543
entropy_ratio 0.998
alert: long_tail (46 singleton categories)
Fig 115.
Top values for ingredients_text_fr.
Top values for ingredients_text_fr (20 unique shown, of 47 total).
value count share
"" 2 4.0%
Lait écrémé, crème, SUcre, ferments laciques 1 2.0%
Céréale 50 % (Farine de blé 34,8 %, farine de blé complet 15,2 %), sucre, huiles végétales (palme, colza), cacao maigre en poudre 4,5 %, sirop de glucose, amidon de blé, poudres à lever (carbonates d'ammonium, carbonates de sodium), émulsifiant (lécithines de soja), sel, lait écrémé en poudre, perméat de lactosérum (de lait), arômes. Peut contenir œuf. 1 2.0%
Pâte de cacao, beurre de cacao, cacao maigre, sucre, vanille. 1 2.0%
Coffret fourré au cacao (41,6%) et à la vanille (208) - Ingrédients Farine de blé, sucre, huile végétale non hydrogénée (huile de palme), filtrat de lait, poudre de cacao Émulsifiant à faible teneur en cacao (322) Lécithine de soja) Agent levant (5000) Sucre artificiel (vanilline) Sel Contient du lait, du blé (gluten) du soja 1 2.0%
Farine de blé 57%, sucre de canne roux, huile de colza, sésame toasté 10,6%, germe de blé 5,4%, farine complète de blé 5,4%, arôme naturel, magnésium, émulsifiant : lécithines, poudres à lever (tartrates de potassium, carbonates de sodium, carbonates d'ammonium), sel de mer, amidon de blé, vitamines (E, PP, B6, B1, B9). 1 2.0%
Pâte de cacao, cacao maigre en poudre, beurre de cacao, sucre, émulsifiant : lécithines (soja) ; extrait de vanille. Traces éventuelles de fruits à coque et de lait. 1 2.0%
Eau de source 1 2.0%
Farine de froment, sucre, graisse végétale, sucre inverti, agents levants ( bicarbonate d'ammonium - bicarbonate de sodium), sel, arome. 1 2.0%
Sucre, graisse vegetale de palmiste hidrogenée, Lait Enteir en poudre, Amandes, Cacao Dégraissé en poudre, lactoserum en poudre, Emulsifiant Lécithine de soja, Arômes (Vainilline). 1 2.0%
دقيقالقمح،رقائق الشوكولاته20%[عجينة زيت النخلة.الكاكاو،سكر،دكستروز و مستحلب 1 2.0%
Farine de _froment_, sucre, graisse végétale, noix de coco râpée, poudre de _lait_, poudre de _lactosérum_, sucre inverti, agents levants (bicarbonate d'ammonium - bicarbonate de Sodium), sel, arômes. 1 2.0%
Pâte à tartiner aux NOISETTES et au cacao 40% (sucre, huile de palme, NOISETTES 13%**, LAIT écrémé en poudre 8,7%**, cacao maigre 7,4%**, émulsifiants : lécithines [SOJA]; vanilline), farine de FROMENT 32,5%, graisses végétales (palme, palmiste), sucre de canne (contient BLE) 8,5%, LACTOSE, son de BLE, LAIT en poudre, miel, poudres à lever (diphosphate disodique, carbonate acide de sodium, carbonate acide d'ammonium), farine d'ORGE malté, cacao maigre en poudre, sel, extrait en poudre de malt d'ORGE et de maïs, amidon de FROMENT, émulsifiants: lécithines [SOJA]; vanilline. 1 2.0%
Farine complète de SEIGLE (77 g*), farine de SEIGLE (28 g*), levure, sel. Peut contenir des traces de LUPIN, LAIT, MOUTARDE, GRAINES DE SÉSAME et SOJA. *en g pour 100 g de produit. 1 2.0%
Pâte de cacao, sucre, beurre de cacao, vanille. Peut contenir des fruits à coque, du lait, du soja et des graines de sésame. 1 2.0%
pâte de cacao*, beurre de cacao*, cacao maigre en poudre*, sucre de canne*, extrait de vanille*, * ingrédients issus de l'agriculture biologique 1 2.0%
Pâte de cacao, cacao maigre, beurre de cacao, cassonade, vanille 1 2.0%
Farine de blé* 41%, Chocolat noir* 22% (pâte de cacao*, sucre de canne", beurre de cacao"), Sucre de canne* roux non raffiné, Farine complète de blé* 16%, Huile de tournesol oléique*, Arôme naturel de vanille, Lait écrémé en poudre, Sel de mer, carbonates d'ammonium, carbonates de sodium, gomme d'acacia*, extraits de romarin* Peut contenir du soja, des œufs, des fruits à coque, des graines de sésame et de la moutarde. *Ingrédients biologiques. 1 2.0%
Pâte de cacao, sucre, beurre de cacao, cacao maigre en poudre, émulsifiant : lécithines (_soja_), arôme naturel de vanille. 1 2.0%
Farine complète de SEIGLE 59 g*, son de BLÉ 27 g*, flocons d'AVOINE 12 g*, GRAINES DE SÉSAME 7,0 g*, germe de BLÉ, sel. *en g pour 100 g de produit fini. Peut contenir des traces de LUPIN, LAIT, MOUTARDE et SOJA. 1 2.0%
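Extracting structured features from `ingredients_text_fr`, as the treatment suggests, usually starts by splitting the list into top-level ingredients and reading any declared percentage. The sketch below splits on commas outside brackets while keeping French decimal commas such as "4,5 %" intact. It is an illustrative heuristic, not Open Food Facts' own ingredient parser; the function names are hypothetical, and the example string is a shortened excerpt of one row from the table.

```python
import re

# A percentage such as "50 %", "4,5 %" or "10.6%" (decimal comma or point).
PERCENT = re.compile(r"(\d+(?:[.,]\d+)?)\s*%")


def split_top_level(text):
    """Split an ingredient list on commas outside (), [] and {}.

    Commas between two digits are kept, so decimals like "4,5 %" survive.
    """
    items, current, depth = [], [], 0
    for i, ch in enumerate(text):
        if ch in "([{":
            depth += 1
        elif ch in ")]}":
            depth = max(0, depth - 1)
        decimal_comma = (
            ch == ","
            and 0 < i < len(text) - 1
            and text[i - 1].isdigit()
            and text[i + 1].isdigit()
        )
        if ch == "," and depth == 0 and not decimal_comma:
            items.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    items.append("".join(current).strip().rstrip("."))
    return [item for item in items if item]


def head_percent(item):
    """Percentage stated for the ingredient itself, ignoring its sub-ingredients."""
    head = re.split(r"[(\[{]", item, maxsplit=1)[0]
    match = PERCENT.search(head)
    return float(match.group(1).replace(",", ".")) if match else None


text = ("Céréale 50 % (Farine de blé 34,8 %, farine de blé complet 15,2 %), "
        "sucre, huiles végétales (palme, colza), cacao maigre en poudre 4,5 %, sel.")
for item in split_top_level(text):
    print(item, "->", head_percent(item))
```

Trailing "Peut contenir" (may contain) sentences stay attached to the last item, and nested sub-ingredients remain inside their parent; both would need extra handling on the real column.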

nutrition_score_beverage numeric feature

Despite its name, this column behaves as a binary indicator rather than a continuous score: only 2 unique values across 50 rows, with min 0 and max 1. The name suggests it marks products scored as beverages, but that reading rests on the name alone. The distribution is overwhelmingly zero (zero_rate 0.98); the single row at 1 drives the extreme skew (6.86) and kurtosis (45.02). Effectively a near-constant indicator column.

Treatment: Treat as a binary flag, or drop as near-constant since 98% of rows share one value.

anthropic:claude-opus-4-7 · confidence high
Out[463]:

saturn.columns["nutrition_score_beverage"].stats

stat value
n 50
nulls 0 (0.0%)
unique 2
min 0
max 1
mean 0.02
median 0
std 0.1414
q1 0
q3 0
iqr 0
skew 6.857
kurtosis 45.02
n_outliers 1
outlier_rate 0.02
zero_rate 0.98
alert: high_skew (skew=+6.86)
Fig 116.
Distribution of nutrition_score_beverage. Vertical dash marks the median.
Histogram bins for nutrition_score_beverage (median: 0.0).
bin count
0 – 0.1429 49
0.1429 – 0.2857 0
0.2857 – 0.4286 0
0.4286 – 0.5714 0
0.5714 – 0.7143 0
0.7143 – 0.8571 0
0.8571 – 1 1

ingredients_ids_debug unknown other

This column was skipped by the profiler, so no statistics beyond row count (50) and null rate (0.0) are available. The name suggests it holds debug-only ingredient identifiers, likely a complex or nested structure that the dissector could not categorize. Without unique counts or value samples, its content and utility cannot be assessed here.

Treatment: Drop unless a downstream consumer specifically needs the raw debug payload.

anthropic:claude-opus-4-7 · confidence low
Out[466]:

saturn.columns["ingredients_ids_debug"].stats

stat value
n 50
nulls 0 (0.0%)
unique
alert: skipped (no profiler for kind=unknown)

nutrition_data categorical metadata

This appears to be a flag indicating whether nutrition data is present, with the only observed value being "on" across all 49 non-null rows. Cardinality is 1 and entropy is 0, so the column carries no discriminative information; one row (2%) is null.

Treatment: Drop; constant column with no variance.

anthropic:claude-opus-4-7 · confidence high
Out[468]:

saturn.columns["nutrition_data"].stats

stat value
n 50
nulls 1 (2.0%)
unique 1
top_value on
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: imbalance (top value is 100.0% of rows)
Fig 117.
Top values for nutrition_data.
Top values for nutrition_data (1 unique shown, of 1 total).
value count share
on 49 98.0%
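Several columns in this section get a "Drop" treatment for one of two reasons: no usable values once nulls and empty strings are removed (`origin_fi`, `origin_ja`, `packaging_text_nb`), or a single constant value (`nutrition_data`). A sketch of a rule that flags both cases, treating empty or whitespace-only strings as missing as the prose above does; the 0.9 threshold and the `should_drop` name are illustrative assumptions, not saturn settings.

```python
def should_drop(values, max_missing_rate=0.9):
    """Flag a column as droppable when it is (almost) empty or constant.

    None and blank/whitespace-only strings both count as missing.
    Values must be hashable (list-valued tag columns would need converting).
    """
    if not values:
        return True
    present = [
        v for v in values
        if v is not None and not (isinstance(v, str) and v.strip() == "")
    ]
    missing_rate = 1 - len(present) / len(values)
    return len(set(present)) <= 1 or missing_rate >= max_missing_rate


print(should_drop([None] * 45 + [""] * 5))  # origin_fi shape
print(should_drop([None] + ["on"] * 49))    # nutrition_data shape
```

With this rule, the shape of `product_name_de` (30 nulls, 5 empty strings, 15 distinct names) is kept, while `packaging_text_it` (48 of 50 rows null or empty) is flagged, in line with the treatments written for those columns.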

origin_ja categorical metadata

This appears to be a Japanese-language origin field, likely a localized counterpart to a primary origin column. It is effectively empty: 98% of the 50 rows are null and the only non-null value is itself the empty string, yielding a single unique value and zero entropy.

Treatment: Drop; the column carries no information.

anthropic:claude-opus-4-7 · confidence high
Out[471]:

saturn.columns["origin_ja"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value ""
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail (1 singleton category)
alert: null_rate (98.0% null)
alert: imbalance (top value is 100.0% of rows)
Fig 118.
Top values for origin_ja.
Top values for origin_ja (1 unique shown, of 1 total).
value count share
"" 1 2.0%

packaging_text_en categorical free_text

English-language packaging descriptions, likely free-text recycling instructions scraped from product labels. Of the 50 rows, 7 (14%) are null and another 39 (top_rate 0.91 among non-null rows) are empty strings, leaving only 4 rows with actual content; those 4 texts plus the empty string make up the 5 unique values. An entropy ratio of 0.27 confirms the column is almost entirely uninformative as-is.

Treatment: Drop or defer; coverage is too sparse to model, but if retained treat empty strings as nulls and tokenize the remainder.

anthropic:claude-opus-4-7 · confidence high
Out[474]:

saturn.columns["packaging_text_en"].stats

stat value
n 50
nulls 7 (14.0%)
unique 5
top_value ""
top_rate 0.907
cardinality 5
entropy 0.6325
entropy_ratio 0.2724
alert: long_tail (4 singleton categories)
Fig 119.
Top values for packaging_text_en.
Top values for packaging_text_en (5 unique shown, of 5 total).
value count share
"" 39 78.0%
1 plastic bottle to recycle 1 plastic cap to recycle 1 2.0%
1 cardboard sleeve recyclable, 1 sheet of aluminium recyclable 1 2.0%
Terracycle. Please dispose of this pack responsibly. Find out more at www.terracycle.co.uk. 1 2.0%
cardboard (to recycle) foil paper (to throw away) 1 2.0%

unknown_ingredients_n numeric feature

This is a count of unrecognised ingredients per row, ranging from 0 to 13 with a mean of 0.66. Zeros dominate (zero_rate 0.84), so the median, q1 and q3 are all 0 and the IQR is 0. Under the 1.5 × IQR rule every non-zero row is therefore an outlier, which is why exactly 8 rows (16%, the non-zero share) are flagged; this long right tail up to 13 produces the extreme skew (4.24) and kurtosis (18.32). Effectively a sparse anomaly indicator rather than a continuous count.

Treatment: Binarise (zero vs non-zero) or cap before modelling; raw values are too skewed for linear models.

anthropic:claude-opus-4-7 · confidence high
Out[477]:

saturn.columns["unknown_ingredients_n"].stats

stat value
n 50
nulls 0 (0.0%)
unique 6
min 0
max 13
mean 0.66
median 0
std 2.255
q1 0
q3 0
iqr 0
skew 4.236
kurtosis 18.32
n_outliers 8
outlier_rate 0.16
zero_rate 0.84
alert: high_skew (skew=+4.24)
alert: outliers (16.0% rows beyond 1.5 IQR)
Fig 120.
Distribution of unknown_ingredients_n. Vertical dash marks the median.
Histogram bins for unknown_ingredients_n (median: 0.0).
bin count
0 – 1.857 46
1.857 – 3.714 1
3.714 – 5.571 1
5.571 – 7.429 0
7.429 – 9.286 1
9.286 – 11.14 0
11.14 – 13 1
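Assuming the column holds integer counts, the reported stats pin the values down exactly: zero_rate 0.84 gives 42 zeros, the other 4 values in the first bin (0 to 1.857) must be 1, and each remaining non-empty bin holds one value; with unique = 6, a sum of 33 (mean 0.66) and std 2.255, the only fit is one each of 3, 5, 8 and 13. The sketch applies 1.5 × IQR (Tukey) fences to that reconstruction to show why all 8 non-zero rows are flagged, then binarises. Using `statistics.quantiles` with `method="inclusive"` is an assumption about saturn's quartiles, but with 42 of 50 values at zero any common method gives q1 = q3 = 0; `tukey_outliers` is a hypothetical helper.

```python
import statistics


def tukey_outliers(values, k=1.5):
    """Return the values outside [q1 - k*IQR, q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Reconstruction of unknown_ingredients_n consistent with the reported stats.
unknown_n = [0] * 42 + [1, 1, 1, 1, 3, 5, 8, 13]
outliers = tukey_outliers(unknown_n)
print(len(outliers), len(outliers) / len(unknown_n))

# Binarised feature (zero vs non-zero), as the treatment suggests.
has_unknown = [int(v > 0) for v in unknown_n]
```

Because the IQR collapses to zero, the fences sit at 0 on both sides and the outlier count is simply the non-zero count, which is why the outlier alert adds no information beyond zero_rate here.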

ingredients_from_palm_oil_tags unknown other

This column, named ingredients_from_palm_oil_tags, was skipped by the profiler so no distribution, uniqueness, or value-level statistics are available. The only confirmed signals are 50 rows and a 0.0 null rate; everything else (kind, n_unique) is missing.

Treatment: Re-profile with type coercion before deciding; likely a list/tag field needing parsing and one-hot or multi-label encoding.

anthropic:claude-opus-4-7 · confidence low
Out[480]:

saturn.columns["ingredients_from_palm_oil_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique
alert: skipped (no profiler for kind=unknown)

labels_tags unknown other

The column `labels_tags` was skipped by the profiler, so its kind, cardinality, and value distribution are all unknown. The only confirmed facts are that it contains 50 rows with no nulls. The name suggests a labels or tags field, likely multi-valued or delimited text, but no evidence confirms that.

Treatment: Re-profile with a parser that handles list/tag-style values before deciding on use.

anthropic:claude-opus-4-7 · confidence low
Out[482]:

saturn.columns["labels_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique
alert: skipped (no profiler for kind=unknown)

packaging_old_before_taxonomization categorical free_text

Pre-taxonomy packaging descriptions captured as free text, mixing languages (French, Spanish, English, German) and multi-value comma-separated lists. With 36 unique values across 38 non-null rows and an entropy ratio of 0.99, the field is almost fully unique: even the top value, 'plastique', covers only 3 rows (7.9% of non-null rows), and 24% of rows are null. Values combine material terms, language prefixes such as 'fr:'/'en:', and counts ('20 biscuits en 4 sachets'), so the column behaves more like free text than a category.

Treatment: Normalise and split on commas, then map tokens to a controlled packaging taxonomy before use.

anthropic:claude-opus-4-7 · confidence high
Out[484]:

saturn.columns["packaging_old_before_taxonomization"].stats

stat value
n 50
nulls 12 (24.0%)
unique 36
top_value plastique
top_rate 0.07895
cardinality 36
entropy 5.123
entropy_ratio 0.9909
alert: long_tail (35 singleton categories)
alert: null_rate (24.0% null)
Fig 121.
Top values for packaging_old_before_taxonomization.
Top values for packaging_old_before_taxonomization (20 unique shown, of 36 total).
value count share
plastique 3 6.0%
fr:Film en plastique,paquet,fr:Etui en carton 1 2.0%
Papel de aluminio,Caja de cartón,Carton,Karton,emballage,box cardboard,Aluminium wrap, en:card-box, en:foil-wrapper 1 2.0%
Carton,Sachets,20 biscuits en 4 sachets,packet,paquetes 1 2.0%
sl:PAP,fr:FSC mixte,Produkt,21 PAP 1 2.0%
Papier,aluminium 1 2.0%
Plastic 1 2.0%
Plastique,en:mixed plastic-packet,Enveloppe 1 2.0%
fr:Papier,Package paper,Paper recycling,papier,Enveloppe 1 2.0%
carton,aluminium,Emballage carton 1 2.0%
Sachet,Sous atmosphère protectrice,en:mixed plastic-packet 1 2.0%
paper, foil 1 2.0%
papier aluminium,emballage carton 1 2.0%
fr:film plastique à jeter,fr:étui carton à recycler, fr:Film en plastique 1 2.0%
papier,Enveloppe 1 2.0%
paper 1 2.0%
Kunststoff 1 2.0%
Papel de aluminio, Caja de cartón, Carton, en:card-carton, en:aluminium-wrapper 1 2.0%
Carton,plastique 1 2.0%
4 sachets plastiques de 4 biscuits, Carton, fr:Film en plastique, fr:Etui en carton 1 2.0%

packaging_text_nb categorical free_text

This appears to be a Norwegian Bokmål packaging text field, but it is effectively empty: 96% of 50 rows are null and the only 2 non-null values are blank strings, yielding cardinality 1 and entropy 0.

Treatment: Drop; the column carries no information.

anthropic:claude-opus-4-7 · confidence high
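Many localized `*_xx` columns in this file look like this one. Flagging them in bulk may therefore be more useful than reviewing each column separately. This minimal sketch assumes the JSON sample loads as a list of per-product dicts, which is an assumption about the file layout. `effectively_empty_columns` is a hypothetical helper, not part of saturn.

```python
def effectively_empty_columns(rows):
    """Return column names whose values are all missing or whitespace-only.

    `rows` is a list of dicts (one per product). A key absent from a row
    counts as missing, as does None.
    """
    columns = {key for row in rows for key in row}
    empty = []
    for col in sorted(columns):
        values = [row.get(col) for row in rows]
        if all(v is None or (isinstance(v, str) and not v.strip()) for v in values):
            empty.append(col)
    return empty
```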
Out[487]:

saturn.columns["packaging_text_nb"].stats

stat value
n 50
nulls 48 (96.0%)
unique 1
top_value (empty string)
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: null_rate (96.0% null)
alert: imbalance (top value is 100.0% of rows)
Fig 122.
Top values for packaging_text_nb.
Top values for packaging_text_nb (1 unique shown, of 1 total).
value count share
(empty string) 2 4.0%

nutrition_grades_tags unknown other

This column was skipped by the profiler, so no statistics beyond a 50-row count and zero null rate are available. The name nutrition_grades_tags suggests categorical tags (likely Nutri-Score letters such as a-e) from an Open Food Facts-style source, but uniqueness, frequencies, and value examples are all missing. Treat any interpretation as provisional until the column is reprofiled.

Treatment: Reprofile with categorical parsing enabled before deciding on encoding.

anthropic:claude-opus-4-7 · confidence low
Out[490]:

saturn.columns["nutrition_grades_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique (not computed)
alert: skipped (no profiler for kind=unknown)

category_properties unknown other

This column was skipped by the profiler, so no type, uniqueness, or distribution stats were computed beyond a 50-row count and a 0% null rate. The name 'category_properties' suggests nested or structured per-category attributes (likely a dict, list, or JSON object). That would explain why saturn classed it as kind unknown rather than as a scalar kind.

Treatment: Inspect raw values and, if structured, flatten or JSON-normalize into separate columns before profiling again.

anthropic:claude-opus-4-7 · confidence low
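If the raw values turn out to be nested dicts, one way to make them profileable is to flatten each record into dotted column names. This is a minimal sketch with a hypothetical `flatten` helper. If pandas is available, `pandas.json_normalize` does the same job for lists of records.

```python
def flatten(record, prefix=""):
    """Flatten nested dicts into dotted keys, e.g. {"a": {"b": 1}} -> {"a.b": 1}.

    Lists and scalars are kept as leaf values.
    """
    flat = {}
    for key, value in record.items():
        name = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, name))  # recurse into the nested dict
        else:
            flat[name] = value
    return flat
```

Two behaviours would need revisiting once the real structure is known. Empty nested dicts disappear entirely, and lists are left as single values.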
Out[492]:

saturn.columns["category_properties"].stats

stat value
n 50
nulls 0 (0.0%)
unique (not computed)
alert: skipped (no profiler for kind=unknown)

nutriscore_score numeric feature

This is the numeric Nutri-Score, which typically runs from -15 (best) to 40 (worst). In this sample it ranges from 0 to 40, with a mean of 17.47 and a median of 19, so no product scores below zero. The distribution is roughly symmetric (skew -0.16, kurtosis -0.53), no outliers are flagged, and 8.2% of non-null values (4 of 49) are exactly zero. Only 2% of rows are null, and there are 28 unique values across 50 rows, so the column is well populated and reasonably varied.

Treatment: Use directly as a numeric feature; impute the 2% nulls with the median.

anthropic:claude-opus-4-7 · confidence high
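The median imputation in the treatment is short. This sketch shows it on plain lists using a hypothetical helper, `impute_median`. With 49 observed values, the fill value would be the reported median, 19.

```python
from statistics import median


def impute_median(values):
    """Replace None with the median of the observed (non-None) values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]
```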
Out[494]:

saturn.columns["nutriscore_score"].stats

stat value
n 50
nulls 1 (2.0%)
unique 28
min 0
max 40
mean 17.47
median 19
std 9.906
q1 10
q3 25
iqr 15
skew -0.1616
kurtosis -0.5337
n_outliers 0
outlier_rate 0
zero_rate 0.08163
Fig 123.
Distribution of nutriscore_score. Vertical dash marks the median.
Histogram bins for nutriscore_score (median: 19.0).
bin count
0 – 5.714 8
5.714 – 11.43 5
11.43 – 17.14 7
17.14 – 22.86 13
22.86 – 28.57 12
28.57 – 34.29 2
34.29 – 40 2

packaging_tags unknown other

The column 'packaging_tags' was skipped by the profiler, so no type, uniqueness, or distributional statistics are available beyond a row count of 50 and a null rate of 0.0. The name suggests it holds packaging-related tags, likely a multi-valued or list-like field that the profiler could not classify. Without parsed values we cannot confirm cardinality, delimiter, or language.

Treatment: Re-profile after parsing the tag list (e.g., split on delimiter) before deciding on encoding.

anthropic:claude-opus-4-7 · confidence low
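Before re-profiling, it can help to see what the raw values actually are. This minimal sketch assumes each value is either a list or a delimited string (Open Food Facts `*_tags` fields are usually JSON arrays). `tag_counts` is a hypothetical helper. It counts individual tags and also tallies the Python types it sees, which may show why no profiler matched.

```python
from collections import Counter


def tag_counts(values, sep=","):
    """Count individual tags in a column holding lists or delimited strings.

    Lists are used as-is, strings are split on `sep`, and None is skipped.
    Also returns a Counter of the Python type names encountered.
    """
    tags = Counter()
    kinds = Counter()
    for value in values:
        kinds[type(value).__name__] += 1
        if value is None:
            continue
        items = value if isinstance(value, list) else str(value).split(sep)
        for item in items:
            item = str(item).strip()
            if item:
                tags[item] += 1
    return tags, kinds
```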
Out[497]:

saturn.columns["packaging_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique (not computed)
alert: skipped (no profiler for kind=unknown)

labels_old categorical free_text

Legacy multi-label tags for products, stored as comma-separated strings that mix French, English, Spanish, Polish, Bulgarian (Cyrillic), and namespaced codes like 'en:CE'. There are 38 unique values across 50 rows with an 8% null rate. The most common value is the empty string, at 19.6% of non-null rows (9 of 46). Every non-empty value appears exactly once: 37 of the 38 categories are singletons, hence the long_tail alert and the entropy ratio of 0.93. The raw strings are therefore effectively free-form.

Treatment: Split on commas, normalise language/namespace prefixes, and one-hot the resulting tag tokens rather than treating the raw string as a category.

anthropic:claude-opus-4-7 · confidence high
Out[499]:

saturn.columns["labels_old"].stats

stat value
n 50
nulls 4 (8.0%)
unique 38
top_value (empty string)
top_rate 0.1957
cardinality 38
entropy 4.903
entropy_ratio 0.9343
alert: long_tail (37 singleton categories)
Fig 124.
Top values for labels_old.
Top values for labels_old (20 unique shown, of 38 total).
value count share
(empty string) 9 18.0%
Triman, en:Sin gluten 1 2.0%
Bezglutenowy, Triman 1 2.0%
Point Vert, Fabriqué en France, Arômes naturels, Sans colorants, Sans huile de palme, Nutriscore, Nutriscore B, Triman 1 2.0%
Справедлива търговия, Вегетарианско, Веган, Fairtrade cocoa, FSC, FSC Mix 1 2.0%
Triman, Sans Nitrates 1 2.0%
Point Vert, Fabriqué en Espagne, en:CE 1 2.0%
Fair trade, Organic, Vegetarian, EU Organic, Fairtrade International, Vegan, Soil Association Organic, The Vegan Society, Commerce équitable 1 2.0%
Point Vert, Non-bio, Triman 1 2.0%
Sans conservateurs, Fabriqué en France, Triman 1 2.0%
Sans gluten, Végétarien, Sans arômes artificiels, Végétalien, Assured Food Standards, Point Vert, Sans colorants artificiels, Sans exhausteur de goût, Sans glutamate, en:Made-in-england, en:Terracycle 1 2.0%
Organic, Vegetarian, EU Organic, Fair trade, Non-EU Agriculture, Vegan, Fairtrade International, FR-BIO-01, FSC, FSC Mix, Green Dot, Max Havelaar, PL-EKO-07, Soil Association Organic, The Vegan Society 1 2.0%
Agriculture non UE, Fabriqué en Belgique, Fabriqué en France, Sans huile de palme, Triman 1 2.0%
Organic,EU Organic,Non-EU Agriculture,Certified B Corporation,EU Agriculture,EU/non-EU Agriculture,FR-BIO-01,No palm oil,Nutriscore,Nutriscore Grade D,Pure cocoa butter,AB Agriculture Biologique 1 2.0%
Fair trade, Vegetarian, Fairtrade International, Vegan, Pure cocoa butter, Rainforest Alliance, Commerce-equitable, Pur-beurre-de-cacao 1 2.0%
Source de fibres alimentaires,Point Vert,Riche en fibres,Triman,Emballage-recyclable 1 2.0%
Halal 1 2.0%
Vegetariano,Vegano,Punto Verde 1 2.0%
Commerce équitable, Sans gluten, Bio, Végétarien, Épi barré, Bio européen, Kascher, Végétalien, Point Vert, Fabriqué en France, Nutriscore, Nutriscore A, The Vegan Society, AB Agriculture Biologique, Afdiag 1 2.0%
Peu ou pas de sucre, Peu de sucre, Pauvre ou sans sodium, Sans conservateurs, Agriculture non UE, Allégé en sucre, Riche en vitamine E, Source de fibres alimentaires, Agriculture durable, Enrichi en vitamines, Agriculture UE, Agriculture UE/Non UE, Riche en fibres, Faible teneur en sodium, Fabriqué en France, Arômes naturels, Sans colorants, Sans colorants ou conservateurs, Sans huile de palme, Nutriscore, Nutriscore A, Riche en vitamine B1, Riche en vitamine B9, Source de vitamine B6, Sans édulcorants, Farine de blé français, Triman 1 2.0%

packaging_text categorical free_text

Free-text packaging descriptions, mostly in French with some English mixed in, detailing materials and recycling instructions. The dominant value is the empty string, which fills 36 of the 48 non-null rows (top_rate 0.75, or 72% of all 50 rows). Only 13 unique values exist and the entropy ratio is 0.46, so the signal is sparse and long-tailed. The non-empty entries vary widely in format (itemized lists, comma-separated tags, uppercase marketing strings), which suggests there is no controlled vocabulary.

Treatment: Normalise case/whitespace and parse material keywords into multi-hot features; treat empty string as missing.

anthropic:claude-opus-4-7 · confidence high
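For the empty string, the profile's top_rate (0.75) and the top-values table's share (72.0%) differ because they use different denominators. top_rate counts only non-null rows, while share counts all rows. This minimal sketch reproduces both for a column shaped like this one (2 nulls, 36 empty strings, 12 other values). It also applies the suggested empty-string-to-missing step. Both helper names are hypothetical.

```python
from collections import Counter


def top_value_rates(values):
    """Return (top value, share of all rows, share of non-null rows)."""
    non_null = [v for v in values if v is not None]
    top, count = Counter(non_null).most_common(1)[0]
    return top, count / len(values), count / len(non_null)


def blank_to_none(values):
    """Treat empty or whitespace-only strings as missing."""
    return [None if isinstance(v, str) and not v.strip() else v for v in values]


values = [None] * 2 + [""] * 36 + [f"text {i}" for i in range(12)]
top, share_all, share_non_null = top_value_rates(values)
```

After `blank_to_none`, the same column has 38 missing values out of 50. That is the figure to use when deciding whether the column is worth parsing at all.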
Out[502]:

saturn.columns["packaging_text"].stats

stat value
n 50
nulls 2 (4.0%)
unique 13
top_value (empty string)
top_rate 0.75
cardinality 13
entropy 1.708
entropy_ratio 0.4614
alert: long_tail (12 singleton categories)
Fig 125.
Top values for packaging_text.
Top values for packaging_text (13 unique shown, of 13 total).
value count share
(empty string) 36 72.0%
1 film en plastique à recycler 1 étui en papier ondulé à recycler 1 2.0%
carton, plastique 1 2.0%
1 bouchon en plastique à trier 1 bouteille en plastique à trier 1 2.0%
1 étui en carton à recycler 1 feuille en aluminium à recycler 1 2.0%
1 sachet plastique à jeter 1 2.0%
1 étui en carton  à recycler 1 feuille en aluminium à recycler 1 2.0%
LE TRI +FACILE + BAC DE TRI 1 2.0%
4 FILMS PLASTIQUE A JETER 1 ÉTUI CARTON À RECYCLER 1 2.0%
cardboard (to recycle) foil paper (to throw away) 1 2.0%
FR LE TRI + FACILE ÉTUI 8+ SACHETS BAC DE TRI A consommer de préférence avant le : en France par et Santé S.A.S. 10:02 11914538 112 eCastelnaudary REVEL 30 04 2024 1 2.0%
Sachet, clip à recycler 1 2.0%
2 sachets en plastique à recycler 1 boîte en carton à recycler 1 2.0%

ingredients_percent_analysis numeric feature

This appears to be a binary status flag for ingredient percent analysis, taking only 2 unique values (1.0 and -1.0) across 50 rows with no nulls. The distribution is heavily dominated by 1.0 (median, q1, q3 all 1.0; mean 0.84), with 4 outliers (8%) at -1.0 producing extreme negative skew (-3.10) and high kurtosis (7.59). Despite the numeric kind, the IQR of 0 and only two unique values indicate this is categorical rather than continuous.

Treatment: Recode as a categorical/boolean flag rather than treating as continuous numeric.

anthropic:claude-opus-4-7 · confidence high
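The outlier alert follows from Tukey's fences. With q1 = q3 = 1 the IQR is 0, so both fences sit at 1 and every -1 falls outside them. This minimal sketch uses the standard library's `statistics.quantiles`. saturn's own quantile method is not documented here, so the sketch reproduces the reported counts only under that assumption. `as_flag` assumes 1 means the analysis succeeded and -1 means it did not, a reading the profile does not confirm. Both helper names are hypothetical.

```python
from statistics import quantiles


def iqr_outlier_count(values, k=1.5):
    """Count values outside Tukey's fences [q1 - k*IQR, q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return sum(1 for v in values if v < low or v > high)


def as_flag(value):
    """Recode the 1 / -1 status as True / False; None stays None (assumed meaning)."""
    return None if value is None else value == 1


# Shape of this column: 46 values of 1.0 and 4 values of -1.0.
column = [1.0] * 46 + [-1.0] * 4
print(iqr_outlier_count(column))  # 4
```

The same function returns 7 for a column of 39 zeros and 7 ones, which matches the ingredients_from_palm_oil_n profile. For a two-valued column, the "outliers" are simply the minority value.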
Out[505]:

saturn.columns["ingredients_percent_analysis"].stats

statvalue
n50
nulls0 (0.0%)
unique2
min -1
max 1
mean 0.84
median 1
std 0.5481
q1 1
q3 1
iqr 0
skew -3.096
kurtosis 7.587
n_outliers 4
outlier_rate 0.08
zero_rate 0
alert: high_skew (skew=-3.10)
alert: outliers (8.0% of rows beyond 1.5 IQR)
Fig 126.
Distribution of ingredients_percent_analysis. Vertical dash marks the median.
Histogram bins for ingredients_percent_analysis (median: 1.0).
bin count
-1 – -0.7143 4
-0.7143 – -0.4286 0
-0.4286 – -0.1429 0
-0.1429 – 0.1429 0
0.1429 – 0.4286 0
0.4286 – 0.7143 0
0.7143 – 1 46

ecoscore_data unknown other

The column 'ecoscore_data' was skipped by the profiler, so no type, uniqueness, or distribution stats are available. Only the row count (50) and a null rate of 0.0 are reported. The name suggests it holds Eco-Score payloads, likely a nested/structured object that the profiler could not introspect.

Treatment: Inspect raw values and parse the nested structure into typed sub-fields before use.

anthropic:claude-opus-4-7 · confidence low
Out[508]:

saturn.columns["ecoscore_data"].stats

stat value
n 50
nulls 0 (0.0%)
unique (not computed)
alert: skipped (no profiler for kind=unknown)

ingredients_text_sv categorical free_text

Swedish-language ingredient lists for food products; the values are free text rather than true categories. Coverage is extremely sparse: 92% of rows are null, and the 4 non-null rows hold 4 distinct values, one of which is an empty string. The non-empty entries are full ingredient declarations with allergen markers (underscored words such as _VETEMJÖL_). One of them mixes Swedish, Danish, and Norwegian terms.

Treatment: Treat as free text; given 92% nulls, drop or use only as a fallback to other-language ingredient columns.

anthropic:claude-opus-4-7 · confidence high
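One way to use this column as a fallback is to take, for each row, the first non-blank text from an ordered list of ingredient columns. This is a minimal sketch. The priority order in `DEFAULT_PRIORITY` and the helper name are hypothetical. Check which localized columns actually exist against the schema first.

```python
# Hypothetical priority order; adjust to the localized columns in the schema.
DEFAULT_PRIORITY = (
    "ingredients_text",
    "ingredients_text_en",
    "ingredients_text_fr",
    "ingredients_text_sv",
)


def first_nonblank(row, columns=DEFAULT_PRIORITY):
    """Return (column, text) for the first column with non-blank text, else (None, None)."""
    for col in columns:
        text = row.get(col)
        if isinstance(text, str) and text.strip():
            return col, text
    return None, None
```

Returning the source column alongside the text keeps track of which language each row ended up in. That matters for any downstream tokenizer.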
Out[510]:

saturn.columns["ingredients_text_sv"].stats

stat value
n 50
nulls 46 (92.0%)
unique 4
top_value kakaomassa, kakaosmör, fettreducerat kakaopulver, socker, vanilj.
top_rate 0.25
cardinality 4
entropy 2
entropy_ratio 1
alert: long_tail (4 singleton categories)
alert: null_rate (92.0% null)
Fig 127.
Top values for ingredients_text_sv.
Top values for ingredients_text_sv (4 unique shown, of 4 total).
value count share
kakaomassa, kakaosmör, fettreducerat kakaopulver, socker, vanilj. 1 2.0%
kakaomassa, fettreducerat kakaopulver, kakaosmör, socker, emulgeringsmedel (_sojalecitin_), vaniljextrakt. Minst 85 % kakao i chokladen. Kan innehålla spår av nötter och mjölk. 1 2.0%
(empty string) 1 2.0%
_VETEMJÖL_/_HVEDEMEL_, palmolja/-olie, glukossirap, maltextrakt från _KORN_/_BYG_, bakpulver/hævemidler (ammoniumkarbonater, natriumkarbonater), salt, _ÄGG_/_ÆG_/_EGG_, arom, mjölbehandlingsmedel/melbehandlingsmiddel (_NATRIUMDISULFIT_). 1 2.0%

brands_tags unknown other

The column 'brands_tags' was skipped by the profiler, so no type, uniqueness, or distribution statistics are available beyond a row count of 50 and a null rate of 0.0. The name suggests it holds brand tag strings (likely slug-style identifiers, possibly multi-valued), but this cannot be confirmed from the evidence. Treat any downstream assumption with caution until the column is re-profiled.

Treatment: Re-profile or sample the raw values before deciding; if multi-valued tag strings, split and one-hot or embed.

anthropic:claude-opus-4-7 · confidence low
Out[513]:

saturn.columns["brands_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique (not computed)
alert: skipped (no profiler for kind=unknown)

compared_to_category categorical metadata

Holds an Open Food Facts category taxonomy code (e.g., 'en:dark-chocolate-bar-with-more-than-70-cocoa') used as a comparison reference. With 35 unique values across 50 rows and entropy ratio 0.95, the column is extremely diffuse — the modal category covers only 10% of rows and a long tail dominates. No nulls, but the high cardinality relative to sample size will make this hard to use as-is.

Treatment: Roll up to a coarser taxonomy level (e.g., chocolate/biscuits/dairy) before any grouping or modelling.

anthropic:claude-opus-4-7 · confidence high
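The entropy_ratio values in these profiles are consistent with Shannon entropy in bits divided by log2 of the number of distinct non-null values. That formula is inferred from the reported numbers rather than taken from saturn's documentation. This minimal sketch checks it against this column's count profile: one category with 5 rows, one with 4, three with 3, two with 2, and 28 singletons. `entropy_stats` is a hypothetical helper.

```python
from collections import Counter
from math import log2


def entropy_stats(values):
    """Return (Shannon entropy in bits, entropy / log2(cardinality)) for non-null values."""
    counts = Counter(v for v in values if v is not None)
    total = sum(counts.values())
    h = -sum((c / total) * log2(c / total) for c in counts.values())
    k = len(counts)
    return h, (h / log2(k) if k > 1 else 0.0)


# Rebuild the compared_to_category shape with placeholder labels.
sizes = [5, 4, 3, 3, 3, 2, 2] + [1] * 28
values = [f"cat{i}" for i, size in enumerate(sizes) for _ in range(size)]
h, ratio = entropy_stats(values)
print(round(h, 3), round(ratio, 4))  # 4.886 0.9526
```

A ratio near 1 means the non-null rows are spread almost evenly across the distinct values. That is why high-cardinality, mostly-singleton columns like this one score about 0.95.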
Out[515]:

saturn.columns["compared_to_category"].stats

stat value
n 50
nulls 0 (0.0%)
unique 35
top_value en:dark-chocolate-bar-with-more-than-70-cocoa
top_rate 0.1
cardinality 35
entropy 4.886
entropy_ratio 0.9526
alert: long_tail (28 singleton categories)
Fig 128.
Top values for compared_to_category.
Top values for compared_to_category (20 unique shown, of 35 total).
value count share
en:dark-chocolate-bar-with-more-than-70-cocoa 5 10.0%
en:biscuits 4 8.0%
en:extra-fine-dark-chocolates 3 6.0%
en:dark-chocolates 3 6.0%
en:snacks-sucres 3 6.0%
en:sandwich-biscuits 2 4.0%
en:extruded-crispbreads 2 4.0%
en:plain-fermented-dairy-desserts-with-cream 1 2.0%
en:chocolate-stuffed-wafers 1 2.0%
en:spring-waters 1 2.0%
en:food 1 2.0%
en:drop-cookies 1 2.0%
en:shortbread-cookie-with-coconut 1 2.0%
en:biscuits-cookies-shelf-stable 1 2.0%
en:crispbreads 1 2.0%
fr:chips-de-pommes-de-terre-classiques 1 2.0%
en:dark-chocolate-bar 1 2.0%
en:cacao-et-derives 1 2.0%
en:crispbreads-wholemeal 1 2.0%
en:biscuit-snack-with-chocolate-filling 1 2.0%

data_sources categorical metadata

This column records the set of apps/databases that contributed each product's data, stored as a comma-separated list rather than a normalized relation. With 43 unique strings across 50 rows (entropy ratio 0.98) and the most common combination appearing only 4 times (top_rate 0.08), nearly every row has a bespoke source bundle. Notable: the values mix case ('yuka' vs 'Yuka') and overlap heavily on 'App - smoothie-openfoodfacts' and 'Apps', suggesting the same sources are repeatedly concatenated in different orders.

Treatment: Split on commas, normalise case, and one-hot encode individual sources instead of treating the raw string as a category.

anthropic:claude-opus-4-7 · confidence high
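This is a minimal sketch of the split, case-fold, and one-hot treatment on plain Python lists. The helper names are hypothetical. Lower-casing merges 'App - yuka' and 'App - Yuka'. Aliases such as 'App - elcoco', 'App - El CoCo', and 'app-elcoco' would still need an explicit alias table.

```python
def source_sets(values):
    """Split each comma-separated source string into a set of lower-cased names."""
    return [
        {part.strip().lower() for part in value.split(",") if part.strip()} if value else set()
        for value in values
    ]


def one_hot(sets):
    """Return (vocabulary, rows of 0/1 indicators), one column per distinct source."""
    vocab = sorted(set().union(*sets))
    return vocab, [[int(name in s) for name in vocab] for s in sets]


vocab, matrix = one_hot(source_sets([
    "App - yuka, Apps, App - smoothie-openfoodfacts",
    "App - Yuka, Apps",
]))
print(vocab)   # ['app - smoothie-openfoodfacts', 'app - yuka', 'apps']
print(matrix)  # [[1, 1, 1], [0, 1, 1]]
```

Because the same sources recur in different orders, the set representation also removes the ordering noise that inflates the raw cardinality to 43.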
Out[518]:

saturn.columns["data_sources"].stats

stat value
n 50
nulls 0 (0.0%)
unique 43
top_value App - yuka, Apps, App - Open Food Facts, App - smoothie-openfoodfacts
top_rate 0.08
cardinality 43
entropy 5.309
entropy_ratio 0.9783
alert: long_tail (39 singleton categories)
Fig 129.
Top values for data_sources.
Top values for data_sources (20 unique shown, of 43 total).
value count share
App - yuka, Apps, App - Open Food Facts, App - smoothie-openfoodfacts 4 8.0%
App - yuka, Apps, App - smoothie-openfoodfacts 3 6.0%
App - yuka, Apps, App - InFood, App - Open Food Facts, App - smoothie-openfoodfacts, App - macrofactor 2 4.0%
App - Yuka, Apps, App - smoothie-openfoodfacts 2 4.0%
App - yuka, Apps, App - Open Food Facts, App - smoothie-openfoodfacts, App - allergytracker, App - openfoodfactsflutterapp 1 2.0%
App - yuka, Apps, App - InFood, App - Open Food Facts, App - Horizon, App - smoothie-openfoodfacts, App - halal-healthy, App - foodwasteieee, App - mon-coach-ig-bas, App - intolerapp, App - fooducate 1 2.0%
Database - FoodRepo / openfood.ch, Databases, App - yuka, Apps, App - Horizon, App - InFood, App - Open Food Facts, App - smoothie-openfoodfacts, App - mon-coach-ig-bas, App - macrofactor, App - caloriecounterapp, App - Speisekammer 1 2.0%
App - smoothie-openfoodfacts, Apps 1 2.0%
App - yuka, Apps, App - InFood, App - Open Food Facts, App - smoothie-openfoodfacts, App - Waistline 1 2.0%
App - elcoco, App - yuka, Apps, App - off, App - El CoCo, App - InFood, App - Open Food Facts, App - Speisekammer, App - smoothie-openfoodfacts, App - macrofactor, App - mon-coach-ig-bas, App - caloriecounterapp 1 2.0%
App - yuka, Apps, App - ethic-advisor, App - InFood, App - Open Food Facts, App - smoothie-openfoodfacts, Producers, Producer - gie-sources-alma, Database - Equadis, Database - GDSN, Databases 1 2.0%
App - yuka, Apps, App - InFood, App - Open Food Facts, App - halal-healthy, App - smoothie-openfoodfacts 1 2.0%
Producer - Ferrero, Producers, App - off, App - yuka, Apps, Producer - ferrero-france-commerciale, Database - Equadis, Database - GDSN, Databases, App - Horizon, App - InFood, App - Open Food Facts, App - Speisekammer, App - smoothie-openfoodfacts, App - El CoCo, App - mon-coach-ig-bas, App - intolerapp, App - macrofactor, App - caloriecounterapp 1 2.0%
Database - FoodRepo / openfood.ch, Databases, App - yuka, Apps, App - ethic-advisor, Producers, Producer - barilla, Producer - barilla-france-sa, Database - Equadis, Database - GDSN, App - Open Food Facts, App - smoothie-openfoodfacts, App - mon-coach-ig-bas, App - InFood, App - caloriecounterapp 1 2.0%
Database - FoodRepo / openfood.ch, Databases, App - off, Apps, App - InFood, App - Open Food Facts, App - Yuka, App - smoothie-openfoodfacts, App - mon-coach-ig-bas, App - macrofactor 1 2.0%
Database - FoodRepo / openfood.ch, Databases, App - yuka, Apps, App - Horizon, App - InFood, App - Open Food Facts, App - smoothie-openfoodfacts, App - macrofactor, App - Speisekammer 1 2.0%
App - yuka, Apps, App - InFood, App - Open Food Facts, App - smoothie-openfoodfacts, App - caloriecounterapp, App - macrofactor 1 2.0%
Database - FoodRepo / openfood.ch, Databases, App - yuka, Apps, app-elcoco, App - InFood, App - Open Food Facts, App - smoothie-openfoodfacts, App - mon-coach-ig-bas 1 2.0%
App - yuka, Apps, App - Open Food Facts, App - InFood, App - smoothie-openfoodfacts 1 2.0%
App - yuka, Apps, App - Horizon, App - InFood, App - Open Food Facts, App - smoothie-openfoodfacts, App - macrofactor, App - caloriecounterapp 1 2.0%

other_nutritional_substances_prev_tags unknown other

This column, `other_nutritional_substances_prev_tags`, was skipped by the profiler, so no statistics on uniqueness, distribution, or content are available. The only signals are that all 50 rows are non-null and the kind is unknown. Without further evidence the contents cannot be characterised; the name suggests a tag list referencing prior values of a nutritional-substances field.

Treatment: Re-profile with parsing enabled (likely a delimited tag list) before deciding on use.

anthropic:claude-opus-4-7 · confidence low
Out[521]:

saturn.columns["other_nutritional_substances_prev_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique (not computed)
alert: skipped (no profiler for kind=unknown)

ingredients_from_palm_oil_n numeric feature

This counts palm-oil-derived ingredients per product. In this sample it takes only the values 0 and 1 (2 unique values, max 1.0), so it behaves as a binary indicator. The column is heavily zero-dominated (zero_rate ≈ 0.85, mean ≈ 0.152), and the 7 ones are flagged as outliers only because the IQR is 0. The null rate is 8%: modest, but worth noting.

Treatment: Recast as a boolean palm-oil flag and impute the 8% nulls before modelling.

anthropic:claude-opus-4-7 · confidence high
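This is a minimal sketch of the recast, using a hypothetical `palm_oil_flag` helper. Testing `count > 0` rather than `count == 1` keeps the flag correct if a larger sample contains products with two or more palm-oil ingredients. The profile cannot settle how to handle the 8% nulls. One option is to impute False, the majority value (39 of 46 non-null rows). The other is to keep them as a separate unknown state.

```python
def palm_oil_flag(count, impute=None):
    """Map a palm-oil ingredient count to a boolean flag.

    Any count above zero means "contains palm oil". None is replaced by
    `impute`; leave it as None to keep missingness explicit.
    """
    if count is None:
        return impute
    return count > 0
```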
Out[523]:

saturn.columns["ingredients_from_palm_oil_n"].stats

stat value
n 50
nulls 4 (8.0%)
unique 2
min 0
max 1
mean 0.1522
median 0
std 0.3632
q1 0
q3 0
iqr 0
skew 1.937
kurtosis 1.751
n_outliers 7
outlier_rate 0.1522
zero_rate 0.8478
alert: outliers (15.2% of rows beyond 1.5 IQR)
Fig 130.
Distribution of ingredients_from_palm_oil_n. Vertical dash marks the median.
Histogram bins for ingredients_from_palm_oil_n (median: 0.0).
bin count
0 – 0.1667 39
0.1667 – 0.3333 0
0.3333 – 0.5 0
0.5 – 0.6667 0
0.6667 – 0.8333 0
0.8333 – 1 7

last_updated_t numeric timestamp

Values are unique 10-digit integers in the ~1.74e9 to 1.77e9 range. As Unix-epoch seconds, that runs from about February 2025 to January 2026, which together with the column name points to a 'last updated' timestamp. The distribution is heavily left-skewed (skew -1.94), and 12% of rows are flagged as outliers. A handful of much older updates pull the tail down, while most rows cluster within an IQR of ~6.1M seconds (~71 days). There are no nulls or zeros.

Treatment: Cast from Unix seconds to datetime and derive recency features rather than using the raw integer.

anthropic:claude-opus-4-7 · confidence high
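This is a minimal sketch of the cast using the standard library. `recency_days` is a hypothetical helper. The reference date (the notebook's generation date, 2026-05-01) is a choice the profile does not prescribe. For example, the reported median, 1766580948.5, falls in late December 2025 UTC.

```python
from datetime import datetime, timezone


def recency_days(unix_seconds, reference):
    """Convert Unix seconds to an aware UTC datetime and its age in days at `reference`."""
    updated = datetime.fromtimestamp(unix_seconds, tz=timezone.utc)
    return updated, (reference - updated).total_seconds() / 86400


reference = datetime(2026, 5, 1, tzinfo=timezone.utc)
updated, age = recency_days(1766580948.5, reference)
print(updated.date())  # 2025-12-24
```

Passing `tz=timezone.utc` avoids interpreting the epoch in the machine's local timezone, which would shift dates near midnight.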
Out[526]:

saturn.columns["last_updated_t"].stats

stat value
n 50
nulls 0 (0.0%)
unique 50
min 1.739e+09
max 1.769e+09
mean 1.763e+09
median 1.767e+09
std 8.037e+06
q1 1.762e+09
q3 1.768e+09
iqr 6.138e+06
skew -1.945
kurtosis 2.892
n_outliers 6
outlier_rate 0.12
zero_rate 0
alert: outliers (12.0% of rows beyond 1.5 IQR)
Fig 131.
Distribution of last_updated_t. Vertical dash marks the median.
Histogram bins for last_updated_t (median: 1766580948.5).
bin count
1.739e+09 – 1.743e+09 3
1.743e+09 – 1.747e+09 1
1.747e+09 – 1.752e+09 1
1.752e+09 – 1.756e+09 2
1.756e+09 – 1.76e+09 3
1.76e+09 – 1.764e+09 8
1.764e+09 – 1.769e+09 32

nutrition_score_debug categorical metadata

This looks like a debug/diagnostic field for a nutrition scoring pipeline, capturing which input nutrients were missing during computation. It is overwhelmingly empty: 49 of 50 rows (top_rate 0.98) hold an empty string, with only one row carrying a substantive message about missing saturated-fat, sugars, and sodium. Entropy of 0.14 confirms near-zero information content in this sample.

Treatment: Drop from modelling; retain only for pipeline debugging.

anthropic:claude-opus-4-7 · confidence high
Out[529]:

saturn.columns["nutrition_score_debug"].stats

stat value
n 50
nulls 0 (0.0%)
unique 2
top_value (empty string)
top_rate 0.98
cardinality 2
entropy 0.1414
entropy_ratio 0.1414
alert: imbalance (top value is 98.0% of rows)
Fig 132.
Top values for nutrition_score_debug.
Top values for nutrition_score_debug (2 unique shown, of 2 total).
value count share
(empty string) 49 98.0%
missing saturated-fat_100g - missing sugars_100g - missing sodium_100g 1 2.0%

popularity_key numeric identifier

Values cluster tightly around 24.0B (median 23,999,500,422) with an IQR of only ~400K. The minimum, however, is about 23.0B, roughly 1B lower, which produces severe negative skew (-2.67) and 5 low-side outliers (10%). With 49 unique values across 50 rows and no nulls, this looks like an opaque high-magnitude key or encoded rank rather than a true numeric measure.

Treatment: Treat as an identifier and exclude from numeric modelling; join on it if it links to another table.

anthropic:claude-opus-4-7 · confidence medium
Out[532]:

saturn.columns["popularity_key"].stats

stat value
n 50
nulls 0 (0.0%)
unique 49
min 2.3e+10
max 2.4e+10
mean 2.39e+10
median 2.4e+10
std 3.03e+08
q1 2.4e+10
q3 2.4e+10
iqr 4.002e+05
skew -2.667
kurtosis 5.111
n_outliers 5
outlier_rate 0.1
zero_rate 0
alert: high_skew (skew=-2.67)
alert: outliers (10.0% of rows beyond 1.5 IQR)
Fig 133.
Distribution of popularity_key. Vertical dash marks the median.
Histogram bins for popularity_key (median: 23999500422.0).
bin count
2.3e+10 – 2.314e+10 5
2.314e+10 – 2.329e+10 0
2.329e+10 – 2.343e+10 0
2.343e+10 – 2.357e+10 0
2.357e+10 – 2.371e+10 0
2.371e+10 – 2.386e+10 0
2.386e+10 – 2.4e+10 45

product_name_es categorical free_text

Spanish-language product names, evidently a localized label field paralleling the primary product name. The null rate is 0.6, and 4 of the 20 non-null entries are empty strings, so only 16 rows carry usable text. All 16 of those are distinct (17 distinct values including the empty string; entropy_ratio 0.96). Values mix branded items (Nutella Biscuits, Excellence 85% cacao) with generic descriptors (Original, Chocolate negro 85% cacao), so the column behaves more like free text than a controlled vocabulary.

Treatment: Treat empty strings as nulls and tokenize/embed if used as a feature; otherwise drop given 60% missingness and high cardinality.

anthropic:claude-opus-4-7 · confidence high
Out[535]:

saturn.columns["product_name_es"].stats

stat value
n 50
nulls 30 (60.0%)
unique 17
top_value (empty string)
top_rate 0.2
cardinality 17
entropy 3.922
entropy_ratio 0.9595
alert: long_tail (16 singleton categories)
alert: null_rate (60.0% null)
Fig 134.
Top values for product_name_es.
Top values for product_name_es (17 unique shown, of 17 total).
value count share
(empty string) 4 8.0%
Príncipe Galletas de Chocolate 1 2.0%
Excellence chocolate 90% cacao 1 2.0%
Chocolate negro 85% cacao 1 2.0%
Nutella Biscuits 1 2.0%
Biscotes integrales original 1 2.0%
Excellence 85% cacao 1 2.0%
Chocolate negro 74% cacao 1 2.0%
Tostadas crujientes de fibra 1 2.0%
Original 1 2.0%
Excellence 70% Cocoa Intense Dark 1 2.0%
Chocolate negro Ecuador 70% cacao 1 2.0%
Nutella 1 2.0%
Crunchy Oats & Honey 1 2.0%
Excellence 99% Cacao Noir Absolu 1 2.0%
Chocolate Con Leche Milka 1 2.0%
Excellence suave 70% cacao 1 2.0%

allergens_from_user categorical free_text

User-submitted allergen tags, each prefixed with a language code such as (fr), (en), or (es). There are 34 distinct values across 50 rows (entropy ratio 0.91). The top value, '(fr) ' (rate 0.16), is a bare language prefix with no allergens, as is '(en) ' (7 rows). Values mix languages and free-form casing (e.g. 'Gluten,Lait,Soja, en:gluten' alongside the normalised 'en:gluten'), and some repeat a tag ('en:soybeans, en:soybeans'). As a result, the same allergen appears under multiple spellings.

Treatment: Strip the language prefix, split on commas, and normalise tokens to the en: namespace before using as multi-label features.

anthropic:claude-opus-4-7 · confidence high
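This is a minimal sketch of the suggested normalisation. It assumes every value starts with a two-letter code in parentheses. `BARE_TO_EN` is a hypothetical mapping covering only the bare names visible in this column's top values. Unknown tokens are dropped silently here; a real pipeline would want to log them instead.

```python
import re

PREFIX = re.compile(r"^\((?P<lang>[a-z]{2})\)\s*")  # e.g. "(fr) "

# Hypothetical mapping for bare allergen names seen in this column.
BARE_TO_EN = {
    "gluten": "en:gluten",
    "lait": "en:milk",
    "milk": "en:milk",
    "soja": "en:soybeans",
    "soybeans": "en:soybeans",
    "eggs": "en:eggs",
}


def parse_allergens(raw):
    """Return (language code or None, sorted unique en:-namespaced allergen tags)."""
    match = PREFIX.match(raw)
    lang = match.group("lang") if match else None
    body = raw[match.end():] if match else raw
    tags = set()
    for token in body.split(","):
        token = token.strip().lower()
        if not token:
            continue
        if token.startswith("en:"):
            tags.add(token)  # already namespaced
        elif token in BARE_TO_EN:
            tags.add(BARE_TO_EN[token])
    return lang, sorted(tags)
```

Collecting tags into a set also removes in-value duplicates such as 'en:soybeans, en:soybeans' before multi-label encoding.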
Out[538]:

saturn.columns["allergens_from_user"].stats

stat value
n 50
nulls 0 (0.0%)
unique 34
top_value (fr)
top_rate 0.16
cardinality 34
entropy 4.636
entropy_ratio 0.9112
alert: long_tail (30 singleton categories)
Fig 135.
Top values for allergens_from_user.
Top values for allergens_from_user (20 unique shown, of 34 total).
value count share
(fr) 8 16.0%
(en) 7 14.0%
(fr) en:gluten 3 6.0%
(en) en:soybeans, en:soybeans 2 4.0%
(en) en:banana,en:milk 1 2.0%
(en) Eggs,Gluten,Milk,Soybeans, en:milk 1 2.0%
(fr) Gluten,Lait,Soja, en:gluten 1 2.0%
(en) en:milk,en:nuts,en:soybeans 1 2.0%
(fr) Gluten,Lait 1 2.0%
(es) en:gluten,en:milk,en:nuts,en:soybeans 1 2.0%
(en) en:gluten,en:milk,en:soybeans 1 2.0%
(fr) en:gluten,en:sesame-seeds 1 2.0%
(fr) Gluten 1 2.0%
(fr) en:gluten,en:milk,en:soybeans 1 2.0%
(de) en:eggs,en:gluten,en:sulphur-dioxide-and-sulphites 1 2.0%
(en) en:gluten,en:nuts 1 2.0%
(fr) en:soybeans 1 2.0%
(en) en:milk,en:nuts,en:soybeans, en:soybeans 1 2.0%
(it) en:gluten 1 2.0%
(es) 1 2.0%

informers unknown other

The column 'informers' was skipped by the profiler, so its kind is unknown and no descriptive statistics were computed. The only confirmed signals are 50 rows with no nulls; uniqueness, type, and value distribution are all missing from the evidence.

Treatment: Re-profile or manually inspect the raw values before deciding on any downstream handling.

anthropic:claude-opus-4-7 · confidence low
Out[541]:

saturn.columns["informers"].stats

stat value
n 50
nulls 0 (0.0%)
unique (not computed)
alert: skipped (no profiler for kind=unknown)

brands_old categorical metadata

This is a legacy brand-name field for product records, with 29 distinct values across 50 rows and 32% nulls. The distribution is nearly flat (entropy_ratio 0.98) and the top brand 'Gerblé' covers only ~8.8% of non-null rows, so no brand dominates. Values mix clean names (Lindt, Cristaline) with concatenations like 'Wasa,Barilla' and oddities like 'LuMondelez', suggesting prior data-entry or merge artefacts.

Treatment: Clean and split multi-brand strings, then reconcile against a canonical brand list before use.

anthropic:claude-opus-4-7 · confidence high
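This is a minimal sketch of the split-and-reconcile step using `difflib.get_close_matches` from the standard library. The canonical brand list and the 0.8 cutoff are hypothetical. Fuzzy matches need review. With this list, 'LuMondelez' (probably the LU brand plus its owner, Mondelez) snaps to 'Mondelez', which silently drops 'Lu'.

```python
from difflib import get_close_matches

# Hypothetical canonical brand list, for illustration only.
CANONICAL = ["Barilla", "Ferrero", "Gerblé", "Lindt", "Mondelez", "Nutella", "Wasa"]


def reconcile_brands(raw, cutoff=0.8):
    """Split a multi-brand string and snap each part to a canonical name.

    Parts with no match at or above `cutoff` are returned unchanged for review.
    """
    result = []
    for part in (p.strip() for p in raw.split(",")):
        if not part:
            continue
        match = get_close_matches(part, CANONICAL, n=1, cutoff=cutoff)
        result.append(match[0] if match else part)
    return result
```

A cutoff of 0.8 keeps 'Lidl' from being pulled onto 'Lindt' (their similarity ratio is about 0.67). A lower cutoff would merge distinct brands.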
Out[543]:

saturn.columns["brands_old"].stats

stat value
n 50
nulls 16 (32.0%)
unique 29
top_value Gerblé
top_rate 0.08824
cardinality 29
entropy 4.749
entropy_ratio 0.9776
alert: long_tail (26 singleton categories)
alert: null_rate (32.0% null)
Fig 136.
Top values for brands_old.
Top values for brands_old (20 unique shown, of 29 total).
value count share
Gerblé 3 6.0%
Lindt 3 6.0%
Green & Black's 2 4.0%
LuMondelez 1 2.0%
Lindt & sprüngli (nordic) 1 2.0%
J.D. Gross 1 2.0%
Cristaline 1 2.0%
Maruja 1 2.0%
Wasa,Barilla 1 2.0%
Tyrrell's 1 2.0%
Bjorg 1 2.0%
Fin Carré 1 2.0%
Wasa 1 2.0%
Le pain des Fleurs,Ekibio 1 2.0%
Aperitivos company 1 2.0%
Lidl,J.D. Gross 1 2.0%
Nutella,Ferrero 1 2.0%
Pringles 1 2.0%
Nature Valley 1 2.0%
Lindt,ลินด์ 1 2.0%

data_quality_errors_tags unknown other

Profiling was skipped for this column, so no type, uniqueness, or value statistics are available. The only confirmed signals are 50 rows and a 0.0 null rate; everything else is missing. The name suggests it carries tags describing data-quality errors, likely a list or delimited string, but that is not verified by evidence.

Treatment: Re-run profiling with list/string parsing enabled before deciding how to use it.

anthropic:claude-opus-4-7 · confidence low
Out[546]:

saturn.columns["data_quality_errors_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique (not computed)
alert: skipped (no profiler for kind=unknown)

ingredients_text categorical free_text

Free-text ingredient lists from food packaging, one per row. Every one of the 50 rows is unique (entropy_ratio 1.0, top_rate 0.02). The samples mix several languages (English, French, German, Bulgarian in Cyrillic, Arabic) along with punctuation, percentages, and allergen notes. Despite the categorical kind tag, this is unstructured multilingual prose (hence the long_tail alert), and treating it as a categorical feature would be misleading.

Treatment: Parse and tokenize (language-detect first), then embed or extract ingredient entities before modelling.

anthropic:claude-opus-4-7 · confidence high
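Before entity extraction, ingredient lists usually need to be split into items. A naive split on every comma breaks nested groups in parentheses and French decimal commas such as '10,6%'. This minimal sketch uses a hypothetical helper, `split_top_level`, that splits only on commas outside brackets and not between two digits. It does not handle the Arabic comma '،' or language detection, which would need further rules or a library.

```python
def split_top_level(text, sep=","):
    """Split an ingredient list on separators outside (), [] and {} nesting.

    A separator between two digits (a decimal comma such as "10,6") is kept.
    Returns stripped, non-empty items.
    """
    items, current, depth = [], [], 0
    for i, ch in enumerate(text):
        if ch in "([{":
            depth += 1
        elif ch in ")]}":
            depth = max(depth - 1, 0)  # tolerate unbalanced closing brackets
        between_digits = (
            0 < i < len(text) - 1 and text[i - 1].isdigit() and text[i + 1].isdigit()
        )
        if ch == sep and depth == 0 and not between_digits:
            items.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    items.append("".join(current).strip())
    return [item for item in items if item]
```

Keeping the nested group as one item ('sucre (de canne, bio)') leaves the sub-ingredients available for a second, recursive pass if finer detail is needed.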
Out[548]:

saturn.columns["ingredients_text"].stats

stat value
n 50
nulls 0 (0.0%)
unique 50
top_value milk cream, cream, sugar, banana, bacteria
top_rate 0.02
cardinality 50
entropy 5.644
entropy_ratio 1
alert: long_tail (50 singleton categories)
Fig 137.
Top values for ingredients_text.
Top values for ingredients_text (20 unique shown, of 50 total).
value count share
milk cream, cream, sugar, banana, bacteria 1 2.0%
Céréale 50 % (Farine de blé 34,8 %, farine de blé complet 15,2 %), sucre, huiles végétales (palme, colza), cacao maigre en poudre 4,5 %, sirop de glucose, amidon de blé, poudres à lever (carbonates d'ammonium, carbonates de sodium), émulsifiant (lécithines de soja), sel, lait écrémé en poudre, perméat de lactosérum (de lait), arômes. Peut contenir œuf. 1 2.0%
Pâte de cacao, beurre de cacao, cacao maigre, sucre, vanille. 1 2.0%
Coffret fourré au cacao (41,6%) et à la vanille (208) - Ingrédients Farine de blé, sucre, huile végétale non hydrogénée (huile de palme), filtrat de lait, poudre de cacao Émulsifiant à faible teneur en cacao (322) Lécithine de soja) Agent levant (5000) Sucre artificiel (vanilline) Sel Contient du lait, du blé (gluten) du soja 1 2.0%
Farine de blé 57%, sucre de canne roux, huile de colza, sésame toasté 10,6%, germe de blé 5,4%, farine complète de blé 5,4%, arôme naturel, magnésium, émulsifiant : lécithines, poudres à lever (tartrates de potassium, carbonates de sodium, carbonates d'ammonium), sel de mer, amidon de blé, vitamines (E, PP, B6, B1, B9). 1 2.0%
Какаова маса, нискомаслено какао на прах, какаово масло, захар, емулгатор: лецитин (соеви), екстракт от ванилия, Може да съдържа следи от ядки и мляко, 1 2.0%
Eau de source 1 2.0%
Farine de froment, sucre, graisse végétale, sucre inverti, agents levants ( bicarbonate d'ammonium - bicarbonate de sodium), sel, arome. 1 2.0%
sugar, cocoa butter, whole milk powder, cocoa mass, almonds, emulsifier (soya lecithin), flavoring 1 2.0%
cocoa mass #, cane sugar #, cocoa butter #, vanilla extract #, may contain nuts, milk, 1 2.0%
دقيقالقمح،رقائق الشوكولاته20%[عجينة زيت النخلة.الكاكاو،سكر،دكستروز و مستحلب 1 2.0%
Farine de _froment_, sucre, graisse végétale, noix de coco râpée, poudre de _lait_, poudre de _lactosérum_, sucre inverti, agents levants (bicarbonate d'ammonium - bicarbonate de Sodium), sel, arômes. 1 2.0%
Pâte à tartiner aux NOISETTES et au cacao 40% (sucre, huile de palme, NOISETTES 13%**, LAIT écrémé en poudre 8,7%**, cacao maigre 7,4%**, émulsifiants : lécithines [SOJA]; vanilline), farine de FROMENT 32,5%, graisses végétales (palme, palmiste), sucre de canne (contient BLE) 8,5%, LACTOSE, son de BLE, LAIT en poudre, miel, poudres à lever (diphosphate disodique, carbonate acide de sodium, carbonate acide d'ammonium), farine d'ORGE malté, cacao maigre en poudre, sel, extrait en poudre de malt d'ORGE et de maïs, amidon de FROMENT, émulsifiants: lécithines [SOJA]; vanilline. 1 2.0%
Farine complète de SEIGLE (77 g*), farine de SEIGLE (28 g*), levure, sel. Peut contenir des traces de LUPIN, LAIT, MOUTARDE, GRAINES DE SÉSAME et SOJA. *en g pour 100 g de produit. 1 2.0%
Pâte de cacao, sucre, beurre de cacao, vanille. Peut contenir des fruits à coque, du lait, du soja et des graines de sésame. 1 2.0%
Kartoffeln, Sonnenblumenöl, Meersalz. 1 2.0%
pâte de cacao*, beurre de cacao*, cacao maigre en poudre*, sucre de canne*, extrait de vanille*, * ingrédients issus de l'agriculture biologique 1 2.0%
Pâte de cacao, cacao maigre, beurre de cacao, cassonade, vanille 1 2.0%
Farine de blé* 41%, Chocolat noir* 22% (pâte de cacao*, sucre de canne", beurre de cacao"), Sucre de canne* roux non raffiné, Farine complète de blé* 16%, Huile de tournesol oléique*, Arôme naturel de vanille, Lait écrémé en poudre, Sel de mer, carbonates d'ammonium, carbonates de sodium, gomme d'acacia*, extraits de romarin* Peut contenir du soja, des œufs, des fruits à coque, des graines de sésame et de la moutarde. *Ingrédients biologiques. 1 2.0%
cocoa mass, sugar, cocoa butter, fat reduced cocoa powder, emulsifier: lecithins (soya), natural vanilla flavouring, dark chocolate contains: cocoa solids 74% minimum, 1 2.0%

categories categorical feature

This column holds Open Food Facts-style hierarchical category breadcrumbs: each value is a comma-separated taxonomy path running from broad ('Snacks') to specific ('Chocolat noir en tablette extra dégustation à 70% de cacao minimum', i.e. an extra-fine dark chocolate bar with at least 70% cocoa). It is nearly unique (46 distinct values across 50 rows, top_rate just 0.06, entropy_ratio 0.99), which the long_tail alert flags, and it mixes languages for overlapping concepts: French vs English ('Snacks sucrés' vs 'Sweet snacks', 'Chocolats noirs' vs 'Dark chocolates'), one Spanish path ('Botanas', 'Snacks dulces', 'Galletas y pasteles'), and language-prefixed tags such as 'en:Biscuits and crackers' and 'fr:Chips de pommes de terre classiques'. Treat it as a multi-label taxonomy rather than a flat category.

Treatment: Split on commas, trim whitespace, strip language prefixes such as 'en:' and 'fr:', normalize synonyms across languages, and one-hot or embed the resulting taxonomy tags rather than using the raw string.

anthropic:claude-opus-4-7 · confidence high
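One way to carry out that treatment, sketched under assumptions: the synonym table below is a tiny hand-written illustration built from labels visible in this column, not the Open Food Facts category taxonomy, and `normalize_categories` is a hypothetical helper. It splits on commas, trims whitespace, strips two-letter language prefixes, lowercases, and maps the listed French labels to English.

```python
import re

# Illustrative French -> English synonyms taken from values in this column;
# a real pipeline would map against the full category taxonomy instead.
SYNONYMS = {
    "snacks sucrés": "sweet snacks",
    "cacao et dérivés": "cocoa and its products",
    "chocolats": "chocolates",
    "chocolats noirs": "dark chocolates",
    "biscuits et gâteaux": "biscuits and cakes",
}

# A leading language prefix such as "en:" or "fr:".
LANG_PREFIX = re.compile(r"^[a-z]{2}:", re.IGNORECASE)


def normalize_categories(raw: str) -> list[str]:
    """Turn a comma-separated category path into a de-duplicated list of tags."""
    tags = []
    for part in raw.split(","):
        tag = LANG_PREFIX.sub("", part.strip()).strip().lower()
        tag = SYNONYMS.get(tag, tag)
        if tag and tag not in tags:
            tags.append(tag)
    return tags
```

With this mapping, the French and English dark-chocolate paths in the top-values table collapse to the same tag list, and each tag can then become one multi-hot column.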
Out[551]:

saturn.columns["categories"].stats

stat value
n 50
nulls 0 (0.0%)
unique 46
top_value Snacks,Snacks sucrés,Cacao et dérivés,Chocolats,Chocolats noirs,Chocolat noir en tablette extra dégustation à 70% de cacao minimum
top_rate 0.06
cardinality 46
entropy 5.469
entropy_ratio 0.9901
alert: long_tail 43 singleton categories
Fig 138.
Top values for categories.
Top values for categories (20 unique shown, of 46 total).
value count share
Snacks,Snacks sucrés,Cacao et dérivés,Chocolats,Chocolats noirs,Chocolat noir en tablette extra dégustation à 70% de cacao minimum 3 6.0%
Snacks,Snacks sucrés,Biscuits et gâteaux,Biscuits sucrés & biscuits apéritifs,Biscuits 2 4.0%
Snacks,Sweet snacks,Cocoa and its products,Chocolates,Dark chocolates 2 4.0%
Dairies,Fermented foods,Fermented milk products,Snacks,Desserts,Dairy desserts,Fermented dairy desserts,Plain fermented dairy desserts,Plain fermented dairy desserts with cream 1 2.0%
Snacks,Breakfasts,Sweet snacks,Biscuits and cakes,Biscuits and crackers,Sandwich biscuits 1 2.0%
Snacks,Snacks sucrés,Cacao et dérivés,Chocolats,Chocolats noirs,Chocolats noirs en tablette,Chocolats noirs extra fin 1 2.0%
Snacks sucrés,Biscuits et gâteaux,Gaufrettes fourrées au chocolat 1 2.0%
Boissons et préparations de boissons,Boissons,Snacks,Eaux,Eaux de sources 1 2.0%
Snacks,Snacks sucrés,Biscuits et gâteaux,Biscuits 1 2.0%
Snacks,Sweet snacks,Cocoa and its products,Confectioneries,Chocolates,Compound chocolates,Food 1 2.0%
Snacks,Sweet snacks,Biscuits and cakes,Biscuits and crackers,Biscuits,Drop cookies 1 2.0%
Snacks,Snacks sucrés,Biscuits et gâteaux,Biscuits,Biscuits sablés,Sablés à la noix de coco 1 2.0%
Botanas,Snacks dulces,Galletas y pasteles,en:Biscuits and crackers,Galletas,en:Biscuits/Cookies (Shelf Stable),fr:Biscoitos recheados 1 2.0%
Aliments d'origine végétale,Snacks,Céréales et pommes de terre,Pains,Pains croustillants,Petit-déjeuners 1 2.0%
Produits fermentés,Snacks,Snacks sucrés,Cacao et dérivés,Chocolats,Chocolats noirs,Chocolats noirs en tablette,Chocolat noir en tablette extra dégustation à 70% de cacao minimum 1 2.0%
Plant-based foods and beverages,Plant-based foods,Snacks,Cereals and potatoes,Salty snacks,Appetizers,Chips and fries,Crisps,Potato crisps,Potato crisps in sunflower oil,fr:Chips de pommes de terre classiques 1 2.0%
Snacks,Snacks sucrés,Cacao et dérivés,Confiseries,Confiseries chocolatées,Chocolats,Chocolats noirs 1 2.0%
Snacks,Snacks sucrés,Cacao et dérivés,Chocolats,Chocolats noirs,Chocolats noirs en tablette 1 2.0%
Snacks, Sweet snacks, Biscuits and cakes, Biscuits and crackers, Biscuits, Chocolate biscuits, Filled biscuits, Dark chocolate biscuits, Sandwich biscuits 1 2.0%
Snacks,Sweet snacks,Cocoa and its products,Chocolates,Dark chocolates,Extra fine dark chocolates,Cacao-et-derives 1 2.0%

nutrition_score_warning_fruits_vegetables_nuts_estimate_from_ingredients_value numeric feature

This appears to be an estimated percentage of fruit, vegetable, and nut content derived from the ingredient list, used in nutrition score warnings. The distribution is dominated by zeros (zero_rate 0.71, median 0.0, Q3 only 2.33), and 43 of the 45 non-null values fall in the lowest histogram bin (0 to 16.67), yet a long tail reaches a max of 100.0, producing extreme skew (5.41) and kurtosis (30.37). Seven outliers (15.6% of non-null values) and a 10% null rate further indicate that this signal fires for only a small subset of products.

Treatment: Binarize (zero vs non-zero) or log1p-transform before modelling given the heavy zero mass and skew.

anthropic:claude-opus-4-7 · confidence high
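Both options from the treatment fit in a few lines of standard-library Python. A sketch, assuming nulls arrive as `None` and are passed through rather than imputed (the function name is hypothetical):

```python
import math
from typing import Optional


def fvn_features(value: Optional[float]) -> tuple[Optional[int], Optional[float]]:
    """Return (non-zero flag, log1p value) for the fruits/vegetables/nuts estimate.

    Nulls stay null so a later step can decide how to impute them.
    """
    if value is None:
        return None, None
    return int(value > 0), math.log1p(value)
```

`log1p` maps 0 to 0 and compresses the 0 to 100 range to roughly 0 to 4.6, while the flag alone captures the zero versus non-zero split that dominates this column.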
Out[554]:

saturn.columns["nutrition_score_warning_fruits_vegetables_nuts_estimate_from_ingredients_value"].stats

stat value
n 50
nulls 5 (10.0%)
unique 13
min 0
max 100
mean 4.532
median 0
std 15.52
q1 0
q3 2.326
iqr 2.326
skew 5.411
kurtosis 30.37
n_outliers 7
outlier_rate 0.1556
zero_rate 0.7111
alert: high_skew skew=+5.41
alert: outliers 15.6% rows beyond 1.5 IQR
Fig 139.
Distribution of nutrition_score_warning_fruits_vegetables_nuts_estimate_from_ingredients_value. Vertical dash marks the median.
Histogram bins for nutrition_score_warning_fruits_vegetables_nuts_estimate_from_ingredients_value (median: 0.0).
bin count
0 – 16.67 43
16.67 – 33.33 1
33.33 – 50 0
50 – 66.67 0
66.67 – 83.33 0
83.33 – 100 1

ingredients_from_or_that_may_be_from_palm_oil_n numeric feature

Likely a count of ingredients sourced from or potentially from palm oil per product. With only 3 unique values ranging 0–2 and 70.2% zeros, most products contain none, while the right skew (1.39) reflects a small tail with one or two such ingredients. Null rate is modest at 6%.

Treatment: Treat as a low-cardinality ordinal count; impute missing as 0 or add a missing flag before modelling.

anthropic:claude-opus-4-7 · confidence high
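A sketch combining both suggestions, zero-imputing the count while keeping an explicit missing indicator beside it. The function and output key names are hypothetical:

```python
from typing import Optional


def palm_oil_features(count: Optional[int]) -> dict[str, int]:
    """Zero-impute the palm-oil ingredient count and record whether it was missing."""
    return {
        "palm_oil_n": 0 if count is None else int(count),
        "palm_oil_n_missing": int(count is None),
    }
```

Keeping the flag lets a model tell "no palm-oil ingredients detected" apart from "not analysed", which a bare zero would conflate for the 3 null rows.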
Out[557]:

saturn.columns["ingredients_from_or_that_may_be_from_palm_oil_n"].stats

stat value
n 50
nulls 3 (6.0%)
unique 3
min 0
max 2
mean 0.3404
median 0
std 0.5625
q1 0
q3 1
iqr 1
skew 1.393
kurtosis 0.969
n_outliers 0
outlier_rate 0
zero_rate 0.7021
Fig 140.
Distribution of ingredients_from_or_that_may_be_from_palm_oil_n. Vertical dash marks the median.
Histogram bins for ingredients_from_or_that_may_be_from_palm_oil_n (median: 0.0).
bin count
0 – 0.3333 33
0.3333 – 0.6667 0
0.6667 – 1 0
1 – 1.333 12
1.333 – 1.667 0
1.667 – 2 2

origins_old categorical metadata

Legacy free-text origin field, likely superseded (note the `_old` suffix). Of the 50 rows, 11 (22%) are null and 31 more are empty strings, so the top_rate of 0.795 (31 of 39 non-null rows) is driven by blanks. Only 9 distinct values exist, and the 8 non-empty ones mix country names ('France', 'Morocco'), multi-region comma lists, and noise such as 'biologique' ('organic') or 'Farine de blé: France' ('wheat flour: France'). The low entropy ratio (0.425) reflects this concentration on blanks, and the inconsistent formats mean the column cannot be used as a clean categorical without parsing.

Treatment: Drop or archive; if needed, parse non-empty strings into a normalised origins list and prefer the replacement column.

anthropic:claude-opus-4-7 · confidence high
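If the column is kept, a minimal parsing sketch could look like this. The noise list, the bracketed-tag rule, and the "label: place" rule are assumptions drawn from the values in this column's top-values table, not a general origin parser, and the helper name is hypothetical.

```python
import re
from typing import Optional

# Tokens in this column that are not places (assumed, from the observed values).
NOISE = {"biologique"}
# Bracketed tags such as "[KAKAO]".
BRACKET_TAG = re.compile(r"\[[^\]]*\]")


def parse_origins(raw: Optional[str]) -> Optional[list[str]]:
    """Split a legacy origins string into place names; blank or null -> None."""
    if raw is None or not raw.strip():
        return None
    places = []
    for part in BRACKET_TAG.sub("", raw).split(","):
        # "Farine de blé: France" -> keep only the place after the colon.
        place = part.split(":")[-1].strip()
        if place and place.lower() not in NOISE:
            places.append(place)
    return places or None
```

Treating blanks as `None` rather than as a category removes the 31 empty strings that dominate the raw distribution.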
Out[560]:

saturn.columns["origins_old"].stats

stat value
n 50
nulls 11 (22.0%)
unique 9
top_value (empty string)
top_rate 0.7949
cardinality 9
entropy 1.347
entropy_ratio 0.4251
alert: long_tail 8 singleton categories
alert: null_rate 22.0% null
Fig 141.
Top values for origins_old.
Top values for origins_old (9 unique shown, of 9 total).
valuecountshare
3162.0%
France12.0%
Chambon-la-Forêt,France,Cairanne,Provence-Alpes-Côte d'Azur,Vaucluse,Italie,Source Sainte Cécile,Source Ofélia,Source Éléonore,Source Emma,Source Éléna12.0%
United Kingdom12.0%
biologique12.0%
Morocco12.0%
[KAKAO],Los Ríos (Provinz),Ecuador12.0%
Farine de blé: France12.0%
Afrique de l'Ouest,Amérique du Sud,Madagascar12.0%

packaging_text_nl categorical free_text

Dutch-language packaging text field (likely from Open Food Facts or similar). 76% of the 50 rows are null, and every one of the 12 non-null values is the empty string, giving cardinality 1 and entropy 0. The column carries no usable signal in this sample.

Treatment: Drop; column is effectively empty (null or blank in all rows).

anthropic:claude-opus-4-7 · confidence high
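Columns like this one can be found in bulk rather than one card at a time. A sketch, assuming the rows are available as a list of dicts (for example after loading the JSON sample); the helper name is hypothetical:

```python
def effectively_empty_columns(rows: list[dict]) -> list[str]:
    """Return columns whose values are null, absent, or whitespace-only in every row."""
    columns = sorted({key for row in rows for key in row})
    empty = []
    for col in columns:
        values = (row.get(col) for row in rows)
        if all(v is None or (isinstance(v, str) and not v.strip()) for v in values):
            empty.append(col)
    return empty
```

Empty lists or objects are not counted as empty here; extend the check if list-valued columns should be dropped on the same rule.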
Out[563]:

saturn.columns["packaging_text_nl"].stats

stat value
n 50
nulls 38 (76.0%)
unique 1
top_value (empty string)
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: null_rate 76.0% null
alert: imbalance top value is 100.0% of rows
Fig 142.
Top values for packaging_text_nl.
Top values for packaging_text_nl (1 unique shown, of 1 total).
value count share
(empty string) 12 24.0%

expiration_date categorical metadata

This expiration_date field is captured as free-form text rather than a parsed date. There are 34 unique values across 50 rows; the top value is the empty string, covering 15 of the 48 non-null entries (top_rate 0.3125), and 2 further rows (4%) are null. The remaining values mix incompatible formats such as '31/07/2020', '28/02/24', '25.11.2025', '01/2018', '19-10-2023', and '31 mai 2019' (French month name), plus non-date tokens like '30days'.

Treatment: Normalise to ISO dates with multi-format parsing, choosing day-first or month-first order explicitly for ambiguous values such as '11/10/2025', and treat blanks and '30days' as missing before use.

anthropic:claude-opus-4-7 · confidence high
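A multi-format parser for the formats visible in this column could look like the sketch below. Assumptions, all unverified: day-first order for ambiguous slash dates (plausible for a French-heavy catalog), month-only values mapped to the first of the month, and a small hand-written month table covering only the English and French abbreviations seen here. `parse_expiration` is a hypothetical name.

```python
import re
from datetime import date, datetime
from typing import Optional

# Tried in order; day-first is an assumption for values like '11/10/2025'.
FORMATS = [
    "%d/%m/%Y", "%d/%m/%y", "%d.%m.%Y", "%d-%m-%Y",
    "%Y-%m-%d", "%m/%Y", "%m %Y",
]

# English and French month names/abbreviations (partial, illustrative).
MONTHS = {
    "jan": 1, "janv": 1, "feb": 2, "fév": 2, "févr": 2,
    "mar": 3, "mars": 3, "apr": 4, "avr": 4, "may": 5, "mai": 5,
    "jun": 6, "juin": 6, "jul": 7, "juil": 7, "aug": 8, "août": 8,
    "sep": 9, "sept": 9, "oct": 10, "nov": 11, "dec": 12, "déc": 12,
}
# e.g. "31 jul. 2019" or "31 mai 2019"
NAMED = re.compile(r"^(\d{1,2})\s+([^\W\d_]+)\.?\s+(\d{4})$")


def parse_expiration(raw: Optional[str]) -> Optional[date]:
    """Parse a free-text expiration date; return None when it cannot be read."""
    if raw is None or not raw.strip():
        return None
    text = raw.strip().lower()
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    match = NAMED.match(text)
    if match and match.group(2) in MONTHS:
        day, month_name, year = match.groups()
        try:
            return date(int(year), MONTHS[month_name], int(day))
        except ValueError:
            return None
    return None  # e.g. '30days' is a shelf life, not a calendar date
```

Calling `.isoformat()` on the result gives the ISO string, so `parse_expiration("31 mai 2019").isoformat()` returns '2019-05-31'.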
Out[566]:

saturn.columns["expiration_date"].stats

stat value
n 50
nulls 2 (4.0%)
unique 34
top_value (empty string)
top_rate 0.3125
cardinality 34
entropy 4.364
entropy_ratio 0.8578
alert: long_tail 33 singleton categories
Fig 143.
Top values for expiration_date.
Top values for expiration_date (20 unique shown, of 34 total).
value count share
(empty string) 15 30.0%
30days 1 2.0%
31/07/2020 1 2.0%
28/02/24 1 2.0%
30/06/2025 1 2.0%
25.11.2025 1 2.0%
12.12.2018 1 2.0%
01/2018 1 2.0%
12/06/2021 1 2.0%
19-10-2023 1 2.0%
31 jul. 2019 1 2.0%
30-04-2024 1 2.0%
11/10/2025 1 2.0%
30 jun. 2020 1 2.0%
2024-04-01 1 2.0%
31 mai 2019 1 2.0%
31-01-2025 1 2.0%
05 2026 1 2.0%
2021-11-15 1 2.0%
31/12/2024 1 2.0%

selected_images unknown other

The column 'selected_images' was skipped by the profiler, so no type, cardinality, or distribution stats are available beyond a row count of 50 and a null rate of 0.0. The name suggests it holds image references or selections (filenames, URLs, or arrays), but this cannot be confirmed from the evidence. Without n_unique or value samples, no further characterisation is possible.

Treatment: Inspect raw values manually to determine structure before deciding on parsing, exploding, or dropping.

anthropic:claude-opus-4-7 · confidence low
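A quick way to do that manual inspection is to tally the Python types a skipped column holds, which separates nested objects, lists, and plain strings. This sketch assumes the JSON sample loads as a list of row objects; the file's actual top-level layout is not shown in this report, and both helper names are hypothetical. The same check applies to the other columns this report marks as skipped.

```python
import json
from collections import Counter


def value_type_profile(rows: list[dict], column: str) -> Counter:
    """Count the JSON value types (dict, list, str, NoneType...) stored in one column."""
    return Counter(type(row.get(column)).__name__ for row in rows)


def load_rows(path: str) -> list[dict]:
    """Load the sample, assuming it is a JSON array of row objects."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```

Usage, under the same layout assumption: `value_type_profile(load_rows("/home/coolhand/html/datavis/data_trove/cache/wild/openfoodfacts_sample.json"), "selected_images")`. A result dominated by `dict` would point toward flattening or extracting specific keys rather than dropping the column.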
Out[569]:

saturn.columns["selected_images"].stats

stat value
n 50
nulls 0 (0.0%)
unique n/a
alert: skipped: no profiler for kind=unknown

traces_from_ingredients categorical free_text

Allergen trace declarations parsed from product ingredient lists, recorded as free-form comma-separated allergen names. 39 of the 50 rows (78%) are empty strings rather than nulls, and the remaining 11 distinct values mix languages and casing: French 'œuf' (egg) and 'lait' (milk), English 'nuts, milk', and German 'Schalenfrüchte' (tree nuts). Some entries list the same allergens twice in one string, and one carries a stray leading colon (': fruits à coque').

Treatment: Normalise case, split on commas, translate to a canonical allergen vocabulary, and treat empty strings as missing before one-hot encoding.

anthropic:claude-opus-4-7 · confidence high
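A sketch of that pipeline. The canonical vocabulary is a small hypothetical mapping that covers only the tokens visible in this column; a production version would need the full allergen list. Splitting on commas and collecting into a set also collapses the duplicated allergens described above.

```python
import unicodedata
from typing import Optional

# Hypothetical mapping from observed tokens (French, English, German), after
# lowercasing and accent stripping, to canonical English allergen names.
CANONICAL = {
    "lait": "milk", "milk": "milk", "milch": "milk",
    "œuf": "egg", "œufs": "egg", "oeuf": "egg", "oeufs": "egg",
    "soja": "soy",
    "fruits a coque": "tree nuts", "nuts": "tree nuts", "schalenfruchte": "tree nuts",
    "sesame": "sesame", "graines de sesame": "sesame",
    "moutarde": "mustard",
    "lupin": "lupin",
}


def strip_accents(text: str) -> str:
    """Remove combining accents so 'SÉSAME' and 'sesame' compare equal."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalize_traces(raw: Optional[str]) -> Optional[list[str]]:
    """Map a trace declaration to sorted canonical allergens; blank or null -> None.

    Unmapped tokens are kept (lowercased, accent-stripped) so they can be reviewed.
    """
    if raw is None or not raw.strip():
        return None
    found = set()
    for token in raw.split(","):
        key = strip_accents(token.strip(" :").lower())
        if key:
            found.add(CANONICAL.get(key, key))
    return sorted(found) or None
```

The sorted lists can then be one-hot encoded with one column per canonical allergen.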
Out[571]:

saturn.columns["traces_from_ingredients"].stats

stat value
n 50
nulls 0 (0.0%)
unique 12
top_value (empty string)
top_rate 0.78
cardinality 12
entropy 1.521
entropy_ratio 0.4243
alert: long_tail 11 singleton categories
Fig 144.
Top values for traces_from_ingredients.
Show data table
Top values for traces_from_ingredients (12 unique shown, of 12 total).
valuecountshare
3978.0%
œuf12.0%
nuts, milk12.0%
LUPIN, LAIT, MOUTARDE, GRAINES DE SÉSAME , SOJA, LUPIN, LAIT, MOUTARDE, GRAINES DE SÉSAME, SOJA12.0%
fruits à coque, lait, soja, sésame12.0%
soja, œufs, fruits à coque, sésame, moutarde12.0%
LUPIN, LAIT, MOUTARDE , SOJA, LUPIN, LAIT, MOUTARDE, SOJA12.0%
Schalenfrüchte, Milch, Soja12.0%
LAIT, FRUITS A COQUE, LAIT, FRUITS A COQUE12.0%
lait, moutarde, soja12.0%
: fruits à coque12.0%
soja, sésame12.0%

ingredients_text_with_allergens categorical free_text

Free-text ingredient lists with embedded HTML markup highlighting allergens. All 50 rows are unique (entropy_ratio 1.0, top_rate 0.02), and the values span French, English, German, Bulgarian (Cyrillic), and Arabic, so any naive categorical encoding will explode. The HTML tags and multilingual content mean raw values need cleaning before NLP use.

Treatment: Strip HTML tags, language-detect, then tokenize/embed; do not treat as a category.

anthropic:claude-opus-4-7 · confidence high
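A minimal standard-library sketch of the first step. It assumes the markup is ordinary inline HTML (for example span tags wrapped around allergen words; the exact tag names are not visible in this report), removes every tag with a regular expression, then unescapes entities and collapses whitespace. The helper name is hypothetical.

```python
import html
import re

# Any inline tag such as <span class="allergen"> or </span>.
TAG = re.compile(r"<[^>]+>")


def strip_html(text: str) -> str:
    """Remove inline HTML tags, unescape entities, and collapse whitespace."""
    # Strip tags before unescaping so escaped '&lt;' text is not mistaken for a tag.
    plain = html.unescape(TAG.sub("", text))
    return re.sub(r"\s+", " ", plain).strip()
```

A regex is adequate for simple inline markup like this; for arbitrary or malformed HTML, the standard library's `html.parser.HTMLParser` is the safer choice.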
Out[574]:

saturn.columns["ingredients_text_with_allergens"].stats

stat value
n 50
nulls 0 (0.0%)
unique 50
top_value milk cream, cream, sugar, banana, bacteria
top_rate 0.02
cardinality 50
entropy 5.644
entropy_ratio 1
alert: long_tail 50 singleton categories
Fig 145.
Top values for ingredients_text_with_allergens.
Top values for ingredients_text_with_allergens (20 unique shown, of 50 total).
value count share
milk cream, cream, sugar, banana, bacteria 1 2.0%
Céréale 50 % (Farine de blé 34,8 %, farine de blé complet 15,2 %), sucre, huiles végétales (palme, colza), cacao maigre en poudre 4,5 %, sirop de glucose, amidon de blé, poudres à lever (carbonates d'ammonium, carbonates de sodium), émulsifiant (lécithines de soja), sel, lait écrémé en poudre, perméat de lactosérum (de lait), arômes. Peut contenir œuf. 1 2.0%
Pâte de cacao, beurre de cacao, cacao maigre, sucre, vanille. 1 2.0%
Coffret fourré au cacao (41,6%) et à la vanille (208) - Ingrédients Farine de blé, sucre, huile végétale non hydrogénée (huile de palme), filtrat de lait, poudre de cacao Émulsifiant à faible teneur en cacao (322) Lécithine de soja) Agent levant (5000) Sucre artificiel (vanilline) Sel Contient du lait, du blé (gluten) du soja 1 2.0%
Farine de blé 57%, sucre de canne roux, huile de colza, sésame toasté 10,6%, germe de blé 5,4%, farine complète de blé 5,4%, arôme naturel, magnésium, émulsifiant : lécithines, poudres à lever (tartrates de potassium, carbonates de sodium, carbonates d'ammonium), sel de mer, amidon de blé, vitamines (E, PP, B6, B1, B9). 1 2.0%
Какаова маса, нискомаслено какао на прах, какаово масло, захар, емулгатор: лецитин (соеви), екстракт от ванилия, Може да съдържа следи от ядки и мляко, 1 2.0%
Eau de source 1 2.0%
Farine de froment, sucre, graisse végétale, sucre inverti, agents levants ( bicarbonate d'ammonium - bicarbonate de sodium), sel, arome. 1 2.0%
sugar, cocoa butter, whole milk powder, cocoa mass, almonds, emulsifier (soya lecithin), flavoring 1 2.0%
cocoa mass #, cane sugar #, cocoa butter #, vanilla extract #, may contain nuts, milk, 1 2.0%
دقيقالقمح،رقائق الشوكولاته20%[عجينة زيت النخلة.الكاكاو،سكر،دكستروز و مستحلب 1 2.0%
Farine de froment, sucre, graisse végétale, noix de coco râpée, poudre de lait, poudre de lactosérum, sucre inverti, agents levants (bicarbonate d'ammonium - bicarbonate de Sodium), sel, arômes. 1 2.0%
Pâte à tartiner aux NOISETTES et au cacao 40% (sucre, huile de palme, NOISETTES 13%**, LAIT écrémé en poudre 8,7%**, cacao maigre 7,4%**, émulsifiants : lécithines [SOJA]; vanilline), farine de FROMENT 32,5%, graisses végétales (palme, palmiste), sucre de canne (contient BLE) 8,5%, LACTOSE, son de BLE, LAIT en poudre, miel, poudres à lever (diphosphate disodique, carbonate acide de sodium, carbonate acide d'ammonium), farine d'ORGE malté, cacao maigre en poudre, sel, extrait en poudre de malt d'ORGE et de maïs, amidon de FROMENT, émulsifiants: lécithines [SOJA]; vanilline. 1 2.0%
Farine complète de SEIGLE (77 g*), farine de SEIGLE (28 g*), levure, sel. Peut contenir des traces de LUPIN, LAIT, MOUTARDE, GRAINES DE SÉSAME et SOJA. *en g pour 100 g de produit. 1 2.0%
Pâte de cacao, sucre, beurre de cacao, vanille. Peut contenir des fruits à coque, du lait, du soja et des graines de sésame. 1 2.0%
Kartoffeln, Sonnenblumenöl, Meersalz. 1 2.0%
pâte de cacao*, beurre de cacao*, cacao maigre en poudre*, sucre de canne*, extrait de vanille*, * ingrédients issus de l'agriculture biologique 1 2.0%
Pâte de cacao, cacao maigre, beurre de cacao, cassonade, vanille 1 2.0%
Farine de blé* 41%, Chocolat noir* 22% (pâte de cacao*, sucre de canne", beurre de cacao"), Sucre de canne* roux non raffiné, Farine complète de blé* 16%, Huile de tournesol oléique*, Arôme naturel de vanille, Lait écrémé en poudre, Sel de mer, carbonates d'ammonium, carbonates de sodium, gomme d'acacia*, extraits de romarin* Peut contenir du soja, des œufs, des fruits à coque, des graines de sésame et de la moutarde. *Ingrédients biologiques. 1 2.0%
cocoa mass, sugar, cocoa butter, fat reduced cocoa powder, emulsifier: lecithins (soya), natural vanilla flavouring, dark chocolate contains: cocoa solids 74% minimum, 1 2.0%

image_front_thumb_url categorical identifier

This column holds Open Food Facts thumbnail URLs pointing to product front images, embedding the product barcode and a language suffix (front_fr/front_en) in the path. Every one of the 50 rows is unique with zero nulls, so it functions as a per-row identifier rather than a feature. The mix of fr and en suffixes hints at a multi-locale product set.

Treatment: Drop for modelling; retain as a media link or fetch images if vision features are needed.

anthropic:claude-opus-4-7 · confidence high
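If the URL is kept as a media link, the barcode and front-image language can still be recovered from it. A sketch assuming the path layout seen in this column (`/images/products/<barcode split into segments>/front_<lang>.<n>.<n>.jpg`); other URL shapes may exist, and the helper name is hypothetical. Leading zeros (as in `000/002/099/5553`) are kept, because whether they are padding is not established here.

```python
import re
from typing import Optional

# Path layout observed in this column only.
FRONT_URL = re.compile(
    r"/images/products/(?P<code>[\d/]+)/front_(?P<lang>[a-z]{2})\.\d+\.\d+\.jpg$"
)


def parse_front_url(url: str) -> Optional[tuple[str, str]]:
    """Return (barcode digits, language code) from a front-image thumbnail URL."""
    match = FRONT_URL.search(url)
    if match is None:
        return None
    return match.group("code").replace("/", ""), match.group("lang")
```

The recovered language code can be compared with `lc` as a consistency check before the URL column itself is dropped.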
Out[577]:

saturn.columns["image_front_thumb_url"].stats

stat value
n 50
nulls 0 (0.0%)
unique 50
top_value https://images.openfoodfacts.org/images/products/611/124/210/0992/front_fr.172.100.jpg
top_rate 0.02
cardinality 50
entropy 5.644
entropy_ratio 1
alert: long_tail 50 singleton categories
Fig 146.
Top values for image_front_thumb_url.
Top values for image_front_thumb_url (20 unique shown, of 50 total).
value count share
https://images.openfoodfacts.org/images/products/611/124/210/0992/front_fr.172.100.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/762/221/044/9283/front_en.605.100.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/304/692/002/9759/front_en.492.100.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/611/103/100/5064/front_fr.56.100.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/317/568/001/1480/front_en.221.100.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/000/002/099/5553/front_en.314.100.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/326/884/000/1008/front_fr.422.100.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/336/260/001/1044/front_fr.50.100.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/842/519/771/2024/front_en.60.100.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/762/221/057/8464/front_en.29.100.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/611/125/934/3108/front_fr.25.100.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/336/260/001/1228/front_fr.38.100.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/800/050/031/0427/front_fr.488.100.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/730/040/048/1595/front_fr.242.100.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/304/692/002/2651/front_en.159.100.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/506/004/264/1000/front_en.179.100.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/762/221/058/4724/front_en.95.100.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/304/692/002/2606/front_en.102.100.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/322/982/010/0234/front_fr.246.100.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/000/002/002/2464/front_en.301.100.jpg 1 2.0%

lc categorical feature

This is a low-cardinality categorical with 5 distinct values that look like ISO 639-1 language codes (fr, en, de, bg, ro), suggesting a language tag for each row. The distribution is heavily concentrated: 'fr' accounts for 35 of 50 rows (top_rate 0.70), 'en' for 10, while 'de', 'bg', and 'ro' appear only 1-3 times. Entropy ratio of 0.56 confirms the imbalance, and there are no nulls.

Treatment: One-hot encode, or group rare codes (bg, ro, de) into an 'other' bucket before modelling.

anthropic:claude-opus-4-7 · confidence high
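Rare codes can be grouped with a frequency threshold instead of a hard-coded list, so the rule still works on a larger sample. A sketch; the threshold of 5 rows is an arbitrary assumption that, on the counts in this column (fr 35, en 10, de 3, bg 1, ro 1), keeps 'fr' and 'en' and folds the rest into 'other'. The helper name is hypothetical.

```python
from collections import Counter


def group_rare(values: list[str], min_count: int = 5, other: str = "other") -> list[str]:
    """Replace categories seen fewer than min_count times with a shared label."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other for v in values]
```

The same helper applies unchanged to `ingredients_lc`, whose treatment suggests the same grouping.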
Out[580]:

saturn.columns["lc"].stats

stat value
n 50
nulls 0 (0.0%)
unique 5
top_value fr
top_rate 0.7
cardinality 5
entropy 1.294
entropy_ratio 0.5572
Fig 147.
Top values for lc.
Top values for lc (5 unique shown, of 5 total).
value count share
fr 35 70.0%
en 10 20.0%
de 3 6.0%
bg 1 2.0%
ro 1 2.0%

ingredients_text_debug categorical free_text

Free-text ingredient lists, mostly in French (e.g., 'Lait écrémé, crème, sucre...') with a few German and Spanish entries; this is likely a debug dump of Open Food Facts product compositions. Near-maximal entropy (entropy_ratio 0.997) and 35 unique values among the 36 non-null rows mean essentially every row is distinct; 28% of rows are null, and the top value is an empty string appearing twice. Texts vary widely in length and include allergen markup (_lait_, _soja_) plus stray non-ingredient prose such as a publication date and storage advice.

Treatment: Convert empty strings to nulls, then tokenize and embed (or parse into structured allergen/ingredient lists).

anthropic:claude-opus-4-7 · confidence high
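The underscore markup makes allergen extraction straightforward. A sketch, assuming allergens are always wrapped as `_word_` as in this column's values (the helper name is hypothetical); an unmatched trailing underscore, such as the second one in `_sulfite_ de sodium_`, is ignored.

```python
import re

# Allergen words wrapped in single underscores, e.g. "_lait_" or "_noisettes_".
MARKED = re.compile(r"_([^_]+)_")


def marked_allergens(text: str) -> list[str]:
    """Return underscore-marked allergens, lowercased, in order of first appearance."""
    seen: list[str] = []
    for word in MARKED.findall(text):
        word = word.strip().lower()
        if word and word not in seen:
            seen.append(word)
    return seen
```

The resulting words are still in the source language (for example 'blé', 'orge'), so they would need the same kind of canonical mapping as the trace allergens before being combined across products.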
Out[583]:

saturn.columns["ingredients_text_debug"].stats

stat value
n 50
nulls 14 (28.0%)
unique 35
top_value (empty string)
top_rate 0.05556
cardinality 35
entropy 5.114
entropy_ratio 0.9971
alert: long_tail 34 singleton categories
alert: null_rate 28.0% null
Fig 148.
Top values for ingredients_text_debug.
Top values for ingredients_text_debug (20 unique shown, of 35 total).
value count share
(empty string) 2 4.0%
Lait écrémé, créme, sucre, ferments lactiques. matière grosse 3% , sa première date de publication au maroc 01/10/1993 le changement du packaging 10 ans par 10 ans depuis vingt-cinq ans de l’offre 1 2.0%
Céréale 50,7 % (farine de blé 35 %, farine de blé complète 15,7 %), sucre, huiles végétales (palme, colza), cacao maigre en poudre 4,5 %, sirop de glucose, amidon de blé, poudre à lever : (carbonate acide d'ammonium, carbonate acide de sodium, diphosphate disodique), émulsifiants : (lécithine de soja, lécithine de tournesol), sel, lait écrémé en poudre, lactose et protéines de lait, arômes. 1 2.0%
Pâte de cacao, beurre de cacao, cacao maige, sucre, vanille. Cacao: 90% minimum. 1 2.0%
Farine de blé 55,1%, sucre de canne roux, huile de colza 14,3%, sésame toasté 11,6%, germe de blé 5,2%, levain de seigle dévitalisé en poudre, fibres d'avoine, calcium, sel de mer, arôme naturel, magnésium, émulsifiant : lécithines de colza, poudres à lever : (tartrates de potassium, carbonates de sodium, carbonates d'ammonium), acidifiant : acide malique, protéines de lait, amidon de blé, vitamines B1, B6, B9, PP et E (lactose, protéines de lait). 1 2.0%
Eau de source 1 2.0%
Farine de froment sucre, graisse végétale ,sucre inverti, agents levants ( bicarbonate d'ammonium-bicarbonate de sodium, sel , arome. Contient du gluten Peut contenir traces de lait et soja. Conserver dans un endroit frais et sec 1 2.0%
Sucre, graisse végétale de palmiste hydrogénée, _Lait_ entier en poudre, Amandes, Cacao dégraissé en poudre, _lactosérum_ en poudre, Émulsifiant : Lécithine de _soja_, Arômes (Vanilline). 1 2.0%
Pâte à tartiner aux _noisettes_ et au cacao 40% (sucre, huile de palme, _noisettes_ 13%, _lait_ écrémé en poudre 8,7%, cacao maigre 7,4%, émulsifiants : lécithines _soja_ ; vanilline), farine de _froment_ 32%, graisses végétales (palme, palmiste), sucre de canne 9%, _lactose_, son de _blé_, _lait_ en poudre, extrait en poudre de malt d'orge et de maïs, miel, poudres à lever : (disphosfate disodique, carbonate acide d'ammonium, carbonate acide de sodium), cacao maigre, sel, amidon de _froment_, farine d'_orge_ malté, lécithines _soja_ ; vanilline. 1 2.0%
Farine complète de _seigle_, farine de _seigle_ 29%, levure, sel. 1 2.0%
Pâte de cacao, sucre, beurre de cacao, vanille. 1 2.0%
Pomme de terre, huile de tournesol, sel de mer. 1 2.0%
pâte de cacao, cacao maigre, beurre de cacao, cassonade, vanille. 1 2.0%
Céréales 54%(*farine de _blé_, *farine complète de _blé_ (15%)), *chocolat noir (25%) (*pâte de cacao, *sucre de canne non raffiné, *beurre de cacao), *sucre de canne roux non raffiné, *huile de tournesol oléique (9,7%), arôme naturel de vanille, *_lait_ écrémé en poudre, sel de mer non raffiné, poudres à lever : carbonates d'ammonium et de sodium, épaississant : *gomme d'acacia, antioxydant : , *extraits de romarin. 1 2.0%
Kakaomasse*, Zucker, Kakaobutter, Kakaopulver stark entöit, Emulgator: Sonnenblumenlecithine ( - e322 - ), natürliches Vanille-Aroma. * Rainforest Alliance Certified. Kakao: 74% mindestens. 1 2.0%
en gras peuvent provoquer yne réaction chez tes personnes souffrant d'allergies d'intolérahces alimentaires. en g pour 100g de produit. ou 4 1 2.0%
farine de froment, sucre, Graisse végétale , Sucre inverti, Agents levants (Bicarbonate d'ammonium, Bicarbonate de sodium), arôme vanille 1 2.0%
Farine de _Blé_ 73.5 %, matière grasse végétale,extrait de malt d'_orge_, sirop de glucose, sel, poudre à lever : (carbonate acide d’ammonium, carbonate acide de sodium), _œufs_, agent de traitement de la farine : (_sulfite_ de sodium_), arôme 1 2.0%
Pasta de cacao, azúcar, manteca de cacao, vainilla Bourbon natural. (Cacao: 70% mínimo) 1 2.0%
Farine de _blé_ 68,4%, huile de colza, sirop de sucres issu de fruits, jus concentré de pomme 5,3%, _noisettes_ torréfiées 5,3%, germe de _blé_ 5,2%, fibres de chicorée : fructo-oligosaccharides, extrait de malt d'_orge_, arôme naturel de pomme, émulsifiant : lécithines de colza, amidon de _blé_, poudres à lever : (tartrates de potassium, carbonates de potassium, carbonates d‘ammonium), protéines de _lait_, vitamines B1, B2, B6, B9, PP et E (_lactose_, protéines de _lait_). 1 2.0%

packagings_materials_main categorical feature

This is a low-cardinality categorical tagging the dominant packaging material, with only 3 distinct values across 50 rows ('en:paper-or-cardboard', 'en:plastic', 'en:unknown'). The headline issue is a 62% null rate, leaving just 19 observed rows where 'en:paper-or-cardboard' alone covers 68.4%. Entropy ratio of 0.70 indicates moderate concentration among the few non-null entries.

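Treatment: Fill missing values with an explicit category before one-hot encoding. Decide whether that category should merge with the existing 'en:unknown' level or stay separate (e.g. 'missing'), since a null (not recorded) and 'en:unknown' (recorded as unknown) may not mean the same thing.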

anthropic:claude-opus-4-7 · confidence high
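A sketch of the fill-then-encode step in plain Python, assuming nulls arrive as `None`. The fill label is a parameter, so it can be set to the existing 'en:unknown' level or kept separate; the default 'missing' and the function name are hypothetical.

```python
from typing import Optional


def one_hot_with_missing(
    values: list[Optional[str]], fill: str = "missing"
) -> tuple[list[str], list[list[int]]]:
    """Fill nulls with an explicit label, then one-hot encode.

    Returns the sorted category list and one 0/1 row per input value.
    """
    filled = [fill if v is None else v for v in values]
    categories = sorted(set(filled))
    rows = [[int(v == c) for c in categories] for v in filled]
    return categories, rows
```

Passing `fill="en:unknown"` merges the 31 nulls into the existing level; leaving the default keeps them distinguishable.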
Out[586]:

saturn.columns["packagings_materials_main"].stats

stat value
n 50
nulls 31 (62.0%)
unique 3
top_value en:paper-or-cardboard
top_rate 0.6842
cardinality 3
entropy 1.105
entropy_ratio 0.6972
alert: null_rate 62.0% null
Fig 149.
Top values for packagings_materials_main.
Top values for packagings_materials_main (3 unique shown, of 3 total).
value count share
en:paper-or-cardboard 13 26.0%
en:plastic 5 10.0%
en:unknown 1 2.0%

data_quality_dimensions unknown other

The column `data_quality_dimensions` was skipped by the profiler, so no type, uniqueness, or value statistics were computed beyond a row count of 50 and a null rate of 0.0. Without `n_unique` or any descriptive stats, its content and structure are unknown from this evidence alone. The name suggests it may hold structured or list-like quality metadata, but that cannot be confirmed here.

Treatment: Re-profile with type inference forced, or inspect raw values manually before deciding on use.

anthropic:claude-opus-4-7 · confidence low
Out[589]:

saturn.columns["data_quality_dimensions"].stats

stat value
n 50
nulls 0 (0.0%)
unique n/a
alert: skipped: no profiler for kind=unknown

serving_size categorical feature

Free-text serving size descriptors, with 37 unique values across 50 rows (entropy ratio 0.98) and a 12% null rate. The top value '100g' covers just 6.8% of non-null rows, and formatting is inconsistent, so the same physical quantity appears under several labels: '100g', '100 g', '1 portion (100 g)', and '1 serving (100 g)' all denote 100 g, and '10 g' matches '1 Square (10 g)'. Units also vary ('20 gram', '1 L'), and some values use a decimal comma ('13,8 g').

Treatment: Parse into a numeric grams column via regex and unit normalization before use; volumes such as '1 L' need separate handling, since converting them to grams requires a density.

anthropic:claude-opus-4-7 · confidence high
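A regex sketch for the formats in this column. Assumptions: the first number followed by a recognised unit is the quantity (so '6 squares (18 g)' yields 18 g), a decimal comma means a decimal point, and litres are converted to millilitres but not to grams. The function name is hypothetical.

```python
import re
from typing import Optional

# A number (decimal point or comma) followed by a unit; longer unit names first.
QUANTITY = re.compile(r"(\d+(?:[.,]\d+)?)\s*(grams|gram|gr|g|ml|l)\b", re.IGNORECASE)
# Unit -> (base unit, multiplier): grams for mass, millilitres for volume.
UNITS = {
    "g": ("g", 1.0), "gr": ("g", 1.0), "gram": ("g", 1.0), "grams": ("g", 1.0),
    "ml": ("ml", 1.0), "l": ("ml", 1000.0),
}


def parse_serving(raw: Optional[str]) -> Optional[tuple[float, str]]:
    """Return (amount, base unit) for a serving-size string, or None if unparseable."""
    if not raw:
        return None
    match = QUANTITY.search(raw)
    if match is None:
        return None
    number, unit = match.groups()
    base, factor = UNITS[unit.lower()]
    return float(number.replace(",", ".")) * factor, base
```

Keeping the base unit next to the amount lets the 'ml' rows be filtered out, or converted with a per-product density, before a single grams column is built.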
Out[591]:

saturn.columns["serving_size"].stats

stat value
n 50
nulls 6 (12.0%)
unique 37
top_value 100g
top_rate 0.06818
cardinality 37
entropy 5.107
entropy_ratio 0.9803
alert: long_tail 32 singleton categories
Fig 150.
Top values for serving_size.
Top values for serving_size (20 unique shown, of 37 total).
value count share
100g 3 6.0%
10 g 3 6.0%
42 g 2 4.0%
100 g 2 4.0%
30 g 2 4.0%
20g 1 2.0%
1 Square (10 g) 1 2.0%
23g 1 2.0%
11.5g 1 2.0%
25 g 1 2.0%
1 L 1 2.0%
1 portion (100 g) 1 2.0%
13,8 g 1 2.0%
11.4 g (1 tranche) 1 2.0%
1 serving (100 g) 1 2.0%
6 squares (18 g) 1 2.0%
50g 1 2.0%
20 gram 1 2.0%
10 g (1 tranche) 1 2.0%
85g 1 2.0%

pnns_groups_1_tags unknown metadata

This column is named pnns_groups_1_tags, suggesting it holds top-level food-group tags from the PNNS (Programme National Nutrition Santé, the French national nutrition and health programme); the `_tags` suffix hints at a list of tags per row rather than a single category. Saturn skipped profiling, so no uniqueness, frequency, or value statistics are available beyond n=50 and a 0.0 null rate. Without distribution evidence, the cardinality and dominant categories cannot be confirmed.

Treatment: Re-profile or inspect manually before use; if categorical, encode as a low-cardinality factor, and if list-valued, multi-hot encode the tags.

anthropic:claude-opus-4-7 · confidence low
Out[594]:

saturn.columns["pnns_groups_1_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique n/a
alert: skipped: no profiler for kind=unknown

origin categorical free_text

Free-text origin/provenance field, likely taken from product packaging and written in French (e.g. 'Fabriqué par: Aachen Allemagne', i.e. 'made by: Aachen, Germany'). The column is almost entirely empty: the blank string accounts for 42 of the 47 non-null rows (top_rate 0.894) and another 6% are null, leaving only 5 distinct populated values. The entropy ratio of 0.285 and the long_tail alert confirm there is essentially no usable signal as-is.

Treatment: Drop or parse with regex/NER to extract country tokens; too sparse to use directly.

anthropic:claude-opus-4-7 · confidence high
Out[596]:

saturn.columns["origin"].stats

stat value
n 50
nulls 3 (6.0%)
unique 6
top_value (empty string)
top_rate 0.8936
cardinality 6
entropy 0.7359
entropy_ratio 0.2847
alert: long_tail 5 singleton categories
Fig 151.
Top values for origin.
Top values for origin (6 unique shown, of 6 total).
valuecountshare
4284.0%
Fabriqué par: Aachen Allemagne12.0%
Germe de blé origine ue. Sésame origine non-ue.12.0%
France12.0%
fabriqué en France.pommes origine UE. noisettes origine UE et non UE12.0%
Fabriqué en France par Nutrition et Santé. Farine de blé: France. Figues : non UE12.0%
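The regex/keyword route suggested for `origin` can be sketched in a few lines. The lookup table is hypothetical. It covers only the wording visible in the origin table (France, Allemagne, UE, non UE). This is a keyword matcher, not NER, and it would need extending before use on other data.

```python
import re

# Hypothetical lookup covering only the wording seen in the origin table.
ORIGIN_TOKENS = {
    "france": "FR",
    "allemagne": "DE",
    "ue": "EU",
    "non ue": "non-EU",
}

# Matches are found left to right, so "non-UE" is consumed whole
# before its "UE" part could match on its own.
ORIGIN_RE = re.compile(r"\bnon[- ]ue\b|\bue\b|\bfrance\b|\ballemagne\b", re.IGNORECASE)


def origin_tokens(text):
    """Return origin tokens found in a free-text string, in order, without repeats."""
    if not text:  # None and "" both mean no usable origin
        return []
    found = []
    for match in ORIGIN_RE.finditer(text):
        key = match.group(0).lower().replace("-", " ")
        token = ORIGIN_TOKENS[key]
        if token not in found:
            found.append(token)
    return found


print(origin_tokens("fabriqué en France.pommes origine UE. noisettes origine UE et non UE"))
# ['FR', 'EU', 'non-EU']
```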

ingredients_lc categorical metadata

This column appears to be a language code for ingredient text, with only 4 distinct values across 50 rows. French dominates at 70% (35 rows), followed by English (11), with Bulgarian and German trailing at 2 each. The skew is heavy and the entropy ratio of 0.606 confirms concentration around a single language.

Treatment: One-hot encode or use as a filter; consider grouping rare languages into 'other'.

anthropic:claude-opus-4-7 · confidence high
Out[599]:

saturn.columns["ingredients_lc"].stats

stat value
n 50
nulls 0 (0.0%)
unique 4
top_value fr
top_rate 0.7
cardinality 4
entropy 1.212
entropy_ratio 0.6061
Fig 152.
Top values for ingredients_lc.
Top values for ingredients_lc (4 unique shown, of 4 total).
value count share
fr 35 70.0%
en 11 22.0%
bg 2 4.0%
de 2 4.0%
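This is a minimal one-hot sketch for `ingredients_lc` that uses only the standard library. The fixed vocabulary and the `other` column are illustrative choices, not something Saturn produces. Here only 'fr' and 'en' get their own columns, so the two-row 'bg' and 'de' codes fall into `other`.

```python
def one_hot(values, vocabulary):
    """One-hot encode codes against a fixed vocabulary.

    Codes outside the vocabulary (rare or unseen) go to an extra "other"
    column; None (missing) becomes an all-zero row.
    """
    columns = list(vocabulary) + ["other"]
    rows = []
    for value in values:
        row = [0] * len(columns)
        if value is not None:
            key = value if value in vocabulary else "other"
            row[columns.index(key)] = 1
        rows.append(row)
    return columns, rows


cols, encoded = one_hot(["fr", "en", "bg", "de", "fr"], ["fr", "en"])
print(cols)     # ['fr', 'en', 'other']
print(encoded)  # [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1], [1, 0, 0]]
```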

packaging_old categorical free_text

Free-form packaging descriptions, almost certainly from an Open Food Facts-style export, mixing French and English tokens plus language-prefixed tags (e.g. 'fr:Triman', 'en:Bottle'). With 40 unique values across 50 rows and entropy_ratio 0.99, it's near-unique; the top value 'Plastique' covers only 6.98% and 14% are null. Entries are comma-separated multi-tags of varying granularity, so this behaves more like a tag list than a clean category.

Treatment: Split on commas, normalise language prefixes, and one-hot the resulting tag set rather than treating raw strings as categories.

anthropic:claude-opus-4-7 · confidence high
Out[602]:

saturn.columns["packaging_old"].stats

stat value
n 50
nulls 7 (14.0%)
unique 40
top_value Plastique
top_rate 0.06977
cardinality 40
entropy 5.269
entropy_ratio 0.9901
alert: long_tail (38 singleton categories)
Fig 153.
Top values for packaging_old.
Top values for packaging_old (20 unique shown, of 40 total).
value count share
Plastique 3 6.0%
"" (empty string) 2 4.0%
Paquet, Etui en carton, Film en plastique 1 2.0%
Cardboard, Container, Packaging, Paperboard, Aluminium wrap, Caja de cartón, Box cardboard, Card-box, Foil-wrapper, pt:Papel de aluminio 1 2.0%
Sachet, Carton, Paquet, 20 biscuits en 4 sachets 1 2.0%
Cardboard, Non-corrugated cardboard, Produkt, fr:FSC mixte, sl:PAP 1 2.0%
fr:Point vert,fr:Triman,fr:Bouteille et bouchon 100% recyclable,fr:PET,en:Bottle 1 2.0%
Métal, Papier, en:Recyclable Metals, Aluminium 1 2.0%
Plastic, Envelope, Mixed plastic-packet 1 2.0%
Papier, Enveloppe, en:Package paper, en:Paper recycling 1 2.0%
Métal, en:Recyclable Metals, Aluminium, Carton, Emballage carton 1 2.0%
Sachet, Sous atmosphère protectrice, en:mixed plastic-packet 1 2.0%
Paper, Film 1 2.0%
fr:emballage carton, fr:papier aluminium 1 2.0%
Film en plastique, Film plastique à jeter, Étui carton à recycler 1 2.0%
fr:Plastique,fr:Sachet plastique de 3g,en:mixed plastic-packet 1 2.0%
Papier, Enveloppe 1 2.0%
Papier 1 2.0%
Plastic 1 2.0%
Container, Caja de cartón, Aluminium-wrapper, Card-carton, pt:Papel de aluminio 1 2.0%
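The split, normalise and encode treatment for `packaging_old` might look like the sketch below. It rests on three assumptions: tags are comma-separated, a language prefix is always two lower-case letters and a colon (as in 'fr:Triman' or 'pt:Papel de aluminio'), and case differences carry no meaning. The function names are hypothetical, and the first example row is a shortened version of one in the table.

```python
import re

LANG_PREFIX = re.compile(r"^[a-z]{2}:")  # e.g. "fr:", "en:", "pt:", "sl:"


def packaging_tags(raw):
    """Split a comma-separated packaging string into a set of normalised tags."""
    tags = set()
    for part in (raw or "").split(","):
        tag = LANG_PREFIX.sub("", part.strip()).strip().lower()
        if tag:
            tags.add(tag)
    return tags


def multi_hot(tag_sets):
    """Return a sorted tag vocabulary and one 0/1 row per record."""
    vocab = sorted(set().union(*tag_sets))
    matrix = [[1 if tag in tags else 0 for tag in vocab] for tags in tag_sets]
    return vocab, matrix


rows = [
    "fr:Point vert,fr:Triman,fr:PET,en:Bottle",
    "Papier, Enveloppe",
    "Plastique",
]
vocab, matrix = multi_hot([packaging_tags(r) for r in rows])
print(vocab)
# ['bottle', 'enveloppe', 'papier', 'pet', 'plastique', 'point vert', 'triman']
print(matrix)
# [[1, 0, 0, 1, 0, 1, 1], [0, 1, 1, 0, 0, 0, 0], [0, 0, 0, 0, 1, 0, 0]]
```

Stripping prefixes does not translate anything. 'plastique' and 'plastic' still become separate columns, so a cross-language mapping would be a further step.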

packaging_text_fr categorical free_text

Free-text French packaging and recycling instructions, mostly empty. 34 of 50 rows are blank (top_rate 0.723, computed over the 47 non-null rows), and another 6% are null. Of the 14 distinct values, the 13 populated ones are heterogeneous descriptions of materials and recycling instructions (plastic films, cardboard étuis, aluminium sheets). One of them contains OCR-like artefacts and a date string. The low entropy ratio (0.49) reflects the dominance of blanks. The long_tail alert (13 singleton categories) shows that every non-empty entry is unique.

Treatment: Treat blanks as missing and parse remaining strings for material/recyclability tokens rather than using as a categorical.

anthropic:claude-opus-4-7 · confidence high
Out[605]:

saturn.columns["packaging_text_fr"].stats

stat value
n 50
nulls 3 (6.0%)
unique 14
top_value "" (empty string)
top_rate 0.7234
cardinality 14
entropy 1.874
entropy_ratio 0.4923
alert: long_tail (13 singleton categories)
Fig 154.
Top values for packaging_text_fr.
Top values for packaging_text_fr (14 unique shown, of 14 total).
value count share
"" (empty string) 34 68.0%
1 film en plastique à recycler 1 étui en papier ondulé à recycler 1 2.0%
carton, plastique 1 2.0%
1 bouchon en plastique à trier 1 bouteille en plastique à trier 1 2.0%
1 étui en carton à recycler 1 feuille en aluminium à recycler 1 2.0%
1 sachet plastique à jeter 1 2.0%
1 étui en carton  à recycler 1 feuille en aluminium à recycler 1 2.0%
LE TRI +FACILE + BAC DE TRI 1 2.0%
4 FILMS PLASTIQUE A JETER 1 ÉTUI CARTON À RECYCLER 1 2.0%
FR LE TRI + FACILE ÉTUI 8+ SACHETS BAC DE TRI A consommer de préférence avant le : en France par et Santé S.A.S. 10:02 11914538 112 eCastelnaudary REVEL 30 04 2024 1 2.0%
1 étui carton à recycler, 1 film plastique à jeter, 1 barquette plastique à jeter. 1 2.0%
1 FEUILLE PAPIER À RECYCLER, 1 FEUILLE METAL À RECYCLER, 1 FILM PLASTIQUE À JETER 1 2.0%
Sachet, clip à recycler 1 2.0%
2 sachets en plastique à recycler 1 boîte en carton à recycler 1 2.0%
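Many populated `packaging_text_fr` values follow a "<count> <component> à <action>" pattern, for example "1 film en plastique à recycler". The sketch below is a heuristic regex built on that assumption. It handles accented and unaccented upper case by lower-casing first. Entries without counts, such as "carton, plastique", simply yield nothing.

```python
import re

# "<count> <component> à|a <action>", applied to lower-cased text so that
# "À RECYCLER" and "A JETER" match too.
INSTRUCTION_RE = re.compile(r"(\d+)\s+(.+?)\s+[àa]\s+(recycler|jeter|trier)")


def packaging_instructions(text):
    """Extract (count, component, action) triples; blank or None gives []."""
    if not text or not text.strip():
        return []
    return [
        (int(count), component, action)
        for count, component, action in INSTRUCTION_RE.findall(text.lower())
    ]


print(packaging_instructions("4 FILMS PLASTIQUE A JETER 1 ÉTUI CARTON À RECYCLER"))
# [(4, 'films plastique', 'jeter'), (1, 'étui carton', 'recycler')]
```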

nova_group_debug categorical metadata

A categorical debug/diagnostic field, presumably trace messages from a NOVA food-group classifier. It's overwhelmingly empty (96% blank, 48 of 50 rows), with only two non-empty entries — both error strings explaining that NOVA classification was skipped due to unknown ingredients. Entropy ratio of 0.178 confirms near-zero information content.

Treatment: Drop; near-constant debug log with no modelling value.

anthropic:claude-opus-4-7 · confidence high
Out[608]:

saturn.columns["nova_group_debug"].stats

stat value
n 50
nulls 0 (0.0%)
unique 3
top_value "" (empty string)
top_rate 0.96
cardinality 3
entropy 0.2823
entropy_ratio 0.1781
alert: long_tail (2 singleton categories)
alert: imbalance (top value is 96.0% of rows)
Fig 155.
Top values for nova_group_debug.
Top values for nova_group_debug (3 unique shown, of 3 total).
value count share
"" (empty string) 48 96.0%
no nova group if too many ingredients are unknown: 5 out of 5 1 2.0%
no nova group if too many ingredients are unknown: 13 out of 13 1 2.0%

ingredients_original_tags unknown free_text

The column `ingredients_original_tags` was skipped by the profiler. No statistics, uniqueness, or value samples are available beyond a row count of 50 and a null rate of 0.0. The name suggests a list-valued field of ingredient tags (likely arrays or delimited strings), which would explain why the profiler classified it as `unknown` and skipped it. Without type or cardinality signals, nothing further can be inferred from the evidence.

Treatment: Parse the list/array structure and explode or multi-hot encode before downstream use.

anthropic:claude-opus-4-7 · confidence low
Out[611]:

saturn.columns["ingredients_original_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique
alert: skipped (no profiler for kind=unknown)
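If each cell of `ingredients_original_tags` holds a list, the usual first step is to explode it into one row per tag before counting or multi-hot encoding. This is a minimal sketch under that assumption. The `code` identifier key and the tag values are hypothetical.

```python
def explode(records, id_key, list_key):
    """Return one (record id, item) pair per element of a list-valued field.

    Scalar values are treated as one-element lists; None and empty lists
    produce no pairs.
    """
    pairs = []
    for row in records:
        items = row.get(list_key)
        if items is None:
            continue
        if not isinstance(items, list):
            items = [items]
        for item in items:
            pairs.append((row[id_key], item))
    return pairs


# Hypothetical records: "code" stands in for whatever identifies a product.
records = [
    {"code": "A", "ingredients_original_tags": ["en:sugar", "en:cocoa-butter"]},
    {"code": "B", "ingredients_original_tags": []},
    {"code": "C", "ingredients_original_tags": None},
]
print(explode(records, "code", "ingredients_original_tags"))
# [('A', 'en:sugar'), ('A', 'en:cocoa-butter')]
```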

data_quality_completeness_tags unknown other

This column is named data_quality_completeness_tags but saturn skipped profiling it, so its kind is unknown and no uniqueness or value statistics were computed. The only confirmed signals are that it has 50 rows with a null rate of 0.0. Without sample values or cardinality, its actual content (likely tag strings about completeness checks) cannot be verified from the evidence.

Treatment: Re-profile with explicit parsing before deciding how to use it downstream.

anthropic:claude-opus-4-7 · confidence low
Out[613]:

saturn.columns["data_quality_completeness_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique
alert: skipped (no profiler for kind=unknown)

cities_tags unknown other

The column `cities_tags` was skipped by the profiler, so no type inference, uniqueness count, or value statistics are available. Only two facts are known: 50 rows were seen and none were null. Without further stats the content and structure cannot be characterised.

Treatment: Re-profile or inspect raw values manually before deciding on downstream use.

anthropic:claude-opus-4-7 · confidence low
Out[615]:

saturn.columns["cities_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique
alert: skipped (no profiler for kind=unknown)

countries_hierarchy unknown feature

Column `countries_hierarchy` was skipped by the profiler. No kind, uniqueness, or value statistics are available beyond a row count of 50 with zero nulls. The name suggests a list-like representation of country tags (for example, several `en:`-prefixed country codes per product), which likely tripped the type detector. Because no nulls were recorded, the missing stats more likely reflect non-scalar values than missing data.

Treatment: Parse the hierarchical strings into a list of country tags, then explode or one-hot encode before modelling.

anthropic:claude-opus-4-7 · confidence low
Out[617]:

saturn.columns["countries_hierarchy"].stats

stat value
n 50
nulls 0 (0.0%)
unique
alert: skipped (no profiler for kind=unknown)

nutriscore_score_opposite numeric feature

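Numeric column holding the negation of the Nutri-Score (range -40 to 0, median -19). On the original Nutri-Score a lower score is better, so here higher (less negative) values correspond to better nutritional grades. The range also implies that every non-null original score in this sample lies between 0 and 40. The distribution is roughly symmetric (skew 0.16, kurtosis -0.53), with no outliers and an IQR of 15. Other signals are 2% nulls, about 8% zeros (4 of 49 non-null rows), and only 28 unique values across 50 rows. This is consistent with an integer score derived by sign-flipping the original Nutri-Score.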

Treatment: Use as-is for modelling, or invert the sign back to the original Nutri-Score for interpretability.

anthropic:claude-opus-4-7 · confidence high
Out[619]:

saturn.columns["nutriscore_score_opposite"].stats

stat value
n 50
nulls 1 (2.0%)
unique 28
min -40
max 0
mean -17.47
median -19
std 9.906
q1 -25
q3 -10
iqr 15
skew 0.1616
kurtosis -0.5337
n_outliers 0
outlier_rate 0
zero_rate 0.08163
Fig 156.
Distribution of nutriscore_score_opposite. Vertical dash marks the median.
Histogram bins for nutriscore_score_opposite (median: -19.0).
bin count
-40 – -34.29 2
-34.29 – -28.57 2
-28.57 – -22.86 12
-22.86 – -17.14 13
-17.14 – -11.43 7
-11.43 – -5.714 5
-5.714 – 0 8
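Two small checks follow from the stats above. First, restoring the usual Nutri-Score scale is a sign flip. Second, the histogram edges (-40, -34.29, -28.57, -22.86, -17.14, -11.43, -5.714 and 0) are consistent with seven equal-width bins over the observed range. That binning rule is inferred from the edges and is not documented by Saturn. A minimal sketch of both:

```python
def restore_nutriscore(opposite_scores):
    """Flip nutriscore_score_opposite back to the usual Nutri-Score scale.

    On the original scale lower is nutritionally better; None stays None.
    """
    return [None if s is None else -s for s in opposite_scores]


def equal_width_edges(lo, hi, bins):
    """Return bins + 1 evenly spaced bin edges from lo to hi."""
    return [lo + i * (hi - lo) / bins for i in range(bins + 1)]


print(restore_nutriscore([-40, -19, 0, None]))  # [40, 19, 0, None]
print([round(e, 2) for e in equal_width_edges(-40, 0, 7)])
# [-40.0, -34.29, -28.57, -22.86, -17.14, -11.43, -5.71, 0.0]
```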

categories_properties_tags unknown other

The column `categories_properties_tags` was skipped by the profiler, so no type, uniqueness, or value statistics are available beyond a row count of 50 and a null rate of 0.0. The name suggests a nested or multi-valued field (categories/properties/tags), which likely tripped the dissector's scalar assumptions. Without distinct-value or sample evidence, its actual content and cardinality are unknown.

Treatment: Re-profile after flattening or JSON-parsing this field before deciding on downstream use.

anthropic:claude-opus-4-7 · confidence low
Out[622]:

saturn.columns["categories_properties_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique
alert: skipped (no profiler for kind=unknown)

origins_lc categorical feature

This appears to be the language code attached to the product's origin text, with 6 distinct values across 50 rows and a 4% null rate. The distribution is dominated by 'fr' (23) and 'en' (20), which together account for 43 of the 48 non-null entries. 'es' appears twice, and 'de', 'it' and 'pl' appear once each. The entropy ratio of 0.61 reflects this concentration in two categories.

Treatment: One-hot encode with rare categories (es/de/it/pl) collapsed into an 'other' bucket.

anthropic:claude-opus-4-7 · confidence high
Out[624]:

saturn.columns["origins_lc"].stats

stat value
n 50
nulls 2 (4.0%)
unique 6
top_value fr
top_rate 0.4792
cardinality 6
entropy 1.575
entropy_ratio 0.6093
Fig 157.
Top values for origins_lc.
Top values for origins_lc (6 unique shown, of 6 total).
value count share
fr 23 46.0%
en 20 40.0%
es 2 4.0%
de 1 2.0%
it 1 2.0%
pl 1 2.0%
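Collapsing the rare `origins_lc` codes into an `other` bucket can be done from frequency counts. This is a minimal sketch. The threshold of 3 is an arbitrary illustrative choice, and the example list rebuilds the counts from the origins_lc table above.

```python
from collections import Counter


def collapse_rare(values, min_count=3, other="other"):
    """Replace categories seen fewer than min_count times with `other`.

    None (missing) is left as None so it can be imputed separately.
    """
    counts = Counter(v for v in values if v is not None)
    return [v if v is None or counts[v] >= min_count else other for v in values]


# Rebuilds the origins_lc counts: fr 23, en 20, es 2, de/it/pl 1 each, 2 nulls.
origins = ["fr"] * 23 + ["en"] * 20 + ["es"] * 2 + ["de", "it", "pl"] + [None] * 2
print(Counter(collapse_rare(origins)))
# Counter({'fr': 23, 'en': 20, 'other': 5, None: 2})
```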

ciqual_food_name_tags unknown other

This column, ciqual_food_name_tags, was skipped by the profiler so no distributional statistics are available. The only confirmed signals are 50 rows present and a null rate of 0.0; uniqueness, value samples, and type are all missing. Based on the name alone it likely holds CIQUAL food-name tag strings, but that cannot be verified from the evidence.

Treatment: Re-run the profiler on this column to recover type and cardinality before deciding on use.

anthropic:claude-opus-4-7 · confidence low
Out[627]:

saturn.columns["ciqual_food_name_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique
alert: skipped (no profiler for kind=unknown)

countries categorical free_text

Free-text country list per record, not a clean categorical: 43 unique values across 50 rows (entropy ratio 0.97) with the top value 'Maroc' at only 10%. Values mix languages (Maroc vs Morocco, Belgique vs Belgium), comma-separated multi-country strings, and even an 'en:switzerland' prefix, so the same country appears in several surface forms. The 'long_tail' alert is consistent with this near-unique, multi-label encoding.

Treatment: Split on commas, normalise language and prefixes to ISO country codes, then one-hot or multi-hot encode.

anthropic:claude-opus-4-7 · confidence high
Out[629]:

saturn.columns["countries"].stats

stat value
n 50
nulls 0 (0.0%)
unique 43
top_value Maroc
top_rate 0.1
cardinality 43
entropy 5.252
entropy_ratio 0.9678
alert: long_tail (41 singleton categories)
Fig 158.
Top values for countries.
Top values for countries (20 unique shown, of 43 total).
value count share
Maroc 5 10.0%
Morocco 4 8.0%
Morocco,United States 1 2.0%
Algeria,Belgium,France,French Polynesia,Germany,Guadeloupe,Hungary,Luxembourg,Martinique,Morocco,New Caledonia,Réunion,Spain,Switzerland,United States 1 2.0%
Algérie,Autriche,Belgique,Bulgarie,Canada,République tchèque,Finlande,France,Polynésie française,Allemagne,Irlande,Italie,Maurice,Maroc,Pays-Bas,Norvège,La Réunion,Roumanie,Singapour,Espagne,Suède,Suisse,Tunisie,Royaume-Uni 1 2.0%
Belgium, Bulgaria, France, en:switzerland 1 2.0%
Austria,Belgium,Bulgaria,Estonia,Finland,France,Germany,Italy,Lithuania,Slovakia,Slovenia,Spain,United Kingdom 1 2.0%
Belgique,Côte d'Ivoire,France,Allemagne,Luxembourg,Mali,Martinique,Russie,Suisse,Royaume-Uni 1 2.0%
Algeria,Cameroon,France,Morocco,Spain 1 2.0%
France,Irlande,Suède,Royaume-Uni 1 2.0%
Francia,Alemania,Italia,Marruecos,Portugal,Rumania,España,Suiza 1 2.0%
France, Italy, Spain, Switzerland, en:reunion 1 2.0%
Algérie,Belgique,République tchèque,France,Allemagne,Guadeloupe,Italie,Maroc,La Réunion,Espagne,Suisse 1 2.0%
France,Germany,Spain,United Kingdom 1 2.0%
Belgium, France, United Kingdom, en:ireland 1 2.0%
Autriche,Belgique,France,Allemagne,Italie,Maroc,Pays-Bas,La Réunion,Espagne,Suisse 1 2.0%
France,Luxembourg,Switzerland 1 2.0%
Belgium,Bulgaria,Czech Republic,Finland,Germany,Netherlands,Poland,Spain 1 2.0%
Belgique,France,Guadeloupe,Italie,La Réunion,Espagne,Suisse 1 2.0%
Österreich,Belgien,Dänemark,Estland,Finnland,Frankreich,Deutschland,Italien,Luxemburg,Malta,Marokko,Niederlande,Portugal,Spanien,Schweden,Schweiz 1 2.0%
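The sketch below illustrates the split and normalise treatment for `countries`. The lookup table is hypothetical and deliberately partial; it holds only some spellings visible in the table above. Unmatched names are returned separately so that gaps in the table stay visible. ISO 3166-1 alpha-2 codes are used as the normalised form.

```python
# Hypothetical, deliberately partial lookup built from spellings in the table above.
COUNTRY_ISO = {
    "france": "FR", "francia": "FR", "frankreich": "FR",
    "maroc": "MA", "morocco": "MA", "marruecos": "MA", "marokko": "MA",
    "belgique": "BE", "belgium": "BE", "belgien": "BE",
    "switzerland": "CH", "suisse": "CH", "suiza": "CH", "schweiz": "CH",
    "united kingdom": "GB", "royaume-uni": "GB",
    "reunion": "RE", "réunion": "RE", "la réunion": "RE",
}


def country_codes(raw):
    """Split a comma-separated country string and map names to ISO codes.

    The "en:" taxonomy prefix is stripped. Names missing from the lookup
    are returned in a second list instead of being silently dropped.
    """
    codes, unknown = set(), []
    for part in (raw or "").split(","):
        name = part.strip().lower()
        if name.startswith("en:"):
            name = name[3:]
        if not name:
            continue
        code = COUNTRY_ISO.get(name)
        if code:
            codes.add(code)
        else:
            unknown.append(name)
    return sorted(codes), unknown


print(country_codes("Belgium, Bulgaria, France, en:switzerland"))
# (['BE', 'CH', 'FR'], ['bulgaria'])
```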

ingredients_text_with_allergens_it categorical free_text

Italian-language ingredient lists, one row per product. The `with_allergens` suffix suggests inline HTML allergen markup, although none is visible in the values shown below; there, allergens appear in upper case, e.g. FRUMENTO, UOVA. Coverage is poor. 68% of the 50 rows are null, and the most common non-null value is the empty string (5 occurrences, 31% of the 16 present values), leaving only 11 genuine ingredient strings. These range from short lists (e.g. "patate, olio di girasole, sale marino.") to long compound declarations. One entry is garbled OCR text mixing several languages, so length and quality vary widely.

Treatment: Strip HTML allergen tags, treat empty strings as null, then tokenize for NLP or extract allergen flags as features.

anthropic:claude-opus-4-7 · confidence high
Out[632]:

saturn.columns["ingredients_text_with_allergens_it"].stats

stat value
n 50
nulls 34 (68.0%)
unique 12
top_value "" (empty string)
top_rate 0.3125
cardinality 12
entropy 3.274
entropy_ratio 0.9134
alert: long_tail (11 singleton categories)
alert: null_rate (68.0% null)
Fig 159.
Top values for ingredients_text_with_allergens_it.
Top values for ingredients_text_with_allergens_it (12 unique shown, of 12 total).
value count share
"" (empty string) 5 10.0%
Pasta di cacao, burro di cacao, cacao magro in polvere, zucchero. Può contenere nocciole, mandorle, altra frutta a guscio, latte, soia. 1 2.0%
crema alle NOCCIOLE e al cacao 40% (zucchero, olio di palma, NOCCIOLE 13%, LATTE Scremato in polvere 8.7%, cacao magro 7,4%, emulsionanti: lecitine (SOIA): vanillina), farina di FRUMENTO (32%), grassi vegetali (palma, palmisto), zucchero di canna (9%), LATTOSIO, crusca di FRUMENTO, LATTE intero in polvere, estratto in polvere di malto d'ORZO e mais, miele, agenti lievitanti (difosfato disodico. carbonato acido di ammonio, carbonato acido di sodio), cacao magro, sale, amido di FRUMENTO, farina di ORZO maltato, emulsionanti: lecitine (SOIA), vanillina. 1 2.0%
pasta di cacao, zucchero, burro di cacao, vaniglia 1 2.0%
patate, olio di girasole, sale marino. 1 2.0%
Pasta di cacao, cacao magro, burro di cacao, zucchero grezzo di canna, vaniglia. 1 2.0%
Farina integrale di segale (59 g), crusca di grano (27 g), fiocchi d'avena (12 g), semi di sesamo (7,0 g), germe di grano, sale. Può contenere tracce di latte. 1 2.0%
Farina di FRUMENTO, olio di palma, sciroppo di glucosio, estratto di malto d'ORZO, agenti lievitanti (carbonati di ammonio, carbonati di sodio), sale, UOVA, aroma, agente di trattamento della farina (METABISOLFITO di sodio). 1 2.0%
Pasta di cacao, zucchero, burro di cacao, vaniglia. 1 2.0%
Massa di cacao, zucchero, burro di cacao, emulsionante: lecitine (soia); estratto di vaniglia. Può contenere tracce di frutta a guscio e latte. Il 40% della massa di cacao proviene da piantagioni selezionate dell'Ecuador. 1 2.0%
wdrated potatoes, sunflower oll, wheat flour, corn lour.test NRC b ber otin. Emulgator (E471), Salz, Farbstoff (Annatto Norbirin, k hottom (BB). Packaged in a protective atmosphere, (DE) KNAEF Kam ef s1sel colorant (n0rbixine de rocou). Peut contenir lait, soja. À conse gie vepackt. (FR) SNACK SALE. INGREDIENTS: Pommes de terre disht SNCK SALATO. : Patate disidratate, olio di girasole, (arina d frmu botisiha d annatto). Puo contenere latte, sola. Da consumarsi prelerbilmetp SEL NGREDIENTES: Batatas desidratadas, óleo de girasol, farinha de trigo.(aimha d mh e o, Pode conter leite, soja. Consumir de preferëncia antes de: ver fundo (BB), Enbazhyer OHTS Pttas deshidratadas, aceite de qirasol, harina de trigo, harina de maiz, haia ca rm e eche, soja. Consumir preferentemente antes del: ver parte interior (8B), Enast et 'Releenc itle dn 100 g | RI" /30g| Eectsge/Ayt acuilo medo 84U bole / Prodoth te /30g ji begja /Valor energetico Tpas (Grassi/ Unjdos / Grasas tan eậticte Fetsäuren / dont 2214 kJ 664 kJ 530 kcal 159 kcal adulo medio / 8% 31g 3.0 9 9.3 0.9g 17g 13% Produoad by: see yd Aii dd cassi satui / dos quais Producido por urdes thtrde | Glucites | 5% oidrati / MedaCoyK Sabd 55g 7% Uont sucres /di eui *FRSCAME QNg 1 2.0%
25% noci, 25% mandorle, 25% uva sultanina (99,5% uva sultanina, olio di semi di girasole), 25% mirtilli rossi americani, essiccati e zuccherati (60% mirtilli rossi americani, 39% zucchero, olio di semi di girasole). Può contenere tracce di altra frutta a guscio e arachidi. Confezionato in atmosfera protettiva. 1 2.0%
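The sketch below illustrates the cleaning step for `ingredients_text_with_allergens_it`. It assumes Open Food Facts-style `<span class="allergen">...</span>` markup. The top-values table shows no markup, so check the pattern against a raw value before relying on it. The sample string is hypothetical and is modelled on one of the Italian entries.

```python
import re

# Assumed Open Food Facts-style allergen markup; verify against a raw value.
ALLERGEN_SPAN = re.compile(r'<span class="allergen">(.*?)</span>', re.IGNORECASE)
ANY_TAG = re.compile(r"<[^>]+>")


def clean_ingredients(raw):
    """Return (plain text, sorted allergen list); blank strings count as missing."""
    if raw is None or not raw.strip():
        return None, []
    allergens = {a.strip().lower() for a in ALLERGEN_SPAN.findall(raw)}
    text = ANY_TAG.sub("", raw).strip()
    return text, sorted(allergens)


sample = 'Farina di <span class="allergen">FRUMENTO</span>, sale, <span class="allergen">UOVA</span>.'
print(clean_ingredients(sample))
# ('Farina di FRUMENTO, sale, UOVA.', ['frumento', 'uova'])
```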

packaging_lc categorical metadata

This column appears to be the language code of the packaging text, with 7 distinct ISO-style codes across 50 rows. French and English tie at 17 occurrences each. The reported top_rate of 0.386 (17 of 44 non-null rows) reflects 'fr' being picked as top_value from that tie. German follows at 5 and Portuguese at 2, with Italian, Spanish, and Croatian as singletons. A 12% null rate and an entropy ratio of 0.71 indicate moderate diversity but a clear FR/EN dominance.

Treatment: Treat as a low-cardinality categorical; impute nulls and one-hot encode, optionally collapsing rare codes into 'other'.

anthropic:claude-opus-4-7 · confidence high
Out[635]:

saturn.columns["packaging_lc"].stats

stat value
n 50
nulls 6 (12.0%)
unique 7
top_value fr
top_rate 0.3864
cardinality 7
entropy 1.992
entropy_ratio 0.7094
Fig 160.
Top values for packaging_lc.
Top values for packaging_lc (7 unique shown, of 7 total).
value count share
fr 17 34.0%
en 17 34.0%
de 5 10.0%
pt 2 4.0%
it 1 2.0%
es 1 2.0%
hr 1 2.0%

correctors_tags unknown other

The column `correctors_tags` was skipped by the profiler, so no type, uniqueness, or distributional statistics are available beyond a row count of 50 and a null rate of 0.0. The name suggests it holds tags identifying correctors, plausibly a list- or set-valued field that the dissector could not coerce into a known kind. Without further stats, nothing can be said about cardinality, value mix, or skew.

Treatment: Re-profile after parsing into a primitive type (e.g., explode tags into strings) before deciding on use.

anthropic:claude-opus-4-7 · confidence low
Out[638]:

saturn.columns["correctors_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique
alert: skipped (no profiler for kind=unknown)

interface_version_created categorical metadata

This column appears to record the interface version in use when each record was created. It is encoded as a YYYYMMDD date stamp with an optional `.jqm` suffix (plausibly jQuery Mobile). Only 3 distinct values appear across 50 rows. '20120622' dominates with 29 rows (59.2% of non-null rows), '20150316.jqm' follows with 18, and '20130323.jqm' appears just twice. The entropy ratio of 0.74 indicates moderate concentration, and 1 row (2%) is null.

Treatment: Treat as a low-cardinality categorical; one-hot encode or bucket the rare '20130323.jqm' level.

anthropic:claude-opus-4-7 · confidence high
Out[640]:

saturn.columns["interface_version_created"].stats

stat value
n 50
nulls 1 (2.0%)
unique 3
top_value 20120622
top_rate 0.5918
cardinality 3
entropy 1.167
entropy_ratio 0.7363
Fig 161.
Top values for interface_version_created.
Top values for interface_version_created (3 unique shown, of 3 total).
value count share
20120622 29 58.0%
20150316.jqm 18 36.0%
20130323.jqm 2 4.0%
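For `interface_version_created`, splitting the stamp from the suffix gives a parseable date plus a boolean flag. That may be more useful than three opaque categories. This is a minimal sketch. It assumes the numeric part is always YYYYMMDD, which holds for all three observed values.

```python
from datetime import datetime


def parse_interface_version(raw):
    """Split a value like "20150316.jqm" into (ISO date string, has_jqm_suffix).

    Assumes the numeric part is a YYYYMMDD stamp. None (missing) gives
    (None, False).
    """
    if raw is None:
        return None, False
    stamp, _, suffix = str(raw).partition(".")
    day = datetime.strptime(stamp, "%Y%m%d").date()
    return day.isoformat(), suffix == "jqm"


for value in ["20120622", "20150316.jqm", "20130323.jqm", None]:
    print(value, parse_interface_version(value))
```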

states_tags unknown other

This column was skipped by the profiler, so no statistics beyond row count (50) and a 0.0 null rate are available. The name suggests it holds tag-like state markers, possibly multi-valued, but kind is reported as unknown and uniqueness is not measured. Without sampled values or cardinality, its content and structure cannot be characterised from the evidence.

Treatment: Re-profile with parsing enabled (likely a delimited tag list) before deciding whether to one-hot or drop.

anthropic:claude-opus-4-7 · confidence low
Out[643]:

saturn.columns["states_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique
alert: skipped (no profiler for kind=unknown)

nutriscore_2021_tags unknown feature

This column is labelled nutriscore_2021_tags, suggesting Nutri-Score grade tags from a 2021 reference (typically values like a/b/c/d/e). Saturn skipped profiling, so no distribution, uniqueness, or value statistics are available beyond a row count of 50 and a 0.0 null rate. No further signal can be extracted without re-profiling.

Treatment: Re-profile or inspect manually; if confirmed categorical, treat as an ordinal Nutri-Score grade.

anthropic:claude-opus-4-7 · confidence low
Out[645]:

saturn.columns["nutriscore_2021_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique
alert: skipped (no profiler for kind=unknown)

stores_tags unknown other

This column was skipped by the profiler, so no type, uniqueness, or value statistics are available beyond a row count of 50 and a null rate of 0.0. The name suggests a multi-valued tag field associated with stores (likely a list or delimited string), but this cannot be confirmed from the evidence.

Treatment: Re-profile after parsing as a list/array to determine cardinality and tag distribution before use.

anthropic:claude-opus-4-7 · confidence low
Out[647]:

saturn.columns["stores_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique
alert: skipped (no profiler for kind=unknown)

image_thumb_url categorical metadata

This column holds Open Food Facts product thumbnail URLs, one per row. Every one of the 50 values is unique (entropy_ratio 1.0, top_rate 0.02), so it acts as a per-row asset pointer rather than a categorical feature. URLs mix `front_fr` and `front_en` locale suffixes, hinting at a French/English product mix.

Treatment: Drop for modelling; retain only as a display link or for image-fetching pipelines.

anthropic:claude-opus-4-7 · confidence high
Out[649]:

saturn.columns["image_thumb_url"].stats

stat value
n 50
nulls 0 (0.0%)
unique 50
top_value https://images.openfoodfacts.org/images/products/611/124/210/0992/front_fr.172.100.jpg
top_rate 0.02
cardinality 50
entropy 5.644
entropy_ratio 1
alert: long_tail (50 singleton categories)
Fig 162.
Top values for image_thumb_url.
Top values for image_thumb_url (20 unique shown, of 50 total).
value count share
https://images.openfoodfacts.org/images/products/611/124/210/0992/front_fr.172.100.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/762/221/044/9283/front_en.605.100.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/304/692/002/9759/front_en.492.100.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/611/103/100/5064/front_fr.56.100.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/317/568/001/1480/front_en.221.100.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/000/002/099/5553/front_en.314.100.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/326/884/000/1008/front_fr.422.100.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/336/260/001/1044/front_fr.50.100.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/842/519/771/2024/front_en.60.100.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/762/221/057/8464/front_en.29.100.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/611/125/934/3108/front_fr.25.100.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/336/260/001/1228/front_fr.38.100.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/800/050/031/0427/front_fr.488.100.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/730/040/048/1595/front_fr.242.100.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/304/692/002/2651/front_en.159.100.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/506/004/264/1000/front_en.179.100.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/762/221/058/4724/front_en.95.100.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/304/692/002/2606/front_en.102.100.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/322/982/010/0234/front_fr.246.100.jpg 1 2.0%
https://images.openfoodfacts.org/images/products/000/002/002/2464/front_en.301.100.jpg 1 2.0%
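If the thumbnails are kept for display or image fetching, a regex can pull out the locale suffix mentioned above. The pattern is inferred from the 20 URLs listed here, not from any documented URL scheme. The digit groups in the path look like a product code split into chunks, but that reading is an assumption.

```python
import re

# Inferred from the URLs above: .../products/<digit groups>/front_<lang>.<n>.100.jpg
THUMB_RE = re.compile(r"/products/([\d/]+)/front_([a-z]{2})\.\d+\.\d+\.jpg$")


def thumb_parts(url):
    """Return (joined path digits, locale) for a thumbnail URL, or (None, None)."""
    match = THUMB_RE.search(url)
    if match is None:
        return None, None
    return match.group(1).replace("/", ""), match.group(2)


url = "https://images.openfoodfacts.org/images/products/611/124/210/0992/front_fr.172.100.jpg"
print(thumb_parts(url))  # ('6111242100992', 'fr')
```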

categories_properties unknown other

This column was skipped by the profiler, so its type, cardinality, and value distribution are unknown beyond a count of 50 rows with no nulls. The name `categories_properties` suggests a nested or structured field (e.g., a list or dict of category attributes) that the profiler could not coerce into a scalar kind. Without parsed contents there is nothing further to infer.

Treatment: Inspect raw values and parse the nested structure (explode or flatten) before profiling again.

anthropic:claude-opus-4-7 · confidence low
Out[652]:

saturn.columns["categories_properties"].stats

stat value
n 50
nulls 0 (0.0%)
unique
alert: skipped (no profiler for kind=unknown)

nucleotides_prev_tags unknown other

Saturn skipped profiling for this column, so its type and contents are unknown beyond a row count of 50 and a null rate of 0.0. In Open Food Facts naming, it likely holds tags for added nucleotides (a class of nutritional additives) computed with a previous taxonomy version. It may be a list or nested structure that the profiler could not introspect. No uniqueness, distribution, or value statistics are available to characterise it further.

Treatment: Inspect raw values manually to determine structure before deciding on parsing or encoding.

anthropic:claude-opus-4-7 · confidence low
Out[654]:

saturn.columns["nucleotides_prev_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique
alert: skipped (no profiler for kind=unknown)

allergens_from_ingredients categorical feature

Free-text allergen list parsed from ingredient strings. It mixes Open Food Facts taxonomy codes (en:gluten, en:milk, en:soybeans) with raw multilingual tokens (blé, lait, NOISETTES, соеви). 30% of the 50 rows are empty strings. The remaining 34 distinct values are nearly all singletons (33 singleton categories), and many cells repeat the same token several times. This is dirty list-encoded data rather than a clean category.

Treatment: Split on commas, normalize to en: taxonomy codes, dedupe tokens, then multi-hot encode.

anthropic:claude-opus-4-7 · confidence high
Out[656]:

saturn.columns["allergens_from_ingredients"].stats

stat value
n 50
nulls 0 (0.0%)
unique 35
top_value "" (empty string)
top_rate 0.3
cardinality 35
entropy 4.432
entropy_ratio 0.864
alert: long_tail (33 singleton categories)
Fig 163.
Top values for allergens_from_ingredients.
Top values for allergens_from_ingredients (20 unique shown, of 35 total).
value count share
"" (empty string) 15 30.0%
en:gluten, froment 2 4.0%
en:milk, en:milk, cream, banana 1 2.0%
en:milk, en:milk, en:soybeans, en:gluten, en:gluten, en:gluten, blé, blé complet, lécithines de soja, lait 1 2.0%
en:milk, Lécithine de soja, lait, blé, gluten, soja 1 2.0%
en:gluten, en:gluten, en:gluten, en:sesame-seeds, en:gluten, blé 1 2.0%
соеви 1 2.0%
en:soybeans, en:nuts, en:milk, almonds, soya lecithin 1 2.0%
en:milk, en:milk, en:gluten, froment, lait, lactosérum 1 2.0%
en:soybeans, en:gluten, en:gluten, en:gluten, en:milk, en:gluten, en:gluten, en:soybeans, en:milk, en:nuts, NOISETTES , NOISETTES , LAIT , SOJA, FROMENT , BLE, LACTOSE, BLE, LAIT , ORGE , ORGE , FROMENT, SOJA, NOISETTES, NOISETTES, LAIT, SOJA, FROMENT, BLE, LACTOSE, BLE, LAIT, ORGE, ORGE, FROMENT, SOJA 1 2.0%
SEIGLE, SEIGLE, SEIGLE, SEIGLE 1 2.0%
en:milk, en:gluten, en:gluten, blé* 1 2.0%
soya 1 2.0%
en:gluten, en:sesame-seeds, en:gluten, SEIGLE , BLÉ , GRAINES DE SÉSAME , BLÉ, SEIGLE, BLÉ, GRAINES DE SÉSAME, BLÉ 1 2.0%
en:soybeans, en:soybeans, en:gluten, blé, lécithine de soja 1 2.0%
en:eggs, en:gluten, en:gluten, wheat flour, eggs 1 2.0%
en:soybeans, en:soybeans, en:milk, en:milk, en:milk, en:gluten, Poudre de lait, Lécithine de soja 1 2.0%
en:gluten, en:gluten, en:gluten, en:nuts, en:gluten, blé, noisettes, blé, orge, blé 1 2.0%
en:soybeans 1 2.0%
en:milk, en:nuts, ЛЕШНИЦИ, СОЯ 1 2.0%
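A first step toward the normalise-and-dedupe treatment for `allergens_from_ingredients` is to separate the `en:` taxonomy codes from the raw tokens in each cell. This is a minimal sketch with a hypothetical helper name. Mapping raw tokens such as 'blé', 'SEIGLE', or 'соеви' onto `en:` codes would need a separate translation table, which is not attempted here.

```python
def split_allergens(raw):
    """Split an allergen string into (en: taxonomy codes, other raw tokens).

    Both parts are de-duplicated and sorted; raw tokens are lower-cased so
    that "NOISETTES" and "noisettes" collapse together.
    """
    codes, raw_tokens = set(), set()
    for part in (raw or "").split(","):
        token = part.strip()
        if not token:
            continue
        if token.startswith("en:"):
            codes.add(token)
        else:
            raw_tokens.add(token.lower())
    return sorted(codes), sorted(raw_tokens)


print(split_allergens("en:gluten, en:gluten, en:gluten, en:sesame-seeds, en:gluten, blé"))
# (['en:gluten', 'en:sesame-seeds'], ['blé'])
```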

ingredients_text_with_allergens_fi categorical free_text

Finnish-language ingredient text, following the pattern of the multilingual ingredient fields common in Open Food Facts. The `with_allergens` suffix suggests inline allergen markup, though none is visible in the values shown below; one entry marks allergens in upper case instead. Coverage is extremely thin. The null rate is 0.9, and only 4 distinct values exist across n=50. The empty string is the top_value (2 rows, top_rate 0.4 of the 5 non-null rows). The 3 non-empty entries are long free-text ingredient lists rather than clean tokens.

Treatment: Strip HTML tags and tokenize for allergen extraction; otherwise drop, since 90% are null.

anthropic:claude-opus-4-7 · confidence high
Out[659]:

saturn.columns["ingredients_text_with_allergens_fi"].stats

stat value
n 50
nulls 45 (90.0%)
unique 4
top_value "" (empty string)
top_rate 0.4
cardinality 4
entropy 1.922
entropy_ratio 0.961
alert: long_tail (3 singleton categories)
alert: null_rate (90.0% null)
Fig 164.
Top values for ingredients_text_with_allergens_fi.
Top values for ingredients_text_with_allergens_fi (4 unique shown, of 4 total).
value count share
"" (empty string) 2 4.0%
kaakaomassa, kaakaovoi, vähärasvainen kaakaojauhe, sokeri, vanilja. Saattaa sisältää hasselpähkinää, muita pähkinöitä, maitoa, soijaa. Tummassa suklaassa kaakaota vähintään 90%. 1 2.0%
kaakaomassa, vähärasvainen kaakaojauhe, kaakaovoi, sokeri, emulgointiaine (soijalesitiini), vaniljauute. Suklaassa kaakaota vähintään 85 %. Saattaa sisältää pieniä määriä pähkinää ja maitoa. 1 2.0%
VEHNÄJAUHO, palmuöljy, tärkkelyssiirappi, OHRAMALLASUUTE, nostatusaineet ammoniumkarbonaatit, natriumkarbonaatit), suola, KANANMUNAT, aromi, jauhonparanne (NATRIUMDISULFIITTI). 1 2.0%

_keywords unknown other

The column `_keywords` was skipped by the profiler, so kind is unknown and no statistics (n_unique, value distribution, length, etc.) are available. The only confirmed signals are 50 rows with a 0.0 null rate. Without further evidence the content and structure cannot be characterised — the name suggests a keyword list, but this is not verified.

Treatment: Re-profile with appropriate parser (likely list/tokenized text) before deciding usage.

anthropic:claude-opus-4-7 · confidence low
Out[662]:

saturn.columns["_keywords"].stats

stat value
n 50
nulls 0 (0.0%)
unique n/a
alert: skipped (no profiler for kind=unknown)
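Before re-profiling, it helps to see which Python types the raw `_keywords` cells actually hold, since that decides which parser to use. A minimal sketch, assuming the source file loads as a JSON array of product objects (its layout is not shown in this report); `raw_type_counts` and `load_records` are hypothetical helpers, not saturn functions.

```python
import json
from collections import Counter


def raw_type_counts(records, column):
    """Count the Python type of each cell in `column`; absent keys count as 'missing'."""
    counts = Counter()
    for row in records:
        if column not in row:
            counts["missing"] += 1
        else:
            counts[type(row[column]).__name__] += 1
    return counts


def load_records(path):
    # Assumption: the file is a JSON array with one object per product.
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


# Usage against the file named in the Overview:
# records = load_records("/home/coolhand/html/datavis/data_trove/cache/wild/openfoodfacts_sample.json")
# print(raw_type_counts(records, "_keywords"))
```

A result dominated by `list` would point to a list-aware profiler; `str` would point to delimiter parsing.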

manufacturing_places categorical free_text

Free-text manufacturing locations, mostly country names but mixed with multi-token strings combining company names, cities, regions, and postal codes. The empty string is the modal value (20 rows; top_rate 0.408 of the 49 non-null values), with France a distant second at 9. Across 20 unique values, the entropy_ratio of 0.737 and the long_tail alert (16 singletons) reflect scattered, inconsistent formatting: 'France,Italie' sits next to full German address chains, country names appear in several languages (Allemagne, Deutschland, Zweden), casing varies ('Maroc' vs 'maroc'), and at least one value ('Biscuits') is not a place at all.

Treatment: Normalise blanks to null and parse/standardise to country tokens before using as a feature.

anthropic:claude-opus-4-7 · confidence high
Out[664]:

saturn.columns["manufacturing_places"].stats

stat value
n 50
nulls 1 (2.0%)
unique 20
top_value (empty string)
top_rate 0.4082
cardinality 20
entropy 3.187
entropy_ratio 0.7374
alert: long_tail (16 singleton categories)
Fig 165.
Top values for manufacturing_places.
Top values for manufacturing_places (20 unique shown, of 20 total).
value count share
(empty string) 20 40.0%
France 9 18.0%
Maroc 2 4.0%
Espagne 2 4.0%
Aachen 1 2.0%
France,Italie 1 2.0%
Barilla Sverige AB,682 82,Filipstad,Zweden 1 2.0%
United Kingdom 1 2.0%
France,Oloron-sainte-marie 64400 1 2.0%
Übach-Palenberg,Heinsberg (Kreis),Köln (Regierungsbezirk),Nordrhein-Westfalen,Deutschland 1 2.0%
Barilla Deutschland GmbH,Wasastrasze 10,29229,Celle,Allemagne 1 2.0%
Biscuits 1 2.0%
maroc 1 2.0%
Peaugres 07340 1 2.0%
Tanger,Maroc 1 2.0%
Rausch Schokoladen GmbH,Peine (Landkreis),Niedersachsen,Deutschland 1 2.0%
Revel (31250),Annoray,France 1 2.0%
Allemagne 1 2.0%
85150,Vendée,France,Pays de la Loire,La Mothe Achard 1 2.0%
Belgique 1 2.0%
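The manufacturing_places treatment (blanks to null, then standardise to country tokens) could look like the sketch below. The alias table is hypothetical and covers only the country spellings visible in the table above. A real pipeline would need a fuller gazetteer. Tokens with no match (cities, postcodes, company names, 'Biscuits') are simply ignored here.

```python
import pandas as pd

# Hypothetical alias table built from spellings seen in this sample (French, German, Dutch, English).
COUNTRY_ALIASES = {
    "france": "FR",
    "italie": "IT",
    "maroc": "MA",
    "espagne": "ES",
    "allemagne": "DE",
    "deutschland": "DE",
    "zweden": "SE",
    "belgique": "BE",
    "united kingdom": "GB",
}


def to_country_codes(cell):
    """Map one comma-separated manufacturing_places cell to a sorted list of country codes."""
    if cell is None or pd.isna(cell) or not str(cell).strip():
        return None  # blanks and nulls both become missing
    tokens = (t.strip().lower() for t in str(cell).split(","))
    codes = {COUNTRY_ALIASES[t] for t in tokens if t in COUNTRY_ALIASES}
    return sorted(codes) or None  # no recognisable country also becomes missing


places = pd.Series(["", "France", "maroc", "Tanger,Maroc", "France,Italie", "Biscuits", None])
print([to_country_codes(p) for p in places])
```

Lower-casing before lookup is what merges 'Maroc' and 'maroc'.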

pnns_groups_2 categorical label

This is a food sub-category label (PNNS group 2), with 11 distinct values across 50 rows and no nulls. The distribution is heavily concentrated in sweets: 'Biscuits and cakes' (17) and 'Chocolate products' (16) together account for 33 of 50 rows. The former alone gives the top_rate of 0.34, and the overall entropy_ratio is 0.75. Two rows carry the literal value 'unknown', which should be treated as missing rather than as a real category.

Treatment: One-hot or target-encode after recoding 'unknown' to null.

anthropic:claude-opus-4-7 · confidence high
Out[667]:

saturn.columns["pnns_groups_2"].stats

stat value
n 50
nulls 0 (0.0%)
unique 11
top_value Biscuits and cakes
top_rate 0.34
cardinality 11
entropy 2.599
entropy_ratio 0.7513
Fig 166.
Top values for pnns_groups_2.
Top values for pnns_groups_2 (11 unique shown, of 11 total).
value count share
Biscuits and cakes 17 34.0%
Chocolate products 16 32.0%
Appetizers 4 8.0%
Pastries 3 6.0%
Bread 2 4.0%
unknown 2 4.0%
Sweets 2 4.0%
Dairy desserts 1 2.0%
Waters and flavored waters 1 2.0%
Cereals 1 2.0%
Dried fruits 1 2.0%
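A minimal pandas sketch of the pnns_groups_2 treatment: recode the placeholder 'unknown' to a real missing value, then one-hot encode. The five input labels are toy rows, not the actual 50. `dummy_na=True` is a design choice that keeps an explicit indicator for the recoded rows instead of silently zeroing them. Target encoding, the other option named in the treatment, is not shown.

```python
import numpy as np
import pandas as pd

groups = pd.Series(["Biscuits and cakes", "Chocolate products", "unknown", "Bread", "Chocolate products"])

# Recode the placeholder label to a real missing value before encoding.
clean = groups.replace("unknown", np.nan)

# One 0/1 column per category, plus pnns2_nan flagging the recoded rows.
onehot = pd.get_dummies(clean, prefix="pnns2", dummy_na=True, dtype=int)
print(onehot)
```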

ingredients_text_pl categorical free_text

Polish-language ingredient text for food products, almost entirely absent in this sample: 90% of rows are null and of the 5 non-null rows, 3 are empty strings, leaving only 2 genuine ingredient lists (both for cocoa-based chocolate). With n_unique=3 across 50 rows, this column carries virtually no usable signal here.

Treatment: Drop unless Polish-language analysis is required; too sparse to model.

anthropic:claude-opus-4-7 · confidence high
Out[670]:

saturn.columns["ingredients_text_pl"].stats

stat value
n 50
nulls 45 (90.0%)
unique 3
top_value (empty string)
top_rate 0.6
cardinality 3
entropy 1.371
entropy_ratio 0.865
alert: long_tail (2 singleton categories)
alert: null_rate (90.0% null)
Fig 167.
Top values for ingredients_text_pl.
Top values for ingredients_text_pl (3 unique shown, of 3 total).
value count share
(empty string) 3 6.0%
Miazga kakaowa, cukier, tłuszcz kakaowy, kakao w proszku o obniżonej zawartości tłuszczu, emulgator: lecytyny (soja); naturalny aromat waniliowy. Czekolada gorzka: masa kakaowa minimum 74 %. Może zawierać orzeszki ziemne, orzechy, mleko i gluten (pszenica, żyt jęczmień, owies, pszenica orkisz i pszenica khorosan). 1 2.0%
Miazga kakaowa, cukier, tłuszcz kakaowy, wanilia. 1 2.0%
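The 90% null rate for ingredients_text_pl understates how empty the column is, because blank strings count as present. A small sketch of an "effective" missing rate that also treats whitespace-only strings as missing. `effective_missing_rate` is a hypothetical helper. The series below only rebuilds the column's shape from the reported counts (45 nulls, 3 blanks, 2 real lists, shown as placeholders).

```python
import pandas as pd


def effective_missing_rate(series):
    """Share of rows that are null OR a whitespace-only string."""
    as_text = series.astype("string")
    blank = as_text.str.strip().eq("").fillna(False).astype(bool)
    missing = series.isna() | blank
    return float(missing.mean())


pl = pd.Series([None] * 45 + [""] * 3 + ["<ingredient list 1>", "<ingredient list 2>"])
print(effective_missing_rate(pl))  # 48 of 50 rows carry nothing usable
```

Applied across all the `*_xx` localized columns, this rate gives a fairer drop threshold than null_rate alone.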

generic_name_es categorical metadata

Spanish-language generic product name, sparsely populated: 60% of the 50 rows are null and only 7 unique values appear. The empty string is the modal value (13 of the 20 non-null rows, top_rate 0.65), so only 7 rows hold a real name and usable coverage is even thinner than the null rate alone implies. Those names skew toward dark chocolate descriptions (e.g., 'Chocolate negro' appears twice, and several variants cite cacao percentages). This is consistent with the chocolate-heavy pnns_groups_2 mix, though 7 rows are too few to support conclusions on their own.

Treatment: Treat empty strings as nulls and drop or backfill from a canonical product-name field before use.

anthropic:claude-opus-4-7 · confidence high
Out[673]:

saturn.columns["generic_name_es"].stats

stat value
n 50
nulls 30 (60.0%)
unique 7
top_value (empty string)
top_rate 0.65
cardinality 7
entropy 1.817
entropy_ratio 0.6471
alert: long_tail (5 singleton categories)
alert: null_rate (60.0% null)
Fig 168.
Top values for generic_name_es.
Top values for generic_name_es (7 unique shown, of 7 total).
value count share
(empty string) 13 26.0%
Chocolate negro 2 4.0%
Chocolate negro con un 74% de cacao mínimo 1 2.0%
Crackers 1 2.0%
Tableta de chocolate negro extrafino con 70% de cacao 1 2.0%
Tableta de chocolate negro Ecuador con un 70% de cacao mínimo 1 2.0%
Chocolate Negro 99% 1 2.0%
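The generic_name_es treatment (blanks to null, then backfill) can be sketched with pandas as below. The fallback column name `generic_name` is an assumption, since this report does not profile a language-neutral generic name field. Substitute whichever canonical product-name column the pipeline trusts. The four rows are toy data.

```python
import pandas as pd

df = pd.DataFrame({
    "generic_name_es": ["Chocolate negro", "", None, "  "],
    "generic_name": ["Dark chocolate", "Crackers", "Dark chocolate 70%", None],  # assumed fallback column
})

# Treat blank or whitespace-only strings as missing.
es = df["generic_name_es"]
es = es.mask(es.str.strip().eq(""))

# Fill the gaps from the fallback column; rows missing in both stay missing.
df["generic_name_es_filled"] = es.fillna(df["generic_name"])
print(df["generic_name_es_filled"].tolist())
```

Note that backfilling mixes languages into a Spanish-named column, so keep a flag if language purity matters downstream.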

origin_en categorical metadata

Categorical column likely intended to record country of origin in English, but it is effectively empty: 7 of 50 rows (14%) are null, 42 of the remaining 43 are blank strings, and a single row holds a real label ("France"). Cardinality is 2, with a top_rate of 0.977 and an entropy_ratio of 0.16, so the field carries almost no information.

Treatment: Drop; the column is near-constant with blanks and only one real value.

anthropic:claude-opus-4-7 · confidence high
Out[676]:

saturn.columns["origin_en"].stats

stat value
n 50
nulls 7 (14.0%)
unique 2
top_value (empty string)
top_rate 0.9767
cardinality 2
entropy 0.1594
entropy_ratio 0.1594
alert: imbalance (top value is 97.7% of rows)
Fig 169.
Top values for origin_en.
Top values for origin_en (2 unique shown, of 2 total).
value count share
(empty string) 42 84.0%
France 1 2.0%
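The entropy and entropy_ratio figures in these tables match Shannon entropy in bits over the non-null values, divided by log2 of the cardinality. That reading is inferred from the reported numbers rather than documented here. A minimal sketch reproduces origin_en's stats from its two counts (42 blanks, 1 "France") and pnns_groups_2's from its eleven.

```python
import math


def entropy_stats(counts):
    """Shannon entropy (bits) of a frequency list and its ratio to the maximum, log2(k)."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    k = len(probs)
    ratio = entropy / math.log2(k) if k > 1 else 0.0  # a single category carries no information
    return entropy, ratio


print(entropy_stats([42, 1]))                              # origin_en: about (0.1594, 0.1594)
print(entropy_stats([17, 16, 4, 3, 2, 2, 2, 1, 1, 1, 1]))  # pnns_groups_2: about (2.599, 0.7513)
```

A ratio near 0, as for origin_en, means one value dominates. A ratio near 1, as for teams or product_name_fr, means the values are close to uniformly spread.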

generic_name_it categorical metadata

Italian generic product name, a localized label field on food items. 68% of rows are null, and among the 16 non-null entries the most common value is the empty string (11 occurrences). That leaves 5 rows carrying only 4 distinct real names, such as 'Cioccolato extra fondente' (twice) and 'Crackers'. Coverage is too sparse to be useful as-is.

Treatment: Drop or defer until Italian coverage improves; not usable at 68% null.

anthropic:claude-opus-4-7 · confidence high
Out[679]:

saturn.columns["generic_name_it"].stats

stat value
n 50
nulls 34 (68.0%)
unique 5
top_value (empty string)
top_rate 0.6875
cardinality 5
entropy 1.497
entropy_ratio 0.6446
alert: long_tail (3 singleton categories)
alert: null_rate (68.0% null)
Fig 170.
Top values for generic_name_it.
Top values for generic_name_it (5 unique shown, of 5 total).
value count share
(empty string) 11 22.0%
Cioccolato extra fondente 2 4.0%
Cioccolato fondente 90% 1 2.0%
Prodotto da forno con segale ricco di fibre alimentari 1 2.0%
Crackers 1 2.0%

ingredients_that_may_be_from_palm_oil_n numeric feature

Count of ingredients per product that may be derived from palm oil. Values are extremely concentrated at zero (zero_rate 0.83; median, q1 and q3 all 0), with only 3 distinct values and a max of 2. Because the IQR is 0, the 1.5 IQR fences collapse to zero, so all 8 non-zero products (17% of non-null rows) are flagged as outliers. The skew of 2.23 reflects the same short right tail. An 8% null rate also means some products lack this assessment entirely.

Treatment: Binarise to zero/non-zero or drop, since the column is near-constant with heavy skew.

anthropic:claude-opus-4-7 · confidence high
Out[682]:

saturn.columns["ingredients_that_may_be_from_palm_oil_n"].stats

stat value
n 50
nulls 4 (8.0%)
unique 3
min 0
max 2
mean 0.1957
median 0
std 0.4531
q1 0
q3 0
iqr 0
skew 2.23
kurtosis 4.321
n_outliers 8
outlier_rate 0.1739
zero_rate 0.8261
alert: high_skew (skew=+2.23)
alert: outliers (17.4% of rows beyond 1.5 IQR)
Fig 171.
Distribution of ingredients_that_may_be_from_palm_oil_n. Vertical dash marks the median.
Histogram bins for ingredients_that_may_be_from_palm_oil_n (median: 0.0).
bin count
0 – 0.3333 38
0.3333 – 0.6667 0
0.6667 – 1 0
1 – 1.333 7
1.333 – 1.667 0
1.667 – 2 1
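The outlier alert for ingredients_that_may_be_from_palm_oil_n is an artefact of the 1.5 IQR rule on a zero-heavy count. The sketch below rebuilds the 46 non-null values from the histogram (38 zeros, 7 ones, 1 two) and shows with NumPy that the fences collapse to [0, 0]. It then applies the suggested binarisation. The exact quantile method saturn uses is not stated, but here every method gives 0 for both quartiles.

```python
import numpy as np

# Non-null values reconstructed from the histogram bins: 38 zeros, 7 ones, 1 two.
values = np.array([0] * 38 + [1] * 7 + [2])

q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = (values < low) | (values > high)
print(q1, q3, iqr, int(outliers.sum()), round(float(outliers.mean()), 4))

# Binarise: does the product contain any ingredient that may come from palm oil?
may_contain_palm = (values > 0).astype(int)
print(int(may_contain_palm.sum()))
```

The binary flag keeps exactly the information the outlier rule was mislabelling, namely which 8 products have any palm-oil candidate.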

ingredients_text_es categorical free_text

Spanish-language ingredient lists for food products, stored as free text. Of 50 rows, 60% are null and another 8 entries (top_rate 0.4 of the 20 non-null values) are empty strings, leaving 12 populated values, all distinct. These are mostly chocolate and cereal formulations, some with underscore allergen markers like _TRIGO_ and _HUEVO_, and at least one entry is actually German ('Zucker, Kakaobutter, ...'). The 13-value cardinality and high entropy ratio (0.84) reflect that nearly every non-empty entry is unique long-form prose, not a true category.

Treatment: Treat as multilingual free text: tokenize/embed or parse into ingredient lists; do not use as a categorical feature.

anthropic:claude-opus-4-7 · confidence high
Out[685]:

saturn.columns["ingredients_text_es"].stats

stat value
n 50
nulls 30 (60.0%)
unique 13
top_value (empty string)
top_rate 0.4
cardinality 13
entropy 3.122
entropy_ratio 0.8437
alert: long_tail (12 singleton categories)
alert: null_rate (60.0% null)
Fig 172.
Top values for ingredients_text_es.
Top values for ingredients_text_es (13 unique shown, of 13 total).
value count share
(empty string) 8 16.0%
Pasta de cacao, manteca de cacao, cacao magro en polvo, azúcar, vainilla. 1 2.0%
Azúcar, Grasa vegetal de palmiste parcialmente hidrogenada, Leche en polvo, Almendras, Cacao desgrasado en polvo, suero lácteo en polvo, Emulgente (lecitina de soja), aroma (vainilla). 1 2.0%
Crema de avellanas y cacao 40% (azúcar, manteca de palma, avellanas 13%, leche desnatada en polvo 8,7%, cacao desgrasado 7.4%, emulgentes (lecitinas (soja), vainillina), harina de trigo 32,5%, grasas vegetales (palma, palmiste), azúcar de caña 8,5% (trigo), lactosa, salvado de trigo, leche entera en polvo, extracto en polvo de malta de cebada y maíz, miel, gasificantes (difosfato disódico, carbonato ácido de sodio, carbonato ácido de amonio), cacao desgrasado, sal, almidón de trigo, harina de cebada, malteada, emulsionantes (lecitinas (soja), vainillina. 1 2.0%
70% pasta de cacao*, azúcar, rnanteca de cacao, cacao desgrasado en polvo, emulgente: lecitlna de girasol (E-322), aroma natural de vainilla. *Pasta de cacao Ralnforest Alliance Certified cocoa. Cacao: 74% mínimo. 1 2.0%
Harina de _TRIGO_, grasa de palma, extracto de malta de _CEBADA_, gasificantes (carbonatos de amonio, carbonatos de sodio), sal, _HUEVO_, aroma, agente de tratamiento de la harina (_METABISULFITO_ sódico). 1 2.0%
Pasta de cacao, azúcar, manteca de cacao, vainilla. 1 2.0%
Pasta de cacao, azúcar, manteca de cacao, emulgente: lecitina de girasol (E-322), extracto de vainilla. Cacao: 70% mínimo. 1 2.0%
Copos de avena integral (60%),azúcar, aceite refinado de girasol, miel (3%), sal, melaza de caña, emulgente (lecitina de girasol), gasificante (carbonato ácido de sodio), 1 2.0%
Pasta de cacao, cacao magro, manteca de cacao, azúcar moreno de caña 1 2.0%
Zucker, Kakaobutter, Magermilchpulver, Kakaomasse, Molkenpulver (Milch), Butterreinfett, Emulgator (Sojalecithin), Haselnusspaste, natürliches Aroma 1 2.0%
pasta de cacao, azúcar, manteca de cacao, emulgente (lecitina de _soja_), vainilla. Cacao: 70% mínimo. 1 2.0%
Pasta de cacao, cacao desgrasado en polvo, manteca de cacao, azúcar, leche en polvo, pasta de almendras y avellanas, emulgentes (lecitinas de soja, girasol), aroma 1 2.0%
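For the "parse into ingredient lists" route suggested for ingredients_text_es, a minimal sketch follows. It assumes commas inside parentheses belong to sub-ingredients and that allergens are marked with underscores as in _TRIGO_, the convention visible in some values above. Both functions are illustrative helpers, not part of saturn. The example row is a shortened copy of one value from the table.

```python
import re

ALLERGEN_MARK = re.compile(r"_([^_]+)_")


def split_top_level(text):
    """Split on commas that are not nested inside parentheses."""
    parts, depth, current = [], 0, []
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth = max(depth - 1, 0)  # tolerate unbalanced closing brackets in scraped text
        if ch == "," and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    cleaned = [p.strip().rstrip(".").strip() for p in parts]
    return [p for p in cleaned if p]  # drop empties, e.g. from a trailing comma


def marked_allergens(text):
    """Return the underscore-marked allergen words, upper-cased."""
    return [m.upper() for m in ALLERGEN_MARK.findall(text)]


row = ("Harina de _TRIGO_, grasa de palma, extracto de malta de _CEBADA_, "
       "gasificantes (carbonatos de amonio, carbonatos de sodio), sal, _HUEVO_")
print(split_top_level(row))
print(marked_allergens(row))
```

Scraped text with unbalanced parentheses (as in the hazelnut-cream value above) will still split, but nested groups after the imbalance may merge, so spot-check long entries.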

teams categorical feature

This column holds team affiliations as comma-separated lists of slugs, with 39 unique strings among 46 non-null rows (8% null). The values are spread almost uniformly (entropy ratio 0.97): the most common value, 'pain-au-chocolat', covers only 10.9% of non-null rows, while several rows pack 4-14 teams into one string. The mix of single-team and multi-team entries means this is effectively a multi-label field stored as a delimited string.

Treatment: Split on commas and one-hot encode as multi-label team membership before modelling.

anthropic:claude-opus-4-7 · confidence high
Out[688]:

saturn.columns["teams"].stats

stat value
n 50
nulls 4 (8.0%)
unique 39
top_value pain-au-chocolat
top_rate 0.1087
cardinality 39
entropy 5.124
entropy_ratio 0.9695
alert: long_tail (36 singleton categories)
Fig 173.
Top values for teams.
Top values for teams (20 unique shown, of 39 total).
value count share
pain-au-chocolat 5 10.0%
stakano,chocolatine 3 6.0%
swipe-studio,pain-au-chocolat 2 4.0%
stakano,chocolatine,la-robe-est-bleue 1 2.0%
pain-au-chocolat,shark-attack,chocolatine,la-robe-est-bleue,stakano,dietreflux,m,b,c,swipe-studio,gmlaa,heathy-app-cross-eat,specialtiz 1 2.0%
stakano,chocolatine,swipe-studio,pain-au-chocolat 1 2.0%
chocolatine,la-robe-est-bleue,scaneco,feat,stakano,specialtiz 1 2.0%
gmlaa,pain-au-chocolat 1 2.0%
stakano,chocolatine,scaneco,gmlaa,pain-au-chocolat 1 2.0%
houda,chocolatine,la-robe-est-bleue,stakano 1 2.0%
pain-au-chocolat,specialtiz,gmlaa 1 2.0%
stakano,chocolatine,pain-au-chocolat 1 2.0%
chocolatine,la-robe-est-bleue,pain-au-chocolat,stakano 1 2.0%
pain-au-chocolat,shark-attack,swipe-studio,stakano,chocolatine,italy,feat 1 2.0%
chocolatine,la-robe-est-bleue,pain-au-chocolat,shark-attack,feat 1 2.0%
vendredi,pain-au-chocolat,stakano,chocolatine,gmlaa,italy 1 2.0%
swipe-studio,pain-au-chocolat,chocolatine,la-robe-est-bleue,gmlaa 1 2.0%
pain-au-chocolat,chocolatine,la-robe-est-bleue,vegan,specialtiz 1 2.0%
chocolatine,la-robe-est-bleue,pain-au-chocolat,feat,stakano 1 2.0%
swipe-studio,feat,bodysupport,pain-au-chocolat 1 2.0%
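The teams treatment (split on commas, multi-label one-hot) maps directly onto pandas' `Series.str.get_dummies`. A minimal sketch on four toy rows taken from the table above plus a null. Null rows come out as all zeros, so add a separate missing flag if the 8% nulls should stay distinguishable from "no team".

```python
import pandas as pd

teams = pd.Series([
    "pain-au-chocolat",
    "stakano,chocolatine",
    "swipe-studio,pain-au-chocolat",
    None,
])

# One 0/1 column per team slug; a row may switch on several columns.
multi_hot = teams.str.get_dummies(sep=",")
print(multi_hot)

# Team frequencies across rows, which is more informative than the 39 raw combinations.
print(multi_hot.sum().sort_values(ascending=False))
```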

food_groups_tags unknown feature

This column is labeled food_groups_tags, suggesting it holds categorical food group classifiers (likely list-valued or comma-delimited tags). Saturn skipped profiling, so no uniqueness, cardinality, or distribution stats are available beyond a 50-row sample with zero nulls. Treat any inferences cautiously until the column is re-profiled with a parser that handles its native type.

Treatment: Re-profile after parsing as a list/multi-label field, then one-hot or multi-hot encode the tags.

anthropic:claude-opus-4-7 · confidence low
Out[691]:

saturn.columns["food_groups_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique n/a
alert: skipped (no profiler for kind=unknown)
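Because the profiler could not tell whether food_groups_tags holds JSON lists or delimited strings, a tolerant normaliser is a reasonable first step before multi-hot encoding. A sketch, assuming each cell is a list (as JSON arrays decode), a comma-delimited string, or null. The tag values below are placeholders, not observed data, and `as_tag_list` is a hypothetical helper.

```python
import math


def as_tag_list(cell, sep=","):
    """Normalise a cell to a list of non-empty, stripped tag strings."""
    if cell is None or (isinstance(cell, float) and math.isnan(cell)):
        return []
    if isinstance(cell, (list, tuple)):
        items = cell
    else:
        items = str(cell).split(sep)
    return [str(t).strip() for t in items if str(t).strip()]


print(as_tag_list(["tag-a", " tag-b "]))
print(as_tag_list("tag-a,,tag-b"))
print(as_tag_list(None))
```

Returning an empty list for nulls keeps downstream code free of type checks; whether "no tags" and "null" should differ is a separate modelling decision.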

data_quality_warnings_tags unknown metadata

This column is named data_quality_warnings_tags, suggesting it carries flags or tag arrays describing data quality issues per row. Saturn skipped profiling, so no uniqueness, value distribution, or stats are available beyond a 50-row sample with 0% nulls. Without parsed contents it is impossible to tell whether the field is empty strings, lists, or structured tags.

Treatment: Inspect raw values manually and parse tag structure before deciding on use.

anthropic:claude-opus-4-7 · confidence low
Out[693]:

saturn.columns["data_quality_warnings_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique n/a
alert: skipped (no profiler for kind=unknown)

debug_tags unknown metadata

Column 'debug_tags' was skipped by the profiler and classified as kind 'unknown', so no descriptive statistics are available beyond a row count of 50 and a null rate of 0.0. Uniqueness, distribution, and content type are all unreported, meaning we cannot infer what the field carries. The name suggests internal debugging annotations rather than analytical signal.

Treatment: Drop unless a downstream consumer specifically needs the debug annotations.

anthropic:claude-opus-4-7 · confidence low
Out[695]:

saturn.columns["debug_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique n/a
alert: skipped (no profiler for kind=unknown)

main_countries_tags unknown feature

The column `main_countries_tags` was skipped during profiling, so no type, cardinality, or value statistics are available beyond a row count of 50 and a null rate of 0.0. The name suggests a tag-style field listing principal countries, likely delimited or list-valued, which may explain why the profiler skipped it. Nothing else can be inferred without re-profiling with list/text handling enabled.

Treatment: Re-profile as a multi-valued tag field, then split/explode and one-hot encode before modelling.

anthropic:claude-opus-4-7 · confidence low
Out[697]:

saturn.columns["main_countries_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique n/a
alert: skipped (no profiler for kind=unknown)
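The "split/explode" step in the main_countries_tags treatment can be sketched with `DataFrame.explode`, which turns each list element into its own row. This assumes cells are already Python lists. The `code` key column and the country tags are hypothetical placeholders, since the real cell format is unverified.

```python
import pandas as pd

# Toy rows standing in for main_countries_tags; the real cell format is unverified.
df = pd.DataFrame({
    "code": ["p1", "p2", "p3"],
    "main_countries_tags": [["country-a", "country-b"], ["country-a"], []],
})

# One row per (product, country); empty lists explode to NaN and are dropped.
long = df.explode("main_countries_tags").dropna(subset=["main_countries_tags"])
print(long["main_countries_tags"].value_counts())
```

The long format suits frequency checks and joins. For model features, pivot back to one 0/1 column per country.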

origins_hierarchy unknown other

Profiling was skipped for this column, so saturn emitted no type, uniqueness, or value statistics beyond a row count of 50 and a null rate of 0.0. The name suggests a nested or path-like representation of origin categories (e.g. a taxonomy hierarchy), but without parsed values this is inference from the label only. Treat it as opaque until reprofiled.

Treatment: Reprofile after parsing the hierarchy (split path or explode levels) before deciding on use.

anthropic:claude-opus-4-7 · confidence low
Out[699]:

saturn.columns["origins_hierarchy"].stats

stat value
n 50
nulls 0 (0.0%)
unique n/a
alert: skipped (no profiler for kind=unknown)

packagings_complete numeric feature

This is a binary 0/1 flag (n_unique=2, min=0, max=1) indicating whether packaging information is complete. The split is nearly even with a mean of 0.52 and zero_rate of 0.48, and 4% of rows are null. The strongly negative kurtosis (-1.99) is expected for a balanced binary variable.

Treatment: Cast to boolean and impute the 4% nulls before modelling.

anthropic:claude-opus-4-7 · confidence high
Out[701]:

saturn.columns["packagings_complete"].stats

stat value
n 50
nulls 2 (4.0%)
unique 2
min 0
max 1
mean 0.5208
median 1
std 0.5049
q1 0
q3 1
iqr 1
skew -0.08341
kurtosis -1.993
n_outliers 0
outlier_rate 0
zero_rate 0.4792
Fig 174.
Distribution of packagings_complete. Vertical dash marks the median.
Histogram bins for packagings_complete (median: 1.0).
bin count
0 – 0.1667 23
0.1667 – 0.3333 0
0.3333 – 0.5 0
0.5 – 0.6667 0
0.6667 – 0.8333 0
0.8333 – 1 25
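A minimal sketch of the packagings_complete treatment (cast to boolean, impute the nulls), rebuilt from the histogram counts (23 zeros, 25 ones, 2 nulls). Mode imputation is one assumption among several. With a near-even split the mode barely wins, so the sketch also keeps a missingness indicator before filling.

```python
import numpy as np
import pandas as pd

# Reconstructed from the histogram: 23 zeros, 25 ones, 2 nulls.
s = pd.Series([0.0] * 23 + [1.0] * 25 + [np.nan] * 2)

# Record which rows were missing before imputing.
was_missing = s.isna().astype(int)

# Fill with the most frequent value (1 here), then cast to a plain boolean.
mode_value = s.mode().iloc[0]
complete = s.fillna(mode_value).astype(bool)
print(mode_value, int(was_missing.sum()), int(complete.sum()))
```

If imputation is not wanted, pandas' nullable `"boolean"` dtype can hold the two missing values as `<NA>` instead.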

nutriscore_tags unknown label

This column is labeled nutriscore_tags, suggesting it holds Nutri-Score classification tags (likely letter grades a-e or arrays thereof) for food products. Profiling was skipped, so no cardinality, value distribution, or type stats are available beyond a 50-row sample with zero nulls. Without n_unique or value frequencies, the actual contents and structure remain unverified.

Treatment: Re-profile with parsing enabled, then one-hot or ordinal-encode the Nutri-Score grade for modelling.

anthropic:claude-opus-4-7 · confidence low
Out[704]:

saturn.columns["nutriscore_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique n/a
alert: skipped (no profiler for kind=unknown)
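If re-profiling confirms that nutriscore_tags carries Nutri-Score letters, an ordinal encoding preserves the a-to-e ordering that one-hot encoding discards. A sketch, assuming a cell (or each element, if cells turn out to be lists) is a bare grade letter. Anything else, such as an 'unknown' placeholder, maps to None. The input values are illustrative.

```python
GRADE_ORDER = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}  # 1 = best grade, 5 = worst


def nutriscore_ordinal(value):
    """Map a Nutri-Score letter to 1..5; anything else (None, 'unknown', '') maps to None."""
    if not isinstance(value, str):
        return None
    return GRADE_ORDER.get(value.strip().lower())


print([nutriscore_ordinal(v) for v in ["a", "E", " c ", "unknown", None]])
```

Keeping non-grades as None (rather than 0) prevents them from being read as "better than a".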

ingredients_text_with_allergens_nl categorical free_text

Dutch-language ingredient lists with inline HTML allergen markup, evidently the NL localisation of an Open Food Facts-style ingredients field. Coverage is poor: 78% null, and the 11 non-null rows hold 9 distinct values. The most common is the empty string (3 rows, top_rate 0.27). The rest are singletons, mostly short cocoa/chocolate ingredient strings, while one row is clearly mis-parsed packaging footer text (Mondelez addresses, URLs). The entropy ratio of 0.95 confirms that the few populated values are nearly all unique, so this is free text rather than a category.

Treatment: Strip HTML allergen tags, then tokenize for NLP/allergen extraction; expect heavy missingness so do not use as a primary feature.

anthropic:claude-opus-4-7 · confidence high
Out[706]:

saturn.columns["ingredients_text_with_allergens_nl"].stats

stat value
n 50
nulls 39 (78.0%)
unique 9
top_value (empty string)
top_rate 0.2727
cardinality 9
entropy 3.027
entropy_ratio 0.955
alert: long_tail (8 singleton categories)
alert: null_rate (78.0% null)
Fig 175.
Top values for ingredients_text_with_allergens_nl.
Top values for ingredients_text_with_allergens_nl (9 unique shown, of 9 total).
value count share
(empty string) 3 6.0%
Cacaomassa, cacaoboter, magere cacaopoeder, suiker. 1 2.0%
Aardappelen, zonnebloemolie, zeezout. 1 2.0%
Cacaomassa, magere cacao, cacaoboter, bruine suiker, vanille. Kan noten, melk, soja, sesamzaad en tarwe bevatten. 1 2.0%
Cacaomassa, suiker, cacaoboter, vanille. 1 2.0%
Cacaomassa, magere cacaopoeder, cacaoboter, bruine suiker. 1 2.0%
*Referentie inname van een gemiddelde volwassehe (8400 kJ/ 2000 ReJI), 16,7 g 46x4, www,snackmindful,com Milka www,milka,com ER Mondelez France SAS, 6 avenue Réaumur, CS 50014, 92142 Clamart Cedex, Service Consommateurs Nº Cristal:09,69,39,79,79 BE Mondelez Belgium, Stationsstraat 100, 2800 Mechelen, ND Mondelez Nederland, Verlengde Poolseweg 34, 4818 CL Breda, eu mondelezinternational,com e 100 g COCOA LIFE www,cocoalife,org 8 FR FRANCE ONLY 05 pp 3 045140 105502 1 2.0%
tarwebloem 47%, melkchocolade 29% (suiker, cacaomassa, cacaoboter, weipoeder (van melk), magere melkpoeder, plantaardige vetten (shea, palm in wisselende verhoudingen), melkvet, emulgatoren (sojalecithine, E476), lactose (van melk), aroma), plantaardige oliën (palm, kokos), suiker, suikerstroop, tarwezemelen, rijsmiddelen (natriumwaterstofcarbonaat, ammoniumwaterstofcarbonaat), zout, tarwekiemen, voedingszuur (citroenzuur) 1 2.0%
granen 98.3% (volkorentarwemeel 65.8%, roggebloem, tarwebloem 10.2%, rijstbloem, gemoute tarwebloem, tarwegriesmeel, boekweitbloem, gerstebloem), suiker, magere melkpoeder, zout, palmolie, tarwekiemen, emulgator (zonnebloemlecithine) 1 2.0%
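The Mondelez footer row shows that at least one ingredients_text_with_allergens_nl cell holds packaging-label text rather than an ingredient list. A rough heuristic sketch flags such rows. It assumes that web-address fragments, "kJ" energy references, and long digit runs (barcodes, phone numbers, postcodes) rarely occur in genuine ingredient lists. The patterns and the two-hit threshold are guesses to tune on the full data.

```python
import re

# Heuristic signals of label/footer text rather than an ingredient list (illustrative patterns).
SUSPECT_PATTERNS = [
    re.compile(r"www[.,]", re.IGNORECASE),  # web addresses, including OCR'd "www,"
    re.compile(r"\bkJ\b"),                  # energy reference values
    re.compile(r"\d{5,}"),                  # barcodes, phone numbers, long postcodes
]


def looks_like_label_text(text, min_hits=2):
    """Flag a cell when at least `min_hits` suspect patterns match."""
    if not text:
        return False
    hits = sum(1 for pattern in SUSPECT_PATTERNS if pattern.search(text))
    return hits >= min_hits


footer = ("*Referentie inname van een gemiddelde volwassehe (8400 kJ/ 2000 ReJI), "
          "www,snackmindful,com Milka 92142 Clamart Cedex")
print(looks_like_label_text(footer), looks_like_label_text("Cacaomassa, suiker, cacaoboter, vanille."))
```

Requiring two independent signals keeps a single stray number (e.g. an E-number or a percentage) from triggering the flag.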

created_t numeric timestamp

Values are 10-digit integers ranging from 1,337,517,352 to 1,724,094,916, with all 50 rows unique and no nulls. This is consistent with Unix epoch seconds spanning May 2012 to August 2024, and the name 'created_t' reinforces a creation-timestamp interpretation. The distribution is mildly right-skewed (skew 0.33) and platykurtic (kurtosis -0.81), with the median (1,475,927,880.5, about October 2016) close to the mean. Creation dates are front-loaded rather than even: 13 rows fall in the first histogram bin (mid-2012 to early 2014), while only 5 fall in the last two bins (roughly 2021 onward).

Treatment: Convert from Unix seconds to datetime and derive features (year, recency) before modelling.

anthropic:claude-opus-4-7 · confidence high
Out[709]:

saturn.columns["created_t"].stats

stat value
n 50
nulls 0 (0.0%)
unique 50
min 1.338e+09
max 1.724e+09
mean 1.483e+09
median 1.476e+09
std 1.043e+08
q1 1.386e+09
q3 1.555e+09
iqr 1.694e+08
skew 0.3311
kurtosis -0.8095
n_outliers 0
outlier_rate 0
zero_rate 0
Fig 176.
Distribution of created_t. Vertical dash marks the median.
Histogram bins for created_t (median: 1475927880.5).
bin count
1.338e+09 – 1.393e+09 13
1.393e+09 – 1.448e+09 8
1.448e+09 – 1.503e+09 8
1.503e+09 – 1.558e+09 9
1.558e+09 – 1.614e+09 7
1.614e+09 – 1.669e+09 3
1.669e+09 – 1.724e+09 2
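The created_t treatment (Unix seconds to datetime, then year and recency features) in pandas. The three inputs are the reported min, roughly the median, and the max. The recency reference date is pinned to the notebook's generation date (2026-05-01) so results are reproducible. That choice is an assumption; any fixed snapshot date works.

```python
import pandas as pd

created_t = pd.Series([1337517352, 1475927880, 1724094916])  # min, ~median, max from this profile

# Interpret the integers as seconds since the Unix epoch, in UTC.
created = pd.to_datetime(created_t, unit="s", utc=True)

features = pd.DataFrame({
    "created_year": created.dt.year,
    # Recency relative to a fixed reference date, in whole days.
    "age_days": (pd.Timestamp("2026-05-01", tz="UTC") - created).dt.days,
})
print(created.dt.date.tolist())
print(features)
```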

traces_hierarchy unknown other

Column 'traces_hierarchy' was skipped by the profiler, so no type, uniqueness, or value statistics are available beyond a row count of 50 and a null rate of 0.0. Without kind inference or sample stats, the content remains unknown. The name hints at nested trace/hierarchy data (likely a complex or non-scalar structure), which would be consistent with the profiler skipping it.

Treatment: Inspect raw values manually and parse the nested structure before it can be profiled or modelled.

anthropic:claude-opus-4-7 · confidence low
Out[712]:

saturn.columns["traces_hierarchy"].stats

stat value
n 50
nulls 0 (0.0%)
unique n/a
alert: skipped (no profiler for kind=unknown)

generic_name_nb categorical metadata

This appears to be a Norwegian Bokmål generic product name field. It is effectively empty: 96% of the 50 rows are null and the only non-null value observed is the empty string (2 occurrences), giving cardinality 1 and zero entropy.

Treatment: Drop; the column carries no information.

anthropic:claude-opus-4-7 · confidence high
Out[714]:

saturn.columns["generic_name_nb"].stats

stat value
n 50
nulls 48 (96.0%)
unique 1
top_value (empty string)
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: null_rate (96.0% null)
alert: imbalance (top value is 100.0% of rows)
Fig 177.
Top values for generic_name_nb.
Top values for generic_name_nb (1 unique shown, of 1 total).
value count share
(empty string) 2 4.0%

ingredients_text_with_allergens_de categorical free_text

German-language ingredient lists with embedded HTML tags marking allergens like SOJA, WEIZEN, and HASELNÜSSE. Two-thirds of rows are null (null_rate 0.66) and among the 17 non-null values 16 are unique (entropy_ratio 0.99), with the empty string itself appearing twice as the top value. Casing and punctuation are inconsistent across entries, and one row contains lowercase OCR-style text with stray newlines.

Treatment: Strip HTML tags to extract allergen labels into a multi-hot feature, then drop or embed the residual text.

anthropic:claude-opus-4-7 · confidence high
Out[717]:

saturn.columns["ingredients_text_with_allergens_de"].stats

stat value
n 50
nulls 33 (66.0%)
unique 16
top_value (empty string)
top_rate 0.1176
cardinality 16
entropy 3.97
entropy_ratio 0.9925
alert: long_tail (15 singleton categories)
alert: null_rate (66.0% null)
Fig 178.
Top values for ingredients_text_with_allergens_de.
Top values for ingredients_text_with_allergens_de (16 unique shown, of 16 total).
value count share
(empty string) 2 4.0%
Kakaomasse, Kakaobutter, fettarmes Kakaopulver, Zucker, Vanille 1 2.0%
Kakaomasse, fettarmes Kakaopulver, Kakaobutter, Zucker, Emulgator: Lecithine (Soja); Vanilleextrakt. 1 2.0%
Nuss-Nugat-Creme 40 % (Zucker, Palmöl, HASELNÜSSE 13 %, MAGERMILCHPULVER 8.7%, fettarmer Kakao 7,4 %, Emulgator Lecithine (SOJA), Vanillin), WEIZENMEHL (32,5 %), pflanzliche Fette (Palm, Palmkern), Rohrzucker 8,5 % (enthält WEIZEN), MILCHZUCKER, WEIZENKLEIE, VOLLMILCHPULVER, GERSTENMALZ - und Maisextraktpulver, Honig, Backtriebmittel: Dinatriumdiphosphat, Natriumhydrogencarbonat, Ammoniumhydrogencarbonat; fettarmer Kakao, Salz, WEIZENSTÄRKE, GERSTENMALZMEHL, Emulgator Lecithine (SOJA), Vanillin 1 2.0%
Kakaomasse, Zucker, Kakaobutter, Vanille 1 2.0%
Kartoffeln, Sonnenblumenöl, Meersalz. 1 2.0%
Kakaomasse, fettarmes Kakaopulver, Kakaobutter, Rohrzucker, Vanille. Kann Schalenfrüchte, Milch, Soja, Sesamsamen und Weizen enthalten. 1 2.0%
kakaomass of*, zucker, kakaobutter, kakaopulver stark entöit, emulgator: sonnenblumenlecithine (e-322), natürliche in vanille-aroma, * rainforest alliance certified, cocoa: 74% mindestens, 1 2.0%
WEIZENMEHL, Palmöl, Glukosesirup, GERSTENMALZEXTRAKT, Backtriebmittel (Ammoniumcarbonate, Natriumcarbonate), Speisesalz 1,4 %, EIER, Aroma, Mehlbehandlungsmittel (NATRIUMMETABISULFIT). 1 2.0%
Kakaomasse, Zucker, Kakaobutter, Emulgator: Lecithine (Soja); Vanilleextrakt. 1 2.0%
Kartoffelpüreepulver, pflanzliche Öle (Sonnenblume, Palm, Mais) in veränderlichen Gewichtsanteilen, Weizenmehl, Maismehl, Reismehl, Maltodextrin, Emulgator (E471), Salz, Farbstoff (Annatto Norbixin). 1 2.0%
Kakaomasse, fettarmes Kakaopulver, Kakaobutter . Kann Schalenfrüchte, Milch und Soja enthalten. 1 2.0%
Alpenmilch Schokolade. Zutaten: Zucker, Kakaobutter, Magermilchpulver, Kakaomasse, Süßmolkenpulver (aus Milch), Butterreinfett, Haselnüsse, Emulgatoren (Sojalecithin, E476), Aroma. Kakao: 30 % mindestens. Kann andere Nüsse und Weizen enthalten. Ohne Farbstoffe** und Konservierungsstoffe** -**Gemäß rechtlicher Vorschriften. 1 2.0%
Kakaomasse¹, Rohrzucker¹, Kakaobutter¹, Emulgator: Lecithine (Soja)¹. ¹aus kontrolliert ökologischem Anbau. 1 2.0%
25% Walnusskerne, 25% Mandeln, 25% Sultaninen geschwefelt (Sultaninen, Sonnenblumenöl, Konservierungsstoff: Schwefeldioxid), 25% Cranberries (Cranberries, Zucker, Sonnenblumenöl). 1 2.0%
Kakaomasse, Zucker, Kakaobutter, Emulgator (Sojalecithin), Vanille. Kann Haselnüsse, Mandeln, Milch enthalten. 1 2.0%

ingredients_text_with_allergens_es categorical free_text

Spanish-language ingredient lists with inline HTML markup highlighting allergens such as trigo, soja, avellanas, and lactosa. 62% of the 50 rows are null and another 7 entries are empty strings, leaving 12 populated free-text recipes, all distinct (13 unique values including the empty string; entropy ratio 0.87). One of the populated entries is German rather than Spanish text.

Treatment: Strip the allergen HTML tags, then tokenize/embed or parse into a structured allergen list before modelling.

anthropic:claude-opus-4-7 · confidence high
Out[720]:

saturn.columns["ingredients_text_with_allergens_es"].stats

stat value
n 50
nulls 31 (62.0%)
unique 13
top_value (empty string)
top_rate 0.3684
cardinality 13
entropy 3.214
entropy_ratio 0.8684
alert: long_tail (12 singleton categories)
alert: null_rate (62.0% null)
Fig 179.
Top values for ingredients_text_with_allergens_es.
Top values for ingredients_text_with_allergens_es (13 unique shown, of 13 total).
value count share
(empty string) 7 14.0%
Pasta de cacao, manteca de cacao, cacao magro en polvo, azúcar, vainilla. 1 2.0%
Azúcar, Grasa vegetal de palmiste parcialmente hidrogenada, Leche en polvo, Almendras, Cacao desgrasado en polvo, suero lácteo en polvo, Emulgente (lecitina de soja), aroma (vainilla). 1 2.0%
Crema de avellanas y cacao 40% (azúcar, manteca de palma, avellanas 13%, leche desnatada en polvo 8,7%, cacao desgrasado 7.4%, emulgentes (lecitinas (soja), vainillina), harina de trigo 32,5%, grasas vegetales (palma, palmiste), azúcar de caña 8,5% (trigo), lactosa, salvado de trigo, leche entera en polvo, extracto en polvo de malta de cebada y maíz, miel, gasificantes (difosfato disódico, carbonato ácido de sodio, carbonato ácido de amonio), cacao desgrasado, sal, almidón de trigo, harina de cebada, malteada, emulsionantes (lecitinas (soja), vainillina. 1 2.0%
70% pasta de cacao*, azúcar, rnanteca de cacao, cacao desgrasado en polvo, emulgente: lecitlna de girasol (E-322), aroma natural de vainilla. *Pasta de cacao Ralnforest Alliance Certified cocoa. Cacao: 74% mínimo. 1 2.0%
Harina de TRIGO, grasa de palma, extracto de malta de CEBADA, gasificantes (carbonatos de amonio, carbonatos de sodio), sal, HUEVO, aroma, agente de tratamiento de la harina (METABISULFITO sódico). 1 2.0%
Pasta de cacao, azúcar, manteca de cacao, vainilla. 1 2.0%
Pasta de cacao, azúcar, manteca de cacao, emulgente: lecitina de girasol (E-322), extracto de vainilla. Cacao: 70% mínimo. 1 2.0%
Copos de avena integral (60%),azúcar, aceite refinado de girasol, miel (3%), sal, melaza de caña, emulgente (lecitina de girasol), gasificante (carbonato ácido de sodio), 1 2.0%
Pasta de cacao, cacao magro, manteca de cacao, azúcar moreno de caña 1 2.0%
Zucker, Kakaobutter, Magermilchpulver, Kakaomasse, Molkenpulver (Milch), Butterreinfett, Emulgator (Sojalecithin), Haselnusspaste, natürliches Aroma 1 2.0%
pasta de cacao, azúcar, manteca de cacao, emulgente (lecitina de soja), vainilla. Cacao: 70% mínimo. 1 2.0%
Pasta de cacao, cacao desgrasado en polvo, manteca de cacao, azúcar, leche en polvo, pasta de almendras y avellanas, emulgentes (lecitinas de soja, girasol), aroma 1 2.0%

product_name_fr categorical free_text

Product names from what appears to be a food/grocery catalogue (chocolate bars, mineral water, biscuits). Despite the _fr suffix, several names are English ('Dark chocolate 70%', 'Lightly sea salted crisps'). With 47 unique values among 49 non-null rows and an entropy ratio of 0.996, this is essentially a free-text label rather than a categorical feature. The top value, 'Henry’s', appears only twice (4%), tied with 'Excellence Noir Subtil Doux 70% Cacao'. One null and a long-tail alert are flagged.

Treatment: Treat as free-text product label; tokenize and embed (or use as a join key to a product table) rather than one-hot encoding.

anthropic:claude-opus-4-7 · confidence high
Out[723]:

saturn.columns["product_name_fr"].stats

stat value
n 50
nulls 1 (2.0%)
unique 47
top_value Henry’s
top_rate 0.04082
cardinality 47
entropy 5.533
entropy_ratio 0.9961
alert: long_tail (45 singleton categories)
Fig 180.
Top values for product_name_fr.
Top values for product_name_fr (20 unique shown, of 47 total).
value count share
Henry’s 2 4.0%
Excellence Noir Subtil Doux 70% Cacao 2 4.0%
Perly 1 2.0%
Prince Goût Chocolat 1 2.0%
Excellence Noir Prodigieux 90% Cacao 1 2.0%
Tonik 1 2.0%
Sésame 1 2.0%
Chocolat noir - 85% cacao 1 2.0%
CRISTALINE Eau De Source 0.5L 1 2.0%
Maruja 1 2.0%
Dark chocolate 70% 1 2.0%
KING COOKIES 1 2.0%
Sable coco Henry s 42g 1 2.0%
Biscuits croquants au coeur onctueux de Nutella® 1 2.0%
Tartine croustillante Authentique 1 2.0%
Excellence Noir Intense 70% Cacao 1 2.0%
Lightly sea salted crisps 1 2.0%
Dark chocolate 1 2.0%
Excellence Noir Puissant 85% Cacao 1 2.0%
Fourrés Chocolat Noir 1 2.0%
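Before product_name_fr is used as a join key, light normalisation helps. The table above shows 'Henry’s' with a typographic apostrophe, so straight and curly variants of the same name would otherwise fail to match. A minimal sketch, assuming that case, Unicode compatibility forms, apostrophe variants, and extra whitespace should not distinguish products. `name_key` is a hypothetical helper.

```python
import re
import unicodedata

# Typographic apostrophes mapped to the ASCII one.
APOSTROPHES = {"\u2019": "'", "\u2018": "'"}


def name_key(name):
    """Build a comparison key: NFKC form, unified apostrophes, collapsed spaces, case-folded."""
    if name is None:
        return None
    text = unicodedata.normalize("NFKC", name)
    for fancy, plain in APOSTROPHES.items():
        text = text.replace(fancy, plain)
    return re.sub(r"\s+", " ", text).strip().casefold()


print(name_key("Henry\u2019s"), name_key("HENRY'S "))
```

The key is for matching only. Keep the original string for display, and note that it will not merge 'Henry s' (apostrophe lost entirely), which needs fuzzy matching.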

stores categorical feature

Comma-delimited list of retail chains where each product was observed (Lidl, Carrefour, Tesco, etc.), stored as a single string per row. Cardinality is high (31 unique across 50 rows, entropy_ratio 0.854) because most non-empty entries are bespoke multi-store concatenations appearing only once. The dominant value is an empty string at 29.17% top_rate plus a 4% null_rate, so roughly a third of rows carry no store information at all.

Treatment: Split on comma and one-hot or multi-hot encode individual store names; treat empty string as missing.

anthropic:claude-opus-4-7 · confidence high
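A minimal multi-hot sketch of that treatment in plain Python, assuming the raw column arrives as a list of strings with None for nulls. The alias table that merges spellings visible in the table below (LIDL/Lidl, E.leclerc/Leclerc/E.Leclerc, monoprix/Monoprix) is a hypothetical hand-built example, not something saturn produces.

```python
# Hypothetical alias table: lower-cased raw spelling -> canonical store name.
STORE_ALIASES = {
    "lidl": "Lidl",
    "e.leclerc": "E.Leclerc",
    "e leclerc": "E.Leclerc",
    "leclerc": "E.Leclerc",
    "monoprix": "Monoprix",
}


def split_stores(raw):
    """Split one comma-delimited 'stores' value into a set of canonical names.

    Nulls and empty strings both yield an empty set, i.e. they are treated as missing.
    """
    if raw is None or not raw.strip():
        return set()
    names = set()
    for part in raw.split(","):
        part = part.strip()
        if part:
            names.add(STORE_ALIASES.get(part.lower(), part))
    return names


def multi_hot(values):
    """Return (vocabulary, rows), where each row is a 0/1 list over the sorted vocabulary."""
    parsed = [split_stores(v) for v in values]
    vocab = sorted(set().union(*parsed))
    rows = [[1 if name in stores else 0 for name in vocab] for stores in parsed]
    return vocab, rows


vocab, rows = multi_hot(["Lidl", "LIDL,Monoprix,Leclerc", "", None, "E.leclerc"])
print(vocab)  # ['E.Leclerc', 'Lidl', 'Monoprix']
```

Names not in the alias table (for example `carrefour.fr`) pass through unchanged; whether an online channel should be merged with its physical chain is a modelling choice left open here.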
Out[726]:

saturn.columns["stores"].stats

stat value
n 50
nulls 2 (4.0%)
unique 31
top_value ""
top_rate 0.2917
cardinality 31
entropy 4.233
entropy_ratio 0.8543
alert: long_tail 29 singleton categories
Fig 181.
Top values for stores.
Top values for stores (20 unique shown, of 31 total).
value count share
"" 14 28.0%
Lidl 5 10.0%
Carrefour Market,Magasins U,Auchan,Intermarché,Carrefour,Casino,Cora,Bi1,carrefour.fr,Netto,bannete,E.Leclerc 1 2.0%
Carrefour,Géant,kupsch,Magasins U,Esselunga,Lindt,carrefour.fr,COOP,El Corte Inglés,Consum,Meny,Walmart 1 2.0%
E.Leclerc,Carrefour,Auchan,Monoprix,carrefour.fr,Lidl,Intermarché 1 2.0%
Sogeres,Holyday Inn Toulon 1 2.0%
Leclerc,Magasins U,carrefour.fr,Intermarché 1 2.0%
Magasins U,Carrefour,carrefour.fr,Carrefour Market,E.leclerc,Carrefour City,Intermarché 1 2.0%
Carrefour,Magasins U,Sainsbury's,carrefour.fr,Plus,Albert Heijn,Asda,El Corte Inglés 1 2.0%
Tesco 1 2.0%
Magasins U,Carrefour,Auchan,carrefour.fr,E.leclerc,Carrefour Market,Carrefour City 1 2.0%
LIDL,Monoprix,Carrefour,Auchan,Intermarché,Carrefour Market,Leclerc 1 2.0%
Dia,Auchan,Magasins U,carrefour.fr,monoprix,Centre Commercial E.Leclerc 1 2.0%
private shops,groceries,Marjane 1 2.0%
Carrefour,E.Leclerc,REWE 1 2.0%
biocoop 1 2.0%
Franprix,Magasins U,Leclerc,E Leclerc,Delhaize,carrefour.fr,Carrefour,Auchan,Carrefour Market 1 2.0%
Sainsbury's,Coop 1 2.0%
E.leclerc 1 2.0%
Franprix,Magasins U,Carrefour,carrefour.fr,Carrefour City 1 2.0%

_id categorical identifier

This column is a unique record identifier: every one of the 50 rows has a distinct value (n_unique=50, top_rate=0.02, entropy_ratio=1.0). Values look like numeric EAN/GTIN barcodes, mostly 13 digits (e.g., '6111242100992', '7622210578464'), with a few 8-digit codes ('20995553', '20022464') consistent with EAN-8. The long_tail alert simply reflects that each value occurs exactly once.

Treatment: drop from modelling features; retain as a join key.

anthropic:claude-opus-4-7 · confidence high
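Because the values look like EAN/GTIN barcodes, one cheap sanity check before relying on `_id` as a join key is the GS1 mod-10 check digit. The helper below is a standalone sketch, not part of saturn; both the 13-digit '6111242100992' and the 8-digit '20995553' from the table below pass it.

```python
def gs1_check_digit_ok(code: str) -> bool:
    """Validate the GS1 mod-10 check digit of an EAN-8, UPC-A, EAN-13 or GTIN-14 string."""
    if not (code.isascii() and code.isdigit()) or len(code) not in (8, 12, 13, 14):
        return False
    body, check = code[:-1], int(code[-1])
    # Weights alternate 3, 1, 3, ... starting from the digit next to the check digit.
    total = sum(int(d) * (3 if i % 2 == 0 else 1) for i, d in enumerate(reversed(body)))
    return (10 - total % 10) % 10 == check


for code in ["6111242100992", "20995553", "6111242100993"]:
    print(code, gs1_check_digit_ok(code))  # True, True, False
```

Codes that fail the check are not necessarily wrong (some stores issue internal codes), but they are worth reviewing before a join.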
Out[729]:

saturn.columns["_id"].stats

stat value
n 50
nulls 0 (0.0%)
unique 50
top_value 6111242100992
top_rate 0.02
cardinality 50
entropy 5.644
entropy_ratio 1
alert: long_tail 50 singleton categories
Fig 182.
Top values for _id.
Top values for _id (20 unique shown, of 50 total).
value count share
6111242100992 1 2.0%
7622210449283 1 2.0%
3046920029759 1 2.0%
6111031005064 1 2.0%
3175680011480 1 2.0%
20995553 1 2.0%
3268840001008 1 2.0%
3362600011044 1 2.0%
8425197712024 1 2.0%
7622210578464 1 2.0%
6111259343108 1 2.0%
3362600011228 1 2.0%
8000500310427 1 2.0%
7300400481595 1 2.0%
3046920022651 1 2.0%
5060042641000 1 2.0%
7622210584724 1 2.0%
3046920022606 1 2.0%
3229820100234 1 2.0%
20022464 1 2.0%

nutriments unknown other

The column 'nutriments' was skipped by the profiler, so no statistics, uniqueness, or value samples are available beyond a row count of 50 and a null rate of 0.0. The name suggests it likely holds nested nutritional data (e.g., a struct or JSON object per product), which is consistent with the profiler's inability to classify it as a standard kind. Without parsed contents we cannot describe its distribution or cardinality.

Treatment: Parse/flatten the nested structure into typed sub-columns before profiling or modelling.

anthropic:claude-opus-4-7 · confidence low
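A minimal flattening sketch, assuming each `nutriments` cell deserialises to a Python dict whose values are scalars or further dicts. The keys and numbers in the example are toy values; the Open Food Facts-style key names are an assumption, not something the profiler confirmed. Nested keys are joined with dots so each leaf becomes its own column.

```python
def flatten(record, prefix=""):
    """Flatten nested dicts into a single-level dict with dot-joined keys."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


def flatten_column(cells):
    """Flatten every cell and align rows on the union of keys (missing -> None)."""
    flat_rows = [flatten(c) if isinstance(c, dict) else {} for c in cells]
    columns = sorted(set().union(*flat_rows))
    return columns, [[row.get(col) for col in columns] for row in flat_rows]


cols, table = flatten_column([
    {"energy-kcal_100g": 539, "fat_100g": 30.9},
    {"energy-kcal_100g": 0, "extra": {"source": "label"}},
])
print(cols)  # ['energy-kcal_100g', 'extra.source', 'fat_100g']
```

Non-dict cells (nulls, stray strings) become all-None rows rather than raising, so the flattened output can be re-profiled directly.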
Out[732]:

saturn.columns["nutriments"].stats

stat value
n 50
nulls 0 (0.0%)
unique n/a
alert: skipped no profiler for kind=unknown

editors unknown other

The column is named "editors" and was skipped by the profiler, so its kind is unknown and no descriptive statistics were computed. Across 50 rows there are zero nulls, but uniqueness, type, and value distribution are all unreported. Without further evidence, the content (likely a list or nested structure of editor entries) cannot be characterised.

Treatment: Inspect raw values manually to determine type before deciding whether to parse, explode, or drop.

anthropic:claude-opus-4-7 · confidence low
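Several columns in this section (`editors`, `correctors`, `data_quality_info_tags`, and others) were skipped with kind=unknown. One quick way to inspect raw values is a type census: count the Python type of every non-null cell, and of the elements inside list cells. The helper below is a hypothetical sketch over a plain list of decoded JSON values, not a saturn API; the example usernames are made up.

```python
from collections import Counter


def type_census(cells):
    """Count Python types of non-null cells, plus element types inside list cells."""
    outer = Counter()
    inner = Counter()
    for cell in cells:
        if cell is None:
            continue
        outer[type(cell).__name__] += 1
        if isinstance(cell, list):
            inner.update(type(item).__name__ for item in cell)
    return outer, inner


outer, inner = type_census([["user1", "user2"], ["user3"], [], None])
print(dict(outer), dict(inner))  # {'list': 3} {'str': 3}
```

A result dominated by lists of strings would point toward exploding the column into a multi-hot feature; dicts would point toward flattening.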
Out[734]:

saturn.columns["editors"].stats

stat value
n 50
nulls 0 (0.0%)
unique n/a
alert: skipped no profiler for kind=unknown

max_imgid categorical identifier

`max_imgid` holds 38 distinct integer-like strings across 50 rows with no nulls, suggesting it stores the maximum image identifier per record. Distribution is nearly uniform (entropy_ratio 0.98) with the top value '47' appearing only 3 times (top_rate 0.06), so it behaves like a high-cardinality numeric id mis-typed as categorical. The long_tail alert confirms most values occur once or twice.

Treatment: Cast to integer and treat as a numeric id; do not one-hot encode.

anthropic:claude-opus-4-7 · confidence medium
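A small casting sketch, assuming the raw values are strings like those in the table below. Anything that is not a plain run of ASCII digits becomes None instead of raising, which is an assumption about how malformed ids should be handled.

```python
def to_int_id(value):
    """Cast an integer-like string such as '47' to int; anything else becomes None."""
    if value is None:
        return None
    text = str(value).strip()
    return int(text) if text.isascii() and text.isdigit() else None


raw = ["47", "108", "", None, "12a"]
ids = [to_int_id(v) for v in raw]
print(ids)  # [47, 108, None, None, None]
print(max(v for v in raw if v), max(i for i in ids if i is not None))  # 47 108
```

The second line shows why the cast matters: as strings, `max` compares lexically and picks '47' over '108', whereas the numeric maximum is 108.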
Out[736]:

saturn.columns["max_imgid"].stats

stat value
n 50
nulls 0 (0.0%)
unique 38
top_value 47
top_rate 0.06
cardinality 38
entropy 5.149
entropy_ratio 0.9811
alert: long_tail 27 singleton categories
Fig 183.
Top values for max_imgid.
Top values for max_imgid (20 unique shown, of 38 total).
value count share
47 3 6.0%
108 2 4.0%
13 2 4.0%
12 2 4.0%
6 2 4.0%
7 2 4.0%
88 2 4.0%
15 2 4.0%
82 2 4.0%
68 2 4.0%
79 2 4.0%
28 1 2.0%
235 1 2.0%
9 1 2.0%
105 1 2.0%
80 1 2.0%
158 1 2.0%
11 1 2.0%
73 1 2.0%
66 1 2.0%

nutriscore_grade categorical label

This is the Nutri-Score grade, a categorical food-health rating with the expected letter levels a-e plus an 'unknown' bucket, giving 6 distinct values across 50 rows with no nulls. The distribution is heavily weighted toward the worst grade: 'e' alone accounts for 54% (27/50), while healthier grades 'a' and 'b' together cover only 6 rows. Entropy ratio of 0.74 confirms the imbalance, and the lone 'unknown' row signals a missing-data sentinel mixed in with the real grades.

Treatment: Treat as an ordered categorical (a < b < c < d < e, best to worst) and recode the 'unknown' sentinel as missing.

anthropic:claude-opus-4-7 · confidence high
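An ordinal-encoding sketch of that treatment. Coding 'a' as 1 and 'e' as 5 is an arbitrary direction chosen here for illustration; the only substantive choice is that 'unknown' (and any other unexpected string) becomes None rather than a sixth level.

```python
# Ordinal codes: 1 = best (a) ... 5 = worst (e). The direction is an arbitrary choice.
NUTRISCORE_ORDER = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}


def encode_grade(grade):
    """Map a Nutri-Score letter to its ordinal code; 'unknown', other strings and None -> None."""
    if grade is None:
        return None
    return NUTRISCORE_ORDER.get(grade.strip().lower())


print([encode_grade(g) for g in ["e", "d", "a", "unknown", None]])  # [5, 4, 1, None, None]
```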
Out[739]:

saturn.columns["nutriscore_grade"].stats

stat value
n 50
nulls 0 (0.0%)
unique 6
top_value e
top_rate 0.54
cardinality 6
entropy 1.913
entropy_ratio 0.7399
Fig 184.
Top values for nutriscore_grade.
Top values for nutriscore_grade (6 unique shown, of 6 total).
value count share
e 27 54.0%
d 9 18.0%
c 7 14.0%
a 4 8.0%
b 2 4.0%
unknown 1 2.0%

product_quantity_unit categorical metadata

Unit of measure for product quantities, taking only 'g' or 'ml'. The distribution is severely imbalanced: 'g' covers 44 of 45 non-null rows (top_rate 0.978) while 'ml' appears just once, and 10% of values are null. Entropy ratio of 0.154 confirms the column carries almost no information as-is.

Treatment: Likely drop or collapse to a binary indicator; near-constant with one rare 'ml' case.

anthropic:claude-opus-4-7 · confidence high
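The `entropy` and `entropy_ratio` figures reported throughout this notebook can be reproduced from the category counts. The sketch below is a reconstruction, not saturn's code: it assumes Shannon entropy in bits over non-null values, divided by log2 of the cardinality. Under that assumption the counts for this column (44 'g', 1 'ml') give 0.1537 and the Nutri-Score counts give an entropy_ratio of 0.7399, both matching the reported stats.

```python
import math


def entropy_bits(counts):
    """Shannon entropy in bits of a categorical distribution given raw counts."""
    total = sum(counts)
    return sum(c / total * math.log2(total / c) for c in counts if c > 0)


def entropy_ratio(counts):
    """Entropy divided by its maximum, log2(cardinality); 0.0 for a single category."""
    k = sum(1 for c in counts if c > 0)
    return entropy_bits(counts) / math.log2(k) if k > 1 else 0.0


print(round(entropy_bits([44, 1]), 4))               # product_quantity_unit: 0.1537
print(round(entropy_ratio([27, 9, 7, 4, 2, 1]), 4))  # nutriscore_grade: 0.7399
```

With only two categories, log2(2) = 1, which is why this column's entropy and entropy_ratio are identical.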
Out[742]:

saturn.columns["product_quantity_unit"].stats

stat value
n 50
nulls 5 (10.0%)
unique 2
top_value g
top_rate 0.9778
cardinality 2
entropy 0.1537
entropy_ratio 0.1537
alert: imbalance top value is 97.8% of rows
Fig 185.
Top values for product_quantity_unit.
Top values for product_quantity_unit (2 unique shown, of 2 total).
value count share
g 44 88.0%
ml 1 2.0%

ingredients_analysis_tags unknown metadata

This column is labelled ingredients_analysis_tags, suggesting it carries categorical or list-valued tags from an ingredient analysis pipeline. Saturn skipped profiling, so no uniqueness, frequency, or value statistics are available beyond a 50-row sample with zero nulls. Without further stats, neither cardinality nor structure (scalar vs. list) can be confirmed.

Treatment: Re-profile with list/tag-aware parsing before deciding to one-hot encode or drop.

anthropic:claude-opus-4-7 · confidence low
Out[745]:

saturn.columns["ingredients_analysis_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique n/a
alert: skipped no profiler for kind=unknown

ingredients_text_with_allergens_fr categorical free_text

French-language ingredient lists with embedded HTML markup highlighting allergens. The column is near-unique: 47 distinct values across the 48 non-null rows (entropy ratio 0.998), with a 4% null rate and the empty string (2 rows) as the modal value. Content varies widely in length and formatting, mixing prose, percentages, and tagged allergen tokens, and at least one row is not in French at all (it is in Arabic).

Treatment: Strip HTML tags, parse allergen spans into a structured list, then tokenize the remaining text for NLP.

anthropic:claude-opus-4-7 · confidence high
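A stdlib sketch of that treatment. It assumes the allergen markup follows the Open Food Facts convention `<span class="allergen">…</span>`; the tags are not visible in the rendered table below, so check the raw values before relying on this pattern. The example input is made up.

```python
import html
import re

# Assumed allergen markup; verify against the raw values before use.
ALLERGEN_SPAN = re.compile(r'<span class="allergen">(.*?)</span>', re.IGNORECASE | re.DOTALL)
ANY_TAG = re.compile(r"<[^>]+>")


def split_allergens(text):
    """Return (plain_text, sorted_allergens) for one ingredient string; None/'' -> ('', [])."""
    if not text:
        return "", []
    allergens = sorted({html.unescape(m.strip()).lower() for m in ALLERGEN_SPAN.findall(text)})
    plain = html.unescape(ANY_TAG.sub("", text))
    # Collapse runs of whitespace left behind by removed tags.
    return " ".join(plain.split()), allergens


plain, allergens = split_allergens(
    'Farine de <span class="allergen">blé</span>, sucre, <span class="allergen">lait</span> écrémé'
)
print(plain, allergens)  # Farine de blé, sucre, lait écrémé ['blé', 'lait']
```

The same function should apply unchanged to the `_en` and `_ja` variants of this column, though allergen names would then need per-language normalisation before they can be compared.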
Out[747]:

saturn.columns["ingredients_text_with_allergens_fr"].stats

stat value
n 50
nulls 2 (4.0%)
unique 47
top_value ""
top_rate 0.04167
cardinality 47
entropy 5.543
entropy_ratio 0.998
alert: long_tail 46 singleton categories
Fig 186.
Top values for ingredients_text_with_allergens_fr.
Top values for ingredients_text_with_allergens_fr (20 unique shown, of 47 total).
value count share
"" 2 4.0%
Lait écrémé, crème, SUcre, ferments laciques 1 2.0%
Céréale 50 % (Farine de blé 34,8 %, farine de blé complet 15,2 %), sucre, huiles végétales (palme, colza), cacao maigre en poudre 4,5 %, sirop de glucose, amidon de blé, poudres à lever (carbonates d'ammonium, carbonates de sodium), émulsifiant (lécithines de soja), sel, lait écrémé en poudre, perméat de lactosérum (de lait), arômes. Peut contenir œuf. 1 2.0%
Pâte de cacao, beurre de cacao, cacao maigre, sucre, vanille. 1 2.0%
Coffret fourré au cacao (41,6%) et à la vanille (208) - Ingrédients Farine de blé, sucre, huile végétale non hydrogénée (huile de palme), filtrat de lait, poudre de cacao Émulsifiant à faible teneur en cacao (322) Lécithine de soja) Agent levant (5000) Sucre artificiel (vanilline) Sel Contient du lait, du blé (gluten) du soja 1 2.0%
Farine de blé 57%, sucre de canne roux, huile de colza, sésame toasté 10,6%, germe de blé 5,4%, farine complète de blé 5,4%, arôme naturel, magnésium, émulsifiant : lécithines, poudres à lever (tartrates de potassium, carbonates de sodium, carbonates d'ammonium), sel de mer, amidon de blé, vitamines (E, PP, B6, B1, B9). 1 2.0%
Pâte de cacao, cacao maigre en poudre, beurre de cacao, sucre, émulsifiant : lécithines (soja) ; extrait de vanille. Traces éventuelles de fruits à coque et de lait. 1 2.0%
Eau de source 1 2.0%
Farine de froment, sucre, graisse végétale, sucre inverti, agents levants ( bicarbonate d'ammonium - bicarbonate de sodium), sel, arome. 1 2.0%
Sucre, graisse vegetale de palmiste hidrogenée, Lait Enteir en poudre, Amandes, Cacao Dégraissé en poudre, lactoserum en poudre, Emulsifiant Lécithine de soja, Arômes (Vainilline). 1 2.0%
دقيقالقمح،رقائق الشوكولاته20%[عجينة زيت النخلة.الكاكاو،سكر،دكستروز و مستحلب 1 2.0%
Farine de froment, sucre, graisse végétale, noix de coco râpée, poudre de lait, poudre de lactosérum, sucre inverti, agents levants (bicarbonate d'ammonium - bicarbonate de Sodium), sel, arômes. 1 2.0%
Pâte à tartiner aux NOISETTES et au cacao 40% (sucre, huile de palme, NOISETTES 13%**, LAIT écrémé en poudre 8,7%**, cacao maigre 7,4%**, émulsifiants : lécithines [SOJA]; vanilline), farine de FROMENT 32,5%, graisses végétales (palme, palmiste), sucre de canne (contient BLE) 8,5%, LACTOSE, son de BLE, LAIT en poudre, miel, poudres à lever (diphosphate disodique, carbonate acide de sodium, carbonate acide d'ammonium), farine d'ORGE malté, cacao maigre en poudre, sel, extrait en poudre de malt d'ORGE et de maïs, amidon de FROMENT, émulsifiants: lécithines [SOJA]; vanilline. 1 2.0%
Farine complète de SEIGLE (77 g*), farine de SEIGLE (28 g*), levure, sel. Peut contenir des traces de LUPIN, LAIT, MOUTARDE, GRAINES DE SÉSAME et SOJA. *en g pour 100 g de produit. 1 2.0%
Pâte de cacao, sucre, beurre de cacao, vanille. Peut contenir des fruits à coque, du lait, du soja et des graines de sésame. 1 2.0%
pâte de cacao*, beurre de cacao*, cacao maigre en poudre*, sucre de canne*, extrait de vanille*, * ingrédients issus de l'agriculture biologique 1 2.0%
Pâte de cacao, cacao maigre, beurre de cacao, cassonade, vanille 1 2.0%
Farine de blé* 41%, Chocolat noir* 22% (pâte de cacao*, sucre de canne", beurre de cacao"), Sucre de canne* roux non raffiné, Farine complète de blé* 16%, Huile de tournesol oléique*, Arôme naturel de vanille, Lait écrémé en poudre, Sel de mer, carbonates d'ammonium, carbonates de sodium, gomme d'acacia*, extraits de romarin* Peut contenir du soja, des œufs, des fruits à coque, des graines de sésame et de la moutarde. *Ingrédients biologiques. 1 2.0%
Pâte de cacao, sucre, beurre de cacao, cacao maigre en poudre, émulsifiant : lécithines (soja), arôme naturel de vanille. 1 2.0%
Farine complète de SEIGLE 59 g*, son de BLÉ 27 g*, flocons d'AVOINE 12 g*, GRAINES DE SÉSAME 7,0 g*, germe de BLÉ, sel. *en g pour 100 g de produit fini. Peut contenir des traces de LUPIN, LAIT, MOUTARDE et SOJA. 1 2.0%

interface_version_modified categorical metadata

A categorical column capturing an interface version stamp, with only 2 distinct values across 50 rows and no nulls. The distribution is heavily skewed: '20150316.jqm2' covers 84% (42 rows) while '20190830' accounts for the remaining 8 rows. The mixed format (one value carries a '.jqm2' suffix, the other is a bare date) suggests a schema or convention change between releases.

Treatment: Treat as a binary version flag; one-hot encode or collapse to pre/post-change indicator.

anthropic:claude-opus-4-7 · confidence high
Out[750]:

saturn.columns["interface_version_modified"].stats

stat value
n 50
nulls 0 (0.0%)
unique 2
top_value 20150316.jqm2
top_rate 0.84
cardinality 2
entropy 0.6343
entropy_ratio 0.6343
Fig 187.
Top values for interface_version_modified.
Top values for interface_version_modified (2 unique shown, of 2 total).
value count share
20150316.jqm2 42 84.0%
20190830 8 16.0%

data_sources_tags unknown other

The column 'data_sources_tags' was skipped by the profiler, so its kind, uniqueness, and value distribution are all unknown. The only confirmed signals are 50 rows with no nulls. Without parsed stats, the name suggests a multi-valued tag field (e.g., a list or delimited string of source labels), but this cannot be verified from the evidence.

Treatment: Manually inspect a sample to confirm structure, then explode tags into a multi-hot encoding before modelling.

anthropic:claude-opus-4-7 · confidence low
Out[753]:

saturn.columns["data_sources_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique n/a
alert: skipped no profiler for kind=unknown

ingredients_text_with_allergens_en categorical free_text

This column holds English ingredient lists with embedded HTML markup highlighting allergens such as wheat, milk, soy, and nuts. With 36 unique values across 50 rows (entropy ratio 0.95) and a 16% null rate, it is near-unique free text. The modal value is actually the empty string (7 occurrences), one row is junk ('Hhhhh'), and at least two rows contain French text filed under the English field. The HTML tags and inconsistent casing/punctuation mean it needs cleaning before any allergen extraction.

Treatment: Strip HTML, normalize case, and parse allergen spans into a structured multi-label feature before modelling.

anthropic:claude-opus-4-7 · confidence high
Out[755]:

saturn.columns["ingredients_text_with_allergens_en"].stats

stat value
n 50
nulls 8 (16.0%)
unique 36
top_value ""
top_rate 0.1667
cardinality 36
entropy 4.924
entropy_ratio 0.9525
alert: long_tail 35 singleton categories
Fig 188.
Top values for ingredients_text_with_allergens_en.
Top values for ingredients_text_with_allergens_en (20 unique shown, of 36 total).
value count share
"" 7 14.0%
milk cream, cream, sugar, banana, bacteria 1 2.0%
WHEAT flour 35%, whole WHEAT flour 15.7%, sugar, vegetable oils (palm, rapeseed), low-fat cocoa powder 4.5%, glucose syrup, WHEAT starch, raising agents (ammonium bicarbonate, sodium bicarbonate, disodium diphosphate), emulsifiers (SOY lecithin, sunflower lecithin), salt, skimmed MILK powder, lactose and MILK proteins, flavors, MAY CONTAIN EGG. 1 2.0%
cocoa mass, cocoa butter, fat reduced cocoa, sugar, vanilla 1 2.0%
Wheat flour, brown cane sugar, rapeseed oil, toasted sesame 10.6%, wheat germ 5.4%, whole wheat flour 5.4%, natural flavor, magnesium, emulsifier: lecithins, raising agents (potassium tartrates, sodium carbonates, ammonium carbonates), sea salt, wheat starch, vitamins (E, PP, B6, B1, B9). 1 2.0%
cocoa mass, low-fat cocoa powder, cocoa butter, sugar, emulsifier: lecithin (soy), vanilla extract, may contain traces of nuts and milk, 1 2.0%
Hhhhh 1 2.0%
sugar, cocoa butter, whole milk powder, cocoa mass, almonds, emulsifier (soya lecithin), flavoring 1 2.0%
cocoa mass #, cane sugar #, cocoa butter #, vanilla extract #, may contain nuts, milk, 1 2.0%
wholemeal rye flour (77 g*), rye flour (28 g*), yeast, salt, may contain traces of milk and sesame seeds, *in g per 100 g of product, 1 2.0%
cocoa paste, sugar, cocoa butter, vanilla, 1 2.0%
Potatoes, sunflower oil, sea salt. May contain Milk. 1 2.0%
cocoa mass, cocoa butter, fat-reduced cocoa powder, cane sugar, vanilla extract 1 2.0%
Pâte de cacao, cacao maigre, beurre de cacao, cassonade, vanille bourbon naturelle en gousse. 1 2.0%
Wheat flour 39%, dark chocolate 25% (cocoa mass, cane sugar, cocoa butter), unrefined brown cane sugar, wholemeal wheat flour 15%, oleic sunflower oil, natural vanilla flavouring, skimmed milk powder, sea salt, raising agents: ammonium carbonates, sodium carbonates, thickener: acacia gum, antioxidant: rosemary extract. 1 2.0%
cocoa mass, sugar, cocoa butter, fat reduced cocoa powder, emulsifier: lecithins (soya), natural vanilla flavouring, dark chocolate contains: cocoa solids 74% minimum, 1 2.0%
whole rye flour (57 g), wheat bran (27 g), oatmeal (13 g), sesame seeds (7.9 g), wheat germ, salt. 1 2.0%
wheat flour, palm oil, glucose syrup, barley malt extract, raising agents (ammonium carbonates, sodium carbonates), salt, eggs , flavouring, flour treatment agent (sodium metabisulfite ), 1 2.0%
cocoa mass, sugar, cocoa butter, vanilla, 1 2.0%
Farine de maïs* (70%), farine de riz*, sel marin. * K issus de l'agriculture biologique. • sans sucres ajoutés(¹) (contient des sucres naturellement présents. 1 2.0%

removed_countries_tags unknown other

Column `removed_countries_tags` was skipped by the profiler, so no type, uniqueness, or distribution stats are available. The only facts on hand are 50 rows with a 0.0 null rate. The name suggests a list of country tags that were removed (likely a multi-valued tag field from an Open Food Facts-style schema), but this cannot be confirmed from the evidence.

Treatment: Re-profile with list/string parsing enabled before deciding whether to keep, explode, or drop.

anthropic:claude-opus-4-7 · confidence low
Out[758]:

saturn.columns["removed_countries_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique n/a
alert: skipped no profiler for kind=unknown

amino_acids_prev_tags unknown other

This column was skipped by the profiler, so no type, uniqueness, or value statistics are available beyond a row count of 50 and a null rate of 0.0. The name suggests it holds prior tag values associated with amino acid annotations, likely a list-like or structured field that the dissector could not parse. Nothing else can be inferred without re-profiling.

Treatment: Re-profile with a parser that handles its container type before deciding on downstream use.

anthropic:claude-opus-4-7 · confidence low
Out[760]:

saturn.columns["amino_acids_prev_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique n/a
alert: skipped no profiler for kind=unknown

code categorical identifier

This column holds 50 unique numeric strings of varying length (8 to 13 digits), almost certainly product barcodes (EAN/UPC/GTIN). Every one of the 50 rows is unique with no nulls, giving maximum entropy (entropy_ratio 1.0) and a top_rate of just 0.02, so it functions as a row identifier rather than a feature. Its top 20 values match those listed for `_id` exactly, so the two columns are likely duplicates. The long_tail alert simply reflects that uniqueness.

Treatment: Use as a join key; drop from any model as it carries no predictive signal.

anthropic:claude-opus-4-7 · confidence high
Out[762]:

saturn.columns["code"].stats

stat value
n 50
nulls 0 (0.0%)
unique 50
top_value 6111242100992
top_rate 0.02
cardinality 50
entropy 5.644
entropy_ratio 1
alert: long_tail 50 singleton categories
Fig 189.
Top values for code.
Top values for code (20 unique shown, of 50 total).
value count share
6111242100992 1 2.0%
7622210449283 1 2.0%
3046920029759 1 2.0%
6111031005064 1 2.0%
3175680011480 1 2.0%
20995553 1 2.0%
3268840001008 1 2.0%
3362600011044 1 2.0%
8425197712024 1 2.0%
7622210578464 1 2.0%
6111259343108 1 2.0%
3362600011228 1 2.0%
8000500310427 1 2.0%
7300400481595 1 2.0%
3046920022651 1 2.0%
5060042641000 1 2.0%
7622210584724 1 2.0%
3046920022606 1 2.0%
3229820100234 1 2.0%
20022464 1 2.0%

correctors unknown other

The column 'correctors' was skipped by the profiler, so its kind, uniqueness, and value distribution are unknown. Only the row count (50) and a null rate of 0.0 are reported; no other statistics are available to characterize content.

Treatment: Re-profile or inspect manually before deciding on use.

anthropic:claude-opus-4-7 · confidence low
Out[765]:

saturn.columns["correctors"].stats

stat value
n 50
nulls 0 (0.0%)
unique n/a
alert: skipped no profiler for kind=unknown

generic_name_ja categorical metadata

Likely a Japanese generic-name field (generic_name_ja), but it carries essentially no information in this sample: 98% of 50 rows are null and the single non-null value is an empty string, giving cardinality 1 and entropy 0.

Treatment: Drop from modelling; retain only if needed for display lookups.

anthropic:claude-opus-4-7 · confidence high
Out[767]:

saturn.columns["generic_name_ja"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value ""
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail 1 singleton categories
alert: null_rate 98.0% null
alert: imbalance top value is 100.0% of rows
Fig 190.
Top values for generic_name_ja.
Top values for generic_name_ja (1 unique shown, of 1 total).
value count share
"" 1 2.0%

generic_name_fr categorical free_text

French-language generic product names, almost certainly from an Open Food Facts-style food catalogue. Cardinality is high (34 unique across 50 rows, entropy ratio 0.87) and most values are one-off descriptors like 'Chocolat noir extra-fin traditionnel à 90% de cacao'. The dominant 'value' is actually the empty string at 29.8% of non-null rows, on top of a 6% null rate, so effectively over a third of records carry no usable label.

Treatment: Treat empty strings as missing, then tokenize/embed for any modelling rather than one-hot encoding.

anthropic:claude-opus-4-7 · confidence high
Out[770]:

saturn.columns["generic_name_fr"].stats

stat value
n 50
nulls 3 (6.0%)
unique 34
top_value ""
top_rate 0.2979
cardinality 34
entropy 4.42
entropy_ratio 0.8689
alert: long_tail 33 singleton categories
Fig 191.
Top values for generic_name_fr.
Top values for generic_name_fr (20 unique shown, of 34 total).
value count share
"" 14 28.0%
Perly fromage frais 1 2.0%
BISCUITS FOURRÉS (35%) PARFUM CHOCOLAT 1 2.0%
Chocolat noir extra-fin traditionnel à 90% de cacao 1 2.0%
Biscuits au sésame 1 2.0%
Chocolat noir, 85% de cacao 1 2.0%
Eau de source 1 2.0%
Succédané de chocolat au lait avec amandes 1 2.0%
Sablé coco 1 2.0%
Biscuit fourré à la pâte à tartiner aux noisettes et au cacao Nutella® 1 2.0%
Pain croustillant a la farine de seigle 1 2.0%
Chocolat noir extra-fin traditionnel 1 2.0%
Chips de pommes de terre légèrement salées au sel de mer 1 2.0%
Chocolat noir extra fin, traditionnel 1 2.0%
goûters fourrés au chocolat noir 1 2.0%
Edelbitter-Schokolade 74% Kakao 1 2.0%
Pain croustillant à la farine complète de seigle, avoine et sésame. 1 2.0%
Crackers 1 2.0%
Chocolat noir extra-fin 1 2.0%
Biscuits aux pommes et aux noisettes, très pauvres en sel, riches en vitamines B1, B2, B9 et E et source de vitamines PP et B6 1 2.0%

generic_name_pl categorical metadata

Polish-language generic product name field, populated for only 5 of 50 rows (90% null) and holding just 2 distinct values: 4 of the 5 non-null entries are empty strings. That leaves a single real value ('Wyśmienita czkolada gorzka 70% kakao'), making the column unusable as a feature.

Treatment: Drop; null rate 0.9 and only one meaningful value.

anthropic:claude-opus-4-7 · confidence high
Out[773]:

saturn.columns["generic_name_pl"].stats

stat value
n 50
nulls 45 (90.0%)
unique 2
top_value ""
top_rate 0.8
cardinality 2
entropy 0.7219
entropy_ratio 0.7219
alert: null_rate 90.0% null
Fig 192.
Top values for generic_name_pl.
Top values for generic_name_pl (2 unique shown, of 2 total).
value count share
"" 4 8.0%
Wyśmienita czkolada gorzka 70% kakao 1 2.0%

amino_acids_tags unknown other

This column is labelled amino_acids_tags, suggesting it would hold tags describing amino acid composition (likely a list-valued food annotation field). Saturn skipped profiling, so no uniqueness, cardinality, or value statistics are available beyond an n of 50 and a 0.0 null rate. Nothing further can be inferred without re-profiling.

Treatment: Re-profile after parsing as a list/tag field before deciding on encoding.

anthropic:claude-opus-4-7 · confidence low
Out[776]:

saturn.columns["amino_acids_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique n/a
alert: skipped no profiler for kind=unknown

ingredients_debug unknown metadata

Column 'ingredients_debug' was skipped by the profiler, so no type, uniqueness, or distribution stats are available. Only the row count (50) and null rate (0.0) are known; everything else is missing. The name suggests it is a debug/auxiliary field rather than a modelling input.

Treatment: Drop from modelling; retain only if needed for debugging upstream pipelines.

anthropic:claude-opus-4-7 · confidence low
Out[778]:

saturn.columns["ingredients_debug"].stats

stat value
n 50
nulls 0 (0.0%)
unique n/a
alert: skipped no profiler for kind=unknown

ingredients_text_with_allergens_ja categorical free_text

Japanese-language ingredient text with allergen markup, almost entirely absent from this sample. 98% of the 50 rows are null, and the only non-null value observed is an empty string, giving a cardinality of 1 and entropy of 0. There is no usable signal here.

Treatment: Drop; the column carries no information in this sample.

anthropic:claude-opus-4-7 · confidence high
Out[780]:

saturn.columns["ingredients_text_with_allergens_ja"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value ""
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail 1 singleton categories
alert: null_rate 98.0% null
alert: imbalance top value is 100.0% of rows
Fig 193.
Top values for ingredients_text_with_allergens_ja.
Top values for ingredients_text_with_allergens_ja (1 unique shown, of 1 total).
value count share
"" 1 2.0%

data_quality_info_tags unknown other

This column, data_quality_info_tags, was skipped by the profiler so its type and contents remain uncharacterised. The only signals available are 50 rows with no nulls; uniqueness, value distribution, and data kind are all unknown.

Treatment: Inspect raw values manually to determine type before deciding on handling.

anthropic:claude-opus-4-7 · confidence low
Out[783]:

saturn.columns["data_quality_info_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique n/a
alert: skipped no profiler for kind=unknown

last_edit_dates_tags unknown other

This column was skipped by the profiler, so no type, uniqueness, or distribution stats are available beyond a row count of 50 with no nulls. The name suggests it holds last-edit dates paired with tags, likely a composite or nested field that the dissector could not parse. Without further evidence, its structure and content cannot be characterised.

Treatment: Inspect raw values and parse into separate date and tag fields before use.

anthropic:claude-opus-4-7 · confidence low
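One possible parse, assuming (unverified, since the column was skipped) that each cell is a list of date strings at mixed granularity such as `['2023-05-12', '2023-05', '2023']`. The sketch keeps the latest full YYYY-MM-DD date and ignores coarser or malformed tags; the example values are made up.

```python
import re
from datetime import date

FULL_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def last_edit_date(tags):
    """Return the latest full YYYY-MM-DD date in a list of tags, or None."""
    dates = []
    for tag in tags or []:
        if isinstance(tag, str) and FULL_DATE.fullmatch(tag):
            try:
                dates.append(date.fromisoformat(tag))
            except ValueError:  # well-formed but impossible, e.g. '2023-02-30'
                continue
    return max(dates) if dates else None


print(last_edit_date(["2023-05-12", "2023-05", "2023"]))  # 2023-05-12
```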
Out[785]:

saturn.columns["last_edit_dates_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique n/a
alert: skipped no profiler for kind=unknown

last_modified_by categorical metadata

This column records the user or app that last modified each record, dominated by the bot/account 'foodless' which accounts for 21 of 49 non-null entries (top_rate 0.43). With 24 unique values across 50 rows and entropy_ratio 0.77, there's a long tail of mostly singleton contributors alongside a handful of app-like editors (municorn-calorie-counter-app, macrofactor). Null rate is low at 0.02.

Treatment: Keep as audit metadata; if used as a feature, collapse the long tail into 'other' and flag bot-vs-human editors.

anthropic:claude-opus-4-7 · confidence high
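A sketch of both parts of that treatment. The `min_count=2` threshold and the name hints used to flag app-like editors are hypothetical, hand-tuned assumptions drawn from the names in the table below (for example `municorn-calorie-counter-app`, `roboto-app`, `macrofactor`), not a classification that saturn or the source data provides.

```python
from collections import Counter

# Hypothetical heuristic: substrings that suggest an app or integration rather than a person.
APP_HINTS = ("app", "macrofactor", "foodiq", "org-")


def collapse_long_tail(values, min_count=2, other="other"):
    """Replace categories seen fewer than min_count times with a shared 'other' label."""
    counts = Counter(v for v in values if v is not None)
    return [v if v is None or counts[v] >= min_count else other for v in values]


def looks_like_app(editor):
    """Flag editor names containing any APP_HINTS substring (a rough rule, not ground truth)."""
    return editor is not None and any(h in editor.lower() for h in APP_HINTS)


editors = ["foodless", "foodless", "macrofactor", "macrofactor", "gmlaa", None]
print(collapse_long_tail(editors))
print([looks_like_app(e) for e in editors])
```

Nulls are left as None rather than folded into 'other', so missingness stays distinguishable from rare editors.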
Out[787]:

saturn.columns["last_modified_by"].stats

stat value
n 50
nulls 1 (2.0%)
unique 24
top_value foodless
top_rate 0.4286
cardinality 24
entropy 3.513
entropy_ratio 0.7662
alert: long_tail 19 singleton categories
Fig 194.
Top values for last_modified_by.
Top values for last_modified_by (20 unique shown, of 24 total).
value count share
foodless 21 42.0%
municorn-calorie-counter-app 3 6.0%
charlesnepote 2 4.0%
macrofactor 2 4.0%
bodysupport 2 4.0%
moon-rabbit 1 2.0%
gmlaa 1 2.0%
prepperapp 1 2.0%
marmotte73 1 2.0%
laura-chaud 1 2.0%
org-barilla-france-sa 1 2.0%
tom1707 1 2.0%
bubu63 1 2.0%
moncoachigbas 1 2.0%
natrius 1 2.0%
clxtng 1 2.0%
roboto-app 1 2.0%
fgouget 1 2.0%
ludolm 1 2.0%
foodiq 1 2.0%

no_nutrition_data categorical feature

A flag column indicating products lacking nutrition data, but it carries no information here: the only observed value is the empty string, present in all 48 non-null rows (top_rate 1.0, cardinality 1, entropy 0.0). 4% of rows are null, so there is literally nothing to distinguish records.

Treatment: Drop; constant column with zero entropy.

anthropic:claude-opus-4-7 · confidence high
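Several columns in this section (`no_nutrition_data`, `generic_name_ja`, `origin_nb`, `ingredients_text_with_allergens_ja`) share the same pattern: every non-null value is the empty string. A hypothetical helper to list such columns from a list-of-dicts sample, treating empty strings as missing, might look like this; the toy rows are made up.

```python
def uninformative_columns(rows):
    """Return column names with at most one distinct value once None and '' are ignored."""
    columns = sorted({key for row in rows for key in row})
    flagged = []
    for col in columns:
        # repr() keeps unhashable cells (lists, dicts) countable.
        distinct = {repr(row.get(col)) for row in rows if row.get(col) not in (None, "")}
        if len(distinct) <= 1:
            flagged.append(col)
    return flagged


sample = [
    {"code": "p1", "no_nutrition_data": "", "origin_nb": None},
    {"code": "p2", "no_nutrition_data": "", "origin_nb": ""},
]
print(uninformative_columns(sample))  # ['no_nutrition_data', 'origin_nb']
```

The "at most one distinct value" rule also flags columns like `generic_name_pl`, whose single real value carries no contrast between rows.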
Out[790]:

saturn.columns["no_nutrition_data"].stats

stat value
n 50
nulls 2 (4.0%)
unique 1
top_value ""
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: imbalance top value is 100.0% of rows
Fig 195.
Top values for no_nutrition_data.
Top values for no_nutrition_data (1 unique shown, of 1 total).
value count share
"" 48 96.0%

nutriscore unknown other

The column is named "nutriscore" but saturn skipped profiling it (kind="unknown"), so no type, uniqueness, or distribution stats are available. All 50 rows are non-null, but nothing else can be confirmed from the evidence.

Treatment: Manually inspect and cast to a known type before any downstream use.

anthropic:claude-opus-4-7 · confidence low
Out[793]:

saturn.columns["nutriscore"].stats

stat value
n 50
nulls 0 (0.0%)
unique n/a
alert: skipped no profiler for kind=unknown

origin_nb categorical metadata

The column 'origin_nb' is effectively empty: 96% of the 50 rows are null and the only non-null value observed is the empty string, which appears twice. Cardinality is 1 with zero entropy, so the field carries no information in this sample.

Treatment: Drop; the column is 96% null with a single empty-string value.

anthropic:claude-opus-4-7 · confidence high
Out[795]:

saturn.columns["origin_nb"].stats

stat value
n 50
nulls 48 (96.0%)
unique 1
top_value ""
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: null_rate 96.0% null
alert: imbalance top value is 100.0% of rows
Fig 196.
Top values for origin_nb.
Top values for origin_nb (1 unique shown, of 1 total).
value count share
"" 2 4.0%

origins categorical free_text

Free-text origin/provenance strings for ingredients or products, with 20 unique values across 50 rows and a 4% null rate. The dominant value is the empty string, in 24 of the 48 non-null rows (top_rate 0.5), so half of the non-null values are effectively blank rather than missing. The remainder is messy: mixed languages (Maroc vs Morocco, Allemagne vs Germany), comma-delimited multi-origin lists, and 'en:'/'fr:'-prefixed taxonomy tags such as 'en:Madagarcar vanilla' (note the typo). This is clearly not a clean categorical.

Treatment: Treat empty strings as null, normalise synonyms (Maroc/Morocco), and split on comma into a multi-label set before any encoding.

anthropic:claude-opus-4-7 · confidence high
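A minimal sketch of that treatment. The synonym table is a hypothetical starting point built from spellings visible in the table below and would need extending after a full review; the prefix rule assumes taxonomy tags always take the two-letter form seen here ('en:', 'fr:').

```python
# Hypothetical synonym table: lower-cased spelling -> canonical English label.
ORIGIN_SYNONYMS = {
    "maroc": "Morocco",
    "morocco": "Morocco",
    "france": "France",
    "allemagne": "Germany",
    "germany": "Germany",
}


def parse_origins(raw):
    """Split a comma-delimited origins string into a set of normalised labels."""
    if raw is None or not raw.strip():
        return set()  # empty string treated the same as null
    labels = set()
    for part in raw.split(","):
        part = part.strip()
        # Drop two-letter taxonomy language prefixes such as 'en:' or 'fr:'.
        if len(part) > 3 and part[2] == ":" and part[:2].isalpha():
            part = part[3:].strip()
        if part:
            labels.add(ORIGIN_SYNONYMS.get(part.lower(), part))
    return labels


print(parse_origins("fr:maroc"), parse_origins("France,Maroc"), parse_origins(""))
```

Typos such as 'Madagarcar' pass through untouched; fixing them belongs in the synonym table, not in the parser.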
Out[798]:

saturn.columns["origins"].stats

stat value
n 50
nulls 2 (4.0%)
unique 20
top_value ""
top_rate 0.5
cardinality 20
entropy 3.027
entropy_ratio 0.7003
alert: long_tail 17 singleton categories
Fig 197.
Top values for origins.
Top values for origins (20 unique shown, of 20 total).
value count share
(empty string) 24 48.0%
France 4 8.0%
Maroc 3 6.0%
Morocco 1 2.0%
France,Union européenne,Non Union Européenne 1 2.0%
France,Provence-Alpes-Côte d'Azur,Italie,Vaucluse,en:Cairanne,en:Chambon-la-Forêt,en:Source Emma,en:Source Ofélia,en:Source Sainte Cécile,en:Source Éléna,en:Source Éléonore 1 2.0%
United Kingdom 1 2.0%
en:Madagarcar vanilla 1 2.0%
France,European Union and Non European Union 1 2.0%
Germany,Ludwig Weinrich,Ludwig Weinrich in Germany 1 2.0%
Suède,Allemagne,Biélorussie,Estonie,Lettonie,Pologne,Seigle 1 2.0%
European Union and Non European Union 1 2.0%
Équateur 1 2.0%
España 1 2.0%
France,Non Union Européenne,Non indiqué 1 2.0%
madagascar, fr:afrique, amérique-du-sud 1 2.0%
fr:maroc 1 2.0%
Unspecified 1 2.0%
Farine œuf France 1 2.0%
European Union 1 2.0%
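
The origins treatment has three steps: treat empty strings as null, normalise synonyms, and split on commas into a multi-label set. These can be sketched in plain Python. The synonym table is hypothetical and covers only spellings visible in the table above. `normalize_origins` is an illustrative helper, not a saturn function.

```python
from typing import Optional

# Hypothetical synonym table built only from spellings in the top-values table;
# a real pipeline would need a far fuller mapping.
ORIGIN_SYNONYMS = {
    "maroc": "morocco",
    "fr:maroc": "morocco",
    "morocco": "morocco",
    "españa": "spain",
    "équateur": "ecuador",
}


def normalize_origins(raw: Optional[str]) -> Optional[frozenset[str]]:
    """Turn a free-text origins string into a set of normalised tokens.

    Empty or blank strings become None (missing) rather than an empty set.
    """
    if raw is None or raw.strip() == "":
        return None
    tokens = set()
    for part in raw.split(","):
        token = part.strip().casefold()
        if not token:
            continue
        tokens.add(ORIGIN_SYNONYMS.get(token, token))
    return frozenset(tokens)
```

'en:'-prefixed tags pass through unchanged, so 'en:Madagarcar vanilla' keeps its typo. Stripping language prefixes and correcting spellings would be further, dataset-specific steps.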

nova_groups_tags unknown metadata

This column is named nova_groups_tags, suggesting it carries NOVA food-classification group tags (a 1-4 processing-level scheme used in nutrition datasets). However, saturn skipped profiling it, so no type, cardinality, or value statistics are available beyond a 50-row sample with zero nulls. Without inferred kind or unique counts, the actual content and format remain unverified.

Treatment: Manually inspect raw values to confirm format. If they are NOVA group tags (expected to be a small set), one-hot encode them.

anthropic:claude-opus-4-7 · confidence low
Out[801]:

saturn.columns["nova_groups_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique (not computed)
alert: skipped (no profiler for kind=unknown)

languages unknown other

The column is named 'languages', but saturn skipped profiling it, so its type and distribution are unknown. With 50 rows and no nulls, every record carries some value, yet unique counts and other stats are unavailable. The name suggests a list- or mapping-like field (e.g., the languages in which the product's information is provided). A field of that shape would explain why the dissector assigned kind=unknown.

Treatment: Inspect raw values and parse (likely explode list-typed entries) before further profiling.

anthropic:claude-opus-4-7 · confidence low
Out[803]:

saturn.columns["languages"].stats

stat value
n 50
nulls 0 (0.0%)
unique (not computed)
alert: skipped (no profiler for kind=unknown)

nutriscore_2023_tags unknown other

This column is flagged as skipped by the profiler, so no descriptive statistics, uniqueness, or value samples are available beyond a row count of 50 and a 0.0 null rate. The name suggests it holds Nutri-Score 2023 classification tags (likely a categorical label such as a, b, c, d, e), but that interpretation cannot be verified from the evidence provided. No distributional or quality signals can be assessed here.

Treatment: Parse the raw values into a supported type and re-run profiling to determine type and cardinality before use.

anthropic:claude-opus-4-7 · confidence low
Out[805]:

saturn.columns["nutriscore_2023_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique (not computed)
alert: skipped (no profiler for kind=unknown)

packaging_materials_tags unknown free_text

This column appears to be a tags field listing packaging materials, likely a delimited or list-valued string per row. Saturn skipped profiling, so no uniqueness, cardinality, or value-frequency stats are available beyond a 50-row sample with zero nulls. Without parsed token statistics, the actual material vocabulary and its distribution remain unknown.

Treatment: Split on the tag delimiter and one-hot or multi-hot encode the resulting material tokens before modelling.

anthropic:claude-opus-4-7 · confidence low
Out[807]:

saturn.columns["packaging_materials_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique (not computed)
alert: skipped (no profiler for kind=unknown)

lang categorical feature

This is a language code column with 5 distinct values and no nulls across 50 rows. The distribution is heavily dominated by 'fr' at 70% (35/50), with 'en' a distant second at 10 occurrences and 'de', 'bg', 'ro' appearing only 1-3 times each. Entropy ratio of 0.56 confirms the imbalance, and the long tail of rare languages (bg, ro with single observations) may be unstable for any per-language modelling.

Treatment: One-hot encode with rare languages (bg, ro, de) collapsed into an 'other' bucket.

anthropic:claude-opus-4-7 · confidence high
Out[809]:

saturn.columns["lang"].stats

stat value
n 50
nulls 0 (0.0%)
unique 5
top_value fr
top_rate 0.7
cardinality 5
entropy 1.294
entropy_ratio 0.5572
Fig 198.
Top values for lang.
Top values for lang (5 unique shown, of 5 total).
value count share
fr 35 70.0%
en 10 20.0%
de 3 6.0%
bg 1 2.0%
ro 1 2.0%
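
The lang treatment collapses rare languages and then one-hot encodes the result. The threshold `min_count=5` is an assumption. It was chosen so that, with the counts above (fr 35, en 10, de 3, bg 1, ro 1), exactly de, bg, and ro fall into 'other'. `collapse_rare` and `one_hot` are hypothetical helpers.

```python
from collections import Counter


def collapse_rare(values: list[str], min_count: int = 5, other: str = "other") -> list[str]:
    """Replace categories seen fewer than `min_count` times with `other`."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other for v in values]


def one_hot(values: list[str]) -> tuple[list[str], list[list[int]]]:
    """One-hot encode values; returns (sorted category names, 0/1 rows)."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        rows.append(row)
    return categories, rows
```

On this sample the result is three indicator columns (en, fr, other). The 'other' bucket holds 5 rows, which is still thin, so it may be worth re-checking the threshold on a larger pull.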

packaging_text_sv categorical free_text

Swedish packaging text field that is effectively empty: 92% of the 50 rows are null and the remaining 4 non-null values are all the empty string, giving a single observed category with entropy 0. There is no usable signal here.

Treatment: Drop; column is 92% null and the only non-null value is an empty string.

anthropic:claude-opus-4-7 · confidence high
Out[812]:

saturn.columns["packaging_text_sv"].stats

stat value
n 50
nulls 46 (92.0%)
unique 1
top_value (empty string)
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: null_rate (92.0% null)
alert: imbalance (top value is 100.0% of rows)
Fig 199.
Top values for packaging_text_sv.
Top values for packaging_text_sv (1 unique shown, of 1 total).
value count share
(empty string) 4 8.0%

photographers unknown other

This column 'photographers' was skipped by the profiler, so no type, uniqueness, or value statistics are available beyond a row count of 50 and a null rate of 0.0. The name suggests it holds photographer attributions, possibly as a list or nested structure that the dissector could not parse. Without unique counts or sample values, nothing further can be inferred.

Treatment: Inspect raw values manually to determine structure before deciding whether to parse, explode, or drop.

anthropic:claude-opus-4-7 · confidence low
Out[815]:

saturn.columns["photographers"].stats

stat value
n 50
nulls 0 (0.0%)
unique (not computed)
alert: skipped (no profiler for kind=unknown)

languages_codes unknown other

This column is named languages_codes but saturn skipped detailed profiling, leaving kind as unknown with no uniqueness or value statistics. The only confirmed signals are 50 rows and a 0.0 null rate. Without sample values or cardinality, the structure (single code, list, or delimited string) cannot be determined from the evidence.

Treatment: Re-profile with parsing enabled to determine whether values are scalar codes or lists before use.

anthropic:claude-opus-4-7 · confidence low
Out[817]:

saturn.columns["languages_codes"].stats

stat value
n 50
nulls 0 (0.0%)
unique (not computed)
alert: skipped (no profiler for kind=unknown)

ecoscore_grade categorical label

This is the Eco-Score grade, a categorical environmental rating with letter tiers from a-plus through f plus sentinel values 'unknown' and 'not-applicable'. Distribution skews toward worse grades: 'e' leads at 12/50 (top_rate 0.24), followed by 'd' (9), while 'a' and 'a-plus' together account for only 5 rows. Six rows are 'unknown' and one 'not-applicable', so roughly 14% of values are non-informative sentinels that need handling.

Treatment: Map sentinels ('unknown','not-applicable') to NA and treat the remaining tiers as an ordinal factor.

anthropic:claude-opus-4-7 · confidence high
Out[819]:

saturn.columns["ecoscore_grade"].stats

stat value
n 50
nulls 0 (0.0%)
unique 9
top_value e
top_rate 0.24
cardinality 9
entropy 2.808
entropy_ratio 0.8857
Fig 200.
Top values for ecoscore_grade.
Top values for ecoscore_grade (9 unique shown, of 9 total).
value count share
e 12 24.0%
d 9 18.0%
b 8 16.0%
c 8 16.0%
unknown 6 12.0%
a 3 6.0%
a-plus 2 4.0%
not-applicable 1 2.0%
f 1 2.0%
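
This is a minimal sketch of the ecoscore_grade treatment. The two sentinels become None, and the letter tiers become ordinal ranks. The rank direction (0 = a-plus, the best tier) is an arbitrary choice. `encode_ecoscore` is a hypothetical helper.

```python
from typing import Optional

# Eco-Score tiers from best to worst, as they appear in the value table.
ECOSCORE_ORDER = ["a-plus", "a", "b", "c", "d", "e", "f"]
ECOSCORE_RANK = {grade: rank for rank, grade in enumerate(ECOSCORE_ORDER)}
SENTINELS = {"unknown", "not-applicable"}


def encode_ecoscore(grade: Optional[str]) -> Optional[int]:
    """Map a grade to an ordinal rank (0 = a-plus ... 6 = f); sentinels become None."""
    if grade is None or grade in SENTINELS:
        return None
    try:
        return ECOSCORE_RANK[grade]
    except KeyError:
        raise ValueError(f"unexpected ecoscore_grade: {grade!r}") from None
```

The helper raises on unrecognised grades instead of silently mapping them to None. That way, a new tier or a typo in a larger pull surfaces immediately rather than hiding among the 7 sentinel rows.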

ingredients_n numeric feature

Numeric count of ingredients per record, ranging from 1 to 39 with a median of 9 and mean of 11.7. The distribution is right-skewed (skew 1.24, kurtosis 1.44) with a wide IQR of 11 and 2 outliers (4%) on the high end. No nulls or zeros, and 22 unique values across 50 rows suggest a discrete count variable.

Treatment: Consider log or sqrt transform before modelling to tame the right skew.

anthropic:claude-opus-4-7 · confidence high
Out[822]:

saturn.columns["ingredients_n"].stats

stat value
n 50
nulls 0 (0.0%)
unique 22
min 1
max 39
mean 11.7
median 9
std 8.244
q1 5
q3 16
iqr 11
skew 1.237
kurtosis 1.435
n_outliers 2
outlier_rate 0.04
zero_rate 0
Fig 201.
Distribution of ingredients_n. Vertical dash marks the median.
Histogram bins for ingredients_n (median: 9.0).
bin count
1 – 6.429 18
6.429 – 11.86 9
11.86 – 17.29 13
17.29 – 22.71 6
22.71 – 28.14 2
28.14 – 33.57 0
33.57 – 39 2
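
The outlier count and the suggested transform can be checked with a few lines of Python. This sketch assumes saturn flags outliers with the common 1.5 × IQR (Tukey) rule, which the report does not state. The helper names are hypothetical.

```python
import math


def tukey_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Return the (lower, upper) Tukey fences q1 - k*IQR and q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def log_count(n: int) -> float:
    """log1p transform for a non-negative count; compresses the right tail."""
    if n < 0:
        raise ValueError("counts must be non-negative")
    return math.log1p(n)
```

With q1 = 5 and q3 = 16 from the stats above, the fences are -11.5 and 32.5. The histogram has no rows between 28.14 and 33.57 and two in the last bin (33.57 to 39), which is consistent with n_outliers 2 under this rule. After log1p, the observed range 1 to 39 shrinks to roughly 0.69 to 3.69.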

allergens categorical feature

Categorical allergen tags using an Open Food Facts-style 'en:' prefix, often combined as comma-separated lists (e.g., 'en:gluten,en:milk,en:soybeans'). The most common value is an empty string at 32% (16/50), suggesting missing or no-allergen records encoded as blanks rather than nulls. With 16 unique values across 50 rows and entropy ratio 0.84, the distribution is fairly spread; gluten, milk, and soybeans dominate the non-empty tags.

Treatment: Split on comma and multi-hot encode allergen tags; treat empty string as missing.

anthropic:claude-opus-4-7 · confidence high
Out[825]:

saturn.columns["allergens"].stats

stat value
n 50
nulls 0 (0.0%)
unique 16
top_value (empty string)
top_rate 0.32
cardinality 16
entropy 3.364
entropy_ratio 0.8411
Fig 202.
Top values for allergens.
Top values for allergens (16 unique shown, of 16 total).
value count share
(empty string) 16 32.0%
en:soybeans 5 10.0%
en:gluten 5 10.0%
en:gluten,en:milk,en:soybeans 4 8.0%
en:milk,en:nuts,en:soybeans 4 8.0%
en:gluten,en:milk 3 6.0%
en:eggs,en:gluten,en:milk,en:soybeans 2 4.0%
en:milk 2 4.0%
en:eggs,en:gluten,en:milk 2 4.0%
en:banana,en:milk 1 2.0%
en:gluten,en:milk,en:nuts,en:soybeans 1 2.0%
en:gluten,en:sesame-seeds 1 2.0%
en:eggs,en:gluten,en:sulphur-dioxide-and-sulphites 1 2.0%
en:gluten,en:nuts 1 2.0%
en:eggs,en:gluten 1 2.0%
en:nuts,en:sulphur-dioxide-and-sulphites 1 2.0%
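
The allergens treatment (split on comma, multi-hot encode, treat the empty string as missing) can be sketched in plain Python. `parse_allergens` and `multi_hot` are hypothetical helpers.

```python
from typing import Optional


def parse_allergens(raw: Optional[str]) -> Optional[set[str]]:
    """Split a comma-separated allergen tag string; empty or blank -> None (missing)."""
    if raw is None or raw.strip() == "":
        return None
    return {tag.strip() for tag in raw.split(",") if tag.strip()}


def multi_hot(rows: list[Optional[set[str]]]) -> tuple[list[str], list[Optional[list[int]]]]:
    """Multi-hot encode tag sets over a sorted vocabulary; missing rows stay None."""
    vocab = sorted(set().union(*(r for r in rows if r is not None)))
    encoded = [None if r is None else [int(tag in r) for tag in vocab] for r in rows]
    return vocab, encoded
```

Following the treatment above, missing rows stay None rather than becoming all-zero vectors. If the 16 blanks actually mean "no allergens declared", an all-zero vector would be the better encoding. The stats alone cannot settle which reading is correct.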

minerals_tags unknown other

This column was skipped by the profiler, so no statistics are available beyond a row count of 50 and a 0.0 null rate. The name suggests it holds tag-style annotations for minerals (likely a list or delimited string per row), but uniqueness, cardinality, and value distribution are all unknown here.

Treatment: Re-profile with a parser appropriate for list/tag fields before deciding on use.

anthropic:claude-opus-4-7 · confidence low
Out[828]:

saturn.columns["minerals_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique (not computed)
alert: skipped (no profiler for kind=unknown)

product_name categorical free_text

Free-text product name field with 49 unique values across 50 rows and a near-maximal entropy ratio of 0.998, so it is effectively a per-row label. Values mix languages (French, English, and Bulgarian in Cyrillic script) and formats, ranging from brand-only names like 'Henry’s' to full descriptors like 'CRISTALINE Eau De Source 0.5L'. One row is an empty string despite a reported null_rate of 0.0. The single repeat ('Henry’s', 2 rows) is the only duplicate; every other value occurs once.

Treatment: Normalize casing and empty strings, then tokenize/embed rather than one-hot encode.

anthropic:claude-opus-4-7 · confidence high
Out[830]:

saturn.columns["product_name"].stats

stat value
n 50
nulls 0 (0.0%)
unique 49
top_value Henry’s
top_rate 0.04
cardinality 49
entropy 5.604
entropy_ratio 0.9981
alert: long_tail (48 singleton categories)
Fig 203.
Top values for product_name.
Top values for product_name (20 unique shown, of 49 total).
value count share
Henry’s 2 4.0%
Perly 1 2.0%
Prince Goût Chocolat 1 2.0%
Excellence Noir Prodigieux 90% Cacao 1 2.0%
Tonik 1 2.0%
Sésame 1 2.0%
Шоколад 85% какаова маса 1 2.0%
CRISTALINE Eau De Source 0.5L 1 2.0%
(empty string) 1 2.0%
Organic 70% Dark Chocolate Bar 1 2.0%
KING COOKIES 1 2.0%
Sable coco Henry s 42g 1 2.0%
Biscuits croquants au coeur onctueux de Nutella® 1 2.0%
Tartine croustillante Authentique 1 2.0%
Excellence Noir Intense 70% Cacao 1 2.0%
Lightly sea salted crisps 1 2.0%
Dark chocolate 1 2.0%
Excellence Noir Puissant 85% Cacao 1 2.0%
Fourrés Chocolat Noir 1 2.0%
Extra dark 74% Cocoa 1 2.0%

purchase_places categorical free_text

Free-form purchase location strings, often listing multiple places per row separated by commas. France is the most common value, at 9 of the 49 non-null rows (top_rate 0.184), followed by the empty string (6) and Maroc (5). There are 32 unique values across 50 rows and an entropy ratio of 0.90, and the long tail includes multi-country concatenations like 'Madrid,España,Montargis,France,Würzburg,Deutschland,...'. Several signs point to inconsistent data entry rather than a clean categorical: mixed languages (Maroc vs Morocco, España vs Espagne, Italia vs Italien), mixed granularity (cities, regions, countries), and embedded postal codes.

Treatment: Split on commas and normalize each token to a canonical country before using as a multi-label feature.

anthropic:claude-opus-4-7 · confidence high
Out[833]:

saturn.columns["purchase_places"].stats

stat value
n 50
nulls 1 (2.0%)
unique 32
top_value France
top_rate 0.1837
cardinality 32
entropy 4.479
entropy_ratio 0.8958
alert: long_tail (29 singleton categories)
Fig 204.
Top values for purchase_places.
Top values for purchase_places (20 unique shown, of 32 total).
value count share
France 9 18.0%
(empty string) 6 12.0%
Maroc 5 10.0%
Casablanca,Morocco 1 2.0%
F-77480 Mousseaux-les-Bray,France 1 2.0%
Madrid,España,Montargis,France,Würzburg,Deutschland,Italia,Singapore,République tchèque,Toronto,Burlington,Oakville 1 2.0%
France,Lacaune 1 2.0%
Slovenija,Finland,United Kingdom 1 2.0%
Villeurbanne,France,Toulon 1 2.0%
Lund,Sweden 1 2.0%
Fez,Morocco 1 2.0%
Italien,France,Lacaune,Portugal 1 2.0%
Bar-le-Duc,France 1 2.0%
France,République tchèque,Lacaune 1 2.0%
France,United Kingdom 1 2.0%
Veynes,France,Trignac 1 2.0%
Morocco 1 2.0%
France,Belgique,Espagne,Estonie 1 2.0%
España,France,Serbia,Praha,Czechia 1 2.0%
France,Normandie 1 2.0%

quantity categorical feature

This column records product quantities as free-text strings. Gram weights dominate, but the format is inconsistent: '100 g', '100g', and '100 gram' all appear as separate values among the top entries. With 36 unique values across 50 rows and an entropy ratio of 0.959, the field is highly fragmented. The most common value, '100 g', covers only 12.2% of non-null rows. One row is null and two more are empty strings. The long_tail alert partly reflects this unit and spacing inconsistency, although the tail also holds genuinely different sizes and units (e.g., '500 ml', and a bare '52' with no unit).

Treatment: Normalize units and parse into a numeric amount plus a unit (grams vs millilitres) before use.

anthropic:claude-opus-4-7 · confidence high
Out[836]:

saturn.columns["quantity"].stats

stat value
n 50
nulls 1 (2.0%)
unique 36
top_value 100 g
top_rate 0.1224
cardinality 36
entropy 4.956
entropy_ratio 0.9587
alert: long_tail (28 singleton categories)
Fig 205.
Top values for quantity.
Top values for quantity (20 unique shown, of 36 total).
value count share
100 g 6 12.0%
100g 3 6.0%
125g 2 4.0%
42g 2 4.0%
90g 2 4.0%
(empty string) 2 4.0%
100 gram 2 4.0%
230 g 2 4.0%
300 g 1 2.0%
22 g 1 2.0%
230g 1 2.0%
500 ml 1 2.0%
150 g 1 2.0%
304 g 1 2.0%
275 g 1 2.0%
150g 1 2.0%
225 g 1 2.0%
85 g 1 2.0%
36 g 1 2.0%
52 1 2.0%
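
This is a minimal sketch of the quantity treatment using the standard `re` module. The unit table is hypothetical and only covers the spellings seen in the top values ("g", "gram", "ml"). Units such as kg, cl, or l would need adding, and anything the pattern does not recognise is returned as None for manual review.

```python
import re
from typing import Optional

# Hypothetical alias table limited to units visible in the top-values table.
UNIT_ALIASES = {"g": "g", "gram": "g", "grams": "g", "ml": "ml"}

# A number (optionally with a decimal point or comma), optional spaces, then a unit word.
QUANTITY_RE = re.compile(r"^\s*(\d+(?:[.,]\d+)?)\s*([a-zA-Z]+)\s*$")


def parse_quantity(raw: Optional[str]) -> Optional[tuple[float, str]]:
    """Parse strings like '100 g', '100g' or '100 gram' into (amount, unit).

    Returns None for missing or blank values, unit-less numbers such as '52',
    and any unit not in UNIT_ALIASES.
    """
    if raw is None:
        return None
    match = QUANTITY_RE.match(raw)
    if not match:
        return None
    amount = float(match.group(1).replace(",", "."))
    unit = UNIT_ALIASES.get(match.group(2).lower())
    if unit is None:
        return None
    return amount, unit
```

Grams and millilitres are kept as separate units. Converting one to the other would require guessing a density for liquids such as the 500 ml item.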

traces_tags unknown other

The column `traces_tags` was skipped by the profiler, so no type, uniqueness, or distribution statistics are available. The only confirmed signals are 50 rows and a null rate of 0.0: every row has some value, but its content and structure are unknown. In Open Food Facts, "traces" usually refers to "may contain" (cross-contamination) allergen declarations. This column is therefore plausibly a list of such tags that the profiler could not parse.

Treatment: Inspect raw values manually to determine structure before deciding whether to parse, explode, or drop.

anthropic:claude-opus-4-7 · confidence low
Out[839]:

saturn.columns["traces_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique (not computed)
alert: skipped (no profiler for kind=unknown)

origin_uk categorical feature

This appears to be the Ukrainian-localized origin text: `uk` is the ISO 639-1 code for Ukrainian, matching sibling columns such as packaging_text_uk. It carries virtually no signal. 98% of the 50 rows are null, and the only non-null value observed is an empty string. With cardinality of 1 and entropy of 0, the column has no discriminative power as it stands.

Treatment: Drop; column is 98% null with a single empty-string value.

anthropic:claude-opus-4-7 · confidence high
Out[841]:

saturn.columns["origin_uk"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value (empty string)
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail (1 singleton categories)
alert: null_rate (98.0% null)
alert: imbalance (top value is 100.0% of rows)
Fig 206.
Top values for origin_uk.
Top values for origin_uk (1 unique shown, of 1 total).
value count share
(empty string) 1 2.0%

generic_name_ar categorical metadata

Arabic generic-name field that is overwhelmingly empty: 80% of the 50 rows are null and of the 10 populated rows, 9 are blank strings and only 1 carries an actual Arabic value (الامير). With cardinality of just 2 and a top_rate of 0.9 on the empty string, this column carries almost no information as currently captured.

Treatment: Drop or defer until source data is backfilled; not usable as-is.

anthropic:claude-opus-4-7 · confidence high
Out[844]:

saturn.columns["generic_name_ar"].stats

stat value
n 50
nulls 40 (80.0%)
unique 2
top_value (empty string)
top_rate 0.9
cardinality 2
entropy 0.469
entropy_ratio 0.469
alert: null_rate (80.0% null)
Fig 207.
Top values for generic_name_ar.
Top values for generic_name_ar (2 unique shown, of 2 total).
value count share
(empty string) 9 18.0%
الامير 1 2.0%

packaging_text_uk categorical metadata

This column appears to be Ukrainian packaging text, but it is effectively empty: 98% of the 50 rows are null and the single non-null value is itself an empty string. Cardinality is 1 with zero entropy, so it carries no information.

Treatment: Drop; the column has no usable signal.

anthropic:claude-opus-4-7 · confidence high
Out[847]:

saturn.columns["packaging_text_uk"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value (empty string)
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail (1 singleton categories)
alert: null_rate (98.0% null)
alert: imbalance (top value is 100.0% of rows)
Fig 208.
Top values for packaging_text_uk.
Top values for packaging_text_uk (1 unique shown, of 1 total).
value count share
(empty string) 1 2.0%

ingredients_text_ar categorical free_text

Arabic-language ingredients text, populated for only 11 of 50 rows (null_rate 0.78) and with 10 of those 11 non-null entries being empty strings. Only one row carries an actual Arabic ingredient list, giving cardinality 2 and a top_rate of 0.91 on the empty string. Effectively unusable as a feature on this sample.

Treatment: Drop for modelling; retain only if you specifically need Arabic ingredient parsing and can source more populated rows.

anthropic:claude-opus-4-7 · confidence high
Out[850]:

saturn.columns["ingredients_text_ar"].stats

stat value
n 50
nulls 39 (78.0%)
unique 2
top_value (empty string)
top_rate 0.9091
cardinality 2
entropy 0.4395
entropy_ratio 0.4395
alert: null_rate (78.0% null)
Fig 209.
Top values for ingredients_text_ar.
Top values for ingredients_text_ar (2 unique shown, of 2 total).
value count share
(empty string) 10 20.0%
سكر،دقيق،دهون نباتية (نخيل،شيا)،مسحوق كاكاو،شراب جلوكوز،نشا الذرة،مسحوق حليب،مسحوق مصل اللبن،مسحوق حليب كامل الدسم،عجينة الكاكاو،مواد رافعة(بكربونات الصوديوم و الأمونيوم)،ملح،مستحلب(لسيتين الصويا(E322)وڤانيلين 1 2.0%

ingredients_text_uk categorical free_text

Ukrainian-language ingredients text, almost entirely absent in this sample. 98% of rows are null and the single non-null value is an empty string, leaving zero usable content. Entropy is 0 and cardinality is 1, so the column carries no signal here.

Treatment: Drop from this slice; revisit only if a Ukrainian-locale subset is loaded.

anthropic:claude-opus-4-7 · confidence high
Out[853]:

saturn.columns["ingredients_text_uk"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value (empty string)
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail (1 singleton categories)
alert: null_rate (98.0% null)
alert: imbalance (top value is 100.0% of rows)
Fig 210.
Top values for ingredients_text_uk.
Top values for ingredients_text_uk (1 unique shown, of 1 total).
value count share
(empty string) 1 2.0%

last_check_dates_tags unknown other

This column was skipped by the profiler, so no type, uniqueness, or distribution statistics are available beyond a row count of 50 and a null rate of 0.0. The name suggests it stores tags associated with last-check dates, possibly a list or composite field that the dissector could not parse. Without further stats, nothing can be said about its values.

Treatment: Inspect raw values manually and re-profile after parsing into a structured type.

anthropic:claude-opus-4-7 · confidence low
Out[856]:

saturn.columns["last_check_dates_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique (not computed)
alert: skipped (no profiler for kind=unknown)

checked categorical feature

This looks like a checkbox-style flag (likely from a web form), where the only observed value is "on" in 7 of 50 rows. The remaining 86% are null, and entropy is 0.0 because there is no variation among the non-null entries. With cardinality of 1, the only information it carries is presence versus absence ("on" vs null).

Treatment: Convert to a boolean (on vs null) or drop, since it has only one observed value.

anthropic:claude-opus-4-7 · confidence high
Out[858]:

saturn.columns["checked"].stats

stat value
n 50
nulls 43 (86.0%)
unique 1
top_value on
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: null_rate (86.0% null)
alert: imbalance (top value is 100.0% of rows)
Fig 211.
Top values for checked.
Top values for checked (1 unique shown, of 1 total).
value count share
on 7 14.0%

packaging_text_ar categorical free_text

This appears to be Arabic-language packaging text, but it carries no information in this sample: 80% of the 50 rows are null and the remaining 10 values are all empty strings, giving cardinality 1 and entropy 0. There is nothing to model or join on here.

Treatment: Drop; the column is effectively constant-empty with 80% nulls.

anthropic:claude-opus-4-7 · confidence high
Out[861]:

saturn.columns["packaging_text_ar"].stats

stat value
n 50
nulls 40 (80.0%)
unique 1
top_value (empty string)
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: null_rate (80.0% null)
alert: imbalance (top value is 100.0% of rows)
Fig 212.
Top values for packaging_text_ar.
Top values for packaging_text_ar (1 unique shown, of 1 total).
value count share
(empty string) 10 20.0%

carbon_footprint_percent_of_known_ingredients numeric feature

Numeric coverage metric that appears to give the share of a product's ingredients (by percentage) for which a carbon footprint estimate is known. Values range from 8.0 to 105.0, with a median of 70.0. The 62% null rate is the dominant signal: only 19 of the 50 rows are populated (each with a distinct value), so most records lack any coverage figure at all. The max of 105.0 is surprising for a percentage. One trace in carbon_footprint_from_known_ingredients_debug lists en:wholemeal-rye-flour at 77% and en:rye-flour at 28%, which sum to 105, so the excess plausibly comes from overlapping ingredient-percentage estimates. The distribution is slightly left-skewed (skew -0.45) with no flagged outliers.

Treatment: Impute or add a missingness indicator before modelling, and verify whether values above 100 are valid.

anthropic:claude-opus-4-7 · confidence medium
Out[864]:

saturn.columns["carbon_footprint_percent_of_known_ingredients"].stats

stat value
n 50
nulls 31 (62.0%)
unique 19
min 8
max 105
mean 61.79
median 70
std 28.98
q1 45.5
q3 78.3
iqr 32.8
skew -0.4493
kurtosis -0.8083
n_outliers 0
outlier_rate 0
zero_rate 0
alert: null_rate (62.0% null)
Fig 213.
Distribution of carbon_footprint_percent_of_known_ingredients. Vertical dash marks the median.
Histogram bins for carbon_footprint_percent_of_known_ingredients (median: 70.0).
bin count
8 – 27.4 3
27.4 – 46.8 2
46.8 – 66.2 3
66.2 – 85.6 8
85.6 – 105 3
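
This is a minimal sketch of the carbon_footprint_percent_of_known_ingredients treatment. It produces an explicit missingness indicator, a review flag for values above 100, and the raw value, left as NaN when missing for a later imputation step. The feature names are hypothetical. As in the report, the sketch does not decide whether the 105 value should be capped, kept, or dropped.

```python
import math
from typing import Optional


def coverage_features(value: Optional[float]) -> dict[str, float]:
    """Build model inputs for carbon_footprint_percent_of_known_ingredients.

    - cf_pct_missing: 1.0 when the value is absent, else 0.0
    - cf_pct_over_100: 1.0 when the value exceeds 100 (flag for manual review)
    - cf_pct: the raw value, or NaN when missing (to be imputed downstream)
    """
    if value is None or math.isnan(value):
        return {"cf_pct_missing": 1.0, "cf_pct_over_100": 0.0, "cf_pct": math.nan}
    return {
        "cf_pct_missing": 0.0,
        "cf_pct_over_100": float(value > 100),
        "cf_pct": float(value),
    }
```

Keeping the missingness indicator separate from the imputed value lets a model learn whether "no coverage figure" itself predicts anything. That matters because 31 of the 50 rows have no figure at all.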

last_checker categorical metadata

This looks like the username of the last reviewer/checker on a record, with only 4 distinct values across 50 rows. The column is 86% null, so just 7 rows carry a value, and 'aleene' accounts for 3 of those (top_rate 0.43). Entropy ratio of 0.92 indicates the few present values are spread fairly evenly across the small handful of checkers.

Treatment: Treat missingness as its own category (plausibly "never checked"); too sparse to use as a model feature.

anthropic:claude-opus-4-7 · confidence high
Out[867]:

saturn.columns["last_checker"].stats

stat value
n 50
nulls 43 (86.0%)
unique 4
top_value aleene
top_rate 0.4286
cardinality 4
entropy 1.842
entropy_ratio 0.9212
alert: null_rate (86.0% null)
Fig 214.
Top values for last_checker.
Top values for last_checker (4 unique shown, of 4 total).
valuecountshare
aleene36.0%
moon-rabbit24.0%
beniben12.0%
sebleouf12.0%
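
The entropy figures can be reproduced from the value counts. This sketch assumes saturn reports Shannon entropy in bits and computes entropy_ratio by dividing it by log2(cardinality). That is an inference from the numbers, not something the report documents. Under this assumption, last_checker's counts (3, 2, 1, 1) give 1.842 and 0.921, and lang's counts give a ratio of 0.557, both matching the stats tables. The helper names are hypothetical.

```python
import math


def entropy_bits(counts: list[int]) -> float:
    """Shannon entropy (base 2) of a categorical distribution given raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def entropy_ratio(counts: list[int]) -> float:
    """Entropy divided by its maximum, log2(cardinality); 0.0 for a single category."""
    k = sum(1 for c in counts if c > 0)
    return entropy_bits(counts) / math.log2(k) if k > 1 else 0.0
```

The single-category branch matches the entropy_ratio of 0 reported for constant columns such as origin_nb. A ratio near 1, as here, means the few checkers present are close to evenly represented, not that the column is dense.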

product_name_uk categorical metadata

This appears to be a Ukrainian-language product name field that is effectively empty: 98% of the 50 rows are null and the single non-null value is itself an empty string, giving a cardinality of 1 and entropy of 0. There is no usable signal here whatsoever.

Treatment: Drop the column; it carries no information.

anthropic:claude-opus-4-7 · confidence high
Out[870]:

saturn.columns["product_name_uk"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value (empty string)
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail (1 singleton categories)
alert: null_rate (98.0% null)
alert: imbalance (top value is 100.0% of rows)
Fig 215.
Top values for product_name_uk.
Top values for product_name_uk (1 unique shown, of 1 total).
value count share
(empty string) 1 2.0%

generic_name_uk categorical metadata

This appears to be a Ukrainian-localized generic product name field (`uk` is the Ukrainian language code), but it is effectively empty in this sample. 98% of the 50 rows are null, and the only non-null value is an empty string. Cardinality is 1 with zero entropy, so the column carries no information here.

Treatment: Drop; no usable signal at this null rate and cardinality.

anthropic:claude-opus-4-7 · confidence high
Out[873]:

saturn.columns["generic_name_uk"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value (empty string)
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail (1 singleton categories)
alert: null_rate (98.0% null)
alert: imbalance (top value is 100.0% of rows)
Fig 216.
Top values for generic_name_uk.
Top values for generic_name_uk (1 unique shown, of 1 total).
value count share
(empty string) 1 2.0%

product_name_ar categorical metadata

Arabic-language product name field that is mostly absent: 78% null, with only 6 distinct values across 50 rows. The non-null entries are a language mix. One is an Arabic string (برنس); the others include Spanish names like 'Leche Y Almendras' and 'Chocolate Negro 92% Cacao' and other Latin-script names ('Tonjik', 'Eyoo cover'). This suggests the column is not consistently populated with Arabic translations. The most frequent value is the empty string (6 occurrences, 54.5% of non-null rows), so empty strings coexist with true nulls.

Treatment: Drop or defer until translation coverage improves; normalise empty strings to null and validate language before use.

anthropic:claude-opus-4-7 · confidence high
Out[876]:

saturn.columns["product_name_ar"].stats

stat value
n 50
nulls 39 (78.0%)
unique 6
top_value (empty string)
top_rate 0.5455
cardinality 6
entropy 2.049
entropy_ratio 0.7928
alert: long_tail (5 singleton categories)
alert: null_rate (78.0% null)
Fig 217.
Top values for product_name_ar.
Top values for product_name_ar (6 unique shown, of 6 total).
valuecountshare
612.0%
برنس12.0%
Tonjik12.0%
Leche Y Almendras12.0%
Eyoo cover12.0%
Chocolate Negro 92% Cacao12.0%

carbon_footprint_from_known_ingredients_debug categorical metadata

Debug trace string showing the per-ingredient carbon footprint computation (percentage × emission factor = grams) for each product. Every one of the 14 non-null values is unique (entropy_ratio ≈ 1.0, top_rate 0.07), and 72% of rows are null, so it functions as a verbose audit log rather than a feature.

Treatment: Drop from modelling; retain only for auditing the carbon calculation.

anthropic:claude-opus-4-7 · confidence high
Out[879]:

saturn.columns["carbon_footprint_from_known_ingredients_debug"].stats

stat value
n 50
nulls 36 (72.0%)
unique 14
top_value en:cereal 50% x 0.3 = 15 g -
top_rate 0.07143
cardinality 14
entropy 3.807
entropy_ratio 1
alert: long_tail (14 singleton categories)
alert: null_rate (72.0% null)
Fig 218.
Top values for carbon_footprint_from_known_ingredients_debug.
Top values for carbon_footprint_from_known_ingredients_debug (14 unique shown, of 14 total).
value count share
en:cereal 50% x 0.3 = 15 g - 1 2.0%
en:wheat-flour 55.1% x 1.2 = 66.12 g - 1 2.0%
en:wheat-flour 32% x 1.2 = 38.4 g - en:cane-sugar 9% x 1.3 = 11.7 g - 1 2.0%
en:wholemeal-rye-flour 77% x 1.2 = 92.4 g - en:rye-flour 28% x 1.2 = 33.6 g - 1 2.0%
en:wheat-flour 39% x 1.2 = 46.8 g - en:dark-chocolate 25% x 4.9 = 122.5 g - en:whole-wheat-flour 15% x 1.2 = 18 g - 1 2.0%
en:wholemeal-rye-flour 59% x 1.2 = 70.8 g - en:wheat-bran 27% x 0.6 = 16.2 g - en:oat-flakes 12% x 0.3 = 3.6 g - 1 2.0%
en:wheat-flour 68.5% x 1.2 = 82.2 g - en:wheat-germ 5.2% x 0.6 = 3.12 g - 1 2.0%
en:hazelnut-oil 13% x 2.6 = 33.8 g - 1 2.0%
en:whole-wheat-flour 26.5% x 1.2 = 31.8 g - en:wheat-flour 26.1% x 1.2 = 31.32 g - en:wheat-bran 19.9% x 0.6 = 11.94 g - en:fig-paste 5.1% x 0.3 = 1.53 g - 1 2.0%
en:wheat-flour 41% x 1.2 = 49.2 g - en:fresh-egg 11% x 2.6 = 28.6 g - 1 2.0%
en:walnut-kernel 25% x 1.3 = 32.5 g - en:almond 25% x 5.9 = 147.5 g - en:cranberry 25% x 0.3 = 7.5 g - 1 2.0%
en:whole-fresh-eggs 8% x 2.6 = 20.8 g - 1 2.0%
en:wheat-flour 37% x 1.2 = 44.4 g - en:milk-chocolate 27% x 5.9 = 159.3 g - en:whole-wheat-flour 12% x 1.2 = 14.4 g - 1 2.0%
en:cereal 98.3% x 0.3 = 29.49 g - 1 2.0%
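
Each trace term appears to follow the pattern `ingredient percent% x factor = grams g`, with terms separated by ` - `. In every row shown, the percentage times the factor reproduces the grams figure (for example, 50 × 0.3 = 15). This sketch parses a trace and checks that arithmetic. The regular expression is an assumption fitted to the rows above, not a documented format, and the helper names are hypothetical.

```python
import re
from typing import NamedTuple

# One term, e.g. "en:wheat-flour 55.1% x 1.2 = 66.12 g" (pattern inferred from the table).
TERM_RE = re.compile(
    r"(?P<ingredient>\S+) (?P<pct>[\d.]+)% x (?P<factor>[\d.]+) = (?P<grams>[\d.]+) g"
)


class Term(NamedTuple):
    ingredient: str
    pct: float
    factor: float
    grams: float


def parse_debug(trace: str) -> list[Term]:
    """Extract (ingredient, percent, factor, grams) terms from a debug trace."""
    return [
        Term(m["ingredient"], float(m["pct"]), float(m["factor"]), float(m["grams"]))
        for m in TERM_RE.finditer(trace)
    ]


def check_trace(trace: str, tol: float = 0.01) -> tuple[float, float]:
    """Verify pct * factor == grams for every term; return (total pct, total grams)."""
    terms = parse_debug(trace)
    for t in terms:
        if abs(t.pct * t.factor - t.grams) > tol:
            raise ValueError(f"inconsistent term: {t}")
    return sum(t.pct for t in terms), sum(t.grams for t in terms)
```

For the rye trace, the percentages sum to 105, the same value as the maximum of carbon_footprint_percent_of_known_ingredients. This hints, without row-level confirmation, that the coverage metric is the sum of these percentages.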

last_checked_t numeric timestamp

Values are Unix epoch seconds ranging from 1540933974 to 1730226344, consistent with "last checked" timestamps spanning late October 2018 to late October 2024. Severe sparsity dominates. The null_rate is 0.86, so only 7 of the 50 rows are populated (each with a distinct value), and the column is barely usable as-is. The distribution is mildly right-skewed (skew 0.81) with no outliers flagged.

Treatment: Convert from epoch seconds to datetime and treat as mostly-missing; impute or drop before modelling.

anthropic:claude-opus-4-7 · confidence high
Out[882]:

saturn.columns["last_checked_t"].stats

stat value
n 50
nulls 43 (86.0%)
unique 7
min 1.541e+09
max 1.73e+09
mean 1.607e+09
median 1.565e+09
std 7.772e+07
q1 1.556e+09
q3 1.652e+09
iqr 9.601e+07
skew 0.8106
kurtosis -1.103
n_outliers 0
outlier_rate 0
zero_rate 0
alert: null_rate (86.0% null)
Fig 219.
Distribution of last_checked_t. Vertical dash marks the median.
Histogram bins for last_checked_t (median: 1564679969.0).
bin count
1.541e+09 – 1.579e+09 4
1.579e+09 – 1.617e+09 1
1.617e+09 – 1.655e+09 0
1.655e+09 – 1.692e+09 0
1.692e+09 – 1.73e+09 2
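
The epoch-to-datetime conversion in the last_checked_t treatment needs only the standard library. This sketch returns timezone-aware UTC datetimes, because naive local-time conversion would shift the dates depending on the machine. `epoch_to_utc` is a hypothetical helper.

```python
from datetime import datetime, timezone
from typing import Optional


def epoch_to_utc(seconds: Optional[float]) -> Optional[datetime]:
    """Convert Unix epoch seconds to a timezone-aware UTC datetime; None stays None."""
    if seconds is None:
        return None
    return datetime.fromtimestamp(seconds, tz=timezone.utc)
```

Applied to the stats above, the min 1540933974 becomes 2018-10-30 21:12:54 UTC, and the max 1730226344 becomes 2024-10-29 18:25:44 UTC. The reported median, 1564679969, becomes 2019-08-01 17:19:29 UTC.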

ingredients_text_with_allergens_uk categorical free_text

This appears to be the Ukrainian-localized (`uk`) variant of the ingredients-with-allergens text field, but it is effectively empty in this sample. 98% of the 50 rows are null, and the single non-null value is itself an empty string, giving cardinality 1 and zero entropy. There is no usable signal here.

Treatment: Drop; no observed values in this sample.

anthropic:claude-opus-4-7 · confidence high
Out[885]:

saturn.columns["ingredients_text_with_allergens_uk"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value (empty string)
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail (1 singleton categories)
alert: null_rate (98.0% null)
alert: imbalance (top value is 100.0% of rows)
Fig 220.
Top values for ingredients_text_with_allergens_uk.
Top values for ingredients_text_with_allergens_uk (1 unique shown, of 1 total).
value count share
(empty string) 1 2.0%

ingredients_text_with_allergens_ar categorical free_text

Arabic-language ingredients text (with allergen markup) for food products. The column is 82% null across just 50 rows, and of the 9 non-null entries 8 are empty strings — only 1 row carries an actual ingredient list. Effectively no usable signal at this sample size.

Treatment: Drop for now; revisit only if a larger Arabic-localized sample becomes available.

anthropic:claude-opus-4-7 · confidence high
Out[888]:

saturn.columns["ingredients_text_with_allergens_ar"].stats

stat value
n 50
nulls 41 (82.0%)
unique 2
top_value (empty string)
top_rate 0.8889
cardinality 2
entropy 0.5033
entropy_ratio 0.5033
alert: null_rate (82.0% null)
Fig 221.
Top values for ingredients_text_with_allergens_ar.
Show data table
Top values for ingredients_text_with_allergens_ar (2 unique shown, of 2 total).
valuecountshare
816.0%
سكر،دقيق،دهون نباتية (نخيل،شيا)،مسحوق كاكاو،شراب جلوكوز،نشا الذرة،مسحوق حليب،مسحوق مصل اللبن،مسحوق حليب كامل الدسم،عجينة الكاكاو،مواد رافعة(بكربونات الصوديوم و الأمونيوم)،ملح،مستحلب(لسيتين الصويا(E322)وڤانيلين12.0%

origin_ar categorical metadata

Categorical field 'origin_ar' carries a single observed value (an empty string) across the 10 non-null rows, while 80% of records are null. With cardinality 1 and entropy 0, the column conveys no information in this sample.

Treatment: Drop; the column is 80% null and constant on the remainder.

anthropic:claude-opus-4-7 · confidence high
Out[891]:

saturn.columns["origin_ar"].stats

stat value
n 50
nulls 40 (80.0%)
unique 1
top_value ""
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: null_rate (80.0% null)
alert: imbalance (top value is 100.0% of non-null rows)
Fig 222.
Top values for origin_ar.
Top values for origin_ar (1 unique shown, of 1 total).
value count share
"" 10 20.0%

nutriments_estimated unknown other

The column `nutriments_estimated` was skipped by the profiler, so no type, uniqueness, or distribution stats are available. The only facts on record are that all 50 sampled rows are non-null and the kind is unknown. Without further evidence, the content and structure of this field cannot be characterised.

Treatment: Re-profile with an appropriate parser before deciding on downstream use.

anthropic:claude-opus-4-7 · confidence low
Out[894]:

saturn.columns["nutriments_estimated"].stats

stat value
n 50
nulls 0 (0.0%)
unique n/a
alert: skipped (no profiler for kind=unknown)

nutrition_score_warning_no_fiber numeric feature

This appears to be a binary warning flag indicating a missing-fiber condition in a nutrition score, encoded as 1 when triggered. Every one of the 15 non-null rows holds the value 1.0, and 70% of rows are null — consistent with a sparse flag that is only populated when the warning fires. With zero variance, it carries no discriminative signal as-is.

Treatment: Recode nulls to 0 to convert into a usable binary indicator, or drop if still constant after recoding.

anthropic:claude-opus-4-7 · confidence high
Out[896]:

saturn.columns["nutrition_score_warning_no_fiber"].stats

stat value
n 50
nulls 35 (70.0%)
unique 1
min 1
max 1
mean 1
median 1
std 0
q1 1
q3 1
iqr 0
skew 0
kurtosis 0
n_outliers 0
outlier_rate 0
zero_rate 0
alert: null_rate (70.0% null)
alert: constant (only one distinct value)
Fig 223.
Distribution of nutrition_score_warning_no_fiber. Vertical dash marks the median.
Histogram bins for nutrition_score_warning_no_fiber (median: 1.0).
bin count
0.5 – 0.7 0
0.7 – 0.9 0
0.9 – 1.1 15
1.1 – 1.3 0
1.3 – 1.5 0
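The suggested recoding, sketched under the assumption that a null means the warning simply did not fire (the description infers this from the sparsity; the data itself does not confirm it). The helper names are hypothetical:

```python
from typing import Optional


def recode_flag(values: list[Optional[float]]) -> list[int]:
    """Map null -> 0 (assumed: warning not raised) and 1.0 -> 1."""
    return [0 if v is None else int(v) for v in values]


def is_constant(values: list[int]) -> bool:
    """True when the column has at most one distinct value."""
    return len(set(values)) <= 1


raw = [1.0] * 15 + [None] * 35   # 15 flagged rows and 35 nulls, as in the stats above
flag = recode_flag(raw)
print(sum(flag), len(flag), is_constant(flag))  # 15 50 False
```

If the column is loaded with pandas instead, nulls arrive as NaN rather than None, and `Series.fillna(0)` is the equivalent step.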

ingredients_text_debug_tags unknown metadata

This column, named `ingredients_text_debug_tags`, was skipped by the profiler so no distributional statistics are available. The name suggests it holds debugging tags emitted by an ingredients-text parser, likely a list or sparse string field. With 50 rows observed and a 0.0 null rate reported but no unique count or value stats, nothing further can be inferred from the evidence.

Treatment: Inspect raw values manually; likely drop unless debugging the ingredients parser.

anthropic:claude-opus-4-7 · confidence low
Out[899]:

saturn.columns["ingredients_text_debug_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique n/a
alert: skipped (no profiler for kind=unknown)

taxonomies_enhancer_tags unknown other

The column 'taxonomies_enhancer_tags' was skipped by the profiler, so no type, uniqueness, or distributional statistics are available beyond a row count of 50 and a null rate of 0.0. Without kind detection or value samples, its content (likely some form of taxonomy/tag payload based on the name) cannot be verified from the evidence.

Treatment: Re-profile with parsing enabled (likely a nested/list field) before deciding on use.

anthropic:claude-opus-4-7 · confidence low
Out[901]:

saturn.columns["taxonomies_enhancer_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique n/a
alert: skipped (no profiler for kind=unknown)
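Three columns here (`nutriments_estimated`, `ingredients_text_debug_tags`, `taxonomies_enhancer_tags`) were skipped with kind=unknown. Before re-profiling, it helps to see which JSON types they actually hold. A small sketch, assuming rows are dicts parsed with `json.load`; `value_kinds` is a hypothetical helper and the sample values are invented for illustration, since the real contents were never profiled:

```python
import json
from collections import Counter

# Python types produced by json.load, mapped back to JSON type names.
JSON_KIND = {
    dict: "object", list: "array", str: "string", bool: "boolean",
    int: "number", float: "number", type(None): "null",
}


def value_kinds(rows: list[dict], column: str) -> Counter:
    """Tally the JSON type of `column` across rows; a missing key counts as null."""
    return Counter(JSON_KIND.get(type(row.get(column)), "other") for row in rows)


# Hypothetical values; not taken from the sample.
sample = json.loads('[{"taxonomies_enhancer_tags": []}, {"taxonomies_enhancer_tags": ["x"]}]')
print(value_kinds(sample, "taxonomies_enhancer_tags"))  # Counter({'array': 2})
```

A column that comes back all `array` or all `object` needs a list or nested-record profiler rather than the categorical one.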

completed_t numeric timestamp

Values are 10-digit integers ranging from 1628199203 to 1763195431, consistent with Unix epoch seconds spanning roughly 2021 through 2025, so this is almost certainly a 'completed at' timestamp. The 68% null rate is the dominant signal, suggesting most records were never marked complete. The 16 non-null values are all distinct, and their distribution is near-symmetric (skew ~0.001) with no outliers.

Treatment: Convert from epoch seconds to datetime and treat nulls as 'not yet completed' rather than imputing.

anthropic:claude-opus-4-7 · confidence high
Out[903]:

saturn.columns["completed_t"].stats

stat value
n 50
nulls 34 (68.0%)
unique 16
min 1.628e+09
max 1.763e+09
mean 1.7e+09
median 1.703e+09
std 4.07e+07
q1 1.663e+09
q3 1.74e+09
iqr 7.618e+07
skew 0.001247
kurtosis -1.155
n_outliers 0
outlier_rate 0
zero_rate 0
alert: null_rate (68.0% null)
Fig 224.
Distribution of completed_t. Vertical dash marks the median.
Histogram bins for completed_t (median: 1703093252.0).
bin count
1.628e+09 – 1.655e+09 1
1.655e+09 – 1.682e+09 5
1.682e+09 – 1.709e+09 4
1.709e+09 – 1.736e+09 1
1.736e+09 – 1.763e+09 5
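Converting `completed_t` as recommended, assuming the values are Unix epoch seconds in UTC (the description infers this from their magnitude) and keeping nulls as "not completed" instead of imputing them. `completed_at` is a hypothetical helper:

```python
from datetime import datetime, timezone
from typing import Optional


def completed_at(epoch_seconds: Optional[float]) -> Optional[datetime]:
    """Epoch seconds -> timezone-aware UTC datetime; None stays None (not completed)."""
    if epoch_seconds is None:
        return None
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


# The reported min and max, plus a null.
for raw in (1628199203, 1763195431, None):
    ts = completed_at(raw)
    print(raw, ts.date().isoformat() if ts else "not completed")
```

Keeping None rather than filling a date means a downstream `is_completed = ts is not None` flag stays honest.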

product_name_bg categorical metadata

This is a Bulgarian-language product name field; values such as 'Шоколад 85% какаова маса' ('Chocolate, 85% cocoa mass') are localized chocolate and cocoa product labels. It is 94% null across 50 rows, leaving only 3 non-null entries, all unique. With so little populated data, this column carries almost no analytical signal in its current state.

Treatment: Drop or defer until Bulgarian localization coverage improves; too sparse to use.

anthropic:claude-opus-4-7 · confidence high
Out[906]:

saturn.columns["product_name_bg"].stats

stat value
n 50
nulls 47 (94.0%)
unique 3
top_value Шоколад 85% какаова маса
top_rate 0.3333
cardinality 3
entropy 1.585
entropy_ratio 1
alert: long_tail (3 singleton categories)
alert: null_rate (94.0% null)
Fig 225.
Top values for product_name_bg.
Top values for product_name_bg (3 unique shown, of 3 total).
value count share
Шоколад 85% какаова маса 1 2.0%
Тъмен шоколад 74% какао 1 2.0%
Лешниково-какаов крем 1 2.0%

ingredients_text_et categorical free_text

Free-text ingredient lists ostensibly tagged as Estonian (et), but the three observed values mix Slovenian, German, and Estonian, suggesting mislabeled locale tagging. The field is 94% null with only 3 non-null entries out of 50, so any signal here is anecdotal at best.

Treatment: Drop or defer; too sparse and language-inconsistent to model without a language-detection cleanup pass.

anthropic:claude-opus-4-7 · confidence high
Out[909]:

saturn.columns["ingredients_text_et"].stats

stat value
n 50
nulls 47 (94.0%)
unique 3
top_value kakavova masa, manjmasten kakavov prah, kakavovo maslo, sladkor, emulgator: lecitini (_sojin_ lecitin); ekstrakt vanilije.
top_rate 0.3333
cardinality 3
entropy 1.585
entropy_ratio 1
alert: long_tail (3 singleton categories)
alert: null_rate (94.0% null)
Fig 226.
Top values for ingredients_text_et.
Top values for ingredients_text_et (3 unique shown, of 3 total).
value count share
kakavova masa, manjmasten kakavov prah, kakavovo maslo, sladkor, emulgator: lecitini (_sojin_ lecitin); ekstrakt vanilije. 1 2.0%
Kakaomasse*, Zucker, Kakaobutter, Kakaopulver stark entöit, Emulgator: Sonnenblumenlecithine (E-322), natürliches Vanille-Aroma, * Rainforest Alliance Certified, Kakao: 74% mindestens, 1 2.0%
Kakaomass, suhkur, kakaovoi, vanill. 1 2.0%

origin_sl categorical metadata

This is the Slovenian-localized (`_sl`) origin field, following the per-language pattern of its sibling columns, but it is effectively empty in this sample: 98% of the 50 rows are null, and the single non-null value is a blank string, leaving cardinality at 1 and entropy at 0.

Treatment: Drop; the column carries no usable signal at this null rate.

anthropic:claude-opus-4-7 · confidence high
Out[912]:

saturn.columns["origin_sl"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value ""
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail (1 singleton category)
alert: null_rate (98.0% null)
alert: imbalance (top value is 100.0% of non-null rows)
Fig 227.
Top values for origin_sl.
Top values for origin_sl (1 unique shown, of 1 total).
value count share
"" 1 2.0%

generic_name_dz categorical metadata

This is a localized generic-name field; `dz` is the ISO 639-1 code for Dzongkha, matching the language-suffix pattern of its sibling columns. It is effectively empty: 98% of the 50 rows are null, and the single non-null value is itself an empty string. Cardinality is 1 with zero entropy, so the column carries no information.

Treatment: Drop; no signal (98% null, single empty-string value).

anthropic:claude-opus-4-7 · confidence high
Out[915]:

saturn.columns["generic_name_dz"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value ""
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail (1 singleton category)
alert: null_rate (98.0% null)
alert: imbalance (top value is 100.0% of non-null rows)
Fig 228.
Top values for generic_name_dz.
Top values for generic_name_dz (1 unique shown, of 1 total).
value count share
"" 1 2.0%

ingredients_text_sl categorical free_text

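Slovenian-language ingredients text, almost entirely empty: 98% null, with only 1 non-null value across 50 rows. That single entry is a full ingredient declaration for a cocoa product (cocoa mass, reduced-fat cocoa powder, cocoa butter, sugar, soy lecithin, vanilla extract, plus a warning about traces of nuts and milk), i.e. free text rather than a controlled vocabulary, so the categorical framing is misleading.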

Treatment: Drop for modelling; if needed, treat as free text and merge with other-language ingredient fields.

anthropic:claude-opus-4-7 · confidence high
Out[918]:

saturn.columns["ingredients_text_sl"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value Kakavova masa, manjmasten kakavov prah, kakavovo maslo, sladkor, emulgator: lecitini (sojin lecitin); ekstrakt vanilije. Lahko vsebuje sledi oreškov (lešniki, mandlji, pistacija) in mleka. Uporabno najmanj do: glej odtis na zadnji strani embalaže.
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail (1 singleton category)
alert: null_rate (98.0% null)
alert: imbalance (top value is 100.0% of non-null rows)
Fig 229.
Top values for ingredients_text_sl.
Top values for ingredients_text_sl (1 unique shown, of 1 total).
value count share
Kakavova masa, manjmasten kakavov prah, kakavovo maslo, sladkor, emulgator: lecitini (sojin lecitin); ekstrakt vanilije. Lahko vsebuje sledi oreškov (lešniki, mandlji, pistacija) in mleka. Uporabno najmanj do: glej odtis na zadnji strani embalaže. 1 2.0%
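The treatment above suggests merging the per-language ingredient fields. One way to do that, sketched as an assumption rather than Saturn's or Open Food Facts' own logic: take the first non-blank `ingredients_text_<lc>` value per row, trying a caller-supplied language order first, and keep the suffix as a language tag. As `ingredients_text_et` shows, the suffix is not a reliable language label, so treat the tag as a hint. `coalesce_ingredients` is hypothetical:

```python
import re
from typing import Optional

# ingredients_text_<two-letter code>; the with_allergens variants do not match.
LOCALIZED_INGREDIENTS = re.compile(r"^ingredients_text_([a-z]{2})$")


def coalesce_ingredients(
    row: dict, preferred: tuple[str, ...] = ()
) -> tuple[Optional[str], Optional[str]]:
    """Return (language suffix, text) for the first non-blank localized value.

    Languages in `preferred` are tried first, then the rest alphabetically.
    """
    found = {}
    for key, value in row.items():
        match = LOCALIZED_INGREDIENTS.match(key)
        if match and isinstance(value, str) and value.strip():
            found[match.group(1)] = value
    for lc in (*preferred, *sorted(found)):
        if lc in found:
            return lc, found[lc]
    return None, None


# Values shortened from the ingredients_text_sl example above.
row = {
    "ingredients_text_sl": "Kakavova masa, sladkor",
    "ingredients_text_ca": "",
    "ingredients_text_with_allergens_sl": "Kakavova masa, sladkor",
}
print(coalesce_ingredients(row))  # ('sl', 'Kakavova masa, sladkor')
```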

generic_name_ca categorical metadata

This appears to be a Catalan-language generic product name field, but it is effectively empty: 96% of rows are null and the only non-null value observed is the empty string (2 occurrences). Cardinality is 1 with zero entropy, so the column carries no information in this sample.

Treatment: Drop; the column is 96% null with a single empty-string value.

anthropic:claude-opus-4-7 · confidence high
Out[921]:

saturn.columns["generic_name_ca"].stats

stat value
n 50
nulls 48 (96.0%)
unique 1
top_value ""
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: null_rate (96.0% null)
alert: imbalance (top value is 100.0% of non-null rows)
Fig 230.
Top values for generic_name_ca.
Top values for generic_name_ca (1 unique shown, of 1 total).
value count share
"" 2 4.0%

ingredients_text_dz categorical free_text

This appears to be a Dzongkha-language ingredients text field, likely a localized variant of a multilingual product description column. Out of 50 rows, 98% are null and the single non-null value is an empty string, leaving zero usable content.

Treatment: Drop; effectively empty in this sample.

anthropic:claude-opus-4-7 · confidence high
Out[924]:

saturn.columns["ingredients_text_dz"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value ""
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail (1 singleton category)
alert: null_rate (98.0% null)
alert: imbalance (top value is 100.0% of non-null rows)
Fig 231.
Top values for ingredients_text_dz.
Top values for ingredients_text_dz (1 unique shown, of 1 total).
value count share
"" 1 2.0%

product_name_ca categorical metadata

This appears to be a Catalan-language product name field, but it is effectively empty: 96% of the 50 rows are null and the only 2 non-null values are blank strings, giving a single observed category with entropy 0. There is no usable signal here.

Treatment: Drop the column; it is 96% null and its only value is an empty string.

anthropic:claude-opus-4-7 · confidence high
Out[927]:

saturn.columns["product_name_ca"].stats

stat value
n 50
nulls 48 (96.0%)
unique 1
top_value ""
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: null_rate (96.0% null)
alert: imbalance (top value is 100.0% of non-null rows)
Fig 232.
Top values for product_name_ca.
Top values for product_name_ca (1 unique shown, of 1 total).
value count share
"" 2 4.0%

origin_ca categorical feature

This is the Catalan-localized (`_ca`) origin field, matching the sibling `product_name_ca`, `generic_name_ca`, and `ingredients_text_ca` columns, but it is effectively empty: 96% of the 50 rows are null, and the only 2 non-null values are both blank strings. With cardinality 1 and entropy 0, the column carries no information.

Treatment: Drop; the column is 96% null and the remaining values are empty strings.

anthropic:claude-opus-4-7 · confidence high
Out[930]:

saturn.columns["origin_ca"].stats

stat value
n 50
nulls 48 (96.0%)
unique 1
top_value ""
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: null_rate (96.0% null)
alert: imbalance (top value is 100.0% of non-null rows)
Fig 233.
Top values for origin_ca.
Top values for origin_ca (1 unique shown, of 1 total).
value count share
"" 2 4.0%

product_name_et categorical metadata

Estonian-localized product name field, but 94% of the 50 rows are null and only 3 distinct values appear among the remainder — including one empty string and two names that are actually French and English, not Estonian. With each surviving value occurring exactly once (entropy_ratio 1.0), this column carries almost no usable signal and shows a language-tagging mismatch.

Treatment: Drop from modelling; revisit the upstream localization pipeline, since none of the non-empty values are in Estonian.

anthropic:claude-opus-4-7 · confidence high
Out[933]:

saturn.columns["product_name_et"].stats

stat value
n 50
nulls 47 (94.0%)
unique 3
top_value Chocolat noir - 85% cacao
top_rate 0.3333
cardinality 3
entropy 1.585
entropy_ratio 1
alert: long_tail (3 singleton categories)
alert: null_rate (94.0% null)
Fig 234.
Top values for product_name_et.
Top values for product_name_et (3 unique shown, of 3 total).
value count share
Chocolat noir - 85% cacao 1 2.0%
"" 1 2.0%
Excellence 70% Cocoa Intense Dark 1 2.0%

ingredients_text_with_allergens_bg categorical free_text

Bulgarian-language ingredient lists with inline HTML allergen markup, localized for the bg market. Coverage is extremely thin: 94% null and only 3 distinct values across 50 rows, one of which is an empty string. The two real entries are confectionery ingredient declarations mentioning soy, hazelnut, and milk allergens.

Treatment: Strip the HTML tags and treat as free text; too sparse (94% null) to use as a feature without aggregation across locales.

anthropic:claude-opus-4-7 · confidence high
Out[936]:

saturn.columns["ingredients_text_with_allergens_bg"].stats

stat value
n 50
nulls 47 (94.0%)
unique 3
top_value Какаова маса, нискомаслено какао на прах, какаово масло, захар, емулгатор: лецитин (соеви), екстракт от ванилия, Може да съдържа следи от ядки и мляко,
top_rate 0.3333
cardinality 3
entropy 1.585
entropy_ratio 1
alert: long_tail (3 singleton categories)
alert: null_rate (94.0% null)
Fig 235.
Top values for ingredients_text_with_allergens_bg.
Top values for ingredients_text_with_allergens_bg (3 unique shown, of 3 total).
value count share
Какаова маса, нискомаслено какао на прах, какаово масло, захар, емулгатор: лецитин (соеви), екстракт от ванилия, Може да съдържа следи от ядки и мляко, 1 2.0%
"" 1 2.0%
Захар, палмово масло, ЛЕШНИЦИ (13%), обезмаслено МЛЯКО на прах (8,7%), нискомаслено какао на прах (7,4%), емулгатор: лецитини (СОЯ), ванилин. 1 2.0%
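Stripping the markup as the treatment suggests, using the standard-library `html.parser` rather than a regex. The `<span class="allergen">` wrapper in the example is an assumption about how the allergen tags look; the rendered values in this report do not show the raw tags. `strip_html` is a hypothetical helper:

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects character data and ignores every tag."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so entities are decoded
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def strip_html(text: str) -> str:
    """Return the text content of `text` with all tags removed."""
    parser = _TextOnly()
    parser.feed(text)
    parser.close()  # flush any trailing text still buffered
    return "".join(parser.parts)


sample = 'Захар, палмово масло, <span class="allergen">ЛЕШНИЦИ</span> (13%)'
print(strip_html(sample))  # Захар, палмово масло, ЛЕШНИЦИ (13%)
```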

ingredients_text_with_allergens_et categorical free_text

Estonian-localized ingredient text with allergen markup, but only 3 of 50 rows carry a value (null_rate 0.94), all three are unique, and only one of them is actually in Estonian; the other two are Slovenian and German. The field is essentially empty, and the few populated rows show a language mix rather than the expected `et` locale, suggesting an upstream localization fallback or mislabelling.

Treatment: Drop unless you specifically need allergen extraction; 94% nulls and inconsistent language make it unusable as-is.

anthropic:claude-opus-4-7 · confidence high
Out[939]:

saturn.columns["ingredients_text_with_allergens_et"].stats

stat value
n 50
nulls 47 (94.0%)
unique 3
top_value kakavova masa, manjmasten kakavov prah, kakavovo maslo, sladkor, emulgator: lecitini (sojin lecitin); ekstrakt vanilije.
top_rate 0.3333
cardinality 3
entropy 1.585
entropy_ratio 1
alert: long_tail (3 singleton categories)
alert: null_rate (94.0% null)
Fig 236.
Top values for ingredients_text_with_allergens_et.
Top values for ingredients_text_with_allergens_et (3 unique shown, of 3 total).
value count share
kakavova masa, manjmasten kakavov prah, kakavovo maslo, sladkor, emulgator: lecitini (sojin lecitin); ekstrakt vanilije. 1 2.0%
Kakaomasse*, Zucker, Kakaobutter, Kakaopulver stark entöit, Emulgator: Sonnenblumenlecithine (E-322), natürliches Vanille-Aroma, * Rainforest Alliance Certified, Kakao: 74% mindestens, 1 2.0%
Kakaomass, suhkur, kakaovoi, vanill. 1 2.0%

origin_sk categorical foreign_key

Despite the foreign_key label, `origin_sk` follows the per-language pattern of its sibling origin columns (`_sk` is the Slovak language suffix). It carries almost no information in this slice: 98% of the 50 rows are null and the single non-null value is an empty string. Cardinality is 1 and entropy is 0, so the column is effectively constant where populated.

Treatment: Drop from modelling; there is no usable signal in this slice.

anthropic:claude-opus-4-7 · confidence high
Out[942]:

saturn.columns["origin_sk"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value ""
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail (1 singleton category)
alert: null_rate (98.0% null)
alert: imbalance (top value is 100.0% of non-null rows)
Fig 237.
Top values for origin_sk.
Top values for origin_sk (1 unique shown, of 1 total).
value count share
"" 1 2.0%

origin_bg categorical foreign_key

Despite the foreign_key label, this is the Bulgarian-localized (`_bg`) origin field rather than a geographic identifier. It is effectively unusable here: 94% of rows are null, and the only distinct value among the 3 non-null rows is the empty string, giving entropy 0.0.

Treatment: Drop; no usable signal at this null rate and cardinality.

anthropic:claude-opus-4-7 · confidence high
Out[945]:

saturn.columns["origin_bg"].stats

stat value
n 50
nulls 47 (94.0%)
unique 1
top_value ""
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: null_rate (94.0% null)
alert: imbalance (top value is 100.0% of non-null rows)
Fig 238.
Top values for origin_bg.
Top values for origin_bg (1 unique shown, of 1 total).
value count share
"" 3 6.0%

packaging_text_sl categorical free_text

This appears to be a Slovenian-language packaging text field, but it is effectively empty: 98% of the 50 rows are null and the single non-null value is an empty string, giving cardinality 1 and entropy 0. There is no usable signal here.

Treatment: Drop; the column is 98% null with only an empty-string value otherwise.

anthropic:claude-opus-4-7 · confidence high
Out[948]:

saturn.columns["packaging_text_sl"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value ""
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail (1 singleton category)
alert: null_rate (98.0% null)
alert: imbalance (top value is 100.0% of non-null rows)
Fig 239.
Top values for packaging_text_sl.
Top values for packaging_text_sl (1 unique shown, of 1 total).
value count share
"" 1 2.0%

generic_name_sk categorical foreign_key

Despite the foreign_key label, this is the Slovak-localized (`_sk`) generic product name field of a food-product dataset, not a key into a drug-name dimension. It is effectively empty in this sample: 98% of rows are null and the single non-null value is the empty string, giving cardinality 1 and zero entropy.

Treatment: Drop or defer until a non-empty sample is available; carries no signal here.

anthropic:claude-opus-4-7 · confidence high
Out[951]:

saturn.columns["generic_name_sk"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value ""
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail (1 singleton category)
alert: null_rate (98.0% null)
alert: imbalance (top value is 100.0% of non-null rows)
Fig 240.
Top values for generic_name_sk.
Top values for generic_name_sk (1 unique shown, of 1 total).
value count share
"" 1 2.0%

ingredients_text_with_allergens_sl categorical free_text

This appears to be a Slovenian-language ingredients list with embedded HTML allergen markup, a localized product-label field. The column is almost entirely empty (null_rate 0.98): only 1 of 50 rows is non-null, and that value is the only distinct entry (cardinality 1, entropy 0.0). With essentially no signal and HTML mixed into the text, it has no analytical value as-is.

Treatment: Drop; at 98% null with a single observed value it is unusable. If more rows arrive, strip the HTML and reserve it for text extraction.

anthropic:claude-opus-4-7 · confidence high
Out[954]:

saturn.columns["ingredients_text_with_allergens_sl"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value Kakavova masa, manjmasten kakavov prah, kakavovo maslo, sladkor, emulgator: lecitini (sojin lecitin); ekstrakt vanilije. Lahko vsebuje sledi oreškov (lešniki, mandlji, pistacija) in mleka. Uporabno najmanj do: glej odtis na zadnji strani embalaže.
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail (1 singleton category)
alert: null_rate (98.0% null)
alert: imbalance (top value is 100.0% of non-null rows)
Fig 241.
Top values for ingredients_text_with_allergens_sl.
Top values for ingredients_text_with_allergens_sl (1 unique shown, of 1 total).
value count share
Kakavova masa, manjmasten kakavov prah, kakavovo maslo, sladkor, emulgator: lecitini (sojin lecitin); ekstrakt vanilije. Lahko vsebuje sledi oreškov (lešniki, mandlji, pistacija) in mleka. Uporabno najmanj do: glej odtis na zadnji strani embalaže. 1 2.0%

ingredients_text_ca categorical free_text

Catalan-language ingredients text field, almost entirely absent from this sample. 96% of the 50 rows are null and the only 2 non-null values are empty strings, giving a single distinct value and zero entropy.

Treatment: Drop; the column carries no usable signal in this sample.

anthropic:claude-opus-4-7 · confidence high
Out[957]:

saturn.columns["ingredients_text_ca"].stats

stat value
n 50
nulls 48 (96.0%)
unique 1
top_value ""
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: null_rate (96.0% null)
alert: imbalance (top value is 100.0% of non-null rows)
Fig 242.
Top values for ingredients_text_ca.
Top values for ingredients_text_ca (1 unique shown, of 1 total).
value count share
"" 2 4.0%

generic_name_sl categorical metadata

This column appears to be a Slovenian-language generic name field that is effectively empty in this sample. With a 98% null rate and the only non-null value being an empty string, there is zero usable signal (entropy 0.0, cardinality 1).

Treatment: Drop; no usable values present.

anthropic:claude-opus-4-7 · confidence high
Out[960]:

saturn.columns["generic_name_sl"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value ""
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail (1 singleton category)
alert: null_rate (98.0% null)
alert: imbalance (top value is 100.0% of non-null rows)
Fig 243.
Top values for generic_name_sl.
Top values for generic_name_sl (1 unique shown, of 1 total).
value count share
"" 1 2.0%

product_name_dz categorical metadata

This appears to be a localized product name field (Dzongkha or similar locale suffix), but it is effectively empty: 98% of the 50 rows are null and the single non-null value is itself an empty string. Cardinality is 1 with zero entropy, so the column carries no usable signal in this sample.

Treatment: Drop; the column is 98% null with a single empty-string value.

anthropic:claude-opus-4-7 · confidence high
Out[963]:

saturn.columns["product_name_dz"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value ""
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail (1 singleton category)
alert: null_rate (98.0% null)
alert: imbalance (top value is 100.0% of non-null rows)
Fig 244.
Top values for product_name_dz.
Top values for product_name_dz (1 unique shown, of 1 total).
value count share
"" 1 2.0%

origin_et categorical metadata

`origin_et` is the Estonian-localized (`_et`) origin field, and it carries almost no information here: 94% of the 50 rows are null, and the empty string accounts for all 3 populated rows. Cardinality is 1 and entropy is 0, so the column is effectively constant where present.

Treatment: Drop; constant empty value with 94% nulls offers no signal.

anthropic:claude-opus-4-7 · confidence high
Out[966]:

saturn.columns["origin_et"].stats

stat value
n 50
nulls 47 (94.0%)
unique 1
top_value ""
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: null_rate (94.0% null)
alert: imbalance (top value is 100.0% of non-null rows)
Fig 245.
Top values for origin_et.
Top values for origin_et (1 unique shown, of 1 total).
value count share
"" 3 6.0%

ingredients_text_with_allergens_sk categorical identifier

Despite the identifier label, this is the Slovak-localized (`_sk`) ingredients text with allergen markup, not a surrogate key. It is effectively empty: 98% of 50 rows are null and the only observed value is the empty string. Cardinality is 1 with zero entropy, so there is no usable signal here.

Treatment: Drop; the column is 98% null with a single empty value.

anthropic:claude-opus-4-7 · confidence high
Out[969]:

saturn.columns["ingredients_text_with_allergens_sk"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value ""
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail (1 singleton category)
alert: null_rate (98.0% null)
alert: imbalance (top value is 100.0% of non-null rows)
Fig 246.
Top values for ingredients_text_with_allergens_sk.
Top values for ingredients_text_with_allergens_sk (1 unique shown, of 1 total).
value count share
"" 1 2.0%

product_name_sk categorical free_text

Almost certainly a Slovak product name field that is effectively empty in this slice — 98% of the 50 rows are null and the single non-null value is itself an empty string, leaving cardinality at 1 and entropy at 0. There is no usable signal here whatsoever.

Treatment: Drop; the column is 98% null with only an empty string observed.

anthropic:claude-opus-4-7 · confidence high
Out[972]:

saturn.columns["product_name_sk"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value ""
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail (1 singleton category)
alert: null_rate (98.0% null)
alert: imbalance (top value is 100.0% of non-null rows)
Fig 247.
Top values for product_name_sk.
Top values for product_name_sk (1 unique shown, of 1 total).
value count share
"" 1 2.0%

ingredients_text_with_allergens_pt categorical free_text

Portuguese-language ingredient lists with embedded HTML allergen tags, as exported in this Open Food Facts sample. The column is sparsely populated (null rate 0.84), and 5 of the 8 non-null rows are empty strings, leaving only 3 genuine ingredient declarations. Each non-empty value is unique and contains raw HTML markup rather than cleaned text.

Treatment: Strip HTML tags to extract allergen tokens, then treat as sparse free text; too null-heavy for direct modelling.

anthropic:claude-opus-4-7 · confidence high
Out[975]:

saturn.columns["ingredients_text_with_allergens_pt"].stats

stat value
n 50
nulls 42 (84.0%)
unique 4
top_value ""
top_rate 0.625
cardinality 4
entropy 1.549
entropy_ratio 0.7744
alert: long_tail (3 singleton categories)
alert: null_rate (84.0% null)
Fig 248.
Top values for ingredients_text_with_allergens_pt.
Top values for ingredients_text_with_allergens_pt (4 unique shown, of 4 total).
value count share
"" 5 10.0%
Creme para barrar de AVELAS e cacau 40% (açúcar, gordura de palma, AVELAS (13%), LEITE desnatado em pó (8,7%), cacau magro (7,4%), emulsionantes: lecitinas (SOJA), vanilina), farinha de TRIGO (32,5%), gorduras vegetais (palma, palmiste), açúcar de cana (contém TRIGO) (8,5%), LACTOSE, farelo de TRIGO, LEITE inteiro em pó, mel, levedantes químicos (difosfato dissódico, hidrogenocarbonato de sódio, hidrogenocarbonato de amónio), farinha de CEVADA maltada, cacau magro, sal, extrato em pó de malte de CEVADA e milho, amido de TRIGO, emulsionantes: lecitinas (SOJA), vanilina. 1 2.0%
Farinha de TRIGO, gordura de palma, xarope de glucose, extrato de CEVADA malteada, levedantes (carbonatos de amónio, carbonatos de sódio), sal, OVOS, aroma, agente de tratamento da farinha (METABISSULFITO de sódio). 1 2.0%
Pasta de cacau, açúcar, manteiga de cacau, baunilha. 1 2.0%
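To pull out the allergen tokens rather than discard the tags, collect the text inside each allergen element. This sketch assumes the markup has the form `<span class="allergen">TRIGO</span>`; that class name is a guess, since the raw tags are not visible in this report, so check a raw value first. `AllergenSpans` and `allergen_tokens` are hypothetical helpers built on the standard-library `html.parser`:

```python
from html.parser import HTMLParser


class AllergenSpans(HTMLParser):
    """Collects the text of <span> elements carrying a given CSS class."""

    def __init__(self, css_class: str = "allergen") -> None:  # class name assumed
        super().__init__()
        self.css_class = css_class
        self.depth = 0                 # > 0 while inside a matching span
        self.buffer: list[str] = []
        self.tokens: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        if self.depth:                 # a span nested inside an allergen span
            self.depth += 1
        elif self.css_class in (dict(attrs).get("class") or "").split():
            self.depth = 1

    def handle_endtag(self, tag):
        if tag == "span" and self.depth:
            self.depth -= 1
            if self.depth == 0:        # closed the outermost allergen span
                self.tokens.append("".join(self.buffer).strip())
                self.buffer = []

    def handle_data(self, data):
        if self.depth:
            self.buffer.append(data)


def allergen_tokens(text: str) -> list[str]:
    parser = AllergenSpans()
    parser.feed(text)
    parser.close()
    return parser.tokens


html_text = ('Farinha de <span class="allergen">TRIGO</span>, sal, '
             '<span class="allergen">OVOS</span>, aroma.')
print(allergen_tokens(html_text))  # ['TRIGO', 'OVOS']
```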

ingredients_text_with_allergens_ca categorical free_text

Catalan-localized ingredients text with allergen markup, but it is effectively empty in this sample: 98% null, and the only non-null value observed is itself an empty string. With cardinality 1 and entropy 0, this column carries no information here.

Treatment: Drop; no usable signal in this slice.

anthropic:claude-opus-4-7 · confidence high
Out[978]:

saturn.columns["ingredients_text_with_allergens_ca"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value ""
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail (1 singleton category)
alert: null_rate (98.0% null)
alert: imbalance (top value is 100.0% of non-null rows)
Fig 249.
Top values for ingredients_text_with_allergens_ca.
Top values for ingredients_text_with_allergens_ca (1 unique shown, of 1 total).
value count share
"" 1 2.0%

generic_name_pt categorical free_text

Portuguese-language generic product name, present in only 10 of the 50 rows (null_rate 0.8), and 8 of those 10 are empty strings (top_rate 0.8). The remaining 2 rows hold distinct descriptive labels such as 'Chocolate extrafino com 70% de cacau' ('extra-fine chocolate with 70% cocoa'), for 3 unique values including the blank. Coverage is too thin and cardinality too low to support modelling on its own.

Treatment: Drop or retain only as a fallback display label; coverage is too sparse to feature-engineer.

anthropic:claude-opus-4-7 · confidence high
Out[981]:

saturn.columns["generic_name_pt"].stats

stat value
n 50
nulls 40 (80.0%)
unique 3
top_value ""
top_rate 0.8
cardinality 3
entropy 0.9219
entropy_ratio 0.5817
alert: long_tail (2 singleton categories)
alert: null_rate (80.0% null)
Fig 250.
Top values for generic_name_pt.
Top values for generic_name_pt (3 unique shown, of 3 total).
value count share
"" 8 16.0%
Bolachas recheadas de creme para barrar de avelãs e cacau NUTELLA® 1 2.0%
Chocolate extrafino com 70% de cacau 1 2.0%
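The entropy figures in these tables are consistent with Shannon entropy in bits over the non-null values, and `entropy_ratio` with that entropy divided by log2(cardinality), reported as 0 when cardinality is 1. This is inferred from the numbers rather than documented by Saturn: for `generic_name_pt`, counts 8, 1, 1 reproduce entropy 0.9219 and ratio 0.5817. The function names below are hypothetical:

```python
import math


def entropy_bits(counts: list[int]) -> float:
    """Shannon entropy, in bits, of a categorical distribution given as counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)


def entropy_ratio(counts: list[int]) -> float:
    """Entropy normalized by its maximum, log2(k), for k observed categories."""
    k = sum(1 for c in counts if c > 0)
    return entropy_bits(counts) / math.log2(k) if k > 1 else 0.0


# generic_name_pt: the empty string 8 times plus two singleton labels
print(round(entropy_bits([8, 1, 1]), 4), round(entropy_ratio([8, 1, 1]), 4))  # 0.9219 0.5817
```

The same functions give 1.549 and 0.7744 for `ingredients_text_with_allergens_pt` (counts 5, 1, 1, 1), matching its table.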

packaging_text_pt categorical free_text

This appears to be a Portuguese packaging-text field, likely free-form descriptions of product packaging. It is effectively empty: 80% of the 50 rows are null, and the remaining 10 rows all hold the empty string, leaving cardinality at 1 and entropy at 0. There is no usable signal here.

Treatment: Drop the column; it carries no information.

anthropic:claude-opus-4-7 · confidence high
Out[984]:

saturn.columns["packaging_text_pt"].stats

stat value
n 50
nulls 40 (80.0%)
unique 1
top_value ""
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: null_rate (80.0% null)
alert: imbalance (top value is 100.0% of non-null rows)
Fig 251.
Top values for packaging_text_pt.
Top values for packaging_text_pt (1 unique shown, of 1 total).
value count share
"" 10 20.0%

ingredients_text_pt categorical free_text

Portuguese-language ingredient lists for food products, stored as free text. The column is mostly empty: 80% of rows are null, and 7 of the 10 non-null entries are the empty string (top_rate 0.7). That leaves 4 distinct values, only 3 of which are actual ingredient lists. Those few populated entries are long, comma-separated ingredient declarations with allergens marked in capitals or wrapped in underscores.

Treatment: Treat as free text: convert the empty strings to null, then tokenize and parse the ingredient lists before modelling.

anthropic:claude-opus-4-7 · confidence high
Out[987]:

saturn.columns["ingredients_text_pt"].stats

stat value
n 50
nulls 40 (80.0%)
unique 4
top_value ''
top_rate 0.7
cardinality 4
entropy 1.357
entropy_ratio 0.6784
alert: long_tail (3 singleton categories)
alert: null_rate (80.0% null)
Fig 252. Top values for ingredients_text_pt.
Top values for ingredients_text_pt (4 unique shown, of 4 total).
value · count · share
'' · 7 · 14.0%
Creme para barrar de AVELAS e cacau 40% (açúcar, gordura de palma, AVELAS (13%), LEITE desnatado em pó (8,7%), cacau magro (7,4%), emulsionantes: lecitinas (SOJA), vanilina), farinha de TRIGO (32,5%), gorduras vegetais (palma, palmiste), açúcar de cana (contém TRIGO) (8,5%), LACTOSE, farelo de TRIGO, LEITE inteiro em pó, mel, levedantes químicos (difosfato dissódico, hidrogenocarbonato de sódio, hidrogenocarbonato de amónio), farinha de CEVADA maltada, cacau magro, sal, extrato em pó de malte de CEVADA e milho, amido de TRIGO, emulsionantes: lecitinas (SOJA), vanilina. · 1 · 2.0%
Farinha de _TRIGO_, gordura de palma, xarope de glucose, extrato de _CEVADA_ malteada, levedantes (carbonatos de amónio, carbonatos de sódio), sal, _OVOS_, aroma, agente de tratamento da farinha (_METABISSULFITO_ de sódio). · 1 · 2.0%
Pasta de cacau, açúcar, manteiga de cacau, baunilha. · 1 · 2.0%
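
The allergen markers in these lists (capitals such as LEITE, or underscores such as _TRIGO_) can be pulled out with a small heuristic. This sketch assumes those are the only two marking conventions and that no other words of three or more letters are fully upper-case, so it will misfire on upper-case brand names. `marked_allergens` is hypothetical.

```python
import re

# Underscore-delimited spans, e.g. "_TRIGO_" or "_ovos_".
UNDERSCORED = re.compile(r"_([^_]+)_")
# Runs of letters only (no digits or underscores); Python's str patterns are Unicode-aware.
WORD = re.compile(r"[^\W\d_]+")


def marked_allergens(text: str, min_len: int = 3) -> list:
    """Return allergen tokens that are wrapped in underscores or written in all caps."""
    found = [m.group(1).strip().upper() for m in UNDERSCORED.finditer(text)]
    found += [w for w in WORD.findall(text) if len(w) >= min_len and w.isupper()]
    # De-duplicate while keeping first-seen order.
    return list(dict.fromkeys(found))


sample = (
    "Farinha de _TRIGO_, gordura de palma, xarope de glucose, extrato de _CEVADA_ "
    "malteada, levedantes (carbonatos de amónio, carbonatos de sódio), sal, _OVOS_, "
    "aroma, agente de tratamento da farinha (_METABISSULFITO_ de sódio)."
)
print(marked_allergens(sample))
```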

origin_pt categorical metadata

This appears to be the Portuguese-language origin field (free-text provenance of the product or its ingredients), but it carries no usable signal in this sample. 80% of rows are null and the remaining 10 rows all hold the same empty string, giving a single unique value and an entropy of 0.

Treatment: Drop; the column is 80% null and the rest is a constant empty string.

anthropic:claude-opus-4-7 · confidence high
Out[990]:

saturn.columns["origin_pt"].stats

stat value
n 50
nulls 40 (80.0%)
unique 1
top_value ''
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: null_rate (80.0% null)
alert: imbalance (top value is 100.0% of rows)
Fig 253. Top values for origin_pt.
Top values for origin_pt (1 unique shown, of 1 total).
value · count · share
'' · 10 · 20.0%

nutrition_score_warning_nutriments_estimated numeric feature

This appears to be a warning flag indicating that the nutrition score was computed from estimated nutriment values, likely a 0/1 boolean. Of the 50 rows, 96% are null and the remaining 2 rows (4%) both carry the value 1.0, making it effectively constant where present. With no variation and almost no coverage, it carries no usable signal.

Treatment: Drop; constant-when-present and 96% null.

anthropic:claude-opus-4-7 · confidence high
Out[993]:

saturn.columns["nutrition_score_warning_nutriments_estimated"].stats

stat value
n 50
nulls 48 (96.0%)
unique 1
min 1
max 1
mean 1
median 1
std 0
q1 1
q3 1
iqr 0
skew 0
kurtosis 0
n_outliers 0
outlier_rate 0
zero_rate 0
alert: null_rate (96.0% null)
alert: constant (only one distinct value)
Fig 254. Distribution of nutrition_score_warning_nutriments_estimated. Vertical dash marks the median.
Histogram bins for nutrition_score_warning_nutriments_estimated (median: 1.0).
bin · count
0.5 – 0.7 · 0
0.7 – 0.9 · 0
0.9 – 1.1 · 2
1.1 – 1.3 · 0
1.3 – 1.5 · 0
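
Columns like this one, constant where present and mostly null, can be flagged mechanically instead of one at a time. This is a sketch under stated assumptions: blank strings count as missing, values are hashable scalars, and the 0.9 missing threshold is an arbitrary illustration rather than a saturn setting. `screen_column` is hypothetical.

```python
from typing import Iterable, List


def screen_column(values: Iterable, max_missing: float = 0.9) -> List[str]:
    """Return reasons to drop a column; an empty list means 'keep for now'."""
    values = list(values)
    # Assumption: None and blank strings are both treated as missing.
    present = [
        v for v in values
        if v is not None and not (isinstance(v, str) and not v.strip())
    ]
    if not present:
        return ["empty"]
    reasons = []
    missing_rate = 1 - len(present) / len(values)
    if missing_rate > max_missing:
        reasons.append(f"missing {missing_rate:.0%}")
    if len(set(present)) == 1:
        reasons.append("constant when present")
    return reasons


# nutrition_score_warning_nutriments_estimated: 48 nulls and two 1.0 values.
print(screen_column([None] * 48 + [1.0] * 2))
```

The check is deliberately simple: it only flags the column, and deciding whether a constant flag is still worth keeping as an indicator stays a manual call.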

packaging_text_bg categorical free_text

Likely Bulgarian-language packaging text from a product database. The column is effectively empty: 94% null and the only non-null value across 50 rows is the empty string itself (3 occurrences), giving cardinality 1 and zero entropy.

Treatment: Drop; no usable signal at this sample size.

anthropic:claude-opus-4-7 · confidence high
Out[996]:

saturn.columns["packaging_text_bg"].stats

stat value
n 50
nulls 47 (94.0%)
unique 1
top_value ''
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: null_rate (94.0% null)
alert: imbalance (top value is 100.0% of rows)
Fig 255. Top values for packaging_text_bg.
Top values for packaging_text_bg (1 unique shown, of 1 total).
value · count · share
'' · 3 · 6.0%

generic_name_et categorical metadata

This appears to be an Estonian-language generic name field, but it carries no usable signal in this sample: 94% of rows are null and the only non-null value observed is the empty string (3 occurrences), giving cardinality 1 and entropy 0. Effectively every record is missing or blank.

Treatment: Drop; column is 94% null with a single empty-string value otherwise.

anthropic:claude-opus-4-7 · confidence high
Out[999]:

saturn.columns["generic_name_et"].stats

stat value
n 50
nulls 47 (94.0%)
unique 1
top_value ''
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: null_rate (94.0% null)
alert: imbalance (top value is 100.0% of rows)
Fig 256. Top values for generic_name_et.
Top values for generic_name_et (1 unique shown, of 1 total).
value · count · share
'' · 3 · 6.0%

packaging_text_ca categorical metadata

Likely a Catalan-language packaging text field: `ca` is the ISO 639-1 code for Catalan, consistent with the other per-language columns such as `_pt` and `_bg`. It is effectively empty: 96% null, and the only 2 non-null values are both blank strings, giving a single distinct value and zero entropy.

Treatment: Drop; the column carries no signal at this sample size.

anthropic:claude-opus-4-7 · confidence high
Out[1002]:

saturn.columns["packaging_text_ca"].stats

stat value
n 50
nulls 48 (96.0%)
unique 1
top_value ''
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: null_rate (96.0% null)
alert: imbalance (top value is 100.0% of rows)
Fig 257. Top values for packaging_text_ca.
Top values for packaging_text_ca (1 unique shown, of 1 total).
value · count · share
'' · 2 · 4.0%
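
Many of these localized columns share a base name and differ only in a two-letter suffix (`_pt`, `_bg`, `_et`, `_ca`, `_sk`, ...), which reads as an ISO 639-1 language code. The sketch below groups columns by base field. It assumes that naming pattern, plus an optional `_imported` tail. The regex and `group_localized` are illustrative. The explicit language set guards against two-letter suffixes that are not languages, such as `countries_lc`.

```python
import re
from collections import defaultdict

# Assumed pattern: <base>_<two-letter language>[_imported]
LOCALIZED = re.compile(r"^(?P<base>.+)_(?P<lang>[a-z]{2})(?P<imported>_imported)?$")


def group_localized(columns, languages):
    """Map each base field to the sorted language codes it appears in."""
    groups = defaultdict(set)
    for col in columns:
        m = LOCALIZED.match(col)
        if m and m.group("lang") in languages:
            groups[m.group("base")].add(m.group("lang"))
    return {base: sorted(langs) for base, langs in groups.items()}


cols = [
    "packaging_text_pt", "packaging_text_bg", "packaging_text_ca",
    "packaging_text_et", "packaging_text_sk", "origin_pt",
    "product_name_fr_imported", "countries_lc", "emb_code",
]
print(group_localized(cols, {"pt", "bg", "ca", "et", "sk", "fr"}))
```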

product_name_sl categorical metadata

Localized Slovenian product name field that is effectively empty: 98% of 50 rows are null and the single populated row reads "ARRIBA 85% cacao". With cardinality 1 and entropy 0, this column carries no usable signal in the current sample.

Treatment: Drop from modelling; revisit only if a fuller localized catalogue becomes available.

anthropic:claude-opus-4-7 · confidence high
Out[1005]:

saturn.columns["product_name_sl"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value ARRIBA 85% cacao
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail (1 singleton category)
alert: null_rate (98.0% null)
alert: imbalance (top value is 100.0% of rows)
Fig 258. Top values for product_name_sl.
Top values for product_name_sl (1 unique shown, of 1 total).
value · count · share
ARRIBA 85% cacao · 1 · 2.0%

generic_name_bg categorical metadata

This appears to be a Bulgarian-language generic product name field, but it is effectively empty: 94% of the 50 rows are null and the only non-null value observed is the empty string, repeated 3 times. Cardinality is 1 with zero entropy, so the column carries no information.

Treatment: Drop; the column is 94% null with a single empty-string value.

anthropic:claude-opus-4-7 · confidence high
Out[1008]:

saturn.columns["generic_name_bg"].stats

stat value
n 50
nulls 47 (94.0%)
unique 1
top_value ''
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: null_rate (94.0% null)
alert: imbalance (top value is 100.0% of rows)
Fig 259. Top values for generic_name_bg.
Top values for generic_name_bg (1 unique shown, of 1 total).
value · count · share
'' · 3 · 6.0%

ingredients_text_sk categorical free_text

This appears to be a Slovak-language ingredients text field (suffix _sk), but it is effectively empty in this sample: 98% of 50 rows are null and the single non-null value is an empty string, yielding cardinality 1 and entropy 0.

Treatment: Drop; no usable signal at this sample size.

anthropic:claude-opus-4-7 · confidence high
Out[1011]:

saturn.columns["ingredients_text_sk"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value ''
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail (1 singleton category)
alert: null_rate (98.0% null)
alert: imbalance (top value is 100.0% of rows)
Fig 260. Top values for ingredients_text_sk.
Top values for ingredients_text_sk (1 unique shown, of 1 total).
value · count · share
'' · 1 · 2.0%

ingredients_text_bg categorical free_text

Bulgarian-language ingredient lists for food products, stored as free text in Cyrillic. The column is almost entirely empty (null_rate 0.94). Only 3 of the 50 rows are non-null, each value is distinct, and one of the three is the empty string, so just 2 rows hold an actual ingredient list. Despite the categorical kind label, the content is long-form ingredient prose, not a discrete category.

Treatment: Drop unless doing multilingual text analysis; 94% null leaves too little signal.

anthropic:claude-opus-4-7 · confidence high
Out[1014]:

saturn.columns["ingredients_text_bg"].stats

stat value
n 50
nulls 47 (94.0%)
unique 3
top_value Какаова маса, нискомаслено какао на прах, какаово масло, захар, емулгатор: лецитин (соеви), екстракт от ванилия, Може да съдържа следи от ядки и мляко,
top_rate 0.3333
cardinality 3
entropy 1.585
entropy_ratio 1
alert: long_tail (3 singleton categories)
alert: null_rate (94.0% null)
Fig 261. Top values for ingredients_text_bg.
Top values for ingredients_text_bg (3 unique shown, of 3 total).
value · count · share
Какаова маса, нискомаслено какао на прах, какаово масло, захар, емулгатор: лецитин (соеви), екстракт от ванилия, Може да съдържа следи от ядки и мляко, · 1 · 2.0%
'' · 1 · 2.0%
Захар, палмово масло, ЛЕШНИЦИ (13%), обезмаслено МЛЯКО на прах (8,7%), нискомаслено какао на прах (7,4%), емулгатор: лецитини (СОЯ), ванилин. · 1 · 2.0%

packaging_text_et categorical free_text

Estonian packaging text field that is effectively empty: 94% of 50 rows are null, and the only non-null value observed is the empty string itself (3 occurrences). With cardinality of 1 and entropy of 0, this column carries no information.

Treatment: Drop; no usable signal.

anthropic:claude-opus-4-7 · confidence high
Out[1017]:

saturn.columns["packaging_text_et"].stats

statvalue
n50
nulls47 (94.0%)
unique1
top_value
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: null_rate94.0% null
alert: imbalancetop value is 100.0% of rows
Fig 262.
Top values for packaging_text_et.
Show data table
Top values for packaging_text_et (1 unique shown, of 1 total).
valuecountshare
36.0%

packaging_text_sk categorical foreign_key

This appears to be the Slovak-language packaging text field (suffix _sk) rather than a key, so the foreign_key role looks misassigned. It is essentially empty: 98% of the 50 rows are null and the single non-null value is an empty string, leaving cardinality at 1 and entropy at 0. There is no usable signal here.

Treatment: Drop; 98% null and only one observed value.

anthropic:claude-opus-4-7 · confidence high
Out[1020]:

saturn.columns["packaging_text_sk"].stats

statvalue
n50
nulls49 (98.0%)
unique1
top_value
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail1 singleton categories
alert: null_rate98.0% null
alert: imbalancetop value is 100.0% of rows
Fig 263.
Top values for packaging_text_sk.
Show data table
Top values for packaging_text_sk (1 unique shown, of 1 total).
valuecountshare
12.0%

product_name_pt categorical free_text

This is a Portuguese-localized product name field, but it is mostly empty: 80% of the 50 rows are null. The most common non-null value is the empty string (4 of the 10 non-null rows, top_rate 0.4), and there are 7 distinct values in total. The populated entries are a language mix (Portuguese, Italian, French, English) rather than purely Portuguese, suggesting fallback to original-language labels when no translation exists. The entropy ratio of 0.90 reflects that, apart from the blank, every value appears only once.

Treatment: Drop or treat as optional metadata; too sparse and language-inconsistent for direct modelling.

anthropic:claude-opus-4-7 · confidence high
Out[1023]:

saturn.columns["product_name_pt"].stats

stat value
n 50
nulls 40 (80.0%)
unique 7
top_value ''
top_rate 0.4
cardinality 7
entropy 2.522
entropy_ratio 0.8983
alert: long_tail (6 singleton categories)
alert: null_rate (80.0% null)
Fig 264. Top values for product_name_pt.
Top values for product_name_pt (7 unique shown, of 7 total).
value · count · share
'' · 4 · 8.0%
Cioccolato Fondente 85% Cacao · 1 · 2.0%
Crocantes bolachas com um coração cremoso de Nutella® · 1 · 2.0%
70% Cacao noir intense · 1 · 2.0%
Excellence 70% Cocoa Intense Dark · 1 · 2.0%
Original · 1 · 2.0%
Mix com sultanas e arandos · 1 · 2.0%

abbreviated_product_name_fr categorical label

Likely a French abbreviated product name field (brand + descriptor + size), used as a display label for items. The column is mostly empty (null_rate 0.86): only 7 of the 50 rows are populated, and each holds a different value, so entropy_ratio is 1.0. This sparsity makes it unusable as a categorical feature in its current state.

Treatment: Drop or treat as free text; too sparse and too unique to encode as a category.

anthropic:claude-opus-4-7 · confidence high
Out[1026]:

saturn.columns["abbreviated_product_name_fr"].stats

stat value
n 50
nulls 43 (86.0%)
unique 7
top_value CRISTALINE Eau De Source 0.5L
top_rate 0.1429
cardinality 7
entropy 2.807
entropy_ratio 1
alert: long_tail (7 singleton categories)
alert: null_rate (86.0% null)
Fig 265. Top values for abbreviated_product_name_fr.
Top values for abbreviated_product_name_fr (7 unique shown, of 7 total).
value · count · share
CRISTALINE Eau De Source 0.5L · 1 · 2.0%
Nutella biscuits t22 · 1 · 2.0%
Authentique 275g, fr · 1 · 2.0%
Fibres 230g, fr · 1 · 2.0%
ORG Original 175g · 1 · 2.0%
NESTLE DESSERT Noir 205g · 1 · 2.0%
BRIOCHE TRANCHEE BIO 400g · 1 · 2.0%
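
The pack size usually sits at the end of these labels, so it can be extracted as a numeric feature even if the label itself is dropped. This sketch assumes sizes are written as a number followed by g, kg, ml, cl or l, with either a dot or a comma as the decimal separator. Entries without a unit, such as 'Nutella biscuits t22', return None. `pack_size` is hypothetical.

```python
import re
from typing import Optional, Tuple

# Number (dot or comma decimal) followed by a unit; longer units are tried first.
SIZE = re.compile(r"(\d+(?:[.,]\d+)?)\s*(kg|g|ml|cl|l)\b", re.IGNORECASE)


def pack_size(name: str) -> Optional[Tuple[float, str]]:
    """Return (quantity, lower-case unit) for the first size token, or None."""
    m = SIZE.search(name)
    if m is None:
        return None
    return float(m.group(1).replace(",", ".")), m.group(2).lower()


for label in ["CRISTALINE Eau De Source 0.5L", "Authentique 275g, fr", "Nutella biscuits t22"]:
    print(label, "->", pack_size(label))
```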

obsolete_imported categorical feature

This appears to be a boolean-style flag indicating whether a record was imported as obsolete, but the signal is effectively absent: 86% of rows are null and the only observed value is "0" across all 7 non-null entries. Cardinality is 1 with zero entropy, so the column carries no discriminative information in this sample.

Treatment: Drop; constant value with 86% nulls offers no modelling signal.

anthropic:claude-opus-4-7 · confidence high
Out[1029]:

saturn.columns["obsolete_imported"].stats

stat value
n 50
nulls 43 (86.0%)
unique 1
top_value 0
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: null_rate (86.0% null)
alert: imbalance (top value is 100.0% of rows)
Fig 266. Top values for obsolete_imported.
Top values for obsolete_imported (1 unique shown, of 1 total).
value · count · share
0 · 7 · 14.0%

sources_fields unknown other

The column `sources_fields` was skipped by the profiler, so its kind, cardinality, and value statistics are all unavailable. The only confirmed signals are 50 rows present with a null rate of 0.0, meaning every row has some value, but nothing is known about what those values look like. Without further inspection this column cannot be characterised.

Treatment: Re-profile or manually inspect a sample before deciding on downstream use.

anthropic:claude-opus-4-7 · confidence low
Out[1032]:

saturn.columns["sources_fields"].stats

stat value
n 50
nulls 0 (0.0%)
unique n/a
alert: skipped (no profiler for kind=unknown)
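
One way to follow up on the skip is to check which Python types the raw field actually holds. The sketch assumes the source JSON is an array of product objects, which is not confirmed here. One plausible reason no profiler applied is that the values are nested dicts or lists, but that is also an assumption. `load_records` and `value_kinds` are hypothetical helpers, and the same check works for the other skipped columns in this section.

```python
import json
from collections import Counter


def load_records(path):
    # Assumption: the file is a JSON array of product objects.
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def value_kinds(records, column, sample=3):
    """Count the Python type of `column` in each record and keep a few examples."""
    kinds = Counter()
    examples = {}
    for rec in records:
        value = rec.get(column)  # a missing key is reported as NoneType
        kind = type(value).__name__
        kinds[kind] += 1
        bucket = examples.setdefault(kind, [])
        if len(bucket) < sample:
            bucket.append(value)
    return kinds, examples


# Toy records standing in for the real sample:
toy = [{"sources_fields": {"a": 1}}, {"sources_fields": {}}, {"code": "1"}]
print(value_kinds(toy, "sources_fields")[0])
```

On the real data, you would call `value_kinds(load_records(path), "sources_fields")` with the path from the analyze cell at the top of the notebook.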

emb_code categorical metadata

This appears to be a packager code (the French "code emballeur", printed as EMB followed by digits). The single observed value, "EMB 44068 A", identifies a packaging site rather than a product attribute. The column is almost entirely empty: 98% null across 50 rows, leaving one non-null observation. With only one value present, entropy is 0 and no distributional inference is possible.

Treatment: Drop; 98% null with only one observed value provides no signal.

anthropic:claude-opus-4-7 · confidence high
Out[1034]:

saturn.columns["emb_code"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value EMB 44068 A
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail (1 singleton category)
alert: null_rate (98.0% null)
alert: imbalance (top value is 100.0% of rows)
Fig 267. Top values for emb_code.
Top values for emb_code (1 unique shown, of 1 total).
value · count · share
EMB 44068 A · 1 · 2.0%
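
If the column were kept, its value could be split into parts rather than treated as an opaque label. This sketch assumes the layout suggested by 'EMB 44068 A': the prefix EMB, five digits that we assume are a commune code, and an optional letter distinguishing packagers in the same commune. Real codes may not all follow this pattern. `parse_emb` and `EmbCode` are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class EmbCode(NamedTuple):
    commune: str  # assumed: five-digit commune code
    suffix: str   # optional letter(s); empty string when absent


EMB = re.compile(r"^EMB\s*(\d{5})\s*([A-Z]*)$")


def parse_emb(code: str) -> Optional[EmbCode]:
    """Parse 'EMB <digits> <letter>' codes; return None for anything else."""
    m = EMB.match(code.strip().upper())
    return EmbCode(m.group(1), m.group(2)) if m else None


print(parse_emb("EMB 44068 A"))
```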

lang_imported categorical metadata

Likely a language tag for imported records, but 86% of the 50 rows are null and the remaining 7 entries are all 'fr'. With only one observed value, entropy is 0 and the column carries no discriminative signal as captured.

Treatment: Drop or hold aside until more non-null values arrive; constant 'fr' with 86% nulls is unusable as a feature.

anthropic:claude-opus-4-7 · confidence high
Out[1037]:

saturn.columns["lang_imported"].stats

stat value
n 50
nulls 43 (86.0%)
unique 1
top_value fr
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: null_rate (86.0% null)
alert: imbalance (top value is 100.0% of rows)
Fig 268. Top values for lang_imported.
Top values for lang_imported (1 unique shown, of 1 total).
value · count · share
fr · 7 · 14.0%

generic_name_zh categorical metadata

This appears to be a Chinese generic-name field, but it is effectively empty: 98% of the 50 rows are null and the single non-null observation is itself an empty string. Cardinality is 1 with zero entropy, so the column carries no usable signal in this sample.

Treatment: Drop; the column is 98% null with a single empty-string value.

anthropic:claude-opus-4-7 · confidence high
Out[1040]:

saturn.columns["generic_name_zh"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value ''
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail (1 singleton category)
alert: null_rate (98.0% null)
alert: imbalance (top value is 100.0% of rows)
Fig 269. Top values for generic_name_zh.
Top values for generic_name_zh (1 unique shown, of 1 total).
value · count · share
'' · 1 · 2.0%

conservation_conditions_fr_imported categorical free_text

This column holds French-language storage instructions imported from an external source (e.g., 'A conserver de préférence à l'abri du soleil...'). Coverage is extremely sparse: 86% null, and each of the 7 non-null rows holds a different phrasing, so entropy_ratio sits at 1.0. The values are free-text variants of similar advice (keep dry, cool, away from light or heat) rather than a controlled vocabulary.

Treatment: Drop or normalise via keyword extraction; too sparse and too variable to use as a categorical feature.

anthropic:claude-opus-4-7 · confidence high
Out[1043]:

saturn.columns["conservation_conditions_fr_imported"].stats

stat value
n 50
nulls 43 (86.0%)
unique 7
top_value A conserver de préférence à l'abri du soleil, dans un endroit propre, frais et sans odeur.
top_rate 0.1429
cardinality 7
entropy 2.807
entropy_ratio 1
alert: long_tail (7 singleton categories)
alert: null_rate (86.0% null)
Fig 270. Top values for conservation_conditions_fr_imported.
Top values for conservation_conditions_fr_imported (7 unique shown, of 7 total).
value · count · share
A conserver de préférence à l'abri du soleil, dans un endroit propre, frais et sans odeur. · 1 · 2.0%
A conserver au sec et à l'abri de la chaleur. Ne pas mettre au réfrigérateur. · 1 · 2.0%
A conserver dans un endroit sec à l'abri de la lumière. · 1 · 2.0%
Conserver dans un endroit frais et sec. · 1 · 2.0%
À conserver dans un endroit sec · 1 · 2.0%
A conserver au frais et au sec. · 1 · 2.0%
À conserver dans son emballage fermé, dans un endroit sec, à température ambiante. · 1 · 2.0%
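
The keyword-extraction route in the treatment can be sketched as accent-insensitive matching of a few French keywords: sec (dry), frais (cool), lumière or soleil (light), chaleur (heat). The keyword map is a hypothetical starting point built from the seven values above, not a vetted vocabulary. `fold` and `storage_flags` are illustrative names.

```python
import re
import unicodedata

# Hypothetical keyword map; extend after reviewing the full column.
KEYWORDS = {
    "dry": ["sec"],
    "cool": ["frais"],
    "away_from_light": ["lumiere", "soleil"],
    "away_from_heat": ["chaleur"],
    "no_refrigeration": ["ne pas mettre au refrigerateur"],
}


def fold(text: str) -> str:
    """Lower-case and strip accents so 'À' matches 'a' and 'lumière' matches 'lumiere'."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def storage_flags(text: str) -> list:
    """Return the sorted flags whose keywords appear as whole words in `text`."""
    folded = fold(text)
    return sorted(
        flag for flag, words in KEYWORDS.items()
        if any(re.search(rf"\b{re.escape(w)}\b", folded) for w in words)
    )


print(storage_flags("A conserver au sec et à l'abri de la chaleur. Ne pas mettre au réfrigérateur."))
```

Seven differently worded values collapse into a handful of boolean flags. That is the kind of controlled vocabulary the raw column lacks.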

origin_fr_imported categorical free_text

This appears to be a French-language origin/import provenance field. Its values range from a single country name ("France") to a long description of cocoa paste sourcing across several continents. Only 2 of 50 rows are populated (null_rate 0.96), and the two populated values differ, giving entropy_ratio 1.0 over a cardinality of 2. The mix of a clean country label with a long descriptive string suggests inconsistent data entry rather than a true categorical.

Treatment: Drop or defer; the column is 96% null and mixes country names with prose, so it is not usable as a category without manual normalisation.

anthropic:claude-opus-4-7 · confidence high
Out[1046]:

saturn.columns["origin_fr_imported"].stats

stat value
n 50
nulls 48 (96.0%)
unique 2
top_value France
top_rate 0.5
cardinality 2
entropy 1
entropy_ratio 1
alert: long_tail (2 singleton categories)
alert: null_rate (96.0% null)
Fig 271. Top values for origin_fr_imported.
Top values for origin_fr_imported (2 unique shown, of 2 total).
value · count · share
France · 1 · 2.0%
Pâte de cacao (Afrique de l'Ouest, Amérique du Sud) Afrique, Europe, Madagascar, Amérique du Sud, Afrique de l'Ouest · 1 · 2.0%

owner categorical metadata

Categorical column listing the owning organization (food/beverage manufacturers like Barilla, Ferrero, Nestlé) for each record. The column is overwhelmingly empty: 86% null, leaving only 7 populated rows spread across 6 distinct owners, with Barilla appearing twice and the rest singletons. Entropy ratio of 0.98 confirms the non-null values are nearly uniform, so there is little signal beyond identifying who submitted the entry.

Treatment: Drop or retain as provenance metadata only; too sparse for modelling.

anthropic:claude-opus-4-7 · confidence high
Out[1049]:

saturn.columns["owner"].stats

stat value
n 50
nulls 43 (86.0%)
unique 6
top_value org-barilla-france-sa
top_rate 0.2857
cardinality 6
entropy 2.522
entropy_ratio 0.9755
alert: long_tail (5 singleton categories)
alert: null_rate (86.0% null)
Fig 272. Top values for owner.
Top values for owner (6 unique shown, of 6 total).
value · count · share
org-barilla-france-sa · 2 · 4.0%
org-gie-sources-alma · 1 · 2.0%
org-ferrero-france-commerciale · 1 · 2.0%
org-kellogg-s · 1 · 2.0%
org-nestle-france · 1 · 2.0%
org-la-boulangere-co · 1 · 2.0%

ingredients_text_fr_imported categorical free_text

French-language ingredient declarations imported from an external source. Most non-null values are long free-text ingredient listings (allergens in capitals, percentages, additive codes), while the shortest is simply 'Eau de Source'. The column is 86% null and the 7 present values are all distinct, yielding maximum entropy (entropy_ratio 1.0) and a top_rate of just 0.14. This is unstructured product copy, not a category, despite being typed as categorical.

Treatment: Treat as free text: parse ingredient lists or tokenize/embed for NLP rather than one-hot encoding.

anthropic:claude-opus-4-7 · confidence high
Out[1052]:

saturn.columns["ingredients_text_fr_imported"].stats

stat value
n 50
nulls 43 (86.0%)
unique 7
top_value Eau de Source
top_rate 0.1429
cardinality 7
entropy 2.807
entropy_ratio 1
alert: long_tail (7 singleton categories)
alert: null_rate (86.0% null)
Fig 273. Top values for ingredients_text_fr_imported.
Top values for ingredients_text_fr_imported (7 unique shown, of 7 total).
value · count · share
Eau de Source · 1 · 2.0%
Pâte à tartiner aux NOISETTES et au cacao 40% (sucre, huile de palme, NOISETTES 13%, LAIT écrémé en poudre 8,7%, cacao maigre 7,4%, émulsifiants : lécithines [SOJA] ; vanilline), farine de FROMENT 32%, graisses végétales (palme, palmiste), sucre de canne 8,5%, LACTOSE, son de BLE, LAIT en poudre, extrait en poudre de malt d'ORGE et de maïs, miel, poudres à lever (disphosphate disodique, carbonate acide d'ammonium, carbonate acide de sodium), cacao maigre, sel, amidon de FROMENT, farine d'ORGE malté, émulsifiants : lécithines [SOJA] ; vanilline. · 1 · 2.0%
Farine complète de SEIGLE (77 g*), farine de SEIGLE (28 g*), levure, sel. Peut contenir des traces de LUPIN, LAIT, MOUTARDE, GRAINES DE SÉSAME et SOJA. *en g pour 100 g de produit. · 1 · 2.0%
Farine complète de SEIGLE 59 g*, son de BLÉ 27 g*, flocons d'AVOINE 12 g*, GRAINES DE SÉSAME 7,0 g*, germe de BLÉ, sel. *en g pour 100 g de produit fini. Peut contenir des traces de LUPIN, LAIT, MOUTARDE et SOJA. · 1 · 2.0%
Pommes de terre déshydratées, huiles végétales (tournesol, maïs), farine de riz, amidon de BLÉ, farine de maïs, émulsifiant (E471), maltodextrine, sel, extrait de levure, levure en poudre, colorant (rocou). · 1 · 2.0%
Sucre, pâte de cacao (Afrique de l'Ouest, Amérique du Sud), beurre de cacao, émulsifiant (lécithine), arôme naturel de vanille de Madagascar. Cacao : 53% minimum. Peut contenir : LAIT, FRUITS A COQUE. · 1 · 2.0%
Farine de BLÉ*/** 54%, ŒUFS entiers*/** 14%, sucre de canne roux*, huile de tournesol*/** 8%, levain* (eau, farines de BLÉ*/** 2% et de SEIGLE*, levures), GLUTEN DE BLÉ*, sel, levure, arôme naturel de vanille* (contient alcool*), extrait de vanille*, levure désactivée. Traces éventuelles de lait, moutarde et soja. *Ingrédients issus de l'Agriculture Biologique. **Ingrédients issus du commerce équitable français. · 1 · 2.0%
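
Parsing these lists starts with splitting them into top-level ingredients. A plain comma split breaks sub-ingredient groups such as "huiles végétales (tournesol, maïs)" and decimal percentages such as "8,5%". The sketch below tracks bracket depth and treats a comma between two digits as a decimal separator. It does not handle trailing sentences such as "Peut contenir : ..." separately. `split_ingredients` is hypothetical.

```python
def split_ingredients(text: str) -> list:
    """Split an ingredient declaration on top-level commas.

    Commas inside (...) or [...] stay with their parent ingredient, and a
    comma between two digits is kept as a decimal separator ("8,5%").
    """
    items, buf, depth = [], [], 0
    for i, ch in enumerate(text):
        if ch in "([":
            depth += 1
        elif ch in ")]":
            depth = max(depth - 1, 0)
        elif ch == "," and depth == 0:
            prev_digit = i > 0 and text[i - 1].isdigit()
            next_digit = i + 1 < len(text) and text[i + 1].isdigit()
            if not (prev_digit and next_digit):
                items.append("".join(buf))  # close the current top-level item
                buf = []
                continue
        buf.append(ch)
    items.append("".join(buf))
    cleaned = (item.strip().rstrip(".").strip() for item in items)
    return [item for item in cleaned if item]


text = ("Pommes de terre déshydratées, huiles végétales (tournesol, maïs), farine de riz, "
        "amidon de BLÉ, farine de maïs, émulsifiant (E471), maltodextrine, sel, "
        "extrait de levure, levure en poudre, colorant (rocou).")
for item in split_ingredients(text):
    print(item)
```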

owners_tags categorical metadata

Categorical tag identifying the owning organization for each record, with values like 'org-barilla-france-sa' and 'org-nestle-france' suggesting Open Food Facts-style contributor org slugs. The column is 86% null, leaving only 7 populated rows spread across 6 distinct owners, so entropy ratio is near-saturated (0.976) and the top value covers just 2 records. With this much sparsity it carries almost no signal at n=50.

Treatment: Drop or retain only as a provenance tag; too sparse to use as a feature.

anthropic:claude-opus-4-7 · confidence high
Out[1055]:

saturn.columns["owners_tags"].stats

stat value
n 50
nulls 43 (86.0%)
unique 6
top_value org-barilla-france-sa
top_rate 0.2857
cardinality 6
entropy 2.522
entropy_ratio 0.9755
alert: long_tail (5 singleton categories)
alert: null_rate (86.0% null)
Fig 274. Top values for owners_tags.
Top values for owners_tags (6 unique shown, of 6 total).
value · count · share
org-barilla-france-sa · 2 · 4.0%
org-gie-sources-alma · 1 · 2.0%
org-ferrero-france-commerciale · 1 · 2.0%
org-kellogg-s · 1 · 2.0%
org-nestle-france · 1 · 2.0%
org-la-boulangere-co · 1 · 2.0%

product_name_zh categorical metadata

This appears to be a Chinese product name field that is effectively empty in this sample: 98% of the 50 rows are null, and the single non-null value is itself an empty string. Cardinality is 1 with zero entropy, so the column carries no usable signal here.

Treatment: Drop from modelling; revisit only if a larger sample shows actual Chinese strings populated.

anthropic:claude-opus-4-7 · confidence high
Out[1058]:

saturn.columns["product_name_zh"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value ''
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail (1 singleton category)
alert: null_rate (98.0% null)
alert: imbalance (top value is 100.0% of rows)
Fig 275. Top values for product_name_zh.
Top values for product_name_zh (1 unique shown, of 1 total).
value · count · share
'' · 1 · 2.0%

nutrition_data_prepared_per_imported categorical metadata

This column appears to be metadata indicating the basis on which nutrition data was prepared, with the only observed value being '100g'. It is essentially a constant: 86% of rows are null and the remaining 7 entries all share the single value '100g', giving zero entropy.

Treatment: Drop; constant column with no information.

anthropic:claude-opus-4-7 · confidence high
Out[1061]:

saturn.columns["nutrition_data_prepared_per_imported"].stats

stat value
n 50
nulls 43 (86.0%)
unique 1
top_value 100g
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: null_rate (86.0% null)
alert: imbalance (top value is 100.0% of rows)
Fig 276. Top values for nutrition_data_prepared_per_imported.
Top values for nutrition_data_prepared_per_imported (1 unique shown, of 1 total).
value · count · share
100g · 7 · 14.0%

abbreviated_product_name_fr_imported categorical metadata

This appears to be a French-language abbreviated product name field, likely imported from an external catalog. It is overwhelmingly empty (null_rate 0.86): only 7 of the 50 rows are populated, each with a different value (top_rate 0.143, entropy_ratio 1.0). The few populated entries mix brand-led formats like "CRISTALINE Eau De Source 0.5L" and "NESTLE DESSERT Noir 205g" with locale tags such as "Authentique 275g, fr".

Treatment: Drop or defer; too sparse (86% null) and too unique to model directly.

anthropic:claude-opus-4-7 · confidence high
Out[1064]:

saturn.columns["abbreviated_product_name_fr_imported"].stats

stat value
n 50
nulls 43 (86.0%)
unique 7
top_value CRISTALINE Eau De Source 0.5L
top_rate 0.1429
cardinality 7
entropy 2.807
entropy_ratio 1
alert: long_tail (7 singleton categories)
alert: null_rate (86.0% null)
Fig 277. Top values for abbreviated_product_name_fr_imported.
Top values for abbreviated_product_name_fr_imported (7 unique shown, of 7 total).
value · count · share
CRISTALINE Eau De Source 0.5L · 1 · 2.0%
Nutella biscuits t22 · 1 · 2.0%
Authentique 275g, fr · 1 · 2.0%
Fibres 230g, fr · 1 · 2.0%
ORG Original 175g · 1 · 2.0%
NESTLE DESSERT Noir 205g · 1 · 2.0%
BRIOCHE TRANCHEE BIO 400g · 1 · 2.0%

generic_name_zh_debug_tags unknown metadata

This column appears to be a debug-tag field associated with Chinese generic names, but saturn skipped profiling so no value-level statistics are available. The only confirmed signals are 50 rows with a 0.0 null rate; uniqueness, distribution, and content are unknown.

Treatment: Re-profile or inspect manually before use; likely drop as debug instrumentation.

anthropic:claude-opus-4-7 · confidence low
Out[1067]:

saturn.columns["generic_name_zh_debug_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique n/a
alert: skipped (no profiler for kind=unknown)

customer_service_fr categorical free_text

This column holds French-language customer service contact details (postal addresses or web contact URLs) for product manufacturers. It is overwhelmingly empty, with an 86% null rate. The 7 non-null rows hold 6 distinct strings (entropy ratio 0.976), and only the top entry, a Wasa contact URL, appears twice. The values are unstructured free text mixing URLs, company names, and postal addresses.

Treatment: Drop or treat as sparse metadata; not usable as a categorical feature given 86% nulls and near-unique values.

anthropic:claude-opus-4-7 · confidence high
Out[1069]:

saturn.columns["customer_service_fr"].stats

stat value
n 50
nulls 43 (86.0%)
unique 6
top_value Service Consommateurs, : www.wasa.com/fr-fr/contact (depuis la France), www.wasa.com/fr-be/contact (depuis la Belgique)
top_rate 0.2857
cardinality 6
entropy 2.522
entropy_ratio 0.9755
alert: long_tail (5 singleton categories)
alert: null_rate (86.0% null)
Fig 278. Top values for customer_service_fr.
Top values for customer_service_fr (6 unique shown, of 6 total).
value · count · share
Service Consommateurs, : www.wasa.com/fr-fr/contact (depuis la France), www.wasa.com/fr-be/contact (depuis la Belgique) · 2 · 4.0%
Service Consommateurs Cristaline, 70 avenue des Sources 03270 SAINT YORRE · 1 · 2.0%
FERRERO FRANCE COMMERCIALE - Service Consommateurs, CS 90058 - 76136 MONT SAINT AIGNAN Cedex · 1 · 2.0%
Service Conseil Consommateurs, Kellogg's Produits Alimentaires S.A.S. - Immeuble Neptune - 1 rue Galilée 93160 Noisy-le-Grand (France) · 1 · 2.0%
Nestlé France, 34-40 rue Guynemer 92130 Issy-les-Moulineaux · 1 · 2.0%
Service consommateurs La Boulangère & Co, La Boulangère & Co 1 rue du petit bocage CS 40 201 85140 ESSARTS · 1 · 2.0%
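
If the column is kept as sparse metadata, its two most structured pieces, contact URLs and French five-digit postal codes, can be pulled out with two regular expressions. In these entries, numbers preceded by `CS` (as in 'CS 90058') appear to be mail-sorting references rather than postal codes, so the sketch skips them. Both rules are heuristics fitted to the six values above. `contact_fields` is hypothetical.

```python
import re

URL = re.compile(r"(?:https?://|www\.)\S+")
# Five-digit French postal code, skipping numbers directly preceded by "CS ".
POSTCODE = re.compile(r"(?<!CS )\b\d{5}\b")


def contact_fields(text: str) -> dict:
    """Extract contact URLs and postal codes from a free-text contact string."""
    urls = [u.rstrip(".,;)") for u in URL.findall(text)]
    return {"urls": urls, "postcodes": POSTCODE.findall(text)}


print(contact_fields("FERRERO FRANCE COMMERCIALE - Service Consommateurs, "
                     "CS 90058 - 76136 MONT SAINT AIGNAN Cedex"))
```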

customer_service_fr_imported categorical metadata

This column holds French-language customer service contact details (postal addresses or web contact URLs) for product manufacturers, imported as free-form strings. It is 86% null with only 7 populated rows yielding 6 distinct values, so it functions more as sparse metadata than an analytical feature. Entries vary in format from full postal addresses (Nestlé, Ferrero, Cristaline) to URLs, indicating no normalization upstream.

Treatment: Drop for modelling; retain only if needed as a manufacturer contact lookup.

anthropic:claude-opus-4-7 · confidence high
Out[1072]:

saturn.columns["customer_service_fr_imported"].stats

stat value
n 50
nulls 43 (86.0%)
unique 6
top_value Service Consommateurs, : www.wasa.com/fr-fr/contact (depuis la France), www.wasa.com/fr-be/contact (depuis la Belgique)
top_rate 0.2857
cardinality 6
entropy 2.522
entropy_ratio 0.9755
alert: long_tail (5 singleton categories)
alert: null_rate (86.0% null)
Fig 279. Top values for customer_service_fr_imported.
Top values for customer_service_fr_imported (6 unique shown, of 6 total).
value · count · share
Service Consommateurs, : www.wasa.com/fr-fr/contact (depuis la France), www.wasa.com/fr-be/contact (depuis la Belgique) · 2 · 4.0%
Service Consommateurs Cristaline, 70 avenue des Sources 03270 SAINT YORRE · 1 · 2.0%
FERRERO FRANCE COMMERCIALE - Service Consommateurs, CS 90058 - 76136 MONT SAINT AIGNAN Cedex · 1 · 2.0%
Service Conseil Consommateurs, Kellogg's Produits Alimentaires S.A.S. - Immeuble Neptune - 1 rue Galilée 93160 Noisy-le-Grand (France) · 1 · 2.0%
Nestlé France, 34-40 rue Guynemer 92130 Issy-les-Moulineaux · 1 · 2.0%
Service consommateurs La Boulangère & Co, La Boulangère & Co 1 rue du petit bocage CS 40 201 85140 ESSARTS · 1 · 2.0%

ingredients_text_zh_debug_tags unknown metadata

This column is flagged as kind "unknown" and was skipped by the profiler, so no statistics, uniqueness, or value samples are available. The only confirmed signals are 50 rows present and a 0.0 null rate. The name suggests it holds debug tags from Chinese-language ingredient text parsing, but that is inferred from the column name, not the evidence.

Treatment: Drop unless debug tags are explicitly needed; re-profile with a parser that handles this type before use.

anthropic:claude-opus-4-7 · confidence low
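A quick census of the raw Python types in a skipped column shows which parser it needs before re-profiling. This is a hedged sketch: it assumes the JSON has already been loaded into a list of row dicts, the `rows` sample is hypothetical, and nothing here reflects how saturn assigns kinds internally.

```python
from collections import Counter


def kind_census(rows, column):
    """Count the decoded JSON type of each value in `column`; a missing key counts as null."""
    counts = Counter()
    for row in rows:
        value = row.get(column)
        counts["null" if value is None else type(value).__name__] += 1
    return counts


# Hypothetical rows for illustration; the real values are not shown in this profile.
rows = [
    {"ingredients_text_zh_debug_tags": []},
    {"ingredients_text_zh_debug_tags": ["tag-a"]},
    {"other_column": "x"},
]
print(kind_census(rows, "ingredients_text_zh_debug_tags"))
```

If the census comes back as lists, the 0.0% null rate may hide many empty lists, so counting `len(value) == 0` separately would show how much of the column is actually populated. The same check applies to `product_name_zh_debug_tags` and `owner_fields`, which were skipped for the same reason.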
Out[1075]:

saturn.columns["ingredients_text_zh_debug_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique
alert: skipped no profiler for kind=unknown

product_name_fr_imported categorical free_text

French-language product names imported from an external source, judging by the `_imported` suffix and by values such as 'CRISTALINE Eau De Source 0.5L' and 'Biscuits Nutella x22 biscuits fourrés - 304g'. Only 7 of 50 rows carry a value (null_rate 0.86), and every populated value is unique (entropy_ratio 1.0, top_rate 0.143), so this behaves as free text rather than a category. The extreme nullity combined with full uniqueness makes it unusable as a grouping key.

Treatment: Treat as sparse free text—drop for modelling or tokenize/embed if product identification is needed.

anthropic:claude-opus-4-7 · confidence high
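If product identification is the goal, one simple option (a swapped-in technique, not something the profile prescribes) is token-set Jaccard similarity between this column and another name field such as `abbreviated_product_name`. The tokenizer is a deliberately crude assumption: lowercase runs of word characters.

```python
import re


def tokens(text):
    """Lowercase word tokens; punctuation and spacing are discarded."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a, b):
    """Size of the token intersection divided by the size of the token union."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


full = "Wasa tartine croustillante fibres 230g"  # product_name_fr_imported
short = "Fibres 230g, fr"                        # abbreviated_product_name
print(round(jaccard(full, short), 3))
```

The Wasa pair in the example scores only 1/3 although both strings most likely name the same product, which suggests a match threshold would need tuning on known pairs rather than being fixed up front.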
Out[1077]:

saturn.columns["product_name_fr_imported"].stats

stat value
n 50
nulls 43 (86.0%)
unique 7
top_value CRISTALINE Eau De Source 0.5L
top_rate 0.1429
cardinality 7
entropy 2.807
entropy_ratio 1
alert: long_tail 7 singleton categories
alert: null_rate 86.0% null
Fig 280.
Top values for product_name_fr_imported.
Top values for product_name_fr_imported (7 unique shown, of 7 total).
value count share
CRISTALINE Eau De Source 0.5L 1 2.0%
Biscuits Nutella x22 biscuits fourrés - 304g 1 2.0%
Wasa tartine croustillante authentique au seigle 275g 1 2.0%
Wasa tartine croustillante fibres 230g 1 2.0%
Chips Pringles Original 1 2.0%
NESTLE DESSERT Noir 205g 1 2.0%
Brioche Tranchée Bio 400g 1 2.0%

brands_imported categorical feature

This appears to be a free-text brand field listing imported product brands, with 6 distinct values across only 7 non-null rows out of 50 (null_rate 0.86). The top value 'Wasa' appears just twice (top_rate 0.286), and entropy_ratio 0.976 indicates the few present values are nearly uniformly distributed. One entry 'NESTLE DESSERT,Tablettes' looks like a comma-joined multi-value string, suggesting inconsistent encoding.

Treatment: Split multi-value strings on comma and treat as low-coverage categorical; consider dropping given 86% nulls.

anthropic:claude-opus-4-7 · confidence high
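A minimal sketch of the comma split, fed with the seven non-null values from the top-values table plus one null. It is an assumption that a comma always separates brands; whether 'Tablettes' is a brand or a product line cannot be told from this data, so the split output still needs a manual check.

```python
from collections import Counter


def count_brand_tokens(values, sep=","):
    """Split comma-joined brand cells into single brands and count each one."""
    counts = Counter()
    for value in values:
        if value is None:
            continue  # keep the nulls out of the counts
        for brand in value.split(sep):
            brand = brand.strip()
            if brand:
                counts[brand] += 1
    return counts


values = ["Wasa", "Wasa", "Cristaline", "Nutella biscuits", "Pringles",
          "NESTLE DESSERT,Tablettes", "La boulangere", None]
print(count_brand_tokens(values))
```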
Out[1080]:

saturn.columns["brands_imported"].stats

stat value
n 50
nulls 43 (86.0%)
unique 6
top_value Wasa
top_rate 0.2857
cardinality 6
entropy 2.522
entropy_ratio 0.9755
alert: long_tail 5 singleton categories
alert: null_rate 86.0% null
Fig 281.
Top values for brands_imported.
Top values for brands_imported (6 unique shown, of 6 total).
value count share
Wasa 2 4.0%
Cristaline 1 2.0%
Nutella biscuits 1 2.0%
Pringles 1 2.0%
NESTLE DESSERT,Tablettes 1 2.0%
La boulangere 1 2.0%

owner_imported categorical foreign_key

Categorical column holding organisation slugs (e.g. 'org-barilla-france-sa', 'org-nestle-france'), almost certainly a foreign key to an owning company. It is 88% null with only 6 non-null rows spread across 5 distinct owners, so entropy_ratio is 0.97 simply because nearly every present value is unique. The column is too sparse to support any aggregation or join in its current state.

Treatment: Drop or defer: 88% null leaves too few rows to join or model on.

anthropic:claude-opus-4-7 · confidence high
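The entropy figures in these stats tables are consistent with the usual definitions: Shannon entropy in bits over the non-null values, H = sum of p * log2(1/p), and entropy_ratio = H / log2(cardinality). That reading is inferred from the reported numbers rather than from saturn's documentation. The sketch reproduces this column's 2.252 and 0.9697 from its counts (2, 1, 1, 1, 1).

```python
import math


def entropy_stats(counts):
    """Shannon entropy in bits of the non-null value counts, and its ratio to log2(k)."""
    total = sum(counts)
    h = sum((c / total) * math.log2(total / c) for c in counts if c > 0)
    k = len(counts)
    ratio = h / math.log2(k) if k > 1 else 0.0
    return h, ratio


# owner_imported: org-barilla-france-sa twice, four other owners once each.
h, ratio = entropy_stats([2, 1, 1, 1, 1])
print(round(h, 3), round(ratio, 4))
```

The same function returns 0.5436 for both entropy and ratio on lc_imported's counts (7, 1), matching that column's table; since log2(2) = 1, the two coincide for any two-category column.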
Out[1083]:

saturn.columns["owner_imported"].stats

stat value
n 50
nulls 44 (88.0%)
unique 5
top_value org-barilla-france-sa
top_rate 0.3333
cardinality 5
entropy 2.252
entropy_ratio 0.9697
alert: long_tail 4 singleton categories
alert: null_rate 88.0% null
Fig 282.
Top values for owner_imported.
Top values for owner_imported (5 unique shown, of 5 total).
value count share
org-barilla-france-sa 2 4.0%
org-gie-sources-alma 1 2.0%
org-ferrero-france-commerciale 1 2.0%
org-nestle-france 1 2.0%
org-la-boulangere-co 1 2.0%

product_name_zh_debug_tags unknown metadata

This column appears to be an internal debug-tagging field attached to Chinese product names, but saturn skipped profiling so its contents are opaque. The only confirmed facts are 50 rows with no nulls; uniqueness, value distribution, and type are all unreported. Without a profile pass it is impossible to tell whether it carries useful signal or just developer annotations.

Treatment: Re-run profiling with this kind enabled before deciding; provisionally drop as unparsed debug metadata.

anthropic:claude-opus-4-7 · confidence low
Out[1086]:

saturn.columns["product_name_zh_debug_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique
alert: skipped no profiler for kind=unknown

lc_imported categorical metadata

A language code for imported records, with values 'fr' and 'es'. The column is dominated by missingness (84% null across 50 rows), and among the 8 populated rows 'fr' accounts for 7 (87.5%), leaving 'es' as a single observation. Cardinality is just 2, so this carries little signal in its current state.

Treatment: Treat nulls as a category or drop; near-constant with severe missingness limits modelling value.

anthropic:claude-opus-4-7 · confidence high
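Treating nulls as their own category keeps all 50 rows in play instead of 8. A minimal sketch; the `__missing__` sentinel is an arbitrary, hypothetical label.

```python
from collections import Counter

MISSING = "__missing__"  # hypothetical sentinel label for absent values


def with_missing_category(values):
    """Replace None with an explicit category so missingness becomes a feature level."""
    return [MISSING if v is None else v for v in values]


# Counts from this profile: 7 'fr', 1 'es', 42 nulls.
values = ["fr"] * 7 + ["es"] + [None] * 42
print(Counter(with_missing_category(values)))
```

After this step the missing level holds 42 of 50 rows, which is the near-constant behaviour the treatment warns about.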
Out[1088]:

saturn.columns["lc_imported"].stats

stat value
n 50
nulls 42 (84.0%)
unique 2
top_value fr
top_rate 0.875
cardinality 2
entropy 0.5436
entropy_ratio 0.5436
alert: null_rate 84.0% null
Fig 283.
Top values for lc_imported.
Top values for lc_imported (2 unique shown, of 2 total).
value count share
fr 7 14.0%
es 1 2.0%

ingredients_text_zh categorical free_text

This appears to be a Chinese-language ingredients text field, likely from a localized product/food dataset. It is effectively empty: 98% of the 50 rows are null and the only non-null value observed is itself an empty string, giving a cardinality of 1 and entropy of 0.

Treatment: Drop; no usable signal at this sample size.

anthropic:claude-opus-4-7 · confidence high
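The profiler counts the empty string as a value, so the 98% null rate understates how empty this column is. A sketch of a blank-aware null rate, assuming whitespace-only strings should also count as missing:

```python
def effective_null_rate(values):
    """Share of values that are None or contain only whitespace."""
    if not values:
        return 0.0
    missing = sum(1 for v in values if v is None or (isinstance(v, str) and not v.strip()))
    return missing / len(values)


# ingredients_text_zh as profiled: 49 nulls and one empty string.
values = [None] * 49 + [""]
print(effective_null_rate(values))
```

Applied to packaging_text_ro or ingredients_text_ro (48 nulls plus 2 empty strings each) it also returns 1.0.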
Out[1091]:

saturn.columns["ingredients_text_zh"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value ""
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail 1 singleton categories
alert: null_rate 98.0% null
alert: imbalance top value is 100.0% of rows
Fig 284.
Top values for ingredients_text_zh.
Top values for ingredients_text_zh (1 unique shown, of 1 total).
value count share
"" 1 2.0%

quantity_imported categorical feature

This appears to be a free-form quantity/packaging size field mixing volume ('500 ml') and mass units ('304 g', '275 g'), stored as strings rather than parsed numerics. Coverage is extremely poor: 86% of the 50 rows are null, and among the 7 non-null values every one is unique (entropy_ratio 1.0, top_rate 0.14). With no repeated values and mixed units, it offers little categorical signal as-is.

Treatment: Parse into a numeric magnitude plus a unit column before use; given 86% nulls, consider dropping or imputing.

anthropic:claude-opus-4-7 · confidence high
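The parse step can be sketched with a regular expression that splits the leading number from its unit and converts to one base unit per dimension. The unit table is an assumption: only `g` and `ml` appear in this sample, and the other entries are illustrative extras.

```python
import re

# Assumed conversion table: only "g" and "ml" occur in this sample; "kg", "cl" and "l"
# are plausible extras added for illustration, not observed values.
TO_BASE = {
    "g": ("g", 1.0),
    "kg": ("g", 1000.0),
    "ml": ("ml", 1.0),
    "cl": ("ml", 10.0),
    "l": ("ml", 1000.0),
}
QTY_RE = re.compile(r"^\s*(\d+(?:[.,]\d+)?)\s*([A-Za-z]+)\s*$")


def parse_quantity(text):
    """Return (magnitude, base_unit) for strings like '500 ml', or None if unparseable."""
    if text is None:
        return None
    match = QTY_RE.match(text)
    if not match:
        return None
    unit = match.group(2).lower()
    if unit not in TO_BASE:
        return None
    base_unit, factor = TO_BASE[unit]
    magnitude = float(match.group(1).replace(",", ".")) * factor
    return magnitude, base_unit


print([parse_quantity(q) for q in ["500 ml", "304 g", "275 g", None]])
```

Because '500 ml' is a volume and the other six values are masses, the result is best kept as two fields (magnitude and unit) rather than collapsed into one number.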
Out[1094]:

saturn.columns["quantity_imported"].stats

stat value
n 50
nulls 43 (86.0%)
unique 7
top_value 500 ml
top_rate 0.1429
cardinality 7
entropy 2.807
entropy_ratio 1
alert: long_tail 7 singleton categories
alert: null_rate 86.0% null
Fig 285.
Top values for quantity_imported.
Top values for quantity_imported (7 unique shown, of 7 total).
value count share
500 ml 1 2.0%
304 g 1 2.0%
275 g 1 2.0%
230 g 1 2.0%
175 g 1 2.0%
205 g 1 2.0%
400 g 1 2.0%

nutrition_data_per_imported categorical metadata

Likely records the basis on which imported nutrition values are expressed, with '100g' (values per 100 g) as the sole observed value across all 8 non-null rows. The column is 84% null and has only one unique value, giving zero entropy and no discriminative power.

Treatment: Drop; constant value with 84% nulls carries no signal.

anthropic:claude-opus-4-7 · confidence high
Out[1097]:

saturn.columns["nutrition_data_per_imported"].stats

stat value
n 50
nulls 42 (84.0%)
unique 1
top_value 100g
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: null_rate 84.0% null
alert: imbalance top value is 100.0% of rows
Fig 286.
Top values for nutrition_data_per_imported.
Top values for nutrition_data_per_imported (1 unique shown, of 1 total).
value count share
100g 8 16.0%

generic_name_fr_imported categorical free_text

French generic product names imported from an upstream feed, holding descriptors like "Eau De Source" (spring water) and "Biscuit fourré à la pâte à tartiner aux noisettes et au cacao Nutella®" (biscuit filled with Nutella hazelnut-cocoa spread). The column is 86% null and every one of the 7 observed values is unique (entropy_ratio 1.0), so it behaves as free text rather than a categorical feature. Values are in French with accented characters and brand marks, which will need normalisation if joined with other locales.

Treatment: Treat as multilingual free text: normalise accents and tokenize/embed if used; otherwise drop given 86% nulls.

anthropic:claude-opus-4-7 · confidence high
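One way to do the accent normalisation mentioned in the treatment, using only the standard library. The choice of NFKD decomposition plus an ASCII-only token pattern is an assumption, and it is lossy: characters with no decomposition, such as the '®' mark or the ligature 'œ', are not kept (a word containing 'œ' is split at that character).

```python
import re
import unicodedata


def normalise_tokens(text):
    """Strip accents via NFKD decomposition, lowercase, and keep ASCII alphanumeric runs."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.findall(r"[a-z0-9]+", no_marks.lower())


print(normalise_tokens("Chocolat noir supérieur"))
print(normalise_tokens("Biscuit fourré à la pâte à tartiner aux noisettes et au cacao Nutella®"))
```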
Out[1100]:

saturn.columns["generic_name_fr_imported"].stats

stat value
n 50
nulls 43 (86.0%)
unique 7
top_value Eau De Source
top_rate 0.1429
cardinality 7
entropy 2.807
entropy_ratio 1
alert: long_tail 7 singleton categories
alert: null_rate 86.0% null
Fig 287.
Top values for generic_name_fr_imported.
Top values for generic_name_fr_imported (7 unique shown, of 7 total).
value count share
Eau De Source 1 2.0%
Biscuit fourré à la pâte à tartiner aux noisettes et au cacao Nutella® 1 2.0%
Pain croustillant a la farine de seigle 1 2.0%
Pain croustillant à la farine complète de seigle, avoine et sésame. 1 2.0%
Snack salé 1 2.0%
Chocolat noir supérieur 1 2.0%
Brioche tranchée issue de l'agriculture biologique 1 2.0%

owner_fields unknown other

The column `owner_fields` was skipped by the profiler, so its kind is unknown and no descriptive statistics, uniqueness, or value samples are available. The only signals are a row count of 50 and a null rate of 0.0, meaning every row is populated but the contents are opaque from this evidence alone. Without a sample or type inference, nothing can be said about what the field encodes.

Treatment: Re-profile with parsing enabled (or inspect raw values) before deciding how to use this column.

anthropic:claude-opus-4-7 · confidence low
Out[1103]:

saturn.columns["owner_fields"].stats

stat value
n 50
nulls 0 (0.0%)
unique
alert: skipped no profiler for kind=unknown

categories_imported categorical metadata

Hierarchical product category paths (comma-separated taxonomy strings, mostly French with some en: prefixes) imported from an external feed. The column is 88% null with only 6 non-null rows across 5 distinct values, so coverage is too sparse to be useful as-is. An entropy ratio of 0.97 confirms the few present values are nearly all distinct, and the top value appears just twice.

Treatment: Split on comma into hierarchical levels and use only the top level as a feature, or drop given 88% nulls.

anthropic:claude-opus-4-7 · confidence high
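Splitting the path into levels can be sketched as follows. It assumes commas only ever separate levels and never appear inside a category name, and it strips two-letter language prefixes such as `en:`.

```python
import re

LANG_PREFIX = re.compile(r"^[a-z]{2}:")  # e.g. "en:" on untranslated taxonomy entries


def category_level(path, depth):
    """Return level `depth` (0 = top) of a comma-separated category path, or None."""
    if path is None:
        return None
    parts = [p.strip() for p in path.split(",") if p.strip()]
    if depth >= len(parts):
        return None
    return LANG_PREFIX.sub("", parts[depth])


path = "Snacks, Snacks sucrés, en:Sweet pastries and pies, Viennoiseries"
print([category_level(path, d) for d in range(4)])
```

In this sample level 0 is 'Snacks' for four of the five distinct paths, so level 1 ('Snacks salés' vs 'Snacks sucrés') may be a more informative cut than the top level alone.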
Out[1105]:

saturn.columns["categories_imported"].stats

stat value
n 50
nulls 44 (88.0%)
unique 5
top_value Snacks, Snacks salés, Amuse-gueules, Chips et frites, Chips
top_rate 0.3333
cardinality 5
entropy 2.252
entropy_ratio 0.9697
alert: long_tail 4 singleton categories
alert: null_rate 88.0% null
Fig 288.
Top values for categories_imported.
Top values for categories_imported (5 unique shown, of 5 total).
value count share
Snacks, Snacks salés, Amuse-gueules, Chips et frites, Chips 2 4.0%
Boissons et préparations de boissons, Boissons, Eaux, Eaux de sources 1 2.0%
Snacks, Snacks sucrés, Biscuits et gâteaux, Biscuits sucrés & biscuits apéritifs, Biscuits, en:Biscuits/Cookies (Shelf Stable) 1 2.0%
Snacks, Snacks sucrés, Cacao et dérivés, Chocolats, Chocolats noirs, Chocolat noir pâtissier en tablette à 40% de cacao minimum 1 2.0%
Snacks, Snacks sucrés, en:Sweet pastries and pies, Viennoiseries 1 2.0%

conservation_conditions_fr categorical free_text

French-language storage instructions for products, written as free-form sentences (e.g. "A conserver dans un endroit sec à l'abri de la lumière.", i.e. store in a dry place away from light). Coverage is very thin: 86% of the 50 rows are null, and each of the 7 populated rows holds a different string (entropy_ratio 1.0). Despite semantic overlap (cool, dry, away from light), no two entries are phrased identically, making this unusable as a category without normalisation.

Treatment: Treat as free text: normalise/cluster phrases or extract keywords (sec 'dry', frais 'cool', lumière 'light') rather than one-hot encoding.

anthropic:claude-opus-4-7 · confidence high
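The keyword route can be sketched as a handful of regular-expression flags. The keyword list is hypothetical and only covers terms visible in this profile: sec (dry), frais (cool), lumière or soleil (light or sun) and chaleur (heat). A flag records that a keyword is mentioned, not the direction of the instruction.

```python
import re

# Hypothetical keyword set: sec = dry, frais = cool, lumière/soleil = light/sun,
# chaleur = heat. Each flag only records that the keyword is mentioned.
KEYWORDS = {
    "dry": r"\bsec\b",
    "cool": r"\bfrais\b",
    "light_or_sun": r"\b(?:lumière|soleil)\b",
    "heat": r"\bchaleur\b",
}


def storage_flags(text):
    """Map one French storage instruction to keyword-mention flags (case-insensitive)."""
    lowered = (text or "").lower()
    return {name: re.search(pattern, lowered) is not None for name, pattern in KEYWORDS.items()}


print(storage_flags("A conserver dans un endroit sec à l'abri de la lumière."))
```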
Out[1108]:

saturn.columns["conservation_conditions_fr"].stats

stat value
n 50
nulls 43 (86.0%)
unique 7
top_value A conserver de préférence à l'abri du soleil, dans un endroit propre, frais et sans odeur.
top_rate 0.1429
cardinality 7
entropy 2.807
entropy_ratio 1
alert: long_tail 7 singleton categories
alert: null_rate 86.0% null
Fig 289.
Top values for conservation_conditions_fr.
Top values for conservation_conditions_fr (7 unique shown, of 7 total).
value count share
A conserver de préférence à l'abri du soleil, dans un endroit propre, frais et sans odeur. 1 2.0%
A conserver au sec et à l'abri de la chaleur. Ne pas mettre au réfrigérateur. 1 2.0%
A conserver dans un endroit sec à l'abri de la lumière. 1 2.0%
Conserver dans un endroit frais et sec. 1 2.0%
À conserver dans un endroit sec 1 2.0%
A conserver au frais et au sec. 1 2.0%
À conserver dans son emballage fermé, dans un endroit sec, à température ambiante. 1 2.0%

conservation_conditions categorical free_text

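Free-text French storage instructions for a product (e.g. "A conserver de préférence à l'abri du soleil, ...", i.e. store preferably away from sunlight), captured as a categorical field. With 86% nulls and 7 distinct values across the 7 populated rows, each appearing exactly once, this behaves like sparse free text rather than a controlled vocabulary; the maximum entropy ratio (1.0) confirms every observed value is unique. The values and counts are identical to those of conservation_conditions_fr, so this column is probably a copy of that field.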

Treatment: Treat as free text; normalize/keyword-extract (e.g., 'sec', 'frais', 'abri') or drop given 86% nulls.

anthropic:claude-opus-4-7 · confidence high
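Identical top-value tables do not prove that two columns agree row by row. A sketch of the row-level check, assuming both columns have been read into equal-length lists in the same row order (the sample lists are hypothetical):

```python
def mismatched_rows(a, b):
    """Row indices where two equal-length columns differ; None == None counts as a match."""
    if len(a) != len(b):
        raise ValueError("columns must have the same length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


def columns_match(a, b):
    """True when the two columns agree on every row."""
    return not mismatched_rows(a, b)


# Hypothetical aligned samples; real rows would come from the loaded JSON.
fr_col = ["Conserver dans un endroit frais et sec.", None, None]
plain_col = ["Conserver dans un endroit frais et sec.", None, None]
print(columns_match(fr_col, plain_col))
```

The same check is worth running on customer_service against customer_service_fr, whose tables differ on the Nestlé address.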
Out[1111]:

saturn.columns["conservation_conditions"].stats

stat value
n 50
nulls 43 (86.0%)
unique 7
top_value A conserver de préférence à l'abri du soleil, dans un endroit propre, frais et sans odeur.
top_rate 0.1429
cardinality 7
entropy 2.807
entropy_ratio 1
alert: long_tail 7 singleton categories
alert: null_rate 86.0% null
Fig 290.
Top values for conservation_conditions.
Top values for conservation_conditions (7 unique shown, of 7 total).
value count share
A conserver de préférence à l'abri du soleil, dans un endroit propre, frais et sans odeur. 1 2.0%
A conserver au sec et à l'abri de la chaleur. Ne pas mettre au réfrigérateur. 1 2.0%
A conserver dans un endroit sec à l'abri de la lumière. 1 2.0%
Conserver dans un endroit frais et sec. 1 2.0%
À conserver dans un endroit sec 1 2.0%
A conserver au frais et au sec. 1 2.0%
À conserver dans son emballage fermé, dans un endroit sec, à température ambiante. 1 2.0%

countries_imported categorical metadata

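A country tag on imported records. The `_imported` suffix most likely refers to the data import rather than to imported goods, and in Open Food Facts the countries field lists where a product is sold, so reading it as country of origin would be a stretch. With 84% nulls, only 8 of 50 rows carry a value. Of those, 7 are 'France' and 1 is 'España', giving a top_rate of 0.875 and just 2 distinct categories. 'España' is the Spanish-language name while the other value is not, which hints at inconsistent source encoding.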

Treatment: Impute or flag missingness and normalise country names to a single language before any grouping.

anthropic:claude-opus-4-7 · confidence medium
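A sketch of the name normalisation, assuming canonical English names are wanted; the mapping is hypothetical and only covers the two values seen here plus an unaccented spelling.

```python
# Hypothetical mapping to English names; extend it for other spellings found in the data.
COUNTRY_CANON = {
    "france": "France",
    "españa": "Spain",
    "espana": "Spain",
    "spain": "Spain",
}


def normalise_country(value):
    """Map a raw country string to a canonical English name; unknown strings pass through."""
    if value is None:
        return None
    cleaned = value.strip()
    return COUNTRY_CANON.get(cleaned.lower(), cleaned)


print([normalise_country(v) for v in ["France", "España", None]])
```

Unknown strings pass through unchanged, so new spellings show up as extra categories instead of being silently mapped.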
Out[1114]:

saturn.columns["countries_imported"].stats

stat value
n 50
nulls 42 (84.0%)
unique 2
top_value France
top_rate 0.875
cardinality 2
entropy 0.5436
entropy_ratio 0.5436
alert: null_rate 84.0% null
Fig 291.
Top values for countries_imported.
Top values for countries_imported (2 unique shown, of 2 total).
value count share
France 7 14.0%
España 1 2.0%

origins_fr categorical metadata

This appears to be a French-language origins field listing geographic provenance and source names (towns, regions, water sources) as a comma-concatenated string. The column is almost entirely empty with a 96% null rate: only 2 of the 50 rows are populated, one with an entry bundling 11 places and water sources together and one with a blank string. The packed multi-value format suggests this was flattened from a list field rather than a clean categorical.

Treatment: Split on commas and explode into a multi-label set before use; coverage is too sparse to model directly.

anthropic:claude-opus-4-7 · confidence high
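Exploding into long format keeps the row index, so the places can later be joined back or pivoted into multi-label indicators. A minimal sketch, assuming commas are the only separator:

```python
def explode_origins(values, sep=","):
    """Turn packed origin strings into (row_index, origin) pairs, one pair per place."""
    pairs = []
    for i, value in enumerate(values):
        if not value:
            continue  # skips both None and the blank string seen in this column
        for part in value.split(sep):
            part = part.strip()
            if part:
                pairs.append((i, part))
    return pairs


packed = ("Chambon-la-Forêt,France,Cairanne,Provence-Alpes-Côte d'Azur,Vaucluse,Italie,"
          "Source Sainte Cécile,Source Ofélia,Source Éléonore,Source Emma,Source Éléna")
pairs = explode_origins([None, packed, ""])
print(len(pairs), pairs[:2])
```

The eleven parts mix countries, a region, a department, towns and named water sources, so a single 'origin' level may not be meaningful without a lookup that tags each part.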
Out[1117]:

saturn.columns["origins_fr"].stats

stat value
n 50
nulls 48 (96.0%)
unique 2
top_value Chambon-la-Forêt,France,Cairanne,Provence-Alpes-Côte d'Azur,Vaucluse,Italie,Source Sainte Cécile,Source Ofélia,Source Éléonore,Source Emma,Source Éléna
top_rate 0.5
cardinality 2
entropy 1
entropy_ratio 1
alert: long_tail 2 singleton categories
alert: null_rate 96.0% null
Fig 292.
Top values for origins_fr.
Top values for origins_fr (2 unique shown, of 2 total).
value count share
Chambon-la-Forêt,France,Cairanne,Provence-Alpes-Côte d'Azur,Vaucluse,Italie,Source Sainte Cécile,Source Ofélia,Source Éléonore,Source Emma,Source Éléna 1 2.0%
"" 1 2.0%

abbreviated_product_name categorical free_text

Short product label field, likely a shelf-name abbreviation including brand, variant and pack size (e.g. 'CRISTALINE Eau De Source 0.5L'). It is almost entirely empty with a null_rate of 0.86, and among the 7 populated rows every value is unique (entropy_ratio 1.0, top_rate ~0.143), so it carries no repeating categories.

Treatment: Drop or treat as free text; too sparse and unique to use as a categorical feature.

anthropic:claude-opus-4-7 · confidence high
Out[1120]:

saturn.columns["abbreviated_product_name"].stats

stat value
n 50
nulls 43 (86.0%)
unique 7
top_value CRISTALINE Eau De Source 0.5L
top_rate 0.1429
cardinality 7
entropy 2.807
entropy_ratio 1
alert: long_tail 7 singleton categories
alert: null_rate 86.0% null
Fig 293.
Top values for abbreviated_product_name.
Top values for abbreviated_product_name (7 unique shown, of 7 total).
value count share
CRISTALINE Eau De Source 0.5L 1 2.0%
Nutella biscuits t22 1 2.0%
Authentique 275g, fr 1 2.0%
Fibres 230g, fr 1 2.0%
ORG Original 175g 1 2.0%
NESTLE DESSERT Noir 205g 1 2.0%
BRIOCHE TRANCHEE BIO 400g 1 2.0%

customer_service categorical free_text

Free-text customer service contact details (postal addresses or URLs) extracted from product packaging, mostly in French. The column is 86% null with only 7 populated rows across 6 near-unique values, and entries are long unstructured strings mixing brands like Wasa, Cristaline, Ferrero, Kellogg's, Nestlé and La Boulangère.

Treatment: Drop or parse out brand/URL/address fields separately; too sparse and unstructured to model as-is.

anthropic:claude-opus-4-7 · confidence high
Out[1123]:

saturn.columns["customer_service"].stats

stat value
n 50
nulls 43 (86.0%)
unique 6
top_value Service Consommateurs, : www.wasa.com/fr-fr/contact (depuis la France), www.wasa.com/fr-be/contact (depuis la Belgique)
top_rate 0.2857
cardinality 6
entropy 2.522
entropy_ratio 0.9755
alert: long_tail 5 singleton categories
alert: null_rate 86.0% null
Fig 294.
Top values for customer_service.
Top values for customer_service (6 unique shown, of 6 total).
value count share
Service Consommateurs, : www.wasa.com/fr-fr/contact (depuis la France), www.wasa.com/fr-be/contact (depuis la Belgique) 2 4.0%
Service Consommateurs Cristaline, 70 avenue des Sources 03270 SAINT YORRE 1 2.0%
FERRERO FRANCE COMMERCIALE - Service Consommateurs, CS 90058 - 76136 MONT SAINT AIGNAN Cedex 1 2.0%
Service Conseil Consommateurs, Kellogg's Produits Alimentaires S.A.S. - Immeuble Neptune - 1 rue Galilée 93160 Noisy-le-Grand (France) 1 2.0%
Nestlé France, BP 900 Noisiel 77446 Marne la Vallée Cedex 2 1 2.0%
Service consommateurs La Boulangère & Co, La Boulangère & Co 1 rue du petit bocage CS 40 201 85140 ESSARTS 1 2.0%

data_sources_imported categorical metadata

Concatenated provenance trail listing the producers, databases, and apps that contributed to each record (e.g., 'Database - Equadis, Database - GDSN, Databases, Producers, Producer - nestle-france'). 84% of rows are null and the 8 non-null values are all unique, giving entropy_ratio 1.0 — every observed string is a bespoke composite rather than a clean category. Repeated tokens within a single value (e.g., 'Producers' appearing twice) suggest the field was assembled by concatenation without deduplication.

Treatment: Split on commas and one-hot or multi-hot encode the underlying source tokens rather than using the raw string.

anthropic:claude-opus-4-7 · confidence high
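A sketch of the multi-hot encoding. Tokens are deduplicated within each row, so the repeated 'Producers' and 'Producer - <slug>' entries each count once per record. Splitting on commas is an assumption that holds for the eight values shown in this profile.

```python
def multi_hot(values, sep=","):
    """Build a sorted token vocabulary and one 0/1 row per value; repeats within a row collapse."""
    row_sets = []
    for value in values:
        tokens = set()
        if value:
            tokens = {t.strip() for t in value.split(sep) if t.strip()}
        row_sets.append(tokens)
    vocab = sorted(set().union(*row_sets))
    matrix = [[1 if term in tokens else 0 for term in vocab] for tokens in row_sets]
    return vocab, matrix


vocab, matrix = multi_hot([
    "Apps, app-elcoco",
    "Database - CodeOnline, Database - GDSN, Databases",
    None,
])
print(vocab)
print(matrix)
```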
Out[1126]:

saturn.columns["data_sources_imported"].stats

stat value
n 50
nulls 42 (84.0%)
unique 8
top_value Producers, Producer - gie-sources-alma, Database - Equadis, Database - GDSN, Databases, Producers, Producer - gie-sources-alma
top_rate 0.125
cardinality 8
entropy 3
entropy_ratio 1
alert: long_tail 8 singleton categories
alert: null_rate 84.0% null
Fig 295.
Top values for data_sources_imported.
Top values for data_sources_imported (8 unique shown, of 8 total).
value count share
Producers, Producer - gie-sources-alma, Database - Equadis, Database - GDSN, Databases, Producers, Producer - gie-sources-alma 1 2.0%
Producers, Producer - ferrero-france-commerciale, Database - Equadis, Database - GDSN, Databases, Producers, Producer - ferrero-france-commerciale 1 2.0%
Database - Equadis, Database - GDSN, Databases, Producers, Producer - barilla-france-sa, Producers, Producer - barilla-france-sa 1 2.0%
Apps, app-elcoco 1 2.0%
Producers, Producer - barilla-france-sa, Database - Equadis, Database - GDSN, Databases, Producers, Producer - barilla-france-sa 1 2.0%
Database - CodeOnline, Database - GDSN, Databases 1 2.0%
Database - Equadis, Database - GDSN, Databases, Producers, Producer - nestle-france, Producers, Producer - nestle-france 1 2.0%
Producers, Producer - la-boulangere-co, Database - Equadis, Database - GDSN, Databases, Producers, Producer - la-boulangere-co 1 2.0%

nova_group_error categorical metadata

This appears to be an error/diagnostic flag explaining why a NOVA food classification group could not be assigned. It is null in 96% of 50 rows, and the only observed value across the 2 non-null cases is "too_many_unknown_ingredients" (top_rate 1.0, cardinality 1, entropy 0). With a single category present, the column carries no discriminative signal in this sample.

Treatment: Drop or retain only as a boolean error-present flag; near-constant and overwhelmingly null.

anthropic:claude-opus-4-7 · confidence high
Out[1129]:

saturn.columns["nova_group_error"].stats

stat value
n 50
nulls 48 (96.0%)
unique 1
top_value too_many_unknown_ingredients
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: null_rate 96.0% null
alert: imbalance top value is 100.0% of rows
Fig 296.
Top values for nova_group_error.
Top values for nova_group_error (1 unique shown, of 1 total).
value count share
too_many_unknown_ingredients 2 4.0%

ingredients_text_de_ocr_1648897071_result categorical free_text

This appears to be the OCR result of a German ingredients list (ingredients_text_de_ocr) tied to a specific timestamped run. Of 50 rows, 98% are null and only a single non-null value exists, a detailed ingredient declaration that opens with Nuss-Nougat-Creme (nut-nougat cream), giving cardinality 1 and entropy 0.

Treatment: Drop; 98% null with a single observed value provides no signal.

anthropic:claude-opus-4-7 · confidence high
Out[1132]:

saturn.columns["ingredients_text_de_ocr_1648897071_result"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value Nuss-Nougat-Creme 40% (Zucker, Palmöl, _Haselnüsse_ 13%, _Magermilchpulver_ 8,7%, fettarmer Kakao 7,4%, Emulgator Lecithine (_Soja_), Vanillin), _Weizenmehl_ 32,5%, pflanzliche Fette (Palm, Palmkern), Rohrzucker 8,5% (enthält _Weizen_), _Milchzucker_, _Weizenkleie_, _Vollmilchpulver_, _Gerstenmalz_ - und Maisextraktpulver, Honig, Backtriebmittel: Dinatriumdiphosphat, Natriumhydrogencarbonat, Ammoniumhydrogencarbonat; fettarmer Kakao, Salz, _Weizenstärke_, _Gerstenmalzmehl_, Emulgator Lecithine (_Soja_), Vanillin
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail 1 singleton categories
alert: null_rate 98.0% null
alert: imbalance top value is 100.0% of rows
Fig 297.
Top values for ingredients_text_de_ocr_1648897071_result.
Top values for ingredients_text_de_ocr_1648897071_result (1 unique shown, of 1 total).
value count share
Nuss-Nougat-Creme 40% (Zucker, Palmöl, _Haselnüsse_ 13%, _Magermilchpulver_ 8,7%, fettarmer Kakao 7,4%, Emulgator Lecithine (_Soja_), Vanillin), _Weizenmehl_ 32,5%, pflanzliche Fette (Palm, Palmkern), Rohrzucker 8,5% (enthält _Weizen_), _Milchzucker_, _Weizenkleie_, _Vollmilchpulver_, _Gerstenmalz_ - und Maisextraktpulver, Honig, Backtriebmittel: Dinatriumdiphosphat, Natriumhydrogencarbonat, Ammoniumhydrogencarbonat; fettarmer Kakao, Salz, _Weizenstärke_, _Gerstenmalzmehl_, Emulgator Lecithine (_Soja_), Vanillin 1 2.0%

packaging_text_ro categorical free_text

Romanian-language packaging text field that is essentially empty: 96% of the 50 rows are null and the remaining 2 non-null values are both blank strings, yielding a single observed category and zero entropy.

Treatment: Drop; no usable signal.

anthropic:claude-opus-4-7 · confidence high
Out[1135]:

saturn.columns["packaging_text_ro"].stats

stat value
n 50
nulls 48 (96.0%)
unique 1
top_value ""
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: null_rate 96.0% null
alert: imbalance top value is 100.0% of rows
Fig 298.
Top values for packaging_text_ro.
Top values for packaging_text_ro (1 unique shown, of 1 total).
value count share
"" 2 4.0%

product_name_ro categorical metadata

A Romanian-language product name field that is effectively empty: 96% of the 50 rows are null, leaving only 2 non-null values, one of which is an empty string and the other an English phrase ('Sour Cream & Onion'). With cardinality of 2 and no actual Romanian content observed, this column carries no usable signal in the sample.

Treatment: Drop; null_rate 0.96 and no Romanian values present.

anthropic:claude-opus-4-7 · confidence high
Out[1138]:

saturn.columns["product_name_ro"].stats

stat value
n 50
nulls 48 (96.0%)
unique 2
top_value ""
top_rate 0.5
cardinality 2
entropy 1
entropy_ratio 1
alert: long_tail 2 singleton categories
alert: null_rate 96.0% null
Fig 299.
Top values for product_name_ro.
Top values for product_name_ro (2 unique shown, of 2 total).
value count share
"" 1 2.0%
Sour Cream & Onion 1 2.0%

producer_version_id categorical metadata

Identifier-style categorical capturing a producer/version reference, but 92% of the 50 rows are null, leaving only 4 populated values. The non-null entries are inconsistent in shape — a small integer ('1'), an ISO timestamp, and an 8-digit number — suggesting the field is overloaded or improperly typed. With cardinality 3 and top_rate 0.5 over a tiny populated subset, no reliable signal can be drawn.

Treatment: Drop or quarantine until the upstream schema is clarified; not usable as a feature at 92% null with mixed value types.

anthropic:claude-opus-4-7 · confidence high
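Tallying the shape of each value makes the overloading explicit. A sketch that labels values as digit runs of a given length, ISO-8601 timestamps (via `datetime.fromisoformat`), or other; the labels themselves are arbitrary.

```python
from collections import Counter
from datetime import datetime


def value_shape(value):
    """Classify a raw identifier string by shape: digit run of length n, ISO timestamp, or other."""
    if value is None:
        return "null"
    if value.isdigit():
        return f"digits:{len(value)}"
    try:
        datetime.fromisoformat(value)
        return "iso_timestamp"
    except ValueError:
        return "other"


# The four non-null values as listed in the top-values table.
observed = ["1", "1", "2021-01-25T13:53:49+01:00", "44217063"]
print(Counter(value_shape(v) for v in observed))
```

Three shapes among four values supports the mixed-type reading. producer_version_id_imported lists the same three values and would classify identically.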
Out[1141]:

saturn.columns["producer_version_id"].stats

stat value
n 50
nulls 46 (92.0%)
unique 3
top_value 1
top_rate 0.5
cardinality 3
entropy 1.5
entropy_ratio 0.9464
alert: long_tail 2 singleton categories
alert: null_rate 92.0% null
Fig 300.
Top values for producer_version_id.
Top values for producer_version_id (3 unique shown, of 3 total).
value count share
1 2 4.0%
2021-01-25T13:53:49+01:00 1 2.0%
44217063 1 2.0%

serving_size_imported categorical free_text

Free-text serving size descriptors imported from an upstream source, mixing grams with French unit hints like 'tranche' (slice) and 'carrés' (squares). 88% of the 50 rows are null and the 6 non-null values are all unique, so entropy_ratio is 1.0 and there is no modal serving. Format is inconsistent (e.g. '30 g' vs '25.6 g (5 carrés (25,6 g))', which also switches between a decimal point and a decimal comma), making direct aggregation unsafe.

Treatment: Parse the leading numeric grams into a numeric column and discard the free-text remainder.

anthropic:claude-opus-4-7 · confidence high
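Parsing the leading grams can be sketched with two regular expressions, one for the gram amount and one for the integer right after the first opening parenthesis. Reading that integer as a piece count is an assumption based on values like '1 tranche' and '5 carrés'; the sketch also accepts a decimal comma.

```python
import re

GRAMS_RE = re.compile(r"^\s*(\d+(?:[.,]\d+)?)\s*g\b")
# First integer right after the first opening parenthesis, e.g. "(5 carrés" -> 5.
PIECES_RE = re.compile(r"\(\s*(\d+)\b")


def parse_serving(text):
    """Return (grams, pieces) from a serving string; either part is None when absent."""
    if text is None:
        return None, None
    g = GRAMS_RE.match(text)
    p = PIECES_RE.search(text)
    grams = float(g.group(1).replace(",", ".")) if g else None
    pieces = int(p.group(1)) if p else None
    return grams, pieces


for s in ["13.8 g (1)", "11.4 g (1 tranche)", "30 g", "25.6 g (5 carrés (25,6 g))"]:
    print(s, parse_serving(s))
```

Per-piece weight then follows as grams divided by pieces, for example 25.6 / 5, about 5.12 g per square.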
Out[1144]:

saturn.columns["serving_size_imported"].stats

stat value
n 50
nulls 44 (88.0%)
unique 6
top_value 13.8 g (1)
top_rate 0.1667
cardinality 6
entropy 2.585
entropy_ratio 1
alert: long_tail 6 singleton categories
alert: null_rate 88.0% null
Fig 301.
Top values for serving_size_imported.
Top values for serving_size_imported (6 unique shown, of 6 total).
value count share
13.8 g (1) 1 2.0%
11.4 g (1 tranche) 1 2.0%
10 g (1 tranche) 1 2.0%
30 g 1 2.0%
25.6 g (5 carrés (25,6 g)) 1 2.0%
26.7 g (1 tranche de 26.7 g environ) 1 2.0%

no_nutrition_data_imported categorical metadata

A boolean-style flag indicating whether nutrition data was skipped during import. With a 0.92 null rate and only 4 non-null rows all reading "false" (top_rate 1.0, cardinality 1, entropy 0.0), the column carries no information in this sample.

Treatment: Drop; constant value with overwhelming nulls.

anthropic:claude-opus-4-7 · confidence high
Out[1147]:

saturn.columns["no_nutrition_data_imported"].stats

stat value
n 50
nulls 46 (92.0%)
unique 1
top_value false
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: null_rate 92.0% null
alert: imbalance top value is 100.0% of rows
Fig 302.
Top values for no_nutrition_data_imported.
Top values for no_nutrition_data_imported (1 unique shown, of 1 total).
value count share
false 4 8.0%

packaging_imported categorical metadata

Categorical column capturing imported packaging type as free-form French labels, namely 'Enveloppe' (envelope or wrapper) and 'Boîte, Barquette' (box, tray). It's effectively unusable as-is: 92% of the 50 rows are null, leaving only 4 observed values across 2 distinct categories, with 'Enveloppe' covering 3 of them.

Treatment: Drop or set aside until more coverage is available; 92% nulls leave nothing to model.

anthropic:claude-opus-4-7 · confidence high
Out[1150]:

saturn.columns["packaging_imported"].stats

stat value
n 50
nulls 46 (92.0%)
unique 2
top_value Enveloppe
top_rate 0.75
cardinality 2
entropy 0.8113
entropy_ratio 0.8113
alert: null_rate 92.0% null
Fig 303.
Top values for packaging_imported.
Top values for packaging_imported (2 unique shown, of 2 total).
value count share
Enveloppe 3 6.0%
Boîte, Barquette 1 2.0%

ingredients_text_ro categorical free_text

Romanian-language ingredients text, almost entirely absent in this sample. 96% of the 50 rows are null, and the only 2 non-null values are empty strings, giving cardinality 1 and entropy 0. There is no usable signal here.

Treatment: Drop; effectively empty for this sample.

anthropic:claude-opus-4-7 · confidence high
Out[1153]:

saturn.columns["ingredients_text_ro"].stats

stat value
n 50
nulls 48 (96.0%)
unique 1
top_value ""
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: null_rate 96.0% null
alert: imbalance top value is 100.0% of rows
Fig 304.
Top values for ingredients_text_ro.
Top values for ingredients_text_ro (1 unique shown, of 1 total).
value count share
"" 2 4.0%

producer_version_id_imported categorical metadata

This appears to be a sparsely populated categorical field tracking some imported producer version identifier, with 92% null_rate leaving only 4 non-null values across 50 rows. The 3 distinct values are wildly inconsistent in format — '1', a timestamp '2021-01-25T13:53:49+01:00', and a numeric '44217063' — suggesting the column conflates multiple semantics or has been mis-mapped during import. With only 4 observations, top_rate of 0.5 and entropy_ratio of 0.95 are not meaningful signals.

Treatment: Drop unless the import pipeline can be fixed to emit a single consistent value type.

anthropic:claude-opus-4-7 · confidence low
Out[1156]:

saturn.columns["producer_version_id_imported"].stats

stat value
n 50
nulls 46 (92.0%)
unique 3
top_value 1
top_rate 0.5
cardinality 3
entropy 1.5
entropy_ratio 0.9464
alert: long_tail (2 singleton categories)
alert: null_rate (92.0% null)
Fig 305.
Top values for producer_version_id_imported.
Top values for producer_version_id_imported (3 unique shown, of 3 total).
value count share
1 2 4.0%
2021-01-25T13:53:49+01:00 1 2.0%
44217063 1 2.0%
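One way to confirm the format mix before deciding whether to repair or drop producer_version_id_imported is to classify each raw value. This sketch uses only the standard library; the three labels are our own, and `datetime.fromisoformat` accepts the '+01:00' offset form from Python 3.7 onward.

```python
from collections import Counter
from datetime import datetime


def classify_value(raw):
    """Label a raw string as 'integer', 'timestamp' (ISO 8601) or 'text'."""
    try:
        int(raw)
        return "integer"
    except ValueError:
        pass
    try:
        datetime.fromisoformat(raw)
        return "timestamp"
    except ValueError:
        return "text"


# The four non-null values from the top-values table ('1' occurs twice).
observed = ["1", "1", "2021-01-25T13:53:49+01:00", "44217063"]
print(Counter(classify_value(v) for v in observed))
```

This yields three integer-like values and one timestamp, and the integers themselves differ by several orders of magnitude ('1' versus '44217063'), which fits the mis-mapping suspected in the description.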

labels_imported categorical metadata

Imported product labels (likely certifications or dietary tags) carried over from an external source, with values such as 'Végétarien' (vegetarian) and comma-separated certification strings ('Point Vert, Rainforest Alliance, Triman'). The column is 90% null, leaving only 5 populated rows across 3 distinct values, and the top value covers 3 of those 5 (60%). One value also mixes French labels with a language-prefixed tag ('en:organic'). With such sparse coverage and several labels packed into single cells, this field is barely usable as-is.

Treatment: Split comma-separated tags and one-hot encode, but expect to drop given 90% nulls.

anthropic:claude-opus-4-7 · confidence high
Out[1159]:

saturn.columns["labels_imported"].stats

stat value
n 50
nulls 45 (90.0%)
unique 3
top_value Végétarien
top_rate 0.6
cardinality 3
entropy 1.371
entropy_ratio 0.865
alert: long_tail (2 singleton categories)
alert: null_rate (90.0% null)
Fig 306.
Top values for labels_imported.
Top values for labels_imported (3 unique shown, of 3 total).
value count share
Végétarien 3 6.0%
Point Vert, Rainforest Alliance, Triman 1 2.0%
Commerce équitable, Bio, Bio européen, en:organic 1 2.0%
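The split-then-one-hot treatment could look like the sketch below. It assumes cells are Python strings with None for nulls and keeps null rows as None, so that "no label recorded" is not confused with "label absent"; `one_hot_tags` is a hypothetical helper, not part of Saturn.

```python
def one_hot_tags(cells, sep=","):
    """Split multi-label cells and return (vocabulary, per-row indicator dicts)."""
    rows = [
        None if cell is None
        else [tag.strip() for tag in cell.split(sep) if tag.strip()]
        for cell in cells
    ]
    vocab = sorted({tag for row in rows if row for tag in row})
    matrix = [
        None if row is None else {tag: int(tag in row) for tag in vocab}
        for row in rows
    ]
    return vocab, matrix


cells = [
    None,
    "Végétarien",
    "Point Vert, Rainforest Alliance, Triman",
    "Commerce équitable, Bio, Bio européen, en:organic",
]
vocab, matrix = one_hot_tags(cells)
print(len(vocab), matrix[2]["Triman"], matrix[0])
```

The 3 distinct values already expand to 8 separate tags, which is one more reason to expect this column to be dropped at the current coverage.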

ingredients_text_de_ocr_1648990410_result categorical free_text

This appears to be the OCR-extracted German ingredients text from a timestamped label scan (1648990410). It is effectively empty: 49 of the 50 rows are null, and the single non-null value is a German ingredients list for a biscuit with hazelnut-nougat cream filling. The text shows typical OCR noise, such as 'S0JA' written with a zero. With cardinality 1 and entropy 0, the column carries no discriminative signal in this sample.

Treatment: Drop; 98% null and only one observed value provides no usable signal.

anthropic:claude-opus-4-7 · confidence high
Out[1162]:

saturn.columns["ingredients_text_de_ocr_1648990410_result"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value Kekse mit Nuss - Nugat - Creme - Füllung: Nuss-Nugat-Creme 40% (Zucker, Palmöl, HASELNÜSSE Magermilchpulver, fettarmer Kakao, Emulgator Lecithine (S0JA), Vanillin, Weizenmehl, pflanzliche Fette ( Palm, Palmkern), Rohrzucker, Milchzucker, Weizenkleie, VOLLMILCHPULVER, GERSTENMALZ-und Maisextraktpulver, Honig. Backtriebmittel: Dinatriumdiphosphat, Natriumhydrogencarbonat, Ammoniumhydrogencarbonat; fettarmer Kakao, Salz, Weizenstärke, Gerstenmalzmehl, Emulgator Lecithine (Soja), Vanillin
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail (1 singleton category)
alert: null_rate (98.0% null)
alert: imbalance (top value is 100.0% of rows)
Fig 307.
Top values for ingredients_text_de_ocr_1648990410_result.
Top values for ingredients_text_de_ocr_1648990410_result (1 unique shown, of 1 total).
value count share
Kekse mit Nuss - Nugat - Creme - Füllung: Nuss-Nugat-Creme 40% (Zucker, Palmöl, HASELNÜSSE Magermilchpulver, fettarmer Kakao, Emulgator Lecithine (S0JA), Vanillin, Weizenmehl, pflanzliche Fette ( Palm, Palmkern), Rohrzucker, Milchzucker, Weizenkleie, VOLLMILCHPULVER, GERSTENMALZ-und Maisextraktpulver, Honig. Backtriebmittel: Dinatriumdiphosphat, Natriumhydrogencarbonat, Ammoniumhydrogencarbonat; fettarmer Kakao, Salz, Weizenstärke, Gerstenmalzmehl, Emulgator Lecithine (Soja), Vanillin 1 2.0%
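The numeric suffix in the OCR column names looks like a Unix epoch timestamp; that is an assumption, since the profile does not say so. Under that reading 1648990410 falls on 3 April 2022 (UTC). The sketch below splits such names into field, language code, capture time and the `_result` flag; the regular expression is fitted to the names in this notebook only.

```python
import re
from datetime import datetime, timezone

# Fitted to names like 'ingredients_text_de_ocr_1648990410_result'.
OCR_COLUMN = re.compile(
    r"^(?P<field>.+)_(?P<lang>[a-z]{2})_ocr_(?P<ts>\d+)(?P<result>_result)?$"
)


def parse_ocr_column(name):
    """Return the parts of an OCR column name, or None if it does not match."""
    match = OCR_COLUMN.match(name)
    if match is None:
        return None
    return {
        "field": match["field"],
        "lang": match["lang"],
        # Assumes the suffix is seconds since the Unix epoch.
        "captured_at": datetime.fromtimestamp(int(match["ts"]), tz=timezone.utc),
        "is_result": match["result"] is not None,
    }


info = parse_ocr_column("ingredients_text_de_ocr_1648990410_result")
print(info["field"], info["lang"], info["captured_at"].isoformat(), info["is_result"])
```

The same parser applies to ingredients_text_nl_ocr_1675675383_result, whose suffix would decode to 6 February 2023 under the same assumption.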

allergens_imported categorical feature

Categorical column listing imported allergen declarations; 90% nulls leave only 5 populated rows across 4 distinct values. Entries are comma-separated French allergen lists (e.g., 'Gluten, Graines de sésame' for gluten and sesame seeds, 'Œufs, Gluten' for eggs and gluten), and one value embeds what looks like a GS1 code ('Gs1:T4078:ML'), suggesting inconsistent encoding. 'Gluten' on its own is the only repeated string (2 of 5), and the entropy_ratio of 0.96 reflects the near-uniform spread across this tiny populated subset.

Treatment: Split on commas into a multi-label allergen set and impute or flag the 90% missing before use.

anthropic:claude-opus-4-7 · confidence medium
Out[1165]:

saturn.columns["allergens_imported"].stats

stat value
n 50
nulls 45 (90.0%)
unique 4
top_value Gluten
top_rate 0.4
cardinality 4
entropy 1.922
entropy_ratio 0.961
alert: long_tail (3 singleton categories)
alert: null_rate (90.0% null)
Fig 308.
Top values for allergens_imported.
Top values for allergens_imported (4 unique shown, of 4 total).
value count share
Gluten 2 4.0%
Gluten, Lait, Fruits à coque, Soja, Gs1:T4078:ML 1 2.0%
Gluten, Graines de sésame 1 2.0%
Œufs, Gluten 1 2.0%
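Splitting on commas changes the picture: the top_rate of 0.4 counts only the exact string 'Gluten', but gluten appears in every populated row once the lists are split. A sketch, assuming nulls arrive as None; setting aside tokens that contain ':' is our heuristic for code-like entries such as 'Gs1:T4078:ML', not a rule from the source.

```python
from collections import Counter


def tally_tags(cells, sep=","):
    """Count individual tags across rows, setting aside code-like tokens."""
    names, codes = Counter(), []
    for cell in cells:
        if cell is None:
            continue
        for token in (part.strip() for part in cell.split(sep)):
            if not token:
                continue
            # Heuristic: a colon suggests an identifier, not an allergen name.
            if ":" in token:
                codes.append(token)
            else:
                names[token] += 1
    return names, codes


# The 5 populated rows of allergens_imported, expanded by their counts.
cells = [None] * 45 + [
    "Gluten",
    "Gluten",
    "Gluten, Lait, Fruits à coque, Soja, Gs1:T4078:ML",
    "Gluten, Graines de sésame",
    "Œufs, Gluten",
]
names, codes = tally_tags(cells)
print(names.most_common(1), codes)
```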

ingredients_text_de_ocr_1648990410 categorical free_text

This appears to be an OCR-extracted German ingredients text field (timestamped 1648990410) from Open Food Facts. Of the 50 rows, 98% are null, and the single non-null value is the long ingredients string for the hazelnut-nougat biscuit. That value appears identical to the one in ingredients_text_de_ocr_1648990410_result apart from the spacing of one hyphen ('Nugat- Creme' versus 'Nugat - Creme'), so the two columns look like near-duplicates. With cardinality 1 and entropy 0, the column carries no discriminative signal in this sample.

Treatment: Drop; 98% null and only one unique OCR value provides no modelling signal.

anthropic:claude-opus-4-7 · confidence high
Out[1168]:

saturn.columns["ingredients_text_de_ocr_1648990410"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value Kekse mit Nuss - Nugat- Creme - Füllung: Nuss-Nugat-Creme 40% (Zucker, Palmöl, HASELNÜSSE Magermilchpulver, fettarmer Kakao, Emulgator Lecithine (S0JA), Vanillin, Weizenmehl, pflanzliche Fette ( Palm, Palmkern), Rohrzucker, Milchzucker, Weizenkleie, VOLLMILCHPULVER, GERSTENMALZ-und Maisextraktpulver, Honig. Backtriebmittel: Dinatriumdiphosphat, Natriumhydrogencarbonat, Ammoniumhydrogencarbonat; fettarmer Kakao, Salz, Weizenstärke, Gerstenmalzmehl, Emulgator Lecithine (Soja), Vanillin
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail (1 singleton category)
alert: null_rate (98.0% null)
alert: imbalance (top value is 100.0% of rows)
Fig 309.
Top values for ingredients_text_de_ocr_1648990410.
Top values for ingredients_text_de_ocr_1648990410 (1 unique shown, of 1 total).
value count share
Kekse mit Nuss - Nugat- Creme - Füllung: Nuss-Nugat-Creme 40% (Zucker, Palmöl, HASELNÜSSE Magermilchpulver, fettarmer Kakao, Emulgator Lecithine (S0JA), Vanillin, Weizenmehl, pflanzliche Fette ( Palm, Palmkern), Rohrzucker, Milchzucker, Weizenkleie, VOLLMILCHPULVER, GERSTENMALZ-und Maisextraktpulver, Honig. Backtriebmittel: Dinatriumdiphosphat, Natriumhydrogencarbonat, Ammoniumhydrogencarbonat; fettarmer Kakao, Salz, Weizenstärke, Gerstenmalzmehl, Emulgator Lecithine (Soja), Vanillin 1 2.0%
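To check whether this value and the one in ingredients_text_de_ocr_1648990410_result carry the same text, a comparison that ignores spacing around hyphens is enough. The sketch below uses a hypothetical `normalise_ocr` helper on the opening words of both values, copied from the stats tables.

```python
import re


def normalise_ocr(text):
    """Remove spaces around hyphens and collapse other whitespace runs."""
    text = re.sub(r"\s*-\s*", "-", text)
    return re.sub(r"\s+", " ", text).strip()


result_col = "Kekse mit Nuss - Nugat - Creme - Füllung: Nuss-Nugat-Creme 40%"
plain_col = "Kekse mit Nuss - Nugat- Creme - Füllung: Nuss-Nugat-Creme 40%"
print(result_col == plain_col, normalise_ocr(result_col) == normalise_ocr(plain_col))
```

Raw equality fails while the normalised forms match, so keeping both columns would only duplicate one OCR reading.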

ingredients_text_de_ocr_1648897071 categorical free_text

This appears to be a German-language OCR extraction of an ingredients list (timestamped 1648897071), likely from Open Food Facts product packaging. The column is essentially empty: 49 of the 50 rows are null, so cardinality is 1 and entropy is 0. The single observed entry is a long free-text list that starts with the 40% hazelnut-nougat cream filling and continues with the wheat-flour biscuit ingredients; allergens are marked by underscores and, unlike the 1648990410 columns, it includes per-ingredient percentages.

Treatment: Drop; the column is 98% null with only one unique value and carries no signal.

anthropic:claude-opus-4-7 · confidence high
Out[1171]:

saturn.columns["ingredients_text_de_ocr_1648897071"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value Nuss-Nougat-Creme 40% (Zucker, Palmöl, _Haselnüsse_ 13%, _Magermilchpulver_ 8,7%, fettarmer Kakao 7,4%, Emulgator Lecithine (_Soja_), Vanillin), _Weizenmehl_ 32,5%, pflanzliche Fette (Palm, Palmkern), Rohrzucker 8,5% (enthält _Weizen_), _Milchzucker_, _Weizenkleie_, _Vollmilchpulver_, _Gerstenmalz_- und Maisextraktpulver, Honig, Backtriebmittel: Dinatriumdiphosphat, Natriumhydrogencarbonat, Ammoniumhydrogencarbonat; fettarmer Kakao, Salz, _Weizenstärke_, _Gerstenmalzmehl_, Emulgator Lecithine (_Soja_), Vanillin
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail (1 singleton category)
alert: null_rate (98.0% null)
alert: imbalance (top value is 100.0% of rows)
Fig 310.
Top values for ingredients_text_de_ocr_1648897071.
Top values for ingredients_text_de_ocr_1648897071 (1 unique shown, of 1 total).
value count share
Nuss-Nougat-Creme 40% (Zucker, Palmöl, _Haselnüsse_ 13%, _Magermilchpulver_ 8,7%, fettarmer Kakao 7,4%, Emulgator Lecithine (_Soja_), Vanillin), _Weizenmehl_ 32,5%, pflanzliche Fette (Palm, Palmkern), Rohrzucker 8,5% (enthält _Weizen_), _Milchzucker_, _Weizenkleie_, _Vollmilchpulver_, _Gerstenmalz_- und Maisextraktpulver, Honig, Backtriebmittel: Dinatriumdiphosphat, Natriumhydrogencarbonat, Ammoniumhydrogencarbonat; fettarmer Kakao, Salz, _Weizenstärke_, _Gerstenmalzmehl_, Emulgator Lecithine (_Soja_), Vanillin 1 2.0%
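In this value allergens are wrapped in underscores (for example `_Haselnüsse_`). A minimal regex extraction, assuming underscores are used only as allergen markers, pulls them out in first-seen order; the same idea should carry over to the underscore-marked Danish value further down.

```python
import re

# Assumes underscores appear only as allergen markers: _Token_
ALLERGEN_MARK = re.compile(r"_([^_]+)_")


def marked_allergens(text):
    """Return underscore-marked tokens in first-seen order, without repeats."""
    return list(dict.fromkeys(tok.strip() for tok in ALLERGEN_MARK.findall(text)))


excerpt = (
    "Nuss-Nougat-Creme 40% (Zucker, Palmöl, _Haselnüsse_ 13%, "
    "_Magermilchpulver_ 8,7%, fettarmer Kakao 7,4%, Emulgator Lecithine (_Soja_), "
    "Vanillin), _Weizenmehl_ 32,5%, pflanzliche Fette (Palm, Palmkern), "
    "Rohrzucker 8,5% (enthält _Weizen_), _Milchzucker_"
)
print(marked_allergens(excerpt))
```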

generic_name_ro categorical metadata

This appears to be a Romanian-language generic product name field, but it is effectively empty: 96% of the 50 rows are null, and the only 2 non-null entries are blank strings, giving a cardinality of 1 and entropy of 0. There is no usable signal here.

Treatment: Drop; the column carries no information.

anthropic:claude-opus-4-7 · confidence high
Out[1174]:

saturn.columns["generic_name_ro"].stats

stat value
n 50
nulls 48 (96.0%)
unique 1
top_value ""
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: null_rate (96.0% null)
alert: imbalance (top value is 100.0% of rows)
Fig 311.
Top values for generic_name_ro.
Top values for generic_name_ro (1 unique shown, of 1 total).
value count share
"" 2 4.0%

origin_ro categorical other

The Romanian-language origin field 'origin_ro' is effectively empty: 96% of the 50 rows are null, and the only 2 non-null values are blank strings, giving a single observed category with zero entropy. There is no usable signal here.

Treatment: Drop; column carries no information.

anthropic:claude-opus-4-7 · confidence high
Out[1177]:

saturn.columns["origin_ro"].stats

stat value
n 50
nulls 48 (96.0%)
unique 1
top_value ""
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: null_rate (96.0% null)
alert: imbalance (top value is 100.0% of rows)
Fig 312.
Top values for origin_ro.
Top values for origin_ro (1 unique shown, of 1 total).
value count share
"" 2 4.0%

abbreviated_product_name_imported categorical metadata

This appears to be an abbreviated product name field from an import, but it is almost entirely empty: 94% null across 50 rows, leaving only 3 non-null values, each unique: 'Authentique 275g, fr', 'Fibres 230g, fr' and 'DESSERT Noir 205g'. With cardinality equal to the populated count and the maximal entropy_ratio of 1.0, there is no repetition to learn from. The inconsistent formatting (a trailing ', fr' on two values, an upper-case 'DESSERT' on the third, weight suffixes on all three) further suggests an inconsistent upstream import.

Treatment: Drop or defer — too sparse and near-unique to be useful as a feature.

anthropic:claude-opus-4-7 · confidence high
Out[1180]:

saturn.columns["abbreviated_product_name_imported"].stats

stat value
n 50
nulls 47 (94.0%)
unique 3
top_value Authentique 275g, fr
top_rate 0.3333
cardinality 3
entropy 1.585
entropy_ratio 1
alert: long_tail (3 singleton categories)
alert: null_rate (94.0% null)
Fig 313.
Top values for abbreviated_product_name_imported.
Top values for abbreviated_product_name_imported (3 unique shown, of 3 total).
value count share
Authentique 275g, fr 1 2.0%
Fibres 230g, fr 1 2.0%
DESSERT Noir 205g 1 2.0%

traces_imported categorical free_text

This column appears to record allergen trace declarations (in French) on food products, as comma-separated lists of items such as Lupin, Lait (milk), Moutarde (mustard) and Soja (soy). It is almost entirely empty with a 92% null rate, leaving only 4 non-null values across 50 rows, each unique. Because every whole list is its own category, aggregating the raw strings is unreliable; split into individual items, Lait appears in all 4 lists.

Treatment: Split on commas into a multi-label allergen indicator set, but expect sparse signal given the 92% null rate.

anthropic:claude-opus-4-7 · confidence high
Out[1183]:

saturn.columns["traces_imported"].stats

stat value
n 50
nulls 46 (92.0%)
unique 4
top_value Lupin, Lait, Moutarde, Graines de sésame, Soja
top_rate 0.25
cardinality 4
entropy 2
entropy_ratio 1
alert: long_tail (4 singleton categories)
alert: null_rate (92.0% null)
Fig 314.
Top values for traces_imported.
Top values for traces_imported (4 unique shown, of 4 total).
value count share
Lupin, Lait, Moutarde, Graines de sésame, Soja 1 2.0%
Lupin, Lait, Moutarde, Soja 1 2.0%
Lait, Fruits à coque 1 2.0%
Lait, Moutarde, Soja 1 2.0%

specific_ingredients unknown free_text

The column 'specific_ingredients' was skipped by the profiler, so no type, uniqueness, or distribution stats are available beyond a row count of 50 and a null rate of 0.0. The name suggests it holds ingredient lists, likely free text or arrays, which is consistent with the profiler declining to summarise it. Without sample values or cardinality we cannot confirm structure or detect duplicates, language mix, or skew.

Treatment: Inspect raw values and parse or tokenize before any modelling.

anthropic:claude-opus-4-7 · confidence low
Out[1186]:

saturn.columns["specific_ingredients"].stats

stat value
n 50
nulls 0 (0.0%)
unique n/a
alert: skipped (no profiler for kind=unknown)
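Because the profiler skipped specific_ingredients, a first step is to see which JSON types its values actually have. A sketch, assuming the sample has already been loaded as a list of dicts (the file's top-level layout is not shown in this notebook); the records in the example are made up for illustration.

```python
import json
from collections import Counter


def value_types(records, key):
    """Count the JSON type of `key` across records; 'missing' = key absent."""
    kinds = Counter()
    for record in records:
        if key not in record:
            kinds["missing"] += 1
            continue
        value = record[key]
        if value is None:
            kinds["null"] += 1
        elif isinstance(value, bool):  # check before int: bool subclasses int
            kinds["boolean"] += 1
        elif isinstance(value, (int, float)):
            kinds["number"] += 1
        elif isinstance(value, str):
            kinds["string"] += 1
        elif isinstance(value, list):
            kinds["array"] += 1
        elif isinstance(value, dict):
            kinds["object"] += 1
        else:
            kinds[type(value).__name__] += 1
    return kinds


# Hypothetical records, not taken from the sample.
records = json.loads('[{"specific_ingredients": []}, '
                     '{"specific_ingredients": [{"id": "x"}]}, {}]')
print(value_types(records, "specific_ingredients"))
```

A result dominated by 'array' or 'object' would be consistent with the kind=unknown skip, since such values are not scalars the categorical or numeric profilers can summarise.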

product_name_ru categorical metadata

Russian-language product name field, almost entirely absent: 94% of the 50 rows are null and only 2 distinct non-null values appear, one of which is an empty string. The single real value observed is 'Экселенс 99% какао' (Excellence 99% cocoa), suggesting this column is a sparsely populated localization of a product name.

Treatment: Drop or defer; too sparse (94% null) to use as a feature.

anthropic:claude-opus-4-7 · confidence high
Out[1188]:

saturn.columns["product_name_ru"].stats

stat value
n 50
nulls 47 (94.0%)
unique 2
top_value ""
top_rate 0.6667
cardinality 2
entropy 0.9183
entropy_ratio 0.9183
alert: null_rate (94.0% null)
Fig 315.
Top values for product_name_ru.
Top values for product_name_ru (2 unique shown, of 2 total).
value count share
"" 2 4.0%
Экселенс 99% какао 1 2.0%

origin_ru categorical metadata

The Russian-language origin field is effectively empty: 94% of the 50 rows are null, and the only non-null value observed is the empty string, repeated 3 times. Cardinality is 1 and entropy is 0, so this column carries no information as-is.

Treatment: Drop; the column has a single empty value and 94% nulls.

anthropic:claude-opus-4-7 · confidence high
Out[1191]:

saturn.columns["origin_ru"].stats

stat value
n 50
nulls 47 (94.0%)
unique 1
top_value ""
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: null_rate (94.0% null)
alert: imbalance (top value is 100.0% of rows)
Fig 316.
Top values for origin_ru.
Top values for origin_ru (1 unique shown, of 1 total).
value count share
"" 3 6.0%

ingredients_text_with_allergens_ru categorical free_text

Russian-language ingredient text with allergen markup, almost entirely absent from this sample. 94% of rows are null and the remaining 6% are all empty strings, leaving a single unique value and zero entropy.

Treatment: Drop; no usable signal in this sample.

anthropic:claude-opus-4-7 · confidence high
Out[1194]:

saturn.columns["ingredients_text_with_allergens_ru"].stats

stat value
n 50
nulls 47 (94.0%)
unique 1
top_value ""
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: null_rate (94.0% null)
alert: imbalance (top value is 100.0% of rows)
Fig 317.
Top values for ingredients_text_with_allergens_ru.
Top values for ingredients_text_with_allergens_ru (1 unique shown, of 1 total).
value count share
"" 3 6.0%

packaging_text_ru categorical free_text

Russian-language packaging text field that is effectively empty: 94% of 50 rows are null and the remaining 3 non-null entries are all the empty string, giving a single observed value and zero entropy. There is no usable signal here.

Treatment: Drop; column is 94% null with the only non-null value being an empty string.

anthropic:claude-opus-4-7 · confidence high
Out[1197]:

saturn.columns["packaging_text_ru"].stats

stat value
n 50
nulls 47 (94.0%)
unique 1
top_value ""
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: null_rate (94.0% null)
alert: imbalance (top value is 100.0% of rows)
Fig 318.
Top values for packaging_text_ru.
Top values for packaging_text_ru (1 unique shown, of 1 total).
value count share
"" 3 6.0%

generic_name_ru categorical metadata

Russian-language generic product name field, populated for only 3 of 50 rows (null_rate 0.94). Of the 3 non-null entries, 2 are empty strings and 1 is 'Плитка горького шоколада (99% какао)' (dark chocolate bar, 99% cocoa), so only one real value is present. The column is unusable as a feature at this sample size.

Treatment: Drop or hold aside until coverage improves; do not use for modelling.

anthropic:claude-opus-4-7 · confidence high
Out[1200]:

saturn.columns["generic_name_ru"].stats

stat value
n 50
nulls 47 (94.0%)
unique 2
top_value ""
top_rate 0.6667
cardinality 2
entropy 0.9183
entropy_ratio 0.9183
alert: null_rate (94.0% null)
Fig 319.
Top values for generic_name_ru.
Top values for generic_name_ru (2 unique shown, of 2 total).
value count share
"" 2 4.0%
Плитка горького шоколада (99% какао) 1 2.0%

ingredients_text_ru categorical free_text

Russian-language ingredient text from what appears to be a multilingual product catalog. The column is effectively empty: 94% null and the only 3 non-null entries are blank strings, giving cardinality 1 and zero entropy.

Treatment: Drop; no usable signal at this sample size.

anthropic:claude-opus-4-7 · confidence high
Out[1203]:

saturn.columns["ingredients_text_ru"].stats

stat value
n 50
nulls 47 (94.0%)
unique 1
top_value ""
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: null_rate (94.0% null)
alert: imbalance (top value is 100.0% of rows)
Fig 320.
Top values for ingredients_text_ru.
Top values for ingredients_text_ru (1 unique shown, of 1 total).
value count share
"" 3 6.0%

ingredients_text_da categorical free_text

Danish-language ingredients text for food products, evidently sourced from Open Food Facts-style multilingual labeling. The column is almost entirely empty (null_rate 0.96), with only 2 non-null values out of 50, one of which is a blank string. The single substantive entry is actually a mixed Swedish/Danish/Norwegian ingredient list with allergen tokens marked by underscores, suggesting the locale tagging is unreliable.

Treatment: Drop unless modelling Danish text specifically; coverage is too sparse to be useful.

anthropic:claude-opus-4-7 · confidence high
Out[1206]:

saturn.columns["ingredients_text_da"].stats

stat value
n 50
nulls 48 (96.0%)
unique 2
top_value _VETEMJÖL_/_HVEDEMEL_, palmolja/-olie, glukossirap, maltextrakt från _KORN_/_BYG_, bakpulver/hævemidler (ammoniumkarbonater, natriumkarbonater), salt, _ÄGG_/_ÆG_/_EGG_, arom, mjölbehandlingsmedel/melbehandlingsmiddel (_NATRIUMDISULFIT_).
top_rate 0.5
cardinality 2
entropy 1
entropy_ratio 1
alert: long_tail (2 singleton categories)
alert: null_rate (96.0% null)
Fig 321.
Top values for ingredients_text_da.
Top values for ingredients_text_da (2 unique shown, of 2 total).
value count share
_VETEMJÖL_/_HVEDEMEL_, palmolja/-olie, glukossirap, maltextrakt från _KORN_/_BYG_, bakpulver/hævemidler (ammoniumkarbonater, natriumkarbonater), salt, _ÄGG_/_ÆG_/_EGG_, arom, mjölbehandlingsmedel/melbehandlingsmiddel (_NATRIUMDISULFIT_). 1 2.0%
"" 1 2.0%

ingredients_text_with_allergens_da categorical free_text

Danish-language ingredients text with HTML-tagged allergen spans, evidently sourced from a multilingual food product database. The column is 96% null with only 2 non-null values out of 50, one of which is an empty string, leaving effectively a single real entry that mixes Swedish and Danish/Norwegian terms.

Treatment: Drop unless Danish-specific allergen extraction is required; coverage is too sparse to model.

anthropic:claude-opus-4-7 · confidence high
Out[1209]:

saturn.columns["ingredients_text_with_allergens_da"].stats

stat value
n 50
nulls 48 (96.0%)
unique 2
top_value VETEMJÖL/HVEDEMEL, palmolja/-olie, glukossirap, maltextrakt från KORN/BYG, bakpulver/hævemidler (ammoniumkarbonater, natriumkarbonater), salt, ÄGG/ÆG/EGG, arom, mjölbehandlingsmedel/melbehandlingsmiddel (NATRIUMDISULFIT).
top_rate 0.5
cardinality 2
entropy 1
entropy_ratio 1
alert: long_tail (2 singleton categories)
alert: null_rate (96.0% null)
Fig 322.
Top values for ingredients_text_with_allergens_da.
Top values for ingredients_text_with_allergens_da (2 unique shown, of 2 total).
value count share
VETEMJÖL/HVEDEMEL, palmolja/-olie, glukossirap, maltextrakt från KORN/BYG, bakpulver/hævemidler (ammoniumkarbonater, natriumkarbonater), salt, ÄGG/ÆG/EGG, arom, mjölbehandlingsmedel/melbehandlingsmiddel (NATRIUMDISULFIT). 1 2.0%
"" 1 2.0%

product_name_da categorical metadata

Danish product name field with only 2 non-null values out of 50 rows (null_rate 0.96), each appearing once. The two observed labels ("Original", "Alpine Milk") look like product variant descriptors rather than full product names. With 96% missingness the column carries almost no signal as-is.

Treatment: Drop or defer until backfilled; 96% nulls make it unusable for modelling.

anthropic:claude-opus-4-7 · confidence high
Out[1212]:

saturn.columns["product_name_da"].stats

stat value
n 50
nulls 48 (96.0%)
unique 2
top_value Original
top_rate 0.5
cardinality 2
entropy 1
entropy_ratio 1
alert: long_tail (2 singleton categories)
alert: null_rate (96.0% null)
Fig 323.
Top values for product_name_da.
Top values for product_name_da (2 unique shown, of 2 total).
value count share
Original 1 2.0%
Alpine Milk 1 2.0%

packaging_text_da categorical free_text

Danish-language packaging text field that is effectively empty: 96% of the 50 rows are null, and the only 2 non-null values are empty strings, giving cardinality 1 and zero entropy. There is no usable signal here.

Treatment: Drop; column is empty in this sample.

anthropic:claude-opus-4-7 · confidence high
Out[1215]:

saturn.columns["packaging_text_da"].stats

stat value
n 50
nulls 48 (96.0%)
unique 1
top_value ""
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: null_rate (96.0% null)
alert: imbalance (top value is 100.0% of rows)
Fig 324.
Top values for packaging_text_da.
Top values for packaging_text_da (1 unique shown, of 1 total).
value count share
"" 2 4.0%

generic_name_da categorical metadata

Danish-language generic product name field, almost entirely empty: 96% null across 50 rows, leaving only 2 non-null observations. The two surviving values are 'Kiks' (Danish for biscuit) and an empty string, so there is essentially no usable signal here.

Treatment: Drop; null rate too high to be useful.

anthropic:claude-opus-4-7 · confidence high
Out[1218]:

saturn.columns["generic_name_da"].stats

stat value
n 50
nulls 48 (96.0%)
unique 2
top_value Kiks
top_rate 0.5
cardinality 2
entropy 1
entropy_ratio 1
alert: long_tail (2 singleton categories)
alert: null_rate (96.0% null)
Fig 325.
Top values for generic_name_da.
Top values for generic_name_da (2 unique shown, of 2 total).
value count share
Kiks 1 2.0%
"" 1 2.0%

forest_footprint_data unknown other

The column 'forest_footprint_data' was skipped by the profiler, so its kind is unknown and no descriptive statistics are available. The only confirmed signals are 50 rows with a 0.0 null rate; uniqueness, type, and value distribution are all missing.

Treatment: Re-profile or manually inspect this column before any downstream use.

anthropic:claude-opus-4-7 · confidence low
Out[1221]:

saturn.columns["forest_footprint_data"].stats

stat value
n 50
nulls 0 (0.0%)
unique n/a
alert: skipped (no profiler for kind=unknown)
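If inspection shows that forest_footprint_data holds nested objects (plausible for a column the profiler could not classify, but unconfirmed here), flattening each value into dotted-path scalars would let it be re-profiled key by key. A minimal recursive sketch; the keys in the example are hypothetical.

```python
def flatten(obj, prefix="", sep="."):
    """Flatten nested dicts and lists into {'a.b.0.c': scalar} pairs.

    Empty dicts and lists nested under a key are kept as leaf values.
    """
    if isinstance(obj, dict):
        items = ((str(k), v) for k, v in obj.items())
    elif isinstance(obj, list):
        items = ((str(i), v) for i, v in enumerate(obj))
    else:
        return {prefix: obj}
    flat = {}
    for key, value in items:
        path = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, (dict, list)) and value:
            flat.update(flatten(value, path, sep))
        else:
            flat[path] = value
    return flat


# Hypothetical nested value; the real structure of the column is unknown.
example = {"total": 0.5, "parts": [{"name": "x", "share": 0.1}]}
print(flatten(example))
```

Each resulting path can then be treated as its own column, which gives a scalar profiler something it can summarise.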

origin_da categorical metadata

The Danish-language origin field ('origin_da', following the same language-suffix pattern as the other _da columns) is effectively empty: 96% of the 50 rows are null, and the remaining 2 entries are both the empty string. With cardinality of 1 and entropy of 0, it carries no information.

Treatment: Drop; the column has no usable signal.

anthropic:claude-opus-4-7 · confidence high
Out[1223]:

saturn.columns["origin_da"].stats

stat value
n 50
nulls 48 (96.0%)
unique 1
top_value ""
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: null_rate (96.0% null)
alert: imbalance (top value is 100.0% of rows)
Fig 326.
Top values for origin_da.
Top values for origin_da (1 unique shown, of 1 total).
value count share
"" 2 4.0%

origin_sr categorical metadata

This appears to be the Serbian-language origin field (sr is the ISO 639-1 code for Serbian), but it is effectively empty: 96% of the 50 rows are null and the only 2 non-null values are blank strings, giving a single observed category with entropy 0. There is no usable signal here.

Treatment: Drop; column is 96% null with only blank values.

anthropic:claude-opus-4-7 · confidence high
Out[1226]:

saturn.columns["origin_sr"].stats

stat value
n 50
nulls 48 (96.0%)
unique 1
top_value ""
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: null_rate (96.0% null)
alert: imbalance (top value is 100.0% of rows)
Fig 327.
Top values for origin_sr.
Top values for origin_sr (1 unique shown, of 1 total).
value count share
"" 2 4.0%

ingredients_text_nl_ocr_1675675383_result categorical free_text

This column appears to hold OCR-extracted Dutch ingredients text from product packaging, likely a per-image result field. It is 98% null with only one non-null value present ('Cacaomassa, suiker, cacaoboter, natuurlijk Bourbon vanille - stokje.', i.e. cocoa mass, sugar, cocoa butter, natural Bourbon vanilla pod), giving cardinality 1 and zero entropy. With effectively no variance or coverage, it carries no analytical signal in this sample.

Treatment: Drop; near-empty with a single observed value.

anthropic:claude-opus-4-7 · confidence high
Out[1229]:

saturn.columns["ingredients_text_nl_ocr_1675675383_result"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value Cacaomassa, suiker, cacaoboter, natuurlijk Bourbon vanille - stokje.
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail (1 singleton category)
alert: null_rate (98.0% null)
alert: imbalance (top value is 100.0% of rows)
Fig 328.
Top values for ingredients_text_nl_ocr_1675675383_result.
Top values for ingredients_text_nl_ocr_1675675383_result (1 unique shown, of 1 total).
value count share
Cacaomassa, suiker, cacaoboter, natuurlijk Bourbon vanille - stokje. 1 2.0%

ingredients_text_cs categorical free_text

Czech-language ingredients text, almost entirely absent: 94% of the 50 rows are null and only 2 distinct non-null values exist, one of which is an empty string appearing twice. The single substantive entry is a Czech ingredients list ('Kakaová hmota, cukr, kakaové máslo, vanilka.', i.e. cocoa mass, sugar, cocoa butter, vanilla), confirming this is a localized free-text field rather than a categorical feature.

Treatment: Drop for modelling; retain only if Czech-localized text is specifically needed.

anthropic:claude-opus-4-7 · confidence high
Out[1232]:

saturn.columns["ingredients_text_cs"].stats

stat value
n 50
nulls 47 (94.0%)
unique 2
top_value ""
top_rate 0.6667
cardinality 2
entropy 0.9183
entropy_ratio 0.9183
alert: null_rate (94.0% null)
Fig 329.
Top values for ingredients_text_cs.
Top values for ingredients_text_cs (2 unique shown, of 2 total).
value count share
"" 2 4.0%
Kakaová hmota, cukr, kakaové máslo, vanilka. 1 2.0%

product_name_cs categorical metadata

Czech-localised product name field (`product_name_cs`) that is almost entirely unpopulated: 94% of the 50 rows are null and only 2 distinct values appear, one of which is an empty string. The single real label observed is an English-language entry ('Excellence 70% Cocoa Intense Dark'), suggesting the field was filled with an untranslated name rather than a Czech one.

Treatment: Drop unless Czech localisation is required; the column is 94% null and effectively empty.

anthropic:claude-opus-4-7 · confidence high
Out[1235]:

saturn.columns["product_name_cs"].stats

stat value
n 50
nulls 47 (94.0%)
unique 2
top_value ""
top_rate 0.6667
cardinality 2
entropy 0.9183
entropy_ratio 0.9183
alert: null_rate (94.0% null)
Fig 330.
Top values for product_name_cs.
Top values for product_name_cs (2 unique shown, of 2 total).
value count share
"" 2 4.0%
Excellence 70% Cocoa Intense Dark 1 2.0%

origin_hu categorical metadata

The Hungarian-language origin field 'origin_hu' is 92% null and, among the 4 non-null rows, contains only the empty string. Effective cardinality is 1 with zero entropy, so the column carries no information in this sample.

Treatment: Drop; the column is constant and almost entirely null.

anthropic:claude-opus-4-7 · confidence high
Out[1238]:

saturn.columns["origin_hu"].stats

stat value
n 50
nulls 46 (92.0%)
unique 1
top_value ""
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: null_rate (92.0% null)
alert: imbalance (top value is 100.0% of rows)
Fig 331.
Top values for origin_hu.
Top values for origin_hu (1 unique shown, of 1 total).
value count share
"" 4 8.0%

packaging_text_hu categorical metadata

Hungarian packaging text field that is effectively empty: 92% null and the only non-null value across 50 rows is the empty string, occurring 4 times. Cardinality is 1 with entropy 0, so the column carries no information.

Treatment: Drop; no signal (single empty value, 92% null).

anthropic:claude-opus-4-7 · confidence high
Out[1241]:

saturn.columns["packaging_text_hu"].stats

stat value
n 50
nulls 46 (92.0%)
unique 1
top_value ""
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: null_rate (92.0% null)
alert: imbalance (top value is 100.0% of rows)
Fig 332.
Top values for packaging_text_hu.
Top values for packaging_text_hu (1 unique shown, of 1 total).
value count share
"" 4 8.0%

origin_cs categorical metadata

This appears to be the Czech-language origin field, but it is effectively empty: 96% of the 50 rows are null and the only non-null entries are blank strings. With cardinality of 1 and entropy of 0, the column carries no information.

Treatment: Drop; the column is 96% null with a single blank value otherwise.

anthropic:claude-opus-4-7 · confidence high
Out[1244]:

saturn.columns["origin_cs"].stats

stat value
n 50
nulls 48 (96.0%)
unique 1
top_value ""
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: null_rate (96.0% null)
alert: imbalance (top value is 100.0% of rows)
Fig 333.
Top values for origin_cs.
Top values for origin_cs (1 unique shown, of 1 total).
value count share
"" 2 4.0%

ingredients_text_with_allergens_hu categorical free_text

Hungarian-language ingredient lists with inline HTML markup highlighting allergens, drawn from what looks like a food-product catalogue (Open Food Facts style). The column is almost entirely empty (null_rate 0.94) — only 3 of 50 rows are populated, and each of those values is unique. Notable surprise: at least one entry is multilingual, bundling Hungarian, Romanian and Bulgarian label text into a single cell.

Treatment: Strip HTML tags and treat as optional free text; too sparse (94% null) to use as a feature without heavy imputation.

anthropic:claude-opus-4-7 · confidence high
Out[1247]:

saturn.columns["ingredients_text_with_allergens_hu"].stats

stat value
n 50
nulls 47 (94.0%)
unique 3
top_value Kakaómassza, cukor, kakaó - vaj, vanília.
top_rate 0.3333
cardinality 3
entropy 1.585
entropy_ratio 1
alert: long_tail (3 singleton categories)
alert: null_rate (94.0% null)
Fig 334.
Top values for ingredients_text_with_allergens_hu.
Top values for ingredients_text_with_allergens_hu (3 unique shown, of 3 total).
value count share
Kakaómassza, cukor, kakaó - vaj, vanília. 1 2.0%
HU Étcsokoládé. Kakaó szárazanyag legalább 70% . ÖSszetevők: kakaómassza, cukor, kakaóvaj, emulgeálószerek: lecitinek (szójából); vanília kivonat. Nyomokban dióféléket és tejet tartalmazhat. Bontatlan csomagolásban tárolva minőségét megórzi (nap/hónap/év): a csomagolás hátoldalán feltüntetett időpontig. Száraz, hűvös helyen tárolandó! Készült: Németországban. A kakaóbab származási helye: Ecuador, Elefántcsontpart, Ghána, Kamerun és Nigeria. A Fairtrade Cocoa Program (Fairtrade Kakaó Program) előnyökhöz juttatja a kistermelőket azáltal, hogy több kakaót értékesítenek Fairtrade termékként. Látogasson el a www.info.fairtrade.net/program oldalra. RO Ciocolată amăruie. Substantă uscată de cacao minimum 70% Ingrediente: masă de cacao, zahăr, unt de cacao, emulsifiant: lecitine din soia; extract din vanilie. Cu ingrediente din tări UE şi non UE. Poate contine urme de fructe cu coajă lemnoasă şi lapte. A se consuma de preferintă înainte de/Nr. Lot: vezi spate ambalaj. A se păstra la loc uscat şi răcoros, ferit de razele soarelui și de înghet, atât înainte, cât şi după deschidere. A se consuma în cel mai scurt timp după deschidere. Fairtrade Cocoa Program (Programul Fairtrade de Cacao) permite micilor agricultori să beneficieze de vânzarea propriei cacao ca Fairtrade. Vizitați www.info.fairtrade.net/program. Produs in U.E. pentru S.C. Lidl Discount SRL, Sat Nedelea, Comuna Ariceştii Rahtivani, DN 72, Crângul lui Bot, KM 73+810, județul Prahova, România. BG Натурален шоколад. Съдържа мин. 70% какаова маса. Съставки: какаова маса, захар, какаово масло, емулгатор: лецитин (соев); екстракт от ванилия. Може да съдържа следи от ядки и мляко. Неотворен най-добър до:/ Партида: виж задната страна. Да се съхранява на сухо и хладно място. Програмата за сертифициране на какао Fairtrade Сосоа Program дава възможност на малките производители да продават повече какао при справедливи условия на търговия. 
Повече информация на www.info.fairtrade.net/program Произведено в Германия за Лидл Щифтунг енд Ко. КГ, Щифтсбергщрасе 1, 74167 Некарзулм, Германия. LIDL 1 2.0%
Cukor, pálmaolaj, MOGYORÓ (13%), zsírszegény kakaópor (7,4%), sovány TEJPOR (6,6%), TEJSAVÓPOR, emulgeálószer: lecitinek (SZÓJA); aroma (vanillin). 1 2.0%

generic_name_cs categorical metadata

This appears to be a Czech-localized generic product name field, but it carries virtually no information in this sample. 94% of rows are null, and the only non-null value is an empty string (3 occurrences), giving an entropy of 0.0 and a top_rate of 1.0.

Treatment: Drop; column is effectively empty (94% null and only blank values otherwise).

anthropic:claude-opus-4-7 · confidence high
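The 94% null rate counts only true nulls, so the 3 blank strings in generic_name_cs still look like coverage. A minimal sketch of an "effective" null rate that also counts blank strings as missing. It assumes a column arrives as a plain Python list with `None` for missing cells; `effective_null_rate` is a hypothetical helper, not part of saturn's API.

```python
def effective_null_rate(values):
    """Share of cells that are None or contain only whitespace."""
    if not values:
        return 0.0
    blank_or_null = sum(
        1 for v in values if v is None or (isinstance(v, str) and not v.strip())
    )
    return blank_or_null / len(values)


# generic_name_cs as profiled: 47 nulls plus 3 empty strings in 50 rows.
generic_name_cs = [None] * 47 + [""] * 3
print(effective_null_rate(generic_name_cs))  # 1.0, versus the reported 94.0%
```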
Out[1250]:

saturn.columns["generic_name_cs"].stats

stat value
n 50
nulls 47 (94.0%)
unique 1
top_value ""
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: null_rate · 94.0% null
alert: imbalance · top value is 100.0% of rows
Fig 335.
Top values for generic_name_cs.
Top values for generic_name_cs (1 unique shown, of 1 total).
value count share
"" 3 6.0%

ingredients_text_hu categorical free_text

Hungarian-language ingredient declarations for food products, mirroring Open Food Facts' per-language ingredients_text fields. The column is 92% null with only 4 distinct values across 50 rows; one of those is an empty string, and another is a multi-language label dump (HU/RO/BG) rather than pure Hungarian. Each non-null value occurs exactly once (top_rate 0.25), so there is no real mode to lean on.

Treatment: Treat as sparse free text; drop or language-filter before any NLP, and don't use as a feature without imputing the 92% missing.

anthropic:claude-opus-4-7 · confidence high
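The treatment suggests language-filtering before NLP. One way to pull the Hungarian part out of the HU/RO/BG label dump is to split on the bare language codes that open each section. This is a sketch under the assumption that every section starts with a standalone upper-case code followed by whitespace, as in the value shown in this profile; `split_language_segments` is a hypothetical helper, and a code that repeats later in the text would overwrite the earlier segment.

```python
import re


def split_language_segments(text, codes=("HU", "RO", "BG")):
    """Split a label that concatenates several languages into per-code segments.

    A code counts as a marker only when it stands alone (start of text or
    preceded by whitespace) and is followed by whitespace. Text before the
    first marker is ignored; a repeated code overwrites the earlier segment.
    """
    pattern = re.compile(
        r"(?<!\S)(" + "|".join(re.escape(c) for c in codes) + r")\s+"
    )
    matches = list(pattern.finditer(text))
    segments = {}
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        segments[match.group(1)] = text[match.end():end].strip()
    return segments


# Shortened excerpt of the HU/RO/BG value shown in this profile.
label = ("HU Étcsokoládé. Kakaó szárazanyag legalább 70% . "
         "RO Ciocolată amăruie. BG Натурален шоколад.")
print(split_language_segments(label)["HU"])
# Étcsokoládé. Kakaó szárazanyag legalább 70% .
```

The lookbehind keeps codes embedded inside words from being treated as section markers.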
Out[1253]:

saturn.columns["ingredients_text_hu"].stats

stat value
n 50
nulls 46 (92.0%)
unique 4
top_value Kakaómassza, cukor, kakaó - vaj, vanília.
top_rate 0.25
cardinality 4
entropy 2
entropy_ratio 1
alert: long_tail · 4 singleton categories
alert: null_rate · 92.0% null
Fig 336.
Top values for ingredients_text_hu.
Top values for ingredients_text_hu (4 unique shown, of 4 total).
value count share
Kakaómassza, cukor, kakaó - vaj, vanília. 1 2.0%
HU Étcsokoládé. Kakaó szárazanyag legalább 70% . ÖSszetevők: kakaómassza, cukor, kakaóvaj, emulgeálószerek: lecitinek (szójából); vanília kivonat. Nyomokban dióféléket és tejet tartalmazhat. Bontatlan csomagolásban tárolva minőségét megórzi (nap/hónap/év): a csomagolás hátoldalán feltüntetett időpontig. Száraz, hűvös helyen tárolandó! Készült: Németországban. A kakaóbab származási helye: Ecuador, Elefántcsontpart, Ghána, Kamerun és Nigeria. A Fairtrade Cocoa Program (Fairtrade Kakaó Program) előnyökhöz juttatja a kistermelőket azáltal, hogy több kakaót értékesítenek Fairtrade termékként. Látogasson el a www.info.fairtrade.net/program oldalra. RO Ciocolată amăruie. Substantă uscată de cacao minimum 70% Ingrediente: masă de cacao, zahăr, unt de cacao, emulsifiant: lecitine din soia; extract din vanilie. Cu ingrediente din tări UE şi non UE. Poate contine urme de fructe cu coajă lemnoasă şi lapte. A se consuma de preferintă înainte de/Nr. Lot: vezi spate ambalaj. A se păstra la loc uscat şi răcoros, ferit de razele soarelui și de înghet, atât înainte, cât şi după deschidere. A se consuma în cel mai scurt timp după deschidere. Fairtrade Cocoa Program (Programul Fairtrade de Cacao) permite micilor agricultori să beneficieze de vânzarea propriei cacao ca Fairtrade. Vizitați www.info.fairtrade.net/program. Produs in U.E. pentru S.C. Lidl Discount SRL, Sat Nedelea, Comuna Ariceştii Rahtivani, DN 72, Crângul lui Bot, KM 73+810, județul Prahova, România. BG Натурален шоколад. Съдържа мин. 70% какаова маса. Съставки: какаова маса, захар, какаово масло, емулгатор: лецитин (соев); екстракт от ванилия. Може да съдържа следи от ядки и мляко. Неотворен най-добър до:/ Партида: виж задната страна. Да се съхранява на сухо и хладно място. Програмата за сертифициране на какао Fairtrade Сосоа Program дава възможност на малките производители да продават повече какао при справедливи условия на търговия. 
Повече информация на www.info.fairtrade.net/program Произведено в Германия за Лидл Щифтунг енд Ко. КГ, Щифтсбергщрасе 1, 74167 Некарзулм, Германия. LIDL 1 2.0%
Cukor, pálmaolaj, _MOGYORÓ_ (13%), zsírszegény kakaópor (7,4%), sovány _TEJPOR_ (6,6%), _TEJSAVÓPOR_, emulgeálószer: lecitinek (_SZÓJA_); aroma (vanillin). 1 2.0%
"" 1 2.0%

ingredients_text_sr categorical free_text

Serbian-language ingredient list field (likely a localized variant of an ingredients_text column). Out of 50 rows, only 2 are populated (null_rate 0.96), and one of those is an empty string, leaving exactly one substantive value. There is essentially no signal here at this sample size.

Treatment: Drop unless analysis is restricted to Serbian-localized rows; otherwise too sparse to use.

anthropic:claude-opus-4-7 · confidence high
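If the analysis is restricted to Serbian-localized rows, as the treatment allows, those rows first have to be selected. A minimal sketch, assuming the records are available as a list of dicts keyed by column name (not saturn's output format); `rows_with_locale` and its default field list are hypothetical.

```python
def rows_with_locale(records, lang, fields=("product_name", "generic_name", "ingredients_text")):
    """Return the records that carry non-blank text in any <field>_<lang> column.

    records is assumed to be a list of dicts keyed by column name; the default
    field list is illustrative and can be extended.
    """
    def has_text(value):
        return isinstance(value, str) and value.strip() != ""

    return [
        rec for rec in records
        if any(has_text(rec.get(f"{field}_{lang}")) for field in fields)
    ]


records = [
    {"ingredients_text_sr": "Šećer, kakao masa, kakao buter, vanile."},
    {"ingredients_text_sr": ""},  # blank string: not Serbian-localized
    {},                           # no Serbian columns at all
]
print(len(rows_with_locale(records, "sr")))  # 1
```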
Out[1256]:

saturn.columns["ingredients_text_sr"].stats

stat value
n 50
nulls 48 (96.0%)
unique 2
top_value Šećer, kakao masa, kakao buter, vanile.
top_rate 0.5
cardinality 2
entropy 1
entropy_ratio 1
alert: long_tail · 2 singleton categories
alert: null_rate · 96.0% null
Fig 337.
Top values for ingredients_text_sr.
Top values for ingredients_text_sr (2 unique shown, of 2 total).
value count share
Šećer, kakao masa, kakao buter, vanile. 1 2.0%
"" 1 2.0%

packaging_text_sr categorical free_text

This appears to be a Serbian-language packaging text field, but it is effectively empty in the sample: 96% of 50 rows are null and the only non-null entries are blank strings, giving cardinality of 1 and entropy of 0. There is no usable signal here.

Treatment: Drop the column; it carries no information in this sample.

anthropic:claude-opus-4-7 · confidence high
Out[1259]:

saturn.columns["packaging_text_sr"].stats

stat value
n 50
nulls 48 (96.0%)
unique 1
top_value ""
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: null_rate · 96.0% null
alert: imbalance · top value is 100.0% of rows
Fig 338.
Top values for packaging_text_sr.
Top values for packaging_text_sr (1 unique shown, of 1 total).
value count share
"" 2 4.0%

ingredients_text_nl_ocr_1675675383 categorical free_text

This column appears to be Dutch-language OCR-extracted ingredient text from product packaging, likely a sparsely populated language variant of an ingredients field. Out of 50 rows, 98% are null and the single non-null value is a chocolate ingredients list ('Cacaomassa, suiker, cacaoboter, natuurlijk Bourbon vanille- stokje.'). With cardinality of 1 and entropy of 0, this column carries effectively no signal in this sample.

Treatment: Drop; 98% null with a single observed value provides no usable signal.

anthropic:claude-opus-4-7 · confidence high
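Several columns in this section share the pattern `<field>_<lang>_ocr_<digits>`, optionally followed by `_result`. A sketch that parses such names so the OCR variants can be grouped under their base field. The regular expression is inferred from the column names in this profile, `parse_ocr_column` is hypothetical, and reading the digits as Unix epoch seconds is an assumption rather than documented behaviour.

```python
import re
from datetime import datetime, timezone

# Pattern inferred from column names in this profile, e.g.
# "ingredients_text_nl_ocr_1675675383" and
# "ingredients_text_hu_ocr_1571428260_result".
OCR_COLUMN = re.compile(
    r"^(?P<field>[a-z_]+?)_(?P<lang>[a-z]{2})_ocr_(?P<ts>\d+)(?P<result>_result)?$"
)


def parse_ocr_column(name):
    """Split an OCR column name into its parts, or return None if it does not match.

    Interpreting the digits as Unix epoch seconds is an assumption.
    """
    m = OCR_COLUMN.match(name)
    if m is None:
        return None
    return {
        "field": m.group("field"),
        "lang": m.group("lang"),
        "captured_at": datetime.fromtimestamp(int(m.group("ts")), tz=timezone.utc),
        "is_result": m.group("result") is not None,
    }


info = parse_ocr_column("ingredients_text_hu_ocr_1571428260_result")
print(info["field"], info["lang"], info["is_result"])  # ingredients_text hu True
```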
Out[1262]:

saturn.columns["ingredients_text_nl_ocr_1675675383"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value Cacaomassa, suiker, cacaoboter, natuurlijk Bourbon vanille- stokje.
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail · 1 singleton categories
alert: null_rate · 98.0% null
alert: imbalance · top value is 100.0% of rows
Fig 339.
Top values for ingredients_text_nl_ocr_1675675383.
Top values for ingredients_text_nl_ocr_1675675383 (1 unique shown, of 1 total).
value count share
Cacaomassa, suiker, cacaoboter, natuurlijk Bourbon vanille- stokje. 1 2.0%

ingredients_text_with_allergens_cs categorical free_text

This appears to be a Czech-language ingredient list with allergen annotations for food products. With a 98% null rate and only 1 non-null value across 50 rows ('Kakaová hmota, cukr, kakaové máslo, vanilka.'), the column carries virtually no information in this sample. Entropy is 0.0 and cardinality is 1, so it cannot discriminate between records as-is.

Treatment: Drop for modelling; revisit only if a larger sample provides meaningful coverage.

anthropic:claude-opus-4-7 · confidence high
Out[1265]:

saturn.columns["ingredients_text_with_allergens_cs"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value Kakaová hmota, cukr, kakaové máslo, vanilka.
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail · 1 singleton categories
alert: null_rate · 98.0% null
alert: imbalance · top value is 100.0% of rows
Fig 340.
Top values for ingredients_text_with_allergens_cs.
Top values for ingredients_text_with_allergens_cs (1 unique shown, of 1 total).
value count share
Kakaová hmota, cukr, kakaové máslo, vanilka. 1 2.0%

generic_name_sr categorical metadata

This appears to be a Serbian-language generic product name field, a localized label in a multilingual product catalog. It is almost entirely empty: 96% null across 50 rows, with only 2 distinct values observed, one of which is a blank string. The single non-empty entry is 'Tamna čokolada sa 70% kakaa' ("dark chocolate with 70% cocoa"), so this column carries virtually no usable signal in this sample.

Treatment: Drop or ignore for modelling; retain only if a Serbian-locale view is required.

anthropic:claude-opus-4-7 · confidence high
Out[1268]:

saturn.columns["generic_name_sr"].stats

stat value
n 50
nulls 48 (96.0%)
unique 2
top_value Tamna čokolada sa 70% kakaa
top_rate 0.5
cardinality 2
entropy 1
entropy_ratio 1
alert: long_tail · 2 singleton categories
alert: null_rate · 96.0% null
Fig 341.
Top values for generic_name_sr.
Top values for generic_name_sr (2 unique shown, of 2 total).
value count share
Tamna čokolada sa 70% kakaa 1 2.0%
"" 1 2.0%

packaging_text_cs categorical free_text

Czech-language packaging text field that is effectively empty: 94% null and the only non-null value observed (3 rows) is itself an empty string, giving cardinality 1 and entropy 0. There is no usable signal here in this sample.

Treatment: Drop; column is constant-empty with 94% nulls.

anthropic:claude-opus-4-7 · confidence high
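The drop decisions in this section follow a consistent rule of thumb: too many nulls, or at most one distinct non-blank value. A sketch that applies such a rule across columns. The 0.9 threshold and the `is_effectively_empty` helper are illustrative assumptions, not saturn defaults, and the `dense_example` column is synthetic.

```python
def is_effectively_empty(values, max_null_rate=0.9):
    """Flag a column that is too sparse or too constant to use.

    Blank strings count as missing. A column is flagged when the missing
    share reaches max_null_rate or when at most one distinct non-blank value
    remains. The 0.9 default is an illustrative choice, not a saturn setting.
    """
    non_blank = [v for v in values if v is not None and str(v).strip()]
    null_rate = 1 - len(non_blank) / len(values) if values else 1.0
    return null_rate >= max_null_rate or len(set(non_blank)) <= 1


columns = {
    "packaging_text_cs": [None] * 47 + [""] * 3,
    "generic_name_hu": [None] * 46 + [""] * 3 + ["Finom"],
    "dense_example": ["x", "y"] * 25,  # synthetic, fully populated
}
to_drop = [name for name, vals in columns.items() if is_effectively_empty(vals)]
print(to_drop)  # ['packaging_text_cs', 'generic_name_hu']
```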
Out[1271]:

saturn.columns["packaging_text_cs"].stats

stat value
n 50
nulls 47 (94.0%)
unique 1
top_value ""
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: null_rate · 94.0% null
alert: imbalance · top value is 100.0% of rows
Fig 342.
Top values for packaging_text_cs.
Top values for packaging_text_cs (1 unique shown, of 1 total).
value count share
"" 3 6.0%

product_name_sr categorical metadata

This looks like a Serbian-localized product name field, but it is effectively empty: 96% of the 50 rows are null and only 2 distinct non-null values appear, one in English ('Excellence 70% Cocoa Intense Dark') and one in Cyrillic script ('Течен Шоколад Нутела', a Nutella label). An English, Latin-script name is notable in a column nominally tagged 'sr'.

Treatment: Drop or defer until coverage improves; with 96% nulls and 2 unique values it carries no modelling signal.

anthropic:claude-opus-4-7 · confidence high
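A quick way to surface the Latin/Cyrillic mix in product_name_sr is to label each value by the script of its letters. This sketch uses the first word of each character's Unicode name (via the standard `unicodedata` module) as a rough script tag; `dominant_script` is a hypothetical helper. Because Serbian is written in both Cyrillic and Latin script, a LATIN result flags a value for review but does not by itself show that it is English.

```python
import unicodedata
from collections import Counter


def dominant_script(text):
    """Return the most common script label among the letters in text.

    The label is the first word of each letter's Unicode name ("LATIN",
    "CYRILLIC", "HEBREW", ...). This is a heuristic, not language detection.
    """
    scripts = Counter(
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in text
        if ch.isalpha()
    )
    return scripts.most_common(1)[0][0] if scripts else None


for value in ["Excellence 70% Cocoa Intense Dark", "Течен Шоколад Нутела"]:
    print(value, "->", dominant_script(value))
# Excellence 70% Cocoa Intense Dark -> LATIN
# Течен Шоколад Нутела -> CYRILLIC
```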
Out[1274]:

saturn.columns["product_name_sr"].stats

stat value
n 50
nulls 48 (96.0%)
unique 2
top_value Excellence 70% Cocoa Intense Dark
top_rate 0.5
cardinality 2
entropy 1
entropy_ratio 1
alert: long_tail · 2 singleton categories
alert: null_rate · 96.0% null
Fig 343.
Top values for product_name_sr.
Top values for product_name_sr (2 unique shown, of 2 total).
value count share
Excellence 70% Cocoa Intense Dark 1 2.0%
Течен Шоколад Нутела 1 2.0%

ingredients_text_hu_ocr_1571428260_result categorical free_text

This appears to be the OCR-extracted Hungarian ingredients text for a single product (likely chocolate: cocoa mass, sugar, cocoa butter, natural bourbon vanilla, plus a may-contain warning for nuts, milk, soy, sesame and wheat), captured from one timestamped scan. With null_rate 0.98 and a single non-null value across n=50, this column is effectively empty and carries no comparative signal; with only one populated row, cardinality and entropy are both at their floor.

Treatment: Drop; 98% null with a single OCR string offers no modelling value.

anthropic:claude-opus-4-7 · confidence high
Out[1277]:

saturn.columns["ingredients_text_hu_ocr_1571428260_result"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value kakaómassza, cukor, kakaó - vaj, természetes bourbon vanília. Nyomokban egyéb dióféléket, tejet, szóját, szezámmagot es búzát tartalmazhat.
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail · 1 singleton categories
alert: null_rate · 98.0% null
alert: imbalance · top value is 100.0% of rows
Fig 344.
Top values for ingredients_text_hu_ocr_1571428260_result.
Top values for ingredients_text_hu_ocr_1571428260_result (1 unique shown, of 1 total).
value count share
kakaómassza, cukor, kakaó - vaj, természetes bourbon vanília. Nyomokban egyéb dióféléket, tejet, szóját, szezámmagot es búzát tartalmazhat. 1 2.0%

ingredients_text_hu_ocr_1571428260 categorical free_text

This appears to be a Hungarian-language OCR extraction of an ingredients list (likely from a chocolate product, mentioning cocoa mass, sugar, cocoa butter, and bourbon vanilla). Of 50 rows, 98% are null and only 1 non-null value exists, so the column is effectively empty. The single populated entry is free-text in Hungarian, not a category, despite being typed as categorical.

Treatment: Drop; 98% null with a single Hungarian free-text value carries no usable signal.

anthropic:claude-opus-4-7 · confidence high
Out[1280]:

saturn.columns["ingredients_text_hu_ocr_1571428260"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value kakaómassza, cukor, kakaó- vaj, természetes bourbon vanília. Nyomokban egyéb dióféléket, tejet, szóját, szezámmagot es búzát tartalmazhat.
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail · 1 singleton categories
alert: null_rate · 98.0% null
alert: imbalance · top value is 100.0% of rows
Fig 345.
Top values for ingredients_text_hu_ocr_1571428260.
Top values for ingredients_text_hu_ocr_1571428260 (1 unique shown, of 1 total).
value count share
kakaómassza, cukor, kakaó- vaj, természetes bourbon vanília. Nyomokban egyéb dióféléket, tejet, szóját, szezámmagot es búzát tartalmazhat. 1 2.0%

generic_name_hu categorical metadata

Hungarian generic-name field that is almost entirely missing: 92% of the 50 rows are null, and of the 4 non-null entries, 3 are empty strings and only 1 carries a value ("Finom", Hungarian for "delicious"). With a cardinality of 2 and a top_rate of 0.75 on the empty string, this column carries virtually no usable signal in the sample.

Treatment: Drop unless a fuller source can be joined in; current null rate makes it unusable.

anthropic:claude-opus-4-7 · confidence high
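The entropy figures in these stats tables can be reproduced from the value counts. A sketch assuming saturn uses base-2 Shannon entropy over the non-null values (with the empty string counted as a category) and divides by log2(cardinality) for entropy_ratio; that reading is inferred from the numbers in this profile, not from saturn's documentation, and `entropy_stats` is a hypothetical helper.

```python
import math
from collections import Counter


def entropy_stats(values):
    """Return (entropy in bits, entropy / log2(cardinality)) for non-null values.

    Empty strings count as a category; None is excluded. The ratio is 0.0
    when fewer than two categories exist, matching the single-value columns
    in this profile.
    """
    counts = Counter(v for v in values if v is not None)
    total = sum(counts.values())
    # Sum of p * log2(1/p) over categories (written this way to avoid -0.0).
    entropy = sum((c / total) * math.log2(total / c) for c in counts.values())
    k = len(counts)
    ratio = entropy / math.log2(k) if k > 1 else 0.0
    return round(entropy, 4), round(ratio, 4)


# generic_name_hu: 46 nulls, 3 empty strings and one "Finom".
print(entropy_stats([None] * 46 + [""] * 3 + ["Finom"]))  # (0.8113, 0.8113)
```

With the counts from product_name_hu (2, 1, 1) the same function returns (1.5, 0.9464), and three singletons, as in nutriscore_grade_producer, give (1.585, 1.0), in line with the reported values.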
Out[1283]:

saturn.columns["generic_name_hu"].stats

stat value
n 50
nulls 46 (92.0%)
unique 2
top_value ""
top_rate 0.75
cardinality 2
entropy 0.8113
entropy_ratio 0.8113
alert: null_rate · 92.0% null
Fig 346.
Top values for generic_name_hu.
Top values for generic_name_hu (2 unique shown, of 2 total).
value count share
"" 3 6.0%
Finom 1 2.0%

product_name_hu categorical metadata

Hungarian-language product name field that is effectively empty: 92% of the 50 rows are null, and the 4 non-null entries collapse to 3 distinct strings (one of which is the empty string, appearing twice). The two actual values ("Excellence 70% Cocoa Intense Dark", "Dark Chocolate 70% Cacao") are in English, not Hungarian, suggesting that English names were copied into the Hungarian slot rather than translated.

Treatment: Drop; null_rate 0.92 and no genuine Hungarian content make it unusable.

anthropic:claude-opus-4-7 · confidence high
Out[1286]:

saturn.columns["product_name_hu"].stats

stat value
n 50
nulls 46 (92.0%)
unique 3
top_value ""
top_rate 0.5
cardinality 3
entropy 1.5
entropy_ratio 0.9464
alert: long_tail · 2 singleton categories
alert: null_rate · 92.0% null
Fig 347.
Top values for product_name_hu.
Top values for product_name_hu (3 unique shown, of 3 total).
value count share
"" 2 4.0%
Excellence 70% Cocoa Intense Dark 1 2.0%
Dark Chocolate 70% Cacao 1 2.0%

ingredients_text_with_allergens_sr categorical free_text

Serbian-language ingredients list with allergen annotations, populated for only 2 of 50 rows (null_rate 0.96). Of the two non-null entries, one is an empty string and the other is a chocolate ingredient list, leaving effectively a single usable value. Coverage is too sparse to support any aggregate analysis.

Treatment: Drop or defer; coverage is 4% and insufficient for modelling.

anthropic:claude-opus-4-7 · confidence high
Out[1289]:

saturn.columns["ingredients_text_with_allergens_sr"].stats

stat value
n 50
nulls 48 (96.0%)
unique 2
top_value Šećer, kakao masa, kakao buter, vanile.
top_rate 0.5
cardinality 2
entropy 1
entropy_ratio 1
alert: long_tail · 2 singleton categories
alert: null_rate · 96.0% null
Fig 348.
Top values for ingredients_text_with_allergens_sr.
Top values for ingredients_text_with_allergens_sr (2 unique shown, of 2 total).
value count share
Šećer, kakao masa, kakao buter, vanile. 1 2.0%
"" 1 2.0%

ingredients_text_es_ocr_1548767061_result categorical free_text

This appears to be the result of an OCR pass extracting Spanish ingredient text, with the timestamped name suggesting it's one run among many. Of 50 rows, 98% are null and the single non-null value is a Spanish chocolate ingredients list (cocoa paste, sugar, cocoa butter, sunflower lecithin E-322, vanilla extract, 70% cocoa minimum). With cardinality of 1 and entropy 0, this column carries essentially no information across the dataset.

Treatment: Drop; 98% null with only one distinct OCR result provides no modelling signal.

anthropic:claude-opus-4-7 · confidence high
Out[1292]:

saturn.columns["ingredients_text_es_ocr_1548767061_result"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value Pasta de cacao, azúcar, manteca de cacao, emulgente: lecitina de girasol (E-322), extracto de vainilla. Cacao: 70% mínimo.
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail · 1 singleton categories
alert: null_rate · 98.0% null
alert: imbalance · top value is 100.0% of rows
Fig 349.
Top values for ingredients_text_es_ocr_1548767061_result.
Top values for ingredients_text_es_ocr_1548767061_result (1 unique shown, of 1 total).
value count share
Pasta de cacao, azúcar, manteca de cacao, emulgente: lecitina de girasol (E-322), extracto de vainilla. Cacao: 70% mínimo. 1 2.0%

product_name_xx categorical metadata

This appears to be a localized product name field; the _xx suffix looks like a placeholder rather than a real language code. It is effectively empty: 96% null, and the only 2 non-null rows contain the empty string, giving cardinality 1 and zero entropy.

Treatment: Drop; the column carries no signal.

anthropic:claude-opus-4-7 · confidence high
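Since the _xx, _ur and _he variants in this section are all near-empty, it can help to see which localized fields exist per language suffix. A sketch that groups column names by suffix; `LOCALIZED_FIELDS` is a hand-picked list of base names taken from this profile, and `group_by_language` is hypothetical.

```python
from collections import defaultdict

# Base field names taken from the columns in this profile (illustrative).
LOCALIZED_FIELDS = (
    "ingredients_text_with_allergens",
    "ingredients_text",
    "packaging_text",
    "product_name",
    "generic_name",
    "origin",
)


def group_by_language(columns):
    """Map each language suffix to the set of base fields present for it.

    Longer base names are tried first, and a column is accepted only when the
    remainder after "<field>_" is a single token, so OCR columns such as
    "ingredients_text_hu_ocr_1571428260" are skipped.
    """
    by_lang = defaultdict(set)
    for col in columns:
        for field in sorted(LOCALIZED_FIELDS, key=len, reverse=True):
            prefix = field + "_"
            suffix = col[len(prefix):]
            if col.startswith(prefix) and suffix and "_" not in suffix:
                by_lang[suffix].add(field)
                break
    return dict(by_lang)


cols = [
    "product_name_xx", "generic_name_xx", "ingredients_text_xx",
    "origin_xx", "packaging_text_xx", "origin_he",
    "ingredients_text_with_allergens_he", "ingredients_text_hu_ocr_1571428260",
]
print(sorted(group_by_language(cols)["xx"]))
# ['generic_name', 'ingredients_text', 'origin', 'packaging_text', 'product_name']
```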
Out[1295]:

saturn.columns["product_name_xx"].stats

stat value
n 50
nulls 48 (96.0%)
unique 1
top_value ""
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: null_rate · 96.0% null
alert: imbalance · top value is 100.0% of rows
Fig 350.
Top values for product_name_xx.
Top values for product_name_xx (1 unique shown, of 1 total).
value count share
"" 2 4.0%

generic_name_xx categorical metadata

This appears to be a localized generic name field (the _xx suffix looks like a placeholder rather than a real language code), but it is effectively empty: 96% of the 50 rows are null, and the only 2 non-null values are blank strings. Cardinality is 1 with zero entropy, so the column carries no information.

Treatment: Drop; null_rate 0.96 and single empty value provide no signal.

anthropic:claude-opus-4-7 · confidence high
Out[1298]:

saturn.columns["generic_name_xx"].stats

stat value
n 50
nulls 48 (96.0%)
unique 1
top_value ""
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: null_rate · 96.0% null
alert: imbalance · top value is 100.0% of rows
Fig 351.
Top values for generic_name_xx.
Top values for generic_name_xx (1 unique shown, of 1 total).
value count share
"" 2 4.0%

ingredients_text_es_ocr_1548767061 categorical free_text

This appears to be a Spanish-language OCR capture of an ingredients list (likely from a chocolate product label, given 'Pasta de cacao' and '70% mínimo'). It is effectively empty: 98% null across 50 rows, with the single non-null value being one verbatim ingredients string. There is no analytical signal here: entropy is 0 and cardinality is 1.

Treatment: Drop; 98% null with only one observed value.

anthropic:claude-opus-4-7 · confidence high
Out[1301]:

saturn.columns["ingredients_text_es_ocr_1548767061"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value Pasta de cacao, azúcar, manteca de cacao, emulgente: lecitina de girasol (E-322), extracto de vainilla. Cacao: 70% mínimo.
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail · 1 singleton categories
alert: null_rate · 98.0% null
alert: imbalance · top value is 100.0% of rows
Fig 352.
Top values for ingredients_text_es_ocr_1548767061.
Top values for ingredients_text_es_ocr_1548767061 (1 unique shown, of 1 total).
value count share
Pasta de cacao, azúcar, manteca de cacao, emulgente: lecitina de girasol (E-322), extracto de vainilla. Cacao: 70% mínimo. 1 2.0%

ingredients_text_xx categorical free_text

This appears to be a localized ingredients text field (suffix _xx suggests a placeholder or unknown locale variant). It is effectively empty: 96% of the 50 rows are null, and the only 2 non-null entries are both empty strings, giving cardinality 1 and entropy 0.

Treatment: Drop; the column carries no information.

anthropic:claude-opus-4-7 · confidence high
Out[1304]:

saturn.columns["ingredients_text_xx"].stats

stat value
n 50
nulls 48 (96.0%)
unique 1
top_value ""
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: null_rate · 96.0% null
alert: imbalance · top value is 100.0% of rows
Fig 353.
Top values for ingredients_text_xx.
Top values for ingredients_text_xx (1 unique shown, of 1 total).
value count share
"" 2 4.0%

origin_xx categorical other

The column 'origin_xx' is effectively empty: 98% of its 50 rows are null, and the single non-null value is itself an empty string, giving a cardinality of 1 and entropy of 0. There is no usable signal here.

Treatment: Drop the column; it carries no information.

anthropic:claude-opus-4-7 · confidence high
Out[1307]:

saturn.columns["origin_xx"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value ""
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail · 1 singleton categories
alert: null_rate · 98.0% null
alert: imbalance · top value is 100.0% of rows
Fig 354.
Top values for origin_xx.
Top values for origin_xx (1 unique shown, of 1 total).
value count share
"" 1 2.0%

packaging_text_xx categorical metadata

This appears to be a localized packaging text field (_xx language suffix) that is essentially empty in this sample: 98% of rows are null and the single non-null value is itself an empty string, giving a cardinality of 1 (the blank value) and zero entropy.

Treatment: Drop; the column carries no usable signal.

anthropic:claude-opus-4-7 · confidence high
Out[1310]:

saturn.columns["packaging_text_xx"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value ""
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail · 1 singleton categories
alert: null_rate · 98.0% null
alert: imbalance · top value is 100.0% of rows
Fig 355.
Top values for packaging_text_xx.
Top values for packaging_text_xx (1 unique shown, of 1 total).
value count share
"" 1 2.0%

ingredients_text_ur categorical free_text

Likely an Urdu-language ingredient text field (ingredients_text_ur), almost entirely absent from this sample. 98% of rows are null, and the single non-null value is an empty string, leaving zero usable content. Cardinality is 1 with entropy 0.0, so the column carries no information here.

Treatment: Drop from modelling; retain only if a fuller multilingual extract becomes available.

anthropic:claude-opus-4-7 · confidence high
Out[1313]:

saturn.columns["ingredients_text_ur"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value ""
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail · 1 singleton categories
alert: null_rate · 98.0% null
alert: imbalance · top value is 100.0% of rows
Fig 356.
Top values for ingredients_text_ur.
Top values for ingredients_text_ur (1 unique shown, of 1 total).
value count share
"" 1 2.0%

product_name_ur categorical metadata

This appears to be an Urdu-language product name field that is essentially empty: 98% of the 50 rows are null, and the single non-null value is itself an empty string. Cardinality collapses to 1 and entropy is 0, so the column carries no information as captured.

Treatment: Drop; column is effectively empty with zero entropy.

anthropic:claude-opus-4-7 · confidence high
Out[1316]:

saturn.columns["product_name_ur"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value ""
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail · 1 singleton categories
alert: null_rate · 98.0% null
alert: imbalance · top value is 100.0% of rows
Fig 357.
Top values for product_name_ur.
Top values for product_name_ur (1 unique shown, of 1 total).
value count share
"" 1 2.0%

origin_he categorical metadata

Column 'origin_he' is effectively empty: 98% of the 50 rows are null and the only observed value is the empty string, giving a cardinality of 1 and entropy of 0. There is no usable signal here.

Treatment: Drop; column carries no information.

anthropic:claude-opus-4-7 · confidence high
Out[1319]:

saturn.columns["origin_he"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value ""
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail · 1 singleton categories
alert: null_rate · 98.0% null
alert: imbalance · top value is 100.0% of rows
Fig 358.
Top values for origin_he.
Top values for origin_he (1 unique shown, of 1 total).
value count share
"" 1 2.0%

product_name_he categorical metadata

Hebrew product name field that is almost entirely empty: 96% null across 50 rows, leaving only 2 non-null values, each occurring once (נוטלה, "Nutella", and תפוציפס שמנת בצל, a sour cream and onion potato-chip product). With just two observations, the entropy ratio of 1.0 is uninformative, and the column cannot support any analysis in its current state.

Treatment: Drop or defer until backfilled; 96% null makes it unusable downstream.

anthropic:claude-opus-4-7 · confidence high
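The observation that two singletons always produce an entropy ratio of 1.0 suggests guarding the statistic with a minimum support. A sketch; `min_support=10` is an arbitrary illustrative threshold and `guarded_entropy_ratio` is a hypothetical helper, not a saturn option.

```python
import math
from collections import Counter


def guarded_entropy_ratio(values, min_support=10):
    """Normalised entropy of the non-null values, or None below min_support.

    Any sample in which every value is distinct scores 1.0, so with very few
    observations the ratio says little about the column as a whole.
    """
    observed = [v for v in values if v is not None]
    if len(observed) < min_support:
        return None
    counts = Counter(observed)
    if len(counts) < 2:
        return 0.0
    n = len(observed)
    entropy = sum((c / n) * math.log2(n / c) for c in counts.values())
    return entropy / math.log2(len(counts))


product_name_he = [None] * 48 + ["נוטלה", "תפוציפס שמנת בצל"]
print(guarded_entropy_ratio(product_name_he))                 # None
print(guarded_entropy_ratio(product_name_he, min_support=2))  # 1.0
```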
Out[1322]:

saturn.columns["product_name_he"].stats

stat value
n 50
nulls 48 (96.0%)
unique 2
top_value נוטלה
top_rate 0.5
cardinality 2
entropy 1
entropy_ratio 1
alert: long_tail · 2 singleton categories
alert: null_rate · 96.0% null
Fig 359.
Top values for product_name_he.
Top values for product_name_he (2 unique shown, of 2 total).
value count share
נוטלה 1 2.0%
תפוציפס שמנת בצל 1 2.0%

origin_ur categorical metadata

The column 'origin_ur' is the Urdu-language variant of the per-language origin field (compare origin_he and origin_xx), and it is effectively empty: 98% nulls across 50 rows, with the only non-null value itself an empty string. Cardinality is 1 and entropy is 0, so the column carries no information as captured.

Treatment: Drop; column has 98% nulls and a single empty-string value, providing no signal.

anthropic:claude-opus-4-7 · confidence high
Out[1325]:

saturn.columns["origin_ur"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value ""
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail · 1 singleton categories
alert: null_rate · 98.0% null
alert: imbalance · top value is 100.0% of rows
Fig 360.
Top values for origin_ur.
Top values for origin_ur (1 unique shown, of 1 total).
value count share
"" 1 2.0%

generic_name_ur categorical metadata

This appears to be an Urdu-language generic name field, a localized version of a product's generic name. It is effectively empty: 98% of the 50 rows are null, and the single non-null value is itself an empty string, giving cardinality 1 and entropy 0. There is no usable signal here.

Treatment: Drop the column; it is 98% null with the only present value being blank.

anthropic:claude-opus-4-7 · confidence high
Out[1328]:

saturn.columns["generic_name_ur"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value ""
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail · 1 singleton categories
alert: null_rate · 98.0% null
alert: imbalance · top value is 100.0% of rows
Fig 361.
Top values for generic_name_ur.
Top values for generic_name_ur (1 unique shown, of 1 total).
value count share
"" 1 2.0%

packaging_text_he categorical free_text

Hebrew packaging text field that is essentially empty: 98% of the 50 rows are null and the single non-null observation is itself an empty string, leaving cardinality at 1 and entropy at 0. There is no usable signal here for any downstream task.

Treatment: Drop the column; it carries no information.

anthropic:claude-opus-4-7 · confidence high
Out[1331]:

saturn.columns["packaging_text_he"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value ""
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail · 1 singleton categories
alert: null_rate · 98.0% null
alert: imbalance · top value is 100.0% of rows
Fig 362.
Top values for packaging_text_he.
Top values for packaging_text_he (1 unique shown, of 1 total).
value count share
"" 1 2.0%

ingredients_text_he categorical free_text

Hebrew-language ingredient text field, almost entirely absent from this sample. 98% of the 50 rows are null, and the single non-null value is an empty string, leaving cardinality at 1 and entropy at 0.

Treatment: Drop; no usable signal at this sample size.

anthropic:claude-opus-4-7 · confidence high
Out[1334]:

saturn.columns["ingredients_text_he"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value ""
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail · 1 singleton categories
alert: null_rate · 98.0% null
alert: imbalance · top value is 100.0% of rows
Fig 363.
Top values for ingredients_text_he.
Top values for ingredients_text_he (1 unique shown, of 1 total).
value count share
"" 1 2.0%

packaging_text_ur categorical free_text

Likely an Urdu-language packaging text field, but it carries essentially no information here: 98% of rows are null and the only non-null value observed is an empty string. Cardinality is 1 with entropy 0.0, so the column is constant across the populated rows.

Treatment: Drop; effectively empty with 98% nulls and a single empty-string value.

anthropic:claude-opus-4-7 · confidence high
Out[1337]:

saturn.columns["packaging_text_ur"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value ""
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail · 1 singleton categories
alert: null_rate · 98.0% null
alert: imbalance · top value is 100.0% of rows
Fig 364.
Top values for packaging_text_ur.
Top values for packaging_text_ur (1 unique shown, of 1 total).
value count share
"" 1 2.0%

generic_name_he categorical metadata

Hebrew generic product name field that is effectively empty: 98% null across 50 rows, leaving a single non-null value, ממרח אגוזי לוז עם קקאו ("hazelnut spread with cocoa"), which accounts for 100% of the observed entries. Cardinality is 1 and entropy is 0, so the column carries no discriminating signal in this sample.

Treatment: Drop from modelling; revisit only if a fuller extract populates the field.

anthropic:claude-opus-4-7 · confidence high
Out[1340]:

saturn.columns["generic_name_he"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value ממרח אגוזי לוז עם קקאו
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail · 1 singleton categories
alert: null_rate · 98.0% null
alert: imbalance · top value is 100.0% of rows
Fig 365.
Top values for generic_name_he.
Top values for generic_name_he (1 unique shown, of 1 total).
value count share
ממרח אגוזי לוז עם קקאו 1 2.0%

ingredients_text_with_allergens_he categorical free_text

Hebrew-localized ingredients-with-allergens text, almost entirely absent: 98% null across 50 rows and the single non-null value is an empty string. Cardinality is 1 with zero entropy, so this column carries no information in this sample.

Treatment: Drop; no usable signal in this sample.

anthropic:claude-opus-4-7 · confidence high
Out[1343]:

saturn.columns["ingredients_text_with_allergens_he"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value ""
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail · 1 singleton categories
alert: null_rate · 98.0% null
alert: imbalance · top value is 100.0% of rows
Fig 366.
Top values for ingredients_text_with_allergens_he.
Show data table
Top values for ingredients_text_with_allergens_he (1 unique shown, of 1 total).
valuecountshare
12.0%

nutriscore_grade_producer categorical feature

Producer-supplied Nutri-Score letter grade (a–e scale), with three distinct values observed: 'c', 'e', and 'b', each appearing once. The column is essentially empty. A 94% null rate leaves only 3 of 50 rows populated. The maximal entropy (1.585 bits = log2 3, entropy_ratio 1.0) therefore reflects three singletons, not a genuinely balanced distribution. No 'a' or 'd' grades appear in this sample.

Treatment: Drop or defer; too sparse (94% null) to use until producer coverage improves.

anthropic:claude-opus-4-7 · confidence high
Out[1346]:

saturn.columns["nutriscore_grade_producer"].stats

stat value
n 50
nulls 47 (94.0%)
unique 3
top_value c
top_rate 0.3333
cardinality 3
entropy 1.585
entropy_ratio 1
alert: long_tail 3 singleton categories
alert: null_rate 94.0% null
Fig 367.
Top values for nutriscore_grade_producer.
Top values for nutriscore_grade_producer (3 unique shown, of 3 total).
value count share
c 1 2.0%
e 1 2.0%
b 1 2.0%
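The entropy figures can be reproduced from the value counts alone. This is a minimal sketch, and it rests on three assumptions about saturn's internals. Entropy is Shannon entropy in bits over non-null values. `entropy_ratio` is entropy divided by log2(cardinality). Single-valued columns report a ratio of 0. These assumptions match every stats table in this section. Under them, three singleton grades give log2 3 ≈ 1.585 bits, and the ratio is 1.0, the maximum for three categories.

```python
import math
from collections import Counter
from typing import Hashable, Sequence, Tuple


def entropy_stats(values: Sequence[Hashable]) -> Tuple[float, float]:
    """Shannon entropy (bits) of the non-null values and its normalised ratio."""
    counts = Counter(v for v in values if v is not None)
    total = sum(counts.values())
    if total == 0:
        return 0.0, 0.0
    probs = [c / total for c in counts.values()]
    entropy = sum(-p * math.log2(p) for p in probs)
    cardinality = len(counts)
    # log2(1) == 0, so a single-valued column has no meaningful ratio; report 0.
    ratio = entropy / math.log2(cardinality) if cardinality > 1 else 0.0
    return entropy, ratio


grades = [None] * 47 + ["c", "e", "b"]
h, r = entropy_stats(grades)
print(round(h, 3), round(r, 3))  # 1.585 1.0
```

For a single-valued column such as generic_name_he, the same function returns (0.0, 0.0), which matches its stats.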

nutriscore_grade_producer_imported categorical feature

This appears to be a Nutri-Score grade (a-e scale) imported from producer data and stored as a categorical letter grade. The column is almost entirely empty, with a 94% null rate. Only 3 non-null values remain, spread across 3 distinct grades (c, e, b), which is too few for any distributional conclusion. Its profile is identical to that of nutriscore_grade_producer, so it may be a redundant copy of that column.

Treatment: Drop or treat as missing-indicator only; too sparse (94% null) to use as a feature.

anthropic:claude-opus-4-7 · confidence high
Out[1349]:

saturn.columns["nutriscore_grade_producer_imported"].stats

stat value
n 50
nulls 47 (94.0%)
unique 3
top_value c
top_rate 0.3333
cardinality 3
entropy 1.585
entropy_ratio 1
alert: long_tail 3 singleton categories
alert: null_rate 94.0% null
Fig 368.
Top values for nutriscore_grade_producer_imported.
Top values for nutriscore_grade_producer_imported (3 unique shown, of 3 total).
value count share
c 1 2.0%
e 1 2.0%
b 1 2.0%
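The "missing-indicator only" option replaces the sparse grade column with a 0/1 flag recording whether a grade is absent. Here is a minimal sketch; the function name is hypothetical. Only 3 of 50 rows carry a grade, so the indicator itself is close to constant. It is worth keeping only if producer coverage is expected to matter downstream.

```python
from typing import List, Optional, Sequence


def missing_indicator(values: Sequence[Optional[str]]) -> List[int]:
    """1 where the value is missing (None or blank), 0 where a grade is present."""
    return [1 if v is None or str(v).strip() == "" else 0 for v in values]


grades = [None] * 47 + ["c", "e", "b"]  # shape of nutriscore_grade_producer_imported
flags = missing_indicator(grades)
print(sum(flags), len(flags))  # 47 50
```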

packaging_text_el categorical free_text

This appears to be Greek-language packaging text, but it's effectively empty: 98% of the 50 rows are null, and the single non-null value is itself an empty string. Cardinality is 1 with zero entropy, meaning there is no usable signal here whatsoever.

Treatment: Drop the column; it has no observed content.

anthropic:claude-opus-4-7 · confidence high
Out[1352]:

saturn.columns["packaging_text_el"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value ""
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail 1 singleton categories
alert: null_rate 98.0% null
alert: imbalance top value is 100.0% of rows
Fig 369.
Top values for packaging_text_el.
Top values for packaging_text_el (1 unique shown, of 1 total).
value count share
"" 1 2.0%

ingredients_text_with_allergens_el categorical free_text

Greek-language ingredients-with-allergens text field that is effectively empty in this sample: 98% null and the only non-null value observed is itself an empty string. With cardinality of 1 and entropy 0, this column carries no signal here.

Treatment: Drop; no usable content in this sample.

anthropic:claude-opus-4-7 · confidence high
Out[1355]:

saturn.columns["ingredients_text_with_allergens_el"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value ""
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail 1 singleton categories
alert: null_rate 98.0% null
alert: imbalance top value is 100.0% of rows
Fig 370.
Top values for ingredients_text_with_allergens_el.
Top values for ingredients_text_with_allergens_el (1 unique shown, of 1 total).
value count share
"" 1 2.0%

ingredients_text_el categorical free_text

This is the Greek-language ingredients text field, one of the per-language fields in this Open Food Facts extract. It is effectively empty: 98% null across 50 rows, and the single non-null value is itself an empty string, leaving cardinality at 1 and entropy at 0.

Treatment: Drop; no usable signal at this sample size.

anthropic:claude-opus-4-7 · confidence high
Out[1358]:

saturn.columns["ingredients_text_el"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value ""
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail 1 singleton categories
alert: null_rate 98.0% null
alert: imbalance top value is 100.0% of rows
Fig 371.
Top values for ingredients_text_el.
Top values for ingredients_text_el (1 unique shown, of 1 total).
value count share
"" 1 2.0%

generic_name_el categorical metadata

Greek-language generic name field that is effectively empty: 98% nulls and the only observed non-null value is itself an empty string, giving a single unique entry across 50 rows. Entropy is 0.0 and top_rate is 1.0, so the column carries no information in this sample.

Treatment: Drop; the column has no usable signal here.

anthropic:claude-opus-4-7 · confidence high
Out[1361]:

saturn.columns["generic_name_el"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value ""
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail 1 singleton categories
alert: null_rate 98.0% null
alert: imbalance top value is 100.0% of rows
Fig 372.
Top values for generic_name_el.
Top values for generic_name_el (1 unique shown, of 1 total).
value count share
"" 1 2.0%

origin_el categorical other

The column 'origin_el', presumably the Greek-language origin field, is nearly entirely empty. 98% of the 50 rows are null, and the single non-null value is itself an empty string, leaving cardinality at 1 and entropy at 0. There is no usable signal here.

Treatment: Drop the column; it is effectively all null with no variance.

anthropic:claude-opus-4-7 · confidence high
Out[1364]:

saturn.columns["origin_el"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value ""
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail 1 singleton categories
alert: null_rate 98.0% null
alert: imbalance top value is 100.0% of rows
Fig 373.
Top values for origin_el.
Top values for origin_el (1 unique shown, of 1 total).
value count share
"" 1 2.0%

product_name_el categorical metadata

This appears to be a Greek-language product name field that is effectively empty in this sample. 98% of rows are null and the single non-null value is itself an empty string, giving cardinality 1 and zero entropy. There is no usable signal here at n=50.

Treatment: Drop from modelling; revisit only if a larger sample shows actual Greek text.

anthropic:claude-opus-4-7 · confidence high
Out[1367]:

saturn.columns["product_name_el"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value ""
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail 1 singleton categories
alert: null_rate 98.0% null
alert: imbalance top value is 100.0% of rows
Fig 374.
Top values for product_name_el.
Top values for product_name_el (1 unique shown, of 1 total).
value count share
"" 1 2.0%
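The Greek columns above, like the Hebrew, Thai and Urdu ones around them, follow a `<field>_<lang>` naming pattern. That pattern lets a whole language's columns be handled together rather than one by one. Below is a minimal sketch. It assumes the field prefixes seen in this section, each followed by a two-letter lowercase suffix. The regex is illustrative, not saturn's logic. It also matches non-language suffixes such as `_lc`, so those groups need separate review. The OCR and `_imported` columns fall outside the pattern and are left out.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Assumed pattern: a known field prefix followed by a two-letter suffix.
LOCALISED = re.compile(
    r"^(?P<field>product_name|generic_name|ingredients_text_with_allergens"
    r"|ingredients_text|packaging_text|origin)_(?P<lang>[a-z]{2})$"
)


def group_by_language(columns: Iterable[str]) -> Dict[str, List[str]]:
    """Map each two-letter suffix to the localised columns that carry it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for col in columns:
        m = LOCALISED.match(col)
        if m:
            groups[m.group("lang")].append(col)
    return dict(groups)


cols = ["packaging_text_el", "ingredients_text_el", "generic_name_el", "origin_el",
        "product_name_el", "generic_name_th", "origin_th",
        "ingredients_text_de_ocr_1548767354", "product_name_lc"]
print(sorted(group_by_language(cols)))  # ['el', 'lc', 'th']
```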

generic_name_th categorical metadata

Thai-language generic product name field that is effectively empty in this sample: 98% of 50 rows are null and the only non-null value observed is the empty string, giving a single distinct value and zero entropy.

Treatment: Drop from modelling; the column carries no information in this slice.

anthropic:claude-opus-4-7 · confidence high
Out[1370]:

saturn.columns["generic_name_th"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value ""
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail 1 singleton categories
alert: null_rate 98.0% null
alert: imbalance top value is 100.0% of rows
Fig 375.
Top values for generic_name_th.
Top values for generic_name_th (1 unique shown, of 1 total).
value count share
"" 1 2.0%

ingredients_text_de_ocr_1559410715_result categorical free_text

This column appears to hold the OCR result of a German ingredients text extraction, likely from a product label image. The numeric part of the column name looks like a Unix timestamp (1559410715, i.e. 2019-06-01 UTC). Of 50 rows, 98% are null and only one non-null value exists: a German cocoa-product ingredient list that warns it may contain nuts, milk, and soy. With cardinality 1 and entropy 0, the column carries effectively no signal at this sample size.

Treatment: Drop; 98% null and only one distinct OCR string provides no usable signal.

anthropic:claude-opus-4-7 · confidence high
Out[1373]:

saturn.columns["ingredients_text_de_ocr_1559410715_result"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value Kakaomasse, fettarmes Kakaopulver, Kakaobutter, Rohrzucker. Kann Schalenfrüchte, Milch und Soja enthalten.
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail 1 singleton categories
alert: null_rate 98.0% null
alert: imbalance top value is 100.0% of rows
Fig 376.
Top values for ingredients_text_de_ocr_1559410715_result.
Top values for ingredients_text_de_ocr_1559410715_result (1 unique shown, of 1 total).
value count share
Kakaomasse, fettarmes Kakaopulver, Kakaobutter, Rohrzucker. Kann Schalenfrüchte, Milch und Soja enthalten. 1 2.0%
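If the numeric suffix in these OCR column names is a Unix timestamp in seconds, it can be decoded to show when each OCR pass ran. That reading is an assumption; the notebook does not say so. Below is a minimal sketch using only the standard library. `parse_ocr_column` is a hypothetical helper.

```python
import re
from datetime import datetime, timezone
from typing import Optional, Tuple

# <field>_ocr_<digits>, optionally followed by _result
OCR_COLUMN = re.compile(r"^(?P<field>.+)_ocr_(?P<ts>\d+)(?P<result>_result)?$")


def parse_ocr_column(name: str) -> Optional[Tuple[str, datetime, bool]]:
    """Split an OCR column name into (field, UTC time, is_result)."""
    m = OCR_COLUMN.match(name)
    if m is None:
        return None
    when = datetime.fromtimestamp(int(m.group("ts")), tz=timezone.utc)
    return m.group("field"), when, m.group("result") is not None


field, when, is_result = parse_ocr_column("ingredients_text_de_ocr_1559410715_result")
print(field, when.isoformat(), is_result)
# ingredients_text_de 2019-06-01T17:38:35+00:00 True
```

Under the same assumption, the other suffixes in this section decode to UTC dates as follows:

- 1548767354: 2019-01-29
- 1561814324: 2019-06-29
- 1624039072: 2021-06-18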

ingredients_text_with_allergens_th categorical free_text

This appears to be a Thai-localized ingredients-with-allergens text field, but it is effectively empty: 98% of 50 rows are null and the single populated row contains an English-language ingredients string for a 99% cocoa product. The column carries zero entropy (entropy_ratio 0.0) and only one distinct value, so it provides no analytical signal. The language mismatch (English content in a _th column) is also notable.

Treatment: Drop; 98% null and a single non-Thai value make it unusable.

anthropic:claude-opus-4-7 · confidence high
Out[1376]:

saturn.columns["ingredients_text_with_allergens_th"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value Cocoa solids 99%, Cocoa paste, fat-reduced cocoa, cocoa butter, demerara sugar. May contain nuts, milk and soya.
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail 1 singleton categories
alert: null_rate 98.0% null
alert: imbalance top value is 100.0% of rows
Fig 377.
Top values for ingredients_text_with_allergens_th.
Top values for ingredients_text_with_allergens_th (1 unique shown, of 1 total).
value count share
Cocoa solids 99%, Cocoa paste, fat-reduced cocoa, cocoa butter, demerara sugar. May contain nuts, milk and soya. 1 2.0%
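The language mismatch flagged above (English text in a `_th` column) can be detected mechanically. The check is whether a value contains any character from the Thai Unicode block, U+0E00 to U+0E7F. Below is a minimal sketch. A script test like this cannot tell apart languages that share a script, such as French and English. It only catches mismatches across scripts.

```python
def contains_thai(text: str) -> bool:
    """True if any character falls in the Thai Unicode block (U+0E00-U+0E7F)."""
    return any("\u0e00" <= ch <= "\u0e7f" for ch in text)


allergens_th = ("Cocoa solids 99%, Cocoa paste, fat-reduced cocoa, cocoa butter, "
                "demerara sugar. May contain nuts, milk and soya.")
product_name_th = "ลินด์ เอ็กเซอร์แลนซ์ ดาร์ก 99% โกโก้ ดาร์ก แอปโซลูท ช็อกโกแลต"

print(contains_thai(allergens_th))     # False: English text in a _th column
print(contains_thai(product_name_th))  # True
```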

packaging_text_th categorical free_text

This appears to be a Thai-language packaging text field, but it is effectively empty: 98% of the 50 rows are null, and the single non-null value is itself an empty string. Cardinality is 1 with zero entropy, so the column carries no information.

Treatment: Drop; no usable signal.

anthropic:claude-opus-4-7 · confidence high
Out[1379]:

saturn.columns["packaging_text_th"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value ""
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail 1 singleton categories
alert: null_rate 98.0% null
alert: imbalance top value is 100.0% of rows
Fig 378.
Top values for packaging_text_th.
Top values for packaging_text_th (1 unique shown, of 1 total).
value count share
"" 1 2.0%

product_name_th categorical metadata

Thai-language product name field, almost entirely empty: 98% null across 50 rows. The only non-null value is a Thai transliteration of a Lindt Excellence 99% cocoa dark chocolate name. Cardinality is 1 and entropy is 0, so this column carries no discriminating signal in this sample.

Treatment: Drop from modelling; revisit only if a fuller Thai-localised dump becomes available.

anthropic:claude-opus-4-7 · confidence high
Out[1382]:

saturn.columns["product_name_th"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value ลินด์ เอ็กเซอร์แลนซ์ ดาร์ก 99% โกโก้ ดาร์ก แอปโซลูท ช็อกโกแลต
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail 1 singleton categories
alert: null_rate 98.0% null
alert: imbalance top value is 100.0% of rows
Fig 379.
Top values for product_name_th.
Top values for product_name_th (1 unique shown, of 1 total).
value count share
ลินด์ เอ็กเซอร์แลนซ์ ดาร์ก 99% โกโก้ ดาร์ก แอปโซลูท ช็อกโกแลต 1 2.0%

ingredients_text_de_ocr_1548767354_result categorical free_text

This column appears to hold the OCR result of a German ingredients label for one dark chocolate product. The column-name suffix (1548767354) reads as the Unix timestamp 2019-01-29 UTC. Of 50 rows, 98% are null. The single non-null value is the only distinct value, giving cardinality 1 and zero entropy. There is no variation to learn from here.

Treatment: Drop; 98% null and only one distinct OCR string.

anthropic:claude-opus-4-7 · confidence high
Out[1385]:

saturn.columns["ingredients_text_de_ocr_1548767354_result"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value Extra feine dunkle Schokolade. Schokolade enthält: Kakao: mind. 99%. Zutaten: Kakaomasse, fettarmes Kakaopulver, Kakaobutter, Rohrzucker. Kann Schalenfrüchte, Milch und Soja enthalten.
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail 1 singleton categories
alert: null_rate 98.0% null
alert: imbalance top value is 100.0% of rows
Fig 380.
Top values for ingredients_text_de_ocr_1548767354_result.
Top values for ingredients_text_de_ocr_1548767354_result (1 unique shown, of 1 total).
value count share
Extra feine dunkle Schokolade. Schokolade enthält: Kakao: mind. 99%. Zutaten: Kakaomasse, fettarmes Kakaopulver, Kakaobutter, Rohrzucker. Kann Schalenfrüchte, Milch und Soja enthalten. 1 2.0%

ingredients_text_th categorical free_text

This is a Thai-language ingredients text field (ingredients_text_th), but 98% of the 50 rows are null and the single non-null entry is actually English text describing cocoa-based product ingredients. With cardinality of 1 and entropy of 0, the column carries no usable signal and the one populated value appears to be in the wrong language for the field.

Treatment: Drop; 98% null and the lone value is mislocalized.

anthropic:claude-opus-4-7 · confidence high
Out[1388]:

saturn.columns["ingredients_text_th"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value Cocoa solids 99%, Cocoa paste, fat-reduced cocoa, cocoa butter, demerara sugar. May contain nuts, milk and soya.
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail 1 singleton categories
alert: null_rate 98.0% null
alert: imbalance top value is 100.0% of rows
Fig 381.
Top values for ingredients_text_th.
Top values for ingredients_text_th (1 unique shown, of 1 total).
value count share
Cocoa solids 99%, Cocoa paste, fat-reduced cocoa, cocoa butter, demerara sugar. May contain nuts, milk and soya. 1 2.0%

origin_th categorical other

Column 'origin_th', presumably the Thai-language origin field, is effectively empty. 98% of the 50 rows are null, and the only observed non-null value is itself an empty string, giving a cardinality of 1 and entropy of 0. There is no signal here to model or join on.

Treatment: Drop; the column is 98% null with a single empty-string value otherwise.

anthropic:claude-opus-4-7 · confidence high
Out[1391]:

saturn.columns["origin_th"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value ""
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail 1 singleton categories
alert: null_rate 98.0% null
alert: imbalance top value is 100.0% of rows
Fig 382.
Top values for origin_th.
Top values for origin_th (1 unique shown, of 1 total).
value count share
"" 1 2.0%

ingredients_text_de_ocr_1548767354 categorical free_text

This appears to be German-language OCR text of a product ingredient list. The sole observed value is the label text of a 99% dark chocolate, identical to the value in ingredients_text_de_ocr_1548767354_result. The column is almost entirely empty, with a 0.98 null rate: only one non-null record exists across 50 rows, yielding cardinality 1 and entropy 0. With a single observation there is no usable variation for analysis.

Treatment: Drop; 98% null with only one observed value provides no signal.

anthropic:claude-opus-4-7 · confidence high
Out[1394]:

saturn.columns["ingredients_text_de_ocr_1548767354"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value Extra feine dunkle Schokolade. Schokolade enthält: Kakao: mind. 99%. Zutaten: Kakaomasse, fettarmes Kakaopulver, Kakaobutter, Rohrzucker. Kann Schalenfrüchte, Milch und Soja enthalten.
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail 1 singleton categories
alert: null_rate 98.0% null
alert: imbalance top value is 100.0% of rows
Fig 383.
Top values for ingredients_text_de_ocr_1548767354.
Top values for ingredients_text_de_ocr_1548767354 (1 unique shown, of 1 total).
value count share
Extra feine dunkle Schokolade. Schokolade enthält: Kakao: mind. 99%. Zutaten: Kakaomasse, fettarmes Kakaopulver, Kakaobutter, Rohrzucker. Kann Schalenfrüchte, Milch und Soja enthalten. 1 2.0%

ingredients_text_de_ocr_1559410715 categorical free_text

This appears to be a German-language OCR extraction of an ingredients list (likely from a chocolate product packaging), captured as free text. It is almost entirely empty with a 0.98 null rate, and the single non-null row contains one verbose ingredient declaration, giving cardinality 1 and entropy 0.0. With only one observed value out of 50 rows, this column carries no usable signal as-is.

Treatment: Drop; effectively empty (98% null, single distinct OCR string).

anthropic:claude-opus-4-7 · confidence high
Out[1397]:

saturn.columns["ingredients_text_de_ocr_1559410715"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value Extra feine dunkle Schokolade. Schokolade enthält: Kakao: mind. 99%. Zutaten: Kakaomasse, fettarmes Kakaopulver, Kakaobutter, Rohrzucker. Kann Schalenfrüchte, Milch und Soja enthalten.
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail 1 singleton categories
alert: null_rate 98.0% null
alert: imbalance top value is 100.0% of rows
Fig 384.
Top values for ingredients_text_de_ocr_1559410715.
Top values for ingredients_text_de_ocr_1559410715 (1 unique shown, of 1 total).
value count share
Extra feine dunkle Schokolade. Schokolade enthält: Kakao: mind. 99%. Zutaten: Kakaomasse, fettarmes Kakaopulver, Kakaobutter, Rohrzucker. Kann Schalenfrüchte, Milch und Soja enthalten. 1 2.0%

ingredients_text_it_ocr_1559410715 categorical free_text

This appears to be an Italian-language OCR extraction of an ingredients list, likely from a food product label (the lone value describes 99% dark chocolate). The column is effectively empty: 98% null across 50 rows, with only a single non-null value, giving cardinality 1 and entropy 0. There is no variation to learn from here.

Treatment: Drop; 98% null with a single observed value carries no signal.

anthropic:claude-opus-4-7 · confidence high
Out[1400]:

saturn.columns["ingredients_text_it_ocr_1559410715"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value Cioccolato amaro extra. Cacao: 99% minimo. Ingredienti: pasta di cacao, cacao magro, burro di cacao, zucchero grezzo di canna. Può contenere frutta a guscio, latte e soia.
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail 1 singleton categories
alert: null_rate 98.0% null
alert: imbalance top value is 100.0% of rows
Fig 385.
Top values for ingredients_text_it_ocr_1559410715.
Top values for ingredients_text_it_ocr_1559410715 (1 unique shown, of 1 total).
value count share
Cioccolato amaro extra. Cacao: 99% minimo. Ingredienti: pasta di cacao, cacao magro, burro di cacao, zucchero grezzo di canna. Può contenere frutta a guscio, latte e soia. 1 2.0%

ingredients_text_it_ocr_1559410715_result categorical free_text

This column appears to hold OCR-extracted Italian ingredient text; the timestamp-and-`_result` suffix suggests the output of a single OCR pass. It is effectively empty: 98% null across n=50, with only one non-null value, a chocolate ingredient list, giving cardinality 1 and zero entropy.

Treatment: Drop; a single non-null OCR string carries no modelling signal.

anthropic:claude-opus-4-7 · confidence high
Out[1403]:

saturn.columns["ingredients_text_it_ocr_1559410715_result"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value pasta di cacao, cacao magro, burro di cacao, zucchero grezzo di canna. Può contenere frutta a guscio, latte e soia.
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail 1 singleton categories
alert: null_rate 98.0% null
alert: imbalance top value is 100.0% of rows
Fig 386.
Top values for ingredients_text_it_ocr_1559410715_result.
Top values for ingredients_text_it_ocr_1559410715_result (1 unique shown, of 1 total).
value count share
pasta di cacao, cacao magro, burro di cacao, zucchero grezzo di canna. Può contenere frutta a guscio, latte e soia. 1 2.0%
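This column can be compared with ingredients_text_it_ocr_1559410715. The comparison suggests the `_result` variant keeps only the ingredient list from the raw OCR text, meaning the part after "Ingredienti:". The German pair with the same timestamp shows the same pattern. Below is a minimal sketch that tests whether a `_result` string is a contiguous slice of its raw counterpart. The relationship is an inference from these samples, not documented behaviour.

```python
from typing import Optional


def result_offset(raw: str, result: str) -> Optional[int]:
    """Return where `result` starts inside `raw`, or None if it is not a substring."""
    idx = raw.find(result)
    return None if idx < 0 else idx


raw_it = ("Cioccolato amaro extra. Cacao: 99% minimo. Ingredienti: pasta di cacao, "
          "cacao magro, burro di cacao, zucchero grezzo di canna. "
          "Può contenere frutta a guscio, latte e soia.")
result_it = ("pasta di cacao, cacao magro, burro di cacao, zucchero grezzo di canna. "
             "Può contenere frutta a guscio, latte e soia.")

offset = result_offset(raw_it, result_it)
print(repr(raw_it[:offset]))  # 'Cioccolato amaro extra. Cacao: 99% minimo. Ingredienti: '
```

The trimming is not consistent across runs. The 1548767354 German pair stores the same full text in both columns, giving offset 0.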

packaging_text_fr_imported categorical free_text

Likely French-language packaging text that was imported (the `_imported` suffix), possibly from producer-supplied data. It gives recycling instructions: the single value reads "1 FEUILLE PAPIER À RECYCLER, 1 FEUILLE METAL À RECYCLER" ("1 paper sheet to recycle, 1 metal sheet to recycle"). The column is 98% null, with that one value across 50 rows, so it carries no analytical signal in this sample.

Treatment: Drop; near-entirely null with no variance.

anthropic:claude-opus-4-7 · confidence high
Out[1406]:

saturn.columns["packaging_text_fr_imported"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value 1 FEUILLE PAPIER À RECYCLER, 1 FEUILLE METAL À RECYCLER.
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail 1 singleton categories
alert: null_rate 98.0% null
alert: imbalance top value is 100.0% of rows
Fig 387.
Top values for packaging_text_fr_imported.
Top values for packaging_text_fr_imported (1 unique shown, of 1 total).
value count share
1 FEUILLE PAPIER À RECYCLER, 1 FEUILLE METAL À RECYCLER. 1 2.0%

preparation_fr_imported categorical metadata

This appears to be a French-language preparation/readiness label imported from an external source, indicating how a product is prepared. It is effectively unusable here: 98% of the 50 rows are null. The single non-null value is "Produit prêt à consommer" ("ready-to-eat product"), giving cardinality 1 and entropy 0.

Treatment: Drop; near-entirely null with a single constant value carries no signal.

anthropic:claude-opus-4-7 · confidence high
Out[1409]:

saturn.columns["preparation_fr_imported"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value Produit prêt à consommer
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail 1 singleton categories
alert: null_rate 98.0% null
alert: imbalance top value is 100.0% of rows
Fig 388.
Top values for preparation_fr_imported.
Top values for preparation_fr_imported (1 unique shown, of 1 total).
value count share
Produit prêt à consommer 1 2.0%

preparation categorical metadata

A categorical preparation field, likely indicating how a food product should be prepared before consumption. It is effectively empty: 98% of the 50 rows are null, leaving a single observed value, "Produit prêt à consommer" ("ready-to-eat product"), with cardinality 1 and zero entropy. With no variation among the populated rows, the column carries no discriminative signal.

Treatment: Drop; 98% null and only one observed value.

anthropic:claude-opus-4-7 · confidence high
Out[1412]:

saturn.columns["preparation"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value Produit prêt à consommer
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail 1 singleton categories
alert: null_rate 98.0% null
alert: imbalance top value is 100.0% of rows
Fig 389.
Top values for preparation.
Top values for preparation (1 unique shown, of 1 total).
value count share
Produit prêt à consommer 1 2.0%

preparation_fr categorical metadata

This appears to be a French-language preparation instruction field, likely metadata describing how a product is consumed. It is essentially empty: 98% of the 50 rows are null. The single non-null value is "Produit prêt à consommer" ("ready-to-eat product"), giving cardinality 1 and entropy 0. There is no variation to learn from in this sample.

Treatment: Drop; 98% null with only one observed value provides no signal.

anthropic:claude-opus-4-7 · confidence high
Out[1415]:

saturn.columns["preparation_fr"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value Produit prêt à consommer
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail 1 singleton categories
alert: null_rate 98.0% null
alert: imbalance top value is 100.0% of rows
Fig 390.
Top values for preparation_fr.
Top values for preparation_fr (1 unique shown, of 1 total).
value count share
Produit prêt à consommer 1 2.0%
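Three columns report identical profiles: preparation, preparation_fr and preparation_fr_imported each have 49 nulls and the single value "Produit prêt à consommer". Matching summary statistics do not prove the columns are copies; a row-aligned comparison is needed for that. Below is a minimal sketch. It assumes the columns are available as equally long lists in row order. The helper name is hypothetical, and the row positions in the example lists are invented for illustration.

```python
from typing import Optional, Sequence


def same_column(a: Sequence[Optional[str]], b: Sequence[Optional[str]]) -> bool:
    """True if two row-aligned columns agree in every row (None matches None)."""
    return len(a) == len(b) and all(x == y for x, y in zip(a, b))


ready = "Produit prêt à consommer"
# Hypothetical row positions, for illustration only.
preparation = [None] * 20 + [ready] + [None] * 29
preparation_fr = [None] * 20 + [ready] + [None] * 29
shifted = [None] * 35 + [ready] + [None] * 14

print(same_column(preparation, preparation_fr))  # True: keeping one would suffice
print(same_column(preparation, shifted))         # False: same stats, different rows
```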

ingredients_text_lc categorical free_text

This appears to be a lowercased variant of the ingredients text field. It is effectively empty: 98% of the 50 rows are null, and the single non-null value is itself an empty string, giving cardinality 1 and entropy 0. There is no usable signal here.

Treatment: Drop; the column is 98% null and the only observed value is empty.

anthropic:claude-opus-4-7 · confidence high
Out[1418]:

saturn.columns["ingredients_text_lc"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value ""
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail 1 singleton categories
alert: null_rate 98.0% null
alert: imbalance top value is 100.0% of rows
Fig 391.
Top values for ingredients_text_lc.
Top values for ingredients_text_lc (1 unique shown, of 1 total).
value count share
"" 1 2.0%

product_name_lc categorical feature

A lowercased product name field that is effectively empty: 98% of the 50 rows are null and the single non-null value is also an empty string, giving a cardinality of 1 and entropy of 0. There is no usable signal here.

Treatment: Drop; column is 98% null with the only observed value being an empty string.

anthropic:claude-opus-4-7 · confidence high
Out[1421]:

saturn.columns["product_name_lc"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value ""
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail 1 singleton categories
alert: null_rate 98.0% null
alert: imbalance top value is 100.0% of rows
Fig 392.
Top values for product_name_lc.
Top values for product_name_lc (1 unique shown, of 1 total).
value count share
"" 1 2.0%

ingredients_text_with_allergens_lc categorical free_text

Probably a normalised, lowercase variant of the ingredients-with-allergens text field, but in this sample it carries no signal: 98% of rows are null and the single non-null value is an empty string. With cardinality 1 and entropy 0, there is nothing to learn from this column as-is.

Treatment: Drop from the working set unless a larger sample shows real text content.

anthropic:claude-opus-4-7 · confidence high
Out[1424]:

saturn.columns["ingredients_text_with_allergens_lc"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value ""
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail 1 singleton categories
alert: null_rate 98.0% null
alert: imbalance top value is 100.0% of rows
Fig 393.
Top values for ingredients_text_with_allergens_lc.
Top values for ingredients_text_with_allergens_lc (1 unique shown, of 1 total).
value count share
"" 1 2.0%

generic_name_lc categorical feature

This appears to be a lowercased generic product name field, but it is effectively empty in this sample: 98% null, and the only non-null value among 50 rows is an empty string. With cardinality of 1 and entropy of 0, the column carries no information here.

Treatment: Drop; no signal at this null rate and cardinality.

anthropic:claude-opus-4-7 · confidence high
Out[1427]:

saturn.columns["generic_name_lc"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value ""
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail 1 singleton categories
alert: null_rate 98.0% null
alert: imbalance top value is 100.0% of rows
Fig 394.
Top values for generic_name_lc.
Top values for generic_name_lc (1 unique shown, of 1 total).
value count share
"" 1 2.0%

ingredients_text_xx_debug_tags unknown metadata

This column is flagged as unknown kind with all profiling skipped, so saturn produced no statistics beyond a 50-row count and zero nulls. The name suggests it holds debug tags attached to multilingual (xx) ingredient text, likely a list or structured field the profiler could not parse. Without unique counts, value samples, or type information there is nothing further to characterise.

Treatment: Inspect raw values manually; drop unless debug tags are needed downstream.

anthropic:claude-opus-4-7 · confidence low
Out[1430]:

saturn.columns["ingredients_text_xx_debug_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique (not computed)
alert: skipped no profiler for kind=unknown
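A useful first step in the manual inspection suggested above is to check which Python types the raw values have. A profiler that handles strings and numbers may skip lists or dicts. Below is a minimal sketch. It assumes the sample JSON loads as a list of per-product dicts; the file's actual layout is not shown in this notebook. The helper name and the example records are hypothetical.

```python
from collections import Counter
from typing import Any, Dict, List


def value_types(records: List[Dict[str, Any]], column: str) -> Counter:
    """Count the Python type names of one column's values across records."""
    return Counter(type(r.get(column)).__name__ for r in records)


# Hypothetical records standing in for rows of openfoodfacts_sample.json.
records = [
    {"ingredients_text_xx_debug_tags": []},
    {"ingredients_text_xx_debug_tags": []},
    {"ingredients_text_xx_debug_tags": []},
]
print(value_types(records, "ingredients_text_xx_debug_tags"))  # Counter({'list': 3})

# With the real file (assumed to be a JSON array of objects):
# import json
# with open("/home/coolhand/html/datavis/data_trove/cache/wild/openfoodfacts_sample.json") as fh:
#     print(value_types(json.load(fh), "ingredients_text_xx_debug_tags"))
```

A record that lacks the key is counted as `NoneType`. This keeps absent keys distinct from keys holding an empty list.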

product_name_xx_debug_tags unknown metadata

This column was skipped by the profiler, so its kind is unknown and no descriptive statistics are available beyond a row count of 50 and a null rate of 0.0. The name suggests it holds debug tags attached to localized product names (the 'xx' locale and 'debug_tags' suffix), which is typically engineering scaffolding rather than analytical signal. Without unique counts or value samples, nothing can be said about cardinality or content.

Treatment: Drop unless a downstream debugging workflow specifically needs it.

anthropic:claude-opus-4-7 · confidence low
Out[1432]:

saturn.columns["product_name_xx_debug_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique (not computed)
alert: skipped no profiler for kind=unknown

generic_name_xx_debug_tags unknown metadata

This column is flagged as kind 'unknown' and was skipped by the profiler, so no statistics were computed beyond a row count of 50 and a null rate of 0.0. The name suggests it holds debug tags associated with a generic name field, likely diagnostic metadata rather than analytical signal. Without unique counts or value samples, nothing further can be inferred.

Treatment: Drop unless debug tags are explicitly needed for tracing.

anthropic:claude-opus-4-7 · confidence low
Out[1434]:

saturn.columns["generic_name_xx_debug_tags"].stats

stat value
n 50
nulls 0 (0.0%)
unique (not computed)
alert: skipped no profiler for kind=unknown
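The three `*_xx_debug_tags` columns share a suffix, so the drop can be expressed as one name filter instead of three separate decisions. Below is a minimal sketch. The suffix rule is an assumption based on the column names in this section.

```python
from typing import Iterable, List


def drop_debug_columns(columns: Iterable[str], suffix: str = "_debug_tags") -> List[str]:
    """Keep only column names that do not end with the debug suffix."""
    return [c for c in columns if not c.endswith(suffix)]


cols = [
    "ingredients_text_xx_debug_tags",
    "product_name_xx_debug_tags",
    "generic_name_xx_debug_tags",
    "nutriscore_grade_producer",
]
print(drop_debug_columns(cols))  # ['nutriscore_grade_producer']
```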

ingredients_text_fr_ocr_1561814324 categorical free_text

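This appears to be a French-language OCR capture of an ingredients list. The column name carries what looks like a Unix timestamp (1561814324, i.e. 2019-06-29 UTC). With a null rate of 0.98, only a single non-null row exists out of 50. That lone value is a full ingredients paragraph for a walnut, almond, sultana and cranberry mix, so cardinality is 1 and entropy is 0. The text also shows OCR noise: "anhydride lfureux" should read "anhydride sulfureux", i.e. sulphur dioxide. There is effectively no signal here for analysis.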

Treatment: Drop; 98% null with a single observed value provides no usable signal.

anthropic:claude-opus-4-7 · confidence high
Out[1436]:

saturn.columns["ingredients_text_fr_ocr_1561814324"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value 25 % cerneaux de noix, 25 % amandes décortiquées 25 % raisins secs sultanines (raisins secs,huile de tournesol. antioxydant: anhydride lfureux), 15% canneberges, 9,8% sucre, huile de tournesol. Traces éventuelles d'autres fruits à coque et d'arachides. Conditionné sous atmosphère protectrice.
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail 1 singleton categories
alert: null_rate 98.0% null
alert: imbalance top value is 100.0% of rows
Fig 395.
Top values for ingredients_text_fr_ocr_1561814324.
Top values for ingredients_text_fr_ocr_1561814324 (1 unique shown, of 1 total).
value count share
25 % cerneaux de noix, 25 % amandes décortiquées 25 % raisins secs sultanines (raisins secs,huile de tournesol. antioxydant: anhydride lfureux), 15% canneberges, 9,8% sucre, huile de tournesol. Traces éventuelles d'autres fruits à coque et d'arachides. Conditionné sous atmosphère protectrice. 1 2.0%

ingredients_text_fr_ocr_1561814324_result categorical free_text

This appears to be the OCR result of a French ingredients label. The timestamp suffix (1561814324) suggests it is one of several time-stamped OCR snapshot columns. Of 50 rows, 98% are null and only a single non-null value exists: a French ingredients string for a nut-and-raisin mix. It matches ingredients_text_fr_ocr_1561814324 minus the closing sentence "Conditionné sous atmosphère protectrice." ("Packaged in a protective atmosphere."). With cardinality 1 and entropy 0, the column carries essentially no analytical signal in this sample.

Treatment: Drop; 98% null and only one distinct OCR string provides no usable signal.

anthropic:claude-opus-4-7 · confidence high
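The raw column's top value ends with "Conditionné sous atmosphère protectrice.", a sentence absent from this `_result` value. A word-level diff is one way to surface such differences between sibling OCR columns. A minimal sketch using the standard library's `difflib`; the strings are copied from the two columns' `top_value` rows:

```python
import difflib

SHARED = (
    "25 % cerneaux de noix, 25 % amandes décortiquées 25 % raisins secs sultanines "
    "(raisins secs,huile de tournesol. antioxydant: anhydride lfureux), 15% canneberges, "
    "9,8% sucre, huile de tournesol. Traces éventuelles d'autres fruits à coque et d'arachides."
)
RAW = SHARED + " Conditionné sous atmosphère protectrice."  # ingredients_text_fr_ocr_1561814324
RESULT = SHARED                                               # ingredients_text_fr_ocr_1561814324_result


def word_diff(a: str, b: str) -> list[str]:
    """List word runs deleted from a ("- ...") or inserted into b ("+ ...")."""
    a_words, b_words = a.split(), b.split()
    matcher = difflib.SequenceMatcher(a=a_words, b=b_words, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            changes.append("- " + " ".join(a_words[i1:i2]))
        if tag in ("insert", "replace"):
            changes.append("+ " + " ".join(b_words[j1:j2]))
    return changes


print(word_diff(RAW, RESULT))
```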
Out[1439]:

saturn.columns["ingredients_text_fr_ocr_1561814324_result"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value 25 % cerneaux de noix, 25 % amandes décortiquées 25 % raisins secs sultanines (raisins secs,huile de tournesol. antioxydant: anhydride lfureux), 15% canneberges, 9,8% sucre, huile de tournesol. Traces éventuelles d'autres fruits à coque et d'arachides.
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail 1 singleton categories
alert: null_rate 98.0% null
alert: imbalance top value is 100.0% of rows
Fig 396.
Top values for ingredients_text_fr_ocr_1561814324_result.
Top values for ingredients_text_fr_ocr_1561814324_result (1 unique shown, of 1 total).
value count share
25 % cerneaux de noix, 25 % amandes décortiquées 25 % raisins secs sultanines (raisins secs,huile de tournesol. antioxydant: anhydride lfureux), 15% canneberges, 9,8% sucre, huile de tournesol. Traces éventuelles d'autres fruits à coque et d'arachides. 1 2.0%

ingredients_text_fr_ocr_1624039072_result categorical free_text

This column appears to hold OCR results of French ingredient lists, likely from a timestamped extraction run (1624039072). It is effectively empty: 98% null, with a single non-null value out of 50, a cocoa, soy lecithin, and vanilla ingredient string. With cardinality 1 and entropy 0, the column carries no usable signal in this sample.

Treatment: Drop; 98% null with a single observed value provides no modelling signal.

anthropic:claude-opus-4-7 · confidence high
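Cardinality 1 forces entropy 0: the single observed value has probability 1, and -1 · log2(1) = 0. A minimal sketch of the two entropy figures, assuming Shannon entropy in bits over non-null values and `entropy_ratio` defined as entropy divided by log2(cardinality); saturn's exact definitions are not stated in this report:

```python
import math
from collections import Counter


def entropy_stats(values):
    """Return (Shannon entropy in bits, entropy / log2(cardinality)) over non-null values."""
    counts = Counter(v for v in values if v is not None)
    total = sum(counts.values())
    if total == 0:
        return 0.0, 0.0
    entropy = sum(-(c / total) * math.log2(c / total) for c in counts.values())
    cardinality = len(counts)
    # The ratio is undefined for a single category; report 0, as the stats table does.
    ratio = entropy / math.log2(cardinality) if cardinality > 1 else 0.0
    return entropy, ratio


# 49 nulls and one OCR string, as in this column.
print(entropy_stats([None] * 49 + ["Cacao, émulsifiant (lécithine de _soja_), vanille."]))
```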
Out[1442]:

saturn.columns["ingredients_text_fr_ocr_1624039072_result"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value Cacao, émulsifiant (lécithine de _soja_), vanille.
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail 1 singleton categories
alert: null_rate 98.0% null
alert: imbalance top value is 100.0% of rows
Fig 397.
Top values for ingredients_text_fr_ocr_1624039072_result.
Top values for ingredients_text_fr_ocr_1624039072_result (1 unique shown, of 1 total).
value count share
Cacao, émulsifiant (lécithine de _soja_), vanille. 1 2.0%

ingredients_text_fr_ocr_1624039072 categorical free_text

This appears to be French OCR-extracted ingredient text from product packaging, with the timestamp suffix suggesting a dated extraction run. Out of 50 rows, 98% are null and only a single non-null value exists ('ingrédients : cacao, émulsifiant (lécithine de _soja_), vanille.'), giving cardinality 1 and zero entropy. The column is effectively empty and carries no discriminative signal.

Treatment: Drop; 98% null with a single observed value provides no usable signal.

anthropic:claude-opus-4-7 · confidence high
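The single value marks soy as `_soja_`. Open Food Facts ingredient texts wrap allergens in underscores, so if this text were kept (for example after merging sibling OCR columns), the markers could be pulled out with a small regex. A minimal sketch, assuming the underscore convention holds for every value:

```python
import re

# Assumes allergens are wrapped in single underscores, e.g. "lécithine de _soja_".
ALLERGEN_MARKER = re.compile(r"_([^_]+)_")


def marked_allergens(text: str) -> list[str]:
    """Return underscore-marked allergen terms, lowercased, in order of appearance."""
    return [term.strip().lower() for term in ALLERGEN_MARKER.findall(text)]


print(marked_allergens("ingrédients : cacao, émulsifiant (lécithine de _soja_), vanille."))
```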
Out[1445]:

saturn.columns["ingredients_text_fr_ocr_1624039072"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value ingrédients : cacao, émulsifiant (lécithine de _soja_), vanille.
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail 1 singleton categories
alert: null_rate 98.0% null
alert: imbalance top value is 100.0% of rows
Fig 398.
Top values for ingredients_text_fr_ocr_1624039072.
Top values for ingredients_text_fr_ocr_1624039072 (1 unique shown, of 1 total).
value count share
ingrédients : cacao, émulsifiant (lécithine de _soja_), vanille. 1 2.0%

ingredients_text_fr_ocr_1573108346 categorical free_text

This appears to be a French-language OCR-extracted ingredients list (likely from a food product label, given mentions of flour, sugar, butter, eggs, and emulsifiers). Out of 50 rows, 98% are null and only a single non-null value exists, giving cardinality 1 and entropy 0. The column is effectively empty and carries no discriminative signal in this sample.

Treatment: Drop; 98% null with a single observed value provides no usable signal.

anthropic:claude-opus-4-7 · confidence high
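The OCR strings embed declared ingredient percentages written with French decimal commas ("9,5 %", "5,5%"). Should these columns ever be consolidated rather than dropped, the figures could be extracted with a regex. A minimal sketch; it only pulls the numbers and does not attach them to ingredient names, because this label puts the percentage after the ingredient while the nut-mix label puts it before:

```python
import re

# A number with an optional French decimal comma, optional whitespace, then "%".
PERCENTAGE = re.compile(r"(\d+(?:,\d+)?)\s*%")


def declared_percentages(text: str) -> list[float]:
    """Extract declared percentages from a French ingredient list, in order of appearance."""
    return [float(number.replace(",", ".")) for number in PERCENTAGE.findall(text)]


excerpt = "Farine de blé, sucre, beurre frais 9,5 % , aeufs entiers frais, crème fraiche 5,5% , levure"
print(declared_percentages(excerpt))
```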
Out[1448]:

saturn.columns["ingredients_text_fr_ocr_1573108346"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value Farine de blé, sucre, beurre frais 9,5 % , aeufs entiers frais, crème fraiche 5,5% , levure, sel, arômes naturels (contient alcool), gluten de blé, poudre de lait écrémé, eau de vie, émulsifiants (Mono et diglycérides d'acides gras, Stéaroyl-2- actylate de sodium, diacétyltartriques des mono et diglycérides d'acides désactivée, colorant (béta carotène) Traces éventuelles de fruits à coque. Esters et mono gras), protéines de lait levure
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail 1 singleton categories
alert: null_rate 98.0% null
alert: imbalance top value is 100.0% of rows
Fig 399.
Top values for ingredients_text_fr_ocr_1573108346.
Top values for ingredients_text_fr_ocr_1573108346 (1 unique shown, of 1 total).
value count share
Farine de blé, sucre, beurre frais 9,5 % , aeufs entiers frais, crème fraiche 5,5% , levure, sel, arômes naturels (contient alcool), gluten de blé, poudre de lait écrémé, eau de vie, émulsifiants (Mono et diglycérides d'acides gras, Stéaroyl-2- actylate de sodium, diacétyltartriques des mono et diglycérides d'acides désactivée, colorant (béta carotène) Traces éventuelles de fruits à coque. Esters et mono gras), protéines de lait levure 1 2.0%

ingredients_text_fr_ocr_1566920858_result categorical free_text

This appears to be the OCR-extracted French ingredients text from a single product image (timestamped 1566920858), holding raw label transcriptions. The column is effectively empty: 98% null across n=50, with only one non-null value — a single French ingredient list for a butter/egg pastry product. Cardinality is 1 and entropy is 0, so it carries no discriminative signal in this sample.

Treatment: Drop; 98% null and only one unique OCR string provides no modelling signal.

anthropic:claude-opus-4-7 · confidence high
Out[1451]:

saturn.columns["ingredients_text_fr_ocr_1566920858_result"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value Farine de blé, sucre, beurre frais 9,5 % , oeufs entiers frais, crème fraîche 5,5% , levure, sel, arômes naturels (contient alcool), gluten de blé, poudre de lait écrémé, eau de vie, émulsifiants (Mono et diglycérides d'acides gras, Stéaroyl-2 - lactylate de sodium, Esters et mono et diacétyltartriques des mono et diglycérides d'acides gras), protéines de lait, levure désactivée, colorant (béta carotène) Traces éventuelles de fruits à coque.
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail 1 singleton categories
alert: null_rate 98.0% null
alert: imbalance top value is 100.0% of rows
Fig 400.
Top values for ingredients_text_fr_ocr_1566920858_result.
Top values for ingredients_text_fr_ocr_1566920858_result (1 unique shown, of 1 total).
value count share
Farine de blé, sucre, beurre frais 9,5 % , oeufs entiers frais, crème fraîche 5,5% , levure, sel, arômes naturels (contient alcool), gluten de blé, poudre de lait écrémé, eau de vie, émulsifiants (Mono et diglycérides d'acides gras, Stéaroyl-2 - lactylate de sodium, Esters et mono et diacétyltartriques des mono et diglycérides d'acides gras), protéines de lait, levure désactivée, colorant (béta carotène) Traces éventuelles de fruits à coque. 1 2.0%

ingredients_text_fr_ocr_1573107556 categorical free_text

This is a French OCR-extracted ingredients list, almost certainly a timestamped snapshot column from the Open Food Facts export this sample was drawn from. The column is effectively empty: 98% null across 50 rows, with only a single non-null value present, so cardinality is 1 and entropy is 0. The lone observation is a long free-text French ingredients string rather than a category, even though the column is typed as categorical.

Treatment: Drop; a single non-null value at 98% null rate carries no signal.

anthropic:claude-opus-4-7 · confidence high
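One way a profiler could separate label text like this from genuine categories, even though both are strings, is a word-count heuristic. A minimal sketch; the 8-word threshold is an arbitrary assumption, and this is not saturn's rule for its `free_text` tag:

```python
def looks_like_free_text(values, min_mean_words: float = 8.0) -> bool:
    """Heuristic: flag a string column as free text when its non-empty values average many words."""
    texts = [v for v in values if isinstance(v, str) and v.strip()]
    if not texts:
        return False
    mean_words = sum(len(text.split()) for text in texts) / len(texts)
    return mean_words >= min_mean_words


ocr_column = [None] * 49 + ["Farine de blé, sucre, beurre frais 9,5 % , aeufs entiers frais"]
print(looks_like_free_text(ocr_column))          # long label text
print(looks_like_free_text(["fr", "fr", "en"]))  # short codes
```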
Out[1454]:

saturn.columns["ingredients_text_fr_ocr_1573107556"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value Farine de blé, sucre, beurre frais 9,5 % , aeufs entiers frais, crème fraiche 5,5% , levure, sel, arômes naturels (contient alcool), gluten de blé, poudre de lait écrémé, eau de vie, émulsifiants (Mono et diglycérides d'acides gras, Stéaroyl-2- actylate de sodium, diacétyltartriques des mono et diglycérides d'acides désactivée, colorant (béta carotène) Traces éventuelles de fruits à coque. Esters et mono gras), protéines de lait levure
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail 1 singleton categories
alert: null_rate 98.0% null
alert: imbalance top value is 100.0% of rows
Fig 401.
Top values for ingredients_text_fr_ocr_1573107556.
Top values for ingredients_text_fr_ocr_1573107556 (1 unique shown, of 1 total).
value count share
Farine de blé, sucre, beurre frais 9,5 % , aeufs entiers frais, crème fraiche 5,5% , levure, sel, arômes naturels (contient alcool), gluten de blé, poudre de lait écrémé, eau de vie, émulsifiants (Mono et diglycérides d'acides gras, Stéaroyl-2- actylate de sodium, diacétyltartriques des mono et diglycérides d'acides désactivée, colorant (béta carotène) Traces éventuelles de fruits à coque. Esters et mono gras), protéines de lait levure 1 2.0%

ingredients_text_fr_ocr_1573108346_result categorical free_text

This appears to be the OCR-extracted French ingredients text from a specific scan run (timestamped 1573108346), holding raw label transcriptions like a bakery product's flour/sugar/butter list. Out of 50 rows, 98% are null and the single populated value is one long French ingredient string, giving cardinality 1 and entropy 0. The column is effectively empty for analytical purposes.

Treatment: Drop; 98% null with only one populated OCR string offers no signal.

anthropic:claude-opus-4-7 · confidence high
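The OCR columns in this section come in raw and `_result` pairs that share a timestamp. Grouping them before deciding what to keep makes the duplication visible. A minimal sketch, assuming the `<field>_ocr_<timestamp>[_result]` naming seen here:

```python
import re
from collections import defaultdict

OCR_COLUMN = re.compile(r"^(?P<field>.+)_ocr_(?P<ts>\d+)(?P<result>_result)?$")


def ocr_families(columns):
    """Map (field, timestamp) to {"raw": column, "result": column} for OCR snapshot columns."""
    families = defaultdict(dict)
    for name in columns:
        match = OCR_COLUMN.match(name)
        if match is None:
            continue  # not an OCR snapshot column
        kind = "result" if match.group("result") else "raw"
        families[(match.group("field"), int(match.group("ts")))][kind] = name
    return dict(families)


columns = [
    "ingredients_text_fr_ocr_1573108346",
    "ingredients_text_fr_ocr_1573108346_result",
    "ingredients_text_fr_ocr_1624039072_result",
    "generic_name_lt",
]
for key, members in sorted(ocr_families(columns).items()):
    print(key, members)
```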
Out[1457]:

saturn.columns["ingredients_text_fr_ocr_1573108346_result"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value Farine de blé, sucre, beurre frais 9,5 % , aeufs entiers frais, crème fraiche 5,5% , levure, sel, arômes naturels (contient alcool), gluten de blé, poudre de lait écrémé, eau de vie, émulsifiants (Mono et diglycérides d'acides gras, Stéaroyl-2 - actylate de sodium, diacétyltartriques des mono et diglycérides d'acides désactivée, colorant (béta carotène) Traces éventuelles de fruits à coque. Esters et mono gras), protéines de lait levure
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail 1 singleton categories
alert: null_rate 98.0% null
alert: imbalance top value is 100.0% of rows
Fig 402.
Top values for ingredients_text_fr_ocr_1573108346_result.
Top values for ingredients_text_fr_ocr_1573108346_result (1 unique shown, of 1 total).
value count share
Farine de blé, sucre, beurre frais 9,5 % , aeufs entiers frais, crème fraiche 5,5% , levure, sel, arômes naturels (contient alcool), gluten de blé, poudre de lait écrémé, eau de vie, émulsifiants (Mono et diglycérides d'acides gras, Stéaroyl-2 - actylate de sodium, diacétyltartriques des mono et diglycérides d'acides désactivée, colorant (béta carotène) Traces éventuelles de fruits à coque. Esters et mono gras), protéines de lait levure 1 2.0%

ingredients_text_fr_ocr_1573107560_result categorical free_text

This column appears to hold the OCR-extracted French ingredients text from a product image (timestamp 1573107560 in the name suggests a single OCR run). Of 50 rows, 98% are null and only 1 unique value exists — a single French ingredient list for what looks like a butter/egg pastry. With cardinality 1 and entropy 0, this column carries no discriminative signal in this sample.

Treatment: Drop; 98% null and a single observed value provide no usable signal.

anthropic:claude-opus-4-7 · confidence high
Out[1460]:

saturn.columns["ingredients_text_fr_ocr_1573107560_result"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value Farine de blé, sucre, beurre frais 9,5 % , aeufs entiers frais, crème fraiche 5,5% , levure, sel, arômes naturels (contient alcool), gluten de blé, poudre de lait écrémé, eau de vie, émulsifiants (Mono et diglycérides d'acides gras, Stéaroyl-2 - actylate de sodium, diacétyltartriques des mono et diglycérides d'acides désactivée, colorant (béta carotène) Traces éventuelles de fruits à coque. Esters et mono gras), protéines de lait levure
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail 1 singleton categories
alert: null_rate 98.0% null
alert: imbalance top value is 100.0% of rows
Fig 403.
Top values for ingredients_text_fr_ocr_1573107560_result.
Top values for ingredients_text_fr_ocr_1573107560_result (1 unique shown, of 1 total).
value count share
Farine de blé, sucre, beurre frais 9,5 % , aeufs entiers frais, crème fraiche 5,5% , levure, sel, arômes naturels (contient alcool), gluten de blé, poudre de lait écrémé, eau de vie, émulsifiants (Mono et diglycérides d'acides gras, Stéaroyl-2 - actylate de sodium, diacétyltartriques des mono et diglycérides d'acides désactivée, colorant (béta carotène) Traces éventuelles de fruits à coque. Esters et mono gras), protéines de lait levure 1 2.0%

ingredients_text_fr_ocr_1573108349_result categorical free_text

This column appears to hold the OCR-extracted French ingredients text for a product, tied to a specific OCR run timestamp (1573108349). It is essentially empty: 98% null across 50 rows, with only a single non-null value — a long French ingredients string for what looks like a butter/egg pastry. With cardinality 1 and entropy 0, it carries no discriminative signal in this sample.

Treatment: Drop; 98% null and only one distinct OCR string provides no usable signal.

anthropic:claude-opus-4-7 · confidence high
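Each stats table reports `top_rate` 1 while the data table below it gives the same value a share of 2.0%. Both hold if `top_rate` is computed over the one non-null row and `share` over all 50 rows; that reading is inferred from the numbers rather than documented in this report. A minimal sketch of the two denominators:

```python
from collections import Counter


def top_value_rates(values):
    """Return (top value, share of all rows, share of non-null rows) for the most frequent value."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        return None, 0.0, 0.0
    top, count = Counter(non_null).most_common(1)[0]
    return top, count / len(values), count / len(non_null)


column = [None] * 49 + ["Farine de blé, sucre, beurre frais 9,5 %"]
print(top_value_rates(column))
```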
Out[1463]:

saturn.columns["ingredients_text_fr_ocr_1573108349_result"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value Farine de blé, sucre, beurre frais 9,5 % , aeufs entiers frais, crème fraiche 5,5% , levure, sel, arômes naturels (contient alcool), gluten de blé, poudre de lait écrémé, eau de vie, émulsifiants (Mono et diglycérides d'acides gras, Stéaroyl-2 - actylate de sodium, diacétyltartriques des mono et diglycérides d'acides désactivée, colorant (béta carotène) Traces éventuelles de fruits à coque. Esters et mono gras), protéines de lait levure
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail 1 singleton categories
alert: null_rate 98.0% null
alert: imbalance top value is 100.0% of rows
Fig 404.
Top values for ingredients_text_fr_ocr_1573108349_result.
Top values for ingredients_text_fr_ocr_1573108349_result (1 unique shown, of 1 total).
value count share
Farine de blé, sucre, beurre frais 9,5 % , aeufs entiers frais, crème fraiche 5,5% , levure, sel, arômes naturels (contient alcool), gluten de blé, poudre de lait écrémé, eau de vie, émulsifiants (Mono et diglycérides d'acides gras, Stéaroyl-2 - actylate de sodium, diacétyltartriques des mono et diglycérides d'acides désactivée, colorant (béta carotène) Traces éventuelles de fruits à coque. Esters et mono gras), protéines de lait levure 1 2.0%

ingredients_text_fr_ocr_1573108360 categorical free_text

This appears to be a French-language OCR extraction of a product ingredients list, timestamped 1573108360 in the column name. Out of 50 rows, 98% are null and only a single non-null value exists (a single bakery product's ingredient declaration), giving cardinality 1 and entropy 0. The column is effectively empty and carries no discriminative signal.

Treatment: Drop; 98% null with a single OCR string offers no usable signal.

anthropic:claude-opus-4-7 · confidence high
Out[1466]:

saturn.columns["ingredients_text_fr_ocr_1573108360"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value Farine de blé, sucre, beurre frais 9,5 % , aeufs entiers frais, crème fraiche 5,5% , levure, sel, arômes naturels (contient alcool), gluten de blé, poudre de lait écrémé, eau de vie, émulsifiants (Mono et diglycérides d'acides gras, Stéaroyl-2- actylate de sodium, diacétyltartriques des mono et diglycérides d'acides désactivée, colorant (béta carotène) Traces éventuelles de fruits à coque. Esters et mono gras), protéines de lait levure
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail 1 singleton categories
alert: null_rate 98.0% null
alert: imbalance top value is 100.0% of rows
Fig 405.
Top values for ingredients_text_fr_ocr_1573108360.
Top values for ingredients_text_fr_ocr_1573108360 (1 unique shown, of 1 total).
value count share
Farine de blé, sucre, beurre frais 9,5 % , aeufs entiers frais, crème fraiche 5,5% , levure, sel, arômes naturels (contient alcool), gluten de blé, poudre de lait écrémé, eau de vie, émulsifiants (Mono et diglycérides d'acides gras, Stéaroyl-2- actylate de sodium, diacétyltartriques des mono et diglycérides d'acides désactivée, colorant (béta carotène) Traces éventuelles de fruits à coque. Esters et mono gras), protéines de lait levure 1 2.0%

ingredients_text_fr_ocr_1573109955_result categorical free_text

This column appears to hold the OCR result of a French ingredients list (timestamped 1573109955), capturing the parsed text from a product label. Of 50 rows, 98% are null and only a single non-null value exists — a long French ingredients string for a butter/egg pastry product. With cardinality 1 and entropy 0, it carries essentially no information at this sample size.

Treatment: Drop; 98% null with a single OCR string offers no modelling signal.

anthropic:claude-opus-4-7 · confidence high
Out[1469]:

saturn.columns["ingredients_text_fr_ocr_1573109955_result"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value Farine de blé, sucre, beurre frais 9,5 % , aeufs entiers frais, crème fraiche 5,5% , levure, sel, arômes naturels (contient alcool), gluten de blé, poudre de lait écrémé, eau de vie, émulsifiants (Mono et diglycérides d'acides gras, Stéaroyl-2 - actylate de sodium, diacétyltartriques des mono et diglycérides d'acides désactivée, colorant (béta carotène) Traces éventuelles de fruits à coque. Esters et mono gras), protéines de lait levure
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail 1 singleton categories
alert: null_rate 98.0% null
alert: imbalance top value is 100.0% of rows
Fig 406.
Top values for ingredients_text_fr_ocr_1573109955_result.
Top values for ingredients_text_fr_ocr_1573109955_result (1 unique shown, of 1 total).
value count share
Farine de blé, sucre, beurre frais 9,5 % , aeufs entiers frais, crème fraiche 5,5% , levure, sel, arômes naturels (contient alcool), gluten de blé, poudre de lait écrémé, eau de vie, émulsifiants (Mono et diglycérides d'acides gras, Stéaroyl-2 - actylate de sodium, diacétyltartriques des mono et diglycérides d'acides désactivée, colorant (béta carotène) Traces éventuelles de fruits à coque. Esters et mono gras), protéines de lait levure 1 2.0%

ingredients_text_fr_ocr_1573108349 categorical free_text

This column appears to be a French OCR-extracted ingredients list (likely from food packaging), based on the column name and the single observed value containing French ingredient text like 'Farine de blé, sucre, beurre frais'. It is almost entirely empty: 98% null across n=50, with only one non-null record and cardinality of 1, giving zero entropy. With a single observation it carries no analytical signal in this sample.

Treatment: Drop; 98% null and only one unique value provides no usable signal.

anthropic:claude-opus-4-7 · confidence high
Out[1472]:

saturn.columns["ingredients_text_fr_ocr_1573108349"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value Farine de blé, sucre, beurre frais 9,5 % , aeufs entiers frais, crème fraiche 5,5% , levure, sel, arômes naturels (contient alcool), gluten de blé, poudre de lait écrémé, eau de vie, émulsifiants (Mono et diglycérides d'acides gras, Stéaroyl-2- actylate de sodium, diacétyltartriques des mono et diglycérides d'acides désactivée, colorant (béta carotène) Traces éventuelles de fruits à coque. Esters et mono gras), protéines de lait levure
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail 1 singleton categories
alert: null_rate 98.0% null
alert: imbalance top value is 100.0% of rows
Fig 407.
Top values for ingredients_text_fr_ocr_1573108349.
Top values for ingredients_text_fr_ocr_1573108349 (1 unique shown, of 1 total).
value count share
Farine de blé, sucre, beurre frais 9,5 % , aeufs entiers frais, crème fraiche 5,5% , levure, sel, arômes naturels (contient alcool), gluten de blé, poudre de lait écrémé, eau de vie, émulsifiants (Mono et diglycérides d'acides gras, Stéaroyl-2- actylate de sodium, diacétyltartriques des mono et diglycérides d'acides désactivée, colorant (béta carotène) Traces éventuelles de fruits à coque. Esters et mono gras), protéines de lait levure 1 2.0%

ingredients_text_fr_ocr_1573109955 categorical free_text

This appears to be an OCR-extracted French ingredients list (timestamped 1573109955), likely from a product packaging scan. The column is almost entirely empty: 98% null across 50 rows, with only a single non-null value present, giving cardinality 1 and entropy 0. That lone value is a long, noisy free-text string typical of raw OCR output rather than a clean categorical label.

Treatment: Drop; effectively empty with only one OCR string and no analytical signal.

anthropic:claude-opus-4-7 · confidence high
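Most of the "Farine de blé, sucre, ..." values in this section differ only in spacing around punctuation, for example `Stéaroyl-2- actylate` versus `Stéaroyl-2 - actylate`. A rough normalization shows how many genuinely distinct texts the OCR columns hold. A minimal sketch; stripping spaces around punctuation is an assumption chosen for deduplication and would lose information if applied to the stored text itself:

```python
import re
import unicodedata


def normalize_ocr(text: str) -> str:
    """Lowercase, drop spaces around punctuation, and collapse whitespace so near-identical OCR strings match."""
    text = unicodedata.normalize("NFC", text).lower()
    text = re.sub(r"\s*([,.;:()%-])\s*", r"\1", text)
    return re.sub(r"\s+", " ", text).strip()


def distinct_texts(values):
    """Group non-empty raw strings under their normalized form."""
    groups = {}
    for value in values:
        if isinstance(value, str) and value.strip():
            groups.setdefault(normalize_ocr(value), []).append(value)
    return groups


variants = ["Stéaroyl-2- actylate de sodium", "Stéaroyl-2 - actylate de sodium", None, ""]
print(len(distinct_texts(variants)))
```

Under this normalization the two spellings collapse together, while the 1566920858 strings (which read `lactylate` and keep the `Esters et mono` clause in place) remain distinct.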
Out[1475]:

saturn.columns["ingredients_text_fr_ocr_1573109955"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value Farine de blé, sucre, beurre frais 9,5 % , aeufs entiers frais, crème fraiche 5,5% , levure, sel, arômes naturels (contient alcool), gluten de blé, poudre de lait écrémé, eau de vie, émulsifiants (Mono et diglycérides d'acides gras, Stéaroyl-2- actylate de sodium, diacétyltartriques des mono et diglycérides d'acides désactivée, colorant (béta carotène) Traces éventuelles de fruits à coque. Esters et mono gras), protéines de lait levure
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail 1 singleton categories
alert: null_rate 98.0% null
alert: imbalance top value is 100.0% of rows
Fig 408.
Top values for ingredients_text_fr_ocr_1573109955.
Top values for ingredients_text_fr_ocr_1573109955 (1 unique shown, of 1 total).
value count share
Farine de blé, sucre, beurre frais 9,5 % , aeufs entiers frais, crème fraiche 5,5% , levure, sel, arômes naturels (contient alcool), gluten de blé, poudre de lait écrémé, eau de vie, émulsifiants (Mono et diglycérides d'acides gras, Stéaroyl-2- actylate de sodium, diacétyltartriques des mono et diglycérides d'acides désactivée, colorant (béta carotène) Traces éventuelles de fruits à coque. Esters et mono gras), protéines de lait levure 1 2.0%

ingredients_text_fr_ocr_1573107556_result categorical free_text

This appears to be the OCR result of a French ingredients label (timestamped 1573107556), capturing extracted text from a product image. Of 50 rows, 98% are null and only a single non-null value exists — one French ingredient list for a butter/egg/flour pastry. With cardinality 1 and entropy 0, the column carries essentially no analytical signal.

Treatment: Drop; 98% null with only one observed OCR string.

anthropic:claude-opus-4-7 · confidence high
Out[1478]:

saturn.columns["ingredients_text_fr_ocr_1573107556_result"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value Farine de blé, sucre, beurre frais 9,5 % , aeufs entiers frais, crème fraiche 5,5% , levure, sel, arômes naturels (contient alcool), gluten de blé, poudre de lait écrémé, eau de vie, émulsifiants (Mono et diglycérides d'acides gras, Stéaroyl-2 - actylate de sodium, diacétyltartriques des mono et diglycérides d'acides désactivée, colorant (béta carotène) Traces éventuelles de fruits à coque. Esters et mono gras), protéines de lait levure
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail 1 singleton categories
alert: null_rate 98.0% null
alert: imbalance top value is 100.0% of rows
Fig 409.
Top values for ingredients_text_fr_ocr_1573107556_result.
Top values for ingredients_text_fr_ocr_1573107556_result (1 unique shown, of 1 total).
value count share
Farine de blé, sucre, beurre frais 9,5 % , aeufs entiers frais, crème fraiche 5,5% , levure, sel, arômes naturels (contient alcool), gluten de blé, poudre de lait écrémé, eau de vie, émulsifiants (Mono et diglycérides d'acides gras, Stéaroyl-2 - actylate de sodium, diacétyltartriques des mono et diglycérides d'acides désactivée, colorant (béta carotène) Traces éventuelles de fruits à coque. Esters et mono gras), protéines de lait levure 1 2.0%

ingredients_text_fr_ocr_1573108360_result categorical free_text

This appears to be the OCR-extracted French ingredients text from a timestamped scan run (1573108360), holding raw label transcriptions. With 98% nulls and only 1 non-null value across 50 rows, it is effectively empty — the single populated entry is a long French ingredient list for a butter/egg pastry product. Cardinality of 1 and entropy of 0 mean it carries no discriminative signal here.

Treatment: Drop from modelling; if needed, merge with sibling OCR columns into a single ingredients_text field before NLP.

anthropic:claude-opus-4-7 · confidence high
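If the text is wanted for NLP, the sibling OCR columns could be collapsed into one value per row. A minimal sketch over one JSON record; the preference order (newest timestamp first, then `_result` before raw) is an assumption, the helper name is hypothetical, and the sample row uses placeholder strings:

```python
import re
from typing import Optional

# Hypothetical policy: prefer the newest OCR timestamp, then the `_result` column over the raw one.
OCR_KEY = re.compile(r"^ingredients_text_fr_ocr_(?P<ts>\d+)(?P<result>_result)?$")


def merged_ocr_text(row: dict) -> Optional[str]:
    """Return one ingredients string from a row's sibling OCR columns, or None if all are empty."""
    candidates = []
    for key, value in row.items():
        match = OCR_KEY.match(key)
        if match and isinstance(value, str) and value.strip():
            candidates.append((int(match.group("ts")), match.group("result") is not None, value))
    if not candidates:
        return None
    return max(candidates)[2]  # tuples compare by timestamp first, then by the `_result` flag


row = {
    "ingredients_text_fr_ocr_1573107556": "older raw text",
    "ingredients_text_fr_ocr_1573108360": "newer raw text",
    "ingredients_text_fr_ocr_1573108360_result": "newer result text",
    "ingredients_text_fr_ocr_1624039072": None,
    "origin_lt": "",
}
print(merged_ocr_text(row))
```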
Out[1481]:

saturn.columns["ingredients_text_fr_ocr_1573108360_result"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value Farine de blé, sucre, beurre frais 9,5 % , aeufs entiers frais, crème fraiche 5,5% , levure, sel, arômes naturels (contient alcool), gluten de blé, poudre de lait écrémé, eau de vie, émulsifiants (Mono et diglycérides d'acides gras, Stéaroyl-2 - actylate de sodium, diacétyltartriques des mono et diglycérides d'acides désactivée, colorant (béta carotène) Traces éventuelles de fruits à coque. Esters et mono gras), protéines de lait levure
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail 1 singleton categories
alert: null_rate 98.0% null
alert: imbalance top value is 100.0% of rows
Fig 410.
Top values for ingredients_text_fr_ocr_1573108360_result.
Top values for ingredients_text_fr_ocr_1573108360_result (1 unique shown, of 1 total).
value count share
Farine de blé, sucre, beurre frais 9,5 % , aeufs entiers frais, crème fraiche 5,5% , levure, sel, arômes naturels (contient alcool), gluten de blé, poudre de lait écrémé, eau de vie, émulsifiants (Mono et diglycérides d'acides gras, Stéaroyl-2 - actylate de sodium, diacétyltartriques des mono et diglycérides d'acides désactivée, colorant (béta carotène) Traces éventuelles de fruits à coque. Esters et mono gras), protéines de lait levure 1 2.0%

ingredients_text_fr_ocr_1573107560 categorical free_text

This appears to be an OCR-extracted French ingredients list (timestamped 1573107560), capturing the raw text from a product label. Out of 50 rows, 98% are null and only a single non-null value exists, an entry beginning 'Farine de blé, sucre, beurre frais 9,5 %...'. With cardinality 1 and entropy 0, the column carries effectively no signal in this sample.

Treatment: Drop; 98% null with a single OCR string offers no usable signal.

anthropic:claude-opus-4-7 · confidence high
Out[1484]:

saturn.columns["ingredients_text_fr_ocr_1573107560"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value Farine de blé, sucre, beurre frais 9,5 % , aeufs entiers frais, crème fraiche 5,5% , levure, sel, arômes naturels (contient alcool), gluten de blé, poudre de lait écrémé, eau de vie, émulsifiants (Mono et diglycérides d'acides gras, Stéaroyl-2- actylate de sodium, diacétyltartriques des mono et diglycérides d'acides désactivée, colorant (béta carotène) Traces éventuelles de fruits à coque. Esters et mono gras), protéines de lait levure
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail 1 singleton categories
alert: null_rate 98.0% null
alert: imbalance top value is 100.0% of rows
Fig 411.
Top values for ingredients_text_fr_ocr_1573107560.
Top values for ingredients_text_fr_ocr_1573107560 (1 unique shown, of 1 total).
value count share
Farine de blé, sucre, beurre frais 9,5 % , aeufs entiers frais, crème fraiche 5,5% , levure, sel, arômes naturels (contient alcool), gluten de blé, poudre de lait écrémé, eau de vie, émulsifiants (Mono et diglycérides d'acides gras, Stéaroyl-2- actylate de sodium, diacétyltartriques des mono et diglycérides d'acides désactivée, colorant (béta carotène) Traces éventuelles de fruits à coque. Esters et mono gras), protéines de lait levure 1 2.0%

ingredients_text_fr_ocr_1566920858 categorical free_text

This is a French-language OCR-extracted ingredients list, timestamped in the column name (1566920858), from the Open Food Facts product export this sample was drawn from. Out of 50 rows, 98% are null and only a single non-null value exists, a verbose ingredients string for a butter/egg pastry. With cardinality 1 and entropy 0, the column carries effectively no signal in this sample.

Treatment: Drop; 98% null and only one distinct OCR string provides no usable signal.

anthropic:claude-opus-4-7 · confidence high
Out[1487]:

saturn.columns["ingredients_text_fr_ocr_1566920858"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value Farine de blé, sucre, beurre frais 9,5 % , oeufs entiers frais, crème fraîche 5,5% , levure, sel, arômes naturels (contient alcool), gluten de blé, poudre de lait écrémé, eau de vie, émulsifiants (Mono et diglycérides d'acides gras, Stéaroyl-2- lactylate de sodium, Esters et mono et diacétyltartriques des mono et diglycérides d'acides gras), protéines de lait, levure désactivée, colorant (béta carotène) Traces éventuelles de fruits à coque.
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail 1 singleton categories
alert: null_rate 98.0% null
alert: imbalance top value is 100.0% of rows
Fig 412.
Top values for ingredients_text_fr_ocr_1566920858.
Top values for ingredients_text_fr_ocr_1566920858 (1 unique shown, of 1 total).
value count share
Farine de blé, sucre, beurre frais 9,5 % , oeufs entiers frais, crème fraîche 5,5% , levure, sel, arômes naturels (contient alcool), gluten de blé, poudre de lait écrémé, eau de vie, émulsifiants (Mono et diglycérides d'acides gras, Stéaroyl-2- lactylate de sodium, Esters et mono et diacétyltartriques des mono et diglycérides d'acides gras), protéines de lait, levure désactivée, colorant (béta carotène) Traces éventuelles de fruits à coque. 1 2.0%

generic_name_lt categorical metadata

This appears to be a Lithuanian-locale generic name field, but it is effectively empty: 98% of the 50 rows are null and the single non-null value is the empty string. Cardinality is 1 and entropy is 0, so the column carries no information.

Treatment: Drop; the column is 98% null with a single empty-string value.

anthropic:claude-opus-4-7 · confidence high
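The profiler counts the lone empty string as a value, which is why this column reports 98% rather than 100% null. Treating blank strings as missing closes that gap. A minimal sketch, assuming blank and whitespace-only strings should count as null:

```python
def null_rate(values, blank_is_null: bool = False) -> float:
    """Share of values that are None, optionally also counting blank strings as missing."""
    if not values:
        return 0.0

    def is_missing(value) -> bool:
        if value is None:
            return True
        return blank_is_null and isinstance(value, str) and not value.strip()

    return sum(is_missing(v) for v in values) / len(values)


generic_name_lt = [None] * 49 + [""]  # 49 nulls plus one empty string, as profiled for this column
print(null_rate(generic_name_lt), null_rate(generic_name_lt, blank_is_null=True))
```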
Out[1490]:

saturn.columns["generic_name_lt"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value ""
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail 1 singleton categories
alert: null_rate 98.0% null
alert: imbalance top value is 100.0% of rows
Fig 413.
Top values for generic_name_lt.
Top values for generic_name_lt (1 unique shown, of 1 total).
value count share
"" 1 2.0%

ingredients_text_with_allergens_ro categorical free_text

A Romanian-language ingredients-with-allergens text field, almost entirely empty in this sample: of the 50 rows, 49 (98%) are null and the only non-null value is an empty string, giving a single unique value.

Treatment: Drop; effectively no signal at this sample size.

anthropic:claude-opus-4-7 · confidence high
Out[1493]:

saturn.columns["ingredients_text_with_allergens_ro"].stats

stat value
n 50
nulls 49 (98.0%)
unique 1
top_value ""
top_rate 1
cardinality 1
entropy 0
entropy_ratio 0
alert: long_tail 1 singleton categories
alert: null_rate 98.0% null
alert: imbalance top value is 100.0% of rows
Fig 414.
Top values for ingredients_text_with_allergens_ro.
Top values for ingredients_text_with_allergens_ro (1 unique shown, of 1 total).
value count share
"" 1 2.0%

packaging_text_lt categorical free_text

Lithuanian packaging text field that is effectively empty in this sample: 98% null and the single non-null value is itself an empty string, giving cardinality 1 and zero entropy. There is no usable signal here.

Treatment: Drop; no observed content.

anthropic:claude-opus-4-7 · confidence high
Out[1496]:

saturn.columns["packaging_text_lt"].stats

stat               value
n                  50
nulls              49 (98.0%)
unique             1
top_value          (empty string)
top_rate           1
cardinality        1
entropy            0
entropy_ratio      0
alert: long_tail   1 singleton categories
alert: null_rate   98.0% null
alert: imbalance   top value is 100.0% of rows
Fig 415.
Top values for packaging_text_lt.
Top values for packaging_text_lt (1 unique shown, of 1 total).
value           count    share
(empty string)  1        2.0%
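Why a single distinct value forces zero entropy, assuming Saturn uses base-2 Shannon entropy over the k distinct non-null values and normalises it by log2 k to get `entropy_ratio` (an inference from the reported numbers, not a documented definition). At k = 1 the normaliser is zero, so the reported ratio of 0 reflects a convention rather than a computed quotient.

```latex
\begin{aligned}
H &= -\sum_{i=1}^{k} p_i \log_2 p_i \\
k = 1,\ p_1 = 1 \;&\Rightarrow\; H = -1 \cdot \log_2 1 = 0 \\
\text{entropy\_ratio} &= \frac{H}{\log_2 k}, \quad k > 1 \\
k = 1 \;&\Rightarrow\; \log_2 k = 0,\ \text{so the ratio is } \tfrac{0}{0} \text{ and must be set by convention}
\end{aligned}
```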

ingredients_text_lt categorical free_text

This appears to be the Lithuanian-language ingredients text field of the multilingual Open Food Facts catalog. It is effectively empty: 98% null across 50 rows, and the single non-null value is itself an empty string, giving cardinality 1 and zero entropy.

Treatment: Drop; no usable signal in this sample.

anthropic:claude-opus-4-7 · confidence high
Out[1499]:

saturn.columns["ingredients_text_lt"].stats

stat               value
n                  50
nulls              49 (98.0%)
unique             1
top_value          (empty string)
top_rate           1
cardinality        1
entropy            0
entropy_ratio      0
alert: long_tail   1 singleton categories
alert: null_rate   98.0% null
alert: imbalance   top value is 100.0% of rows
Fig 416.
Top values for ingredients_text_lt.
Top values for ingredients_text_lt (1 unique shown, of 1 total).
value           count    share
(empty string)  1        2.0%
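Every `_lt` column in this section (generic_name_lt, packaging_text_lt, ingredients_text_lt, origin_lt, product_name_lt, ingredients_text_with_allergens_lt) shows the same 49-null, one-empty-string pattern, which suggests the Lithuanian locale is simply unused in this sample. A minimal sketch for checking a whole locale at once, assuming rows are loaded as a list of dicts and the locale is a trailing `_<lang>` suffix; the toy `rows` and both helper names are hypothetical stand-ins, not the real 50-row JSON.

```python
def locale_columns(columns, lang):
    """Columns whose name ends in _<lang>, e.g. '_lt' for Lithuanian."""
    return [c for c in columns if c.endswith(f"_{lang}")]


def all_blank(rows, column):
    """True if every row's value for `column` is missing, None, or blank."""
    for row in rows:
        value = row.get(column)
        if value is not None and str(value).strip() != "":
            return False
    return True


# Hypothetical toy rows standing in for the real sample.
rows = [
    {"product_name_fr": "product A", "product_name_lt": None, "origin_lt": ""},
    {"product_name_fr": "product B", "product_name_lt": None},
]
columns = sorted({key for row in rows for key in row})
lt_cols = locale_columns(columns, "lt")
empty_lt = [c for c in lt_cols if all_blank(rows, c)]
print(lt_cols, empty_lt)
```

If every column in `lt_cols` also appears in `empty_lt`, the whole locale can be dropped in one step instead of column by column.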

origin_lt categorical metadata

This appears to be the Lithuanian-locale product origin field. It is nearly entirely null (null_rate 0.98 across 50 rows), and its single non-null observation is itself an empty string. With cardinality 1, entropy 0, and top_rate 1.0, there is no usable signal here.

Treatment: Drop; effectively empty with no variance.

anthropic:claude-opus-4-7 · confidence high
Out[1502]:

saturn.columns["origin_lt"].stats

stat               value
n                  50
nulls              49 (98.0%)
unique             1
top_value          (empty string)
top_rate           1
cardinality        1
entropy            0
entropy_ratio      0
alert: long_tail   1 singleton categories
alert: null_rate   98.0% null
alert: imbalance   top value is 100.0% of rows
Fig 417.
Top values for origin_lt.
Top values for origin_lt (1 unique shown, of 1 total).
value           count    share
(empty string)  1        2.0%
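The stats table reports `top_rate 1` and an imbalance alert of 100.0%, while the top-values table gives the same value a share of 2.0%. The figures are consistent if `top_rate` is computed over non-null rows and `share` over all 50 rows; that reading is inferred from the numbers, not from Saturn's documentation. A small sketch computing both denominators, with `top_value_rates` as a hypothetical name:

```python
from collections import Counter


def top_value_rates(values):
    """Return (top value, its rate among non-null rows, its share of all rows)."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        return None, 0.0, 0.0
    top_value, count = Counter(non_null).most_common(1)[0]
    return top_value, count / len(non_null), count / len(values)


# origin_lt as profiled: 49 nulls and one empty string.
value, top_rate, share = top_value_rates([None] * 49 + [""])
print(repr(value), top_rate, f"{share:.1%}")
```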

product_name_lt categorical metadata

This column appears to be a Lithuanian-localized product name field, but it is effectively empty: 98% of the 50 rows are null and the single non-null value is itself an empty string. Cardinality is 1 with zero entropy, so it carries no information.

Treatment: Drop; the column is 98% null with a single empty-string value.

anthropic:claude-opus-4-7 · confidence high
Out[1505]:

saturn.columns["product_name_lt"].stats

stat               value
n                  50
nulls              49 (98.0%)
unique             1
top_value          (empty string)
top_rate           1
cardinality        1
entropy            0
entropy_ratio      0
alert: long_tail   1 singleton categories
alert: null_rate   98.0% null
alert: imbalance   top value is 100.0% of rows
Fig 418.
Top values for product_name_lt.
Top values for product_name_lt (1 unique shown, of 1 total).
value           count    share
(empty string)  1        2.0%
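Each Treatment line in this section applies the same informal rule. One way to make that rule explicit, as a sketch: drop a column when its null rate is at or above a threshold, or when it has at most one distinct non-blank value. The 0.95 threshold and the `should_drop` name are assumptions for illustration; the report does not state a cutoff.

```python
def should_drop(values, max_null_rate=0.95):
    """Hypothetical drop rule: too sparse, or no variation among real values."""
    n = len(values)
    if n == 0:
        return True
    nulls = sum(v is None for v in values)
    # Distinct values that carry content (blank strings are ignored).
    distinct = {v for v in values if v is not None and str(v).strip() != ""}
    return nulls / n >= max_null_rate or len(distinct) <= 1


print(should_drop([None] * 49 + [""]))    # True: the product_name_lt pattern
print(should_drop(["a", "b", None, "a"]))  # False: 25% null, 2 distinct values
```

The second condition also drops fully populated constant columns, which matches the "no variance" reasoning used for origin_lt.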

ingredients_text_with_allergens_lt categorical free_text

This appears to be a Lithuanian-language ingredients text field with allergen annotations, but it is effectively empty in this sample. 98% of rows are null and the only non-null value observed is the empty string, giving zero entropy across n=50.

Treatment: Drop from analysis; insufficient non-null content to model.

anthropic:claude-opus-4-7 · confidence high
Out[1508]:

saturn.columns["ingredients_text_with_allergens_lt"].stats

stat               value
n                  50
nulls              49 (98.0%)
unique             1
top_value          (empty string)
top_rate           1
cardinality        1
entropy            0
entropy_ratio      0
alert: long_tail   1 singleton categories
alert: null_rate   98.0% null
alert: imbalance   top value is 100.0% of rows
Fig 419.
Top values for ingredients_text_with_allergens_lt.
Top values for ingredients_text_with_allergens_lt (1 unique shown, of 1 total).
value           count    share
(empty string)  1        2.0%
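The three alerts attached to each of these columns can be derived from value counts alone. The sketch below reproduces them for the 49-null, one-empty-string pattern; the thresholds (null rate above 0.5, top value above 0.9 of non-null rows) and the `alerts_for` name are assumptions, since the report shows which alerts fired but not the rules behind them.

```python
from collections import Counter


def alerts_for(values, null_threshold=0.5, imbalance_threshold=0.9):
    """Hypothetical alert rules mirroring the alert labels shown in this report."""
    n = len(values)
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    alerts = []
    # long_tail: categories that occur exactly once.
    singletons = sum(1 for c in counts.values() if c == 1)
    if singletons:
        alerts.append(f"long_tail: {singletons} singleton categories")
    null_rate = (n - len(non_null)) / n if n else 0.0
    if null_rate > null_threshold:
        alerts.append(f"null_rate: {null_rate:.1%} null")
    if non_null:
        top_rate = counts.most_common(1)[0][1] / len(non_null)
        if top_rate > imbalance_threshold:
            alerts.append(f"imbalance: top value is {top_rate:.1%} of non-null rows")
    return alerts


for line in alerts_for([None] * 49 + [""]):
    print(line)
```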

ingredients_text_fr_ocr_1713713129 categorical free_text

This appears to be a French-language OCR extract of a chocolate product's ingredients list, with the numeric column suffix apparently recording when the capture was made. Of the 50 rows, 49 (98%) are null, leaving a single non-null value, so the column is a single-record artifact rather than a usable feature. That value is free-form text listing cocoa paste, defatted cocoa powder, cocoa butter, sugar, milk powder, almond and hazelnut paste, and lecithin emulsifiers, followed by a 92% minimum cocoa content and a nut-traces warning.

Treatment: Drop; the column is 98% null, and a single observed value provides no signal.

anthropic:claude-opus-4-7 · confidence high
Out[1511]:

saturn.columns["ingredients_text_fr_ocr_1713713129"].stats

stat               value
n                  50
nulls              49 (98.0%)
unique             1
top_value          Ingrédients : Pâte de cacao, cacao en poudre dégraissé, beurre de cacao, sucre, lait en poudre, pâte de amandes et de noisettes, émulsifiants (lécithines (soja, toumesol)) et arôme. Cacao 92% minimum. Peut contenir des traces d'autres fruits à coque.
top_rate           1
cardinality        1
entropy            0
entropy_ratio      0
alert: long_tail   1 singleton categories
alert: null_rate   98.0% null
alert: imbalance   top value is 100.0% of rows
Fig 420.
Top values for ingredients_text_fr_ocr_1713713129.
Top values for ingredients_text_fr_ocr_1713713129 (1 unique shown, of 1 total).
value    count    share
Ingrédients : Pâte de cacao, cacao en poudre dégraissé, beurre de cacao, sucre, lait en poudre, pâte de amandes et de noisettes, émulsifiants (lécithines (soja, toumesol)) et arôme. Cacao 92% minimum. Peut contenir des traces d'autres fruits à coque.    1    2.0%
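The numeric suffix in `ingredients_text_fr_ocr_1713713129` looks like a Unix epoch timestamp. Assuming it is one (seconds since 1970-01-01 UTC), it decodes to 21 April 2024, 15:25:29 UTC. A minimal sketch that extracts and decodes such a suffix; the regular expression and the `ocr_timestamp` name are illustrative assumptions, not part of Saturn or Open Food Facts tooling.

```python
import re
from datetime import datetime, timezone


def ocr_timestamp(column_name):
    """Decode a trailing _ocr_<epoch> (optionally _result) suffix as UTC, else None."""
    match = re.search(r"_ocr_(\d+)(?:_result)?$", column_name)
    if match is None:
        return None
    return datetime.fromtimestamp(int(match.group(1)), tz=timezone.utc)


print(ocr_timestamp("ingredients_text_fr_ocr_1713713129"))
```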

ingredients_text_fr_ocr_1713713129_result categorical free_text

This appears to be the stored OCR result for the same French ingredients capture (timestamp suffix 1713713129). With null_rate 0.98, only 1 of 50 rows has a value, and that single observation is a chocolate ingredients statement (cocoa paste, cocoa powder, almond and hazelnut paste, soy and sunflower lecithins). Cardinality is 1 and entropy is 0, so there is no variation to model from this column alone.

Treatment: Drop; the column is 98% null, and a single distinct OCR string provides no signal.

anthropic:claude-opus-4-7 · confidence high
Out[1514]:

saturn.columns["ingredients_text_fr_ocr_1713713129_result"].stats

stat               value
n                  50
nulls              49 (98.0%)
unique             1
top_value          Pâte de cacao, cacao en poudre dégraissé, beurre de cacao, sucre, lait en poudre, pâte de amandes et de noisettes, émulsifiants (lécithines (soja, toumesol)) et arôme. Cacao 92% minimum. Peut contenir des traces d'autres fruits à coque.
top_rate           1
cardinality        1
entropy            0
entropy_ratio      0
alert: long_tail   1 singleton categories
alert: null_rate   98.0% null
alert: imbalance   top value is 100.0% of rows
Fig 421.
Top values for ingredients_text_fr_ocr_1713713129_result.
Top values for ingredients_text_fr_ocr_1713713129_result (1 unique shown, of 1 total).
value    count    share
Pâte de cacao, cacao en poudre dégraissé, beurre de cacao, sucre, lait en poudre, pâte de amandes et de noisettes, émulsifiants (lécithines (soja, toumesol)) et arôme. Cacao 92% minimum. Peut contenir des traces d'autres fruits à coque.    1    2.0%
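Comparing the two OCR columns, the `_result` value equals the raw OCR value with its leading "Ingrédients :" label removed. A sketch of that check, using both strings exactly as they appear in the two top-value tables (including the OCR misreading `toumesol` for `tournesol`); treating `_result` as a cleaned form of the raw OCR text is an inference from this single pair, and `strip_label` is a hypothetical helper.

```python
RAW = (
    "Ingrédients : Pâte de cacao, cacao en poudre dégraissé, beurre de cacao, sucre, "
    "lait en poudre, pâte de amandes et de noisettes, émulsifiants (lécithines (soja, "
    "toumesol)) et arôme. Cacao 92% minimum. Peut contenir des traces d'autres fruits à coque."
)
RESULT = (
    "Pâte de cacao, cacao en poudre dégraissé, beurre de cacao, sucre, "
    "lait en poudre, pâte de amandes et de noisettes, émulsifiants (lécithines (soja, "
    "toumesol)) et arôme. Cacao 92% minimum. Peut contenir des traces d'autres fruits à coque."
)


def strip_label(text, label="Ingrédients :"):
    """Remove a leading ingredients label and surrounding whitespace."""
    text = text.strip()
    if text.startswith(label):
        text = text[len(label):]
    return text.strip()


print(strip_label(RAW) == RESULT)
```

Because both columns hold the same single record, keeping either one would add nothing beyond the other.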

How to cite


BibTeX
@misc{saturn-wild-openfoodfacts-sample-2026,
  author       = {Steuber, Luke},
  title        = {Saturn reading: wild openfoodfacts sample},
  year         = {2026},
  howpublished = {\url{https://dr.eamer.dev/saturn/view/wild-openfoodfacts_sample}},
  note         = {Profiled with saturn-dissect v0.2.0, prompt saturn-insight-v2, model anthropic:claude-opus-4-7},
}
APA
Steuber, L. (2026). Saturn reading: wild openfoodfacts sample. Source: /home/coolhand/html/datavis/data_trove/cache/wild/openfoodfacts_sample.json. Profiled with saturn-dissect v0.2.0 (saturn-insight-v2, anthropic:claude-opus-4-7). Retrieved from https://dr.eamer.dev/saturn/view/wild-openfoodfacts_sample