{"columns":[{"alerts":[{"code":"long_tail","level":"info","message":"6 singleton categories"},{"code":"null_rate","level":"warn","message":"96.9% null"}],"column":"__UNNAMED__0","extras":{"singletons":6,"top_values":[["Global Health Estimates 2021 Summary Tables\n\nThis workbook contains summary burden of disease estimates from the WHO Global Health Estimates (GHE). The estimates are based on analysis of latest available national information on levels of mortality and cause distributions as of the end of 2023 together with latest available information from WHO programs for causes of public health importance. Data, methods and cause categories are described in a Technical Paper (1) available on the WHO website.  Population estimates are from the 2022 revision of the UN World Population Prospects (2).\n\nThis spreadsheet includes estimates for disability-adjusted life year (DALY) by WHO region and by cause, age and sex, for the years 2000, 2010, 2015, 2019, 2020 and 2021. Documentation, country-level and regional-level summary tables are available on the WHO website ( https://www.who.int/data/gho/data/themes/mortality-and-global-health-estimates ). Depending on the available data sources, the cause-specific estimates will have quite substantial uncertainty ranges. Due to changes in data and some methods, these estimates are not comparable to previously-released WHO estimates. \n\nThe preparation of these statistics was undertaken by the WHO Department of Data and Analytics, in collaboration with WHO technical programs. For further queries, please send an email to healthstat@who.int .\n\nReferences:\n(1) WHO methods and data sources for global burden of disease 2000-2021. Global Health Estimates Technical Paper WHO/DDI/DNA/GHE/2020.3.Geneva: World Health Organization; 2024 (https://www.who.int/docs/default-source/gho-documents/global-health-estimates/GlobalBurden_method_2000_2021.pdf).\n(2) World Population Prospects: The 2019 revision. New York: United Nations, Department of Economic and Social Affairs, Population Division; 2019 (https://esa.un.org/unpd/wpp/).\n",1],["Recommended citation:",1],["Global Health Estimates 2021: Disease burden by Cause, Age, Sex, by Country and by Region, 2000-2021. Geneva, World Health Organization; 2024.",1],["List of Countries",1],["Note: WHO Member States with a population of less than 90,000 in 2021 were not included in the analysis.",1],["Countries, areas or territories included",1]]},"kind":"categorical","n":196,"n_null":190,"n_unique":6,"null_rate":0.9693877551020408,"stats":{"cardinality":6,"entropy":2.584962500721156,"entropy_ratio":1.0,"top_rate":0.16666666666666666,"top_value":"Global Health Estimates 2021 Summary Tables\n\nThis workbook contains summary burden of disease estimates from the WHO Global Health Estimates (GHE). The estimates are based on analysis of latest available national information on levels of mortality and cause distributions as of the end of 2023 together with latest available information from WHO programs for causes of public health importance. Data, methods and cause categories are described in a Technical Paper (1) available on the WHO website.  Population estimates are from the 2022 revision of the UN World Population Prospects (2).\n\nThis spreadsheet includes estimates for disability-adjusted life year (DALY) by WHO region and by cause, age and sex, for the years 2000, 2010, 2015, 2019, 2020 and 2021. Documentation, country-level and regional-level summary tables are available on the WHO website ( https://www.who.int/data/gho/data/themes/mortality-and-global-health-estimates ). Depending on the available data sources, the cause-specific estimates will have quite substantial uncertainty ranges. Due to changes in data and some methods, these estimates are not comparable to previously-released WHO estimates. \n\nThe preparation of these statistics was undertaken by the WHO Department of Data and Analytics, in collaboration with WHO technical programs. For further queries, please send an email to healthstat@who.int .\n\nReferences:\n(1) WHO methods and data sources for global burden of disease 2000-2021. Global Health Estimates Technical Paper WHO/DDI/DNA/GHE/2020.3.Geneva: World Health Organization; 2024 (https://www.who.int/docs/default-source/gho-documents/global-health-estimates/GlobalBurden_method_2000_2021.pdf).\n(2) World Population Prospects: The 2019 revision. New York: United Nations, Department of Economic and Social Affairs, Population Division; 2019 (https://esa.un.org/unpd/wpp/).\n"}},{"alerts":[{"code":"long_tail","level":"info","message":"190 singleton categories"}],"column":"GLOBAL HEALTH ESTIMATES 2021 SUMMARY TABLES:","extras":{"singletons":190,"top_values":[["DALYs BY CAUSE, AGE AND SEX, BY WHO REGION, 2000-2021",1],["July 2024",1],["World Health Organization",1],["Geneva, Switzerland",1],["https://www.who.int/data/gho/data/themes/mortality-and-global-health-estimates",1],["Afghanistan",1],["Albania",1],["Algeria",1],["Angola",1],["Antigua and Barbuda",1],["Argentina",1],["Armenia",1],["Australia",1],["Austria",1],["Azerbaijan",1],["Bahamas",1],["Bahrain",1],["Bangladesh",1],["Barbados",1],["Belarus",1]]},"kind":"categorical","n":196,"n_null":6,"n_unique":190,"null_rate":0.030612244897959183,"stats":{"cardinality":190,"entropy":7.569855608330949,"entropy_ratio":1.0000000000000002,"top_rate":0.005263157894736842,"top_value":"DALYs BY CAUSE, AGE AND SEX, BY WHO REGION, 2000-2021"}}],"insights":{"errors":[],"insights":[{"confidence":"high","critiques":[],"evidence_keys":["row_count","column_count","columns[0].null_rate","columns[0].n_unique","columns[0].top_values","columns[1].n_unique","columns[1].null_rate","columns[1].top_values"],"featured_charts":[{"caption":"String-length distribution separates short country names from longer title/URL lines.","column":"GLOBAL HEALTH ESTIMATES 2021 SUMMARY TABLES:","kind":"length"},{"caption":"Top values show this column is mostly a country roster (Afghanistan, Albania, Algeria, ...) with a handful of header rows.","column":"GLOBAL HEALTH ESTIMATES 2021 SUMMARY TABLES:","kind":"bar"},{"caption":"Null vs non-null share makes clear ~97% of this column is empty \u2014 it only holds six section headers.","column":"__UNNAMED__0","kind":"donut"},{"caption":"Length view highlights one very long descriptive paragraph dwarfing the other short header strings.","column":"__UNNAMED__0","kind":"length"}],"model":"anthropic:claude-opus-4-7","narrative":"This is the 'Notes' sheet from the WHO Global Health Estimates 2021 workbook on DALYs by cause, age and sex, by WHO region, 2000-2021. It is essentially a metadata and country-listing tab rather than analytical data: 196 rows across just two columns. The first column (__UNNAMED__0) is 96.94% null and only carries six header/citation strings, while the second column holds 190 mostly unique entries \u2014 predominantly the list of WHO Member States plus a few title and source lines. Treat this sheet as documentation; the real DALY figures live on other sheets of the workbook.","scope":"dataset","target":"__global__"},{"confidence":"high","critiques":[],"evidence_keys":["n","n_unique","null_rate","top_value","top_values","stats.cardinality"],"model":"anthropic:claude-opus-4-7","narrative":"This unnamed column appears to be spreadsheet header/preamble text from a WHO Global Health Estimates 2021 workbook, likely the first column of an Excel sheet read without a header row. Of 196 rows, 96.94% are null and only 6 unique values exist, each appearing once \u2014 these are documentation strings (citation, sheet titles, methodology notes) rather than analytical data. The column carries no tabular signal; it is metadata bleed-through from the source file.","role":"metadata","scope":"column","target":"__UNNAMED__0","treatment":"Drop; this is sheet preamble, not data \u2014 re-read the source with the correct header row."},{"confidence":"high","critiques":[],"evidence_keys":["n","n_unique","null_rate","stats.cardinality","stats.entropy_ratio","stats.top_rate","stats.top_value","top_values","alerts"],"model":"anthropic:claude-opus-4-7","narrative":"This column appears to be a free-text leftmost label from a WHO summary table, mixing report metadata (title \"DALYs BY CAUSE, AGE AND SEX, BY WHO REGION, 2000-2021\", \"July 2024\", \"World Health Organization\", \"Geneva, Switzerland\", a URL) with country names (Afghanistan, Albania, Algeria...). With 190 unique values across 196 rows and entropy_ratio 1.0, every entry is essentially distinct, and the long_tail alert plus 3.06% nulls confirm it is not a clean categorical. The header rows bleeding into the data are the real surprise \u2014 this column was never normalized after import.","role":"label","scope":"column","target":"GLOBAL HEALTH ESTIMATES 2021 SUMMARY TABLES:","treatment":"Strip the leading metadata rows and rename this column to 'country' before joining or analysis."}],"providers":["anthropic:claude-opus-4-7"],"total_usage":{"completion_tokens":1200,"prompt_tokens":5807,"total_tokens":7007}},"language_counts":{},"meta":{"generated_at":"2026-05-01T18:07:45+00:00","mode":"full","row_count":196,"sampled_rows":196,"seed":42,"source":"/home/coolhand/html/datavis/data_trove/data/accessibility/.cache_who/daly_region.xlsx#Notes"},"notes":[],"saturn_version":"0.2.0","schema":{"GLOBAL HEALTH ESTIMATES 2021 SUMMARY TABLES:":"categorical","__UNNAMED__0":"categorical"}}
