saturn

/home/coolhand/html/datavis/data_trove/data/quirky/chocolate_origins.json 2,530 rows sample n=2,530 seed 42 2026-06-21T23:52:45+00:00

Overview

Source: /home/coolhand/html/datavis/data_trove/data/quirky/chocolate_origins.json
Total rows: 2,530
Profiled sample: 2,530
Columns: 10
Generated: 2026-06-21T23:52:45+00:00

Per-column null rate across the corpus.
column                          kind         null %
ref                             categorical  0.0%
company                         categorical  0.0%
company_location                categorical  0.0%
review_date                     numeric      0.0%
country_of_bean_origin          categorical  0.0%
specific_bean_origin            text         0.0%
cocoa_percent                   numeric      0.0%
ingredients                     categorical  0.0%
most_memorable_characteristics  text         0.0%
rating                          numeric      0.0%

Insights opt-in

Model-generated narrative. These are opinions, not facts — the stats below are what saturn measured. Generated by: anthropic:default.

Dataset high anthropic:default

This dataset contains 2,530 chocolate bar reviews covering bean origins, cocoa percentages, ingredients, and ratings, with review years from 2006 to 2021. Two things stand out. First, cocoa percent clusters tightly between 70% and 74%, but 235 rows (9.3%) fall outside the 1.5 × IQR fences of 64% and 80%, on both the low side (down to 42%) and the high side (up to 100%); the 23 bars in the top histogram bin (98.55–100%) are worth a separate look. Second, ratings are left-skewed (skew −0.61) with a mean of 3.20 and a median of 3.25 on a scale that tops out at 4.0, so most bars score between 3.0 and 3.5; comparing scores by bean origin (Venezuela, Peru, Dominican Republic, and Ecuador are the most common) could show whether provenance drives quality. The 'company' column is an empty string in every row and should be ignored.

specific_bean_origin high anthropic:default

This column captures the specific geographic or farm-level origin of the cacao beans, ranging from country-level names (Madagascar, Ecuador, Peru) to named estates and cooperatives (Kokoa Kamili, Chuao, Sambirano). It has 1,605 unique values across 2,530 rows, so the 36.6% duplicate rate (925 repeated strings) is unsurprising for a categorical-like origin field. More surprising is that the top word, 'batch', appears 356 times, roughly 14% of the row count, and sample values such as 'Maya Mountain, 2017, batch 255' show that some entries pack origin, year, and batch metadata into a single field. One-word entries make up 33.8% of values (mostly country- or region-level names), and entries average about 2.7 words overall (median 2).
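The mixed origin/year/batch values described above could be separated along the following lines. This is a minimal sketch, not part of saturn: the function name `split_origin` and both regular expressions are hypothetical, and treating a four-digit token as a year is an assumption based only on the sample values shown in this report.

```python
import re

# Hypothetical patterns; they are tuned to the sample values in this report only.
_BATCH = re.compile(r",?\s*\bbatch\s+(\w+)", re.IGNORECASE)
_YEAR = re.compile(r",?\s*\b((?:19|20)\d{2})\b")  # assumption: 4-digit token = year


def split_origin(raw: str) -> dict:
    """Split a specific_bean_origin string into origin text, year, and batch."""
    batch = _BATCH.search(raw)
    text = _BATCH.sub("", raw)      # drop the batch suffix first
    year = _YEAR.search(text)
    text = _YEAR.sub("", text)      # then drop the year token
    return {
        "origin": text.strip(" ,"),
        "year": int(year.group(1)) if year else None,
        "batch": batch.group(1) if batch else None,
    }


for value in ["Maya Mountain, 2017, batch 255", "Matasawalevu, batch 1", "Ghana"]:
    print(split_origin(value))
```

Keeping the batch as a string avoids assuming that every batch identifier is numeric.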

cocoa_percent high anthropic:default

This column records the cocoa percentage of each bar, ranging from 42% to 100% across 2,530 rows with no nulls and only 46 distinct values. The distribution is tightly clustered: Q1 and the median both sit at 70% and Q3 at 74%, giving an IQR of just 4 points. It is right-skewed (skew 1.20) with heavy tails (kurtosis 6.54), and 235 rows (9.3%) fall beyond 1.5 × IQR, that is, below 64% or above 80%. These outliers occur in both tails, and the upper tail, which reaches 100%, pulls the mean (71.64) above the median (70). The narrow IQR relative to the full range (42–100) indicates that most bars sit in a standard dark-chocolate band.
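The outlier count for this column is labelled "beyond 1.5 IQR", which suggests Tukey's fence rule. As a sketch under that assumption (the helper name `tukey_fences` is illustrative and not taken from saturn), the cut-offs follow directly from the reported quartiles:

```python
def tukey_fences(q1: float, q3: float, k: float = 1.5) -> tuple:
    """Return the (lower, upper) fences of Tukey's k * IQR outlier rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles reported for cocoa_percent in this profile.
lower, upper = tukey_fences(q1=70.0, q3=74.0)
print(lower, upper)  # 64.0 80.0
```

Under this rule, any bar below 64% or above 80% cocoa counts as an outlier, so the flagged rows are not limited to the high-cocoa tail.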

company high anthropic:default

This column is intended to capture a company name but contains a single blank string across all 2,530 rows — it is effectively empty. Cardinality is 1, entropy is 0, and the top value is an empty string with a 100% hit rate, meaning the field was never populated. This is a completely uninformative column with zero analytical value.
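Because the report shows a 0.0% null rate for a column that holds only empty strings, a stricter missingness check may be useful. The sketch below is an assumption about how one might recompute it, not saturn's method; `effective_null_rate` is a hypothetical helper, and the two lists are stand-ins built from the counts in this report rather than the real data.

```python
from typing import Iterable, Optional


def effective_null_rate(values: Iterable[Optional[str]]) -> float:
    """Share of values that are None, empty, or whitespace-only."""
    values = list(values)
    if not values:
        return 0.0
    missing = sum(1 for v in values if v is None or not v.strip())
    return missing / len(values)


# Stand-in columns shaped like the report's counts (not the actual rows):
# company is blank in all 2,530 rows; ingredients is blank in 87 of them.
company = [""] * 2530
ingredients = [""] * 87 + ["3- B,S,C"] * (2530 - 87)

print(effective_null_rate(company))                # 1.0
print(round(effective_null_rate(ingredients), 3))  # 0.034
```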

most_memorable_characteristics high anthropic:default

This column contains short, comma-separated flavor and texture descriptors for each bar; top words include 'cocoa', 'sweet', 'nutty', 'creamy', 'sandy', and 'fatty'. Entries average about 3.4 words (median length 23 characters), so they are brief multi-attribute tags rather than free prose. The column is nearly unique, with 2,487 distinct values in 2,530 rows and only 43 duplicates (1.7%), because a small vocabulary of 868 words is combined in many different ways.
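One way to work with a near-unique column like this is to split each entry into its descriptors and count those instead of whole strings. The following is a small sketch using the ten sample values listed in this report; `split_descriptors` is a hypothetical helper, and lowercasing is an assumed normalisation step.

```python
from collections import Counter
from typing import List


def split_descriptors(entry: str) -> List[str]:
    """Split a comma-separated entry into trimmed, lowercased descriptors."""
    return [part.strip().lower() for part in entry.split(",") if part.strip()]


# The ten sample values shown for most_memorable_characteristics.
samples = [
    "chewy, off, rubbery",
    "dark berry, mild floral",
    "intense, tannic, choco, earthy",
    "basic cocoa, gateway",
    "dried fruit, orange peel, cocoa",
    "flat, molasses, creamy",
    "XL nibs, sour, cardboard",
    "blackberry, dirt, high roast",
    "sweet, vanilla, cocoa, mold",
    "sandy, woody, spicy",
]

counts = Counter(d for s in samples for d in split_descriptors(s))
print(counts.most_common(3))
```

This counts whole phrases ('basic cocoa' and 'cocoa' stay distinct), whereas the report's top-word list is word-level.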

rating high anthropic:default

This column is a discrete rating with only 12 distinct values across 2,530 records and no nulls. The range is 1.0–4.0 rather than the more common 1–5 or 1–10, and the populated histogram bins are consistent with scores at 1.0, 1.5, and then every quarter point from 1.75 to 4.0, so the scale is finer-grained than a 4-point Likert item. The distribution is left-skewed (skew = −0.608) and tightly clustered, with an IQR of just 0.5 (Q1 = 3.0, Q3 = 3.5), so most scores sit in the upper half of the scale. All 50 outliers (1.98%) are low scores: the 1.5 × IQR fences are 2.25 and 4.25, and the histogram shows exactly 50 ratings at or below 2.0.
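The summary statistics for this column can be cross-checked against the histogram. The sketch below assumes that each non-empty bin holds a single rating value on a quarter-point grid (an inference from the 12 distinct values, not something the report states) and that outliers follow the 1.5 × IQR rule.

```python
# Non-empty histogram bins for rating, read as {assumed rating value: count}.
rating_counts = {
    1.0: 4, 1.5: 10, 1.75: 3, 2.0: 33, 2.25: 17, 2.5: 166,
    2.75: 333, 3.0: 523, 3.25: 464, 3.5: 565, 3.75: 300, 4.0: 112,
}

n = sum(rating_counts.values())
mean = sum(value * count for value, count in rating_counts.items()) / n

# Lower Tukey fence from the reported quartiles: Q1 - 1.5 * (Q3 - Q1).
lower_fence = 3.0 - 1.5 * (3.5 - 3.0)
n_low = sum(count for value, count in rating_counts.items() if value < lower_fence)

print(n, round(mean, 3), lower_fence, n_low)  # 2530 3.196 2.25 50
```

The recomputed row count, mean, and outlier count agree with the reported rows (2,530), mean (3.196), and n_outliers (50).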

review_date high anthropic:default

This column contains review years stored as integers from 2006 to 2021: 16 distinct values spanning 15 years, which confirms a year-granularity field rather than a full date. The distribution is close to symmetric (skew −0.18, kurtosis −0.77), with a median of 2015 and an IQR of 6 years (2012–2018), so about half of all reviews fall in that window. Because the field carries no finer temporal resolution, it cannot support time-series analyses that need month- or day-level precision.

company_location high anthropic:default

This column appears to record the country where each company is based, with 67 distinct values and no nulls across 2,530 records. The distribution is heavily US-dominated: 'U.S.A.' alone accounts for 44.9% of rows (1,136 of 2,530), about 6.4× the next most frequent country (Canada, 177). The entropy ratio of 0.606 reflects this concentration despite the 67 categories. Values mix abbreviations ('U.S.A.', 'U.K.') with full names ('Canada', 'France'), so labels may need normalizing before grouping or joining against other country lists.
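The report does not define entropy_ratio, but the reported values are consistent with Shannon entropy in bits divided by log2 of the cardinality. The sketch below works under that assumption; the function name and the small `reported` table (copied from the per-column stats in this report) are illustrative.

```python
import math
from collections import Counter


def entropy_ratio(values) -> float:
    """Shannon entropy (bits) of the value counts, normalised by log2(cardinality)."""
    counts = Counter(values)
    n = sum(counts.values())
    k = len(counts)
    if k <= 1:
        return 0.0  # a single category carries no information
    h = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return h / math.log2(k)


# (entropy, cardinality) pairs as reported for the categorical columns.
reported = {
    "company_location": (3.675, 67),
    "country_of_bean_origin": (4.717, 62),
    "ingredients": (2.430, 22),
}
for name, (h, k) in reported.items():
    print(name, round(h / math.log2(k), 3))
```

A ratio near 1 means values are spread almost evenly across categories; a ratio near 0 means a few categories hold most rows.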

country_of_bean_origin high anthropic:default

This column records the country of origin of the cacao beans, with 62 distinct values across 2,530 rows and no nulls. The distribution is fairly broad (entropy ratio 0.79): Venezuela leads at exactly 10.0% (253 rows), followed closely by Peru (244), Dominican Republic (226), and Ecuador (219), so no single origin dominates. 'Blend' appears as a pseudo-origin in 156 rows (6.2%); these are multi-origin mixtures rather than single-country beans and may need special handling in origin-based analyses.

ingredients high anthropic:default

This column encodes a structured ingredient label for each record: a count prefix (e.g., '3-') followed by abbreviated ingredient codes (B, S, C, L, V, Sa). With only 22 distinct values across 2,530 rows, it works as a categorical feature rather than free text. Note that 87 rows (3.4%) hold an empty string even though the reported null rate is 0.0%, a hidden-missingness issue that needs handling. The top value, '3- B,S,C', covers 39.5% of rows. Starred variants (e.g., '2- B,S*') appear to mark a sub-type of the S ingredient, and two labels ('5-B,S,C,V,Sa' and '6-B,S,C,V,L,Sa') omit the space after the hyphen, so any parser should accept both spellings.
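A label in this format could be parsed roughly as follows. This is a sketch, not saturn's behaviour: `parse_ingredients` and its regular expression are hypothetical, empty strings are mapped to None on the assumption that they represent missing values, and the meaning of the individual codes is left uninterpreted.

```python
import re
from typing import Optional

# Count prefix, hyphen, optional space, then the comma-separated codes.
_LABEL = re.compile(r"^\s*(\d+)\s*-\s*(.+?)\s*$")


def parse_ingredients(label: str) -> Optional[dict]:
    """Parse e.g. '4- B,S*,C,Sa' into a declared count and a list of codes."""
    if not label or not label.strip():
        return None  # assumption: blank labels are hidden missing values
    match = _LABEL.match(label)
    if match is None:
        raise ValueError(f"unrecognised ingredients label: {label!r}")
    declared = int(match.group(1))
    codes = [code.strip() for code in match.group(2).split(",") if code.strip()]
    return {
        "declared_count": declared,
        "codes": codes,
        "consistent": declared == len(codes),  # does the prefix match the list?
    }


for label in ["3- B,S,C", "5-B,S,C,V,Sa", "2- B,S*", ""]:
    print(repr(label), parse_ingredients(label))
```

The `consistent` flag gives a cheap integrity check, since the count prefix should equal the number of codes.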

ref medium anthropic:default

This column appears to be a numeric reference or ID code stored as a categorical string, likely an external record or review reference. With 630 unique values across 2,530 rows, each value is reused about 4 times on average, and the entropy ratio of 0.995 is nearly maximal, indicating an almost uniform distribution. The most frequent value ('414') appears only 10 times (top_rate ≈ 0.004), so no single reference dominates. Because values repeat, the column cannot serve as a primary key on its own; it more likely groups related rows.

Numeric correlation

Pearson correlation across 3 numeric columns (values clipped to 2 decimals).
               review_date  cocoa_percent  rating
review_date    +1.00        +0.09          +0.13
cocoa_percent  +0.09        +1.00          -0.14
rating         +0.13        -0.14          +1.00
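For reference, each cell of a matrix like this is a pairwise Pearson coefficient. The following is a minimal pure-Python sketch of that computation, with an illustrative function name and toy inputs rather than the dataset's columns:

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when either input is constant
    return cov / math.sqrt(var_x * var_y)


# Toy data only: a perfectly increasing and a perfectly decreasing relation.
print(pearson([1, 2, 3], [2, 4, 6]))
print(pearson([1, 2, 3], [3, 2, 1]))
```

All off-diagonal values in the table above are within ±0.14, so none of the three numeric columns is strongly linearly related to another.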

ref categorical

rows: 2,530
null: 0 (0.0%)
unique: 630
top_value: 414
top_rate: 3.95e-03
cardinality: 630
entropy: 9.257
entropy_ratio: 0.995

Top values for ref (20 unique shown, of 630 total).
value  count  share
414    10     0.4%
24     9      0.4%
404    9      0.4%
387    9      0.4%
1462   8      0.3%
1454   8      0.3%
431    8      0.3%
439    8      0.3%
1450   8      0.3%
552    8      0.3%
1458   8      0.3%
1466   8      0.3%
370    7      0.3%
502    7      0.3%
636    7      0.3%
572    7      0.3%
355    7      0.3%
486    7      0.3%
478    7      0.3%
377    7      0.3%

company categorical

top value is 100.0% of rows
rows: 2,530
null: 0 (0.0%)
unique: 1
top_value: (empty string)
top_rate: 1.000
cardinality: 1
entropy: 0.000
entropy_ratio: 0.000

Top values for company (1 unique shown, of 1 total).
value           count  share
(empty string)  2530   100.0%

company_location categorical

rows: 2,530
null: 0 (0.0%)
unique: 67
top_value: U.S.A.
top_rate: 0.449
cardinality: 67
entropy: 3.675
entropy_ratio: 0.606

Top values for company_location (20 unique shown, of 67 total).
value        count  share
U.S.A.       1136   44.9%
Canada       177    7.0%
France       176    7.0%
U.K.         133    5.3%
Italy        78     3.1%
Belgium      63     2.5%
Ecuador      58     2.3%
Australia    53     2.1%
Switzerland  44     1.7%
Germany      42     1.7%
Spain        36     1.4%
Venezuela    31     1.2%
Japan        31     1.2%
Denmark      31     1.2%
Austria      30     1.2%
Colombia     29     1.1%
New Zealand  27     1.1%
Hungary      26     1.0%
Brazil       25     1.0%
Peru         23     0.9%

review_date numeric

rows: 2,530
null: 0 (0.0%)
unique: 16
min: 2006
max: 2021
mean: 2014
median: 2015
std: 3.968
q1: 2012
q3: 2018
iqr: 6.000
skew: -0.183
kurtosis: -0.773
n_outliers: 0
outlier_rate: 0.000
zero_rate: 0.000

Histogram for review_date (median: 2015.0), one row per year; the empty bins between integer years are omitted.
year  count
2006  62
2007  73
2008  92
2009  123
2010  110
2011  163
2012  194
2013  183
2014  247
2015  284
2016  217
2017  105
2018  228
2019  193
2020  81
2021  175

country_of_bean_origin categorical

rows: 2,530
null: 0 (0.0%)
unique: 62
top_value: Venezuela
top_rate: 0.100
cardinality: 62
entropy: 4.717
entropy_ratio: 0.792

Top values for country_of_bean_origin (20 unique shown, of 62 total).
value               count  share
Venezuela           253    10.0%
Peru                244    9.6%
Dominican Republic  226    8.9%
Ecuador             219    8.7%
Madagascar          177    7.0%
Blend               156    6.2%
Nicaragua           100    4.0%
Bolivia             80     3.2%
Tanzania            79     3.1%
Colombia            79     3.1%
Brazil              78     3.1%
Belize              76     3.0%
Vietnam             73     2.9%
Guatemala           62     2.5%
Mexico              55     2.2%
Papua New Guinea    50     2.0%
Costa Rica          43     1.7%
Trinidad            42     1.7%
Ghana               41     1.6%
India               35     1.4%

specific_bean_origin text

33.8% of rows are a single word; 36.6% duplicate strings
rows: 2,530
null: 0 (0.0%)
unique: 1,605
len_min: 3
len_max: 51
len_mean: 17.115
len_median: 14.000
len_p95: 39.000
word_mean: 2.681
word_median: 2.000
n_empty: 0
n_duplicates: 925
duplicate_rate: 0.366
vocab_size: 2,079
readability_flesch_mean: 28.412
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 0.338
allcaps_rate: 1.58e-03
boilerplate_rate: 0.000

Character-length distribution for specific_bean_origin (mean: 17.115).
chars    count
3 – 4    86
4 – 5    106
5 – 7    152
7 – 8    211
8 – 9    142
9 – 10   306
10 – 11  60
11 – 13  71
13 – 14  79
14 – 15  54
15 – 16  143
16 – 17  73
17 – 19  98
19 – 20  65
20 – 21  64
21 – 22  119
22 – 23  48
23 – 25  57
25 – 26  42
26 – 27  45
27 – 28  84
28 – 29  37
29 – 31  39
31 – 32  35
32 – 33  29
33 – 34  58
34 – 35  18
35 – 37  23
37 – 38  30
38 – 39  21
39 – 40  33
40 – 41  13
41 – 43  16
43 – 44  14
44 – 45  13
45 – 46  23
46 – 47  8
47 – 49  12
49 – 50  1
50 – 51  2
Sample values (first 10)
  1. Matasawalevu, batch 1
  2. Crayfish Bay Estate, 2014
  3. Hawai'i Island, Big Island
  4. Kokoa Kamili Coop
  5. Duarte Province, El Cibao, batch 10
  6. Maya Mountain, 2017, batch 255
  7. Campesino w/ nibs
  8. Amazonas
  9. Ghana
  10. Jamaica

cocoa_percent numeric

9.3% of rows beyond 1.5 IQR
rows: 2,530
null: 0 (0.0%)
unique: 46
min: 42.000
max: 100.000
mean: 71.640
median: 70.000
std: 5.617
q1: 70.000
q3: 74.000
iqr: 4.000
skew: 1.198
kurtosis: 6.541
n_outliers: 235
outlier_rate: 0.093
zero_rate: 0.000

Histogram bins for cocoa_percent (median: 70.0).
bin            count
42 – 43.45     1
43.45 – 44.9   0
44.9 – 46.35   1
46.35 – 47.8   0
47.8 – 49.25   0
49.25 – 50.7   1
50.7 – 52.15   0
52.15 – 53.6   1
53.6 – 55.05   16
55.05 – 56.5   2
56.5 – 57.95   1
57.95 – 59.4   8
59.4 – 60.85   47
60.85 – 62.3   23
62.3 – 63.75   14
63.75 – 65.2   124
65.2 – 66.65   28
66.65 – 68.1   106
68.1 – 69.55   13
69.55 – 71     1046
71 – 72.45     340
72.45 – 73.9   72
73.9 – 75.35   377
75.35 – 76.8   35
76.8 – 78.25   63
78.25 – 79.7   2
79.7 – 81.15   95
81.15 – 82.6   18
82.6 – 84.05   9
84.05 – 85.5   40
85.5 – 86.95   1
86.95 – 88.4   9
88.4 – 89.85   2
89.85 – 91.3   12
91.3 – 92.75   0
92.75 – 94.2   0
94.2 – 95.65   0
95.65 – 97.1   0
97.1 – 98.55   0
98.55 – 100    23

ingredients categorical

rows: 2,530
null: 0 (0.0%)
unique: 22
top_value: 3- B,S,C
top_rate: 0.395
cardinality: 22
entropy: 2.430
entropy_ratio: 0.545

Top values for ingredients (20 unique shown, of 22 total).
value           count  share
3- B,S,C        999    39.5%
2- B,S          718    28.4%
4- B,S,C,L      286    11.3%
5- B,S,C,V,L    184    7.3%
4- B,S,C,V      141    5.6%
(empty string)  87     3.4%
2- B,S*         31     1.2%
4- B,S*,C,Sa    20     0.8%
3- B,S*,C       12     0.5%
3- B,S,L        8      0.3%
4- B,S*,C,V     7      0.3%
5-B,S,C,V,Sa    6      0.2%
1- B            6      0.2%
4- B,S,V,L      5      0.2%
4- B,S,C,Sa     5      0.2%
6-B,S,C,V,L,Sa  4      0.2%
3- B,S,V        3      0.1%
4- B,S*,V,L     3      0.1%
4- B,S*,C,L     2      0.1%
3- B,S*,Sa      1      0.0%

most_memorable_characteristics text

98.3% of rows are unique strings
rows: 2,530
null: 0 (0.0%)
unique: 2,487
len_min: 3
len_max: 37
len_mean: 23.062
len_median: 23.000
len_p95: 30.000
word_mean: 3.376
word_median: 3.000
n_empty: 0
n_duplicates: 43
duplicate_rate: 0.017
vocab_size: 868
readability_flesch_mean: 49.707
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 7.11e-03
allcaps_rate: 0.000
boilerplate_rate: 0.000

Character-length distribution for most_memorable_characteristics (mean: 23.062).
chars    count
3 – 4    1
4 – 5    0
5 – 6    4
6 – 6    4
6 – 7    3
7 – 8    1
8 – 9    0
9 – 10   2
10 – 11  3
11 – 12  9
12 – 12  39
12 – 13  47
13 – 14  52
14 – 15  0
15 – 16  29
16 – 17  34
17 – 17  69
17 – 18  100
18 – 19  145
19 – 20  0
20 – 21  206
21 – 22  206
22 – 23  156
23 – 23  173
23 – 24  179
24 – 25  192
25 – 26  0
26 – 27  201
27 – 28  179
28 – 28  178
28 – 29  121
29 – 30  86
30 – 31  66
31 – 32  0
32 – 33  23
33 – 34  13
34 – 34  3
34 – 35  4
35 – 36  1
36 – 37  1
Sample values (first 10)
  1. chewy, off, rubbery
  2. dark berry, mild floral
  3. intense, tannic, choco, earthy
  4. basic cocoa, gateway
  5. dried fruit, orange peel, cocoa
  6. flat, molasses, creamy
  7. XL nibs, sour, cardboard
  8. blackberry, dirt, high roast
  9. sweet, vanilla, cocoa, mold
  10. sandy, woody, spicy

rating numeric

rows: 2,530
null: 0 (0.0%)
unique: 12
min: 1.000
max: 4.000
mean: 3.196
median: 3.250
std: 0.445
q1: 3.000
q3: 3.500
iqr: 0.500
skew: -0.608
kurtosis: 1.053
n_outliers: 50
outlier_rate: 0.020
zero_rate: 0.000

Histogram bins for rating (median: 3.25).
bin            count
1 – 1.075      4
1.075 – 1.15   0
1.15 – 1.225   0
1.225 – 1.3    0
1.3 – 1.375    0
1.375 – 1.45   0
1.45 – 1.525   10
1.525 – 1.6    0
1.6 – 1.675    0
1.675 – 1.75   0
1.75 – 1.825   3
1.825 – 1.9    0
1.9 – 1.975    0
1.975 – 2.05   33
2.05 – 2.125   0
2.125 – 2.2    0
2.2 – 2.275    17
2.275 – 2.35   0
2.35 – 2.425   0
2.425 – 2.5    0
2.5 – 2.575    166
2.575 – 2.65   0
2.65 – 2.725   0
2.725 – 2.8    333
2.8 – 2.875    0
2.875 – 2.95   0
2.95 – 3.025   523
3.025 – 3.1    0
3.1 – 3.175    0
3.175 – 3.25   0
3.25 – 3.325   464
3.325 – 3.4    0
3.4 – 3.475    0
3.475 – 3.55   565
3.55 – 3.625   0
3.625 – 3.7    0
3.7 – 3.775    300
3.775 – 3.85   0
3.85 – 3.925   0
3.925 – 4      112