saturn

/home/coolhand/html/datavis/data_trove/data/quirky/chocolate_origins.json 2,530 rows sample n=2,530 seed 42 2026-06-21T23:52:45+00:00

Overview

Source	/home/coolhand/html/datavis/data_trove/data/quirky/chocolate_origins.json
Total rows	2,530
Profiled sample	2,530
Columns	10
Generated	2026-06-21T23:52:45+00:00

Per-column null rate across the corpus.
column	kind	null %
ref	categorical	0.0%
company	categorical	0.0%
company_location	categorical	0.0%
review_date	numeric	0.0%
country_of_bean_origin	categorical	0.0%
specific_bean_origin	text	0.0%
cocoa_percent	numeric	0.0%
ingredients	categorical	0.0%
most_memorable_characteristics	text	0.0%
rating	numeric	0.0%

Insights opt-in

Model-generated narrative. These are opinions, not facts — the stats below are what saturn measured. Generated by: anthropic:default.

Dataset high anthropic:default

This dataset contains 2,530 chocolate bar reviews covering bean origins, cocoa percentages, ingredients, and expert ratings across reviews dated 2006–2021. Two things stand out: first, cocoa percent clusters tightly between 70–74% but has 235 outliers (9.3%) stretching up to 100%, suggesting a small but notable group of ultra-dark bars worth investigating. Second, ratings skew modestly negative with a mean of 3.20 and median of 3.25 out of 4.0, indicating most bars are rated good-to-very-good — but the distribution of scores by bean origin (Venezuela, Peru, Dominican Republic, and Ecuador dominate) could reveal whether provenance drives quality. The 'company' column is entirely blank and should be ignored.

specific_bean_origin high anthropic:default

This column captures the specific geographic or farm-level origin of cacao beans used in chocolate production, ranging from country-level names (Madagascar, Ecuador, Peru) to named estates and cooperatives (Kokoa Kamili, Chuao, Sambirano). The duplicate rate of 36.6% is expected for a categorical-like origin field with 1,605 unique values out of 2,530 rows. More surprising is that the top word 'batch' appears 356 times, about 14% of the row count, which suggests some values encode both origin and batch metadata in a single field (for example 'Matasawalevu, batch 1'). One-word entries account for 33.8% of values (country-level origins), and entries overall average ~2.7 words (median 2), so the remaining values carry finer geographic or supplier granularity.
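One way to separate a batch identifier from the origin text is a small regular expression, sketched below. The pattern, the assumption that the batch always appears as a trailing ", batch <digits>" suffix, and the function name `split_batch` are illustrative choices; only the sample strings come from this profile.

```python
import re

# Assumed convention: a trailing ", batch <digits>" suffix marks the batch id.
BATCH_RE = re.compile(r",?\s*batch\s+(\d+)\s*$", re.IGNORECASE)

def split_batch(value):
    """Return (origin, batch), where batch is an int or None if absent."""
    match = BATCH_RE.search(value)
    if match is None:
        return value.strip(), None
    return value[:match.start()].strip(), int(match.group(1))

samples = [
    "Matasawalevu, batch 1",
    "Maya Mountain, 2017, batch 255",
    "Kokoa Kamili Coop",
]
for sample in samples:
    print(split_batch(sample))
```

Values that embed a year (such as "Maya Mountain, 2017") keep it in the origin part; splitting that out would need a separate rule.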

cocoa_percent high anthropic:default

This column records cocoa percentage for chocolate products, ranging from 42% to 100% across 2,530 rows with no nulls and only 46 distinct values. The distribution is tightly clustered — Q1 and median both sit at 70%, Q3 at 74%, giving an IQR of just 4 — but is right-skewed (skew 1.20) with high kurtosis (6.54), driven by 235 outliers (9.3%) that stretch toward extreme values like 100%. The narrow IQR relative to the full range (42–100) suggests most chocolates fall in a standard dark-chocolate band, with a long tail of unusually high-cocoa products pulling the mean (71.64) above the median.
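The outlier count is consistent with the 1.5 × IQR rule that the profile flags for this column. A minimal sketch of the fence arithmetic, using the quartiles reported above; the function names are illustrative, and saturn's own implementation is not shown in this report.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return the (lower, upper) bounds beyond which a value counts as an outlier."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def count_outliers(values, q1, q3):
    """Count values strictly outside the fences."""
    low, high = tukey_fences(q1, q3)
    return sum(1 for v in values if v < low or v > high)

# Quartiles reported for cocoa_percent: Q1 = 70, Q3 = 74.
low, high = tukey_fences(70.0, 74.0)
print(low, high)  # 64.0 80.0
```

With these bounds, bars below 64% or above 80% cocoa are flagged, which is why the 100% bars count as outliers even though they are legitimate products.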

company high anthropic:default

This column is intended to capture a company name but contains a single blank string across all 2,530 rows — it is effectively empty. Cardinality is 1, entropy is 0, and the top value is an empty string with a 100% hit rate, meaning the field was never populated. This is a completely uninformative column with zero analytical value.

most_memorable_characteristics high anthropic:default

This column contains short, comma-separated flavor/texture descriptor phrases for what appears to be a chocolate or confectionery dataset — top words include 'cocoa', 'sweet', 'nutty', 'creamy', 'sandy', and 'fatty'. With 2487 unique values out of 2530 rows and a mean of ~3.4 words per entry (median 23 characters), entries are brief multi-attribute tags rather than free prose, yet near-uniqueness is triggered by the combinatorial variety of descriptors. Only 43 duplicates exist across 2530 rows (1.7% duplicate rate), and the vocabulary of 868 words suggests a constrained but richly combined descriptor lexicon.

rating high anthropic:default

This column is a discrete rating scale with only 12 distinct values across 2,530 records and no nulls. The range is 1.0–4.0 (notably not the common 1–5 or 1–10 scale), and 12 distinct values within that range imply fractional steps rather than four whole-number points. The distribution is left-skewed (skew = -0.608) and tightly clustered, with an IQR of just 0.5 (Q1 = 3.0, Q3 = 3.5), so most ratings sit in the upper half of the scale; only 112 rows fall in the top histogram bin at 4.0. The 50 outliers (1.98%) are all on the low side: the 1.5 × IQR fences are 2.25 and 4.25, and the histogram bins up to 2.05 hold exactly 50 ratings.

review_date high anthropic:default

This column contains review years, stored as numeric integers spanning 2006 to 2021 — a 16-year range with only 16 distinct values, confirming it is a year-granularity timestamp rather than a full date. The distribution is nearly symmetric (skew −0.18, kurtosis −0.77) with a median of 2015 and an IQR of 6 years, suggesting fairly even coverage across the mid-2010s. Notably, 2530 rows collapse into just 16 discrete year values, meaning this field carries no finer temporal resolution and may limit time-series analyses that require month- or day-level precision.

company_location high anthropic:default

This column encodes the country of company headquarters across 2,530 records, with 67 distinct country values and zero nulls. The distribution is heavily US-dominated: 'U.S.A.' alone accounts for 44.9% of all rows (1,136 of 2,530), about 6.4× the next most frequent country (Canada at 177). The entropy ratio of 0.606 confirms moderate-to-high concentration despite 67 categories, and the presence of both abbreviations ('U.S.A.', 'U.K.') and full names ('Canada', 'France') suggests inconsistent formatting that may complicate grouping or joining.
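The entropy ratio can be read as Shannon entropy in bits divided by its maximum for the column's cardinality, log2 of the number of distinct values. This definition is inferred from the reported figures (3.675 / log2(67) ≈ 0.606) rather than taken from saturn's documentation. A minimal sketch:

```python
import math
from collections import Counter

def entropy_bits(values):
    """Shannon entropy of the value distribution, in bits."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def entropy_ratio(values):
    """Entropy divided by log2(cardinality); 1.0 means perfectly uniform."""
    cardinality = len(set(values))
    if cardinality < 2:
        return 0.0  # a single-valued column carries no information
    return entropy_bits(values) / math.log2(cardinality)

# Reported figures for company_location: entropy 3.675 over 67 distinct values.
print(round(3.675 / math.log2(67), 3))  # 0.606
```

The same check reproduces the other categorical columns' ratios to the reported precision, for example 4.717 / log2(62) ≈ 0.792 for country_of_bean_origin.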

country_of_bean_origin high anthropic:default

This column records the country of origin for cacao beans used in chocolate production, covering 62 distinct origins across 2,530 rows with no nulls. The distribution is fairly broad (entropy ratio 0.79), with Venezuela leading at exactly 10% (253 rows), followed closely by Peru (244) and Dominican Republic (226) — no single origin dominates heavily. Notably, 'Blend' appears as a pseudo-origin with 156 entries, meaning ~6% of records are multi-origin mixtures rather than single-country sourced beans, which may need special handling in origin-based analyses.

ingredients high anthropic:default

This column encodes a structured ingredient combination label for each record, consisting of a count prefix (e.g., '3-') followed by abbreviated ingredient codes (B, S, C, L, V, Sa). With only 22 distinct values across 2,530 rows it functions as a categorical feature rather than free text. Notably, 87 rows carry an empty string despite a reported null_rate of 0.0, which is a hidden missingness issue that needs handling. The top value '3- B,S,C' dominates at ~39.5% of rows, and starred variants (e.g., 'B,S*') suggest a meaningful sub-type modifier that distinguishes at least some categories.
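A minimal sketch of how the label might be parsed while treating the empty string as missing. The function name `parse_ingredients` and the choice to return `None` for blanks are assumptions; the meaning of the ingredient codes and of the `*` marker is not documented in this profile, so the codes are kept as opaque strings.

```python
def parse_ingredients(label):
    """Split '3- B,S,C' into (3, ['B', 'S', 'C']); return None for blank labels."""
    if label is None or not label.strip():
        return None  # empty strings are missing values that the null count overlooks
    count_text, _, codes_text = label.partition("-")
    codes = [code.strip() for code in codes_text.split(",") if code.strip()]
    return int(count_text), codes

# Labels taken from the top-values table, including one without a space after the dash.
for raw in ["3- B,S,C", "5-B,S,C,V,Sa", "2- B,S*", ""]:
    print(repr(raw), "->", parse_ingredients(raw))
```

Comparing the parsed count with `len(codes)` is a cheap consistency check on each label.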

ref medium anthropic:default

This column appears to be a numeric reference or ID code stored as a categorical string, likely a ticket number, document reference, or external record identifier. With 630 unique values across 2,530 rows, the average reuse rate is ~4 rows per value, and the entropy ratio of 0.9954 is nearly maximal, indicating an almost-uniform distribution with no dominant category. The most frequent value ('414') appears only 10 times (top_rate ≈ 0.004), confirming no single reference dominates—but the non-unique nature rules out a pure primary key, suggesting these are foreign references that recur legitimately.

Numeric correlation

Pearson correlation across 3 numeric columns (values clipped to 2 decimals).
	review_date	cocoa_percent	rating
review_date	+1.00	+0.09	+0.13
cocoa_percent	+0.09	+1.00	-0.14
rating	+0.13	-0.14	+1.00
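For reference, a Pearson coefficient like those in the table can be computed from two equal-length columns as sketched below. The toy numbers are made up for illustration and are not drawn from the dataset.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Note: a constant column has zero variance and would divide by zero here.
    return cov / math.sqrt(var_x * var_y)

# Toy data (hypothetical): higher cocoa paired with lower rating.
cocoa = [70, 72, 75, 85, 100]
rating = [3.5, 3.25, 3.0, 2.75, 2.0]
print(round(pearson(cocoa, rating), 2))
```

All three off-diagonal coefficients in the table have magnitude below 0.15, so none of the numeric columns is a strong linear predictor of another.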

ref categorical

rows	2,530
null	0 (0.0%)
unique	630
top_value	414
top_rate	3.95e-03
cardinality	630
entropy	9.257
entropy_ratio	0.995

Top values for ref (20 unique shown, of 630 total).
value	count	share
414	10	0.4%
24	9	0.4%
404	9	0.4%
387	9	0.4%
1462	8	0.3%
1454	8	0.3%
431	8	0.3%
439	8	0.3%
1450	8	0.3%
552	8	0.3%
1458	8	0.3%
1466	8	0.3%
370	7	0.3%
502	7	0.3%
636	7	0.3%
572	7	0.3%
355	7	0.3%
486	7	0.3%
478	7	0.3%
377	7	0.3%

company categorical

top value is 100.0% of rows

rows	2,530
null	0 (0.0%)
unique	1
top_value	(empty string)
top_rate	1.000
cardinality	1
entropy	0.000
entropy_ratio	0.000

Top values for company (1 unique shown, of 1 total).
value	count	share
(empty string)	2530	100.0%

company_location categorical

rows	2,530
null	0 (0.0%)
unique	67
top_value	U.S.A.
top_rate	0.449
cardinality	67
entropy	3.675
entropy_ratio	0.606

Top values for company_location (20 unique shown, of 67 total).
value	count	share
U.S.A.	1136	44.9%
Canada	177	7.0%
France	176	7.0%
U.K.	133	5.3%
Italy	78	3.1%
Belgium	63	2.5%
Ecuador	58	2.3%
Australia	53	2.1%
Switzerland	44	1.7%
Germany	42	1.7%
Spain	36	1.4%
Venezuela	31	1.2%
Japan	31	1.2%
Denmark	31	1.2%
Austria	30	1.2%
Colombia	29	1.1%
New Zealand	27	1.1%
Hungary	26	1.0%
Brazil	25	1.0%
Peru	23	0.9%

review_date numeric

rows	2,530
null	0 (0.0%)
unique	16
min	2006
max	2021
mean	2014
median	2015
std	3.968
q1	2012
q3	2018
iqr	6.000
skew	-0.183
kurtosis	-0.773
n_outliers	0
outlier_rate	0.000
zero_rate	0.000

Histogram for review_date, one row per year (median: 2015).
year	count
2006	62
2007	73
2008	92
2009	123
2010	110
2011	163
2012	194
2013	183
2014	247
2015	284
2016	217
2017	105
2018	228
2019	193
2020	81
2021	175
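
Because review_date holds 16 integer years, spreading it over 40 equal-width bins (the bin count used by the numeric histograms in this report) can fill at most 16 of them; the other 24 are empty by construction, not because of gaps in the data. A sketch of that arithmetic, assuming equal-width bins with the last bin closed on the right; saturn's actual binning code is not shown here.

```python
def bin_index(value, lo, hi, n_bins):
    """Index of the equal-width bin holding value; the last bin includes hi."""
    if value == hi:
        return n_bins - 1
    width = (hi - lo) / n_bins
    return int((value - lo) / width)

lo, hi, n_bins = 2006, 2021, 40  # each bin is 0.375 years wide
occupied = {bin_index(year, lo, hi, n_bins) for year in range(lo, hi + 1)}
print(len(occupied), n_bins - len(occupied))  # 16 24
```

For a year-granularity column, a per-year count is therefore the more natural view than a fixed-width histogram.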

country_of_bean_origin categorical

rows	2,530
null	0 (0.0%)
unique	62
top_value	Venezuela
top_rate	0.100
cardinality	62
entropy	4.717
entropy_ratio	0.792

Top values for country_of_bean_origin (20 unique shown, of 62 total).
value	count	share
Venezuela	253	10.0%
Peru	244	9.6%
Dominican Republic	226	8.9%
Ecuador	219	8.7%
Madagascar	177	7.0%
Blend	156	6.2%
Nicaragua	100	4.0%
Bolivia	80	3.2%
Tanzania	79	3.1%
Colombia	79	3.1%
Brazil	78	3.1%
Belize	76	3.0%
Vietnam	73	2.9%
Guatemala	62	2.5%
Mexico	55	2.2%
Papua New Guinea	50	2.0%
Costa Rica	43	1.7%
Trinidad	42	1.7%
Ghana	41	1.6%
India	35	1.4%

specific_bean_origin text

33.8% of rows are a single word; 36.6% duplicate strings

rows	2,530
null	0 (0.0%)
unique	1,605
len_min	3
len_max	51
len_mean	17.115
len_median	14.000
len_p95	39.000
word_mean	2.681
word_median	2.000
n_empty	0
n_duplicates	925
duplicate_rate	0.366
vocab_size	2,079
readability_flesch_mean	28.412
emoji_rate	0.000
url_rate	0.000
one_word_rate	0.338
allcaps_rate	1.58e-03
boilerplate_rate	0.000

Character-length distribution for specific_bean_origin (mean: 17.115).
chars	count
3 – 4	86
4 – 5	106
5 – 7	152
7 – 8	211
8 – 9	142
9 – 10	306
10 – 11	60
11 – 13	71
13 – 14	79
14 – 15	54
15 – 16	143
16 – 17	73
17 – 19	98
19 – 20	65
20 – 21	64
21 – 22	119
22 – 23	48
23 – 25	57
25 – 26	42
26 – 27	45
27 – 28	84
28 – 29	37
29 – 31	39
31 – 32	35
32 – 33	29
33 – 34	58
34 – 35	18
35 – 37	23
37 – 38	30
38 – 39	21
39 – 40	33
40 – 41	13
41 – 43	16
43 – 44	14
44 – 45	13
45 – 46	23
46 – 47	8
47 – 49	12
49 – 50	1
50 – 51	2

Sample values (first 10)

Matasawalevu, batch 1
Crayfish Bay Estate, 2014
Hawai'i Island, Big Island
Kokoa Kamili Coop
Duarte Province, El Cibao, batch 10
Maya Mountain, 2017, batch 255
Campesino w/ nibs
Amazonas
Ghana
Jamaica

cocoa_percent numeric

9.3% of rows beyond 1.5 × IQR

rows	2,530
null	0 (0.0%)
unique	46
min	42.000
max	100.000
mean	71.640
median	70.000
std	5.617
q1	70.000
q3	74.000
iqr	4.000
skew	1.198
kurtosis	6.541
n_outliers	235
outlier_rate	0.093
zero_rate	0.000

Histogram bins for cocoa_percent (median: 70.0).
bin	count
42 – 43.45	1
43.45 – 44.9	0
44.9 – 46.35	1
46.35 – 47.8	0
47.8 – 49.25	0
49.25 – 50.7	1
50.7 – 52.15	0
52.15 – 53.6	1
53.6 – 55.05	16
55.05 – 56.5	2
56.5 – 57.95	1
57.95 – 59.4	8
59.4 – 60.85	47
60.85 – 62.3	23
62.3 – 63.75	14
63.75 – 65.2	124
65.2 – 66.65	28
66.65 – 68.1	106
68.1 – 69.55	13
69.55 – 71	1046
71 – 72.45	340
72.45 – 73.9	72
73.9 – 75.35	377
75.35 – 76.8	35
76.8 – 78.25	63
78.25 – 79.7	2
79.7 – 81.15	95
81.15 – 82.6	18
82.6 – 84.05	9
84.05 – 85.5	40
85.5 – 86.95	1
86.95 – 88.4	9
88.4 – 89.85	2
89.85 – 91.3	12
91.3 – 92.75	0
92.75 – 94.2	0
94.2 – 95.65	0
95.65 – 97.1	0
97.1 – 98.55	0
98.55 – 100	23

ingredients categorical

rows	2,530
null	0 (0.0%)
unique	22
top_value	3- B,S,C
top_rate	0.395
cardinality	22
entropy	2.430
entropy_ratio	0.545

Top values for ingredients (20 unique shown, of 22 total).
value	count	share
3- B,S,C	999	39.5%
2- B,S	718	28.4%
4- B,S,C,L	286	11.3%
5- B,S,C,V,L	184	7.3%
4- B,S,C,V	141	5.6%
(empty string)	87	3.4%
2- B,S*	31	1.2%
4- B,S*,C,Sa	20	0.8%
3- B,S*,C	12	0.5%
3- B,S,L	8	0.3%
4- B,S*,C,V	7	0.3%
5-B,S,C,V,Sa	6	0.2%
1- B	6	0.2%
4- B,S,V,L	5	0.2%
4- B,S,C,Sa	5	0.2%
6-B,S,C,V,L,Sa	4	0.2%
3- B,S,V	3	0.1%
4- B,S*,V,L	3	0.1%
4- B,S*,C,L	2	0.1%
3- B,S*,Sa	1	0.0%

most_memorable_characteristics text

98.3% of rows are unique strings

rows	2,530
null	0 (0.0%)
unique	2,487
len_min	3
len_max	37
len_mean	23.062
len_median	23.000
len_p95	30.000
word_mean	3.376
word_median	3.000
n_empty	0
n_duplicates	43
duplicate_rate	0.017
vocab_size	868
readability_flesch_mean	49.707
emoji_rate	0.000
url_rate	0.000
one_word_rate	7.11e-03
allcaps_rate	0.000
boilerplate_rate	0.000

Character-length distribution for most_memorable_characteristics, one row per length (mean: 23.062).
chars	count
3	1
4	0
5	4
6	4
7	3
8	1
9	2
10	3
11	9
12	39
13	47
14	52
15	29
16	34
17	69
18	100
19	145
20	206
21	206
22	156
23	173
24	179
25	192
26	201
27	179
28	178
29	121
30	86
31	66
32	23
33	13
34	3
35	4
36	1
37	1

Sample values (first 10)

chewy, off, rubbery
dark berry, mild floral
intense, tannic, choco, earthy
basic cocoa, gateway
dried fruit, orange peel, cocoa
flat, molasses, creamy
XL nibs, sour, cardboard
blackberry, dirt, high roast
sweet, vanilla, cocoa, mold
sandy, woody, spicy

rating numeric

rows	2,530
null	0 (0.0%)
unique	12
min	1.000
max	4.000
mean	3.196
median	3.250
std	0.445
q1	3.000
q3	3.500
iqr	0.500
skew	-0.608
kurtosis	1.053
n_outliers	50
outlier_rate	0.020
zero_rate	0.000

Histogram bins for rating (median: 3.25).
bin	count
1 – 1.075	4
1.075 – 1.15	0
1.15 – 1.225	0
1.225 – 1.3	0
1.3 – 1.375	0
1.375 – 1.45	0
1.45 – 1.525	10
1.525 – 1.6	0
1.6 – 1.675	0
1.675 – 1.75	0
1.75 – 1.825	3
1.825 – 1.9	0
1.9 – 1.975	0
1.975 – 2.05	33
2.05 – 2.125	0
2.125 – 2.2	0
2.2 – 2.275	17
2.275 – 2.35	0
2.35 – 2.425	0
2.425 – 2.5	0
2.5 – 2.575	166
2.575 – 2.65	0
2.65 – 2.725	0
2.725 – 2.8	333
2.8 – 2.875	0
2.875 – 2.95	0
2.95 – 3.025	523
3.025 – 3.1	0
3.1 – 3.175	0
3.175 – 3.25	0
3.25 – 3.325	464
3.325 – 3.4	0
3.4 – 3.475	0
3.475 – 3.55	565
3.55 – 3.625	0
3.625 – 3.7	0
3.7 – 3.775	300
3.775 – 3.85	0
3.85 – 3.925	0
3.925 – 4	112