quirky-chocolate_origins · saturn notebook

Overview

Source: /home/coolhand/html/datavis/data_trove/data/quirky/chocolate_origins.json

Saturn profiled 2,530 rows across 10 columns. The stats below are deterministic and machine-readable; the prose is a language-model interpretation of those stats (opt-in, added after the fact, never sees raw rows).

[2]:

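# Install the profiler (IPython shell escape, so this cell runs in a notebook).
!pip install saturn-dissect
import subprocess
# Re-run the analysis; check=True raises CalledProcessError on a non-zero exit.
subprocess.run([
    "saturn", "analyze", "/home/coolhand/html/datavis/data_trove/data/quirky/chocolate_origins.json",
    "--findings", "quirky-chocolate_origins.json",
    "--llm", "anthropic:claude-opus-4-7",
], check=True)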

Summary confidence: high

This dataset catalogs 2,530 chocolate bar reviews across 10 columns covering bean origins, cocoa percentages, ingredients, ratings, and review metadata. Ratings cluster tightly (median 3.25, IQR 0.5) within an observed range of 1 to 4. Cocoa percent is similarly concentrated around 70% (IQR 4), but 235 rows (9.3%) fall beyond the 1.5 × IQR range and are worth investigating. Geographic skew is notable: U.S.A. accounts for 44.9% of company locations, whereas bean origins are more diverse, led by Venezuela, Peru, and the Dominican Republic. The `company` column is effectively empty (a single blank value across all rows), so it should be excluded from analysis.

citing: row_count · column_count · rating · cocoa_percent · company_location · country_of_bean_origin · ingredients · company · review_date

Out[4]:

saturn.schema() · 10 columns

column	kind	n	null%	unique	alerts
ref	categorical	2,530	0.0%	630
company	categorical	2,530	0.0%	1	imbalance
company_location	categorical	2,530	0.0%	67
review_date	numeric	2,530	0.0%	16
country_of_bean_origin	categorical	2,530	0.0%	62
specific_bean_origin	text	2,530	0.0%	1,605	one_word duplicates
cocoa_percent	numeric	2,530	0.0%	46	outliers
ingredients	categorical	2,530	0.0%	22
most_memorable_characteristics	text	2,530	0.0%	2,487	near_unique
rating	numeric	2,530	0.0%	12

Fig 1.

rating · Check how tightly ratings cluster around 3.25 and whether the left tail of low scores is meaningful.

Histogram bins for rating (median: 3.25).
bin	count
1 – 1.075	4
1.075 – 1.15	0
1.15 – 1.225	0
1.225 – 1.3	0
1.3 – 1.375	0
1.375 – 1.45	0
1.45 – 1.525	10
1.525 – 1.6	0
1.6 – 1.675	0
1.675 – 1.75	0
1.75 – 1.825	3
1.825 – 1.9	0
1.9 – 1.975	0
1.975 – 2.05	33
2.05 – 2.125	0
2.125 – 2.2	0
2.2 – 2.275	17
2.275 – 2.35	0
2.35 – 2.425	0
2.425 – 2.5	0
2.5 – 2.575	166
2.575 – 2.65	0
2.65 – 2.725	0
2.725 – 2.8	333
2.8 – 2.875	0
2.875 – 2.95	0
2.95 – 3.025	523
3.025 – 3.1	0
3.1 – 3.175	0
3.175 – 3.25	0
3.25 – 3.325	464
3.325 – 3.4	0
3.4 – 3.475	0
3.475 – 3.55	565
3.55 – 3.625	0
3.625 – 3.7	0
3.7 – 3.775	300
3.775 – 3.85	0
3.85 – 3.925	0
3.925 – 4	112

Fig 2.

cocoa_percent · Look for the 70% spike and the 9% of bars flagged as outliers above or below the typical range.

Histogram bins for cocoa_percent (median: 70.0).
bin	count
42 – 43.45	1
43.45 – 44.9	0
44.9 – 46.35	1
46.35 – 47.8	0
47.8 – 49.25	0
49.25 – 50.7	1
50.7 – 52.15	0
52.15 – 53.6	1
53.6 – 55.05	16
55.05 – 56.5	2
56.5 – 57.95	1
57.95 – 59.4	8
59.4 – 60.85	47
60.85 – 62.3	23
62.3 – 63.75	14
63.75 – 65.2	124
65.2 – 66.65	28
66.65 – 68.1	106
68.1 – 69.55	13
69.55 – 71	1046
71 – 72.45	340
72.45 – 73.9	72
73.9 – 75.35	377
75.35 – 76.8	35
76.8 – 78.25	63
78.25 – 79.7	2
79.7 – 81.15	95
81.15 – 82.6	18
82.6 – 84.05	9
84.05 – 85.5	40
85.5 – 86.95	1
86.95 – 88.4	9
88.4 – 89.85	2
89.85 – 91.3	12
91.3 – 92.75	0
92.75 – 94.2	0
94.2 – 95.65	0
95.65 – 97.1	0
97.1 – 98.55	0
98.55 – 100	23

Fig 3.

company_location · Note how heavily U.S.A. dominates at ~45% of all reviews, dwarfing Canada and France.

Top values for company_location (20 unique shown, of 67 total).
value	count	share
U.S.A.	1136	44.9%
Canada	177	7.0%
France	176	7.0%
U.K.	133	5.3%
Italy	78	3.1%
Belgium	63	2.5%
Ecuador	58	2.3%
Australia	53	2.1%
Switzerland	44	1.7%
Germany	42	1.7%
Spain	36	1.4%
Venezuela	31	1.2%
Japan	31	1.2%
Denmark	31	1.2%
Austria	30	1.2%
Colombia	29	1.1%
New Zealand	27	1.1%
Hungary	26	1.0%
Brazil	25	1.0%
Peru	23	0.9%

Fig 4.

country_of_bean_origin · See the more balanced spread across Venezuela, Peru, Dominican Republic, and Ecuador as top bean sources.

Top values for country_of_bean_origin (20 unique shown, of 62 total).
value	count	share
Venezuela	253	10.0%
Peru	244	9.6%
Dominican Republic	226	8.9%
Ecuador	219	8.7%
Madagascar	177	7.0%
Blend	156	6.2%
Nicaragua	100	4.0%
Bolivia	80	3.2%
Tanzania	79	3.1%
Colombia	79	3.1%
Brazil	78	3.1%
Belize	76	3.0%
Vietnam	73	2.9%
Guatemala	62	2.5%
Mexico	55	2.2%
Papua New Guinea	50	2.0%
Costa Rica	43	1.7%
Trinidad	42	1.7%
Ghana	41	1.6%
India	35	1.4%

Fig 5.

ingredients · Observe that two recipes (3-ingredient B,S,C and 2-ingredient B,S) account for the majority of bars.

Top values for ingredients (20 unique shown, of 22 total).
value	count	share
3- B,S,C	999	39.5%
2- B,S	718	28.4%
4- B,S,C,L	286	11.3%
5- B,S,C,V,L	184	7.3%
4- B,S,C,V	141	5.6%
(empty string)	87	3.4%
2- B,S*	31	1.2%
4- B,S*,C,Sa	20	0.8%
3- B,S*,C	12	0.5%
3- B,S,L	8	0.3%
4- B,S*,C,V	7	0.3%
5-B,S,C,V,Sa	6	0.2%
1- B	6	0.2%
4- B,S,V,L	5	0.2%
4- B,S,C,Sa	5	0.2%
6-B,S,C,V,L,Sa	4	0.2%
3- B,S,V	3	0.1%
4- B,S*,V,L	3	0.1%
4- B,S*,C,L	2	0.1%
3- B,S*,Sa	1	0.0%

Fig 6.

Per-column null rate across the corpus. Columns are ordered by input position.

Per-column null rate across the corpus.
column	kind	null %
ref	categorical	0.0%
company	categorical	0.0%
company_location	categorical	0.0%
review_date	numeric	0.0%
country_of_bean_origin	categorical	0.0%
specific_bean_origin	text	0.0%
cocoa_percent	numeric	0.0%
ingredients	categorical	0.0%
most_memorable_characteristics	text	0.0%
rating	numeric	0.0%

Fig 7.

Pearson correlation across numeric columns (sampled, bounded).

Pearson correlation across 3 numeric columns (values clipped to 2 decimals).
	review_date	cocoa_percent	rating
review_date	+1.00	+0.09	+0.13
cocoa_percent	+0.09	+1.00	-0.14
rating	+0.13	-0.14	+1.00

ref categorical foreign_key

`ref` is a high-cardinality categorical with 630 distinct values across 2530 rows and no nulls, with entropy ratio 0.9954 indicating a near-uniform distribution. Values are short numeric strings (e.g. "414", "24", "1462") and the most frequent appears only 10 times (top_rate 0.0040), so this behaves like a reference/lookup id repeated a handful of times rather than a free-form feature.

Treatment: Treat as a foreign key and left-join to its reference table, if one is available, rather than one-hot encoding.

anthropic:claude-opus-4-7 · confidence high

Out[13]:

saturn.columns["ref"].stats

stat	value
n	2,530
nulls	0 (0.0%)
unique	630
top_value	414
top_rate	0.003953
cardinality	630
entropy	9.257
entropy_ratio	0.9954
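
The entropy figures in these stats tables are consistent with Shannon entropy measured in bits, divided by its maximum for the observed cardinality (log2 of the number of distinct values). The sketch below assumes that definition; saturn's own implementation is not shown in this notebook, and `entropy_stats` is a hypothetical helper name.

```python
import math
from collections import Counter

def entropy_stats(values):
    """Return (entropy in bits, entropy ratio) for a sequence of categorical values."""
    counts = Counter(values)
    n = sum(counts.values())
    k = len(counts)
    if k < 2:
        # A constant (or empty) column has zero entropy; define the ratio as 0.
        return 0.0, 0.0
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return entropy, entropy / math.log2(k)

# Cross-check against the reported stats for `ref`: 9.257 bits over 630 distinct values.
ref_ratio = 9.257 / math.log2(630)
print(round(ref_ratio, 4))
```

Under this assumption the recomputed ratio agrees with the reported 0.9954 to about three decimal places; the small remainder comes from the entropy itself being rounded to 9.257.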

Fig 8.

Top values for ref.

Top values for ref (20 unique shown, of 630 total).
value	count	share
414	10	0.4%
24	9	0.4%
404	9	0.4%
387	9	0.4%
1462	8	0.3%
1454	8	0.3%
431	8	0.3%
439	8	0.3%
1450	8	0.3%
552	8	0.3%
1458	8	0.3%
1466	8	0.3%
370	7	0.3%
502	7	0.3%
636	7	0.3%
572	7	0.3%
355	7	0.3%
486	7	0.3%
478	7	0.3%
377	7	0.3%

company categorical metadata

The column is labelled 'company' but contains a single value—an empty string—across all 2530 rows. Cardinality is 1, entropy is 0, and top_rate is 1.0, so it carries no information. This is effectively a placeholder field that was never populated.

Treatment: Drop; constant empty value provides no signal.

anthropic:claude-opus-4-7 · confidence high
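
Dropping a constant column such as `company` can be done generically rather than by name. This is a minimal sketch over rows held as plain dictionaries; the helper names and the two sample rows are illustrative only and are not taken from the dataset.

```python
def constant_columns(rows):
    """Return the column names whose value is identical in every row."""
    if not rows:
        return []
    return [col for col in rows[0] if len({row[col] for row in rows}) == 1]

def drop_columns(rows, to_drop):
    """Return copies of the rows without the listed columns."""
    drop = set(to_drop)
    return [{k: v for k, v in row.items() if k not in drop} for row in rows]

# Made-up rows that mirror the empty `company` field described above.
sample = [
    {"company": "", "rating": 3.25},
    {"company": "", "rating": 3.5},
]
dead = constant_columns(sample)
cleaned = drop_columns(sample, dead)
```

Detecting constants by value rather than by null count matters here, because the empty strings are not reported as nulls.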

Out[16]:

saturn.columns["company"].stats

stat	value
n	2,530
nulls	0 (0.0%)
unique	1
top_value	(empty string)
top_rate	1
cardinality	1
entropy	0
entropy_ratio	0
alert: imbalance	top value is 100.0% of rows

Fig 9.

Top values for company.

Top values for company (1 unique shown, of 1 total).
value	count	share
(empty string)	2530	100.0%

company_location categorical feature

Country of the chocolate maker, with 67 distinct locations and no nulls across 2530 rows. Heavily US-centric: 'U.S.A.' accounts for 44.9% (1136 rows), followed by Canada (177), France (176), and the U.K. (133), giving an entropy ratio of 0.61. Country names use abbreviated forms ('U.S.A.', 'U.K.') so any joins on canonical country lists will need normalisation.

Treatment: Normalise country labels and group long-tail countries before one-hot or target encoding.

anthropic:claude-opus-4-7 · confidence high
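
One way to apply the treatment above is to map the abbreviated labels to canonical names first and only then fold the long tail into a single bucket. The alias map, the 1% default threshold, and the function name below are all assumptions for illustration; the sample list is made up.

```python
from collections import Counter

# Hypothetical alias map; extend it to match the canonical country list you join against.
ALIASES = {"U.S.A.": "United States", "U.K.": "United Kingdom"}

def normalise_and_group(labels, min_share=0.01, other="Other"):
    """Map abbreviated labels to canonical names, then fold rare ones into `other`."""
    canonical = [ALIASES.get(label.strip(), label.strip()) for label in labels]
    counts = Counter(canonical)
    total = len(canonical)
    keep = {name for name, c in counts.items() if c / total >= min_share}
    return [name if name in keep else other for name in canonical]

# Tiny made-up sample: 60% U.S.A., 30% U.K., 10% a rare location.
sample = ["U.S.A."] * 6 + ["U.K."] * 3 + ["Fiji"]
grouped = normalise_and_group(sample, min_share=0.2)
```

Normalising before counting keeps spelling variants of the same country from being split across the threshold.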

Out[19]:

saturn.columns["company_location"].stats

stat	value
n	2,530
nulls	0 (0.0%)
unique	67
top_value	U.S.A.
top_rate	0.449
cardinality	67
entropy	3.675
entropy_ratio	0.6058

Fig 10.

Top values for company_location.

Top values for company_location (20 unique shown, of 67 total).
value	count	share
U.S.A.	1136	44.9%
Canada	177	7.0%
France	176	7.0%
U.K.	133	5.3%
Italy	78	3.1%
Belgium	63	2.5%
Ecuador	58	2.3%
Australia	53	2.1%
Switzerland	44	1.7%
Germany	42	1.7%
Spain	36	1.4%
Venezuela	31	1.2%
Japan	31	1.2%
Denmark	31	1.2%
Austria	30	1.2%
Colombia	29	1.1%
New Zealand	27	1.1%
Hungary	26	1.0%
Brazil	25	1.0%
Peru	23	0.9%

review_date numeric timestamp

This column stores the year a review was recorded, ranging from 2006 to 2021 with only 16 unique values across 2530 rows. The distribution is centered around 2015 (mean 2014.37, median 2015) with a modest spread (std 3.97) and is roughly symmetric (skew -0.18). No nulls or outliers are present.

Treatment: Treat as a year-level temporal feature; bin or convert to datetime for trend analysis.

anthropic:claude-opus-4-7 · confidence high
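
Both options in the treatment above are short to express. The sketch below converts a year to a date (January 1 is an arbitrary convention, since only the year is recorded) and assigns four-year buckets; the bucket width and function names are assumptions chosen so that 2006 to 2021 divides into four equal bins.

```python
from datetime import date

def year_to_date(year):
    """Represent a year-level value as January 1 of that year."""
    return date(int(year), 1, 1)

def year_bucket(year, start=2006, width=4):
    """Label the `width`-year bucket containing `year`, counting from `start`."""
    lo = start + ((int(year) - start) // width) * width
    return f"{lo}-{lo + width - 1}"

buckets = [year_bucket(y) for y in (2006, 2015, 2021)]
```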

Out[22]:

saturn.columns["review_date"].stats

stat	value
n	2,530
nulls	0 (0.0%)
unique	16
min	2006
max	2021
mean	2014.37
median	2015
std	3.968
q1	2012
q3	2018
iqr	6
skew	-0.1833
kurtosis	-0.7727
n_outliers	0
outlier_rate	0
zero_rate	0

Fig 11.

Distribution of review_date. Vertical dash marks the median.

Histogram bins for review_date (median: 2015), one bin per distinct year.
year	count
2006	62
2007	73
2008	92
2009	123
2010	110
2011	163
2012	194
2013	183
2014	247
2015	284
2016	217
2017	105
2018	228
2019	193
2020	81
2021	175

country_of_bean_origin categorical feature

Categorical country label identifying where the cocoa beans originated, with 62 distinct values across 2,530 rows and no nulls. The distribution is broad rather than concentrated: the top value, Venezuela, accounts for only 10% of rows, and an entropy ratio of 0.79 confirms a fairly even spread across many origins. One wrinkle: 'Blend' is the 6th most common value (156 rows), so some entries do not name a single country and need special handling.

Treatment: Group rare origins into 'Other', isolate 'Blend' as its own category, then one-hot or target-encode.

anthropic:claude-opus-4-7 · confidence high
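
A minimal sketch of the treatment above: rare origins are folded into 'Other', 'Blend' is always kept as its own category regardless of frequency, and the result is one-hot encoded. The `min_count` threshold, helper names, and sample values are illustrative assumptions.

```python
from collections import Counter

def group_origins(origins, min_count=2, keep_always=("Blend",), other="Other"):
    """Fold origins seen fewer than `min_count` times into `other`, except protected labels."""
    counts = Counter(origins)
    keep = {o for o, c in counts.items() if c >= min_count} | set(keep_always)
    return [o if o in keep else other for o in origins]

def one_hot(labels):
    """Return (sorted category list, list of 0/1 indicator rows)."""
    cats = sorted(set(labels))
    index = {c: i for i, c in enumerate(cats)}
    rows = []
    for label in labels:
        row = [0] * len(cats)
        row[index[label]] = 1
        rows.append(row)
    return cats, rows

# Made-up sample: 'Blend' appears once but is protected; 'Fiji' is rare and gets folded.
sample = ["Venezuela", "Venezuela", "Peru", "Peru", "Blend", "Fiji"]
grouped = group_origins(sample)
cats, encoded = one_hot(grouped)
```

Protecting 'Blend' explicitly keeps it from being merged with genuinely rare single-country origins, which would blur two different kinds of bar.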

Out[25]:

saturn.columns["country_of_bean_origin"].stats

stat	value
n	2,530
nulls	0 (0.0%)
unique	62
top_value	Venezuela
top_rate	0.1
cardinality	62
entropy	4.717
entropy_ratio	0.7921

Fig 12.

Top values for country_of_bean_origin.

Top values for country_of_bean_origin (20 unique shown, of 62 total).
value	count	share
Venezuela	253	10.0%
Peru	244	9.6%
Dominican Republic	226	8.9%
Ecuador	219	8.7%
Madagascar	177	7.0%
Blend	156	6.2%
Nicaragua	100	4.0%
Bolivia	80	3.2%
Tanzania	79	3.1%
Colombia	79	3.1%
Brazil	78	3.1%
Belize	76	3.0%
Vietnam	73	2.9%
Guatemala	62	2.5%
Mexico	55	2.2%
Papua New Guinea	50	2.0%
Costa Rica	43	1.7%
Trinidad	42	1.7%
Ghana	41	1.6%
India	35	1.4%

specific_bean_origin text feature

This column captures the specific bean origin (region, estate, or country), with 1,605 unique values across 2,530 rows. The most frequent values are country names such as Madagascar (55), Ecuador (43), and Peru (41), but the word 'batch' occurs 356 times, which suggests that many entries combine an origin name with a batch identifier and so inflate uniqueness. Roughly 34% of values are single words and 37% are duplicates, indicating inconsistent granularity: some entries are broad countries, others are specific estates or batch-tagged labels.

Treatment: Normalize by stripping batch suffixes and standardizing to country/region before encoding.

anthropic:claude-opus-4-7 · confidence high

Out[28]:

saturn.columns["specific_bean_origin"].stats

stat	value
n	2,530
nulls	0 (0.0%)
unique	1,605
len_min	3
len_max	51
len_mean	17.12
len_median	14
len_p95	39
word_mean	2.681
word_median	2
n_empty	0
n_duplicates	925
duplicate_rate	0.3656
vocab_size	2,079
readability_flesch_mean	28.41
emoji_rate	0
url_rate	0
one_word_rate	0.3375
allcaps_rate	0.001581
boilerplate_rate	0
alert: one_word	33.8% rows are a single word
alert: duplicates	36.6% duplicate strings

Fig 13.

Character-length distribution for specific_bean_origin.

Character-length distribution for specific_bean_origin (mean: 17.12).
chars	count
3 – 4	86
4 – 5	106
5 – 7	152
7 – 8	211
8 – 9	142
9 – 10	306
10 – 11	60
11 – 13	71
13 – 14	79
14 – 15	54
15 – 16	143
16 – 17	73
17 – 19	98
19 – 20	65
20 – 21	64
21 – 22	119
22 – 23	48
23 – 25	57
25 – 26	42
26 – 27	45
27 – 28	84
28 – 29	37
29 – 31	39
31 – 32	35
32 – 33	29
33 – 34	58
34 – 35	18
35 – 37	23
37 – 38	30
38 – 39	21
39 – 40	33
40 – 41	13
41 – 43	16
43 – 44	14
44 – 45	13
45 – 46	23
46 – 47	8
47 – 49	12
49 – 50	1
50 – 51	2

cocoa_percent numeric feature

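This is the cocoa percentage of each chocolate bar, ranging from 42 to 100 with a tight median of 70 and an IQR of just 4. The distribution is right-skewed (skew 1.20, kurtosis 6.54), and 9.3% of rows (235) are flagged as outliers. With Q1 = 70 and Q3 = 74, the 1.5 × IQR fences sit at 64 and 80, and the flagged rows split almost evenly between the two tails rather than sitting only at the high end: the histogram shows at least 115 bars below 64% and at least 114 above 80%, including 23 in the top bin near 100%. With only 46 unique values across 2,530 rows, the field is effectively semi-discrete.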

Treatment: Use as-is or bin into cocoa-strength buckets; no transform needed given the narrow IQR.

anthropic:claude-opus-4-7 · confidence high
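
The outlier alert for this column refers to rows "beyond 1.5 IQR". Assuming that means the standard Tukey fences, the cut-offs follow directly from the reported quartiles, and the same fences give a simple three-way bucketing. The bucket names and function names below are hypothetical.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return the (lower, upper) fences at k times the IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Quartiles taken from the cocoa_percent stats: q1 = 70, q3 = 74.
LOW, HIGH = tukey_fences(70, 74)

def strength_bucket(pct, low=LOW, high=HIGH):
    """Label a cocoa percentage relative to the fences; values on a fence count as typical."""
    if pct < low:
        return "low"
    if pct > high:
        return "high"
    return "typical"

labels = [strength_bucket(p) for p in (55, 64, 70, 80, 100)]
```

With these quartiles the fences land at 64 and 80, so a 55% bar and a 100% bar are both flagged while the 70% mode is not.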

Out[31]:

saturn.columns["cocoa_percent"].stats

stat	value
n	2,530
nulls	0 (0.0%)
unique	46
min	42
max	100
mean	71.64
median	70
std	5.617
q1	70
q3	74
iqr	4
skew	1.198
kurtosis	6.541
n_outliers	235
outlier_rate	0.09289
zero_rate	0
alert: outliers	9.3% rows beyond 1.5 IQR

Fig 14.

Distribution of cocoa_percent. Vertical dash marks the median.

Histogram bins for cocoa_percent (median: 70.0).
bin	count
42 – 43.45	1
43.45 – 44.9	0
44.9 – 46.35	1
46.35 – 47.8	0
47.8 – 49.25	0
49.25 – 50.7	1
50.7 – 52.15	0
52.15 – 53.6	1
53.6 – 55.05	16
55.05 – 56.5	2
56.5 – 57.95	1
57.95 – 59.4	8
59.4 – 60.85	47
60.85 – 62.3	23
62.3 – 63.75	14
63.75 – 65.2	124
65.2 – 66.65	28
66.65 – 68.1	106
68.1 – 69.55	13
69.55 – 71	1046
71 – 72.45	340
72.45 – 73.9	72
73.9 – 75.35	377
75.35 – 76.8	35
76.8 – 78.25	63
78.25 – 79.7	2
79.7 – 81.15	95
81.15 – 82.6	18
82.6 – 84.05	9
84.05 – 85.5	40
85.5 – 86.95	1
86.95 – 88.4	9
88.4 – 89.85	2
89.85 – 91.3	12
91.3 – 92.75	0
92.75 – 94.2	0
94.2 – 95.65	0
95.65 – 97.1	0
97.1 – 98.55	0
98.55 – 100	23

ingredients categorical feature

This is a coded recipe field: each value gives an ingredient count followed by comma-separated tokens (e.g. '3- B,S,C'). Given the chocolate context, the tokens most plausibly stand for B = beans, S = sugar, S* = sweetener other than sugar, C = cocoa butter, V = vanilla, L = lecithin, and Sa = salt, although the source file does not include a legend. With only 22 distinct combinations across 2,530 rows and a top value ('3- B,S,C') covering 39.5% of records, the field is highly concentrated: entropy_ratio is just 0.545. The count prefix is formatted inconsistently ('3- B,S,C' but '5-B,S,C,V,Sa'). Notably, 87 rows carry an empty string rather than null, so null_rate=0.0 understates true missingness.

Treatment: Treat empty strings as missing, strip the leading count prefix, then one-hot encode the individual ingredient tokens (split on comma) rather than the raw combined string.

anthropic:claude-opus-4-7 · confidence high
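
A sketch of that parsing step, assuming the value format seen in the top-values table: an optional count, a hyphen with or without a trailing space, then comma-separated tokens. The function names are hypothetical. Empty strings map to `None` so they can be handled as missing rather than as an all-zero recipe.

```python
def parse_ingredients(raw):
    """Parse a value such as '4- B,S*,C,Sa' into a token list, or None if blank."""
    text = raw.strip()
    if not text:
        return None  # empty string treated as missing
    # Drop the leading count ("4-" or "4- ") when a hyphen is present.
    _, sep, tail = text.partition("-")
    body = tail if sep else text
    return [tok.strip() for tok in body.split(",") if tok.strip()]

def token_indicators(values):
    """Return (sorted token vocabulary, per-row {token: 0/1} dicts, None for missing rows)."""
    parsed = [parse_ingredients(v) for v in values]
    vocab = sorted({tok for toks in parsed if toks for tok in toks})
    rows = [
        None if toks is None else {tok: int(tok in toks) for tok in vocab}
        for toks in parsed
    ]
    return vocab, rows

vocab, rows = token_indicators(["3- B,S,C", "", "2- B,S*"])
```

Tokens are compared as whole strings, so 'S' and 'S*' stay distinct indicators.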

Out[34]:

saturn.columns["ingredients"].stats

stat	value
n	2,530
nulls	0 (0.0%)
unique	22
top_value	3- B,S,C
top_rate	0.3949
cardinality	22
entropy	2.43
entropy_ratio	0.545

Fig 15.

Top values for ingredients.

Top values for ingredients (20 unique shown, of 22 total).
value	count	share
3- B,S,C	999	39.5%
2- B,S	718	28.4%
4- B,S,C,L	286	11.3%
5- B,S,C,V,L	184	7.3%
4- B,S,C,V	141	5.6%
(empty string)	87	3.4%
2- B,S*	31	1.2%
4- B,S*,C,Sa	20	0.8%
3- B,S*,C	12	0.5%
3- B,S,L	8	0.3%
4- B,S*,C,V	7	0.3%
5-B,S,C,V,Sa	6	0.2%
1- B	6	0.2%
4- B,S,V,L	5	0.2%
4- B,S,C,Sa	5	0.2%
6-B,S,C,V,L,Sa	4	0.2%
3- B,S,V	3	0.1%
4- B,S*,V,L	3	0.1%
4- B,S*,C,L	2	0.1%
3- B,S*,Sa	1	0.0%

most_memorable_characteristics text free_text

Short free-text tasting notes (mean 23 characters, about 3 words) describing flavor and texture, with top tokens such as 'cocoa', 'sweet', 'nutty', 'creamy', and 'fruit'. Values are near-unique (2,487 distinct of 2,530) yet built from a small vocabulary of 868 words, indicating comma-separated descriptor combinations rather than prose. There are only 43 exact duplicates and no empties or URLs; the mean Flesch readability score of 49.7 is not meaningful for strings this short.

Treatment: Split on commas into descriptor tags and one-hot or embed for modelling.

anthropic:claude-opus-4-7 · confidence high
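
Splitting the notes into tags takes only a few lines. The sketch below assumes commas are the only delimiter and lowercases each tag so that case variants are counted together; the function names and sample notes are made up for illustration.

```python
from collections import Counter

def split_tags(note):
    """Split a comma-separated tasting note into trimmed, lowercased tags."""
    return [t.strip().lower() for t in note.split(",") if t.strip()]

def tag_counts(notes):
    """Count how often each tag occurs across all notes."""
    return Counter(tag for note in notes for tag in split_tags(note))

# Made-up notes in the comma-separated style described above.
sample = ["creamy, nutty, cocoa", "sweet, cocoa", "Nutty , fruit"]
counts = tag_counts(sample)
```

The resulting counts can drive a one-hot encoding limited to the most frequent tags, which keeps the feature space far smaller than the 2,487 distinct raw strings.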

Out[37]:

saturn.columns["most_memorable_characteristics"].stats

stat	value
n	2,530
nulls	0 (0.0%)
unique	2,487
len_min	3
len_max	37
len_mean	23.06
len_median	23
len_p95	30
word_mean	3.376
word_median	3
n_empty	0
n_duplicates	43
duplicate_rate	0.017
vocab_size	868
readability_flesch_mean	49.71
emoji_rate	0
url_rate	0
one_word_rate	0.007115
allcaps_rate	0
boilerplate_rate	0
alert: near_unique	98.3% of rows are unique strings

Fig 16.

Character-length distribution for most_memorable_characteristics.

Character-length distribution for most_memorable_characteristics (mean: 23.06).
chars	count
3	1
4	0
5	4
6	4
7	3
8	1
9	2
10	3
11	9
12	39
13	47
14	52
15	29
16	34
17	69
18	100
19	145
20	206
21	206
22	156
23	173
24	179
25	192
26	201
27	179
28	178
29	121
30	86
31	66
32	23
33	13
34	3
35	4
36	1
37	1

rating numeric feature

A bounded numeric rating with an observed range of 1.0 to 4.0 and only 12 distinct values; the histogram is consistent with quarter-step increments. The distribution is tight (IQR 0.5, std 0.45) and left-skewed (skew -0.61), centered near 3.25 with a mean of 3.20. The 50 low-end outliers (1.98%) are the ratings below the lower 1.5 × IQR fence of 2.25 (3 − 1.5 × 0.5). No nulls or zeros, so every row carries a usable score.

Treatment: Use as-is as an ordinal/numeric feature; consider treating the 50 low-end outliers separately if modelling tails.

anthropic:claude-opus-4-7 · confidence high
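
The outlier count and mean for this column can be reproduced from the histogram alone. The sketch below assumes each non-empty bin holds a single quarter-step rating (12 non-empty bins, 12 distinct values) and that outliers are defined by the usual 1.5 × IQR fence; the names are hypothetical.

```python
# Rating value -> count, read off the non-empty histogram bins
# (assumption: one quarter-step value per non-empty bin).
RATING_COUNTS = {
    1.0: 4, 1.5: 10, 1.75: 3, 2.0: 33, 2.25: 17, 2.5: 166,
    2.75: 333, 3.0: 523, 3.25: 464, 3.5: 565, 3.75: 300, 4.0: 112,
}

Q1, Q3 = 3.0, 3.5                      # quartiles from the stats table
LOWER_FENCE = Q1 - 1.5 * (Q3 - Q1)     # ratings strictly below this are flagged

def is_low_outlier(rating):
    """True when a rating falls below the lower 1.5 x IQR fence."""
    return rating < LOWER_FENCE

n_rows = sum(RATING_COUNTS.values())
n_low = sum(c for r, c in RATING_COUNTS.items() if is_low_outlier(r))
mean_rating = sum(r * c for r, c in RATING_COUNTS.items()) / n_rows
```

Under these assumptions the counts sum to 2,530 rows, the flagged ratings are exactly those at 2.0 and below, and `is_low_outlier` can serve as a separate indicator feature when modelling the tail.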

Out[40]:

saturn.columns["rating"].stats

stat	value
n	2,530
nulls	0 (0.0%)
unique	12
min	1
max	4
mean	3.196
median	3.25
std	0.4453
q1	3
q3	3.5
iqr	0.5
skew	-0.6084
kurtosis	1.053
n_outliers	50
outlier_rate	0.01976
zero_rate	0

Fig 17.

Distribution of rating. Vertical dash marks the median.

Histogram bins for rating (median: 3.25).
bin	count
1 – 1.075	4
1.075 – 1.15	0
1.15 – 1.225	0
1.225 – 1.3	0
1.3 – 1.375	0
1.375 – 1.45	0
1.45 – 1.525	10
1.525 – 1.6	0
1.6 – 1.675	0
1.675 – 1.75	0
1.75 – 1.825	3
1.825 – 1.9	0
1.9 – 1.975	0
1.975 – 2.05	33
2.05 – 2.125	0
2.125 – 2.2	0
2.2 – 2.275	17
2.275 – 2.35	0
2.35 – 2.425	0
2.425 – 2.5	0
2.5 – 2.575	166
2.575 – 2.65	0
2.65 – 2.725	0
2.725 – 2.8	333
2.8 – 2.875	0
2.875 – 2.95	0
2.95 – 3.025	523
3.025 – 3.1	0
3.1 – 3.175	0
3.175 – 3.25	0
3.25 – 3.325	464
3.325 – 3.4	0
3.4 – 3.475	0
3.475 – 3.55	565
3.55 – 3.625	0
3.625 – 3.7	0
3.7 – 3.775	300
3.775 – 3.85	0
3.85 – 3.925	0
3.925 – 4	112
