
data trove edible insects database

source: /home/coolhand/html/datavis/data_trove/data/quirky/insects_by_form.json · 7 rows · 2 columns · profiled 2026-06-21

Reading

dataset summary · medium confidence anthropic:default

This small dataset groups insect-based food products into 7 form types and records the number of products in each. The main finding is the extreme skew in counts: the middle half of form types have between 1 and 6 products (q1 = 1, q3 = 6), while a single outlier category reaches 57 and pulls the mean (10.7) far above the median (3.0). The report does not name that dominant category, so it is worth identifying first; it may represent the most commercially developed segment of the edible-insect market. The standard deviation (20.6), nearly twice the mean, confirms that the distribution is far from uniform.

citing: mean · median · max · std · n_outliers · outlier_rate · skew · top_value · n_unique
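
To make the gap between mean and median concrete, here is a minimal sketch using Python's standard library. The `counts` list is illustrative only: its values were chosen to be consistent with the reported n, min, median, max, mean and std, and are not taken from the source file, whose rows are not shown in this report.

```python
import statistics

# Illustrative counts only (an assumption): chosen to agree with the
# reported summary statistics, not read from insects_by_form.json.
counts = [1, 1, 1, 3, 3, 9, 57]

mean = statistics.mean(counts)
median = statistics.median(counts)
std = statistics.stdev(counts)  # sample standard deviation (divides by n - 1)

# Share of all products that sit in the single largest category.
top_share = max(counts) / sum(counts)

print(f"mean={mean:.2f} median={median} std={std:.2f} top_share={top_share:.0%}")
```

Because one value dominates the total, the mean lands well above every other observation, which is why the median is the more representative "typical" count here.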

Schema

2 columns
Per-column summary.
column · type · nulls · unique · alerts
name · categorical · 0.0% · 7 · long_tail
count · numeric · 0.0% · 4 · outliers

name

categorical label long_tail
This column contains product category names for what appears to be an insect-based food product taxonomy, with 7 distinct categories such as 'Whole Insects', 'Flour/Powder', 'Snack Bars', and 'Crackers'. Every category appears exactly once (top_rate = 0.143, equal to 1/7), so the distribution is perfectly uniform, the entropy ratio is at its maximum of 1.0, and the reported top_value ('Whole Insects') is just one member of a seven-way tie. The 'long_tail' alert is a statistical artefact of this uniformity rather than a genuine skew signal. With only 7 rows and 7 unique values, this is likely a small reference or lookup table rather than a transactional dataset.
Treatment: Use as a categorical label or as a join key against a larger fact table; no encoding is needed at this cardinality.
high confidence · anthropic:default
n: 7
nulls: 0 (0.0%)
unique: 7
top_value: Whole Insects
top_rate: 0.1429
cardinality: 7
entropy: 2.807
entropy_ratio: 1
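
The entropy figures above can be reproduced with a short sketch. It assumes the profiler reports Shannon entropy in bits and normalises by log2 of the cardinality; the report does not state this, but the value 2.807 matches log2(7). The function names and the placeholder labels are hypothetical.

```python
import math
from collections import Counter

def shannon_entropy(labels):
    """Shannon entropy, in bits, of the label frequencies."""
    freq = Counter(labels)
    total = sum(freq.values())
    return -sum((c / total) * math.log2(c / total) for c in freq.values())

def entropy_ratio(labels):
    """Entropy divided by its maximum, log2(number of distinct labels)."""
    k = len(set(labels))
    if k <= 1:
        return 0.0  # convention chosen for this sketch: a constant column has ratio 0
    return shannon_entropy(labels) / math.log2(k)

# Placeholder labels (an assumption): seven distinct values, each seen once,
# mirroring n = 7 and unique = 7 for the name column.
labels = [f"form_{i}" for i in range(7)]

print(round(shannon_entropy(labels), 3), round(entropy_ratio(labels), 3))
```

A ratio of 1 only says that no label repeats more than any other, which is automatic when every row has its own label.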

count

numeric feature outliers
This column is a per-category count, most likely the number of products in each form type, with only 4 distinct values across 7 rows. The distribution is strongly right-skewed (skew = 1.965) with one outlier: the max of 57 sits far above the median of 3 and the mean of 10.71, and it drives the standard deviation to 20.61, nearly twice the mean. With only 7 rows, this column provides very limited statistical signal.
Treatment: Check the outlier row (value 57) for data quality issues; if it is retained, log-transform the column before modelling to reduce skew.
medium confidence · anthropic:default
n: 7
nulls: 0 (0.0%)
unique: 4
min: 1
max: 57
mean: 10.71
median: 3
std: 20.61
q1: 1
q3: 6
iqr: 5
skew: 1.965
kurtosis: 1.987
n_outliers: 1
outlier_rate: 0.1429
zero_rate: 0
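
The outlier flag and the suggested log transform can be sketched as follows. The report does not say which outlier rule the profiler applies; this sketch assumes the common 1.5 × IQR fence, which with q1 = 1 and q3 = 6 gives an upper fence of 13.5. The `counts` values are illustrative, chosen to match the reported quartiles, and `iqr_outliers` is a hypothetical helper.

```python
import math
import statistics

def iqr_outliers(values, k=1.5):
    """Return (outliers, (lower_fence, upper_fence)) using the k * IQR rule."""
    # method="inclusive" interpolates linearly between the sorted data points.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper], (lower, upper)

# Illustrative counts only (an assumption), consistent with q1 = 1, q3 = 6, max = 57.
counts = [1, 1, 1, 3, 3, 9, 57]

outliers, (lower, upper) = iqr_outliers(counts)
print(outliers, lower, upper)

# log1p is one possible choice of log transform; it compresses the large
# value while staying defined if a zero count ever appears.
logged = [math.log1p(v) for v in counts]
print([round(x, 2) for x in logged])
```

On the log scale the largest value is only a few units above the smallest, so it no longer dominates the spread, though with 7 rows any model fitted to this column remains fragile.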