saturn

data trove edible insects database

saturn notebook · generated 2026-06-21

Overview

Source: /home/coolhand/html/datavis/data_trove/data/quirky/insects_by_form.json

Saturn profiled 7 rows across 2 columns. The stats below are deterministic and machine-readable; the prose is a language-model interpretation of those stats (opt-in, added after the fact, never sees raw rows).

[2]:
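# Install the profiler (notebook shell escape), then run it on the source file.
!pip install saturn-dissect
import subprocess
subprocess.run([
    "saturn", "analyze", "/home/coolhand/html/datavis/data_trove/data/quirky/insects_by_form.json",
    "--findings", "data-trove-edible-insects-database.json",
    "--llm", "anthropic:default",
], check=True)  # raise CalledProcessError if the CLI exits non-zero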

Summary confidence: medium

This tiny dataset categorises insect-based food products into 7 form types and records how many products fall into each category. With only 7 rows, the main finding is the extreme skew in counts: the interquartile range runs from 1 to 6 products (q1 = 1, q3 = 6), but one outlier category reaches 57, pulling the mean (10.7) far above the median (3.0). That dominant category is worth identifying first, as it may represent the most commercially developed segment of the edible-insect market. The standard deviation (20.6) is nearly twice the mean, which confirms the distribution is far from uniform.

citing: mean · median · max · std · n_outliers · outlier_rate · skew · top_value · n_unique

Out[4]:

saturn.schema() · 2 columns

column kind n null% unique alerts
name categorical 7 0.0% 7 long_tail
count numeric 7 0.0% 4 outliers
Fig 1.
count · Look for the single extreme outlier at 57 that dwarfs all other form-type counts.
Histogram bins for count (median: 3.0).
bin          count
1 – 12.2     6
12.2 – 23.4  0
23.4 – 34.6  0
34.6 – 45.8  0
45.8 – 57    1
Fig 2.
name · Each of the 7 product forms appears exactly once, so every bar has the same height; the per-form product totals are in the count column.
Top values for name (7 unique shown, of 7 total).
value             count  share
Whole Insects     1      14.3%
Flour/Powder      1      14.3%
Protein Products  1      14.3%
Snack Bars        1      14.3%
Confectionery     1      14.3%
Whole Snacks      1      14.3%
Crackers          1      14.3%
Fig 3.
name · The donut shows seven equal slices (14.3% each), because every form type appears exactly once in this column.
Top values for name (7 unique shown, of 7 total).
value             count  share
Whole Insects     1      14.3%
Flour/Powder      1      14.3%
Protein Products  1      14.3%
Snack Bars        1      14.3%
Confectionery     1      14.3%
Whole Snacks      1      14.3%
Crackers          1      14.3%
Fig 4.
Per-column null rate across the corpus. Columns are ordered by input position.
Per-column null rate across the corpus.
column  kind         null %
name    categorical  0.0%
count   numeric      0.0%

name categorical label

This column contains product category names for what appears to be an insect-based food product taxonomy, with 7 distinct categories such as 'Whole Insects', 'Flour/Powder', 'Snack Bars', and 'Crackers'. Every category appears exactly once (top_rate = 0.143, equal to 1/7), yielding a perfectly uniform distribution and maximum entropy ratio of 1.0. The 'long_tail' alert is a statistical artefact of this uniformity rather than a genuine skew signal. With only 7 rows and 7 unique values, this is likely a small reference/lookup table rather than a transactional dataset.

Treatment: Use as a categorical label or join key against a larger fact table; no encoding needed at this cardinality.

anthropic:default · confidence high
Out[10]:

saturn.columns["name"].stats

stat              value
n                 7
nulls             0 (0.0%)
unique            7
top_value         Whole Insects
top_rate          0.1429
cardinality       7
entropy           2.807
entropy_ratio     1
alert: long_tail  7 singleton categories
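
The uniformity figures in this table can be checked by hand. The sketch below assumes the profiler reports Shannon entropy in bits (base 2); the report does not state the base, but log2(7) matches the 2.807 shown. The variable names are illustrative and are not part of saturn-dissect.

```python
import math

# Assumption: entropy is Shannon entropy in bits (log base 2).
n_categories = 7                 # unique values in `name`
counts = [1] * n_categories      # every category appears exactly once
total = sum(counts)

probs = [c / total for c in counts]
entropy = -sum(p * math.log2(p) for p in probs)
max_entropy = math.log2(n_categories)   # entropy of a uniform distribution

top_rate = max(counts) / total          # share of the most frequent value
entropy_ratio = entropy / max_entropy   # 1.0 means perfectly uniform

print(f"top_rate      = {top_rate:.4f}")
print(f"entropy       = {entropy:.3f}")
print(f"entropy_ratio = {entropy_ratio:.1f}")
```

A ratio of 1 together with a top_rate of 1/7 is the signature of a lookup table in which no value repeats, which is why the long_tail alert carries no skew signal here.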
Fig 5.
Top values for name.
Top values for name (7 unique shown, of 7 total).
value             count  share
Whole Insects     1      14.3%
Flour/Powder      1      14.3%
Protein Products  1      14.3%
Snack Bars        1      14.3%
Confectionery     1      14.3%
Whole Snacks      1      14.3%
Crackers          1      14.3%

count numeric feature

This column appears to be a frequency or occurrence count, likely recording how many times something was observed; its 7 rows hold only 4 distinct values. The distribution is strongly right-skewed (skew = 1.96) with one outlier: the max of 57.0 sits far above the median of 3.0 and the mean of 10.71, and pushes the standard deviation to 20.61, nearly twice the mean. With only 7 rows total, this column provides very limited statistical signal.

Treatment: Investigate the outlier row (value 57.0) for data quality issues; if retained, log-transform before modelling to reduce skew.

anthropic:default · confidence medium
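
To illustrate the suggested treatment, the sketch below applies a log transform to the reported min, median, and max only, since the raw rows are not reproduced in this report. It assumes `log1p` (the natural log of 1 + x) as the transform; the report says only "log-transform", so the exact function is a choice made here for illustration.

```python
import math

# Summary values taken from the count stats in this report.
reported = {"min": 1.0, "median": 3.0, "max": 57.0}

# Assumption: log1p is used; it is defined at 0 and is a common choice for counts.
logged = {name: math.log1p(value) for name, value in reported.items()}

raw_ratio = reported["max"] / reported["median"]   # how far the max sits above the median
log_ratio = logged["max"] / logged["median"]       # the same comparison after the transform

for name in reported:
    print(f"{name:>6}: raw = {reported[name]:5.1f}  log1p = {logged[name]:.3f}")
print(f"max/median ratio: raw = {raw_ratio:.1f}, log1p = {log_ratio:.2f}")
```

On the raw scale the max is 19 times the median; after the transform the ratio drops to roughly 3, which is the reduction in skew the treatment is after.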
Out[13]:

saturn.columns["count"].stats

stat             value
n                7
nulls            0 (0.0%)
unique           4
min              1
max              57
mean             10.71
median           3
std              20.61
q1               1
q3               6
iqr              5
skew             1.965
kurtosis         1.987
n_outliers       1
outlier_rate     0.1429
zero_rate        0
alert: outliers  14.3% rows beyond 1.5 IQR
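
The outlier alert can be reproduced from the quartiles alone. This is a minimal sketch assuming the standard Tukey fences (q1 - 1.5·IQR and q3 + 1.5·IQR), which is what the "beyond 1.5 IQR" wording suggests; the variable names are illustrative.

```python
# Values copied from the count stats in this report.
q1, q3 = 1.0, 6.0
reported_min, reported_max = 1.0, 57.0
n_rows, n_outliers = 7, 1

iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr   # values below this would be flagged
upper_fence = q3 + 1.5 * iqr   # values above this would be flagged

max_is_outlier = reported_max > upper_fence
min_is_outlier = reported_min < lower_fence
outlier_rate = n_outliers / n_rows

print(f"iqr = {iqr}, fences = [{lower_fence}, {upper_fence}]")
print(f"max flagged: {max_is_outlier}, min flagged: {min_is_outlier}")
print(f"outlier_rate = {outlier_rate:.4f}")
```

With an upper fence of 13.5, the max of 57 is flagged while the min of 1 is well inside the lower fence, consistent with the single outlier the table reports.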
Fig 6.
Distribution of count. Vertical dash marks the median.
Histogram bins for count (median: 3.0).
bin          count
1 – 12.2     6
12.2 – 23.4  0
23.4 – 34.6  0
34.6 – 45.8  0
45.8 – 57    1
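
The bin edges in this table are consistent with five equal-width bins spanning the min and max. The sketch below shows that arithmetic; `equal_width_edges` is a hypothetical helper written for this example, and the report does not confirm how saturn-dissect chooses its bins.

```python
def equal_width_edges(lo, hi, n_bins):
    """Return n_bins + 1 evenly spaced edges from lo to hi, rounded to 1 decimal."""
    width = (hi - lo) / n_bins
    return [round(lo + i * width, 1) for i in range(n_bins + 1)]

# min = 1 and max = 57 come from the count stats; 5 bins matches the table.
edges = equal_width_edges(1, 57, 5)
print(edges)
```

Because six of the seven rows fall in the first bin, equal-width binning hides almost all of the structure below 12.2; quantile-based or log-scaled bins would likely be more informative for a column this skewed.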

How to cite

BibTeX
@misc{saturn-data-trove-edible-insects-database-2026,
  author       = {Steuber, Luke},
  title        = {Saturn reading: data trove edible insects database},
  year         = {2026},
  howpublished = {\url{https://dr.eamer.dev/saturn/view/data-trove-edible-insects-database}},
  note         = {Profiled with saturn-dissect v0.2.0, prompt saturn-insight-v2, model anthropic:default},
}
APA
Steuber, L. (2026). Saturn reading: data trove edible insects database. Source: /home/coolhand/html/datavis/data_trove/data/quirky/insects_by_form.json. Profiled with saturn-dissect v0.2.0 (saturn-insight-v2, anthropic:default). Retrieved from https://dr.eamer.dev/saturn/view/data-trove-edible-insects-database