TinyML model for ESP32-CAM

My objective is to detect a baby in the camera frame, draw a bounding box around it, and then mask the image so that everything outside the bounding box is removed.
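To make the masking step concrete, here is a minimal sketch of what I mean. It is plain Python for illustration only, not ESP32 code. The function names `yolo_to_pixels` and `mask_outside_box` are made up for this post, the image is a simple grayscale list of rows, and I am assuming a single detection in the usual YOLO label layout (class, x_center, y_center, width, height, all normalized to 0..1).

```python
def yolo_to_pixels(label, img_w, img_h):
    """Convert one YOLO label to pixel corners (x0, y0, x1, y1).

    label is (class_id, x_center, y_center, width, height), normalized to 0..1.
    The result is clamped to the image, with x1 and y1 exclusive.
    """
    _, xc, yc, w, h = label
    x0 = max(0, int((xc - w / 2) * img_w))
    y0 = max(0, int((yc - h / 2) * img_h))
    x1 = min(img_w, int((xc + w / 2) * img_w))
    y1 = min(img_h, int((yc + h / 2) * img_h))
    return x0, y0, x1, y1


def mask_outside_box(image, box, fill=0):
    """Return a copy of image with every pixel outside box set to fill."""
    x0, y0, x1, y1 = box
    return [
        [px if (x0 <= x < x1 and y0 <= y < y1) else fill
         for x, px in enumerate(row)]
        for y, row in enumerate(image)
    ]


# Tiny 4x4 placeholder "image" where every pixel has the value 9.
image = [[9] * 4 for _ in range(4)]
box = yolo_to_pixels((0, 0.5, 0.5, 0.5, 0.5), img_w=4, img_h=4)
masked = mask_outside_box(image, box)
```

On the device the same idea would be a loop over the frame buffer that zeroes the pixels outside the detected box.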

I have created a dataset of 1,200 images with labels (currently in YOLO format).

So far, I have tried YOLOv5n and TinyissimoYOLO.
But when I convert the model to a C array, the size goes above 20 MB.
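One point that matters when comparing sizes: a generated C source file and the array it produces in flash are not the same size, because every model byte is written out as text such as `0x1f, `. The sketch below shows that arithmetic on placeholder bytes. `to_c_array` is a hypothetical helper that only mimics the general layout of `xxd -i` output, and `fake_model` is dummy data, not a real model.

```python
def to_c_array(data: bytes, name: str = "model_data", per_line: int = 12) -> str:
    """Render raw bytes as C source text, laid out roughly like `xxd -i` output."""
    lines = []
    for i in range(0, len(data), per_line):
        chunk = data[i:i + per_line]
        lines.append("  " + ", ".join(f"0x{b:02x}" for b in chunk) + ",")
    body = "\n".join(lines)
    return (
        f"const unsigned char {name}[] = {{\n{body}\n}};\n"
        f"const unsigned int {name}_len = {len(data)};\n"
    )


# 1024 placeholder bytes standing in for a model file (assumption, not real data).
fake_model = bytes(range(256)) * 4
source = to_c_array(fake_model)

# Characters of C source text per byte of model data.
ratio = len(source) / len(fake_model)
print(f"model bytes: {len(fake_model)}, source chars: {len(source)}, ratio: {ratio:.2f}")
```

With this layout the source text comes out at roughly six characters per model byte, so the source file size overstates what the array occupies on the device by about that factor.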

Please guide me on selecting or building a suitable model,
and correct me if I'm going wrong anywhere.