StreakMind: AI detection and analysis of satellite streaks in astronomical images with automated database integration
Rafael Carrillo, René Duffard, Pablo García-Martín, Javier Romero, Nicolás Morales, Luis Gonçalves
Abstract
Artificial satellites and space debris are increasingly contaminating astronomical images, affecting scientific surveys and producing large volumes of streaked exposures. Manual inspection is no longer feasible at scale, and reliable identification and characterisation of streaks have become essential both for data quality control and for monitoring objects in Earth orbit. We present StreakMind, an automated pipeline designed to detect near-Earth objects (NEOs) and satellite streaks in astronomical images, characterise their geometry, and cross-identify them with known orbital objects. The system integrates all inference results into a structured database suitable for large surveys. A YOLO-OBB model was trained on a hybrid manual-synthetic dataset of 2335 images and used to detect streaks in processed FITS frames. Geometric refinement, inter-frame association, satellite cross-identification, and Gaussian-based confidence scoring were then applied to produce final identifications, which were stored in a normalised relational database. In this work, images acquired at La Sagra Observatory (L98) with a Celestron C14+Fastar telescope were used to develop and test the automated streak detection and characterisation methods. On the test set, the model achieved a precision of 94% and a recall of 97%.
One-sentence Summary
StreakMind is an automated pipeline employing a YOLO-OBB model trained on a hybrid manual-synthetic dataset of 2335 images to detect and characterise satellite streaks and near-Earth objects in processed FITS frames, cross-identify them with known orbital objects, and integrate results into a normalised relational database to support data quality control and orbital monitoring, achieving 94% precision and 97% recall on test data from La Sagra Observatory acquired with a Celestron C14+Fastar telescope.
Key Contributions
- This work presents StreakMind, an end-to-end pipeline designed to detect linear streaks in ground-based astronomical images, refine their geometry, and cross-identify candidate artificial objects using external ephemerides. The system standardises measurements into Minor Planet Center-style records and integrates all outputs into a relational database suitable for large-scale analyses.
- A YOLO-OBB model was trained on a hybrid manual-synthetic dataset of 2335 images to detect streaks in processed FITS frames. Inter-frame association and Gaussian-based confidence scoring are applied to produce final identifications.
- Images acquired at La Sagra Observatory with a Celestron C14+Fastar telescope were used to develop and test the automated streak detection methods. The model achieved a precision of 94% and a recall of 97% on the test set.
Introduction
Wide-field astronomical surveys now generate massive volumes of imagery contaminated by artificial satellites and space debris, making manual inspection infeasible for near-Earth object detection and orbital monitoring. While existing detection methods can identify linear features, they often lack robust end-to-end integration for large-scale database management and precise geometric characterization. The authors present StreakMind, an automated pipeline that leverages a YOLO-OBB model trained on hybrid manual and synthetic data to detect and characterize linear streaks. This system refines geometric measurements, associates detections across consecutive frames, and cross-identifies candidates against external ephemerides before integrating all outputs into a normalized relational database.
Dataset
Dataset Composition and Sources
- The authors combine 2055 real astronomical FITS images from La Sagra Observatory with 280 synthetically generated images.
- Real observations were conducted between April and June 2019 using a Celestron C14+Fastar telescope equipped with an SBIG ST-10 CCD camera.
- Images were acquired with 2x2 binning to reduce data volume and facilitate nightly transfers.
- Synthetic data was introduced specifically to balance the dataset by increasing the representation of long streaks.
Key Details for Each Subset
- Real images measure 1092 x 736 pixels and contain 765 manually identified streaks ranging from 8.5 to 1161.7 pixels in length.
- Images are categorized based on a 269.1 pixel threshold derived from the 75th percentile of the streak-length distribution.
- This classification yields 1523 images without streaks, 412 with short streaks, and 120 with long streaks.
- The synthetic subset includes 280 images where streaks have a minimum length of 269 pixels and follow a Gaussian angular distribution.
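The percentile-based categorisation described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the function name `categorise_by_length` and the sample values are hypothetical, while the 269.1 px threshold and its derivation from the 75th percentile come from the paper.

```python
import numpy as np

def categorise_by_length(streak_lengths_px, threshold_px=269.1):
    """Bucket streaks into 'short'/'long' using the paper's 269.1 px cut."""
    return ["long" if length >= threshold_px else "short"
            for length in streak_lengths_px]

# The threshold corresponds to the 75th percentile of the measured
# streak-length distribution (illustrative values only):
lengths = np.array([8.5, 120.0, 300.0, 1161.7])
threshold = np.percentile(lengths, 75)
```

An image containing at least one streak above the threshold would be filed in the long-streak class; otherwise in the short-streak class.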
Training Splits and Data Usage
- The final dataset is divided into training (70%), validation (20%), and test (10%) subsets using stratified sampling.
- Stratification ensures each subset preserves the original class distribution of short-streak, long-streak, and no-streak images.
- FITS files are converted to PNG format with ZScale normalisation to enhance contrast for faint structures.
- Manual labelling via Tycho Tracker software generates Oriented Bounding Boxes (OBBs) for each detected streak.
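The stratified 70/20/10 split can be illustrated with a minimal sketch. The helper below is hypothetical (the paper does not specify its implementation); it simply shuffles each class independently and slices it by the target fractions, which preserves per-class proportions across subsets.

```python
import random
from collections import defaultdict

def stratified_split(items, label_fn, fractions=(0.7, 0.2, 0.1), seed=42):
    """Split items into train/val/test while preserving class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for item in items:
        by_class[label_fn(item)].append(item)
    train, val, test = [], [], []
    for members in by_class.values():
        rng.shuffle(members)
        n_train = round(len(members) * fractions[0])
        n_val = round(len(members) * fractions[1])
        train += members[:n_train]
        val += members[n_train:n_train + n_val]
        test += members[n_train + n_val:]
    return train, val, test
```

Applied to the no-streak, short-streak, and long-streak classes, each subset then mirrors the full dataset's class distribution.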
Processing and Metadata Construction
- Images are aligned to a common reference frame, resulting in dead margins caused by telescope pointing variations.
- A vertical flip correction is applied to coordinates during conversion to align FITS origins with standard PNG raster conventions.
- A 40 pixel edge threshold is used to determine if a streak is complete or incomplete relative to image borders.
- Metadata construction includes observatory codes, telescope details, astrometric coordinates, and synthesized MPC-formatted observation records.
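Two of the conversion rules above lend themselves to a short sketch: the vertical flip between the FITS origin (bottom-left) and the PNG raster origin (top-left), and the 40-pixel border test for streak completeness. Function names are illustrative; the 40 px threshold and the 1092 x 736 frame size are from the paper.

```python
def fits_to_png_y(y_fits, image_height):
    """FITS places the origin at the bottom-left; PNG rasters start at the
    top-left, so the vertical coordinate is flipped during conversion."""
    return image_height - 1 - y_fits

def is_complete(vertices, width, height, edge_px=40):
    """A streak is flagged incomplete if any OBB vertex lies within
    edge_px pixels of an image border (it may continue off-frame)."""
    for x, y in vertices:
        if x < edge_px or y < edge_px or x > width - edge_px or y > height - edge_px:
            return False
    return True
```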
Method
The core of the StreakMind pipeline is built upon the You Only Look Once (YOLO) family of real-time object detection models, specifically utilizing the YOLO11 architecture introduced in 2024. This single-stage detector is chosen for its ability to predict object location and category in one pass, which is critical for processing large volumes of astronomical imagery efficiently. The model retains the standard three-part structure: a backbone for feature extraction, a neck for multi-scale feature combination, and a head for final prediction. For this specific application, the network is configured to output Oriented Bounding Boxes (OBBs) rather than standard axis-aligned boxes, allowing it to accurately capture the arbitrary orientation of linear streaks.
The geometric representation of these detections is central to the pipeline's accuracy. As illustrated in the figure below, the OBB is defined by four vertices (v1 to v4), a center point (c), a length (L), a width (w), and an orientation angle (θ) relative to the image axes.
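The OBB parametrisation (c, L, w, θ) maps to the four vertices v1 to v4 by rotating the half-extents of an axis-aligned box. A minimal sketch, with a hypothetical function name and angle taken in radians:

```python
import math

def obb_vertices(cx, cy, length, width, theta):
    """Return the four corners (v1..v4) of an oriented bounding box given
    its centre (cx, cy), side lengths, and orientation angle theta."""
    ct, st = math.cos(theta), math.sin(theta)
    half_l, half_w = length / 2.0, width / 2.0
    corners = []
    for dl, dw in ((+half_l, +half_w), (-half_l, +half_w),
                   (-half_l, -half_w), (+half_l, -half_w)):
        # Rotate the local (dl, dw) offset into image coordinates.
        corners.append((cx + dl * ct - dw * st, cy + dl * st + dw * ct))
    return corners
```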
While the YOLO11 model provides the initial detection, the authors note that standard regressors often underestimate the true extent of long streaks. To mitigate this, a photometric pre-analysis stage is implemented to longitudinally extend the OBBs. This process involves transforming the image region into a photometrically enhanced format and sampling a one-dimensional flux profile I(s) along the major axis. The box is extended iteratively as long as the measured flux remains above a dynamic threshold defined by I(s)>Ibg+kσ, where Ibg is the background level and σ is the noise estimate. This ensures that faint wings of the streak are captured, as suggested by the extended dashed green boundary in the diagram.
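The iterative extension rule I(s) > I_bg + kσ can be sketched in one dimension. This is a simplified stand-in for the paper's photometric pre-analysis: the function name, the step cap, and the default k = 3 are assumptions; only the thresholding criterion itself is from the text.

```python
def extend_along_axis(profile, end_idx, bg, sigma, k=3.0, max_steps=200):
    """Walk outward along the 1-D flux profile I(s) from the current box end,
    extending while I(s) > I_bg + k*sigma, to capture faint streak wings."""
    threshold = bg + k * sigma
    s = end_idx
    while (s + 1 < len(profile)
           and s - end_idx < max_steps
           and profile[s + 1] > threshold):
        s += 1
    return s  # new end index of the (possibly extended) box
```

In the pipeline this would run along the OBB's major axis at both ends, using locally estimated background and noise.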
The training process utilizes a pretrained YOLO11 model initially trained on the DOTAv1.0 dataset, which is then fine-tuned on an augmented dataset containing both real and synthetically generated astronomical images. Training is conducted on cloud-based NVIDIA A100 GPUs. Following detection and geometric refinement, the pipeline employs a catalogue-driven filtering stage to remove false positives caused by stellar diffraction spikes by cross-matching with the Gaia DR3 catalogue. Finally, the refined detections undergo inter-frame association, where geometric extrapolation and temporal metadata are used to link streaks across consecutive frames, enabling the identification of moving objects.
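The inter-frame association step can be illustrated with a deliberately crude sketch: extrapolate a streak's centre along its own axis and accept a detection in the next frame if it lands within a tolerance. Everything here is an assumption for illustration, including the constant-rate motion model (streak length per exposure as the speed proxy) and the tolerance value; the paper only states that geometric extrapolation and temporal metadata are used.

```python
import math

def associate(streak_a, streak_b, dt, max_residual_px=20.0):
    """Link two detections from consecutive frames if streak_b's centre lies
    near where streak_a's extrapolated motion predicts it.
    streak_* = (cx, cy, length, theta); dt is the frame gap in units of the
    exposure time, so length * dt approximates the displacement."""
    cx, cy, length, theta = streak_a
    predicted = (cx + length * dt * math.cos(theta),
                 cy + length * dt * math.sin(theta))
    residual = math.hypot(streak_b[0] - predicted[0],
                          streak_b[1] - predicted[1])
    return residual <= max_residual_px
```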
Experiment
The evaluation framework combined quantitative testing on a held-out dataset with qualitative visual inspections to validate detection accuracy under controlled astronomical conditions. Subsequent application to real observational data confirmed the automated pipeline's superiority over manual inspection in terms of scalability, sensitivity to faint features, and reproducible database integration. Geometric characterization proved robust across most streak lengths, establishing the system as a viable end-to-end solution for processing large volumes of survey data.
The dataset-composition table shows the split into training, validation, and test subsets, categorised by the presence and length of streaks. The data are heavily imbalanced: the no-streak background class constitutes the majority of samples in every subset, short streaks form a substantial minority, and long streaks are the rarest category, appearing in only a small fraction of images. Because the split is stratified, these relative proportions remain consistent across all three subsets, so the model is trained and evaluated on representative samples of both positive detections and negative background frames.
The streak-length table reports the percentile distribution of detected streak lengths in pixels, from the 25th to the 95th percentile, reflecting the variety of streak sizes encountered in the observational data. The distribution shows a wide spread, with the upper percentiles far exceeding the median. Geometric accuracy is reported to be highly reliable for streaks in the lower-to-middle range of the distribution, up to roughly half the image width; for the longest streaks, detection remains stable but accurate characterisation requires the geometric and photometric post-processing steps described in the Method section.
The evaluation setup consists of training, validation, and test subsets with a consistent distribution where the no-streak background class predominates alongside balanced portions of short and long streaks. Experimental results show that the model achieves robust geometric accuracy for streaks within the lower to middle length ranges, typically extending up to half the image width. While detection stability is maintained for the longest streaks, these instances require specific geometric and photometric post-processing steps to ensure reliable performance across the full data spectrum.