Deep Learning-Based Real-Time Detection and Localization for Targets in UAV Remote Sensing Images
Keywords:
UAV Remote Sensing, Target Recognition, Deep Learning, Object Detection, Real-Time Processing, Attention Mechanism, Feature Pyramid Network
Abstract
Rapid and accurate recognition and localization of targets in UAV remote sensing imagery is a critical challenge in fields such as precision agriculture, disaster monitoring, and urban planning. This paper presents a deep learning-based algorithm for fast recognition and precise localization of targets in UAV remote sensing images. We propose an optimized Single Shot MultiBox Detector (SSD) architecture integrated with an attention mechanism to enhance feature representation for small and densely distributed targets in complex backgrounds. The algorithm incorporates a feature pyramid network (FPN) to exploit multi-scale features, improving detection accuracy across varying target sizes. We also design a lightweight backbone network for computational efficiency, enabling real-time processing on the embedded platforms commonly deployed on UAVs. The proposed method is trained and validated on a custom dataset of diverse UAV-captured scenes, and it improves on existing approaches in both inference speed and detection precision. Experimental results show that the algorithm achieves a mean average precision (mAP) of 89.7% at a processing speed of 32 frames per second on a single GPU, balancing accuracy and efficiency. This research provides a practical solution for real-time UAV remote sensing applications, with potential uses in autonomous monitoring and rapid response systems.
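The headline metric above, mean average precision, can be illustrated with a small sketch. The abstract does not state the evaluation protocol, so the IoU threshold of 0.5, the all-point interpolation of the precision-recall curve, and the names `iou` and `average_precision` are assumptions chosen for illustration, not the authors' evaluation code. For brevity the sketch scores one class over a single pool of boxes; a full evaluation would also track which image each box belongs to.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections: List[Tuple[float, Box]],
                      ground_truth: List[Box],
                      iou_threshold: float = 0.5) -> float:
    """AP for one class (assumed protocol: IoU >= 0.5, all-point interpolation)."""
    if not ground_truth:
        return 0.0
    matched = [False] * len(ground_truth)
    tp_flags = []
    # Visit detections from most to least confident.
    for _score, box in sorted(detections, key=lambda d: d[0], reverse=True):
        best_iou, best_idx = 0.0, -1
        for idx, gt in enumerate(ground_truth):
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        # A true positive must overlap enough with a not-yet-matched box.
        if best_iou >= iou_threshold and not matched[best_idx]:
            matched[best_idx] = True
            tp_flags.append(True)
        else:
            tp_flags.append(False)

    # Cumulative precision and recall after each ranked detection.
    precisions, recalls = [], []
    true_positives = 0
    for rank, is_tp in enumerate(tp_flags, start=1):
        true_positives += int(is_tp)
        precisions.append(true_positives / rank)
        recalls.append(true_positives / len(ground_truth))

    # Make precision non-increasing from right to left, then sum the
    # area under the resulting precision-recall curve.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap
```

Under these assumptions, mAP is the arithmetic mean of `average_precision` over all target classes. A false positive ranked between two correct detections lowers the score: with two ground-truth boxes and detections ranked correct, wrong, correct, the sketch yields an AP of 5/6.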
License
Copyright (c) 2025 Xuan Bian, Jia Ma

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
