Publications

If you want to know more, check my »Google Scholar page«.

MARIO: Modular and Extensible Architecture for Computing Visual Statistics in RoboCup SPL. Domenico D Bloisi, Andrea Pennisi, Cristian Zampino, Flavio Biancospino, Francesco Laus, Gianluca Di Stefano, Michele Brienza, Rocchina Romano. arXiv preprint arXiv:2209.09987, 2022.

Abstract
This technical report describes a modular and extensible architecture for computing visual statistics in RoboCup SPL (MARIO), presented during the SPL Open Research Challenge at RoboCup 2022, held in Bangkok (Thailand). MARIO is an open-source, ready-to-use software application whose final goal is to contribute to the growth of the RoboCup SPL community. MARIO comes with a GUI that integrates multiple machine learning and computer vision based functions, including automatic camera calibration, background subtraction, homography computation, player + ball tracking and localization, NAO robot pose estimation and fall detection. MARIO has been ranked no. 1 in the Open Research Challenge.

Multi-encoder U-Net for Oral Squamous Cell Carcinoma Image Segmentation. Andrea Pennisi, Domenico D Bloisi, Daniele Nardi, Silvia Varricchio, Francesco Merolla. 2022 IEEE International Symposium on Medical Measurements and Applications (MeMeA), 2022.

Abstract
Oral tumors are responsible for about 170,000 deaths every year in the world. In this paper, we focus on oral squamous cell carcinoma (OSCC), which represents up to 80–90% of all malignant neoplasms of the oral cavity. We present a novel deep learning-based method for segmenting whole slide image (WSI) samples at the pixel level. The proposed method is a modification of the well-known U-Net architecture through a multi-encoder structure. In particular, our network, called Multi-encoder U-Net, is a multi-encoder single decoder network that takes as input an image and splits it into tiles. For each tile, there is an encoder responsible for encoding it in the latent space, then a convolutional layer is responsible for merging the tiles into a single layer. Each layer of the decoder takes as input the previous up-sampled layer and concatenates it with the layer made by merging the corresponding layers of the multiple encoders. Experiments have been carried out on the publicly available ORal Cancer Annotated (ORCA) dataset, which contains annotated data from the TCGA repository. Quantitative experimental results, obtained using three different quality metrics, demonstrate the effectiveness of the proposed approach, which achieves 82% Pixel-wise Accuracy, 0.82 Dice similarity score, and 0.72 Mean Intersection Over Union.
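
The three quality metrics named in the abstract above can be illustrated on a toy example. The following is a minimal sketch, assuming binary masks stored as nested lists; the function name `segmentation_scores` and the sample masks are hypothetical and are not taken from the paper. It computes the IoU of a single foreground class, whereas a mean IoU would average this value over all classes.

```python
def segmentation_scores(pred, truth):
    """Return (pixel accuracy, Dice, IoU) for two non-empty binary masks.

    Both masks are nested lists of 0/1 values with the same shape
    (an assumption made for this illustration).
    """
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1  # foreground predicted and present
            elif p and not t:
                fp += 1  # foreground predicted but absent
            elif not p and t:
                fn += 1  # foreground missed
            else:
                tn += 1  # background correctly predicted

    accuracy = (tp + tn) / (tp + fp + fn + tn)
    overlap_denominator = tp + fp + fn
    # When neither mask contains foreground, treat the overlap as perfect.
    dice = 2 * tp / (2 * tp + fp + fn) if overlap_denominator else 1.0
    iou = tp / overlap_denominator if overlap_denominator else 1.0
    return accuracy, dice, iou


# Hypothetical 2x3 masks, used only to exercise the function.
predicted = [[1, 1, 0],
             [0, 1, 0]]
ground_truth = [[1, 0, 0],
                [0, 1, 1]]

accuracy, dice, iou = segmentation_scores(predicted, ground_truth)
print(f"accuracy={accuracy:.3f} dice={dice:.3f} iou={iou:.3f}")
```

Dice and IoU ignore true negatives, which is why they are usually reported alongside pixel-wise accuracy when the background dominates the image.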

Skin Lesion Area Segmentation Using Attention Squeeze U-Net for Embedded Devices. Andrea Pennisi, Domenico D Bloisi, Vincenzo Suriani, Daniele Nardi, Antonio Facchiano, and Anna Rita Giampetruzzi. Journal of Digital Imaging, 2022.

Abstract
Melanoma is the deadliest form of skin cancer. Early diagnosis of malignant lesions is crucial for reducing mortality. The use of deep learning techniques on dermoscopic images can help in keeping track of the change over time in the appearance of the lesion, which is an important factor for detecting malignant lesions. In this paper, we present a deep learning architecture called Attention Squeeze U-Net for skin lesion area segmentation specifically designed for embedded devices. The main goal is to increase the patient empowerment through the adoption of deep learning algorithms that can run locally on smartphones or low cost embedded devices. This can be the basis to (1) create a history of the lesion, (2) reduce patient visits to the hospital, and (3) protect the privacy of the users. Quantitative results on publicly available data demonstrate that it is possible to achieve good segmentation results even with a compact model.

Deep Learning-Based Pixel-Wise Lesion Segmentation on Oral Squamous Cell Carcinoma Images. Francesco Martino, Domenico D Bloisi, Andrea Pennisi, Mulham Fawakherji, Gennaro Ilardi, Daniela Russo, Daniele Nardi, Stefania Staibano, Francesco Merolla. Applied Sciences, 2020.

Abstract
Oral squamous cell carcinoma is the most common oral cancer. In this paper, we present a performance analysis of four different deep learning-based pixel-wise methods for lesion segmentation on oral carcinoma images. Two diverse image datasets, one for training and another one for testing, are used to generate and evaluate the models used for segmenting the images, thus allowing to assess the generalization capability of the considered deep network architectures. An important contribution of this work is the creation of the Oral Cancer Annotated (ORCA) dataset, containing ground-truth data derived from the well-known Cancer Genome Atlas (TCGA) dataset.

A hierarchical association framework for multi-object tracking in airborne videos. Ting Chen, Andrea Pennisi, Zhi Li, Yanning Zhang, and Hichem Sahli. Remote Sensing, 2018.

Abstract
Multi-Object Tracking (MOT) in airborne videos is a challenging problem due to the uncertain airborne vehicle motion, vibrations of the mounted camera, unreliable detections, changes of size, appearance and motion of the moving objects and occlusions caused by the interaction between moving and static objects in the scene. To deal with these problems, this work proposes a four-stage hierarchical association framework for multiple object tracking in airborne video. The proposed framework combines Data Association-based Tracking (DAT) methods and target tracking using a compressive tracking approach, to robustly track objects in complex airborne surveillance scenes. In each association stage, different sets of tracklets and detections are associated to efficiently handle local tracklet generation, local trajectory construction, global drifting tracklet correction and global fragmented tracklet linking. Experiments with challenging airborne videos show significant tracking improvement compared to existing state-of-the-art methods.

Deep convolutional pixel-wise labeling for skin lesion image segmentation. Ali Youssef, Andrea Pennisi, Domenico Bloisi, Daniele Nardi, Mario Muscio, and Antonio Facchiano. In 13th Annual IEEE International Symposium on Medical Measurements and Applications, 2018.

Abstract
Melanoma is one of the deadliest forms of cancer, with an increasing incidence rate. The development of automatic diagnostic tools for the early detection of skin cancer lesions in dermoscopic images can help to reduce melanoma-induced mortality. In this paper, we present an automatic method for skin lesion image segmentation based on a deep learning algorithm for pixel-wise labeling. Experimental results have been obtained by testing two network architectures on publicly available data and, in order to show that the used approach is not data set related, we have used the ISIC database for training the network and the PH2 database for testing. The results show that the proposed approach achieves a very accurate segmentation even in presence of hair and air/oil bubbles. An additional contribution of this work is the development of a semi-automatic GUI for data annotation that can be used to generate more test images.

COACHES: An Assistance Multi-Robot System in Public Areas. Laurent Jeanpierre, Mouaddib Abdel-Illah, Luca Iocchi, María T. Lázaro, Andrea Pennisi, Hichem Sahli, Esra Erdem, Ezgi Demirel and Volkan Patoglu. European Conference on Mobile Robotics (ECMR), 2017.

Abstract
In this paper, we present a robust system of self-directed autonomous robots evolving in complex public spaces and interacting with people. This system integrates high-level skills of environment modeling using knowledge-based modeling and reasoning, scene understanding with robust image and video analysis, distributed autonomous decision-making using Markov decision processes and Petri-Net planning, short-term interaction with humans, and robust and safe navigation in overcrowded spaces. This system has been deployed in a variety of public environments, such as a shopping mall, a congress center, and a lab, to assist people and visitors. The results are very satisfying, showing the effectiveness of the system and going beyond a simple proof of concept.

Fine-grained boat classification using convolutional neural networks. Michele Fiorini, Domenico D. Bloisi, Ali Youssef, and Andrea Pennisi. International Journal of e-Navigation and Maritime Economy.

Abstract
The use of radar-based systems for vessel monitoring is not suitable in populated areas, due to the high electromagnetic emissions. In this paper, a camera-based vessel recognition system for application in the context of Vessel Traffic Services (VTS) and Homeland Protection (HP) is proposed. Our approach is designed to extend the functionality of traditional VTS systems by permitting the classification of both cooperative and non-cooperative targets, using camera images only. This allows enhancing the surveillance function in populated areas, where public opinion is strongly concerned about electromagnetic emissions and therefore antennas are suspiciously observed and radars are not allowed. Experiments have been carried out on a publicly available data set of images coming from the ARGOS boat traffic monitoring system in the City of Venice (Italy). The obtained classification accuracy of 89.6% (with 11 different classes of boats) demonstrates the effectiveness of the proposed approach.

Optical Target Recognition for Drone Ships. M. Fiorini, A. Pennisi and D.D. Bloisi. Proc. of the 12th International Conference on Marine Navigation and Safety of Sea Transportation (TransNav), 2017.

Abstract
Remote controlled drone ships without crews on board are expected by the end of the decade. To achieve the goal of developing (semi-)autonomous boats, reliable vision-based methods for vessel detection, classification, and tracking are needed. In this paper, we present a machine learning approach for vessel detection from a moving and zooming camera. In particular, the proposed method is supervised and derives from a fast and robust people detection algorithm. Quantitative experimental results have been obtained on a publicly available data set, which contains images from real sites, demonstrating the effectiveness of the approach. Ground truth annotations and the code of the proposed algorithm are both released for the community.

Enhancing automatic maritime surveillance systems with visual information. Domenico D. Bloisi, Fabio Previtali, Andrea Pennisi, Daniele Nardi, and Michele Fiorini. IEEE Transactions on Intelligent Transportation Systems, PP(99):1–10, 2017.

Abstract
Automatic surveillance systems for the maritime domain are becoming more and more important due to a constant increase of naval traffic and to the simultaneous reduction of crews on decks. However, available technology still provides only a limited support to this kind of applications. In this paper, a modular system for intelligent maritime surveillance, capable of fusing information from heterogeneous sources, is described. The system is designed to enhance the functions of the existing vessel traffic services systems and to be deployable in populated areas, where radar-based systems cannot be used due to the high electromagnetic radiation emissions. A quantitative evaluation of the proposed approach has been carried out on a large and publicly available data set of images and videos, which are collected from multiple real sites, with different light, weather, and traffic conditions.

Parallel multi-modal background modeling. Domenico D. Bloisi, Andrea Pennisi, and Luca Iocchi. In Pattern Recognition Letters, 2016.

Abstract
Background subtraction is a widely used technique for detecting moving objects in image sequences. Very often background subtraction approaches assume the availability of one or more clear (i.e., without foreground objects) frames at the beginning of the sequence in input. However, this assumption is not always true, especially when dealing with dynamic background or crowded scenes. In this paper, we present the results of a multi-modal background modeling method that is able to generate a reliable initial background model even if no clear frames are available. The proposed algorithm runs in real-time on HD images. Quantitative experiments have been conducted taking into account six different quality metrics on a set of 14 publicly available image sequences. The obtained results demonstrate high accuracy in generating the background model in comparison with several other methods.
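
To illustrate why a background model can be recovered even when no clear frame exists, here is a minimal sketch based on a per-pixel temporal median. This is a generic textbook baseline, not the multi-modal method of the paper; the function names, the threshold, and the toy frames are all assumptions made for the example.

```python
from statistics import median


def median_background(frames):
    """Estimate the background as the per-pixel median over time.

    `frames` is a list of grayscale images stored as nested lists
    (an assumption made for this illustration).
    """
    height = len(frames[0])
    width = len(frames[0][0])
    return [
        [median(frame[y][x] for frame in frames) for x in range(width)]
        for y in range(height)
    ]


def foreground_mask(frame, background, threshold=30):
    """Mark pixels whose intensity differs from the background by more than `threshold`."""
    return [
        [1 if abs(pixel - model) > threshold else 0
         for pixel, model in zip(frame_row, background_row)]
        for frame_row, background_row in zip(frame, background)
    ]


# Three 1x3 frames: a bright object (200) moves across a dark background (10),
# so every frame contains a foreground object.
frames = [
    [[200, 10, 10]],
    [[10, 200, 10]],
    [[10, 10, 200]],
]

background = median_background(frames)
print(background)                              # estimated background
print(foreground_mask(frames[1], background))  # object position in the second frame
```

The median works here because each pixel shows the background in the majority of frames. A single median value cannot represent a background that itself alternates between several states, which is the situation multi-modal models are designed for.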

Skin Lesion Image Segmentation Using Delaunay Triangulation for Melanoma Detection. Andrea Pennisi, Domenico Daniele Bloisi, Daniele Nardi, Anna Rita Giampietruzzi, Chiara Mondino, Antonio Facchiano. Computerized Medical Imaging and Graphics, 2016.

Abstract
Developing automatic diagnostic tools for the early detection of skin cancer lesions in dermoscopic images can help to reduce melanoma-induced mortality. Image segmentation is a key step in the automated skin lesion diagnosis pipeline. In this paper, a fast and fully-automatic algorithm for skin lesion segmentation in dermoscopic images is presented. Delaunay Triangulation is used to extract a binary mask of the lesion region, without the need of any training stage. A quantitative experimental evaluation has been conducted on a publicly available database, by taking into account six well-known state-of-the-art segmentation methods for comparison. The results of the experimental analysis demonstrate that the proposed approach is highly accurate when dealing with benign lesions, while the segmentation accuracy significantly decreases when melanoma images are processed. This behavior led us to consider geometrical and color features extracted from the binary masks generated by our algorithm for classification, achieving promising results for melanoma detection.

On-line Real-time Crowd Behavior Detection in Video Sequences. Andrea Pennisi, Domenico Daniele Bloisi, Luca Iocchi. Computer Vision and Image Understanding Journal (CVIU), pp. 166-176, 2016.

Abstract
Automatically detecting events in crowded scenes is a challenging task in Computer Vision. A number of offline approaches have been proposed for solving the problem of crowd behavior detection; however, the offline assumption limits their application in real-world video surveillance systems. In this paper, we propose an online and real-time method for detecting events in crowded video sequences. The proposed approach is based on the combination of visual feature extraction and image segmentation and it works without the need of a training phase. A quantitative experimental evaluation has been carried out on multiple publicly available video sequences, containing data from various crowd scenarios and different types of events, to demonstrate the effectiveness of the approach.

Melanoma Detection Using Delaunay Triangulation. Andrea Pennisi, Domenico Daniele Bloisi, Daniele Nardi, Anna Rita Giampetruzzi, Chiara Mondino and Antonio Facchiano. IEEE 27th International Conference on Tools with Artificial Intelligence, 2015.

Abstract
The detection of malignant lesions in dermoscopic images by using automatic diagnostic tools can help in reducing mortality from melanoma. In this paper, we describe a fully-automatic algorithm for skin lesion segmentation in dermoscopic images. The proposed approach is highly accurate when dealing with benign lesions, while the detection accuracy significantly decreases when melanoma images are segmented. This particular behavior led us to consider geometrical and color features extracted from the output of our algorithm for classifying melanoma images, achieving promising results.

ARGOS-Venice Boat Classification. Domenico D. Bloisi, Luca Iocchi, Andrea Pennisi, and Luigi Tombolini. 12th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), 2015.

Abstract
Detection, tracking, and classification of people and vehicles are fundamental processes in intelligent surveillance systems. The use of publicly available data sets is the appropriate way to compare the relative merits of existing methods and to develop and assess new robust solutions. In this paper, we focus on the maritime domain and we describe the generation of boat classification data sets, containing images of boats automatically extracted by the ARGOS system, operating 24/7 in Venice (Italy). The data sets are unique in their nature, since they come from an incomparable environment like Venice, but they present very interesting challenges to vehicle classification, due to changes in the environmental conditions, boat wakes, waves, reflections, etc. We thus believe that robust techniques, validated through the ARGOS Boat Classification data sets, will improve the development and deployment of solutions in similar applications related to vehicle detection and classification.

Real-Time Adaptive Background Modeling in Fast Changing Conditions. Andrea Pennisi, Fabio Previtali, Domenico D. Bloisi, and Luca Iocchi. 12th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), 2015.

Abstract
Background modeling in fast changing scenarios is a challenging task due to unexpected events like sudden illumination changes, reflections, and shadows, which can strongly affect the accuracy of the foreground detection. In this paper, we describe a real-time and effective background modeling approach, called FAFEX, that can deal with global and rapid changes in the scene background. The method is designed to identify variations in the background geometry of the monitored scene and it has been quantitatively tested on a publicly available data set, containing a varied set of highly dynamic environments. The experimental evaluation demonstrates how our method is able to effectively deal with challenging sequences in real-time.

Multi-modal Background Model Initialization. Domenico D. Bloisi, Alfonso Grillo, Andrea Pennisi, Luca Iocchi, and Claudio Passaretti. New Trends in Image Analysis and Processing, ICIAP 2015 Workshops, 2015.

Abstract
Background subtraction is a widely used technique for detecting moving objects in image sequences. Very often background subtraction approaches assume the availability of one or more clear frames (i.e., without foreground objects) at the beginning of the image sequence in input. This strong assumption is not always correct, especially when dealing with dynamic background. In this paper, we present the results of an on-line and real-time background initialization method, called IMBS, which generates a reliable initial background model even if no clear frames are available. The accuracy of the proposed approach is calculated on a set of seven publicly available benchmark sequences. Experimental results demonstrate that IMBS generates accurate background models with respect to eight different quality metrics.

Multi-robot Surveillance Through a Distributed Sensor Network. Andrea Pennisi, Fabio Previtali, Cristiano Gennari, Domenico D. Bloisi, Luca Iocchi, Francesco Ficarola, Andrea Vitaletti, Daniele Nardi. Chapter in Cooperative Robots and Sensor Networks 2015, Springer International Publishing, vol. 604, pp. 77-98, 2015.

Abstract
Automatic surveillance of public areas, such as airports, train stations, and shopping malls, requires the capacity of detecting and recognizing possible abnormal situations in populated environments. In this book chapter, an architecture for intelligent surveillance in indoor public spaces, based on an integration of interactive and non-interactive heterogeneous sensors, is described. As a difference with respect to traditional, passive and pure vision-based systems, the proposed approach relies on a distributed sensor network combining RFID tags, multiple mobile robots, and fixed RGBD cameras. The presence and the position of people in the scene is detected by suitably combining data coming from the sensor nodes, including those mounted on board of the mobile robots that are in charge of patrolling the environment. The robots can adapt their behavior according to the current situation, on the basis of a Prey-Predator scheme, and can coordinate their actions to fulfill the required tasks. Experimental results have been carried out both on real and on simulated data to show the effectiveness of the proposed approach.

Distributed sensor network for multi-robot surveillance. Andrea Pennisi, Fabio Previtali, Francesco Ficarola, Domenico Daniele Bloisi, Luca Iocchi, and Andrea Vitaletti. Procedia Computer Science, 32(0):1095–1100, 2014.

Abstract
Monitoring of populated indoor environments is crucial for the surveillance of public spaces like airports or embassies, where the behavior of people may be relevant in order to determine abnormal situations. In this paper, a surveillance system based on an integration of interactive and non-interactive heterogeneous sensors is described. As a difference with respect to traditional, pure vision-based systems, the proposed approach relies on Radio Frequency Identification (RFID) tags carried by people, multiple mobile robots (each one equipped with a laser range finder and an RFID reader), and fixed RGBD cameras. The main task of the system is to assess the presence and the position of people in the environment. This is obtained by suitably integrating data coming from heterogeneous sensors, including those mounted on board of mobile robots that are in charge of patrolling the environment. The robots also adapt their behavior according to the current situation, on the basis of a Prey-Predator scheme. Experimental results carried out both on real and on simulated data show the effectiveness of the approach.

Novel patterns and methods for zooming camera calibration. Andrea Pennisi, Domenico Daniele Bloisi, Claudio Gaz, Luca Iocchi, and Daniele Nardi. Journal of WSCG, 21(1):59–67, 2013.

Abstract
Camera calibration is a necessary step in order to develop applications that need to establish a relationship between image pixels and real world points. The goal of camera calibration is to estimate the extrinsic and intrinsic camera parameters. Usually, for non-zooming cameras, the calibration is carried out by using a grid pattern of known dimensions (e.g., a chessboard). However, for cameras with zoom functions, the use of a grid pattern only is not sufficient, because the calibration has to be effective at multiple zoom levels and some features (e.g., corners) could not be detectable. In this paper, a calibration method based on two novel calibration patterns, specifically designed for zooming cameras, is presented. The first pattern, called in-lab pattern, is designed for intrinsic parameter recovery, while the second one, called on-field pattern, is conceived for extrinsic parameter estimation. As an application example, on-line virtual advertising in sport events, where the objective is to insert virtual advertising images into live or pre-recorded television shows, is considered. A quantitative experimental evaluation shows an increase of performance with respect to the use of standard calibration routines considering both re-projection accuracy and calibration time.
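
The role of the intrinsic parameters mentioned above can be made concrete with the standard pinhole projection model. The sketch below is only an illustration under that model: the focal lengths, the principal point, and the 3D point are made-up values, and the function name `project_point` is hypothetical. It does not implement the calibration patterns proposed in the paper.

```python
def project_point(point_camera, focal_x, focal_y, center_x, center_y):
    """Project a 3D point given in camera coordinates onto the image plane.

    Uses the pinhole model: u = fx * X / Z + cx, v = fy * Y / Z + cy.
    Lens distortion is ignored (a simplifying assumption).
    """
    x, y, z = point_camera
    if z <= 0:
        raise ValueError("the point must lie in front of the camera")
    u = focal_x * x / z + center_x
    v = focal_y * y / z + center_y
    return u, v


# Hypothetical point two meters in front of the camera.
point = (0.5, 0.25, 2.0)

# Same point seen at two zoom levels: zooming in changes the focal length,
# so intrinsics estimated at one zoom level do not describe the other.
wide = project_point(point, focal_x=800.0, focal_y=800.0, center_x=320.0, center_y=240.0)
zoomed = project_point(point, focal_x=1600.0, focal_y=1600.0, center_x=320.0, center_y=240.0)
print(wide, zoomed)
```

Because the projected pixel depends on the focal length, a zooming camera needs intrinsic parameters that remain valid across its zoom range, which is the difficulty the abstract describes.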

Ground truth acquisition of humanoid soccer robot behaviour. Andrea Pennisi, Domenico Daniele Bloisi, Luca Iocchi, and Daniele Nardi. In Proceedings of the 17th Annual Robocup International Symposium, pages 1–8, 2013.

Abstract
In this paper an open source software for monitoring humanoid soccer robot behaviours is presented. The software is part of an easy to set up system, conceived for registering ground truth data that can be used for evaluating and testing methods such as robot coordination and localization. The hardware architecture of the system is designed for using multiple low-cost visual sensors (four Kinects). The software includes a foreground computation module and a detection unit for both players and ball. A graphical user interface has been developed in order to facilitate the creation of a shared multi-camera plan view, in which the observations of players and ball are re-projected to obtain global positions. The effectiveness of the implemented system has been proven using a laser sensor to measure the exact position of the objects of interest in the field.
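
Re-projecting image observations onto a shared plan view is commonly done with a planar homography. The following is a minimal sketch of that step, assuming a known 3x3 homography matrix; the matrices and pixel coordinates are made-up values for illustration, and the function name `apply_homography` is hypothetical rather than part of the software described above.

```python
def apply_homography(homography, pixel):
    """Map an image pixel (x, y) to plan-view coordinates using a 3x3 homography.

    The point is lifted to homogeneous coordinates (x, y, 1), multiplied by
    the matrix, and divided by the resulting third component.
    """
    x, y = pixel
    mapped = [row[0] * x + row[1] * y + row[2] for row in homography]
    w = mapped[2]
    if w == 0:
        raise ValueError("the point maps to infinity under this homography")
    return mapped[0] / w, mapped[1] / w


# Hypothetical matrix: scale by 2 and translate by (1, 3).
scale_and_shift = [
    [2.0, 0.0, 1.0],
    [0.0, 2.0, 3.0],
    [0.0, 0.0, 1.0],
]
print(apply_homography(scale_and_shift, (4.0, 5.0)))

# Hypothetical matrix with a perspective term in the last row.
perspective = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.5, 1.0],
]
print(apply_homography(perspective, (2.0, 2.0)))
```

With one such matrix per camera, detections from several views can be expressed in the same field coordinates and then merged.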

Background modeling in the maritime domain. Domenico Daniele Bloisi, Andrea Pennisi, and Luca Iocchi. Machine Vision and Applications, pages 1–13, 2013.

Abstract
Maritime environment represents a challenging scenario for automatic video surveillance, due to the complexity of the observed scene: waves on the water surface, boat wakes, and weather issues contribute to generate a highly dynamic background. Moreover, an appropriate background model has to deal with gradual and sudden illumination changes, camera jitter, shadows, and reflections that can provoke false detections. Using a predefined distribution (e.g., Gaussian) for generating the background model can result ineffective, due to the need of modeling non-regular patterns. In this paper, a method for creating a “discretization” of an unknown distribution that can model highly dynamic background such as water is described. A quantitative evaluation carried out on two publicly available data sets of videos and images, containing data recorded in different maritime scenarios, with varying light and weather conditions, demonstrates the effectiveness of the approach.

Human-robot collaboration for semantic labeling of the environment. Taigo Maria Bonanni, Andrea Pennisi, Domenico Daniele Bloisi, Luca Iocchi, and Daniele Nardi. In Proceedings of the 3rd Workshop on Semantic Perception, Mapping and Exploration (SPME), pages 1–6, 2013.

Abstract
Today’s robots are able to perform more and more complex tasks, which usually require a high degree of interaction with the environment they have to operate in. As a consequence, robotic systems should have a deep and specific knowledge of their workspaces, which goes far beyond a simple metric representation a robotic system can build up through SLAM (Simultaneous Localization and Mapping). In this paper, we present a novel human-robot collaboration approach, designed to extract 3D shapes associated to objects of interest pointed out by a human operator. The information regarding the segmented objects is then integrated into a metric map, built by the robot, providing a high-level representation of the environment that embodies all the knowledge required by a robot to actually execute complex tasks.

Context-aware video analysis for infomobility. Luca Iocchi and Andrea Pennisi. In Proceedings of the 2012 Sixth International Conference on Complex, Intelligent, and Software Intensive Systems (CISIS), CISIS 2012, pages 971–976. IEEE Computer Society, 2012.

Abstract
Mobility in large touristic cities (such as Rome and Venice), where the needs of citizens and tourists are different (and sometimes even conflicting), is a very relevant problem and infomobility is thus increasingly important. Since active technologies, requiring the passengers to wear some devices (e.g., RFID devices), are not commonly available and cannot be enforced on citizens and tourists, a completely passive sensor system is needed. In this paper we describe the development and experimentation of techniques for human activity recognition for infomobility applications based on 3D data extracted from stereo and Kinect cameras. More specifically, we considered the problem of automatic estimation of the number of people present in a bus stop area in a crowded city, like Venice, and experimented with an approach integrating 3D data analysis, feature extraction and machine learning techniques. Results assessing the feasibility and performance of the proposed approaches are also presented in this paper.