1 Introduction

The relatively recent development of very powerful computational hardware, such as graphics processing units (GPUs), and of deep neural networks, paired with the availability of large quantities of digital data, has enabled machine learning (ML) to emerge as a field with the potential to drive great progress across many domains. ML is a subfield of artificial intelligence (AI) comprising a wide range of computational algorithms and modeling tools used to process large amounts of data; these algorithms aim to mimic human intelligence by learning from training data. ML has been applied to many fields, including robotics, pattern recognition, data mining, object recognition, face detection, and medical diagnosis [13, 48, 53, 85, 115].

Deep learning (DL) is a subset of ML that aims to learn many levels of distributed representations of the data to be modeled. DL models have achieved massive progress in terms of algorithms, applications, and theory. DL uses hierarchical recombination of features to extract relevant information and then learns the pattern representation by employing a multi-layer neural network. In recent years, medical image analysis has been boosted by ML and DL. ML and DL methods help doctors diagnose diseases, predict their risk, and intervene at an appropriate time [34]. They also help to predict patient numbers over the coming days during pandemics [86]. However, DL- and ML-based applications usually demand a large amount of annotated data to train the model, and producing it relies on human experts with specialized knowledge and clinical experience. The model's performance improves as the amount of annotated data increases. These annotated data are known as Ground Truth (GT) and are used for training, testing, and evaluating the models. GT represents the optimal performance that an algorithm is desired to achieve [20]. As illustrated in Fig. 1, the ML process starts with a raw dataset, which needs annotation for training and testing the models. To evaluate algorithmic performance, the deviation of a predicted result is measured with respect to the appropriate GT.

Fig. 1. Phases in the machine learning process. It starts with collecting raw data that demands annotation to train the model and measure its performance
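To make the evaluation step in Fig. 1 concrete, the sketch below computes the Dice coefficient, a common overlap measure between a predicted segmentation and its GT mask. This is a minimal illustration assuming NumPy and binary 0/1 masks; the function name and the toy arrays are our own and are not taken from any reviewed tool.

```python
import numpy as np

def dice_score(pred: np.ndarray, gt: np.ndarray) -> float:
    """Dice coefficient: 1.0 means the prediction matches the GT exactly."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    overlap = np.logical_and(pred, gt).sum()
    total = pred.sum() + gt.sum()
    return 1.0 if total == 0 else 2.0 * overlap / total

gt = np.zeros((4, 4), dtype=np.uint8); gt[1:3, 1:3] = 1      # annotated lesion
pred = np.zeros((4, 4), dtype=np.uint8); pred[1:3, 1:4] = 1  # model output
print(dice_score(pred, gt))  # 0.8: 4 overlapping pixels, 4 + 6 = 10 in total
```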

Good annotation tools are in great demand due to the massive increase in digital data [31]. The annotation process aims to transfer human knowledge to artificial intelligence models by summarizing digital data content and assigning predefined labels to it [9, 101]. Annotation tools are characterized by the tasks they cover, the functionalities they provide, and the features they support, such as pre-processing and automatic labeling [87]. These tools enable the user to label an object of interest in a frame and support three modalities: manual, semi-automatic, and automatic [19, 31].

Manual annotation of audio, digital images, text, or video is the initial processing step of most research projects and systems [42]. It requires human annotators to delineate and label spatial regions in an image or to define the temporal segments of an audio or video stream. Spatial regions are specified using a standard shape, e.g., circle, point, freehand-drawn mask, ellipse, polygon, or polyline, whereas temporal segments are determined by beginning and end timestamps. For example, facial landmarks annotate the positions of key points on the human face, such as the nose tip, eye corners, and eyebrows [40]. These landmarks are the main components of different face applications, including face recognition, facial attribute analysis, and face verification [33, 56]. Fig. 2 illustrates the most commonly used annotation techniques for labeling images and text.

Fig. 2. Common annotation techniques used in most tools. (a) Disease bounding box in a chest X-ray image from [92]. (b) Facial landmark locations applied to a grayscale image from [40]. (c) Text annotation by assigning labels to parts of the text, useful for Named Entity Recognition (NER), from [152]. (d) Polygon annotation in an ultrasound image from [21]. (e) Polyline annotation of an X-ray image using the Supervisely tool, from [26]. (f) Point annotation of the 40 mandible landmarks [22]. (g) Line annotation of a cancer lesion [122]

In most image-processing tasks, the desired annotations may range from labels at the image level (image classification), to bounding boxes (object detection), to annotations at the pixel level (image segmentation) [38]. Image annotation and segmentation are core components of computer-aided diagnosis (CAD) and image recognition systems [102]. CAD systems use medical images to identify image features and diagnose lesions [116]. Moreover, CAD recognizes regions of interest (ROIs) by utilizing image segmentation and automatic annotation tools to identify the relevant region. Medical imaging refers to several different technologies that are used to view the human body in order to diagnose, monitor, or treat medical conditions. Significant expertise is required to efficiently and correctly interpret the images generated by each of these technologies, which include, among others, radiography, ultrasound, and magnetic resonance imaging. Image annotation is widely employed in medical applications, where an imaging modality is annotated by an expert to improve the model's performance for lesion and disease detection. It is an approach to identify regions and add descriptions, explanations, or comments about these regions in textual form. Annotating medical images must be accurate; therefore, multiple experts usually view the data separately and perform the manual annotations [107]. Each annotator views the data independently and updates the annotations with primitive tagging. Next, the annotators judge the annotations together and agree on any changes needed to correct and update them. This process is known as gold-standard annotation, see Fig. 3.

Fig. 3. Annotation process to achieve high-accuracy annotation and satisfy the gold-standard technique
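The gold-standard process in Fig. 3 is a joint expert review, not an automatic merge, but a simple programmatic starting point for combining independently produced annotations is a pixel-wise majority vote, sketched below. This assumes NumPy and binary masks; the helper name is ours and does not come from any of the reviewed tools.

```python
import numpy as np

def majority_vote(masks: list) -> np.ndarray:
    """Pixel-wise majority vote over several annotators' binary masks."""
    stack = np.stack([np.asarray(m, dtype=np.uint8) for m in masks])
    # A pixel is kept when more than half of the annotators marked it.
    return (2 * stack.sum(axis=0) > len(masks)).astype(np.uint8)
```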

Choosing appropriate annotation tools saves time and effort; however, knowing all existing tools and selecting the most suitable one among them is complicated. Moreover, many annotation tools for various tasks have emerged over the last few years, which motivates us to provide a high-level overview of their successful usage, graphical user interfaces (GUIs), available annotation techniques, and supported features, e.g., zooming, input, and output. The main intention of this paper is to explore the major annotation tools for medical image tagging.

The remainder of the paper is structured as follows. A comprehensive review of related surveys is presented in Section 1.1. Section 2 presents the possible annotations for medical images, including input formats and exportation. Section 3 presents the annotation tools applied to medical images, along with snapshots of these tools. Medical image applications that employed the reviewed tools are described in Section 4. Lastly, Section 5 discusses the reviewed tools and Section 6 concludes the paper.

1.1 Related surveys

Previous surveys devoted to annotation tools offer a synopsis of the whole field while presenting the supported features. In recent years, a significant amount of research has emerged on the use of different annotation tools for different content types, illustrating the important role of annotation tools in the modern world. Dasiopoulou et al. [35] reviewed image and video annotation tools from functionality and interoperability perspectives, focusing on the problem of communicating, sharing, and reusing the generated metadata. Neves and Leser [105] presented a survey of biomedical text annotation tools, featuring 35 criteria to evaluate 13 annotation tools. These criteria encompassed documentation, supported formats, extensibility, implemented functionality, platforms, and popularity. Gaur et al. [50] presented a survey of five currently used annotation tools for video tagging, with snapshots of the tools' GUIs. These tools are VATIC, Beaverdam, ViTBAT, iVAT, and MViPER-GT. Moreover, they compared these tools in terms of platforms, targets, object shapes, the machine learning algorithms used, and interface design. Rebinth and Kumar [116] reviewed various manual annotation tools and the different available datasets. They presented the tools used in the segmentation stage of medical imaging to provide automatic detection and diagnosis of diseases.

In this survey, we focus on tools that support manual annotation of medical images and whose successful application for generating gold-standard annotations has been demonstrated by at least one study. Table 1 compares our survey to others in terms of medical image tools, tools' snapshots, and the mentioned applications of the tools. We selected the tools according to the following constraints: the tools should be easy to use, publicly available, and support zooming. This survey focuses on the medical field and discusses the 13 most popular tools for tagging images. Moreover, we support the tools with GUI snapshots and their successful applications.

Table 1 Comparison between this survey and related ones in terms of the number of reviewed tools, medical image applications, tools' snapshots, and mentioned tools' applications

2 Medical image annotation

Image annotation is the process of classifying or labeling an image using text, annotation tools, or both, to produce a set of corresponding labels for each image to train ML and DL models. This process is commonly applied to identify objects and boundaries and to segment images. Accordingly, medical image annotation is the process of labeling medical images from different imaging modalities, such as MRI, CT scan, ultrasound, mammography, etc., for ML and DL training. These annotations play a significant role in the healthcare sector, assisting with diagnosing different diseases, segmenting organs at risk before radiation therapy, and performing robotic surgery.

Figure 4 shows different types of medical image annotation. The segmentation of different human body organs allows further quantitative analysis of many clinical parameters, including shape and volume, as in cardiac or brain image analysis. In addition, it is often a significant first step in CAD pipelines, as shown in Fig. 5.

Fig. 4. Different medical image annotation types: (a) Original image [61]. (b) Benign and malignant classification of mammogram images. (c) Mass detection. (d) Breast arterial calcification segmentation in mammogram images. (e) Segmentation of different breast tissue regions in mammograms, covering four regions (pectoral muscle, fatty region, nipple region, dense region)

Fig. 5. Organ and substructure annotation: (a) Vertebrae segmentation [114]. (b) Pancreas segmentation [119]. (c) Lung segmentation [23]. (d) Liver segmentation [41]

Many ML and DL techniques in the medical field aim to segment or detect abnormalities in order to quantify them or classify them as malignant or benign. Both the segmentation and classification processes can be considered classification tasks: first, each pixel is classified as belonging to a lesion or not, which is known as semantic segmentation; then, the segmented abnormalities are classified as malignant or benign [77]. The detection of objects or lesions in medical images is an important part of disease diagnosis, but it is often a labor-intensive process. In most cases, detection consists of the localization and identification of a small part of the full image. There has been extensive research on CAD systems developed to automatically detect lesions with high accuracy and thus decrease the reading time of human experts. Annotation tools have been used to segment different objects in different organs, such as masses in the breast, nodules in the lungs, vessels in the retina, and tumors in the liver, brain, and other organs, as shown in Fig. 6.

Fig. 6. Part-of-organ annotation: (a) Brain tumour segmentation [110]. (b) Liver tumour segmentation [135]. (c) Lung lesion detection [151]. (d) Retinal vessel segmentation [134]. (e) Breast arterial calcification segmentation [5]
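The two-step view above can be illustrated with a short sketch: threshold a per-pixel lesion probability map into a semantic mask, then group the mask's pixels into candidate lesions via connected-component labeling. This assumes SciPy; the per-lesion malignant/benign classifier mentioned in the text is left as a hypothetical second stage.

```python
import numpy as np
from scipy import ndimage

def find_lesions(prob_map: np.ndarray, threshold: float = 0.5):
    """Step 1: per-pixel lesion decision; step 2: group pixels into lesions."""
    mask = prob_map >= threshold              # semantic segmentation decision
    labeled, n_lesions = ndimage.label(mask)  # connected components = lesions
    # Each candidate lesion (its pixel coordinates) would then be passed to an
    # application-specific malignant/benign classifier, not shown here.
    return [np.argwhere(labeled == i) for i in range(1, n_lesions + 1)]
```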

2.1 Input and output

Various scanning techniques have been used to visualize the interior of the human body, generating multiple modalities including X-ray (as in mammography [3]), Ultrasound (US), Magnetic Resonance Imaging (MRI), Computed Tomography (CT), Positron Emission Tomography (PET), microscopy images, histology slide images, dermoscopy images, Optical Coherence Tomography (OCT) images, and color fundus images [8]. Figure 7 shows examples of medical images. CT and MRI are able to examine multiple organs at the same time, while retinal photography and dermoscopy are organ-specific. The amount of data generated by each imaging modality varies; for instance, an MRI can be a few hundred megabytes, while a histology slide is an image file of a few megabytes. This has technical effects on the way the data is pre-processed and on the design of the model architecture, in terms of processor and memory limitations. Medical imaging has led to improvements in the diagnosis and treatment of numerous medical conditions in children and adults.

Fig. 7. Examples of expected inputs (figures are cropped from the corresponding papers): (a) 2D CT scan image [76], (b) Brain MRI image [76], (c) Shoulder X-ray image [76], (d) OCT image [11], (e) Fundus image [11], (f) Ultrasound image [76], (g) PET brain image [76], (h) Breast histology image [10], (i) Dermoscopy image [153]

There are several types of medical imaging, each of which uses different technologies and techniques. CT and radiography (including mammography) use ionizing radiation to generate images of the body. In radiography, a single image is recorded for later evaluation (mammography is a special type of radiography for imaging the internal structures of the breast), while in CT many X-ray images are recorded as the detector moves around the patient's body, and a computer reconstructs all the individual images into cross-sectional images, or 'slices', of internal organs and tissues. The 2D images generated by X-ray are used in several evaluation settings, such as bone fractures, pneumonia, pulmonary edema, renal or gallbladder stones, and intestinal obstructions. CT images are used in many settings, including trauma evaluation, and can be used to evaluate internal organ systems such as the neurologic, gastrointestinal, genitourinary, and vascular systems. MRI is a medical imaging procedure for making images of the internal structures of the body. MRI scanners use strong magnetic fields and radio waves (radiofrequency energy) to make images. During an MRI exam, an electric current is passed through coiled wires to create a temporary magnetic field in the patient's body. Radio waves are sent from and received by a transmitter/receiver in the machine, and these signals are used to make digital images of the scanned area of the body. OCT imaging is the main diagnostic technology for retinal diseases. Fundus images show the retina, the optic disk, and blood vessels; they are used to diagnose diabetic retinopathy, macular degeneration, and glaucoma.

Medical image file formats can be divided into two main groups. The first group is intended to standardize the images generated by diagnostic modalities, e.g., DICOM. The second group aims to facilitate post-processing analysis, e.g., Analyze, NIfTI, and MINC [88]. Some of the annotation tools support these formats, while others require converting the files to a common image file format such as PNG or JPEG.
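As a hedged sketch of that conversion step, the snippet below reads a DICOM slice with the pydicom package and saves a min-max rescaled 8-bit PNG with Pillow. The file names are placeholders, and a real pipeline would apply the modality's proper intensity windowing rather than a simple rescale.

```python
import numpy as np
import pydicom
from PIL import Image

ds = pydicom.dcmread("scan.dcm")            # placeholder input path
px = ds.pixel_array.astype(np.float32)      # raw pixel data
lo, hi = float(px.min()), float(px.max())
px8 = ((px - lo) / (hi - lo + 1e-6) * 255).astype(np.uint8)  # min-max rescale
Image.fromarray(px8).save("scan.png")       # 8-bit image for the annotator
```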

Annotation outputs are usually exported in one of the following file formats: Comma-Separated Values (CSV) files, text files, JavaScript Object Notation (JSON), TensorFlow Records (TFRecord), or database-specific files [143]. JSON is a lightweight data-interchange format that is easy for humans to read and write, and also for machines to parse and process. Fig. 8 shows an example of expected output files for mass detection in mammography images. A TFRecord file stores data as a sequence of binary records. Commonly used database-specific formats include COCO, which stores annotations as JSON; Pascal VOC, which stores annotations in XML files; and YOLO, which stores annotations in .txt files.

Fig. 8. Examples of expected outputs: (a) the JSON file corresponding to (b), the detected mass in a breast mammogram; (c) the CSV file corresponding to (b), the same detected mass
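The same bounding box is encoded differently across these database-specific formats: COCO stores [x_min, y_min, width, height] in pixels, Pascal VOC stores [xmin, ymin, xmax, ymax], and YOLO stores a class index with [x_center, y_center, width, height] normalized by the image size. A small conversion sketch follows; the helper names are our own.

```python
def coco_to_voc(x, y, w, h):
    """COCO [x_min, y_min, width, height] -> VOC [xmin, ymin, xmax, ymax]."""
    return x, y, x + w, y + h

def coco_to_yolo(x, y, w, h, img_w, img_h):
    """COCO box -> YOLO [x_center, y_center, width, height], all in [0, 1]."""
    return ((x + w / 2) / img_w, (y + h / 2) / img_h, w / img_w, h / img_h)

# e.g., a 100x50 mass at (200, 300) in a 2457x1996 mammogram:
print(coco_to_voc(200, 300, 100, 50))               # (200, 300, 300, 350)
print(coco_to_yolo(200, 300, 100, 50, 2457, 1996))  # normalized centre format
```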

3 Annotation tools

Choosing the most appropriate tool for a specific application is very important because it significantly affects the quality of the data and the time needed to produce it. For this section, we researched and tested 13 different image annotation tools and summarized the features of each one. In addition, we reference the papers in which these tools were used to accomplish different tasks. For each reviewed tool, we show a snapshot of its GUI, taken while we tested the tool by segmenting a mammogram image to annotate different objects, including the pectoral muscle, fatty tissue, the nipple, and a breast mass.

3.1 VGG Image Annotator (VIA)

VIA [42] is a software package developed in HTML, CSS, and JavaScript by the Visual Geometry Group (VGG) at the University of Oxford. VIA is an open-source, straightforward, freely available manual annotation tool used for video, image, and audio segmentation. It does not require any installation or setup, since it runs in a web browser. The software fits into a single self-contained HTML page of less than 400 kilobytes, which operates as an offline program in most modern web browsers. Annotations are exported as JSON and CSV to allow further processing by other software tools. The software supports cooperative annotation of a big dataset by a group of human annotators and does not depend on external libraries. For delineating regions, it offers six shapes: polygon, rectangle, ellipse, circle, polyline, and point. The most used one is the rectangle, which is suitable for determining an object's bounding box. The point is used to specify feature points such as landmarks or the main points in MRI images. A text description can be used to describe the region's content. A snapshot of VIA's GUI is presented in Fig. 9. This tool has been used for semantic segmentation [6], video annotation [2, 129], and image annotation [12, 25].

Fig. 9. Breast parts and tumor annotation using the VIA tool in an X-ray image [26]
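As an illustration of post-processing VIA's output, the sketch below reads rectangle regions from a VIA-style JSON export. The key names ("regions", "shape_attributes", etc.) follow the VIA 2 project export schema but should be checked against your own files; "via_export.json" is a placeholder path.

```python
import json

with open("via_export.json") as f:        # placeholder path
    project = json.load(f)

for item in project.values():             # one entry per annotated image
    for region in item.get("regions", []):
        shape = region["shape_attributes"]
        if shape.get("name") == "rect":   # bounding-box regions
            print(item["filename"],
                  (shape["x"], shape["y"], shape["width"], shape["height"]))
```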

3.2 LabelIMG

LabelIMG is a software package developed in Python, with QT used for the graphical interface. It is an open-source, straightforward, offline tool that can be obtained for both Windows and Mac operating systems via GitHub. It annotates objects in images using only bounding boxes. Annotations are stored as XML files in PASCAL VOC format, and the YOLO format is also supported. However, it only supports straightforward image annotation, meaning that it does not provide annotation for image streams and offers no auxiliary annotation functions [154]. Figure 10 shows a snapshot of LabelIMG's GUI. This tool was previously used for annotating ship images [18] and labeling bird images [118].

Fig. 10. Breast parts and tumor annotation using the LabelIMG tool
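Since LabelIMG writes one PASCAL VOC XML file per image, reading its boxes back takes only a few lines with the Python standard library. The tag names below follow the standard VOC layout; "annotation.xml" is a placeholder path.

```python
import xml.etree.ElementTree as ET

root = ET.parse("annotation.xml").getroot()  # placeholder path
for obj in root.iter("object"):              # one element per labeled box
    label = obj.findtext("name")
    box = obj.find("bndbox")
    corners = tuple(int(box.findtext(tag))
                    for tag in ("xmin", "ymin", "xmax", "ymax"))
    print(label, corners)
```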

3.3 Ratsnake

Ratsnake [67] is a generic, semantically aware image annotation tool developed in Java. It is open source, straightforward, and provides quick image annotation with snakes (active contours). Ratsnake uses a semi-automatic approach to graphical image annotation, relying on a customizable active contour model that takes rapid user annotations into account. It allows fast segmentation and annotation of images with polygons, grids, or both. Moreover, it transforms binary masks into polygon annotations. However, only one mask can be added per image. Ratsnake supports exporting annotations as custom text, LabelMe XML, and OWL formats. Furthermore, it requires a Java Virtual Machine to be installed. Figure 11 shows a snapshot of Ratsnake's GUI. This tool was previously used for annotating images obtained by wireless capsule endoscopy [70] and for annotating coherent regions in video [139].

Fig. 11. Breast tumor annotation using the Ratsnake tool

3.4 Visual object tagging tool (VOTT)

VOTT is a software package developed in TypeScript by the Commercial Software Engineering (CSE) group at Microsoft. It is an open-source, straightforward web application, available for free on GitHub and used for graphical images and video. It annotates objects in images using only bounding boxes. Annotations are exported in many formats, such as Microsoft Cognitive Toolkit, TensorFlow (Pascal VOC and TFRecords), VoTT (generic JSON schema), and CSV. It provides the ability to export to and import from local or cloud storage. Fig. 12 shows a snapshot of VOTT's GUI. It was previously used for annotating feather images [15] and for annotating images of the YawDD dataset [1] in [4].

Fig. 12. Breast parts and tumor annotation using the VOTT tool in an X-ray image

3.5 Mask editor

Mask Editor [160] is a software package developed in MATLAB. It is open source, straightforward, and freely available for generating image masks. It supports drawing irregular mask shapes around an object. It provides many annotation functions, including erasing, super-pixel marking, cropping, zooming, navigation between images, and B-curve drawing. It saves images in multiple formats, including JPG, BMP, TIFF, and PNG. Figure 13 shows a snapshot of Mask Editor's GUI. Mask Editor was previously used for annotating surgical tools [31].

Fig. 13. Breast tumor annotation using the Mask Editor tool

3.6 Supervisely

Supervisely [136] was developed by Deep Systems as a powerful platform for computer vision development, where individual annotators and large teams can work together and experiment with datasets. It is a web-based application with an easy-to-use GUI that helps individuals with and without ML experience to create computer vision applications. Supervisely provides tools to draw annotations in a completely manual way or semi-automatically, by selecting the desired area and automatically generating the desired shape. It has quick-access commands to make the marking process more efficient. Another important function is the ability to modify contrast and brightness to improve the marking process. It can annotate either with vector graphics or at the pixel level: the vector graphics tools are polygon, rectangle, polyline, and point, while the pixel-level tools are brush, eraser, and smart tool. It enables the user to perform different functions on geometric objects, labeled data, and tags. It supports images, videos, volumetric data, and medical data in various formats, including .png, .jpeg, .mp4, .avi, .dicom, .pcd, and others. Furthermore, annotations can be exported in multiple ways, such as .json, .png masks, .tfrecords, .xml, and more. However, time statistics and quality control mechanisms are missing in Supervisely. The community edition of this tool is free, but a fee is charged for self-hosted versions. Fig. 14 shows a snapshot of Supervisely. This tool has been used for object segmentation [14, 63], bone segmentation [71], object detection [46, 54, 108, 111, 117], segmentation [90, 158, 159], and segmentation of pneumothorax [138].

Fig. 14. Breast parts and tumor annotation using the Supervisely tool

3.7 RectLabel

RectLabel [78] (2017) is an image annotation tool for labelling images for object detection, bounding boxes, and segmentation. RectLabel offers many features, such as drawing polygons, bounding boxes, lines, and cubic Bézier curves. It allows drawing key points and skeletons, and labeling image pixels with a brush. It provides automatic superpixel tools for labeling images. RectLabel reads and writes the PASCAL VOC XML format and enables the user to export to YOLO, COCO JSON, and CSV formats. It also allows users to export indexed color masks and separated mask images. RectLabel provides user-friendly labeling of images and retrieves images based on labels. However, it is only available through the Mac App Store. Fig. 15 shows a snapshot of RectLabel. This tool has been used for object detection [73, 79, 147, 148], vessel lumen segmentation [150], and polyp detection [112].

Fig. 15. Breast parts and tumor annotation using the RectLabel tool

3.8 LabelMe

LabelMe [123] is a free, web-based software package developed by the MIT Computer Science and Artificial Intelligence Laboratory using JavaScript. This tool enables users to annotate images, focusing on ease of use and simplicity of design. Segmentation is performed by drawing polygons over the objects of interest in the images. The user can export the results in an XML file format, making them easy to extend and transfer. Fig. 16 shows a snapshot of the LabelMe GUI. This tool has been used for object segmentation [58].

Fig. 16. Breast parts and tumor annotation using the LabelMe tool

3.9 Labelme

Labelme [144] was developed by Kentaro Wada based on the previous LabelMe (Section 3.8). It is a graphical image and video annotation tool written in Python, with Qt used for its graphical interface. It provides polygon, rectangle, circle, line, and point tools. It can be downloaded for Ubuntu, macOS, and Windows. Labelme enables the user to select a rectangular region containing the object to annotate and then to add a category or label for the object contained in the box. Finally, the annotation files can be exported as JSON files in VOC or COCO formats. The polygon annotation tool captures a detailed contour of the object, which is useful for image segmentation tasks. A disadvantage of Labelme is that it only accepts images in JPG format. Fig. 17 shows a snapshot of the Labelme GUI. This tool has been used for object segmentation [28, 47, 96, 121, 145] and object detection [80].

Fig. 17. Breast parts and tumor annotation using the Labelme tool
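Because Labelme stores polygons as vertex lists in JSON, a common follow-up step for segmentation training is rasterizing them into a binary mask. The sketch below uses Pillow and NumPy; the key names ("shapes", "points", "imageHeight", "imageWidth") follow Labelme's JSON export but should be verified against your own files, and "image.json" is a placeholder path.

```python
import json
import numpy as np
from PIL import Image, ImageDraw

with open("image.json") as f:                 # placeholder path
    ann = json.load(f)

h, w = ann["imageHeight"], ann["imageWidth"]
mask = Image.new("L", (w, h), 0)              # blank single-channel canvas
for shape in ann["shapes"]:
    if shape.get("shape_type") == "polygon":
        pts = [tuple(p) for p in shape["points"]]
        ImageDraw.Draw(mask).polygon(pts, outline=1, fill=1)
mask = np.asarray(mask, dtype=np.uint8)       # 1 inside polygons, 0 elsewhere
```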

3.10 Computer vision annotation tool (CVAT)

CVAT [72] is an open-source program developed by Intel to annotate both image and video data. The process in CVAT starts by creating an annotation task with a specific name, labels, and attributes. Datasets are then loaded from a mounted file system inside a container or from the local file system. A task can include one image, one video, or a set of images from shared storage. CVAT allows users to annotate images with several types of shapes, such as boxes, polygons (used for both general and segmentation tasks), polylines, and points. CVAT is easily reached through a web-based interface. However, CVAT has only been tested in the Google Chrome browser and may not work well in other browsers. CVAT supports different image and video formats, such as *.png, *.jpg, and *.mp4, and enables users to export annotations and images in specific formats such as CVAT for video, CVAT for images, PASCAL VOC, and many other dataset formats. Fig. 18 shows a snapshot of the CVAT GUI. This tool has been used for object detection [106, 155] and segmentation [132].

Fig. 18. Breast parts and tumor annotation using the CVAT tool

3.11 LabelBox

LabelBox, developed by Sharma, Daniel Rasmuson, and Brian Rieger [127], is a commercial online web-based annotation system, free to use, for segmentation and classification purposes. It includes different types of markers: line, point, brush, and superpixel. After finishing an annotation task, the user can export the mask results in different formats such as CSV and JSON. The generated mask is compatible with multiple models such as Mask R-CNN. LabelBox has one of the best user experiences so far. One of the things that makes annotation easier in LabelBox is that when the user draws a marker on an object in an image, the polygon snaps to the object's border. Fig. 19 shows a snapshot of LabelBox. This tool has been used for object detection [99] and object segmentation [5, 30, 62].

Fig. 19. Breast parts and tumor annotation using the LabelBox tool

3.12 ITK-SNAP

ITK-SNAP [156] is a software application that allows users to annotate 3D medical images, manually draw anatomical areas, and automatically perform image segmentation. It was designed with scientific and clinical researchers in mind; thus the focus has been on providing a user-friendly interface and keeping the feature set limited to prevent feature creep. ITK-SNAP is mostly used with Cone-Beam Computed Tomography (CBCT), MRI, and CT data. The main features of the software are manual segmentation, image navigation, and automatic segmentation. ITK-SNAP is open source, free, and multi-platform. It supports many different 3D image formats, such as NIfTI and DICOM, and exports the segmentation results as images. Fig. 20 shows a snapshot of ITK-SNAP. This tool has been used for 3D object segmentation [17, 157].

Fig. 20. Breast parts annotation using the ITK-SNAP tool
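One practical downstream use of an ITK-SNAP segmentation exported as NIfTI is measuring per-label volumes. The sketch below assumes the nibabel package and an integer-valued label volume; "seg.nii.gz" is a placeholder path.

```python
import nibabel as nib
import numpy as np

seg = nib.load("seg.nii.gz")                            # placeholder path
labels = np.asarray(seg.dataobj).astype(np.int32)       # integer label volume
voxel_mm3 = float(np.prod(seg.header.get_zooms()[:3]))  # voxel size in mm^3
for label in np.unique(labels):
    if label == 0:
        continue                                        # 0 = background
    volume = (labels == label).sum() * voxel_mm3
    print(f"label {label}: {volume:.1f} mm^3")
```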

3.13 3D-Slicer

3D-Slicer [44] is a multi-module software package in which each module performs a specific 3D medical image processing task. It is free, open-source software developed in Python and C++. For annotating and segmenting 3D medical images, several modules can be helpful, including Simple Region Growing Segmentation, which is based on intensity statistics; EMSegment Easy, which performs quick intensity-based image segmentation on MRI; and the Editor module, which includes a collection of tools for manual segmentation (e.g., paint, draw) and semi-automatic segmentation (e.g., thresholding, region growing, interpolation). 3D-Slicer allows users to upload an extensive variety of image formats and includes format conversion functions. Segmentation output can be exported as NRRD or NIfTI files. Fig. 21 depicts a snapshot of 3D-Slicer. This tool has been used for 3D object segmentation [124, 128, 141, 161].

Fig. 21. Breast arterial calcification segmentation using the 3D-Slicer tool

4 Medical applications

Medical image annotation is required for developing robust and precise models, but it is also considered a major hurdle [120]. Image-based CAD models try to facilitate the detection of medical lesions and abnormalities by evaluating medical images as objectively as possible, using image features and prior knowledge about the particular application domain. Such systems usually combine image segmentation methods, to isolate ROIs corresponding to prominent objects, with automatic annotation methods, to attach labels that characterize each region. Prior knowledge is typically acquired from associated medical studies and domain experts, through manual annotation and segmentation of images [67]. In this section, we present some medical applications that employed the above-mentioned tools. Tools for both detection and segmentation tasks are summarized in Table 2.

Table 2 Medical image segmentation summary. The table is categorized according to the type of image used in the research

4.1 Detection

Detection of abnormalities requires a tremendous amount of annotated data. Typically, detection is performed using a bounding box around each object of interest in an image [131]. The VIA tool has been employed for various detection tasks in medical images, including knee joint detection by Kondal et al. [82], nuclei identification in placenta imaging by Ferlaino et al. [45], chest region and heart detection in X-ray images by Gupte et al. [57], and foot ulcer detection by Cassidy et al. [24]. Additionally, it was utilized by Mallissery et al. [97] to identify sensitive data in medical images and by Rajaraman et al. [113] to detect abnormalities in chest radiographs related to COVID-19.

Various detection tasks have been developed using the LabelImg tool. For instance, osteoarthritis was detected in MRI images by Singh et al. [130], foot ulcers were detected by Cassidy et al. [24], other lesions were detected by Sha et al. [126], and malaria was detected by Nakasi et al. [104]. Additionally, Li et al. [91] used it for labeling tongue images, while Hahn et al. [60] used it to bound the abdominal aortic aneurysm region.

The Ratsnake tool was utilized for blood detection [68] and lesion detection [36, 69, 70]. VOTT was used for diagnosing brain tumors [43] and for surgical tool detection [31]. Rahim et al. [112] used the RectLabel tool to detect polyps in colonoscopy images. Wei et al. [146] used it to detect colorectal polyps on histopathology slides by setting rectangular bounding boxes around the polyps. Moreover, Kawazoe et al. [79] employed it to detect glomeruli in multi-stained human whole-slide histopathology images. Hadush et al. [59] used LabelMe to detect mass abnormalities in mammograms.

4.2 Segmentation

Segmentation involves annotating the object at pixel-level detail. Ciaparrone et al. [31] utilized VIA for annotating surgical tools. Alia et al. [7] provided accurate semantic segmentation of endoscopy artifacts using VIA. Dhieb et al. [37] developed a framework for automated blood cell analysis, using VIA to segment cell images. Additionally, Hosseini et al. [64] developed an approach for counting, detecting, and categorizing cells in microscopy images, annotating the images with VIA. Lee et al. [89] utilized it for segmenting surgical tools. Brehar et al. [21] annotated hepatocellular carcinoma (HCC) and cirrhotic parenchyma (PAR) in ultrasound images of the liver using VIA. Vats et al. [140] used the Ratsnake tool to identify ROIs in gastrointestinal images. It was also used to segment lesions [36, 84].

Iglovikov et al. [71] used the Supervisely tool for segmentation of hand bones to train a DL model to assess pediatric bone age. Moreover, Francois et al. [46] used it for semantic segmentation of laparoscopic images of the uterus to automatically detect the organ, including its contours. In addition, Tolkachev et al. [138] employed it to segment pneumothorax air pockets on X-ray. Supervisely was also used by Zadeh et al. [158] to segment ovaries, uteruses, and surgical tools from laparoscopic gynaecological images that were used in image-guided surgery systems. Zaki et al. [159] used it for manual segmentation of the nucleus from different cell types.

Xie et al. [150] used the RectLabel tool to segment the vessel lumen from ultrasound images. Gurari et al. [58] used LabelMe to segment several medical image datasets to evaluate expert, non-expert, and algorithmic segmentation performance. Zhu et al. [162] used it to segment teeth from natural color images by creating polygons around the teeth. Gou et al. [55] used Labelme to segment teeth in CT images, while Vlontzos et al. [142] employed it to segment vessels and catheters from fluoroscopy images. Gentil et al. [51] used LabelMe to segment cells in microscopy images. It was also used by Huang et al. [66] to segment vertebrae from MR images.

Roy et al. [121] used the Labelme tool to segment COVID-19 markers in lung ultrasounds. Liu et al. [94] used it to segment the throat from CT images to study the effect of throat area irregularity on tracheal intubation difficulty. Chen et al. [27] employed Labelme to segment lungs from CT images, while Kordon et al. [83] used it to segment femoral condyles from knee joint X-ray images. Yu et al. [154] segmented the optic disc and the macula area using the Labelme tool. Song et al. [133] had physicians and experienced radiologists segment gallstones in CT images using Labelme. Tang et al. [137] used it to segment the opacity regions in the lungs from X-rays of COVID-19 patients.

Yushkevich et al. [157] used the ITK-SNAP tool to segment multi-modality imaging datasets such as MRI brain scans. Besson et al. [17] used it to segment lung tumours from fluorodeoxyglucose (FDG)-PET images. Gaonkar et al. [49] used ITK-SNAP to manually segment intervertebral disks. Xian et al. [149] manually segmented the main vessels from X-ray angiography images with ITK-SNAP. Muller et al. [103] segmented muscle volumes of the left and right lower legs and thighs using the ITK-SNAP software. Park et al. [109] used ITK-SNAP to manually segment aneurysms on each slice of CT images for the diagnosis of cerebral aneurysms. Kim et al. [81] used it to segment the two regions of the supraspinatus fossa and muscle in MRI slices. Liu et al. [93] used ITK-SNAP to segment liver tumors along their boundaries from ultrasound images. Mansoor et al. [98] used ITK-SNAP to segment the anterior visual pathway from MRI images. Sanchez et al. [125] used it to segment bones from CT images. AlGhamdi et al. [5] used Labelbox for breast arterial calcification segmentation from mammography. Jha et al. [74] segmented polyps from gastrointestinal polyp images using Labelbox. Sirazitdinov et al. [132] used CVAT to manually annotate pixels belonging to tubes, wires, or catheters in chest X-rays.

Vickery et al. [141] performed annotation of the brain region using 3D-Slicer. The 3D-Slicer was also used to segment the hippocampal region [29], annotate abdominal organs [128], perform data pre-processing to extract the brain region [65], and highlight nodules in the chest [124]. Furthermore, Zhang et al. [161] employed 3D-Slicer to position the micro-calcifications identified by a 3D bounding box in each digital breast tomosynthesis volume.

5 Discussion

The need to create annotated datasets has increased with the popularity and availability of application-specific DL methods. In fact, building an extensive list of tools is challenging due to the large number of new tools being released while some of the old ones fall out of use. Moreover, determining the best annotation tool usually requires trying several of them to learn their features and usability. However, researchers may fail to obtain source or executable code for many published tools, or may fail to install them due to various technical issues. We discuss some of the available software tools created for image annotation tasks and compare their feature scope in Table 3. The comparison between these tools was conducted according to seven criteria: web-based or desktop platform, zoom in/out feature, free or fee-based, input types, output types, graphic annotation types, and annotation method. For web-based systems, users do not need to download the software to their computers; they just need to open the tool's link in a browser when they are ready to annotate the data. In contrast, if the tool is installed, the user does not need the internet during the annotation process. Manual annotation of images, with regions of interest defined by the user, is a time-consuming and costly process; furthermore, it is user-dependent. On the other hand, automatic annotation uses ML algorithms trained on pre-annotated images to annotate new sets of images, which is cheaper and faster.

Table 3 Comparative summary of image annotation tools

By personally testing all 13 tools above, we found that all of them provide the ability to zoom in and out of specific parts of images. Most of the tools do not support all medical file formats; the exceptions are 3D-Slicer, ITK-SNAP, and Supervisely, which accept more medical image file extensions for annotation than the other tools. Another limitation is that some of them do not work on all operating systems; for example, RectLabel works only on macOS. RectLabel is also not free, so its use might be limited. Based on an experiment by Joel et al. [75] with 30 people, the process of importing a set of images into Supervisely was very direct and simple. With Labelbox it was also easy to import images, but it required extra steps, such as defining the objects, the colors for labels, and the type of tools before getting to the annotation screen. Furthermore, Labelbox does not allow the download of single or specific images but instead requires downloading the entire dataset, and the same holds for Supervisely. In Labelbox, the generated output contains a JSON file, while in Supervisely it may also contain a mask of the annotation in an image format. Joel et al. [75] also found that Supervisely and LabelBox are faster than other tools at opening a previously created mask, since both are implemented as cloud-based services.

The LabelIMG tool lacks management features, such as the editing of images [32]. Supervisely, by contrast, manages a project in several layers (teams, datasets, and workspaces), which is an advantage over LabelIMG. According to Dondi et al. [39], VIA does not require prior technical knowledge: it supports manual annotation requirements and allows users to produce metadata. Most non-technical users prefer VIA because it does not require a setup procedure or additional software installation. Moreover, the web-based VIA tool runs easily on all platforms, such as Windows, macOS, and Linux. Mallissery et al. [97] used VIA and saved the annotations as a JSON file. Bernal et al. [16] utilized VIA to let the user obtain CSV files with the entire annotation results, i.e., the annotation mask and text metadata values. Ratsnake relies on the use of a snake and directly provides the binary GT masks as a result; it operates at the appropriate pixel-wise level, is easy to use, and is open source [52]. Both VOTT and CVAT focus on single-annotator use instead of allowing many annotators to work simultaneously on the same file [95].

We also evaluated the annotation time and cost for all reviewed tools, including the segmentation time per image on a 2 GHz Quad-Core Intel Core i5 device. We defined a test case of segmenting the breast arterial calcification (BAC) in a single mammogram image with a size of 2457 × 1996 pixels. Note that using the same segmentation method for all the tools was not feasible, since they were developed for different applications. Thus, a polygon segmentation method was used in all tools except LabelIMG, where we used a detection box, and 3D-Slicer, where we used a paint effect. Each tool's time and cost-effectiveness are summarized in Table 4, where N/A indicates a web-based tool that does not require installation. As the table shows, among the tools that offer the polygon method, Ratsnake was the fastest and LabelMe the slowest. However, considering the tools' size, the web-based tools (including VIA, Supervisely, CVAT, and LabelBox) are preferable for medical images, since they do not consume local storage.

Table 4 The annotation time and cost for each reviewed tool. The first column contains the time in seconds for segmenting the breast arterial calcification in a single mammogram, while the second column shows the tool's size in megabytes

Most of the annotation tools rely on semantic web technologies and use metadata attached to the annotated objects. Annotating a particular object in an image with different shapes is performed according to the model's need to carry out specific operations in the region. Several existing manual annotation tools demand installation and setup procedures, which often creates a barrier for non-technical users who cannot deal with setup and installation on different platforms [100]. User experience design (UXD) is an open research area for annotation tools: usability and accessibility can be enhanced by adding features that increase user satisfaction.

As observed in Fig. 22, the most used tools for medical tasks are ITK-SNAP and VIA. However, the VIA tool is used for both segmentation and detection tasks, while the ITK-SNAP tool is employed only for medical image segmentation purposes.

Fig. 22. The papers that used the reviewed tools, in terms of detection and segmentation tasks

6 Conclusions

Due to the large number of medical images generated, annotation tool software has become necessary for tagging these images to improve medical applications. In this survey, we reviewed various tools that have been used for medical image annotation, under several constraints such as ease of installation and support for manual annotation. We presented a brief description and a snapshot of each tool and evaluated the tools according to different criteria. We also cited papers that used these tools in different medical application tasks. Choosing a suitable software tool for annotating images is an extremely important step that can significantly reduce the amount of work and time needed for annotation, yet being familiar with all common tools and deciding on the most suitable one is intricate. The aim of this survey is to help researchers select the appropriate annotation tool for their task and take a glance at each tool's GUI to simplify working with it.

Our study shows that manual image annotation tools are the preferred tools in the medical domain. Manually tagging a large dataset of medical images is a time-consuming and expensive process, since it needs multiple experts' opinions to avoid human errors. On the other hand, semi-automatic and automatic annotation can achieve good results while saving time and effort. We also found that most users prefer web-based tools, which are easy to access and use and do not require prior technical knowledge. Overall, we hope that this survey will increase awareness of what already exists and reduce the resources spent on developing functionality that is already available. A greater focus on deep learning-based models for building automatic medical image annotation software would achieve better results. A further study focusing on user requirements is suggested, to help developers identify weaknesses in their tools and to assist those planning to develop new annotation tools.