Hyperspectral classification for identifying decayed oranges infected by fungi

*Corresponding author: Yong Niu, College of Forestry, Shandong Agricultural University, Taian, 271018, China. E-mail: yy_198111@163.com
Received: 11 May 2017; Revised: 25 September 2017; Accepted: 27 September 2017; Published Online: 02 October 2017
Yin, et al.: Identification of the decayed oranges. Emir. J. Food Agric, Vol 29, Issue 8, 2017


ABSTRACT
Fast and nondestructive detection of early decay caused by fungal infection in citrus fruit is a challenging task for the citrus industry during postharvest processing. In general, workers in fruit packing houses rely on the ultraviolet-induced fluorescence technique to detect and remove decayed citrus fruits. However, this operation is harmful to human health and is also very inefficient. In this study, navel oranges were used as the research object, and a novel method based on hyperspectral imaging in the 400-1100 nm wavelength region was proposed to solve this problem. First, normalization approaches were applied to decrease the variation in spectral reflectance intensity caused by the natural curvature of the navel orange surface. Then, the spectral data of regions of interest (ROIs) from normal and decayed tissues were analyzed by principal component analysis (PCA) to investigate how well visible and near infrared (Vis-NIR) hyperspectral data discriminate these two kinds of tissues. Next, six characteristic wavelength images were obtained by analyzing the loadings of the first principal component (PC1), and a multispectral image was established from the six corrected characteristic wavelength images. On the basis of the multispectral image, pseudo-color image processing with intensity slicing was used to produce a two-dimensional color image with clear contrast between decayed and normal tissues. Finally, an image segmentation algorithm combining the pseudo-color processing method and a global threshold method was proposed for fast identification of decayed navel oranges. For 240 independent samples, the success rates were 100% and 97.5% for decayed navel oranges infected by Penicillium digitatum and normal navel oranges, respectively. In particular, the proposed algorithm was also applied to decayed navel oranges infected by Penicillium italicum (samples not used for the development of the algorithm) and obtained a 91.7% identification accuracy, indicating good generalization ability and practical application value.

INTRODUCTION
Fresh citrus fruits are very popular due to their special flavor and high vitamin C concentration. In China, the yearly output has exceeded thirty million tons. Therefore, citrus processing plants seek to automate fruit sorting in order to improve the quality of fresh citrus fruits. The fast and nondestructive detection of peel defects of citrus fruits is a challenging task (Li et al., 2013; Magwaza et al., 2012; Qin et al., 2012). Compared to common surface defects such as scars, decay caused by fungal infection is one of the most serious types of damage affecting the marketing of fresh citrus fruits. Infected fruits cannot be stored or transported for long periods, since only a few decayed fruits can infect a whole batch. Thus, the citrus industry will suffer great economic losses if the damaged fruits are not detected at an early stage.
In the citrus industry, green mold (Penicillium digitatum) and blue mold (Penicillium italicum) are the most common fungi resulting in postharvest losses (Eckert et al., 1989; Palou et al., 2011). In China, eighty percent of decayed citrus fruits were caused by Penicillium digitatum and, secondarily, Penicillium italicum (Liu, 2002). These two diseases can lead to a postharvest rot rate of fresh citrus fruits of up to 30~50 percent (Jia et al., 2013). The key difficulty in fast and automated detection of infected fruits is that, in the early stages, damage caused by fungi is hard to find because the peel color is similar to that of normal peel. Therefore, it is very difficult to detect infected fruit with a standard RGB imaging system (Blasco et al., 2007). At present, workers in citrus processing factories usually use the ultraviolet (UV) light technique to inspect decayed citrus fruits, because the infected regions show bright fluorescence under UV illumination (Momin et al., 2013). However, UV detection could result in some potentially hazardous problems for human health (Lopes et al., 2010).
Besides the UV technique, near infrared spectroscopy (NIRS) is also helpful for identifying this kind of damage because of the differences in optical properties between tissues (Blasco et al., 2000). However, conventional NIRS captures tissue information from only a small part of the detected sample. In this study, the spatial distribution of the different tissues was required, considering the random distribution of decayed spots on citrus fruits. Therefore, it was necessary and meaningful to explore a novel technology for detection of decayed citrus fruits in the early stages.
Recently, hyperspectral imaging technology, which provides both spectral and spatial information about an object simultaneously, has been used to identify slight damage such as chilling injury (ElMasry et al., 2009; Sun et al., 2017) and slight bruising (Lee et al., 2014; Huang et al., 2015). In some recent studies, applications of this technique to detect decay caused by fungi were also reported and showed that it was a powerful tool for identifying decayed citrus fruits (Lorente et al., 2013; Gómez-Sanchis et al., 2013, 2014). However, these studies mainly used Liquid Crystal Tunable Filters (LCTFs) to build the hyperspectral imaging system. A system based on LCTFs has two disadvantages: (1) a longer time is needed to tune the system in actual applications (Gómez-Sanchis et al., 2014) and (2) the system is not suitable for a moving target in in-line detection. In contrast, a hyperspectral imaging system based on an image spectrophotometer (HIS-IS) is particularly suitable for a moving target and offers high spectral resolution (Sun, 2010). Therefore, HIS-IS may be more practicable for the present study.
The main objective of this study was to evaluate the ability of HIS-IS to identify early decay in citrus fruits caused by Penicillium digitatum (P. digitatum) in combination with the proposed spectral and image processing algorithms. For this purpose, the specific steps were as follows:
(1) To develop a Vis-NIR HIS-IS system (400-1100 nm) to detect early decay in fresh citrus fruits.
(2) To determine the characteristic wavelength images for fast detection of early decayed fruits by principal component analysis.
(3) To develop a multispectral image processing algorithm for generating a two-dimensional (2-D) virtual classification image.
(4) To identify the decayed citrus fruits using the proposed multispectral algorithm.
The ultimate goal was to develop a fast and useful multispectral imaging technology and detection algorithm for in-line identification of early decay in citrus fruits.

MATERIALS AND METHODS

Experimental samples
Navel orange, a special variety produced in Jiangxi, China, with high economic value, was selected for this experiment. Healthy navel oranges were obtained from the local fruit market (Jiangxi Province, China). Then, two kinds of infected samples were obtained by inoculating with P. digitatum and P. italicum spores, respectively. The concentration of the suspension was about $10^{6}$ spores mL$^{-1}$, which is enough to generate decay (Palou et al., 2001). Each fruit was infected at 1~2 spots. Next, these fruits were stored for 2~4 days in a controlled environment (25°C and 99% relative humidity).
After this period, small decay regions of different sizes had formed on the inoculated samples. RGB (red, green and blue) images of decayed and normal navel orange samples are shown in Fig. 1. The peel color of the decay region was similar to the color of the normal skin around it, which makes the decay difficult for a human inspector to detect. In this study, a total of 540 navel orange samples were prepared and divided into three classes: 210 normal navel oranges (Class-I), 210 navel oranges infected with P. digitatum (Class-II) and 120 navel oranges infected with P. italicum (Class-III). Ninety normal samples of Class-I and 90 infected samples of Class-II were randomly selected as the training set to develop the detection algorithm, whereas the remaining 240 samples (120 normal samples of Class-I and 120 infected samples of Class-II) were used as the test set to evaluate the performance of the algorithm. In addition, Class-III, which did not take part in the development of the algorithm, was used to assess the generalization ability of the proposed algorithm.

Hyperspectral image collection and data processing

Hyperspectral imaging system based on image spectrophotometer
A visible and near infrared (Vis-NIR) hyperspectral imaging system based on image spectrophotometer (HIS-IS) shown in [...] America, INC., USA), conveying stage (EZHR17EN, Allmotion, Inc., USA), data acquisition software (Spectral Cube, Spectral Imaging Ltd., Finland), and computer (DELL, RAM 4.00G). The spectrograph had an internal slit (30 μm) that determined the field of view (FOV) in the horizontal direction, and it acquired the hyperspectral images of samples with a spectral interval of about 2.8 nm in the wavelength region of 400-1100 nm, producing a total of 250 single-wavelength images. The conveying stage was moved by a step motor (GPL-DZTSA-1000-X, Zolix Instrument Co, China) and the movement was synchronized with the image acquisition by the Spectral Cube Software.

Image acquisition and calibration
Before the hyperspectral images of samples were acquired, parameters including sample movement speed and exposure time were set to avoid distortion of the fruit in the hyperspectral image and to keep the spectral images sharp. In the current study, these parameters were set by pre-test to 1.2 mm s$^{-1}$ and 50 ms, respectively. The distance from the lens to the conveying stage was set to 450 mm. Two Vis-NIR linear lamps were each mounted at a 45° angle to the horizontal plane. During image acquisition, every navel orange sample was manually placed on the conveying stage, moved into the field of view (FOV) of the camera and then scanned line by line. The Spectral Cube Software controlled the acquisition process and collected the three-dimensional hyperspectral image data cube, with two-dimensional spatial information (x, y) and one-dimensional spectral information (λ). The acquired hyperspectral images were stored in raw format on a computer. Because there was obvious noise in the acquired hyperspectral images outside the wavelength range of 550~1000 nm, the final hyperspectral image of each sample was cropped to the wavelength range of 550~1000 nm, with a total of 161 single-wavelength images.
After the hyperspectral images were acquired, the original image had to be corrected using white and dark references. The white reference image was obtained by imaging a white diffuse reflectance board (Spectralon SRT-99-100, Labsphere Inc., North Sutton, NH, USA) with 99% reflection efficiency, and the dark reference image with 0% reflectance was collected by covering the lens and turning off the two Vis-NIR linear lamps.
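The cropping of the data cube to the 550~1000 nm range can be illustrated with a short sketch. This is only an illustration under stated assumptions: the cube is assumed to be a NumPy array laid out as (rows, columns, bands), the band centres are assumed to be evenly spaced over 400-1100 nm, and the function name `crop_bands` is hypothetical. The exact number of retained bands depends on the real band centres of the instrument, so the toy grid below need not reproduce the 161 images reported here.

```python
import numpy as np

def crop_bands(cube, wavelengths, lo=550.0, hi=1000.0):
    """Keep only the bands whose centre wavelength lies in [lo, hi] nm."""
    wavelengths = np.asarray(wavelengths, dtype=float)
    keep = (wavelengths >= lo) & (wavelengths <= hi)
    return cube[:, :, keep], wavelengths[keep]

# Assumed band centres: 250 bands evenly spaced over 400-1100 nm.
wavelengths = np.linspace(400.0, 1100.0, 250)
# Toy (rows, cols, bands) cube standing in for a real acquisition.
cube = np.random.default_rng(0).random((4, 5, 250))

cropped, kept = crop_bands(cube, wavelengths)
print(cropped.shape, kept.min(), kept.max())
```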
The corrected image (R) was computed based on the following equation (Mehl et al., 2002; Xing et al., 2005):

$$R = \frac{R_o - R_d}{R_w - R_d} \qquad (1)$$

where $R_o$, $R_w$ and $R_d$ are the raw hyperspectral image, white reference image and dark reference image, respectively. However, unlike a flat object, a navel orange is approximately spherical, so the reflectance intensity depends on the position of the pixel on the fruit. Thus, the preprocessing in equation (1) can only correct the dark current noise of the camera and the spatial variations of the linear lamps in a planar scene; it does not account for the spatial variations caused by the shape of the navel orange. Normalization methods including maximum, median and mean normalization (Max_N, Med_N and Mean_N) were therefore applied to reduce the variations in spectral reflectance intensity caused by the natural curvature of the navel orange surface. These were computed as follows:

$$A_i^{\mathrm{Max\_N}} = \frac{A_i}{\max(A_i)}, \qquad A_i^{\mathrm{Med\_N}} = \frac{A_i}{\operatorname{median}(A_i)}, \qquad A_i^{\mathrm{Mean\_N}} = \frac{A_i}{\operatorname{mean}(A_i)}$$

where $A_i$ represents the continuous reflectance spectrum (p variables) of any pixel in the hyperspectral image, and $\max(A_i)$, $\operatorname{median}(A_i)$ and $\operatorname{mean}(A_i)$ are the maximum, median and mean reflectance of the spectrum of the processed pixel, respectively. The final performance of the proposed correction methods was assessed by the coefficient of variation (CV).
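As a rough illustration of the white/dark reference correction and the three per-pixel normalizations described above, the following NumPy sketch may help. It is not the authors' implementation: the function names, the (rows, columns, bands) array layout and the small `eps` guard against division by zero are all assumptions made for this example, and each normalization is interpreted as dividing a pixel spectrum by its own mean, maximum or median.

```python
import numpy as np

def calibrate(raw, white, dark, eps=1e-12):
    """Relative reflectance: (raw - dark) / (white - dark), element-wise."""
    raw, white, dark = (np.asarray(a, dtype=float) for a in (raw, white, dark))
    return (raw - dark) / np.maximum(white - dark, eps)

def normalize_spectra(cube, method="mean", eps=1e-12):
    """Divide every pixel spectrum (last axis) by its own mean, max or median."""
    stat = {"mean": np.mean, "max": np.max, "median": np.median}[method]
    scale = stat(cube, axis=-1, keepdims=True)
    return cube / np.maximum(scale, eps)

# Toy example: two pixels with the same spectral shape but different brightness,
# as would happen at the centre and near the edge of a spherical fruit.
dark = np.full((1, 2, 3), 0.1)
white = np.full((1, 2, 3), 1.1)
truth = np.array([[[0.2, 0.4, 0.6], [0.1, 0.2, 0.3]]])
raw = dark + truth * (white - dark)

reflectance = calibrate(raw, white, dark)
corrected = normalize_spectra(reflectance, method="mean")
print(corrected[0, 0], corrected[0, 1])
```

After mean normalization the two pixels carry the same spectrum, which is the effect the correction is meant to achieve for pixels that differ only in illumination.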

Principal component analysis (PCA)
PCA, a very effective dimension reduction method, has been used to reduce the dimensionality of spectral data in many published studies (Li et al., 2011; Huang et al., 2015). In the present study, the hyperspectral data of the ROIs of two kinds of tissues (decayed and normal) were processed by PCA clustering analysis, and the loadings curve of the optimal principal component (PC) was extracted. The characteristic wavelength images can then be selected by observing the peaks and valleys of the loadings curve. On the basis of the selected images, it is feasible to establish a fast multispectral algorithm for visual detection of decayed navel oranges (Li et al., 2011).
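The idea of reading characteristic wavelengths from the peaks and valleys of a loadings curve can be sketched as follows. This is a minimal example on synthetic spectra, assuming PCA is computed by singular value decomposition of the mean-centred spectra; the helper names `pca_loadings` and `local_extrema` and the toy data are hypothetical and do not come from the study.

```python
import numpy as np

def pca_loadings(spectra, n_components=2):
    """PCA via SVD of mean-centred spectra; returns (scores, loadings)."""
    X = np.asarray(spectra, dtype=float)
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    return scores, Vt[:n_components]

def local_extrema(curve):
    """Indices where the slope of the curve changes sign (peaks and valleys)."""
    d = np.diff(np.asarray(curve, dtype=float))
    return np.where(d[:-1] * d[1:] < 0)[0] + 1

# Synthetic spectra: the two tissue classes differ by a bump centred at band 30.
rng = np.random.default_rng(0)
bands = np.arange(60)
bump = np.exp(-0.5 * ((bands - 30) / 4.0) ** 2)
normal = 0.5 * bump + rng.normal(0.0, 0.01, (20, 60))
decayed = -0.5 * bump + rng.normal(0.0, 0.01, (20, 60))

scores, loadings = pca_loadings(np.vstack([normal, decayed]))
key_band = int(np.argmax(np.abs(loadings[0])))
print(key_band, local_extrema([0, 1, 0, -1, 0]))
```

In this toy case the largest PC1 loading falls near the band where the classes differ, which mirrors how peaks and valleys of the loadings curve point to informative wavelengths.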

Image processing based on pseudo-color enhancement
Pseudo-color processing is a frequently used image enhancement method in remote sensing and medical applications (Khan et al., 2008; Wang et al., 2010). Considering the practical requirements of fast multispectral detection algorithms, the intensity slicing technique, one of the simplest and most effective pseudo-color processing methods, was used to enhance the contrast between decayed and normal regions in the multispectral images. A color was assigned to each gray level by the following formula:

$$g(x, y) = c_k \quad \text{if } f(x, y) \in V_k$$

where $c_k$ is the color assigned to the kth intensity interval $V_k$, $f(x, y)$ is the original multispectral image with different gray levels, and $g(x, y)$ is the pseudo-color image after processing. Through pseudo-color image processing, the original image $f(x, y)$ was converted into the pseudo-color image $g(x, y)$, and the different types of tissue on the navel orange were thereby visualized.
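Intensity slicing as described above can be sketched in a few lines. The interval edges and the colors below are arbitrary illustrative values, not the ones used in the study, and the function name `intensity_slice` is hypothetical.

```python
import numpy as np

def intensity_slice(gray, edges, colors):
    """Map each gray level f(x, y) to the color c_k of its interval V_k.

    edges  : increasing interval boundaries (m values define m + 1 intervals)
    colors : (m + 1, 3) array-like of RGB colors, one per interval
    """
    gray = np.asarray(gray, dtype=float)
    colors = np.asarray(colors, dtype=np.uint8)
    if len(colors) != len(edges) + 1:
        raise ValueError("need exactly one color per intensity interval")
    k = np.digitize(gray, edges)  # interval index of every pixel
    return colors[k]              # (rows, cols, 3) pseudo-color image

# Illustrative values only: three intervals mapped to blue, green and red.
edges = [0.4, 0.7]
colors = [[0, 0, 255], [0, 255, 0], [255, 0, 0]]
gray = np.array([[0.1, 0.5], [0.8, 0.7]])

pseudo = intensity_slice(gray, edges, colors)
print(pseudo.shape)
```

A global threshold on one channel of such an image is one simple way to turn the enhanced contrast into a binary mask, although the thresholding details of the proposed algorithm are not given in this section.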

Spectral data analysis
The typical reflectance spectra of two regions of interest (ROIs) in the wavelength region of 550~1000 nm are shown in Fig. 3. To compare the spectral features of the two kinds of tissue, two ROIs of the same size were extracted at the same location on a normal navel orange and a decayed navel orange, respectively. In this figure, the two spectra show similar characteristics with different reflectance values. Compared with the spectrum of the decayed region, the spectrum of the normal region shows slightly higher reflectance intensity over the whole Vis-NIR wavelength range. Therefore, a single-band image might be useful for detecting infected navel oranges based only on the spectral intensity of the different tissues. Thus, a single-band image (975 nm) located at a characteristic valley of the spectral curve was selected as the objective image for segmentation of the decayed region on the navel orange. As shown in Fig. 3, the plot on the left side of the objective image shows the spatial intensity variations along the solid horizontal line on the navel orange. It can be seen from this plot that normal skin at the edges had lower reflectance intensity than the decayed region because of the uneven distribution of light intensity on the spherical navel orange. Therefore, the spectra shown in Fig. 3 do not account for the spatial variations in intensity from the center toward the edges. For this reason, a single-band image was not adequate for identifying decayed regions on navel oranges, and it was important to reduce the spatial variations caused by uneven light scattering before further analyzing the hyperspectral images.

Correction of spectra
Hyperspectral image data integrate the spatial and spectral information of the detected navel orange samples. Therefore, the uneven distribution of light intensity on the navel oranges is reflected in both the image and the spectrum, and correction can be performed by analyzing either. In this work, three different normalization methods, Max_N, Med_N and Mean_N, were tried for correcting the spectral data. First, a hyperspectral image of a normal navel orange was randomly selected, and the spectra of eight ROIs (ROI-1~ROI-8) of 3 by 3 pixels each were extracted along the radius of the fruit, as shown in Fig. 4(a) and 4(b). Fig. 4(a) shows the raw hyperspectral image and the eight selected ROIs. Fig. 4(b) shows the raw spectral curves of the eight ROIs before correction. Next, the spectra were processed by the normalization methods. The coefficients of variation (CVs) of the corrected spectra were 0.0221, 0.0298 and 0.0209, and the processing times were 4.14, 5.24 and 4.02 s, for Max_N, Med_N and Mean_N, respectively. Mean_N was therefore the most effective method for decreasing spectral variability. The corrected spectral plot is shown in Fig. 4(c), with a much smaller CV (0.0209) than that of the raw spectral plot (0.2576) shown in Fig. 4(b).
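The effect of normalization on the coefficient of variation across ROIs can be illustrated with a toy calculation. The exact CV definition used in the study is not spelled out in this section, so the sketch below assumes one plausible choice: the standard deviation across ROI spectra divided by their mean at each band, averaged over bands. The eight synthetic ROI spectra share one spectral shape and differ only by an illumination gain, which is an idealization of the curvature effect.

```python
import numpy as np

def coefficient_of_variation(spectra):
    """Assumed definition: mean over bands of (std across ROIs / mean across ROIs)."""
    s = np.asarray(spectra, dtype=float)
    return float(np.mean(s.std(axis=0) / s.mean(axis=0)))

# Eight synthetic ROI spectra: same shape, brightness falling from centre to edge.
shape = np.linspace(0.4, 0.8, 50)
gains = np.linspace(1.0, 0.5, 8)
raw = gains[:, None] * shape[None, :]

# Mean normalization: divide each ROI spectrum by its own mean reflectance.
mean_n = raw / raw.mean(axis=1, keepdims=True)

cv_raw = coefficient_of_variation(raw)
cv_corrected = coefficient_of_variation(mean_n)
print(round(cv_raw, 4), cv_corrected)
```

In this idealized case the normalization removes the variation entirely; with real spectra a small residual CV remains, as in the values reported above.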

Principal component clustering analysis and characteristic wavelength image selection
Using the Mean_N correction method, all ROI spectra (one spectrum per sample), comprising 90 spectra of normal tissue and 90 spectra of decayed tissue from the training set, were corrected and then processed by PCA clustering analysis to assess how well the corrected spectra differentiate the two kinds of skin tissue. The clustering plot of the first two principal components (PC1 and PC2) is shown in Fig. 5.
The plot clearly shows that PC1 could distinguish the two kinds of tissue using the corrected spectra, because they were separable along PC1. This result also shows that the spectral features of each kind of tissue were effectively preserved in the corrected hyperspectral data. However, in real applications, it is impractical to correct the hyperspectral image at all wavelengths because processing such high-dimensional data is very time-consuming. Therefore, selection of characteristic wavelength images was necessary to develop a multispectral system that meets the needs of practical application. Because each principal component (PC) image is a linear sum of all single-wavelength images multiplied by their corresponding wavelength loadings, the characteristic wavelength images can be obtained by analyzing the loadings. The loadings for PC1 are shown in Fig. 6. In the loadings curve, every peak and valley represents a key wavelength image with a greater contribution to PC1. As shown in Fig. 6, six characteristic wavelength images centered at 577, 629, 702, 751, 808 and 923 nm can be selected. However, these characteristic peaks and valleys span relatively wide wavelength bands, so it was very difficult to pinpoint the exact characteristic wavelength images for classification. Unlike in other studies (Kim et al., 2002; Xing and De Baerdemaeker, 2005; Huang et al., 2015), in this work the spectral images within a bandwidth of 10 nm around each characteristic peak or valley were averaged to generate a new image, which served as the final characteristic wavelength image for that peak or valley. Another reason for this processing was that narrowband filters, such as filters with a bandwidth of 10 nm, are commonly used to develop in-line multispectral detection systems for determination of fruit quality in industrial applications (Qin et al., 2012; Huang et al., 2015).
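The averaging of single-wavelength images within a 10 nm window around each characteristic wavelength might look like the sketch below. The six centre wavelengths are taken from the text; everything else is an assumption for illustration, including the function name `characteristic_images`, the (rows, columns, bands) layout, the interpretation of the 10 nm bandwidth as centre ± 5 nm, and the evenly spaced band centres.

```python
import numpy as np

CENTRES_NM = (577, 629, 702, 751, 808, 923)  # centres reported in the text

def characteristic_images(cube, wavelengths, centres=CENTRES_NM, bandwidth=10.0):
    """Average the bands within +/- bandwidth/2 of each centre wavelength."""
    wavelengths = np.asarray(wavelengths, dtype=float)
    images = []
    for centre in centres:
        mask = np.abs(wavelengths - centre) <= bandwidth / 2.0
        if not mask.any():
            raise ValueError(f"no band within the window around {centre} nm")
        images.append(cube[:, :, mask].mean(axis=2))
    return np.stack(images, axis=-1)  # (rows, cols, number of centres)

# Assumed band centres: 161 bands evenly spaced over 550-1000 nm.
wavelengths = np.linspace(550.0, 1000.0, 161)
# Toy cube in which every pixel value equals the wavelength of its band,
# so each averaged image should sit close to its centre wavelength.
cube = np.broadcast_to(wavelengths, (3, 4, 161))

multispectral = characteristic_images(cube, wavelengths)
print(multispectral.shape, multispectral[0, 0].round(1))
```

The resulting six-band stack plays the role of the multispectral image, to which the same per-pixel mean normalization can then be applied.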

Correction of characteristic wavelength images
The above analysis shows that normal and decayed tissues can be discriminated by analyzing the corrected spectra of the two types of ROIs. However, the position of the fruit on an in-line sorting chain is random, so the whole surface of the detected navel orange must be analyzed rather than only ROIs. Thus, the same Mean_N correction was used to correct the characteristic wavelength images. An example of the correction of a characteristic wavelength image is shown in Fig. 7. Fig. 7(a) and 7(b) show the raw image without correction and the image corrected by Mean_N, respectively. Before the correction, the image of the navel orange had lower brightness in the edge region than in the central region because of the spherical geometry of the fruit. After the correction, the brightness was relatively even over the whole fruit surface.

Detection results
The proposed multispectral visualization method for fast identification of early decayed navel oranges was used to assess the test set of 120 normal navel oranges and 120 navel oranges infected with P. digitatum. Another 120 independent samples infected with P. italicum were used as a new test set to assess the generalization ability of the proposed algorithm. Table 1 shows the detection results: none of the 120 fruits decayed by P. digitatum was misclassified as a normal navel orange. In practical applications, this result is very important for the citrus industry, because a few decayed fruits, or even one, can spread the fungi to normal fruits in the same batch and cause large economic losses. Of the normal samples, only three were misidentified. This low misclassification rate is not a severe problem, since the citrus industry tolerates rejecting normal fruit more readily than accepting fruit with any kind of decay. The total identification rate was 98.8% for the 240 samples in the test set. For fruit infected with P. italicum, although such samples were not used in the development of the algorithm, the 91.7% identification accuracy further indicates that the proposed algorithm generalizes well.
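The reported rates follow directly from the counts given above, as the short check below shows. The count of 110 correctly identified P. italicum samples is not stated in the text; it is an inference, being the only whole number out of 120 that rounds to 91.7%.

```python
def rate(correct, total):
    """Identification rate in percent."""
    return 100.0 * correct / total

decayed_rate = rate(120, 120)        # no decayed (P. digitatum) fruit was missed
normal_rate = rate(120 - 3, 120)     # three normal fruits were misidentified
overall_rate = rate(120 + 117, 240)  # whole test set

# Inferred count (not stated in the text): 110 of 120 P. italicum samples.
italicum_rate = rate(110, 120)

print(decayed_rate, normal_rate, overall_rate, round(italicum_rate, 1))
```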

CONCLUSIONS
In real-time production and processing, decayed fruit must be rejected from the fruit quality grading line in order to prevent cross-infection during storage and transport. In this research, Vis-NIR HIS-IS in the wavelength region of 550~1000 nm was evaluated for detecting citrus decay, especially early decay caused by P. digitatum in navel oranges. Three methods, Max_N, Med_N and Mean_N, were proposed to reduce the spectral variations caused by the curved surface of navel oranges, and Mean_N was confirmed as the most effective correction method. Six characteristic wavelength images centered at 577, 629, 702, 751, 808 and 923 nm were selected by analyzing the loadings of PC1.
A multispectral image was then established by combining the six corrected characteristic wavelength images, which had an even brightness distribution over the fruit surface.
Pseudo-color image processing was proposed to convert the multispectral image into an RGB image with clear contrast between decayed and normal tissues. On the basis of the pseudo-color image processing method and a global threshold method, an image segmentation algorithm for detecting decayed navel oranges was developed. The overall identification rate for the test set was 98.8%, with no false negatives. The results showed that HIS-IS combined with the proposed algorithm is a powerful technology for fast detection of early decay caused by P. digitatum in navel oranges. Moreover, the identification of navel oranges infected with P. italicum indicated that the proposed algorithm has good generalization ability.
Fig 1. Samples including the decayed navel oranges caused by Penicillium digitatum (left) and Penicillium italicum (middle), and a normal navel orange (right).

Fig 2. Hyperspectral imaging system based on image spectrophotometer.

Fig 3. Spectral curves of two regions of interest obtained from normal and decayed tissues.

Fig 4. Spectra correction of regions of interest as an example.

Fig 6. Loadings curve for the first principal component.

Fig 7. Characteristic wavelength image lighting correction based on mean normalization.

Fig 8. Flow chart of the multispectral segmentation algorithm of the decayed regions