Machine learning-driven Raman spectroscopy for rapidly detecting type, adulteration, and oxidation of edible oils

by Hefei Zhao, Yinglun Zhan, Zheng Xu, Joshua John Nduwamungu, Yuzhen Zhou, and Changmou Xu

  • Real spectra can be rapidly and conveniently collected using vibrational microscopy techniques, such as Raman spectroscopy.
  • However, compared with chemometric information derived from a spectrophotometer, gas chromatography (GC), or nuclear magnetic resonance (NMR), quantitative analysis of Raman spectra remains difficult and time-consuming.
  • To make the analysis of Raman spectra faster and more reliable, a new approach that combines Raman spectroscopy with machine learning classification and regression has been developed for rapid, nondestructive detection of the type, adulteration, and oxidation of edible oils and fatty foods.

The molecular vibrations of functional groups in triglycerides can be captured easily with Raman spectroscopy by illuminating oil samples with a laser for just a few seconds. Because oils differ in the degree of unsaturation of their fatty acids, the technique can classify and authenticate certain oils [1]. Lipid oxidation decomposes triglycerides by breaking carbon-carbon double bonds and generates new substances, such as free radicals, lipid hydroperoxides (LOOH), and aldehydes, which can also be detected through changes in functional groups in the Raman spectra [2]. However, the bottleneck is usually the complicated analysis of the spectral data. Machine learning (ML), a subset of artificial intelligence (AI), has shown great advantages in data analysis and has brought breakthroughs in processing spectra and diverse microscopy images [3,4], but its applications in food science are still limited. This article discusses the promise of substituting machine learning for manual analysis of Raman spectra in assessing the quality of edible oils.

Raman spectroscopy is one of the molecular vibrational spectroscopies that can measure the chemical “fingerprints” of analytes non-destructively and rapidly. It has been used to characterize the chemical composition of bulk lipids, distinguish different edible oils and fats, and detect oil adulteration and lipid oxidation. Raman spectroscopy can also be implemented to identify adulterations in foods that include all three major nutrients, such as coconut water, honey, butter, meat, and milk. However, its overall performance and practical applications are still limited by the complicated spectral analysis. Therefore, to enhance workflow efficiency and performance through augmented detection of features of Raman spectra, we took the study of edible oils as an example and integrated machine learning into Raman spectroscopy to obtain rapid and accurate detection of the type, adulteration, and oxidation of edible oils.

Experimental approaches

The study employed an XploRA ONE™ 785 nm Raman Spectrometer System (HORIBA, Ltd., Kyoto, Japan), equipped with a 785 nm near-infrared diode laser, to collect the Raman spectra of three groups of samples: 1) 15 edible oil products (7 types) purchased from supermarkets (classification study); 2) mixed edible oils at various percentages (0, 5%, 10%, 20%, 30%, …100%) (adulteration study); and 3) oxidized edible oils at different levels (stored at 50°C in the dark for 0, 1, 2, 3, 4, 5, 6, 8, and 10 weeks) (oxidation study).

Machine learning models, including lasso regression; elastic net regression; logistic regression; logistic regression with different regularizations (L1 penalty, L2 penalty, and elastic net penalty); random forest (RF); principal component analysis (PCA)+RF; PCA+boosting; 1-dimensional convolutional neural network (1D-CNN); and 2D-CNN, were trained on the Raman spectra and used to analyze them. The collection and analysis of Raman spectra are illustrated in Figure 1. For comparison, we also employed widely used reference methods: gas chromatography (GC) for the authentication of oils, and peroxide value (PV) and thiobarbituric acid reactive substances (TBARS) for monitoring lipid oxidation.
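The regularized variants of logistic regression listed above differ only in the penalty added to the loss function. The following is a minimal sketch in Python of those penalties, assuming one common parameterization in which a mixing weight interpolates between the L2 and L1 terms. The function names and the parameters `alpha` and `l1_ratio` are illustrative assumptions and do not come from the study.

```python
def l1_penalty(weights):
    """Sum of absolute coefficient values (lasso-style penalty)."""
    return sum(abs(w) for w in weights)


def l2_penalty(weights):
    """Half the sum of squared coefficient values (ridge-style penalty)."""
    return 0.5 * sum(w * w for w in weights)


def elastic_net_penalty(weights, alpha=1.0, l1_ratio=0.5):
    """Weighted mix of the L1 and L2 penalties.

    alpha sets the overall strength; l1_ratio=1 gives pure L1 and
    l1_ratio=0 gives pure L2. Both are hypothetical tuning parameters,
    not values reported in the study.
    """
    return alpha * (l1_ratio * l1_penalty(weights)
                    + (1.0 - l1_ratio) * l2_penalty(weights))


# Toy coefficients, one per spectral feature (made-up numbers).
coefs = [0.0, -2.0, 0.5, 0.0]
print("L1 penalty:", l1_penalty(coefs))
print("L2 penalty:", l2_penalty(coefs))
print("Elastic net penalty:", elastic_net_penalty(coefs))
```

The L1 term tends to push many coefficients to exactly zero, which can be convenient when a spectrum has far more channels than there are samples; the elastic net blends that behavior with the smoother shrinkage of the L2 term.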

Fig. 1. Schematic diagram of collection and analysis of Raman spectra

Classification of oil type

Our machine learning-driven Raman spectroscopy method was able to rapidly and accurately identify the type of 15 edible oils (Table 1). Among the algorithms tested, random forest (RF) offered the best combination of test accuracy and training speed in classifying pure edible oils (98.1% in 0.8 s). Logistic regression with L2 penalty reached a slightly higher test accuracy (99.0%) and logistic regression with L1 penalty matched RF (98.1%), but both took far longer to train (about 23 s and 121 s, respectively). The test accuracy of RF for individual oils is shown in the confusion matrix (Fig. 2).

Table 1. Training time and test accuracy for classification of various edible oils
| Method | Training time | Test accuracy |
| --- | --- | --- |
| Logistic regression | 0.023 s | 0.800 |
| Logistic regression with L1 penalty | 121.471 s | 0.981 |
| Logistic regression with L2 penalty | 22.864 s | 0.990 |
| Logistic regression with elastic net penalty | 123.678 s | 0.971 |
| PCA+RF | 0.446 s | 0.914 |
| Random forest | 0.815 s | 0.981 |
| PCA+boosting | 1.279 s | 0.848 |
| Boosting | 1.809 s | 0.857 |
| 1D-CNN | 455.462 s | 0.714 |
| 2D-CNN | 53.37 h | 0.895 |
Fig. 2. Confusion matrices of validation oil data by the random forest method
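To make the link between a confusion matrix and a test accuracy figure concrete, here is a minimal sketch in Python. The labels are made up for illustration and are not data from the study, and the helper names `confusion_matrix` and `accuracy` are hypothetical.

```python
from collections import Counter


def confusion_matrix(true_labels, predicted_labels, classes):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(true_labels, predicted_labels))
    return [[counts[(t, p)] for p in classes] for t in classes]


def accuracy(matrix):
    """Fraction of samples on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Made-up labels for illustration only; not data from the study.
classes = ["olive", "soybean", "canola"]
y_true = ["olive", "olive", "soybean", "soybean", "canola", "canola"]
y_pred = ["olive", "olive", "soybean", "canola", "canola", "canola"]

cm = confusion_matrix(y_true, y_pred, classes)
for name, row in zip(classes, cm):
    print(name, row)
print("accuracy:", accuracy(cm))
```

Off-diagonal entries show which oils are confused with each other, which a single accuracy number cannot reveal.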

We compared this method with the traditional GC method combined with principal component analysis (PCA). The GC-PCA method did separate the different types of oils into distinct clusters (Fig. 3). However, each GC run took about 30 minutes. In addition, the method requires a pretreatment that converts triglycerides to fatty acid methyl esters (FAME), which takes additional time and uses toxic and corrosive sodium methylate. In contrast, our machine learning-driven Raman spectroscopy method is a green technology and can deliver an accurate analysis within three minutes.

Fig. 3. PCA for classification of various edible oils by GC method

Detection of oil adulteration

We extended the machine learning-driven Raman spectroscopy method to the rapid detection of oil adulteration. We chose two simplified binary systems: olive oil adulterated with soybean oil, and avocado oil adulterated with canola oil, each at various levels. The method was able to detect the level and type of adulterant rapidly and accurately. Random forest again combined the highest test accuracy with the shortest training time, both for olive oil adulterated with soybean oil (94.8% in 0.65 seconds, an accuracy matched only by boosting) (Table 2) and for avocado oil adulterated with canola oil (87.5% in 0.79 seconds, an accuracy matched only by logistic regression with L1 penalty) (Table 3).

Table 2. Training time and test accuracy for classification of olive oil adulterated by soybean oil
| Method | Training time | Test accuracy |
| --- | --- | --- |
| Logistic regression with L1 penalty | 42.458 s | 0.854 |
| Logistic regression with L2 penalty | 3.299 s | 0.875 |
| Logistic regression with elastic net penalty | 281.032 s | 0.885 |
| Random forest | 0.650 s | 0.948 |
| Boosting | 2.478 s | 0.948 |

Table 3. Training time and test accuracy for classification of avocado oil adulterated by canola oil
| Method | Training time | Test accuracy |
| --- | --- | --- |
| Logistic regression with L1 penalty | 43.914 s | 0.875 |
| Logistic regression with L2 penalty | 4.039 s | 0.856 |
| Logistic regression with elastic net penalty | 282.808 s | 0.856 |
| Random forest | 0.792 s | 0.875 |
| Boosting | 1.536 s | 0.856 |
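The comparison in Tables 2 and 3 can be reproduced with a few lines of Python. This is only a sketch of one possible selection rule (highest test accuracy, with ties broken by shorter training time); the rule and the helper name `best_method` are assumptions made for illustration, while the numbers are copied from the two tables.

```python
# (training time in seconds, test accuracy), copied from Tables 2 and 3.
olive_soybean = {
    "Logistic regression with L1 penalty": (42.458, 0.854),
    "Logistic regression with L2 penalty": (3.299, 0.875),
    "Logistic regression with elastic net penalty": (281.032, 0.885),
    "Random forest": (0.650, 0.948),
    "Boosting": (2.478, 0.948),
}
avocado_canola = {
    "Logistic regression with L1 penalty": (43.914, 0.875),
    "Logistic regression with L2 penalty": (4.039, 0.856),
    "Logistic regression with elastic net penalty": (282.808, 0.856),
    "Random forest": (0.792, 0.875),
    "Boosting": (1.536, 0.856),
}


def best_method(results):
    """Pick the highest test accuracy; break ties by shorter training time."""
    return min(results, key=lambda name: (-results[name][1], results[name][0]))


print("Olive/soybean:", best_method(olive_soybean))
print("Avocado/canola:", best_method(avocado_canola))
```

Under this rule, random forest is selected for both binary systems because it ties for the top accuracy and trains in under one second.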

Regression of oil oxidation

The machine learning-driven Raman spectroscopy method was further used to predict lipid oxidation in soybean oil and grapeseed oil model systems (Table 4). The classical oxidation assays showed that the PV and TBARS of soybean oil increased from an initial 0.249 milliequivalent (MEQ)-peroxide/kg and 1.954 malondialdehyde (MDA) ppm to 17.747 MEQ-peroxide/kg and 87.892 MDA ppm, respectively, in the fifth week; the PV and TBARS of grapeseed oil increased from an initial 1.088 MEQ-peroxide/kg and 11.811 MDA ppm to 20.311 MEQ-peroxide/kg and 99.953 MDA ppm, respectively, in the fifth week. With the machine learning-driven Raman spectroscopy method, the predicted R² for soybean oil reached 0.7583 for PV and 0.7519 for TBARS, both by lasso regression. For grapeseed oil, the predicted R² reached 0.6126 for PV by elastic net regression and 0.7568 for TBARS by lasso regression.

Table 4. Machine learning (ML) regression of PV and TBARS of oxidized oils from their Raman spectra
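| Oil | Oxidative factor | ML regression method | Predicted R² | Training R² |
| --- | --- | --- | --- | --- |
| Soybean oil | PV (0-5w) | Lasso | 0.7583 | 0.9999 |
| Soybean oil | TBARS (0-5w) | Lasso | 0.7519 | 0.9849 |
| Grapeseed oil | PV (0-5w) | Elastic net | 0.6126 | 0.9975 |
| Grapeseed oil | TBARS (0-5w) | Lasso | 0.7568 | 0.9956 |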
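For readers less familiar with the R² statistic reported in Table 4, the sketch below shows how the coefficient of determination is commonly computed and then tabulates the difference between the training and predicted values. The `observed` and `predicted` lists are made-up numbers for illustration only; the R² pairs are copied from Table 4 and the surrounding text. The helper name `r_squared` is hypothetical.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Made-up values for illustration only; not data from the study.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]
print("Toy R2:", r_squared(observed, predicted))

# (predicted R2, training R2) as reported in Table 4.
table4 = {
    ("Soybean oil", "PV"): (0.7583, 0.9999),
    ("Soybean oil", "TBARS"): (0.7519, 0.9849),
    ("Grapeseed oil", "PV"): (0.6126, 0.9975),
    ("Grapeseed oil", "TBARS"): (0.7568, 0.9956),
}
for (oil, factor), (pred_r2, train_r2) in table4.items():
    gap = round(train_r2 - pred_r2, 4)
    print(oil, factor, "training minus predicted R2:", gap)
```

A perfect prediction gives an R² of 1, and the value falls as the residuals grow relative to the spread of the observed data.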

This study demonstrated that, by combining machine learning with Raman spectroscopy, lipid quality can be assessed rapidly and with relatively high accuracy. The method could potentially be extended to the classification of other food materials and molecules. This should shed new light on the application of AI technologies in the determination of food quality.

About the Authors

Changmou Xu is a research assistant professor at the Food Processing Center, Department of Food Science and Technology, University of Nebraska-Lincoln (UNL), Lincoln, Nebraska, USA. He can be contacted at cxu13@unl.edu.

Hefei Zhao is a Ph.D. candidate at the Food Processing Center, Department of Food Science and Technology, UNL, Lincoln, Nebraska, USA. He can be contacted at hzhao@huskers.unl.edu.

Yinglun Zhan is a Ph.D. candidate at the Department of Statistics, UNL, Lincoln, Nebraska, USA. She can be contacted at yzhan@huskers.unl.edu.

Zheng Xu is an assistant professor in the Department of Mathematics and Statistics, Wright State University, Ohio, USA. He can be contacted at zheng.xu@wright.edu.

Joshua John Nduwamungu is an undergraduate student in the Department of Food Science and Technology, UNL, Lincoln, Nebraska, USA. He can be contacted at joshijo77.jj@gmail.com.

Yuzhen Zhou is an assistant professor in the Department of Statistics, UNL, Lincoln, Nebraska, USA. He can be contacted at yuzhenzhou@unl.edu.

