Machine Learning System For Continuous Authentication With New Datasets

Machine Learning 17/05/2024

3 main points
✔️ This study aims to provide a better understanding of continuous authentication using behavioral biometrics.
✔️ The most robust model is SVC, which has been shown to have an average accuracy of approximately 90%.
✔️ Results show that touch dynamics can effectively distinguish users.

Your device may know you better than you know yourself -- continuous authentication on novel dataset using machine learning
written by Pedro Gomes do Nascimento, Pidge Witiak, Tucker MacCallum, Zachary Winterfeldt, Rushit Dave
(Submitted on 6 Mar 2024)
Comments: Published on arxiv.
Subjects: Artificial Intelligence (cs.AI)

The images used in this article are from the paper, the introductory slides, or were created based on them.

Summary

The purpose of this study is to gain a better understanding of continuous authentication using behavioral biometrics.

Behavioral biometrics is a technology that verifies a person's identity from their behavioral patterns and characteristics when they access a device or system. Continuous authentication does not stop at a single login: it keeps authenticating the user for as long as they operate the system, which enhances security and prevents unauthorized access.

For example, fingerprint or facial recognition may be used to unlock a smartphone. This is a type of biometric authentication that relies on physical features. Continuous authentication keeps monitoring the user's behavior patterns after the phone has been unlocked, checking whether signals such as typing speed, swipes, and screen touches match the user's characteristics.

If someone else takes the smartphone or logs in, their different patterns of behavior can be detected and unauthorized access prevented. Continuous authentication is therefore an important technology that keeps users safer than a single one-time authentication step.

The research team presents a new dataset containing 15 minutes of gesture data from each of 15 users playing Minecraft on a Samsung tablet. Using this dataset, binary machine learning classifiers, namely Random Forest (RF), K-Nearest Neighbors (KNN), and Support Vector Classifier (SVC), were used to determine the authenticity of specific user actions.
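
The binary framing can be sketched as follows: for one target user, that user's samples are labeled 1 (genuine) and everyone else's 0 (imposter), and a classifier votes on each new sample. This is a minimal, dependency-free illustration that uses a plain k-nearest-neighbor vote as a stand-in for the library classifiers used in the paper. The function names, user IDs, and feature values are all hypothetical.

```python
import math
from collections import Counter


def make_binary_labels(user_ids, genuine_user):
    """Label samples 1 for the genuine user and 0 for everyone else."""
    return [1 if uid == genuine_user else 0 for uid in user_ids]


def knn_predict(train_X, train_y, query, k=3):
    """Majority vote among the k training samples closest to `query`."""
    by_distance = sorted(
        zip(train_X, train_y),
        key=lambda pair: math.dist(pair[0], query),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Toy two-dimensional feature vectors (made-up values, not from the dataset).
features = [(0.10, 0.20), (0.15, 0.25), (0.12, 0.22),
            (0.90, 0.80), (0.85, 0.95), (0.80, 0.90)]
user_ids = ["u01", "u01", "u01", "u02", "u03", "u02"]

labels = make_binary_labels(user_ids, genuine_user="u01")
prediction = knn_predict(features, labels, query=(0.11, 0.21), k=3)
```

In this framing one such classifier would be trained per user, so a 15-user dataset would give 15 separate genuine-versus-imposter problems.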

Proposed Method

Ethics training and approval played an important role in this study. Through the Collaborative Institutional Training Initiative (CITI) program, the research team covered topics such as ethical principles, informed consent, and privacy and confidentiality. In addition, the team obtained Institutional Review Board (IRB) approval to collect data on the Minnesota State University, Mankato campus. The paper illustrates the end-to-end process of the experiment in a chart that shows how the study was conducted, how the data was processed, and how the models were trained and tested.

The data collection process used the Android Debug Bridge (ADB) tool to access the device's touchscreen metrics, with Python scripts collecting the data. Each of the 15 volunteers played Minecraft for 15 minutes on a Samsung A8 tablet while raw touch dynamics data was recorded. This ensured that the data reflected a realistic, real-world usage environment.
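
The article does not say which ADB command the scripts relied on. One plausible route, assumed here purely for illustration, is to read the text that `adb shell getevent -lt` prints for the touchscreen and turn it into timestamped (x, y) points. The sketch below only parses such lines; the device path `/dev/input/event2` and the sample values are hypothetical.

```python
import re

# Matches lines such as:
# [   100.000000] /dev/input/event2: EV_ABS  ABS_MT_POSITION_X  000001a4
LINE_RE = re.compile(
    r"\[\s*(?P<time>\d+\.\d+)\]\s+"
    r"(?:(?P<device>\S+):\s+)?"
    r"(?P<type>EV_\w+)\s+(?P<code>\w+)\s+(?P<value>\w+)"
)


def parse_getevent_line(line):
    """Return a dict for one event line, or None if the line does not match."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    raw = match.group("value")
    try:
        value = int(raw, 16)  # numeric values are printed in hexadecimal
    except ValueError:
        value = raw  # symbolic values such as DOWN or UP are kept as text
    return {
        "time": float(match.group("time")),
        "type": match.group("type"),
        "code": match.group("code"),
        "value": value,
    }


def events_to_points(lines):
    """Emit (time, x, y) each time a SYN_REPORT closes a touch report."""
    points, x, y = [], None, None
    for line in lines:
        event = parse_getevent_line(line)
        if event is None:
            continue
        if event["code"] == "ABS_MT_POSITION_X":
            x = event["value"]
        elif event["code"] == "ABS_MT_POSITION_Y":
            y = event["value"]
        elif event["code"] == "SYN_REPORT" and x is not None and y is not None:
            points.append((event["time"], x, y))
    return points


sample = [
    "[   100.000000] /dev/input/event2: EV_ABS       ABS_MT_POSITION_X    000001a4",
    "[   100.000000] /dev/input/event2: EV_ABS       ABS_MT_POSITION_Y    00000100",
    "[   100.000000] /dev/input/event2: EV_SYN       SYN_REPORT           00000000",
    "[   100.010000] /dev/input/event2: EV_ABS       ABS_MT_POSITION_X    000001a6",
    "[   100.010000] /dev/input/event2: EV_SYN       SYN_REPORT           00000000",
]
points = events_to_points(sample)
```

A coordinate that does not change between reports is not re-sent, so the sketch carries the last known x or y forward.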

During data cleaning and processing, rigorous filtering was applied to exclude rows with default values, remove rows containing missing values, and standardize numerical columns. These steps made the analysis more reliable and produced a clean dataset for training the subsequent machine learning models.
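
The three cleaning steps could look roughly like the sketch below. It assumes that a placeholder of -1 marks a default value and that standardization means z-scoring. Neither detail is stated in the article, and the column names are hypothetical.

```python
from statistics import mean, pstdev


def clean_rows(rows, default_value=-1):
    """Drop rows holding a missing (None) or placeholder default value."""
    return [
        row for row in rows
        if all(v is not None and v != default_value for v in row.values())
    ]


def standardize(rows, columns):
    """Rescale each listed column to zero mean and unit variance (z-score)."""
    scaled = [dict(row) for row in rows]
    for col in columns:
        values = [row[col] for row in rows]
        mu, sigma = mean(values), pstdev(values)
        for row in scaled:
            # A constant column has sigma == 0; map it to 0.0 instead of dividing.
            row[col] = (row[col] - mu) / sigma if sigma else 0.0
    return scaled


raw = [
    {"x": 100, "y": 200, "pressure": 10},
    {"x": -1, "y": 210, "pressure": 12},     # placeholder default, dropped
    {"x": 120, "y": None, "pressure": 11},   # missing value, dropped
    {"x": 140, "y": 240, "pressure": 14},
]
cleaned = clean_rows(raw)
scaled = standardize(cleaned, ["x", "y", "pressure"])
```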

During feature extraction, key features such as instantaneous touch velocity, acceleration, jerk, and path angle were derived from the cleaned and preprocessed dataset. These features gave a more detailed picture of each user's touch patterns and were used to train the continuous authentication system.
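
The exact formulas are not given in the article. A common way to obtain such features, assumed here, is to take finite differences over consecutive touch points: velocity from displacement over time, acceleration from the change in velocity, jerk from the change in acceleration, and path angle from the direction of the displacement. The function name and sample points are hypothetical.

```python
import math


def touch_features(points):
    """Per-segment features from a list of (t, x, y) touch points.

    Acceleration and jerk are None until enough earlier segments exist
    to take the difference.
    """
    features = []
    prev_v = prev_a = None
    for (t0, x0, y0), (t1, x1, y1) in zip(points, points[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip duplicate or out-of-order timestamps
        dx, dy = x1 - x0, y1 - y0
        v = math.hypot(dx, dy) / dt
        a = (v - prev_v) / dt if prev_v is not None else None
        j = (a - prev_a) / dt if a is not None and prev_a is not None else None
        features.append({
            "velocity": v,
            "acceleration": a,
            "jerk": j,
            "angle": math.degrees(math.atan2(dy, dx)),
        })
        prev_v, prev_a = v, a
    return features


# Made-up points: time in seconds, coordinates in pixels.
points = [(0.0, 0, 0), (1.0, 3, 4), (2.0, 3, 14), (3.0, 3, 34)]
feats = touch_features(points)
```

With real touchscreen data the time steps are a few milliseconds, so higher-order differences such as jerk amplify noise and would usually need smoothing.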

Experiment

The study evaluated the models primarily on true positive, false positive, true negative, and false negative results. True positive (TP) refers to cases where authentic users were correctly classified, true negative (TN) refers to cases where imposters were correctly classified, false positive (FP) refers to cases where imposters were misclassified as authentic users, and false negative (FN) refers to cases where authentic users were misclassified as imposters.

Metrics such as accuracy, precision, recall, F1 score, and area under the curve (AUC) were used to evaluate the models. These metrics are calculated from the equations presented in Table 2.
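
Table 2 itself is not reproduced in this article, so the sketch below uses the standard textbook definitions of these metrics, which may differ in notation from the paper. The counts and scores in the example are made up.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def auc_from_scores(scores, labels):
    """AUC as the chance that a random genuine sample outscores a random
    imposter sample, counting ties as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


metrics = classification_metrics(tp=45, fp=5, tn=40, fn=10)
auc = auc_from_scores([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
```

The pairwise AUC is quadratic in the number of samples, which is fine for a small illustration but slower than a rank-based computation on large test sets.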

The evaluation of the models revealed that KNN produced above-average results, SVC showed exceptional results, and RF showed too-good-to-be-true results.

In particular, the RF model showed signs of overfitting. The model fit the noise in the training data very closely, so it may not perform well on new data. Several techniques were employed to address this overfitting, but the RF results were ultimately disregarded.

Accuracy, precision, recall, F1 score, and AUC served as the criteria for comparing model performance, and the models were adjusted accordingly.

Conclusion

Table 5 compares the performance of machine learning methods used in earlier research papers with those used in the current study. The methods include Siamese Recurrent Neural Networks (RNNs), Multilayer Perceptrons (MLPs), Support Vector Machines (SVMs), K-Means, Random Forests, K-Nearest Neighbors (KNN), and Support Vector Classifiers (SVCs).

Performance measures in the table include accuracy, error rate, and false acceptance rate. The table shows that the most robust model is SVC, with an average accuracy of about 90%. This indicates that SVC can effectively distinguish users based on touch dynamics during Minecraft play.

Other methods also show high accuracy rates, with RNN, MLP, SVC, K-Means, and Random Forest achieving accuracy rates ranging from 86% to 97.7%. These results suggest that touch dynamics is a reliable source for continuous authentication.

However, the table also reveals limitations of some methods, such as the high error rate of the Siamese RNN (13%) and the high false acceptance rate of the multilayer perceptron (6.94%). These limitations mean that some methods are likely to misclassify users or accept imposters, potentially compromising the security of the authentication system. Therefore, further research is needed to improve the performance and robustness of these methods.
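
For reference, the false acceptance rate is conventionally the share of imposter attempts that are wrongly accepted, and its counterpart, the false rejection rate, is the share of genuine attempts that are wrongly rejected. The sketch below uses these standard definitions with made-up counts. How each cited paper computed its reported figures is not stated here.

```python
def false_acceptance_rate(fp, tn):
    """Share of imposter attempts wrongly accepted: FP / (FP + TN)."""
    attempts = fp + tn
    return fp / attempts if attempts else 0.0


def false_rejection_rate(fn, tp):
    """Share of genuine attempts wrongly rejected: FN / (FN + TP)."""
    attempts = fn + tp
    return fn / attempts if attempts else 0.0


# Hypothetical counts for illustration only.
far = false_acceptance_rate(fp=7, tn=93)
frr = false_rejection_rate(fn=10, tp=90)
```

Lowering one of these rates by moving the decision threshold typically raises the other, so both are usually reported together.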

Sasayama
