Concepts and ideas for using "machine learning" to build fraud detection modules

Post by **aminaas1575** » Wed Jan 08, 2025 4:26 am

MLContext is the central class of ML.NET and the starting point for its "machine learning" tasks: it exposes the functions for loading data, transforming it, training models, and evaluating them. MLContext can be thought of as a container for organizing machine learning workflows.

This time I used the Field-Aware Factorization Machine model to train the prediction model. The model crosses features with one another to generate new features, so it can capture the interactions between them. It is particularly suited to sparse data in which feature crosses carry information.
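To make the idea of crossing features concrete, here is a minimal Python sketch of how a field-aware factorization machine scores one example. It is not ML.NET's implementation. The function name `ffm_score`, the field names, and all weights in the usage lines are made up for illustration. Each feature keeps one latent vector per field, and the cross term for a pair of features is the dot product of the vectors each one keeps for the other's field.

```python
import math
from itertools import combinations


def ffm_score(active, bias, linear, latent):
    """Raw FFM score for one example.

    active: list of (field, feature, value) for the non-zero features
    linear: dict mapping feature -> linear weight
    latent: dict mapping (feature, other_field) -> latent vector (list of floats)
    """
    score = bias
    # First-order (linear) part.
    for _field, feature, value in active:
        score += linear.get(feature, 0.0) * value
    # Second-order part: every pair of active features is crossed.
    for (f_i, j_i, x_i), (f_j, j_j, x_j) in combinations(active, 2):
        v_i = latent[(j_i, f_j)]  # vector of feature i for the field of feature j
        v_j = latent[(j_j, f_i)]  # vector of feature j for the field of feature i
        score += sum(a * b for a, b in zip(v_i, v_j)) * x_i * x_j
    return score


def sigmoid(z):
    """Map a raw score to a probability-like value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical example: the word "prize" appears together with a URL placeholder.
active = [("word", "prize", 1.0), ("url", "[URL]", 1.0)]
linear = {"prize": 0.2, "[URL]": 0.3}
latent = {("prize", "url"): [1.0, 2.0], ("[URL]", "word"): [0.5, 0.25]}
print(sigmoid(ffm_score(active, 0.1, linear, latent)))
```

Only the features that are actually present enter the sums, which is why the approach stays cheap on sparse text features.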

Explanation of terms

Processing sparse features: Fraudulent text message detection usually relies on sparse features, such as the words, URLs, and numbers that appear in the message content. The Field-Aware Factorization Machine model handles sparse features well and automatically learns the interactions between them, which helps improve the accuracy of the model.

Considering feature interactions: The Field-Aware Factorization Machine model is able to capture interactions between features, which is important in fraud text message detection. For example, the occurrence of specific words may be associated with a number or a URL, and the model can learn this interaction to determine more accurately whether a text message is a scam.
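As an illustration of what sparse, field-grouped features could look like for a text message, here is a small Python sketch. The field names (`word`, `url`, `number`), the `<NUM>` token, the number pattern, and the function `to_field_features` are assumptions made for this example, not part of the original module.

```python
import re

# Assumed pattern: an optional "+", then at least four digits or dashes.
NUMBER_RE = re.compile(r"^\+?\d[\d\-]{3,}$")


def to_field_features(tokens):
    """Turn message tokens into a sorted list of (field, feature) pairs.

    Only features that occur in the message are listed, so the
    representation stays sparse however large the vocabulary is.
    """
    features = set()
    for token in tokens:
        if token == "[URL]":
            features.add(("url", "[URL]"))
        elif NUMBER_RE.match(token):
            # All phone-like numbers share one feature in this sketch.
            features.add(("number", "<NUM>"))
        else:
            features.add(("word", token.lower()))
    return sorted(features)


tokens = ["Claim", "your", "prize", "at", "[URL]", "or", "call", "0912345678"]
for field, feature in to_field_features(tokens):
    print(field, feature)
```

With features grouped like this, a model can learn, for instance, that the word "prize" appearing together with a URL is more suspicious than either one alone.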

Special handling

Training directly on raw fraud text messages would treat the entire text as a single feature value, which makes it hard for the module to judge the interactions between features. The messages are therefore preprocessed in advance as follows:

Word segmentation: segment each complete sentence into words, using spaces as separators.
Removing content that would affect the judgment: replace each URL with [URL], and remove the descriptive text, emoticons, and time strings in brackets. These have nothing to do with whether a message is fraudulent and are removed so that they do not affect the judgment. The URL is replaced rather than removed because its association with certain keywords is related to fraud.
URL checking: the extracted URL is passed to Google's fraudulent URL API for a verdict. In this module the URL value itself is removed but the [URL] placeholder is retained, because the presence of a URL is itself a possible sign of fraud.
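The preprocessing steps above can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the original module's code. The regular expressions, the emoji ranges, and the name `preprocess_message` are illustrative. Whitespace splitting stands in for a real word segmenter, which languages written without spaces would need. The call to the URL-checking API is left out, and the extracted URLs are simply returned for a later check.

```python
import re

# Assumed patterns; a production module would likely need broader ones.
URL_RE = re.compile(r"(?:https?://|www\.)[^\s\[\]()<>]+", re.IGNORECASE)
# Bracketed text, except the [URL] placeholder inserted below.
BRACKET_RE = re.compile(r"\[(?!URL\])[^\[\]]*\]|\([^()]*\)")
TIME_RE = re.compile(r"\b\d{1,2}:\d{2}(?::\d{2})?\b")
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def preprocess_message(text):
    """Return (cleaned_text, extracted_urls) for one text message."""
    # 1. Extract the URLs, then replace each with the [URL] placeholder.
    urls = URL_RE.findall(text)
    text = URL_RE.sub(" [URL] ", text)
    # 2. Remove bracketed descriptions, emoticons, and time strings.
    text = BRACKET_RE.sub(" ", text)
    text = EMOJI_RE.sub(" ", text)
    text = TIME_RE.sub(" ", text)
    # 3. Word segmentation: split on whitespace, rejoin with single spaces.
    tokens = text.split()
    return " ".join(tokens), urls


cleaned, urls = preprocess_message(
    "Your parcel is held (ref 10:30) 📦 pay now at https://example.com/pay?id=1 [ad]"
)
print(cleaned)
print(urls)
```

URLs are replaced before the time strings are removed so that a port number or path inside a URL is not mistaken for a time.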