Fusion of Statistical and Stylistic Text Features with SVM for Persian Sentiment Analysis

Bahmani, Alireza (2025) Fusion of Statistical and Stylistic Text Features with SVM for Persian Sentiment Analysis. Journal of Future Artificial Intelligence and Technologies, 2 (4). pp. 534-548. ISSN 3048-3719

10.62411.faith.3048-3719-287.pdf - Published Version
Available under License Creative Commons Attribution Share Alike.

Abstract

Sentiment analysis is a core task in natural language processing (NLP) that classifies text into sentiment categories such as positive, negative, or neutral. The task is particularly challenging for languages such as Persian because of their complex linguistic structure and the scarcity of high-quality labeled datasets. Previous studies on Persian sentiment analysis have relied largely on TF-IDF representations or deep learning models, often overlooking handcrafted statistical and stylistic features that capture subtle textual patterns. This limits their effectiveness, especially on informal or noisy text. In this paper, we propose a novel approach to Persian sentiment analysis that combines statistical and stylistic (surface-level) features with traditional text-based features such as Term Frequency–Inverse Document Frequency (TF-IDF), so that the model can capture expressive nuances in informal Persian text. The Support Vector Machine (SVM) classifier is tuned with RandomizedSearchCV to enhance performance. Experiments were conducted on a dataset of Persian product reviews from Digikala.com, labeled as positive, negative, or neutral according to user ratings. We compare the proposed system with four baseline models that rely solely on TF-IDF features: Naïve Bayes, Logistic Regression, Random Forest, and Decision Tree. The experimental results show that the proposed approach outperforms the baselines in accuracy, precision, recall, and F1-score. Specifically, the proposed system achieved the highest accuracy (0.8354), significantly improving negative sentiment detection while maintaining strong performance in positive sentiment classification.
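
The feature fusion described in the abstract can be illustrated with a minimal scikit-learn sketch. This is not the paper's implementation: the six surface features in `stylistic_features`, the English toy reviews, the kernel choices, and the search range for `C` are all assumptions made for illustration, since the abstract does not list them. Only the general recipe (TF-IDF plus handcrafted features, an SVM, and RandomizedSearchCV) comes from the abstract.

```python
import numpy as np
from scipy.stats import loguniform
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import FeatureUnion, Pipeline, make_pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler
from sklearn.svm import SVC


def stylistic_features(texts):
    """Hypothetical surface-level features; the paper's actual set is not given here."""
    rows = []
    for t in texts:
        tokens = t.split()
        n_chars = len(t)
        rows.append([
            n_chars,                                               # length in characters
            len(tokens),                                           # length in tokens
            sum(len(tok) for tok in tokens) / max(len(tokens), 1), # mean token length
            t.count("!"),                                          # exclamation marks
            t.count("?") + t.count("؟"),                           # Latin and Persian question marks
            sum(ch.isdigit() for ch in t) / max(n_chars, 1),       # share of digit characters
        ])
    return np.array(rows, dtype=float)


# Made-up English stand-ins for Persian product reviews (assumption, not real data).
texts = [
    "great product, works perfectly!", "excellent quality and fast delivery",
    "very happy with this purchase!!", "love it, highly recommended",
    "terrible, broke after 2 days", "very bad quality, do not buy!",
    "waste of money, awful", "why is it so bad? disappointed",
    "it is okay, nothing special", "average product for the price",
    "works, but not impressive", "fine I guess, neither good nor bad",
]
labels = ["positive"] * 4 + ["negative"] * 4 + ["neutral"] * 4

# Fuse sparse TF-IDF columns with scaled handcrafted columns side by side.
features = FeatureUnion([
    ("tfidf", TfidfVectorizer()),
    ("style", make_pipeline(FunctionTransformer(stylistic_features), StandardScaler())),
])
model = Pipeline([("features", features), ("svm", SVC())])

# Randomized hyperparameter search; the ranges below are illustrative assumptions.
search = RandomizedSearchCV(
    model,
    param_distributions={
        "svm__C": loguniform(1e-2, 1e2),
        "svm__kernel": ["linear", "rbf"],
    },
    n_iter=5,
    cv=2,
    random_state=0,
)
search.fit(texts, labels)
```

After fitting, `search.best_params_` holds the sampled setting with the best cross-validated score, and `search.predict(["some new review"])` classifies unseen text with the refitted pipeline. Wrapping both feature extractors in a single pipeline means the TF-IDF vocabulary and the scaler are learned only from the training folds during the search.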

Item Type: Article
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Depositing User: dl fts
Date Deposited: 23 Mar 2026 03:28
Last Modified: 23 Mar 2026 03:28
URI: https://dl.futuretechsci.org/id/eprint/159
