Comparing writing style feature-based classification methods for estimating user reputations in social media

In recent years, the anonymous nature of the Internet has made it difficult to detect manipulated user reputations in social media, as well as to ensure the quality of users and their posts. To deal with this, this study designs and examines an automatic approach that adopts writing style features to estimate user reputations in social media. Under varying ways of defining Good and Bad classes of user reputations based on the collected data, it evaluates the classification performance of state-of-the-art methods: four writing style features, i.e. lexical, syntactic, structural, and content-specific, and eight classification techniques, i.e. four base learners (C4.5, Neural Network (NN), Support Vector Machine (SVM), and Naïve Bayes (NB)) and four Random Subspace (RS) ensemble methods based on the four base learners. With South Korea's Web forum, Daum Agora, selected as a test bed, the experimental results show that the configuration of the full feature set containing content-specific features and RS-SVM, which combines RS and SVM, gives the best classification accuracy when the test bed poster reputations are segmented strictly into Good and Bad classes by the portfolio approach. Pairwise t tests on accuracy confirm two expectations from the literature review: first, the feature set that adds content-specific features outperforms the others; second, ensemble learning methods are more viable than base learners. Moreover, among the four ways of defining the classes of user reputations, i.e. like, dislike, sum, and portfolio, the results show that the portfolio approach gives the highest accuracy.

shared and shaped through social media, influence individual views of society and incite online and offline political participation, e.g. the South Korean candlelight protests of 2008 and Occupy Wall Street in the United States in 2011. Thus, social media has become an open platform for political and social innovations (Suh 2015; de Zuniga 2012).
However, several problems and challenges must be addressed before social media can grow into a better online place for political and social innovations, as well as for sharing trustworthy information and opinions. First, the anonymous nature of the Internet makes it difficult to ensure the quality of users and their posts in social media. If there is no online user feedback on an anonymous user's posts, e.g. like and dislike, there is no way to tell whether the anonymous user is good or bad before reading her/his post. This leaves social media users vulnerable to low-quality, offensive, and deceptive posts and posters. Second, manipulation of user reputations arouses suspicion and mistrust of the online feedback systems of social media in two ways: good-quality posts and their authors can be maliciously given low reputations by other users, and some users can inflate their own reputations for certain reasons. Moreover, identity changes, stemming from the anonymity of the Internet, leave insufficient past reputation records for social media users, and make it hard to detect the manipulation of user reputations with existing approaches, e.g. the one suggested in Lai et al. (2013).
To resolve the abovementioned problems and challenges, user reputations in social media need to be estimated from concrete user features. Previous works hint that the writing styles of social media users can serve as such objective features (Koppel et al. 2009). Therefore, this paper proposes an automatic approach that adopts the writing styles of users as objective features for estimating user reputations in social media. Nevertheless, the following research gaps are identified through the literature review: first, no study has addressed the automated classification of user reputations in social media by using writing style analysis; second, when a way of dividing user reputations into Good and Bad classes is given, it is unclear which writing style features and classification techniques work better for this task; third, there is no reference on how to define the classes of user reputations in social media for better performance. Therefore, this paper proposes a research framework to find a better way of estimating user reputations in social media by using writing style features. First, the paper segments the test bed users into Good or Bad reputation classes by four proposed approaches: like, dislike, sum, and portfolio. It then extracts four types of writing style features, i.e. lexical (denoted by F1), syntactic (denoted by F2), structural (denoted by F3), and content-specific (denoted by F4), to represent the reputations of the test bed users. Next, for a given way of defining the classes of the test bed user reputations, it evaluates the classification accuracy of 32 configurations, obtained by combining four feature sets, i.e. F1, F1 + F2, F1 + F2 + F3, and F1 + F2 + F3 + F4, with eight classification techniques, i.e. four base learners (C4.5, Neural Network (NN), Support Vector Machine (SVM), and Naïve Bayes (NB)) and four Random Subspace (RS) ensemble methods based on the four base learners.
In addition, it statistically compares the classification performances of different feature sets and different classification techniques by conducting pairwise t tests.
To sum up the contribution of this paper, it is the first work to deal with the estimation of user reputations in social media by using writing style features. If a system is built based on the experimental results of this study, the system can remedy the abovementioned shortcomings of social media reputation systems as follows. First, user reputation estimation based on writing style features helps protect users from being exposed to bad users and their harmful posts in social media. Second, it contributes to establishing trust among social media users when sharing and searching for information and opinions. Eventually, these benefits help social media evolve into a trustworthy virtual place for political/social innovations and information/opinion sharing.
The rest of this paper is organized as follows. The "Literature reviews" section briefly introduces and reviews the relevant literature. The "Proposed research framework" section outlines the proposed research framework and explains it in detail. Subsequently, the "Experimental results and discussions" section presents and discusses the experimental results of applying the suggested research framework to the South Korean Web forum Daum Agora, chosen as a test bed. The "Results on comparative studies" section evaluates the results with statistical comparisons. Finally, the "Conclusions" section concludes the paper with a reflection on limitations and further work.

Online anonymity
Online anonymity represents the incapability of others to identify an individual in computer-mediated communication (CMC) (Christopherson 2007). The online anonymity takes many different forms, grouped into three different types: first, visual anonymity is the most common type wherein physical characteristics are hidden although other identifying information is known; second, pseudonymity refers to the case when people use avatars or usernames as indicators of their online identity; third, full anonymity is said to exist where users remain unknowable after interaction has concluded, and occurs in the absence of any long-term usernames (Christie and Dill 2016). In this paper, the term anonymity refers to pseudonymity and full anonymity.
Due to online anonymity, digesting posts in social media sometimes requires a great deal of risk taking, like doing business online without any physical interaction (Enders et al. 2008). For example, cyber criminals abuse the anonymous nature of social media to conduct malicious activities such as phishing scams, identity theft, and harassment (Iqbal et al. 2013). Hence, to alleviate such risk taking in reading posts, this paper aims at an objective user reputation system for social media that is effective even under anonymous circumstances.

Online user feedbacks and user reputations in social media
Social media users post their opinions regarding particular objects such as products, services, companies, people, and events (Shad Manaman et al. 2016). Online user feedback mechanisms play crucial roles in evaluating the quality of posts and their users. Online user feedback in social media is intended to offer a social control mechanism, allowing social translucence for improved accountability (Erickson and Kellogg 2000). There are mainly two types of online user feedback in social media: recommendations and reputations (Li et al. 2013). First, recommendations help users identify posts and users that suit their needs or preferences. They are usually used to solve information overload problems (Li et al. 2013). Recommendation systems are classified into content-based filtering, collaborative filtering, or hybrid approaches (Sarwar et al. 2001; Jin et al. 2004; Li et al. 2005; Huang et al. 2004; Liu et al. 2014; Yang et al. 2014). Second, reputations are considered a collective measure of trustworthiness based on the referrals or ratings of users in social media (Jøsang et al. 2007). Reputation systems let users rate other users, and the ratings help determine whom to trust in environments where users have to interact among themselves in online settings (Agudo et al. 2010).
For reputation systems in particular, there are two categories of methods for calculating trust scores between users as user reputations: feature-based and graph-based. First, the feature-based method computes the trust score of a user from past ratings on the user's posts (O'Donovan and Smyth 2005). Second, the graph-based approach derives the trust values from explicitly specified relations (e.g. friends) or trust relationships of the user (Golbeck 2005). Between the two, this study adopts the first method, which measures reputations from past ratings on the user's posts, because under anonymity the relationships of a user are hidden or scarce.

Writing style features for characterizing user reputations in social media
According to systemic functional linguistic theory, a language has a textual dimension in which individuals convey their ideas by varying stylistic elements in their writings. Writing styles are influenced by education, gender, and vocabulary, as well as by subconscious factors described in works on psycholinguistics. The statistical analysis of such writing styles, a.k.a. authorship analysis, can discriminate authorship in social media (Abbasi et al. 2008a). Because Web-based channels such as e-mail, newsgroups, and chat rooms are relatively casual compared with formal publications, social media users are more likely to leave their own styles in their writings (Zheng et al. 2006). Hence, if the reputations of social media users are characterized by their writing styles, authorship analysis can help resolve the problem of anonymity in the online communications of social media (Zhao et al. 2015; Iqbal et al. 2013). Previous works on writing style analysis in Table 1 also hint that writing styles extracted from posts can be objective features that characterize user reputations in social media. Indeed, it is practical to use writing style features for anonymous users in social media because other features, e.g. their relationships as graph-based features, are not available in most cases. Nonetheless, to the best of our knowledge, no previous work has classified the reputations of social media users by using their writing styles.
The writing style markers known as the most effective discriminators of authorship in social media are lexical, syntactic, structural, and content-specific writing style features. Here, lexical, syntactic, and structural writing style features are called content-free writing style features (Zhang et al. 2011; Jiang et al. 2014). Among the four types of writing style features in social media, the content-specific features are expected to outperform the content-free features for this study because of two characteristics: first, they consist of important keywords and phrases, so they are more meaningful and more representative than the other writing style features (Zhang et al. 2011); second, they contain a much larger number of n-grams extracted from the collected social media data, and large potential feature spaces are known to be effective for online text classification (Abbasi and Chen 2008). Nevertheless, all the writing style features used in Jiang et al. (2014) are considered for this study because no previous work has shown which writing style features are more useful for estimating user reputations in social media.

Classification techniques for writing style analysis
Three major types of writing style analysis tasks are identification, similarity detection, and classification (Zheng et al. 2006;Abbasi and Chen 2005). First, identification entails comparing anonymous texts against those belonging to identified entities, where the anonymous text is known to be written by one of those entities. Second, similarity detection task requires the comparison of anonymous texts against other anonymous texts in order to assess the degree of similarity. Third, classification is related to categorizing objects in regards to their properties, e.g. gender, by using their writing styles as features that represent the properties. This study belongs to the third category of classification techniques for writing style analysis. For the classification task, this paper adopts the supervised techniques because they have been extensively studied due to their predominant classification performance (Zheng et al. 2006;Abbasi and Chen 2009). In general, supervised techniques for classification consist of two steps: first, the extraction of features from training data and their conversion to feature vectors; second, training of the classifier on the feature vectors and application of the classifier to unseen instances. Hence, feature construction and learning method selection are crucial for accurate classification.
Referring to the classification techniques of previous writing style analysis works summarized in Table 1, four main supervised techniques are adopted as base learners for this study. To explain, first, C4.5, an extension of the ID3 algorithm, is a decision-tree building algorithm developed by Quinlan (1986), and it adopts a divide-and-conquer strategy and an entropy measure for object classification. Its goal is to classify mixed objects into their associated classes based on the objects' attribute values. Second, NN has been popular because of its unique learning capability (Widrow et al. 1994), and has achieved good performance in many different applications (Giles et al. 1998;Kim and Lewis 2000;Tolle et al. 2000). Third, SVM is a novel learning machine first introduced by Vapnik (1995), and is based on the structural risk minimization principle from computational learning theory. Because SVM can handle millions of inputs with good performance (Cristianini and Shawe-Taylor 2000;Joachims 2002), it was introduced to writing style analysis in many previous works (Argamon et al. 2003;Vel et al. 2001;Diederich et al. 2003). Fourth, based on Bayes Theorem (Barnard 1958), NB is a fairly simple probabilistic classification algorithm that uses strong independence assumptions regarding various features (Yang et al. 2002). It assumes that the presence of any feature is entirely independent of the presence of the other features, and allows building classification models efficiently.
Among the four base learners, SVM is a highly robust technique that has provided powerful classification capabilities for online authorship analysis. In head-to-head comparisons, SVM significantly outperformed other supervised learning methods such as NN and C4.5 (Zheng et al. 2006;Abbasi and Chen 2009). Similarly, SVM is expected to outperform the other base learners for this study. However, for writing style analysis, it is unclear which classification technique consistently performs better than others for a given problem in a given domain.
Because of this uncertainty, it is not uncommon to train multiple learners and create an integrated classifier based on overall performance (Wang et al. 2014). Hence, in addition to the four base learners, this paper combines an ensemble learning method with each of the four base learners. Ensemble learning is a machine learning paradigm where multiple learners are trained to solve the same problem. In contrast to base learners, which try to learn one hypothesis from the training data, ensemble learning methods try to construct a set of hypotheses and combine them for use. In general, ensemble learning methods are divided into two categories: first, Boosting and Bagging are instance partitioning methods; second, feature partitioning methods include RS (Polikar 2006; Zhou 2012; Wang et al. 2014).
For this study, RS is selected as the ensemble learning method because it showed better accuracy than Boosting and Bagging in Wang et al. (2014). RS is an ensemble construction technique proposed by Tin Kam (1998) that modifies the training dataset in the feature space. The idea behind RS is that, if one obtains better base learners in random subspaces than in the original feature space, the combined decision of such base learners can be superior to a single classifier constructed on the original training dataset with the complete feature set (Wang et al. 2014). Accordingly, RS is combined with the four base learners selected for this study, and the resulting four ensemble methods, i.e. RS-C4.5, RS-NN, RS-SVM, and RS-NB, are additionally adopted. Considering the superiority of SVM to the other three base learners, RS-SVM is expected to outperform the other three ensemble learning methods.
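The Random Subspace idea described above can be illustrated with a minimal sketch. It is not the WEKA implementation used in this study: the nearest-centroid base learner is a deliberately simple stand-in for C4.5, NN, SVM, or NB, majority voting is only one possible combination rule, and the names `RandomSubspace`, `n_learners`, and `subspace_ratio` are hypothetical.

```python
import random
from collections import Counter


def train_centroids(X, y, features):
    """Stand-in base learner: per-class centroids restricted to a feature subset."""
    sums, counts = {}, Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for k, f in enumerate(features):
            acc[k] += row[f]
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict_centroids(centroids, row, features):
    """Return the label whose centroid is closest (squared Euclidean) in the subspace."""
    def dist(label):
        return sum((row[f] - c) ** 2 for f, c in zip(features, centroids[label]))
    return min(sorted(centroids), key=dist)


class RandomSubspace:
    """Train several base learners, each on a random subset of the features."""

    def __init__(self, n_learners=10, subspace_ratio=0.5, seed=0):
        self.n_learners = n_learners
        self.subspace_ratio = subspace_ratio
        self.rng = random.Random(seed)
        self.members = []

    def fit(self, X, y):
        n_features = len(X[0])
        k = max(1, int(round(self.subspace_ratio * n_features)))
        self.members = []
        for _ in range(self.n_learners):
            # Each member sees all training users but only k randomly chosen features.
            features = sorted(self.rng.sample(range(n_features), k))
            self.members.append((features, train_centroids(X, y, features)))
        return self

    def predict(self, row):
        votes = Counter(predict_centroids(c, row, f) for f, c in self.members)
        # Majority vote; ties are broken alphabetically for determinism.
        return max(sorted(votes), key=lambda label: votes[label])
```

Swapping `train_centroids` and `predict_centroids` for another base learner would give the analogue of RS-C4.5, RS-NN, RS-SVM, or RS-NB; the ensemble logic itself does not change.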

Proposed research framework
To design and examine an automatic approach that uses writing style features for estimating user reputations in social media, this study proposes a research framework as outlined in Fig. 1. The research framework answers the research questions below.

RQ1 How do writing style features perform for estimating user reputations in social media?
RQ2 Which writing style features are the best at estimating user reputations by classification techniques in social media?
RQ3 Which classification technique is better suited to differentiating user reputations with writing style features in social media?
RQ4 Which method of defining user reputation classes, i.e. Good and Bad, works better for estimating user reputations with writing style features in social media?
Ultimately, the research framework is intended for developing a system that is capable of differentiating between Good and Bad reputation users by using the stylistic tendencies inherent in their writings in social media. In a nutshell, it consists of four steps: data collection, data representation, classification, and evaluation with comparisons. The following sub-sections explain the details of each step in the research framework.

Collect data
This study uses the Web forum for data collection because it is a major type of social media with a balanced nature of discussions among participants and a relatively broader range of topics (Zhang et al. 2011). The data collection from the Web forum has two steps, crawling and parsing. First, the developed Web crawler programs collect the online data from the Web forum as HTML pages. Then, users whose posts had been evaluated at least once by the others are selected, and the posts and their past ratings are parsed out for the selected users from the raw HTML pages, and are stored in a relational database.

Represent user reputations by writing style features
In this step, writing style features are extracted as independent variables to represent the reputations of the collected and selected users from the Web forum. The class of a user reputation is obtained from her/his online user feedbacks by using different ways of defining user reputation classes. The details are explained in the following sub-sections.

Extract writing style features
This study generates different feature sets containing different types of writing style features. By doing so, we can compare and evaluate the performance of different writing style feature sets in estimating the classes of the selected users' reputations. Table 2 lists the writing style features in social media adopted for this study. In this paper, the different writing style features are denoted as follows: lexical features F1, syntactic features F2, structural features F3, and content-specific features F4. The writing style features of Table 2 are based on the prior studies in Table 1, mainly on Jiang et al. (2014). In addition, unlike the previous works, emotional writing style features are added to F1 for this study. Emoticons are graphic representations of facial expressions, which often follow utterances in written CMC and are produced by ASCII symbols or by graphic symbols (Skovholt et al. 2014). As a result, the four types of writing style features, i.e. F1, F2, F3, and F4, are obtained after feature extraction. Based on those different types of writing style features, four feature sets are constructed in an incremental way: feature set F1; feature set F1 + F2; feature set F1 + F2 + F3; feature set F1 + F2 + F3 + F4. This incremental order implies the evolutionary sequence of features (Zheng et al. 2006; Abbasi and Chen 2008; Zhang et al. 2011).
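The incremental construction of the four feature sets can be sketched as follows. This is only an illustration: the function name `incremental_feature_sets` and the individual feature names and values are hypothetical and are not taken from Table 2.

```python
def incremental_feature_sets(groups):
    """Return cumulative unions of feature groups, keyed like 'F1 + F2'."""
    sets, names, merged = {}, [], {}
    for name, features in groups.items():
        names.append(name)
        merged = {**merged, **features}
        sets[" + ".join(names)] = dict(merged)
    return sets


# Hypothetical feature values for one user; the feature names are illustrative only.
user_features = {
    "F1": {"avg_word_length": 4.2, "emoticon_ratio": 0.01},
    "F2": {"comma_ratio": 0.03},
    "F3": {"avg_paragraph_length": 58.0},
    "F4": {"bigram:candlelight protest": 2.0},
}
feature_sets = incremental_feature_sets(user_features)
```

Each later set contains every feature of the earlier ones, so any accuracy difference between two adjacent sets can be attributed to the newly added feature type.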
Next, for feature selection, information gain (IG) heuristic is adopted due to its reported effectiveness in previous online text classification. IG (C, A) measures the amount of entropy decrease on a class C when providing a feature A (Quinlan 1986;Shannon 1948;Zhang et al. 2011). The decreasing amount of entropy reflects the additional information gained by adding feature A, and higher values between 0 and 1 indicate more information gained by providing certain features (Zhang et al. 2011). In this study, writing style features with IG (C, A) > 0.0025 are selected by referring to previous related works (Yang and Pedersen 1997;Abbasi et al. 2008b;Zhang et al. 2011).
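As a rough sketch of this selection step, IG(C, A) can be computed as the class entropy H(C) minus the conditional entropy H(C | A). The code below assumes discrete feature values (continuous features would first need to be discretized, which is not shown), and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """IG(C, A) = H(C) - H(C | A) for a discrete feature A."""
    total = len(labels)
    by_value = {}
    for value, label in zip(feature_values, labels):
        by_value.setdefault(value, []).append(label)
    conditional = sum(len(subset) / total * entropy(subset) for subset in by_value.values())
    return entropy(labels) - conditional


def select_features(columns, labels, threshold=0.0025):
    """Keep the names of features whose information gain exceeds the threshold."""
    return [name for name, values in columns.items()
            if information_gain(values, labels) > threshold]
```

With two balanced classes, H(C) is 1 bit, which is why the gain of any single feature falls between 0 and 1.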

Segment the selected users' reputations into Good and Bad classes
The selected Web forum users are segmented into Good and Bad reputation groups based on the ratings regarding their posts. In social media, there are generally two types of online user feedback, like and dislike, although there are various types of past ratings on posts, e.g. helpfulness of reviews and reviewer rankings on Amazon.com, likes on Facebook, retweets on Twitter, ratings of sellers on eBay, ratings of answers on Q&A sites, etc. Hence, this paper proposes four ways of dividing social media users' reputations into two classes: Good and Bad. The four approaches are named as segmenting type s ∈ {like, dislike, sum, portfolio}, and the reputation classes for users, reputation_s, are respectively defined in terms of the following quantities:
like(user_i) is the number of likes that user i obtained per post in social media, and m_like is the average of like(user_i) for i = 1, …, N;
dislike(user_i) is the number of dislikes that user i obtained per post in social media, and m_dislike is the average of dislike(user_i) for i = 1, …, N;
sum(user_i) is equal to like(user_i) − dislike(user_i), and m_sum is the average of sum(user_i) for i = 1, …, N.
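The following sketch shows one way the four segmenting types could be computed from per-post like and dislike counts. The thresholding rules are assumptions made for illustration only: users at or above the average like or sum score are labeled Good, users at or above the average dislike score are labeled Bad, and the portfolio type labels a user Good only when both the like and the dislike criteria say Good. The paper's exact definitions may differ.

```python
from statistics import mean


def segment_users(likes, dislikes):
    """Label each user Good/Bad under the four segmenting types.

    ASSUMPTION: the thresholds and the portfolio rule below are illustrative,
    not the paper's exact definitions.
    """
    sums = [li - di for li, di in zip(likes, dislikes)]
    m_like, m_dislike, m_sum = mean(likes), mean(dislikes), mean(sums)
    labels = []
    for like, dislike, total in zip(likes, dislikes, sums):
        labels.append({
            "like": "Good" if like >= m_like else "Bad",
            "dislike": "Good" if dislike < m_dislike else "Bad",
            "sum": "Good" if total >= m_sum else "Bad",
            # Assumed portfolio rule: Good only when both signals agree.
            "portfolio": "Good" if like >= m_like and dislike < m_dislike else "Bad",
        })
    return labels
```

Under this assumed rule, a user with many likes and many dislikes is Good for s = like but Bad for s = portfolio, which matches the stricter filtering into the Bad class described for the portfolio approach.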

Estimate reputation s of the selected users
This paper adopts four base learners, i.e. C4.5, NN, SVM, and NB, which are commonly used in previous studies of writing styles in social media. Moreover, RS as an ensemble learning method is combined with the four base learners, resulting in RS-C4.5, RS-NN, RS-SVM, and RS-NB. In total, these eight classification techniques are used for this study. For an experiment, 100 users are randomly selected for each reputation class from the collected data, and tenfold cross-validation is performed to train and evaluate a classifier. To implement the eight classification techniques, the data mining toolkit WEKA (Waikato Environment for Knowledge Analysis) version 3.7.0 is used with all the default parameters because it is the most commonly used open-source toolkit with a collection of machine learning algorithms for solving data mining problems (Wang et al. 2014).
In detail, the WEKA modules for algorithms used in this study are as follows: J48 module for C4.5, Multilayer Perceptron module for NN, SMO module for SVM, Naïve Bayes module for NB, and Random Subspace module for RS.
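The experimental grid and the tenfold split can be sketched as below. This is not the WEKA procedure itself: the fold construction is a plain stratified split written for illustration, stratification is an assumption, and the names `CONFIGURATIONS` and `stratified_folds` are hypothetical.

```python
import random
from itertools import product

FEATURE_SETS = ["F1", "F1 + F2", "F1 + F2 + F3", "F1 + F2 + F3 + F4"]
BASE_LEARNERS = ["C4.5", "NN", "SVM", "NB"]
CLASSIFIERS = BASE_LEARNERS + ["RS-" + name for name in BASE_LEARNERS]
CONFIGURATIONS = list(product(FEATURE_SETS, CLASSIFIERS))  # 4 x 8 = 32


def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds that keep the class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled users of each class round-robin into the folds.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds
```

With 100 Good and 100 Bad users, each fold holds 20 users (10 per class); each configuration is trained on nine folds and tested on the remaining one, ten times in turn.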

Evaluate results with comparisons
To assess the performance of each feature set and each classification technique, this paper adopts the standard classification performance metrics. For the given segmenting type s, they are defined as

$$\mathrm{accuracy}_s = \frac{\left|\{\text{users classified correctly either as } \mathrm{reputation}_s = \text{Good or } \mathrm{reputation}_s = \text{Bad}\}\right|}{\left|\{\text{total users belonging either to } \mathrm{reputation}_s = \text{Good or } \mathrm{reputation}_s = \text{Bad}\}\right|} \tag{5}$$

$$\mathrm{precision}_s(i) = \frac{\left|\{\text{users classified correctly as } \mathrm{reputation}_s = i\}\right|}{\left|\{\text{users classified either correctly or falsely as } \mathrm{reputation}_s = i\}\right|} \quad \text{for } i = \text{Good, Bad} \tag{6}$$

$$\mathrm{recall}_s(i) = \frac{\left|\{\text{users classified correctly as } \mathrm{reputation}_s = i\}\right|}{\left|\{\text{users belonging to } \mathrm{reputation}_s = i\}\right|} \quad \text{for } i = \text{Good, Bad} \tag{7}$$

To enhance understanding, Fig. 2 illustrates each metric of Eqs. (5)-(7). These four metrics have been widely used in information retrieval and text classification studies (Abbasi et al. 2008b; Li et al. 2008; Zhang et al. 2011). Among the four standard measures, accuracy assesses the overall classification correctness, while the others evaluate the correctness regarding each class. Therefore, this paper performs comparisons with respect to accuracy.
The comparisons are done by pairwise t tests because pairwise t test comparisons are the simplest kind of statistical tests and are commonly used for comparing the performance of two algorithms (Derrac et al. 2011). To check whether the average difference in two
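A minimal sketch of these metrics, computed from lists of actual and predicted classes, is given below. The F-measure is assumed here to be the usual harmonic mean of precision and recall, since its defining equation does not appear in the text above; the function name `classification_metrics` is hypothetical.

```python
def classification_metrics(actual, predicted, classes=("Good", "Bad")):
    """Accuracy plus per-class precision, recall, and F-measure."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    report = {"accuracy": correct / len(actual)}
    for cls in classes:
        true_pos = sum(a == cls and p == cls for a, p in zip(actual, predicted))
        predicted_cls = sum(p == cls for p in predicted)
        actual_cls = sum(a == cls for a in actual)
        precision = true_pos / predicted_cls if predicted_cls else 0.0
        recall = true_pos / actual_cls if actual_cls else 0.0
        # ASSUMPTION: F-measure taken as the harmonic mean of precision and recall.
        f_measure = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
        report[cls] = {"precision": precision, "recall": recall, "f_measure": f_measure}
    return report
```

Accuracy is a single number per configuration, whereas precision, recall, and F-measure are reported once for the Good class and once for the Bad class.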

Test bed: a Korean Web forum
This study targeted South Korea as a test bed country because South Korea has shown that social media can be used not only to exchange information and opinions, but also to organize street protests and empower people to be active in the protests (Suh et al. 2010; Suh 2015). In this sense, Daum Agora proves sufficient and ideal as the Web forum of South Korea. The Web crawler program collected the posting data from Daum Agora that had been generated over the 5 years from 2007 to 2011, and the data were stored in the relational database for the experiments. In total, online data on 2,565,918 posts from 91,968 users were collected. Among the collected users, the 22,131 users whose posts had been evaluated at least once by others were selected for the experiments. Based on the collected data, the writing style features of these 22,131 users were extracted.
Next, the online user feedback regarding the posts of the selected 22,131 users was extracted from the collected data. In the case of the test bed Daum Agora, there are two types of online user feedback: like and dislike. Based on these, the classes of the selected 22,131 users' reputations were obtained by the different segmenting types. As a result, Table 3 shows the number of users that belong to each user reputation class by segmenting type. Moreover, from the selected 22,131 users, 100 users and their posts were randomly sampled for each class of user reputations in an experiment. Table 4 shows the experimental results on accuracy for different writing style feature sets and different classification techniques. To explain the key findings, first, the feature set F1 + F2 + F3 + F4 gave the best accuracy for all the segmenting types except segmenting type s = sum, while RS-SVM gave the highest accuracy regardless of segmenting type. Second, among the 32 combinations, the feature set F1 + F2 + F3 + F4 with RS-SVM ranked the best in terms of accuracy, i.e. 94.50 %, over all the segmenting types. Thus, the results in Table 4 indicate the superiority of the feature set F1 + F2 + F3 + F4 and RS-SVM. This is aligned with the paper's expectations stated in the "Literature reviews" section, and a possible reason is that their common advantage in handling tens of thousands of features made them work best together.

Evaluations and discussions
In addition, the best accuracies were identified for each of the four segmenting types and were compared to each other. Among the segmenting types, reputation_portfolio gave the highest accuracy, i.e. 94.50 %, when the feature set F1 + F2 + F3 + F4 and the classification technique RS-SVM are used. On the other hand, reputation_sum gave the lowest best accuracy, i.e. 82.50 %, when the feature set F1 + F2 + F3 and the classification technique RS-SVM are used. This suggests that the more accurate segmentation of user reputations by the portfolio approach contributed to its higher accuracy compared with the other segmenting types. In contrast, reputation_sum made it more difficult to classify users whose like(user_i) and dislike(user_i) conflict strongly.
Table 5 shows the evaluation results on precision, recall, and F-measure. The feature set F1 + F2 + F3 + F4 achieved the highest precisions: 98.88 % with RS-NN for reputation_dislike = Good, and 94.95 % with RS-SVM for reputation_portfolio = Bad. Next, the feature set F1 + F2 + F3 + F4 and RS-NN gave the highest recalls: 100 % for reputation_sum = Good, and 99.00 % for reputation_dislike = Bad. The highest F-measures were achieved by the feature set F1 + F2 + F3 + F4 together with RS-SVM when segmenting type s = portfolio, i.e. 94.53 % for reputation_portfolio = Good, and 94.47 % for reputation_portfolio = Bad. Taken together, these results show that the feature set F1 + F2 + F3 + F4 and ensemble learning methods gave the best precision, recall, and F-measure for both Good and Bad classes of user reputations.

Results on comparative studies
On comparisons of different feature sets Table 6 shows the results of the pairwise t tests, conducted to examine the effect of different feature sets on accuracy for a certain classification technique. It reveals that,

Table 4 Accuracy (%) for different feature sets and different classification techniques
The best result for each segmenting type is highlighted as italics, and the best result over all the segmenting types is additionally highlighted as bold italics

F-measure Precision
Recall

F-measure Precision
Recall

F-measure Precision
Recall  regardless of segmenting types, adding one type of writing style features improved most of classification accuracies except adding the structural features F3. The insignificant effect of adding F3 is because its size is small so its representation capability is smaller than adding the other features. Moreover, the feature set F1 + F2 + F3 + F4 gave the best results for all eight classification techniques, regardless of the segmenting type. This suggests that the four feature sets provide important complementary and discriminatory potential if they are exploited by incorporating them in unison. Thus, a large set of rich writing style features are beneficial for automated classification on the reputations of social media users. Especially, it shows adding the content-specific writing style features F4 contributes to the best accuracy as expected in the Literature Reviews section. It indicates that keywords and phrases on certain topics are more important grounds to judge users than the other content-free writing style features in social media. Table 7 shows the results of the pairwise t tests, performed to investigate the effect of different classification techniques on accuracy for a specific feature set. For a given segmenting type s, classification techniques were compared in three parts: Base versus Base, Ensemble versus Ensemble, and Base versus Ensemble. In Table 7, it was observed that the ranks of all eight classification techniques are different according to the selected feature set.

On comparisons of different classification techniques
From Table 7, there are two key findings. First, no single classification technique gave the best accuracy for all the feature sets in any given segmenting type. Second, ensemble learning methods are better than base learners in most configurations. The reason is that the ensemble learning methods consider the writing style features in their entirety, whereas the base learners only consider the average of the aggregated writing style features. This difference allowed the ensemble learning methods to preserve important information better than the base learners, which resulted in better accuracies.
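The general Random Subspace idea behind the RS ensembles can be sketched as follows: each base learner is trained on a random subset of the features, and the predictions are combined by majority vote. This is a minimal illustration under stated assumptions, not the configuration used in the experiments. A simple nearest-centroid classifier stands in for the base learner (the paper uses C4.5, NN, SVM, and NB), and the toy feature vectors, function names, and parameter values are all hypothetical.

```python
import random
from collections import Counter


def train_centroids(X, y, feats):
    """Nearest-centroid base learner restricted to the feature indices in feats."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(r[f] for r in rows) / len(rows) for f in feats]
    return centroids


def predict_centroids(centroids, feats, x):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def dist(c):
        return sum((x[f] - c_f) ** 2 for f, c_f in zip(feats, c))
    # Sorting the labels makes tie-breaking deterministic.
    return min(sorted(centroids), key=lambda label: dist(centroids[label]))


def train_random_subspace(X, y, n_learners=11, subspace_size=2, seed=0):
    """Train n_learners base learners, each on a random subset of the features."""
    rng = random.Random(seed)
    n_features = len(X[0])
    ensemble = []
    for _ in range(n_learners):
        feats = sorted(rng.sample(range(n_features), subspace_size))
        ensemble.append((feats, train_centroids(X, y, feats)))
    return ensemble


def predict_random_subspace(ensemble, x):
    """Combine the base learners by majority vote (odd n_learners avoids ties)."""
    votes = Counter(predict_centroids(c, feats, x) for feats, c in ensemble)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: four numeric writing style features per user.
X = [[5, 4, 5, 4], [4, 5, 4, 5], [5, 5, 4, 4],
     [1, 0, 1, 0], [0, 1, 0, 1], [1, 1, 0, 0]]
y = ["Good", "Good", "Good", "Bad", "Bad", "Bad"]

ensemble = train_random_subspace(X, y, n_learners=11, subspace_size=2, seed=42)
print(predict_random_subspace(ensemble, [5, 4, 4, 5]))
print(predict_random_subspace(ensemble, [0, 1, 1, 0]))
```

Because every learner sees a different slice of the feature set, the ensemble as a whole draws on all features while each individual model stays small.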

On comparisons of different ways of defining the classes of user reputations
In Table 4, regarding segmenting types, it is notable that the segmenting type s = dislike gave the highest precision for both Good and Bad classes. This means that segmenting users by their dislike scores is the best approach in terms of precision. Moreover, the segmenting type s = portfolio gave the highest F-measure when the feature set F1 + F2 + F3 + F4 and RS-SVM are combined: 94.53 % for the Good class and 94.47 % for the Bad class. One possible reason is that the portfolio approach segments users more accurately, which contributed to a higher F-measure than the other segmenting types. However, because the segmenting type s = portfolio classifies users more strictly into the Bad class, its best precision and recall were lower than the best values of the other segmenting types.
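For reference, the per-class precision, recall, and F-measure discussed above follow the standard definitions, which can be computed as in the sketch below. The function name and the label lists are hypothetical and serve only to illustrate the arithmetic; they are not data from the experiments.

```python
def class_metrics(y_true, y_pred, positive):
    """Return (precision, recall, f_measure) treating `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-measure is the harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical true and predicted classes for eight users.
y_true = ["Good", "Good", "Good", "Good", "Bad", "Bad", "Bad", "Bad"]
y_pred = ["Good", "Good", "Good", "Bad", "Bad", "Bad", "Bad", "Bad"]

for cls in ("Good", "Bad"):
    p, r, f = class_metrics(y_true, y_pred, cls)
    print(f"{cls}: precision={p:.3f} recall={r:.3f} F-measure={f:.3f}")
```

In this toy case one Good user is pushed into the Bad class, which keeps Good precision high but lowers Good recall and Bad precision, so the metrics have to be read per class.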
Moreover, Table 7 shows that, when the reputation portfolio was used, the feature set F1 + F2 + F3 + F4 and RS-SVM gave the best accuracy among all the configurations. A possible reason is that the reputation portfolio classified users into the Good class if they were certainly good, and strictly filtered users into the Bad class whenever it was unclear whether they belonged to the Good or Bad class.
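The strictness of the portfolio approach relative to single-signal segmenting can be illustrated with a hedged sketch. The rules and cut-off values below are assumptions made for illustration only; they are not the paper's exact definitions, and the sum segmenting type is omitted because its definition is not restated in this section. The sketch shows only the behaviour described above: a user is labelled Good when both feedback signals agree, and falls into Bad otherwise.

```python
def segment(likes, dislikes, s, like_cut=10, dislike_cut=10):
    """Assign 'Good' or 'Bad' under segmenting type s.

    like_cut and dislike_cut are hypothetical thresholds, not values from the paper.
    """
    if s == "like":
        return "Good" if likes >= like_cut else "Bad"
    if s == "dislike":
        return "Good" if dislikes < dislike_cut else "Bad"
    if s == "portfolio":
        # Assumed rule: Good only when both signals agree; any doubt goes to Bad.
        return "Good" if likes >= like_cut and dislikes < dislike_cut else "Bad"
    raise ValueError(f"unknown segmenting type: {s}")


# Hypothetical users as (likes, dislikes) pairs.
users = {"u1": (50, 2), "u2": (50, 40), "u3": (3, 2)}
for name, (likes, dislikes) in users.items():
    labels = {s: segment(likes, dislikes, s) for s in ("like", "dislike", "portfolio")}
    print(name, labels)
```

Users u2 and u3 each look Good under one single-signal rule but end up in the Bad class under the assumed portfolio rule, which is the stricter filtering the discussion refers to.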

Conclusions
This paper proposed a research framework to design and examine an automatic system that classifies social media user reputations into Good and Bad classes by adopting writing styles. The framework was applied to Daum Agora, the most popular Web forum in South Korea, which was selected as a test bed. The experimental results in Table 4 show that the configuration of the feature set F1 + F2 + F3 + F4 and RS-SVM gave the best accuracy, i.e. 94.50 %, when segmenting type s = portfolio. This shows that it is possible to classify user reputations by writing style features in social media with high accuracy (RQ1 is answered). In Table 6, the pairwise t tests on accuracy for different feature sets show that the feature set F1 + F2 + F3 + F4 ranked the best for all eight classification techniques regardless of segmenting types (RQ2 is answered). This indicates that keywords and phrases on certain topics affect user reputations more than the content-free writing style features. According to Table 7, the pairwise t tests on accuracy for different classification techniques show that no single classification technique gave the best accuracy for all the feature sets in any given segmenting type, but ensemble learning methods turned out to be better than base learners (RQ3 is answered). The experimental results related to RQ2 and RQ3 indicate that both the feature set F1 + F2 + F3 + F4 and the ensemble learning methods are well suited to handling a large set of writing style features, and this common strength produced a synergy effect. In addition, the paper concluded that combining two types of online user feedback through the portfolio approach, i.e. segmenting type s = portfolio, gave better accuracy than the other segmenting types (RQ4 is answered). A potential explanation is that, because the suggested portfolio approach segments user reputations more strictly into Good and Bad classes, it is better able to address the problem of this study.

The results are t and p values of the t tests for classification technique comparison, and the results above the 5 % significance level are highlighted in italics

This paper contributes to the literature as follows. First, this study is the first work that adopts writing styles as objective features to automatically classify social media user reputations into Good and Bad classes. Second, this paper provides guidelines for system implementation in two ways: (1) which writing style features and classification technique should be used together for the best accuracy; (2) which segmenting type gives the best result with respect to accuracy. In particular, because social media platforms measure user reputations in similar ways, namely through online user feedback, e.g. like, dislike, or both, the results can be used as a reference for similar studies on other types of social media. Third, the paper helps maintain a healthy and trustworthy social media ecosystem by protecting users from bad users, and it makes it possible to manage user reputations that have been manipulated to be either lower or higher than their original values. As a consequence, it helps build trust between users by complementing the online user feedback system in social media.
Directions for further studies can be suggested based on this paper as follows. First, South Korea was selected as the test bed country for particular reasons, but other target countries and a wider variety of languages may lead to additional implications. Hence, future research on other countries and regions, e.g. the US, Europe, China, Japan, and the Middle East, is recommended. Second, this study focused on writing styles as objective features to classify user reputations in social media, but other objective features may also be useful, e.g. network structures in communications between users and their commenters. Third, in a similar vein, simpler or more sophisticated approaches should be considered to address the computational cost of ensemble learning methods, which take a great deal of time. Thus, further studies can revisit the problems and challenging issues that motivated this study from different perspectives on countries, languages, features, and techniques.