Previous works | Tasks | Research purpose | Test bed (social media) | Language | Lexical features | Syntactic features | Structural features | Content-specific features | Techniques |
---|---|---|---|---|---|---|---|---|---|
Abbasi and Chen (2005) | Authorship identification | Evaluating the linguistic features of Web messages and comparing them to known writing styles, giving the intelligence community a tool for identifying patterns of terrorist communication | Web forum | English, Arabic | √ | √ | √ | √ | C4.5, SVM |
Zheng et al. (2006) | Authorship identification | Examining writing style features and classification techniques to identify the authorship of unknown online messages | Internet newsgroup, bulletin board system (BBS) | English, Chinese | √ | √ | √ | √ | C4.5, NN, SVM |
Argamon et al. (2007) | Classification | Developing a new type of lexical feature for stylistic text classification, and demonstrating its usefulness in sentiment classification | Movie reviews | English | √ |  |  | √ | SVM |
Abbasi and Chen (2008) | Authorship identification, similarity detection | Using writing style analysis techniques for identification and similarity detection of anonymous identities | eBay comments, Java forum | English | √ | √ | √ | √ | SVM, RS-SVM, PCA, Standard K-L transforms |
Abbasi et al. (2008b) | Classification | Evaluating techniques that select writing style features for sentiment classification | Movie reviews, Web forum | English, Arabic | √ | √ | √ | √ | IG, GA, SVM weights, EWGA |
Abbasi et al. (2008a) | Similarity detection | Evaluating writing style similarity detection techniques | Online feedback comments of eBay members | English | √ | √ | √ | √ | PCA, n-gram models, Markov models, Cross entropy, K–L similarity |
Agichtein et al. (2008) | Classification | Automatically finding high-quality content in a question-answering portal | Yahoo! Answers | English | √ | √ |  | √ | C4.5, SVM |
Koppel et al. (2009) | Classification | Comparing methods and features on authorship attribution problems that represent the range of classical attribution problems | Blog | English | √ | √ | √ | √ | NB, C4.5, Winnow, Bayesian regression, SVM |
Huang et al. (2010) | Classification | Evaluating the effectiveness of user-generated text data in online video classification | Video-sharing Web site | English | √ | √ |  | √ | NB, C4.5, SVM |
Zhang et al. (2011) | Classification | Evaluating writing style features and classification techniques for online gender classification | Web forum | English | √ | √ | √ | √ | SVM |
Benjamin and Chen (2012) | Classification | Investigating the relationship between hacker posting behaviors and reputation to identify potential cues for determining key actors | Hacker communities | English |  |  | √ |  | Regression analysis |
Iqbal et al. (2013) | Authorship identification, classification | Studying three typical authorship analysis problems encountered by cybercrime investigators: authorship identification with large training samples, authorship identification with small training samples, and authorship characterization for gender and location | Blog | English | √ | √ | √ | √ | Ensemble of Nested Dichotomies, C4.5, RBF Network, NB, BayesNet |
Jiang et al. (2014) | Similarity detection | Using online writing style analysis to segment forum participants into stakeholder groups and to partition their messages into time periods around major firm events, in order to examine how important stakeholders evolve over time | Web forum | English | √ | √ | √ | √ | EM clustering |
This study | Classification | Using writing style features as objective features to estimate user reputation classes in social media | Web forum | Korean, English | √ | √ | √ | √ | C4.5, NN, SVM, NB, RS-C4.5, RS-NN, RS-SVM, RS-NB |
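
The four feature columns group writing style features into lexical, syntactic, structural, and content-specific categories. As a rough illustration of what each category can contain, the Python sketch below computes a few example features per category from a single message. The specific features, the short `FUNCTION_WORDS` list, the caller-supplied `keywords`, and the function name `writing_style_features` are all illustrative assumptions. They are not the feature set used by any of the studies listed above or by this study.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny function-word list (illustration only);
# published studies typically use lists of hundreds of function words.
FUNCTION_WORDS = ("the", "of", "and", "to", "a", "in", "is", "that")


def writing_style_features(message: str, keywords: tuple = ()) -> dict:
    """Return an illustrative feature dict covering the four categories.

    Keys are prefixed lex_/syn_/str_/con_ for lexical, syntactic,
    structural, and content-specific features respectively.
    """
    words = re.findall(r"[A-Za-z']+", message)
    lower = [w.lower() for w in words]
    n_words = len(words) or 1   # avoid division by zero on empty input
    n_chars = len(message) or 1
    counts = Counter(lower)

    features = {}
    # Lexical: character- and word-level statistics.
    features["lex_total_chars"] = float(len(message))
    features["lex_total_words"] = float(len(words))
    features["lex_avg_word_len"] = sum(len(w) for w in words) / n_words
    features["lex_type_token_ratio"] = len(counts) / n_words
    features["lex_uppercase_ratio"] = sum(c.isupper() for c in message) / n_chars
    features["lex_digit_ratio"] = sum(c.isdigit() for c in message) / n_chars

    # Syntactic: punctuation frequencies and function-word frequencies.
    for p in ",.!?;:":
        features[f"syn_punct_{p}"] = message.count(p) / n_chars
    for fw in FUNCTION_WORDS:
        features[f"syn_fw_{fw}"] = counts[fw] / n_words

    # Structural: how the message is laid out.
    lines = message.splitlines() or [""]
    paragraphs = [p for p in re.split(r"\n\s*\n", message) if p.strip()]
    sentences = [s for s in re.split(r"[.!?]+", message) if s.strip()]
    features["str_num_lines"] = float(len(lines))
    features["str_num_paragraphs"] = float(len(paragraphs))
    features["str_num_sentences"] = float(len(sentences))
    features["str_has_url"] = 1.0 if re.search(r"https?://", message) else 0.0

    # Content-specific: relative frequency of caller-supplied topic keywords.
    for kw in keywords:
        features[f"con_kw_{kw.lower()}"] = counts[kw.lower()] / n_words

    return features


if __name__ == "__main__":
    msg = "The price is good. I will buy it!\n\nThanks, seller."
    for name, value in writing_style_features(msg, ("price", "seller")).items():
        print(f"{name}: {value:.3f}")
```

In a setup like the studies above, one such vector per message (or per author) would then be passed to classifiers such as those in the Techniques column (e.g., C4.5, SVM, NB). Real systems usually normalize the features and use far larger function-word and keyword lists.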