An alternative data filling approach for prediction of missing data in soft sets (ADFIS)

Sadiq Khan, Muhammad; Al-Garadi, Mohammed Ali; Wahab, Ainuddin Wahid Abdul; Herawan, Tutut

doi:10.1186/s40064-016-2797-x

Research
Open access
Published: 15 August 2016

Muhammad Sadiq Khan ORCID: orcid.org/0000-0001-8428-9208¹,
Mohammed Ali Al-Garadi¹,
Ainuddin Wahid Abdul Wahab¹ &
…
Tutut Herawan¹

SpringerPlus volume 5, Article number: 1348 (2016)

Abstract

Soft set theory is a mathematical approach for dealing with uncertain data. A standard soft set can be represented as a Boolean-valued information system, and hence it has been used in hundreds of useful applications. However, these applications become worthless if the Boolean information system contains missing data due to error, security restrictions or mishandling. Few studies have focused on handling partially incomplete soft sets, and none of them achieves high accuracy in predicting missing data. The data filling approach for incomplete soft sets (DFIS) has been shown to perform best among all previous approaches; nevertheless, our review of DFIS finds that accuracy remains its main problem. In this paper, we propose an alternative data filling approach for prediction of missing data in soft sets, namely ADFIS. The novelty of ADFIS is that, unlike the previous approach, which relies on probability, it focuses more on the reliability of association among parameters in a soft set. Experimental results on a small data set, four UCI benchmark data sets and the causality workbench lung cancer (LUCAP2) data set show that ADFIS achieves higher accuracy than DFIS.

Background

Soft set theory, proposed by Molodtsov, is a mathematical model for dealing with vague and uncertain data (Molodtsov 1999). Compared with existing theories for handling vague data, such as fuzzy sets, rough sets, vague sets and statistical approaches, it stands out because of its adequate parameterization. Research on soft set theory, both theoretical and practical, has attracted much attention, especially in the field of decision making. The first attempt at soft set decision making was made by Maji et al. (2002), who presented the first application of soft sets to decision making by representing a soft set as a Boolean table and defining its reduct set. Their work on reducts was improved by Chen et al., and further by Kong et al. and then Ma et al., for decision making with sub-optimal choices and with a simplified approach, respectively (Chen et al. 2005; Kong et al. 2008; Ma et al. 2011). In parallel with these developments, researchers have used soft sets to handle uncertain data in everyday problems and applied them in a variety of useful applications (Cagman and Enginoglu 2012; Cagman et al. 2011; Çelik and Yamak 2013; Herawan and Deris 2011; Jun et al. 2009; Jun and Park 2008; Kalaichelvi and Malini 2011; Kalayathankal and Singh 2010; Tanay and Kandemir 2011; Xiao et al. 2009; Yuksel et al. 2013).

In some applications, however, researchers have faced incomplete soft sets with partially missing values. Data in soft sets and related sets can go missing for many reasons, such as improper entry, virus attacks, security restrictions and errors during data transfer. An incomplete soft set can no longer be used in an application; if it is used anyway, it may yield excessively large, very small, unexpected or misleading results. Such results, especially a wrong decision, can cause a huge loss to an individual or an organization. To cope with this situation, Zou et al. presented a weighted-average technique for calculating decision values in incomplete soft sets and an average-probability technique for predicting missing values in incomplete fuzzy soft sets (Zou and Xiao 2008). Qin et al. proposed DFIS, showing that prediction in an incomplete soft set is more reliable and accurate when it is based on the association between parameters; they used simple probability for cases with zero or weak association (Qin et al. 2012). Rose et al. also contributed to the completion of incomplete soft sets using parity bits and aggregate values (Mohd Rose et al. 2011; Rose et al. 2011). Subsequently, Kong et al. (2014) improved the approach of Zou et al. (Zou and Xiao 2008) by presenting an equivalent probability technique that has lower complexity and determines the actual missing data rather than only the decision values. However, the approach of Kong et al. still suffers from inherited shortcomings and has lower accuracy than DFIS.

In this paper, we compare all existing approaches in terms of accuracy and computational complexity and find DFIS to be the most suitable among them for predicting missing values in an incomplete soft set. We then propose an alternative data filling approach for prediction of missing data in soft sets. In summary, the contributions of this work are as follows:

(a) We propose an alternative data filling approach for prediction of missing data in soft sets (ADFIS). The novelty of ADFIS is that, unlike the previous approach, which relies on probability, it focuses more on the reliability of association between parameters.
(b) In contrast to DFIS, we revise the procedure for calculating association so that the maximum possible number of unknowns is predicted through association.
(c) To validate our work, we perform extensive experiments on four UCI benchmark data sets and the causality workbench lung cancer (LUCAP2) data set to show the performance of ADFIS.
(d) We compare the results with the baseline approaches reported in the literature.

Soft set

Let U be an initial non-empty universal set and E be a set of parameters related to U. According to Molodtsov (1999), a pair (F, E) is called a soft set over U if and only if F is a mapping from E into the set of all subsets of U. The following example illustrates a soft set.

Example 1

Suppose U = {h₁, h₂, h₃, h₄, h₅} is a set of houses and E = {e₁, e₂, e₃, e₄, e₅, e₆} is the set of parameters describing each house. The members of E represent cheap, new, wooden, expensive, old and beautiful houses, respectively. Suppose the cheap houses are h₁, h₃, h₅, the new houses are h₁, h₂, h₃, h₄, the wooden houses are h₂, h₃, h₄, the expensive houses are h₂, h₄, the only old house is h₅ and the beautiful houses are h₁, h₂, h₄, h₅. The soft set (F, E) describing the attractiveness of the houses is then given by

$$\begin{aligned} \left( F,E \right) & = \left\{ \left( e_{1}, \left\{ h_{1}, h_{3}, h_{5} \right\} \right),\, \left( e_{2}, \left\{ h_{1}, h_{2}, h_{3}, h_{4} \right\} \right),\, \left( e_{3}, \left\{ h_{2}, h_{3}, h_{4} \right\} \right), \right. \\ & \quad \left. \left( e_{4}, \left\{ h_{2}, h_{4} \right\} \right),\, \left( e_{5}, \left\{ h_{5} \right\} \right),\, \left( e_{6}, \left\{ h_{1}, h_{2}, h_{4}, h_{5} \right\} \right) \right\} \end{aligned}$$
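As an illustration only, the soft set of Example 1 can be sketched in Python as a mapping from each parameter to a subset of U. The names `U`, `E`, `F` and the helper `is_soft_set` are chosen here for illustration and are not part of the original paper.

```python
# Universe of houses from Example 1.
U = {"h1", "h2", "h3", "h4", "h5"}

# Parameters from Example 1, with their meanings as labels.
E = {
    "e1": "cheap",
    "e2": "new",
    "e3": "wooden",
    "e4": "expensive",
    "e5": "old",
    "e6": "beautiful",
}

# F maps each parameter to the subset of U that satisfies it.
F = {
    "e1": {"h1", "h3", "h5"},
    "e2": {"h1", "h2", "h3", "h4"},
    "e3": {"h2", "h3", "h4"},
    "e4": {"h2", "h4"},
    "e5": {"h5"},
    "e6": {"h1", "h2", "h4", "h5"},
}


def is_soft_set(F, E, U):
    """Hypothetical helper: check that F is defined on exactly the
    parameters in E and that every image is a subset of U."""
    return set(F) == set(E) and all(subset <= U for subset in F.values())


print(is_soft_set(F, E, U))
```

Under this representation, asking which houses are beautiful is a dictionary lookup, `F["e6"]`.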

Representation of soft set in tabular form

Let U be a finite non-empty set of objects, AT a non-empty finite set of attributes, $V = \cup V_{r}$ where $V_{r}$ is the value domain of attribute r, and f an information function given by $f:U \times AT \to V_{r}$. Then the quadruple S = (U, AT, $V_{r}$, f) is called an information system (Ma et al. 2011). The soft set (F, E) in Example 1 is represented in Table 1 as a Boolean-valued information system.

Table 1 Tabular representation of a soft set (F, E) in a Boolean-valued information system and its decision value
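The Boolean-valued representation can be derived mechanically from Example 1. The following minimal sketch assumes that the decision value of an object is the sum of its row entries, which is a common convention for choice values but is stated here as an assumption; the function names `to_boolean_table` and `decision_values` are illustrative and not taken from the paper.

```python
# Objects and parameters in a fixed order so that table columns are stable.
U = ["h1", "h2", "h3", "h4", "h5"]
E = ["e1", "e2", "e3", "e4", "e5", "e6"]

# The soft set (F, E) of Example 1.
F = {
    "e1": {"h1", "h3", "h5"},
    "e2": {"h1", "h2", "h3", "h4"},
    "e3": {"h2", "h3", "h4"},
    "e4": {"h2", "h4"},
    "e5": {"h5"},
    "e6": {"h1", "h2", "h4", "h5"},
}


def to_boolean_table(F, U, E):
    """Entry is 1 if the object belongs to F(e), otherwise 0."""
    return {h: [1 if h in F[e] else 0 for e in E] for h in U}


def decision_values(table):
    """Assumed convention: decision value = sum of the row entries."""
    return {h: sum(row) for h, row in table.items()}


table = to_boolean_table(F, U, E)
d = decision_values(table)

# Print the table with one row per house and the decision value last.
print("U ", *E, "d")
for h in U:
    print(h, *table[h], d[h])
```

Under the row-sum assumption, h₂ and h₄ obtain the highest decision value in this example.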
