
Procedia Computer Science 96 (2016) 1681–1690
Available online at www.sciencedirect.com
doi: 10.1016/j.procs.2016.08.216
1877-0509 © 2016 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). Peer-review under responsibility of KES International.

20th International Conference on Knowledge Based and Intelligent Information and Engineering Systems, KES2016, 5-7 September 2016, York, United Kingdom

Model selection for financial statement analysis: Variable selection with data mining technique

Ken Ishibashi*, Takuya Iwasaki, Shota Otomasa and Katsutoshi Yada
Data Science Laboratory, Kansai University, 3-3-35 Yamate, Suita, Osaka 564-8680, Japan

Abstract

The purpose of this study is to verify the effectiveness of a data-driven approach to financial statement analysis. In the area of accounting, variable selection for constructing models that predict a firm's earnings from financial statement data has been addressed from the perspective of corporate valuation theory and the like, but it has not been sufficiently verified with data mining techniques. In this paper, an attempt was made to verify the applicability of variable selection based on recent data mining techniques to the construction of an earnings prediction model. The analysis results suggest that a method which considers both the interaction among variables and the redundancy of the model can be effective for financial statement data.

Keywords: Financial statement analysis; earnings prediction model; model selection; variable selection; data mining

1. Introduction

Recent advances in information and communication technology are dramatically improving computational speeds. Under these circumstances, researchers have turned to the big data accumulated in various areas, and data mining techniques play an important role in such data-driven analysis and modeling. A variety of data mining methods have been developed, and software such as SPSS and Weka makes them easy to apply. For these applications, however, one generally needs to select a method appropriate to the data at hand. The purpose of this study is to verify the effectiveness of a data-driven approach to financial statement analysis. In the area of accounting, Ou and Penman (1989)1) addressed the construction of an earnings prediction

* Corresponding author. Tel.: +81-6-6368-1121. E-mail address: r108047@kansai-u.ac.jp

2.2. Relief

Relief is an instance-based attribute ranking scheme proposed by Kira and Rendell (1992)6) and later improved by Kononenko (1994)10). The method estimates each variable's importance for classification. For a given class, Relief decides a variable's importance by focusing on instances located around the class border. For each randomly drawn sample, two such instances are selected: a near-miss and a near-hit. The near-miss is the instance closest to the sample that belongs to a different class; the near-hit is the closest instance of the same class. Relief then scores a variable by how effectively it separates the sample from its near-miss. Existing research5) showed that this method is highly tolerant to noise but does not account for redundancy among variables. When Relief is applied to variable selection, the variables to adopt are generally decided by setting a threshold on their estimated ranks. In this study, the importance of variables is decided by 10-fold cross-validation, and variables whose "Merit" criterion for the classification is greater than 0 are adopted.
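The near-hit/near-miss update can be sketched as follows. This is a minimal illustration of the basic Relief weighting scheme as described above, not the study's implementation (which scores variables by cross-validated merit); the L1 distance, iteration count, and toy data are assumptions for the sketch.

```python
# Minimal sketch of the basic Relief weight update (after Kira & Rendell).
# For a randomly drawn sample, the nearest same-class instance (near-hit)
# and nearest other-class instance (near-miss) adjust each feature's weight.
import numpy as np

def relief(X, y, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        i = rng.integers(n)
        dist = np.abs(X - X[i]).sum(axis=1)  # L1 distance to the sampled instance
        dist[i] = np.inf                     # exclude the sample itself
        same = (y == y[i])
        hit = np.argmin(np.where(same, dist, np.inf))    # near-hit: closest same class
        miss = np.argmin(np.where(~same, dist, np.inf))  # near-miss: closest other class
        # features that separate the near-miss gain weight;
        # features that differ even from the near-hit lose weight
        w += np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
    return w / n_iter

# toy data: feature 0 determines the class, feature 1 is noise
X = np.array([[0.0, 0.3], [0.1, 0.9], [1.0, 0.2], [0.9, 0.8]])
y = np.array([0, 0, 1, 1])
w = relief(X, y, n_iter=200)
```

On this toy data the informative feature ends up with a clearly positive weight while the noise feature does not, which is the behaviour a rank threshold then exploits.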

2.3. Correlation-based feature selection

CFS is a method that evaluates subsets of variables rather than individual variables7). It searches for subsets whose variables are highly correlated with the class but have low inter-correlation with each other. CFS tends to be computationally cheap and to choose small subsets of variables, but it has difficulty finding good solutions when there are strong interactions among variables5). In this study, we use a greedy algorithm to search for the subset with the best CFS evaluation.

2.4. Consistency-based subset evaluation

CNS evaluates subsets of variables by their class consistency8). It searches for combinations of variables that divide the data into groups, each containing a strong single-class majority. The search therefore tends to be biased in favor of small variable subsets with high class consistency. Compared with CFS, CNS is useful when there are strong interactions among variables, but the resulting subsets tend to be larger5). In this study, CNS searches for subsets with a greedy algorithm, as in CFS.
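The class-consistency criterion itself is simple to state: project the data onto a candidate subset, and count the instances that fall outside the majority class of their group. A minimal sketch, with invented categorical data (the study applies this to discretized financial ratios):

```python
# Sketch of the class-consistency score used by consistency-based subset
# evaluation: 1 minus the fraction of instances that disagree with the
# majority class of their feature-value group.
from collections import Counter, defaultdict

def consistency(rows, labels, subset):
    groups = defaultdict(list)
    for row, lab in zip(rows, labels):
        key = tuple(row[f] for f in subset)   # projection onto the subset
        groups[key].append(lab)
    # inconsistencies: instances outside the majority class of their group
    incons = sum(len(g) - max(Counter(g).values()) for g in groups.values())
    return 1 - incons / len(rows)

rows = [("hi", "a"), ("hi", "b"), ("lo", "a"), ("lo", "b")]
labels = ["up", "up", "down", "down"]
print(consistency(rows, labels, [0]))   # feature 0 alone separates the classes: 1.0
print(consistency(rows, labels, [1]))   # feature 1 alone is inconsistent: 0.5
```

A greedy search then grows the subset while this score keeps improving, which is why it favors small subsets whose groups are nearly pure.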

2.5. C4.5 decision tree learner

C4.5 is a learning algorithm that constructs a decision tree by selecting, at each split, the variable that maximizes the mutual information for classification9). The method avoids over-fitting to the data through "branch pruning", which removes branches that carry little mutual information or classify few instances. For variable selection, the variables contained in the resulting tree are adopted as the subset. In this study, a decision tree is constructed from all of the training data, and branches that classify fewer than 50 instances are then removed by pruning. In this way, we obtain a subset whose size is comparable to CFS's subsets.
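The selection step after pruning amounts to walking the tree and collecting split variables, skipping any branch below the instance threshold. The sketch below hand-builds a tiny tree for illustration; the variable names ("roa", "leverage", "turnover") are hypothetical, not from the paper, and a real run would build the tree with C4.5 on the training data first.

```python
# Sketch of tree-based variable selection: after pruning, the variables that
# remain as split nodes form the selected subset. Branches classifying fewer
# than `min_instances` instances are skipped, mirroring the 50-instance rule.

def selected_variables(node, min_instances=50):
    """Collect split variables from a tree, skipping underpopulated branches."""
    if node is None or node["n_instances"] < min_instances or "variable" not in node:
        return set()          # leaf, pruned branch, or empty subtree
    keep = {node["variable"]}
    for child in node.get("children", []):
        keep |= selected_variables(child, min_instances)
    return keep

# hypothetical pruned tree over invented financial ratios
tree = {
    "variable": "roa", "n_instances": 500, "children": [
        {"variable": "leverage", "n_instances": 430, "children": [
            {"n_instances": 400},                        # leaf
            {"variable": "turnover", "n_instances": 30}  # pruned: classifies < 50
        ]},
        {"n_instances": 70},                             # leaf
    ],
}
print(sorted(selected_variables(tree)))  # ['leverage', 'roa']
```

Raising or lowering the instance threshold directly controls the subset size, which is how the study matches the C4.5 subset to the size of the CFS subsets.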

28、FS’s subsets. 2.6. Stepwise method In existing research, Ou and Penman (1989)1) constructed an earnings prediction model by using stepwise logistic regression. Stepwise method is a conventional method that sequentially

29、chooses variables to enhance evaluation criteria. In this method, the process of variable selection is very clear. However, because the effect of each variable is sequentially evaluated, this method is computationally

30、expensive and it is difficult to take account of the interaction among variables. In this study, we construct a logit model by using the stepwise forward selection method with all variables. In addition, an attempt is

31、 made to apply the same method as Ou and Penman (1989)1). The previous method constructs a logit model through three stages. In the first stage, logit models with each variable are constructed respectively, and then va
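The forward-selection loop described above can be sketched generically: add, at each step, the candidate that most improves the criterion, and stop when nothing improves it. The score function is left abstract here; in the study it would be an evaluation of the candidate logit model, and the variable names and scores below are invented for illustration.

```python
# Sketch of stepwise forward selection. `score` evaluates a variable subset
# (in the paper's setting, by fitting and scoring a logit model); here it is
# an abstract callable so the control flow stands on its own.

def forward_selection(variables, score):
    chosen, best = [], score([])
    while True:
        cands = [(score(chosen + [v]), v) for v in variables if v not in chosen]
        if not cands:
            return chosen
        s, v = max(cands)
        if s <= best:              # no remaining variable improves the criterion
            return chosen
        chosen.append(v)
        best = s

# illustrative score: each useful variable contributes once, extras add nothing
useful = {"accruals": 0.3, "roa": 0.5}
score = lambda vs: sum(useful.get(v, 0.0) for v in set(vs))
print(forward_selection(["roa", "accruals", "size"], score))  # ['roa', 'accruals']
```

The sequential structure is also why the text calls the method expensive and blind to interactions: each candidate is scored one model fit at a time, conditioned only on the variables already chosen.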
