
<p>Graduation Project (Thesis): Foreign-Language Translation</p>
<p>Design of a WWW Recommendation System Based on Data Mining Technology</p>
<p><b>English Original</b></p>
<p>Data Mining: What is Data Mining?</p>
<p><b>Overview</b></p>
<p>Generally, data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information: information that can be used to increase revenue, cut costs, or both. Data mining software is one of a number of analytical tools for analyzing data. It allows users to analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified. Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases.</p>
<p>Continuous Innovation</p>
<p>Although data mining is a relatively new term, the technology is not.

Companies have used powerful computers to sift through volumes of supermarket scanner data and analyze market research reports for years. However, continuous innovations in computer processing power, disk storage, and statistical software are dramatically increasing the accuracy of analysis while driving down the cost.</p>
<p><b>Example</b></p>
<p>For example, one Midwest grocery chain used the data mining capacity of Oracle software to analyze local buying patterns. They discovered that when men bought diapers on Thursdays and Saturdays, they also tended to buy beer. Further analysis showed that these shoppers typically did their weekly grocery shopping on Saturdays. On Thursdays, however, they only bought a few items. The retailer concluded that they purchased the beer to have it available for the upcoming weekend. The grocery chain could use this newly discovered information in various ways to increase revenue. For example, it could place the beer display next to the diaper display, and it could make sure beer and diapers were sold at full price on Thursdays.</p>
<p>Data, Information, and Knowledge</p>
<p><b>Data</b></p>
<p>Data are any facts, numbers, or text that can be processed by a computer. Today,

organizations are accumulating vast and growing amounts of data in different formats and different databases. This includes:</p>
<p>• operational or transactional data, such as sales, cost, inventory, payroll, and accounting</p>
<p>• nonoperational data, such as industry sales, forecast data, and macroeconomic data</p>
<p>• metadata: data about the data itself, such as logical database design or data dictionary definitions</p>
<p>Information</p>
<p>The patterns, associations, or relationships among all this data can provide information. For example, analysis of retail point-of-sale transaction data can yield information on which products are selling and when.</p>
<p><b>Knowledge</b></p>
<p>Information can be converted into knowledge about historical patterns and future trends. For example, summary information on retail supermarket sales can be analyzed in light of promotional efforts to provide knowledge of consumer buying behavior. Thus, a manufacturer or retailer could determine which items are most susceptible to promotional efforts.</p>
<p>Data Warehouses</p>
<p>Dramatic advances in data capture, processing power, data transmission, and storage capabilities are enabling organizations to integrate their various databases into data warehouses. Data warehousing is defined as a process of centralized data management and retrieval. Data warehousing, like data mining, is a relatively new term, although the concept itself has been around for years. Data warehousing represents an ideal vision of maintaining a central repository of all organizational data. Centralization of data is needed to maximize user access and analysis. Dramatic technological advances are making this vision a reality for many companies. Equally dramatic advances in data analysis software are allowing users to access this data freely. The data analysis software is what supports data mining.</p>
<p>

What can data mining do?</p>
<p>Data mining is primarily used today by companies with a strong consumer focus: retail, financial, communication, and marketing organizations. It enables these companies to determine relationships among "internal" factors such as price, product positioning, or staff skills, and "external" factors such as economic indicators, competition, and customer demographics. It also enables them to determine the impact on sales, customer satisfaction, and corporate profits. Finally, it enables them to "drill down" into summary information to view detailed transactional data.</p>
<p>With data mining, a retailer could use point-of-sale records of customer purchases to send targeted promotions based on an individual's purchase history. By mining demographic data from comment or warranty cards, the retailer could develop products and promotions to appeal to specific customer segments.</p>
<p>For example, Blockbuster Entertainment mines its video rental history database to recommend rentals to individual customers. American Express can suggest products to its cardholders based on analysis of their monthly expenditures.</p>
<p>WalMart is pioneering massive data mining to transform its supplier relationships. WalMart captures point-of-sale transactions from over 2,900 stores in 6 countries and continuously transmits this data to its massive 7.5 terabyte Teradata data warehouse. WalMart allows more than 3,500 suppliers to access data on their products and perform data analyses.

These suppliers use this data to identify customer buying patterns at the store display level. They use this information to manage local store inventory and identify new merchandising opportunities. In 1995, WalMart computers processed over 1 million complex data queries.</p>
<p>The National Basketball Association (NBA) is exploring a data mining application that can be used in conjunction with image recordings of basketball games. The Advanced Scout software analyzes the movements of players to help coaches orchestrate plays and strategies. For example, an analysis of the play-by-play sheet of the game played between the New York Knicks and the Cleveland Cavaliers on January 6, 1995 reveals that when Mark Price played the Guard position, John Williams attempted four jump shots and made each one! Advanced Scout not only finds this pattern, but explains that it is interesting because it differs considerably from the average shooting percentage of 49.30% for the Cavaliers during that game.</p>
<p>By using the NBA universal clock, a coach can automatically bring up the video clips showing each of the jump shots attempted by Williams with Price on the floor, without needing to comb through hours of video footage. Those clips show a very successful pick-and-roll play in which Price draws the Knicks' defense and then finds Williams for an open jump shot.</p>
<p>How does data mining work?</p>
<p>While large-scale information technology has been evolving separate transaction and analytical systems, data mining provides the link between the two. Data mining software analyzes relationships and patterns in stored transaction data based on open-ended user queries. Several types of analytical software are available: statistical, machine learning, and neural networks. Generally, any of four types of relationships are sought:</p>
<p>• Classes: Stored data is used to locate data in predetermined groups. For example, a restaurant chain could mine customer purchase data to determine when customers visit and what they typically order. This information could be used to increase traffic by having daily specials.</p>
<p>• Clusters: Data items are grouped according to logical relationships or consumer preferences. For example, data can be mined to identify market segments or consumer affinities.</p>
<p>• Associations: Data can be mined to identify associations. The beer-diaper example is an example of associative mining.</p>
<p>• Sequential patterns: Data is mined to anticipate behavior patterns and trends. For example, an outdoor equipment retailer could predict the likelihood of a backpack being purchased based on a consumer's purchase of sleeping bags and hiking shoes.</p>
<p>Data mining consists of five major elements:</p>
<p>• Extract, transform, and load transaction data onto the data warehouse system.</p>
<p>• Store and manage the data in a multidimensional database system.</p>
<p>• Provide data access to business analysts and information technology professionals.</p>

<p>• Analyze the data by application software.</p>
<p>• Present the data in a useful format, such as a graph or table.</p>
<p>Different levels of analysis are available:</p>
<p>• Artificial neural networks: Non-linear predictive models that learn through training and resemble biological neural networks in structure.</p>
<p>• Genetic algorithms: Optimization techniques that use processes such as genetic combination, mutation, and natural selection in a design based on the concepts of natural evolution.</p>
<p>• Decision trees: Tree-shaped structures that represent sets of decisions. These decisions generate rules for the classification of a dataset. Specific decision tree methods include Classification and Regression Trees (CART) and Chi Square Automatic Interaction Detection (CHAID). CART and CHAID are decision tree techniques used for classification of a dataset. They provide a set of rules that you can apply to a new (unclassified) dataset to predict which records will have a given outcome. CART segments a dataset by creating 2-way splits, while CHAID segments using chi square tests to create multi-way splits. CART typically requires less data preparation than CHAID.</p>
<p>• Nearest neighbor method: A technique that classifies each record in a dataset based on a combination of the classes of the k record(s) most similar to it in a historical dataset (where k ≥ 1). Sometimes called the k-nearest neighbor technique.</p>
<p>• Rule induction: The extraction of useful if-then rules from data based on statistical significance.</p>
<p>• Data visualization: The visual interpretation of complex relationships in multidimensional data. Graphics tools are used to illustrate data relationships.</p>
<p>What technological infrastructure is required?</p>
<p>Today, data mining applications are available on systems of all sizes for mainframe, client/server, and PC platforms. System prices range from several thousand dollars for the smallest applications up to $1 million a terabyte for the largest. Enterprise-wide applications generally range in size from 10 gigabytes to over 11 terabytes. NCR has the capacity to deliver applications exceeding 100 terabytes. There are two critical technological drivers:</p>
<p>• Size of the database: the more data being processed and maintained, the more powerful the system required.</p>
<p>• Query complexity: the more complex the queries and the greater the number of queries being processed, the more powerful the system required.</p>
<p>Relational database storage and management technology is adequate for many data mining applications less

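than 50 gigabytes. However, this infrastructure needs to be significantly enhanced to support larger applications. Some vendors have added extensive indexing capabilities to improve query performance. Others use new hardware architectures such as Massively Parallel Processors (MPP) to achieve order-of-magnitude improvements in query time. For example, MPP systems from NCR link hundreds of hig</p>
<p><b>Chinese Part</b></p>
<p>Data Mining: What is Data Mining?</p>
<p><b>Overview</b></p>
<p>Generally, data mining (sometimes called data or knowledge discovery) is the IT process of analyzing data from different perspectives and summarizing it into useful information that can be used to increase revenue, reduce costs, or both. Data mining software is one of the tools used to analyze data. It allows users to analyze data from many different dimensions or angles, and to categorize, discover, and summarize the relationships within it. Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases.</p>
<p><b>Continuous Innovation</b></p>
<p>Although data mining is a relatively new term, the technology is not. In the past, companies mainly used powerful computers to sift through volumes of supermarket scanner data and years of market research reports. Now, continuous innovation in computer processing power, disk storage, and statistical software is markedly improving the accuracy of analysis while lowering its cost.</p>
<p><b>Example</b></p>
<p>For example, one Midwest grocery chain used the data mining capacity of Oracle software to analyze local buying patterns. They discovered that when men bought diapers on Thursdays and Saturdays, they also tended to buy beer. Further analysis showed that these shoppers typically did their weekly grocery shopping on Saturdays. On Thursdays, however, they bought only a few items.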

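The retailer concluded that they purchased the beer to have it available for the upcoming weekend. The grocery chain could use this newly discovered information about shopping patterns to increase revenue. For example, it could place the beer and diaper displays together. It could also make sure beer and diapers were sold at full price on Thursdays.</p>
<p><b>Data, Information, and Knowledge</b></p>
<p><b>Data</b></p>
<p>Data are any facts, numbers, or text that can be processed by a computer. Today, organizations are accumulating vast and growing amounts of data in different formats and different databases. This includes:</p>
<p>• operational or transactional data, such as sales, cost, inventory, payroll, and accounting</p>
<p>• nonoperational data, such as industry sales, forecast data, and macroeconomic data</p>
<p>• metadata: data about the data itself, such as logical database design or data dictionary definitions</p>
<p><b>Information</b></p>
<p>The patterns, associations, or relationships among all this data can provide information. For example, analysis of retail point-of-sale transaction data can yield information on which products are selling and when.</p>
<p><b>Knowledge</b></p>
<p>Information can be converted into knowledge about historical patterns and future trends. For example,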

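summary information on retail supermarket sales can be analyzed in light of promotional efforts to provide knowledge of consumer buying behavior. Thus, a manufacturer or retailer could determine which items are most susceptible to promotional efforts.</p>
<p><b>Data Warehouses</b></p>
<p>Rapid advances in data capture, processing power, data transmission, and storage capabilities are enabling organizations to integrate their various databases into data warehouses. Data warehousing is defined as a process of centralized data management and retrieval. Data warehousing, like data mining, is a relatively new term, although the concept itself has been around for years. Data warehousing represents an ideal vision of maintaining a central repository of all organizational data. Centralization of data is needed to maximize user access and analysis. Dramatic technological advances are making this vision a reality for many companies. Equally dramatic advances in data analysis software are allowing users to access this data freely. The data analysis software is what supports data mining.</p>
<p>What can data mining do?</p>
<p>Data mining is primarily used today by companies with a strong consumer focus: retail, financial, communication, and marketing organizations. It enables these companies to determine relationships among "internal" factors such as price, product positioning, or staff skills, and "external" factors such as economic indicators, competition, and customer demographics. It also enables them to determine the impact on sales, customer satisfaction, and corporate profits. Finally, it enables them to "drill down" into summary information to view detailed transactional data.</p>
<p>With data mining, a retailer could use point-of-sale records of customer purchases to send targeted promotions based on an individual's purchase history. By mining demographic data from comment or warranty cards, the retailer could develop products and promotions to appeal to specific customer segments.</p>
<p>For example, Blockbuster Entertainment mines its video rental history database to recommend rentals to individual customers. American Express can suggest products to its cardholders based on analysis of their monthly expenditures.</p>
<p>WalMart is pioneering massive data mining to transform its supplier relationships. WalMart captures point-of-sale transactions from 2,900 stores in 6 countries and continuously transmits this data to its massive 7.5 TB Teradata data warehouse. WalMart allows more than 3,500 suppliers to access data on their products and perform data analyses. These suppliers use this data to identify customer buying patterns at the store display level. They use this information to manage local store inventory and identify new merchandising opportunities. In 1995, WalMart computers processed over 1 million complex data queries.</p>
<p>The National Basketball Association (NBA) is exploring a data mining application that can be used in conjunction with image recordings of basketball games. The Advanced Scout software analyzes the movements of players to help coaches orchestrate plays and strategies. For example, an analysis of the play-by-play sheet of the game played between the New York Knicks and the Cleveland Cavaliers on January 6, 1995 reveals that when Mark Price played the Guard position, John Williams attempted four jump shots and made each one! Advanced Scout not only finds this pattern, but explains that it is interesting because it differs considerably from the average shooting percentage of 49.30% for the Cavaliers during that game.</p>
<p>By using the NBA universal clock, a coach can automatically bring up the video clips showing each of the jump shots attempted by Williams with Price on the floor, without needing to comb through hours of video footage. Those clips show a very successful pick-and-roll play in which Price draws the Knicks' defense and then finds Williams for an open jump shot.</p>
<p>How does data mining work?</p>
<p>While large-scale information technology has been evolving separate transaction and analytical systems, data mining provides the link between the two. Data mining software analyzes relationships and patterns in stored transaction data based on open-ended user queries. Several types of analytical software are available: statistical, machine learning, and neural networks. Generally, any of four types of relationships are sought:</p>
<p>Classes: Stored data is used to locate data in predetermined groups. For example, a restaurant chain could mine customer purchase data to determine when customers visit and what they typically order. This information could be used to increase traffic by having daily specials.</p>
<p>Clusters: Data items are grouped according to logical relationships or consumer preferences. For example, data can be mined to identify market segments or consumer affinities.</p>
<p>Associations: Data can be mined to identify associations. The beer-diaper example is an example of associative mining.</p>
<p>Sequential patterns: Data is mined to anticipate behavior patterns and trends. For example, an outdoor equipment retailer could predict the likelihood of a backpack being purchased based on a consumer's purchase of sleeping bags and hiking shoes.</p>
<p>Data mining consists of five major elements:</p>
<p>Extract, transform, and load transaction data onto the data warehouse system.</p>
<p>Store and manage the data in a multidimensional database system.</p>
<p>Provide data access to business analysts and information technology professionals.</p>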

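<p>Analyze the data by application software.</p>
<p>Present the data in a useful format, such as a graph or table.</p>
<p><b>Different levels of analysis are available:</b></p>
<p>Artificial neural networks: Non-linear predictive models that learn through training and resemble biological neural networks in structure.</p>
<p>Genetic algorithms: Optimization techniques that use processes such as genetic combination, mutation, and natural selection in a design based on the concepts of natural evolution.</p>
<p>Decision trees: Tree-shaped structures that represent sets of decisions. These decisions generate rules for the classification of a dataset. Specific decision tree methods include Classification and Regression Trees (CART) and Chi Square Automatic Interaction Detection (CHAID). CART and CHAID are decision tree techniques used for classification of a dataset. They provide a set of rules that you can apply to a new (unclassified) dataset to predict which records will have a given outcome. CART segments a dataset by creating 2-way splits, while CHAID segments using chi square tests to create multi-way splits. CART typically requires less data preparation than CHAID.</p>
<p>Nearest neighbor method: A technique that classifies each record in a dataset based on a combination of the classes of the k record(s) most similar to it in a historical dataset (where k ≥ 1). Sometimes called the k-nearest neighbor technique.</p>
<p>Rule induction: The extraction of useful if-then rules from data based on statistical significance.</p>
<p>Data visualization: The visual interpretation of complex relationships in multidimensional data. Graphics tools are used to illustrate data relationships.</p>
<p>What technological infrastructure is required?</p>
<p>Today, data mining applications are available on systems of all sizes for mainframe, client/server, and PC platforms. System prices range from several thousand dollars for the smallest applications up to $1 million a terabyte for the largest. Enterprise-wide applications generally range in size from 10 gigabytes to over 11 terabytes. NCR has the capacity to deliver applications exceeding 100 terabytes. There are two critical technological drivers:</p>
<p>Size of the database: the more data being processed and maintained, the more powerful the system required.</p>
<p>Query complexity: the more complex the queries and the greater the number of queries being processed, the more powerful the system required.</p>
<p>Relational database storage and management technology is adequate for many data mining applications less than 50 gigabytes. However, this infrastructure needs to be significantly enhanced to support larger applications. Some vendors have added extensive indexing capabilities to improve query performance. Others use new hardware architectures such as Massively Parallel Processors (MPP) to achieve order-of-magnitude improvements in</p>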
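<p>The "Associations" relationship in the text (the beer-and-diapers example) is usually quantified with support, confidence, and lift. The following is a minimal sketch of those three measures, not the method used by the grocery chain or by Oracle software. The transactions and the function names are hypothetical and chosen only for illustration.</p>

```python
# Hypothetical toy market baskets; this is NOT data from the article.
transactions = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"milk", "bread"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Estimated probability of `consequent` given `antecedent`."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)


def lift(antecedent, consequent, baskets):
    """Confidence relative to the baseline frequency of `consequent`.

    A value above 1 suggests the items co-occur more often than chance.
    """
    return confidence(antecedent, consequent, baskets) / support(consequent, baskets)


if __name__ == "__main__":
    print("support  :", support({"diapers", "beer"}, transactions))
    print("confidence:", confidence({"diapers"}, {"beer"}, transactions))
    print("lift     :", lift({"diapers"}, {"beer"}, transactions))
```

<p>In this toy data, two of the three diaper baskets also contain beer, so the rule "diapers implies beer" has a confidence of two thirds. Real association mining tools add a search over candidate itemsets, which this sketch leaves out.</p>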

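<p>The nearest neighbor method described under the levels of analysis classifies a record from the classes of the k most similar historical records (k ≥ 1). The sketch below assumes Euclidean distance and a simple majority vote as the "combination" of classes. Both choices, along with the feature values and labels, are assumptions for illustration and are not specified in the text.</p>

```python
import math
from collections import Counter


def knn_classify(record, history, k=3):
    """Classify `record` by majority vote among its k nearest neighbors.

    `history` is a list of (features, label) pairs, where `features` is a
    tuple of numbers with the same length as `record`.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    # Sort historical records by Euclidean distance to the new record.
    nearest = sorted(history, key=lambda row: math.dist(record, row[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical historical dataset: (visits per month, average basket size).
history = [
    ((1.0, 1.0), "occasional"),
    ((1.2, 0.8), "occasional"),
    ((0.9, 1.1), "occasional"),
    ((5.0, 5.2), "frequent"),
    ((5.1, 4.9), "frequent"),
    ((4.8, 5.0), "frequent"),
]

if __name__ == "__main__":
    print(knn_classify((1.1, 0.9), history, k=3))
    print(knn_classify((5.0, 5.0), history, k=3))
```

<p>With k = 1 the method reduces to copying the label of the single closest record. Larger k values smooth out noise at the cost of blurring boundaries between classes.</p>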