Translation of foreign-language literature on FPGAs: a new packing, placement and routing tool for FPGA research

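<p><b>  Translation</b></p>
<p>  VPR: A New Packing, Placement and Routing Tool for FPGA Research</p>
<p>  Vaughn Betz and Jonathan Rose</p>
<p>  Department of Electrical and Computer Engineering, University of Toronto</p>
<p>  Toronto, ON, Canada M5S 3G4 {vaughn, jayar}@eecg.toronto.edu</p>
<p>  Abstract: We describe the capabilities of, and the algorithms used in, a new FPGA CAD tool, Versatile Place and Route (VPR). In terms of minimizing routing area, VPR outperforms all published FPGA place and route tools to which we can compare. Although the algorithms used are based on previously known approaches, we present several enhancements that improve run time and quality. We present placement and routing results on a new set of large circuits, so that future benchmark comparisons of FPGA place and route tools can be made on circuit sizes more typical of today's industrial designs. VPR is capable of targeting a broad range of FPGA architectures, and the source code is publicly available. It and the associated netlist translation / clustering tool VPACK have already been used in a number of research projects worldwide, and are useful for FPGA architecture research.</p>
<p><b>  1 Introduction</b></p>
<p>  In FPGA research, one must typically evaluate the utility of new architectural features experimentally. That is, benchmark circuits are technology mapped, placed and routed onto the FPGA architectures of interest, and measures of the architecture's quality, such as speed or area, can then readily be extracted. Accordingly, there is considerable need for flexible CAD tools that can target a wide variety of FPGA architectures efficiently, and hence allow fair comparisons of the architectures. This paper describes the Versatile Place and Route (VPR) tool, which has been designed to be flexible enough to allow the comparison of many FPGA architectures. VPR can perform placement and either global routing or combined global and detailed routing. It is publicly available from http://www.eecg.toronto.edu/~jayar/software.</p>
<p>  In order to make meaningful FPGA architecture comparisons, it is essential that the CAD tools used to map circuits into each architecture are of high quality. The routing phase of VPR outperforms all previously published FPGA routers for which standard benchmark results are available, and the combination of VPR's placer and router outperforms all published combinations of FPGA placement and routing tools. The organization of this paper is as follows.</p>
<p>  In Section 2 we describe some of the features of VPR and the range of FPGA architectures with which it may be used. In Sections 3 and 4 we describe the placement and routing algorithms.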

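In Section 5 we compare the number of tracks VPR needs to successfully route circuits with the number required by other published tools. In Section 6 we present our conclusions and outline some future enhancements to VPR.</p>
<p><b>  2 Overview of VPR</b></p>
<p>  Figure 1 outlines the VPR CAD flow. The inputs to VPR consist of a technology-mapped netlist and a text file describing the FPGA architecture. VPR can place the circuit, or a pre-existing placement can be read in. VPR can then perform either a global route or a combined global/detailed route of the placement. VPR's output consists of the placement and routing, as well as statistics useful in assessing the utility of an FPGA architecture, such as routed wirelength, track count, and maximum net length. Some of the architectural parameters that can be specified in the architecture description file are:</p>
<p>  • the number of logic block inputs and outputs,</p>
<p>  • the side(s) of the logic block from which each input and output is accessible,</p>
<p>  • the logical equivalence between various input and output pins (e.g. all LUT inputs are functionally equivalent),</p>
<p>  • the number of I/O pads that fit into one row or one column of the FPGA, and</p>
<p>  • the dimensions of the logic block array (e.g. 23 x 30 logic blocks).</p>
<p>  In addition, if global routing is to be performed, one can also specify:</p>
<p>  • the relative widths of horizontal and vertical channels, and</p>
<p>  • the relative widths of the channels in different regions of the FPGA.</p>
<p>  Finally, if combined global and detailed routing is to be performed, one also specifies:</p>
<p>  • the switch block [1] architecture (i.e. how the routing tracks are interconnected),</p>
<p>  • the number of tracks to which each logic block input pin connects (Fc [1]),</p>
<p>  • the Fc value for logic block outputs, and</p>
<p>  • the Fc value for I/O pads.</p>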
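<p>  The parameter groups listed above can be pictured as one record with required placement fields and optional routing fields. The following is only a minimal sketch under stated assumptions: the class name, field names, side labels and the `problems` helper are all hypothetical, and nothing here reflects VPR's actual architecture file syntax.</p>

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

SIDES = {"top", "bottom", "left", "right"}  # assumed side labels


@dataclass
class ArchDescription:
    """Hypothetical record of the parameters listed above (not VPR's format)."""
    num_inputs: int                          # logic block inputs
    num_outputs: int                         # logic block outputs
    pin_sides: Dict[str, Tuple[str, ...]]    # pin name -> sides it is accessible from
    equivalent_pins: List[Tuple[str, ...]]   # groups of logically equivalent pins
    io_pads_per_row_or_col: int
    array_size: Tuple[int, int]              # logic block array, e.g. (23, 30)
    # Used only when global routing is performed
    rel_width_horizontal: float = 1.0
    rel_width_vertical: float = 1.0
    # Used only when combined global and detailed routing is performed
    switch_block: Optional[str] = None
    fc_input: Optional[int] = None
    fc_output: Optional[int] = None
    fc_pad: Optional[int] = None

    def problems(self, detailed_routing: bool = False) -> List[str]:
        """Return a list of human-readable inconsistencies (empty if none)."""
        found = []
        if len(self.pin_sides) != self.num_inputs + self.num_outputs:
            found.append("pin_sides must list every input and output pin")
        for pin, sides in self.pin_sides.items():
            if not sides or not set(sides) <= SIDES:
                found.append(f"pin {pin} has invalid sides {sides}")
        for group in self.equivalent_pins:
            for pin in group:
                if pin not in self.pin_sides:
                    found.append(f"equivalent pin {pin} is not declared")
        if detailed_routing:
            for name in ("switch_block", "fc_input", "fc_output", "fc_pad"):
                if getattr(self, name) is None:
                    found.append(f"{name} is required for detailed routing")
        return found


# Example: a 4-input LUT block on a 23 x 30 array, described for global routing only.
arch = ArchDescription(
    num_inputs=4,
    num_outputs=1,
    pin_sides={"in0": ("top",), "in1": ("left",), "in2": ("bottom",),
               "in3": ("right",), "out": ("bottom", "right")},
    equivalent_pins=[("in0", "in1", "in2", "in3")],
    io_pads_per_row_or_col=2,
    array_size=(23, 30),
)
```

<p>  Keeping the detailed-routing fields optional mirrors the text: they only need values when combined global and detailed routing is requested, which is what `arch.problems(detailed_routing=True)` checks.</p>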

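<p>  The current architecture description format does not allow segments that span more than one logic block to be included in the routing architecture, but we are presently adding this feature. Adding new routing architecture features to VPR is relatively easy, since VPR uses the architecture description to create a routing resource graph. Every routing track and every pin in the architecture becomes a node in this graph, and the graph edges represent the allowable connections. The router, graphics visualization and statistics computation routines all work only with this routing resource graph, so adding new routing architecture features only involves changing the subroutines that build this graph. Although VPR was initially developed for island-style FPGAs [2, 3], it can also be used with row-based FPGAs [4]. VPR is not currently capable of targeting hierarchical FPGAs [5], although adding an appropriate placement cost function and the required routing resource graph building routines would allow it to target them. Finally, VPR's built-in graphics allow interactive visualization of the placement, the routing, the available routing resources and the possible ways of interconnecting the routing resources.</p>
<p>  2.1 The VPACK Logic Block Packer / Netlist Translator</p>
<p>  VPACK reads in a blif format netlist of a circuit that has been technology-mapped to LUTs and flip flops, packs the LUTs and flip flops into the desired FPGA logic block, and outputs a netlist in VPR's netlist format. VPACK can target a logic block consisting of one LUT and one flip flop, as shown in Figure 2, as this is a common FPGA logic element. VPACK is also capable of targeting logic blocks that contain several LUTs and several flip flops, with or without shared LUT inputs [6]. These “cluster-based” logic blocks are similar to those used in recent Altera FPGAs.</p>
<p><b>  3 Placement Algorithm</b></p>
<p>  VPR uses the simulated annealing algorithm [7] for placement. We have experimented with several different cost functions, and found that what we call a linear congestion cost function provides the best results in a reasonable computation time [8]. The functional form of this cost function is $$Cost = \sum_{n=1}^{N_{nets}} q(n)\left[\frac{bb_x(n)}{C_{av,x}(n)} + \frac{bb_y(n)}{C_{av,y}(n)}\right]$$ where the summation is over all the nets in the circuit. For each net, bbx and bby denote the horizontal and vertical spans of its bounding box, respectively. The q(n) factor compensates for the fact that the bounding box wire length model underestimates the wiring necessary to connect nets with more than three terminals, as suggested in [10]. Its value depends on the number of terminals of net n; q is 1 for nets with 3 or fewer terminals, and slowly increases to 2.79 for nets with 50 terminals. Cav,x(n) and Cav,y(n) are the average channel capacities (in tracks) in the x and y directions, respectively, over the bounding box of net n. This cost function penalizes placements which require more routing in areas of the FPGA that have narrower channels. All the results in this paper were obtained with FPGAs in which all channels have the same capacity. In this case Cav is a constant, and the linear congestion cost function reduces to a bounding box cost function. A good annealing schedule is essential for simulated annealing to obtain high-quality solutions in a reasonable computation time. We have developed a new annealing schedule which leads to very high-quality placements, and in which the annealing parameters automatically adjust to different cost functions and circuit sizes. We compute the initial temperature in a manner similar to [11</p>
<p>  Finally, it was shown in [12, 13] that it is desirable to keep Raccept near 0.44 for as long as possible. We accomplish this by using the value of Raccept to control a range limiter: only interchanges of blocks that are at most Dlimit units apart in the x and y directions are attempted. A small value of Dlimit increases Raccept by ensuring that only blocks which are close together are considered for swapping. These “local swaps” tend to result in relatively small changes in the placement cost, which increases their likelihood of acceptance. Initially, Dlimit is set to the size of the entire chip. Whenever the temperature is reduced, Dlimit is updated. As a result, Dlimit is the size of the entire chip for the first part of the anneal, shrinks gradually during the middle stages of the anneal, and is 1 for the low-temperature part of the anneal. Finally, the anneal is terminated when T < 0.005 * Cost / Nnets. The movement of a logic block always affects at least one net. When the temperature is only a small fraction of the average cost of a net, it is unlikely that any move that results in a cost increase will be accepted, so we terminate the anneal.</p>
<p><b>  4 Routing Algorithm</b></p>
<p>  VPR's router is based on the Pathfinder negotiated congestion algorithm [14, 8]. Basically, this algorithm initially routes each net by the shortest path it can find, regardless of any overuse of wiring segments or logic block pins that may result. One iteration of the router consists of sequentially ripping up and re-routing (by the lowest cost path found) every net in the circuit. The cost of using a routing resource is a function of the current overuse of that resource and any overuse that occurred in prior routing iterations. By gradually increasing the cost of oversubscribed routing resources, the algorithm forces nets with alternative routes to avoid using oversubscribed resources, leaving only the net that most needs a given resource behind. For the experimental results in this paper, we set the maximum number of router iterations to 45; if a circuit has not been successfully routed with a given number of tracks in 45 iterations, it is assumed to be unroutable at that channel width. To avoid overly circuitous routes and to save CPU time, we allow the routing of a net to go at most 3 channels outside the bounding box of the net terminals. One important implementation detail deserves mention. Both the original Pathfinder algorithm and VPR's router use Dijkstra's algorithm (i.e. a maze router [15]) to connect each net. For a k-terminal net, the maze router is invoked k - 1 times to perform all the required connections. In the first invocation, the maze routing wavefront expands out from the net source until it reaches any one of the k - 1 net sinks. The path from source to sink is now the first part of this net's routing. The maze routing wavefront is emptied, and a new wavefront expansion is started from the entire net routing</p>
<p><b>  5 Experimental Results</b></p>
<p>  The various FPGA parameters used in this section were always chosen to allow a direct comparison with previously published results. All the results in this section were obtained with a logic block consisting of a 4-input LUT plus a flip flop, as shown in Figure 2. The clock net was not routed in sequential circuits, as it is usually routed via a dedicated routing network in commercial FPGAs. Each LUT input appears on one side of the logic block, while the logic block output is accessible from both the bottom and right sides, as shown in Figure 4. Each logic block input or output can connect to any track in the adjacent channel(s) (i.e. Fc = W). Each wire segment can connect to three other wire segments at channel intersections (i.e. Fs = 3), and the switch box topology is “disjoint”; that is, a wire segment in track 0 connects only to other wire segments in track 0, and so on.</p>
<p>  5.1 Experimental Results with Input Pin Doglegs</p>
<p>  Most previous FPGA routing results have assumed that “input pin doglegs” are possible. If the connection box between an input pin and the tracks to which it connects consists of Fc independent pass transistors controlled by Fc SRAM bits, it is possible to turn on two of these switches in order to electrically connect two tracks via the input pin. We will refer to this as an input pin dogleg. Commercial FPGAs, however, implement the connection box from an input pin to a channel via a multiplexer, so only one track may be connected to the input pin. Using a multiplexer rather than independent pass transistors saves considerable area in the FPGA layout. As well, there is normally a buffer between a track and the connection block multiplexer to which it connects, in order to improve speed; this buffer also means that input pin doglegs cannot be used. It would therefore be best if FPGA routers were tested without input pin doglegs in the future. To make a fair comparison with past results, however, we must allow input pin doglegs in this section. In this section we compare the minimum number of tracks per channel required for a successful routing by various CAD tools on a set of 9 benchmark circuits. All the results in Table 2 are obtained by routing a placement produced by Altor [16], a mincut based placement tool. Three of the columns consist of two-step (global then detailed) routing, while the other routers perform combined global and detailed routing. VPR requires 10% fewer tracks than the second best router, and the third best router [...] Table 3 lists the</p>
<p>  5.2 Experimental Results Without Input Pin Doglegs</p>
<p>  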

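Table 4 compares the performance of VPR with that of the SPLACE/SROUTE toolset, which does not allow input pin doglegs. When both tools are only allowed to route an Altor-generated placement, VPR requires 13% fewer tracks than SROUTE. When the tools are allowed to both place and route the circuits, VPR requires 29% fewer tracks than the SPLACE/SROUTE combination. Both VPR and SPLACE are based on simulated annealing. We believe the VPR placer outperforms SPLACE partially because it handles high-fanout nets more efficiently, allowing more moves to be evaluated in a given time, and partially because of its more efficient annealing schedule.</p>
<p>  5.3 Experimental Results on Large Circuits</p>
<p>  The benchmarks used in Sections 5.1 and 5.2 range in size from 54 to 358 logic blocks, and accordingly are too small to be very representative of today's FPGAs. Therefore, in this section we present experimental results for the 20 largest MCNC benchmark circuits [27], which range in size from 1047 to 8383 logic blocks. We use Flowmap [28] to technology map each circuit to 4-LUTs and flip flops, and VPACK to combine flip flops and LUTs into our basic logic block. The number of I/O pads that fit per row or column is set to 2, in line with current commercial FPGAs. Each circuit is placed and routed in the smallest square FPGA which can contain it, and input pin doglegs are not allowed. Note that three of the benchmarks, bigkey, des and dsip, are pad-limited in the FPGA architecture assumed. Table 5 compares the number of tracks required to place and completely route the circuits with VPR against the number required to place and globally route the circuits with VPR and then perform detailed routing with SEGA [23]. Table 5 also gives the size of each circuit, in terms of the number of logic blocks. The marked entries in the SEGA column could not be routed successfully because SEGA ran out of memory. Using SEGA to perform detailed routing on a global route generated by VPR increases the total number of tracks required to route the circuits by over 68% compared with having VPR perform the routing completely. Clearly SEGA has difficulty routing large circuits when input pin doglegs are not allowed. In order to encourage other FPGA researchers to publish</p>
<p><b>  6 Conclusions and Future Work</b></p>
<p>  We have presented a new FPGA placement and routing tool that outperforms all such tools to which we can make direct comparisons. In addition, we have presented benchmark results on much larger circuits than have typically been used to characterize academic FPGA place and route tools. We hope the next generation of FPGA CAD tools will be compared on the basis of these larger benchmarks, as they are a closer approximation of the kind of problems being mapped into today's FPGAs. One of the main design goals of VPR was to keep the tool flexible enough to allow its use in many FPGA architectural studies. We are currently working on several improvements to VPR to further improve its utility in FPGA architecture research. In the near future VPR will support buffered and segmented routing structures, and we plan to add a timing analyzer and timing-driven routing.</p>
<p><b>  Original Text</b></p>
<p>  VPR: A New Packing, Placement and Routing Tool for FPGA Research</p>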

<p>  Vaughn Betz and Jonathan Rose</p>
<p>  Department of Electrical and Computer Engineering, University of Toronto</p>
<p>  Toronto, ON, Canada M5S 3G4 {vaughn, jayar}@eecg.toronto.edu</p>
<p><b>  Abstract</b></p>
<p>  We describe the capabilities of and algorithms used in a new FPGA CAD tool, Versatile Place and Route (VPR). In terms of minimizing routing area, VPR outperforms all published FPGA place and route tools to which we can compare. Although the algorithms used are based on previously known approaches, we present several enhancements that improve run-time and quality. We present placement and routing results on a new set of large circuits to allow future benchmark comparisons of FPGA place and route to</p>
<p>  1 Introduction</p>
<p>  In FPGA research, one must typically evaluate the utility of new architectural features experimentally. That is, benchmark circuits are technology mapped, placed and routed onto the FPGA architectures of interest, and measures of the architecture’s quality, such as speed or area, can then readily be extracted. Accordingly, there is considerable need for flexible CAD tools that can target a wide variety of FPGA architectures efficiently, and hence allow fair comparisons of the architectures. This </p>
<p>  In order to make meaningful FPGA architecture comparisons, it is essential that the CAD tools used to map circuits into each architecture are of high quality. The routing phase of VPR outperforms all previously published FPGA routers for which standard benchmark results are available, and the combination of VPR’s placer and router outperforms all published combinations of FPGA placement and routing tools. The organization of this paper is as follows. In Section 2 we describe some of the f</p>
<p>  2 Overview of VPR</p>
<p>  Figure 1 outlines the VPR CAD flow. The inputs to VPR consist of a technology-mapped netlist and a text file describing the FPGA architecture. VPR can place the circuit, or a pre-existing placement can be read in. VPR can then perform either a global route or a combined global/detailed route of the placement. VPR’s output consists of the placement and routing, as well as statistics useful in assessing the utility of an FPGA architecture, such as routed wirelength, track count, and maximum net len</p>
<p>  • the number of logic block inputs and outputs,</p>
<p>  • the side(s) of the logic block from which each input and output is accessible,</p>
<p>  • the logical equivalence between various input and output pins (e.g. all LUT inputs are functionally equivalent),</p>
<p>  • the number of I/O pads that fit into one row or one column of the FPGA, and</p>
<p>  • the dimensions of the logic block array (e.g. 23 x 30 logic blocks).</p>
<p>  In addition, if global routing is to be performed, one can also specify:</p>
<p>  

• the relative widths of horizontal and vertical channels, and</p>
<p>  • the relative widths of the channels in different regions of the FPGA.</p>
<p>  Finally, if combined global and detailed routing is to be performed, one also specifies:</p>
<p>  • the switch block [1] architecture (i.e. how the routing tracks are interconnected),</p>
<p>  • the number of tracks to which each logic block input pin connects (Fc [1]),</p>
<p>  • the Fc value for logic block outputs, and</p>
<p>  • the Fc value for I/O pads.</p>
<p>  The current architecture description format does not allow segments that span more than one logic block to be included in the routing architecture, but we are presently adding this feature. Adding new routing architecture features to VPR is relatively easy, since VPR uses the architecture description to create a routing resource graph. Every routing track and every pin in the architecture becomes a node in this graph, and the graph edges represent the allowable connections. The router, graphics v</p>
<p>  2.1 The VPACK Logic Block Packer / Netlist Translator</p>
<p>  VPACK reads in a blif format netlist of a circuit that has been technology-mapped to LUTs and flip flops, packs the LUTs and flip flops into the desired FPGA logic block, and outputs a netlist in VPR’s netlist format. VPACK can target a logic block consisting of one LUT and one FF, as shown in Figure 2, as this is a common FPGA logic element. VPACK is also capable of targeting logic blocks that contain several LUTs and several flip flops, with or without shared LUT inputs [6]. These “clusterbase</p>
<p>  3 Placement Algorithm</p>
<p>  VPR uses the simulated annealing algorithm [7] for placement. We have experimented with several different cost functions, and found that what we call a linear congestion cost function provides the best results in a reasonable computation time [8]. The functional form of this cost function is $$Cost = \sum_{n=1}^{N_{nets}} q(n)\left[\frac{bb_x(n)}{C_{av,x}(n)} + \frac{bb_y(n)}{C_{av,y}(n)}\right]$$ where the summation is over all the nets in the circuit. For each net, bbx and bby denote the horizontal and vertical spans of its bounding box, respectively. The q(n) factor compensates for the fact that the bounding box wire length model underestimates the wiring necessary to connect nets with more than three terminals, as suggested in [10]. Its value depends on the number of terminals of net n; q is 1 for nets with 3 or fewer terminals, and slowly increases to 2.79 for nets with 50 terminals. Cav,x(n) and Cav,y(n) are the average channel capacities (in tracks) in the x and y directions, respectively, over the bounding box of net n. This cost function penalizes placements which require more routing in areas of the FPGA that have narrower channel</p>
<p>  4 Routing Algorithm</p>
<p>  VPR’s router is based on the Pathfinder negotiated congestion algorithm [14, 8]. Basically, this algorithm initially routes each net by the shortest path it can find, regardless of any overuse of wiring segments or logic block pins that may result. One iteration of the router consists of sequentially ripping-up and re-routing (by the lowest cost path found) every net in the circuit. The cost of using a routing resource is a function of the current overuse of that resource and any overuse that occu</p>
<p>  Therefore, in the latter invocations of the maze router the partial routing used as the net source will be very large, and it will take a long time to expand the maze router wavefront out to the next sink. Fortunately there is a more efficient method. When a net sink is reached, add all the routing resource segments required to connect the sink and the current partial routing to the wavefront (i.e. the expansion list) with a cost of 0. Do not empty the current maze routing wavefront; just continue</p>
<p>  5 Experimental Results</p>
<p>  The various FPGA parameters used in this section were always chosen to allow a direct comparison with previously published results. All the results in this section were obtained with a logic block consisting of a 4-input LUT plus a flip flop, as shown in Figure 2. The clock net was not routed in sequential circuits, as it is usually routed via a dedicated routing network in commercial FPGAs. Each LUT input appears on one side of the logic block, while the logic block output is accessible from bo</p>
<p>  5.1 Experimental Results with Input Pin Doglegs</p>
<p>  Most previous FPGA routing results have assumed that “input pin doglegs” are possible. If the connection box between an input pin and the tracks to which it connects consists of Fc independent pass transistors controlled by Fc SRAM bits, it is possible to turn on two of these switches in order to electrically connect two tracks via the input pin. We will refer to this as an input pin dogleg. Commercial FPGAs, however, implement the connection box from an input pin to a channel via a multiplexer,</p>
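<p>  The difference between the two connection box styles can be illustrated with a toy graph. This is a hedged sketch, not VPR's routing resource graph code: the function names and node labels are hypothetical, and modelling the multiplexer as one-way track-to-pin edges is an assumption made for illustration. With pass transistors the pin can also drive a track, so a path from one track to another through the pin exists; with a multiplexer it does not.</p>

```python
from collections import deque
from typing import Dict, List, Set


def build_connection_box(tracks: List[str], pin: str,
                         pass_transistors: bool) -> Dict[str, Set[str]]:
    """Directed graph of the connections between some tracks and one input pin."""
    graph: Dict[str, Set[str]] = {node: set() for node in tracks + [pin]}
    for track in tracks:
        graph[track].add(pin)          # a track can always drive the input pin
        if pass_transistors:
            graph[pin].add(track)      # pass transistors conduct in both directions
    return graph


def reachable(graph: Dict[str, Set[str]], start: str, goal: str) -> bool:
    """Breadth-first search: is there a directed path from start to goal?"""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


tracks = ["track0", "track1", "track2"]
with_pass = build_connection_box(tracks, "in0", pass_transistors=True)
with_mux = build_connection_box(tracks, "in0", pass_transistors=False)

# A dogleg is a track-to-track path that goes through the input pin.
dogleg_pass = reachable(with_pass, "track0", "track1")
dogleg_mux = reachable(with_mux, "track0", "track1")
```

<p>  Under these assumptions `dogleg_pass` is true and `dogleg_mux` is false, which is why results obtained with input pin doglegs are not directly comparable to results for multiplexer-based connection boxes.</p>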

<p>  In this section we compare the minimum number of tracks per channel required for a successful routing by various CAD tools on a set of 9 benchmark circuits. All the results in Table 2 are obtained by routing a placement produced by Altor [16], a mincut based placement tool. Three of the columns consist of two-step (global then detailed) routing, while the other routers perform combined global and detailed routing. VPR requires 10% fewer tracks than the second best router, and the third best rout</p>
<p>  5.2 Experimental Results Without Input Pin Doglegs</p>
<p>  Table 4 compares the performance of VPR with that of the SPLACE/SROUTE toolset, which does not allow input pin doglegs. When both tools are only allowed to route an Altor-generated placement, VPR requires 13% fewer tracks than SROUTE. When the tools are allowed to both place and route the circuits, VPR requires 29% fewer tracks than the SPLACE/SROUTE combination. Both VPR and SPLACE are based on simulated annealing. We believe the VPR placer outperforms SPLACE partially because it handles high-fan</p>
<p>  5.3 Experimental Results on Large Circuits</p>
<p>  The benchmarks used in Sections 5.1 and 5.2 range in size from 54 to 358 logic blocks, and accordingly are too small to be very representative of today’s FPGAs. Therefore, in this section we present experimental results for the 20 largest MCNC benchmark circuits [27], which range in size from 1047 to 8383 logic blocks. We use Flowmap [28] to technology map each circuit to 4-LUTs and flip flops, and VPACK to combine flip flops and LUTs into our basic logic block. The number of I/O pads that fit per</p>
<p>  6 Conclusions and Future Work</p>

<p>  We have presented a new FPGA placement and routing tool that outperforms all such tools to which we can make direct comparisons. In addition we have presented benchmark results on much larger circuits than have typically been used to characterize academic FPGA place and route tools. We hope the next generation of FPGA CAD tools will be compared on the basis of these larger benchmarks, as they are a closer approximation of the kind of problems being mapped into today’s FPGAs. One of the main desig</p>
<p>  References</p>
<p>  [1] S. Brown, R. Francis, J. Rose, and Z. Vranesic, Field-Programmable Gate Arrays, Kluwer Academic Publishers, 1992.</p>
<p>  [2] Xilinx Inc., The Programmable Logic Data Book, 1994.</p>
<p>  [3] AT&T Inc., ORCA Datasheet, 1994.</p>
<p>  [4] Actel Inc., FPGA Data Book, 1994.</p>
<p>  [5] Altera Inc., Data Book, 1996.</p>
<p>  [6] V. Betz and J. Rose, “Cluster-Based Logic Blocks for FPGAs: Area-Efficiency vs. Input Sharing and Size,” CICC, 1997, pp. 551 - 554.</p>
<p>  [7] S. Kirkpatrick, C. D. Gelatt, Jr., and M. P. Vecchi, “Optimization by Simulated Annealing,” Science, May 13, 1983, pp. 671 - 680.</p>
<p>  [8] V. Betz and J. Rose, “Directional Bias and Non-Uniformity in FPGA Global Routing Architectures,” ICCAD, 1996, pp. 652 - 659.</p>
<p>  [9] V. Betz and J. Rose, “On Biased and Non-Uniform Global Routing Architectures and CAD Tools for FPGAs,” CSRI Tech. Rep. #358, Dept. of ECE, University of Toronto, 1996.</p>
<p>  [10] C. E. Cheng, “RISA: Accurate and Efficient Placement Routability Modeling,” DAC, 1994, pp. 690 - 695.</p>
<p>  [11] M. Huang, F. Romeo, and A. Sangiovanni-Vincentelli, “An Efficient General Cooling Schedule for Simulated Annealing,” ICCAD, 1986, pp. 381 - 384.</p>
<p>  [12] W. Swartz and C. Sechen, “New Algorithms for the Placement and Routing of Macro Cells,” ICCAD, 1990, pp. 336 - 339.</p>
<p>  [13] J. Lam and J. Delosme, “Performance of a New Annealing Schedule,” DAC, 1988, pp. 306 - 311.</p>
<p>  [14] C. Ebeling, L. McMurchie, S. A. Hauck and S. Burns, “Placement and Routing Tools for the Triptych FPGA,” IEEE Trans. on VLSI, Dec. 1995, pp. 473 - 482.</p>
<p>  [15] C. Y. Lee, “An Algorithm for Path Connections and its Applications,” IRE Trans. Electron. Comput., Vol. EC-10, 1961, pp. 346 - 365.</p>
<p>  [16] J. S. Rose, W. M. Snelgrove, Z. G. Vranesic, “ALTOR: An Automatic Standard Cell Layout Program,” Canadian Conf. on VLSI, 1985, pp. 169 - 173.</p>
<p>  [17] J. S. Rose, “Parallel Global Routing for Standard Cells,” IEEE Trans. on CAD, Oct. 1990, pp. 1085 - 1095.</p>
<p>  [18] S. Brown, J. Rose, Z. G. Vranesic, “A Detailed Router for Field-Programmable Gate Arrays,” IEEE Trans. on CAD, May 1992, pp. 620 - 628.</p>
<p>  [19] G. Lemieux, S. Brown, “A Detailed Router for Allocating Wire Segments in FPGAs,” ACM/SIGDA Physical Design Workshop, 1993, pp. 215 - 226.</p>
<p>  [20] Y.-L. Wu, M. Marek-Sadowska, “An Efficient Router for 2-D Field-Programmable Gate Arrays,” EDAC, 1994, pp. 412 - 416.</p>
<p>  [21] Y.-L. Wu, M. Marek-Sadowska, “Orthogonal Greedy Coupling -- A New Optimization Approach to 2-D FPGA Routing,” DAC, 1995, pp. 568 - 573.</p>
<p>  [22] M. J. Alexander, G. Robins, “New Performance-Driven FPGA Routing Algorithms,” DAC, 1995, pp. 562 - 567.</p>
<p>  [23] G. Lemieux, S. Brown, D. Vranesic, “On Two-Step Routing for FPGAs,” Int. Symp. on Physical Design, 1997, pp. 60 - 66.</p>
<p>  [24] Y.-S. Lee, A. Wu, “A Performance and Routability Driven Router for FPGAs Considering Path Delays,” DAC, 1995, pp. 557 - 561.</p>
<p>  [25] M. J. Alexander, J. P. Cohoon, J. L. Ganley, G. Robins, “Performance-Oriented Placement and Routing for Field-Programmable Gate Arrays,” EDAC, 1995, pp. 80 - 85.</p>
<p>  [26] S. Wilton, “Architectures and Algorithms for Field-Programmable Gate Arrays with Embedded Memories,” Ph.D. Dissertation, University of Toronto, 1997.</p>
<p>  [27] S. Yang, “Logic Synthesis and Optimization Benchmarks, Version 3.0,” Tech. Report, Microelectronics Centre of North Carolina, 1991.</p>
<p>  [28] J. Cong and Y. Ding, “Flowmap: An Optimal Technology Mapping Algorithm for Delay Optimizat</p>
