版權(quán)說明:本文檔由用戶提供并上傳,收益歸屬內(nèi)容提供方,若內(nèi)容存在侵權(quán),請進(jìn)行舉報或認(rèn)領(lǐng)
文檔簡介
1、Efficient System-Enforced Deterministic ParallelismAmittai Aviram, Shu-Chun Weng, Sen Hu, Bryan Ford Yale UniversityAbstractDeterministic execution offers many benefits for debug- ging, fault tolerance, and security. Cur
2、rent methods of executing parallel programs deterministically, how- ever, often incur high costs, allow misbehaved software to defeat repeatability, and transform time-dependent races into input- or path-dependent races
3、without elim- inating them. We introduce a new parallel program- ming model addressing these issues, and use Determina- tor, a proof-of-concept OS, to demonstrate the model’s practicality. Determinator’s microkernel API
4、provides only “shared-nothing” address spaces and determinis- tic interprocess communication primitives to make ex- ecution of all unprivileged code—well-behaved or not— precisely repeatable. Atop this microkernel, Deter
5、mi- nator’s user-level runtime adapts optimistic replication techniques to offer a private workspace model for both thread-level and process-level parallel programing. This model avoids the introduction of read/write dat
6、a races, and converts write/write races into reliably-detected con- flicts. Coarse-grained parallel benchmarks perform and scale comparably to nondeterministic systems, on both multicore PCs and across nodes in a distrib
7、uted cluster.1 IntroductionWe often wish to run software deterministically, so that from a given input it always produces the same out- put. Determinism is the foundation of replay debug- ging [37, 39, 46, 56], fault tol
8、erance [15, 18, 50], and ac- countability mechanisms [30, 31]. Methods of intrusion analysis [22, 34] and timing channel control [4] further assume the system can enforce determinism even on ma- licious code designed to
9、evade analysis. Executing par- allel software deterministically is challenging, however, because threads sharing an address space—or processes sharing resources such as file systems—are prone to non- deterministic, timin
10、g-dependent races [3,40,42,43]. User-space techniques for parallel deterministic exe- cution [8, 10, 20, 21, 44] show promise but have limi- tations. First, by relying on a deterministic scheduler residing in the applica
11、tion process, they permit buggy or malicious applications to compromise determinism by interfering with the scheduler. Second, determinis- tic schedulers emulate conventional APIs by synthesiz- ing a repeatable—but arbit
12、rary—schedule of inter-thread interactions, often using an instruction counter as an arti- ficial time metric. Data races remain, therefore, but theirmanifestation depends subtly on inputs and code path lengths instead o
13、f on “real” time. Third, the user-level instrumentation required to isolate and schedule threads’ memory accesses can incur considerable overhead, even on coarse-grained code that synchronizes rarely.To meet the software
14、 development, debugging, and security challenges that ubiquitous parallelism presents, it may be insufficient to shoehorn the standard nonde- terministic programming model into a synthetic execu- tion schedule. Instead w
15、e propose to rethink the basic model itself. We would like a parallel environment that: (a) is “deterministic by default” [12, 40], except when we inject nondeterminism explicitly via external inputs; (b) introduces no d
16、ata races, either at the memory ac- cess level [25, 43] or at higher semantic levels [3]; (c) can enforce determinism on arbitrary, compromised or malicious code for security reasons; and (d) is efficient enough to use f
17、or “normal-case” execution of deployed code, not just for instrumentation during development.As a step toward such a model, we present Determi- nator, a proof-of-concept OS designed around the above goals. Due to its OS-
18、level approach, Determinator sup- ports existing languages, can enforce deterministic exe- cution not only on a single process but on groups of in- teracting processes, and can prevent malicious user-level code from subv
19、erting the kernel’s guarantee of determin- ism. In order to explore the design space freely, Determi- nator takes a “clean-slate” approach, making few com- promises for backward compatibility with existing ker- nels or A
20、PIs. Determinator’s programming model could be implemented in a legacy kernel for backward compat- ibility, however, as part of a “deterministic sandbox” for example [9]. Determinator’s user-level runtime also pro- vides
21、 limited emulation of the Unix process, thread, and file APIs, to simplify application porting.Determinator’s kernel enforces determinism by deny- ing user code direct access to hardware resources whose use can yield non
22、deterministic behavior, including real- time clocks, cycle counters, and writable shared memory. Determinator constrains user code to run within a hierar- chy of single-threaded, process-like spaces, each having a privat
23、e virtual address space. The kernel’s low-level API provides only three system calls, with which a space can synchronize and communicate with its immediate parent and children. Potentially useful sources of non- determin
24、ism, such as timers, Determinator encapsulates into I/O devices, which unprivileged spaces can accesscate, verify, or analyze a program’s execution history. Replay can be efficient when only I/O need be logged, as for a
25、uniprocessor virtual machine [22], but becomes much more costly if internal sources of nondeterminism due to parallelism must also be replayed [19,23]. Determinator therefore transforms useful sources of nondeterminism i
26、nto explicit I/O, which applications may obtain via controllable channels, and eliminates only internal nondeterminism resulting from parallelism. If an application calls gettimeofday(), for example, then a supervising p
27、rocess can intercept this I/O to log, replay, or synthesize these explicit time inputs.2.2 A Race-Free Model for Shared StateConventional systems give threads direct, concurrent ac- cess to many forms of shared state, su
28、ch as shared mem- ory and file systems, yielding data races and heisenbugs if the threads fail to synchronize properly [25, 40, 43]. While replay debuggers [37,39,46,56] and deterministic schedulers [8,10,20,21,44] make
29、data races reproducible once they manifest, they do not change the inherently race-prone model in which developers write applications. Determinator replaces the standard concurrent access model with a private workspace m
30、odel, in which data races do not arise in the first place. This model gives each thread a complete, private virtual replica of all log- ically shared state a thread may access, including shared memory and file system sta
31、te. A thread’s normal reads and writes affect only its private working state, and do not interact directly with other threads. Instead, Deter- minator accumulates each threads’s changes to shared state, then reconciles t
32、hese changes among threads only at program-defined synchronization points. This model is related to and inspired by early parallel Fortran sys- tems [7, 51], replicated file systems [47], transactional memory [33, 52] an
33、d operating systems [48], and dis- tributed version control systems [29], but to our knowl- edge Determinator is the first OS to introduce a model for pervasive thread- and process-level determinism. If one thread execut
34、es the assignment ‘x = y’ while another concurrently executes ‘y = x’, for example, these assignments race in the conventional model, but are race-free under Determinator and always swap x with y. Each thread’s read of x
35、 or y always sees the “old” version of that variable, saved in the thread’s private workspace at the last explicit synchronization point. Figure 1 illustrates a more realistic example of a game or simulator, which uses a
36、n array of “actors” (players, particles, etc.) to represent some logical “universe,” and updates all of the actors in parallel at each time step. To update the actors, the main thread forks a child thread to process each
37、 actor, then synchronizes by joining all these child threads. The child thread code to update each ac- tor is shown “inline” within the main() function, whichstruct actor state actor[nactors];main() initialize all elemen
38、ts of actor[] array for (time = 0; ; time++) for (i = 0; i < nactors; i++) if (thread fork(i) == IN CHILD) // child thread to process actor[i] examine state of nearby actors update state of actor[i] accordingly thread
39、 exit(); for (i = 0; i < nactors; i++) thread join(i);Figure 1: C pseudocode for lock-step time simulation, which contains a data race in standard concurrency mod- els but is bug-free under Determinator.under Unix wor
40、ks only with process-level fork(); De- terminator offers this convenience for shared memory threads as well, as discussed later in Section 4.4.In this example, each child thread reads the “prior” state of any or all acto
41、rs in the array, then updates the state of its assigned actor “in-place,” without any explicit copying or additional synchronization. With standard threads this code has a read/write race: each child thread may see an ar
42、bitrary mix of “old” and “new” states as it examines other actors in the array. Under Determi- nator, however, this code is correct and race-free. Each child thread reads only its private working copy of the actors array
43、, which is untouched (except by the child thread itself) since the main thread forked that child. As the main thread rejoins all its child threads, Determina- tor merges each child’s actor array updates back into the mai
44、n thread’s working copy, for use in the next time step.While read/write races disappear in Determinator’s model, traditional write/write races become conflicts. If two child threads concurrently write to the same actor a
45、rray element, for example, Determinator detects this conflict and signals a runtime exception when the main thread attempts to join the second conflicting child. In the conventional model, by contrast, the threads’ execu
46、- tion schedules might cause either of the two writes to “win” and silently propagate its likely erroneous value throughout the computation. Running this code under a conventional deterministic scheduler causes the “win-
47、 ner” to be decided based on a synthetic, reproducible time metric (e.g., instruction count) rather than real time, but the race remains and may still manifest or vanish due to slight changes in inputs or instruction pat
溫馨提示
- 1. 本站所有資源如無特殊說明,都需要本地電腦安裝OFFICE2007和PDF閱讀器。圖紙軟件為CAD,CAXA,PROE,UG,SolidWorks等.壓縮文件請下載最新的WinRAR軟件解壓。
- 2. 本站的文檔不包含任何第三方提供的附件圖紙等,如果需要附件,請聯(lián)系上傳者。文件的所有權(quán)益歸上傳用戶所有。
- 3. 本站RAR壓縮包中若帶圖紙,網(wǎng)頁內(nèi)容里面會有圖紙預(yù)覽,若沒有圖紙預(yù)覽就沒有圖紙。
- 4. 未經(jīng)權(quán)益所有人同意不得將文件中的內(nèi)容挪作商業(yè)或盈利用途。
- 5. 眾賞文庫僅提供信息存儲空間,僅對用戶上傳內(nèi)容的表現(xiàn)方式做保護(hù)處理,對用戶上傳分享的文檔內(nèi)容本身不做任何修改或編輯,并不能對任何下載內(nèi)容負(fù)責(zé)。
- 6. 下載文件中如有侵權(quán)或不適當(dāng)內(nèi)容,請與我們聯(lián)系,我們立即糾正。
- 7. 本站不保證下載資源的準(zhǔn)確性、安全性和完整性, 同時也不承擔(dān)用戶因使用這些下載資源對自己和他人造成任何形式的傷害或損失。
最新文檔
- 國外操作系統(tǒng)相關(guān)論文flexiblesystemcallschedulingwithexceptionlesssystemcalls
- 國外操作系統(tǒng)相關(guān)論文trustandprotectionintheillinoisbrowseroperatingsystem
- 國外操作系統(tǒng)相關(guān)論文stabledeterministicmultithreadingthroughschedulememoization
- 淺談操作系統(tǒng)(操作系統(tǒng)論文)
- 操作系統(tǒng)論文
- 計算機(jī)操作系統(tǒng)論文—微內(nèi)核操作系統(tǒng)
- 操作系統(tǒng)論文2
- 操作系統(tǒng)原理小論文
- 操作系統(tǒng)復(fù)習(xí)及相關(guān)題目
- 操作系統(tǒng)課程設(shè)計論文
- linux操作系統(tǒng)課程論文模板
- 操作系統(tǒng)課程設(shè)計-- 操作系統(tǒng)
- android操作系統(tǒng)畢業(yè)論文
- 操作系統(tǒng)課程設(shè)計——操作系統(tǒng)課程設(shè)計模擬操作系統(tǒng)
- 操作系統(tǒng)a
- 操作系統(tǒng)
- 操作系統(tǒng)
- 操作系統(tǒng)linux主存管理操作系統(tǒng)實驗 5
- 操作系統(tǒng)程序設(shè)計-操作系統(tǒng)模擬實現(xiàn)
- 內(nèi)存管理(操作系統(tǒng))操作系統(tǒng)課程設(shè)計
評論
0/150
提交評論