Multi-Agent Quadrotor Testbed Control Design: Integral Sliding Mode vs. Reinforcement Learning

Steven L. Waslander, Gabriel M. Hoffmann
Ph.D. Candidate, Aeronautics and Astronautics, Stanford University
{stevenw, gabeh}@stanford.edu

Jung Soon Jang
Research Associate, Aeronautics and Astronautics, Stanford University
jsjang@stanford.edu

Claire J. Tomlin
Associate Professor, Aeronautics and Astronautics, Stanford University
tomlin@stanford.edu

Abstract: The Stanford Testbed of Autonomous Rotorcraft for Multi-Agent Control (STARMAC) is a multi-vehicle testbed currently comprised of two quadrotors, also called X4-flyers, with capacity for eight. This paper presents a comparison of control design techniques, specifically for outdoor altitude control, in and above ground effect, that accommodate the unique dynamics of the aircraft. Due to the complex airflow induced by the four interacting rotors, classical linear techniques failed to prov…

I. INTRODUCTION

As first introduced by the authors in [1], the Stanford Testbed of Autonomous Rotorcraft for Multi-Agent Control (STARMAC) is an aerial platform intended to validate novel multi-vehicle control techniques and present real-world problems for further investigation. The base vehicle for STARMAC is a four rotor aircraft with fixed pitch blades, referred to as a quadrotor, or an X4-flyer. They are capable of 15 minute outdoor flights in a 100 m square area [1].

Fig. 1. One of the STARMAC quadrotors in action.

There have been numerous projects involving quadrotors to date, with the first known hover occurring in October 1922 [2]. Recent interest in the quadrotor concept has been sparked by commercial remote control versions, such as the DraganFlyer IV [3]. Many groups [4]–[7] have seen significant success in developing autonomous quadrotor vehicles. To date, however, STARMAC is the only operational multi-vehicle quadrotor platform capable of autonomous outdoor flight, without tethers or motion guides.

The first major milestone for STARMAC was autonomous hover control, with closed loop control of attitude, altitude and position. Using inertial sensing, the attitude of the aircraft is simple to control, by applying small variations in the relative speeds of the blades. In fact, standard integral LQR techniques were applied to provide reliable attitude stability and tracking for the vehicle. Position control was also achieved with an integral LQR, with careful design in order to ensure spectral sep…

Unfortunately, altitude control proves less straightforward. There are many factors that affect the altitude loop specifically that do not lend themselves to classical control techniques. Foremost is the highly nonlinear and destabilizing effect of four rotor downwashes interacting. In our experience, this effect becomes critical when motion is not damped by motion guides or tethers. Empirical observation during manual flight revealed a noticeable loss in thrust upon descent through the highly …

In order to accommodate this combination of noise and disturbances, two distinct approaches are adopted. Integral Sliding Mode (ISM) control [10]–[12] takes the approach that the disturbances cannot be modeled, and instead designs a control law that is guaranteed to be robust to disturbances as long as they do not exceed a certain magnitude. Model-based reinforcement learning [13] creates a dynamic model based on recorded inputs and responses, without any knowledge of the underlying dynamics, and …

II. SYSTEM DESCRIPTION

STARMAC consists of a fleet of quadrotors and a ground station. The system communicates over a Bluetooth Class 1 network. At the core of each aircraft are microcontroller circuit boards designed and assembled at Stanford for this project. The microcontrollers run real-time control code, interface with sensors and the ground station, and supervise the system.

The aircraft are capable of sensing position, attitude, and proximity to the ground. The differential GPS receiver is the Trimble Lassen LP, operating on the L1 band, providing 1 Hz updates. The IMU is the MicroStrain 3DM-G, a low cost, lightweight IMU that delivers 76 Hz attitude, attitude rate, and acceleration readings. The distance from the ground is found using ultrasonic ranging at 12 Hz.

The ground station consists of a laptop computer, to interface with the aircraft, and a GPS receiver, to provide differential corrections. It also has a battery charger, and joysticks for control-augmented manual flight, when desired.

III. QUADROTOR DYNAMICS

The derivation of the nonlinear dynamics is performed in North-East-Down (NED) inertial and body fixed coordinates. Let $\{e_N, e_E, e_D\}$ denote the inertial axes, and $\{x_B, y_B, z_B\}$ denote the body axes, as defined in Figure 2. Euler angles of the body axes are $\{\phi, \theta, \psi\}$ with respect to the $e_N$, $e_E$ and $e_D$ axes, respectively, and are referred to as roll, pitch and yaw. Let $r$ be defined as the position vector from the inertial origin to the vehicle center of gravity (CG), and let $\omega_B$ be defined as the a…

Fig. 2. Free body diagram of a quadrotor aircraft.

The rotors, numbered 1–4, are mounted outboard on the $x_B$, $y_B$, $-x_B$ and $-y_B$ axes, respectively, with position vectors $r_i$ with respect to the CG. Each rotor produces an aerodynamic torque, $Q_i$, and thrust, $T_i$, both parallel to the rotor's axis of rotation, and both used for vehicle control. Here, $T_i \approx$ …, where $u_i$ is the voltage applied to the motors, as determined from a load cell test. In flight, $T_i$ can vary greatly from this approximation. The torques, $Q_i$, are proportional to the rotor thrust, and are giv…

The body drag force is defined as $D_B$, vehicle mass is $m$, acceleration due to gravity is $g$, and the inertia matrix is $I \in \mathbb{R}^{3 \times 3}$. A free body diagram is depicted in Figure 2. The total force, $F$, and moment, $M$, can be summed as,

(1)

(2)

The full nonlinear dynamics can be described as,

(3)

where the total angular momentum of the rotors is assumed to be near zero, because they are counter-rotating. Near hover conditions, the contributions by rolling moment and drag can be neglected in Equations (1) and (2). Define the total thrust as $T = \sum_{i=1}^{4} T_i$. The translational motion is defined by,

(4)

where $R_\phi$, $R_\theta$, and $R_\psi$ are the rotation matrices for roll, pitch, and yaw, respectively. Applying the small angle approximation to the rotation matrices,

(5)

Finally, assuming total thrust approximately counteracts gravity, except in the $e_D$ axis,

(6)

For small angular velocities, the Euler angle accelerations are determined from Equation (3) by dropping the second order term, $\omega \times I\omega$, and expanding the thrust into its four constituents. The angular equations become,

(7)

where the moment arm length $l = \|r_i \times z_B\|$ is identical for all rotors due to symmetry. The resulting linear models can now be used for control design.
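The small angle step can be checked numerically. The sketch below is not taken from the paper: it assumes the Z-Y-X Euler sequence $R_\psi R_\theta R_\phi$, thrust acting along $-z_B$, gravity along $+e_D$, and a hypothetical 0.5 kg vehicle, since the bodies of Equations (4) to (6) are not reproduced in this copy. It compares the exact rotated thrust direction with its first-order approximation.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def body_z_exact(phi, theta, psi):
    """Body z-axis in inertial axes, assuming R = Rz(psi) Ry(theta) Rx(phi)."""
    cphi, sphi = math.cos(phi), math.sin(phi)
    cth, sth = math.cos(theta), math.sin(theta)
    cpsi, spsi = math.cos(psi), math.sin(psi)
    return (cpsi * sth * cphi + spsi * sphi,
            spsi * sth * cphi - cpsi * sphi,
            cth * cphi)


def body_z_small_angle(phi, theta, psi):
    """First-order approximation for small roll and pitch (yaw kept exact)."""
    return (theta * math.cos(psi) + phi * math.sin(psi),
            theta * math.sin(psi) - phi * math.cos(psi),
            1.0)


def ned_acceleration(thrust, mass, phi, theta, psi, small_angle=False):
    """Inertial NED acceleration: gravity along +e_D, thrust along -z_B."""
    direction = body_z_small_angle if small_angle else body_z_exact
    zb = direction(phi, theta, psi)
    a = thrust / mass
    return (-a * zb[0], -a * zb[1], G - a * zb[2])


if __name__ == "__main__":
    hover_thrust = 0.5 * G  # thrust that balances a hypothetical 0.5 kg vehicle
    print(ned_acceleration(hover_thrust, 0.5, 0.02, -0.03, 0.4))
    print(ned_acceleration(hover_thrust, 0.5, 0.02, -0.03, 0.4, small_angle=True))
```

For roll and pitch of a few hundredths of a radian the two results differ only at second order in the angles, which is the regime in which a linear model of this kind is expected to hold.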

IV. ESTIMATION AND CONTROL DESIGN

Applying the concept of spectral separation, inner loop control of attitude and altitude is performed by commanding motor voltages, and outer loop position control is performed by commanding attitude requests for the inner loop. Accurate attitude control of the plant in Equation (7) is achieved with an integral LQR controller designed to account for thrust biases.

Position estimation is performed using a navigation filter that combines horizontal position and velocity information from GPS, vertical position and estimated velocity information from the ultrasonic ranger, and acceleration and angular rates from the IMU in a Kalman filter that includes bias estimates. Integral LQR techniques are applied to the horizontal components of the linear position plant described in Equation (6). The resulting hover performance is shown in Figure 6.

As described above, altitude control suffers exceedingly from unmodeled dynamics. In fact, manual command of the throttle for altitude control remains a challenge for the authors to this day. Additional complications arise from the ultrasonic ranging sensor, which has frequent erroneous readings, as seen in Figure 3. To alleviate the effect of this noise, rejection of infeasible measurements is used to remove much of the non-Gaussian noise component. This is followed by altitude and altitude rat…

Fig. 3. Characteristic unprocessed ultrasonic ranging data, displaying spikes, false echoes and dropouts. Powered flight commences at 185 seconds.

Integral Sliding Mode Control

A linear approximation to the altitude error dynamics of a quadrotor aircraft in hover is given by,

(8)

where $\{x_1, x_2\} = \{(r_{z,des} - r_z), (\dot{r}_{z,des} - \dot{r}_z)\}$ are the altitude error states, $u$ is the control input, and $\xi(\cdot)$ is a bounded model of disturbances and dynamic uncertainty. It is assumed that $\xi(\cdot)$ satisfies $\|\xi\| \le \gamma$, where $\gamma$ is the upper bound on the norm of $\xi(\cdot)$.

In early attempts to stabilize this system, it was observed that LQR control was not able to address the instability and performance degradation due to $\xi(g, x)$. Sliding Mode Control (SMC) was adapted to provide a systematic approach to the problem of maintaining stability and consistent performance in the face of modeling imprecision and disturbances. However, until the system dynamics reach the sliding manifold, such nice properties of SMC are not assured. In order to provide robust control th…

(9)

where $K_p$ and $K_d$ are proportional and derivative loop gains that stabilize the linear dynamics without disturbances. For disturbance rejection, a sliding surface, $s$, is designed,

(10)

such that state trajectories are forced towards the manifold $s = 0$. Here, $s_0$ is a conventional sliding mode design, $z$ is an additional term that enables integral control to be included, and $\alpha, k \in \mathbb{R}$ are positive constants. Based on the following Lyapunov function candidate, …, the control component, $u_d$, can be determined such that $\dot{V} < 0$, guaranteeing convergence to the sliding manifold.

(11)

The above condition holds if $\dot{z} = -\alpha(u_p + k x_2)$ and $u_d$ can be guaranteed to satisfy,

(12)

Since the disturbances, $\xi(g, x)$, are bounded by $\gamma$, define $u_d$ to be $u_d = -\lambda s$ with $\lambda \in \mathbb{R}$. Equation (11) becomes,

(13)
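The control law built up in the preceding steps can be exercised in simulation. The sketch below is an illustration under stated assumptions, not the authors' implementation: it assumes double-integrator error dynamics $\dot{x}_1 = x_2$, $\dot{x}_2 = u + \xi$, a sliding variable $s = \alpha(x_2 + k x_1) + z$ with $z(0)$ chosen so that $s(0) = 0$, and arbitrary illustrative gains and disturbance. Under those assumptions $\dot{s} = \alpha(u_d + \xi) = \alpha(-\lambda s + \xi)$, so $|s|$ cannot grow beyond $\gamma/\lambda$.

```python
def simulate_ism(kp=4.0, kd=4.0, alpha=1.0, k=1.0, lam=5.0,
                 disturbance=lambda t: 0.8, x1=1.0, x2=0.0,
                 dt=1e-3, t_end=20.0):
    """Euler simulation of an integral sliding mode law on assumed error dynamics.

    Assumed plant: x1' = x2, x2' = u + xi(t), with |xi| bounded by gamma.
    Returns the final altitude error x1 and the largest |s| observed.
    """
    z = -alpha * (x2 + k * x1)  # start on the manifold, s(0) = 0
    max_abs_s = 0.0
    for n in range(int(round(t_end / dt))):
        t = n * dt
        s = alpha * (x2 + k * x1) + z      # sliding variable s = s0 + z
        max_abs_s = max(max_abs_s, abs(s))
        u_p = -kp * x1 - kd * x2           # nominal PD component
        u_d = -lam * s                     # disturbance rejection component
        xi = disturbance(t)
        # Simultaneous Euler update of both states and the integral term z
        x1, x2, z = (x1 + dt * x2,
                     x2 + dt * (u_p + u_d + xi),
                     z + dt * (-alpha * (u_p + k * x2)))
    return x1, max_abs_s


if __name__ == "__main__":
    final_error, max_abs_s = simulate_ism()
    print(final_error, max_abs_s)
```

With a constant disturbance of 0.8 (inside an assumed bound $\gamma = 1$) and $\lambda = 5$, the sliding variable settles at a constant offset while the altitude error decays to zero, and no switching term appears in the input.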

It can be seen from Equation (13) that $\lambda|s| - \gamma > 0$ is required. As a result, for $u_p$ and $u_d$ as above, the sliding mode condition holds when,

(14)

With the input derived above, the dynamics are guaranteed to evolve such that $s$ decays to within the boundary layer, …, of the sliding manifold. Additionally, the system does not suffer from input chatter as conventional sliding mode controllers do, as the control law does not include a switching function along the sliding mode.

V. REINFORCEMENT LEARNING CONTROL

An alternate approach is to implement a reinforcement learning controller. Much work has been done on continuous state-action space reinforcement learning methods [13], [14]. For this work, a nonlinear, nonparametric model of the system is first constructed using flight data, approximating the system as a stochastic Markov process [15], [16]. Then a model-based reinforcement learning algorithm uses the model in policy iteration to search for an optimal control policy that can be implemented on the em…

In order to model the aircraft dynamics as a stochastic Markov process, a Locally Weighted Linear Regression (LWLR) approach is used to map the current state, $S(t) \in \mathbb{R}^{n_s}$, and input, $u(t) \in \mathbb{R}^{n_u}$, onto the subsequent state estimate, $S(t+1)$.

In this application, …, where $V$ is the battery level. In the altitude loop, the input, $u \in \mathbb{R}$, is the total motor power. The subsequent state mapping is the summation of the traditional LWLR estimate, using the current state and input, with the random vector, $v \in \mathbb{R}^{n_s}$, representing unmodeled noise. The value for $v$ is drawn from the distribution of output error as determined by using a maximum likelihood estimate [16] of the Gaussian noise in the LWLR estimate. Although the true distribution is not perfect…

The LWLR method [17] is well suited to this problem, as it fits a non-parametric curve to the local structure of the data. The scheme extends least squares by assigning weights to each training data point according to its proximity to the input value, for which the output is to be computed. The technique requires a sizable set of training data in order to reflect the full dynamics of the system, which is captured from flights flown under both automatic and manually controlled thrust, with the attitud…
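The proximity weighting idea can be illustrated with a deliberately reduced example. The sketch below is a one-input simplification written for this explanation, not the paper's multi-state model: it assumes a Gaussian kernel $w_i = \exp(-(x_i - x)^2 / 2\tau^2)$, a common choice for LWLR, and fits a local line with a constant offset by solving the weighted normal equations in closed form.

```python
import math


def lwlr_predict(xs, ys, x_query, tau):
    """Locally weighted linear fit y ~ theta0 + theta1 * x, evaluated at x_query.

    Each training point is weighted by an assumed Gaussian kernel of width tau
    centred on the query, so nearby samples dominate the local fit.
    """
    sw = sx = sy = sxx = sxy = 0.0
    for xi, yi in zip(xs, ys):
        w = math.exp(-((xi - x_query) ** 2) / (2.0 * tau ** 2))
        sw += w
        sx += w * xi
        sy += w * yi
        sxx += w * xi * xi
        sxy += w * xi * yi
    # Closed-form solution of the 2x2 weighted normal equations
    det = sw * sxx - sx * sx
    theta1 = (sw * sxy - sx * sy) / det
    theta0 = (sy - theta1 * sx) / sw
    return theta0 + theta1 * x_query


if __name__ == "__main__":
    xs = [i / 20.0 for i in range(41)]   # samples on [0, 2]
    ys = [x * x for x in xs]             # nonlinear "dynamics" to be captured
    print(lwlr_predict(xs, ys, 0.5, tau=0.1))
```

A small `tau` tracks local curvature closely but needs dense data near the query, while a large `tau` approaches ordinary least squares over the whole data set.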

For $m$ training data points, the input training samples are stored in $X \in \mathbb{R}^{m \times (n_s + n_u + 1)}$, and the outputs corresponding to those inputs are stored in $Y \in \mathbb{R}^{m \times n_s}$. These matrices are defined as

(15)

The column of ones in $X$ enables the inclusion of a constant offset in the solution, as used in linear regression. The diagonal weighting matrix $W \in \mathbb{R}^{m \times m}$, which acts on $X$, has one diagonal entry for each training data point. That entry gives more weight to training data points that are close to the $S(t)$ and $u(t)$ for which $\hat{S}(t+1)$ is to be computed.

The distance measure used in this work is

(16)

where $x^{(i)}$ is the $i$th row of $X$, $x$ is the vector …, and fit parameter $\tau$ is used to adjust the range of influence of training points. The value for $\tau$ can be tuned by cross validation to prevent over- or under-fitting the data. Note that it may be necessary to scale the columns before taking the Euclidean norm to prevent undue influence of one state on the $W$ matrix.

The subsequent state estimate is computed by summing the LWLR estimate with $v$,

(17)

Because $W$ is a continuous function of $x$ and $X$, as $x$ is varied, the resulting estimate is a continuous non-parametric curve capturing the local structure of the data. The matrix computations, in code, exploit the large diagonal matrix $W$; as each $W_{i,i}$ is computed, it is multiplied by row $x^{(i)}$, and stored in $WX$.

The matrix being inverted is poorly conditioned, because weakly related data points have little influence, so their contribution cannot be accurately numerically inverted. To more accurately compute the numerical inversion, one can perform a singular value decomposition, $(X^T W X) = U \Sigma V^T$. Then, numerical error during inversion can be avoided by using the $n$ singular values $\sigma_i$ with values of …, where the value of $C_{max}$ is chosen by cross validation. In this work, $C_{max} \approx 10$ was found to minimize numerical error, and was typically satisfied by $n = 1$. The inverse can be directly computed using the $n$ upper singular values in the diagonal matrix $\Sigma_n \in \mathbb{R}^{n \times n}$, and the corresponding singular vectors, in $U_n \in \mathbb{R}^{m \times n}$ and $V_n \in \mathbb{R}^{m \times n}$. Thus, the stochastic Markov model becomes

(18)

Next, model-based reinforcement learning is implemented, incorporating the stochastic Markov model, to design a controller. A quadratic reward function is used,

(19)

where $R: \mathbb{R}^{2 n_s} \to \mathbb{R}$, $C_1 > 0$ and $C_2 > 0$ are constants giving reward for accurate tracking and good damping respectively, and … is the reference state desired for the system.

The control policy maps the observed state $S$ onto the input command $u$. In this work, the state space has the constraint of $r_z \ge 0$, and the input command has the constraint of $0 \le u \le u_{max}$. The control policy is chosen to be

(20)

where $w \in \mathbb{R}^{n_c}$ is the vector of policy coefficients $w_1, \ldots, w_{n_c}$. Linear functions were sufficient to achieve good stability and performance. Additional terms, such as battery level and integral of altitude error, could be included to make the policy more resilient to differing flight conditions. Policy iteration is performed as explained in Algorithm 1. The algorithm aims to find the value of $w$ that yields the greatest total reward $R_{total}$, as determined by simulating the system over a finite hori…
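The random-search policy iteration that Algorithm 1 describes can be sketched in a few lines. Everything concrete in the sketch below is an assumption made for illustration: `toy_model` is a hypothetical noisy double integrator standing in for the learned LWLR model, the linear policy and quadratic reward are plausible forms rather than the paper's Equations (19) and (20), references are constants rather than trajectories, altitude is taken positive upward, and a fixed iteration count replaces the convergence test.

```python
import random

U_MAX = 2.0  # assumed input limit; u = 1 balances gravity in these units


def policy(state, ref, w):
    """Assumed linear policy on altitude error and rate, clipped to [0, U_MAX]."""
    z, vz = state
    u = w[0] + w[1] * (z - ref) + w[2] * vz
    return min(max(u, 0.0), U_MAX)


def toy_model(state, u, rng, dt=0.1):
    """Hypothetical stand-in for the learned stochastic model."""
    z, vz = state
    vz_next = vz + dt * (u - 1.0 + rng.gauss(0.0, 0.05))
    z_next = max(z + dt * vz_next, 0.0)  # altitude cannot go below ground
    return (z_next, vz_next)


def reward(state, ref, c1=1.0, c2=0.1):
    """Assumed quadratic reward: penalise tracking error and vertical rate."""
    z, vz = state
    return -c1 * (z - ref) ** 2 - c2 * vz ** 2


def total_reward(w, starts, refs, steps=50, noise_seed=0):
    """Simulate every (start, reference) pair with identical noise each call."""
    rng = random.Random(noise_seed)
    total = 0.0
    for s0 in starts:
        for ref in refs:
            state = s0
            for _ in range(steps):
                state = toy_model(state, policy(state, ref, w), rng)
                total += reward(state, ref)
    return total


def policy_search(w_init, iterations=200, step=0.2, seed=1):
    """Keep the best weights found; perturb them with Gaussian noise each pass."""
    rng = random.Random(seed)
    starts = [(rng.uniform(0.0, 2.0), 0.0) for _ in range(4)]
    refs = [rng.uniform(0.5, 1.5) for _ in range(3)]
    w_best = list(w_init)
    r_init = r_best = total_reward(w_best, starts, refs)
    for _ in range(iterations):
        w = [wi + rng.gauss(0.0, step) for wi in w_best]
        r = total_reward(w, starts, refs)
        if r > r_best:
            w_best, r_best = w, r
    return w_best, r_best, r_init


if __name__ == "__main__":
    print(policy_search([1.0, 0.0, 0.0]))
```

Reseeding the noise generator inside `total_reward` keeps the evaluation set identical across candidate policies, so that two reward totals differ only because the weights differ.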

Algorithm 1 Model-Based Reinforcement Learning
1: Generate set S0 of random initial states
2: Generate set T of random reference trajectories
3: Initialize w to reasonable values
4: R_best ← −∞, w_best ← w
5: repeat
6:   R_total ← 0
7:   for s0 ∈ S0, t ∈ T do
8:     S(0) ← s0
9:     for t = 0 to t_max − 1 do
10:      u(t) ← π(S(t), w)
11:      S(t+1) ← LWLR(S(t), u(t)) + v
12:      R_total ← R_total + R(S(t+1))
13:    end for
14:  end for
15:  if R_total > R_best then
16:    R_best ← R_total, w_best ← w
17:  end if
18:  Add Gaussian random vector to w_best, store as w
19: until w_best converges

In policy iteration, a fixed set of random initial conditions and reference trajectories is used to simulate flights at each iteration, with a given policy parameterized by $w$. It is necessary to use the same random set at each iteration in order for convergence to be possible [15]. After each iteration, the new value of $w$ is stored as $w_{best}$ if it outperforms the previous best policy, as determined by comparing $R_{total}$ to $R_{best}$, the previous best reward encountered. Then, a Gaussian random vector i…

By using a Gaussian update rule for the policy weights, $w$, it is possible to escape local maxima of $R_{total}$. The highest probability steps are small, and result in refinement of a solution near a local maximum of $R_{total}$. However, if the algorithm is not at the global maximum, and is allowed to continue, there exists a finite probability that a sufficiently large Gaussian step will be performed such that the algorithm can keep ascending.

VI. FLIGHT TEST RESULTS

A. Integral Sliding Mode

The results of an outdoor flight test with ISM control can be seen in Figure 4. The response time is on the order of 1-2 seconds, with 5 seconds settling time, and little to no steady state offset. Also, an oscillatory character can be seen in the response, which is most likely being triggered by the nonlinear aerodynamic effects and sensor data spikes described earlier.

Fig. 4. Integral sliding mode step response in outdoor flight test.

Compared to linear control design techniques implemented on the aircraft, the ISM control proves a significant enhancement. By explicitly incorporating bounds on the unknown disturbance forces in the derivation of the control law, it is possible to maintain stable altitude on a system that has evaded standard approaches.

B. Reinforcement Learning Control

One of the most exciting aspects of RL control design is its ease of implementation. The policy iteration algorithm arrived at the implemented control law after only 3 hours on a Pentium IV computer. Figure 5 presents flight test results for the controller. The high fidelity model of the system, used for RL control design, provides a useful tool for comparison of the RL control law with other controllers. In fact, in simulation with linear controllers that proved unstable on the quadrotor, flight p…

The locally weighted linear regression model showed many relations that were not reflected in the linear model, but that reflect the physics of the system well. For instance, with all other states held fixed, an upward velocity results in more acceleration at the subsequent time step for a given throttle level, and a downward velocity yields the opposite effect. This is essentially negative damping. The model also shows a strong ground effect. That is, with all other states held fixed, the closer the vehi…

Fig. 5. Reinforcement learning controller response to manually applied step input, in outdoor flight test. Spikes in state estimates are from sensor noise passing through the Kalman filter.

The reinforcement learning control law is susceptible to system disturbances for which it is not trained. In particular, varying battery levels and blade degradation may cause a reduction in stability or steady state offset. Addition of an integral error term to the control policy may prove an effective means of mitigating steady state dis…
