若$c=0$,那么显然最优策略是选择$t=\infty$并等待最大点数出现。若$c>0$,则选择$t$时的期望收益$E_t$可以递归计算:$E_0=0$,而
$$E_t=\frac{1}{d}\sum_{i=1}^d\max(r_i, E_{t-1}),$$
其中$r_i$是骰子第$i$面上的点数。这是因为掷出点数后只需比较所得点数和进行$t-1$轮的期望收益就可以决定是否需要继续。
递推求出解即得结果$147668794$。
注:因使用numpy求矩阵逆,以下代码为Python 3。
from numpy import linalg
d=20
def E(a,c):
if len(a)==0:
return 0
if c<=0:
return max(a)
x=sum(a)/len(a)
y=x-c
z=0
t=1
while y>z:
t+=1
z=y
x=sum(max(k,x) for k in a)/len(a)
y=x-c*t
return z
S=[[[]], [[1],[2],[3]], [[1,2],[2,3],[1,3]], [[1,2,3]]]+[[] for k in range(d-3)]
P=[[sum(E(j,k) for j in S[i]) for i in range(4)]+[0]*(d-3) for k in range(d+1)]
res=0
for n in range(4,d+1):
mat=[[1 if i==j else 0 for j in range(n)] for i in range(n)]
for i in range(1,n):
mat[i][i-1]=(i-n)/n
for i in range(n-1):
mat[i][i+1]=-(i+2)/n
for j in range(n-1,-1,-1):
for k in S[j]:
S[j+1].append(k+[n])
for i in range(d+1):
P[i][j+1]+=E(k+[n],i)
res+=sum(linalg.solve(mat,P[c][1:n+1])[-1] for c in range(n))
print(n,res)
print(round(res))
|