A Simple, Practical Beginner's Tutorial on Implementing RNNs with TensorFlow

Author: AI研習社

2017-05-07 19:31

Summary: a hands-on guide to implementing an RNN with TensorFlow.

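Leiphone (雷鋒網) note: this article was written by Liu Chong (劉沖) and originally published on the author's personal blog. Leiphone has been authorized to republish it.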

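I have recently been studying RNN models. To keep things simple, this post uses a simple binary sequence as training data instead of reproducing a specific paper. The main goal is to understand how RNNs work and how to build a simple, basic model architecture in TensorFlow. The code draws on this blog post.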

Dataset

First, let's look at how the experimental data is constructed:

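Input X: at time t, Xt is 1 with probability 50% and 0 with probability 50%.
Output Y: at time t, Yt is 1 with probability 50% and 0 with probability 50%, except that if `Xt-3 == 1`, the probability that Yt is 1 increases by 50 percentage points (to 100%), and if `Xt-8 == 1`, it decreases by 25 percentage points (to 25%). If both conditions hold, the probability that Yt is 1 is 75%.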
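The rule for Y can be written as a small Python function. This is a minimal sketch, and `p_y_is_1` is a hypothetical helper name that does not appear in the original code. It prints P(Yt = 1) for each of the four combinations of Xt-3 and Xt-8:

```python
from itertools import product

def p_y_is_1(x_t3: int, x_t8: int) -> float:
    """P(Y_t = 1) given X_{t-3} and X_{t-8}, following the rule described above."""
    p = 0.5
    if x_t3 == 1:
        p += 0.5   # X_{t-3} == 1 raises the probability by 0.5
    if x_t8 == 1:
        p -= 0.25  # X_{t-8} == 1 lowers it by 0.25
    return p

for x_t3, x_t8 in product((0, 1), repeat=2):
    print(f"X(t-3)={x_t3}, X(t-8)={x_t8} -> P(Y_t=1) = {p_y_is_1(x_t3, x_t8)}")
```

Because Xt is a fair coin, each of the four rows occurs with probability 0.25.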

So Y depends on X in two ways: on X at t-3 and on X at t-8. The experiment tests whether an RNN can capture these two dependencies. We use cross-entropy as the evaluation metric, which gives three ideal outcomes:

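If the RNN learns neither dependency, its best prediction is the marginal probability that Yt is 1, which is 0.625 (0.5 + 0.5*0.5 - 0.5*0.25). The cross-entropy should then be about 0.66 (-(0.625 * np.log(0.625) + 0.375 * np.log(0.375))).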

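If the RNN learns only the first dependency, it knows two things. When Xt-3 is 1, Yt is 1 with probability 0.875, averaged over the unknown Xt-8. When Xt-3 is 0, Yt is 1 with probability 0.375. Each case occurs half the time, so the cross-entropy should be about 0.52 (-0.5 * (0.875 * np.log(0.875) + 0.125 * np.log(0.125)) - 0.5 * (0.625 * np.log(0.625) + 0.375 * np.log(0.375))).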

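If the RNN learns both dependencies, three cases arise. With probability 0.25 its prediction is certain and correct (Yt is 1 with probability 1). With probability 0.5 it predicts Yt = 1 with probability 0.75 or 0.25. With the remaining probability 0.25 it can do no better than 0.5. The cross-entropy is therefore about 0.45 (-0.50 * (0.75 * np.log(0.75) + 0.25 * np.log(0.25)) - 0.25 * (2 * 0.50 * np.log(0.50)) - 0.25 * (0)).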
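The three ideal values can be reproduced with a short script. This is a sketch of the arithmetic, not the author's code. It enumerates the four equally likely (Xt-3, Xt-8) cases and assumes the model always predicts the true conditional probability given whatever it has learned. All cross-entropies are in nats (natural log), as with np.log above.

```python
import math

def binary_entropy(p: float) -> float:
    """Cross-entropy (nats) of a Bernoulli(p) label when the prediction is exactly p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))

# P(Y_t = 1) for each equally likely (X_{t-3}, X_{t-8}) case.
table = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 1.0, (1, 1): 0.75}

# 1) No dependency learned: always predict the marginal probability (0.625).
p_marginal = sum(table.values()) / 4
ce_none = binary_entropy(p_marginal)

# 2) Only the t-3 dependency learned: predict the average over the unknown X_{t-8}.
ce_first = sum(
    0.5 * binary_entropy((table[(x3, 0)] + table[(x3, 1)]) / 2) for x3 in (0, 1)
)

# 3) Both dependencies learned: predict each case's probability exactly.
ce_both = sum(0.25 * binary_entropy(p) for p in table.values())

print(round(ce_none, 2), round(ce_first, 2), round(ce_both, 2))  # 0.66 0.52 0.45
```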

Data preprocessing

This part generates the experimental data. It then splits and batches the data into the input format the RNN model expects. The code is as follows:

1. Generate the experimental data:

import numpy as np

def gen_data(size=1000000):
    X = np.array(np.random.choice(2, size=(size,)))
    Y = []
    for i in range(size):
        threshold = 0.5
        # Adjust the threshold if X[i-3] or X[i-8] is 1
        # (for i < 8, the negative indices wrap around to the end of X)
        if X[i-3] == 1:
            threshold += 0.5
        if X[i-8] == 1:
            threshold -= 0.25
        # Draw a random number and set Y[i] from the threshold: Y[i] = 1 with probability threshold
        if np.random.rand() > threshold:
            Y.append(0)
        else:
            Y.append(1)
    return X, np.array(Y)
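The Python loop is slow for a million samples. Below is a vectorized sketch that draws from the same distribution, though not the same random numbers. `gen_data_vectorized` is a hypothetical name. It uses `np.roll`, because `np.roll(X, k)[i] == X[i - k]` with wrap-around, which matches the negative indexing in the loop. The usage part also checks the empirical conditional frequencies against the rule for Y:

```python
import numpy as np

def gen_data_vectorized(size=1_000_000, seed=None):
    """Vectorized sketch of gen_data(): same distribution, different random draws."""
    rng = np.random.default_rng(seed)
    X = rng.integers(0, 2, size=size)
    # np.roll(X, k)[i] == X[i - k] (wrapping around), like X[i-3] / X[i-8] in the loop.
    x_t3 = np.roll(X, 3)
    x_t8 = np.roll(X, 8)
    threshold = 0.5 + 0.5 * x_t3 - 0.25 * x_t8
    # Loop version: Y = 0 if rand > threshold else 1, i.e. Y = 1 iff rand <= threshold.
    Y = (rng.random(size) <= threshold).astype(np.int32)
    return X, Y

X, Y = gen_data_vectorized(200_000, seed=0)
x_t3, x_t8 = np.roll(X, 3), np.roll(X, 8)
for a in (0, 1):
    for b in (0, 1):
        mask = (x_t3 == a) & (x_t8 == b)
        print(f"X(t-3)={a}, X(t-8)={b}: empirical P(Y=1) = {Y[mask].mean():.3f}")
```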

Next, split the generated data according to the model parameters. Two parameters matter here. batch_size is the batch size. num_steps is the number of time steps each rnn_cell is unrolled for, i.e. the n in Sn in the figure below. The code is as follows:

def gen_batch(raw_data, batch_size, num_steps):
    # raw_data is the (X, Y) pair returned by gen_data()
    raw_x, raw_y = raw_data
    data_length = len(raw_x)

    # First split the data into batch_size contiguous partitions:
    # [0, batch_partition_length), [batch_partition_length, 2*batch_partition_length), ...
    batch_partition_length = data_length // batch_size
    data_x = np.zeros([batch_size, batch_partition_length], dtype=np.int32)
    data_y = np.zeros([batch_size, batch_partition_length], dtype=np.int32)
    for i in range(batch_size):
        data_x[i] = raw_x[batch_partition_length * i:batch_partition_length * (i + 1)]
        data_y[i] = raw_y[batch_partition_length * i:batch_partition_length * (i + 1)]

    # The RNN processes only num_steps values at a time, so each partition is cut again into
    # epoch_size pieces of num_steps values. Note: this epoch_size is not the training epoch.
    epoch_size = batch_partition_length // num_steps

    # Batch i holds columns [i*num_steps, (i+1)*num_steps) of every partition, i.e. raw positions
    # [i*num_steps, (i+1)*num_steps), [batch_partition_length + i*num_steps, ...), ... (batch_size rows)
    for i in range(epoch_size):
        x = data_x[:, i * num_steps:(i + 1) * num_steps]
        y = data_y[:, i * num_steps:(i + 1) * num_steps]
        yield (x, y)

# n is the number of training epochs, i.e. how many times we loop over a freshly generated dataset
# (batch_size is read from the global hyperparameters)
def gen_epochs(n, num_steps):
    for i in range(n):
        yield gen_batch(gen_data(), batch_size, num_steps)
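The copy loop in gen_batch() places consecutive slices of length batch_partition_length in consecutive rows. A row-major `reshape` does exactly the same thing. Below is a sketch of that alternative (`gen_batch_reshape` is a hypothetical name, not from the original), run on a toy sequence of 20 positions with 2 rows and 3 steps per batch:

```python
import numpy as np

def gen_batch_reshape(raw_data, batch_size, num_steps):
    """Sketch: same batches as gen_batch(), built with reshape instead of a copy loop."""
    raw_x, raw_y = raw_data
    batch_partition_length = len(raw_x) // batch_size
    usable = batch_size * batch_partition_length  # trailing remainder is dropped, as in gen_batch
    data_x = np.asarray(raw_x[:usable]).reshape(batch_size, batch_partition_length)
    data_y = np.asarray(raw_y[:usable]).reshape(batch_size, batch_partition_length)
    epoch_size = batch_partition_length // num_steps
    for i in range(epoch_size):
        cols = slice(i * num_steps, (i + 1) * num_steps)
        yield data_x[:, cols], data_y[:, cols]

# Toy data: x = 0..19, y = 100..119, split into 2 rows of 10 and batches of 3 steps.
raw = (np.arange(20), np.arange(20) + 100)
batches = list(gen_batch_reshape(raw, batch_size=2, num_steps=3))
print(len(batches))         # 3 batches; column 9 of each row does not fit and is dropped
print(batches[0][0])        # first x batch: rows [0, 1, 2] and [10, 11, 12]
```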

As the code shows, the data is not batched in its original order. Each batch instead takes num_steps values from each of batch_size widely spaced segments. Whether this is done for convenience or for some other reason is something I still need to verify.
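One plausible reason is our own interpretation, not a conclusion from the original post. With this layout, row i of batch k+1 picks up exactly where row i of batch k stopped. The final hidden state of one batch is therefore a meaningful initial state for the next, and train_network() relies on this when it feeds training_state back in. The small NumPy check below mimics gen_batch()'s layout with a reshape:

```python
import numpy as np

# Toy layout like gen_batch(): 24 positions, 2 rows of 12, 4 steps per batch.
batch_size, num_steps = 2, 4
raw_x = np.arange(24)
data_x = raw_x.reshape(batch_size, -1)           # row i = positions [12*i, 12*i + 12)
batches = [data_x[:, k:k + num_steps] for k in range(0, data_x.shape[1], num_steps)]

# Gluing row i of batch 0, 1, 2 back together recovers one contiguous stretch of raw_x,
# so the state at the end of batch k is the right starting state for batch k+1.
for i in range(batch_size):
    print(i, np.concatenate([b[i] for b in batches]))
```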

Model construction

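We won't go into RNN theory in detail here. In short, the hidden state is concatenated with the input to compute the new hidden state and the output. We use a single-layer RNN. The equations and diagram are shown below: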

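$$S_t = \tanh\left(W\,[X_t ; S_{t-1}] + b_s\right)$$
$$P_t = \mathrm{softmax}\left(U S_t + b_p\right)$$

Here [X_t ; S_{t-1}] is the concatenation of the input X_t and the previous state S_{t-1}.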

[Figure: RNN structure diagram]
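The two equations can be made concrete with a single NumPy time step. This is a sketch for illustration only. `rnn_step` and the sizes below are hypothetical, and it uses the row-vector convention (batch along axis 0), matching `tf.matmul(tf.concat([rnn_input, state], 1), W) + b` in the TensorFlow code:

```python
import numpy as np

def rnn_step(x_t, s_prev, W, b_s, U, b_p):
    """One step: S_t = tanh([X_t, S_{t-1}] W + b_s), P_t = softmax(S_t U + b_p)."""
    s_t = np.tanh(np.concatenate([x_t, s_prev], axis=1) @ W + b_s)
    logits = s_t @ U + b_p
    logits = logits - logits.max(axis=1, keepdims=True)   # subtract row max for numerical stability
    p_t = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return s_t, p_t

# Hypothetical sizes: batch of 3, 2 classes (one-hot input), state of size 4.
rng = np.random.default_rng(0)
batch, num_classes, state_size = 3, 2, 4
W = rng.normal(size=(num_classes + state_size, state_size))
U = rng.normal(size=(state_size, num_classes))
b_s, b_p = np.zeros(state_size), np.zeros(num_classes)

x_t = np.eye(num_classes)[[0, 1, 1]]       # one-hot encoding of the inputs 0, 1, 1
s_prev = np.zeros((batch, state_size))     # zero initial state, like init_state
s_t, p_t = rnn_step(x_t, s_prev, W, b_s, U, b_p)
print(s_t.shape, p_t.shape)                # (3, 4) (3, 2)
```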

To build the RNN model in TensorFlow, we mainly define an rnn_cell and then reuse it at every time step. The code is as follows:

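import tensorflow as tf  # TensorFlow 1.x API (tf.placeholder, tf.Session, tf.contrib)

# Hyperparameters. batch_size = 200, num_steps = 5, num_classes = 2 and state_size = 4 follow
# the values used in this article; learning_rate = 0.1 is an assumed value, not given in the original.
batch_size = 200
num_steps = 5
num_classes = 2
state_size = 4
learning_rate = 0.1

x = tf.placeholder(tf.int32, [batch_size, num_steps], name='input_placeholder')
y = tf.placeholder(tf.int32, [batch_size, num_steps], name='labels_placeholder')
# Initial RNN state, all zeros. The state is concatenated with the input below, so it needs a
# batch dimension like the input: every sample has its own hidden state.
init_state = tf.zeros([batch_size, state_size])

# One-hot encode the input (two classes): [batch_size, num_steps, num_classes]
x_one_hot = tf.one_hot(x, num_classes)
# Unstack along num_steps so each time step gets its own input. Each cell therefore processes
# one whole batch (batch_size binary inputs) at a time.
rnn_inputs = tf.unstack(x_one_hot, axis=1)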

# Define the weights of rnn_cell
with tf.variable_scope('rnn_cell'):
    W = tf.get_variable('W', [num_classes + state_size, state_size])
    b = tf.get_variable('b', [state_size], initializer=tf.constant_initializer(0.0))

# Fetch the same variables in reuse mode, so every time step shares one set of parameters
def rnn_cell(rnn_input, state):
    with tf.variable_scope('rnn_cell', reuse=True):
        W = tf.get_variable('W', [num_classes + state_size, state_size])
        b = tf.get_variable('b', [state_size], initializer=tf.constant_initializer(0.0))
    # The cell's computation: the simplest (vanilla) RNN, not an LSTM
    return tf.tanh(tf.matmul(tf.concat([rnn_input, state], 1), W) + b)

state = init_state
rnn_outputs = []
# Loop num_steps times, i.e. feed one sequence through the RNN
for rnn_input in rnn_inputs:
    state = rnn_cell(rnn_input, state)
    rnn_outputs.append(state)
final_state = rnn_outputs[-1]

# Softmax layer
with tf.variable_scope('softmax'):
    W = tf.get_variable('W', [state_size, num_classes])
    b = tf.get_variable('b', [num_classes], initializer=tf.constant_initializer(0.0))
# Note: each of the num_steps outputs gets its own logits, followed by a softmax prediction
logits = [tf.matmul(rnn_output, W) + b for rnn_output in rnn_outputs]
predictions = [tf.nn.softmax(logit) for logit in logits]

# Turn our y placeholder into a list of labels
y_as_list = tf.unstack(y, num=num_steps, axis=1)

# Losses and train_step
losses = [tf.nn.sparse_softmax_cross_entropy_with_logits(labels=label, logits=logit)
          for logit, label in zip(logits, y_as_list)]
total_loss = tf.reduce_mean(losses)
train_step = tf.train.AdagradOptimizer(learning_rate).minimize(total_loss)
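`tf.nn.sparse_softmax_cross_entropy_with_logits` takes integer class labels (not one-hot) and unnormalized logits, and returns one loss per example, namely -log(softmax(logits)[label]). Below is our own NumPy re-implementation for illustration, not TensorFlow's code. It shows that total_loss is the average per-prediction cross-entropy in nats, the same quantity as the 0.66 / 0.52 / 0.45 targets from the dataset analysis:

```python
import numpy as np

def sparse_softmax_xent(labels, logits):
    """NumPy reference: -log(softmax(logits)[label]) for each row (natural log)."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return -np.take_along_axis(log_probs, labels[..., None], axis=-1)[..., 0]

# Hypothetical data: batch of 2 examples, 3 time steps, 2 classes.
rng = np.random.default_rng(1)
logits_list = [rng.normal(size=(2, 2)) for _ in range(3)]    # one [batch, classes] array per step
labels_list = [np.array([0, 1]), np.array([1, 1]), np.array([0, 0])]

losses = [sparse_softmax_xent(lab, lg) for lg, lab in zip(logits_list, labels_list)]
total_loss = np.mean(losses)   # like tf.reduce_mean(losses): averages over steps and batch

# With equal logits the loss is log(2), the cross-entropy of a 50/50 guess.
print(sparse_softmax_xent(np.array([1]), np.zeros((1, 2))))
```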

Model training

Once the model is defined, the next step is to feed in the data and train it. The code is as follows:

import matplotlib.pyplot as plt

def train_network(num_epochs, num_steps, state_size=4, verbose=True):
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        training_losses = []
        # Get the data; with num_epochs == 1 the outer loop runs only once
        for idx, epoch in enumerate(gen_epochs(num_epochs, num_steps)):
            training_loss = 0
            # Holds the final state of each run, which is fed in as the initial state of the next run
            training_state = np.zeros((batch_size, state_size))
            if verbose:
                print("\nEPOCH", idx)
            # Iterate over batches: 1000000 // 200 // 5 = 1000 steps per epoch, each feeding
            # batch_size * num_steps = 1000 values, 1000000 values in total
            for step, (X, Y) in enumerate(epoch):
                tr_losses, training_loss_, training_state, _ = \
                    sess.run([losses,
                              total_loss,
                              final_state,
                              train_step],
                             feed_dict={x: X, y: Y, init_state: training_state})
                training_loss += training_loss_
                if step % 100 == 0 and step > 0:
                    if verbose:
                        print("Average loss at step", step,
                              "for last 100 steps:", training_loss / 100)
                    training_losses.append(training_loss / 100)
                    training_loss = 0

    return training_losses

training_losses = train_network(1, num_steps)
plt.plot(training_losses)
plt.show()
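Passing training_state from one sess.run call to the next does not change the forward computation. Running 10 steps at once gives the same hidden states as running two 5-step chunks and carrying the state across. The NumPy sketch below (hypothetical `run_rnn` helper, random weights) checks this. Gradients are a different matter: the graph is unrolled for only num_steps steps, and the fed-in state is a constant, so backpropagation is truncated to each num_steps window.

```python
import numpy as np

def run_rnn(xs, s0, W, b):
    """Unrolled vanilla RNN over a [batch, steps, features] array; returns all states and the last."""
    s, states = s0, []
    for t in range(xs.shape[1]):
        s = np.tanh(np.concatenate([xs[:, t], s], axis=1) @ W + b)
        states.append(s)
    return np.stack(states, axis=1), s

rng = np.random.default_rng(2)
batch, steps, feat, state_size = 3, 10, 2, 4
W = rng.normal(size=(feat + state_size, state_size)) * 0.5
b = np.zeros(state_size)
xs = np.eye(feat)[rng.integers(0, feat, size=(batch, steps))]   # one-hot, [batch, steps, feat]
s0 = np.zeros((batch, state_size))

# One pass over all 10 steps ...
full_states, _ = run_rnn(xs, s0, W, b)
# ... versus two 5-step chunks, feeding the first chunk's final state into the second,
# the way train_network() feeds training_state back in as init_state.
states_a, last_a = run_rnn(xs[:, :5], s0, W, b)
states_b, _ = run_rnn(xs[:, 5:], last_a, W, b)

print(np.allclose(full_states, np.concatenate([states_a, states_b], axis=1)))  # True
```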

The experimental results are shown below:

[Figure: training loss curve, num_steps = 5]

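The figure shows that the cross-entropy eventually settles at about 0.52. By the analysis above, the RNN learned the first dependency. We chose num_steps = 5, so the network is only unrolled (and backpropagated) over 5 steps. It can therefore learn the t-3 dependency but not the t-8 one.
Next, we can try num_steps = 10 to capture the second dependency. The final result is shown below: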

[Figure: training loss curve, num_steps = 10]

The figure shows that the RNN learned both dependencies: the final cross-entropy settles around 0.46.

Improvements

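1. To explain as clearly as possible how an RNN is built in TensorFlow, the code above spells out the rnn_cell definition in detail. TensorFlow already wraps this up, however, and a single call is enough. The first improvement is therefore to simplify the code that defines the rnn_cell and reuses it across time steps: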

# Define the weights of rnn_cell
with tf.variable_scope('rnn_cell'):
    W = tf.get_variable('W', [num_classes + state_size, state_size])
    b = tf.get_variable('b', [state_size], initializer=tf.constant_initializer(0.0))

# Fetch the same variables in reuse mode, so every time step shares one set of parameters
def rnn_cell(rnn_input, state):
    with tf.variable_scope('rnn_cell', reuse=True):
        W = tf.get_variable('W', [num_classes + state_size, state_size])
        b = tf.get_variable('b', [state_size], initializer=tf.constant_initializer(0.0))
    # The cell's computation: the simplest (vanilla) RNN, not an LSTM
    return tf.tanh(tf.matmul(tf.concat([rnn_input, state], 1), W) + b)

state = init_state
rnn_outputs = []
# Loop num_steps times, i.e. feed one sequence through the RNN
for rnn_input in rnn_inputs:
    state = rnn_cell(rnn_input, state)
    rnn_outputs.append(state)
final_state = rnn_outputs[-1]

#---------------------- Above is the original code: it defines rnn_cell and reuses it in a loop. After simplification, we can simply call BasicRNNCell and static_rnn instead ------------------------

cell = tf.contrib.rnn.BasicRNNCell(state_size)
rnn_outputs, final_state = tf.contrib.rnn.static_rnn(cell, rnn_inputs, initial_state=init_state)

2. Use a dynamic RNN. In the model above, the input is a list: rnn_inputs has length num_steps, and each element is a [batch_size, features] tensor holding the data one rnn_cell processes. This is rather cumbersome. TensorFlow's dynamic_rnn function instead makes the code simpler and can also improve efficiency. With dynamic_rnn, the input is simply a 3-D tensor of shape [batch_size, num_steps, features]. The final dynamic RNN model code is as follows:

x = tf.placeholder(tf.int32, [batch_size, num_steps], name='input_placeholder')
y = tf.placeholder(tf.int32, [batch_size, num_steps], name='labels_placeholder')
init_state = tf.zeros([batch_size, state_size])

rnn_inputs = tf.one_hot(x, num_classes)
# Note that the line below has been removed: we no longer need to turn the input into a list and loop over it
#rnn_inputs = tf.unstack(x_one_hot, axis=1)

cell = tf.contrib.rnn.BasicRNNCell(state_size)
# Use dynamic_rnn to build the RNN dynamically
rnn_outputs, final_state = tf.nn.dynamic_rnn(cell, rnn_inputs, initial_state=init_state)

with tf.variable_scope('softmax'):
    W = tf.get_variable('W', [state_size, num_classes])
    b = tf.get_variable('b', [num_classes], initializer=tf.constant_initializer(0.0))
logits = tf.reshape(
    tf.matmul(tf.reshape(rnn_outputs, [-1, state_size]), W) + b,
    [batch_size, num_steps, num_classes])
predictions = tf.nn.softmax(logits)

losses = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)
total_loss = tf.reduce_mean(losses)
train_step = tf.train.AdagradOptimizer(learning_rate).minimize(total_loss)
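By default, dynamic_rnn returns rnn_outputs with shape [batch_size, num_steps, state_size]. The reshape in the softmax layer merges the batch and time dimensions, applies one matrix multiply, and restores the shape. The NumPy sketch below, using random stand-in values, checks that this gives the same logits as applying the softmax weights separately at every time step, as the static version did:

```python
import numpy as np

rng = np.random.default_rng(3)
batch_size, num_steps, state_size, num_classes = 4, 5, 3, 2
rnn_outputs = rng.normal(size=(batch_size, num_steps, state_size))  # stand-in for dynamic_rnn output
W = rng.normal(size=(state_size, num_classes))
b = rng.normal(size=num_classes)

# Reshape trick: flatten batch and time, one matmul, then restore [batch, steps, classes].
logits = (rnn_outputs.reshape(-1, state_size) @ W + b).reshape(batch_size, num_steps, num_classes)

# Same result as the static version's per-step softmax layer.
per_step = np.stack([rnn_outputs[:, t] @ W + b for t in range(num_steps)], axis=1)
print(logits.shape, np.allclose(logits, per_step))   # (4, 5, 2) True
```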

With that, we have built a very simple RNN model. The three main points to keep in mind are:

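How to convert the data into the input format the RNN accepts, paying attention to the relationship between batch_size and num_steps.
Defining the rnn_cell, here with the following command: cell = tf.contrib.rnn.BasicRNNCell(state_size)
Defining the RNN model, which can be built statically or dynamically with the following two commands, respectively: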

rnn_outputs, final_state = tf.contrib.rnn.static_rnn(cell, rnn_inputs, initial_state=init_state)
rnn_outputs, final_state = tf.nn.dynamic_rnn(cell, rnn_inputs, initial_state=init_state)

Leiphone copyrighted article. Reproduction without authorization is prohibited.
