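TinyML is a frontier branch of machine learning that aims to deploy machine learning models on ultra-low-power, resource-constrained edge devices (MCUs). The goal is edge AI: making machine learning truly accessible and everyday life truly smart. Put simply, it means running deep learning on a microcontroller. That sounds incredible, because most people picture AI as needing massive compute and a lot of power. TinyML is a promising start for making low-power AI widespread.
The project introduced below is about the simplest introductory TinyML project there is. Small as it is, it has all the vital parts: it covers every step a basic TinyML project needs. We train a neural network to fit a sine wave, then deploy it to an ESP32 to drive a breathing LED. It sounds clumsy and is not very practical, since a breathing LED normally takes about a dozen lines of code, but the point is to get started with TinyML. In the end, a model we trained ourselves really does run on the microcontroller as offline AI. This breathing LED is unlike any other, and finishing it is very satisfying.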
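Without further ado, every TinyML project consists of three steps:
- Data collection and processing
- Model training and export
- Model deployment and application code
Each step is explained below with complete code; just copy the code and run it.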
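Data collection and processing
Because we will be using TensorFlow and want to inspect the results step by step, we open Colab. Machine learning projects are best written as IPython notebooks, which makes them easier to debug and understand.
PS: Normally the data would be collected by the microcontroller, that is, at the edge. To keep this project simple, we generate the data ourselves and then fit it.
Importing packages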
# TensorFlow is an open source machine learning library
!pip install tensorflow==2.0
import tensorflow as tf
# Numpy is a math library
import numpy as np
# Matplotlib is a graphing library
import matplotlib.pyplot as plt
# math is Python's math library
import math
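Generating the sine wave data
Because this project trains a model to fit a sine wave, we first generate some ideal data and then add a little noise so that it resembles real-world data. Our model can then fit that data and, ideally, produce a clean sine wave, which we deploy to the microcontroller to control an LED as a breathing light.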
# We'll generate this many sample datapoints
SAMPLES = 1000
# Set a "seed" value, so we get the same random numbers each time we run this
# notebook. Any number can be used here.
SEED = 1337
np.random.seed(SEED)
tf.random.set_seed(SEED)
# Generate a uniformly distributed set of random numbers in the range from
# 0 to 2π, which covers a complete sine wave oscillation
x_values = np.random.uniform(low=0, high=2*math.pi, size=SAMPLES)
# Shuffle the values to guarantee they're not in order
np.random.shuffle(x_values)
# Calculate the corresponding sine values
y_values = np.sin(x_values)
# Add a small random number to each y value
y_values += 0.1 * np.random.randn(*y_values.shape)
# Plot our data
plt.plot(x_values, y_values, 'b.')
plt.show()
Splitting the dataset
We now split the data into training, validation, and test sets. If you are not familiar with these concepts, have a look at my blog series "No-Nonsense Machine Learning Notes".
# We'll use 60% of our data for training and 20% for testing. The remaining 20%
# will be used for validation. Calculate the indices of each section.
TRAIN_SPLIT = int(0.6 * SAMPLES)
TEST_SPLIT = int(0.2 * SAMPLES + TRAIN_SPLIT)
x_train, x_validate, x_test = np.split(x_values, [TRAIN_SPLIT, TEST_SPLIT])
y_train, y_validate, y_test = np.split(y_values, [TRAIN_SPLIT, TEST_SPLIT])
plt.plot(x_train, y_train, 'b.', label="Train")
plt.plot(x_validate, y_validate, 'y.', label="Validate")
plt.plot(x_test, y_test, 'r.', label="Test")
plt.legend()
plt.show()
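To make the split indices concrete, here is a minimal sketch that repeats the same `np.split` call on stand-in data (the integers 0 to 999 instead of the noisy sine samples; the `data` array is made up for illustration) and prints the size of each part.

```python
import numpy as np

SAMPLES = 1000
TRAIN_SPLIT = int(0.6 * SAMPLES)               # index where the training part ends
TEST_SPLIT = int(0.2 * SAMPLES + TRAIN_SPLIT)  # index where the second part ends

# Stand-in data for illustration only
data = np.arange(SAMPLES)

# np.split cuts the array at the given indices, yielding three consecutive chunks
train, validate, test = np.split(data, [TRAIN_SPLIT, TEST_SPLIT])

print(len(train), len(validate), len(test))
```

Because the x values were shuffled before the split, taking consecutive chunks like this still gives each set a random sample of the whole range.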
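Training model 1
We will build a model that takes an input value (here, x) and uses it to predict a numeric output value (the sine of x). This type of problem is called regression. To solve it, we create a simple neural network. It uses layers of neurons to try to learn whatever patterns underlie the training data, so that it can make predictions. To start, we define two layers.
The first layer takes a single input (our x value) and runs it through 16 neurons. Based on this input, each neuron becomes activated to a certain degree according to its internal state (its weight and bias values). A neuron's degree of activation is expressed as a number. The activation values from the first layer are fed as inputs to the second layer, which is a single neuron. It applies its own weights and bias to these inputs and computes its own activation, which is output as our y value.
The code in the cell below defines our model using Keras, TensorFlow's high-level API for creating deep learning networks. Once the network is defined, we compile it, specifying parameters that determine how it will be trained.
Creating model 1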
# We'll use Keras to create a simple model architecture
from tensorflow.keras import layers
model_1 = tf.keras.Sequential()
# First layer takes a scalar input and feeds it through 16 "neurons". The
# neurons decide whether to activate based on the 'relu' activation function.
model_1.add(layers.Dense(16, activation='relu', input_shape=(1,)))
# Final layer is a single neuron, since we want to output a single value
model_1.add(layers.Dense(1))
# Compile the model using a standard optimizer and loss function for regression
model_1.compile(optimizer='rmsprop', loss='mse', metrics=['mae'])
# Print a summary of the model's architecture
model_1.summary()
We can see that this network has only two layers, which is about as basic as it gets. Let's see how well it fits.
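The computation these two layers perform can be sketched directly in NumPy. This is only an illustration with hypothetical, untrained parameters (`W1`, `b1`, `W2`, `b2` and `forward` are names invented here, not taken from Keras); it shows the shapes involved and where the parameter count comes from.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical, untrained parameters with the same shapes as the two Dense layers
W1 = rng.normal(size=(1, 16))   # first layer: 1 input -> 16 neurons
b1 = np.zeros(16)
W2 = rng.normal(size=(16, 1))   # second layer: 16 activations -> 1 output
b2 = np.zeros(1)

def forward(x):
    """Dense(16, relu) followed by Dense(1) for a batch of scalar inputs."""
    x = np.asarray(x, dtype=np.float32).reshape(-1, 1)
    hidden = np.maximum(0.0, x @ W1 + b1)   # ReLU activation of each neuron
    return hidden @ W2 + b2                 # linear output, one value per input

# 16 weights + 16 biases in the first layer, 16 weights + 1 bias in the second
n_params = W1.size + b1.size + W2.size + b2.size
print(forward([0.0, 1.5, 3.0]).shape, n_params)
```

With a single ReLU layer, the output is a piecewise-linear function of x, which is worth keeping in mind when looking at how well this model fits.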
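Running the training for model 1
Once the model is defined, we can use our data to train it. Training involves passing an x value into the network, checking how far the network's output deviates from the expected y value, and adjusting the neurons' weights and biases so that the output is more likely to be correct next time. Training runs this process over the full dataset many times, and each full pass is called an epoch.
The number of epochs to run is a parameter we can set. During each epoch, the data passes through the network in multiple batches. For each batch, several pieces of data are fed into the network, producing output values. The correctness of these outputs is measured in aggregate, and the network's weights and biases are adjusted accordingly, once per batch. The batch size is also a parameter we can set. The code in the cell below trains the model with the x and y values from our training data. It runs for 1000 epochs with 16 pieces of data per batch. We also pass in some data to use for validation. And yes, in code it is just one call.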
# Train the model on our training data while validating on our validation set
history_1 = model_1.fit(x_train, y_train, epochs=1000, batch_size=16,
                        validation_data=(x_validate, y_validate))
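As a rough sense of scale, the following sketch works out how many weight updates these settings imply, assuming the 600-sample training set from the 60% split and that the final, partially filled batch is still used.

```python
import math

train_samples = 600   # 60% of the 1000 generated samples
batch_size = 16
epochs = 1000

# Each epoch walks through the training set once, one batch per weight update
steps_per_epoch = math.ceil(train_samples / batch_size)
total_updates = steps_per_epoch * epochs

# The last batch of an epoch holds whatever is left over
last_batch = train_samples - (steps_per_epoch - 1) * batch_size

print(steps_per_epoch, total_updates, last_batch)
```

A smaller batch size means more, noisier updates per epoch; a larger one means fewer, smoother updates.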
Checking the training metrics
During training, the model's performance is continually measured against both the training data and the validation data we set aside earlier. Training produces a log that tells us how the model's performance changed over the course of training.
# Draw a graph of the loss, which is the distance between
# the predicted and actual values during training and validation.
loss = history_1.history['loss']
val_loss = history_1.history['val_loss']
epochs = range(1, len(loss) + 1)
plt.plot(epochs, loss, 'g.', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()
Let's take a closer look at the data:
# Exclude the first few epochs so the graph is easier to read
SKIP = 100
plt.plot(epochs[SKIP:], loss[SKIP:], 'g.', label='Training loss')
plt.plot(epochs[SKIP:], val_loss[SKIP:], 'b.', label='Validation loss')
plt.title('Training and validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()
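This looks like clear overfitting: the model does worse on held-out data than on the training data, meaning it has fit the training data but falls flat when actually applied.
To gain more insight into the model's performance, we can plot some more data. Below we plot the mean absolute error (MAE), which is another way of measuring how far the network's predictions are from the actual numbers: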
# Draw a graph of mean absolute error, which is another way of
# measuring the amount of error in the prediction.
mae = history_1.history['mae']
val_mae = history_1.history['val_mae']
plt.plot(epochs[SKIP:], mae[SKIP:], 'g.', label='Training MAE')
plt.plot(epochs[SKIP:], val_mae[SKIP:], 'b.', label='Validation MAE')
plt.title('Training and validation mean absolute error')
plt.xlabel('Epochs')
plt.ylabel('MAE')
plt.legend()
plt.show()
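For reference, the two metrics tracked during training can be computed by hand as follows. This is a small sketch on made-up numbers (`y_true` and `y_pred` are illustrative values, not outputs of the model above).

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: the loss the model is trained to minimise."""
    diff = np.asarray(y_true) - np.asarray(y_pred)
    return float(np.mean(diff ** 2))

def mae(y_true, y_pred):
    """Mean absolute error: the average distance between prediction and target."""
    diff = np.asarray(y_true) - np.asarray(y_pred)
    return float(np.mean(np.abs(diff)))

# Made-up targets and predictions for illustration
y_true = np.array([0.0, 1.0, 0.0, -1.0])
y_pred = np.array([0.1, 0.7, 0.0, -0.5])

print(mse(y_true, y_pred), mae(y_true, y_pred))
```

MAE is in the same units as y, so it is easier to read against a sine wave whose values lie between -1 and 1; MSE penalises large misses more heavily.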
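We can see that even after 1000 epochs of training the error is still around 30% (an MAE of roughly 0.3), which is far too large. Let's plot the fitted curve to see how far off it is: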
# Use the model to make predictions from our training data
predictions = model_1.predict(x_train)
# Plot the predictions along with the training data
plt.clf()
plt.title('Training data predicted vs actual values')
plt.plot(x_train, y_train, 'b.', label='Actual')
plt.plot(x_train, predictions, 'r.', label='Predicted')
plt.legend()
plt.show()
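This plot makes it clear that our network has learned to approximate the sine function only in a very limited way. The predictions are highly linear and fit the data only very roughly. The rigidity of this fit suggests that the model does not have enough capacity to learn the full complexity of the sine wave, so it can only approximate it in an overly simplistic way. By making the model bigger, we should be able to improve its performance.
Training model 2
Having learned that lesson, we know the network must not be too simple: it needs at least 3 layers.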
model_2 = tf.keras.Sequential()
# First layer takes a scalar input and feeds it through 16 "neurons". The
# neurons decide whether to activate based on the 'relu' activation function.
model_2.add(layers.Dense(16, activation='relu', input_shape=(1,)))
# The new second layer may help the network learn more complex representations
model_2.add(layers.Dense(16, activation='relu'))
# Final layer is a single neuron, since we want to output a single value
model_2.add(layers.Dense(1))
# Compile the model using a standard optimizer and loss function for regression
model_2.compile(optimizer='rmsprop', loss='mse', metrics=['mae'])
# Show a summary of the model
model_2.summary()
history_2 = model_2.fit(x_train, y_train, epochs=600, batch_size=16,
                        validation_data=(x_validate, y_validate))
# Draw a graph of the loss, which is the distance between
# the predicted and actual values during training and validation.
loss = history_2.history['loss']
val_loss = history_2.history['val_loss']
epochs = range(1, len(loss) + 1)
plt.plot(epochs, loss, 'g.', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()
# Exclude the first few epochs so the graph is easier to read
SKIP = 80
plt.clf()
plt.plot(epochs[SKIP:], loss[SKIP:], 'g.', label='Training loss')
plt.plot(epochs[SKIP:], val_loss[SKIP:], 'b.', label='Validation loss')
plt.title('Training and validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()
plt.clf()
# Draw a graph of mean absolute error, which is another way of
# measuring the amount of error in the prediction.
mae = history_2.history['mae']
val_mae = history_2.history['val_mae']
plt.plot(epochs[SKIP:], mae[SKIP:], 'g.', label='Training MAE')
plt.plot(epochs[SKIP:], val_mae[SKIP:], 'b.', label='Validation MAE')
plt.title('Training and validation mean absolute error')
plt.xlabel('Epochs')
plt.ylabel('MAE')
plt.legend()
plt.show()
Following the same steps as before, we can see that the error is now quite good.
# Calculate and print the loss on our test dataset
loss = model_2.evaluate(x_test, y_test)
# Make predictions based on our test dataset
predictions = model_2.predict(x_test)
# Graph the predictions against the actual values
plt.clf()
plt.title('Comparison of predictions and actual values')
plt.plot(x_test, y_test, 'b.', label='Actual')
plt.plot(x_test, predictions, 'r.', label='Predicted')
plt.legend()
plt.show()
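Exporting the model (TensorFlow Lite)
The model is now trained, but in general a normally trained deep learning model cannot be deployed to a microcontroller as-is because it is too large, so we use the TensorFlow Lite converter. The converter outputs a file in a special, space-efficient format for use on memory-constrained devices. Since this model will be deployed on a microcontroller, we want it to be as small as possible! Quantization is one technique for reducing model size. It lowers the precision of the model's weights, which saves memory.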
# Convert the model to the TensorFlow Lite format without quantization
converter = tf.lite.TFLiteConverter.from_keras_model(model_2)
tflite_model = converter.convert()
# Save the model to disk
open("sine_model.tflite", "wb").write(tflite_model)
# Convert the model to the TensorFlow Lite format with quantization
converter = tf.lite.TFLiteConverter.from_keras_model(model_2)
# Indicate that we want to perform the default optimizations,
# which includes quantization
converter.optimizations = [tf.lite.Optimize.DEFAULT]
# Define a generator function that provides our test data's x values
# as a representative dataset, and tell the converter to use it
def representative_dataset_generator():
    for value in x_test:
        # Each scalar value must be inside of a 2D array that is wrapped in a list
        yield [np.array(value, dtype=np.float32, ndmin=2)]
converter.representative_dataset = representative_dataset_generator
# Convert the model
tflite_model = converter.convert()
# Save the model to disk
open("sine_model_quantized.tflite", "wb").write(tflite_model)
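To illustrate what "lowering the precision of the weights" means, here is a minimal sketch of 8-bit quantization in NumPy. It is a simplified illustration, not the converter's actual algorithm, whose details differ; `quantize_int8`, `dequantize` and the random `weights` array are all invented for this example.

```python
import numpy as np

def quantize_int8(values):
    """Map float values onto int8 using a scale and a zero point."""
    lo, hi = float(values.min()), float(values.max())
    scale = (hi - lo) / 255.0                    # float width of one int8 step
    zero_point = int(round(-128 - lo / scale))   # int8 code that represents 0.0
    q = np.round(values / scale) + zero_point
    return np.clip(q, -128, 127).astype(np.int8), scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate float values from the int8 codes."""
    return (q.astype(np.float32) - zero_point) * scale

# Made-up float32 "weights" for illustration
weights = np.random.default_rng(0).normal(size=256).astype(np.float32)

q, scale, zero_point = quantize_int8(weights)
restored = dequantize(q, scale, zero_point)
max_error = float(np.max(np.abs(weights - restored)))

print(weights.nbytes, q.nbytes, max_error <= scale)
```

Storage drops from 4 bytes to 1 byte per value, and each restored value stays within about one quantization step of the original, which is why a small model like this one can tolerate the reduced precision.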
After conversion we might worry that accuracy has dropped. In practice the error barely changes. Try the code below to compare the converted models with the original: the predictions come out nearly the same and remain accurate.
# Instantiate an interpreter for each model
sine_model = tf.lite.Interpreter('sine_model.tflite')
sine_model_quantized = tf.lite.Interpreter('sine_model_quantized.tflite')
# Allocate memory for each model
sine_model.allocate_tensors()
sine_model_quantized.allocate_tensors()
# Get indexes of the input and output tensors
sine_model_input_index = sine_model.get_input_details()[0]["index"]
sine_model_output_index = sine_model.get_output_details()[0]["index"]
sine_model_quantized_input_index = sine_model_quantized.get_input_details()[0]["index"]
sine_model_quantized_output_index = sine_model_quantized.get_output_details()[0]["index"]
# Create arrays to store the results
sine_model_predictions = []
sine_model_quantized_predictions = []
# Run each model's interpreter for each value and store the results in arrays
for x_value in x_test:
    # Create a 2D tensor wrapping the current x value
    x_value_tensor = tf.convert_to_tensor([[x_value]], dtype=np.float32)
    # Write the value to the input tensor
    sine_model.set_tensor(sine_model_input_index, x_value_tensor)
    # Run inference
    sine_model.invoke()
    # Read the prediction from the output tensor
    sine_model_predictions.append(
        sine_model.get_tensor(sine_model_output_index)[0])
    # Do the same for the quantized model
    sine_model_quantized.set_tensor(sine_model_quantized_input_index, x_value_tensor)
    sine_model_quantized.invoke()
    sine_model_quantized_predictions.append(
        sine_model_quantized.get_tensor(sine_model_quantized_output_index)[0])
# See how they line up with the data
plt.clf()
plt.title('Comparison of various models against actual values')
plt.plot(x_test, y_test, 'bo', label='Actual')
plt.plot(x_test, predictions, 'ro', label='Original predictions')
plt.plot(x_test, sine_model_predictions, 'bx', label='Lite predictions')
plt.plot(x_test, sine_model_quantized_predictions, 'gx', label='Lite quantized predictions')
plt.legend()
plt.show()
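The final step in preparing the model for use with TensorFlow Lite for Microcontrollers is to convert it into a C (or .h) source file. For this we can use a command-line utility called xxd. The cell below runs xxd on the quantized model and prints the output: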
# Install xxd if it is not available
!apt-get -qq install xxd
# Save the file as a C source file
!xxd -i sine_model_quantized.tflite > sine_model_quantized.cc
# Print the source file
!cat sine_model_quantized.cc
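If xxd is not available, roughly the same output can be produced in plain Python. The sketch below is a hedged approximation of what `xxd -i` emits, not a drop-in replacement; `to_c_array` is a helper invented here, and the sample bytes are made up rather than read from a real model file.

```python
def to_c_array(data: bytes, name: str) -> str:
    """Render bytes as a C array definition, similar in spirit to `xxd -i`."""
    rows = []
    for i in range(0, len(data), 12):          # 12 bytes per line
        chunk = data[i:i + 12]
        rows.append("  " + ", ".join(f"0x{b:02x}" for b in chunk))
    body = ",\n".join(rows)
    return (f"unsigned char {name}[] = {{\n{body}\n}};\n"
            f"unsigned int {name}_len = {len(data)};\n")

# Made-up bytes standing in for the contents of the .tflite file
sample = bytes([0x1c, 0x00, 0x54, 0x46])
source = to_c_array(sample, "sine_model_quantized_tflite")
print(source)
```

For a real model, the bytes would come from reading `sine_model_quantized.tflite` in binary mode; the array and its `_len` variable are what the microcontroller code then refers to.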
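Our whole model has now been exported as a C source file! Anyone who does embedded work will find this familiar. We can also export it as a .h file and simply include it in Arduino, which is very convenient.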
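Model deployment and application code
With the C source file in hand, we can move on to the microcontroller. Open Arduino. The official code targets the Arduino Nano 33 BLE, but that board is too expensive; an ESP32 costing a little over 10 yuan can handle TinyML perfectly well, so we use an ESP32. (An STM32 works fine too, though it is less convenient than Arduino. When I have time I will also write a TinyML tutorial based on the STM32.)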
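Download this library, then find the hello world example among its examples and open it. (This assumes you have already installed ESP32 support; if not, search for esp32 in the Boards Manager and install it.)
This code is an ESP32 adaptation of the official example, but a few things still need changing. Open its output_handler.cpp file and replace its contents with the code below: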
#include "output_handler.h"
#include "Arduino.h"
#include "constants.h"

// GPIO pin the LED is connected to
int led = 2;
bool initialized = false;

void HandleOutput(tflite::ErrorReporter* error_reporter, float x_value,
                  float y_value) {
  // Do this only once
  if (!initialized) {
    // Configure LEDC channel 0: 5 kHz PWM with 13-bit resolution (duty 0-8191)
    ledcSetup(0, 5000, 13);
    // Route channel 0 to the LED pin
    ledcAttachPin(led, 0);
    initialized = true;
  }
  // Calculate the brightness of the LED such that y=-1 is fully off
  // and y=1 is fully on. The brightness ranges from 0 to 255.
  int brightness = (int)(127.5f * (y_value + 1));
  // Clamp in case the model output falls slightly outside [-1, 1]
  brightness = constrain(brightness, 0, 255);
  // Scale the 8-bit brightness to the 13-bit duty range and apply it
  uint32_t duty = (8191 / 255) * brightness;
  ledcWrite(0, duty);
  // Log the current brightness value for display in the Arduino plotter
  TF_LITE_REPORT_ERROR(error_reporter, "%d\n", brightness);
  // // Log the current X and Y values
  // TF_LITE_REPORT_ERROR(error_reporter, "x_value: %f, y_value: %f\n",
  //                      static_cast<double>(x_value),
  //                      static_cast<double>(y_value));
}
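The brightness arithmetic in the handler is easy to check off-device. The following Python sketch mirrors it for a few sample outputs; `led_duty` is a name invented here, and the lower clamp at 0 guards against model outputs slightly below -1.

```python
def led_duty(y_value: float) -> tuple[int, int]:
    """Map a model output in [-1, 1] to an 8-bit brightness and a 13-bit duty."""
    brightness = int(127.5 * (y_value + 1))      # truncates toward zero, like the C cast
    brightness = max(0, min(brightness, 255))    # keep within 0-255
    duty = (8191 // 255) * brightness            # 8191 // 255 == 32 (integer division)
    return brightness, duty

for y in (-1.0, 0.0, 1.0):
    print(y, led_duty(y))
```

Because of the integer division, the largest duty this produces is 8160 rather than the full 8191, so the LED tops out marginally below 100% duty; for a breathing effect that difference is not visible.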
One more small thing: in constants.cpp, change kInferencesPerCycle to around 200, otherwise the LED blinks too fast.
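Why this constant controls the speed can be sketched as follows, under the assumption that the example steps x evenly from 0 to 2π and runs one inference per step (so `K_XRANGE` here is assumed to correspond to a constant in constants.h, and `x_for_step` is a name invented for this illustration).

```python
import math

K_XRANGE = 2 * math.pi   # assumed full range of x covered by one cycle

def x_for_step(inference_count: int, inferences_per_cycle: int) -> float:
    """x value fed to the model at a given step within one cycle."""
    position = inference_count / inferences_per_cycle
    return position * K_XRANGE

# Compare how far x advances per inference for two settings
for n in (20, 200):
    step = x_for_step(1, n) - x_for_step(0, n)
    print(n, round(step, 4))
```

With more inferences per cycle, each inference advances x by a smaller amount, so one full sine period takes more inferences and the LED breathes more slowly.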
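The big block of numbers in model.cpp in the code above is the trained model. The one we trained ourselves is much the same. If you want to use your own, paste your converted model in, and remember to update the length on the last line.
OK, compile and upload the sketch, and you will see the LED on the board breathing to the rhythm of a sine wave. Congratulations, you have successfully implemented embedded ML, edge AI, TinyML.
There are now many interesting TinyML applications in CV and NLP as well, and I will write tutorials on them when I have time.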