Introduction
Sequential data is everywhere in our digital world—from the words you’re reading right now to stock prices fluctuating over time, from speech patterns in audio recordings to user behavior on websites. Traditional feedforward neural networks, while powerful for many tasks, struggle with this temporal dimension because they treat each input independently, missing the crucial relationships that unfold over time.
Enter Recurrent Neural Networks (RNNs), a class of neural architectures specifically designed to handle sequential information by maintaining an internal memory state. TensorFlow provides three primary RNN implementations that have become the workhorses of sequence modeling: SimpleRNN, LSTM (Long Short-Term Memory), and GRU (Gated Recurrent Unit).
While all three architectures share the fundamental concept of processing sequences step-by-step while maintaining memory, they differ significantly in their complexity, computational requirements, and ability to capture long-term dependencies. SimpleRNN offers an elegant introduction to recurrent concepts but suffers from the vanishing gradient problem. LSTM revolutionized the field by introducing sophisticated gating mechanisms to preserve information across long sequences. GRU emerged as a streamlined alternative, simplifying LSTM’s architecture while maintaining much of its power.
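Before diving into the full comparison, here is a minimal sketch (assuming TensorFlow 2 with the standard tf.keras API) of just how interchangeable these layers are in practice: the same one-layer model is built three times, and only the recurrent layer changes.

import tensorflow as tf
from tensorflow.keras.layers import SimpleRNN, LSTM, GRU, Dense

# SimpleRNN, LSTM and GRU share the same constructor signature, so swapping
# one for another is a one-line change in an otherwise identical model.
def build_model(recurrent_layer):
    return tf.keras.Sequential([
        recurrent_layer(50, input_shape=(50, 1)),  # 50 units, 50 time steps, 1 feature
        Dense(1),                                  # one-step-ahead prediction
    ])

simple_rnn_model = build_model(SimpleRNN)
lstm_model = build_model(LSTM)
gru_model = build_model(GRU)

This is essentially the pattern the generated program below follows, just wrapped in a factory function that selects the layer from a model_type string.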
In this post, we’ll take a practical, hands-on approach to understanding these three RNN types. Rather than diving deep into complex theory, we’ll build simple implementations of each network using basic Python code and test them on a synthetic dataset. This straightforward comparison will help you get a feel for how these architectures behave and perform relative to each other.
By the end of this post, you’ll have a basic understanding of the key differences between SimpleRNN, LSTM, and GRU, along with some practical experience implementing each one. While this won’t make you an expert, it should give you a solid starting point for exploring these important sequence modeling tools in your own projects.
Development approach
The code example in this post was generated using Claude Code, with Sonnet 4 as the model and context7 as the MCP server. This demonstrates how modern AI tools can help streamline the development and comparison of different neural network architectures. For full transparency, the exact prompts used to generate the implementation are shown immediately below, giving you insight into how AI-assisted coding can be leveraged for machine learning experiments and educational purposes.
The prompt typed into Claude Code was simply:
Follow the instructions written in the file specification.md
The specification of the program to generate is in the file specification.md, whose content is:
Implement in Python using TensorFlow 2 with Keras a python program named "rnncomp.py".
Purpose of this program is to compare 3 different kinds of recurrent neural network (SimpleRNN, LSTM, GRU) on the same dataset, which must be generated synthetically. The architecture of the 3 networks must be identical except for the recurrent layers.
The dataset must be generated synthetically combining 4 sinusoidal curves (with different period and different amplitude) plus a bit of white noise.
Generate 1000 samples with a sequence length of 50; 80% of the dataset must be used for training, 20% for test.
After the training of the 3 networks on the same dataset, show two charts in a window.
1) one with the training data, then a red vertical line to show where test data starts and then the test data with predictions of the 3 models.
2) the other one with the test data with predictions of the 3 models (so without training data).
Then, in another window, show two other charts together:
1) one to show the 3 error functions during training
2) the other one to show the accuracy of the 3 models.
Generate requirements.txt file, too.
Use context7
The generated code
This is the code generated by Claude Code. I made only a few changes (namely, I reduced the number of epochs to 20 and passed verbose=1 in the call to the fit method).
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, LSTM, GRU, Dense
from tensorflow.keras.optimizers import Adam
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error, mean_absolute_error
def generate_synthetic_dataset(n_samples=1000, seq_length=50):
"""Generate synthetic dataset with 4 sinusoidal curves and white noise"""
np.random.seed(42)
# Time array
t = np.linspace(0, 100, n_samples + seq_length)
# Four sinusoidal components with different periods and amplitudes
sin1 = 2.0 * np.sin(0.1 * t) # Period ~63, Amplitude 2
sin2 = 1.5 * np.sin(0.25 * t) # Period ~25, Amplitude 1.5
sin3 = 1.0 * np.sin(0.5 * t) # Period ~13, Amplitude 1
sin4 = 0.8 * np.sin(0.8 * t) # Period ~8, Amplitude 0.8
# Add white noise
noise = 0.1 * np.random.normal(0, 1, len(t))
# Combine all components
data = sin1 + sin2 + sin3 + sin4 + noise
# Normalize data
scaler = MinMaxScaler()
data = scaler.fit_transform(data.reshape(-1, 1)).flatten()
# Create sequences
X, y = [], []
for i in range(n_samples):
X.append(data[i:i+seq_length])
y.append(data[i+seq_length])
X = np.array(X).reshape(n_samples, seq_length, 1)
y = np.array(y)
return X, y, data, scaler
def create_model(model_type, seq_length, units=50):
"""Create RNN model with specified type"""
model = Sequential()
if model_type == 'SimpleRNN':
model.add(SimpleRNN(units, input_shape=(seq_length, 1)))
elif model_type == 'LSTM':
model.add(LSTM(units, input_shape=(seq_length, 1)))
elif model_type == 'GRU':
model.add(GRU(units, input_shape=(seq_length, 1)))
model.add(Dense(1))
model.compile(optimizer=Adam(learning_rate=0.001), loss='mse', metrics=['mae'])
return model
def plot_results_window1(data, train_size, X_test, y_test, predictions, seq_length):
"""Create first window with training/test data and predictions"""
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(15, 10))
# Chart 1: Full data with predictions
train_data = data[:train_size + seq_length]
test_data = data[train_size + seq_length:]
test_indices = range(train_size + seq_length, len(data))
ax1.plot(range(len(train_data)), train_data, 'b-', label='Training Data', alpha=0.7)
ax1.axvline(x=train_size + seq_length, color='red', linestyle='--', linewidth=2, label='Test Start')
ax1.plot(test_indices, test_data, 'g-', label='Test Data (Actual)', alpha=0.7)
test_start_idx = train_size + seq_length
for i, (name, pred) in enumerate(predictions.items()):
ax1.plot(test_indices, pred, '--', label=f'{name} Prediction', linewidth=2)
ax1.set_title('Complete Dataset: Training Data, Test Data, and Model Predictions')
ax1.set_xlabel('Time Step')
ax1.set_ylabel('Value')
ax1.legend()
ax1.grid(True, alpha=0.3)
# Chart 2: Test data only with predictions
ax2.plot(y_test, 'g-', label='Actual Test Data', linewidth=2)
for i, (name, pred) in enumerate(predictions.items()):
ax2.plot(pred, '--', label=f'{name} Prediction', linewidth=2)
ax2.set_title('Test Data vs Model Predictions')
ax2.set_xlabel('Test Sample')
ax2.set_ylabel('Value')
ax2.legend()
ax2.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
def plot_results_window2(histories, predictions, y_test):
"""Create second window with error functions and accuracy"""
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))
# Chart 1: Training loss
for name, history in histories.items():
ax1.plot(history.history['loss'], label=f'{name} Loss', linewidth=2)
ax1.set_title('Training Loss Over Epochs')
ax1.set_xlabel('Epoch')
ax1.set_ylabel('Loss (MSE)')
ax1.legend()
ax1.grid(True, alpha=0.3)
# Chart 2: Model accuracy (using MSE and MAE)
model_names = list(predictions.keys())
mse_scores = [mean_squared_error(y_test, pred) for pred in predictions.values()]
mae_scores = [mean_absolute_error(y_test, pred) for pred in predictions.values()]
x = np.arange(len(model_names))
width = 0.35
ax2_twin = ax2.twinx()
bars1 = ax2.bar(x - width/2, mse_scores, width, label='MSE', alpha=0.7)
bars2 = ax2_twin.bar(x + width/2, mae_scores, width, label='MAE', alpha=0.7, color='orange')
ax2.set_title('Model Accuracy Comparison')
ax2.set_xlabel('Model')
ax2.set_ylabel('MSE', color='blue')
ax2_twin.set_ylabel('MAE', color='orange')
ax2.set_xticks(x)
ax2.set_xticklabels(model_names)
# Add value labels on bars
for bar, score in zip(bars1, mse_scores):
height = bar.get_height()
ax2.text(bar.get_x() + bar.get_width()/2., height + height*0.01,
f'{score:.4f}', ha='center', va='bottom')
for bar, score in zip(bars2, mae_scores):
height = bar.get_height()
ax2_twin.text(bar.get_x() + bar.get_width()/2., height + height*0.01,
f'{score:.4f}', ha='center', va='bottom')
ax2.legend(loc='upper left')
ax2_twin.legend(loc='upper right')
ax2.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
def main():
# Generate synthetic dataset
print("Generating synthetic dataset...")
X, y, data, scaler = generate_synthetic_dataset(n_samples=1000, seq_length=50)
# Split data (80% train, 20% test)
train_size = int(0.8 * len(X))
X_train, X_test = X[:train_size], X[train_size:]
y_train, y_test = y[:train_size], y[train_size:]
print(f"Dataset created: {len(X_train)} training samples, {len(X_test)} test samples")
# Create and train models
model_types = ['SimpleRNN', 'LSTM', 'GRU']
models = {}
histories = {}
predictions = {}
for model_type in model_types:
print(f"\nTraining {model_type} model...")
model = create_model(model_type, seq_length=50, units=50)
# Train model
history = model.fit(X_train, y_train,
epochs=20,
batch_size=32,
validation_split=0.2,
verbose=1)
# Make predictions
pred = model.predict(X_test, verbose=0).flatten()
models[model_type] = model
histories[model_type] = history
predictions[model_type] = pred
# Print model performance
mse = mean_squared_error(y_test, pred)
mae = mean_absolute_error(y_test, pred)
print(f"{model_type} - MSE: {mse:.4f}, MAE: {mae:.4f}")
# Display results
print("\nDisplaying results...")
plot_results_window1(data, train_size, X_test, y_test, predictions, seq_length=50)
plot_results_window2(histories, predictions, y_test)
print("\nComparison completed!")
if __name__ == "__main__":
    main()

The requirements.txt content is:
tensorflow>=2.10.0
numpy>=1.21.0
matplotlib>=3.5.0
scikit-learn>=1.1.0
GPU is not needed to run this code.
These materials are distributed under MIT license; feel free to use, share and adapt these materials as you see fit.
Code execution
To run the program, first create and activate a Python virtual environment (for example with python -m venv venv), then install the dependencies by running
pip install -r requirements.txt

and finally execute the comparison by running
python rnncomp.py

Results
After training the three models and generating the corresponding predictions, the program produces four charts (grouped in pairs) to visualize the comparison results. The first chart displays the complete time series, showing both the training and test data along with the predictions made by each RNN model on the test portion. The second chart provides a zoomed-in view focusing exclusively on the test data and the corresponding predictions, making it easier to analyze the models’ performance on unseen data.


Overall, all three RNN models perform reasonably well on this synthetic dataset, with predictions that closely track the actual test data.
The remaining two charts show the comparison of training loss across epochs for all three models and the accuracy comparison between the three RNN architectures.


Interpretation of the accuracy chart
Based on this Model Accuracy Comparison chart, we can see the performance metrics for all three RNN architectures measured using both Mean Squared Error (MSE) and Mean Absolute Error (MAE).
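As a quick reminder of what these two metrics measure over the n test samples, with actual values yᵢ and predictions ŷᵢ:

MSE = (1/n) · Σ (yᵢ − ŷᵢ)²        MAE = (1/n) · Σ |yᵢ − ŷᵢ|

MSE penalizes large errors more heavily because of the squaring, while MAE weights all errors linearly.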
LSTM emerges as the best performer with the lowest values for both metrics – an MSE of 0.0002 and MAE of 0.0105. SimpleRNN shows solid middle-ground performance with an MSE of 0.0003 and MAE of 0.0133. GRU, while still performing well, ranks third with an MSE of 0.0003 (tied with SimpleRNN) but a slightly higher MAE of 0.0145.
The differences between the models are quite small, confirming that all three architectures handle this particular synthetic dataset effectively. The MSE values are very close (ranging from 0.0002 to 0.0003), while the MAE shows slightly more variation. This suggests that for this specific task, the additional complexity of LSTM’s gating mechanisms provides a modest but measurable advantage, while the simpler GRU architecture performs nearly as well with potentially faster training times.
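One way to make that "additional complexity" concrete is to count trainable parameters. The sketch below reuses create_model() from the listing above (it assumes rnncomp.py sits in the current directory so it can be imported); for the same number of units, the LSTM layer carries roughly four times, and the GRU roughly three times, the recurrent weights of the SimpleRNN.

# Minimal sketch: compare the size of the three architectures.
# Assumes rnncomp.py (the listing above) is importable from the current directory.
from rnncomp import create_model

for model_type in ['SimpleRNN', 'LSTM', 'GRU']:
    model = create_model(model_type, seq_length=50, units=50)
    print(f"{model_type}: {model.count_params()} parameters")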
Explanation of the code
The code implements a comprehensive RNN comparison program: it compares how SimpleRNN, LSTM, and GRU handle the same time series prediction task.
Below are the main components of the code:
- Dataset Generation
  - generate_synthetic_dataset(): creates 1000 samples by combining 4 sinusoidal components:
    - Different periods (~63, ~25, ~13, ~8) and amplitudes (2.0, 1.5, 1.0, 0.8)
    - Adds white noise for realism
    - Normalizes the data using MinMaxScaler
    - Creates sequences of length 50 for time series prediction
- Model Architecture
  - create_model(): builds identical architectures except for the recurrent layer type:
    - 50 units in the recurrent layer (SimpleRNN/LSTM/GRU)
    - Single Dense output layer
    - Adam optimizer with MSE loss
- Training Process
  - Splits the data 80/20 (train/test): 800 training samples, 200 test samples
  - Trains each model for 20 epochs (modified from the original 50)
  - Uses a 20% validation split during training
  - Tracks the training history for loss visualization
- Visualization (Two Windows)
  - Window 1
    - Chart 1: complete timeline showing the training data, the red vertical line, the test data, and all 3 model predictions
    - Chart 2: test data only, with predictions from all models
  - Window 2
    - Chart 1: training loss curves over epochs for all 3 models
    - Chart 2: accuracy comparison using MSE and MAE metrics with bar charts
