How can you create a simple model from scratch in TensorFlow without using higher-level APIs like Keras?

AcidentlGenius
4 min read · Dec 30, 2020

There are still things that feel mystifying, and as beginners we tend to stick to the Keras semantics, because Keras is simple and great if you are starting out with deep learning. Things like transfer learning, building an architecture without thinking much, or using ImageDataGenerator become easy.

Here we will write very low-level TensorFlow code to create a linear regression model. We will also craft our own data, so we know where the model must eventually end up (this just keeps the tutorial hands-on and easy). If you wish, you can use your own data.

https://en.wikipedia.org/wiki/Linear_regression

Imports

Let’s start by importing the main libraries we will need: tensorflow for the model, numpy for generating our data, and matplotlib to visualise our model in action.

import tensorflow as tf 
import numpy as np
import matplotlib.pyplot as plt

Data Creation

Now let’s create the data:

X = tf.constant(np.linspace(0, 2, 2000), dtype=tf.float32)
Y = X * tf.exp(-X**2)  # target curve: x * e^(-x^2)

In the above code we create two TensorFlow constants. If you don’t know about TensorFlow constants, they are just like NumPy arrays whose values cannot be modified. Or, if you know about pandas Series, the idea is similar (values, rank, shape and dtype). Inside tf.constant() we create a NumPy array of 2000 values ranging between 0 and 2, and then define the dtype of the TensorFlow constant to be float32.
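If you want to confirm this, a quick inspection (my own addition, not required for the tutorial) shows the shape and dtype:

print(X.shape)  # (2000,)
print(X.dtype)  # <dtype: 'float32'>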

You can visualize X and Y if you like:

plt.plot(X, Y)
plt.show()

Let’s define some helper functions:

def make_features(X):
    f1 = tf.ones_like(X)  # Bias.
    f2 = X
    f3 = tf.square(X)
    f4 = tf.sqrt(X)
    f5 = tf.exp(X)
    return tf.stack([f1, f2, f3, f4, f5], axis=1)

def predict(X, W):
    return tf.squeeze(X @ W, -1)

def loss_mse(X, Y, W):
    Y_hat = predict(X, W)
    errors = (Y_hat - Y)**2
    return tf.reduce_mean(errors)

We are using the constant X to generate more features for our problem with the make_features() function, and then returning them stacked column-wise (axis=1) to make the result look like a real dataset: one row per sample, one column per feature.
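You can verify the resulting shape with a quick check (a sketch of my own, assuming the code above has been run):

Xf = make_features(X)
print(Xf.shape)  # (2000, 5): 2000 samples, 5 features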

We are defining the predict() function, which uses the traditional linear model: the matrix product XW (for a single example, x transpose w).

The loss we are using is mean squared error, comparing Y_hat (the predicted Y, computed from X and W) against the true Y.
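As a sanity check (my own addition, not part of the original flow), you can evaluate the loss with all-zero weights. Since predict() then returns zeros, the MSE should equal the mean of Y squared:

Xf = make_features(X)
W0 = tf.Variable(np.zeros((5, 1)), dtype=tf.float32)
print(loss_mse(Xf, Y, W0).numpy())   # loss with zero weights
print(tf.reduce_mean(Y**2).numpy())  # same value, computed directly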

Gradient Function

def compute_gradients(X, Y, W):
    with tf.GradientTape() as tape:
        loss = loss_mse(X, Y, W)
    return tape.gradient(loss, W)

To use gradient descent we need to take the partial derivatives of the loss function with respect to each of the weights. We could manually compute the derivatives, but with Tensorflow’s automatic differentiation capabilities we don’t have to!

During gradient descent we think of the loss as a function of the five parameters w0, w1, w2, w3 and w4. Thus, we want to compute the partial derivative of the loss with respect to each of these variables.

For that we need to wrap our loss computation within the context of tf.GradientTape instance which will record gradient information:

with tf.GradientTape() as tape:
    loss = # computation

This will allow us to later compute the gradients of any tensor computed within the tf.GradientTape context with respect to instances of tf.Variable:

gradients = tape.gradient(loss, [w0, w1])
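To make the mechanics concrete, here is a tiny standalone example (unrelated to our regression data) that differentiates y = x² at x = 3, where the gradient should be 2x = 6:

x = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y = x ** 2
print(tape.gradient(y, x).numpy())  # 6.0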

Model Training

STEPS = 2000  # try 50000
LEARNING_RATE = .02

Xf = make_features(X)
n_weights = Xf.shape[1]
W = tf.Variable(np.zeros((n_weights, 1)), dtype=tf.float32)

# For plotting.
steps, losses = [], []
plt.figure()

for step in range(1, STEPS + 1):
    dW = compute_gradients(Xf, Y, W)
    W.assign_sub(dW * LEARNING_RATE)
    if step % 100 == 0:
        loss = loss_mse(Xf, Y, W)
        steps.append(step)
        losses.append(loss)
        plt.clf()
        plt.plot(steps, losses)

plt.show()
print("STEP: {} MSE: {}".format(STEPS, loss_mse(Xf, Y, W)))

Model training is pretty simple; it just ties the whole process together: defining the common terms, making the features, initializing the weights to zero, computing the gradients, and applying a gradient descent update on every step. Every 100 steps we calculate the loss, store the step and the loss at that moment, and plot them. Finally we print the number of steps and the loss evaluated after training.
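Once training finishes, you can also peek at the learned weights themselves; each entry corresponds to one of the five features from make_features() (a quick check of my own, not part of the original loop):

print(W.numpy().flatten())  # coefficients for [bias, X, X^2, sqrt(X), exp(X)]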

Testing

Let’s now evaluate how our model actually performs.

# The .figure() method will create a new figure, or activate an existing figure.
plt.figure()
# The .plot() is a versatile function, and will take an arbitrary number of arguments. For example, to plot x versus y.
plt.plot(X, Y, label='actual')
plt.plot(X, predict(Xf, W), label='predicted')
# The .legend() method will place a legend on the axes.
plt.legend()
plt.show()

And that’s it.

Thanks for taking a look at my first tutorial ever.
