Photo by Daniel Gregoire on Unsplash

What is Steganography

Steganography is the practice of concealing a file, message, image, or video within another file, message, image, or video.

Inspired by this excellent blog post by Kelvin Salton do Prado explaining a simple approach for concealing one image within another image, I have created this short application to hide a text message inside a given image, without causing any noticeable change to the original image.

I have also added a nice visualization at the end capturing the text extraction taking place as the image is scanned pixel-by-pixel. It would look something like this:

Note: This application requires an image with 3 channels (e.g. RGB), hence grayscale images will not work.

Before we start, a bit of background on digital images and component channels might be useful.

Digital images are typically stored in computers as files which contain encoded data of all the tiny components which make up an image.

In computer terminology, the smallest building block of an image is called a pixel. The value of a pixel is a number which tells how bright or dull that portion of the image is. Typically, the higher the value, the brighter the pixel.

An image can be spatially described by its width and height, where the units of the dimensions are pixels. For example, the size of the image at the top of this post is 5045 x 3324, i.e. the image has a width of 5045 pixels and a height of 3324 pixels.

For color images, we need more than just one value for every pixel. This is because visible light is a mix of multiple colors, with each color having its own intensity or brightness.

A typical representation of color images is the RGB (Red,Green,Blue) format, where the value of every pixel in the image is described as a combination of 3 values, i.e. one value for each of the 3 primary components R, G and B.

These 3 components or channels, when combined together, determine the actual color of each pixel in the image.

Thus a color digital image is represented in 3 dimensions instead of 2, i.e. height, width and channels, or HxWxC for short.

In the following illustration of an image, the small boxes represent pixels and the big colored squares stacked together represent each of the 3 channels R, G and B.
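
To make the HxWxC layout concrete, here is a minimal sketch that models a tiny 2x3 RGB image with plain nested lists. The pixel values are made up for illustration; image libraries such as OpenCV hold the same kind of data in a NumPy array.

```python
# A tiny 2x3 RGB image as nested lists: rows -> pixels -> channel values.
# The values are invented for illustration only.
image = [
    [[255, 0, 0], [0, 255, 0], [0, 0, 255]],        # row 0: red, green, blue
    [[0, 0, 0], [128, 128, 128], [255, 255, 255]],  # row 1: black, gray, white
]

height = len(image)          # H: number of rows
width = len(image[0])        # W: pixels per row
channels = len(image[0][0])  # C: values per pixel (R, G, B)

print((height, width, channels))   # (2, 3, 3)

r, g, b = image[1][1]              # pixel at row 1, column 1
print(r, g, b)                     # 128 128 128
```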

The core idea behind this implementation is actually quite simple and the key aspects are as follows:

  • Each pixel of an RGB image consists of three 8-bit values corresponding to the three channels
  • Each of these three 8-bit values consists of 4 most-significant-bits (MSB) and 4 least-significant-bits (LSB)
  • Hence every pixel is a combination of 12 (4+4+4) MSBs and 12 (4+4+4) LSBs
  • As the name suggests, the 12 LSBs are less relevant and can be modified without any noticeable change in the image
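
As a rough illustration of the points above, the following sketch splits one 8-bit channel value into its MSBs and LSBs using bitwise operations, and checks how far a channel value can move when its 4 LSBs are overwritten. The helper names `split_nibbles` and `replace_lsbs` are hypothetical and are not part of the application shown later.

```python
def split_nibbles(value):
    """Split an 8-bit value into its 4 MSBs and 4 LSBs (as integers)."""
    return value >> 4, value & 0x0F

def replace_lsbs(value, new_lsbs):
    """Keep the 4 MSBs of value and overwrite its 4 LSBs with new_lsbs."""
    return (value & 0xF0) | (new_lsbs & 0x0F)

channel = 220                      # 0b11011100
msb, lsb = split_nibbles(channel)
print(format(msb, "04b"), format(lsb, "04b"))   # 1101 1100

# Overwriting the 4 LSBs shifts a channel value by at most 15 (out of 255).
worst = max(abs(v - replace_lsbs(v, n)) for v in range(256) for n in range(16))
print(worst)                                     # 15
```

A worst-case shift of 15 on a 0 to 255 scale is why the change is usually hard to spot by eye.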

Following are the steps involved in the insert operation:

  • Read the input image into a 3D array of pixels of size HxWxC (H=height, W=width, C=channels=3 for RGB)
  • Read the input text file and store contents into an array of bytes
  • Convert every byte (character) of the text into a 12-bit binary value
  • Starting with the first pixel (n=0) of the image matrix, replace the 12 LSBs of pixel-n with the 12-bit value of the byte-n from the text byte-array
  • Repeat above operation (n=n+1 every time) until there are no more bytes left in the text byte-array
  • In the last iteration, replace the 12 LSBs of pixel-n with a special 12-bit end-of-text indicator value
  • Save the modified image matrix into a new image file

The following image should help illustrate the above steps using an example
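
The insert steps can also be sketched in a few lines of Python. This is a simplified stand-in that works on a flat list of (R, G, B) tuples rather than an image file, and uses bitwise operations instead of the string-based approach in the class below. The names `embed_value` and `embed_text` and the sample pixel values are assumptions made for this example.

```python
EOT = 0b101010101010  # 12-bit end-of-text indicator (2730)

def embed_value(pixel, value):
    """Return a new (r, g, b) pixel whose 12 LSBs carry the 12-bit value."""
    r, g, b = pixel
    return ((r & 0xF0) | ((value >> 8) & 0x0F),   # top 4 bits of value
            (g & 0xF0) | ((value >> 4) & 0x0F),   # middle 4 bits
            (b & 0xF0) | (value & 0x0F))          # bottom 4 bits

def embed_text(pixels, text):
    """Hide ASCII text in a flat list of pixels, one character per pixel."""
    data = text.encode("ascii")
    if len(data) + 1 > len(pixels):          # +1 pixel for the EOT marker
        raise ValueError("not enough pixels to hold the text")
    out = list(pixels)
    for n, byte in enumerate(data):
        out[n] = embed_value(out[n], byte)
    out[len(data)] = embed_value(out[len(data)], EOT)
    return out

pixels = [(220, 110, 96), (17, 34, 51), (200, 100, 50)]
stego = embed_text(pixels, "Hi")
print(stego)   # [(208, 100, 104), (16, 38, 57), (202, 106, 58)]
```

Here 'H' (72, or 0000 0100 1000 in 12 bits) lands in the first pixel, 'i' in the second, and the end-of-text indicator in the third.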

Similarly, the steps involved in the extract operation are as follows:

  • Read the input image into a 3D array of pixels of size HxWxC (H=height, W=width, C=channels=3 for RGB)
  • Starting with the first pixel (n=0) of the image matrix, read the 12 LSBs of pixel-n and add it to a byte array
  • Repeat above operation (n=n+1 every time) until you find the end-of-text indicator value in the 12 LSBs of pixel-n
  • Decode the byte array and save the contents into a new text file
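
A matching sketch of the extract steps, again on a flat list of (R, G, B) tuples instead of an image file. The function names `read_value` and `read_hidden_text` are hypothetical, and the sample pixels were chosen by hand so that their LSBs spell "Hi" followed by the end-of-text indicator.

```python
EOT = 0b101010101010  # 12-bit end-of-text indicator (2730)

def read_value(pixel):
    """Join the 4 LSBs of each channel into one 12-bit integer."""
    r, g, b = pixel
    return ((r & 0x0F) << 8) | ((g & 0x0F) << 4) | (b & 0x0F)

def read_hidden_text(pixels):
    """Collect one character per pixel until the EOT marker is found."""
    data = bytearray()
    for pixel in pixels:
        value = read_value(pixel)
        if value == EOT:
            break
        if value <= 127:          # keep only values in the ASCII range
            data.append(value)
    return data.decode("ascii")

# Pixels carrying "Hi" followed by the EOT marker; the last pixel is never read.
stego = [(208, 100, 104), (16, 38, 57), (202, 106, 58), (1, 2, 3)]
print(read_hidden_text(stego))   # Hi
```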

We start with importing the libraries we need and setting the plot mode for inline display

Note: To run this Jupyter notebook on your local machine please ensure you have the following dependencies installed as a precondition:

  • cv2
  • imutils
  • matplotlib
  • ffmpeg

import base64
import cv2
import time
import imutils
import os
import numpy as np
import matplotlib
from matplotlib.animation import FuncAnimation
import matplotlib.pyplot as plt
from IPython.display import display
from IPython.display import HTML

%matplotlib inline

In this script we will be using some static configuration parameters, which normally do not need to be changed.

A special note on the parameter eotInd, which is a special signature used to indicate the end of the text.

This allows the extraction logic to stop processing the image when all the text has been extracted, hence saving time.
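
A quick check of why the value 2730 is a safe choice for this indicator: it is the 12-bit pattern 101010101010, and it lies outside the 0 to 127 range used for characters, so no character can be mistaken for it. The lowercase names below simply mirror eotInd and maxCharRange for this standalone snippet.

```python
eot_ind = 2730         # mirrors eotInd
max_char_range = 127   # mirrors maxCharRange

print(format(eot_ind, "012b"))        # 101010101010
print(eot_ind == 0b101010101010)      # True

# Every character value is at most 127, so none can equal the indicator.
print(all(c != eot_ind for c in range(max_char_range + 1)))   # True

# Split across the three channels, the indicator puts 1010 in each one.
bits = format(eot_ind, "012b")
print((bits[:4], bits[4:8], bits[8:12]))   # ('1010', '1010', '1010')
```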

# define configuration params
figureSize = (12,10) # image size for display
displayWidth = (1024+128) # max width of image for display
maxDebugBytes = 10 # max num of bytes to view in debug mode
eotInd = 2730 # '101010101010' - 12 bit End-Of-Text indicator
maxCharRange = 127 # max range of characters as per ASCII table
debugMode = False # flag to view debug prints
visFps = 30 # frames per second for the captured visualization video
visStack = 0 # direction (0-horizontal/1-vertical) for visualization stack

Then we move on to define the core logic of our script, the Steganography class and its member functions.

This class implements the insert and extract operations along with some utility functions.

class Steganography(object):

    @staticmethod
    def __int_to_bin(rgb):
        """Convert an integer tuple to a binary (string) tuple.

        :param rgb: An integer tuple (e.g. (220, 110, 96))
        :return: A string tuple (e.g. ("11011100", "01101110", "01100000"))
        """
        r, g, b = rgb
        return ('{0:08b}'.format(r),
                '{0:08b}'.format(g),
                '{0:08b}'.format(b))

    @staticmethod
    def __bin_to_int(rgb):
        """Convert a binary (string) tuple to an integer tuple.

        :param rgb: A string tuple (e.g. ("11011100", "01101110", "01100000"))
        :return: An integer tuple (e.g. (220, 110, 96))
        """
        r, g, b = rgb
        return (int(r, 2),
                int(g, 2),
                int(b, 2))

    @staticmethod
    def __hide_rgb(rgb1, rgb2):
        """Merge the text bits into an image pixel.

        :param rgb1: A string tuple holding the 8-bit channel values of the
        image pixel (e.g. ("11011100", "01101110", "01100000"))
        :param rgb2: A string tuple holding 4 text bits per channel
        (e.g. ("0000", "0100", "1000"))
        :return: A string tuple with the 4 MSBs of rgb1 followed by the 4 bits of rgb2.
        """
        r1, g1, b1 = rgb1
        r2, g2, b2 = rgb2
        rgb = (r1[:4] + r2[:4],
               g1[:4] + g2[:4],
               b1[:4] + b2[:4])
        return rgb

    @staticmethod
    def insert(imfile, txtfile, outputfile, debug=False):
        """Hide a text within an image.

        :param imfile: Input Image file
        :param txtfile: Input Text file
        :param outputfile: Output Image file
        :param debug: Flag for debug prints
        :return: True or False depending on Success or Failure of operation
        """
        # check if image path is valid
        if not os.path.exists(imfile):
          print("!! Unable to locate image file:",imfile)
          return False
        
        # check if text path is valid
        if not os.path.exists(txtfile):
          print("!! Unable to locate text file:",txtfile)
          return False

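        # Load the image
        image = cv2.imread(imfile,cv2.IMREAD_UNCHANGED)
        # make sure the image was decoded and is not grayscale before converting,
        # since cv2.cvtColor raises an error on such input
        if image is None or image.ndim != 3:
          print("!! Unable to insert text as image does not have 3 channels !!")
          return False
        image = cv2.cvtColor(image,cv2.COLOR_BGR2RGB)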
        
        # Clone the original image to create a copy that will contain hidden content
        new_image = image.copy()
        
        # check if image contains 3 channels, else quit
        im_shape = image.shape
        if len(im_shape) != 3:
          print("!! Unable to insert text as image does not have 3 channels !!")
          return False
        elif im_shape[2] != 3:
          print("!! Unable to insert text as image does not have 3 channels !!")
          return False
        
        # Open the text file and read the content
        with open(txtfile, 'rb') as fp:
          text_bin = fp.read()

        print("Length of text: {} chars".format(len(text_bin)))
        
        # check if image size is sufficient to encode full text, else quit
        # (one pixel, i.e. 3 bytes, per character plus one pixel for the end-of-text indicator)
        required_bytes = (len(text_bin) + 1) * 3
        if required_bytes > im_shape[0] * im_shape[1] * 3:
          print("!! Unable to insert full text as image does not have sufficient size !!")
          return False
        
        # initialize variables
        idx = 0
        break_flag = False

        for i in range(im_shape[0]):
            for j in range(im_shape[1]):
                rgb1 = Steganography.__int_to_bin(image[i,j])
                
                if idx == len(text_bin):
                  # if no more text left to insert, add the end of text signature
                  bin_val = "{0:012b}".format(eotInd)
                  break_flag = True
                  if debug:
                    print("Hit end of text at idx:",idx)
                    print("No of bytes used:", idx*3)
                else:
                  if text_bin[idx] > maxCharRange:
                    # if character is out of max range, insert a 'dot' character in 12-bit binary format
                    bin_val = "{0:012b}".format(ord('.'))
                  else:
                    # convert character to 12-bit binary format
                    bin_val = "{0:012b}".format(text_bin[idx])
                  
                # split into 3 channels of 4 bits each
                rgb2 = (bin_val[:4],bin_val[4:8],bin_val[8:12])

                # merge the text bits into the image pixel (result is a binary string tuple)
                rgb = Steganography.__hide_rgb(rgb1, rgb2)

                new_image[i,j] = Steganography.__bin_to_int(rgb)
                
                if debug:
                  if idx < maxDebugBytes:
                    # visualize first maxDebugBytes
                    # print(text_bin[idx])
                    # print(chr(text_bin[idx]))
                    print("bin_val[{}]:{}".format(idx,bin_val))
                
                idx += 1
                if break_flag == True:
                  break
            if break_flag == True:
              break

        plt.figure(figsize=figureSize)
        plt.imshow(imutils.resize(image,width=displayWidth))
        plt.title('Original Image')
        plt.xticks([]), plt.yticks([])  # Hides the graph ticks and x / y axis
        plt.show()
        
        plt.figure(figsize=figureSize)
        plt.imshow(imutils.resize(new_image,width=displayWidth))
        plt.title('Image with Hidden Text')
        plt.xticks([]), plt.yticks([])  # Hides the graph ticks and x / y axis
        plt.show()
        
        new_image = cv2.cvtColor(new_image, cv2.COLOR_RGB2BGR)
        
        if os.path.exists(outputfile):
            os.remove(outputfile)
            
        cv2.imwrite(outputfile,new_image)

        return True

    @staticmethod
    def extract(imfile, txtfile, visualize, visfile, debug=False):
        """Extract hidden text from an image.

        :param imfile: input image file containing hidden text
        :param txtfile: output text file containing extracted text
        :param visualize: flag to enable/disable visualization
        :param visfile: output file to store visualization video
        :param debug: Flag for debug prints
        :return: True or False depending on Success or Failure of operation.
        """

        # check if image path is valid
        if not os.path.exists(imfile):
          print("!! Unable to locate image file:",imfile)
          return False
        
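        # Load the image
        image = cv2.imread(imfile,cv2.IMREAD_UNCHANGED)
        # make sure the image was decoded and is not grayscale before converting,
        # since cv2.cvtColor raises an error on such input
        if image is None or image.ndim != 3:
          print("!! Unable to extract text as image does not have 3 channels !!")
          return False
        image = cv2.cvtColor(image,cv2.COLOR_BGR2RGB)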
        
        # Store the image size
        im_shape = image.shape
        # check if image contains 3 channels, else quit
        if len(im_shape) != 3:
          print("!! Unable to extract text as image does not have 3 channels !!")
          return False
        elif im_shape[2] != 3:
          print("!! Unable to extract text as image does not have 3 channels !!")
          return False
        
        # initialize variables
        text_bin = []
        idx = 0
        break_flag = False

        # create a canvas for displaying extracted text
        text_canvas =  np.zeros_like(image)
        # create a white canvas for displaying image
        white_canvas =  255 * np.ones_like(image)
        
        if visualize.lower() == 'true':
            vis_width = displayWidth
            vis_height = int(0.5*displayWidth*im_shape[0]/im_shape[1])
            visframes = []
            capture_vis = True
        else:
            capture_vis = False
            
        print("Begin processing image file... this may take a few minutes")
        start_time = time.time()
        for i in range(im_shape[0]):
            for j in range(im_shape[1]):
                # Get the RGB (as a string tuple) from the current pixel
                r, g, b = Steganography.__int_to_bin(image[i,j])

                # Extract the last 4 bits (corresponding to the hidden text)
                bin_val = r[4:] + g[4:] + b[4:]
                
                if debug:
                # view first maxDebugBytes bytes
                  if idx < maxDebugBytes:
                    print("bin_val[{}]:{}".format(idx,bin_val))

                # Convert it to an integer
                int_val = int(bin_val,2)
                if int_val == eotInd:
                  # Check if End-Of-Text indicator is found
                  if debug:
                    print("Found end of text signature at index:", idx)
                  break_flag = True
                elif int_val <= maxCharRange:
                  # only take characters within max range and append to output list
                  text_bin.append(int_val)

                  if capture_vis == True:
                    # visualize the extraction process
                    try:
                      xpos = (j*30) % im_shape[1]
                      ypos = i+30 + (j*30//im_shape[1]) * 30
                      
                      cv2.putText(text_canvas, chr(text_bin[idx]), (xpos,ypos),cv2.FONT_HERSHEY_SIMPLEX,1.0,(0, 255, 0),4)
                      
                      # create a canvas for displaying original image while extraction with some blurring effect
                      img_canvas = image.copy()
                      img_canvas = cv2.addWeighted(img_canvas, 0.30, white_canvas, 0.70, 0)
                      # show a small moving box to indicate pixel processing
                      rectx = (j*30) % im_shape[1]
                      recty = (i + (j*30//im_shape[1]) * 30) % im_shape[0]
                      rectw = 30
                      recth = 30
                      
                      cv2.rectangle(img_canvas, (rectx, recty), (rectx+rectw, recty+recth), (0,0,255), 5)
                      
                      if idx%20 < 10:
                        # Add a blinking text effect
                        cv2.putText(img_canvas, 'Scanning Image', (int(0.6* im_shape[1]//2),im_shape[0]//2),cv2.FONT_HERSHEY_SIMPLEX,3.0,(255, 0, 0), 10)
                      
                      if visStack == 0:
                        # stack_img = np.hstack((img_canvas,text_canvas))
                        stack_img = np.hstack((imutils.resize(img_canvas,width=displayWidth//2),imutils.resize(text_canvas,width=displayWidth//2)))
                      else:
                        # stack_img = np.vstack((img_canvas,text_canvas))
                        stack_img = np.vstack((imutils.resize(img_canvas,width=displayWidth//2),imutils.resize(text_canvas,width=displayWidth//2)))

                      stack_img = cv2.cvtColor(stack_img, cv2.COLOR_RGB2BGR)
                      visframes.append(stack_img)
                        
                      if ypos >= im_shape[0]:
                        capture_vis = False
                      
                    except Exception as err:
                        print("!! Visualization error:", err)
                        visualize = 'false'
                        capture_vis = False
                  
                  idx += 1
                  
                if break_flag == True:
                  break
            if break_flag == True:
              break

        end_time = time.time()
        print("Finished processing image file... time taken: {:0.02f}s".format(end_time-start_time))
        
        if visualize.lower() == 'true':
            print('Begin saving visualization file... this may take a few minutes')
            start_time = time.time()
            delay = 1000 / visFps
            num_frames = len(visframes)

            fig = plt.figure(figsize=(vis_width/100, vis_height/100), tight_layout=True)
            im = plt.imshow(visframes[0])
            plt.axis('off')
            plt.grid(False)  # positional form; the 'b' keyword is not accepted by newer Matplotlib releases

            def update_vis(i):
              """Updates the visualization"""
              im.set_data(visframes[i])

            anim = FuncAnimation(
                    fig,
                    update_vis,
                    frames=np.arange(1, num_frames),
                    interval=delay,
                    blit=False)
            
            if os.path.exists(visfile):
                os.remove(visfile)
            
            if visfile[-3:] == 'mp4':
              anim.save(visfile, dpi=100, fps=visFps)
            elif visfile[-3:] == 'gif':
              anim.save(visfile, writer='imagemagick', fps=visFps, dpi=100)

            plt.close()
            end_time = time.time()
            print("Finished saving visualization file... time taken: {:0.02f}s".format(end_time-start_time))
        
        if debug:
          # view first maxDebugBytes
          print("view first {} bytes".format(maxDebugBytes))
          print(text_bin[:maxDebugBytes])
          print(bytes(text_bin[:maxDebugBytes]))
          print("maxval:",max(text_bin))
          print("minval:",min(text_bin))
        
        plt.figure(figsize=figureSize)
        plt.imshow(imutils.resize(image,width=displayWidth))
        plt.title('Image with Hidden Text')
        plt.xticks([]), plt.yticks([])  # Hides the graph ticks and x / y axis
        plt.show()
        try:
          plt.figure(figsize=figureSize)
          plt.imshow(imutils.resize(text_canvas,width=displayWidth))
          plt.title('Extracted Text')
          plt.xticks([]), plt.yticks([])  # Hides the graph ticks and x / y axis
          plt.show()
          text_data = bytes(text_bin).decode()
          print("Extracted Text:")
          print("===================================")
          print(text_data)
          print("===================================")
        
        except Exception as err:
          print("!! Decode Error:",err)
          return False

        if os.path.exists(txtfile):
            os.remove(txtfile)
            
        fp = open(txtfile,'wb')
        fp.write(bytes(text_bin))
        fp.close()
    
        return True
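
Since each character occupies one pixel and one further pixel is needed for the end-of-text indicator, the capacity of an image is easy to estimate up front. The sketch below is a standalone illustration with hypothetical helper names (`max_chars`, `fits`); it is not part of the class above.

```python
def max_chars(height, width):
    """Characters that fit: one per pixel, minus one pixel for the EOT marker."""
    return max(height * width - 1, 0)

def fits(text_len, height, width):
    """True if text_len characters plus the EOT marker fit in the image."""
    return text_len + 1 <= height * width

# An image of 5045 x 3324 pixels (width x height), as in the header photo.
print(max_chars(3324, 5045))        # 16769579
print(fits(1444, 3324, 5045))       # True
print(fits(4, 2, 2))                # False (4 chars + EOT need 5 pixels)
```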

With the core logic defined, we will now define a simple function to invoke the insert operation.

This function accepts the required inputs, calls the insert function and reports the time taken.

def insert_text(img_file, text_file, output_file):
    start_time = time.time()
    ret_val = Steganography.insert(img_file,text_file,output_file,debugMode)
    end_time = time.time()
    
    if not ret_val:
      print("!! Insert Failure !!")
    else:
      print("Total time taken for Insert operation: {:0.02f}s".format(end_time-start_time))

We will configure the input parameters for the insert operation, providing the names of the input image and text files along with the output image file.

I have configured them as per some sample files which are included as part of this demonstration.

You can change these values to point to an image and text of your choice, given either as local files or as remote URLs.

# sample inputs with local files, uncomment to run
#img_file = 'samples/img1.jpg' # input image file
#text_file = 'samples/text1.txt' # input text file
#output_file = 'output.png' # output image file containing hidden text
# sample inputs with URL, uncomment to run
img_file = 'https://raw.githubusercontent.com/DebalB/my-blogs/master/_notebooks/samples/img1.jpg'
text_file = 'https://raw.githubusercontent.com/DebalB/my-blogs/master/_notebooks/samples/text1.txt'
output_file = 'output.png'

The following code section downloads the input files from the remote location if required.

# download the images locally if URL is provided
if img_file.startswith('http'):
    !wget -nc $img_file
    img_file = os.path.split(img_file)[-1]
if text_file.startswith('http'):
    !wget -nc $text_file
    text_file = os.path.split(text_file)[-1]

We are now ready to perform the insert operation

insert_text(img_file,text_file,output_file)
Length of text: 1444 chars
Total time taken for Insert operation: 0.72s

As you can see above, it is hard for the human eye to distinguish between the original and the merged image containing the text hidden inside.

Next, we will define a function to extract this hidden text from the image.

This simple function accepts the required inputs and calls the extract function, which saves the extracted text to the file name that we provide.

def extract_text(imfile, txtfile, visualize, visfile):
    start_time = time.time()
    ret_val = Steganography.extract(imfile,txtfile,visualize,visfile,debugMode)
    end_time = time.time()
    
    if not ret_val:
      print("!! Extraction Failure !!")
    else:
      print("Total time taken for extraction: {:0.02f}s".format(end_time-start_time))

We will configure our input parameters as before, providing the input image and output text file along with some parameters to control if a visualization video is generated for the extraction process.

Feel free to set visualize to 'false' if you would prefer to skip the visualization.

new_img_file = 'output.png' # input image file for text extraction
output_txt_file = 'text_output.txt' # output file to store extracted text
visualize = 'true'  # flag to enable/disable visualization
visfile = 'visualize.mp4' # file to store visualization video

All we need to do now is call the extract_text function with our configuration parameters

extract_text(new_img_file,output_txt_file,visualize,visfile)
Begin processing image file... this may take a few minutes
Finished processing image file... time taken: 17.37s
Begin saving visualization file... this may take a few minutes
Finished saving visualization file... time taken: 45.79s
Extracted Text:
===================================
To be, or not to be, that is the question:
Whether 'tis nobler in the mind to suffer
The slings and arrows of outrageous fortune,
Or to take arms against a sea of troubles
And by opposing end them. To die...to sleep,
No more; and by a sleep to say we end
The heart-ache and the thousand natural shocks
That flesh is heir to: 'tis a consummation
Devoutly to be wish'd. To die, to sleep;
To sleep, perchance to dream...ay, there's the rub:
For in that sleep of death what dreams may come,
When we have shuffled off this mortal coil,
Must give us pause...there's the respect
That makes calamity of so long life.
For who would bear the whips and scorns of time,
Th'oppressor's wrong, the proud man's contumely,
The pangs of dispriz'd love, the law's delay,
The insolence of office, and the spurns
That patient merit of th'unworthy takes,
When he himself might his quietus make
With a bare bodkin? Who would fardels bear,
To grunt and sweat under a weary life,
But that the dread of something after death,
The undiscovere'd country, from whose bourn
No traveller returns, puzzles the will,
And makes us rather bear those ills we have
Than fly to others that we know not of?
Thus conscience does make cowards of us all,
And thus the native hue of resolution
Is sicklied o'er with the pale cast of thought,
And enterprises of great pitch and moment
With this regard their currents turn awry
And lose the name of action.
===================================
Total time taken for extraction: 63.80s

And voila !!

Our hidden text has magically appeared out of the image.
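
If you want to confirm the round trip programmatically rather than by eye, keep in mind that the insert step replaces every byte above 127 with a dot, so the extracted text matches the original only after the same substitution. A minimal sketch, with `expected_extraction` as a hypothetical helper name:

```python
def expected_extraction(original: bytes) -> bytes:
    """Bytes the extract step should return for a given input text."""
    return bytes(b if b <= 127 else ord(".") for b in original)

sample = "café".encode("utf-8")           # 'é' becomes two bytes above 127
print(expected_extraction(sample))         # b'caf..'
print(expected_extraction(b"To be") == b"To be")   # True
```

To apply it to the files used here, read text_file and output_txt_file in binary mode and compare expected_extraction of the former with the contents of the latter.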

Let's see a visualization of the extraction process.

def show_video(video_path):
  mp4 = open(video_path, 'rb').read()
  data_url = 'data:video/mp4;base64,' + base64.b64encode(mp4).decode()
  return HTML("""<video width=1024 controls>
      <source src="%s" type="video/mp4"></video>
  """ % data_url)

if visualize == 'true' and os.path.exists(visfile):
  display(show_video(visfile))

Hope you enjoyed this demo.

I encourage you to try running the above code with images and text of your choice and share your feedback.