Sobel – Image Processing Using Sobel Convolution

Edges characterize boundaries and are a problem of fundamental importance in image processing. Edges in images are areas with strong intensity contrasts – a jump in intensity from one pixel to the next. Edge detecting an image significantly reduces the amount of data and filters out useless information, while preserving the important structural properties in an image. There are many ways to perform edge detection. However, the majority of different methods may be grouped into two categories, gradient and Laplacian. The gradient method detects the edges by looking for the maximum and minimum in the first derivative of the image.



Sobel convolution on the toureiffel512.bmp sample image

The Sobel operator performs a 2-D spatial gradient measurement on an image. Typically it is used to find the approximate absolute gradient magnitude at each point in an input grayscale image. The Sobel edge detector uses a pair of 3 by 3 convolution masks (or kernels), one estimating the gradient in the x-direction (columns) and the other estimating the gradient in the y-direction (rows). A convolution mask is usually much smaller than the actual image. As a result, the mask is slid over the image, manipulating a square of pixels at a time.
The Sobel masks are shown below:
-1 0 +1            +1 +2 +1
-2 0 +2 0 0 0
-1 0 +1 -1 -2 -1
Gx Gy
Sobel Masks

An approximate magnitude can be calculated using: |G| = |Gx| + |Gy|. The code for the Sobel edge detector in this example uses this gradient approximation.

Why is Convolution Suitable for Accelerating in Mitrion-C? The main reasons why convolution has a good potential acceleration are the inherent parallelism and the potential for data re-use.

Inherent parallelism: The new value for each pixel may be calculated independently of every other, so the algorithm has a high degree of fine-grained parallelism. In this implementation, the Mitrion-C code calculates 16 values at a time, sweeping across the image in an X-Y scan.

Data re-use: Because the bandwidth into and out of the FPGA is one of the scarcest resources available to the programmer, it is important to use each piece of input data for as many calculations as possible per load / store. This is known as data re-use, and the 3 by 3 Sobel convolution algorithm reads every pixel value and uses it in a calculation nine times. This is a favourable rate of re-use compared to some calculations such as a dot product only perform a single calculation with each input value, and thus have no opportunity for re-use.

Overview and Organization of the Sobel Example Application
This example consists of two programs. The host program is written in ANSI-C and does file I/O, error checking, reserves the FPGA(s) from the OS, and handles data communications to and from the FPGA. The Mitrion-C program is compiled and synthesized to a bitstream that is loaded onto the FPGA to perform edge detection using the Sobel image convolution algorithm. The two programs are described in detail below.

Click here to download the source files (ZIP, 384 KB)

Mitrion-C Program: At the heart of the Mitrion-C program (file sobel.mitc) is a pipelined stencil generator. Every MVP clock cycle, the stencil generator outputs a two-dimensional array of 3 x 3 stencils of the current pixel surrounded by it’s neighbour pixels in the input image. These stencils are not stored, but generated on-the-fly as the image is read in. This reduces memory usage and greatly simplifies the design of the convolution function, since it only has to work on a single isolated stencil. The Sobel kernel function is called inside a nested parallel loop, corresponding to the two-dimensional array of the input image. The remainder of the Mitrion-C program consists of routines to read the data from the input memory bank and write out the results to the output memory bank.

Sobel kernel: The Sobel edge detection convolution is implemented in the function do_sobel. The function takes as its input a 3 by 3 stencil of the values surrounding a given pixel, and returns a new pixel calculated from that stencil. If an image processing algorithm other than Sobel convolution is required, the do_sobel function can be replaced with a new function implementing the algorithm and no further changes to the Mitrion-C program are required. If the new function returns more than one value for a given stencil, a list or vector of values may be returned. These values would need to be written to output memory for return to the host processor, of course.

Parallelism scheme: The main body of the program (at main()), consists of a triply-nested series of parallel foreach loops. The foreach loop is a parallel construct in Mitrion-C that allows all iterations of a loop to execute concurrently if there are no loop-carried constraints.

The outermost loop is over rows of the input image, the next inner loop is over blocks within a row, and the innermost loop is over PAR_UNROLL pixels within a block, where PAR_UNROLL is a constant defined in the file header. Because the innermost loop is operating on a vector collection, PAR_UNROLL copies of the kernel do_sobel are placed on the FPGA, all operating simultaneously. The outer two loops provide pipelining, and as a result the routine is able to process PAR_UNROLL (16 in this example) pixels per clock tick, or 1.6 Billion pixels per second.

Stencil generator: The stencil generator routine stencil_gen_2dv takes in a 2 dimensional rectangular array of pixel values and generates a two-dimensional list of 3 x 3 stencils, each containing the values surrounding a pixel. The routine is unrolled for parallelism in one of the two dimensions by a factor PAR_UNROLL. Every row is subdivided into frames of length PAR_UNROLL. The value of PAR_UNROLL, which controls the degree of loop unrolling, is a trade-off between space available on the FPGA and input memory bandwidth.

Read/Write Scaffolding: The image to be processed is read in from external memory at the start of main(). PAR_UNROLL values are read in at one time, with a single call to the memread() function. Since each pixel value is an 8 bit quantity, PAR_UNROLL is set to 16 to match the 128-bit width of the RC100’s memory interface. A parallel foreach loop of length NREADS is used to input the entire image; thus PAR_UNROLL times NREADS must be the number of pixels in the image. A reshape() command is used to convert the list of 128-bit wide vectors into the 2D array used by subsequent routines.

Writing out the processed image is basically the reverse of the image input: first the processed image is converted to a list of 128-bit vectors using another reshape() command, then the vectors are written out with a memwrite() loop.

Modifying the Image Size Constants: Dynamic memory allocation is not available in Mitrion-C, so a number of parameters related to the size of the image and the degree of parallelization must be defined at compile time at the top of the Mitrion-C program. The details of the parameters are explained in the program header. The Mitrion-C program must be re-compiled and a new bitstream generated for each image size.
If you wish to process an image other that the default size of 512 by 512 pixels, change the following definitions in the top of the sobel.mitc program: IMSIZEY, IMSIZEX, IMSZ_P, and NREADS. If you wish to change the pixel depth, you must change PIXEL, PIXMAX and NREADS. The comments in the source code explain the meaning of these constants in more detail.

A single image (or multiple images packed sequentially into a single buffer) must be less than or equal to 8 MB in order to fit in RC100 memory. Regardless of the number of images in a buffer, the size of a buffer to be processed determines how large the external memory (MEM_SRAM and NWORDS_SRAM in sobel.mitc) should be declared. NWORDS_SRAM must be the same in both the Mitrion-C and host programs otherwise deadlocks can occur.

When multiple images are processed, the total number of images must be a multiple of the number of images in a single buffer because the data transfers to/from the FPGA must all be the same size.

The sobel.mitc program listing for SGI RASC RC100:

Mitrion-C 1.5;
/****************************************************************
*
* Sobel Edge Detection via convolution
*
* This Mitrion-C program is intended for use with a SGI RASC
* RC100. It expects an input image in Bank A of RC100 external
* memory and stores the convolved image in Bank B.
*
*/


/*********************************************************************
* The size of External Memory on the RC100 is defined by RASC as two
* banks of SRAM, each containing 1048576 (0x100000) 128 bit words of
* memory.
*
* In order to take advantage of automatic double buffering (a memory
* scheme where new data is written to the FPGA while it is
* processing old data) implemented in RASClib the usable memory is
* only half of the physical memory to account for the second buffer.
*
* NOTE: This size is required to be the same in host program.
*/
const MEM_SRAM_NWORDS = 0x80000;
const MEM_SRAM_NBITS = 128;
type MEM_SRAM = typedef mem bits:MEM_SRAM_NBITS[MEM_SRAM_NWORDS];

type PIXEL = typedef uint:8; /* Pixel depth. If changed, change PIXMAX also */
const PIXMAX = 255; /* Maximum value for PIXEL type */
const IMSIZEY = 512; /* Image height/width dimensions are IMSIZEY/IMSIZEX */
const IMSIZEX = 512; /* If image size changes, change IMSZ_P and PAR must
be changed accordingly */

/*********************************************************************
* If the number of pixels per external SRAM word (PIXPER) is
* different from the parallel unrolling factor (PAR_UNROLL), you
* must define MISMATCH by uncommenting the following line:
*
* #define MISMATCH
*
* When the two are the same, it is possible to reshape the list-list
* to a list-list-vector (where the inner vector is the unrolled
* parallel loop) in one step. If the values are mismatched, two
* additional reformats are needed, using additional time and space.
*
*/
const PIX_PER_WORD = 16; /* PIXELs per SRAM word, half this for 16 bit PIXELs */
const PAR_UNROLL = 16; /* parallel unrolling factor */


const NWORDS_PER_IMG = IMSIZEY * IMSIZEX / PIX_PER_WORD; /* Number of words per image */
const IMG_BLOCKSIZE = IMSIZEX / PAR_UNROLL; /* Width of one block of image */



/*********************************************************************
*
* Sobel Edge Detection Example Using Convolution
*
*/
(MEM_SRAM, MEM_SRAM) main( MEM_SRAM mem_a_00, MEM_SRAM mem_b_00 )
{
/*******************************************************************
* Read in the image data PIX_PER_WORD values at a time
*/
(image_lv, mem_a_02) = foreach ( i in <0 .. NWORDS_PER_IMG-1> )
{
PIXEL[PIX_PER_WORD] image_v ;
(image_v, mem_a_01) = memread( mem_a_00, i );
} (image_v, mem_a_01) ;

/*******************************************************************
* Convert the input list into a rectangular image which is
* unrolled PAR_UNROLL times in the X dimension. See MISMATCH
* comments above for more information.
*/
#if defined(MISMATCH)
input_ll = reformat( image_lv, <NWORDS_PER_IMG><PIX_PER_WORD> );
image_ll = reshape( input_ll, <IMSIZEY><IMG_BLOCKSIZE><PAR_UNROLL> );
image_llv = reformat( image_ll, <IMSIZEY><IMG_BLOCKSIZE>[PAR_UNROLL] );
#else
image_llv = reshape ( image_lv, <IMSIZEY><IMG_BLOCKSIZE>[PAR_UNROLL] );
#endif


/*******************************************************************
* Convert every pixel in the input image to a 3x3 stencil of
* pixels.
*/
PIXEL <IMSIZEY><IMG_BLOCKSIZE>[PAR_UNROLL][3][3] stencil_llvvv =
stencil_gen_2d( image_llv );


/*******************************************************************
* Iterate over all the stencils, convolving each stencil
*/
newimage_llv = foreach ( stencil_lvvv in stencil_llvvv )
{
newimage_lv = foreach ( stencil_vvv in stencil_lvvv )
{
newimage_v = foreach (stencil_vv in stencil_vvv)
{
PIXEL newimage = do_sobel( stencil_vv ) ;
} newimage ;
} newimage_v ;
} newimage_lv ;



/*******************************************************************
* Convert the rectangular image, which was unrolled, back to a
* list of packed 128 bit values.
*/
#ifdef MISMATCH
imageout_lll = reformat (newimage_llv, <IMSIZEY><IMG_BLOCKSIZE><PAR_UNROLL>) ;
imageout_ll = reshape (imageout_lll, <NWORDS_PER_IMG><PIX_PER_WORD>) ;
imageout_lv = reformat (imageout_ll, <NWORDS_PER_IMG>[PIX_PER_WORD]) ;
#else
imageout_lv = reshape (newimage_llv, <NWORDS_PER_IMG>[PIX_PER_WORD]) ;
#endif


/*******************************************************************
* Write the image data to external memory bank B
*/
mem_b_02 = foreach ( imageout_v in imageout_lv by i )
{
mem_b_01 = memwrite( mem_b_00, i, (bits:MEM_SRAM_NBITS) imageout_v );
} mem_b_01;

} (mem_a_02, mem_b_02) ;



/*********************************************************************
* Stencil generator
*
* A 2-D image (list of lists) is input, producing a list of 3x3
* stencils, one for each input image pixel.
*
* The list is subdivided into vectors of length PAR_UNROLL. This is
* done to afford unrolling of the minor (X) dimension of the image
* to exploit loop level parallelism.
*
*
* NOTE:
* Language restrictions require that the result of looping over a
* collection produces a collection of the same length. However,
* 3x3 stencils are not valid on the edges of an image, therefore
* the stencils produced at the edges of the image should not have
* been computed and are incorrect.
*/
stencil_gen_2d( PIXEL <IMSIZEY><IMG_BLOCKSIZE>[PAR_UNROLL] image_llv )
{
/*******************************************************************
* Define initial (all 0) values the first row of stencils
*/
PIXEL <IMG_BLOCKSIZE>[PAR_UNROLL][3][3] st_lvvv = foreach(v in <0 .. IMG_BLOCKSIZE-1>)
{
e_vvv = foreach (e_vv in [0 .. PAR_UNROLL-1]) [ [0, 0, 0],
[0, 0, 0],
[0, 0, 0] ];
} e_vvv ;


/*******************************************************************
* Serial loop over the rows of the image array
*/
stencil_llvvv = for ( image_lv in image_llv )
{
/***************************************************************
* Initialize a list of 3x3 stencils, one for each stencil to
* computed concurrently (PAR_UNROLL times).
*/
PIXEL [PAR_UNROLL][3][3] st_vvv =
foreach (vv in [0 .. PAR_UNROLL-1]) [ [0, 0, 0],
[0, 0, 0],
[0, 0, 0] ];

/***************************************************************
* Serial loop over all the unrolled blocks of columns in the
* row
*/
st_lvvv = for ( pix_v, st_block_vvv in image_lv, st_lvvv )
{
/*****************************************************************
* Initialize the value of the "previous" stencil
*/
PIXEL [3][3] st_vv = [ [0, 0, 0],
[0, 0, 0],
[0, 0, 0] ];

/*****************************************************************
* Parallel loop over blocks of columns
*/
uint:8 ix = 0 ;
st_vvv = for ( pix, st_cols_vv in pix_v, st_block_vvv )
{
/*************************************************************
* Parallel 3x3 stencil construction
*/
st_vv = foreach ( i in [0 .. 2] )
{
st_v = foreach ( j in [0 .. 2] )
{
ptemp_i = if ( i > 1 )
{
ptemp_j = if ( j > 1 )
pix
else
{
ptemp = if ( ix > 0 )
st_vv[i][j+1] // value from previous stencil
else
// Value from previous unrolled region
st_vvv[PAR_UNROLL-1][i][j+1] ;
} ptemp ;
} ptemp_j
else
st_cols_vv[i+1][j]; // value from previous row
} ptemp_i ;
} st_v ;
ix = ix + 1 ;
} >< st_vv ;
} >< st_vvv ;
} >< st_lvvv ;
} stencil_llvvv ;





/*********************************************************************
* Perform Sobel convolution on a single stencil
*/
PIXEL do_sobel ( PIXEL [3][3] stencil )
{
/*******************************************************************
* Create x and y gradients
*/
gradX = stencil[0][2] + 2 * stencil[1][2] + stencil[2][2]
-stencil[0][0] - 2 * stencil[1][0] - stencil[2][0] ;

gradY = stencil[0][0] + 2 * stencil[0][1] + stencil[0][2]
-stencil[2][0] - 2 * stencil[2][1] - stencil[2][2] ;

/*******************************************************************
* Gradient magnitude approximation (from Myler)
*/
gsum = _abs(gradX) + _abs(gradY) ;


/*******************************************************************
* Limit value
*/
newpix = if (gsum > PIXMAX) PIXMAX else gsum ;
PIXEL newpix1 = if (newpix < 0) 0 else newpix ;

/*******************************************************************
* Invert image
*/
PIXEL newpix2 = PIXMAX - newpix1 ;
} newpix2 ;