Snapshot release of image package 2.1.1

The image package has accumulated a lot of changes since its last release and I’m hoping to make a new release soon (to match the release of Octave 3.8.0). I have prepared a snapshot tarball (version 2.1.1) with the current development status, which can be installed easily with pkg. It would be great if users of the image package could give it a try and report any problems that the many changes may have caused.

It is important to note that this is not a final release. To make this clear, and to avoid it being distributed as if it were, I made it dependent on a yet-unreleased version of Octave (it depends on the development version of Octave anyway), and made it print a warning every time the package is loaded. After reading this, download the tarball and install it with pkg install -nodeps /path/to/tarball
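
For reference, the whole thing from the Octave prompt looks something like this (assuming the tarball was saved as image-2.1.1.tar.gz):

octave> pkg install -nodeps /path/to/image-2.1.1.tar.gz
octave> pkg load image    # prints the snapshot warning mentioned above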

I am partial to some of the changes in the new version, so these are the ones that I paid the most attention to:

  • complete rewrite of imdilate and imerode with a performance improvement and many compatibility bugs fixed;
  • implementation of the @strel class and support for it in the image package functions (see the short example after this list);
  • support for N-dimensional matrices in several functions;
  • rewrite of the block processing functions which in some cases performs 1000x faster.
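
As a short example of the new @strel class (a minimal sketch, assuming a Matlab-compatible set of shapes and methods such as getnhood):

octave> se = strel ("square", 3);        # a strel object instead of a plain matrix
octave> nhood = getnhood (se);           # the underlying logical neighbourhood
octave> bw = false (5); bw(3,3) = true;
octave> dil = imdilate (bw, se);         # morphology functions now accept strel objects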

There are also a lot of bug fixes and new functions. Some of them break backwards compatibility quite badly, but the changes were needed for the sake of Matlab compatibility. For example, bwconncomp was returning indices for the object perimeters when it should have been returning indices for all elements of the objects. So do take a look at the NEWS file, or run news image after installing the package.
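
For example, this is roughly what the fixed bwconncomp now returns (field names as in Matlab's bwconncomp):

octave> bw = logical ([1 1 0; 0 0 0; 0 0 1]);
octave> cc = bwconncomp (bw);
octave> cc.NumObjects        # 2 connected objects
octave> cc.PixelIdxList{1}   # linear indices of all pixels in the first object, not just its perimeter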

After this release, I plan to follow the Octave core release method of keeping two development branches: a stable branch for minor releases with small bug and regression fixes, and a default branch for big changes. Hopefully that will allow for more frequent releases, as changes of different scope become ready at different times.

Please report any problems you have found either in Octave’s bug or patch trackers.

GSoC 2013: imerode and imdilate (update #6)

Since I deviated so much from the original project while fixing the image IO functions in Octave core, I decided to focus only on optimizing imerode and imdilate in the final phase of GSoC. The reason is that these two functions are at the core of many functions in the image package.

The original plan was to do all the work by expanding the __spatial_filtering__ function, and that’s where I started. While doing so, it became evident that a complete rewrite was necessary: convn(), which could be used for most binary image cases, was performing much faster even though it performs a more complex operation. As such, performing at least as fast as convn() became the target, and it was achieved:

octave> se = [0 1 0; 1 1 1; 0 1 0];
octave> im = rand (1000) > 0.5;
octave> t = cputime (); for i = 1:100, a = imerode (im, se, "same"); endfor; cputime() - t
ans = 2.1281
octave> t = cputime (); for i = 1:100, a = convn (im, se, "same") == nnz (se); endfor; cputime() - t
ans = 2.7802

This implementation could be reused in __spatial_filtering__ to also speed up functions such as medfilt2, ordfilt2, and ordfiltn but there are specific algorithms for those cases which should be used instead.

I have tested the new implementation of imerode against the last release (version 2.0.0) and against the last development version that was still making use of __spatial_filtering__ (cset 6db5e3c6759b). The tests are very simple (test imerode), just random matrices with different numbers of dimensions. The new implementation seems to perform much faster in all cases, showing a performance increase of between 1.5x and 30x (output of test imerode). The differences are bigger for grayscale images (since imerode was already using convn for the binary cases) and for larger structuring elements (SE) with multiple dimensions.

A couple of things:

  • in the latest release of the image package (version 2.0.0), there was no erosion for multidimensional binary images (only grayscale);
  • both development versions make use of the new strel class. One of the things it does is to decompose structuring elements automatically, which is why the tests use a cross rather than a square for the SE;
  • I’m only testing with the shape “same” since version 2.0.0 only had that one;
  • when using binary images I test with different percentages of true values since the new implementation is sensitive to it;
  • I do not compare the results, since I know the new implementation also fixes some bugs, especially related to the border pixels;
  • imdilate uses exactly the same code, so I’m assuming that the differences from imerode are the same.

Times are in seconds; "NO" marks a case that the corresponding version cannot handle.

| Test case                        | version 2.0.0 (old) | cset 6db5e3c6759b (dev) | latest development (new) | Performance old/new | Performance dev/new |
|----------------------------------|---------------------|-------------------------|--------------------------|---------------------|---------------------|
| 2D binary image (90%)            | 0.009081            | 0.024602                | 0.006240                 | 1.4551              | 3.9423              |
| 2D binary image (10%)            | 0.007360            | 0.022881                | 0.004160                 | 1.7692              | 5.5000              |
| 3D binary image with 2D SE       | NO                  | 0.481470                | 0.079125                 | n/a                 | 6.0849              |
| 3D binary image with 3D SE       | NO                  | 0.518032                | 0.075605                 | n/a                 | 6.8519              |
| 7D binary image with 5D SE       | NO                  | 13.940071               | 0.463229                 | n/a                 | 30.093              |
| 2D uint8 image                   | 0.062324            | 0.043403                | 0.029322                 | 2.1255              | 1.4802              |
| 3D uint8 image with 2D SE        | NO                  | NO                      | 0.430347                 | n/a                 | n/a                 |
| 3D uint8 image with 3D SE        | 3.061951            | 1.725628                | 0.791569                 | 3.8682              | 2.1800              |
| 7D uint8 image with 3D SE        | NO                  | NO                      | 2.005325                 | n/a                 | n/a                 |
| 7D uint8 image with 7D SE        | 4.091456            | 2.940984                | 0.541634                 | 7.5539              | 5.4298              |
| 2D uint8 image with a larger SE  | 0.610678            | 0.305579                | 0.087445                 | 6.9835              | 3.4945              |


GSoC 2013: done with ImageIO – please test (update #5)

The ImageIO section of the original project was much shorter than this. It was originally limited to implementing imformats, and then expanding it to read and write multipage images in a Matlab compatible way, which amounts to implementing the Index and Frames options in imread and the WriteMode option in imwrite. The date of that last commit was the 16th of July. However, I didn’t stop there, and tried to fix other incompatibilities with Matlab and add new options.
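
For example, reading selected pages and appending to a multipage file looks roughly like this (the file names are made up, and the sketch assumes an RGB multipage TIFF so that pages can be indexed along the 4th dimension):

octave> page2 = imread ("stack.tif", "Index", 2);     # read only the 2nd page
octave> pages = imread ("stack.tif", "Index", 1:4);   # read the first 4 pages at once
octave> imwrite (pages(:,:,:,1), "out.tif");
octave> imwrite (pages(:,:,:,2), "out.tif", "WriteMode", "append");   # append as a 2nd page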

Here’s a list of the main changes:

  • dealing with transparency: the alpha channel is now returned as the third output of imread and is only accepted through the Alpha option of imwrite (see the example after this list). Previously, the alpha channel would be treated as the 2nd or 4th element on the 3rd dimension for grayscale and RGB images respectively, which prevented distinguishing between an RGB image with transparency and a CMYK image. Also, since GraphicsMagick has the transparency values inverted relative to most other applications (Matlab included), we were doing the same. This has been fixed.
  • CMYK images: implemented reading and writing of images in the CMYK colorspace.
  • imread options: in addition to Index and Frames, also implemented PixelRegion, which may speed up reading when only parts of the image are of interest, and Info, which exists for compatibility only and has no effect at the moment.
  • imwrite options: in addition to WriteMode, implemented Alpha (as mentioned above).
  • indexed images: implemented writing of indexed images. Previously, indexed images would be converted to RGB, which would then be saved. An indexed image will now always return indexes, independently of whether a colormap was requested as output (previously, a raster image would be returned in such cases).
  • floating point and int32 images: GraphicsMagick deals exclusively with integers, and we can’t differentiate between an image saved as int32 or as floating point. Since the latter is far more common, Octave will now return a floating point image when GraphicsMagick reports a bitdepth of 32. Previously, images with a reported bitdepth above 16 would not be read at all (note that this depends on the GM build options: if GM’s quantum depth was 16 or 8, the most common, GM would report a file with a bitdepth of 32 using the quantum depth it was built with, so it would read such files as long as it was not built with quantum depth 32).
  • bitdepth: imread now returns the actual bitdepth of the image in the file rather than trying to optimize it. This is still not perfect, but we’re making a better guess at it than before (this should be very simple, but with GraphicsMagick it’s not).
  • imfinfo: some new fields, especially the reading of Exif and GPS data, which allowed readexif in the image package to be deprecated.
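
Putting some of these options together (a minimal sketch; the file names and region values are made up):

octave> [img, map, alpha] = imread ("photo.png");                           # alpha is now the 3rd output
octave> imwrite (img, "copy.png", "Alpha", alpha);                          # transparency via the Alpha option
octave> crop = imread ("photo.png", "PixelRegion", {[100 200], [50 150]});  # read rows 100-200, columns 50-150 only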

In the end, I recognize that there are some incompatibilities left to fix, but I no longer know how to fix them. As I mentioned in my previous post, GraphicsMagick is too smart for our use. I predict that we will eventually have to move to something different, if for nothing else, to read and write floating point images without trouble. I only had a quick glance at alternatives, but FreeImage would seem like a good bet. Of course, there’s also the possibility of writing our own wrapper library around the format-specific libraries.

It would be nice if users could throw some images at imread and imwrite and report any bugs. I’m especially afraid of regressions, since so much was changed and we don’t really have tests for these functions yet.

GSoC 2013: GraphicsMagick (update #4)

Octave uses GraphicsMagick (GM) for reading and writing images, which gives us a single, simple interface to many different formats. We actually get support for more image formats than Matlab.

However, I’m starting to think that GM may not be the best option, or even an option at all, if we wish to be Matlab compatible. The reason is that GM does not give us methods to access information about the actual image that is in the file, only about how the image is being stored. I’ll list the problems I have faced during the last few weeks while improving image IO in Octave.

Note that I’m not arguing that GM is bad. We can use it to read pretty much any image; the problem is that what we read may be different from what was in the file. Octave needs to get the original image values, not one of the possible representations of that image. GM is doing a great job at being smart, but we may need something more stupid.

Quantum depth

It is well known among those that use Octave for reading images that we depend on the quantum depth used to build GM. GM can be built with quantum depth 8, 16, or 32, which defines the type it uses to store the image values (uint8, uint16, and uint32 respectively). Independently of the original image values, GM will scale the values to the range of that type. If the file has 8 bit depth, and GM was built with quantum depth 16, we need to fix the range (this seems to be one of the few cases where we can get the actual image information, using the depth () method). As long as the images have their values as unsigned integers there is no problem, but images are not limited to those types. Images with floating point values become a problem (see bug #39249), since we lose all information about the range. Also, it appears that at least the JPEG 2000 format may use int16 instead of uint16, and I have no idea what to do about that.
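
A rough sketch of the range fix mentioned above (hypothetical code, not the actual implementation; it assumes the file’s real bit depth can be obtained from imfinfo):

## An 8 bit file read through a GM build with quantum depth 16 comes back
## scaled to the uint16 range, so scale it back down.
raw  = imread ("some_8bit_file.png");
info = imfinfo ("some_8bit_file.png");
if (isa (raw, "uint16") && info.BitDepth == 8)
  img = uint8 (double (raw) / 257);   # 65535 / 255 == 257
else
  img = raw;
endif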

Image class

GM defines two Image Class types: direct and pseudo class. Direct class is for a "normal" image, one "composed of pixels which represent literal color values". Pseudo class is for indexed images, which are "composed of pixels which specify an index in a color palette", or a colormap in Octave vernacular. The problem is that the class is not that of the original image, but whatever GM guesses will increase performance. This leads to some JPEG files (a format that has no support for indexed images at all) being reported as PseudoClass, probably because they have few unique colors. And we can’t make the decision based on the image format, since some formats (such as TIFF and PNG) support both. If an image in one of those formats reports being pseudo class, we have no method to find out whether GM chose that for performance or whether it’s the original file class.

Image type

Similarly to the image class, GM also has image types. There are many more image types, and they define an image of either class as bilevel, grayscale, RGB, or CMYK, with or without opacity (called matte in GM). Just like it does for the image class, GM guesses which type is best, so a file with an RGB image that only uses one of the color channels will report being grayscale, or even bilevel if it only has two values on that channel (see bug #32986 and bug #36820).


For now, there’s not much we can do. Adding methods to find information about the actual image in the file through GM is not trivial, and outside the scope of my project. I have already implemented the options I proposed to implement, the ones that allow reading and writing of multi-dimensional images.

Fixing the problems with indexed images and alpha channels was not part of the proposal, but it made sense to do it as I was going through the code. We are now Matlab compatible, but in some cases there is nothing we can do; it’s just the way GM works.

If we wish to be 100% Matlab compatible when reading images, some other strategy is needed. Whether that means improving GM or replacing it with other libraries, I don’t know. What we need is a C++ library that reads and writes images in many formats and gives us access to the original image values.

GSoC 2013: rewriting imerode and imdilate (update #3)

I pride myself in writing heavily commented code. That’s not necessarily a good thing; maybe it shows that I need an explanation for things that others consider too trivial to comment, places where the code comments itself.

Still, sometimes I take a specific approach to a problem, find a problem with it, and change it. At the start I thought it was the best solution and was unable to foresee the problem. Someone coming later may have the same initial idea as me, and attempt to implement it only to find the same problem. Provided the problem is found early enough, such things are never committed, so one can’t simply look into the logs. That’s why I have a fair amount of comments on why code was not written in some other way.

The following is a comment block that I wrote for the imdilate function (currently an m file) that will never be committed, since I ended up rewriting the whole thing in C++. But at least I can reuse it for this blog post, since it explains a long standing problem with the morphology functions of the image package. Oh! And the comment was so long that I actually thought it deserved its own title in the source code.

    The problem of even sized and non-symmetric Structuring Elements

Using convn() to perform dilation and erosion of binary images is a
*lot* faster than using __spatial_filtering__(). While that is
probably an indication that __spatial_filtering__ could be improved,
we still use convn() for both simplicity and performance.

Erosion is simpler to understand because it's a simple spatial filter,
not unlike stamping on the image. Now, dilation is the erosion of the
image complement. This explains how for grayscale images, erosion and
dilation are minimum and maximum filters respectively. The concept is
quite simple as long as the SE is symmetric. If not, then there's the
small detail of having to translate it, i.e., rotate around its center.
But convn() already performs the translation so we don't do it for
dilation (the one that would need it), but do it for erosion (because
convn() will undo it by retranslating it).

A second problem are even sized SE. When one of the sides has an even
length, then we have more than one possible center. For Matlab
compatibility, the center coordinates is defined as
      floor ([size(SE)/2] + 1)
That's also what convn() uses which is good, but since we also need to
do translation of the SE, small bugs creep in which are hard to track
down.

The solution for non-symmetric is simple, the problem arises with the
even sizes. So we expand the SE with zeros (no effect on the filter)
and cut the extra region in the output image.

Note: we could use cross correlation since it's like convolution without
translation of the SE but: 1) there is no xcorrn(), only xcorr() and
xcorr2(); 2) even xcorr2() is just a wrapper around conv2() with a
translated SE.

After spending an entire day playing with the translation of SEs and padding of images, I decided to rewrite imdilate and imerode. Once you understand what these are actually doing, you’ll see why.

Consider erosion of a logical image as the simplest example. This is basically sliding the SE over the image and checking the values of the image under the true elements of the SE. If any of them is false, then the value at those coordinates of the output matrix will be false. There’s not even a need to check the values of all elements under the SE; we can move to the next position as soon as we find one that does not match.
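
A minimal sketch of that sliding window, for the "valid" shape only (naive Octave code with a made-up name, not the optimized C++ implementation):

## Naive binary erosion, "valid" shape: slide the SE over the image and
## require every pixel under the true elements of the SE to be true.
function out = naive_erode (im, se)
  se = logical (se);
  [sr, sc] = size (se);
  out = true (rows (im) - sr + 1, columns (im) - sc + 1);
  for r = 1:rows (out)
    for c = 1:columns (out)
      block = im(r:r+sr-1, c:c+sc-1);
      out(r,c) = all (block(se));   # false as soon as any required pixel is false
    endfor
  endfor
endfunction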

Convolution, however, needs to multiply the values of the SE with the values of the image, and then sum them all together to get the value for the output matrix. And all of that in double precision. And still, it was performing faster. Just take a look at the following:

octave> im = rand (1000) > 0.3;
octave> se = logical ([0 1 0; 1 1 1; 0 1 0]);

octave> t = cputime (); a = convn (im, se, "valid") == nnz (se); cputime() - t
ans =  0.12001
octave> t = cputime (); c = __spatial_filtering__ (im, se, "min", zeros (3), 0); cputime() - t
ans =  2.3641

There’s a couple of reasons why __spatial_filtering__ is slower, but I think the main one is that it was written to be a more general tool. It has a lot of conditions that make it quite useful for all the other functions, while working nicely with images of any number of dimensions. But it’s 20 times slower at performing something that should be a lot simpler. Since dilation and erosion are at the core of most morphology operators, they are among the functions that should be optimized the most.

So I wrote my own code for dilation in C++ and managed to make it a lot faster than __spatial_filtering__. Since my implementation does not actually evaluate all of the values, its performance depends on the distribution of true elements in the input matrix. And while it’s a lot faster than __spatial_filtering__, it’s still not faster than convn.

octave> im1 = rand (1000) > 0.3;
octave> im2 = rand (1000) > 0.95;
octave> t = cputime (); a = convn (im1, se, "valid") == nnz (se); cputime() - t
ans =  0.11601
octave> t = cputime (); b = imerode (im1, se, "valid"); cputime() -t
ans =  0.21601
octave> t = cputime (); c = __spatial_filtering__ (im1, se, "min", zeros (3), 0); cputime() -t
ans =  2.3681

octave> t = cputime (); a = convn (im2, se, "valid") == nnz (se); cputime() - t
ans =  0.12001
octave> t = cputime (); b = imerode (im2, se, "valid"); cputime() -t
ans =  0.16001
octave> t = cputime (); c = __spatial_filtering__ (im2, se, "min", zeros (3), 0); cputime() -t
ans =  2.3681

As can be seen, even under optimal conditions (when most of the input matrix is false), my implementation still performs slower than convn. But I’m not done with it yet.

GSoC 2013: imformats and filenames (update #2)

Since my last post, I’ve been working on the image IO abilities of Octave by implementing the missing function imformats(). The purpose of this function is to control imread(), imwrite(), and imfinfo(), by adding support for new image formats and/or changing how they act with existing ones.

By default, Octave’s image IO functions (imread(), imwrite(), and imfinfo()) use the GraphicsMagick library for their operations. Through GraphicsMagick, Octave can support a vast number of image formats, a much larger number than the ones required for Matlab compatibility. However, because of the large number of image formats in science, and their commonly closed nature, this is still not enough to support all image formats out of the box.

Enter imformats()

imformats() keeps an internal list of all the supported image formats, and of the functions that should be used to deal with them. What this means is that when a user calls, for example, imread(), that function only needs to ask imformats() what function should be used to read the file, and then pass all the arguments to it.
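
The list itself can be inspected directly (a small sketch; the exact set of fields is an assumption based on Matlab’s imformats):

octave> formats = imformats ();    # struct array, one element per supported format
octave> tif = imformats ("tif");   # the entry for a single format
octave> fieldnames (tif)           # extensions plus the handles used to read, write, and get info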

In my experience (mainly microscopy), the majority of the closed image formats in science are built around TIFF. For example, an lsm file is just a TIFF file. By default, Octave will recognize it as TIFF and try to read it as such. However, it will fail, because only half of the images inside the file are the actual images meant to be read; the other half are thumbnails of them and should be skipped. In another example, dv files are also just TIFF files, but for some weird reason the images are inverted. Here’s how to read such files:

if (strcmpi (extension,".lsm"))
  image = imread (file, 1:2:nPages*2);
elseif (strcmpi (extension,".dv"))
  image = imread (file, 1:nPages);
  image = rotdim (image, 2, [1 2]);
else
  ## let imread decide
  image = imread (file, 1:nPages);
endif
Note that the actual code is a bit more complicated than what I’m showing, in order to account for old versions of the formats, which can’t be deduced by the file extension alone.

As can be seen, most changes required to make Octave able to correctly read other formats are small hacks around imread(). But we can’t simply add a new format to imformats() instructing imread() to use a function that itself makes use of imread(), or we would get stuck in an endless loop. Since the actual function performing the action is now separate, one must use the function handle returned by imformats(). imread() can now be configured to read the above formats with the following code:

function image = read_lsm (filename, varargin)
  tif = imformats ("tif");
  ## piece of code to figure nPages from varargin
  image = tif.read (filename, 1:2:nPages*2);
endfunction
function image = read_dv (filename, varargin)
  tif = imformats ("tif");
  ## piece of code to figure nPages from varargin
  image = tif.read (filename, 1:nPages);
  image = rotdim (image, 2, [1 2]);
endfunction

lsm = dv = imformats ("tif");

lsm.read = @read_lsm;
imformats ("add", "lsm", lsm);

dv.read = @read_dv;
imformats ("add", "dv", dv);

This is something not meant to be done in the main code. It’s something that can be added to an .octaverc file or, even better, to the PKG_ADD scripts of an Octave package meant to add support for other image formats.

The end result is consistent Octave code that abstracts itself from the file formats.

Filename and Mathworks bashing

There’s a tiny detail that complicated things a bit. My understanding of a filename is that it includes the file extension. However, someone at Mathworks seems to think that is open to discussion, and made it possible to pass the extension as a separate argument. As if that was not weird enough, the extension is completely ignored if a file is found before trying to append it. Consider the following situation, where you have images with and without extensions:

>> ls
 test_img
 test_img.tif
 test_img2.jpg

>> imread ("test_img");          # reads test_img
>> imread ("test_img.tif");      # reads test_img.tif
>> imread ("test_img2", "jpg");  # reads test_img2.jpg
>> imread ("test_img", "tif");   # reads test_img! ????

So yeah! Why would we pass the extension as a separate argument just to have it ignored? I may be wrong about the why of this API but my guess is that it’s required to support more arguments after the extension:

>> imread ("test_img.tif", "Index", 1:10, "PixelRegion", {[250 350] [100 200]})

If imread() can find a file without the extension, then whatever comes next must be the reading options, and not the extension. But then, why does imwrite() do the same? It certainly can’t make that decision based on whether the file already exists. So we have to check whether we have an odd or even number of leftover arguments.
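
A hypothetical sketch of that parity check (not the actual code): options always come in name/value pairs, so an odd number of leftover arguments means the first one must be the extension.

## varargin holds everything after the filename
if (mod (numel (varargin), 2) != 0)
  ext     = varargin{1};       # odd count: the first leftover argument is the extension
  options = varargin(2:end);
else
  ext     = "";
  options = varargin;
endif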

And why the possibility of not specifying the file extension in the first place? Again, just my guess: to cover the stupidity of people who don’t know about file extensions because their file managers hide them.

Interestingly, the question of whether the file extension belongs in the filename is only relevant with image files. Other functions such as audioread(), csvread(), fopen(), or wavread() do not suffer from this.

For more Matlab bashing, go read the Abandon Matlab blog.

GSoC 2013: image processing of ND images (update #1)

My project proposal for image processing of ND images in GNU Octave was accepted for Google Summer of Code 2013. If all goes as planned, not only will it be a very nice improvement to the image package, but also a chapter of my PhD thesis.

I decided to warm up by implementing montage(), which basically displays all pages (or frames) of an ND image as a single 2D image. It is not part of the proposal, but the coding period hasn’t started anyway, and it will be really useful for the rest of the project. So useful, indeed, that I had written my own version of it last year, completely unaware of Matlab’s implementation and, of course, completely incompatible with it. So I had to do it all over again.

Following the trend of changing 20 000 other small things first, my first contribution ended up being a fix for ind2rgb() that caused a complaint within just two hours. Never had I made a commit with such immediate repercussions. Still, I’m taking it positively because:

  1. it means I’m changing code that people use and care about;
  2. my fix was actually correct :)

I guess the main problem users will have is understanding ND images. While images can have many dimensions, the standard is to store them as a 4D matrix, with the multiple images concatenated along the 4th dimension and the 3rd dimension reserved for the colorspace. Expect a new section in the Octave manual.
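
For example, following that convention:

octave> im = rand (200, 300, 3, 5);    # 5 RGB frames of 200x300 pixels each
octave> rgb_frame = im(:,:,:,2);       # the 2nd frame is a regular 200x300x3 RGB image
octave> gray = rand (200, 300, 1, 5);  # a grayscale stack keeps a singleton 3rd dimension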

The most difficult part of implementing montage() was dealing with a cell array of filepaths for images. Handling all the different possibilities wasn’t trivial, trying to think of all possible combinations and create a correct image from all of them. In particular, multipage indexed images are troublesome, because I have never seen one. And because imread() seems to always return a single colormap, a possible Octave bug may exist when reading a multipage indexed image where each page has a different colormap. Reading the TIFF6 file specification, such files could in theory exist; however, I don’t think they actually do.

In the end, montage() is finished and added to the image package (r11933 and r11934). In addition to Matlab’s API, I included options to add margins between panels and to configure the colour of both the margins and the panel background. It’s far from being the smartest piece of code I ever wrote, but smart code would use a lot of permute(), reshape(), and indexing trickery, turning it into a maintenance problem.

It took me 4 times longer than originally planned. I’m blaming the Irish weather.

Unreal Irish weather.

Ireland, famous for its non-stop rain, has had amazingly sunny weather for 2 weeks now. Galway has been world procrastination central, as everyone’s out getting as much sun as possible.