Cognitive Services: Smart image cropping with AI

At the recent London Sitecore User Group meetup, Ian Jepp from Lake Solutions gave a talk on AI Images. He outlined a common dilemma: either maintain a set of tailored versions of the same image, scaled differently for multiple devices and page placements, or use a single high-quality image scaled to the desired size. The first option carries a huge resourcing cost to create and manage such collections, while the second severely impacts page load time, harming both UX and SEO results with Google.

The presented solution was to crop images to thumbnails on the fly. That's not a big deal when cutting down to a fixed area of an image, e.g. its centre. But would that be sufficient from the UX perspective?

Let's have a look at an example with this original image:

 

It has already been scaled down for presentation purposes (an HQ version would be several MBs), but it still serves the purpose well. Now, let's say we'd like to generate a 200x200px thumbnail by selecting the central area of the image:

 

Unfortunately, that doesn't look satisfactory, as we've cut out part of our 'area of interest', which is the car itself. We could have manually selected a 200x200px frame around the car. However, doing this in bulk, e.g. adjusting thousands of images in the Media Library to produce thumbnails, sounds like a lot of tedious manual work and is not an acceptable option without a team of editors.
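
For reference, the fixed centre crop used above is just arithmetic on the image dimensions. The sketch below computes the rectangle a naive cropper would cut out; the `CenterCrop` class and its `GetCropArea` method are hypothetical names made up for this illustration, not part of any Sitecore or Microsoft API. It shows why this approach cannot know where the subject of the photo actually is:

```csharp
using System;

// Hypothetical helper, for illustration only: computes the rectangle of a
// naive centre crop. It knows nothing about the image content.
public static class CenterCrop
{
    // Returns the top-left corner and size of a crop area centred in the
    // source image. The crop is clamped to the source size if the source
    // is smaller than the requested thumbnail.
    public static (int X, int Y, int Width, int Height) GetCropArea(
        int sourceWidth, int sourceHeight, int cropWidth, int cropHeight)
    {
        if (sourceWidth <= 0 || sourceHeight <= 0 || cropWidth <= 0 || cropHeight <= 0)
        {
            throw new ArgumentException("All dimensions must be positive.");
        }

        int width = Math.Min(cropWidth, sourceWidth);
        int height = Math.Min(cropHeight, sourceHeight);

        // Integer division: any odd leftover pixel ends up on the right/bottom side.
        int x = (sourceWidth - width) / 2;
        int y = (sourceHeight - height) / 2;

        return (x, y, width, height);
    }
}
```

For a hypothetical 800x533px source, `CenterCrop.GetCropArea(800, 533, 200, 200)` gives a 200x200px frame starting at (300, 166), regardless of whether the car sits in the middle or off to one side.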

This is where Microsoft Cognitive Services comes in: its Computer Vision Service offers advanced image processing features, including 'smart' image cropping, which is exactly what we need. Let's see how the service handles this task, using AI to identify the crucial area and cut the image down to the desired size:

 

That's a spot-on result.

As the presentation didn't come with open-source code to try it out, I decided to spend some time exploring this topic further.

To get Cognitive Services to help with this task, we need some code that sends an image to the Computer Vision Service together with the desired width and height of the resulting image. Microsoft offers a code sample for a quick start. As the service allows some flexibility in how the image source is provided (API Reference), I'll present two overloads: one accepting the image as a byte array, the other taking the URL of an image already available on the web (that includes your CDN, of course):

namespace Sitecore91.Foundation.CognitiveServices.ComputerVision
{
    public interface IComputerVisionService
    {
        byte[] GetThumbnail(byte[] image, int width, int height, bool isSmartCrop = true);
        byte[] GetThumbnail(string imageUrl, int width, int height, bool isSmartCrop = true);
    }
}

Now, let's get a simple implementation:

using System;
using System.Net.Http;
using System.Net.Http.Headers;
using Sitecore.Diagnostics;

namespace Sitecore91.Foundation.CognitiveServices.ComputerVision
{
    public class ComputerVisionService : IComputerVisionService
    {
        private readonly string _subscriptionKey = "<-SUBSCRIPTION KEY->";
        private readonly string _serviceUrl = "https://<-AZURE REGION->.api.cognitive.microsoft.com/vision/v2.0/";

        public byte[] GetThumbnail(byte[] image, int width, int height, bool isSmartCrop = true)
        {
            var apiMethod = "generateThumbnail";
            var requestUri = _serviceUrl + apiMethod + $"?width={width}&height={height}&smartCropping={isSmartCrop}";

            try
            {
                using (var client = new HttpClient())
                {
                    client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", _subscriptionKey);
                    
                    using (var content = new ByteArrayContent(image))
                    {
                        content.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");
                        using (var response = client.PostAsync(requestUri, content).GetAwaiter().GetResult())
                        {
                            if (response.IsSuccessStatusCode)
                            {
                                return response.Content.ReadAsByteArrayAsync().GetAwaiter().GetResult();
                            }
                            var errorMessage = response.Content.ReadAsStringAsync().GetAwaiter().GetResult();
                            Log.Error(errorMessage, this);
                            return null;
                        }
                    }
                }
            }
            catch (Exception e)
            {
                Log.Error(e.Message, this);
                return null;
            }
        }

        public byte[] GetThumbnail(string imageUrl, int width, int height, bool isSmartCrop = true)
        {
            var apiMethod = "generateThumbnail";
            var requestUri = _serviceUrl + apiMethod + $"?width={width}&height={height}&smartCropping={isSmartCrop}";

            try
            {
                using (var client = new HttpClient())
                {
                    client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", _subscriptionKey);

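                    // Send the image location as a JSON body: {"url":"..."} (property name quoted so the payload is valid JSON)
                    using (var content = new StringContent($"{{\"url\":\"{imageUrl}\"}}"))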
                    {
                        content.Headers.ContentType = new MediaTypeHeaderValue("application/json");
                        using (var response = client.PostAsync(requestUri, content).GetAwaiter().GetResult())
                        {
                            if (response.IsSuccessStatusCode)
                            {
                                return response.Content.ReadAsByteArrayAsync().GetAwaiter().GetResult();
                            }
                            var errorMessage = response.Content.ReadAsStringAsync().GetAwaiter().GetResult();
                            Log.Error(errorMessage, this);
                            return null;
                        }
                    }
                }
            }
            catch (Exception e)
            {
                Log.Error(e.Message, this);
                return null;
            }
        }
    }
}
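
Both overloads build the same request URI, and the URL overload assembles its JSON body by hand. As a minimal sketch of how that shared logic could be pulled into one place, the hypothetical `ThumbnailRequest` helper below (the class and method names are my own, not part of the Computer Vision API) builds the URI and the payload. It lower-cases the `smartCropping` flag on the assumption that `true`/`false` is the safest spelling, since C# string interpolation of a `bool` yields `True`/`False`, and it escapes quotes and backslashes in the image URL:

```csharp
using System;
using System.Globalization;

// Hypothetical helper (not part of the Computer Vision API): builds the
// request URI and the JSON body needed by the two GetThumbnail overloads.
public static class ThumbnailRequest
{
    // Builds ".../generateThumbnail?width=W&height=H&smartCropping=true|false".
    // Assumption: a lower-case boolean is the safest spelling for the query string.
    public static string BuildUri(string serviceUrl, int width, int height, bool isSmartCrop)
    {
        // Tolerate a base URL given with or without a trailing slash.
        var baseUrl = serviceUrl.EndsWith("/", StringComparison.Ordinal)
            ? serviceUrl
            : serviceUrl + "/";

        return string.Format(
            CultureInfo.InvariantCulture,
            "{0}generateThumbnail?width={1}&height={2}&smartCropping={3}",
            baseUrl, width, height, isSmartCrop ? "true" : "false");
    }

    // Builds {"url":"..."} with a quoted property name, escaping backslashes
    // and double quotes so the result stays valid JSON.
    public static string BuildUrlPayload(string imageUrl)
    {
        var escaped = imageUrl.Replace("\\", "\\\\").Replace("\"", "\\\"");
        return "{\"url\":\"" + escaped + "\"}";
    }
}
```

With helpers like these, each overload would only differ in the `HttpContent` it posts, which keeps the two code paths from drifting apart.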

The code is quite straightforward. All you need to do is replace the 'SUBSCRIPTION KEY' and 'AZURE REGION' placeholders with the values you get upon registering for Azure Cognitive Services in a given region. All the necessary information on how to do that is available in the links provided above. What's more, the Computer Vision Service offers a free tier sufficient for development and testing purposes, so nothing stops you from giving it a try in your solution.
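
Hardcoding the key in source is fine for a quick experiment, but you may prefer to keep it out of the repository. Here is a minimal sketch, assuming the values are supplied through environment variables; the `ComputerVisionSettings` class and the variable names `COMPUTER_VISION_KEY` and `COMPUTER_VISION_REGION` are made up for this example, and in a Sitecore solution a config setting would be an equally reasonable home:

```csharp
using System;

// Hypothetical settings holder: the environment variable names below are
// assumptions made for this sketch, not names defined by Azure or Sitecore.
public static class ComputerVisionSettings
{
    public const string KeyVariable = "COMPUTER_VISION_KEY";
    public const string RegionVariable = "COMPUTER_VISION_REGION";

    // Builds the v2.0 endpoint used in this post for a given Azure region,
    // e.g. "westeurope".
    public static string BuildServiceUrl(string region)
    {
        if (string.IsNullOrWhiteSpace(region))
        {
            throw new ArgumentException("Azure region must be provided.", nameof(region));
        }

        return $"https://{region.Trim().ToLowerInvariant()}.api.cognitive.microsoft.com/vision/v2.0/";
    }

    // Reads the subscription key, failing loudly if it has not been configured.
    public static string GetSubscriptionKey()
    {
        var key = Environment.GetEnvironmentVariable(KeyVariable);
        if (string.IsNullOrWhiteSpace(key))
        {
            throw new InvalidOperationException($"Environment variable {KeyVariable} is not set.");
        }

        return key;
    }

    // Reads the region from the environment and turns it into the service URL.
    public static string GetServiceUrl()
    {
        return BuildServiceUrl(Environment.GetEnvironmentVariable(RegionVariable));
    }
}
```

The `_subscriptionKey` and `_serviceUrl` fields could then be initialised from `GetSubscriptionKey()` and `GetServiceUrl()` instead of string literals.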
