Sitecore development with Docker

There are already a couple of good repositories of Sitecore Docker base images on GitHub that let you quickly build and run Sitecore in Docker containers. Recently, Martin Miles wrote a great tutorial on getting started with Sitecore in Docker containers. I'll explain how to take the next step: developing your Sitecore instance while it runs in Docker.

1. Repository with Sitecore Docker base images
To start, we need Sitecore Docker images. The sitecoreops/sitecore-images repository offers a diverse set of Sitecore base images (up to date with 9.1.1 at the time of writing), along with scripts to build and run Sitecore in the XP0, XM1 and XP1 topologies, including images with pre-installed Sitecore modules. Thanks to that, you can easily deploy a scaled instance of Sitecore with SXA or JSS!

2. Docker registry (Optional)
A registry isn't required, but it makes working with images easier, as you can store a collection of built, ready-to-use images for immediate download. If you don't want to use one, modify the PowerShell script provided in sitecore-images and remove the code section responsible for uploading images. Alternatively, to turn off image upload temporarily while keeping the registry configured just in case, set PushMode to "Never" (the default is "WhenChanged").

My registry of choice is Azure Container Registry. I've used it only for image storage; however, it offers many more features, such as automatic image rebuilds on commits to your code repository, so you always have an up-to-date collection of images matching the latest code base. It also comes in handy that you can create as many repositories as you want. That's great when trying out the XP1 topology, which consists of 7 containers running in parallel (Solr, SQL, CM, CD + another 3 for xConnect). Images are built with explicitly defined layers (separate Dockerfiles), so, for instance, the SXA CM image is built with the following dependency chain, where each image is a separate repository:

sitecore-xp-sxa-1.8.1-standalone
sitecore-xp-pse-5.0-standalone
sitecore-xp-standalone
sitecore-xp-base
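Because each image in that chain builds on the one listed below it, building the SXA CM image means building the chain bottom-up. Here's a minimal Python sketch of resolving that order; the parent map is transcribed by hand from the list above and is an assumption, not something read from the repository's Dockerfiles or build script.

```python
# Hypothetical map of image -> the image its Dockerfile builds on,
# transcribed from the SXA CM chain above (None marks the base image).
PARENTS = {
    "sitecore-xp-sxa-1.8.1-standalone": "sitecore-xp-pse-5.0-standalone",
    "sitecore-xp-pse-5.0-standalone": "sitecore-xp-standalone",
    "sitecore-xp-standalone": "sitecore-xp-base",
    "sitecore-xp-base": None,
}


def build_order(image, parents):
    """Return the images to build, base image first, ending with `image`."""
    chain = []
    seen = set()
    current = image
    while current is not None:
        if current in seen:
            raise ValueError(f"circular dependency at {current}")
        seen.add(current)
        chain.append(current)
        current = parents[current]  # KeyError if an image is missing from the map
    # The walk goes from the requested image down to the base, so reverse it.
    return list(reversed(chain))


if __name__ == "__main__":
    for step, name in enumerate(build_order("sitecore-xp-sxa-1.8.1-standalone", PARENTS), 1):
        print(step, name)
```

A practical consequence of this layering is that a change to a lower image such as sitecore-xp-base requires rebuilding every image above it.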

The Basic tier gets you 10 GB of storage for less than £4 a month (~$5, in the North Europe Azure region). The complete set of images required to run SXA and JSS in XP takes less than 3 GB, so the included space is definitely enough for a start. Even better, you don't have to upgrade to a higher tier if you run out of storage: you pay as you go, so, for example, another 10 GB costs you around £0.80 (~$1) a month, with all extra costs charged per day. A similar basic setup on Docker Hub would cost me $22 a month, so that was a no-brainer.
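To put that pricing into numbers, here's a rough Python estimate. It only uses the approximate figures quoted above (under £4 a month for the Basic tier including 10 GB, and about £0.80 a month per additional 10 GB, i.e. ~£0.08 per GB, charged per day). The 30-day month is an assumption, and actual Azure prices vary by region and change over time, so treat all constants as placeholders.

```python
# Approximate figures quoted in this post (North Europe, at the time of writing).
# Treat them as assumptions and check current Azure pricing.
BASIC_TIER_MONTHLY_GBP = 4.0      # upper bound: "less than £4 a month"
INCLUDED_STORAGE_GB = 10.0
EXTRA_GBP_PER_GB_MONTH = 0.08     # ~£0.80 a month per additional 10 GB
DAYS_IN_MONTH = 30                # assumption used for per-day proration


def extra_storage_cost(storage_gb, days=DAYS_IN_MONTH):
    """Cost of storage above the included quota, prorated by the number of days used."""
    extra_gb = max(0.0, storage_gb - INCLUDED_STORAGE_GB)
    return extra_gb * EXTRA_GBP_PER_GB_MONTH * days / DAYS_IN_MONTH


def monthly_cost(storage_gb):
    """Upper-bound monthly cost of a Basic registry holding storage_gb for the whole month."""
    return BASIC_TIER_MONTHLY_GBP + extra_storage_cost(storage_gb)


if __name__ == "__main__":
    print(f"3 GB of images:  ~£{monthly_cost(3):.2f}")
    print(f"20 GB of images: ~£{monthly_cost(20):.2f}")
    print(f"20 GB for 15 days: ~£{extra_storage_cost(20, days=15):.2f} extra")
```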

3. Volume configuration
Code: To deploy code to a Docker container, you need to define a volume with a bind mount. A bind mount directly mirrors the state of your local folder inside the container. That means if you were hoping to bind your local Sitecore deployment folder straight onto the Sitecore instance folder in the container to blend your code in, that won't work: the Sitecore instance files in the container would effectively be replaced by the contents of your local folder. The solution is to bind a separate folder, so your Docker container can access files from the host, and split your code deployment into 2 steps:

  • Deploy your code to a local deployment folder bound to a folder in the Docker container
  • Copy that code into the Sitecore instance inside your Docker container

To set up this binding, add the following to your container's volumes section in docker-compose.yml:

volumes:
  - type: bind
    source: C:/Dev/Sitecore91/docker
    target: C:/deployment

"C:/Dev/Sitecore91/docker" is your local deployment folder, where you deploy your solution and serialization files ('C:/Dev/Sitecore91' is my solution folder, so I just added a 'docker' folder to it as a sibling of the 'src' folder).
"C:/deployment" is the folder in your Docker container from which the locally deployed files are accessible.

Serialization: For Unicorn serialization to work, your CM instance needs access to the serialization files. To achieve that, you again need a bound folder. To avoid creating additional bindings, let's reuse the one above, making that folder the root for both code and serialization files. Importantly, Unicorn locates the serialization files based on the folder structure defined in your Unicorn configuration files. That means if you stick to the outlined convention, you need to copy the serialization files preserving the folder structure, exactly as they are serialized in your solution folder.
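The path mapping this requires can be sketched in Python. This is a hedged illustration rather than part of the original setup: the two root folders mirror this post, the `Feature\Navigation` project path is made up, and the function name is hypothetical. Each `serialization` folder under the solution's `src` folder keeps its path relative to `src`, so Unicorn finds the files where its configuration expects them.

```python
from pathlib import PureWindowsPath

# Folders used in this post; adjust them to your own solution layout.
SOURCE_ROOT = PureWindowsPath(r"C:\Dev\Sitecore91\src")
DEPLOY_ROOT = PureWindowsPath(r"C:\Dev\Sitecore91\docker\serialization")


def deployment_target(serialization_folder, source_root=SOURCE_ROOT, deploy_root=DEPLOY_ROOT):
    """Return where a solution serialization folder should be copied in the deployment folder.

    The path relative to source_root is kept intact, so the layout Unicorn expects
    survives the copy. Raises ValueError if the folder is outside source_root.
    """
    relative = PureWindowsPath(serialization_folder).relative_to(source_root)
    return deploy_root / relative


if __name__ == "__main__":
    # Hypothetical project path, for illustration only
    print(deployment_target(r"C:\Dev\Sitecore91\src\Feature\Navigation\serialization"))
    # prints C:\Dev\Sitecore91\docker\serialization\Feature\Navigation\serialization
```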

Tip: Your changes to the Sitecore databases are preserved even when the containers are recreated. Why's that? The SQL container in the sitecore-images repo is pre-configured to use volumes too. If you take a closer look at sitecore-compose.yml, the SQL container definition has a volume defined:

volumes:
  - .\data\sql:C:\Data

which contains differential backups of all the databases used in your Sitecore Docker deployment. Need clean, OOTB databases? Just delete those backups before the next run.
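If you reset the databases often, the deletion can be scripted. Below is a minimal Python sketch (the function name is hypothetical) that assumes the containers are stopped first and that everything under the bound `data\sql` folder is a backup written by the SQL container; it defaults to a dry run that only lists the files it would delete, and keeps the folders themselves.

```python
from pathlib import Path


def reset_sql_data(data_dir, dry_run=True):
    """Return (and, unless dry_run, delete) every file under the SQL data folder.

    Assumes every file there is a database backup written by the SQL container
    and that the containers are stopped. Sub-folders are left in place.
    """
    data_dir = Path(data_dir)
    if not data_dir.is_dir():
        return []
    files = sorted(p for p in data_dir.rglob("*") if p.is_file())
    if not dry_run:
        for path in files:
            path.unlink()
    return files
```

For example, `reset_sql_data(Path("data") / "sql")` run from the folder containing the compose file lists the backups, and passing `dry_run=False` removes them.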

4. Solution configuration
Code: Create a publish profile that deploys files to the local deployment folder bound to the container. Make sure the target is a separate, empty folder, so you can fully purge it before each deployment and ensure it contains only the current deployment (no leftovers). I used "C:\Dev\Sitecore91\docker\code".

Serialization: Unicorn depends on the sourceFolder variable defined in your config files. Make sure it points to a valid path inside your Docker container, so in our case:

<sc.variable name="sourceFolder" value="C:\inetpub\sc\serialization" />

"C:\inetpub\sc" is the default Sitecore instance folder defined in the Dockerfiles for the base images in the sitecore-images repo.

5. Deployment scripts
Here's a basic script that performs the code deployment. You can run it manually each time you deploy, or automate the process further by triggering the script as a post-deployment task or running it right after spinning up your Docker containers.

# Clear the deployment folder
Remove-Item -Path C:\Dev\Sitecore91\docker\code\* -Recurse

# Build and deploy solution code into the deployment folder (run from the solution folder)
# "/t:Clean,Build" is quoted so that PowerShell doesn't split the argument at the comma
& "C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\MSBuild\Current\Bin\MSBuild.exe" "/t:Clean,Build" /p:DeployOnBuild=true /p:PublishProfile=Docker

# Copy solution code within the containers
docker exec ltsc2019_cm_1 robocopy \deployment\code \inetpub\sc /s
docker exec ltsc2019_cd_1 robocopy \deployment\code \inetpub\sc /s

  • The local deployment folder is "C:\Dev\Sitecore91\docker". It contains 2 folders, "code" and "serialization", so code and serialization files are deployed separately.
  • The publish profile for each project is named "Docker".
  • The target folder of the "Docker" publish profile is the "code" folder inside your local deployment folder.
  • I'm using the absolute path to MsBuild.exe in my VS2019 Build Tools installation, but you could store the MsBuild.exe path in an environment variable.
  • robocopy is used because Copy-Item does not support long paths (over 260 characters), which are common with Unicorn serialization.
  • Code is deployed to both the CM and CD instances, obviously.
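Since the 260-character limit is what rules out Copy-Item, it can help to know which serialized files actually exceed it once they land in the container. Here's a hedged Python sketch (not part of the original scripts; the function name is hypothetical) that computes the full path each file would have under `C:\inetpub\sc\serialization`, the Unicorn `sourceFolder` used in this post. It relies on the classic Windows MAX_PATH of 260, which includes the terminating null character, so usable paths are at most 259 characters.

```python
from pathlib import PureWindowsPath

MAX_PATH = 260  # classic Windows limit, including the terminating null character
CONTAINER_ROOT = PureWindowsPath(r"C:\inetpub\sc\serialization")


def too_long_in_container(relative_paths, container_root=CONTAINER_ROOT, limit=MAX_PATH):
    """Return the relative file paths whose full path in the container would hit the limit."""
    offenders = []
    for relative in relative_paths:
        full_path = str(container_root / relative)
        # Paths of limit - 1 characters still fit; limit or more do not.
        if len(full_path) >= limit:
            offenders.append(relative)
    return offenders
```

You could feed it the `.yml` files found under "C:\Dev\Sitecore91\docker\serialization", made relative to that folder, to spot items whose names or nesting might be worth shortening.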


This script copies the Unicorn serialization files needed to sync items, either into a fresh Sitecore container or after pulling the latest code from the repository. It can be extended with Unicorn Remote, which would automatically sync the items in your Docker container.

# Clear the deployment folder
Remove-Item -Path C:\Dev\Sitecore91\docker\serialization\* -Recurse

$sourceFolder = "C:\Dev\Sitecore91\src"
$destinationFolder = "C:\Dev\Sitecore91\docker\serialization"

# Copy serialization files preserving the folder structure into deployment folder
$folders = Get-ChildItem $sourceFolder -Recurse -Directory | Where-Object { $_.Name -eq "serialization" } | Select-Object -ExpandProperty FullName
foreach ($folder in $folders) {
    # Path relative to the source folder; Substring avoids the case-sensitive String.Replace
    $relativePath = $folder.Substring($sourceFolder.Length)
    robocopy $folder (Join-Path $destinationFolder $relativePath) /s
}

# Deploy serialization files in the container
docker exec ltsc2019_cm_1 robocopy \deployment\serialization \inetpub\sc\serialization /s

The scripts above do the job, but you're encouraged to extend and tune them. You'll run them dozens of times a day while developing Sitecore with Docker, so make the experience as seamless and convenient as possible.
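One thing worth handling when you extend the scripts: robocopy doesn't follow the usual "zero means success" convention. Its exit code is a set of bit flags (1 = files copied, 2 = extra files, 4 = mismatches, 8 = copy failures, 16 = serious error), so values 0-7 are successful runs, and `docker exec` passes that code through. Below is a Python sketch of interpreting the codes; the helper names are hypothetical, and only the flag meanings come from Microsoft's robocopy documentation.

```python
import subprocess

# Robocopy exit codes are bit flags (see Microsoft's robocopy documentation).
ROBOCOPY_FLAGS = {
    1: "files copied",
    2: "extra files or directories in the destination",
    4: "mismatched files or directories",
    8: "some files or directories could not be copied",
    16: "serious error, nothing copied",
}


def describe_robocopy_exit(code):
    """List the conditions encoded in a robocopy exit code."""
    if code == 0:
        return ["nothing to copy"]
    return [text for bit, text in ROBOCOPY_FLAGS.items() if code & bit]


def robocopy_succeeded(code):
    """Codes 0-7 are successful runs; 8 and above mean at least one failure."""
    return 0 <= code < 8


def copy_into_container(container, source, target):
    """Run robocopy inside a container via docker exec, failing only on real errors."""
    result = subprocess.run(["docker", "exec", container, "robocopy", source, target, "/s"])
    if not robocopy_succeeded(result.returncode):
        raise RuntimeError(f"robocopy failed in {container}: {describe_robocopy_exit(result.returncode)}")
```

For example, `copy_into_container("ltsc2019_cm_1", r"\deployment\code", r"\inetpub\sc")` mirrors the first `docker exec` line of the code deployment script, but treats exit code 1 as the success it is.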

6. Accessing the databases
Sitecore databases are hosted in a separate container, by default ltsc2019_sql_1, exposed at localhost:44010. How do you access the databases from outside the container? Let's try SSMS:


  • Server name: 'localhost,44010' to reach the SQL Server exposed at localhost:44010 (SQL Server separates the host and port with a comma, not a colon)
  • Login and password: defined in the Dockerfile for sitecore-xp-sqldev
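If you'd rather query the databases from a script than from SSMS, the same server value goes into a connection string. Here's a hedged Python sketch that only builds an ODBC connection string. The driver name "ODBC Driver 17 for SQL Server" is an assumption about what's installed on your machine, "master" is simply SQL Server's system database, and the credentials are whatever the sitecore-xp-sqldev Dockerfile defines.

```python
def _odbc_value(value):
    """Brace-quote a connection string value if it contains characters like ';' or braces."""
    if any(ch in value for ch in ";{}") or value != value.strip():
        return "{" + value.replace("}", "}}") + "}"
    return value


def sql_server_address(host="localhost", port=44010):
    """SQL Server clients take 'host,port', the same value you type into SSMS."""
    return f"{host},{port}"


def odbc_connection_string(user, password, database="master", host="localhost",
                           port=44010, driver="ODBC Driver 17 for SQL Server"):
    """Build an ODBC connection string for the SQL container (driver name is an assumption)."""
    parts = [
        ("Driver", "{" + driver + "}"),
        ("Server", sql_server_address(host, port)),
        ("Database", _odbc_value(database)),
        ("UID", _odbc_value(user)),
        ("PWD", _odbc_value(password)),
    ]
    return ";".join(f"{key}={value}" for key, value in parts)
```

With the pyodbc package installed, `pyodbc.connect(odbc_connection_string("<login>", "<password>"))` would open a connection; both pyodbc and the driver are assumptions, not part of this setup.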

Now you can work with your Sitecore instance's databases just like in a regular local dev environment.


Happy Sitecore development with Docker containers.

Cognitive Services + Sitecore: Smart image tagging with AI (part 2)

The Sitecore 9.1 release brought many useful features, including Cortex: the widely advertised integration and support for AI and ML. For now, it covers 2 features: component personalisation suggestions and content tagging. The latter comes with an OOTB integration with the OpenCalais API. This Thomson Reuters service suggests categories based on the meaning of the provided content. Combined with Sitecore, it automatically assigns tags based on the content of an item's text fields.

Unfortunately, it's fairly basic and doesn't cover tagging based on image content. However, with the help of Microsoft Cognitive Services (the Tag Image endpoint, to be precise), we can make it work and get a collection of relevant tags.

Let's get to work. First, we need to understand the OOTB tagging architecture and how it works; a quick look at the Sitecore docs gives us a good idea of what we need. We'll extend the existing content tagging pipeline so that processing a media item not only uses OpenCalais (which must be active) on the item's text fields to produce tags, but also uses MS Cognitive Services to analyze the image and describe it with appropriate tags.

Specifically, we need 2 new providers:

  • Content provider, extracting the relevant information from the item as a string for further analysis
  • Discovery provider, containing the business logic that processes the extracted data and produces tags

Ideally, there would also be an initial validation processor encapsulating all the checks and aborting the pipeline if there are any issues (missing configuration etc.) or if the item isn't a good fit for our custom tagging. In our case, however, it's just a functional extension, so aborting would stop the whole Content Tagging pipeline, not just our additional functionality, and no tags would be assigned at all. Because of that, and to simplify the code, I moved those checks into the Content and Discovery providers. What's more, we don't have to implement all 4 provider types available in the pipeline: our Content and Discovery providers add the extra image tags returned by the API, and the remaining 2 (the Taxonomy and Tagger providers) handle them with the standard processing. The configuration of this extension looks as follows:

<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/" xmlns:role="http://www.sitecore.net/xmlconfig/role/">
  <sitecore>
    <contentTagging>
      <configurations>
        <config name="Default">
          <content>
            <provider name="ComputerVisionContentProvider"/>
          </content>
          <discovery>
            <provider name="ComputerVisionDiscoveryProvider"/>
          </discovery>
        </config>
      </configurations>
      <providers>
        <content>
          <add name="ComputerVisionContentProvider" type="Sitecore91.Foundation.Tagging.Providers.ComputerVisionContentProvider, Sitecore91.Foundation.Tagging" />
        </content>
        <discovery>
          <add name="ComputerVisionDiscoveryProvider" type="Sitecore91.Foundation.Tagging.Providers.ComputerVisionDiscoveryProvider, Sitecore91.Foundation.Tagging" />
        </discovery>
      </providers>
    </contentTagging>
  </sitecore>
</configuration>

Now, here's the Content provider that extracts the taggable data from the item. The standard process extracts the item's text fields and sends them down the pipeline to be converted into tags. To follow that purpose strictly, we'd have to serialize the image into a byte array here. Let's take a different approach instead: extract just the item ID and pass it on, delegating the image handling to the Discovery provider, where it stays encapsulated. The only check we make is whether it's a media item, which doesn't eliminate unsupported items but significantly reduces the number left for further processing.

using Sitecore.ContentTagging.Core.Models;
using Sitecore.ContentTagging.Core.Providers;
using Sitecore.Data.Items;

namespace Sitecore91.Foundation.Tagging.Providers
{
    public class ComputerVisionContentProvider : IContentProvider<Item>
    {
        public TaggableContent GetContent(Item item)
        {
            var stringContent = new StringContent();
            if (item.Paths.IsMediaItem)
            {
                stringContent.Content = item.ID.ToString();
            }
            return stringContent;
        }
    }
}

To align the business logic with the approach taken in the Content provider, we need to think about how to handle the 'special' case of processing an image item. As a reminder, the providers described here only extend the existing pipeline, so we handle this case exclusively, processing the taggable content only if:

  • all it contains is an item ID (i.e. it was produced by our custom Content provider)
  • the item is a media item (in this use case we're tagging images from the media library only)
  • the media item is an image (as only images will get correctly tagged by the service)

After that, we just read the image bytes and pass them to the service. Keep in mind that there's also another 'Tag' implementation accepting an image URL, which would make the API calls much faster. I used the byte array overload as I found it easier to use in a development environment (a local instance's media URLs typically aren't reachable by the service).

I also used the 'confidence' value returned with each tag to filter out tags with a confidence of 50% or less, as they're potentially irrelevant.

using System.Collections.Generic;
using System.IO;
using System.Linq;
using Microsoft.Extensions.DependencyInjection;
using Sitecore.ContentTagging.Core.Models;
using Sitecore.ContentTagging.Core.Providers;
using Sitecore.Data;
using Sitecore.Data.Items;
using Sitecore.DependencyInjection;
using Sitecore91.Foundation.CognitiveServices.ComputerVision;

namespace Sitecore91.Foundation.Tagging.Providers
{
    public class ComputerVisionDiscoveryProvider : IDiscoveryProvider
    {
        public IEnumerable<TagData> GetTags(IEnumerable<TaggableContent> content)
        {
            var computerVisionService = ServiceLocator.ServiceProvider.GetService<IComputerVisionService>();

            var tagDataList = new List<TagData>();
            // OfType skips content of other types instead of throwing an InvalidCastException
            foreach (var stringContent in content.OfType<StringContent>())
            {
                // Only content produced by ComputerVisionContentProvider is a bare item ID
                if (string.IsNullOrEmpty(stringContent.Content) || !ID.TryParse(stringContent.Content, out var itemId))
                    continue;

                var item = Sitecore.Configuration.Factory.GetDatabase("master").GetItem(itemId);
                if (item == null || !item.Paths.IsMediaItem)
                    continue;

                var mediaItem = new MediaItem(item);
                if (mediaItem.MimeType == null || !mediaItem.MimeType.StartsWith("image/"))
                    continue;

                // Read the stored media blob as-is instead of decoding and re-encoding it with System.Drawing
                byte[] image;
                using (var mediaStream = mediaItem.GetMediaStream())
                {
                    if (mediaStream == null)
                        continue;
                    using (var memoryStream = new MemoryStream())
                    {
                        mediaStream.CopyTo(memoryStream);
                        image = memoryStream.ToArray();
                    }
                }

                var result = computerVisionService.Analyze(image, "tags", "", "en");
                if (result?.tags == null)
                    continue;

                // Keep only tags the service is more than 50% confident about
                var tags = result.tags.Where(x => x.confidence > 0.5).Select(x => x.name);
                tagDataList.AddRange(tags.Select(tag => new TagData { TagName = tag }));
            }
            return tagDataList;
        }

        public bool IsConfigured()
        {
            return true;
        }
    }
}
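The `Analyze` call above goes through `IComputerVisionService` from the previous post, whose implementation isn't shown here. For reference, here's a hedged Python sketch of the kind of REST request such a service might send. It assumes the Computer Vision v2.0 `analyze` endpoint with `visualFeatures=Tags`; the region in the base URL is a placeholder for your own resource, the function name is hypothetical, and the request is only built, not sent. It also shows both upload options: raw image bytes as `application/octet-stream`, or a JSON body carrying an image URL.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the base URL of your own Computer Vision resource (region-specific placeholder).
ENDPOINT = "https://westeurope.api.cognitive.microsoft.com"


def build_analyze_request(subscription_key, image_bytes=None, image_url=None,
                          endpoint=ENDPOINT, language="en"):
    """Build (but don't send) a Computer Vision 'analyze' request asking for tags."""
    if (image_bytes is None) == (image_url is None):
        raise ValueError("pass exactly one of image_bytes or image_url")
    query = urllib.parse.urlencode({"visualFeatures": "Tags", "language": language})
    url = f"{endpoint}/vision/v2.0/analyze?{query}"
    headers = {"Ocp-Apim-Subscription-Key": subscription_key}
    if image_bytes is not None:
        # Upload the image itself, like the Discovery provider above does
        headers["Content-Type"] = "application/octet-stream"
        body = image_bytes
    else:
        # Let the service download a publicly reachable image instead
        headers["Content-Type"] = "application/json"
        body = json.dumps({"url": image_url}).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")
```

Passing the result to `urllib.request.urlopen` would send it; the URL variant avoids uploading the image, but only works for images the service can reach.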

From an implementation point of view, that's it: 2 providers extending the pipeline. To test the solution, select a media item in the Content Editor, open the Home tab in the ribbon and click Tag Item in the Content Tagging section. After several seconds of processing our sample image, refresh the 'Tagging' section to see the assigned tags.

That matches the collection generated in my previous post, where we called the service directly with this exact image. One tag is missing compared with the previous tag set, which is the result of filtering on the 'confidence' field.
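To make that filtering step concrete, here's a hedged Python sketch of the same rule applied to the JSON returned by the Analyze endpoint, whose `tags` array holds objects with `name` and `confidence`. The sample values are invented for illustration and are not the tags from my test image; the threshold mirrors the `> 0.5` check in the Discovery provider, so a tag at exactly 0.5 is dropped.

```python
import json

# Invented sample response in the shape returned by the Analyze endpoint (tags only).
SAMPLE_RESPONSE = """
{
  "tags": [
    {"name": "outdoor", "confidence": 0.99},
    {"name": "grass", "confidence": 0.93},
    {"name": "tree", "confidence": 0.5},
    {"name": "field", "confidence": 0.31}
  ]
}
"""


def confident_tags(response_json, threshold=0.5):
    """Return tag names whose confidence is strictly above the threshold."""
    tags = json.loads(response_json).get("tags", [])
    return [tag["name"] for tag in tags if tag.get("confidence", 0.0) > threshold]


if __name__ == "__main__":
    print(confident_tags(SAMPLE_RESPONSE))  # prints ['outdoor', 'grass']
```

Raising or lowering the threshold is a trade-off between fewer irrelevant tags and losing borderline but correct ones, like the tag missing from the set above.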