Continuous Integration for your Machine/Deep Learning models with Moven

As we previously discussed, Moven is the small piece of technology we developed in the SSIX Project to help us deal with distributing the large models the project produces. It has helped us a lot during development and deployment. Today, however, we are going to focus on a particular phase of the development process: testing and continuous integration.

Continuous Integration (CI for short) is a development practice in which the code is systematically checked for integration problems: each time new code is pushed, it is verified by an automated build, allowing teams to detect problems early.
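At its core, an automated build is a list of checks that must all succeed. The following is a minimal Python sketch of that contract, not how any particular CI server is implemented: each command runs in order, and the build is marked as broken at the first non-zero exit status. The function name `run_checks` and the example commands are illustrative only.

```python
import subprocess
import sys
from typing import Sequence


def run_checks(commands: Sequence[Sequence[str]]) -> bool:
    """Run each check in order and stop at the first failure.

    Every command must exit with status 0; otherwise the remaining
    checks are skipped and the build is reported as broken.
    """
    for command in commands:
        result = subprocess.run(command)
        if result.returncode != 0:
            print(f"FAILED ({result.returncode}): {' '.join(command)}")
            return False
        print(f"ok: {' '.join(command)}")
    return True


if __name__ == "__main__":
    # Hypothetical checks; a real pipeline would install dependencies,
    # fetch the models and run the test suite instead.
    checks = [
        [sys.executable, "-c", "print('building')"],
        [sys.executable, "-c", "import sys; sys.exit(0)"],
    ]
    print("build passed" if run_checks(checks) else "build failed")
```

Stopping at the first failure keeps feedback fast: there is little point in running an expensive test suite if, for example, the dependencies could not even be installed.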

Over the last few weeks in the SSIX Project we have started experimenting with Moven for Continuous Integration of our Machine/Deep Learning models. If you use a CI system that supports Docker for defining the build environment (e.g., Bitbucket Pipelines), you can use the Moven Docker image to bring Moven into your CI chain.

For instance, this could be the bitbucket-pipelines.yml definition for a Keras-based classifier on Bitbucket Pipelines:

image: redlinkgmbh/moven

pipelines:
  default:
    - step:
        script:
          # fetch the models declared in models.txt
          - moven models.txt
          - pip install -r requirements.txt
          - python -m nltk.downloader stopwords sentiwordnet punkt wordnet
          # configure Keras to use the Theano backend
          - mkdir -p ~/.keras && echo '{ "backend":"theano" }' > ~/.keras/keras.json
          - nosetests

The same setup could probably be ported to any other CI tool (Travis, Jenkins, etc.), provided the tool lets you use Docker to customize the build image and, of course, can satisfy the memory requirements of your models. Like the rest of Moven, the Dockerfile is open source, so you are free to fork it and build a custom image for your specific needs.
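On a CI tool that only offers a plain shell step, one rough way to reproduce the pipeline above is to run the same script inside the Moven image yourself. The sketch below only assembles the docker run command; it assumes the image provides a POSIX sh and mounts the checkout at a hypothetical /src path. The helper name docker_build_command is illustrative.

```python
import os
import shlex
from typing import List, Sequence


def docker_build_command(image: str, steps: Sequence[str], workdir: str) -> List[str]:
    """Build a `docker run` invocation that executes the CI steps in `image`.

    The steps are joined with `&&`, so the container exits with a non-zero
    status as soon as one of them fails, like a pipeline script.
    """
    return [
        "docker", "run", "--rm",
        "--entrypoint", "sh",     # assumption: the image ships a POSIX shell
        "-v", f"{workdir}:/src",  # mount the checkout into the container
        "-w", "/src",             # run the steps from the mounted checkout
        image,
        "-c", " && ".join(steps),
    ]


if __name__ == "__main__":
    steps = ["moven models.txt", "pip install -r requirements.txt", "nosetests"]
    cmd = docker_build_command("redlinkgmbh/moven", steps, os.getcwd())
    # Print a copy-pasteable command for the CI tool's shell step.
    print(" ".join(shlex.quote(part) for part in cmd))
```

Overriding the entrypoint avoids depending on whatever default command the image defines; memory limits for large models can then be set with Docker's own options or the CI tool's settings.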

The next step would be to use this approach for Continuous Training based on user feedback. We are not there yet, but it is definitely a path we want to explore in the near future. There is a lot of interesting research in this field, such as Google's recent work on Federated Learning.

This blog post was written by SSIX partner Sergio Fernández, Software Engineer at Redlink GmbH, who leads some of the Big Data developments in the SSIX Project related to analysis processing. For the latest updates, like us on Facebook, follow us on Twitter and join us on LinkedIn.