Foreground-background separation

Rationale

Separating the foreground from the background as part of preprocessing makes the downstream analysis easier and more robust. In particular, the machine learning procedure used to detect animal position and orientation won’t get confused by details in the background (shadows, reflections, poop).

Constraints and considerations

The algorithm must be able to deal with several conditions arising in real-life data. These conditions are described in the subsections below.

Short-time noise

Sometimes the camera is knocked out of place, or the illumination suddenly changes. Either event may produce a short burst of useless frames. These frames should be detected and discarded, and certainly never used as background.
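One way to flag such bursts is to compare each frame against the last known-good frame and discard frames whose mean absolute difference jumps above a threshold. A minimal sketch (the function name `flag_noise_bursts` and the threshold value are hypothetical, assuming 8-bit grayscale frames):

```python
import numpy as np

def flag_noise_bursts(frames, threshold=30.0):
    """Flag frames whose mean absolute difference from the last
    accepted frame exceeds `threshold` (hypothetical value for
    grayscale 0-255 input). Returns one bool per frame;
    True = likely useless frame, discard."""
    flags = []
    ref = None
    for frame in frames:
        f = frame.astype(np.float64)
        if ref is None:
            flags.append(False)  # nothing to compare against yet
            ref = f
            continue
        diff = float(np.mean(np.abs(f - ref)))
        bad = diff > threshold
        flags.append(bad)
        if not bad:
            ref = f  # only good frames update the reference
    return flags
```

Because the reference only advances on good frames, a frame that returns to normal after a burst is not penalized for differing from the bad frames before it. A permanent change (camera knocked and left in the new position) would flag everything from then on, which is exactly the case the chapter-splitting step in the design below is meant to handle.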

Slow change in background

The animal may poop, or eat/move around its food pellets. This results in a change in the background that is persistent on a longer timescale. There is therefore no single background for the entire video, but rather a time series of probable backgrounds.

The spatial extent of this change in the background may be as large as the animal itself – an example could be the reflection of lighting in a puddle of urine.
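A standard way to track such a time series of probable backgrounds is a per-pixel temporal median over a sliding window: a change that persists for more than half the window (a puddle, moved pellets) is absorbed into the background estimate, while brief foreground passes are rejected. A sketch, with a hypothetical `window` parameter:

```python
import numpy as np

def background_series(frames, window=5):
    """Per-pixel temporal median over a sliding window.

    A persistent change (droppings, moved food pellets) enters the
    estimate once it lasts longer than half the window; a briefly
    passing animal is rejected by the median.

    frames: list of 2-D arrays, all the same shape.
    Returns one background estimate per frame.
    """
    stack = np.stack(frames).astype(np.float64)
    n = len(frames)
    backgrounds = []
    for i in range(n):
        lo = max(0, i - window // 2)
        hi = min(n, i + window // 2 + 1)
        backgrounds.append(np.median(stack[lo:hi], axis=0))
    return backgrounds
```

Note the caveat from the next subsection: a sleeping animal also persists for longer than any practical window, so a plain median cannot be the whole story — hence the user-confirmation step in the design.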

Multiple animals

There may be several animals in the arena, and they need not behave alike. In particular, one animal may be sleeping while another is active. The sleeping animal mustn’t become “background”.

Therefore even if there is movement in part of the image, we cannot assume that all the foreground moves at once.

A difficulty could be distinguishing an unmoving animal from a large change in background.

Similar-color background

Parts of the background may contain reflections or shadows that are a similar shade as the animal.

Textured background idea

Animals tend to appear smooth in the image. If we use a high-resolution camera and a background with high-frequency features (e.g. a printed fine texture), then the presence of high-frequency components in a wavelet transform of an image patch would indicate background, and their absence would indicate the animal.
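As a cheap stand-in for the wavelet detail coefficients, the idea can be illustrated with a discrete Laplacian: a textured background patch has a large mean high-pass response, a smooth animal patch a small one. A minimal sketch (the function name is hypothetical; a real implementation might use PyWavelets instead):

```python
import numpy as np

def high_freq_energy(patch):
    """Mean absolute response of a discrete Laplacian over the patch
    interior -- a crude proxy for wavelet detail coefficients.
    High values suggest textured background; low values suggest
    the smooth animal."""
    p = patch.astype(np.float64)
    lap = (-4 * p[1:-1, 1:-1]
           + p[:-2, 1:-1] + p[2:, 1:-1]
           + p[1:-1, :-2] + p[1:-1, 2:])
    return float(np.mean(np.abs(lap)))
```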

Note that most videos currently taken by experimenters have a smooth background.

This is probably a bad idea because it requires an unnecessarily high-resolution camera.

Design

There’s basically no way of easily and reliably distinguishing (in preprocessing) between an unmoving/sleeping white rat and a static reflection in the background. We therefore require some user input.

Overall procedure

  1. First, the video is split into chapters that are dissimilar from each other. This is unnecessary if the camera remains nicely static and the background/lighting doesn’t change too much. The subsequent steps are performed on each chapter independently.
  2. A representative sample of dissimilar frames is chosen.
  3. Within these frames, the animals’ locations are detected heuristically.
  4. The user is asked to confirm whether the animals were correctly identified in the representative frames, and to perform the selection manually otherwise. The result is a background mask for each of the representative frames.
  5. If the union of these background masks does not cover the entire frame, go back to step 2 (and pick an additional representative frame).
  6. Use the masks to obtain the best guess for the background for each interval between representative frames.
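Steps 2 and 5 can be sketched as follows: a greedy pass picks mutually dissimilar representative frames (mean absolute pixel difference as the dissimilarity measure), and a union check tells us whether the collected background masks cover the whole frame. All names and the `min_dist` threshold are hypothetical, assuming 8-bit grayscale frames:

```python
import numpy as np

def pick_representatives(frames, min_dist=20.0, max_reps=10):
    """Greedy sketch of step 2: keep a frame only if it differs from
    every representative picked so far by at least `min_dist`
    (hypothetical threshold, grayscale 0-255). Returns frame indices."""
    reps = []
    for i, frame in enumerate(frames):
        f = frame.astype(np.float64)
        if all(np.mean(np.abs(f - frames[j].astype(np.float64))) >= min_dist
               for j in reps):
            reps.append(i)
        if len(reps) >= max_reps:
            break
    return reps

def masks_cover_frame(masks):
    """Step 5 check: does the union of the per-frame background masks
    cover the entire frame? masks: boolean arrays, True = background."""
    union = np.zeros_like(masks[0], dtype=bool)
    for m in masks:
        union |= m
    return bool(union.all())
```

When `masks_cover_frame` returns False, the loop in step 5 would pick an additional representative frame (e.g. one where the uncovered pixels are known to be animal-free) and ask the user again.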