If you are looking for a project to stretch your machine learning skills, here's an idea to play with. It may already have been done, but my five-minute literature search didn't turn it up.
This is how I would go about it:
1) Start with unlabeled video.
2) Calculate optical flow between consecutive frames.
3) Train a neural network with an autoencoder-like structure to predict pixel-level optical flow from a single frame, using the calculated values as ground truth.
4) Then, on still images, use the trained network to estimate optical flow.
5) Run blob detection on the estimated optical flow to identify objects.
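Step 2 doesn't depend on any particular flow algorithm. As one possible choice, here is a minimal NumPy sketch of the classic Horn-Schunck method, which produces a dense (u, v) field in pixels per frame from two grayscale frames. The defaults `alpha=10.0` and `n_iter=200` are illustrative assumptions that suit 0-255 intensities, not tuned values.

```python
import numpy as np


def local_average(f: np.ndarray) -> np.ndarray:
    """Weighted 8-neighbour average used by Horn-Schunck (centre weight 0)."""
    p = np.pad(f, 1, mode="edge")
    # Edge-adjacent neighbours get weight 1/6, diagonal neighbours 1/12.
    sides = (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]) / 6.0
    corners = (p[:-2, :-2] + p[:-2, 2:] + p[2:, :-2] + p[2:, 2:]) / 12.0
    return sides + corners


def horn_schunck(frame0, frame1, alpha=10.0, n_iter=200):
    """Dense optical flow (u, v) from frame0 to frame1, in pixels per frame."""
    f0 = np.asarray(frame0, dtype=np.float64)
    f1 = np.asarray(frame1, dtype=np.float64)
    # Spatial gradients of the mean frame; np.gradient returns d/drow (y) first.
    iy, ix = np.gradient((f0 + f1) / 2.0)
    it = f1 - f0  # temporal derivative
    u = np.zeros_like(f0)
    v = np.zeros_like(f0)
    denom = alpha**2 + ix**2 + iy**2
    for _ in range(n_iter):
        u_avg, v_avg = local_average(u), local_average(v)
        # Residual of the brightness-constancy constraint Ix*u + Iy*v + It = 0,
        # balanced against smoothness via alpha.
        resid = (ix * u_avg + iy * v_avg + it) / denom
        u = u_avg - ix * resid
        v = v_avg - iy * resid
    return u, v
```

For a bright blob shifted one pixel to the right between frames, `u` comes out close to 1 and `v` close to 0 around the blob's edges. For real video, an optimized library routine such as OpenCV's `cv2.calcOpticalFlowFarneback` would be the faster choice.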
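Steps 1 to 3 hinge on how the training data is paired. Because step 4 feeds the network a single still image, one reasonable setup (an assumption, since the post doesn't spell it out) uses frame t alone as the input and the flow from frame t to frame t+1 as the target. In the sketch below, `flow_fn` is a hypothetical parameter: any two-frame estimator that returns `(u, v)`, such as OpenCV's `cv2.calcOpticalFlowFarneback` with its two output channels split apart.

```python
from typing import Callable, Sequence, Tuple

import numpy as np

FlowFn = Callable[[np.ndarray, np.ndarray], Tuple[np.ndarray, np.ndarray]]


def make_training_pairs(frames: Sequence[np.ndarray], flow_fn: FlowFn, stride: int = 1):
    """Turn consecutive grayscale frames into (input, target) arrays.

    inputs:  (N, 1, H, W) single frames, the only thing the network sees.
    targets: (N, 2, H, W) flow (u, v) computed from frame t to frame t+1,
             which serves as the "ground truth" the network learns to predict.
    """
    inputs, targets = [], []
    for t in range(0, len(frames) - 1, stride):
        u, v = flow_fn(frames[t], frames[t + 1])
        inputs.append(np.asarray(frames[t], dtype=np.float32)[None])
        targets.append(np.stack([u, v]).astype(np.float32))
    return np.stack(inputs), np.stack(targets)
```

No labels appear anywhere in this process. The targets come entirely from the video itself, which is what makes any footage usable.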
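Step 5 says "blob detection" without naming a detector. The sketch below uses the simplest version: threshold the flow magnitude, then group the remaining pixels into 4-connected components. `threshold` and `min_area` are hypothetical knobs. A Laplacian-of-Gaussian detector, or `scipy.ndimage.label` for the labeling step, are drop-in alternatives.

```python
from collections import deque

import numpy as np


def find_motion_blobs(u, v, threshold, min_area=1):
    """Threshold flow magnitude and group pixels into 4-connected blobs.

    Returns a label image (0 = background) and a list of blob summaries.
    Components smaller than min_area stay labelled but are not summarised.
    """
    mask = np.hypot(u, v) > threshold
    labels = np.zeros(mask.shape, dtype=np.int32)
    height, width = mask.shape
    blobs = []
    next_label = 0
    for r in range(height):
        for c in range(width):
            if not mask[r, c] or labels[r, c]:
                continue
            # Breadth-first flood fill of a new component.
            next_label += 1
            labels[r, c] = next_label
            queue = deque([(r, c)])
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < height and 0 <= nx < width and mask[ny, nx] and not labels[ny, nx]:
                        labels[ny, nx] = next_label
                        queue.append((ny, nx))
            if len(pixels) >= min_area:
                ys, xs = np.array(pixels).T
                blobs.append({
                    "label": next_label,
                    "area": len(pixels),
                    "centroid": (float(ys.mean()), float(xs.mean())),
                    "bbox": (int(ys.min()), int(xs.min()), int(ys.max()), int(xs.max())),
                })
    return labels, blobs
```

Each summary's bounding box and centroid give a rough object location. A larger `min_area` suppresses speckle from noisy flow estimates.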
Pixel-level object segmentation is still not as accurate as we would like, especially in less-than-pristine environments.
This approach may help machines to solve tough clutter and camouflage problems like this and this.
No special treatment or labeling of the data is necessary. Data is bountiful. Any video can be used.