Meta AI’s revolutionary self-supervised computer vision model: DinoV2

5 October 2023


As humans, we are able to instantly identify the things we see around us, how far away they are, and whether they are stationary or not, thanks to years of experience and context. Computer vision is the field of AI that attempts to reproduce these human capabilities in machines (IBM, 2023). It aims to recognize different objects in an image, determine their depth, retrieve similar images, and much more (IBM, 2023). Consequently, computer vision is essential in self-driving vehicles and optical character recognition, and it has many more applications. Think of instantly creating highlight reels, identifying production defects, augmented reality, and basically anything that has to do with visually instrumented devices (IBM, 2023).

This year, Meta AI released DinoV2, the second version of its Dino (DIstillation of knowledge with NO labels) self-supervised computer vision model (Ravi, 2023). It is considered quite revolutionary mainly because it is self-supervised. This means that the model does not require labeled input; instead, it derives its own training signal by predicting an unknown part of the input from a known part. For example, a natural language processing model can predict the rest of an input sentence while knowing only a few words (Shah, 2023). Furthermore, DinoV2 performs considerably better than other self-supervised models, and is even able to compete with supervised models (Ravi, 2023). For the sake of brevity, I will not go into the technical details of DinoV2. In short, it is able to estimate depth in an image, segment and map the objects in it, match parts of different images, and instantly retrieve similar images with high accuracy (Meta, 2023).
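To make the idea of a model "labeling the input itself" a bit more concrete, here is a minimal sketch of how an unlabeled sentence can be turned into training pairs by hiding one word at a time. It only illustrates the general principle from the paragraph above and is not DinoV2's actual training procedure; the function name make_masked_example and the "[MASK]" placeholder are my own assumptions.

```python
def make_masked_example(tokens, mask_index, mask_token="[MASK]"):
    """Turn an unlabeled token list into an (input, target) pair.

    Hypothetical helper: the hidden token becomes the label, so no
    human annotation is needed.
    """
    if not 0 <= mask_index < len(tokens):
        raise IndexError("mask_index out of range")
    masked = list(tokens)            # copy, so the original stays untouched
    target = masked[mask_index]      # the "unknown part" the model must predict
    masked[mask_index] = mask_token  # the "known part" is everything else
    return masked, target


# One unlabeled sentence yields one training pair per word.
sentence = "the cat sat on the mat".split()
pairs = [make_masked_example(sentence, i) for i in range(len(sentence))]

for masked, target in pairs:
    print(" ".join(masked), "->", target)
```

A vision model can apply the same trick to images, for instance by hiding patches of a picture instead of words in a sentence.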

Now, Meta and Mark Zuckerberg have been prominent in the news because of concerns about privacy and the dangers of the Metaverse and the Meta AI branch (Woods, 2022). Therefore, I was surprised to read that all of Meta AI’s efforts and algorithms behind Dino and DinoV2 are open source, and that Meta AI is actively encouraging others to use and improve DinoV2 (Meta, 2023). Moreover, Meta AI has introduced FACET (FAirness in Computer Vision EvaluaTion), which is meant to serve as a benchmark for evaluating computer vision models’ fairness in classifying and detecting people across different demographics (Meta, 2023). For example, FACET can help determine whether models are better at detecting nurses when they are female, or footballers when they are male, etc. (Meta, 2023). Thus, using FACET as a benchmark alongside your computer vision model can help you identify biases in your training data, which you can then reduce, making your model less discriminatory.
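As a rough illustration of what such a fairness check could look like, the sketch below compares how often a model detects a "nurse" for two groups and reports the gap. This is not FACET's API or its official metric; the function names recall_by_group and recall_gap are hypothetical, and the records are made-up numbers purely for the example.

```python
from collections import defaultdict


def recall_by_group(records):
    """Share of true instances the model detected, per demographic group.

    records: iterable of (group, detected) pairs, one per image that
    really contains the class of interest (here: a nurse).
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for group, detected in records:
        totals[group] += 1
        hits[group] += int(detected)
    return {group: hits[group] / totals[group] for group in totals}


def recall_gap(records):
    """Difference between the best and worst served group."""
    recalls = recall_by_group(records)
    return max(recalls.values()) - min(recalls.values())


# Made-up detection results for images that all contain a nurse.
nurse_records = [
    ("female", True), ("female", True), ("female", True), ("female", False),
    ("male", True), ("male", False), ("male", False), ("male", False),
]

print(recall_by_group(nurse_records))
print(recall_gap(nurse_records))
```

A gap close to zero would suggest the model treats both groups similarly for this class, while a large gap is a hint that the training data may be skewed.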

To conclude, Meta and its AI branch are making waves in many areas, but concerns about privacy and other dangers are ubiquitous. It seems to me that, by open-sourcing DinoV2 and introducing FACET, Meta is doing everything in its power to eliminate these concerns, but one could also argue it is all a facade to hide its true intentions. What do you think?

References

IBM. (2023). What is computer vision? https://www.ibm.com/topics/computer-vision#:~:text=Computer%20vision%20is%20a%20field,recommendations%20based%20on%20that%20information.

Meta. (2022). DINOV2 by Meta AI. https://dinov2.metademolab.com/demos?category=depth

Meta. (2023). Evaluating the fairness of computer vision models. https://ai.meta.com/blog/dinov2-facet-computer-vision-fairness-evaluation/

Ravi, A. (2023, May 11). DINOV2: the new frontier in Self-Supervised Learning? Medium. https://betterprogramming.pub/dinov2-the-new-frontier-in-self-supervised-learning-b3a939f6d533

Shah, D. (2023). Self-Supervised Learning and its applications. neptune.ai. https://neptune.ai/blog/self-supervised-learning

Woods, J. (2022). 10 Biggest Concerns About The Metaverse, According To Reddit. ScreenRant. https://screenrant.com/biggest-reddit-concerns-about-metaverse/
