The Future of Machine Vision and How to be ready for that Future

Over the past decade, the promise of machine vision has undeniably taken off. From self-driving cars all the way to facial recognition doorbells, the applications have captured the imagination of the public. To get these solutions right and be where we are today, a massive amount of work was required on the embedded infrastructure. As a developer of products, I find it exciting to see the industry continue to learn and evolve in a scalable way as the demand for machine vision has grown. I would like to share three current trends in this space:

1: Levels of recognition. Often, we get requests to “recognize” an object or people. Recognition has a wide span of meanings. First, there is the deep learning/machine learning level of recognition driven by the real-time needs of self-driving, facial-based identity, and instant awareness of a large number of objects. Figure 1’s upper-right quadrant represents this well, and is an area that many are calling ubiquitous in commercial and industrial applications.

In most cases, however, we have found that lower levels of recognition are good enough for the application. There is feature-level recognition (see Figure 2), which looks for the existence of features in an image, and there is also basic object detection. In both cases, there has been growth in the options available to developers.

Referring to Figure 1 again, these interesting options are represented by the upper-left and lower-right quadrants.

• The upper-left quadrant represents the new options available when powerful edge compute devices are paired with lower-end optics and sensors. An example of this is the optical scanner used for fingerprint identification on smartphones. Accurate results can be obtained with an optical sensor resolution as low as 500 dpi.

• The lower-right quadrant represents the options where more embedded, autonomous compute is paired with higher-end optics and sensors. The earliest Ring doorbell was powered by an i.MX RT from NXP.
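To make the idea of feature-level recognition more concrete, here is a minimal Python sketch that only asks whether a strong edge feature exists anywhere in a small grayscale image. It does not try to identify an object. The function names, the threshold, and the minimum pixel count are illustrative assumptions, not taken from any specific product or library.

```python
def edge_strength(image, row, col):
    """Approximate gradient magnitude at (row, col) using central differences."""
    gx = image[row][col + 1] - image[row][col - 1]
    gy = image[row + 1][col] - image[row - 1][col]
    return abs(gx) + abs(gy)


def has_edge_feature(image, threshold=100, min_pixels=3):
    """Return True if at least `min_pixels` interior pixels show a strong gradient.

    `threshold` and `min_pixels` are illustrative values, not tuned for real sensors.
    """
    strong = 0
    for row in range(1, len(image) - 1):
        for col in range(1, len(image[0]) - 1):
            if edge_strength(image, row, col) >= threshold:
                strong += 1
    return strong >= min_pixels


# Two 5x5 grayscale test images (0 = black, 255 = white).
flat = [[50] * 5 for _ in range(5)]                  # uniform gray, no features
step = [[0, 0, 255, 255, 255] for _ in range(5)]     # vertical dark-to-light edge

print(has_edge_feature(flat))
print(has_edge_feature(step))
```

A check like this is cheap enough to run on a small embedded processor, which is part of why the lower levels of recognition open up more hardware options.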

2: Independence at the Edge. The growth in lower-level recognition options has created an architectural shift in how dependent deep-learning systems are on the Cloud.

There are now application-specific AI accelerators that blur the lines between all three quadrants. Google’s Edge TPU, Intel’s Movidius VPUs, NVIDIA’s Jetson Nano, and others use parallel computing optimized for machine learning to add inference to the system, running locally on the device without a connection to the Cloud.

Not all applications need continuous learning. Low-end hardware that connects to the Cloud only intermittently can still participate in learning by uploading privacy-safe metrics. Those metrics contribute to Big Data, which can be used to discover new patterns and to improve a model that comes back down later in a firmware update.
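As a rough illustration of that intermittent pattern, the Python sketch below accumulates aggregate statistics on the device and builds a small upload payload for the next time a connection is available. The class name, the field names, and the idea that label counts alone are privacy-safe are all assumptions for the sake of the example. What counts as privacy-safe depends on the application.

```python
import json
from collections import Counter


class EdgeMetrics:
    """Accumulates aggregate statistics on-device; raw images are never stored."""

    def __init__(self, model_version):
        self.model_version = model_version
        self.label_counts = Counter()
        self.low_confidence = 0
        self.total = 0

    def record(self, label, confidence, low_threshold=0.5):
        """Record one inference result; only the label and a counter are kept."""
        self.total += 1
        self.label_counts[label] += 1
        if confidence < low_threshold:
            self.low_confidence += 1

    def to_payload(self):
        """Serialize the aggregates for upload when a connection is available."""
        return json.dumps({
            "model_version": self.model_version,
            "total": self.total,
            "label_counts": dict(self.label_counts),
            "low_confidence": self.low_confidence,
        }, sort_keys=True)


metrics = EdgeMetrics(model_version="1.0.0")
metrics.record("person", 0.92)
metrics.record("package", 0.41)   # below the threshold, counted as low confidence
metrics.record("person", 0.88)
payload = metrics.to_payload()
print(payload)
```

The count of low-confidence results is the kind of signal that could tell the Cloud side where the current model is struggling, without the device ever sending an image.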

3: Imaging is Everywhere. The recent need to work from home has rapidly changed the average person’s exposure to video and imaging. This has made it more clear than ever that imaging truly is everywhere. A few proof points are as follows.

• Identity using 1-to-1 Match. Earlier this year, taxpayers looking to get a federal Identity Protection (IP) PIN from the IRS website found themselves transferring the process of biometric identification to their smartphone. The app, run by ID.me, does a 1-to-1 match that compares the photo of a government-issued ID to a selfie. Only after a match occurs are users sent back to the website for the issue of the IP PIN.

• Layered Biometrics and Encrypted Coding. My first encounter with the CLEAR app was at CES 2022, which required a similar matching process to tie vaccination status to a government-issued ID. CLEAR, which is more prominently used for identification at airport security, layers both eye and face detection to create an encrypted code used for identification as well as for safekeeping of biometric data.

• Classic image processing and UX. The entire imaging chain is only as good as what is coming in. In both ID.me and CLEAR, users may notice guide marks and suggestions on the positioning of the government-issued ID. This is an example of how the marriage of user experience and classic image processing (here, setting the region of interest) is used to increase the success of the more complex algorithms running above it. Ultimately, it is about making users more successful in the new experience.
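The two ideas in the list above, setting a region of interest and making a 1-to-1 match, can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not how ID.me or CLEAR actually work: the guide-mark rectangle, the embedding vectors, and the 0.8 threshold are all hypothetical, and a real system would get its embeddings from a trained face model.

```python
import math


def crop_roi(image, top, left, height, width):
    """Return the sub-image inside the guide-mark rectangle (the region of interest)."""
    return [row[left:left + width] for row in image[top:top + height]]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_one_to_one_match(id_embedding, selfie_embedding, threshold=0.8):
    """1-to-1 match: compare exactly two embeddings against an illustrative threshold."""
    return cosine_similarity(id_embedding, selfie_embedding) >= threshold


# A 4x6 dummy frame; only the pixels inside the guide marks are passed on.
frame = [[r * 10 + c for c in range(6)] for r in range(4)]
roi = crop_roi(frame, top=1, left=2, height=2, width=3)

# Hypothetical embeddings standing in for the output of a face model.
id_photo = [0.9, 0.1, 0.4]
selfie_same_person = [0.8, 0.2, 0.5]
selfie_other_person = [0.1, 0.9, -0.3]

print(roi)
print(is_one_to_one_match(id_photo, selfie_same_person))
print(is_one_to_one_match(id_photo, selfie_other_person))
```

Cropping first means the heavier matching step only sees the part of the frame the user was guided to fill, which is the point of pairing the UX with classic image processing.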

In closing, we, as product developers, need to determine the real level of recognition that is needed. What makes machine vision exciting is the ever-expanding range of applications in which it can be used, and the breadth of options available to developers to make the experience successful.
