In the name of Allah, most gracious and most merciful,
Computer Vision is a subfield of Artificial Intelligence (AI) concerned with training computers to see, process, and understand the visual world. In other words, it aims to replicate the human brain’s visual intelligence.
This section is mostly a summary of the overview chapter of the textbook Computer Vision: Algorithms and Applications.
2.1 Image Formation
Image formation is the process that creates the images we see. Understanding this process is essential for anyone who wants to pursue computer vision as a scientific discipline.
2.2 Image Processing
This is needed in almost all applications of computer vision. It includes the following topics:
- Linear and non-linear filtering
- The Fourier transform
- Image pyramids and wavelets
- Geometric transformations such as image warping
- Global optimization techniques such as regularization and Markov Random Fields (MRFs)
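To make the first item above concrete, here is a minimal pure-Python sketch of linear filtering: a 3x3 box (mean) filter applied to a grayscale image stored as a list of lists. The image values are invented for the example, and border pixels are simply left unchanged; real libraries offer several border-handling modes.

```python
def box_filter_3x3(img):
    """Smooth a grayscale image with a 3x3 mean filter (borders untouched)."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]  # copy; border pixels stay as-is
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Average the 3x3 neighborhood around (x, y)
            s = sum(img[y + dy][x + dx]
                    for dy in (-1, 0, 1) for dx in (-1, 0, 1))
            out[y][x] = s / 9.0
    return out

image = [
    [0, 0, 0, 0],
    [0, 9, 9, 0],
    [0, 9, 9, 0],
    [0, 0, 0, 0],
]
smoothed = box_filter_3x3(image)
print(smoothed[1][1])  # 4.0 — the bright patch is spread out by the blur
```

The same idea generalizes to any linear filter by replacing the uniform 1/9 weights with an arbitrary kernel (e.g. Gaussian weights for Gaussian blur).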
2.3 Feature Detection and Matching
Feature detection finds distinctive features in images (such as edges, corners, and straight lines), and feature matching establishes correspondences between two images of the same scene or object. These are fundamental techniques required by many computer vision topics, since many current 3D reconstruction and recognition techniques are built on extracting and matching feature points.
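As a hypothetical sketch of the matching step, the snippet below matches descriptors with a nearest-neighbor search plus Lowe's ratio test: a match is accepted only when the best candidate is clearly closer than the second best. The descriptors here are toy 2D vectors, not real SIFT or ORB descriptors.

```python
import math

def match_features(desc_a, desc_b, ratio=0.7):
    """Return (index_in_a, index_in_b) pairs passing the ratio test."""
    matches = []
    for i, da in enumerate(desc_a):
        # Distances from descriptor da to every descriptor in image B
        dists = sorted((math.dist(da, db), j) for j, db in enumerate(desc_b))
        best, second = dists[0], dists[1]
        # Accept only unambiguous matches (best clearly beats second best)
        if best[0] < ratio * second[0]:
            matches.append((i, best[1]))
    return matches

a = [(0.0, 0.0), (5.0, 5.0)]
b = [(0.1, 0.0), (4.9, 5.1), (20.0, 20.0)]
print(match_features(a, b))  # [(0, 0), (1, 1)]
```

In practice the same ratio-test idea is applied to high-dimensional descriptors via approximate nearest-neighbor search, since brute-force comparison does not scale.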
2.4 Segmentation
Region segmentation techniques, including active contour detection and tracking. Segmentation techniques are essential building blocks used widely in various applications, such as performance-driven animation, interactive image editing, and recognition.
2.5 Geometric Alignment and Camera Calibration
It includes basic techniques of feature-based alignment, which is then used as a building block for 3D pose estimation (extrinsic calibration) and camera (intrinsic) calibration. These techniques can be applied to photo alignment for flip-book animations, 3D pose estimation from a hand-held camera, and single-view reconstruction of building models.
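At the heart of intrinsic calibration is the pinhole projection model. The illustrative sketch below projects a 3D point in camera coordinates to pixel coordinates using the intrinsic parameters (focal length f in pixels and principal point cx, cy); all the numbers are invented for the example.

```python
def project(point_3d, f, cx, cy):
    """Project a 3D camera-frame point to pixel coordinates (pinhole model)."""
    X, Y, Z = point_3d
    u = f * X / Z + cx   # perspective division by depth Z
    v = f * Y / Z + cy
    return (u, v)

# A point 2 m in front of the camera and 0.5 m to the right:
print(project((0.5, 0.0, 2.0), f=800.0, cx=320.0, cy=240.0))  # (520.0, 240.0)
```

Calibration is the inverse problem: given many known 3D–2D correspondences, recover f, cx, cy (and lens-distortion terms) that best explain the observed projections.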
2.6 Structure from Motion
The simultaneous recovery of 3D camera motion and 3D scene structure from a collection of 2D tracked features.
2.7 Dense Intensity-based Motion Estimation
Determining motion vectors that describe the transformation from one 2D image to another, usually between adjacent frames in a video sequence. Its applications include automated morphing, frame interpolation (slow motion), and motion-based user interfaces.
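A toy sketch of one classical approach, block matching: for a block in frame 1, exhaustively search a small window in frame 2 for the displacement minimizing the sum of absolute differences (SAD). Real systems use hierarchical search or dense optical flow instead; the frames below are synthetic.

```python
def sad(f1, f2, x, y, dx, dy, size):
    """Sum of absolute differences between a block in f1 and a shifted block in f2."""
    return sum(abs(f1[y + j][x + i] - f2[y + dy + j][x + dx + i])
               for j in range(size) for i in range(size))

def best_motion_vector(f1, f2, x, y, size, search):
    """Exhaustively try all displacements in [-search, search] and keep the best."""
    candidates = [(sad(f1, f2, x, y, dx, dy, size), (dx, dy))
                  for dy in range(-search, search + 1)
                  for dx in range(-search, search + 1)]
    return min(candidates)[1]

# Frame 1 has a bright 2x2 patch at (2, 2); in frame 2 it has moved right by 1 px.
f1 = [[0] * 6 for _ in range(6)]
f2 = [[0] * 6 for _ in range(6)]
for j in (2, 3):
    for i in (2, 3):
        f1[j][i] = 9
        f2[j][i + 1] = 9

print(best_motion_vector(f1, f2, x=2, y=2, size=2, search=1))  # (1, 0)
```

Repeating this for every block yields a motion-vector field, which is exactly what frame interpolation and video codecs consume.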
2.8 Image Stitching
Combining multiple photographic images with overlapping fields of view to produce a segmented panorama or a high-resolution image, where a panorama is any wide-angle view or representation of a physical space, whether a photograph or a drawing. Image stitching is one example of computational photography, described in section 2.9.
2.9 Computational Photography
The process of creating new images from one or more input photographs, often based on the careful modeling and calibration of the image formation process.
Computational photography techniques include:
- Merging multiple exposures to create high dynamic range images
- Increasing image resolution through blur removal and super-resolution
- Image editing and compositing operations
2.10 Stereo Correspondence
It can be thought of as a special case of motion estimation where the camera positions are already known. This additional knowledge enables stereo algorithms to search over a much smaller space of correspondences and, in many cases, to produce dense depth estimates that can be converted into visible surface models.
Applications of stereo matching include:
- Head and gaze tracking
- Depth-based background replacement (Z-keying)
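The geometry that turns stereo correspondences into depth can be sketched in a few lines: once a pixel's disparity d (horizontal shift between the left and right views) is known, depth follows from Z = f * B / d, where f is the focal length in pixels and B the baseline between the cameras. The numeric values below are invented.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Convert a stereo disparity (pixels) to metric depth (meters)."""
    if disparity_px <= 0:
        return float("inf")   # zero disparity: point at infinity
    return focal_px * baseline_m / disparity_px

# 700 px focal length, 12 cm baseline, 42 px disparity -> 2 m away:
print(depth_from_disparity(42.0, 700.0, 0.12))  # 2.0
```

Note the inverse relationship: small disparities correspond to distant points, which is why depth precision degrades quickly with range.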
2.11 3D Reconstruction
It is the process of capturing the shape and appearance of real objects, which can be accomplished by either active or passive methods.
The collection of techniques for going from one or more images to partial or full 3D models is called Image-based Modeling or 3D Photography.
2.12 Recognition
Recognition techniques include:
- Detecting and recognizing faces
- Finding and recognizing particular objects (instance recognition)
- Recognition of broad categories, such as cars, motorcycles, horses, and other animals
3. The Father of Computer Vision
It is commonly accepted that Dr. Larry Roberts is the father of Computer Vision. His PhD thesis at MIT (ca. 1963) discussed the possibility of extracting 3D geometric information from 2D perspective views of blocks. Many researchers in Artificial Intelligence subsequently studied computer vision in the context of this blocks world.
4. Computer Vision Applications
- 3D reconstruction from multiple images
- Audio-visual speech recognition
- Augmented reality
- Augmented reality-assisted surgery
- Computer stereo vision
- Autonomous cars (Self-Driving Cars)
- Mobile robots
- Automatic image annotation
- Remote sensing
- Smart camera
- Optical character recognition
5. Skills needed to work as a Computer Vision Engineer
Since Computer Vision is a subfield of Machine Learning, the required skills are the same as the Machine Learning Engineer skills described in this post, plus the skills listed below. Of course, the weight of each skill varies depending on your field (Machine Learning in general, Computer Vision, or NLP, for instance).
Not all of these skills are required to start a job as a Computer Vision engineer; what is needed depends on the company you are applying to and your exact role in it.
- Computer Vision Knowledge
- Image segmentation
- Object detection
- Tracking moving objects over time
- Optical character recognition
- Face detection and recognition
- Python OpenCV library
Thank you. I hope this post has been beneficial to you. I would appreciate any comments if anyone needs more clarification, has spotted a mistake I should correct, or has enhancements or suggestions to offer. We are humans, and mistakes are expected from us, but we can minimize them by learning from them and by seeking to improve what we do and how we do it.
Allah bless our master Muhammad and his family.