Visual 3D Mapping of the Seafloor

Traditionally, acoustic techniques have been used for mapping the ocean floor, e.g. using echo sounders. Our goal is to complement the acoustic information by establishing visual mapping techniques (and ultimately to combine the best of both worlds). Visual maps are directly understandable by human observers, and they provide very high-resolution 3D models that allow measuring distances, surface areas, volumes, etc. While, e.g., multibeam echo sounders require external sensors for localization, the motion of a camera can be derived from the video sequence itself, which is called simultaneous localization and mapping (SLAM) or "structure and motion". We aim to map large deep-sea environments with these methods in order to document their current state and to detect changes.

Black smoker fields are one important and dynamic environment. Here, extremely hot water escapes from deep beneath the seafloor, and the minerals it carries build up large structures, e.g. towers 20 m high. These black smokers are of very high biological and geological interest and can also contain large amounts of resources. Understanding how they grow and how the habitats around them behave is a central research question that can be tackled using visual methods and also visual-acoustic sensor fusion. During the research cruise Falkor-160320, which brought us close to Tonga, we visually and acoustically scanned an old crater of 500 m diameter that contains an entire black smoker field. The videos for this cruise can be seen here:

From this cruise, and also from others, we have tremendous amounts of video and photo material. Unfortunately, deep-sea navigation data from external sensors is very inaccurate, so we have to use the visual data to refine the navigation. The general principle is that corresponding seafloor points are identified in several consecutive images, and again when the robot returns to the same place. These correspondences provide geometric constraints on the camera motion, and thus on the robot motion. Once the motion is recovered, dense depth estimation techniques can be used to estimate the distance from the camera to each pixel in each image, to fuse these estimates, and finally to create a 3D model of the environment.
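As a rough illustration of this pipeline, here is a minimal sketch that matches features between two consecutive frames, estimates the relative camera motion from the correspondences, and triangulates sparse seafloor points using OpenCV. The file names, the intrinsics K, and the choice of ORB features are placeholder assumptions, and refraction at the underwater camera housing, which matters in practice, is ignored.

```python
# Minimal sketch (assumed names and values): match features, estimate the
# relative camera motion, and triangulate sparse seafloor points with OpenCV.
import cv2
import numpy as np

# Placeholder intrinsics of a calibrated pinhole camera; refraction at the
# underwater housing is ignored in this sketch.
K = np.array([[800.0, 0.0, 640.0],
              [0.0, 800.0, 360.0],
              [0.0, 0.0, 1.0]])

img0 = cv2.imread("frame_0.png", cv2.IMREAD_GRAYSCALE)  # hypothetical file names
img1 = cv2.imread("frame_1.png", cv2.IMREAD_GRAYSCALE)

# 1. Find corresponding seafloor points in the two images.
orb = cv2.ORB_create(nfeatures=4000)
kp0, des0 = orb.detectAndCompute(img0, None)
kp1, des1 = orb.detectAndCompute(img1, None)
matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des0, des1)
pts0 = np.float32([kp0[m.queryIdx].pt for m in matches])
pts1 = np.float32([kp1[m.trainIdx].pt for m in matches])

# 2. The correspondences constrain the camera motion (essential matrix).
E, inliers = cv2.findEssentialMat(pts0, pts1, K, method=cv2.RANSAC, threshold=1.0)
_, R, t, _ = cv2.recoverPose(E, pts0, pts1, K, mask=inliers)

# 3. With the motion known, triangulate sparse 3D points; dense per-pixel
#    depth estimation would later refine and densify this geometry.
P0 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P1 = K @ np.hstack([R, t])
good = inliers.ravel() == 1
X_h = cv2.triangulatePoints(P0, P1, pts0[good].T, pts1[good].T)
X = (X_h[:3] / X_h[3]).T  # Euclidean 3D points, up to scale
print("Recovered", len(X), "seafloor points from", good.sum(), "inlier matches")
```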

We are currently looking for students or HiWis to work on several sub-problems:

robust feature matching and loop detection

In order to reduce drift, it is advisable to return frequently to places seen earlier during mapping and to cross the old track. When such a revisit is recognized in the visual data, extra constraints become available that stabilize the trajectory estimation against drift. For reasons of computational complexity, recognizing places seen earlier must be achieved through efficient indexing rather than through exhaustive comparison to old data. Earlier approaches developed on land quantize the image content into "visual words" and perform place recognition on these.
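The following toy sketch illustrates the visual-words idea: local descriptors are quantized against a small vocabulary, each image is reduced to a normalized word histogram, and candidate revisits are found by cheap histogram comparison rather than exhaustive descriptor matching. The vocabulary size and the similarity threshold are illustrative choices, not values from our pipeline.

```python
# Toy "visual words" place recognition: quantize local descriptors against a
# small vocabulary and compare images via normalized word histograms.
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def build_vocabulary(descriptor_sets, n_words=256):
    """Cluster all training descriptors into a visual vocabulary."""
    all_desc = np.vstack(descriptor_sets).astype(np.float32)
    return MiniBatchKMeans(n_clusters=n_words, n_init=3).fit(all_desc)

def word_histogram(descriptors, vocab):
    """Quantize one image's descriptors into a normalized word histogram."""
    words = vocab.predict(descriptors.astype(np.float32))
    hist = np.bincount(words, minlength=vocab.n_clusters).astype(np.float32)
    return hist / (np.linalg.norm(hist) + 1e-9)

def detect_loop_candidates(query_hist, past_hists, min_score=0.8):
    """Return indices of earlier images whose histograms resemble the query."""
    scores = np.array([query_hist @ h for h in past_hists])  # cosine similarity
    return np.nonzero(scores > min_score)[0]
```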

loop closure

Once a potential loop is detected, it has to be verified, and inconsistencies and the accumulated drift of the path have to be corrected. However, loops can consist of many images, and using all the data might be intractable. Instead, efficient techniques are needed that solve an approximation of the original problem and bring the trajectory close to the correct solution.
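The sketch below shows the structure of such a correction step on a toy 2D pose graph: odometry edges accumulate drift, and a single loop-closure edge pulls the whole trajectory back into consistency in a least-squares sense. Real underwater trajectories involve full 6-DoF poses and thousands of images; this example only illustrates the form of the optimization problem, with made-up measurements.

```python
# Toy loop closure on a 2D pose graph: adjust drifted pose estimates so that
# odometry edges and one loop-closure edge become consistent (least squares).
import numpy as np
from scipy.optimize import least_squares

# (i, j, measured displacement from pose i to pose j); values are made up.
edges = [
    (0, 1, np.array([1.0, 0.0])),
    (1, 2, np.array([1.0, 0.0])),
    (2, 3, np.array([0.0, 1.0])),
    (3, 4, np.array([-1.0, 0.0])),
    (4, 5, np.array([-1.0, 0.0])),
    (5, 0, np.array([0.0, -1.0])),  # loop closure: the robot revisits the start
]

def residuals(x):
    poses = x.reshape(-1, 2)
    res = [poses[0]]                          # anchor the first pose at the origin
    for i, j, meas in edges:
        res.append((poses[j] - poses[i]) - meas)
    return np.concatenate(res)

# Drifted initial guess, e.g. from dead reckoning.
x0 = np.array([0.0, 0.0, 1.05, 0.1, 2.1, 0.15, 2.2, 1.2, 1.2, 1.3, 0.3, 1.35])
sol = least_squares(residuals, x0)
print(sol.x.reshape(-1, 2))                   # loop-consistent trajectory
```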

large scale efficient bundle adjustment

When the largest drift and contradictions have been removed, the trajectory is finally refined using bundle adjustment. Underwater, this can mean optimization in the Gauss-Helmert model with implicit constraints and tens or hundreds of thousands of parameters. Constructing the equation systems and solving them with numerical methods efficiently (in terms of memory and/or runtime) is key to handling realistic scenarios. Parallelization and dedicated GPU implementations are desirable. MSc Thesis topic (pdf)
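What makes such problems tractable is their sparsity: each image observation involves only one camera pose and one 3D point, so the Jacobian (and the resulting normal equations) is block sparse. The sketch below builds such a sparsity pattern and hands it to SciPy's sparse trust-region solver on a tiny synthetic problem; it assumes a plain reprojection-error formulation with a small-angle rotation rather than the refractive Gauss-Helmert model, which would change the residual function but not the sparse structure that makes the solve efficient.

```python
# Sparse bundle adjustment sketch: exploit the block-sparse Jacobian structure
# (each observation touches one camera pose and one 3D point) so that large
# systems can be solved efficiently. Toy data; small-angle rotation model.
import numpy as np
from scipy.sparse import lil_matrix
from scipy.optimize import least_squares

def jacobian_sparsity(n_cams, n_pts, cam_idx, pt_idx):
    """Mark which of the 6*n_cams + 3*n_pts parameters affect each residual."""
    A = lil_matrix((2 * len(cam_idx), 6 * n_cams + 3 * n_pts), dtype=int)
    rows = 2 * np.arange(len(cam_idx))
    for k in range(6):                              # camera pose parameters
        A[rows, 6 * cam_idx + k] = 1
        A[rows + 1, 6 * cam_idx + k] = 1
    for k in range(3):                              # 3D point parameters
        A[rows, 6 * n_cams + 3 * pt_idx + k] = 1
        A[rows + 1, 6 * n_cams + 3 * pt_idx + k] = 1
    return A

def residuals(params, n_cams, cam_idx, pt_idx, observed):
    """Reprojection error for a unit-focal-length pinhole camera."""
    cams = params[:6 * n_cams].reshape(-1, 6)       # [rx, ry, rz, tx, ty, tz]
    pts = params[6 * n_cams:].reshape(-1, 3)
    r, t = cams[cam_idx, :3], cams[cam_idx, 3:]
    X = pts[pt_idx]
    Xc = X + np.cross(r, X) + t                     # small-angle rotation, sketch only
    proj = Xc[:, :2] / Xc[:, 2:3]
    return (proj - observed).ravel()

# Tiny synthetic problem: 3 cameras, 50 points, every camera sees every point.
rng = np.random.default_rng(1)
n_cams, n_pts = 3, 50
cam_idx = np.repeat(np.arange(n_cams), n_pts)
pt_idx = np.tile(np.arange(n_pts), n_cams)
pts_true = rng.uniform(-1.0, 1.0, (n_pts, 3))
pts_true[:, 2] += 5.0                               # points in front of the cameras
x_true = np.concatenate([rng.normal(0.0, 0.05, 6 * n_cams), pts_true.ravel()])
observed = residuals(x_true, n_cams, cam_idx, pt_idx, 0.0).reshape(-1, 2)
x0 = x_true + rng.normal(0.0, 0.01, x_true.size)    # perturbed initialization
sol = least_squares(residuals, x0, method="trf",
                    jac_sparsity=jacobian_sparsity(n_cams, n_pts, cam_idx, pt_idx),
                    args=(n_cams, cam_idx, pt_idx, observed))
print("final cost:", sol.cost)
```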

dense modeling

Once the trajectory has been solved, the goal is to compute the geometry of the entire surface. This is complicated by the fact that the light source is moving. Estimating the underwater imaging conditions and the lighting situation from the visual data is another topic. Finally, the data from all images need to be fused into one big model that can be visualized, e.g., in Google Earth or in virtual environments like our ARENA lab.
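As a very reduced sketch of the fusion step, the code below back-projects per-image depth maps into a common world frame using the recovered camera poses and merges the points on a voxel grid. The pose convention, the intrinsics K, and the voxel size are placeholder assumptions, and the moving light source and water attenuation mentioned above are not modeled here.

```python
# Depth-map fusion sketch: back-project each depth map into world coordinates
# and average all points that fall into the same voxel cell.
import numpy as np

def backproject(depth, K, R, t):
    """Turn one H x W depth map (meters) into world-frame 3D points.
    (R, t) is assumed to map camera coordinates to world coordinates."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    rays = np.linalg.inv(K) @ np.stack([u.ravel(), v.ravel(), np.ones(H * W)])
    cam_pts = rays * depth.ravel()              # points in camera coordinates
    return (R @ cam_pts + t.reshape(3, 1)).T

def fuse(point_sets, voxel=0.05):
    """Merge several point sets by averaging points within the same voxel."""
    pts = np.vstack(point_sets)
    keys = np.floor(pts / voxel).astype(np.int64)
    _, inv = np.unique(keys, axis=0, return_inverse=True)
    inv = inv.ravel()
    counts = np.bincount(inv).astype(float)
    fused = np.column_stack([np.bincount(inv, weights=pts[:, d]) / counts
                             for d in range(3)])
    return fused   # one merged point cloud, ready for meshing and visualization
```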

If you are interested in working on real-world problems that help us understand our planet, and if you have a background in computer vision, geometry, or numerics, please contact Dr. Kevin Köser or Dr. Anne Jordt. Besides the scenario presented above, we are also working on visual real-time navigation and many other visual and acoustic sensing challenges.