dsu-info(at)geomar.de
Timm Schoening
Head of DSU
Anne Hennke
Visualization & Stories
Laura Haffert
AI & Digital Twins
Karl Heger
Imaging & Robotics
Judith Fischer
AI, Imaging & Robotics
Sophie Schindler
Image Data Steward
Welcome to the Data Science materials collection, curated by GEOMAR DSU. Here we provide a list of links and descriptions of materials that we find useful in the context of Data Science: Courses, books, use case publications, datasets, etc. We are constantly updating this collection as we come across relevant contributions. Also, let us know your recommendations - we'll be happy to add them here.
"From Data to Knowledge": Check out the inspiring and varied program for the 2-week event: https://events.hifis.net/event/1590/program
The Summer School will take place virtually from 16 – 27 September 2024 and is open to all researchers and staff in the Helmholtz Association.
Free online Python courses
It is difficult to recommend a particular course without knowing your programming background and the application you have in mind. If you already have programming experience, I find it most useful to simply start with a Python cheat sheet: it can quite easily replace a beginner's course, and you can then move on to more advanced courses.
Some example cheat sheets:
PDF cheat sheet (a bit messy but also helpful)
In my experience, it is good to look for a course geared towards data science, because Python is so versatile that some courses cover many topics that are not necessarily useful for natural scientists.
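To illustrate the kind of material a cheat sheet covers, here is a minimal sketch of core Python constructs applied to a small data task. The station names and temperature readings are invented for illustration, and only the standard library is used.

```python
# Cheat-sheet style tour of core Python, using made-up temperature readings.
from statistics import mean

# List of (station, depth in m, temperature in °C) tuples; values are invented.
readings = [
    ("A", 5, 12.5),
    ("A", 10, 11.0),
    ("B", 5, 13.5),
    ("B", 10, 12.0),
]

# List comprehension: keep only the temperatures measured at 5 m or shallower.
shallow = [temp for _, depth, temp in readings if depth <= 5]

# Dictionary: group temperatures by station name.
by_station = {}
for station, _, temp in readings:
    by_station.setdefault(station, []).append(temp)

# Dict comprehension plus a library function: mean temperature per station.
station_means = {name: mean(temps) for name, temps in by_station.items()}


def to_fahrenheit(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


# f-string formatting for readable output.
for name, value in station_means.items():
    print(f"Station {name}: {value:.2f} °C = {to_fahrenheit(value):.2f} °F")
```

If these constructs (tuples, comprehensions, dictionaries, functions, f-strings) already look familiar from another language, a cheat sheet is probably enough to get started.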
A highly recommended self-study course on the subject of "Multivariate Exploratory Data Analysis" is available via OpenClassrooms here.
The Carpentries platform: this is the platform that we also use to teach our course; it is generally well maintained and validated.
This is a Python course geared towards data science that teaches by applying Python to a real-world problem.
is geared towards data science and data and offers beginners courses in Python:
The Python community itself offers a lot of material for learning:
More platforms offering a huge range of courses. Most of them are free, especially if you do not take the exams or require an official certificate:
Georgia Tech Python course: very high quality. The beginner courses are very good but also very slow if you already have coding experience; it helps to speed up the videos.
offers, like edX, a lot of courses, including Python courses for beginners and for other applications:
again a huge offer of courses for free.
and another platform for courses, also has a good reputation
"Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd Edition" by Aurélien Géron. Released September 2019. Publisher: O'Reilly Media, Inc.
"Python Crash Course: A Hands-On, Project-Based Introduction to Programming" by Eric Matthes
"Automate the Boring Stuff with Python: Practical Programming for Total Beginners" by Al Sweigart
"Python for Everybody: Exploring Data in Python 3" by Charles Severance
"Learning Python, 5th Edition" by Mark Lutz
"Python Programming: An Introduction to Computer Science, 3rd Edition" by John Zelle
"Introduction to Python for Science and Engineering" by David J. Pine
"Python Basics: A Practical Introduction to Python 3" by Real Python
"Think Python: How to Think Like a Computer Scientist" by Allen B. Downey
"Python 101: A Crash Course in Python Programming" by Mike Driscoll
"Python Programming for the Absolute Beginner, 3rd Edition" by Michael Dawson
“Habitat Suitability and Distribution Models with Applications in R” by Guisan et al. (2017)
“Spatial Ecology and Conservation Modeling - Applications with R” by Fletcher & Fortin (2018)
Marine Data Portal: Shows available data from your research area from all DAM partners: bathymetry, sediment and observation datasets, CONMAR datasets.
PANGAEA: Marine and environmental datasets published in the PANGAEA World Data Center.
Geoserver: Publication and sharing of geodata
OSIS: All information about expeditions, numerical models and experiments.
ZPL: Search for the rock samples and sediment cores stored at GEOMAR.
WDC Climate: Published datasets in the World Data Center Climate at the German Climate Computing Center (DKRZ).
GEOMAR OPeNDAP Service: Data from peer-reviewed articles with results from numerical models.
DSHIP Underway Data of RVs: The recorded underway data of the German research vessels are transferred ashore and archived in the long term. They can be accessed and exported via interlinked web services at GEOMAR, BSH and AWI.
Google Earth Engine: Find, download and process global satellite data.
USGS Earth Explorer: Source for satellite data; choice from many different satellites; ability to import shape files to export imagery for specific areas.
Boknis Eck Time series data: Monthly samples since 1957 at the time series station Boknis Eck in the western Baltic Sea.
IHO DCDB Bathymetry Data Viewer: Collection of bathymetric data available worldwide, including data from the major international bathymetric data repositories.
Real time Data: Real-time data from scientific platforms installed by GEOMAR research groups.
BIS Biosample Management: Biological samples from GEOMAR expeditions
MDI DE Portal: Platform for marine geodata from Marine Data Infrastructure Germany
OBIS: Marine Biodiversity Database
IMLGS from NOAA: marine and lacustrine geological samples
EarthChem: global collection of seabed geochemical samples
Kaggle datasets: AI-ready datasets for a wide range of applications
Digital Earth Viewer: Visualizes spatial time series datasets in real time. The viewer is able to handle different types of data and facilitates interactive exploration of different datasets in one place. As an in-house product, direct support can be provided.
ARENA 2: Explore your data in an in-house projection dome. It visualizes 2-4D geodata, model runs, large format videos, photos and enables telepresence.
BELUGA: Visualization of data from different platforms; besides the visualization of platform data, an essential part of BELUGA is the underwater network (communication and navigation under water).
ArcGIS Add-on Benthic Terrain Modeller: Tool compilation for the analysis and classification of benthic terrain
GeoPandas: GeoPandas is a popular open source Python library for working with geospatial data that allows users to easily manipulate, analyze, and visualize geographic information within the Python environment.
QGIS: QGIS is a free and open source geographic information systems (GIS) software that allows users to create, edit, visualize and analyze geographic data.
GDAL: GDAL (Geospatial Data Abstraction Library) is an open source software library that provides a set of tools and libraries for working with raster and vector geospatial data formats and enables versatile geospatial data editing and conversion.
R landscape metrics: R landscape metrics are a collection of quantitative measures and statistics used in the R programming language to assess and analyze the spatial patterns and characteristics of landscapes, making them a valuable tool for landscape ecology and land use planning.
OpenCV: Python OpenCV is a powerful open-source computer vision library that allows developers to perform a wide range of image and video processing tasks using the Python programming language.
Colmap: COLMAP (Structure-from-Motion and Multi-View Stereo) is a computer vision software package that specializes in reconstructing 3D scenes from 2D images, making it valuable for tasks like photogrammetry and 3D modeling.
Metashape: Metashape is a professional photogrammetry software package that allows users to create high-quality 3D models and maps from a collection of 2D images.
Pandas: The Python package pandas is a powerful and popular data manipulation and analysis library that provides easy-to-use data structures and tools for working with structured data.
Bokeh: The Python package Bokeh is a data visualization library that provides a simple and interactive way to create web-based visualizations for modern browsers.
Holoviz: The Python Holoviz package is a collection of open-source data visualization and exploration tools that allow users to quickly create interactive visualizations with minimal code.
Panel: The Python Panel package is a library that allows users to easily create interactive web-based dashboards and applications from Python code, supporting a wide range of data sources and visualization tools.
Blender: Blender is a versatile and open source 3D computer graphics toolset that supports modeling, animation, rendering, compositing and much more.
D3.js: Excellent JavaScript library for data visualization (more precisely, DOM manipulation). Comparatively low level with a steep learning curve.
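To give a feel for the pandas entry in the list above, here is a minimal sketch of a typical first step with tabular measurement data: load it into a DataFrame, derive a grouping key and aggregate. The column names and all values are invented for illustration and do not come from any of the datasets listed on this page; the example assumes pandas is installed.

```python
import pandas as pd

# Invented monthly measurements; not real station data.
df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2020-01-15", "2020-02-15", "2021-01-15", "2021-02-15"]
        ),
        "temperature_c": [3.0, 2.0, 4.0, 3.0],
        "salinity": [18.0, 19.0, 17.0, 20.0],
    }
)

# Derive a grouping key from the timestamp column.
df["year"] = df["date"].dt.year

# Mean of each measured variable per year.
yearly = df.groupby("year")[["temperature_c", "salinity"]].mean()
print(yearly)
```

The same pattern (read, derive a key, group, aggregate) carries over to real files by replacing the hand-built DataFrame with a reader such as `pd.read_csv`.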
Machine Learning Playground: Machine Learning Playground is an open-source project with the goal of providing students and interested parties with a guided introduction to the complex world of machine learning.
Hands on ML: A series of Jupyter notebooks that walk through the basics of machine learning and deep learning in Python with Scikit-Learn, Keras, and TensorFlow 2.
R Basics — Everything You Need to Know to Get Started with R: An introductory "Towards Data Science" article on working with R.
Seeing Theory: Seeing Theory is an interactive online resource that provides an intuitive and visual approach to understanding complex probability and statistics concepts.
Distill: Distill is an open access online publishing platform that emphasizes clear, interactive, and visually appealing articles to effectively communicate research findings and concepts across academic disciplines.
Colah: Colah is the blog of a prominent researcher and blogger in the field of artificial intelligence, known for his insightful and accessible writing on deep learning and neural networks.
Scientific color maps: Various citable color maps designed for different scientific visualization applications for download.
Environmental Data Science book: EDS book showcases and supports the publication of data, research and open-source tools using Data Science and AI for characterizing, monitoring and/or modelling a wide diversity of environmental systems.