Data Ingestion
Datasets are collected from multiple sources before they are consumed by the platform or the statistical models. Each source’s raw data is stored as-is before it is processed into the required formats.
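Here is a minimal sketch of keeping raw data unmodified before processing, assuming a local file store. The folder layout (one directory per source and monthly snapshot), the checksum sidecar file, and the names raw_path and store_raw are hypothetical. The document does not describe the actual storage backend.

```python
import hashlib
import tempfile
from datetime import date
from pathlib import Path


def raw_path(root: Path, source: str, snapshot: date, filename: str) -> Path:
    """Hypothetical layout: <root>/<source>/<YYYY-MM>/<filename>."""
    return root / source / snapshot.strftime("%Y-%m") / filename


def store_raw(root: Path, source: str, snapshot: date, filename: str, payload: bytes) -> Path:
    """Write a downloaded payload unchanged and record its SHA-256 checksum."""
    target = raw_path(root, source, snapshot, filename)
    if target.exists():
        # Raw files are treated as immutable, so an existing snapshot is never overwritten.
        raise FileExistsError(f"raw file already stored: {target}")
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(payload)
    # A sidecar checksum lets later processing steps detect corrupted raw files.
    checksum = target.with_name(target.name + ".sha256")
    checksum.write_text(hashlib.sha256(payload).hexdigest() + "\n")
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        saved = store_raw(Path(tmp), "imd_rainfall", date(2024, 9, 1), "rain_2024-09.grd", b"raw bytes")
        print(saved.relative_to(tmp).as_posix())  # imd_rainfall/2024-09/rain_2024-09.grd
```

Because raw files are never modified, any processing step can be re-run later against the exact bytes that were originally fetched.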
Data Classifier: The raw datasets are then transformed into the input variables used by the statistical model. These input variables are calculated at the revenue circle level.
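To calculate a variable at the revenue circle level, many pixel values are usually reduced to one value per circle (a zonal statistic). The sketch below assumes each pixel has already been assigned the ID of the revenue circle that contains it, for example by rasterizing circle boundaries onto the same grid. The function name, circle IDs, and no-data value are hypothetical. The reducer should match the variable: a mean suits indices such as NDVI, while counts such as population need a sum.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Optional, Sequence

REDUCERS = {"mean": mean, "sum": sum, "min": min, "max": max}


def aggregate_by_circle(
    pixel_values: Sequence[float],
    pixel_circles: Sequence[Optional[str]],
    how: str = "mean",
    nodata: Optional[float] = None,
) -> Dict[str, float]:
    """Reduce per-pixel values to one value per revenue circle.

    pixel_circles[i] is the ID of the revenue circle containing pixel i, or None
    when the pixel lies outside every circle. Pixels equal to nodata are ignored.
    """
    groups = defaultdict(list)
    for value, circle in zip(pixel_values, pixel_circles):
        if circle is None or (nodata is not None and value == nodata):
            continue  # outside the study area, or no valid measurement
        groups[circle].append(value)
    reducer = REDUCERS[how]
    return {circle: reducer(values) for circle, values in groups.items()}


values = [2.0, 4.0, 9.0, -9999.0, 7.0]
circles = ["RC-1", "RC-1", "RC-2", "RC-2", None]
print(aggregate_by_circle(values, circles, how="mean", nodata=-9999.0))  # {'RC-1': 3.0, 'RC-2': 9.0}
```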
Data Pipeline: The data is processed through a series of steps (the data pipeline) so that it can be used by the different components of the system. For the data pipeline and its orchestration, we use RabbitMQ and Prefect.
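A pipeline is a series of steps connected by queues. The idea can be sketched in-process as follows. This sketch does not use RabbitMQ or Prefect. Python's queue.Queue stands in for a broker queue, and the stages run sequentially. The stage names parse and tag_month are hypothetical.

```python
import queue
from typing import Callable, Dict, Iterable, List

Message = Dict[str, object]
Stage = Callable[[Message], Message]


def run_pipeline(records: Iterable[Message], stages: List[Stage]) -> List[Message]:
    """Push records through the stages in order, with one queue between stages.

    queue.Queue is an in-process stand-in for a message broker queue: each stage
    drains its inbound queue and publishes its results to the next queue.
    """
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    for record in records:
        queues[0].put(record)
    for stage, inbound, outbound in zip(stages, queues, queues[1:]):
        # Single-threaded, so empty() is a reliable "no more messages" check here.
        while not inbound.empty():
            outbound.put(stage(inbound.get()))
    results = []
    while not queues[-1].empty():
        results.append(queues[-1].get())
    return results


# Hypothetical stages for a rainfall reading that arrives as text.
def parse(msg: Message) -> Message:
    return {**msg, "rain_mm": float(msg["rain_mm"])}


def tag_month(msg: Message) -> Message:
    return {**msg, "month": str(msg["date"])[:7]}


print(run_pipeline([{"date": "2024-09-03", "rain_mm": "12.5"}], [parse, tag_month]))
# [{'date': '2024-09-03', 'rain_mm': 12.5, 'month': '2024-09'}]
```

With a real broker, each stage would run as a separate consumer, and messages would be acknowledged only after the stage finishes with them.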
Data Update: Data is kept current by scheduling each pipeline to run at an interval that depends on its data source. Some pipelines update data in near real time.
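Running pipelines at "different intervals based on the data source" can be expressed as a small decision function. It is keyed by the frequency labels used in the source tables below. This is a sketch of the decision logic only. In the actual system, the orchestrator would own the schedules. The "Near-Real-Time" label and its one-hour interval are assumptions.

```python
from datetime import datetime, timedelta
from typing import Optional

# Assumed polling interval for sources that are updated in near real time.
NEAR_REAL_TIME_INTERVAL = timedelta(hours=1)


def is_due(frequency: str, last_run: Optional[datetime], now: datetime) -> bool:
    """Return True if a source's pipeline should run at time `now`.

    `frequency` uses the labels from the source tables ("One-Time", "Monthly",
    "Yearly") plus a hypothetical "Near-Real-Time" label.
    """
    if last_run is None:
        return True  # every source needs an initial load
    if frequency == "One-Time":
        return False
    if frequency == "Monthly":
        return (now.year, now.month) != (last_run.year, last_run.month)
    if frequency == "Yearly":
        return now.year != last_run.year
    if frequency == "Near-Real-Time":
        return now - last_run >= NEAR_REAL_TIME_INTERVAL
    raise ValueError(f"unknown frequency: {frequency!r}")


now = datetime(2024, 10, 1, 6, 0)
print(is_due("Monthly", datetime(2024, 9, 30, 23, 0), now))  # True: a new month has started
print(is_due("One-Time", datetime(2019, 1, 1), now))  # False: already loaded once
```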
Data Sources
For this project, we source variables from the following data sources to understand flood hazard, exposure, vulnerability, damage and losses, and government response.
Each variable is aggregated at the revenue circle level, which is the unit of decision-making for the Assam State Disaster Management Authority (ASDMA).
Hazard Variables
| Data Source | Data Variables | Frequency | Last Updated | Method of data sourcing |
|---|---|---|---|---|
| BHUVAN | Inundation percentage, Inundation intensity | Monthly | 2024-08 | Flood inundation images are accessed through BHUVAN’s WMS server, and the images for each month are processed |
| SENTINEL-2 MSI (via GEE) | NDVI | Monthly | 2024-09 | SENTINEL-2 MSI images are accessed from Google Earth Engine (GEE) and processed for each month |
| India Meteorological Department (IMD) | Rainfall | Monthly | 2024-09 | IMD data is available through a Python package called imdlib. The data is accessed as daily rasters, which are processed for each month |
| Digital Elevation Model (via GEE) | Elevation, Slope | One-Time | 2000 | The Digital Elevation Model (DEM) is sourced from Google Earth Engine (GEE) as a raster. Slope is derived from the DEM using the elevation differences between adjacent cells. The rasters are processed to aggregate values for each revenue circle |
| Global Curve Numbers (GCN) | Surface runoff | One-Time | 2019 | The GCN dataset is available as a raster image in the GEE Community Catalog. This raster is processed to calculate an aggregate value for each revenue circle |
| WRIS | Distance from rivers, Drainage density | One-Time | 2022 | GIS data accessed from WRIS servers is processed to calculate the variables of interest for each revenue circle |
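Several hazard variables start as per-pixel calculations and are then aggregated to revenue circles. NDVI uses the standard normalized-difference formula with Sentinel-2 bands B8 (near infrared) and B4 (red). The other functions in this sketch rest on assumptions. The BHUVAN inundation images are treated as a binary flood mask. Monthly rainfall is taken as the sum of the IMD daily values. Slope uses central differences of the four neighbouring cells, which is one common way to derive slope from a DEM but not necessarily the project's method. Inundation intensity is left out because its definition is not given here.

```python
import math
from collections import defaultdict
from typing import Dict, List, Optional, Sequence


def ndvi(nir: float, red: float) -> Optional[float]:
    """NDVI = (NIR - Red) / (NIR + Red); for Sentinel-2, NIR is band B8 and Red is B4."""
    total = nir + red
    return None if total == 0 else (nir - red) / total


def inundation_percentage(flood_mask: Sequence[Optional[int]]) -> Optional[float]:
    """Percentage of valid pixels flagged as flooded (1); None marks a no-data pixel."""
    valid = [p for p in flood_mask if p is not None]
    return 100.0 * sum(valid) / len(valid) if valid else None


def monthly_rainfall(daily_mm: Dict[str, float]) -> Dict[str, float]:
    """Sum daily rainfall, keyed by ISO date 'YYYY-MM-DD', into monthly totals."""
    totals: Dict[str, float] = defaultdict(float)
    for day, mm in daily_mm.items():
        totals[day[:7]] += mm
    return dict(totals)


def slope_degrees(dem: List[List[float]], row: int, col: int, cell_size: float) -> float:
    """Slope of an interior DEM cell from central differences of its 4 neighbours."""
    dz_dx = (dem[row][col + 1] - dem[row][col - 1]) / (2 * cell_size)
    dz_dy = (dem[row + 1][col] - dem[row - 1][col]) / (2 * cell_size)
    return math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
```

For example, a DEM that rises 30 m per 30 m cell in one direction has a gradient of 1, which gives a slope of 45 degrees.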
Exposure Variables
| Data Source | Data Variables | Frequency | Last Updated | Method of data sourcing |
|---|---|---|---|---|
| BHARAT MAPS | Health centers per revenue circle, Road length per revenue circle, Rail length per revenue circle, Schools per revenue circle | One-Time | 2023-09 | GIS data from BHARAT MAPS is accessible through its ArcGIS REST server. This is snapshot data that can be accessed with software such as QGIS. After accessing the GIS data, we process it to calculate the variables of interest |
| NERDRR | Proximity to embankment | One-Time | 2023-09 | GIS data from NERDRR is accessible through its GeoServer. After downloading the embankment data, we processed it to calculate the variable of interest |
| SENTINEL-2 MSI (via GEE) | NDBI | Monthly | 2023-08 | SENTINEL-2 MSI images are accessed from Google Earth Engine (GEE) and processed for each month |
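Counting facilities per revenue circle and measuring proximity to an embankment are standard vector-geometry operations. The sketch below uses plain Python with projected (x, y) coordinates, assumed to be in metres. It relies on a ray-casting point-in-polygon test and a point-to-segment distance. A production pipeline would more likely use a GIS library. The document does not state which point proximity to an embankment is measured from, so the caller chooses it here. NDBI follows the usual (SWIR - NIR) / (SWIR + NIR) form, which on Sentinel-2 commonly uses bands B11 and B8. All function names are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) in a projected CRS, assumed to be metres


def point_in_polygon(pt: Point, polygon: Sequence[Point]) -> bool:
    """Ray casting: pt is inside if a ray to its right crosses the boundary an odd number of times."""
    x, y = pt
    inside = False
    for i in range(len(polygon)):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % len(polygon)]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through pt
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_facilities(facilities: Sequence[Point], circles: Dict[str, Sequence[Point]]) -> Dict[str, int]:
    """Number of facility points (e.g. schools, health centers) inside each revenue circle."""
    return {cid: sum(point_in_polygon(f, poly) for f in facilities) for cid, poly in circles.items()}


def distance_to_polyline(pt: Point, line: Sequence[Point]) -> float:
    """Shortest distance from pt to a polyline such as an embankment."""
    px, py = pt
    best = math.inf
    for (x1, y1), (x2, y2) in zip(line, line[1:]):
        dx, dy = x2 - x1, y2 - y1
        length_sq = dx * dx + dy * dy
        # Project pt onto the segment, clamping the projection to the segment ends.
        t = 0.0 if length_sq == 0 else max(0.0, min(1.0, ((px - x1) * dx + (py - y1) * dy) / length_sq))
        best = min(best, math.hypot(px - (x1 + t * dx), py - (y1 + t * dy)))
    return best


def ndbi(swir: float, nir: float) -> float:
    """NDBI = (SWIR - NIR) / (SWIR + NIR); commonly Sentinel-2 bands B11 and B8."""
    return (swir - nir) / (swir + nir)
```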
Vulnerability Variables
| Data Source | Data Variables | Frequency | Last Updated | Method of data sourcing |
|---|---|---|---|---|
| WorldPop | Population, Sex ratio, Aged population (>=65), Children population (<=5) | Yearly | 2024 | Because the national census has been delayed, we use WorldPop’s yearly estimates to calculate the population of each revenue circle |
| Mission Antyodaya 2020 | Net sown area, Availability of domestic electricity, Availability of telephone services, Households with piped water connections, Households without sanitary latrines | One-Time | 2020 | Mission Antyodaya 2020 data is processed to compute aggregates for each revenue circle. Wherever the Mission Antyodaya data has gaps, WorldPop estimates are used to fill them |
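Population variables from WorldPop are counts. They are therefore summed over the grid cells in a revenue circle rather than averaged, and derived indicators are computed from those totals. This sketch assumes cells are already tagged with a revenue circle ID. It also assumes sex ratio follows the Census of India convention of females per 1,000 males. The method for filling Mission Antyodaya gaps with WorldPop estimates is not specified, so fill_gaps shows only the simplest reading: use the Mission Antyodaya value where present, otherwise fall back to a WorldPop-derived estimate. All names are hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple


def sum_by_circle(cells: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Sum per-cell population counts into revenue-circle totals.

    Population is a count, so cells are added up rather than averaged.
    """
    totals: Dict[str, float] = {}
    for circle, count in cells:
        totals[circle] = totals.get(circle, 0.0) + count
    return totals


def sex_ratio(females: float, males: float) -> float:
    """Females per 1,000 males, the convention used by the Census of India."""
    return 1000.0 * females / males


def fill_gaps(primary: Dict[str, Optional[float]], fallback: Dict[str, float]) -> Dict[str, float]:
    """Use the primary (Mission Antyodaya) value where present, else the fallback estimate."""
    filled = dict(fallback)
    filled.update({circle: value for circle, value in primary.items() if value is not None})
    return filled
```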
Damage and Losses
| Data Source | Data Variables | Frequency | Last Updated | Method of data sourcing |
|---|---|---|---|---|
| FRIMS (ASDMA) | Population affected, Crop area affected, Roads damaged, Bridges damaged, Embankments damaged, Human lives lost, Animals affected, Animals washed away, Erosion damages, Houses damaged | Monthly | 2024-09 | ASDMA collects flood damage data daily and makes it available through its FRIMS system. We plan to access FRIMS data through an API and then process it to calculate the variables of interest for each revenue circle for each month |
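Daily FRIMS reports have to be rolled up into monthly values per revenue circle. API access to FRIMS is still planned, so the record fields below are hypothetical. The choice of reducer is also an assumption. A daily "population affected" figure can describe the same people on several consecutive days, so summing it would overcount, and the sketch keeps the monthly peak instead. Lives lost are new events each day, so they are summed.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# Assumed aggregation rule per (hypothetical) FRIMS field.
AGGREGATORS: Dict[str, Callable[[List[float]], float]] = {
    "population_affected": max,
    "crop_area_affected_ha": max,
    "human_lives_lost": sum,
}


def monthly_by_circle(daily_reports: Iterable[dict]) -> Dict[Tuple[str, str, str], float]:
    """Roll daily reports up to (revenue circle, 'YYYY-MM', variable) values."""
    buckets: Dict[Tuple[str, str, str], List[float]] = defaultdict(list)
    for report in daily_reports:
        month = report["date"][:7]  # ISO date 'YYYY-MM-DD' -> 'YYYY-MM'
        for variable in AGGREGATORS:
            if variable in report:
                buckets[(report["revenue_circle"], month, variable)].append(report[variable])
    return {key: AGGREGATORS[key[2]](values) for key, values in buckets.items()}
```

The same roll-up pattern would apply to the FRIMS relief variables under Government Response.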
Government Response
| Data Source | Data Variables | Frequency | Last Updated | Method of data sourcing |
|---|---|---|---|---|
| FRIMS (ASDMA) | Number of relief camps, Number of relief distribution centres, Number of inmates in relief camps, Relief distributed (Oil, Rice, Dal, Salt) | Monthly | 2024-09 | ASDMA collects flood relief data daily and makes it available through its FRIMS system. We plan to access FRIMS data through an API and then process it to calculate the variables of interest for each revenue circle for each month |
| Assam tenders website | Total number of flood-related tenders awarded, Total awarded amount of tenders, Tenders by scheme (SDRF, SOPD, RIDF), Tenders by work type (Roads, Bridges, Embankments, Erosion), Tenders by type (Immediate measure, Repairs, Preparation, Goods) | Monthly | 2024-09 | Awarded tenders (AOC) are scraped each month from the Assam tenders website. The scraped data is then processed to identify flood-related tenders, and each flood-related tender is geotagged to a revenue circle to calculate the variables for each revenue circle |
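The table does not describe how tenders are classified as flood related or how they are geotagged. The sketch below assumes two simple mechanisms: a keyword match on the tender title, and a gazetteer that maps lowercase place names to revenue circles. FLOOD_KEYWORDS, the gazetteer contents, and the field names title and awarded_amount are all hypothetical. Scraping itself is not shown.

```python
from typing import Dict, Iterable, Mapping, Optional

# Hypothetical keywords that flag a tender title as flood related.
FLOOD_KEYWORDS = ("flood", "embankment", "erosion", "breach", "sdrf")


def is_flood_related(title: str) -> bool:
    text = title.lower()
    return any(keyword in text for keyword in FLOOD_KEYWORDS)


def geotag(title: str, gazetteer: Mapping[str, str]) -> Optional[str]:
    """Return the revenue circle of the first gazetteer place name found in the title.

    Plain substring matching is naive: a short place name can match inside a longer word.
    """
    text = title.lower()
    for place, circle in gazetteer.items():
        if place in text:
            return circle
    return None


def summarise_tenders(tenders: Iterable[dict], gazetteer: Mapping[str, str]) -> Dict[str, dict]:
    """Count flood-related tenders and total their awarded amounts per revenue circle."""
    summary: Dict[str, dict] = {}
    for tender in tenders:
        if not is_flood_related(tender["title"]):
            continue
        circle = geotag(tender["title"], gazetteer)
        if circle is None:
            continue  # untagged tenders would need manual review
        entry = summary.setdefault(circle, {"count": 0, "awarded_amount": 0.0})
        entry["count"] += 1
        entry["awarded_amount"] += tender["awarded_amount"]
    return summary
```

Scheme-wise and work-wise breakdowns would follow the same pattern, with an extra key per scheme or work type.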