# Exposure to Site Tool - User Manual and Technical Guide

---
Author: Graeme Weatherill

Affiliation: Seismic Hazard & Risk Dynamics (Section 2.6), GFZ German Research Centre for Geosciences, 14473 Potsdam, Germany

Email: graeme.weatherill@gfz-potsdam.de

License (source code): GNU General Public License v3.0 (GPLv3)

License (document): Creative Commons CC-BY-4.0

---

The exposure to site tool is a short(ish) Python 3 module for constructing the ESRM20 site amplification files for a given exposure model configuration. It supports several different workflows, depending on whether the user wishes to define the resulting site model from a regularly spaced grid, an existing exposure model or simply a shapefile of administrative regions.

### Before you start

The tools require additional data files that are not included with the main code. These files contain the regional-scale European site input data; more details can be found in the description below. The data files are distributed under a Creative Commons CC-BY-4.0 license and are available for download from:

### Installation

The tool requires a standard set of Python libraries that can be installed via most common Python package managers. A `setup.py` file is included to facilitate installation. We recommend using a Python virtual environment for installing and running the tools.

The required dependencies are:

(Required)

* Python >=3.7
* h5py >=2.10
* Numpy >=1.18
* Scipy >=1.3
* Pandas >=0.25
* Shapely >=1.7, <2.0
* Pyproj >=1.9
* matplotlib >=3.0
* geopandas >=0.9.0

(Additional)

* ipython >=7.2
* jupyter >=1.0.0

##### For Linux/OSX

Install Python virtualenv via the system package manager, e.g. for Ubuntu:

```bash
sudo apt install git python3-venv
```

On OSX this may be done via brew or macports.

Download the code from the GitLab repository:

```bash
git clone https://gitlab.seismo.ethz.ch/efehr/esrm20_sitemodel.git
```

Create and enter the virtual environment:

```bash
python3 -m venv exp2site
source exp2site/bin/activate
```

Update the Python package manager pip:

```bash
pip install --upgrade pip
```

Then run:

```bash
pip install -e esrm20_sitemodel/
```

This should install the dependencies and the `exposure2site` package. Additional packages to help run the code can be installed via:

```bash
pip install ipython
pip install jupyter
```

Note: some installations may also require pygeos. If an error is raised indicating that it is not available, then it can be installed via:

```bash
pip install pygeos
```
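As a quick check that the installation has succeeded, the package can be imported from within the activated environment (an illustrative snippet only; it assumes the package installs under the name `exposure2site`, as noted above):

```python
# Run inside the activated virtual environment: if the installation
# succeeded this prints the package location without raising ImportError.
import exposure2site
print(exposure2site.__file__)
```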
##### Windows

For users experienced in creating virtual environments on Windows, we suggest following the steps above in the manner best suited to your own Python environment. For other users, we suggest installing and running the toolkit using Anaconda. Details on how to install and get started with Anaconda can be found here:

To create the virtual environment via conda, run:

```bash
conda create --name exp2site python=3.8
conda activate exp2site
```

Download the code from the EFEHR repository, either from the zip/tar.gz file or, preferably, via Git from the conda command prompt:

```bash
git clone https://gitlab.seismo.ethz.ch/efehr/esrm20_sitemodel.git
```

Update the Python package manager pip:

```bash
pip install --upgrade pip
```

Navigate to the directory containing the tool and run the installation:

```bash
pip install -e esrm20_sitemodel/
```

This should install the dependencies and the `exposure2site` package. Additional packages to help run the code can be installed via:

```bash
pip install ipython
pip install jupyter
```

***

## Required Datasets

In addition to the tools themselves, you will need several files containing pre-processed site data. The zipped file containing all of the necessary data is available for download from here:

The downloaded data should be unzipped and stored in a directory named `site_data`; this directory should then be placed into the main code within the folder `exposure2site`.

The required datasets comprise six files:

1. `GEOL_V8_ERA2.shp` The European geological data from BRGM, stored in shapefile format.

2. `europe_gebco_topography.hdf5` An hdf5 binary containing all of the required site data for Europe on a 30 arc-second grid, bounded by longitudes -25˚E to 45˚E and latitudes 34.0˚N to 72˚N. The file contains the following attributes:
   * `longitude`: Vector of nlon longitudes
   * `latitude`: Vector of nlat latitudes
   * `elevation`: Array (nlat, nlon) of GEBCO elevations (topography and bathymetry)
   * `slope`: Array (nlat, nlon) of slopes derived from the elevation data set using the Horn algorithm, available from the Generic Mapping Tools `grdgradient` function
   * `built_density`: Built density from the Global Human Settlement database, rendered from the 250 m x 250 m grid to the 30 arc-second grid
   * `geology`: Geological era code per 30 arc-second cell

3. `europe_2020_vs30.hdf5` Slice of the USGS Global Vs30 dataset for Europe, limited to the region enclosed by longitudes -30˚E to 65˚E and latitudes 30˚N to 75˚N.

4. `volcanic_front_distances_5min.hdf5` Distances of sites on a 5' x 5' grid from the volcanic front (i.e. the forearc/backarc limiting zone). This is needed for the subduction ground motion models and can be safely interpolated from the reference grids to the desired sites.

5. `ghs_global_250m_Europe` The original built density dataset at 250 m x 250 m spacing in the Mollweide projection, limited to Europe and the Middle East. The global grid from which this is taken can be downloaded from the Global Human Settlement project, here:

6. `ATTENUATION_REGIONS.shp` A shapefile containing the ESHM20 attenuation regions to which the sites are assigned.

Longitudes and latitudes in all 30 arc-second data sets refer to the locations of the cell centres, beginning from the integer bounds such that the lower left cell centre is at (llon + (1 / 240), llat + (1 / 240)) and the upper right cell centre is at (ulon - (1 / 240), ulat - (1 / 240)), where llon, llat, ulon and ulat are the lower (left) longitude, lower (south) latitude, upper (right) longitude and upper (north) latitude respectively.
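To illustrate this convention, the following sketch (not part of the tool itself; the function name is hypothetical) constructs the vectors of 30 arc-second cell centres for a given set of integer bounds:

```python
import numpy as np

# 30 arc-seconds = 1/120 of a degree; cell centres sit half a cell
# (1/240 of a degree) inside the integer bounds.
SPACING = 1.0 / 120.0

def cell_centres(llon, llat, ulon, ulat, spacing=SPACING):
    """Return vectors of cell-centre longitudes and latitudes."""
    nlon = int(round((ulon - llon) / spacing))
    nlat = int(round((ulat - llat) / spacing))
    lons = llon + (np.arange(nlon) + 0.5) * spacing
    lats = llat + (np.arange(nlat) + 0.5) * spacing
    return lons, lats

# For the European topography grid described above:
lons, lats = cell_centres(-25.0, 34.0, 45.0, 72.0)
# lons runs from -25 + (1/240) to 45 - (1/240) in steps of 1/120 degree
```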
***

# General Usage

The Ex2Site tool is mostly intended for command-line usage (although Jupyter Notebooks illustrating how to call the functions directly are also provided). With the `exposure_to_site_tools.py` file in the current working directory, and the data files in the subdirectory `./site_data`, the tool is run by:

```bash
exposure2site --run=choice_of_workflow --output-file=path/to/output_site_model.csv
```

## grid

The reference grid for the site information is 30 arc-seconds; however, we can downsample to a coarser grid by averaging the properties of the reference grid cells contained within each cell of the target grid. This is illustrated in the figure below for the case that we wish to downsample the reference 30 arc-second grid to a 90 arc-second grid.

![](images/downsampling_grid.png "General formulation for downsampling a 30'' grid to 90''")

The 90 arc-second grid cell is aligned with the set of 30 arc-second cells such that it contains 9 reference grid cells. Each grid cell is associated with a given property (e.g. slope, vs30 etc.), \$$x_i\$$ for \$$i = 1, 2, \ldots, 9\$$. In the case that we wish to treat all cells as evenly weighted, the value of the property \$$y_j\$$ for the \$$j^{th}\$$ 90 arc-second cell is the simple mean:

\$y_j = \frac{1}{9}\sum_i^9 x_i \$

Or, more generally, for a grid spaced as a multiple, \$$\gamma\$$, of 30 arc-seconds:

\$y_j = \frac{1}{\gamma^2}\sum_i^{\gamma^2} x_i \$

The reference point for the coarser grid that will be used in the site model will always be the centre of the coarse grid cell (regardless of the weighting), as indicated by the red dot in the figure.
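To make the downsampling concrete, a minimal numpy sketch of the evenly weighted case might look as follows (illustrative only, with hypothetical array and function names; it assumes the coarse grid is aligned with the reference grid):

```python
import numpy as np

def downsample(x, gamma):
    """Block-average a (nlat, nlon) property array onto a grid that is
    gamma times coarser; nlat and nlon must be exact multiples of gamma."""
    nlat, nlon = x.shape
    # Group each coarse cell's gamma x gamma sub-cells into the second and
    # fourth axes, then take the simple mean over them.
    blocks = x.reshape(nlat // gamma, gamma, nlon // gamma, gamma)
    return blocks.mean(axis=(1, 3))

# e.g. downsampling a 30 arc-second slope grid to 90 arc-seconds:
# slope_90 = downsample(slope_30, gamma=3)
```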
For the purposes of defining an optimal site model for the risk calculations, we may wish to assign to a grid cell the site property that is closest to the site conditions affecting the greatest proportion of the population. One of the clearest cases where this may be relevant is when defining the slope, vs30 and geology for a coarse cell containing a small town in an upland valley. By just taking the value at the centroid of the cell, it is merely a matter of chance as to whether that point is representative of a site in the valley (lower slope, lower Vs30 and potentially younger Quaternary geology) or of a hillside (higher slope, higher Vs30 and potentially older geology).

To potentially improve the assignment of the site property, we can weight the high-resolution grid cells by their built environment density (\$$\rho_i\$$), as defined from the Global Human Settlement index. Each grid cell is associated with a built density, which is re-scaled here to a value from 0 (uninhabited) to 1 (densely inhabited), originally defined as 0 to 100 in the GHS dataset. With this information we can weight the averaging of the properties in the cell by the respective built density. To do so, we define the weight for each cell by normalising the built density with respect to the total built density:

\$w_i = \frac{\rho_i}{\sum_i^{\gamma^2} \rho_i} \$

With a weight assigned to each cell, we can define the "average" properties in one of three ways:

**MEAN (default)** Takes the weighted mean of the data (noting that the normalised weights sum to 1) such that

\$y_j = \frac{\sum_i^{\gamma^2} w_i \cdot x_i}{\sum_i^{\gamma^2} w_i} = \sum_i^{\gamma^2} w_i \cdot x_i \$

**MAX** Takes the value of \$$x_i\$$ with the highest weight

**MEDIAN** Takes the weighted median, defined by sorting \$$x_i\$$ from lowest normalised weight to highest and taking the value of \$$x_i\$$ for the first cell with a cumulative weight, \$$w_i^c\$$, greater than or equal to 0.5
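The three aggregation options might be sketched as follows (illustrative only; the function is hypothetical and not the tool's own API):

```python
import numpy as np

def aggregate(x, rho, method="mean"):
    """Aggregate sub-cell property values x given built densities rho."""
    w = rho / rho.sum()  # normalised weights, summing to 1
    if method == "mean":
        # Weighted mean; no denominator needed since the weights sum to 1
        return np.sum(w * x)
    if method == "max":
        # Value at the highest-weighted sub-cell
        return x[np.argmax(w)]
    if method == "median":
        # Weighted median: sort by weight (lowest to highest) and take the
        # first value whose cumulative weight reaches 0.5
        order = np.argsort(w)
        cum_w = np.cumsum(w[order])
        return x[order][np.searchsorted(cum_w, 0.5)]
    raise ValueError("Unknown method: %s" % method)
```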
#### The Smoothed GHS option

The GHS layer can be useful for applying weightings in order to bias the centroid location and/or properties toward the areas of highest urban density. There are some limitations to this process, however, in that the maximum value in the GHS is equal to 100, but there is not necessarily an easy way to distinguish between, for example, the downtown of a small town (which may have only a few cells equal to 100) and that of a large metropolitan region (where a large area of cells may all take the value of 100). This may be problematic when using the MAX or MEDIAN options, as the specific maximum or median is somewhat arbitrarily chosen when many cells in a region contain the maximum value. In order to distinguish the large dense urban areas from the small towns, the user can apply weighting from a smoothed version of the GHS that uses the following kernel:

\$GHS^{SMOOTH} (x, y) = \sum_{i=-N}^{N} \sum_{j=-N}^{N} GHS (x + i, y + j) \$

where N is the size of the Moore neighbourhood (i.e. the number of adjacent cells to the target cell). The consequence of this kernel is that cells surrounded by cells of similarly high density will take a higher value than a high urban density cell surrounded by lower density cells. At present the smoothing can be applied assuming a Moore neighbourhood of \$$N = 1, 3\$$ or \$$5\$$. To use the smoothed GHS, the additional keyword argument can be added:

`--smooth-n #` where `#` is the N size of the Moore neighbourhood (1, 3 or 5).

Or, if running the function from within a Jupyter Notebook, the optional keyword argument `smooth_n=#` should be used (which takes \$$N=0\$$, i.e. no smoothing, by default).

#### Handling Geological Data

The weighting approach described previously may be applicable for the case that the quantity is numerical, and thus a "weighted average" can be calculated. For categorical data such as geology, however, the resulting property for the coarse cell should correspond to one of the categories represented in the set of sub-cells. For this purpose, a geological era identifier, \$$z_i\$$, is assigned to each cell on the 30 arc-second grid based on the location of the cell centre with respect to the European geological map.

To assign the geological property to the coarser grid, the user can choose between one of three options:

**MODE (default)** Assigns the modal geological unit found within the \$$\gamma^2\$$ sub-cells.

**MAX** Assigns the geological unit found at the highest weighted sub-cell

**MEDIAN** Sorts the geological values from lowest to highest according to the normalised weight and cumulates the weight for the cells from 0 to 1. Assigns the geological unit of the cell whose cumulative weight is greater than or equal to 0.5

### Required grid properties

* `--bbox llon/llat/ulon/ulat` Defines the limits of the regular grid
* `--spacing ##.#` Defines the spacing of the downsampled grid, which must be a multiple of 30 arc-seconds (i.e. 30, 60, 90, 120, 150 ...)

### Optional grid properties

* `--onshore-only True|False` Sets whether to clip the resulting site model only to those sites onshore (True, the default) or to return all grid cells. Onshore is defined as having an averaged elevation greater than -10 m above mean sea level; this value was determined by inspection and found to produce good agreement with the European coastlines
* `--weighting True|False` Determines whether or not to weight the averaging by the built density (default=True)
* `--averaging mean|median|max` For numerical data this indicates whether to define the "average" property as the mean (default), median or max. These averaging terms can be applied with or without weighting
* `--geological-weighting mode|median|max` For the geological data, this controls how the geological unit is determined from the sub-cells, with options of mode (default), median or max
* `--default-xvf ##.##` For sites outside of the range at which the subduction forearc/backarc distance taper applies, this sets the default volcanic front distance. Mostly this can be ignored and the default value of -150 km used

### grid Example

The following command would produce a 120 arc-second grid between 22˚E and 30˚E and 45˚N to 51.5˚N, clipping only to onshore sites, applying built density weighting and taking the resulting property as the maximum numerical value and modal geological category. The result will be exported to `example_site_model.csv`:

```bash
exposure2site --run grid --output-file example_site_model.csv --bbox 22/45/30/51.5 --weighting True --averaging max --geological-weighting mode --onshore-only True
```

***

## exposure

The exposure workflow assumes that the target sites of an exposure model and their corresponding spatial polygons are predefined, and thus the intention is to assign properties to the target sites based on an averaging over the corresponding polygons. For the current purpose, it is assumed that the target sites for the exposure are stored in a separate csv file from the polygons, which are themselves stored in a shapefile. An example combination of exposure and target sites is shown below.

![](./images/belgium_admin2.png "Polygons for Belgium at the Admin 2 level")

In the tool, the underlying spatial data from the 30 arc-second grids are taken for an area enclosed by the bounding box of the exposure model. The points are then assigned to their respective polygons using standard spatial join methods. At present, the spatial join is done entirely in the WGS84 (longitude, latitude) space in order to reduce the computational time spent re-projecting data (TODO: this may change in the future).
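The points-in-polygons assignment can be sketched with geopandas (illustrative only; the function and variable names are hypothetical, not the tool's own):

```python
import geopandas as gpd

def assign_sites_to_polygons(sites, admin):
    """Assign 30 arc-second sites to their enclosing polygons.

    Assumes "sites" is a pandas DataFrame with "lon" and "lat" columns
    plus the site properties, and "admin" is a GeoDataFrame read from the
    administrative shapefile; both are handled in WGS84, as noted above.
    """
    site_gdf = gpd.GeoDataFrame(
        sites.copy(),
        geometry=gpd.points_from_xy(sites["lon"], sites["lat"]),
        crs="EPSG:4326",
    )
    # Standard spatial join; for geopandas >= 0.10 the keyword is
    # predicate="within" rather than op="within"
    return gpd.sjoin(site_gdf, admin, how="inner", op="within")
```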
Once the intersections have been performed, each of the N polygons contains \$$n_j\$$ sites. From these sites the averages can be taken in the same manner as those for the grid, and can be weighted by the building density if needed. Therefore, the options `--weighting`, `--averaging`, `--geological-weighting` and `--default-xvf` all control the definition of the site properties in the same manner as for the grid calculation.

### Required exposure Arguments

* `--input-exposure PATH/TO/input_exposure_model.csv` Specifies the relative path to the csv exposure file. Within the exposure file the target sites must have the column headers `lon` and `lat`. The country and administrative level are interpreted from the filename, which should take the form `Exposure_Type_Country.csv`
* `--shapefile-dir PATH/TO/SHAPEFILE_DIRECTORY` Specifies the path to the directory where the corresponding shapefiles are contained. The code will identify the shapefiles for the country and retrieve the polygons for the highest administrative level for which the exposure is defined

### Optional exposure Arguments

All of the optional arguments specified for the grid option can be used here, with the exception of `--onshore-only`.

### exposure Example

The following command will read the exposure model in the file `Exposure_Res_Dummy.csv`, whose geometry can be found in the file `Adm2_Dummy.shp` in the directory `Country_Shapefiles`. Weighting based on built density will be applied, this time taking the value of the site property at the maximum weighted location. Geology will be determined from the modal geological category within the polygon. The result will be written to the output xml file `Dummy_Site_Model.xml`:

```bash
exposure2site --run exposure --output-file Dummy_Site_Model.xml --input-exposure ./Exposure_Res_Dummy.csv --shapefile-dir ./Country_Shapefiles --weighting True --averaging max --geological-weighting mode
```

***

## admin

The admin workflow is a variation of the exposure workflow for which only the administrative polygons of the regions are defined, but not the target sites. As the target sites are not defined in the input files, the user must determine how they are to be obtained from the geographical data. For this, the user is provided with several options, controlled with the option `--centroid-weighting`. As for the exposure workflow, the sites on the regular 30 arc-second grid are assigned to each of the N polygons, such that there are \$$n_j\$$ sites in the \$$j^{th}\$$ polygon. If omitted, the target site is taken as the ordinary centroid of the polygon. Otherwise, the centroid is determined from a grid weighted using the built density data set, with one of three options:

**mean** Each site \$$X_i\$$ (lon, lat) within the polygon is assigned a normalised weight \$$w_i\$$ from the built density data set, as defined previously. The location of the weighted centroid \$$Y_j\$$ is calculated from the locations \$$X_i\$$ of the \$$n_j\$$ sites via:

\$Y_j = \frac{\sum_i^{n_j} w_i \cdot X_i}{\sum_i^{n_j} w_i} = \sum_i^{n_j} w_i \cdot X_i \$

**median** Each site \$$X_i\$$ within the \$$j^{th}\$$ polygon is assigned the corresponding weight \$$w_i\$$ as before.
The sites are then ranked in order of weight and the site corresponding to the 50th percentile weight is selected.

**max** The site corresponding to the highest weight is selected.

A comparison of the centroids weighted by building density for the Administrative 2 level of Belgium is shown below. In many cases locational discrepancies can be seen between the definitions, often by a significant margin.

![](./images/belgium_centroids.png "Alternative Centroids for Belgium at the Admin 2 level")

Once the centroids have been defined, the admin workflow then proceeds in accordance with the exposure workflow, assigning the site properties on the basis of the "average" properties for the polygon.
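The three centroid options might be sketched as follows (a minimal illustration with a hypothetical helper, not the tool's own function):

```python
import numpy as np

def weighted_centroid(lons, lats, rho, method="mean"):
    """Locate a polygon's target site from the built densities rho of
    the n_j sub-cell sites (lons, lats) it contains."""
    w = rho / rho.sum()  # normalised weights
    if method == "mean":
        # Built-density weighted mean location
        return np.sum(w * lons), np.sum(w * lats)
    order = np.argsort(w)
    if method == "max":
        # Site with the highest weight
        i = order[-1]
    else:  # "median"
        # Site at the 50th percentile of cumulative weight
        i = order[np.searchsorted(np.cumsum(w[order]), 0.5)]
    return lons[i], lats[i]
```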
### Required admin properties

* `--input-shapefile path/to/input_shapefile.shp` Path to the input administrative shapefile

### Optional admin properties

* `--admin-level #` Defines the administrative level at which to define the centroids. The administrative level must be less than or equal to the maximum administrative level in the input shapefile; if not defined, the highest administrative level in the shapefile is taken
* `--centroid-weighting mean|median|max` Controls how to define the centroid per polygon, as either the built density weighted mean, median or max, or the unweighted centroid if not defined

Other configurable properties (as defined previously) are: `--weighting`, `--averaging`, `--geological-weighting` and `--default-xvf`.

***

## point

The point workflow is a simple way of assigning site properties to a given set of locations. It assumes that the user would like the properties exactly at the site locations specified in the input file; no weighting or averaging is undertaken. This may be useful in the case of industrial exposure, or any other type of exposure where facilities are placed at specific locations.

The required input file can be any comma-separated text file, but the target sites must be specified under the column names `lon` and `lat`, or alternatively `xcoord` and `ycoord`, respectively. Note that duplicate points in the resulting site model will be removed!

### Required point properties

* `--input-file path/to/input_site_file.csv` Defines the relative path to the input csv file containing the target sites

### Optional point properties

* `--onshore-only True|False` Clips the resulting site model to those sites located onshore (elevation greater than -10 m relative to mean sea level) (default=True), or retains all input sites (False)

### point Example

The following command will define the site model from a simple csv file of longitudes and latitudes (`./site_locations.csv`) and export the resulting model to `./example_point_site_model.csv`, clipping only to onshore sites:

```bash
exposure2site --run point --output-file ./example_point_site_model.csv --input-file ./site_locations.csv --onshore-only True
```

## join

This is not a workflow per se, but merely a command to combine multiple site model files into a single site model. Duplicate sites will be removed. The join can apply to a series of files specified by the user, or to an entire directory of site model files. The site models must be in the correct OpenQuake csv or xml format.

### Required join arguments

One of either:

* `--input-dir PATH/TO/INPUT_DIRECTORY` Specifies the directory where the site model files are contained
* `--files model1.csv+model2.csv+model3.xml` Enumerates the list of files to be concatenated, separated by the `+` operator (no spaces allowed)

### join example

The following command joins together the site model in the file `site_model_countryA.csv` and the site model in the file `site_model_countryB.xml`, exporting them to `site_model_countriesAB.csv`:

```bash
exposure2site --run join --output-file site_model_countriesAB.csv --files site_model_countryA.csv+site_model_countryB.xml
```