Underestimation of N buildings / SQM in France
I believe that the number of buildings and the total floorspace of buildings in France are underestimated in the French exposure files. This could make the assessed risk lower than it actually is. In the image we can see, for example, that Belgium has a higher risk than France. The same is true for Germany:
The data
Occupancy (main usage) | ESRM20 N buildings | French Cadastre N Buildings | Factor | ESRM20 Footprint (sqm) | French cadastre footprint (sqm) | Factor |
---|---|---|---|---|---|---|
Res | 14M | 19.9M | 1.42 | 1.5B | 2.49B | 1.66 |
Com | 300K | 1.4M | 4.6 | 492M | 567M | 0.87 |
Ind | 596K | 477K | 0.8 | 289M | 513M | 1.78 |
Total | 15.3M | 21.8M | 1.42 | 2.28B | 3.57B | 1.57 |
Total with other types (unknown / agricultural / religious) | 15.3M | 50.6M (22.7M are unknown) | 3.3 | 2.28B | 5.6B (1.6B is unknown) | 2.46 |
There's also something strange with the number of buildings / sqm in the commercial and industrial taxonomies: while the number of commercial buildings is smaller in ESRM20, the SQM is bigger. For the industrial buildings the opposite is true: the number of buildings is larger in ESRM20, but the area is smaller. Could it be that the area per building is on average too big for the commercial types and too small for the industrial types?
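The question can be checked roughly from the table itself by dividing footprint by building count. This is a minimal sketch using the rounded figures from the table above; the dictionary layout and the helper name `footprint_per_building` are illustrative and not part of ESRM20 or the cadastre tooling:

```python
# Rounded values copied from the table above: (number of buildings, footprint in sqm).
table = {
    "Com": {"ESRM20": (300e3, 492e6), "cadastre": (1.4e6, 567e6)},
    "Ind": {"ESRM20": (596e3, 289e6), "cadastre": (477e3, 513e6)},
}


def footprint_per_building(n_buildings, footprint_sqm):
    """Average footprint (sqm) of a single building."""
    return footprint_sqm / n_buildings


averages = {
    occupancy: {source: footprint_per_building(*values) for source, values in sources.items()}
    for occupancy, sources in table.items()
}

for occupancy, by_source in averages.items():
    for source, average in by_source.items():
        print(f"{occupancy} {source}: {average:.0f} sqm per building")
```

With these rounded inputs the average commercial footprint in ESRM20 comes out at roughly four times the cadastre value, while the average industrial footprint is less than half of it.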
Code used to get footprint size in France from ESRM20
For the footprint in ESRM20, I take the total floorspace per building and divide it by the number of floors of the building.
Maybe good to note: I think both files `area_per_dwelling_France_RES` and `dwlngs_per_bldngs_France_RES` have a typo: features 9 and 10 are `MCF/LWAL+CDM/H:1` and `MCF/LWAL+CDM/H:2`, but the `CDM` should be `CDN`. At least that's the case in the exposure files themselves.
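If the typo is confirmed, the keys could be patched on load so that they match the exposure files. This is a minimal sketch, assuming the fix is a plain `+CDM` to `+CDN` substitution; the helper name `fix_cdm_typo` is made up for illustration:

```python
def fix_cdm_typo(taxonomy):
    """Replace the `+CDM` tag with `+CDN` in a taxonomy string (assumed fix)."""
    return taxonomy.replace("+CDM", "+CDN")


# The two affected taxonomies mentioned above, plus one that is already correct.
samples = ["MCF/LWAL+CDM/H:1", "MCF/LWAL+CDM/H:2", "MCF/LWAL+CDN/H:1"]
fixed = [fix_cdm_typo(t) for t in samples]
print(fixed)
```

On a pandas DataFrame this could be applied before merging, for example `df_area["TAXONOMY"] = df_area["TAXONOMY"].map(fix_cdm_typo)`.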
The code needs taxonomy-lib to run. It can be installed with `pip install https://git.gfz-potsdam.de/globaldynamicexposure/libraries/taxonomy-lib/-/archive/main/taxonomy-lib-main.zip`.
import pandas as pd
from taxonomylib import Taxonomy


def get_footprint(df):
    # Create a `height` attribute, that is returned from the `Taxonomy` class. The attribute can
    # look like: H:2, HBET:3-5, HBET:6-
    df["height"] = df.apply(
        lambda item: Taxonomy(item["TAXONOMY"]).get_section('height'), axis=1)

    floors = []
    for height in df["height"]:
        # Split the string at the colon to get the key and the value
        k, v = height.split(':')
        # If the key is `H`, the value is only one integer
        if k == "H":
            n_floors = int(v)
        # If the value ends with a `-`, the range is without a limit, therefore we take the
        # minimum number of floors + 1
        elif v[-1] == "-":
            n_floors = int(v[:-1]) + 1
        # Else we take the highest value in the range. For example for HBET:3-5, the number of
        # floors will be 5.
        else:
            n_floors = int(v.split('-')[1])
        floors.append(n_floors)
    # Assign the whole column at once; chained assignment (`df["floors"][idx] = ...`) is not
    # guaranteed to write back to `df`.
    df["floors"] = floors

    # The footprint is the floorspace per building, divided by the number of floors and
    # multiplied by the number of buildings.
    df["footprint"] = (df["AREA_PER_BUILDING_SQM"] / df["floors"]) * df["BUILDINGS"]
    df["total_area"] = df["AREA_PER_BUILDING_SQM"] * df["BUILDINGS"]
    return df
def get_residential_footprint(df_res, df_dwellings, df_area):
    # Merge df_dwellings and df_area based on TAXONOMY
    c_df = pd.merge(df_area, df_dwellings, on='TAXONOMY')

    # Set average footprint per taxonomy and make a dictionary
    c_df["average_footprint"] = (
        c_df["AREA_DWELLING_URBAN"] * c_df["DWELLINGS PER BUILDING"]) / c_df["FLOORS"]
    sqm_dct = dict(zip(c_df["TAXONOMY"], c_df["average_footprint"]))
    n_dwellings_dct = dict(zip(c_df["TAXONOMY"], c_df["DWELLINGS PER BUILDING"]))

    # Create a list of taxonomies of the residential area and match with the values in the
    # df_dwellings/df_area datasets.
    list_of_taxonomies = df_res["TAXONOMY"].unique()

    # The CSV files of dwelling area do not match with the taxonomies in the residential file.
    # There is the +LFC tag that is disregarded. Therefore, this one needs to be added to the
    # dictionary too. If the `LFC` tag is found, it is removed and matched with the taxonomies
    # that do exist in the CSV files of dwelling area.
    for t in list_of_taxonomies:
        lfc = t.find('+LFC:')
        if lfc != -1:
            lfc_slash = t[lfc:].find('/')
            # If no `/` follows the tag, the tag runs to the end of the string.
            rest = t[lfc + lfc_slash:] if lfc_slash != -1 else ""
            _t = t[:lfc] + rest
            sqm_dct[t] = sqm_dct[_t]
            n_dwellings_dct[t] = n_dwellings_dct[_t]

    # The footprint is the footprint SQM according to the taxonomy dictionary multiplied by the
    # number of buildings
    df_res["footprint"] = df_res.apply(
        lambda item: sqm_dct[item["TAXONOMY"]] * item["BUILDINGS"],
        axis=1)
    df_res["n_dwellings"] = df_res.apply(
        lambda item: n_dwellings_dct[item["TAXONOMY"]] * item["BUILDINGS"],
        axis=1)

    print(f"""
------------------------------------------
Number of dwellings: {df_res["n_dwellings"].sum():.2f}
------------------------------------------
""")
    return df_res
if __name__ == "__main__":
    # Open all files
    df_dwellings = pd.read_csv('dwlngs_per_bldngs_France_RES.csv', sep=',')
    df_area = pd.read_csv('area_per_dwelling_France_RES.csv', sep=',')
    df_res = pd.read_csv('Exposure_Model_France_Res.csv', sep=',')
    # Commercial and industrial files are each read into the matching variable.
    df_com = pd.read_csv('Exposure_Model_France_Com.csv', sep=',')
    df_ind = pd.read_csv('Exposure_Model_France_Ind.csv', sep=',')

    # print(df_res["OCCUPANTS_PER_ASSET_NIGHT"].sum() + df_com["OCCUPANTS_PER_ASSET_NIGHT"].sum() +
    #       df_ind["OCCUPANTS_PER_ASSET_NIGHT"].sum())

    # Get footprints for each exposure file
    df_com = get_footprint(df_com)
    df_ind = get_footprint(df_ind)
    df_res = get_residential_footprint(df_res, df_dwellings, df_area)

    # Sum all footprints
    sum_footprint_res = df_res["footprint"].sum()
    sum_footprint_com = df_com["footprint"].sum()
    sum_footprint_ind = df_ind["footprint"].sum()
    sum_footprint = sum_footprint_res + sum_footprint_com + sum_footprint_ind

    # Sum all building counts
    sum_buildings_res = df_res["BUILDINGS"].sum()
    sum_buildings_com = df_com["BUILDINGS"].sum()
    sum_buildings_ind = df_ind["BUILDINGS"].sum()
    sum_buildings = sum_buildings_res + sum_buildings_com + sum_buildings_ind

    print(f"""
------------------------------------------
Footprint
------------------------------------------
Residential: {sum_footprint_res:.2f}
Commercial: {sum_footprint_com:.2f}
Industrial: {sum_footprint_ind:.2f}
Sum: {sum_footprint:.2f}
------------------------------------------
N Buildings
------------------------------------------
Residential: {sum_buildings_res:.2f}
Commercial: {sum_buildings_com:.2f}
Industrial: {sum_buildings_ind:.2f}
Sum: {sum_buildings:.2f}
------------------------------------------
""")
Query of France Cadastre dataset
Source: data.gouv.fr batiments (Last updated: 24th of March 2023) / alternative dataset with all cadastre information
SQL Query for total number of buildings and footprint size:
SELECT usage_1, count(*), SUM(ST_Area(geometrie, True)) AS area
FROM batiment
GROUP BY usage_1
ORDER BY usage_1 DESC;
usage_1 | count | area
------------------------+----------+--------------------
Sportif | 53062 | 42963005.55154355
Résidentiel | 19921308 | 2487924496.4517083
Religieux | 83440 | 23129203.045996834
Industriel | 477078 | 513365477.32243454
Indifférencié | 22707428 | 1619562691.4066281
Commercial et services | 1399220 | 567139570.0830743
Annexe | 4878844 | 271368712.85938895
Agricole | 1083990 | 577505751.2126371
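The res/com/ind totals used in the comparison table can be re-derived from this query output. This is a minimal sketch with the rows copied from the result above; the variable names and the grouping into `MAIN_USAGES` are illustrative:

```python
# (count, area in sqm) per usage_1, copied from the query result above.
rows = {
    "Sportif": (53062, 42963005.55154355),
    "Résidentiel": (19921308, 2487924496.4517083),
    "Religieux": (83440, 23129203.045996834),
    "Industriel": (477078, 513365477.32243454),
    "Indifférencié": (22707428, 1619562691.4066281),
    "Commercial et services": (1399220, 567139570.0830743),
    "Annexe": (4878844, 271368712.85938895),
    "Agricole": (1083990, 577505751.2126371),
}

# Usages that correspond to the three ESRM20 occupancy classes.
MAIN_USAGES = ("Résidentiel", "Commercial et services", "Industriel")

main_count = sum(rows[usage][0] for usage in MAIN_USAGES)
main_area = sum(rows[usage][1] for usage in MAIN_USAGES)
total_count = sum(count for count, _ in rows.values())
unknown_share = rows["Indifférencié"][0] / total_count

print(f"Res/Com/Ind buildings: {main_count / 1e6:.1f}M")
print(f"Res/Com/Ind footprint: {main_area / 1e9:.2f}B sqm")
print(f"All buildings: {total_count / 1e6:.1f}M")
print(f"Share of Indifférencié buildings: {unknown_share:.0%}")
```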
The `Indifférencié` buildings are scattered around. They include some parts of train stations, but also large buildings that are definitely of res/com/ind type and were not classified as such. See for example here a map of a small part of Paris, with `Résidentiel` in red, `Commercial et services` in blue, `Industriel` in yellow and `Indifférencié` in black:
/cc @hcrowley