Snowflake Archives - The Proptech Cloud

How To Set Coordinate Reference Systems (CRS) In Snowflake Using Spatial Reference Identifiers

Jul 30, 2024 | Technology Application

In previous blogs, we’ve covered off what Coordinate Reference Systems (CRS) are, its scope and uses.

In this blog, we’ll cover how to set these on the Snowflake platform for geospatial referencing and analysis.

In Snowflake, you can define the Coordinate Reference System (CRS) by specifying a spatial reference identifier (SRID), which is a unique code that tells you which map or coordinate system you’re using, including its tolerance and resolution (or in other words, how precise and accurate it is).

TL;DR

If using the GEOGRAPHY data type in Snowflake, you won’t need to set the CRS, as it will be automatically set as WGS 84.

To set the CRS of a GEOMETRY data type, determine its SRID, then use the ST_SETSRID() function.

If not explicitly set, the SRID of a GEOMETRY column will be 0.

To convert from one CRS to another CRS on a GEOMETRY column, use the ST_TRANSFORM() function.

Overview

Coordinate Reference System

A Coordinate Reference System (CRS) defines how the two-dimensional, projected map in your GIS relates to real places on the earth. It encompasses:

Datum: Defines the position of the spheroid relative to the centre of the earth.
Projection: Converts the 3D surface of the earth to a 2D map.
Coordinate system: Defines how the coordinates relate to positions in the real world.

Spatial Reference System Identifier (SRID)

An SRID is a unique identifier associated with a CRS. It is a numeric value that references a specific CRS definition in a spatial database or standard, like the EPSG (European Petroleum Survey Group) codes.

Key Points

CRS is the comprehensive system that includes all the information needed to translate between coordinate systems and real-world positions.

SRID is an identifier for a specific CRS.

Example:
EPSG:4326 is a common SRID, where 4326 is the SRID that corresponds to the WGS 84 CRS (used by GPS).

What are the benefits of setting SRID in Snowflake?

Setting up an SRID in Snowflake ensures data consistency by aligning all spatial data to the same coordinate system, enhancing accuracy with precise tolerance and resolution information.

It facilitates interoperability between systems, enables advanced geospatial analysis and maintains data integrity by providing a defined coordinate framework.

This allows users to perform complex spatial queries efficiently in Snowflake.

How to set CRS and SRID in Snowflake

Snowflake provides the following data types for geospatial data:

The GEOGRAPHY data type, which models Earth as though it were a perfect sphere.
The GEOMETRY data type, which represents features in a planar (Euclidean, Cartesian) coordinate system.

The GEOGRAPHY data type follows the WGS 84 standard (spatial reference ID 4326).

The GEOMETRY data type represents features in a planar (Euclidean, Cartesian) coordinate system.

The coordinates are represented as pairs of real numbers (x, y). Currently, only 2D coordinates are supported.

The units of the X and Y are determined by the spatial reference system (SRS) associated with the GEOMETRY object. The spatial reference system is identified by the SRID number.

Unless the SRID is provided when creating the GEOMETRY object or by calling ST_SETSRID, the SRID is 0.

ST_SETSRID()

Returns a GEOMETRY object that has its SRID set to the specified value.

Use this function to change the SRID without affecting the coordinates of the object. If you also need to change the coordinates to match the new SRS, use ST_TRANSFORM instead.

ST_TRANSFORM()

Converts a GEOMETRY object from one spatial reference system SRS to another.

Use this function to change the SRID and the coordinates of the object to match the new SRS (spatial reference system).

If you just need to change the SRID without changing the coordinates (e.g. if the SRID was incorrect), use ST_SETSRID instead.

Syntax

ST_SETSRID( <geometry_expression> , <srid> )

Examples

The following example creates and returns a GEOMETRY object that uses the SRID 4326:

ALTER SESSION SET GEOMETRY_OUTPUT_FORMAT='EWKT';

SELECT ST_SETSRID(TO_GEOMETRY('POINT(13 51)'), 4326);

Syntax

ST_TRANSFORM( <geometry_expression> [ , <from_srid> ] , <to_srid> );

Examples
The following example transforms a POINT GEOMETRY object from EPSG:32633 (WGS 84 / UTM zone 33N) to EPSG:3857 (Web Mercator).

-- Set the output format to EWKT

ALTER SESSION SET GEOMETRY_OUTPUT_FORMAT='EWKT';

SELECT

ST_TRANSFORM(

ST_GEOMFROMWKT('POINT(389866.35 5819003.03)', 32633),

3857

) AS transformed_geom;

After setting the SRID on a GEOMETRY object, you can check if it has been applied correctly using the ST_SRID() function.

ST_SRID()

Returns the SRID (spatial reference system identifier) of a GEOGRAPHY or GEOMETRY object.

Currently, for any value of the GEOGRAPHY type, only SRID 4326 is supported and is returned.

Syntax

ST_SRID( <geography_or_geometry_expression> )

Examples
This shows a simple use of the ST_SRID function:

SELECT ST_SRID(ST_MAKEPOINT(37.5, 45.5)); +-----------------------------------+ | ST_SRID(ST_MAKEPOINT(37.5, 45.5)) | |-----------------------------------| | 4326 |
+———————————–+

Subscribe to our newsletter

Subscribe to receive the latest blogs and data listings direct to your inbox.

The Three Primary Methods of Real Estate Data Integration

cLearn the three primary methods of real estate data integration—geospatial relationships, title matches, and address matching—to improve accuracy, insights, and decision-making.

What’s the Difference Between GDA94 and GDA2020?

Geodetic datums, or geodetic systems, are often used by proptechs. Here is a rundown of everything you need to know about the different geodetic datums we use and reference in Australia.

Alternative Data: What Is It, Who Uses It And Why It Matters

Discover the powerful intel alternative data can offer. Learn why businesses and investors are turning to non-traditional data sources for deeper insights and smarter decisions.

What Is A Geohash And How Is It Used?

Discover what a geohash is, how it works, and its real-world applications in mapping, logistics and data analysis.

What Is A Cadastre?

Learn how cadastres capture property boundaries, ownership details, and values, and see how technology is transforming them with 3D models, GIS and digital platforms.

How to Predict Migration Patterns using Auspost Movers Statistics Data and Snowflake’s Cortex ML functions

Apr 11, 2024 | Inference and Analytics, Technology Application

Data consultancy, Data Army taps into the hidden insights of the Australia Post Movers Statistics dataset to forecast the future hotspots in Australia—predicting where people are likely to move next.

By analysing trends in mail redirection requests through the Time Series Forecasting function, we provide valuable insights with precision.

This advanced analysis is a feature of Snowflake Cortex, Snowflake’s comprehensive Artificial Intelligence and Machine Learning service, designed to empower your decisions with data-driven intelligence.

Jump ahead ↓

Overview of Data Used
Introduction to Snowflake’s Functionality & Technology Stack
Steps to Create the Forecasts Technical How-To Guide
Steps to Create the Visualisations Technical How-To Guide
Summary

Overview of Data Used

The primary dataset is Australia Post’s Movers Statistics currently available from The Proptech Cloud with a limited free trial. Updated monthly, it contains de-identified and aggregated data on moves across Australia based on mail redirection requests for the previous 5 years. For this exercise, we used the data from February 2019 to January 2024. Each entry in the data includes the postcode the household relocated from, the postcode the household relocated to, the month of relocation and the number of the people that relocated.
The secondary dataset used is the Geography Boundaries & Insights – Australia, specifically Australian Bureau of Statistics (ABS) 2021 Postcode Boundary Data (ABS_POA_2021_AUST_GDA2020) to conduct geospatial visualisations. This dataset is free from The Proptech Cloud.

Australia Post Movers Statistics Data

This dataset contains five years of de-identified, aggregated information on past moves captured by Australia Post’s Mail Redirection service.

Access Australia Post mail redirect statistics now to help you develop competitive data-driven strategies.

Access data listing

Introduction to Snowflake Functionality & Technology Stack

The Snowflake Forecasting Model is part of the Snowflake Cortex ML- Powered Functions. This model uses a Machine Learning algorithm to predict future trends from historical data.
SnowPark which is the set of libraries in Snowflake which will allow us to deploy and process the data pipeline using Python
Streamlit which is used to visualise the forecasts created using interactive Python apps. This functionality is fully integrated within the Snowflake Platform

Introduction to Snowflake’s Forecasting Model

Snowflake Cortex is a service by Snowflake which offers Machine Learning (ML) and Artificial Intelligence (AI) solutions.

This blog will focus on using the Time-Series Forecasting Model which is part of this service. The ML forecasting model used in this algorithm is Gradient Boosting (GB).

The intuition behind the GB algorithm is that the combination of multiple models that learn and improve on each other will perform better than one model. To implement this approach, the GB algorithm will firstly implement the best possible model on the dataset.

The second model will assess where the first model performs poorly and try to improve on these areas. This process continues and the models continue to learn and iterate on one another until the model iteration process no longer improves the model outcomes and therefore the optimal model combination is found and used for predictions.

The GB algorithm is very popular due to its ability to learn from itself and it performs very strongly in many different ML problems. One of the typical challenges in using the GB model is that it is usually very computationally expensive and time consuming to iterate and find the optimal model. The advantage of using the Snowflake Cortex Machine Learning Powered Forecasting Model is that it is extremely quick to compute on even very large datasets as it leverages Snowflake’s existing highly scalable infrastructure.

Technical How-To Guide

The forecasting model example shown will use Snowpark to create the data pipeline and use Streamlit for the data visualisations.

Step 1: Accessing the data

Go to the link on the listing to access the Australia Post – Movers Statistics Data.
Click the “Access Data Listing on Snowflake Marketplace” button to access the data listing on the Snowflake Marketplace
Click the ‘Get’ button and the top right of the page to access the Australian Post Movers Statistics dataset. This will then redirect to a link to either create a new Snowflake account or sign in if one already exists.
Once your Snowflake account is set up and running , the Australia post Mover Statistics dataset listing is located within Data Products and then Marketplace as shown by the link below:

5. Click on the Open button to explore the sample data within the Snowflake Environment. This will redirect to a Snowflake worksheet which will show some sample queries.

6. The full product can also be requested from the Marketplace page with the button ‘Request Full Product’ if access to the entire dataset is needed.

Step 2: Setting Up the Example Data

The forecasting model created is a multi-time-series model. The following types of variables were needed to create this:

The series variable to create multiple time series forecasting
A timestamp column
A target column which includes a quantity of interest at each timestamp

The cleaning and transformation of the dataset to prep the data for forecasted was completed by running SQL queries using Snowpark. A sample of this data is shown below:

This data is then saved as a view named migration_people_moving_to to use in the forecasting model.

Step 3: Creating the Forecasting Model

Each row in the migration_people_moving_to view corresponds to the three types of columns needed to create a multi-series forecasting model; the postcode (series column), month (timestamp column) and the number of people who moved into the postcode that month (the target column)

The code to create a forecasting model is as follows:

CREATE OR REPLACE SNOWFLAKE.ML.FORECAST forecasting_model_people_to (

INPUT_DATA => SYSTEM$REFERENCE(‘VIEW’, ‘migration_people_moving_to’),

SERIES_COLNAME => ‘to_postcode’,

TIMESTAMP_COLNAME => ‘timestamp_month’,

TARGET_COLNAME => ‘number_of_people_to_postcode’

)

This will create the forecasting model forecasting_model_people_to

Step 4: Calling and Interpreting the Forecasting Model

The model can then be used to forecast for any number of periods in the future. The model will perform better for forecasting periods that are closer training dataset, and are less reliable when it is used to forecast for periods further into the future.

The code used to forecast the number of people moving into a postcode every month for 12 months and save it to a table is shown below.

BEGIN

CALL forecasting_model_people_to!FORECAST(FORECASTING_PERIODS => 12);

LET x := SQLID;

CREATE OR REPLACE TABLE forecast_12periods_move_to AS

SELECT * FROM TABLE(RESULT_SCAN(:x));

END;

An example of the forecasting model output results are shown below.

Snowflake Marketplace - forecasting model output results

The way to interpret the above output would be to say that the number of people forecasted to move into Postcode 5096 (which covers Para Hills, Para Hills West, Gulfview Heights, Adelaide) in April 2024 is approximately 26. The lower bound and upper bound of 12.6 and 38.7 represent the prediction interval. For the April 2024 forecast into Para Hills, the model is 95% confident that the number of people who will move into postcode 5096 in April 2024 is between 12.6 and 38.7. A smaller prediction interval indicates that there is less error in the model and the estimated forecast is more likely to be accurate and vice versa.

The default prediction interval when calling a model in Snowflake is 95%. However, this can be configured when calling the model by adding a prediction interval. The code below shows how to call a model with an 80% prediction interval:

CALL forecasting_model_people_to!FORECAST(FORECASTING_PERIODS => 12, CONFIG_OBJECT => {'prediction_interval': 0.8})

Step 5: Visualising the forecasts using Snowpark and Streamlit

The 12 months results of the forecasting model were then aggregated to produce the total of number people forecasted to move into each postcode across Australia.

The data was then also joined with the Australian Postcode boundaries from the Geography Boundaries & Insights – Australia to allow for geospatial visualisations.

The visualisations were hosted using Streamlit within the Snowflake User Interface.

Streamlit is an open source python library which allows for the creation and sharing of data web applications. Using Streamlit within the Snowflake console allows for the flexibility to securely clean, transform and visualise the data in one place, without the need for any external environments.

Data Visualisation – Greater Melbourne Region

The visualisation shows the postcodes that people are moving to in the Greater Melbourne region.

The green and yellow regions show the places where high numbers of people are forecasted to move into in the next year, while the purple and blue regions show the regions that are forecasted to have a lower amount of relocation in the next year.

Interestingly, the visualisation shows that places in the outer East including Cranbourne, Clyde North and the outer west including Point Cook and Werribee South. The inner city postcodes which include suburbs such as Fitzroy, Brunswick and North Melbourne are forecasted to have much less migration in the next year.

Streamlit - Data Visualisation of migrations

Streamlit - Data Visualisation of migrations SYD

Data Visualisation – Greater Sydney Region

A similar visualisation was done in the Greater Sydney area, where a similar trend was observed.

High levels of migration are forecasted for outer-city areas including Kellyville and North Kellyville and outer-city south west south west areas including Camden and Oran Park.

Like Melbourne, there seems to be less migration forecasted for inner city suburbs including Chippendale, Ultimo and Redfern.

Steps to Create the Visualisations

The following steps were performed to create the geospatial visualisations.

Firstly, the base steps to create a Streamlit App were completed. This includes creating an app and selecting a warehouse to run the queries. This will then create a Snowpark worksheet which allows the creation of a Streamlit app using Python. The Streamlit environment also needs to be set up to allow for the ingestion of packages which requires the CREATE STREAMLIT permission.

The third-party packages were then ingested using the Packages tab at the top of the worksheet. Only packages which are supported by Snowflake are able to be ingested to ensure that the Snowflake platform remains secure. Both Matplotlib and Pydeck were ingested to create these visualisations.

The required packages were then imported to create the Streamlit visualisation

# Import python packages

import streamlit as st

from snowflake.snowpark.context import get_active_session

import json

import pydeck as pdk

import matplotlib.pyplot as plt

import matplotlib as mpl

The Snowpark package was used to connect the worksheet to the table containing the 12 month forecasting data in Snowflake. The postcode geospatial boundaries were also obtained, joined to the forecasting data and converted into a geojson format. This was achieved using the code below:

session = get_active_session()

session.sql ("""SELECT

POA_CODE_2021 as POSTCODE_NAME,

NUMBER_OF_PEOPLE,

ST_ASGEOJSON(geometry) AS geojson

FROM

forecast_12periods_move_to --forecasting model table created in Step 3

INNER JOIN ABS_POA_2021_AUST_GDA2020

ON POA_CODE_2021 = TO_POSTCODE

""").collect()

Each row in the query represents the postcode, the number of people forecasted to move into the postcode in the next year and a geojson representing to geometry of the postcode boundary. Further transformations were done on the result so that each row in the query result was transformed into a dictionary. A key aspect of this transformation was assigning RGB colour code to each postcode depending on the number of people forecasted to migrate to that postcode. A sample of the geojson format is shown below:

map_geojson = {
"type": "FeatureCollection",
"features":[
{"type": "Feature",
"properties":
  {"postcode": 2000,
'colour':(68, 1, 84)
  },
"geometry":{'geometry in geosjon format'}
},
{"type": "Feature",
"properties":
   {"postcode": 2000,
'colour':(38, 130, 142)
 },
"geometry":{'geometry in geosjon format'}
},
]

}

The base chloropeth map was then set up by assigning the centre and zoom point for the map to render.

pydeck_chloropeth = pdk.Deck(
map_style=None,
 initial_view_state=pdk.ViewState(
latitude={insert centre latitude},
 longitude={insert centre longitude},
 zoom={insert zoom level},
),

The geojson created in the step above was then added to the base map to create the chloropeth layer using the code below.


layers=[
pdk.Layer(
"GeoJsonLayer",
map_geojson, #name of the goejson created earlier
opacity=0.8,
stroked=False,
filled=True,
extruded=True,
wireframe=True,
get_fill_color='properties.colour',
get_line_color=[255, 255, 255],
)
],
)

The legend on the map was created using the colorbar function in the matplotlib library.

fig, ax = plt.subplots(figsize=(6, 1), layout='constrained')
cmap = mpl.cm.viridis
 norm = mpl.colors.Normalize(vmin=0, vmax=1)
fig.colorbar(mpl.cm.ScalarMappable(norm=norm, cmap=cmap),
cax=ax, orientation='horizontal', label='Map legend', ticks=[])

Finally, the following lines of code were used to render both the chloropeth map and the legend on the Streamlit app.

st.pydeck_chart(pydeck_chloropeth)

st.pyplot(fig)

`Summary`

In this blog, the Australia Post Movers Statistics Marketplace listing was used along with Snowflake's Cortex ML Forecasting function to forecast the postcodes within Australia that have high levels of population movement.

The Streamlit data visualisations revealed that the postcodes that the highest amount of people were forecasted to move into were predominantly located in the outer-city area. Read our following blog to understand where people are moving to.

The rundown above highlights how the Snowflake Data Platform makes it straightforward for businesses to access quality data and market-leading compute, AI and visualisations all on one neat platform.

`Australia Post Movers Statistics Data`

This dataset contains five years of de-identified, aggregated information on past moves captured by Australia Post's Mail Redirection service.

Access Australia Post mail redirect statistics now to help you develop competitive data-driven strategies.

Access data listing

All rights are reserved, and no content may be republished or reproduced without express written permission from The Proptech Cloud. All content provided is for informational purposes only. While we strive to ensure that the information provided here is both factual and accurate, we make no representations or warranties of any kind about the completeness, accuracy, reliability, suitability, or availability with respect to the blog or the information, products, services, or related graphics contained on the blog for any purpose.

`Subscribe to our newsletter`

Subscribe to receive the latest blogs and data listings direct to your inbox.

   Geohashes and Efficient Geospatial Joins in Snowflake
Sep 28, 2023 | Technology Application
Geohashes are an incredibly useful tool when it comes to spatial analysis. They serve as an encoding system that translates geographic coordinates into a short string of letters and digits, which simplifies and optimises geospatial operations.
One area where geohashes shine is in making geospatial joins more efficient. In this blog, we’ll dive into what geohashes are, and how you can leverage Snowflake’s ST_GEOHASH function to improve your geospatial joins in Snowflake.
What is a geohash?
A geohash is a hierarchical spatial data structure that subdivides space into a grid of cells, each cell having a unique string identifier. Geohashes convert a two-dimensional geographic coordinate (latitude and longitude) into this alphanumeric string. The length of the string determines the precision of the geohash; a longer string means a more precise location.
Read our blog on What is a Geohash for a detailed overview.
 
Source: Geohash Explorer
How geohashes make geospatial joins more efficient
Geospatial joins can be computationally expensive because they often require pairing each record in one dataset with every record in another to calculate distances or find overlaps. This can lead to a computational complexity of O(N*M), which is not ideal for large datasets.
Geohashes simplify this problem by converting the geospatial coordinates into strings. When you want to join based on geographic proximity, you can simply perform a string comparison, which is far less computationally expensive than a full spatial join.
Snowflake and ST_GEOHASH
Snowflake offers native support for geospatial functions, including ST_GEOHASH. Below is a simple example of how you can use this function to create a geohash in Snowflake:
-- Create a geohash for a specific latitude and longitude
SELECT ST_GEOHASH(37.7749, -122.4194, 12) AS geohash;
In this example, 37.7749 is the latitude, -122.4194 is the longitude, and 12 is the precision of the geohash.
To perform a geospatial join using geohashes, you can do the following:
-- Create two tables with geospatial data
CREATE TABLE locations1 (id INT, latitude FLOAT, longitude FLOAT);
CREATE TABLE locations2 (id INT, latitude FLOAT, longitude FLOAT);

-- Populate tables (this is just a representation)
-- ...

-- Add a geohash column to both tables
ALTER TABLE locations1 ADD COLUMN geohash STRING;
ALTER TABLE locations2 ADD COLUMN geohash STRING;

-- Update the geohash columns using ST_GEOHASH
UPDATE locations1 SET geohash = ST_GEOHASH(latitude, longitude, 12);
UPDATE locations2 SET geohash = ST_GEOHASH(latitude, longitude, 12);

-- Perform the join using the geohash
SELECT a.*, b.*
FROM locations1 a, locations2 b
WHERE a.geohash = b.geohash;
 
Geohash – Streamlining geospatial joins
Geohashes offer a streamlined way to perform geospatial joins, drastically reducing the computational resources required. With native functions like ST_GEOHASH in Snowflake, it’s easier than ever to incorporate geohashes into your geospatial workflows. By leveraging the power of geohashes, you can perform complex geospatial analyses more efficiently, saving both time and money.
Read more from The Proptech Cloud
The Three Primary Methods of Real Estate Data Integration
cLearn the three primary methods of real estate data integration—geospatial relationships, title matches, and address matching—to improve accuracy, insights, and decision-making.
Read more
What’s the Difference Between GDA94 and GDA2020?
Geodetic datums, or geodetic systems, are often used by proptechs. Here is a rundown of everything you need to know about the different geodetic datums we use and reference in Australia.
Read more
Alternative Data: What Is It, Who Uses It And Why It Matters
Discover the powerful intel alternative data can offer. Learn why businesses and investors are turning to non-traditional data sources for deeper insights and smarter decisions.
Read more
What Is A Geohash And How Is It Used?
Discover what a geohash is, how it works, and its real-world applications in mapping, logistics and data analysis.
Read more
What Is A Cadastre?
Learn how cadastres capture property boundaries, ownership details, and values, and see how technology is transforming them with 3D models, GIS and digital platforms.
Read more

TL;DR

Overview

Coordinate Reference System

Spatial Reference System Identifier (SRID)

Key Points

What are the benefits of setting SRID in Snowflake?

How to set CRS and SRID in Snowflake

ST_SETSRID()

ST_TRANSFORM()

Syntax

Syntax

ST_SRID()

Syntax

Subscribe to our newsletter

Read more from The Proptech Cloud

Table of Contents

Overview of Data Used

Australia Post Movers Statistics Data

Introduction to Snowflake Functionality & Technology Stack

Introduction to Snowflake’s Forecasting Model

Technical How-To Guide

Step 1: Accessing the data

Step 2: Setting Up the Example Data

Step 3: Creating the Forecasting Model

Step 4: Calling and Interpreting the Forecasting Model

Step 5: Visualising the forecasts using Snowpark and Streamlit

Data Visualisation – Greater Melbourne Region

Data Visualisation – Greater Sydney Region

Steps to Create the Visualisations

Summary

Australia Post Movers Statistics Data

Subscribe to our newsletter

Read more from The Proptech Cloud

What is a geohash?

How geohashes make geospatial joins more efficient

Snowflake and ST_GEOHASH

Geohash – Streamlining geospatial joins

Read more from The Proptech Cloud

Recent Posts

Recent Comments

Data Listings

`Summary`

`Australia Post Movers Statistics Data`

`Subscribe to our newsletter`

`Read more from The Proptech Cloud`

`Recent Posts`

`Recent Comments`

`Data Listings`