Session 3 - Data storage and management

Systems and the Web Evolve

From structured to messy

Tables of transactions still exist and still matter
But now we also store text, images, clickstreams, and maps
This is the variety dimension of big data from lecture 1

"Big Data consists of extensive datasets that require a scalable architecture for efficient storage, manipulation, and analysis because of data volume, variety, velocity, and/or variability." (NIST, 2015)

{
  "type": "node",
  "id": 1234567,
  "lat": 1.2136, "lon": -77.2811,
  "tags": {
    "amenity": "school",
    "name": "School 1"
  }
}

A semi-structured OpenStreetMap node. No fixed table, just nested keys and values.
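One way to see what "no fixed table" means in practice is to load the node above and flatten it into a single row. This is a minimal sketch using only Python's standard `json` module. The helper name `flatten_node` and the `tag_` prefix are illustrative choices, not part of OpenStreetMap or of the lecture material.

```python
import json

# The node from the slide, as a JSON string.
raw = """
{
  "type": "node",
  "id": 1234567,
  "lat": 1.2136, "lon": -77.2811,
  "tags": {
    "amenity": "school",
    "name": "School 1"
  }
}
"""


def flatten_node(node):
    """Turn one nested node into a flat dict (hypothetical helper)."""
    row = {"id": node["id"], "lat": node["lat"], "lon": node["lon"]}
    # Tags are optional and differ from node to node, so each tag
    # becomes its own prefixed column instead of a fixed schema.
    for key, value in node.get("tags", {}).items():
        row[f"tag_{key}"] = value
    return row


node = json.loads(raw)
row = flatten_node(node)
print(row)
```

Two nodes with different tags would produce rows with different columns, which is exactly why such data does not fit a fixed table without extra work.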

id_household	order	age
A-1	1	40
A-1	2	12
B-3	1	29

id_household	order	activity
A-1	1	1
A-1	2	2

id_household	order	age
A-1	1	40
A-1	2	12

raw code	readable
P6040	age
P3271	gender
P6240	activity
FEX_C18	factor_expansion
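The mapping above can be applied as a simple dictionary-based rename. This is a minimal sketch: the function name `harmonise` and the sample record values are made up for illustration, and only the four code-to-name pairs come from the table.

```python
# Raw survey codes and their readable names, taken from the table above.
RENAME = {
    "P6040": "age",
    "P3271": "gender",
    "P6240": "activity",
    "FEX_C18": "factor_expansion",
}


def harmonise(record, mapping=RENAME):
    """Return a copy of record with raw codes replaced by readable names.

    Keys that are not in the mapping are kept unchanged.
    """
    return {mapping.get(key, key): value for key, value in record.items()}


# Hypothetical record: the values are invented for this example.
raw_record = {
    "id_household": "A-1",
    "P6040": 40,
    "P3271": 1,
    "P6240": 1,
    "FEX_C18": 125.5,
}

clean_record = harmonise(raw_record)
print(clean_record)
```

Keeping the mapping in one place means the same rename can be reused across files that share the raw codes.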

Data storage and management

Last Time

The Need to Store

Systems and the Web Evolve

Databases

Relational Databases

Non-Relational Databases

Read vs Write

Big Data Storage

Big Data Storage History

ETL

Partitioning

Harmonisation

Lakehouse

Conclusions

This Week's Notebooks

Many Thanks!