quaintitative

I write about my quantitative explorations in visualisation, data science, machine and deep learning here, as well as other random musings.

For more about me and my other interests, visit playgrd or socials below


Categories
Subscribe

Setting Up a Data Lab Environment - Part 5 - Databases

When we do stuff in Jupyter notebooks, we could save the output in a range of local files, from CSV, to JSON, to HDF5. But there might be instances where it might make sense to save things to a database, be it a SQL database like Postgres, or a NoSQL database like MongoDB.

Usually when we set-up a database on the server, we would have to have it running at some host and port, and then make a connection to the database.

So for MongoDB, we would usually first install and run MongoDB, then install a library like pymongo, and then make a connection to the host and port of the MongoDB instance.

With Docker, it’s similar, but slightly easier. Having a database setup within Docker is as easy as pulling an image. Continuing from where we left off on the docker-compose.yml file in the previous part of this series, we just have to add a few lines right at the end to have access to MongoDB and Postgres databases within the Docker container.

version: '3'
services:
    jupyterone:
    build: docker/jupyter
    ports:
        - "8888:8888"
    volumes:
        - .:/home/jovyan/work
    env_file:
        - config/jupyter.env

# The lines below are the new ones!
    this_mongo:
    image: mongo
    volumes:
        - mongo_data:/data/db
    this_postgres:
    image: postgres
    volumes:
        - postgres_data:/var/lib/postgresql/data
volumes:
    postgres_data:
    mongo_data:

And connecting to it within the Docker container is super easy. Here’s one to make MongoDB play nice with Pandas. We just have to connect it to this_mongo, which is what we named it in the docker-compose.yml file.

from pymongo import MongoClient

def get_mongo_database(db_name, host='this_mongo'):
    conn = MongoClient('this_mongo')
    return conn[db_name]

def dataframe_to_mongo(df, db_name, collection):
    db = get_mongo_database(db_name)
    # 'records' means that it will be saved as an array of objects
    entry = df.to_dict('records')
    db[collection].insert(entry)

def mongo_to_dataframe(db_name, collection, query={}):
    db = get_mongo_database(db_name)
    cursor = db[collection].find(query)
    df =  pd.DataFrame(list(cursor))
    # Remove the mongo id
    if no_id: 
        del df['_id']

    return df

And here’s how to connect to the Postgres database.

import psycopg2 as pg2
import psycopg2.extras as pgex

conn = pg2.connect(host='this_postgres', 
                    user='postgres',
                    database='postgres')

cur = conn.cursor(cursor_factory=pgex.RealDictCursor)

cur.execute("""
BEGIN;
CREATE TABLE jupyter_test(
    _id INTEGER,
    name TEXT,
    list DOUBLE PRECISION[],
    vector BYTEA
);
COMMIT;
""")

cur.execute("""
BEGIN;
INSERT INTO jupyter_test VALUES(1, 'Test1', '{1,2,3,4,5}');
INSERT INTO jupyter_test VALUES(2, 'Test2', '{2,4,6,8,10}');
COMMIT;
""")

cur.execute("""
SELECT * FROM jupyter_test;
""")

result_raw = cur.fetchall()

print(result_raw[0]['name'])

Pretty simple right?


Articles

Comparing Prompts for Different Large Language Models (Other than ChatGPT)
AI and UIs
Listing NFTs
Extracting and Processing Wikidata datasets
Extracting and Processing Google Trends data
Extracting and Processing Reddit datasets from PushShift
Extracting and Processing GDELT GKG datasets from BigQuery
Some notes relating to Machine Learning
Some notes relating to Python
Using CCapture.js library with p5.js and three.js
Introduction to PoseNet with three.js
Topic Modelling
Three.js Series - Manipulating vertices in three.js
Three.js Series - Music and three.js
Three.js Series - Simple primer on three.js
HTML Scraping 101
(Almost) The Simplest Server Ever
Tweening in p5.js
Logistic Regression Classification in plain ole Javascript
Introduction to Machine Learning Right Inside the Browser
Nature and Math - Particle Swarm Optimisation
Growing a network garden in D3
Data Analytics with Blender
The Nature of Code Ported to Three.js
Primer on Generative Art in Blender
How normal are you? Checking distributional assumptions.
Monte Carlo Simulation of Value at Risk in Python
Measuring Expected Shortfall in Python
Style Transfer X Generative Art
Measuring Market Risk in Python
Simple charts | crossfilter.js and dc.js
d3.js vs. p5.js for visualisation
Portfolio Optimisation with Tensorflow and D3 Dashboard
Setting Up a Data Lab Environment - Part 6
Setting Up a Data Lab Environment - Part 5
Setting Up a Data Lab Environment - Part 4
Setting Up a Data Lab Environment - Part 3
Setting Up a Data Lab Environment - Part 2
Setting Up a Data Lab Environment - Part 1
Generating a Strange Attractor in three.js
(Almost) All the Most Common Machine Learning Algorithms in Javascript
3 Days of Hand Coding Visualisations - Day 3
3 Days of Hand Coding Visualisations - Day 2
3 Days of Hand Coding Visualisations - Day 1
3 Days of Hand Coding Visualisations - Introduction