Notion - Neo4j Steam Games Graph Project Plan

Miscellaneous Stuff (To be sorted into plan)

… all taken care of! (now)

🎯 Project Overview

Build a simple web application to explore a Neo4j graph of Steam games through predefined graph queries. The app will demonstrate graph database concepts with an intuitive UI and at least one visual graph representation.

📋 Phase 1: Fix the Scope and Use-Case

Goal

Define what your app actually does in one sentence and avoid feature creep.

Checklist

[x] Decide on the core user story (e.g. "Explore Steam games, publishers, tags, and player patterns using predefined queries.")
- Done → Steam games dataset
- Project Repository:
  - https://github.com/JacobFaller/Neo4j_Project_SteamGames
  - NOTE: The dataset file had to be split into two files because of GitHub's upload size limit.
    - This was done locally with a short PowerShell script:
```
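$inputFile     = "SteamGames.csv"
$linesPerFile  = 60000                             # data rows per output file
$header        = Get-Content $inputFile -First 1   # header row, repeated in every part
$lineNumber    = 0
$fileIndex     = 1
$outputFile    = ""

# Note: this splits on physical lines, so it assumes no quoted field contains a line break.
Get-Content $inputFile | Select-Object -Skip 1 | ForEach-Object {
    # Start a new part file (header first) whenever the row counter is at zero
    if ($lineNumber -eq 0) {
        $outputFile = "SteamGames_part_$fileIndex.csv"
        $header | Out-File $outputFile -Encoding UTF8
        $fileIndex++
    }

    # Append the current data row to the active part file
    $_ | Out-File $outputFile -Append -Encoding UTF8
    $lineNumber++

    # Reset the counter once the part is full, so the next row opens a new file
    if ($lineNumber -ge $linesPerFile) {
        $lineNumber = 0
    }
}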
```
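One caveat with splitting on physical lines is that a quoted CSV field containing a line break would be cut in half. Whether the Steam dataset has such fields was not checked here. As a rough alternative sketch (not the method actually used), the same split could be done in Python with the standard `csv` module, which parses whole records. The function name `split_csv` and the `prefix` parameter are made up for illustration.

```python
import csv
from pathlib import Path


def split_csv(input_path, rows_per_file=60000, prefix="SteamGames_part_"):
    """Split a CSV into numbered parts, repeating the header in each part.

    Records are parsed with the csv module, so quoted fields that contain
    line breaks stay intact. Returns the list of part paths written.
    """
    input_path = Path(input_path)
    parts = []
    out = None
    writer = None
    with input_path.open(newline="", encoding="utf-8") as src:
        reader = csv.reader(src)
        header = next(reader, None)
        if header is None:  # empty input file: nothing to split
            return parts
        for index, row in enumerate(reader):
            # Open a new part file every `rows_per_file` records
            if index % rows_per_file == 0:
                if out is not None:
                    out.close()
                part_path = input_path.with_name(f"{prefix}{len(parts) + 1}.csv")
                out = part_path.open("w", newline="", encoding="utf-8")
                writer = csv.writer(out)
                writer.writerow(header)
                parts.append(part_path)
            writer.writerow(row)
    if out is not None:
        out.close()
    return parts
```

Calling `split_csv("SteamGames.csv")` would write `SteamGames_part_1.csv`, `SteamGames_part_2.csv`, and so on next to the input file.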
- I then reviewed the dataset more closely to start thinking about which data to select and import, and quickly learned two things. Keep in mind that this was a review of the first half of the dataset only, since I had split it into two parts of roughly equal size.
  1. The dataset was huge! The (already split!) file took about 3 minutes just to open in Excel, and any change (like adding filters) took another 2-5 minutes (and I have a pretty good processor).
  2. A lot of the data is not relevant for a graph database. Columns such as the number of positive or negative ratings don't add much value: they are cumulative figures that contribute little in the way of relationships, which defeats the purpose of a graph DB, and it also means that no ‘hidden’ knowledge can be derived from them and their connections to other nodes (if they had any). The good news was that this helped resolve the first issue (too much data), since irrelevant data could simply be deleted. This made my life significantly easier, as each deleted column decreased the dataset size and implicitly improved the performance not only of the data cleaning process but also, presumably, of the future Neo4j instance. Further details are described below.
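Deleting columns does not strictly require opening the file in Excel. As a minimal sketch, assuming the CSV header names match the column names used in this plan, the unwanted columns could be dropped by streaming the file row by row in Python. `prune_columns` and its parameters are illustrative names, not part of the project.

```python
import csv


def prune_columns(src_path, dst_path, keep):
    """Copy src_path to dst_path, keeping only the columns named in `keep`.

    Rows are streamed one at a time, so the whole file never has to be
    loaded at once. Returns the number of data rows written.
    """
    with open(src_path, newline="", encoding="utf-8") as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)
        # Fail early if a requested column is not in the header
        missing = [c for c in keep if c not in (reader.fieldnames or [])]
        if missing:
            raise KeyError(f"columns not found in header: {missing}")
        # extrasaction="ignore" silently drops every column not listed in `keep`
        writer = csv.DictWriter(dst, fieldnames=keep, extrasaction="ignore")
        writer.writeheader()
        count = 0
        for row in reader:
            writer.writerow(row)
            count += 1
    return count
```

For example, `prune_columns("SteamGames_part_1.csv", "SteamGames_part_1_clean.csv", ["Name", "AppID", "Price"])` would keep just those three columns, in that order (the output file name is hypothetical).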
- Defined the following Columns (to keep):
  - Name (Primary Entity), AppID, Release date, Price, Discount, Rating Score (Wilson score interval lower bound), Achievements, Recommendations, Supported languages, Windows, Mac, Linux, Developers, Publishers, Categories, Genres, Tags
    - NOTE: Several other columns were deleted to refine the dataset, both because they added no value for a graph DB and to improve performance (the dataset was quite large initially).
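For reference, the Rating Score column is described as a Wilson score interval lower bound. How the dataset computes or scales it is not documented here, so the following is only a sketch of the conventional formula, assuming it is derived from positive and negative rating counts at a 95% confidence level (z = 1.96). The function name is illustrative.

```python
import math


def wilson_lower_bound(positive, negative, z=1.96):
    """Lower bound of the Wilson score interval for the share of positive ratings.

    `z` is the normal quantile for the chosen confidence level (1.96 ~ 95%).
    Returns a value between 0 and 1; 0.0 when there are no ratings at all.
    """
    n = positive + negative
    if n == 0:
        return 0.0
    p = positive / n  # observed share of positive ratings
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - margin) / (1 + z * z / n)
```

The point of the lower bound is that a game with few ratings is scored more cautiously than one with the same share of positive ratings over many more votes.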
- Defined the following Graph DB Structure:

| Dataset column | Role in graph |
| --- | --- |
| Name (Primary Entity) | Game Node Title |
| AppID | Game Node Attribute |
| Release date | Game Node Attribute |
| Price | Game Node Attribute |
| Discount | Game Node Attribute |
| Rating Score (Wilson score interval lower bound) | Game Node Attribute |
| Achievements | Game Node Attribute |
| Recommendations | Game Node Attribute |
| Supported languages | SUPPORTS LANGUAGE |
| Windows | SUPPORTS PLATFORM |
| Mac | SUPPORTS PLATFORM |
| Linux | SUPPORTS PLATFORM |
| Developers | DEVELOPED BY |
| Publishers | PUBLISHED BY |
| Categories | HAS CATEGORY |
| Genres | HAS GENRE |
| Tags | HAS TAG |
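To illustrate how the structure above could translate one CSV row into graph elements, here is a rough Python sketch. Several things in it are assumptions rather than facts about the dataset: multi-valued cells are taken to be plain comma-separated strings, the platform columns are taken to hold `True`/`False`, the target node labels (`Language`, `Developer`, ...) are invented, and the relationship names are written with underscores because Neo4j relationship types are conventionally written that way.

```python
# Column -> (relationship type, target node label).
# Relationship names follow the plan; node labels are assumptions.
MULTI_VALUE_COLUMNS = {
    "Supported languages": ("SUPPORTS_LANGUAGE", "Language"),
    "Developers": ("DEVELOPED_BY", "Developer"),
    "Publishers": ("PUBLISHED_BY", "Publisher"),
    "Categories": ("HAS_CATEGORY", "Category"),
    "Genres": ("HAS_GENRE", "Genre"),
    "Tags": ("HAS_TAG", "Tag"),
}
PLATFORM_COLUMNS = ["Windows", "Mac", "Linux"]


def row_to_graph(row, separator=","):
    """Turn one CSV row (a dict) into Game properties plus relationship tuples.

    Returns (game_properties, relationships), where each relationship is
    (relationship type, target label, target value).
    """
    rel_columns = set(MULTI_VALUE_COLUMNS) | set(PLATFORM_COLUMNS)
    # Every column that is not modelled as a relationship stays on the Game node
    game = {k: v for k, v in row.items() if k not in rel_columns}

    relationships = []
    for column, (rel_type, label) in MULTI_VALUE_COLUMNS.items():
        # Assumed format: "Action,Indie"; empty entries are skipped
        for value in (row.get(column) or "").split(separator):
            value = value.strip()
            if value:
                relationships.append((rel_type, label, value))

    for platform in PLATFORM_COLUMNS:
        # Assumed format: the cell contains "True" when the platform is supported
        if str(row.get(platform, "")).strip().lower() == "true":
            relationships.append(("SUPPORTS_PLATFORM", "Platform", platform))
    return game, relationships
```

Each relationship tuple could then be fed into a parameterised `MERGE` statement during import. If the real cells use another format (for example bracketed lists), the splitting step would need to be adapted.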

🗄️ Phase 2: Set Up Neo4j + Load the Steam Games Dataset

Goal

Have a running Neo4j instance with the Steam games graph.

Checklist

[x] Select Neo4j DB Technology
- Done → Neo4j graph database deployed as Neo4j AuraDB (free cloud version)
- Create DB instance:
  - Neo4j Aura Instance Overview: https://console-preview.neo4j.io/projects/3f4c43f4-5dc2-4a4b-81fc-17caf1ed4598/instances
    - Instance Connection URI: neo4j+s://71308a94.databases.neo4j.io
    - Query API URL: https://71308a94.databases.neo4j.io/db/Steam_Games_DB/query/v2
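
As a quick way to check the instance from code, a request against the Query API URL above could be built with the Python standard library alone. This is a sketch under assumptions: the user name and password are placeholders, the `Game` label in the usage example is hypothetical until the import is done, and the request body follows the Query API's `statement`/`parameters` JSON format with HTTP Basic authentication.

```python
import base64
import json
import urllib.request

QUERY_API_URL = "https://71308a94.databases.neo4j.io/db/Steam_Games_DB/query/v2"


def build_query_request(statement, parameters=None, user="neo4j", password="<password>"):
    """Build (but do not send) a POST request for the Neo4j Query API.

    `user` and `password` are placeholders; real credentials should come
    from a secret store or environment variable, not from source code.
    """
    body = json.dumps({"statement": statement, "parameters": parameters or {}})
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        QUERY_API_URL,
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


# Usage (sends a real network request, so it is left commented out):
# request = build_query_request("MATCH (g:Game) RETURN count(g) AS games")
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Keeping the request construction separate from the actual call makes it easy to inspect the payload before any query is sent to the instance.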