Miscellaneous Stuff (To be sorted into plan)
- … all taken care of! (now)
Project Overview
Build a simple web application to explore a Neo4j graph of Steam games through predefined graph queries. The app will demonstrate graph database concepts with an intuitive UI and at least one visual graph representation.
Phase 1: Fix the Scope and Use Case
Goal
Define what your app actually does in one sentence and avoid feature creep.
Checklist
- [x] Decide on the core user story (e.g. "Explore Steam games, publishers, tags, and player patterns using predefined queries.")
- Done: Steam games dataset
- Project Repository:
- https://github.com/JacobFaller/Neo4j_Project_SteamGames
- NOTE: had to split the dataset file into two files due to the upload size constraints of GitHub
- I used a short PowerShell script to do this locally:
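
      # Split SteamGames.csv into parts of 60,000 data rows each,
      # repeating the header line at the top of every part.
      $inputFile = "SteamGames.csv"
      $linesPerFile = 60000
      $header = Get-Content $inputFile -First 1
      $lineNumber = 0
      $fileIndex = 1
      $outputFile = ""
      Get-Content $inputFile | Select-Object -Skip 1 | ForEach-Object {
          # Start a new part file (header first) whenever the row counter is at 0
          if ($lineNumber -eq 0) {
              $outputFile = "SteamGames_part_$fileIndex.csv"
              $header | Out-File $outputFile -Encoding UTF8
              $fileIndex++
          }
          $_ | Out-File $outputFile -Append -Encoding UTF8
          $lineNumber++
          # Reset the counter once the current part is full
          if ($lineNumber -ge $linesPerFile) {
              $lineNumber = 0
          }
      }
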
- I then reviewed the dataset more closely to start thinking about which data to select and import, and quickly learned two things. Keep in mind that this was a review of only the first half of the dataset, since I had split it into two parts of roughly equal size.
- The dataset was huge! Even the already split file took about 3 minutes just to open in Excel, and any change (like adding filters) took another 2-5 minutes, despite a fairly fast processor.
- A lot of the data is not relevant for a graph database. Columns like the number of positive or negative ratings are cumulative values that contribute little in the way of relationships, which defeats the purpose of a graph DB: no “hidden” knowledge can be derived from them or from their connections to other nodes (if they had any). The good news is that this helped resolve the first issue (too much data), since irrelevant columns could simply be deleted. Each deleted column shrank the dataset, which sped up the data cleaning and will presumably also benefit the future Neo4j instance. Further details are described below.
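For reference, raw positive and negative counts can be condensed into a single score such as the Wilson score interval lower bound, which is how the retained Rating Score column is described. The following is a minimal sketch of that formula, assuming a 95% confidence level (z = 1.96) and that the score is derived from positive and negative rating counts. The dataset's actual parameters are not stated in these notes, and the function name is hypothetical.

```python
import math


def wilson_lower_bound(positive: int, negative: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for the share of positive ratings.

    z = 1.96 corresponds to a 95% confidence level (an assumption here).
    """
    n = positive + negative
    if n == 0:
        return 0.0  # no ratings at all: treat the lower bound as 0
    p_hat = positive / n
    denominator = 1 + z * z / n
    centre = p_hat + z * z / (2 * n)
    margin = z * math.sqrt((p_hat * (1 - p_hat) + z * z / (4 * n)) / n)
    return (centre - margin) / denominator
```

With the same 90% share of positive ratings, a game with 1,000 ratings gets a higher score than one with 100 ratings, because the larger sample narrows the interval.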
- Defined the following columns (to keep):
- Name (Primary Entity), AppID, Release date, Price, Discount, Rating Score (Wilson score interval lower bound), Achievements, Recommendations, Supported languages, Windows, Mac, Linux, Developers, Publishers, Categories, Genres, Tags
- NOTE: Deleted several other columns to refine the dataset, because they did not add any value (for a graph db) and to improve performance (since the dataset was quite large initially)
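The column selection above could also be scripted instead of done by hand in Excel. This is only a sketch: the header names in `KEEP_COLUMNS` are assumed to match the CSV headers exactly (in particular, "Rating Score" may be named differently in the file), and `keep_columns` is a hypothetical helper, not part of the project repository.

```python
import csv

# Assumed header names, taken from the list of columns to keep
KEEP_COLUMNS = [
    "Name", "AppID", "Release date", "Price", "Discount", "Rating Score",
    "Achievements", "Recommendations", "Supported languages",
    "Windows", "Mac", "Linux",
    "Developers", "Publishers", "Categories", "Genres", "Tags",
]


def keep_columns(src, dst, columns=KEEP_COLUMNS):
    """Copy CSV rows from the open file src to dst, keeping only `columns`."""
    reader = csv.DictReader(src)
    missing = [c for c in columns if c not in (reader.fieldnames or [])]
    if missing:
        raise KeyError(f"columns not found in input: {missing}")
    # extrasaction="ignore" silently drops every column not listed
    writer = csv.DictWriter(dst, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)


# Usage (file names are examples):
# with open("SteamGames_part_1.csv", newline="", encoding="utf-8") as src, \
#      open("SteamGames_part_1_clean.csv", "w", newline="", encoding="utf-8") as dst:
#     keep_columns(src, dst)
```

Because the file is streamed row by row, this avoids loading the whole dataset into memory.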
- Defined the following Graph DB structure:

| Column | Graph DB structure |
|---|---|
| Name (Primary Entity) | Game Node Title |
| AppID | Game Node Attribute |
| Release date | Game Node Attribute |
| Price | Game Node Attribute |
| Discount | Game Node Attribute |
| Rating Score (Wilson score interval lower bound) | Game Node Attribute |
| Achievements | Game Node Attribute |
| Recommendations | Game Node Attribute |
| Supported languages | SUPPORTS LANGUAGE |
| Windows | SUPPORTS PLATFORM |
| Mac | SUPPORTS PLATFORM |
| Linux | SUPPORTS PLATFORM |
| Developers | DEVELOPED BY |
| Publishers | PUBLISHED BY |
| Categories | HAS CATEGORY |
| Genres | HAS GENRE |
| Tags | HAS TAG |
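To illustrate this structure, the sketch below turns one CSV row into a Game node (a dict of properties) plus a list of relationships. Several things here are assumptions rather than facts about the dataset: multi-valued columns are taken to be comma-separated, the platform columns are taken to hold "True"/"False", the target node labels (`Language`, `Platform`, and so on) are hypothetical, and the relationship names are written with underscores because relationship types are usually written that way in Cypher.

```python
# Columns stored as properties on the Game node
NODE_PROPERTIES = ["AppID", "Release date", "Price", "Discount",
                   "Rating Score", "Achievements", "Recommendations"]

# Column -> (relationship type, hypothetical label of the target node)
RELATIONSHIPS = {
    "Supported languages": ("SUPPORTS_LANGUAGE", "Language"),
    "Developers": ("DEVELOPED_BY", "Developer"),
    "Publishers": ("PUBLISHED_BY", "Publisher"),
    "Categories": ("HAS_CATEGORY", "Category"),
    "Genres": ("HAS_GENRE", "Genre"),
    "Tags": ("HAS_TAG", "Tag"),
}

PLATFORM_COLUMNS = ["Windows", "Mac", "Linux"]


def split_multi(value, sep=","):
    """Split a multi-valued cell, dropping empty entries (separator assumed)."""
    return [part.strip() for part in value.split(sep) if part.strip()]


def row_to_graph(row):
    """Return (game_properties, edges) for one CSV row given as a dict.

    Each edge is a tuple (relationship_type, target_label, target_name).
    """
    game = {"name": row["Name"]}
    game.update({key: row[key] for key in NODE_PROPERTIES})
    edges = []
    for column, (rel_type, label) in RELATIONSHIPS.items():
        for target in split_multi(row.get(column, "")):
            edges.append((rel_type, label, target))
    # A platform becomes a relationship only when its flag is "True"
    for platform in PLATFORM_COLUMNS:
        if row.get(platform, "").strip().lower() == "true":
            edges.append(("SUPPORTS_PLATFORM", "Platform", platform))
    return game, edges
```

Keeping the mapping in plain dictionaries means that a change to the graph structure only touches `RELATIONSHIPS` or `NODE_PROPERTIES`, not the conversion logic.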
Phase 2: Set Up Neo4j + Load the Steam Games Dataset
Goal
Have a running Neo4j instance with the Steam games graph.
Checklist
- [x] Select Neo4j DB Technology
- Done: Neo4j graph database deployed as Neo4j AuraDB (free cloud version)
- Create DB instance;