Data Models
By defining some data models, we ease the process of converting Reddit data to Cypher (Neo4j's query language) code.
Nodes
Nodes are designed to do three things:
1. Get data from Reddit API and store it (Extract)
2. Adjust the data to be compatible with Neo4j (Transform)
3. Generate Cypher code from adjusted data (Load-ish)
In other words, Nodes are small ETL helpers that simplify a relatively complex process.
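The three steps can be illustrated with a minimal sketch. `SketchNode` is a hypothetical class written for this example, not the library's actual Node implementation, and the raw data is passed in directly instead of being fetched from the Reddit API.

```python
import json


class SketchNode:
    """Hypothetical node used for illustration; not reddit_detective's real class."""

    label = "Subreddit"

    def __init__(self, raw):
        # Extract: a real node would fetch this from the Reddit API;
        # here the raw data is handed in by the caller.
        self.raw = raw

    def properties(self):
        # Transform: keep only simple values that can be stored as Neo4j properties
        return {
            key: value
            for key, value in self.raw.items()
            if isinstance(value, (str, int, float, bool))
        }

    def merge_code(self):
        # Load-ish: render the properties as a Cypher MERGE statement
        props = ", ".join(
            f"{key}: {json.dumps(value)}" for key, value in self.properties().items()
        )
        return f"MERGE (:{self.label} {{{props}}})"


node = SketchNode({"id": "2r8ot", "name": "learnpython", "moderators": ["a", "b"]})
print(node.merge_code())
# MERGE (:Subreddit {id: "2r8ot", name: "learnpython"})
```

The list-valued `moderators` entry is dropped in the transform step because this sketch only keeps scalar values.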
Degrees
A degree is a string that sets how far to go when following the connections of a node.
The node types Subreddit, Submission, and Redditor have attributes listing their available degrees.
For instance, a Subreddit node has [submissions, comments, replies] as its available degrees.
- Submissions: Get the subreddit node and connect it with the submissions under it
- Comments: Do what the previous degree does, plus connect submissions to the comments under them
  - This gets all the comments, not just the top comments in comment trees.
- Replies: Do what the previous degree does, plus connect comments to each other by finding which one is a reply to the other
To see how degrees are implemented, see Relationships.
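Because each degree builds on the previous one, picking a degree amounts to picking a prefix of an ordered list. The sketch below shows that idea for a Subreddit; `SUBREDDIT_DEGREES` and `steps_for` are hypothetical names made up for this example and are not part of the library.

```python
# Ordered degrees for a Subreddit node, as listed above
SUBREDDIT_DEGREES = ["submissions", "comments", "replies"]


def steps_for(degree):
    """Return every degree that has to be processed to reach `degree`."""
    if degree not in SUBREDDIT_DEGREES:
        raise ValueError(f"Unknown degree: {degree!r}")
    # Each degree includes everything the earlier degrees do
    return SUBREDDIT_DEGREES[: SUBREDDIT_DEGREES.index(degree) + 1]


print(steps_for("comments"))
# ['submissions', 'comments']
```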
Node Types
Major types (Subreddit, Submission, Comment, Redditor) are each represented by a class.
Minor types (e.g. Redditor has Employee and Suspended) are listed as strings in the available_types attribute of each major type.
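As a rough sketch of the major/minor split, a major type can be a class carrying an `available_types` list, with the minor types that apply to one object read off its data. `SketchRedditor` and `active_types` are hypothetical, and the sketch assumes each minor type corresponds to a boolean property of the same name, which may differ from how the library actually does it.

```python
class SketchRedditor:
    """Hypothetical stand-in for a major type; not the library's real class."""

    # Minor types, stored as plain strings
    available_types = ["employee", "suspended"]

    def __init__(self, properties):
        self.properties = properties

    def active_types(self):
        # Assumption: a minor type applies when the boolean property
        # with the same name is true
        return [t for t in self.available_types if self.properties.get(t)]


user = SketchRedditor({"id": "abc", "employee": True, "suspended": False})
print(user.active_types())
# ['employee']
```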
- Redditor (inherits from Node)
  - Available types: Employee, Suspended
  - Available degrees: Submissions, Comments, Replies
  - Properties: "id", "username", "created_utc", "has_verified_email", "employee", "suspended"
    - If suspended: "id", "username", "employee", "suspended"
- Submission (inherits from a helper class SubOrComment, which inherits from Node)
  - Available types: Archived, Stickied, Locked, Over18
  - Available degrees: Comments, Replies
  - Properties: "id", "created_utc", "title", "text", "archived", "stickied", "locked", "over18"
- Subreddit (inherits from Node)
  - Available types: Over18
  - Available degrees: Submissions, Comments, Replies
  - Properties: "id", "created_utc", "name", "over18", "desc"
- Comment (inherits from a helper class SubOrComment, which inherits from Node)
  - Comments do not have minor types
  - Comments do not have degrees, since there isn't a component smaller than a Comment.
    - Replies to comments are Comment objects too
  - Properties: "id", "created_utc", "text", "is_submitter", "stickied"
Code Sample
from reddit_detective.data_models import Subreddit
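
# api_ is assumed to be an already-authenticated Reddit API client (e.g. a praw.Reddit instance) created elsewhere
sub = Subreddit(api_, "learnpython", limit=100)
print(sub.merge_code())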
# Output
# MERGE (:Subreddit {id: "2r8ot", created_utc: 1254499181.0, name: "learnpython", desc: "..."})
# desc is truncated since the actual desc is too long
Relationship Types
In Neo4j, two nodes can have directed relationships connecting one to the other, allowing us to create a network.
Relationships can also have properties, e.g. a LIKES relationship might have a boolean property is_crush.
Below are the three relationship types, with the node types they connect shown in this format: (Node1 -> Node2)
- MODERATES
  - (Redditor -> Subreddit)
  - No properties
- UNDER
  - (Submission -> Subreddit)
  - (Comment -> Submission)
  - (Comment -> Comment)
  - No properties
- AUTHORED
  - (Redditor -> Submission)
  - (Redditor -> Comment)
  - No properties
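The allowed pairs above can be written as a lookup table that guards Cypher generation. This is only a sketch: `ALLOWED_PAIRS` and `relationship_code` are hypothetical names, the ids are made up, and the library's own relationship code may be structured differently. Real code that talks to Neo4j would normally pass ids as query parameters instead of formatting them into the string.

```python
# Allowed (start label, end label) pairs for each relationship type
ALLOWED_PAIRS = {
    "MODERATES": {("Redditor", "Subreddit")},
    "UNDER": {
        ("Submission", "Subreddit"),
        ("Comment", "Submission"),
        ("Comment", "Comment"),
    },
    "AUTHORED": {("Redditor", "Submission"), ("Redditor", "Comment")},
}


def relationship_code(rel, start_label, start_id, end_label, end_id):
    """Build a Cypher statement linking two existing nodes with a directed relationship."""
    if (start_label, end_label) not in ALLOWED_PAIRS.get(rel, set()):
        raise ValueError(f"{rel} cannot connect {start_label} -> {end_label}")
    return (
        f'MATCH (a:{start_label} {{id: "{start_id}"}}), '
        f'(b:{end_label} {{id: "{end_id}"}}) '
        f"MERGE (a)-[:{rel}]->(b)"
    )


print(relationship_code("UNDER", "Submission", "abc123", "Subreddit", "2r8ot"))
# MATCH (a:Submission {id: "abc123"}), (b:Subreddit {id: "2r8ot"}) MERGE (a)-[:UNDER]->(b)
```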
For detailed information about relationship types, see Relationships.