By morandi3


2011-05-13 17:51:41 8 Comments

I need to have a 5 levels hierarchy for the users registered to a website. Every user is invited by another, and I need to know all descendants for a user. And also ancestors for a user.

I have in mind 2 solution.

  1. Keeping a table with relationships this way. A closure table:

    ancestor_id  descendant_id  distance
    1            1              0
    2            2              0
    3            3              0
    4            4              0
    5            5              0
    6            6              0
    2            3              1
  1. Having this table for relationships. Keeping in a table 5 levels ancestors. A "ancestors" table:

   user_id ancestor_level1_id ancestor_level2_id ancestor_level3_id ancestor_level4_id ancestor_level5_id
   10      9                  7                  4                  3                  2
   9       7                  4                  3                  2                  1

Are these good ideas?

I know about "the adjacency list model" and "the modified preorder tree traversal algorithm", but are these good solutions for a "referral" system?

The queries that I need to perform on this tree are:

  • frequently adding a new users
  • when a user buys something, their referrers get a percentage commission
  • every user should be able to find out how many people they've referred (and how many people were referred by people who they referred....) at each level

4 comments

@Neville Kuyt 2011-05-15 19:41:13

Managing Hierarchical Data in MySQL

In general, I like the "nested set", esp. in MySQL which doesn't really have language support for hierarchical data. It's fast, but you'll need to make sure your developers read that article if ease of maintenance is a big deal. It's very flexible - which doesn't seem to matter much in your case.

It seems a good fit for your problem - in the referral model, you need to find the tree of referrers, which is fast in the nested set model; you also need to know who are the [email protected] of a given user, and the depth of their relationship; this is also fast.

@morandi3 2011-05-16 15:32:03

I don't think this is a good approach for my system, because nested set need to be updated every time when a new user signup, isn't it?!

@Ken Bloom 2011-05-15 21:32:08

Use the OQGRAPH storage engine.

You probably want to keep track of an arbitrary number of levels, rather than just 5 levels. Get one of the MySQL forks that supports the QGRAPH engine (such as MariaDB or OurDelta), and use that to store your tree. It implements the adjacency list model, but by using a special column called latch to send a command to the storage engine, telling it what kind of query to perform, you get all of the advantages of a closure table without needing to do the bookkeeping work each time someone registers for your site.

Here are the queries you'd use in OQGRAPH. See the documentation at http://openquery.com/graph-computation-engine-documentation

We're going to use origid as the referrer, and destid as the referree.

To add user 11, referred by user 10

insert into ancestors_table (origid,destid) values (10,11)

To find all users referred by user 3.

SELECT linkid FROM ancestors_table WHERE latch = 2 AND origid = 3;

To find the ancestors of user 10.

SELECT linkid FROM ancestors_table WHERE latch = 2 AND destid = 10;

To find the number of users at each level, referred by user 3:

SELECT count(linkid), weight
FROM ancestors_table
WHERE latch = 2 AND origid = 3
GROUP BY weight;

@morandi3 2011-05-16 15:39:06

@Ken I don't know if it is a good idea to keep an "arbitrary" number of levels, let's say that we will have about 100 levels. Is it a good think to keep all these levels in database?

@Ken Bloom 2011-05-16 16:33:42

@morandi3: Looking only at the technical constraints (without discussing privacy implications), it depends how you're doing it. Your ancestors table, uses storage space proportional to the maximum number of levels you're keeping track of. OQGRAPH is a storage engine intended especially for performing graph algorithms. It's designed to perform operations like Dijkstra's shortest path algorithm which are normally difficult or impossible in a SQL database, by having a special column in the table to issue a command to the storage engine. It doesn't have the same space penalty.

@morandi3 2011-05-16 16:54:03

@Ken I'm not sure I can use MariaDB or OurDelta on my server. Is it possible to use this engine just for a table from database, or I need to change storage for all tables? Which of my ideas looks fastest/reliable in your opionion?

@Ken Bloom 2011-05-16 16:58:33

@morandi3: you don't need to change storage for all your tables. A storage engine in MySQL is a plugin for creating and managing a particular kind of table. The default table type in MySQL is MyISAM, and the default distribution also allows you to create particular tables with the InnoDB storage engine (for better concurrency in transactions). OQGRAPH is just another kind of table that you can create when the need arises. It doesn't change the default, it doesn't change the storage engine used for existing tables, and it doesn't replace the default storage engines.

@Ken Bloom 2011-05-16 17:05:24

@morandi3: To know what's best, I'd need to know what kinds of queries you wanted to perform. I don't think the nested set model will work well for you since you have lots of insertions. If you're strongly considering the 5-level relationship table, it may simplify things to use a delimited string of ancestors instead of 5 separate columns.

@morandi3 2011-05-16 17:25:31

@Ken We will have queries for: adding a new user, giving commissions(percent) to the ancestors when a descendant buy something, every user can know how many descendants have on each level (for this I'm thinking to use some fields in database: count_level1, count_level2, where to keep this information. Yes, a delimited string of ancestors should be fine when I need to find out the ancestors, but I need to find also for every user the descendants.

@Ken Bloom 2011-05-16 17:47:28

@morandi3: I now have 2 answers that spell out exactly how to do each of your queries. Neither of these models are very complicated.

@Ken Bloom 2011-05-16 18:13:51

@morandi3: don't forget to accept whichever answer you've decided to actually use, by clicking on the checkmark next to the answer.

@Ken Bloom 2011-05-16 17:14:04

Delimited String of Ancestors

If you're strongly considering the 5-level relationship table, it may simplify things to use a delimited string of ancestors instead of 5 separate columns.

user_id  depth   ancestors
10       7       9,7,4,3,2,1
9        6       7,4,3,2,1
...
2        2       1
1        1       (empty string)

Here are some SQL commands you'd use with this model:

To add user 11, referred by user 10

insert into ancestors_table (user_id, depth, ancestors)
select 11, depth+1, concat(10,',',ancestors)
from ancestors_table
where user_id=10;

To find all users referred by user 3. (Note that this query can't use an index.)

select user_id
from ancestors_table
where ancestors like '%,3,%' or ancestors like '3,%' or ancestors like '%,3';

To find the ancestors of user 10. You need to break up the string in your client program. In Ruby, the code would be ancestorscolumn.split(",").map{|x| x.to_i}. There's no good way to break up the string in SQL.

select ancestors from ancestors_table where user_id=10;

To find the number of users at each level, referred by user 3:

select
   depth-(select depth from ancestors_table where user_id=3),
   count(*)
from ancestors_table
where ancestors like '%,3,%' or ancestors like '3,%' or ancestors like '%,3'
group by depth;

You can avoid SQL injection attacks in the like '%,3,%' parts of these queries by using like concat('%,', ?, ',%') instead and binding the an integer for the user number to the placeholder.

@morandi3 2011-05-16 18:17:32

@Ken "ancestors" field shouldn't be in this case "descendants"? Or for what we use "depth"? Because every user has just one ancestor at each level.

@Ken Bloom 2011-05-16 18:23:27

@morandi3: in the last query, you're looking for "nodes that have 3 as an ancestor", and grouping them by how far below that node 3 they are. The depth column keeps track of how many ancestors a certain node has, grouping them into levels in the database.

@morandi3 2011-05-16 18:34:34

@Ken Yes, but a user has just one ancestor/level

@Ken Bloom 2011-05-16 18:40:17

@morandi3: I think you're confusing parent with ancestor. User 10 has just one parent (user 9). And user 9 has one parent (user 7). User 7 also has one parent (user 4). All of these parents and parents of parents are called the "ancestors" of user 10. The depth, therefore, is the number of hops you have to go before you reach someone who wasn't referred by a friend (someone who found your site by an advertisement or a random search).

@morandi3 2011-05-16 18:43:43

@Ken Ok, now I understand, depth is the length of the chain. Hmm, but with this solution is hard to find descendants for a user.

@Ken Bloom 2011-05-16 18:45:05

@morandi: finding the descendants is the second query. (Remember, just like ancestors, descendants are children and children of children and so forth) Did you mean finding just the children?

@morandi3 2011-05-16 18:50:27

@Ken what I wanted to say is that: finding descendants for a specific level will be slow because we can't use a index for this

@Ken Bloom 2011-05-16 18:55:18

@morandi3: umm yeah. That is the downside of this model. (You could put an index on the depth field, but I doubt that would help much.) That's basically why SQL and graphs/trees don't mix, in a nutshell, and why the OQGRAPH storage engine was invented.

@morandi3 2011-05-16 19:08:51

@Ken Hmm, yes, OQGRAPH seems a good solution. Do you think that a closure table (ancestor_id descendant_id distance) or using 5 fields, one for each level will slow down things to much?

@Ken Bloom 2011-05-16 19:13:44

@morandi3: I think that using 5 fields will be very unwieldy to work with. I'll post an answer for how to do things with a closure table though, and you can decide whether that will work.

@Ken Bloom 2011-05-16 19:22:38

@morandi3: The closure table actually looks workable, if you're not worried about the space blowup that comes from listing each ancestor many times.

@morandi3 2011-05-16 19:32:12

@Ken: I'm a bit worried, because for 50.000 users we will have a table with about 300.000 rows

@Ken Bloom 2011-05-16 19:21:03

Closure Table

ancestor_id  descendant_id  distance
    1            1              0
    2            2              0
    3            3              0
    4            4              0
    5            5              0
    6            6              0
    2            3              1

To add user 10, referred by user 3. (I don't think you need to lock the table between these two insertions):

insert into ancestor_table
select ancestor_id, 10, distance+1
from ancestor_table
where descendant_id=3;

insert into ancestor_table values (10,10,0);

To find all users referred by user 3.

select descendant_id from ancestor_table where ancestor_id=3;

To count those users by depth:

select distance, count(*) from ancestor_table where ancestor_id=3 group by distance;

To find the ancestors of user 10.

select ancestor_id, distance from ancestor_table where descendant_id=10;

The drawback to this method is amount of storage space this table will take.

@morandi3 2011-05-16 19:29:35

All your solutions looks good :D It's hard to find the best. I think I will choose this for now. But also OQGRAPH storage engine seems to be a reliable solution. Thanks

Related Questions

Sponsored Content

48 Answered Questions

[SOLVED] How do I import an SQL file using the command line in MySQL?

13 Answered Questions

[SOLVED] Which MySQL data type to use for storing boolean values

47 Answered Questions

[SOLVED] How do I quickly rename a MySQL database (change schema name)?

24 Answered Questions

[SOLVED] How to reset AUTO_INCREMENT in MySQL?

23 Answered Questions

[SOLVED] How do I connect to a MySQL Database in Python?

  • 2008-12-16 21:49:09
  • Marc Lincoln
  • 1255919 View
  • 1169 Score
  • 23 Answer
  • Tags:   python mysql

39 Answered Questions

[SOLVED] Should I use the datetime or timestamp data type in MySQL?

10 Answered Questions

[SOLVED] Duplicating a MySQL table, indices, and data

  • 2010-07-19 09:53:57
  • xkcd150
  • 480205 View
  • 677 Score
  • 10 Answer
  • Tags:   mysql

15 Answered Questions

[SOLVED] How to get a list of user accounts using the command line in MySQL?

1 Answered Questions

[SOLVED] Get minimum from result with GROUP BY in MySQL

Sponsored Content