2013-03-14 13:50:03 8 Comments
I am using pandas as a DB substitute, as I have multiple databases (Oracle, MSSQL, etc.) and I am unable to translate a sequence of commands into a SQL equivalent.
I have a table loaded in a DataFrame with some columns:
YEARMONTH, CLIENTCODE, SIZE, .... etc etc
In SQL, counting the number of distinct clients per YEARMONTH would be:
SELECT count(distinct CLIENTCODE) FROM table GROUP BY YEARMONTH;
And the result would be
201301 5000
201302 13245
How can I do that in pandas?
@Vivek Ananthan 2019-10-02 14:58:40
With newer pandas versions, it is easy to get the result as a DataFrame.
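The comment above likely refers to named aggregation (available since pandas 0.25), which returns a DataFrame directly; a minimal sketch, where the column name n_clients is my own choice:

```python
import pandas as pd

df = pd.DataFrame({
    'YEARMONTH': [201301, 201301, 201302],
    'CLIENTCODE': [1, 2, 1],
})

# Named aggregation (pandas >= 0.25): the result is a DataFrame,
# not a Series, and the output column can be given a name.
out = df.groupby('YEARMONTH', as_index=False).agg(
    n_clients=('CLIENTCODE', 'nunique')
)
print(out)
```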
@Gangaraju 2019-08-01 09:38:19
I am also using nunique, but it is very helpful if you have to use an aggregate function like 'min', 'max', 'count' or 'mean', etc.

@YO and BEN_W 2018-11-23 15:16:20
Using crosstab will return more information than groupby nunique. After a little modification, it yields the result.
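The crosstab comment does not show its code; a sketch of one plausible "little modification" — counting the nonzero cells per row to recover the distinct-clients-per-month figure:

```python
import pandas as pd

df = pd.DataFrame({
    'YEARMONTH': [201301, 201301, 201301, 201302, 201302],
    'CLIENTCODE': ['A', 'A', 'B', 'B', 'C'],
})

# Each cell holds the row count for a (YEARMONTH, CLIENTCODE) pair.
ct = pd.crosstab(df['YEARMONTH'], df['CLIENTCODE'])

# Counting nonzero cells per row gives distinct clients per month,
# while ct itself retains the per-client breakdown.
distinct_per_month = ct.gt(0).sum(axis=1)
print(distinct_per_month)
```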
@Enthusiast 2017-07-02 11:16:54
Here is another, much simpler method. Let's say your DataFrame is named daat and the column name is YEARMONTH:
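The code block for this answer is missing from the extract; judging from the follow-up comment showing daat['ck.Class'].value_counts(), the method is presumably value_counts on the grouping column. Note the replies below: this counts rows per YEARMONTH, not distinct CLIENTCODEs. A sketch:

```python
import pandas as pd

daat = pd.DataFrame({
    'YEARMONTH': [201301, 201301, 201302],
    'CLIENTCODE': [1, 1, 2],
})

# Counts the number of rows per YEARMONTH -- NOT distinct clients
# (see the criticism in the comments that follow).
counts = daat['YEARMONTH'].value_counts()
print(counts)
```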
@user5569985 2017-12-18 23:38:21
I like this answer. How can I use this method if my column name has a '.' in it (e.g. 'ck.Class')? Thanks
@Enthusiast 2017-12-20 02:18:05
daat['ck.Class'].value_counts()
@Aaron Schumacher 2018-01-16 19:21:41
This does not address the question asked.
@Jason Goal 2018-10-22 20:36:02
This counts the number of observations within each group, not the number of unique values of a given column within each group.
@Corey Levinson 2019-01-23 20:03:58
This is the incorrect answer; it does not reflect the DISTINCT requirement from the question! Moreover, it does not include counts of NaN!

@Roman Kh 2014-05-05 02:59:28
Interestingly enough, very often len(unique()) is a few times (3x-15x) faster than nunique().

@user4015990 2015-12-03 00:24:05
You mean this? .CLIENTCODE.apply(lambda x: len(x.unique())), from here

@rpanai 2018-04-17 16:03:52
@3novak This doesn't seem to work with a groupby.

@3novak 2018-04-17 18:59:34
@user32185 you'd have to drop it into an apply call with a lambda. For instance, df.groupby('YEARMONTH')['CLIENTCODE'].apply(lambda x: x.unique().shape[0]).

@mlh351 2018-09-24 20:27:57
The syntax isn't completely clear; I used len(df['column'].unique()), no need for a lambda function.

@Jason Goal 2018-10-22 20:44:20
I got TypeError: object of type 'method' has no len() from Chen's comment; 3novak's worked for me.

@Dan Allan 2013-03-14 14:09:06
I believe this is what you want:
Example:
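The example code for this answer is missing from the extract; based on the approach the comments below refer to (nunique after a groupby), a sketch:

```python
import pandas as pd

df = pd.DataFrame({
    'YEARMONTH': [201301, 201301, 201301, 201302, 201302, 201302],
    'CLIENTCODE': [1, 1, 2, 1, 2, 3],
})

# nunique counts distinct values per group -- the pandas equivalent
# of SELECT count(distinct CLIENTCODE) ... GROUP BY YEARMONTH.
result = df.groupby('YEARMONTH')['CLIENTCODE'].nunique()
print(result)
```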
@ErnestScribbler 2017-10-02 08:10:41
What if I have multiple columns that I want to be unique together, like in .drop_duplicates(subset=['col1','col2'])?
@Tarun Khaneja 2018-05-03 12:08:09
How do I access this unique count, as there is no column name?
@MeHdi 2018-07-22 08:44:01
Thanks a lot. I used this style on the output of resample: df_watch_record.resample('M').user.nunique() counts the number of unique users who have watched a movie per month.
@wllbll 2018-09-10 06:03:13
and sort them with table.groupby('YEARMONTH').CLIENTCODE.nunique().sort_values(ascending=False)
@Josh Hansen 2019-05-09 04:56:57
Is it possible to retrieve the group identifier after nunique? Try as I might, I cannot find a way, since the result of this answer is a Series, not a DataFrame.

@Indrajit 2019-07-17 07:11:59
Is it possible to apply this for multiple columns? Right now in the example, only one column is selected.
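A sketch addressing this comment: selecting a list of columns before nunique yields a DataFrame with one count per column (and reset_index turns the group key back into a regular column, which also answers the Series-vs-DataFrame comment above):

```python
import pandas as pd

df = pd.DataFrame({
    'YEARMONTH': [201301, 201301, 201302],
    'CLIENTCODE': [1, 2, 1],
    'SIZE': ['S', 'M', 'S'],
})

# A list of columns gives a DataFrame with one nunique column each;
# reset_index() makes YEARMONTH an ordinary column again.
out = df.groupby('YEARMONTH')[['CLIENTCODE', 'SIZE']].nunique().reset_index()
print(out)
```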