#### [SOLVED] How are iloc and loc different?

By AZhao

Can someone explain how these two methods of slicing are different?
I've seen the docs, and I've seen these answers, but I still find myself unable to explain how the two are different. To me, they seem largely interchangeable, because they both operate at the lower levels of slicing.

For example, say we want to get the first five rows of a `DataFrame`. How is it that these two work?

``````df.loc[:5]
df.iloc[:5]
``````

Can someone present three cases where the distinction in use is clearer?

Once upon a time, I also wanted to know how these two functions differ from `df.ix[:5]`, but `ix` was removed in pandas 1.0, so I don't care anymore!

#### @JoeCondron 2015-07-23 17:17:27

`iloc` works based on integer positioning. So no matter what your row labels are, you can always, e.g., get the first row by doing

``````df.iloc[0]
``````

or the last five rows by doing

``````df.iloc[-5:]
``````

You can also use it on the columns. This retrieves the 3rd column:

``````df.iloc[:, 2]    # the : in the first position indicates all rows
``````

You can combine them to get intersections of rows and columns:

``````df.iloc[:3, :3] # The upper-left 3 X 3 entries (assuming df has 3+ rows and columns)
``````
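As a rough, self-contained illustration of the calls above (the frame and its labels are made up for this example), `iloc` gives the same answers no matter what the row labels look like:

```python
import pandas as pd

# Hypothetical frame: the row labels are deliberately unhelpful strings.
df = pd.DataFrame(
    [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]],
    index=['w', 'x', 'y', 'z'],
    columns=['a', 'b', 'c', 'd'],
)

first_row = df.iloc[0]       # Series for the row labelled 'w'
last_two = df.iloc[-2:]      # rows 'y' and 'z'
third_col = df.iloc[:, 2]    # column 'c', all rows
corner = df.iloc[:3, :3]     # upper-left 3 x 3 block

print(first_row.tolist())    # [1, 2, 3, 4]
print(third_col.tolist())    # [3, 7, 11, 15]
print(corner.shape)          # (3, 3)
```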

On the other hand, `.loc` uses named indices. Let's set up a data frame with strings as row and column labels:

``````df = pd.DataFrame(index=['a', 'b', 'c'], columns=['time', 'date', 'name'])
``````

Then we can get the first row by

``````df.loc['a']     # equivalent to df.iloc[0]
``````

and the last two rows of the `'date'` column by

``````df.loc['b':, 'date']   # equivalent to df.iloc[1:, 1]
``````

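and so on. Now, it's probably worth pointing out that the default row and column indices for a `DataFrame` are integers from 0, and in this case the labels coincide with the positions. This is why your examples look nearly interchangeable. They are not quite equivalent, though: a `loc` slice includes its end label, so `df.loc[:5]` returns six rows (labels 0 through 5), while `df.iloc[:5]` returns five. If you had a non-numeric index such as strings or datetimes, `df.loc[:5]` would raise an error.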
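A quick check of the endpoint behaviour on a default integer index may help here. This is only a sketch with a made-up ten-row frame; the point is that a label slice includes its stop value while a positional slice does not:

```python
import pandas as pd

df = pd.DataFrame({'x': range(10)})   # default index 0..9

print(len(df.iloc[:5]))   # 5 -> positions 0..4, stop excluded
print(len(df.loc[:5]))    # 6 -> labels 0..5, stop included

# With string labels, an integer slice bound is not a valid label.
df_str = pd.DataFrame({'x': range(3)}, index=['a', 'b', 'c'])
try:
    df_str.loc[:5]
    raised = False
except (TypeError, KeyError):
    raised = True
print(raised)             # True
```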

Also, you can do column retrieval just by using the data frame's `__getitem__`:

``````df['time']    # equivalent to df.loc[:, 'time']
``````

Now suppose you want to mix position and named indexing, that is, indexing using positions on rows and names on columns (to clarify, I mean selecting from our data frame, rather than creating a data frame with strings in the row index and integers in the column index). This is where `.ix` came in, before it was deprecated in pandas 0.20 and removed in 1.0:

``````df.ix[:2, 'time']    # the first two rows of the 'time' column
``````
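On pandas versions without `.ix`, the same selection can be written with either indexer by converting one side first. The sketch below uses a made-up frame with the same labels as above; `Index.get_loc` turns a label into a position, and slicing `df.index` turns positions into labels:

```python
import pandas as pd

# Hypothetical data; only the labels match the example above.
df = pd.DataFrame(
    {'time': ['09:00', '10:00', '11:00'],
     'date': ['Mon', 'Tue', 'Wed'],
     'name': ['Ann', 'Bob', 'Cy']},
    index=['a', 'b', 'c'])

# Option 1: turn the column label into a position and use iloc.
by_position = df.iloc[:2, df.columns.get_loc('time')]

# Option 2: turn the row positions into labels and use loc.
by_label = df.loc[df.index[:2], 'time']

print(by_position.tolist())           # ['09:00', '10:00']
print(by_position.equals(by_label))   # True
```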

I think it's also worth mentioning that you can pass boolean vectors to the `loc` method as well. For example:

``````b = [True, False, True]
df.loc[b]
``````

This returns the 1st and 3rd rows of `df`. It is equivalent to `df[b]` for selection, but it can also be used for assigning via boolean vectors:

``````df.loc[b, 'name'] = 'Mary', 'John'
``````
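For a version that can be run as-is, here is a small sketch with placeholder values (the original frame contains only missing values). The mask must have one entry per row, and the assigned list needs one value per selected row:

```python
import pandas as pd

# Placeholder data, not from the original answer.
df = pd.DataFrame({'name': ['x', 'y', 'z'], 'n': [1, 2, 3]},
                  index=['a', 'b', 'c'])

b = [True, False, True]
selected = df.loc[b]                    # rows 'a' and 'c'
df.loc[b, 'name'] = ['Mary', 'John']    # assign only where the mask is True

print(list(selected.index))    # ['a', 'c']
print(df['name'].tolist())     # ['Mary', 'y', 'John']
```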

#### @Alvis 2017-05-03 10:03:16

Is df.iloc[:, :] equivalent to all rows and columns?

#### @JoeCondron 2017-05-03 20:45:41

It is, as would be `df.loc[:, :]`. It can be used to re-assign the values of the entire `DataFrame` or create a view of it.

#### @Marine Galantin 2020-06-10 17:27:40

Hi, do you know why `loc` and `iloc` take their arguments in square brackets `[ ]` and not in ordinary parentheses `( )` like a normal method?

#### @Ted Petrou 2017-10-24 16:39:52

In my opinion, the accepted answer is confusing, since it uses a DataFrame with only missing values. I also do not like the term position-based for `.iloc` and instead prefer integer location, as it is much more descriptive and exactly what `.iloc` stands for. The key word is INTEGER - `.iloc` needs INTEGERS.

See my extremely detailed blog series on subset selection for more

### .ix is deprecated and ambiguous and should never be used

Because `.ix` is deprecated we will only focus on the differences between `.loc` and `.iloc`.

Before we talk about the differences, it is important to understand that DataFrames have labels that help identify each column and each index. Let's take a look at a sample DataFrame:

``````df = pd.DataFrame({'age':[30, 2, 12, 4, 32, 33, 69],
'color':['blue', 'green', 'red', 'white', 'gray', 'black', 'red'],
'food':['Steak', 'Lamb', 'Mango', 'Apple', 'Cheese', 'Melon', 'Beans'],
'height':[165, 70, 120, 80, 180, 172, 150],
'score':[4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
'state':['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']
},
index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean', 'Christina', 'Cornelia'])
``````

The labels `age`, `color`, `food`, `height`, `score` and `state` are used for the columns. The other labels, `Jane`, `Nick`, `Aaron`, `Penelope`, `Dean`, `Christina` and `Cornelia`, are used for the index.

The primary ways to select particular rows in a DataFrame are with the `.loc` and `.iloc` indexers. Each of these indexers can also be used to simultaneously select columns, but it is easier to just focus on rows for now. Also, each of the indexers uses a set of brackets that immediately follows its name to make its selections.

## .loc selects data only by labels

We will first talk about the `.loc` indexer, which only selects data by the index or column labels. In our sample DataFrame, we have provided meaningful names as values for the index. Many DataFrames will not have any meaningful names and will instead default to just the integers from 0 to n-1, where n is the length of the DataFrame.

There are three different inputs you can use for `.loc`

• A string
• A list of strings
• Slice notation using strings as the start and stop values

Selecting a single row with .loc with a string

To select a single row of data, place the index label inside of the brackets following `.loc`.

``````df.loc['Penelope']
``````

This returns the row of data as a Series

``````age           4
color     white
food      Apple
height       80
score       3.3
state        AL
Name: Penelope, dtype: object
``````

Selecting multiple rows with .loc with a list of strings

``````df.loc[['Cornelia', 'Jane', 'Dean']]
``````

This returns a DataFrame with the rows in the order specified in the list.
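To see the ordering, here is a small check on a trimmed copy of the sample frame (only two of the columns are kept to keep the sketch short):

```python
import pandas as pd

# Trimmed copy of the sample DataFrame: same index, two columns only.
df = pd.DataFrame(
    {'age': [30, 2, 12, 4, 32, 33, 69],
     'food': ['Steak', 'Lamb', 'Mango', 'Apple', 'Cheese', 'Melon', 'Beans']},
    index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean', 'Christina', 'Cornelia'])

rows = df.loc[['Cornelia', 'Jane', 'Dean']]

print(list(rows.index))       # ['Cornelia', 'Jane', 'Dean']
print(rows['age'].tolist())   # [69, 30, 32]
```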

Selecting multiple rows with .loc with slice notation

Slice notation is defined by start, stop and step values. When slicing by label, pandas includes the stop value in the result. The following slices from Aaron to Dean, inclusive. Its step size is not explicitly given, so it defaults to 1.

``````df.loc['Aaron':'Dean']
``````

Complex slices can be taken in the same manner as Python lists.
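A few label slices on a trimmed copy of the sample frame, as a sketch of the inclusive stop and of a step value (the frame keeps only the `age` column for brevity):

```python
import pandas as pd

# Trimmed copy of the sample DataFrame: same index, one column only.
df = pd.DataFrame(
    {'age': [30, 2, 12, 4, 32, 33, 69]},
    index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean', 'Christina', 'Cornelia'])

inclusive = df.loc['Aaron':'Dean']          # stop label is included
stepped = df.loc['Aaron':'Christina':2]     # every second row in the range
open_start = df.loc[:'Nick']                # from the first row up to Nick

print(list(inclusive.index))    # ['Aaron', 'Penelope', 'Dean']
print(list(stepped.index))      # ['Aaron', 'Dean']
print(list(open_start.index))   # ['Jane', 'Nick']
```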

## .iloc selects data only by integer location

Let's now turn to `.iloc`. Every row and column of data in a DataFrame has an integer location that defines it. This is in addition to the label that is visually displayed in the output. The integer location is simply the number of rows/columns from the top/left beginning at 0.

There are three different inputs you can use for `.iloc`

• An integer
• A list of integers
• Slice notation using integers as the start and stop values

Selecting a single row with .iloc with an integer

``````df.iloc[4]
``````

This returns the 5th row (integer location 4) as a Series

``````age           32
color       gray
food      Cheese
height       180
score        1.8
state         AK
Name: Dean, dtype: object
``````

Selecting multiple rows with .iloc with a list of integers

``````df.iloc[[2, -2]]
``````

This returns a DataFrame of the third and second to last rows.

Selecting multiple rows with .iloc with slice notation

``````df.iloc[:5:3]
``````
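As a quick check of which rows these two `.iloc` selections pick, here they are on a trimmed copy of the sample frame (only the `age` column is kept):

```python
import pandas as pd

# Trimmed copy of the sample DataFrame: same index, one column only.
df = pd.DataFrame(
    {'age': [30, 2, 12, 4, 32, 33, 69]},
    index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean', 'Christina', 'Cornelia'])

picked = df.iloc[[2, -2]]   # integer locations 2 and -2 (second to last)
sliced = df.iloc[:5:3]      # locations 0 and 3; the stop (5) is excluded

print(list(picked.index))   # ['Aaron', 'Christina']
print(list(sliced.index))   # ['Jane', 'Penelope']
```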

## Simultaneous selection of rows and columns with .loc and .iloc

A great feature of both `.loc` and `.iloc` is that they can select rows and columns simultaneously. In the examples above, all the columns were returned from each selection. We can choose columns with the same types of inputs as we do for rows. We simply need to separate the row and column selections with a comma.

For example, we can select rows Jane and Dean with just the columns height, score and state like this:

``````df.loc[['Jane', 'Dean'], 'height':]
``````

This uses a list of labels for the rows and slice notation for the columns.
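Run against the sample frame, the selection looks like this (a sketch that rebuilds the frame so it can be executed on its own):

```python
import pandas as pd

df = pd.DataFrame(
    {'age': [30, 2, 12, 4, 32, 33, 69],
     'color': ['blue', 'green', 'red', 'white', 'gray', 'black', 'red'],
     'food': ['Steak', 'Lamb', 'Mango', 'Apple', 'Cheese', 'Melon', 'Beans'],
     'height': [165, 70, 120, 80, 180, 172, 150],
     'score': [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
     'state': ['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']},
    index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean', 'Christina', 'Cornelia'])

subset = df.loc[['Jane', 'Dean'], 'height':]   # open-ended column slice

print(list(subset.columns))          # ['height', 'score', 'state']
print(subset.loc['Dean', 'state'])   # AK
```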

We can naturally do similar operations with `.iloc` using only integers.

``````df.iloc[[1,4], 2]
Nick      Lamb
Dean    Cheese
Name: food, dtype: object
``````

### Simultaneous selection with labels and integer location

`.ix` was used to make selections simultaneously with labels and integer locations. That was useful but confusing and ambiguous at times, and thankfully it has been deprecated. If you need to make a selection with a mix of labels and integer locations, you will have to convert so that both selections are labels or both are integer locations.

For instance, if we want to select rows `Nick` and `Cornelia` along with columns 2 and 4, we could use `.loc` by converting the integers to labels with the following:

``````col_names = df.columns[[2, 4]]
df.loc[['Nick', 'Cornelia'], col_names]
``````

Or alternatively, convert the index labels to integers with the `get_loc` index method.

``````labels = ['Nick', 'Cornelia']
index_ints = [df.index.get_loc(label) for label in labels]
df.iloc[index_ints, [2, 4]]
``````
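The two conversions should give the same result, which can be checked directly. This sketch also uses `Index.get_indexer`, which looks up several labels at once and is offered here only as an alternative to the `get_loc` loop, not as part of the original answer:

```python
import pandas as pd

df = pd.DataFrame(
    {'age': [30, 2, 12, 4, 32, 33, 69],
     'color': ['blue', 'green', 'red', 'white', 'gray', 'black', 'red'],
     'food': ['Steak', 'Lamb', 'Mango', 'Apple', 'Cheese', 'Melon', 'Beans'],
     'height': [165, 70, 120, 80, 180, 172, 150],
     'score': [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
     'state': ['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']},
    index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean', 'Christina', 'Cornelia'])

labels = ['Nick', 'Cornelia']

# Integers -> labels, then .loc
via_loc = df.loc[labels, df.columns[[2, 4]]]

# Labels -> integers, then .iloc
via_iloc = df.iloc[df.index.get_indexer(labels), [2, 4]]

print(list(via_loc.columns))      # ['food', 'score']
print(via_loc.equals(via_iloc))   # True
```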

### Boolean Selection

The `.loc` indexer can also do boolean selection. For instance, if we are interested in finding all the rows where age is above 30 and returning just the `food` and `score` columns, we can do the following:

``````df.loc[df['age'] > 30, ['food', 'score']]
``````

You can replicate this with `.iloc`, but you cannot pass it a boolean Series. You must convert the boolean Series into a numpy array like this:

``````df.iloc[(df['age'] > 30).values, [2, 4]]
``````
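A runnable comparison of the two forms on the sample frame is sketched below. It uses `Series.to_numpy()` for the conversion, which does the same job as `.values` here:

```python
import pandas as pd

df = pd.DataFrame(
    {'age': [30, 2, 12, 4, 32, 33, 69],
     'color': ['blue', 'green', 'red', 'white', 'gray', 'black', 'red'],
     'food': ['Steak', 'Lamb', 'Mango', 'Apple', 'Cheese', 'Melon', 'Beans'],
     'height': [165, 70, 120, 80, 180, 172, 150],
     'score': [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
     'state': ['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']},
    index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean', 'Christina', 'Cornelia'])

mask = df['age'] > 30                                  # boolean Series

by_label = df.loc[mask, ['food', 'score']]             # Series mask is fine for .loc
by_position = df.iloc[mask.to_numpy(), [2, 4]]         # .iloc gets a plain array

print(list(by_label.index))            # ['Dean', 'Christina', 'Cornelia']
print(by_label.equals(by_position))    # True
```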

### Selecting all rows

It is possible to use `.loc/.iloc` for just column selection. You can select all the rows by using a colon like this:

``````df.loc[:, 'color':'score':2]
``````
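On the sample frame this label slice with a step of 2 keeps every second column between `color` and `score`. The sketch below also shows what should be the matching `.iloc` call, where the stop has to be one past the last wanted position:

```python
import pandas as pd

df = pd.DataFrame(
    {'age': [30, 2, 12, 4, 32, 33, 69],
     'color': ['blue', 'green', 'red', 'white', 'gray', 'black', 'red'],
     'food': ['Steak', 'Lamb', 'Mango', 'Apple', 'Cheese', 'Melon', 'Beans'],
     'height': [165, 70, 120, 80, 180, 172, 150],
     'score': [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
     'state': ['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']},
    index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean', 'Christina', 'Cornelia'])

cols_by_label = df.loc[:, 'color':'score':2]   # labels, stop included
cols_by_position = df.iloc[:, 1:5:2]           # positions, stop excluded

print(list(cols_by_label.columns))               # ['color', 'height']
print(cols_by_label.equals(cols_by_position))    # True
```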

### The indexing operator, `[]`, can select rows and columns too but not simultaneously.

Most people are familiar with the primary purpose of the DataFrame indexing operator, which is to select columns. A string selects a single column as a Series and a list of strings selects multiple columns as a DataFrame.

``````df['food']

Jane          Steak
Nick           Lamb
Aaron         Mango
Penelope      Apple
Dean         Cheese
Christina     Melon
Cornelia      Beans
Name: food, dtype: object
``````

Using a list selects multiple columns

``````df[['food', 'score']]
``````

What people are less familiar with is that, when slice notation is used, selection happens by row labels or by integer location instead. This is very confusing and something that I almost never use, but it does work.

``````df['Penelope':'Christina'] # slice rows by label
``````

``````df[2:6:2] # slice rows by integer location
``````

The explicitness of `.loc/.iloc` for selecting rows is highly preferred. The indexing operator alone is unable to select rows and columns simultaneously.

``````df[3:5, 'color']
TypeError: unhashable type: 'slice'
``````
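The selection that fails above can be written with either indexer. The following is a sketch on a trimmed copy of the sample frame (two columns only), converting positions to labels for `.loc` and the label to a position for `.iloc`:

```python
import pandas as pd

# Trimmed copy of the sample DataFrame: same index, two columns only.
df = pd.DataFrame(
    {'age': [30, 2, 12, 4, 32, 33, 69],
     'color': ['blue', 'green', 'red', 'white', 'gray', 'black', 'red']},
    index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean', 'Christina', 'Cornelia'])

with_loc = df.loc[df.index[3:5], 'color']
with_iloc = df.iloc[3:5, df.columns.get_loc('color')]

print(with_loc.tolist())            # ['white', 'gray']
print(with_loc.equals(with_iloc))   # True
```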

#### @pragun 2019-05-30 22:34:17

Wow, this is one of the most lucid and well-articulated explanations of a programming topic that I have ever come across. What you explained at the end, about plain indexing working on either rows or columns but not both, is one of the reasons we have the `loc` and `iloc` indexers. I came across that caveat in the DataCamp course. a.) What do `df.columns` and `df.index` return? Is it a list of strings? If it is a list, is it allowed to access two elements like this, `df.columns[[2, 4]]`? b.) Can I call `get_loc()` on `df.columns`? c.) Why do we need to call `(df['age'] > 30).values` in the case of `iloc`?

#### @omabena 2020-05-16 22:17:03

This is a really good answer, I liked that it doesn't get much into ix, which is deprecated and pointless to dive deep. Thanks.

#### @Ben Usman 2020-07-28 13:50:41

This should be the top answer!

#### @Alex Riley 2015-07-23 16:59:47

Note: in pandas version 0.20.0 and above, `ix` is deprecated and the use of `loc` and `iloc` is encouraged instead. I have left the parts of this answer that describe `ix` intact as a reference for users of earlier versions of pandas. Examples have been added below showing alternatives to `ix`.

First, here's a recap of the three methods:

• `loc` gets rows (or columns) with particular labels from the index.
• `iloc` gets rows (or columns) at particular positions in the index (so it only takes integers).
• `ix` usually tries to behave like `loc` but falls back to behaving like `iloc` if a label is not present in the index.

It's important to note some subtleties that can make `ix` slightly tricky to use:

• if the index is of integer type, `ix` will only use label-based indexing and not fall back to position-based indexing. If the label is not in the index, an error is raised.

• if the index does not contain only integers, then given an integer, `ix` will immediately use position-based indexing rather than label-based indexing. If however `ix` is given another type (e.g. a string), it can use label-based indexing.

To illustrate the differences between the three methods, consider the following Series:

``````>>> s = pd.Series(np.nan, index=[49,48,47,46,45, 1, 2, 3, 4, 5])
>>> s
49   NaN
48   NaN
47   NaN
46   NaN
45   NaN
1    NaN
2    NaN
3    NaN
4    NaN
5    NaN
``````

We'll look at slicing with the integer value `3`.

In this case, `s.iloc[:3]` returns us the first 3 rows (since it treats 3 as a position) and `s.loc[:3]` returns us the first 8 rows (since it treats 3 as a label):

``````>>> s.iloc[:3] # slice the first three rows
49   NaN
48   NaN
47   NaN

>>> s.loc[:3] # slice up to and including label 3
49   NaN
48   NaN
47   NaN
46   NaN
45   NaN
1    NaN
2    NaN
3    NaN

>>> s.ix[:3] # the integer is in the index so s.ix[:3] works like loc
49   NaN
48   NaN
47   NaN
46   NaN
45   NaN
1    NaN
2    NaN
3    NaN
``````

Notice `s.ix[:3]` returns the same Series as `s.loc[:3]` since it looks for the label first rather than working on the position (and the index for `s` is of integer type).

What if we try with an integer label that isn't in the index (say `6`)?

Here `s.iloc[:6]` returns the first 6 rows of the Series as expected. However, `s.loc[:6]` raises a KeyError since `6` is not in the index.

``````>>> s.iloc[:6]
49   NaN
48   NaN
47   NaN
46   NaN
45   NaN
1    NaN

>>> s.loc[:6]
KeyError: 6

>>> s.ix[:6]
KeyError: 6
``````

As per the subtleties noted above, `s.ix[:6]` now raises a KeyError because it tries to work like `loc` but can't find a `6` in the index. Because our index is of integer type `ix` doesn't fall back to behaving like `iloc`.
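Since `ix` is not available on current pandas, the `loc` and `iloc` halves of this comparison can be reproduced on their own. A minimal sketch using the same Series:

```python
import numpy as np
import pandas as pd

s = pd.Series(np.nan, index=[49, 48, 47, 46, 45, 1, 2, 3, 4, 5])

print(len(s.iloc[:3]))   # 3 -> 3 is a position, stop excluded
print(len(s.loc[:3]))    # 8 -> 3 is a label, stop included
print(len(s.iloc[:6]))   # 6

# 6 is not a label in this (unsorted) index, so the label slice fails.
try:
    s.loc[:6]
    outcome = 'no error'
except KeyError:
    outcome = 'KeyError'
print(outcome)           # KeyError
```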

If, however, our index was of mixed type, given an integer `ix` would behave like `iloc` immediately instead of raising a KeyError:

``````>>> s2 = pd.Series(np.nan, index=['a','b','c','d','e', 1, 2, 3, 4, 5])
>>> s2.index.is_mixed() # index is mix of different types
True
>>> s2.ix[:6] # now behaves like iloc given integer
a   NaN
b   NaN
c   NaN
d   NaN
e   NaN
1   NaN
``````

Keep in mind that `ix` can still accept non-integers and behave like `loc`:

``````>>> s2.ix[:'c'] # behaves like loc given non-integer
a   NaN
b   NaN
c   NaN
``````

As general advice, if you're only indexing using labels, or only indexing using integer positions, stick with `loc` or `iloc` to avoid unexpected results - try not to use `ix`.

### Combining position-based and label-based indexing

Sometimes given a DataFrame, you will want to mix label and positional indexing methods for the rows and columns.

For example, consider the following DataFrame. How best to slice the rows up to and including 'c' and take the first four columns?

``````>>> df = pd.DataFrame(np.nan,
...                   index=list('abcde'),
...                   columns=['x','y','z', 8, 9])
>>> df
    x   y   z   8   9
a NaN NaN NaN NaN NaN
b NaN NaN NaN NaN NaN
c NaN NaN NaN NaN NaN
d NaN NaN NaN NaN NaN
e NaN NaN NaN NaN NaN
``````

In earlier versions of pandas (before 0.20.0) `ix` let you do this quite neatly - we can slice the rows by label and the columns by position (note that for the columns, `ix` will default to position-based slicing since `4` is not a column name):

``````>>> df.ix[:'c', :4]
    x   y   z   8
a NaN NaN NaN NaN
b NaN NaN NaN NaN
c NaN NaN NaN NaN
``````

In later versions of pandas, we can achieve this result using `iloc` and the help of another method:

``````>>> df.iloc[:df.index.get_loc('c') + 1, :4]
    x   y   z   8
a NaN NaN NaN NaN
b NaN NaN NaN NaN
c NaN NaN NaN NaN
``````

`get_loc()` is an index method meaning "get the position of the label in this index". Note that since slicing with `iloc` is exclusive of its endpoint, we must add 1 to this value if we want row 'c' as well.
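An alternative that avoids the `+ 1` is to go the other way and convert the column positions to labels, then slice with `loc`, whose endpoint is inclusive. This is a sketch of that variant, not part of the original answer, checked against the `iloc` version:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.nan,
                  index=list('abcde'),
                  columns=['x', 'y', 'z', 8, 9])

# Rows by label -> position (+1 because iloc excludes the stop).
with_iloc = df.iloc[:df.index.get_loc('c') + 1, :4]

# Columns by position -> labels (loc includes the stop label 'c').
with_loc = df.loc[:'c', df.columns[:4]]

print(with_loc.shape)               # (3, 4)
print(with_iloc.equals(with_loc))   # True
```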

There are further examples in pandas' documentation here.

#### @measureallthethings 2015-07-23 18:36:57

Great explanation! One related question I've always had is what relation, if any, loc, iloc and ix have with SettingWithCopy warnings? There is some documentation but to be honest I'm still a little confused pandas.pydata.org/pandas-docs/stable/…

#### @Alex Riley 2015-07-23 18:56:18

@measureallthethings: `loc`, `iloc` and `ix` might still trigger the warning if they are chained together. Using the example DataFrame in the linked docs `dfmi.loc[:, 'one'].loc[:, 'second']` triggers the warning just like `dfmi['one']['second']` because a copy of data (rather than a view) might be returned by the first indexing operation.

#### @cjm2671 2016-04-29 08:51:31

What do you use if you want to lookup a DateIndex with a Date, or something like `df.ix[date, 'Cash']`?

#### @Alex Riley 2016-04-29 09:18:12

@cjm2671: both `loc` and `ix` should work in that case. For example, `df.loc['2016-04-29', 'Cash']` will return all rows with that particular date from the 'Cash' column. (You can be as specific as you like when retrieving rows with strings, e.g. `'2016-01'` will select all datetimes falling in January 2016, and `'2016-01-02 11'` will select datetimes on January 2 2016 with time 11:??:??.)
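A small sketch of this partial-string selection with `loc` follows. The timestamps and `Cash` values are invented for the example; the index has minute resolution, so a date or an hour given as a string selects every row that falls inside it:

```python
import pandas as pd

# Hypothetical data: five timestamps with made-up cash values.
idx = pd.to_datetime(['2016-01-02 11:15', '2016-01-02 11:45',
                      '2016-01-15 09:00', '2016-04-29 08:00',
                      '2016-04-29 17:30'])
df = pd.DataFrame({'Cash': [10, 20, 30, 40, 50]}, index=idx)

one_day = df.loc['2016-04-29', 'Cash']   # every row on that date
january = df.loc['2016-01']              # every row in January 2016
one_hour = df.loc['2016-01-02 11']       # every row with time 11:??

print(one_day.tolist())   # [40, 50]
print(len(january))       # 3
print(len(one_hour))      # 2
```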

#### @JohnE 2016-12-20 18:00:03

In case you want to update this answer at some point, there are suggestions here for how to use loc/iloc instead of ix github.com/pandas-dev/pandas/issues/14218

#### @Antony Hatchkins 2017-12-26 04:01:01

Thanks! Nice and clear explanation! One suggestion: maybe it makes sense to move everything related to `ix` to a separate section of the answer if it is deprecated now?

#### @Thomas 2018-06-20 15:48:36

I am really sad they deprecated ix. Usually, I know how my dataframe looks, and writing my_dataframe_name.get_loc() and then remembering that I have to offset by 1 feels incredibly unwieldy compared to R's syntax...

#### @Ben 2019-07-01 07:00:58

What would the syntax look like when the labels are in the columns? Just swapping the row and column information inside the brackets does not appear to solve that.

#### @renny 2019-11-15 05:59:23

Great explanation. Cleared all my doubts
