In [48]:
import pandas as pd
In [49]:
import numpy as np

Load CSV

In [50]:
df = pd.read_csv("/home/ranjeet/Website/data/IRIS.csv")

In the command above, /home/ranjeet/Website/data/IRIS.csv is the path to the CSV file. This demo uses the Iris dataset.

In [51]:
df.head()
Out[51]:
5.1 0.222222222 3.5 0.625 1.4 0.06779661 0.2 0.041666667 setosa
0 4.9 0.166667 3.0 0.416667 1.4 0.067797 0.2 0.041667 setosa
1 4.7 0.111111 3.2 0.500000 1.3 0.050847 0.2 0.041667 setosa
2 4.6 0.083333 3.1 0.458333 1.5 0.084746 0.2 0.041667 setosa
3 5.0 0.194444 3.6 0.666667 1.4 0.067797 0.2 0.041667 setosa
4 5.4 0.305556 3.9 0.791667 1.7 0.118644 0.4 0.125000 setosa

Here the .head() function returns the first 5 rows of the DataFrame by default (rows 0 to 4 in the output above).
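To make the row count concrete, here is a minimal sketch using a small stand-in DataFrame. The name `demo` and its values are made up for illustration and are not taken from IRIS.csv.

```python
import pandas as pd

# Hypothetical stand-in data: ten rows in a single numeric column.
demo = pd.DataFrame({"value": range(10)})

default_rows = demo.head()   # no argument: the first 5 rows
three_rows = demo.head(3)    # explicit n: the first 3 rows

print(len(default_rows), len(three_rows))
```

Passing a number to .head() is handy when five rows are too many or too few for a quick look.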

As you can see above, pandas treats the first row of the file as column names. If the file has no header row, read it with header=None as follows

In [52]:
df = pd.read_csv("/home/ranjeet/Website/data/IRIS.csv", header=None)
In [53]:
df.head()
Out[53]:
0 1 2 3 4 5 6 7 8
0 5.1 0.222222 3.5 0.625000 1.4 0.067797 0.2 0.041667 setosa
1 4.9 0.166667 3.0 0.416667 1.4 0.067797 0.2 0.041667 setosa
2 4.7 0.111111 3.2 0.500000 1.3 0.050847 0.2 0.041667 setosa
3 4.6 0.083333 3.1 0.458333 1.5 0.084746 0.2 0.041667 setosa
4 5.0 0.194444 3.6 0.666667 1.4 0.067797 0.2 0.041667 setosa

Now pandas no longer treats the first row as column names. Instead it assigns integer labels starting from 0 as the column names.
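The difference is easy to check on a tiny example. The sketch below reads the same headerless text twice from an in-memory string; `csv_text` is a made-up three-line sample, not the real IRIS.csv.

```python
import io
import pandas as pd

# Hypothetical three-line CSV with no header row.
csv_text = "5.1,3.5,setosa\n4.9,3.0,setosa\n4.7,3.2,setosa\n"

# Default behaviour: the first line is consumed as column names.
with_default = pd.read_csv(io.StringIO(csv_text))

# header=None: every line is treated as data.
without_header = pd.read_csv(io.StringIO(csv_text), header=None)

print(with_default.shape)            # (2, 3): one row was used up as the header
print(without_header.shape)          # (3, 3): all rows kept
print(list(without_header.columns))  # [0, 1, 2]
```

So with the default settings a headerless file silently loses its first data row, which is why header=None matters here.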

We can also skip the first n rows while loading a CSV into a pandas DataFrame, using the following command

In [54]:
df = pd.read_csv("/home/ranjeet/Website/data/IRIS.csv", skiprows = 2, header=None)
In [55]:
df.head()
Out[55]:
0 1 2 3 4 5 6 7 8
0 4.7 0.111111 3.2 0.500000 1.3 0.050847 0.2 0.041667 setosa
1 4.6 0.083333 3.1 0.458333 1.5 0.084746 0.2 0.041667 setosa
2 5.0 0.194444 3.6 0.666667 1.4 0.067797 0.2 0.041667 setosa
3 5.4 0.305556 3.9 0.791667 1.7 0.118644 0.4 0.125000 setosa
4 4.6 0.083333 3.4 0.583333 1.4 0.067797 0.3 0.083333 setosa

Above, skiprows = 2 tells pandas to skip the first two lines of the file while reading the data.
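As a small sketch of how skiprows behaves, the example below uses a made-up four-line sample (`csv_text` is hypothetical) so the skipped lines are easy to spot. It also shows that skiprows accepts a list of 0-indexed line numbers when the lines to skip are not all at the top.

```python
import io
import pandas as pd

# Hypothetical four-line CSV; the first field names each line.
csv_text = "row0,a\nrow1,b\nrow2,c\nrow3,d\n"

# An integer skips that many lines from the start of the file.
skipped = pd.read_csv(io.StringIO(csv_text), skiprows=2, header=None)

# A list skips exactly those 0-indexed line numbers.
picked = pd.read_csv(io.StringIO(csv_text), skiprows=[0, 2], header=None)

print(list(skipped[0]))  # ['row2', 'row3']
print(list(picked[0]))   # ['row1', 'row3']
```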

You can also keep only specific columns of the data you have read

First we will assign column names to our DataFrame

In [56]:
df.head()
Out[56]:
0 1 2 3 4 5 6 7 8
0 4.7 0.111111 3.2 0.500000 1.3 0.050847 0.2 0.041667 setosa
1 4.6 0.083333 3.1 0.458333 1.5 0.084746 0.2 0.041667 setosa
2 5.0 0.194444 3.6 0.666667 1.4 0.067797 0.2 0.041667 setosa
3 5.4 0.305556 3.9 0.791667 1.7 0.118644 0.4 0.125000 setosa
4 4.6 0.083333 3.4 0.583333 1.4 0.067797 0.3 0.083333 setosa
In [57]:
column_names = ['col1', 'col2', 'col3', 'col4', 'col5', 'col6', 'col7', 'col8', 'col9']
In [58]:
df.columns = column_names
In [59]:
df.head()
Out[59]:
col1 col2 col3 col4 col5 col6 col7 col8 col9
0 4.7 0.111111 3.2 0.500000 1.3 0.050847 0.2 0.041667 setosa
1 4.6 0.083333 3.1 0.458333 1.5 0.084746 0.2 0.041667 setosa
2 5.0 0.194444 3.6 0.666667 1.4 0.067797 0.2 0.041667 setosa
3 5.4 0.305556 3.9 0.791667 1.7 0.118644 0.4 0.125000 setosa
4 4.6 0.083333 3.4 0.583333 1.4 0.067797 0.3 0.083333 setosa
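Column names can also be supplied while reading, instead of being assigned afterwards. The sketch below assumes a made-up two-line sample (`csv_text`) with three columns; the `names` parameter of read_csv is real, the data is not from IRIS.csv.

```python
import io
import pandas as pd

# Hypothetical headerless sample with three columns.
csv_text = "4.7,0.111111,setosa\n4.6,0.083333,setosa\n"

column_names = ["col1", "col2", "col3"]

# names= labels the columns at load time; header=None keeps line 0 as data.
named = pd.read_csv(io.StringIO(csv_text), header=None, names=column_names)

print(list(named.columns))  # ['col1', 'col2', 'col3']
```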
In [60]:
column_names_to_choose = ['col1', 'col2', 'col3']
In [61]:
df = df[column_names_to_choose]
In [62]:
df.head()
Out[62]:
col1 col2 col3
0 4.7 0.111111 3.2
1 4.6 0.083333 3.1
2 5.0 0.194444 3.6
3 5.4 0.305556 3.9
4 4.6 0.083333 3.4

As you can see, pandas read all the columns, and then with a simple trick we kept only the specified columns and dropped all the others from the DataFrame.
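If the other columns are never needed, they do not have to be loaded at all. As a sketch, the `usecols` parameter of read_csv restricts which columns are parsed. The sample text and column names below are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical headerless sample with four columns.
csv_text = "5.1,0.22,3.5,setosa\n4.9,0.17,3.0,setosa\n"

column_names = ["col1", "col2", "col3", "col4"]

# names= labels every column; usecols= keeps only the listed ones.
subset = pd.read_csv(
    io.StringIO(csv_text),
    header=None,
    names=column_names,
    usecols=["col1", "col3"],
)

print(list(subset.columns))  # ['col1', 'col3']
```

On large files this can save memory and time compared with reading everything and selecting afterwards.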