How to read a CSV file in Pandas


Reading a CSV file in Pandas is pretty simple. Here is how you do it.

import pandas as pd
df = pd.read_csv('/root/test.csv')

If you run into the following error, it usually means you need to set the correct delimiter, or your data uses a different encoding than the one Pandas assumes by default (UTF-8):

ParserError: Error tokenizing data. C error: Buffer overflow caught - possible malformed input file.

For my file the delimiter is a tab, so let's try again...

df = pd.read_csv('/root/test.csv', delimiter='\t')
ParserError: Error tokenizing data. C error: Buffer overflow caught - possible malformed input file.

As shown above, I still got the same error, so the delimiter alone was not the problem. We also need to fix the encoding.
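
To see why a wrong encoding can break parsing, it may help to look at what UTF-16 text looks like as raw bytes. The sketch below uses a made-up two-row sample (not my actual file) and only the standard library. It assumes little-endian UTF-16 with a byte-order mark, which is common but not guaranteed.

```python
import codecs

# Hypothetical tab-separated sample, standing in for the real file.
sample_text = "id\tname\n1\tAda\n"

# Encode as little-endian UTF-16 and prepend the byte-order mark (BOM).
raw = codecs.BOM_UTF16_LE + sample_text.encode("utf-16-le")

# Every ASCII character becomes two bytes, and the second one is NUL.
nul_count = raw.count(b"\x00")


def decodes_as_utf8(data: bytes) -> bool:
    """Return True if the bytes are valid UTF-8, False otherwise."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(raw[:8])                          # BOM followed by the first characters
print(nul_count, decodes_as_utf8(raw))  # NUL bytes, and whether UTF-8 accepts it
```

A reader that expects UTF-8 sees the BOM and the NUL bytes as garbage, which is one plausible way to end up with a tokenizing error like the one above.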

There is a Python library, chardet, which can help us find the correct encoding. Let's import it and use it to find the encoding...

import chardet
with open('/root/test.csv','rb') as f:
  rawdata = b''.join([f.readline() for _ in range(20)])
  print(chardet.detect(rawdata)['encoding'])
UTF-16
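
If chardet is not installed, a rough alternative is to check for a byte-order mark (BOM) at the start of the file. This is only a sketch: the helper name encoding_from_bom is my own, it uses just the standard library, and it returns None for files without a BOM, so it is a quick first check and not a replacement for chardet.

```python
import codecs

# BOMs to look for. UTF-32 comes first because its little-endian BOM
# begins with the same two bytes as the UTF-16 little-endian BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def encoding_from_bom(rawdata: bytes):
    """Return a codec name if rawdata starts with a known BOM, else None."""
    for bom, name in _BOMS:
        if rawdata.startswith(bom):
            return name
    return None
```

It can be called on the same kind of rawdata bytes that were passed to chardet.detect. When it returns None, the file has no BOM and a statistical detector such as chardet is the better tool.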

OK, the encoding is UTF-16. Let's read the CSV file again, this time using UTF-16...

df = pd.read_csv('/root/test.csv', encoding='UTF-16', delimiter='\t')
len(df)
6384

It worked this time: the file loaded into a DataFrame with 6384 rows.
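
I already knew my delimiter was a tab. If you do not know yours, then once the encoding is known the standard library's csv.Sniffer can make a guess. The sketch below is a minimal example under a few assumptions: the helper name guess_delimiter and the candidate list are my own, and the sniffer is a heuristic that can guess wrong or raise csv.Error on unusual files.

```python
import csv


def guess_delimiter(path, encoding, candidates=",\t;|"):
    """Guess the column delimiter from the first few thousand characters."""
    # newline="" leaves line endings untouched, as the csv module recommends.
    with open(path, encoding=encoding, newline="") as f:
        sample = f.read(4096)
    # Restricting the sniffer to a few candidates makes the guess more stable.
    return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter
```

The result can then be passed straight to Pandas, for example pd.read_csv(path, encoding='UTF-16', delimiter=guess_delimiter(path, 'UTF-16')).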