
How to Read a CSV File in Pandas


Reading a CSV file in pandas is simple. Here is how you do it. See also how to read Excel files using pandas.

import pandas as pd
df = pd.read_csv('/root/test.csv')

If you run into the following error, it usually means that you need to set the correct delimiter or that your data uses a different encoding:

ParserError: Error tokenizing data. C error: Buffer overflow caught - possible malformed input file.
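
Before guessing at the delimiter, it can help to inspect a few lines of the file. The sketch below uses `csv.Sniffer` from the Python standard library to guess the delimiter from a text sample. The helper name `sniff_delimiter`, the candidate list, and the sample data are assumptions made for illustration. Note that the sample must already be decoded text, so this will not help if the real problem is the encoding.

```python
import csv

def sniff_delimiter(sample: str, candidates: str = ",\t;|") -> str:
    """Guess the field delimiter of a text sample (hypothetical helper)."""
    # csv.Sniffer looks for a candidate character that occurs the same
    # number of times on every line of the sample.
    dialect = csv.Sniffer().sniff(sample, delimiters=candidates)
    return dialect.delimiter

# Made-up sample data: three tab-separated rows.
sample = "id\tname\tcity\n1\tAnn\tOslo\n2\tBob\tRome\n"
print(repr(sniff_delimiter(sample)))  # prints '\t'
```

pandas can also attempt this detection itself: passing `sep=None` together with `engine='python'` to `pd.read_csv` makes it use the same sniffer internally.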

For my file the delimiter is a tab, so let's try again...

df = pd.read_csv('/root/test.csv', delimiter='\t')
ParserError: Error tokenizing data. C error: Buffer overflow caught - possible malformed input file.

As shown above, I still got the same error, so the problem must be the encoding.

There is a Python library, chardet, that can help us find the correct encoding. Let's import it and use it to detect the encoding...

import chardet
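# open in binary mode so chardet sees the raw, undecoded bytes
with open('/root/test.csv', 'rb') as f:
  # the first 20 lines are enough of a sample for detection
  rawdata = b''.join([f.readline() for _ in range(20)])
  print(chardet.detect(rawdata)['encoding'])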
UTF-16

OK, the encoding is UTF-16. Let's read the CSV file again, this time using UTF-16...

df = pd.read_csv('/root/test.csv', encoding='UTF-16', delimiter='\t')
len(df)
6384

It worked this time.
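
If chardet is not available, a rough alternative is to check whether the file starts with a byte-order mark (BOM), which UTF-16 files often do. The following is a minimal sketch using only the standard library. The helper name `encoding_from_bom` is hypothetical, and the approach assumes the file actually carries a BOM. It returns `None` when no BOM is found, in which case a detector such as chardet is still needed.

```python
import codecs

# Known BOMs mapped to encoding names that Python (and pandas) accept.
# UTF-32 comes first because the UTF-32-LE BOM begins with the
# UTF-16-LE BOM and would otherwise be misidentified.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]

def encoding_from_bom(raw: bytes):
    """Return an encoding name if raw starts with a known BOM, else None."""
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name
    return None

# Made-up sample: encoding with "utf-16" writes a BOM first.
print(encoding_from_bom("id\tname\n".encode("utf-16")))  # prints utf-16
```

For a real file, the first few bytes can be read with `open(path, 'rb').read(4)` and passed to the helper, and a non-`None` result can then be given to `pd.read_csv` as the `encoding` argument.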
