How to read a CSV file in Pandas
Reading a CSV file in Pandas is pretty simple. Here is how you do it:
import pandas as pd
df = pd.read_csv('/root/test.csv')
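read_csv accepts a file path or any file-like object, so you can try it without a file on disk. This minimal sketch uses made-up sample data in an io.StringIO buffer in place of /root/test.csv; the data and column names are purely illustrative.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for /root/test.csv
csv_text = "name,age\nalice,30\nbob,25\n"

# read_csv takes a path or a file-like object; the default separator is a comma
df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)  # (2, 2)
```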
If you run into the following error, it usually means you need to set the correct delimiter, or your file uses a different encoding:
ParserError: Error tokenizing data. C error: Buffer overflow caught - possible malformed input file.
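The delimiter is controlled by read_csv's sep parameter (delimiter is an alias for it), and it defaults to a comma. This sketch uses made-up tab-separated data in an io.StringIO buffer to show what a mismatched separator does. Note that a mismatch does not always raise an error: here it silently produces a single column.

```python
import io

import pandas as pd

# Hypothetical tab-separated sample data
tsv_text = "name\tage\nalice\t30\nbob\t25\n"

# With the default comma separator, each whole line lands in one column
wrong = pd.read_csv(io.StringIO(tsv_text))
# With sep='\t' (delimiter='\t' is equivalent), the columns split correctly
right = pd.read_csv(io.StringIO(tsv_text), sep="\t")
print(wrong.shape, right.shape)  # (2, 1) (2, 2)
```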
For my file the delimiter is a tab, so let's try again:
df = pd.read_csv('/root/test.csv', delimiter='\t')
ParserError: Error tokenizing data. C error: Buffer overflow caught - possible malformed input file.
As you can see, I still get the same error, so the encoding needs to be fixed as well.
The Python library chardet can guess a file's encoding from its raw bytes. Let's import it and use it to find the encoding:
import chardet

# Read the first 20 lines as raw bytes; chardet works on bytes, not text
with open('/root/test.csv', 'rb') as f:
    rawdata = b''.join([f.readline() for _ in range(20)])

print(chardet.detect(rawdata)['encoding'])
UTF-16
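chardet is a third-party package. If you would rather not add a dependency and only suspect a UTF-16 file, a narrower stdlib-only check is to look for a Unicode byte-order mark (BOM) at the start of the file. This is a different, much simpler technique than chardet's detection: the helper sniff_bom below is hypothetical, and it only recognizes files that begin with a BOM.

```python
import codecs
import os
import tempfile


def sniff_bom(path):
    """Return a codec name if the file starts with a Unicode BOM, else None.

    Hypothetical helper: it only recognizes BOM-prefixed files and is not
    a general-purpose encoding detector like chardet.
    """
    with open(path, "rb") as f:
        head = f.read(4)
    # Check UTF-32 first: the UTF-32-LE BOM begins with the UTF-16-LE BOM bytes
    if head.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return "utf-32"
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    return None


# Usage: write a small tab-separated file as UTF-16 (Python adds a BOM) and sniff it
path = os.path.join(tempfile.mkdtemp(), "sample.tsv")
with open(path, "w", encoding="utf-16") as f:
    f.write("name\tage\nalice\t30\n")
print(sniff_bom(path))  # utf-16
```

A file without a BOM returns None, in which case a statistical detector such as chardet is still the better tool.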
chardet reports the encoding as UTF-16, so let's read the CSV file again, this time passing encoding='UTF-16' along with the tab delimiter:
df = pd.read_csv('/root/test.csv', encoding='UTF-16', delimiter='\t')
len(df)
6384
This time it worked.
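To reproduce the fix without the original /root/test.csv, this sketch writes a small, made-up tab-separated file encoded as UTF-16 to a temporary directory and reads it back. The file name and contents are hypothetical; the point is that both sep and encoding have to match how the file was written.

```python
import os
import tempfile

import pandas as pd

# Create a hypothetical tab-separated, UTF-16 encoded file
path = os.path.join(tempfile.mkdtemp(), "test_utf16.tsv")
pd.DataFrame({"name": ["alice", "bob"], "age": [30, 25]}).to_csv(
    path, sep="\t", index=False, encoding="utf-16"
)

# Read it back with the matching separator and encoding
df = pd.read_csv(path, sep="\t", encoding="utf-16")
print(len(df))  # 2
```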