Exel Processing

Function in Exel

Trước khi xử lý các hàm tìm kiếm Vlookup and Hlookup, If trong exel cần phải format dữ liệu và sort trước → Lúc đó hàm xử lý mới chạy đúng VLOOKUP($B2,omni!$A$1:$C$4778,3,FALSE) → Tìm chính xác nhờ tuỳ chọn FALSE

Install

Install on windows:

easy_install.exe xlrd
easy_install.exe xlwt
easy_install.exe xlutils

Install on linux:

pip install xlrd
pip install xlwt
pip install xlutils

Using xlrd

Basic APIs:

xlrd.open_workbook('adminup/shop.xlsx') return xlrd.book.Book object

Example codes:

Open Workbook and get sheets

Simple Open workbook

import xlrd
 
xlsfile = xlrd.open_workbook('adminup/shop.xlsx')
print xlsfile

output:

<xlrd.book.Book object at 0x023263B0>

Get sheets:

import xlrd
from xlrd.book import Book
 
xlsfile = xlrd.open_workbook('adminup/shop.xlsx')
print xlsfile.sheets()

output:

[<xlrd.sheet.Sheet object at 0x02256530>, <xlrd.sheet.Sheet object at 0x022564D0>, <xlrd.sheet.Sheet object at 0x022569F0>]

Using sheet(xlrd.sheet.Sheet)

dump sheet to get basic informations

import xlrd
from xlrd.book import Book
from xlrd.sheet import Sheet
 
xlsfile = xlrd.open_workbook('adminup/shop.xlsx')
for sheet in xlsfile.sheets():
    print '******************************'
    sheet.dump()

output for sheet1:

_cell_types: <type 'list'>, len = 8
_cell_values: <type 'list'>, len = 8
_dimncols: 4
_dimnrows: 8
_first_full_rowx: 3

name: u'Th\xf4ng tin shop'
ncols: 4
nrows: 8
number: 0

output for sheet2:

_cell_types: <type 'list'>, len = 6
_cell_values: <type 'list'>, len = 6
_dimncols: 3
_dimnrows: 54
_first_full_rowx: 2

name: u'Danh m\u1ee5c'
ncols: 3
nrows: 6
number: 1

Get sheet by name: Using http://www.rapidmonkey.com/unicodeconverter/ to convert utf-8 to ASCII(Unicode Escaped) Content

import xlrd
from xlrd.book import Book
from xlrd.sheet import Sheet
 
xlsfile = xlrd.open_workbook('adminup/shop.xlsx')
sheet_info = xlsfile.sheet_by_name(u'Th\xf4ng tin shop')
sheet_cat = xlsfile.sheet_by_name(u'Danh m\u1ee5c')
sheet_products = xlsfile.sheet_by_name(u'S\u1EA3n ph\u1EA9m')

Or

import xlrd
from xlrd.book import Book
from xlrd.sheet import Sheet
 
xlsfile = xlrd.open_workbook('adminup/shop.xlsx')
sheet_info = None
sheet_cat = None
sheet_products = None
for sheet in xlsfile.sheets():
    if sheet.name == u'Th\xf4ng tin shop':
        sheet_info = sheet
    elif sheet.name == u'Danh m\u1ee5c':
        sheet_cat = sheet
    elif sheet.name == u'S\u1EA3n ph\u1EA9m':
        sheet_products = sheet
if sheet_info is not None:
    print sheet_info.dump()

Parse xls file

import xlrd
from xlrd.book import Book
from xlrd.sheet import Sheet
nhapkhaufile = 'nhapkhau.xlsx'
xuatkhaufile = 'xuatkhau.xlsx'
constantsfile = 'constants.xlsx'
def parsexls(filename):
    xlsfile = xlrd.open_workbook(filename)
    for sheet in xlsfile.sheets():
        n_cells = sheet.ncols             
        for curr_row in range(sheet.nrows):
            curr_cell = 0
            while curr_cell < n_cells:
                cell_type = sheet.cell_type(curr_row, curr_cell)
                if cell_type == xlrd.XL_CELL_EMPTY:
                    curr_cell += 1
                    continue
                value = sheet.cell_value(curr_row, curr_cell)
                curr_cell += 1
                print value
parsexls(constantsfile)

Error Parse UTF-8 file

UnicodeDecodeError: 'utf16' codec can't decode bytes in position 3438-3439: unexpected end of data

⇒ Check exel format in encoding(not utf-8)(May be exel file was not stored as encoding utf-8, It can be saved as utf16 default encoding) You can debug for information:

xlsfile = xlrd.open_workbook(filename,verbosity=3)

output:

BOF: op=0x0809 vers=0x0600 stream=0x0005 buildid=14420 buildyr=1997 -> BIFF80
CODEPAGE: codepage 1200 -> encoding 'utf_16_le'
DATEMODE: datemode 0

⇒ default encoding: 'utf_16_le'

Fix: Lỗi do file exel, mở lại file exel và lưu lại sẽ hết lỗi

my notes

Table of Contents

Exel Processing

Function in Exel

Install

Using xlrd

Open Workbook and get sheets

Using sheet(xlrd.sheet.Sheet)

Parse xls file

Error Parse UTF-8 file

Using xlwt