====== Exel Processing ======
===== Function in Exel =====
Trước khi xử lý các hàm tìm kiếm Vlookup and Hlookup, If trong exel cần phải format dữ liệu và sort trước -> Lúc đó hàm xử lý mới chạy đúng
VLOOKUP($B2,omni!$A$1:$C$4778,3,FALSE) -> Tìm chính xác nhờ tuỳ chọn **FALSE**
===== Install =====
* Install on windows:
easy_install.exe xlrd
easy_install.exe xlwt
easy_install.exe xlutils
* Install on linux:
pip install xlrd
pip install xlwt
pip install xlutils
===== Using xlrd =====
Basic APIs:
* **xlrd.open_workbook('adminup/shop.xlsx')** return **xlrd.book.Book** object
Example codes:
==== Open Workbook and get sheets ====
* Simple Open workbook
import xlrd
xlsfile = xlrd.open_workbook('adminup/shop.xlsx')
print xlsfile
output:
* Get sheets:
import xlrd
from xlrd.book import Book
xlsfile = xlrd.open_workbook('adminup/shop.xlsx')
print xlsfile.sheets()
output:
[, , ]
==== Using sheet(xlrd.sheet.Sheet) ====
* dump sheet to get basic informations
import xlrd
from xlrd.book import Book
from xlrd.sheet import Sheet
xlsfile = xlrd.open_workbook('adminup/shop.xlsx')
for sheet in xlsfile.sheets():
print '******************************'
sheet.dump()
output for sheet1:
_cell_types: , len = 8
_cell_values: , len = 8
_dimncols: 4
_dimnrows: 8
_first_full_rowx: 3
name: u'Th\xf4ng tin shop'
ncols: 4
nrows: 8
number: 0
output for sheet2:
_cell_types: , len = 6
_cell_values: , len = 6
_dimncols: 3
_dimnrows: 54
_first_full_rowx: 2
name: u'Danh m\u1ee5c'
ncols: 3
nrows: 6
number: 1
* Get sheet by name: Using http://www.rapidmonkey.com/unicodeconverter/ to convert utf-8 to ASCII(Unicode Escaped) Content
import xlrd
from xlrd.book import Book
from xlrd.sheet import Sheet
xlsfile = xlrd.open_workbook('adminup/shop.xlsx')
sheet_info = xlsfile.sheet_by_name(u'Th\xf4ng tin shop')
sheet_cat = xlsfile.sheet_by_name(u'Danh m\u1ee5c')
sheet_products = xlsfile.sheet_by_name(u'S\u1EA3n ph\u1EA9m')
Or
import xlrd
from xlrd.book import Book
from xlrd.sheet import Sheet
xlsfile = xlrd.open_workbook('adminup/shop.xlsx')
sheet_info = None
sheet_cat = None
sheet_products = None
for sheet in xlsfile.sheets():
if sheet.name == u'Th\xf4ng tin shop':
sheet_info = sheet
elif sheet.name == u'Danh m\u1ee5c':
sheet_cat = sheet
elif sheet.name == u'S\u1EA3n ph\u1EA9m':
sheet_products = sheet
if sheet_info is not None:
print sheet_info.dump()
==== Parse xls file ====
import xlrd
from xlrd.book import Book
from xlrd.sheet import Sheet
nhapkhaufile = 'nhapkhau.xlsx'
xuatkhaufile = 'xuatkhau.xlsx'
constantsfile = 'constants.xlsx'
def parsexls(filename):
xlsfile = xlrd.open_workbook(filename)
for sheet in xlsfile.sheets():
n_cells = sheet.ncols
for curr_row in range(sheet.nrows):
curr_cell = 0
while curr_cell < n_cells:
cell_type = sheet.cell_type(curr_row, curr_cell)
if cell_type == xlrd.XL_CELL_EMPTY:
curr_cell += 1
continue
value = sheet.cell_value(curr_row, curr_cell)
curr_cell += 1
print value
parsexls(constantsfile)
==== Error Parse UTF-8 file ====
UnicodeDecodeError: 'utf16' codec can't decode bytes in position 3438-3439: unexpected end of data
=> Check exel format in encoding(not utf-8)(May be exel file was not stored as encoding utf-8, It can be saved as utf16 default encoding)
You can debug for information:
xlsfile = xlrd.open_workbook(filename,verbosity=3)
output:
BOF: op=0x0809 vers=0x0600 stream=0x0005 buildid=14420 buildyr=1997 -> BIFF80
CODEPAGE: codepage 1200 -> encoding 'utf_16_le'
DATEMODE: datemode 0
=> default encoding: **'utf_16_le'**
**Fix**: Lỗi do file exel, mở lại file exel và lưu lại sẽ hết lỗi
===== Using xlwt =====