python:exelprocess
Table of Contents
Exel Processing
Function in Exel
Trước khi xử lý các hàm tìm kiếm Vlookup and Hlookup, If trong exel cần phải format dữ liệu và sort trước → Lúc đó hàm xử lý mới chạy đúng VLOOKUP($B2,omni!$A$1:$C$4778,3,FALSE) → Tìm chính xác nhờ tuỳ chọn FALSE
Install
- Install on windows:
easy_install.exe xlrd easy_install.exe xlwt easy_install.exe xlutils
- Install on linux:
pip install xlrd pip install xlwt pip install xlutils
Using xlrd
Basic APIs:
- xlrd.open_workbook('adminup/shop.xlsx') return xlrd.book.Book object
Example codes:
Open Workbook and get sheets
- Simple Open workbook
import xlrd xlsfile = xlrd.open_workbook('adminup/shop.xlsx') print xlsfile
output:
<xlrd.book.Book object at 0x023263B0>
- Get sheets:
import xlrd from xlrd.book import Book xlsfile = xlrd.open_workbook('adminup/shop.xlsx') print xlsfile.sheets()
output:
[<xlrd.sheet.Sheet object at 0x02256530>, <xlrd.sheet.Sheet object at 0x022564D0>, <xlrd.sheet.Sheet object at 0x022569F0>]
Using sheet(xlrd.sheet.Sheet)
- dump sheet to get basic informations
import xlrd from xlrd.book import Book from xlrd.sheet import Sheet xlsfile = xlrd.open_workbook('adminup/shop.xlsx') for sheet in xlsfile.sheets(): print '******************************' sheet.dump()
output for sheet1:
_cell_types: <type 'list'>, len = 8 _cell_values: <type 'list'>, len = 8 _dimncols: 4 _dimnrows: 8 _first_full_rowx: 3 name: u'Th\xf4ng tin shop' ncols: 4 nrows: 8 number: 0
output for sheet2:
_cell_types: <type 'list'>, len = 6 _cell_values: <type 'list'>, len = 6 _dimncols: 3 _dimnrows: 54 _first_full_rowx: 2 name: u'Danh m\u1ee5c' ncols: 3 nrows: 6 number: 1
- Get sheet by name: Using http://www.rapidmonkey.com/unicodeconverter/ to convert utf-8 to ASCII(Unicode Escaped) Content
import xlrd from xlrd.book import Book from xlrd.sheet import Sheet xlsfile = xlrd.open_workbook('adminup/shop.xlsx') sheet_info = xlsfile.sheet_by_name(u'Th\xf4ng tin shop') sheet_cat = xlsfile.sheet_by_name(u'Danh m\u1ee5c') sheet_products = xlsfile.sheet_by_name(u'S\u1EA3n ph\u1EA9m')
Or
import xlrd from xlrd.book import Book from xlrd.sheet import Sheet xlsfile = xlrd.open_workbook('adminup/shop.xlsx') sheet_info = None sheet_cat = None sheet_products = None for sheet in xlsfile.sheets(): if sheet.name == u'Th\xf4ng tin shop': sheet_info = sheet elif sheet.name == u'Danh m\u1ee5c': sheet_cat = sheet elif sheet.name == u'S\u1EA3n ph\u1EA9m': sheet_products = sheet if sheet_info is not None: print sheet_info.dump()
Parse xls file
import xlrd from xlrd.book import Book from xlrd.sheet import Sheet nhapkhaufile = 'nhapkhau.xlsx' xuatkhaufile = 'xuatkhau.xlsx' constantsfile = 'constants.xlsx' def parsexls(filename): xlsfile = xlrd.open_workbook(filename) for sheet in xlsfile.sheets(): n_cells = sheet.ncols for curr_row in range(sheet.nrows): curr_cell = 0 while curr_cell < n_cells: cell_type = sheet.cell_type(curr_row, curr_cell) if cell_type == xlrd.XL_CELL_EMPTY: curr_cell += 1 continue value = sheet.cell_value(curr_row, curr_cell) curr_cell += 1 print value parsexls(constantsfile)
Error Parse UTF-8 file
UnicodeDecodeError: 'utf16' codec can't decode bytes in position 3438-3439: unexpected end of data
⇒ Check exel format in encoding(not utf-8)(May be exel file was not stored as encoding utf-8, It can be saved as utf16 default encoding) You can debug for information:
xlsfile = xlrd.open_workbook(filename,verbosity=3)
output:
BOF: op=0x0809 vers=0x0600 stream=0x0005 buildid=14420 buildyr=1997 -> BIFF80 CODEPAGE: codepage 1200 -> encoding 'utf_16_le' DATEMODE: datemode 0
⇒ default encoding: 'utf_16_le'
Fix: Lỗi do file exel, mở lại file exel và lưu lại sẽ hết lỗi
Using xlwt
python/exelprocess.txt · Last modified: 2022/10/29 16:15 by 127.0.0.1