python:twistedlxmlandre

Revision 2015/10/27 00:08 by admin (section: Regular Expression Language); previous revision 2014/07/26 09:57 by admin (section: re.search, re.match)
  * ElementTree: https://docs.python.org/2/library/xml.etree.elementtree.html#elementtree-objects
  * HTMLElement: https://docs.python.org/2/library/xml.etree.elementtree.html#element-objects
==== Parsing XML and HTML into an Etree Object ====
refer: http://lxml.de/parsing.html\\
etree.parse returns an **lxml.etree._ElementTree** object
...
result = etree.tostring(tree.getroot(), pretty_print=True, method="html")
print(result)
</code>
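For comparison, the standard library's xml.etree.ElementTree draws the same line between a tree and its root element that lxml does (etree.parse returns an _ElementTree, etree.fromstring returns the root _Element). A minimal sketch, assuming Python 3; the XML_TEXT sample is hypothetical:
<code python>
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for this illustration
XML_TEXT = "<data><country name='Singapore'><rank>4</rank></country></data>"

# ET.parse() takes a filename or file object and returns an ElementTree wrapper
tree = ET.parse(io.BytesIO(XML_TEXT.encode("utf-8")))
print(type(tree).__name__)   # ElementTree

# getroot() returns the root Element held by the tree
root = tree.getroot()
print(root.tag)              # data

# ET.fromstring() skips the wrapper and returns the root Element directly
root2 = ET.fromstring(XML_TEXT)
print(type(root2).__name__)  # Element
print(root2.find("country").get("name"))  # Singapore
</code>
Because this uses the standard library, the type names are ElementTree and Element rather than lxml's _ElementTree and _Element.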
==== Build XML using Etree ====
  * Build XML using xml.etree.ElementTree:<code python>
from xml.etree import ElementTree as ET
# Target document:
'''
<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
    </country>
</data>
'''
# Root element
data = ET.Element('data')

# SubElement(parent, tag, attrib) creates a child element and attaches it to parent
country1 = ET.SubElement(data, 'country', {'name': 'Liechtenstein'})
rank1 = ET.SubElement(country1, 'rank')
rank1.text = '1'
year1 = ET.SubElement(country1, 'year')
year1.text = '2008'

country2 = ET.SubElement(data, 'country', {'name': 'Singapore'})
rank2 = ET.SubElement(country2, 'rank')
rank2.text = '4'
year2 = ET.SubElement(country2, 'year')
year2.text = '2011'
# Serialize to a string: a single line, without the XML declaration
print ET.tostring(data)
</code> output:<code>
<data><country name="Liechtenstein"><rank>1</rank><year>2008</year></country><country name="Singapore"><rank>4</rank><year>2011</year></country></data>
</code>
  * Build XML using lxml.etree:<code python>
from lxml import etree as ET
# Target document:
'''
<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
    </country>
</data>
'''
# lxml offers the same Element/SubElement API as xml.etree.ElementTree
data = ET.Element('data')

country1 = ET.SubElement(data, 'country', {'name': 'Liechtenstein'})
rank1 = ET.SubElement(country1, 'rank')
rank1.text = '1'
year1 = ET.SubElement(country1, 'year')
year1.text = '2008'

country2 = ET.SubElement(data, 'country', {'name': 'Singapore'})
rank2 = ET.SubElement(country2, 'rank')
rank2.text = '4'
year2 = ET.SubElement(country2, 'year')
year2.text = '2011'
print ET.tostring(data)
</code> output:<code>
<data><country name="Liechtenstein"><rank>1</rank><year>2008</year></country><country name="Singapore"><rank>4</rank><year>2011</year></country></data>
</code>
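Both examples above repeat the same SubElement calls for every country. A data-driven variant is sketched below, assuming Python 3 and the standard-library module; the COUNTRIES list and the build_data() helper are hypothetical names, not part of any library:
<code python>
import xml.etree.ElementTree as ET

# Hypothetical input: one (name, rank, year) tuple per <country> element
COUNTRIES = [
    ("Liechtenstein", 1, 2008),
    ("Singapore", 4, 2011),
]

def build_data(countries):
    """Return a <data> root element with one <country> child per tuple."""
    data = ET.Element("data")
    for name, rank, year in countries:
        country = ET.SubElement(data, "country", {"name": name})
        # Element text must be a string, so convert the numbers explicitly
        ET.SubElement(country, "rank").text = str(rank)
        ET.SubElement(country, "year").text = str(year)
    return data

data = build_data(COUNTRIES)
# encoding="unicode" makes tostring() return str instead of bytes on Python 3
xml_text = ET.tostring(data, encoding="unicode")
print(xml_text)
</code>
It prints the same single-line document as the outputs shown for the hand-written examples. To write a file that starts with an XML declaration, ET.ElementTree(data).write('data.xml', encoding='utf-8', xml_declaration=True) can be used instead of tostring().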
==== Custom Functions ====
...
tree.write('index.html', method='html')
</code>
===== re Package (Regular Expression) =====
To use the re package, import it first:<code python>
import re
</code>
==== Regular Expression Language ====
A regular expression (abbreviated regex or regexp) is a sequence of characters that forms a search pattern.\\
refer:
  * http://msdn.microsoft.com/en-us/library/az24scfc(v=vs.110).aspx
  * Python: https://docs.python.org/2/library/re.html#regular-expression-syntax
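A small sketch of what "a search pattern" means in practice with the re module, assuming Python 3; the sample text and patterns are made up for illustration:
<code python>
import re

text = "Order 66 shipped on 2015-10-27 to Room 101"

# \d matches one digit; + means "one or more of the preceding item"
numbers = re.findall(r"\d+", text)
print(numbers)      # ['66', '2015', '10', '27', '101']

# re.search() returns a match object for the first match, or None
m = re.search(r"(\d{4})-(\d{2})-(\d{2})", text)
if m:
    print(m.group(0))   # 2015-10-27
    print(m.groups())   # ('2015', '10', '27')

# Raw strings (r"...") pass backslashes through unchanged to the regex engine
print(re.search(r"\bRoom\s+(\d+)", text).group(1))  # 101
</code>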
  
**Match Character**
...
</code>
=== re.findall ===
re.findall() is probably the single most powerful function in the re module: it returns a list of all non-overlapping matches of a pattern in a string.
  - Example 1: <code python>
str = 'purple [email protected], blah monkey [email protected] blah dishwasher'
  
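## re.findall() returns a list of all the found email strings
emails = re.findall(r'[\w\.-]+@[\w\.-]+', str)
for email in emails: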
    # do something with each found email string
    print email
</code>How the pattern above works:
  * [\w\.-]+ => one or more (sign: +) characters from the group (sign: []): a word character (sign: \w), the character **.** (sign: \.), or the character **-**
  * @[\w\.-]+ => followed by the character **@** and again one or more characters from the same group: a word character, **.**, or **-**
  - Example 2: <code python>
# Open file
f = open('test.txt', 'r')
# ...
text2 = re.sub("cool", "good", text)
print text2
</code>output:<code>
Python for beginner is a very good website
</code>
      * Here is another example (taken from Google's Python class) which searches for all the email addresses and changes them to keep the user (\1) but have yo-yo-dyne.com as the host.<code python>
# ...
## \1 is group(1), \2 is group(2) in the replacement

print re.sub(r'([\w.-]+)@([\w.-]+)', r'\1@yo-yo-dyne.com', str)
## purple [email protected], blah monkey [email protected] blah dishwasher
</code>output:<code>
purple [email protected], blah monkey [email protected] blah dishwasher
</code>
  * re.compile: re.compile() compiles a pattern into a pattern object, which has methods for various operations such as searching for matches or performing string substitutions.
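A short sketch of re.compile() in use, assuming Python 3. It reuses the e-mail pattern from the re.sub example; the addresses in the sample text are made up:
<code python>
import re

# Compile once and reuse: the same e-mail pattern as in the re.sub example
EMAIL_RE = re.compile(r"([\w.-]+)@([\w.-]+)")

# Hypothetical sample text with made-up addresses
text = "contact alice@example.com or bob.smith@mail.example.org"

# Pattern objects expose the same operations as the module-level functions.
# With groups in the pattern, findall() returns a tuple of groups per match.
print(EMAIL_RE.findall(text))
# [('alice', 'example.com'), ('bob.smith', 'mail.example.org')]

first = EMAIL_RE.search(text)
print(first.group(1), first.group(2))  # alice example.com

# sub() with a backreference: keep the user (\1), replace the host
print(EMAIL_RE.sub(r"\1@yo-yo-dyne.com", text))
# contact alice@yo-yo-dyne.com or bob.smith@yo-yo-dyne.com
</code>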
python/twistedlxmlandre.txt · Last modified: 2022/10/29 16:15 by 127.0.0.1