python:twistedlxmlandre

Differences

This shows you the differences between two versions of the page.

Old revision: python:twistedlxmlandre [2014/07/26 09:40] – [re.search, re.match] admin
New revision: python:twistedlxmlandre [2022/10/29 16:15] (current) – external edit 127.0.0.1
Line 240: Line 240:
   * ElementTree:https://docs.python.org/2/library/xml.etree.elementtree.html#elementtree-objects   * ElementTree:https://docs.python.org/2/library/xml.etree.elementtree.html#elementtree-objects
   * HTMLElement:https://docs.python.org/2/library/xml.etree.elementtree.html#element-objects   * HTMLElement:https://docs.python.org/2/library/xml.etree.elementtree.html#element-objects
-==== Create Etree from xml and html ====+==== Parsing xml and html to Etree Object ====
 refer: http://lxml.de/parsing.html\\ refer: http://lxml.de/parsing.html\\
 etree.parse returns an **lxml.etree._ElementTree** object etree.parse returns an **lxml.etree._ElementTree** object
Line 421: Line 421:
 result = etree.tostring(tree.getroot(), pretty_print=True, method="html") result = etree.tostring(tree.getroot(), pretty_print=True, method="html")
 print(result) print(result)
 +</code>
 +==== Build xml using Etree ====
 +  * Build xml using xml.etree.ElementTree:<code python>
 +from xml.etree import ElementTree as ET
 +'''
 +<?xml version="1.0"?>
 +<data>
 +    <country name="Liechtenstein">
 +        <rank>1</rank>
 +        <year>2008</year>
 +    </country>
 +    <country name="Singapore">
 +        <rank>4</rank>
 +        <year>2011</year>
 +    </country>
 +</data>
 +'''
 +data = ET.Element('data')
 +
 +country1 = ET.SubElement(data, 'country', {'name':'Liechtenstein'})
 +rank1 = ET.SubElement(country1, 'rank')
 +rank1.text = '1'
 +year1 = ET.SubElement(country1, 'year')
 +year1.text = '2008'
 +
 +country2 = ET.SubElement(data, 'country', {'name':'Singapore'})
 +rank2 = ET.SubElement(country2, 'rank')
 +rank2.text = '4'
 +year2 = ET.SubElement(country2, 'year')
 +year2.text = '2011'
 +print ET.tostring(data)
 +</code> output:<code>
 +<data><country name="Liechtenstein"><rank>1</rank><year>2008</year></country><country name="Singapore"><rank>4</rank><year>2011</year></country></data>
 +</code>
 +  * Build xml using lxml.etree:<code python>
 +from lxml import etree as ET
 +'''
 +<?xml version="1.0"?>
 +<data>
 +    <country name="Liechtenstein">
 +        <rank>1</rank>
 +        <year>2008</year>
 +    </country>
 +    <country name="Singapore">
 +        <rank>4</rank>
 +        <year>2011</year>
 +    </country>
 +</data>
 +'''
 +data = ET.Element('data')
 +
 +country1 = ET.SubElement(data, 'country', {'name':'Liechtenstein'})
 +rank1 = ET.SubElement(country1, 'rank')
 +rank1.text = '1'
 +year1 = ET.SubElement(country1, 'year')
 +year1.text = '2008'
 +
 +country2 = ET.SubElement(data, 'country', {'name':'Singapore'})
 +rank2 = ET.SubElement(country2, 'rank')
 +rank2.text = '4'
 +year2 = ET.SubElement(country2, 'year')
 +year2.text = '2011'
 +print ET.tostring(data)
 +</code> output: <code>
 +<data><country name="Liechtenstein"><rank>1</rank><year>2008</year></country><country name="Singapore"><rank>4</rank><year>2011</year></country></data>
 </code> </code>
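The two listings above repeat the same three calls for each country. A minimal sketch of the same build done in a loop follows. It assumes Python 3 syntax (the examples above are Python 2), and the ''countries'' list and the ''build_data'' helper are hypothetical names introduced only for illustration.

```python
from xml.etree import ElementTree as ET

# Hypothetical input data, mirroring the two countries used above.
countries = [
    {"name": "Liechtenstein", "rank": "1", "year": "2008"},
    {"name": "Singapore", "rank": "4", "year": "2011"},
]


def build_data(rows):
    """Return a <data> element with one <country> child per row."""
    data = ET.Element("data")
    for row in rows:
        country = ET.SubElement(data, "country", {"name": row["name"]})
        for tag in ("rank", "year"):
            child = ET.SubElement(country, tag)
            child.text = row[tag]
    return data


data = build_data(countries)
# encoding="unicode" makes tostring() return str instead of bytes in Python 3.
xml_text = ET.tostring(data, encoding="unicode")
print(xml_text)
```

The printed string should be the same single-line document shown in the outputs above. Note that ''tostring'' called this way does not emit the ''<?xml version="1.0"?>'' declaration.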
 ==== Custom Functions ==== ==== Custom Functions ====
Line 461: Line 526:
 tree.write('index.html', method = 'html') tree.write('index.html', method = 'html')
 </code> </code>
- 
 ===== re Package(Regular Expression) ===== ===== re Package(Regular Expression) =====
 +To use the re package, import it first:<code python>
 +import re
 +</code>
 ==== Regular Expression Language ==== ==== Regular Expression Language ====
 A regular expression (abbreviated regex or regexp) is a sequence of characters that forms a search pattern\\ A regular expression (abbreviated regex or regexp) is a sequence of characters that forms a search pattern\\
-refer: http://msdn.microsoft.com/en-us/library/az24scfc(v=vs.110).aspx+refer:  
 +  * http://msdn.microsoft.com/en-us/library/az24scfc(v=vs.110).aspx 
 +  * python: https://docs.python.org/2/library/re.html#regular-expression-syntax
  
 **Match Character** **Match Character**
Line 507: Line 576:
 </code> </code>
 === re.findall === === re.findall ===
-  * re.findall: The findall() is probably the single most powerful function in the re module<code python>+findall: The findall() is probably the single most powerful function in the re module 
 +  - Example 1: <code python>
 str = 'purple alice@google.com, blah monkey bob@abc.com blah dishwasher' str = 'purple alice@google.com, blah monkey bob@abc.com blah dishwasher'
  
Line 516: Line 586:
     # do something with each found email string     # do something with each found email string
     print email     print email
-</code>    <code python>+</code>How to read the pattern above:
 +  * [\w\.-]+ => one or more (sign: +) characters from the set in brackets (sign: []): a word character (sign: \w), a literal **.** (sign: \.), or a literal **-**
 +  * @[\w\.-]+ => followed by a literal @ and then one or more characters from the same set: [word character, **.** , **-**]
 +  - Example 2: <code python>
 # Open file # Open file
 f = open('test.txt', 'r') f = open('test.txt', 'r')
Line 573: Line 646:
 '\\w+' => 'The' '\\w+' => 'The'
 </code> </code>
-  * re.search and re.match the same:<code python>+  * re.search and re.match:<code python>
 import re import re
  
 +print '**********************************'
 text ="10/15/99" text ="10/15/99"
  
 +print "match1:"
 m = re.match("(\d{2})/(\d{2})/(\d{2,4})", text) m = re.match("(\d{2})/(\d{2})/(\d{2,4})", text)
 if m: if m:
     print m.group(1, 2, 3)     print m.group(1, 2, 3)
 +
 +print "search1:"
 s = re.search("(\d{2})/(\d{2})/(\d{2,4})", text) s = re.search("(\d{2})/(\d{2})/(\d{2,4})", text)
 if s: if s:
-    print s.group(1, 2, 3)+    print s.group(1, 2, 3)     
 + 
 +print '**********************************' 
 +text ="hello 10/15/99" 
 +print "match2:" 
 +m = re.match("(\d{2})/(\d{2})/(\d{2,4})", text) 
 +if m: 
 +    print m.group(1, 2, 3) 
 + 
 +print "search2:" 
 +s = re.search("(\d{2})/(\d{2})/(\d{2,4})", text) 
 +if s: 
 +    print s.group(1, 2, 3)     
 </code>output:<code> </code>output:<code>
 +**********************************
 +match1:
 ('10', '15', '99') ('10', '15', '99')
 +search1:
 +('10', '15', '99')
 +**********************************
 +match2:
 +search2:
 ('10', '15', '99') ('10', '15', '99')
 </code> </code>
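The output above shows the difference: re.match only tries the pattern at the start of the string, while re.search scans forward for the first position where it matches. A small sketch of that comparison follows, assuming Python 3 syntax and a raw-string pattern. ''DATE'' and ''try_both'' are hypothetical names used only here.

```python
import re

# Same date pattern as above, written as a raw string.
DATE = r"(\d{2})/(\d{2})/(\d{2,4})"


def try_both(text):
    """Return (match_groups, search_groups); None where nothing matched."""
    m = re.match(DATE, text)   # only tries the pattern at position 0
    s = re.search(DATE, text)  # scans forward for the first match
    return (m.groups() if m else None, s.groups() if s else None)


print(try_both("10/15/99"))
print(try_both("hello 10/15/99"))
```

For the second string the first element of the tuple is None, because the text does not start with a date.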
Line 595: Line 692:
 text2 = re.sub("cool", "good", text) text2 = re.sub("cool", "good", text)
 print text2 print text2
 +</code>output<code>
 +Python for beginner is a very good website
 </code> </code>
       * Here is another example (taken from Google's Python class) that finds every email address and rewrites it to keep the user (\1) but use yo-yo-dyne.com as the host.<code python>       * Here is another example (taken from Google's Python class) that finds every email address and rewrites it to keep the user (\1) but use yo-yo-dyne.com as the host.<code python>
Line 603: Line 702:
 ## \1 is group(1), \2 group(2) in the replacement ## \1 is group(1), \2 group(2) in the replacement
  
-print re.sub(r'([w.-]+)@([w.-]+)', r'\1@yo-yo-dyne.com', str) +print re.sub(r'([\w.-]+)@([\w.-]+)', r'\1@yo-yo-dyne.com', str)
 ## purple alice@yo-yo-dyne.com, blah monkey bob@yo-yo-dyne.com blah dishwasher ## purple alice@yo-yo-dyne.com, blah monkey bob@yo-yo-dyne.com blah dishwasher
 +</code>output:<code>
 +purple alice@yo-yo-dyne.com, blah monkey bob@yo-yo-dyne.com blah dishwasher
 </code> </code>
   * re.compile: With the re.compile() function we can compile a pattern into a pattern object, which has methods for operations such as searching for matches or performing string substitutions.    * re.compile: With the re.compile() function we can compile a pattern into a pattern object, which has methods for operations such as searching for matches or performing string substitutions. 
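A minimal sketch of a compiled pattern in use, assuming Python 3 syntax. The pattern is the email pattern from the findall example; the sample text, the ''EMAIL'' name and the ''<hidden>'' replacement are made up for illustration.

```python
import re

# Compile once, then reuse the pattern object for several operations.
EMAIL = re.compile(r"[\w.-]+@[\w.-]+")

# Hypothetical sample text.
text = "contact: alice@example.com, bob@example.org"

found = EMAIL.findall(text)           # list of every match
first = EMAIL.search(text)            # match object for the first hit, or None
masked = EMAIL.sub("<hidden>", text)  # replace every match

print(found)
print(first.group())
print(masked)
```

The pattern object offers the same operations as the module-level functions (search, match, findall, sub), without passing the pattern string on every call.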
python/twistedlxmlandre.1406367625.txt.gz · Last modified: 2022/10/29 16:15 (external edit)