--- python:twistedlxmlandre [2014/07/26 09:40] – [re.search, re.match] admin
+++ python:twistedlxmlandre [2022/10/29 16:15] (current) – external edit 127.0.0.1
@@ Line 240: / Line 240: @@
   * ElementTree:https://docs.python.org/2/library/xml.etree.elementtree.html#elementtree-objects
   * HTMLElement:https://docs.python.org/2/library/xml.etree.elementtree.html#element-objects
-==== Create Etree from xml and html ====
+==== Parsing XML and HTML into an Etree Object ====
 refer: http://lxml.de/parsing.html\\
 etree.parse returns an **lxml.etree._ElementTree** object
@@ Line 421: / Line 421: @@
 result = etree.tostring(tree.getroot(), pretty_print=True, method="html")
 print(result)
+</code>
+==== Build XML using Etree ====
+  * Build XML using xml.etree.ElementTree:<code python>
+from xml.etree import ElementTree as ET
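+# Target document (bare string literal used as a comment). tostring() below
+# emits it without the XML declaration or indentation.
+'''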
+<?xml version="1.0"?>
+<data>
+    <country name="Liechtenstein">
+        <rank>1</rank>
+        <year>2008</year>
+    </country>
+    <country name="Singapore">
+        <rank>4</rank>
+        <year>2011</year>
+    </country>
+</data>
+'''
+data = ET.Element('data')
+country1 = ET.SubElement(data, 'country', {'name':'Liechtenstein'})
+rank1 = ET.SubElement(country1, 'rank')
+rank1.text = '1'
+year1 = ET.SubElement(country1, 'year')
+year1.text = '2008'
+country2 = ET.SubElement(data, 'country', {'name':'Singapore'})
+rank2 = ET.SubElement(country2, 'rank')
+rank2.text = '4'
+year2 = ET.SubElement(country2, 'year')
+year2.text = '2011'
+# tostring() returns bytes on Python 3; decode() keeps the printed output identical
+print(ET.tostring(data).decode())
+</code> output:<code>
+<data><country name="Liechtenstein"><rank>1</rank><year>2008</year></country><country name="Singapore"><rank>4</rank><year>2011</year></country></data>
+</code>
+  * Build XML using lxml.etree:<code python>
+from lxml import etree as ET
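+# Target document (bare string literal used as a comment). tostring() below
+# emits it without the XML declaration or indentation.
+'''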
+<?xml version="1.0"?>
+<data>
+    <country name="Liechtenstein">
+        <rank>1</rank>
+        <year>2008</year>
+    </country>
+    <country name="Singapore">
+        <rank>4</rank>
+        <year>2011</year>
+    </country>
+</data>
+'''
+data = ET.Element('data')
+country1 = ET.SubElement(data, 'country', {'name':'Liechtenstein'})
+rank1 = ET.SubElement(country1, 'rank')
+rank1.text = '1'
+year1 = ET.SubElement(country1, 'year')
+year1.text = '2008'
+country2 = ET.SubElement(data, 'country', {'name':'Singapore'})
+rank2 = ET.SubElement(country2, 'rank')
+rank2.text = '4'
+year2 = ET.SubElement(country2, 'year')
+year2.text = '2011'
+# tostring() returns bytes on Python 3; decode() keeps the printed output identical
+print(ET.tostring(data).decode())
+</code> output: <code>
+<data><country name="Liechtenstein"><rank>1</rank><year>2008</year></country><country name="Singapore"><rank>4</rank><year>2011</year></country></data>
 </code>
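Both examples above create every element by hand. When the values already sit in a list, the same tree can be built in a loop. The following is only a sketch: the names COUNTRIES and build_data are made up for illustration and are not part of either library.

```python
from xml.etree import ElementTree as ET

# Hypothetical input records; the values mirror the sample document above.
COUNTRIES = [
    {'name': 'Liechtenstein', 'rank': '1', 'year': '2008'},
    {'name': 'Singapore', 'rank': '4', 'year': '2011'},
]


def build_data(countries):
    """Return a <data> element with one <country> child per record."""
    data = ET.Element('data')
    for record in countries:
        # The attribute dict becomes name="..." on the <country> element
        country = ET.SubElement(data, 'country', {'name': record['name']})
        for tag in ('rank', 'year'):
            child = ET.SubElement(country, tag)
            child.text = record[tag]
    return data


root = build_data(COUNTRIES)
# tostring() returns bytes on Python 3, so decode before printing
xml_text = ET.tostring(root).decode('ascii')
print(xml_text)
```

Swapping the import for `from lxml import etree as ET` should work the same way, since lxml.etree follows the ElementTree API for Element, SubElement and tostring.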
 ==== Custom Functions ====
@@ Line 461: / Line 526: @@
 tree.write('index.html', method = 'html')
 </code>
 ===== re Package(Regular Expression) =====
+To use the re package, import it first:<code python>
+import re
+</code>
 ==== Regular Expression Language ====
 A regular expression (abbreviated regex or regexp) is a sequence of characters that forms a search pattern\\
-refer: http://msdn.microsoft.com/en-us/library/az24scfc(v=vs.110).aspx
+refer:
+  * http://msdn.microsoft.com/en-us/library/az24scfc(v=vs.110).aspx
+  * python: https://docs.python.org/2/library/re.html#regular-expression-syntax
 **Match Character**
@@ Line 507: / Line 576: @@
 </code>
 === re.findall ===
-  * re.findall: The findall() is probably the single most powerful function in the re module<code python>
+findall: findall() returns every non-overlapping match of a pattern in a string as a list, which makes it one of the most useful functions in the re module
+  - Example 1: <code python>
 str = 'purple alice@google.com, blah monkey bob@abc.com blah dishwasher'
@@ Line 516: / Line 586: @@
     # do something with each found email string
     print(email)
-</code>    <code python>
+</code>How to read the pattern above:
+  * [\w\.-]+ => one or more (+) characters from the set ([]): a word character (\w), a literal **.** (\.), or a literal **-**
+  * @[\w\.-]+ => followed by a literal @ and then one or more characters from the same set: word character, **.**, or **-**
+  - Example 2: <code python>
 # Open file
 f = open('test.txt', 'r')
@@ Line 573: / Line 646: @@
 '\\w+' => 'The'
 </code>
-  * re.search and re.match the same:<code python>
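+  * re.search and re.match: re.match only matches at the beginning of the string, while re.search scans the whole string for the first match:<code python>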
 import re
+print('**********************************')
 text = "10/15/99"
+print("match1:")
 m = re.match(r"(\d{2})/(\d{2})/(\d{2,4})", text)
 if m:
     print(m.group(1, 2, 3))
+print("search1:")
 s = re.search(r"(\d{2})/(\d{2})/(\d{2,4})", text)
 if s:
     print(s.group(1, 2, 3))
+print('**********************************')
+text = "hello 10/15/99"
+print("match2:")
+m = re.match(r"(\d{2})/(\d{2})/(\d{2,4})", text)
+if m:
+    print(m.group(1, 2, 3))
+print("search2:")
+s = re.search(r"(\d{2})/(\d{2})/(\d{2,4})", text)
+if s:
+    print(s.group(1, 2, 3))
 </code>output:<code>
+**********************************
+match1:
 ('10', '15', '99')
+search1:
+('10', '15', '99')
+**********************************
+match2:
+search2:
 ('10', '15', '99')
 </code>
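The difference shown in the output above can be wrapped in two small helpers. This is a sketch only: DATE_PATTERN, leading_date and first_date are illustrative names, not part of the re module. The last line shows that anchoring re.search with \A gives the same result as re.match for this input.

```python
import re

# Same date pattern as in the example above, written as a raw string
DATE_PATTERN = r"(\d{2})/(\d{2})/(\d{2,4})"


def leading_date(text):
    """Return the date groups only if text starts with a date, else None."""
    found = re.match(DATE_PATTERN, text)
    return found.groups() if found else None


def first_date(text):
    """Return the groups of the first date found anywhere in text, else None."""
    found = re.search(DATE_PATTERN, text)
    return found.groups() if found else None


print(leading_date("10/15/99"))        # ('10', '15', '99')
print(leading_date("hello 10/15/99"))  # None
print(first_date("hello 10/15/99"))    # ('10', '15', '99')

# search anchored at the start of the string behaves like match here
anchored = re.search(r"\A" + DATE_PATTERN, "hello 10/15/99")
print(anchored)                        # None
```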
@@ Line 595: / Line 692: @@
 text2 = re.sub("cool", "good", text)
 print(text2)
+</code>output:<code>
+Python for beginner is a very good website
 </code>
       * Here is another example (taken from Google's Python Class) which searches for all the email addresses and rewrites them to keep the user (\1) but use yo-yo-dyne.com as the host.<code python>
@@ Line 603: / Line 702: @@
 ## \1 is group(1), \2 is group(2) in the replacement
-print re.sub(r'([w.-]+)@([w.-]+)', r'[email protected]', str)
+print(re.sub(r'([\w.-]+)@([\w.-]+)', r'\1@yo-yo-dyne.com', str))
 ## purple alice@yo-yo-dyne.com, blah monkey bob@yo-yo-dyne.com blah dishwasher
+</code>output:<code>
+purple alice@yo-yo-dyne.com, blah monkey bob@yo-yo-dyne.com blah dishwasher
 </code>
   * re.compile: With the re.compile() function we can compile a pattern into a pattern object, which has methods for operations such as searching for pattern matches or performing string substitutions.
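A minimal sketch of a compiled pattern object in use. The sample string and the names email_re, first, users and rewritten are assumptions chosen for illustration; search(), finditer() and sub() are methods of the compiled pattern object.

```python
import re

# Compile once, then reuse the pattern object's methods
email_re = re.compile(r'([\w.-]+)@([\w.-]+)')

# Assumed sample text, for illustration only
text = 'purple alice@google.com, blah monkey bob@abc.com blah dishwasher'

# search(): first match anywhere in the string
first = email_re.search(text)
print(first.group(0))   # alice@google.com

# finditer(): one match object per email; group(1) is the user part
users = [m.group(1) for m in email_re.finditer(text)]
print(users)            # ['alice', 'bob']

# sub(): keep the user (\1) and replace the host
rewritten = email_re.sub(r'\1@yo-yo-dyne.com', text)
print(rewritten)
```

Compiling mainly pays off when the same pattern is applied many times, for example inside a loop over the lines of a file.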