Differences

--- python:twistedlxmlandre [2014/07/26 09:57] – [re.search, re.match] admin
+++ python:twistedlxmlandre [2015/10/27 00:08] – [Regular Expression Language] admin
@@ Line 240: / Line 240: @@
   * ElementTree: https://docs.python.org/2/library/xml.etree.elementtree.html#elementtree-objects
   * HTMLElement: https://docs.python.org/2/library/xml.etree.elementtree.html#element-objects
-==== Create Etree from xml and html ====
+==== Parsing xml and html to Etree Object ====
 refer: http://lxml.de/parsing.html\\
 etree.parse returns an **lxml.etree._ElementTree** object
@@ Line 421: / Line 421: @@
 result = etree.tostring(tree.getroot(), pretty_print=True, method="html")
 print(result)
+</code>
+==== Build xml using Etree ====
+  * Build xml using xml.etree.ElementTree:<code python>
+from xml.etree import ElementTree as ET
+'''
+<?xml version="1.0"?>
+<data>
+    <country name="Liechtenstein">
+        <rank>1</rank>
+        <year>2008</year>
+    </country>
+    <country name="Singapore">
+        <rank>4</rank>
+        <year>2011</year>
+    </country>
+</data>
+'''
+data = ET.Element('data')
+country1 = ET.SubElement(data, 'country', {'name':'Liechtenstein'})
+rank1 = ET.SubElement(country1, 'rank')
+rank1.text = '1'
+year1 = ET.SubElement(country1, 'year')
+year1.text = '2008'
+country2 = ET.SubElement(data, 'country', {'name':'Singapore'})
+rank2 = ET.SubElement(country2, 'rank')
+rank2.text = '4'
+year2 = ET.SubElement(country2, 'year')
+year2.text = '2011'
+print ET.tostring(data)
+</code> output:<code>
+<data><country name="Liechtenstein"><rank>1</rank><year>2008</year></country><country name="Singapore"><rank>4</rank><year>2011</year></country></data>
+</code>
+  * Build xml using lxml.etree:<code python>
+from lxml import etree as ET
+'''
+<?xml version="1.0"?>
+<data>
+    <country name="Liechtenstein">
+        <rank>1</rank>
+        <year>2008</year>
+    </country>
+    <country name="Singapore">
+        <rank>4</rank>
+        <year>2011</year>
+    </country>
+</data>
+'''
+data = ET.Element('data')
+country1 = ET.SubElement(data, 'country', {'name':'Liechtenstein'})
+rank1 = ET.SubElement(country1, 'rank')
+rank1.text = '1'
+year1 = ET.SubElement(country1, 'year')
+year1.text = '2008'
+country2 = ET.SubElement(data, 'country', {'name':'Singapore'})
+rank2 = ET.SubElement(country2, 'rank')
+rank2.text = '4'
+year2 = ET.SubElement(country2, 'year')
+year2.text = '2011'
+print ET.tostring(data)
+</code> output: <code>
+<data><country name="Liechtenstein"><rank>1</rank><year>2008</year></country><country name="Singapore"><rank>4</rank><year>2011</year></country></data>
 </code>
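Both examples above repeat the same SubElement calls for every country. A minimal sketch, assuming the rows are already available as a Python list (the names countries and build_data are hypothetical, for illustration only), builds the same tree in a loop with the standard xml.etree.ElementTree module. tostring() returns bytes on Python 3, so the result is decoded with decode('utf-8') to get text on both Python 2 and Python 3:<code python>
from xml.etree import ElementTree as ET

# Hypothetical input rows: (country name, rank, year)
countries = [
    ('Liechtenstein', '1', '2008'),
    ('Singapore', '4', '2011'),
]

def build_data(rows):
    """Build a <data> root with one <country> child per row."""
    data = ET.Element('data')
    for name, rank, year in rows:
        country = ET.SubElement(data, 'country', {'name': name})
        # SubElement returns the new child, so its text can be set directly
        ET.SubElement(country, 'rank').text = rank
        ET.SubElement(country, 'year').text = year
    return data

xml_text = ET.tostring(build_data(countries)).decode('utf-8')
print(xml_text)
</code>
It prints the same <data> string as the hand-written examples above.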
 ==== Custom Functions ====
@@ Line 461: / Line 526: @@
 tree.write('index.html', method = 'html')
 </code>
 ===== re Package(Regular Expression) =====
+To use the re package, import it first:<code python>
+import re
+</code>
 ==== Regular Expression Language ====
 A regular expression (abbreviated regex or regexp) is a sequence of characters that forms a search pattern.\\
-refer: http://msdn.microsoft.com/en-us/library/az24scfc(v=vs.110).aspx
+refer:
+  * http://msdn.microsoft.com/en-us/library/az24scfc(v=vs.110).aspx
+  * Python: https://docs.python.org/2/library/re.html#regular-expression-syntax
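A small sketch of a pattern acting as a search pattern; the sample sentence is made up for illustration. \d matches one digit, + means "one or more", and the parentheses capture a group:<code python>
import re

text = 'I bought 12 apples and 3 pears'

# Search for one or more digits followed by " apples"
m = re.search(r'(\d+) apples', text)
if m:
    print(m.group(0))  # whole match: 12 apples
    print(m.group(1))  # captured group only: 12

# The same digit pattern without context finds every number
numbers = re.findall(r'\d+', text)
print(numbers)  # ['12', '3']
</code>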
 **Match Character**
@@ Line 507: / Line 576: @@
 </code>
 === re.findall ===
-  * re.findall: The findall() is probably the single most powerful function in the re module<code python>
+re.findall(): findall() is probably the single most powerful function in the re module; it returns a list of all non-overlapping matches of the pattern in the string.
+  - Example 1: <code python>
 str = 'purple alice@google.com, blah monkey bob@abc.com blah dishwasher'
@@ Line 516: / Line 586: @@
     # do something with each found email string
     print email
-</code>    <code python>
+</code>Understanding the pattern syntax above:
+  * [\w\.-]+ => begins with one or more characters (sign: +) from the character set (sign: []): a word character (sign: \w), a literal **.** (sign: \.), or a literal **-**
+  * @[\w\.-]+ => followed by the character @ and then one or more characters from the same set (word character, **.** or **-**)
+  - Example 2: <code python>
 # Open file
 f = open('test.txt', 'r')
@@ Line 619: / Line 692: @@
 text2 = re.sub("cool", "good", text)
 print text2
+</code>output:<code>
+Python for beginner is a very good website
 </code>
       * Here is another example (taken from Google's Python Class) which searches for all the email addresses and changes them to keep the user (\1) but use yo-yo-dyne.com as the host.<code python>
@@ Line 627: / Line 702: @@
 ## \1 is group(1), \2 is group(2) in the replacement
-print re.sub(r'([w.-]+)@([w.-]+)', r'1@yo-yo-dyne.com', str)
+print re.sub(r'([\w.-]+)@([\w.-]+)', r'\1@yo-yo-dyne.com', str)
 ## purple alice@yo-yo-dyne.com, blah monkey bob@yo-yo-dyne.com blah dishwasher
+</code>output:<code>
+purple alice@yo-yo-dyne.com, blah monkey bob@yo-yo-dyne.com blah dishwasher
 </code>
   * re.compile: With the re.compile() function we can compile a pattern into a pattern object, which has methods for various operations such as searching for pattern matches or performing string substitutions.
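A minimal sketch of re.compile(), reusing the e-mail pattern and sample string from the findall and sub examples (the names email_re, text, pairs and replaced are only illustrative). The compiled object offers findall(), sub() and search() as methods, so the pattern is written once. Because this pattern has two groups, findall() returns (user, host) tuples instead of whole matches:<code python>
import re

# E-mail pattern with two groups: user part and host part
email_re = re.compile(r'([\w.-]+)@([\w.-]+)')

text = 'purple alice@google.com, blah monkey bob@abc.com blah dishwasher'

# findall() on a pattern with groups returns a list of group tuples
pairs = email_re.findall(text)
print(pairs)  # [('alice', 'google.com'), ('bob', 'abc.com')]

# sub() keeps the user (\1) and replaces the host
replaced = email_re.sub(r'\1@yo-yo-dyne.com', text)
print(replaced)

# search() returns a match object for the first hit, or None
m = email_re.search(text)
if m:
    print(m.group())  # alice@google.com
</code>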