BaseTools:ECC report errors on account of analyze special characters

[edk2-devel] [PATCH] BaseTools:ECC report errors on account of analyze special characters

Posted by Fan, ZhijuX 6 years, 9 months ago

BZ:https://bugzilla.tianocore.org/show_bug.cgi?id=1751

In case that a C function body contains the string of L'', L'\"', 
L"\"", L''', L""", L"\"\"", L"\"^", L" \"", L"\" \"", ('L",\\\""') 
ECC tool running under python3 interpreter will report error. 
The antlr4 module misidentified this character

This patch is going to fix that issue.

Cc: Bob Feng <bob.c.feng@intel.com>
Cc: Liming Gao <liming.gao@intel.com>
Signed-off-by: Zhiju.Fan <zhijux.fan@intel.com>
---
 BaseTools/Source/Python/Ecc/CodeFragmentCollector.py | 5 ++++-
 BaseTools/Source/Python/Ecc/Configuration.py         | 5 +++++
 BaseTools/Source/Python/Ecc/c.py                     | 3 +++
 BaseTools/Source/Python/Ecc/config.ini               | 2 ++
 4 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/BaseTools/Source/Python/Ecc/CodeFragmentCollector.py b/BaseTools/Source/Python/Ecc/CodeFragmentCollector.py
index f844b4a0b3..589e8d91e6 100644
--- a/BaseTools/Source/Python/Ecc/CodeFragmentCollector.py
+++ b/BaseTools/Source/Python/Ecc/CodeFragmentCollector.py
@@ -79,7 +79,7 @@ class CodeFragmentCollector:
         self.FileName = FileName
         self.CurrentLineNumber = 1
         self.CurrentOffsetWithinLine = 0
-
+        self.TokenReleaceList = []
         self.__Token = ""
         self.__SkippedChars = ""
 
@@ -509,6 +509,9 @@ class CodeFragmentCollector:
         FileStringContents = ''
         for fileLine in self.Profile.FileLinesList:
             FileStringContents += fileLine
+        for Token in self.TokenReleaceList:
+            if Token in FileStringContents:
+                FileStringContents = FileStringContents.replace(Token, 'TOKENSTRING')
         cStream = antlr.InputStream(FileStringContents)
         lexer = CLexer(cStream)
         tStream = antlr.CommonTokenStream(lexer)
diff --git a/BaseTools/Source/Python/Ecc/Configuration.py b/BaseTools/Source/Python/Ecc/Configuration.py
index 66c8dd7880..9ebd130c31 100644
--- a/BaseTools/Source/Python/Ecc/Configuration.py
+++ b/BaseTools/Source/Python/Ecc/Configuration.py
@@ -120,6 +120,7 @@ _ConfigFileToInternalTranslation = {
     "SmmCommParaCheckBufferType":"SmmCommParaCheckBufferType",
     "SpaceCheckAll":"SpaceCheckAll",
     "SpellingCheckAll":"SpellingCheckAll",
+    "TokenReleaceList":"TokenReleaceList",
     "UniCheckAll":"UniCheckAll",
     "UniCheckHelpInfo":"UniCheckHelpInfo",
     "UniCheckPCDInfo":"UniCheckPCDInfo",
@@ -395,6 +396,8 @@ class Configuration(object):
         # A list for Copyright format
         self.Copyright = []
 
+        self.TokenReleaceList = []
+
         self.ParseConfig()
 
     def ParseConfig(self):
@@ -425,6 +428,8 @@ class Configuration(object):
                     List[1] = GetSplitValueList(List[1], TAB_COMMA_SPLIT)
                 if List[0] == 'Copyright':
                     List[1] = GetSplitValueList(List[1], TAB_COMMA_SPLIT)
+                if List[0] == 'TokenReleaceList':
+                    List[1] = GetSplitValueList(List[1], TAB_COMMA_SPLIT)
                 self.__dict__[_ConfigFileToInternalTranslation[List[0]]] = List[1]
 
     def ShowMe(self):
diff --git a/BaseTools/Source/Python/Ecc/c.py b/BaseTools/Source/Python/Ecc/c.py
index 7b645ff053..75fe4544a1 100644
--- a/BaseTools/Source/Python/Ecc/c.py
+++ b/BaseTools/Source/Python/Ecc/c.py
@@ -501,6 +501,8 @@ def CollectSourceCodeDataIntoDB(RootDir):
     tuple = os.walk(RootDir)
     IgnoredPattern = GetIgnoredDirListPattern()
     ParseErrorFileList = []
+    TokenReleaceList = EccGlobalData.gConfig.TokenReleaceList
+    TokenReleaceList.extend(['L",\\\""'])
 
     for dirpath, dirnames, filenames in tuple:
         if IgnoredPattern.match(dirpath.upper()):
@@ -525,6 +527,7 @@ def CollectSourceCodeDataIntoDB(RootDir):
                 EdkLogger.info("Parsing " + FullName)
                 model = f.endswith('c') and DataClass.MODEL_FILE_C or DataClass.MODEL_FILE_H
                 collector = CodeFragmentCollector.CodeFragmentCollector(FullName)
+                collector.TokenReleaceList = TokenReleaceList
                 try:
                     collector.ParseFile()
                 except UnicodeError:
diff --git a/BaseTools/Source/Python/Ecc/config.ini b/BaseTools/Source/Python/Ecc/config.ini
index 00c98c6232..cdd294280e 100644
--- a/BaseTools/Source/Python/Ecc/config.ini
+++ b/BaseTools/Source/Python/Ecc/config.ini
@@ -283,3 +283,5 @@ SmmCommParaCheckBufferType = 1
 BinaryExtList = EXE, EFI, FV, ROM, DLL, COM, BMP, GIF, PYD, CMP, BIN, JPG, UNI, RAW, COM2, LIB, DEPEX, SYS, DB
 # A list for only scanning dirs, the dirs should be the top folder(s) under workspace
 ScanOnlyDirList = ScanFolder1 ScanFolder2
+# A list for Used to circumvent special strings
+TokenReleaceList = L'', L'\"', L"\"", L''', L""", L"\"\"", L"\"^", L" \"", L"\" \""
-- 
2.14.1.windows.1
GitPatchExtractor 1.1

-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.

View/Reply Online (#40017): https://edk2.groups.io/g/devel/message/40017
Mute This Topic: https://groups.io/mt/31515947/1787277
Group Owner: devel+owner@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub  [importer@patchew.org]
-=-=-=-=-=-=-=-=-=-=-=-

Re: [edk2-devel] [PATCH] BaseTools:ECC report errors on account of analyze special characters

Posted by Bob Feng 6 years, 9 months ago

Reviewed-by: Bob Feng<bob.c.feng@intel.com>

-----Original Message-----
From: Fan, ZhijuX 
Sent: Monday, May 6, 2019 10:35 AM
To: devel@edk2.groups.io
Cc: Gao, Liming <liming.gao@intel.com>; Feng, Bob C <bob.c.feng@intel.com>
Subject: [PATCH] BaseTools:ECC report errors on account of analyze special characters

BZ:https://bugzilla.tianocore.org/show_bug.cgi?id=1751

In case that a C function body contains the string of L'', L'\"', L"\"", L''', L""", L"\"\"", L"\"^", L" \"", L"\" \"", ('L",\\\""') ECC tool running under python3 interpreter will report error. 
The antlr4 module misidentified this character

This patch is going to fix that issue.

Cc: Bob Feng <bob.c.feng@intel.com>
Cc: Liming Gao <liming.gao@intel.com>
Signed-off-by: Zhiju.Fan <zhijux.fan@intel.com>
---
 BaseTools/Source/Python/Ecc/CodeFragmentCollector.py | 5 ++++-
 BaseTools/Source/Python/Ecc/Configuration.py         | 5 +++++
 BaseTools/Source/Python/Ecc/c.py                     | 3 +++
 BaseTools/Source/Python/Ecc/config.ini               | 2 ++
 4 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/BaseTools/Source/Python/Ecc/CodeFragmentCollector.py b/BaseTools/Source/Python/Ecc/CodeFragmentCollector.py
index f844b4a0b3..589e8d91e6 100644
--- a/BaseTools/Source/Python/Ecc/CodeFragmentCollector.py
+++ b/BaseTools/Source/Python/Ecc/CodeFragmentCollector.py
@@ -79,7 +79,7 @@ class CodeFragmentCollector:
         self.FileName = FileName
         self.CurrentLineNumber = 1
         self.CurrentOffsetWithinLine = 0
-
+        self.TokenReleaceList = []
         self.__Token = ""
         self.__SkippedChars = ""
 
@@ -509,6 +509,9 @@ class CodeFragmentCollector:
         FileStringContents = ''
         for fileLine in self.Profile.FileLinesList:
             FileStringContents += fileLine
+        for Token in self.TokenReleaceList:
+            if Token in FileStringContents:
+                FileStringContents = FileStringContents.replace(Token, 
+ 'TOKENSTRING')
         cStream = antlr.InputStream(FileStringContents)
         lexer = CLexer(cStream)
         tStream = antlr.CommonTokenStream(lexer) diff --git a/BaseTools/Source/Python/Ecc/Configuration.py b/BaseTools/Source/Python/Ecc/Configuration.py
index 66c8dd7880..9ebd130c31 100644
--- a/BaseTools/Source/Python/Ecc/Configuration.py
+++ b/BaseTools/Source/Python/Ecc/Configuration.py
@@ -120,6 +120,7 @@ _ConfigFileToInternalTranslation = {
     "SmmCommParaCheckBufferType":"SmmCommParaCheckBufferType",
     "SpaceCheckAll":"SpaceCheckAll",
     "SpellingCheckAll":"SpellingCheckAll",
+    "TokenReleaceList":"TokenReleaceList",
     "UniCheckAll":"UniCheckAll",
     "UniCheckHelpInfo":"UniCheckHelpInfo",
     "UniCheckPCDInfo":"UniCheckPCDInfo",
@@ -395,6 +396,8 @@ class Configuration(object):
         # A list for Copyright format
         self.Copyright = []
 
+        self.TokenReleaceList = []
+
         self.ParseConfig()
 
     def ParseConfig(self):
@@ -425,6 +428,8 @@ class Configuration(object):
                     List[1] = GetSplitValueList(List[1], TAB_COMMA_SPLIT)
                 if List[0] == 'Copyright':
                     List[1] = GetSplitValueList(List[1], TAB_COMMA_SPLIT)
+                if List[0] == 'TokenReleaceList':
+                    List[1] = GetSplitValueList(List[1], 
+ TAB_COMMA_SPLIT)
                 self.__dict__[_ConfigFileToInternalTranslation[List[0]]] = List[1]
 
     def ShowMe(self):
diff --git a/BaseTools/Source/Python/Ecc/c.py b/BaseTools/Source/Python/Ecc/c.py
index 7b645ff053..75fe4544a1 100644
--- a/BaseTools/Source/Python/Ecc/c.py
+++ b/BaseTools/Source/Python/Ecc/c.py
@@ -501,6 +501,8 @@ def CollectSourceCodeDataIntoDB(RootDir):
     tuple = os.walk(RootDir)
     IgnoredPattern = GetIgnoredDirListPattern()
     ParseErrorFileList = []
+    TokenReleaceList = EccGlobalData.gConfig.TokenReleaceList
+    TokenReleaceList.extend(['L",\\\""'])
 
     for dirpath, dirnames, filenames in tuple:
         if IgnoredPattern.match(dirpath.upper()):
@@ -525,6 +527,7 @@ def CollectSourceCodeDataIntoDB(RootDir):
                 EdkLogger.info("Parsing " + FullName)
                 model = f.endswith('c') and DataClass.MODEL_FILE_C or DataClass.MODEL_FILE_H
                 collector = CodeFragmentCollector.CodeFragmentCollector(FullName)
+                collector.TokenReleaceList = TokenReleaceList
                 try:
                     collector.ParseFile()
                 except UnicodeError:
diff --git a/BaseTools/Source/Python/Ecc/config.ini b/BaseTools/Source/Python/Ecc/config.ini
index 00c98c6232..cdd294280e 100644
--- a/BaseTools/Source/Python/Ecc/config.ini
+++ b/BaseTools/Source/Python/Ecc/config.ini
@@ -283,3 +283,5 @@ SmmCommParaCheckBufferType = 1  BinaryExtList = EXE, EFI, FV, ROM, DLL, COM, BMP, GIF, PYD, CMP, BIN, JPG, UNI, RAW, COM2, LIB, DEPEX, SYS, DB  # A list for only scanning dirs, the dirs should be the top folder(s) under workspace  ScanOnlyDirList = ScanFolder1 ScanFolder2
+# A list for Used to circumvent special strings TokenReleaceList = L'', 
+L'\"', L"\"", L''', L""", L"\"\"", L"\"^", L" \"", L"\" \""
--
2.14.1.windows.1
GitPatchExtractor 1.1

-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.

View/Reply Online (#40060): https://edk2.groups.io/g/devel/message/40060
Mute This Topic: https://groups.io/mt/31515947/1787277
Group Owner: devel+owner@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub  [importer@patchew.org]
-=-=-=-=-=-=-=-=-=-=-=-