[edk2] [RFC] Formalize source files to follow DOS format

Liming Gao posted 1 patch 5 years, 11 months ago
There is a newer version of this series
[edk2] [RFC] Formalize source files to follow DOS format
Posted by Liming Gao 5 years, 11 months ago
FormatDosFiles.py is added to clean up DOS-format source files. It is based on
the rules defined in the EDKII C Coding Standards Specification:
5.1.2 Do not use tab characters
5.1.6 Only use CRLF (Carriage Return Line Feed) line endings.
5.1.7 All files must end with CRLF
No trailing white space in one line. (To be added in spec)

The source files in the edk2 project with the following extensions are in DOS format:
.h .c .nasm .nasmb .asm .S .inf .dec .dsc .fdf .uni .asl .aslc .vfr .idf
.txt .bat .py

A package maintainer can use this script to clean up all files in their
package. The preferred way is to create one patch per package.

Contributed-under: TianoCore Contribution Agreement 1.1
Signed-off-by: Liming Gao <liming.gao@intel.com>
---
 BaseTools/Scripts/FormatDosFiles.py | 93 +++++++++++++++++++++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 BaseTools/Scripts/FormatDosFiles.py

diff --git a/BaseTools/Scripts/FormatDosFiles.py b/BaseTools/Scripts/FormatDosFiles.py
new file mode 100644
index 0000000..c3a5476
--- /dev/null
+++ b/BaseTools/Scripts/FormatDosFiles.py
@@ -0,0 +1,93 @@
+# @file FormatDosFiles.py
+# This script formats the source files to follow DOS style.
+# It supports both Python 2.x and Python 3.x.
+#
+#  Copyright (c) 2018, Intel Corporation. All rights reserved.<BR>
+#
+#  This program and the accompanying materials
+#  are licensed and made available under the terms and conditions of the BSD License
+#  which accompanies this distribution.  The full text of the license may be found at
+#  http://opensource.org/licenses/bsd-license.php
+#
+#  THE PROGRAM IS DISTRIBUTED UNDER THE BSD LICENSE ON AN "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR REPRESENTATIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED.
+#
+
+#
+# Import Modules
+#
+import argparse
+import os
+import os.path
+import re
+import sys
+
+"""
+Differences in string handling between Python 2 and Python 3:
+
+String types differ substantially between Python 2 and Python 3.
+
+In Python 2, there are two string types: unicode strings (unicode type) and 8-bit strings (str type).
+    us = u"abcd"
+    is a unicode string, which is stored internally as Unicode code points.
+    s = "abcd", s = b"abcd", s = r"abcd"
+    are all 8-bit strings, which are stored internally as bytes.
+
+In Python 3, a new type called bytes replaces the 8-bit string, and the str type is a unicode string.
+    s = "abcd", s = u"abcd", s = r"abcd"
+    are all of str type, which is stored internally as Unicode code points.
+    bs = b"abcd"
+    is of bytes type, which is stored internally as bytes.
+
+In Python 2 the two string types can be mixed, but in Python 3 they cannot,
+which means the pattern and the content passed to re must be of the same type in Python 3.
+FormatFiles reads each file in binary mode, so the content is of bytes type and the pattern must also be bytes.
+As a result, encode() is applied to each pattern to keep the script compatible with both Python 2 and Python 3.
+
+Differences in encode and decode between Python 2 and Python 3:
+The encode(encoding) and decode(encoding) methods convert between 8-bit strings and unicode strings.
+
+In Python 2:
+    encode converts the unicode type to the str type, and decode does the reverse. The default encoding is ASCII.
+    For example: s = us.encode()
+    If us is already of str type, the call still works: it is first converted to the unicode type,
+    so in this situation the call is equivalent to s = us.decode().encode().
+
+In Python 3:
+    encode converts the str type to the bytes type, and decode does the reverse. The default encoding is UTF-8.
+    For example:
+    bs = s.encode(). Only the str type has an encode method, so it cannot be misused; likewise only bytes has decode.
+
+In conclusion:
+    This code behaves the same under Python 2.7 and Python 3.6 as long as the re patterns contain only ASCII characters.
+
+"""
+def FormatFiles():
+    parser = argparse.ArgumentParser()
+    parser.add_argument('path', nargs=1, help='The path for files to be converted.')
+    parser.add_argument('extensions', nargs='+', help='File extensions filter. (Example: .txt .c .h)')
+    args = parser.parse_args()
+    filelist = []
+    for dirpath, dirnames, filenames in os.walk(args.path[0]):
+        for filename in [f for f in filenames if any(f.endswith(ext) for ext in args.extensions)]:
+            filelist.append(os.path.join(dirpath, filename))
+    for file in filelist:
+        fd = open(file, 'rb')
+        content = fd.read()
+        fd.close()
+        # Convert the line endings to CRLF
+        content = re.sub(r'([^\r])\n'.encode(), r'\1\r\n'.encode(), content)
+        content = re.sub(r'^\n'.encode(), r'\r\n'.encode(), content, flags = re.MULTILINE)
+        # Append a CRLF if the file does not already end with a line ending
+        content = re.sub(r'([^\r\n])$'.encode(), r'\1\r\n'.encode(), content)
+        # Remove trailing white spaces
+        content = re.sub(r'[ \t]+(\r\n)'.encode(), r'\1'.encode(), content, flags = re.MULTILINE)
+        # Replace '\t' with two spaces
+        content = re.sub('\t'.encode(), '  '.encode(), content)
+        fd = open(file, 'wb')
+        fd.write(content)
+        fd.close()
+        print(file)
+
+if __name__ == "__main__":
+    sys.exit(FormatFiles())
\ No newline at end of file
-- 
2.8.0.windows.1

_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel
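
As a worked illustration of the patch above, the five substitutions can be applied to an in-memory byte string instead of a file. This is only a sketch: `to_dos` is a hypothetical helper name that does not appear in the patch, and the sample input is made up.

```python
import re


def to_dos(content):
    """Apply the patch's substitutions to a bytes object (hypothetical helper)."""
    # Convert bare LF line endings to CRLF. The second pass catches an LF
    # at the start of a line, which the first pass can skip over when
    # several LFs appear in a row.
    content = re.sub(r'([^\r])\n'.encode(), r'\1\r\n'.encode(), content)
    content = re.sub(r'^\n'.encode(), r'\r\n'.encode(), content, flags=re.MULTILINE)
    # Make sure the data ends with CRLF.
    content = re.sub(r'([^\r\n])$'.encode(), r'\1\r\n'.encode(), content)
    # Remove trailing spaces and tabs before each CRLF.
    content = re.sub(r'[ \t]+(\r\n)'.encode(), r'\1'.encode(), content, flags=re.MULTILINE)
    # Replace each remaining tab with two spaces.
    content = re.sub('\t'.encode(), '  '.encode(), content)
    return content


# Mixed line endings, a tab, trailing spaces, and no final newline.
sample = b"a\tb  \nc\r\n\nd"
print(repr(to_dos(sample)))  # b'a  b\r\nc\r\n\r\nd\r\n'
```

Running the result through `to_dos` a second time leaves it unchanged, which is the property a maintainer would want before generating one cleanup patch per package.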
Re: [edk2] [RFC] Formalize source files to follow DOS format
Posted by Carsey, Jaben 5 years, 11 months ago
Liming,

One PEP 8 thing.
Can you change the file read/write to use the with statement?

Other small thoughts.
I think that filelist should be changed to a set, as order is not important.
Maybe wrap the re.sub function with your own so that all the .encode() calls are in one location? As we move to Python 3 we will have fewer changes to make.
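
A minimal sketch of these three suggestions, intended to run on both Python 2.7 and Python 3. The names `sub_bytes`, `collect_files`, and `format_file` are hypothetical and are not taken from the patch:

```python
import os
import re


def sub_bytes(pattern, repl, content, flags=0):
    """Wrap re.sub so that .encode() is applied in a single place."""
    return re.sub(pattern.encode(), repl.encode(), content, flags=flags)


def collect_files(path, extensions):
    """Return the matching files as a set, since order is not important."""
    return {
        os.path.join(dirpath, filename)
        for dirpath, _dirnames, filenames in os.walk(path)
        for filename in filenames
        if filename.endswith(tuple(extensions))
    }


def format_file(file_path):
    # The with statement closes the file even if read() or write() raises.
    with open(file_path, 'rb') as fd:
        content = fd.read()
    content = sub_bytes(r'([^\r])\n', r'\1\r\n', content)
    content = sub_bytes(r'^\n', r'\r\n', content, flags=re.MULTILINE)
    content = sub_bytes(r'([^\r\n])$', r'\1\r\n', content)
    content = sub_bytes(r'[ \t]+(\r\n)', r'\1', content, flags=re.MULTILINE)
    content = sub_bytes('\t', '  ', content)
    with open(file_path, 'wb') as fd:
        fd.write(content)
```

With the wrapper in place, dropping Python 2 support later would only require changing `sub_bytes`, for example to accept bytes patterns directly.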


Re: [edk2] [RFC] Formalize source files to follow DOS format
Posted by Kinney, Michael D 5 years, 11 months ago
Liming,

We have a set of standard flags for tools that 
should always be present.

--help
-v
-q
--debug

We should also always have the program name,
description, version, and copyright.

Please see BaseTools/Scripts/BinToPcd.py as 
an example.
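
A rough sketch of what those standard options could look like with argparse. The version number, description text, and option help strings below are placeholders chosen for illustration; they are not copied from BinToPcd.py, and `build_parser` is a hypothetical name:

```python
import argparse

# Placeholder metadata; the real values would be defined by the script itself.
__prog__ = 'FormatDosFiles'
__version__ = '%s Version %s' % (__prog__, '0.1')
__copyright__ = 'Copyright (c) 2018, Intel Corporation. All rights reserved.'
__description__ = 'Convert source files to follow DOS format.\n'


def build_parser():
    parser = argparse.ArgumentParser(
        prog=__prog__,
        description=__description__ + __copyright__,
    )
    # --help is added by argparse automatically.
    parser.add_argument('--version', action='version', version=__version__)
    parser.add_argument('-v', '--verbose', action='store_true',
                        help='Increase output messages')
    parser.add_argument('-q', '--quiet', action='store_true',
                        help='Reduce output messages')
    parser.add_argument('--debug', type=int, metavar='LEVEL',
                        choices=range(0, 10), default=0,
                        help='Set debug level')
    return parser


args = build_parser().parse_args(['-v', '--debug', '3'])
print(args.verbose, args.quiet, args.debug)  # True False 3
```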

It might be useful to have a way to run this tool
on a single file when BaseTools/Scripts/PatchCheck.py
reports an issue.
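
One possible way to support both cases, sketched under the assumption that the positional `path` argument may name either a directory or a single file. `collect_targets` is a hypothetical helper, not part of the patch:

```python
import os


def collect_targets(path, extensions):
    """Return the files to format for a directory or a single file path."""
    extensions = tuple(extensions)
    if os.path.isfile(path):
        # A single file is formatted only if its extension passes the filter.
        return [path] if path.endswith(extensions) else []
    targets = []
    for dirpath, _dirnames, filenames in os.walk(path):
        for filename in filenames:
            if filename.endswith(extensions):
                targets.append(os.path.join(dirpath, filename))
    return targets
```

Whether an explicitly named file should bypass the extension filter is a design choice; this sketch keeps the filter so that a binary file named by mistake is not rewritten.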

Do you think it would be good to have an option that
scans the path for the file extensions documented as
using DOS line endings, so that the extensions do not
have to be entered?

Mike


Re: [edk2] [RFC] Formalize source files to follow DOS format
Posted by Carsey, Jaben 5 years, 11 months ago
Mike,

Perhaps a default set of file extensions that can be overridden?

-Jaben


Re: [edk2] [RFC] Formalize source files to follow DOS format
Posted by Kinney, Michael D 5 years, 11 months ago
Jaben,

Yes. The default behavior is to use the default set,
and specifying one or more extensions overrides the
default set.
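
That behavior could be sketched with argparse roughly as follows. The default list is taken from the extensions named in the original RFC; the `DEFAULT_EXTENSIONS` constant and the `parse_args` wrapper are assumptions for illustration:

```python
import argparse

# Extensions listed in the RFC as DOS-format source files.
DEFAULT_EXTENSIONS = [
    '.h', '.c', '.nasm', '.nasmb', '.asm', '.S', '.inf', '.dec', '.dsc',
    '.fdf', '.uni', '.asl', '.aslc', '.vfr', '.idf', '.txt', '.bat', '.py',
]


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument('path', nargs=1,
                        help='The path for files to be converted.')
    # nargs='*' lets the extensions be omitted, in which case the default
    # set is used; any extensions given on the command line replace it.
    parser.add_argument('extensions', nargs='*', default=DEFAULT_EXTENSIONS,
                        help='File extensions filter. (Example: .txt .c .h)')
    return parser.parse_args(argv)
```

For example, `parse_args(['MdePkg'])` would yield the full default list, while `parse_args(['MdePkg', '.c', '.h'])` would restrict the run to `.c` and `.h` files.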

Mike

> -----Original Message-----
> From: Carsey, Jaben
> Sent: Monday, May 21, 2018 3:43 PM
> To: Kinney, Michael D <michael.d.kinney@intel.com>
> Cc: Gao, Liming <liming.gao@intel.com>; edk2-
> devel@lists.01.org
> Subject: Re: [edk2] [RFC] Formalize source files to
> follow DOS format
> 
> Mike,
> 
> Perhaps a default set of file extensions that can be
> overridden?
> 
> -Jaben
> 
> 
> > On May 21, 2018, at 3:41 PM, Kinney, Michael D
> <michael.d.kinney@intel.com> wrote:
> >
> > Liming,
> >
> > We have a set of standard flags for tools that
> > should always be present.
> >
> > --help
> > -v
> > -q
> > --debug
> >
> > We should also always have the program name,
> > description, version, and copyright.
> >
> > Please see BaseTools/Scripts/BinToPcd.py as
> > an example.
> >
> > It might be useful to have a way to run this tool
> > on a single file when BaseTools/Scripts/PatchCheck.py
> > reports an issue.
> >
> > Do you think it would be good to have one option to
> > scan path for file extensions that are documented as
> > DOS line endings so the extensions do not have to be
> > entered?
> >
> > Mike
> >
> >
> >> -----Original Message-----
> >> From: edk2-devel [mailto:edk2-devel-
> >> bounces@lists.01.org] On Behalf Of Carsey, Jaben
> >> Sent: Monday, May 21, 2018 7:50 AM
> >> To: Gao, Liming <liming.gao@intel.com>; edk2-
> >> devel@lists.01.org
> >> Subject: Re: [edk2] [RFC] Formalize source files to
> >> follow DOS format
> >>
> >> Liming,
> >>
> >> One Pep8 thing.
> >> Can you change to use the with statement for the file
> >> read/write?
> >>
> >> Other small thoughts.
> >> I think that FileList should be changed to a set as
> >> order is not important.
> >> Maybe wrapper the re.sub function with your own so
> all
> >> the .encode() are in one location?  As we move to
> python
> >> 3 we will have fewer changes to make.
> >>
> >>
Re: [edk2] [RFC] Formalize source files to follow DOS format
Posted by Gao, Liming 5 years, 11 months ago
Mike:
  I agree with your comments. As for the default file set, the script can carry a built-in default set of extensions. Users can then specify additional extensions, which are appended to the defaults instead of overriding them.

Thanks
Liming
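
A minimal sketch of the "defaults plus appended extensions" behaviour described in this reply. The default list is copied from the patch description; the `--append-extension` option name and the helper names are assumptions made for illustration and are not part of the posted script.

```python
import argparse

# Default set, taken from the extension list in the patch description.
DEFAULT_EXTENSIONS = (
    '.h', '.c', '.nasm', '.nasmb', '.asm', '.S', '.inf', '.dec', '.dsc',
    '.fdf', '.uni', '.asl', '.aslc', '.vfr', '.idf', '.txt', '.bat', '.py',
)


def build_parser():
    """Build a parser where extra extensions extend the defaults (hypothetical CLI)."""
    parser = argparse.ArgumentParser(
        description='Convert source files to DOS format.')
    parser.add_argument('path', help='Directory (or file) to convert.')
    # Hypothetical option: may be repeated, each use adds one extension.
    parser.add_argument('--append-extension', dest='extra', action='append',
                        default=[], metavar='EXT',
                        help='Extra file extension to process, e.g. .md')
    return parser


def effective_extensions(extra):
    """Return the default extensions plus any user-supplied ones."""
    return set(DEFAULT_EXTENSIONS) | set(extra)
```

With this shape, running the tool with no options processes only the default set, and each `--append-extension` widens it without dropping any default.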
>-----Original Message-----
>From: Kinney, Michael D
>Sent: Tuesday, May 22, 2018 6:59 AM
>To: Carsey, Jaben <jaben.carsey@intel.com>; Kinney, Michael D
><michael.d.kinney@intel.com>
>Cc: Gao, Liming <liming.gao@intel.com>; edk2-devel@lists.01.org
>Subject: RE: [edk2] [RFC] Formalize source files to follow DOS format
>
>Jaben,
>
>Yes.  With default behavior is default set and
>specifying one or more extensions overrides the
>default set.
>
>Mike
>
>> -----Original Message-----
>> From: Carsey, Jaben
>> Sent: Monday, May 21, 2018 3:43 PM
>> To: Kinney, Michael D <michael.d.kinney@intel.com>
>> Cc: Gao, Liming <liming.gao@intel.com>; edk2-
>> devel@lists.01.org
>> Subject: Re: [edk2] [RFC] Formalize source files to
>> follow DOS format
>>
>> Mike,
>>
>> Perhaps a default set of file extensions that can be
>> overridden?
>>
>> -Jaben
>>
>>
>> > On May 21, 2018, at 3:41 PM, Kinney, Michael D
>> <michael.d.kinney@intel.com> wrote:
>> >
>> > Liming,
>> >
>> > We have a set of standard flags for tools that
>> > should always be present.
>> >
>> > --help
>> > -v
>> > -q
>> > --debug
>> >
>> > We should also always have the program name,
>> > description, version, and copyright.
>> >
>> > Please see BaseTools/Scripts/BinToPcd.py as
>> > an example.
>> >
>> > It might be useful to have a way to run this tool
>> > on a single file when BaseTools/Scripts/PatchCheck.py
>> > reports an issue.
>> >
>> > Do you think it would be good to have one option to
>> > scan path for file extensions that are documented as
>> > DOS line endings so the extensions do not have to be
>> > entered?
>> >
>> > Mike
>> >
>> >
>> >> -----Original Message-----
>> >> From: edk2-devel [mailto:edk2-devel-
>> >> bounces@lists.01.org] On Behalf Of Carsey, Jaben
>> >> Sent: Monday, May 21, 2018 7:50 AM
>> >> To: Gao, Liming <liming.gao@intel.com>; edk2-
>> >> devel@lists.01.org
>> >> Subject: Re: [edk2] [RFC] Formalize source files to
>> >> follow DOS format
>> >>
>> >> Liming,
>> >>
>> >> One Pep8 thing.
>> >> Can you change to use the with statement for the file
>> >> read/write?
>> >>
>> >> Other small thoughts.
>> >> I think that FileList should be changed to a set as
>> >> order is not important.
>> >> Maybe wrapper the re.sub function with your own so
>> all
>> >> the .encode() are in one location?  As we move to
>> python
>> >> 3 we will have fewer changes to make.
>> >>
>> >>
Re: [edk2] [RFC] Formalize source files to follow DOS format
Posted by Carsey, Jaben 5 years, 11 months ago
We could do something like we do for compiler flags... append or overwrite depending on syntax.
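
A rough sketch of what "append or overwrite depending on syntax" could look like for the extension list. The `+` prefix is a hypothetical convention chosen only for illustration; the thread does not specify the actual syntax, and the default set is abbreviated here.

```python
# Abbreviated default set; the full list is in the patch description.
DEFAULT_EXTENSIONS = frozenset(['.h', '.c', '.inf', '.dec', '.dsc'])


def resolve_extensions(tokens, defaults=DEFAULT_EXTENSIONS):
    """Resolve the extension set from command-line tokens.

    Hypothetical syntax:
      '+.ext'  appends .ext to whatever set is in effect
      '.ext'   (no prefix) replaces the default set
    """
    appended = set()
    replaced = set()
    for token in tokens:
        if token.startswith('+'):
            appended.add(token[1:])
        else:
            replaced.add(token)
    # Any unprefixed extension overrides the defaults entirely.
    base = replaced if replaced else set(defaults)
    return base | appended
```

Under this convention `+.md` keeps the defaults and adds `.md`, while `.txt +.md` processes only `.txt` and `.md`.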

> -----Original Message-----
> From: Gao, Liming
> Sent: Thursday, May 24, 2018 1:35 AM
> To: Kinney, Michael D <michael.d.kinney@intel.com>; Carsey, Jaben
> <jaben.carsey@intel.com>
> Cc: edk2-devel@lists.01.org
> Subject: RE: [edk2] [RFC] Formalize source files to follow DOS format
> Importance: High
> 
> Mike:
>   I agree your comments. On default file set, this script can have the default
> ones. User can specify more set to append the default ones instead of
> override the default ones.
> 
> Thanks
> Liming
> >-----Original Message-----
> >From: Kinney, Michael D
> >Sent: Tuesday, May 22, 2018 6:59 AM
> >To: Carsey, Jaben <jaben.carsey@intel.com>; Kinney, Michael D
> ><michael.d.kinney@intel.com>
> >Cc: Gao, Liming <liming.gao@intel.com>; edk2-devel@lists.01.org
> >Subject: RE: [edk2] [RFC] Formalize source files to follow DOS format
> >
> >Jaben,
> >
> >Yes.  With default behavior is default set and
> >specifying one or more extensions overrides the
> >default set.
> >
> >Mike
> >
> >> -----Original Message-----
> >> From: Carsey, Jaben
> >> Sent: Monday, May 21, 2018 3:43 PM
> >> To: Kinney, Michael D <michael.d.kinney@intel.com>
> >> Cc: Gao, Liming <liming.gao@intel.com>; edk2-
> >> devel@lists.01.org
> >> Subject: Re: [edk2] [RFC] Formalize source files to
> >> follow DOS format
> >>
> >> Mike,
> >>
> >> Perhaps a default set of file extensions that can be
> >> overridden?
> >>
> >> -Jaben
> >>
> >>
> >> > On May 21, 2018, at 3:41 PM, Kinney, Michael D
> >> <michael.d.kinney@intel.com> wrote:
> >> >
> >> > Liming,
> >> >
> >> > We have a set of standard flags for tools that
> >> > should always be present.
> >> >
> >> > --help
> >> > -v
> >> > -q
> >> > --debug
> >> >
> >> > We should also always have the program name,
> >> > description, version, and copyright.
> >> >
> >> > Please see BaseTools/Scripts/BinToPcd.py as
> >> > an example.
> >> >
> >> > It might be useful to have a way to run this tool
> >> > on a single file when BaseTools/Scripts/PatchCheck.py
> >> > reports an issue.
> >> >
> >> > Do you think it would be good to have one option to
> >> > scan path for file extensions that are documented as
> >> > DOS line endings so the extensions do not have to be
> >> > entered?
> >> >
> >> > Mike
> >> >
> >> >
> >> >> -----Original Message-----
> >> >> From: edk2-devel [mailto:edk2-devel-
> >> >> bounces@lists.01.org] On Behalf Of Carsey, Jaben
> >> >> Sent: Monday, May 21, 2018 7:50 AM
> >> >> To: Gao, Liming <liming.gao@intel.com>; edk2-
> >> >> devel@lists.01.org
> >> >> Subject: Re: [edk2] [RFC] Formalize source files to
> >> >> follow DOS format
> >> >>
> >> >> Liming,
> >> >>
> >> >> One Pep8 thing.
> >> >> Can you change to use the with statement for the file
> >> >> read/write?
> >> >>
> >> >> Other small thoughts.
> >> >> I think that FileList should be changed to a set as
> >> >> order is not important.
> >> >> Maybe wrapper the re.sub function with your own so
> >> all
> >> >> the .encode() are in one location?  As we move to
> >> python
> >> >> 3 we will have fewer changes to make.
> >> >>
> >> >>
Re: [edk2] [RFC] Formalize source files to follow DOS format
Posted by Gao, Liming 5 years, 11 months ago
I get your point. We will update this script and send a version 2.
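
For reference, the earlier review comments in this thread (use `with` for file I/O, collect files in a set, keep the `.encode()` calls in one wrapper) could be folded into the script roughly as follows. This is only a sketch under those assumptions, not the actual version 2; the function names are invented here, and the CRLF pattern uses a lookbehind rather than the two substitutions in the posted patch.

```python
import os
import re


def sub_bytes(pattern, repl, content, flags=0):
    """re.sub wrapper: the str-to-bytes conversion lives in one place."""
    return re.sub(pattern.encode(), repl.encode(), content, flags=flags)


def to_dos_format(content):
    """Apply the formatting rules from the patch description to file bytes."""
    # Convert bare LF to CRLF (LF not already preceded by CR).
    content = sub_bytes(r'(?<!\r)\n', '\r\n', content)
    # Make sure a non-empty file ends with CRLF.
    if content and not content.endswith(b'\r\n'):
        content += b'\r\n'
    # Remove trailing spaces and tabs in front of each line ending.
    content = sub_bytes(r'[ \t]+(?=\r\n)', '', content)
    # Replace each remaining tab with two spaces, as the posted patch does.
    content = sub_bytes(r'\t', '  ', content)
    return content


def collect_files(root, extensions):
    """Walk root and return the set of paths with a matching extension."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(tuple(extensions)):
                found.add(os.path.join(dirpath, name))
    return found


def format_file(path):
    """Rewrite one file in place; 'with' closes the handle even on errors."""
    with open(path, 'rb') as fd:
        content = fd.read()
    with open(path, 'wb') as fd:
        fd.write(to_dos_format(content))
```

Keeping `to_dos_format` free of file access makes the rules easy to check on byte strings, and a separate `format_file` would also cover the single-file use case raised in the review.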

> -----Original Message-----
> From: Carsey, Jaben
> Sent: Thursday, May 24, 2018 10:14 PM
> To: Gao, Liming <liming.gao@intel.com>; Kinney, Michael D <michael.d.kinney@intel.com>
> Cc: edk2-devel@lists.01.org
> Subject: RE: [edk2] [RFC] Formalize source files to follow DOS format
> 
> We could do something like we do for compiler flags... append or overwrite depending on syntax.
> 
> > -----Original Message-----
> > From: Gao, Liming
> > Sent: Thursday, May 24, 2018 1:35 AM
> > To: Kinney, Michael D <michael.d.kinney@intel.com>; Carsey, Jaben
> > <jaben.carsey@intel.com>
> > Cc: edk2-devel@lists.01.org
> > Subject: RE: [edk2] [RFC] Formalize source files to follow DOS format
> > Importance: High
> >
> > Mike:
> >   I agree your comments. On default file set, this script can have the default
> > ones. User can specify more set to append the default ones instead of
> > override the default ones.
> >
> > Thanks
> > Liming
> > >-----Original Message-----
> > >From: Kinney, Michael D
> > >Sent: Tuesday, May 22, 2018 6:59 AM
> > >To: Carsey, Jaben <jaben.carsey@intel.com>; Kinney, Michael D
> > ><michael.d.kinney@intel.com>
> > >Cc: Gao, Liming <liming.gao@intel.com>; edk2-devel@lists.01.org
> > >Subject: RE: [edk2] [RFC] Formalize source files to follow DOS format
> > >
> > >Jaben,
> > >
> > >Yes.  With default behavior is default set and
> > >specifying one or more extensions overrides the
> > >default set.
> > >
> > >Mike
> > >
> > >> -----Original Message-----
> > >> From: Carsey, Jaben
> > >> Sent: Monday, May 21, 2018 3:43 PM
> > >> To: Kinney, Michael D <michael.d.kinney@intel.com>
> > >> Cc: Gao, Liming <liming.gao@intel.com>; edk2-
> > >> devel@lists.01.org
> > >> Subject: Re: [edk2] [RFC] Formalize source files to
> > >> follow DOS format
> > >>
> > >> Mike,
> > >>
> > >> Perhaps a default set of file extensions that can be
> > >> overridden?
> > >>
> > >> -Jaben
> > >>
> > >>
> > >> > On May 21, 2018, at 3:41 PM, Kinney, Michael D
> > >> <michael.d.kinney@intel.com> wrote:
> > >> >
> > >> > Liming,
> > >> >
> > >> > We have a set of standard flags for tools that
> > >> > should always be present.
> > >> >
> > >> > --help
> > >> > -v
> > >> > -q
> > >> > --debug
> > >> >
> > >> > We should also always have the program name,
> > >> > description, version, and copyright.
> > >> >
> > >> > Please see BaseTools/Scripts/BinToPcd.py as
> > >> > an example.
> > >> >
> > >> > It might be useful to have a way to run this tool
> > >> > on a single file when BaseTools/Scripts/PatchCheck.py
> > >> > reports an issue.
> > >> >
> > >> > Do you think it would be good to have one option to
> > >> > scan path for file extensions that are documented as
> > >> > DOS line endings so the extensions do not have to be
> > >> > entered?
> > >> >
> > >> > Mike
> > >> >
> > >> >
> > >> >> -----Original Message-----
> > >> >> From: edk2-devel [mailto:edk2-devel-
> > >> >> bounces@lists.01.org] On Behalf Of Carsey, Jaben
> > >> >> Sent: Monday, May 21, 2018 7:50 AM
> > >> >> To: Gao, Liming <liming.gao@intel.com>; edk2-
> > >> >> devel@lists.01.org
> > >> >> Subject: Re: [edk2] [RFC] Formalize source files to
> > >> >> follow DOS format
> > >> >>
> > >> >> Liming,
> > >> >>
> > >> >> One Pep8 thing.
> > >> >> Can you change to use the with statement for the file
> > >> >> read/write?
> > >> >>
> > >> >> Other small thoughts.
> > >> >> I think that FileList should be changed to a set as
> > >> >> order is not important.
> > >> >> Maybe wrapper the re.sub function with your own so
> > >> all
> > >> >> the .encode() are in one location?  As we move to
> > >> python
> > >> >> 3 we will have fewer changes to make.
> > >> >>
> > >> >>
_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel
Re: [edk2] [RFC] Formalize source files to follow DOS format
Posted by Gao, Liming 5 years, 11 months ago
Jaben:
  What difference does the with statement make for the file read/write?

  Besides, we use .encode() here to support Python 3. After we move to Python 3, this script will not need to change.
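
One possible way to reconcile both points is to keep the .encode() calls in a single helper, so the script still runs on Python 2 and Python 3 while the call sites pass plain strings. This is only a sketch: sub_bytes and to_dos_format are hypothetical names that do not appear in the posted patch, and the patterns are assumed to be ASCII-only. The five substitutions are the ones from the patch, in the same order.

```python
import re


def sub_bytes(pattern, repl, content, flags=0):
    """Hypothetical helper: run re.sub on bytes content with str arguments.

    The only .encode() calls live here. Assumes pattern and repl are ASCII.
    """
    return re.sub(pattern.encode('ascii'), repl.encode('ascii'),
                  content, flags=flags)


def to_dos_format(content):
    """Apply the patch's substitutions to a bytes object and return it."""
    # Convert the line endings to CRLF
    content = sub_bytes(r'([^\r])\n', r'\1\r\n', content)
    content = sub_bytes(r'^\n', r'\r\n', content, flags=re.MULTILINE)
    # Add a CRLF at the end of the file if it does not end with one
    content = sub_bytes(r'([^\r\n])$', r'\1\r\n', content)
    # Remove trailing white space
    content = sub_bytes(r'[ \t]+(\r\n)', r'\1', content, flags=re.MULTILINE)
    # Replace each tab with two spaces
    content = sub_bytes('\t', '  ', content)
    return content
```

With this shape the regular expressions read as ordinary strings, and any later change to the encoding handling would touch only sub_bytes.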

Thanks
Liming
>-----Original Message-----
>From: Carsey, Jaben
>Sent: Monday, May 21, 2018 10:50 PM
>To: Gao, Liming <liming.gao@intel.com>; edk2-devel@lists.01.org
>Subject: RE: [edk2] [RFC] Formalize source files to follow DOS format
>
>Liming,
>
>One Pep8 thing.
>Can you change to use the with statement for the file read/write?
>
>Other small thoughts.
>I think that FileList should be changed to a set as order is not important.
>Maybe wrap the re.sub function with your own so all the .encode() calls are in
>one location?  As we move to Python 3 we will have fewer changes to make.
>
>
Re: [edk2] [RFC] Formalize source files to follow DOS format
Posted by Carsey, Jaben 5 years, 11 months ago
It follows PEP 8 for coding style.

The technical benefit is that if an exception occurs, the file is still closed.
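
As a rough illustration of both suggestions (the with statement and a set for the file list), the file handling could look something like the sketch below. collect_files and rewrite_file are hypothetical names, not taken from the patch, and transform stands in for whatever byte-level clean-up is applied to the content.

```python
import os


def collect_files(root, extensions):
    """Return the set of paths under root whose names end with an extension."""
    suffixes = tuple(extensions)  # str.endswith accepts a tuple of suffixes
    found = set()  # order does not matter, so a set is enough
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffixes):
                found.add(os.path.join(dirpath, name))
    return found


def rewrite_file(path, transform):
    """Read path as bytes, apply transform, and write the result back."""
    # The file is closed when the with block exits, even if read() raises.
    with open(path, 'rb') as fd:
        content = fd.read()
    new_content = transform(content)
    with open(path, 'wb') as fd:
        fd.write(new_content)
```

The transform runs before the file is reopened for writing, so a failure in the clean-up step leaves the original file untouched.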

> -----Original Message-----
> From: Gao, Liming
> Sent: Thursday, May 24, 2018 1:31 AM
> To: Carsey, Jaben <jaben.carsey@intel.com>; edk2-devel@lists.01.org
> Subject: RE: [edk2] [RFC] Formalize source files to follow DOS format
> Importance: High
> 
> Jaben:
>   What difference does the with statement make for the file read/write?
> 
>   Besides, we use .encode() here to support Python 3. After we move to
> Python 3, this script will not need to change.
> 
> Thanks
> Liming