Common Python scripting tasks – SillyRabbit's blog

String operations

str1 = '12345678'
str2 = 'abcdefg'
str3 = 'aaaaaa'

print(str1[0:3])
print(str1 + str2)
print(str3.replace('a','d'))
print(str2.find('c'))  # index of the first match, or -1 if not found

>>> 123
>>> 12345678abcdefg
>>> dddddd
>>> 2

Splitting:

s = 'a,b,c,d'  # avoid naming a variable str, which shadows the built-in type
strlist = s.split(',')  # returns a list
for value in strlist:
    print(value)
>>> a
>>> b
>>> c
>>> d

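strip()
Removes the specified characters from the beginning and end of a string (whitespace, including spaces and newlines, by default). The argument is treated as a set of characters, not as a fixed sequence.

str.strip([chars])

Note: this method only removes characters at the start or end of the string, never characters in the middle.
.lstrip('a')
Returns a new string with the specified character ('a') stripped from the left side; the syntax is the same as above.
.rstrip('a')
Returns a new string with the specified character ('a') stripped from the right side; the syntax is the same as above.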
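
The three methods can be compared side by side. This is a minimal sketch; the sample strings are made up purely for illustration.

```python
# Hypothetical sample strings, chosen only to illustrate the behaviour.
padded = '  hello world \n'
wrapped = 'aaHELLOaa'

print(repr(padded.strip()))    # 'hello world'
print(wrapped.strip('a'))      # HELLO
print(wrapped.lstrip('a'))     # HELLOaa
print(wrapped.rstrip('a'))     # aaHELLO

# chars is a set of characters, not a prefix or suffix
print('xyxhixy'.strip('xy'))   # hi

# Characters in the middle are never touched
print('a-b-a'.strip('a'))      # -b-
```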

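Files

Reading and writing file contents

The mode parameter of open():

'r': read
'w': write
'a': append
'r+' == r+w (read and write; raises an error if the file does not exist: IOError, which is FileNotFoundError in Python 3)
'w+' == w+r (read and write; creates the file if it does not exist, and truncates it if it does)
'a+' == a+r (append and read; creates the file if it does not exist)
For binary files, just add a b to each mode:
'rb'  'wb'  'ab'  'rb+'  'wb+'  'ab+'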
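
The differences between 'r+', 'w+' and 'a+' are easiest to see by trying them on a scratch file. The sketch below assumes Python 3 and uses a temporary directory; the file name demo.txt is hypothetical.

```python
import os
import tempfile

# Work in a throwaway directory so no real files are touched.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo.txt')  # hypothetical file name

    # 'r+' needs an existing file
    try:
        open(path, 'r+')
    except FileNotFoundError:
        print("'r+' failed: the file does not exist")

    # 'w+' creates the file (and would truncate an existing one)
    with open(path, 'w+') as f:
        f.write('first\n')
        f.seek(0)              # go back to the start before reading
        print(repr(f.read()))

    # 'a+' always writes at the end, and can also read
    with open(path, 'a+') as f:
        f.write('second\n')
        f.seek(0)
        content = f.read()

print(repr(content))
```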

The encoding parameter of open()
Sets the text encoding. Common values are:
ascii, latin-1, utf-8, and utf-16

f = open('/path/to/file', mode='r', encoding='utf-8')

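Opening a file with open()
Use open() to open a file (open() returns a file object, which is iterable).

Close the file when you are done with it: an open file object holds operating-system resources, and the OS limits how many files can be open at the same time.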

f = open('/path/to/file', 'r')
print(f.read())
f.close()

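Reading

read()
Reads the entire file in one call, typically to hold its contents in a single string. If the file may be larger than available memory, call read(size) repeatedly instead; each call reads at most size characters (bytes in binary mode).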

with open('/path/to/file', 'r') as f:
    print(f.read())
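
For the read(size) case, a loop that stops on an empty string is one way to do it. This is a sketch only: read_in_chunks, the chunk size of 4, and sample.txt are hypothetical names and values picked for the demonstration.

```python
import os
import tempfile

def read_in_chunks(path, size=4):
    """Yield successive pieces of at most `size` characters."""
    with open(path, 'r', encoding='utf-8') as f:
        while True:
            chunk = f.read(size)
            if not chunk:   # an empty string means end of file
                break
            yield chunk

# Build a small throwaway file to read back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'sample.txt')
    with open(path, 'w', encoding='utf-8') as f:
        f.write('abcdefghij')
    chunks = list(read_in_chunks(path))

print(chunks)  # ['abcd', 'efgh', 'ij']
```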

readlines()
Reads the entire file, like .read(), but splits the contents into a list of lines that can be processed with Python's for ... in ... construct.

with open('/path/to/file', 'r') as f:
    list1 = f.readlines()
#list1:
#['111\n', '222\n', '333\n', '444\n', '555\n', '666\n']

readline()
Reads one line per call, which is usually much slower than readlines(). Use readline() when there is not enough memory to read the whole file at once.

with open('/path/to/file', 'r') as f:
    fline = f.readline()
    print(fline)

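Note: all three methods keep the trailing newline '\n' of each line; when the text is passed to print, the '\n' shows up as a normal line break.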
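
When the trailing '\n' is not wanted, one common approach is to iterate over the file object (it is iterable, as noted earlier) and strip each line. A sketch assuming a temporary file with the hypothetical name lines.txt:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'lines.txt')  # hypothetical file name
    with open(path, 'w', encoding='utf-8') as f:
        f.write('111\n222\n333\n')

    # Iterating over the file yields one line at a time,
    # so the whole file never has to sit in memory.
    with open(path, 'r', encoding='utf-8') as f:
        lines = [line.rstrip('\n') for line in f]

print(lines)  # ['111', '222', '333']
```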

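Writing

Writing a file works the same way as reading one; the only difference is that you pass 'w' or 'wb' to open() to write a text file or a binary file.

Note: in 'w' mode the file is created if it does not exist; if it does exist, its contents are cleared before the new content is written. To keep the existing content and add to the end, use 'a' mode.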
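
The truncate-versus-append behaviour can be checked with a scratch file. A minimal sketch; notes.txt is a hypothetical file name inside a temporary directory.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'notes.txt')  # hypothetical file name

    with open(path, 'w') as f:   # creates the file
        f.write('old\n')
    with open(path, 'w') as f:   # clears 'old' before writing
        f.write('new\n')
    with open(path, 'a') as f:   # keeps 'new' and adds to the end
        f.write('more\n')

    with open(path, 'r') as f:
        result = f.read()

print(repr(result))  # 'new\nmore\n'
```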

write()
The counterpart of read() and readline(): writes a string to the file.

f = open('/path/to/file', 'w')
f.write('Hello, world!')  # write a string
f.close()

writelines()
The counterpart of readlines(), operating on a list: it takes a list of strings and writes them to the file without adding newlines.

f1 = open('test1.txt', 'w')
f1.writelines(["1", "2", "3"])
f1.close()
# test1.txt now contains: 123

f2 = open('test2.txt', 'w')
f2.writelines(["1\n", "2\n", "3\n"])
f2.close()
# test2.txt now contains:
# 1
# 2
# 3
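
Since writelines() adds no newlines, a list of plain strings usually needs them appended first. One way to do that, sketched with a hypothetical items list and a temporary file:

```python
import os
import tempfile

items = ['1', '2', '3']  # hypothetical data without trailing newlines

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'test3.txt')  # hypothetical file name

    with open(path, 'w') as f:
        # writelines() accepts any iterable of strings,
        # so a generator that appends '\n' works too.
        f.writelines(item + '\n' for item in items)

    with open(path, 'r') as f:
        text = f.read()

print(repr(text))  # '1\n2\n3\n'
```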

Renaming

Path = '/file.txt'
newPath = 'aaa'.join(Path.split('file'))  # builds the new path string only; nothing on disk is renamed

>>> /file.txt => /aaa.txt
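
The snippet above only computes a new name. To rename the file on disk as well, os.rename() can be combined with os.path.splitext(). This is a sketch under the assumption that the file lives in a temporary directory; the names file.txt and aaa are taken from the example above.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    old_path = os.path.join(tmp, 'file.txt')
    with open(old_path, 'w') as f:
        f.write('data')

    # Split into folder, base name and extension, then rebuild the path.
    folder, name = os.path.split(old_path)
    stem, ext = os.path.splitext(name)
    new_path = os.path.join(folder, 'aaa' + ext)

    os.rename(old_path, new_path)  # performs the rename on disk

    renamed = os.path.exists(new_path) and not os.path.exists(old_path)
    new_name = os.path.basename(new_path)

print(new_name, renamed)  # aaa.txt True
```

Splitting on the extension avoids a pitfall of the split/join approach: if 'file' also appears elsewhere in the path, every occurrence would be replaced.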

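Directories

Traversing directories

os.path.isdir(): checks whether a path is a directory (pass a full path, or one relative to the current working directory; a bare name returned by os.listdir() is not enough)
os.path.isfile(): checks whether a path is a file (the same rule about paths applies)

os.listdir(): returns a list of the names of the directories and files under the given path

os.walk(): yields a 3-tuple (root, dirs, files) for each directory it visits

root is the path of the directory currently being visited
dirs is a list of the names of the directories directly inside it (deeper subdirectories are not included)
files is likewise a list of the names of the files directly inside it (files in subdirectories are not included)

os.listdir: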

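# -*- coding: utf-8 -*-
import os

def Test2(rootDir):
    for lists in os.listdir(rootDir):  # iterate over the directory entries
        path = os.path.join(rootDir, lists)  # join directory and entry name
        print(path)
        if os.path.isdir(path):
            Test2(path)  # recurse into subdirectories

Test2('C://123/')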

os.walk:

# -*- coding: utf-8 -*-
import os
rootDir = 'C://123/'
list_dirs = os.walk(rootDir)  # walk the directory tree
for root, dirs, files in list_dirs:
    for d in dirs:
        print(os.path.join(root, d))
    for f in files:
        print(os.path.join(root, f))
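
A typical use of os.walk() is collecting files that match some condition. The sketch below filters by extension; find_files and the sample directory layout are hypothetical, created in a temporary directory just for the demonstration.

```python
import os
import tempfile

def find_files(root_dir, suffix):
    """Return the paths of all files under root_dir ending with suffix."""
    matches = []
    for root, dirs, files in os.walk(root_dir):
        for name in files:
            if name.lower().endswith(suffix):
                matches.append(os.path.join(root, name))
    return matches

with tempfile.TemporaryDirectory() as tmp:
    # Build a tiny tree: a.txt, b.log and sub/c.txt
    os.makedirs(os.path.join(tmp, 'sub'))
    for rel in ('a.txt', 'b.log', os.path.join('sub', 'c.txt')):
        with open(os.path.join(tmp, rel), 'w') as f:
            f.write('x')

    # Show paths relative to the root so the output is easy to read.
    found = sorted(os.path.relpath(p, tmp) for p in find_files(tmp, '.txt'))

print(found)
```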

Request

import requests
headers={
"Host": "www.baidu.com",
"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:87.0) Gecko/20100101 Firefox/87.0",
"Accept": "text/html,application/xhtml+xml,application/xml; q=0.9,image/webp,*/*;q=0.8",
"Accept-Language": "zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2",
"Accept-Encoding": "gzip, deflate, br",
"Connection": "keep-alive",
"Cookie": "JSESSIONID=36655AA9E9F66EA98179C33AFE921B60",
"Upgrade-Insecure-Requests": "1",
"If-None-Match": "W/'59254-1618555182000'"
}

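def fetch(now_url):  # wrapper added so that return is valid; the name is illustrative
    status = None
    try:
        r = requests.get(now_url, timeout=10, headers=headers)
        status = r.status_code
        r.encoding = r.apparent_encoding
        return r.text
    except requests.RequestException:
        print('connect failed!\nstatus code:', status)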

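Encoding

Encoding and decoding functions work on bytes. The original str must therefore be encoded to bytes first, and since the result is also a bytes object, it must be decoded back to str before being used as text.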

base64:

import base64

st = 'hello world!'.encode()  # encodes as UTF-8 by default
res = base64.b64encode(st)
print(res.decode())  # decodes as UTF-8 by default
res = base64.b64decode(res)
print(res.decode())  # decodes as UTF-8 by default
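
The str to bytes to str round trip matters most with non-ASCII text. A small sketch, with a made-up sample string, that makes the type at each step explicit:

```python
import base64

text = 'héllo wörld'              # hypothetical sample with non-ASCII characters

raw = text.encode('utf-8')        # str   -> bytes
token = base64.b64encode(raw)     # bytes -> bytes (Base64 alphabet only)
token_str = token.decode('ascii') # bytes -> str, safe to embed in text

# b64decode accepts the ASCII str as well as bytes
restored = base64.b64decode(token_str).decode('utf-8')

print(type(raw).__name__, type(token).__name__)  # bytes bytes
print(restored == text)                          # True
```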

More will be added to this section over time...

Office

Batch-converting Word documents to PDF:

from win32com.client import constants, gencache
import os

def createPdf(wordPath, pdfPath):
    print(pdfPath)
    word = gencache.EnsureDispatch('Word.Application')
    try:
        doc = word.Documents.Open(wordPath, ReadOnly=1)
        doc.ExportAsFixedFormat(pdfPath,
                                constants.wdExportFormatPDF,
                                Item=constants.wdExportDocumentWithMarkup,
                                CreateBookmarks=constants.wdExportCreateHeadingBookmarks)
        print('yes')
    except:
        print('fail')
    finally:
        word.Quit(constants.wdDoNotSaveChanges)

sourcePath = r"C:\Users\dell\Desktop\docxs"  # raw strings avoid invalid escape sequences
PdfPath = r"C:\Users\dell\Desktop\pdfs"

files=os.listdir(sourcePath)
for file in files:
    pdf = PdfPath+'\\'+os.path.splitext(file)[0]+'.pdf'
    doc= sourcePath+'\\'+file
    createPdf(doc,pdf)
