Extracting data from a text file with regular expressions. The List tool on Regexr.com by gskinner is powerful, but it may exhaust the browser's memory when processing very large data. So let's do it in VS Code: this extension reimplements the Regexr.com List tool as a VS Code extension, which avoids browser memory overflow and makes it usable in production.
Installation
Launch VS Code Quick Open (Ctrl+P), paste the following command, and press Enter.
Classical regular expressions (regex) support two actions: search (or match) and replace. The web-based Regexr.com adds a third action, List. It extracts every match into a new output and converts each match with a Format, which usually contains replacement groups such as $1. The result is a list of everything the Pattern matches, rewritten by the Format.
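To make the List action concrete, here is a minimal TypeScript sketch built on JavaScript RegExp, the engine this extension is said to use. It is not the extension's source: the names listMatches and expandFormat are hypothetical, and only the $$, $& and $1 to $9 tokens are expanded. Each match is rendered through the Format and the pieces are concatenated, so a Format of $& followed by a line break lists one whole match per line.

```typescript
// Sketch of a Regexr-style "List" action using JavaScript RegExp.
// Assumption: only $$, $& and $1..$9 are expanded in the Format;
// the actual extension may handle more tokens.

function expandFormat(format: string, match: RegExpExecArray): string {
  // One left-to-right pass, so "$$&" yields the literal text "$&".
  return format.replace(/\$(\$|&|[1-9])/g, (token: string, key: string) => {
    if (key === "$") return "$";
    if (key === "&") return match[0];
    const n = Number(key);
    // A group that did not take part in the match becomes "";
    // a group number beyond the pattern's groups is kept literally.
    return n < match.length ? (match[n] ?? "") : token;
  });
}

function listMatches(
  text: string,
  pattern: string,
  format: string = "$&\n", // default: every whole match on its own line
  flags: string = "",
): string {
  if (pattern === "") throw new Error("Pattern is required");
  // The global flag is needed for exec() to advance through the text.
  const re = new RegExp(pattern, flags.indexOf("g") >= 0 ? flags : flags + "g");
  const items: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    items.push(expandFormat(format, m));
    // An empty match would leave lastIndex unchanged; step past it.
    if (m[0] === "") re.lastIndex++;
  }
  return items.join("");
}

const log = "GET /a 200\nPOST /b 404\nGET /c 500";
// Swap the two groups and separate them with a tab, one row per match.
console.log(listMatches(log, "(GET|POST) (\\S+)", "$2\t$1\n"));
```

Because the Format is applied once per match, putting a tab between groups (written \t in the TypeScript literal above) turns the list into tab-separated columns that can be pasted into a spreadsheet.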
The List tool on Regexr.com is so useful for data extraction that one may want to use it in production. However, Regexr.com runs in the web browser, and because of browser memory limits it cannot process very large texts. This VS Code extension was written to fill that gap.
Requirements
Only the Visual Studio Code editor is required.
Extension Settings
None. No configuration is required.
Usage
1. Install the extension.
2. Open the original text in a VS Code tab.
3. Start the extension with the command reList.
4. Enter the Pattern to match. This is required.
5. Enter the Format for each list item. The default Format, $&\n, puts every whole match on its own line.
6. The extracted list is output in a new VS Code tab.
7. Save the output as a local file or use it as you wish.
Known Issues
In the Format, the regex metacharacter backslash (\) followed by n, r, t, b or f, as in \n \r \t \b \f, is parsed into a special character such as LF, CR or Tab, and \\ is parsed as a single \. This parsing is done by the extension's own script rather than by the JavaScript language core, so three or more consecutive backslashes, such as \\\, may be parsed unexpectedly.
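The README does not show how the extension performs this parsing. As a hypothetical sketch (the name unescapeFormat and the escape table are assumptions, with \b mapped to backspace as in JavaScript string literals), a single left-to-right pass that always consumes a backslash together with the next character gives predictable results for runs of backslashes: typing \\\n yields a literal backslash followed by a line feed.

```typescript
// Hypothetical helper: convert escape sequences typed into the Format
// input box into the characters they stand for.
const ESCAPES: { [ch: string]: string } = {
  n: "\n", // line feed (LF)
  r: "\r", // carriage return (CR)
  t: "\t", // tab
  b: "\b", // backspace (assumed, as in JS string literals)
  f: "\f", // form feed
  "\\": "\\", // an escaped backslash becomes one backslash
};

function unescapeFormat(typed: string): string {
  // Match a backslash plus exactly one following character, scanning left
  // to right, so backslashes are consumed in pairs and never re-read.
  // Unknown escapes such as \d are returned unchanged.
  return typed.replace(/\\([\s\S])/g, (seq: string, ch: string) =>
    Object.prototype.hasOwnProperty.call(ESCAPES, ch) ? ESCAPES[ch] : seq
  );
}

// What the user types (written as TypeScript literals) and what it becomes.
console.log(JSON.stringify(unescapeFormat("$&\\n"))); // "$&\n"
console.log(JSON.stringify(unescapeFormat("\\\\\\n"))); // "\\\n"
```

Under this scheme four typed backslashes become two literal backslashes, and a lone trailing backslash is left as-is because it has no character to pair with.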
Some match results seem to differ from VS Code's own regex search, because this extension calls JavaScript RegExp methods.
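This README does not say which differences appear. One plausible source, offered as an assumption rather than a confirmed diagnosis, is the set of flags passed to JavaScript RegExp: without the m flag, ^ and $ anchor only at the start and end of the whole text, while with m they also anchor at every line break. The hypothetical helper allMatches shows the effect on the same pattern and text.

```typescript
// Hypothetical illustration: the same pattern gives different matches
// depending on the RegExp flags passed to JavaScript.
function allMatches(text: string, pattern: string, flags: string): string[] {
  // Ensure the global flag is set so exec() walks through every match.
  const re = new RegExp(pattern, flags.indexOf("g") >= 0 ? flags : flags + "g");
  const found: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    found.push(m[0]);
    if (m[0] === "") re.lastIndex++; // step past empty matches
  }
  return found;
}

const text = "alpha 1\nbeta 2\ngamma 3";
// Without "m", ^ anchors only at the start of the whole text.
console.log(allMatches(text, "^\\w+", "").join(", ")); // alpha
// With "m", ^ also anchors right after every line break.
console.log(allMatches(text, "^\\w+", "m").join(", ")); // alpha, beta, gamma
```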
Release Notes
[0.1.2] - 2020-02-10
Added
Simplified Chinese UI and a revised bilingual README.
Changed
Minor fixes.
Plan
Make match results consistent with VS Code's own regex matching.