Git Notes: Slimming Down a Git Repository


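Preface

A few days ago I tried to clone one of my own Android libraries from GitHub onto a new computer and found that the repository was a surprising 300 MB; the clone kept failing. I then remembered that I had once committed several large files. I deleted them later, but git keeps every committed file in its history, so they were never really gone. To fix this, the files have to be purged from the repository's history entirely.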

"Slimming" options

I know of two ways to slim a repository down:

  1. Use BFG, an open-source third-party tool
  2. Use plain git commands

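The BFG open-source tool

The official site documents the commands clearly, so I won't repeat them here. Note that BFG does not wrap git filter-branch or verify-pack: it is a standalone tool (written in Scala on top of JGit) that serves as a simpler and much faster alternative to git filter-branch for removing large files, with a more convenient interface and cleaner output.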
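For orientation, here is a minimal sketch of the workflow described on the BFG site, wrapped in a function so that nothing runs until it is called. The function name slim_with_bfg, its two parameters, and the 10M threshold are my own assumptions for illustration; it also assumes Java is installed and the BFG jar has already been downloaded.

```shell
# Sketch only: defining the function has no side effects.
slim_with_bfg() {
  repo_url=$1                      # clone URL of the repository to clean (assumed)
  bfg_jar=$2                       # path to the downloaded BFG jar (assumed)
  work=$(basename "$repo_url")     # e.g. "repo.git"

  # BFG works on a bare mirror clone, not on a normal working copy.
  git clone --mirror "$repo_url" "$work" &&
  # Remove every blob larger than 10 MB from history
  # (BFG leaves the files of the latest commit alone by default).
  java -jar "$bfg_jar" --strip-blobs-bigger-than 10M "$work" &&
  ( cd "$work" &&
    git reflog expire --expire=now --all &&
    git gc --prune=now --aggressive )
  # Inspect the result, then run `git push` from inside "$work".
}
```

A call would look like slim_with_bfg <clone-url> bfg.jar. Pushing is left as a manual step on purpose, since it overwrites the remote history.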

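Manual steps

Finding the large files

git verify-pack -v <pack>.idx verifies a pack file and, for every object in it, prints the SHA-1, the type, the size, the size inside the pack file, and the offset inside the pack file. Piping that through sort lists the objects from largest to smallest. (If the repository only has loose objects and no pack yet, run git gc first.)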

git verify-pack -v .git/objects/pack/pack-*.idx | grep -v -E "chain|delta" | sort -k 3 -n -r | more

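In the sort call, -k 3 selects the third field (the object size) as the sort key, -n compares numerically, and -r reverses the order so the largest object comes first. The grep -v removes the summary lines at the end of the verify-pack output.

The output looks something like this: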

87e60b237cbb421c45da4a4eec092e29d6f4f8d2 blob   4252532 2091380 2091725
5580e15226238964b1b28283ecd6b6a13a541562 blob   3480460 1719736 371989
3cf07f1ad58201171cbe1aab4a6b85fffae9f1bd blob   876848 217971 6594845
bc5ef36fc34765a1ca0691aec2f7dd25ae6b1dbc blob   522514 97651 7026291
1228580e01d722bd0f90ac44edaef7907b32475b blob   325732 310704 5253772
489bcdb7b2c36c162064ddb3237aa818c15eb59b blob   318821 62564 6509324
54178e3014e55061c41aa7c5ac81792c9c165706 blob   302901 289362 5782416

The largest object is ==87e60b237cbb421c45da4a4eec092e29d6f4f8d2==. The next step is to find the file name that belongs to it.
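To make the filtering and sorting step concrete, here is a small sketch that runs the same grep and sort over a few lines copied from the output above, with no repository needed. The two summary lines and their counts are made up to mimic what verify-pack prints at the end, and the MiB conversion with awk is my own addition.

```shell
# Sample lines in `git verify-pack -v` format: SHA-1, type, size, packed size, offset.
# The last two lines imitate the summary lines (counts are illustrative).
sample='bc5ef36fc34765a1ca0691aec2f7dd25ae6b1dbc blob   522514 97651 7026291
87e60b237cbb421c45da4a4eec092e29d6f4f8d2 blob   4252532 2091380 2091725
3cf07f1ad58201171cbe1aab4a6b85fffae9f1bd blob   876848 217971 6594845
chain length = 1: 12 objects
non delta: 340 objects'

# Drop the summary lines, sort numerically on field 3 (size), largest first.
top=$(printf '%s\n' "$sample" | grep -v -E "chain|delta" | sort -k 3 -n -r | head -n 3)

# Show each object ID with its size converted to MiB.
printf '%s\n' "$top" | awk '{ printf "%s %.2f MiB\n", $1, $3 / 1048576 }'

# The object ID of the largest entry.
largest=$(printf '%s\n' "$top" | head -n 1 | cut -d ' ' -f 1)
echo "largest: $largest"
```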

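git rev-list lists commit objects in reverse chronological order. With --objects it also prints every tree and blob reachable from those commits together with its path, so grepping for the hash reveals the file name.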

 git rev-list --objects --all | grep 87e60b237cbb421c45da4a4

Output:

87e60b237cbb421c45da4a4eec092e29d6f4f8d2 MyPractice/app/libs/armeabi-v7a/libksystreamer.so

So the largest file is libksystreamer.so.
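The two lookups can also be combined into one pipeline. The sketch below is a variation of my own, not the method used above: it feeds the rev-list output into git cat-file --batch-check, which reports each object's type and size while passing the path through. It builds a throwaway repository so it is safe to run; the file names and the demo identity are hypothetical.

```shell
# Throwaway repository, so nothing real is touched.
repo=$(mktemp -d)
cd "$repo" && git init -q .
mkdir -p libs
head -c 200000 /dev/zero > libs/big.bin     # hypothetical "large" file
echo "hello" > README.txt
git add .
git -c user.name=demo -c user.email=demo@example.com commit -q -m "add files"

# rev-list prints "<sha> <path>"; cat-file adds type and size and keeps the
# path as %(rest). Keep only blobs and sort by size, largest first.
largest=$(git rev-list --objects --all \
  | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' \
  | awk '$1 == "blob"' \
  | sort -k 3 -n -r \
  | head -n 1)
echo "$largest"
```

Unlike verify-pack, this also covers objects that have not been packed yet.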

The log command shows which commits touched that file:

git log --pretty=oneline --branches -- MyPractice/app/libs/armeabi-v7a/libksystreamer.so

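Deleting the file's entire history

In git, every version of a file that was ever committed stays reachable through the history, so the only way to really delete a file is to rewrite that history.

==git-filter-branch== is git's built-in command for rewriting history. It has a lot of options; there is no need to learn them all, only the ones needed here.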

git filter-branch --force --index-filter 'git rm --cached --ignore-unmatch MyPractice/app/libs/armeabi-v7a/libksystreamer.so' --prune-empty --tag-name-filter cat -- --all

For ==--index-filter==, here is what the git documentation says:

--index-filter <command>
    This is the filter for rewriting the index. It is similar to the tree filter but does not check out the tree,
    which makes it much faster. Frequently used with git rm --cached --ignore-unmatch ..., see EXAMPLES below. For
    hairy cases, see git-update-index(1).

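In other words, it rewrites the index of each commit without checking out any files, and it is usually paired with ==git rm --cached --ignore-unmatch==.

The ==--prune-empty== option: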

--prune-empty
    Some filters will generate empty commits that leave the tree untouched. This option instructs git-filter-branch
    to remove such commits if they have exactly one or zero non-pruned parents; merge commits will therefore remain
    intact. This option cannot be used together with --commit-filter, though the same effect can be achieved by
    using the provided git_commit_non_empty_tree function in a commit filter.

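As I read it: once the filter has removed the file, a commit that did nothing except touch that file ends up empty, and this option drops such commits.

The ==--tag-name-filter== option: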

--tag-name-filter <command>
           This is the filter for rewriting tag names. When passed, it will be called for every tag ref that points to a
           rewritten object (or to a tag object which points to a rewritten object). The original tag name is passed via
           standard input, and the new tag name is expected on standard output.

           The original tags are not deleted, but can be overwritten; use "--tag-name-filter cat" to simply update the
           tags. In this case, be very careful and make sure you have the old tags backed up in case the conversion has run
           afoul.

           Nearly proper rewriting of tag objects is supported. If the tag has a message attached, a new tag object will be
           created with the same message, author, and timestamp. If the tag has a signature attached, the signature will be
           stripped. It is by definition impossible to preserve signatures. The reason this is "nearly" proper, is because
           ideally if the tag did not change (points to the same object, has the same name, etc.) it should retain any
           signature. That is not the case, signatures will always be removed, buyer beware. There is also no support for
           changing the author or timestamp (or the tag message for that matter). Tags which point to other tags will be
           rewritten to point to the underlying commit.

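That is a lot of text. In short, this option rewrites tag names: <command> is a shell command that reads the old tag name on standard input and prints the new one, so cat keeps every name unchanged while the tags are moved to the rewritten commits.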
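The stdin/stdout contract is easy to try outside of git. In this tiny sketch the tag name v1.0 and the sed-based renaming filter are hypothetical; only cat is what the command in this post actually uses.

```shell
# A tag-name filter reads the old name on stdin and prints the new name on stdout.
old="v1.0"                                                # hypothetical tag name
same=$(printf '%s\n' "$old" | cat)                        # what "--tag-name-filter cat" yields
renamed=$(printf '%s\n' "$old" | sed 's/^/rewritten-/')   # a filter that adds a prefix
echo "$old -> $same (cat), $renamed (sed)"
```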

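In the trailing -- --all, the -- separates the filter-branch options from the revision arguments, and --all rewrites every ref (all branches and tags), not just the current branch.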
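To see the whole command work without risking a real repository, here is a sketch that runs it in a throwaway repository. The file names, commit messages, and demo identity are hypothetical. FILTER_BRANCH_SQUELCH_WARNING only silences the warning (and pause) that newer git versions print before filter-branch starts.

```shell
repo=$(mktemp -d)
cd "$repo" && git init -q .
git config user.name demo
git config user.email demo@example.com

echo "source" > main.txt
head -c 100000 /dev/zero > big.bin
git add . && git commit -q -m "add source and big file"
git rm -q big.bin && git commit -q -m "remove big file"

# The file is gone from the work tree but still reachable through history.
before=$(git rev-list --objects --all | grep -c "big.bin" || true)

export FILTER_BRANCH_SQUELCH_WARNING=1
git filter-branch --force --index-filter \
  'git rm --cached --ignore-unmatch big.bin' \
  --prune-empty --tag-name-filter cat -- --all >/dev/null 2>&1

# Look at the rewritten branches only; refs/original/ still holds a backup.
after=$(git rev-list --objects --branches | grep -c "big.bin" || true)
commits=$(git rev-list --count HEAD)
echo "before=$before after=$after commits=$commits"
```

The second commit only removed big.bin, so after the rewrite it is empty and --prune-empty drops it, which is why the commit count shrinks.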

Finishing up

Expire the reflog, repack and prune the objects, and push to the remote repository

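git for-each-ref --format='%(refname)' refs/original/ | xargs -n 1 git update-ref -d # delete the backup refs filter-branch leaves behind; they keep the old objects alive
git reflog expire --expire=now --all # expire every reflog entry
git gc --prune=now # repack and immediately prune unreachable objects
git push --all --force # force-push all branches to the remote
git push --tags --force # force-push the rewritten tags as well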
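The following sketch checks, in a throwaway repository, that the large blob really disappears from the local object store only after the cleanup. It includes deleting the refs/original/ backup refs that filter-branch creates, following the shrinking checklist in the git filter-branch documentation. File names and the demo identity are hypothetical, and the push is left out because there is no remote.

```shell
repo=$(mktemp -d)
cd "$repo" && git init -q .
git config user.name demo
git config user.email demo@example.com

echo "source" > main.txt
head -c 300000 /dev/urandom > big.bin        # random data, so it barely compresses
git add . && git commit -q -m "add source and big file"
blob=$(git rev-parse HEAD:big.bin)           # object ID of the large blob

export FILTER_BRANCH_SQUELCH_WARNING=1
git filter-branch --force --index-filter \
  'git rm --cached --ignore-unmatch big.bin' -- --all >/dev/null 2>&1

# Right after the rewrite the blob still exists in the object store ...
git cat-file -e "$blob" && still_there=yes

# ... because refs/original/ and the reflog keep the old commit reachable.
git for-each-ref --format='%(refname)' refs/original/ | xargs -n 1 git update-ref -d
git reflog expire --expire=now --all
git gc --quiet --prune=now

if git cat-file -e "$blob" 2>/dev/null; then gone=no; else gone=yes; fi
echo "still_there=$still_there gone=$gone"
```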

Summary of the procedure

The commands, roughly in order:

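git verify-pack -v .git/objects/pack/pack-*.idx | grep -v -E "chain|delta" | sort -k 3 -n -r | more
git rev-list --objects --all | grep 87e60b237cbb421c45da4a4
git log --pretty=oneline --branches -- MyPractice/app/libs/armeabi-v7a/libksystreamer.so
git filter-branch --force --index-filter 'git rm --cached --ignore-unmatch MyPractice/app/libs/armeabi-v7a/libksystreamer.so' --prune-empty --tag-name-filter cat -- --all
git for-each-ref --format='%(refname)' refs/original/ | xargs -n 1 git update-ref -d # delete the backup refs filter-branch leaves behind
git reflog expire --expire=now --all # expire every reflog entry
git gc --prune=now # repack and immediately prune unreachable objects
git push --all --force # force-push all branches to the remote
git push --tags --force # force-push the rewritten tags as well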

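Wrapping up

git has a great many low-level commands, most of which you will rarely need; it is enough to look them up when the occasion arises. And since commands like these rewrite history, they are best avoided unless you really need them.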
