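TfLite porting: the generated object file is too large
When TfLite is ported to other environments, such as MCUs or other hardware with little RAM, the size of the TfLite image has to be kept under control.
1. Does the generated file contain debug information?
Thanks again to the book mentioned earlier, which is where I learned about the file command.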
file
file is a standard Unix command that identifies the type of a file.
For example:
file chre_app_oem.so
chre_app_oem.so: ELF 32-bit LSB shared object, QUALCOMM DSP6, version 1 (SYSV), dynamically linked, with debug_info, not stripped
Note the "with debug_info" in the output.
readelf -S lists the sections; look at the debug sections and their sizes:
readelf -S chre_app_oem.so
There are 34 section headers, starting at offset 0x1b4320:
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
[ 1] .dynsym DYNSYM 00000094 000094 000d90 10 A 2 0 4
[ 2] .dynstr STRTAB 00000e24 000e24 0024cc 00 A 0 0 1
[ 3] .hash HASH 000032f0 0032f0 0006d0 04 A 1 0 4
[ 4] .rela.dyn RELA 000039c0 0039c0 0008dc 0c A 1 0 4
[ 5] .rela.plt RELA 0000429c 00429c 0001a4 0c A 1 7 4
[ 6] .init PROGBITS 00004440 004440 00006c 00 AX 0 0 32
[ 7] .plt PROGBITS 000044b0 0044b0 000260 00 AX 0 0 16
[ 8] .text PROGBITS 00004720 004720 0079ec 00 AX 0 0 32
[ 9] .fini PROGBITS 0000c120 00c120 000044 00 AX 0 0 32
[10] .rodata PROGBITS 0000c170 00c170 0373f4 00 A 0 0 16
[11] .eh_frame PROGBITS 00043580 043580 000004 00 A 0 0 32
[12] .dynamic DYNAMIC 00044000 044000 0000a0 08 WA 2 0 4
[13] .data.rel.ro PROGBITS 000440a0 0440a0 0002b0 00 WA 0 0 8
[14] .data PROGBITS 00044350 044350 000d08 00 WA 0 0 8
[15] .ctors PROGBITS 00045058 045058 000008 00 WA 0 0 4
[16] .dtors PROGBITS 00045060 045060 00000c 00 WA 0 0 4
[17] .got PROGBITS 0004506c 04506c 00002c 00 WA 0 0 4
[18] .got.plt PROGBITS 00045098 045098 00009c 00 WA 0 0 8
[19] .debug_info PROGBITS 00000000 045134 0b08a9 00 0 0 1
[20] .debug_str PROGBITS 00000000 0f59dd 062f05 01 MS 0 0 1
[21] .debug_abbrev PROGBITS 00000000 1588e2 0047c6 00 0 0 1
[22] .debug_aranges PROGBITS 00000000 15d0a8 0004a8 00 0 0 1
[23] .debug_ranges PROGBITS 00000000 15d550 002a80 00 0 0 1
[24] .debug_macinfo PROGBITS 00000000 15ffd0 000012 00 0 0 1
[25] .debug_pubnames PROGBITS 00000000 15ffe2 00783b 00 0 0 1
[26] .debug_pubtypes PROGBITS 00000000 16781d 02fb3c 00 0 0 1
[27] .comment PROGBITS 00000000 197359 000074 00 MS 0 0 1
[28] .debug_line PROGBITS 00000000 1973cd 009468 00 0 0 1
[29] .debug_loc PROGBITS 00000000 1a0835 00e010 00 0 0 1
[30] .debug_frame PROGBITS 00000000 1ae848 000e98 00 0 0 4
[31] .shstrtab STRTAB 00000000 1af6e0 000144 00 0 0 1
[32] .symtab SYMTAB 00000000 1af824 001540 10 33 123 4
[33] .strtab STRTAB 00000000 1b0d64 0035a1 00 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
L (link order), O (extra OS processing required), G (group), T (TLS),
C (compressed), x (unknown), o (OS specific), E (exclude),
p (processor specific)
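To put a number on how much of the file is debug data, the Size column of the .debug_* rows can be added up. The sketch below does this for the text printed by readelf -S. The function name SumDebugSectionBytes is made up for this illustration, and the parser assumes the 32-bit row layout shown above (Name, Type, Addr, Off, Size).

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Sums the Size column of every .debug_* row in `readelf -S` text output.
// Hypothetical helper: it assumes the columns Name, Type, Addr, Off, Size.
std::size_t SumDebugSectionBytes(const std::string& readelf_output) {
  std::size_t total = 0;
  std::istringstream lines(readelf_output);
  std::string line;
  while (std::getline(lines, line)) {
    const std::size_t bracket = line.find(']');
    if (bracket == std::string::npos) continue;  // not a section row
    std::istringstream fields(line.substr(bracket + 1));
    std::string name, type, addr, off, size;
    if (!(fields >> name >> type >> addr >> off >> size)) continue;
    if (name.rfind(".debug_", 0) != 0) continue;  // keep only debug sections
    total += std::stoul(size, nullptr, 16);       // Size is hexadecimal
  }
  return total;
}
```

For the listing above, the eleven .debug_* sections add up to 1,484,085 bytes, while the whole file is roughly 1.79 MB (the section headers start at offset 0x1b4320), so most of the file is debug information.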
2. Strip symbols
hexagon-strip --help
Usage: hexagon-strip [options] file...
Discard information from ELF objects.
Options:
-d | -g | -S | --strip-debug Remove debugging symbols.
-h | --help Print a help message.
--only-keep-debug Keep debugging information only.
-p | --preserve-dates Preserve access and modification times.
-s | --strip-all Remove all symbols.
--strip-unneeded Remove symbols not needed for relocation
processing.
-w | --wildcard Use shell-style patterns to name symbols.
-x | --discard-all Discard all non-global symbols.
-I TGT| --input-target=TGT (Accepted, but ignored).
-K SYM | --keep-symbol=SYM Keep symbol 'SYM' in the output.
-N SYM | --strip-symbol=SYM Remove symbol 'SYM' from the output.
-O TGT | --output-target=TGT Set the output file format to 'TGT'.
-R SEC | --remove-section=SEC Remove the section named 'SEC'.
-V | --version Print a version identifier and exit.
-X | --discard-locals Remove compiler-generated local symbols.
-z | --output-format Specify the output format for the archive (cpio, shar, tar, zip, mtree).
-o <file> Place stripped output in <file>. If not specified, strip is inplace.
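Besides debug information, what else can be removed? Symbols? The help text above shows that everything strip removes is some kind of symbol.
With so many options, which one should be used? Suppose we try:
./hexagon-strip -x chre_app_oem.so (Discard all non-global symbols)
Then check again with the file command: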
file chre_app_oem.so
chre_app_oem.so: ELF 32-bit LSB shared object, QUALCOMM DSP6, version 1 (SYSV), dynamically linked, not stripped
It is still "not stripped". Note the option [ -s | --strip-all Remove all symbols. ]
Run hexagon-strip -s chre_app_oem.so, then run file again:
chre_app_oem.so: ELF 32-bit LSB shared object, QUALCOMM DSP6, version 1 (SYSV), dynamically linked, stripped
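The difference between the two results seems to come down to whether the file still has a symbol table section (type SHT_SYMTAB, value 2): -x keeps .symtab with the global symbols, while -s removes it. That is my understanding of how file decides, so treat it as an assumption. The sketch below checks for such a section in a 32-bit little-endian ELF image held in memory; the function names are made up for this illustration and the offsets come from the ELF32 header layout.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Reads a little-endian unsigned integer of `width` bytes at `offset`.
static std::uint32_t ReadLe(const std::vector<std::uint8_t>& image,
                            std::size_t offset, std::size_t width) {
  std::uint32_t value = 0;
  for (std::size_t i = 0; i < width; ++i) {
    value |= static_cast<std::uint32_t>(image[offset + i]) << (8 * i);
  }
  return value;
}

// Returns true if a 32-bit little-endian ELF image contains a section of
// type SHT_SYMTAB. Hypothetical helper, not part of any toolchain.
bool HasSymtabSection(const std::vector<std::uint8_t>& image) {
  constexpr std::size_t kElf32HeaderSize = 52;
  constexpr std::size_t kElf32SectionHeaderSize = 40;
  constexpr std::uint32_t kShtSymtab = 2;
  if (image.size() < kElf32HeaderSize) return false;
  if (image[0] != 0x7f || image[1] != 'E' || image[2] != 'L' ||
      image[3] != 'F') {
    return false;  // not an ELF file
  }
  if (image[4] != 1 || image[5] != 1) return false;  // ELFCLASS32, LSB only
  const std::size_t shoff = ReadLe(image, 32, 4);      // e_shoff
  const std::size_t shentsize = ReadLe(image, 46, 2);  // e_shentsize
  const std::size_t shnum = ReadLe(image, 48, 2);      // e_shnum
  if (shentsize < kElf32SectionHeaderSize) return false;
  for (std::size_t i = 0; i < shnum; ++i) {
    const std::size_t header = shoff + i * shentsize;
    if (header + 8 > image.size()) return false;  // truncated file
    if (ReadLe(image, header + 4, 4) == kShtSymtab) return true;  // sh_type
  }
  return false;
}
```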
3. How to remove debug and symbol information
Use hexagon-strip, the platform-specific build of strip:
hexagon-strip -d -s xxx.so
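4. Compiler optimization options
-O, -O1
These two options are equivalent. GCC performs optimizations that reduce code size and execution time, but skips those that would significantly increase compile time.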
-O2
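At this level GCC performs nearly all supported optimizations that do not trade space for speed; older GCC documentation gives loop unrolling and function inlining as examples of what is not done here. Compared with -O, this option increases both compilation time and the performance of the generated code.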
-O3
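In addition to everything -O2 enables, this level also turns on -finline-functions, -funswitch-loops and -fgcse-after-reload, with the single goal of optimizing for speed as aggressively as possible.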
-Os
This option optimizes specifically for code size. -Os enables all -O2 optimizations that do not significantly increase code size.
-O0
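This option disables optimization.
Note that although GCC offers these four overall optimization levels (1 to 3 and s), in practice -O3 does not always produce the most efficient program, and most of the time -O2 is what gets used. If you want the best result, it is worth trying all four.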
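In a large build system it is not always obvious which flag actually reached the compiler. A minimal sketch for checking this from inside the code, assuming a GCC or Clang based toolchain that defines the predefined macros __OPTIMIZE__ and __OPTIMIZE_SIZE__; the function name OptimizationMode is made up for this illustration.

```cpp
// Reports the optimization mode the translation unit was compiled with.
// Relies on GCC/Clang predefined macros; other compilers fall through to
// "none" regardless of their flags.
const char* OptimizationMode() {
#if defined(__OPTIMIZE_SIZE__)
  return "size";   // -Os (Clang also defines this for -Oz)
#elif defined(__OPTIMIZE__)
  return "speed";  // -O1, -O2 or -O3
#else
  return "none";   // -O0 or no -O flag
#endif
}
```

Logging this value once at start-up is a cheap way to confirm that a size-oriented build really used -Os.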
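5. Which libraries make the file too large
If the generated file is still too large after the steps above, a closer analysis is needed. As mentioned earlier, libstdc++ was replaced with libc++ to get rid of some undefined symbols, and libc++ is much larger than libstdc++. This does not mean the linker pulls in the whole library: it only includes the .o files that are actually needed, and unrelated .o files in libc++ are not linked.
So why does libc++ make the object file bigger? It comes down to those few undefined symbols. Are the undefined functions important? Can they be implemented as empty functions? Can we avoid using them altogether?
5.1 Disassemble to see what this code actually is
The disassembler is platform specific, unlike readelf and the other generic ELF tools. The object file used here must keep its debug and symbol information, that is, do not run strip on it. It should also be linked against libc++, so that no undefined symbol error occurs.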
hexagon-llvm-objdump --line-numbers libchre_slpi_skel.so > 1.log
Search 1.log for the symbol to find the source file and line that reference it:
1] _ZNKSt3__120__vector_base_commonILb1EE20__throw_length_errorEv
2796 _ZNSt3__16vectorIiNS_9allocatorIiEEE8allocateEj:
2797 /home/ws/buildtool/fusion/tools/HEXAGON_Tools/8.2.05/Tools/bin/../target/hexagon/include/c++/v1/vector:928
918 // Allocate space for __n objects
919 // throws length_error if __n > max_size()
920 // throws (probably bad_alloc) if memory run out
921 // Precondition: __begin_ == __end_ == __end_cap() == 0
922 // Precondition: __n > 0
923 // Postcondition: capacity() == __n
924 // Postcondition: size() == 0
925 template <class _Tp, class _Allocator>
926 void
927 vector<_Tp, _Allocator>::allocate(size_type __n)
928 {
929 if (__n > max_size()) // error handler called when the vector length exceeds the maximum
930 this->__throw_length_error(); //_ZNKSt3__120__vector_base_commonILb1EE20__throw_length_errorEv
931 this->__begin_ = this->__end_ = __alloc_traits::allocate(this->__alloc(), __n);
932 this->__end_cap() = this->__begin_ + __n;
933 __annotate_new(0);
934 }
Where is _ZNKSt3__120__vector_base_commonILb1EE20__throw_length_errorEv defined?
_ZNKSt3__120__vector_base_commonILb1EE20__throw_length_errorEv
98226 _ZNKSt3__120__vector_base_commonILb1EE20__throw_length_errorEv: (shipped with the compiler, no source available)
98227 /local/mnt/workspace/bots/hexbotmaster-sles11_sd_20/libcxx-base-bldr/build/llvm/projects/libcxx/include/vector:300
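The compiler usually ships its own standard library, so the source is not always available, but for open-source code the source can certainly be found.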
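The mangled names in the log are easier to read once demangled. A minimal sketch, assuming a GCC or Clang toolchain whose C++ runtime provides abi::__cxa_demangle in <cxxabi.h>; the wrapper name Demangle is made up for this illustration.

```cpp
#include <cxxabi.h>

#include <cstdlib>
#include <string>

// Returns the human-readable form of an Itanium-ABI mangled name, or the
// input unchanged if it cannot be demangled.
std::string Demangle(const char* mangled) {
  int status = 0;
  char* text = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
  if (status != 0 || text == nullptr) return mangled;
  std::string result(text);
  std::free(text);  // the buffer returned by __cxa_demangle is malloc'ed
  return result;
}
```

Passing _ZNKSt3__120__vector_base_commonILb1EE20__throw_length_errorEv to it yields a name in namespace std::__1 ending in __throw_length_error, which confirms that the symbol belongs to libc++ and not to the application code.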
_ZNSt3__112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEE6__initEPKcj
10096 /home/ws/code/cepheus-native/vendor/qcom/non-hlos-sm8150-la10/slpi_proc/chre/tensorflow/tensorflow/lite/mutable_op_resolver.cc:48
10097 47e5c: 00 50 51 f2 f2515000 { p0=cmp.gt(r17,r16)
10098 47e60: 12 42 01 f5 f5014212 r19:18=combine(r1,r2)
10099 47e64: 36 0a 3d ea ea3d0a36 memd(r29+#56)=r19:18; memd(r29+#48)=r21:20 }
10100 47e68: 15 40 9d 74 749d4015 { if (!p0) r21=add(r29,#0)
10101 47e6c: 14 43 80 74 74804314 if (!p0) r20=add(r0,#24)
10102 47e70: 4e 40 00 5c 5c00404e if (p0) jump:nt 0x47f04 <_ZN6tflite17MutableOpResolver9AddCustomEPKcPK19_TfLiteRegistrationii+0xB0>
10103 47e74: 05 d6 dd a1 a1ddd605 memd(r29+#40)=r23:22 }
10104 47e78: 16 c0 00 7c 7c00c016 { r23:22=combine(#0,#0) }
10105 /home/ws/code/cepheus-native/vendor/qcom/non-hlos-sm8150-la10/slpi_proc/chre/tensorflow/tensorflow/lite/mutable_op_resolver.cc:49
10106 47e7c: 00 40 73 70 70734000 { r0=r19
10107 47e80: 61 40 92 91 91924061 r1=memw(r18+#12)
10108 47e84: 08 d2 bd a1 a1bdd208 memw(r29+#32)=r1.new }
10109 47e88: 41 40 92 91 91924041 { r1=memw(r18+#8)
10110 47e8c: 07 d2 bd a1 a1bdd207 memw(r29+#28)=r1.new }
10111 47e90: 21 40 92 91 91924021 { r1=memw(r18+#4)
10112 47e94: 06 d2 bd a1 a1bdd206 memw(r29+#24)=r1.new }
10113 47e98: 82 40 92 91 91924082 { r2=memw(r18+#16)
10114 47e9c: 09 d2 bd a1 a1bdd209 memw(r29+#36)=r2.new }
10115 47ea0: 01 40 92 91 91924001 { r1=memw(r18+#0)
10116 47ea4: 05 d2 bd a1 a1bdd205 memw(r29+#20)=r1.new }
10117 /home/ws/buildtool/fusion/tools/HEXAGON_Tools/8.2.05/Tools/bin/../target/hexagon/include/c++/v1/memory:2245
10118 47ea8: fc 45 ff 5b 5bff45fc { call 0x40aa0 <.plt+0x110>
10119 47eac: 07 0a d2 f0 f0d20a07 memw(r21+#8)=#0; memd(r29+#0)=r23:22 }
10120 /home/ws/buildtool/fusion/tools/HEXAGON_Tools/8.2.05/Tools/bin/../target/hexagon/include/c++/v1/string:647
10121 47eb0: 84 73 08 5a 5a087384 { call 0x8e5b8 <_ZNSt3__112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEE6__initEPKcj>
10122 47eb4: 00 40 1d b0 b01d4000 r0=add(r29,#0)
10123 47eb8: b1 30 02 30 300230b1 r2=r0; r1=r19 }
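std::string is a typedef for an instantiation of the basic_string class template. Constructing a std::string from a C string calls _ZNSt3__112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEE6__initEPKcj, so this symbol leads us to the places in the source that use std::string.
In the same way, we can find which calls in the source cause the other undefined symbols.
6. How to avoid the undefined symbols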
std::vector
std::string
std::hash
std::sort
6.1 std::string
Do not use std::string; use const char* instead.
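One thing to watch when making this swap: == on std::string compares contents, while == on const char* compares addresses. A minimal sketch of a content comparison, using a helper name (SameOpName) that is made up for this illustration:

```cpp
#include <cstring>

// Compares two operator names by content rather than by pointer value.
// Two null pointers count as equal; a null and a non-null pointer do not.
bool SameOpName(const char* a, const char* b) {
  if (a == nullptr || b == nullptr) return a == b;
  return std::strcmp(a, b) == 0;
}
```

The strings behind the pointers must also outlive every place that stores the pointer, since const char* does not own its data the way std::string does.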
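6.2 std::vector
The undefined symbol here is an error-handling function. We can implement it ourselves as a C function that uses the mangled C++ name directly and ignores the parameters.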
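// tensorflow/lite/c/chre_stdc_undefined.h
#ifndef TENSORFLOW_LITE_C_CHRE_STDC_UNDEFINED_H_
#define TENSORFLOW_LITE_C_CHRE_STDC_UNDEFINED_H_
#ifdef __cplusplus
extern "C" {
#endif // __cplusplus
// Stubs for libc++ error paths, declared under their mangled names.
void _ZNKSt3__120__vector_base_commonILb1EE20__throw_length_errorEv(void);
void _ZNSt11logic_errorC2EPKc(void);
#ifdef __cplusplus
} // extern "C"
#endif // __cplusplus
#endif // TENSORFLOW_LITE_C_CHRE_STDC_UNDEFINED_H_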
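#include "tensorflow/lite/c/chre_stdc_undefined.h"
// Reached when a std::vector would grow beyond max_size(); this stub does
// nothing and returns to the caller.
void _ZNKSt3__120__vector_base_commonILb1EE20__throw_length_errorEv(void){
}
// Mangled name of the constructor std::logic_error::logic_error(const char*).
void _ZNSt11logic_errorC2EPKc(void) {
}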
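An empty stub lets the caller carry on after an error that libc++ expects never to return from. A possible variant, sketched below under my own assumptions, stops the program by default and lets the port install a different reaction. FatalHandler and SetFatalHandler are hypothetical names for this illustration, not part of TfLite, CHRE or the Hexagon SDK.

```cpp
#include <cstdlib>

// Hypothetical hook type: called when a stubbed libc++ error path is hit.
using FatalHandler = void (*)(const char* what);

// Default reaction: stop immediately instead of continuing with bad state.
static void AbortHandler(const char* /*what*/) { std::abort(); }

static FatalHandler g_fatal_handler = AbortHandler;

// Lets the port replace the default reaction, e.g. with a logging handler.
void SetFatalHandler(FatalHandler handler) { g_fatal_handler = handler; }

// Same mangled name as in the stub above, defined with C linkage.
extern "C" void
_ZNKSt3__120__vector_base_commonILb1EE20__throw_length_errorEv(void) {
  g_fatal_handler("std::vector length_error");
}
```

Whether aborting is acceptable depends on the target; the point is only that the error path does something visible.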
6.3 std::sort
_ZNSt3__16__sortIRNS_6__lessIiiEEPiEEvT0_S5_T_
6.3.1 Locating std::sort in the source
la10/slpi_proc/chre/tensorflow/tensorflow/lite/graph_info.cc:103
3397 4329c: 42 c0 9d 91 919dc042 { r2=memw(r29+#8) }
3398 /home/ws/buildtool/fusion/tools/HEXAGON_Tools/8.2.05/Tools/bin/../target/hexagon/include/c++/v1/vector:1466
3399 432a0: 10 c0 82 91 9182c010 { r16=memw(r2+#0) }
3400 /home/ws/buildtool/fusion/tools/HEXAGON_Tools/8.2.05/Tools/bin/../target/hexagon/include/c++/v1/vector:1482
3401 432a4: 31 40 82 91 91824031 { r17=memw(r2+#4)
3402 432a8: 98 d0 02 20 2002d098 if (cmp.eq(r17.new,r16)) jump:nt 0x433d4 <_ZN6tflite38PartitionGraphIntoIndependentSubgraphsEPKNS_9GraphInfoEPK14TfLiteIntArrayPNSt3__16vectorINS_8SubgraphENS6_9a
3403 /home/ws/buildtool/fusion/tools/HEXAGON_Tools/8.2.05/Tools/bin/../target/hexagon/include/c++/v1/vector:1466
3404 432ac: 7a 59 04 5a 5a04597a { call 0x665a0 <_ZNSt3__16__sortIRNS_6__lessIiiEEPiEEvT0_S5_T_>
100 // Make sure every subgraph's inputs and outputs are unique. Since the
101 // list of inputs and outputs is generated in a way that produces
102 // duplicates.
103 for (Subgraph& subgraph : *subgraphs_) {
104 // Sort and uniquefy using standard library algorithms.
105 auto uniquefy = [](std::vector<int>* items) {
106 std::sort(items->begin(), items->end());
107 auto last = std::unique(items->begin(), items->end());
108 items->erase(last, items->end());
109 };
110 uniquefy(&subgraph.input_tensors);
111 uniquefy(&subgraph.output_tensors);
112 }
//sort:
/**
* @brief Sort the elements of a sequence.
* @ingroup sorting_algorithms
* @param __first An iterator.
* @param __last Another iterator.
* @return Nothing.
*
* Sorts the elements in the range @p [__first,__last) in ascending order,
* such that for each iterator @e i in the range @p [__first,__last-1),
* *(i+1)<*i is false.
*
* The relative ordering of equivalent elements is not preserved, use
* @p stable_sort() if this is needed.
*/
template<typename _RandomAccessIterator>
inline void
sort(_RandomAccessIterator __first, _RandomAccessIterator __last)
{
// concept requirements
__glibcxx_function_requires(_Mutable_RandomAccessIteratorConcept<
_RandomAccessIterator>)
__glibcxx_function_requires(_LessThanComparableConcept<
typename iterator_traits<_RandomAccessIterator>::value_type>)
__glibcxx_requires_valid_range(__first, __last);
__glibcxx_requires_irreflexive(__first, __last);
std::__sort(__first, __last, __gnu_cxx::__ops::__iter_less_iter());
}
6.3.2 Solution
#include <algorithm>
#include <cstddef>
#include <vector>
#include "HAP_farf.h"  // FARF logging macro from the Hexagon SDK

// Comparator predicate: returns true if a < b, false otherwise.
struct IntComparator
{
bool operator()(const int &a, const int &b) const
{
return a < b;
}
};
void test_std_sort(void) {
std::vector<int> items { 4, 3, 1, 2, 4, 2};
//std::sort(items.begin(), items.end(), IntComparator());
// Sort and uniquefy using standard library algorithms.
auto uniquefy = [](std::vector<int>* items) {
// Passing a comparator avoids the default std::sort symbol shown above.
std::sort(items->begin(), items->end(), IntComparator());
auto last = std::unique(items->begin(), items->end());
items->erase(last, items->end());
};
uniquefy(&items);
for (std::size_t i = 0; i < items.size(); i++) {
FARF(ALWAYS, "--------> test sort [%d] <-----------------------",items[i]);
}
}
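Why does an explicit comparator help? As far as I understand, libc++ declares std::__1::__sort for the default comparator on built-in types as an external template instantiation that lives in the library itself, which is why the default call turns into a reference to _ZNSt3__16__sortIRNS_6__lessIiiEEPiEEvT0_S5_T_. A user-defined comparator is a different template argument, so the compiler instantiates the sort in our own object file. Below is a portable sketch of the same idea without the Hexagon logging macro; IntLess and SortedUnique are names made up for this illustration.

```cpp
#include <algorithm>
#include <vector>

// User-defined comparator: makes std::sort instantiate in this translation
// unit instead of referring to a sort precompiled into the library.
struct IntLess {
  bool operator()(int a, int b) const { return a < b; }
};

// Returns the elements of `items` in ascending order with duplicates removed.
std::vector<int> SortedUnique(std::vector<int> items) {
  std::sort(items.begin(), items.end(), IntLess());
  items.erase(std::unique(items.begin(), items.end()), items.end());
  return items;
}
```

Running readelf or the disassembler on the rebuilt object is the way to confirm that the external __sort reference is really gone on a given toolchain.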
6.4 _ZNSt3__112__next_primeEj (hash related)
2109 void
2110 __hash_table<_Tp, _Hash, _Equal, _Alloc>::rehash(size_type __n)
2111 {
2112 if (__n == 1)
2113 __n = 2;
2114 else if (__n & (__n - 1))
2115 __n = __next_prime(__n);
2116 size_type __bc = bucket_count();
2117 if (__n > __bc)
2118 __rehash(__n);
2119 else if (__n < __bc)
2120 {
2121 __n = _VSTD::max<size_type>
2122 (
2123 __n,
2124 __is_hash_power2(__bc) ? __next_hash_pow2(size_t(ceil(float(size()) / max_load_factor()
2125 __next_prime(size_t(ceil(float(size()) / max_load_factor())))
2126 );
2127 if (__n < __bc)
2128 __rehash(__n);
2129 }
2130 }
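The first branch of rehash() above shows where __next_prime comes in: a requested bucket count of 1 becomes 2, a power of two is kept, and anything else is rounded up to a prime. The sketch below mirrors only that rounding step with a naive trial-division search. It illustrates the behaviour and is not libc++'s implementation; IsPrime, NextPrime and RoundedBucketCount are names made up here.

```cpp
#include <cstddef>

// Naive primality test by trial division; fine for an illustration.
bool IsPrime(std::size_t n) {
  if (n < 2) return false;
  for (std::size_t d = 2; d * d <= n; ++d) {
    if (n % d == 0) return false;
  }
  return true;
}

// Smallest prime that is greater than or equal to n.
std::size_t NextPrime(std::size_t n) {
  while (!IsPrime(n)) ++n;
  return n;
}

// Mirrors the first branch of rehash(): 1 becomes 2, powers of two are kept,
// everything else is rounded up to a prime.
std::size_t RoundedBucketCount(std::size_t n) {
  if (n == 1) return 2;
  if (n & (n - 1)) return NextPrime(n);  // n is not a power of two
  return n;
}
```

Because every std::unordered_map rehash goes through this step, any use of the container pulls in the __next_prime symbol.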
Use std::map instead of the hash map (std::unordered_map):
typedef std::pair<tflite::BuiltinOperator, int> BuiltinOperatorKey;
//typedef std::pair<std::string, int> CustomOperatorKey;
typedef std::pair<const char*, int> CustomOperatorKey;
#if 0
std::unordered_map<BuiltinOperatorKey, TfLiteRegistration,
op_resolver_hasher::OperatorKeyHasher<BuiltinOperatorKey> >
builtins_;
std::unordered_map<CustomOperatorKey, TfLiteRegistration,
op_resolver_hasher::OperatorKeyHasher<CustomOperatorKey> >
custom_ops_;
#endif
// Note: with const char* in the key, std::map orders keys by pointer value,
// not by string contents.
std::map<BuiltinOperatorKey, TfLiteRegistration> builtins_;
std::map<CustomOperatorKey, TfLiteRegistration> custom_ops_;
__next_prime is a key function that cannot simply be stubbed out, and I found no workaround like the one for sort (switching to a different instantiation), so the relevant code was changed to avoid hashing altogether.
#include <map>
#include <utility>
#include "HAP_farf.h"  // FARF logging macro from the Hexagon SDK

void test_pair_map(void) {
typedef std::pair<int, int> BuiltinOperatorKey;
typedef std::pair<const char*, int> CustomOperatorKey;
std::map<BuiltinOperatorKey, int> tst;
BuiltinOperatorKey tst_pair = std::make_pair(2, 3);
tst[tst_pair] = 4;
FARF(ALWAYS, "--------> test pair map <-----------------------");
}
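A std::map keyed on std::pair<const char*, int> uses the default ordering, which compares the pointer values, so two equal names stored at different addresses count as different keys. If lookup by name is needed, a comparator that looks at the string contents can be supplied. This is a sketch under that assumption; CustomOperatorKeyLess and CustomOpTable are names made up for this illustration, and int stands in for the registration type.

```cpp
#include <cstring>
#include <map>
#include <utility>

using CustomOperatorKey = std::pair<const char*, int>;

// Orders keys by operator name (string contents) first, then by version.
struct CustomOperatorKeyLess {
  bool operator()(const CustomOperatorKey& a,
                  const CustomOperatorKey& b) const {
    const int by_name = std::strcmp(a.first, b.first);
    if (by_name != 0) return by_name < 0;
    return a.second < b.second;
  }
};

// int stands in for TfLiteRegistration to keep the sketch self-contained.
using CustomOpTable = std::map<CustomOperatorKey, int, CustomOperatorKeyLess>;
```

With this comparator, a lookup succeeds for any pointer to the same name, as long as the stored strings stay alive for the lifetime of the map.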
With the library-related changes above in place, the undefined symbol problem is resolved.