z memcpy (for MSVC): fast copying of small memory blocks

2024-02-26 02:48
Tags: memory, copy, memcpy, msvc, high-speed




z memcpy (MSVC only): fast copying of small memory blocks. Even in Debug mode it copies faster than memcpy, and its Debug and Release timings are close to each other.

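Note (2016-3-6): because of factors such as instruction-cache hits and inlining depth, this function looks very good in the benchmark, but it will not necessarily do as well once it is dropped into a real program. Test and compare before deciding to use it.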

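Tested and working on VC2008 and later. Because the code relies on inline __asm blocks, it only builds for 32-bit x86 targets; MSVC does not support inline assembly on x64.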

Release Mode
All time to memcpy 63 * 100M is 0.042s in 3GHz (zmemcopy template const size)
All time to memcpy 63 * 100M is 0.050s in 3GHz (zmemcopy static  const size)
All time to memcpy 63 * 100M is 0.147s in 3GHz (memcpy const size)
All time to memcpy 63 * 100M is 0.048s in 3GHz (zmemcopy const size)
All time to memcpy 63 * 100M is 0.050s in 3GHz (zmemcopy unknown array direct)
All time to memcpy 63 * 100M is 0.051s in 3GHz (zmemcopy unknown small size)
All time to memcpy 63 * 100M is 0.056s in 3GHz (zmemcopy unknown size)
All time to memcpy 63 * 100M is 0.140s in 3GHz (memcpy unknown size)


Debug Mode
All time to memcpy 63 * 100M is 0.056s in 3GHz (zmemcopy template const size)
All time to memcpy 63 * 100M is 0.055s in 3GHz (zmemcopy static  const size)
All time to memcpy 63 * 100M is 0.171s in 3GHz (memcpy const size)
All time to memcpy 63 * 100M is 0.093s in 3GHz (zmemcopy const size)
All time to memcpy 63 * 100M is 0.060s in 3GHz (zmemcopy unknown array direct)
All time to memcpy 63 * 100M is 0.086s in 3GHz (zmemcopy unknown small size)
All time to memcpy 63 * 100M is 0.100s in 3GHz (zmemcopy unknown size)
All time to memcpy 63 * 100M is 0.172s in 3GHz (memcpy unknown size)
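As a rough sanity check on these figures: the timing loops in the test program run 10,000,000 copies per case (the "100M" in each label comes from the hard-coded `100000000 / 1000000` in the printf), and the tick count is divided by 3,000,000,000 on the assumption of a 3 GHz time-stamp counter. Under those same assumptions a printed time converts back to cycles per copy as below; the helper name `cycles_per_copy` is hypothetical and not part of zmemcpy.

```cpp
// Hypothetical helper: turn a time printed by the benchmark back into cycles
// per copy. Assumes, like the benchmark's printf, a 3 GHz tick rate.
constexpr double cycles_per_copy(double seconds, double iterations, double hz = 3e9)
{
    return seconds * hz / iterations;
}

// Release mode, 63-byte copies, 10,000,000 copies per case:
//   zmemcopy template const size: 0.042 s -> about 12.6 cycles per copy
//   memcpy const size:            0.147 s -> about 44.1 cycles per copy
```

At this scale a few cycles of call or dispatch overhead is a large fraction of each copy, which is consistent with the 2016 note that results inside a real program may differ.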

Usage: normally call zmemcpy(dest, src, size). When the copy length is known and is a constant expression, you can call ZMemoryCopy::copy<size>(dest, src) instead.
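The dispatch behind zmemcpy(dest, src, size) can be sketched in portable C++. This is an illustration under assumptions, not the author's code: it replaces the hand-written SSE __asm with std::memcpy of a compile-time length (which optimizing compilers typically lower to a few register moves) and builds the 129-entry table with std::index_sequence instead of the pCopyFunc macro. The names copy_fixed, kCopyTable and small_copy are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <cstring>
#include <utility>

// Same shape as the ___copy_N routines: fixed length, no size argument.
using CopyFn = void (*)(char* dest, const char* src);

// One routine per length N. N is a compile-time constant, so an optimizing
// compiler usually expands this memcpy into a few plain moves.
template <std::size_t N>
void copy_fixed(char* dest, const char* src)
{
    std::memcpy(dest, src, N);
}

// Build {copy_fixed<0>, copy_fixed<1>, ..., copy_fixed<128>} at compile time.
template <std::size_t... Ns>
constexpr std::array<CopyFn, sizeof...(Ns)> make_table(std::index_sequence<Ns...>)
{
    return {{&copy_fixed<Ns>...}};
}

constexpr std::array<CopyFn, 129> kCopyTable = make_table(std::make_index_sequence<129>{});

// Copy whole 128-byte groups first, then let the remaining 0..127 bytes
// select their fixed-size routine from the table (the structure of zmemcpy).
inline void small_copy(char* dest, const char* src, std::size_t size)
{
    const std::size_t head = size & ~static_cast<std::size_t>(127);
    for (std::size_t off = 0; off < head; off += 128)
        copy_fixed<128>(dest + off, src + off);
    kCopyTable[size & 127](dest + head, src + head);
}
```

Only the table lookup depends on the run-time size; whether that beats the library memcpy depends on the compiler and the call site, as the note above warns.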

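// zmemcpy.cpp : Defines the entry point for the console application.
//#include "stdafx.h"
#include "zmemcpy.h"
#include <stdio.h>		// puts, printf
#include <string.h>		// memset, memcmp, memcpy
#include <tchar.h>		// _tmain, _TCHAR (the Visual Studio console template's stdafx.h usually includes these)
//#include <intrin.h>
//#include <nmmintrin.h>
//#include <windows.h>
//#include <utility>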
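// Release/Debug detection: in Release builds GetRetAddress() is inlined into
// IsReleaseMode(), so both _ReturnAddress() calls yield IsReleaseMode's return
// address; in Debug builds it is a real call and the two addresses differ.
__declspec(noinline) void* GetCurrentAddress()
{
	return _ReturnAddress();
}
inline void* GetRetAddress()
{
	return _ReturnAddress();
}
__declspec(noinline) bool IsReleaseMode()
{
	return _ReturnAddress() == GetRetAddress();
}
bool g_IsReleaseMode = IsReleaseMode();
// #pragma runtime_checks( "scu", restore )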
//#define pCopyFunc(x) copy<x>
#define pCopyFunc(x) ZMemoryCopy::___copy_##x
void(*const copys[129])(char* dest, const char* src) =
{ZMemoryCopy::___copy_0,pCopyFunc(1), pCopyFunc(2), pCopyFunc(3), pCopyFunc(4), pCopyFunc(5), pCopyFunc(6), pCopyFunc(7), pCopyFunc(8), pCopyFunc(9), pCopyFunc(10),pCopyFunc(11), pCopyFunc(12), pCopyFunc(13), pCopyFunc(14), pCopyFunc(15), pCopyFunc(16), pCopyFunc(17), pCopyFunc(18), pCopyFunc(19), pCopyFunc(20),pCopyFunc(21), pCopyFunc(22), pCopyFunc(23), pCopyFunc(24), pCopyFunc(25), pCopyFunc(26), pCopyFunc(27), pCopyFunc(28), pCopyFunc(29), pCopyFunc(30),pCopyFunc(31), pCopyFunc(32), pCopyFunc(33), pCopyFunc(34), pCopyFunc(35), pCopyFunc(36), pCopyFunc(37), pCopyFunc(38), pCopyFunc(39), pCopyFunc(40),pCopyFunc(41), pCopyFunc(42), pCopyFunc(43), pCopyFunc(44), pCopyFunc(45), pCopyFunc(46), pCopyFunc(47), pCopyFunc(48), pCopyFunc(49), pCopyFunc(50),pCopyFunc(51), pCopyFunc(52), pCopyFunc(53), pCopyFunc(54), pCopyFunc(55), pCopyFunc(56), pCopyFunc(57), pCopyFunc(58), pCopyFunc(59), pCopyFunc(60),pCopyFunc(61), pCopyFunc(62), pCopyFunc(63), pCopyFunc(64), pCopyFunc(65), pCopyFunc(66), pCopyFunc(67), pCopyFunc(68), pCopyFunc(69), pCopyFunc(70),pCopyFunc(71), pCopyFunc(72), pCopyFunc(73), pCopyFunc(74), pCopyFunc(75), pCopyFunc(76), pCopyFunc(77), pCopyFunc(78), pCopyFunc(79), pCopyFunc(80),pCopyFunc(81), pCopyFunc(82), pCopyFunc(83), pCopyFunc(84), pCopyFunc(85), pCopyFunc(86), pCopyFunc(87), pCopyFunc(88), pCopyFunc(89), pCopyFunc(90),pCopyFunc(91), pCopyFunc(92), pCopyFunc(93), pCopyFunc(94), pCopyFunc(95), pCopyFunc(96), pCopyFunc(97), pCopyFunc(98), pCopyFunc(99), pCopyFunc(100),pCopyFunc(101), pCopyFunc(102), pCopyFunc(103), pCopyFunc(104), pCopyFunc(105), pCopyFunc(106), pCopyFunc(107), pCopyFunc(108), pCopyFunc(109), pCopyFunc(110),pCopyFunc(111), pCopyFunc(112), pCopyFunc(113), pCopyFunc(114), pCopyFunc(115), pCopyFunc(116), pCopyFunc(117), pCopyFunc(118), pCopyFunc(119), pCopyFunc(120),pCopyFunc(121), pCopyFunc(122), pCopyFunc(123), pCopyFunc(124), pCopyFunc(125), pCopyFunc(126), pCopyFunc(127), pCopyFunc(128),
};
#undef pCopyFunc

#define pCopyFunc(x) ZMemoryCopy::copy<x>
void(*const template_copys[129])(char* dest, const char* src) =
{ZMemoryCopy::___copy_0,pCopyFunc(1), pCopyFunc(2), pCopyFunc(3), pCopyFunc(4), pCopyFunc(5), pCopyFunc(6), pCopyFunc(7), pCopyFunc(8), pCopyFunc(9), pCopyFunc(10),pCopyFunc(11), pCopyFunc(12), pCopyFunc(13), pCopyFunc(14), pCopyFunc(15), pCopyFunc(16), pCopyFunc(17), pCopyFunc(18), pCopyFunc(19), pCopyFunc(20),pCopyFunc(21), pCopyFunc(22), pCopyFunc(23), pCopyFunc(24), pCopyFunc(25), pCopyFunc(26), pCopyFunc(27), pCopyFunc(28), pCopyFunc(29), pCopyFunc(30),pCopyFunc(31), pCopyFunc(32), pCopyFunc(33), pCopyFunc(34), pCopyFunc(35), pCopyFunc(36), pCopyFunc(37), pCopyFunc(38), pCopyFunc(39), pCopyFunc(40),pCopyFunc(41), pCopyFunc(42), pCopyFunc(43), pCopyFunc(44), pCopyFunc(45), pCopyFunc(46), pCopyFunc(47), pCopyFunc(48), pCopyFunc(49), pCopyFunc(50),pCopyFunc(51), pCopyFunc(52), pCopyFunc(53), pCopyFunc(54), pCopyFunc(55), pCopyFunc(56), pCopyFunc(57), pCopyFunc(58), pCopyFunc(59), pCopyFunc(60),pCopyFunc(61), pCopyFunc(62), pCopyFunc(63), pCopyFunc(64), pCopyFunc(65), pCopyFunc(66), pCopyFunc(67), pCopyFunc(68), pCopyFunc(69), pCopyFunc(70),pCopyFunc(71), pCopyFunc(72), pCopyFunc(73), pCopyFunc(74), pCopyFunc(75), pCopyFunc(76), pCopyFunc(77), pCopyFunc(78), pCopyFunc(79), pCopyFunc(80),pCopyFunc(81), pCopyFunc(82), pCopyFunc(83), pCopyFunc(84), pCopyFunc(85), pCopyFunc(86), pCopyFunc(87), pCopyFunc(88), pCopyFunc(89), pCopyFunc(90),pCopyFunc(91), pCopyFunc(92), pCopyFunc(93), pCopyFunc(94), pCopyFunc(95), pCopyFunc(96), pCopyFunc(97), pCopyFunc(98), pCopyFunc(99), pCopyFunc(100),pCopyFunc(101), pCopyFunc(102), pCopyFunc(103), pCopyFunc(104), pCopyFunc(105), pCopyFunc(106), pCopyFunc(107), pCopyFunc(108), pCopyFunc(109), pCopyFunc(110),pCopyFunc(111), pCopyFunc(112), pCopyFunc(113), pCopyFunc(114), pCopyFunc(115), pCopyFunc(116), pCopyFunc(117), pCopyFunc(118), pCopyFunc(119), pCopyFunc(120),pCopyFunc(121), pCopyFunc(122), pCopyFunc(123), pCopyFunc(124), pCopyFunc(125), pCopyFunc(126), pCopyFunc(127), pCopyFunc(128),
};
#undef pCopyFunc

#define pCopyFunc(x) ZMemoryCopy::___copy_##x
static void(*const static_copys[129])(char* dest, const char* src) =
{ZMemoryCopy::___copy_0,pCopyFunc(1), pCopyFunc(2), pCopyFunc(3), pCopyFunc(4), pCopyFunc(5), pCopyFunc(6), pCopyFunc(7), pCopyFunc(8), pCopyFunc(9), pCopyFunc(10),pCopyFunc(11), pCopyFunc(12), pCopyFunc(13), pCopyFunc(14), pCopyFunc(15), pCopyFunc(16), pCopyFunc(17), pCopyFunc(18), pCopyFunc(19), pCopyFunc(20),pCopyFunc(21), pCopyFunc(22), pCopyFunc(23), pCopyFunc(24), pCopyFunc(25), pCopyFunc(26), pCopyFunc(27), pCopyFunc(28), pCopyFunc(29), pCopyFunc(30),pCopyFunc(31), pCopyFunc(32), pCopyFunc(33), pCopyFunc(34), pCopyFunc(35), pCopyFunc(36), pCopyFunc(37), pCopyFunc(38), pCopyFunc(39), pCopyFunc(40),pCopyFunc(41), pCopyFunc(42), pCopyFunc(43), pCopyFunc(44), pCopyFunc(45), pCopyFunc(46), pCopyFunc(47), pCopyFunc(48), pCopyFunc(49), pCopyFunc(50),pCopyFunc(51), pCopyFunc(52), pCopyFunc(53), pCopyFunc(54), pCopyFunc(55), pCopyFunc(56), pCopyFunc(57), pCopyFunc(58), pCopyFunc(59), pCopyFunc(60),pCopyFunc(61), pCopyFunc(62), pCopyFunc(63), pCopyFunc(64), pCopyFunc(65), pCopyFunc(66), pCopyFunc(67), pCopyFunc(68), pCopyFunc(69), pCopyFunc(70),pCopyFunc(71), pCopyFunc(72), pCopyFunc(73), pCopyFunc(74), pCopyFunc(75), pCopyFunc(76), pCopyFunc(77), pCopyFunc(78), pCopyFunc(79), pCopyFunc(80),pCopyFunc(81), pCopyFunc(82), pCopyFunc(83), pCopyFunc(84), pCopyFunc(85), pCopyFunc(86), pCopyFunc(87), pCopyFunc(88), pCopyFunc(89), pCopyFunc(90),pCopyFunc(91), pCopyFunc(92), pCopyFunc(93), pCopyFunc(94), pCopyFunc(95), pCopyFunc(96), pCopyFunc(97), pCopyFunc(98), pCopyFunc(99), pCopyFunc(100),pCopyFunc(101), pCopyFunc(102), pCopyFunc(103), pCopyFunc(104), pCopyFunc(105), pCopyFunc(106), pCopyFunc(107), pCopyFunc(108), pCopyFunc(109), pCopyFunc(110),pCopyFunc(111), pCopyFunc(112), pCopyFunc(113), pCopyFunc(114), pCopyFunc(115), pCopyFunc(116), pCopyFunc(117), pCopyFunc(118), pCopyFunc(119), pCopyFunc(120),pCopyFunc(121), pCopyFunc(122), pCopyFunc(123), pCopyFunc(124), pCopyFunc(125), pCopyFunc(126), pCopyFunc(127), pCopyFunc(128),
};
#undef pCopyFunc

char dest[32000000];
char dest2[32000000];
const char pSource_[32000000] = "1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890"
"123456789012345678901234567-901234567890123456789012345678901234567890123456789012345678901234567890"
"1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890"
"1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890""1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890"
"123456789012345678901234567-901234567890123456789012345678901234567890123456789012345678901234567890"
"1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890"
"1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890""1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890"
"123456789012345678901234567-901234567890123456789012345678901234567890123456789012345678901234567890"
"1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890"
"1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890""1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890"
"123456789012345678901234567-901234567890123456789012345678901234567890123456789012345678901234567890"
"1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890"
"1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890""1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890"
"123456789012345678901234567-901234567890123456789012345678901234567890123456789012345678901234567890"
"1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890"
"1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890""1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890"
"123456789012345678901234567-901234567890123456789012345678901234567890123456789012345678901234567890"
"1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890"
"1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890""1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890"
"123456789012345678901234567-901234567890123456789012345678901234567890123456789012345678901234567890"
"1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890"
"1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890""1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890"
"123456789012345678901234567-901234567890123456789012345678901234567890123456789012345678901234567890"
"1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890"
"1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890""1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890"
"123456789012345678901234567-901234567890123456789012345678901234567890123456789012345678901234567890"
"1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890"
"1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890""1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890"
"123456789012345678901234567-901234567890123456789012345678901234567890123456789012345678901234567890"
"1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890"
"1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890""abcde67890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890";
const char * volatile pSource = pSource_;

// __declspec(noinline) void __fastcall donothing(int v)
// {
//  	__asm nop;
// }
// int _______reserved = (donothing((const volatile int&)(const int&)1), donothing((const volatile int&)(const int&)2), 1);
// __declspec(noinline) void __fastcall donothing_(int v)
// {
// 	donothing(v);
// }
// 
// __declspec(noinline) bool testConstFunction()
// {
// 	unsigned char* ptr;
// 	if (*(unsigned short*)((ptr = (unsigned char*)_ReturnAddress())- 15) == *(unsigned short*)"\xC7\x05")
// 	{
// 		DWORD p;
// 		VirtualProtect(ptr - 15, 15, PAGE_EXECUTE_READWRITE, &p);
// 		p = *(int*)(ptr - 15 + 2 + 4);
// 		memcpy(ptr - 15, "\xB8\x90\x90\x90\x90\xB8\x01\x00\x00\x00\x90\x90\x90\x90\x90", 15);	//mov eax, 1
// 		*(int*)(ptr - 14) = p;
// 		return true;
// 	}
// 	else
// 	{
// 		DWORD p;
// 		VirtualProtect(ptr - 10, 10, PAGE_EXECUTE_READWRITE, &p);
// 		memcpy(ptr - 10, "\xB8\x00\x00\x00\x00\xB8\x00\x00\x00\x00", 10);	//mov eax, 0
// 		return false;
// 	}
// }
// 
// static int g_somewhere;
// #define testConst(v) (_mm_pause(), g_somewhere = v, testConstFunction())

void test();

#define constsize 63					// number of bytes copied per call
#if 0							// 1: uncached test, keep changing the addresses so the cache is defeated
#define nocache + (i & 0xFFF) * 4096
#else							// 0: cached test, source and destination always stay in the cache
#define nocache
#endif

#if constsize > 128
#define copysize constsize
namespace ZMemoryCopy
{
#include "zmemcpyinc.h"
}
#endif

#pragma runtime_checks( "s", restore )
int _tmain(int argc, _TCHAR* argv[])
{
	if (g_IsReleaseMode)
		puts("Release Mode");
	else
		puts("Debug Mode");

	// Correctness checks: the first i bytes must match the source and the
	// byte right after them must still be zero.
	char test[2000];
	for (int i = 1; i <= 128; ++i)
	{
		memset(test, 0, 200);
		template_copys[i](test, pSource);
		if (memcmp(test, pSource, i) != 0)
			__debugbreak();
		if (test[i] != (char)0)
			__debugbreak();
	}
	for (int i = 1; i <= 499; ++i)
	{
		memset(test, 0, 500);
		zmemcpy(test, pSource, i);
		if (memcmp(test, pSource, i) != 0)
			__debugbreak();
		if (test[i] != (char)0)
			__debugbreak();
	}
	for (int i = 1; i <= 128; ++i)
	{
		memset(test, 0, 200);
		static_copys[i](test, pSource);
		if (memcmp(test, pSource, i) != 0)
			__debugbreak();
		if (test[i] != (char)0)
			__debugbreak();
	}

	memset(dest, 0, sizeof(dest));
	memset(dest2, 0, sizeof(dest2));
	memcpy(dest2, pSource, sizeof(dest2));
	int volatile unknownSize = constsize;	// volatile: the compiler cannot treat the size as a constant

	// Each timing loop performs 10,000,000 copies; the "%dM" in the labels
	// prints 100000000 / 1000000 = 100, so the labels say 100M.
	for (int j = 0; j < 4; ++j)
	{
		{
			__int64 t = __rdtsc();
			for (int i = 0; i < 10000000; ++i)
				ZMemoryCopy::copy<constsize>(dest nocache/*+ 1*/, pSource nocache /*+ 1*/);
			t = __rdtsc() - t;
			printf("All time to memcpy %d * %dM is %0.3fs in 3GHz (zmemcopy template const size)\n", constsize, 100000000 / 1000000, t / 3000000000.0);
		}
		{
			__int64 t = __rdtsc();
			for (int i = 0; i < 10000000; ++i)
				_COMBINE(ZMemoryCopy::___copy_, constsize)(dest nocache/*+ 1*/, pSource nocache /*+ 1*/);
			t = __rdtsc() - t;
			printf("All time to memcpy %d * %dM is %0.3fs in 3GHz (zmemcopy static  const size)\n", constsize, 100000000 / 1000000, t / 3000000000.0);
		}
		{
			__int64 t = __rdtsc();
			for (int i = 0; i < 10000000; ++i)
				memcpy(dest nocache/*+ 1*/, pSource nocache/*+ 1*/, constsize);
			t = __rdtsc() - t;
			printf("All time to memcpy %d * %dM is %0.3fs in 3GHz (memcpy const size)\n", constsize, 100000000 / 1000000, t / 3000000000.0);
		}
		{
			__int64 t = __rdtsc();
			for (int i = 0; i < 10000000; ++i)
				zmemcpy(dest nocache/*+ 1*/, pSource nocache /*+ 1*/, constsize);
			t = __rdtsc() - t;
			printf("All time to memcpy %d * %dM is %0.3fs in 3GHz (zmemcopy const size)\n", constsize, 100000000 / 1000000, t / 3000000000.0);
		}
		if (unknownSize < 128)
		{
			__int64 t = __rdtsc();
			for (int i = 0; i < 10000000; ++i)
				copys[unknownSize](dest nocache/*+ 1*/, pSource nocache /*+ 1*/);
			t = __rdtsc() - t;
			printf("All time to memcpy %d * %dM is %0.3fs in 3GHz (zmemcopy unknown array direct)\n", constsize, 100000000 / 1000000, t / 3000000000.0);
		}
		if (unknownSize < 128)
		{
			__int64 t = __rdtsc();
			for (int i = 0; i < 10000000; ++i)
				zmemcpy_max128(dest nocache/*+ 1*/, pSource nocache /*+ 1*/, unknownSize);
			t = __rdtsc() - t;
			printf("All time to memcpy %d * %dM is %0.3fs in 3GHz (zmemcopy unknown small size)\n", constsize, 100000000 / 1000000, t / 3000000000.0);
		}
		{
			__int64 t = __rdtsc();
			for (int i = 0; i < 10000000; ++i)
				zmemcpy(dest nocache/*+ 1*/, pSource nocache /*+ 1*/, unknownSize);
			t = __rdtsc() - t;
			printf("All time to memcpy %d * %dM is %0.3fs in 3GHz (zmemcopy unknown size)\n", constsize, 100000000 / 1000000, t / 3000000000.0);
		}
		{
			__int64 t = __rdtsc();
			for (int i = 0; i < 10000000; ++i)
				memcpy(dest nocache/*+ 1*/, pSource nocache /*+ 1*/, unknownSize);
			t = __rdtsc() - t;
			printf("All time to memcpy %d * %dM is %0.3fs in 3GHz (memcpy unknown size)\n", constsize, 100000000 / 1000000, t / 3000000000.0);
		}
		puts("");
	}
	return 0;
}
#pragma runtime_checks( "s", restore )
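The header's copy<copysize> template splits a constant size into 128-byte groups, one move for each of the 64/32/16/8 bits that is set, and a 1 to 7 byte tail. For the tail it uses a single 8-byte move shifted back so that it ends exactly at copysize when 5 to 7 bytes remain, and a 4-byte move shifted back by one byte when 3 remain (and copysize >= 4). The following is a portable C++17 sketch of that decomposition, assuming if constexpr in place of __if_exists and fixed-length std::memcpy in place of the SSE __asm; copy_const, copies_correctly and all_sizes_ok are hypothetical names.

```cpp
#include <cstddef>
#include <cstring>
#include <utility>

// Copy exactly N bytes, where N is known at compile time.
template <std::size_t N>
inline void copy_const(char* dest, const char* src)
{
    std::size_t off = 0;
    // Whole 128-byte groups (the header unrolls these with movdqu xmm0..xmm7).
    for (std::size_t g = 0; g < N / 128; ++g, off += 128)
        std::memcpy(dest + off, src + off, 128);
    // One move per set bit: 64, 32, 16 and 8 bytes.
    if constexpr ((N & 64) != 0) { std::memcpy(dest + off, src + off, 64); off += 64; }
    if constexpr ((N & 32) != 0) { std::memcpy(dest + off, src + off, 32); off += 32; }
    if constexpr ((N & 16) != 0) { std::memcpy(dest + off, src + off, 16); off += 16; }
    if constexpr ((N & 8) != 0)  { std::memcpy(dest + off, src + off, 8);  off += 8; }

    constexpr std::size_t tail = N & 7;
    if constexpr (tail > 4 && N >= 8) {
        // 5..7 bytes: one 8-byte move ending exactly at N; it rewrites a few
        // bytes that were already copied, which is harmless.
        std::memcpy(dest + N - 8, src + N - 8, 8);
    } else {
        if constexpr (tail >= 4) { std::memcpy(dest + off, src + off, 4); off += 4; }
        if constexpr ((tail & 3) == 3 && N >= 4) {
            // 3 bytes left: one 4-byte move ending at N (overlaps by 1 byte).
            std::memcpy(dest + N - 4, src + N - 4, 4);
        } else if constexpr ((tail & 3) == 3) {
            std::memcpy(dest + off, src + off, 2);
            std::memcpy(dest + off + 2, src + off + 2, 1);
        } else if constexpr ((tail & 3) == 2) {
            std::memcpy(dest + off, src + off, 2);
        } else if constexpr ((tail & 3) == 1) {
            std::memcpy(dest + off, src + off, 1);
        }
    }
}

// Self-check: copy N bytes and confirm the byte just past the range is untouched.
template <std::size_t N>
bool copies_correctly()
{
    char src[N + 1];
    char dst[N + 1];
    for (std::size_t i = 0; i <= N; ++i) {
        src[i] = static_cast<char>('a' + i % 26);
        dst[i] = 0;
    }
    copy_const<N>(dst, src);
    return std::memcmp(dst, src, N) == 0 && dst[N] == 0;
}

// Run the self-check for every size in the sequence.
template <std::size_t... Ns>
bool all_sizes_ok(std::index_sequence<Ns...>)
{
    return (copies_correctly<Ns>() && ...);
}
```

The overlapping moves are what let every tail length be handled with at most two moves and no byte loop.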


zmemcpy.h:

#pragma once
#include <windows.h>
#include <intrin.h>

#ifndef _SAFEBUFFERS
#if _MSC_VER >= 1600
#define _SAFEBUFFERS __declspec(safebuffers)
#else
#define _SAFEBUFFERS
#endif
#endif

namespace z
{
#ifndef _Z_IF_DEFINED
#define _Z_IF_DEFINED
	template<bool v>struct If{enum{ True = 1 };};
	template<>struct If < false >{enum{ False = 1 };};
	// Force z::If<false> and z::If<true> to be instantiated
	// so that __if_exists can see their members
#if _MSC_VER >= 1600
	static_assert(z::If<false>::False, "");
	static_assert(z::If<true>::True, "");
#else
	enum{ ___unknown = z::If<true>::True + z::If<false>::False };
#endif
#endif //_Z_IF_DEFINED

#pragma runtime_checks( "s", off)
	// #pragma runtime_checks only takes effect on templates when it is turned off at the end
	// of the .cpp file, so the machine code is patched by hand at run time here to remove
	// the stack-check code, which speeds up the function in Debug mode.
	// Note: in MSVC __asm blocks ';' starts a comment, so each instruction needs its own line.
	inline __declspec(noinline) _SAFEBUFFERS void RemoveCodeOf_InitESPBuffer()
	{
		__asm pushad;
		{
			unsigned char* ptr;
			// mov eax, 0CCCCCCCCh found 12 bytes before the return address:
			// overwrite the 17 bytes with a jmp to the return address
			if (*(unsigned int*)((ptr = (unsigned char*)_ReturnAddress() - 17) + 5) == *(unsigned int*)"\xB8\xCC\xCC\xCC")
			{
				DWORD p;
				VirtualProtect(ptr, 17, PAGE_EXECUTE_READWRITE, &p);
				//memset(ptr, 0x90, 17);
				memcpy(ptr, "\xE9\x0C\x00\x00\x00\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90", 17);
			}
			// otherwise NOP out the 5-byte call (E8 rel32) that led here
			else if (*(unsigned char*)(ptr = (unsigned char*)_ReturnAddress() - 5) == 0xE8u)
			{
				DWORD p;
				VirtualProtect(ptr, 5, PAGE_EXECUTE_READWRITE, &p);
				memset(ptr, 0x90, 5);
			}
		}
		__asm popad;
	}

	inline __declspec(noinline) _SAFEBUFFERS void RemoveCodeOf_CheckESP()
	{
		unsigned char* ptr;
		if (*(unsigned char*)(ptr = (unsigned char*)_ReturnAddress() - 5) == 0xE8u)
		{
			DWORD p;
			VirtualProtect(ptr, 22, PAGE_EXECUTE_READWRITE, &p);
			memset(ptr, 0x90, 5);
			if (*(unsigned int*)(ptr + 5 + 1 + 3 + 5) == *(unsigned int*)"\x00\x3B\xEC\xE8")
			{
				// replace "cmp ebp, esp; call __RTC_CheckEsp" with "mov esp, ebp; pop ebp; ret"
				memset(ptr + 5 + 1 + 3 + 5 + 1, 0x90, 7);
				memcpy(ptr + 5 + 1 + 3 + 5 + 1, "\x8B\xE5\x5D\xC3", 4);
				//   memcpy(ptr + 5, "\x5F\x5E\x5B\x81\xC4\xC0\x00\x00\x00\x8B\xE5\x5D\xC3", 13);
			}
			//3B EC cmp         ebp, esp
			//E8 xxxxxxxx call  __RTC_CheckEsp
		}
		else
		{
			__debugbreak();
		}
	}
#pragma runtime_checks( "s", restore )
}

namespace ZMemoryCopy
{
#pragma runtime_checks( "s", off)
	// Copy in groups of 128 bytes
	inline __declspec(noinline) _SAFEBUFFERS void __copy_group(char* dest, const char* src, int size)
	{
		__asm
		{
			mov esi, dword ptr[src];
			mov edi, dword ptr[dest];
		}
		while ((size -= 0x80) >= 0)
		{
			__asm
			{
				movdqu xmm0, xmmword ptr[esi + 0x00];
				movdqu xmm1, xmmword ptr[esi + 0x10];
				movdqu xmm2, xmmword ptr[esi + 0x20];
				movdqu xmm3, xmmword ptr[esi + 0x30];
				movdqu xmm4, xmmword ptr[esi + 0x40];
				movdqu xmm5, xmmword ptr[esi + 0x50];
				movdqu xmm6, xmmword ptr[esi + 0x60];
				movdqu xmm7, xmmword ptr[esi + 0x70];
				prefetchnta[esi + 0x80];
				prefetchnta[esi + 0xC0];
				movdqu xmmword ptr[edi + 0x00], xmm0;
				movdqu xmmword ptr[edi + 0x10], xmm1;
				movdqu xmmword ptr[edi + 0x20], xmm2;
				movdqu xmmword ptr[edi + 0x30], xmm3;
				movdqu xmmword ptr[edi + 0x40], xmm4;
				movdqu xmmword ptr[edi + 0x50], xmm5;
				movdqu xmmword ptr[edi + 0x60], xmm6;
				movdqu xmmword ptr[edi + 0x70], xmm7;
				add esi, 0x80;
				add edi, 0x80;
			}
		}
	}

	// If the block size is known (and is a constant expression), this version can be used directly
	template<int copysize>
	static _SAFEBUFFERS __forceinline void copy(char* dest, const char* src)
	{
		// #pragma runtime_checks only takes effect on templates when it is turned off at the end
		// of the .cpp file, so the machine code is patched by hand at run time here to remove
		// the stack-check code, which speeds up the function in Debug mode.
		z::RemoveCodeOf_InitESPBuffer();
		__if_exists(z::If<(copysize >= 4000)>::True)
		{
			memcpy(dest, src, copysize);
		}
		__if_exists(z::If<(copysize >= 4000)>::False)
		{
			__asm
			{
				mov esi, dword ptr[src];
				mov edi, dword ptr[dest];
			}
			__if_exists(z::If<(copysize >= 0x80 * 3)>::True)
			{
				__asm
				{
					prefetchnta[esi + 0x40];
				}
				int vsize = copysize;
				while ((vsize -= 0x80) >= 0x80)
			}
			//
			// __if_exists does not open a scope (the offset enums below rely on this), so for
			// copysize >= 0x180 the __asm block below becomes the body of the while loop above;
			// for 0x100..0x17F it runs exactly once.
			__if_exists(z::If<(copysize >= 0x80 * 2)>::True)
			{
				__asm
				{
					movdqu xmm0, xmmword ptr[esi + 0x00];
					movdqu xmm1, xmmword ptr[esi + 0x10];
					movdqu xmm2, xmmword ptr[esi + 0x20];
					movdqu xmm3, xmmword ptr[esi + 0x30];
					movdqu xmm4, xmmword ptr[esi + 0x40];
					movdqu xmm5, xmmword ptr[esi + 0x50];
					movdqu xmm6, xmmword ptr[esi + 0x60];
					movdqu xmm7, xmmword ptr[esi + 0x70];
					prefetchnta[esi + 0x80];
					prefetchnta[esi + 0xC0];
					movdqu xmmword ptr[edi + 0x00], xmm0;
					movdqu xmmword ptr[edi + 0x10], xmm1;
					movdqu xmmword ptr[edi + 0x20], xmm2;
					movdqu xmmword ptr[edi + 0x30], xmm3;
					movdqu xmmword ptr[edi + 0x40], xmm4;
					movdqu xmmword ptr[edi + 0x50], xmm5;
					movdqu xmmword ptr[edi + 0x60], xmm6;
					movdqu xmmword ptr[edi + 0x70], xmm7;
					add esi, 0x80;
					add edi, 0x80;
				}
			}
			enum { offset1 = 0 };
			//
			__if_exists(z::If<(copysize >= 0x80)>::True)
			{
				__asm
				{
					movdqu xmm0, xmmword ptr[esi + 0x00];
					movdqu xmm1, xmmword ptr[esi + 0x10];
					movdqu xmm2, xmmword ptr[esi + 0x20];
					movdqu xmm3, xmmword ptr[esi + 0x30];
					movdqu xmm4, xmmword ptr[esi + 0x40];
					movdqu xmm5, xmmword ptr[esi + 0x50];
					movdqu xmm6, xmmword ptr[esi + 0x60];
					movdqu xmm7, xmmword ptr[esi + 0x70];
				}
				__if_exists(z::If<(copysize & 0x60)>::True)
				{
					__asm
					{
						prefetchnta[esi + 0x80];
					}
				}
				__asm
				{
					movdqu xmmword ptr[edi + 0x00], xmm0;
					movdqu xmmword ptr[edi + 0x10], xmm1;
					movdqu xmmword ptr[edi + 0x20], xmm2;
					movdqu xmmword ptr[edi + 0x30], xmm3;
					movdqu xmmword ptr[edi + 0x40], xmm4;
					movdqu xmmword ptr[edi + 0x50], xmm5;
					movdqu xmmword ptr[edi + 0x60], xmm6;
					movdqu xmmword ptr[edi + 0x70], xmm7;
					//    add esi, 0x80;
					//    add edi, 0x80;
				}
				enum { offset2 = 0x80 };
			}
			__if_exists(z::If<(copysize >= 0x80)>::False)
			{
				enum { offset2 = 0 };
			}
			//
			__if_exists(z::If<(copysize & 0x40)>::True)
			{
				__asm
				{
					movdqu xmm0, xmmword ptr[esi + offset2 + 0x00];
					movdqu xmm1, xmmword ptr[esi + offset2 + 0x10];
					movdqu xmm2, xmmword ptr[esi + offset2 + 0x20];
					movdqu xmm3, xmmword ptr[esi + offset2 + 0x30];
					movdqu xmmword ptr[edi + offset2 + 0x00], xmm0;
					movdqu xmmword ptr[edi + offset2 + 0x10], xmm1;
					movdqu xmmword ptr[edi + offset2 + 0x20], xmm2;
					movdqu xmmword ptr[edi + offset2 + 0x30], xmm3;
				}
				enum { offset3 = offset2 + 0x40 };
			}
			__if_exists(z::If<(copysize & 0x40)>::False)
			{
				enum { offset3 = offset2 };
			}
			//
			__if_exists(z::If<(copysize & 0x20)>::True)
			{
				__asm
				{
					movdqu xmm4, xmmword ptr[esi + offset3 + 0x00];
					movdqu xmm5, xmmword ptr[esi + offset3 + 0x10];
					movdqu xmmword ptr[edi + offset3 + 0x00], xmm4;
					movdqu xmmword ptr[edi + offset3 + 0x10], xmm5;
				}
				enum { offset4 = offset3 + 0x20 };
			}
			__if_exists(z::If<(copysize & 0x20)>::False)
			{
				enum { offset4 = offset3 };
			}
			//
			__if_exists(z::If<(copysize & 0x10)>::True)
			{
				__asm
				{
					movdqu xmm6, xmmword ptr[esi + offset4 + 0x00];
					movdqu xmmword ptr[edi + offset4 + 0x00], xmm6;
				}
				enum { offset5 = offset4 + 0x10 };
			}
			__if_exists(z::If<(copysize & 0x10)>::False)
			{
				enum { offset5 = offset4 };
			}
			//
			__if_exists(z::If<(copysize & 0x8)>::True)
			{
				__asm
				{
					movlpd xmm7, qword ptr[esi + offset5];
					movlpd qword ptr[edi + offset5], xmm7;
				}
				enum { offset6 = offset5 + 0x8 };
			}
			__if_exists(z::If<(copysize & 0x8)>::False)
			{
				enum { offset6 = offset5 };
			}
			//
			__if_exists(z::If<(copysize & 0x7)>::True)
			{
				enum { copydone = false };
				{
					__if_exists(z::If < ((copysize & 0x7) > 4) && copysize >= 8 > ::True) // 5 6 7: one 8-byte move shifted back to end at copysize
					{
						enum{ copy_offset = (copysize & 0x7) - 8 };
						__asm
						{
							movlpd xmm0, qword ptr[esi + offset6 + copy_offset];
							movlpd qword ptr[edi + offset6 + copy_offset], xmm0;
						}
						enum { copydone = true };//return;
					}
					__if_exists(z::If <!copydone && ((copysize & 0x7) >= 4)>::True) // 4 5 6 7: buffer too short for the 8-byte move, copy 4 bytes first
					{
						__asm
						{
							mov eax, dword ptr[esi + offset6];
							mov dword ptr[edi + offset6], eax;
						}
						enum{ offset6 = offset6 + 4 };
					}
					__if_exists(z::If <!copydone && ((copysize & 0x3) == 3) && (copysize >= 4)> ::True) //3
					{
						__asm
						{
							mov eax, dword ptr[esi + offset6 - 1];
							mov dword ptr[edi + offset6 - 1], eax;
						}
						enum { copydone = true };//return;
					}
					__if_exists(z::If <!copydone && ((copysize & 0x3) == 3)> ::True) //3
					{
						__asm
						{
							mov ax, word ptr[esi + offset6];
							mov word ptr[edi + offset6], ax;
							mov al, byte ptr[esi + offset6 + 2];
							mov byte ptr[edi + offset6 + 2], al;
						}
						enum { copydone = true };//return;
					}
					__if_exists(z::If <!copydone && ((copysize & 0x3) == 2) > ::True) //2
					{
						__asm
						{
							mov ax, word ptr[esi + offset6];
							mov word ptr[edi + offset6], ax;
						}
						enum { copydone = true };//return;
					}
					__if_exists(z::If <!copydone && ((copysize & 0x3) == 1) > ::True) //1
					{
						__asm
						{
							mov al, byte ptr[esi + offset6];
							mov byte ptr[edi + offset6], al;
						}
						enum { copydone = true };//return;
					}
					__if_exists(z::If <!copydone && ((copysize & 0x3) == 0) > ::True) //0
					{
						enum { copydone = true };//return;
					}
					__if_exists(z::If<!copydone>::True)
					{
						static_assert(0, "");
					}
				}
			}
		}
		z::RemoveCodeOf_CheckESP();
		__asm nop;
	}

	inline void ___copy_0(char* dest, const char* src) {}
#pragma runtime_checks( "s", restore)

#define copysize 1
#include "zmemcpyinc.h"
#define copysize 2
#include "zmemcpyinc.h"
#define copysize 3
#include "zmemcpyinc.h"
#define copysize 4
#include "zmemcpyinc.h"
#define copysize 5
#include "zmemcpyinc.h"
#define copysize 6
#include "zmemcpyinc.h"
#define copysize 7
#include "zmemcpyinc.h"
#define copysize 8
#include "zmemcpyinc.h"
#define copysize 9
#include "zmemcpyinc.h"
#define copysize 10
#include "zmemcpyinc.h"
#define copysize 11
#include "zmemcpyinc.h"
#define copysize 12
#include "zmemcpyinc.h"
#define copysize 13
#include "zmemcpyinc.h"
#define copysize 14
#include "zmemcpyinc.h"
#define copysize 15
#include "zmemcpyinc.h"
#define copysize 16
#include "zmemcpyinc.h"
#define copysize 17
#include "zmemcpyinc.h"
#define copysize 18
#include "zmemcpyinc.h"
#define copysize 19
#include "zmemcpyinc.h"
#define copysize 20
#include "zmemcpyinc.h"
#define copysize 21
#include "zmemcpyinc.h"
#define copysize 22
#include "zmemcpyinc.h"
#define copysize 23
#include "zmemcpyinc.h"
#define copysize 24
#include "zmemcpyinc.h"
#define copysize 25
#include "zmemcpyinc.h"
#define copysize 26
#include "zmemcpyinc.h"
#define copysize 27
#include "zmemcpyinc.h"
#define copysize 28
#include "zmemcpyinc.h"
#define copysize 29
#include "zmemcpyinc.h"
#define copysize 30
#include "zmemcpyinc.h"
#define copysize 31
#include "zmemcpyinc.h"
#define copysize 32
#include "zmemcpyinc.h"
#define copysize 33
#include "zmemcpyinc.h"
#define copysize 34
#include "zmemcpyinc.h"
#define copysize 35
#include "zmemcpyinc.h"
#define copysize 36
#include "zmemcpyinc.h"
#define copysize 37
#include "zmemcpyinc.h"
#define copysize 38
#include "zmemcpyinc.h"
#define copysize 39
#include "zmemcpyinc.h"
#define copysize 40
#include "zmemcpyinc.h"
#define copysize 41
#include "zmemcpyinc.h"
#define copysize 42
#include "zmemcpyinc.h"
#define copysize 43
#include "zmemcpyinc.h"
#define copysize 44
#include "zmemcpyinc.h"
#define copysize 45
#include "zmemcpyinc.h"
#define copysize 46
#include "zmemcpyinc.h"
#define copysize 47
#include "zmemcpyinc.h"
#define copysize 48
#include "zmemcpyinc.h"
#define copysize 49
#include "zmemcpyinc.h"
#define copysize 50
#include "zmemcpyinc.h"
#define copysize 51
#include "zmemcpyinc.h"
#define copysize 52
#include "zmemcpyinc.h"
#define copysize 53
#include "zmemcpyinc.h"
#define copysize 54
#include "zmemcpyinc.h"
#define copysize 55
#include "zmemcpyinc.h"
#define copysize 56
#include "zmemcpyinc.h"
#define copysize 57
#include "zmemcpyinc.h"
#define copysize 58
#include "zmemcpyinc.h"
#define copysize 59
#include "zmemcpyinc.h"
#define copysize 60
#include "zmemcpyinc.h"
#define copysize 61
#include "zmemcpyinc.h"
#define copysize 62
#include "zmemcpyinc.h"
#define copysize 63
#include "zmemcpyinc.h"
#define copysize 64
#include "zmemcpyinc.h"
#define copysize 65
#include "zmemcpyinc.h"
#define copysize 66
#include "zmemcpyinc.h"
#define copysize 67
#include "zmemcpyinc.h"
#define copysize 68
#include "zmemcpyinc.h"
#define copysize 69
#include "zmemcpyinc.h"
#define copysize 70
#include "zmemcpyinc.h"
#define copysize 71
#include "zmemcpyinc.h"
#define copysize 72
#include "zmemcpyinc.h"
#define copysize 73
#include "zmemcpyinc.h"
#define copysize 74
#include "zmemcpyinc.h"
#define copysize 75
#include "zmemcpyinc.h"
#define copysize 76
#include "zmemcpyinc.h"
#define copysize 77
#include "zmemcpyinc.h"
#define copysize 78
#include "zmemcpyinc.h"
#define copysize 79
#include "zmemcpyinc.h"
#define copysize 80
#include "zmemcpyinc.h"
#define copysize 81
#include "zmemcpyinc.h"
#define copysize 82
#include "zmemcpyinc.h"
#define copysize 83
#include "zmemcpyinc.h"
#define copysize 84
#include "zmemcpyinc.h"
#define copysize 85
#include "zmemcpyinc.h"
#define copysize 86
#include "zmemcpyinc.h"
#define copysize 87
#include "zmemcpyinc.h"
#define copysize 88
#include "zmemcpyinc.h"
#define copysize 89
#include "zmemcpyinc.h"
#define copysize 90
#include "zmemcpyinc.h"
#define copysize 91
#include "zmemcpyinc.h"
#define copysize 92
#include "zmemcpyinc.h"
#define copysize 93
#include "zmemcpyinc.h"
#define copysize 94
#include "zmemcpyinc.h"
#define copysize 95
#include "zmemcpyinc.h"
#define copysize 96
#include "zmemcpyinc.h"
#define copysize 97
#include "zmemcpyinc.h"
#define copysize 98
#include "zmemcpyinc.h"
#define copysize 99
#include "zmemcpyinc.h"
#define copysize 100
#include "zmemcpyinc.h"
#define copysize 101
#include "zmemcpyinc.h"
#define copysize 102
#include "zmemcpyinc.h"
#define copysize 103
#include "zmemcpyinc.h"
#define copysize 104
#include "zmemcpyinc.h"
#define copysize 105
#include "zmemcpyinc.h"
#define copysize 106
#include "zmemcpyinc.h"
#define copysize 107
#include "zmemcpyinc.h"
#define copysize 108
#include "zmemcpyinc.h"
#define copysize 109
#include "zmemcpyinc.h"
#define copysize 110
#include "zmemcpyinc.h"
#define copysize 111
#include "zmemcpyinc.h"
#define copysize 112
#include "zmemcpyinc.h"
#define copysize 113
#include "zmemcpyinc.h"
#define copysize 114
#include "zmemcpyinc.h"
#define copysize 115
#include "zmemcpyinc.h"
#define copysize 116
#include "zmemcpyinc.h"
#define copysize 117
#include "zmemcpyinc.h"
#define copysize 118
#include "zmemcpyinc.h"
#define copysize 119
#include "zmemcpyinc.h"
#define copysize 120
#include "zmemcpyinc.h"
#define copysize 121
#include "zmemcpyinc.h"
#define copysize 122
#include "zmemcpyinc.h"
#define copysize 123
#include "zmemcpyinc.h"
#define copysize 124
#include "zmemcpyinc.h"
#define copysize 125
#include "zmemcpyinc.h"
#define copysize 126
#include "zmemcpyinc.h"
#define copysize 127
#include "zmemcpyinc.h"
#define copysize 128
#include "zmemcpyinc.h"

#pragma runtime_checks( "s", off )
__forceinline void zmemcpy(char* dest, const char* src, size_t size)
{
#define pCopyFunc(x) ZMemoryCopy::___copy_##x
	static void(*const static_copys[129])(char* dest, const char* src) =
	{
		ZMemoryCopy::___copy_0,
		pCopyFunc(1), pCopyFunc(2), pCopyFunc(3), pCopyFunc(4), pCopyFunc(5), pCopyFunc(6), pCopyFunc(7), pCopyFunc(8), pCopyFunc(9), pCopyFunc(10),
		pCopyFunc(11), pCopyFunc(12), pCopyFunc(13), pCopyFunc(14), pCopyFunc(15), pCopyFunc(16), pCopyFunc(17), pCopyFunc(18), pCopyFunc(19), pCopyFunc(20),
		pCopyFunc(21), pCopyFunc(22), pCopyFunc(23), pCopyFunc(24), pCopyFunc(25), pCopyFunc(26), pCopyFunc(27), pCopyFunc(28), pCopyFunc(29), pCopyFunc(30),
		pCopyFunc(31), pCopyFunc(32), pCopyFunc(33), pCopyFunc(34), pCopyFunc(35), pCopyFunc(36), pCopyFunc(37), pCopyFunc(38), pCopyFunc(39), pCopyFunc(40),
		pCopyFunc(41), pCopyFunc(42), pCopyFunc(43), pCopyFunc(44), pCopyFunc(45), pCopyFunc(46), pCopyFunc(47), pCopyFunc(48), pCopyFunc(49), pCopyFunc(50),
		pCopyFunc(51), pCopyFunc(52), pCopyFunc(53), pCopyFunc(54), pCopyFunc(55), pCopyFunc(56), pCopyFunc(57), pCopyFunc(58), pCopyFunc(59), pCopyFunc(60),
		pCopyFunc(61), pCopyFunc(62), pCopyFunc(63), pCopyFunc(64), pCopyFunc(65), pCopyFunc(66), pCopyFunc(67), pCopyFunc(68), pCopyFunc(69), pCopyFunc(70),
		pCopyFunc(71), pCopyFunc(72), pCopyFunc(73), pCopyFunc(74), pCopyFunc(75), pCopyFunc(76), pCopyFunc(77), pCopyFunc(78), pCopyFunc(79), pCopyFunc(80),
		pCopyFunc(81), pCopyFunc(82), pCopyFunc(83), pCopyFunc(84), pCopyFunc(85), pCopyFunc(86), pCopyFunc(87), pCopyFunc(88), pCopyFunc(89), pCopyFunc(90),
		pCopyFunc(91), pCopyFunc(92), pCopyFunc(93), pCopyFunc(94), pCopyFunc(95), pCopyFunc(96), pCopyFunc(97), pCopyFunc(98), pCopyFunc(99), pCopyFunc(100),
		pCopyFunc(101), pCopyFunc(102), pCopyFunc(103), pCopyFunc(104), pCopyFunc(105), pCopyFunc(106), pCopyFunc(107), pCopyFunc(108), pCopyFunc(109), pCopyFunc(110),
		pCopyFunc(111), pCopyFunc(112), pCopyFunc(113), pCopyFunc(114), pCopyFunc(115), pCopyFunc(116), pCopyFunc(117), pCopyFunc(118), pCopyFunc(119), pCopyFunc(120),
		pCopyFunc(121), pCopyFunc(122), pCopyFunc(123), pCopyFunc(124), pCopyFunc(125), pCopyFunc(126), pCopyFunc(127), pCopyFunc(128),
	};
#undef pCopyFuncif (size >= 128)__copy_group(dest, src, size);if (size & 127)static_copys[size & 127](dest + (size &~127), src + (size &~127));}__forceinline void zmemcpy_max128(char* dest, const char* src, size_t size){
#define pCopyFunc(x) ZMemoryCopy::___copy_##xstatic void(*const static_copys[129])(char* dest, const char* src) ={ZMemoryCopy::___copy_0,pCopyFunc(1), pCopyFunc(2), pCopyFunc(3), pCopyFunc(4), pCopyFunc(5), pCopyFunc(6), pCopyFunc(7), pCopyFunc(8), pCopyFunc(9), pCopyFunc(10),pCopyFunc(11), pCopyFunc(12), pCopyFunc(13), pCopyFunc(14), pCopyFunc(15), pCopyFunc(16), pCopyFunc(17), pCopyFunc(18), pCopyFunc(19), pCopyFunc(20),pCopyFunc(21), pCopyFunc(22), pCopyFunc(23), pCopyFunc(24), pCopyFunc(25), pCopyFunc(26), pCopyFunc(27), pCopyFunc(28), pCopyFunc(29), pCopyFunc(30),pCopyFunc(31), pCopyFunc(32), pCopyFunc(33), pCopyFunc(34), pCopyFunc(35), pCopyFunc(36), pCopyFunc(37), pCopyFunc(38), pCopyFunc(39), pCopyFunc(40),pCopyFunc(41), pCopyFunc(42), pCopyFunc(43), pCopyFunc(44), pCopyFunc(45), pCopyFunc(46), pCopyFunc(47), pCopyFunc(48), pCopyFunc(49), pCopyFunc(50),pCopyFunc(51), pCopyFunc(52), pCopyFunc(53), pCopyFunc(54), pCopyFunc(55), pCopyFunc(56), pCopyFunc(57), pCopyFunc(58), pCopyFunc(59), pCopyFunc(60),pCopyFunc(61), pCopyFunc(62), pCopyFunc(63), pCopyFunc(64), pCopyFunc(65), pCopyFunc(66), pCopyFunc(67), pCopyFunc(68), pCopyFunc(69), pCopyFunc(70),pCopyFunc(71), pCopyFunc(72), pCopyFunc(73), pCopyFunc(74), pCopyFunc(75), pCopyFunc(76), pCopyFunc(77), pCopyFunc(78), pCopyFunc(79), pCopyFunc(80),pCopyFunc(81), pCopyFunc(82), pCopyFunc(83), pCopyFunc(84), pCopyFunc(85), pCopyFunc(86), pCopyFunc(87), pCopyFunc(88), pCopyFunc(89), pCopyFunc(90),pCopyFunc(91), pCopyFunc(92), pCopyFunc(93), pCopyFunc(94), pCopyFunc(95), pCopyFunc(96), pCopyFunc(97), pCopyFunc(98), pCopyFunc(99), pCopyFunc(100),pCopyFunc(101), pCopyFunc(102), pCopyFunc(103), pCopyFunc(104), pCopyFunc(105), pCopyFunc(106), pCopyFunc(107), pCopyFunc(108), pCopyFunc(109), pCopyFunc(110),pCopyFunc(111), pCopyFunc(112), pCopyFunc(113), pCopyFunc(114), pCopyFunc(115), pCopyFunc(116), pCopyFunc(117), pCopyFunc(118), pCopyFunc(119), pCopyFunc(120),pCopyFunc(121), pCopyFunc(122), pCopyFunc(123), 
pCopyFunc(124), pCopyFunc(125), pCopyFunc(126), pCopyFunc(127), pCopyFunc(128),};
#undef pCopyFunc__assume(size <= 128);static_copys[size](dest, src);}
#pragma runtime_checks( "s", restore)
}
using ZMemoryCopy::zmemcpy;
using ZMemoryCopy::zmemcpy_max128;
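The dispatch idea in zmemcpy (one fixed-size copy function per size 0..128, selected by `size & 127` after the whole 128-byte groups are handled) does not itself depend on MSVC. Below is a minimal portable C++17 sketch. It assumes a fixed-size `std::memcpy` is an acceptable stand-in for the hand-written SSE moves and builds the 129-entry table with `std::index_sequence` rather than the `pCopyFunc` list. `copy_fixed`, `make_copy_table`, `kCopyTable` and `table_memcpy` are hypothetical names. Whether this beats a plain `memcpy` call depends on the compiler and must be measured.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <utility>

// Entry N copies exactly N bytes. N is a compile-time constant inside each
// entry, so the compiler may expand std::memcpy into a few fixed-size moves.
template <std::size_t N>
void copy_fixed(char* dest, const char* src) {
    std::memcpy(dest, src, N);
}

using CopyFn = void (*)(char*, const char*);

// Builds {copy_fixed<0>, ..., copy_fixed<128>} at compile time instead of
// spelling out pCopyFunc(1) ... pCopyFunc(128) by hand.
template <std::size_t... Is>
constexpr std::array<CopyFn, sizeof...(Is)> make_copy_table(std::index_sequence<Is...>) {
    return {{&copy_fixed<Is>...}};
}

inline constexpr std::array<CopyFn, 129> kCopyTable =
    make_copy_table(std::make_index_sequence<129>{});

// Same split as zmemcpy: whole 128-byte groups first (the original hands these
// to __copy_group), then one table call for the remaining size & 127 bytes.
inline void table_memcpy(char* dest, const char* src, std::size_t size) {
    assert(size == 0 || (dest != nullptr && src != nullptr));
    const std::size_t head = size & ~static_cast<std::size_t>(127);
    for (std::size_t off = 0; off < head; off += 128) {
        kCopyTable[128](dest + off, src + off);
    }
    if (size & 127) {
        kCopyTable[size & 127](dest + head, src + head);
    }
}
```

The table trades one indirect call for the size branching that a general-purpose `memcpy` does at run time. That is also why the author's note about instruction-cache effects in real programs applies equally to this sketch.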

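The table above refers to `ZMemoryCopy::___copy_0` through `___copy_128`. Judging from `END_WITH_copysize` and the `#undef copysize` at the end of zmemcpyinc.h, these are presumably produced by defining `copysize` and including zmemcpyinc.h once per size. So that everything fits in one file, the sketch below imitates this with a macro instead of a header. `DEFINE_COPY_FOR_CURRENT_SIZE`, `copy_3` and `copy_17` are hypothetical names. What it shows is why `_COMBINE` needs two levels.

```cpp
#include <cassert>
#include <cstring>

// Two-level paste with the same shape as _COMBINE/_COMBINE2 in zmemcpyinc.h.
// COMBINE's arguments are macro-expanded before COMBINE2 applies ##, so
// `copysize` is replaced by its current value first. A single level
// (x##copysize) would produce the identifier copy_copysize instead.
#define COMBINE2(x, y) x##y
#define COMBINE(x, y) COMBINE2(x, y)
#define END_WITH_copysize(x) COMBINE(x, copysize)

// Stand-in for one inclusion of zmemcpyinc.h: defines a function named
// copy_<copysize> that copies exactly copysize bytes.
#define DEFINE_COPY_FOR_CURRENT_SIZE                                     \
    inline void END_WITH_copysize(copy_)(char* dest, const char* src) { \
        std::memcpy(dest, src, copysize);                                \
    }

#define copysize 3
DEFINE_COPY_FOR_CURRENT_SIZE  // defines copy_3
#undef copysize

#define copysize 17
DEFINE_COPY_FOR_CURRENT_SIZE  // defines copy_17
#undef copysize
```

Re-including a real header instead of expanding a macro has one advantage: the body can contain its own `#pragma` and `#if` lines, which zmemcpyinc.h relies on.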
zmemcpyinc.h :

#ifndef END_WITH_copysize
#ifndef _COMBINE2
#define _COMBINE2(x,y) x##y
#define _COMBINE(x,y) _COMBINE2(x,y)
#endif
#define END_WITH_copysize(x) _COMBINE(x, copysize)
#endif

#pragma runtime_checks( "s", off )
inline void END_WITH_copysize(___copy_)(char* dest, const char* src)
{
    // MSVC supports inline __asm only when targeting 32-bit x86.
    // esi/edi are loaded once and then shared by the __asm blocks below.
    // MSVC does not guarantee that register values survive between separate
    // __asm blocks, so this relies on the compiler not touching esi/edi in between.
    __asm
    {
        mov esi, dword ptr[src]
        mov edi, dword ptr[dest]
    }

    // copysize >= 0x100: copy every 128-byte block except the last one and
    // advance esi/edi (the loop runs copysize / 0x80 - 1 times).
    __if_exists(z::If<(copysize >= 0x80 * 2)>::True)
    {
        __asm
        {
            prefetchnta [esi + 0x40]
        }
        int vsize = copysize;
        while ((vsize -= 0x80) >= 0x80)
        {
            __asm
            {
                movdqu xmm0, xmmword ptr[esi + 0x00]
                movdqu xmm1, xmmword ptr[esi + 0x10]
                movdqu xmm2, xmmword ptr[esi + 0x20]
                movdqu xmm3, xmmword ptr[esi + 0x30]
                movdqu xmm4, xmmword ptr[esi + 0x40]
                movdqu xmm5, xmmword ptr[esi + 0x50]
                movdqu xmm6, xmmword ptr[esi + 0x60]
                movdqu xmm7, xmmword ptr[esi + 0x70]
                prefetchnta [esi + 0x80]
                prefetchnta [esi + 0xC0]
                movdqu xmmword ptr[edi + 0x00], xmm0
                movdqu xmmword ptr[edi + 0x10], xmm1
                movdqu xmmword ptr[edi + 0x20], xmm2
                movdqu xmmword ptr[edi + 0x30], xmm3
                movdqu xmmword ptr[edi + 0x40], xmm4
                movdqu xmmword ptr[edi + 0x50], xmm5
                movdqu xmmword ptr[edi + 0x60], xmm6
                movdqu xmmword ptr[edi + 0x70], xmm7
                add esi, 0x80
                add edi, 0x80
            }
        }
    }
    enum { offset1 = 0 };

    // Last whole 128-byte block; the pointers are not advanced afterwards,
    // the following moves use constant offsets instead.
    __if_exists(z::If<(copysize >= 0x80)>::True)
    {
        __asm
        {
            movdqu xmm0, xmmword ptr[esi + 0x00]
            movdqu xmm1, xmmword ptr[esi + 0x10]
            movdqu xmm2, xmmword ptr[esi + 0x20]
            movdqu xmm3, xmmword ptr[esi + 0x30]
            movdqu xmm4, xmmword ptr[esi + 0x40]
            movdqu xmm5, xmmword ptr[esi + 0x50]
            movdqu xmm6, xmmword ptr[esi + 0x60]
            movdqu xmm7, xmmword ptr[esi + 0x70]
        }
        __if_exists(z::If<(copysize & 0x60)>::True)
        {
            __asm
            {
                prefetchnta [esi + 0x80]
            }
        }
        __asm
        {
            movdqu xmmword ptr[edi + 0x00], xmm0
            movdqu xmmword ptr[edi + 0x10], xmm1
            movdqu xmmword ptr[edi + 0x20], xmm2
            movdqu xmmword ptr[edi + 0x30], xmm3
            movdqu xmmword ptr[edi + 0x40], xmm4
            movdqu xmmword ptr[edi + 0x50], xmm5
            movdqu xmmword ptr[edi + 0x60], xmm6
            movdqu xmmword ptr[edi + 0x70], xmm7
            // add esi, 0x80
            // add edi, 0x80
        }
        enum { offset2 = 0x80 };
    }
    __if_exists(z::If<(copysize >= 0x80)>::False)
    {
        enum { offset2 = 0 };
    }

    // 64 bytes
    __if_exists(z::If<(copysize & 0x40)>::True)
    {
        __asm
        {
            movdqu xmm0, xmmword ptr[esi + offset2 + 0x00]
            movdqu xmm1, xmmword ptr[esi + offset2 + 0x10]
            movdqu xmm2, xmmword ptr[esi + offset2 + 0x20]
            movdqu xmm3, xmmword ptr[esi + offset2 + 0x30]
            movdqu xmmword ptr[edi + offset2 + 0x00], xmm0
            movdqu xmmword ptr[edi + offset2 + 0x10], xmm1
            movdqu xmmword ptr[edi + offset2 + 0x20], xmm2
            movdqu xmmword ptr[edi + offset2 + 0x30], xmm3
        }
        enum { offset3 = offset2 + 0x40 };
    }
    __if_exists(z::If<(copysize & 0x40)>::False)
    {
        enum { offset3 = offset2 };
    }

    // 32 bytes
    __if_exists(z::If<(copysize & 0x20)>::True)
    {
        __asm
        {
            movdqu xmm4, xmmword ptr[esi + offset3 + 0x00]
            movdqu xmm5, xmmword ptr[esi + offset3 + 0x10]
            movdqu xmmword ptr[edi + offset3 + 0x00], xmm4
            movdqu xmmword ptr[edi + offset3 + 0x10], xmm5
        }
        enum { offset4 = offset3 + 0x20 };
    }
    __if_exists(z::If<(copysize & 0x20)>::False)
    {
        enum { offset4 = offset3 };
    }

    // 16 bytes
    __if_exists(z::If<(copysize & 0x10)>::True)
    {
        __asm
        {
            movdqu xmm6, xmmword ptr[esi + offset4 + 0x00]
            movdqu xmmword ptr[edi + offset4 + 0x00], xmm6
        }
        enum { offset5 = offset4 + 0x10 };
    }
    __if_exists(z::If<(copysize & 0x10)>::False)
    {
        enum { offset5 = offset4 };
    }

    // 8 bytes
    __if_exists(z::If<(copysize & 0x8)>::True)
    {
        __asm
        {
            movlpd xmm7, qword ptr[esi + offset5]
            movlpd qword ptr[edi + offset5], xmm7
        }
        enum { offset6 = offset5 + 0x8 };
    }
    __if_exists(z::If<(copysize & 0x8)>::False)
    {
        enum { offset6 = offset5 };
    }

    // Remaining 1..7 bytes
    __if_exists(z::If<(copysize & 0x7)>::True)
    {
        enum { copydone = false };
        {
            __if_exists(z::If < ((copysize & 0x7) > 4) && copysize >= 8 > ::True) // 5 6 7: one 8-byte move, shifted back to overlap
            {
                enum { copy_offset = (copysize & 0x7) - 8 };
                __asm
                {
                    movlpd xmm0, qword ptr[esi + offset6 + copy_offset]
                    movlpd qword ptr[edi + offset6 + copy_offset], xmm0
                }
                enum { copydone = true };
                //return;
            }
            __if_exists(z::If <!copydone && ((copysize & 0x7) >= 4)>::True) // 4 5 6 7: buffer too short for an 8-byte move, copy 4 bytes first
            {
                __asm
                {
                    mov eax, dword ptr[esi + offset6]
                    mov dword ptr[edi + offset6], eax
                }
                enum { offset6 = offset6 + 4 };
            }
            __if_exists(z::If <!copydone && ((copysize & 0x3) == 3) && (copysize >= 4)> ::True) // 3: overlapping 4-byte move
            {
                __asm
                {
                    mov eax, dword ptr[esi + offset6 - 1]
                    mov dword ptr[edi + offset6 - 1], eax
                }
                enum { copydone = true };
                //return;
            }
            __if_exists(z::If <!copydone && ((copysize & 0x3) == 3)> ::True) // 3
            {
                __asm
                {
                    mov ax, word ptr[esi + offset6]
                    mov word ptr[edi + offset6], ax
                    mov al, byte ptr[esi + offset6 + 2]
                    mov byte ptr[edi + offset6 + 2], al
                }
                enum { copydone = true };
                //return;
            }
            __if_exists(z::If <!copydone && ((copysize & 0x3) == 2) > ::True) // 2
            {
                __asm
                {
                    mov ax, word ptr[esi + offset6]
                    mov word ptr[edi + offset6], ax
                }
                enum { copydone = true };
                //return;
            }
            __if_exists(z::If <!copydone && ((copysize & 0x3) == 1) > ::True) // 1
            {
                __asm
                {
                    mov al, byte ptr[esi + offset6]
                    mov byte ptr[edi + offset6], al
                }
                enum { copydone = true };
                //return;
            }
            __if_exists(z::If <!copydone && ((copysize & 0x3) == 0) > ::True) // 0
            {
                enum { copydone = true };
                //return;
            }
            __if_exists(z::If<!copydone>::True)
            {
                static_assert(0, "");
            }
        }
    }
}
#pragma runtime_checks( "s", restore )
#undef copysize
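The `__if_exists`/`z::If` chain in zmemcpyinc.h decomposes `copysize` at compile time into three parts: the 128-byte blocks, one move for each set bit among 0x40/0x20/0x10/0x8, and a 0..7-byte tail finished with overlapping moves. The sketch below expresses the same decomposition outside MSVC. It assumes C++17 `if constexpr` in place of `__if_exists` and a fixed-size `std::memcpy` in place of the `movdqu`/`movlpd`/`mov` instructions. `move_at` and `copy_n` are hypothetical names, and for 128 bytes and up the sketch simply loops over whole blocks without the prefetch scheme.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// One unaligned load/store pair of W bytes at byte offset Off.
template <std::size_t Off, std::size_t W>
inline void move_at(char* dest, const char* src) {
    std::memcpy(dest + Off, src + Off, W);
}

template <std::size_t N>
inline void copy_n(char* dest, const char* src) {
    // Whole 128-byte blocks.
    constexpr std::size_t blocks = N / 128;
    for (std::size_t b = 0; b < blocks; ++b) {
        std::memcpy(dest + b * 128, src + b * 128, 128);
    }
    // One move per set bit; o2..o6 play the role of the offset2..offset6 enums.
    constexpr std::size_t o2 = blocks * 128;
    if constexpr ((N & 0x40) != 0) move_at<o2, 0x40>(dest, src);
    constexpr std::size_t o3 = o2 + (N & 0x40);
    if constexpr ((N & 0x20) != 0) move_at<o3, 0x20>(dest, src);
    constexpr std::size_t o4 = o3 + (N & 0x20);
    if constexpr ((N & 0x10) != 0) move_at<o4, 0x10>(dest, src);
    constexpr std::size_t o5 = o4 + (N & 0x10);
    if constexpr ((N & 0x08) != 0) move_at<o5, 0x08>(dest, src);
    constexpr std::size_t o6 = o5 + (N & 0x08);

    // Tail of 0..7 bytes.
    constexpr std::size_t t = N & 7;
    if constexpr (t > 4 && N >= 8) {
        // 5..7: one 8-byte move ending exactly at N, overlapping copied bytes.
        move_at<N - 8, 8>(dest, src);
    } else {
        // 4..7 with N < 8, or exactly 4: copy 4 bytes first.
        if constexpr (t >= 4) move_at<o6, 4>(dest, src);
        constexpr std::size_t o7 = o6 + (t >= 4 ? 4 : 0);
        constexpr std::size_t r = N & 3;
        if constexpr (r == 3 && N >= 4) {
            move_at<N - 4, 4>(dest, src);  // overlapping 4-byte move
        } else if constexpr (r == 3) {
            move_at<o7, 2>(dest, src);
            move_at<o7 + 2, 1>(dest, src);
        } else if constexpr (r == 2) {
            move_at<o7, 2>(dest, src);
        } else if constexpr (r == 1) {
            move_at<o7, 1>(dest, src);
        }
    }
}
```

The overlapping tail is the main trick. For N = 13, the 8-byte move at offset 0 is followed by a second 8-byte move at offset 5. Bytes 5..7 are written twice, but this avoids issuing separate 4-byte and 1-byte moves.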











http://www.chinasem.cn/article/747500
