Learn more about embedded programming

Being able to read embedded code with a PC-programming background is the first step; learning to think in embedded terms is the second; combining PC thinking and embedded thinking and applying both to real projects is the third. Many people have switched from PC programming to embedded programming. In China, few embedded engineers graduated from computer science; most come from automatic control or electronics-related majors. They tend to have strong practical experience but weaker theory. Computer-science graduates, in turn, mostly go into online games, web development and other higher-level applications sitting on top of an operating system, and are not keen on the embedded industry, since the road is hard: they have strong theory but lack knowledge of circuits and the other specialized material embedded work requires. I have not done an industry survey, but from what I have seen and from the people I have interviewed, engineers in the embedded industry usually lack either theoretical knowledge or practical experience; very few have both. The root cause is still the problem of university education in China, which I will not discuss here to avoid a flame war. Instead I want to give a few examples from my own practice, to draw attention to some issues when doing embedded projects.

The first example. A colleague developed a serial port driver under uC/OS-II, and both the driver and its interface passed testing. A communication program was then built on top of it in the application. The driver provides a function that reports how many characters are waiting in the receive buffer: GetRxBuffCharNum(). The upper layer needs to receive a certain number of characters before it can parse a packet. My colleague's code, in pseudo code, looked like this:

    bExit = FALSE;
    do {
        if (GetRxBuffCharNum() >= 30)
            bExit = ReadRxBuff(buff, GetRxBuffCharNum());
    } while (!bExit);

The intent is clear: once at least 30 characters are in the driver buffer, read them all out into buff, and loop until the read succeeds. The logic is clean and the idea is straightforward, and on a PC there would be no problem at all, yet on the embedded target it did not work reliably. My colleague was baffled and asked me to look at it. The first thing I asked was: how is GetRxBuffCharNum() implemented? We opened it up:

    unsigned GetRxBuffCharNum(void)
    {
        cpu_register reg;
        unsigned num;

        reg = interrupt_disable();
        num = gRxBuffCharNum;
        interrupt_enable(reg);
        return (num);
    }

Clearly, the region between interrupt_disable() and interrupt_enable() is a global critical section that guarantees the integrity of gRxBuffCharNum. The trouble is that inside the outer do { } while() loop the CPU disables and re-enables interrupts at a very high rate, and the window in which interrupts are enabled is very short, so the CPU may fail to service the UART interrupt in time. Whether this bites depends on the UART baud rate, the size of the hardware FIFO, and the CPU speed. Our baud rate was very high, about 3 Mbps. With one start bit and one stop bit, a byte occupies 10 bit times, so at 3 Mbps one byte arrives roughly every 3.3 us. How many instructions can the CPU execute in 3.3 us? A 100 MHz ARM can execute roughly 150 instructions. How long does disabling interrupts take? On ARM it is generally more than 4 instructions to disable interrupts and more than 4 to re-enable them, and the UART receive interrupt handler itself is more than 20 instructions. Put together, there is a real chance of losing received data, and at the system level this shows up as unstable communication.
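For reference, the receive side of such a driver usually looks something like the sketch below. This is not my colleague's actual code: the register names, addresses and buffer size are hypothetical, chosen only for illustration. The point is that gRxBuffCharNum is advanced only inside the interrupt handler, and the hardware FIFO behind it is only a few bytes deep, so every interrupt the CPU fails to service in time means received bytes silently lost.

    #define UART0_DR  (*(volatile unsigned char *)0x4000C000)   /* hypothetical data register */
    #define UART0_FR  (*(volatile unsigned long *)0x4000C018)   /* hypothetical flag register */
    #define RX_READY  (1u << 4)                                 /* "a byte is waiting" flag   */

    #define RX_BUFF_SIZE 256u

    static volatile unsigned char gRxBuff[RX_BUFF_SIZE];
    static volatile unsigned      gRxHead;
    static volatile unsigned      gRxBuffCharNum;   /* the counter read by GetRxBuffCharNum() */

    void uart_rx_isr(void)                          /* runs once per byte (or per FIFO level) */
    {
        while (UART0_FR & RX_READY) {               /* drain the small hardware FIFO */
            unsigned char c = UART0_DR;             /* reading the data register clears the flag */
            if (gRxBuffCharNum < RX_BUFF_SIZE) {
                gRxBuff[gRxHead] = c;
                gRxHead = (gRxHead + 1u) % RX_BUFF_SIZE;
                gRxBuffCharNum++;                   /* ReadRxBuff() consumes from the other end */
            }                                       /* else: software buffer full, byte dropped */
        }
    }

If the polling loop keeps interrupts masked for most of each 3.3 us byte interval, this handler simply never gets to run often enough.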
Fixing the code is actually very simple, and the easiest fix is at the application level:

    bExit = FALSE;
    do {
        DelayUs(20);                   /* delay 20 us, typically implemented as an empty loop */
        num = GetRxBuffCharNum();
        if (num >= 30)
            bExit = ReadRxBuff(buff, num);
    } while (!bExit);

This gives the CPU time to execute the interrupt handler, and avoids the data loss caused by opening and closing interrupts too frequently. In embedded systems, most RTOSes do not ship with a serial port driver; when you design that code yourself, it is easy not to think through how it interacts with the kernel, and that breeds deep problems. An RTOS is called "real-time" because it responds to events quickly, and fast response to events depends on the CPU's response to interrupts. In Linux, drivers are tightly integrated with the kernel and run together in kernel mode; an RTOS cannot copy Linux's structure, but it is a useful reference. As this example shows, embedded developers need to understand their code from every angle.

The second example. A colleague was driving a 14094 serial-to-parallel chip. Because there was no dedicated hardware, the serial signal was bit-banged on GPIO pins. He hand-wrote the driver and spent three or four days debugging it with problems remaining. I could not stand watching any longer and had a look: the parallel output was sometimes correct and sometimes not. The code, roughly, was:

    for (i = 0; i < 8; i++) {
        SetData((data >> i) & 0x1);
        SetClockHigh();
        for (j = 0; j < 5; j++);
        SetClockLow();
    }

It shifts out the 8 bits of data in order from bit 0 to bit 7, one per clock-high period. It looks correct; where is the problem? I thought it over, read the 14094 data sheet, and understood. The 14094 requires the clock to stay high for at least 10 ns and to stay low for at least 10 ns. This code delays during the high period but has no delay for the low period. If an interrupt happens to arrive while the clock is low, the low period is stretched and the transfer works; if no interrupt arrives at that moment, the low period is too short and it does not. Hence sometimes good, sometimes bad. The fix is just as simple:

    for (i = 0; i < 8; i++) {
        SetData((data >> i) & 0x1);
        SetClockHigh();
        for (j = 0; j < 5; j++);
        SetClockLow();
        for (j = 0; j < 5; j++);
    }

With that it works perfectly. But this is still not portable code, because the compiler may optimize the two empty delay loops away entirely; if it does, the 10 ns high and low times are no longer guaranteed and the chip again misbehaves. Truly portable code would provide a nanosecond-level delay, DelayNs(10): as Linux does, measure at power-up how long a nop instruction takes and how many nops make up 10 ns, then execute that many nops, and use compiler directives or keywords to keep the delay loop from being optimized away, for example __asm__ __volatile__("nop") in GCC.
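A sketch of what such a calibrated delay might look like is shown below. It is only an illustration of the idea, not a drop-in implementation: read_hw_timer() and TIMER_HZ stand in for whatever free-running timer the platform actually has, and a real version would also account for loop overhead, caches and compiler differences.

    #include <stdint.h>

    #define TIMER_HZ   1000000u               /* hypothetical 1 MHz free-running timer */
    #define CAL_LOOPS  100000u

    extern uint32_t read_hw_timer(void);      /* hypothetical: returns the current timer count */

    static uint32_t g_nops_per_us;            /* measured once at power-up */

    static void delay_nops(uint32_t n)
    {
        while (n--) {
            __asm__ __volatile__("nop");      /* volatile: the compiler must not remove it */
        }
    }

    void DelayCalibrate(void)                 /* call once during board bring-up */
    {
        uint32_t t0 = read_hw_timer();
        delay_nops(CAL_LOOPS);
        uint32_t t1 = read_hw_timer();
        uint32_t elapsed_us = (t1 - t0) / (TIMER_HZ / 1000000u);
        if (elapsed_us == 0)
            elapsed_us = 1;
        g_nops_per_us = CAL_LOOPS / elapsed_us;
    }

    void DelayNs(uint32_t ns)
    {
        uint32_t n = (ns * g_nops_per_us + 999u) / 1000u;   /* round up: never shorter than asked */
        delay_nops(n ? n : 1u);
    }

With something like this in place, the bit-bang loop above can simply call DelayNs(10) after SetClockHigh() and after SetClockLow() and stay correct across compilers and CPU clock rates.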
As this example also shows, writing even a small piece of code well requires a good deal of background knowledge. Embedded work often has no operating system at all, or has one that, because of various constraints, provides very few services, so a lot of code cannot be written as freely as on a PC. Next let me talk about memory allocation. Memory fragmentation may not be something everyone worries about, but in embedded systems it is the thing to fear most: it is the number one killer of system stability.

I once did a project with a great deal of malloc and free activity, with sizes ranging from 60 bytes to 64 KB, on top of an RTOS. I had two choices: the C library's malloc and free, or the fixed-size memory allocation provided by the operating system. The design required the system to run stably for more than three months. In practice it ran for only about six days before failing. We suspected all kinds of causes, and the culprit finally turned out to be memory: after a long period of allocations the heap becomes fragmented and free space is no longer contiguous, so even though plenty of memory is free in total, a large contiguous request can only succeed by luck. To meet the original design requirement, we simulated the whole hardware platform on a PC, ran the embedded code there, overloaded malloc and free, and wrote a fairly elaborate statistics program to record the system's memory behaviour. After letting it run for several days we pulled out the data and analysed it: allocation activity was heavy, but it followed patterns. We grouped requests into classes, roughly 100 bytes and below in one class, then 512 B, 1 KB, 2 KB and 64 KB classes, counted how many blocks each class needed, added a 30% margin on top, and switched to fixed-size block allocation. The system then ran continuously for a long time. Embedded work is like that: a primitive method is nothing to be afraid of; what matters is whether the requirements are met.
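The shape of such a fixed-block pool is roughly what the sketch below shows. The class sizes and block counts here are invented for illustration; in the project they came from the measured statistics plus the 30% margin. In a real RTOS the alloc and free paths would also be wrapped in a critical section, or one would simply use the kernel's own partition service (for example OSMemGet/OSMemPut in uC/OS-II).

    #include <stddef.h>
    #include <stdint.h>

    typedef struct block {
        struct block *next;
    } block_t;

    typedef struct {
        size_t   block_size;                  /* payload size of this class        */
        block_t *free_list;                   /* singly linked list of free blocks */
    } pool_t;

    /* Two example classes; all storage is reserved statically, so fragmentation
     * of a shared heap can never occur. */
    static _Alignas(sizeof(void *)) uint8_t pool128_mem[64][128];
    static _Alignas(sizeof(void *)) uint8_t pool1k_mem [16][1024];
    static pool_t pools[2] = { { 128, NULL }, { 1024, NULL } };

    static void pool_add(pool_t *p, uint8_t *base, size_t count)
    {
        for (size_t i = 0; i < count; i++) {
            block_t *b = (block_t *)(base + i * p->block_size);
            b->next = p->free_list;
            p->free_list = b;
        }
    }

    void pools_init(void)
    {
        pool_add(&pools[0], &pool128_mem[0][0], 64);
        pool_add(&pools[1], &pool1k_mem[0][0], 16);
    }

    /* Pick the smallest class that fits; NULL means that class is exhausted
     * (which the statistics plus margin are supposed to prevent). */
    void *pool_alloc(size_t size)
    {
        for (size_t i = 0; i < 2; i++) {
            if (size <= pools[i].block_size) {
                block_t *b = pools[i].free_list;
                if (b != NULL)
                    pools[i].free_list = b->next;
                return b;
            }
        }
        return NULL;                          /* larger than any class */
    }

    void pool_free(void *ptr, size_t size)    /* return the block to the class it came from */
    {
        for (size_t i = 0; i < 2; i++) {
            if (size <= pools[i].block_size) {
                block_t *b = ptr;
                b->next = pools[i].free_list;
                pools[i].free_list = b;
                return;
            }
        }
    }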
Memory overrun is another problem, and in an embedded system it is far more frightening than on a PC, because it usually goes undetected. It is hard to guard against, especially for C/C++ beginners who are not yet comfortable with pointers. A PC has an MMU, so when an access goes badly out of bounds the MMU catches it and a serious disaster is avoided. Embedded systems often have no MMU, and the difference is huge: the system's code can be corrupted and the program keeps running, and only God and the CPU know what it is actually doing. Consider this code:

    #include <assert.h>

    char *strcpy(char *dest, const char *src)
    {
        char *ret = dest;                 /* standard strcpy returns the original dest */

        assert(dest != NULL && src != NULL);
        while (*src != '\0') {
            *dest++ = *src++;
        }
        *dest = '\0';
        return ret;
    }

This is an ordinary string copy; written like this on a PC it is basically fine. In an embedded system, however, you must be wary of one thing: does src really end with '\0'? If it does not, this is a tragedy, and when the copy will stop only God knows. Even if the code is lucky enough to keep running, the program is unlikely to behave correctly afterwards, because the memory beyond dest has been trampled. To stay compatible with the standard C/C++ library there is really no good way around this, so the check is left to the programmer. The same goes for memcpy(dest, src, n): beware of n being passed a negative value. The byte count is unsigned, so a negative value is converted into a huge positive number and everything after dest is destroyed. In embedded code, pointers must be strictly checked before use and buffer sizes strictly verified during debugging; otherwise tragedy is hard to avoid.

Function pointers are similar. Even if a function pointer holds NULL (0), calling it on ARM produces no exception at all: the call simply makes the code run from address 0, and address 0 is where the first instruction executed after power-up lives, so the system just resets. This is especially true on ARM7. It is a quieter tragedy than on a PC, where the MMU would raise an access or undefined-instruction error and get the programmer's attention; in embedded systems it is all left to the programmer to discover. Memory and stack overflows happen at the moments you least expect. How much heap did you give the foreground/background system (or the operating system)? How large is each stack? What is the system's call depth in normal operation, and what is the maximum? How much stack does that depth consume? It is not enough for the program to be functionally correct; you also have to measure these numbers, otherwise an overflow is waiting somewhere, and it is fatal to the system. Embedded systems must run continuously for long periods with strict reliability requirements, and it takes real time to polish these details carefully.
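One common way to answer the "how much stack do I really use?" question is a fill-pattern check, sketched below under the assumption of a full-descending stack (the usual ARM convention); stack_start and the word count are whatever your linker script or RTOS task definition provides.

    #include <stdint.h>
    #include <stddef.h>

    #define STACK_FILL_PATTERN  0xDEADBEEFu

    /* Fill the whole stack area with a known pattern before the code (or task)
     * that uses it starts running. */
    void stack_fill(uint32_t *stack_start, size_t words)
    {
        for (size_t i = 0; i < words; i++)
            stack_start[i] = STACK_FILL_PATTERN;
    }

    /* Later (periodically, or from a debug command) scan from the bottom of the
     * stack area upwards: words still holding the pattern have never been used.
     * The return value is the remaining high-water margin in bytes. */
    size_t stack_unused(const uint32_t *stack_start, size_t words)
    {
        size_t i = 0;
        while (i < words && stack_start[i] == STACK_FILL_PATTERN)
            i++;
        return i * sizeof(uint32_t);
    }

If the margin ever approaches zero during stress testing, the stack is too small for the real call depth plus interrupt nesting.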

Debugging an embedded system is often complicated, the available means are far fewer than in PC programming, and the development cost is much higher than for PC software. The main debugging means boil down to single-step tracing through JTAG and the time-honoured printf method.

These two methods do not solve every problem. JTAG requires a debug probe (which can be expensive) connected to the target system; you then use software such as a GDB client to connect to the probe and trace the running program. To be fair, this is the fundamental debugging method for embedded work, and a good one, but it still has several weaknesses. When there are too many breakpoints, beyond the hardware limit (some low-end CPUs support only a few hardware breakpoints), the JTAG tool has to fall back on software emulation or software traps (soft interrupts or exceptions) to implement breakpoints, and the mechanism becomes more complicated. Put simply: first, it is not suited to long-running debugging sessions and is not entirely stable; second, it can change the program's behaviour at run time and disturb its timing. With the JTAG probe attached, hardware breakpoints do not slow the system down, but software breakpoints cost some performance, and reliability suffers as well. With many breakpoints set, and with the system inside a critical section, a breakpoint may simply not fire: global critical sections in embedded systems are usually implemented by disabling interrupts, some CPUs have no non-maskable interrupt, and once the breakpoint count exceeds the hardware limit the software breakpoints need interrupts to work at all. For timing-sensitive code and high-speed communication code, JTAG is not much help either: the communication runs fast, packets follow one another to complete an action, and with breakpoints in the way the program simply cannot keep working.

That leaves the printf method, which is excellent, but a few things need attention. Embedded systems often have no screen, so printf output goes out through a serial port. A UART can be driven by polling, by interrupts, or by DMA, but debug printf output must use polling only, never interrupts or DMA. Whether you run a foreground/background system or an operating system, you may need to print inside a global critical section (interrupts disabled), inside an interrupt handler (nested interrupts not allowed), or inside a driver during early initialisation (other devices not yet initialised, memory allocation and interrupts not working). In all of those situations, interrupt-driven UART output is unwise, so debug output can only poll. Do not dream up anything fancier; you do not need it. In a word, reliability comes first: the whole point of a debug output is that it can be trusted. Precisely because of this, printf also affects the timing of the code it instruments. The highest common UART baud rate is 115200 bps, and the faster the CPU, the more time is wasted, because the CPU busy-waits for each character to finish transmitting and that time is pure idling. So using printf takes some technique: print in places where it does not disturb the critical timing, rather than scattering prints everywhere and drowning the bug you are hunting. Even so, these two methods together still cannot solve every problem well.
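A polled debug output is small enough to sketch in full. The register names and addresses below are hypothetical stand-ins for a real UART; the essential property is that the routine busy-waits on the transmit-ready flag and touches no interrupts, DMA or heap, so it can be called from interrupt handlers, critical sections and early driver initialisation alike.

    #define UART0_DR  (*(volatile unsigned char *)0x4000C000)   /* hypothetical data register */
    #define UART0_FR  (*(volatile unsigned long *)0x4000C018)   /* hypothetical flag register */
    #define TX_BUSY   (1u << 5)

    static void dbg_putc(char c)
    {
        while (UART0_FR & TX_BUSY) {
            ;                             /* spin until the previous character has left */
        }
        UART0_DR = (unsigned char)c;
    }

    void dbg_puts(const char *s)
    {
        while (*s != '\0') {
            if (*s == '\n')
                dbg_putc('\r');           /* most terminals expect CR before LF */
            dbg_putc(*s++);
        }
    }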
In practice, if the board has one or two LEDs, you can also use a spare IO pin to turn them on or off at particular points in the code as an indication of program state. This approach suits debugging interrupts, critical sections and similar issues: toggling an LED takes almost no time, basically a single memory read and write when the IO register is memory-mapped, so the effect on the system is minimal. When debugging complicated timing you can likewise take a free IO pin, drive it low or high at specific points, and capture the result with a digital oscilloscope or a logic analyser. This is particularly useful for analysing how often a piece of code executes, how long it takes, and how much an optimisation really gained, and it is of great value for overall performance work. For simple microcontrollers the vendor's development tools have cycle-counting features, but for parts with caches and an MMU those statistics are inaccurate, often less accurate than what you measure with a scope. If you have no oscilloscope, you can use the CPU's internal time or cycle counter to do the arithmetic, in combination with printf.

A colleague of mine was debugging a Philips ARM7. Its external RAM is all static RAM, so even if the CPU crashes, the SRAM contents are retained as long as power is not removed, and since the external SRAM and the internal SRAM sit in the same address space, access is just an ordinary load or store and very fast. Using this property, he put markers at every module and key point in the program. Whenever the system misbehaved and the ARM7 reset, the very first thing the code did after reset was to fetch the data recorded before the reset and print it out. That made the ARM7 code debuggable again, a very clever trick. Friends whose boards have only SDRAM cannot use this method, because once the system resets the SDRAM is no longer refreshed and its contents are lost.
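The idea can be sketched as follows. The SRAM address and magic value are hypothetical; what matters is that the marker lives in memory that survives a reset and that the startup code does not zero (on many toolchains that means a dedicated "no-init" section), and that it is reported very early after reset, before anything overwrites it.

    #include <stdint.h>

    #define TRACE_ADDR   ((volatile uint32_t *)0x81000000u)   /* hypothetical external SRAM location */
    #define TRACE_MAGIC  0x54524143u                          /* marks the record as valid           */

    extern void dbg_puts(const char *s);                      /* e.g. the polled output sketched above */

    /* Drop one of these at every module entry or key point of interest. */
    #define TRACE_POINT(id)  do { TRACE_ADDR[0] = TRACE_MAGIC; TRACE_ADDR[1] = (id); } while (0)

    static void dbg_put_hex(uint32_t v)
    {
        char buf[11] = "0x00000000";
        static const char digits[] = "0123456789ABCDEF";
        for (int i = 0; i < 8; i++)
            buf[2 + i] = digits[(v >> (28 - 4 * i)) & 0xFu];
        dbg_puts(buf);
    }

    /* Call this very early after reset, before any TRACE_POINT() runs again. */
    void trace_report_after_reset(void)
    {
        if (TRACE_ADDR[0] == TRACE_MAGIC) {
            dbg_puts("last trace point before reset: ");
            dbg_put_hex(TRACE_ADDR[1]);
            dbg_puts("\n");
        }
        TRACE_ADDR[0] = 0;                                    /* so a clean boot reports nothing */
    }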

Everyone knows that the biggest challenge in embedded work is that the hardware and the software mature at the same time: when something goes wrong, you do not know whether it is a software problem or a hardware problem. Many problems can be investigated in simulation, but simulation is still simulation, not the real thing, and on the actual board plenty of problems remain. The embedded field, especially the low-level part, consists of two halves, software (drivers) and hardware, and solving problems needs knowledge of both, which puts higher demands on the people doing it. The hardest problems I have met have all been complex system problems.

First: a system had to work 24 hours a day continuously and, even on a power failure, had to save its state; when power returned it had to restore the state it had before the failure and carry on working. We wrote the software accordingly, but the real behaviour was not what we expected: out of ten thousand power cuts there were always a few dozen abnormal cases, impossible to reproduce, leaving us only guesswork. This kind of problem is inherently hard to debug: the system is losing power, so JTAG is attached to a target board that is going dead, and there is no way to single-step through anything. The original design was that after the mains dropped, the control circuitry would keep running for a short while on energy stored in capacitors, save the state, and then enter standby; testing the power-failure detection signal showed nothing wrong. For a while this remained an open mystery. The system is divided into two modules, a working module and a control module. The control module has capacitors that keep it powered for a moment; the working module does not. So when a power cut occurs, the two modules do not lose power at the same instant: by the time the control module detects the failure, the working module has already lost power, so it can no longer send back the data the control module needs, and the control module cannot complete the save correctly. The two events are so close together in time that their order cannot be judged. The solution was very simple: synchronise the power-failure detection with the working module's supply, so the control module learns of the failure no later than the working module does, and the problem disappeared.
Second: another power-failure protection problem. We used a relay to simulate power cuts, and after tens of thousands of cycles everything was normal; yet in whole-machine tests the system often failed to protect itself properly on power failure. Looking carefully at the circuit showed nothing unusual; it was the same circuit. The engineering department accused us in R&D of sloppy testing and of shipping things that were full of problems. Sad. After careful analysis we judged that a software anomaly was very unlikely and that the problem was probably in the hardware: under frequent power cuts the supercapacitor might not store enough energy for the system to complete its protection sequence. But what could cause such frequent power cuts? By design the supercapacitor charges to 80% in 3 to 5 seconds, which should be plenty. What could cut the power more often than once every 3 to 5 seconds? It sounds incredible, and it was only discovered by continuously monitoring the control board's supply with a digital oscilloscope. It turned out that the three-phase AC supply passes through a phase protector, and the phase protector was switching on and off frequently while the system worked (possibly related to the system's operating state). The solution was simply to connect the controller's supply ahead of the phase protector.

These look like hardware problems, and they come up constantly while debugging products. They require the software people first to confirm whether a software bug could produce the symptom, and then the hardware engineer to verify the hardware. Hardware verification is a long and complicated process with very limited debugging means; debugging embedded software is better placed in both cost and efficiency. So embedded software engineers often have to spend a great deal of time ruling out the software before the hardware can be suspected. For an embedded developer, understanding the basic principles of the hardware, combining them with how the software works, and locating faults together with the hardware engineer is a very effective way of working. Friends on the net often ask me questions, and many of them concern low-level knowledge about multiprocessor systems. I have learned a fair amount in that area too, so let me talk about multi-CPU applications in the embedded field. Embedded systems are one application area of computer science, and to work in this field you still need a solid grounding in computer theory.

First, the types of multiprocessor system. The processors may be identical, the same model throughout, connected by some communication channel such as multi-port RAM, RapidIO, Gigabit Ethernet or PCI-E. They may be different models, or even completely different architectures, connected in the same sorts of ways (multi-port RAM, RapidIO and so on). Or multiple CPUs may be integrated on the same chip (CMP), sharing almost everything and forming a system far more tightly coupled than one connected by multi-port RAM.

Why use multiple processors at all? For large-scale parallel computing; to exploit the strengths of different CPUs, as with the DM642 in complex video applications, or to use a DSP's floating-point horsepower together with an ARM's strength at control and transaction work; or simply to raise system performance. For ordinary applications, raising performance is the basic motivation. But using multiprocessors in an embedded system is not easy: the software design is very difficult, and debugging is a big problem too. If you do not use an operating system and run a foreground/background system instead, you have to design the communication scheme and the result-integration yourself; a great deal has to be built from scratch, and the reliability and fault tolerance of the interconnect become critical. So, where possible, using a mature, stable operating system that supports multiple processors reduces the development difficulty. Finding such an operating system, however, is not easy. Start by identifying what your application actually needs: do you need thread or process migration? Do you need processor load balancing? If the system does not support migrating threads or processes between processors, there can be no dynamic balancing of the processors' load; you can only decide in advance which processor each thread or process runs on. For heterogeneous multiprocessors, thread and process migration has little practical meaning, and for a company chasing profit it has no practical value at all, so migration is really limited to symmetric (identical) processors, and even then not every symmetric design can migrate. For a symmetric multiprocessor, the operating system encapsulates the lower layers and lets you develop much as you would for a single CPU; it cannot be made completely identical to the single-CPU case, but the difficulty drops considerably.

Many friends have asked me whether RTEMS can run on a multi-core (CMP) chip the way an SMP operating system runs on x86. Of course it can, but the design is different from an ordinary symmetric multiprocessor. On a CMP the CPUs share a great deal: interrupts, memory, buses; their address spaces are essentially the same. An RTOS like RTEMS supports multiple processors in an asymmetric way: each CPU runs its own copy of RTEMS. Communication between the copies then becomes the crucial issue. Each RTEMS instance needs its own system tick, so where do the ticks come from? Because the CMP shares so many resources, the user has to assign interrupt sources to each RTEMS instance by hand and partition the memory between them, with the result that, even though every core is running RTEMS, many of the per-core drivers end up different. Such a tightly coupled system is very hard to handle.

Compared with a CMP, a symmetric system built from several identical single-CPU chips is simpler, because all the drivers are the same; only the communication driver may need special treatment depending on the interconnect, and that alone greatly reduces the development pressure and the debugging difficulty. It is far easier to manage than running a separate kernel on every core of a CMP, which can drive you to despair, the debugging above all. So, from an economic point of view, I still prefer a multiprocessor system composed of multiple identical single-CPU chips.

Much of the time, for heterogeneous processors, RTEMS can of course be made to work, but problems remain: every core needs its own RTEMS port and support, development is inconvenient, and debugging an operating system on each core is still fairly complicated. So the scheme used in practice is this: among the heterogeneous processors, the one responsible for control and transaction work runs the operating system, while the one responsible for computation runs a simple foreground/background system and does nothing but answer, through shared-memory communication, the computation requests coming from the OS side. This greatly reduces the development difficulty. In effect the operating system treats the DSP like a hardware peripheral: write a few registers and read back the result, or feed it a pile of data and get back a complex answer. In short, it is a request-and-response approach, used in most projects, and it is simple, reliable and practical. In embedded systems, the right multiprocessor structure really does depend heavily on the application.
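A minimal sketch of such a shared-memory mailbox is below. The memory layout, the address and the command code are invented for illustration; a real design would also deal with cache coherence (or map the region uncached), allow more than one outstanding request, and usually replace the polling with a doorbell interrupt.

    #include <stdint.h>

    typedef struct {
        volatile uint32_t cmd;            /* written by the OS-side CPU, 0 means empty      */
        volatile uint32_t arg[4];         /* request parameters                             */
        volatile uint32_t result;         /* written by the compute-side CPU                */
        volatile uint32_t done;           /* set to 1 by the compute side when result valid */
    } mailbox_t;

    #define MBOX            ((mailbox_t *)0xA0000000u)   /* hypothetical shared-memory address */
    #define CMD_FIR_FILTER  1u                           /* hypothetical command code          */

    extern uint32_t run_fir_filter(const volatile uint32_t arg[4]);  /* hypothetical DSP routine */

    /* OS side (e.g. the ARM running the RTOS): issue one request and wait for the answer. */
    uint32_t dsp_request(uint32_t cmd, const uint32_t arg[4])
    {
        for (int i = 0; i < 4; i++)
            MBOX->arg[i] = arg[i];
        MBOX->done = 0;
        MBOX->cmd  = cmd;                 /* written last: this is the "go" signal */
        while (MBOX->done == 0) {
            ;                             /* a real system would block on a semaphore or interrupt */
        }
        return MBOX->result;
    }

    /* Compute side (e.g. the DSP): the background loop of a foreground/background system. */
    void dsp_background_loop(void)
    {
        for (;;) {
            if (MBOX->cmd != 0) {
                uint32_t r = 0;
                switch (MBOX->cmd) {
                case CMD_FIR_FILTER:
                    r = run_fir_filter(MBOX->arg);
                    break;
                default:
                    break;
                }
                MBOX->result = r;
                MBOX->cmd    = 0;
                MBOX->done   = 1;         /* tell the OS side the result is ready */
            }
        }
    }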

This article is reproduced from the network; please contact us regarding any copyright concerns.
