-
Language is forgiving, Regex isn't
Some background context: I’ve watched the early failings of LLMs as things have progressively improved. Many people point to overconfident hallucinations and inaccuracies. I also noticed they were bad at regular expressions. In the early stages of LLMs being kind of a ‘toy,’ none of this mattered to me much. And then we started seeing ‘AI’ being ingested into the pipelines of real products. Yeah, there were still overconfident hallucinations used for summaries of things. Continue reading →
-
ARM 12-bit Immediates are Too High Level
The post below is written in the context of 32-bit ARMv7. A video that supplements this content can be found here: [youtu.be/4PmjTFgEy...](https://youtu.be/4PmjTFgEybI) The Immediate Issue: Those familiar with the 'Immediate' form of many ARM instructions may know that the 12 bits that encode the immediate value aren't as simple as they may seem at first glance. For a point of reference, we will use the MOV instruction as an example, as in MOV r0, #1337. Continue reading →
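A minimal sketch of why those 12 bits are not a plain 12-bit number, assuming the classic A32 "modified immediate" scheme (an 8-bit value rotated right by twice a 4-bit rotation field). The function names are hypothetical and this is an illustration only, not code from the post.

```python
def ror32(value: int, amount: int) -> int:
    """Rotate a 32-bit value right by `amount` bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF


def encode_arm_immediate(value: int):
    """Return the 12-bit field (rot4 << 8 | imm8), or None if not encodable.

    Assumption: A32 modified immediate, i.e. value == ror32(imm8, 2 * rot4).
    """
    value &= 0xFFFFFFFF
    for rot4 in range(16):
        # Rotating right by (32 - 2*rot4) is a rotate left by 2*rot4,
        # which undoes the rotate right the hardware would apply.
        imm8 = ror32(value, (32 - 2 * rot4) % 32)
        if imm8 <= 0xFF:
            return (rot4 << 8) | imm8
    return None


if __name__ == "__main__":
    for candidate in (0xFF, 0xFF000000, 1337):
        print(hex(candidate), encode_arm_immediate(candidate))
```

Under this assumption, 1337 (0x539) has set bits spread over 11 bit positions, so the sketch reports it as not encodable in this form.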
-
Invoke is Too High Level
(or another perspective on the Invoke vs Call argument) The video version (for the illiterate) can be found at: [youtu.be/QyjXBv3sq...](https://youtu.be/QyjXBv3sqRY) I'm in the process of re-certifying for the GREM certification (GIAC Reverse Engineering Malware). Although I'm pretty good with assembly language in a handful of architectures (Motorola, x86, Propeller, and ARM), my skills are shit with Windows and its APIs. As far as GREM and static code analysis go, I still have a ways to go; a 'not seeing the forest for the trees' issue. Continue reading →
-
ARM Assembly is Too High Level - ROR and RRX
Note: If you prefer video format to reading stuff, there's a companion video for this: https://youtu.be/ONQLWdd5nuc Looking at instruction encodings, 'ROR r0, #0' should be the same as 'RRX r0, r0'. Let's first take a look at the encoding for the ROR instruction. Rm gets rotated imm5 places and the result gets stored into Rd. Now let's look at the encoding for RRX. Note that the encoding is identical to ROR, with the exception that the imm5 field is hardcoded to 0. Continue reading →
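The overlap described in the excerpt can be sketched by building the 32-bit instruction word by hand. This assumes the A32 'MOV (shifted register)' layout with shift type 0b11 for ROR; the helper name and default arguments are hypothetical.

```python
def encode_ror_imm(rd: int, rm: int, imm5: int, cond: int = 0b1110, s: int = 0) -> int:
    """Build the A32 word for ROR Rd, Rm, #imm5 (assumed field layout)."""
    return (
        (cond << 28)             # condition code, 0b1110 = always
        | (0b0001101 << 21)      # data-processing opcode bits for MOV/shift
        | (s << 20)              # S bit: update flags
        | (rd << 12)             # destination register
        | ((imm5 & 0x1F) << 7)   # rotate amount
        | (0b11 << 5)            # shift type: ROR
        | rm                     # source register
    )


if __name__ == "__main__":
    print(hex(encode_ror_imm(0, 0, 1)))  # ROR r0, r0, #1
    print(hex(encode_ror_imm(0, 0, 0)))  # imm5 == 0: the word an RRX r0, r0 would use
```

With imm5 set to 0 the word produced is 0xE1A00060, which is the pattern the excerpt says RRX occupies.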
-
ARM Assembly is Too High Level - Moving by Shifting
The syntax for the register form of a Logical Shift Left is: LSL{S}c Rd, Rm, #imm5. This takes the value stored in the source register Rm and shifts its bits left by the amount defined in #imm5. The result of this operation is stored in the destination register Rd. Like many instructions on ARM, you can make it conditional with a condition code (c) and choose whether the flags are set as well with {S}. Continue reading →
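As a rough model of the data flow described above (flags and condition codes are ignored), a logical shift left on a 32-bit register could be sketched like this. The function name is hypothetical.

```python
def lsl(rm_value: int, imm5: int) -> int:
    """Model LSL Rd, Rm, #imm5: shift left and keep only the low 32 bits."""
    if not 0 <= imm5 <= 31:
        raise ValueError("imm5 is a 5-bit field (0..31)")
    return (rm_value << imm5) & 0xFFFFFFFF


if __name__ == "__main__":
    print(hex(lsl(0x1, 4)))         # bits move up four places
    print(hex(lsl(0x80000000, 1)))  # the top bit falls off the end
    print(hex(lsl(0x1234, 0)))      # a shift of zero just copies Rm into Rd
```

The last call is the interesting one for the post's title: a shift amount of 0 leaves the value untouched, so the operation behaves like a plain register move.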
-
sed-regex Based BrainFuck Compiler
BrainFuck is an 'esoteric' programming language with only 8 one-character instructions. I've used it here-and-there for well over a decade. I love minimalist languages, so RISCy. A brainfuck environment operates on a large array of data. There's an instruction to move the pointer in this array forwards and backwards and to increment or decrement its value... that's already half the language. There's also an instruction for input or output of 1 character. Continue reading →
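To make the eight instructions concrete, here is a small interpreter sketch. It is not the sed-regex compiler from the post; the 30,000-cell tape, 8-bit wrapping cells, and the function name are assumptions chosen for illustration.

```python
def run_bf(program: str, input_data: str = "", tape_len: int = 30000) -> str:
    """Interpret a brainfuck program and return its output as a string."""
    # Pre-compute matching bracket positions for the two loop instructions.
    jumps, stack = {}, []
    for pos, ch in enumerate(program):
        if ch == "[":
            stack.append(pos)
        elif ch == "]":
            start = stack.pop()
            jumps[start], jumps[pos] = pos, start

    tape = [0] * tape_len
    ptr = pc = 0
    pending_input = iter(input_data)
    out = []
    while pc < len(program):
        ch = program[pc]
        if ch == ">":
            ptr += 1                                  # move pointer forwards
        elif ch == "<":
            ptr -= 1                                  # move pointer backwards
        elif ch == "+":
            tape[ptr] = (tape[ptr] + 1) % 256         # increment current cell
        elif ch == "-":
            tape[ptr] = (tape[ptr] - 1) % 256         # decrement current cell
        elif ch == ".":
            out.append(chr(tape[ptr]))                # output one character
        elif ch == ",":
            tape[ptr] = ord(next(pending_input, "\0"))  # input one character
        elif ch == "[" and tape[ptr] == 0:
            pc = jumps[pc]                            # skip loop body
        elif ch == "]" and tape[ptr] != 0:
            pc = jumps[pc]                            # repeat loop body
        pc += 1
    return "".join(out)


if __name__ == "__main__":
    print(run_bf("++++++++[>++++++++<-]>+."))  # 8 * 8 + 1 = 65
```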
-
Assembly is Too High Level - Commutative Property, Sometimes - it may save your byte
I remember learning these properties in basic algebra: Associative, Distributive, and Commutative. It's the Commutative property that states that a + b = b + a. The same principle is true with multiplication. In x86 pointer math, of course the results of these operations follow the commutative property; that's just math. However, the machine encoding doesn't consistently take this into account. To be facetious with the blog title image: machine code takes apple color into account most of the time, while assembly language just looks at the number of apples. Continue reading →
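One place where this shows up, as an assumption about what the full post covers, is the SIB byte: swapping which register is the base and which is the index gives the same address but a different byte. A minimal sketch with hypothetical names:

```python
REGS32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def sib_byte(base: str, index: str, scale: int = 1) -> int:
    """Pack a SIB byte: scale in bits 7-6, index in bits 5-3, base in bits 2-0."""
    scale_bits = {1: 0, 2: 1, 4: 2, 8: 3}[scale]
    return (scale_bits << 6) | (REGS32.index(index) << 3) | REGS32.index(base)


if __name__ == "__main__":
    # [eax + ebx] and [ebx + eax] name the same address, yet the bytes differ.
    print(hex(sib_byte(base="eax", index="ebx")))
    print(hex(sib_byte(base="ebx", index="eax")))
```

Which operand an assembler treats as the base and which as the index is its own choice; the sketch only shows that the two orderings are distinct at the byte level.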
-
Assembly is Too High-Level - Signed Displacements
For those that don't know about unsigned and signed data types, it's not all that complicated. One byte can hold a total of 256 possible values. If these values were only positive numbers and included zero, we would have a number range of 0-255. But what if we wanted negative numbers? The byte is divided; we now have a range of -128 through 127. When including zero, this is all 256 possible values. Continue reading →
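The same byte read both ways can be sketched in a few lines, assuming two's complement (which is what x86 displacements use). The function name is hypothetical.

```python
def to_signed8(byte: int) -> int:
    """Interpret an unsigned byte (0..255) as a two's complement value (-128..127)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a value that fits in one byte")
    return byte - 256 if byte >= 0x80 else byte


if __name__ == "__main__":
    for raw in (0x00, 0x7F, 0x80, 0xFF):
        print(hex(raw), "unsigned:", raw, "signed:", to_signed8(raw))
```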
-
Boot Sector Graphical Programming - Tutorial
This tutorial is aimed at those that have some assembly experience but very minimal 16-bit BIOS programming experience; in other words, a short list of some of my friends that I want to coerce into doing some BIOS programming. Assembling: Qemu: Assemble source: nasm yourboot.asm -f bin -o yourboot.bin; Run with qemu: qemu tronsolitare.bin; Run with qemu (alternate): qemu-system-i386 -hda yourboot.bin; VirtualBox: Create floppy image: use this padding in the 2nd to last line of code: times (1440 * 1024) - ($ - $$) db 0 (instead of times 510-($-$$) db 0) Continue reading →
-
CactusCon Slides - Machining - A Love Story
Here is the full ~6Mb image that I used as my slide deck within MS Paint in Windows 3.1 for my CactusCon 2016 presentation: Machining, A Love Story. Below the large image are all the images again, slide-by-slide, with brief notes, so there can be some context. All non-screenshot art was done by KRT c0c4!N (my lovely girlfriend); it should be noted that I limited her to 16 colors with a specific palette. Continue reading →
-
Assembly is Too High Level - Repetition of REP Instructions That Don't Repeat Anything
The REP (Repeat String Operation) is a pretty cool prefix; it modifies a single string instruction to repeat until the ECX register reaches zero. As this only applies to one instruction (as opposed to a block of code), ECX needs a way to decrement; REP automatically decrements ECX by 1 on each execution of the string operation instruction. So the idea is to set ECX to the number of times you want the string operation to execute and then run the string operation with the REP prefix. Continue reading →
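The counting behavior described above can be modeled with a short loop. This is a simplified sketch of 'rep stosb' only, assuming the direction flag is clear and ignoring segments; the function name and the bytearray standing in for memory are hypothetical.

```python
def rep_stosb(memory: bytearray, edi: int, ecx: int, al: int):
    """Model REP STOSB: store AL at [EDI], advance EDI, decrement ECX until it is 0."""
    while ecx != 0:
        memory[edi] = al & 0xFF
        edi += 1   # assumes the direction flag is clear (addresses go up)
        ecx -= 1   # REP decrements ECX once per executed string operation
    return edi, ecx


if __name__ == "__main__":
    mem = bytearray(8)
    print(rep_stosb(mem, edi=2, ecx=3, al=0x41), mem)
    # With ECX already 0, the string operation never executes at all.
    print(rep_stosb(mem, edi=0, ecx=0, al=0x42), mem)
```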
-
Assembly is Too High Level - Why ESP doesn't scale - But EBP can still Base
The main 8 general purpose registers are EAX, ECX, EDX, EBX, ESP, EBP, ESI, and EDI. In that order. You will see this structure in a lot of places. I will give some examples below, but the list is in no way exhaustive; I just wanted to show some variety. There are the B0-B7 and B8-BF MOV instructions, where the 2nd hex digit defines which register receives an immediate value; notice that the registers are in the order described above. Continue reading →
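The B8-BF pattern from the excerpt can be sketched as an encoder where the register's position in that list is simply added to the opcode. The function name is hypothetical; the immediate is stored little-endian.

```python
import struct

# Register order as given in the excerpt; the list index is the register number.
REGS32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def mov_r32_imm32(reg: str, imm: int) -> bytes:
    """Encode MOV r32, imm32 as opcode B8+r followed by a 4-byte immediate."""
    opcode = 0xB8 + REGS32.index(reg)
    return bytes([opcode]) + struct.pack("<I", imm & 0xFFFFFFFF)


if __name__ == "__main__":
    for name in REGS32:
        print(name, mov_r32_imm32(name, 0x12345678).hex())
```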
-
Assembly is Too High Level - Redundant Bit commands
Compared to some of the most recent posts in this series, this one is a pretty basic example of a redundancy. This redundancy applies to the bit shifting instructions: RCL, RCR, ROL, ROR, SAL, SAR, SHL, and SHR. These instructions can take an 8-bit immediate value, but there is also a dedicated encoding for the operand to just be the value '1'. This is a very common operand for these types of instructions anyway, so it makes sense. Continue reading →
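A sketch of the two encodings side by side, assuming the 32-bit register forms (opcode D1 for the implicit '1', opcode C1 for an 8-bit immediate) with the instruction selected by the ModR/M reg field. The helper names are hypothetical.

```python
REGS32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
# Opcode extension (ModR/M reg field) for each shift/rotate; SAL shares SHL's slot here.
SHIFT_EXT = {"rol": 0, "ror": 1, "rcl": 2, "rcr": 3, "shl": 4, "sal": 4, "shr": 5, "sar": 7}


def _modrm(mnemonic: str, reg: str) -> int:
    """ModR/M byte with mod=11 (register direct)."""
    return 0xC0 | (SHIFT_EXT[mnemonic] << 3) | REGS32.index(reg)


def shift_by_one(mnemonic: str, reg: str) -> bytes:
    """Dedicated 'shift by 1' form: D1 /ext."""
    return bytes([0xD1, _modrm(mnemonic, reg)])


def shift_by_imm8(mnemonic: str, reg: str, imm: int) -> bytes:
    """Immediate form: C1 /ext ib."""
    return bytes([0xC1, _modrm(mnemonic, reg), imm & 0xFF])


if __name__ == "__main__":
    print(shift_by_one("shl", "eax").hex())      # two bytes
    print(shift_by_imm8("shl", "eax", 1).hex())  # three bytes, same operation
```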
-
Assembly is Too High Level - SIB Doubles
I'm finding that there is a full playground in the ModR/M encoding, and this post is specifically about a SIB obscurity, only because of the way I see NASM assembling some of my assembly. Then I found other cool things NASM puts up with. Consider this code: Functionally, they both result in the same thing. There is even separate machine code to accurately represent both (kind of). But if we assemble it, we end up with this: Continue reading →
-
Assembly is Too High Level - Load InEffective Address
The LEA (Load Effective Address) instruction allows us to copy the address of a memory location (in the memory addressing format you would find in ModR/M encoding) into a register. This instruction is also often used as a multiplication hack in place of MUL. With the memory (pointer) encoding of the ModR/M byte (and the SIB byte), we are able to add 3 different numbers (two of them registers, one of them an immediate), and one of those numbers (one of the registers) can be multiplied by 2, 4, or 8. Continue reading →
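The arithmetic LEA exposes can be modeled directly. This sketch only computes the address expression described above (base + index * scale + immediate) in 32 bits; the function and parameter names are hypothetical.

```python
def effective_address(base: int = 0, index: int = 0, scale: int = 1, disp: int = 0) -> int:
    """Compute base + index * scale + disp, wrapped to 32 bits."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4, or 8")
    return (base + index * scale + disp) & 0xFFFFFFFF


if __name__ == "__main__":
    eax = 7
    # Something like 'lea eax, [eax + eax*4]' multiplies by 5 without MUL.
    print(effective_address(base=eax, index=eax, scale=4))
    # Adding an immediate on top: eax * 9 + 3.
    print(effective_address(base=eax, index=eax, scale=8, disp=3))
```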
-
Assembly is Too High-Level - TEST r32, r-m32, exists in assembly, but not the machine
...And the TEST r32, r/m32 that exists in assembly is more just kind of a lie... An interesting thing about instructions that use the ModR/M encoding is that both the source and destination operands can be registers, but they cannot both be memory locations. When it comes to the registers, this has been the source of a lot of cool redundancies. This post is about a cool memory encoding redundancy though. Continue reading →
-
Assembly is Too High-Level - BSWAPin 16-bit Registers
But what actually happens? As it turns out, the Intel manual is correct in stating that you should use xchg instead of bswap. In practice, it's hard to say the result of this 16-bit bswap is 'undefined;' as it is consistent with what it does each time. Instead of swapping the contents of ah and al 8-bit registers within ax, it actually just clears the register to 0x00. Continue reading →
-
Assembly is Too High-Level - Self Modifying Code with Basic Arithmetic
I should say that we are able to do this trick all in assembly, but none of it would make sense without an understanding of machine code. This post is about simple self-modifying code tricks you can do by adding to or subtracting from an instruction to turn it into another instruction, while also maintaining consistent addressing modes and operands (e.g. adding 8 to the 2nd byte of the machine code for 'add bl, 5' turns it into 'or bl, 5'). Continue reading →
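The 'add bl, 5' example from the excerpt can be reproduced on raw bytes. This sketch assumes the 80 /digit ib encoding, where bits 5-3 of the second byte pick the operation; the helper names are hypothetical and nothing is actually executed.

```python
# Operations selected by bits 5-3 of the ModR/M byte for opcode 0x80.
GROUP1 = ["add", "or", "adc", "sbb", "and", "sub", "xor", "cmp"]


def group1_mnemonic(code: bytes) -> str:
    """Name the operation encoded in an 80 /digit ib instruction."""
    if code[0] != 0x80:
        raise ValueError("this sketch only handles opcode 0x80")
    return GROUP1[(code[1] >> 3) & 0b111]


def add_to_second_byte(code: bytes, delta: int) -> bytes:
    """Return a copy of the instruction with `delta` added to its 2nd byte."""
    patched = bytearray(code)
    patched[1] = (patched[1] + delta) & 0xFF
    return bytes(patched)


if __name__ == "__main__":
    original = bytes.fromhex("80c305")  # add bl, 5
    patched = add_to_second_byte(original, 8)
    print(original.hex(), group1_mnemonic(original))
    print(patched.hex(), group1_mnemonic(patched))
```

Each further step of 8 moves to the next entry in the table while the register and immediate stay the same.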
-
Follow-up on creating Vm0wd2Qy - 9000
This refers to my previous post on Vm0wd2Qy and clarifies how I got my results. If you repeatedly Base64 encode a string, you will eventually get Vm0wd2Qy as the first part of your string. In my previous post, I have 10,000 characters that you would eventually get as the first part of your string if you keep doing this. My process for getting these 10,000 characters involved a kind of brute force, but with the obvious assistance of scripting and CLI stuff. Continue reading →
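The convergence described above is easy to try. This is not the author's scripting process, just a minimal sketch using Python's standard base64 module; the seed string and the round count of 25 are arbitrary choices.

```python
import base64


def repeated_base64(seed: bytes, rounds: int) -> bytes:
    """Base64 encode `seed`, then encode the result again, `rounds` times in total."""
    data = seed
    for _ in range(rounds):
        data = base64.b64encode(data)
    return data


if __name__ == "__main__":
    result = repeated_base64(b"hello", 25)
    print(len(result), result[:8])
```

The output grows by roughly a third each round, so collecting a long stable prefix this way gets expensive quickly.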
-
Assembly is Too High Level - Jump Near When Short
This article is about a redundancy with short and near jumps. Both of the jumps I will talk about are relative; the immediate data after the jump instruction is a signed offset for how far to jump. The difference between a short and near jump is simple: the 0xeb short jump has a byte for its operand, and the 0xe9 near jump has 4 bytes for its operand. This means that we can jump -128 through 127 bytes with a short jump and -2,147,483,648 through 2,147,483,647 bytes with a near jump. Continue reading →
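Both forms can be sketched as byte encoders. The function names are hypothetical; the operand is the raw signed offset, stored little-endian, and the sketch makes no attempt to compute offsets from labels.

```python
import struct


def encode_jmp_short(rel8: int) -> bytes:
    """0xEB followed by a signed 1-byte offset (-128..127)."""
    return b"\xEB" + struct.pack("<b", rel8)


def encode_jmp_near(rel32: int) -> bytes:
    """0xE9 followed by a signed 4-byte offset."""
    return b"\xE9" + struct.pack("<i", rel32)


if __name__ == "__main__":
    print(encode_jmp_short(16).hex())  # 2 bytes
    print(encode_jmp_near(16).hex())   # 5 bytes, even though the offset fits in one byte
    print(encode_jmp_short(-2).hex())
```

Any offset in the short range also fits in the near form, which is where the redundancy comes from. Because the two instructions have different lengths, the same operand value does not necessarily land on the same target address in both.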